Training: 2022-04-12 19:42:25,049-rank_id: 0
Training: 2022-04-12 19:42:55,671-: margin_list              [1.0, 0.0, 0.4]
Training: 2022-04-12 19:42:55,672-: network                  r100
Training: 2022-04-12 19:42:55,672-: resume                   False
Training: 2022-04-12 19:42:55,672-: output                   work_dirs/wf42m_pfc02_r100
Training: 2022-04-12 19:42:55,672-: embedding_size           512
Training: 2022-04-12 19:42:55,672-: sample_rate              0.2
Training: 2022-04-12 19:42:55,672-: interclass_filtering_threshold 0
Training: 2022-04-12 19:42:55,672-: fp16                     True
Training: 2022-04-12 19:42:55,672-: batch_size               128
Training: 2022-04-12 19:42:55,672-: optimizer                sgd
Training: 2022-04-12 19:42:55,672-: lr                       0.1
Training: 2022-04-12 19:42:55,672-: momentum                 0.9
Training: 2022-04-12 19:42:55,673-: weight_decay             0.0005
Training: 2022-04-12 19:42:55,673-: verbose                  10000
Training: 2022-04-12 19:42:55,673-: frequent                 10
Training: 2022-04-12 19:42:55,673-: dali                     True
Training: 2022-04-12 19:42:55,673-: rec                      /train_tmp/WebFace42M
Training: 2022-04-12 19:42:55,673-: num_classes              2059906
Training: 2022-04-12 19:42:55,673-: num_image                42474557
Training: 2022-04-12 19:42:55,673-: num_epoch                20
Training: 2022-04-12 19:42:55,673-: warmup_epoch             0
Training: 2022-04-12 19:42:55,673-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-12 19:42:55,673-: total_batch_size         1024
Training: 2022-04-12 19:42:55,673-: warmup_step              0
Training: 2022-04-12 19:42:55,673-: total_step               829580
Training: 2022-04-12 19:43:57,091-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-12 19:44:04,153-Speed 2621.92 samples/sec   Loss 42.5304   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 16384   Required: 109 hours
Training: 2022-04-12 19:44:08,052-Speed 2627.37 samples/sec   Loss 42.6028   LearningRate 0.1000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 16384   Required: 103 hours
Training: 2022-04-12 19:44:11,898-Speed 2663.44 samples/sec   Loss 42.8713   LearningRate 0.1000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 16384   Required: 99 hours
Training: 2022-04-12 19:44:15,772-Speed 2643.31 samples/sec   Loss 43.3095   LearningRate 0.1000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 16384   Required: 97 hours
Training: 2022-04-12 19:44:19,641-Speed 2647.58 samples/sec   Loss 43.3497   LearningRate 0.1000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 16384   Required: 96 hours
Training: 2022-04-12 19:44:23,558-Speed 2615.48 samples/sec   Loss 43.4846   LearningRate 0.1000   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 16384   Required: 95 hours
Training: 2022-04-12 19:44:27,482-Speed 2610.28 samples/sec   Loss 43.5972   LearningRate 0.1000   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 16384   Required: 95 hours
Training: 2022-04-12 19:44:31,325-Speed 2664.88 samples/sec   Loss 43.0671   LearningRate 0.1000   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 16384   Required: 94 hours
Training: 2022-04-12 19:44:35,209-Speed 2637.76 samples/sec   Loss 43.2108   LearningRate 0.1000   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 16384   Required: 94 hours
Training: 2022-04-12 19:44:39,129-Speed 2613.09 samples/sec   Loss 43.3634   LearningRate 0.1000   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 32768   Required: 93 hours
Training: 2022-04-12 19:44:42,990-Speed 2653.32 samples/sec   Loss 43.2378   LearningRate 0.1000   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 32768   Required: 93 hours
Training: 2022-04-12 19:44:46,847-Speed 2655.40 samples/sec   Loss 43.3276   LearningRate 0.1000   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 32768   Required: 93 hours
Training: 2022-04-12 19:44:50,701-Speed 2657.17 samples/sec   Loss 43.5597   LearningRate 0.1000   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 32768   Required: 92 hours
Training: 2022-04-12 19:44:54,548-Speed 2662.05 samples/sec   Loss 43.1397   LearningRate 0.1000   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 32768   Required: 92 hours
Training: 2022-04-12 19:44:58,395-Speed 2662.99 samples/sec   Loss 43.3202   LearningRate 0.1000   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 32768   Required: 92 hours
Training: 2022-04-12 19:45:02,240-Speed 2664.75 samples/sec   Loss 43.0240   LearningRate 0.1000   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 32768   Required: 92 hours
Training: 2022-04-12 19:45:06,166-Speed 2608.34 samples/sec   Loss 43.0135   LearningRate 0.1000   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 32768   Required: 92 hours
Training: 2022-04-12 19:45:10,080-Speed 2617.94 samples/sec   Loss 43.3390   LearningRate 0.1000   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 32768   Required: 92 hours
Training: 2022-04-12 19:45:13,959-Speed 2640.07 samples/sec   Loss 43.0878   LearningRate 0.1000   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 32768   Required: 91 hours
Training: 2022-04-12 19:45:17,827-Speed 2648.24 samples/sec   Loss 42.9079   LearningRate 0.0999   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:21,675-Speed 2661.79 samples/sec   Loss 43.0201   LearningRate 0.0999   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:25,537-Speed 2652.53 samples/sec   Loss 42.9868   LearningRate 0.0999   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:29,390-Speed 2658.00 samples/sec   Loss 42.9571   LearningRate 0.0999   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:33,275-Speed 2636.77 samples/sec   Loss 42.9252   LearningRate 0.0999   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:37,132-Speed 2655.06 samples/sec   Loss 43.1064   LearningRate 0.0999   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:41,031-Speed 2627.43 samples/sec   Loss 42.9818   LearningRate 0.0999   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:44,887-Speed 2656.52 samples/sec   Loss 42.8356   LearningRate 0.0999   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:48,739-Speed 2659.10 samples/sec   Loss 42.7623   LearningRate 0.0999   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:52,591-Speed 2658.33 samples/sec   Loss 42.7866   LearningRate 0.0999   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 19:45:56,476-Speed 2636.73 samples/sec   Loss 42.8538   LearningRate 0.0999   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 19:46:00,355-Speed 2640.70 samples/sec   Loss 42.7730   LearningRate 0.0999   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 19:46:04,213-Speed 2655.39 samples/sec   Loss 42.9514   LearningRate 0.0999   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:08,063-Speed 2660.34 samples/sec   Loss 42.7928   LearningRate 0.0999   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:11,924-Speed 2652.76 samples/sec   Loss 42.7317   LearningRate 0.0999   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:15,778-Speed 2658.14 samples/sec   Loss 42.6753   LearningRate 0.0999   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:19,705-Speed 2607.79 samples/sec   Loss 42.6768   LearningRate 0.0999   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:23,568-Speed 2651.61 samples/sec   Loss 42.6679   LearningRate 0.0999   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:27,490-Speed 2611.66 samples/sec   Loss 42.6630   LearningRate 0.0999   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:31,398-Speed 2620.97 samples/sec   Loss 42.6817   LearningRate 0.0999   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 19:46:35,261-Speed 2651.37 samples/sec   Loss 42.6596   LearningRate 0.0999   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:46:39,146-Speed 2636.36 samples/sec   Loss 42.6045   LearningRate 0.0999   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:46:43,019-Speed 2645.17 samples/sec   Loss 42.6643   LearningRate 0.0999   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:46:46,859-Speed 2667.30 samples/sec   Loss 42.6625   LearningRate 0.0999   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:46:50,717-Speed 2655.27 samples/sec   Loss 42.5804   LearningRate 0.0999   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:46:54,623-Speed 2622.11 samples/sec   Loss 42.5641   LearningRate 0.0999   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:46:58,491-Speed 2648.52 samples/sec   Loss 42.6173   LearningRate 0.0999   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:47:02,389-Speed 2627.68 samples/sec   Loss 42.5451   LearningRate 0.0999   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:47:06,272-Speed 2637.37 samples/sec   Loss 42.4640   LearningRate 0.0999   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:47:10,137-Speed 2650.07 samples/sec   Loss 42.5158   LearningRate 0.0999   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 19:47:14,010-Speed 2644.60 samples/sec   Loss 42.4684   LearningRate 0.0999   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:17,896-Speed 2636.06 samples/sec   Loss 42.5050   LearningRate 0.0999   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:21,787-Speed 2631.79 samples/sec   Loss 42.4539   LearningRate 0.0999   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:25,747-Speed 2586.84 samples/sec   Loss 42.4475   LearningRate 0.0999   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:29,673-Speed 2608.46 samples/sec   Loss 42.4660   LearningRate 0.0999   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:33,531-Speed 2654.96 samples/sec   Loss 42.5184   LearningRate 0.0999   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:37,423-Speed 2631.80 samples/sec   Loss 42.4954   LearningRate 0.0999   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:41,275-Speed 2659.25 samples/sec   Loss 42.4437   LearningRate 0.0999   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:45,136-Speed 2652.63 samples/sec   Loss 42.3833   LearningRate 0.0999   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:48,997-Speed 2652.83 samples/sec   Loss 42.4676   LearningRate 0.0999   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:52,844-Speed 2662.35 samples/sec   Loss 42.3712   LearningRate 0.0999   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:47:56,708-Speed 2651.03 samples/sec   Loss 42.3637   LearningRate 0.0999   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:00,556-Speed 2661.27 samples/sec   Loss 42.4162   LearningRate 0.0998   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:04,407-Speed 2660.02 samples/sec   Loss 42.4495   LearningRate 0.0998   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:08,254-Speed 2662.37 samples/sec   Loss 42.3778   LearningRate 0.0998   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:12,116-Speed 2651.73 samples/sec   Loss 42.4134   LearningRate 0.0998   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:15,962-Speed 2663.13 samples/sec   Loss 42.3165   LearningRate 0.0998   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:19,834-Speed 2644.94 samples/sec   Loss 42.3406   LearningRate 0.0998   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:23,699-Speed 2650.05 samples/sec   Loss 42.3699   LearningRate 0.0998   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:27,556-Speed 2655.89 samples/sec   Loss 42.3046   LearningRate 0.0998   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:32,307-Speed 2156.08 samples/sec   Loss 42.3396   LearningRate 0.0998   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:36,159-Speed 2658.60 samples/sec   Loss 42.3294   LearningRate 0.0998   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:40,010-Speed 2659.81 samples/sec   Loss 42.2768   LearningRate 0.0998   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:43,859-Speed 2660.99 samples/sec   Loss 42.2089   LearningRate 0.0998   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:47,765-Speed 2621.76 samples/sec   Loss 42.2807   LearningRate 0.0998   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:51,630-Speed 2650.14 samples/sec   Loss 42.2119   LearningRate 0.0998   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:55,522-Speed 2632.03 samples/sec   Loss 42.2720   LearningRate 0.0998   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:48:59,427-Speed 2622.71 samples/sec   Loss 42.1816   LearningRate 0.0998   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:03,321-Speed 2630.07 samples/sec   Loss 42.2307   LearningRate 0.0998   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:07,168-Speed 2662.65 samples/sec   Loss 42.2073   LearningRate 0.0998   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:10,995-Speed 2676.44 samples/sec   Loss 42.1903   LearningRate 0.0998   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:14,881-Speed 2635.49 samples/sec   Loss 42.2138   LearningRate 0.0998   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:18,853-Speed 2578.55 samples/sec   Loss 42.1714   LearningRate 0.0998   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:22,708-Speed 2657.42 samples/sec   Loss 42.1144   LearningRate 0.0998   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:26,584-Speed 2642.83 samples/sec   Loss 42.1573   LearningRate 0.0998   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:30,442-Speed 2654.56 samples/sec   Loss 42.1422   LearningRate 0.0998   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:34,300-Speed 2655.08 samples/sec   Loss 42.0638   LearningRate 0.0998   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:38,178-Speed 2640.91 samples/sec   Loss 42.1135   LearningRate 0.0998   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:42,179-Speed 2560.79 samples/sec   Loss 41.9945   LearningRate 0.0998   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:46,043-Speed 2650.88 samples/sec   Loss 42.0112   LearningRate 0.0998   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:49,876-Speed 2672.14 samples/sec   Loss 41.9996   LearningRate 0.0998   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:53,745-Speed 2647.32 samples/sec   Loss 42.0212   LearningRate 0.0998   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:49:57,598-Speed 2658.22 samples/sec   Loss 41.9687   LearningRate 0.0998   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:01,455-Speed 2655.86 samples/sec   Loss 41.9489   LearningRate 0.0998   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:05,360-Speed 2623.09 samples/sec   Loss 41.9953   LearningRate 0.0998   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:09,249-Speed 2633.17 samples/sec   Loss 41.9818   LearningRate 0.0998   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:13,097-Speed 2662.03 samples/sec   Loss 41.9204   LearningRate 0.0998   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:16,941-Speed 2664.45 samples/sec   Loss 41.9794   LearningRate 0.0998   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:20,809-Speed 2648.52 samples/sec   Loss 41.9502   LearningRate 0.0998   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:24,680-Speed 2645.88 samples/sec   Loss 41.9432   LearningRate 0.0998   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:28,587-Speed 2621.94 samples/sec   Loss 42.0069   LearningRate 0.0998   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:32,469-Speed 2638.68 samples/sec   Loss 41.9488   LearningRate 0.0998   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:36,315-Speed 2662.77 samples/sec   Loss 41.8047   LearningRate 0.0998   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:40,171-Speed 2656.05 samples/sec   Loss 41.8269   LearningRate 0.0997   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:44,031-Speed 2653.71 samples/sec   Loss 41.7951   LearningRate 0.0997   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:47,898-Speed 2648.92 samples/sec   Loss 41.8016   LearningRate 0.0997   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:51,752-Speed 2657.72 samples/sec   Loss 41.8751   LearningRate 0.0997   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:55,603-Speed 2659.90 samples/sec   Loss 41.7278   LearningRate 0.0997   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:50:59,456-Speed 2657.99 samples/sec   Loss 41.7916   LearningRate 0.0997   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:03,313-Speed 2655.84 samples/sec   Loss 41.8093   LearningRate 0.0997   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:07,150-Speed 2668.96 samples/sec   Loss 41.7445   LearningRate 0.0997   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:11,125-Speed 2576.68 samples/sec   Loss 41.8262   LearningRate 0.0997   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:14,973-Speed 2662.14 samples/sec   Loss 41.7468   LearningRate 0.0997   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:18,892-Speed 2613.71 samples/sec   Loss 41.7342   LearningRate 0.0997   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:22,804-Speed 2618.05 samples/sec   Loss 41.7011   LearningRate 0.0997   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:26,659-Speed 2657.61 samples/sec   Loss 41.6749   LearningRate 0.0997   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:30,519-Speed 2653.65 samples/sec   Loss 41.6210   LearningRate 0.0997   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:34,378-Speed 2653.61 samples/sec   Loss 41.6678   LearningRate 0.0997   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:38,236-Speed 2655.23 samples/sec   Loss 41.6911   LearningRate 0.0997   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:42,094-Speed 2654.99 samples/sec   Loss 41.5984   LearningRate 0.0997   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:45,915-Speed 2679.99 samples/sec   Loss 41.6814   LearningRate 0.0997   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:49,776-Speed 2653.27 samples/sec   Loss 41.5634   LearningRate 0.0997   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:53,653-Speed 2641.81 samples/sec   Loss 41.5432   LearningRate 0.0997   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:51:57,498-Speed 2664.47 samples/sec   Loss 41.5780   LearningRate 0.0997   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:01,459-Speed 2585.18 samples/sec   Loss 41.5293   LearningRate 0.0997   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:05,340-Speed 2639.19 samples/sec   Loss 41.5144   LearningRate 0.0997   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:09,202-Speed 2652.16 samples/sec   Loss 41.5784   LearningRate 0.0997   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:13,077-Speed 2643.26 samples/sec   Loss 41.5274   LearningRate 0.0997   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:16,924-Speed 2662.96 samples/sec   Loss 41.5047   LearningRate 0.0997   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:20,783-Speed 2653.88 samples/sec   Loss 41.5687   LearningRate 0.0997   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:24,625-Speed 2665.77 samples/sec   Loss 41.5304   LearningRate 0.0997   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:28,484-Speed 2654.95 samples/sec   Loss 41.4842   LearningRate 0.0997   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:32,409-Speed 2609.53 samples/sec   Loss 41.3958   LearningRate 0.0997   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:36,356-Speed 2594.58 samples/sec   Loss 41.5194   LearningRate 0.0997   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:40,215-Speed 2654.11 samples/sec   Loss 41.4695   LearningRate 0.0997   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:44,058-Speed 2665.23 samples/sec   Loss 41.4078   LearningRate 0.0997   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:47,900-Speed 2666.61 samples/sec   Loss 41.4669   LearningRate 0.0997   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:51,749-Speed 2661.33 samples/sec   Loss 41.4162   LearningRate 0.0997   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:55,619-Speed 2646.64 samples/sec   Loss 41.3238   LearningRate 0.0997   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:52:59,459-Speed 2666.77 samples/sec   Loss 41.3471   LearningRate 0.0997   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:03,312-Speed 2658.46 samples/sec   Loss 41.3947   LearningRate 0.0997   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 1048576   Required: 90 hours
Training: 2022-04-12 19:53:07,272-Speed 2586.19 samples/sec   Loss 41.2457   LearningRate 0.0997   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:11,284-Speed 2552.77 samples/sec   Loss 41.3320   LearningRate 0.0997   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:15,330-Speed 2531.34 samples/sec   Loss 41.3041   LearningRate 0.0997   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:19,275-Speed 2596.85 samples/sec   Loss 41.3373   LearningRate 0.0997   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:23,120-Speed 2664.67 samples/sec   Loss 41.3215   LearningRate 0.0996   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:26,973-Speed 2658.03 samples/sec   Loss 41.2658   LearningRate 0.0996   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:30,836-Speed 2651.20 samples/sec   Loss 41.3112   LearningRate 0.0996   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:34,696-Speed 2653.50 samples/sec   Loss 41.2044   LearningRate 0.0996   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:38,553-Speed 2655.16 samples/sec   Loss 41.2451   LearningRate 0.0996   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:42,451-Speed 2627.48 samples/sec   Loss 41.2717   LearningRate 0.0996   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:46,281-Speed 2674.71 samples/sec   Loss 41.2494   LearningRate 0.0996   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:50,130-Speed 2661.14 samples/sec   Loss 41.1898   LearningRate 0.0996   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:53,981-Speed 2660.22 samples/sec   Loss 41.1657   LearningRate 0.0996   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:53:57,830-Speed 2660.60 samples/sec   Loss 41.1613   LearningRate 0.0996   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:54:01,679-Speed 2661.10 samples/sec   Loss 41.2220   LearningRate 0.0996   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:54:05,551-Speed 2645.58 samples/sec   Loss 41.1442   LearningRate 0.0996   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 19:54:09,398-Speed 2662.92 samples/sec   Loss 41.1496   LearningRate 0.0996   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:13,245-Speed 2662.11 samples/sec   Loss 41.1264   LearningRate 0.0996   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:17,090-Speed 2663.60 samples/sec   Loss 41.1509   LearningRate 0.0996   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:20,937-Speed 2662.34 samples/sec   Loss 41.1268   LearningRate 0.0996   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:24,775-Speed 2669.26 samples/sec   Loss 41.0994   LearningRate 0.0996   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:28,625-Speed 2660.64 samples/sec   Loss 41.0465   LearningRate 0.0996   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:32,473-Speed 2661.68 samples/sec   Loss 41.0645   LearningRate 0.0996   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:36,350-Speed 2641.53 samples/sec   Loss 41.0397   LearningRate 0.0996   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:40,205-Speed 2656.96 samples/sec   Loss 40.9561   LearningRate 0.0996   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:44,178-Speed 2578.01 samples/sec   Loss 41.0124   LearningRate 0.0996   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:48,033-Speed 2656.75 samples/sec   Loss 41.1072   LearningRate 0.0996   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:51,890-Speed 2656.08 samples/sec   Loss 41.0189   LearningRate 0.0996   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:55,737-Speed 2662.45 samples/sec   Loss 40.9806   LearningRate 0.0996   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:54:59,604-Speed 2648.42 samples/sec   Loss 40.9770   LearningRate 0.0996   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:03,447-Speed 2665.06 samples/sec   Loss 40.9253   LearningRate 0.0996   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:07,294-Speed 2662.69 samples/sec   Loss 40.9440   LearningRate 0.0996   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:11,141-Speed 2662.59 samples/sec   Loss 40.9179   LearningRate 0.0996   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:14,990-Speed 2661.01 samples/sec   Loss 40.9538   LearningRate 0.0996   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:18,861-Speed 2645.92 samples/sec   Loss 40.9404   LearningRate 0.0996   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:22,765-Speed 2623.95 samples/sec   Loss 40.8540   LearningRate 0.0996   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:26,642-Speed 2642.26 samples/sec   Loss 40.9988   LearningRate 0.0996   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:30,511-Speed 2647.02 samples/sec   Loss 40.8583   LearningRate 0.0996   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:34,368-Speed 2655.24 samples/sec   Loss 40.8915   LearningRate 0.0996   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:38,211-Speed 2665.10 samples/sec   Loss 40.8874   LearningRate 0.0996   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:42,037-Speed 2677.72 samples/sec   Loss 40.8494   LearningRate 0.0996   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:45,879-Speed 2665.71 samples/sec   Loss 40.8729   LearningRate 0.0996   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:49,748-Speed 2647.75 samples/sec   Loss 40.8416   LearningRate 0.0996   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:53,607-Speed 2654.09 samples/sec   Loss 40.7785   LearningRate 0.0996   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:55:57,455-Speed 2661.85 samples/sec   Loss 40.7893   LearningRate 0.0996   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:01,332-Speed 2642.02 samples/sec   Loss 40.7354   LearningRate 0.0995   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:05,185-Speed 2658.03 samples/sec   Loss 40.7616   LearningRate 0.0995   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:09,041-Speed 2656.06 samples/sec   Loss 40.7822   LearningRate 0.0995   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:12,890-Speed 2662.34 samples/sec   Loss 40.7469   LearningRate 0.0995   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:16,739-Speed 2660.58 samples/sec   Loss 40.7567   LearningRate 0.0995   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:20,562-Speed 2679.77 samples/sec   Loss 40.7845   LearningRate 0.0995   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:24,445-Speed 2637.57 samples/sec   Loss 40.7217   LearningRate 0.0995   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:28,296-Speed 2660.14 samples/sec   Loss 40.6904   LearningRate 0.0995   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:56:32,132-Speed 2669.91 samples/sec   Loss 40.6544   LearningRate 0.0995   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:56:35,986-Speed 2657.45 samples/sec   Loss 40.7414   LearningRate 0.0995   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:56:39,843-Speed 2655.55 samples/sec   Loss 40.6627   LearningRate 0.0995   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:56:43,703-Speed 2653.55 samples/sec   Loss 40.6364   LearningRate 0.0995   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:56:47,594-Speed 2631.97 samples/sec   Loss 40.5832   LearningRate 0.0995   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:56:51,463-Speed 2647.71 samples/sec   Loss 40.5860   LearningRate 0.0995   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:56:55,319-Speed 2656.54 samples/sec   Loss 40.5484   LearningRate 0.0995   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:56:59,176-Speed 2655.86 samples/sec   Loss 40.5930   LearningRate 0.0995   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:57:03,027-Speed 2659.17 samples/sec   Loss 40.6221   LearningRate 0.0995   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:57:06,969-Speed 2598.30 samples/sec   Loss 40.5851   LearningRate 0.0995   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:57:11,002-Speed 2539.24 samples/sec   Loss 40.5833   LearningRate 0.0995   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:14,906-Speed 2624.02 samples/sec   Loss 40.5884   LearningRate 0.0995   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:18,760-Speed 2658.39 samples/sec   Loss 40.4775   LearningRate 0.0995   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:22,611-Speed 2659.98 samples/sec   Loss 40.5093   LearningRate 0.0995   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:26,457-Speed 2663.22 samples/sec   Loss 40.4569   LearningRate 0.0995   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:30,316-Speed 2654.39 samples/sec   Loss 40.4250   LearningRate 0.0995   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:34,175-Speed 2654.05 samples/sec   Loss 40.5026   LearningRate 0.0995   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:38,031-Speed 2656.02 samples/sec   Loss 40.4758   LearningRate 0.0995   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:42,048-Speed 2549.70 samples/sec   Loss 40.4176   LearningRate 0.0995   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:46,000-Speed 2591.78 samples/sec   Loss 40.4618   LearningRate 0.0995   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:49,820-Speed 2681.69 samples/sec   Loss 40.4291   LearningRate 0.0995   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:53,682-Speed 2651.67 samples/sec   Loss 40.3924   LearningRate 0.0995   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:57:57,532-Speed 2660.94 samples/sec   Loss 40.3789   LearningRate 0.0995   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:01,377-Speed 2663.76 samples/sec   Loss 40.4080   LearningRate 0.0995   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:05,221-Speed 2663.80 samples/sec   Loss 40.3101   LearningRate 0.0995   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:09,076-Speed 2657.04 samples/sec   Loss 40.3845   LearningRate 0.0995   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:13,050-Speed 2577.61 samples/sec   Loss 40.3328   LearningRate 0.0995   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:16,897-Speed 2662.42 samples/sec   Loss 40.2691   LearningRate 0.0995   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:20,749-Speed 2659.33 samples/sec   Loss 40.2989   LearningRate 0.0995   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:24,598-Speed 2661.11 samples/sec   Loss 40.3804   LearningRate 0.0995   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:28,422-Speed 2678.36 samples/sec   Loss 40.2580   LearningRate 0.0995   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:32,312-Speed 2634.00 samples/sec   Loss 40.1867   LearningRate 0.0995   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:58:36,148-Speed 2669.90 samples/sec   Loss 40.3195   LearningRate 0.0995   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:58:40,032-Speed 2636.81 samples/sec   Loss 40.2760   LearningRate 0.0995   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:58:43,896-Speed 2651.08 samples/sec   Loss 40.1930   LearningRate 0.0994   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:58:47,745-Speed 2661.48 samples/sec   Loss 40.2465   LearningRate 0.0994   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:58:51,613-Speed 2647.95 samples/sec   Loss 40.1891   LearningRate 0.0994   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:58:55,635-Speed 2546.81 samples/sec   Loss 40.1783   LearningRate 0.0994   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:58:59,534-Speed 2626.79 samples/sec   Loss 40.0880   LearningRate 0.0994   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:59:03,398-Speed 2650.80 samples/sec   Loss 40.1053   LearningRate 0.0994   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:59:07,258-Speed 2653.66 samples/sec   Loss 40.0922   LearningRate 0.0994   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:59:11,129-Speed 2645.40 samples/sec   Loss 40.2074   LearningRate 0.0994   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 19:59:15,006-Speed 2642.27 samples/sec   Loss 40.1522   LearningRate 0.0994   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:18,853-Speed 2662.49 samples/sec   Loss 39.9927   LearningRate 0.0994   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:22,707-Speed 2657.87 samples/sec   Loss 40.0784   LearningRate 0.0994   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:26,564-Speed 2655.41 samples/sec   Loss 40.0351   LearningRate 0.0994   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:30,420-Speed 2656.05 samples/sec   Loss 39.9989   LearningRate 0.0994   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:34,279-Speed 2654.41 samples/sec   Loss 40.1087   LearningRate 0.0994   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:38,129-Speed 2660.57 samples/sec   Loss 39.9762   LearningRate 0.0994   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:41,975-Speed 2662.91 samples/sec   Loss 40.0479   LearningRate 0.0994   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:45,838-Speed 2651.70 samples/sec   Loss 40.0027   LearningRate 0.0994   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:49,751-Speed 2617.56 samples/sec   Loss 39.9511   LearningRate 0.0994   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:53,585-Speed 2671.46 samples/sec   Loss 39.9656   LearningRate 0.0994   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 19:59:57,424-Speed 2668.39 samples/sec   Loss 39.9624   LearningRate 0.0994   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:01,275-Speed 2659.68 samples/sec   Loss 39.9551   LearningRate 0.0994   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:05,119-Speed 2664.19 samples/sec   Loss 39.9514   LearningRate 0.0994   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:08,975-Speed 2656.07 samples/sec   Loss 39.9417   LearningRate 0.0994   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:12,825-Speed 2660.53 samples/sec   Loss 39.8884   LearningRate 0.0994   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:16,671-Speed 2663.08 samples/sec   Loss 39.9012   LearningRate 0.0994   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:20,519-Speed 2661.99 samples/sec   Loss 39.9391   LearningRate 0.0994   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:24,365-Speed 2663.24 samples/sec   Loss 39.8745   LearningRate 0.0994   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:28,228-Speed 2651.45 samples/sec   Loss 39.8240   LearningRate 0.0994   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:32,169-Speed 2598.79 samples/sec   Loss 39.8180   LearningRate 0.0994   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:00:36,140-Speed 2578.82 samples/sec   Loss 39.8768   LearningRate 0.0994   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:00:40,027-Speed 2635.04 samples/sec   Loss 39.7742   LearningRate 0.0994   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:00:44,017-Speed 2567.75 samples/sec   Loss 39.7691   LearningRate 0.0994   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:00:47,862-Speed 2663.45 samples/sec   Loss 39.7529   LearningRate 0.0994   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:00:51,708-Speed 2663.37 samples/sec   Loss 39.7866   LearningRate 0.0994   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:00:55,566-Speed 2654.61 samples/sec   Loss 39.7313   LearningRate 0.0994   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:00:59,417-Speed 2659.26 samples/sec   Loss 39.7998   LearningRate 0.0994   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:01:03,279-Speed 2652.41 samples/sec   Loss 39.7164   LearningRate 0.0994   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:01:07,131-Speed 2658.75 samples/sec   Loss 39.6912   LearningRate 0.0994   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:01:10,981-Speed 2660.29 samples/sec   Loss 39.7123   LearningRate 0.0994   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:01:14,798-Speed 2683.82 samples/sec   Loss 39.7934   LearningRate 0.0994   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:01:18,650-Speed 2659.18 samples/sec   Loss 39.7308   LearningRate 0.0994   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:01:22,497-Speed 2662.28 samples/sec   Loss 39.6438   LearningRate 0.0994   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:01:26,343-Speed 2663.31 samples/sec   Loss 39.7132   LearningRate 0.0993   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:01:30,691-Speed 2356.47 samples/sec   Loss 39.6244   LearningRate 0.0993   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:01:34,601-Speed 2619.36 samples/sec   Loss 39.6483   LearningRate 0.0993   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:01:38,459-Speed 2654.56 samples/sec   Loss 39.6867   LearningRate 0.0993   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:01:42,311-Speed 2659.76 samples/sec   Loss 39.5841   LearningRate 0.0993   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:01:46,159-Speed 2661.83 samples/sec   Loss 39.6679   LearningRate 0.0993   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:01:50,014-Speed 2656.78 samples/sec   Loss 39.5977   LearningRate 0.0993   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:01:53,880-Speed 2649.34 samples/sec   Loss 39.5822   LearningRate 0.0993   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:01:57,729-Speed 2661.24 samples/sec   Loss 39.6654   LearningRate 0.0993   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:01,650-Speed 2611.86 samples/sec   Loss 39.4988   LearningRate 0.0993   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:06,211-Speed 2245.87 samples/sec   Loss 39.5830   LearningRate 0.0993   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:10,063-Speed 2658.93 samples/sec   Loss 39.4532   LearningRate 0.0993   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:13,917-Speed 2657.57 samples/sec   Loss 39.5505   LearningRate 0.0993   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:02:17,874-Speed 2588.72 samples/sec   Loss 39.5254   LearningRate 0.0993   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:21,784-Speed 2619.65 samples/sec   Loss 39.5191   LearningRate 0.0993   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:25,627-Speed 2664.75 samples/sec   Loss 39.4855   LearningRate 0.0993   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:29,931-Speed 2380.19 samples/sec   Loss 39.4702   LearningRate 0.0993   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:33,796-Speed 2649.61 samples/sec   Loss 39.4657   LearningRate 0.0993   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:38,062-Speed 2400.72 samples/sec   Loss 39.4489   LearningRate 0.0993   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:41,911-Speed 2661.35 samples/sec   Loss 39.4566   LearningRate 0.0993   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:45,756-Speed 2663.90 samples/sec   Loss 39.4355   LearningRate 0.0993   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:49,624-Speed 2648.38 samples/sec   Loss 39.3976   LearningRate 0.0993   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:53,497-Speed 2644.47 samples/sec   Loss 39.3934   LearningRate 0.0993   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:02:57,343-Speed 2663.67 samples/sec   Loss 39.4137   LearningRate 0.0993   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:01,193-Speed 2659.80 samples/sec   Loss 39.3886   LearningRate 0.0993   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:05,099-Speed 2622.61 samples/sec   Loss 39.3333   LearningRate 0.0993   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:08,995-Speed 2628.46 samples/sec   Loss 39.2035   LearningRate 0.0993   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:12,856-Speed 2653.29 samples/sec   Loss 39.2837   LearningRate 0.0993   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:16,725-Speed 2647.38 samples/sec   Loss 39.3784   LearningRate 0.0993   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:20,610-Speed 2636.69 samples/sec   Loss 39.3632   LearningRate 0.0993   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:24,467-Speed 2655.53 samples/sec   Loss 39.3235   LearningRate 0.0993   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:28,323-Speed 2657.14 samples/sec   Loss 39.1619   LearningRate 0.0993   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:32,174-Speed 2658.86 samples/sec   Loss 39.3767   LearningRate 0.0993   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:36,029-Speed 2656.97 samples/sec   Loss 39.2234   LearningRate 0.0993   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:03:39,882-Speed 2658.38 samples/sec   Loss 39.2403   LearningRate 0.0993   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:03:43,721-Speed 2668.74 samples/sec   Loss 39.1903   LearningRate 0.0993   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:47,574-Speed 2657.89 samples/sec   Loss 39.2098   LearningRate 0.0993   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:51,437-Speed 2651.51 samples/sec   Loss 39.1761   LearningRate 0.0993   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:03:55,289-Speed 2658.62 samples/sec   Loss 39.1466   LearningRate 0.0993   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:03:59,144-Speed 2657.19 samples/sec   Loss 39.1346   LearningRate 0.0993   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:03,004-Speed 2653.38 samples/sec   Loss 39.1762   LearningRate 0.0993   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:06,854-Speed 2660.45 samples/sec   Loss 39.1302   LearningRate 0.0992   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:10,732-Speed 2641.02 samples/sec   Loss 39.1963   LearningRate 0.0992   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:14,586-Speed 2657.74 samples/sec   Loss 39.0633   LearningRate 0.0992   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:18,442-Speed 2656.33 samples/sec   Loss 39.1371   LearningRate 0.0992   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:22,310-Speed 2648.08 samples/sec   Loss 39.1250   LearningRate 0.0992   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:26,340-Speed 2541.95 samples/sec   Loss 38.9594   LearningRate 0.0992   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:30,194-Speed 2657.04 samples/sec   Loss 39.0327   LearningRate 0.0992   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:04:34,129-Speed 2603.37 samples/sec   Loss 39.1091   LearningRate 0.0992   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:04:37,986-Speed 2655.87 samples/sec   Loss 38.9936   LearningRate 0.0992   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:04:41,841-Speed 2656.60 samples/sec   Loss 38.9710   LearningRate 0.0992   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:04:45,733-Speed 2631.76 samples/sec   Loss 38.9302   LearningRate 0.0992   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:04:49,775-Speed 2534.31 samples/sec   Loss 39.0973   LearningRate 0.0992   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:04:53,704-Speed 2607.40 samples/sec   Loss 39.0165   LearningRate 0.0992   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:04:57,557-Speed 2658.29 samples/sec   Loss 39.0115   LearningRate 0.0992   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:01,447-Speed 2633.42 samples/sec   Loss 39.0031   LearningRate 0.0992   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:05,467-Speed 2548.05 samples/sec   Loss 38.9980   LearningRate 0.0992   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:09,507-Speed 2535.46 samples/sec   Loss 38.8823   LearningRate 0.0992   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:13,522-Speed 2550.68 samples/sec   Loss 38.8606   LearningRate 0.0992   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:17,410-Speed 2634.71 samples/sec   Loss 38.9217   LearningRate 0.0992   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:21,262-Speed 2658.77 samples/sec   Loss 38.8542   LearningRate 0.0992   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:25,167-Speed 2623.21 samples/sec   Loss 38.7892   LearningRate 0.0992   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:29,031-Speed 2650.59 samples/sec   Loss 38.8637   LearningRate 0.0992   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:32,918-Speed 2635.39 samples/sec   Loss 38.8750   LearningRate 0.0992   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:36,783-Speed 2649.88 samples/sec   Loss 38.7469   LearningRate 0.0992   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:40,681-Speed 2627.60 samples/sec   Loss 38.7871   LearningRate 0.0992   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:44,541-Speed 2653.45 samples/sec   Loss 38.6208   LearningRate 0.0992   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:48,416-Speed 2643.59 samples/sec   Loss 38.7532   LearningRate 0.0992   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:05:52,262-Speed 2662.76 samples/sec   Loss 38.6912   LearningRate 0.0992   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:05:56,140-Speed 2641.60 samples/sec   Loss 38.7821   LearningRate 0.0992   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:05:59,993-Speed 2658.14 samples/sec   Loss 38.6954   LearningRate 0.0992   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:06:03,849-Speed 2656.94 samples/sec   Loss 38.7402   LearningRate 0.0992   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:07,714-Speed 2649.46 samples/sec   Loss 38.7240   LearningRate 0.0992   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:11,582-Speed 2647.70 samples/sec   Loss 38.6733   LearningRate 0.0992   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:15,433-Speed 2659.60 samples/sec   Loss 38.6158   LearningRate 0.0992   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:19,291-Speed 2654.91 samples/sec   Loss 38.5968   LearningRate 0.0992   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:23,154-Speed 2651.40 samples/sec   Loss 38.6091   LearningRate 0.0992   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:27,027-Speed 2644.53 samples/sec   Loss 38.6587   LearningRate 0.0992   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:30,881-Speed 2658.19 samples/sec   Loss 38.5811   LearningRate 0.0992   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:34,734-Speed 2658.26 samples/sec   Loss 38.7139   LearningRate 0.0992   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:38,600-Speed 2649.52 samples/sec   Loss 38.4925   LearningRate 0.0992   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:06:42,463-Speed 2651.14 samples/sec   Loss 38.5850   LearningRate 0.0992   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:06:46,341-Speed 2640.93 samples/sec   Loss 38.5757   LearningRate 0.0992   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:06:50,222-Speed 2639.58 samples/sec   Loss 38.5402   LearningRate 0.0991   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:06:54,084-Speed 2651.94 samples/sec   Loss 38.5598   LearningRate 0.0991   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:06:57,949-Speed 2650.23 samples/sec   Loss 38.5989   LearningRate 0.0991   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:01,826-Speed 2642.10 samples/sec   Loss 38.5215   LearningRate 0.0991   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:05,690-Speed 2650.22 samples/sec   Loss 38.4559   LearningRate 0.0991   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:09,584-Speed 2630.38 samples/sec   Loss 38.4935   LearningRate 0.0991   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:13,462-Speed 2641.16 samples/sec   Loss 38.5077   LearningRate 0.0991   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:17,345-Speed 2637.89 samples/sec   Loss 38.4518   LearningRate 0.0991   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:21,213-Speed 2647.97 samples/sec   Loss 38.4694   LearningRate 0.0991   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:25,099-Speed 2635.68 samples/sec   Loss 38.3846   LearningRate 0.0991   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:28,988-Speed 2633.55 samples/sec   Loss 38.3976   LearningRate 0.0991   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:32,877-Speed 2633.72 samples/sec   Loss 38.4048   LearningRate 0.0991   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:07:36,753-Speed 2642.75 samples/sec   Loss 38.3564   LearningRate 0.0991   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:07:40,625-Speed 2645.27 samples/sec   Loss 38.2873   LearningRate 0.0991   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:07:44,522-Speed 2628.28 samples/sec   Loss 38.3042   LearningRate 0.0991   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:07:48,558-Speed 2537.31 samples/sec   Loss 38.3016   LearningRate 0.0991   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:07:52,474-Speed 2615.40 samples/sec   Loss 38.3053   LearningRate 0.0991   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:07:56,362-Speed 2634.51 samples/sec   Loss 38.3925   LearningRate 0.0991   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:08:00,248-Speed 2635.85 samples/sec   Loss 38.1772   LearningRate 0.0991   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:08:04,125-Speed 2641.81 samples/sec   Loss 38.0769   LearningRate 0.0991   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:07,995-Speed 2646.69 samples/sec   Loss 38.1563   LearningRate 0.0991   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:11,870-Speed 2642.82 samples/sec   Loss 38.4046   LearningRate 0.0991   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:15,736-Speed 2649.26 samples/sec   Loss 38.1527   LearningRate 0.0991   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:19,609-Speed 2644.66 samples/sec   Loss 38.2230   LearningRate 0.0991   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:23,490-Speed 2639.61 samples/sec   Loss 38.1597   LearningRate 0.0991   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:27,385-Speed 2629.22 samples/sec   Loss 38.1225   LearningRate 0.0991   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:31,289-Speed 2623.33 samples/sec   Loss 38.1904   LearningRate 0.0991   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:35,174-Speed 2636.40 samples/sec   Loss 38.0662   LearningRate 0.0991   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:39,057-Speed 2637.45 samples/sec   Loss 38.1134   LearningRate 0.0991   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:08:42,955-Speed 2628.18 samples/sec   Loss 38.1362   LearningRate 0.0991   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:08:46,839-Speed 2637.44 samples/sec   Loss 38.0315   LearningRate 0.0991   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:08:50,713-Speed 2643.78 samples/sec   Loss 37.9904   LearningRate 0.0991   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:08:54,598-Speed 2636.24 samples/sec   Loss 38.0017   LearningRate 0.0991   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:08:58,479-Speed 2639.57 samples/sec   Loss 37.9783   LearningRate 0.0991   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:02,353-Speed 2643.23 samples/sec   Loss 38.0460   LearningRate 0.0991   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:06,231-Speed 2640.96 samples/sec   Loss 38.0099   LearningRate 0.0991   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:10,129-Speed 2627.99 samples/sec   Loss 38.0569   LearningRate 0.0991   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:14,024-Speed 2629.17 samples/sec   Loss 38.0828   LearningRate 0.0991   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:17,912-Speed 2634.62 samples/sec   Loss 37.8833   LearningRate 0.0991   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:21,791-Speed 2640.97 samples/sec   Loss 37.8297   LearningRate 0.0991   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:25,696-Speed 2622.35 samples/sec   Loss 37.9622   LearningRate 0.0991   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:29,599-Speed 2624.91 samples/sec   Loss 37.8013   LearningRate 0.0990   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:33,500-Speed 2625.09 samples/sec   Loss 37.9154   LearningRate 0.0990   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:37,411-Speed 2618.57 samples/sec   Loss 37.8259   LearningRate 0.0990   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:41,293-Speed 2638.71 samples/sec   Loss 37.8161   LearningRate 0.0990   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:45,181-Speed 2634.48 samples/sec   Loss 37.7841   LearningRate 0.0990   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:49,083-Speed 2624.26 samples/sec   Loss 37.8513   LearningRate 0.0990   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:52,978-Speed 2630.00 samples/sec   Loss 37.8044   LearningRate 0.0990   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:09:56,868-Speed 2632.71 samples/sec   Loss 37.7173   LearningRate 0.0990   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:00,785-Speed 2615.13 samples/sec   Loss 37.7698   LearningRate 0.0990   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:10:04,845-Speed 2523.20 samples/sec   Loss 37.8085   LearningRate 0.0990   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:10:08,841-Speed 2562.36 samples/sec   Loss 37.8604   LearningRate 0.0990   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:10:12,751-Speed 2619.61 samples/sec   Loss 37.7562   LearningRate 0.0990   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:10:16,634-Speed 2637.72 samples/sec   Loss 37.6119   LearningRate 0.0990   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:10:20,538-Speed 2623.40 samples/sec   Loss 37.6041   LearningRate 0.0990   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:10:24,437-Speed 2626.81 samples/sec   Loss 37.6087   LearningRate 0.0990   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:28,334-Speed 2628.91 samples/sec   Loss 37.6244   LearningRate 0.0990   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:32,226-Speed 2631.49 samples/sec   Loss 37.6001   LearningRate 0.0990   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:36,117-Speed 2632.39 samples/sec   Loss 37.6001   LearningRate 0.0990   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:40,044-Speed 2608.18 samples/sec   Loss 37.5621   LearningRate 0.0990   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:43,953-Speed 2619.91 samples/sec   Loss 37.6180   LearningRate 0.0990   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:47,853-Speed 2626.20 samples/sec   Loss 37.6152   LearningRate 0.0990   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:51,755-Speed 2624.90 samples/sec   Loss 37.5811   LearningRate 0.0990   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:55,783-Speed 2543.41 samples/sec   Loss 37.6891   LearningRate 0.0990   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:10:59,689-Speed 2622.01 samples/sec   Loss 37.4972   LearningRate 0.0990   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:03,583-Speed 2630.65 samples/sec   Loss 37.4964   LearningRate 0.0990   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:11:07,527-Speed 2596.67 samples/sec   Loss 37.5289   LearningRate 0.0990   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:11:11,548-Speed 2547.33 samples/sec   Loss 37.4582   LearningRate 0.0990   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:11:15,604-Speed 2525.06 samples/sec   Loss 37.4443   LearningRate 0.0990   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:11:19,597-Speed 2564.85 samples/sec   Loss 37.4963   LearningRate 0.0990   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:11:23,497-Speed 2626.22 samples/sec   Loss 37.5297   LearningRate 0.0990   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:11:27,404-Speed 2622.11 samples/sec   Loss 37.3847   LearningRate 0.0990   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:11:31,306-Speed 2624.32 samples/sec   Loss 37.3837   LearningRate 0.0990   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:35,243-Speed 2602.34 samples/sec   Loss 37.4402   LearningRate 0.0990   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:39,150-Speed 2621.20 samples/sec   Loss 37.4219   LearningRate 0.0990   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:43,181-Speed 2540.99 samples/sec   Loss 37.2878   LearningRate 0.0990   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:47,230-Speed 2529.57 samples/sec   Loss 37.3528   LearningRate 0.0990   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:51,133-Speed 2624.40 samples/sec   Loss 37.3855   LearningRate 0.0990   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:55,034-Speed 2626.00 samples/sec   Loss 37.3495   LearningRate 0.0990   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:11:58,937-Speed 2624.23 samples/sec   Loss 37.2557   LearningRate 0.0990   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:02,844-Speed 2621.67 samples/sec   Loss 37.1253   LearningRate 0.0990   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:06,746-Speed 2625.23 samples/sec   Loss 37.2704   LearningRate 0.0990   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:10,635-Speed 2633.25 samples/sec   Loss 37.2503   LearningRate 0.0990   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:14,536-Speed 2626.04 samples/sec   Loss 37.1716   LearningRate 0.0989   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:18,483-Speed 2594.93 samples/sec   Loss 37.2437   LearningRate 0.0989   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:22,405-Speed 2611.59 samples/sec   Loss 37.2620   LearningRate 0.0989   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:26,387-Speed 2572.53 samples/sec   Loss 37.2134   LearningRate 0.0989   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:30,489-Speed 2497.17 samples/sec   Loss 37.1718   LearningRate 0.0989   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:34,486-Speed 2562.14 samples/sec   Loss 37.1826   LearningRate 0.0989   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:38,427-Speed 2598.72 samples/sec   Loss 37.1326   LearningRate 0.0989   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:42,335-Speed 2621.24 samples/sec   Loss 37.1341   LearningRate 0.0989   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:46,436-Speed 2497.60 samples/sec   Loss 36.9572   LearningRate 0.0989   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:50,539-Speed 2496.87 samples/sec   Loss 37.0367   LearningRate 0.0989   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:12:54,536-Speed 2562.60 samples/sec   Loss 37.0501   LearningRate 0.0989   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:12:58,442-Speed 2621.82 samples/sec   Loss 37.0338   LearningRate 0.0989   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:02,342-Speed 2626.37 samples/sec   Loss 37.0853   LearningRate 0.0989   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:06,251-Speed 2620.26 samples/sec   Loss 37.0500   LearningRate 0.0989   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:10,159-Speed 2621.48 samples/sec   Loss 36.9836   LearningRate 0.0989   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:14,059-Speed 2626.10 samples/sec   Loss 36.8891   LearningRate 0.0989   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:17,957-Speed 2627.56 samples/sec   Loss 36.9875   LearningRate 0.0989   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:21,877-Speed 2613.38 samples/sec   Loss 36.9237   LearningRate 0.0989   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:25,792-Speed 2615.94 samples/sec   Loss 36.9314   LearningRate 0.0989   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:29,712-Speed 2613.13 samples/sec   Loss 36.8525   LearningRate 0.0989   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:33,595-Speed 2638.24 samples/sec   Loss 36.8027   LearningRate 0.0989   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:37,497-Speed 2624.63 samples/sec   Loss 36.8146   LearningRate 0.0989   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:41,399-Speed 2624.82 samples/sec   Loss 36.9001   LearningRate 0.0989   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:45,300-Speed 2626.12 samples/sec   Loss 36.8108   LearningRate 0.0989   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:49,199-Speed 2626.57 samples/sec   Loss 36.7389   LearningRate 0.0989   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:53,119-Speed 2612.91 samples/sec   Loss 36.8937   LearningRate 0.0989   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:13:57,065-Speed 2596.25 samples/sec   Loss 36.6866   LearningRate 0.0989   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:00,976-Speed 2619.18 samples/sec   Loss 36.8824   LearningRate 0.0989   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:04,982-Speed 2556.22 samples/sec   Loss 36.6646   LearningRate 0.0989   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:08,889-Speed 2621.42 samples/sec   Loss 36.7035   LearningRate 0.0989   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:12,803-Speed 2616.62 samples/sec   Loss 36.6914   LearningRate 0.0989   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:14:16,710-Speed 2622.02 samples/sec   Loss 36.6477   LearningRate 0.0989   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:14:20,599-Speed 2634.62 samples/sec   Loss 36.7906   LearningRate 0.0989   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:24,563-Speed 2583.15 samples/sec   Loss 36.7113   LearningRate 0.0989   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:28,673-Speed 2493.30 samples/sec   Loss 36.7178   LearningRate 0.0989   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:32,628-Speed 2589.31 samples/sec   Loss 36.6456   LearningRate 0.0989   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:36,536-Speed 2620.96 samples/sec   Loss 36.5202   LearningRate 0.0989   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:40,448-Speed 2618.00 samples/sec   Loss 36.5608   LearningRate 0.0989   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:44,350-Speed 2625.15 samples/sec   Loss 36.5546   LearningRate 0.0989   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:48,248-Speed 2627.50 samples/sec   Loss 36.4731   LearningRate 0.0989   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:52,149-Speed 2626.10 samples/sec   Loss 36.4947   LearningRate 0.0989   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:56,049-Speed 2626.00 samples/sec   Loss 36.5854   LearningRate 0.0989   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:14:59,971-Speed 2612.20 samples/sec   Loss 36.4984   LearningRate 0.0988   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:15:03,859-Speed 2633.87 samples/sec   Loss 36.5313   LearningRate 0.0988   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:07,788-Speed 2606.82 samples/sec   Loss 36.4358   LearningRate 0.0988   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:11,706-Speed 2613.68 samples/sec   Loss 36.4918   LearningRate 0.0988   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:15,629-Speed 2611.60 samples/sec   Loss 36.3662   LearningRate 0.0988   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:19,566-Speed 2601.59 samples/sec   Loss 36.3526   LearningRate 0.0988   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:23,471-Speed 2622.82 samples/sec   Loss 36.4004   LearningRate 0.0988   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:27,393-Speed 2611.69 samples/sec   Loss 36.3576   LearningRate 0.0988   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:31,318-Speed 2609.86 samples/sec   Loss 36.3057   LearningRate 0.0988   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:35,218-Speed 2625.69 samples/sec   Loss 36.3566   LearningRate 0.0988   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:39,141-Speed 2611.06 samples/sec   Loss 36.3466   LearningRate 0.0988   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:43,082-Speed 2598.53 samples/sec   Loss 36.4149   LearningRate 0.0988   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:15:46,969-Speed 2635.61 samples/sec   Loss 36.2610   LearningRate 0.0988   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:50,867-Speed 2627.59 samples/sec   Loss 36.1962   LearningRate 0.0988   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:54,762-Speed 2629.88 samples/sec   Loss 36.1179   LearningRate 0.0988   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:15:58,664-Speed 2624.90 samples/sec   Loss 36.1487   LearningRate 0.0988   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:02,604-Speed 2599.55 samples/sec   Loss 36.2884   LearningRate 0.0988   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:06,511-Speed 2621.90 samples/sec   Loss 36.2084   LearningRate 0.0988   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:10,422-Speed 2618.48 samples/sec   Loss 36.1738   LearningRate 0.0988   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:14,322-Speed 2627.03 samples/sec   Loss 36.1480   LearningRate 0.0988   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:18,234-Speed 2617.94 samples/sec   Loss 36.2383   LearningRate 0.0988   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:22,143-Speed 2620.02 samples/sec   Loss 36.0457   LearningRate 0.0988   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:26,044-Speed 2626.01 samples/sec   Loss 36.0366   LearningRate 0.0988   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:16:29,946-Speed 2625.04 samples/sec   Loss 36.0548   LearningRate 0.0988   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:33,847-Speed 2626.04 samples/sec   Loss 36.1658   LearningRate 0.0988   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:37,748-Speed 2625.44 samples/sec   Loss 36.0212   LearningRate 0.0988   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:41,656-Speed 2621.33 samples/sec   Loss 36.0153   LearningRate 0.0988   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:45,560-Speed 2623.03 samples/sec   Loss 35.9839   LearningRate 0.0988   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:49,469-Speed 2620.27 samples/sec   Loss 36.0517   LearningRate 0.0988   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:53,384-Speed 2615.96 samples/sec   Loss 35.9245   LearningRate 0.0988   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:16:57,285-Speed 2625.59 samples/sec   Loss 35.9891   LearningRate 0.0988   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:01,192-Speed 2621.64 samples/sec   Loss 35.9411   LearningRate 0.0988   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:05,098-Speed 2622.53 samples/sec   Loss 35.8126   LearningRate 0.0988   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:09,014-Speed 2615.56 samples/sec   Loss 35.7562   LearningRate 0.0988   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:17:12,916-Speed 2624.82 samples/sec   Loss 35.8749   LearningRate 0.0988   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:17:16,818-Speed 2624.95 samples/sec   Loss 35.7547   LearningRate 0.0988   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:17:20,892-Speed 2513.59 samples/sec   Loss 35.8858   LearningRate 0.0988   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:17:24,967-Speed 2513.85 samples/sec   Loss 35.7020   LearningRate 0.0988   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:28,992-Speed 2544.77 samples/sec   Loss 35.7964   LearningRate 0.0988   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:32,901-Speed 2620.35 samples/sec   Loss 35.8057   LearningRate 0.0988   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:36,817-Speed 2615.37 samples/sec   Loss 35.7299   LearningRate 0.0988   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:40,749-Speed 2604.83 samples/sec   Loss 35.7588   LearningRate 0.0988   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:44,654-Speed 2623.16 samples/sec   Loss 35.7959   LearningRate 0.0987   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:48,563-Speed 2619.76 samples/sec   Loss 35.7541   LearningRate 0.0987   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:52,470-Speed 2621.54 samples/sec   Loss 35.6726   LearningRate 0.0987   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:17:56,390-Speed 2612.65 samples/sec   Loss 35.6798   LearningRate 0.0987   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:00,289-Speed 2626.95 samples/sec   Loss 35.8385   LearningRate 0.0987   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:04,188-Speed 2627.20 samples/sec   Loss 35.6452   LearningRate 0.0987   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:18:08,096-Speed 2620.91 samples/sec   Loss 35.7032   LearningRate 0.0987   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:18:11,985-Speed 2633.90 samples/sec   Loss 35.6526   LearningRate 0.0987   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:15,987-Speed 2559.06 samples/sec   Loss 35.5451   LearningRate 0.0987   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:19,886-Speed 2626.71 samples/sec   Loss 35.5302   LearningRate 0.0987   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:23,806-Speed 2612.76 samples/sec   Loss 35.5257   LearningRate 0.0987   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:27,843-Speed 2537.24 samples/sec   Loss 35.4904   LearningRate 0.0987   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:31,822-Speed 2574.63 samples/sec   Loss 35.4634   LearningRate 0.0987   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:35,742-Speed 2613.11 samples/sec   Loss 35.4056   LearningRate 0.0987   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:39,640-Speed 2627.32 samples/sec   Loss 35.3940   LearningRate 0.0987   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:43,540-Speed 2626.20 samples/sec   Loss 35.4564   LearningRate 0.0987   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:47,466-Speed 2608.81 samples/sec   Loss 35.3916   LearningRate 0.0987   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:18:51,406-Speed 2599.56 samples/sec   Loss 35.4319   LearningRate 0.0987   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:18:55,317-Speed 2619.08 samples/sec   Loss 35.2466   LearningRate 0.0987   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:18:59,200-Speed 2637.69 samples/sec   Loss 35.4244   LearningRate 0.0987   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:03,102-Speed 2624.98 samples/sec   Loss 35.4258   LearningRate 0.0987   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:07,006-Speed 2624.07 samples/sec   Loss 35.4317   LearningRate 0.0987   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:10,906-Speed 2626.55 samples/sec   Loss 35.2272   LearningRate 0.0987   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:14,805-Speed 2626.73 samples/sec   Loss 35.3857   LearningRate 0.0987   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:18,715-Speed 2619.22 samples/sec   Loss 35.2411   LearningRate 0.0987   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:22,642-Speed 2608.48 samples/sec   Loss 35.2241   LearningRate 0.0987   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:26,552-Speed 2619.95 samples/sec   Loss 35.2367   LearningRate 0.0987   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:30,455-Speed 2624.37 samples/sec   Loss 35.3222   LearningRate 0.0987   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:34,361-Speed 2621.92 samples/sec   Loss 35.3202   LearningRate 0.0987   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:38,265-Speed 2623.56 samples/sec   Loss 35.1747   LearningRate 0.0987   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:19:42,178-Speed 2617.31 samples/sec   Loss 35.2025   LearningRate 0.0987   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:19:46,107-Speed 2607.10 samples/sec   Loss 35.0848   LearningRate 0.0987   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:19:50,007-Speed 2626.03 samples/sec   Loss 35.0078   LearningRate 0.0987   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:19:53,920-Speed 2618.55 samples/sec   Loss 35.0596   LearningRate 0.0987   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:19:57,821-Speed 2625.17 samples/sec   Loss 35.0623   LearningRate 0.0987   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:01,786-Speed 2583.85 samples/sec   Loss 35.0539   LearningRate 0.0987   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:05,728-Speed 2598.53 samples/sec   Loss 35.1178   LearningRate 0.0987   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:09,637-Speed 2619.54 samples/sec   Loss 34.9651   LearningRate 0.0987   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:13,539-Speed 2625.08 samples/sec   Loss 34.9361   LearningRate 0.0987   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:17,441-Speed 2625.02 samples/sec   Loss 34.8686   LearningRate 0.0987   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:21,348-Speed 2621.97 samples/sec   Loss 35.0181   LearningRate 0.0987   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:25,463-Speed 2488.89 samples/sec   Loss 34.8929   LearningRate 0.0986   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:29,518-Speed 2526.24 samples/sec   Loss 35.0111   LearningRate 0.0986   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:33,422-Speed 2624.05 samples/sec   Loss 35.0492   LearningRate 0.0986   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:20:37,450-Speed 2542.99 samples/sec   Loss 34.8013   LearningRate 0.0986   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:20:41,347-Speed 2628.03 samples/sec   Loss 34.9848   LearningRate 0.0986   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:20:45,246-Speed 2626.94 samples/sec   Loss 34.8505   LearningRate 0.0986   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:20:49,142-Speed 2628.83 samples/sec   Loss 34.9822   LearningRate 0.0986   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:53,045-Speed 2624.67 samples/sec   Loss 34.7693   LearningRate 0.0986   Epoch: 0   Global Step: 5690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:20:57,050-Speed 2557.70 samples/sec   Loss 34.8846   LearningRate 0.0986   Epoch: 0   Global Step: 5700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:00,950-Speed 2628.25 samples/sec   Loss 34.8628   LearningRate 0.0986   Epoch: 0   Global Step: 5710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:04,856-Speed 2622.13 samples/sec   Loss 34.7246   LearningRate 0.0986   Epoch: 0   Global Step: 5720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:08,772-Speed 2615.34 samples/sec   Loss 34.8403   LearningRate 0.0986   Epoch: 0   Global Step: 5730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:12,673-Speed 2625.43 samples/sec   Loss 34.5997   LearningRate 0.0986   Epoch: 0   Global Step: 5740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:16,577-Speed 2623.67 samples/sec   Loss 34.6872   LearningRate 0.0986   Epoch: 0   Global Step: 5750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:20,630-Speed 2527.47 samples/sec   Loss 34.5773   LearningRate 0.0986   Epoch: 0   Global Step: 5760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:24,676-Speed 2531.66 samples/sec   Loss 34.6448   LearningRate 0.0986   Epoch: 0   Global Step: 5770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:28,563-Speed 2635.00 samples/sec   Loss 34.7026   LearningRate 0.0986   Epoch: 0   Global Step: 5780   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:32,538-Speed 2576.75 samples/sec   Loss 34.4643   LearningRate 0.0986   Epoch: 0   Global Step: 5790   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:36,437-Speed 2627.53 samples/sec   Loss 34.7998   LearningRate 0.0986   Epoch: 0   Global Step: 5800   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:40,337-Speed 2625.92 samples/sec   Loss 34.5995   LearningRate 0.0986   Epoch: 0   Global Step: 5810   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:44,371-Speed 2538.67 samples/sec   Loss 34.5848   LearningRate 0.0986   Epoch: 0   Global Step: 5820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:48,364-Speed 2565.42 samples/sec   Loss 34.4339   LearningRate 0.0986   Epoch: 0   Global Step: 5830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:52,266-Speed 2624.94 samples/sec   Loss 34.5216   LearningRate 0.0986   Epoch: 0   Global Step: 5840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:21:56,187-Speed 2612.77 samples/sec   Loss 34.4249   LearningRate 0.0986   Epoch: 0   Global Step: 5850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:00,120-Speed 2604.08 samples/sec   Loss 34.4100   LearningRate 0.0986   Epoch: 0   Global Step: 5860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:04,036-Speed 2615.87 samples/sec   Loss 34.3533   LearningRate 0.0986   Epoch: 0   Global Step: 5870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:07,958-Speed 2611.17 samples/sec   Loss 34.5426   LearningRate 0.0986   Epoch: 0   Global Step: 5880   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:22:11,867-Speed 2620.47 samples/sec   Loss 34.3541   LearningRate 0.0986   Epoch: 0   Global Step: 5890   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:22:15,788-Speed 2612.66 samples/sec   Loss 34.5030   LearningRate 0.0986   Epoch: 0   Global Step: 5900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:19,694-Speed 2622.12 samples/sec   Loss 34.3219   LearningRate 0.0986   Epoch: 0   Global Step: 5910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:23,601-Speed 2621.35 samples/sec   Loss 34.4173   LearningRate 0.0986   Epoch: 0   Global Step: 5920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:27,501-Speed 2626.25 samples/sec   Loss 34.3737   LearningRate 0.0986   Epoch: 0   Global Step: 5930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:31,401-Speed 2627.05 samples/sec   Loss 34.3680   LearningRate 0.0986   Epoch: 0   Global Step: 5940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:35,306-Speed 2622.70 samples/sec   Loss 34.1544   LearningRate 0.0986   Epoch: 0   Global Step: 5950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:39,212-Speed 2622.28 samples/sec   Loss 34.2713   LearningRate 0.0986   Epoch: 0   Global Step: 5960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:43,122-Speed 2619.93 samples/sec   Loss 34.3173   LearningRate 0.0986   Epoch: 0   Global Step: 5970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:47,057-Speed 2602.52 samples/sec   Loss 34.3726   LearningRate 0.0986   Epoch: 0   Global Step: 5980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:22:51,032-Speed 2579.25 samples/sec   Loss 34.0460   LearningRate 0.0986   Epoch: 0   Global Step: 5990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:22:55,105-Speed 2514.66 samples/sec   Loss 34.1723   LearningRate 0.0986   Epoch: 0   Global Step: 6000   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:22:59,115-Speed 2553.79 samples/sec   Loss 34.1425   LearningRate 0.0986   Epoch: 0   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:03,068-Speed 2591.53 samples/sec   Loss 33.9713   LearningRate 0.0986   Epoch: 0   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:06,971-Speed 2624.52 samples/sec   Loss 34.1654   LearningRate 0.0986   Epoch: 0   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:10,881-Speed 2620.12 samples/sec   Loss 34.1128   LearningRate 0.0985   Epoch: 0   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:14,785-Speed 2623.55 samples/sec   Loss 34.1258   LearningRate 0.0985   Epoch: 0   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:18,708-Speed 2610.87 samples/sec   Loss 34.0147   LearningRate 0.0985   Epoch: 0   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:22,616-Speed 2620.94 samples/sec   Loss 34.0261   LearningRate 0.0985   Epoch: 0   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:26,530-Speed 2617.11 samples/sec   Loss 34.1214   LearningRate 0.0985   Epoch: 0   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:23:30,452-Speed 2611.59 samples/sec   Loss 34.0655   LearningRate 0.0985   Epoch: 0   Global Step: 6090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:23:34,354-Speed 2625.16 samples/sec   Loss 34.0430   LearningRate 0.0985   Epoch: 0   Global Step: 6100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:23:38,267-Speed 2617.18 samples/sec   Loss 33.9728   LearningRate 0.0985   Epoch: 0   Global Step: 6110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:23:42,202-Speed 2603.39 samples/sec   Loss 33.8394   LearningRate 0.0985   Epoch: 0   Global Step: 6120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:23:46,108-Speed 2622.32 samples/sec   Loss 33.9915   LearningRate 0.0985   Epoch: 0   Global Step: 6130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:23:50,016-Speed 2621.10 samples/sec   Loss 33.9473   LearningRate 0.0985   Epoch: 0   Global Step: 6140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:23:53,934-Speed 2614.13 samples/sec   Loss 33.8250   LearningRate 0.0985   Epoch: 0   Global Step: 6150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:23:57,839-Speed 2623.07 samples/sec   Loss 33.8547   LearningRate 0.0985   Epoch: 0   Global Step: 6160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:01,742-Speed 2624.20 samples/sec   Loss 33.7300   LearningRate 0.0985   Epoch: 0   Global Step: 6170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:05,645-Speed 2624.49 samples/sec   Loss 33.8557   LearningRate 0.0985   Epoch: 0   Global Step: 6180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:09,550-Speed 2622.47 samples/sec   Loss 33.7938   LearningRate 0.0985   Epoch: 0   Global Step: 6190   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:24:13,463-Speed 2617.85 samples/sec   Loss 33.8555   LearningRate 0.0985   Epoch: 0   Global Step: 6200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:17,382-Speed 2613.41 samples/sec   Loss 33.8633   LearningRate 0.0985   Epoch: 0   Global Step: 6210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:21,285-Speed 2624.88 samples/sec   Loss 33.6847   LearningRate 0.0985   Epoch: 0   Global Step: 6220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:25,198-Speed 2616.90 samples/sec   Loss 33.6890   LearningRate 0.0985   Epoch: 0   Global Step: 6230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:29,102-Speed 2624.00 samples/sec   Loss 33.5697   LearningRate 0.0985   Epoch: 0   Global Step: 6240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:33,004-Speed 2624.77 samples/sec   Loss 33.6544   LearningRate 0.0985   Epoch: 0   Global Step: 6250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:36,938-Speed 2603.73 samples/sec   Loss 33.7582   LearningRate 0.0985   Epoch: 0   Global Step: 6260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:40,841-Speed 2624.44 samples/sec   Loss 33.7559   LearningRate 0.0985   Epoch: 0   Global Step: 6270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:44,766-Speed 2609.38 samples/sec   Loss 33.6103   LearningRate 0.0985   Epoch: 0   Global Step: 6280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:48,685-Speed 2614.13 samples/sec   Loss 33.5045   LearningRate 0.0985   Epoch: 0   Global Step: 6290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:24:52,588-Speed 2624.63 samples/sec   Loss 33.5311   LearningRate 0.0985   Epoch: 0   Global Step: 6300   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:24:56,511-Speed 2611.14 samples/sec   Loss 33.6326   LearningRate 0.0985   Epoch: 0   Global Step: 6310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:00,414-Speed 2624.04 samples/sec   Loss 33.4163   LearningRate 0.0985   Epoch: 0   Global Step: 6320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:04,317-Speed 2623.95 samples/sec   Loss 33.4364   LearningRate 0.0985   Epoch: 0   Global Step: 6330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:08,217-Speed 2626.03 samples/sec   Loss 33.4079   LearningRate 0.0985   Epoch: 0   Global Step: 6340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:12,122-Speed 2623.45 samples/sec   Loss 33.4634   LearningRate 0.0985   Epoch: 0   Global Step: 6350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:16,024-Speed 2624.90 samples/sec   Loss 33.6069   LearningRate 0.0985   Epoch: 0   Global Step: 6360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:19,935-Speed 2618.80 samples/sec   Loss 33.2852   LearningRate 0.0985   Epoch: 0   Global Step: 6370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:23,865-Speed 2606.24 samples/sec   Loss 33.5009   LearningRate 0.0985   Epoch: 0   Global Step: 6380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:27,763-Speed 2628.02 samples/sec   Loss 33.3382   LearningRate 0.0985   Epoch: 0   Global Step: 6390   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:31,680-Speed 2615.35 samples/sec   Loss 33.3261   LearningRate 0.0985   Epoch: 0   Global Step: 6400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:35,564-Speed 2637.06 samples/sec   Loss 33.3406   LearningRate 0.0985   Epoch: 0   Global Step: 6410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:39,471-Speed 2621.36 samples/sec   Loss 33.2412   LearningRate 0.0985   Epoch: 0   Global Step: 6420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:43,386-Speed 2616.24 samples/sec   Loss 33.4366   LearningRate 0.0985   Epoch: 0   Global Step: 6430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:47,303-Speed 2615.01 samples/sec   Loss 33.2903   LearningRate 0.0985   Epoch: 0   Global Step: 6440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:51,204-Speed 2625.83 samples/sec   Loss 33.3708   LearningRate 0.0985   Epoch: 0   Global Step: 6450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:55,104-Speed 2626.09 samples/sec   Loss 33.1003   LearningRate 0.0984   Epoch: 0   Global Step: 6460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:25:59,009-Speed 2623.40 samples/sec   Loss 33.1680   LearningRate 0.0984   Epoch: 0   Global Step: 6470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:02,910-Speed 2625.47 samples/sec   Loss 33.1572   LearningRate 0.0984   Epoch: 0   Global Step: 6480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:06,807-Speed 2628.14 samples/sec   Loss 33.0730   LearningRate 0.0984   Epoch: 0   Global Step: 6490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:10,707-Speed 2625.83 samples/sec   Loss 33.1681   LearningRate 0.0984   Epoch: 0   Global Step: 6500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:14,586-Speed 2640.53 samples/sec   Loss 33.0487   LearningRate 0.0984   Epoch: 0   Global Step: 6510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:18,491-Speed 2622.92 samples/sec   Loss 33.1290   LearningRate 0.0984   Epoch: 0   Global Step: 6520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:22,396-Speed 2623.36 samples/sec   Loss 33.1780   LearningRate 0.0984   Epoch: 0   Global Step: 6530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:26,297-Speed 2625.51 samples/sec   Loss 33.0306   LearningRate 0.0984   Epoch: 0   Global Step: 6540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:30,201-Speed 2623.49 samples/sec   Loss 33.1446   LearningRate 0.0984   Epoch: 0   Global Step: 6550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:34,105-Speed 2623.54 samples/sec   Loss 32.9486   LearningRate 0.0984   Epoch: 0   Global Step: 6560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:38,028-Speed 2610.81 samples/sec   Loss 32.9195   LearningRate 0.0984   Epoch: 0   Global Step: 6570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:41,935-Speed 2621.10 samples/sec   Loss 32.9230   LearningRate 0.0984   Epoch: 0   Global Step: 6580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:45,848-Speed 2617.60 samples/sec   Loss 32.9465   LearningRate 0.0984   Epoch: 0   Global Step: 6590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:49,751-Speed 2624.25 samples/sec   Loss 32.8549   LearningRate 0.0984   Epoch: 0   Global Step: 6600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:26:53,734-Speed 2571.45 samples/sec   Loss 32.9468   LearningRate 0.0984   Epoch: 0   Global Step: 6610   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:26:57,805-Speed 2515.96 samples/sec   Loss 32.9403   LearningRate 0.0984   Epoch: 0   Global Step: 6620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:27:01,905-Speed 2497.86 samples/sec   Loss 32.8940   LearningRate 0.0984   Epoch: 0   Global Step: 6630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:27:05,834-Speed 2607.36 samples/sec   Loss 32.8663   LearningRate 0.0984   Epoch: 0   Global Step: 6640   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:27:09,718-Speed 2636.67 samples/sec   Loss 32.8623   LearningRate 0.0984   Epoch: 0   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:13,614-Speed 2629.42 samples/sec   Loss 32.6287   LearningRate 0.0984   Epoch: 0   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:17,517-Speed 2624.06 samples/sec   Loss 32.6822   LearningRate 0.0984   Epoch: 0   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:21,420-Speed 2624.17 samples/sec   Loss 32.8240   LearningRate 0.0984   Epoch: 0   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:25,331-Speed 2619.00 samples/sec   Loss 32.7200   LearningRate 0.0984   Epoch: 0   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:29,267-Speed 2602.62 samples/sec   Loss 32.6346   LearningRate 0.0984   Epoch: 0   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:33,173-Speed 2621.88 samples/sec   Loss 32.6765   LearningRate 0.0984   Epoch: 0   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:37,084-Speed 2619.00 samples/sec   Loss 32.7087   LearningRate 0.0984   Epoch: 0   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:41,001-Speed 2614.30 samples/sec   Loss 32.6512   LearningRate 0.0984   Epoch: 0   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:44,917-Speed 2615.87 samples/sec   Loss 32.5247   LearningRate 0.0984   Epoch: 0   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:27:48,827-Speed 2619.53 samples/sec   Loss 32.6963   LearningRate 0.0984   Epoch: 0   Global Step: 6750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:27:52,739-Speed 2618.82 samples/sec   Loss 32.5356   LearningRate 0.0984   Epoch: 0   Global Step: 6760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:27:56,652-Speed 2616.86 samples/sec   Loss 32.5219   LearningRate 0.0984   Epoch: 0   Global Step: 6770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:28:00,542-Speed 2633.06 samples/sec   Loss 32.5763   LearningRate 0.0984   Epoch: 0   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:04,450-Speed 2621.17 samples/sec   Loss 32.4855   LearningRate 0.0984   Epoch: 0   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:08,357-Speed 2620.85 samples/sec   Loss 32.5242   LearningRate 0.0984   Epoch: 0   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:12,267-Speed 2619.36 samples/sec   Loss 32.6748   LearningRate 0.0984   Epoch: 0   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:16,176-Speed 2620.70 samples/sec   Loss 32.4196   LearningRate 0.0984   Epoch: 0   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:20,090-Speed 2616.62 samples/sec   Loss 32.6146   LearningRate 0.0984   Epoch: 0   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:23,988-Speed 2627.55 samples/sec   Loss 32.3633   LearningRate 0.0984   Epoch: 0   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:27,908-Speed 2613.52 samples/sec   Loss 32.3764   LearningRate 0.0984   Epoch: 0   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:31,888-Speed 2573.39 samples/sec   Loss 32.3350   LearningRate 0.0984   Epoch: 0   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:35,790-Speed 2624.50 samples/sec   Loss 32.5603   LearningRate 0.0984   Epoch: 0   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:39,687-Speed 2628.05 samples/sec   Loss 32.3605   LearningRate 0.0983   Epoch: 0   Global Step: 6880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:28:43,588-Speed 2626.02 samples/sec   Loss 32.2889   LearningRate 0.0983   Epoch: 0   Global Step: 6890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:28:47,465-Speed 2641.56 samples/sec   Loss 32.3160   LearningRate 0.0983   Epoch: 0   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:51,419-Speed 2590.54 samples/sec   Loss 32.3068   LearningRate 0.0983   Epoch: 0   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:55,402-Speed 2571.66 samples/sec   Loss 32.2107   LearningRate 0.0983   Epoch: 0   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:28:59,300-Speed 2627.35 samples/sec   Loss 32.2756   LearningRate 0.0983   Epoch: 0   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:29:03,202-Speed 2624.60 samples/sec   Loss 32.1582   LearningRate 0.0983   Epoch: 0   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:29:07,104-Speed 2625.32 samples/sec   Loss 32.3757   LearningRate 0.0983   Epoch: 0   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:29:11,160-Speed 2524.91 samples/sec   Loss 32.1219   LearningRate 0.0983   Epoch: 0   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:29:15,258-Speed 2499.59 samples/sec   Loss 32.1021   LearningRate 0.0983   Epoch: 0   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:29:19,361-Speed 2496.11 samples/sec   Loss 32.0510   LearningRate 0.0983   Epoch: 0   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:29:23,456-Speed 2500.80 samples/sec   Loss 32.0516   LearningRate 0.0983   Epoch: 0   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:29:27,428-Speed 2578.78 samples/sec   Loss 32.2105   LearningRate 0.0983   Epoch: 0   Global Step: 7000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:31,331-Speed 2624.55 samples/sec   Loss 31.9394   LearningRate 0.0983   Epoch: 0   Global Step: 7010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:35,229-Speed 2627.78 samples/sec   Loss 32.1908   LearningRate 0.0983   Epoch: 0   Global Step: 7020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:39,124-Speed 2629.29 samples/sec   Loss 32.0051   LearningRate 0.0983   Epoch: 0   Global Step: 7030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:43,024-Speed 2626.64 samples/sec   Loss 32.0673   LearningRate 0.0983   Epoch: 0   Global Step: 7040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:46,923-Speed 2627.00 samples/sec   Loss 32.0858   LearningRate 0.0983   Epoch: 0   Global Step: 7050   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:50,819-Speed 2628.65 samples/sec   Loss 31.8388   LearningRate 0.0983   Epoch: 0   Global Step: 7060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:54,717-Speed 2627.59 samples/sec   Loss 31.9085   LearningRate 0.0983   Epoch: 0   Global Step: 7070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:29:58,627-Speed 2619.20 samples/sec   Loss 31.9363   LearningRate 0.0983   Epoch: 0   Global Step: 7080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:02,529-Speed 2625.16 samples/sec   Loss 31.9298   LearningRate 0.0983   Epoch: 0   Global Step: 7090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:06,433-Speed 2623.54 samples/sec   Loss 31.8660   LearningRate 0.0983   Epoch: 0   Global Step: 7100   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:30:10,325-Speed 2631.90 samples/sec   Loss 31.8789   LearningRate 0.0983   Epoch: 0   Global Step: 7110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:14,227-Speed 2624.65 samples/sec   Loss 31.6086   LearningRate 0.0983   Epoch: 0   Global Step: 7120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:18,238-Speed 2554.05 samples/sec   Loss 32.0320   LearningRate 0.0983   Epoch: 0   Global Step: 7130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:22,144-Speed 2621.48 samples/sec   Loss 31.8754   LearningRate 0.0983   Epoch: 0   Global Step: 7140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:26,045-Speed 2625.88 samples/sec   Loss 31.8391   LearningRate 0.0983   Epoch: 0   Global Step: 7150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:29,944-Speed 2627.59 samples/sec   Loss 31.8138   LearningRate 0.0983   Epoch: 0   Global Step: 7160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:33,852-Speed 2620.33 samples/sec   Loss 31.6954   LearningRate 0.0983   Epoch: 0   Global Step: 7170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:37,776-Speed 2610.02 samples/sec   Loss 31.6759   LearningRate 0.0983   Epoch: 0   Global Step: 7180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:41,679-Speed 2625.01 samples/sec   Loss 31.5674   LearningRate 0.0983   Epoch: 0   Global Step: 7190   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:45,647-Speed 2580.75 samples/sec   Loss 31.8369   LearningRate 0.0983   Epoch: 0   Global Step: 7200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:49,600-Speed 2591.37 samples/sec   Loss 31.8668   LearningRate 0.0983   Epoch: 0   Global Step: 7210   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:30:53,588-Speed 2568.18 samples/sec   Loss 31.5941   LearningRate 0.0983   Epoch: 0   Global Step: 7220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:30:57,490-Speed 2625.21 samples/sec   Loss 31.6417   LearningRate 0.0983   Epoch: 0   Global Step: 7230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:31:01,527-Speed 2536.88 samples/sec   Loss 31.4814   LearningRate 0.0983   Epoch: 0   Global Step: 7240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:31:05,566-Speed 2536.11 samples/sec   Loss 31.4879   LearningRate 0.0983   Epoch: 0   Global Step: 7250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:31:09,470-Speed 2623.33 samples/sec   Loss 31.4460   LearningRate 0.0983   Epoch: 0   Global Step: 7260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:31:13,371-Speed 2625.52 samples/sec   Loss 31.4880   LearningRate 0.0983   Epoch: 0   Global Step: 7270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:31:17,349-Speed 2575.54 samples/sec   Loss 31.4714   LearningRate 0.0983   Epoch: 0   Global Step: 7280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:31:21,452-Speed 2495.98 samples/sec   Loss 31.3622   LearningRate 0.0983   Epoch: 0   Global Step: 7290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:31:25,361-Speed 2620.54 samples/sec   Loss 31.4135   LearningRate 0.0982   Epoch: 0   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:29,260-Speed 2627.19 samples/sec   Loss 31.1969   LearningRate 0.0982   Epoch: 0   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:33,157-Speed 2628.02 samples/sec   Loss 31.3781   LearningRate 0.0982   Epoch: 0   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:37,117-Speed 2586.38 samples/sec   Loss 31.5173   LearningRate 0.0982   Epoch: 0   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:41,125-Speed 2555.57 samples/sec   Loss 31.3308   LearningRate 0.0982   Epoch: 0   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:45,088-Speed 2584.67 samples/sec   Loss 31.2794   LearningRate 0.0982   Epoch: 0   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:48,996-Speed 2621.07 samples/sec   Loss 31.2997   LearningRate 0.0982   Epoch: 0   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:52,902-Speed 2622.31 samples/sec   Loss 31.3638   LearningRate 0.0982   Epoch: 0   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:31:56,800-Speed 2628.06 samples/sec   Loss 31.1534   LearningRate 0.0982   Epoch: 0   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:32:00,698-Speed 2627.12 samples/sec   Loss 31.3053   LearningRate 0.0982   Epoch: 0   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:32:04,601-Speed 2624.34 samples/sec   Loss 31.1291   LearningRate 0.0982   Epoch: 0   Global Step: 7400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:08,506-Speed 2622.63 samples/sec   Loss 31.1307   LearningRate 0.0982   Epoch: 0   Global Step: 7410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:12,431-Speed 2609.82 samples/sec   Loss 31.1277   LearningRate 0.0982   Epoch: 0   Global Step: 7420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:16,339-Speed 2621.18 samples/sec   Loss 31.1308   LearningRate 0.0982   Epoch: 0   Global Step: 7430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:20,240-Speed 2625.90 samples/sec   Loss 31.1896   LearningRate 0.0982   Epoch: 0   Global Step: 7440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:24,146-Speed 2621.85 samples/sec   Loss 31.0132   LearningRate 0.0982   Epoch: 0   Global Step: 7450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:28,072-Speed 2609.54 samples/sec   Loss 30.8965   LearningRate 0.0982   Epoch: 0   Global Step: 7460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:31,976-Speed 2623.39 samples/sec   Loss 31.1256   LearningRate 0.0982   Epoch: 0   Global Step: 7470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:35,888-Speed 2618.19 samples/sec   Loss 31.2007   LearningRate 0.0982   Epoch: 0   Global Step: 7480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:39,788-Speed 2626.37 samples/sec   Loss 30.9814   LearningRate 0.0982   Epoch: 0   Global Step: 7490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:43,692-Speed 2623.88 samples/sec   Loss 30.8926   LearningRate 0.0982   Epoch: 0   Global Step: 7500   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:32:47,696-Speed 2558.26 samples/sec   Loss 30.9746   LearningRate 0.0982   Epoch: 0   Global Step: 7510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:51,805-Speed 2492.68 samples/sec   Loss 30.9087   LearningRate 0.0982   Epoch: 0   Global Step: 7520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:55,720-Speed 2616.49 samples/sec   Loss 30.9580   LearningRate 0.0982   Epoch: 0   Global Step: 7530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:32:59,623-Speed 2624.23 samples/sec   Loss 31.0432   LearningRate 0.0982   Epoch: 0   Global Step: 7540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:03,529-Speed 2621.94 samples/sec   Loss 30.8776   LearningRate 0.0982   Epoch: 0   Global Step: 7550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:07,430-Speed 2625.54 samples/sec   Loss 30.9190   LearningRate 0.0982   Epoch: 0   Global Step: 7560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:11,396-Speed 2583.30 samples/sec   Loss 31.0235   LearningRate 0.0982   Epoch: 0   Global Step: 7570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:15,504-Speed 2493.06 samples/sec   Loss 30.7353   LearningRate 0.0982   Epoch: 0   Global Step: 7580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:19,617-Speed 2490.48 samples/sec   Loss 30.9403   LearningRate 0.0982   Epoch: 0   Global Step: 7590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:23,686-Speed 2516.91 samples/sec   Loss 30.8062   LearningRate 0.0982   Epoch: 0   Global Step: 7600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:27,614-Speed 2607.67 samples/sec   Loss 30.6895   LearningRate 0.0982   Epoch: 0   Global Step: 7610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:31,536-Speed 2612.32 samples/sec   Loss 30.7420   LearningRate 0.0982   Epoch: 0   Global Step: 7620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:35,456-Speed 2612.68 samples/sec   Loss 30.7051   LearningRate 0.0982   Epoch: 0   Global Step: 7630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:39,358-Speed 2624.40 samples/sec   Loss 30.5601   LearningRate 0.0982   Epoch: 0   Global Step: 7640   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:43,260-Speed 2624.92 samples/sec   Loss 30.7237   LearningRate 0.0982   Epoch: 0   Global Step: 7650   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:47,199-Speed 2600.65 samples/sec   Loss 30.6493   LearningRate 0.0982   Epoch: 0   Global Step: 7660   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:51,116-Speed 2615.34 samples/sec   Loss 30.5775   LearningRate 0.0982   Epoch: 0   Global Step: 7670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:55,016-Speed 2626.23 samples/sec   Loss 30.6444   LearningRate 0.0982   Epoch: 0   Global Step: 7680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:33:58,916-Speed 2626.40 samples/sec   Loss 30.4838   LearningRate 0.0982   Epoch: 0   Global Step: 7690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:02,821-Speed 2622.23 samples/sec   Loss 30.6008   LearningRate 0.0982   Epoch: 0   Global Step: 7700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:06,745-Speed 2609.91 samples/sec   Loss 30.4543   LearningRate 0.0981   Epoch: 0   Global Step: 7710   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:34:10,830-Speed 2507.87 samples/sec   Loss 30.7298   LearningRate 0.0981   Epoch: 0   Global Step: 7720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:14,897-Speed 2518.15 samples/sec   Loss 30.4743   LearningRate 0.0981   Epoch: 0   Global Step: 7730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:19,010-Speed 2490.39 samples/sec   Loss 30.4403   LearningRate 0.0981   Epoch: 0   Global Step: 7740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:23,079-Speed 2517.56 samples/sec   Loss 30.4858   LearningRate 0.0981   Epoch: 0   Global Step: 7750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:27,005-Speed 2608.88 samples/sec   Loss 30.7023   LearningRate 0.0981   Epoch: 0   Global Step: 7760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:30,919-Speed 2617.44 samples/sec   Loss 30.5233   LearningRate 0.0981   Epoch: 0   Global Step: 7770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:34:34,799-Speed 2639.58 samples/sec   Loss 30.4448   LearningRate 0.0981   Epoch: 0   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:34:38,725-Speed 2609.42 samples/sec   Loss 30.5592   LearningRate 0.0981   Epoch: 0   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:34:42,622-Speed 2628.42 samples/sec   Loss 30.2927   LearningRate 0.0981   Epoch: 0   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:34:46,520-Speed 2627.69 samples/sec   Loss 30.3314   LearningRate 0.0981   Epoch: 0   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:34:50,415-Speed 2629.33 samples/sec   Loss 30.4409   LearningRate 0.0981   Epoch: 0   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:34:54,321-Speed 2622.80 samples/sec   Loss 30.4759   LearningRate 0.0981   Epoch: 0   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:34:58,219-Speed 2627.51 samples/sec   Loss 30.3092   LearningRate 0.0981   Epoch: 0   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:35:02,117-Speed 2627.03 samples/sec   Loss 30.3522   LearningRate 0.0981   Epoch: 0   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:35:06,024-Speed 2625.72 samples/sec   Loss 30.3515   LearningRate 0.0981   Epoch: 0   Global Step: 7860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:35:09,953-Speed 2606.94 samples/sec   Loss 30.2874   LearningRate 0.0981   Epoch: 0   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:35:13,857-Speed 2623.78 samples/sec   Loss 30.2703   LearningRate 0.0981   Epoch: 0   Global Step: 7880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:17,773-Speed 2616.25 samples/sec   Loss 30.1768   LearningRate 0.0981   Epoch: 0   Global Step: 7890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:21,679-Speed 2622.42 samples/sec   Loss 30.3648   LearningRate 0.0981   Epoch: 0   Global Step: 7900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:25,579-Speed 2626.48 samples/sec   Loss 30.1958   LearningRate 0.0981   Epoch: 0   Global Step: 7910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:29,481-Speed 2624.89 samples/sec   Loss 30.1501   LearningRate 0.0981   Epoch: 0   Global Step: 7920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:33,433-Speed 2591.48 samples/sec   Loss 30.1606   LearningRate 0.0981   Epoch: 0   Global Step: 7930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:37,339-Speed 2621.93 samples/sec   Loss 30.1301   LearningRate 0.0981   Epoch: 0   Global Step: 7940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:41,244-Speed 2623.47 samples/sec   Loss 30.1067   LearningRate 0.0981   Epoch: 0   Global Step: 7950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:45,146-Speed 2624.98 samples/sec   Loss 30.0879   LearningRate 0.0981   Epoch: 0   Global Step: 7960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:49,055-Speed 2620.15 samples/sec   Loss 30.0037   LearningRate 0.0981   Epoch: 0   Global Step: 7970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:35:52,961-Speed 2622.27 samples/sec   Loss 30.0166   LearningRate 0.0981   Epoch: 0   Global Step: 7980   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:35:57,009-Speed 2531.20 samples/sec   Loss 30.1903   LearningRate 0.0981   Epoch: 0   Global Step: 7990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:01,079-Speed 2516.54 samples/sec   Loss 30.1629   LearningRate 0.0981   Epoch: 0   Global Step: 8000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:05,158-Speed 2511.02 samples/sec   Loss 30.1590   LearningRate 0.0981   Epoch: 0   Global Step: 8010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:09,093-Speed 2603.32 samples/sec   Loss 29.9725   LearningRate 0.0981   Epoch: 0   Global Step: 8020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:12,993-Speed 2625.73 samples/sec   Loss 29.9926   LearningRate 0.0981   Epoch: 0   Global Step: 8030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:16,900-Speed 2621.55 samples/sec   Loss 29.7415   LearningRate 0.0981   Epoch: 0   Global Step: 8040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:20,801-Speed 2626.20 samples/sec   Loss 29.8282   LearningRate 0.0981   Epoch: 0   Global Step: 8050   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:24,699-Speed 2627.19 samples/sec   Loss 29.8789   LearningRate 0.0981   Epoch: 0   Global Step: 8060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:28,626-Speed 2608.38 samples/sec   Loss 29.7601   LearningRate 0.0981   Epoch: 0   Global Step: 8070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:32,525-Speed 2626.98 samples/sec   Loss 29.6902   LearningRate 0.0981   Epoch: 0   Global Step: 8080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:36,412-Speed 2635.22 samples/sec   Loss 29.8888   LearningRate 0.0981   Epoch: 0   Global Step: 8090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:40,333-Speed 2612.24 samples/sec   Loss 29.6038   LearningRate 0.0981   Epoch: 0   Global Step: 8100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:44,239-Speed 2622.56 samples/sec   Loss 29.6613   LearningRate 0.0981   Epoch: 0   Global Step: 8110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:48,155-Speed 2615.64 samples/sec   Loss 29.8589   LearningRate 0.0981   Epoch: 0   Global Step: 8120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:52,068-Speed 2617.57 samples/sec   Loss 29.7129   LearningRate 0.0980   Epoch: 0   Global Step: 8130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:55,980-Speed 2618.04 samples/sec   Loss 29.7486   LearningRate 0.0980   Epoch: 0   Global Step: 8140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:36:59,901-Speed 2612.08 samples/sec   Loss 29.5960   LearningRate 0.0980   Epoch: 0   Global Step: 8150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:03,798-Speed 2628.04 samples/sec   Loss 29.6836   LearningRate 0.0980   Epoch: 0   Global Step: 8160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:07,701-Speed 2624.75 samples/sec   Loss 29.6433   LearningRate 0.0980   Epoch: 0   Global Step: 8170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:11,601-Speed 2626.06 samples/sec   Loss 29.6254   LearningRate 0.0980   Epoch: 0   Global Step: 8180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:15,510-Speed 2620.27 samples/sec   Loss 29.4742   LearningRate 0.0980   Epoch: 0   Global Step: 8190   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:37:19,490-Speed 2573.45 samples/sec   Loss 29.5948   LearningRate 0.0980   Epoch: 0   Global Step: 8200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:23,594-Speed 2495.46 samples/sec   Loss 29.5731   LearningRate 0.0980   Epoch: 0   Global Step: 8210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:27,491-Speed 2628.90 samples/sec   Loss 29.4678   LearningRate 0.0980   Epoch: 0   Global Step: 8220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:31,390-Speed 2626.69 samples/sec   Loss 29.4725   LearningRate 0.0980   Epoch: 0   Global Step: 8230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:35,324-Speed 2603.87 samples/sec   Loss 29.3436   LearningRate 0.0980   Epoch: 0   Global Step: 8240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:39,225-Speed 2625.72 samples/sec   Loss 29.5214   LearningRate 0.0980   Epoch: 0   Global Step: 8250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:43,122-Speed 2628.10 samples/sec   Loss 29.2837   LearningRate 0.0980   Epoch: 0   Global Step: 8260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:47,025-Speed 2623.99 samples/sec   Loss 29.4372   LearningRate 0.0980   Epoch: 0   Global Step: 8270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:50,934-Speed 2621.04 samples/sec   Loss 29.4086   LearningRate 0.0980   Epoch: 0   Global Step: 8280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:54,848-Speed 2616.38 samples/sec   Loss 29.3765   LearningRate 0.0980   Epoch: 0   Global Step: 8290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:37:58,829-Speed 2574.04 samples/sec   Loss 29.3001   LearningRate 0.0980   Epoch: 0   Global Step: 8300   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:38:02,722-Speed 2630.99 samples/sec   Loss 29.3908   LearningRate 0.0980   Epoch: 0   Global Step: 8310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:38:06,631-Speed 2619.82 samples/sec   Loss 29.3652   LearningRate 0.0980   Epoch: 0   Global Step: 8320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:38:10,538-Speed 2620.98 samples/sec   Loss 29.4736   LearningRate 0.0980   Epoch: 0   Global Step: 8330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:38:14,618-Speed 2510.74 samples/sec   Loss 29.4221   LearningRate 0.0980   Epoch: 0   Global Step: 8340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:38:18,587-Speed 2580.51 samples/sec   Loss 29.2012   LearningRate 0.0980   Epoch: 0   Global Step: 8350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:38:22,486-Speed 2627.50 samples/sec   Loss 29.0432   LearningRate 0.0980   Epoch: 0   Global Step: 8360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:38:26,407-Speed 2612.04 samples/sec   Loss 29.1645   LearningRate 0.0980   Epoch: 0   Global Step: 8370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:38:30,293-Speed 2635.93 samples/sec   Loss 29.0904   LearningRate 0.0980   Epoch: 0   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:38:34,191-Speed 2627.53 samples/sec   Loss 29.1964   LearningRate 0.0980   Epoch: 0   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:38:38,098-Speed 2621.74 samples/sec   Loss 29.2414   LearningRate 0.0980   Epoch: 0   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:38:42,004-Speed 2621.85 samples/sec   Loss 29.2019   LearningRate 0.0980   Epoch: 0   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:38:45,895-Speed 2633.14 samples/sec   Loss 29.0513   LearningRate 0.0980   Epoch: 0   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:38:49,799-Speed 2623.38 samples/sec   Loss 29.0234   LearningRate 0.0980   Epoch: 0   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:38:53,709-Speed 2619.56 samples/sec   Loss 29.2007   LearningRate 0.0980   Epoch: 0   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:38:57,616-Speed 2621.61 samples/sec   Loss 29.0230   LearningRate 0.0980   Epoch: 0   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:39:01,516-Speed 2626.54 samples/sec   Loss 28.9956   LearningRate 0.0980   Epoch: 0   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:39:05,418-Speed 2624.36 samples/sec   Loss 28.9609   LearningRate 0.0980   Epoch: 0   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:39:09,341-Speed 2611.19 samples/sec   Loss 28.8818   LearningRate 0.0980   Epoch: 0   Global Step: 8480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:13,271-Speed 2606.08 samples/sec   Loss 28.7655   LearningRate 0.0980   Epoch: 0   Global Step: 8490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:17,171-Speed 2626.16 samples/sec   Loss 28.9609   LearningRate 0.0980   Epoch: 0   Global Step: 8500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:21,233-Speed 2521.67 samples/sec   Loss 28.7315   LearningRate 0.0980   Epoch: 0   Global Step: 8510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:25,332-Speed 2498.95 samples/sec   Loss 28.7060   LearningRate 0.0980   Epoch: 0   Global Step: 8520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:29,439-Speed 2493.73 samples/sec   Loss 28.7558   LearningRate 0.0980   Epoch: 0   Global Step: 8530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:33,554-Speed 2488.91 samples/sec   Loss 28.6553   LearningRate 0.0980   Epoch: 0   Global Step: 8540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:37,583-Speed 2542.45 samples/sec   Loss 28.8282   LearningRate 0.0979   Epoch: 0   Global Step: 8550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:41,488-Speed 2622.43 samples/sec   Loss 28.8563   LearningRate 0.0979   Epoch: 0   Global Step: 8560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:39:45,371-Speed 2638.22 samples/sec   Loss 28.7507   LearningRate 0.0979   Epoch: 0   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:39:49,279-Speed 2620.69 samples/sec   Loss 28.8108   LearningRate 0.0979   Epoch: 0   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:39:53,201-Speed 2612.13 samples/sec   Loss 28.7837   LearningRate 0.0979   Epoch: 0   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:39:57,103-Speed 2624.92 samples/sec   Loss 28.7941   LearningRate 0.0979   Epoch: 0   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:40:01,025-Speed 2612.05 samples/sec   Loss 28.7554   LearningRate 0.0979   Epoch: 0   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:40:05,007-Speed 2572.01 samples/sec   Loss 28.7215   LearningRate 0.0979   Epoch: 0   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:40:08,915-Speed 2620.54 samples/sec   Loss 28.7366   LearningRate 0.0979   Epoch: 0   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:40:12,822-Speed 2621.72 samples/sec   Loss 28.5774   LearningRate 0.0979   Epoch: 0   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:40:16,725-Speed 2624.79 samples/sec   Loss 28.6153   LearningRate 0.0979   Epoch: 0   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:40:20,645-Speed 2612.10 samples/sec   Loss 28.5789   LearningRate 0.0979   Epoch: 0   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:40:24,546-Speed 2625.80 samples/sec   Loss 28.6659   LearningRate 0.0979   Epoch: 0   Global Step: 8670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:28,451-Speed 2623.55 samples/sec   Loss 28.6232   LearningRate 0.0979   Epoch: 0   Global Step: 8680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:32,352-Speed 2625.40 samples/sec   Loss 28.6456   LearningRate 0.0979   Epoch: 0   Global Step: 8690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:36,251-Speed 2626.37 samples/sec   Loss 28.6229   LearningRate 0.0979   Epoch: 0   Global Step: 8700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:40,320-Speed 2516.94 samples/sec   Loss 28.4283   LearningRate 0.0979   Epoch: 0   Global Step: 8710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:44,429-Speed 2492.83 samples/sec   Loss 28.4668   LearningRate 0.0979   Epoch: 0   Global Step: 8720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:48,366-Speed 2601.76 samples/sec   Loss 28.4218   LearningRate 0.0979   Epoch: 0   Global Step: 8730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:52,263-Speed 2628.56 samples/sec   Loss 28.4679   LearningRate 0.0979   Epoch: 0   Global Step: 8740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:40:56,176-Speed 2617.83 samples/sec   Loss 28.3855   LearningRate 0.0979   Epoch: 0   Global Step: 8750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:00,082-Speed 2622.00 samples/sec   Loss 28.4518   LearningRate 0.0979   Epoch: 0   Global Step: 8760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:03,970-Speed 2634.32 samples/sec   Loss 28.3854   LearningRate 0.0979   Epoch: 0   Global Step: 8770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:07,872-Speed 2624.98 samples/sec   Loss 28.4194   LearningRate 0.0979   Epoch: 0   Global Step: 8780   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:11,787-Speed 2616.49 samples/sec   Loss 28.4204   LearningRate 0.0979   Epoch: 0   Global Step: 8790   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:15,711-Speed 2610.13 samples/sec   Loss 28.2436   LearningRate 0.0979   Epoch: 0   Global Step: 8800   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:19,610-Speed 2626.94 samples/sec   Loss 28.5098   LearningRate 0.0979   Epoch: 0   Global Step: 8810   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:23,544-Speed 2603.67 samples/sec   Loss 28.2190   LearningRate 0.0979   Epoch: 0   Global Step: 8820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:27,445-Speed 2626.16 samples/sec   Loss 28.3208   LearningRate 0.0979   Epoch: 0   Global Step: 8830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:31,346-Speed 2625.18 samples/sec   Loss 28.1830   LearningRate 0.0979   Epoch: 0   Global Step: 8840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:35,244-Speed 2627.75 samples/sec   Loss 28.2027   LearningRate 0.0979   Epoch: 0   Global Step: 8850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:39,150-Speed 2621.96 samples/sec   Loss 28.1714   LearningRate 0.0979   Epoch: 0   Global Step: 8860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:43,054-Speed 2624.02 samples/sec   Loss 28.2329   LearningRate 0.0979   Epoch: 0   Global Step: 8870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:46,956-Speed 2625.13 samples/sec   Loss 28.1270   LearningRate 0.0979   Epoch: 0   Global Step: 8880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:50,873-Speed 2614.89 samples/sec   Loss 28.2216   LearningRate 0.0979   Epoch: 0   Global Step: 8890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:54,790-Speed 2614.75 samples/sec   Loss 28.1825   LearningRate 0.0979   Epoch: 0   Global Step: 8900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:41:58,693-Speed 2624.76 samples/sec   Loss 28.0849   LearningRate 0.0979   Epoch: 0   Global Step: 8910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:02,711-Speed 2548.99 samples/sec   Loss 28.2432   LearningRate 0.0979   Epoch: 0   Global Step: 8920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:06,821-Speed 2492.02 samples/sec   Loss 28.1227   LearningRate 0.0979   Epoch: 0   Global Step: 8930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:10,802-Speed 2573.31 samples/sec   Loss 28.0397   LearningRate 0.0979   Epoch: 0   Global Step: 8940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:14,707-Speed 2622.64 samples/sec   Loss 28.2350   LearningRate 0.0979   Epoch: 0   Global Step: 8950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:18,606-Speed 2627.52 samples/sec   Loss 28.0076   LearningRate 0.0979   Epoch: 0   Global Step: 8960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:22,486-Speed 2639.05 samples/sec   Loss 28.0850   LearningRate 0.0978   Epoch: 0   Global Step: 8970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:26,491-Speed 2558.19 samples/sec   Loss 27.9504   LearningRate 0.0978   Epoch: 0   Global Step: 8980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:30,399-Speed 2620.91 samples/sec   Loss 28.0034   LearningRate 0.0978   Epoch: 0   Global Step: 8990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:34,300-Speed 2625.26 samples/sec   Loss 27.8420   LearningRate 0.0978   Epoch: 0   Global Step: 9000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:38,212-Speed 2618.57 samples/sec   Loss 27.8597   LearningRate 0.0978   Epoch: 0   Global Step: 9010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:42,109-Speed 2628.11 samples/sec   Loss 27.8935   LearningRate 0.0978   Epoch: 0   Global Step: 9020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:46,022-Speed 2617.98 samples/sec   Loss 27.7350   LearningRate 0.0978   Epoch: 0   Global Step: 9030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:49,939-Speed 2614.82 samples/sec   Loss 27.9296   LearningRate 0.0978   Epoch: 0   Global Step: 9040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:53,851-Speed 2617.62 samples/sec   Loss 27.8657   LearningRate 0.0978   Epoch: 0   Global Step: 9050   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:42:57,753-Speed 2625.35 samples/sec   Loss 27.8021   LearningRate 0.0978   Epoch: 0   Global Step: 9060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:01,639-Speed 2636.02 samples/sec   Loss 27.9774   LearningRate 0.0978   Epoch: 0   Global Step: 9070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:05,541-Speed 2624.64 samples/sec   Loss 27.8316   LearningRate 0.0978   Epoch: 0   Global Step: 9080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:09,581-Speed 2535.06 samples/sec   Loss 27.8182   LearningRate 0.0978   Epoch: 0   Global Step: 9090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:13,648-Speed 2519.01 samples/sec   Loss 27.7763   LearningRate 0.0978   Epoch: 0   Global Step: 9100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:17,548-Speed 2626.37 samples/sec   Loss 27.7766   LearningRate 0.0978   Epoch: 0   Global Step: 9110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:21,446-Speed 2627.64 samples/sec   Loss 27.4138   LearningRate 0.0978   Epoch: 0   Global Step: 9120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:25,356-Speed 2619.29 samples/sec   Loss 27.4933   LearningRate 0.0978   Epoch: 0   Global Step: 9130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:29,254-Speed 2627.95 samples/sec   Loss 27.5888   LearningRate 0.0978   Epoch: 0   Global Step: 9140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:33,154-Speed 2626.18 samples/sec   Loss 27.7330   LearningRate 0.0978   Epoch: 0   Global Step: 9150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:37,059-Speed 2623.01 samples/sec   Loss 27.6962   LearningRate 0.0978   Epoch: 0   Global Step: 9160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:40,939-Speed 2639.77 samples/sec   Loss 27.4825   LearningRate 0.0978   Epoch: 0   Global Step: 9170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:44,836-Speed 2628.62 samples/sec   Loss 27.3066   LearningRate 0.0978   Epoch: 0   Global Step: 9180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:48,978-Speed 2472.53 samples/sec   Loss 27.7119   LearningRate 0.0978   Epoch: 0   Global Step: 9190   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:52,879-Speed 2626.05 samples/sec   Loss 27.5522   LearningRate 0.0978   Epoch: 0   Global Step: 9200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:43:56,809-Speed 2606.55 samples/sec   Loss 27.5434   LearningRate 0.0978   Epoch: 0   Global Step: 9210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:00,709-Speed 2626.56 samples/sec   Loss 27.7230   LearningRate 0.0978   Epoch: 0   Global Step: 9220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:04,609-Speed 2626.18 samples/sec   Loss 27.2470   LearningRate 0.0978   Epoch: 0   Global Step: 9230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:08,516-Speed 2621.60 samples/sec   Loss 27.3241   LearningRate 0.0978   Epoch: 0   Global Step: 9240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:12,414-Speed 2627.60 samples/sec   Loss 27.5802   LearningRate 0.0978   Epoch: 0   Global Step: 9250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:16,511-Speed 2500.27 samples/sec   Loss 27.4420   LearningRate 0.0978   Epoch: 0   Global Step: 9260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:20,457-Speed 2595.02 samples/sec   Loss 27.4551   LearningRate 0.0978   Epoch: 0   Global Step: 9270   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 20:44:24,349-Speed 2632.66 samples/sec   Loss 27.5119   LearningRate 0.0978   Epoch: 0   Global Step: 9280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:28,245-Speed 2628.61 samples/sec   Loss 27.5700   LearningRate 0.0978   Epoch: 0   Global Step: 9290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:32,149-Speed 2624.12 samples/sec   Loss 27.4008   LearningRate 0.0978   Epoch: 0   Global Step: 9300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:36,048-Speed 2626.45 samples/sec   Loss 27.3973   LearningRate 0.0978   Epoch: 0   Global Step: 9310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:40,013-Speed 2583.47 samples/sec   Loss 27.5546   LearningRate 0.0978   Epoch: 0   Global Step: 9320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:43,958-Speed 2596.11 samples/sec   Loss 27.1830   LearningRate 0.0978   Epoch: 0   Global Step: 9330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:47,877-Speed 2613.80 samples/sec   Loss 27.4072   LearningRate 0.0978   Epoch: 0   Global Step: 9340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:51,775-Speed 2627.01 samples/sec   Loss 27.3724   LearningRate 0.0978   Epoch: 0   Global Step: 9350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:55,676-Speed 2626.37 samples/sec   Loss 27.2869   LearningRate 0.0978   Epoch: 0   Global Step: 9360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:44:59,557-Speed 2639.17 samples/sec   Loss 27.3143   LearningRate 0.0978   Epoch: 0   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:03,527-Speed 2580.17 samples/sec   Loss 27.2176   LearningRate 0.0978   Epoch: 0   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:07,421-Speed 2630.22 samples/sec   Loss 27.3812   LearningRate 0.0977   Epoch: 0   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:11,451-Speed 2541.08 samples/sec   Loss 27.1825   LearningRate 0.0977   Epoch: 0   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:15,565-Speed 2489.60 samples/sec   Loss 27.1027   LearningRate 0.0977   Epoch: 0   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:19,462-Speed 2628.85 samples/sec   Loss 27.3380   LearningRate 0.0977   Epoch: 0   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:23,381-Speed 2614.08 samples/sec   Loss 27.1832   LearningRate 0.0977   Epoch: 0   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:27,285-Speed 2623.27 samples/sec   Loss 27.4080   LearningRate 0.0977   Epoch: 0   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:31,242-Speed 2589.49 samples/sec   Loss 27.1572   LearningRate 0.0977   Epoch: 0   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:35,146-Speed 2623.32 samples/sec   Loss 26.9004   LearningRate 0.0977   Epoch: 0   Global Step: 9460   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:45:39,064-Speed 2614.21 samples/sec   Loss 26.9150   LearningRate 0.0977   Epoch: 0   Global Step: 9470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:45:42,966-Speed 2625.14 samples/sec   Loss 27.0863   LearningRate 0.0977   Epoch: 0   Global Step: 9480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:45:46,864-Speed 2627.48 samples/sec   Loss 26.8527   LearningRate 0.0977   Epoch: 0   Global Step: 9490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:45:50,795-Speed 2605.47 samples/sec   Loss 27.0035   LearningRate 0.0977   Epoch: 0   Global Step: 9500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:45:54,697-Speed 2625.25 samples/sec   Loss 26.8741   LearningRate 0.0977   Epoch: 0   Global Step: 9510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:45:58,609-Speed 2618.26 samples/sec   Loss 26.9509   LearningRate 0.0977   Epoch: 0   Global Step: 9520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:02,515-Speed 2623.19 samples/sec   Loss 26.9166   LearningRate 0.0977   Epoch: 0   Global Step: 9530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:06,474-Speed 2587.05 samples/sec   Loss 27.0350   LearningRate 0.0977   Epoch: 0   Global Step: 9540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:10,403-Speed 2606.67 samples/sec   Loss 27.0351   LearningRate 0.0977   Epoch: 0   Global Step: 9550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:14,334-Speed 2606.03 samples/sec   Loss 26.9478   LearningRate 0.0977   Epoch: 0   Global Step: 9560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:18,313-Speed 2573.90 samples/sec   Loss 27.0402   LearningRate 0.0977   Epoch: 0   Global Step: 9570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:22,222-Speed 2620.26 samples/sec   Loss 26.9473   LearningRate 0.0977   Epoch: 0   Global Step: 9580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:26,132-Speed 2619.42 samples/sec   Loss 26.8317   LearningRate 0.0977   Epoch: 0   Global Step: 9590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:30,050-Speed 2614.61 samples/sec   Loss 26.7432   LearningRate 0.0977   Epoch: 0   Global Step: 9600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:33,946-Speed 2629.01 samples/sec   Loss 26.9352   LearningRate 0.0977   Epoch: 0   Global Step: 9610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:38,014-Speed 2517.59 samples/sec   Loss 26.9532   LearningRate 0.0977   Epoch: 0   Global Step: 9620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:42,041-Speed 2543.77 samples/sec   Loss 26.7934   LearningRate 0.0977   Epoch: 0   Global Step: 9630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:45,952-Speed 2619.08 samples/sec   Loss 26.7332   LearningRate 0.0977   Epoch: 0   Global Step: 9640   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:49,853-Speed 2625.16 samples/sec   Loss 26.6772   LearningRate 0.0977   Epoch: 0   Global Step: 9650   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:53,766-Speed 2618.47 samples/sec   Loss 26.4110   LearningRate 0.0977   Epoch: 0   Global Step: 9660   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:46:57,686-Speed 2612.34 samples/sec   Loss 27.0927   LearningRate 0.0977   Epoch: 0   Global Step: 9670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:47:01,594-Speed 2621.02 samples/sec   Loss 26.6755   LearningRate 0.0977   Epoch: 0   Global Step: 9680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:47:05,494-Speed 2626.50 samples/sec   Loss 26.9161   LearningRate 0.0977   Epoch: 0   Global Step: 9690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:47:09,392-Speed 2627.40 samples/sec   Loss 26.8106   LearningRate 0.0977   Epoch: 0   Global Step: 9700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:47:13,321-Speed 2607.54 samples/sec   Loss 26.6780   LearningRate 0.0977   Epoch: 0   Global Step: 9710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:47:17,222-Speed 2625.21 samples/sec   Loss 26.7642   LearningRate 0.0977   Epoch: 0   Global Step: 9720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:47:21,151-Speed 2607.64 samples/sec   Loss 26.5720   LearningRate 0.0977   Epoch: 0   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:47:25,045-Speed 2630.21 samples/sec   Loss 26.7784   LearningRate 0.0977   Epoch: 0   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:47:28,946-Speed 2625.72 samples/sec   Loss 26.5854   LearningRate 0.0977   Epoch: 0   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:47:32,857-Speed 2619.01 samples/sec   Loss 26.6253   LearningRate 0.0977   Epoch: 0   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:47:36,763-Speed 2621.67 samples/sec   Loss 26.5464   LearningRate 0.0977   Epoch: 0   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:47:40,669-Speed 2622.12 samples/sec   Loss 26.5488   LearningRate 0.0977   Epoch: 0   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:47:44,589-Speed 2613.70 samples/sec   Loss 26.7020   LearningRate 0.0977   Epoch: 0   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:47:48,484-Speed 2629.34 samples/sec   Loss 26.6182   LearningRate 0.0977   Epoch: 0   Global Step: 9800   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:47:52,393-Speed 2620.27 samples/sec   Loss 26.6087   LearningRate 0.0976   Epoch: 0   Global Step: 9810   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:47:56,311-Speed 2614.27 samples/sec   Loss 26.4546   LearningRate 0.0976   Epoch: 0   Global Step: 9820   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:00,222-Speed 2619.34 samples/sec   Loss 26.5255   LearningRate 0.0976   Epoch: 0   Global Step: 9830   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:04,122-Speed 2626.12 samples/sec   Loss 26.2516   LearningRate 0.0976   Epoch: 0   Global Step: 9840   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:08,061-Speed 2599.98 samples/sec   Loss 26.4670   LearningRate 0.0976   Epoch: 0   Global Step: 9850   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:11,964-Speed 2624.60 samples/sec   Loss 26.4238   LearningRate 0.0976   Epoch: 0   Global Step: 9860   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:15,864-Speed 2626.50 samples/sec   Loss 26.3077   LearningRate 0.0976   Epoch: 0   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:19,771-Speed 2621.33 samples/sec   Loss 26.4509   LearningRate 0.0976   Epoch: 0   Global Step: 9880   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:23,669-Speed 2628.03 samples/sec   Loss 26.3677   LearningRate 0.0976   Epoch: 0   Global Step: 9890   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 20:48:27,571-Speed 2625.04 samples/sec   Loss 26.3974   LearningRate 0.0976   Epoch: 0   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:31,482-Speed 2619.32 samples/sec   Loss 26.2149   LearningRate 0.0976   Epoch: 0   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:35,381-Speed 2626.92 samples/sec   Loss 26.2522   LearningRate 0.0976   Epoch: 0   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:39,291-Speed 2619.03 samples/sec   Loss 26.3349   LearningRate 0.0976   Epoch: 0   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:43,195-Speed 2623.75 samples/sec   Loss 26.0603   LearningRate 0.0976   Epoch: 0   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:47,163-Speed 2581.75 samples/sec   Loss 26.1304   LearningRate 0.0976   Epoch: 0   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:51,220-Speed 2524.70 samples/sec   Loss 26.3008   LearningRate 0.0976   Epoch: 0   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:55,318-Speed 2499.64 samples/sec   Loss 26.1454   LearningRate 0.0976   Epoch: 0   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:48:59,253-Speed 2604.85 samples/sec   Loss 26.1659   LearningRate 0.0976   Epoch: 0   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:49:03,157-Speed 2623.28 samples/sec   Loss 26.1125   LearningRate 0.0976   Epoch: 0   Global Step: 9990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 20:49:07,066-Speed 2620.23 samples/sec   Loss 26.0881   LearningRate 0.0976   Epoch: 0   Global Step: 10000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 20:49:49,878-[lfw][10000]XNorm: 24.128440
Training: 2022-04-12 20:49:49,879-[lfw][10000]Accuracy-Flip: 0.98383+-0.00548
Training: 2022-04-12 20:49:49,880-[lfw][10000]Accuracy-Highest: 0.98383
Training: 2022-04-12 20:50:39,805-[cfp_fp][10000]XNorm: 21.212615
Training: 2022-04-12 20:50:39,806-[cfp_fp][10000]Accuracy-Flip: 0.90471+-0.01300
Training: 2022-04-12 20:50:39,807-[cfp_fp][10000]Accuracy-Highest: 0.90471
Training: 2022-04-12 20:51:22,999-[agedb_30][10000]XNorm: 23.680319
Training: 2022-04-12 20:51:23,000-[agedb_30][10000]Accuracy-Flip: 0.87767+-0.02199
Training: 2022-04-12 20:51:23,001-[agedb_30][10000]Accuracy-Highest: 0.87767
Training: 2022-04-12 20:51:26,904-Speed 73.23 samples/sec   Loss 26.3416   LearningRate 0.0976   Epoch: 0   Global Step: 10010   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:30,775-Speed 2645.46 samples/sec   Loss 26.0953   LearningRate 0.0976   Epoch: 0   Global Step: 10020   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:34,688-Speed 2618.33 samples/sec   Loss 26.2818   LearningRate 0.0976   Epoch: 0   Global Step: 10030   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:38,563-Speed 2642.87 samples/sec   Loss 26.0735   LearningRate 0.0976   Epoch: 0   Global Step: 10040   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:42,444-Speed 2638.94 samples/sec   Loss 26.0839   LearningRate 0.0976   Epoch: 0   Global Step: 10050   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:46,339-Speed 2630.36 samples/sec   Loss 25.9140   LearningRate 0.0976   Epoch: 0   Global Step: 10060   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:50,233-Speed 2630.60 samples/sec   Loss 26.2683   LearningRate 0.0976   Epoch: 0   Global Step: 10070   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:54,131-Speed 2627.88 samples/sec   Loss 26.0169   LearningRate 0.0976   Epoch: 0   Global Step: 10080   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:51:58,043-Speed 2618.52 samples/sec   Loss 25.8959   LearningRate 0.0976   Epoch: 0   Global Step: 10090   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:52:01,918-Speed 2643.15 samples/sec   Loss 26.0521   LearningRate 0.0976   Epoch: 0   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:05,825-Speed 2621.14 samples/sec   Loss 25.9822   LearningRate 0.0976   Epoch: 0   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:09,717-Speed 2632.27 samples/sec   Loss 25.8842   LearningRate 0.0976   Epoch: 0   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:13,630-Speed 2617.83 samples/sec   Loss 25.8582   LearningRate 0.0976   Epoch: 0   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:17,531-Speed 2625.73 samples/sec   Loss 25.7265   LearningRate 0.0976   Epoch: 0   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:21,435-Speed 2624.24 samples/sec   Loss 25.7682   LearningRate 0.0976   Epoch: 0   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:25,338-Speed 2623.82 samples/sec   Loss 25.9368   LearningRate 0.0976   Epoch: 0   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:29,238-Speed 2626.65 samples/sec   Loss 25.7861   LearningRate 0.0976   Epoch: 0   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:33,140-Speed 2624.75 samples/sec   Loss 25.9549   LearningRate 0.0976   Epoch: 0   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:37,099-Speed 2586.99 samples/sec   Loss 25.8198   LearningRate 0.0976   Epoch: 0   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:52:41,125-Speed 2544.28 samples/sec   Loss 25.7778   LearningRate 0.0976   Epoch: 0   Global Step: 10200   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:52:45,037-Speed 2618.01 samples/sec   Loss 26.0180   LearningRate 0.0976   Epoch: 0   Global Step: 10210   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:52:48,938-Speed 2626.01 samples/sec   Loss 25.7231   LearningRate 0.0976   Epoch: 0   Global Step: 10220   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:52:52,929-Speed 2566.31 samples/sec   Loss 25.9952   LearningRate 0.0975   Epoch: 0   Global Step: 10230   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:52:56,917-Speed 2568.36 samples/sec   Loss 25.6923   LearningRate 0.0975   Epoch: 0   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:00,821-Speed 2624.03 samples/sec   Loss 25.8228   LearningRate 0.0975   Epoch: 0   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:04,905-Speed 2507.84 samples/sec   Loss 25.7174   LearningRate 0.0975   Epoch: 0   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:09,011-Speed 2494.09 samples/sec   Loss 25.5992   LearningRate 0.0975   Epoch: 0   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:13,027-Speed 2550.87 samples/sec   Loss 25.4028   LearningRate 0.0975   Epoch: 0   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:16,926-Speed 2626.79 samples/sec   Loss 25.5055   LearningRate 0.0975   Epoch: 0   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:20,830-Speed 2624.05 samples/sec   Loss 25.7221   LearningRate 0.0975   Epoch: 0   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:24,753-Speed 2610.98 samples/sec   Loss 25.4007   LearningRate 0.0975   Epoch: 0   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:28,649-Speed 2629.10 samples/sec   Loss 25.6807   LearningRate 0.0975   Epoch: 0   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:32,544-Speed 2629.63 samples/sec   Loss 25.8160   LearningRate 0.0975   Epoch: 0   Global Step: 10330   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:53:36,437-Speed 2630.91 samples/sec   Loss 25.7485   LearningRate 0.0975   Epoch: 0   Global Step: 10340   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:53:40,334-Speed 2628.21 samples/sec   Loss 25.5572   LearningRate 0.0975   Epoch: 0   Global Step: 10350   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:53:44,258-Speed 2612.81 samples/sec   Loss 25.5285   LearningRate 0.0975   Epoch: 0   Global Step: 10360   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:53:48,155-Speed 2628.61 samples/sec   Loss 25.5055   LearningRate 0.0975   Epoch: 0   Global Step: 10370   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:53:52,062-Speed 2622.03 samples/sec   Loss 25.4029   LearningRate 0.0975   Epoch: 0   Global Step: 10380   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:53:55,965-Speed 2624.27 samples/sec   Loss 25.5362   LearningRate 0.0975   Epoch: 0   Global Step: 10390   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:53:59,897-Speed 2605.09 samples/sec   Loss 25.5295   LearningRate 0.0975   Epoch: 0   Global Step: 10400   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:03,811-Speed 2617.25 samples/sec   Loss 25.5614   LearningRate 0.0975   Epoch: 0   Global Step: 10410   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:07,713-Speed 2624.61 samples/sec   Loss 25.3439   LearningRate 0.0975   Epoch: 0   Global Step: 10420   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:11,607-Speed 2630.55 samples/sec   Loss 25.4497   LearningRate 0.0975   Epoch: 0   Global Step: 10430   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:15,495-Speed 2634.40 samples/sec   Loss 25.4713   LearningRate 0.0975   Epoch: 0   Global Step: 10440   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:19,386-Speed 2632.68 samples/sec   Loss 25.3453   LearningRate 0.0975   Epoch: 0   Global Step: 10450   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:23,284-Speed 2627.55 samples/sec   Loss 25.5343   LearningRate 0.0975   Epoch: 0   Global Step: 10460   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:27,190-Speed 2622.65 samples/sec   Loss 25.1323   LearningRate 0.0975   Epoch: 0   Global Step: 10470   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:31,125-Speed 2602.81 samples/sec   Loss 25.3013   LearningRate 0.0975   Epoch: 0   Global Step: 10480   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:35,041-Speed 2615.58 samples/sec   Loss 25.2745   LearningRate 0.0975   Epoch: 0   Global Step: 10490   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:38,954-Speed 2617.30 samples/sec   Loss 25.2008   LearningRate 0.0975   Epoch: 0   Global Step: 10500   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:54:42,839-Speed 2636.15 samples/sec   Loss 25.0343   LearningRate 0.0975   Epoch: 0   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:54:46,732-Speed 2631.08 samples/sec   Loss 25.1451   LearningRate 0.0975   Epoch: 0   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:54:50,639-Speed 2622.34 samples/sec   Loss 25.2910   LearningRate 0.0975   Epoch: 0   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:54:54,536-Speed 2627.77 samples/sec   Loss 25.3129   LearningRate 0.0975   Epoch: 0   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:54:58,446-Speed 2619.87 samples/sec   Loss 25.1349   LearningRate 0.0975   Epoch: 0   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:02,340-Speed 2630.16 samples/sec   Loss 25.1342   LearningRate 0.0975   Epoch: 0   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:06,242-Speed 2625.40 samples/sec   Loss 25.1804   LearningRate 0.0975   Epoch: 0   Global Step: 10570   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:10,153-Speed 2618.88 samples/sec   Loss 25.3840   LearningRate 0.0975   Epoch: 0   Global Step: 10580   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:14,073-Speed 2612.99 samples/sec   Loss 25.2804   LearningRate 0.0975   Epoch: 0   Global Step: 10590   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:17,971-Speed 2627.84 samples/sec   Loss 25.3338   LearningRate 0.0975   Epoch: 0   Global Step: 10600   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:21,871-Speed 2626.48 samples/sec   Loss 25.1724   LearningRate 0.0975   Epoch: 0   Global Step: 10610   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:55:25,791-Speed 2612.54 samples/sec   Loss 25.0033   LearningRate 0.0975   Epoch: 0   Global Step: 10620   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:55:29,697-Speed 2623.07 samples/sec   Loss 25.0583   LearningRate 0.0975   Epoch: 0   Global Step: 10630   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:55:33,600-Speed 2624.21 samples/sec   Loss 25.0010   LearningRate 0.0975   Epoch: 0   Global Step: 10640   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:55:37,496-Speed 2628.57 samples/sec   Loss 25.2046   LearningRate 0.0974   Epoch: 0   Global Step: 10650   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:55:41,387-Speed 2632.44 samples/sec   Loss 25.0156   LearningRate 0.0974   Epoch: 0   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:45,273-Speed 2635.23 samples/sec   Loss 24.9292   LearningRate 0.0974   Epoch: 0   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:49,172-Speed 2627.36 samples/sec   Loss 24.9341   LearningRate 0.0974   Epoch: 0   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:53,072-Speed 2626.26 samples/sec   Loss 24.9616   LearningRate 0.0974   Epoch: 0   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:55:56,970-Speed 2627.47 samples/sec   Loss 24.8965   LearningRate 0.0974   Epoch: 0   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:00,870-Speed 2626.26 samples/sec   Loss 24.8104   LearningRate 0.0974   Epoch: 0   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:04,770-Speed 2626.36 samples/sec   Loss 25.0927   LearningRate 0.0974   Epoch: 0   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:08,673-Speed 2624.14 samples/sec   Loss 24.7839   LearningRate 0.0974   Epoch: 0   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:12,603-Speed 2605.98 samples/sec   Loss 24.8693   LearningRate 0.0974   Epoch: 0   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:16,523-Speed 2612.93 samples/sec   Loss 24.7595   LearningRate 0.0974   Epoch: 0   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:20,440-Speed 2615.13 samples/sec   Loss 24.9841   LearningRate 0.0974   Epoch: 0   Global Step: 10760   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:56:24,345-Speed 2623.21 samples/sec   Loss 24.8591   LearningRate 0.0974   Epoch: 0   Global Step: 10770   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:56:28,242-Speed 2628.47 samples/sec   Loss 24.7366   LearningRate 0.0974   Epoch: 0   Global Step: 10780   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:56:32,120-Speed 2640.99 samples/sec   Loss 24.6284   LearningRate 0.0974   Epoch: 0   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:36,142-Speed 2546.67 samples/sec   Loss 24.9517   LearningRate 0.0974   Epoch: 0   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:40,245-Speed 2496.62 samples/sec   Loss 24.9556   LearningRate 0.0974   Epoch: 0   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:44,303-Speed 2524.01 samples/sec   Loss 24.7013   LearningRate 0.0974   Epoch: 0   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:48,219-Speed 2615.33 samples/sec   Loss 24.7794   LearningRate 0.0974   Epoch: 0   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:52,110-Speed 2633.39 samples/sec   Loss 24.7543   LearningRate 0.0974   Epoch: 0   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:56,005-Speed 2629.56 samples/sec   Loss 24.7747   LearningRate 0.0974   Epoch: 0   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:56:59,904-Speed 2627.28 samples/sec   Loss 24.7925   LearningRate 0.0974   Epoch: 0   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:57:03,805-Speed 2625.06 samples/sec   Loss 24.5589   LearningRate 0.0974   Epoch: 0   Global Step: 10870   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:57:07,848-Speed 2533.47 samples/sec   Loss 24.7518   LearningRate 0.0974   Epoch: 0   Global Step: 10880   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 20:57:11,953-Speed 2495.12 samples/sec   Loss 24.7633   LearningRate 0.0974   Epoch: 0   Global Step: 10890   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:15,867-Speed 2617.15 samples/sec   Loss 24.7814   LearningRate 0.0974   Epoch: 0   Global Step: 10900   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:19,770-Speed 2623.86 samples/sec   Loss 24.7277   LearningRate 0.0974   Epoch: 0   Global Step: 10910   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:23,688-Speed 2614.25 samples/sec   Loss 24.8398   LearningRate 0.0974   Epoch: 0   Global Step: 10920   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:27,591-Speed 2624.81 samples/sec   Loss 24.7911   LearningRate 0.0974   Epoch: 0   Global Step: 10930   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:31,500-Speed 2619.61 samples/sec   Loss 24.7152   LearningRate 0.0974   Epoch: 0   Global Step: 10940   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:35,411-Speed 2618.98 samples/sec   Loss 24.6755   LearningRate 0.0974   Epoch: 0   Global Step: 10950   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:39,310-Speed 2626.99 samples/sec   Loss 24.6336   LearningRate 0.0974   Epoch: 0   Global Step: 10960   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:43,212-Speed 2624.68 samples/sec   Loss 24.5345   LearningRate 0.0974   Epoch: 0   Global Step: 10970   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:47,109-Speed 2635.54 samples/sec   Loss 24.6742   LearningRate 0.0974   Epoch: 0   Global Step: 10980   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:50,994-Speed 2636.25 samples/sec   Loss 24.4949   LearningRate 0.0974   Epoch: 0   Global Step: 10990   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:54,893-Speed 2627.26 samples/sec   Loss 24.4961   LearningRate 0.0974   Epoch: 0   Global Step: 11000   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:57:58,937-Speed 2532.62 samples/sec   Loss 24.3708   LearningRate 0.0974   Epoch: 0   Global Step: 11010   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:02,849-Speed 2618.35 samples/sec   Loss 24.5412   LearningRate 0.0974   Epoch: 0   Global Step: 11020   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:06,828-Speed 2573.88 samples/sec   Loss 24.5040   LearningRate 0.0974   Epoch: 0   Global Step: 11030   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:10,779-Speed 2592.79 samples/sec   Loss 24.4288   LearningRate 0.0974   Epoch: 0   Global Step: 11040   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:14,705-Speed 2609.35 samples/sec   Loss 24.5361   LearningRate 0.0974   Epoch: 0   Global Step: 11050   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:18,619-Speed 2616.87 samples/sec   Loss 24.6164   LearningRate 0.0974   Epoch: 0   Global Step: 11060   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:22,551-Speed 2605.30 samples/sec   Loss 24.3951   LearningRate 0.0973   Epoch: 0   Global Step: 11070   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:26,447-Speed 2628.43 samples/sec   Loss 24.2871   LearningRate 0.0973   Epoch: 0   Global Step: 11080   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:30,337-Speed 2633.44 samples/sec   Loss 24.3536   LearningRate 0.0973   Epoch: 0   Global Step: 11090   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:34,237-Speed 2626.20 samples/sec   Loss 24.3756   LearningRate 0.0973   Epoch: 0   Global Step: 11100   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:38,142-Speed 2623.13 samples/sec   Loss 24.3116   LearningRate 0.0973   Epoch: 0   Global Step: 11110   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:42,038-Speed 2628.67 samples/sec   Loss 24.3662   LearningRate 0.0973   Epoch: 0   Global Step: 11120   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:45,937-Speed 2627.08 samples/sec   Loss 24.1740   LearningRate 0.0973   Epoch: 0   Global Step: 11130   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:49,839-Speed 2625.20 samples/sec   Loss 24.1972   LearningRate 0.0973   Epoch: 0   Global Step: 11140   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:53,734-Speed 2630.24 samples/sec   Loss 24.3213   LearningRate 0.0973   Epoch: 0   Global Step: 11150   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:58:57,731-Speed 2562.08 samples/sec   Loss 24.2747   LearningRate 0.0973   Epoch: 0   Global Step: 11160   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:01,627-Speed 2629.06 samples/sec   Loss 24.1988   LearningRate 0.0973   Epoch: 0   Global Step: 11170   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:05,521-Speed 2630.00 samples/sec   Loss 24.4360   LearningRate 0.0973   Epoch: 0   Global Step: 11180   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:09,405-Speed 2637.29 samples/sec   Loss 24.2739   LearningRate 0.0973   Epoch: 0   Global Step: 11190   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:13,305-Speed 2626.56 samples/sec   Loss 24.3445   LearningRate 0.0973   Epoch: 0   Global Step: 11200   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:17,227-Speed 2611.31 samples/sec   Loss 24.2993   LearningRate 0.0973   Epoch: 0   Global Step: 11210   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:21,124-Speed 2632.87 samples/sec   Loss 24.3260   LearningRate 0.0973   Epoch: 0   Global Step: 11220   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:25,035-Speed 2618.98 samples/sec   Loss 24.4078   LearningRate 0.0973   Epoch: 0   Global Step: 11230   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:29,130-Speed 2501.32 samples/sec   Loss 24.0584   LearningRate 0.0973   Epoch: 0   Global Step: 11240   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:33,155-Speed 2544.87 samples/sec   Loss 24.2061   LearningRate 0.0973   Epoch: 0   Global Step: 11250   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:37,064-Speed 2620.31 samples/sec   Loss 24.2413   LearningRate 0.0973   Epoch: 0   Global Step: 11260   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:40,970-Speed 2622.08 samples/sec   Loss 24.0774   LearningRate 0.0973   Epoch: 0   Global Step: 11270   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:44,883-Speed 2617.61 samples/sec   Loss 24.0927   LearningRate 0.0973   Epoch: 0   Global Step: 11280   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:48,781-Speed 2627.23 samples/sec   Loss 24.2251   LearningRate 0.0973   Epoch: 0   Global Step: 11290   Fp16 Grad Scale: 524288   Required: 92 hours
Training: 2022-04-12 20:59:52,663-Speed 2639.25 samples/sec   Loss 24.0165   LearningRate 0.0973   Epoch: 0   Global Step: 11300   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 20:59:56,557-Speed 2629.81 samples/sec   Loss 23.9387   LearningRate 0.0973   Epoch: 0   Global Step: 11310   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:00:00,455-Speed 2627.80 samples/sec   Loss 23.9461   LearningRate 0.0973   Epoch: 0   Global Step: 11320   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:00:04,353-Speed 2627.79 samples/sec   Loss 24.0221   LearningRate 0.0973   Epoch: 0   Global Step: 11330   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:00:08,265-Speed 2617.76 samples/sec   Loss 24.1475   LearningRate 0.0973   Epoch: 0   Global Step: 11340   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:00:12,166-Speed 2625.22 samples/sec   Loss 24.0531   LearningRate 0.0973   Epoch: 0   Global Step: 11350   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:00:16,057-Speed 2632.60 samples/sec   Loss 23.8007   LearningRate 0.0973   Epoch: 0   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:19,964-Speed 2622.10 samples/sec   Loss 23.8685   LearningRate 0.0973   Epoch: 0   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:23,873-Speed 2620.21 samples/sec   Loss 24.0787   LearningRate 0.0973   Epoch: 0   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:27,774-Speed 2626.12 samples/sec   Loss 23.9378   LearningRate 0.0973   Epoch: 0   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:31,668-Speed 2630.01 samples/sec   Loss 24.0796   LearningRate 0.0973   Epoch: 0   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:35,568-Speed 2626.22 samples/sec   Loss 23.8185   LearningRate 0.0973   Epoch: 0   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:39,464-Speed 2629.05 samples/sec   Loss 23.9420   LearningRate 0.0973   Epoch: 0   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:43,366-Speed 2625.29 samples/sec   Loss 24.0507   LearningRate 0.0973   Epoch: 0   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:47,265-Speed 2626.65 samples/sec   Loss 23.8362   LearningRate 0.0973   Epoch: 0   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:51,165-Speed 2626.58 samples/sec   Loss 23.6750   LearningRate 0.0973   Epoch: 0   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:00:55,068-Speed 2624.08 samples/sec   Loss 23.7539   LearningRate 0.0973   Epoch: 0   Global Step: 11460   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:00:58,966-Speed 2627.83 samples/sec   Loss 24.0846   LearningRate 0.0973   Epoch: 0   Global Step: 11470   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:02,947-Speed 2572.71 samples/sec   Loss 23.8444   LearningRate 0.0973   Epoch: 0   Global Step: 11480   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:06,869-Speed 2611.62 samples/sec   Loss 23.8200   LearningRate 0.0972   Epoch: 0   Global Step: 11490   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:10,781-Speed 2618.25 samples/sec   Loss 23.7068   LearningRate 0.0972   Epoch: 0   Global Step: 11500   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:14,691-Speed 2619.98 samples/sec   Loss 23.9309   LearningRate 0.0972   Epoch: 0   Global Step: 11510   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:18,594-Speed 2623.71 samples/sec   Loss 23.7136   LearningRate 0.0972   Epoch: 0   Global Step: 11520   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:22,508-Speed 2617.90 samples/sec   Loss 23.6356   LearningRate 0.0972   Epoch: 0   Global Step: 11530   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:26,488-Speed 2573.29 samples/sec   Loss 23.6270   LearningRate 0.0972   Epoch: 0   Global Step: 11540   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:30,421-Speed 2604.54 samples/sec   Loss 23.7791   LearningRate 0.0972   Epoch: 0   Global Step: 11550   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:34,304-Speed 2638.06 samples/sec   Loss 23.5974   LearningRate 0.0972   Epoch: 0   Global Step: 11560   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:38,201-Speed 2627.93 samples/sec   Loss 23.9312   LearningRate 0.0972   Epoch: 0   Global Step: 11570   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:42,130-Speed 2606.91 samples/sec   Loss 23.8163   LearningRate 0.0972   Epoch: 0   Global Step: 11580   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:46,025-Speed 2629.94 samples/sec   Loss 23.8748   LearningRate 0.0972   Epoch: 0   Global Step: 11590   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:49,923-Speed 2628.13 samples/sec   Loss 23.6850   LearningRate 0.0972   Epoch: 0   Global Step: 11600   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:53,831-Speed 2621.12 samples/sec   Loss 23.7128   LearningRate 0.0972   Epoch: 0   Global Step: 11610   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:01:57,748-Speed 2615.18 samples/sec   Loss 23.6204   LearningRate 0.0972   Epoch: 0   Global Step: 11620   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:01,651-Speed 2623.96 samples/sec   Loss 23.5729   LearningRate 0.0972   Epoch: 0   Global Step: 11630   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:05,571-Speed 2612.86 samples/sec   Loss 23.6758   LearningRate 0.0972   Epoch: 0   Global Step: 11640   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:09,466-Speed 2629.41 samples/sec   Loss 23.7170   LearningRate 0.0972   Epoch: 0   Global Step: 11650   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:13,353-Speed 2635.54 samples/sec   Loss 23.4402   LearningRate 0.0972   Epoch: 0   Global Step: 11660   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:17,243-Speed 2632.79 samples/sec   Loss 23.5444   LearningRate 0.0972   Epoch: 0   Global Step: 11670   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:21,155-Speed 2618.43 samples/sec   Loss 23.5445   LearningRate 0.0972   Epoch: 0   Global Step: 11680   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:25,128-Speed 2578.68 samples/sec   Loss 23.4500   LearningRate 0.0972   Epoch: 0   Global Step: 11690   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:29,133-Speed 2557.15 samples/sec   Loss 23.6127   LearningRate 0.0972   Epoch: 0   Global Step: 11700   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:33,080-Speed 2594.92 samples/sec   Loss 23.6015   LearningRate 0.0972   Epoch: 0   Global Step: 11710   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:37,096-Speed 2550.61 samples/sec   Loss 23.3289   LearningRate 0.0972   Epoch: 0   Global Step: 11720   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:41,008-Speed 2618.61 samples/sec   Loss 23.6072   LearningRate 0.0972   Epoch: 0   Global Step: 11730   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:44,909-Speed 2625.56 samples/sec   Loss 23.6353   LearningRate 0.0972   Epoch: 0   Global Step: 11740   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:48,819-Speed 2619.44 samples/sec   Loss 23.3112   LearningRate 0.0972   Epoch: 0   Global Step: 11750   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:02:52,813-Speed 2564.43 samples/sec   Loss 23.4358   LearningRate 0.0972   Epoch: 0   Global Step: 11760   Fp16 Grad Scale: 524288   Required: 92 hours
Training: 2022-04-12 21:02:56,698-Speed 2636.85 samples/sec   Loss 23.5063   LearningRate 0.0972   Epoch: 0   Global Step: 11770   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:03:00,603-Speed 2622.35 samples/sec   Loss 23.5662   LearningRate 0.0972   Epoch: 0   Global Step: 11780   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:03:04,528-Speed 2609.88 samples/sec   Loss 23.2884   LearningRate 0.0972   Epoch: 0   Global Step: 11790   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:03:08,429-Speed 2626.08 samples/sec   Loss 23.4348   LearningRate 0.0972   Epoch: 0   Global Step: 11800   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:03:12,539-Speed 2491.67 samples/sec   Loss 23.3854   LearningRate 0.0972   Epoch: 0   Global Step: 11810   Fp16 Grad Scale: 262144   Required: 92 hours
Training: 2022-04-12 21:03:16,619-Speed 2511.36 samples/sec   Loss 23.2861   LearningRate 0.0972   Epoch: 0   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:03:20,566-Speed 2595.98 samples/sec   Loss 23.3866   LearningRate 0.0972   Epoch: 0   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:03:24,466-Speed 2625.85 samples/sec   Loss 23.2487   LearningRate 0.0972   Epoch: 0   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:03:28,378-Speed 2618.60 samples/sec   Loss 23.2853   LearningRate 0.0972   Epoch: 0   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:03:32,296-Speed 2614.29 samples/sec   Loss 23.4276   LearningRate 0.0972   Epoch: 0   Global Step: 11860   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:03:36,325-Speed 2541.96 samples/sec   Loss 23.3802   LearningRate 0.0972   Epoch: 0   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 92 hours
Training: 2022-04-12 21:03:40,230-Speed 2623.39 samples/sec   Loss 23.3754   LearningRate 0.0972   Epoch: 0   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:03:44,141-Speed 2618.50 samples/sec   Loss 23.2311   LearningRate 0.0972   Epoch: 0   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:03:48,039-Speed 2627.92 samples/sec   Loss 23.3196   LearningRate 0.0972   Epoch: 0   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:03:51,952-Speed 2617.62 samples/sec   Loss 23.3323   LearningRate 0.0971   Epoch: 0   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:03:55,852-Speed 2626.55 samples/sec   Loss 23.1215   LearningRate 0.0971   Epoch: 0   Global Step: 11920   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:03:59,760-Speed 2621.11 samples/sec   Loss 23.3211   LearningRate 0.0971   Epoch: 0   Global Step: 11930   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:03,663-Speed 2624.34 samples/sec   Loss 23.0525   LearningRate 0.0971   Epoch: 0   Global Step: 11940   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:07,563-Speed 2626.41 samples/sec   Loss 23.3195   LearningRate 0.0971   Epoch: 0   Global Step: 11950   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:11,466-Speed 2624.18 samples/sec   Loss 23.2789   LearningRate 0.0971   Epoch: 0   Global Step: 11960   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:15,363-Speed 2628.72 samples/sec   Loss 23.2794   LearningRate 0.0971   Epoch: 0   Global Step: 11970   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:19,260-Speed 2628.24 samples/sec   Loss 23.2121   LearningRate 0.0971   Epoch: 0   Global Step: 11980   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:23,163-Speed 2623.97 samples/sec   Loss 23.1384   LearningRate 0.0971   Epoch: 0   Global Step: 11990   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:27,065-Speed 2625.56 samples/sec   Loss 23.2496   LearningRate 0.0971   Epoch: 0   Global Step: 12000   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:30,967-Speed 2624.79 samples/sec   Loss 22.9744   LearningRate 0.0971   Epoch: 0   Global Step: 12010   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:34,863-Speed 2629.29 samples/sec   Loss 23.1442   LearningRate 0.0971   Epoch: 0   Global Step: 12020   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:38,777-Speed 2616.91 samples/sec   Loss 23.1529   LearningRate 0.0971   Epoch: 0   Global Step: 12030   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:42,675-Speed 2627.17 samples/sec   Loss 23.1291   LearningRate 0.0971   Epoch: 0   Global Step: 12040   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:46,574-Speed 2627.43 samples/sec   Loss 23.1541   LearningRate 0.0971   Epoch: 0   Global Step: 12050   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:50,482-Speed 2621.24 samples/sec   Loss 23.1384   LearningRate 0.0971   Epoch: 0   Global Step: 12060   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:54,387-Speed 2622.67 samples/sec   Loss 22.9940   LearningRate 0.0971   Epoch: 0   Global Step: 12070   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:04:58,286-Speed 2626.95 samples/sec   Loss 23.1300   LearningRate 0.0971   Epoch: 0   Global Step: 12080   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:02,185-Speed 2627.04 samples/sec   Loss 23.1124   LearningRate 0.0971   Epoch: 0   Global Step: 12090   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:06,085-Speed 2625.98 samples/sec   Loss 23.0423   LearningRate 0.0971   Epoch: 0   Global Step: 12100   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:09,982-Speed 2628.53 samples/sec   Loss 22.9773   LearningRate 0.0971   Epoch: 0   Global Step: 12110   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:13,881-Speed 2627.07 samples/sec   Loss 23.0201   LearningRate 0.0971   Epoch: 0   Global Step: 12120   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:05:17,767-Speed 2635.20 samples/sec   Loss 23.1005   LearningRate 0.0971   Epoch: 0   Global Step: 12130   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:21,664-Speed 2628.93 samples/sec   Loss 23.0081   LearningRate 0.0971   Epoch: 0   Global Step: 12140   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:25,562-Speed 2627.32 samples/sec   Loss 23.1205   LearningRate 0.0971   Epoch: 0   Global Step: 12150   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:29,580-Speed 2549.37 samples/sec   Loss 22.7805   LearningRate 0.0971   Epoch: 0   Global Step: 12160   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:33,673-Speed 2502.19 samples/sec   Loss 22.7256   LearningRate 0.0971   Epoch: 0   Global Step: 12170   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:37,707-Speed 2539.19 samples/sec   Loss 23.0662   LearningRate 0.0971   Epoch: 0   Global Step: 12180   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:41,745-Speed 2536.80 samples/sec   Loss 22.8063   LearningRate 0.0971   Epoch: 0   Global Step: 12190   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:45,820-Speed 2513.22 samples/sec   Loss 22.7127   LearningRate 0.0971   Epoch: 0   Global Step: 12200   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:49,743-Speed 2610.96 samples/sec   Loss 22.7708   LearningRate 0.0971   Epoch: 0   Global Step: 12210   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:53,672-Speed 2612.60 samples/sec   Loss 22.8297   LearningRate 0.0971   Epoch: 0   Global Step: 12220   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:05:57,563-Speed 2632.11 samples/sec   Loss 22.9926   LearningRate 0.0971   Epoch: 0   Global Step: 12230   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:01,473-Speed 2619.69 samples/sec   Loss 22.7414   LearningRate 0.0971   Epoch: 0   Global Step: 12240   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:05,373-Speed 2626.44 samples/sec   Loss 22.6848   LearningRate 0.0971   Epoch: 0   Global Step: 12250   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:09,281-Speed 2620.67 samples/sec   Loss 22.8649   LearningRate 0.0971   Epoch: 0   Global Step: 12260   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:13,184-Speed 2624.59 samples/sec   Loss 22.9567   LearningRate 0.0971   Epoch: 0   Global Step: 12270   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:17,207-Speed 2545.87 samples/sec   Loss 22.8172   LearningRate 0.0971   Epoch: 0   Global Step: 12280   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:21,113-Speed 2622.57 samples/sec   Loss 22.7182   LearningRate 0.0971   Epoch: 0   Global Step: 12290   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:25,010-Speed 2628.23 samples/sec   Loss 22.7642   LearningRate 0.0971   Epoch: 0   Global Step: 12300   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:28,911-Speed 2625.46 samples/sec   Loss 22.6296   LearningRate 0.0971   Epoch: 0   Global Step: 12310   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:32,812-Speed 2625.65 samples/sec   Loss 22.6737   LearningRate 0.0971   Epoch: 0   Global Step: 12320   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:36,772-Speed 2586.39 samples/sec   Loss 22.7985   LearningRate 0.0970   Epoch: 0   Global Step: 12330   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:40,677-Speed 2622.82 samples/sec   Loss 22.8695   LearningRate 0.0970   Epoch: 0   Global Step: 12340   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:44,641-Speed 2584.17 samples/sec   Loss 22.4713   LearningRate 0.0970   Epoch: 0   Global Step: 12350   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:48,564-Speed 2610.66 samples/sec   Loss 22.6371   LearningRate 0.0970   Epoch: 0   Global Step: 12360   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:52,476-Speed 2618.34 samples/sec   Loss 22.5962   LearningRate 0.0970   Epoch: 0   Global Step: 12370   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:06:56,428-Speed 2591.95 samples/sec   Loss 22.7849   LearningRate 0.0970   Epoch: 0   Global Step: 12380   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:00,548-Speed 2486.05 samples/sec   Loss 22.6879   LearningRate 0.0970   Epoch: 0   Global Step: 12390   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:04,662-Speed 2489.76 samples/sec   Loss 22.5807   LearningRate 0.0970   Epoch: 0   Global Step: 12400   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:08,769-Speed 2493.86 samples/sec   Loss 22.5824   LearningRate 0.0970   Epoch: 0   Global Step: 12410   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:12,754-Speed 2570.23 samples/sec   Loss 22.4593   LearningRate 0.0970   Epoch: 0   Global Step: 12420   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:16,859-Speed 2495.08 samples/sec   Loss 22.6251   LearningRate 0.0970   Epoch: 0   Global Step: 12430   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:07:20,898-Speed 2536.37 samples/sec   Loss 22.6533   LearningRate 0.0970   Epoch: 0   Global Step: 12440   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:24,806-Speed 2620.56 samples/sec   Loss 22.5777   LearningRate 0.0970   Epoch: 0   Global Step: 12450   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:28,715-Speed 2620.87 samples/sec   Loss 22.6462   LearningRate 0.0970   Epoch: 0   Global Step: 12460   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:07:32,600-Speed 2636.06 samples/sec   Loss 22.5146   LearningRate 0.0970   Epoch: 0   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:07:36,514-Speed 2616.86 samples/sec   Loss 22.6128   LearningRate 0.0970   Epoch: 0   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:07:40,420-Speed 2621.82 samples/sec   Loss 22.7122   LearningRate 0.0970   Epoch: 0   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:07:44,330-Speed 2619.95 samples/sec   Loss 22.6672   LearningRate 0.0970   Epoch: 0   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:07:48,234-Speed 2623.59 samples/sec   Loss 22.5040   LearningRate 0.0970   Epoch: 0   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:07:52,134-Speed 2626.27 samples/sec   Loss 22.3887   LearningRate 0.0970   Epoch: 0   Global Step: 12520   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:07:56,049-Speed 2616.30 samples/sec   Loss 22.5132   LearningRate 0.0970   Epoch: 0   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:07:59,944-Speed 2630.21 samples/sec   Loss 22.4858   LearningRate 0.0970   Epoch: 0   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:08:03,859-Speed 2616.13 samples/sec   Loss 22.5814   LearningRate 0.0970   Epoch: 0   Global Step: 12550   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:08:07,761-Speed 2625.18 samples/sec   Loss 22.4180   LearningRate 0.0970   Epoch: 0   Global Step: 12560   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:08:11,666-Speed 2622.49 samples/sec   Loss 22.5058   LearningRate 0.0970   Epoch: 0   Global Step: 12570   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:15,572-Speed 2622.53 samples/sec   Loss 22.3390   LearningRate 0.0970   Epoch: 0   Global Step: 12580   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:19,489-Speed 2614.53 samples/sec   Loss 22.3738   LearningRate 0.0970   Epoch: 0   Global Step: 12590   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:23,391-Speed 2625.10 samples/sec   Loss 22.3143   LearningRate 0.0970   Epoch: 0   Global Step: 12600   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:27,299-Speed 2620.64 samples/sec   Loss 22.5547   LearningRate 0.0970   Epoch: 0   Global Step: 12610   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:31,222-Speed 2611.18 samples/sec   Loss 22.3073   LearningRate 0.0970   Epoch: 0   Global Step: 12620   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:35,126-Speed 2623.67 samples/sec   Loss 22.5420   LearningRate 0.0970   Epoch: 0   Global Step: 12630   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:39,031-Speed 2622.36 samples/sec   Loss 22.3726   LearningRate 0.0970   Epoch: 0   Global Step: 12640   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:43,004-Speed 2578.21 samples/sec   Loss 22.3295   LearningRate 0.0970   Epoch: 0   Global Step: 12650   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:46,904-Speed 2626.04 samples/sec   Loss 22.4101   LearningRate 0.0970   Epoch: 0   Global Step: 12660   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:50,786-Speed 2638.54 samples/sec   Loss 22.2679   LearningRate 0.0970   Epoch: 0   Global Step: 12670   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:54,693-Speed 2621.48 samples/sec   Loss 22.1937   LearningRate 0.0970   Epoch: 0   Global Step: 12680   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:08:58,596-Speed 2624.35 samples/sec   Loss 22.1994   LearningRate 0.0970   Epoch: 0   Global Step: 12690   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:02,501-Speed 2622.88 samples/sec   Loss 22.3682   LearningRate 0.0970   Epoch: 0   Global Step: 12700   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:06,405-Speed 2623.62 samples/sec   Loss 22.1289   LearningRate 0.0970   Epoch: 0   Global Step: 12710   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:10,340-Speed 2603.37 samples/sec   Loss 22.0947   LearningRate 0.0970   Epoch: 0   Global Step: 12720   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:14,238-Speed 2627.76 samples/sec   Loss 22.3172   LearningRate 0.0970   Epoch: 0   Global Step: 12730   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:18,142-Speed 2623.42 samples/sec   Loss 22.2262   LearningRate 0.0970   Epoch: 0   Global Step: 12740   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:22,055-Speed 2617.62 samples/sec   Loss 21.9881   LearningRate 0.0969   Epoch: 0   Global Step: 12750   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:25,969-Speed 2616.69 samples/sec   Loss 22.3067   LearningRate 0.0969   Epoch: 0   Global Step: 12760   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:29,863-Speed 2631.10 samples/sec   Loss 22.2658   LearningRate 0.0969   Epoch: 0   Global Step: 12770   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:33,837-Speed 2576.94 samples/sec   Loss 21.8652   LearningRate 0.0969   Epoch: 0   Global Step: 12780   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:37,792-Speed 2590.34 samples/sec   Loss 22.3276   LearningRate 0.0969   Epoch: 0   Global Step: 12790   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:41,695-Speed 2624.09 samples/sec   Loss 22.3600   LearningRate 0.0969   Epoch: 0   Global Step: 12800   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:45,614-Speed 2613.58 samples/sec   Loss 22.0661   LearningRate 0.0969   Epoch: 0   Global Step: 12810   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:49,510-Speed 2629.14 samples/sec   Loss 22.1113   LearningRate 0.0969   Epoch: 0   Global Step: 12820   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:53,415-Speed 2623.05 samples/sec   Loss 21.9496   LearningRate 0.0969   Epoch: 0   Global Step: 12830   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:09:57,504-Speed 2504.65 samples/sec   Loss 22.1286   LearningRate 0.0969   Epoch: 0   Global Step: 12840   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:01,474-Speed 2580.23 samples/sec   Loss 22.0993   LearningRate 0.0969   Epoch: 0   Global Step: 12850   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:05,457-Speed 2571.40 samples/sec   Loss 22.0234   LearningRate 0.0969   Epoch: 0   Global Step: 12860   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:09,556-Speed 2498.66 samples/sec   Loss 21.9928   LearningRate 0.0969   Epoch: 0   Global Step: 12870   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:10:13,436-Speed 2640.06 samples/sec   Loss 22.1007   LearningRate 0.0969   Epoch: 0   Global Step: 12880   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:17,338-Speed 2624.65 samples/sec   Loss 21.8888   LearningRate 0.0969   Epoch: 0   Global Step: 12890   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:21,237-Speed 2627.66 samples/sec   Loss 22.0910   LearningRate 0.0969   Epoch: 0   Global Step: 12900   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:25,138-Speed 2625.06 samples/sec   Loss 21.9996   LearningRate 0.0969   Epoch: 0   Global Step: 12910   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:29,083-Speed 2597.13 samples/sec   Loss 22.0999   LearningRate 0.0969   Epoch: 0   Global Step: 12920   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:32,985-Speed 2624.62 samples/sec   Loss 21.8988   LearningRate 0.0969   Epoch: 0   Global Step: 12930   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:36,919-Speed 2603.34 samples/sec   Loss 21.9651   LearningRate 0.0969   Epoch: 0   Global Step: 12940   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:40,828-Speed 2621.02 samples/sec   Loss 22.0520   LearningRate 0.0969   Epoch: 0   Global Step: 12950   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:44,734-Speed 2622.31 samples/sec   Loss 21.9549   LearningRate 0.0969   Epoch: 0   Global Step: 12960   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:48,633-Speed 2626.95 samples/sec   Loss 21.9126   LearningRate 0.0969   Epoch: 0   Global Step: 12970   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:52,519-Speed 2635.68 samples/sec   Loss 22.0018   LearningRate 0.0969   Epoch: 0   Global Step: 12980   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:10:56,425-Speed 2621.93 samples/sec   Loss 21.8344   LearningRate 0.0969   Epoch: 0   Global Step: 12990   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:11:00,326-Speed 2625.91 samples/sec   Loss 22.0886   LearningRate 0.0969   Epoch: 0   Global Step: 13000   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:11:04,223-Speed 2628.17 samples/sec   Loss 22.0893   LearningRate 0.0969   Epoch: 0   Global Step: 13010   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:08,124-Speed 2625.65 samples/sec   Loss 21.9768   LearningRate 0.0969   Epoch: 0   Global Step: 13020   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:12,028-Speed 2623.19 samples/sec   Loss 21.9919   LearningRate 0.0969   Epoch: 0   Global Step: 13030   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:15,948-Speed 2613.15 samples/sec   Loss 21.8396   LearningRate 0.0969   Epoch: 0   Global Step: 13040   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:19,862-Speed 2616.87 samples/sec   Loss 21.8320   LearningRate 0.0969   Epoch: 0   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:23,769-Speed 2621.88 samples/sec   Loss 21.9053   LearningRate 0.0969   Epoch: 0   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:27,683-Speed 2616.63 samples/sec   Loss 21.9638   LearningRate 0.0969   Epoch: 0   Global Step: 13070   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:31,686-Speed 2558.87 samples/sec   Loss 21.9854   LearningRate 0.0969   Epoch: 0   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:35,594-Speed 2620.68 samples/sec   Loss 21.8539   LearningRate 0.0969   Epoch: 0   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:39,497-Speed 2624.01 samples/sec   Loss 21.8529   LearningRate 0.0969   Epoch: 0   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:11:43,444-Speed 2595.12 samples/sec   Loss 21.8813   LearningRate 0.0969   Epoch: 0   Global Step: 13110   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:11:47,460-Speed 2550.17 samples/sec   Loss 21.9317   LearningRate 0.0969   Epoch: 0   Global Step: 13120   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:11:51,376-Speed 2615.83 samples/sec   Loss 21.7083   LearningRate 0.0969   Epoch: 0   Global Step: 13130   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:11:55,275-Speed 2626.67 samples/sec   Loss 22.1024   LearningRate 0.0969   Epoch: 0   Global Step: 13140   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:11:59,190-Speed 2616.70 samples/sec   Loss 22.0134   LearningRate 0.0969   Epoch: 0   Global Step: 13150   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:03,095-Speed 2622.52 samples/sec   Loss 21.8493   LearningRate 0.0969   Epoch: 0   Global Step: 13160   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:06,995-Speed 2626.47 samples/sec   Loss 21.7454   LearningRate 0.0969   Epoch: 0   Global Step: 13170   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:10,907-Speed 2618.25 samples/sec   Loss 21.7387   LearningRate 0.0968   Epoch: 0   Global Step: 13180   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:14,808-Speed 2625.76 samples/sec   Loss 21.7976   LearningRate 0.0968   Epoch: 0   Global Step: 13190   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:18,720-Speed 2617.67 samples/sec   Loss 21.9894   LearningRate 0.0968   Epoch: 0   Global Step: 13200   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:22,634-Speed 2616.94 samples/sec   Loss 21.7291   LearningRate 0.0968   Epoch: 0   Global Step: 13210   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:12:26,519-Speed 2636.18 samples/sec   Loss 21.7679   LearningRate 0.0968   Epoch: 0   Global Step: 13220   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:30,428-Speed 2620.50 samples/sec   Loss 21.7536   LearningRate 0.0968   Epoch: 0   Global Step: 13230   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:34,339-Speed 2619.39 samples/sec   Loss 21.7988   LearningRate 0.0968   Epoch: 0   Global Step: 13240   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:38,236-Speed 2628.02 samples/sec   Loss 21.6645   LearningRate 0.0968   Epoch: 0   Global Step: 13250   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:12:42,137-Speed 2625.62 samples/sec   Loss 21.6848   LearningRate 0.0968   Epoch: 0   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:12:46,037-Speed 2626.17 samples/sec   Loss 21.6457   LearningRate 0.0968   Epoch: 0   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:12:49,942-Speed 2622.94 samples/sec   Loss 21.5899   LearningRate 0.0968   Epoch: 0   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:12:53,846-Speed 2623.57 samples/sec   Loss 21.6562   LearningRate 0.0968   Epoch: 0   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:12:57,743-Speed 2628.40 samples/sec   Loss 21.5651   LearningRate 0.0968   Epoch: 0   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:13:01,646-Speed 2624.23 samples/sec   Loss 21.7268   LearningRate 0.0968   Epoch: 0   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:13:05,550-Speed 2624.12 samples/sec   Loss 21.6673   LearningRate 0.0968   Epoch: 0   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:13:09,454-Speed 2623.87 samples/sec   Loss 21.5848   LearningRate 0.0968   Epoch: 0   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:13:13,352-Speed 2627.39 samples/sec   Loss 21.5243   LearningRate 0.0968   Epoch: 0   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:13:17,252-Speed 2625.71 samples/sec   Loss 21.6052   LearningRate 0.0968   Epoch: 0   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:13:21,154-Speed 2625.63 samples/sec   Loss 21.5295   LearningRate 0.0968   Epoch: 0   Global Step: 13360   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:25,051-Speed 2627.81 samples/sec   Loss 21.7733   LearningRate 0.0968   Epoch: 0   Global Step: 13370   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:28,960-Speed 2620.28 samples/sec   Loss 21.6417   LearningRate 0.0968   Epoch: 0   Global Step: 13380   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:32,864-Speed 2623.66 samples/sec   Loss 21.4377   LearningRate 0.0968   Epoch: 0   Global Step: 13390   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:36,796-Speed 2605.06 samples/sec   Loss 21.6088   LearningRate 0.0968   Epoch: 0   Global Step: 13400   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:40,758-Speed 2585.31 samples/sec   Loss 21.3130   LearningRate 0.0968   Epoch: 0   Global Step: 13410   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:44,865-Speed 2493.53 samples/sec   Loss 21.5228   LearningRate 0.0968   Epoch: 0   Global Step: 13420   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:48,835-Speed 2580.29 samples/sec   Loss 21.6012   LearningRate 0.0968   Epoch: 0   Global Step: 13430   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:52,733-Speed 2627.45 samples/sec   Loss 21.5291   LearningRate 0.0968   Epoch: 0   Global Step: 13440   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:13:56,628-Speed 2630.47 samples/sec   Loss 21.3219   LearningRate 0.0968   Epoch: 0   Global Step: 13450   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:00,522-Speed 2630.06 samples/sec   Loss 21.6586   LearningRate 0.0968   Epoch: 0   Global Step: 13460   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:14:04,417-Speed 2629.90 samples/sec   Loss 21.4278   LearningRate 0.0968   Epoch: 0   Global Step: 13470   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:08,316-Speed 2626.88 samples/sec   Loss 21.4503   LearningRate 0.0968   Epoch: 0   Global Step: 13480   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:12,222-Speed 2622.04 samples/sec   Loss 21.4006   LearningRate 0.0968   Epoch: 0   Global Step: 13490   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:16,126-Speed 2624.05 samples/sec   Loss 21.5551   LearningRate 0.0968   Epoch: 0   Global Step: 13500   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:20,037-Speed 2619.11 samples/sec   Loss 21.4444   LearningRate 0.0968   Epoch: 0   Global Step: 13510   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:23,942-Speed 2622.77 samples/sec   Loss 21.3833   LearningRate 0.0968   Epoch: 0   Global Step: 13520   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:27,844-Speed 2624.72 samples/sec   Loss 21.4697   LearningRate 0.0968   Epoch: 0   Global Step: 13530   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:31,744-Speed 2626.18 samples/sec   Loss 21.5747   LearningRate 0.0968   Epoch: 0   Global Step: 13540   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:35,645-Speed 2625.57 samples/sec   Loss 21.3708   LearningRate 0.0968   Epoch: 0   Global Step: 13550   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:39,557-Speed 2618.32 samples/sec   Loss 21.5991   LearningRate 0.0968   Epoch: 0   Global Step: 13560   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:43,467-Speed 2619.58 samples/sec   Loss 21.3451   LearningRate 0.0968   Epoch: 0   Global Step: 13570   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:14:47,348-Speed 2639.04 samples/sec   Loss 21.3215   LearningRate 0.0968   Epoch: 0   Global Step: 13580   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:51,267-Speed 2613.54 samples/sec   Loss 21.3097   LearningRate 0.0968   Epoch: 0   Global Step: 13590   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:14:55,163-Speed 2629.26 samples/sec   Loss 21.2851   LearningRate 0.0967   Epoch: 0   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:14:59,068-Speed 2622.84 samples/sec   Loss 21.1045   LearningRate 0.0967   Epoch: 0   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:02,967-Speed 2626.21 samples/sec   Loss 21.3155   LearningRate 0.0967   Epoch: 0   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:06,867-Speed 2626.82 samples/sec   Loss 21.1027   LearningRate 0.0967   Epoch: 0   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:10,784-Speed 2615.37 samples/sec   Loss 21.0412   LearningRate 0.0967   Epoch: 0   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:14,703-Speed 2613.52 samples/sec   Loss 21.2593   LearningRate 0.0967   Epoch: 0   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:18,602-Speed 2626.81 samples/sec   Loss 21.2689   LearningRate 0.0967   Epoch: 0   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:22,512-Speed 2619.74 samples/sec   Loss 21.1240   LearningRate 0.0967   Epoch: 0   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:26,564-Speed 2527.58 samples/sec   Loss 21.2263   LearningRate 0.0967   Epoch: 0   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:30,478-Speed 2617.36 samples/sec   Loss 21.3292   LearningRate 0.0967   Epoch: 0   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:15:34,401-Speed 2610.83 samples/sec   Loss 21.2554   LearningRate 0.0967   Epoch: 0   Global Step: 13700   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:15:38,410-Speed 2554.67 samples/sec   Loss 21.1902   LearningRate 0.0967   Epoch: 0   Global Step: 13710   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:15:42,329-Speed 2613.51 samples/sec   Loss 21.2321   LearningRate 0.0967   Epoch: 0   Global Step: 13720   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:15:46,259-Speed 2606.14 samples/sec   Loss 21.0333   LearningRate 0.0967   Epoch: 0   Global Step: 13730   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:15:50,169-Speed 2619.75 samples/sec   Loss 21.1267   LearningRate 0.0967   Epoch: 0   Global Step: 13740   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:15:54,069-Speed 2626.67 samples/sec   Loss 21.2469   LearningRate 0.0967   Epoch: 0   Global Step: 13750   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:15:57,976-Speed 2621.49 samples/sec   Loss 21.1254   LearningRate 0.0967   Epoch: 0   Global Step: 13760   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:01,882-Speed 2622.66 samples/sec   Loss 21.1211   LearningRate 0.0967   Epoch: 0   Global Step: 13770   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:05,791-Speed 2620.58 samples/sec   Loss 21.2554   LearningRate 0.0967   Epoch: 0   Global Step: 13780   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:09,710-Speed 2613.15 samples/sec   Loss 21.1452   LearningRate 0.0967   Epoch: 0   Global Step: 13790   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:13,625-Speed 2616.02 samples/sec   Loss 20.9004   LearningRate 0.0967   Epoch: 0   Global Step: 13800   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:16:17,514-Speed 2634.44 samples/sec   Loss 21.0467   LearningRate 0.0967   Epoch: 0   Global Step: 13810   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:21,434-Speed 2612.86 samples/sec   Loss 21.2549   LearningRate 0.0967   Epoch: 0   Global Step: 13820   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:25,369-Speed 2603.09 samples/sec   Loss 21.2285   LearningRate 0.0967   Epoch: 0   Global Step: 13830   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:29,296-Speed 2608.23 samples/sec   Loss 21.0404   LearningRate 0.0967   Epoch: 0   Global Step: 13840   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:33,199-Speed 2624.61 samples/sec   Loss 21.2160   LearningRate 0.0967   Epoch: 0   Global Step: 13850   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:37,114-Speed 2616.13 samples/sec   Loss 21.0615   LearningRate 0.0967   Epoch: 0   Global Step: 13860   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:41,019-Speed 2622.62 samples/sec   Loss 20.9693   LearningRate 0.0967   Epoch: 0   Global Step: 13870   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:44,921-Speed 2624.70 samples/sec   Loss 21.1094   LearningRate 0.0967   Epoch: 0   Global Step: 13880   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:16:48,807-Speed 2636.41 samples/sec   Loss 20.9659   LearningRate 0.0967   Epoch: 0   Global Step: 13890   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:16:52,705-Speed 2627.65 samples/sec   Loss 21.0756   LearningRate 0.0967   Epoch: 0   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:16:56,617-Speed 2618.13 samples/sec   Loss 20.9197   LearningRate 0.0967   Epoch: 0   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:00,557-Speed 2599.78 samples/sec   Loss 21.1058   LearningRate 0.0967   Epoch: 0   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:04,463-Speed 2622.30 samples/sec   Loss 21.1606   LearningRate 0.0967   Epoch: 0   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:08,427-Speed 2583.48 samples/sec   Loss 21.0476   LearningRate 0.0967   Epoch: 0   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:12,332-Speed 2623.24 samples/sec   Loss 20.9954   LearningRate 0.0967   Epoch: 0   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:16,330-Speed 2562.13 samples/sec   Loss 21.0270   LearningRate 0.0967   Epoch: 0   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:20,235-Speed 2622.79 samples/sec   Loss 21.0123   LearningRate 0.0967   Epoch: 0   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:24,138-Speed 2624.51 samples/sec   Loss 20.9538   LearningRate 0.0967   Epoch: 0   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:17:28,045-Speed 2621.61 samples/sec   Loss 20.8936   LearningRate 0.0967   Epoch: 0   Global Step: 13990   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:32,082-Speed 2537.59 samples/sec   Loss 20.7091   LearningRate 0.0967   Epoch: 0   Global Step: 14000   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:36,186-Speed 2495.83 samples/sec   Loss 20.8389   LearningRate 0.0967   Epoch: 0   Global Step: 14010   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:40,290-Speed 2495.17 samples/sec   Loss 21.0306   LearningRate 0.0966   Epoch: 0   Global Step: 14020   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:44,240-Speed 2593.52 samples/sec   Loss 20.9286   LearningRate 0.0966   Epoch: 0   Global Step: 14030   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:48,137-Speed 2628.60 samples/sec   Loss 20.9496   LearningRate 0.0966   Epoch: 0   Global Step: 14040   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:52,058-Speed 2612.36 samples/sec   Loss 21.0849   LearningRate 0.0966   Epoch: 0   Global Step: 14050   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:55,960-Speed 2624.65 samples/sec   Loss 20.9803   LearningRate 0.0966   Epoch: 0   Global Step: 14060   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:17:59,862-Speed 2625.04 samples/sec   Loss 20.7408   LearningRate 0.0966   Epoch: 0   Global Step: 14070   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:03,764-Speed 2625.42 samples/sec   Loss 20.8444   LearningRate 0.0966   Epoch: 0   Global Step: 14080   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:07,645-Speed 2638.65 samples/sec   Loss 20.5915   LearningRate 0.0966   Epoch: 0   Global Step: 14090   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:11,553-Speed 2621.16 samples/sec   Loss 20.9767   LearningRate 0.0966   Epoch: 0   Global Step: 14100   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:15,455-Speed 2624.60 samples/sec   Loss 20.8806   LearningRate 0.0966   Epoch: 0   Global Step: 14110   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:19,358-Speed 2624.31 samples/sec   Loss 20.9393   LearningRate 0.0966   Epoch: 0   Global Step: 14120   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:23,265-Speed 2621.46 samples/sec   Loss 20.7980   LearningRate 0.0966   Epoch: 0   Global Step: 14130   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:27,191-Speed 2609.02 samples/sec   Loss 20.7596   LearningRate 0.0966   Epoch: 0   Global Step: 14140   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:31,216-Speed 2544.75 samples/sec   Loss 20.7570   LearningRate 0.0966   Epoch: 0   Global Step: 14150   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:35,321-Speed 2495.55 samples/sec   Loss 20.8807   LearningRate 0.0966   Epoch: 0   Global Step: 14160   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:39,278-Speed 2587.87 samples/sec   Loss 20.8961   LearningRate 0.0966   Epoch: 0   Global Step: 14170   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:43,176-Speed 2628.02 samples/sec   Loss 20.9016   LearningRate 0.0966   Epoch: 0   Global Step: 14180   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:47,056-Speed 2639.79 samples/sec   Loss 20.7540   LearningRate 0.0966   Epoch: 0   Global Step: 14190   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:50,979-Speed 2610.98 samples/sec   Loss 20.7130   LearningRate 0.0966   Epoch: 0   Global Step: 14200   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:54,915-Speed 2602.84 samples/sec   Loss 20.7855   LearningRate 0.0966   Epoch: 0   Global Step: 14210   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:18:58,836-Speed 2612.38 samples/sec   Loss 20.9002   LearningRate 0.0966   Epoch: 0   Global Step: 14220   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:19:02,763-Speed 2607.94 samples/sec   Loss 20.6541   LearningRate 0.0966   Epoch: 0   Global Step: 14230   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:19:06,672-Speed 2620.31 samples/sec   Loss 20.6625   LearningRate 0.0966   Epoch: 0   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:10,573-Speed 2625.57 samples/sec   Loss 20.6501   LearningRate 0.0966   Epoch: 0   Global Step: 14250   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:14,549-Speed 2576.43 samples/sec   Loss 20.5439   LearningRate 0.0966   Epoch: 0   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:18,463-Speed 2616.41 samples/sec   Loss 20.6955   LearningRate 0.0966   Epoch: 0   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:22,400-Speed 2602.24 samples/sec   Loss 20.6172   LearningRate 0.0966   Epoch: 0   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:26,355-Speed 2590.10 samples/sec   Loss 20.6038   LearningRate 0.0966   Epoch: 0   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:30,264-Speed 2619.67 samples/sec   Loss 20.7531   LearningRate 0.0966   Epoch: 0   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:34,182-Speed 2614.56 samples/sec   Loss 20.8339   LearningRate 0.0966   Epoch: 0   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:38,086-Speed 2623.36 samples/sec   Loss 20.7419   LearningRate 0.0966   Epoch: 0   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:41,995-Speed 2620.77 samples/sec   Loss 20.7064   LearningRate 0.0966   Epoch: 0   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:19:45,892-Speed 2628.06 samples/sec   Loss 20.6198   LearningRate 0.0966   Epoch: 0   Global Step: 14340   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:19:49,795-Speed 2624.30 samples/sec   Loss 20.5753   LearningRate 0.0966   Epoch: 0   Global Step: 14350   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:19:53,706-Speed 2618.41 samples/sec   Loss 20.6853   LearningRate 0.0966   Epoch: 0   Global Step: 14360   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:19:57,686-Speed 2574.11 samples/sec   Loss 20.6354   LearningRate 0.0966   Epoch: 0   Global Step: 14370   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:01,594-Speed 2620.16 samples/sec   Loss 20.5403   LearningRate 0.0966   Epoch: 0   Global Step: 14380   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:05,504-Speed 2620.60 samples/sec   Loss 20.6340   LearningRate 0.0966   Epoch: 0   Global Step: 14390   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:09,431-Speed 2608.09 samples/sec   Loss 20.7312   LearningRate 0.0966   Epoch: 0   Global Step: 14400   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:13,329-Speed 2627.82 samples/sec   Loss 20.4011   LearningRate 0.0966   Epoch: 0   Global Step: 14410   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:17,231-Speed 2624.76 samples/sec   Loss 20.5333   LearningRate 0.0966   Epoch: 0   Global Step: 14420   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:21,151-Speed 2613.22 samples/sec   Loss 20.5136   LearningRate 0.0966   Epoch: 0   Global Step: 14430   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:25,045-Speed 2630.28 samples/sec   Loss 20.6507   LearningRate 0.0965   Epoch: 0   Global Step: 14440   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:28,943-Speed 2627.45 samples/sec   Loss 20.5765   LearningRate 0.0965   Epoch: 0   Global Step: 14450   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:32,892-Speed 2593.62 samples/sec   Loss 20.4315   LearningRate 0.0965   Epoch: 0   Global Step: 14460   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:36,795-Speed 2624.61 samples/sec   Loss 20.4861   LearningRate 0.0965   Epoch: 0   Global Step: 14470   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:40,693-Speed 2627.51 samples/sec   Loss 20.4021   LearningRate 0.0965   Epoch: 0   Global Step: 14480   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:44,593-Speed 2626.18 samples/sec   Loss 20.6437   LearningRate 0.0965   Epoch: 0   Global Step: 14490   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:48,496-Speed 2625.11 samples/sec   Loss 20.4150   LearningRate 0.0965   Epoch: 0   Global Step: 14500   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:52,456-Speed 2586.36 samples/sec   Loss 20.7459   LearningRate 0.0965   Epoch: 0   Global Step: 14510   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:20:56,392-Speed 2603.00 samples/sec   Loss 20.3151   LearningRate 0.0965   Epoch: 0   Global Step: 14520   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:00,302-Speed 2619.95 samples/sec   Loss 20.5424   LearningRate 0.0965   Epoch: 0   Global Step: 14530   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:04,239-Speed 2601.15 samples/sec   Loss 20.6996   LearningRate 0.0965   Epoch: 0   Global Step: 14540   Fp16 Grad Scale: 524288   Required: 91 hours
Training: 2022-04-12 21:21:08,104-Speed 2650.27 samples/sec   Loss 20.4683   LearningRate 0.0965   Epoch: 0   Global Step: 14550   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:12,021-Speed 2614.93 samples/sec   Loss 20.4152   LearningRate 0.0965   Epoch: 0   Global Step: 14560   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:15,921-Speed 2626.15 samples/sec   Loss 20.3713   LearningRate 0.0965   Epoch: 0   Global Step: 14570   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:19,860-Speed 2600.33 samples/sec   Loss 20.3828   LearningRate 0.0965   Epoch: 0   Global Step: 14580   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:23,971-Speed 2491.81 samples/sec   Loss 20.4944   LearningRate 0.0965   Epoch: 0   Global Step: 14590   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:28,047-Speed 2512.82 samples/sec   Loss 20.3514   LearningRate 0.0965   Epoch: 0   Global Step: 14600   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:31,948-Speed 2625.59 samples/sec   Loss 20.5247   LearningRate 0.0965   Epoch: 0   Global Step: 14610   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:35,847-Speed 2627.72 samples/sec   Loss 20.5284   LearningRate 0.0965   Epoch: 0   Global Step: 14620   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:39,750-Speed 2623.74 samples/sec   Loss 20.4351   LearningRate 0.0965   Epoch: 0   Global Step: 14630   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:43,665-Speed 2615.84 samples/sec   Loss 20.2618   LearningRate 0.0965   Epoch: 0   Global Step: 14640   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:47,541-Speed 2642.62 samples/sec   Loss 20.2751   LearningRate 0.0965   Epoch: 0   Global Step: 14650   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:51,442-Speed 2626.01 samples/sec   Loss 20.4466   LearningRate 0.0965   Epoch: 0   Global Step: 14660   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:55,337-Speed 2630.09 samples/sec   Loss 20.4509   LearningRate 0.0965   Epoch: 0   Global Step: 14670   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:21:59,238-Speed 2625.09 samples/sec   Loss 20.3595   LearningRate 0.0965   Epoch: 0   Global Step: 14680   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:22:03,139-Speed 2625.64 samples/sec   Loss 20.2840   LearningRate 0.0965   Epoch: 0   Global Step: 14690   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:22:07,044-Speed 2622.92 samples/sec   Loss 20.3608   LearningRate 0.0965   Epoch: 0   Global Step: 14700   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:22:10,947-Speed 2623.92 samples/sec   Loss 20.2411   LearningRate 0.0965   Epoch: 0   Global Step: 14710   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:22:14,836-Speed 2633.46 samples/sec   Loss 20.2789   LearningRate 0.0965   Epoch: 0   Global Step: 14720   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:18,731-Speed 2629.84 samples/sec   Loss 20.5326   LearningRate 0.0965   Epoch: 0   Global Step: 14730   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:22,636-Speed 2623.19 samples/sec   Loss 20.3216   LearningRate 0.0965   Epoch: 0   Global Step: 14740   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:26,676-Speed 2535.21 samples/sec   Loss 20.3277   LearningRate 0.0965   Epoch: 0   Global Step: 14750   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:30,585-Speed 2620.68 samples/sec   Loss 20.3620   LearningRate 0.0965   Epoch: 0   Global Step: 14760   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:34,494-Speed 2620.01 samples/sec   Loss 20.3141   LearningRate 0.0965   Epoch: 0   Global Step: 14770   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:38,409-Speed 2616.51 samples/sec   Loss 20.2870   LearningRate 0.0965   Epoch: 0   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:42,327-Speed 2613.91 samples/sec   Loss 20.3425   LearningRate 0.0965   Epoch: 0   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:46,248-Speed 2612.50 samples/sec   Loss 20.2899   LearningRate 0.0965   Epoch: 0   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:50,175-Speed 2608.11 samples/sec   Loss 20.2103   LearningRate 0.0965   Epoch: 0   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:22:54,072-Speed 2628.61 samples/sec   Loss 20.1464   LearningRate 0.0965   Epoch: 0   Global Step: 14820   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:22:57,969-Speed 2628.09 samples/sec   Loss 20.1599   LearningRate 0.0965   Epoch: 0   Global Step: 14830   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:01,866-Speed 2628.54 samples/sec   Loss 20.3299   LearningRate 0.0965   Epoch: 0   Global Step: 14840   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:05,763-Speed 2628.07 samples/sec   Loss 20.1703   LearningRate 0.0965   Epoch: 0   Global Step: 14850   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:09,659-Speed 2628.77 samples/sec   Loss 20.3464   LearningRate 0.0964   Epoch: 0   Global Step: 14860   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:13,679-Speed 2547.70 samples/sec   Loss 20.2353   LearningRate 0.0964   Epoch: 0   Global Step: 14870   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:17,725-Speed 2532.15 samples/sec   Loss 20.2014   LearningRate 0.0964   Epoch: 0   Global Step: 14880   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:21,628-Speed 2624.32 samples/sec   Loss 20.2372   LearningRate 0.0964   Epoch: 0   Global Step: 14890   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:25,523-Speed 2629.85 samples/sec   Loss 20.1777   LearningRate 0.0964   Epoch: 0   Global Step: 14900   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:29,423-Speed 2626.48 samples/sec   Loss 20.2687   LearningRate 0.0964   Epoch: 0   Global Step: 14910   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:33,310-Speed 2634.97 samples/sec   Loss 20.2008   LearningRate 0.0964   Epoch: 0   Global Step: 14920   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:37,214-Speed 2623.72 samples/sec   Loss 20.0360   LearningRate 0.0964   Epoch: 0   Global Step: 14930   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:41,120-Speed 2622.16 samples/sec   Loss 20.2671   LearningRate 0.0964   Epoch: 0   Global Step: 14940   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:45,024-Speed 2624.18 samples/sec   Loss 20.0740   LearningRate 0.0964   Epoch: 0   Global Step: 14950   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:23:48,913-Speed 2633.68 samples/sec   Loss 20.0570   LearningRate 0.0964   Epoch: 0   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:23:52,846-Speed 2604.66 samples/sec   Loss 20.2165   LearningRate 0.0964   Epoch: 0   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:23:56,744-Speed 2627.41 samples/sec   Loss 19.9507   LearningRate 0.0964   Epoch: 0   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:00,642-Speed 2628.00 samples/sec   Loss 20.2019   LearningRate 0.0964   Epoch: 0   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:04,576-Speed 2603.87 samples/sec   Loss 20.0819   LearningRate 0.0964   Epoch: 0   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:08,497-Speed 2611.75 samples/sec   Loss 20.0790   LearningRate 0.0964   Epoch: 0   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:12,397-Speed 2626.44 samples/sec   Loss 20.1781   LearningRate 0.0964   Epoch: 0   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:16,294-Speed 2628.44 samples/sec   Loss 20.1237   LearningRate 0.0964   Epoch: 0   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:20,206-Speed 2617.94 samples/sec   Loss 19.9617   LearningRate 0.0964   Epoch: 0   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:24,105-Speed 2627.20 samples/sec   Loss 20.0262   LearningRate 0.0964   Epoch: 0   Global Step: 15050   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:24:28,010-Speed 2623.27 samples/sec   Loss 19.8639   LearningRate 0.0964   Epoch: 0   Global Step: 15060   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:32,114-Speed 2495.46 samples/sec   Loss 20.0847   LearningRate 0.0964   Epoch: 0   Global Step: 15070   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:36,226-Speed 2491.19 samples/sec   Loss 20.0900   LearningRate 0.0964   Epoch: 0   Global Step: 15080   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:40,257-Speed 2541.26 samples/sec   Loss 20.1431   LearningRate 0.0964   Epoch: 0   Global Step: 15090   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:44,159-Speed 2625.11 samples/sec   Loss 19.9474   LearningRate 0.0964   Epoch: 0   Global Step: 15100   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:48,067-Speed 2620.83 samples/sec   Loss 20.0551   LearningRate 0.0964   Epoch: 0   Global Step: 15110   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:51,970-Speed 2624.34 samples/sec   Loss 20.0255   LearningRate 0.0964   Epoch: 0   Global Step: 15120   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:55,875-Speed 2623.18 samples/sec   Loss 20.0312   LearningRate 0.0964   Epoch: 0   Global Step: 15130   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:24:59,781-Speed 2622.01 samples/sec   Loss 20.1780   LearningRate 0.0964   Epoch: 0   Global Step: 15140   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:25:03,826-Speed 2531.94 samples/sec   Loss 19.8765   LearningRate 0.0964   Epoch: 0   Global Step: 15150   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:25:07,876-Speed 2529.30 samples/sec   Loss 19.9500   LearningRate 0.0964   Epoch: 0   Global Step: 15160   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:25:11,809-Speed 2603.96 samples/sec   Loss 20.0620   LearningRate 0.0964   Epoch: 0   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:15,704-Speed 2630.01 samples/sec   Loss 19.8389   LearningRate 0.0964   Epoch: 0   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:19,605-Speed 2625.19 samples/sec   Loss 19.9978   LearningRate 0.0964   Epoch: 0   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:23,539-Speed 2604.41 samples/sec   Loss 19.9619   LearningRate 0.0964   Epoch: 0   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:27,435-Speed 2629.23 samples/sec   Loss 19.9275   LearningRate 0.0964   Epoch: 0   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:31,351-Speed 2615.60 samples/sec   Loss 19.9222   LearningRate 0.0964   Epoch: 0   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:35,429-Speed 2515.76 samples/sec   Loss 19.8817   LearningRate 0.0964   Epoch: 0   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:39,412-Speed 2571.15 samples/sec   Loss 19.8325   LearningRate 0.0964   Epoch: 0   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:43,312-Speed 2626.31 samples/sec   Loss 19.8355   LearningRate 0.0964   Epoch: 0   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:47,218-Speed 2621.92 samples/sec   Loss 19.8318   LearningRate 0.0964   Epoch: 0   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:25:51,116-Speed 2628.25 samples/sec   Loss 19.8232   LearningRate 0.0964   Epoch: 0   Global Step: 15270   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:25:55,019-Speed 2624.53 samples/sec   Loss 19.7669   LearningRate 0.0964   Epoch: 0   Global Step: 15280   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:25:58,906-Speed 2635.65 samples/sec   Loss 19.8215   LearningRate 0.0963   Epoch: 0   Global Step: 15290   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:02,818-Speed 2617.61 samples/sec   Loss 19.8195   LearningRate 0.0963   Epoch: 0   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:06,722-Speed 2624.19 samples/sec   Loss 20.0283   LearningRate 0.0963   Epoch: 0   Global Step: 15310   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:10,631-Speed 2620.25 samples/sec   Loss 19.8638   LearningRate 0.0963   Epoch: 0   Global Step: 15320   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:14,531-Speed 2626.09 samples/sec   Loss 19.8123   LearningRate 0.0963   Epoch: 0   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:18,449-Speed 2614.22 samples/sec   Loss 19.8732   LearningRate 0.0963   Epoch: 0   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:22,352-Speed 2624.56 samples/sec   Loss 19.9749   LearningRate 0.0963   Epoch: 0   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:26,259-Speed 2621.64 samples/sec   Loss 19.6293   LearningRate 0.0963   Epoch: 0   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:30,184-Speed 2609.17 samples/sec   Loss 19.7119   LearningRate 0.0963   Epoch: 0   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:34,099-Speed 2616.59 samples/sec   Loss 19.5980   LearningRate 0.0963   Epoch: 0   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:38,010-Speed 2619.08 samples/sec   Loss 19.8416   LearningRate 0.0963   Epoch: 0   Global Step: 15390   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:26:41,913-Speed 2623.92 samples/sec   Loss 19.9768   LearningRate 0.0963   Epoch: 0   Global Step: 15400   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:26:45,797-Speed 2637.59 samples/sec   Loss 19.8493   LearningRate 0.0963   Epoch: 0   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:49,701-Speed 2623.40 samples/sec   Loss 19.8868   LearningRate 0.0963   Epoch: 0   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:53,602-Speed 2625.66 samples/sec   Loss 19.9059   LearningRate 0.0963   Epoch: 0   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:26:57,500-Speed 2627.99 samples/sec   Loss 19.8015   LearningRate 0.0963   Epoch: 0   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:27:01,402-Speed 2624.59 samples/sec   Loss 19.7324   LearningRate 0.0963   Epoch: 0   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:27:05,303-Speed 2625.09 samples/sec   Loss 19.7264   LearningRate 0.0963   Epoch: 0   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:27:09,203-Speed 2626.54 samples/sec   Loss 19.7913   LearningRate 0.0963   Epoch: 0   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:27:13,106-Speed 2624.42 samples/sec   Loss 19.7186   LearningRate 0.0963   Epoch: 0   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:27:17,014-Speed 2621.46 samples/sec   Loss 19.6280   LearningRate 0.0963   Epoch: 0   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:27:20,916-Speed 2624.41 samples/sec   Loss 19.8520   LearningRate 0.0963   Epoch: 0   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:27:24,847-Speed 2605.69 samples/sec   Loss 19.7613   LearningRate 0.0963   Epoch: 0   Global Step: 15510   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:28,751-Speed 2623.64 samples/sec   Loss 19.6829   LearningRate 0.0963   Epoch: 0   Global Step: 15520   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:32,655-Speed 2623.67 samples/sec   Loss 19.8028   LearningRate 0.0963   Epoch: 0   Global Step: 15530   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:36,616-Speed 2586.10 samples/sec   Loss 19.6120   LearningRate 0.0963   Epoch: 0   Global Step: 15540   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:40,550-Speed 2603.45 samples/sec   Loss 19.6141   LearningRate 0.0963   Epoch: 0   Global Step: 15550   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:44,459-Speed 2620.43 samples/sec   Loss 19.6628   LearningRate 0.0963   Epoch: 0   Global Step: 15560   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:48,368-Speed 2620.20 samples/sec   Loss 19.8381   LearningRate 0.0963   Epoch: 0   Global Step: 15570   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:52,280-Speed 2618.38 samples/sec   Loss 19.7307   LearningRate 0.0963   Epoch: 0   Global Step: 15580   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:27:56,198-Speed 2614.13 samples/sec   Loss 19.4143   LearningRate 0.0963   Epoch: 0   Global Step: 15590   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 21:28:00,081-Speed 2637.92 samples/sec   Loss 19.7376   LearningRate 0.0963   Epoch: 0   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:03,995-Speed 2616.84 samples/sec   Loss 19.6846   LearningRate 0.0963   Epoch: 0   Global Step: 15610   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:07,912-Speed 2614.75 samples/sec   Loss 19.6380   LearningRate 0.0963   Epoch: 0   Global Step: 15620   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:11,823-Speed 2619.39 samples/sec   Loss 19.5017   LearningRate 0.0963   Epoch: 0   Global Step: 15630   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:15,738-Speed 2615.60 samples/sec   Loss 19.7234   LearningRate 0.0963   Epoch: 0   Global Step: 15640   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:19,706-Speed 2581.59 samples/sec   Loss 19.5795   LearningRate 0.0963   Epoch: 0   Global Step: 15650   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:23,608-Speed 2624.84 samples/sec   Loss 19.6088   LearningRate 0.0963   Epoch: 0   Global Step: 15660   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:27,643-Speed 2538.50 samples/sec   Loss 19.5246   LearningRate 0.0963   Epoch: 0   Global Step: 15670   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:31,548-Speed 2622.92 samples/sec   Loss 19.5502   LearningRate 0.0963   Epoch: 0   Global Step: 15680   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:35,464-Speed 2615.55 samples/sec   Loss 19.3298   LearningRate 0.0963   Epoch: 0   Global Step: 15690   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:28:39,377-Speed 2617.61 samples/sec   Loss 19.3995   LearningRate 0.0963   Epoch: 0   Global Step: 15700   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:28:43,301-Speed 2609.89 samples/sec   Loss 19.7105   LearningRate 0.0962   Epoch: 0   Global Step: 15710   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:28:47,216-Speed 2618.04 samples/sec   Loss 19.8256   LearningRate 0.0962   Epoch: 0   Global Step: 15720   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:28:51,122-Speed 2622.49 samples/sec   Loss 19.8013   LearningRate 0.0962   Epoch: 0   Global Step: 15730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:28:55,019-Speed 2628.16 samples/sec   Loss 19.6269   LearningRate 0.0962   Epoch: 0   Global Step: 15740   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:28:58,916-Speed 2628.66 samples/sec   Loss 19.5726   LearningRate 0.0962   Epoch: 0   Global Step: 15750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:02,821-Speed 2622.90 samples/sec   Loss 19.4912   LearningRate 0.0962   Epoch: 0   Global Step: 15760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:06,720-Speed 2626.40 samples/sec   Loss 19.5454   LearningRate 0.0962   Epoch: 0   Global Step: 15770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:10,618-Speed 2628.19 samples/sec   Loss 19.4650   LearningRate 0.0962   Epoch: 0   Global Step: 15780   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:14,524-Speed 2622.00 samples/sec   Loss 19.5618   LearningRate 0.0962   Epoch: 0   Global Step: 15790   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:18,442-Speed 2614.49 samples/sec   Loss 19.4869   LearningRate 0.0962   Epoch: 0   Global Step: 15800   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:22,338-Speed 2628.46 samples/sec   Loss 19.5134   LearningRate 0.0962   Epoch: 0   Global Step: 15810   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:26,246-Speed 2621.38 samples/sec   Loss 19.3377   LearningRate 0.0962   Epoch: 0   Global Step: 15820   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:30,255-Speed 2554.37 samples/sec   Loss 19.5641   LearningRate 0.0962   Epoch: 0   Global Step: 15830   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:34,161-Speed 2622.39 samples/sec   Loss 19.5162   LearningRate 0.0962   Epoch: 0   Global Step: 15840   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:38,062-Speed 2625.20 samples/sec   Loss 19.3122   LearningRate 0.0962   Epoch: 0   Global Step: 15850   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:29:42,058-Speed 2563.68 samples/sec   Loss 19.5205   LearningRate 0.0962   Epoch: 0   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:29:46,094-Speed 2537.50 samples/sec   Loss 19.4667   LearningRate 0.0962   Epoch: 0   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:29:50,129-Speed 2538.84 samples/sec   Loss 19.4186   LearningRate 0.0962   Epoch: 0   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:29:54,031-Speed 2624.49 samples/sec   Loss 19.5073   LearningRate 0.0962   Epoch: 0   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:29:57,992-Speed 2586.81 samples/sec   Loss 19.3129   LearningRate 0.0962   Epoch: 0   Global Step: 15900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:30:02,058-Speed 2518.70 samples/sec   Loss 19.4443   LearningRate 0.0962   Epoch: 0   Global Step: 15910   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:30:05,978-Speed 2613.18 samples/sec   Loss 19.4529   LearningRate 0.0962   Epoch: 0   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:30:09,877-Speed 2626.37 samples/sec   Loss 19.4327   LearningRate 0.0962   Epoch: 0   Global Step: 15930   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:30:13,827-Speed 2593.45 samples/sec   Loss 19.6399   LearningRate 0.0962   Epoch: 0   Global Step: 15940   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:30:17,816-Speed 2568.25 samples/sec   Loss 19.3149   LearningRate 0.0962   Epoch: 0   Global Step: 15950   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:30:21,722-Speed 2622.01 samples/sec   Loss 19.3308   LearningRate 0.0962   Epoch: 0   Global Step: 15960   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:25,632-Speed 2619.29 samples/sec   Loss 19.4301   LearningRate 0.0962   Epoch: 0   Global Step: 15970   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:29,546-Speed 2617.28 samples/sec   Loss 19.2976   LearningRate 0.0962   Epoch: 0   Global Step: 15980   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:33,455-Speed 2620.86 samples/sec   Loss 19.3772   LearningRate 0.0962   Epoch: 0   Global Step: 15990   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:37,368-Speed 2616.92 samples/sec   Loss 19.3519   LearningRate 0.0962   Epoch: 0   Global Step: 16000   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:41,275-Speed 2621.23 samples/sec   Loss 19.4913   LearningRate 0.0962   Epoch: 0   Global Step: 16010   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:45,187-Speed 2618.16 samples/sec   Loss 19.4049   LearningRate 0.0962   Epoch: 0   Global Step: 16020   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:49,211-Speed 2545.81 samples/sec   Loss 19.5115   LearningRate 0.0962   Epoch: 0   Global Step: 16030   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:53,309-Speed 2499.64 samples/sec   Loss 19.4723   LearningRate 0.0962   Epoch: 0   Global Step: 16040   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:30:57,390-Speed 2510.02 samples/sec   Loss 19.3057   LearningRate 0.0962   Epoch: 0   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:01,287-Speed 2628.43 samples/sec   Loss 19.2148   LearningRate 0.0962   Epoch: 0   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:05,208-Speed 2612.27 samples/sec   Loss 19.4634   LearningRate 0.0962   Epoch: 0   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:09,131-Speed 2610.96 samples/sec   Loss 19.4095   LearningRate 0.0962   Epoch: 0   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:13,044-Speed 2617.73 samples/sec   Loss 19.4027   LearningRate 0.0962   Epoch: 0   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:16,959-Speed 2616.31 samples/sec   Loss 19.2628   LearningRate 0.0962   Epoch: 0   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:20,862-Speed 2624.19 samples/sec   Loss 19.2431   LearningRate 0.0962   Epoch: 0   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:24,762-Speed 2626.03 samples/sec   Loss 19.2754   LearningRate 0.0962   Epoch: 0   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:28,662-Speed 2626.99 samples/sec   Loss 19.2482   LearningRate 0.0961   Epoch: 0   Global Step: 16130   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:32,557-Speed 2629.66 samples/sec   Loss 19.3709   LearningRate 0.0961   Epoch: 0   Global Step: 16140   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:36,457-Speed 2626.32 samples/sec   Loss 19.3035   LearningRate 0.0961   Epoch: 0   Global Step: 16150   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:31:40,368-Speed 2618.69 samples/sec   Loss 19.3423   LearningRate 0.0961   Epoch: 0   Global Step: 16160   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:31:44,272-Speed 2623.22 samples/sec   Loss 19.1420   LearningRate 0.0961   Epoch: 0   Global Step: 16170   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:31:48,254-Speed 2572.32 samples/sec   Loss 19.2104   LearningRate 0.0961   Epoch: 0   Global Step: 16180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:52,155-Speed 2626.31 samples/sec   Loss 19.4840   LearningRate 0.0961   Epoch: 0   Global Step: 16190   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:56,053-Speed 2627.84 samples/sec   Loss 19.4578   LearningRate 0.0961   Epoch: 0   Global Step: 16200   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:31:59,956-Speed 2624.16 samples/sec   Loss 19.3557   LearningRate 0.0961   Epoch: 0   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:32:03,854-Speed 2627.85 samples/sec   Loss 19.2880   LearningRate 0.0961   Epoch: 0   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:32:07,749-Speed 2629.44 samples/sec   Loss 19.3535   LearningRate 0.0961   Epoch: 0   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:32:11,650-Speed 2625.58 samples/sec   Loss 19.1594   LearningRate 0.0961   Epoch: 0   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:32:15,550-Speed 2626.13 samples/sec   Loss 19.2143   LearningRate 0.0961   Epoch: 0   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:32:19,458-Speed 2620.97 samples/sec   Loss 19.3322   LearningRate 0.0961   Epoch: 0   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:32:23,356-Speed 2627.85 samples/sec   Loss 19.2327   LearningRate 0.0961   Epoch: 0   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:32:27,258-Speed 2625.15 samples/sec   Loss 19.4268   LearningRate 0.0961   Epoch: 0   Global Step: 16280   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:31,161-Speed 2623.75 samples/sec   Loss 19.2140   LearningRate 0.0961   Epoch: 0   Global Step: 16290   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:35,060-Speed 2627.29 samples/sec   Loss 19.1399   LearningRate 0.0961   Epoch: 0   Global Step: 16300   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:38,990-Speed 2606.70 samples/sec   Loss 19.2792   LearningRate 0.0961   Epoch: 0   Global Step: 16310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:42,888-Speed 2627.41 samples/sec   Loss 19.2350   LearningRate 0.0961   Epoch: 0   Global Step: 16320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:46,810-Speed 2611.75 samples/sec   Loss 19.1820   LearningRate 0.0961   Epoch: 0   Global Step: 16330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:50,723-Speed 2617.77 samples/sec   Loss 19.1477   LearningRate 0.0961   Epoch: 0   Global Step: 16340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:54,621-Speed 2627.40 samples/sec   Loss 19.0761   LearningRate 0.0961   Epoch: 0   Global Step: 16350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:32:58,503-Speed 2638.06 samples/sec   Loss 19.1807   LearningRate 0.0961   Epoch: 0   Global Step: 16360   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:02,418-Speed 2616.28 samples/sec   Loss 19.1691   LearningRate 0.0961   Epoch: 0   Global Step: 16370   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:06,328-Speed 2620.27 samples/sec   Loss 19.0885   LearningRate 0.0961   Epoch: 0   Global Step: 16380   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:10,225-Speed 2628.04 samples/sec   Loss 19.0631   LearningRate 0.0961   Epoch: 0   Global Step: 16390   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:14,132-Speed 2621.97 samples/sec   Loss 19.1683   LearningRate 0.0961   Epoch: 0   Global Step: 16400   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:18,037-Speed 2623.37 samples/sec   Loss 19.2033   LearningRate 0.0961   Epoch: 0   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:21,950-Speed 2617.32 samples/sec   Loss 19.1899   LearningRate 0.0961   Epoch: 0   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:25,850-Speed 2626.84 samples/sec   Loss 19.1508   LearningRate 0.0961   Epoch: 0   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:29,753-Speed 2623.98 samples/sec   Loss 18.9315   LearningRate 0.0961   Epoch: 0   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:33,660-Speed 2621.45 samples/sec   Loss 19.1554   LearningRate 0.0961   Epoch: 0   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:37,571-Speed 2618.63 samples/sec   Loss 19.0474   LearningRate 0.0961   Epoch: 0   Global Step: 16460   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:33:41,474-Speed 2624.75 samples/sec   Loss 19.1450   LearningRate 0.0961   Epoch: 0   Global Step: 16470   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:33:45,386-Speed 2618.02 samples/sec   Loss 19.0001   LearningRate 0.0961   Epoch: 0   Global Step: 16480   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:33:49,282-Speed 2632.65 samples/sec   Loss 19.1274   LearningRate 0.0961   Epoch: 0   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:53,173-Speed 2632.32 samples/sec   Loss 19.0434   LearningRate 0.0961   Epoch: 0   Global Step: 16500   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:33:57,075-Speed 2625.07 samples/sec   Loss 19.0653   LearningRate 0.0961   Epoch: 0   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:00,971-Speed 2629.12 samples/sec   Loss 18.8936   LearningRate 0.0961   Epoch: 0   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:04,944-Speed 2577.77 samples/sec   Loss 19.0752   LearningRate 0.0961   Epoch: 0   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:08,976-Speed 2539.86 samples/sec   Loss 18.9252   LearningRate 0.0961   Epoch: 0   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:12,970-Speed 2565.12 samples/sec   Loss 19.3036   LearningRate 0.0960   Epoch: 0   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:16,883-Speed 2617.36 samples/sec   Loss 18.9670   LearningRate 0.0960   Epoch: 0   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:20,795-Speed 2618.62 samples/sec   Loss 19.1126   LearningRate 0.0960   Epoch: 0   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:24,700-Speed 2622.63 samples/sec   Loss 18.8779   LearningRate 0.0960   Epoch: 0   Global Step: 16580   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:28,599-Speed 2627.26 samples/sec   Loss 18.9406   LearningRate 0.0960   Epoch: 0   Global Step: 16590   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:34:32,497-Speed 2627.48 samples/sec   Loss 19.0408   LearningRate 0.0960   Epoch: 0   Global Step: 16600   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:34:36,408-Speed 2619.08 samples/sec   Loss 19.0475   LearningRate 0.0960   Epoch: 0   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:40,338-Speed 2606.12 samples/sec   Loss 19.0731   LearningRate 0.0960   Epoch: 0   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:44,236-Speed 2627.67 samples/sec   Loss 18.8711   LearningRate 0.0960   Epoch: 0   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:48,140-Speed 2623.90 samples/sec   Loss 18.9269   LearningRate 0.0960   Epoch: 0   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:52,041-Speed 2625.67 samples/sec   Loss 18.8855   LearningRate 0.0960   Epoch: 0   Global Step: 16650   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:55,978-Speed 2601.64 samples/sec   Loss 18.8269   LearningRate 0.0960   Epoch: 0   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:34:59,882-Speed 2623.51 samples/sec   Loss 18.9991   LearningRate 0.0960   Epoch: 0   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:35:03,795-Speed 2617.78 samples/sec   Loss 18.9296   LearningRate 0.0960   Epoch: 0   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:35:07,696-Speed 2625.22 samples/sec   Loss 19.0012   LearningRate 0.0960   Epoch: 0   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:35:11,593-Speed 2628.29 samples/sec   Loss 19.0163   LearningRate 0.0960   Epoch: 0   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:35:15,493-Speed 2626.50 samples/sec   Loss 18.9328   LearningRate 0.0960   Epoch: 0   Global Step: 16710   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:19,408-Speed 2616.36 samples/sec   Loss 18.8871   LearningRate 0.0960   Epoch: 0   Global Step: 16720   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:23,348-Speed 2599.57 samples/sec   Loss 18.9915   LearningRate 0.0960   Epoch: 0   Global Step: 16730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:27,264-Speed 2615.75 samples/sec   Loss 18.8782   LearningRate 0.0960   Epoch: 0   Global Step: 16740   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:31,169-Speed 2623.09 samples/sec   Loss 18.8398   LearningRate 0.0960   Epoch: 0   Global Step: 16750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:35,066-Speed 2628.27 samples/sec   Loss 18.8749   LearningRate 0.0960   Epoch: 0   Global Step: 16760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:38,974-Speed 2620.58 samples/sec   Loss 18.8228   LearningRate 0.0960   Epoch: 0   Global Step: 16770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:42,885-Speed 2619.51 samples/sec   Loss 18.7446   LearningRate 0.0960   Epoch: 0   Global Step: 16780   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:46,796-Speed 2618.18 samples/sec   Loss 18.9940   LearningRate 0.0960   Epoch: 0   Global Step: 16790   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:50,810-Speed 2551.89 samples/sec   Loss 18.6806   LearningRate 0.0960   Epoch: 0   Global Step: 16800   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:35:54,899-Speed 2505.24 samples/sec   Loss 18.9573   LearningRate 0.0960   Epoch: 0   Global Step: 16810   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 21:35:58,992-Speed 2502.66 samples/sec   Loss 18.9461   LearningRate 0.0960   Epoch: 0   Global Step: 16820   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 21:36:02,948-Speed 2588.99 samples/sec   Loss 19.0081   LearningRate 0.0960   Epoch: 0   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:06,846-Speed 2627.72 samples/sec   Loss 18.9221   LearningRate 0.0960   Epoch: 0   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:10,769-Speed 2610.94 samples/sec   Loss 18.8652   LearningRate 0.0960   Epoch: 0   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:14,674-Speed 2623.04 samples/sec   Loss 18.9036   LearningRate 0.0960   Epoch: 0   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:18,601-Speed 2608.72 samples/sec   Loss 18.9129   LearningRate 0.0960   Epoch: 0   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:22,498-Speed 2628.05 samples/sec   Loss 18.8909   LearningRate 0.0960   Epoch: 0   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:26,420-Speed 2611.75 samples/sec   Loss 18.7915   LearningRate 0.0960   Epoch: 0   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:30,323-Speed 2624.79 samples/sec   Loss 18.9109   LearningRate 0.0960   Epoch: 0   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:34,222-Speed 2626.87 samples/sec   Loss 18.8309   LearningRate 0.0960   Epoch: 0   Global Step: 16910   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:38,227-Speed 2557.58 samples/sec   Loss 18.8111   LearningRate 0.0960   Epoch: 0   Global Step: 16920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:42,154-Speed 2607.83 samples/sec   Loss 18.8319   LearningRate 0.0960   Epoch: 0   Global Step: 16930   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:36:46,054-Speed 2626.83 samples/sec   Loss 18.9484   LearningRate 0.0960   Epoch: 0   Global Step: 16940   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:36:49,943-Speed 2633.54 samples/sec   Loss 18.8288   LearningRate 0.0960   Epoch: 0   Global Step: 16950   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:53,850-Speed 2621.58 samples/sec   Loss 18.7720   LearningRate 0.0960   Epoch: 0   Global Step: 16960   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:36:57,774-Speed 2610.18 samples/sec   Loss 18.7448   LearningRate 0.0960   Epoch: 0   Global Step: 16970   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:01,680-Speed 2622.60 samples/sec   Loss 18.7248   LearningRate 0.0959   Epoch: 0   Global Step: 16980   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:05,581-Speed 2625.12 samples/sec   Loss 18.8466   LearningRate 0.0959   Epoch: 0   Global Step: 16990   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:09,550-Speed 2580.51 samples/sec   Loss 18.6689   LearningRate 0.0959   Epoch: 0   Global Step: 17000   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:13,456-Speed 2622.25 samples/sec   Loss 18.7883   LearningRate 0.0959   Epoch: 0   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:17,368-Speed 2618.40 samples/sec   Loss 18.6970   LearningRate 0.0959   Epoch: 0   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:21,272-Speed 2623.20 samples/sec   Loss 18.6128   LearningRate 0.0959   Epoch: 0   Global Step: 17030   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:25,177-Speed 2623.79 samples/sec   Loss 18.6981   LearningRate 0.0959   Epoch: 0   Global Step: 17040   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:29,078-Speed 2625.17 samples/sec   Loss 18.7014   LearningRate 0.0959   Epoch: 0   Global Step: 17050   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:37:32,975-Speed 2628.66 samples/sec   Loss 18.7558   LearningRate 0.0959   Epoch: 0   Global Step: 17060   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:37:37,007-Speed 2540.36 samples/sec   Loss 18.5585   LearningRate 0.0959   Epoch: 0   Global Step: 17070   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:37:41,093-Speed 2506.68 samples/sec   Loss 18.6075   LearningRate 0.0959   Epoch: 0   Global Step: 17080   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:37:45,111-Speed 2549.05 samples/sec   Loss 18.6098   LearningRate 0.0959   Epoch: 0   Global Step: 17090   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:49,005-Speed 2630.43 samples/sec   Loss 18.8486   LearningRate 0.0959   Epoch: 0   Global Step: 17100   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:52,910-Speed 2622.49 samples/sec   Loss 18.6959   LearningRate 0.0959   Epoch: 0   Global Step: 17110   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:37:56,882-Speed 2579.44 samples/sec   Loss 18.6890   LearningRate 0.0959   Epoch: 0   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:00,787-Speed 2622.99 samples/sec   Loss 18.8730   LearningRate 0.0959   Epoch: 0   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:04,690-Speed 2624.20 samples/sec   Loss 18.7591   LearningRate 0.0959   Epoch: 0   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:08,586-Speed 2628.62 samples/sec   Loss 18.8309   LearningRate 0.0959   Epoch: 0   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:12,483-Speed 2628.53 samples/sec   Loss 18.5447   LearningRate 0.0959   Epoch: 0   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:16,377-Speed 2630.17 samples/sec   Loss 18.6145   LearningRate 0.0959   Epoch: 0   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:20,280-Speed 2623.93 samples/sec   Loss 18.7063   LearningRate 0.0959   Epoch: 0   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:24,184-Speed 2623.50 samples/sec   Loss 18.5919   LearningRate 0.0959   Epoch: 0   Global Step: 17190   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:38:28,080-Speed 2629.96 samples/sec   Loss 18.6246   LearningRate 0.0959   Epoch: 0   Global Step: 17200   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:38:31,958-Speed 2640.86 samples/sec   Loss 18.3443   LearningRate 0.0959   Epoch: 0   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:35,854-Speed 2628.88 samples/sec   Loss 18.6411   LearningRate 0.0959   Epoch: 0   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:39,774-Speed 2612.85 samples/sec   Loss 18.7173   LearningRate 0.0959   Epoch: 0   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:43,691-Speed 2615.36 samples/sec   Loss 18.6190   LearningRate 0.0959   Epoch: 0   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:47,625-Speed 2603.90 samples/sec   Loss 18.5798   LearningRate 0.0959   Epoch: 0   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:51,531-Speed 2622.72 samples/sec   Loss 18.5093   LearningRate 0.0959   Epoch: 0   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:55,444-Speed 2616.85 samples/sec   Loss 18.6254   LearningRate 0.0959   Epoch: 0   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:38:59,348-Speed 2623.56 samples/sec   Loss 18.7235   LearningRate 0.0959   Epoch: 0   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:39:03,255-Speed 2621.71 samples/sec   Loss 18.6812   LearningRate 0.0959   Epoch: 0   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:39:07,164-Speed 2620.70 samples/sec   Loss 18.5658   LearningRate 0.0959   Epoch: 0   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:39:11,066-Speed 2624.93 samples/sec   Loss 18.5562   LearningRate 0.0959   Epoch: 0   Global Step: 17310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:39:14,973-Speed 2621.72 samples/sec   Loss 18.6058   LearningRate 0.0959   Epoch: 0   Global Step: 17320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:39:18,908-Speed 2603.17 samples/sec   Loss 18.5384   LearningRate 0.0959   Epoch: 0   Global Step: 17330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:39:22,818-Speed 2619.81 samples/sec   Loss 18.6768   LearningRate 0.0959   Epoch: 0   Global Step: 17340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:39:26,718-Speed 2626.08 samples/sec   Loss 18.5012   LearningRate 0.0959   Epoch: 0   Global Step: 17350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:39:30,620-Speed 2625.14 samples/sec   Loss 18.6699   LearningRate 0.0959   Epoch: 0   Global Step: 17360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:39:34,491-Speed 2645.98 samples/sec   Loss 18.6062   LearningRate 0.0959   Epoch: 0   Global Step: 17370   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:39:38,382-Speed 2631.92 samples/sec   Loss 18.5708   LearningRate 0.0959   Epoch: 0   Global Step: 17380   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:39:42,288-Speed 2622.43 samples/sec   Loss 18.6713   LearningRate 0.0959   Epoch: 0   Global Step: 17390   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:39:46,216-Speed 2607.55 samples/sec   Loss 18.4591   LearningRate 0.0958   Epoch: 0   Global Step: 17400   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:39:50,126-Speed 2620.42 samples/sec   Loss 18.2639   LearningRate 0.0958   Epoch: 0   Global Step: 17410   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:39:54,032-Speed 2621.56 samples/sec   Loss 18.4446   LearningRate 0.0958   Epoch: 0   Global Step: 17420   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:39:57,943-Speed 2619.19 samples/sec   Loss 18.3904   LearningRate 0.0958   Epoch: 0   Global Step: 17430   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:40:01,847-Speed 2623.29 samples/sec   Loss 18.4460   LearningRate 0.0958   Epoch: 0   Global Step: 17440   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:40:05,762-Speed 2616.52 samples/sec   Loss 18.4449   LearningRate 0.0958   Epoch: 0   Global Step: 17450   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:40:09,666-Speed 2623.24 samples/sec   Loss 18.5072   LearningRate 0.0958   Epoch: 0   Global Step: 17460   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:40:13,566-Speed 2626.64 samples/sec   Loss 18.5321   LearningRate 0.0958   Epoch: 0   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:17,468-Speed 2625.06 samples/sec   Loss 18.5781   LearningRate 0.0958   Epoch: 0   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:21,396-Speed 2607.82 samples/sec   Loss 18.3992   LearningRate 0.0958   Epoch: 0   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:25,297-Speed 2625.03 samples/sec   Loss 18.6779   LearningRate 0.0958   Epoch: 0   Global Step: 17500   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:29,249-Speed 2591.94 samples/sec   Loss 18.5081   LearningRate 0.0958   Epoch: 0   Global Step: 17510   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:33,157-Speed 2621.47 samples/sec   Loss 18.3493   LearningRate 0.0958   Epoch: 0   Global Step: 17520   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:37,064-Speed 2621.40 samples/sec   Loss 18.3773   LearningRate 0.0958   Epoch: 0   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:40,974-Speed 2619.63 samples/sec   Loss 18.5439   LearningRate 0.0958   Epoch: 0   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:44,882-Speed 2621.15 samples/sec   Loss 18.2566   LearningRate 0.0958   Epoch: 0   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:48,796-Speed 2617.36 samples/sec   Loss 18.5245   LearningRate 0.0958   Epoch: 0   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:40:52,707-Speed 2618.18 samples/sec   Loss 18.4863   LearningRate 0.0958   Epoch: 0   Global Step: 17570   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:40:56,680-Speed 2578.99 samples/sec   Loss 18.6396   LearningRate 0.0958   Epoch: 0   Global Step: 17580   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:41:00,779-Speed 2498.66 samples/sec   Loss 18.4179   LearningRate 0.0958   Epoch: 0   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:04,783-Speed 2557.97 samples/sec   Loss 18.4413   LearningRate 0.0958   Epoch: 0   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:08,685-Speed 2624.85 samples/sec   Loss 18.4886   LearningRate 0.0958   Epoch: 0   Global Step: 17610   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:12,591-Speed 2622.69 samples/sec   Loss 18.3128   LearningRate 0.0958   Epoch: 0   Global Step: 17620   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:16,496-Speed 2622.35 samples/sec   Loss 18.3152   LearningRate 0.0958   Epoch: 0   Global Step: 17630   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:20,462-Speed 2582.73 samples/sec   Loss 18.5744   LearningRate 0.0958   Epoch: 0   Global Step: 17640   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:24,361-Speed 2627.34 samples/sec   Loss 18.5958   LearningRate 0.0958   Epoch: 0   Global Step: 17650   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:28,264-Speed 2624.66 samples/sec   Loss 18.5716   LearningRate 0.0958   Epoch: 0   Global Step: 17660   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:32,168-Speed 2623.61 samples/sec   Loss 18.5943   LearningRate 0.0958   Epoch: 0   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:36,071-Speed 2624.32 samples/sec   Loss 18.2299   LearningRate 0.0958   Epoch: 0   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:41:39,979-Speed 2620.62 samples/sec   Loss 18.5051   LearningRate 0.0958   Epoch: 0   Global Step: 17690   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:41:43,886-Speed 2621.81 samples/sec   Loss 18.2257   LearningRate 0.0958   Epoch: 0   Global Step: 17700   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:41:47,796-Speed 2619.64 samples/sec   Loss 18.4533   LearningRate 0.0958   Epoch: 0   Global Step: 17710   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:41:51,701-Speed 2622.97 samples/sec   Loss 18.4208   LearningRate 0.0958   Epoch: 0   Global Step: 17720   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:41:55,601-Speed 2626.00 samples/sec   Loss 18.3075   LearningRate 0.0958   Epoch: 0   Global Step: 17730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:41:59,501-Speed 2627.20 samples/sec   Loss 18.4998   LearningRate 0.0958   Epoch: 0   Global Step: 17740   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:42:03,414-Speed 2617.17 samples/sec   Loss 18.4351   LearningRate 0.0958   Epoch: 0   Global Step: 17750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:42:07,314-Speed 2626.30 samples/sec   Loss 18.4997   LearningRate 0.0958   Epoch: 0   Global Step: 17760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:42:11,228-Speed 2616.37 samples/sec   Loss 18.3818   LearningRate 0.0958   Epoch: 0   Global Step: 17770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:42:15,169-Speed 2599.73 samples/sec   Loss 18.3415   LearningRate 0.0958   Epoch: 0   Global Step: 17780   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:42:19,051-Speed 2639.00 samples/sec   Loss 18.3828   LearningRate 0.0958   Epoch: 0   Global Step: 17790   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:42:22,954-Speed 2624.15 samples/sec   Loss 18.2549   LearningRate 0.0958   Epoch: 0   Global Step: 17800   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:42:26,850-Speed 2629.19 samples/sec   Loss 18.3265   LearningRate 0.0958   Epoch: 0   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:30,752-Speed 2625.02 samples/sec   Loss 18.4371   LearningRate 0.0957   Epoch: 0   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:34,779-Speed 2543.70 samples/sec   Loss 18.3358   LearningRate 0.0957   Epoch: 0   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:38,758-Speed 2574.11 samples/sec   Loss 18.3665   LearningRate 0.0957   Epoch: 0   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:42,668-Speed 2619.68 samples/sec   Loss 18.4066   LearningRate 0.0957   Epoch: 0   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:46,573-Speed 2622.98 samples/sec   Loss 18.3307   LearningRate 0.0957   Epoch: 0   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:50,483-Speed 2619.34 samples/sec   Loss 18.3592   LearningRate 0.0957   Epoch: 0   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:54,386-Speed 2624.52 samples/sec   Loss 18.4958   LearningRate 0.0957   Epoch: 0   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:42:58,284-Speed 2627.60 samples/sec   Loss 18.2821   LearningRate 0.0957   Epoch: 0   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:43:02,197-Speed 2617.83 samples/sec   Loss 18.3501   LearningRate 0.0957   Epoch: 0   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:43:06,115-Speed 2614.22 samples/sec   Loss 18.3933   LearningRate 0.0957   Epoch: 0   Global Step: 17910   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:10,028-Speed 2618.31 samples/sec   Loss 18.4611   LearningRate 0.0957   Epoch: 0   Global Step: 17920   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:13,925-Speed 2628.15 samples/sec   Loss 18.2330   LearningRate 0.0957   Epoch: 0   Global Step: 17930   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:17,909-Speed 2570.66 samples/sec   Loss 18.2776   LearningRate 0.0957   Epoch: 0   Global Step: 17940   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:21,809-Speed 2625.87 samples/sec   Loss 18.4361   LearningRate 0.0957   Epoch: 0   Global Step: 17950   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:25,709-Speed 2625.96 samples/sec   Loss 18.1383   LearningRate 0.0957   Epoch: 0   Global Step: 17960   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:29,614-Speed 2623.40 samples/sec   Loss 18.1212   LearningRate 0.0957   Epoch: 0   Global Step: 17970   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:33,525-Speed 2619.10 samples/sec   Loss 18.3211   LearningRate 0.0957   Epoch: 0   Global Step: 17980   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:37,429-Speed 2623.54 samples/sec   Loss 18.0621   LearningRate 0.0957   Epoch: 0   Global Step: 17990   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:43:41,315-Speed 2636.31 samples/sec   Loss 18.2717   LearningRate 0.0957   Epoch: 0   Global Step: 18000   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:43:45,220-Speed 2622.67 samples/sec   Loss 18.3641   LearningRate 0.0957   Epoch: 0   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:43:49,131-Speed 2618.23 samples/sec   Loss 18.3980   LearningRate 0.0957   Epoch: 0   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:43:53,043-Speed 2618.43 samples/sec   Loss 18.2029   LearningRate 0.0957   Epoch: 0   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:43:56,952-Speed 2620.34 samples/sec   Loss 18.0442   LearningRate 0.0957   Epoch: 0   Global Step: 18040   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:44:00,863-Speed 2619.11 samples/sec   Loss 18.2281   LearningRate 0.0957   Epoch: 0   Global Step: 18050   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:44:04,780-Speed 2614.85 samples/sec   Loss 18.3425   LearningRate 0.0957   Epoch: 0   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:44:08,690-Speed 2619.99 samples/sec   Loss 18.3728   LearningRate 0.0957   Epoch: 0   Global Step: 18070   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:44:12,587-Speed 2628.06 samples/sec   Loss 18.1922   LearningRate 0.0957   Epoch: 0   Global Step: 18080   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:44:16,488-Speed 2625.31 samples/sec   Loss 18.2622   LearningRate 0.0957   Epoch: 0   Global Step: 18090   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:44:20,408-Speed 2612.75 samples/sec   Loss 18.0871   LearningRate 0.0957   Epoch: 0   Global Step: 18100   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:24,307-Speed 2627.30 samples/sec   Loss 18.1889   LearningRate 0.0957   Epoch: 0   Global Step: 18110   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:28,228-Speed 2611.77 samples/sec   Loss 18.1746   LearningRate 0.0957   Epoch: 0   Global Step: 18120   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:32,133-Speed 2622.73 samples/sec   Loss 18.1558   LearningRate 0.0957   Epoch: 0   Global Step: 18130   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:36,039-Speed 2622.34 samples/sec   Loss 18.1852   LearningRate 0.0957   Epoch: 0   Global Step: 18140   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:39,944-Speed 2623.39 samples/sec   Loss 18.2432   LearningRate 0.0957   Epoch: 0   Global Step: 18150   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:43,846-Speed 2625.36 samples/sec   Loss 18.1496   LearningRate 0.0957   Epoch: 0   Global Step: 18160   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:47,768-Speed 2611.47 samples/sec   Loss 18.1871   LearningRate 0.0957   Epoch: 0   Global Step: 18170   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:51,680-Speed 2618.25 samples/sec   Loss 18.2074   LearningRate 0.0957   Epoch: 0   Global Step: 18180   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:55,582-Speed 2624.91 samples/sec   Loss 18.1602   LearningRate 0.0957   Epoch: 0   Global Step: 18190   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:44:59,660-Speed 2511.82 samples/sec   Loss 18.1146   LearningRate 0.0957   Epoch: 0   Global Step: 18200   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:45:03,780-Speed 2485.94 samples/sec   Loss 18.3924   LearningRate 0.0957   Epoch: 0   Global Step: 18210   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:45:07,806-Speed 2544.04 samples/sec   Loss 18.3129   LearningRate 0.0957   Epoch: 0   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:11,708-Speed 2625.20 samples/sec   Loss 18.3000   LearningRate 0.0957   Epoch: 0   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:15,643-Speed 2603.09 samples/sec   Loss 18.0136   LearningRate 0.0957   Epoch: 0   Global Step: 18240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:19,550-Speed 2621.96 samples/sec   Loss 18.1801   LearningRate 0.0956   Epoch: 0   Global Step: 18250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:23,457-Speed 2621.54 samples/sec   Loss 18.0841   LearningRate 0.0956   Epoch: 0   Global Step: 18260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:27,358-Speed 2625.86 samples/sec   Loss 18.1076   LearningRate 0.0956   Epoch: 0   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:31,256-Speed 2627.70 samples/sec   Loss 18.1984   LearningRate 0.0956   Epoch: 0   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:35,171-Speed 2615.92 samples/sec   Loss 18.1256   LearningRate 0.0956   Epoch: 0   Global Step: 18290   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:39,067-Speed 2628.68 samples/sec   Loss 18.0985   LearningRate 0.0956   Epoch: 0   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:42,989-Speed 2612.02 samples/sec   Loss 18.0364   LearningRate 0.0956   Epoch: 0   Global Step: 18310   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:45:46,885-Speed 2628.70 samples/sec   Loss 18.1798   LearningRate 0.0956   Epoch: 0   Global Step: 18320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:45:50,785-Speed 2626.68 samples/sec   Loss 18.0837   LearningRate 0.0956   Epoch: 0   Global Step: 18330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:45:54,689-Speed 2623.63 samples/sec   Loss 18.1257   LearningRate 0.0956   Epoch: 0   Global Step: 18340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:45:58,585-Speed 2628.71 samples/sec   Loss 18.0332   LearningRate 0.0956   Epoch: 0   Global Step: 18350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:46:02,485-Speed 2626.25 samples/sec   Loss 18.3143   LearningRate 0.0956   Epoch: 0   Global Step: 18360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:46:06,385-Speed 2626.08 samples/sec   Loss 18.0062   LearningRate 0.0956   Epoch: 0   Global Step: 18370   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:46:10,288-Speed 2624.24 samples/sec   Loss 18.1183   LearningRate 0.0956   Epoch: 0   Global Step: 18380   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:46:14,193-Speed 2623.28 samples/sec   Loss 18.1862   LearningRate 0.0956   Epoch: 0   Global Step: 18390   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:46:18,072-Speed 2640.72 samples/sec   Loss 18.1182   LearningRate 0.0956   Epoch: 0   Global Step: 18400   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:21,969-Speed 2628.18 samples/sec   Loss 18.0740   LearningRate 0.0956   Epoch: 0   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:25,869-Speed 2625.96 samples/sec   Loss 17.8848   LearningRate 0.0956   Epoch: 0   Global Step: 18420   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:29,778-Speed 2620.79 samples/sec   Loss 18.1338   LearningRate 0.0956   Epoch: 0   Global Step: 18430   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:33,674-Speed 2628.59 samples/sec   Loss 18.2178   LearningRate 0.0956   Epoch: 0   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:37,609-Speed 2602.63 samples/sec   Loss 18.0238   LearningRate 0.0956   Epoch: 0   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:41,519-Speed 2619.21 samples/sec   Loss 18.0538   LearningRate 0.0956   Epoch: 0   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:45,419-Speed 2627.02 samples/sec   Loss 18.0097   LearningRate 0.0956   Epoch: 0   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:49,320-Speed 2625.89 samples/sec   Loss 18.1874   LearningRate 0.0956   Epoch: 0   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:53,384-Speed 2520.29 samples/sec   Loss 18.2285   LearningRate 0.0956   Epoch: 0   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:46:57,368-Speed 2571.00 samples/sec   Loss 18.1414   LearningRate 0.0956   Epoch: 0   Global Step: 18500   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:47:01,272-Speed 2623.65 samples/sec   Loss 18.1673   LearningRate 0.0956   Epoch: 0   Global Step: 18510   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:47:05,187-Speed 2616.02 samples/sec   Loss 18.0343   LearningRate 0.0956   Epoch: 0   Global Step: 18520   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:47:09,088-Speed 2625.21 samples/sec   Loss 18.0002   LearningRate 0.0956   Epoch: 0   Global Step: 18530   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:47:12,986-Speed 2628.00 samples/sec   Loss 18.1295   LearningRate 0.0956   Epoch: 0   Global Step: 18540   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:47:16,887-Speed 2625.47 samples/sec   Loss 17.8831   LearningRate 0.0956   Epoch: 0   Global Step: 18550   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:47:20,797-Speed 2620.03 samples/sec   Loss 18.0493   LearningRate 0.0956   Epoch: 0   Global Step: 18560   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:47:24,682-Speed 2636.40 samples/sec   Loss 17.9456   LearningRate 0.0956   Epoch: 0   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:28,591-Speed 2620.00 samples/sec   Loss 17.9657   LearningRate 0.0956   Epoch: 0   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:32,528-Speed 2601.54 samples/sec   Loss 17.9078   LearningRate 0.0956   Epoch: 0   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:36,435-Speed 2621.10 samples/sec   Loss 17.9802   LearningRate 0.0956   Epoch: 0   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:40,352-Speed 2614.72 samples/sec   Loss 18.0268   LearningRate 0.0956   Epoch: 0   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:44,262-Speed 2620.22 samples/sec   Loss 17.9153   LearningRate 0.0956   Epoch: 0   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:48,167-Speed 2622.47 samples/sec   Loss 18.0416   LearningRate 0.0956   Epoch: 0   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:52,074-Speed 2621.78 samples/sec   Loss 17.9440   LearningRate 0.0956   Epoch: 0   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:55,976-Speed 2625.47 samples/sec   Loss 17.9009   LearningRate 0.0956   Epoch: 0   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:47:59,875-Speed 2626.94 samples/sec   Loss 18.0552   LearningRate 0.0956   Epoch: 0   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:03,794-Speed 2613.23 samples/sec   Loss 17.9787   LearningRate 0.0955   Epoch: 0   Global Step: 18670   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:48:07,693-Speed 2626.96 samples/sec   Loss 17.8204   LearningRate 0.0955   Epoch: 0   Global Step: 18680   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:48:11,717-Speed 2544.77 samples/sec   Loss 18.0202   LearningRate 0.0955   Epoch: 0   Global Step: 18690   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:48:15,625-Speed 2621.53 samples/sec   Loss 18.0259   LearningRate 0.0955   Epoch: 0   Global Step: 18700   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:48:19,636-Speed 2553.73 samples/sec   Loss 17.9317   LearningRate 0.0955   Epoch: 0   Global Step: 18710   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:23,757-Speed 2485.69 samples/sec   Loss 18.1433   LearningRate 0.0955   Epoch: 0   Global Step: 18720   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:27,970-Speed 2430.75 samples/sec   Loss 17.7669   LearningRate 0.0955   Epoch: 0   Global Step: 18730   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:31,992-Speed 2547.13 samples/sec   Loss 17.9332   LearningRate 0.0955   Epoch: 0   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:35,888-Speed 2628.85 samples/sec   Loss 18.0121   LearningRate 0.0955   Epoch: 0   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:39,786-Speed 2627.57 samples/sec   Loss 17.8698   LearningRate 0.0955   Epoch: 0   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:43,687-Speed 2625.54 samples/sec   Loss 17.8436   LearningRate 0.0955   Epoch: 0   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:47,578-Speed 2632.33 samples/sec   Loss 17.8947   LearningRate 0.0955   Epoch: 0   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:51,474-Speed 2629.45 samples/sec   Loss 17.6847   LearningRate 0.0955   Epoch: 0   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:55,369-Speed 2629.32 samples/sec   Loss 17.9630   LearningRate 0.0955   Epoch: 0   Global Step: 18800   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:48:59,270-Speed 2625.31 samples/sec   Loss 17.9661   LearningRate 0.0955   Epoch: 0   Global Step: 18810   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:49:03,182-Speed 2618.59 samples/sec   Loss 17.9470   LearningRate 0.0955   Epoch: 0   Global Step: 18820   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:49:07,069-Speed 2634.99 samples/sec   Loss 17.8972   LearningRate 0.0955   Epoch: 0   Global Step: 18830   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:10,972-Speed 2624.39 samples/sec   Loss 17.7020   LearningRate 0.0955   Epoch: 0   Global Step: 18840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:14,869-Speed 2628.41 samples/sec   Loss 17.9604   LearningRate 0.0955   Epoch: 0   Global Step: 18850   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:18,774-Speed 2622.25 samples/sec   Loss 17.9261   LearningRate 0.0955   Epoch: 0   Global Step: 18860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:22,672-Speed 2628.01 samples/sec   Loss 17.8128   LearningRate 0.0955   Epoch: 0   Global Step: 18870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:26,575-Speed 2624.40 samples/sec   Loss 18.0030   LearningRate 0.0955   Epoch: 0   Global Step: 18880   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:30,479-Speed 2623.52 samples/sec   Loss 18.1322   LearningRate 0.0955   Epoch: 0   Global Step: 18890   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:34,383-Speed 2623.78 samples/sec   Loss 17.7548   LearningRate 0.0955   Epoch: 0   Global Step: 18900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:38,284-Speed 2625.87 samples/sec   Loss 17.7556   LearningRate 0.0955   Epoch: 0   Global Step: 18910   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:42,179-Speed 2630.13 samples/sec   Loss 17.7762   LearningRate 0.0955   Epoch: 0   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:46,081-Speed 2624.42 samples/sec   Loss 17.7876   LearningRate 0.0955   Epoch: 0   Global Step: 18930   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:49:49,992-Speed 2619.21 samples/sec   Loss 17.8212   LearningRate 0.0955   Epoch: 0   Global Step: 18940   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:49:53,873-Speed 2639.01 samples/sec   Loss 17.7351   LearningRate 0.0955   Epoch: 0   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:49:57,782-Speed 2620.85 samples/sec   Loss 17.8961   LearningRate 0.0955   Epoch: 0   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:01,679-Speed 2627.75 samples/sec   Loss 17.8273   LearningRate 0.0955   Epoch: 0   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:05,591-Speed 2618.62 samples/sec   Loss 17.8659   LearningRate 0.0955   Epoch: 0   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:09,490-Speed 2627.11 samples/sec   Loss 17.7508   LearningRate 0.0955   Epoch: 0   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:13,410-Speed 2613.66 samples/sec   Loss 18.0121   LearningRate 0.0955   Epoch: 0   Global Step: 19000   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:17,450-Speed 2534.59 samples/sec   Loss 17.9239   LearningRate 0.0955   Epoch: 0   Global Step: 19010   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:21,365-Speed 2616.62 samples/sec   Loss 17.9537   LearningRate 0.0955   Epoch: 0   Global Step: 19020   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:25,260-Speed 2629.08 samples/sec   Loss 18.0341   LearningRate 0.0955   Epoch: 0   Global Step: 19030   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:29,159-Speed 2628.21 samples/sec   Loss 17.8589   LearningRate 0.0955   Epoch: 0   Global Step: 19040   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:33,100-Speed 2598.76 samples/sec   Loss 17.8529   LearningRate 0.0955   Epoch: 0   Global Step: 19050   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:50:37,121-Speed 2547.27 samples/sec   Loss 17.5309   LearningRate 0.0955   Epoch: 0   Global Step: 19060   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:50:41,020-Speed 2627.08 samples/sec   Loss 17.6666   LearningRate 0.0955   Epoch: 0   Global Step: 19070   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:50:44,923-Speed 2624.30 samples/sec   Loss 17.7971   LearningRate 0.0955   Epoch: 0   Global Step: 19080   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:50:48,827-Speed 2623.43 samples/sec   Loss 17.7781   LearningRate 0.0955   Epoch: 0   Global Step: 19090   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:50:52,723-Speed 2629.48 samples/sec   Loss 17.9023   LearningRate 0.0954   Epoch: 0   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:50:56,625-Speed 2624.61 samples/sec   Loss 17.6611   LearningRate 0.0954   Epoch: 0   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:00,523-Speed 2627.62 samples/sec   Loss 17.6982   LearningRate 0.0954   Epoch: 0   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:04,424-Speed 2625.37 samples/sec   Loss 17.7216   LearningRate 0.0954   Epoch: 0   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:08,328-Speed 2623.87 samples/sec   Loss 17.7848   LearningRate 0.0954   Epoch: 0   Global Step: 19140   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:12,240-Speed 2618.68 samples/sec   Loss 17.8644   LearningRate 0.0954   Epoch: 0   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:16,159-Speed 2613.41 samples/sec   Loss 17.6474   LearningRate 0.0954   Epoch: 0   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:20,102-Speed 2598.13 samples/sec   Loss 17.7398   LearningRate 0.0954   Epoch: 0   Global Step: 19170   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:24,008-Speed 2621.96 samples/sec   Loss 17.7747   LearningRate 0.0954   Epoch: 0   Global Step: 19180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:27,918-Speed 2619.94 samples/sec   Loss 17.8897   LearningRate 0.0954   Epoch: 0   Global Step: 19190   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:31,831-Speed 2617.67 samples/sec   Loss 17.6622   LearningRate 0.0954   Epoch: 0   Global Step: 19200   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:51:35,734-Speed 2624.45 samples/sec   Loss 17.8350   LearningRate 0.0954   Epoch: 0   Global Step: 19210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:39,662-Speed 2607.28 samples/sec   Loss 17.7188   LearningRate 0.0954   Epoch: 0   Global Step: 19220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:43,745-Speed 2508.75 samples/sec   Loss 17.7541   LearningRate 0.0954   Epoch: 0   Global Step: 19230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:47,652-Speed 2621.02 samples/sec   Loss 17.6151   LearningRate 0.0954   Epoch: 0   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:51,561-Speed 2620.62 samples/sec   Loss 17.7492   LearningRate 0.0954   Epoch: 0   Global Step: 19250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:55,462-Speed 2625.28 samples/sec   Loss 17.7059   LearningRate 0.0954   Epoch: 0   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:51:59,367-Speed 2622.99 samples/sec   Loss 17.7024   LearningRate 0.0954   Epoch: 0   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:03,281-Speed 2617.18 samples/sec   Loss 17.6729   LearningRate 0.0954   Epoch: 0   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:07,195-Speed 2616.36 samples/sec   Loss 17.6817   LearningRate 0.0954   Epoch: 0   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:11,114-Speed 2613.77 samples/sec   Loss 17.6995   LearningRate 0.0954   Epoch: 0   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:15,021-Speed 2621.58 samples/sec   Loss 17.8199   LearningRate 0.0954   Epoch: 0   Global Step: 19310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:52:18,929-Speed 2620.64 samples/sec   Loss 17.5816   LearningRate 0.0954   Epoch: 0   Global Step: 19320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:52:22,839-Speed 2620.18 samples/sec   Loss 17.8101   LearningRate 0.0954   Epoch: 0   Global Step: 19330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:52:26,748-Speed 2619.99 samples/sec   Loss 17.3588   LearningRate 0.0954   Epoch: 0   Global Step: 19340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:52:30,656-Speed 2621.09 samples/sec   Loss 17.7315   LearningRate 0.0954   Epoch: 0   Global Step: 19350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:52:34,622-Speed 2582.21 samples/sec   Loss 17.7278   LearningRate 0.0954   Epoch: 0   Global Step: 19360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:52:38,734-Speed 2491.31 samples/sec   Loss 17.6213   LearningRate 0.0954   Epoch: 0   Global Step: 19370   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:52:42,776-Speed 2534.08 samples/sec   Loss 17.6644   LearningRate 0.0954   Epoch: 0   Global Step: 19380   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:46,685-Speed 2619.72 samples/sec   Loss 17.6863   LearningRate 0.0954   Epoch: 0   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:50,597-Speed 2618.32 samples/sec   Loss 17.8517   LearningRate 0.0954   Epoch: 0   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:54,505-Speed 2620.59 samples/sec   Loss 17.7554   LearningRate 0.0954   Epoch: 0   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:52:58,417-Speed 2618.48 samples/sec   Loss 17.6788   LearningRate 0.0954   Epoch: 0   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:53:02,328-Speed 2618.93 samples/sec   Loss 17.5842   LearningRate 0.0954   Epoch: 0   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:53:06,306-Speed 2574.71 samples/sec   Loss 17.7680   LearningRate 0.0954   Epoch: 0   Global Step: 19440   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:53:10,261-Speed 2589.43 samples/sec   Loss 17.6630   LearningRate 0.0954   Epoch: 0   Global Step: 19450   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:53:14,171-Speed 2620.29 samples/sec   Loss 17.5794   LearningRate 0.0954   Epoch: 0   Global Step: 19460   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:53:18,076-Speed 2622.60 samples/sec   Loss 17.6071   LearningRate 0.0954   Epoch: 0   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:53:21,973-Speed 2628.32 samples/sec   Loss 17.6385   LearningRate 0.0954   Epoch: 0   Global Step: 19480   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:25,874-Speed 2625.89 samples/sec   Loss 17.7544   LearningRate 0.0954   Epoch: 0   Global Step: 19490   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:29,769-Speed 2629.87 samples/sec   Loss 17.5695   LearningRate 0.0954   Epoch: 0   Global Step: 19500   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:33,668-Speed 2626.87 samples/sec   Loss 17.6947   LearningRate 0.0954   Epoch: 0   Global Step: 19510   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:37,563-Speed 2629.08 samples/sec   Loss 17.5368   LearningRate 0.0953   Epoch: 0   Global Step: 19520   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:41,460-Speed 2628.73 samples/sec   Loss 17.6336   LearningRate 0.0953   Epoch: 0   Global Step: 19530   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:45,370-Speed 2619.69 samples/sec   Loss 17.6664   LearningRate 0.0953   Epoch: 0   Global Step: 19540   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:49,270-Speed 2626.26 samples/sec   Loss 17.5681   LearningRate 0.0953   Epoch: 0   Global Step: 19550   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:53,192-Speed 2612.07 samples/sec   Loss 17.5985   LearningRate 0.0953   Epoch: 0   Global Step: 19560   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 21:53:57,078-Speed 2635.35 samples/sec   Loss 17.6396   LearningRate 0.0953   Epoch: 0   Global Step: 19570   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:00,981-Speed 2625.07 samples/sec   Loss 17.5347   LearningRate 0.0953   Epoch: 0   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:04,888-Speed 2621.52 samples/sec   Loss 17.4657   LearningRate 0.0953   Epoch: 0   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:08,790-Speed 2625.36 samples/sec   Loss 17.5218   LearningRate 0.0953   Epoch: 0   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:12,684-Speed 2629.80 samples/sec   Loss 17.4519   LearningRate 0.0953   Epoch: 0   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:16,597-Speed 2617.98 samples/sec   Loss 17.5497   LearningRate 0.0953   Epoch: 0   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:20,578-Speed 2573.00 samples/sec   Loss 17.6564   LearningRate 0.0953   Epoch: 0   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:24,534-Speed 2589.55 samples/sec   Loss 17.5050   LearningRate 0.0953   Epoch: 0   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:28,469-Speed 2602.79 samples/sec   Loss 17.5466   LearningRate 0.0953   Epoch: 0   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:54:32,350-Speed 2638.79 samples/sec   Loss 17.4392   LearningRate 0.0953   Epoch: 0   Global Step: 19660   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:54:36,431-Speed 2510.25 samples/sec   Loss 17.5418   LearningRate 0.0953   Epoch: 0   Global Step: 19670   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:54:40,331-Speed 2626.47 samples/sec   Loss 17.6341   LearningRate 0.0953   Epoch: 0   Global Step: 19680   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:54:44,238-Speed 2621.94 samples/sec   Loss 17.5715   LearningRate 0.0953   Epoch: 0   Global Step: 19690   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:54:48,134-Speed 2629.32 samples/sec   Loss 17.4866   LearningRate 0.0953   Epoch: 0   Global Step: 19700   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:54:52,032-Speed 2627.58 samples/sec   Loss 17.3626   LearningRate 0.0953   Epoch: 0   Global Step: 19710   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:54:55,932-Speed 2625.83 samples/sec   Loss 17.5903   LearningRate 0.0953   Epoch: 0   Global Step: 19720   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:54:59,838-Speed 2622.18 samples/sec   Loss 17.6314   LearningRate 0.0953   Epoch: 0   Global Step: 19730   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:55:03,755-Speed 2615.56 samples/sec   Loss 17.4813   LearningRate 0.0953   Epoch: 0   Global Step: 19740   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:55:07,653-Speed 2627.15 samples/sec   Loss 17.4221   LearningRate 0.0953   Epoch: 0   Global Step: 19750   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:55:11,645-Speed 2565.97 samples/sec   Loss 17.5287   LearningRate 0.0953   Epoch: 0   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:15,549-Speed 2624.07 samples/sec   Loss 17.5506   LearningRate 0.0953   Epoch: 0   Global Step: 19770   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:19,451-Speed 2624.91 samples/sec   Loss 17.6147   LearningRate 0.0953   Epoch: 0   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:23,351-Speed 2626.28 samples/sec   Loss 17.3949   LearningRate 0.0953   Epoch: 0   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:27,248-Speed 2628.36 samples/sec   Loss 17.4337   LearningRate 0.0953   Epoch: 0   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:31,154-Speed 2622.18 samples/sec   Loss 17.6503   LearningRate 0.0953   Epoch: 0   Global Step: 19810   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:35,054-Speed 2626.20 samples/sec   Loss 17.5286   LearningRate 0.0953   Epoch: 0   Global Step: 19820   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:38,955-Speed 2625.47 samples/sec   Loss 17.5874   LearningRate 0.0953   Epoch: 0   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:42,852-Speed 2628.72 samples/sec   Loss 17.4944   LearningRate 0.0953   Epoch: 0   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 21:55:46,740-Speed 2634.01 samples/sec   Loss 17.5396   LearningRate 0.0953   Epoch: 0   Global Step: 19850   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:55:50,789-Speed 2529.97 samples/sec   Loss 17.6308   LearningRate 0.0953   Epoch: 0   Global Step: 19860   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:55:54,684-Speed 2630.17 samples/sec   Loss 17.4186   LearningRate 0.0953   Epoch: 0   Global Step: 19870   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:55:58,576-Speed 2630.97 samples/sec   Loss 17.4360   LearningRate 0.0953   Epoch: 0   Global Step: 19880   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:02,471-Speed 2629.74 samples/sec   Loss 17.6369   LearningRate 0.0953   Epoch: 0   Global Step: 19890   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:06,393-Speed 2612.40 samples/sec   Loss 17.5765   LearningRate 0.0953   Epoch: 0   Global Step: 19900   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:10,286-Speed 2631.59 samples/sec   Loss 17.7087   LearningRate 0.0953   Epoch: 0   Global Step: 19910   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:14,181-Speed 2629.56 samples/sec   Loss 17.5543   LearningRate 0.0953   Epoch: 0   Global Step: 19920   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:18,082-Speed 2626.01 samples/sec   Loss 17.5969   LearningRate 0.0953   Epoch: 0   Global Step: 19930   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:21,992-Speed 2619.36 samples/sec   Loss 17.6018   LearningRate 0.0953   Epoch: 0   Global Step: 19940   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:25,903-Speed 2618.75 samples/sec   Loss 17.4949   LearningRate 0.0952   Epoch: 0   Global Step: 19950   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 21:56:29,811-Speed 2621.25 samples/sec   Loss 17.6592   LearningRate 0.0952   Epoch: 0   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:56:33,706-Speed 2629.67 samples/sec   Loss 17.3061   LearningRate 0.0952   Epoch: 0   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:56:37,604-Speed 2627.38 samples/sec   Loss 17.2772   LearningRate 0.0952   Epoch: 0   Global Step: 19980   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:56:41,533-Speed 2607.05 samples/sec   Loss 17.5025   LearningRate 0.0952   Epoch: 0   Global Step: 19990   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:56:45,442-Speed 2620.85 samples/sec   Loss 17.6272   LearningRate 0.0952   Epoch: 0   Global Step: 20000   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 21:57:28,604-[lfw][20000]XNorm: 22.849141
Training: 2022-04-12 21:57:28,605-[lfw][20000]Accuracy-Flip: 0.99500+-0.00279
Training: 2022-04-12 21:57:28,605-[lfw][20000]Accuracy-Highest: 0.99500
Training: 2022-04-12 21:58:18,988-[cfp_fp][20000]XNorm: 20.668791
Training: 2022-04-12 21:58:18,989-[cfp_fp][20000]Accuracy-Flip: 0.96043+-0.00869
Training: 2022-04-12 21:58:18,991-[cfp_fp][20000]Accuracy-Highest: 0.96043
Training: 2022-04-12 21:59:02,372-[agedb_30][20000]XNorm: 22.528649
Training: 2022-04-12 21:59:02,373-[agedb_30][20000]Accuracy-Flip: 0.94267+-0.01138
Training: 2022-04-12 21:59:02,373-[agedb_30][20000]Accuracy-Highest: 0.94267
Training: 2022-04-12 21:59:06,244-Speed 72.73 samples/sec   Loss 17.2786   LearningRate 0.0952   Epoch: 0   Global Step: 20010   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 21:59:10,116-Speed 2645.17 samples/sec   Loss 17.3368   LearningRate 0.0952   Epoch: 0   Global Step: 20020   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 21:59:14,026-Speed 2619.95 samples/sec   Loss 17.5399   LearningRate 0.0952   Epoch: 0   Global Step: 20030   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 21:59:17,971-Speed 2596.09 samples/sec   Loss 17.5037   LearningRate 0.0952   Epoch: 0   Global Step: 20040   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 21:59:21,854-Speed 2637.86 samples/sec   Loss 17.3660   LearningRate 0.0952   Epoch: 0   Global Step: 20050   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 21:59:25,817-Speed 2584.21 samples/sec   Loss 17.3713   LearningRate 0.0952   Epoch: 0   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:29,729-Speed 2619.65 samples/sec   Loss 17.3161   LearningRate 0.0952   Epoch: 0   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:33,618-Speed 2633.45 samples/sec   Loss 17.3355   LearningRate 0.0952   Epoch: 0   Global Step: 20080   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:37,507-Speed 2633.78 samples/sec   Loss 17.5217   LearningRate 0.0952   Epoch: 0   Global Step: 20090   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:41,378-Speed 2646.37 samples/sec   Loss 17.2970   LearningRate 0.0952   Epoch: 0   Global Step: 20100   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:45,263-Speed 2636.41 samples/sec   Loss 17.1568   LearningRate 0.0952   Epoch: 0   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:49,148-Speed 2636.23 samples/sec   Loss 17.5031   LearningRate 0.0952   Epoch: 0   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:53,037-Speed 2633.46 samples/sec   Loss 17.2312   LearningRate 0.0952   Epoch: 0   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 21:59:56,926-Speed 2634.14 samples/sec   Loss 17.2506   LearningRate 0.0952   Epoch: 0   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:00,818-Speed 2631.03 samples/sec   Loss 17.3349   LearningRate 0.0952   Epoch: 0   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:04,695-Speed 2642.21 samples/sec   Loss 17.4012   LearningRate 0.0952   Epoch: 0   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:08,589-Speed 2630.30 samples/sec   Loss 17.3337   LearningRate 0.0952   Epoch: 0   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:12,485-Speed 2628.84 samples/sec   Loss 17.3130   LearningRate 0.0952   Epoch: 0   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:16,506-Speed 2547.22 samples/sec   Loss 17.4086   LearningRate 0.0952   Epoch: 0   Global Step: 20190   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:20,405-Speed 2627.49 samples/sec   Loss 17.3173   LearningRate 0.0952   Epoch: 0   Global Step: 20200   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:24,299-Speed 2630.34 samples/sec   Loss 17.3467   LearningRate 0.0952   Epoch: 0   Global Step: 20210   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:28,194-Speed 2629.25 samples/sec   Loss 17.5611   LearningRate 0.0952   Epoch: 0   Global Step: 20220   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:32,105-Speed 2619.41 samples/sec   Loss 17.3273   LearningRate 0.0952   Epoch: 0   Global Step: 20230   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:36,005-Speed 2626.51 samples/sec   Loss 17.2458   LearningRate 0.0952   Epoch: 0   Global Step: 20240   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:39,904-Speed 2626.93 samples/sec   Loss 17.3941   LearningRate 0.0952   Epoch: 0   Global Step: 20250   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:43,799-Speed 2629.04 samples/sec   Loss 17.2698   LearningRate 0.0952   Epoch: 0   Global Step: 20260   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:00:47,702-Speed 2624.49 samples/sec   Loss 17.0955   LearningRate 0.0952   Epoch: 0   Global Step: 20270   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:00:51,598-Speed 2628.98 samples/sec   Loss 17.3007   LearningRate 0.0952   Epoch: 0   Global Step: 20280   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:00:55,516-Speed 2614.79 samples/sec   Loss 17.2960   LearningRate 0.0952   Epoch: 0   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:00:59,411-Speed 2629.61 samples/sec   Loss 17.2550   LearningRate 0.0952   Epoch: 0   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:03,328-Speed 2614.76 samples/sec   Loss 17.1290   LearningRate 0.0952   Epoch: 0   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:07,333-Speed 2557.54 samples/sec   Loss 17.1881   LearningRate 0.0952   Epoch: 0   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:11,232-Speed 2627.17 samples/sec   Loss 17.2640   LearningRate 0.0952   Epoch: 0   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:15,132-Speed 2626.31 samples/sec   Loss 17.3689   LearningRate 0.0952   Epoch: 0   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:19,033-Speed 2625.12 samples/sec   Loss 17.4201   LearningRate 0.0952   Epoch: 0   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:22,959-Speed 2609.07 samples/sec   Loss 17.2519   LearningRate 0.0952   Epoch: 0   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:26,879-Speed 2612.71 samples/sec   Loss 17.3264   LearningRate 0.0951   Epoch: 0   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:30,794-Speed 2616.76 samples/sec   Loss 17.2888   LearningRate 0.0951   Epoch: 0   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:34,716-Speed 2611.82 samples/sec   Loss 17.3595   LearningRate 0.0951   Epoch: 0   Global Step: 20390   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:01:38,760-Speed 2532.26 samples/sec   Loss 17.0819   LearningRate 0.0951   Epoch: 0   Global Step: 20400   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:01:42,665-Speed 2623.10 samples/sec   Loss 17.2258   LearningRate 0.0951   Epoch: 0   Global Step: 20410   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:01:46,580-Speed 2616.95 samples/sec   Loss 17.2909   LearningRate 0.0951   Epoch: 0   Global Step: 20420   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:01:50,468-Speed 2634.20 samples/sec   Loss 17.3200   LearningRate 0.0951   Epoch: 0   Global Step: 20430   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:54,372-Speed 2623.38 samples/sec   Loss 17.3095   LearningRate 0.0951   Epoch: 0   Global Step: 20440   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:01:58,274-Speed 2625.10 samples/sec   Loss 17.3333   LearningRate 0.0951   Epoch: 0   Global Step: 20450   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:02,179-Speed 2623.05 samples/sec   Loss 17.3007   LearningRate 0.0951   Epoch: 0   Global Step: 20460   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:06,098-Speed 2613.26 samples/sec   Loss 17.2143   LearningRate 0.0951   Epoch: 0   Global Step: 20470   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:10,022-Speed 2610.53 samples/sec   Loss 17.2889   LearningRate 0.0951   Epoch: 0   Global Step: 20480   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:14,007-Speed 2569.67 samples/sec   Loss 17.2423   LearningRate 0.0951   Epoch: 0   Global Step: 20490   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:17,917-Speed 2619.81 samples/sec   Loss 17.2116   LearningRate 0.0951   Epoch: 0   Global Step: 20500   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:21,872-Speed 2590.17 samples/sec   Loss 17.1046   LearningRate 0.0951   Epoch: 0   Global Step: 20510   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:25,775-Speed 2624.26 samples/sec   Loss 17.3211   LearningRate 0.0951   Epoch: 0   Global Step: 20520   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:02:29,679-Speed 2624.26 samples/sec   Loss 17.0901   LearningRate 0.0951   Epoch: 0   Global Step: 20530   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:02:33,582-Speed 2624.40 samples/sec   Loss 17.3014   LearningRate 0.0951   Epoch: 0   Global Step: 20540   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:02:37,484-Speed 2624.70 samples/sec   Loss 17.2327   LearningRate 0.0951   Epoch: 0   Global Step: 20550   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:02:41,387-Speed 2624.20 samples/sec   Loss 17.2847   LearningRate 0.0951   Epoch: 0   Global Step: 20560   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:02:45,293-Speed 2622.84 samples/sec   Loss 17.3132   LearningRate 0.0951   Epoch: 0   Global Step: 20570   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:02:49,196-Speed 2623.80 samples/sec   Loss 17.2234   LearningRate 0.0951   Epoch: 0   Global Step: 20580   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:02:53,155-Speed 2587.31 samples/sec   Loss 17.1704   LearningRate 0.0951   Epoch: 0   Global Step: 20590   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:02:57,071-Speed 2615.71 samples/sec   Loss 17.3309   LearningRate 0.0951   Epoch: 0   Global Step: 20600   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:03:00,971-Speed 2626.36 samples/sec   Loss 17.2529   LearningRate 0.0951   Epoch: 0   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:04,966-Speed 2563.97 samples/sec   Loss 17.1898   LearningRate 0.0951   Epoch: 0   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:08,873-Speed 2621.28 samples/sec   Loss 17.0936   LearningRate 0.0951   Epoch: 0   Global Step: 20630   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:12,779-Speed 2622.35 samples/sec   Loss 17.3306   LearningRate 0.0951   Epoch: 0   Global Step: 20640   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:16,696-Speed 2614.78 samples/sec   Loss 17.2487   LearningRate 0.0951   Epoch: 0   Global Step: 20650   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:20,598-Speed 2625.24 samples/sec   Loss 17.3869   LearningRate 0.0951   Epoch: 0   Global Step: 20660   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:24,501-Speed 2624.44 samples/sec   Loss 17.2078   LearningRate 0.0951   Epoch: 0   Global Step: 20670   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:28,441-Speed 2599.46 samples/sec   Loss 17.3298   LearningRate 0.0951   Epoch: 0   Global Step: 20680   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:32,345-Speed 2623.55 samples/sec   Loss 17.3589   LearningRate 0.0951   Epoch: 0   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:36,250-Speed 2623.27 samples/sec   Loss 17.2275   LearningRate 0.0951   Epoch: 0   Global Step: 20700   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:40,163-Speed 2617.44 samples/sec   Loss 17.2090   LearningRate 0.0951   Epoch: 0   Global Step: 20710   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:03:44,073-Speed 2619.14 samples/sec   Loss 17.2965   LearningRate 0.0951   Epoch: 0   Global Step: 20720   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:03:47,986-Speed 2617.86 samples/sec   Loss 17.0071   LearningRate 0.0951   Epoch: 0   Global Step: 20730   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:03:51,928-Speed 2599.01 samples/sec   Loss 17.2234   LearningRate 0.0951   Epoch: 0   Global Step: 20740   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:03:55,831-Speed 2623.75 samples/sec   Loss 17.0716   LearningRate 0.0951   Epoch: 0   Global Step: 20750   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:03:59,754-Speed 2611.99 samples/sec   Loss 17.0878   LearningRate 0.0951   Epoch: 0   Global Step: 20760   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:03,668-Speed 2616.28 samples/sec   Loss 17.1337   LearningRate 0.0951   Epoch: 0   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:07,590-Speed 2611.28 samples/sec   Loss 17.1259   LearningRate 0.0951   Epoch: 0   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:11,508-Speed 2614.83 samples/sec   Loss 17.1120   LearningRate 0.0951   Epoch: 0   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:15,428-Speed 2612.67 samples/sec   Loss 17.3119   LearningRate 0.0950   Epoch: 0   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:19,394-Speed 2582.57 samples/sec   Loss 17.1137   LearningRate 0.0950   Epoch: 0   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:23,338-Speed 2597.75 samples/sec   Loss 17.1623   LearningRate 0.0950   Epoch: 0   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:27,245-Speed 2621.71 samples/sec   Loss 17.0761   LearningRate 0.0950   Epoch: 0   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:31,153-Speed 2620.55 samples/sec   Loss 17.1821   LearningRate 0.0950   Epoch: 0   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:04:35,058-Speed 2623.38 samples/sec   Loss 17.2244   LearningRate 0.0950   Epoch: 0   Global Step: 20850   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:04:38,968-Speed 2619.20 samples/sec   Loss 17.1397   LearningRate 0.0950   Epoch: 0   Global Step: 20860   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:04:42,870-Speed 2625.17 samples/sec   Loss 17.1464   LearningRate 0.0950   Epoch: 0   Global Step: 20870   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:04:46,779-Speed 2620.01 samples/sec   Loss 17.1666   LearningRate 0.0950   Epoch: 0   Global Step: 20880   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:04:50,686-Speed 2621.86 samples/sec   Loss 17.1096   LearningRate 0.0950   Epoch: 0   Global Step: 20890   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:04:54,591-Speed 2623.27 samples/sec   Loss 17.0485   LearningRate 0.0950   Epoch: 0   Global Step: 20900   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:04:58,483-Speed 2631.52 samples/sec   Loss 17.0844   LearningRate 0.0950   Epoch: 0   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:02,533-Speed 2528.68 samples/sec   Loss 17.2007   LearningRate 0.0950   Epoch: 0   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:06,655-Speed 2485.53 samples/sec   Loss 17.2306   LearningRate 0.0950   Epoch: 0   Global Step: 20930   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:10,770-Speed 2488.79 samples/sec   Loss 17.1636   LearningRate 0.0950   Epoch: 0   Global Step: 20940   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:14,706-Speed 2602.08 samples/sec   Loss 17.1559   LearningRate 0.0950   Epoch: 0   Global Step: 20950   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:18,612-Speed 2622.92 samples/sec   Loss 17.0436   LearningRate 0.0950   Epoch: 0   Global Step: 20960   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:22,544-Speed 2604.43 samples/sec   Loss 17.0375   LearningRate 0.0950   Epoch: 0   Global Step: 20970   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:26,479-Speed 2603.26 samples/sec   Loss 16.9903   LearningRate 0.0950   Epoch: 0   Global Step: 20980   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:30,412-Speed 2604.61 samples/sec   Loss 16.9601   LearningRate 0.0950   Epoch: 0   Global Step: 20990   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:34,503-Speed 2503.14 samples/sec   Loss 17.0820   LearningRate 0.0950   Epoch: 0   Global Step: 21000   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:05:38,411-Speed 2621.28 samples/sec   Loss 16.8805   LearningRate 0.0950   Epoch: 0   Global Step: 21010   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:05:42,327-Speed 2615.57 samples/sec   Loss 17.1483   LearningRate 0.0950   Epoch: 0   Global Step: 21020   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:05:46,240-Speed 2617.60 samples/sec   Loss 17.0557   LearningRate 0.0950   Epoch: 0   Global Step: 21030   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:05:50,147-Speed 2621.65 samples/sec   Loss 17.0277   LearningRate 0.0950   Epoch: 0   Global Step: 21040   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:05:54,054-Speed 2620.99 samples/sec   Loss 17.2500   LearningRate 0.0950   Epoch: 0   Global Step: 21050   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:05:57,971-Speed 2615.30 samples/sec   Loss 17.0240   LearningRate 0.0950   Epoch: 0   Global Step: 21060   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:06:01,887-Speed 2615.59 samples/sec   Loss 17.0192   LearningRate 0.0950   Epoch: 0   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:05,799-Speed 2618.49 samples/sec   Loss 16.9565   LearningRate 0.0950   Epoch: 0   Global Step: 21080   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:09,784-Speed 2570.29 samples/sec   Loss 17.0669   LearningRate 0.0950   Epoch: 0   Global Step: 21090   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:13,813-Speed 2541.92 samples/sec   Loss 17.0735   LearningRate 0.0950   Epoch: 0   Global Step: 21100   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:17,760-Speed 2594.95 samples/sec   Loss 17.1467   LearningRate 0.0950   Epoch: 0   Global Step: 21110   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:21,681-Speed 2612.49 samples/sec   Loss 17.0775   LearningRate 0.0950   Epoch: 0   Global Step: 21120   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:25,593-Speed 2618.34 samples/sec   Loss 16.9033   LearningRate 0.0950   Epoch: 0   Global Step: 21130   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:29,502-Speed 2620.14 samples/sec   Loss 17.0271   LearningRate 0.0950   Epoch: 0   Global Step: 21140   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:33,418-Speed 2615.98 samples/sec   Loss 17.0637   LearningRate 0.0950   Epoch: 0   Global Step: 21150   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:37,332-Speed 2617.11 samples/sec   Loss 17.0440   LearningRate 0.0950   Epoch: 0   Global Step: 21160   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:41,248-Speed 2615.42 samples/sec   Loss 17.0535   LearningRate 0.0950   Epoch: 0   Global Step: 21170   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:06:45,152-Speed 2623.95 samples/sec   Loss 17.0259   LearningRate 0.0950   Epoch: 0   Global Step: 21180   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:49,058-Speed 2622.33 samples/sec   Loss 17.0544   LearningRate 0.0950   Epoch: 0   Global Step: 21190   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:53,020-Speed 2585.33 samples/sec   Loss 16.9845   LearningRate 0.0950   Epoch: 0   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:06:57,086-Speed 2518.83 samples/sec   Loss 17.0774   LearningRate 0.0950   Epoch: 0   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:00,992-Speed 2622.23 samples/sec   Loss 16.9532   LearningRate 0.0949   Epoch: 0   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:04,902-Speed 2619.66 samples/sec   Loss 17.2552   LearningRate 0.0949   Epoch: 0   Global Step: 21230   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:08,806-Speed 2623.45 samples/sec   Loss 16.9449   LearningRate 0.0949   Epoch: 0   Global Step: 21240   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:12,713-Speed 2621.95 samples/sec   Loss 17.0939   LearningRate 0.0949   Epoch: 0   Global Step: 21250   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:16,625-Speed 2617.76 samples/sec   Loss 16.9523   LearningRate 0.0949   Epoch: 0   Global Step: 21260   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:20,688-Speed 2521.28 samples/sec   Loss 16.9467   LearningRate 0.0949   Epoch: 0   Global Step: 21270   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:24,610-Speed 2611.70 samples/sec   Loss 17.0630   LearningRate 0.0949   Epoch: 0   Global Step: 21280   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:07:28,535-Speed 2609.81 samples/sec   Loss 16.9539   LearningRate 0.0949   Epoch: 0   Global Step: 21290   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:07:32,446-Speed 2618.62 samples/sec   Loss 17.1433   LearningRate 0.0949   Epoch: 0   Global Step: 21300   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:07:36,357-Speed 2619.07 samples/sec   Loss 16.9352   LearningRate 0.0949   Epoch: 0   Global Step: 21310   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:07:40,266-Speed 2619.77 samples/sec   Loss 17.0704   LearningRate 0.0949   Epoch: 0   Global Step: 21320   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:07:44,177-Speed 2618.56 samples/sec   Loss 17.0987   LearningRate 0.0949   Epoch: 0   Global Step: 21330   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:07:48,105-Speed 2608.56 samples/sec   Loss 16.9881   LearningRate 0.0949   Epoch: 0   Global Step: 21340   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:07:52,095-Speed 2566.40 samples/sec   Loss 17.0044   LearningRate 0.0949   Epoch: 0   Global Step: 21350   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:07:56,214-Speed 2487.17 samples/sec   Loss 16.9794   LearningRate 0.0949   Epoch: 0   Global Step: 21360   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:00,121-Speed 2621.43 samples/sec   Loss 17.1177   LearningRate 0.0949   Epoch: 0   Global Step: 21370   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:04,035-Speed 2616.77 samples/sec   Loss 16.9850   LearningRate 0.0949   Epoch: 0   Global Step: 21380   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:07,946-Speed 2619.09 samples/sec   Loss 17.1058   LearningRate 0.0949   Epoch: 0   Global Step: 21390   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:11,853-Speed 2621.67 samples/sec   Loss 17.0647   LearningRate 0.0949   Epoch: 0   Global Step: 21400   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:15,782-Speed 2606.68 samples/sec   Loss 17.1170   LearningRate 0.0949   Epoch: 0   Global Step: 21410   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:19,718-Speed 2602.54 samples/sec   Loss 16.9096   LearningRate 0.0949   Epoch: 0   Global Step: 21420   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:23,627-Speed 2620.51 samples/sec   Loss 17.0543   LearningRate 0.0949   Epoch: 0   Global Step: 21430   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:27,534-Speed 2622.20 samples/sec   Loss 16.9177   LearningRate 0.0949   Epoch: 0   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:31,447-Speed 2617.52 samples/sec   Loss 17.0270   LearningRate 0.0949   Epoch: 0   Global Step: 21450   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:08:35,385-Speed 2600.88 samples/sec   Loss 16.7937   LearningRate 0.0949   Epoch: 0   Global Step: 21460   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:08:39,438-Speed 2526.96 samples/sec   Loss 16.9385   LearningRate 0.0949   Epoch: 0   Global Step: 21470   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:43,531-Speed 2502.09 samples/sec   Loss 16.9112   LearningRate 0.0949   Epoch: 0   Global Step: 21480   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:47,453-Speed 2612.05 samples/sec   Loss 16.9142   LearningRate 0.0949   Epoch: 0   Global Step: 21490   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:51,360-Speed 2621.14 samples/sec   Loss 16.9864   LearningRate 0.0949   Epoch: 0   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:55,274-Speed 2617.18 samples/sec   Loss 16.8475   LearningRate 0.0949   Epoch: 0   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:08:59,184-Speed 2619.61 samples/sec   Loss 16.9216   LearningRate 0.0949   Epoch: 0   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:03,095-Speed 2618.96 samples/sec   Loss 16.7584   LearningRate 0.0949   Epoch: 0   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:07,004-Speed 2619.82 samples/sec   Loss 16.7814   LearningRate 0.0949   Epoch: 0   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:10,924-Speed 2612.80 samples/sec   Loss 16.8829   LearningRate 0.0949   Epoch: 0   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:14,834-Speed 2619.37 samples/sec   Loss 16.9803   LearningRate 0.0949   Epoch: 0   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:18,936-Speed 2496.91 samples/sec   Loss 17.0373   LearningRate 0.0949   Epoch: 0   Global Step: 21570   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:09:23,001-Speed 2519.82 samples/sec   Loss 17.0240   LearningRate 0.0949   Epoch: 0   Global Step: 21580   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:09:26,906-Speed 2622.91 samples/sec   Loss 16.9210   LearningRate 0.0949   Epoch: 0   Global Step: 21590   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:09:30,821-Speed 2616.40 samples/sec   Loss 16.7600   LearningRate 0.0949   Epoch: 0   Global Step: 21600   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:09:34,734-Speed 2617.21 samples/sec   Loss 16.9929   LearningRate 0.0949   Epoch: 0   Global Step: 21610   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:09:38,622-Speed 2633.96 samples/sec   Loss 16.8408   LearningRate 0.0949   Epoch: 0   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:42,532-Speed 2620.03 samples/sec   Loss 16.8212   LearningRate 0.0949   Epoch: 0   Global Step: 21630   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:46,452-Speed 2612.30 samples/sec   Loss 17.0384   LearningRate 0.0949   Epoch: 0   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:50,384-Speed 2605.38 samples/sec   Loss 16.9902   LearningRate 0.0948   Epoch: 0   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:54,386-Speed 2559.39 samples/sec   Loss 16.8431   LearningRate 0.0948   Epoch: 0   Global Step: 21660   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:09:58,296-Speed 2620.02 samples/sec   Loss 16.9397   LearningRate 0.0948   Epoch: 0   Global Step: 21670   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:02,350-Speed 2526.19 samples/sec   Loss 16.9069   LearningRate 0.0948   Epoch: 0   Global Step: 21680   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:06,325-Speed 2577.07 samples/sec   Loss 17.0303   LearningRate 0.0948   Epoch: 0   Global Step: 21690   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:10,236-Speed 2618.34 samples/sec   Loss 16.9323   LearningRate 0.0948   Epoch: 0   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:14,150-Speed 2616.83 samples/sec   Loss 16.9864   LearningRate 0.0948   Epoch: 0   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:18,076-Speed 2609.58 samples/sec   Loss 16.9566   LearningRate 0.0948   Epoch: 0   Global Step: 21720   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:10:21,993-Speed 2615.01 samples/sec   Loss 16.9099   LearningRate 0.0948   Epoch: 0   Global Step: 21730   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:10:25,888-Speed 2629.01 samples/sec   Loss 16.8735   LearningRate 0.0948   Epoch: 0   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:29,804-Speed 2616.11 samples/sec   Loss 16.9736   LearningRate 0.0948   Epoch: 0   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:33,721-Speed 2615.02 samples/sec   Loss 16.9530   LearningRate 0.0948   Epoch: 0   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:10:37,654-Speed 2604.32 samples/sec   Loss 16.8648   LearningRate 0.0948   Epoch: 0   Global Step: 21770   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:10:41,622-Speed 2581.73 samples/sec   Loss 16.8348   LearningRate 0.0948   Epoch: 0   Global Step: 21780   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:10:45,720-Speed 2498.83 samples/sec   Loss 16.9737   LearningRate 0.0948   Epoch: 0   Global Step: 21790   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:10:49,726-Speed 2557.74 samples/sec   Loss 16.7927   LearningRate 0.0948   Epoch: 0   Global Step: 21800   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:10:53,639-Speed 2617.46 samples/sec   Loss 16.8731   LearningRate 0.0948   Epoch: 0   Global Step: 21810   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:10:57,548-Speed 2619.91 samples/sec   Loss 16.8285   LearningRate 0.0948   Epoch: 0   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:11:01,454-Speed 2622.10 samples/sec   Loss 16.8650   LearningRate 0.0948   Epoch: 0   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:11:05,389-Speed 2603.47 samples/sec   Loss 16.7568   LearningRate 0.0948   Epoch: 0   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:11:09,297-Speed 2620.51 samples/sec   Loss 16.9007   LearningRate 0.0948   Epoch: 0   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:11:13,205-Speed 2620.84 samples/sec   Loss 16.9331   LearningRate 0.0948   Epoch: 0   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:11:17,127-Speed 2611.48 samples/sec   Loss 16.9770   LearningRate 0.0948   Epoch: 0   Global Step: 21870   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:21,060-Speed 2604.47 samples/sec   Loss 16.8032   LearningRate 0.0948   Epoch: 0   Global Step: 21880   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:24,991-Speed 2605.66 samples/sec   Loss 16.7248   LearningRate 0.0948   Epoch: 0   Global Step: 21890   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:28,906-Speed 2616.40 samples/sec   Loss 16.8836   LearningRate 0.0948   Epoch: 0   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:32,827-Speed 2611.68 samples/sec   Loss 16.7817   LearningRate 0.0948   Epoch: 0   Global Step: 21910   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:36,742-Speed 2617.18 samples/sec   Loss 16.9030   LearningRate 0.0948   Epoch: 0   Global Step: 21920   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:40,663-Speed 2611.68 samples/sec   Loss 16.9036   LearningRate 0.0948   Epoch: 0   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:44,575-Speed 2618.55 samples/sec   Loss 16.7863   LearningRate 0.0948   Epoch: 0   Global Step: 21940   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:48,486-Speed 2618.33 samples/sec   Loss 16.8626   LearningRate 0.0948   Epoch: 0   Global Step: 21950   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:52,393-Speed 2621.81 samples/sec   Loss 16.7639   LearningRate 0.0948   Epoch: 0   Global Step: 21960   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:11:56,292-Speed 2626.90 samples/sec   Loss 16.9486   LearningRate 0.0948   Epoch: 0   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:00,202-Speed 2619.63 samples/sec   Loss 16.8055   LearningRate 0.0948   Epoch: 0   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:04,118-Speed 2615.54 samples/sec   Loss 16.7627   LearningRate 0.0948   Epoch: 0   Global Step: 21990   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:08,026-Speed 2621.09 samples/sec   Loss 16.9939   LearningRate 0.0948   Epoch: 0   Global Step: 22000   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:11,938-Speed 2618.33 samples/sec   Loss 16.8963   LearningRate 0.0948   Epoch: 0   Global Step: 22010   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:15,881-Speed 2597.13 samples/sec   Loss 16.7661   LearningRate 0.0948   Epoch: 0   Global Step: 22020   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:19,819-Speed 2601.39 samples/sec   Loss 16.7928   LearningRate 0.0948   Epoch: 0   Global Step: 22030   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:23,769-Speed 2592.85 samples/sec   Loss 16.8972   LearningRate 0.0948   Epoch: 0   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:27,717-Speed 2594.96 samples/sec   Loss 16.7560   LearningRate 0.0948   Epoch: 0   Global Step: 22050   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:31,629-Speed 2618.08 samples/sec   Loss 16.7642   LearningRate 0.0948   Epoch: 0   Global Step: 22060   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:35,535-Speed 2622.57 samples/sec   Loss 16.8549   LearningRate 0.0948   Epoch: 0   Global Step: 22070   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:12:39,477-Speed 2598.30 samples/sec   Loss 16.8242   LearningRate 0.0947   Epoch: 0   Global Step: 22080   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:12:43,390-Speed 2617.46 samples/sec   Loss 16.8511   LearningRate 0.0947   Epoch: 0   Global Step: 22090   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:12:47,330-Speed 2599.45 samples/sec   Loss 16.8887   LearningRate 0.0947   Epoch: 0   Global Step: 22100   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:12:51,273-Speed 2598.49 samples/sec   Loss 16.8176   LearningRate 0.0947   Epoch: 0   Global Step: 22110   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:12:55,178-Speed 2622.58 samples/sec   Loss 16.8183   LearningRate 0.0947   Epoch: 0   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:12:59,086-Speed 2620.71 samples/sec   Loss 16.8515   LearningRate 0.0947   Epoch: 0   Global Step: 22130   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:03,005-Speed 2613.39 samples/sec   Loss 16.7055   LearningRate 0.0947   Epoch: 0   Global Step: 22140   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:06,927-Speed 2611.99 samples/sec   Loss 16.7682   LearningRate 0.0947   Epoch: 0   Global Step: 22150   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:10,834-Speed 2621.64 samples/sec   Loss 16.7296   LearningRate 0.0947   Epoch: 0   Global Step: 22160   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:14,742-Speed 2621.10 samples/sec   Loss 16.7022   LearningRate 0.0947   Epoch: 0   Global Step: 22170   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:18,738-Speed 2563.10 samples/sec   Loss 16.9250   LearningRate 0.0947   Epoch: 0   Global Step: 22180   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:22,650-Speed 2618.31 samples/sec   Loss 16.8334   LearningRate 0.0947   Epoch: 0   Global Step: 22190   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:26,596-Speed 2595.90 samples/sec   Loss 16.6100   LearningRate 0.0947   Epoch: 0   Global Step: 22200   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:30,535-Speed 2600.19 samples/sec   Loss 16.6690   LearningRate 0.0947   Epoch: 0   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:34,461-Speed 2608.77 samples/sec   Loss 16.5633   LearningRate 0.0947   Epoch: 0   Global Step: 22220   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:13:38,380-Speed 2613.66 samples/sec   Loss 16.8058   LearningRate 0.0947   Epoch: 0   Global Step: 22230   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:13:42,295-Speed 2616.63 samples/sec   Loss 16.6796   LearningRate 0.0947   Epoch: 0   Global Step: 22240   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:13:46,211-Speed 2614.96 samples/sec   Loss 16.8363   LearningRate 0.0947   Epoch: 0   Global Step: 22250   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:13:50,138-Speed 2608.60 samples/sec   Loss 16.9602   LearningRate 0.0947   Epoch: 0   Global Step: 22260   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:13:54,033-Speed 2629.69 samples/sec   Loss 16.8557   LearningRate 0.0947   Epoch: 0   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:13:57,946-Speed 2617.99 samples/sec   Loss 16.6426   LearningRate 0.0947   Epoch: 0   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:01,883-Speed 2601.68 samples/sec   Loss 16.7326   LearningRate 0.0947   Epoch: 0   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:05,975-Speed 2502.78 samples/sec   Loss 16.8904   LearningRate 0.0947   Epoch: 0   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:10,072-Speed 2499.86 samples/sec   Loss 16.8618   LearningRate 0.0947   Epoch: 0   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:14,118-Speed 2531.59 samples/sec   Loss 16.8052   LearningRate 0.0947   Epoch: 0   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:18,029-Speed 2619.23 samples/sec   Loss 16.7486   LearningRate 0.0947   Epoch: 0   Global Step: 22330   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:21,944-Speed 2616.50 samples/sec   Loss 16.6979   LearningRate 0.0947   Epoch: 0   Global Step: 22340   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:25,848-Speed 2623.91 samples/sec   Loss 16.7916   LearningRate 0.0947   Epoch: 0   Global Step: 22350   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:29,775-Speed 2607.96 samples/sec   Loss 16.8267   LearningRate 0.0947   Epoch: 0   Global Step: 22360   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:33,704-Speed 2606.30 samples/sec   Loss 16.7835   LearningRate 0.0947   Epoch: 0   Global Step: 22370   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:14:37,624-Speed 2613.20 samples/sec   Loss 16.6536   LearningRate 0.0947   Epoch: 0   Global Step: 22380   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:14:41,522-Speed 2627.66 samples/sec   Loss 16.7005   LearningRate 0.0947   Epoch: 0   Global Step: 22390   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:14:45,438-Speed 2615.60 samples/sec   Loss 16.8289   LearningRate 0.0947   Epoch: 0   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:14:49,347-Speed 2620.10 samples/sec   Loss 16.9206   LearningRate 0.0947   Epoch: 0   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:14:53,250-Speed 2624.09 samples/sec   Loss 16.8509   LearningRate 0.0947   Epoch: 0   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:14:57,156-Speed 2622.52 samples/sec   Loss 16.6571   LearningRate 0.0947   Epoch: 0   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:15:01,058-Speed 2625.50 samples/sec   Loss 16.5993   LearningRate 0.0947   Epoch: 0   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:15:04,962-Speed 2622.96 samples/sec   Loss 16.6175   LearningRate 0.0947   Epoch: 0   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:15:08,864-Speed 2624.70 samples/sec   Loss 16.7212   LearningRate 0.0947   Epoch: 0   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:15:12,769-Speed 2623.42 samples/sec   Loss 16.6576   LearningRate 0.0947   Epoch: 0   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:15:16,677-Speed 2620.98 samples/sec   Loss 16.8581   LearningRate 0.0947   Epoch: 0   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:15:20,604-Speed 2608.60 samples/sec   Loss 16.7419   LearningRate 0.0947   Epoch: 0   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:15:24,517-Speed 2617.85 samples/sec   Loss 16.6385   LearningRate 0.0946   Epoch: 0   Global Step: 22500   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:28,434-Speed 2614.90 samples/sec   Loss 16.5988   LearningRate 0.0946   Epoch: 0   Global Step: 22510   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:32,340-Speed 2622.25 samples/sec   Loss 16.5852   LearningRate 0.0946   Epoch: 0   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:36,276-Speed 2602.04 samples/sec   Loss 16.8079   LearningRate 0.0946   Epoch: 0   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:40,185-Speed 2619.94 samples/sec   Loss 16.6693   LearningRate 0.0946   Epoch: 0   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:44,102-Speed 2615.59 samples/sec   Loss 16.6322   LearningRate 0.0946   Epoch: 0   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:48,006-Speed 2623.30 samples/sec   Loss 16.7721   LearningRate 0.0946   Epoch: 0   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:51,932-Speed 2609.38 samples/sec   Loss 16.7160   LearningRate 0.0946   Epoch: 0   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:55,837-Speed 2623.05 samples/sec   Loss 16.7050   LearningRate 0.0946   Epoch: 0   Global Step: 22580   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:15:59,740-Speed 2624.35 samples/sec   Loss 16.5683   LearningRate 0.0946   Epoch: 0   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:16:03,642-Speed 2624.51 samples/sec   Loss 16.7109   LearningRate 0.0946   Epoch: 0   Global Step: 22600   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:16:07,553-Speed 2618.98 samples/sec   Loss 16.6937   LearningRate 0.0946   Epoch: 0   Global Step: 22610   Fp16 Grad Scale: 262144   Required: 91 hours
Training: 2022-04-12 22:16:11,429-Speed 2642.63 samples/sec   Loss 16.6583   LearningRate 0.0946   Epoch: 0   Global Step: 22620   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:15,357-Speed 2607.57 samples/sec   Loss 16.5752   LearningRate 0.0946   Epoch: 0   Global Step: 22630   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:19,277-Speed 2613.26 samples/sec   Loss 16.5559   LearningRate 0.0946   Epoch: 0   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:23,178-Speed 2625.43 samples/sec   Loss 16.7394   LearningRate 0.0946   Epoch: 0   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:27,083-Speed 2623.17 samples/sec   Loss 16.7260   LearningRate 0.0946   Epoch: 0   Global Step: 22660   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:30,999-Speed 2615.94 samples/sec   Loss 16.7089   LearningRate 0.0946   Epoch: 0   Global Step: 22670   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:34,896-Speed 2627.95 samples/sec   Loss 16.6027   LearningRate 0.0946   Epoch: 0   Global Step: 22680   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:38,804-Speed 2621.01 samples/sec   Loss 16.6966   LearningRate 0.0946   Epoch: 0   Global Step: 22690   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:42,720-Speed 2616.22 samples/sec   Loss 16.5720   LearningRate 0.0946   Epoch: 0   Global Step: 22700   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:46,648-Speed 2607.15 samples/sec   Loss 16.7235   LearningRate 0.0946   Epoch: 0   Global Step: 22710   Fp16 Grad Scale: 65536   Required: 91 hours
Training: 2022-04-12 22:16:50,553-Speed 2622.97 samples/sec   Loss 16.4758   LearningRate 0.0946   Epoch: 0   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:16:54,458-Speed 2623.05 samples/sec   Loss 16.6120   LearningRate 0.0946   Epoch: 0   Global Step: 22730   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:16:58,410-Speed 2592.07 samples/sec   Loss 16.5066   LearningRate 0.0946   Epoch: 0   Global Step: 22740   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:17:02,340-Speed 2606.30 samples/sec   Loss 16.6481   LearningRate 0.0946   Epoch: 0   Global Step: 22750   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:17:06,253-Speed 2617.86 samples/sec   Loss 16.6615   LearningRate 0.0946   Epoch: 0   Global Step: 22760   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:17:10,163-Speed 2619.12 samples/sec   Loss 16.5848   LearningRate 0.0946   Epoch: 0   Global Step: 22770   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:17:14,071-Speed 2620.94 samples/sec   Loss 16.6244   LearningRate 0.0946   Epoch: 0   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:17:17,987-Speed 2614.99 samples/sec   Loss 16.4830   LearningRate 0.0946   Epoch: 0   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 91 hours
Training: 2022-04-12 22:17:21,898-Speed 2619.21 samples/sec   Loss 16.6417   LearningRate 0.0946   Epoch: 0   Global Step: 22800   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:17:25,800-Speed 2625.39 samples/sec   Loss 16.7599   LearningRate 0.0946   Epoch: 0   Global Step: 22810   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:17:29,704-Speed 2623.33 samples/sec   Loss 16.6405   LearningRate 0.0946   Epoch: 0   Global Step: 22820   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:17:33,613-Speed 2620.37 samples/sec   Loss 16.4435   LearningRate 0.0946   Epoch: 0   Global Step: 22830   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:17:37,507-Speed 2632.65 samples/sec   Loss 16.6291   LearningRate 0.0946   Epoch: 0   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:17:41,405-Speed 2627.74 samples/sec   Loss 16.5831   LearningRate 0.0946   Epoch: 0   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:17:45,311-Speed 2621.99 samples/sec   Loss 16.5760   LearningRate 0.0946   Epoch: 0   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:17:49,209-Speed 2627.10 samples/sec   Loss 16.7358   LearningRate 0.0946   Epoch: 0   Global Step: 22870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:17:53,116-Speed 2621.97 samples/sec   Loss 16.4928   LearningRate 0.0946   Epoch: 0   Global Step: 22880   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:17:57,018-Speed 2625.29 samples/sec   Loss 16.5705   LearningRate 0.0946   Epoch: 0   Global Step: 22890   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:18:00,922-Speed 2623.82 samples/sec   Loss 16.5105   LearningRate 0.0946   Epoch: 0   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:18:04,823-Speed 2625.17 samples/sec   Loss 16.4597   LearningRate 0.0946   Epoch: 0   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:18:08,722-Speed 2626.89 samples/sec   Loss 16.5756   LearningRate 0.0946   Epoch: 0   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:18:12,625-Speed 2624.10 samples/sec   Loss 16.5930   LearningRate 0.0945   Epoch: 0   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:18:16,530-Speed 2623.16 samples/sec   Loss 16.6126   LearningRate 0.0945   Epoch: 0   Global Step: 22940   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:20,433-Speed 2623.86 samples/sec   Loss 16.6224   LearningRate 0.0945   Epoch: 0   Global Step: 22950   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:24,340-Speed 2621.65 samples/sec   Loss 16.7100   LearningRate 0.0945   Epoch: 0   Global Step: 22960   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:28,243-Speed 2624.88 samples/sec   Loss 16.6416   LearningRate 0.0945   Epoch: 0   Global Step: 22970   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:32,142-Speed 2627.17 samples/sec   Loss 16.7317   LearningRate 0.0945   Epoch: 0   Global Step: 22980   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:36,042-Speed 2625.56 samples/sec   Loss 16.3526   LearningRate 0.0945   Epoch: 0   Global Step: 22990   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:39,952-Speed 2619.62 samples/sec   Loss 16.6539   LearningRate 0.0945   Epoch: 0   Global Step: 23000   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:43,890-Speed 2601.56 samples/sec   Loss 16.6327   LearningRate 0.0945   Epoch: 0   Global Step: 23010   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:47,791-Speed 2625.24 samples/sec   Loss 16.6432   LearningRate 0.0945   Epoch: 0   Global Step: 23020   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:51,691-Speed 2626.20 samples/sec   Loss 16.4949   LearningRate 0.0945   Epoch: 0   Global Step: 23030   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:18:55,607-Speed 2615.79 samples/sec   Loss 16.7332   LearningRate 0.0945   Epoch: 0   Global Step: 23040   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:18:59,496-Speed 2633.89 samples/sec   Loss 16.5109   LearningRate 0.0945   Epoch: 0   Global Step: 23050   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:03,494-Speed 2561.36 samples/sec   Loss 16.5240   LearningRate 0.0945   Epoch: 0   Global Step: 23060   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:07,402-Speed 2621.51 samples/sec   Loss 16.6508   LearningRate 0.0945   Epoch: 0   Global Step: 23070   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:11,313-Speed 2618.40 samples/sec   Loss 16.5687   LearningRate 0.0945   Epoch: 0   Global Step: 23080   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:15,222-Speed 2620.33 samples/sec   Loss 16.7593   LearningRate 0.0945   Epoch: 0   Global Step: 23090   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:19,133-Speed 2619.34 samples/sec   Loss 16.5569   LearningRate 0.0945   Epoch: 0   Global Step: 23100   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:23,056-Speed 2611.00 samples/sec   Loss 16.3881   LearningRate 0.0945   Epoch: 0   Global Step: 23110   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:26,956-Speed 2626.09 samples/sec   Loss 16.4379   LearningRate 0.0945   Epoch: 0   Global Step: 23120   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:30,866-Speed 2619.21 samples/sec   Loss 16.4483   LearningRate 0.0945   Epoch: 0   Global Step: 23130   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:34,783-Speed 2614.79 samples/sec   Loss 16.3667   LearningRate 0.0945   Epoch: 0   Global Step: 23140   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:38,679-Speed 2629.04 samples/sec   Loss 16.5144   LearningRate 0.0945   Epoch: 0   Global Step: 23150   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:42,591-Speed 2618.07 samples/sec   Loss 16.4234   LearningRate 0.0945   Epoch: 0   Global Step: 23160   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:19:46,484-Speed 2631.34 samples/sec   Loss 16.5534   LearningRate 0.0945   Epoch: 0   Global Step: 23170   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:19:50,393-Speed 2620.47 samples/sec   Loss 16.6336   LearningRate 0.0945   Epoch: 0   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:19:54,293-Speed 2626.27 samples/sec   Loss 16.4652   LearningRate 0.0945   Epoch: 0   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:19:58,194-Speed 2625.49 samples/sec   Loss 16.4841   LearningRate 0.0945   Epoch: 0   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:20:02,094-Speed 2626.39 samples/sec   Loss 16.5268   LearningRate 0.0945   Epoch: 0   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:20:05,992-Speed 2627.54 samples/sec   Loss 16.4997   LearningRate 0.0945   Epoch: 0   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:20:09,893-Speed 2625.50 samples/sec   Loss 16.6351   LearningRate 0.0945   Epoch: 0   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:20:13,799-Speed 2622.11 samples/sec   Loss 16.5465   LearningRate 0.0945   Epoch: 0   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:20:17,710-Speed 2619.43 samples/sec   Loss 16.5233   LearningRate 0.0945   Epoch: 0   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:20:21,613-Speed 2624.16 samples/sec   Loss 16.4842   LearningRate 0.0945   Epoch: 0   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:20:25,520-Speed 2621.74 samples/sec   Loss 16.6100   LearningRate 0.0945   Epoch: 0   Global Step: 23270   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:29,502-Speed 2572.28 samples/sec   Loss 16.6170   LearningRate 0.0945   Epoch: 0   Global Step: 23280   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:33,413-Speed 2618.65 samples/sec   Loss 16.4927   LearningRate 0.0945   Epoch: 0   Global Step: 23290   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:37,312-Speed 2626.64 samples/sec   Loss 16.5420   LearningRate 0.0945   Epoch: 0   Global Step: 23300   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:41,226-Speed 2616.99 samples/sec   Loss 16.3600   LearningRate 0.0945   Epoch: 0   Global Step: 23310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:45,131-Speed 2622.89 samples/sec   Loss 16.5680   LearningRate 0.0945   Epoch: 0   Global Step: 23320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:49,037-Speed 2623.15 samples/sec   Loss 16.5730   LearningRate 0.0945   Epoch: 0   Global Step: 23330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:52,942-Speed 2622.58 samples/sec   Loss 16.3344   LearningRate 0.0945   Epoch: 0   Global Step: 23340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:20:56,843-Speed 2625.64 samples/sec   Loss 16.2890   LearningRate 0.0944   Epoch: 0   Global Step: 23350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:21:00,751-Speed 2620.70 samples/sec   Loss 16.4035   LearningRate 0.0944   Epoch: 0   Global Step: 23360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:21:04,666-Speed 2616.14 samples/sec   Loss 16.3842   LearningRate 0.0944   Epoch: 0   Global Step: 23370   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:21:08,553-Speed 2634.50 samples/sec   Loss 16.3626   LearningRate 0.0944   Epoch: 0   Global Step: 23380   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:21:12,460-Speed 2622.28 samples/sec   Loss 16.5520   LearningRate 0.0944   Epoch: 0   Global Step: 23390   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:21:16,368-Speed 2620.94 samples/sec   Loss 16.4575   LearningRate 0.0944   Epoch: 0   Global Step: 23400   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:21:20,254-Speed 2635.74 samples/sec   Loss 16.4034   LearningRate 0.0944   Epoch: 0   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:24,156-Speed 2625.16 samples/sec   Loss 16.5067   LearningRate 0.0944   Epoch: 0   Global Step: 23420   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:28,058-Speed 2624.56 samples/sec   Loss 16.2884   LearningRate 0.0944   Epoch: 0   Global Step: 23430   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:31,957-Speed 2627.04 samples/sec   Loss 16.4724   LearningRate 0.0944   Epoch: 0   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:35,856-Speed 2626.90 samples/sec   Loss 16.5573   LearningRate 0.0944   Epoch: 0   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:39,760-Speed 2623.16 samples/sec   Loss 16.6909   LearningRate 0.0944   Epoch: 0   Global Step: 23460   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:43,668-Speed 2621.60 samples/sec   Loss 16.2647   LearningRate 0.0944   Epoch: 0   Global Step: 23470   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:47,568-Speed 2626.33 samples/sec   Loss 16.4521   LearningRate 0.0944   Epoch: 0   Global Step: 23480   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:51,470-Speed 2625.03 samples/sec   Loss 16.3547   LearningRate 0.0944   Epoch: 0   Global Step: 23490   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:55,378-Speed 2620.37 samples/sec   Loss 16.2704   LearningRate 0.0944   Epoch: 0   Global Step: 23500   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:21:59,365-Speed 2569.45 samples/sec   Loss 16.5173   LearningRate 0.0944   Epoch: 0   Global Step: 23510   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:03,294-Speed 2606.54 samples/sec   Loss 16.2632   LearningRate 0.0944   Epoch: 0   Global Step: 23520   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:07,191-Speed 2627.75 samples/sec   Loss 16.3426   LearningRate 0.0944   Epoch: 0   Global Step: 23530   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:11,101-Speed 2619.71 samples/sec   Loss 16.4848   LearningRate 0.0944   Epoch: 0   Global Step: 23540   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:15,004-Speed 2624.37 samples/sec   Loss 16.3858   LearningRate 0.0944   Epoch: 0   Global Step: 23550   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:18,908-Speed 2624.05 samples/sec   Loss 16.5043   LearningRate 0.0944   Epoch: 0   Global Step: 23560   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:22,808-Speed 2625.84 samples/sec   Loss 16.3076   LearningRate 0.0944   Epoch: 0   Global Step: 23570   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:26,706-Speed 2627.62 samples/sec   Loss 16.4921   LearningRate 0.0944   Epoch: 0   Global Step: 23580   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:30,624-Speed 2613.99 samples/sec   Loss 16.3556   LearningRate 0.0944   Epoch: 0   Global Step: 23590   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:34,528-Speed 2623.77 samples/sec   Loss 16.2611   LearningRate 0.0944   Epoch: 0   Global Step: 23600   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:38,415-Speed 2634.82 samples/sec   Loss 16.4782   LearningRate 0.0944   Epoch: 0   Global Step: 23610   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:22:42,300-Speed 2636.80 samples/sec   Loss 16.4489   LearningRate 0.0944   Epoch: 0   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:22:46,206-Speed 2622.12 samples/sec   Loss 16.4776   LearningRate 0.0944   Epoch: 0   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:22:50,149-Speed 2597.39 samples/sec   Loss 16.3204   LearningRate 0.0944   Epoch: 0   Global Step: 23640   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:22:54,045-Speed 2628.81 samples/sec   Loss 16.3845   LearningRate 0.0944   Epoch: 0   Global Step: 23650   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:22:57,942-Speed 2629.22 samples/sec   Loss 16.6311   LearningRate 0.0944   Epoch: 0   Global Step: 23660   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:01,843-Speed 2625.51 samples/sec   Loss 16.4749   LearningRate 0.0944   Epoch: 0   Global Step: 23670   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:05,745-Speed 2624.69 samples/sec   Loss 16.4679   LearningRate 0.0944   Epoch: 0   Global Step: 23680   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:09,649-Speed 2623.48 samples/sec   Loss 16.3462   LearningRate 0.0944   Epoch: 0   Global Step: 23690   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:13,555-Speed 2622.18 samples/sec   Loss 16.2171   LearningRate 0.0944   Epoch: 0   Global Step: 23700   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:17,460-Speed 2622.92 samples/sec   Loss 16.2067   LearningRate 0.0944   Epoch: 0   Global Step: 23710   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:21,358-Speed 2627.41 samples/sec   Loss 16.1076   LearningRate 0.0944   Epoch: 0   Global Step: 23720   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:23:25,263-Speed 2623.40 samples/sec   Loss 16.2866   LearningRate 0.0944   Epoch: 0   Global Step: 23730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:23:29,171-Speed 2621.14 samples/sec   Loss 16.1951   LearningRate 0.0944   Epoch: 0   Global Step: 23740   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:23:33,066-Speed 2629.19 samples/sec   Loss 16.4544   LearningRate 0.0944   Epoch: 0   Global Step: 23750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:23:36,972-Speed 2622.31 samples/sec   Loss 16.5291   LearningRate 0.0944   Epoch: 0   Global Step: 23760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:23:40,868-Speed 2628.43 samples/sec   Loss 16.4718   LearningRate 0.0944   Epoch: 0   Global Step: 23770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:23:44,758-Speed 2633.27 samples/sec   Loss 16.4226   LearningRate 0.0943   Epoch: 0   Global Step: 23780   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:48,662-Speed 2623.21 samples/sec   Loss 16.4711   LearningRate 0.0943   Epoch: 0   Global Step: 23790   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:52,565-Speed 2624.08 samples/sec   Loss 16.2945   LearningRate 0.0943   Epoch: 0   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:23:56,463-Speed 2627.41 samples/sec   Loss 16.4693   LearningRate 0.0943   Epoch: 0   Global Step: 23810   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:00,362-Speed 2627.66 samples/sec   Loss 16.4158   LearningRate 0.0943   Epoch: 0   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:04,258-Speed 2629.14 samples/sec   Loss 16.2077   LearningRate 0.0943   Epoch: 0   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:08,157-Speed 2626.19 samples/sec   Loss 16.5161   LearningRate 0.0943   Epoch: 0   Global Step: 23840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:12,056-Speed 2627.22 samples/sec   Loss 16.3597   LearningRate 0.0943   Epoch: 0   Global Step: 23850   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:15,957-Speed 2625.75 samples/sec   Loss 16.4629   LearningRate 0.0943   Epoch: 0   Global Step: 23860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:19,857-Speed 2626.13 samples/sec   Loss 16.5167   LearningRate 0.0943   Epoch: 0   Global Step: 23870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:23,750-Speed 2630.28 samples/sec   Loss 16.3669   LearningRate 0.0943   Epoch: 0   Global Step: 23880   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:24:27,656-Speed 2622.60 samples/sec   Loss 16.3555   LearningRate 0.0943   Epoch: 0   Global Step: 23890   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:24:31,627-Speed 2579.33 samples/sec   Loss 16.3344   LearningRate 0.0943   Epoch: 0   Global Step: 23900   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:24:35,537-Speed 2619.21 samples/sec   Loss 16.4348   LearningRate 0.0943   Epoch: 0   Global Step: 23910   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:24:39,427-Speed 2633.43 samples/sec   Loss 16.3685   LearningRate 0.0943   Epoch: 0   Global Step: 23920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:43,463-Speed 2537.66 samples/sec   Loss 16.4127   LearningRate 0.0943   Epoch: 0   Global Step: 23930   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:47,362-Speed 2627.31 samples/sec   Loss 16.5432   LearningRate 0.0943   Epoch: 0   Global Step: 23940   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:51,263-Speed 2625.45 samples/sec   Loss 16.3310   LearningRate 0.0943   Epoch: 0   Global Step: 23950   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:55,168-Speed 2622.80 samples/sec   Loss 16.2016   LearningRate 0.0943   Epoch: 0   Global Step: 23960   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:24:59,070-Speed 2624.54 samples/sec   Loss 16.4113   LearningRate 0.0943   Epoch: 0   Global Step: 23970   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:25:02,974-Speed 2623.26 samples/sec   Loss 16.1036   LearningRate 0.0943   Epoch: 0   Global Step: 23980   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:25:06,873-Speed 2627.50 samples/sec   Loss 16.3123   LearningRate 0.0943   Epoch: 0   Global Step: 23990   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:25:10,765-Speed 2631.56 samples/sec   Loss 16.3975   LearningRate 0.0943   Epoch: 0   Global Step: 24000   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:14,685-Speed 2613.41 samples/sec   Loss 16.3409   LearningRate 0.0943   Epoch: 0   Global Step: 24010   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:18,578-Speed 2630.86 samples/sec   Loss 16.3821   LearningRate 0.0943   Epoch: 0   Global Step: 24020   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:22,486-Speed 2621.13 samples/sec   Loss 16.5595   LearningRate 0.0943   Epoch: 0   Global Step: 24030   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:26,380-Speed 2630.43 samples/sec   Loss 16.2933   LearningRate 0.0943   Epoch: 0   Global Step: 24040   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:30,277-Speed 2628.80 samples/sec   Loss 16.2740   LearningRate 0.0943   Epoch: 0   Global Step: 24050   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:34,180-Speed 2624.00 samples/sec   Loss 16.3091   LearningRate 0.0943   Epoch: 0   Global Step: 24060   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:38,084-Speed 2623.46 samples/sec   Loss 16.3472   LearningRate 0.0943   Epoch: 0   Global Step: 24070   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:41,982-Speed 2627.46 samples/sec   Loss 16.3115   LearningRate 0.0943   Epoch: 0   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:45,894-Speed 2618.39 samples/sec   Loss 16.0769   LearningRate 0.0943   Epoch: 0   Global Step: 24090   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:25:49,792-Speed 2627.88 samples/sec   Loss 16.1843   LearningRate 0.0943   Epoch: 0   Global Step: 24100   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:25:53,690-Speed 2627.88 samples/sec   Loss 16.3162   LearningRate 0.0943   Epoch: 0   Global Step: 24110   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:25:57,588-Speed 2627.45 samples/sec   Loss 16.4758   LearningRate 0.0943   Epoch: 0   Global Step: 24120   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:01,495-Speed 2621.36 samples/sec   Loss 16.3187   LearningRate 0.0943   Epoch: 0   Global Step: 24130   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:05,398-Speed 2624.04 samples/sec   Loss 16.3519   LearningRate 0.0943   Epoch: 0   Global Step: 24140   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:09,298-Speed 2626.05 samples/sec   Loss 16.2054   LearningRate 0.0943   Epoch: 0   Global Step: 24150   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:13,197-Speed 2627.52 samples/sec   Loss 16.3593   LearningRate 0.0943   Epoch: 0   Global Step: 24160   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:17,099-Speed 2624.77 samples/sec   Loss 16.2533   LearningRate 0.0943   Epoch: 0   Global Step: 24170   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:20,996-Speed 2628.62 samples/sec   Loss 16.3028   LearningRate 0.0943   Epoch: 0   Global Step: 24180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:24,897-Speed 2626.13 samples/sec   Loss 16.4042   LearningRate 0.0943   Epoch: 0   Global Step: 24190   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:28,795-Speed 2627.28 samples/sec   Loss 16.2423   LearningRate 0.0943   Epoch: 0   Global Step: 24200   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:26:32,677-Speed 2638.64 samples/sec   Loss 16.3283   LearningRate 0.0942   Epoch: 0   Global Step: 24210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:36,574-Speed 2628.36 samples/sec   Loss 16.2913   LearningRate 0.0942   Epoch: 0   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:40,475-Speed 2625.25 samples/sec   Loss 16.3129   LearningRate 0.0942   Epoch: 0   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:44,451-Speed 2576.47 samples/sec   Loss 16.3265   LearningRate 0.0942   Epoch: 0   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:48,363-Speed 2618.12 samples/sec   Loss 16.1471   LearningRate 0.0942   Epoch: 0   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:52,257-Speed 2630.53 samples/sec   Loss 16.3599   LearningRate 0.0942   Epoch: 0   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:26:56,154-Speed 2628.13 samples/sec   Loss 16.3434   LearningRate 0.0942   Epoch: 0   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:27:00,054-Speed 2626.86 samples/sec   Loss 16.2524   LearningRate 0.0942   Epoch: 0   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:27:03,955-Speed 2625.57 samples/sec   Loss 16.3845   LearningRate 0.0942   Epoch: 0   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:27:07,855-Speed 2625.92 samples/sec   Loss 16.2886   LearningRate 0.0942   Epoch: 0   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:27:11,752-Speed 2628.49 samples/sec   Loss 16.2064   LearningRate 0.0942   Epoch: 0   Global Step: 24310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:15,668-Speed 2615.59 samples/sec   Loss 16.3439   LearningRate 0.0942   Epoch: 0   Global Step: 24320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:19,570-Speed 2625.11 samples/sec   Loss 16.3769   LearningRate 0.0942   Epoch: 0   Global Step: 24330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:23,471-Speed 2625.75 samples/sec   Loss 16.2354   LearningRate 0.0942   Epoch: 0   Global Step: 24340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:27,376-Speed 2622.94 samples/sec   Loss 16.3487   LearningRate 0.0942   Epoch: 0   Global Step: 24350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:31,277-Speed 2625.78 samples/sec   Loss 16.1938   LearningRate 0.0942   Epoch: 0   Global Step: 24360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:35,177-Speed 2626.51 samples/sec   Loss 16.3039   LearningRate 0.0942   Epoch: 0   Global Step: 24370   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:39,089-Speed 2617.52 samples/sec   Loss 16.3368   LearningRate 0.0942   Epoch: 0   Global Step: 24380   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:43,028-Speed 2601.06 samples/sec   Loss 16.1631   LearningRate 0.0942   Epoch: 0   Global Step: 24390   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:46,933-Speed 2622.74 samples/sec   Loss 16.3247   LearningRate 0.0942   Epoch: 0   Global Step: 24400   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:50,817-Speed 2637.27 samples/sec   Loss 16.1291   LearningRate 0.0942   Epoch: 0   Global Step: 24410   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:54,714-Speed 2627.86 samples/sec   Loss 16.1441   LearningRate 0.0942   Epoch: 0   Global Step: 24420   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:27:58,615-Speed 2626.33 samples/sec   Loss 16.1917   LearningRate 0.0942   Epoch: 0   Global Step: 24430   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:02,520-Speed 2623.15 samples/sec   Loss 16.2824   LearningRate 0.0942   Epoch: 0   Global Step: 24440   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:06,433-Speed 2617.28 samples/sec   Loss 15.9994   LearningRate 0.0942   Epoch: 0   Global Step: 24450   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:10,331-Speed 2627.06 samples/sec   Loss 15.9221   LearningRate 0.0942   Epoch: 0   Global Step: 24460   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:14,242-Speed 2619.87 samples/sec   Loss 16.2837   LearningRate 0.0942   Epoch: 0   Global Step: 24470   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:18,144-Speed 2624.88 samples/sec   Loss 16.0933   LearningRate 0.0942   Epoch: 0   Global Step: 24480   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:22,043-Speed 2626.55 samples/sec   Loss 16.1981   LearningRate 0.0942   Epoch: 0   Global Step: 24490   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:25,945-Speed 2624.86 samples/sec   Loss 16.2517   LearningRate 0.0942   Epoch: 0   Global Step: 24500   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:29,845-Speed 2626.80 samples/sec   Loss 16.1236   LearningRate 0.0942   Epoch: 0   Global Step: 24510   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:33,748-Speed 2623.95 samples/sec   Loss 16.2278   LearningRate 0.0942   Epoch: 0   Global Step: 24520   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:37,669-Speed 2611.91 samples/sec   Loss 16.2078   LearningRate 0.0942   Epoch: 0   Global Step: 24530   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:41,571-Speed 2625.33 samples/sec   Loss 16.2233   LearningRate 0.0942   Epoch: 0   Global Step: 24540   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:45,493-Speed 2611.62 samples/sec   Loss 16.2463   LearningRate 0.0942   Epoch: 0   Global Step: 24550   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:49,401-Speed 2621.57 samples/sec   Loss 16.2629   LearningRate 0.0942   Epoch: 0   Global Step: 24560   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:53,324-Speed 2610.46 samples/sec   Loss 16.1297   LearningRate 0.0942   Epoch: 0   Global Step: 24570   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:28:57,224-Speed 2626.51 samples/sec   Loss 16.3627   LearningRate 0.0942   Epoch: 0   Global Step: 24580   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:01,133-Speed 2619.81 samples/sec   Loss 16.2449   LearningRate 0.0942   Epoch: 0   Global Step: 24590   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:05,035-Speed 2625.45 samples/sec   Loss 16.2264   LearningRate 0.0942   Epoch: 0   Global Step: 24600   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:08,923-Speed 2634.53 samples/sec   Loss 16.0792   LearningRate 0.0942   Epoch: 0   Global Step: 24610   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:12,824-Speed 2625.66 samples/sec   Loss 16.1326   LearningRate 0.0942   Epoch: 0   Global Step: 24620   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:16,719-Speed 2629.96 samples/sec   Loss 16.1789   LearningRate 0.0942   Epoch: 0   Global Step: 24630   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:20,617-Speed 2627.70 samples/sec   Loss 16.0838   LearningRate 0.0941   Epoch: 0   Global Step: 24640   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:24,519-Speed 2624.98 samples/sec   Loss 16.1601   LearningRate 0.0941   Epoch: 0   Global Step: 24650   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:28,419-Speed 2626.29 samples/sec   Loss 16.2356   LearningRate 0.0941   Epoch: 0   Global Step: 24660   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:32,320-Speed 2625.65 samples/sec   Loss 16.2092   LearningRate 0.0941   Epoch: 0   Global Step: 24670   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:36,228-Speed 2621.01 samples/sec   Loss 16.1104   LearningRate 0.0941   Epoch: 0   Global Step: 24680   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:40,131-Speed 2624.25 samples/sec   Loss 16.2075   LearningRate 0.0941   Epoch: 0   Global Step: 24690   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:44,026-Speed 2629.55 samples/sec   Loss 16.2501   LearningRate 0.0941   Epoch: 0   Global Step: 24700   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:47,940-Speed 2617.12 samples/sec   Loss 16.1106   LearningRate 0.0941   Epoch: 0   Global Step: 24710   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:29:51,825-Speed 2636.64 samples/sec   Loss 16.0720   LearningRate 0.0941   Epoch: 0   Global Step: 24720   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:55,726-Speed 2625.51 samples/sec   Loss 16.1187   LearningRate 0.0941   Epoch: 0   Global Step: 24730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:29:59,632-Speed 2621.64 samples/sec   Loss 15.9715   LearningRate 0.0941   Epoch: 0   Global Step: 24740   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:03,530-Speed 2627.82 samples/sec   Loss 16.1906   LearningRate 0.0941   Epoch: 0   Global Step: 24750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:07,430-Speed 2626.81 samples/sec   Loss 16.1093   LearningRate 0.0941   Epoch: 0   Global Step: 24760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:11,324-Speed 2630.61 samples/sec   Loss 16.2786   LearningRate 0.0941   Epoch: 0   Global Step: 24770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:15,220-Speed 2628.60 samples/sec   Loss 16.2142   LearningRate 0.0941   Epoch: 0   Global Step: 24780   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:19,116-Speed 2629.32 samples/sec   Loss 16.0429   LearningRate 0.0941   Epoch: 0   Global Step: 24790   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:23,016-Speed 2625.47 samples/sec   Loss 16.2146   LearningRate 0.0941   Epoch: 0   Global Step: 24800   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:26,914-Speed 2627.96 samples/sec   Loss 16.1350   LearningRate 0.0941   Epoch: 0   Global Step: 24810   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:30,793-Speed 2639.83 samples/sec   Loss 16.0570   LearningRate 0.0941   Epoch: 0   Global Step: 24820   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:34,693-Speed 2627.07 samples/sec   Loss 16.2219   LearningRate 0.0941   Epoch: 0   Global Step: 24830   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:30:38,578-Speed 2636.70 samples/sec   Loss 16.0566   LearningRate 0.0941   Epoch: 0   Global Step: 24840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:30:42,473-Speed 2629.86 samples/sec   Loss 15.9370   LearningRate 0.0941   Epoch: 0   Global Step: 24850   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:30:46,369-Speed 2628.73 samples/sec   Loss 16.0747   LearningRate 0.0941   Epoch: 0   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:30:50,270-Speed 2625.69 samples/sec   Loss 16.2535   LearningRate 0.0941   Epoch: 0   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:30:54,168-Speed 2627.39 samples/sec   Loss 16.1590   LearningRate 0.0941   Epoch: 0   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:30:58,074-Speed 2622.70 samples/sec   Loss 16.2205   LearningRate 0.0941   Epoch: 0   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:31:01,973-Speed 2626.20 samples/sec   Loss 16.1926   LearningRate 0.0941   Epoch: 0   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:31:05,874-Speed 2625.66 samples/sec   Loss 15.9412   LearningRate 0.0941   Epoch: 0   Global Step: 24910   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:31:09,777-Speed 2624.70 samples/sec   Loss 16.1645   LearningRate 0.0941   Epoch: 0   Global Step: 24920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:31:13,681-Speed 2623.20 samples/sec   Loss 15.9605   LearningRate 0.0941   Epoch: 0   Global Step: 24930   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:31:17,589-Speed 2621.30 samples/sec   Loss 16.1188   LearningRate 0.0941   Epoch: 0   Global Step: 24940   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:21,484-Speed 2629.62 samples/sec   Loss 16.0613   LearningRate 0.0941   Epoch: 0   Global Step: 24950   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:25,378-Speed 2630.28 samples/sec   Loss 16.1395   LearningRate 0.0941   Epoch: 0   Global Step: 24960   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:29,280-Speed 2624.57 samples/sec   Loss 16.0491   LearningRate 0.0941   Epoch: 0   Global Step: 24970   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:33,177-Speed 2628.25 samples/sec   Loss 16.0942   LearningRate 0.0941   Epoch: 0   Global Step: 24980   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:37,073-Speed 2628.94 samples/sec   Loss 15.9214   LearningRate 0.0941   Epoch: 0   Global Step: 24990   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:40,974-Speed 2625.68 samples/sec   Loss 16.2773   LearningRate 0.0941   Epoch: 0   Global Step: 25000   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:44,886-Speed 2618.74 samples/sec   Loss 16.1864   LearningRate 0.0941   Epoch: 0   Global Step: 25010   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:48,780-Speed 2629.72 samples/sec   Loss 16.0812   LearningRate 0.0941   Epoch: 0   Global Step: 25020   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:52,682-Speed 2625.59 samples/sec   Loss 16.0169   LearningRate 0.0941   Epoch: 0   Global Step: 25030   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:31:56,582-Speed 2626.30 samples/sec   Loss 16.0744   LearningRate 0.0941   Epoch: 0   Global Step: 25040   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:32:00,459-Speed 2642.06 samples/sec   Loss 16.1355   LearningRate 0.0941   Epoch: 0   Global Step: 25050   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:04,360-Speed 2625.39 samples/sec   Loss 15.9694   LearningRate 0.0940   Epoch: 0   Global Step: 25060   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:08,282-Speed 2611.09 samples/sec   Loss 15.9633   LearningRate 0.0940   Epoch: 0   Global Step: 25070   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:12,181-Speed 2627.21 samples/sec   Loss 16.0465   LearningRate 0.0940   Epoch: 0   Global Step: 25080   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:16,077-Speed 2629.47 samples/sec   Loss 16.1060   LearningRate 0.0940   Epoch: 0   Global Step: 25090   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:19,977-Speed 2626.99 samples/sec   Loss 16.2154   LearningRate 0.0940   Epoch: 0   Global Step: 25100   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:23,879-Speed 2624.85 samples/sec   Loss 16.0709   LearningRate 0.0940   Epoch: 0   Global Step: 25110   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:27,778-Speed 2626.73 samples/sec   Loss 16.1173   LearningRate 0.0940   Epoch: 0   Global Step: 25120   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:31,678-Speed 2626.67 samples/sec   Loss 16.2152   LearningRate 0.0940   Epoch: 0   Global Step: 25130   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:35,576-Speed 2627.39 samples/sec   Loss 16.0528   LearningRate 0.0940   Epoch: 0   Global Step: 25140   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:39,457-Speed 2639.00 samples/sec   Loss 16.0490   LearningRate 0.0940   Epoch: 0   Global Step: 25150   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:43,355-Speed 2627.90 samples/sec   Loss 16.1691   LearningRate 0.0940   Epoch: 0   Global Step: 25160   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:47,252-Speed 2628.53 samples/sec   Loss 16.1384   LearningRate 0.0940   Epoch: 0   Global Step: 25170   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:51,192-Speed 2599.24 samples/sec   Loss 15.9610   LearningRate 0.0940   Epoch: 0   Global Step: 25180   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:55,096-Speed 2624.11 samples/sec   Loss 15.9938   LearningRate 0.0940   Epoch: 0   Global Step: 25190   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:32:58,992-Speed 2629.03 samples/sec   Loss 16.1451   LearningRate 0.0940   Epoch: 0   Global Step: 25200   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:02,902-Speed 2619.53 samples/sec   Loss 16.0753   LearningRate 0.0940   Epoch: 0   Global Step: 25210   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:06,801-Speed 2626.82 samples/sec   Loss 16.0884   LearningRate 0.0940   Epoch: 0   Global Step: 25220   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:10,702-Speed 2625.77 samples/sec   Loss 16.1005   LearningRate 0.0940   Epoch: 0   Global Step: 25230   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:14,614-Speed 2618.52 samples/sec   Loss 16.0414   LearningRate 0.0940   Epoch: 0   Global Step: 25240   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:18,514-Speed 2626.58 samples/sec   Loss 16.1543   LearningRate 0.0940   Epoch: 0   Global Step: 25250   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:33:22,580-Speed 2519.01 samples/sec   Loss 15.9576   LearningRate 0.0940   Epoch: 0   Global Step: 25260   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:26,499-Speed 2613.55 samples/sec   Loss 16.2170   LearningRate 0.0940   Epoch: 0   Global Step: 25270   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:30,430-Speed 2605.80 samples/sec   Loss 16.1568   LearningRate 0.0940   Epoch: 0   Global Step: 25280   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:34,329-Speed 2627.12 samples/sec   Loss 16.0414   LearningRate 0.0940   Epoch: 0   Global Step: 25290   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:38,227-Speed 2627.73 samples/sec   Loss 15.9548   LearningRate 0.0940   Epoch: 0   Global Step: 25300   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:42,124-Speed 2627.66 samples/sec   Loss 16.1961   LearningRate 0.0940   Epoch: 0   Global Step: 25310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:46,020-Speed 2629.04 samples/sec   Loss 16.0834   LearningRate 0.0940   Epoch: 0   Global Step: 25320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:49,927-Speed 2622.13 samples/sec   Loss 15.8710   LearningRate 0.0940   Epoch: 0   Global Step: 25330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:53,968-Speed 2534.76 samples/sec   Loss 15.9740   LearningRate 0.0940   Epoch: 0   Global Step: 25340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:33:57,981-Speed 2552.36 samples/sec   Loss 16.1969   LearningRate 0.0940   Epoch: 0   Global Step: 25350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:01,863-Speed 2638.57 samples/sec   Loss 16.1169   LearningRate 0.0940   Epoch: 0   Global Step: 25360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:05,763-Speed 2626.07 samples/sec   Loss 16.1198   LearningRate 0.0940   Epoch: 0   Global Step: 25370   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:09,686-Speed 2610.73 samples/sec   Loss 16.2013   LearningRate 0.0940   Epoch: 0   Global Step: 25380   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:13,582-Speed 2633.26 samples/sec   Loss 16.0836   LearningRate 0.0940   Epoch: 0   Global Step: 25390   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:17,484-Speed 2624.38 samples/sec   Loss 16.0941   LearningRate 0.0940   Epoch: 0   Global Step: 25400   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:21,398-Speed 2616.97 samples/sec   Loss 16.0220   LearningRate 0.0940   Epoch: 0   Global Step: 25410   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:25,294-Speed 2629.26 samples/sec   Loss 16.0441   LearningRate 0.0940   Epoch: 0   Global Step: 25420   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:34:29,178-Speed 2636.95 samples/sec   Loss 15.8189   LearningRate 0.0940   Epoch: 0   Global Step: 25430   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:34:33,085-Speed 2621.59 samples/sec   Loss 16.0026   LearningRate 0.0940   Epoch: 0   Global Step: 25440   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:34:36,982-Speed 2628.35 samples/sec   Loss 15.8957   LearningRate 0.0940   Epoch: 0   Global Step: 25450   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:34:40,879-Speed 2627.98 samples/sec   Loss 16.0913   LearningRate 0.0940   Epoch: 0   Global Step: 25460   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:34:44,775-Speed 2629.44 samples/sec   Loss 16.0890   LearningRate 0.0940   Epoch: 0   Global Step: 25470   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:34:48,671-Speed 2628.85 samples/sec   Loss 16.0451   LearningRate 0.0940   Epoch: 0   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:34:52,567-Speed 2628.93 samples/sec   Loss 16.0027   LearningRate 0.0939   Epoch: 0   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:34:56,463-Speed 2629.27 samples/sec   Loss 16.0866   LearningRate 0.0939   Epoch: 0   Global Step: 25500   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:35:00,456-Speed 2564.80 samples/sec   Loss 16.2367   LearningRate 0.0939   Epoch: 0   Global Step: 25510   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:35:04,354-Speed 2627.48 samples/sec   Loss 15.9935   LearningRate 0.0939   Epoch: 0   Global Step: 25520   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:35:08,271-Speed 2615.72 samples/sec   Loss 16.0045   LearningRate 0.0939   Epoch: 0   Global Step: 25530   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:12,180-Speed 2620.19 samples/sec   Loss 16.0332   LearningRate 0.0939   Epoch: 0   Global Step: 25540   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:16,086-Speed 2622.22 samples/sec   Loss 15.8460   LearningRate 0.0939   Epoch: 0   Global Step: 25550   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:19,990-Speed 2623.10 samples/sec   Loss 16.1433   LearningRate 0.0939   Epoch: 0   Global Step: 25560   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:23,897-Speed 2622.08 samples/sec   Loss 15.9658   LearningRate 0.0939   Epoch: 0   Global Step: 25570   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:27,818-Speed 2612.09 samples/sec   Loss 16.1088   LearningRate 0.0939   Epoch: 0   Global Step: 25580   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:31,716-Speed 2627.31 samples/sec   Loss 16.0262   LearningRate 0.0939   Epoch: 0   Global Step: 25590   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:35,616-Speed 2625.85 samples/sec   Loss 16.0518   LearningRate 0.0939   Epoch: 0   Global Step: 25600   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:39,518-Speed 2625.68 samples/sec   Loss 16.0651   LearningRate 0.0939   Epoch: 0   Global Step: 25610   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:43,421-Speed 2624.11 samples/sec   Loss 16.0537   LearningRate 0.0939   Epoch: 0   Global Step: 25620   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:47,319-Speed 2627.78 samples/sec   Loss 16.0581   LearningRate 0.0939   Epoch: 0   Global Step: 25630   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:35:51,205-Speed 2636.32 samples/sec   Loss 16.0348   LearningRate 0.0939   Epoch: 0   Global Step: 25640   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:55,116-Speed 2619.33 samples/sec   Loss 16.0158   LearningRate 0.0939   Epoch: 0   Global Step: 25650   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:35:59,012-Speed 2628.42 samples/sec   Loss 16.0493   LearningRate 0.0939   Epoch: 0   Global Step: 25660   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:02,911-Speed 2626.80 samples/sec   Loss 15.9929   LearningRate 0.0939   Epoch: 0   Global Step: 25670   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:06,822-Speed 2618.44 samples/sec   Loss 16.0816   LearningRate 0.0939   Epoch: 0   Global Step: 25680   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:10,724-Speed 2625.87 samples/sec   Loss 15.9443   LearningRate 0.0939   Epoch: 0   Global Step: 25690   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:14,623-Speed 2626.93 samples/sec   Loss 16.1148   LearningRate 0.0939   Epoch: 0   Global Step: 25700   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:18,521-Speed 2627.81 samples/sec   Loss 15.8212   LearningRate 0.0939   Epoch: 0   Global Step: 25710   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:22,417-Speed 2628.93 samples/sec   Loss 15.9731   LearningRate 0.0939   Epoch: 0   Global Step: 25720   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:26,313-Speed 2629.14 samples/sec   Loss 16.0559   LearningRate 0.0939   Epoch: 0   Global Step: 25730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:30,207-Speed 2629.64 samples/sec   Loss 15.8981   LearningRate 0.0939   Epoch: 0   Global Step: 25740   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:36:34,101-Speed 2630.65 samples/sec   Loss 15.9739   LearningRate 0.0939   Epoch: 0   Global Step: 25750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:37,998-Speed 2628.64 samples/sec   Loss 16.2543   LearningRate 0.0939   Epoch: 0   Global Step: 25760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:41,905-Speed 2621.43 samples/sec   Loss 16.0802   LearningRate 0.0939   Epoch: 0   Global Step: 25770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:45,806-Speed 2625.75 samples/sec   Loss 16.0102   LearningRate 0.0939   Epoch: 0   Global Step: 25780   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:49,711-Speed 2622.64 samples/sec   Loss 15.9279   LearningRate 0.0939   Epoch: 0   Global Step: 25790   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:53,610-Speed 2627.81 samples/sec   Loss 15.8773   LearningRate 0.0939   Epoch: 0   Global Step: 25800   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:36:57,510-Speed 2626.14 samples/sec   Loss 15.7401   LearningRate 0.0939   Epoch: 0   Global Step: 25810   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:01,412-Speed 2624.84 samples/sec   Loss 15.9810   LearningRate 0.0939   Epoch: 0   Global Step: 25820   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:05,310-Speed 2626.90 samples/sec   Loss 15.9846   LearningRate 0.0939   Epoch: 0   Global Step: 25830   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:09,210-Speed 2627.11 samples/sec   Loss 16.0951   LearningRate 0.0939   Epoch: 0   Global Step: 25840   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:13,089-Speed 2640.08 samples/sec   Loss 16.0267   LearningRate 0.0939   Epoch: 0   Global Step: 25850   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:16,992-Speed 2624.15 samples/sec   Loss 16.0138   LearningRate 0.0939   Epoch: 0   Global Step: 25860   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:20,895-Speed 2624.42 samples/sec   Loss 16.1036   LearningRate 0.0939   Epoch: 0   Global Step: 25870   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:24,796-Speed 2625.42 samples/sec   Loss 16.0055   LearningRate 0.0939   Epoch: 0   Global Step: 25880   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:28,695-Speed 2627.59 samples/sec   Loss 16.1422   LearningRate 0.0939   Epoch: 0   Global Step: 25890   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:32,597-Speed 2624.80 samples/sec   Loss 15.8156   LearningRate 0.0939   Epoch: 0   Global Step: 25900   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:36,496-Speed 2626.24 samples/sec   Loss 15.9555   LearningRate 0.0939   Epoch: 0   Global Step: 25910   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:40,397-Speed 2626.16 samples/sec   Loss 16.0049   LearningRate 0.0938   Epoch: 0   Global Step: 25920   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:44,295-Speed 2627.86 samples/sec   Loss 15.9690   LearningRate 0.0938   Epoch: 0   Global Step: 25930   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:48,193-Speed 2627.58 samples/sec   Loss 16.0190   LearningRate 0.0938   Epoch: 0   Global Step: 25940   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:52,080-Speed 2635.42 samples/sec   Loss 15.9247   LearningRate 0.0938   Epoch: 0   Global Step: 25950   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:55,979-Speed 2626.87 samples/sec   Loss 15.8363   LearningRate 0.0938   Epoch: 0   Global Step: 25960   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:37:59,876-Speed 2628.01 samples/sec   Loss 16.0192   LearningRate 0.0938   Epoch: 0   Global Step: 25970   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:38:03,777-Speed 2625.42 samples/sec   Loss 16.0301   LearningRate 0.0938   Epoch: 0   Global Step: 25980   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:38:07,669-Speed 2631.34 samples/sec   Loss 16.0161   LearningRate 0.0938   Epoch: 0   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:11,563-Speed 2630.77 samples/sec   Loss 15.8810   LearningRate 0.0938   Epoch: 0   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:15,466-Speed 2624.67 samples/sec   Loss 15.9354   LearningRate 0.0938   Epoch: 0   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:19,368-Speed 2624.47 samples/sec   Loss 16.0381   LearningRate 0.0938   Epoch: 0   Global Step: 26020   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:23,280-Speed 2618.68 samples/sec   Loss 15.8604   LearningRate 0.0938   Epoch: 0   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:27,181-Speed 2625.98 samples/sec   Loss 15.9873   LearningRate 0.0938   Epoch: 0   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:31,091-Speed 2619.44 samples/sec   Loss 15.8689   LearningRate 0.0938   Epoch: 0   Global Step: 26050   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:34,988-Speed 2628.04 samples/sec   Loss 15.9226   LearningRate 0.0938   Epoch: 0   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:38,887-Speed 2626.91 samples/sec   Loss 15.6784   LearningRate 0.0938   Epoch: 0   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:42,788-Speed 2625.87 samples/sec   Loss 15.8579   LearningRate 0.0938   Epoch: 0   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:38:46,691-Speed 2624.75 samples/sec   Loss 16.0332   LearningRate 0.0938   Epoch: 0   Global Step: 26090   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:38:50,597-Speed 2622.28 samples/sec   Loss 15.8994   LearningRate 0.0938   Epoch: 0   Global Step: 26100   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:38:54,504-Speed 2621.41 samples/sec   Loss 15.9975   LearningRate 0.0938   Epoch: 0   Global Step: 26110   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:38:58,403-Speed 2627.47 samples/sec   Loss 15.9908   LearningRate 0.0938   Epoch: 0   Global Step: 26120   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:02,298-Speed 2629.35 samples/sec   Loss 15.8118   LearningRate 0.0938   Epoch: 0   Global Step: 26130   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:06,264-Speed 2582.37 samples/sec   Loss 16.0199   LearningRate 0.0938   Epoch: 0   Global Step: 26140   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:10,201-Speed 2601.69 samples/sec   Loss 16.0106   LearningRate 0.0938   Epoch: 0   Global Step: 26150   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:14,101-Speed 2626.22 samples/sec   Loss 15.8954   LearningRate 0.0938   Epoch: 0   Global Step: 26160   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:18,010-Speed 2620.18 samples/sec   Loss 15.9299   LearningRate 0.0938   Epoch: 0   Global Step: 26170   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:21,919-Speed 2620.49 samples/sec   Loss 15.9896   LearningRate 0.0938   Epoch: 0   Global Step: 26180   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:25,813-Speed 2629.67 samples/sec   Loss 16.0644   LearningRate 0.0938   Epoch: 0   Global Step: 26190   Fp16 Grad Scale: 524288   Required: 90 hours
Training: 2022-04-12 22:39:29,696-Speed 2638.67 samples/sec   Loss 15.9947   LearningRate 0.0938   Epoch: 0   Global Step: 26200   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:39:33,570-Speed 2643.31 samples/sec   Loss 15.8799   LearningRate 0.0938   Epoch: 0   Global Step: 26210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:39:37,472-Speed 2624.89 samples/sec   Loss 15.8969   LearningRate 0.0938   Epoch: 0   Global Step: 26220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:39:41,373-Speed 2625.30 samples/sec   Loss 15.9086   LearningRate 0.0938   Epoch: 0   Global Step: 26230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:39:45,277-Speed 2624.10 samples/sec   Loss 15.9509   LearningRate 0.0938   Epoch: 0   Global Step: 26240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:39:49,178-Speed 2625.28 samples/sec   Loss 15.7858   LearningRate 0.0938   Epoch: 0   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:39:53,080-Speed 2625.22 samples/sec   Loss 15.9277   LearningRate 0.0938   Epoch: 0   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:39:56,981-Speed 2625.39 samples/sec   Loss 15.7109   LearningRate 0.0938   Epoch: 0   Global Step: 26270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:00,885-Speed 2623.73 samples/sec   Loss 15.9135   LearningRate 0.0938   Epoch: 0   Global Step: 26280   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:04,793-Speed 2620.63 samples/sec   Loss 15.9336   LearningRate 0.0938   Epoch: 0   Global Step: 26290   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:08,692-Speed 2626.56 samples/sec   Loss 15.8089   LearningRate 0.0938   Epoch: 0   Global Step: 26300   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:12,593-Speed 2625.69 samples/sec   Loss 15.7900   LearningRate 0.0938   Epoch: 0   Global Step: 26310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:40:16,501-Speed 2620.83 samples/sec   Loss 15.7935   LearningRate 0.0938   Epoch: 0   Global Step: 26320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:40:20,422-Speed 2612.73 samples/sec   Loss 15.9531   LearningRate 0.0938   Epoch: 0   Global Step: 26330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:40:24,322-Speed 2626.60 samples/sec   Loss 15.9588   LearningRate 0.0938   Epoch: 0   Global Step: 26340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:40:28,219-Speed 2627.92 samples/sec   Loss 15.8179   LearningRate 0.0937   Epoch: 0   Global Step: 26350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:40:32,145-Speed 2609.07 samples/sec   Loss 15.8838   LearningRate 0.0937   Epoch: 0   Global Step: 26360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:40:36,046-Speed 2625.62 samples/sec   Loss 15.9577   LearningRate 0.0937   Epoch: 0   Global Step: 26370   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:40:39,945-Speed 2627.02 samples/sec   Loss 16.0394   LearningRate 0.0937   Epoch: 0   Global Step: 26380   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:43,857-Speed 2618.35 samples/sec   Loss 15.8536   LearningRate 0.0937   Epoch: 0   Global Step: 26390   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:47,765-Speed 2621.63 samples/sec   Loss 15.8087   LearningRate 0.0937   Epoch: 0   Global Step: 26400   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:51,661-Speed 2628.61 samples/sec   Loss 15.7733   LearningRate 0.0937   Epoch: 0   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:40:55,545-Speed 2637.30 samples/sec   Loss 15.7793   LearningRate 0.0937   Epoch: 0   Global Step: 26420   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:40:59,445-Speed 2626.22 samples/sec   Loss 15.9591   LearningRate 0.0937   Epoch: 0   Global Step: 26430   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:03,346-Speed 2625.45 samples/sec   Loss 15.8481   LearningRate 0.0937   Epoch: 0   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:07,240-Speed 2630.47 samples/sec   Loss 15.8866   LearningRate 0.0937   Epoch: 0   Global Step: 26450   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:11,133-Speed 2631.04 samples/sec   Loss 15.8636   LearningRate 0.0937   Epoch: 0   Global Step: 26460   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:15,029-Speed 2628.87 samples/sec   Loss 15.7671   LearningRate 0.0937   Epoch: 0   Global Step: 26470   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:18,922-Speed 2631.44 samples/sec   Loss 15.9520   LearningRate 0.0937   Epoch: 0   Global Step: 26480   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:22,816-Speed 2630.58 samples/sec   Loss 15.9080   LearningRate 0.0937   Epoch: 0   Global Step: 26490   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:26,716-Speed 2626.05 samples/sec   Loss 15.8573   LearningRate 0.0937   Epoch: 0   Global Step: 26500   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:30,611-Speed 2629.92 samples/sec   Loss 15.8011   LearningRate 0.0937   Epoch: 0   Global Step: 26510   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:41:34,510-Speed 2626.15 samples/sec   Loss 15.8501   LearningRate 0.0937   Epoch: 0   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:41:38,406-Speed 2629.26 samples/sec   Loss 15.7513   LearningRate 0.0937   Epoch: 0   Global Step: 26530   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:41:42,333-Speed 2608.53 samples/sec   Loss 15.7431   LearningRate 0.0937   Epoch: 0   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:41:46,229-Speed 2628.65 samples/sec   Loss 15.6443   LearningRate 0.0937   Epoch: 0   Global Step: 26550   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:41:50,124-Speed 2629.53 samples/sec   Loss 15.8854   LearningRate 0.0937   Epoch: 0   Global Step: 26560   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:41:54,013-Speed 2634.17 samples/sec   Loss 15.9721   LearningRate 0.0937   Epoch: 0   Global Step: 26570   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:41:57,920-Speed 2621.56 samples/sec   Loss 15.7988   LearningRate 0.0937   Epoch: 0   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:01,815-Speed 2629.31 samples/sec   Loss 15.9153   LearningRate 0.0937   Epoch: 0   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:05,711-Speed 2628.92 samples/sec   Loss 15.7916   LearningRate 0.0937   Epoch: 0   Global Step: 26600   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:09,614-Speed 2624.60 samples/sec   Loss 15.7534   LearningRate 0.0937   Epoch: 0   Global Step: 26610   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:13,509-Speed 2629.59 samples/sec   Loss 15.7672   LearningRate 0.0937   Epoch: 0   Global Step: 26620   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:42:17,396-Speed 2635.07 samples/sec   Loss 15.8038   LearningRate 0.0937   Epoch: 0   Global Step: 26630   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:21,290-Speed 2629.84 samples/sec   Loss 15.9071   LearningRate 0.0937   Epoch: 0   Global Step: 26640   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:25,186-Speed 2629.78 samples/sec   Loss 15.7947   LearningRate 0.0937   Epoch: 0   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:29,079-Speed 2630.75 samples/sec   Loss 15.8731   LearningRate 0.0937   Epoch: 0   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:32,973-Speed 2630.50 samples/sec   Loss 15.8794   LearningRate 0.0937   Epoch: 0   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:36,873-Speed 2625.82 samples/sec   Loss 15.7602   LearningRate 0.0937   Epoch: 0   Global Step: 26680   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:40,768-Speed 2629.85 samples/sec   Loss 15.8061   LearningRate 0.0937   Epoch: 0   Global Step: 26690   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:44,663-Speed 2629.67 samples/sec   Loss 15.7881   LearningRate 0.0937   Epoch: 0   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:48,561-Speed 2627.21 samples/sec   Loss 15.8726   LearningRate 0.0937   Epoch: 0   Global Step: 26710   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:52,462-Speed 2625.94 samples/sec   Loss 15.9314   LearningRate 0.0937   Epoch: 0   Global Step: 26720   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:42:56,357-Speed 2629.52 samples/sec   Loss 15.9190   LearningRate 0.0937   Epoch: 0   Global Step: 26730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:00,254-Speed 2628.08 samples/sec   Loss 15.8098   LearningRate 0.0937   Epoch: 0   Global Step: 26740   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:04,154-Speed 2626.48 samples/sec   Loss 15.8906   LearningRate 0.0937   Epoch: 0   Global Step: 26750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:08,057-Speed 2624.48 samples/sec   Loss 15.8146   LearningRate 0.0937   Epoch: 0   Global Step: 26760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:11,955-Speed 2627.68 samples/sec   Loss 15.9297   LearningRate 0.0937   Epoch: 0   Global Step: 26770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:15,856-Speed 2625.38 samples/sec   Loss 15.8084   LearningRate 0.0936   Epoch: 0   Global Step: 26780   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:19,757-Speed 2625.09 samples/sec   Loss 15.9453   LearningRate 0.0936   Epoch: 0   Global Step: 26790   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:23,664-Speed 2622.19 samples/sec   Loss 15.7521   LearningRate 0.0936   Epoch: 0   Global Step: 26800   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:27,580-Speed 2615.26 samples/sec   Loss 15.8158   LearningRate 0.0936   Epoch: 0   Global Step: 26810   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:31,489-Speed 2620.12 samples/sec   Loss 15.6547   LearningRate 0.0936   Epoch: 0   Global Step: 26820   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:35,369-Speed 2639.75 samples/sec   Loss 15.8239   LearningRate 0.0936   Epoch: 0   Global Step: 26830   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:39,268-Speed 2627.27 samples/sec   Loss 15.6381   LearningRate 0.0936   Epoch: 0   Global Step: 26840   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:43,166-Speed 2627.49 samples/sec   Loss 15.7380   LearningRate 0.0936   Epoch: 0   Global Step: 26850   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:47,071-Speed 2622.66 samples/sec   Loss 15.7523   LearningRate 0.0936   Epoch: 0   Global Step: 26860   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:50,980-Speed 2620.92 samples/sec   Loss 15.7960   LearningRate 0.0936   Epoch: 0   Global Step: 26870   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:54,880-Speed 2625.69 samples/sec   Loss 15.6837   LearningRate 0.0936   Epoch: 0   Global Step: 26880   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:43:58,787-Speed 2621.89 samples/sec   Loss 15.8028   LearningRate 0.0936   Epoch: 0   Global Step: 26890   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:44:02,668-Speed 2639.26 samples/sec   Loss 15.7885   LearningRate 0.0936   Epoch: 0   Global Step: 26900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:06,568-Speed 2625.65 samples/sec   Loss 15.8761   LearningRate 0.0936   Epoch: 0   Global Step: 26910   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:10,472-Speed 2623.30 samples/sec   Loss 15.6912   LearningRate 0.0936   Epoch: 0   Global Step: 26920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:14,375-Speed 2624.58 samples/sec   Loss 15.7536   LearningRate 0.0936   Epoch: 0   Global Step: 26930   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:18,274-Speed 2626.90 samples/sec   Loss 15.5660   LearningRate 0.0936   Epoch: 0   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:22,177-Speed 2624.42 samples/sec   Loss 15.7890   LearningRate 0.0936   Epoch: 0   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:26,075-Speed 2627.88 samples/sec   Loss 15.7575   LearningRate 0.0936   Epoch: 0   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:29,982-Speed 2621.42 samples/sec   Loss 15.8769   LearningRate 0.0936   Epoch: 0   Global Step: 26970   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:33,883-Speed 2625.41 samples/sec   Loss 15.8677   LearningRate 0.0936   Epoch: 0   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:44:37,763-Speed 2639.45 samples/sec   Loss 15.7182   LearningRate 0.0936   Epoch: 0   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:44:41,660-Speed 2628.41 samples/sec   Loss 15.7360   LearningRate 0.0936   Epoch: 0   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:44:45,557-Speed 2628.44 samples/sec   Loss 15.7131   LearningRate 0.0936   Epoch: 0   Global Step: 27010   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:44:49,455-Speed 2627.33 samples/sec   Loss 15.7758   LearningRate 0.0936   Epoch: 0   Global Step: 27020   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:44:53,353-Speed 2627.90 samples/sec   Loss 15.7726   LearningRate 0.0936   Epoch: 0   Global Step: 27030   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:44:57,249-Speed 2629.21 samples/sec   Loss 15.6922   LearningRate 0.0936   Epoch: 0   Global Step: 27040   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:45:01,147-Speed 2627.64 samples/sec   Loss 15.6480   LearningRate 0.0936   Epoch: 0   Global Step: 27050   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:45:05,048-Speed 2625.41 samples/sec   Loss 15.7402   LearningRate 0.0936   Epoch: 0   Global Step: 27060   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:45:08,944-Speed 2628.44 samples/sec   Loss 15.7959   LearningRate 0.0936   Epoch: 0   Global Step: 27070   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:45:12,850-Speed 2622.68 samples/sec   Loss 15.6262   LearningRate 0.0936   Epoch: 0   Global Step: 27080   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 22:45:16,750-Speed 2626.00 samples/sec   Loss 15.8067   LearningRate 0.0936   Epoch: 0   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:20,657-Speed 2621.83 samples/sec   Loss 15.6307   LearningRate 0.0936   Epoch: 0   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:24,566-Speed 2620.12 samples/sec   Loss 15.9060   LearningRate 0.0936   Epoch: 0   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:28,472-Speed 2622.27 samples/sec   Loss 15.7579   LearningRate 0.0936   Epoch: 0   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:32,374-Speed 2625.35 samples/sec   Loss 15.7189   LearningRate 0.0936   Epoch: 0   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:36,272-Speed 2627.16 samples/sec   Loss 15.7445   LearningRate 0.0936   Epoch: 0   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:40,169-Speed 2627.91 samples/sec   Loss 15.6464   LearningRate 0.0936   Epoch: 0   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:44,075-Speed 2622.65 samples/sec   Loss 15.6733   LearningRate 0.0936   Epoch: 0   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:47,977-Speed 2624.55 samples/sec   Loss 15.7206   LearningRate 0.0936   Epoch: 0   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:51,886-Speed 2620.34 samples/sec   Loss 15.7684   LearningRate 0.0936   Epoch: 0   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 22:45:55,787-Speed 2625.30 samples/sec   Loss 15.6763   LearningRate 0.0936   Epoch: 0   Global Step: 27190   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:45:59,684-Speed 2629.12 samples/sec   Loss 15.8681   LearningRate 0.0935   Epoch: 0   Global Step: 27200   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:03,600-Speed 2615.02 samples/sec   Loss 15.6145   LearningRate 0.0935   Epoch: 0   Global Step: 27210   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:07,497-Speed 2628.56 samples/sec   Loss 15.7306   LearningRate 0.0935   Epoch: 0   Global Step: 27220   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:11,441-Speed 2596.45 samples/sec   Loss 15.6863   LearningRate 0.0935   Epoch: 0   Global Step: 27230   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:15,395-Speed 2590.59 samples/sec   Loss 15.8283   LearningRate 0.0935   Epoch: 0   Global Step: 27240   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:19,300-Speed 2622.71 samples/sec   Loss 15.6004   LearningRate 0.0935   Epoch: 0   Global Step: 27250   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:23,194-Speed 2630.40 samples/sec   Loss 15.7107   LearningRate 0.0935   Epoch: 0   Global Step: 27260   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:27,092-Speed 2627.22 samples/sec   Loss 15.6688   LearningRate 0.0935   Epoch: 0   Global Step: 27270   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:30,992-Speed 2627.09 samples/sec   Loss 15.8228   LearningRate 0.0935   Epoch: 0   Global Step: 27280   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:34,912-Speed 2612.92 samples/sec   Loss 15.6429   LearningRate 0.0935   Epoch: 0   Global Step: 27290   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:38,813-Speed 2625.41 samples/sec   Loss 15.7834   LearningRate 0.0935   Epoch: 0   Global Step: 27300   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:42,711-Speed 2626.98 samples/sec   Loss 15.7189   LearningRate 0.0935   Epoch: 0   Global Step: 27310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 22:46:46,607-Speed 2629.39 samples/sec   Loss 15.7105   LearningRate 0.0935   Epoch: 0   Global Step: 27320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:46:50,500-Speed 2630.57 samples/sec   Loss 15.7370   LearningRate 0.0935   Epoch: 0   Global Step: 27330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:46:54,414-Speed 2617.63 samples/sec   Loss 15.8142   LearningRate 0.0935   Epoch: 0   Global Step: 27340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:46:58,328-Speed 2616.26 samples/sec   Loss 15.6896   LearningRate 0.0935   Epoch: 0   Global Step: 27350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:02,237-Speed 2620.28 samples/sec   Loss 15.6826   LearningRate 0.0935   Epoch: 0   Global Step: 27360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:06,136-Speed 2627.16 samples/sec   Loss 15.8023   LearningRate 0.0935   Epoch: 0   Global Step: 27370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:10,042-Speed 2622.66 samples/sec   Loss 15.6528   LearningRate 0.0935   Epoch: 0   Global Step: 27380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:13,920-Speed 2640.73 samples/sec   Loss 15.5321   LearningRate 0.0935   Epoch: 0   Global Step: 27390   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:17,817-Speed 2628.18 samples/sec   Loss 15.7662   LearningRate 0.0935   Epoch: 0   Global Step: 27400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:21,712-Speed 2629.52 samples/sec   Loss 15.6120   LearningRate 0.0935   Epoch: 0   Global Step: 27410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:25,615-Speed 2624.50 samples/sec   Loss 15.8130   LearningRate 0.0935   Epoch: 0   Global Step: 27420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:29,518-Speed 2623.76 samples/sec   Loss 15.8125   LearningRate 0.0935   Epoch: 0   Global Step: 27430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:33,415-Speed 2628.20 samples/sec   Loss 15.8209   LearningRate 0.0935   Epoch: 0   Global Step: 27440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:37,337-Speed 2611.84 samples/sec   Loss 15.8051   LearningRate 0.0935   Epoch: 0   Global Step: 27450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:41,242-Speed 2622.65 samples/sec   Loss 15.7231   LearningRate 0.0935   Epoch: 0   Global Step: 27460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:47:45,132-Speed 2633.41 samples/sec   Loss 15.7691   LearningRate 0.0935   Epoch: 0   Global Step: 27470   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:47:49,028-Speed 2628.84 samples/sec   Loss 15.8549   LearningRate 0.0935   Epoch: 0   Global Step: 27480   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:47:52,942-Speed 2616.72 samples/sec   Loss 15.7284   LearningRate 0.0935   Epoch: 0   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:47:56,836-Speed 2630.33 samples/sec   Loss 15.8668   LearningRate 0.0935   Epoch: 0   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:00,732-Speed 2628.92 samples/sec   Loss 15.6933   LearningRate 0.0935   Epoch: 0   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:04,628-Speed 2628.55 samples/sec   Loss 15.7541   LearningRate 0.0935   Epoch: 0   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:08,526-Speed 2627.67 samples/sec   Loss 15.8415   LearningRate 0.0935   Epoch: 0   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:12,431-Speed 2623.33 samples/sec   Loss 15.6719   LearningRate 0.0935   Epoch: 0   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:16,326-Speed 2629.38 samples/sec   Loss 15.5611   LearningRate 0.0935   Epoch: 0   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:20,222-Speed 2629.29 samples/sec   Loss 15.7372   LearningRate 0.0935   Epoch: 0   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:24,120-Speed 2627.96 samples/sec   Loss 15.7730   LearningRate 0.0935   Epoch: 0   Global Step: 27570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:48:28,014-Speed 2630.03 samples/sec   Loss 15.6396   LearningRate 0.0935   Epoch: 0   Global Step: 27580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:48:31,911-Speed 2628.05 samples/sec   Loss 15.6432   LearningRate 0.0935   Epoch: 0   Global Step: 27590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:48:35,795-Speed 2636.66 samples/sec   Loss 15.6027   LearningRate 0.0935   Epoch: 0   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:39,692-Speed 2628.66 samples/sec   Loss 15.6611   LearningRate 0.0935   Epoch: 0   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:43,589-Speed 2628.21 samples/sec   Loss 15.6398   LearningRate 0.0935   Epoch: 0   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:47,485-Speed 2628.92 samples/sec   Loss 15.5020   LearningRate 0.0934   Epoch: 0   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:51,386-Speed 2625.62 samples/sec   Loss 15.7427   LearningRate 0.0934   Epoch: 0   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:55,282-Speed 2629.50 samples/sec   Loss 15.4917   LearningRate 0.0934   Epoch: 0   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:48:59,176-Speed 2630.01 samples/sec   Loss 15.5851   LearningRate 0.0934   Epoch: 0   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:03,070-Speed 2630.11 samples/sec   Loss 15.5919   LearningRate 0.0934   Epoch: 0   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:06,964-Speed 2630.19 samples/sec   Loss 15.7966   LearningRate 0.0934   Epoch: 0   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:10,860-Speed 2628.93 samples/sec   Loss 15.7716   LearningRate 0.0934   Epoch: 0   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:14,757-Speed 2628.06 samples/sec   Loss 15.6210   LearningRate 0.0934   Epoch: 0   Global Step: 27700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:49:18,661-Speed 2623.26 samples/sec   Loss 15.6178   LearningRate 0.0934   Epoch: 0   Global Step: 27710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:49:22,542-Speed 2639.63 samples/sec   Loss 15.6384   LearningRate 0.0934   Epoch: 0   Global Step: 27720   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:26,441-Speed 2626.21 samples/sec   Loss 15.6938   LearningRate 0.0934   Epoch: 0   Global Step: 27730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:30,338-Speed 2629.10 samples/sec   Loss 15.5822   LearningRate 0.0934   Epoch: 0   Global Step: 27740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:34,234-Speed 2628.59 samples/sec   Loss 15.6810   LearningRate 0.0934   Epoch: 0   Global Step: 27750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:38,131-Speed 2628.39 samples/sec   Loss 15.5193   LearningRate 0.0934   Epoch: 0   Global Step: 27760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:42,027-Speed 2629.01 samples/sec   Loss 15.7653   LearningRate 0.0934   Epoch: 0   Global Step: 27770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:45,926-Speed 2626.47 samples/sec   Loss 15.6139   LearningRate 0.0934   Epoch: 0   Global Step: 27780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:49,822-Speed 2628.82 samples/sec   Loss 15.6619   LearningRate 0.0934   Epoch: 0   Global Step: 27790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:53,718-Speed 2629.03 samples/sec   Loss 15.6230   LearningRate 0.0934   Epoch: 0   Global Step: 27800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:49:57,620-Speed 2625.03 samples/sec   Loss 15.7504   LearningRate 0.0934   Epoch: 0   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:50:01,519-Speed 2627.05 samples/sec   Loss 15.6266   LearningRate 0.0934   Epoch: 0   Global Step: 27820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:05,418-Speed 2626.58 samples/sec   Loss 15.5318   LearningRate 0.0934   Epoch: 0   Global Step: 27830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:09,310-Speed 2631.63 samples/sec   Loss 15.6411   LearningRate 0.0934   Epoch: 0   Global Step: 27840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:13,219-Speed 2619.95 samples/sec   Loss 15.6292   LearningRate 0.0934   Epoch: 0   Global Step: 27850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:17,118-Speed 2627.19 samples/sec   Loss 15.6022   LearningRate 0.0934   Epoch: 0   Global Step: 27860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:21,016-Speed 2627.75 samples/sec   Loss 15.5194   LearningRate 0.0934   Epoch: 0   Global Step: 27870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:24,920-Speed 2623.57 samples/sec   Loss 15.5999   LearningRate 0.0934   Epoch: 0   Global Step: 27880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:28,830-Speed 2619.49 samples/sec   Loss 15.6782   LearningRate 0.0934   Epoch: 0   Global Step: 27890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:32,728-Speed 2627.61 samples/sec   Loss 15.7811   LearningRate 0.0934   Epoch: 0   Global Step: 27900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:36,628-Speed 2625.88 samples/sec   Loss 15.6365   LearningRate 0.0934   Epoch: 0   Global Step: 27910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:40,518-Speed 2633.10 samples/sec   Loss 15.4624   LearningRate 0.0934   Epoch: 0   Global Step: 27920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:50:44,398-Speed 2640.10 samples/sec   Loss 15.7480   LearningRate 0.0934   Epoch: 0   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:50:48,292-Speed 2630.37 samples/sec   Loss 15.4463   LearningRate 0.0934   Epoch: 0   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:50:52,187-Speed 2629.52 samples/sec   Loss 15.6351   LearningRate 0.0934   Epoch: 0   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:50:56,082-Speed 2629.54 samples/sec   Loss 15.5479   LearningRate 0.0934   Epoch: 0   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:50:59,995-Speed 2617.24 samples/sec   Loss 15.4177   LearningRate 0.0934   Epoch: 0   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:51:03,895-Speed 2626.13 samples/sec   Loss 15.6432   LearningRate 0.0934   Epoch: 0   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:51:07,805-Speed 2619.69 samples/sec   Loss 15.5599   LearningRate 0.0934   Epoch: 0   Global Step: 27990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:51:11,702-Speed 2627.81 samples/sec   Loss 15.5866   LearningRate 0.0934   Epoch: 0   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:51:15,597-Speed 2630.03 samples/sec   Loss 15.8174   LearningRate 0.0934   Epoch: 0   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:51:19,495-Speed 2627.52 samples/sec   Loss 15.6177   LearningRate 0.0934   Epoch: 0   Global Step: 28020   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:23,382-Speed 2636.76 samples/sec   Loss 15.6416   LearningRate 0.0934   Epoch: 0   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:27,293-Speed 2618.63 samples/sec   Loss 15.5092   LearningRate 0.0934   Epoch: 0   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:31,193-Speed 2626.15 samples/sec   Loss 15.6387   LearningRate 0.0934   Epoch: 0   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:35,092-Speed 2626.42 samples/sec   Loss 15.7264   LearningRate 0.0933   Epoch: 0   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:39,020-Speed 2607.99 samples/sec   Loss 15.7134   LearningRate 0.0933   Epoch: 0   Global Step: 28070   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:42,924-Speed 2623.29 samples/sec   Loss 15.7915   LearningRate 0.0933   Epoch: 0   Global Step: 28080   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:46,821-Speed 2628.23 samples/sec   Loss 15.7562   LearningRate 0.0933   Epoch: 0   Global Step: 28090   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:50,719-Speed 2627.63 samples/sec   Loss 15.7205   LearningRate 0.0933   Epoch: 0   Global Step: 28100   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:54,617-Speed 2627.71 samples/sec   Loss 15.6410   LearningRate 0.0933   Epoch: 0   Global Step: 28110   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 22:51:58,515-Speed 2627.57 samples/sec   Loss 15.5937   LearningRate 0.0933   Epoch: 0   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:02,417-Speed 2625.20 samples/sec   Loss 15.6453   LearningRate 0.0933   Epoch: 0   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:06,320-Speed 2624.25 samples/sec   Loss 15.5976   LearningRate 0.0933   Epoch: 0   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:10,219-Speed 2626.90 samples/sec   Loss 15.6914   LearningRate 0.0933   Epoch: 0   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:14,117-Speed 2627.31 samples/sec   Loss 15.5587   LearningRate 0.0933   Epoch: 0   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:18,018-Speed 2625.44 samples/sec   Loss 15.4887   LearningRate 0.0933   Epoch: 0   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:21,920-Speed 2625.25 samples/sec   Loss 15.5053   LearningRate 0.0933   Epoch: 0   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:25,833-Speed 2617.60 samples/sec   Loss 15.6146   LearningRate 0.0933   Epoch: 0   Global Step: 28190   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:29,731-Speed 2627.42 samples/sec   Loss 15.6995   LearningRate 0.0933   Epoch: 0   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:33,653-Speed 2611.33 samples/sec   Loss 15.4886   LearningRate 0.0933   Epoch: 0   Global Step: 28210   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:52:37,551-Speed 2627.96 samples/sec   Loss 15.5989   LearningRate 0.0933   Epoch: 0   Global Step: 28220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:52:41,551-Speed 2561.60 samples/sec   Loss 15.5684   LearningRate 0.0933   Epoch: 0   Global Step: 28230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:52:45,448-Speed 2628.25 samples/sec   Loss 15.5675   LearningRate 0.0933   Epoch: 0   Global Step: 28240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:52:49,349-Speed 2625.71 samples/sec   Loss 15.5178   LearningRate 0.0933   Epoch: 0   Global Step: 28250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:52:53,250-Speed 2625.04 samples/sec   Loss 15.5954   LearningRate 0.0933   Epoch: 0   Global Step: 28260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:52:57,152-Speed 2624.91 samples/sec   Loss 15.4865   LearningRate 0.0933   Epoch: 0   Global Step: 28270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:53:01,143-Speed 2566.21 samples/sec   Loss 15.5872   LearningRate 0.0933   Epoch: 0   Global Step: 28280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:53:05,058-Speed 2616.22 samples/sec   Loss 15.6020   LearningRate 0.0933   Epoch: 0   Global Step: 28290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:53:08,942-Speed 2637.56 samples/sec   Loss 15.7003   LearningRate 0.0933   Epoch: 0   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:12,850-Speed 2621.04 samples/sec   Loss 15.5854   LearningRate 0.0933   Epoch: 0   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:16,769-Speed 2614.19 samples/sec   Loss 15.5867   LearningRate 0.0933   Epoch: 0   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:20,676-Speed 2621.39 samples/sec   Loss 15.6318   LearningRate 0.0933   Epoch: 0   Global Step: 28330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:24,572-Speed 2628.62 samples/sec   Loss 15.6978   LearningRate 0.0933   Epoch: 0   Global Step: 28340   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:28,468-Speed 2628.78 samples/sec   Loss 15.5390   LearningRate 0.0933   Epoch: 0   Global Step: 28350   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:32,374-Speed 2622.55 samples/sec   Loss 15.6317   LearningRate 0.0933   Epoch: 0   Global Step: 28360   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:36,274-Speed 2625.79 samples/sec   Loss 15.5215   LearningRate 0.0933   Epoch: 0   Global Step: 28370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:40,173-Speed 2626.86 samples/sec   Loss 15.6088   LearningRate 0.0933   Epoch: 0   Global Step: 28380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:44,071-Speed 2627.89 samples/sec   Loss 15.5735   LearningRate 0.0933   Epoch: 0   Global Step: 28390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:47,963-Speed 2632.06 samples/sec   Loss 15.6043   LearningRate 0.0933   Epoch: 0   Global Step: 28400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:53:51,862-Speed 2626.58 samples/sec   Loss 15.6804   LearningRate 0.0933   Epoch: 0   Global Step: 28410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:53:55,750-Speed 2634.10 samples/sec   Loss 15.5479   LearningRate 0.0933   Epoch: 0   Global Step: 28420   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:53:59,636-Speed 2636.02 samples/sec   Loss 15.5970   LearningRate 0.0933   Epoch: 0   Global Step: 28430   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:03,531-Speed 2629.52 samples/sec   Loss 15.5086   LearningRate 0.0933   Epoch: 0   Global Step: 28440   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:07,424-Speed 2630.55 samples/sec   Loss 15.5698   LearningRate 0.0933   Epoch: 0   Global Step: 28450   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:11,319-Speed 2629.80 samples/sec   Loss 15.5355   LearningRate 0.0933   Epoch: 0   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:15,214-Speed 2629.22 samples/sec   Loss 15.6019   LearningRate 0.0933   Epoch: 0   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:19,107-Speed 2631.55 samples/sec   Loss 15.6936   LearningRate 0.0933   Epoch: 0   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:23,004-Speed 2628.30 samples/sec   Loss 15.4856   LearningRate 0.0932   Epoch: 0   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:26,939-Speed 2602.73 samples/sec   Loss 15.5190   LearningRate 0.0932   Epoch: 0   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:30,836-Speed 2628.44 samples/sec   Loss 15.4081   LearningRate 0.0932   Epoch: 0   Global Step: 28510   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:34,733-Speed 2628.18 samples/sec   Loss 15.4662   LearningRate 0.0932   Epoch: 0   Global Step: 28520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:54:38,634-Speed 2625.19 samples/sec   Loss 15.5298   LearningRate 0.0932   Epoch: 0   Global Step: 28530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:54:42,530-Speed 2629.26 samples/sec   Loss 15.6014   LearningRate 0.0932   Epoch: 0   Global Step: 28540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:54:46,432-Speed 2624.98 samples/sec   Loss 15.5077   LearningRate 0.0932   Epoch: 0   Global Step: 28550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:54:50,335-Speed 2624.11 samples/sec   Loss 15.4260   LearningRate 0.0932   Epoch: 0   Global Step: 28560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:54:54,228-Speed 2630.93 samples/sec   Loss 15.5417   LearningRate 0.0932   Epoch: 0   Global Step: 28570   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:54:58,125-Speed 2628.80 samples/sec   Loss 15.6918   LearningRate 0.0932   Epoch: 0   Global Step: 28580   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:02,021-Speed 2628.47 samples/sec   Loss 15.5092   LearningRate 0.0932   Epoch: 0   Global Step: 28590   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:05,918-Speed 2628.41 samples/sec   Loss 15.5344   LearningRate 0.0932   Epoch: 0   Global Step: 28600   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:09,815-Speed 2627.92 samples/sec   Loss 15.5622   LearningRate 0.0932   Epoch: 0   Global Step: 28610   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:13,725-Speed 2619.60 samples/sec   Loss 15.3843   LearningRate 0.0932   Epoch: 0   Global Step: 28620   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:17,619-Speed 2630.93 samples/sec   Loss 15.4148   LearningRate 0.0932   Epoch: 0   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:21,517-Speed 2627.73 samples/sec   Loss 15.5293   LearningRate 0.0932   Epoch: 0   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:25,436-Speed 2613.30 samples/sec   Loss 15.4542   LearningRate 0.0932   Epoch: 0   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:29,336-Speed 2627.07 samples/sec   Loss 15.5589   LearningRate 0.0932   Epoch: 0   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:55:33,238-Speed 2624.71 samples/sec   Loss 15.5551   LearningRate 0.0932   Epoch: 0   Global Step: 28670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:55:37,136-Speed 2626.95 samples/sec   Loss 15.4308   LearningRate 0.0932   Epoch: 0   Global Step: 28680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:55:41,036-Speed 2626.25 samples/sec   Loss 15.4785   LearningRate 0.0932   Epoch: 0   Global Step: 28690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:55:44,934-Speed 2628.07 samples/sec   Loss 15.5770   LearningRate 0.0932   Epoch: 0   Global Step: 28700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:55:48,834-Speed 2625.64 samples/sec   Loss 15.4747   LearningRate 0.0932   Epoch: 0   Global Step: 28710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:55:52,732-Speed 2627.97 samples/sec   Loss 15.5432   LearningRate 0.0932   Epoch: 0   Global Step: 28720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:55:56,627-Speed 2629.41 samples/sec   Loss 15.6154   LearningRate 0.0932   Epoch: 0   Global Step: 28730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:56:00,529-Speed 2625.51 samples/sec   Loss 15.4635   LearningRate 0.0932   Epoch: 0   Global Step: 28740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:56:04,498-Speed 2580.21 samples/sec   Loss 15.4088   LearningRate 0.0932   Epoch: 0   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:08,402-Speed 2623.51 samples/sec   Loss 15.4477   LearningRate 0.0932   Epoch: 0   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:12,306-Speed 2623.38 samples/sec   Loss 15.4056   LearningRate 0.0932   Epoch: 0   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:16,208-Speed 2625.06 samples/sec   Loss 15.4145   LearningRate 0.0932   Epoch: 0   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:20,099-Speed 2632.15 samples/sec   Loss 15.4765   LearningRate 0.0932   Epoch: 0   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:23,991-Speed 2631.50 samples/sec   Loss 15.4205   LearningRate 0.0932   Epoch: 0   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:27,886-Speed 2630.04 samples/sec   Loss 15.4941   LearningRate 0.0932   Epoch: 0   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:31,780-Speed 2629.89 samples/sec   Loss 15.5083   LearningRate 0.0932   Epoch: 0   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:35,678-Speed 2628.27 samples/sec   Loss 15.5745   LearningRate 0.0932   Epoch: 0   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:39,574-Speed 2628.44 samples/sec   Loss 15.4659   LearningRate 0.0932   Epoch: 0   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:56:43,479-Speed 2622.92 samples/sec   Loss 15.4623   LearningRate 0.0932   Epoch: 0   Global Step: 28850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:56:47,386-Speed 2621.60 samples/sec   Loss 15.4373   LearningRate 0.0932   Epoch: 0   Global Step: 28860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:56:51,280-Speed 2630.01 samples/sec   Loss 15.4659   LearningRate 0.0932   Epoch: 0   Global Step: 28870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:56:55,177-Speed 2628.70 samples/sec   Loss 15.4722   LearningRate 0.0932   Epoch: 0   Global Step: 28880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:56:59,079-Speed 2624.81 samples/sec   Loss 15.5162   LearningRate 0.0932   Epoch: 0   Global Step: 28890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:57:02,993-Speed 2616.93 samples/sec   Loss 15.5624   LearningRate 0.0932   Epoch: 0   Global Step: 28900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:57:06,889-Speed 2628.49 samples/sec   Loss 15.2146   LearningRate 0.0932   Epoch: 0   Global Step: 28910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:57:10,784-Speed 2629.46 samples/sec   Loss 15.3921   LearningRate 0.0931   Epoch: 0   Global Step: 28920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:57:14,682-Speed 2627.62 samples/sec   Loss 15.5105   LearningRate 0.0931   Epoch: 0   Global Step: 28930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:57:18,579-Speed 2628.47 samples/sec   Loss 15.4881   LearningRate 0.0931   Epoch: 0   Global Step: 28940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:57:22,460-Speed 2639.41 samples/sec   Loss 15.6349   LearningRate 0.0931   Epoch: 0   Global Step: 28950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:57:26,351-Speed 2632.06 samples/sec   Loss 15.3961   LearningRate 0.0931   Epoch: 0   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:30,249-Speed 2627.90 samples/sec   Loss 15.5544   LearningRate 0.0931   Epoch: 0   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:34,149-Speed 2625.67 samples/sec   Loss 15.4695   LearningRate 0.0931   Epoch: 0   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:38,050-Speed 2625.52 samples/sec   Loss 15.3767   LearningRate 0.0931   Epoch: 0   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:41,952-Speed 2624.93 samples/sec   Loss 15.5016   LearningRate 0.0931   Epoch: 0   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:45,860-Speed 2620.75 samples/sec   Loss 15.4516   LearningRate 0.0931   Epoch: 0   Global Step: 29010   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:49,762-Speed 2625.43 samples/sec   Loss 15.5558   LearningRate 0.0931   Epoch: 0   Global Step: 29020   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:53,660-Speed 2627.29 samples/sec   Loss 15.4194   LearningRate 0.0931   Epoch: 0   Global Step: 29030   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:57:57,556-Speed 2629.15 samples/sec   Loss 15.3907   LearningRate 0.0931   Epoch: 0   Global Step: 29040   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:01,452-Speed 2628.93 samples/sec   Loss 15.4479   LearningRate 0.0931   Epoch: 0   Global Step: 29050   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:05,347-Speed 2629.28 samples/sec   Loss 15.4278   LearningRate 0.0931   Epoch: 0   Global Step: 29060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:58:09,246-Speed 2627.22 samples/sec   Loss 15.4615   LearningRate 0.0931   Epoch: 0   Global Step: 29070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:58:13,143-Speed 2627.66 samples/sec   Loss 15.4766   LearningRate 0.0931   Epoch: 0   Global Step: 29080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:58:17,041-Speed 2628.13 samples/sec   Loss 15.4305   LearningRate 0.0931   Epoch: 0   Global Step: 29090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:58:20,937-Speed 2628.97 samples/sec   Loss 15.5048   LearningRate 0.0931   Epoch: 0   Global Step: 29100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:58:24,834-Speed 2628.18 samples/sec   Loss 15.4571   LearningRate 0.0931   Epoch: 0   Global Step: 29110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:58:28,715-Speed 2638.75 samples/sec   Loss 15.4655   LearningRate 0.0931   Epoch: 0   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:32,612-Speed 2628.99 samples/sec   Loss 15.4816   LearningRate 0.0931   Epoch: 0   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:36,521-Speed 2619.93 samples/sec   Loss 15.4550   LearningRate 0.0931   Epoch: 0   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:40,425-Speed 2623.03 samples/sec   Loss 15.4211   LearningRate 0.0931   Epoch: 0   Global Step: 29150   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:44,324-Speed 2627.54 samples/sec   Loss 15.5096   LearningRate 0.0931   Epoch: 0   Global Step: 29160   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:48,223-Speed 2626.57 samples/sec   Loss 15.3157   LearningRate 0.0931   Epoch: 0   Global Step: 29170   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:52,126-Speed 2624.27 samples/sec   Loss 15.4267   LearningRate 0.0931   Epoch: 0   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:56,038-Speed 2618.04 samples/sec   Loss 15.5290   LearningRate 0.0931   Epoch: 0   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:58:59,931-Speed 2631.00 samples/sec   Loss 15.4160   LearningRate 0.0931   Epoch: 0   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:03,839-Speed 2620.74 samples/sec   Loss 15.2990   LearningRate 0.0931   Epoch: 0   Global Step: 29210   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:07,738-Speed 2627.69 samples/sec   Loss 15.4273   LearningRate 0.0931   Epoch: 0   Global Step: 29220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:59:11,636-Speed 2626.95 samples/sec   Loss 15.1885   LearningRate 0.0931   Epoch: 0   Global Step: 29230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:59:15,537-Speed 2625.47 samples/sec   Loss 15.4428   LearningRate 0.0931   Epoch: 0   Global Step: 29240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:59:19,439-Speed 2625.40 samples/sec   Loss 15.3481   LearningRate 0.0931   Epoch: 0   Global Step: 29250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:59:23,336-Speed 2627.98 samples/sec   Loss 15.5034   LearningRate 0.0931   Epoch: 0   Global Step: 29260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:59:27,242-Speed 2622.24 samples/sec   Loss 15.4509   LearningRate 0.0931   Epoch: 0   Global Step: 29270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 22:59:31,133-Speed 2632.19 samples/sec   Loss 15.3011   LearningRate 0.0931   Epoch: 0   Global Step: 29280   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:35,046-Speed 2617.53 samples/sec   Loss 15.4907   LearningRate 0.0931   Epoch: 0   Global Step: 29290   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:38,946-Speed 2626.56 samples/sec   Loss 15.4755   LearningRate 0.0931   Epoch: 0   Global Step: 29300   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:42,846-Speed 2626.04 samples/sec   Loss 15.4361   LearningRate 0.0931   Epoch: 0   Global Step: 29310   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:46,743-Speed 2628.58 samples/sec   Loss 15.4552   LearningRate 0.0931   Epoch: 0   Global Step: 29320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:50,636-Speed 2630.94 samples/sec   Loss 15.5564   LearningRate 0.0931   Epoch: 0   Global Step: 29330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:54,536-Speed 2625.93 samples/sec   Loss 15.4305   LearningRate 0.0931   Epoch: 0   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 22:59:58,438-Speed 2625.25 samples/sec   Loss 15.4119   LearningRate 0.0930   Epoch: 0   Global Step: 29350   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:00:02,362-Speed 2610.20 samples/sec   Loss 15.4456   LearningRate 0.0930   Epoch: 0   Global Step: 29360   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:00:06,273-Speed 2618.82 samples/sec   Loss 15.3740   LearningRate 0.0930   Epoch: 0   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:00:10,178-Speed 2622.66 samples/sec   Loss 15.3141   LearningRate 0.0930   Epoch: 0   Global Step: 29380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:14,081-Speed 2623.80 samples/sec   Loss 15.3895   LearningRate 0.0930   Epoch: 0   Global Step: 29390   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:17,979-Speed 2628.07 samples/sec   Loss 15.6122   LearningRate 0.0930   Epoch: 0   Global Step: 29400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:21,883-Speed 2623.73 samples/sec   Loss 15.4547   LearningRate 0.0930   Epoch: 0   Global Step: 29410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:25,788-Speed 2622.93 samples/sec   Loss 15.4311   LearningRate 0.0930   Epoch: 0   Global Step: 29420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:29,695-Speed 2621.00 samples/sec   Loss 15.4582   LearningRate 0.0930   Epoch: 0   Global Step: 29430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:33,596-Speed 2625.78 samples/sec   Loss 15.4415   LearningRate 0.0930   Epoch: 0   Global Step: 29440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:37,495-Speed 2626.93 samples/sec   Loss 15.3698   LearningRate 0.0930   Epoch: 0   Global Step: 29450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:41,400-Speed 2622.59 samples/sec   Loss 15.4415   LearningRate 0.0930   Epoch: 0   Global Step: 29460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:45,303-Speed 2624.52 samples/sec   Loss 15.3407   LearningRate 0.0930   Epoch: 0   Global Step: 29470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:49,198-Speed 2629.26 samples/sec   Loss 15.5126   LearningRate 0.0930   Epoch: 0   Global Step: 29480   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 23:00:53,081-Speed 2638.39 samples/sec   Loss 15.4865   LearningRate 0.0930   Epoch: 0   Global Step: 29490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:00:56,958-Speed 2641.55 samples/sec   Loss 15.2246   LearningRate 0.0930   Epoch: 0   Global Step: 29500   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:00,866-Speed 2620.75 samples/sec   Loss 15.4759   LearningRate 0.0930   Epoch: 0   Global Step: 29510   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:04,768-Speed 2625.25 samples/sec   Loss 15.3405   LearningRate 0.0930   Epoch: 0   Global Step: 29520   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:08,666-Speed 2626.94 samples/sec   Loss 15.2382   LearningRate 0.0930   Epoch: 0   Global Step: 29530   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:12,564-Speed 2628.19 samples/sec   Loss 15.5200   LearningRate 0.0930   Epoch: 0   Global Step: 29540   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:16,462-Speed 2627.23 samples/sec   Loss 15.4535   LearningRate 0.0930   Epoch: 0   Global Step: 29550   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:20,357-Speed 2629.96 samples/sec   Loss 15.2507   LearningRate 0.0930   Epoch: 0   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:24,274-Speed 2614.46 samples/sec   Loss 15.4491   LearningRate 0.0930   Epoch: 0   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:28,180-Speed 2622.23 samples/sec   Loss 15.4066   LearningRate 0.0930   Epoch: 0   Global Step: 29580   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:32,079-Speed 2627.40 samples/sec   Loss 15.3866   LearningRate 0.0930   Epoch: 0   Global Step: 29590   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:01:35,975-Speed 2628.56 samples/sec   Loss 15.3749   LearningRate 0.0930   Epoch: 0   Global Step: 29600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:01:39,873-Speed 2627.89 samples/sec   Loss 15.3999   LearningRate 0.0930   Epoch: 0   Global Step: 29610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:01:43,772-Speed 2627.13 samples/sec   Loss 15.5479   LearningRate 0.0930   Epoch: 0   Global Step: 29620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:01:47,666-Speed 2629.98 samples/sec   Loss 15.3121   LearningRate 0.0930   Epoch: 0   Global Step: 29630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:01:51,570-Speed 2623.65 samples/sec   Loss 15.3837   LearningRate 0.0930   Epoch: 0   Global Step: 29640   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:01:55,471-Speed 2625.79 samples/sec   Loss 15.2103   LearningRate 0.0930   Epoch: 0   Global Step: 29650   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:01:59,374-Speed 2624.14 samples/sec   Loss 15.2528   LearningRate 0.0930   Epoch: 0   Global Step: 29660   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:02:03,293-Speed 2613.08 samples/sec   Loss 15.4521   LearningRate 0.0930   Epoch: 0   Global Step: 29670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:02:07,196-Speed 2624.50 samples/sec   Loss 15.4783   LearningRate 0.0930   Epoch: 0   Global Step: 29680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:02:11,095-Speed 2626.87 samples/sec   Loss 15.4479   LearningRate 0.0930   Epoch: 0   Global Step: 29690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:02:14,979-Speed 2636.92 samples/sec   Loss 15.4876   LearningRate 0.0930   Epoch: 0   Global Step: 29700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:02:18,880-Speed 2626.11 samples/sec   Loss 15.3668   LearningRate 0.0930   Epoch: 0   Global Step: 29710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:02:22,760-Speed 2639.72 samples/sec   Loss 15.3512   LearningRate 0.0930   Epoch: 0   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:26,834-Speed 2513.69 samples/sec   Loss 15.3042   LearningRate 0.0930   Epoch: 0   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:30,733-Speed 2627.57 samples/sec   Loss 15.3324   LearningRate 0.0930   Epoch: 0   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:34,635-Speed 2624.62 samples/sec   Loss 15.4684   LearningRate 0.0930   Epoch: 0   Global Step: 29750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:38,534-Speed 2626.33 samples/sec   Loss 15.4069   LearningRate 0.0930   Epoch: 0   Global Step: 29760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:42,436-Speed 2625.38 samples/sec   Loss 15.2420   LearningRate 0.0930   Epoch: 0   Global Step: 29770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:46,331-Speed 2629.69 samples/sec   Loss 15.3919   LearningRate 0.0929   Epoch: 0   Global Step: 29780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:50,223-Speed 2631.40 samples/sec   Loss 15.4150   LearningRate 0.0929   Epoch: 0   Global Step: 29790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:54,127-Speed 2624.12 samples/sec   Loss 15.2252   LearningRate 0.0929   Epoch: 0   Global Step: 29800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:02:58,026-Speed 2626.55 samples/sec   Loss 15.3654   LearningRate 0.0929   Epoch: 0   Global Step: 29810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:01,931-Speed 2623.15 samples/sec   Loss 15.3148   LearningRate 0.0929   Epoch: 0   Global Step: 29820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:03:05,808-Speed 2641.87 samples/sec   Loss 15.3863   LearningRate 0.0929   Epoch: 0   Global Step: 29830   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:09,709-Speed 2625.03 samples/sec   Loss 15.5462   LearningRate 0.0929   Epoch: 0   Global Step: 29840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:13,607-Speed 2627.48 samples/sec   Loss 15.5164   LearningRate 0.0929   Epoch: 0   Global Step: 29850   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:17,502-Speed 2630.01 samples/sec   Loss 15.4188   LearningRate 0.0929   Epoch: 0   Global Step: 29860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:21,397-Speed 2629.32 samples/sec   Loss 15.4878   LearningRate 0.0929   Epoch: 0   Global Step: 29870   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:25,295-Speed 2628.28 samples/sec   Loss 15.3790   LearningRate 0.0929   Epoch: 0   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:29,190-Speed 2629.13 samples/sec   Loss 15.3072   LearningRate 0.0929   Epoch: 0   Global Step: 29890   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:33,086-Speed 2629.30 samples/sec   Loss 15.2424   LearningRate 0.0929   Epoch: 0   Global Step: 29900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:36,979-Speed 2630.75 samples/sec   Loss 15.3303   LearningRate 0.0929   Epoch: 0   Global Step: 29910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:40,872-Speed 2630.73 samples/sec   Loss 15.3693   LearningRate 0.0929   Epoch: 0   Global Step: 29920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:03:44,770-Speed 2627.52 samples/sec   Loss 15.4416   LearningRate 0.0929   Epoch: 0   Global Step: 29930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:03:48,747-Speed 2575.63 samples/sec   Loss 15.5118   LearningRate 0.0929   Epoch: 0   Global Step: 29940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:03:52,651-Speed 2623.62 samples/sec   Loss 15.4025   LearningRate 0.0929   Epoch: 0   Global Step: 29950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:03:56,557-Speed 2622.47 samples/sec   Loss 15.4134   LearningRate 0.0929   Epoch: 0   Global Step: 29960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:04:00,467-Speed 2620.15 samples/sec   Loss 15.4168   LearningRate 0.0929   Epoch: 0   Global Step: 29970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:04:04,374-Speed 2621.32 samples/sec   Loss 15.4043   LearningRate 0.0929   Epoch: 0   Global Step: 29980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:04:08,276-Speed 2624.70 samples/sec   Loss 15.4428   LearningRate 0.0929   Epoch: 0   Global Step: 29990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:04:12,152-Speed 2642.73 samples/sec   Loss 15.2710   LearningRate 0.0929   Epoch: 0   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:04:55,646-[lfw][30000]XNorm: 22.802505
Training: 2022-04-12 23:04:55,647-[lfw][30000]Accuracy-Flip: 0.99483+-0.00320
Training: 2022-04-12 23:04:55,648-[lfw][30000]Accuracy-Highest: 0.99500
Training: 2022-04-12 23:05:46,179-[cfp_fp][30000]XNorm: 20.030670
Training: 2022-04-12 23:05:46,180-[cfp_fp][30000]Accuracy-Flip: 0.96657+-0.00969
Training: 2022-04-12 23:05:46,181-[cfp_fp][30000]Accuracy-Highest: 0.96657
Training: 2022-04-12 23:06:29,647-[agedb_30][30000]XNorm: 22.689600
Training: 2022-04-12 23:06:29,648-[agedb_30][30000]Accuracy-Flip: 0.94833+-0.00872
Training: 2022-04-12 23:06:29,648-[agedb_30][30000]Accuracy-Highest: 0.94833
Training: 2022-04-12 23:06:33,516-Speed 72.44 samples/sec   Loss 15.4600   LearningRate 0.0929   Epoch: 0   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:06:37,394-Speed 2641.88 samples/sec   Loss 15.4214   LearningRate 0.0929   Epoch: 0   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:06:41,268-Speed 2643.70 samples/sec   Loss 15.4069   LearningRate 0.0929   Epoch: 0   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:06:45,143-Speed 2643.34 samples/sec   Loss 15.5326   LearningRate 0.0929   Epoch: 0   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:06:49,020-Speed 2641.86 samples/sec   Loss 15.4620   LearningRate 0.0929   Epoch: 0   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:06:52,901-Speed 2639.47 samples/sec   Loss 15.2599   LearningRate 0.0929   Epoch: 0   Global Step: 30060   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:06:56,787-Speed 2635.70 samples/sec   Loss 15.3573   LearningRate 0.0929   Epoch: 0   Global Step: 30070   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:00,670-Speed 2638.60 samples/sec   Loss 15.4528   LearningRate 0.0929   Epoch: 0   Global Step: 30080   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:04,556-Speed 2635.34 samples/sec   Loss 15.3624   LearningRate 0.0929   Epoch: 0   Global Step: 30090   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:08,463-Speed 2622.17 samples/sec   Loss 15.3936   LearningRate 0.0929   Epoch: 0   Global Step: 30100   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:12,353-Speed 2633.40 samples/sec   Loss 15.3039   LearningRate 0.0929   Epoch: 0   Global Step: 30110   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:16,252-Speed 2626.76 samples/sec   Loss 15.3198   LearningRate 0.0929   Epoch: 0   Global Step: 30120   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:20,139-Speed 2635.06 samples/sec   Loss 15.3563   LearningRate 0.0929   Epoch: 0   Global Step: 30130   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:24,038-Speed 2626.98 samples/sec   Loss 15.3412   LearningRate 0.0929   Epoch: 0   Global Step: 30140   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:28,038-Speed 2560.96 samples/sec   Loss 15.3991   LearningRate 0.0929   Epoch: 0   Global Step: 30150   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:31,978-Speed 2599.41 samples/sec   Loss 15.2092   LearningRate 0.0929   Epoch: 0   Global Step: 30160   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:35,875-Speed 2628.49 samples/sec   Loss 15.3585   LearningRate 0.0929   Epoch: 0   Global Step: 30170   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:07:39,761-Speed 2636.26 samples/sec   Loss 15.3319   LearningRate 0.0929   Epoch: 0   Global Step: 30180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:43,718-Speed 2588.26 samples/sec   Loss 15.2173   LearningRate 0.0929   Epoch: 0   Global Step: 30190   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:47,622-Speed 2623.58 samples/sec   Loss 15.1729   LearningRate 0.0929   Epoch: 0   Global Step: 30200   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:51,520-Speed 2627.31 samples/sec   Loss 15.3680   LearningRate 0.0928   Epoch: 0   Global Step: 30210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:55,425-Speed 2623.41 samples/sec   Loss 15.2765   LearningRate 0.0928   Epoch: 0   Global Step: 30220   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:07:59,316-Speed 2632.38 samples/sec   Loss 15.1989   LearningRate 0.0928   Epoch: 0   Global Step: 30230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:08:03,212-Speed 2628.53 samples/sec   Loss 15.3482   LearningRate 0.0928   Epoch: 0   Global Step: 30240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:08:07,118-Speed 2622.66 samples/sec   Loss 15.4385   LearningRate 0.0928   Epoch: 0   Global Step: 30250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:08:11,020-Speed 2625.07 samples/sec   Loss 15.2930   LearningRate 0.0928   Epoch: 0   Global Step: 30260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:08:14,920-Speed 2626.82 samples/sec   Loss 15.2451   LearningRate 0.0928   Epoch: 0   Global Step: 30270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:08:18,816-Speed 2628.48 samples/sec   Loss 15.1729   LearningRate 0.0928   Epoch: 0   Global Step: 30280   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:22,720-Speed 2623.85 samples/sec   Loss 15.3417   LearningRate 0.0928   Epoch: 0   Global Step: 30290   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:26,619-Speed 2626.57 samples/sec   Loss 15.2945   LearningRate 0.0928   Epoch: 0   Global Step: 30300   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:30,516-Speed 2628.75 samples/sec   Loss 15.1997   LearningRate 0.0928   Epoch: 0   Global Step: 30310   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:34,416-Speed 2626.31 samples/sec   Loss 15.4567   LearningRate 0.0928   Epoch: 0   Global Step: 30320   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:38,321-Speed 2622.90 samples/sec   Loss 15.0129   LearningRate 0.0928   Epoch: 0   Global Step: 30330   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:42,232-Speed 2619.38 samples/sec   Loss 15.3813   LearningRate 0.0928   Epoch: 0   Global Step: 30340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:46,127-Speed 2629.60 samples/sec   Loss 15.4082   LearningRate 0.0928   Epoch: 0   Global Step: 30350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:50,022-Speed 2629.77 samples/sec   Loss 15.2575   LearningRate 0.0928   Epoch: 0   Global Step: 30360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:53,918-Speed 2628.94 samples/sec   Loss 15.4343   LearningRate 0.0928   Epoch: 0   Global Step: 30370   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:08:57,785-Speed 2648.68 samples/sec   Loss 15.2777   LearningRate 0.0928   Epoch: 0   Global Step: 30380   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:01,683-Speed 2627.43 samples/sec   Loss 15.3148   LearningRate 0.0928   Epoch: 0   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:05,577-Speed 2631.10 samples/sec   Loss 15.3331   LearningRate 0.0928   Epoch: 0   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:09,475-Speed 2627.20 samples/sec   Loss 15.2628   LearningRate 0.0928   Epoch: 0   Global Step: 30410   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:13,397-Speed 2611.48 samples/sec   Loss 15.3089   LearningRate 0.0928   Epoch: 0   Global Step: 30420   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:17,289-Speed 2631.78 samples/sec   Loss 15.2634   LearningRate 0.0928   Epoch: 0   Global Step: 30430   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:21,186-Speed 2628.80 samples/sec   Loss 15.3617   LearningRate 0.0928   Epoch: 0   Global Step: 30440   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:25,164-Speed 2575.05 samples/sec   Loss 15.3516   LearningRate 0.0928   Epoch: 0   Global Step: 30450   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:29,059-Speed 2629.31 samples/sec   Loss 15.2135   LearningRate 0.0928   Epoch: 0   Global Step: 30460   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:32,962-Speed 2624.13 samples/sec   Loss 15.4451   LearningRate 0.0928   Epoch: 0   Global Step: 30470   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:37,004-Speed 2534.59 samples/sec   Loss 15.3713   LearningRate 0.0928   Epoch: 0   Global Step: 30480   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:09:40,912-Speed 2621.10 samples/sec   Loss 15.2705   LearningRate 0.0928   Epoch: 0   Global Step: 30490   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:09:44,800-Speed 2634.34 samples/sec   Loss 15.4849   LearningRate 0.0928   Epoch: 0   Global Step: 30500   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:48,720-Speed 2612.67 samples/sec   Loss 15.1649   LearningRate 0.0928   Epoch: 0   Global Step: 30510   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:52,670-Speed 2593.38 samples/sec   Loss 15.3227   LearningRate 0.0928   Epoch: 0   Global Step: 30520   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:09:56,571-Speed 2625.38 samples/sec   Loss 15.2025   LearningRate 0.0928   Epoch: 0   Global Step: 30530   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:00,468-Speed 2628.39 samples/sec   Loss 15.3668   LearningRate 0.0928   Epoch: 0   Global Step: 30540   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:04,369-Speed 2625.84 samples/sec   Loss 15.1701   LearningRate 0.0928   Epoch: 0   Global Step: 30550   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:08,288-Speed 2613.87 samples/sec   Loss 15.2991   LearningRate 0.0928   Epoch: 0   Global Step: 30560   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:12,186-Speed 2627.82 samples/sec   Loss 15.3890   LearningRate 0.0928   Epoch: 0   Global Step: 30570   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:16,092-Speed 2622.11 samples/sec   Loss 15.3023   LearningRate 0.0928   Epoch: 0   Global Step: 30580   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:19,992-Speed 2627.12 samples/sec   Loss 15.3633   LearningRate 0.0928   Epoch: 0   Global Step: 30590   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:23,896-Speed 2623.10 samples/sec   Loss 15.3790   LearningRate 0.0928   Epoch: 0   Global Step: 30600   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:10:27,776-Speed 2640.09 samples/sec   Loss 15.1684   LearningRate 0.0928   Epoch: 0   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:31,674-Speed 2627.17 samples/sec   Loss 15.1718   LearningRate 0.0928   Epoch: 0   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:35,580-Speed 2622.81 samples/sec   Loss 15.2849   LearningRate 0.0928   Epoch: 0   Global Step: 30630   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:39,503-Speed 2610.78 samples/sec   Loss 15.3432   LearningRate 0.0927   Epoch: 0   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:43,421-Speed 2614.63 samples/sec   Loss 15.2112   LearningRate 0.0927   Epoch: 0   Global Step: 30650   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:47,334-Speed 2617.45 samples/sec   Loss 15.3223   LearningRate 0.0927   Epoch: 0   Global Step: 30660   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:51,318-Speed 2571.12 samples/sec   Loss 15.1722   LearningRate 0.0927   Epoch: 0   Global Step: 30670   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:55,308-Speed 2566.99 samples/sec   Loss 15.1646   LearningRate 0.0927   Epoch: 0   Global Step: 30680   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:10:59,284-Speed 2575.85 samples/sec   Loss 15.0905   LearningRate 0.0927   Epoch: 0   Global Step: 30690   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:03,197-Speed 2617.94 samples/sec   Loss 15.2728   LearningRate 0.0927   Epoch: 0   Global Step: 30700   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:07,090-Speed 2631.33 samples/sec   Loss 15.2241   LearningRate 0.0927   Epoch: 0   Global Step: 30710   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:11:10,990-Speed 2626.47 samples/sec   Loss 15.3202   LearningRate 0.0927   Epoch: 0   Global Step: 30720   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:11:14,921-Speed 2605.04 samples/sec   Loss 15.4775   LearningRate 0.0927   Epoch: 0   Global Step: 30730   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:11:18,938-Speed 2550.20 samples/sec   Loss 15.3660   LearningRate 0.0927   Epoch: 0   Global Step: 30740   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:11:22,890-Speed 2592.04 samples/sec   Loss 15.2107   LearningRate 0.0927   Epoch: 0   Global Step: 30750   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:11:26,786-Speed 2629.27 samples/sec   Loss 15.2427   LearningRate 0.0927   Epoch: 0   Global Step: 30760   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:11:30,683-Speed 2628.07 samples/sec   Loss 15.3072   LearningRate 0.0927   Epoch: 0   Global Step: 30770   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:11:34,581-Speed 2627.80 samples/sec   Loss 15.1932   LearningRate 0.0927   Epoch: 0   Global Step: 30780   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:38,471-Speed 2632.53 samples/sec   Loss 15.2095   LearningRate 0.0927   Epoch: 0   Global Step: 30790   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:42,364-Speed 2631.43 samples/sec   Loss 15.1837   LearningRate 0.0927   Epoch: 0   Global Step: 30800   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:46,265-Speed 2625.54 samples/sec   Loss 15.3095   LearningRate 0.0927   Epoch: 0   Global Step: 30810   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:50,162-Speed 2628.38 samples/sec   Loss 15.1372   LearningRate 0.0927   Epoch: 0   Global Step: 30820   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:54,070-Speed 2621.00 samples/sec   Loss 15.3268   LearningRate 0.0927   Epoch: 0   Global Step: 30830   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:11:58,005-Speed 2603.10 samples/sec   Loss 15.1801   LearningRate 0.0927   Epoch: 0   Global Step: 30840   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:01,907-Speed 2624.98 samples/sec   Loss 15.2899   LearningRate 0.0927   Epoch: 0   Global Step: 30850   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:05,830-Speed 2611.03 samples/sec   Loss 15.2849   LearningRate 0.0927   Epoch: 0   Global Step: 30860   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:09,738-Speed 2620.52 samples/sec   Loss 15.1843   LearningRate 0.0927   Epoch: 0   Global Step: 30870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:13,635-Speed 2628.69 samples/sec   Loss 15.0807   LearningRate 0.0927   Epoch: 0   Global Step: 30880   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:12:17,538-Speed 2624.45 samples/sec   Loss 15.1297   LearningRate 0.0927   Epoch: 0   Global Step: 30890   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:12:21,439-Speed 2625.76 samples/sec   Loss 15.0727   LearningRate 0.0927   Epoch: 0   Global Step: 30900   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:12:25,335-Speed 2628.68 samples/sec   Loss 15.4031   LearningRate 0.0927   Epoch: 0   Global Step: 30910   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:12:29,225-Speed 2633.28 samples/sec   Loss 15.2761   LearningRate 0.0927   Epoch: 0   Global Step: 30920   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:33,127-Speed 2625.24 samples/sec   Loss 15.1777   LearningRate 0.0927   Epoch: 0   Global Step: 30930   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:37,024-Speed 2628.26 samples/sec   Loss 15.2814   LearningRate 0.0927   Epoch: 0   Global Step: 30940   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:40,920-Speed 2628.66 samples/sec   Loss 15.2009   LearningRate 0.0927   Epoch: 0   Global Step: 30950   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:44,816-Speed 2628.96 samples/sec   Loss 15.2467   LearningRate 0.0927   Epoch: 0   Global Step: 30960   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:48,726-Speed 2620.25 samples/sec   Loss 15.3001   LearningRate 0.0927   Epoch: 0   Global Step: 30970   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:52,622-Speed 2628.72 samples/sec   Loss 15.1717   LearningRate 0.0927   Epoch: 0   Global Step: 30980   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:12:56,537-Speed 2616.09 samples/sec   Loss 15.1135   LearningRate 0.0927   Epoch: 0   Global Step: 30990   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:13:00,436-Speed 2627.31 samples/sec   Loss 15.2318   LearningRate 0.0927   Epoch: 0   Global Step: 31000   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:13:04,342-Speed 2622.14 samples/sec   Loss 15.1569   LearningRate 0.0927   Epoch: 0   Global Step: 31010   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:13:08,235-Speed 2631.03 samples/sec   Loss 15.4059   LearningRate 0.0927   Epoch: 0   Global Step: 31020   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:12,135-Speed 2626.56 samples/sec   Loss 15.2960   LearningRate 0.0927   Epoch: 0   Global Step: 31030   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:16,033-Speed 2627.55 samples/sec   Loss 15.2751   LearningRate 0.0927   Epoch: 0   Global Step: 31040   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:19,929-Speed 2628.96 samples/sec   Loss 15.1084   LearningRate 0.0927   Epoch: 0   Global Step: 31050   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:23,824-Speed 2629.58 samples/sec   Loss 15.2263   LearningRate 0.0927   Epoch: 0   Global Step: 31060   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:27,722-Speed 2630.58 samples/sec   Loss 15.2765   LearningRate 0.0926   Epoch: 0   Global Step: 31070   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:31,621-Speed 2626.66 samples/sec   Loss 15.2160   LearningRate 0.0926   Epoch: 0   Global Step: 31080   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:35,518-Speed 2628.62 samples/sec   Loss 15.0275   LearningRate 0.0926   Epoch: 0   Global Step: 31090   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:39,428-Speed 2619.26 samples/sec   Loss 15.1419   LearningRate 0.0926   Epoch: 0   Global Step: 31100   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:43,323-Speed 2629.53 samples/sec   Loss 15.2500   LearningRate 0.0926   Epoch: 0   Global Step: 31110   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:13:47,201-Speed 2640.99 samples/sec   Loss 15.1450   LearningRate 0.0926   Epoch: 0   Global Step: 31120   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:13:51,098-Speed 2628.94 samples/sec   Loss 15.1663   LearningRate 0.0926   Epoch: 0   Global Step: 31130   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:13:54,997-Speed 2626.21 samples/sec   Loss 15.0845   LearningRate 0.0926   Epoch: 0   Global Step: 31140   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:13:58,898-Speed 2626.54 samples/sec   Loss 15.1876   LearningRate 0.0926   Epoch: 0   Global Step: 31150   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:02,797-Speed 2626.78 samples/sec   Loss 15.1230   LearningRate 0.0926   Epoch: 0   Global Step: 31160   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:06,690-Speed 2630.71 samples/sec   Loss 15.2710   LearningRate 0.0926   Epoch: 0   Global Step: 31170   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:10,586-Speed 2629.02 samples/sec   Loss 15.1805   LearningRate 0.0926   Epoch: 0   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:14,484-Speed 2627.95 samples/sec   Loss 15.1374   LearningRate 0.0926   Epoch: 0   Global Step: 31190   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:18,383-Speed 2627.77 samples/sec   Loss 15.1181   LearningRate 0.0926   Epoch: 0   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:22,293-Speed 2619.25 samples/sec   Loss 15.0659   LearningRate 0.0926   Epoch: 0   Global Step: 31210   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:26,188-Speed 2629.96 samples/sec   Loss 15.1274   LearningRate 0.0926   Epoch: 0   Global Step: 31220   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:14:30,104-Speed 2615.67 samples/sec   Loss 15.2871   LearningRate 0.0926   Epoch: 0   Global Step: 31230   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:14:33,987-Speed 2637.50 samples/sec   Loss 15.1045   LearningRate 0.0926   Epoch: 0   Global Step: 31240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:37,895-Speed 2621.23 samples/sec   Loss 15.2066   LearningRate 0.0926   Epoch: 0   Global Step: 31250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:41,795-Speed 2626.54 samples/sec   Loss 15.2253   LearningRate 0.0926   Epoch: 0   Global Step: 31260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:45,693-Speed 2628.30 samples/sec   Loss 15.0967   LearningRate 0.0926   Epoch: 0   Global Step: 31270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:49,594-Speed 2625.43 samples/sec   Loss 15.1408   LearningRate 0.0926   Epoch: 0   Global Step: 31280   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:53,498-Speed 2623.03 samples/sec   Loss 15.2443   LearningRate 0.0926   Epoch: 0   Global Step: 31290   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:14:57,405-Speed 2621.81 samples/sec   Loss 15.1954   LearningRate 0.0926   Epoch: 0   Global Step: 31300   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:15:01,311-Speed 2622.65 samples/sec   Loss 15.1398   LearningRate 0.0926   Epoch: 0   Global Step: 31310   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:15:05,212-Speed 2625.72 samples/sec   Loss 15.1392   LearningRate 0.0926   Epoch: 0   Global Step: 31320   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:15:09,109-Speed 2628.25 samples/sec   Loss 15.0534   LearningRate 0.0926   Epoch: 0   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:15:13,016-Speed 2621.82 samples/sec   Loss 15.0004   LearningRate 0.0926   Epoch: 0   Global Step: 31340   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:16,920-Speed 2623.94 samples/sec   Loss 15.2576   LearningRate 0.0926   Epoch: 0   Global Step: 31350   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:20,817-Speed 2628.33 samples/sec   Loss 15.1326   LearningRate 0.0926   Epoch: 0   Global Step: 31360   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:24,722-Speed 2622.93 samples/sec   Loss 15.2704   LearningRate 0.0926   Epoch: 0   Global Step: 31370   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:28,622-Speed 2625.62 samples/sec   Loss 15.0404   LearningRate 0.0926   Epoch: 0   Global Step: 31380   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:32,517-Speed 2630.30 samples/sec   Loss 15.0210   LearningRate 0.0926   Epoch: 0   Global Step: 31390   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:36,410-Speed 2630.58 samples/sec   Loss 15.3116   LearningRate 0.0926   Epoch: 0   Global Step: 31400   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:40,311-Speed 2625.84 samples/sec   Loss 15.3132   LearningRate 0.0926   Epoch: 0   Global Step: 31410   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:15:44,187-Speed 2642.90 samples/sec   Loss 15.2467   LearningRate 0.0926   Epoch: 0   Global Step: 31420   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:15:48,084-Speed 2627.87 samples/sec   Loss 15.1816   LearningRate 0.0926   Epoch: 0   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:15:52,001-Speed 2614.76 samples/sec   Loss 15.2909   LearningRate 0.0926   Epoch: 0   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:15:55,895-Speed 2630.47 samples/sec   Loss 15.1075   LearningRate 0.0926   Epoch: 0   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:15:59,793-Speed 2627.69 samples/sec   Loss 15.2493   LearningRate 0.0926   Epoch: 0   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:16:03,693-Speed 2625.98 samples/sec   Loss 15.1527   LearningRate 0.0926   Epoch: 0   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:16:07,657-Speed 2583.87 samples/sec   Loss 15.0387   LearningRate 0.0926   Epoch: 0   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:16:11,559-Speed 2625.37 samples/sec   Loss 15.0926   LearningRate 0.0926   Epoch: 0   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:16:15,461-Speed 2624.39 samples/sec   Loss 15.1002   LearningRate 0.0925   Epoch: 0   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:16:19,355-Speed 2630.95 samples/sec   Loss 15.0739   LearningRate 0.0925   Epoch: 0   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:16:23,258-Speed 2623.59 samples/sec   Loss 15.2498   LearningRate 0.0925   Epoch: 0   Global Step: 31520   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:27,165-Speed 2621.39 samples/sec   Loss 15.1407   LearningRate 0.0925   Epoch: 0   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:31,065-Speed 2626.51 samples/sec   Loss 15.1343   LearningRate 0.0925   Epoch: 0   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:34,980-Speed 2615.95 samples/sec   Loss 15.2236   LearningRate 0.0925   Epoch: 0   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:38,877-Speed 2628.37 samples/sec   Loss 15.2513   LearningRate 0.0925   Epoch: 0   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:42,771-Speed 2630.98 samples/sec   Loss 15.1752   LearningRate 0.0925   Epoch: 0   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:46,672-Speed 2625.28 samples/sec   Loss 15.1379   LearningRate 0.0925   Epoch: 0   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:50,573-Speed 2626.02 samples/sec   Loss 15.2599   LearningRate 0.0925   Epoch: 0   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:54,467-Speed 2629.95 samples/sec   Loss 15.1742   LearningRate 0.0925   Epoch: 0   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:16:58,360-Speed 2630.73 samples/sec   Loss 15.2665   LearningRate 0.0925   Epoch: 0   Global Step: 31610   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:17:02,255-Speed 2629.69 samples/sec   Loss 15.2395   LearningRate 0.0925   Epoch: 0   Global Step: 31620   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:06,149-Speed 2630.54 samples/sec   Loss 15.1894   LearningRate 0.0925   Epoch: 0   Global Step: 31630   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:10,042-Speed 2630.54 samples/sec   Loss 15.0339   LearningRate 0.0925   Epoch: 0   Global Step: 31640   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:13,938-Speed 2629.69 samples/sec   Loss 14.8609   LearningRate 0.0925   Epoch: 0   Global Step: 31650   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:17,831-Speed 2631.39 samples/sec   Loss 15.1177   LearningRate 0.0925   Epoch: 0   Global Step: 31660   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:21,873-Speed 2534.05 samples/sec   Loss 15.0956   LearningRate 0.0925   Epoch: 0   Global Step: 31670   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:25,934-Speed 2521.85 samples/sec   Loss 14.9889   LearningRate 0.0925   Epoch: 0   Global Step: 31680   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:29,823-Speed 2633.36 samples/sec   Loss 15.1898   LearningRate 0.0925   Epoch: 0   Global Step: 31690   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:33,721-Speed 2627.99 samples/sec   Loss 15.0605   LearningRate 0.0925   Epoch: 0   Global Step: 31700   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:17:37,614-Speed 2630.59 samples/sec   Loss 15.1752   LearningRate 0.0925   Epoch: 0   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:17:41,520-Speed 2622.32 samples/sec   Loss 15.2398   LearningRate 0.0925   Epoch: 0   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:17:45,419-Speed 2626.91 samples/sec   Loss 15.2719   LearningRate 0.0925   Epoch: 0   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:17:49,315-Speed 2629.22 samples/sec   Loss 15.0218   LearningRate 0.0925   Epoch: 0   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:17:53,232-Speed 2614.60 samples/sec   Loss 15.0721   LearningRate 0.0925   Epoch: 0   Global Step: 31750   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:17:57,131-Speed 2627.62 samples/sec   Loss 15.2409   LearningRate 0.0925   Epoch: 0   Global Step: 31760   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:01,026-Speed 2629.62 samples/sec   Loss 15.0964   LearningRate 0.0925   Epoch: 0   Global Step: 31770   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:04,921-Speed 2629.66 samples/sec   Loss 15.0314   LearningRate 0.0925   Epoch: 0   Global Step: 31780   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:08,816-Speed 2629.76 samples/sec   Loss 14.9716   LearningRate 0.0925   Epoch: 0   Global Step: 31790   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:12,712-Speed 2628.41 samples/sec   Loss 15.2479   LearningRate 0.0925   Epoch: 0   Global Step: 31800   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:16,622-Speed 2619.69 samples/sec   Loss 15.2167   LearningRate 0.0925   Epoch: 0   Global Step: 31810   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:18:20,515-Speed 2631.24 samples/sec   Loss 15.2066   LearningRate 0.0925   Epoch: 0   Global Step: 31820   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:18:24,408-Speed 2631.31 samples/sec   Loss 15.0883   LearningRate 0.0925   Epoch: 0   Global Step: 31830   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:18:28,299-Speed 2632.14 samples/sec   Loss 15.1858   LearningRate 0.0925   Epoch: 0   Global Step: 31840   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:18:32,190-Speed 2632.44 samples/sec   Loss 15.2351   LearningRate 0.0925   Epoch: 0   Global Step: 31850   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:18:36,083-Speed 2630.83 samples/sec   Loss 15.1992   LearningRate 0.0925   Epoch: 0   Global Step: 31860   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:18:39,959-Speed 2642.49 samples/sec   Loss 15.0604   LearningRate 0.0925   Epoch: 0   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:43,849-Speed 2633.30 samples/sec   Loss 15.1086   LearningRate 0.0925   Epoch: 0   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:47,742-Speed 2631.41 samples/sec   Loss 14.9740   LearningRate 0.0925   Epoch: 0   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:51,634-Speed 2631.52 samples/sec   Loss 15.0621   LearningRate 0.0925   Epoch: 0   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:18:55,494-Speed 2653.81 samples/sec   Loss 15.2894   LearningRate 0.0925   Epoch: 0   Global Step: 31910   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:18:59,390-Speed 2629.85 samples/sec   Loss 15.0140   LearningRate 0.0925   Epoch: 0   Global Step: 31920   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:03,286-Speed 2628.97 samples/sec   Loss 15.0976   LearningRate 0.0925   Epoch: 0   Global Step: 31930   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:07,203-Speed 2614.93 samples/sec   Loss 14.9914   LearningRate 0.0924   Epoch: 0   Global Step: 31940   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:11,109-Speed 2622.22 samples/sec   Loss 15.1169   LearningRate 0.0924   Epoch: 0   Global Step: 31950   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:15,000-Speed 2632.11 samples/sec   Loss 15.2001   LearningRate 0.0924   Epoch: 0   Global Step: 31960   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:18,890-Speed 2633.21 samples/sec   Loss 14.9280   LearningRate 0.0924   Epoch: 0   Global Step: 31970   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:22,783-Speed 2630.79 samples/sec   Loss 14.9523   LearningRate 0.0924   Epoch: 0   Global Step: 31980   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:26,678-Speed 2629.74 samples/sec   Loss 15.1638   LearningRate 0.0924   Epoch: 0   Global Step: 31990   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:30,569-Speed 2632.38 samples/sec   Loss 15.0917   LearningRate 0.0924   Epoch: 0   Global Step: 32000   Fp16 Grad Scale: 32768   Required: 90 hours
Training: 2022-04-12 23:19:34,461-Speed 2631.49 samples/sec   Loss 15.0577   LearningRate 0.0924   Epoch: 0   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:19:38,353-Speed 2631.38 samples/sec   Loss 15.1076   LearningRate 0.0924   Epoch: 0   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:19:42,247-Speed 2630.76 samples/sec   Loss 15.2836   LearningRate 0.0924   Epoch: 0   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:19:46,141-Speed 2629.64 samples/sec   Loss 15.1205   LearningRate 0.0924   Epoch: 0   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:19:50,037-Speed 2629.23 samples/sec   Loss 15.2558   LearningRate 0.0924   Epoch: 0   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:19:53,929-Speed 2631.53 samples/sec   Loss 15.2292   LearningRate 0.0924   Epoch: 0   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:19:57,838-Speed 2620.52 samples/sec   Loss 15.1677   LearningRate 0.0924   Epoch: 0   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:20:01,729-Speed 2632.00 samples/sec   Loss 15.0306   LearningRate 0.0924   Epoch: 0   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:20:05,631-Speed 2624.97 samples/sec   Loss 15.2382   LearningRate 0.0924   Epoch: 0   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:20:09,529-Speed 2627.54 samples/sec   Loss 15.0871   LearningRate 0.0924   Epoch: 0   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 90 hours
Training: 2022-04-12 23:20:13,434-Speed 2623.49 samples/sec   Loss 15.1452   LearningRate 0.0924   Epoch: 0   Global Step: 32110   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:17,339-Speed 2622.80 samples/sec   Loss 15.1425   LearningRate 0.0924   Epoch: 0   Global Step: 32120   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:21,234-Speed 2629.48 samples/sec   Loss 15.2374   LearningRate 0.0924   Epoch: 0   Global Step: 32130   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:25,128-Speed 2630.08 samples/sec   Loss 15.0669   LearningRate 0.0924   Epoch: 0   Global Step: 32140   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:29,037-Speed 2620.21 samples/sec   Loss 15.2390   LearningRate 0.0924   Epoch: 0   Global Step: 32150   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:32,945-Speed 2621.04 samples/sec   Loss 15.0548   LearningRate 0.0924   Epoch: 0   Global Step: 32160   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:36,839-Speed 2630.21 samples/sec   Loss 15.1811   LearningRate 0.0924   Epoch: 0   Global Step: 32170   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:40,734-Speed 2629.73 samples/sec   Loss 15.1026   LearningRate 0.0924   Epoch: 0   Global Step: 32180   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:44,631-Speed 2628.52 samples/sec   Loss 15.1439   LearningRate 0.0924   Epoch: 0   Global Step: 32190   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:48,525-Speed 2630.12 samples/sec   Loss 15.2164   LearningRate 0.0924   Epoch: 0   Global Step: 32200   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:20:52,418-Speed 2630.73 samples/sec   Loss 15.1415   LearningRate 0.0924   Epoch: 0   Global Step: 32210   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:20:56,317-Speed 2627.22 samples/sec   Loss 15.2364   LearningRate 0.0924   Epoch: 0   Global Step: 32220   Fp16 Grad Scale: 262144   Required: 90 hours
Training: 2022-04-12 23:21:00,204-Speed 2634.90 samples/sec   Loss 15.0374   LearningRate 0.0924   Epoch: 0   Global Step: 32230   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:21:04,097-Speed 2631.37 samples/sec   Loss 15.0771   LearningRate 0.0924   Epoch: 0   Global Step: 32240   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:21:07,988-Speed 2631.88 samples/sec   Loss 15.1525   LearningRate 0.0924   Epoch: 0   Global Step: 32250   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:21:11,884-Speed 2629.21 samples/sec   Loss 15.2461   LearningRate 0.0924   Epoch: 0   Global Step: 32260   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:21:15,791-Speed 2622.13 samples/sec   Loss 15.0351   LearningRate 0.0924   Epoch: 0   Global Step: 32270   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:21:19,691-Speed 2626.26 samples/sec   Loss 15.1281   LearningRate 0.0924   Epoch: 0   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 90 hours
Training: 2022-04-12 23:21:23,586-Speed 2629.40 samples/sec   Loss 14.9424   LearningRate 0.0924   Epoch: 0   Global Step: 32290   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:21:27,512-Speed 2608.91 samples/sec   Loss 15.0432   LearningRate 0.0924   Epoch: 0   Global Step: 32300   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:21:31,416-Speed 2623.21 samples/sec   Loss 14.9389   LearningRate 0.0924   Epoch: 0   Global Step: 32310   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:21:35,490-Speed 2514.33 samples/sec   Loss 15.0658   LearningRate 0.0924   Epoch: 0   Global Step: 32320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:21:39,388-Speed 2627.81 samples/sec   Loss 15.0022   LearningRate 0.0924   Epoch: 0   Global Step: 32330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:21:43,412-Speed 2544.99 samples/sec   Loss 15.0791   LearningRate 0.0924   Epoch: 0   Global Step: 32340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:21:47,315-Speed 2624.61 samples/sec   Loss 15.0936   LearningRate 0.0924   Epoch: 0   Global Step: 32350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:21:51,205-Speed 2632.89 samples/sec   Loss 14.9402   LearningRate 0.0924   Epoch: 0   Global Step: 32360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:21:55,104-Speed 2626.94 samples/sec   Loss 15.2202   LearningRate 0.0923   Epoch: 0   Global Step: 32370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:21:59,005-Speed 2625.73 samples/sec   Loss 15.0449   LearningRate 0.0923   Epoch: 0   Global Step: 32380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:02,903-Speed 2627.62 samples/sec   Loss 15.1204   LearningRate 0.0923   Epoch: 0   Global Step: 32390   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:06,807-Speed 2623.49 samples/sec   Loss 15.0649   LearningRate 0.0923   Epoch: 0   Global Step: 32400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:10,702-Speed 2630.23 samples/sec   Loss 15.1236   LearningRate 0.0923   Epoch: 0   Global Step: 32410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:14,598-Speed 2628.39 samples/sec   Loss 15.2007   LearningRate 0.0923   Epoch: 0   Global Step: 32420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:18,506-Speed 2621.31 samples/sec   Loss 15.0954   LearningRate 0.0923   Epoch: 0   Global Step: 32430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:22,399-Speed 2631.01 samples/sec   Loss 15.1484   LearningRate 0.0923   Epoch: 0   Global Step: 32440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:26,296-Speed 2628.12 samples/sec   Loss 15.2139   LearningRate 0.0923   Epoch: 0   Global Step: 32450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:30,191-Speed 2629.71 samples/sec   Loss 14.9543   LearningRate 0.0923   Epoch: 0   Global Step: 32460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:34,085-Speed 2630.85 samples/sec   Loss 15.2057   LearningRate 0.0923   Epoch: 0   Global Step: 32470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:37,983-Speed 2627.49 samples/sec   Loss 15.1228   LearningRate 0.0923   Epoch: 0   Global Step: 32480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:41,915-Speed 2604.82 samples/sec   Loss 15.1871   LearningRate 0.0923   Epoch: 0   Global Step: 32490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:45,810-Speed 2629.49 samples/sec   Loss 15.0647   LearningRate 0.0923   Epoch: 0   Global Step: 32500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:49,711-Speed 2625.87 samples/sec   Loss 15.0827   LearningRate 0.0923   Epoch: 0   Global Step: 32510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:53,609-Speed 2627.47 samples/sec   Loss 15.1505   LearningRate 0.0923   Epoch: 0   Global Step: 32520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:22:57,509-Speed 2626.68 samples/sec   Loss 14.9900   LearningRate 0.0923   Epoch: 0   Global Step: 32530   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 23:23:01,408-Speed 2626.79 samples/sec   Loss 15.0144   LearningRate 0.0923   Epoch: 0   Global Step: 32540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:05,322-Speed 2616.75 samples/sec   Loss 15.1522   LearningRate 0.0923   Epoch: 0   Global Step: 32550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:09,220-Speed 2627.76 samples/sec   Loss 15.1154   LearningRate 0.0923   Epoch: 0   Global Step: 32560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:13,115-Speed 2629.32 samples/sec   Loss 14.9483   LearningRate 0.0923   Epoch: 0   Global Step: 32570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:17,020-Speed 2623.13 samples/sec   Loss 14.8703   LearningRate 0.0923   Epoch: 0   Global Step: 32580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:20,918-Speed 2627.83 samples/sec   Loss 15.0316   LearningRate 0.0923   Epoch: 0   Global Step: 32590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:24,813-Speed 2629.51 samples/sec   Loss 14.9651   LearningRate 0.0923   Epoch: 0   Global Step: 32600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:28,712-Speed 2627.36 samples/sec   Loss 15.1530   LearningRate 0.0923   Epoch: 0   Global Step: 32610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:32,620-Speed 2620.11 samples/sec   Loss 15.0747   LearningRate 0.0923   Epoch: 0   Global Step: 32620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:36,524-Speed 2623.81 samples/sec   Loss 14.9736   LearningRate 0.0923   Epoch: 0   Global Step: 32630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:40,407-Speed 2637.81 samples/sec   Loss 15.1331   LearningRate 0.0923   Epoch: 0   Global Step: 32640   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:44,302-Speed 2629.99 samples/sec   Loss 15.1768   LearningRate 0.0923   Epoch: 0   Global Step: 32650   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:48,198-Speed 2629.04 samples/sec   Loss 15.1104   LearningRate 0.0923   Epoch: 0   Global Step: 32660   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:52,092-Speed 2629.93 samples/sec   Loss 14.9886   LearningRate 0.0923   Epoch: 0   Global Step: 32670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:55,987-Speed 2630.02 samples/sec   Loss 14.9102   LearningRate 0.0923   Epoch: 0   Global Step: 32680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:23:59,881-Speed 2630.75 samples/sec   Loss 15.1068   LearningRate 0.0923   Epoch: 0   Global Step: 32690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:03,775-Speed 2629.87 samples/sec   Loss 14.9611   LearningRate 0.0923   Epoch: 0   Global Step: 32700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:07,671-Speed 2628.70 samples/sec   Loss 15.0035   LearningRate 0.0923   Epoch: 0   Global Step: 32710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:11,563-Speed 2631.47 samples/sec   Loss 14.9610   LearningRate 0.0923   Epoch: 0   Global Step: 32720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:15,464-Speed 2625.85 samples/sec   Loss 14.9801   LearningRate 0.0923   Epoch: 0   Global Step: 32730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:19,347-Speed 2638.03 samples/sec   Loss 15.0331   LearningRate 0.0923   Epoch: 0   Global Step: 32740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:23,244-Speed 2628.04 samples/sec   Loss 15.0283   LearningRate 0.0923   Epoch: 0   Global Step: 32750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:27,140-Speed 2629.10 samples/sec   Loss 15.0928   LearningRate 0.0923   Epoch: 0   Global Step: 32760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:31,035-Speed 2629.85 samples/sec   Loss 15.0573   LearningRate 0.0923   Epoch: 0   Global Step: 32770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:34,930-Speed 2629.97 samples/sec   Loss 15.1280   LearningRate 0.0923   Epoch: 0   Global Step: 32780   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:38,832-Speed 2624.68 samples/sec   Loss 15.1040   LearningRate 0.0923   Epoch: 0   Global Step: 32790   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:42,732-Speed 2626.32 samples/sec   Loss 15.0953   LearningRate 0.0922   Epoch: 0   Global Step: 32800   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:46,628-Speed 2628.69 samples/sec   Loss 15.0316   LearningRate 0.0922   Epoch: 0   Global Step: 32810   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:50,524-Speed 2628.96 samples/sec   Loss 15.0691   LearningRate 0.0922   Epoch: 0   Global Step: 32820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:54,420-Speed 2628.66 samples/sec   Loss 15.0287   LearningRate 0.0922   Epoch: 0   Global Step: 32830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:24:58,305-Speed 2636.89 samples/sec   Loss 15.0160   LearningRate 0.0922   Epoch: 0   Global Step: 32840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:02,193-Speed 2634.45 samples/sec   Loss 15.0196   LearningRate 0.0922   Epoch: 0   Global Step: 32850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:06,089-Speed 2628.96 samples/sec   Loss 14.9673   LearningRate 0.0922   Epoch: 0   Global Step: 32860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:09,987-Speed 2627.70 samples/sec   Loss 14.9052   LearningRate 0.0922   Epoch: 0   Global Step: 32870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:13,884-Speed 2627.99 samples/sec   Loss 15.0256   LearningRate 0.0922   Epoch: 0   Global Step: 32880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:17,782-Speed 2627.88 samples/sec   Loss 15.0500   LearningRate 0.0922   Epoch: 0   Global Step: 32890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:21,679-Speed 2627.86 samples/sec   Loss 14.9480   LearningRate 0.0922   Epoch: 0   Global Step: 32900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:25,573-Speed 2630.14 samples/sec   Loss 14.9896   LearningRate 0.0922   Epoch: 0   Global Step: 32910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:29,476-Speed 2624.57 samples/sec   Loss 15.2157   LearningRate 0.0922   Epoch: 0   Global Step: 32920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:33,379-Speed 2623.99 samples/sec   Loss 15.0759   LearningRate 0.0922   Epoch: 0   Global Step: 32930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:37,274-Speed 2629.78 samples/sec   Loss 14.9450   LearningRate 0.0922   Epoch: 0   Global Step: 32940   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 23:25:41,153-Speed 2640.41 samples/sec   Loss 15.0808   LearningRate 0.0922   Epoch: 0   Global Step: 32950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:45,054-Speed 2626.59 samples/sec   Loss 15.2113   LearningRate 0.0922   Epoch: 0   Global Step: 32960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:48,959-Speed 2623.10 samples/sec   Loss 14.9184   LearningRate 0.0922   Epoch: 0   Global Step: 32970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:52,852-Speed 2630.87 samples/sec   Loss 15.0355   LearningRate 0.0922   Epoch: 0   Global Step: 32980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:25:56,746-Speed 2630.32 samples/sec   Loss 15.0409   LearningRate 0.0922   Epoch: 0   Global Step: 32990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:26:00,645-Speed 2626.83 samples/sec   Loss 15.0993   LearningRate 0.0922   Epoch: 0   Global Step: 33000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:26:04,542-Speed 2628.00 samples/sec   Loss 14.9712   LearningRate 0.0922   Epoch: 0   Global Step: 33010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:26:08,443-Speed 2625.16 samples/sec   Loss 14.9424   LearningRate 0.0922   Epoch: 0   Global Step: 33020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:26:12,337-Speed 2630.95 samples/sec   Loss 15.0449   LearningRate 0.0922   Epoch: 0   Global Step: 33030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:26:16,231-Speed 2630.54 samples/sec   Loss 14.9980   LearningRate 0.0922   Epoch: 0   Global Step: 33040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:26:20,255-Speed 2545.26 samples/sec   Loss 14.9854   LearningRate 0.0922   Epoch: 0   Global Step: 33050   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:26:24,168-Speed 2618.11 samples/sec   Loss 14.9975   LearningRate 0.0922   Epoch: 0   Global Step: 33060   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:28,073-Speed 2622.45 samples/sec   Loss 14.9518   LearningRate 0.0922   Epoch: 0   Global Step: 33070   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:31,963-Speed 2633.08 samples/sec   Loss 14.9722   LearningRate 0.0922   Epoch: 0   Global Step: 33080   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:35,855-Speed 2631.14 samples/sec   Loss 15.0667   LearningRate 0.0922   Epoch: 0   Global Step: 33090   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:39,751-Speed 2629.18 samples/sec   Loss 14.9969   LearningRate 0.0922   Epoch: 0   Global Step: 33100   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:43,672-Speed 2611.94 samples/sec   Loss 15.1204   LearningRate 0.0922   Epoch: 0   Global Step: 33110   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:47,574-Speed 2624.87 samples/sec   Loss 15.1921   LearningRate 0.0922   Epoch: 0   Global Step: 33120   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:51,498-Speed 2610.89 samples/sec   Loss 15.0241   LearningRate 0.0922   Epoch: 0   Global Step: 33130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:55,403-Speed 2622.71 samples/sec   Loss 15.0004   LearningRate 0.0922   Epoch: 0   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:26:59,328-Speed 2609.99 samples/sec   Loss 15.0180   LearningRate 0.0922   Epoch: 0   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:27:03,246-Speed 2614.45 samples/sec   Loss 14.9298   LearningRate 0.0922   Epoch: 0   Global Step: 33160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:07,189-Speed 2597.07 samples/sec   Loss 14.9469   LearningRate 0.0922   Epoch: 0   Global Step: 33170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:11,098-Speed 2620.04 samples/sec   Loss 15.0842   LearningRate 0.0922   Epoch: 0   Global Step: 33180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:14,996-Speed 2628.53 samples/sec   Loss 14.8448   LearningRate 0.0922   Epoch: 0   Global Step: 33190   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:18,890-Speed 2630.02 samples/sec   Loss 14.9606   LearningRate 0.0922   Epoch: 0   Global Step: 33200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:22,818-Speed 2607.33 samples/sec   Loss 14.9833   LearningRate 0.0922   Epoch: 0   Global Step: 33210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:26,713-Speed 2630.12 samples/sec   Loss 15.0877   LearningRate 0.0922   Epoch: 0   Global Step: 33220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:30,607-Speed 2630.20 samples/sec   Loss 15.0145   LearningRate 0.0921   Epoch: 0   Global Step: 33230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:34,506-Speed 2626.57 samples/sec   Loss 14.7710   LearningRate 0.0921   Epoch: 0   Global Step: 33240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:38,432-Speed 2609.27 samples/sec   Loss 14.9417   LearningRate 0.0921   Epoch: 0   Global Step: 33250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:42,328-Speed 2629.06 samples/sec   Loss 14.9885   LearningRate 0.0921   Epoch: 0   Global Step: 33260   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 23:27:46,215-Speed 2635.32 samples/sec   Loss 14.8693   LearningRate 0.0921   Epoch: 0   Global Step: 33270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:27:50,098-Speed 2638.05 samples/sec   Loss 15.0154   LearningRate 0.0921   Epoch: 0   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:27:53,998-Speed 2626.42 samples/sec   Loss 14.8720   LearningRate 0.0921   Epoch: 0   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:27:57,894-Speed 2628.90 samples/sec   Loss 14.9415   LearningRate 0.0921   Epoch: 0   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:01,790-Speed 2629.13 samples/sec   Loss 14.9695   LearningRate 0.0921   Epoch: 0   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:05,685-Speed 2629.04 samples/sec   Loss 15.0288   LearningRate 0.0921   Epoch: 0   Global Step: 33320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:09,581-Speed 2629.02 samples/sec   Loss 14.9848   LearningRate 0.0921   Epoch: 0   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:13,478-Speed 2628.56 samples/sec   Loss 14.8323   LearningRate 0.0921   Epoch: 0   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:17,395-Speed 2614.76 samples/sec   Loss 14.9693   LearningRate 0.0921   Epoch: 0   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:21,294-Speed 2627.49 samples/sec   Loss 14.9799   LearningRate 0.0921   Epoch: 0   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:25,200-Speed 2622.14 samples/sec   Loss 14.8973   LearningRate 0.0921   Epoch: 0   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:29,093-Speed 2630.95 samples/sec   Loss 15.0072   LearningRate 0.0921   Epoch: 0   Global Step: 33380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:32,992-Speed 2627.33 samples/sec   Loss 14.9369   LearningRate 0.0921   Epoch: 0   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:36,891-Speed 2626.35 samples/sec   Loss 14.9119   LearningRate 0.0921   Epoch: 0   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:40,792-Speed 2626.59 samples/sec   Loss 14.7909   LearningRate 0.0921   Epoch: 0   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:44,697-Speed 2622.74 samples/sec   Loss 14.9903   LearningRate 0.0921   Epoch: 0   Global Step: 33420   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:48,605-Speed 2621.06 samples/sec   Loss 14.9954   LearningRate 0.0921   Epoch: 0   Global Step: 33430   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:52,503-Speed 2628.25 samples/sec   Loss 15.0026   LearningRate 0.0921   Epoch: 0   Global Step: 33440   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:28:56,409-Speed 2622.13 samples/sec   Loss 14.9368   LearningRate 0.0921   Epoch: 0   Global Step: 33450   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:00,306-Speed 2627.88 samples/sec   Loss 14.9846   LearningRate 0.0921   Epoch: 0   Global Step: 33460   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:04,220-Speed 2616.57 samples/sec   Loss 14.8786   LearningRate 0.0921   Epoch: 0   Global Step: 33470   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:08,111-Speed 2632.74 samples/sec   Loss 15.0684   LearningRate 0.0921   Epoch: 0   Global Step: 33480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:29:12,010-Speed 2626.59 samples/sec   Loss 14.8097   LearningRate 0.0921   Epoch: 0   Global Step: 33490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:29:15,903-Speed 2631.48 samples/sec   Loss 14.9985   LearningRate 0.0921   Epoch: 0   Global Step: 33500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:29:19,778-Speed 2643.91 samples/sec   Loss 14.8545   LearningRate 0.0921   Epoch: 0   Global Step: 33510   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:23,674-Speed 2628.88 samples/sec   Loss 15.0837   LearningRate 0.0921   Epoch: 0   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:27,572-Speed 2627.25 samples/sec   Loss 14.9759   LearningRate 0.0921   Epoch: 0   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:31,467-Speed 2629.62 samples/sec   Loss 14.9340   LearningRate 0.0921   Epoch: 0   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:35,361-Speed 2630.26 samples/sec   Loss 15.0802   LearningRate 0.0921   Epoch: 0   Global Step: 33550   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:39,270-Speed 2620.29 samples/sec   Loss 15.0002   LearningRate 0.0921   Epoch: 0   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:43,171-Speed 2625.82 samples/sec   Loss 15.0520   LearningRate 0.0921   Epoch: 0   Global Step: 33570   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:47,073-Speed 2624.67 samples/sec   Loss 14.9454   LearningRate 0.0921   Epoch: 0   Global Step: 33580   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:50,981-Speed 2620.96 samples/sec   Loss 14.8876   LearningRate 0.0921   Epoch: 0   Global Step: 33590   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:54,878-Speed 2628.51 samples/sec   Loss 14.9521   LearningRate 0.0921   Epoch: 0   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:29:58,775-Speed 2628.32 samples/sec   Loss 14.9875   LearningRate 0.0921   Epoch: 0   Global Step: 33610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:30:02,673-Speed 2627.45 samples/sec   Loss 14.8608   LearningRate 0.0921   Epoch: 0   Global Step: 33620   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:30:06,579-Speed 2621.77 samples/sec   Loss 14.8653   LearningRate 0.0921   Epoch: 0   Global Step: 33630   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:30:10,477-Speed 2627.83 samples/sec   Loss 14.8481   LearningRate 0.0921   Epoch: 0   Global Step: 33640   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:30:14,387-Speed 2619.80 samples/sec   Loss 14.9294   LearningRate 0.0921   Epoch: 0   Global Step: 33650   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:30:18,287-Speed 2626.28 samples/sec   Loss 14.9609   LearningRate 0.0920   Epoch: 0   Global Step: 33660   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:30:22,148-Speed 2653.03 samples/sec   Loss 14.8815   LearningRate 0.0920   Epoch: 0   Global Step: 33670   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:26,053-Speed 2622.84 samples/sec   Loss 15.0118   LearningRate 0.0920   Epoch: 0   Global Step: 33680   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:29,963-Speed 2619.60 samples/sec   Loss 14.9455   LearningRate 0.0920   Epoch: 0   Global Step: 33690   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:33,861-Speed 2627.73 samples/sec   Loss 14.9618   LearningRate 0.0920   Epoch: 0   Global Step: 33700   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:37,763-Speed 2624.49 samples/sec   Loss 14.9164   LearningRate 0.0920   Epoch: 0   Global Step: 33710   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:41,661-Speed 2627.64 samples/sec   Loss 14.9317   LearningRate 0.0920   Epoch: 0   Global Step: 33720   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:45,558-Speed 2628.10 samples/sec   Loss 15.0242   LearningRate 0.0920   Epoch: 0   Global Step: 33730   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:49,455-Speed 2629.02 samples/sec   Loss 14.9043   LearningRate 0.0920   Epoch: 0   Global Step: 33740   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:53,361-Speed 2622.59 samples/sec   Loss 14.9802   LearningRate 0.0920   Epoch: 0   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:30:57,260-Speed 2627.10 samples/sec   Loss 14.9454   LearningRate 0.0920   Epoch: 0   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:31:01,156-Speed 2628.91 samples/sec   Loss 14.9144   LearningRate 0.0920   Epoch: 0   Global Step: 33770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:05,053-Speed 2628.22 samples/sec   Loss 14.9417   LearningRate 0.0920   Epoch: 0   Global Step: 33780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:08,960-Speed 2621.43 samples/sec   Loss 15.0037   LearningRate 0.0920   Epoch: 0   Global Step: 33790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:12,864-Speed 2623.97 samples/sec   Loss 14.8254   LearningRate 0.0920   Epoch: 0   Global Step: 33800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:16,761-Speed 2628.38 samples/sec   Loss 14.7838   LearningRate 0.0920   Epoch: 0   Global Step: 33810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:20,668-Speed 2621.60 samples/sec   Loss 14.9449   LearningRate 0.0920   Epoch: 0   Global Step: 33820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:24,568-Speed 2626.13 samples/sec   Loss 14.8676   LearningRate 0.0920   Epoch: 0   Global Step: 33830   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:28,488-Speed 2613.13 samples/sec   Loss 14.9192   LearningRate 0.0920   Epoch: 0   Global Step: 33840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:32,380-Speed 2631.86 samples/sec   Loss 14.9929   LearningRate 0.0920   Epoch: 0   Global Step: 33850   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:36,282-Speed 2624.48 samples/sec   Loss 14.9312   LearningRate 0.0920   Epoch: 0   Global Step: 33860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:31:40,179-Speed 2628.33 samples/sec   Loss 14.7776   LearningRate 0.0920   Epoch: 0   Global Step: 33870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:31:44,129-Speed 2593.55 samples/sec   Loss 14.8612   LearningRate 0.0920   Epoch: 0   Global Step: 33880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:31:48,019-Speed 2632.67 samples/sec   Loss 14.9181   LearningRate 0.0920   Epoch: 0   Global Step: 33890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:31:51,919-Speed 2627.06 samples/sec   Loss 14.8650   LearningRate 0.0920   Epoch: 0   Global Step: 33900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:31:55,811-Speed 2631.14 samples/sec   Loss 14.9133   LearningRate 0.0920   Epoch: 0   Global Step: 33910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:31:59,687-Speed 2643.29 samples/sec   Loss 14.7749   LearningRate 0.0920   Epoch: 0   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:03,585-Speed 2627.99 samples/sec   Loss 14.9152   LearningRate 0.0920   Epoch: 0   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:07,486-Speed 2625.11 samples/sec   Loss 14.9519   LearningRate 0.0920   Epoch: 0   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:11,403-Speed 2614.86 samples/sec   Loss 15.0259   LearningRate 0.0920   Epoch: 0   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:15,296-Speed 2631.39 samples/sec   Loss 14.9772   LearningRate 0.0920   Epoch: 0   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:19,201-Speed 2622.70 samples/sec   Loss 14.9416   LearningRate 0.0920   Epoch: 0   Global Step: 33970   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:23,094-Speed 2631.02 samples/sec   Loss 14.8620   LearningRate 0.0920   Epoch: 0   Global Step: 33980   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:26,993-Speed 2626.95 samples/sec   Loss 14.8288   LearningRate 0.0920   Epoch: 0   Global Step: 33990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:30,890-Speed 2628.36 samples/sec   Loss 14.8550   LearningRate 0.0920   Epoch: 0   Global Step: 34000   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:34,784-Speed 2630.46 samples/sec   Loss 14.8793   LearningRate 0.0920   Epoch: 0   Global Step: 34010   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:38,673-Speed 2633.29 samples/sec   Loss 14.9172   LearningRate 0.0920   Epoch: 0   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:32:42,548-Speed 2643.53 samples/sec   Loss 14.9520   LearningRate 0.0920   Epoch: 0   Global Step: 34030   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:32:46,438-Speed 2633.58 samples/sec   Loss 14.8393   LearningRate 0.0920   Epoch: 0   Global Step: 34040   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:32:50,336-Speed 2627.43 samples/sec   Loss 14.7983   LearningRate 0.0920   Epoch: 0   Global Step: 34050   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:32:54,248-Speed 2618.56 samples/sec   Loss 14.8458   LearningRate 0.0920   Epoch: 0   Global Step: 34060   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:32:58,135-Speed 2634.90 samples/sec   Loss 14.8353   LearningRate 0.0920   Epoch: 0   Global Step: 34070   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:33:02,033-Speed 2627.51 samples/sec   Loss 14.9244   LearningRate 0.0920   Epoch: 0   Global Step: 34080   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:33:05,931-Speed 2627.82 samples/sec   Loss 14.8203   LearningRate 0.0920   Epoch: 0   Global Step: 34090   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:33:09,824-Speed 2630.97 samples/sec   Loss 14.9738   LearningRate 0.0919   Epoch: 0   Global Step: 34100   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:33:13,714-Speed 2633.12 samples/sec   Loss 14.8653   LearningRate 0.0919   Epoch: 0   Global Step: 34110   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:33:17,607-Speed 2631.40 samples/sec   Loss 14.9405   LearningRate 0.0919   Epoch: 0   Global Step: 34120   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:33:21,508-Speed 2625.10 samples/sec   Loss 14.9299   LearningRate 0.0919   Epoch: 0   Global Step: 34130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:25,411-Speed 2624.83 samples/sec   Loss 14.9675   LearningRate 0.0919   Epoch: 0   Global Step: 34140   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:29,308-Speed 2628.06 samples/sec   Loss 14.7837   LearningRate 0.0919   Epoch: 0   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:33,205-Speed 2628.15 samples/sec   Loss 14.8411   LearningRate 0.0919   Epoch: 0   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:37,269-Speed 2520.17 samples/sec   Loss 14.8631   LearningRate 0.0919   Epoch: 0   Global Step: 34170   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:41,286-Speed 2550.33 samples/sec   Loss 14.8403   LearningRate 0.0919   Epoch: 0   Global Step: 34180   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:45,185-Speed 2627.25 samples/sec   Loss 14.9761   LearningRate 0.0919   Epoch: 0   Global Step: 34190   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:49,082-Speed 2627.90 samples/sec   Loss 14.9360   LearningRate 0.0919   Epoch: 0   Global Step: 34200   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:52,985-Speed 2624.60 samples/sec   Loss 14.8138   LearningRate 0.0919   Epoch: 0   Global Step: 34210   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:33:56,884-Speed 2627.51 samples/sec   Loss 14.8540   LearningRate 0.0919   Epoch: 0   Global Step: 34220   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:34:00,743-Speed 2653.81 samples/sec   Loss 14.9829   LearningRate 0.0919   Epoch: 0   Global Step: 34230   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:04,655-Speed 2618.23 samples/sec   Loss 14.8847   LearningRate 0.0919   Epoch: 0   Global Step: 34240   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:08,547-Speed 2631.73 samples/sec   Loss 14.8845   LearningRate 0.0919   Epoch: 0   Global Step: 34250   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:12,443-Speed 2629.36 samples/sec   Loss 14.8790   LearningRate 0.0919   Epoch: 0   Global Step: 34260   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:16,335-Speed 2631.35 samples/sec   Loss 15.0544   LearningRate 0.0919   Epoch: 0   Global Step: 34270   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:20,234-Speed 2627.18 samples/sec   Loss 14.7535   LearningRate 0.0919   Epoch: 0   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:24,137-Speed 2624.33 samples/sec   Loss 14.9140   LearningRate 0.0919   Epoch: 0   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:28,033-Speed 2629.21 samples/sec   Loss 14.8558   LearningRate 0.0919   Epoch: 0   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:31,946-Speed 2617.44 samples/sec   Loss 14.7232   LearningRate 0.0919   Epoch: 0   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:35,843-Speed 2628.12 samples/sec   Loss 14.8525   LearningRate 0.0919   Epoch: 0   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:34:39,755-Speed 2618.50 samples/sec   Loss 14.8888   LearningRate 0.0919   Epoch: 0   Global Step: 34330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:34:43,653-Speed 2627.49 samples/sec   Loss 14.7720   LearningRate 0.0919   Epoch: 0   Global Step: 34340   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:34:47,545-Speed 2631.88 samples/sec   Loss 14.9586   LearningRate 0.0919   Epoch: 0   Global Step: 34350   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:34:51,437-Speed 2631.92 samples/sec   Loss 14.8251   LearningRate 0.0919   Epoch: 0   Global Step: 34360   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:34:55,329-Speed 2631.61 samples/sec   Loss 14.8306   LearningRate 0.0919   Epoch: 0   Global Step: 34370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:34:59,223-Speed 2630.45 samples/sec   Loss 14.9223   LearningRate 0.0919   Epoch: 0   Global Step: 34380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:35:03,115-Speed 2631.49 samples/sec   Loss 14.9731   LearningRate 0.0919   Epoch: 0   Global Step: 34390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:35:07,002-Speed 2635.10 samples/sec   Loss 14.9957   LearningRate 0.0919   Epoch: 0   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:35:10,887-Speed 2637.05 samples/sec   Loss 15.0298   LearningRate 0.0919   Epoch: 0   Global Step: 34410   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:35:14,755-Speed 2647.93 samples/sec   Loss 15.0642   LearningRate 0.0919   Epoch: 0   Global Step: 34420   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:18,661-Speed 2622.19 samples/sec   Loss 14.6983   LearningRate 0.0919   Epoch: 0   Global Step: 34430   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:22,558-Speed 2628.73 samples/sec   Loss 14.7714   LearningRate 0.0919   Epoch: 0   Global Step: 34440   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:26,482-Speed 2609.91 samples/sec   Loss 14.8592   LearningRate 0.0919   Epoch: 0   Global Step: 34450   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:30,398-Speed 2616.69 samples/sec   Loss 14.9750   LearningRate 0.0919   Epoch: 0   Global Step: 34460   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:34,299-Speed 2625.21 samples/sec   Loss 14.9046   LearningRate 0.0919   Epoch: 0   Global Step: 34470   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:38,202-Speed 2624.88 samples/sec   Loss 14.9236   LearningRate 0.0919   Epoch: 0   Global Step: 34480   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:42,149-Speed 2594.39 samples/sec   Loss 14.9773   LearningRate 0.0919   Epoch: 0   Global Step: 34490   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:46,098-Speed 2594.47 samples/sec   Loss 14.7090   LearningRate 0.0919   Epoch: 0   Global Step: 34500   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:49,999-Speed 2626.04 samples/sec   Loss 14.7299   LearningRate 0.0919   Epoch: 0   Global Step: 34510   Fp16 Grad Scale: 16384   Required: 89 hours
Training: 2022-04-12 23:35:53,935-Speed 2601.74 samples/sec   Loss 15.0252   LearningRate 0.0919   Epoch: 0   Global Step: 34520   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:35:57,828-Speed 2631.30 samples/sec   Loss 14.7789   LearningRate 0.0918   Epoch: 0   Global Step: 34530   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:01,723-Speed 2629.86 samples/sec   Loss 14.9176   LearningRate 0.0918   Epoch: 0   Global Step: 34540   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:05,625-Speed 2624.53 samples/sec   Loss 14.8577   LearningRate 0.0918   Epoch: 0   Global Step: 34550   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:09,523-Speed 2627.70 samples/sec   Loss 14.8301   LearningRate 0.0918   Epoch: 0   Global Step: 34560   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:13,417-Speed 2630.86 samples/sec   Loss 14.9119   LearningRate 0.0918   Epoch: 0   Global Step: 34570   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:17,316-Speed 2626.67 samples/sec   Loss 14.7277   LearningRate 0.0918   Epoch: 0   Global Step: 34580   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:21,211-Speed 2629.62 samples/sec   Loss 14.8982   LearningRate 0.0918   Epoch: 0   Global Step: 34590   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:25,115-Speed 2623.71 samples/sec   Loss 14.8364   LearningRate 0.0918   Epoch: 0   Global Step: 34600   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:29,009-Speed 2630.38 samples/sec   Loss 14.8483   LearningRate 0.0918   Epoch: 0   Global Step: 34610   Fp16 Grad Scale: 32768   Required: 89 hours
Training: 2022-04-12 23:36:32,908-Speed 2627.42 samples/sec   Loss 14.8890   LearningRate 0.0918   Epoch: 0   Global Step: 34620   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:36:36,800-Speed 2631.33 samples/sec   Loss 14.8503   LearningRate 0.0918   Epoch: 0   Global Step: 34630   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:36:40,694-Speed 2629.77 samples/sec   Loss 14.7516   LearningRate 0.0918   Epoch: 0   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:36:44,590-Speed 2630.19 samples/sec   Loss 14.8353   LearningRate 0.0918   Epoch: 0   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:36:48,495-Speed 2622.19 samples/sec   Loss 14.9055   LearningRate 0.0918   Epoch: 0   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:36:52,402-Speed 2621.93 samples/sec   Loss 14.8726   LearningRate 0.0918   Epoch: 0   Global Step: 34670   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:36:56,301-Speed 2626.65 samples/sec   Loss 14.8647   LearningRate 0.0918   Epoch: 0   Global Step: 34680   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:37:00,214-Speed 2617.99 samples/sec   Loss 14.7552   LearningRate 0.0918   Epoch: 0   Global Step: 34690   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:37:04,109-Speed 2629.49 samples/sec   Loss 14.8017   LearningRate 0.0918   Epoch: 0   Global Step: 34700   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:37:08,001-Speed 2631.73 samples/sec   Loss 14.7827   LearningRate 0.0918   Epoch: 0   Global Step: 34710   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:37:11,896-Speed 2629.68 samples/sec   Loss 14.8726   LearningRate 0.0918   Epoch: 0   Global Step: 34720   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:15,790-Speed 2629.95 samples/sec   Loss 14.9195   LearningRate 0.0918   Epoch: 0   Global Step: 34730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:19,690-Speed 2625.88 samples/sec   Loss 14.9369   LearningRate 0.0918   Epoch: 0   Global Step: 34740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:23,583-Speed 2631.64 samples/sec   Loss 14.8872   LearningRate 0.0918   Epoch: 0   Global Step: 34750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:27,477-Speed 2629.97 samples/sec   Loss 14.6997   LearningRate 0.0918   Epoch: 0   Global Step: 34760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:31,373-Speed 2629.24 samples/sec   Loss 14.7657   LearningRate 0.0918   Epoch: 0   Global Step: 34770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:35,269-Speed 2629.07 samples/sec   Loss 14.8521   LearningRate 0.0918   Epoch: 0   Global Step: 34780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:39,161-Speed 2631.78 samples/sec   Loss 14.7675   LearningRate 0.0918   Epoch: 0   Global Step: 34790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:43,189-Speed 2542.68 samples/sec   Loss 14.8349   LearningRate 0.0918   Epoch: 0   Global Step: 34800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:47,087-Speed 2626.93 samples/sec   Loss 14.9033   LearningRate 0.0918   Epoch: 0   Global Step: 34810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:37:50,989-Speed 2625.01 samples/sec   Loss 14.8078   LearningRate 0.0918   Epoch: 0   Global Step: 34820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:37:54,889-Speed 2626.49 samples/sec   Loss 14.7687   LearningRate 0.0918   Epoch: 0   Global Step: 34830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:37:58,794-Speed 2622.81 samples/sec   Loss 14.9143   LearningRate 0.0918   Epoch: 0   Global Step: 34840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:02,697-Speed 2624.04 samples/sec   Loss 14.7909   LearningRate 0.0918   Epoch: 0   Global Step: 34850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:06,597-Speed 2626.17 samples/sec   Loss 14.8241   LearningRate 0.0918   Epoch: 0   Global Step: 34860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:10,492-Speed 2629.37 samples/sec   Loss 14.8256   LearningRate 0.0918   Epoch: 0   Global Step: 34870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:14,406-Speed 2617.80 samples/sec   Loss 14.8818   LearningRate 0.0918   Epoch: 0   Global Step: 34880   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:18,323-Speed 2614.88 samples/sec   Loss 14.8138   LearningRate 0.0918   Epoch: 0   Global Step: 34890   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:22,232-Speed 2620.08 samples/sec   Loss 14.8169   LearningRate 0.0918   Epoch: 0   Global Step: 34900   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:26,180-Speed 2594.25 samples/sec   Loss 14.7869   LearningRate 0.0918   Epoch: 0   Global Step: 34910   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:30,064-Speed 2637.60 samples/sec   Loss 14.8165   LearningRate 0.0918   Epoch: 0   Global Step: 34920   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:33,968-Speed 2623.22 samples/sec   Loss 14.8856   LearningRate 0.0918   Epoch: 0   Global Step: 34930   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:37,869-Speed 2626.02 samples/sec   Loss 14.8995   LearningRate 0.0918   Epoch: 0   Global Step: 34940   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:41,763-Speed 2630.74 samples/sec   Loss 14.8872   LearningRate 0.0918   Epoch: 0   Global Step: 34950   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:45,665-Speed 2624.80 samples/sec   Loss 14.8836   LearningRate 0.0917   Epoch: 0   Global Step: 34960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:49,562-Speed 2628.03 samples/sec   Loss 14.8023   LearningRate 0.0917   Epoch: 0   Global Step: 34970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:53,457-Speed 2629.97 samples/sec   Loss 14.6967   LearningRate 0.0917   Epoch: 0   Global Step: 34980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:38:57,364-Speed 2621.70 samples/sec   Loss 14.9274   LearningRate 0.0917   Epoch: 0   Global Step: 34990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:01,267-Speed 2623.95 samples/sec   Loss 14.8785   LearningRate 0.0917   Epoch: 0   Global Step: 35000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:05,169-Speed 2624.91 samples/sec   Loss 14.7803   LearningRate 0.0917   Epoch: 0   Global Step: 35010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:09,066-Speed 2628.62 samples/sec   Loss 14.7869   LearningRate 0.0917   Epoch: 0   Global Step: 35020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:12,981-Speed 2616.25 samples/sec   Loss 14.9270   LearningRate 0.0917   Epoch: 0   Global Step: 35030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:16,902-Speed 2612.28 samples/sec   Loss 14.8249   LearningRate 0.0917   Epoch: 0   Global Step: 35040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:20,828-Speed 2608.51 samples/sec   Loss 14.7531   LearningRate 0.0917   Epoch: 0   Global Step: 35050   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:24,725-Speed 2629.00 samples/sec   Loss 14.8089   LearningRate 0.0917   Epoch: 0   Global Step: 35060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:28,627-Speed 2625.21 samples/sec   Loss 14.7624   LearningRate 0.0917   Epoch: 0   Global Step: 35070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:32,541-Speed 2616.12 samples/sec   Loss 14.5531   LearningRate 0.0917   Epoch: 0   Global Step: 35080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:36,435-Speed 2630.54 samples/sec   Loss 14.8272   LearningRate 0.0917   Epoch: 0   Global Step: 35090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:40,341-Speed 2622.50 samples/sec   Loss 14.7491   LearningRate 0.0917   Epoch: 0   Global Step: 35100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:44,246-Speed 2622.98 samples/sec   Loss 14.7429   LearningRate 0.0917   Epoch: 0   Global Step: 35110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:48,197-Speed 2592.44 samples/sec   Loss 14.7747   LearningRate 0.0917   Epoch: 0   Global Step: 35120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:52,129-Speed 2604.79 samples/sec   Loss 14.8959   LearningRate 0.0917   Epoch: 0   Global Step: 35130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:56,023-Speed 2630.39 samples/sec   Loss 14.9822   LearningRate 0.0917   Epoch: 0   Global Step: 35140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:39:59,921-Speed 2627.62 samples/sec   Loss 14.8307   LearningRate 0.0917   Epoch: 0   Global Step: 35150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:03,821-Speed 2626.09 samples/sec   Loss 14.7799   LearningRate 0.0917   Epoch: 0   Global Step: 35160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:07,730-Speed 2620.29 samples/sec   Loss 14.8384   LearningRate 0.0917   Epoch: 0   Global Step: 35170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:11,629-Speed 2627.60 samples/sec   Loss 14.8196   LearningRate 0.0917   Epoch: 0   Global Step: 35180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:15,529-Speed 2626.43 samples/sec   Loss 14.6865   LearningRate 0.0917   Epoch: 0   Global Step: 35190   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:19,429-Speed 2625.82 samples/sec   Loss 14.6684   LearningRate 0.0917   Epoch: 0   Global Step: 35200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:23,326-Speed 2629.06 samples/sec   Loss 14.5941   LearningRate 0.0917   Epoch: 0   Global Step: 35210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:27,206-Speed 2639.19 samples/sec   Loss 14.6704   LearningRate 0.0917   Epoch: 0   Global Step: 35220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:31,104-Speed 2627.76 samples/sec   Loss 14.9183   LearningRate 0.0917   Epoch: 0   Global Step: 35230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:34,999-Speed 2629.82 samples/sec   Loss 14.8450   LearningRate 0.0917   Epoch: 0   Global Step: 35240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:38,894-Speed 2629.72 samples/sec   Loss 14.7977   LearningRate 0.0917   Epoch: 0   Global Step: 35250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:42,787-Speed 2631.16 samples/sec   Loss 14.7694   LearningRate 0.0917   Epoch: 0   Global Step: 35260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:46,679-Speed 2632.52 samples/sec   Loss 14.8131   LearningRate 0.0917   Epoch: 0   Global Step: 35270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:50,578-Speed 2626.46 samples/sec   Loss 14.7898   LearningRate 0.0917   Epoch: 0   Global Step: 35280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:54,472-Speed 2630.36 samples/sec   Loss 14.8016   LearningRate 0.0917   Epoch: 0   Global Step: 35290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:40:58,367-Speed 2629.63 samples/sec   Loss 14.7053   LearningRate 0.0917   Epoch: 0   Global Step: 35300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:02,264-Speed 2628.55 samples/sec   Loss 14.7876   LearningRate 0.0917   Epoch: 0   Global Step: 35310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:06,163-Speed 2626.86 samples/sec   Loss 14.7540   LearningRate 0.0917   Epoch: 0   Global Step: 35320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:10,060-Speed 2628.17 samples/sec   Loss 14.8242   LearningRate 0.0917   Epoch: 0   Global Step: 35330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:13,959-Speed 2627.07 samples/sec   Loss 14.7867   LearningRate 0.0917   Epoch: 0   Global Step: 35340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:17,863-Speed 2623.27 samples/sec   Loss 14.8764   LearningRate 0.0917   Epoch: 0   Global Step: 35350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:21,766-Speed 2624.86 samples/sec   Loss 14.6531   LearningRate 0.0917   Epoch: 0   Global Step: 35360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:25,664-Speed 2627.77 samples/sec   Loss 14.8244   LearningRate 0.0917   Epoch: 0   Global Step: 35370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:29,588-Speed 2610.13 samples/sec   Loss 14.8147   LearningRate 0.0917   Epoch: 0   Global Step: 35380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:33,532-Speed 2597.20 samples/sec   Loss 14.6142   LearningRate 0.0916   Epoch: 0   Global Step: 35390   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:37,440-Speed 2620.79 samples/sec   Loss 14.7889   LearningRate 0.0916   Epoch: 0   Global Step: 35400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:41,351-Speed 2618.71 samples/sec   Loss 14.7567   LearningRate 0.0916   Epoch: 0   Global Step: 35410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:45,264-Speed 2617.70 samples/sec   Loss 14.6779   LearningRate 0.0916   Epoch: 0   Global Step: 35420   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-12 23:41:49,160-Speed 2629.14 samples/sec   Loss 14.5487   LearningRate 0.0916   Epoch: 0   Global Step: 35430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:53,072-Speed 2618.31 samples/sec   Loss 14.7968   LearningRate 0.0916   Epoch: 0   Global Step: 35440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:41:56,991-Speed 2613.52 samples/sec   Loss 14.7568   LearningRate 0.0916   Epoch: 0   Global Step: 35450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:00,918-Speed 2608.84 samples/sec   Loss 14.7958   LearningRate 0.0916   Epoch: 0   Global Step: 35460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:04,828-Speed 2619.47 samples/sec   Loss 14.7047   LearningRate 0.0916   Epoch: 0   Global Step: 35470   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:08,765-Speed 2601.09 samples/sec   Loss 14.7779   LearningRate 0.0916   Epoch: 0   Global Step: 35480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:12,687-Speed 2611.80 samples/sec   Loss 14.6305   LearningRate 0.0916   Epoch: 0   Global Step: 35490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:16,606-Speed 2614.41 samples/sec   Loss 14.7546   LearningRate 0.0916   Epoch: 0   Global Step: 35500   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:20,513-Speed 2621.29 samples/sec   Loss 14.7602   LearningRate 0.0916   Epoch: 0   Global Step: 35510   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:24,427-Speed 2617.08 samples/sec   Loss 15.0261   LearningRate 0.0916   Epoch: 0   Global Step: 35520   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:28,322-Speed 2629.78 samples/sec   Loss 14.8546   LearningRate 0.0916   Epoch: 0   Global Step: 35530   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:32,217-Speed 2629.86 samples/sec   Loss 14.6530   LearningRate 0.0916   Epoch: 0   Global Step: 35540   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:36,121-Speed 2623.57 samples/sec   Loss 14.9561   LearningRate 0.0916   Epoch: 0   Global Step: 35550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:40,015-Speed 2630.26 samples/sec   Loss 14.8330   LearningRate 0.0916   Epoch: 0   Global Step: 35560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:43,910-Speed 2629.69 samples/sec   Loss 14.8982   LearningRate 0.0916   Epoch: 0   Global Step: 35570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:47,804-Speed 2630.09 samples/sec   Loss 14.6497   LearningRate 0.0916   Epoch: 0   Global Step: 35580   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:51,704-Speed 2626.80 samples/sec   Loss 14.8185   LearningRate 0.0916   Epoch: 0   Global Step: 35590   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:55,600-Speed 2628.63 samples/sec   Loss 14.8001   LearningRate 0.0916   Epoch: 0   Global Step: 35600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:42:59,500-Speed 2626.59 samples/sec   Loss 14.7349   LearningRate 0.0916   Epoch: 0   Global Step: 35610   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:43:03,378-Speed 2641.09 samples/sec   Loss 14.7020   LearningRate 0.0916   Epoch: 0   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:07,279-Speed 2625.24 samples/sec   Loss 14.6863   LearningRate 0.0916   Epoch: 0   Global Step: 35630   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:11,175-Speed 2629.23 samples/sec   Loss 14.8074   LearningRate 0.0916   Epoch: 0   Global Step: 35640   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:15,099-Speed 2610.23 samples/sec   Loss 14.8169   LearningRate 0.0916   Epoch: 0   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:19,103-Speed 2558.38 samples/sec   Loss 14.8387   LearningRate 0.0916   Epoch: 0   Global Step: 35660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:23,008-Speed 2623.18 samples/sec   Loss 14.7620   LearningRate 0.0916   Epoch: 0   Global Step: 35670   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:26,909-Speed 2625.10 samples/sec   Loss 14.6783   LearningRate 0.0916   Epoch: 0   Global Step: 35680   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:30,807-Speed 2627.77 samples/sec   Loss 14.5051   LearningRate 0.0916   Epoch: 0   Global Step: 35690   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:34,706-Speed 2627.11 samples/sec   Loss 14.8211   LearningRate 0.0916   Epoch: 0   Global Step: 35700   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:38,608-Speed 2625.19 samples/sec   Loss 14.7765   LearningRate 0.0916   Epoch: 0   Global Step: 35710   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:43:42,516-Speed 2620.49 samples/sec   Loss 14.7954   LearningRate 0.0916   Epoch: 0   Global Step: 35720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:43:46,418-Speed 2625.03 samples/sec   Loss 14.6334   LearningRate 0.0916   Epoch: 0   Global Step: 35730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:43:50,313-Speed 2629.51 samples/sec   Loss 14.6375   LearningRate 0.0916   Epoch: 0   Global Step: 35740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:43:54,213-Speed 2626.55 samples/sec   Loss 14.9458   LearningRate 0.0916   Epoch: 0   Global Step: 35750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:43:58,110-Speed 2628.60 samples/sec   Loss 14.7191   LearningRate 0.0916   Epoch: 0   Global Step: 35760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:44:02,031-Speed 2612.32 samples/sec   Loss 14.9241   LearningRate 0.0916   Epoch: 0   Global Step: 35770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:44:05,954-Speed 2610.90 samples/sec   Loss 14.6070   LearningRate 0.0916   Epoch: 0   Global Step: 35780   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:44:09,851-Speed 2627.61 samples/sec   Loss 14.5872   LearningRate 0.0916   Epoch: 0   Global Step: 35790   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:44:13,731-Speed 2639.67 samples/sec   Loss 14.7449   LearningRate 0.0916   Epoch: 0   Global Step: 35800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:17,633-Speed 2625.11 samples/sec   Loss 14.7854   LearningRate 0.0916   Epoch: 0   Global Step: 35810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:21,529-Speed 2629.40 samples/sec   Loss 14.5033   LearningRate 0.0916   Epoch: 0   Global Step: 35820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:25,424-Speed 2630.12 samples/sec   Loss 14.8006   LearningRate 0.0915   Epoch: 0   Global Step: 35830   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:29,320-Speed 2628.46 samples/sec   Loss 14.7711   LearningRate 0.0915   Epoch: 0   Global Step: 35840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:33,217-Speed 2628.58 samples/sec   Loss 14.7637   LearningRate 0.0915   Epoch: 0   Global Step: 35850   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:37,111-Speed 2629.91 samples/sec   Loss 14.7950   LearningRate 0.0915   Epoch: 0   Global Step: 35860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:41,006-Speed 2630.37 samples/sec   Loss 14.7557   LearningRate 0.0915   Epoch: 0   Global Step: 35870   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:44,904-Speed 2627.35 samples/sec   Loss 14.8373   LearningRate 0.0915   Epoch: 0   Global Step: 35880   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:48,798-Speed 2630.65 samples/sec   Loss 14.4993   LearningRate 0.0915   Epoch: 0   Global Step: 35890   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:52,676-Speed 2640.97 samples/sec   Loss 14.7549   LearningRate 0.0915   Epoch: 0   Global Step: 35900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:44:56,574-Speed 2627.67 samples/sec   Loss 14.7415   LearningRate 0.0915   Epoch: 0   Global Step: 35910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:00,467-Speed 2631.41 samples/sec   Loss 14.8551   LearningRate 0.0915   Epoch: 0   Global Step: 35920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:04,360-Speed 2630.31 samples/sec   Loss 14.6481   LearningRate 0.0915   Epoch: 0   Global Step: 35930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:08,253-Speed 2630.81 samples/sec   Loss 14.8105   LearningRate 0.0915   Epoch: 0   Global Step: 35940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:12,148-Speed 2630.00 samples/sec   Loss 14.7318   LearningRate 0.0915   Epoch: 0   Global Step: 35950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:16,046-Speed 2628.06 samples/sec   Loss 14.7945   LearningRate 0.0915   Epoch: 0   Global Step: 35960   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:19,941-Speed 2629.31 samples/sec   Loss 14.7660   LearningRate 0.0915   Epoch: 0   Global Step: 35970   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:23,845-Speed 2623.73 samples/sec   Loss 14.6821   LearningRate 0.0915   Epoch: 0   Global Step: 35980   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:27,748-Speed 2624.40 samples/sec   Loss 14.6974   LearningRate 0.0915   Epoch: 0   Global Step: 35990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:45:31,650-Speed 2625.01 samples/sec   Loss 14.6140   LearningRate 0.0915   Epoch: 0   Global Step: 36000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:45:35,510-Speed 2653.71 samples/sec   Loss 14.6313   LearningRate 0.0915   Epoch: 0   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:45:39,402-Speed 2631.65 samples/sec   Loss 14.5851   LearningRate 0.0915   Epoch: 0   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:45:43,296-Speed 2630.02 samples/sec   Loss 14.7099   LearningRate 0.0915   Epoch: 0   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:45:47,189-Speed 2630.81 samples/sec   Loss 14.7882   LearningRate 0.0915   Epoch: 0   Global Step: 36040   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:45:51,086-Speed 2628.58 samples/sec   Loss 14.8086   LearningRate 0.0915   Epoch: 0   Global Step: 36050   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:45:54,984-Speed 2627.45 samples/sec   Loss 14.8540   LearningRate 0.0915   Epoch: 0   Global Step: 36060   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:45:58,879-Speed 2630.04 samples/sec   Loss 14.6625   LearningRate 0.0915   Epoch: 0   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:02,772-Speed 2630.79 samples/sec   Loss 14.8676   LearningRate 0.0915   Epoch: 0   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:06,663-Speed 2632.15 samples/sec   Loss 14.6883   LearningRate 0.0915   Epoch: 0   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:10,555-Speed 2631.65 samples/sec   Loss 14.8369   LearningRate 0.0915   Epoch: 0   Global Step: 36100   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:14,447-Speed 2632.15 samples/sec   Loss 14.7159   LearningRate 0.0915   Epoch: 0   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:46:18,346-Speed 2626.85 samples/sec   Loss 14.7730   LearningRate 0.0915   Epoch: 0   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:46:22,242-Speed 2628.88 samples/sec   Loss 14.6378   LearningRate 0.0915   Epoch: 0   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:46:26,130-Speed 2634.58 samples/sec   Loss 14.7299   LearningRate 0.0915   Epoch: 0   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:30,050-Speed 2612.71 samples/sec   Loss 14.7922   LearningRate 0.0915   Epoch: 0   Global Step: 36150   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:33,961-Speed 2619.36 samples/sec   Loss 14.7076   LearningRate 0.0915   Epoch: 0   Global Step: 36160   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:37,859-Speed 2627.38 samples/sec   Loss 14.7891   LearningRate 0.0915   Epoch: 0   Global Step: 36170   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:41,751-Speed 2631.59 samples/sec   Loss 14.7107   LearningRate 0.0915   Epoch: 0   Global Step: 36180   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:45,644-Speed 2631.03 samples/sec   Loss 14.6643   LearningRate 0.0915   Epoch: 0   Global Step: 36190   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:49,537-Speed 2630.43 samples/sec   Loss 14.7183   LearningRate 0.0915   Epoch: 0   Global Step: 36200   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:53,439-Speed 2625.19 samples/sec   Loss 14.7547   LearningRate 0.0915   Epoch: 0   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:46:57,344-Speed 2622.89 samples/sec   Loss 14.7637   LearningRate 0.0915   Epoch: 0   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:47:01,244-Speed 2626.23 samples/sec   Loss 14.5793   LearningRate 0.0915   Epoch: 0   Global Step: 36230   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-12 23:47:05,141-Speed 2628.58 samples/sec   Loss 14.6168   LearningRate 0.0915   Epoch: 0   Global Step: 36240   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:09,033-Speed 2631.26 samples/sec   Loss 14.6032   LearningRate 0.0915   Epoch: 0   Global Step: 36250   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:12,927-Speed 2630.26 samples/sec   Loss 14.8354   LearningRate 0.0914   Epoch: 0   Global Step: 36260   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:16,821-Speed 2630.73 samples/sec   Loss 14.7463   LearningRate 0.0914   Epoch: 0   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:20,716-Speed 2629.46 samples/sec   Loss 14.7268   LearningRate 0.0914   Epoch: 0   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:24,614-Speed 2627.68 samples/sec   Loss 14.5731   LearningRate 0.0914   Epoch: 0   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:28,514-Speed 2626.12 samples/sec   Loss 14.7121   LearningRate 0.0914   Epoch: 0   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:32,411-Speed 2628.31 samples/sec   Loss 14.7462   LearningRate 0.0914   Epoch: 0   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:36,309-Speed 2627.63 samples/sec   Loss 14.7273   LearningRate 0.0914   Epoch: 0   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:40,205-Speed 2628.99 samples/sec   Loss 14.8089   LearningRate 0.0914   Epoch: 0   Global Step: 36330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:47:44,100-Speed 2629.69 samples/sec   Loss 14.6989   LearningRate 0.0914   Epoch: 0   Global Step: 36340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:47:47,997-Speed 2628.26 samples/sec   Loss 14.6655   LearningRate 0.0914   Epoch: 0   Global Step: 36350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:47:51,899-Speed 2625.55 samples/sec   Loss 14.6791   LearningRate 0.0914   Epoch: 0   Global Step: 36360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:47:55,800-Speed 2625.26 samples/sec   Loss 14.7695   LearningRate 0.0914   Epoch: 0   Global Step: 36370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:47:59,705-Speed 2623.06 samples/sec   Loss 14.6149   LearningRate 0.0914   Epoch: 0   Global Step: 36380   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:48:03,636-Speed 2605.51 samples/sec   Loss 14.6726   LearningRate 0.0914   Epoch: 0   Global Step: 36390   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:48:07,562-Speed 2608.70 samples/sec   Loss 14.5970   LearningRate 0.0914   Epoch: 0   Global Step: 36400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:48:11,478-Speed 2615.68 samples/sec   Loss 14.3835   LearningRate 0.0914   Epoch: 0   Global Step: 36410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:48:15,387-Speed 2620.75 samples/sec   Loss 14.6430   LearningRate 0.0914   Epoch: 0   Global Step: 36420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:48:19,289-Speed 2624.49 samples/sec   Loss 14.7175   LearningRate 0.0914   Epoch: 0   Global Step: 36430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:48:23,168-Speed 2640.37 samples/sec   Loss 14.6186   LearningRate 0.0914   Epoch: 0   Global Step: 36440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:48:27,083-Speed 2616.78 samples/sec   Loss 14.8224   LearningRate 0.0914   Epoch: 0   Global Step: 36450   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:30,976-Speed 2631.05 samples/sec   Loss 14.6203   LearningRate 0.0914   Epoch: 0   Global Step: 36460   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:34,877-Speed 2626.31 samples/sec   Loss 14.7239   LearningRate 0.0914   Epoch: 0   Global Step: 36470   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:38,773-Speed 2628.74 samples/sec   Loss 14.5686   LearningRate 0.0914   Epoch: 0   Global Step: 36480   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:42,668-Speed 2629.56 samples/sec   Loss 14.6102   LearningRate 0.0914   Epoch: 0   Global Step: 36490   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:46,567-Speed 2627.08 samples/sec   Loss 14.7176   LearningRate 0.0914   Epoch: 0   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:50,470-Speed 2624.21 samples/sec   Loss 14.7307   LearningRate 0.0914   Epoch: 0   Global Step: 36510   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:54,364-Speed 2629.94 samples/sec   Loss 14.7969   LearningRate 0.0914   Epoch: 0   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:48:58,259-Speed 2629.57 samples/sec   Loss 14.5990   LearningRate 0.0914   Epoch: 0   Global Step: 36530   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:02,153-Speed 2630.79 samples/sec   Loss 14.6973   LearningRate 0.0914   Epoch: 0   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:06,048-Speed 2630.13 samples/sec   Loss 14.7184   LearningRate 0.0914   Epoch: 0   Global Step: 36550   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:49:09,944-Speed 2628.44 samples/sec   Loss 14.7486   LearningRate 0.0914   Epoch: 0   Global Step: 36560   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:49:13,827-Speed 2637.43 samples/sec   Loss 14.6947   LearningRate 0.0914   Epoch: 0   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:17,721-Speed 2630.30 samples/sec   Loss 14.6734   LearningRate 0.0914   Epoch: 0   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:21,628-Speed 2621.94 samples/sec   Loss 14.6767   LearningRate 0.0914   Epoch: 0   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:25,523-Speed 2629.52 samples/sec   Loss 14.6934   LearningRate 0.0914   Epoch: 0   Global Step: 36600   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:29,416-Speed 2631.52 samples/sec   Loss 14.6441   LearningRate 0.0914   Epoch: 0   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:33,312-Speed 2628.97 samples/sec   Loss 14.6468   LearningRate 0.0914   Epoch: 0   Global Step: 36620   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:37,209-Speed 2628.28 samples/sec   Loss 14.7152   LearningRate 0.0914   Epoch: 0   Global Step: 36630   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:41,112-Speed 2624.33 samples/sec   Loss 14.5896   LearningRate 0.0914   Epoch: 0   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:45,001-Speed 2633.27 samples/sec   Loss 14.5754   LearningRate 0.0914   Epoch: 0   Global Step: 36650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:48,904-Speed 2624.15 samples/sec   Loss 14.7323   LearningRate 0.0914   Epoch: 0   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:49:52,798-Speed 2631.16 samples/sec   Loss 14.7414   LearningRate 0.0914   Epoch: 0   Global Step: 36670   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:49:56,697-Speed 2627.28 samples/sec   Loss 14.6607   LearningRate 0.0914   Epoch: 0   Global Step: 36680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:50:00,606-Speed 2620.10 samples/sec   Loss 14.6406   LearningRate 0.0914   Epoch: 0   Global Step: 36690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:50:04,504-Speed 2627.54 samples/sec   Loss 14.5548   LearningRate 0.0913   Epoch: 0   Global Step: 36700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:50:08,407-Speed 2624.55 samples/sec   Loss 14.7192   LearningRate 0.0913   Epoch: 0   Global Step: 36710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:50:12,289-Speed 2638.08 samples/sec   Loss 14.7613   LearningRate 0.0913   Epoch: 0   Global Step: 36720   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:16,182-Speed 2631.57 samples/sec   Loss 14.5456   LearningRate 0.0913   Epoch: 0   Global Step: 36730   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:20,076-Speed 2630.26 samples/sec   Loss 14.7717   LearningRate 0.0913   Epoch: 0   Global Step: 36740   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:23,975-Speed 2627.26 samples/sec   Loss 14.7567   LearningRate 0.0913   Epoch: 0   Global Step: 36750   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:27,870-Speed 2629.91 samples/sec   Loss 14.4593   LearningRate 0.0913   Epoch: 0   Global Step: 36760   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:31,777-Speed 2621.18 samples/sec   Loss 14.6775   LearningRate 0.0913   Epoch: 0   Global Step: 36770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:35,690-Speed 2618.06 samples/sec   Loss 14.6185   LearningRate 0.0913   Epoch: 0   Global Step: 36780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:39,584-Speed 2629.83 samples/sec   Loss 14.6031   LearningRate 0.0913   Epoch: 0   Global Step: 36790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:43,485-Speed 2625.90 samples/sec   Loss 14.7775   LearningRate 0.0913   Epoch: 0   Global Step: 36800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:47,393-Speed 2621.23 samples/sec   Loss 14.5796   LearningRate 0.0913   Epoch: 0   Global Step: 36810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:50:51,275-Speed 2638.15 samples/sec   Loss 14.6611   LearningRate 0.0913   Epoch: 0   Global Step: 36820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:50:55,172-Speed 2628.94 samples/sec   Loss 14.6382   LearningRate 0.0913   Epoch: 0   Global Step: 36830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:50:59,070-Speed 2627.44 samples/sec   Loss 14.6532   LearningRate 0.0913   Epoch: 0   Global Step: 36840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:51:02,978-Speed 2620.43 samples/sec   Loss 14.5723   LearningRate 0.0913   Epoch: 0   Global Step: 36850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:51:06,865-Speed 2634.72 samples/sec   Loss 14.7697   LearningRate 0.0913   Epoch: 0   Global Step: 36860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:10,772-Speed 2622.37 samples/sec   Loss 14.5479   LearningRate 0.0913   Epoch: 0   Global Step: 36870   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:14,672-Speed 2625.91 samples/sec   Loss 14.6375   LearningRate 0.0913   Epoch: 0   Global Step: 36880   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:18,580-Speed 2620.91 samples/sec   Loss 14.7395   LearningRate 0.0913   Epoch: 0   Global Step: 36890   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:22,477-Speed 2628.20 samples/sec   Loss 14.7585   LearningRate 0.0913   Epoch: 0   Global Step: 36900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:26,376-Speed 2627.25 samples/sec   Loss 14.6799   LearningRate 0.0913   Epoch: 0   Global Step: 36910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:30,277-Speed 2625.35 samples/sec   Loss 14.8267   LearningRate 0.0913   Epoch: 0   Global Step: 36920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:34,178-Speed 2625.52 samples/sec   Loss 14.7566   LearningRate 0.0913   Epoch: 0   Global Step: 36930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:38,117-Speed 2600.49 samples/sec   Loss 14.4812   LearningRate 0.0913   Epoch: 0   Global Step: 36940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:42,052-Speed 2602.25 samples/sec   Loss 14.4902   LearningRate 0.0913   Epoch: 0   Global Step: 36950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:51:45,950-Speed 2628.25 samples/sec   Loss 14.6866   LearningRate 0.0913   Epoch: 0   Global Step: 36960   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:51:49,848-Speed 2627.51 samples/sec   Loss 14.6051   LearningRate 0.0913   Epoch: 0   Global Step: 36970   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:51:53,749-Speed 2626.09 samples/sec   Loss 14.5708   LearningRate 0.0913   Epoch: 0   Global Step: 36980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:51:57,659-Speed 2619.68 samples/sec   Loss 14.6013   LearningRate 0.0913   Epoch: 0   Global Step: 36990   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:52:01,558-Speed 2627.34 samples/sec   Loss 14.6940   LearningRate 0.0913   Epoch: 0   Global Step: 37000   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:52:05,458-Speed 2626.03 samples/sec   Loss 14.5649   LearningRate 0.0913   Epoch: 0   Global Step: 37010   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:52:09,354-Speed 2628.63 samples/sec   Loss 14.6485   LearningRate 0.0913   Epoch: 0   Global Step: 37020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:52:13,278-Speed 2610.24 samples/sec   Loss 14.6138   LearningRate 0.0913   Epoch: 0   Global Step: 37030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:52:17,180-Speed 2625.25 samples/sec   Loss 14.7111   LearningRate 0.0913   Epoch: 0   Global Step: 37040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:52:21,063-Speed 2638.01 samples/sec   Loss 14.5790   LearningRate 0.0913   Epoch: 0   Global Step: 37050   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:24,963-Speed 2626.50 samples/sec   Loss 14.5320   LearningRate 0.0913   Epoch: 0   Global Step: 37060   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:28,873-Speed 2619.17 samples/sec   Loss 14.6542   LearningRate 0.0913   Epoch: 0   Global Step: 37070   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:32,772-Speed 2627.34 samples/sec   Loss 14.6862   LearningRate 0.0913   Epoch: 0   Global Step: 37080   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:36,682-Speed 2619.30 samples/sec   Loss 14.6376   LearningRate 0.0913   Epoch: 0   Global Step: 37090   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:40,584-Speed 2625.19 samples/sec   Loss 14.6539   LearningRate 0.0913   Epoch: 0   Global Step: 37100   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:44,495-Speed 2618.88 samples/sec   Loss 14.6058   LearningRate 0.0913   Epoch: 0   Global Step: 37110   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:48,389-Speed 2629.97 samples/sec   Loss 14.7842   LearningRate 0.0913   Epoch: 0   Global Step: 37120   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:52,293-Speed 2623.60 samples/sec   Loss 14.6923   LearningRate 0.0912   Epoch: 0   Global Step: 37130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:52:56,193-Speed 2626.01 samples/sec   Loss 14.5732   LearningRate 0.0912   Epoch: 0   Global Step: 37140   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:00,093-Speed 2626.85 samples/sec   Loss 14.5779   LearningRate 0.0912   Epoch: 0   Global Step: 37150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:53:04,000-Speed 2621.24 samples/sec   Loss 14.5998   LearningRate 0.0912   Epoch: 0   Global Step: 37160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:53:07,888-Speed 2634.55 samples/sec   Loss 14.7481   LearningRate 0.0912   Epoch: 0   Global Step: 37170   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:11,788-Speed 2625.87 samples/sec   Loss 14.5585   LearningRate 0.0912   Epoch: 0   Global Step: 37180   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:15,707-Speed 2614.28 samples/sec   Loss 14.7085   LearningRate 0.0912   Epoch: 0   Global Step: 37190   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:19,604-Speed 2627.86 samples/sec   Loss 14.5995   LearningRate 0.0912   Epoch: 0   Global Step: 37200   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:23,510-Speed 2622.91 samples/sec   Loss 14.6238   LearningRate 0.0912   Epoch: 0   Global Step: 37210   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:27,403-Speed 2631.02 samples/sec   Loss 14.4981   LearningRate 0.0912   Epoch: 0   Global Step: 37220   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:31,300-Speed 2628.26 samples/sec   Loss 14.9110   LearningRate 0.0912   Epoch: 0   Global Step: 37230   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:35,207-Speed 2621.99 samples/sec   Loss 14.5906   LearningRate 0.0912   Epoch: 0   Global Step: 37240   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:39,103-Speed 2628.37 samples/sec   Loss 14.6834   LearningRate 0.0912   Epoch: 0   Global Step: 37250   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:43,002-Speed 2626.98 samples/sec   Loss 14.5264   LearningRate 0.0912   Epoch: 0   Global Step: 37260   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:53:46,912-Speed 2620.26 samples/sec   Loss 14.5402   LearningRate 0.0912   Epoch: 0   Global Step: 37270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:53:50,807-Speed 2629.69 samples/sec   Loss 14.6014   LearningRate 0.0912   Epoch: 0   Global Step: 37280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:53:54,716-Speed 2620.11 samples/sec   Loss 14.6357   LearningRate 0.0912   Epoch: 0   Global Step: 37290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:53:58,614-Speed 2628.06 samples/sec   Loss 14.6206   LearningRate 0.0912   Epoch: 0   Global Step: 37300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:54:02,529-Speed 2616.47 samples/sec   Loss 14.7906   LearningRate 0.0912   Epoch: 0   Global Step: 37310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:54:06,401-Speed 2645.26 samples/sec   Loss 14.5249   LearningRate 0.0912   Epoch: 0   Global Step: 37320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:10,300-Speed 2626.44 samples/sec   Loss 14.6109   LearningRate 0.0912   Epoch: 0   Global Step: 37330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:14,210-Speed 2619.66 samples/sec   Loss 14.3367   LearningRate 0.0912   Epoch: 0   Global Step: 37340   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:18,103-Speed 2631.25 samples/sec   Loss 14.4624   LearningRate 0.0912   Epoch: 0   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:22,010-Speed 2622.16 samples/sec   Loss 14.6390   LearningRate 0.0912   Epoch: 0   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:25,906-Speed 2628.77 samples/sec   Loss 14.5135   LearningRate 0.0912   Epoch: 0   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:29,802-Speed 2629.18 samples/sec   Loss 14.6779   LearningRate 0.0912   Epoch: 0   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:33,762-Speed 2586.26 samples/sec   Loss 14.6406   LearningRate 0.0912   Epoch: 0   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:37,650-Speed 2635.22 samples/sec   Loss 14.7093   LearningRate 0.0912   Epoch: 0   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:41,547-Speed 2628.37 samples/sec   Loss 14.5384   LearningRate 0.0912   Epoch: 0   Global Step: 37410   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-12 23:54:45,442-Speed 2629.03 samples/sec   Loss 14.7316   LearningRate 0.0912   Epoch: 0   Global Step: 37420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:54:49,348-Speed 2622.71 samples/sec   Loss 14.7594   LearningRate 0.0912   Epoch: 0   Global Step: 37430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:54:53,244-Speed 2628.90 samples/sec   Loss 14.6003   LearningRate 0.0912   Epoch: 0   Global Step: 37440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-12 23:54:57,141-Speed 2628.94 samples/sec   Loss 14.6144   LearningRate 0.0912   Epoch: 0   Global Step: 37450   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:55:01,045-Speed 2623.56 samples/sec   Loss 14.5957   LearningRate 0.0912   Epoch: 0   Global Step: 37460   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:55:04,943-Speed 2627.98 samples/sec   Loss 14.5355   LearningRate 0.0912   Epoch: 0   Global Step: 37470   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:55:08,839-Speed 2628.47 samples/sec   Loss 14.6150   LearningRate 0.0912   Epoch: 0   Global Step: 37480   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:55:12,719-Speed 2640.14 samples/sec   Loss 14.6696   LearningRate 0.0912   Epoch: 0   Global Step: 37490   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:16,614-Speed 2630.01 samples/sec   Loss 14.6516   LearningRate 0.0912   Epoch: 0   Global Step: 37500   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:20,572-Speed 2587.90 samples/sec   Loss 14.3109   LearningRate 0.0912   Epoch: 0   Global Step: 37510   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:24,472-Speed 2626.49 samples/sec   Loss 14.5574   LearningRate 0.0912   Epoch: 0   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:28,370-Speed 2627.30 samples/sec   Loss 14.5195   LearningRate 0.0912   Epoch: 0   Global Step: 37530   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:32,298-Speed 2607.84 samples/sec   Loss 14.6510   LearningRate 0.0912   Epoch: 0   Global Step: 37540   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:36,428-Speed 2480.28 samples/sec   Loss 14.7007   LearningRate 0.0912   Epoch: 0   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:40,324-Speed 2628.90 samples/sec   Loss 14.7182   LearningRate 0.0911   Epoch: 0   Global Step: 37560   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:44,222-Speed 2627.09 samples/sec   Loss 14.5937   LearningRate 0.0911   Epoch: 0   Global Step: 37570   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:48,121-Speed 2627.07 samples/sec   Loss 14.5972   LearningRate 0.0911   Epoch: 0   Global Step: 37580   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:55:52,017-Speed 2629.08 samples/sec   Loss 14.4536   LearningRate 0.0911   Epoch: 0   Global Step: 37590   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:55:55,914-Speed 2628.37 samples/sec   Loss 14.5798   LearningRate 0.0911   Epoch: 0   Global Step: 37600   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:55:59,815-Speed 2625.91 samples/sec   Loss 14.6914   LearningRate 0.0911   Epoch: 0   Global Step: 37610   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:03,710-Speed 2629.22 samples/sec   Loss 14.5050   LearningRate 0.0911   Epoch: 0   Global Step: 37620   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:07,615-Speed 2623.16 samples/sec   Loss 14.4456   LearningRate 0.0911   Epoch: 0   Global Step: 37630   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:11,515-Speed 2626.40 samples/sec   Loss 14.5271   LearningRate 0.0911   Epoch: 0   Global Step: 37640   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:15,420-Speed 2622.72 samples/sec   Loss 14.5463   LearningRate 0.0911   Epoch: 0   Global Step: 37650   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:19,314-Speed 2629.97 samples/sec   Loss 14.6110   LearningRate 0.0911   Epoch: 0   Global Step: 37660   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:23,208-Speed 2630.59 samples/sec   Loss 14.5985   LearningRate 0.0911   Epoch: 0   Global Step: 37670   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:27,102-Speed 2630.40 samples/sec   Loss 14.6080   LearningRate 0.0911   Epoch: 0   Global Step: 37680   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:30,988-Speed 2635.78 samples/sec   Loss 14.3575   LearningRate 0.0911   Epoch: 0   Global Step: 37690   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:34,892-Speed 2623.56 samples/sec   Loss 14.5668   LearningRate 0.0911   Epoch: 0   Global Step: 37700   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:38,785-Speed 2630.46 samples/sec   Loss 14.5414   LearningRate 0.0911   Epoch: 0   Global Step: 37710   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:42,699-Speed 2616.66 samples/sec   Loss 14.6540   LearningRate 0.0911   Epoch: 0   Global Step: 37720   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:46,594-Speed 2629.83 samples/sec   Loss 14.6837   LearningRate 0.0911   Epoch: 0   Global Step: 37730   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:50,500-Speed 2621.98 samples/sec   Loss 14.6026   LearningRate 0.0911   Epoch: 0   Global Step: 37740   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:54,400-Speed 2626.84 samples/sec   Loss 14.6130   LearningRate 0.0911   Epoch: 0   Global Step: 37750   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:56:58,308-Speed 2620.64 samples/sec   Loss 14.6321   LearningRate 0.0911   Epoch: 0   Global Step: 37760   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:02,206-Speed 2627.35 samples/sec   Loss 14.5486   LearningRate 0.0911   Epoch: 0   Global Step: 37770   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:06,108-Speed 2625.11 samples/sec   Loss 14.6059   LearningRate 0.0911   Epoch: 0   Global Step: 37780   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:10,004-Speed 2631.61 samples/sec   Loss 14.5734   LearningRate 0.0911   Epoch: 0   Global Step: 37790   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:13,905-Speed 2624.96 samples/sec   Loss 14.5741   LearningRate 0.0911   Epoch: 0   Global Step: 37800   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:17,804-Speed 2627.27 samples/sec   Loss 14.5903   LearningRate 0.0911   Epoch: 0   Global Step: 37810   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:21,698-Speed 2629.90 samples/sec   Loss 14.5958   LearningRate 0.0911   Epoch: 0   Global Step: 37820   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:25,681-Speed 2572.03 samples/sec   Loss 14.5700   LearningRate 0.0911   Epoch: 0   Global Step: 37830   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:29,578-Speed 2627.80 samples/sec   Loss 14.6260   LearningRate 0.0911   Epoch: 0   Global Step: 37840   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:33,473-Speed 2629.58 samples/sec   Loss 14.7083   LearningRate 0.0911   Epoch: 0   Global Step: 37850   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:37,383-Speed 2620.01 samples/sec   Loss 14.5624   LearningRate 0.0911   Epoch: 0   Global Step: 37860   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:41,271-Speed 2634.27 samples/sec   Loss 14.6123   LearningRate 0.0911   Epoch: 0   Global Step: 37870   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:45,176-Speed 2622.95 samples/sec   Loss 14.5660   LearningRate 0.0911   Epoch: 0   Global Step: 37880   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:49,052-Speed 2643.07 samples/sec   Loss 14.5354   LearningRate 0.0911   Epoch: 0   Global Step: 37890   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:52,944-Speed 2631.12 samples/sec   Loss 14.5691   LearningRate 0.0911   Epoch: 0   Global Step: 37900   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:57:56,840-Speed 2629.04 samples/sec   Loss 14.6529   LearningRate 0.0911   Epoch: 0   Global Step: 37910   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:58:00,717-Speed 2641.60 samples/sec   Loss 14.7694   LearningRate 0.0911   Epoch: 0   Global Step: 37920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:04,673-Speed 2588.84 samples/sec   Loss 14.5460   LearningRate 0.0911   Epoch: 0   Global Step: 37930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:08,567-Speed 2630.64 samples/sec   Loss 14.6174   LearningRate 0.0911   Epoch: 0   Global Step: 37940   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:12,464-Speed 2628.70 samples/sec   Loss 14.4837   LearningRate 0.0911   Epoch: 0   Global Step: 37950   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:16,359-Speed 2629.75 samples/sec   Loss 14.6868   LearningRate 0.0911   Epoch: 0   Global Step: 37960   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:20,260-Speed 2625.71 samples/sec   Loss 14.7234   LearningRate 0.0911   Epoch: 0   Global Step: 37970   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:24,165-Speed 2622.53 samples/sec   Loss 14.4276   LearningRate 0.0911   Epoch: 0   Global Step: 37980   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:28,059-Speed 2630.73 samples/sec   Loss 14.3793   LearningRate 0.0911   Epoch: 0   Global Step: 37990   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:31,961-Speed 2624.45 samples/sec   Loss 14.5891   LearningRate 0.0910   Epoch: 0   Global Step: 38000   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:35,862-Speed 2625.61 samples/sec   Loss 14.4546   LearningRate 0.0910   Epoch: 0   Global Step: 38010   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:58:39,764-Speed 2624.39 samples/sec   Loss 14.5751   LearningRate 0.0910   Epoch: 0   Global Step: 38020   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:58:43,662-Speed 2627.89 samples/sec   Loss 14.5560   LearningRate 0.0910   Epoch: 0   Global Step: 38030   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:58:47,562-Speed 2626.77 samples/sec   Loss 14.6786   LearningRate 0.0910   Epoch: 0   Global Step: 38040   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:58:51,464-Speed 2624.89 samples/sec   Loss 14.6189   LearningRate 0.0910   Epoch: 0   Global Step: 38050   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:58:55,465-Speed 2560.06 samples/sec   Loss 14.5625   LearningRate 0.0910   Epoch: 0   Global Step: 38060   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:58:59,352-Speed 2635.05 samples/sec   Loss 14.5922   LearningRate 0.0910   Epoch: 0   Global Step: 38070   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:03,248-Speed 2628.92 samples/sec   Loss 14.6392   LearningRate 0.0910   Epoch: 0   Global Step: 38080   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:07,159-Speed 2619.39 samples/sec   Loss 14.6878   LearningRate 0.0910   Epoch: 0   Global Step: 38090   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:11,073-Speed 2616.92 samples/sec   Loss 14.4574   LearningRate 0.0910   Epoch: 0   Global Step: 38100   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:14,973-Speed 2626.38 samples/sec   Loss 14.4297   LearningRate 0.0910   Epoch: 0   Global Step: 38110   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:18,889-Speed 2615.90 samples/sec   Loss 14.4269   LearningRate 0.0910   Epoch: 0   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:22,787-Speed 2627.67 samples/sec   Loss 14.4103   LearningRate 0.0910   Epoch: 0   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:26,677-Speed 2633.01 samples/sec   Loss 14.5331   LearningRate 0.0910   Epoch: 0   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:30,572-Speed 2629.52 samples/sec   Loss 14.6839   LearningRate 0.0910   Epoch: 0   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:34,464-Speed 2631.87 samples/sec   Loss 14.5781   LearningRate 0.0910   Epoch: 0   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-12 23:59:38,355-Speed 2631.61 samples/sec   Loss 14.6131   LearningRate 0.0910   Epoch: 0   Global Step: 38170   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:59:42,263-Speed 2621.66 samples/sec   Loss 14.6273   LearningRate 0.0910   Epoch: 0   Global Step: 38180   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:59:46,149-Speed 2635.93 samples/sec   Loss 14.5144   LearningRate 0.0910   Epoch: 0   Global Step: 38190   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:59:50,049-Speed 2626.37 samples/sec   Loss 14.5808   LearningRate 0.0910   Epoch: 0   Global Step: 38200   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:59:53,942-Speed 2630.66 samples/sec   Loss 14.4287   LearningRate 0.0910   Epoch: 0   Global Step: 38210   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-12 23:59:57,837-Speed 2629.89 samples/sec   Loss 14.4768   LearningRate 0.0910   Epoch: 0   Global Step: 38220   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:01,734-Speed 2628.58 samples/sec   Loss 14.4742   LearningRate 0.0910   Epoch: 0   Global Step: 38230   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:05,632-Speed 2627.36 samples/sec   Loss 14.5339   LearningRate 0.0910   Epoch: 0   Global Step: 38240   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:09,528-Speed 2628.92 samples/sec   Loss 14.5489   LearningRate 0.0910   Epoch: 0   Global Step: 38250   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:13,429-Speed 2626.21 samples/sec   Loss 14.3949   LearningRate 0.0910   Epoch: 0   Global Step: 38260   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:17,309-Speed 2639.68 samples/sec   Loss 14.3513   LearningRate 0.0910   Epoch: 0   Global Step: 38270   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:21,202-Speed 2630.79 samples/sec   Loss 14.3503   LearningRate 0.0910   Epoch: 0   Global Step: 38280   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:25,108-Speed 2622.55 samples/sec   Loss 14.4942   LearningRate 0.0910   Epoch: 0   Global Step: 38290   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:29,003-Speed 2629.69 samples/sec   Loss 14.5723   LearningRate 0.0910   Epoch: 0   Global Step: 38300   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:32,906-Speed 2624.98 samples/sec   Loss 14.5121   LearningRate 0.0910   Epoch: 0   Global Step: 38310   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:00:36,794-Speed 2634.04 samples/sec   Loss 14.5716   LearningRate 0.0910   Epoch: 0   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:00:40,717-Speed 2611.30 samples/sec   Loss 14.5415   LearningRate 0.0910   Epoch: 0   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:00:44,615-Speed 2627.50 samples/sec   Loss 14.4656   LearningRate 0.0910   Epoch: 0   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:00:48,523-Speed 2620.59 samples/sec   Loss 14.6288   LearningRate 0.0910   Epoch: 0   Global Step: 38350   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:00:52,423-Speed 2626.83 samples/sec   Loss 14.6679   LearningRate 0.0910   Epoch: 0   Global Step: 38360   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:00:56,336-Speed 2617.09 samples/sec   Loss 14.6328   LearningRate 0.0910   Epoch: 0   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:01:00,236-Speed 2627.12 samples/sec   Loss 14.6576   LearningRate 0.0910   Epoch: 0   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:01:04,139-Speed 2624.26 samples/sec   Loss 14.5149   LearningRate 0.0910   Epoch: 0   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:01:08,035-Speed 2628.65 samples/sec   Loss 14.4666   LearningRate 0.0910   Epoch: 0   Global Step: 38400   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:01:11,931-Speed 2628.83 samples/sec   Loss 14.4523   LearningRate 0.0910   Epoch: 0   Global Step: 38410   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:01:15,836-Speed 2622.79 samples/sec   Loss 14.3832   LearningRate 0.0910   Epoch: 0   Global Step: 38420   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:19,731-Speed 2629.45 samples/sec   Loss 14.6406   LearningRate 0.0909   Epoch: 0   Global Step: 38430   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:23,643-Speed 2618.38 samples/sec   Loss 14.5173   LearningRate 0.0909   Epoch: 0   Global Step: 38440   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:27,561-Speed 2614.72 samples/sec   Loss 14.5003   LearningRate 0.0909   Epoch: 0   Global Step: 38450   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:31,586-Speed 2544.96 samples/sec   Loss 14.7222   LearningRate 0.0909   Epoch: 0   Global Step: 38460   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:35,494-Speed 2620.23 samples/sec   Loss 14.6355   LearningRate 0.0909   Epoch: 0   Global Step: 38470   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:39,391-Speed 2628.07 samples/sec   Loss 14.7344   LearningRate 0.0909   Epoch: 0   Global Step: 38480   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:43,285-Speed 2630.55 samples/sec   Loss 14.6244   LearningRate 0.0909   Epoch: 0   Global Step: 38490   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:47,186-Speed 2625.44 samples/sec   Loss 14.4593   LearningRate 0.0909   Epoch: 0   Global Step: 38500   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:01:51,065-Speed 2640.74 samples/sec   Loss 14.4064   LearningRate 0.0909   Epoch: 0   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:01:54,957-Speed 2631.74 samples/sec   Loss 14.5695   LearningRate 0.0909   Epoch: 0   Global Step: 38520   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:01:58,852-Speed 2629.55 samples/sec   Loss 14.4803   LearningRate 0.0909   Epoch: 0   Global Step: 38530   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:02,743-Speed 2632.91 samples/sec   Loss 14.5885   LearningRate 0.0909   Epoch: 0   Global Step: 38540   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:06,637-Speed 2630.28 samples/sec   Loss 14.4833   LearningRate 0.0909   Epoch: 0   Global Step: 38550   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:10,532-Speed 2629.38 samples/sec   Loss 14.4344   LearningRate 0.0909   Epoch: 0   Global Step: 38560   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:14,432-Speed 2626.34 samples/sec   Loss 14.4763   LearningRate 0.0909   Epoch: 0   Global Step: 38570   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:18,324-Speed 2631.70 samples/sec   Loss 14.5535   LearningRate 0.0909   Epoch: 0   Global Step: 38580   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:22,218-Speed 2630.49 samples/sec   Loss 14.6660   LearningRate 0.0909   Epoch: 0   Global Step: 38590   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:26,110-Speed 2631.24 samples/sec   Loss 14.4841   LearningRate 0.0909   Epoch: 0   Global Step: 38600   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:30,002-Speed 2631.52 samples/sec   Loss 14.4978   LearningRate 0.0909   Epoch: 0   Global Step: 38610   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:02:33,907-Speed 2623.81 samples/sec   Loss 14.6872   LearningRate 0.0909   Epoch: 0   Global Step: 38620   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:02:37,821-Speed 2616.62 samples/sec   Loss 14.4720   LearningRate 0.0909   Epoch: 0   Global Step: 38630   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:02:41,715-Speed 2630.42 samples/sec   Loss 14.5654   LearningRate 0.0909   Epoch: 0   Global Step: 38640   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:45,611-Speed 2629.37 samples/sec   Loss 14.5658   LearningRate 0.0909   Epoch: 0   Global Step: 38650   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:49,512-Speed 2625.25 samples/sec   Loss 14.4887   LearningRate 0.0909   Epoch: 0   Global Step: 38660   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:53,403-Speed 2632.15 samples/sec   Loss 14.6290   LearningRate 0.0909   Epoch: 0   Global Step: 38670   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:02:57,296-Speed 2631.04 samples/sec   Loss 14.2452   LearningRate 0.0909   Epoch: 0   Global Step: 38680   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:01,190-Speed 2630.40 samples/sec   Loss 14.4722   LearningRate 0.0909   Epoch: 0   Global Step: 38690   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:05,090-Speed 2626.03 samples/sec   Loss 14.5182   LearningRate 0.0909   Epoch: 0   Global Step: 38700   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:09,015-Speed 2609.77 samples/sec   Loss 14.6681   LearningRate 0.0909   Epoch: 0   Global Step: 38710   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:12,906-Speed 2633.06 samples/sec   Loss 14.4559   LearningRate 0.0909   Epoch: 0   Global Step: 38720   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:16,801-Speed 2629.37 samples/sec   Loss 14.4817   LearningRate 0.0909   Epoch: 0   Global Step: 38730   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:20,693-Speed 2632.26 samples/sec   Loss 14.2665   LearningRate 0.0909   Epoch: 0   Global Step: 38740   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:24,587-Speed 2630.33 samples/sec   Loss 14.5045   LearningRate 0.0909   Epoch: 0   Global Step: 38750   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:28,479-Speed 2631.85 samples/sec   Loss 14.5103   LearningRate 0.0909   Epoch: 0   Global Step: 38760   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:32,503-Speed 2544.92 samples/sec   Loss 14.3618   LearningRate 0.0909   Epoch: 0   Global Step: 38770   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:36,396-Speed 2631.30 samples/sec   Loss 14.4675   LearningRate 0.0909   Epoch: 0   Global Step: 38780   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:40,302-Speed 2622.39 samples/sec   Loss 14.4506   LearningRate 0.0909   Epoch: 0   Global Step: 38790   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:44,198-Speed 2629.06 samples/sec   Loss 14.6283   LearningRate 0.0909   Epoch: 0   Global Step: 38800   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:48,111-Speed 2617.64 samples/sec   Loss 14.6175   LearningRate 0.0909   Epoch: 0   Global Step: 38810   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:03:51,986-Speed 2643.23 samples/sec   Loss 14.5796   LearningRate 0.0909   Epoch: 0   Global Step: 38820   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:55,895-Speed 2620.69 samples/sec   Loss 14.5228   LearningRate 0.0909   Epoch: 0   Global Step: 38830   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:03:59,784-Speed 2634.06 samples/sec   Loss 14.4580   LearningRate 0.0909   Epoch: 0   Global Step: 38840   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:03,677-Speed 2630.61 samples/sec   Loss 14.3088   LearningRate 0.0909   Epoch: 0   Global Step: 38850   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:07,700-Speed 2545.75 samples/sec   Loss 14.4400   LearningRate 0.0909   Epoch: 0   Global Step: 38860   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:11,678-Speed 2574.82 samples/sec   Loss 14.4454   LearningRate 0.0908   Epoch: 0   Global Step: 38870   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:15,642-Speed 2583.80 samples/sec   Loss 14.6639   LearningRate 0.0908   Epoch: 0   Global Step: 38880   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:19,552-Speed 2620.15 samples/sec   Loss 14.5088   LearningRate 0.0908   Epoch: 0   Global Step: 38890   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:23,474-Speed 2611.62 samples/sec   Loss 14.4998   LearningRate 0.0908   Epoch: 0   Global Step: 38900   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:27,378-Speed 2624.23 samples/sec   Loss 14.4532   LearningRate 0.0908   Epoch: 0   Global Step: 38910   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:04:31,289-Speed 2618.18 samples/sec   Loss 14.5737   LearningRate 0.0908   Epoch: 0   Global Step: 38920   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:04:35,197-Speed 2621.38 samples/sec   Loss 14.5462   LearningRate 0.0908   Epoch: 0   Global Step: 38930   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:04:39,103-Speed 2622.28 samples/sec   Loss 14.4037   LearningRate 0.0908   Epoch: 0   Global Step: 38940   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:04:43,016-Speed 2619.21 samples/sec   Loss 14.4933   LearningRate 0.0908   Epoch: 0   Global Step: 38950   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:04:46,915-Speed 2626.68 samples/sec   Loss 14.3376   LearningRate 0.0908   Epoch: 0   Global Step: 38960   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:04:50,816-Speed 2625.60 samples/sec   Loss 14.6815   LearningRate 0.0908   Epoch: 0   Global Step: 38970   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:04:54,720-Speed 2624.00 samples/sec   Loss 14.6873   LearningRate 0.0908   Epoch: 0   Global Step: 38980   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:04:58,612-Speed 2631.41 samples/sec   Loss 14.5099   LearningRate 0.0908   Epoch: 0   Global Step: 38990   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:02,519-Speed 2621.27 samples/sec   Loss 14.5013   LearningRate 0.0908   Epoch: 0   Global Step: 39000   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:06,423-Speed 2623.60 samples/sec   Loss 14.5885   LearningRate 0.0908   Epoch: 0   Global Step: 39010   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:10,302-Speed 2640.37 samples/sec   Loss 14.5510   LearningRate 0.0908   Epoch: 0   Global Step: 39020   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:14,212-Speed 2620.18 samples/sec   Loss 14.5893   LearningRate 0.0908   Epoch: 0   Global Step: 39030   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:18,106-Speed 2630.70 samples/sec   Loss 14.4378   LearningRate 0.0908   Epoch: 0   Global Step: 39040   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:22,013-Speed 2621.17 samples/sec   Loss 14.4666   LearningRate 0.0908   Epoch: 0   Global Step: 39050   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:25,907-Speed 2630.64 samples/sec   Loss 14.4792   LearningRate 0.0908   Epoch: 0   Global Step: 39060   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:29,809-Speed 2625.02 samples/sec   Loss 14.3808   LearningRate 0.0908   Epoch: 0   Global Step: 39070   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:33,725-Speed 2615.37 samples/sec   Loss 14.4430   LearningRate 0.0908   Epoch: 0   Global Step: 39080   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:37,629-Speed 2623.61 samples/sec   Loss 14.5064   LearningRate 0.0908   Epoch: 0   Global Step: 39090   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:41,524-Speed 2630.15 samples/sec   Loss 14.4537   LearningRate 0.0908   Epoch: 0   Global Step: 39100   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:45,420-Speed 2628.93 samples/sec   Loss 14.3894   LearningRate 0.0908   Epoch: 0   Global Step: 39110   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:49,297-Speed 2642.19 samples/sec   Loss 14.5430   LearningRate 0.0908   Epoch: 0   Global Step: 39120   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:05:53,175-Speed 2640.57 samples/sec   Loss 14.4420   LearningRate 0.0908   Epoch: 0   Global Step: 39130   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:05:57,072-Speed 2628.34 samples/sec   Loss 14.6421   LearningRate 0.0908   Epoch: 0   Global Step: 39140   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:00,970-Speed 2627.96 samples/sec   Loss 14.5500   LearningRate 0.0908   Epoch: 0   Global Step: 39150   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:04,869-Speed 2626.57 samples/sec   Loss 14.3012   LearningRate 0.0908   Epoch: 0   Global Step: 39160   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:08,762-Speed 2630.74 samples/sec   Loss 14.5864   LearningRate 0.0908   Epoch: 0   Global Step: 39170   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:12,665-Speed 2624.40 samples/sec   Loss 14.4410   LearningRate 0.0908   Epoch: 0   Global Step: 39180   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:16,560-Speed 2629.73 samples/sec   Loss 14.2759   LearningRate 0.0908   Epoch: 0   Global Step: 39190   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:20,458-Speed 2627.64 samples/sec   Loss 14.4985   LearningRate 0.0908   Epoch: 0   Global Step: 39200   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:24,350-Speed 2631.62 samples/sec   Loss 14.4717   LearningRate 0.0908   Epoch: 0   Global Step: 39210   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:28,244-Speed 2630.72 samples/sec   Loss 14.5911   LearningRate 0.0908   Epoch: 0   Global Step: 39220   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:06:32,141-Speed 2628.07 samples/sec   Loss 14.4373   LearningRate 0.0908   Epoch: 0   Global Step: 39230   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:06:36,051-Speed 2619.41 samples/sec   Loss 14.4922   LearningRate 0.0908   Epoch: 0   Global Step: 39240   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:06:39,944-Speed 2630.56 samples/sec   Loss 14.3087   LearningRate 0.0908   Epoch: 0   Global Step: 39250   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:06:43,841-Speed 2628.34 samples/sec   Loss 14.5641   LearningRate 0.0908   Epoch: 0   Global Step: 39260   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:06:47,761-Speed 2613.32 samples/sec   Loss 14.5388   LearningRate 0.0908   Epoch: 0   Global Step: 39270   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:06:51,658-Speed 2628.11 samples/sec   Loss 14.4114   LearningRate 0.0908   Epoch: 0   Global Step: 39280   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:06:55,576-Speed 2614.54 samples/sec   Loss 14.3530   LearningRate 0.0908   Epoch: 0   Global Step: 39290   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:06:59,454-Speed 2641.09 samples/sec   Loss 14.4184   LearningRate 0.0907   Epoch: 0   Global Step: 39300   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:03,348-Speed 2630.30 samples/sec   Loss 14.3447   LearningRate 0.0907   Epoch: 0   Global Step: 39310   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:07,331-Speed 2571.22 samples/sec   Loss 14.4688   LearningRate 0.0907   Epoch: 0   Global Step: 39320   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:11,284-Speed 2591.10 samples/sec   Loss 14.5117   LearningRate 0.0907   Epoch: 0   Global Step: 39330   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:15,290-Speed 2556.83 samples/sec   Loss 14.5131   LearningRate 0.0907   Epoch: 0   Global Step: 39340   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:19,183-Speed 2630.95 samples/sec   Loss 14.4638   LearningRate 0.0907   Epoch: 0   Global Step: 39350   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:23,079-Speed 2629.37 samples/sec   Loss 14.4296   LearningRate 0.0907   Epoch: 0   Global Step: 39360   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:26,972-Speed 2630.84 samples/sec   Loss 14.4389   LearningRate 0.0907   Epoch: 0   Global Step: 39370   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:30,885-Speed 2618.11 samples/sec   Loss 14.5675   LearningRate 0.0907   Epoch: 0   Global Step: 39380   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:34,783-Speed 2627.43 samples/sec   Loss 14.5320   LearningRate 0.0907   Epoch: 0   Global Step: 39390   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:07:38,676-Speed 2630.50 samples/sec   Loss 14.3626   LearningRate 0.0907   Epoch: 0   Global Step: 39400   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:07:42,664-Speed 2568.20 samples/sec   Loss 14.4095   LearningRate 0.0907   Epoch: 0   Global Step: 39410   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:07:46,578-Speed 2617.20 samples/sec   Loss 14.4885   LearningRate 0.0907   Epoch: 0   Global Step: 39420   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:07:50,492-Speed 2617.00 samples/sec   Loss 14.5304   LearningRate 0.0907   Epoch: 0   Global Step: 39430   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:07:54,384-Speed 2631.87 samples/sec   Loss 14.4325   LearningRate 0.0907   Epoch: 0   Global Step: 39440   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:07:58,286-Speed 2624.76 samples/sec   Loss 14.4452   LearningRate 0.0907   Epoch: 0   Global Step: 39450   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:08:02,162-Speed 2642.91 samples/sec   Loss 14.4297   LearningRate 0.0907   Epoch: 0   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:06,060-Speed 2627.74 samples/sec   Loss 14.4627   LearningRate 0.0907   Epoch: 0   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:09,995-Speed 2602.99 samples/sec   Loss 14.3747   LearningRate 0.0907   Epoch: 0   Global Step: 39480   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:13,893-Speed 2627.42 samples/sec   Loss 14.4385   LearningRate 0.0907   Epoch: 0   Global Step: 39490   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:17,793-Speed 2626.21 samples/sec   Loss 14.3847   LearningRate 0.0907   Epoch: 0   Global Step: 39500   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:21,691-Speed 2627.55 samples/sec   Loss 14.3480   LearningRate 0.0907   Epoch: 0   Global Step: 39510   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:25,592-Speed 2625.45 samples/sec   Loss 14.4111   LearningRate 0.0907   Epoch: 0   Global Step: 39520   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:29,491-Speed 2627.60 samples/sec   Loss 14.4441   LearningRate 0.0907   Epoch: 0   Global Step: 39530   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:33,391-Speed 2626.13 samples/sec   Loss 14.4280   LearningRate 0.0907   Epoch: 0   Global Step: 39540   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:37,285-Speed 2630.16 samples/sec   Loss 14.5098   LearningRate 0.0907   Epoch: 0   Global Step: 39550   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:08:41,186-Speed 2626.02 samples/sec   Loss 14.6340   LearningRate 0.0907   Epoch: 0   Global Step: 39560   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:08:45,096-Speed 2619.57 samples/sec   Loss 14.5355   LearningRate 0.0907   Epoch: 0   Global Step: 39570   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:08:48,992-Speed 2629.10 samples/sec   Loss 14.4651   LearningRate 0.0907   Epoch: 0   Global Step: 39580   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:08:52,903-Speed 2619.08 samples/sec   Loss 14.3658   LearningRate 0.0907   Epoch: 0   Global Step: 39590   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:08:56,791-Speed 2634.56 samples/sec   Loss 14.4110   LearningRate 0.0907   Epoch: 0   Global Step: 39600   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:09:00,689-Speed 2627.41 samples/sec   Loss 14.5075   LearningRate 0.0907   Epoch: 0   Global Step: 39610   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:09:04,572-Speed 2637.59 samples/sec   Loss 14.4290   LearningRate 0.0907   Epoch: 0   Global Step: 39620   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:08,469-Speed 2628.54 samples/sec   Loss 14.5028   LearningRate 0.0907   Epoch: 0   Global Step: 39630   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:12,368-Speed 2627.12 samples/sec   Loss 14.5417   LearningRate 0.0907   Epoch: 0   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:16,259-Speed 2632.22 samples/sec   Loss 14.3410   LearningRate 0.0907   Epoch: 0   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:20,155-Speed 2629.48 samples/sec   Loss 14.4885   LearningRate 0.0907   Epoch: 0   Global Step: 39660   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:24,056-Speed 2625.18 samples/sec   Loss 14.4570   LearningRate 0.0907   Epoch: 0   Global Step: 39670   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:27,954-Speed 2628.06 samples/sec   Loss 14.4440   LearningRate 0.0907   Epoch: 0   Global Step: 39680   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:31,848-Speed 2629.62 samples/sec   Loss 14.4952   LearningRate 0.0907   Epoch: 0   Global Step: 39690   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:35,743-Speed 2629.41 samples/sec   Loss 14.3892   LearningRate 0.0907   Epoch: 0   Global Step: 39700   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:39,635-Speed 2631.63 samples/sec   Loss 14.4647   LearningRate 0.0907   Epoch: 0   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:43,531-Speed 2629.44 samples/sec   Loss 14.5089   LearningRate 0.0907   Epoch: 0   Global Step: 39720   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:09:47,407-Speed 2642.70 samples/sec   Loss 14.4944   LearningRate 0.0907   Epoch: 0   Global Step: 39730   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:51,331-Speed 2610.58 samples/sec   Loss 14.5031   LearningRate 0.0906   Epoch: 0   Global Step: 39740   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:55,222-Speed 2631.65 samples/sec   Loss 14.5341   LearningRate 0.0906   Epoch: 0   Global Step: 39750   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:09:59,119-Speed 2628.84 samples/sec   Loss 14.4938   LearningRate 0.0906   Epoch: 0   Global Step: 39760   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:10:03,013-Speed 2629.69 samples/sec   Loss 14.3825   LearningRate 0.0906   Epoch: 0   Global Step: 39770   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:10:06,965-Speed 2591.72 samples/sec   Loss 14.5049   LearningRate 0.0906   Epoch: 0   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:10:10,862-Speed 2628.07 samples/sec   Loss 14.3531   LearningRate 0.0906   Epoch: 0   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:10:14,755-Speed 2631.45 samples/sec   Loss 14.5291   LearningRate 0.0906   Epoch: 0   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:10:18,648-Speed 2631.38 samples/sec   Loss 14.2880   LearningRate 0.0906   Epoch: 0   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:10:22,543-Speed 2630.02 samples/sec   Loss 14.4829   LearningRate 0.0906   Epoch: 0   Global Step: 39820   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:10:26,435-Speed 2631.35 samples/sec   Loss 14.3325   LearningRate 0.0906   Epoch: 0   Global Step: 39830   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:30,332-Speed 2628.35 samples/sec   Loss 14.5166   LearningRate 0.0906   Epoch: 0   Global Step: 39840   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:34,224-Speed 2631.43 samples/sec   Loss 14.4866   LearningRate 0.0906   Epoch: 0   Global Step: 39850   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:38,118-Speed 2630.56 samples/sec   Loss 14.4474   LearningRate 0.0906   Epoch: 0   Global Step: 39860   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:42,010-Speed 2631.61 samples/sec   Loss 14.3005   LearningRate 0.0906   Epoch: 0   Global Step: 39870   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:45,904-Speed 2630.28 samples/sec   Loss 14.5405   LearningRate 0.0906   Epoch: 0   Global Step: 39880   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:49,812-Speed 2621.33 samples/sec   Loss 14.4777   LearningRate 0.0906   Epoch: 0   Global Step: 39890   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:53,757-Speed 2596.04 samples/sec   Loss 14.5826   LearningRate 0.0906   Epoch: 0   Global Step: 39900   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:10:57,650-Speed 2631.52 samples/sec   Loss 14.2765   LearningRate 0.0906   Epoch: 0   Global Step: 39910   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:11:01,527-Speed 2641.63 samples/sec   Loss 14.3667   LearningRate 0.0906   Epoch: 0   Global Step: 39920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:05,421-Speed 2630.20 samples/sec   Loss 14.3403   LearningRate 0.0906   Epoch: 0   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:09,317-Speed 2628.89 samples/sec   Loss 14.4814   LearningRate 0.0906   Epoch: 0   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:13,217-Speed 2626.07 samples/sec   Loss 14.4274   LearningRate 0.0906   Epoch: 0   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:17,114-Speed 2628.46 samples/sec   Loss 14.2084   LearningRate 0.0906   Epoch: 0   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:21,011-Speed 2628.56 samples/sec   Loss 14.3751   LearningRate 0.0906   Epoch: 0   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:24,907-Speed 2628.89 samples/sec   Loss 14.4620   LearningRate 0.0906   Epoch: 0   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:28,817-Speed 2619.84 samples/sec   Loss 14.5253   LearningRate 0.0906   Epoch: 0   Global Step: 39990   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:11:32,713-Speed 2628.89 samples/sec   Loss 14.3157   LearningRate 0.0906   Epoch: 0   Global Step: 40000   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:12:16,122-[lfw][40000]XNorm: 22.849399
Training: 2022-04-13 00:12:16,123-[lfw][40000]Accuracy-Flip: 0.99517+-0.00311
Training: 2022-04-13 00:12:16,124-[lfw][40000]Accuracy-Highest: 0.99517
Training: 2022-04-13 00:13:06,650-[cfp_fp][40000]XNorm: 20.437435
Training: 2022-04-13 00:13:06,651-[cfp_fp][40000]Accuracy-Flip: 0.97057+-0.00788
Training: 2022-04-13 00:13:06,652-[cfp_fp][40000]Accuracy-Highest: 0.97057
Training: 2022-04-13 00:13:50,149-[agedb_30][40000]XNorm: 22.183955
Training: 2022-04-13 00:13:50,150-[agedb_30][40000]Accuracy-Flip: 0.95567+-0.00844
Training: 2022-04-13 00:13:50,150-[agedb_30][40000]Accuracy-Highest: 0.95567
Training: 2022-04-13 00:13:54,029-Speed 72.46 samples/sec   Loss 14.4160   LearningRate 0.0906   Epoch: 0   Global Step: 40010   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:13:57,903-Speed 2643.96 samples/sec   Loss 14.3571   LearningRate 0.0906   Epoch: 0   Global Step: 40020   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:01,782-Speed 2640.98 samples/sec   Loss 14.2670   LearningRate 0.0906   Epoch: 0   Global Step: 40030   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:05,661-Speed 2640.02 samples/sec   Loss 14.3108   LearningRate 0.0906   Epoch: 0   Global Step: 40040   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:09,586-Speed 2610.04 samples/sec   Loss 14.4073   LearningRate 0.0906   Epoch: 0   Global Step: 40050   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:13,467-Speed 2639.05 samples/sec   Loss 14.4426   LearningRate 0.0906   Epoch: 0   Global Step: 40060   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:17,355-Speed 2635.10 samples/sec   Loss 14.4410   LearningRate 0.0906   Epoch: 0   Global Step: 40070   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:21,254-Speed 2633.72 samples/sec   Loss 14.4021   LearningRate 0.0906   Epoch: 0   Global Step: 40080   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:25,127-Speed 2644.20 samples/sec   Loss 14.3593   LearningRate 0.0906   Epoch: 0   Global Step: 40090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:29,020-Speed 2631.72 samples/sec   Loss 14.3853   LearningRate 0.0906   Epoch: 0   Global Step: 40100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:32,913-Speed 2631.17 samples/sec   Loss 14.4130   LearningRate 0.0906   Epoch: 0   Global Step: 40110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:36,789-Speed 2642.24 samples/sec   Loss 14.4990   LearningRate 0.0906   Epoch: 0   Global Step: 40120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:40,686-Speed 2628.48 samples/sec   Loss 14.4107   LearningRate 0.0906   Epoch: 0   Global Step: 40130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:44,590-Speed 2623.11 samples/sec   Loss 14.3697   LearningRate 0.0906   Epoch: 0   Global Step: 40140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:48,497-Speed 2622.70 samples/sec   Loss 14.4244   LearningRate 0.0906   Epoch: 0   Global Step: 40150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:52,401-Speed 2623.86 samples/sec   Loss 14.2394   LearningRate 0.0906   Epoch: 0   Global Step: 40160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:14:56,316-Speed 2616.42 samples/sec   Loss 14.4623   LearningRate 0.0906   Epoch: 0   Global Step: 40170   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:00,220-Speed 2624.01 samples/sec   Loss 14.6685   LearningRate 0.0905   Epoch: 0   Global Step: 40180   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:04,124-Speed 2623.58 samples/sec   Loss 14.4091   LearningRate 0.0905   Epoch: 0   Global Step: 40190   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:08,033-Speed 2619.93 samples/sec   Loss 14.4530   LearningRate 0.0905   Epoch: 0   Global Step: 40200   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:11,943-Speed 2619.76 samples/sec   Loss 14.3183   LearningRate 0.0905   Epoch: 0   Global Step: 40210   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:15,837-Speed 2630.39 samples/sec   Loss 14.2451   LearningRate 0.0905   Epoch: 0   Global Step: 40220   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:19,743-Speed 2622.05 samples/sec   Loss 14.3724   LearningRate 0.0905   Epoch: 0   Global Step: 40230   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:23,650-Speed 2622.12 samples/sec   Loss 14.4806   LearningRate 0.0905   Epoch: 0   Global Step: 40240   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:27,554-Speed 2623.14 samples/sec   Loss 14.3699   LearningRate 0.0905   Epoch: 0   Global Step: 40250   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:31,467-Speed 2618.08 samples/sec   Loss 14.2691   LearningRate 0.0905   Epoch: 0   Global Step: 40260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:35,375-Speed 2620.20 samples/sec   Loss 14.5893   LearningRate 0.0905   Epoch: 0   Global Step: 40270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:39,386-Speed 2553.72 samples/sec   Loss 14.3208   LearningRate 0.0905   Epoch: 0   Global Step: 40280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:43,486-Speed 2497.96 samples/sec   Loss 14.4296   LearningRate 0.0905   Epoch: 0   Global Step: 40290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:15:47,558-Speed 2515.43 samples/sec   Loss 14.3983   LearningRate 0.0905   Epoch: 0   Global Step: 40300   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:15:51,522-Speed 2584.04 samples/sec   Loss 14.3922   LearningRate 0.0905   Epoch: 0   Global Step: 40310   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:15:55,563-Speed 2535.26 samples/sec   Loss 14.4048   LearningRate 0.0905   Epoch: 0   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:15:59,465-Speed 2624.63 samples/sec   Loss 14.3445   LearningRate 0.0905   Epoch: 0   Global Step: 40330   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:03,368-Speed 2624.62 samples/sec   Loss 14.3235   LearningRate 0.0905   Epoch: 0   Global Step: 40340   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:07,270-Speed 2624.32 samples/sec   Loss 14.3898   LearningRate 0.0905   Epoch: 0   Global Step: 40350   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:11,170-Speed 2626.43 samples/sec   Loss 14.4048   LearningRate 0.0905   Epoch: 0   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:15,077-Speed 2621.54 samples/sec   Loss 14.3138   LearningRate 0.0905   Epoch: 0   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:18,977-Speed 2626.83 samples/sec   Loss 14.4085   LearningRate 0.0905   Epoch: 0   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:22,929-Speed 2591.29 samples/sec   Loss 14.4584   LearningRate 0.0905   Epoch: 0   Global Step: 40390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:26,827-Speed 2628.29 samples/sec   Loss 14.4268   LearningRate 0.0905   Epoch: 0   Global Step: 40400   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:16:30,736-Speed 2620.15 samples/sec   Loss 14.4096   LearningRate 0.0905   Epoch: 0   Global Step: 40410   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:16:34,638-Speed 2624.96 samples/sec   Loss 14.3719   LearningRate 0.0905   Epoch: 0   Global Step: 40420   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:16:38,550-Speed 2618.26 samples/sec   Loss 14.4587   LearningRate 0.0905   Epoch: 0   Global Step: 40430   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:16:42,448-Speed 2627.67 samples/sec   Loss 14.4505   LearningRate 0.0905   Epoch: 0   Global Step: 40440   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:16:46,471-Speed 2545.75 samples/sec   Loss 14.3618   LearningRate 0.0905   Epoch: 0   Global Step: 40450   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:16:50,374-Speed 2624.44 samples/sec   Loss 14.3038   LearningRate 0.0905   Epoch: 0   Global Step: 40460   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:16:54,270-Speed 2629.47 samples/sec   Loss 14.3873   LearningRate 0.0905   Epoch: 0   Global Step: 40470   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:16:58,159-Speed 2633.44 samples/sec   Loss 14.5302   LearningRate 0.0905   Epoch: 0   Global Step: 40480   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:02,217-Speed 2524.37 samples/sec   Loss 14.3686   LearningRate 0.0905   Epoch: 0   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:06,353-Speed 2476.86 samples/sec   Loss 14.3802   LearningRate 0.0905   Epoch: 0   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:10,385-Speed 2540.08 samples/sec   Loss 14.4505   LearningRate 0.0905   Epoch: 0   Global Step: 40510   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:14,311-Speed 2608.92 samples/sec   Loss 14.3811   LearningRate 0.0905   Epoch: 0   Global Step: 40520   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:18,208-Speed 2628.35 samples/sec   Loss 14.3702   LearningRate 0.0905   Epoch: 0   Global Step: 40530   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:22,119-Speed 2619.04 samples/sec   Loss 14.2603   LearningRate 0.0905   Epoch: 0   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:26,019-Speed 2626.82 samples/sec   Loss 14.3331   LearningRate 0.0905   Epoch: 0   Global Step: 40550   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:29,922-Speed 2624.07 samples/sec   Loss 14.4168   LearningRate 0.0905   Epoch: 0   Global Step: 40560   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:33,839-Speed 2615.19 samples/sec   Loss 14.3637   LearningRate 0.0905   Epoch: 0   Global Step: 40570   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:17:37,717-Speed 2640.56 samples/sec   Loss 14.3943   LearningRate 0.0905   Epoch: 0   Global Step: 40580   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:41,625-Speed 2620.75 samples/sec   Loss 14.4141   LearningRate 0.0905   Epoch: 0   Global Step: 40590   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:45,519-Speed 2630.48 samples/sec   Loss 14.3372   LearningRate 0.0905   Epoch: 0   Global Step: 40600   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:49,439-Speed 2613.20 samples/sec   Loss 14.4195   LearningRate 0.0904   Epoch: 0   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:53,338-Speed 2626.84 samples/sec   Loss 14.3010   LearningRate 0.0904   Epoch: 0   Global Step: 40620   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:17:57,242-Speed 2624.23 samples/sec   Loss 14.3949   LearningRate 0.0904   Epoch: 0   Global Step: 40630   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:18:01,138-Speed 2629.03 samples/sec   Loss 14.4383   LearningRate 0.0904   Epoch: 0   Global Step: 40640   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:18:05,036-Speed 2628.05 samples/sec   Loss 14.4077   LearningRate 0.0904   Epoch: 0   Global Step: 40650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:18:08,931-Speed 2629.01 samples/sec   Loss 14.3621   LearningRate 0.0904   Epoch: 0   Global Step: 40660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:18:12,832-Speed 2625.94 samples/sec   Loss 14.5978   LearningRate 0.0904   Epoch: 0   Global Step: 40670   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:18:16,731-Speed 2626.71 samples/sec   Loss 14.3121   LearningRate 0.0904   Epoch: 0   Global Step: 40680   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:20,638-Speed 2621.71 samples/sec   Loss 14.2439   LearningRate 0.0904   Epoch: 0   Global Step: 40690   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:24,537-Speed 2626.75 samples/sec   Loss 14.4216   LearningRate 0.0904   Epoch: 0   Global Step: 40700   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:28,438-Speed 2625.47 samples/sec   Loss 14.3291   LearningRate 0.0904   Epoch: 0   Global Step: 40710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:32,335-Speed 2628.08 samples/sec   Loss 14.4439   LearningRate 0.0904   Epoch: 0   Global Step: 40720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:36,247-Speed 2618.29 samples/sec   Loss 14.2937   LearningRate 0.0904   Epoch: 0   Global Step: 40730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:40,148-Speed 2626.41 samples/sec   Loss 14.4887   LearningRate 0.0904   Epoch: 0   Global Step: 40740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:44,046-Speed 2627.26 samples/sec   Loss 14.4063   LearningRate 0.0904   Epoch: 0   Global Step: 40750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:47,950-Speed 2623.85 samples/sec   Loss 14.3119   LearningRate 0.0904   Epoch: 0   Global Step: 40760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:51,870-Speed 2612.95 samples/sec   Loss 14.3207   LearningRate 0.0904   Epoch: 0   Global Step: 40770   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:18:55,775-Speed 2622.93 samples/sec   Loss 14.3311   LearningRate 0.0904   Epoch: 0   Global Step: 40780   Fp16 Grad Scale: 524288   Required: 89 hours
Training: 2022-04-13 00:18:59,663-Speed 2634.26 samples/sec   Loss 14.3186   LearningRate 0.0904   Epoch: 0   Global Step: 40790   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:03,566-Speed 2624.43 samples/sec   Loss 14.3875   LearningRate 0.0904   Epoch: 0   Global Step: 40800   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:07,486-Speed 2613.69 samples/sec   Loss 14.4066   LearningRate 0.0904   Epoch: 0   Global Step: 40810   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:11,378-Speed 2631.43 samples/sec   Loss 14.4206   LearningRate 0.0904   Epoch: 0   Global Step: 40820   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:15,322-Speed 2597.47 samples/sec   Loss 14.3415   LearningRate 0.0904   Epoch: 0   Global Step: 40830   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:19,234-Speed 2618.24 samples/sec   Loss 14.2580   LearningRate 0.0904   Epoch: 0   Global Step: 40840   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:23,206-Speed 2578.95 samples/sec   Loss 14.3586   LearningRate 0.0904   Epoch: 0   Global Step: 40850   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:27,143-Speed 2601.22 samples/sec   Loss 14.3963   LearningRate 0.0904   Epoch: 0   Global Step: 40860   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:31,106-Speed 2584.59 samples/sec   Loss 14.0970   LearningRate 0.0904   Epoch: 0   Global Step: 40870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:19:34,999-Speed 2630.99 samples/sec   Loss 14.5044   LearningRate 0.0904   Epoch: 0   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:19:38,895-Speed 2629.16 samples/sec   Loss 14.3887   LearningRate 0.0904   Epoch: 0   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:19:42,795-Speed 2626.54 samples/sec   Loss 14.1790   LearningRate 0.0904   Epoch: 0   Global Step: 40900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:19:46,695-Speed 2626.00 samples/sec   Loss 14.1966   LearningRate 0.0904   Epoch: 0   Global Step: 40910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:19:50,597-Speed 2625.25 samples/sec   Loss 14.3270   LearningRate 0.0904   Epoch: 0   Global Step: 40920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:19:54,505-Speed 2621.10 samples/sec   Loss 14.2808   LearningRate 0.0904   Epoch: 0   Global Step: 40930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:19:58,403-Speed 2627.47 samples/sec   Loss 14.3307   LearningRate 0.0904   Epoch: 0   Global Step: 40940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:02,303-Speed 2626.04 samples/sec   Loss 14.4088   LearningRate 0.0904   Epoch: 0   Global Step: 40950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:06,211-Speed 2620.82 samples/sec   Loss 14.3217   LearningRate 0.0904   Epoch: 0   Global Step: 40960   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:10,120-Speed 2620.54 samples/sec   Loss 14.2433   LearningRate 0.0904   Epoch: 0   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:14,037-Speed 2614.87 samples/sec   Loss 14.4350   LearningRate 0.0904   Epoch: 0   Global Step: 40980   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:20:17,927-Speed 2633.16 samples/sec   Loss 14.4631   LearningRate 0.0904   Epoch: 0   Global Step: 40990   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:21,823-Speed 2629.43 samples/sec   Loss 14.2802   LearningRate 0.0904   Epoch: 0   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:25,720-Speed 2627.92 samples/sec   Loss 14.3993   LearningRate 0.0904   Epoch: 0   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:29,630-Speed 2619.25 samples/sec   Loss 14.2862   LearningRate 0.0904   Epoch: 0   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:33,534-Speed 2623.59 samples/sec   Loss 14.2515   LearningRate 0.0904   Epoch: 0   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:37,434-Speed 2626.60 samples/sec   Loss 14.3849   LearningRate 0.0904   Epoch: 0   Global Step: 41040   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:41,334-Speed 2626.66 samples/sec   Loss 14.4949   LearningRate 0.0903   Epoch: 0   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:45,344-Speed 2554.10 samples/sec   Loss 14.2658   LearningRate 0.0903   Epoch: 0   Global Step: 41060   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:49,341-Speed 2562.86 samples/sec   Loss 14.2763   LearningRate 0.0903   Epoch: 0   Global Step: 41070   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:53,239-Speed 2627.67 samples/sec   Loss 14.4264   LearningRate 0.0903   Epoch: 0   Global Step: 41080   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:20:57,138-Speed 2626.48 samples/sec   Loss 14.2924   LearningRate 0.0903   Epoch: 0   Global Step: 41090   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:21:01,046-Speed 2621.12 samples/sec   Loss 14.4275   LearningRate 0.0903   Epoch: 0   Global Step: 41100   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:21:04,946-Speed 2626.17 samples/sec   Loss 14.3715   LearningRate 0.0903   Epoch: 0   Global Step: 41110   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:21:08,845-Speed 2627.42 samples/sec   Loss 14.3199   LearningRate 0.0903   Epoch: 0   Global Step: 41120   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:21:12,746-Speed 2625.00 samples/sec   Loss 14.5221   LearningRate 0.0903   Epoch: 0   Global Step: 41130   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:21:16,649-Speed 2624.65 samples/sec   Loss 14.4579   LearningRate 0.0903   Epoch: 0   Global Step: 41140   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:21:20,548-Speed 2626.59 samples/sec   Loss 14.4631   LearningRate 0.0903   Epoch: 0   Global Step: 41150   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:21:24,432-Speed 2636.85 samples/sec   Loss 14.3470   LearningRate 0.0903   Epoch: 0   Global Step: 41160   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:28,331-Speed 2627.14 samples/sec   Loss 14.3688   LearningRate 0.0903   Epoch: 0   Global Step: 41170   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:32,232-Speed 2625.83 samples/sec   Loss 14.2903   LearningRate 0.0903   Epoch: 0   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:36,144-Speed 2618.36 samples/sec   Loss 14.2611   LearningRate 0.0903   Epoch: 0   Global Step: 41190   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:40,046-Speed 2624.67 samples/sec   Loss 14.3035   LearningRate 0.0903   Epoch: 0   Global Step: 41200   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:43,965-Speed 2613.83 samples/sec   Loss 14.3316   LearningRate 0.0903   Epoch: 0   Global Step: 41210   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:47,866-Speed 2625.67 samples/sec   Loss 14.2818   LearningRate 0.0903   Epoch: 0   Global Step: 41220   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:51,782-Speed 2615.75 samples/sec   Loss 14.2971   LearningRate 0.0903   Epoch: 0   Global Step: 41230   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:55,684-Speed 2624.75 samples/sec   Loss 14.3189   LearningRate 0.0903   Epoch: 0   Global Step: 41240   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:21:59,583-Speed 2627.27 samples/sec   Loss 14.2130   LearningRate 0.0903   Epoch: 0   Global Step: 41250   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:22:03,483-Speed 2626.63 samples/sec   Loss 14.3296   LearningRate 0.0903   Epoch: 0   Global Step: 41260   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:07,377-Speed 2629.78 samples/sec   Loss 14.2872   LearningRate 0.0903   Epoch: 0   Global Step: 41270   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:11,328-Speed 2592.26 samples/sec   Loss 14.3626   LearningRate 0.0903   Epoch: 0   Global Step: 41280   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:15,226-Speed 2628.12 samples/sec   Loss 14.5368   LearningRate 0.0903   Epoch: 0   Global Step: 41290   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:19,165-Speed 2601.00 samples/sec   Loss 14.3500   LearningRate 0.0903   Epoch: 0   Global Step: 41300   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:23,067-Speed 2624.50 samples/sec   Loss 14.5093   LearningRate 0.0903   Epoch: 0   Global Step: 41310   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:26,977-Speed 2620.11 samples/sec   Loss 14.3048   LearningRate 0.0903   Epoch: 0   Global Step: 41320   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:30,872-Speed 2629.19 samples/sec   Loss 14.4767   LearningRate 0.0903   Epoch: 0   Global Step: 41330   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:34,779-Speed 2621.79 samples/sec   Loss 14.4808   LearningRate 0.0903   Epoch: 0   Global Step: 41340   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:38,679-Speed 2626.35 samples/sec   Loss 14.3007   LearningRate 0.0903   Epoch: 0   Global Step: 41350   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:42,587-Speed 2620.84 samples/sec   Loss 14.3806   LearningRate 0.0903   Epoch: 0   Global Step: 41360   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:46,486-Speed 2626.91 samples/sec   Loss 14.3273   LearningRate 0.0903   Epoch: 0   Global Step: 41370   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:22:50,372-Speed 2636.31 samples/sec   Loss 14.5200   LearningRate 0.0903   Epoch: 0   Global Step: 41380   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:22:54,270-Speed 2627.05 samples/sec   Loss 14.4785   LearningRate 0.0903   Epoch: 0   Global Step: 41390   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:22:58,167-Speed 2628.70 samples/sec   Loss 14.2920   LearningRate 0.0903   Epoch: 0   Global Step: 41400   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:02,068-Speed 2626.08 samples/sec   Loss 14.3577   LearningRate 0.0903   Epoch: 0   Global Step: 41410   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:05,964-Speed 2628.41 samples/sec   Loss 14.4150   LearningRate 0.0903   Epoch: 0   Global Step: 41420   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:09,860-Speed 2628.91 samples/sec   Loss 14.1782   LearningRate 0.0903   Epoch: 0   Global Step: 41430   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:13,760-Speed 2626.82 samples/sec   Loss 14.3385   LearningRate 0.0903   Epoch: 0   Global Step: 41440   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:17,661-Speed 2625.27 samples/sec   Loss 14.3022   LearningRate 0.0903   Epoch: 0   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:21,560-Speed 2627.32 samples/sec   Loss 14.1887   LearningRate 0.0903   Epoch: 0   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:25,473-Speed 2617.99 samples/sec   Loss 14.3989   LearningRate 0.0903   Epoch: 0   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:29,368-Speed 2629.79 samples/sec   Loss 14.2305   LearningRate 0.0902   Epoch: 0   Global Step: 41480   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:23:50,664-Speed 480.87 samples/sec   Loss 14.3810   LearningRate 0.0902   Epoch: 1   Global Step: 41490   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:23:54,541-Speed 2642.46 samples/sec   Loss 14.2984   LearningRate 0.0902   Epoch: 1   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:23:58,425-Speed 2636.78 samples/sec   Loss 14.2921   LearningRate 0.0902   Epoch: 1   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:02,342-Speed 2615.01 samples/sec   Loss 14.3595   LearningRate 0.0902   Epoch: 1   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:06,229-Speed 2634.92 samples/sec   Loss 14.4849   LearningRate 0.0902   Epoch: 1   Global Step: 41530   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:10,114-Speed 2636.77 samples/sec   Loss 14.5274   LearningRate 0.0902   Epoch: 1   Global Step: 41540   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:14,156-Speed 2533.99 samples/sec   Loss 14.3063   LearningRate 0.0902   Epoch: 1   Global Step: 41550   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:18,162-Speed 2556.54 samples/sec   Loss 14.4472   LearningRate 0.0902   Epoch: 1   Global Step: 41560   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:22,050-Speed 2634.92 samples/sec   Loss 14.1614   LearningRate 0.0902   Epoch: 1   Global Step: 41570   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:25,935-Speed 2636.48 samples/sec   Loss 14.1693   LearningRate 0.0902   Epoch: 1   Global Step: 41580   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:29,823-Speed 2634.30 samples/sec   Loss 14.4487   LearningRate 0.0902   Epoch: 1   Global Step: 41590   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:33,713-Speed 2632.54 samples/sec   Loss 14.3075   LearningRate 0.0902   Epoch: 1   Global Step: 41600   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:24:37,584-Speed 2645.97 samples/sec   Loss 14.2440   LearningRate 0.0902   Epoch: 1   Global Step: 41610   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:41,474-Speed 2633.09 samples/sec   Loss 14.3963   LearningRate 0.0902   Epoch: 1   Global Step: 41620   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:45,399-Speed 2609.53 samples/sec   Loss 14.4321   LearningRate 0.0902   Epoch: 1   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:49,446-Speed 2531.39 samples/sec   Loss 14.3627   LearningRate 0.0902   Epoch: 1   Global Step: 41640   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:53,496-Speed 2528.73 samples/sec   Loss 14.1761   LearningRate 0.0902   Epoch: 1   Global Step: 41650   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:24:57,429-Speed 2604.41 samples/sec   Loss 14.3239   LearningRate 0.0902   Epoch: 1   Global Step: 41660   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:01,330-Speed 2625.97 samples/sec   Loss 14.3191   LearningRate 0.0902   Epoch: 1   Global Step: 41670   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:05,229-Speed 2626.26 samples/sec   Loss 14.2773   LearningRate 0.0902   Epoch: 1   Global Step: 41680   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:09,136-Speed 2621.27 samples/sec   Loss 14.2910   LearningRate 0.0902   Epoch: 1   Global Step: 41690   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:13,042-Speed 2622.56 samples/sec   Loss 14.3059   LearningRate 0.0902   Epoch: 1   Global Step: 41700   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:16,943-Speed 2625.26 samples/sec   Loss 14.2715   LearningRate 0.0902   Epoch: 1   Global Step: 41710   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:25:20,844-Speed 2626.17 samples/sec   Loss 14.3366   LearningRate 0.0902   Epoch: 1   Global Step: 41720   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:25:24,743-Speed 2626.87 samples/sec   Loss 14.3922   LearningRate 0.0902   Epoch: 1   Global Step: 41730   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:25:28,641-Speed 2627.89 samples/sec   Loss 14.2569   LearningRate 0.0902   Epoch: 1   Global Step: 41740   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:25:32,541-Speed 2626.04 samples/sec   Loss 14.4079   LearningRate 0.0902   Epoch: 1   Global Step: 41750   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:25:36,443-Speed 2624.75 samples/sec   Loss 14.4067   LearningRate 0.0902   Epoch: 1   Global Step: 41760   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:25:40,321-Speed 2641.37 samples/sec   Loss 14.1733   LearningRate 0.0902   Epoch: 1   Global Step: 41770   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:44,218-Speed 2628.14 samples/sec   Loss 14.3336   LearningRate 0.0902   Epoch: 1   Global Step: 41780   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:48,117-Speed 2627.07 samples/sec   Loss 14.3612   LearningRate 0.0902   Epoch: 1   Global Step: 41790   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:52,015-Speed 2627.81 samples/sec   Loss 14.4239   LearningRate 0.0902   Epoch: 1   Global Step: 41800   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:55,912-Speed 2628.19 samples/sec   Loss 14.2460   LearningRate 0.0902   Epoch: 1   Global Step: 41810   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:25:59,812-Speed 2626.63 samples/sec   Loss 14.3600   LearningRate 0.0902   Epoch: 1   Global Step: 41820   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:03,711-Speed 2626.92 samples/sec   Loss 14.4317   LearningRate 0.0902   Epoch: 1   Global Step: 41830   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:07,607-Speed 2628.76 samples/sec   Loss 14.2358   LearningRate 0.0902   Epoch: 1   Global Step: 41840   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:11,508-Speed 2625.12 samples/sec   Loss 14.2559   LearningRate 0.0902   Epoch: 1   Global Step: 41850   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:15,524-Speed 2550.44 samples/sec   Loss 14.3077   LearningRate 0.0902   Epoch: 1   Global Step: 41860   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:19,631-Speed 2494.48 samples/sec   Loss 14.2971   LearningRate 0.0902   Epoch: 1   Global Step: 41870   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:26:23,701-Speed 2516.34 samples/sec   Loss 14.3131   LearningRate 0.0902   Epoch: 1   Global Step: 41880   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:27,615-Speed 2617.03 samples/sec   Loss 14.2526   LearningRate 0.0902   Epoch: 1   Global Step: 41890   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:31,509-Speed 2630.18 samples/sec   Loss 14.1290   LearningRate 0.0902   Epoch: 1   Global Step: 41900   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:35,403-Speed 2630.01 samples/sec   Loss 14.3302   LearningRate 0.0902   Epoch: 1   Global Step: 41910   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:39,297-Speed 2630.90 samples/sec   Loss 14.2852   LearningRate 0.0901   Epoch: 1   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:43,203-Speed 2622.58 samples/sec   Loss 14.2651   LearningRate 0.0901   Epoch: 1   Global Step: 41930   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:47,095-Speed 2631.54 samples/sec   Loss 14.2773   LearningRate 0.0901   Epoch: 1   Global Step: 41940   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:51,129-Speed 2539.15 samples/sec   Loss 14.2468   LearningRate 0.0901   Epoch: 1   Global Step: 41950   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:26:55,014-Speed 2636.61 samples/sec   Loss 14.1832   LearningRate 0.0901   Epoch: 1   Global Step: 41960   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:26:58,910-Speed 2629.28 samples/sec   Loss 14.2395   LearningRate 0.0901   Epoch: 1   Global Step: 41970   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:02,817-Speed 2621.58 samples/sec   Loss 14.3010   LearningRate 0.0901   Epoch: 1   Global Step: 41980   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:06,715-Speed 2627.31 samples/sec   Loss 14.1904   LearningRate 0.0901   Epoch: 1   Global Step: 41990   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:10,627-Speed 2618.76 samples/sec   Loss 14.2608   LearningRate 0.0901   Epoch: 1   Global Step: 42000   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:14,530-Speed 2624.38 samples/sec   Loss 14.2974   LearningRate 0.0901   Epoch: 1   Global Step: 42010   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:18,430-Speed 2626.03 samples/sec   Loss 14.2564   LearningRate 0.0901   Epoch: 1   Global Step: 42020   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:22,330-Speed 2626.23 samples/sec   Loss 14.1622   LearningRate 0.0901   Epoch: 1   Global Step: 42030   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:26,232-Speed 2625.14 samples/sec   Loss 14.0508   LearningRate 0.0901   Epoch: 1   Global Step: 42040   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:30,160-Speed 2607.83 samples/sec   Loss 14.3261   LearningRate 0.0901   Epoch: 1   Global Step: 42050   Fp16 Grad Scale: 65536   Required: 89 hours
Training: 2022-04-13 00:27:34,062-Speed 2624.66 samples/sec   Loss 14.1446   LearningRate 0.0901   Epoch: 1   Global Step: 42060   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:27:37,990-Speed 2607.45 samples/sec   Loss 14.1923   LearningRate 0.0901   Epoch: 1   Global Step: 42070   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:27:41,883-Speed 2630.89 samples/sec   Loss 14.4141   LearningRate 0.0901   Epoch: 1   Global Step: 42080   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:27:45,813-Speed 2606.97 samples/sec   Loss 14.2172   LearningRate 0.0901   Epoch: 1   Global Step: 42090   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:27:49,706-Speed 2631.28 samples/sec   Loss 14.3808   LearningRate 0.0901   Epoch: 1   Global Step: 42100   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:27:53,614-Speed 2620.82 samples/sec   Loss 14.1770   LearningRate 0.0901   Epoch: 1   Global Step: 42110   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:27:57,530-Speed 2616.25 samples/sec   Loss 14.0886   LearningRate 0.0901   Epoch: 1   Global Step: 42120   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:28:01,434-Speed 2623.64 samples/sec   Loss 14.1710   LearningRate 0.0901   Epoch: 1   Global Step: 42130   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:28:05,331-Speed 2627.72 samples/sec   Loss 14.2679   LearningRate 0.0901   Epoch: 1   Global Step: 42140   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:28:09,229-Speed 2627.43 samples/sec   Loss 14.2072   LearningRate 0.0901   Epoch: 1   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 89 hours
Training: 2022-04-13 00:28:13,129-Speed 2627.05 samples/sec   Loss 14.2125   LearningRate 0.0901   Epoch: 1   Global Step: 42160   Fp16 Grad Scale: 262144   Required: 89 hours
Training: 2022-04-13 00:28:17,027-Speed 2627.30 samples/sec   Loss 14.2393   LearningRate 0.0901   Epoch: 1   Global Step: 42170   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:28:20,934-Speed 2621.71 samples/sec   Loss 14.2534   LearningRate 0.0901   Epoch: 1   Global Step: 42180   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:28:24,854-Speed 2613.15 samples/sec   Loss 14.3046   LearningRate 0.0901   Epoch: 1   Global Step: 42190   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:28:28,754-Speed 2626.20 samples/sec   Loss 14.3178   LearningRate 0.0901   Epoch: 1   Global Step: 42200   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:28:32,642-Speed 2634.89 samples/sec   Loss 14.1644   LearningRate 0.0901   Epoch: 1   Global Step: 42210   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:28:36,564-Speed 2611.52 samples/sec   Loss 14.2444   LearningRate 0.0901   Epoch: 1   Global Step: 42220   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:28:40,468-Speed 2623.70 samples/sec   Loss 14.2661   LearningRate 0.0901   Epoch: 1   Global Step: 42230   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:28:44,393-Speed 2609.18 samples/sec   Loss 14.4298   LearningRate 0.0901   Epoch: 1   Global Step: 42240   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:28:48,435-Speed 2534.55 samples/sec   Loss 14.2135   LearningRate 0.0901   Epoch: 1   Global Step: 42250   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:28:52,335-Speed 2625.80 samples/sec   Loss 14.3638   LearningRate 0.0901   Epoch: 1   Global Step: 42260   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:28:56,230-Speed 2630.44 samples/sec   Loss 14.1470   LearningRate 0.0901   Epoch: 1   Global Step: 42270   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:00,126-Speed 2628.43 samples/sec   Loss 14.3219   LearningRate 0.0901   Epoch: 1   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:04,034-Speed 2621.30 samples/sec   Loss 14.4338   LearningRate 0.0901   Epoch: 1   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:07,933-Speed 2626.81 samples/sec   Loss 14.3233   LearningRate 0.0901   Epoch: 1   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:11,837-Speed 2623.26 samples/sec   Loss 14.1940   LearningRate 0.0901   Epoch: 1   Global Step: 42310   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:29:15,718-Speed 2639.14 samples/sec   Loss 14.2667   LearningRate 0.0901   Epoch: 1   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:19,617-Speed 2626.81 samples/sec   Loss 14.2569   LearningRate 0.0901   Epoch: 1   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:23,520-Speed 2624.85 samples/sec   Loss 14.2578   LearningRate 0.0901   Epoch: 1   Global Step: 42340   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:27,420-Speed 2626.55 samples/sec   Loss 14.3254   LearningRate 0.0901   Epoch: 1   Global Step: 42350   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:31,371-Speed 2592.26 samples/sec   Loss 14.2271   LearningRate 0.0900   Epoch: 1   Global Step: 42360   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:35,278-Speed 2621.53 samples/sec   Loss 14.4185   LearningRate 0.0900   Epoch: 1   Global Step: 42370   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:39,178-Speed 2626.46 samples/sec   Loss 14.3556   LearningRate 0.0900   Epoch: 1   Global Step: 42380   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:43,079-Speed 2625.32 samples/sec   Loss 14.2325   LearningRate 0.0900   Epoch: 1   Global Step: 42390   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:46,979-Speed 2626.07 samples/sec   Loss 14.3450   LearningRate 0.0900   Epoch: 1   Global Step: 42400   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:50,880-Speed 2626.20 samples/sec   Loss 14.2357   LearningRate 0.0900   Epoch: 1   Global Step: 42410   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:29:54,779-Speed 2627.20 samples/sec   Loss 14.3928   LearningRate 0.0900   Epoch: 1   Global Step: 42420   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:29:58,695-Speed 2615.38 samples/sec   Loss 14.2541   LearningRate 0.0900   Epoch: 1   Global Step: 42430   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:30:02,713-Speed 2549.56 samples/sec   Loss 14.2552   LearningRate 0.0900   Epoch: 1   Global Step: 42440   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:30:06,593-Speed 2639.31 samples/sec   Loss 14.2709   LearningRate 0.0900   Epoch: 1   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:10,517-Speed 2610.29 samples/sec   Loss 14.1042   LearningRate 0.0900   Epoch: 1   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:14,407-Speed 2633.00 samples/sec   Loss 14.2386   LearningRate 0.0900   Epoch: 1   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:18,299-Speed 2632.35 samples/sec   Loss 14.1692   LearningRate 0.0900   Epoch: 1   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:22,191-Speed 2631.12 samples/sec   Loss 14.2307   LearningRate 0.0900   Epoch: 1   Global Step: 42490   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:26,092-Speed 2626.01 samples/sec   Loss 14.2770   LearningRate 0.0900   Epoch: 1   Global Step: 42500   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:29,986-Speed 2630.49 samples/sec   Loss 14.2275   LearningRate 0.0900   Epoch: 1   Global Step: 42510   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:33,882-Speed 2629.07 samples/sec   Loss 14.2858   LearningRate 0.0900   Epoch: 1   Global Step: 42520   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:37,781-Speed 2626.86 samples/sec   Loss 14.3700   LearningRate 0.0900   Epoch: 1   Global Step: 42530   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:41,676-Speed 2630.07 samples/sec   Loss 14.2068   LearningRate 0.0900   Epoch: 1   Global Step: 42540   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:45,552-Speed 2642.49 samples/sec   Loss 14.3204   LearningRate 0.0900   Epoch: 1   Global Step: 42550   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:49,453-Speed 2625.40 samples/sec   Loss 14.1017   LearningRate 0.0900   Epoch: 1   Global Step: 42560   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:53,441-Speed 2568.63 samples/sec   Loss 14.3419   LearningRate 0.0900   Epoch: 1   Global Step: 42570   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:30:57,344-Speed 2623.97 samples/sec   Loss 14.3108   LearningRate 0.0900   Epoch: 1   Global Step: 42580   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:01,243-Speed 2626.95 samples/sec   Loss 14.2347   LearningRate 0.0900   Epoch: 1   Global Step: 42590   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:05,140-Speed 2628.43 samples/sec   Loss 14.3068   LearningRate 0.0900   Epoch: 1   Global Step: 42600   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:09,040-Speed 2626.13 samples/sec   Loss 14.1695   LearningRate 0.0900   Epoch: 1   Global Step: 42610   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:12,951-Speed 2618.58 samples/sec   Loss 14.3744   LearningRate 0.0900   Epoch: 1   Global Step: 42620   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:16,846-Speed 2629.73 samples/sec   Loss 14.2672   LearningRate 0.0900   Epoch: 1   Global Step: 42630   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:20,763-Speed 2615.02 samples/sec   Loss 14.1949   LearningRate 0.0900   Epoch: 1   Global Step: 42640   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:24,663-Speed 2626.48 samples/sec   Loss 14.1628   LearningRate 0.0900   Epoch: 1   Global Step: 42650   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:31:28,560-Speed 2628.39 samples/sec   Loss 14.2407   LearningRate 0.0900   Epoch: 1   Global Step: 42660   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:31:32,458-Speed 2627.90 samples/sec   Loss 14.2568   LearningRate 0.0900   Epoch: 1   Global Step: 42670   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:31:36,364-Speed 2622.08 samples/sec   Loss 14.3877   LearningRate 0.0900   Epoch: 1   Global Step: 42680   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:31:40,264-Speed 2626.02 samples/sec   Loss 14.3788   LearningRate 0.0900   Epoch: 1   Global Step: 42690   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:31:44,155-Speed 2632.50 samples/sec   Loss 14.3367   LearningRate 0.0900   Epoch: 1   Global Step: 42700   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:48,060-Speed 2623.16 samples/sec   Loss 14.2055   LearningRate 0.0900   Epoch: 1   Global Step: 42710   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:51,956-Speed 2628.89 samples/sec   Loss 14.3255   LearningRate 0.0900   Epoch: 1   Global Step: 42720   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:55,861-Speed 2622.49 samples/sec   Loss 14.4456   LearningRate 0.0900   Epoch: 1   Global Step: 42730   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:31:59,776-Speed 2616.79 samples/sec   Loss 14.2181   LearningRate 0.0900   Epoch: 1   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:03,688-Speed 2618.01 samples/sec   Loss 14.2413   LearningRate 0.0900   Epoch: 1   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:07,599-Speed 2619.29 samples/sec   Loss 14.0956   LearningRate 0.0900   Epoch: 1   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:11,506-Speed 2621.37 samples/sec   Loss 14.2223   LearningRate 0.0900   Epoch: 1   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:15,420-Speed 2617.09 samples/sec   Loss 14.2207   LearningRate 0.0900   Epoch: 1   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:19,332-Speed 2617.60 samples/sec   Loss 14.3320   LearningRate 0.0899   Epoch: 1   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:23,230-Speed 2628.09 samples/sec   Loss 14.1894   LearningRate 0.0899   Epoch: 1   Global Step: 42800   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:32:27,132-Speed 2625.28 samples/sec   Loss 14.3554   LearningRate 0.0899   Epoch: 1   Global Step: 42810   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:32:31,032-Speed 2626.19 samples/sec   Loss 14.3123   LearningRate 0.0899   Epoch: 1   Global Step: 42820   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:34,946-Speed 2617.13 samples/sec   Loss 14.1706   LearningRate 0.0899   Epoch: 1   Global Step: 42830   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:38,846-Speed 2626.00 samples/sec   Loss 14.0526   LearningRate 0.0899   Epoch: 1   Global Step: 42840   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:42,746-Speed 2626.14 samples/sec   Loss 14.4412   LearningRate 0.0899   Epoch: 1   Global Step: 42850   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:46,644-Speed 2628.20 samples/sec   Loss 14.3272   LearningRate 0.0899   Epoch: 1   Global Step: 42860   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:50,548-Speed 2623.50 samples/sec   Loss 14.3067   LearningRate 0.0899   Epoch: 1   Global Step: 42870   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:54,451-Speed 2623.93 samples/sec   Loss 14.1374   LearningRate 0.0899   Epoch: 1   Global Step: 42880   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:32:58,346-Speed 2630.00 samples/sec   Loss 14.2535   LearningRate 0.0899   Epoch: 1   Global Step: 42890   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:33:02,247-Speed 2625.94 samples/sec   Loss 14.3303   LearningRate 0.0899   Epoch: 1   Global Step: 42900   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:33:06,155-Speed 2620.49 samples/sec   Loss 14.3252   LearningRate 0.0899   Epoch: 1   Global Step: 42910   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:33:10,098-Speed 2597.72 samples/sec   Loss 14.1189   LearningRate 0.0899   Epoch: 1   Global Step: 42920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:33:14,167-Speed 2517.16 samples/sec   Loss 14.2810   LearningRate 0.0899   Epoch: 1   Global Step: 42930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:33:18,216-Speed 2529.30 samples/sec   Loss 14.1850   LearningRate 0.0899   Epoch: 1   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:22,150-Speed 2604.38 samples/sec   Loss 14.3394   LearningRate 0.0899   Epoch: 1   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:26,046-Speed 2628.96 samples/sec   Loss 14.2599   LearningRate 0.0899   Epoch: 1   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:29,963-Speed 2614.65 samples/sec   Loss 14.2217   LearningRate 0.0899   Epoch: 1   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:33,866-Speed 2624.95 samples/sec   Loss 14.1144   LearningRate 0.0899   Epoch: 1   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:37,760-Speed 2629.93 samples/sec   Loss 14.3485   LearningRate 0.0899   Epoch: 1   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:41,665-Speed 2623.09 samples/sec   Loss 14.1954   LearningRate 0.0899   Epoch: 1   Global Step: 43000   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:45,563-Speed 2627.26 samples/sec   Loss 14.1295   LearningRate 0.0899   Epoch: 1   Global Step: 43010   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:49,456-Speed 2630.75 samples/sec   Loss 14.3135   LearningRate 0.0899   Epoch: 1   Global Step: 43020   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:53,352-Speed 2630.48 samples/sec   Loss 14.3028   LearningRate 0.0899   Epoch: 1   Global Step: 43030   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:33:57,271-Speed 2613.45 samples/sec   Loss 14.3439   LearningRate 0.0899   Epoch: 1   Global Step: 43040   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:01,173-Speed 2624.99 samples/sec   Loss 14.1320   LearningRate 0.0899   Epoch: 1   Global Step: 43050   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:05,094-Speed 2611.61 samples/sec   Loss 14.2306   LearningRate 0.0899   Epoch: 1   Global Step: 43060   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:08,993-Speed 2627.28 samples/sec   Loss 14.4273   LearningRate 0.0899   Epoch: 1   Global Step: 43070   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:12,889-Speed 2629.28 samples/sec   Loss 14.1769   LearningRate 0.0899   Epoch: 1   Global Step: 43080   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:16,793-Speed 2623.53 samples/sec   Loss 14.1576   LearningRate 0.0899   Epoch: 1   Global Step: 43090   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:20,698-Speed 2623.63 samples/sec   Loss 14.3131   LearningRate 0.0899   Epoch: 1   Global Step: 43100   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:24,603-Speed 2623.10 samples/sec   Loss 14.1590   LearningRate 0.0899   Epoch: 1   Global Step: 43110   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:28,497-Speed 2630.38 samples/sec   Loss 14.2344   LearningRate 0.0899   Epoch: 1   Global Step: 43120   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:32,395-Speed 2627.48 samples/sec   Loss 14.3280   LearningRate 0.0899   Epoch: 1   Global Step: 43130   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:36,294-Speed 2627.22 samples/sec   Loss 14.1777   LearningRate 0.0899   Epoch: 1   Global Step: 43140   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:34:40,193-Speed 2627.20 samples/sec   Loss 14.2386   LearningRate 0.0899   Epoch: 1   Global Step: 43150   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:34:44,083-Speed 2632.83 samples/sec   Loss 14.3029   LearningRate 0.0899   Epoch: 1   Global Step: 43160   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:47,976-Speed 2630.21 samples/sec   Loss 14.3219   LearningRate 0.0899   Epoch: 1   Global Step: 43170   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:51,872-Speed 2629.61 samples/sec   Loss 14.0983   LearningRate 0.0899   Epoch: 1   Global Step: 43180   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:55,775-Speed 2625.05 samples/sec   Loss 14.2041   LearningRate 0.0899   Epoch: 1   Global Step: 43190   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:34:59,674-Speed 2626.28 samples/sec   Loss 14.2866   LearningRate 0.0899   Epoch: 1   Global Step: 43200   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:03,581-Speed 2621.91 samples/sec   Loss 14.2421   LearningRate 0.0899   Epoch: 1   Global Step: 43210   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:07,487-Speed 2622.33 samples/sec   Loss 14.1469   LearningRate 0.0899   Epoch: 1   Global Step: 43220   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:11,397-Speed 2619.72 samples/sec   Loss 14.4397   LearningRate 0.0898   Epoch: 1   Global Step: 43230   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:15,298-Speed 2625.50 samples/sec   Loss 14.1559   LearningRate 0.0898   Epoch: 1   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:19,201-Speed 2624.65 samples/sec   Loss 14.2467   LearningRate 0.0898   Epoch: 1   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:23,094-Speed 2630.64 samples/sec   Loss 14.2880   LearningRate 0.0898   Epoch: 1   Global Step: 43260   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:35:27,027-Speed 2605.11 samples/sec   Loss 14.3539   LearningRate 0.0898   Epoch: 1   Global Step: 43270   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:35:30,906-Speed 2640.05 samples/sec   Loss 14.1918   LearningRate 0.0898   Epoch: 1   Global Step: 43280   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:34,800-Speed 2630.87 samples/sec   Loss 14.2220   LearningRate 0.0898   Epoch: 1   Global Step: 43290   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:38,695-Speed 2629.57 samples/sec   Loss 14.2277   LearningRate 0.0898   Epoch: 1   Global Step: 43300   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:42,588-Speed 2630.71 samples/sec   Loss 14.1834   LearningRate 0.0898   Epoch: 1   Global Step: 43310   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:46,490-Speed 2624.79 samples/sec   Loss 14.2169   LearningRate 0.0898   Epoch: 1   Global Step: 43320   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:50,399-Speed 2620.49 samples/sec   Loss 14.1616   LearningRate 0.0898   Epoch: 1   Global Step: 43330   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:54,299-Speed 2626.24 samples/sec   Loss 14.3153   LearningRate 0.0898   Epoch: 1   Global Step: 43340   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:35:58,220-Speed 2612.45 samples/sec   Loss 14.2782   LearningRate 0.0898   Epoch: 1   Global Step: 43350   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:02,114-Speed 2630.22 samples/sec   Loss 14.3177   LearningRate 0.0898   Epoch: 1   Global Step: 43360   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:06,014-Speed 2626.92 samples/sec   Loss 14.1046   LearningRate 0.0898   Epoch: 1   Global Step: 43370   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:09,912-Speed 2627.54 samples/sec   Loss 14.3640   LearningRate 0.0898   Epoch: 1   Global Step: 43380   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:36:13,797-Speed 2635.70 samples/sec   Loss 14.2367   LearningRate 0.0898   Epoch: 1   Global Step: 43390   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:17,698-Speed 2625.76 samples/sec   Loss 14.3066   LearningRate 0.0898   Epoch: 1   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:21,608-Speed 2619.74 samples/sec   Loss 14.1429   LearningRate 0.0898   Epoch: 1   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:25,503-Speed 2630.40 samples/sec   Loss 14.1471   LearningRate 0.0898   Epoch: 1   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:29,426-Speed 2610.70 samples/sec   Loss 14.2751   LearningRate 0.0898   Epoch: 1   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:33,334-Speed 2620.97 samples/sec   Loss 14.1867   LearningRate 0.0898   Epoch: 1   Global Step: 43440   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:37,236-Speed 2625.30 samples/sec   Loss 14.3910   LearningRate 0.0898   Epoch: 1   Global Step: 43450   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:41,174-Speed 2601.28 samples/sec   Loss 14.2419   LearningRate 0.0898   Epoch: 1   Global Step: 43460   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:45,070-Speed 2628.47 samples/sec   Loss 14.3381   LearningRate 0.0898   Epoch: 1   Global Step: 43470   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:48,998-Speed 2608.18 samples/sec   Loss 14.2744   LearningRate 0.0898   Epoch: 1   Global Step: 43480   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:36:52,897-Speed 2627.37 samples/sec   Loss 14.1531   LearningRate 0.0898   Epoch: 1   Global Step: 43490   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:36:56,808-Speed 2618.65 samples/sec   Loss 14.3452   LearningRate 0.0898   Epoch: 1   Global Step: 43500   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:00,720-Speed 2618.20 samples/sec   Loss 14.2471   LearningRate 0.0898   Epoch: 1   Global Step: 43510   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:04,629-Speed 2620.32 samples/sec   Loss 14.2006   LearningRate 0.0898   Epoch: 1   Global Step: 43520   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:08,531-Speed 2624.87 samples/sec   Loss 14.1348   LearningRate 0.0898   Epoch: 1   Global Step: 43530   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:12,439-Speed 2620.48 samples/sec   Loss 14.2685   LearningRate 0.0898   Epoch: 1   Global Step: 43540   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:16,346-Speed 2621.84 samples/sec   Loss 14.1817   LearningRate 0.0898   Epoch: 1   Global Step: 43550   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:20,248-Speed 2624.87 samples/sec   Loss 14.1899   LearningRate 0.0898   Epoch: 1   Global Step: 43560   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:24,148-Speed 2626.86 samples/sec   Loss 14.1109   LearningRate 0.0898   Epoch: 1   Global Step: 43570   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:28,045-Speed 2628.15 samples/sec   Loss 14.1927   LearningRate 0.0898   Epoch: 1   Global Step: 43580   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:31,921-Speed 2642.42 samples/sec   Loss 14.0843   LearningRate 0.0898   Epoch: 1   Global Step: 43590   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:35,826-Speed 2623.05 samples/sec   Loss 14.0538   LearningRate 0.0898   Epoch: 1   Global Step: 43600   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:39,724-Speed 2627.64 samples/sec   Loss 14.3613   LearningRate 0.0898   Epoch: 1   Global Step: 43610   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:43,616-Speed 2631.48 samples/sec   Loss 14.1764   LearningRate 0.0898   Epoch: 1   Global Step: 43620   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:47,511-Speed 2629.40 samples/sec   Loss 14.2278   LearningRate 0.0898   Epoch: 1   Global Step: 43630   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:51,404-Speed 2631.09 samples/sec   Loss 14.2897   LearningRate 0.0898   Epoch: 1   Global Step: 43640   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:37:55,288-Speed 2636.84 samples/sec   Loss 14.1984   LearningRate 0.0898   Epoch: 1   Global Step: 43650   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:37:59,188-Speed 2626.49 samples/sec   Loss 14.1597   LearningRate 0.0898   Epoch: 1   Global Step: 43660   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:03,102-Speed 2616.87 samples/sec   Loss 14.1144   LearningRate 0.0897   Epoch: 1   Global Step: 43670   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:06,998-Speed 2628.66 samples/sec   Loss 14.1629   LearningRate 0.0897   Epoch: 1   Global Step: 43680   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:10,892-Speed 2630.18 samples/sec   Loss 14.1525   LearningRate 0.0897   Epoch: 1   Global Step: 43690   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:14,790-Speed 2628.34 samples/sec   Loss 14.0507   LearningRate 0.0897   Epoch: 1   Global Step: 43700   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:18,690-Speed 2626.27 samples/sec   Loss 14.2418   LearningRate 0.0897   Epoch: 1   Global Step: 43710   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:22,588-Speed 2627.42 samples/sec   Loss 14.1694   LearningRate 0.0897   Epoch: 1   Global Step: 43720   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:26,492-Speed 2623.40 samples/sec   Loss 14.2737   LearningRate 0.0897   Epoch: 1   Global Step: 43730   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:30,392-Speed 2627.21 samples/sec   Loss 14.1572   LearningRate 0.0897   Epoch: 1   Global Step: 43740   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:34,302-Speed 2619.06 samples/sec   Loss 14.0977   LearningRate 0.0897   Epoch: 1   Global Step: 43750   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:38:38,201-Speed 2626.63 samples/sec   Loss 14.2009   LearningRate 0.0897   Epoch: 1   Global Step: 43760   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:38:42,101-Speed 2626.27 samples/sec   Loss 14.2327   LearningRate 0.0897   Epoch: 1   Global Step: 43770   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:38:46,021-Speed 2613.53 samples/sec   Loss 14.1303   LearningRate 0.0897   Epoch: 1   Global Step: 43780   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:38:49,897-Speed 2642.27 samples/sec   Loss 14.2101   LearningRate 0.0897   Epoch: 1   Global Step: 43790   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:53,793-Speed 2629.05 samples/sec   Loss 14.3241   LearningRate 0.0897   Epoch: 1   Global Step: 43800   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:38:57,689-Speed 2628.83 samples/sec   Loss 14.1126   LearningRate 0.0897   Epoch: 1   Global Step: 43810   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:01,597-Speed 2621.23 samples/sec   Loss 14.3491   LearningRate 0.0897   Epoch: 1   Global Step: 43820   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:05,495-Speed 2627.30 samples/sec   Loss 14.4129   LearningRate 0.0897   Epoch: 1   Global Step: 43830   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:09,395-Speed 2625.74 samples/sec   Loss 14.1026   LearningRate 0.0897   Epoch: 1   Global Step: 43840   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:13,298-Speed 2624.10 samples/sec   Loss 14.0921   LearningRate 0.0897   Epoch: 1   Global Step: 43850   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:17,208-Speed 2619.92 samples/sec   Loss 14.1134   LearningRate 0.0897   Epoch: 1   Global Step: 43860   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:21,109-Speed 2626.30 samples/sec   Loss 14.2727   LearningRate 0.0897   Epoch: 1   Global Step: 43870   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:25,004-Speed 2629.43 samples/sec   Loss 14.2783   LearningRate 0.0897   Epoch: 1   Global Step: 43880   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:28,886-Speed 2638.30 samples/sec   Loss 14.1542   LearningRate 0.0897   Epoch: 1   Global Step: 43890   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:32,790-Speed 2623.94 samples/sec   Loss 14.2273   LearningRate 0.0897   Epoch: 1   Global Step: 43900   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:36,690-Speed 2626.16 samples/sec   Loss 14.0842   LearningRate 0.0897   Epoch: 1   Global Step: 43910   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:40,588-Speed 2626.99 samples/sec   Loss 14.0363   LearningRate 0.0897   Epoch: 1   Global Step: 43920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:44,480-Speed 2632.35 samples/sec   Loss 14.3011   LearningRate 0.0897   Epoch: 1   Global Step: 43930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:48,378-Speed 2627.28 samples/sec   Loss 14.3026   LearningRate 0.0897   Epoch: 1   Global Step: 43940   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:52,284-Speed 2623.15 samples/sec   Loss 14.1103   LearningRate 0.0897   Epoch: 1   Global Step: 43950   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:39:56,179-Speed 2629.99 samples/sec   Loss 14.2152   LearningRate 0.0897   Epoch: 1   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:00,075-Speed 2628.86 samples/sec   Loss 14.0698   LearningRate 0.0897   Epoch: 1   Global Step: 43970   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:03,987-Speed 2618.26 samples/sec   Loss 14.1183   LearningRate 0.0897   Epoch: 1   Global Step: 43980   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:07,884-Speed 2628.17 samples/sec   Loss 14.1342   LearningRate 0.0897   Epoch: 1   Global Step: 43990   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:40:11,764-Speed 2639.61 samples/sec   Loss 14.2186   LearningRate 0.0897   Epoch: 1   Global Step: 44000   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:15,661-Speed 2628.35 samples/sec   Loss 14.1071   LearningRate 0.0897   Epoch: 1   Global Step: 44010   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:19,558-Speed 2627.81 samples/sec   Loss 14.2240   LearningRate 0.0897   Epoch: 1   Global Step: 44020   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:23,465-Speed 2621.64 samples/sec   Loss 14.1691   LearningRate 0.0897   Epoch: 1   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:27,361-Speed 2629.32 samples/sec   Loss 14.0080   LearningRate 0.0897   Epoch: 1   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:31,258-Speed 2628.33 samples/sec   Loss 14.2975   LearningRate 0.0897   Epoch: 1   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:35,153-Speed 2629.37 samples/sec   Loss 14.1106   LearningRate 0.0897   Epoch: 1   Global Step: 44060   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:39,052-Speed 2627.41 samples/sec   Loss 14.0278   LearningRate 0.0897   Epoch: 1   Global Step: 44070   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:42,968-Speed 2614.86 samples/sec   Loss 14.2041   LearningRate 0.0897   Epoch: 1   Global Step: 44080   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:46,870-Speed 2625.08 samples/sec   Loss 14.1301   LearningRate 0.0897   Epoch: 1   Global Step: 44090   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:40:50,775-Speed 2622.82 samples/sec   Loss 14.1585   LearningRate 0.0897   Epoch: 1   Global Step: 44100   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:40:54,681-Speed 2621.99 samples/sec   Loss 14.1038   LearningRate 0.0896   Epoch: 1   Global Step: 44110   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:40:58,592-Speed 2619.32 samples/sec   Loss 14.2062   LearningRate 0.0896   Epoch: 1   Global Step: 44120   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:41:02,498-Speed 2621.52 samples/sec   Loss 14.0994   LearningRate 0.0896   Epoch: 1   Global Step: 44130   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:41:06,398-Speed 2626.63 samples/sec   Loss 13.9660   LearningRate 0.0896   Epoch: 1   Global Step: 44140   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:41:10,297-Speed 2627.43 samples/sec   Loss 14.2189   LearningRate 0.0896   Epoch: 1   Global Step: 44150   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:41:14,192-Speed 2629.32 samples/sec   Loss 14.1323   LearningRate 0.0896   Epoch: 1   Global Step: 44160   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:18,088-Speed 2629.09 samples/sec   Loss 14.1202   LearningRate 0.0896   Epoch: 1   Global Step: 44170   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:21,984-Speed 2628.38 samples/sec   Loss 14.1871   LearningRate 0.0896   Epoch: 1   Global Step: 44180   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:25,885-Speed 2626.21 samples/sec   Loss 14.1415   LearningRate 0.0896   Epoch: 1   Global Step: 44190   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:29,792-Speed 2621.23 samples/sec   Loss 14.1060   LearningRate 0.0896   Epoch: 1   Global Step: 44200   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:33,685-Speed 2630.62 samples/sec   Loss 14.2056   LearningRate 0.0896   Epoch: 1   Global Step: 44210   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:37,583-Speed 2627.49 samples/sec   Loss 14.1757   LearningRate 0.0896   Epoch: 1   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:41,483-Speed 2626.40 samples/sec   Loss 14.1911   LearningRate 0.0896   Epoch: 1   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:41:45,373-Speed 2633.17 samples/sec   Loss 14.1268   LearningRate 0.0896   Epoch: 1   Global Step: 44240   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:41:49,267-Speed 2630.61 samples/sec   Loss 14.2284   LearningRate 0.0896   Epoch: 1   Global Step: 44250   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:41:53,161-Speed 2630.00 samples/sec   Loss 14.1504   LearningRate 0.0896   Epoch: 1   Global Step: 44260   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:41:57,049-Speed 2634.41 samples/sec   Loss 14.1640   LearningRate 0.0896   Epoch: 1   Global Step: 44270   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:42:00,945-Speed 2628.71 samples/sec   Loss 14.2144   LearningRate 0.0896   Epoch: 1   Global Step: 44280   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:42:04,842-Speed 2628.52 samples/sec   Loss 14.3070   LearningRate 0.0896   Epoch: 1   Global Step: 44290   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:42:08,735-Speed 2631.01 samples/sec   Loss 14.2162   LearningRate 0.0896   Epoch: 1   Global Step: 44300   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:42:12,631-Speed 2628.35 samples/sec   Loss 14.2101   LearningRate 0.0896   Epoch: 1   Global Step: 44310   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:42:16,530-Speed 2627.69 samples/sec   Loss 14.0682   LearningRate 0.0896   Epoch: 1   Global Step: 44320   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:42:20,428-Speed 2627.65 samples/sec   Loss 14.2800   LearningRate 0.0896   Epoch: 1   Global Step: 44330   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:42:24,327-Speed 2626.79 samples/sec   Loss 14.0063   LearningRate 0.0896   Epoch: 1   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:28,223-Speed 2628.68 samples/sec   Loss 14.2435   LearningRate 0.0896   Epoch: 1   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:32,119-Speed 2629.16 samples/sec   Loss 14.1065   LearningRate 0.0896   Epoch: 1   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:36,014-Speed 2629.63 samples/sec   Loss 14.1362   LearningRate 0.0896   Epoch: 1   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:40,050-Speed 2537.40 samples/sec   Loss 14.1402   LearningRate 0.0896   Epoch: 1   Global Step: 44380   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:44,047-Speed 2562.77 samples/sec   Loss 14.1172   LearningRate 0.0896   Epoch: 1   Global Step: 44390   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:47,942-Speed 2629.75 samples/sec   Loss 14.2123   LearningRate 0.0896   Epoch: 1   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:51,843-Speed 2625.72 samples/sec   Loss 14.1357   LearningRate 0.0896   Epoch: 1   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:55,795-Speed 2591.74 samples/sec   Loss 14.0941   LearningRate 0.0896   Epoch: 1   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:42:59,697-Speed 2624.86 samples/sec   Loss 14.4016   LearningRate 0.0896   Epoch: 1   Global Step: 44430   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:43:03,593-Speed 2628.87 samples/sec   Loss 13.9794   LearningRate 0.0896   Epoch: 1   Global Step: 44440   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:43:07,488-Speed 2629.28 samples/sec   Loss 14.1265   LearningRate 0.0896   Epoch: 1   Global Step: 44450   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:43:11,382-Speed 2630.19 samples/sec   Loss 14.2311   LearningRate 0.0896   Epoch: 1   Global Step: 44460   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:43:15,281-Speed 2627.04 samples/sec   Loss 14.2226   LearningRate 0.0896   Epoch: 1   Global Step: 44470   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:43:19,156-Speed 2642.68 samples/sec   Loss 14.1379   LearningRate 0.0896   Epoch: 1   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:23,064-Speed 2621.48 samples/sec   Loss 14.2903   LearningRate 0.0896   Epoch: 1   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:26,959-Speed 2628.98 samples/sec   Loss 14.0069   LearningRate 0.0896   Epoch: 1   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:30,857-Speed 2628.58 samples/sec   Loss 14.1477   LearningRate 0.0896   Epoch: 1   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:34,751-Speed 2629.96 samples/sec   Loss 14.0001   LearningRate 0.0896   Epoch: 1   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:38,646-Speed 2629.63 samples/sec   Loss 14.1308   LearningRate 0.0896   Epoch: 1   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:42,542-Speed 2628.46 samples/sec   Loss 14.1853   LearningRate 0.0896   Epoch: 1   Global Step: 44540   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:46,442-Speed 2626.82 samples/sec   Loss 14.1651   LearningRate 0.0895   Epoch: 1   Global Step: 44550   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:50,360-Speed 2613.66 samples/sec   Loss 14.3643   LearningRate 0.0895   Epoch: 1   Global Step: 44560   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:54,259-Speed 2627.06 samples/sec   Loss 14.2065   LearningRate 0.0895   Epoch: 1   Global Step: 44570   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:43:58,155-Speed 2628.62 samples/sec   Loss 14.0347   LearningRate 0.0895   Epoch: 1   Global Step: 44580   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:02,054-Speed 2627.81 samples/sec   Loss 14.2155   LearningRate 0.0895   Epoch: 1   Global Step: 44590   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:05,953-Speed 2626.55 samples/sec   Loss 14.2216   LearningRate 0.0895   Epoch: 1   Global Step: 44600   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:09,855-Speed 2624.65 samples/sec   Loss 14.0560   LearningRate 0.0895   Epoch: 1   Global Step: 44610   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:13,750-Speed 2629.61 samples/sec   Loss 14.0690   LearningRate 0.0895   Epoch: 1   Global Step: 44620   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:17,651-Speed 2625.69 samples/sec   Loss 14.1374   LearningRate 0.0895   Epoch: 1   Global Step: 44630   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:21,573-Speed 2612.00 samples/sec   Loss 14.0100   LearningRate 0.0895   Epoch: 1   Global Step: 44640   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:25,503-Speed 2606.19 samples/sec   Loss 14.0642   LearningRate 0.0895   Epoch: 1   Global Step: 44650   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:29,399-Speed 2628.84 samples/sec   Loss 14.1200   LearningRate 0.0895   Epoch: 1   Global Step: 44660   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:33,314-Speed 2616.50 samples/sec   Loss 14.1109   LearningRate 0.0895   Epoch: 1   Global Step: 44670   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:37,287-Speed 2577.71 samples/sec   Loss 14.2006   LearningRate 0.0895   Epoch: 1   Global Step: 44680   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:44:41,172-Speed 2636.48 samples/sec   Loss 14.1218   LearningRate 0.0895   Epoch: 1   Global Step: 44690   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:45,071-Speed 2626.93 samples/sec   Loss 14.0673   LearningRate 0.0895   Epoch: 1   Global Step: 44700   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:48,966-Speed 2629.37 samples/sec   Loss 14.1888   LearningRate 0.0895   Epoch: 1   Global Step: 44710   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:52,861-Speed 2630.03 samples/sec   Loss 14.0840   LearningRate 0.0895   Epoch: 1   Global Step: 44720   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:44:56,773-Speed 2617.79 samples/sec   Loss 14.1620   LearningRate 0.0895   Epoch: 1   Global Step: 44730   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:00,672-Speed 2627.66 samples/sec   Loss 14.1174   LearningRate 0.0895   Epoch: 1   Global Step: 44740   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:04,592-Speed 2612.46 samples/sec   Loss 14.1008   LearningRate 0.0895   Epoch: 1   Global Step: 44750   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:08,501-Speed 2620.16 samples/sec   Loss 14.2388   LearningRate 0.0895   Epoch: 1   Global Step: 44760   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:12,400-Speed 2626.85 samples/sec   Loss 14.1629   LearningRate 0.0895   Epoch: 1   Global Step: 44770   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:16,293-Speed 2631.22 samples/sec   Loss 14.1898   LearningRate 0.0895   Epoch: 1   Global Step: 44780   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:20,185-Speed 2631.44 samples/sec   Loss 13.9806   LearningRate 0.0895   Epoch: 1   Global Step: 44790   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:45:24,081-Speed 2629.28 samples/sec   Loss 14.1676   LearningRate 0.0895   Epoch: 1   Global Step: 44800   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:45:27,983-Speed 2624.73 samples/sec   Loss 14.2688   LearningRate 0.0895   Epoch: 1   Global Step: 44810   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:45:31,879-Speed 2629.71 samples/sec   Loss 14.2666   LearningRate 0.0895   Epoch: 1   Global Step: 44820   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:45:35,779-Speed 2626.03 samples/sec   Loss 14.0374   LearningRate 0.0895   Epoch: 1   Global Step: 44830   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:45:39,691-Speed 2617.69 samples/sec   Loss 14.1138   LearningRate 0.0895   Epoch: 1   Global Step: 44840   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:45:43,594-Speed 2624.38 samples/sec   Loss 14.1661   LearningRate 0.0895   Epoch: 1   Global Step: 44850   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:45:47,496-Speed 2625.19 samples/sec   Loss 14.1152   LearningRate 0.0895   Epoch: 1   Global Step: 44860   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:51,398-Speed 2625.26 samples/sec   Loss 14.0037   LearningRate 0.0895   Epoch: 1   Global Step: 44870   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:55,293-Speed 2629.79 samples/sec   Loss 14.2573   LearningRate 0.0895   Epoch: 1   Global Step: 44880   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:45:59,191-Speed 2627.66 samples/sec   Loss 14.0662   LearningRate 0.0895   Epoch: 1   Global Step: 44890   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:46:03,089-Speed 2626.94 samples/sec   Loss 14.3061   LearningRate 0.0895   Epoch: 1   Global Step: 44900   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:46:06,987-Speed 2627.94 samples/sec   Loss 14.0926   LearningRate 0.0895   Epoch: 1   Global Step: 44910   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:46:10,882-Speed 2629.07 samples/sec   Loss 13.9796   LearningRate 0.0895   Epoch: 1   Global Step: 44920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:46:14,964-Speed 2509.65 samples/sec   Loss 14.1463   LearningRate 0.0895   Epoch: 1   Global Step: 44930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:46:18,858-Speed 2629.54 samples/sec   Loss 14.1568   LearningRate 0.0895   Epoch: 1   Global Step: 44940   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:46:22,766-Speed 2621.78 samples/sec   Loss 14.2225   LearningRate 0.0895   Epoch: 1   Global Step: 44950   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:46:26,656-Speed 2633.10 samples/sec   Loss 14.1137   LearningRate 0.0895   Epoch: 1   Global Step: 44960   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:46:30,548-Speed 2631.76 samples/sec   Loss 14.0639   LearningRate 0.0895   Epoch: 1   Global Step: 44970   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:46:34,440-Speed 2631.36 samples/sec   Loss 14.1617   LearningRate 0.0894   Epoch: 1   Global Step: 44980   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:46:38,334-Speed 2629.98 samples/sec   Loss 14.1871   LearningRate 0.0894   Epoch: 1   Global Step: 44990   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:46:42,239-Speed 2622.83 samples/sec   Loss 14.1104   LearningRate 0.0894   Epoch: 1   Global Step: 45000   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:46:46,145-Speed 2622.02 samples/sec   Loss 14.2141   LearningRate 0.0894   Epoch: 1   Global Step: 45010   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:46:50,020-Speed 2643.29 samples/sec   Loss 14.0346   LearningRate 0.0894   Epoch: 1   Global Step: 45020   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:46:53,932-Speed 2618.57 samples/sec   Loss 14.1897   LearningRate 0.0894   Epoch: 1   Global Step: 45030   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:46:57,832-Speed 2626.64 samples/sec   Loss 14.0096   LearningRate 0.0894   Epoch: 1   Global Step: 45040   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:01,726-Speed 2630.22 samples/sec   Loss 14.0848   LearningRate 0.0894   Epoch: 1   Global Step: 45050   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:05,615-Speed 2633.17 samples/sec   Loss 14.1921   LearningRate 0.0894   Epoch: 1   Global Step: 45060   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:09,511-Speed 2628.92 samples/sec   Loss 14.1493   LearningRate 0.0894   Epoch: 1   Global Step: 45070   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:13,410-Speed 2627.26 samples/sec   Loss 14.1778   LearningRate 0.0894   Epoch: 1   Global Step: 45080   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:17,312-Speed 2624.24 samples/sec   Loss 14.0868   LearningRate 0.0894   Epoch: 1   Global Step: 45090   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:21,208-Speed 2629.33 samples/sec   Loss 14.1701   LearningRate 0.0894   Epoch: 1   Global Step: 45100   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:25,110-Speed 2624.83 samples/sec   Loss 13.9975   LearningRate 0.0894   Epoch: 1   Global Step: 45110   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:47:29,001-Speed 2632.57 samples/sec   Loss 14.2355   LearningRate 0.0894   Epoch: 1   Global Step: 45120   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:47:32,906-Speed 2622.66 samples/sec   Loss 14.0604   LearningRate 0.0894   Epoch: 1   Global Step: 45130   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:47:36,812-Speed 2622.33 samples/sec   Loss 14.0266   LearningRate 0.0894   Epoch: 1   Global Step: 45140   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:47:40,705-Speed 2630.80 samples/sec   Loss 14.1487   LearningRate 0.0894   Epoch: 1   Global Step: 45150   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:47:44,613-Speed 2620.73 samples/sec   Loss 14.0613   LearningRate 0.0894   Epoch: 1   Global Step: 45160   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:47:48,468-Speed 2656.54 samples/sec   Loss 14.1369   LearningRate 0.0894   Epoch: 1   Global Step: 45170   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:47:52,366-Speed 2627.85 samples/sec   Loss 14.1352   LearningRate 0.0894   Epoch: 1   Global Step: 45180   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:47:56,257-Speed 2632.38 samples/sec   Loss 14.1681   LearningRate 0.0894   Epoch: 1   Global Step: 45190   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:00,159-Speed 2625.14 samples/sec   Loss 14.0315   LearningRate 0.0894   Epoch: 1   Global Step: 45200   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:04,059-Speed 2626.32 samples/sec   Loss 14.1681   LearningRate 0.0894   Epoch: 1   Global Step: 45210   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:07,964-Speed 2622.60 samples/sec   Loss 14.1420   LearningRate 0.0894   Epoch: 1   Global Step: 45220   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:11,875-Speed 2618.85 samples/sec   Loss 14.1291   LearningRate 0.0894   Epoch: 1   Global Step: 45230   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:15,767-Speed 2631.97 samples/sec   Loss 14.2556   LearningRate 0.0894   Epoch: 1   Global Step: 45240   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:19,659-Speed 2631.24 samples/sec   Loss 14.1658   LearningRate 0.0894   Epoch: 1   Global Step: 45250   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:23,558-Speed 2626.70 samples/sec   Loss 14.2150   LearningRate 0.0894   Epoch: 1   Global Step: 45260   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 00:48:27,459-Speed 2626.04 samples/sec   Loss 14.1816   LearningRate 0.0894   Epoch: 1   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:31,367-Speed 2620.61 samples/sec   Loss 14.1012   LearningRate 0.0894   Epoch: 1   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:35,270-Speed 2624.21 samples/sec   Loss 14.1768   LearningRate 0.0894   Epoch: 1   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:39,165-Speed 2629.58 samples/sec   Loss 14.1414   LearningRate 0.0894   Epoch: 1   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:43,061-Speed 2628.97 samples/sec   Loss 14.1421   LearningRate 0.0894   Epoch: 1   Global Step: 45310   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:46,960-Speed 2627.08 samples/sec   Loss 14.0852   LearningRate 0.0894   Epoch: 1   Global Step: 45320   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:50,856-Speed 2628.89 samples/sec   Loss 14.0499   LearningRate 0.0894   Epoch: 1   Global Step: 45330   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:54,759-Speed 2624.40 samples/sec   Loss 14.0381   LearningRate 0.0894   Epoch: 1   Global Step: 45340   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:48:58,669-Speed 2619.34 samples/sec   Loss 14.0576   LearningRate 0.0894   Epoch: 1   Global Step: 45350   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:49:02,568-Speed 2626.77 samples/sec   Loss 13.9480   LearningRate 0.0894   Epoch: 1   Global Step: 45360   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:49:06,463-Speed 2629.43 samples/sec   Loss 14.0403   LearningRate 0.0894   Epoch: 1   Global Step: 45370   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:10,357-Speed 2630.63 samples/sec   Loss 14.0540   LearningRate 0.0894   Epoch: 1   Global Step: 45380   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:14,263-Speed 2621.85 samples/sec   Loss 14.0864   LearningRate 0.0894   Epoch: 1   Global Step: 45390   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:18,160-Speed 2629.09 samples/sec   Loss 14.1846   LearningRate 0.0894   Epoch: 1   Global Step: 45400   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:22,054-Speed 2629.76 samples/sec   Loss 14.1092   LearningRate 0.0894   Epoch: 1   Global Step: 45410   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:25,951-Speed 2628.64 samples/sec   Loss 14.1147   LearningRate 0.0893   Epoch: 1   Global Step: 45420   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:29,846-Speed 2629.29 samples/sec   Loss 14.0603   LearningRate 0.0893   Epoch: 1   Global Step: 45430   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:33,744-Speed 2627.72 samples/sec   Loss 14.1027   LearningRate 0.0893   Epoch: 1   Global Step: 45440   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:37,643-Speed 2627.06 samples/sec   Loss 14.0583   LearningRate 0.0893   Epoch: 1   Global Step: 45450   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:41,540-Speed 2628.10 samples/sec   Loss 14.1352   LearningRate 0.0893   Epoch: 1   Global Step: 45460   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:49:45,435-Speed 2629.56 samples/sec   Loss 14.0965   LearningRate 0.0893   Epoch: 1   Global Step: 45470   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:49:49,351-Speed 2615.70 samples/sec   Loss 13.9943   LearningRate 0.0893   Epoch: 1   Global Step: 45480   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:49:53,251-Speed 2626.40 samples/sec   Loss 13.8107   LearningRate 0.0893   Epoch: 1   Global Step: 45490   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:49:57,152-Speed 2625.08 samples/sec   Loss 14.0745   LearningRate 0.0893   Epoch: 1   Global Step: 45500   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:01,050-Speed 2628.17 samples/sec   Loss 14.1416   LearningRate 0.0893   Epoch: 1   Global Step: 45510   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:04,950-Speed 2625.60 samples/sec   Loss 14.1367   LearningRate 0.0893   Epoch: 1   Global Step: 45520   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:08,848-Speed 2627.41 samples/sec   Loss 14.1543   LearningRate 0.0893   Epoch: 1   Global Step: 45530   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:12,754-Speed 2622.44 samples/sec   Loss 14.0332   LearningRate 0.0893   Epoch: 1   Global Step: 45540   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:16,651-Speed 2628.54 samples/sec   Loss 13.9214   LearningRate 0.0893   Epoch: 1   Global Step: 45550   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:20,554-Speed 2623.91 samples/sec   Loss 14.2636   LearningRate 0.0893   Epoch: 1   Global Step: 45560   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:24,440-Speed 2636.15 samples/sec   Loss 14.0578   LearningRate 0.0893   Epoch: 1   Global Step: 45570   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:28,350-Speed 2619.62 samples/sec   Loss 14.2429   LearningRate 0.0893   Epoch: 1   Global Step: 45580   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:32,248-Speed 2627.72 samples/sec   Loss 14.1496   LearningRate 0.0893   Epoch: 1   Global Step: 45590   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:36,155-Speed 2621.11 samples/sec   Loss 14.0227   LearningRate 0.0893   Epoch: 1   Global Step: 45600   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:50:40,038-Speed 2637.87 samples/sec   Loss 14.1263   LearningRate 0.0893   Epoch: 1   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:50:43,936-Speed 2627.32 samples/sec   Loss 13.9710   LearningRate 0.0893   Epoch: 1   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:50:47,838-Speed 2625.25 samples/sec   Loss 14.1025   LearningRate 0.0893   Epoch: 1   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:50:51,712-Speed 2643.66 samples/sec   Loss 14.0409   LearningRate 0.0893   Epoch: 1   Global Step: 45640   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:50:55,606-Speed 2630.59 samples/sec   Loss 14.0986   LearningRate 0.0893   Epoch: 1   Global Step: 45650   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:50:59,509-Speed 2624.35 samples/sec   Loss 14.0470   LearningRate 0.0893   Epoch: 1   Global Step: 45660   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:03,401-Speed 2631.58 samples/sec   Loss 14.1240   LearningRate 0.0893   Epoch: 1   Global Step: 45670   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:07,294-Speed 2630.80 samples/sec   Loss 14.1124   LearningRate 0.0893   Epoch: 1   Global Step: 45680   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:11,185-Speed 2632.45 samples/sec   Loss 14.1045   LearningRate 0.0893   Epoch: 1   Global Step: 45690   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:15,079-Speed 2630.20 samples/sec   Loss 14.0644   LearningRate 0.0893   Epoch: 1   Global Step: 45700   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:18,978-Speed 2627.01 samples/sec   Loss 14.0734   LearningRate 0.0893   Epoch: 1   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:22,873-Speed 2629.24 samples/sec   Loss 14.1221   LearningRate 0.0893   Epoch: 1   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:26,777-Speed 2623.68 samples/sec   Loss 14.1241   LearningRate 0.0893   Epoch: 1   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 00:51:30,672-Speed 2630.15 samples/sec   Loss 14.0398   LearningRate 0.0893   Epoch: 1   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:51:34,567-Speed 2629.47 samples/sec   Loss 14.2249   LearningRate 0.0893   Epoch: 1   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:51:38,487-Speed 2612.71 samples/sec   Loss 14.0696   LearningRate 0.0893   Epoch: 1   Global Step: 45760   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:51:42,389-Speed 2624.52 samples/sec   Loss 14.1912   LearningRate 0.0893   Epoch: 1   Global Step: 45770   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:51:46,289-Speed 2626.49 samples/sec   Loss 14.2325   LearningRate 0.0893   Epoch: 1   Global Step: 45780   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:51:50,208-Speed 2613.71 samples/sec   Loss 14.0920   LearningRate 0.0893   Epoch: 1   Global Step: 45790   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:51:54,108-Speed 2625.76 samples/sec   Loss 14.0176   LearningRate 0.0893   Epoch: 1   Global Step: 45800   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:51:58,004-Speed 2629.59 samples/sec   Loss 14.2615   LearningRate 0.0893   Epoch: 1   Global Step: 45810   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:01,896-Speed 2631.85 samples/sec   Loss 13.8963   LearningRate 0.0893   Epoch: 1   Global Step: 45820   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:05,794-Speed 2627.23 samples/sec   Loss 14.1163   LearningRate 0.0893   Epoch: 1   Global Step: 45830   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:09,680-Speed 2635.66 samples/sec   Loss 14.1390   LearningRate 0.0893   Epoch: 1   Global Step: 45840   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:13,583-Speed 2623.83 samples/sec   Loss 14.0631   LearningRate 0.0893   Epoch: 1   Global Step: 45850   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:17,491-Speed 2621.55 samples/sec   Loss 14.1759   LearningRate 0.0892   Epoch: 1   Global Step: 45860   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:21,396-Speed 2622.64 samples/sec   Loss 14.2022   LearningRate 0.0892   Epoch: 1   Global Step: 45870   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:25,315-Speed 2612.92 samples/sec   Loss 13.9860   LearningRate 0.0892   Epoch: 1   Global Step: 45880   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:29,216-Speed 2626.45 samples/sec   Loss 14.0944   LearningRate 0.0892   Epoch: 1   Global Step: 45890   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:33,111-Speed 2629.30 samples/sec   Loss 14.1284   LearningRate 0.0892   Epoch: 1   Global Step: 45900   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:37,005-Speed 2630.86 samples/sec   Loss 13.9665   LearningRate 0.0892   Epoch: 1   Global Step: 45910   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:40,896-Speed 2632.62 samples/sec   Loss 14.0922   LearningRate 0.0892   Epoch: 1   Global Step: 45920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:44,788-Speed 2631.43 samples/sec   Loss 14.1191   LearningRate 0.0892   Epoch: 1   Global Step: 45930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:48,688-Speed 2626.16 samples/sec   Loss 14.0047   LearningRate 0.0892   Epoch: 1   Global Step: 45940   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:52:52,561-Speed 2645.10 samples/sec   Loss 14.1549   LearningRate 0.0892   Epoch: 1   Global Step: 45950   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:52:56,457-Speed 2628.53 samples/sec   Loss 14.1149   LearningRate 0.0892   Epoch: 1   Global Step: 45960   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:00,358-Speed 2626.05 samples/sec   Loss 14.1204   LearningRate 0.0892   Epoch: 1   Global Step: 45970   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:04,253-Speed 2629.83 samples/sec   Loss 13.9719   LearningRate 0.0892   Epoch: 1   Global Step: 45980   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:08,171-Speed 2614.83 samples/sec   Loss 14.0672   LearningRate 0.0892   Epoch: 1   Global Step: 45990   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:12,077-Speed 2622.32 samples/sec   Loss 14.0166   LearningRate 0.0892   Epoch: 1   Global Step: 46000   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:15,967-Speed 2632.56 samples/sec   Loss 14.0662   LearningRate 0.0892   Epoch: 1   Global Step: 46010   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:19,861-Speed 2630.49 samples/sec   Loss 14.0586   LearningRate 0.0892   Epoch: 1   Global Step: 46020   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:23,755-Speed 2630.39 samples/sec   Loss 13.9991   LearningRate 0.0892   Epoch: 1   Global Step: 46030   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:27,650-Speed 2630.04 samples/sec   Loss 14.1188   LearningRate 0.0892   Epoch: 1   Global Step: 46040   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:53:31,541-Speed 2632.14 samples/sec   Loss 14.0903   LearningRate 0.0892   Epoch: 1   Global Step: 46050   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:53:35,436-Speed 2630.44 samples/sec   Loss 14.1598   LearningRate 0.0892   Epoch: 1   Global Step: 46060   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:53:39,344-Speed 2620.97 samples/sec   Loss 13.9657   LearningRate 0.0892   Epoch: 1   Global Step: 46070   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:53:43,236-Speed 2630.94 samples/sec   Loss 14.1872   LearningRate 0.0892   Epoch: 1   Global Step: 46080   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:53:47,136-Speed 2626.07 samples/sec   Loss 13.9540   LearningRate 0.0892   Epoch: 1   Global Step: 46090   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:53:51,032-Speed 2629.81 samples/sec   Loss 14.1776   LearningRate 0.0892   Epoch: 1   Global Step: 46100   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:53:54,930-Speed 2627.70 samples/sec   Loss 14.0270   LearningRate 0.0892   Epoch: 1   Global Step: 46110   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:53:58,822-Speed 2631.44 samples/sec   Loss 14.0939   LearningRate 0.0892   Epoch: 1   Global Step: 46120   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:54:02,769-Speed 2595.19 samples/sec   Loss 14.2652   LearningRate 0.0892   Epoch: 1   Global Step: 46130   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:54:06,684-Speed 2616.36 samples/sec   Loss 14.0976   LearningRate 0.0892   Epoch: 1   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:10,579-Speed 2629.42 samples/sec   Loss 14.0329   LearningRate 0.0892   Epoch: 1   Global Step: 46150   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:14,470-Speed 2632.24 samples/sec   Loss 14.0925   LearningRate 0.0892   Epoch: 1   Global Step: 46160   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:18,366-Speed 2629.12 samples/sec   Loss 14.2096   LearningRate 0.0892   Epoch: 1   Global Step: 46170   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:22,264-Speed 2627.36 samples/sec   Loss 14.1454   LearningRate 0.0892   Epoch: 1   Global Step: 46180   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:26,162-Speed 2627.35 samples/sec   Loss 14.1105   LearningRate 0.0892   Epoch: 1   Global Step: 46190   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:30,058-Speed 2629.19 samples/sec   Loss 14.2009   LearningRate 0.0892   Epoch: 1   Global Step: 46200   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:33,976-Speed 2614.67 samples/sec   Loss 13.9652   LearningRate 0.0892   Epoch: 1   Global Step: 46210   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:37,881-Speed 2623.33 samples/sec   Loss 13.8590   LearningRate 0.0892   Epoch: 1   Global Step: 46220   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:41,779-Speed 2627.02 samples/sec   Loss 14.1109   LearningRate 0.0892   Epoch: 1   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:45,676-Speed 2629.19 samples/sec   Loss 14.0444   LearningRate 0.0892   Epoch: 1   Global Step: 46240   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:54:49,587-Speed 2618.60 samples/sec   Loss 13.9875   LearningRate 0.0892   Epoch: 1   Global Step: 46250   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:54:53,465-Speed 2641.30 samples/sec   Loss 14.0055   LearningRate 0.0892   Epoch: 1   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:54:57,370-Speed 2622.80 samples/sec   Loss 13.9348   LearningRate 0.0892   Epoch: 1   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:01,265-Speed 2629.79 samples/sec   Loss 14.1896   LearningRate 0.0892   Epoch: 1   Global Step: 46280   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:05,163-Speed 2627.75 samples/sec   Loss 14.0572   LearningRate 0.0892   Epoch: 1   Global Step: 46290   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:09,061-Speed 2628.29 samples/sec   Loss 14.0886   LearningRate 0.0891   Epoch: 1   Global Step: 46300   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:12,961-Speed 2626.36 samples/sec   Loss 14.0615   LearningRate 0.0891   Epoch: 1   Global Step: 46310   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:16,868-Speed 2621.11 samples/sec   Loss 14.1074   LearningRate 0.0891   Epoch: 1   Global Step: 46320   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:20,777-Speed 2620.98 samples/sec   Loss 14.2077   LearningRate 0.0891   Epoch: 1   Global Step: 46330   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:24,676-Speed 2627.09 samples/sec   Loss 13.9765   LearningRate 0.0891   Epoch: 1   Global Step: 46340   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:28,568-Speed 2631.74 samples/sec   Loss 13.9056   LearningRate 0.0891   Epoch: 1   Global Step: 46350   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:32,466-Speed 2627.38 samples/sec   Loss 14.1456   LearningRate 0.0891   Epoch: 1   Global Step: 46360   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:55:36,363-Speed 2628.41 samples/sec   Loss 13.9660   LearningRate 0.0891   Epoch: 1   Global Step: 46370   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:55:40,262-Speed 2627.11 samples/sec   Loss 14.0246   LearningRate 0.0891   Epoch: 1   Global Step: 46380   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:55:44,168-Speed 2621.89 samples/sec   Loss 14.1257   LearningRate 0.0891   Epoch: 1   Global Step: 46390   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:55:48,078-Speed 2619.65 samples/sec   Loss 14.0475   LearningRate 0.0891   Epoch: 1   Global Step: 46400   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:55:51,959-Speed 2639.00 samples/sec   Loss 13.9778   LearningRate 0.0891   Epoch: 1   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:55,853-Speed 2631.11 samples/sec   Loss 13.9922   LearningRate 0.0891   Epoch: 1   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:55:59,748-Speed 2629.77 samples/sec   Loss 14.0584   LearningRate 0.0891   Epoch: 1   Global Step: 46430   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:03,645-Speed 2627.75 samples/sec   Loss 14.0800   LearningRate 0.0891   Epoch: 1   Global Step: 46440   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:07,541-Speed 2628.76 samples/sec   Loss 14.1045   LearningRate 0.0891   Epoch: 1   Global Step: 46450   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:11,449-Speed 2621.04 samples/sec   Loss 13.9987   LearningRate 0.0891   Epoch: 1   Global Step: 46460   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:15,346-Speed 2628.03 samples/sec   Loss 13.9602   LearningRate 0.0891   Epoch: 1   Global Step: 46470   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:19,243-Speed 2628.75 samples/sec   Loss 14.1135   LearningRate 0.0891   Epoch: 1   Global Step: 46480   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:23,146-Speed 2624.45 samples/sec   Loss 14.0351   LearningRate 0.0891   Epoch: 1   Global Step: 46490   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:27,038-Speed 2631.65 samples/sec   Loss 14.0830   LearningRate 0.0891   Epoch: 1   Global Step: 46500   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:30,908-Speed 2646.89 samples/sec   Loss 13.9116   LearningRate 0.0891   Epoch: 1   Global Step: 46510   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:34,801-Speed 2630.31 samples/sec   Loss 14.1873   LearningRate 0.0891   Epoch: 1   Global Step: 46520   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:38,698-Speed 2628.55 samples/sec   Loss 13.9257   LearningRate 0.0891   Epoch: 1   Global Step: 46530   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:42,605-Speed 2621.53 samples/sec   Loss 14.0440   LearningRate 0.0891   Epoch: 1   Global Step: 46540   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:46,511-Speed 2622.04 samples/sec   Loss 14.0303   LearningRate 0.0891   Epoch: 1   Global Step: 46550   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:50,413-Speed 2625.36 samples/sec   Loss 13.9334   LearningRate 0.0891   Epoch: 1   Global Step: 46560   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:54,317-Speed 2624.02 samples/sec   Loss 14.1532   LearningRate 0.0891   Epoch: 1   Global Step: 46570   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:56:58,220-Speed 2624.30 samples/sec   Loss 13.9848   LearningRate 0.0891   Epoch: 1   Global Step: 46580   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:02,125-Speed 2623.25 samples/sec   Loss 13.9481   LearningRate 0.0891   Epoch: 1   Global Step: 46590   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:06,033-Speed 2620.43 samples/sec   Loss 13.8698   LearningRate 0.0891   Epoch: 1   Global Step: 46600   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:09,933-Speed 2626.11 samples/sec   Loss 13.9386   LearningRate 0.0891   Epoch: 1   Global Step: 46610   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:57:13,804-Speed 2646.36 samples/sec   Loss 14.1742   LearningRate 0.0891   Epoch: 1   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:17,709-Speed 2622.87 samples/sec   Loss 14.0324   LearningRate 0.0891   Epoch: 1   Global Step: 46630   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:21,596-Speed 2634.86 samples/sec   Loss 14.1631   LearningRate 0.0891   Epoch: 1   Global Step: 46640   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:25,497-Speed 2626.05 samples/sec   Loss 13.9820   LearningRate 0.0891   Epoch: 1   Global Step: 46650   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:29,393-Speed 2628.76 samples/sec   Loss 14.0774   LearningRate 0.0891   Epoch: 1   Global Step: 46660   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:33,285-Speed 2631.50 samples/sec   Loss 14.0087   LearningRate 0.0891   Epoch: 1   Global Step: 46670   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:37,181-Speed 2628.72 samples/sec   Loss 14.1115   LearningRate 0.0891   Epoch: 1   Global Step: 46680   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:41,080-Speed 2627.12 samples/sec   Loss 14.1064   LearningRate 0.0891   Epoch: 1   Global Step: 46690   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:45,001-Speed 2611.98 samples/sec   Loss 13.9232   LearningRate 0.0891   Epoch: 1   Global Step: 46700   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:48,929-Speed 2608.21 samples/sec   Loss 14.1075   LearningRate 0.0891   Epoch: 1   Global Step: 46710   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:57:52,823-Speed 2630.48 samples/sec   Loss 14.0363   LearningRate 0.0891   Epoch: 1   Global Step: 46720   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:57:56,717-Speed 2630.30 samples/sec   Loss 14.0397   LearningRate 0.0891   Epoch: 1   Global Step: 46730   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:00,613-Speed 2628.93 samples/sec   Loss 13.9274   LearningRate 0.0890   Epoch: 1   Global Step: 46740   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:04,511-Speed 2627.31 samples/sec   Loss 14.0236   LearningRate 0.0890   Epoch: 1   Global Step: 46750   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:08,410-Speed 2627.28 samples/sec   Loss 14.0729   LearningRate 0.0890   Epoch: 1   Global Step: 46760   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:12,331-Speed 2612.44 samples/sec   Loss 14.1084   LearningRate 0.0890   Epoch: 1   Global Step: 46770   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:16,226-Speed 2629.67 samples/sec   Loss 14.1688   LearningRate 0.0890   Epoch: 1   Global Step: 46780   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:20,141-Speed 2615.98 samples/sec   Loss 14.0929   LearningRate 0.0890   Epoch: 1   Global Step: 46790   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:24,038-Speed 2628.57 samples/sec   Loss 14.0357   LearningRate 0.0890   Epoch: 1   Global Step: 46800   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:27,945-Speed 2622.38 samples/sec   Loss 14.1040   LearningRate 0.0890   Epoch: 1   Global Step: 46810   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:31,841-Speed 2628.42 samples/sec   Loss 14.1659   LearningRate 0.0890   Epoch: 1   Global Step: 46820   Fp16 Grad Scale: 524288   Required: 88 hours
Training: 2022-04-13 00:58:35,727-Speed 2636.45 samples/sec   Loss 14.0848   LearningRate 0.0890   Epoch: 1   Global Step: 46830   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:39,628-Speed 2625.43 samples/sec   Loss 14.0703   LearningRate 0.0890   Epoch: 1   Global Step: 46840   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:43,538-Speed 2619.96 samples/sec   Loss 14.0725   LearningRate 0.0890   Epoch: 1   Global Step: 46850   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:47,446-Speed 2620.61 samples/sec   Loss 14.0433   LearningRate 0.0890   Epoch: 1   Global Step: 46860   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:51,434-Speed 2568.13 samples/sec   Loss 14.0467   LearningRate 0.0890   Epoch: 1   Global Step: 46870   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:55,332-Speed 2627.95 samples/sec   Loss 14.0425   LearningRate 0.0890   Epoch: 1   Global Step: 46880   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:58:59,228-Speed 2629.53 samples/sec   Loss 14.0595   LearningRate 0.0890   Epoch: 1   Global Step: 46890   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:59:03,110-Speed 2638.62 samples/sec   Loss 13.9843   LearningRate 0.0890   Epoch: 1   Global Step: 46900   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:07,006-Speed 2628.72 samples/sec   Loss 13.9413   LearningRate 0.0890   Epoch: 1   Global Step: 46910   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:10,898-Speed 2631.40 samples/sec   Loss 14.0834   LearningRate 0.0890   Epoch: 1   Global Step: 46920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:14,800-Speed 2625.40 samples/sec   Loss 14.0411   LearningRate 0.0890   Epoch: 1   Global Step: 46930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:18,704-Speed 2623.55 samples/sec   Loss 14.0996   LearningRate 0.0890   Epoch: 1   Global Step: 46940   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:22,611-Speed 2621.35 samples/sec   Loss 14.0704   LearningRate 0.0890   Epoch: 1   Global Step: 46950   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:26,509-Speed 2627.98 samples/sec   Loss 13.9780   LearningRate 0.0890   Epoch: 1   Global Step: 46960   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:30,408-Speed 2627.18 samples/sec   Loss 14.0297   LearningRate 0.0890   Epoch: 1   Global Step: 46970   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:34,308-Speed 2626.56 samples/sec   Loss 14.0058   LearningRate 0.0890   Epoch: 1   Global Step: 46980   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:38,203-Speed 2629.19 samples/sec   Loss 13.9394   LearningRate 0.0890   Epoch: 1   Global Step: 46990   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 00:59:42,114-Speed 2618.96 samples/sec   Loss 14.0169   LearningRate 0.0890   Epoch: 1   Global Step: 47000   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:59:46,019-Speed 2622.92 samples/sec   Loss 13.9880   LearningRate 0.0890   Epoch: 1   Global Step: 47010   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:59:49,917-Speed 2628.01 samples/sec   Loss 14.1415   LearningRate 0.0890   Epoch: 1   Global Step: 47020   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:59:53,821-Speed 2623.23 samples/sec   Loss 14.0180   LearningRate 0.0890   Epoch: 1   Global Step: 47030   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 00:59:57,720-Speed 2627.12 samples/sec   Loss 14.0589   LearningRate 0.0890   Epoch: 1   Global Step: 47040   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:01,620-Speed 2626.49 samples/sec   Loss 13.9915   LearningRate 0.0890   Epoch: 1   Global Step: 47050   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:05,519-Speed 2627.08 samples/sec   Loss 14.0115   LearningRate 0.0890   Epoch: 1   Global Step: 47060   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:09,422-Speed 2624.46 samples/sec   Loss 13.9414   LearningRate 0.0890   Epoch: 1   Global Step: 47070   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:13,334-Speed 2617.93 samples/sec   Loss 14.0575   LearningRate 0.0890   Epoch: 1   Global Step: 47080   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:17,229-Speed 2629.64 samples/sec   Loss 14.0745   LearningRate 0.0890   Epoch: 1   Global Step: 47090   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:21,133-Speed 2623.68 samples/sec   Loss 14.1058   LearningRate 0.0890   Epoch: 1   Global Step: 47100   Fp16 Grad Scale: 524288   Required: 88 hours
Training: 2022-04-13 01:00:25,012-Speed 2639.69 samples/sec   Loss 14.1804   LearningRate 0.0890   Epoch: 1   Global Step: 47110   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:28,909-Speed 2629.03 samples/sec   Loss 13.9613   LearningRate 0.0890   Epoch: 1   Global Step: 47120   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:32,803-Speed 2630.37 samples/sec   Loss 14.1554   LearningRate 0.0890   Epoch: 1   Global Step: 47130   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:36,719-Speed 2615.49 samples/sec   Loss 14.0511   LearningRate 0.0890   Epoch: 1   Global Step: 47140   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:40,631-Speed 2618.12 samples/sec   Loss 14.0705   LearningRate 0.0890   Epoch: 1   Global Step: 47150   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:44,526-Speed 2630.05 samples/sec   Loss 14.0123   LearningRate 0.0890   Epoch: 1   Global Step: 47160   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:48,427-Speed 2625.40 samples/sec   Loss 14.1057   LearningRate 0.0890   Epoch: 1   Global Step: 47170   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:00:52,323-Speed 2629.67 samples/sec   Loss 14.1853   LearningRate 0.0889   Epoch: 1   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:00:56,267-Speed 2596.30 samples/sec   Loss 14.1176   LearningRate 0.0889   Epoch: 1   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:00,348-Speed 2510.43 samples/sec   Loss 14.0983   LearningRate 0.0889   Epoch: 1   Global Step: 47200   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:04,494-Speed 2470.07 samples/sec   Loss 13.9371   LearningRate 0.0889   Epoch: 1   Global Step: 47210   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:08,444-Speed 2593.20 samples/sec   Loss 13.9930   LearningRate 0.0889   Epoch: 1   Global Step: 47220   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:12,335-Speed 2632.14 samples/sec   Loss 13.9025   LearningRate 0.0889   Epoch: 1   Global Step: 47230   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:16,232-Speed 2628.33 samples/sec   Loss 13.9310   LearningRate 0.0889   Epoch: 1   Global Step: 47240   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:20,147-Speed 2616.29 samples/sec   Loss 13.8693   LearningRate 0.0889   Epoch: 1   Global Step: 47250   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:24,047-Speed 2625.92 samples/sec   Loss 13.9085   LearningRate 0.0889   Epoch: 1   Global Step: 47260   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:27,944-Speed 2628.55 samples/sec   Loss 13.9549   LearningRate 0.0889   Epoch: 1   Global Step: 47270   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:01:31,841-Speed 2628.46 samples/sec   Loss 13.8592   LearningRate 0.0889   Epoch: 1   Global Step: 47280   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:01:35,737-Speed 2628.92 samples/sec   Loss 14.1855   LearningRate 0.0889   Epoch: 1   Global Step: 47290   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:01:39,632-Speed 2629.39 samples/sec   Loss 14.0821   LearningRate 0.0889   Epoch: 1   Global Step: 47300   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:01:43,528-Speed 2628.95 samples/sec   Loss 13.8551   LearningRate 0.0889   Epoch: 1   Global Step: 47310   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:01:47,422-Speed 2630.14 samples/sec   Loss 14.0245   LearningRate 0.0889   Epoch: 1   Global Step: 47320   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:01:51,316-Speed 2630.41 samples/sec   Loss 13.9601   LearningRate 0.0889   Epoch: 1   Global Step: 47330   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:01:55,213-Speed 2628.65 samples/sec   Loss 13.9619   LearningRate 0.0889   Epoch: 1   Global Step: 47340   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:01:59,116-Speed 2624.69 samples/sec   Loss 13.8893   LearningRate 0.0889   Epoch: 1   Global Step: 47350   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:03,011-Speed 2629.51 samples/sec   Loss 13.9776   LearningRate 0.0889   Epoch: 1   Global Step: 47360   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:06,910-Speed 2626.93 samples/sec   Loss 14.2012   LearningRate 0.0889   Epoch: 1   Global Step: 47370   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:10,803-Speed 2630.69 samples/sec   Loss 13.9836   LearningRate 0.0889   Epoch: 1   Global Step: 47380   Fp16 Grad Scale: 524288   Required: 88 hours
Training: 2022-04-13 01:02:14,681-Speed 2641.64 samples/sec   Loss 13.8946   LearningRate 0.0889   Epoch: 1   Global Step: 47390   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:18,576-Speed 2629.85 samples/sec   Loss 14.1580   LearningRate 0.0889   Epoch: 1   Global Step: 47400   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:22,470-Speed 2630.07 samples/sec   Loss 13.8634   LearningRate 0.0889   Epoch: 1   Global Step: 47410   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:26,364-Speed 2630.45 samples/sec   Loss 14.0722   LearningRate 0.0889   Epoch: 1   Global Step: 47420   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:30,258-Speed 2630.21 samples/sec   Loss 14.1432   LearningRate 0.0889   Epoch: 1   Global Step: 47430   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:34,153-Speed 2630.01 samples/sec   Loss 14.1043   LearningRate 0.0889   Epoch: 1   Global Step: 47440   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:38,070-Speed 2614.78 samples/sec   Loss 14.0073   LearningRate 0.0889   Epoch: 1   Global Step: 47450   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:42,021-Speed 2592.17 samples/sec   Loss 14.0114   LearningRate 0.0889   Epoch: 1   Global Step: 47460   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:45,917-Speed 2629.15 samples/sec   Loss 14.0830   LearningRate 0.0889   Epoch: 1   Global Step: 47470   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:49,811-Speed 2631.02 samples/sec   Loss 14.0494   LearningRate 0.0889   Epoch: 1   Global Step: 47480   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:53,689-Speed 2640.58 samples/sec   Loss 14.1061   LearningRate 0.0889   Epoch: 1   Global Step: 47490   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:02:57,598-Speed 2620.51 samples/sec   Loss 13.9315   LearningRate 0.0889   Epoch: 1   Global Step: 47500   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:01,490-Speed 2632.04 samples/sec   Loss 14.1343   LearningRate 0.0889   Epoch: 1   Global Step: 47510   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:05,394-Speed 2623.74 samples/sec   Loss 14.0984   LearningRate 0.0889   Epoch: 1   Global Step: 47520   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:09,301-Speed 2621.08 samples/sec   Loss 14.0531   LearningRate 0.0889   Epoch: 1   Global Step: 47530   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:13,198-Speed 2628.17 samples/sec   Loss 13.9783   LearningRate 0.0889   Epoch: 1   Global Step: 47540   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:17,101-Speed 2624.15 samples/sec   Loss 13.9908   LearningRate 0.0889   Epoch: 1   Global Step: 47550   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:20,994-Speed 2631.71 samples/sec   Loss 13.9802   LearningRate 0.0889   Epoch: 1   Global Step: 47560   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:24,889-Speed 2629.82 samples/sec   Loss 13.9813   LearningRate 0.0889   Epoch: 1   Global Step: 47570   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:03:28,767-Speed 2641.29 samples/sec   Loss 13.9460   LearningRate 0.0889   Epoch: 1   Global Step: 47580   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:32,662-Speed 2629.36 samples/sec   Loss 14.0129   LearningRate 0.0889   Epoch: 1   Global Step: 47590   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:36,557-Speed 2629.26 samples/sec   Loss 13.8518   LearningRate 0.0889   Epoch: 1   Global Step: 47600   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:40,466-Speed 2620.35 samples/sec   Loss 14.1152   LearningRate 0.0889   Epoch: 1   Global Step: 47610   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:44,367-Speed 2625.87 samples/sec   Loss 14.0001   LearningRate 0.0888   Epoch: 1   Global Step: 47620   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:48,266-Speed 2627.36 samples/sec   Loss 13.9550   LearningRate 0.0888   Epoch: 1   Global Step: 47630   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:52,164-Speed 2627.11 samples/sec   Loss 14.0831   LearningRate 0.0888   Epoch: 1   Global Step: 47640   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:56,065-Speed 2626.22 samples/sec   Loss 14.0234   LearningRate 0.0888   Epoch: 1   Global Step: 47650   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:03:59,962-Speed 2628.04 samples/sec   Loss 14.1259   LearningRate 0.0888   Epoch: 1   Global Step: 47660   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:03,860-Speed 2627.52 samples/sec   Loss 13.9169   LearningRate 0.0888   Epoch: 1   Global Step: 47670   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:07,767-Speed 2621.81 samples/sec   Loss 14.0504   LearningRate 0.0888   Epoch: 1   Global Step: 47680   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:04:11,665-Speed 2627.72 samples/sec   Loss 14.0988   LearningRate 0.0888   Epoch: 1   Global Step: 47690   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:04:15,556-Speed 2632.29 samples/sec   Loss 13.9813   LearningRate 0.0888   Epoch: 1   Global Step: 47700   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:04:19,450-Speed 2630.81 samples/sec   Loss 14.0064   LearningRate 0.0888   Epoch: 1   Global Step: 47710   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:04:23,341-Speed 2632.43 samples/sec   Loss 14.0737   LearningRate 0.0888   Epoch: 1   Global Step: 47720   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:04:27,216-Speed 2643.06 samples/sec   Loss 14.1581   LearningRate 0.0888   Epoch: 1   Global Step: 47730   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:31,116-Speed 2626.34 samples/sec   Loss 14.0724   LearningRate 0.0888   Epoch: 1   Global Step: 47740   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:35,019-Speed 2624.22 samples/sec   Loss 14.0124   LearningRate 0.0888   Epoch: 1   Global Step: 47750   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:38,925-Speed 2622.29 samples/sec   Loss 13.9537   LearningRate 0.0888   Epoch: 1   Global Step: 47760   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:42,837-Speed 2618.81 samples/sec   Loss 14.1085   LearningRate 0.0888   Epoch: 1   Global Step: 47770   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:46,746-Speed 2619.90 samples/sec   Loss 13.9697   LearningRate 0.0888   Epoch: 1   Global Step: 47780   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:50,641-Speed 2629.65 samples/sec   Loss 13.8287   LearningRate 0.0888   Epoch: 1   Global Step: 47790   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:54,544-Speed 2623.69 samples/sec   Loss 14.1607   LearningRate 0.0888   Epoch: 1   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:04:58,473-Speed 2607.55 samples/sec   Loss 14.0219   LearningRate 0.0888   Epoch: 1   Global Step: 47810   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:05:02,368-Speed 2629.35 samples/sec   Loss 14.1159   LearningRate 0.0888   Epoch: 1   Global Step: 47820   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:05:06,267-Speed 2627.80 samples/sec   Loss 13.8682   LearningRate 0.0888   Epoch: 1   Global Step: 47830   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:10,163-Speed 2628.53 samples/sec   Loss 13.9447   LearningRate 0.0888   Epoch: 1   Global Step: 47840   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:14,074-Speed 2619.28 samples/sec   Loss 14.0487   LearningRate 0.0888   Epoch: 1   Global Step: 47850   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:17,990-Speed 2614.97 samples/sec   Loss 14.1053   LearningRate 0.0888   Epoch: 1   Global Step: 47860   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:21,877-Speed 2635.29 samples/sec   Loss 13.9061   LearningRate 0.0888   Epoch: 1   Global Step: 47870   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:25,796-Speed 2613.11 samples/sec   Loss 13.9164   LearningRate 0.0888   Epoch: 1   Global Step: 47880   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:29,712-Speed 2616.33 samples/sec   Loss 14.1337   LearningRate 0.0888   Epoch: 1   Global Step: 47890   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:33,610-Speed 2627.61 samples/sec   Loss 14.1231   LearningRate 0.0888   Epoch: 1   Global Step: 47900   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:37,509-Speed 2626.70 samples/sec   Loss 13.9146   LearningRate 0.0888   Epoch: 1   Global Step: 47910   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:41,484-Speed 2576.74 samples/sec   Loss 14.1513   LearningRate 0.0888   Epoch: 1   Global Step: 47920   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:45,444-Speed 2586.54 samples/sec   Loss 14.0097   LearningRate 0.0888   Epoch: 1   Global Step: 47930   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:49,346-Speed 2624.97 samples/sec   Loss 14.1089   LearningRate 0.0888   Epoch: 1   Global Step: 47940   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:53,240-Speed 2630.13 samples/sec   Loss 14.0176   LearningRate 0.0888   Epoch: 1   Global Step: 47950   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:05:57,130-Speed 2633.78 samples/sec   Loss 13.8081   LearningRate 0.0888   Epoch: 1   Global Step: 47960   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:01,047-Speed 2615.01 samples/sec   Loss 13.7974   LearningRate 0.0888   Epoch: 1   Global Step: 47970   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:04,978-Speed 2605.39 samples/sec   Loss 13.8842   LearningRate 0.0888   Epoch: 1   Global Step: 47980   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:08,884-Speed 2622.95 samples/sec   Loss 14.0031   LearningRate 0.0888   Epoch: 1   Global Step: 47990   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:12,780-Speed 2629.24 samples/sec   Loss 14.0957   LearningRate 0.0888   Epoch: 1   Global Step: 48000   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:16,681-Speed 2625.48 samples/sec   Loss 14.0100   LearningRate 0.0888   Epoch: 1   Global Step: 48010   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:20,589-Speed 2620.82 samples/sec   Loss 14.0644   LearningRate 0.0888   Epoch: 1   Global Step: 48020   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:24,499-Speed 2619.23 samples/sec   Loss 14.0149   LearningRate 0.0888   Epoch: 1   Global Step: 48030   Fp16 Grad Scale: 524288   Required: 87 hours
Training: 2022-04-13 01:06:28,393-Speed 2630.76 samples/sec   Loss 14.0791   LearningRate 0.0888   Epoch: 1   Global Step: 48040   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:32,286-Speed 2631.51 samples/sec   Loss 14.0448   LearningRate 0.0888   Epoch: 1   Global Step: 48050   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:36,180-Speed 2629.72 samples/sec   Loss 13.9905   LearningRate 0.0887   Epoch: 1   Global Step: 48060   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:40,108-Speed 2608.53 samples/sec   Loss 13.9830   LearningRate 0.0887   Epoch: 1   Global Step: 48070   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:44,000-Speed 2631.56 samples/sec   Loss 13.9327   LearningRate 0.0887   Epoch: 1   Global Step: 48080   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:47,899-Speed 2626.67 samples/sec   Loss 14.0759   LearningRate 0.0887   Epoch: 1   Global Step: 48090   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:51,802-Speed 2625.05 samples/sec   Loss 14.0242   LearningRate 0.0887   Epoch: 1   Global Step: 48100   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:55,702-Speed 2626.21 samples/sec   Loss 13.9910   LearningRate 0.0887   Epoch: 1   Global Step: 48110   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:06:59,598-Speed 2628.40 samples/sec   Loss 14.0603   LearningRate 0.0887   Epoch: 1   Global Step: 48120   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:03,512-Speed 2616.59 samples/sec   Loss 13.9527   LearningRate 0.0887   Epoch: 1   Global Step: 48130   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:07,391-Speed 2641.11 samples/sec   Loss 14.0234   LearningRate 0.0887   Epoch: 1   Global Step: 48140   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:11,283-Speed 2631.75 samples/sec   Loss 13.9331   LearningRate 0.0887   Epoch: 1   Global Step: 48150   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:15,178-Speed 2629.82 samples/sec   Loss 13.9978   LearningRate 0.0887   Epoch: 1   Global Step: 48160   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:19,080-Speed 2625.19 samples/sec   Loss 13.9133   LearningRate 0.0887   Epoch: 1   Global Step: 48170   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:22,991-Speed 2619.09 samples/sec   Loss 13.8184   LearningRate 0.0887   Epoch: 1   Global Step: 48180   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:26,886-Speed 2629.66 samples/sec   Loss 14.0358   LearningRate 0.0887   Epoch: 1   Global Step: 48190   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:30,786-Speed 2626.56 samples/sec   Loss 13.9957   LearningRate 0.0887   Epoch: 1   Global Step: 48200   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:34,690-Speed 2623.64 samples/sec   Loss 13.9878   LearningRate 0.0887   Epoch: 1   Global Step: 48210   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:38,599-Speed 2620.07 samples/sec   Loss 13.9356   LearningRate 0.0887   Epoch: 1   Global Step: 48220   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:42,507-Speed 2621.20 samples/sec   Loss 13.9723   LearningRate 0.0887   Epoch: 1   Global Step: 48230   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:46,394-Speed 2635.24 samples/sec   Loss 13.8785   LearningRate 0.0887   Epoch: 1   Global Step: 48240   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:50,296-Speed 2625.10 samples/sec   Loss 13.8746   LearningRate 0.0887   Epoch: 1   Global Step: 48250   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:54,187-Speed 2631.80 samples/sec   Loss 13.9103   LearningRate 0.0887   Epoch: 1   Global Step: 48260   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:07:58,086-Speed 2627.39 samples/sec   Loss 14.0131   LearningRate 0.0887   Epoch: 1   Global Step: 48270   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:01,986-Speed 2626.59 samples/sec   Loss 13.9173   LearningRate 0.0887   Epoch: 1   Global Step: 48280   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:05,882-Speed 2628.51 samples/sec   Loss 13.9823   LearningRate 0.0887   Epoch: 1   Global Step: 48290   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:09,784-Speed 2625.06 samples/sec   Loss 13.9896   LearningRate 0.0887   Epoch: 1   Global Step: 48300   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:13,681-Speed 2628.59 samples/sec   Loss 13.8061   LearningRate 0.0887   Epoch: 1   Global Step: 48310   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:17,576-Speed 2629.59 samples/sec   Loss 13.9223   LearningRate 0.0887   Epoch: 1   Global Step: 48320   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:21,469-Speed 2631.30 samples/sec   Loss 14.1121   LearningRate 0.0887   Epoch: 1   Global Step: 48330   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:25,363-Speed 2630.11 samples/sec   Loss 13.8848   LearningRate 0.0887   Epoch: 1   Global Step: 48340   Fp16 Grad Scale: 524288   Required: 87 hours
Training: 2022-04-13 01:08:29,284-Speed 2612.27 samples/sec   Loss 14.0361   LearningRate 0.0887   Epoch: 1   Global Step: 48350   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:33,182-Speed 2627.16 samples/sec   Loss 13.9791   LearningRate 0.0887   Epoch: 1   Global Step: 48360   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:37,084-Speed 2624.90 samples/sec   Loss 14.0200   LearningRate 0.0887   Epoch: 1   Global Step: 48370   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:40,986-Speed 2624.47 samples/sec   Loss 13.8592   LearningRate 0.0887   Epoch: 1   Global Step: 48380   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:44,881-Speed 2630.15 samples/sec   Loss 13.8208   LearningRate 0.0887   Epoch: 1   Global Step: 48390   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:48,777-Speed 2629.46 samples/sec   Loss 13.9552   LearningRate 0.0887   Epoch: 1   Global Step: 48400   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:52,674-Speed 2627.96 samples/sec   Loss 13.8824   LearningRate 0.0887   Epoch: 1   Global Step: 48410   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:08:56,573-Speed 2627.74 samples/sec   Loss 14.0383   LearningRate 0.0887   Epoch: 1   Global Step: 48420   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:00,466-Speed 2630.83 samples/sec   Loss 13.9175   LearningRate 0.0887   Epoch: 1   Global Step: 48430   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:04,362-Speed 2628.65 samples/sec   Loss 14.0994   LearningRate 0.0887   Epoch: 1   Global Step: 48440   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:08,243-Speed 2638.85 samples/sec   Loss 13.8768   LearningRate 0.0887   Epoch: 1   Global Step: 48450   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:12,142-Speed 2627.39 samples/sec   Loss 13.9107   LearningRate 0.0887   Epoch: 1   Global Step: 48460   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:16,035-Speed 2630.84 samples/sec   Loss 13.9418   LearningRate 0.0887   Epoch: 1   Global Step: 48470   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:19,942-Speed 2621.97 samples/sec   Loss 13.9891   LearningRate 0.0887   Epoch: 1   Global Step: 48480   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:23,860-Speed 2613.85 samples/sec   Loss 14.0205   LearningRate 0.0887   Epoch: 1   Global Step: 48490   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:27,755-Speed 2630.38 samples/sec   Loss 13.9867   LearningRate 0.0886   Epoch: 1   Global Step: 48500   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:31,664-Speed 2620.20 samples/sec   Loss 14.0574   LearningRate 0.0886   Epoch: 1   Global Step: 48510   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:35,560-Speed 2628.55 samples/sec   Loss 14.0468   LearningRate 0.0886   Epoch: 1   Global Step: 48520   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:39,472-Speed 2617.91 samples/sec   Loss 14.0045   LearningRate 0.0886   Epoch: 1   Global Step: 48530   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:43,386-Speed 2617.33 samples/sec   Loss 13.9163   LearningRate 0.0886   Epoch: 1   Global Step: 48540   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:47,268-Speed 2638.16 samples/sec   Loss 13.9816   LearningRate 0.0886   Epoch: 1   Global Step: 48550   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:51,163-Speed 2629.79 samples/sec   Loss 13.9027   LearningRate 0.0886   Epoch: 1   Global Step: 48560   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:09:55,043-Speed 2639.89 samples/sec   Loss 13.9392   LearningRate 0.0886   Epoch: 1   Global Step: 48570   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:09:58,940-Speed 2628.36 samples/sec   Loss 13.8843   LearningRate 0.0886   Epoch: 1   Global Step: 48580   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:02,836-Speed 2629.00 samples/sec   Loss 14.1398   LearningRate 0.0886   Epoch: 1   Global Step: 48590   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:06,731-Speed 2629.08 samples/sec   Loss 13.9670   LearningRate 0.0886   Epoch: 1   Global Step: 48600   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:10,627-Speed 2628.78 samples/sec   Loss 13.9192   LearningRate 0.0886   Epoch: 1   Global Step: 48610   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:14,522-Speed 2629.77 samples/sec   Loss 14.0041   LearningRate 0.0886   Epoch: 1   Global Step: 48620   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:18,417-Speed 2629.39 samples/sec   Loss 14.0588   LearningRate 0.0886   Epoch: 1   Global Step: 48630   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:22,313-Speed 2629.24 samples/sec   Loss 14.0464   LearningRate 0.0886   Epoch: 1   Global Step: 48640   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:26,206-Speed 2631.21 samples/sec   Loss 13.9018   LearningRate 0.0886   Epoch: 1   Global Step: 48650   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:30,100-Speed 2630.36 samples/sec   Loss 14.1461   LearningRate 0.0886   Epoch: 1   Global Step: 48660   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:10:33,991-Speed 2632.16 samples/sec   Loss 13.9866   LearningRate 0.0886   Epoch: 1   Global Step: 48670   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:10:37,897-Speed 2622.12 samples/sec   Loss 13.9029   LearningRate 0.0886   Epoch: 1   Global Step: 48680   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:10:41,786-Speed 2633.99 samples/sec   Loss 14.0135   LearningRate 0.0886   Epoch: 1   Global Step: 48690   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:10:45,693-Speed 2621.16 samples/sec   Loss 14.0377   LearningRate 0.0886   Epoch: 1   Global Step: 48700   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:10:49,602-Speed 2620.58 samples/sec   Loss 13.9016   LearningRate 0.0886   Epoch: 1   Global Step: 48710   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:10:53,507-Speed 2622.60 samples/sec   Loss 13.9755   LearningRate 0.0886   Epoch: 1   Global Step: 48720   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:10:57,407-Speed 2626.03 samples/sec   Loss 14.0099   LearningRate 0.0886   Epoch: 1   Global Step: 48730   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:01,306-Speed 2627.44 samples/sec   Loss 13.7711   LearningRate 0.0886   Epoch: 1   Global Step: 48740   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:05,204-Speed 2627.23 samples/sec   Loss 13.9880   LearningRate 0.0886   Epoch: 1   Global Step: 48750   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:09,105-Speed 2625.78 samples/sec   Loss 13.8887   LearningRate 0.0886   Epoch: 1   Global Step: 48760   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:12,989-Speed 2637.02 samples/sec   Loss 14.0647   LearningRate 0.0886   Epoch: 1   Global Step: 48770   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:16,943-Speed 2590.69 samples/sec   Loss 14.0639   LearningRate 0.0886   Epoch: 1   Global Step: 48780   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:20,878-Speed 2602.27 samples/sec   Loss 13.9664   LearningRate 0.0886   Epoch: 1   Global Step: 48790   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:24,775-Speed 2628.64 samples/sec   Loss 14.1556   LearningRate 0.0886   Epoch: 1   Global Step: 48800   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:11:28,668-Speed 2630.59 samples/sec   Loss 14.0453   LearningRate 0.0886   Epoch: 1   Global Step: 48810   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:32,570-Speed 2624.92 samples/sec   Loss 13.9453   LearningRate 0.0886   Epoch: 1   Global Step: 48820   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:36,488-Speed 2614.47 samples/sec   Loss 14.0618   LearningRate 0.0886   Epoch: 1   Global Step: 48830   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:40,389-Speed 2625.83 samples/sec   Loss 14.0737   LearningRate 0.0886   Epoch: 1   Global Step: 48840   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:44,288-Speed 2627.08 samples/sec   Loss 13.9652   LearningRate 0.0886   Epoch: 1   Global Step: 48850   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:48,192-Speed 2623.16 samples/sec   Loss 13.8942   LearningRate 0.0886   Epoch: 1   Global Step: 48860   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:52,087-Speed 2629.79 samples/sec   Loss 13.9053   LearningRate 0.0886   Epoch: 1   Global Step: 48870   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:55,982-Speed 2629.54 samples/sec   Loss 13.8821   LearningRate 0.0886   Epoch: 1   Global Step: 48880   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:11:59,879-Speed 2627.91 samples/sec   Loss 13.9839   LearningRate 0.0886   Epoch: 1   Global Step: 48890   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:12:03,782-Speed 2624.40 samples/sec   Loss 13.8919   LearningRate 0.0886   Epoch: 1   Global Step: 48900   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:12:07,687-Speed 2622.85 samples/sec   Loss 14.0463   LearningRate 0.0886   Epoch: 1   Global Step: 48910   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:11,587-Speed 2626.79 samples/sec   Loss 13.9540   LearningRate 0.0886   Epoch: 1   Global Step: 48920   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:15,482-Speed 2629.08 samples/sec   Loss 13.9421   LearningRate 0.0886   Epoch: 1   Global Step: 48930   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:19,379-Speed 2628.65 samples/sec   Loss 14.1469   LearningRate 0.0885   Epoch: 1   Global Step: 48940   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:23,273-Speed 2630.50 samples/sec   Loss 13.9099   LearningRate 0.0885   Epoch: 1   Global Step: 48950   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:27,186-Speed 2617.26 samples/sec   Loss 13.9456   LearningRate 0.0885   Epoch: 1   Global Step: 48960   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:31,090-Speed 2623.30 samples/sec   Loss 13.9732   LearningRate 0.0885   Epoch: 1   Global Step: 48970   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:35,062-Speed 2578.53 samples/sec   Loss 13.8757   LearningRate 0.0885   Epoch: 1   Global Step: 48980   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:38,959-Speed 2628.35 samples/sec   Loss 14.0201   LearningRate 0.0885   Epoch: 1   Global Step: 48990   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:42,857-Speed 2627.40 samples/sec   Loss 14.0431   LearningRate 0.0885   Epoch: 1   Global Step: 49000   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:46,742-Speed 2636.76 samples/sec   Loss 13.9349   LearningRate 0.0885   Epoch: 1   Global Step: 49010   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:50,641-Speed 2627.12 samples/sec   Loss 13.9699   LearningRate 0.0885   Epoch: 1   Global Step: 49020   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:54,535-Speed 2630.20 samples/sec   Loss 13.9458   LearningRate 0.0885   Epoch: 1   Global Step: 49030   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:12:58,429-Speed 2630.69 samples/sec   Loss 13.9015   LearningRate 0.0885   Epoch: 1   Global Step: 49040   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:02,325-Speed 2628.60 samples/sec   Loss 14.0679   LearningRate 0.0885   Epoch: 1   Global Step: 49050   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:06,237-Speed 2618.06 samples/sec   Loss 13.9068   LearningRate 0.0885   Epoch: 1   Global Step: 49060   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:10,138-Speed 2625.68 samples/sec   Loss 13.9280   LearningRate 0.0885   Epoch: 1   Global Step: 49070   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:14,046-Speed 2621.77 samples/sec   Loss 14.0824   LearningRate 0.0885   Epoch: 1   Global Step: 49080   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:17,946-Speed 2625.59 samples/sec   Loss 13.9710   LearningRate 0.0885   Epoch: 1   Global Step: 49090   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:21,896-Speed 2593.42 samples/sec   Loss 14.0631   LearningRate 0.0885   Epoch: 1   Global Step: 49100   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:25,960-Speed 2520.78 samples/sec   Loss 13.9501   LearningRate 0.0885   Epoch: 1   Global Step: 49110   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:29,931-Speed 2579.15 samples/sec   Loss 13.9582   LearningRate 0.0885   Epoch: 1   Global Step: 49120   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:33,832-Speed 2625.44 samples/sec   Loss 14.0019   LearningRate 0.0885   Epoch: 1   Global Step: 49130   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:37,733-Speed 2625.46 samples/sec   Loss 13.8035   LearningRate 0.0885   Epoch: 1   Global Step: 49140   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:41,637-Speed 2623.68 samples/sec   Loss 13.9635   LearningRate 0.0885   Epoch: 1   Global Step: 49150   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:45,541-Speed 2623.58 samples/sec   Loss 13.7972   LearningRate 0.0885   Epoch: 1   Global Step: 49160   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:49,450-Speed 2620.30 samples/sec   Loss 13.9927   LearningRate 0.0885   Epoch: 1   Global Step: 49170   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:53,367-Speed 2614.43 samples/sec   Loss 14.0363   LearningRate 0.0885   Epoch: 1   Global Step: 49180   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:13:57,263-Speed 2628.94 samples/sec   Loss 13.7618   LearningRate 0.0885   Epoch: 1   Global Step: 49190   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:14:01,161-Speed 2627.69 samples/sec   Loss 14.0434   LearningRate 0.0885   Epoch: 1   Global Step: 49200   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:14:05,068-Speed 2621.89 samples/sec   Loss 14.0735   LearningRate 0.0885   Epoch: 1   Global Step: 49210   Fp16 Grad Scale: 524288   Required: 87 hours
Training: 2022-04-13 01:14:08,952-Speed 2637.11 samples/sec   Loss 13.9577   LearningRate 0.0885   Epoch: 1   Global Step: 49220   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:14:12,856-Speed 2623.33 samples/sec   Loss 13.9234   LearningRate 0.0885   Epoch: 1   Global Step: 49230   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:14:16,742-Speed 2635.45 samples/sec   Loss 13.8222   LearningRate 0.0885   Epoch: 1   Global Step: 49240   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:20,639-Speed 2628.88 samples/sec   Loss 13.8946   LearningRate 0.0885   Epoch: 1   Global Step: 49250   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:24,537-Speed 2626.77 samples/sec   Loss 13.9583   LearningRate 0.0885   Epoch: 1   Global Step: 49260   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:28,439-Speed 2625.06 samples/sec   Loss 13.8262   LearningRate 0.0885   Epoch: 1   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:32,337-Speed 2627.54 samples/sec   Loss 14.0905   LearningRate 0.0885   Epoch: 1   Global Step: 49280   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:36,234-Speed 2627.92 samples/sec   Loss 13.9592   LearningRate 0.0885   Epoch: 1   Global Step: 49290   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:40,127-Speed 2631.33 samples/sec   Loss 14.0301   LearningRate 0.0885   Epoch: 1   Global Step: 49300   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:44,021-Speed 2630.91 samples/sec   Loss 13.8695   LearningRate 0.0885   Epoch: 1   Global Step: 49310   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:47,920-Speed 2626.61 samples/sec   Loss 14.0518   LearningRate 0.0885   Epoch: 1   Global Step: 49320   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:51,816-Speed 2628.98 samples/sec   Loss 13.8207   LearningRate 0.0885   Epoch: 1   Global Step: 49330   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:14:55,711-Speed 2629.69 samples/sec   Loss 14.1276   LearningRate 0.0885   Epoch: 1   Global Step: 49340   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:14:59,592-Speed 2639.29 samples/sec   Loss 13.8537   LearningRate 0.0885   Epoch: 1   Global Step: 49350   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:03,497-Speed 2622.69 samples/sec   Loss 13.9304   LearningRate 0.0885   Epoch: 1   Global Step: 49360   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:07,395-Speed 2626.87 samples/sec   Loss 13.8455   LearningRate 0.0885   Epoch: 1   Global Step: 49370   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:11,290-Speed 2629.73 samples/sec   Loss 14.0827   LearningRate 0.0884   Epoch: 1   Global Step: 49380   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:15,188-Speed 2627.76 samples/sec   Loss 13.8766   LearningRate 0.0884   Epoch: 1   Global Step: 49390   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:19,083-Speed 2630.08 samples/sec   Loss 13.8512   LearningRate 0.0884   Epoch: 1   Global Step: 49400   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:22,984-Speed 2625.64 samples/sec   Loss 13.9173   LearningRate 0.0884   Epoch: 1   Global Step: 49410   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:26,879-Speed 2630.06 samples/sec   Loss 14.0429   LearningRate 0.0884   Epoch: 1   Global Step: 49420   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:30,773-Speed 2630.37 samples/sec   Loss 13.8448   LearningRate 0.0884   Epoch: 1   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:34,668-Speed 2629.21 samples/sec   Loss 14.0156   LearningRate 0.0884   Epoch: 1   Global Step: 49440   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:15:38,583-Speed 2615.86 samples/sec   Loss 13.9404   LearningRate 0.0884   Epoch: 1   Global Step: 49450   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:15:42,481-Speed 2628.61 samples/sec   Loss 13.8314   LearningRate 0.0884   Epoch: 1   Global Step: 49460   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:15:46,393-Speed 2618.85 samples/sec   Loss 13.9549   LearningRate 0.0884   Epoch: 1   Global Step: 49470   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:15:50,315-Speed 2611.82 samples/sec   Loss 13.8279   LearningRate 0.0884   Epoch: 1   Global Step: 49480   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:15:54,210-Speed 2629.61 samples/sec   Loss 13.8484   LearningRate 0.0884   Epoch: 1   Global Step: 49490   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:15:58,123-Speed 2618.11 samples/sec   Loss 14.0316   LearningRate 0.0884   Epoch: 1   Global Step: 49500   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:16:02,022-Speed 2627.16 samples/sec   Loss 13.9481   LearningRate 0.0884   Epoch: 1   Global Step: 49510   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:16:05,916-Speed 2630.12 samples/sec   Loss 13.9999   LearningRate 0.0884   Epoch: 1   Global Step: 49520   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:16:09,812-Speed 2628.53 samples/sec   Loss 13.8414   LearningRate 0.0884   Epoch: 1   Global Step: 49530   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:16:13,705-Speed 2631.49 samples/sec   Loss 13.9733   LearningRate 0.0884   Epoch: 1   Global Step: 49540   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:16:17,596-Speed 2632.15 samples/sec   Loss 14.0726   LearningRate 0.0884   Epoch: 1   Global Step: 49550   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:16:21,481-Speed 2636.33 samples/sec   Loss 14.0962   LearningRate 0.0884   Epoch: 1   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:16:25,383-Speed 2626.47 samples/sec   Loss 13.9987   LearningRate 0.0884   Epoch: 1   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:16:29,284-Speed 2625.90 samples/sec   Loss 13.8806   LearningRate 0.0884   Epoch: 1   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:16:33,180-Speed 2628.69 samples/sec   Loss 13.9832   LearningRate 0.0884   Epoch: 1   Global Step: 49590   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:16:37,076-Speed 2628.79 samples/sec   Loss 13.9747   LearningRate 0.0884   Epoch: 1   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:16:40,970-Speed 2630.41 samples/sec   Loss 13.9705   LearningRate 0.0884   Epoch: 1   Global Step: 49610   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:16:44,866-Speed 2628.93 samples/sec   Loss 13.8942   LearningRate 0.0884   Epoch: 1   Global Step: 49620   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:16:48,815-Speed 2594.02 samples/sec   Loss 14.0441   LearningRate 0.0884   Epoch: 1   Global Step: 49630   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:16:52,713-Speed 2627.58 samples/sec   Loss 13.7869   LearningRate 0.0884   Epoch: 1   Global Step: 49640   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:16:56,607-Speed 2630.33 samples/sec   Loss 13.7766   LearningRate 0.0884   Epoch: 1   Global Step: 49650   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:17:00,502-Speed 2629.49 samples/sec   Loss 13.8981   LearningRate 0.0884   Epoch: 1   Global Step: 49660   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:17:04,411-Speed 2620.12 samples/sec   Loss 14.0257   LearningRate 0.0884   Epoch: 1   Global Step: 49670   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:17:08,311-Speed 2626.05 samples/sec   Loss 13.9578   LearningRate 0.0884   Epoch: 1   Global Step: 49680   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:17:12,205-Speed 2630.90 samples/sec   Loss 13.8833   LearningRate 0.0884   Epoch: 1   Global Step: 49690   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:17:16,106-Speed 2625.31 samples/sec   Loss 13.9454   LearningRate 0.0884   Epoch: 1   Global Step: 49700   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:17:20,006-Speed 2626.55 samples/sec   Loss 13.9366   LearningRate 0.0884   Epoch: 1   Global Step: 49710   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:23,902-Speed 2628.88 samples/sec   Loss 13.9270   LearningRate 0.0884   Epoch: 1   Global Step: 49720   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:27,813-Speed 2619.05 samples/sec   Loss 13.7845   LearningRate 0.0884   Epoch: 1   Global Step: 49730   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:31,708-Speed 2629.96 samples/sec   Loss 13.8957   LearningRate 0.0884   Epoch: 1   Global Step: 49740   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:35,611-Speed 2624.42 samples/sec   Loss 14.0160   LearningRate 0.0884   Epoch: 1   Global Step: 49750   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:39,520-Speed 2620.44 samples/sec   Loss 13.8984   LearningRate 0.0884   Epoch: 1   Global Step: 49760   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:43,422-Speed 2625.20 samples/sec   Loss 13.8062   LearningRate 0.0884   Epoch: 1   Global Step: 49770   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:47,317-Speed 2629.94 samples/sec   Loss 13.7931   LearningRate 0.0884   Epoch: 1   Global Step: 49780   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:51,215-Speed 2627.46 samples/sec   Loss 13.9275   LearningRate 0.0884   Epoch: 1   Global Step: 49790   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:55,109-Speed 2629.87 samples/sec   Loss 13.8347   LearningRate 0.0884   Epoch: 1   Global Step: 49800   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:17:59,022-Speed 2617.92 samples/sec   Loss 13.8798   LearningRate 0.0884   Epoch: 1   Global Step: 49810   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:18:02,938-Speed 2615.74 samples/sec   Loss 13.8102   LearningRate 0.0883   Epoch: 1   Global Step: 49820   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:18:06,838-Speed 2626.05 samples/sec   Loss 13.8071   LearningRate 0.0883   Epoch: 1   Global Step: 49830   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:18:10,720-Speed 2638.77 samples/sec   Loss 13.9118   LearningRate 0.0883   Epoch: 1   Global Step: 49840   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:14,619-Speed 2627.07 samples/sec   Loss 13.8530   LearningRate 0.0883   Epoch: 1   Global Step: 49850   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:18,518-Speed 2627.36 samples/sec   Loss 13.8131   LearningRate 0.0883   Epoch: 1   Global Step: 49860   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:22,412-Speed 2630.53 samples/sec   Loss 13.8207   LearningRate 0.0883   Epoch: 1   Global Step: 49870   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:26,314-Speed 2624.36 samples/sec   Loss 13.8751   LearningRate 0.0883   Epoch: 1   Global Step: 49880   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:30,310-Speed 2563.17 samples/sec   Loss 13.9348   LearningRate 0.0883   Epoch: 1   Global Step: 49890   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:34,204-Speed 2631.03 samples/sec   Loss 13.9997   LearningRate 0.0883   Epoch: 1   Global Step: 49900   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:38,098-Speed 2630.16 samples/sec   Loss 13.8451   LearningRate 0.0883   Epoch: 1   Global Step: 49910   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:42,000-Speed 2625.01 samples/sec   Loss 13.7688   LearningRate 0.0883   Epoch: 1   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:45,914-Speed 2616.59 samples/sec   Loss 13.9467   LearningRate 0.0883   Epoch: 1   Global Step: 49930   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:18:49,824-Speed 2619.46 samples/sec   Loss 13.8931   LearningRate 0.0883   Epoch: 1   Global Step: 49940   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:18:53,719-Speed 2630.35 samples/sec   Loss 13.8883   LearningRate 0.0883   Epoch: 1   Global Step: 49950   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:18:57,615-Speed 2628.64 samples/sec   Loss 13.9709   LearningRate 0.0883   Epoch: 1   Global Step: 49960   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:19:01,517-Speed 2624.98 samples/sec   Loss 13.8682   LearningRate 0.0883   Epoch: 1   Global Step: 49970   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:19:05,415-Speed 2627.14 samples/sec   Loss 13.8674   LearningRate 0.0883   Epoch: 1   Global Step: 49980   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:19:09,321-Speed 2622.68 samples/sec   Loss 13.9311   LearningRate 0.0883   Epoch: 1   Global Step: 49990   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:19:13,216-Speed 2629.10 samples/sec   Loss 13.8255   LearningRate 0.0883   Epoch: 1   Global Step: 50000   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:19:55,881-[lfw][50000]XNorm: 21.741130
Training: 2022-04-13 01:19:55,882-[lfw][50000]Accuracy-Flip: 0.99600+-0.00335
Training: 2022-04-13 01:19:55,882-[lfw][50000]Accuracy-Highest: 0.99600
Training: 2022-04-13 01:20:46,216-[cfp_fp][50000]XNorm: 19.481108
Training: 2022-04-13 01:20:46,217-[cfp_fp][50000]Accuracy-Flip: 0.97500+-0.00935
Training: 2022-04-13 01:20:46,218-[cfp_fp][50000]Accuracy-Highest: 0.97500
Training: 2022-04-13 01:21:29,214-[agedb_30][50000]XNorm: 21.220101
Training: 2022-04-13 01:21:29,215-[agedb_30][50000]Accuracy-Flip: 0.96100+-0.01081
Training: 2022-04-13 01:21:29,216-[agedb_30][50000]Accuracy-Highest: 0.96100
Training: 2022-04-13 01:21:33,076-Speed 73.22 samples/sec   Loss 13.7808   LearningRate 0.0883   Epoch: 1   Global Step: 50010   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:21:36,943-Speed 2648.86 samples/sec   Loss 13.9753   LearningRate 0.0883   Epoch: 1   Global Step: 50020   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:21:40,814-Speed 2646.19 samples/sec   Loss 13.7504   LearningRate 0.0883   Epoch: 1   Global Step: 50030   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:21:44,707-Speed 2631.01 samples/sec   Loss 13.8417   LearningRate 0.0883   Epoch: 1   Global Step: 50040   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:21:48,617-Speed 2619.86 samples/sec   Loss 13.8234   LearningRate 0.0883   Epoch: 1   Global Step: 50050   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:21:52,514-Speed 2628.24 samples/sec   Loss 13.8655   LearningRate 0.0883   Epoch: 1   Global Step: 50060   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:21:56,405-Speed 2632.49 samples/sec   Loss 13.8623   LearningRate 0.0883   Epoch: 1   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:00,314-Speed 2619.81 samples/sec   Loss 13.8250   LearningRate 0.0883   Epoch: 1   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:04,203-Speed 2634.14 samples/sec   Loss 13.8472   LearningRate 0.0883   Epoch: 1   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:08,094-Speed 2633.07 samples/sec   Loss 13.8505   LearningRate 0.0883   Epoch: 1   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:11,984-Speed 2633.17 samples/sec   Loss 13.9334   LearningRate 0.0883   Epoch: 1   Global Step: 50110   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:22:15,872-Speed 2634.11 samples/sec   Loss 13.7816   LearningRate 0.0883   Epoch: 1   Global Step: 50120   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:22:19,764-Speed 2631.01 samples/sec   Loss 13.8765   LearningRate 0.0883   Epoch: 1   Global Step: 50130   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:22:23,653-Speed 2634.09 samples/sec   Loss 13.9630   LearningRate 0.0883   Epoch: 1   Global Step: 50140   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:22:27,548-Speed 2630.40 samples/sec   Loss 13.9984   LearningRate 0.0883   Epoch: 1   Global Step: 50150   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:31,432-Speed 2636.66 samples/sec   Loss 13.8003   LearningRate 0.0883   Epoch: 1   Global Step: 50160   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:35,330-Speed 2627.61 samples/sec   Loss 13.9886   LearningRate 0.0883   Epoch: 1   Global Step: 50170   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:39,237-Speed 2621.84 samples/sec   Loss 13.8831   LearningRate 0.0883   Epoch: 1   Global Step: 50180   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:43,149-Speed 2618.59 samples/sec   Loss 13.8809   LearningRate 0.0883   Epoch: 1   Global Step: 50190   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:47,236-Speed 2505.84 samples/sec   Loss 13.8762   LearningRate 0.0883   Epoch: 1   Global Step: 50200   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:51,336-Speed 2498.65 samples/sec   Loss 13.8197   LearningRate 0.0883   Epoch: 1   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:55,301-Speed 2583.17 samples/sec   Loss 13.8583   LearningRate 0.0883   Epoch: 1   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:22:59,201-Speed 2626.04 samples/sec   Loss 13.9605   LearningRate 0.0883   Epoch: 1   Global Step: 50230   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:23:03,094-Speed 2631.49 samples/sec   Loss 13.9997   LearningRate 0.0883   Epoch: 1   Global Step: 50240   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:23:06,987-Speed 2630.54 samples/sec   Loss 13.9226   LearningRate 0.0883   Epoch: 1   Global Step: 50250   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:23:10,890-Speed 2624.65 samples/sec   Loss 13.7769   LearningRate 0.0883   Epoch: 1   Global Step: 50260   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:23:14,788-Speed 2627.64 samples/sec   Loss 13.9304   LearningRate 0.0882   Epoch: 1   Global Step: 50270   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:23:18,747-Speed 2587.34 samples/sec   Loss 13.8998   LearningRate 0.0882   Epoch: 1   Global Step: 50280   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:23:22,677-Speed 2606.58 samples/sec   Loss 13.8686   LearningRate 0.0882   Epoch: 1   Global Step: 50290   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:23:26,555-Speed 2641.27 samples/sec   Loss 13.7471   LearningRate 0.0882   Epoch: 1   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:30,451-Speed 2628.86 samples/sec   Loss 13.7989   LearningRate 0.0882   Epoch: 1   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:34,345-Speed 2630.31 samples/sec   Loss 13.8626   LearningRate 0.0882   Epoch: 1   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:38,247-Speed 2624.92 samples/sec   Loss 13.9787   LearningRate 0.0882   Epoch: 1   Global Step: 50330   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:42,145-Speed 2631.32 samples/sec   Loss 13.8855   LearningRate 0.0882   Epoch: 1   Global Step: 50340   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:46,041-Speed 2629.15 samples/sec   Loss 13.9081   LearningRate 0.0882   Epoch: 1   Global Step: 50350   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:49,956-Speed 2616.08 samples/sec   Loss 13.9144   LearningRate 0.0882   Epoch: 1   Global Step: 50360   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:53,848-Speed 2631.91 samples/sec   Loss 13.9868   LearningRate 0.0882   Epoch: 1   Global Step: 50370   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:23:57,750-Speed 2625.21 samples/sec   Loss 13.8303   LearningRate 0.0882   Epoch: 1   Global Step: 50380   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:24:01,656-Speed 2622.11 samples/sec   Loss 13.8282   LearningRate 0.0882   Epoch: 1   Global Step: 50390   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:24:05,573-Speed 2614.86 samples/sec   Loss 13.7785   LearningRate 0.0882   Epoch: 1   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:09,463-Speed 2632.51 samples/sec   Loss 13.8023   LearningRate 0.0882   Epoch: 1   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:13,354-Speed 2633.11 samples/sec   Loss 13.9475   LearningRate 0.0882   Epoch: 1   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:17,248-Speed 2630.62 samples/sec   Loss 13.7294   LearningRate 0.0882   Epoch: 1   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:21,151-Speed 2624.22 samples/sec   Loss 13.8229   LearningRate 0.0882   Epoch: 1   Global Step: 50440   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:25,060-Speed 2620.33 samples/sec   Loss 13.9044   LearningRate 0.0882   Epoch: 1   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:28,953-Speed 2630.73 samples/sec   Loss 13.9809   LearningRate 0.0882   Epoch: 1   Global Step: 50460   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:32,858-Speed 2623.25 samples/sec   Loss 13.9650   LearningRate 0.0882   Epoch: 1   Global Step: 50470   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:36,755-Speed 2628.19 samples/sec   Loss 13.9701   LearningRate 0.0882   Epoch: 1   Global Step: 50480   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:40,650-Speed 2629.86 samples/sec   Loss 13.8376   LearningRate 0.0882   Epoch: 1   Global Step: 50490   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:44,605-Speed 2589.86 samples/sec   Loss 13.8691   LearningRate 0.0882   Epoch: 1   Global Step: 50500   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:48,491-Speed 2635.80 samples/sec   Loss 13.9073   LearningRate 0.0882   Epoch: 1   Global Step: 50510   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:52,387-Speed 2629.13 samples/sec   Loss 13.9352   LearningRate 0.0882   Epoch: 1   Global Step: 50520   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:24:56,302-Speed 2616.85 samples/sec   Loss 13.7322   LearningRate 0.0882   Epoch: 1   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:00,209-Speed 2621.67 samples/sec   Loss 13.9864   LearningRate 0.0882   Epoch: 1   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:04,128-Speed 2613.41 samples/sec   Loss 13.9039   LearningRate 0.0882   Epoch: 1   Global Step: 50550   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:08,024-Speed 2628.67 samples/sec   Loss 13.7583   LearningRate 0.0882   Epoch: 1   Global Step: 50560   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:11,922-Speed 2627.70 samples/sec   Loss 13.6593   LearningRate 0.0882   Epoch: 1   Global Step: 50570   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:15,813-Speed 2632.35 samples/sec   Loss 14.0868   LearningRate 0.0882   Epoch: 1   Global Step: 50580   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:19,706-Speed 2631.87 samples/sec   Loss 13.8017   LearningRate 0.0882   Epoch: 1   Global Step: 50590   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:23,600-Speed 2630.11 samples/sec   Loss 13.8525   LearningRate 0.0882   Epoch: 1   Global Step: 50600   Fp16 Grad Scale: 262144   Required: 88 hours
Training: 2022-04-13 01:25:27,482-Speed 2638.65 samples/sec   Loss 13.7864   LearningRate 0.0882   Epoch: 1   Global Step: 50610   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:31,377-Speed 2629.79 samples/sec   Loss 13.8000   LearningRate 0.0882   Epoch: 1   Global Step: 50620   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:35,276-Speed 2626.29 samples/sec   Loss 13.7956   LearningRate 0.0882   Epoch: 1   Global Step: 50630   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:39,175-Speed 2627.34 samples/sec   Loss 14.0180   LearningRate 0.0882   Epoch: 1   Global Step: 50640   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:43,076-Speed 2625.87 samples/sec   Loss 13.8380   LearningRate 0.0882   Epoch: 1   Global Step: 50650   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:46,973-Speed 2628.45 samples/sec   Loss 13.8372   LearningRate 0.0882   Epoch: 1   Global Step: 50660   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:50,872-Speed 2627.04 samples/sec   Loss 13.8552   LearningRate 0.0882   Epoch: 1   Global Step: 50670   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:54,767-Speed 2629.03 samples/sec   Loss 13.8331   LearningRate 0.0882   Epoch: 1   Global Step: 50680   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:25:58,664-Speed 2628.53 samples/sec   Loss 13.9342   LearningRate 0.0882   Epoch: 1   Global Step: 50690   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:26:02,541-Speed 2642.16 samples/sec   Loss 13.9531   LearningRate 0.0882   Epoch: 1   Global Step: 50700   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:06,512-Speed 2579.07 samples/sec   Loss 13.7797   LearningRate 0.0881   Epoch: 1   Global Step: 50710   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:10,421-Speed 2620.46 samples/sec   Loss 13.8090   LearningRate 0.0881   Epoch: 1   Global Step: 50720   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:14,314-Speed 2630.53 samples/sec   Loss 13.8742   LearningRate 0.0881   Epoch: 1   Global Step: 50730   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:18,207-Speed 2631.58 samples/sec   Loss 13.8570   LearningRate 0.0881   Epoch: 1   Global Step: 50740   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:22,104-Speed 2628.16 samples/sec   Loss 13.8462   LearningRate 0.0881   Epoch: 1   Global Step: 50750   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:25,999-Speed 2630.16 samples/sec   Loss 13.9757   LearningRate 0.0881   Epoch: 1   Global Step: 50760   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:29,899-Speed 2626.16 samples/sec   Loss 13.7303   LearningRate 0.0881   Epoch: 1   Global Step: 50770   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:33,824-Speed 2610.31 samples/sec   Loss 13.8633   LearningRate 0.0881   Epoch: 1   Global Step: 50780   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:37,880-Speed 2525.09 samples/sec   Loss 13.7710   LearningRate 0.0881   Epoch: 1   Global Step: 50790   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:26:41,776-Speed 2628.58 samples/sec   Loss 13.9808   LearningRate 0.0881   Epoch: 1   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:26:45,669-Speed 2631.33 samples/sec   Loss 13.9466   LearningRate 0.0881   Epoch: 1   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:26:49,564-Speed 2629.93 samples/sec   Loss 13.8886   LearningRate 0.0881   Epoch: 1   Global Step: 50820   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:26:53,458-Speed 2630.26 samples/sec   Loss 13.8001   LearningRate 0.0881   Epoch: 1   Global Step: 50830   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:26:57,379-Speed 2612.30 samples/sec   Loss 13.8264   LearningRate 0.0881   Epoch: 1   Global Step: 50840   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:27:01,278-Speed 2627.52 samples/sec   Loss 13.7634   LearningRate 0.0881   Epoch: 1   Global Step: 50850   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:27:05,204-Speed 2608.88 samples/sec   Loss 13.7432   LearningRate 0.0881   Epoch: 1   Global Step: 50860   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:27:09,108-Speed 2623.59 samples/sec   Loss 13.9414   LearningRate 0.0881   Epoch: 1   Global Step: 50870   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:27:13,056-Speed 2593.92 samples/sec   Loss 14.0218   LearningRate 0.0881   Epoch: 1   Global Step: 50880   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:27:16,953-Speed 2628.96 samples/sec   Loss 13.9274   LearningRate 0.0881   Epoch: 1   Global Step: 50890   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:27:20,854-Speed 2625.64 samples/sec   Loss 13.9075   LearningRate 0.0881   Epoch: 1   Global Step: 50900   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:24,749-Speed 2629.74 samples/sec   Loss 13.7872   LearningRate 0.0881   Epoch: 1   Global Step: 50910   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:28,647-Speed 2627.63 samples/sec   Loss 13.9030   LearningRate 0.0881   Epoch: 1   Global Step: 50920   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:32,545-Speed 2627.73 samples/sec   Loss 13.7849   LearningRate 0.0881   Epoch: 1   Global Step: 50930   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:36,444-Speed 2627.04 samples/sec   Loss 13.7835   LearningRate 0.0881   Epoch: 1   Global Step: 50940   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:40,361-Speed 2614.71 samples/sec   Loss 13.9766   LearningRate 0.0881   Epoch: 1   Global Step: 50950   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:44,254-Speed 2630.56 samples/sec   Loss 13.8344   LearningRate 0.0881   Epoch: 1   Global Step: 50960   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:48,154-Speed 2626.59 samples/sec   Loss 13.8646   LearningRate 0.0881   Epoch: 1   Global Step: 50970   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:52,066-Speed 2618.21 samples/sec   Loss 13.8155   LearningRate 0.0881   Epoch: 1   Global Step: 50980   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:27:56,268-Speed 2437.69 samples/sec   Loss 13.9233   LearningRate 0.0881   Epoch: 1   Global Step: 50990   Fp16 Grad Scale: 131072   Required: 88 hours
Training: 2022-04-13 01:28:00,133-Speed 2649.52 samples/sec   Loss 13.8188   LearningRate 0.0881   Epoch: 1   Global Step: 51000   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:28:04,035-Speed 2625.34 samples/sec   Loss 13.8439   LearningRate 0.0881   Epoch: 1   Global Step: 51010   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:28:07,944-Speed 2620.23 samples/sec   Loss 13.8452   LearningRate 0.0881   Epoch: 1   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:28:11,849-Speed 2623.10 samples/sec   Loss 13.8689   LearningRate 0.0881   Epoch: 1   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:28:15,753-Speed 2623.52 samples/sec   Loss 14.0367   LearningRate 0.0881   Epoch: 1   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:28:19,621-Speed 2648.09 samples/sec   Loss 13.9063   LearningRate 0.0881   Epoch: 1   Global Step: 51050   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:23,575-Speed 2590.45 samples/sec   Loss 14.0058   LearningRate 0.0881   Epoch: 1   Global Step: 51060   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:27,481-Speed 2629.70 samples/sec   Loss 13.7980   LearningRate 0.0881   Epoch: 1   Global Step: 51070   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:31,405-Speed 2610.01 samples/sec   Loss 13.6780   LearningRate 0.0881   Epoch: 1   Global Step: 51080   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:35,313-Speed 2620.81 samples/sec   Loss 13.9825   LearningRate 0.0881   Epoch: 1   Global Step: 51090   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:39,208-Speed 2630.07 samples/sec   Loss 13.9629   LearningRate 0.0881   Epoch: 1   Global Step: 51100   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:43,110-Speed 2625.46 samples/sec   Loss 13.9012   LearningRate 0.0881   Epoch: 1   Global Step: 51110   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:47,003-Speed 2630.57 samples/sec   Loss 13.8993   LearningRate 0.0881   Epoch: 1   Global Step: 51120   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:50,897-Speed 2630.56 samples/sec   Loss 13.8998   LearningRate 0.0881   Epoch: 1   Global Step: 51130   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:54,796-Speed 2626.23 samples/sec   Loss 13.9163   LearningRate 0.0881   Epoch: 1   Global Step: 51140   Fp16 Grad Scale: 16384   Required: 88 hours
Training: 2022-04-13 01:28:58,703-Speed 2621.84 samples/sec   Loss 13.8366   LearningRate 0.0880   Epoch: 1   Global Step: 51150   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:02,609-Speed 2622.04 samples/sec   Loss 13.9206   LearningRate 0.0880   Epoch: 1   Global Step: 51160   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:06,507-Speed 2628.04 samples/sec   Loss 13.8270   LearningRate 0.0880   Epoch: 1   Global Step: 51170   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:10,408-Speed 2625.72 samples/sec   Loss 13.7650   LearningRate 0.0880   Epoch: 1   Global Step: 51180   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:14,301-Speed 2631.14 samples/sec   Loss 14.0210   LearningRate 0.0880   Epoch: 1   Global Step: 51190   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:18,384-Speed 2509.27 samples/sec   Loss 13.8866   LearningRate 0.0880   Epoch: 1   Global Step: 51200   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:22,280-Speed 2628.35 samples/sec   Loss 13.9307   LearningRate 0.0880   Epoch: 1   Global Step: 51210   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:26,175-Speed 2630.06 samples/sec   Loss 13.9387   LearningRate 0.0880   Epoch: 1   Global Step: 51220   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:30,072-Speed 2628.55 samples/sec   Loss 13.8742   LearningRate 0.0880   Epoch: 1   Global Step: 51230   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:33,974-Speed 2624.45 samples/sec   Loss 13.8937   LearningRate 0.0880   Epoch: 1   Global Step: 51240   Fp16 Grad Scale: 32768   Required: 88 hours
Training: 2022-04-13 01:29:37,883-Speed 2620.66 samples/sec   Loss 13.8816   LearningRate 0.0880   Epoch: 1   Global Step: 51250   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:29:41,795-Speed 2618.21 samples/sec   Loss 13.8762   LearningRate 0.0880   Epoch: 1   Global Step: 51260   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:29:45,701-Speed 2622.12 samples/sec   Loss 13.8245   LearningRate 0.0880   Epoch: 1   Global Step: 51270   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:29:49,651-Speed 2593.74 samples/sec   Loss 13.8381   LearningRate 0.0880   Epoch: 1   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:29:53,554-Speed 2624.22 samples/sec   Loss 13.8560   LearningRate 0.0880   Epoch: 1   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 88 hours
Training: 2022-04-13 01:29:57,447-Speed 2631.18 samples/sec   Loss 13.7658   LearningRate 0.0880   Epoch: 1   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:30:01,327-Speed 2639.94 samples/sec   Loss 13.9093   LearningRate 0.0880   Epoch: 1   Global Step: 51310   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:05,221-Speed 2630.58 samples/sec   Loss 13.9092   LearningRate 0.0880   Epoch: 1   Global Step: 51320   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:09,114-Speed 2630.80 samples/sec   Loss 13.8230   LearningRate 0.0880   Epoch: 1   Global Step: 51330   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:13,013-Speed 2627.45 samples/sec   Loss 13.8431   LearningRate 0.0880   Epoch: 1   Global Step: 51340   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:16,911-Speed 2627.59 samples/sec   Loss 13.7753   LearningRate 0.0880   Epoch: 1   Global Step: 51350   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:20,812-Speed 2627.15 samples/sec   Loss 13.8373   LearningRate 0.0880   Epoch: 1   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:24,704-Speed 2631.50 samples/sec   Loss 14.0575   LearningRate 0.0880   Epoch: 1   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:28,639-Speed 2603.00 samples/sec   Loss 13.6554   LearningRate 0.0880   Epoch: 1   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:32,531-Speed 2631.71 samples/sec   Loss 13.9141   LearningRate 0.0880   Epoch: 1   Global Step: 51390   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:36,430-Speed 2627.13 samples/sec   Loss 13.9383   LearningRate 0.0880   Epoch: 1   Global Step: 51400   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:30:40,331-Speed 2625.56 samples/sec   Loss 13.9343   LearningRate 0.0880   Epoch: 1   Global Step: 51410   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:30:44,235-Speed 2623.46 samples/sec   Loss 13.8223   LearningRate 0.0880   Epoch: 1   Global Step: 51420   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:30:48,141-Speed 2622.92 samples/sec   Loss 13.8065   LearningRate 0.0880   Epoch: 1   Global Step: 51430   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:30:52,045-Speed 2623.22 samples/sec   Loss 13.8876   LearningRate 0.0880   Epoch: 1   Global Step: 51440   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:30:55,942-Speed 2628.40 samples/sec   Loss 13.9137   LearningRate 0.0880   Epoch: 1   Global Step: 51450   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:30:59,872-Speed 2606.10 samples/sec   Loss 13.8548   LearningRate 0.0880   Epoch: 1   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:31:03,768-Speed 2629.51 samples/sec   Loss 13.9020   LearningRate 0.0880   Epoch: 1   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:31:07,666-Speed 2627.58 samples/sec   Loss 13.8995   LearningRate 0.0880   Epoch: 1   Global Step: 51480   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:31:11,563-Speed 2628.48 samples/sec   Loss 13.9566   LearningRate 0.0880   Epoch: 1   Global Step: 51490   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:31:15,454-Speed 2631.97 samples/sec   Loss 13.7810   LearningRate 0.0880   Epoch: 1   Global Step: 51500   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:31:19,354-Speed 2626.89 samples/sec   Loss 13.7588   LearningRate 0.0880   Epoch: 1   Global Step: 51510   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:31:23,249-Speed 2629.28 samples/sec   Loss 13.8007   LearningRate 0.0880   Epoch: 1   Global Step: 51520   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:31:27,149-Speed 2626.88 samples/sec   Loss 13.6503   LearningRate 0.0880   Epoch: 1   Global Step: 51530   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:31:31,055-Speed 2622.39 samples/sec   Loss 13.8341   LearningRate 0.0880   Epoch: 1   Global Step: 51540   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:31:34,929-Speed 2643.39 samples/sec   Loss 13.8951   LearningRate 0.0880   Epoch: 1   Global Step: 51550   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:31:38,837-Speed 2620.72 samples/sec   Loss 13.8749   LearningRate 0.0880   Epoch: 1   Global Step: 51560   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:31:42,785-Speed 2594.51 samples/sec   Loss 13.7493   LearningRate 0.0880   Epoch: 1   Global Step: 51570   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:31:46,679-Speed 2630.55 samples/sec   Loss 13.7701   LearningRate 0.0880   Epoch: 1   Global Step: 51580   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:31:50,576-Speed 2628.65 samples/sec   Loss 14.0100   LearningRate 0.0879   Epoch: 1   Global Step: 51590   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:31:54,473-Speed 2628.06 samples/sec   Loss 13.7731   LearningRate 0.0879   Epoch: 1   Global Step: 51600   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:31:58,368-Speed 2630.27 samples/sec   Loss 13.7664   LearningRate 0.0879   Epoch: 1   Global Step: 51610   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:32:02,268-Speed 2626.21 samples/sec   Loss 13.8699   LearningRate 0.0879   Epoch: 1   Global Step: 51620   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:32:06,162-Speed 2630.13 samples/sec   Loss 13.7181   LearningRate 0.0879   Epoch: 1   Global Step: 51630   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:32:10,061-Speed 2627.12 samples/sec   Loss 13.8222   LearningRate 0.0879   Epoch: 1   Global Step: 51640   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:32:13,954-Speed 2630.69 samples/sec   Loss 13.9005   LearningRate 0.0879   Epoch: 1   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:17,849-Speed 2629.97 samples/sec   Loss 14.0495   LearningRate 0.0879   Epoch: 1   Global Step: 51660   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:21,759-Speed 2619.98 samples/sec   Loss 13.9642   LearningRate 0.0879   Epoch: 1   Global Step: 51670   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:25,655-Speed 2629.34 samples/sec   Loss 13.7457   LearningRate 0.0879   Epoch: 1   Global Step: 51680   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:29,549-Speed 2629.97 samples/sec   Loss 13.6999   LearningRate 0.0879   Epoch: 1   Global Step: 51690   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:33,440-Speed 2632.34 samples/sec   Loss 13.6262   LearningRate 0.0879   Epoch: 1   Global Step: 51700   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:37,334-Speed 2630.13 samples/sec   Loss 13.8640   LearningRate 0.0879   Epoch: 1   Global Step: 51710   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:41,239-Speed 2622.98 samples/sec   Loss 13.7795   LearningRate 0.0879   Epoch: 1   Global Step: 51720   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:45,137-Speed 2627.18 samples/sec   Loss 13.9495   LearningRate 0.0879   Epoch: 1   Global Step: 51730   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:49,036-Speed 2627.36 samples/sec   Loss 13.9739   LearningRate 0.0879   Epoch: 1   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:32:52,928-Speed 2631.79 samples/sec   Loss 13.9666   LearningRate 0.0879   Epoch: 1   Global Step: 51750   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:32:56,887-Speed 2587.07 samples/sec   Loss 13.8479   LearningRate 0.0879   Epoch: 1   Global Step: 51760   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:33:00,784-Speed 2628.81 samples/sec   Loss 13.7147   LearningRate 0.0879   Epoch: 1   Global Step: 51770   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:33:04,689-Speed 2623.41 samples/sec   Loss 13.7207   LearningRate 0.0879   Epoch: 1   Global Step: 51780   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:33:08,586-Speed 2628.04 samples/sec   Loss 13.7309   LearningRate 0.0879   Epoch: 1   Global Step: 51790   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:33:12,480-Speed 2629.65 samples/sec   Loss 13.6651   LearningRate 0.0879   Epoch: 1   Global Step: 51800   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:33:16,378-Speed 2628.49 samples/sec   Loss 13.9341   LearningRate 0.0879   Epoch: 1   Global Step: 51810   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:33:20,273-Speed 2629.63 samples/sec   Loss 13.8515   LearningRate 0.0879   Epoch: 1   Global Step: 51820   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:33:24,149-Speed 2642.89 samples/sec   Loss 13.8884   LearningRate 0.0879   Epoch: 1   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:28,043-Speed 2630.06 samples/sec   Loss 13.8851   LearningRate 0.0879   Epoch: 1   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:31,934-Speed 2632.55 samples/sec   Loss 13.7361   LearningRate 0.0879   Epoch: 1   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:35,841-Speed 2621.35 samples/sec   Loss 13.7388   LearningRate 0.0879   Epoch: 1   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:39,737-Speed 2628.74 samples/sec   Loss 13.8687   LearningRate 0.0879   Epoch: 1   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:43,639-Speed 2624.98 samples/sec   Loss 13.7600   LearningRate 0.0879   Epoch: 1   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:47,527-Speed 2634.33 samples/sec   Loss 13.8134   LearningRate 0.0879   Epoch: 1   Global Step: 51890   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:51,422-Speed 2630.00 samples/sec   Loss 13.8457   LearningRate 0.0879   Epoch: 1   Global Step: 51900   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:55,317-Speed 2629.90 samples/sec   Loss 13.7654   LearningRate 0.0879   Epoch: 1   Global Step: 51910   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:33:59,213-Speed 2628.83 samples/sec   Loss 13.7988   LearningRate 0.0879   Epoch: 1   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:34:03,115-Speed 2625.04 samples/sec   Loss 13.6845   LearningRate 0.0879   Epoch: 1   Global Step: 51930   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:07,014-Speed 2626.40 samples/sec   Loss 13.9971   LearningRate 0.0879   Epoch: 1   Global Step: 51940   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:10,924-Speed 2619.97 samples/sec   Loss 13.9475   LearningRate 0.0879   Epoch: 1   Global Step: 51950   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:14,841-Speed 2614.75 samples/sec   Loss 13.8696   LearningRate 0.0879   Epoch: 1   Global Step: 51960   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:18,740-Speed 2627.11 samples/sec   Loss 13.7657   LearningRate 0.0879   Epoch: 1   Global Step: 51970   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:22,647-Speed 2621.64 samples/sec   Loss 13.8788   LearningRate 0.0879   Epoch: 1   Global Step: 51980   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:26,659-Speed 2553.02 samples/sec   Loss 13.6067   LearningRate 0.0879   Epoch: 1   Global Step: 51990   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:30,559-Speed 2625.97 samples/sec   Loss 13.9771   LearningRate 0.0879   Epoch: 1   Global Step: 52000   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:34,459-Speed 2626.75 samples/sec   Loss 13.8020   LearningRate 0.0879   Epoch: 1   Global Step: 52010   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:38,370-Speed 2619.21 samples/sec   Loss 13.7722   LearningRate 0.0879   Epoch: 1   Global Step: 52020   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:42,283-Speed 2617.29 samples/sec   Loss 13.8503   LearningRate 0.0878   Epoch: 1   Global Step: 52030   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:34:46,192-Speed 2620.12 samples/sec   Loss 13.8325   LearningRate 0.0878   Epoch: 1   Global Step: 52040   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:34:50,083-Speed 2632.09 samples/sec   Loss 13.8765   LearningRate 0.0878   Epoch: 1   Global Step: 52050   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:53,989-Speed 2623.03 samples/sec   Loss 13.7307   LearningRate 0.0878   Epoch: 1   Global Step: 52060   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:34:57,892-Speed 2624.40 samples/sec   Loss 13.7546   LearningRate 0.0878   Epoch: 1   Global Step: 52070   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:01,887-Speed 2563.11 samples/sec   Loss 13.7472   LearningRate 0.0878   Epoch: 1   Global Step: 52080   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:05,805-Speed 2614.57 samples/sec   Loss 13.7750   LearningRate 0.0878   Epoch: 1   Global Step: 52090   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:09,709-Speed 2623.21 samples/sec   Loss 13.6906   LearningRate 0.0878   Epoch: 1   Global Step: 52100   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:13,609-Speed 2626.70 samples/sec   Loss 13.9099   LearningRate 0.0878   Epoch: 1   Global Step: 52110   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:17,513-Speed 2623.65 samples/sec   Loss 13.8067   LearningRate 0.0878   Epoch: 1   Global Step: 52120   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:21,524-Speed 2553.44 samples/sec   Loss 13.6802   LearningRate 0.0878   Epoch: 1   Global Step: 52130   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:25,430-Speed 2622.66 samples/sec   Loss 13.6334   LearningRate 0.0878   Epoch: 1   Global Step: 52140   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:29,332-Speed 2625.66 samples/sec   Loss 13.7045   LearningRate 0.0878   Epoch: 1   Global Step: 52150   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:35:33,243-Speed 2619.13 samples/sec   Loss 13.8946   LearningRate 0.0878   Epoch: 1   Global Step: 52160   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:35:37,130-Speed 2634.76 samples/sec   Loss 13.9137   LearningRate 0.0878   Epoch: 1   Global Step: 52170   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:41,041-Speed 2618.76 samples/sec   Loss 13.6525   LearningRate 0.0878   Epoch: 1   Global Step: 52180   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:35:44,929-Speed 2634.29 samples/sec   Loss 13.8903   LearningRate 0.0878   Epoch: 1   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:35:48,840-Speed 2619.69 samples/sec   Loss 13.9025   LearningRate 0.0878   Epoch: 1   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:35:52,760-Speed 2612.82 samples/sec   Loss 13.7155   LearningRate 0.0878   Epoch: 1   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:35:56,755-Speed 2564.36 samples/sec   Loss 13.7172   LearningRate 0.0878   Epoch: 1   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:36:00,655-Speed 2625.75 samples/sec   Loss 13.7431   LearningRate 0.0878   Epoch: 1   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:36:04,563-Speed 2621.22 samples/sec   Loss 13.6200   LearningRate 0.0878   Epoch: 1   Global Step: 52240   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:36:08,531-Speed 2580.66 samples/sec   Loss 13.8220   LearningRate 0.0878   Epoch: 1   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:36:12,441-Speed 2620.14 samples/sec   Loss 13.8414   LearningRate 0.0878   Epoch: 1   Global Step: 52260   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:36:16,351-Speed 2619.28 samples/sec   Loss 13.8715   LearningRate 0.0878   Epoch: 1   Global Step: 52270   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:36:20,269-Speed 2614.34 samples/sec   Loss 13.8708   LearningRate 0.0878   Epoch: 1   Global Step: 52280   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:36:24,186-Speed 2614.98 samples/sec   Loss 13.8384   LearningRate 0.0878   Epoch: 1   Global Step: 52290   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:28,092-Speed 2622.87 samples/sec   Loss 13.7355   LearningRate 0.0878   Epoch: 1   Global Step: 52300   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:32,001-Speed 2620.22 samples/sec   Loss 13.8302   LearningRate 0.0878   Epoch: 1   Global Step: 52310   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:35,920-Speed 2613.31 samples/sec   Loss 13.7040   LearningRate 0.0878   Epoch: 1   Global Step: 52320   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:39,831-Speed 2618.85 samples/sec   Loss 13.6456   LearningRate 0.0878   Epoch: 1   Global Step: 52330   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:43,724-Speed 2630.66 samples/sec   Loss 13.7763   LearningRate 0.0878   Epoch: 1   Global Step: 52340   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:47,630-Speed 2622.53 samples/sec   Loss 13.8707   LearningRate 0.0878   Epoch: 1   Global Step: 52350   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:51,524-Speed 2630.60 samples/sec   Loss 13.7260   LearningRate 0.0878   Epoch: 1   Global Step: 52360   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:55,418-Speed 2630.24 samples/sec   Loss 13.8242   LearningRate 0.0878   Epoch: 1   Global Step: 52370   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:36:59,269-Speed 2659.31 samples/sec   Loss 13.7883   LearningRate 0.0878   Epoch: 1   Global Step: 52380   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:03,162-Speed 2631.80 samples/sec   Loss 13.9432   LearningRate 0.0878   Epoch: 1   Global Step: 52390   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:07,270-Speed 2492.68 samples/sec   Loss 13.7609   LearningRate 0.0878   Epoch: 1   Global Step: 52400   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:11,188-Speed 2614.55 samples/sec   Loss 13.8524   LearningRate 0.0878   Epoch: 1   Global Step: 52410   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:15,090-Speed 2625.31 samples/sec   Loss 13.8315   LearningRate 0.0878   Epoch: 1   Global Step: 52420   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:18,983-Speed 2631.06 samples/sec   Loss 13.9785   LearningRate 0.0878   Epoch: 1   Global Step: 52430   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:22,881-Speed 2627.87 samples/sec   Loss 13.8113   LearningRate 0.0878   Epoch: 1   Global Step: 52440   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:26,779-Speed 2627.35 samples/sec   Loss 13.8287   LearningRate 0.0878   Epoch: 1   Global Step: 52450   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:30,695-Speed 2615.21 samples/sec   Loss 13.7797   LearningRate 0.0878   Epoch: 1   Global Step: 52460   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:34,594-Speed 2627.20 samples/sec   Loss 13.9839   LearningRate 0.0878   Epoch: 1   Global Step: 52470   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 01:37:38,491-Speed 2628.57 samples/sec   Loss 13.7543   LearningRate 0.0877   Epoch: 1   Global Step: 52480   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:37:42,390-Speed 2626.76 samples/sec   Loss 13.9084   LearningRate 0.0877   Epoch: 1   Global Step: 52490   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:37:46,288-Speed 2627.62 samples/sec   Loss 13.7483   LearningRate 0.0877   Epoch: 1   Global Step: 52500   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:37:50,181-Speed 2631.00 samples/sec   Loss 13.8041   LearningRate 0.0877   Epoch: 1   Global Step: 52510   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:37:54,096-Speed 2616.15 samples/sec   Loss 13.8797   LearningRate 0.0877   Epoch: 1   Global Step: 52520   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:37:58,004-Speed 2620.86 samples/sec   Loss 13.7482   LearningRate 0.0877   Epoch: 1   Global Step: 52530   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:38:01,903-Speed 2627.35 samples/sec   Loss 13.8932   LearningRate 0.0877   Epoch: 1   Global Step: 52540   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:38:05,802-Speed 2626.95 samples/sec   Loss 13.7424   LearningRate 0.0877   Epoch: 1   Global Step: 52550   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:38:09,697-Speed 2629.34 samples/sec   Loss 13.7106   LearningRate 0.0877   Epoch: 1   Global Step: 52560   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:38:13,626-Speed 2607.33 samples/sec   Loss 13.8361   LearningRate 0.0877   Epoch: 1   Global Step: 52570   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:38:17,521-Speed 2629.34 samples/sec   Loss 13.7000   LearningRate 0.0877   Epoch: 1   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:21,420-Speed 2627.45 samples/sec   Loss 13.8035   LearningRate 0.0877   Epoch: 1   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:25,316-Speed 2629.16 samples/sec   Loss 13.8408   LearningRate 0.0877   Epoch: 1   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:29,220-Speed 2623.30 samples/sec   Loss 13.7362   LearningRate 0.0877   Epoch: 1   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:33,120-Speed 2626.46 samples/sec   Loss 13.8655   LearningRate 0.0877   Epoch: 1   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:37,011-Speed 2632.08 samples/sec   Loss 13.7003   LearningRate 0.0877   Epoch: 1   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:40,907-Speed 2629.15 samples/sec   Loss 13.7286   LearningRate 0.0877   Epoch: 1   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:44,810-Speed 2624.32 samples/sec   Loss 13.7823   LearningRate 0.0877   Epoch: 1   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:48,722-Speed 2618.74 samples/sec   Loss 13.7864   LearningRate 0.0877   Epoch: 1   Global Step: 52660   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:52,621-Speed 2626.84 samples/sec   Loss 13.9610   LearningRate 0.0877   Epoch: 1   Global Step: 52670   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:38:56,518-Speed 2636.43 samples/sec   Loss 13.8252   LearningRate 0.0877   Epoch: 1   Global Step: 52680   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:00,500-Speed 2572.18 samples/sec   Loss 13.8174   LearningRate 0.0877   Epoch: 1   Global Step: 52690   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:04,421-Speed 2612.48 samples/sec   Loss 13.9417   LearningRate 0.0877   Epoch: 1   Global Step: 52700   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:08,314-Speed 2630.60 samples/sec   Loss 13.8732   LearningRate 0.0877   Epoch: 1   Global Step: 52710   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:12,228-Speed 2616.69 samples/sec   Loss 13.6958   LearningRate 0.0877   Epoch: 1   Global Step: 52720   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:16,128-Speed 2626.50 samples/sec   Loss 13.7830   LearningRate 0.0877   Epoch: 1   Global Step: 52730   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:20,037-Speed 2620.56 samples/sec   Loss 13.9567   LearningRate 0.0877   Epoch: 1   Global Step: 52740   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:23,928-Speed 2632.78 samples/sec   Loss 13.7712   LearningRate 0.0877   Epoch: 1   Global Step: 52750   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:27,824-Speed 2629.00 samples/sec   Loss 13.6801   LearningRate 0.0877   Epoch: 1   Global Step: 52760   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:31,717-Speed 2631.23 samples/sec   Loss 13.8046   LearningRate 0.0877   Epoch: 1   Global Step: 52770   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:35,648-Speed 2605.16 samples/sec   Loss 13.8376   LearningRate 0.0877   Epoch: 1   Global Step: 52780   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:39:39,550-Speed 2625.03 samples/sec   Loss 13.8615   LearningRate 0.0877   Epoch: 1   Global Step: 52790   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:39:43,486-Speed 2602.45 samples/sec   Loss 13.8372   LearningRate 0.0877   Epoch: 1   Global Step: 52800   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:39:47,366-Speed 2639.84 samples/sec   Loss 14.0038   LearningRate 0.0877   Epoch: 1   Global Step: 52810   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:51,251-Speed 2636.36 samples/sec   Loss 13.7857   LearningRate 0.0877   Epoch: 1   Global Step: 52820   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:55,154-Speed 2624.26 samples/sec   Loss 13.7883   LearningRate 0.0877   Epoch: 1   Global Step: 52830   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:39:59,052-Speed 2627.61 samples/sec   Loss 13.7197   LearningRate 0.0877   Epoch: 1   Global Step: 52840   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:02,957-Speed 2623.52 samples/sec   Loss 13.6750   LearningRate 0.0877   Epoch: 1   Global Step: 52850   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:06,850-Speed 2631.07 samples/sec   Loss 13.7757   LearningRate 0.0877   Epoch: 1   Global Step: 52860   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:10,742-Speed 2631.81 samples/sec   Loss 13.9032   LearningRate 0.0877   Epoch: 1   Global Step: 52870   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:14,635-Speed 2630.32 samples/sec   Loss 13.9782   LearningRate 0.0877   Epoch: 1   Global Step: 52880   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:18,525-Speed 2636.36 samples/sec   Loss 13.7587   LearningRate 0.0877   Epoch: 1   Global Step: 52890   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:22,433-Speed 2620.90 samples/sec   Loss 13.7088   LearningRate 0.0877   Epoch: 1   Global Step: 52900   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:26,329-Speed 2628.55 samples/sec   Loss 13.7000   LearningRate 0.0877   Epoch: 1   Global Step: 52910   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:40:30,210-Speed 2639.58 samples/sec   Loss 13.7819   LearningRate 0.0876   Epoch: 1   Global Step: 52920   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:34,101-Speed 2632.71 samples/sec   Loss 13.8919   LearningRate 0.0876   Epoch: 1   Global Step: 52930   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:37,994-Speed 2631.34 samples/sec   Loss 13.6959   LearningRate 0.0876   Epoch: 1   Global Step: 52940   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:41,893-Speed 2626.81 samples/sec   Loss 13.8269   LearningRate 0.0876   Epoch: 1   Global Step: 52950   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:45,785-Speed 2631.32 samples/sec   Loss 13.8447   LearningRate 0.0876   Epoch: 1   Global Step: 52960   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:49,675-Speed 2633.29 samples/sec   Loss 13.7280   LearningRate 0.0876   Epoch: 1   Global Step: 52970   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:53,569-Speed 2630.48 samples/sec   Loss 13.7311   LearningRate 0.0876   Epoch: 1   Global Step: 52980   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:40:57,482-Speed 2617.25 samples/sec   Loss 13.7358   LearningRate 0.0876   Epoch: 1   Global Step: 52990   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:41:01,379-Speed 2627.80 samples/sec   Loss 13.7297   LearningRate 0.0876   Epoch: 1   Global Step: 53000   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:41:05,277-Speed 2627.51 samples/sec   Loss 13.6960   LearningRate 0.0876   Epoch: 1   Global Step: 53010   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:41:09,189-Speed 2618.78 samples/sec   Loss 13.7665   LearningRate 0.0876   Epoch: 1   Global Step: 53020   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:41:13,082-Speed 2631.07 samples/sec   Loss 13.8479   LearningRate 0.0876   Epoch: 1   Global Step: 53030   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:41:16,973-Speed 2632.24 samples/sec   Loss 13.8072   LearningRate 0.0876   Epoch: 1   Global Step: 53040   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:41:20,860-Speed 2634.96 samples/sec   Loss 13.6556   LearningRate 0.0876   Epoch: 1   Global Step: 53050   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:41:24,751-Speed 2631.88 samples/sec   Loss 13.6085   LearningRate 0.0876   Epoch: 1   Global Step: 53060   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:41:28,644-Speed 2631.97 samples/sec   Loss 13.6570   LearningRate 0.0876   Epoch: 1   Global Step: 53070   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:41:32,533-Speed 2633.42 samples/sec   Loss 13.6987   LearningRate 0.0876   Epoch: 1   Global Step: 53080   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:41:36,430-Speed 2627.83 samples/sec   Loss 13.7573   LearningRate 0.0876   Epoch: 1   Global Step: 53090   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:41:40,322-Speed 2631.81 samples/sec   Loss 13.6566   LearningRate 0.0876   Epoch: 1   Global Step: 53100   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:41:44,224-Speed 2625.59 samples/sec   Loss 13.7994   LearningRate 0.0876   Epoch: 1   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:41:48,118-Speed 2630.77 samples/sec   Loss 13.8518   LearningRate 0.0876   Epoch: 1   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:41:52,010-Speed 2631.98 samples/sec   Loss 13.7930   LearningRate 0.0876   Epoch: 1   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:41:55,898-Speed 2633.98 samples/sec   Loss 13.8020   LearningRate 0.0876   Epoch: 1   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:41:59,801-Speed 2624.49 samples/sec   Loss 13.7022   LearningRate 0.0876   Epoch: 1   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:42:03,706-Speed 2623.49 samples/sec   Loss 13.8159   LearningRate 0.0876   Epoch: 1   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:42:07,597-Speed 2631.74 samples/sec   Loss 13.8742   LearningRate 0.0876   Epoch: 1   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:42:11,505-Speed 2620.84 samples/sec   Loss 13.6655   LearningRate 0.0876   Epoch: 1   Global Step: 53180   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:15,410-Speed 2623.02 samples/sec   Loss 13.6698   LearningRate 0.0876   Epoch: 1   Global Step: 53190   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:19,311-Speed 2626.07 samples/sec   Loss 13.8304   LearningRate 0.0876   Epoch: 1   Global Step: 53200   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:23,211-Speed 2626.19 samples/sec   Loss 13.6162   LearningRate 0.0876   Epoch: 1   Global Step: 53210   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:27,113-Speed 2625.30 samples/sec   Loss 13.7340   LearningRate 0.0876   Epoch: 1   Global Step: 53220   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:31,010-Speed 2628.47 samples/sec   Loss 13.9467   LearningRate 0.0876   Epoch: 1   Global Step: 53230   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:34,911-Speed 2625.19 samples/sec   Loss 13.7384   LearningRate 0.0876   Epoch: 1   Global Step: 53240   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:38,821-Speed 2619.48 samples/sec   Loss 13.7431   LearningRate 0.0876   Epoch: 1   Global Step: 53250   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:42,721-Speed 2626.11 samples/sec   Loss 13.7665   LearningRate 0.0876   Epoch: 1   Global Step: 53260   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:46,620-Speed 2626.90 samples/sec   Loss 13.8963   LearningRate 0.0876   Epoch: 1   Global Step: 53270   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:42:50,521-Speed 2625.72 samples/sec   Loss 13.7227   LearningRate 0.0876   Epoch: 1   Global Step: 53280   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:42:54,419-Speed 2627.97 samples/sec   Loss 13.7004   LearningRate 0.0876   Epoch: 1   Global Step: 53290   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:42:58,376-Speed 2588.81 samples/sec   Loss 13.7706   LearningRate 0.0876   Epoch: 1   Global Step: 53300   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:43:02,315-Speed 2600.18 samples/sec   Loss 13.7965   LearningRate 0.0876   Epoch: 1   Global Step: 53310   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:43:06,208-Speed 2630.80 samples/sec   Loss 13.5808   LearningRate 0.0876   Epoch: 1   Global Step: 53320   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:43:10,108-Speed 2626.04 samples/sec   Loss 13.7853   LearningRate 0.0876   Epoch: 1   Global Step: 53330   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:13,995-Speed 2635.58 samples/sec   Loss 13.7246   LearningRate 0.0876   Epoch: 1   Global Step: 53340   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:17,904-Speed 2620.16 samples/sec   Loss 13.8359   LearningRate 0.0876   Epoch: 1   Global Step: 53350   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:21,808-Speed 2624.20 samples/sec   Loss 13.7876   LearningRate 0.0875   Epoch: 1   Global Step: 53360   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:25,706-Speed 2627.30 samples/sec   Loss 13.6676   LearningRate 0.0875   Epoch: 1   Global Step: 53370   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:29,604-Speed 2627.76 samples/sec   Loss 13.8325   LearningRate 0.0875   Epoch: 1   Global Step: 53380   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:33,503-Speed 2627.02 samples/sec   Loss 13.8690   LearningRate 0.0875   Epoch: 1   Global Step: 53390   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:37,432-Speed 2606.93 samples/sec   Loss 13.6935   LearningRate 0.0875   Epoch: 1   Global Step: 53400   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:41,332-Speed 2626.36 samples/sec   Loss 13.7283   LearningRate 0.0875   Epoch: 1   Global Step: 53410   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:45,321-Speed 2568.15 samples/sec   Loss 13.7908   LearningRate 0.0875   Epoch: 1   Global Step: 53420   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:43:49,338-Speed 2549.73 samples/sec   Loss 13.9527   LearningRate 0.0875   Epoch: 1   Global Step: 53430   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:43:53,233-Speed 2629.97 samples/sec   Loss 13.7750   LearningRate 0.0875   Epoch: 1   Global Step: 53440   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:43:57,150-Speed 2615.10 samples/sec   Loss 13.7626   LearningRate 0.0875   Epoch: 1   Global Step: 53450   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:01,050-Speed 2626.25 samples/sec   Loss 13.7850   LearningRate 0.0875   Epoch: 1   Global Step: 53460   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:04,962-Speed 2618.31 samples/sec   Loss 13.7220   LearningRate 0.0875   Epoch: 1   Global Step: 53470   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:08,880-Speed 2613.80 samples/sec   Loss 13.8328   LearningRate 0.0875   Epoch: 1   Global Step: 53480   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:12,789-Speed 2620.33 samples/sec   Loss 13.7666   LearningRate 0.0875   Epoch: 1   Global Step: 53490   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:16,714-Speed 2609.10 samples/sec   Loss 13.6406   LearningRate 0.0875   Epoch: 1   Global Step: 53500   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:20,626-Speed 2618.98 samples/sec   Loss 13.7550   LearningRate 0.0875   Epoch: 1   Global Step: 53510   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:24,539-Speed 2617.74 samples/sec   Loss 13.6552   LearningRate 0.0875   Epoch: 1   Global Step: 53520   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:28,461-Speed 2611.57 samples/sec   Loss 13.7670   LearningRate 0.0875   Epoch: 1   Global Step: 53530   Fp16 Grad Scale: 524288   Required: 87 hours
Training: 2022-04-13 01:44:32,379-Speed 2614.37 samples/sec   Loss 13.5855   LearningRate 0.0875   Epoch: 1   Global Step: 53540   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:36,274-Speed 2629.50 samples/sec   Loss 13.8258   LearningRate 0.0875   Epoch: 1   Global Step: 53550   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:40,171-Speed 2628.55 samples/sec   Loss 13.7005   LearningRate 0.0875   Epoch: 1   Global Step: 53560   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:44,066-Speed 2629.04 samples/sec   Loss 13.6413   LearningRate 0.0875   Epoch: 1   Global Step: 53570   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:47,965-Speed 2626.76 samples/sec   Loss 13.8424   LearningRate 0.0875   Epoch: 1   Global Step: 53580   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:44:51,844-Speed 2640.65 samples/sec   Loss 13.6979   LearningRate 0.0875   Epoch: 1   Global Step: 53590   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:44:55,740-Speed 2629.36 samples/sec   Loss 13.6775   LearningRate 0.0875   Epoch: 1   Global Step: 53600   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:44:59,635-Speed 2630.11 samples/sec   Loss 13.7978   LearningRate 0.0875   Epoch: 1   Global Step: 53610   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:03,535-Speed 2626.15 samples/sec   Loss 13.6526   LearningRate 0.0875   Epoch: 1   Global Step: 53620   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:07,437-Speed 2624.96 samples/sec   Loss 13.7715   LearningRate 0.0875   Epoch: 1   Global Step: 53630   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:11,342-Speed 2622.91 samples/sec   Loss 13.7469   LearningRate 0.0875   Epoch: 1   Global Step: 53640   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:15,247-Speed 2622.97 samples/sec   Loss 13.7602   LearningRate 0.0875   Epoch: 1   Global Step: 53650   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:19,142-Speed 2629.33 samples/sec   Loss 13.8489   LearningRate 0.0875   Epoch: 1   Global Step: 53660   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:23,039-Speed 2628.48 samples/sec   Loss 13.7725   LearningRate 0.0875   Epoch: 1   Global Step: 53670   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:26,929-Speed 2632.63 samples/sec   Loss 13.6763   LearningRate 0.0875   Epoch: 1   Global Step: 53680   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:30,863-Speed 2604.45 samples/sec   Loss 13.7939   LearningRate 0.0875   Epoch: 1   Global Step: 53690   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:45:34,763-Speed 2626.42 samples/sec   Loss 13.7383   LearningRate 0.0875   Epoch: 1   Global Step: 53700   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:45:38,660-Speed 2627.96 samples/sec   Loss 13.6189   LearningRate 0.0875   Epoch: 1   Global Step: 53710   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:45:42,569-Speed 2620.78 samples/sec   Loss 13.7159   LearningRate 0.0875   Epoch: 1   Global Step: 53720   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:45:46,469-Speed 2625.74 samples/sec   Loss 13.7808   LearningRate 0.0875   Epoch: 1   Global Step: 53730   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:45:50,378-Speed 2620.16 samples/sec   Loss 13.6068   LearningRate 0.0875   Epoch: 1   Global Step: 53740   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:45:54,281-Speed 2624.28 samples/sec   Loss 13.7634   LearningRate 0.0875   Epoch: 1   Global Step: 53750   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:45:58,177-Speed 2629.42 samples/sec   Loss 13.8529   LearningRate 0.0875   Epoch: 1   Global Step: 53760   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:02,143-Speed 2582.93 samples/sec   Loss 13.7174   LearningRate 0.0875   Epoch: 1   Global Step: 53770   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:06,042-Speed 2626.92 samples/sec   Loss 13.7227   LearningRate 0.0875   Epoch: 1   Global Step: 53780   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:09,956-Speed 2616.89 samples/sec   Loss 13.8368   LearningRate 0.0875   Epoch: 1   Global Step: 53790   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:13,854-Speed 2627.35 samples/sec   Loss 13.6515   LearningRate 0.0875   Epoch: 1   Global Step: 53800   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:17,764-Speed 2619.56 samples/sec   Loss 13.8034   LearningRate 0.0874   Epoch: 1   Global Step: 53810   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:21,676-Speed 2617.80 samples/sec   Loss 13.7027   LearningRate 0.0874   Epoch: 1   Global Step: 53820   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:25,592-Speed 2615.63 samples/sec   Loss 13.5605   LearningRate 0.0874   Epoch: 1   Global Step: 53830   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:29,487-Speed 2629.77 samples/sec   Loss 13.8104   LearningRate 0.0874   Epoch: 1   Global Step: 53840   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:33,370-Speed 2638.29 samples/sec   Loss 13.7037   LearningRate 0.0874   Epoch: 1   Global Step: 53850   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:37,281-Speed 2618.80 samples/sec   Loss 13.7385   LearningRate 0.0874   Epoch: 1   Global Step: 53860   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:41,196-Speed 2615.91 samples/sec   Loss 13.8441   LearningRate 0.0874   Epoch: 1   Global Step: 53870   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:45,162-Speed 2582.39 samples/sec   Loss 13.7214   LearningRate 0.0874   Epoch: 1   Global Step: 53880   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:49,067-Speed 2623.47 samples/sec   Loss 13.8635   LearningRate 0.0874   Epoch: 1   Global Step: 53890   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:52,979-Speed 2618.07 samples/sec   Loss 13.6978   LearningRate 0.0874   Epoch: 1   Global Step: 53900   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:46:56,876-Speed 2628.23 samples/sec   Loss 13.8205   LearningRate 0.0874   Epoch: 1   Global Step: 53910   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:47:00,772-Speed 2629.17 samples/sec   Loss 13.6522   LearningRate 0.0874   Epoch: 1   Global Step: 53920   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:47:04,672-Speed 2625.98 samples/sec   Loss 13.6237   LearningRate 0.0874   Epoch: 1   Global Step: 53930   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:47:08,575-Speed 2625.28 samples/sec   Loss 13.8033   LearningRate 0.0874   Epoch: 1   Global Step: 53940   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:47:12,469-Speed 2630.10 samples/sec   Loss 13.5626   LearningRate 0.0874   Epoch: 1   Global Step: 53950   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:16,365-Speed 2628.55 samples/sec   Loss 13.7314   LearningRate 0.0874   Epoch: 1   Global Step: 53960   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:20,269-Speed 2624.32 samples/sec   Loss 13.7197   LearningRate 0.0874   Epoch: 1   Global Step: 53970   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:24,174-Speed 2622.55 samples/sec   Loss 13.7034   LearningRate 0.0874   Epoch: 1   Global Step: 53980   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:28,066-Speed 2632.00 samples/sec   Loss 13.8059   LearningRate 0.0874   Epoch: 1   Global Step: 53990   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:31,961-Speed 2629.10 samples/sec   Loss 13.6739   LearningRate 0.0874   Epoch: 1   Global Step: 54000   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:35,864-Speed 2624.13 samples/sec   Loss 13.6514   LearningRate 0.0874   Epoch: 1   Global Step: 54010   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:39,757-Speed 2630.82 samples/sec   Loss 13.6138   LearningRate 0.0874   Epoch: 1   Global Step: 54020   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:43,692-Speed 2603.38 samples/sec   Loss 13.7545   LearningRate 0.0874   Epoch: 1   Global Step: 54030   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:47,608-Speed 2615.55 samples/sec   Loss 13.6865   LearningRate 0.0874   Epoch: 1   Global Step: 54040   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:51,490-Speed 2638.89 samples/sec   Loss 13.6893   LearningRate 0.0874   Epoch: 1   Global Step: 54050   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:55,385-Speed 2629.77 samples/sec   Loss 13.7684   LearningRate 0.0874   Epoch: 1   Global Step: 54060   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:47:59,278-Speed 2631.51 samples/sec   Loss 13.7073   LearningRate 0.0874   Epoch: 1   Global Step: 54070   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:48:03,173-Speed 2629.43 samples/sec   Loss 13.7484   LearningRate 0.0874   Epoch: 1   Global Step: 54080   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:48:07,047-Speed 2643.48 samples/sec   Loss 13.8017   LearningRate 0.0874   Epoch: 1   Global Step: 54090   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:10,943-Speed 2629.25 samples/sec   Loss 13.7154   LearningRate 0.0874   Epoch: 1   Global Step: 54100   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:14,834-Speed 2632.49 samples/sec   Loss 13.7939   LearningRate 0.0874   Epoch: 1   Global Step: 54110   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:18,728-Speed 2630.23 samples/sec   Loss 13.7372   LearningRate 0.0874   Epoch: 1   Global Step: 54120   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:22,626-Speed 2628.03 samples/sec   Loss 13.6756   LearningRate 0.0874   Epoch: 1   Global Step: 54130   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:26,518-Speed 2631.30 samples/sec   Loss 13.5967   LearningRate 0.0874   Epoch: 1   Global Step: 54140   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:30,433-Speed 2617.05 samples/sec   Loss 13.7462   LearningRate 0.0874   Epoch: 1   Global Step: 54150   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:34,324-Speed 2631.67 samples/sec   Loss 13.8271   LearningRate 0.0874   Epoch: 1   Global Step: 54160   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:38,220-Speed 2628.99 samples/sec   Loss 13.7006   LearningRate 0.0874   Epoch: 1   Global Step: 54170   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:42,114-Speed 2630.47 samples/sec   Loss 13.6841   LearningRate 0.0874   Epoch: 1   Global Step: 54180   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:46,006-Speed 2631.57 samples/sec   Loss 13.5321   LearningRate 0.0874   Epoch: 1   Global Step: 54190   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:49,900-Speed 2630.62 samples/sec   Loss 13.6558   LearningRate 0.0874   Epoch: 1   Global Step: 54200   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:53,797-Speed 2627.99 samples/sec   Loss 13.6046   LearningRate 0.0874   Epoch: 1   Global Step: 54210   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:48:57,698-Speed 2626.16 samples/sec   Loss 13.7698   LearningRate 0.0874   Epoch: 1   Global Step: 54220   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:49:01,580-Speed 2638.28 samples/sec   Loss 13.6126   LearningRate 0.0874   Epoch: 1   Global Step: 54230   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:05,480-Speed 2626.50 samples/sec   Loss 13.8215   LearningRate 0.0874   Epoch: 1   Global Step: 54240   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:09,370-Speed 2632.83 samples/sec   Loss 13.5269   LearningRate 0.0873   Epoch: 1   Global Step: 54250   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:13,266-Speed 2628.74 samples/sec   Loss 13.7729   LearningRate 0.0873   Epoch: 1   Global Step: 54260   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:17,164-Speed 2627.83 samples/sec   Loss 13.5676   LearningRate 0.0873   Epoch: 1   Global Step: 54270   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:21,065-Speed 2625.82 samples/sec   Loss 13.6328   LearningRate 0.0873   Epoch: 1   Global Step: 54280   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:24,965-Speed 2625.89 samples/sec   Loss 13.7019   LearningRate 0.0873   Epoch: 1   Global Step: 54290   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:28,864-Speed 2627.55 samples/sec   Loss 13.5531   LearningRate 0.0873   Epoch: 1   Global Step: 54300   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:32,762-Speed 2627.84 samples/sec   Loss 13.6156   LearningRate 0.0873   Epoch: 1   Global Step: 54310   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:36,659-Speed 2628.34 samples/sec   Loss 13.6757   LearningRate 0.0873   Epoch: 1   Global Step: 54320   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:49:40,555-Speed 2628.24 samples/sec   Loss 13.7351   LearningRate 0.0873   Epoch: 1   Global Step: 54330   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:49:44,451-Speed 2629.63 samples/sec   Loss 13.6473   LearningRate 0.0873   Epoch: 1   Global Step: 54340   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:49:48,372-Speed 2611.78 samples/sec   Loss 13.5441   LearningRate 0.0873   Epoch: 1   Global Step: 54350   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:49:52,265-Speed 2631.43 samples/sec   Loss 13.7120   LearningRate 0.0873   Epoch: 1   Global Step: 54360   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:49:56,161-Speed 2628.94 samples/sec   Loss 13.5107   LearningRate 0.0873   Epoch: 1   Global Step: 54370   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:00,061-Speed 2626.43 samples/sec   Loss 13.7773   LearningRate 0.0873   Epoch: 1   Global Step: 54380   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:03,961-Speed 2626.61 samples/sec   Loss 13.6130   LearningRate 0.0873   Epoch: 1   Global Step: 54390   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:07,855-Speed 2630.00 samples/sec   Loss 13.6883   LearningRate 0.0873   Epoch: 1   Global Step: 54400   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:11,747-Speed 2631.63 samples/sec   Loss 13.6674   LearningRate 0.0873   Epoch: 1   Global Step: 54410   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:15,640-Speed 2631.15 samples/sec   Loss 13.6139   LearningRate 0.0873   Epoch: 1   Global Step: 54420   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:19,542-Speed 2625.33 samples/sec   Loss 13.6513   LearningRate 0.0873   Epoch: 1   Global Step: 54430   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:50:23,450-Speed 2620.80 samples/sec   Loss 13.5922   LearningRate 0.0873   Epoch: 1   Global Step: 54440   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:50:27,371-Speed 2612.55 samples/sec   Loss 13.6419   LearningRate 0.0873   Epoch: 1   Global Step: 54450   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:50:31,260-Speed 2633.80 samples/sec   Loss 13.8162   LearningRate 0.0873   Epoch: 1   Global Step: 54460   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:50:35,152-Speed 2631.94 samples/sec   Loss 13.8087   LearningRate 0.0873   Epoch: 1   Global Step: 54470   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:50:39,048-Speed 2629.24 samples/sec   Loss 13.6126   LearningRate 0.0873   Epoch: 1   Global Step: 54480   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:50:42,933-Speed 2636.61 samples/sec   Loss 13.7174   LearningRate 0.0873   Epoch: 1   Global Step: 54490   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:46,834-Speed 2625.85 samples/sec   Loss 13.6303   LearningRate 0.0873   Epoch: 1   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:50,748-Speed 2616.13 samples/sec   Loss 13.7373   LearningRate 0.0873   Epoch: 1   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:54,669-Speed 2612.67 samples/sec   Loss 13.7268   LearningRate 0.0873   Epoch: 1   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:50:58,615-Speed 2595.74 samples/sec   Loss 13.7055   LearningRate 0.0873   Epoch: 1   Global Step: 54530   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:02,516-Speed 2625.55 samples/sec   Loss 13.7393   LearningRate 0.0873   Epoch: 1   Global Step: 54540   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:06,416-Speed 2626.80 samples/sec   Loss 13.7177   LearningRate 0.0873   Epoch: 1   Global Step: 54550   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:10,312-Speed 2628.35 samples/sec   Loss 13.6632   LearningRate 0.0873   Epoch: 1   Global Step: 54560   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:14,206-Speed 2630.79 samples/sec   Loss 13.6552   LearningRate 0.0873   Epoch: 1   Global Step: 54570   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:18,100-Speed 2630.29 samples/sec   Loss 13.7417   LearningRate 0.0873   Epoch: 1   Global Step: 54580   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:21,990-Speed 2632.86 samples/sec   Loss 13.6880   LearningRate 0.0873   Epoch: 1   Global Step: 54590   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:51:25,883-Speed 2630.66 samples/sec   Loss 13.6952   LearningRate 0.0873   Epoch: 1   Global Step: 54600   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:51:29,756-Speed 2645.11 samples/sec   Loss 13.5968   LearningRate 0.0873   Epoch: 1   Global Step: 54610   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:33,649-Speed 2630.91 samples/sec   Loss 13.6486   LearningRate 0.0873   Epoch: 1   Global Step: 54620   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:37,543-Speed 2630.26 samples/sec   Loss 13.6180   LearningRate 0.0873   Epoch: 1   Global Step: 54630   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:41,440-Speed 2628.91 samples/sec   Loss 13.7042   LearningRate 0.0873   Epoch: 1   Global Step: 54640   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:45,349-Speed 2620.34 samples/sec   Loss 13.5928   LearningRate 0.0873   Epoch: 1   Global Step: 54650   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:49,248-Speed 2626.32 samples/sec   Loss 13.6315   LearningRate 0.0873   Epoch: 1   Global Step: 54660   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:53,140-Speed 2631.49 samples/sec   Loss 13.6213   LearningRate 0.0873   Epoch: 1   Global Step: 54670   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:51:57,034-Speed 2630.65 samples/sec   Loss 13.7388   LearningRate 0.0873   Epoch: 1   Global Step: 54680   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:52:00,961-Speed 2608.55 samples/sec   Loss 13.7087   LearningRate 0.0872   Epoch: 1   Global Step: 54690   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:52:04,850-Speed 2633.95 samples/sec   Loss 13.6386   LearningRate 0.0872   Epoch: 1   Global Step: 54700   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:52:08,750-Speed 2626.20 samples/sec   Loss 13.5663   LearningRate 0.0872   Epoch: 1   Global Step: 54710   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:52:12,645-Speed 2629.86 samples/sec   Loss 13.8129   LearningRate 0.0872   Epoch: 1   Global Step: 54720   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:52:16,538-Speed 2631.24 samples/sec   Loss 13.6373   LearningRate 0.0872   Epoch: 1   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:20,433-Speed 2629.28 samples/sec   Loss 13.4661   LearningRate 0.0872   Epoch: 1   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:24,323-Speed 2633.07 samples/sec   Loss 13.6804   LearningRate 0.0872   Epoch: 1   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:28,222-Speed 2627.21 samples/sec   Loss 13.8524   LearningRate 0.0872   Epoch: 1   Global Step: 54760   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:32,112-Speed 2632.83 samples/sec   Loss 13.6612   LearningRate 0.0872   Epoch: 1   Global Step: 54770   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:36,018-Speed 2622.64 samples/sec   Loss 13.7980   LearningRate 0.0872   Epoch: 1   Global Step: 54780   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:39,908-Speed 2633.42 samples/sec   Loss 13.7978   LearningRate 0.0872   Epoch: 1   Global Step: 54790   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:43,827-Speed 2613.61 samples/sec   Loss 13.7536   LearningRate 0.0872   Epoch: 1   Global Step: 54800   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:47,720-Speed 2631.61 samples/sec   Loss 13.5556   LearningRate 0.0872   Epoch: 1   Global Step: 54810   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:51,609-Speed 2633.23 samples/sec   Loss 13.7827   LearningRate 0.0872   Epoch: 1   Global Step: 54820   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 01:52:55,508-Speed 2627.12 samples/sec   Loss 13.5899   LearningRate 0.0872   Epoch: 1   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:52:59,401-Speed 2630.90 samples/sec   Loss 13.6300   LearningRate 0.0872   Epoch: 1   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:03,294-Speed 2631.47 samples/sec   Loss 13.7222   LearningRate 0.0872   Epoch: 1   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:07,187-Speed 2630.48 samples/sec   Loss 13.6537   LearningRate 0.0872   Epoch: 1   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:11,080-Speed 2631.66 samples/sec   Loss 13.6930   LearningRate 0.0872   Epoch: 1   Global Step: 54870   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:14,984-Speed 2623.59 samples/sec   Loss 13.8576   LearningRate 0.0872   Epoch: 1   Global Step: 54880   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:18,876-Speed 2631.55 samples/sec   Loss 13.7161   LearningRate 0.0872   Epoch: 1   Global Step: 54890   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:22,777-Speed 2626.38 samples/sec   Loss 13.6541   LearningRate 0.0872   Epoch: 1   Global Step: 54900   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:26,670-Speed 2630.83 samples/sec   Loss 13.5367   LearningRate 0.0872   Epoch: 1   Global Step: 54910   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:30,562-Speed 2632.24 samples/sec   Loss 13.7038   LearningRate 0.0872   Epoch: 1   Global Step: 54920   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:53:34,468-Speed 2621.64 samples/sec   Loss 13.7391   LearningRate 0.0872   Epoch: 1   Global Step: 54930   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:53:38,361-Speed 2630.61 samples/sec   Loss 13.7398   LearningRate 0.0872   Epoch: 1   Global Step: 54940   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:53:42,280-Speed 2613.89 samples/sec   Loss 13.6437   LearningRate 0.0872   Epoch: 1   Global Step: 54950   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:53:46,184-Speed 2623.46 samples/sec   Loss 13.6778   LearningRate 0.0872   Epoch: 1   Global Step: 54960   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:53:50,083-Speed 2627.52 samples/sec   Loss 13.5749   LearningRate 0.0872   Epoch: 1   Global Step: 54970   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:53:53,976-Speed 2631.01 samples/sec   Loss 13.6232   LearningRate 0.0872   Epoch: 1   Global Step: 54980   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:53:57,875-Speed 2627.08 samples/sec   Loss 13.4947   LearningRate 0.0872   Epoch: 1   Global Step: 54990   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:01,767-Speed 2631.29 samples/sec   Loss 13.7350   LearningRate 0.0872   Epoch: 1   Global Step: 55000   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:05,661-Speed 2630.32 samples/sec   Loss 13.6247   LearningRate 0.0872   Epoch: 1   Global Step: 55010   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:09,571-Speed 2619.51 samples/sec   Loss 13.7300   LearningRate 0.0872   Epoch: 1   Global Step: 55020   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:13,465-Speed 2630.54 samples/sec   Loss 13.5364   LearningRate 0.0872   Epoch: 1   Global Step: 55030   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:54:17,357-Speed 2631.51 samples/sec   Loss 13.7922   LearningRate 0.0872   Epoch: 1   Global Step: 55040   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:21,262-Speed 2623.10 samples/sec   Loss 13.7569   LearningRate 0.0872   Epoch: 1   Global Step: 55050   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:25,169-Speed 2621.78 samples/sec   Loss 13.6421   LearningRate 0.0872   Epoch: 1   Global Step: 55060   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:29,081-Speed 2618.51 samples/sec   Loss 13.6648   LearningRate 0.0872   Epoch: 1   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:32,983-Speed 2624.28 samples/sec   Loss 13.5414   LearningRate 0.0872   Epoch: 1   Global Step: 55080   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:36,893-Speed 2619.58 samples/sec   Loss 13.4401   LearningRate 0.0872   Epoch: 1   Global Step: 55090   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:40,792-Speed 2626.92 samples/sec   Loss 13.6351   LearningRate 0.0872   Epoch: 1   Global Step: 55100   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:44,694-Speed 2625.40 samples/sec   Loss 13.7467   LearningRate 0.0872   Epoch: 1   Global Step: 55110   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:48,588-Speed 2630.15 samples/sec   Loss 13.5481   LearningRate 0.0872   Epoch: 1   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:52,481-Speed 2631.26 samples/sec   Loss 13.6540   LearningRate 0.0872   Epoch: 1   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:54:56,372-Speed 2632.40 samples/sec   Loss 13.7258   LearningRate 0.0871   Epoch: 1   Global Step: 55140   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:00,273-Speed 2625.37 samples/sec   Loss 13.7318   LearningRate 0.0871   Epoch: 1   Global Step: 55150   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:04,167-Speed 2630.58 samples/sec   Loss 13.6682   LearningRate 0.0871   Epoch: 1   Global Step: 55160   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:08,060-Speed 2630.70 samples/sec   Loss 13.6021   LearningRate 0.0871   Epoch: 1   Global Step: 55170   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:11,956-Speed 2629.04 samples/sec   Loss 13.6352   LearningRate 0.0871   Epoch: 1   Global Step: 55180   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:15,847-Speed 2632.17 samples/sec   Loss 13.5620   LearningRate 0.0871   Epoch: 1   Global Step: 55190   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:19,741-Speed 2630.87 samples/sec   Loss 13.5537   LearningRate 0.0871   Epoch: 1   Global Step: 55200   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:23,636-Speed 2630.06 samples/sec   Loss 13.7027   LearningRate 0.0871   Epoch: 1   Global Step: 55210   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:55:27,525-Speed 2633.01 samples/sec   Loss 13.6170   LearningRate 0.0871   Epoch: 1   Global Step: 55220   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:31,415-Speed 2633.72 samples/sec   Loss 13.6214   LearningRate 0.0871   Epoch: 1   Global Step: 55230   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:35,308-Speed 2630.39 samples/sec   Loss 13.6862   LearningRate 0.0871   Epoch: 1   Global Step: 55240   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:39,212-Speed 2623.36 samples/sec   Loss 13.5828   LearningRate 0.0871   Epoch: 1   Global Step: 55250   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:43,104-Speed 2631.34 samples/sec   Loss 13.6443   LearningRate 0.0871   Epoch: 1   Global Step: 55260   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:47,001-Speed 2628.67 samples/sec   Loss 13.6968   LearningRate 0.0871   Epoch: 1   Global Step: 55270   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:50,894-Speed 2631.32 samples/sec   Loss 13.6200   LearningRate 0.0871   Epoch: 1   Global Step: 55280   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:54,787-Speed 2630.57 samples/sec   Loss 13.5623   LearningRate 0.0871   Epoch: 1   Global Step: 55290   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:55:58,682-Speed 2630.35 samples/sec   Loss 13.6691   LearningRate 0.0871   Epoch: 1   Global Step: 55300   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:56:02,587-Speed 2622.73 samples/sec   Loss 13.6071   LearningRate 0.0871   Epoch: 1   Global Step: 55310   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:56:06,467-Speed 2639.58 samples/sec   Loss 13.6782   LearningRate 0.0871   Epoch: 1   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:10,349-Speed 2637.88 samples/sec   Loss 13.6381   LearningRate 0.0871   Epoch: 1   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:14,253-Speed 2624.33 samples/sec   Loss 13.7209   LearningRate 0.0871   Epoch: 1   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:18,151-Speed 2627.40 samples/sec   Loss 13.7201   LearningRate 0.0871   Epoch: 1   Global Step: 55350   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:22,044-Speed 2630.60 samples/sec   Loss 13.6608   LearningRate 0.0871   Epoch: 1   Global Step: 55360   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:25,941-Speed 2628.74 samples/sec   Loss 13.6239   LearningRate 0.0871   Epoch: 1   Global Step: 55370   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:29,836-Speed 2629.81 samples/sec   Loss 13.5730   LearningRate 0.0871   Epoch: 1   Global Step: 55380   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:33,729-Speed 2630.61 samples/sec   Loss 13.7705   LearningRate 0.0871   Epoch: 1   Global Step: 55390   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:37,637-Speed 2621.57 samples/sec   Loss 13.6062   LearningRate 0.0871   Epoch: 1   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:41,530-Speed 2630.77 samples/sec   Loss 13.7033   LearningRate 0.0871   Epoch: 1   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:56:45,441-Speed 2619.12 samples/sec   Loss 13.5057   LearningRate 0.0871   Epoch: 1   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:56:49,346-Speed 2622.77 samples/sec   Loss 13.6385   LearningRate 0.0871   Epoch: 1   Global Step: 55430   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:56:53,240-Speed 2630.85 samples/sec   Loss 13.7025   LearningRate 0.0871   Epoch: 1   Global Step: 55440   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:56:57,135-Speed 2629.62 samples/sec   Loss 13.6025   LearningRate 0.0871   Epoch: 1   Global Step: 55450   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:01,039-Speed 2623.80 samples/sec   Loss 13.7085   LearningRate 0.0871   Epoch: 1   Global Step: 55460   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:04,950-Speed 2619.12 samples/sec   Loss 13.7756   LearningRate 0.0871   Epoch: 1   Global Step: 55470   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:08,854-Speed 2623.90 samples/sec   Loss 13.5099   LearningRate 0.0871   Epoch: 1   Global Step: 55480   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:12,748-Speed 2630.34 samples/sec   Loss 13.6364   LearningRate 0.0871   Epoch: 1   Global Step: 55490   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:16,651-Speed 2625.09 samples/sec   Loss 13.7140   LearningRate 0.0871   Epoch: 1   Global Step: 55500   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:20,577-Speed 2608.58 samples/sec   Loss 13.7257   LearningRate 0.0871   Epoch: 1   Global Step: 55510   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:24,472-Speed 2629.67 samples/sec   Loss 13.6715   LearningRate 0.0871   Epoch: 1   Global Step: 55520   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:57:28,350-Speed 2641.23 samples/sec   Loss 13.7073   LearningRate 0.0871   Epoch: 1   Global Step: 55530   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:32,263-Speed 2617.24 samples/sec   Loss 13.6573   LearningRate 0.0871   Epoch: 1   Global Step: 55540   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:36,158-Speed 2629.53 samples/sec   Loss 13.7104   LearningRate 0.0871   Epoch: 1   Global Step: 55550   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:40,065-Speed 2622.32 samples/sec   Loss 13.4431   LearningRate 0.0871   Epoch: 1   Global Step: 55560   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:43,966-Speed 2625.90 samples/sec   Loss 13.7545   LearningRate 0.0871   Epoch: 1   Global Step: 55570   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:47,875-Speed 2619.32 samples/sec   Loss 13.5607   LearningRate 0.0870   Epoch: 1   Global Step: 55580   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:51,784-Speed 2620.62 samples/sec   Loss 13.6100   LearningRate 0.0870   Epoch: 1   Global Step: 55590   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:55,697-Speed 2617.00 samples/sec   Loss 13.7607   LearningRate 0.0870   Epoch: 1   Global Step: 55600   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:57:59,610-Speed 2618.17 samples/sec   Loss 13.7278   LearningRate 0.0870   Epoch: 1   Global Step: 55610   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:58:03,506-Speed 2629.02 samples/sec   Loss 13.5872   LearningRate 0.0870   Epoch: 1   Global Step: 55620   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:58:07,403-Speed 2628.65 samples/sec   Loss 13.5043   LearningRate 0.0870   Epoch: 1   Global Step: 55630   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:58:11,286-Speed 2637.60 samples/sec   Loss 13.7508   LearningRate 0.0870   Epoch: 1   Global Step: 55640   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:58:15,262-Speed 2576.19 samples/sec   Loss 13.7510   LearningRate 0.0870   Epoch: 1   Global Step: 55650   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:58:19,193-Speed 2605.75 samples/sec   Loss 13.4634   LearningRate 0.0870   Epoch: 1   Global Step: 55660   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:58:23,113-Speed 2612.56 samples/sec   Loss 13.5692   LearningRate 0.0870   Epoch: 1   Global Step: 55670   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:58:27,012-Speed 2627.31 samples/sec   Loss 13.5239   LearningRate 0.0870   Epoch: 1   Global Step: 55680   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:58:30,894-Speed 2638.36 samples/sec   Loss 13.3937   LearningRate 0.0870   Epoch: 1   Global Step: 55690   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:58:34,791-Speed 2628.28 samples/sec   Loss 13.5272   LearningRate 0.0870   Epoch: 1   Global Step: 55700   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:58:38,685-Speed 2630.35 samples/sec   Loss 13.6138   LearningRate 0.0870   Epoch: 1   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:58:42,581-Speed 2629.08 samples/sec   Loss 13.6961   LearningRate 0.0870   Epoch: 1   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:58:46,477-Speed 2628.97 samples/sec   Loss 13.7348   LearningRate 0.0870   Epoch: 1   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:58:50,372-Speed 2629.84 samples/sec   Loss 13.5868   LearningRate 0.0870   Epoch: 1   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:58:54,266-Speed 2630.15 samples/sec   Loss 13.5894   LearningRate 0.0870   Epoch: 1   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:58:58,169-Speed 2624.18 samples/sec   Loss 13.6475   LearningRate 0.0870   Epoch: 1   Global Step: 55760   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:59:02,070-Speed 2625.50 samples/sec   Loss 13.5032   LearningRate 0.0870   Epoch: 1   Global Step: 55770   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:59:05,968-Speed 2627.50 samples/sec   Loss 13.5709   LearningRate 0.0870   Epoch: 1   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 01:59:09,871-Speed 2624.67 samples/sec   Loss 13.6148   LearningRate 0.0870   Epoch: 1   Global Step: 55790   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:13,905-Speed 2539.07 samples/sec   Loss 13.4833   LearningRate 0.0870   Epoch: 1   Global Step: 55800   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:17,963-Speed 2523.76 samples/sec   Loss 13.6007   LearningRate 0.0870   Epoch: 1   Global Step: 55810   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:21,863-Speed 2627.16 samples/sec   Loss 13.6440   LearningRate 0.0870   Epoch: 1   Global Step: 55820   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:25,769-Speed 2622.06 samples/sec   Loss 13.6493   LearningRate 0.0870   Epoch: 1   Global Step: 55830   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:29,669-Speed 2626.29 samples/sec   Loss 13.5988   LearningRate 0.0870   Epoch: 1   Global Step: 55840   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:33,569-Speed 2626.12 samples/sec   Loss 13.5851   LearningRate 0.0870   Epoch: 1   Global Step: 55850   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:37,477-Speed 2620.96 samples/sec   Loss 13.6108   LearningRate 0.0870   Epoch: 1   Global Step: 55860   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:41,370-Speed 2630.32 samples/sec   Loss 13.5935   LearningRate 0.0870   Epoch: 1   Global Step: 55870   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:45,267-Speed 2629.02 samples/sec   Loss 13.5929   LearningRate 0.0870   Epoch: 1   Global Step: 55880   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 01:59:49,161-Speed 2630.22 samples/sec   Loss 13.5587   LearningRate 0.0870   Epoch: 1   Global Step: 55890   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:59:53,090-Speed 2607.39 samples/sec   Loss 13.6429   LearningRate 0.0870   Epoch: 1   Global Step: 55900   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 01:59:56,996-Speed 2622.12 samples/sec   Loss 13.5280   LearningRate 0.0870   Epoch: 1   Global Step: 55910   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:00,904-Speed 2621.50 samples/sec   Loss 13.7312   LearningRate 0.0870   Epoch: 1   Global Step: 55920   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:04,812-Speed 2620.16 samples/sec   Loss 13.8084   LearningRate 0.0870   Epoch: 1   Global Step: 55930   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:08,721-Speed 2620.74 samples/sec   Loss 13.6175   LearningRate 0.0870   Epoch: 1   Global Step: 55940   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:12,633-Speed 2618.27 samples/sec   Loss 13.5936   LearningRate 0.0870   Epoch: 1   Global Step: 55950   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:16,539-Speed 2622.37 samples/sec   Loss 13.7198   LearningRate 0.0870   Epoch: 1   Global Step: 55960   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:20,445-Speed 2622.39 samples/sec   Loss 13.6341   LearningRate 0.0870   Epoch: 1   Global Step: 55970   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:24,352-Speed 2621.25 samples/sec   Loss 13.7886   LearningRate 0.0870   Epoch: 1   Global Step: 55980   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:28,256-Speed 2624.37 samples/sec   Loss 13.5844   LearningRate 0.0870   Epoch: 1   Global Step: 55990   Fp16 Grad Scale: 524288   Required: 87 hours
Training: 2022-04-13 02:00:32,164-Speed 2620.43 samples/sec   Loss 13.5642   LearningRate 0.0870   Epoch: 1   Global Step: 56000   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:36,071-Speed 2621.81 samples/sec   Loss 13.5436   LearningRate 0.0870   Epoch: 1   Global Step: 56010   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:39,984-Speed 2617.59 samples/sec   Loss 13.6115   LearningRate 0.0870   Epoch: 1   Global Step: 56020   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:43,894-Speed 2619.88 samples/sec   Loss 13.6458   LearningRate 0.0869   Epoch: 1   Global Step: 56030   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:47,795-Speed 2625.45 samples/sec   Loss 13.6635   LearningRate 0.0869   Epoch: 1   Global Step: 56040   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:51,724-Speed 2607.26 samples/sec   Loss 13.7354   LearningRate 0.0869   Epoch: 1   Global Step: 56050   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:55,618-Speed 2630.61 samples/sec   Loss 13.6860   LearningRate 0.0869   Epoch: 1   Global Step: 56060   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:00:59,512-Speed 2630.46 samples/sec   Loss 13.6433   LearningRate 0.0869   Epoch: 1   Global Step: 56070   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:03,402-Speed 2632.81 samples/sec   Loss 13.7211   LearningRate 0.0869   Epoch: 1   Global Step: 56080   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:07,317-Speed 2616.19 samples/sec   Loss 13.6352   LearningRate 0.0869   Epoch: 1   Global Step: 56090   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:11,216-Speed 2627.15 samples/sec   Loss 13.6445   LearningRate 0.0869   Epoch: 1   Global Step: 56100   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:15,123-Speed 2621.74 samples/sec   Loss 13.6010   LearningRate 0.0869   Epoch: 1   Global Step: 56110   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:19,022-Speed 2627.12 samples/sec   Loss 13.5306   LearningRate 0.0869   Epoch: 1   Global Step: 56120   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:22,951-Speed 2607.04 samples/sec   Loss 13.5703   LearningRate 0.0869   Epoch: 1   Global Step: 56130   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:26,843-Speed 2632.26 samples/sec   Loss 13.6676   LearningRate 0.0869   Epoch: 1   Global Step: 56140   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:01:30,691-Speed 2661.74 samples/sec   Loss 13.8708   LearningRate 0.0869   Epoch: 1   Global Step: 56150   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:01:34,600-Speed 2619.97 samples/sec   Loss 13.4749   LearningRate 0.0869   Epoch: 1   Global Step: 56160   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:01:38,528-Speed 2608.25 samples/sec   Loss 13.5467   LearningRate 0.0869   Epoch: 1   Global Step: 56170   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:01:42,420-Speed 2630.98 samples/sec   Loss 13.5801   LearningRate 0.0869   Epoch: 1   Global Step: 56180   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:01:46,325-Speed 2623.25 samples/sec   Loss 13.6105   LearningRate 0.0869   Epoch: 1   Global Step: 56190   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:01:50,217-Speed 2631.93 samples/sec   Loss 13.6953   LearningRate 0.0869   Epoch: 1   Global Step: 56200   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:01:54,113-Speed 2629.20 samples/sec   Loss 13.6315   LearningRate 0.0869   Epoch: 1   Global Step: 56210   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:01:58,009-Speed 2628.81 samples/sec   Loss 13.4678   LearningRate 0.0869   Epoch: 1   Global Step: 56220   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:02:01,902-Speed 2631.40 samples/sec   Loss 13.6619   LearningRate 0.0869   Epoch: 1   Global Step: 56230   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:02:05,799-Speed 2628.25 samples/sec   Loss 13.7096   LearningRate 0.0869   Epoch: 1   Global Step: 56240   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:02:09,695-Speed 2628.75 samples/sec   Loss 13.4775   LearningRate 0.0869   Epoch: 1   Global Step: 56250   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:13,589-Speed 2629.99 samples/sec   Loss 13.6840   LearningRate 0.0869   Epoch: 1   Global Step: 56260   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:17,508-Speed 2613.94 samples/sec   Loss 13.6166   LearningRate 0.0869   Epoch: 1   Global Step: 56270   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:21,404-Speed 2629.47 samples/sec   Loss 13.5647   LearningRate 0.0869   Epoch: 1   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:25,297-Speed 2630.86 samples/sec   Loss 13.3841   LearningRate 0.0869   Epoch: 1   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:29,206-Speed 2619.89 samples/sec   Loss 13.5239   LearningRate 0.0869   Epoch: 1   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:33,102-Speed 2629.02 samples/sec   Loss 13.4743   LearningRate 0.0869   Epoch: 1   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:36,992-Speed 2633.68 samples/sec   Loss 13.5226   LearningRate 0.0869   Epoch: 1   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:40,883-Speed 2632.20 samples/sec   Loss 13.5473   LearningRate 0.0869   Epoch: 1   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:44,792-Speed 2620.15 samples/sec   Loss 13.8219   LearningRate 0.0869   Epoch: 1   Global Step: 56340   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:02:48,689-Speed 2628.70 samples/sec   Loss 13.6197   LearningRate 0.0869   Epoch: 1   Global Step: 56350   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:02:52,581-Speed 2631.15 samples/sec   Loss 13.5089   LearningRate 0.0869   Epoch: 1   Global Step: 56360   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:02:56,510-Speed 2606.97 samples/sec   Loss 13.6546   LearningRate 0.0869   Epoch: 1   Global Step: 56370   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:00,415-Speed 2622.80 samples/sec   Loss 13.6639   LearningRate 0.0869   Epoch: 1   Global Step: 56380   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:04,335-Speed 2613.43 samples/sec   Loss 13.6135   LearningRate 0.0869   Epoch: 1   Global Step: 56390   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:08,233-Speed 2627.38 samples/sec   Loss 13.6706   LearningRate 0.0869   Epoch: 1   Global Step: 56400   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:12,162-Speed 2607.47 samples/sec   Loss 13.5648   LearningRate 0.0869   Epoch: 1   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:16,104-Speed 2598.03 samples/sec   Loss 13.4222   LearningRate 0.0869   Epoch: 1   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:20,021-Speed 2615.09 samples/sec   Loss 13.5667   LearningRate 0.0869   Epoch: 1   Global Step: 56430   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:23,953-Speed 2605.07 samples/sec   Loss 13.6794   LearningRate 0.0869   Epoch: 1   Global Step: 56440   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:27,922-Speed 2580.91 samples/sec   Loss 13.4559   LearningRate 0.0869   Epoch: 1   Global Step: 56450   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:03:31,816-Speed 2629.63 samples/sec   Loss 13.5121   LearningRate 0.0869   Epoch: 1   Global Step: 56460   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:03:35,709-Speed 2631.57 samples/sec   Loss 13.5443   LearningRate 0.0868   Epoch: 1   Global Step: 56470   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:39,608-Speed 2627.03 samples/sec   Loss 13.4822   LearningRate 0.0868   Epoch: 1   Global Step: 56480   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:43,521-Speed 2618.08 samples/sec   Loss 13.5658   LearningRate 0.0868   Epoch: 1   Global Step: 56490   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:47,420-Speed 2626.83 samples/sec   Loss 13.4995   LearningRate 0.0868   Epoch: 1   Global Step: 56500   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:51,317-Speed 2629.36 samples/sec   Loss 13.6272   LearningRate 0.0868   Epoch: 1   Global Step: 56510   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:55,210-Speed 2631.49 samples/sec   Loss 13.6287   LearningRate 0.0868   Epoch: 1   Global Step: 56520   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:03:59,109-Speed 2626.13 samples/sec   Loss 13.6108   LearningRate 0.0868   Epoch: 1   Global Step: 56530   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:03,014-Speed 2623.42 samples/sec   Loss 13.6372   LearningRate 0.0868   Epoch: 1   Global Step: 56540   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:06,914-Speed 2626.21 samples/sec   Loss 13.4574   LearningRate 0.0868   Epoch: 1   Global Step: 56550   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:10,829-Speed 2616.75 samples/sec   Loss 13.5913   LearningRate 0.0868   Epoch: 1   Global Step: 56560   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:14,738-Speed 2619.96 samples/sec   Loss 13.5125   LearningRate 0.0868   Epoch: 1   Global Step: 56570   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:04:18,640-Speed 2625.54 samples/sec   Loss 13.6148   LearningRate 0.0868   Epoch: 1   Global Step: 56580   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:04:22,547-Speed 2621.47 samples/sec   Loss 13.6042   LearningRate 0.0868   Epoch: 1   Global Step: 56590   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:04:26,417-Speed 2646.77 samples/sec   Loss 13.4964   LearningRate 0.0868   Epoch: 1   Global Step: 56600   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:30,312-Speed 2629.46 samples/sec   Loss 13.5973   LearningRate 0.0868   Epoch: 1   Global Step: 56610   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:34,243-Speed 2606.04 samples/sec   Loss 13.6153   LearningRate 0.0868   Epoch: 1   Global Step: 56620   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:38,145-Speed 2624.60 samples/sec   Loss 13.4679   LearningRate 0.0868   Epoch: 1   Global Step: 56630   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:42,040-Speed 2630.10 samples/sec   Loss 13.5334   LearningRate 0.0868   Epoch: 1   Global Step: 56640   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:45,947-Speed 2621.43 samples/sec   Loss 13.4576   LearningRate 0.0868   Epoch: 1   Global Step: 56650   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:49,841-Speed 2630.29 samples/sec   Loss 13.6149   LearningRate 0.0868   Epoch: 1   Global Step: 56660   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:53,737-Speed 2629.81 samples/sec   Loss 13.5063   LearningRate 0.0868   Epoch: 1   Global Step: 56670   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:04:57,653-Speed 2615.37 samples/sec   Loss 13.5752   LearningRate 0.0868   Epoch: 1   Global Step: 56680   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:01,544-Speed 2631.68 samples/sec   Loss 13.5491   LearningRate 0.0868   Epoch: 1   Global Step: 56690   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:05,441-Speed 2628.66 samples/sec   Loss 13.5419   LearningRate 0.0868   Epoch: 1   Global Step: 56700   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:05:09,338-Speed 2628.99 samples/sec   Loss 13.4853   LearningRate 0.0868   Epoch: 1   Global Step: 56710   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:05:13,219-Speed 2639.07 samples/sec   Loss 13.5478   LearningRate 0.0868   Epoch: 1   Global Step: 56720   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:17,113-Speed 2630.44 samples/sec   Loss 13.4878   LearningRate 0.0868   Epoch: 1   Global Step: 56730   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:21,005-Speed 2632.26 samples/sec   Loss 13.4938   LearningRate 0.0868   Epoch: 1   Global Step: 56740   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:24,898-Speed 2630.36 samples/sec   Loss 13.7358   LearningRate 0.0868   Epoch: 1   Global Step: 56750   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:28,884-Speed 2570.22 samples/sec   Loss 13.5157   LearningRate 0.0868   Epoch: 1   Global Step: 56760   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:32,776-Speed 2631.18 samples/sec   Loss 13.4473   LearningRate 0.0868   Epoch: 1   Global Step: 56770   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:36,676-Speed 2626.34 samples/sec   Loss 13.5737   LearningRate 0.0868   Epoch: 1   Global Step: 56780   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:40,582-Speed 2622.27 samples/sec   Loss 13.5044   LearningRate 0.0868   Epoch: 1   Global Step: 56790   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:44,476-Speed 2630.35 samples/sec   Loss 13.4981   LearningRate 0.0868   Epoch: 1   Global Step: 56800   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:48,371-Speed 2629.54 samples/sec   Loss 13.6274   LearningRate 0.0868   Epoch: 1   Global Step: 56810   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:05:52,272-Speed 2625.74 samples/sec   Loss 13.3686   LearningRate 0.0868   Epoch: 1   Global Step: 56820   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:05:56,165-Speed 2631.18 samples/sec   Loss 13.5358   LearningRate 0.0868   Epoch: 1   Global Step: 56830   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:06:00,106-Speed 2599.01 samples/sec   Loss 13.5897   LearningRate 0.0868   Epoch: 1   Global Step: 56840   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:06:04,059-Speed 2590.59 samples/sec   Loss 13.5801   LearningRate 0.0868   Epoch: 1   Global Step: 56850   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:06:07,947-Speed 2634.56 samples/sec   Loss 13.5855   LearningRate 0.0868   Epoch: 1   Global Step: 56860   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:06:11,808-Speed 2652.56 samples/sec   Loss 13.5777   LearningRate 0.0868   Epoch: 1   Global Step: 56870   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:06:15,698-Speed 2633.48 samples/sec   Loss 13.5956   LearningRate 0.0868   Epoch: 1   Global Step: 56880   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:06:19,572-Speed 2643.76 samples/sec   Loss 13.7270   LearningRate 0.0868   Epoch: 1   Global Step: 56890   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:23,473-Speed 2625.60 samples/sec   Loss 13.5636   LearningRate 0.0868   Epoch: 1   Global Step: 56900   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:27,365-Speed 2631.92 samples/sec   Loss 13.6558   LearningRate 0.0868   Epoch: 1   Global Step: 56910   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:31,258-Speed 2631.51 samples/sec   Loss 13.5559   LearningRate 0.0867   Epoch: 1   Global Step: 56920   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:35,149-Speed 2631.67 samples/sec   Loss 13.5136   LearningRate 0.0867   Epoch: 1   Global Step: 56930   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:39,051-Speed 2625.29 samples/sec   Loss 13.5626   LearningRate 0.0867   Epoch: 1   Global Step: 56940   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:42,947-Speed 2629.20 samples/sec   Loss 13.5813   LearningRate 0.0867   Epoch: 1   Global Step: 56950   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:46,859-Speed 2618.74 samples/sec   Loss 13.5720   LearningRate 0.0867   Epoch: 1   Global Step: 56960   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:50,758-Speed 2626.42 samples/sec   Loss 13.5851   LearningRate 0.0867   Epoch: 1   Global Step: 56970   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:54,653-Speed 2634.09 samples/sec   Loss 13.6387   LearningRate 0.0867   Epoch: 1   Global Step: 56980   Fp16 Grad Scale: 16384   Required: 87 hours
Training: 2022-04-13 02:06:58,546-Speed 2630.79 samples/sec   Loss 13.5925   LearningRate 0.0867   Epoch: 1   Global Step: 56990   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:02,446-Speed 2627.04 samples/sec   Loss 13.5534   LearningRate 0.0867   Epoch: 1   Global Step: 57000   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:06,337-Speed 2632.32 samples/sec   Loss 13.5837   LearningRate 0.0867   Epoch: 1   Global Step: 57010   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:10,231-Speed 2630.04 samples/sec   Loss 13.6869   LearningRate 0.0867   Epoch: 1   Global Step: 57020   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:14,145-Speed 2616.71 samples/sec   Loss 13.5643   LearningRate 0.0867   Epoch: 1   Global Step: 57030   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:18,040-Speed 2630.08 samples/sec   Loss 13.7014   LearningRate 0.0867   Epoch: 1   Global Step: 57040   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:21,932-Speed 2632.02 samples/sec   Loss 13.5634   LearningRate 0.0867   Epoch: 1   Global Step: 57050   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:25,831-Speed 2626.63 samples/sec   Loss 13.5576   LearningRate 0.0867   Epoch: 1   Global Step: 57060   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:29,735-Speed 2623.94 samples/sec   Loss 13.5630   LearningRate 0.0867   Epoch: 1   Global Step: 57070   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:33,635-Speed 2625.88 samples/sec   Loss 13.5553   LearningRate 0.0867   Epoch: 1   Global Step: 57080   Fp16 Grad Scale: 32768   Required: 87 hours
Training: 2022-04-13 02:07:37,527-Speed 2631.60 samples/sec   Loss 13.6148   LearningRate 0.0867   Epoch: 1   Global Step: 57090   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:07:41,421-Speed 2629.67 samples/sec   Loss 13.4327   LearningRate 0.0867   Epoch: 1   Global Step: 57100   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:07:45,316-Speed 2630.36 samples/sec   Loss 13.4959   LearningRate 0.0867   Epoch: 1   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:07:49,208-Speed 2631.77 samples/sec   Loss 13.5697   LearningRate 0.0867   Epoch: 1   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:07:53,109-Speed 2625.34 samples/sec   Loss 13.3425   LearningRate 0.0867   Epoch: 1   Global Step: 57130   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:07:57,005-Speed 2629.19 samples/sec   Loss 13.4949   LearningRate 0.0867   Epoch: 1   Global Step: 57140   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:08:00,912-Speed 2621.70 samples/sec   Loss 13.4547   LearningRate 0.0867   Epoch: 1   Global Step: 57150   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:08:04,813-Speed 2625.34 samples/sec   Loss 13.4140   LearningRate 0.0867   Epoch: 1   Global Step: 57160   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:08:08,706-Speed 2630.93 samples/sec   Loss 13.4620   LearningRate 0.0867   Epoch: 1   Global Step: 57170   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:08:12,619-Speed 2617.54 samples/sec   Loss 13.5399   LearningRate 0.0867   Epoch: 1   Global Step: 57180   Fp16 Grad Scale: 65536   Required: 87 hours
Training: 2022-04-13 02:08:16,513-Speed 2630.63 samples/sec   Loss 13.4724   LearningRate 0.0867   Epoch: 1   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:20,408-Speed 2629.83 samples/sec   Loss 13.6005   LearningRate 0.0867   Epoch: 1   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:24,328-Speed 2613.24 samples/sec   Loss 13.4485   LearningRate 0.0867   Epoch: 1   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:28,220-Speed 2631.44 samples/sec   Loss 13.6112   LearningRate 0.0867   Epoch: 1   Global Step: 57220   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:32,113-Speed 2631.21 samples/sec   Loss 13.5018   LearningRate 0.0867   Epoch: 1   Global Step: 57230   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:36,027-Speed 2616.42 samples/sec   Loss 13.5537   LearningRate 0.0867   Epoch: 1   Global Step: 57240   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:39,919-Speed 2631.81 samples/sec   Loss 13.5487   LearningRate 0.0867   Epoch: 1   Global Step: 57250   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:43,812-Speed 2631.15 samples/sec   Loss 13.7290   LearningRate 0.0867   Epoch: 1   Global Step: 57260   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:47,707-Speed 2629.90 samples/sec   Loss 13.5362   LearningRate 0.0867   Epoch: 1   Global Step: 57270   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:51,618-Speed 2619.08 samples/sec   Loss 13.5994   LearningRate 0.0867   Epoch: 1   Global Step: 57280   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:08:55,510-Speed 2631.26 samples/sec   Loss 13.5277   LearningRate 0.0867   Epoch: 1   Global Step: 57290   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:08:59,407-Speed 2628.59 samples/sec   Loss 13.4746   LearningRate 0.0867   Epoch: 1   Global Step: 57300   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:09:03,308-Speed 2625.59 samples/sec   Loss 13.5715   LearningRate 0.0867   Epoch: 1   Global Step: 57310   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:07,226-Speed 2613.49 samples/sec   Loss 13.4822   LearningRate 0.0867   Epoch: 1   Global Step: 57320   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:11,122-Speed 2630.19 samples/sec   Loss 13.5055   LearningRate 0.0867   Epoch: 1   Global Step: 57330   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:15,021-Speed 2626.81 samples/sec   Loss 13.5675   LearningRate 0.0867   Epoch: 1   Global Step: 57340   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:18,920-Speed 2627.06 samples/sec   Loss 13.4048   LearningRate 0.0867   Epoch: 1   Global Step: 57350   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:22,816-Speed 2629.73 samples/sec   Loss 13.4848   LearningRate 0.0866   Epoch: 1   Global Step: 57360   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:26,713-Speed 2628.31 samples/sec   Loss 13.5374   LearningRate 0.0866   Epoch: 1   Global Step: 57370   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:30,618-Speed 2622.80 samples/sec   Loss 13.4661   LearningRate 0.0866   Epoch: 1   Global Step: 57380   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:09:34,487-Speed 2646.72 samples/sec   Loss 13.5338   LearningRate 0.0866   Epoch: 1   Global Step: 57390   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:09:38,390-Speed 2624.36 samples/sec   Loss 13.4015   LearningRate 0.0866   Epoch: 1   Global Step: 57400   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:09:42,295-Speed 2623.46 samples/sec   Loss 13.5631   LearningRate 0.0866   Epoch: 1   Global Step: 57410   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:09:46,189-Speed 2630.01 samples/sec   Loss 13.5812   LearningRate 0.0866   Epoch: 1   Global Step: 57420   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:09:50,085-Speed 2629.41 samples/sec   Loss 13.5886   LearningRate 0.0866   Epoch: 1   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:09:53,981-Speed 2629.25 samples/sec   Loss 13.5502   LearningRate 0.0866   Epoch: 1   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:09:57,877-Speed 2628.76 samples/sec   Loss 13.5116   LearningRate 0.0866   Epoch: 1   Global Step: 57450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:01,777-Speed 2626.71 samples/sec   Loss 13.6344   LearningRate 0.0866   Epoch: 1   Global Step: 57460   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:05,674-Speed 2628.09 samples/sec   Loss 13.4127   LearningRate 0.0866   Epoch: 1   Global Step: 57470   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:09,579-Speed 2622.90 samples/sec   Loss 13.5374   LearningRate 0.0866   Epoch: 1   Global Step: 57480   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:13,475-Speed 2629.11 samples/sec   Loss 13.4211   LearningRate 0.0866   Epoch: 1   Global Step: 57490   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:10:17,376-Speed 2626.19 samples/sec   Loss 13.4748   LearningRate 0.0866   Epoch: 1   Global Step: 57500   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:10:21,275-Speed 2626.59 samples/sec   Loss 13.5352   LearningRate 0.0866   Epoch: 1   Global Step: 57510   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:10:25,172-Speed 2628.27 samples/sec   Loss 13.5936   LearningRate 0.0866   Epoch: 1   Global Step: 57520   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:29,096-Speed 2610.64 samples/sec   Loss 13.4219   LearningRate 0.0866   Epoch: 1   Global Step: 57530   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:33,003-Speed 2621.75 samples/sec   Loss 13.4279   LearningRate 0.0866   Epoch: 1   Global Step: 57540   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:36,940-Speed 2601.16 samples/sec   Loss 13.5916   LearningRate 0.0866   Epoch: 1   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:40,840-Speed 2626.94 samples/sec   Loss 13.6022   LearningRate 0.0866   Epoch: 1   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:44,745-Speed 2622.48 samples/sec   Loss 13.5343   LearningRate 0.0866   Epoch: 1   Global Step: 57570   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:48,688-Speed 2598.05 samples/sec   Loss 13.5629   LearningRate 0.0866   Epoch: 1   Global Step: 57580   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:52,602-Speed 2616.84 samples/sec   Loss 13.5686   LearningRate 0.0866   Epoch: 1   Global Step: 57590   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:10:56,521-Speed 2614.37 samples/sec   Loss 13.4835   LearningRate 0.0866   Epoch: 1   Global Step: 57600   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:00,439-Speed 2614.03 samples/sec   Loss 13.5398   LearningRate 0.0866   Epoch: 1   Global Step: 57610   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:04,336-Speed 2627.98 samples/sec   Loss 13.3575   LearningRate 0.0866   Epoch: 1   Global Step: 57620   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:11:08,232-Speed 2629.27 samples/sec   Loss 13.5605   LearningRate 0.0866   Epoch: 1   Global Step: 57630   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:11:12,110-Speed 2641.31 samples/sec   Loss 13.4669   LearningRate 0.0866   Epoch: 1   Global Step: 57640   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:16,010-Speed 2626.08 samples/sec   Loss 13.4919   LearningRate 0.0866   Epoch: 1   Global Step: 57650   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:19,915-Speed 2623.03 samples/sec   Loss 13.4568   LearningRate 0.0866   Epoch: 1   Global Step: 57660   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:23,809-Speed 2630.25 samples/sec   Loss 13.5606   LearningRate 0.0866   Epoch: 1   Global Step: 57670   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:27,704-Speed 2629.59 samples/sec   Loss 13.6215   LearningRate 0.0866   Epoch: 1   Global Step: 57680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:31,600-Speed 2629.81 samples/sec   Loss 13.4978   LearningRate 0.0866   Epoch: 1   Global Step: 57690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:35,499-Speed 2626.70 samples/sec   Loss 13.4318   LearningRate 0.0866   Epoch: 1   Global Step: 57700   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:39,401-Speed 2625.21 samples/sec   Loss 13.5358   LearningRate 0.0866   Epoch: 1   Global Step: 57710   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:43,328-Speed 2608.10 samples/sec   Loss 13.4118   LearningRate 0.0866   Epoch: 1   Global Step: 57720   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:47,232-Speed 2623.64 samples/sec   Loss 13.5183   LearningRate 0.0866   Epoch: 1   Global Step: 57730   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:11:51,131-Speed 2626.82 samples/sec   Loss 13.6636   LearningRate 0.0866   Epoch: 1   Global Step: 57740   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:11:55,028-Speed 2628.81 samples/sec   Loss 13.6271   LearningRate 0.0866   Epoch: 1   Global Step: 57750   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:11:58,924-Speed 2628.56 samples/sec   Loss 13.6913   LearningRate 0.0866   Epoch: 1   Global Step: 57760   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:12:02,813-Speed 2634.26 samples/sec   Loss 13.6077   LearningRate 0.0866   Epoch: 1   Global Step: 57770   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:06,711-Speed 2627.79 samples/sec   Loss 13.3972   LearningRate 0.0866   Epoch: 1   Global Step: 57780   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:10,607-Speed 2628.95 samples/sec   Loss 13.4413   LearningRate 0.0866   Epoch: 1   Global Step: 57790   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:14,500-Speed 2630.35 samples/sec   Loss 13.5896   LearningRate 0.0866   Epoch: 1   Global Step: 57800   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:18,392-Speed 2632.25 samples/sec   Loss 13.6630   LearningRate 0.0865   Epoch: 1   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:22,293-Speed 2625.18 samples/sec   Loss 13.4933   LearningRate 0.0865   Epoch: 1   Global Step: 57820   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:26,205-Speed 2618.03 samples/sec   Loss 13.5069   LearningRate 0.0865   Epoch: 1   Global Step: 57830   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:30,127-Speed 2612.45 samples/sec   Loss 13.4942   LearningRate 0.0865   Epoch: 1   Global Step: 57840   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:34,018-Speed 2632.31 samples/sec   Loss 13.4483   LearningRate 0.0865   Epoch: 1   Global Step: 57850   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:37,911-Speed 2630.55 samples/sec   Loss 13.6162   LearningRate 0.0865   Epoch: 1   Global Step: 57860   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:41,819-Speed 2621.19 samples/sec   Loss 13.4954   LearningRate 0.0865   Epoch: 1   Global Step: 57870   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:12:45,728-Speed 2620.66 samples/sec   Loss 13.6014   LearningRate 0.0865   Epoch: 1   Global Step: 57880   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:12:49,605-Speed 2641.70 samples/sec   Loss 13.6322   LearningRate 0.0865   Epoch: 1   Global Step: 57890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:53,500-Speed 2629.92 samples/sec   Loss 13.5366   LearningRate 0.0865   Epoch: 1   Global Step: 57900   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:12:57,396-Speed 2629.01 samples/sec   Loss 13.6129   LearningRate 0.0865   Epoch: 1   Global Step: 57910   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:01,289-Speed 2630.94 samples/sec   Loss 13.3949   LearningRate 0.0865   Epoch: 1   Global Step: 57920   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:05,190-Speed 2625.76 samples/sec   Loss 13.5361   LearningRate 0.0865   Epoch: 1   Global Step: 57930   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:09,088-Speed 2627.73 samples/sec   Loss 13.4492   LearningRate 0.0865   Epoch: 1   Global Step: 57940   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:12,986-Speed 2627.87 samples/sec   Loss 13.5324   LearningRate 0.0865   Epoch: 1   Global Step: 57950   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:16,884-Speed 2627.43 samples/sec   Loss 13.4489   LearningRate 0.0865   Epoch: 1   Global Step: 57960   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:20,780-Speed 2629.00 samples/sec   Loss 13.5784   LearningRate 0.0865   Epoch: 1   Global Step: 57970   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:24,673-Speed 2631.76 samples/sec   Loss 13.3652   LearningRate 0.0865   Epoch: 1   Global Step: 57980   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:28,553-Speed 2639.61 samples/sec   Loss 13.4860   LearningRate 0.0865   Epoch: 1   Global Step: 57990   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:32,470-Speed 2615.28 samples/sec   Loss 13.5294   LearningRate 0.0865   Epoch: 1   Global Step: 58000   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:36,365-Speed 2629.75 samples/sec   Loss 13.3977   LearningRate 0.0865   Epoch: 1   Global Step: 58010   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:40,261-Speed 2628.68 samples/sec   Loss 13.5670   LearningRate 0.0865   Epoch: 1   Global Step: 58020   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:44,152-Speed 2632.19 samples/sec   Loss 13.4687   LearningRate 0.0865   Epoch: 1   Global Step: 58030   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:48,044-Speed 2632.22 samples/sec   Loss 13.5349   LearningRate 0.0865   Epoch: 1   Global Step: 58040   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:51,938-Speed 2630.46 samples/sec   Loss 13.4411   LearningRate 0.0865   Epoch: 1   Global Step: 58050   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:55,828-Speed 2633.05 samples/sec   Loss 13.4730   LearningRate 0.0865   Epoch: 1   Global Step: 58060   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:13:59,719-Speed 2632.34 samples/sec   Loss 13.4127   LearningRate 0.0865   Epoch: 1   Global Step: 58070   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:03,611-Speed 2631.75 samples/sec   Loss 13.2897   LearningRate 0.0865   Epoch: 1   Global Step: 58080   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:07,506-Speed 2629.26 samples/sec   Loss 13.6155   LearningRate 0.0865   Epoch: 1   Global Step: 58090   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:14:11,400-Speed 2630.36 samples/sec   Loss 13.4978   LearningRate 0.0865   Epoch: 1   Global Step: 58100   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:14:15,292-Speed 2631.56 samples/sec   Loss 13.3840   LearningRate 0.0865   Epoch: 1   Global Step: 58110   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:14:19,188-Speed 2629.68 samples/sec   Loss 13.5428   LearningRate 0.0865   Epoch: 1   Global Step: 58120   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:14:23,081-Speed 2631.23 samples/sec   Loss 13.4410   LearningRate 0.0865   Epoch: 1   Global Step: 58130   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:14:26,980-Speed 2627.05 samples/sec   Loss 13.4840   LearningRate 0.0865   Epoch: 1   Global Step: 58140   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:14:30,858-Speed 2641.24 samples/sec   Loss 13.3795   LearningRate 0.0865   Epoch: 1   Global Step: 58150   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:34,794-Speed 2601.88 samples/sec   Loss 13.4914   LearningRate 0.0865   Epoch: 1   Global Step: 58160   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:38,735-Speed 2605.18 samples/sec   Loss 13.5077   LearningRate 0.0865   Epoch: 1   Global Step: 58170   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:42,645-Speed 2619.78 samples/sec   Loss 13.5250   LearningRate 0.0865   Epoch: 1   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:46,540-Speed 2629.53 samples/sec   Loss 13.6529   LearningRate 0.0865   Epoch: 1   Global Step: 58190   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:50,458-Speed 2614.47 samples/sec   Loss 13.4589   LearningRate 0.0865   Epoch: 1   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:54,349-Speed 2632.83 samples/sec   Loss 13.4544   LearningRate 0.0865   Epoch: 1   Global Step: 58210   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:14:58,238-Speed 2633.35 samples/sec   Loss 13.3107   LearningRate 0.0865   Epoch: 1   Global Step: 58220   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:02,131-Speed 2630.76 samples/sec   Loss 13.4272   LearningRate 0.0865   Epoch: 1   Global Step: 58230   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:06,026-Speed 2630.33 samples/sec   Loss 13.3445   LearningRate 0.0865   Epoch: 1   Global Step: 58240   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:10,063-Speed 2537.24 samples/sec   Loss 13.3312   LearningRate 0.0864   Epoch: 1   Global Step: 58250   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:15:13,959-Speed 2629.25 samples/sec   Loss 13.4102   LearningRate 0.0864   Epoch: 1   Global Step: 58260   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:15:17,857-Speed 2627.15 samples/sec   Loss 13.4309   LearningRate 0.0864   Epoch: 1   Global Step: 58270   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:21,749-Speed 2631.94 samples/sec   Loss 13.6459   LearningRate 0.0864   Epoch: 1   Global Step: 58280   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:25,647-Speed 2627.78 samples/sec   Loss 13.4125   LearningRate 0.0864   Epoch: 1   Global Step: 58290   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:29,553-Speed 2622.21 samples/sec   Loss 13.5030   LearningRate 0.0864   Epoch: 1   Global Step: 58300   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:33,458-Speed 2622.41 samples/sec   Loss 13.5603   LearningRate 0.0864   Epoch: 1   Global Step: 58310   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:37,352-Speed 2631.12 samples/sec   Loss 13.5221   LearningRate 0.0864   Epoch: 1   Global Step: 58320   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:41,244-Speed 2631.35 samples/sec   Loss 13.5279   LearningRate 0.0864   Epoch: 1   Global Step: 58330   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:45,156-Speed 2618.15 samples/sec   Loss 13.4446   LearningRate 0.0864   Epoch: 1   Global Step: 58340   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:49,058-Speed 2625.44 samples/sec   Loss 13.5851   LearningRate 0.0864   Epoch: 1   Global Step: 58350   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:52,951-Speed 2631.46 samples/sec   Loss 13.5339   LearningRate 0.0864   Epoch: 1   Global Step: 58360   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:15:56,855-Speed 2622.85 samples/sec   Loss 13.4321   LearningRate 0.0864   Epoch: 1   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:00,753-Speed 2627.69 samples/sec   Loss 13.3338   LearningRate 0.0864   Epoch: 1   Global Step: 58380   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:04,671-Speed 2614.67 samples/sec   Loss 13.5982   LearningRate 0.0864   Epoch: 1   Global Step: 58390   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:08,565-Speed 2630.63 samples/sec   Loss 13.5981   LearningRate 0.0864   Epoch: 1   Global Step: 58400   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:12,467-Speed 2624.92 samples/sec   Loss 13.4477   LearningRate 0.0864   Epoch: 1   Global Step: 58410   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:16,359-Speed 2631.93 samples/sec   Loss 13.5213   LearningRate 0.0864   Epoch: 1   Global Step: 58420   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:20,260-Speed 2625.75 samples/sec   Loss 13.2251   LearningRate 0.0864   Epoch: 1   Global Step: 58430   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:24,170-Speed 2619.64 samples/sec   Loss 13.4779   LearningRate 0.0864   Epoch: 1   Global Step: 58440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:28,086-Speed 2614.73 samples/sec   Loss 13.4904   LearningRate 0.0864   Epoch: 1   Global Step: 58450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:31,989-Speed 2624.39 samples/sec   Loss 13.3611   LearningRate 0.0864   Epoch: 1   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:16:35,896-Speed 2621.54 samples/sec   Loss 13.3473   LearningRate 0.0864   Epoch: 1   Global Step: 58470   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:16:39,797-Speed 2626.24 samples/sec   Loss 13.5411   LearningRate 0.0864   Epoch: 1   Global Step: 58480   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:16:43,691-Speed 2630.18 samples/sec   Loss 13.4287   LearningRate 0.0864   Epoch: 1   Global Step: 58490   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:16:47,583-Speed 2631.67 samples/sec   Loss 13.3937   LearningRate 0.0864   Epoch: 1   Global Step: 58500   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:16:51,485-Speed 2625.47 samples/sec   Loss 13.4984   LearningRate 0.0864   Epoch: 1   Global Step: 58510   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:16:55,381-Speed 2629.03 samples/sec   Loss 13.5108   LearningRate 0.0864   Epoch: 1   Global Step: 58520   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:16:59,278-Speed 2628.01 samples/sec   Loss 13.4025   LearningRate 0.0864   Epoch: 1   Global Step: 58530   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:03,176-Speed 2627.64 samples/sec   Loss 13.6143   LearningRate 0.0864   Epoch: 1   Global Step: 58540   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:07,079-Speed 2623.88 samples/sec   Loss 13.4004   LearningRate 0.0864   Epoch: 1   Global Step: 58550   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:10,972-Speed 2630.98 samples/sec   Loss 13.3323   LearningRate 0.0864   Epoch: 1   Global Step: 58560   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:14,852-Speed 2640.30 samples/sec   Loss 13.5716   LearningRate 0.0864   Epoch: 1   Global Step: 58570   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:18,744-Speed 2631.65 samples/sec   Loss 13.3335   LearningRate 0.0864   Epoch: 1   Global Step: 58580   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:22,639-Speed 2629.32 samples/sec   Loss 13.4631   LearningRate 0.0864   Epoch: 1   Global Step: 58590   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:26,535-Speed 2629.80 samples/sec   Loss 13.4486   LearningRate 0.0864   Epoch: 1   Global Step: 58600   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:17:30,413-Speed 2641.04 samples/sec   Loss 13.4037   LearningRate 0.0864   Epoch: 1   Global Step: 58610   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:17:34,311-Speed 2627.59 samples/sec   Loss 13.4823   LearningRate 0.0864   Epoch: 1   Global Step: 58620   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:17:38,202-Speed 2631.91 samples/sec   Loss 13.3748   LearningRate 0.0864   Epoch: 1   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:17:42,102-Speed 2626.65 samples/sec   Loss 13.3719   LearningRate 0.0864   Epoch: 1   Global Step: 58640   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:17:46,006-Speed 2623.40 samples/sec   Loss 13.5332   LearningRate 0.0864   Epoch: 1   Global Step: 58650   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:17:49,905-Speed 2627.48 samples/sec   Loss 13.2553   LearningRate 0.0864   Epoch: 1   Global Step: 58660   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:17:53,801-Speed 2628.93 samples/sec   Loss 13.5175   LearningRate 0.0864   Epoch: 1   Global Step: 58670   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:17:57,697-Speed 2629.15 samples/sec   Loss 13.5177   LearningRate 0.0864   Epoch: 1   Global Step: 58680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:01,601-Speed 2623.41 samples/sec   Loss 13.3956   LearningRate 0.0864   Epoch: 1   Global Step: 58690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:05,499-Speed 2627.29 samples/sec   Loss 13.3928   LearningRate 0.0863   Epoch: 1   Global Step: 58700   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:09,398-Speed 2626.66 samples/sec   Loss 13.4832   LearningRate 0.0863   Epoch: 1   Global Step: 58710   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:18:13,304-Speed 2622.94 samples/sec   Loss 13.3690   LearningRate 0.0863   Epoch: 1   Global Step: 58720   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:18:17,218-Speed 2616.93 samples/sec   Loss 13.6595   LearningRate 0.0863   Epoch: 1   Global Step: 58730   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:18:21,121-Speed 2624.65 samples/sec   Loss 13.3871   LearningRate 0.0863   Epoch: 1   Global Step: 58740   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:18:25,022-Speed 2625.56 samples/sec   Loss 13.6077   LearningRate 0.0863   Epoch: 1   Global Step: 58750   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:18:28,895-Speed 2644.68 samples/sec   Loss 13.5757   LearningRate 0.0863   Epoch: 1   Global Step: 58760   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:32,800-Speed 2623.28 samples/sec   Loss 13.4431   LearningRate 0.0863   Epoch: 1   Global Step: 58770   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:36,702-Speed 2624.15 samples/sec   Loss 13.5152   LearningRate 0.0863   Epoch: 1   Global Step: 58780   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:40,619-Speed 2615.16 samples/sec   Loss 13.4202   LearningRate 0.0863   Epoch: 1   Global Step: 58790   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:44,508-Speed 2633.81 samples/sec   Loss 13.5571   LearningRate 0.0863   Epoch: 1   Global Step: 58800   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:48,408-Speed 2626.79 samples/sec   Loss 13.3802   LearningRate 0.0863   Epoch: 1   Global Step: 58810   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:52,306-Speed 2627.58 samples/sec   Loss 13.4088   LearningRate 0.0863   Epoch: 1   Global Step: 58820   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:18:56,241-Speed 2603.33 samples/sec   Loss 13.3164   LearningRate 0.0863   Epoch: 1   Global Step: 58830   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:19:00,165-Speed 2610.61 samples/sec   Loss 13.5122   LearningRate 0.0863   Epoch: 1   Global Step: 58840   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:19:04,072-Speed 2621.34 samples/sec   Loss 13.4387   LearningRate 0.0863   Epoch: 1   Global Step: 58850   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:19:07,967-Speed 2629.19 samples/sec   Loss 13.3977   LearningRate 0.0863   Epoch: 1   Global Step: 58860   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:19:11,865-Speed 2628.42 samples/sec   Loss 13.5194   LearningRate 0.0863   Epoch: 1   Global Step: 58870   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:19:15,817-Speed 2591.45 samples/sec   Loss 13.4072   LearningRate 0.0863   Epoch: 1   Global Step: 58880   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:19:19,703-Speed 2639.89 samples/sec   Loss 13.4925   LearningRate 0.0863   Epoch: 1   Global Step: 58890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:19:23,591-Speed 2633.73 samples/sec   Loss 13.3579   LearningRate 0.0863   Epoch: 1   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:27,564-Speed 2578.86 samples/sec   Loss 13.6435   LearningRate 0.0863   Epoch: 1   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:31,459-Speed 2629.50 samples/sec   Loss 13.6969   LearningRate 0.0863   Epoch: 1   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:35,365-Speed 2622.12 samples/sec   Loss 13.3869   LearningRate 0.0863   Epoch: 1   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:39,275-Speed 2619.61 samples/sec   Loss 13.2862   LearningRate 0.0863   Epoch: 1   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:43,171-Speed 2629.43 samples/sec   Loss 13.4351   LearningRate 0.0863   Epoch: 1   Global Step: 58950   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:47,067-Speed 2628.36 samples/sec   Loss 13.4383   LearningRate 0.0863   Epoch: 1   Global Step: 58960   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:50,970-Speed 2624.44 samples/sec   Loss 13.3802   LearningRate 0.0863   Epoch: 1   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:54,871-Speed 2625.61 samples/sec   Loss 13.6661   LearningRate 0.0863   Epoch: 1   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:19:58,775-Speed 2624.24 samples/sec   Loss 13.4829   LearningRate 0.0863   Epoch: 1   Global Step: 58990   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:20:02,695-Speed 2612.30 samples/sec   Loss 13.5499   LearningRate 0.0863   Epoch: 1   Global Step: 59000   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:06,602-Speed 2621.71 samples/sec   Loss 13.4954   LearningRate 0.0863   Epoch: 1   Global Step: 59010   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:10,499-Speed 2628.26 samples/sec   Loss 13.3011   LearningRate 0.0863   Epoch: 1   Global Step: 59020   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:14,398-Speed 2626.89 samples/sec   Loss 13.4180   LearningRate 0.0863   Epoch: 1   Global Step: 59030   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:18,293-Speed 2630.20 samples/sec   Loss 13.5223   LearningRate 0.0863   Epoch: 1   Global Step: 59040   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:22,189-Speed 2628.57 samples/sec   Loss 13.4605   LearningRate 0.0863   Epoch: 1   Global Step: 59050   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:26,085-Speed 2628.93 samples/sec   Loss 13.4482   LearningRate 0.0863   Epoch: 1   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:29,996-Speed 2619.02 samples/sec   Loss 13.5287   LearningRate 0.0863   Epoch: 1   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:33,889-Speed 2631.34 samples/sec   Loss 13.4470   LearningRate 0.0863   Epoch: 1   Global Step: 59080   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:37,778-Speed 2633.47 samples/sec   Loss 13.3394   LearningRate 0.0863   Epoch: 1   Global Step: 59090   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:41,655-Speed 2641.86 samples/sec   Loss 13.5424   LearningRate 0.0863   Epoch: 1   Global Step: 59100   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:45,563-Speed 2621.08 samples/sec   Loss 13.3415   LearningRate 0.0863   Epoch: 1   Global Step: 59110   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:49,465-Speed 2624.61 samples/sec   Loss 13.4607   LearningRate 0.0863   Epoch: 1   Global Step: 59120   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:53,365-Speed 2626.85 samples/sec   Loss 13.4115   LearningRate 0.0863   Epoch: 1   Global Step: 59130   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:20:57,258-Speed 2630.92 samples/sec   Loss 13.4289   LearningRate 0.0863   Epoch: 1   Global Step: 59140   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:21:01,157-Speed 2626.65 samples/sec   Loss 13.4171   LearningRate 0.0862   Epoch: 1   Global Step: 59150   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:21:05,053-Speed 2629.06 samples/sec   Loss 13.3222   LearningRate 0.0862   Epoch: 1   Global Step: 59160   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:21:08,948-Speed 2629.68 samples/sec   Loss 13.2742   LearningRate 0.0862   Epoch: 1   Global Step: 59170   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:21:12,847-Speed 2627.29 samples/sec   Loss 13.3948   LearningRate 0.0862   Epoch: 1   Global Step: 59180   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:21:16,744-Speed 2627.95 samples/sec   Loss 13.4123   LearningRate 0.0862   Epoch: 1   Global Step: 59190   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:21:20,662-Speed 2615.17 samples/sec   Loss 13.2488   LearningRate 0.0862   Epoch: 1   Global Step: 59200   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:24,565-Speed 2623.88 samples/sec   Loss 13.3361   LearningRate 0.0862   Epoch: 1   Global Step: 59210   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:28,463-Speed 2628.26 samples/sec   Loss 13.6657   LearningRate 0.0862   Epoch: 1   Global Step: 59220   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:32,364-Speed 2625.62 samples/sec   Loss 13.4222   LearningRate 0.0862   Epoch: 1   Global Step: 59230   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:36,268-Speed 2623.63 samples/sec   Loss 13.6328   LearningRate 0.0862   Epoch: 1   Global Step: 59240   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:40,162-Speed 2630.04 samples/sec   Loss 13.5630   LearningRate 0.0862   Epoch: 1   Global Step: 59250   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:44,246-Speed 2507.80 samples/sec   Loss 13.5261   LearningRate 0.0862   Epoch: 1   Global Step: 59260   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:48,180-Speed 2604.32 samples/sec   Loss 13.3557   LearningRate 0.0862   Epoch: 1   Global Step: 59270   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:52,110-Speed 2605.99 samples/sec   Loss 13.6120   LearningRate 0.0862   Epoch: 1   Global Step: 59280   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:56,016-Speed 2623.17 samples/sec   Loss 13.2486   LearningRate 0.0862   Epoch: 1   Global Step: 59290   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:21:59,897-Speed 2639.69 samples/sec   Loss 13.4088   LearningRate 0.0862   Epoch: 1   Global Step: 59300   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:22:03,804-Speed 2621.90 samples/sec   Loss 13.4545   LearningRate 0.0862   Epoch: 1   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:22:07,694-Speed 2633.05 samples/sec   Loss 13.3708   LearningRate 0.0862   Epoch: 1   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:22:11,584-Speed 2632.85 samples/sec   Loss 13.4204   LearningRate 0.0862   Epoch: 1   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:22:15,460-Speed 2643.05 samples/sec   Loss 13.6270   LearningRate 0.0862   Epoch: 1   Global Step: 59340   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:19,357-Speed 2628.44 samples/sec   Loss 13.4775   LearningRate 0.0862   Epoch: 1   Global Step: 59350   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:23,250-Speed 2630.89 samples/sec   Loss 13.5826   LearningRate 0.0862   Epoch: 1   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:27,163-Speed 2617.76 samples/sec   Loss 13.3522   LearningRate 0.0862   Epoch: 1   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:31,083-Speed 2612.77 samples/sec   Loss 13.3934   LearningRate 0.0862   Epoch: 1   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:35,011-Speed 2607.61 samples/sec   Loss 13.4320   LearningRate 0.0862   Epoch: 1   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:38,926-Speed 2616.07 samples/sec   Loss 13.4267   LearningRate 0.0862   Epoch: 1   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:42,867-Speed 2599.55 samples/sec   Loss 13.4418   LearningRate 0.0862   Epoch: 1   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:46,802-Speed 2603.07 samples/sec   Loss 13.3074   LearningRate 0.0862   Epoch: 1   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:50,706-Speed 2623.29 samples/sec   Loss 13.4494   LearningRate 0.0862   Epoch: 1   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:22:54,610-Speed 2624.02 samples/sec   Loss 13.4540   LearningRate 0.0862   Epoch: 1   Global Step: 59440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:22:58,509-Speed 2626.97 samples/sec   Loss 13.4906   LearningRate 0.0862   Epoch: 1   Global Step: 59450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:02,406-Speed 2628.43 samples/sec   Loss 13.4389   LearningRate 0.0862   Epoch: 1   Global Step: 59460   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:06,304-Speed 2627.37 samples/sec   Loss 13.6186   LearningRate 0.0862   Epoch: 1   Global Step: 59470   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:10,222-Speed 2614.40 samples/sec   Loss 13.5371   LearningRate 0.0862   Epoch: 1   Global Step: 59480   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:14,121-Speed 2626.80 samples/sec   Loss 13.4750   LearningRate 0.0862   Epoch: 1   Global Step: 59490   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:18,038-Speed 2615.11 samples/sec   Loss 13.6004   LearningRate 0.0862   Epoch: 1   Global Step: 59500   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:21,949-Speed 2619.00 samples/sec   Loss 13.3483   LearningRate 0.0862   Epoch: 1   Global Step: 59510   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:25,850-Speed 2625.52 samples/sec   Loss 13.5324   LearningRate 0.0862   Epoch: 1   Global Step: 59520   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:29,757-Speed 2621.66 samples/sec   Loss 13.4073   LearningRate 0.0862   Epoch: 1   Global Step: 59530   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:33,675-Speed 2614.55 samples/sec   Loss 13.5416   LearningRate 0.0862   Epoch: 1   Global Step: 59540   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:23:37,611-Speed 2601.87 samples/sec   Loss 13.4896   LearningRate 0.0862   Epoch: 1   Global Step: 59550   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:23:41,509-Speed 2628.02 samples/sec   Loss 13.5163   LearningRate 0.0862   Epoch: 1   Global Step: 59560   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:23:45,412-Speed 2624.08 samples/sec   Loss 13.4615   LearningRate 0.0862   Epoch: 1   Global Step: 59570   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:23:49,294-Speed 2638.79 samples/sec   Loss 13.4131   LearningRate 0.0862   Epoch: 1   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:53,196-Speed 2625.56 samples/sec   Loss 13.3029   LearningRate 0.0861   Epoch: 1   Global Step: 59590   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:23:57,168-Speed 2578.68 samples/sec   Loss 13.3419   LearningRate 0.0861   Epoch: 1   Global Step: 59600   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:01,076-Speed 2621.11 samples/sec   Loss 13.3079   LearningRate 0.0861   Epoch: 1   Global Step: 59610   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:04,974-Speed 2627.67 samples/sec   Loss 13.3818   LearningRate 0.0861   Epoch: 1   Global Step: 59620   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:08,870-Speed 2628.50 samples/sec   Loss 13.2506   LearningRate 0.0861   Epoch: 1   Global Step: 59630   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:12,766-Speed 2629.55 samples/sec   Loss 13.3014   LearningRate 0.0861   Epoch: 1   Global Step: 59640   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:16,670-Speed 2623.35 samples/sec   Loss 13.4051   LearningRate 0.0861   Epoch: 1   Global Step: 59650   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:20,564-Speed 2629.81 samples/sec   Loss 13.3981   LearningRate 0.0861   Epoch: 1   Global Step: 59660   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:24,457-Speed 2631.53 samples/sec   Loss 13.4373   LearningRate 0.0861   Epoch: 1   Global Step: 59670   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:28,354-Speed 2628.42 samples/sec   Loss 13.2999   LearningRate 0.0861   Epoch: 1   Global Step: 59680   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:24:32,232-Speed 2640.98 samples/sec   Loss 13.3404   LearningRate 0.0861   Epoch: 1   Global Step: 59690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:36,126-Speed 2630.49 samples/sec   Loss 13.3889   LearningRate 0.0861   Epoch: 1   Global Step: 59700   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:40,024-Speed 2627.13 samples/sec   Loss 13.4326   LearningRate 0.0861   Epoch: 1   Global Step: 59710   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:43,932-Speed 2620.89 samples/sec   Loss 13.4482   LearningRate 0.0861   Epoch: 1   Global Step: 59720   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:47,894-Speed 2585.31 samples/sec   Loss 13.4204   LearningRate 0.0861   Epoch: 1   Global Step: 59730   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:51,784-Speed 2633.43 samples/sec   Loss 13.3773   LearningRate 0.0861   Epoch: 1   Global Step: 59740   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:55,675-Speed 2632.96 samples/sec   Loss 13.4507   LearningRate 0.0861   Epoch: 1   Global Step: 59750   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:24:59,603-Speed 2607.59 samples/sec   Loss 13.4949   LearningRate 0.0861   Epoch: 1   Global Step: 59760   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:03,508-Speed 2623.23 samples/sec   Loss 13.4653   LearningRate 0.0861   Epoch: 1   Global Step: 59770   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:07,403-Speed 2629.52 samples/sec   Loss 13.2933   LearningRate 0.0861   Epoch: 1   Global Step: 59780   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:11,303-Speed 2625.91 samples/sec   Loss 13.4803   LearningRate 0.0861   Epoch: 1   Global Step: 59790   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:25:15,222-Speed 2613.52 samples/sec   Loss 13.3634   LearningRate 0.0861   Epoch: 1   Global Step: 59800   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:25:19,136-Speed 2617.58 samples/sec   Loss 13.5002   LearningRate 0.0861   Epoch: 1   Global Step: 59810   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:25:23,013-Speed 2642.03 samples/sec   Loss 13.5322   LearningRate 0.0861   Epoch: 1   Global Step: 59820   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:26,906-Speed 2630.86 samples/sec   Loss 13.3879   LearningRate 0.0861   Epoch: 1   Global Step: 59830   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:30,795-Speed 2633.79 samples/sec   Loss 13.3910   LearningRate 0.0861   Epoch: 1   Global Step: 59840   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:34,691-Speed 2629.71 samples/sec   Loss 13.4373   LearningRate 0.0861   Epoch: 1   Global Step: 59850   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:38,588-Speed 2628.08 samples/sec   Loss 13.4896   LearningRate 0.0861   Epoch: 1   Global Step: 59860   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:42,482-Speed 2629.96 samples/sec   Loss 13.4348   LearningRate 0.0861   Epoch: 1   Global Step: 59870   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:46,376-Speed 2630.27 samples/sec   Loss 13.3985   LearningRate 0.0861   Epoch: 1   Global Step: 59880   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:50,283-Speed 2621.66 samples/sec   Loss 13.3430   LearningRate 0.0861   Epoch: 1   Global Step: 59890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:54,179-Speed 2629.21 samples/sec   Loss 13.4003   LearningRate 0.0861   Epoch: 1   Global Step: 59900   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:25:58,081-Speed 2624.66 samples/sec   Loss 13.3472   LearningRate 0.0861   Epoch: 1   Global Step: 59910   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:01,979-Speed 2627.88 samples/sec   Loss 13.4812   LearningRate 0.0861   Epoch: 1   Global Step: 59920   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:26:05,860-Speed 2639.29 samples/sec   Loss 13.4081   LearningRate 0.0861   Epoch: 1   Global Step: 59930   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:09,762-Speed 2624.59 samples/sec   Loss 13.4377   LearningRate 0.0861   Epoch: 1   Global Step: 59940   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:13,667-Speed 2622.91 samples/sec   Loss 13.4747   LearningRate 0.0861   Epoch: 1   Global Step: 59950   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:17,570-Speed 2624.13 samples/sec   Loss 13.1648   LearningRate 0.0861   Epoch: 1   Global Step: 59960   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:21,558-Speed 2568.37 samples/sec   Loss 13.6281   LearningRate 0.0861   Epoch: 1   Global Step: 59970   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:25,501-Speed 2597.77 samples/sec   Loss 13.3216   LearningRate 0.0861   Epoch: 1   Global Step: 59980   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:29,402-Speed 2626.03 samples/sec   Loss 13.5359   LearningRate 0.0861   Epoch: 1   Global Step: 59990   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:26:33,305-Speed 2624.35 samples/sec   Loss 13.4007   LearningRate 0.0861   Epoch: 1   Global Step: 60000   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:27:16,321-[lfw][60000]XNorm: 24.636387
Training: 2022-04-13 02:27:16,322-[lfw][60000]Accuracy-Flip: 0.99600+-0.00271
Training: 2022-04-13 02:27:16,322-[lfw][60000]Accuracy-Highest: 0.99600
Training: 2022-04-13 02:28:06,789-[cfp_fp][60000]XNorm: 21.984522
Training: 2022-04-13 02:28:06,790-[cfp_fp][60000]Accuracy-Flip: 0.97100+-0.00975
Training: 2022-04-13 02:28:06,791-[cfp_fp][60000]Accuracy-Highest: 0.97500
Training: 2022-04-13 02:28:50,209-[agedb_30][60000]XNorm: 24.158313
Training: 2022-04-13 02:28:50,210-[agedb_30][60000]Accuracy-Flip: 0.96283+-0.01019
Training: 2022-04-13 02:28:50,210-[agedb_30][60000]Accuracy-Highest: 0.96283
Training: 2022-04-13 02:28:54,089-Speed 72.74 samples/sec   Loss 13.3686   LearningRate 0.0861   Epoch: 1   Global Step: 60010   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:28:57,964-Speed 2643.02 samples/sec   Loss 13.5044   LearningRate 0.0861   Epoch: 1   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:29:01,843-Speed 2640.67 samples/sec   Loss 13.4097   LearningRate 0.0861   Epoch: 1   Global Step: 60030   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:05,734-Speed 2632.31 samples/sec   Loss 13.3074   LearningRate 0.0860   Epoch: 1   Global Step: 60040   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:09,619-Speed 2636.45 samples/sec   Loss 13.2636   LearningRate 0.0860   Epoch: 1   Global Step: 60050   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:13,501-Speed 2638.60 samples/sec   Loss 13.3799   LearningRate 0.0860   Epoch: 1   Global Step: 60060   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:17,385-Speed 2637.26 samples/sec   Loss 13.4819   LearningRate 0.0860   Epoch: 1   Global Step: 60070   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:21,270-Speed 2636.71 samples/sec   Loss 13.4840   LearningRate 0.0860   Epoch: 1   Global Step: 60080   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:25,156-Speed 2635.71 samples/sec   Loss 13.5570   LearningRate 0.0860   Epoch: 1   Global Step: 60090   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:29,042-Speed 2635.97 samples/sec   Loss 13.3409   LearningRate 0.0860   Epoch: 1   Global Step: 60100   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:32,930-Speed 2634.35 samples/sec   Loss 13.4807   LearningRate 0.0860   Epoch: 1   Global Step: 60110   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:36,828-Speed 2627.64 samples/sec   Loss 13.5210   LearningRate 0.0860   Epoch: 1   Global Step: 60120   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:40,721-Speed 2632.16 samples/sec   Loss 13.4134   LearningRate 0.0860   Epoch: 1   Global Step: 60130   Fp16 Grad Scale: 524288   Required: 87 hours
Training: 2022-04-13 02:29:44,593-Speed 2644.72 samples/sec   Loss 13.4895   LearningRate 0.0860   Epoch: 1   Global Step: 60140   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:48,491-Speed 2628.01 samples/sec   Loss 13.4127   LearningRate 0.0860   Epoch: 1   Global Step: 60150   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:52,399-Speed 2620.78 samples/sec   Loss 13.4360   LearningRate 0.0860   Epoch: 1   Global Step: 60160   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:29:56,296-Speed 2628.82 samples/sec   Loss 13.4454   LearningRate 0.0860   Epoch: 1   Global Step: 60170   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:30:00,193-Speed 2628.12 samples/sec   Loss 13.3948   LearningRate 0.0860   Epoch: 1   Global Step: 60180   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:30:04,117-Speed 2610.20 samples/sec   Loss 13.4077   LearningRate 0.0860   Epoch: 1   Global Step: 60190   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:30:08,016-Speed 2627.57 samples/sec   Loss 13.3438   LearningRate 0.0860   Epoch: 1   Global Step: 60200   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:30:11,898-Speed 2638.24 samples/sec   Loss 13.5000   LearningRate 0.0860   Epoch: 1   Global Step: 60210   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:15,857-Speed 2587.06 samples/sec   Loss 13.4455   LearningRate 0.0860   Epoch: 1   Global Step: 60220   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:19,765-Speed 2620.83 samples/sec   Loss 13.4670   LearningRate 0.0860   Epoch: 1   Global Step: 60230   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:23,671-Speed 2622.96 samples/sec   Loss 13.4130   LearningRate 0.0860   Epoch: 1   Global Step: 60240   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:27,569-Speed 2627.30 samples/sec   Loss 13.5316   LearningRate 0.0860   Epoch: 1   Global Step: 60250   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:31,493-Speed 2610.12 samples/sec   Loss 13.2303   LearningRate 0.0860   Epoch: 1   Global Step: 60260   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:35,398-Speed 2623.50 samples/sec   Loss 13.4632   LearningRate 0.0860   Epoch: 1   Global Step: 60270   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:39,296-Speed 2627.31 samples/sec   Loss 13.3631   LearningRate 0.0860   Epoch: 1   Global Step: 60280   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:43,198-Speed 2625.44 samples/sec   Loss 13.4040   LearningRate 0.0860   Epoch: 1   Global Step: 60290   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:47,100-Speed 2624.19 samples/sec   Loss 13.4517   LearningRate 0.0860   Epoch: 1   Global Step: 60300   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:30:51,005-Speed 2622.85 samples/sec   Loss 13.5857   LearningRate 0.0860   Epoch: 1   Global Step: 60310   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:30:54,908-Speed 2624.39 samples/sec   Loss 13.2713   LearningRate 0.0860   Epoch: 1   Global Step: 60320   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:30:58,811-Speed 2624.46 samples/sec   Loss 13.3738   LearningRate 0.0860   Epoch: 1   Global Step: 60330   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:31:02,732-Speed 2612.52 samples/sec   Loss 13.4013   LearningRate 0.0860   Epoch: 1   Global Step: 60340   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:31:06,630-Speed 2627.54 samples/sec   Loss 13.4905   LearningRate 0.0860   Epoch: 1   Global Step: 60350   Fp16 Grad Scale: 262144   Required: 87 hours
Training: 2022-04-13 02:31:10,512-Speed 2638.14 samples/sec   Loss 13.4481   LearningRate 0.0860   Epoch: 1   Global Step: 60360   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:31:14,423-Speed 2619.33 samples/sec   Loss 13.4137   LearningRate 0.0860   Epoch: 1   Global Step: 60370   Fp16 Grad Scale: 131072   Required: 87 hours
Training: 2022-04-13 02:31:18,321-Speed 2627.47 samples/sec   Loss 13.4428   LearningRate 0.0860   Epoch: 1   Global Step: 60380   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:22,229-Speed 2620.47 samples/sec   Loss 13.3709   LearningRate 0.0860   Epoch: 1   Global Step: 60390   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:26,130-Speed 2625.71 samples/sec   Loss 13.3598   LearningRate 0.0860   Epoch: 1   Global Step: 60400   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:30,024-Speed 2630.61 samples/sec   Loss 13.3353   LearningRate 0.0860   Epoch: 1   Global Step: 60410   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:33,922-Speed 2627.74 samples/sec   Loss 13.3912   LearningRate 0.0860   Epoch: 1   Global Step: 60420   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:37,827-Speed 2623.11 samples/sec   Loss 13.3210   LearningRate 0.0860   Epoch: 1   Global Step: 60430   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:41,735-Speed 2621.39 samples/sec   Loss 13.3658   LearningRate 0.0860   Epoch: 1   Global Step: 60440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:45,634-Speed 2626.91 samples/sec   Loss 13.4134   LearningRate 0.0860   Epoch: 1   Global Step: 60450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:31:49,551-Speed 2614.45 samples/sec   Loss 13.2492   LearningRate 0.0860   Epoch: 1   Global Step: 60460   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:31:53,454-Speed 2624.62 samples/sec   Loss 13.3354   LearningRate 0.0860   Epoch: 1   Global Step: 60470   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:31:57,366-Speed 2618.60 samples/sec   Loss 13.4490   LearningRate 0.0860   Epoch: 1   Global Step: 60480   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:32:01,248-Speed 2638.29 samples/sec   Loss 13.1897   LearningRate 0.0859   Epoch: 1   Global Step: 60490   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:05,144-Speed 2628.74 samples/sec   Loss 13.4364   LearningRate 0.0859   Epoch: 1   Global Step: 60500   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:09,184-Speed 2535.25 samples/sec   Loss 13.5558   LearningRate 0.0859   Epoch: 1   Global Step: 60510   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:13,080-Speed 2629.31 samples/sec   Loss 13.3614   LearningRate 0.0859   Epoch: 1   Global Step: 60520   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:16,978-Speed 2627.37 samples/sec   Loss 13.3889   LearningRate 0.0859   Epoch: 1   Global Step: 60530   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:20,874-Speed 2629.09 samples/sec   Loss 13.3883   LearningRate 0.0859   Epoch: 1   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:24,768-Speed 2629.79 samples/sec   Loss 13.3135   LearningRate 0.0859   Epoch: 1   Global Step: 60550   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:28,669-Speed 2625.98 samples/sec   Loss 13.3341   LearningRate 0.0859   Epoch: 1   Global Step: 60560   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:32,569-Speed 2626.42 samples/sec   Loss 13.3909   LearningRate 0.0859   Epoch: 1   Global Step: 60570   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:36,473-Speed 2623.47 samples/sec   Loss 13.4652   LearningRate 0.0859   Epoch: 1   Global Step: 60580   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:40,370-Speed 2628.70 samples/sec   Loss 13.5854   LearningRate 0.0859   Epoch: 1   Global Step: 60590   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:32:44,249-Speed 2640.36 samples/sec   Loss 13.3261   LearningRate 0.0859   Epoch: 1   Global Step: 60600   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:48,146-Speed 2628.00 samples/sec   Loss 13.5367   LearningRate 0.0859   Epoch: 1   Global Step: 60610   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:52,051-Speed 2623.70 samples/sec   Loss 13.4666   LearningRate 0.0859   Epoch: 1   Global Step: 60620   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:55,945-Speed 2630.02 samples/sec   Loss 13.4715   LearningRate 0.0859   Epoch: 1   Global Step: 60630   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:32:59,848-Speed 2624.10 samples/sec   Loss 13.3357   LearningRate 0.0859   Epoch: 1   Global Step: 60640   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:33:03,748-Speed 2626.45 samples/sec   Loss 13.4441   LearningRate 0.0859   Epoch: 1   Global Step: 60650   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:33:07,648-Speed 2626.82 samples/sec   Loss 13.4040   LearningRate 0.0859   Epoch: 1   Global Step: 60660   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:33:11,547-Speed 2626.42 samples/sec   Loss 13.3515   LearningRate 0.0859   Epoch: 1   Global Step: 60670   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:33:15,477-Speed 2606.83 samples/sec   Loss 13.3390   LearningRate 0.0859   Epoch: 1   Global Step: 60680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:33:19,372-Speed 2630.43 samples/sec   Loss 13.1728   LearningRate 0.0859   Epoch: 1   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:23,278-Speed 2622.08 samples/sec   Loss 13.3420   LearningRate 0.0859   Epoch: 1   Global Step: 60700   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:27,177-Speed 2627.13 samples/sec   Loss 13.3897   LearningRate 0.0859   Epoch: 1   Global Step: 60710   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:31,085-Speed 2620.50 samples/sec   Loss 13.1462   LearningRate 0.0859   Epoch: 1   Global Step: 60720   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:34,981-Speed 2629.22 samples/sec   Loss 13.1307   LearningRate 0.0859   Epoch: 1   Global Step: 60730   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:38,885-Speed 2623.47 samples/sec   Loss 13.5348   LearningRate 0.0859   Epoch: 1   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:42,783-Speed 2632.23 samples/sec   Loss 13.3663   LearningRate 0.0859   Epoch: 1   Global Step: 60750   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:46,686-Speed 2624.34 samples/sec   Loss 13.3437   LearningRate 0.0859   Epoch: 1   Global Step: 60760   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:50,589-Speed 2624.52 samples/sec   Loss 13.3736   LearningRate 0.0859   Epoch: 1   Global Step: 60770   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:54,487-Speed 2627.73 samples/sec   Loss 13.2780   LearningRate 0.0859   Epoch: 1   Global Step: 60780   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:33:58,396-Speed 2620.63 samples/sec   Loss 13.4348   LearningRate 0.0859   Epoch: 1   Global Step: 60790   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:02,308-Speed 2617.76 samples/sec   Loss 13.3453   LearningRate 0.0859   Epoch: 1   Global Step: 60800   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:06,220-Speed 2618.70 samples/sec   Loss 13.4067   LearningRate 0.0859   Epoch: 1   Global Step: 60810   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:10,127-Speed 2621.43 samples/sec   Loss 13.5028   LearningRate 0.0859   Epoch: 1   Global Step: 60820   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:14,035-Speed 2620.79 samples/sec   Loss 13.3939   LearningRate 0.0859   Epoch: 1   Global Step: 60830   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:17,941-Speed 2622.48 samples/sec   Loss 13.3443   LearningRate 0.0859   Epoch: 1   Global Step: 60840   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:21,844-Speed 2624.22 samples/sec   Loss 13.2769   LearningRate 0.0859   Epoch: 1   Global Step: 60850   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:25,745-Speed 2625.57 samples/sec   Loss 13.2108   LearningRate 0.0859   Epoch: 1   Global Step: 60860   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:29,649-Speed 2623.62 samples/sec   Loss 13.4311   LearningRate 0.0859   Epoch: 1   Global Step: 60870   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:33,550-Speed 2625.44 samples/sec   Loss 13.3580   LearningRate 0.0859   Epoch: 1   Global Step: 60880   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:34:37,455-Speed 2622.71 samples/sec   Loss 13.4248   LearningRate 0.0859   Epoch: 1   Global Step: 60890   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:34:41,372-Speed 2615.11 samples/sec   Loss 13.3460   LearningRate 0.0859   Epoch: 1   Global Step: 60900   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:34:45,243-Speed 2646.09 samples/sec   Loss 13.3323   LearningRate 0.0859   Epoch: 1   Global Step: 60910   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:34:49,215-Speed 2578.56 samples/sec   Loss 13.3221   LearningRate 0.0859   Epoch: 1   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:34:53,133-Speed 2613.69 samples/sec   Loss 13.4355   LearningRate 0.0859   Epoch: 1   Global Step: 60930   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:34:57,053-Speed 2613.30 samples/sec   Loss 13.3742   LearningRate 0.0858   Epoch: 1   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:35:01,152-Speed 2498.99 samples/sec   Loss 13.3105   LearningRate 0.0858   Epoch: 1   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:35:05,092-Speed 2599.18 samples/sec   Loss 13.4415   LearningRate 0.0858   Epoch: 1   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:35:09,059-Speed 2582.20 samples/sec   Loss 13.2890   LearningRate 0.0858   Epoch: 1   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:35:12,989-Speed 2605.72 samples/sec   Loss 13.5780   LearningRate 0.0858   Epoch: 1   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:35:16,892-Speed 2624.28 samples/sec   Loss 13.4584   LearningRate 0.0858   Epoch: 1   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:35:20,790-Speed 2634.24 samples/sec   Loss 13.4455   LearningRate 0.0858   Epoch: 1   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:35:24,687-Speed 2628.08 samples/sec   Loss 13.2003   LearningRate 0.0858   Epoch: 1   Global Step: 61010   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:28,583-Speed 2628.55 samples/sec   Loss 13.3959   LearningRate 0.0858   Epoch: 1   Global Step: 61020   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:32,479-Speed 2629.21 samples/sec   Loss 13.4610   LearningRate 0.0858   Epoch: 1   Global Step: 61030   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:36,373-Speed 2630.36 samples/sec   Loss 13.4941   LearningRate 0.0858   Epoch: 1   Global Step: 61040   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:40,276-Speed 2623.93 samples/sec   Loss 13.3398   LearningRate 0.0858   Epoch: 1   Global Step: 61050   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:44,169-Speed 2630.99 samples/sec   Loss 13.2861   LearningRate 0.0858   Epoch: 1   Global Step: 61060   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:48,065-Speed 2629.33 samples/sec   Loss 13.4260   LearningRate 0.0858   Epoch: 1   Global Step: 61070   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:51,968-Speed 2624.07 samples/sec   Loss 13.2344   LearningRate 0.0858   Epoch: 1   Global Step: 61080   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:55,864-Speed 2629.03 samples/sec   Loss 13.3442   LearningRate 0.0858   Epoch: 1   Global Step: 61090   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:35:59,764-Speed 2626.66 samples/sec   Loss 13.4423   LearningRate 0.0858   Epoch: 1   Global Step: 61100   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:36:03,646-Speed 2638.14 samples/sec   Loss 13.3847   LearningRate 0.0858   Epoch: 1   Global Step: 61110   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:07,609-Speed 2584.26 samples/sec   Loss 13.3590   LearningRate 0.0858   Epoch: 1   Global Step: 61120   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:11,505-Speed 2629.57 samples/sec   Loss 13.2631   LearningRate 0.0858   Epoch: 1   Global Step: 61130   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:15,399-Speed 2630.28 samples/sec   Loss 13.3754   LearningRate 0.0858   Epoch: 1   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:19,292-Speed 2631.00 samples/sec   Loss 13.4830   LearningRate 0.0858   Epoch: 1   Global Step: 61150   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:23,184-Speed 2631.43 samples/sec   Loss 13.3317   LearningRate 0.0858   Epoch: 1   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:27,078-Speed 2630.56 samples/sec   Loss 13.4516   LearningRate 0.0858   Epoch: 1   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:30,970-Speed 2631.36 samples/sec   Loss 13.5294   LearningRate 0.0858   Epoch: 1   Global Step: 61180   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:34,865-Speed 2629.49 samples/sec   Loss 13.5837   LearningRate 0.0858   Epoch: 1   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:38,773-Speed 2620.96 samples/sec   Loss 13.4258   LearningRate 0.0858   Epoch: 1   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:42,665-Speed 2631.48 samples/sec   Loss 13.4132   LearningRate 0.0858   Epoch: 1   Global Step: 61210   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:36:46,563-Speed 2627.83 samples/sec   Loss 13.2993   LearningRate 0.0858   Epoch: 1   Global Step: 61220   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:36:50,456-Speed 2630.87 samples/sec   Loss 13.2357   LearningRate 0.0858   Epoch: 1   Global Step: 61230   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:36:54,333-Speed 2641.89 samples/sec   Loss 13.5318   LearningRate 0.0858   Epoch: 1   Global Step: 61240   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:36:58,233-Speed 2626.39 samples/sec   Loss 13.4669   LearningRate 0.0858   Epoch: 1   Global Step: 61250   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:02,132-Speed 2627.01 samples/sec   Loss 13.3258   LearningRate 0.0858   Epoch: 1   Global Step: 61260   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:06,026-Speed 2629.88 samples/sec   Loss 13.4120   LearningRate 0.0858   Epoch: 1   Global Step: 61270   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:09,973-Speed 2594.79 samples/sec   Loss 13.4763   LearningRate 0.0858   Epoch: 1   Global Step: 61280   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:13,883-Speed 2619.74 samples/sec   Loss 13.4160   LearningRate 0.0858   Epoch: 1   Global Step: 61290   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:17,785-Speed 2625.31 samples/sec   Loss 13.3762   LearningRate 0.0858   Epoch: 1   Global Step: 61300   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:21,682-Speed 2628.28 samples/sec   Loss 13.4431   LearningRate 0.0858   Epoch: 1   Global Step: 61310   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:25,581-Speed 2626.77 samples/sec   Loss 13.4457   LearningRate 0.0858   Epoch: 1   Global Step: 61320   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:29,476-Speed 2629.73 samples/sec   Loss 13.3921   LearningRate 0.0858   Epoch: 1   Global Step: 61330   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:37:33,371-Speed 2629.34 samples/sec   Loss 13.3167   LearningRate 0.0858   Epoch: 1   Global Step: 61340   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:37:37,266-Speed 2630.09 samples/sec   Loss 13.3279   LearningRate 0.0858   Epoch: 1   Global Step: 61350   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:37:41,173-Speed 2621.34 samples/sec   Loss 13.4472   LearningRate 0.0858   Epoch: 1   Global Step: 61360   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:37:45,043-Speed 2646.56 samples/sec   Loss 13.3391   LearningRate 0.0858   Epoch: 1   Global Step: 61370   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:37:48,933-Speed 2632.72 samples/sec   Loss 13.2756   LearningRate 0.0857   Epoch: 1   Global Step: 61380   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:37:52,887-Speed 2590.26 samples/sec   Loss 13.4548   LearningRate 0.0857   Epoch: 1   Global Step: 61390   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:37:56,802-Speed 2616.29 samples/sec   Loss 13.4563   LearningRate 0.0857   Epoch: 1   Global Step: 61400   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:38:00,695-Speed 2630.92 samples/sec   Loss 13.3884   LearningRate 0.0857   Epoch: 1   Global Step: 61410   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:38:04,595-Speed 2626.88 samples/sec   Loss 13.2946   LearningRate 0.0857   Epoch: 1   Global Step: 61420   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:38:08,495-Speed 2625.90 samples/sec   Loss 13.3411   LearningRate 0.0857   Epoch: 1   Global Step: 61430   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:38:12,388-Speed 2631.13 samples/sec   Loss 13.3635   LearningRate 0.0857   Epoch: 1   Global Step: 61440   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:38:16,282-Speed 2630.25 samples/sec   Loss 13.3811   LearningRate 0.0857   Epoch: 1   Global Step: 61450   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:38:20,176-Speed 2629.68 samples/sec   Loss 13.3082   LearningRate 0.0857   Epoch: 1   Global Step: 61460   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:38:24,040-Speed 2651.13 samples/sec   Loss 13.4326   LearningRate 0.0857   Epoch: 1   Global Step: 61470   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:27,935-Speed 2629.45 samples/sec   Loss 13.6310   LearningRate 0.0857   Epoch: 1   Global Step: 61480   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:31,827-Speed 2632.01 samples/sec   Loss 13.3421   LearningRate 0.0857   Epoch: 1   Global Step: 61490   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:35,720-Speed 2631.63 samples/sec   Loss 13.2527   LearningRate 0.0857   Epoch: 1   Global Step: 61500   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:39,614-Speed 2630.57 samples/sec   Loss 13.3856   LearningRate 0.0857   Epoch: 1   Global Step: 61510   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:43,513-Speed 2626.88 samples/sec   Loss 13.4653   LearningRate 0.0857   Epoch: 1   Global Step: 61520   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:47,504-Speed 2566.97 samples/sec   Loss 13.3371   LearningRate 0.0857   Epoch: 1   Global Step: 61530   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:51,398-Speed 2630.06 samples/sec   Loss 13.2684   LearningRate 0.0857   Epoch: 1   Global Step: 61540   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:55,294-Speed 2629.80 samples/sec   Loss 13.3700   LearningRate 0.0857   Epoch: 1   Global Step: 61550   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:38:59,198-Speed 2623.22 samples/sec   Loss 13.3935   LearningRate 0.0857   Epoch: 1   Global Step: 61560   Fp16 Grad Scale: 8192   Required: 86 hours
Training: 2022-04-13 02:39:03,100-Speed 2624.88 samples/sec   Loss 13.3841   LearningRate 0.0857   Epoch: 1   Global Step: 61570   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:06,994-Speed 2630.69 samples/sec   Loss 13.4505   LearningRate 0.0857   Epoch: 1   Global Step: 61580   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:10,889-Speed 2629.78 samples/sec   Loss 13.4625   LearningRate 0.0857   Epoch: 1   Global Step: 61590   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:14,787-Speed 2628.11 samples/sec   Loss 13.2806   LearningRate 0.0857   Epoch: 1   Global Step: 61600   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:18,682-Speed 2629.48 samples/sec   Loss 13.3103   LearningRate 0.0857   Epoch: 1   Global Step: 61610   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:22,578-Speed 2629.80 samples/sec   Loss 13.1977   LearningRate 0.0857   Epoch: 1   Global Step: 61620   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:26,487-Speed 2619.92 samples/sec   Loss 13.3371   LearningRate 0.0857   Epoch: 1   Global Step: 61630   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:30,388-Speed 2626.07 samples/sec   Loss 13.2303   LearningRate 0.0857   Epoch: 1   Global Step: 61640   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:34,281-Speed 2631.17 samples/sec   Loss 13.2883   LearningRate 0.0857   Epoch: 1   Global Step: 61650   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:38,177-Speed 2628.42 samples/sec   Loss 13.2554   LearningRate 0.0857   Epoch: 1   Global Step: 61660   Fp16 Grad Scale: 16384   Required: 86 hours
Training: 2022-04-13 02:39:42,100-Speed 2610.63 samples/sec   Loss 13.4100   LearningRate 0.0857   Epoch: 1   Global Step: 61670   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:39:45,996-Speed 2629.29 samples/sec   Loss 13.5120   LearningRate 0.0857   Epoch: 1   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:39:49,892-Speed 2629.27 samples/sec   Loss 13.3272   LearningRate 0.0857   Epoch: 1   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:39:53,791-Speed 2627.48 samples/sec   Loss 13.3083   LearningRate 0.0857   Epoch: 1   Global Step: 61700   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:39:57,700-Speed 2619.96 samples/sec   Loss 13.4825   LearningRate 0.0857   Epoch: 1   Global Step: 61710   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:40:01,595-Speed 2629.79 samples/sec   Loss 13.2977   LearningRate 0.0857   Epoch: 1   Global Step: 61720   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:40:05,491-Speed 2628.68 samples/sec   Loss 13.3586   LearningRate 0.0857   Epoch: 1   Global Step: 61730   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:40:09,403-Speed 2618.05 samples/sec   Loss 13.2901   LearningRate 0.0857   Epoch: 1   Global Step: 61740   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:40:13,302-Speed 2626.97 samples/sec   Loss 13.3651   LearningRate 0.0857   Epoch: 1   Global Step: 61750   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:40:17,197-Speed 2629.81 samples/sec   Loss 13.2974   LearningRate 0.0857   Epoch: 1   Global Step: 61760   Fp16 Grad Scale: 32768   Required: 86 hours
Training: 2022-04-13 02:40:21,099-Speed 2625.08 samples/sec   Loss 13.1946   LearningRate 0.0857   Epoch: 1   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:25,000-Speed 2625.49 samples/sec   Loss 13.2518   LearningRate 0.0857   Epoch: 1   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:28,900-Speed 2626.20 samples/sec   Loss 13.0861   LearningRate 0.0857   Epoch: 1   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:32,802-Speed 2625.35 samples/sec   Loss 13.3495   LearningRate 0.0857   Epoch: 1   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:36,706-Speed 2623.30 samples/sec   Loss 13.3345   LearningRate 0.0857   Epoch: 1   Global Step: 61810   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:40,613-Speed 2621.56 samples/sec   Loss 13.3551   LearningRate 0.0857   Epoch: 1   Global Step: 61820   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:44,522-Speed 2620.55 samples/sec   Loss 13.4505   LearningRate 0.0856   Epoch: 1   Global Step: 61830   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:48,423-Speed 2625.50 samples/sec   Loss 13.3488   LearningRate 0.0856   Epoch: 1   Global Step: 61840   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:52,319-Speed 2629.06 samples/sec   Loss 13.3414   LearningRate 0.0856   Epoch: 1   Global Step: 61850   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:40:56,216-Speed 2628.56 samples/sec   Loss 13.3535   LearningRate 0.0856   Epoch: 1   Global Step: 61860   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:41:00,116-Speed 2626.21 samples/sec   Loss 13.4096   LearningRate 0.0856   Epoch: 1   Global Step: 61870   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:04,013-Speed 2628.08 samples/sec   Loss 13.4504   LearningRate 0.0856   Epoch: 1   Global Step: 61880   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:07,906-Speed 2630.85 samples/sec   Loss 13.3691   LearningRate 0.0856   Epoch: 1   Global Step: 61890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:11,806-Speed 2626.25 samples/sec   Loss 13.3927   LearningRate 0.0856   Epoch: 1   Global Step: 61900   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:15,707-Speed 2626.00 samples/sec   Loss 13.3726   LearningRate 0.0856   Epoch: 1   Global Step: 61910   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:19,602-Speed 2629.11 samples/sec   Loss 13.3054   LearningRate 0.0856   Epoch: 1   Global Step: 61920   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:23,504-Speed 2625.38 samples/sec   Loss 13.3516   LearningRate 0.0856   Epoch: 1   Global Step: 61930   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:27,407-Speed 2624.05 samples/sec   Loss 13.3812   LearningRate 0.0856   Epoch: 1   Global Step: 61940   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:31,310-Speed 2624.47 samples/sec   Loss 13.2315   LearningRate 0.0856   Epoch: 1   Global Step: 61950   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:35,205-Speed 2629.64 samples/sec   Loss 13.3376   LearningRate 0.0856   Epoch: 1   Global Step: 61960   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:39,084-Speed 2640.31 samples/sec   Loss 13.2351   LearningRate 0.0856   Epoch: 1   Global Step: 61970   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:42,980-Speed 2628.81 samples/sec   Loss 13.2577   LearningRate 0.0856   Epoch: 1   Global Step: 61980   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:46,883-Speed 2623.87 samples/sec   Loss 13.3860   LearningRate 0.0856   Epoch: 1   Global Step: 61990   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:50,790-Speed 2622.27 samples/sec   Loss 13.4083   LearningRate 0.0856   Epoch: 1   Global Step: 62000   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:54,689-Speed 2626.31 samples/sec   Loss 13.3286   LearningRate 0.0856   Epoch: 1   Global Step: 62010   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:41:58,582-Speed 2631.36 samples/sec   Loss 13.2716   LearningRate 0.0856   Epoch: 1   Global Step: 62020   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:42:02,475-Speed 2630.73 samples/sec   Loss 13.5264   LearningRate 0.0856   Epoch: 1   Global Step: 62030   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:42:06,372-Speed 2628.54 samples/sec   Loss 13.4329   LearningRate 0.0856   Epoch: 1   Global Step: 62040   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:42:10,278-Speed 2622.41 samples/sec   Loss 13.3321   LearningRate 0.0856   Epoch: 1   Global Step: 62050   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:42:14,170-Speed 2631.87 samples/sec   Loss 13.3566   LearningRate 0.0856   Epoch: 1   Global Step: 62060   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:42:18,066-Speed 2628.56 samples/sec   Loss 13.2477   LearningRate 0.0856   Epoch: 1   Global Step: 62070   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:21,962-Speed 2628.55 samples/sec   Loss 13.3575   LearningRate 0.0856   Epoch: 1   Global Step: 62080   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:25,860-Speed 2627.64 samples/sec   Loss 13.1080   LearningRate 0.0856   Epoch: 1   Global Step: 62090   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:29,755-Speed 2630.02 samples/sec   Loss 13.2441   LearningRate 0.0856   Epoch: 1   Global Step: 62100   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:33,694-Speed 2600.20 samples/sec   Loss 13.3547   LearningRate 0.0856   Epoch: 1   Global Step: 62110   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:37,591-Speed 2628.45 samples/sec   Loss 13.3373   LearningRate 0.0856   Epoch: 1   Global Step: 62120   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:41,495-Speed 2623.49 samples/sec   Loss 13.3848   LearningRate 0.0856   Epoch: 1   Global Step: 62130   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:45,390-Speed 2630.21 samples/sec   Loss 13.4506   LearningRate 0.0856   Epoch: 1   Global Step: 62140   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:49,306-Speed 2615.20 samples/sec   Loss 13.3077   LearningRate 0.0856   Epoch: 1   Global Step: 62150   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:53,200-Speed 2630.41 samples/sec   Loss 13.2621   LearningRate 0.0856   Epoch: 1   Global Step: 62160   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:42:57,081-Speed 2638.97 samples/sec   Loss 13.3578   LearningRate 0.0856   Epoch: 1   Global Step: 62170   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:43:00,968-Speed 2635.03 samples/sec   Loss 13.1638   LearningRate 0.0856   Epoch: 1   Global Step: 62180   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:43:04,885-Speed 2614.64 samples/sec   Loss 13.1644   LearningRate 0.0856   Epoch: 1   Global Step: 62190   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:08,997-Speed 2490.67 samples/sec   Loss 13.4858   LearningRate 0.0856   Epoch: 1   Global Step: 62200   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:13,075-Speed 2512.15 samples/sec   Loss 13.4663   LearningRate 0.0856   Epoch: 1   Global Step: 62210   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:17,147-Speed 2514.89 samples/sec   Loss 13.4263   LearningRate 0.0856   Epoch: 1   Global Step: 62220   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:21,218-Speed 2515.94 samples/sec   Loss 13.3769   LearningRate 0.0856   Epoch: 1   Global Step: 62230   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:25,302-Speed 2508.14 samples/sec   Loss 13.4045   LearningRate 0.0856   Epoch: 1   Global Step: 62240   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:29,403-Speed 2497.74 samples/sec   Loss 13.2615   LearningRate 0.0856   Epoch: 1   Global Step: 62250   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:33,500-Speed 2499.68 samples/sec   Loss 13.3436   LearningRate 0.0856   Epoch: 1   Global Step: 62260   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:37,475-Speed 2576.80 samples/sec   Loss 13.4121   LearningRate 0.0856   Epoch: 1   Global Step: 62270   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:41,393-Speed 2613.97 samples/sec   Loss 13.2240   LearningRate 0.0855   Epoch: 1   Global Step: 62280   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:45,281-Speed 2634.32 samples/sec   Loss 13.3269   LearningRate 0.0855   Epoch: 1   Global Step: 62290   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:49,169-Speed 2633.88 samples/sec   Loss 13.1644   LearningRate 0.0855   Epoch: 1   Global Step: 62300   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:53,067-Speed 2627.80 samples/sec   Loss 13.4990   LearningRate 0.0855   Epoch: 1   Global Step: 62310   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:43:56,962-Speed 2629.95 samples/sec   Loss 13.3470   LearningRate 0.0855   Epoch: 1   Global Step: 62320   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:00,854-Speed 2632.00 samples/sec   Loss 13.3298   LearningRate 0.0855   Epoch: 1   Global Step: 62330   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:04,930-Speed 2512.80 samples/sec   Loss 13.3373   LearningRate 0.0855   Epoch: 1   Global Step: 62340   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:09,034-Speed 2495.45 samples/sec   Loss 13.3387   LearningRate 0.0855   Epoch: 1   Global Step: 62350   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:13,129-Speed 2500.93 samples/sec   Loss 13.3433   LearningRate 0.0855   Epoch: 1   Global Step: 62360   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:17,224-Speed 2501.43 samples/sec   Loss 13.3681   LearningRate 0.0855   Epoch: 1   Global Step: 62370   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:21,243-Speed 2548.31 samples/sec   Loss 13.3819   LearningRate 0.0855   Epoch: 1   Global Step: 62380   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:25,164-Speed 2611.82 samples/sec   Loss 13.2493   LearningRate 0.0855   Epoch: 1   Global Step: 62390   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:44:29,065-Speed 2625.37 samples/sec   Loss 13.2407   LearningRate 0.0855   Epoch: 1   Global Step: 62400   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:44:32,970-Speed 2623.09 samples/sec   Loss 13.3635   LearningRate 0.0855   Epoch: 1   Global Step: 62410   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:44:36,850-Speed 2639.86 samples/sec   Loss 13.2918   LearningRate 0.0855   Epoch: 1   Global Step: 62420   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:40,761-Speed 2619.19 samples/sec   Loss 13.2567   LearningRate 0.0855   Epoch: 1   Global Step: 62430   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:44,658-Speed 2628.03 samples/sec   Loss 13.2014   LearningRate 0.0855   Epoch: 1   Global Step: 62440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:48,564-Speed 2622.42 samples/sec   Loss 13.1810   LearningRate 0.0855   Epoch: 1   Global Step: 62450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:52,475-Speed 2618.90 samples/sec   Loss 13.2759   LearningRate 0.0855   Epoch: 1   Global Step: 62460   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:44:56,373-Speed 2627.55 samples/sec   Loss 13.5381   LearningRate 0.0855   Epoch: 1   Global Step: 62470   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:00,266-Speed 2630.65 samples/sec   Loss 13.2649   LearningRate 0.0855   Epoch: 1   Global Step: 62480   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:04,158-Speed 2631.56 samples/sec   Loss 13.2111   LearningRate 0.0855   Epoch: 1   Global Step: 62490   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:08,053-Speed 2629.44 samples/sec   Loss 13.0992   LearningRate 0.0855   Epoch: 1   Global Step: 62500   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:11,946-Speed 2631.38 samples/sec   Loss 13.2741   LearningRate 0.0855   Epoch: 1   Global Step: 62510   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:15,822-Speed 2643.16 samples/sec   Loss 13.4609   LearningRate 0.0855   Epoch: 1   Global Step: 62520   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:19,716-Speed 2630.07 samples/sec   Loss 13.4005   LearningRate 0.0855   Epoch: 1   Global Step: 62530   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:23,607-Speed 2632.27 samples/sec   Loss 13.3163   LearningRate 0.0855   Epoch: 1   Global Step: 62540   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:27,502-Speed 2629.84 samples/sec   Loss 13.3730   LearningRate 0.0855   Epoch: 1   Global Step: 62550   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:31,395-Speed 2630.25 samples/sec   Loss 13.3402   LearningRate 0.0855   Epoch: 1   Global Step: 62560   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:35,293-Speed 2627.43 samples/sec   Loss 13.2794   LearningRate 0.0855   Epoch: 1   Global Step: 62570   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:39,201-Speed 2621.02 samples/sec   Loss 13.1263   LearningRate 0.0855   Epoch: 1   Global Step: 62580   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:43,107-Speed 2622.94 samples/sec   Loss 13.3723   LearningRate 0.0855   Epoch: 1   Global Step: 62590   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:47,007-Speed 2626.71 samples/sec   Loss 13.3692   LearningRate 0.0855   Epoch: 1   Global Step: 62600   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:50,944-Speed 2601.78 samples/sec   Loss 13.4483   LearningRate 0.0855   Epoch: 1   Global Step: 62610   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:45:54,869-Speed 2609.41 samples/sec   Loss 13.3519   LearningRate 0.0855   Epoch: 1   Global Step: 62620   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:45:58,771-Speed 2624.93 samples/sec   Loss 13.2437   LearningRate 0.0855   Epoch: 1   Global Step: 62630   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:46:02,659-Speed 2633.98 samples/sec   Loss 13.3831   LearningRate 0.0855   Epoch: 1   Global Step: 62640   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:06,561-Speed 2625.07 samples/sec   Loss 13.3844   LearningRate 0.0855   Epoch: 1   Global Step: 62650   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:10,464-Speed 2623.86 samples/sec   Loss 13.3966   LearningRate 0.0855   Epoch: 1   Global Step: 62660   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:14,373-Speed 2620.83 samples/sec   Loss 13.3071   LearningRate 0.0855   Epoch: 1   Global Step: 62670   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:18,271-Speed 2627.15 samples/sec   Loss 13.3330   LearningRate 0.0855   Epoch: 1   Global Step: 62680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:22,166-Speed 2630.83 samples/sec   Loss 13.2805   LearningRate 0.0855   Epoch: 1   Global Step: 62690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:26,061-Speed 2629.43 samples/sec   Loss 13.1379   LearningRate 0.0855   Epoch: 1   Global Step: 62700   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:29,971-Speed 2619.41 samples/sec   Loss 13.3976   LearningRate 0.0855   Epoch: 1   Global Step: 62710   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:33,867-Speed 2628.94 samples/sec   Loss 13.3504   LearningRate 0.0855   Epoch: 1   Global Step: 62720   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:37,765-Speed 2627.23 samples/sec   Loss 13.2137   LearningRate 0.0854   Epoch: 1   Global Step: 62730   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:41,653-Speed 2634.13 samples/sec   Loss 13.2830   LearningRate 0.0854   Epoch: 1   Global Step: 62740   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:45,554-Speed 2625.70 samples/sec   Loss 13.3661   LearningRate 0.0854   Epoch: 1   Global Step: 62750   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:49,452-Speed 2628.11 samples/sec   Loss 13.2664   LearningRate 0.0854   Epoch: 1   Global Step: 62760   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:53,358-Speed 2622.48 samples/sec   Loss 13.3071   LearningRate 0.0854   Epoch: 1   Global Step: 62770   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:46:57,250-Speed 2631.47 samples/sec   Loss 13.2854   LearningRate 0.0854   Epoch: 1   Global Step: 62780   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:01,147-Speed 2628.41 samples/sec   Loss 13.2111   LearningRate 0.0854   Epoch: 1   Global Step: 62790   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:05,046-Speed 2626.89 samples/sec   Loss 13.1254   LearningRate 0.0854   Epoch: 1   Global Step: 62800   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:08,943-Speed 2627.80 samples/sec   Loss 13.2592   LearningRate 0.0854   Epoch: 1   Global Step: 62810   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:12,843-Speed 2626.10 samples/sec   Loss 13.2485   LearningRate 0.0854   Epoch: 1   Global Step: 62820   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:16,745-Speed 2624.95 samples/sec   Loss 13.3103   LearningRate 0.0854   Epoch: 1   Global Step: 62830   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:20,642-Speed 2628.60 samples/sec   Loss 13.2756   LearningRate 0.0854   Epoch: 1   Global Step: 62840   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:47:24,522-Speed 2639.34 samples/sec   Loss 13.2002   LearningRate 0.0854   Epoch: 1   Global Step: 62850   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:28,443-Speed 2612.66 samples/sec   Loss 13.3378   LearningRate 0.0854   Epoch: 1   Global Step: 62860   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:32,359-Speed 2615.20 samples/sec   Loss 13.3392   LearningRate 0.0854   Epoch: 1   Global Step: 62870   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:36,255-Speed 2628.90 samples/sec   Loss 13.3517   LearningRate 0.0854   Epoch: 1   Global Step: 62880   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:40,153-Speed 2627.83 samples/sec   Loss 13.3786   LearningRate 0.0854   Epoch: 1   Global Step: 62890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:44,063-Speed 2619.19 samples/sec   Loss 13.3157   LearningRate 0.0854   Epoch: 1   Global Step: 62900   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:47,960-Speed 2628.26 samples/sec   Loss 13.3324   LearningRate 0.0854   Epoch: 1   Global Step: 62910   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:51,859-Speed 2627.28 samples/sec   Loss 13.3813   LearningRate 0.0854   Epoch: 1   Global Step: 62920   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:55,757-Speed 2627.00 samples/sec   Loss 13.2804   LearningRate 0.0854   Epoch: 1   Global Step: 62930   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:47:59,656-Speed 2627.92 samples/sec   Loss 13.3636   LearningRate 0.0854   Epoch: 1   Global Step: 62940   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:03,560-Speed 2623.25 samples/sec   Loss 13.2214   LearningRate 0.0854   Epoch: 1   Global Step: 62950   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:48:07,463-Speed 2624.59 samples/sec   Loss 13.3250   LearningRate 0.0854   Epoch: 1   Global Step: 62960   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:48:11,369-Speed 2622.00 samples/sec   Loss 13.3327   LearningRate 0.0854   Epoch: 1   Global Step: 62970   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:48:15,266-Speed 2629.07 samples/sec   Loss 13.3397   LearningRate 0.0854   Epoch: 1   Global Step: 62980   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:48:19,164-Speed 2627.62 samples/sec   Loss 13.4734   LearningRate 0.0854   Epoch: 1   Global Step: 62990   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:48:23,062-Speed 2627.26 samples/sec   Loss 13.1917   LearningRate 0.0854   Epoch: 1   Global Step: 63000   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:48:26,943-Speed 2639.42 samples/sec   Loss 13.3711   LearningRate 0.0854   Epoch: 1   Global Step: 63010   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:30,843-Speed 2626.02 samples/sec   Loss 13.2464   LearningRate 0.0854   Epoch: 1   Global Step: 63020   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:34,746-Speed 2624.41 samples/sec   Loss 13.2740   LearningRate 0.0854   Epoch: 1   Global Step: 63030   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:38,643-Speed 2628.15 samples/sec   Loss 13.3408   LearningRate 0.0854   Epoch: 1   Global Step: 63040   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:42,664-Speed 2547.69 samples/sec   Loss 13.3869   LearningRate 0.0854   Epoch: 1   Global Step: 63050   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:46,757-Speed 2502.07 samples/sec   Loss 13.1529   LearningRate 0.0854   Epoch: 1   Global Step: 63060   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:50,694-Speed 2601.46 samples/sec   Loss 13.2538   LearningRate 0.0854   Epoch: 1   Global Step: 63070   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:54,601-Speed 2621.70 samples/sec   Loss 13.6472   LearningRate 0.0854   Epoch: 1   Global Step: 63080   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:48:58,514-Speed 2617.16 samples/sec   Loss 13.3653   LearningRate 0.0854   Epoch: 1   Global Step: 63090   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:02,416-Speed 2625.06 samples/sec   Loss 13.2928   LearningRate 0.0854   Epoch: 1   Global Step: 63100   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:06,317-Speed 2625.80 samples/sec   Loss 13.3797   LearningRate 0.0854   Epoch: 1   Global Step: 63110   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:49:10,196-Speed 2640.48 samples/sec   Loss 13.3263   LearningRate 0.0854   Epoch: 1   Global Step: 63120   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:14,098-Speed 2624.94 samples/sec   Loss 13.1899   LearningRate 0.0854   Epoch: 1   Global Step: 63130   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:18,000-Speed 2624.95 samples/sec   Loss 13.3389   LearningRate 0.0854   Epoch: 1   Global Step: 63140   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:21,897-Speed 2628.20 samples/sec   Loss 13.4370   LearningRate 0.0854   Epoch: 1   Global Step: 63150   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:25,806-Speed 2620.39 samples/sec   Loss 13.3514   LearningRate 0.0854   Epoch: 1   Global Step: 63160   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:29,705-Speed 2626.57 samples/sec   Loss 13.2515   LearningRate 0.0854   Epoch: 1   Global Step: 63170   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:33,601-Speed 2628.97 samples/sec   Loss 13.2063   LearningRate 0.0853   Epoch: 1   Global Step: 63180   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:37,497-Speed 2628.66 samples/sec   Loss 13.3091   LearningRate 0.0853   Epoch: 1   Global Step: 63190   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:41,399-Speed 2624.99 samples/sec   Loss 13.2220   LearningRate 0.0853   Epoch: 1   Global Step: 63200   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:45,318-Speed 2613.49 samples/sec   Loss 13.2963   LearningRate 0.0853   Epoch: 1   Global Step: 63210   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:49,240-Speed 2611.51 samples/sec   Loss 13.2540   LearningRate 0.0853   Epoch: 1   Global Step: 63220   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:49:53,127-Speed 2635.36 samples/sec   Loss 13.3683   LearningRate 0.0853   Epoch: 1   Global Step: 63230   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:49:57,024-Speed 2627.83 samples/sec   Loss 13.3567   LearningRate 0.0853   Epoch: 1   Global Step: 63240   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:00,925-Speed 2626.01 samples/sec   Loss 13.2165   LearningRate 0.0853   Epoch: 1   Global Step: 63250   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:04,822-Speed 2627.90 samples/sec   Loss 13.2742   LearningRate 0.0853   Epoch: 1   Global Step: 63260   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:08,718-Speed 2628.80 samples/sec   Loss 13.3632   LearningRate 0.0853   Epoch: 1   Global Step: 63270   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:12,618-Speed 2626.35 samples/sec   Loss 13.3971   LearningRate 0.0853   Epoch: 1   Global Step: 63280   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:16,516-Speed 2627.80 samples/sec   Loss 13.2626   LearningRate 0.0853   Epoch: 1   Global Step: 63290   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:20,417-Speed 2625.37 samples/sec   Loss 13.4275   LearningRate 0.0853   Epoch: 1   Global Step: 63300   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:24,319-Speed 2625.62 samples/sec   Loss 13.2726   LearningRate 0.0853   Epoch: 1   Global Step: 63310   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:28,228-Speed 2619.92 samples/sec   Loss 13.2992   LearningRate 0.0853   Epoch: 1   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:32,129-Speed 2625.52 samples/sec   Loss 13.1742   LearningRate 0.0853   Epoch: 1   Global Step: 63330   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:50:36,026-Speed 2628.16 samples/sec   Loss 13.4018   LearningRate 0.0853   Epoch: 1   Global Step: 63340   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:50:39,908-Speed 2638.05 samples/sec   Loss 13.2481   LearningRate 0.0853   Epoch: 1   Global Step: 63350   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:43,812-Speed 2623.92 samples/sec   Loss 13.2596   LearningRate 0.0853   Epoch: 1   Global Step: 63360   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:47,713-Speed 2626.41 samples/sec   Loss 13.3129   LearningRate 0.0853   Epoch: 1   Global Step: 63370   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:51,619-Speed 2621.71 samples/sec   Loss 13.3258   LearningRate 0.0853   Epoch: 1   Global Step: 63380   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:55,527-Speed 2621.27 samples/sec   Loss 13.3430   LearningRate 0.0853   Epoch: 1   Global Step: 63390   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:50:59,421-Speed 2629.93 samples/sec   Loss 13.1516   LearningRate 0.0853   Epoch: 1   Global Step: 63400   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:03,340-Speed 2613.95 samples/sec   Loss 13.3329   LearningRate 0.0853   Epoch: 1   Global Step: 63410   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:07,250-Speed 2619.06 samples/sec   Loss 13.3731   LearningRate 0.0853   Epoch: 1   Global Step: 63420   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:11,154-Speed 2623.97 samples/sec   Loss 13.2253   LearningRate 0.0853   Epoch: 1   Global Step: 63430   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:15,065-Speed 2618.71 samples/sec   Loss 13.3533   LearningRate 0.0853   Epoch: 1   Global Step: 63440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:18,960-Speed 2629.04 samples/sec   Loss 13.1676   LearningRate 0.0853   Epoch: 1   Global Step: 63450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:22,857-Speed 2628.53 samples/sec   Loss 13.3289   LearningRate 0.0853   Epoch: 1   Global Step: 63460   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:26,754-Speed 2628.31 samples/sec   Loss 13.2518   LearningRate 0.0853   Epoch: 1   Global Step: 63470   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:30,663-Speed 2620.22 samples/sec   Loss 13.4082   LearningRate 0.0853   Epoch: 1   Global Step: 63480   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:34,571-Speed 2620.90 samples/sec   Loss 13.2805   LearningRate 0.0853   Epoch: 1   Global Step: 63490   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:38,473-Speed 2625.25 samples/sec   Loss 13.2677   LearningRate 0.0853   Epoch: 1   Global Step: 63500   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:42,378-Speed 2622.73 samples/sec   Loss 13.3747   LearningRate 0.0853   Epoch: 1   Global Step: 63510   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:46,292-Speed 2616.62 samples/sec   Loss 13.3463   LearningRate 0.0853   Epoch: 1   Global Step: 63520   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:50,198-Speed 2622.06 samples/sec   Loss 13.3993   LearningRate 0.0853   Epoch: 1   Global Step: 63530   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:54,090-Speed 2631.97 samples/sec   Loss 13.2181   LearningRate 0.0853   Epoch: 1   Global Step: 63540   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:51:57,991-Speed 2624.97 samples/sec   Loss 13.3999   LearningRate 0.0853   Epoch: 1   Global Step: 63550   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:52:01,879-Speed 2635.09 samples/sec   Loss 13.4173   LearningRate 0.0853   Epoch: 1   Global Step: 63560   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:05,855-Speed 2575.44 samples/sec   Loss 13.3583   LearningRate 0.0853   Epoch: 1   Global Step: 63570   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:09,753-Speed 2627.70 samples/sec   Loss 13.3676   LearningRate 0.0853   Epoch: 1   Global Step: 63580   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:13,649-Speed 2628.90 samples/sec   Loss 13.2815   LearningRate 0.0853   Epoch: 1   Global Step: 63590   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:17,547-Speed 2627.58 samples/sec   Loss 13.2495   LearningRate 0.0853   Epoch: 1   Global Step: 63600   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:21,440-Speed 2631.45 samples/sec   Loss 13.1462   LearningRate 0.0853   Epoch: 1   Global Step: 63610   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:25,333-Speed 2630.60 samples/sec   Loss 13.3134   LearningRate 0.0853   Epoch: 1   Global Step: 63620   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:29,230-Speed 2628.59 samples/sec   Loss 13.2686   LearningRate 0.0852   Epoch: 1   Global Step: 63630   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:33,147-Speed 2614.78 samples/sec   Loss 13.2918   LearningRate 0.0852   Epoch: 1   Global Step: 63640   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:37,079-Speed 2604.16 samples/sec   Loss 13.1984   LearningRate 0.0852   Epoch: 1   Global Step: 63650   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:40,983-Speed 2624.00 samples/sec   Loss 13.2432   LearningRate 0.0852   Epoch: 1   Global Step: 63660   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:52:44,895-Speed 2618.04 samples/sec   Loss 13.3672   LearningRate 0.0852   Epoch: 1   Global Step: 63670   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:52:48,791-Speed 2629.02 samples/sec   Loss 13.3607   LearningRate 0.0852   Epoch: 1   Global Step: 63680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:52,689-Speed 2628.22 samples/sec   Loss 13.3508   LearningRate 0.0852   Epoch: 1   Global Step: 63690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:52:56,580-Speed 2631.68 samples/sec   Loss 13.2638   LearningRate 0.0852   Epoch: 1   Global Step: 63700   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:00,476-Speed 2629.04 samples/sec   Loss 13.2811   LearningRate 0.0852   Epoch: 1   Global Step: 63710   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:04,376-Speed 2626.41 samples/sec   Loss 13.3088   LearningRate 0.0852   Epoch: 1   Global Step: 63720   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:08,272-Speed 2628.93 samples/sec   Loss 13.0835   LearningRate 0.0852   Epoch: 1   Global Step: 63730   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:12,185-Speed 2617.39 samples/sec   Loss 13.1615   LearningRate 0.0852   Epoch: 1   Global Step: 63740   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:16,069-Speed 2636.90 samples/sec   Loss 13.2394   LearningRate 0.0852   Epoch: 1   Global Step: 63750   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:19,962-Speed 2630.79 samples/sec   Loss 13.3251   LearningRate 0.0852   Epoch: 1   Global Step: 63760   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:23,857-Speed 2629.84 samples/sec   Loss 13.3367   LearningRate 0.0852   Epoch: 1   Global Step: 63770   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:27,752-Speed 2629.50 samples/sec   Loss 13.3794   LearningRate 0.0852   Epoch: 1   Global Step: 63780   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:53:31,652-Speed 2626.86 samples/sec   Loss 13.3238   LearningRate 0.0852   Epoch: 1   Global Step: 63790   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:53:35,549-Speed 2627.84 samples/sec   Loss 13.2127   LearningRate 0.0852   Epoch: 1   Global Step: 63800   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:53:39,447-Speed 2627.85 samples/sec   Loss 13.3512   LearningRate 0.0852   Epoch: 1   Global Step: 63810   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:53:43,341-Speed 2630.15 samples/sec   Loss 13.2691   LearningRate 0.0852   Epoch: 1   Global Step: 63820   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:53:47,241-Speed 2626.50 samples/sec   Loss 13.2478   LearningRate 0.0852   Epoch: 1   Global Step: 63830   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:53:51,120-Speed 2640.03 samples/sec   Loss 13.2521   LearningRate 0.0852   Epoch: 1   Global Step: 63840   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:55,018-Speed 2627.33 samples/sec   Loss 13.1363   LearningRate 0.0852   Epoch: 1   Global Step: 63850   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:53:58,927-Speed 2620.85 samples/sec   Loss 13.3038   LearningRate 0.0852   Epoch: 1   Global Step: 63860   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:02,843-Speed 2615.44 samples/sec   Loss 13.3626   LearningRate 0.0852   Epoch: 1   Global Step: 63870   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:06,743-Speed 2626.04 samples/sec   Loss 13.2959   LearningRate 0.0852   Epoch: 1   Global Step: 63880   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:10,645-Speed 2625.09 samples/sec   Loss 13.3655   LearningRate 0.0852   Epoch: 1   Global Step: 63890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:14,544-Speed 2626.66 samples/sec   Loss 13.2218   LearningRate 0.0852   Epoch: 1   Global Step: 63900   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:18,437-Speed 2631.01 samples/sec   Loss 13.3349   LearningRate 0.0852   Epoch: 1   Global Step: 63910   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:22,335-Speed 2627.47 samples/sec   Loss 13.2666   LearningRate 0.0852   Epoch: 1   Global Step: 63920   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:26,236-Speed 2626.12 samples/sec   Loss 13.2355   LearningRate 0.0852   Epoch: 1   Global Step: 63930   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:30,131-Speed 2628.90 samples/sec   Loss 13.3930   LearningRate 0.0852   Epoch: 1   Global Step: 63940   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:54:34,018-Speed 2635.20 samples/sec   Loss 13.1191   LearningRate 0.0852   Epoch: 1   Global Step: 63950   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:37,916-Speed 2627.29 samples/sec   Loss 13.1780   LearningRate 0.0852   Epoch: 1   Global Step: 63960   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:41,809-Speed 2631.40 samples/sec   Loss 13.2333   LearningRate 0.0852   Epoch: 1   Global Step: 63970   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:54:45,691-Speed 2638.80 samples/sec   Loss 13.4094   LearningRate 0.0852   Epoch: 1   Global Step: 63980   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:54:49,590-Speed 2626.66 samples/sec   Loss 13.3402   LearningRate 0.0852   Epoch: 1   Global Step: 63990   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:54:53,492-Speed 2625.24 samples/sec   Loss 13.4194   LearningRate 0.0852   Epoch: 1   Global Step: 64000   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:54:57,398-Speed 2621.68 samples/sec   Loss 13.3158   LearningRate 0.0852   Epoch: 1   Global Step: 64010   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:01,292-Speed 2630.00 samples/sec   Loss 13.3464   LearningRate 0.0852   Epoch: 1   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:05,189-Speed 2628.31 samples/sec   Loss 13.2994   LearningRate 0.0852   Epoch: 1   Global Step: 64030   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:09,083-Speed 2630.68 samples/sec   Loss 13.1751   LearningRate 0.0852   Epoch: 1   Global Step: 64040   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:12,979-Speed 2628.37 samples/sec   Loss 13.1872   LearningRate 0.0852   Epoch: 1   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:16,876-Speed 2635.68 samples/sec   Loss 13.3201   LearningRate 0.0852   Epoch: 1   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:20,772-Speed 2628.76 samples/sec   Loss 13.2712   LearningRate 0.0852   Epoch: 1   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:24,668-Speed 2629.27 samples/sec   Loss 13.3779   LearningRate 0.0851   Epoch: 1   Global Step: 64080   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:55:28,605-Speed 2601.35 samples/sec   Loss 13.2116   LearningRate 0.0851   Epoch: 1   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:32,539-Speed 2603.48 samples/sec   Loss 13.2942   LearningRate 0.0851   Epoch: 1   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:36,437-Speed 2627.76 samples/sec   Loss 13.3646   LearningRate 0.0851   Epoch: 1   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:40,331-Speed 2630.42 samples/sec   Loss 13.3101   LearningRate 0.0851   Epoch: 1   Global Step: 64120   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:44,231-Speed 2626.06 samples/sec   Loss 13.2955   LearningRate 0.0851   Epoch: 1   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:48,127-Speed 2628.66 samples/sec   Loss 13.1928   LearningRate 0.0851   Epoch: 1   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:52,027-Speed 2626.15 samples/sec   Loss 13.1853   LearningRate 0.0851   Epoch: 1   Global Step: 64150   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:55,924-Speed 2629.26 samples/sec   Loss 13.4107   LearningRate 0.0851   Epoch: 1   Global Step: 64160   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:55:59,824-Speed 2626.13 samples/sec   Loss 13.2075   LearningRate 0.0851   Epoch: 1   Global Step: 64170   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:56:03,718-Speed 2630.08 samples/sec   Loss 13.1581   LearningRate 0.0851   Epoch: 1   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 02:56:07,612-Speed 2630.09 samples/sec   Loss 13.3761   LearningRate 0.0851   Epoch: 1   Global Step: 64190   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:11,505-Speed 2630.60 samples/sec   Loss 13.1109   LearningRate 0.0851   Epoch: 1   Global Step: 64200   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:15,399-Speed 2630.13 samples/sec   Loss 13.1551   LearningRate 0.0851   Epoch: 1   Global Step: 64210   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:19,297-Speed 2627.82 samples/sec   Loss 13.2896   LearningRate 0.0851   Epoch: 1   Global Step: 64220   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:23,191-Speed 2630.35 samples/sec   Loss 13.3608   LearningRate 0.0851   Epoch: 1   Global Step: 64230   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:27,082-Speed 2633.08 samples/sec   Loss 13.2728   LearningRate 0.0851   Epoch: 1   Global Step: 64240   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:30,974-Speed 2631.75 samples/sec   Loss 13.2110   LearningRate 0.0851   Epoch: 1   Global Step: 64250   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:34,868-Speed 2629.97 samples/sec   Loss 13.2587   LearningRate 0.0851   Epoch: 1   Global Step: 64260   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:38,761-Speed 2631.02 samples/sec   Loss 13.2071   LearningRate 0.0851   Epoch: 1   Global Step: 64270   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:42,661-Speed 2626.12 samples/sec   Loss 13.2521   LearningRate 0.0851   Epoch: 1   Global Step: 64280   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:46,559-Speed 2628.15 samples/sec   Loss 13.3925   LearningRate 0.0851   Epoch: 1   Global Step: 64290   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:56:50,439-Speed 2639.28 samples/sec   Loss 13.1689   LearningRate 0.0851   Epoch: 1   Global Step: 64300   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:54,332-Speed 2631.24 samples/sec   Loss 13.2020   LearningRate 0.0851   Epoch: 1   Global Step: 64310   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:56:58,243-Speed 2618.50 samples/sec   Loss 13.2640   LearningRate 0.0851   Epoch: 1   Global Step: 64320   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:02,157-Speed 2616.98 samples/sec   Loss 13.2457   LearningRate 0.0851   Epoch: 1   Global Step: 64330   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:06,065-Speed 2620.75 samples/sec   Loss 13.2606   LearningRate 0.0851   Epoch: 1   Global Step: 64340   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:09,965-Speed 2626.85 samples/sec   Loss 13.2554   LearningRate 0.0851   Epoch: 1   Global Step: 64350   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:13,861-Speed 2628.70 samples/sec   Loss 13.1444   LearningRate 0.0851   Epoch: 1   Global Step: 64360   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:17,761-Speed 2626.32 samples/sec   Loss 13.2717   LearningRate 0.0851   Epoch: 1   Global Step: 64370   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:21,659-Speed 2627.99 samples/sec   Loss 13.2657   LearningRate 0.0851   Epoch: 1   Global Step: 64380   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:25,559-Speed 2625.79 samples/sec   Loss 13.2508   LearningRate 0.0851   Epoch: 1   Global Step: 64390   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:29,459-Speed 2626.77 samples/sec   Loss 13.1680   LearningRate 0.0851   Epoch: 1   Global Step: 64400   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:57:33,354-Speed 2629.36 samples/sec   Loss 13.3435   LearningRate 0.0851   Epoch: 1   Global Step: 64410   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:57:37,248-Speed 2630.27 samples/sec   Loss 13.2497   LearningRate 0.0851   Epoch: 1   Global Step: 64420   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:57:41,145-Speed 2628.29 samples/sec   Loss 13.1946   LearningRate 0.0851   Epoch: 1   Global Step: 64430   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:57:45,024-Speed 2641.11 samples/sec   Loss 13.2616   LearningRate 0.0851   Epoch: 1   Global Step: 64440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:48,917-Speed 2630.30 samples/sec   Loss 13.3695   LearningRate 0.0851   Epoch: 1   Global Step: 64450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:52,810-Speed 2631.20 samples/sec   Loss 13.4091   LearningRate 0.0851   Epoch: 1   Global Step: 64460   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:57:56,707-Speed 2628.29 samples/sec   Loss 13.2849   LearningRate 0.0851   Epoch: 1   Global Step: 64470   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:00,628-Speed 2612.63 samples/sec   Loss 13.3470   LearningRate 0.0851   Epoch: 1   Global Step: 64480   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:04,524-Speed 2628.87 samples/sec   Loss 13.3042   LearningRate 0.0851   Epoch: 1   Global Step: 64490   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:08,427-Speed 2624.62 samples/sec   Loss 13.2841   LearningRate 0.0851   Epoch: 1   Global Step: 64500   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:12,335-Speed 2620.71 samples/sec   Loss 13.4183   LearningRate 0.0851   Epoch: 1   Global Step: 64510   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:16,230-Speed 2629.68 samples/sec   Loss 13.2285   LearningRate 0.0851   Epoch: 1   Global Step: 64520   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:20,125-Speed 2629.88 samples/sec   Loss 13.1408   LearningRate 0.0850   Epoch: 1   Global Step: 64530   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:24,019-Speed 2630.30 samples/sec   Loss 13.2651   LearningRate 0.0850   Epoch: 1   Global Step: 64540   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:58:27,918-Speed 2626.62 samples/sec   Loss 13.1190   LearningRate 0.0850   Epoch: 1   Global Step: 64550   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:58:31,828-Speed 2620.08 samples/sec   Loss 13.1849   LearningRate 0.0850   Epoch: 1   Global Step: 64560   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:58:35,724-Speed 2629.03 samples/sec   Loss 13.3745   LearningRate 0.0850   Epoch: 1   Global Step: 64570   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:58:39,627-Speed 2623.91 samples/sec   Loss 13.3602   LearningRate 0.0850   Epoch: 1   Global Step: 64580   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:58:43,522-Speed 2629.84 samples/sec   Loss 13.3493   LearningRate 0.0850   Epoch: 1   Global Step: 64590   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:58:47,418-Speed 2629.12 samples/sec   Loss 13.2922   LearningRate 0.0850   Epoch: 1   Global Step: 64600   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:51,341-Speed 2610.75 samples/sec   Loss 13.2899   LearningRate 0.0850   Epoch: 1   Global Step: 64610   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:55,239-Speed 2627.77 samples/sec   Loss 13.3618   LearningRate 0.0850   Epoch: 1   Global Step: 64620   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:58:59,142-Speed 2624.28 samples/sec   Loss 13.2760   LearningRate 0.0850   Epoch: 1   Global Step: 64630   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:03,052-Speed 2619.06 samples/sec   Loss 13.1646   LearningRate 0.0850   Epoch: 1   Global Step: 64640   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:06,951-Speed 2626.80 samples/sec   Loss 13.1048   LearningRate 0.0850   Epoch: 1   Global Step: 64650   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:10,851-Speed 2626.65 samples/sec   Loss 13.2173   LearningRate 0.0850   Epoch: 1   Global Step: 64660   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:14,758-Speed 2622.14 samples/sec   Loss 13.2687   LearningRate 0.0850   Epoch: 1   Global Step: 64670   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:18,653-Speed 2629.03 samples/sec   Loss 13.1877   LearningRate 0.0850   Epoch: 1   Global Step: 64680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:22,553-Speed 2626.71 samples/sec   Loss 13.1900   LearningRate 0.0850   Epoch: 1   Global Step: 64690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:26,446-Speed 2630.66 samples/sec   Loss 13.2228   LearningRate 0.0850   Epoch: 1   Global Step: 64700   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 02:59:30,324-Speed 2641.41 samples/sec   Loss 13.1976   LearningRate 0.0850   Epoch: 1   Global Step: 64710   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:34,220-Speed 2629.05 samples/sec   Loss 13.2596   LearningRate 0.0850   Epoch: 1   Global Step: 64720   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:38,116-Speed 2628.55 samples/sec   Loss 13.2388   LearningRate 0.0850   Epoch: 1   Global Step: 64730   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:42,016-Speed 2626.42 samples/sec   Loss 13.2333   LearningRate 0.0850   Epoch: 1   Global Step: 64740   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:45,912-Speed 2629.32 samples/sec   Loss 13.3068   LearningRate 0.0850   Epoch: 1   Global Step: 64750   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:49,807-Speed 2629.11 samples/sec   Loss 13.4075   LearningRate 0.0850   Epoch: 1   Global Step: 64760   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:53,701-Speed 2630.68 samples/sec   Loss 13.1788   LearningRate 0.0850   Epoch: 1   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 02:59:57,597-Speed 2628.77 samples/sec   Loss 13.3232   LearningRate 0.0850   Epoch: 1   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:01,523-Speed 2609.10 samples/sec   Loss 13.3398   LearningRate 0.0850   Epoch: 1   Global Step: 64790   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:05,417-Speed 2630.41 samples/sec   Loss 13.2695   LearningRate 0.0850   Epoch: 1   Global Step: 64800   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:09,312-Speed 2629.46 samples/sec   Loss 13.2503   LearningRate 0.0850   Epoch: 1   Global Step: 64810   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:00:13,209-Speed 2628.17 samples/sec   Loss 13.2499   LearningRate 0.0850   Epoch: 1   Global Step: 64820   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:00:17,090-Speed 2639.64 samples/sec   Loss 13.1672   LearningRate 0.0850   Epoch: 1   Global Step: 64830   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:20,993-Speed 2624.55 samples/sec   Loss 13.4367   LearningRate 0.0850   Epoch: 1   Global Step: 64840   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:24,892-Speed 2626.46 samples/sec   Loss 13.1942   LearningRate 0.0850   Epoch: 1   Global Step: 64850   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:28,797-Speed 2622.81 samples/sec   Loss 13.3154   LearningRate 0.0850   Epoch: 1   Global Step: 64860   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:32,698-Speed 2625.65 samples/sec   Loss 13.2392   LearningRate 0.0850   Epoch: 1   Global Step: 64870   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:36,601-Speed 2625.06 samples/sec   Loss 13.1243   LearningRate 0.0850   Epoch: 1   Global Step: 64880   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:40,498-Speed 2628.24 samples/sec   Loss 13.1938   LearningRate 0.0850   Epoch: 1   Global Step: 64890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:44,399-Speed 2625.01 samples/sec   Loss 13.2727   LearningRate 0.0850   Epoch: 1   Global Step: 64900   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:48,296-Speed 2628.86 samples/sec   Loss 13.2447   LearningRate 0.0850   Epoch: 1   Global Step: 64910   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:52,246-Speed 2593.12 samples/sec   Loss 13.2131   LearningRate 0.0850   Epoch: 1   Global Step: 64920   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:00:56,303-Speed 2524.53 samples/sec   Loss 13.2230   LearningRate 0.0850   Epoch: 1   Global Step: 64930   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:01:00,181-Speed 2641.30 samples/sec   Loss 13.1195   LearningRate 0.0850   Epoch: 1   Global Step: 64940   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:01:04,083-Speed 2624.79 samples/sec   Loss 13.2710   LearningRate 0.0850   Epoch: 1   Global Step: 64950   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:01:07,980-Speed 2628.71 samples/sec   Loss 13.2584   LearningRate 0.0850   Epoch: 1   Global Step: 64960   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:01:11,920-Speed 2627.14 samples/sec   Loss 13.2530   LearningRate 0.0850   Epoch: 1   Global Step: 64970   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:01:15,832-Speed 2617.95 samples/sec   Loss 13.0314   LearningRate 0.0849   Epoch: 1   Global Step: 64980   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:01:19,754-Speed 2648.66 samples/sec   Loss 13.1973   LearningRate 0.0849   Epoch: 1   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:23,647-Speed 2631.33 samples/sec   Loss 13.2046   LearningRate 0.0849   Epoch: 1   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:28,264-Speed 2636.16 samples/sec   Loss 13.1232   LearningRate 0.0849   Epoch: 1   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:32,180-Speed 2615.23 samples/sec   Loss 13.0974   LearningRate 0.0849   Epoch: 1   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:36,077-Speed 2628.49 samples/sec   Loss 13.2505   LearningRate 0.0849   Epoch: 1   Global Step: 65030   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:39,968-Speed 2633.02 samples/sec   Loss 13.2375   LearningRate 0.0849   Epoch: 1   Global Step: 65040   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:43,861-Speed 2630.98 samples/sec   Loss 13.3354   LearningRate 0.0849   Epoch: 1   Global Step: 65050   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:49,353-Speed 2639.93 samples/sec   Loss 13.1588   LearningRate 0.0849   Epoch: 1   Global Step: 65060   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:53,263-Speed 2619.65 samples/sec   Loss 13.2779   LearningRate 0.0849   Epoch: 1   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:01:57,155-Speed 2632.29 samples/sec   Loss 13.1475   LearningRate 0.0849   Epoch: 1   Global Step: 65080   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:02:01,060-Speed 2622.87 samples/sec   Loss 13.0752   LearningRate 0.0849   Epoch: 1   Global Step: 65090   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:05,000-Speed 2599.42 samples/sec   Loss 13.0956   LearningRate 0.0849   Epoch: 1   Global Step: 65100   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:08,977-Speed 2575.46 samples/sec   Loss 13.2936   LearningRate 0.0849   Epoch: 1   Global Step: 65110   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:12,875-Speed 2627.86 samples/sec   Loss 13.3806   LearningRate 0.0849   Epoch: 1   Global Step: 65120   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:16,794-Speed 2613.72 samples/sec   Loss 13.0535   LearningRate 0.0849   Epoch: 1   Global Step: 65130   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:20,699-Speed 2622.48 samples/sec   Loss 13.2026   LearningRate 0.0849   Epoch: 1   Global Step: 65140   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:24,607-Speed 2621.78 samples/sec   Loss 13.1727   LearningRate 0.0849   Epoch: 1   Global Step: 65150   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:28,505-Speed 2627.15 samples/sec   Loss 13.2420   LearningRate 0.0849   Epoch: 1   Global Step: 65160   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:32,437-Speed 2604.97 samples/sec   Loss 13.3828   LearningRate 0.0849   Epoch: 1   Global Step: 65170   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:36,345-Speed 2621.11 samples/sec   Loss 13.1567   LearningRate 0.0849   Epoch: 1   Global Step: 65180   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:02:40,243-Speed 2627.45 samples/sec   Loss 13.1781   LearningRate 0.0849   Epoch: 1   Global Step: 65190   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:02:44,147-Speed 2623.50 samples/sec   Loss 13.1654   LearningRate 0.0849   Epoch: 1   Global Step: 65200   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:02:48,047-Speed 2626.22 samples/sec   Loss 13.1107   LearningRate 0.0849   Epoch: 1   Global Step: 65210   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:02:51,944-Speed 2628.36 samples/sec   Loss 13.2766   LearningRate 0.0849   Epoch: 1   Global Step: 65220   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:02:55,843-Speed 2627.20 samples/sec   Loss 13.2787   LearningRate 0.0849   Epoch: 1   Global Step: 65230   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:02:59,743-Speed 2626.33 samples/sec   Loss 13.1520   LearningRate 0.0849   Epoch: 1   Global Step: 65240   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:03:03,639-Speed 2628.77 samples/sec   Loss 13.0162   LearningRate 0.0849   Epoch: 1   Global Step: 65250   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:03:07,542-Speed 2624.37 samples/sec   Loss 13.3421   LearningRate 0.0849   Epoch: 1   Global Step: 65260   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:03:11,436-Speed 2633.65 samples/sec   Loss 13.1105   LearningRate 0.0849   Epoch: 1   Global Step: 65270   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:03:15,319-Speed 2637.54 samples/sec   Loss 13.3958   LearningRate 0.0849   Epoch: 1   Global Step: 65280   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:19,231-Speed 2617.88 samples/sec   Loss 13.3622   LearningRate 0.0849   Epoch: 1   Global Step: 65290   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:23,134-Speed 2624.99 samples/sec   Loss 13.1664   LearningRate 0.0849   Epoch: 1   Global Step: 65300   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:27,028-Speed 2630.39 samples/sec   Loss 13.4229   LearningRate 0.0849   Epoch: 1   Global Step: 65310   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:30,948-Speed 2618.11 samples/sec   Loss 13.2035   LearningRate 0.0849   Epoch: 1   Global Step: 65320   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:34,846-Speed 2627.59 samples/sec   Loss 13.2717   LearningRate 0.0849   Epoch: 1   Global Step: 65330   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:38,742-Speed 2629.20 samples/sec   Loss 13.2312   LearningRate 0.0849   Epoch: 1   Global Step: 65340   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:42,639-Speed 2628.18 samples/sec   Loss 13.4032   LearningRate 0.0849   Epoch: 1   Global Step: 65350   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:46,565-Speed 2609.00 samples/sec   Loss 13.2408   LearningRate 0.0849   Epoch: 1   Global Step: 65360   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:50,477-Speed 2618.27 samples/sec   Loss 13.1927   LearningRate 0.0849   Epoch: 1   Global Step: 65370   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:03:54,365-Speed 2634.51 samples/sec   Loss 13.0360   LearningRate 0.0849   Epoch: 1   Global Step: 65380   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:03:58,257-Speed 2631.38 samples/sec   Loss 13.2781   LearningRate 0.0849   Epoch: 1   Global Step: 65390   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:04:02,146-Speed 2633.79 samples/sec   Loss 13.2476   LearningRate 0.0849   Epoch: 1   Global Step: 65400   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:04:06,026-Speed 2640.02 samples/sec   Loss 13.1613   LearningRate 0.0849   Epoch: 1   Global Step: 65410   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:04:09,926-Speed 2626.55 samples/sec   Loss 13.1496   LearningRate 0.0849   Epoch: 1   Global Step: 65420   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:04:13,822-Speed 2628.66 samples/sec   Loss 13.1243   LearningRate 0.0848   Epoch: 1   Global Step: 65430   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:04:17,729-Speed 2621.53 samples/sec   Loss 13.1820   LearningRate 0.0848   Epoch: 1   Global Step: 65440   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:04:21,622-Speed 2631.44 samples/sec   Loss 13.0934   LearningRate 0.0848   Epoch: 1   Global Step: 65450   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:04:25,513-Speed 2631.92 samples/sec   Loss 13.3097   LearningRate 0.0848   Epoch: 1   Global Step: 65460   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:04:29,384-Speed 2646.54 samples/sec   Loss 13.2349   LearningRate 0.0848   Epoch: 1   Global Step: 65470   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:04:33,284-Speed 2625.50 samples/sec   Loss 13.2358   LearningRate 0.0848   Epoch: 1   Global Step: 65480   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:04:37,174-Speed 2633.14 samples/sec   Loss 13.3214   LearningRate 0.0848   Epoch: 1   Global Step: 65490   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:04:41,066-Speed 2632.02 samples/sec   Loss 13.2518   LearningRate 0.0848   Epoch: 1   Global Step: 65500   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:04:44,960-Speed 2630.49 samples/sec   Loss 13.3331   LearningRate 0.0848   Epoch: 1   Global Step: 65510   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:04:48,860-Speed 2626.39 samples/sec   Loss 13.4197   LearningRate 0.0848   Epoch: 1   Global Step: 65520   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:04:52,750-Speed 2633.02 samples/sec   Loss 13.3722   LearningRate 0.0848   Epoch: 1   Global Step: 65530   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:04:56,663-Speed 2617.42 samples/sec   Loss 13.2497   LearningRate 0.0848   Epoch: 1   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:00,560-Speed 2629.14 samples/sec   Loss 13.3285   LearningRate 0.0848   Epoch: 1   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:04,480-Speed 2612.31 samples/sec   Loss 13.2202   LearningRate 0.0848   Epoch: 1   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:08,380-Speed 2625.89 samples/sec   Loss 13.0786   LearningRate 0.0848   Epoch: 1   Global Step: 65570   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:05:12,275-Speed 2629.82 samples/sec   Loss 13.1755   LearningRate 0.0848   Epoch: 1   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:16,174-Speed 2627.14 samples/sec   Loss 13.2810   LearningRate 0.0848   Epoch: 1   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:20,065-Speed 2632.70 samples/sec   Loss 13.2268   LearningRate 0.0848   Epoch: 1   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:23,973-Speed 2620.90 samples/sec   Loss 13.1894   LearningRate 0.0848   Epoch: 1   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:27,864-Speed 2632.86 samples/sec   Loss 13.2611   LearningRate 0.0848   Epoch: 1   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:31,751-Speed 2635.24 samples/sec   Loss 13.1354   LearningRate 0.0848   Epoch: 1   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:35,641-Speed 2633.22 samples/sec   Loss 13.4953   LearningRate 0.0848   Epoch: 1   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:39,533-Speed 2631.50 samples/sec   Loss 13.0618   LearningRate 0.0848   Epoch: 1   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:43,425-Speed 2631.20 samples/sec   Loss 13.3888   LearningRate 0.0848   Epoch: 1   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:47,313-Speed 2634.70 samples/sec   Loss 13.3300   LearningRate 0.0848   Epoch: 1   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:05:51,216-Speed 2623.98 samples/sec   Loss 13.1951   LearningRate 0.0848   Epoch: 1   Global Step: 65680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:05:55,110-Speed 2630.73 samples/sec   Loss 13.1743   LearningRate 0.0848   Epoch: 1   Global Step: 65690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:05:59,005-Speed 2629.92 samples/sec   Loss 13.2145   LearningRate 0.0848   Epoch: 1   Global Step: 65700   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:06:02,905-Speed 2626.38 samples/sec   Loss 13.0858   LearningRate 0.0848   Epoch: 1   Global Step: 65710   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:06:06,799-Speed 2630.85 samples/sec   Loss 13.1818   LearningRate 0.0848   Epoch: 1   Global Step: 65720   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:06:10,704-Speed 2622.86 samples/sec   Loss 13.3455   LearningRate 0.0848   Epoch: 1   Global Step: 65730   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:06:14,595-Speed 2632.23 samples/sec   Loss 13.2073   LearningRate 0.0848   Epoch: 1   Global Step: 65740   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:06:18,492-Speed 2628.61 samples/sec   Loss 13.1120   LearningRate 0.0848   Epoch: 1   Global Step: 65750   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:06:22,397-Speed 2623.39 samples/sec   Loss 13.4083   LearningRate 0.0848   Epoch: 1   Global Step: 65760   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:06:26,288-Speed 2631.89 samples/sec   Loss 13.3361   LearningRate 0.0848   Epoch: 1   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:30,199-Speed 2619.06 samples/sec   Loss 13.3089   LearningRate 0.0848   Epoch: 1   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:34,094-Speed 2629.85 samples/sec   Loss 13.0672   LearningRate 0.0848   Epoch: 1   Global Step: 65790   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:37,996-Speed 2625.46 samples/sec   Loss 13.1924   LearningRate 0.0848   Epoch: 1   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:41,893-Speed 2628.31 samples/sec   Loss 13.1029   LearningRate 0.0848   Epoch: 1   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:45,784-Speed 2631.99 samples/sec   Loss 13.3658   LearningRate 0.0848   Epoch: 1   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:49,683-Speed 2626.95 samples/sec   Loss 13.1760   LearningRate 0.0848   Epoch: 1   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:53,579-Speed 2629.13 samples/sec   Loss 13.3934   LearningRate 0.0848   Epoch: 1   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:06:57,476-Speed 2628.08 samples/sec   Loss 13.1680   LearningRate 0.0848   Epoch: 1   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:07:01,374-Speed 2627.39 samples/sec   Loss 13.1587   LearningRate 0.0848   Epoch: 1   Global Step: 65860   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:07:05,264-Speed 2633.68 samples/sec   Loss 13.0995   LearningRate 0.0848   Epoch: 1   Global Step: 65870   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:09,154-Speed 2632.51 samples/sec   Loss 13.1336   LearningRate 0.0847   Epoch: 1   Global Step: 65880   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:13,051-Speed 2629.01 samples/sec   Loss 13.1072   LearningRate 0.0847   Epoch: 1   Global Step: 65890   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:16,945-Speed 2631.14 samples/sec   Loss 13.3096   LearningRate 0.0847   Epoch: 1   Global Step: 65900   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:20,837-Speed 2631.44 samples/sec   Loss 13.2200   LearningRate 0.0847   Epoch: 1   Global Step: 65910   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:24,737-Speed 2626.33 samples/sec   Loss 13.1808   LearningRate 0.0847   Epoch: 1   Global Step: 65920   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:28,653-Speed 2615.13 samples/sec   Loss 13.3155   LearningRate 0.0847   Epoch: 1   Global Step: 65930   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:32,564-Speed 2619.02 samples/sec   Loss 13.2000   LearningRate 0.0847   Epoch: 1   Global Step: 65940   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:36,454-Speed 2633.18 samples/sec   Loss 13.1434   LearningRate 0.0847   Epoch: 1   Global Step: 65950   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:40,346-Speed 2631.77 samples/sec   Loss 13.2376   LearningRate 0.0847   Epoch: 1   Global Step: 65960   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:44,240-Speed 2630.36 samples/sec   Loss 13.1803   LearningRate 0.0847   Epoch: 1   Global Step: 65970   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:07:48,177-Speed 2601.72 samples/sec   Loss 13.0175   LearningRate 0.0847   Epoch: 1   Global Step: 65980   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:07:52,177-Speed 2560.43 samples/sec   Loss 13.1538   LearningRate 0.0847   Epoch: 1   Global Step: 65990   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:07:56,057-Speed 2639.61 samples/sec   Loss 13.2638   LearningRate 0.0847   Epoch: 1   Global Step: 66000   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:07:59,953-Speed 2628.85 samples/sec   Loss 13.0831   LearningRate 0.0847   Epoch: 1   Global Step: 66010   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:03,847-Speed 2630.16 samples/sec   Loss 13.1211   LearningRate 0.0847   Epoch: 1   Global Step: 66020   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:07,744-Speed 2628.71 samples/sec   Loss 13.0967   LearningRate 0.0847   Epoch: 1   Global Step: 66030   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:11,640-Speed 2629.07 samples/sec   Loss 13.1995   LearningRate 0.0847   Epoch: 1   Global Step: 66040   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:15,534-Speed 2630.54 samples/sec   Loss 13.0462   LearningRate 0.0847   Epoch: 1   Global Step: 66050   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:19,440-Speed 2622.48 samples/sec   Loss 12.9790   LearningRate 0.0847   Epoch: 1   Global Step: 66060   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:23,341-Speed 2624.94 samples/sec   Loss 13.1543   LearningRate 0.0847   Epoch: 1   Global Step: 66070   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:27,234-Speed 2631.20 samples/sec   Loss 13.1697   LearningRate 0.0847   Epoch: 1   Global Step: 66080   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:31,143-Speed 2620.28 samples/sec   Loss 13.0959   LearningRate 0.0847   Epoch: 1   Global Step: 66090   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:35,029-Speed 2635.69 samples/sec   Loss 13.0844   LearningRate 0.0847   Epoch: 1   Global Step: 66100   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:08:38,928-Speed 2626.31 samples/sec   Loss 13.1158   LearningRate 0.0847   Epoch: 1   Global Step: 66110   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:08:42,830-Speed 2625.53 samples/sec   Loss 13.1025   LearningRate 0.0847   Epoch: 1   Global Step: 66120   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:08:46,726-Speed 2629.04 samples/sec   Loss 13.1047   LearningRate 0.0847   Epoch: 1   Global Step: 66130   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:08:50,624-Speed 2627.50 samples/sec   Loss 13.1765   LearningRate 0.0847   Epoch: 1   Global Step: 66140   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:54,519-Speed 2630.00 samples/sec   Loss 13.1062   LearningRate 0.0847   Epoch: 1   Global Step: 66150   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:08:58,412-Speed 2631.12 samples/sec   Loss 13.0686   LearningRate 0.0847   Epoch: 1   Global Step: 66160   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:02,306-Speed 2630.17 samples/sec   Loss 13.0564   LearningRate 0.0847   Epoch: 1   Global Step: 66170   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:06,204-Speed 2626.85 samples/sec   Loss 13.2017   LearningRate 0.0847   Epoch: 1   Global Step: 66180   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:10,095-Speed 2632.38 samples/sec   Loss 13.2267   LearningRate 0.0847   Epoch: 1   Global Step: 66190   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:13,991-Speed 2629.48 samples/sec   Loss 13.3136   LearningRate 0.0847   Epoch: 1   Global Step: 66200   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:17,901-Speed 2619.91 samples/sec   Loss 13.1573   LearningRate 0.0847   Epoch: 1   Global Step: 66210   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:21,795-Speed 2630.37 samples/sec   Loss 13.0054   LearningRate 0.0847   Epoch: 1   Global Step: 66220   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:25,693-Speed 2627.75 samples/sec   Loss 13.1942   LearningRate 0.0847   Epoch: 1   Global Step: 66230   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:29,605-Speed 2618.20 samples/sec   Loss 13.1211   LearningRate 0.0847   Epoch: 1   Global Step: 66240   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:09:33,490-Speed 2636.13 samples/sec   Loss 13.2019   LearningRate 0.0847   Epoch: 1   Global Step: 66250   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:37,380-Speed 2633.13 samples/sec   Loss 13.0895   LearningRate 0.0847   Epoch: 1   Global Step: 66260   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:41,270-Speed 2633.00 samples/sec   Loss 13.1128   LearningRate 0.0847   Epoch: 1   Global Step: 66270   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:45,189-Speed 2613.39 samples/sec   Loss 13.3736   LearningRate 0.0847   Epoch: 1   Global Step: 66280   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:49,106-Speed 2615.05 samples/sec   Loss 13.2475   LearningRate 0.0847   Epoch: 1   Global Step: 66290   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:53,110-Speed 2557.89 samples/sec   Loss 13.1949   LearningRate 0.0847   Epoch: 1   Global Step: 66300   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:09:57,099-Speed 2567.79 samples/sec   Loss 13.2226   LearningRate 0.0847   Epoch: 1   Global Step: 66310   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:01,222-Speed 2484.26 samples/sec   Loss 13.1080   LearningRate 0.0847   Epoch: 1   Global Step: 66320   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:05,140-Speed 2614.48 samples/sec   Loss 13.1755   LearningRate 0.0846   Epoch: 1   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:09,033-Speed 2630.92 samples/sec   Loss 13.2472   LearningRate 0.0846   Epoch: 1   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:12,935-Speed 2624.86 samples/sec   Loss 13.1614   LearningRate 0.0846   Epoch: 1   Global Step: 66350   Fp16 Grad Scale: 262144   Required: 86 hours
Training: 2022-04-13 03:10:16,816-Speed 2638.45 samples/sec   Loss 13.2133   LearningRate 0.0846   Epoch: 1   Global Step: 66360   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:20,718-Speed 2624.94 samples/sec   Loss 12.9383   LearningRate 0.0846   Epoch: 1   Global Step: 66370   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:24,616-Speed 2627.51 samples/sec   Loss 13.1503   LearningRate 0.0846   Epoch: 1   Global Step: 66380   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:28,513-Speed 2628.85 samples/sec   Loss 13.3614   LearningRate 0.0846   Epoch: 1   Global Step: 66390   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:32,413-Speed 2626.48 samples/sec   Loss 13.2911   LearningRate 0.0846   Epoch: 1   Global Step: 66400   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:36,314-Speed 2625.01 samples/sec   Loss 13.1195   LearningRate 0.0846   Epoch: 1   Global Step: 66410   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:40,208-Speed 2630.08 samples/sec   Loss 13.1293   LearningRate 0.0846   Epoch: 1   Global Step: 66420   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:10:44,093-Speed 2636.41 samples/sec   Loss 13.2052   LearningRate 0.0846   Epoch: 1   Global Step: 66430   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:10:47,987-Speed 2630.52 samples/sec   Loss 13.1790   LearningRate 0.0846   Epoch: 1   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:10:51,887-Speed 2626.40 samples/sec   Loss 13.1746   LearningRate 0.0846   Epoch: 1   Global Step: 66450   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:10:55,879-Speed 2565.77 samples/sec   Loss 13.0447   LearningRate 0.0846   Epoch: 1   Global Step: 66460   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:10:59,782-Speed 2624.21 samples/sec   Loss 13.1974   LearningRate 0.0846   Epoch: 1   Global Step: 66470   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:03,690-Speed 2621.65 samples/sec   Loss 13.1820   LearningRate 0.0846   Epoch: 1   Global Step: 66480   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:07,584-Speed 2629.67 samples/sec   Loss 13.2310   LearningRate 0.0846   Epoch: 1   Global Step: 66490   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:11,482-Speed 2627.79 samples/sec   Loss 13.1913   LearningRate 0.0846   Epoch: 1   Global Step: 66500   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:15,402-Speed 2612.26 samples/sec   Loss 13.1184   LearningRate 0.0846   Epoch: 1   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:19,297-Speed 2629.41 samples/sec   Loss 13.0054   LearningRate 0.0846   Epoch: 1   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:23,260-Speed 2585.47 samples/sec   Loss 13.1773   LearningRate 0.0846   Epoch: 1   Global Step: 66530   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:11:27,177-Speed 2614.69 samples/sec   Loss 13.3053   LearningRate 0.0846   Epoch: 1   Global Step: 66540   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:11:31,074-Speed 2628.29 samples/sec   Loss 13.1476   LearningRate 0.0846   Epoch: 1   Global Step: 66550   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:11:35,036-Speed 2585.20 samples/sec   Loss 13.2276   LearningRate 0.0846   Epoch: 1   Global Step: 66560   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:11:38,939-Speed 2624.57 samples/sec   Loss 13.1176   LearningRate 0.0846   Epoch: 1   Global Step: 66570   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:11:42,818-Speed 2640.50 samples/sec   Loss 13.0472   LearningRate 0.0846   Epoch: 1   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:46,713-Speed 2629.38 samples/sec   Loss 13.0986   LearningRate 0.0846   Epoch: 1   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:50,632-Speed 2613.77 samples/sec   Loss 13.1401   LearningRate 0.0846   Epoch: 1   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:54,537-Speed 2622.79 samples/sec   Loss 13.1029   LearningRate 0.0846   Epoch: 1   Global Step: 66610   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:11:58,431-Speed 2630.89 samples/sec   Loss 13.0777   LearningRate 0.0846   Epoch: 1   Global Step: 66620   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:12:02,342-Speed 2618.65 samples/sec   Loss 13.1769   LearningRate 0.0846   Epoch: 1   Global Step: 66630   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:12:06,244-Speed 2624.63 samples/sec   Loss 13.1071   LearningRate 0.0846   Epoch: 1   Global Step: 66640   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:12:10,159-Speed 2616.32 samples/sec   Loss 13.2194   LearningRate 0.0846   Epoch: 1   Global Step: 66650   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:12:14,062-Speed 2623.80 samples/sec   Loss 13.1530   LearningRate 0.0846   Epoch: 1   Global Step: 66660   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:12:17,968-Speed 2630.50 samples/sec   Loss 13.2080   LearningRate 0.0846   Epoch: 1   Global Step: 66670   Fp16 Grad Scale: 65536   Required: 86 hours
Training: 2022-04-13 03:12:21,873-Speed 2622.98 samples/sec   Loss 13.0534   LearningRate 0.0846   Epoch: 1   Global Step: 66680   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:25,778-Speed 2622.56 samples/sec   Loss 13.1035   LearningRate 0.0846   Epoch: 1   Global Step: 66690   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:29,680-Speed 2625.38 samples/sec   Loss 13.1057   LearningRate 0.0846   Epoch: 1   Global Step: 66700   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:33,581-Speed 2625.80 samples/sec   Loss 13.1757   LearningRate 0.0846   Epoch: 1   Global Step: 66710   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:37,484-Speed 2624.30 samples/sec   Loss 13.1324   LearningRate 0.0846   Epoch: 1   Global Step: 66720   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:41,387-Speed 2623.85 samples/sec   Loss 13.3083   LearningRate 0.0846   Epoch: 1   Global Step: 66730   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:45,285-Speed 2628.10 samples/sec   Loss 13.2944   LearningRate 0.0846   Epoch: 1   Global Step: 66740   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:49,186-Speed 2625.32 samples/sec   Loss 13.2566   LearningRate 0.0846   Epoch: 1   Global Step: 66750   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:53,085-Speed 2627.02 samples/sec   Loss 13.2630   LearningRate 0.0846   Epoch: 1   Global Step: 66760   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:12:56,984-Speed 2626.96 samples/sec   Loss 13.0976   LearningRate 0.0846   Epoch: 1   Global Step: 66770   Fp16 Grad Scale: 131072   Required: 86 hours
Training: 2022-04-13 03:13:00,865-Speed 2639.04 samples/sec   Loss 13.0757   LearningRate 0.0845   Epoch: 1   Global Step: 66780   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:04,764-Speed 2626.91 samples/sec   Loss 13.2636   LearningRate 0.0845   Epoch: 1   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:08,664-Speed 2626.68 samples/sec   Loss 13.3068   LearningRate 0.0845   Epoch: 1   Global Step: 66800   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:12,562-Speed 2627.68 samples/sec   Loss 13.2350   LearningRate 0.0845   Epoch: 1   Global Step: 66810   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:16,464-Speed 2625.50 samples/sec   Loss 13.1200   LearningRate 0.0845   Epoch: 1   Global Step: 66820   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:20,362-Speed 2627.44 samples/sec   Loss 13.2493   LearningRate 0.0845   Epoch: 1   Global Step: 66830   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:24,284-Speed 2611.62 samples/sec   Loss 13.1430   LearningRate 0.0845   Epoch: 1   Global Step: 66840   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:28,183-Speed 2627.46 samples/sec   Loss 13.2126   LearningRate 0.0845   Epoch: 1   Global Step: 66850   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:32,081-Speed 2627.67 samples/sec   Loss 13.2522   LearningRate 0.0845   Epoch: 1   Global Step: 66860   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:35,984-Speed 2623.90 samples/sec   Loss 13.2498   LearningRate 0.0845   Epoch: 1   Global Step: 66870   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:39,861-Speed 2641.81 samples/sec   Loss 13.0770   LearningRate 0.0845   Epoch: 1   Global Step: 66880   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:43,756-Speed 2629.89 samples/sec   Loss 13.2297   LearningRate 0.0845   Epoch: 1   Global Step: 66890   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:47,654-Speed 2628.07 samples/sec   Loss 13.2150   LearningRate 0.0845   Epoch: 1   Global Step: 66900   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:51,559-Speed 2622.93 samples/sec   Loss 13.2312   LearningRate 0.0845   Epoch: 1   Global Step: 66910   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:55,478-Speed 2613.44 samples/sec   Loss 13.2701   LearningRate 0.0845   Epoch: 1   Global Step: 66920   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:13:59,375-Speed 2628.60 samples/sec   Loss 13.2239   LearningRate 0.0845   Epoch: 1   Global Step: 66930   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:03,274-Speed 2626.70 samples/sec   Loss 13.1048   LearningRate 0.0845   Epoch: 1   Global Step: 66940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:07,179-Speed 2622.70 samples/sec   Loss 13.2573   LearningRate 0.0845   Epoch: 1   Global Step: 66950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:11,078-Speed 2627.27 samples/sec   Loss 13.1020   LearningRate 0.0845   Epoch: 1   Global Step: 66960   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:14,975-Speed 2628.53 samples/sec   Loss 13.1586   LearningRate 0.0845   Epoch: 1   Global Step: 66970   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:18,917-Speed 2598.59 samples/sec   Loss 13.0615   LearningRate 0.0845   Epoch: 1   Global Step: 66980   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:14:22,789-Speed 2645.53 samples/sec   Loss 13.2300   LearningRate 0.0845   Epoch: 1   Global Step: 66990   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:26,684-Speed 2629.76 samples/sec   Loss 13.3256   LearningRate 0.0845   Epoch: 1   Global Step: 67000   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:30,584-Speed 2626.17 samples/sec   Loss 13.3267   LearningRate 0.0845   Epoch: 1   Global Step: 67010   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:34,504-Speed 2612.50 samples/sec   Loss 13.2256   LearningRate 0.0845   Epoch: 1   Global Step: 67020   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:38,445-Speed 2598.80 samples/sec   Loss 13.2631   LearningRate 0.0845   Epoch: 1   Global Step: 67030   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:14:42,347-Speed 2625.17 samples/sec   Loss 13.1660   LearningRate 0.0845   Epoch: 1   Global Step: 67040   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:14:46,248-Speed 2625.78 samples/sec   Loss 13.1768   LearningRate 0.0845   Epoch: 1   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:14:50,155-Speed 2621.76 samples/sec   Loss 13.2562   LearningRate 0.0845   Epoch: 1   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:14:54,058-Speed 2623.81 samples/sec   Loss 12.9899   LearningRate 0.0845   Epoch: 1   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:14:57,965-Speed 2621.56 samples/sec   Loss 13.0506   LearningRate 0.0845   Epoch: 1   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:15:01,871-Speed 2622.41 samples/sec   Loss 13.1423   LearningRate 0.0845   Epoch: 1   Global Step: 67090   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:15:05,771-Speed 2626.29 samples/sec   Loss 13.1482   LearningRate 0.0845   Epoch: 1   Global Step: 67100   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:15:09,676-Speed 2622.28 samples/sec   Loss 13.2009   LearningRate 0.0845   Epoch: 1   Global Step: 67110   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:15:13,574-Speed 2628.31 samples/sec   Loss 13.2907   LearningRate 0.0845   Epoch: 1   Global Step: 67120   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:15:17,474-Speed 2625.92 samples/sec   Loss 13.2296   LearningRate 0.0845   Epoch: 1   Global Step: 67130   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:15:21,369-Speed 2630.50 samples/sec   Loss 13.1927   LearningRate 0.0845   Epoch: 1   Global Step: 67140   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:25,267-Speed 2627.11 samples/sec   Loss 13.1098   LearningRate 0.0845   Epoch: 1   Global Step: 67150   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:29,163-Speed 2628.90 samples/sec   Loss 13.2518   LearningRate 0.0845   Epoch: 1   Global Step: 67160   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:33,061-Speed 2627.81 samples/sec   Loss 13.2367   LearningRate 0.0845   Epoch: 1   Global Step: 67170   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:36,960-Speed 2626.71 samples/sec   Loss 13.2514   LearningRate 0.0845   Epoch: 1   Global Step: 67180   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:40,883-Speed 2610.59 samples/sec   Loss 13.0858   LearningRate 0.0845   Epoch: 1   Global Step: 67190   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:44,769-Speed 2635.49 samples/sec   Loss 13.2226   LearningRate 0.0845   Epoch: 1   Global Step: 67200   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:48,673-Speed 2623.99 samples/sec   Loss 13.0898   LearningRate 0.0845   Epoch: 1   Global Step: 67210   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:52,582-Speed 2620.28 samples/sec   Loss 13.0349   LearningRate 0.0845   Epoch: 1   Global Step: 67220   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:15:56,477-Speed 2630.15 samples/sec   Loss 13.1290   LearningRate 0.0844   Epoch: 1   Global Step: 67230   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:00,373-Speed 2628.48 samples/sec   Loss 13.2341   LearningRate 0.0844   Epoch: 1   Global Step: 67240   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:16:04,274-Speed 2625.80 samples/sec   Loss 13.1589   LearningRate 0.0844   Epoch: 1   Global Step: 67250   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:16:08,174-Speed 2625.74 samples/sec   Loss 13.1740   LearningRate 0.0844   Epoch: 1   Global Step: 67260   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:16:12,062-Speed 2634.82 samples/sec   Loss 13.1363   LearningRate 0.0844   Epoch: 1   Global Step: 67270   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:15,976-Speed 2616.38 samples/sec   Loss 13.1764   LearningRate 0.0844   Epoch: 1   Global Step: 67280   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:19,879-Speed 2624.09 samples/sec   Loss 13.1522   LearningRate 0.0844   Epoch: 1   Global Step: 67290   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:23,782-Speed 2625.40 samples/sec   Loss 13.1491   LearningRate 0.0844   Epoch: 1   Global Step: 67300   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:27,694-Speed 2617.71 samples/sec   Loss 13.1132   LearningRate 0.0844   Epoch: 1   Global Step: 67310   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:31,606-Speed 2618.95 samples/sec   Loss 13.1188   LearningRate 0.0844   Epoch: 1   Global Step: 67320   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:35,509-Speed 2624.18 samples/sec   Loss 13.2671   LearningRate 0.0844   Epoch: 1   Global Step: 67330   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:39,407-Speed 2627.60 samples/sec   Loss 13.3159   LearningRate 0.0844   Epoch: 1   Global Step: 67340   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:43,309-Speed 2624.87 samples/sec   Loss 13.2172   LearningRate 0.0844   Epoch: 1   Global Step: 67350   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:47,219-Speed 2619.36 samples/sec   Loss 13.1095   LearningRate 0.0844   Epoch: 1   Global Step: 67360   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:16:51,075-Speed 2656.19 samples/sec   Loss 13.1994   LearningRate 0.0844   Epoch: 1   Global Step: 67370   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:16:54,975-Speed 2627.23 samples/sec   Loss 13.3090   LearningRate 0.0844   Epoch: 1   Global Step: 67380   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:16:58,876-Speed 2625.45 samples/sec   Loss 12.9944   LearningRate 0.0844   Epoch: 1   Global Step: 67390   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:02,782-Speed 2622.01 samples/sec   Loss 13.1100   LearningRate 0.0844   Epoch: 1   Global Step: 67400   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:06,681-Speed 2627.01 samples/sec   Loss 13.0688   LearningRate 0.0844   Epoch: 1   Global Step: 67410   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:10,577-Speed 2628.68 samples/sec   Loss 13.1818   LearningRate 0.0844   Epoch: 1   Global Step: 67420   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:14,484-Speed 2621.31 samples/sec   Loss 13.1839   LearningRate 0.0844   Epoch: 1   Global Step: 67430   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:18,398-Speed 2617.46 samples/sec   Loss 13.1162   LearningRate 0.0844   Epoch: 1   Global Step: 67440   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:22,312-Speed 2616.95 samples/sec   Loss 13.2472   LearningRate 0.0844   Epoch: 1   Global Step: 67450   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:26,230-Speed 2613.93 samples/sec   Loss 13.0948   LearningRate 0.0844   Epoch: 1   Global Step: 67460   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:17:30,151-Speed 2612.76 samples/sec   Loss 13.2391   LearningRate 0.0844   Epoch: 1   Global Step: 67470   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:17:34,052-Speed 2625.78 samples/sec   Loss 13.1826   LearningRate 0.0844   Epoch: 1   Global Step: 67480   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:17:37,949-Speed 2627.81 samples/sec   Loss 13.2824   LearningRate 0.0844   Epoch: 1   Global Step: 67490   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:17:41,851-Speed 2624.80 samples/sec   Loss 13.0881   LearningRate 0.0844   Epoch: 1   Global Step: 67500   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:17:45,756-Speed 2622.80 samples/sec   Loss 13.2354   LearningRate 0.0844   Epoch: 1   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:17:49,655-Speed 2627.13 samples/sec   Loss 13.1957   LearningRate 0.0844   Epoch: 1   Global Step: 67520   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:17:53,558-Speed 2625.36 samples/sec   Loss 13.0965   LearningRate 0.0844   Epoch: 1   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:17:57,458-Speed 2625.59 samples/sec   Loss 13.2561   LearningRate 0.0844   Epoch: 1   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:18:01,372-Speed 2617.60 samples/sec   Loss 13.2547   LearningRate 0.0844   Epoch: 1   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:18:05,267-Speed 2629.43 samples/sec   Loss 13.3247   LearningRate 0.0844   Epoch: 1   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:18:09,166-Speed 2627.00 samples/sec   Loss 13.2157   LearningRate 0.0844   Epoch: 1   Global Step: 67570   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:13,063-Speed 2628.29 samples/sec   Loss 13.2210   LearningRate 0.0844   Epoch: 1   Global Step: 67580   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:16,959-Speed 2628.86 samples/sec   Loss 13.2148   LearningRate 0.0844   Epoch: 1   Global Step: 67590   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:20,859-Speed 2626.40 samples/sec   Loss 13.0497   LearningRate 0.0844   Epoch: 1   Global Step: 67600   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:24,764-Speed 2622.75 samples/sec   Loss 13.2458   LearningRate 0.0844   Epoch: 1   Global Step: 67610   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:28,661-Speed 2628.54 samples/sec   Loss 13.1620   LearningRate 0.0844   Epoch: 1   Global Step: 67620   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:32,560-Speed 2627.33 samples/sec   Loss 13.0330   LearningRate 0.0844   Epoch: 1   Global Step: 67630   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:36,460-Speed 2626.16 samples/sec   Loss 13.1491   LearningRate 0.0844   Epoch: 1   Global Step: 67640   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:40,365-Speed 2622.24 samples/sec   Loss 13.1678   LearningRate 0.0844   Epoch: 1   Global Step: 67650   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:44,264-Speed 2627.03 samples/sec   Loss 13.2005   LearningRate 0.0844   Epoch: 1   Global Step: 67660   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:18:48,220-Speed 2589.16 samples/sec   Loss 13.1003   LearningRate 0.0844   Epoch: 1   Global Step: 67670   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:18:52,117-Speed 2628.37 samples/sec   Loss 13.1051   LearningRate 0.0843   Epoch: 1   Global Step: 67680   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:18:56,016-Speed 2627.05 samples/sec   Loss 13.0784   LearningRate 0.0843   Epoch: 1   Global Step: 67690   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:18:59,914-Speed 2627.61 samples/sec   Loss 13.1140   LearningRate 0.0843   Epoch: 1   Global Step: 67700   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:19:03,815-Speed 2625.29 samples/sec   Loss 13.1237   LearningRate 0.0843   Epoch: 1   Global Step: 67710   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:19:07,706-Speed 2632.44 samples/sec   Loss 13.2440   LearningRate 0.0843   Epoch: 1   Global Step: 67720   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:11,613-Speed 2621.37 samples/sec   Loss 13.0762   LearningRate 0.0843   Epoch: 1   Global Step: 67730   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:15,518-Speed 2622.96 samples/sec   Loss 13.2404   LearningRate 0.0843   Epoch: 1   Global Step: 67740   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:19,426-Speed 2621.22 samples/sec   Loss 13.1261   LearningRate 0.0843   Epoch: 1   Global Step: 67750   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:23,333-Speed 2620.97 samples/sec   Loss 13.0577   LearningRate 0.0843   Epoch: 1   Global Step: 67760   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:27,244-Speed 2619.13 samples/sec   Loss 13.1565   LearningRate 0.0843   Epoch: 1   Global Step: 67770   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:31,157-Speed 2617.19 samples/sec   Loss 13.1190   LearningRate 0.0843   Epoch: 1   Global Step: 67780   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:35,063-Speed 2622.06 samples/sec   Loss 13.1709   LearningRate 0.0843   Epoch: 1   Global Step: 67790   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:38,969-Speed 2622.41 samples/sec   Loss 13.2268   LearningRate 0.0843   Epoch: 1   Global Step: 67800   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:42,872-Speed 2624.89 samples/sec   Loss 13.1004   LearningRate 0.0843   Epoch: 1   Global Step: 67810   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:46,781-Speed 2620.07 samples/sec   Loss 13.0839   LearningRate 0.0843   Epoch: 1   Global Step: 67820   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:19:50,726-Speed 2595.78 samples/sec   Loss 13.0654   LearningRate 0.0843   Epoch: 1   Global Step: 67830   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:19:54,611-Speed 2636.21 samples/sec   Loss 13.2686   LearningRate 0.0843   Epoch: 1   Global Step: 67840   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:19:58,517-Speed 2622.54 samples/sec   Loss 13.1470   LearningRate 0.0843   Epoch: 1   Global Step: 67850   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:02,414-Speed 2628.33 samples/sec   Loss 13.1090   LearningRate 0.0843   Epoch: 1   Global Step: 67860   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:06,312-Speed 2627.66 samples/sec   Loss 13.1766   LearningRate 0.0843   Epoch: 1   Global Step: 67870   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:10,212-Speed 2625.68 samples/sec   Loss 13.1550   LearningRate 0.0843   Epoch: 1   Global Step: 67880   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:14,109-Speed 2628.53 samples/sec   Loss 13.0809   LearningRate 0.0843   Epoch: 1   Global Step: 67890   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:18,018-Speed 2620.32 samples/sec   Loss 12.9716   LearningRate 0.0843   Epoch: 1   Global Step: 67900   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:21,919-Speed 2626.00 samples/sec   Loss 13.2628   LearningRate 0.0843   Epoch: 1   Global Step: 67910   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:25,829-Speed 2619.15 samples/sec   Loss 13.2789   LearningRate 0.0843   Epoch: 1   Global Step: 67920   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:29,734-Speed 2623.27 samples/sec   Loss 13.1466   LearningRate 0.0843   Epoch: 1   Global Step: 67930   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:33,619-Speed 2636.21 samples/sec   Loss 13.2963   LearningRate 0.0843   Epoch: 1   Global Step: 67940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:37,518-Speed 2626.48 samples/sec   Loss 13.1635   LearningRate 0.0843   Epoch: 1   Global Step: 67950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:20:41,402-Speed 2636.73 samples/sec   Loss 13.2954   LearningRate 0.0843   Epoch: 1   Global Step: 67960   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:20:45,298-Speed 2628.77 samples/sec   Loss 13.1718   LearningRate 0.0843   Epoch: 1   Global Step: 67970   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:20:49,218-Speed 2613.13 samples/sec   Loss 13.1301   LearningRate 0.0843   Epoch: 1   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:20:53,119-Speed 2626.05 samples/sec   Loss 13.2058   LearningRate 0.0843   Epoch: 1   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:20:57,020-Speed 2625.48 samples/sec   Loss 13.2753   LearningRate 0.0843   Epoch: 1   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:21:00,917-Speed 2628.41 samples/sec   Loss 13.2015   LearningRate 0.0843   Epoch: 1   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:21:04,812-Speed 2628.83 samples/sec   Loss 13.1133   LearningRate 0.0843   Epoch: 1   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:21:08,707-Speed 2629.74 samples/sec   Loss 13.0572   LearningRate 0.0843   Epoch: 1   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:21:12,607-Speed 2626.40 samples/sec   Loss 13.1965   LearningRate 0.0843   Epoch: 1   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:21:16,503-Speed 2628.52 samples/sec   Loss 13.2194   LearningRate 0.0843   Epoch: 1   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:21:20,403-Speed 2626.62 samples/sec   Loss 13.2277   LearningRate 0.0843   Epoch: 1   Global Step: 68060   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:24,298-Speed 2629.27 samples/sec   Loss 13.0651   LearningRate 0.0843   Epoch: 1   Global Step: 68070   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:28,197-Speed 2627.78 samples/sec   Loss 13.1251   LearningRate 0.0843   Epoch: 1   Global Step: 68080   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:32,094-Speed 2627.77 samples/sec   Loss 13.1110   LearningRate 0.0843   Epoch: 1   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:35,988-Speed 2630.00 samples/sec   Loss 13.0518   LearningRate 0.0843   Epoch: 1   Global Step: 68100   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:39,886-Speed 2627.99 samples/sec   Loss 13.1603   LearningRate 0.0843   Epoch: 1   Global Step: 68110   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:43,792-Speed 2621.91 samples/sec   Loss 13.2489   LearningRate 0.0843   Epoch: 1   Global Step: 68120   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:47,780-Speed 2568.22 samples/sec   Loss 13.2781   LearningRate 0.0842   Epoch: 1   Global Step: 68130   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:51,697-Speed 2615.16 samples/sec   Loss 13.2469   LearningRate 0.0842   Epoch: 1   Global Step: 68140   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:55,596-Speed 2626.88 samples/sec   Loss 13.2267   LearningRate 0.0842   Epoch: 1   Global Step: 68150   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:21:59,500-Speed 2623.66 samples/sec   Loss 13.1503   LearningRate 0.0842   Epoch: 1   Global Step: 68160   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:03,398-Speed 2627.32 samples/sec   Loss 13.1789   LearningRate 0.0842   Epoch: 1   Global Step: 68170   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:07,301-Speed 2624.32 samples/sec   Loss 13.2234   LearningRate 0.0842   Epoch: 1   Global Step: 68180   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:11,203-Speed 2625.20 samples/sec   Loss 13.1464   LearningRate 0.0842   Epoch: 1   Global Step: 68190   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:15,096-Speed 2630.87 samples/sec   Loss 13.0952   LearningRate 0.0842   Epoch: 1   Global Step: 68200   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:19,004-Speed 2621.06 samples/sec   Loss 13.1935   LearningRate 0.0842   Epoch: 1   Global Step: 68210   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:22,899-Speed 2629.85 samples/sec   Loss 13.2705   LearningRate 0.0842   Epoch: 1   Global Step: 68220   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:26,802-Speed 2624.23 samples/sec   Loss 13.0348   LearningRate 0.0842   Epoch: 1   Global Step: 68230   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:30,703-Speed 2625.00 samples/sec   Loss 13.0893   LearningRate 0.0842   Epoch: 1   Global Step: 68240   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:34,611-Speed 2621.37 samples/sec   Loss 13.1987   LearningRate 0.0842   Epoch: 1   Global Step: 68250   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:38,491-Speed 2639.53 samples/sec   Loss 13.0775   LearningRate 0.0842   Epoch: 1   Global Step: 68260   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:42,390-Speed 2626.99 samples/sec   Loss 13.1950   LearningRate 0.0842   Epoch: 1   Global Step: 68270   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:46,288-Speed 2627.29 samples/sec   Loss 13.1731   LearningRate 0.0842   Epoch: 1   Global Step: 68280   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:50,194-Speed 2622.66 samples/sec   Loss 13.1447   LearningRate 0.0842   Epoch: 1   Global Step: 68290   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:54,100-Speed 2622.04 samples/sec   Loss 13.0945   LearningRate 0.0842   Epoch: 1   Global Step: 68300   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:22:58,014-Speed 2617.19 samples/sec   Loss 13.1383   LearningRate 0.0842   Epoch: 1   Global Step: 68310   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:23:01,978-Speed 2583.40 samples/sec   Loss 13.0784   LearningRate 0.0842   Epoch: 1   Global Step: 68320   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:23:05,879-Speed 2625.96 samples/sec   Loss 13.0261   LearningRate 0.0842   Epoch: 1   Global Step: 68330   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:23:09,777-Speed 2626.97 samples/sec   Loss 13.0815   LearningRate 0.0842   Epoch: 1   Global Step: 68340   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:23:13,677-Speed 2626.46 samples/sec   Loss 13.0264   LearningRate 0.0842   Epoch: 1   Global Step: 68350   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:23:17,562-Speed 2637.17 samples/sec   Loss 13.1374   LearningRate 0.0842   Epoch: 1   Global Step: 68360   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:21,460-Speed 2627.13 samples/sec   Loss 13.2232   LearningRate 0.0842   Epoch: 1   Global Step: 68370   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:25,360-Speed 2626.71 samples/sec   Loss 13.0529   LearningRate 0.0842   Epoch: 1   Global Step: 68380   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:29,257-Speed 2627.79 samples/sec   Loss 13.0623   LearningRate 0.0842   Epoch: 1   Global Step: 68390   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:33,154-Speed 2628.57 samples/sec   Loss 13.0619   LearningRate 0.0842   Epoch: 1   Global Step: 68400   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:37,050-Speed 2628.54 samples/sec   Loss 13.2266   LearningRate 0.0842   Epoch: 1   Global Step: 68410   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:40,972-Speed 2611.23 samples/sec   Loss 13.0784   LearningRate 0.0842   Epoch: 1   Global Step: 68420   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:44,869-Speed 2628.33 samples/sec   Loss 13.1806   LearningRate 0.0842   Epoch: 1   Global Step: 68430   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:48,923-Speed 2526.81 samples/sec   Loss 13.0629   LearningRate 0.0842   Epoch: 1   Global Step: 68440   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:52,827-Speed 2623.56 samples/sec   Loss 13.1804   LearningRate 0.0842   Epoch: 1   Global Step: 68450   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:23:56,726-Speed 2627.15 samples/sec   Loss 13.1544   LearningRate 0.0842   Epoch: 1   Global Step: 68460   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:00,631-Speed 2622.54 samples/sec   Loss 13.1399   LearningRate 0.0842   Epoch: 1   Global Step: 68470   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:04,529-Speed 2627.55 samples/sec   Loss 13.1466   LearningRate 0.0842   Epoch: 1   Global Step: 68480   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:08,428-Speed 2626.76 samples/sec   Loss 12.9848   LearningRate 0.0842   Epoch: 1   Global Step: 68490   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:12,327-Speed 2627.09 samples/sec   Loss 13.0588   LearningRate 0.0842   Epoch: 1   Global Step: 68500   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:16,226-Speed 2626.75 samples/sec   Loss 12.8754   LearningRate 0.0842   Epoch: 1   Global Step: 68510   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:20,145-Speed 2613.51 samples/sec   Loss 13.1025   LearningRate 0.0842   Epoch: 1   Global Step: 68520   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:24,052-Speed 2621.95 samples/sec   Loss 12.8723   LearningRate 0.0842   Epoch: 1   Global Step: 68530   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:27,958-Speed 2622.03 samples/sec   Loss 13.1687   LearningRate 0.0842   Epoch: 1   Global Step: 68540   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:24:31,848-Speed 2633.26 samples/sec   Loss 13.0393   LearningRate 0.0842   Epoch: 1   Global Step: 68550   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:24:35,747-Speed 2626.61 samples/sec   Loss 13.1494   LearningRate 0.0842   Epoch: 1   Global Step: 68560   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:24:39,644-Speed 2628.39 samples/sec   Loss 13.0633   LearningRate 0.0842   Epoch: 1   Global Step: 68570   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:24:43,541-Speed 2628.04 samples/sec   Loss 13.1134   LearningRate 0.0841   Epoch: 1   Global Step: 68580   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:24:47,444-Speed 2624.10 samples/sec   Loss 12.9771   LearningRate 0.0841   Epoch: 1   Global Step: 68590   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:24:51,348-Speed 2623.95 samples/sec   Loss 12.9951   LearningRate 0.0841   Epoch: 1   Global Step: 68600   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:24:55,250-Speed 2624.65 samples/sec   Loss 13.2086   LearningRate 0.0841   Epoch: 1   Global Step: 68610   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:24:59,309-Speed 2523.34 samples/sec   Loss 13.2181   LearningRate 0.0841   Epoch: 1   Global Step: 68620   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:03,275-Speed 2582.69 samples/sec   Loss 13.1526   LearningRate 0.0841   Epoch: 1   Global Step: 68630   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:07,177-Speed 2625.04 samples/sec   Loss 13.0539   LearningRate 0.0841   Epoch: 1   Global Step: 68640   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:11,070-Speed 2630.52 samples/sec   Loss 13.1106   LearningRate 0.0841   Epoch: 1   Global Step: 68650   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:25:14,965-Speed 2630.03 samples/sec   Loss 13.1352   LearningRate 0.0841   Epoch: 1   Global Step: 68660   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:25:18,863-Speed 2627.99 samples/sec   Loss 13.1936   LearningRate 0.0841   Epoch: 1   Global Step: 68670   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:25:22,787-Speed 2609.82 samples/sec   Loss 13.1308   LearningRate 0.0841   Epoch: 1   Global Step: 68680   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:25:26,665-Speed 2641.15 samples/sec   Loss 13.2287   LearningRate 0.0841   Epoch: 1   Global Step: 68690   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:30,562-Speed 2628.06 samples/sec   Loss 12.9638   LearningRate 0.0841   Epoch: 1   Global Step: 68700   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:34,456-Speed 2630.38 samples/sec   Loss 13.0639   LearningRate 0.0841   Epoch: 1   Global Step: 68710   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:38,357-Speed 2625.38 samples/sec   Loss 13.1829   LearningRate 0.0841   Epoch: 1   Global Step: 68720   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:42,259-Speed 2625.09 samples/sec   Loss 13.0422   LearningRate 0.0841   Epoch: 1   Global Step: 68730   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:46,152-Speed 2631.01 samples/sec   Loss 13.1191   LearningRate 0.0841   Epoch: 1   Global Step: 68740   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:50,057-Speed 2623.32 samples/sec   Loss 13.1568   LearningRate 0.0841   Epoch: 1   Global Step: 68750   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:53,962-Speed 2622.40 samples/sec   Loss 13.1525   LearningRate 0.0841   Epoch: 1   Global Step: 68760   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:25:57,858-Speed 2628.92 samples/sec   Loss 13.0552   LearningRate 0.0841   Epoch: 1   Global Step: 68770   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:01,753-Speed 2629.94 samples/sec   Loss 12.9889   LearningRate 0.0841   Epoch: 1   Global Step: 68780   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:05,645-Speed 2631.00 samples/sec   Loss 13.2197   LearningRate 0.0841   Epoch: 1   Global Step: 68790   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:26:09,542-Speed 2628.35 samples/sec   Loss 13.0801   LearningRate 0.0841   Epoch: 1   Global Step: 68800   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:26:13,442-Speed 2626.69 samples/sec   Loss 13.1503   LearningRate 0.0841   Epoch: 1   Global Step: 68810   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:26:17,419-Speed 2575.03 samples/sec   Loss 13.1796   LearningRate 0.0841   Epoch: 1   Global Step: 68820   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:26:21,330-Speed 2619.42 samples/sec   Loss 13.2273   LearningRate 0.0841   Epoch: 1   Global Step: 68830   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:25,223-Speed 2630.68 samples/sec   Loss 13.2332   LearningRate 0.0841   Epoch: 1   Global Step: 68840   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:29,119-Speed 2629.21 samples/sec   Loss 13.2602   LearningRate 0.0841   Epoch: 1   Global Step: 68850   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:33,019-Speed 2626.02 samples/sec   Loss 12.9673   LearningRate 0.0841   Epoch: 1   Global Step: 68860   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:36,910-Speed 2632.48 samples/sec   Loss 13.0360   LearningRate 0.0841   Epoch: 1   Global Step: 68870   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:40,803-Speed 2630.70 samples/sec   Loss 13.0535   LearningRate 0.0841   Epoch: 1   Global Step: 68880   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:44,702-Speed 2626.98 samples/sec   Loss 13.0337   LearningRate 0.0841   Epoch: 1   Global Step: 68890   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:48,601-Speed 2626.55 samples/sec   Loss 13.0499   LearningRate 0.0841   Epoch: 1   Global Step: 68900   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:52,499-Speed 2628.52 samples/sec   Loss 13.1130   LearningRate 0.0841   Epoch: 1   Global Step: 68910   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:26:56,388-Speed 2633.75 samples/sec   Loss 13.0736   LearningRate 0.0841   Epoch: 1   Global Step: 68920   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:00,277-Speed 2633.13 samples/sec   Loss 13.0897   LearningRate 0.0841   Epoch: 1   Global Step: 68930   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:27:04,151-Speed 2643.99 samples/sec   Loss 13.1454   LearningRate 0.0841   Epoch: 1   Global Step: 68940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:08,042-Speed 2632.01 samples/sec   Loss 12.9187   LearningRate 0.0841   Epoch: 1   Global Step: 68950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:11,959-Speed 2614.99 samples/sec   Loss 12.9641   LearningRate 0.0841   Epoch: 1   Global Step: 68960   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:15,850-Speed 2631.85 samples/sec   Loss 13.1578   LearningRate 0.0841   Epoch: 1   Global Step: 68970   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:19,751-Speed 2625.72 samples/sec   Loss 13.1978   LearningRate 0.0841   Epoch: 1   Global Step: 68980   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:23,648-Speed 2628.34 samples/sec   Loss 13.1343   LearningRate 0.0841   Epoch: 1   Global Step: 68990   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:27,544-Speed 2629.28 samples/sec   Loss 13.1308   LearningRate 0.0841   Epoch: 1   Global Step: 69000   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:31,438-Speed 2630.08 samples/sec   Loss 13.2036   LearningRate 0.0841   Epoch: 1   Global Step: 69010   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:35,336-Speed 2627.64 samples/sec   Loss 13.1125   LearningRate 0.0841   Epoch: 1   Global Step: 69020   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:39,232-Speed 2628.82 samples/sec   Loss 13.2319   LearningRate 0.0841   Epoch: 1   Global Step: 69030   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:27:43,090-Speed 2655.01 samples/sec   Loss 13.0706   LearningRate 0.0840   Epoch: 1   Global Step: 69040   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:27:46,999-Speed 2619.67 samples/sec   Loss 13.2377   LearningRate 0.0840   Epoch: 1   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:27:50,934-Speed 2603.01 samples/sec   Loss 13.1373   LearningRate 0.0840   Epoch: 1   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:27:54,826-Speed 2632.06 samples/sec   Loss 13.2216   LearningRate 0.0840   Epoch: 1   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:27:58,717-Speed 2632.31 samples/sec   Loss 13.0916   LearningRate 0.0840   Epoch: 1   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:28:02,615-Speed 2627.29 samples/sec   Loss 13.1271   LearningRate 0.0840   Epoch: 1   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:28:06,508-Speed 2631.28 samples/sec   Loss 13.2377   LearningRate 0.0840   Epoch: 1   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:28:10,418-Speed 2619.11 samples/sec   Loss 13.0389   LearningRate 0.0840   Epoch: 1   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:28:14,317-Speed 2626.91 samples/sec   Loss 13.1528   LearningRate 0.0840   Epoch: 1   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:28:18,215-Speed 2627.92 samples/sec   Loss 13.0519   LearningRate 0.0840   Epoch: 1   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:28:22,111-Speed 2628.63 samples/sec   Loss 13.1948   LearningRate 0.0840   Epoch: 1   Global Step: 69140   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:26,007-Speed 2628.84 samples/sec   Loss 13.1795   LearningRate 0.0840   Epoch: 1   Global Step: 69150   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:29,900-Speed 2630.65 samples/sec   Loss 13.2693   LearningRate 0.0840   Epoch: 1   Global Step: 69160   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:33,798-Speed 2627.82 samples/sec   Loss 13.0089   LearningRate 0.0840   Epoch: 1   Global Step: 69170   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:37,691-Speed 2631.20 samples/sec   Loss 13.1094   LearningRate 0.0840   Epoch: 1   Global Step: 69180   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:41,585-Speed 2629.97 samples/sec   Loss 13.0420   LearningRate 0.0840   Epoch: 1   Global Step: 69190   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:45,482-Speed 2628.80 samples/sec   Loss 13.0020   LearningRate 0.0840   Epoch: 1   Global Step: 69200   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:49,375-Speed 2631.04 samples/sec   Loss 13.0281   LearningRate 0.0840   Epoch: 1   Global Step: 69210   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:53,268-Speed 2630.37 samples/sec   Loss 13.1458   LearningRate 0.0840   Epoch: 1   Global Step: 69220   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:28:57,159-Speed 2632.27 samples/sec   Loss 13.0451   LearningRate 0.0840   Epoch: 1   Global Step: 69230   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:01,052-Speed 2631.36 samples/sec   Loss 13.0268   LearningRate 0.0840   Epoch: 1   Global Step: 69240   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:29:04,936-Speed 2636.65 samples/sec   Loss 13.0794   LearningRate 0.0840   Epoch: 1   Global Step: 69250   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:08,826-Speed 2633.29 samples/sec   Loss 13.0733   LearningRate 0.0840   Epoch: 1   Global Step: 69260   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:12,715-Speed 2633.78 samples/sec   Loss 12.9822   LearningRate 0.0840   Epoch: 1   Global Step: 69270   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:16,609-Speed 2630.17 samples/sec   Loss 12.9287   LearningRate 0.0840   Epoch: 1   Global Step: 69280   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:20,524-Speed 2616.13 samples/sec   Loss 13.1434   LearningRate 0.0840   Epoch: 1   Global Step: 69290   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:24,416-Speed 2632.08 samples/sec   Loss 13.1199   LearningRate 0.0840   Epoch: 1   Global Step: 69300   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:28,312-Speed 2628.88 samples/sec   Loss 13.0324   LearningRate 0.0840   Epoch: 1   Global Step: 69310   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:32,213-Speed 2625.48 samples/sec   Loss 13.0068   LearningRate 0.0840   Epoch: 1   Global Step: 69320   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:36,107-Speed 2630.79 samples/sec   Loss 13.1684   LearningRate 0.0840   Epoch: 1   Global Step: 69330   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:40,008-Speed 2625.32 samples/sec   Loss 13.0795   LearningRate 0.0840   Epoch: 1   Global Step: 69340   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:29:43,905-Speed 2628.48 samples/sec   Loss 13.2221   LearningRate 0.0840   Epoch: 1   Global Step: 69350   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:29:47,801-Speed 2629.46 samples/sec   Loss 12.9967   LearningRate 0.0840   Epoch: 1   Global Step: 69360   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:29:51,695-Speed 2629.98 samples/sec   Loss 13.1796   LearningRate 0.0840   Epoch: 1   Global Step: 69370   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:29:55,591-Speed 2629.45 samples/sec   Loss 12.9912   LearningRate 0.0840   Epoch: 1   Global Step: 69380   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:29:59,492-Speed 2625.59 samples/sec   Loss 13.0330   LearningRate 0.0840   Epoch: 1   Global Step: 69390   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:30:03,394-Speed 2624.68 samples/sec   Loss 13.1486   LearningRate 0.0840   Epoch: 1   Global Step: 69400   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:30:07,292-Speed 2627.63 samples/sec   Loss 13.0922   LearningRate 0.0840   Epoch: 1   Global Step: 69410   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:30:11,186-Speed 2630.91 samples/sec   Loss 13.1388   LearningRate 0.0840   Epoch: 1   Global Step: 69420   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:30:15,099-Speed 2617.24 samples/sec   Loss 13.0740   LearningRate 0.0840   Epoch: 1   Global Step: 69430   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:30:19,010-Speed 2618.65 samples/sec   Loss 12.9446   LearningRate 0.0840   Epoch: 1   Global Step: 69440   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:30:22,890-Speed 2639.83 samples/sec   Loss 13.2820   LearningRate 0.0840   Epoch: 1   Global Step: 69450   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:30:26,766-Speed 2642.55 samples/sec   Loss 13.2058   LearningRate 0.0840   Epoch: 1   Global Step: 69460   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:30,816-Speed 2529.00 samples/sec   Loss 13.0807   LearningRate 0.0840   Epoch: 1   Global Step: 69470   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:34,711-Speed 2630.29 samples/sec   Loss 13.0930   LearningRate 0.0840   Epoch: 1   Global Step: 69480   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:38,603-Speed 2631.42 samples/sec   Loss 12.9164   LearningRate 0.0839   Epoch: 1   Global Step: 69490   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:42,497-Speed 2629.88 samples/sec   Loss 13.0401   LearningRate 0.0839   Epoch: 1   Global Step: 69500   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:46,394-Speed 2628.55 samples/sec   Loss 13.1350   LearningRate 0.0839   Epoch: 1   Global Step: 69510   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:50,289-Speed 2629.91 samples/sec   Loss 13.0345   LearningRate 0.0839   Epoch: 1   Global Step: 69520   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:54,179-Speed 2632.61 samples/sec   Loss 13.1684   LearningRate 0.0839   Epoch: 1   Global Step: 69530   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:30:58,074-Speed 2630.04 samples/sec   Loss 13.1541   LearningRate 0.0839   Epoch: 1   Global Step: 69540   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:01,976-Speed 2624.40 samples/sec   Loss 13.0594   LearningRate 0.0839   Epoch: 1   Global Step: 69550   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:05,858-Speed 2638.52 samples/sec   Loss 13.1130   LearningRate 0.0839   Epoch: 1   Global Step: 69560   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:09,751-Speed 2630.87 samples/sec   Loss 13.0789   LearningRate 0.0839   Epoch: 1   Global Step: 69570   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:13,645-Speed 2630.72 samples/sec   Loss 13.1564   LearningRate 0.0839   Epoch: 1   Global Step: 69580   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:17,536-Speed 2631.84 samples/sec   Loss 13.1993   LearningRate 0.0839   Epoch: 1   Global Step: 69590   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:21,438-Speed 2625.25 samples/sec   Loss 13.1641   LearningRate 0.0839   Epoch: 1   Global Step: 69600   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:25,341-Speed 2624.10 samples/sec   Loss 13.2423   LearningRate 0.0839   Epoch: 1   Global Step: 69610   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:29,243-Speed 2625.20 samples/sec   Loss 13.1429   LearningRate 0.0839   Epoch: 1   Global Step: 69620   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:33,151-Speed 2620.93 samples/sec   Loss 13.2138   LearningRate 0.0839   Epoch: 1   Global Step: 69630   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:37,054-Speed 2624.13 samples/sec   Loss 13.0646   LearningRate 0.0839   Epoch: 1   Global Step: 69640   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:40,961-Speed 2621.67 samples/sec   Loss 13.1430   LearningRate 0.0839   Epoch: 1   Global Step: 69650   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:31:44,874-Speed 2617.47 samples/sec   Loss 13.1270   LearningRate 0.0839   Epoch: 1   Global Step: 69660   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:31:48,863-Speed 2567.52 samples/sec   Loss 13.1866   LearningRate 0.0839   Epoch: 1   Global Step: 69670   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:31:52,768-Speed 2622.89 samples/sec   Loss 13.1768   LearningRate 0.0839   Epoch: 1   Global Step: 69680   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:31:56,674-Speed 2622.55 samples/sec   Loss 12.9814   LearningRate 0.0839   Epoch: 1   Global Step: 69690   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:32:00,557-Speed 2637.84 samples/sec   Loss 13.2031   LearningRate 0.0839   Epoch: 1   Global Step: 69700   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:32:04,459-Speed 2624.52 samples/sec   Loss 13.2039   LearningRate 0.0839   Epoch: 1   Global Step: 69710   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:32:08,374-Speed 2616.16 samples/sec   Loss 13.0219   LearningRate 0.0839   Epoch: 1   Global Step: 69720   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:32:12,305-Speed 2605.37 samples/sec   Loss 13.1924   LearningRate 0.0839   Epoch: 1   Global Step: 69730   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:32:16,205-Speed 2627.13 samples/sec   Loss 13.0110   LearningRate 0.0839   Epoch: 1   Global Step: 69740   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:32:20,113-Speed 2621.03 samples/sec   Loss 13.0741   LearningRate 0.0839   Epoch: 1   Global Step: 69750   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:32:24,005-Speed 2631.61 samples/sec   Loss 13.1269   LearningRate 0.0839   Epoch: 1   Global Step: 69760   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:32:27,907-Speed 2624.81 samples/sec   Loss 13.1891   LearningRate 0.0839   Epoch: 1   Global Step: 69770   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:32:31,805-Speed 2627.64 samples/sec   Loss 13.0757   LearningRate 0.0839   Epoch: 1   Global Step: 69780   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:32:35,690-Speed 2635.83 samples/sec   Loss 13.0770   LearningRate 0.0839   Epoch: 1   Global Step: 69790   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:32:39,565-Speed 2643.40 samples/sec   Loss 13.2658   LearningRate 0.0839   Epoch: 1   Global Step: 69800   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:32:43,459-Speed 2630.80 samples/sec   Loss 13.1124   LearningRate 0.0839   Epoch: 1   Global Step: 69810   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:32:47,353-Speed 2630.07 samples/sec   Loss 13.1887   LearningRate 0.0839   Epoch: 1   Global Step: 69820   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:32:51,254-Speed 2625.84 samples/sec   Loss 13.2342   LearningRate 0.0839   Epoch: 1   Global Step: 69830   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:32:55,153-Speed 2626.96 samples/sec   Loss 13.0379   LearningRate 0.0839   Epoch: 1   Global Step: 69840   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:32:59,049-Speed 2629.59 samples/sec   Loss 13.0370   LearningRate 0.0839   Epoch: 1   Global Step: 69850   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:33:02,945-Speed 2628.63 samples/sec   Loss 13.1880   LearningRate 0.0839   Epoch: 1   Global Step: 69860   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:33:06,842-Speed 2628.24 samples/sec   Loss 13.2113   LearningRate 0.0839   Epoch: 1   Global Step: 69870   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:33:10,733-Speed 2632.43 samples/sec   Loss 13.1769   LearningRate 0.0839   Epoch: 1   Global Step: 69880   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:33:14,626-Speed 2631.25 samples/sec   Loss 12.8798   LearningRate 0.0839   Epoch: 1   Global Step: 69890   Fp16 Grad Scale: 16384   Required: 85 hours
Training: 2022-04-13 03:33:18,584-Speed 2587.40 samples/sec   Loss 13.0601   LearningRate 0.0839   Epoch: 1   Global Step: 69900   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:22,496-Speed 2618.91 samples/sec   Loss 13.1885   LearningRate 0.0839   Epoch: 1   Global Step: 69910   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:26,400-Speed 2623.30 samples/sec   Loss 13.1153   LearningRate 0.0839   Epoch: 1   Global Step: 69920   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:30,298-Speed 2628.27 samples/sec   Loss 13.0620   LearningRate 0.0839   Epoch: 1   Global Step: 69930   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:34,190-Speed 2631.24 samples/sec   Loss 13.0577   LearningRate 0.0838   Epoch: 1   Global Step: 69940   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:38,084-Speed 2630.49 samples/sec   Loss 13.2528   LearningRate 0.0838   Epoch: 1   Global Step: 69950   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:41,976-Speed 2631.33 samples/sec   Loss 13.0593   LearningRate 0.0838   Epoch: 1   Global Step: 69960   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:45,880-Speed 2623.79 samples/sec   Loss 13.0641   LearningRate 0.0838   Epoch: 1   Global Step: 69970   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:49,777-Speed 2628.65 samples/sec   Loss 12.8933   LearningRate 0.0838   Epoch: 1   Global Step: 69980   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:53,686-Speed 2619.86 samples/sec   Loss 13.0793   LearningRate 0.0838   Epoch: 1   Global Step: 69990   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:33:57,586-Speed 2627.07 samples/sec   Loss 13.1502   LearningRate 0.0838   Epoch: 1   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:34:40,954-[lfw][70000]XNorm: 23.249918
Training: 2022-04-13 03:34:40,955-[lfw][70000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-13 03:34:40,956-[lfw][70000]Accuracy-Highest: 0.99783
Training: 2022-04-13 03:35:31,044-[cfp_fp][70000]XNorm: 20.953000
Training: 2022-04-13 03:35:31,045-[cfp_fp][70000]Accuracy-Flip: 0.97500+-0.00680
Training: 2022-04-13 03:35:31,046-[cfp_fp][70000]Accuracy-Highest: 0.97500
Training: 2022-04-13 03:36:14,566-[agedb_30][70000]XNorm: 22.650397
Training: 2022-04-13 03:36:14,567-[agedb_30][70000]Accuracy-Flip: 0.96083+-0.00597
Training: 2022-04-13 03:36:14,568-[agedb_30][70000]Accuracy-Highest: 0.96283
Training: 2022-04-13 03:36:18,455-Speed 72.69 samples/sec   Loss 13.2279   LearningRate 0.0838   Epoch: 1   Global Step: 70010   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:22,338-Speed 2638.38 samples/sec   Loss 13.1059   LearningRate 0.0838   Epoch: 1   Global Step: 70020   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:26,213-Speed 2643.15 samples/sec   Loss 13.0600   LearningRate 0.0838   Epoch: 1   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:30,086-Speed 2645.04 samples/sec   Loss 13.0094   LearningRate 0.0838   Epoch: 1   Global Step: 70040   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:33,953-Speed 2648.49 samples/sec   Loss 13.1365   LearningRate 0.0838   Epoch: 1   Global Step: 70050   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:37,835-Speed 2639.22 samples/sec   Loss 13.1191   LearningRate 0.0838   Epoch: 1   Global Step: 70060   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:41,709-Speed 2644.24 samples/sec   Loss 12.9019   LearningRate 0.0838   Epoch: 1   Global Step: 70070   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:45,596-Speed 2634.85 samples/sec   Loss 12.8726   LearningRate 0.0838   Epoch: 1   Global Step: 70080   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:49,475-Speed 2640.53 samples/sec   Loss 12.9454   LearningRate 0.0838   Epoch: 1   Global Step: 70090   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:36:53,351-Speed 2643.41 samples/sec   Loss 12.8893   LearningRate 0.0838   Epoch: 1   Global Step: 70100   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:36:57,247-Speed 2628.45 samples/sec   Loss 12.9735   LearningRate 0.0838   Epoch: 1   Global Step: 70110   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:01,186-Speed 2600.97 samples/sec   Loss 13.1731   LearningRate 0.0838   Epoch: 1   Global Step: 70120   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:05,073-Speed 2634.56 samples/sec   Loss 13.0358   LearningRate 0.0838   Epoch: 1   Global Step: 70130   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:08,957-Speed 2637.14 samples/sec   Loss 13.0908   LearningRate 0.0838   Epoch: 1   Global Step: 70140   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:12,846-Speed 2633.77 samples/sec   Loss 13.0904   LearningRate 0.0838   Epoch: 1   Global Step: 70150   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:16,742-Speed 2629.39 samples/sec   Loss 13.0712   LearningRate 0.0838   Epoch: 1   Global Step: 70160   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:20,645-Speed 2624.36 samples/sec   Loss 12.9279   LearningRate 0.0838   Epoch: 1   Global Step: 70170   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:24,551-Speed 2622.06 samples/sec   Loss 12.8615   LearningRate 0.0838   Epoch: 1   Global Step: 70180   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:28,473-Speed 2611.77 samples/sec   Loss 12.9951   LearningRate 0.0838   Epoch: 1   Global Step: 70190   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:32,362-Speed 2634.18 samples/sec   Loss 13.0238   LearningRate 0.0838   Epoch: 1   Global Step: 70200   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:37:36,251-Speed 2633.59 samples/sec   Loss 13.0233   LearningRate 0.0838   Epoch: 1   Global Step: 70210   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:37:40,139-Speed 2634.71 samples/sec   Loss 13.0223   LearningRate 0.0838   Epoch: 1   Global Step: 70220   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:37:44,029-Speed 2632.83 samples/sec   Loss 13.0407   LearningRate 0.0838   Epoch: 1   Global Step: 70230   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:37:47,903-Speed 2644.07 samples/sec   Loss 12.9335   LearningRate 0.0838   Epoch: 1   Global Step: 70240   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:51,790-Speed 2634.83 samples/sec   Loss 13.0166   LearningRate 0.0838   Epoch: 1   Global Step: 70250   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:55,682-Speed 2632.36 samples/sec   Loss 13.0152   LearningRate 0.0838   Epoch: 1   Global Step: 70260   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:37:59,590-Speed 2620.29 samples/sec   Loss 13.0846   LearningRate 0.0838   Epoch: 1   Global Step: 70270   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:38:03,524-Speed 2604.04 samples/sec   Loss 12.9803   LearningRate 0.0838   Epoch: 1   Global Step: 70280   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:38:07,416-Speed 2631.62 samples/sec   Loss 12.9957   LearningRate 0.0838   Epoch: 1   Global Step: 70290   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:38:11,310-Speed 2630.71 samples/sec   Loss 13.0694   LearningRate 0.0838   Epoch: 1   Global Step: 70300   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:38:15,209-Speed 2626.81 samples/sec   Loss 13.0671   LearningRate 0.0838   Epoch: 1   Global Step: 70310   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:38:19,096-Speed 2634.61 samples/sec   Loss 13.0453   LearningRate 0.0838   Epoch: 1   Global Step: 70320   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:38:22,987-Speed 2633.16 samples/sec   Loss 12.9655   LearningRate 0.0838   Epoch: 1   Global Step: 70330   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:38:26,877-Speed 2632.26 samples/sec   Loss 12.9856   LearningRate 0.0838   Epoch: 1   Global Step: 70340   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:38:30,781-Speed 2624.08 samples/sec   Loss 13.0703   LearningRate 0.0838   Epoch: 1   Global Step: 70350   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:38:34,629-Speed 2661.70 samples/sec   Loss 13.1417   LearningRate 0.0838   Epoch: 1   Global Step: 70360   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:38:38,519-Speed 2633.16 samples/sec   Loss 12.9224   LearningRate 0.0838   Epoch: 1   Global Step: 70370   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:38:42,410-Speed 2633.08 samples/sec   Loss 12.8842   LearningRate 0.0838   Epoch: 1   Global Step: 70380   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:38:46,301-Speed 2632.26 samples/sec   Loss 13.0163   LearningRate 0.0837   Epoch: 1   Global Step: 70390   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:38:50,199-Speed 2627.67 samples/sec   Loss 13.1638   LearningRate 0.0837   Epoch: 1   Global Step: 70400   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:38:54,114-Speed 2616.32 samples/sec   Loss 12.9810   LearningRate 0.0837   Epoch: 1   Global Step: 70410   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:38:58,004-Speed 2632.63 samples/sec   Loss 13.2294   LearningRate 0.0837   Epoch: 1   Global Step: 70420   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:39:01,901-Speed 2627.90 samples/sec   Loss 13.0680   LearningRate 0.0837   Epoch: 1   Global Step: 70430   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:39:05,804-Speed 2624.75 samples/sec   Loss 12.9926   LearningRate 0.0837   Epoch: 1   Global Step: 70440   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:39:09,702-Speed 2627.78 samples/sec   Loss 13.1561   LearningRate 0.0837   Epoch: 1   Global Step: 70450   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 03:39:13,594-Speed 2632.02 samples/sec   Loss 13.0510   LearningRate 0.0837   Epoch: 1   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:17,511-Speed 2615.00 samples/sec   Loss 13.0147   LearningRate 0.0837   Epoch: 1   Global Step: 70470   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:21,404-Speed 2630.85 samples/sec   Loss 12.9931   LearningRate 0.0837   Epoch: 1   Global Step: 70480   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:25,300-Speed 2629.17 samples/sec   Loss 13.0666   LearningRate 0.0837   Epoch: 1   Global Step: 70490   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:29,212-Speed 2618.40 samples/sec   Loss 12.9322   LearningRate 0.0837   Epoch: 1   Global Step: 70500   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:33,121-Speed 2619.81 samples/sec   Loss 13.0512   LearningRate 0.0837   Epoch: 1   Global Step: 70510   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:37,018-Speed 2628.17 samples/sec   Loss 13.0916   LearningRate 0.0837   Epoch: 1   Global Step: 70520   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:40,912-Speed 2630.78 samples/sec   Loss 12.9814   LearningRate 0.0837   Epoch: 1   Global Step: 70530   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:44,806-Speed 2629.93 samples/sec   Loss 13.0398   LearningRate 0.0837   Epoch: 1   Global Step: 70540   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:48,705-Speed 2627.19 samples/sec   Loss 12.9953   LearningRate 0.0837   Epoch: 1   Global Step: 70550   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:39:52,597-Speed 2632.06 samples/sec   Loss 13.0149   LearningRate 0.0837   Epoch: 1   Global Step: 70560   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:39:56,503-Speed 2622.30 samples/sec   Loss 13.0542   LearningRate 0.0837   Epoch: 1   Global Step: 70570   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:00,397-Speed 2630.26 samples/sec   Loss 12.9529   LearningRate 0.0837   Epoch: 1   Global Step: 70580   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:04,296-Speed 2627.08 samples/sec   Loss 13.0035   LearningRate 0.0837   Epoch: 1   Global Step: 70590   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:08,199-Speed 2624.29 samples/sec   Loss 13.0594   LearningRate 0.0837   Epoch: 1   Global Step: 70600   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:12,184-Speed 2570.65 samples/sec   Loss 13.0260   LearningRate 0.0837   Epoch: 1   Global Step: 70610   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:16,082-Speed 2627.16 samples/sec   Loss 13.0000   LearningRate 0.0837   Epoch: 1   Global Step: 70620   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:19,990-Speed 2621.50 samples/sec   Loss 12.9963   LearningRate 0.0837   Epoch: 1   Global Step: 70630   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:23,893-Speed 2623.75 samples/sec   Loss 13.0691   LearningRate 0.0837   Epoch: 1   Global Step: 70640   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:27,792-Speed 2627.48 samples/sec   Loss 13.0223   LearningRate 0.0837   Epoch: 1   Global Step: 70650   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:31,692-Speed 2625.95 samples/sec   Loss 13.0435   LearningRate 0.0837   Epoch: 1   Global Step: 70660   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:40:35,595-Speed 2623.90 samples/sec   Loss 13.0966   LearningRate 0.0837   Epoch: 1   Global Step: 70670   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:40:39,493-Speed 2627.87 samples/sec   Loss 12.8385   LearningRate 0.0837   Epoch: 1   Global Step: 70680   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:43,401-Speed 2621.05 samples/sec   Loss 12.8656   LearningRate 0.0837   Epoch: 1   Global Step: 70690   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:47,304-Speed 2624.72 samples/sec   Loss 13.1644   LearningRate 0.0837   Epoch: 1   Global Step: 70700   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:51,206-Speed 2624.29 samples/sec   Loss 13.0650   LearningRate 0.0837   Epoch: 1   Global Step: 70710   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:55,105-Speed 2628.89 samples/sec   Loss 12.9958   LearningRate 0.0837   Epoch: 1   Global Step: 70720   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:40:59,008-Speed 2624.43 samples/sec   Loss 13.0987   LearningRate 0.0837   Epoch: 1   Global Step: 70730   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:03,037-Speed 2541.98 samples/sec   Loss 12.8701   LearningRate 0.0837   Epoch: 1   Global Step: 70740   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:07,120-Speed 2508.05 samples/sec   Loss 12.9916   LearningRate 0.0837   Epoch: 1   Global Step: 70750   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:11,197-Speed 2512.60 samples/sec   Loss 13.0115   LearningRate 0.0837   Epoch: 1   Global Step: 70760   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:15,281-Speed 2507.98 samples/sec   Loss 13.0354   LearningRate 0.0837   Epoch: 1   Global Step: 70770   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:19,288-Speed 2556.70 samples/sec   Loss 12.9512   LearningRate 0.0837   Epoch: 1   Global Step: 70780   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:41:23,188-Speed 2626.14 samples/sec   Loss 13.1699   LearningRate 0.0837   Epoch: 1   Global Step: 70790   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:41:27,086-Speed 2627.61 samples/sec   Loss 13.0100   LearningRate 0.0837   Epoch: 1   Global Step: 70800   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:41:30,975-Speed 2634.15 samples/sec   Loss 13.1320   LearningRate 0.0837   Epoch: 1   Global Step: 70810   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:34,873-Speed 2626.94 samples/sec   Loss 13.0608   LearningRate 0.0837   Epoch: 1   Global Step: 70820   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:38,788-Speed 2615.91 samples/sec   Loss 13.2146   LearningRate 0.0837   Epoch: 1   Global Step: 70830   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:42,700-Speed 2618.41 samples/sec   Loss 13.0937   LearningRate 0.0837   Epoch: 1   Global Step: 70840   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:46,625-Speed 2610.14 samples/sec   Loss 13.0994   LearningRate 0.0836   Epoch: 1   Global Step: 70850   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:50,554-Speed 2607.18 samples/sec   Loss 13.0412   LearningRate 0.0836   Epoch: 1   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:54,473-Speed 2612.78 samples/sec   Loss 13.0408   LearningRate 0.0836   Epoch: 1   Global Step: 70870   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:41:58,389-Speed 2616.15 samples/sec   Loss 13.1484   LearningRate 0.0836   Epoch: 1   Global Step: 70880   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:02,298-Speed 2619.77 samples/sec   Loss 13.0086   LearningRate 0.0836   Epoch: 1   Global Step: 70890   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:06,203-Speed 2622.47 samples/sec   Loss 13.1016   LearningRate 0.0836   Epoch: 1   Global Step: 70900   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:10,107-Speed 2623.70 samples/sec   Loss 13.0312   LearningRate 0.0836   Epoch: 1   Global Step: 70910   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:42:14,015-Speed 2621.53 samples/sec   Loss 13.1348   LearningRate 0.0836   Epoch: 1   Global Step: 70920   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:42:17,918-Speed 2624.67 samples/sec   Loss 13.0517   LearningRate 0.0836   Epoch: 1   Global Step: 70930   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:42:21,809-Speed 2631.83 samples/sec   Loss 13.0204   LearningRate 0.0836   Epoch: 1   Global Step: 70940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:25,725-Speed 2615.94 samples/sec   Loss 13.0463   LearningRate 0.0836   Epoch: 1   Global Step: 70950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:29,633-Speed 2620.66 samples/sec   Loss 13.0076   LearningRate 0.0836   Epoch: 1   Global Step: 70960   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:33,550-Speed 2615.28 samples/sec   Loss 13.1701   LearningRate 0.0836   Epoch: 1   Global Step: 70970   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:37,460-Speed 2619.65 samples/sec   Loss 13.0061   LearningRate 0.0836   Epoch: 1   Global Step: 70980   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:41,390-Speed 2605.78 samples/sec   Loss 13.0004   LearningRate 0.0836   Epoch: 1   Global Step: 70990   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:45,302-Speed 2618.16 samples/sec   Loss 13.1339   LearningRate 0.0836   Epoch: 1   Global Step: 71000   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:49,346-Speed 2533.19 samples/sec   Loss 13.0388   LearningRate 0.0836   Epoch: 1   Global Step: 71010   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:53,359-Speed 2552.89 samples/sec   Loss 13.0169   LearningRate 0.0836   Epoch: 1   Global Step: 71020   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:42:57,273-Speed 2616.30 samples/sec   Loss 13.2049   LearningRate 0.0836   Epoch: 1   Global Step: 71030   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:43:01,187-Speed 2617.61 samples/sec   Loss 13.0395   LearningRate 0.0836   Epoch: 1   Global Step: 71040   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:05,089-Speed 2624.53 samples/sec   Loss 13.1257   LearningRate 0.0836   Epoch: 1   Global Step: 71050   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:08,997-Speed 2620.76 samples/sec   Loss 13.1184   LearningRate 0.0836   Epoch: 1   Global Step: 71060   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:12,901-Speed 2623.38 samples/sec   Loss 13.1019   LearningRate 0.0836   Epoch: 1   Global Step: 71070   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:16,915-Speed 2552.18 samples/sec   Loss 12.9770   LearningRate 0.0836   Epoch: 1   Global Step: 71080   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:20,819-Speed 2623.82 samples/sec   Loss 12.9232   LearningRate 0.0836   Epoch: 1   Global Step: 71090   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:24,724-Speed 2623.51 samples/sec   Loss 12.9958   LearningRate 0.0836   Epoch: 1   Global Step: 71100   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:28,636-Speed 2618.32 samples/sec   Loss 12.9966   LearningRate 0.0836   Epoch: 1   Global Step: 71110   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:43:32,522-Speed 2636.05 samples/sec   Loss 13.1885   LearningRate 0.0836   Epoch: 1   Global Step: 71120   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:43:36,426-Speed 2623.31 samples/sec   Loss 13.0638   LearningRate 0.0836   Epoch: 1   Global Step: 71130   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:43:40,337-Speed 2618.60 samples/sec   Loss 12.9834   LearningRate 0.0836   Epoch: 1   Global Step: 71140   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:43:44,240-Speed 2623.92 samples/sec   Loss 13.1558   LearningRate 0.0836   Epoch: 1   Global Step: 71150   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:43:48,145-Speed 2623.07 samples/sec   Loss 13.1424   LearningRate 0.0836   Epoch: 1   Global Step: 71160   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:43:52,036-Speed 2632.50 samples/sec   Loss 13.2758   LearningRate 0.0836   Epoch: 1   Global Step: 71170   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:43:55,946-Speed 2619.80 samples/sec   Loss 12.9558   LearningRate 0.0836   Epoch: 1   Global Step: 71180   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:43:59,859-Speed 2617.50 samples/sec   Loss 13.0449   LearningRate 0.0836   Epoch: 1   Global Step: 71190   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:03,769-Speed 2619.60 samples/sec   Loss 12.9706   LearningRate 0.0836   Epoch: 1   Global Step: 71200   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:07,693-Speed 2610.39 samples/sec   Loss 13.0721   LearningRate 0.0836   Epoch: 1   Global Step: 71210   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:11,603-Speed 2619.49 samples/sec   Loss 13.0186   LearningRate 0.0836   Epoch: 1   Global Step: 71220   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:15,507-Speed 2623.06 samples/sec   Loss 13.0724   LearningRate 0.0836   Epoch: 1   Global Step: 71230   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:19,421-Speed 2617.56 samples/sec   Loss 13.0119   LearningRate 0.0836   Epoch: 1   Global Step: 71240   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:23,334-Speed 2617.71 samples/sec   Loss 12.8705   LearningRate 0.0836   Epoch: 1   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:27,256-Speed 2611.36 samples/sec   Loss 13.0042   LearningRate 0.0836   Epoch: 1   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:44:31,162-Speed 2622.73 samples/sec   Loss 13.1145   LearningRate 0.0836   Epoch: 1   Global Step: 71270   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:44:35,074-Speed 2618.17 samples/sec   Loss 13.0057   LearningRate 0.0836   Epoch: 1   Global Step: 71280   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:44:39,005-Speed 2605.34 samples/sec   Loss 12.9667   LearningRate 0.0836   Epoch: 1   Global Step: 71290   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:44:42,912-Speed 2621.75 samples/sec   Loss 12.9779   LearningRate 0.0835   Epoch: 1   Global Step: 71300   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:44:46,820-Speed 2620.95 samples/sec   Loss 12.9623   LearningRate 0.0835   Epoch: 1   Global Step: 71310   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:44:50,730-Speed 2619.48 samples/sec   Loss 12.9912   LearningRate 0.0835   Epoch: 1   Global Step: 71320   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:44:54,648-Speed 2614.21 samples/sec   Loss 13.1502   LearningRate 0.0835   Epoch: 1   Global Step: 71330   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:44:58,549-Speed 2625.44 samples/sec   Loss 13.1277   LearningRate 0.0835   Epoch: 1   Global Step: 71340   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:02,457-Speed 2621.20 samples/sec   Loss 12.9808   LearningRate 0.0835   Epoch: 1   Global Step: 71350   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:06,374-Speed 2614.69 samples/sec   Loss 13.0900   LearningRate 0.0835   Epoch: 1   Global Step: 71360   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:10,295-Speed 2612.10 samples/sec   Loss 13.0626   LearningRate 0.0835   Epoch: 1   Global Step: 71370   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:45:14,231-Speed 2602.46 samples/sec   Loss 13.0187   LearningRate 0.0835   Epoch: 1   Global Step: 71380   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:45:18,158-Speed 2612.69 samples/sec   Loss 13.1483   LearningRate 0.0835   Epoch: 1   Global Step: 71390   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:45:22,076-Speed 2614.03 samples/sec   Loss 12.7533   LearningRate 0.0835   Epoch: 1   Global Step: 71400   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:45:25,993-Speed 2615.24 samples/sec   Loss 13.0100   LearningRate 0.0835   Epoch: 1   Global Step: 71410   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:45:29,894-Speed 2626.08 samples/sec   Loss 13.1437   LearningRate 0.0835   Epoch: 1   Global Step: 71420   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:33,812-Speed 2613.76 samples/sec   Loss 13.0637   LearningRate 0.0835   Epoch: 1   Global Step: 71430   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:37,725-Speed 2617.46 samples/sec   Loss 13.0682   LearningRate 0.0835   Epoch: 1   Global Step: 71440   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:41,634-Speed 2620.43 samples/sec   Loss 13.1262   LearningRate 0.0835   Epoch: 1   Global Step: 71450   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:45,544-Speed 2619.17 samples/sec   Loss 12.9519   LearningRate 0.0835   Epoch: 1   Global Step: 71460   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:49,466-Speed 2611.85 samples/sec   Loss 13.0486   LearningRate 0.0835   Epoch: 1   Global Step: 71470   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:53,388-Speed 2611.43 samples/sec   Loss 12.8853   LearningRate 0.0835   Epoch: 1   Global Step: 71480   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:45:57,296-Speed 2621.37 samples/sec   Loss 13.1104   LearningRate 0.0835   Epoch: 1   Global Step: 71490   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:46:01,208-Speed 2618.60 samples/sec   Loss 13.1394   LearningRate 0.0835   Epoch: 1   Global Step: 71500   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:46:05,122-Speed 2616.52 samples/sec   Loss 13.0769   LearningRate 0.0835   Epoch: 1   Global Step: 71510   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:46:09,035-Speed 2617.47 samples/sec   Loss 13.0393   LearningRate 0.0835   Epoch: 1   Global Step: 71520   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:12,951-Speed 2615.92 samples/sec   Loss 13.0329   LearningRate 0.0835   Epoch: 1   Global Step: 71530   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:16,860-Speed 2620.45 samples/sec   Loss 13.0766   LearningRate 0.0835   Epoch: 1   Global Step: 71540   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:20,784-Speed 2609.87 samples/sec   Loss 12.8302   LearningRate 0.0835   Epoch: 1   Global Step: 71550   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:24,695-Speed 2619.57 samples/sec   Loss 13.0960   LearningRate 0.0835   Epoch: 1   Global Step: 71560   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:28,633-Speed 2601.01 samples/sec   Loss 13.1691   LearningRate 0.0835   Epoch: 1   Global Step: 71570   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:32,546-Speed 2617.84 samples/sec   Loss 12.9368   LearningRate 0.0835   Epoch: 1   Global Step: 71580   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:36,463-Speed 2614.77 samples/sec   Loss 13.0293   LearningRate 0.0835   Epoch: 1   Global Step: 71590   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:40,377-Speed 2616.58 samples/sec   Loss 13.1966   LearningRate 0.0835   Epoch: 1   Global Step: 71600   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:44,288-Speed 2619.17 samples/sec   Loss 13.0440   LearningRate 0.0835   Epoch: 1   Global Step: 71610   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:48,188-Speed 2626.96 samples/sec   Loss 13.0663   LearningRate 0.0835   Epoch: 1   Global Step: 71620   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:52,089-Speed 2625.30 samples/sec   Loss 13.0106   LearningRate 0.0835   Epoch: 1   Global Step: 71630   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:56,017-Speed 2607.44 samples/sec   Loss 12.8748   LearningRate 0.0835   Epoch: 1   Global Step: 71640   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:46:59,906-Speed 2633.68 samples/sec   Loss 13.0524   LearningRate 0.0835   Epoch: 1   Global Step: 71650   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:03,815-Speed 2620.76 samples/sec   Loss 13.0083   LearningRate 0.0835   Epoch: 1   Global Step: 71660   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:07,722-Speed 2621.30 samples/sec   Loss 12.9992   LearningRate 0.0835   Epoch: 1   Global Step: 71670   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:11,632-Speed 2619.45 samples/sec   Loss 13.1196   LearningRate 0.0835   Epoch: 1   Global Step: 71680   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:15,538-Speed 2622.35 samples/sec   Loss 12.8769   LearningRate 0.0835   Epoch: 1   Global Step: 71690   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:19,464-Speed 2608.92 samples/sec   Loss 13.0032   LearningRate 0.0835   Epoch: 1   Global Step: 71700   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:23,371-Speed 2621.56 samples/sec   Loss 13.0109   LearningRate 0.0835   Epoch: 1   Global Step: 71710   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:27,283-Speed 2618.21 samples/sec   Loss 12.8153   LearningRate 0.0835   Epoch: 1   Global Step: 71720   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:31,194-Speed 2618.95 samples/sec   Loss 12.9902   LearningRate 0.0835   Epoch: 1   Global Step: 71730   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:35,104-Speed 2619.32 samples/sec   Loss 12.9769   LearningRate 0.0835   Epoch: 1   Global Step: 71740   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:39,016-Speed 2617.86 samples/sec   Loss 12.9053   LearningRate 0.0835   Epoch: 1   Global Step: 71750   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:47:42,908-Speed 2632.48 samples/sec   Loss 13.2114   LearningRate 0.0834   Epoch: 1   Global Step: 71760   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:46,814-Speed 2622.13 samples/sec   Loss 12.9456   LearningRate 0.0834   Epoch: 1   Global Step: 71770   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:50,720-Speed 2622.37 samples/sec   Loss 12.9738   LearningRate 0.0834   Epoch: 1   Global Step: 71780   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:54,632-Speed 2617.68 samples/sec   Loss 13.1288   LearningRate 0.0834   Epoch: 1   Global Step: 71790   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:47:58,536-Speed 2624.09 samples/sec   Loss 12.9503   LearningRate 0.0834   Epoch: 1   Global Step: 71800   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:02,437-Speed 2625.76 samples/sec   Loss 12.9352   LearningRate 0.0834   Epoch: 1   Global Step: 71810   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:06,343-Speed 2622.23 samples/sec   Loss 13.1079   LearningRate 0.0834   Epoch: 1   Global Step: 71820   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:10,248-Speed 2622.72 samples/sec   Loss 13.2199   LearningRate 0.0834   Epoch: 1   Global Step: 71830   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:14,170-Speed 2612.04 samples/sec   Loss 12.9951   LearningRate 0.0834   Epoch: 1   Global Step: 71840   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:18,079-Speed 2620.17 samples/sec   Loss 13.0810   LearningRate 0.0834   Epoch: 1   Global Step: 71850   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:21,979-Speed 2626.06 samples/sec   Loss 13.2017   LearningRate 0.0834   Epoch: 1   Global Step: 71860   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:48:25,882-Speed 2624.86 samples/sec   Loss 12.9884   LearningRate 0.0834   Epoch: 1   Global Step: 71870   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:48:29,797-Speed 2615.90 samples/sec   Loss 13.0071   LearningRate 0.0834   Epoch: 1   Global Step: 71880   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:48:33,692-Speed 2629.91 samples/sec   Loss 13.0966   LearningRate 0.0834   Epoch: 1   Global Step: 71890   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:37,605-Speed 2617.27 samples/sec   Loss 13.0127   LearningRate 0.0834   Epoch: 1   Global Step: 71900   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:41,520-Speed 2616.62 samples/sec   Loss 13.2090   LearningRate 0.0834   Epoch: 1   Global Step: 71910   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:45,442-Speed 2610.92 samples/sec   Loss 12.9306   LearningRate 0.0834   Epoch: 1   Global Step: 71920   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:49,358-Speed 2615.51 samples/sec   Loss 13.0771   LearningRate 0.0834   Epoch: 1   Global Step: 71930   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:53,279-Speed 2612.62 samples/sec   Loss 13.0023   LearningRate 0.0834   Epoch: 1   Global Step: 71940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:48:57,189-Speed 2619.00 samples/sec   Loss 12.9288   LearningRate 0.0834   Epoch: 1   Global Step: 71950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:49:01,092-Speed 2624.31 samples/sec   Loss 12.9227   LearningRate 0.0834   Epoch: 1   Global Step: 71960   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:49:04,997-Speed 2623.11 samples/sec   Loss 13.1241   LearningRate 0.0834   Epoch: 1   Global Step: 71970   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:49:08,924-Speed 2608.22 samples/sec   Loss 12.9917   LearningRate 0.0834   Epoch: 1   Global Step: 71980   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:49:12,894-Speed 2579.65 samples/sec   Loss 12.9038   LearningRate 0.0834   Epoch: 1   Global Step: 71990   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:16,814-Speed 2613.26 samples/sec   Loss 13.0172   LearningRate 0.0834   Epoch: 1   Global Step: 72000   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:20,723-Speed 2619.86 samples/sec   Loss 12.9815   LearningRate 0.0834   Epoch: 1   Global Step: 72010   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:24,631-Speed 2620.94 samples/sec   Loss 12.9939   LearningRate 0.0834   Epoch: 1   Global Step: 72020   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:28,550-Speed 2613.56 samples/sec   Loss 12.9063   LearningRate 0.0834   Epoch: 1   Global Step: 72030   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:32,457-Speed 2621.38 samples/sec   Loss 13.0332   LearningRate 0.0834   Epoch: 1   Global Step: 72040   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:36,368-Speed 2618.96 samples/sec   Loss 12.9374   LearningRate 0.0834   Epoch: 1   Global Step: 72050   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:40,280-Speed 2617.84 samples/sec   Loss 12.9209   LearningRate 0.0834   Epoch: 1   Global Step: 72060   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:44,187-Speed 2621.43 samples/sec   Loss 12.9468   LearningRate 0.0834   Epoch: 1   Global Step: 72070   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:49:48,081-Speed 2630.83 samples/sec   Loss 12.9884   LearningRate 0.0834   Epoch: 1   Global Step: 72080   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:49:51,987-Speed 2622.55 samples/sec   Loss 12.8867   LearningRate 0.0834   Epoch: 1   Global Step: 72090   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:49:55,894-Speed 2621.18 samples/sec   Loss 12.9805   LearningRate 0.0834   Epoch: 1   Global Step: 72100   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:49:59,799-Speed 2622.65 samples/sec   Loss 13.0478   LearningRate 0.0834   Epoch: 1   Global Step: 72110   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:03,706-Speed 2622.16 samples/sec   Loss 12.8653   LearningRate 0.0834   Epoch: 1   Global Step: 72120   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:07,620-Speed 2616.71 samples/sec   Loss 13.0628   LearningRate 0.0834   Epoch: 1   Global Step: 72130   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:11,528-Speed 2620.60 samples/sec   Loss 12.9504   LearningRate 0.0834   Epoch: 1   Global Step: 72140   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:15,454-Speed 2608.29 samples/sec   Loss 12.9661   LearningRate 0.0834   Epoch: 1   Global Step: 72150   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:19,362-Speed 2621.89 samples/sec   Loss 13.0323   LearningRate 0.0834   Epoch: 1   Global Step: 72160   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:23,272-Speed 2619.98 samples/sec   Loss 13.0328   LearningRate 0.0834   Epoch: 1   Global Step: 72170   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:27,222-Speed 2593.31 samples/sec   Loss 13.0700   LearningRate 0.0834   Epoch: 1   Global Step: 72180   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:50:31,130-Speed 2620.43 samples/sec   Loss 12.9633   LearningRate 0.0834   Epoch: 1   Global Step: 72190   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:50:35,046-Speed 2615.73 samples/sec   Loss 13.0256   LearningRate 0.0834   Epoch: 1   Global Step: 72200   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:50:38,962-Speed 2615.91 samples/sec   Loss 13.0423   LearningRate 0.0833   Epoch: 1   Global Step: 72210   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:50:42,875-Speed 2617.20 samples/sec   Loss 12.9897   LearningRate 0.0833   Epoch: 1   Global Step: 72220   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:50:46,799-Speed 2610.19 samples/sec   Loss 13.1702   LearningRate 0.0833   Epoch: 1   Global Step: 72230   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:50:50,716-Speed 2615.01 samples/sec   Loss 13.0477   LearningRate 0.0833   Epoch: 1   Global Step: 72240   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:50:54,623-Speed 2621.34 samples/sec   Loss 12.9828   LearningRate 0.0833   Epoch: 1   Global Step: 72250   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:50:58,533-Speed 2619.82 samples/sec   Loss 12.9833   LearningRate 0.0833   Epoch: 1   Global Step: 72260   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:51:02,447-Speed 2616.45 samples/sec   Loss 12.8792   LearningRate 0.0833   Epoch: 1   Global Step: 72270   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:51:06,357-Speed 2619.83 samples/sec   Loss 12.8899   LearningRate 0.0833   Epoch: 1   Global Step: 72280   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:51:10,267-Speed 2619.45 samples/sec   Loss 13.0773   LearningRate 0.0833   Epoch: 1   Global Step: 72290   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:51:14,173-Speed 2622.24 samples/sec   Loss 13.0357   LearningRate 0.0833   Epoch: 1   Global Step: 72300   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:51:18,061-Speed 2633.97 samples/sec   Loss 13.0064   LearningRate 0.0833   Epoch: 1   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:22,028-Speed 2582.44 samples/sec   Loss 13.0352   LearningRate 0.0833   Epoch: 1   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:25,944-Speed 2615.12 samples/sec   Loss 13.0906   LearningRate 0.0833   Epoch: 1   Global Step: 72330   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:29,861-Speed 2614.83 samples/sec   Loss 13.0043   LearningRate 0.0833   Epoch: 1   Global Step: 72340   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:33,778-Speed 2615.11 samples/sec   Loss 12.9256   LearningRate 0.0833   Epoch: 1   Global Step: 72350   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:37,697-Speed 2613.53 samples/sec   Loss 12.9989   LearningRate 0.0833   Epoch: 1   Global Step: 72360   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:41,615-Speed 2614.25 samples/sec   Loss 13.0581   LearningRate 0.0833   Epoch: 1   Global Step: 72370   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:45,541-Speed 2608.68 samples/sec   Loss 12.9405   LearningRate 0.0833   Epoch: 1   Global Step: 72380   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:49,458-Speed 2615.16 samples/sec   Loss 13.0349   LearningRate 0.0833   Epoch: 1   Global Step: 72390   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:53,371-Speed 2616.86 samples/sec   Loss 13.0696   LearningRate 0.0833   Epoch: 1   Global Step: 72400   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:51:57,279-Speed 2621.57 samples/sec   Loss 13.0077   LearningRate 0.0833   Epoch: 1   Global Step: 72410   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:52:01,203-Speed 2610.29 samples/sec   Loss 12.9117   LearningRate 0.0833   Epoch: 1   Global Step: 72420   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:05,137-Speed 2603.70 samples/sec   Loss 12.9897   LearningRate 0.0833   Epoch: 1   Global Step: 72430   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:09,044-Speed 2621.38 samples/sec   Loss 12.9914   LearningRate 0.0833   Epoch: 1   Global Step: 72440   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:12,949-Speed 2623.71 samples/sec   Loss 13.0902   LearningRate 0.0833   Epoch: 1   Global Step: 72450   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:16,851-Speed 2624.36 samples/sec   Loss 12.9021   LearningRate 0.0833   Epoch: 1   Global Step: 72460   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:20,802-Speed 2592.93 samples/sec   Loss 12.9663   LearningRate 0.0833   Epoch: 1   Global Step: 72470   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:24,706-Speed 2623.18 samples/sec   Loss 12.8441   LearningRate 0.0833   Epoch: 1   Global Step: 72480   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:28,612-Speed 2622.57 samples/sec   Loss 13.1670   LearningRate 0.0833   Epoch: 1   Global Step: 72490   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:32,580-Speed 2581.66 samples/sec   Loss 12.9061   LearningRate 0.0833   Epoch: 1   Global Step: 72500   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:36,478-Speed 2627.37 samples/sec   Loss 13.0163   LearningRate 0.0833   Epoch: 1   Global Step: 72510   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:52:40,386-Speed 2621.08 samples/sec   Loss 12.9440   LearningRate 0.0833   Epoch: 1   Global Step: 72520   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:52:44,289-Speed 2624.01 samples/sec   Loss 13.0597   LearningRate 0.0833   Epoch: 1   Global Step: 72530   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:52:48,215-Speed 2608.91 samples/sec   Loss 12.9976   LearningRate 0.0833   Epoch: 1   Global Step: 72540   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:52:52,116-Speed 2625.85 samples/sec   Loss 13.0762   LearningRate 0.0833   Epoch: 1   Global Step: 72550   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:52:56,040-Speed 2610.20 samples/sec   Loss 12.9498   LearningRate 0.0833   Epoch: 1   Global Step: 72560   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:52:59,946-Speed 2622.26 samples/sec   Loss 12.9823   LearningRate 0.0833   Epoch: 1   Global Step: 72570   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:03,875-Speed 2606.74 samples/sec   Loss 12.9565   LearningRate 0.0833   Epoch: 1   Global Step: 72580   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:07,781-Speed 2622.36 samples/sec   Loss 12.7562   LearningRate 0.0833   Epoch: 1   Global Step: 72590   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:11,683-Speed 2624.81 samples/sec   Loss 12.9274   LearningRate 0.0833   Epoch: 1   Global Step: 72600   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:15,588-Speed 2623.34 samples/sec   Loss 13.0075   LearningRate 0.0833   Epoch: 1   Global Step: 72610   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:19,489-Speed 2625.70 samples/sec   Loss 13.0361   LearningRate 0.0833   Epoch: 1   Global Step: 72620   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:23,400-Speed 2618.28 samples/sec   Loss 12.9731   LearningRate 0.0833   Epoch: 1   Global Step: 72630   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:27,407-Speed 2556.81 samples/sec   Loss 13.0541   LearningRate 0.0833   Epoch: 1   Global Step: 72640   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:31,384-Speed 2575.20 samples/sec   Loss 13.1481   LearningRate 0.0833   Epoch: 1   Global Step: 72650   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:35,295-Speed 2618.93 samples/sec   Loss 13.0443   LearningRate 0.0832   Epoch: 1   Global Step: 72660   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:53:39,209-Speed 2617.22 samples/sec   Loss 12.9733   LearningRate 0.0832   Epoch: 1   Global Step: 72670   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:53:43,117-Speed 2621.16 samples/sec   Loss 12.8914   LearningRate 0.0832   Epoch: 1   Global Step: 72680   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:53:47,015-Speed 2627.14 samples/sec   Loss 12.8985   LearningRate 0.0832   Epoch: 1   Global Step: 72690   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:53:50,951-Speed 2602.25 samples/sec   Loss 13.0475   LearningRate 0.0832   Epoch: 1   Global Step: 72700   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:53:54,858-Speed 2621.90 samples/sec   Loss 13.1769   LearningRate 0.0832   Epoch: 1   Global Step: 72710   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:53:58,755-Speed 2628.90 samples/sec   Loss 13.0852   LearningRate 0.0832   Epoch: 1   Global Step: 72720   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:54:02,656-Speed 2625.59 samples/sec   Loss 12.9573   LearningRate 0.0832   Epoch: 1   Global Step: 72730   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:54:06,559-Speed 2623.48 samples/sec   Loss 13.0303   LearningRate 0.0832   Epoch: 1   Global Step: 72740   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:54:10,554-Speed 2564.05 samples/sec   Loss 12.9100   LearningRate 0.0832   Epoch: 1   Global Step: 72750   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:54:14,467-Speed 2618.32 samples/sec   Loss 13.1001   LearningRate 0.0832   Epoch: 1   Global Step: 72760   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:54:18,372-Speed 2622.96 samples/sec   Loss 12.9049   LearningRate 0.0832   Epoch: 1   Global Step: 72770   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:22,285-Speed 2617.57 samples/sec   Loss 12.9840   LearningRate 0.0832   Epoch: 1   Global Step: 72780   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:26,186-Speed 2625.30 samples/sec   Loss 12.9818   LearningRate 0.0832   Epoch: 1   Global Step: 72790   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:30,085-Speed 2627.55 samples/sec   Loss 12.9589   LearningRate 0.0832   Epoch: 1   Global Step: 72800   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:33,996-Speed 2618.79 samples/sec   Loss 12.9186   LearningRate 0.0832   Epoch: 1   Global Step: 72810   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:37,903-Speed 2621.84 samples/sec   Loss 13.0291   LearningRate 0.0832   Epoch: 1   Global Step: 72820   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:41,816-Speed 2616.98 samples/sec   Loss 12.8769   LearningRate 0.0832   Epoch: 1   Global Step: 72830   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:45,724-Speed 2621.45 samples/sec   Loss 13.0056   LearningRate 0.0832   Epoch: 1   Global Step: 72840   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:49,638-Speed 2616.26 samples/sec   Loss 12.9126   LearningRate 0.0832   Epoch: 1   Global Step: 72850   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:53,541-Speed 2624.50 samples/sec   Loss 13.0389   LearningRate 0.0832   Epoch: 1   Global Step: 72860   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:54:57,430-Speed 2633.91 samples/sec   Loss 13.1488   LearningRate 0.0832   Epoch: 1   Global Step: 72870   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:55:01,330-Speed 2626.28 samples/sec   Loss 13.1695   LearningRate 0.0832   Epoch: 1   Global Step: 72880   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:55:05,213-Speed 2637.87 samples/sec   Loss 12.9935   LearningRate 0.0832   Epoch: 1   Global Step: 72890   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:55:09,139-Speed 2609.34 samples/sec   Loss 12.9451   LearningRate 0.0832   Epoch: 1   Global Step: 72900   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:55:13,043-Speed 2623.64 samples/sec   Loss 12.8959   LearningRate 0.0832   Epoch: 1   Global Step: 72910   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:55:16,945-Speed 2624.89 samples/sec   Loss 12.9202   LearningRate 0.0832   Epoch: 1   Global Step: 72920   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:55:20,849-Speed 2623.03 samples/sec   Loss 12.9402   LearningRate 0.0832   Epoch: 1   Global Step: 72930   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:55:24,738-Speed 2634.12 samples/sec   Loss 13.1346   LearningRate 0.0832   Epoch: 1   Global Step: 72940   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:28,660-Speed 2611.62 samples/sec   Loss 12.9315   LearningRate 0.0832   Epoch: 1   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:32,563-Speed 2624.35 samples/sec   Loss 12.9767   LearningRate 0.0832   Epoch: 1   Global Step: 72960   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:36,522-Speed 2587.39 samples/sec   Loss 12.9755   LearningRate 0.0832   Epoch: 1   Global Step: 72970   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:40,439-Speed 2614.79 samples/sec   Loss 12.7887   LearningRate 0.0832   Epoch: 1   Global Step: 72980   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:44,358-Speed 2613.41 samples/sec   Loss 12.8928   LearningRate 0.0832   Epoch: 1   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:48,269-Speed 2619.27 samples/sec   Loss 13.1790   LearningRate 0.0832   Epoch: 1   Global Step: 73000   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:52,178-Speed 2620.01 samples/sec   Loss 12.9242   LearningRate 0.0832   Epoch: 1   Global Step: 73010   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:55:56,118-Speed 2599.33 samples/sec   Loss 12.9067   LearningRate 0.0832   Epoch: 1   Global Step: 73020   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:56:00,026-Speed 2622.18 samples/sec   Loss 13.0404   LearningRate 0.0832   Epoch: 1   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:56:03,936-Speed 2619.83 samples/sec   Loss 12.8179   LearningRate 0.0832   Epoch: 1   Global Step: 73040   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:07,831-Speed 2629.53 samples/sec   Loss 13.0079   LearningRate 0.0832   Epoch: 1   Global Step: 73050   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:11,729-Speed 2627.45 samples/sec   Loss 12.7579   LearningRate 0.0832   Epoch: 1   Global Step: 73060   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:15,630-Speed 2625.60 samples/sec   Loss 12.9366   LearningRate 0.0832   Epoch: 1   Global Step: 73070   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:19,538-Speed 2620.53 samples/sec   Loss 13.0977   LearningRate 0.0832   Epoch: 1   Global Step: 73080   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:23,449-Speed 2618.99 samples/sec   Loss 12.8744   LearningRate 0.0832   Epoch: 1   Global Step: 73090   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:27,347-Speed 2627.71 samples/sec   Loss 12.8252   LearningRate 0.0832   Epoch: 1   Global Step: 73100   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:31,250-Speed 2624.42 samples/sec   Loss 12.9396   LearningRate 0.0832   Epoch: 1   Global Step: 73110   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:35,147-Speed 2628.47 samples/sec   Loss 12.8883   LearningRate 0.0831   Epoch: 1   Global Step: 73120   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:39,046-Speed 2626.74 samples/sec   Loss 12.8298   LearningRate 0.0831   Epoch: 1   Global Step: 73130   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:42,948-Speed 2625.42 samples/sec   Loss 12.8543   LearningRate 0.0831   Epoch: 1   Global Step: 73140   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:56:46,847-Speed 2627.22 samples/sec   Loss 12.7675   LearningRate 0.0831   Epoch: 1   Global Step: 73150   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:56:50,757-Speed 2619.48 samples/sec   Loss 12.9283   LearningRate 0.0831   Epoch: 1   Global Step: 73160   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:56:54,654-Speed 2628.14 samples/sec   Loss 12.9378   LearningRate 0.0831   Epoch: 1   Global Step: 73170   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:56:58,558-Speed 2623.59 samples/sec   Loss 12.9207   LearningRate 0.0831   Epoch: 1   Global Step: 73180   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:57:02,457-Speed 2626.96 samples/sec   Loss 13.0012   LearningRate 0.0831   Epoch: 1   Global Step: 73190   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:57:06,336-Speed 2640.18 samples/sec   Loss 12.9415   LearningRate 0.0831   Epoch: 1   Global Step: 73200   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:10,234-Speed 2628.23 samples/sec   Loss 13.0178   LearningRate 0.0831   Epoch: 1   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:14,133-Speed 2627.57 samples/sec   Loss 13.0148   LearningRate 0.0831   Epoch: 1   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:18,030-Speed 2628.33 samples/sec   Loss 12.7996   LearningRate 0.0831   Epoch: 1   Global Step: 73230   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:21,928-Speed 2627.73 samples/sec   Loss 12.9385   LearningRate 0.0831   Epoch: 1   Global Step: 73240   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:25,829-Speed 2625.28 samples/sec   Loss 12.9480   LearningRate 0.0831   Epoch: 1   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:29,732-Speed 2625.12 samples/sec   Loss 12.9736   LearningRate 0.0831   Epoch: 1   Global Step: 73260   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:33,629-Speed 2628.09 samples/sec   Loss 13.0886   LearningRate 0.0831   Epoch: 1   Global Step: 73270   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:37,524-Speed 2629.30 samples/sec   Loss 12.8796   LearningRate 0.0831   Epoch: 1   Global Step: 73280   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:41,422-Speed 2627.57 samples/sec   Loss 12.9515   LearningRate 0.0831   Epoch: 1   Global Step: 73290   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:57:45,320-Speed 2627.90 samples/sec   Loss 13.1038   LearningRate 0.0831   Epoch: 1   Global Step: 73300   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:57:49,224-Speed 2623.72 samples/sec   Loss 13.0709   LearningRate 0.0831   Epoch: 1   Global Step: 73310   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:57:53,125-Speed 2625.61 samples/sec   Loss 13.0397   LearningRate 0.0831   Epoch: 1   Global Step: 73320   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:57:57,026-Speed 2625.83 samples/sec   Loss 13.0076   LearningRate 0.0831   Epoch: 1   Global Step: 73330   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:00,928-Speed 2624.74 samples/sec   Loss 13.1364   LearningRate 0.0831   Epoch: 1   Global Step: 73340   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:04,842-Speed 2617.02 samples/sec   Loss 13.0791   LearningRate 0.0831   Epoch: 1   Global Step: 73350   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:08,738-Speed 2628.32 samples/sec   Loss 12.8108   LearningRate 0.0831   Epoch: 1   Global Step: 73360   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:12,644-Speed 2622.76 samples/sec   Loss 13.1194   LearningRate 0.0831   Epoch: 1   Global Step: 73370   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:16,550-Speed 2621.94 samples/sec   Loss 12.9665   LearningRate 0.0831   Epoch: 1   Global Step: 73380   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:20,462-Speed 2618.65 samples/sec   Loss 13.0720   LearningRate 0.0831   Epoch: 1   Global Step: 73390   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:24,361-Speed 2627.10 samples/sec   Loss 13.0093   LearningRate 0.0831   Epoch: 1   Global Step: 73400   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:58:28,242-Speed 2639.35 samples/sec   Loss 12.9860   LearningRate 0.0831   Epoch: 1   Global Step: 73410   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:32,138-Speed 2628.63 samples/sec   Loss 12.8663   LearningRate 0.0831   Epoch: 1   Global Step: 73420   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:36,039-Speed 2625.35 samples/sec   Loss 12.9357   LearningRate 0.0831   Epoch: 1   Global Step: 73430   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:39,942-Speed 2624.16 samples/sec   Loss 13.0083   LearningRate 0.0831   Epoch: 1   Global Step: 73440   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:43,857-Speed 2616.11 samples/sec   Loss 12.9607   LearningRate 0.0831   Epoch: 1   Global Step: 73450   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:47,769-Speed 2618.54 samples/sec   Loss 13.0027   LearningRate 0.0831   Epoch: 1   Global Step: 73460   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:51,664-Speed 2629.86 samples/sec   Loss 12.9760   LearningRate 0.0831   Epoch: 1   Global Step: 73470   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:55,570-Speed 2622.46 samples/sec   Loss 13.0089   LearningRate 0.0831   Epoch: 1   Global Step: 73480   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:58:59,483-Speed 2617.52 samples/sec   Loss 12.9755   LearningRate 0.0831   Epoch: 1   Global Step: 73490   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:59:03,383-Speed 2626.00 samples/sec   Loss 12.8638   LearningRate 0.0831   Epoch: 1   Global Step: 73500   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:59:07,293-Speed 2619.81 samples/sec   Loss 13.0106   LearningRate 0.0831   Epoch: 1   Global Step: 73510   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:11,193-Speed 2625.84 samples/sec   Loss 13.0236   LearningRate 0.0831   Epoch: 1   Global Step: 73520   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:15,093-Speed 2626.59 samples/sec   Loss 12.8997   LearningRate 0.0831   Epoch: 1   Global Step: 73530   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:18,995-Speed 2624.88 samples/sec   Loss 13.0566   LearningRate 0.0831   Epoch: 1   Global Step: 73540   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:22,892-Speed 2628.68 samples/sec   Loss 13.0165   LearningRate 0.0831   Epoch: 1   Global Step: 73550   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:26,792-Speed 2626.35 samples/sec   Loss 13.0678   LearningRate 0.0831   Epoch: 1   Global Step: 73560   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:30,695-Speed 2623.88 samples/sec   Loss 12.9029   LearningRate 0.0830   Epoch: 1   Global Step: 73570   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:34,616-Speed 2612.23 samples/sec   Loss 13.1736   LearningRate 0.0830   Epoch: 1   Global Step: 73580   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:38,516-Speed 2626.32 samples/sec   Loss 13.0127   LearningRate 0.0830   Epoch: 1   Global Step: 73590   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 03:59:42,411-Speed 2629.54 samples/sec   Loss 13.0338   LearningRate 0.0830   Epoch: 1   Global Step: 73600   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 03:59:46,292-Speed 2639.82 samples/sec   Loss 12.9498   LearningRate 0.0830   Epoch: 1   Global Step: 73610   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:59:50,205-Speed 2617.56 samples/sec   Loss 12.9664   LearningRate 0.0830   Epoch: 1   Global Step: 73620   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:59:54,101-Speed 2629.00 samples/sec   Loss 13.0551   LearningRate 0.0830   Epoch: 1   Global Step: 73630   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 03:59:58,027-Speed 2609.29 samples/sec   Loss 12.8057   LearningRate 0.0830   Epoch: 1   Global Step: 73640   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:00:01,922-Speed 2629.72 samples/sec   Loss 13.0089   LearningRate 0.0830   Epoch: 1   Global Step: 73650   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:00:05,882-Speed 2586.27 samples/sec   Loss 13.0439   LearningRate 0.0830   Epoch: 1   Global Step: 73660   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:00:09,796-Speed 2616.43 samples/sec   Loss 13.0522   LearningRate 0.0830   Epoch: 1   Global Step: 73670   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:00:13,693-Speed 2628.49 samples/sec   Loss 13.0239   LearningRate 0.0830   Epoch: 1   Global Step: 73680   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:00:17,593-Speed 2626.55 samples/sec   Loss 12.9865   LearningRate 0.0830   Epoch: 1   Global Step: 73690   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:00:21,490-Speed 2627.85 samples/sec   Loss 12.8370   LearningRate 0.0830   Epoch: 1   Global Step: 73700   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:00:25,404-Speed 2617.43 samples/sec   Loss 12.8400   LearningRate 0.0830   Epoch: 1   Global Step: 73710   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:29,307-Speed 2624.82 samples/sec   Loss 12.9362   LearningRate 0.0830   Epoch: 1   Global Step: 73720   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:33,203-Speed 2628.85 samples/sec   Loss 12.8997   LearningRate 0.0830   Epoch: 1   Global Step: 73730   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:37,105-Speed 2624.76 samples/sec   Loss 12.9473   LearningRate 0.0830   Epoch: 1   Global Step: 73740   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:41,009-Speed 2623.80 samples/sec   Loss 13.0901   LearningRate 0.0830   Epoch: 1   Global Step: 73750   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:44,906-Speed 2628.48 samples/sec   Loss 13.0820   LearningRate 0.0830   Epoch: 1   Global Step: 73760   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:48,808-Speed 2625.26 samples/sec   Loss 12.9228   LearningRate 0.0830   Epoch: 1   Global Step: 73770   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:52,720-Speed 2618.24 samples/sec   Loss 12.9271   LearningRate 0.0830   Epoch: 1   Global Step: 73780   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:00:56,616-Speed 2628.82 samples/sec   Loss 12.8876   LearningRate 0.0830   Epoch: 1   Global Step: 73790   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:00,523-Speed 2621.79 samples/sec   Loss 13.0310   LearningRate 0.0830   Epoch: 1   Global Step: 73800   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:04,421-Speed 2628.07 samples/sec   Loss 13.0298   LearningRate 0.0830   Epoch: 1   Global Step: 73810   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:01:08,322-Speed 2625.31 samples/sec   Loss 12.8349   LearningRate 0.0830   Epoch: 1   Global Step: 73820   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:01:12,221-Speed 2627.27 samples/sec   Loss 12.9809   LearningRate 0.0830   Epoch: 1   Global Step: 73830   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:01:16,124-Speed 2624.23 samples/sec   Loss 12.8596   LearningRate 0.0830   Epoch: 1   Global Step: 73840   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:01:20,042-Speed 2614.64 samples/sec   Loss 12.9964   LearningRate 0.0830   Epoch: 1   Global Step: 73850   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:01:23,933-Speed 2631.88 samples/sec   Loss 12.8064   LearningRate 0.0830   Epoch: 1   Global Step: 73860   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:01:27,819-Speed 2635.85 samples/sec   Loss 12.9793   LearningRate 0.0830   Epoch: 1   Global Step: 73870   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:31,718-Speed 2627.08 samples/sec   Loss 12.9107   LearningRate 0.0830   Epoch: 1   Global Step: 73880   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:35,616-Speed 2627.74 samples/sec   Loss 13.0090   LearningRate 0.0830   Epoch: 1   Global Step: 73890   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:39,516-Speed 2626.48 samples/sec   Loss 12.9262   LearningRate 0.0830   Epoch: 1   Global Step: 73900   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:43,441-Speed 2609.69 samples/sec   Loss 12.9090   LearningRate 0.0830   Epoch: 1   Global Step: 73910   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:47,336-Speed 2630.11 samples/sec   Loss 12.9740   LearningRate 0.0830   Epoch: 1   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:51,241-Speed 2622.61 samples/sec   Loss 13.0182   LearningRate 0.0830   Epoch: 1   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:55,139-Speed 2627.52 samples/sec   Loss 13.0015   LearningRate 0.0830   Epoch: 1   Global Step: 73940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:01:59,063-Speed 2610.57 samples/sec   Loss 13.0284   LearningRate 0.0830   Epoch: 1   Global Step: 73950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:02:02,974-Speed 2618.94 samples/sec   Loss 12.9505   LearningRate 0.0830   Epoch: 1   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:02:06,887-Speed 2617.67 samples/sec   Loss 12.6597   LearningRate 0.0830   Epoch: 1   Global Step: 73970   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:10,792-Speed 2623.38 samples/sec   Loss 13.0241   LearningRate 0.0830   Epoch: 1   Global Step: 73980   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:14,692-Speed 2626.40 samples/sec   Loss 13.0943   LearningRate 0.0830   Epoch: 1   Global Step: 73990   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:18,624-Speed 2605.02 samples/sec   Loss 12.9721   LearningRate 0.0830   Epoch: 1   Global Step: 74000   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:22,540-Speed 2615.24 samples/sec   Loss 12.9078   LearningRate 0.0830   Epoch: 1   Global Step: 74010   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:26,459-Speed 2613.70 samples/sec   Loss 12.8438   LearningRate 0.0830   Epoch: 1   Global Step: 74020   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:30,394-Speed 2603.30 samples/sec   Loss 12.8935   LearningRate 0.0829   Epoch: 1   Global Step: 74030   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:34,311-Speed 2615.14 samples/sec   Loss 12.9146   LearningRate 0.0829   Epoch: 1   Global Step: 74040   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:38,224-Speed 2617.09 samples/sec   Loss 12.7887   LearningRate 0.0829   Epoch: 1   Global Step: 74050   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:42,135-Speed 2619.00 samples/sec   Loss 12.8489   LearningRate 0.0829   Epoch: 1   Global Step: 74060   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:46,018-Speed 2638.21 samples/sec   Loss 12.9093   LearningRate 0.0829   Epoch: 1   Global Step: 74070   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:49,917-Speed 2626.82 samples/sec   Loss 12.9467   LearningRate 0.0829   Epoch: 1   Global Step: 74080   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:53,813-Speed 2629.20 samples/sec   Loss 12.9416   LearningRate 0.0829   Epoch: 1   Global Step: 74090   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:02:57,719-Speed 2622.05 samples/sec   Loss 12.9548   LearningRate 0.0829   Epoch: 1   Global Step: 74100   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:01,622-Speed 2624.69 samples/sec   Loss 12.9958   LearningRate 0.0829   Epoch: 1   Global Step: 74110   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:05,547-Speed 2609.49 samples/sec   Loss 13.0601   LearningRate 0.0829   Epoch: 1   Global Step: 74120   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:09,449-Speed 2625.46 samples/sec   Loss 12.8954   LearningRate 0.0829   Epoch: 1   Global Step: 74130   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:13,348-Speed 2626.80 samples/sec   Loss 13.0228   LearningRate 0.0829   Epoch: 1   Global Step: 74140   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:17,248-Speed 2626.23 samples/sec   Loss 13.0373   LearningRate 0.0829   Epoch: 1   Global Step: 74150   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:21,156-Speed 2620.53 samples/sec   Loss 13.0071   LearningRate 0.0829   Epoch: 1   Global Step: 74160   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:25,060-Speed 2623.96 samples/sec   Loss 12.9844   LearningRate 0.0829   Epoch: 1   Global Step: 74170   Fp16 Grad Scale: 524288   Required: 85 hours
Training: 2022-04-13 04:03:28,948-Speed 2634.95 samples/sec   Loss 13.0483   LearningRate 0.0829   Epoch: 1   Global Step: 74180   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:32,844-Speed 2628.47 samples/sec   Loss 13.0756   LearningRate 0.0829   Epoch: 1   Global Step: 74190   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:36,745-Speed 2626.07 samples/sec   Loss 12.9799   LearningRate 0.0829   Epoch: 1   Global Step: 74200   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:40,648-Speed 2624.32 samples/sec   Loss 13.0124   LearningRate 0.0829   Epoch: 1   Global Step: 74210   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:44,556-Speed 2620.93 samples/sec   Loss 12.9472   LearningRate 0.0829   Epoch: 1   Global Step: 74220   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:48,501-Speed 2596.75 samples/sec   Loss 12.8661   LearningRate 0.0829   Epoch: 1   Global Step: 74230   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:52,549-Speed 2529.94 samples/sec   Loss 12.9263   LearningRate 0.0829   Epoch: 1   Global Step: 74240   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:03:56,652-Speed 2496.40 samples/sec   Loss 13.0516   LearningRate 0.0829   Epoch: 1   Global Step: 74250   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:00,754-Speed 2497.48 samples/sec   Loss 12.9259   LearningRate 0.0829   Epoch: 1   Global Step: 74260   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:04,714-Speed 2586.33 samples/sec   Loss 12.9627   LearningRate 0.0829   Epoch: 1   Global Step: 74270   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:08,603-Speed 2633.92 samples/sec   Loss 12.8720   LearningRate 0.0829   Epoch: 1   Global Step: 74280   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:12,630-Speed 2543.17 samples/sec   Loss 12.9356   LearningRate 0.0829   Epoch: 1   Global Step: 74290   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:16,532-Speed 2625.42 samples/sec   Loss 12.9097   LearningRate 0.0829   Epoch: 1   Global Step: 74300   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:20,428-Speed 2629.12 samples/sec   Loss 12.9764   LearningRate 0.0829   Epoch: 1   Global Step: 74310   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:24,323-Speed 2629.68 samples/sec   Loss 12.8967   LearningRate 0.0829   Epoch: 1   Global Step: 74320   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:28,219-Speed 2628.58 samples/sec   Loss 13.0942   LearningRate 0.0829   Epoch: 1   Global Step: 74330   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:32,139-Speed 2612.76 samples/sec   Loss 12.9420   LearningRate 0.0829   Epoch: 1   Global Step: 74340   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:36,031-Speed 2631.97 samples/sec   Loss 12.9298   LearningRate 0.0829   Epoch: 1   Global Step: 74350   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:39,935-Speed 2623.78 samples/sec   Loss 13.0775   LearningRate 0.0829   Epoch: 1   Global Step: 74360   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:43,840-Speed 2622.90 samples/sec   Loss 12.9078   LearningRate 0.0829   Epoch: 1   Global Step: 74370   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:47,738-Speed 2627.81 samples/sec   Loss 12.8495   LearningRate 0.0829   Epoch: 1   Global Step: 74380   Fp16 Grad Scale: 524288   Required: 85 hours
Training: 2022-04-13 04:04:51,630-Speed 2631.34 samples/sec   Loss 12.8456   LearningRate 0.0829   Epoch: 1   Global Step: 74390   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:55,540-Speed 2619.30 samples/sec   Loss 12.9169   LearningRate 0.0829   Epoch: 1   Global Step: 74400   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:04:59,443-Speed 2625.08 samples/sec   Loss 12.9363   LearningRate 0.0829   Epoch: 1   Global Step: 74410   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:03,343-Speed 2626.05 samples/sec   Loss 12.8916   LearningRate 0.0829   Epoch: 1   Global Step: 74420   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:07,240-Speed 2628.06 samples/sec   Loss 13.1244   LearningRate 0.0829   Epoch: 1   Global Step: 74430   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:11,136-Speed 2629.32 samples/sec   Loss 12.8522   LearningRate 0.0829   Epoch: 1   Global Step: 74440   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:15,031-Speed 2629.98 samples/sec   Loss 12.9073   LearningRate 0.0829   Epoch: 1   Global Step: 74450   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:18,930-Speed 2626.88 samples/sec   Loss 13.0176   LearningRate 0.0829   Epoch: 1   Global Step: 74460   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:22,828-Speed 2627.82 samples/sec   Loss 12.8387   LearningRate 0.0829   Epoch: 1   Global Step: 74470   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:26,735-Speed 2620.95 samples/sec   Loss 13.0146   LearningRate 0.0828   Epoch: 1   Global Step: 74480   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:30,612-Speed 2642.87 samples/sec   Loss 12.9990   LearningRate 0.0828   Epoch: 1   Global Step: 74490   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:34,505-Speed 2630.91 samples/sec   Loss 12.9340   LearningRate 0.0828   Epoch: 1   Global Step: 74500   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:38,400-Speed 2629.46 samples/sec   Loss 13.0015   LearningRate 0.0828   Epoch: 1   Global Step: 74510   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:42,317-Speed 2614.73 samples/sec   Loss 12.7979   LearningRate 0.0828   Epoch: 1   Global Step: 74520   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:46,215-Speed 2627.72 samples/sec   Loss 13.0600   LearningRate 0.0828   Epoch: 1   Global Step: 74530   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:50,116-Speed 2625.93 samples/sec   Loss 12.9314   LearningRate 0.0828   Epoch: 1   Global Step: 74540   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:54,013-Speed 2628.05 samples/sec   Loss 12.7550   LearningRate 0.0828   Epoch: 1   Global Step: 74550   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:05:57,912-Speed 2627.09 samples/sec   Loss 12.8127   LearningRate 0.0828   Epoch: 1   Global Step: 74560   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:01,823-Speed 2618.83 samples/sec   Loss 12.9908   LearningRate 0.0828   Epoch: 1   Global Step: 74570   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:05,725-Speed 2625.07 samples/sec   Loss 12.9836   LearningRate 0.0828   Epoch: 1   Global Step: 74580   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:09,620-Speed 2629.44 samples/sec   Loss 12.9281   LearningRate 0.0828   Epoch: 1   Global Step: 74590   Fp16 Grad Scale: 524288   Required: 85 hours
Training: 2022-04-13 04:06:13,503-Speed 2638.43 samples/sec   Loss 12.8689   LearningRate 0.0828   Epoch: 1   Global Step: 74600   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:17,399-Speed 2628.66 samples/sec   Loss 12.9446   LearningRate 0.0828   Epoch: 1   Global Step: 74610   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:21,368-Speed 2580.88 samples/sec   Loss 12.9299   LearningRate 0.0828   Epoch: 1   Global Step: 74620   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:25,283-Speed 2615.86 samples/sec   Loss 13.0423   LearningRate 0.0828   Epoch: 1   Global Step: 74630   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:29,378-Speed 2501.45 samples/sec   Loss 12.8103   LearningRate 0.0828   Epoch: 1   Global Step: 74640   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:33,344-Speed 2582.07 samples/sec   Loss 12.9621   LearningRate 0.0828   Epoch: 1   Global Step: 74650   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:37,241-Speed 2628.25 samples/sec   Loss 12.7751   LearningRate 0.0828   Epoch: 1   Global Step: 74660   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:41,153-Speed 2618.59 samples/sec   Loss 12.9596   LearningRate 0.0828   Epoch: 1   Global Step: 74670   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:06:45,019-Speed 2649.44 samples/sec   Loss 12.6932   LearningRate 0.0828   Epoch: 1   Global Step: 74680   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:06:48,915-Speed 2629.05 samples/sec   Loss 13.1589   LearningRate 0.0828   Epoch: 1   Global Step: 74690   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:06:52,810-Speed 2629.80 samples/sec   Loss 12.8589   LearningRate 0.0828   Epoch: 1   Global Step: 74700   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:06:56,711-Speed 2625.49 samples/sec   Loss 13.0248   LearningRate 0.0828   Epoch: 1   Global Step: 74710   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:07:00,607-Speed 2628.47 samples/sec   Loss 12.8406   LearningRate 0.0828   Epoch: 1   Global Step: 74720   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:07:04,503-Speed 2629.17 samples/sec   Loss 12.8256   LearningRate 0.0828   Epoch: 1   Global Step: 74730   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:07:08,400-Speed 2628.43 samples/sec   Loss 13.0190   LearningRate 0.0828   Epoch: 1   Global Step: 74740   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:07:12,318-Speed 2614.13 samples/sec   Loss 12.7577   LearningRate 0.0828   Epoch: 1   Global Step: 74750   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:07:16,229-Speed 2619.09 samples/sec   Loss 12.9010   LearningRate 0.0828   Epoch: 1   Global Step: 74760   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:07:20,144-Speed 2616.18 samples/sec   Loss 12.9723   LearningRate 0.0828   Epoch: 1   Global Step: 74770   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:07:24,061-Speed 2615.45 samples/sec   Loss 12.8087   LearningRate 0.0828   Epoch: 1   Global Step: 74780   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:27,974-Speed 2617.61 samples/sec   Loss 12.9431   LearningRate 0.0828   Epoch: 1   Global Step: 74790   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:31,883-Speed 2619.57 samples/sec   Loss 13.1382   LearningRate 0.0828   Epoch: 1   Global Step: 74800   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:35,791-Speed 2621.81 samples/sec   Loss 12.8724   LearningRate 0.0828   Epoch: 1   Global Step: 74810   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:39,697-Speed 2621.61 samples/sec   Loss 12.9721   LearningRate 0.0828   Epoch: 1   Global Step: 74820   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:43,619-Speed 2611.97 samples/sec   Loss 13.0499   LearningRate 0.0828   Epoch: 1   Global Step: 74830   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:47,534-Speed 2616.16 samples/sec   Loss 12.9794   LearningRate 0.0828   Epoch: 1   Global Step: 74840   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:51,434-Speed 2626.73 samples/sec   Loss 12.8713   LearningRate 0.0828   Epoch: 1   Global Step: 74850   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:55,351-Speed 2614.78 samples/sec   Loss 12.9171   LearningRate 0.0828   Epoch: 1   Global Step: 74860   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:07:59,260-Speed 2619.91 samples/sec   Loss 12.9061   LearningRate 0.0828   Epoch: 1   Global Step: 74870   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:03,168-Speed 2620.82 samples/sec   Loss 12.9007   LearningRate 0.0828   Epoch: 1   Global Step: 74880   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:08:07,066-Speed 2627.90 samples/sec   Loss 12.9699   LearningRate 0.0828   Epoch: 1   Global Step: 74890   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:08:11,107-Speed 2535.02 samples/sec   Loss 12.9163   LearningRate 0.0828   Epoch: 1   Global Step: 74900   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:08:15,016-Speed 2620.02 samples/sec   Loss 12.9622   LearningRate 0.0828   Epoch: 1   Global Step: 74910   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:08:18,913-Speed 2628.67 samples/sec   Loss 12.9503   LearningRate 0.0828   Epoch: 1   Global Step: 74920   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:08:22,813-Speed 2626.02 samples/sec   Loss 13.1415   LearningRate 0.0828   Epoch: 1   Global Step: 74930   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:08:26,704-Speed 2631.90 samples/sec   Loss 12.9330   LearningRate 0.0827   Epoch: 1   Global Step: 74940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:30,599-Speed 2629.75 samples/sec   Loss 12.9876   LearningRate 0.0827   Epoch: 1   Global Step: 74950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:34,495-Speed 2629.09 samples/sec   Loss 13.0135   LearningRate 0.0827   Epoch: 1   Global Step: 74960   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:38,406-Speed 2618.89 samples/sec   Loss 12.8817   LearningRate 0.0827   Epoch: 1   Global Step: 74970   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:42,340-Speed 2603.55 samples/sec   Loss 12.8650   LearningRate 0.0827   Epoch: 1   Global Step: 74980   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:46,259-Speed 2615.51 samples/sec   Loss 12.8951   LearningRate 0.0827   Epoch: 1   Global Step: 74990   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:50,169-Speed 2619.53 samples/sec   Loss 12.8843   LearningRate 0.0827   Epoch: 1   Global Step: 75000   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:54,064-Speed 2629.46 samples/sec   Loss 12.8575   LearningRate 0.0827   Epoch: 1   Global Step: 75010   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:08:57,962-Speed 2627.52 samples/sec   Loss 12.9021   LearningRate 0.0827   Epoch: 1   Global Step: 75020   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:09:01,861-Speed 2626.60 samples/sec   Loss 12.9454   LearningRate 0.0827   Epoch: 1   Global Step: 75030   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:09:05,770-Speed 2620.23 samples/sec   Loss 12.8399   LearningRate 0.0827   Epoch: 1   Global Step: 75040   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:09:09,667-Speed 2629.59 samples/sec   Loss 12.9840   LearningRate 0.0827   Epoch: 1   Global Step: 75050   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:09:13,561-Speed 2630.19 samples/sec   Loss 12.8546   LearningRate 0.0827   Epoch: 1   Global Step: 75060   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:09:17,458-Speed 2628.23 samples/sec   Loss 12.7977   LearningRate 0.0827   Epoch: 1   Global Step: 75070   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:09:21,355-Speed 2628.63 samples/sec   Loss 12.8885   LearningRate 0.0827   Epoch: 1   Global Step: 75080   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:09:25,252-Speed 2628.15 samples/sec   Loss 13.0399   LearningRate 0.0827   Epoch: 1   Global Step: 75090   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:09:29,182-Speed 2606.59 samples/sec   Loss 12.9009   LearningRate 0.0827   Epoch: 1   Global Step: 75100   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:09:33,061-Speed 2640.51 samples/sec   Loss 12.9559   LearningRate 0.0827   Epoch: 1   Global Step: 75110   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:09:36,945-Speed 2636.95 samples/sec   Loss 12.8602   LearningRate 0.0827   Epoch: 1   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:09:40,843-Speed 2627.29 samples/sec   Loss 13.0446   LearningRate 0.0827   Epoch: 1   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:09:44,776-Speed 2605.15 samples/sec   Loss 12.8849   LearningRate 0.0827   Epoch: 1   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:09:48,678-Speed 2624.91 samples/sec   Loss 12.8681   LearningRate 0.0827   Epoch: 1   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:09:52,588-Speed 2619.36 samples/sec   Loss 13.0149   LearningRate 0.0827   Epoch: 1   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:09:56,500-Speed 2617.99 samples/sec   Loss 12.7904   LearningRate 0.0827   Epoch: 1   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:10:00,409-Speed 2620.58 samples/sec   Loss 12.9949   LearningRate 0.0827   Epoch: 1   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:10:04,318-Speed 2620.33 samples/sec   Loss 13.1000   LearningRate 0.0827   Epoch: 1   Global Step: 75190   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:10:08,239-Speed 2611.97 samples/sec   Loss 12.8884   LearningRate 0.0827   Epoch: 1   Global Step: 75200   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:10:12,144-Speed 2623.30 samples/sec   Loss 12.7336   LearningRate 0.0827   Epoch: 1   Global Step: 75210   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:10:16,042-Speed 2627.95 samples/sec   Loss 12.9436   LearningRate 0.0827   Epoch: 1   Global Step: 75220   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:19,942-Speed 2625.85 samples/sec   Loss 12.9606   LearningRate 0.0827   Epoch: 1   Global Step: 75230   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:23,870-Speed 2607.41 samples/sec   Loss 12.8078   LearningRate 0.0827   Epoch: 1   Global Step: 75240   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:27,797-Speed 2608.25 samples/sec   Loss 12.9223   LearningRate 0.0827   Epoch: 1   Global Step: 75250   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:31,693-Speed 2629.59 samples/sec   Loss 12.9038   LearningRate 0.0827   Epoch: 1   Global Step: 75260   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:35,590-Speed 2628.11 samples/sec   Loss 12.7720   LearningRate 0.0827   Epoch: 1   Global Step: 75270   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:39,486-Speed 2628.71 samples/sec   Loss 12.9828   LearningRate 0.0827   Epoch: 1   Global Step: 75280   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:43,385-Speed 2626.79 samples/sec   Loss 12.7638   LearningRate 0.0827   Epoch: 1   Global Step: 75290   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:47,283-Speed 2627.66 samples/sec   Loss 12.7618   LearningRate 0.0827   Epoch: 1   Global Step: 75300   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:51,192-Speed 2620.07 samples/sec   Loss 12.7347   LearningRate 0.0827   Epoch: 1   Global Step: 75310   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:10:55,091-Speed 2627.24 samples/sec   Loss 12.8552   LearningRate 0.0827   Epoch: 1   Global Step: 75320   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:10:58,990-Speed 2626.80 samples/sec   Loss 12.8754   LearningRate 0.0827   Epoch: 1   Global Step: 75330   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:02,889-Speed 2627.05 samples/sec   Loss 12.8701   LearningRate 0.0827   Epoch: 1   Global Step: 75340   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:06,783-Speed 2630.68 samples/sec   Loss 12.9938   LearningRate 0.0827   Epoch: 1   Global Step: 75350   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:10,681-Speed 2627.49 samples/sec   Loss 13.0116   LearningRate 0.0827   Epoch: 1   Global Step: 75360   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:14,580-Speed 2627.11 samples/sec   Loss 12.7713   LearningRate 0.0827   Epoch: 1   Global Step: 75370   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:18,475-Speed 2629.18 samples/sec   Loss 12.8234   LearningRate 0.0827   Epoch: 1   Global Step: 75380   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:22,373-Speed 2628.11 samples/sec   Loss 12.9131   LearningRate 0.0827   Epoch: 1   Global Step: 75390   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:26,270-Speed 2627.98 samples/sec   Loss 12.9170   LearningRate 0.0826   Epoch: 1   Global Step: 75400   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:30,172-Speed 2625.50 samples/sec   Loss 12.9558   LearningRate 0.0826   Epoch: 1   Global Step: 75410   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:34,062-Speed 2632.86 samples/sec   Loss 12.9823   LearningRate 0.0826   Epoch: 1   Global Step: 75420   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:11:37,968-Speed 2621.74 samples/sec   Loss 12.9421   LearningRate 0.0826   Epoch: 1   Global Step: 75430   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:11:41,867-Speed 2627.27 samples/sec   Loss 12.9147   LearningRate 0.0826   Epoch: 1   Global Step: 75440   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:11:45,774-Speed 2621.90 samples/sec   Loss 12.9664   LearningRate 0.0826   Epoch: 1   Global Step: 75450   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:11:49,673-Speed 2627.09 samples/sec   Loss 12.9832   LearningRate 0.0826   Epoch: 1   Global Step: 75460   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:11:53,575-Speed 2625.10 samples/sec   Loss 12.8034   LearningRate 0.0826   Epoch: 1   Global Step: 75470   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:11:57,479-Speed 2623.44 samples/sec   Loss 12.8575   LearningRate 0.0826   Epoch: 1   Global Step: 75480   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:12:01,392-Speed 2617.78 samples/sec   Loss 12.7460   LearningRate 0.0826   Epoch: 1   Global Step: 75490   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:12:05,297-Speed 2622.72 samples/sec   Loss 12.9629   LearningRate 0.0826   Epoch: 1   Global Step: 75500   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:12:09,198-Speed 2625.81 samples/sec   Loss 12.8824   LearningRate 0.0826   Epoch: 1   Global Step: 75510   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:12:13,092-Speed 2629.50 samples/sec   Loss 12.8473   LearningRate 0.0826   Epoch: 1   Global Step: 75520   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:12:16,987-Speed 2630.02 samples/sec   Loss 12.8996   LearningRate 0.0826   Epoch: 1   Global Step: 75530   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:12:20,887-Speed 2626.56 samples/sec   Loss 12.8158   LearningRate 0.0826   Epoch: 1   Global Step: 75540   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:12:24,791-Speed 2623.65 samples/sec   Loss 12.6540   LearningRate 0.0826   Epoch: 1   Global Step: 75550   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:12:28,665-Speed 2644.18 samples/sec   Loss 12.9650   LearningRate 0.0826   Epoch: 1   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:12:32,553-Speed 2633.84 samples/sec   Loss 12.8646   LearningRate 0.0826   Epoch: 1   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:12:36,467-Speed 2617.87 samples/sec   Loss 12.8810   LearningRate 0.0826   Epoch: 1   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:12:40,355-Speed 2634.09 samples/sec   Loss 12.9245   LearningRate 0.0826   Epoch: 1   Global Step: 75590   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:12:44,264-Speed 2620.30 samples/sec   Loss 12.9760   LearningRate 0.0826   Epoch: 1   Global Step: 75600   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:12:48,166-Speed 2625.19 samples/sec   Loss 12.8153   LearningRate 0.0826   Epoch: 1   Global Step: 75610   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:12:52,149-Speed 2571.71 samples/sec   Loss 12.8386   LearningRate 0.0826   Epoch: 1   Global Step: 75620   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:12:56,052-Speed 2624.28 samples/sec   Loss 12.7636   LearningRate 0.0826   Epoch: 1   Global Step: 75630   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:12:59,957-Speed 2622.75 samples/sec   Loss 12.9743   LearningRate 0.0826   Epoch: 1   Global Step: 75640   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:13:03,866-Speed 2619.83 samples/sec   Loss 12.9530   LearningRate 0.0826   Epoch: 1   Global Step: 75650   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:13:07,763-Speed 2628.52 samples/sec   Loss 12.9755   LearningRate 0.0826   Epoch: 1   Global Step: 75660   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:13:11,670-Speed 2621.96 samples/sec   Loss 12.8388   LearningRate 0.0826   Epoch: 1   Global Step: 75670   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:13:15,570-Speed 2626.42 samples/sec   Loss 12.9348   LearningRate 0.0826   Epoch: 1   Global Step: 75680   Fp16 Grad Scale: 32768   Required: 85 hours
Training: 2022-04-13 04:13:19,477-Speed 2621.66 samples/sec   Loss 13.0608   LearningRate 0.0826   Epoch: 1   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:23,530-Speed 2527.61 samples/sec   Loss 13.0184   LearningRate 0.0826   Epoch: 1   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:27,631-Speed 2497.24 samples/sec   Loss 12.8515   LearningRate 0.0826   Epoch: 1   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:31,587-Speed 2588.70 samples/sec   Loss 12.8785   LearningRate 0.0826   Epoch: 1   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:35,493-Speed 2622.89 samples/sec   Loss 12.9483   LearningRate 0.0826   Epoch: 1   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:39,393-Speed 2626.11 samples/sec   Loss 12.9854   LearningRate 0.0826   Epoch: 1   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:43,289-Speed 2628.93 samples/sec   Loss 12.8884   LearningRate 0.0826   Epoch: 1   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:47,186-Speed 2628.74 samples/sec   Loss 12.9768   LearningRate 0.0826   Epoch: 1   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:51,083-Speed 2627.64 samples/sec   Loss 12.8643   LearningRate 0.0826   Epoch: 1   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:54,984-Speed 2625.72 samples/sec   Loss 12.7522   LearningRate 0.0826   Epoch: 1   Global Step: 75780   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:13:58,888-Speed 2623.59 samples/sec   Loss 12.9255   LearningRate 0.0826   Epoch: 1   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:02,781-Speed 2631.13 samples/sec   Loss 12.9566   LearningRate 0.0826   Epoch: 1   Global Step: 75800   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:06,680-Speed 2626.76 samples/sec   Loss 13.0069   LearningRate 0.0826   Epoch: 1   Global Step: 75810   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:10,573-Speed 2630.91 samples/sec   Loss 12.7178   LearningRate 0.0826   Epoch: 1   Global Step: 75820   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:14,468-Speed 2630.19 samples/sec   Loss 13.0350   LearningRate 0.0826   Epoch: 1   Global Step: 75830   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:18,365-Speed 2627.86 samples/sec   Loss 13.0042   LearningRate 0.0826   Epoch: 1   Global Step: 75840   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:22,345-Speed 2573.31 samples/sec   Loss 12.9807   LearningRate 0.0825   Epoch: 1   Global Step: 75850   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:26,240-Speed 2629.72 samples/sec   Loss 12.8545   LearningRate 0.0825   Epoch: 1   Global Step: 75860   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:30,140-Speed 2626.22 samples/sec   Loss 12.9664   LearningRate 0.0825   Epoch: 1   Global Step: 75870   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:34,039-Speed 2627.08 samples/sec   Loss 13.0463   LearningRate 0.0825   Epoch: 1   Global Step: 75880   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:37,939-Speed 2626.44 samples/sec   Loss 12.8681   LearningRate 0.0825   Epoch: 1   Global Step: 75890   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:14:41,832-Speed 2631.03 samples/sec   Loss 12.8513   LearningRate 0.0825   Epoch: 1   Global Step: 75900   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:14:45,734-Speed 2625.08 samples/sec   Loss 12.8755   LearningRate 0.0825   Epoch: 1   Global Step: 75910   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:14:49,611-Speed 2641.67 samples/sec   Loss 12.9560   LearningRate 0.0825   Epoch: 1   Global Step: 75920   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:53,508-Speed 2628.44 samples/sec   Loss 12.9600   LearningRate 0.0825   Epoch: 1   Global Step: 75930   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:14:57,401-Speed 2631.41 samples/sec   Loss 12.9705   LearningRate 0.0825   Epoch: 1   Global Step: 75940   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:01,294-Speed 2630.99 samples/sec   Loss 12.9180   LearningRate 0.0825   Epoch: 1   Global Step: 75950   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:05,186-Speed 2631.42 samples/sec   Loss 12.9838   LearningRate 0.0825   Epoch: 1   Global Step: 75960   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:09,117-Speed 2605.69 samples/sec   Loss 12.8034   LearningRate 0.0825   Epoch: 1   Global Step: 75970   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:13,012-Speed 2629.55 samples/sec   Loss 12.8530   LearningRate 0.0825   Epoch: 1   Global Step: 75980   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:16,921-Speed 2620.94 samples/sec   Loss 12.8398   LearningRate 0.0825   Epoch: 1   Global Step: 75990   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:20,827-Speed 2622.37 samples/sec   Loss 13.1242   LearningRate 0.0825   Epoch: 1   Global Step: 76000   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:24,735-Speed 2620.89 samples/sec   Loss 12.8308   LearningRate 0.0825   Epoch: 1   Global Step: 76010   Fp16 Grad Scale: 131072   Required: 85 hours
Training: 2022-04-13 04:15:28,636-Speed 2625.45 samples/sec   Loss 12.8559   LearningRate 0.0825   Epoch: 1   Global Step: 76020   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:32,542-Speed 2622.63 samples/sec   Loss 12.8610   LearningRate 0.0825   Epoch: 1   Global Step: 76030   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:36,438-Speed 2628.30 samples/sec   Loss 12.8159   LearningRate 0.0825   Epoch: 1   Global Step: 76040   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:40,343-Speed 2623.04 samples/sec   Loss 12.8949   LearningRate 0.0825   Epoch: 1   Global Step: 76050   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:44,236-Speed 2631.19 samples/sec   Loss 12.8845   LearningRate 0.0825   Epoch: 1   Global Step: 76060   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:48,144-Speed 2621.33 samples/sec   Loss 12.7816   LearningRate 0.0825   Epoch: 1   Global Step: 76070   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:52,045-Speed 2626.08 samples/sec   Loss 12.8531   LearningRate 0.0825   Epoch: 1   Global Step: 76080   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:55,944-Speed 2627.03 samples/sec   Loss 12.8096   LearningRate 0.0825   Epoch: 1   Global Step: 76090   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:15:59,840-Speed 2628.87 samples/sec   Loss 13.0085   LearningRate 0.0825   Epoch: 1   Global Step: 76100   Fp16 Grad Scale: 262144   Required: 85 hours
Training: 2022-04-13 04:16:03,703-Speed 2651.66 samples/sec   Loss 12.7997   LearningRate 0.0825   Epoch: 1   Global Step: 76110   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:16:07,598-Speed 2629.26 samples/sec   Loss 12.9275   LearningRate 0.0825   Epoch: 1   Global Step: 76120   Fp16 Grad Scale: 65536   Required: 85 hours
Training: 2022-04-13 04:16:11,500-Speed 2624.83 samples/sec   Loss 12.8621   LearningRate 0.0825   Epoch: 1   Global Step: 76130   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:15,391-Speed 2633.20 samples/sec   Loss 12.9177   LearningRate 0.0825   Epoch: 1   Global Step: 76140   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:19,294-Speed 2623.96 samples/sec   Loss 12.7896   LearningRate 0.0825   Epoch: 1   Global Step: 76150   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:23,192-Speed 2627.80 samples/sec   Loss 12.8313   LearningRate 0.0825   Epoch: 1   Global Step: 76160   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:27,093-Speed 2625.57 samples/sec   Loss 12.8207   LearningRate 0.0825   Epoch: 1   Global Step: 76170   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:31,002-Speed 2620.22 samples/sec   Loss 12.8452   LearningRate 0.0825   Epoch: 1   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:34,905-Speed 2624.46 samples/sec   Loss 12.9270   LearningRate 0.0825   Epoch: 1   Global Step: 76190   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:38,825-Speed 2612.87 samples/sec   Loss 12.8673   LearningRate 0.0825   Epoch: 1   Global Step: 76200   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:16:42,726-Speed 2625.31 samples/sec   Loss 12.8434   LearningRate 0.0825   Epoch: 1   Global Step: 76210   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:16:46,626-Speed 2626.51 samples/sec   Loss 12.8429   LearningRate 0.0825   Epoch: 1   Global Step: 76220   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:16:50,531-Speed 2623.25 samples/sec   Loss 12.9010   LearningRate 0.0825   Epoch: 1   Global Step: 76230   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:16:54,425-Speed 2629.94 samples/sec   Loss 12.9610   LearningRate 0.0825   Epoch: 1   Global Step: 76240   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:16:58,337-Speed 2618.44 samples/sec   Loss 12.8454   LearningRate 0.0825   Epoch: 1   Global Step: 76250   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:17:02,237-Speed 2626.81 samples/sec   Loss 12.9570   LearningRate 0.0825   Epoch: 1   Global Step: 76260   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:17:06,129-Speed 2631.50 samples/sec   Loss 13.0286   LearningRate 0.0825   Epoch: 1   Global Step: 76270   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:17:10,013-Speed 2636.67 samples/sec   Loss 12.8662   LearningRate 0.0825   Epoch: 1   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:17:13,864-Speed 2659.49 samples/sec   Loss 13.0483   LearningRate 0.0825   Epoch: 1   Global Step: 76290   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:17,760-Speed 2628.68 samples/sec   Loss 13.1365   LearningRate 0.0825   Epoch: 1   Global Step: 76300   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:21,653-Speed 2631.29 samples/sec   Loss 12.8535   LearningRate 0.0824   Epoch: 1   Global Step: 76310   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:25,552-Speed 2627.26 samples/sec   Loss 13.0596   LearningRate 0.0824   Epoch: 1   Global Step: 76320   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:29,442-Speed 2633.11 samples/sec   Loss 12.8599   LearningRate 0.0824   Epoch: 1   Global Step: 76330   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:33,368-Speed 2609.10 samples/sec   Loss 13.0249   LearningRate 0.0824   Epoch: 1   Global Step: 76340   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:37,267-Speed 2626.80 samples/sec   Loss 12.8561   LearningRate 0.0824   Epoch: 1   Global Step: 76350   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:41,162-Speed 2629.45 samples/sec   Loss 12.8247   LearningRate 0.0824   Epoch: 1   Global Step: 76360   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:45,058-Speed 2629.56 samples/sec   Loss 12.8251   LearningRate 0.0824   Epoch: 1   Global Step: 76370   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:48,947-Speed 2633.09 samples/sec   Loss 12.8243   LearningRate 0.0824   Epoch: 1   Global Step: 76380   Fp16 Grad Scale: 8192   Required: 84 hours
Training: 2022-04-13 04:17:52,840-Speed 2631.74 samples/sec   Loss 12.9461   LearningRate 0.0824   Epoch: 1   Global Step: 76390   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:17:56,732-Speed 2631.76 samples/sec   Loss 12.8400   LearningRate 0.0824   Epoch: 1   Global Step: 76400   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:00,621-Speed 2633.35 samples/sec   Loss 12.7910   LearningRate 0.0824   Epoch: 1   Global Step: 76410   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:04,537-Speed 2616.00 samples/sec   Loss 12.8787   LearningRate 0.0824   Epoch: 1   Global Step: 76420   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:08,427-Speed 2632.57 samples/sec   Loss 13.0382   LearningRate 0.0824   Epoch: 1   Global Step: 76430   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:12,321-Speed 2630.45 samples/sec   Loss 12.8805   LearningRate 0.0824   Epoch: 1   Global Step: 76440   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:16,214-Speed 2631.12 samples/sec   Loss 12.8592   LearningRate 0.0824   Epoch: 1   Global Step: 76450   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:20,178-Speed 2583.54 samples/sec   Loss 12.9168   LearningRate 0.0824   Epoch: 1   Global Step: 76460   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:24,077-Speed 2626.89 samples/sec   Loss 12.6906   LearningRate 0.0824   Epoch: 1   Global Step: 76470   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:27,969-Speed 2632.28 samples/sec   Loss 12.8532   LearningRate 0.0824   Epoch: 1   Global Step: 76480   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:18:31,860-Speed 2632.57 samples/sec   Loss 12.8765   LearningRate 0.0824   Epoch: 1   Global Step: 76490   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:18:35,751-Speed 2632.23 samples/sec   Loss 12.8165   LearningRate 0.0824   Epoch: 1   Global Step: 76500   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:18:39,644-Speed 2630.69 samples/sec   Loss 12.8683   LearningRate 0.0824   Epoch: 1   Global Step: 76510   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:18:43,550-Speed 2621.96 samples/sec   Loss 12.7770   LearningRate 0.0824   Epoch: 1   Global Step: 76520   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:18:47,445-Speed 2629.77 samples/sec   Loss 12.7843   LearningRate 0.0824   Epoch: 1   Global Step: 76530   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:18:51,339-Speed 2630.31 samples/sec   Loss 12.8270   LearningRate 0.0824   Epoch: 1   Global Step: 76540   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:18:55,233-Speed 2630.45 samples/sec   Loss 12.8096   LearningRate 0.0824   Epoch: 1   Global Step: 76550   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:18:59,124-Speed 2632.29 samples/sec   Loss 12.8907   LearningRate 0.0824   Epoch: 1   Global Step: 76560   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:19:03,017-Speed 2630.60 samples/sec   Loss 12.7809   LearningRate 0.0824   Epoch: 1   Global Step: 76570   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:19:06,912-Speed 2630.02 samples/sec   Loss 12.9110   LearningRate 0.0824   Epoch: 1   Global Step: 76580   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:19:10,805-Speed 2631.02 samples/sec   Loss 12.8213   LearningRate 0.0824   Epoch: 1   Global Step: 76590   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:14,697-Speed 2631.22 samples/sec   Loss 12.7568   LearningRate 0.0824   Epoch: 1   Global Step: 76600   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:18,606-Speed 2620.84 samples/sec   Loss 12.9231   LearningRate 0.0824   Epoch: 1   Global Step: 76610   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:22,502-Speed 2628.86 samples/sec   Loss 12.8652   LearningRate 0.0824   Epoch: 1   Global Step: 76620   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:26,418-Speed 2616.09 samples/sec   Loss 12.8131   LearningRate 0.0824   Epoch: 1   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:30,317-Speed 2627.10 samples/sec   Loss 12.8367   LearningRate 0.0824   Epoch: 1   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:34,218-Speed 2625.24 samples/sec   Loss 12.9512   LearningRate 0.0824   Epoch: 1   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:38,116-Speed 2627.68 samples/sec   Loss 12.8886   LearningRate 0.0824   Epoch: 1   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:42,013-Speed 2628.50 samples/sec   Loss 12.9765   LearningRate 0.0824   Epoch: 1   Global Step: 76670   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:45,979-Speed 2582.32 samples/sec   Loss 12.8343   LearningRate 0.0824   Epoch: 1   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:19:49,900-Speed 2612.83 samples/sec   Loss 12.8498   LearningRate 0.0824   Epoch: 1   Global Step: 76690   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:19:53,835-Speed 2602.54 samples/sec   Loss 12.8105   LearningRate 0.0824   Epoch: 1   Global Step: 76700   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:19:57,728-Speed 2630.91 samples/sec   Loss 12.8211   LearningRate 0.0824   Epoch: 1   Global Step: 76710   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:01,636-Speed 2620.72 samples/sec   Loss 12.7983   LearningRate 0.0824   Epoch: 1   Global Step: 76720   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:05,530-Speed 2630.36 samples/sec   Loss 13.0267   LearningRate 0.0824   Epoch: 1   Global Step: 76730   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:09,425-Speed 2629.50 samples/sec   Loss 12.9563   LearningRate 0.0824   Epoch: 1   Global Step: 76740   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:13,320-Speed 2629.37 samples/sec   Loss 12.9033   LearningRate 0.0824   Epoch: 1   Global Step: 76750   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:17,215-Speed 2629.95 samples/sec   Loss 12.9190   LearningRate 0.0824   Epoch: 1   Global Step: 76760   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:21,110-Speed 2630.10 samples/sec   Loss 12.7641   LearningRate 0.0823   Epoch: 1   Global Step: 76770   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:25,006-Speed 2629.17 samples/sec   Loss 12.9380   LearningRate 0.0823   Epoch: 1   Global Step: 76780   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:20:28,902-Speed 2628.45 samples/sec   Loss 12.8382   LearningRate 0.0823   Epoch: 1   Global Step: 76790   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:20:32,765-Speed 2651.21 samples/sec   Loss 12.8308   LearningRate 0.0823   Epoch: 1   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:20:36,663-Speed 2627.32 samples/sec   Loss 12.8401   LearningRate 0.0823   Epoch: 1   Global Step: 76810   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:20:40,567-Speed 2624.57 samples/sec   Loss 12.7394   LearningRate 0.0823   Epoch: 1   Global Step: 76820   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:20:44,473-Speed 2621.72 samples/sec   Loss 12.7547   LearningRate 0.0823   Epoch: 1   Global Step: 76830   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:20:48,494-Speed 2547.54 samples/sec   Loss 13.0120   LearningRate 0.0823   Epoch: 1   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:20:52,518-Speed 2545.43 samples/sec   Loss 12.7756   LearningRate 0.0823   Epoch: 1   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:20:56,452-Speed 2604.14 samples/sec   Loss 12.9502   LearningRate 0.0823   Epoch: 1   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:21:00,364-Speed 2617.87 samples/sec   Loss 12.7326   LearningRate 0.0823   Epoch: 1   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:21:04,275-Speed 2619.11 samples/sec   Loss 12.9402   LearningRate 0.0823   Epoch: 1   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:21:08,179-Speed 2623.45 samples/sec   Loss 12.7660   LearningRate 0.0823   Epoch: 1   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:21:12,076-Speed 2628.87 samples/sec   Loss 12.9224   LearningRate 0.0823   Epoch: 1   Global Step: 76900   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:15,984-Speed 2620.57 samples/sec   Loss 12.7335   LearningRate 0.0823   Epoch: 1   Global Step: 76910   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:19,893-Speed 2620.37 samples/sec   Loss 12.8386   LearningRate 0.0823   Epoch: 1   Global Step: 76920   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:23,788-Speed 2629.53 samples/sec   Loss 12.8562   LearningRate 0.0823   Epoch: 1   Global Step: 76930   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:27,690-Speed 2624.98 samples/sec   Loss 12.7255   LearningRate 0.0823   Epoch: 1   Global Step: 76940   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:31,589-Speed 2626.86 samples/sec   Loss 12.9439   LearningRate 0.0823   Epoch: 1   Global Step: 76950   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:35,488-Speed 2627.35 samples/sec   Loss 12.7875   LearningRate 0.0823   Epoch: 1   Global Step: 76960   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:39,383-Speed 2629.07 samples/sec   Loss 12.9289   LearningRate 0.0823   Epoch: 1   Global Step: 76970   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:43,284-Speed 2625.65 samples/sec   Loss 12.8144   LearningRate 0.0823   Epoch: 1   Global Step: 76980   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:47,181-Speed 2628.86 samples/sec   Loss 12.8171   LearningRate 0.0823   Epoch: 1   Global Step: 76990   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:51,057-Speed 2642.62 samples/sec   Loss 12.7596   LearningRate 0.0823   Epoch: 1   Global Step: 77000   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:54,960-Speed 2624.45 samples/sec   Loss 12.8936   LearningRate 0.0823   Epoch: 1   Global Step: 77010   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:21:58,875-Speed 2616.09 samples/sec   Loss 12.8092   LearningRate 0.0823   Epoch: 1   Global Step: 77020   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:02,790-Speed 2616.33 samples/sec   Loss 12.8935   LearningRate 0.0823   Epoch: 1   Global Step: 77030   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:06,693-Speed 2624.00 samples/sec   Loss 12.7110   LearningRate 0.0823   Epoch: 1   Global Step: 77040   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:10,615-Speed 2612.32 samples/sec   Loss 12.9158   LearningRate 0.0823   Epoch: 1   Global Step: 77050   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:14,510-Speed 2628.96 samples/sec   Loss 12.8176   LearningRate 0.0823   Epoch: 1   Global Step: 77060   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:18,418-Speed 2621.06 samples/sec   Loss 12.9017   LearningRate 0.0823   Epoch: 1   Global Step: 77070   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:22,312-Speed 2630.32 samples/sec   Loss 12.7619   LearningRate 0.0823   Epoch: 1   Global Step: 77080   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:26,205-Speed 2630.61 samples/sec   Loss 12.8640   LearningRate 0.0823   Epoch: 1   Global Step: 77090   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:30,100-Speed 2630.57 samples/sec   Loss 12.8729   LearningRate 0.0823   Epoch: 1   Global Step: 77100   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:22:33,994-Speed 2630.29 samples/sec   Loss 12.8663   LearningRate 0.0823   Epoch: 1   Global Step: 77110   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:22:37,877-Speed 2637.66 samples/sec   Loss 12.8858   LearningRate 0.0823   Epoch: 1   Global Step: 77120   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:41,766-Speed 2633.63 samples/sec   Loss 12.9015   LearningRate 0.0823   Epoch: 1   Global Step: 77130   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:45,657-Speed 2632.40 samples/sec   Loss 12.8278   LearningRate 0.0823   Epoch: 1   Global Step: 77140   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:49,552-Speed 2629.97 samples/sec   Loss 12.8830   LearningRate 0.0823   Epoch: 1   Global Step: 77150   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:53,445-Speed 2630.64 samples/sec   Loss 12.7697   LearningRate 0.0823   Epoch: 1   Global Step: 77160   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:22:57,346-Speed 2625.93 samples/sec   Loss 12.8758   LearningRate 0.0823   Epoch: 1   Global Step: 77170   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:01,251-Speed 2623.54 samples/sec   Loss 12.9532   LearningRate 0.0823   Epoch: 1   Global Step: 77180   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:05,148-Speed 2627.78 samples/sec   Loss 12.9283   LearningRate 0.0823   Epoch: 1   Global Step: 77190   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:09,043-Speed 2629.53 samples/sec   Loss 12.9016   LearningRate 0.0823   Epoch: 1   Global Step: 77200   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:12,945-Speed 2624.71 samples/sec   Loss 12.8940   LearningRate 0.0823   Epoch: 1   Global Step: 77210   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:16,843-Speed 2627.99 samples/sec   Loss 12.8562   LearningRate 0.0822   Epoch: 1   Global Step: 77220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:23:20,738-Speed 2629.25 samples/sec   Loss 12.8258   LearningRate 0.0822   Epoch: 1   Global Step: 77230   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:23:24,635-Speed 2628.54 samples/sec   Loss 12.8596   LearningRate 0.0822   Epoch: 1   Global Step: 77240   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:23:28,528-Speed 2631.35 samples/sec   Loss 13.0255   LearningRate 0.0822   Epoch: 1   Global Step: 77250   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:23:32,422-Speed 2630.35 samples/sec   Loss 12.8263   LearningRate 0.0822   Epoch: 1   Global Step: 77260   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:23:36,301-Speed 2640.77 samples/sec   Loss 12.8347   LearningRate 0.0822   Epoch: 1   Global Step: 77270   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:40,199-Speed 2626.91 samples/sec   Loss 12.9929   LearningRate 0.0822   Epoch: 1   Global Step: 77280   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:44,094-Speed 2630.06 samples/sec   Loss 12.8637   LearningRate 0.0822   Epoch: 1   Global Step: 77290   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:48,089-Speed 2563.76 samples/sec   Loss 12.9543   LearningRate 0.0822   Epoch: 1   Global Step: 77300   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:51,985-Speed 2629.26 samples/sec   Loss 12.8956   LearningRate 0.0822   Epoch: 1   Global Step: 77310   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:55,878-Speed 2631.35 samples/sec   Loss 12.7875   LearningRate 0.0822   Epoch: 1   Global Step: 77320   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:23:59,770-Speed 2631.40 samples/sec   Loss 12.8469   LearningRate 0.0822   Epoch: 1   Global Step: 77330   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:03,660-Speed 2632.99 samples/sec   Loss 12.6390   LearningRate 0.0822   Epoch: 1   Global Step: 77340   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:07,566-Speed 2622.36 samples/sec   Loss 12.8411   LearningRate 0.0822   Epoch: 1   Global Step: 77350   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:11,459-Speed 2630.79 samples/sec   Loss 12.7705   LearningRate 0.0822   Epoch: 1   Global Step: 77360   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:15,388-Speed 2607.13 samples/sec   Loss 12.8773   LearningRate 0.0822   Epoch: 1   Global Step: 77370   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:24:19,295-Speed 2621.79 samples/sec   Loss 12.9283   LearningRate 0.0822   Epoch: 1   Global Step: 77380   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:24:23,191-Speed 2629.10 samples/sec   Loss 12.7600   LearningRate 0.0822   Epoch: 1   Global Step: 77390   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:24:27,111-Speed 2612.92 samples/sec   Loss 12.8039   LearningRate 0.0822   Epoch: 1   Global Step: 77400   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:31,007-Speed 2629.13 samples/sec   Loss 12.7412   LearningRate 0.0822   Epoch: 1   Global Step: 77410   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:34,922-Speed 2616.33 samples/sec   Loss 12.8358   LearningRate 0.0822   Epoch: 1   Global Step: 77420   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:38,819-Speed 2627.88 samples/sec   Loss 12.7361   LearningRate 0.0822   Epoch: 1   Global Step: 77430   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:42,713-Speed 2630.79 samples/sec   Loss 12.9287   LearningRate 0.0822   Epoch: 1   Global Step: 77440   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:46,623-Speed 2619.96 samples/sec   Loss 12.8598   LearningRate 0.0822   Epoch: 1   Global Step: 77450   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:50,519-Speed 2628.49 samples/sec   Loss 12.7753   LearningRate 0.0822   Epoch: 1   Global Step: 77460   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:54,415-Speed 2629.32 samples/sec   Loss 12.7552   LearningRate 0.0822   Epoch: 1   Global Step: 77470   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:24:58,321-Speed 2622.27 samples/sec   Loss 12.8872   LearningRate 0.0822   Epoch: 1   Global Step: 77480   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:25:02,226-Speed 2622.93 samples/sec   Loss 12.6866   LearningRate 0.0822   Epoch: 1   Global Step: 77490   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:25:06,120-Speed 2630.76 samples/sec   Loss 12.8269   LearningRate 0.0822   Epoch: 1   Global Step: 77500   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:10,030-Speed 2619.29 samples/sec   Loss 12.8765   LearningRate 0.0822   Epoch: 1   Global Step: 77510   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:13,930-Speed 2626.56 samples/sec   Loss 12.7963   LearningRate 0.0822   Epoch: 1   Global Step: 77520   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:17,825-Speed 2629.74 samples/sec   Loss 12.7566   LearningRate 0.0822   Epoch: 1   Global Step: 77530   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:21,716-Speed 2632.29 samples/sec   Loss 12.7257   LearningRate 0.0822   Epoch: 1   Global Step: 77540   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:25,621-Speed 2623.38 samples/sec   Loss 12.7211   LearningRate 0.0822   Epoch: 1   Global Step: 77550   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:29,526-Speed 2622.74 samples/sec   Loss 12.7575   LearningRate 0.0822   Epoch: 1   Global Step: 77560   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:33,420-Speed 2630.54 samples/sec   Loss 12.9111   LearningRate 0.0822   Epoch: 1   Global Step: 77570   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:37,320-Speed 2625.67 samples/sec   Loss 12.7212   LearningRate 0.0822   Epoch: 1   Global Step: 77580   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:41,228-Speed 2621.59 samples/sec   Loss 12.7457   LearningRate 0.0822   Epoch: 1   Global Step: 77590   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:45,138-Speed 2619.26 samples/sec   Loss 12.8922   LearningRate 0.0822   Epoch: 1   Global Step: 77600   Fp16 Grad Scale: 524288   Required: 84 hours
Training: 2022-04-13 04:25:49,023-Speed 2637.45 samples/sec   Loss 12.7435   LearningRate 0.0822   Epoch: 1   Global Step: 77610   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:25:52,905-Speed 2638.43 samples/sec   Loss 12.8578   LearningRate 0.0822   Epoch: 1   Global Step: 77620   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:25:56,808-Speed 2624.23 samples/sec   Loss 12.7040   LearningRate 0.0822   Epoch: 1   Global Step: 77630   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:00,704-Speed 2629.13 samples/sec   Loss 13.0481   LearningRate 0.0822   Epoch: 1   Global Step: 77640   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:04,614-Speed 2619.65 samples/sec   Loss 12.8703   LearningRate 0.0822   Epoch: 1   Global Step: 77650   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:08,549-Speed 2602.81 samples/sec   Loss 12.8510   LearningRate 0.0822   Epoch: 1   Global Step: 77660   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:12,441-Speed 2631.44 samples/sec   Loss 12.8718   LearningRate 0.0822   Epoch: 1   Global Step: 77670   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:16,336-Speed 2629.81 samples/sec   Loss 12.8150   LearningRate 0.0821   Epoch: 1   Global Step: 77680   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:20,271-Speed 2603.42 samples/sec   Loss 12.8008   LearningRate 0.0821   Epoch: 1   Global Step: 77690   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:24,176-Speed 2622.73 samples/sec   Loss 12.7927   LearningRate 0.0821   Epoch: 1   Global Step: 77700   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:28,080-Speed 2624.24 samples/sec   Loss 12.8690   LearningRate 0.0821   Epoch: 1   Global Step: 77710   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:31,978-Speed 2627.44 samples/sec   Loss 12.9090   LearningRate 0.0821   Epoch: 1   Global Step: 77720   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:26:35,866-Speed 2634.04 samples/sec   Loss 12.7505   LearningRate 0.0821   Epoch: 1   Global Step: 77730   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:39,759-Speed 2631.37 samples/sec   Loss 12.7080   LearningRate 0.0821   Epoch: 1   Global Step: 77740   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:43,652-Speed 2630.97 samples/sec   Loss 12.9159   LearningRate 0.0821   Epoch: 1   Global Step: 77750   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:47,543-Speed 2632.76 samples/sec   Loss 12.8624   LearningRate 0.0821   Epoch: 1   Global Step: 77760   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:51,439-Speed 2628.54 samples/sec   Loss 12.7939   LearningRate 0.0821   Epoch: 1   Global Step: 77770   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:55,341-Speed 2625.45 samples/sec   Loss 12.7761   LearningRate 0.0821   Epoch: 1   Global Step: 77780   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:26:59,254-Speed 2616.83 samples/sec   Loss 12.6788   LearningRate 0.0821   Epoch: 1   Global Step: 77790   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:03,161-Speed 2622.01 samples/sec   Loss 12.7208   LearningRate 0.0821   Epoch: 1   Global Step: 77800   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:07,055-Speed 2630.08 samples/sec   Loss 12.7039   LearningRate 0.0821   Epoch: 1   Global Step: 77810   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:10,949-Speed 2630.40 samples/sec   Loss 12.9194   LearningRate 0.0821   Epoch: 1   Global Step: 77820   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:14,847-Speed 2627.51 samples/sec   Loss 12.7473   LearningRate 0.0821   Epoch: 1   Global Step: 77830   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:27:18,747-Speed 2626.47 samples/sec   Loss 12.8490   LearningRate 0.0821   Epoch: 1   Global Step: 77840   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:22,645-Speed 2627.88 samples/sec   Loss 13.0079   LearningRate 0.0821   Epoch: 1   Global Step: 77850   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:26,545-Speed 2626.93 samples/sec   Loss 12.8253   LearningRate 0.0821   Epoch: 1   Global Step: 77860   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:30,439-Speed 2630.07 samples/sec   Loss 12.7887   LearningRate 0.0821   Epoch: 1   Global Step: 77870   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:34,341-Speed 2624.32 samples/sec   Loss 12.8818   LearningRate 0.0821   Epoch: 1   Global Step: 77880   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:38,234-Speed 2631.17 samples/sec   Loss 12.7712   LearningRate 0.0821   Epoch: 1   Global Step: 77890   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:42,131-Speed 2628.50 samples/sec   Loss 12.8965   LearningRate 0.0821   Epoch: 1   Global Step: 77900   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:46,025-Speed 2630.04 samples/sec   Loss 12.7564   LearningRate 0.0821   Epoch: 1   Global Step: 77910   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:49,921-Speed 2629.64 samples/sec   Loss 12.8997   LearningRate 0.0821   Epoch: 1   Global Step: 77920   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:53,818-Speed 2627.60 samples/sec   Loss 12.8622   LearningRate 0.0821   Epoch: 1   Global Step: 77930   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:27:57,754-Speed 2603.14 samples/sec   Loss 12.8552   LearningRate 0.0821   Epoch: 1   Global Step: 77940   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:28:01,654-Speed 2626.16 samples/sec   Loss 12.7746   LearningRate 0.0821   Epoch: 1   Global Step: 77950   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:28:05,549-Speed 2629.21 samples/sec   Loss 12.7134   LearningRate 0.0821   Epoch: 1   Global Step: 77960   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:28:09,433-Speed 2637.60 samples/sec   Loss 12.8033   LearningRate 0.0821   Epoch: 1   Global Step: 77970   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:13,333-Speed 2626.61 samples/sec   Loss 12.7087   LearningRate 0.0821   Epoch: 1   Global Step: 77980   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:17,230-Speed 2628.36 samples/sec   Loss 12.6412   LearningRate 0.0821   Epoch: 1   Global Step: 77990   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:21,125-Speed 2629.53 samples/sec   Loss 12.9300   LearningRate 0.0821   Epoch: 1   Global Step: 78000   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:25,019-Speed 2630.23 samples/sec   Loss 12.8192   LearningRate 0.0821   Epoch: 1   Global Step: 78010   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:28,927-Speed 2621.19 samples/sec   Loss 12.8672   LearningRate 0.0821   Epoch: 1   Global Step: 78020   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:32,827-Speed 2626.15 samples/sec   Loss 12.8814   LearningRate 0.0821   Epoch: 1   Global Step: 78030   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:36,729-Speed 2624.98 samples/sec   Loss 12.7293   LearningRate 0.0821   Epoch: 1   Global Step: 78040   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:40,627-Speed 2627.91 samples/sec   Loss 13.0511   LearningRate 0.0821   Epoch: 1   Global Step: 78050   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:44,524-Speed 2627.89 samples/sec   Loss 12.8501   LearningRate 0.0821   Epoch: 1   Global Step: 78060   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:28:48,436-Speed 2618.50 samples/sec   Loss 12.8240   LearningRate 0.0821   Epoch: 1   Global Step: 78070   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:28:52,346-Speed 2619.13 samples/sec   Loss 12.8910   LearningRate 0.0821   Epoch: 1   Global Step: 78080   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:28:56,238-Speed 2632.12 samples/sec   Loss 12.9451   LearningRate 0.0821   Epoch: 1   Global Step: 78090   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:00,123-Speed 2636.74 samples/sec   Loss 12.9203   LearningRate 0.0821   Epoch: 1   Global Step: 78100   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:04,024-Speed 2625.45 samples/sec   Loss 12.9362   LearningRate 0.0821   Epoch: 1   Global Step: 78110   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:07,927-Speed 2624.14 samples/sec   Loss 12.8824   LearningRate 0.0821   Epoch: 1   Global Step: 78120   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:11,830-Speed 2624.05 samples/sec   Loss 12.9384   LearningRate 0.0821   Epoch: 1   Global Step: 78130   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:15,725-Speed 2629.94 samples/sec   Loss 12.7650   LearningRate 0.0820   Epoch: 1   Global Step: 78140   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:19,620-Speed 2629.36 samples/sec   Loss 12.7823   LearningRate 0.0820   Epoch: 1   Global Step: 78150   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:23,520-Speed 2626.44 samples/sec   Loss 12.8208   LearningRate 0.0820   Epoch: 1   Global Step: 78160   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:27,421-Speed 2625.43 samples/sec   Loss 12.8645   LearningRate 0.0820   Epoch: 1   Global Step: 78170   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:31,323-Speed 2625.17 samples/sec   Loss 12.7678   LearningRate 0.0820   Epoch: 1   Global Step: 78180   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:29:35,218-Speed 2629.44 samples/sec   Loss 12.8584   LearningRate 0.0820   Epoch: 1   Global Step: 78190   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:29:39,121-Speed 2624.41 samples/sec   Loss 12.7438   LearningRate 0.0820   Epoch: 1   Global Step: 78200   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:29:43,018-Speed 2628.31 samples/sec   Loss 12.7168   LearningRate 0.0820   Epoch: 1   Global Step: 78210   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:29:46,917-Speed 2626.98 samples/sec   Loss 12.9208   LearningRate 0.0820   Epoch: 1   Global Step: 78220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:29:50,854-Speed 2601.62 samples/sec   Loss 12.7785   LearningRate 0.0820   Epoch: 1   Global Step: 78230   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:29:54,802-Speed 2594.84 samples/sec   Loss 12.6715   LearningRate 0.0820   Epoch: 1   Global Step: 78240   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:29:58,682-Speed 2639.59 samples/sec   Loss 12.6967   LearningRate 0.0820   Epoch: 1   Global Step: 78250   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:30:02,584-Speed 2625.11 samples/sec   Loss 12.9039   LearningRate 0.0820   Epoch: 1   Global Step: 78260   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:30:06,476-Speed 2631.55 samples/sec   Loss 12.8114   LearningRate 0.0820   Epoch: 1   Global Step: 78270   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:30:10,368-Speed 2631.74 samples/sec   Loss 12.7737   LearningRate 0.0820   Epoch: 1   Global Step: 78280   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:30:14,261-Speed 2630.88 samples/sec   Loss 12.7110   LearningRate 0.0820   Epoch: 1   Global Step: 78290   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:30:18,164-Speed 2624.70 samples/sec   Loss 12.7869   LearningRate 0.0820   Epoch: 1   Global Step: 78300   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:30:22,041-Speed 2642.51 samples/sec   Loss 12.8020   LearningRate 0.0820   Epoch: 1   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:25,936-Speed 2628.87 samples/sec   Loss 12.9103   LearningRate 0.0820   Epoch: 1   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:29,834-Speed 2628.31 samples/sec   Loss 12.7381   LearningRate 0.0820   Epoch: 1   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:33,729-Speed 2629.37 samples/sec   Loss 12.7006   LearningRate 0.0820   Epoch: 1   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:37,624-Speed 2630.57 samples/sec   Loss 12.8465   LearningRate 0.0820   Epoch: 1   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:41,520-Speed 2629.07 samples/sec   Loss 12.9006   LearningRate 0.0820   Epoch: 1   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:45,418-Speed 2627.59 samples/sec   Loss 12.6969   LearningRate 0.0820   Epoch: 1   Global Step: 78370   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:49,314-Speed 2629.08 samples/sec   Loss 12.7265   LearningRate 0.0820   Epoch: 1   Global Step: 78380   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:53,212-Speed 2627.89 samples/sec   Loss 12.8976   LearningRate 0.0820   Epoch: 1   Global Step: 78390   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:30:57,104-Speed 2631.42 samples/sec   Loss 12.6753   LearningRate 0.0820   Epoch: 1   Global Step: 78400   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:31:01,001-Speed 2627.61 samples/sec   Loss 12.8461   LearningRate 0.0820   Epoch: 1   Global Step: 78410   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:04,918-Speed 2614.72 samples/sec   Loss 12.8387   LearningRate 0.0820   Epoch: 1   Global Step: 78420   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:08,811-Speed 2631.30 samples/sec   Loss 12.6760   LearningRate 0.0820   Epoch: 1   Global Step: 78430   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:12,720-Speed 2620.46 samples/sec   Loss 12.8697   LearningRate 0.0820   Epoch: 1   Global Step: 78440   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:16,622-Speed 2624.76 samples/sec   Loss 12.7270   LearningRate 0.0820   Epoch: 1   Global Step: 78450   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:20,520-Speed 2627.78 samples/sec   Loss 12.7657   LearningRate 0.0820   Epoch: 1   Global Step: 78460   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:24,416-Speed 2629.41 samples/sec   Loss 12.9697   LearningRate 0.0820   Epoch: 1   Global Step: 78470   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:28,331-Speed 2616.09 samples/sec   Loss 12.8208   LearningRate 0.0820   Epoch: 1   Global Step: 78480   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:32,228-Speed 2628.16 samples/sec   Loss 12.7919   LearningRate 0.0820   Epoch: 1   Global Step: 78490   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:36,134-Speed 2621.76 samples/sec   Loss 12.8770   LearningRate 0.0820   Epoch: 1   Global Step: 78500   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:40,033-Speed 2627.06 samples/sec   Loss 12.7877   LearningRate 0.0820   Epoch: 1   Global Step: 78510   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:31:43,930-Speed 2628.64 samples/sec   Loss 12.9402   LearningRate 0.0820   Epoch: 1   Global Step: 78520   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:31:47,823-Speed 2630.46 samples/sec   Loss 12.8662   LearningRate 0.0820   Epoch: 1   Global Step: 78530   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:51,720-Speed 2628.27 samples/sec   Loss 12.9167   LearningRate 0.0820   Epoch: 1   Global Step: 78540   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:55,619-Speed 2627.36 samples/sec   Loss 12.8010   LearningRate 0.0820   Epoch: 1   Global Step: 78550   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:31:59,518-Speed 2626.98 samples/sec   Loss 12.8306   LearningRate 0.0820   Epoch: 1   Global Step: 78560   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:03,414-Speed 2628.88 samples/sec   Loss 12.7414   LearningRate 0.0820   Epoch: 1   Global Step: 78570   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:07,306-Speed 2631.06 samples/sec   Loss 12.8925   LearningRate 0.0820   Epoch: 1   Global Step: 78580   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:11,202-Speed 2629.07 samples/sec   Loss 12.7802   LearningRate 0.0820   Epoch: 1   Global Step: 78590   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:15,104-Speed 2625.10 samples/sec   Loss 12.6732   LearningRate 0.0819   Epoch: 1   Global Step: 78600   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:19,014-Speed 2619.90 samples/sec   Loss 12.8059   LearningRate 0.0819   Epoch: 1   Global Step: 78610   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:22,924-Speed 2619.57 samples/sec   Loss 12.7461   LearningRate 0.0819   Epoch: 1   Global Step: 78620   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:26,820-Speed 2629.27 samples/sec   Loss 12.8218   LearningRate 0.0819   Epoch: 1   Global Step: 78630   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:32:30,714-Speed 2630.54 samples/sec   Loss 12.8370   LearningRate 0.0819   Epoch: 1   Global Step: 78640   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:32:34,615-Speed 2625.31 samples/sec   Loss 12.8568   LearningRate 0.0819   Epoch: 1   Global Step: 78650   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:32:38,533-Speed 2614.15 samples/sec   Loss 12.6604   LearningRate 0.0819   Epoch: 1   Global Step: 78660   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:32:42,428-Speed 2629.73 samples/sec   Loss 12.7272   LearningRate 0.0819   Epoch: 1   Global Step: 78670   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:32:46,321-Speed 2631.21 samples/sec   Loss 12.7554   LearningRate 0.0819   Epoch: 1   Global Step: 78680   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:32:50,214-Speed 2630.97 samples/sec   Loss 12.7779   LearningRate 0.0819   Epoch: 1   Global Step: 78690   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:32:54,100-Speed 2635.72 samples/sec   Loss 12.9035   LearningRate 0.0819   Epoch: 1   Global Step: 78700   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:32:57,993-Speed 2630.49 samples/sec   Loss 12.8566   LearningRate 0.0819   Epoch: 1   Global Step: 78710   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:33:01,889-Speed 2628.97 samples/sec   Loss 12.5797   LearningRate 0.0819   Epoch: 1   Global Step: 78720   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:33:05,796-Speed 2621.38 samples/sec   Loss 12.7827   LearningRate 0.0819   Epoch: 1   Global Step: 78730   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:33:09,687-Speed 2632.34 samples/sec   Loss 12.5908   LearningRate 0.0819   Epoch: 1   Global Step: 78740   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:33:13,581-Speed 2630.71 samples/sec   Loss 12.7201   LearningRate 0.0819   Epoch: 1   Global Step: 78750   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:33:17,447-Speed 2649.90 samples/sec   Loss 12.8506   LearningRate 0.0819   Epoch: 1   Global Step: 78760   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:33:21,323-Speed 2642.01 samples/sec   Loss 12.9083   LearningRate 0.0819   Epoch: 1   Global Step: 78770   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:25,215-Speed 2632.11 samples/sec   Loss 13.0466   LearningRate 0.0819   Epoch: 1   Global Step: 78780   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:29,105-Speed 2632.40 samples/sec   Loss 13.0360   LearningRate 0.0819   Epoch: 1   Global Step: 78790   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:32,998-Speed 2630.83 samples/sec   Loss 13.1003   LearningRate 0.0819   Epoch: 1   Global Step: 78800   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:36,890-Speed 2631.63 samples/sec   Loss 12.7936   LearningRate 0.0819   Epoch: 1   Global Step: 78810   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:40,789-Speed 2626.97 samples/sec   Loss 12.8726   LearningRate 0.0819   Epoch: 1   Global Step: 78820   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:44,680-Speed 2632.32 samples/sec   Loss 12.9907   LearningRate 0.0819   Epoch: 1   Global Step: 78830   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:48,572-Speed 2632.46 samples/sec   Loss 12.9715   LearningRate 0.0819   Epoch: 1   Global Step: 78840   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:52,465-Speed 2630.85 samples/sec   Loss 12.7721   LearningRate 0.0819   Epoch: 1   Global Step: 78850   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:33:56,356-Speed 2632.38 samples/sec   Loss 12.8491   LearningRate 0.0819   Epoch: 1   Global Step: 78860   Fp16 Grad Scale: 16384   Required: 84 hours
Training: 2022-04-13 04:34:00,248-Speed 2631.13 samples/sec   Loss 13.0084   LearningRate 0.0819   Epoch: 1   Global Step: 78870   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:04,144-Speed 2628.87 samples/sec   Loss 12.9840   LearningRate 0.0819   Epoch: 1   Global Step: 78880   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:08,040-Speed 2628.92 samples/sec   Loss 12.9162   LearningRate 0.0819   Epoch: 1   Global Step: 78890   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:11,931-Speed 2632.57 samples/sec   Loss 12.8324   LearningRate 0.0819   Epoch: 1   Global Step: 78900   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:15,825-Speed 2630.07 samples/sec   Loss 12.6981   LearningRate 0.0819   Epoch: 1   Global Step: 78910   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:19,724-Speed 2626.77 samples/sec   Loss 12.8609   LearningRate 0.0819   Epoch: 1   Global Step: 78920   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:23,622-Speed 2627.84 samples/sec   Loss 12.7898   LearningRate 0.0819   Epoch: 1   Global Step: 78930   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:27,519-Speed 2628.63 samples/sec   Loss 12.7558   LearningRate 0.0819   Epoch: 1   Global Step: 78940   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:31,413-Speed 2630.44 samples/sec   Loss 12.7609   LearningRate 0.0819   Epoch: 1   Global Step: 78950   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:35,306-Speed 2630.23 samples/sec   Loss 12.8650   LearningRate 0.0819   Epoch: 1   Global Step: 78960   Fp16 Grad Scale: 32768   Required: 84 hours
Training: 2022-04-13 04:34:39,199-Speed 2631.15 samples/sec   Loss 12.7998   LearningRate 0.0819   Epoch: 1   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:34:43,092-Speed 2630.94 samples/sec   Loss 12.7965   LearningRate 0.0819   Epoch: 1   Global Step: 78980   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:34:46,998-Speed 2622.15 samples/sec   Loss 12.8599   LearningRate 0.0819   Epoch: 1   Global Step: 78990   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:34:50,917-Speed 2613.18 samples/sec   Loss 12.6153   LearningRate 0.0819   Epoch: 1   Global Step: 79000   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:34:54,812-Speed 2629.87 samples/sec   Loss 12.7665   LearningRate 0.0819   Epoch: 1   Global Step: 79010   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:34:58,708-Speed 2629.01 samples/sec   Loss 12.8888   LearningRate 0.0819   Epoch: 1   Global Step: 79020   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:35:02,607-Speed 2626.90 samples/sec   Loss 12.6751   LearningRate 0.0819   Epoch: 1   Global Step: 79030   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:35:06,501-Speed 2630.36 samples/sec   Loss 12.7594   LearningRate 0.0819   Epoch: 1   Global Step: 79040   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:35:10,404-Speed 2624.21 samples/sec   Loss 12.7454   LearningRate 0.0819   Epoch: 1   Global Step: 79050   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:35:14,294-Speed 2632.98 samples/sec   Loss 12.7362   LearningRate 0.0818   Epoch: 1   Global Step: 79060   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:35:18,188-Speed 2630.35 samples/sec   Loss 12.9595   LearningRate 0.0818   Epoch: 1   Global Step: 79070   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:22,088-Speed 2626.09 samples/sec   Loss 12.8832   LearningRate 0.0818   Epoch: 1   Global Step: 79080   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:25,983-Speed 2629.60 samples/sec   Loss 12.7141   LearningRate 0.0818   Epoch: 1   Global Step: 79090   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:29,883-Speed 2626.06 samples/sec   Loss 12.6687   LearningRate 0.0818   Epoch: 1   Global Step: 79100   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:33,788-Speed 2622.99 samples/sec   Loss 12.9121   LearningRate 0.0818   Epoch: 1   Global Step: 79110   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:37,684-Speed 2629.46 samples/sec   Loss 12.9523   LearningRate 0.0818   Epoch: 1   Global Step: 79120   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:41,590-Speed 2622.13 samples/sec   Loss 12.8702   LearningRate 0.0818   Epoch: 1   Global Step: 79130   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:45,499-Speed 2620.22 samples/sec   Loss 12.8484   LearningRate 0.0818   Epoch: 1   Global Step: 79140   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:49,402-Speed 2624.37 samples/sec   Loss 12.8190   LearningRate 0.0818   Epoch: 1   Global Step: 79150   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:53,317-Speed 2616.22 samples/sec   Loss 12.7038   LearningRate 0.0818   Epoch: 1   Global Step: 79160   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:35:57,223-Speed 2621.80 samples/sec   Loss 12.9511   LearningRate 0.0818   Epoch: 1   Global Step: 79170   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:36:01,136-Speed 2617.79 samples/sec   Loss 12.8236   LearningRate 0.0818   Epoch: 1   Global Step: 79180   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:36:05,051-Speed 2616.44 samples/sec   Loss 12.8169   LearningRate 0.0818   Epoch: 1   Global Step: 79190   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:36:08,957-Speed 2622.00 samples/sec   Loss 12.8891   LearningRate 0.0818   Epoch: 1   Global Step: 79200   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:36:12,855-Speed 2627.34 samples/sec   Loss 12.8398   LearningRate 0.0818   Epoch: 1   Global Step: 79210   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:16,764-Speed 2620.43 samples/sec   Loss 12.7741   LearningRate 0.0818   Epoch: 1   Global Step: 79220   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:20,660-Speed 2628.98 samples/sec   Loss 12.8372   LearningRate 0.0818   Epoch: 1   Global Step: 79230   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:24,561-Speed 2625.05 samples/sec   Loss 12.8558   LearningRate 0.0818   Epoch: 1   Global Step: 79240   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:28,463-Speed 2625.58 samples/sec   Loss 12.9408   LearningRate 0.0818   Epoch: 1   Global Step: 79250   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:32,359-Speed 2628.84 samples/sec   Loss 12.8133   LearningRate 0.0818   Epoch: 1   Global Step: 79260   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:36,261-Speed 2624.12 samples/sec   Loss 12.8987   LearningRate 0.0818   Epoch: 1   Global Step: 79270   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:40,177-Speed 2615.69 samples/sec   Loss 12.8755   LearningRate 0.0818   Epoch: 1   Global Step: 79280   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:44,072-Speed 2630.48 samples/sec   Loss 12.7173   LearningRate 0.0818   Epoch: 1   Global Step: 79290   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:47,964-Speed 2631.48 samples/sec   Loss 12.7431   LearningRate 0.0818   Epoch: 1   Global Step: 79300   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:36:51,857-Speed 2630.66 samples/sec   Loss 12.7852   LearningRate 0.0818   Epoch: 1   Global Step: 79310   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:36:55,752-Speed 2630.00 samples/sec   Loss 12.7898   LearningRate 0.0818   Epoch: 1   Global Step: 79320   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:36:59,646-Speed 2630.51 samples/sec   Loss 12.9319   LearningRate 0.0818   Epoch: 1   Global Step: 79330   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:37:03,541-Speed 2629.44 samples/sec   Loss 12.7597   LearningRate 0.0818   Epoch: 1   Global Step: 79340   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:37:07,435-Speed 2630.54 samples/sec   Loss 12.8926   LearningRate 0.0818   Epoch: 1   Global Step: 79350   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:37:11,323-Speed 2634.00 samples/sec   Loss 12.6816   LearningRate 0.0818   Epoch: 1   Global Step: 79360   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:15,224-Speed 2625.13 samples/sec   Loss 12.6626   LearningRate 0.0818   Epoch: 1   Global Step: 79370   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:19,119-Speed 2630.18 samples/sec   Loss 12.8122   LearningRate 0.0818   Epoch: 1   Global Step: 79380   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:23,016-Speed 2628.69 samples/sec   Loss 12.9114   LearningRate 0.0818   Epoch: 1   Global Step: 79390   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:26,915-Speed 2627.03 samples/sec   Loss 12.7084   LearningRate 0.0818   Epoch: 1   Global Step: 79400   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:30,810-Speed 2629.08 samples/sec   Loss 12.6976   LearningRate 0.0818   Epoch: 1   Global Step: 79410   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:34,705-Speed 2629.81 samples/sec   Loss 12.8348   LearningRate 0.0818   Epoch: 1   Global Step: 79420   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:38,599-Speed 2629.98 samples/sec   Loss 12.9684   LearningRate 0.0818   Epoch: 1   Global Step: 79430   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:42,504-Speed 2623.15 samples/sec   Loss 13.0040   LearningRate 0.0818   Epoch: 1   Global Step: 79440   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:46,404-Speed 2626.45 samples/sec   Loss 12.8818   LearningRate 0.0818   Epoch: 1   Global Step: 79450   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:37:50,311-Speed 2621.70 samples/sec   Loss 12.6068   LearningRate 0.0818   Epoch: 1   Global Step: 79460   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:37:54,216-Speed 2623.12 samples/sec   Loss 12.7484   LearningRate 0.0818   Epoch: 1   Global Step: 79470   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:37:58,119-Speed 2624.02 samples/sec   Loss 12.8204   LearningRate 0.0818   Epoch: 1   Global Step: 79480   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:38:02,004-Speed 2636.57 samples/sec   Loss 12.8082   LearningRate 0.0818   Epoch: 1   Global Step: 79490   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:05,908-Speed 2623.01 samples/sec   Loss 12.6333   LearningRate 0.0818   Epoch: 1   Global Step: 79500   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:09,822-Speed 2616.87 samples/sec   Loss 12.7522   LearningRate 0.0817   Epoch: 1   Global Step: 79510   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:13,727-Speed 2622.89 samples/sec   Loss 12.8615   LearningRate 0.0817   Epoch: 1   Global Step: 79520   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:17,632-Speed 2623.25 samples/sec   Loss 12.7101   LearningRate 0.0817   Epoch: 1   Global Step: 79530   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:21,538-Speed 2622.43 samples/sec   Loss 12.8005   LearningRate 0.0817   Epoch: 1   Global Step: 79540   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:25,446-Speed 2620.61 samples/sec   Loss 12.6562   LearningRate 0.0817   Epoch: 1   Global Step: 79550   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:29,353-Speed 2621.79 samples/sec   Loss 12.9067   LearningRate 0.0817   Epoch: 1   Global Step: 79560   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:33,256-Speed 2624.42 samples/sec   Loss 12.7459   LearningRate 0.0817   Epoch: 1   Global Step: 79570   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:37,153-Speed 2627.85 samples/sec   Loss 12.8979   LearningRate 0.0817   Epoch: 1   Global Step: 79580   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:38:41,047-Speed 2630.15 samples/sec   Loss 12.8001   LearningRate 0.0817   Epoch: 1   Global Step: 79590   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:38:44,947-Speed 2625.79 samples/sec   Loss 12.8898   LearningRate 0.0817   Epoch: 1   Global Step: 79600   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:38:48,841-Speed 2631.03 samples/sec   Loss 12.6969   LearningRate 0.0817   Epoch: 1   Global Step: 79610   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:38:52,736-Speed 2629.41 samples/sec   Loss 12.6629   LearningRate 0.0817   Epoch: 1   Global Step: 79620   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:38:56,615-Speed 2640.80 samples/sec   Loss 12.6578   LearningRate 0.0817   Epoch: 1   Global Step: 79630   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:00,524-Speed 2620.38 samples/sec   Loss 12.9250   LearningRate 0.0817   Epoch: 1   Global Step: 79640   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:04,423-Speed 2627.02 samples/sec   Loss 12.7343   LearningRate 0.0817   Epoch: 1   Global Step: 79650   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:08,329-Speed 2621.85 samples/sec   Loss 12.6895   LearningRate 0.0817   Epoch: 1   Global Step: 79660   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:12,243-Speed 2617.53 samples/sec   Loss 12.6784   LearningRate 0.0817   Epoch: 1   Global Step: 79670   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:16,151-Speed 2620.77 samples/sec   Loss 12.8997   LearningRate 0.0817   Epoch: 1   Global Step: 79680   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:20,050-Speed 2626.65 samples/sec   Loss 12.8189   LearningRate 0.0817   Epoch: 1   Global Step: 79690   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:23,946-Speed 2629.58 samples/sec   Loss 12.7580   LearningRate 0.0817   Epoch: 1   Global Step: 79700   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:27,861-Speed 2615.97 samples/sec   Loss 12.9302   LearningRate 0.0817   Epoch: 1   Global Step: 79710   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:31,761-Speed 2625.93 samples/sec   Loss 12.7935   LearningRate 0.0817   Epoch: 1   Global Step: 79720   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:35,660-Speed 2627.27 samples/sec   Loss 12.9045   LearningRate 0.0817   Epoch: 1   Global Step: 79730   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:39:39,556-Speed 2628.77 samples/sec   Loss 12.9638   LearningRate 0.0817   Epoch: 1   Global Step: 79740   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:39:43,442-Speed 2635.39 samples/sec   Loss 12.6972   LearningRate 0.0817   Epoch: 1   Global Step: 79750   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:47,331-Speed 2633.71 samples/sec   Loss 12.7133   LearningRate 0.0817   Epoch: 1   Global Step: 79760   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:51,225-Speed 2630.82 samples/sec   Loss 12.9490   LearningRate 0.0817   Epoch: 1   Global Step: 79770   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:55,120-Speed 2629.52 samples/sec   Loss 12.7519   LearningRate 0.0817   Epoch: 1   Global Step: 79780   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:39:59,026-Speed 2622.76 samples/sec   Loss 12.7407   LearningRate 0.0817   Epoch: 1   Global Step: 79790   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:02,926-Speed 2625.82 samples/sec   Loss 12.6279   LearningRate 0.0817   Epoch: 1   Global Step: 79800   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:06,825-Speed 2626.91 samples/sec   Loss 12.6480   LearningRate 0.0817   Epoch: 1   Global Step: 79810   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:10,725-Speed 2626.11 samples/sec   Loss 12.6798   LearningRate 0.0817   Epoch: 1   Global Step: 79820   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:14,627-Speed 2625.38 samples/sec   Loss 12.9296   LearningRate 0.0817   Epoch: 1   Global Step: 79830   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:18,524-Speed 2628.28 samples/sec   Loss 12.8823   LearningRate 0.0817   Epoch: 1   Global Step: 79840   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:22,423-Speed 2626.46 samples/sec   Loss 12.7759   LearningRate 0.0817   Epoch: 1   Global Step: 79850   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:40:26,304-Speed 2639.53 samples/sec   Loss 12.6561   LearningRate 0.0817   Epoch: 1   Global Step: 79860   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:30,199-Speed 2629.60 samples/sec   Loss 12.7642   LearningRate 0.0817   Epoch: 1   Global Step: 79870   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:34,095-Speed 2629.55 samples/sec   Loss 12.7361   LearningRate 0.0817   Epoch: 1   Global Step: 79880   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:37,987-Speed 2631.08 samples/sec   Loss 12.6978   LearningRate 0.0817   Epoch: 1   Global Step: 79890   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:41,892-Speed 2623.25 samples/sec   Loss 12.7561   LearningRate 0.0817   Epoch: 1   Global Step: 79900   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:45,814-Speed 2611.35 samples/sec   Loss 12.7193   LearningRate 0.0817   Epoch: 1   Global Step: 79910   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:49,744-Speed 2606.50 samples/sec   Loss 12.7288   LearningRate 0.0817   Epoch: 1   Global Step: 79920   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:53,641-Speed 2628.63 samples/sec   Loss 12.9034   LearningRate 0.0817   Epoch: 1   Global Step: 79930   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:40:57,555-Speed 2616.82 samples/sec   Loss 12.8345   LearningRate 0.0817   Epoch: 1   Global Step: 79940   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:41:01,472-Speed 2615.02 samples/sec   Loss 12.7232   LearningRate 0.0817   Epoch: 1   Global Step: 79950   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:41:05,414-Speed 2598.04 samples/sec   Loss 12.8026   LearningRate 0.0817   Epoch: 1   Global Step: 79960   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:41:09,337-Speed 2610.76 samples/sec   Loss 12.6180   LearningRate 0.0816   Epoch: 1   Global Step: 79970   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:41:13,242-Speed 2622.81 samples/sec   Loss 12.8061   LearningRate 0.0816   Epoch: 1   Global Step: 79980   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:41:17,140-Speed 2627.97 samples/sec   Loss 12.6761   LearningRate 0.0816   Epoch: 1   Global Step: 79990   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:41:21,037-Speed 2628.73 samples/sec   Loss 12.7602   LearningRate 0.0816   Epoch: 1   Global Step: 80000   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:42:04,299-[lfw][80000]XNorm: 22.462203
Training: 2022-04-13 04:42:04,300-[lfw][80000]Accuracy-Flip: 0.99717+-0.00269
Training: 2022-04-13 04:42:04,301-[lfw][80000]Accuracy-Highest: 0.99783
Training: 2022-04-13 04:42:54,464-[cfp_fp][80000]XNorm: 20.692995
Training: 2022-04-13 04:42:54,465-[cfp_fp][80000]Accuracy-Flip: 0.97586+-0.00640
Training: 2022-04-13 04:42:54,466-[cfp_fp][80000]Accuracy-Highest: 0.97586
Training: 2022-04-13 04:43:37,644-[agedb_30][80000]XNorm: 22.361619
Training: 2022-04-13 04:43:37,645-[agedb_30][80000]Accuracy-Flip: 0.96600+-0.00779
Training: 2022-04-13 04:43:37,646-[agedb_30][80000]Accuracy-Highest: 0.96600
Training: 2022-04-13 04:43:41,512-Speed 72.90 samples/sec   Loss 12.8134   LearningRate 0.0816   Epoch: 1   Global Step: 80010   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:43:45,383-Speed 2646.66 samples/sec   Loss 12.7882   LearningRate 0.0816   Epoch: 1   Global Step: 80020   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:43:49,268-Speed 2636.10 samples/sec   Loss 12.6886   LearningRate 0.0816   Epoch: 1   Global Step: 80030   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:43:53,137-Speed 2647.22 samples/sec   Loss 12.9031   LearningRate 0.0816   Epoch: 1   Global Step: 80040   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:43:57,015-Speed 2641.92 samples/sec   Loss 12.8988   LearningRate 0.0816   Epoch: 1   Global Step: 80050   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:00,892-Speed 2641.43 samples/sec   Loss 12.8086   LearningRate 0.0816   Epoch: 1   Global Step: 80060   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:04,770-Speed 2641.46 samples/sec   Loss 12.6649   LearningRate 0.0816   Epoch: 1   Global Step: 80070   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:08,653-Speed 2638.40 samples/sec   Loss 12.7231   LearningRate 0.0816   Epoch: 1   Global Step: 80080   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:44:12,536-Speed 2638.82 samples/sec   Loss 12.6675   LearningRate 0.0816   Epoch: 1   Global Step: 80090   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:44:16,400-Speed 2650.16 samples/sec   Loss 12.8448   LearningRate 0.0816   Epoch: 1   Global Step: 80100   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:20,282-Speed 2638.29 samples/sec   Loss 12.7490   LearningRate 0.0816   Epoch: 1   Global Step: 80110   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:24,169-Speed 2634.78 samples/sec   Loss 12.7155   LearningRate 0.0816   Epoch: 1   Global Step: 80120   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:28,062-Speed 2631.81 samples/sec   Loss 12.8777   LearningRate 0.0816   Epoch: 1   Global Step: 80130   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:31,953-Speed 2633.04 samples/sec   Loss 12.8058   LearningRate 0.0816   Epoch: 1   Global Step: 80140   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:35,855-Speed 2624.43 samples/sec   Loss 12.7501   LearningRate 0.0816   Epoch: 1   Global Step: 80150   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:39,751-Speed 2629.24 samples/sec   Loss 12.7743   LearningRate 0.0816   Epoch: 1   Global Step: 80160   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:43,644-Speed 2631.21 samples/sec   Loss 12.6797   LearningRate 0.0816   Epoch: 1   Global Step: 80170   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:47,530-Speed 2635.37 samples/sec   Loss 12.8394   LearningRate 0.0816   Epoch: 1   Global Step: 80180   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:51,421-Speed 2631.71 samples/sec   Loss 12.7971   LearningRate 0.0816   Epoch: 1   Global Step: 80190   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:44:55,307-Speed 2636.24 samples/sec   Loss 12.7350   LearningRate 0.0816   Epoch: 1   Global Step: 80200   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:44:59,216-Speed 2620.57 samples/sec   Loss 12.7464   LearningRate 0.0816   Epoch: 1   Global Step: 80210   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:45:03,134-Speed 2614.41 samples/sec   Loss 12.8488   LearningRate 0.0816   Epoch: 1   Global Step: 80220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:45:07,025-Speed 2632.07 samples/sec   Loss 12.6119   LearningRate 0.0816   Epoch: 1   Global Step: 80230   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:45:10,918-Speed 2631.36 samples/sec   Loss 12.7329   LearningRate 0.0816   Epoch: 1   Global Step: 80240   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:45:14,792-Speed 2643.32 samples/sec   Loss 12.6955   LearningRate 0.0816   Epoch: 1   Global Step: 80250   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:18,690-Speed 2627.51 samples/sec   Loss 12.6549   LearningRate 0.0816   Epoch: 1   Global Step: 80260   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:22,585-Speed 2629.43 samples/sec   Loss 12.6659   LearningRate 0.0816   Epoch: 1   Global Step: 80270   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:26,491-Speed 2622.54 samples/sec   Loss 12.5762   LearningRate 0.0816   Epoch: 1   Global Step: 80280   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:30,396-Speed 2622.53 samples/sec   Loss 12.6130   LearningRate 0.0816   Epoch: 1   Global Step: 80290   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:34,303-Speed 2622.53 samples/sec   Loss 12.6348   LearningRate 0.0816   Epoch: 1   Global Step: 80300   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:38,202-Speed 2626.46 samples/sec   Loss 12.8579   LearningRate 0.0816   Epoch: 1   Global Step: 80310   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:42,094-Speed 2631.84 samples/sec   Loss 12.8343   LearningRate 0.0816   Epoch: 1   Global Step: 80320   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:45,997-Speed 2624.56 samples/sec   Loss 12.6599   LearningRate 0.0816   Epoch: 1   Global Step: 80330   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:49,890-Speed 2631.01 samples/sec   Loss 12.8302   LearningRate 0.0816   Epoch: 1   Global Step: 80340   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:45:53,782-Speed 2631.30 samples/sec   Loss 12.7807   LearningRate 0.0816   Epoch: 1   Global Step: 80350   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:45:57,678-Speed 2629.50 samples/sec   Loss 12.7705   LearningRate 0.0816   Epoch: 1   Global Step: 80360   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:01,570-Speed 2631.37 samples/sec   Loss 12.7796   LearningRate 0.0816   Epoch: 1   Global Step: 80370   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:05,641-Speed 2515.90 samples/sec   Loss 12.8321   LearningRate 0.0816   Epoch: 1   Global Step: 80380   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:09,676-Speed 2538.71 samples/sec   Loss 12.9455   LearningRate 0.0816   Epoch: 1   Global Step: 80390   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:13,577-Speed 2625.94 samples/sec   Loss 12.7840   LearningRate 0.0816   Epoch: 1   Global Step: 80400   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:17,480-Speed 2624.12 samples/sec   Loss 12.6987   LearningRate 0.0816   Epoch: 1   Global Step: 80410   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:21,376-Speed 2628.83 samples/sec   Loss 12.7805   LearningRate 0.0816   Epoch: 1   Global Step: 80420   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:25,274-Speed 2627.66 samples/sec   Loss 12.6222   LearningRate 0.0815   Epoch: 1   Global Step: 80430   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:29,176-Speed 2624.43 samples/sec   Loss 12.8477   LearningRate 0.0815   Epoch: 1   Global Step: 80440   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:33,063-Speed 2634.98 samples/sec   Loss 12.8288   LearningRate 0.0815   Epoch: 1   Global Step: 80450   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:36,974-Speed 2619.04 samples/sec   Loss 12.7998   LearningRate 0.0815   Epoch: 1   Global Step: 80460   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:40,886-Speed 2618.84 samples/sec   Loss 12.5066   LearningRate 0.0815   Epoch: 1   Global Step: 80470   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:46:44,784-Speed 2627.32 samples/sec   Loss 12.7464   LearningRate 0.0815   Epoch: 1   Global Step: 80480   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:46:48,679-Speed 2630.11 samples/sec   Loss 12.7376   LearningRate 0.0815   Epoch: 1   Global Step: 80490   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:46:52,571-Speed 2631.64 samples/sec   Loss 12.6713   LearningRate 0.0815   Epoch: 1   Global Step: 80500   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:46:56,466-Speed 2629.56 samples/sec   Loss 12.7699   LearningRate 0.0815   Epoch: 1   Global Step: 80510   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:00,364-Speed 2627.43 samples/sec   Loss 12.8512   LearningRate 0.0815   Epoch: 1   Global Step: 80520   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:04,260-Speed 2628.88 samples/sec   Loss 12.6571   LearningRate 0.0815   Epoch: 1   Global Step: 80530   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:08,161-Speed 2625.91 samples/sec   Loss 12.7599   LearningRate 0.0815   Epoch: 1   Global Step: 80540   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:12,060-Speed 2627.43 samples/sec   Loss 12.6809   LearningRate 0.0815   Epoch: 1   Global Step: 80550   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:15,951-Speed 2632.24 samples/sec   Loss 12.6743   LearningRate 0.0815   Epoch: 1   Global Step: 80560   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:19,854-Speed 2624.20 samples/sec   Loss 12.9173   LearningRate 0.0815   Epoch: 1   Global Step: 80570   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:23,737-Speed 2637.48 samples/sec   Loss 12.7960   LearningRate 0.0815   Epoch: 1   Global Step: 80580   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:27,649-Speed 2618.07 samples/sec   Loss 12.8844   LearningRate 0.0815   Epoch: 1   Global Step: 80590   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:31,552-Speed 2624.01 samples/sec   Loss 12.8958   LearningRate 0.0815   Epoch: 1   Global Step: 80600   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:35,470-Speed 2614.38 samples/sec   Loss 12.7100   LearningRate 0.0815   Epoch: 1   Global Step: 80610   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:39,454-Speed 2570.65 samples/sec   Loss 12.6080   LearningRate 0.0815   Epoch: 1   Global Step: 80620   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:43,350-Speed 2628.94 samples/sec   Loss 12.8326   LearningRate 0.0815   Epoch: 1   Global Step: 80630   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:47,288-Speed 2601.28 samples/sec   Loss 12.7482   LearningRate 0.0815   Epoch: 1   Global Step: 80640   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:51,189-Speed 2625.48 samples/sec   Loss 12.7633   LearningRate 0.0815   Epoch: 1   Global Step: 80650   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:55,084-Speed 2629.68 samples/sec   Loss 12.6248   LearningRate 0.0815   Epoch: 1   Global Step: 80660   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:47:58,978-Speed 2629.60 samples/sec   Loss 12.8583   LearningRate 0.0815   Epoch: 1   Global Step: 80670   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:48:02,883-Speed 2623.18 samples/sec   Loss 12.7128   LearningRate 0.0815   Epoch: 1   Global Step: 80680   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:06,779-Speed 2629.12 samples/sec   Loss 12.8196   LearningRate 0.0815   Epoch: 1   Global Step: 80690   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:10,674-Speed 2629.30 samples/sec   Loss 12.6130   LearningRate 0.0815   Epoch: 1   Global Step: 80700   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:14,570-Speed 2629.25 samples/sec   Loss 12.6965   LearningRate 0.0815   Epoch: 1   Global Step: 80710   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:18,469-Speed 2627.33 samples/sec   Loss 12.6029   LearningRate 0.0815   Epoch: 1   Global Step: 80720   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:22,361-Speed 2631.61 samples/sec   Loss 12.7464   LearningRate 0.0815   Epoch: 1   Global Step: 80730   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:26,261-Speed 2625.92 samples/sec   Loss 12.8016   LearningRate 0.0815   Epoch: 1   Global Step: 80740   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:30,158-Speed 2627.82 samples/sec   Loss 12.7421   LearningRate 0.0815   Epoch: 1   Global Step: 80750   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:34,054-Speed 2629.55 samples/sec   Loss 12.6184   LearningRate 0.0815   Epoch: 1   Global Step: 80760   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:37,951-Speed 2627.99 samples/sec   Loss 12.8556   LearningRate 0.0815   Epoch: 1   Global Step: 80770   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:41,841-Speed 2632.98 samples/sec   Loss 12.6286   LearningRate 0.0815   Epoch: 1   Global Step: 80780   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:45,742-Speed 2625.94 samples/sec   Loss 12.7446   LearningRate 0.0815   Epoch: 1   Global Step: 80790   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:49,640-Speed 2627.16 samples/sec   Loss 12.6728   LearningRate 0.0815   Epoch: 1   Global Step: 80800   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:53,533-Speed 2631.14 samples/sec   Loss 12.6960   LearningRate 0.0815   Epoch: 1   Global Step: 80810   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:48:57,435-Speed 2625.27 samples/sec   Loss 12.6455   LearningRate 0.0815   Epoch: 1   Global Step: 80820   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:49:01,329-Speed 2629.92 samples/sec   Loss 12.7649   LearningRate 0.0815   Epoch: 1   Global Step: 80830   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:49:05,224-Speed 2629.28 samples/sec   Loss 12.6688   LearningRate 0.0815   Epoch: 1   Global Step: 80840   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:49:09,104-Speed 2640.04 samples/sec   Loss 12.7638   LearningRate 0.0815   Epoch: 1   Global Step: 80850   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:49:12,999-Speed 2629.82 samples/sec   Loss 12.7499   LearningRate 0.0815   Epoch: 1   Global Step: 80860   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:49:16,895-Speed 2629.04 samples/sec   Loss 12.5910   LearningRate 0.0815   Epoch: 1   Global Step: 80870   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:49:20,806-Speed 2618.99 samples/sec   Loss 12.7298   LearningRate 0.0815   Epoch: 1   Global Step: 80880   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:49:24,697-Speed 2632.09 samples/sec   Loss 12.7774   LearningRate 0.0814   Epoch: 1   Global Step: 80890   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:49:28,576-Speed 2640.08 samples/sec   Loss 12.8067   LearningRate 0.0814   Epoch: 1   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:32,484-Speed 2620.99 samples/sec   Loss 12.6970   LearningRate 0.0814   Epoch: 1   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:36,380-Speed 2629.38 samples/sec   Loss 12.8965   LearningRate 0.0814   Epoch: 1   Global Step: 80920   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:40,272-Speed 2631.42 samples/sec   Loss 12.8138   LearningRate 0.0814   Epoch: 1   Global Step: 80930   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:44,166-Speed 2630.01 samples/sec   Loss 12.6446   LearningRate 0.0814   Epoch: 1   Global Step: 80940   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:48,063-Speed 2628.49 samples/sec   Loss 12.8336   LearningRate 0.0814   Epoch: 1   Global Step: 80950   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:51,958-Speed 2629.39 samples/sec   Loss 12.6968   LearningRate 0.0814   Epoch: 1   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:55,858-Speed 2626.45 samples/sec   Loss 12.6231   LearningRate 0.0814   Epoch: 1   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:49:59,756-Speed 2627.38 samples/sec   Loss 12.6524   LearningRate 0.0814   Epoch: 1   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:50:03,669-Speed 2617.80 samples/sec   Loss 12.8564   LearningRate 0.0814   Epoch: 1   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 04:50:07,590-Speed 2611.91 samples/sec   Loss 12.6948   LearningRate 0.0814   Epoch: 1   Global Step: 81000   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:11,484-Speed 2630.79 samples/sec   Loss 12.6512   LearningRate 0.0814   Epoch: 1   Global Step: 81010   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:15,382-Speed 2627.41 samples/sec   Loss 12.5896   LearningRate 0.0814   Epoch: 1   Global Step: 81020   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:19,279-Speed 2628.25 samples/sec   Loss 12.7791   LearningRate 0.0814   Epoch: 1   Global Step: 81030   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:23,172-Speed 2630.50 samples/sec   Loss 12.8260   LearningRate 0.0814   Epoch: 1   Global Step: 81040   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:27,066-Speed 2630.43 samples/sec   Loss 12.7332   LearningRate 0.0814   Epoch: 1   Global Step: 81050   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:30,964-Speed 2627.96 samples/sec   Loss 12.7378   LearningRate 0.0814   Epoch: 1   Global Step: 81060   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:34,868-Speed 2623.21 samples/sec   Loss 12.9162   LearningRate 0.0814   Epoch: 1   Global Step: 81070   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:38,783-Speed 2615.89 samples/sec   Loss 12.7844   LearningRate 0.0814   Epoch: 1   Global Step: 81080   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:42,693-Speed 2619.86 samples/sec   Loss 12.7199   LearningRate 0.0814   Epoch: 1   Global Step: 81090   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:46,591-Speed 2627.60 samples/sec   Loss 12.8298   LearningRate 0.0814   Epoch: 1   Global Step: 81100   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:50,476-Speed 2636.66 samples/sec   Loss 12.7015   LearningRate 0.0814   Epoch: 1   Global Step: 81110   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:54,365-Speed 2633.26 samples/sec   Loss 12.7069   LearningRate 0.0814   Epoch: 1   Global Step: 81120   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:50:58,261-Speed 2629.22 samples/sec   Loss 12.6679   LearningRate 0.0814   Epoch: 1   Global Step: 81130   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:02,152-Speed 2631.96 samples/sec   Loss 12.7390   LearningRate 0.0814   Epoch: 1   Global Step: 81140   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:06,042-Speed 2632.85 samples/sec   Loss 12.7599   LearningRate 0.0814   Epoch: 1   Global Step: 81150   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:09,940-Speed 2627.90 samples/sec   Loss 12.7039   LearningRate 0.0814   Epoch: 1   Global Step: 81160   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:13,837-Speed 2628.30 samples/sec   Loss 12.8003   LearningRate 0.0814   Epoch: 1   Global Step: 81170   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:17,744-Speed 2621.62 samples/sec   Loss 12.5774   LearningRate 0.0814   Epoch: 1   Global Step: 81180   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:21,638-Speed 2630.41 samples/sec   Loss 12.7879   LearningRate 0.0814   Epoch: 1   Global Step: 81190   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:25,549-Speed 2619.12 samples/sec   Loss 12.8007   LearningRate 0.0814   Epoch: 1   Global Step: 81200   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:51:29,442-Speed 2630.30 samples/sec   Loss 12.8140   LearningRate 0.0814   Epoch: 1   Global Step: 81210   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:51:33,506-Speed 2520.32 samples/sec   Loss 12.7617   LearningRate 0.0814   Epoch: 1   Global Step: 81220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:51:37,527-Speed 2547.29 samples/sec   Loss 12.8809   LearningRate 0.0814   Epoch: 1   Global Step: 81230   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:41,420-Speed 2632.25 samples/sec   Loss 12.7595   LearningRate 0.0814   Epoch: 1   Global Step: 81240   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:45,324-Speed 2622.99 samples/sec   Loss 12.9105   LearningRate 0.0814   Epoch: 1   Global Step: 81250   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:49,218-Speed 2630.57 samples/sec   Loss 12.7549   LearningRate 0.0814   Epoch: 1   Global Step: 81260   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:53,114-Speed 2629.28 samples/sec   Loss 12.6899   LearningRate 0.0814   Epoch: 1   Global Step: 81270   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:51:57,030-Speed 2615.76 samples/sec   Loss 12.7518   LearningRate 0.0814   Epoch: 1   Global Step: 81280   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:00,920-Speed 2632.87 samples/sec   Loss 12.8466   LearningRate 0.0814   Epoch: 1   Global Step: 81290   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:04,817-Speed 2628.56 samples/sec   Loss 12.6669   LearningRate 0.0814   Epoch: 1   Global Step: 81300   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:08,708-Speed 2631.90 samples/sec   Loss 12.6150   LearningRate 0.0814   Epoch: 1   Global Step: 81310   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:12,600-Speed 2631.44 samples/sec   Loss 12.6824   LearningRate 0.0814   Epoch: 1   Global Step: 81320   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:16,494-Speed 2630.18 samples/sec   Loss 12.8216   LearningRate 0.0814   Epoch: 1   Global Step: 81330   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:52:20,400-Speed 2622.23 samples/sec   Loss 12.7252   LearningRate 0.0814   Epoch: 1   Global Step: 81340   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:52:24,305-Speed 2623.19 samples/sec   Loss 12.5465   LearningRate 0.0813   Epoch: 1   Global Step: 81350   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:52:28,195-Speed 2632.86 samples/sec   Loss 12.7207   LearningRate 0.0813   Epoch: 1   Global Step: 81360   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:52:32,090-Speed 2630.17 samples/sec   Loss 12.7792   LearningRate 0.0813   Epoch: 1   Global Step: 81370   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:52:35,963-Speed 2644.19 samples/sec   Loss 12.5230   LearningRate 0.0813   Epoch: 1   Global Step: 81380   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:39,853-Speed 2633.45 samples/sec   Loss 12.8004   LearningRate 0.0813   Epoch: 1   Global Step: 81390   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:43,748-Speed 2629.59 samples/sec   Loss 12.6007   LearningRate 0.0813   Epoch: 1   Global Step: 81400   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:47,682-Speed 2603.57 samples/sec   Loss 12.6748   LearningRate 0.0813   Epoch: 1   Global Step: 81410   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:51,576-Speed 2630.02 samples/sec   Loss 12.6747   LearningRate 0.0813   Epoch: 1   Global Step: 81420   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:55,467-Speed 2632.10 samples/sec   Loss 12.7286   LearningRate 0.0813   Epoch: 1   Global Step: 81430   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:52:59,365-Speed 2627.25 samples/sec   Loss 12.6791   LearningRate 0.0813   Epoch: 1   Global Step: 81440   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:03,259-Speed 2630.74 samples/sec   Loss 12.7724   LearningRate 0.0813   Epoch: 1   Global Step: 81450   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:07,155-Speed 2629.03 samples/sec   Loss 12.6047   LearningRate 0.0813   Epoch: 1   Global Step: 81460   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:11,053-Speed 2627.82 samples/sec   Loss 12.7933   LearningRate 0.0813   Epoch: 1   Global Step: 81470   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:14,949-Speed 2628.43 samples/sec   Loss 12.6709   LearningRate 0.0813   Epoch: 1   Global Step: 81480   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:53:18,832-Speed 2638.04 samples/sec   Loss 12.6596   LearningRate 0.0813   Epoch: 1   Global Step: 81490   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:22,732-Speed 2625.76 samples/sec   Loss 12.6927   LearningRate 0.0813   Epoch: 1   Global Step: 81500   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:26,634-Speed 2625.18 samples/sec   Loss 12.6432   LearningRate 0.0813   Epoch: 1   Global Step: 81510   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:30,533-Speed 2626.28 samples/sec   Loss 12.5319   LearningRate 0.0813   Epoch: 1   Global Step: 81520   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:34,435-Speed 2625.54 samples/sec   Loss 12.5782   LearningRate 0.0813   Epoch: 1   Global Step: 81530   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:38,336-Speed 2625.32 samples/sec   Loss 12.5581   LearningRate 0.0813   Epoch: 1   Global Step: 81540   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:42,237-Speed 2625.67 samples/sec   Loss 12.7249   LearningRate 0.0813   Epoch: 1   Global Step: 81550   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:46,138-Speed 2626.03 samples/sec   Loss 12.5270   LearningRate 0.0813   Epoch: 1   Global Step: 81560   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:50,036-Speed 2627.74 samples/sec   Loss 12.8708   LearningRate 0.0813   Epoch: 1   Global Step: 81570   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:53,931-Speed 2629.30 samples/sec   Loss 12.7725   LearningRate 0.0813   Epoch: 1   Global Step: 81580   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:53:57,823-Speed 2631.45 samples/sec   Loss 12.6493   LearningRate 0.0813   Epoch: 1   Global Step: 81590   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:54:01,716-Speed 2630.73 samples/sec   Loss 12.7764   LearningRate 0.0813   Epoch: 1   Global Step: 81600   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:54:05,616-Speed 2626.25 samples/sec   Loss 12.6475   LearningRate 0.0813   Epoch: 1   Global Step: 81610   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:54:09,524-Speed 2621.22 samples/sec   Loss 12.6867   LearningRate 0.0813   Epoch: 1   Global Step: 81620   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:54:13,432-Speed 2620.30 samples/sec   Loss 12.4883   LearningRate 0.0813   Epoch: 1   Global Step: 81630   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:54:17,334-Speed 2625.61 samples/sec   Loss 12.7440   LearningRate 0.0813   Epoch: 1   Global Step: 81640   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:54:21,216-Speed 2638.55 samples/sec   Loss 12.7219   LearningRate 0.0813   Epoch: 1   Global Step: 81650   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:25,114-Speed 2627.47 samples/sec   Loss 12.8117   LearningRate 0.0813   Epoch: 1   Global Step: 81660   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:29,016-Speed 2624.30 samples/sec   Loss 12.6689   LearningRate 0.0813   Epoch: 1   Global Step: 81670   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:32,915-Speed 2627.15 samples/sec   Loss 12.7536   LearningRate 0.0813   Epoch: 1   Global Step: 81680   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:36,812-Speed 2628.26 samples/sec   Loss 12.5726   LearningRate 0.0813   Epoch: 1   Global Step: 81690   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:40,705-Speed 2630.60 samples/sec   Loss 12.6654   LearningRate 0.0813   Epoch: 1   Global Step: 81700   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:44,606-Speed 2625.56 samples/sec   Loss 12.5321   LearningRate 0.0813   Epoch: 1   Global Step: 81710   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:48,502-Speed 2629.01 samples/sec   Loss 12.7687   LearningRate 0.0813   Epoch: 1   Global Step: 81720   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:52,397-Speed 2629.96 samples/sec   Loss 12.7468   LearningRate 0.0813   Epoch: 1   Global Step: 81730   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:54:56,296-Speed 2627.01 samples/sec   Loss 12.6623   LearningRate 0.0813   Epoch: 1   Global Step: 81740   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:55:00,193-Speed 2628.21 samples/sec   Loss 12.6982   LearningRate 0.0813   Epoch: 1   Global Step: 81750   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:04,095-Speed 2624.97 samples/sec   Loss 12.7564   LearningRate 0.0813   Epoch: 1   Global Step: 81760   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:07,989-Speed 2630.11 samples/sec   Loss 12.7046   LearningRate 0.0813   Epoch: 1   Global Step: 81770   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:11,885-Speed 2628.48 samples/sec   Loss 12.7803   LearningRate 0.0813   Epoch: 1   Global Step: 81780   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:15,780-Speed 2629.46 samples/sec   Loss 12.6532   LearningRate 0.0813   Epoch: 1   Global Step: 81790   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:19,674-Speed 2630.46 samples/sec   Loss 12.6093   LearningRate 0.0813   Epoch: 1   Global Step: 81800   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:23,574-Speed 2626.53 samples/sec   Loss 12.8642   LearningRate 0.0812   Epoch: 1   Global Step: 81810   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:27,473-Speed 2632.76 samples/sec   Loss 12.5971   LearningRate 0.0812   Epoch: 1   Global Step: 81820   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:31,369-Speed 2628.81 samples/sec   Loss 12.7297   LearningRate 0.0812   Epoch: 1   Global Step: 81830   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:35,264-Speed 2629.51 samples/sec   Loss 12.8105   LearningRate 0.0812   Epoch: 1   Global Step: 81840   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:39,152-Speed 2634.10 samples/sec   Loss 12.6360   LearningRate 0.0812   Epoch: 1   Global Step: 81850   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:43,044-Speed 2632.25 samples/sec   Loss 12.5466   LearningRate 0.0812   Epoch: 1   Global Step: 81860   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:46,943-Speed 2626.87 samples/sec   Loss 12.6833   LearningRate 0.0812   Epoch: 1   Global Step: 81870   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:50,839-Speed 2628.56 samples/sec   Loss 12.7764   LearningRate 0.0812   Epoch: 1   Global Step: 81880   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:54,744-Speed 2623.19 samples/sec   Loss 12.5993   LearningRate 0.0812   Epoch: 1   Global Step: 81890   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:55:58,642-Speed 2627.66 samples/sec   Loss 12.7398   LearningRate 0.0812   Epoch: 1   Global Step: 81900   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:02,535-Speed 2630.44 samples/sec   Loss 12.7492   LearningRate 0.0812   Epoch: 1   Global Step: 81910   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:06,430-Speed 2629.54 samples/sec   Loss 12.8317   LearningRate 0.0812   Epoch: 1   Global Step: 81920   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:10,335-Speed 2622.99 samples/sec   Loss 12.7210   LearningRate 0.0812   Epoch: 1   Global Step: 81930   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:14,232-Speed 2628.84 samples/sec   Loss 12.6599   LearningRate 0.0812   Epoch: 1   Global Step: 81940   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:18,135-Speed 2624.50 samples/sec   Loss 12.7609   LearningRate 0.0812   Epoch: 1   Global Step: 81950   Fp16 Grad Scale: 524288   Required: 84 hours
Training: 2022-04-13 04:56:22,018-Speed 2637.28 samples/sec   Loss 12.7326   LearningRate 0.0812   Epoch: 1   Global Step: 81960   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:25,923-Speed 2623.31 samples/sec   Loss 12.5790   LearningRate 0.0812   Epoch: 1   Global Step: 81970   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:29,825-Speed 2624.71 samples/sec   Loss 12.6488   LearningRate 0.0812   Epoch: 1   Global Step: 81980   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:33,919-Speed 2501.46 samples/sec   Loss 12.7640   LearningRate 0.0812   Epoch: 1   Global Step: 81990   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:37,931-Speed 2553.15 samples/sec   Loss 12.6293   LearningRate 0.0812   Epoch: 1   Global Step: 82000   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:41,827-Speed 2628.54 samples/sec   Loss 12.6754   LearningRate 0.0812   Epoch: 1   Global Step: 82010   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:45,741-Speed 2617.55 samples/sec   Loss 12.6118   LearningRate 0.0812   Epoch: 1   Global Step: 82020   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:56:49,651-Speed 2619.58 samples/sec   Loss 12.6666   LearningRate 0.0812   Epoch: 1   Global Step: 82030   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:56:53,542-Speed 2632.03 samples/sec   Loss 12.5728   LearningRate 0.0812   Epoch: 1   Global Step: 82040   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:56:57,447-Speed 2623.33 samples/sec   Loss 12.5581   LearningRate 0.0812   Epoch: 1   Global Step: 82050   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:01,340-Speed 2630.39 samples/sec   Loss 12.6067   LearningRate 0.0812   Epoch: 1   Global Step: 82060   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:05,246-Speed 2621.94 samples/sec   Loss 12.6107   LearningRate 0.0812   Epoch: 1   Global Step: 82070   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:09,146-Speed 2626.67 samples/sec   Loss 12.7986   LearningRate 0.0812   Epoch: 1   Global Step: 82080   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:13,069-Speed 2611.07 samples/sec   Loss 12.5893   LearningRate 0.0812   Epoch: 1   Global Step: 82090   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:16,967-Speed 2626.87 samples/sec   Loss 12.7543   LearningRate 0.0812   Epoch: 1   Global Step: 82100   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:20,866-Speed 2627.57 samples/sec   Loss 12.7022   LearningRate 0.0812   Epoch: 1   Global Step: 82110   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:24,761-Speed 2629.33 samples/sec   Loss 12.7836   LearningRate 0.0812   Epoch: 1   Global Step: 82120   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 04:57:28,701-Speed 2600.15 samples/sec   Loss 12.7135   LearningRate 0.0812   Epoch: 1   Global Step: 82130   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:32,607-Speed 2621.80 samples/sec   Loss 12.5842   LearningRate 0.0812   Epoch: 1   Global Step: 82140   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:36,503-Speed 2629.17 samples/sec   Loss 12.7908   LearningRate 0.0812   Epoch: 1   Global Step: 82150   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:40,399-Speed 2628.36 samples/sec   Loss 12.7755   LearningRate 0.0812   Epoch: 1   Global Step: 82160   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:44,296-Speed 2628.48 samples/sec   Loss 12.6497   LearningRate 0.0812   Epoch: 1   Global Step: 82170   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:48,196-Speed 2626.47 samples/sec   Loss 12.5428   LearningRate 0.0812   Epoch: 1   Global Step: 82180   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:52,101-Speed 2622.59 samples/sec   Loss 12.6256   LearningRate 0.0812   Epoch: 1   Global Step: 82190   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:56,005-Speed 2624.00 samples/sec   Loss 12.6583   LearningRate 0.0812   Epoch: 1   Global Step: 82200   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:57:59,914-Speed 2620.24 samples/sec   Loss 12.7950   LearningRate 0.0812   Epoch: 1   Global Step: 82210   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:03,816-Speed 2624.84 samples/sec   Loss 12.7580   LearningRate 0.0812   Epoch: 1   Global Step: 82220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:07,710-Speed 2629.86 samples/sec   Loss 12.5997   LearningRate 0.0812   Epoch: 1   Global Step: 82230   Fp16 Grad Scale: 524288   Required: 84 hours
Training: 2022-04-13 04:58:11,593-Speed 2638.00 samples/sec   Loss 12.6582   LearningRate 0.0812   Epoch: 1   Global Step: 82240   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:15,496-Speed 2624.32 samples/sec   Loss 12.8005   LearningRate 0.0812   Epoch: 1   Global Step: 82250   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:19,408-Speed 2617.95 samples/sec   Loss 12.6448   LearningRate 0.0812   Epoch: 1   Global Step: 82260   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:23,316-Speed 2620.74 samples/sec   Loss 12.7128   LearningRate 0.0811   Epoch: 1   Global Step: 82270   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:27,215-Speed 2627.33 samples/sec   Loss 12.6916   LearningRate 0.0811   Epoch: 1   Global Step: 82280   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:31,111-Speed 2628.88 samples/sec   Loss 12.7688   LearningRate 0.0811   Epoch: 1   Global Step: 82290   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:35,006-Speed 2629.86 samples/sec   Loss 12.6644   LearningRate 0.0811   Epoch: 1   Global Step: 82300   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:38,899-Speed 2630.83 samples/sec   Loss 12.8269   LearningRate 0.0811   Epoch: 1   Global Step: 82310   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:42,800-Speed 2625.70 samples/sec   Loss 12.5897   LearningRate 0.0811   Epoch: 1   Global Step: 82320   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:46,696-Speed 2629.08 samples/sec   Loss 12.6441   LearningRate 0.0811   Epoch: 1   Global Step: 82330   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:50,576-Speed 2639.57 samples/sec   Loss 12.7058   LearningRate 0.0811   Epoch: 1   Global Step: 82340   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:54,475-Speed 2627.77 samples/sec   Loss 12.7321   LearningRate 0.0811   Epoch: 1   Global Step: 82350   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:58:58,373-Speed 2627.21 samples/sec   Loss 12.9011   LearningRate 0.0811   Epoch: 1   Global Step: 82360   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:02,272-Speed 2626.70 samples/sec   Loss 12.7435   LearningRate 0.0811   Epoch: 1   Global Step: 82370   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:06,175-Speed 2624.55 samples/sec   Loss 12.6077   LearningRate 0.0811   Epoch: 1   Global Step: 82380   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:10,071-Speed 2629.04 samples/sec   Loss 12.6641   LearningRate 0.0811   Epoch: 1   Global Step: 82390   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:13,965-Speed 2630.74 samples/sec   Loss 12.6721   LearningRate 0.0811   Epoch: 1   Global Step: 82400   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:17,860-Speed 2629.90 samples/sec   Loss 12.7596   LearningRate 0.0811   Epoch: 1   Global Step: 82410   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:21,754-Speed 2630.35 samples/sec   Loss 12.7226   LearningRate 0.0811   Epoch: 1   Global Step: 82420   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:25,646-Speed 2631.05 samples/sec   Loss 12.7139   LearningRate 0.0811   Epoch: 1   Global Step: 82430   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:29,528-Speed 2638.37 samples/sec   Loss 12.6202   LearningRate 0.0811   Epoch: 1   Global Step: 82440   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:33,423-Speed 2629.60 samples/sec   Loss 12.6714   LearningRate 0.0811   Epoch: 1   Global Step: 82450   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:37,315-Speed 2631.71 samples/sec   Loss 12.6482   LearningRate 0.0811   Epoch: 1   Global Step: 82460   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:41,209-Speed 2629.96 samples/sec   Loss 12.6327   LearningRate 0.0811   Epoch: 1   Global Step: 82470   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:45,105-Speed 2629.25 samples/sec   Loss 12.6468   LearningRate 0.0811   Epoch: 1   Global Step: 82480   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:49,010-Speed 2623.20 samples/sec   Loss 12.6882   LearningRate 0.0811   Epoch: 1   Global Step: 82490   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:52,907-Speed 2628.60 samples/sec   Loss 12.6799   LearningRate 0.0811   Epoch: 1   Global Step: 82500   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 04:59:56,807-Speed 2626.17 samples/sec   Loss 12.8499   LearningRate 0.0811   Epoch: 1   Global Step: 82510   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:00,709-Speed 2624.63 samples/sec   Loss 12.7245   LearningRate 0.0811   Epoch: 1   Global Step: 82520   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:04,621-Speed 2617.64 samples/sec   Loss 12.7288   LearningRate 0.0811   Epoch: 1   Global Step: 82530   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:08,507-Speed 2636.30 samples/sec   Loss 12.6739   LearningRate 0.0811   Epoch: 1   Global Step: 82540   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:12,413-Speed 2621.67 samples/sec   Loss 12.7611   LearningRate 0.0811   Epoch: 1   Global Step: 82550   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:16,309-Speed 2628.92 samples/sec   Loss 12.7189   LearningRate 0.0811   Epoch: 1   Global Step: 82560   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:20,205-Speed 2629.60 samples/sec   Loss 12.5895   LearningRate 0.0811   Epoch: 1   Global Step: 82570   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:24,101-Speed 2628.86 samples/sec   Loss 12.7070   LearningRate 0.0811   Epoch: 1   Global Step: 82580   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:27,998-Speed 2628.10 samples/sec   Loss 12.6487   LearningRate 0.0811   Epoch: 1   Global Step: 82590   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:31,890-Speed 2631.67 samples/sec   Loss 12.7384   LearningRate 0.0811   Epoch: 1   Global Step: 82600   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:35,785-Speed 2629.36 samples/sec   Loss 12.7496   LearningRate 0.0811   Epoch: 1   Global Step: 82610   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:39,695-Speed 2619.55 samples/sec   Loss 12.7746   LearningRate 0.0811   Epoch: 1   Global Step: 82620   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:43,734-Speed 2536.12 samples/sec   Loss 12.6215   LearningRate 0.0811   Epoch: 1   Global Step: 82630   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:47,766-Speed 2540.17 samples/sec   Loss 12.7360   LearningRate 0.0811   Epoch: 1   Global Step: 82640   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:51,666-Speed 2626.85 samples/sec   Loss 12.7079   LearningRate 0.0811   Epoch: 1   Global Step: 82650   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:55,563-Speed 2628.36 samples/sec   Loss 12.7221   LearningRate 0.0811   Epoch: 1   Global Step: 82660   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:00:59,460-Speed 2627.54 samples/sec   Loss 12.7136   LearningRate 0.0811   Epoch: 1   Global Step: 82670   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:01:03,356-Speed 2628.82 samples/sec   Loss 12.6344   LearningRate 0.0811   Epoch: 1   Global Step: 82680   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:01:07,236-Speed 2639.94 samples/sec   Loss 12.7343   LearningRate 0.0811   Epoch: 1   Global Step: 82690   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:11,134-Speed 2627.79 samples/sec   Loss 12.6650   LearningRate 0.0811   Epoch: 1   Global Step: 82700   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:15,058-Speed 2609.82 samples/sec   Loss 12.6163   LearningRate 0.0811   Epoch: 1   Global Step: 82710   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:18,955-Speed 2628.47 samples/sec   Loss 12.6202   LearningRate 0.0811   Epoch: 1   Global Step: 82720   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:22,855-Speed 2626.19 samples/sec   Loss 12.6998   LearningRate 0.0810   Epoch: 1   Global Step: 82730   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:26,756-Speed 2626.05 samples/sec   Loss 12.7759   LearningRate 0.0810   Epoch: 1   Global Step: 82740   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:30,658-Speed 2624.96 samples/sec   Loss 12.7015   LearningRate 0.0810   Epoch: 1   Global Step: 82750   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:34,558-Speed 2626.05 samples/sec   Loss 12.7171   LearningRate 0.0810   Epoch: 1   Global Step: 82760   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:38,451-Speed 2630.89 samples/sec   Loss 12.7009   LearningRate 0.0810   Epoch: 1   Global Step: 82770   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:42,340-Speed 2633.38 samples/sec   Loss 12.6431   LearningRate 0.0810   Epoch: 1   Global Step: 82780   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:01:46,234-Speed 2630.27 samples/sec   Loss 12.7664   LearningRate 0.0810   Epoch: 1   Global Step: 82790   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:01:50,129-Speed 2629.83 samples/sec   Loss 12.6279   LearningRate 0.0810   Epoch: 1   Global Step: 82800   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:01:54,032-Speed 2623.75 samples/sec   Loss 12.8490   LearningRate 0.0810   Epoch: 1   Global Step: 82810   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:01:57,936-Speed 2624.14 samples/sec   Loss 12.6811   LearningRate 0.0810   Epoch: 1   Global Step: 82820   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:02:01,864-Speed 2607.11 samples/sec   Loss 12.6864   LearningRate 0.0810   Epoch: 1   Global Step: 82830   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:02:05,771-Speed 2622.16 samples/sec   Loss 12.8650   LearningRate 0.0810   Epoch: 1   Global Step: 82840   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:02:09,668-Speed 2627.87 samples/sec   Loss 12.6440   LearningRate 0.0810   Epoch: 1   Global Step: 82850   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:02:13,567-Speed 2627.22 samples/sec   Loss 12.6982   LearningRate 0.0810   Epoch: 1   Global Step: 82860   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:02:17,464-Speed 2628.07 samples/sec   Loss 12.8198   LearningRate 0.0810   Epoch: 1   Global Step: 82870   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:02:21,325-Speed 2652.76 samples/sec   Loss 12.8725   LearningRate 0.0810   Epoch: 1   Global Step: 82880   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:02:25,218-Speed 2630.76 samples/sec   Loss 12.7274   LearningRate 0.0810   Epoch: 1   Global Step: 82890   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:02:29,111-Speed 2630.76 samples/sec   Loss 12.9040   LearningRate 0.0810   Epoch: 1   Global Step: 82900   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:02:33,004-Speed 2631.09 samples/sec   Loss 12.7294   LearningRate 0.0810   Epoch: 1   Global Step: 82910   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:02:36,906-Speed 2624.73 samples/sec   Loss 12.6230   LearningRate 0.0810   Epoch: 1   Global Step: 82920   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:02:40,817-Speed 2619.28 samples/sec   Loss 12.9445   LearningRate 0.0810   Epoch: 1   Global Step: 82930   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:02:44,716-Speed 2627.21 samples/sec   Loss 12.8571   LearningRate 0.0810   Epoch: 1   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:02:48,635-Speed 2613.38 samples/sec   Loss 12.7275   LearningRate 0.0810   Epoch: 1   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:03:09,768-Speed 484.57 samples/sec   Loss 12.6181   LearningRate 0.0810   Epoch: 2   Global Step: 82960   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:03:13,643-Speed 2643.91 samples/sec   Loss 12.6812   LearningRate 0.0810   Epoch: 2   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:03:17,526-Speed 2637.38 samples/sec   Loss 12.7368   LearningRate 0.0810   Epoch: 2   Global Step: 82980   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:21,408-Speed 2639.01 samples/sec   Loss 12.7815   LearningRate 0.0810   Epoch: 2   Global Step: 82990   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:25,289-Speed 2639.04 samples/sec   Loss 12.7604   LearningRate 0.0810   Epoch: 2   Global Step: 83000   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:29,185-Speed 2628.80 samples/sec   Loss 12.7710   LearningRate 0.0810   Epoch: 2   Global Step: 83010   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:33,090-Speed 2623.52 samples/sec   Loss 12.8139   LearningRate 0.0810   Epoch: 2   Global Step: 83020   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:36,984-Speed 2630.33 samples/sec   Loss 12.7055   LearningRate 0.0810   Epoch: 2   Global Step: 83030   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:40,870-Speed 2635.55 samples/sec   Loss 12.6682   LearningRate 0.0810   Epoch: 2   Global Step: 83040   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:44,762-Speed 2631.98 samples/sec   Loss 12.6308   LearningRate 0.0810   Epoch: 2   Global Step: 83050   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:48,666-Speed 2623.54 samples/sec   Loss 12.5435   LearningRate 0.0810   Epoch: 2   Global Step: 83060   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:52,555-Speed 2633.51 samples/sec   Loss 12.7190   LearningRate 0.0810   Epoch: 2   Global Step: 83070   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:03:56,447-Speed 2632.64 samples/sec   Loss 12.8639   LearningRate 0.0810   Epoch: 2   Global Step: 83080   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:00,337-Speed 2632.53 samples/sec   Loss 12.5883   LearningRate 0.0810   Epoch: 2   Global Step: 83090   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:04,229-Speed 2632.03 samples/sec   Loss 12.7701   LearningRate 0.0810   Epoch: 2   Global Step: 83100   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:08,124-Speed 2629.20 samples/sec   Loss 12.6821   LearningRate 0.0810   Epoch: 2   Global Step: 83110   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:12,020-Speed 2629.61 samples/sec   Loss 12.8174   LearningRate 0.0810   Epoch: 2   Global Step: 83120   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:15,914-Speed 2629.95 samples/sec   Loss 12.4704   LearningRate 0.0810   Epoch: 2   Global Step: 83130   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:19,806-Speed 2631.90 samples/sec   Loss 12.7375   LearningRate 0.0810   Epoch: 2   Global Step: 83140   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:23,698-Speed 2632.07 samples/sec   Loss 12.5765   LearningRate 0.0810   Epoch: 2   Global Step: 83150   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:27,597-Speed 2626.99 samples/sec   Loss 12.6542   LearningRate 0.0810   Epoch: 2   Global Step: 83160   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:31,489-Speed 2631.48 samples/sec   Loss 12.5551   LearningRate 0.0810   Epoch: 2   Global Step: 83170   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:35,367-Speed 2641.72 samples/sec   Loss 12.5882   LearningRate 0.0810   Epoch: 2   Global Step: 83180   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:39,261-Speed 2630.77 samples/sec   Loss 12.7456   LearningRate 0.0809   Epoch: 2   Global Step: 83190   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:43,153-Speed 2631.41 samples/sec   Loss 12.6461   LearningRate 0.0809   Epoch: 2   Global Step: 83200   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:47,051-Speed 2627.69 samples/sec   Loss 12.7515   LearningRate 0.0809   Epoch: 2   Global Step: 83210   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:50,949-Speed 2627.60 samples/sec   Loss 12.6849   LearningRate 0.0809   Epoch: 2   Global Step: 83220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:54,844-Speed 2629.94 samples/sec   Loss 12.7803   LearningRate 0.0809   Epoch: 2   Global Step: 83230   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:04:58,747-Speed 2623.87 samples/sec   Loss 12.6311   LearningRate 0.0809   Epoch: 2   Global Step: 83240   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:02,655-Speed 2620.68 samples/sec   Loss 12.6180   LearningRate 0.0809   Epoch: 2   Global Step: 83250   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:06,549-Speed 2630.42 samples/sec   Loss 12.5331   LearningRate 0.0809   Epoch: 2   Global Step: 83260   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:10,457-Speed 2621.34 samples/sec   Loss 12.8018   LearningRate 0.0809   Epoch: 2   Global Step: 83270   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:14,335-Speed 2641.28 samples/sec   Loss 12.7061   LearningRate 0.0809   Epoch: 2   Global Step: 83280   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:18,233-Speed 2627.44 samples/sec   Loss 12.6106   LearningRate 0.0809   Epoch: 2   Global Step: 83290   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:22,127-Speed 2630.78 samples/sec   Loss 12.7963   LearningRate 0.0809   Epoch: 2   Global Step: 83300   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:26,023-Speed 2628.65 samples/sec   Loss 12.7176   LearningRate 0.0809   Epoch: 2   Global Step: 83310   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:29,921-Speed 2628.12 samples/sec   Loss 12.6952   LearningRate 0.0809   Epoch: 2   Global Step: 83320   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:33,820-Speed 2626.39 samples/sec   Loss 12.6567   LearningRate 0.0809   Epoch: 2   Global Step: 83330   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:37,717-Speed 2628.40 samples/sec   Loss 12.6643   LearningRate 0.0809   Epoch: 2   Global Step: 83340   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:41,616-Speed 2626.94 samples/sec   Loss 12.6514   LearningRate 0.0809   Epoch: 2   Global Step: 83350   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:45,513-Speed 2628.53 samples/sec   Loss 12.5549   LearningRate 0.0809   Epoch: 2   Global Step: 83360   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:49,408-Speed 2631.72 samples/sec   Loss 12.8500   LearningRate 0.0809   Epoch: 2   Global Step: 83370   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:05:53,303-Speed 2629.03 samples/sec   Loss 12.7266   LearningRate 0.0809   Epoch: 2   Global Step: 83380   Fp16 Grad Scale: 524288   Required: 84 hours
Training: 2022-04-13 05:05:57,180-Speed 2641.95 samples/sec   Loss 12.5868   LearningRate 0.0809   Epoch: 2   Global Step: 83390   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:01,074-Speed 2630.60 samples/sec   Loss 12.5005   LearningRate 0.0809   Epoch: 2   Global Step: 83400   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:04,978-Speed 2623.57 samples/sec   Loss 12.5689   LearningRate 0.0809   Epoch: 2   Global Step: 83410   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:08,879-Speed 2625.33 samples/sec   Loss 12.6257   LearningRate 0.0809   Epoch: 2   Global Step: 83420   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:12,774-Speed 2629.43 samples/sec   Loss 12.5673   LearningRate 0.0809   Epoch: 2   Global Step: 83430   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:16,674-Speed 2626.86 samples/sec   Loss 12.7097   LearningRate 0.0809   Epoch: 2   Global Step: 83440   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:20,571-Speed 2628.38 samples/sec   Loss 12.7615   LearningRate 0.0809   Epoch: 2   Global Step: 83450   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:24,470-Speed 2626.85 samples/sec   Loss 12.5454   LearningRate 0.0809   Epoch: 2   Global Step: 83460   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:28,451-Speed 2572.77 samples/sec   Loss 12.5263   LearningRate 0.0809   Epoch: 2   Global Step: 83470   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:32,349-Speed 2627.87 samples/sec   Loss 12.6705   LearningRate 0.0809   Epoch: 2   Global Step: 83480   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:36,236-Speed 2634.75 samples/sec   Loss 12.7527   LearningRate 0.0809   Epoch: 2   Global Step: 83490   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:40,206-Speed 2579.60 samples/sec   Loss 12.6085   LearningRate 0.0809   Epoch: 2   Global Step: 83500   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:06:44,227-Speed 2547.63 samples/sec   Loss 12.6121   LearningRate 0.0809   Epoch: 2   Global Step: 83510   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:06:48,147-Speed 2612.67 samples/sec   Loss 12.4387   LearningRate 0.0809   Epoch: 2   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:06:52,048-Speed 2626.02 samples/sec   Loss 12.6112   LearningRate 0.0809   Epoch: 2   Global Step: 83530   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:06:55,945-Speed 2628.19 samples/sec   Loss 12.6596   LearningRate 0.0809   Epoch: 2   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:06:59,840-Speed 2630.18 samples/sec   Loss 12.8412   LearningRate 0.0809   Epoch: 2   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:03,734-Speed 2629.76 samples/sec   Loss 12.8221   LearningRate 0.0809   Epoch: 2   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:07,626-Speed 2631.66 samples/sec   Loss 12.8442   LearningRate 0.0809   Epoch: 2   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:11,518-Speed 2631.36 samples/sec   Loss 12.6725   LearningRate 0.0809   Epoch: 2   Global Step: 83580   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:15,418-Speed 2626.40 samples/sec   Loss 12.5799   LearningRate 0.0809   Epoch: 2   Global Step: 83590   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:19,358-Speed 2599.68 samples/sec   Loss 12.6651   LearningRate 0.0809   Epoch: 2   Global Step: 83600   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:23,263-Speed 2623.06 samples/sec   Loss 12.6659   LearningRate 0.0809   Epoch: 2   Global Step: 83610   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:07:27,166-Speed 2624.60 samples/sec   Loss 12.6874   LearningRate 0.0809   Epoch: 2   Global Step: 83620   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:07:31,083-Speed 2614.61 samples/sec   Loss 12.7385   LearningRate 0.0809   Epoch: 2   Global Step: 83630   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:07:34,972-Speed 2633.62 samples/sec   Loss 12.8003   LearningRate 0.0809   Epoch: 2   Global Step: 83640   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:38,878-Speed 2622.44 samples/sec   Loss 12.8965   LearningRate 0.0808   Epoch: 2   Global Step: 83650   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:42,770-Speed 2631.47 samples/sec   Loss 12.6835   LearningRate 0.0808   Epoch: 2   Global Step: 83660   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:46,662-Speed 2631.55 samples/sec   Loss 12.8774   LearningRate 0.0808   Epoch: 2   Global Step: 83670   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:50,556-Speed 2630.44 samples/sec   Loss 12.7550   LearningRate 0.0808   Epoch: 2   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:54,450-Speed 2630.22 samples/sec   Loss 12.6474   LearningRate 0.0808   Epoch: 2   Global Step: 83690   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:07:58,351-Speed 2626.01 samples/sec   Loss 12.7320   LearningRate 0.0808   Epoch: 2   Global Step: 83700   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:08:02,245-Speed 2629.79 samples/sec   Loss 12.6085   LearningRate 0.0808   Epoch: 2   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:08:06,136-Speed 2632.21 samples/sec   Loss 12.7492   LearningRate 0.0808   Epoch: 2   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:08:10,031-Speed 2630.24 samples/sec   Loss 12.7555   LearningRate 0.0808   Epoch: 2   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:08:13,929-Speed 2627.42 samples/sec   Loss 12.6233   LearningRate 0.0808   Epoch: 2   Global Step: 83740   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:17,828-Speed 2626.86 samples/sec   Loss 12.7859   LearningRate 0.0808   Epoch: 2   Global Step: 83750   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:21,721-Speed 2630.69 samples/sec   Loss 12.6010   LearningRate 0.0808   Epoch: 2   Global Step: 83760   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:25,621-Speed 2626.42 samples/sec   Loss 12.6841   LearningRate 0.0808   Epoch: 2   Global Step: 83770   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:29,533-Speed 2618.63 samples/sec   Loss 12.7767   LearningRate 0.0808   Epoch: 2   Global Step: 83780   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:33,443-Speed 2619.35 samples/sec   Loss 12.7096   LearningRate 0.0808   Epoch: 2   Global Step: 83790   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:37,354-Speed 2619.33 samples/sec   Loss 12.6757   LearningRate 0.0808   Epoch: 2   Global Step: 83800   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:41,264-Speed 2619.20 samples/sec   Loss 12.5644   LearningRate 0.0808   Epoch: 2   Global Step: 83810   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:45,199-Speed 2603.31 samples/sec   Loss 12.6744   LearningRate 0.0808   Epoch: 2   Global Step: 83820   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:49,109-Speed 2619.69 samples/sec   Loss 12.7836   LearningRate 0.0808   Epoch: 2   Global Step: 83830   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:08:53,024-Speed 2616.57 samples/sec   Loss 12.7465   LearningRate 0.0808   Epoch: 2   Global Step: 83840   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:08:56,928-Speed 2623.74 samples/sec   Loss 12.6630   LearningRate 0.0808   Epoch: 2   Global Step: 83850   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:00,835-Speed 2621.33 samples/sec   Loss 12.7791   LearningRate 0.0808   Epoch: 2   Global Step: 83860   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:04,736-Speed 2626.25 samples/sec   Loss 12.6649   LearningRate 0.0808   Epoch: 2   Global Step: 83870   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:08,637-Speed 2625.10 samples/sec   Loss 12.7156   LearningRate 0.0808   Epoch: 2   Global Step: 83880   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:12,545-Speed 2621.07 samples/sec   Loss 12.7347   LearningRate 0.0808   Epoch: 2   Global Step: 83890   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:16,460-Speed 2616.14 samples/sec   Loss 12.7525   LearningRate 0.0808   Epoch: 2   Global Step: 83900   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:20,360-Speed 2626.32 samples/sec   Loss 12.8300   LearningRate 0.0808   Epoch: 2   Global Step: 83910   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:24,264-Speed 2624.45 samples/sec   Loss 12.6140   LearningRate 0.0808   Epoch: 2   Global Step: 83920   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:28,167-Speed 2623.71 samples/sec   Loss 12.7353   LearningRate 0.0808   Epoch: 2   Global Step: 83930   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:32,093-Speed 2609.51 samples/sec   Loss 12.7445   LearningRate 0.0808   Epoch: 2   Global Step: 83940   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:36,007-Speed 2617.00 samples/sec   Loss 12.7494   LearningRate 0.0808   Epoch: 2   Global Step: 83950   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:39,913-Speed 2621.60 samples/sec   Loss 12.7077   LearningRate 0.0808   Epoch: 2   Global Step: 83960   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:43,823-Speed 2619.99 samples/sec   Loss 12.8054   LearningRate 0.0808   Epoch: 2   Global Step: 83970   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:47,733-Speed 2619.50 samples/sec   Loss 12.6227   LearningRate 0.0808   Epoch: 2   Global Step: 83980   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:51,634-Speed 2625.57 samples/sec   Loss 12.6517   LearningRate 0.0808   Epoch: 2   Global Step: 83990   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:55,538-Speed 2624.28 samples/sec   Loss 12.7417   LearningRate 0.0808   Epoch: 2   Global Step: 84000   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:09:59,433-Speed 2629.44 samples/sec   Loss 12.8201   LearningRate 0.0808   Epoch: 2   Global Step: 84010   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:03,366-Speed 2604.58 samples/sec   Loss 12.6396   LearningRate 0.0808   Epoch: 2   Global Step: 84020   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:07,286-Speed 2612.67 samples/sec   Loss 12.7039   LearningRate 0.0808   Epoch: 2   Global Step: 84030   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:11,212-Speed 2608.81 samples/sec   Loss 12.5985   LearningRate 0.0808   Epoch: 2   Global Step: 84040   Fp16 Grad Scale: 524288   Required: 84 hours
Training: 2022-04-13 05:10:15,095-Speed 2638.10 samples/sec   Loss 12.5920   LearningRate 0.0808   Epoch: 2   Global Step: 84050   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:18,998-Speed 2624.05 samples/sec   Loss 12.6899   LearningRate 0.0808   Epoch: 2   Global Step: 84060   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:22,905-Speed 2621.61 samples/sec   Loss 12.7865   LearningRate 0.0808   Epoch: 2   Global Step: 84070   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:26,811-Speed 2622.04 samples/sec   Loss 12.6616   LearningRate 0.0808   Epoch: 2   Global Step: 84080   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:30,712-Speed 2625.93 samples/sec   Loss 12.7313   LearningRate 0.0808   Epoch: 2   Global Step: 84090   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:34,626-Speed 2617.09 samples/sec   Loss 12.7410   LearningRate 0.0808   Epoch: 2   Global Step: 84100   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:38,529-Speed 2624.52 samples/sec   Loss 12.6205   LearningRate 0.0808   Epoch: 2   Global Step: 84110   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:42,450-Speed 2611.63 samples/sec   Loss 12.8531   LearningRate 0.0807   Epoch: 2   Global Step: 84120   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:46,390-Speed 2599.63 samples/sec   Loss 12.6965   LearningRate 0.0807   Epoch: 2   Global Step: 84130   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:50,298-Speed 2620.72 samples/sec   Loss 12.7952   LearningRate 0.0807   Epoch: 2   Global Step: 84140   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:54,235-Speed 2602.61 samples/sec   Loss 12.5031   LearningRate 0.0807   Epoch: 2   Global Step: 84150   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:10:58,136-Speed 2625.89 samples/sec   Loss 12.7377   LearningRate 0.0807   Epoch: 2   Global Step: 84160   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:02,034-Speed 2627.66 samples/sec   Loss 12.5362   LearningRate 0.0807   Epoch: 2   Global Step: 84170   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:05,939-Speed 2622.34 samples/sec   Loss 12.6135   LearningRate 0.0807   Epoch: 2   Global Step: 84180   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:09,845-Speed 2622.87 samples/sec   Loss 12.6470   LearningRate 0.0807   Epoch: 2   Global Step: 84190   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:13,743-Speed 2627.37 samples/sec   Loss 12.6101   LearningRate 0.0807   Epoch: 2   Global Step: 84200   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:17,642-Speed 2626.77 samples/sec   Loss 12.7002   LearningRate 0.0807   Epoch: 2   Global Step: 84210   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:21,537-Speed 2629.42 samples/sec   Loss 12.5859   LearningRate 0.0807   Epoch: 2   Global Step: 84220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:25,433-Speed 2628.86 samples/sec   Loss 12.4878   LearningRate 0.0807   Epoch: 2   Global Step: 84230   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:29,335-Speed 2626.00 samples/sec   Loss 12.6438   LearningRate 0.0807   Epoch: 2   Global Step: 84240   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:11:33,238-Speed 2623.78 samples/sec   Loss 12.6012   LearningRate 0.0807   Epoch: 2   Global Step: 84250   Fp16 Grad Scale: 524288   Required: 84 hours
Training: 2022-04-13 05:11:37,113-Speed 2642.99 samples/sec   Loss 12.5769   LearningRate 0.0807   Epoch: 2   Global Step: 84260   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:11:41,017-Speed 2623.68 samples/sec   Loss 12.6687   LearningRate 0.0807   Epoch: 2   Global Step: 84270   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:11:44,923-Speed 2622.62 samples/sec   Loss 12.5435   LearningRate 0.0807   Epoch: 2   Global Step: 84280   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:11:48,827-Speed 2623.21 samples/sec   Loss 12.8105   LearningRate 0.0807   Epoch: 2   Global Step: 84290   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:11:52,727-Speed 2627.04 samples/sec   Loss 12.7847   LearningRate 0.0807   Epoch: 2   Global Step: 84300   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:11:56,629-Speed 2624.89 samples/sec   Loss 12.7399   LearningRate 0.0807   Epoch: 2   Global Step: 84310   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:12:00,527-Speed 2627.30 samples/sec   Loss 12.7072   LearningRate 0.0807   Epoch: 2   Global Step: 84320   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:12:04,433-Speed 2622.70 samples/sec   Loss 12.6864   LearningRate 0.0807   Epoch: 2   Global Step: 84330   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:12:08,333-Speed 2626.34 samples/sec   Loss 12.6342   LearningRate 0.0807   Epoch: 2   Global Step: 84340   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:12:12,232-Speed 2626.73 samples/sec   Loss 12.7220   LearningRate 0.0807   Epoch: 2   Global Step: 84350   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:12:16,124-Speed 2631.95 samples/sec   Loss 12.5929   LearningRate 0.0807   Epoch: 2   Global Step: 84360   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:20,018-Speed 2630.20 samples/sec   Loss 12.6315   LearningRate 0.0807   Epoch: 2   Global Step: 84370   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:23,914-Speed 2629.13 samples/sec   Loss 12.6572   LearningRate 0.0807   Epoch: 2   Global Step: 84380   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:27,810-Speed 2629.48 samples/sec   Loss 12.5986   LearningRate 0.0807   Epoch: 2   Global Step: 84390   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:31,713-Speed 2623.92 samples/sec   Loss 12.6473   LearningRate 0.0807   Epoch: 2   Global Step: 84400   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:35,756-Speed 2533.22 samples/sec   Loss 12.8007   LearningRate 0.0807   Epoch: 2   Global Step: 84410   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:39,647-Speed 2632.27 samples/sec   Loss 12.7784   LearningRate 0.0807   Epoch: 2   Global Step: 84420   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:43,543-Speed 2629.66 samples/sec   Loss 12.6485   LearningRate 0.0807   Epoch: 2   Global Step: 84430   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:47,439-Speed 2628.56 samples/sec   Loss 12.8082   LearningRate 0.0807   Epoch: 2   Global Step: 84440   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:51,335-Speed 2629.17 samples/sec   Loss 12.5653   LearningRate 0.0807   Epoch: 2   Global Step: 84450   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:12:55,254-Speed 2613.73 samples/sec   Loss 12.7548   LearningRate 0.0807   Epoch: 2   Global Step: 84460   Fp16 Grad Scale: 524288   Required: 84 hours
Training: 2022-04-13 05:12:59,149-Speed 2629.50 samples/sec   Loss 12.5432   LearningRate 0.0807   Epoch: 2   Global Step: 84470   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:03,045-Speed 2629.03 samples/sec   Loss 12.6680   LearningRate 0.0807   Epoch: 2   Global Step: 84480   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:06,938-Speed 2631.14 samples/sec   Loss 12.6283   LearningRate 0.0807   Epoch: 2   Global Step: 84490   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:10,892-Speed 2589.90 samples/sec   Loss 12.6699   LearningRate 0.0807   Epoch: 2   Global Step: 84500   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:14,800-Speed 2621.75 samples/sec   Loss 12.6858   LearningRate 0.0807   Epoch: 2   Global Step: 84510   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:18,691-Speed 2632.27 samples/sec   Loss 12.7933   LearningRate 0.0807   Epoch: 2   Global Step: 84520   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:22,585-Speed 2630.34 samples/sec   Loss 12.6295   LearningRate 0.0807   Epoch: 2   Global Step: 84530   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:26,478-Speed 2631.31 samples/sec   Loss 12.6732   LearningRate 0.0807   Epoch: 2   Global Step: 84540   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:30,372-Speed 2630.30 samples/sec   Loss 12.5617   LearningRate 0.0807   Epoch: 2   Global Step: 84550   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:13:34,254-Speed 2638.54 samples/sec   Loss 12.6293   LearningRate 0.0807   Epoch: 2   Global Step: 84560   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:13:38,151-Speed 2628.34 samples/sec   Loss 12.5032   LearningRate 0.0807   Epoch: 2   Global Step: 84570   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:13:42,110-Speed 2586.56 samples/sec   Loss 12.7094   LearningRate 0.0806   Epoch: 2   Global Step: 84580   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:13:46,012-Speed 2626.45 samples/sec   Loss 12.5143   LearningRate 0.0806   Epoch: 2   Global Step: 84590   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:13:49,901-Speed 2633.37 samples/sec   Loss 12.5269   LearningRate 0.0806   Epoch: 2   Global Step: 84600   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:13:53,795-Speed 2630.54 samples/sec   Loss 12.6594   LearningRate 0.0806   Epoch: 2   Global Step: 84610   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:13:57,698-Speed 2624.30 samples/sec   Loss 12.8048   LearningRate 0.0806   Epoch: 2   Global Step: 84620   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:14:01,591-Speed 2630.52 samples/sec   Loss 12.5336   LearningRate 0.0806   Epoch: 2   Global Step: 84630   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:14:05,491-Speed 2626.15 samples/sec   Loss 12.5849   LearningRate 0.0806   Epoch: 2   Global Step: 84640   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:14:09,405-Speed 2616.87 samples/sec   Loss 12.7306   LearningRate 0.0806   Epoch: 2   Global Step: 84650   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:14:13,509-Speed 2495.98 samples/sec   Loss 12.6487   LearningRate 0.0806   Epoch: 2   Global Step: 84660   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:17,403-Speed 2630.48 samples/sec   Loss 12.7807   LearningRate 0.0806   Epoch: 2   Global Step: 84670   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:21,308-Speed 2622.83 samples/sec   Loss 12.4878   LearningRate 0.0806   Epoch: 2   Global Step: 84680   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:25,206-Speed 2628.53 samples/sec   Loss 12.6558   LearningRate 0.0806   Epoch: 2   Global Step: 84690   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:29,103-Speed 2627.59 samples/sec   Loss 12.7988   LearningRate 0.0806   Epoch: 2   Global Step: 84700   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:32,997-Speed 2630.94 samples/sec   Loss 12.6838   LearningRate 0.0806   Epoch: 2   Global Step: 84710   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:36,900-Speed 2623.62 samples/sec   Loss 12.7378   LearningRate 0.0806   Epoch: 2   Global Step: 84720   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:40,808-Speed 2621.10 samples/sec   Loss 12.7757   LearningRate 0.0806   Epoch: 2   Global Step: 84730   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:44,703-Speed 2629.16 samples/sec   Loss 12.5519   LearningRate 0.0806   Epoch: 2   Global Step: 84740   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:48,601-Speed 2628.57 samples/sec   Loss 12.5955   LearningRate 0.0806   Epoch: 2   Global Step: 84750   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:52,479-Speed 2641.15 samples/sec   Loss 12.7336   LearningRate 0.0806   Epoch: 2   Global Step: 84760   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:14:56,355-Speed 2642.61 samples/sec   Loss 12.5958   LearningRate 0.0806   Epoch: 2   Global Step: 84770   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:00,252-Speed 2628.19 samples/sec   Loss 12.7699   LearningRate 0.0806   Epoch: 2   Global Step: 84780   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:04,216-Speed 2584.08 samples/sec   Loss 12.7556   LearningRate 0.0806   Epoch: 2   Global Step: 84790   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:08,139-Speed 2610.22 samples/sec   Loss 12.5051   LearningRate 0.0806   Epoch: 2   Global Step: 84800   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:12,169-Speed 2541.62 samples/sec   Loss 12.7456   LearningRate 0.0806   Epoch: 2   Global Step: 84810   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:16,165-Speed 2562.81 samples/sec   Loss 12.7446   LearningRate 0.0806   Epoch: 2   Global Step: 84820   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:20,070-Speed 2623.69 samples/sec   Loss 12.6422   LearningRate 0.0806   Epoch: 2   Global Step: 84830   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:23,968-Speed 2627.58 samples/sec   Loss 12.5929   LearningRate 0.0806   Epoch: 2   Global Step: 84840   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:27,862-Speed 2630.34 samples/sec   Loss 12.5913   LearningRate 0.0806   Epoch: 2   Global Step: 84850   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:31,756-Speed 2630.39 samples/sec   Loss 12.5731   LearningRate 0.0806   Epoch: 2   Global Step: 84860   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:15:35,653-Speed 2628.07 samples/sec   Loss 12.7372   LearningRate 0.0806   Epoch: 2   Global Step: 84870   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:15:39,558-Speed 2622.62 samples/sec   Loss 12.6132   LearningRate 0.0806   Epoch: 2   Global Step: 84880   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:15:43,530-Speed 2578.98 samples/sec   Loss 12.6684   LearningRate 0.0806   Epoch: 2   Global Step: 84890   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:15:47,608-Speed 2511.31 samples/sec   Loss 12.5944   LearningRate 0.0806   Epoch: 2   Global Step: 84900   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:15:51,511-Speed 2624.42 samples/sec   Loss 12.5465   LearningRate 0.0806   Epoch: 2   Global Step: 84910   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:15:55,416-Speed 2623.37 samples/sec   Loss 12.7334   LearningRate 0.0806   Epoch: 2   Global Step: 84920   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:15:59,295-Speed 2640.72 samples/sec   Loss 12.5934   LearningRate 0.0806   Epoch: 2   Global Step: 84930   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:03,311-Speed 2550.40 samples/sec   Loss 12.7536   LearningRate 0.0806   Epoch: 2   Global Step: 84940   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:07,358-Speed 2530.83 samples/sec   Loss 12.5569   LearningRate 0.0806   Epoch: 2   Global Step: 84950   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:11,369-Speed 2553.66 samples/sec   Loss 12.5957   LearningRate 0.0806   Epoch: 2   Global Step: 84960   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:15,272-Speed 2624.48 samples/sec   Loss 12.5676   LearningRate 0.0806   Epoch: 2   Global Step: 84970   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:19,167-Speed 2629.20 samples/sec   Loss 12.6192   LearningRate 0.0806   Epoch: 2   Global Step: 84980   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:23,078-Speed 2618.68 samples/sec   Loss 12.5580   LearningRate 0.0806   Epoch: 2   Global Step: 84990   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:26,990-Speed 2618.43 samples/sec   Loss 12.5390   LearningRate 0.0806   Epoch: 2   Global Step: 85000   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:30,902-Speed 2618.61 samples/sec   Loss 12.6126   LearningRate 0.0806   Epoch: 2   Global Step: 85010   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:34,806-Speed 2623.31 samples/sec   Loss 12.6206   LearningRate 0.0806   Epoch: 2   Global Step: 85020   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:16:38,712-Speed 2622.33 samples/sec   Loss 12.6252   LearningRate 0.0806   Epoch: 2   Global Step: 85030   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:16:42,616-Speed 2623.66 samples/sec   Loss 12.6244   LearningRate 0.0805   Epoch: 2   Global Step: 85040   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:16:46,529-Speed 2617.33 samples/sec   Loss 12.6847   LearningRate 0.0805   Epoch: 2   Global Step: 85050   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:16:50,459-Speed 2606.38 samples/sec   Loss 12.6144   LearningRate 0.0805   Epoch: 2   Global Step: 85060   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:16:54,365-Speed 2622.53 samples/sec   Loss 12.5373   LearningRate 0.0805   Epoch: 2   Global Step: 85070   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:16:58,272-Speed 2621.79 samples/sec   Loss 12.5378   LearningRate 0.0805   Epoch: 2   Global Step: 85080   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:17:02,238-Speed 2582.40 samples/sec   Loss 12.8040   LearningRate 0.0805   Epoch: 2   Global Step: 85090   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:17:06,131-Speed 2630.78 samples/sec   Loss 12.6422   LearningRate 0.0805   Epoch: 2   Global Step: 85100   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:17:10,016-Speed 2637.23 samples/sec   Loss 12.6515   LearningRate 0.0805   Epoch: 2   Global Step: 85110   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:13,927-Speed 2618.56 samples/sec   Loss 12.7767   LearningRate 0.0805   Epoch: 2   Global Step: 85120   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:17,839-Speed 2618.14 samples/sec   Loss 12.6138   LearningRate 0.0805   Epoch: 2   Global Step: 85130   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:21,733-Speed 2630.83 samples/sec   Loss 12.6180   LearningRate 0.0805   Epoch: 2   Global Step: 85140   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:25,622-Speed 2633.35 samples/sec   Loss 12.6384   LearningRate 0.0805   Epoch: 2   Global Step: 85150   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:29,514-Speed 2631.60 samples/sec   Loss 12.4843   LearningRate 0.0805   Epoch: 2   Global Step: 85160   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:33,405-Speed 2632.77 samples/sec   Loss 12.5864   LearningRate 0.0805   Epoch: 2   Global Step: 85170   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:37,299-Speed 2630.30 samples/sec   Loss 12.5278   LearningRate 0.0805   Epoch: 2   Global Step: 85180   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:41,191-Speed 2631.47 samples/sec   Loss 12.7651   LearningRate 0.0805   Epoch: 2   Global Step: 85190   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:45,089-Speed 2628.03 samples/sec   Loss 12.6728   LearningRate 0.0805   Epoch: 2   Global Step: 85200   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:17:48,982-Speed 2630.88 samples/sec   Loss 12.6963   LearningRate 0.0805   Epoch: 2   Global Step: 85210   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:17:52,875-Speed 2631.10 samples/sec   Loss 12.6352   LearningRate 0.0805   Epoch: 2   Global Step: 85220   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:17:56,772-Speed 2628.18 samples/sec   Loss 12.5739   LearningRate 0.0805   Epoch: 2   Global Step: 85230   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:18:00,755-Speed 2571.61 samples/sec   Loss 12.5854   LearningRate 0.0805   Epoch: 2   Global Step: 85240   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:18:04,651-Speed 2629.15 samples/sec   Loss 12.4850   LearningRate 0.0805   Epoch: 2   Global Step: 85250   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:18:08,546-Speed 2629.10 samples/sec   Loss 12.6020   LearningRate 0.0805   Epoch: 2   Global Step: 85260   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:18:12,442-Speed 2629.22 samples/sec   Loss 12.6308   LearningRate 0.0805   Epoch: 2   Global Step: 85270   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:18:16,346-Speed 2623.24 samples/sec   Loss 12.5984   LearningRate 0.0805   Epoch: 2   Global Step: 85280   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:18:20,226-Speed 2640.69 samples/sec   Loss 12.6451   LearningRate 0.0805   Epoch: 2   Global Step: 85290   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:18:24,144-Speed 2613.92 samples/sec   Loss 12.6425   LearningRate 0.0805   Epoch: 2   Global Step: 85300   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:18:28,146-Speed 2559.39 samples/sec   Loss 12.6161   LearningRate 0.0805   Epoch: 2   Global Step: 85310   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:18:32,194-Speed 2530.52 samples/sec   Loss 12.6923   LearningRate 0.0805   Epoch: 2   Global Step: 85320   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:18:36,089-Speed 2629.07 samples/sec   Loss 12.8329   LearningRate 0.0805   Epoch: 2   Global Step: 85330   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:18:39,982-Speed 2630.66 samples/sec   Loss 12.6846   LearningRate 0.0805   Epoch: 2   Global Step: 85340   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:18:43,894-Speed 2618.64 samples/sec   Loss 12.5376   LearningRate 0.0805   Epoch: 2   Global Step: 85350   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:18:47,800-Speed 2622.72 samples/sec   Loss 12.4275   LearningRate 0.0805   Epoch: 2   Global Step: 85360   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:18:51,716-Speed 2615.79 samples/sec   Loss 12.7769   LearningRate 0.0805   Epoch: 2   Global Step: 85370   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:18:55,614-Speed 2627.83 samples/sec   Loss 12.6142   LearningRate 0.0805   Epoch: 2   Global Step: 85380   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:18:59,520-Speed 2622.57 samples/sec   Loss 12.6423   LearningRate 0.0805   Epoch: 2   Global Step: 85390   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:19:03,427-Speed 2621.69 samples/sec   Loss 12.5875   LearningRate 0.0805   Epoch: 2   Global Step: 85400   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:19:07,345-Speed 2613.80 samples/sec   Loss 12.6910   LearningRate 0.0805   Epoch: 2   Global Step: 85410   Fp16 Grad Scale: 65536   Required: 84 hours
Training: 2022-04-13 05:19:11,242-Speed 2628.46 samples/sec   Loss 12.6411   LearningRate 0.0805   Epoch: 2   Global Step: 85420   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:15,151-Speed 2620.42 samples/sec   Loss 12.6325   LearningRate 0.0805   Epoch: 2   Global Step: 85430   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:19,057-Speed 2621.70 samples/sec   Loss 12.5757   LearningRate 0.0805   Epoch: 2   Global Step: 85440   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:22,967-Speed 2620.44 samples/sec   Loss 12.5994   LearningRate 0.0805   Epoch: 2   Global Step: 85450   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:26,879-Speed 2618.11 samples/sec   Loss 12.5990   LearningRate 0.0805   Epoch: 2   Global Step: 85460   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:30,793-Speed 2617.02 samples/sec   Loss 12.5426   LearningRate 0.0805   Epoch: 2   Global Step: 85470   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:34,721-Speed 2607.87 samples/sec   Loss 12.6589   LearningRate 0.0805   Epoch: 2   Global Step: 85480   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:38,664-Speed 2597.06 samples/sec   Loss 12.6778   LearningRate 0.0805   Epoch: 2   Global Step: 85490   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:42,563-Speed 2627.04 samples/sec   Loss 12.7073   LearningRate 0.0804   Epoch: 2   Global Step: 85500   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:46,624-Speed 2522.41 samples/sec   Loss 12.5322   LearningRate 0.0804   Epoch: 2   Global Step: 85510   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:19:50,569-Speed 2596.56 samples/sec   Loss 12.7801   LearningRate 0.0804   Epoch: 2   Global Step: 85520   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:19:54,473-Speed 2623.19 samples/sec   Loss 12.7782   LearningRate 0.0804   Epoch: 2   Global Step: 85530   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:19:58,377-Speed 2624.12 samples/sec   Loss 12.7357   LearningRate 0.0804   Epoch: 2   Global Step: 85540   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:20:02,277-Speed 2626.65 samples/sec   Loss 12.7583   LearningRate 0.0804   Epoch: 2   Global Step: 85550   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:20:06,172-Speed 2629.10 samples/sec   Loss 12.6217   LearningRate 0.0804   Epoch: 2   Global Step: 85560   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:20:10,070-Speed 2627.68 samples/sec   Loss 12.6493   LearningRate 0.0804   Epoch: 2   Global Step: 85570   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:20:13,976-Speed 2622.09 samples/sec   Loss 12.7011   LearningRate 0.0804   Epoch: 2   Global Step: 85580   Fp16 Grad Scale: 262144   Required: 84 hours
Training: 2022-04-13 05:20:17,848-Speed 2645.28 samples/sec   Loss 12.6061   LearningRate 0.0804   Epoch: 2   Global Step: 85590   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:20:21,748-Speed 2626.79 samples/sec   Loss 12.6005   LearningRate 0.0804   Epoch: 2   Global Step: 85600   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:20:25,644-Speed 2629.58 samples/sec   Loss 12.5137   LearningRate 0.0804   Epoch: 2   Global Step: 85610   Fp16 Grad Scale: 131072   Required: 84 hours
Training: 2022-04-13 05:20:29,541-Speed 2627.72 samples/sec   Loss 12.5526   LearningRate 0.0804   Epoch: 2   Global Step: 85620   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:20:33,453-Speed 2618.67 samples/sec   Loss 12.4882   LearningRate 0.0804   Epoch: 2   Global Step: 85630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:20:37,354-Speed 2625.17 samples/sec   Loss 12.4238   LearningRate 0.0804   Epoch: 2   Global Step: 85640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:20:41,317-Speed 2584.65 samples/sec   Loss 12.5519   LearningRate 0.0804   Epoch: 2   Global Step: 85650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:20:45,409-Speed 2502.69 samples/sec   Loss 12.5808   LearningRate 0.0804   Epoch: 2   Global Step: 85660   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:20:49,345-Speed 2603.49 samples/sec   Loss 12.6116   LearningRate 0.0804   Epoch: 2   Global Step: 85670   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:20:53,243-Speed 2627.19 samples/sec   Loss 12.5540   LearningRate 0.0804   Epoch: 2   Global Step: 85680   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:20:57,139-Speed 2628.91 samples/sec   Loss 12.6394   LearningRate 0.0804   Epoch: 2   Global Step: 85690   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:01,039-Speed 2626.87 samples/sec   Loss 12.5670   LearningRate 0.0804   Epoch: 2   Global Step: 85700   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:04,938-Speed 2627.67 samples/sec   Loss 12.5786   LearningRate 0.0804   Epoch: 2   Global Step: 85710   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:08,836-Speed 2627.55 samples/sec   Loss 12.6309   LearningRate 0.0804   Epoch: 2   Global Step: 85720   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:12,731-Speed 2628.93 samples/sec   Loss 12.5673   LearningRate 0.0804   Epoch: 2   Global Step: 85730   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:16,640-Speed 2620.32 samples/sec   Loss 12.6501   LearningRate 0.0804   Epoch: 2   Global Step: 85740   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:20,544-Speed 2624.14 samples/sec   Loss 12.5459   LearningRate 0.0804   Epoch: 2   Global Step: 85750   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:24,440-Speed 2628.69 samples/sec   Loss 12.6845   LearningRate 0.0804   Epoch: 2   Global Step: 85760   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:28,339-Speed 2626.87 samples/sec   Loss 12.6532   LearningRate 0.0804   Epoch: 2   Global Step: 85770   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:32,250-Speed 2619.35 samples/sec   Loss 12.6822   LearningRate 0.0804   Epoch: 2   Global Step: 85780   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:36,156-Speed 2621.88 samples/sec   Loss 12.5162   LearningRate 0.0804   Epoch: 2   Global Step: 85790   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:40,067-Speed 2618.70 samples/sec   Loss 12.5610   LearningRate 0.0804   Epoch: 2   Global Step: 85800   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:43,963-Speed 2629.55 samples/sec   Loss 12.6083   LearningRate 0.0804   Epoch: 2   Global Step: 85810   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:47,855-Speed 2631.68 samples/sec   Loss 12.5728   LearningRate 0.0804   Epoch: 2   Global Step: 85820   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:21:51,732-Speed 2641.42 samples/sec   Loss 12.5581   LearningRate 0.0804   Epoch: 2   Global Step: 85830   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:21:55,635-Speed 2624.73 samples/sec   Loss 12.6421   LearningRate 0.0804   Epoch: 2   Global Step: 85840   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:21:59,530-Speed 2629.22 samples/sec   Loss 12.5954   LearningRate 0.0804   Epoch: 2   Global Step: 85850   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:03,425-Speed 2630.04 samples/sec   Loss 12.5197   LearningRate 0.0804   Epoch: 2   Global Step: 85860   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:07,319-Speed 2630.68 samples/sec   Loss 12.5688   LearningRate 0.0804   Epoch: 2   Global Step: 85870   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:11,255-Speed 2602.11 samples/sec   Loss 12.5571   LearningRate 0.0804   Epoch: 2   Global Step: 85880   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:15,143-Speed 2634.60 samples/sec   Loss 12.4978   LearningRate 0.0804   Epoch: 2   Global Step: 85890   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:19,040-Speed 2628.53 samples/sec   Loss 12.4238   LearningRate 0.0804   Epoch: 2   Global Step: 85900   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:22,943-Speed 2624.27 samples/sec   Loss 12.5431   LearningRate 0.0804   Epoch: 2   Global Step: 85910   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:26,897-Speed 2590.54 samples/sec   Loss 12.7123   LearningRate 0.0804   Epoch: 2   Global Step: 85920   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:30,780-Speed 2638.09 samples/sec   Loss 12.5622   LearningRate 0.0804   Epoch: 2   Global Step: 85930   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:34,705-Speed 2609.18 samples/sec   Loss 12.4802   LearningRate 0.0804   Epoch: 2   Global Step: 85940   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:38,610-Speed 2623.39 samples/sec   Loss 12.4552   LearningRate 0.0804   Epoch: 2   Global Step: 85950   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:42,503-Speed 2630.74 samples/sec   Loss 12.5740   LearningRate 0.0803   Epoch: 2   Global Step: 85960   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:46,407-Speed 2624.01 samples/sec   Loss 12.6258   LearningRate 0.0803   Epoch: 2   Global Step: 85970   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:50,311-Speed 2623.29 samples/sec   Loss 12.6021   LearningRate 0.0803   Epoch: 2   Global Step: 85980   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:54,230-Speed 2614.05 samples/sec   Loss 12.5940   LearningRate 0.0803   Epoch: 2   Global Step: 85990   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:22:58,121-Speed 2632.42 samples/sec   Loss 12.6143   LearningRate 0.0803   Epoch: 2   Global Step: 86000   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:02,016-Speed 2629.63 samples/sec   Loss 12.6218   LearningRate 0.0803   Epoch: 2   Global Step: 86010   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:05,913-Speed 2628.15 samples/sec   Loss 12.5078   LearningRate 0.0803   Epoch: 2   Global Step: 86020   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:09,806-Speed 2631.09 samples/sec   Loss 12.7434   LearningRate 0.0803   Epoch: 2   Global Step: 86030   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:23:13,684-Speed 2641.54 samples/sec   Loss 12.5888   LearningRate 0.0803   Epoch: 2   Global Step: 86040   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:17,577-Speed 2630.34 samples/sec   Loss 12.6355   LearningRate 0.0803   Epoch: 2   Global Step: 86050   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:21,470-Speed 2631.65 samples/sec   Loss 12.7385   LearningRate 0.0803   Epoch: 2   Global Step: 86060   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:25,365-Speed 2629.69 samples/sec   Loss 12.5696   LearningRate 0.0803   Epoch: 2   Global Step: 86070   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:29,257-Speed 2631.62 samples/sec   Loss 12.5925   LearningRate 0.0803   Epoch: 2   Global Step: 86080   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:33,150-Speed 2630.72 samples/sec   Loss 12.5885   LearningRate 0.0803   Epoch: 2   Global Step: 86090   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:37,050-Speed 2626.48 samples/sec   Loss 12.6926   LearningRate 0.0803   Epoch: 2   Global Step: 86100   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:40,947-Speed 2628.51 samples/sec   Loss 12.5802   LearningRate 0.0803   Epoch: 2   Global Step: 86110   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:44,840-Speed 2631.10 samples/sec   Loss 12.6867   LearningRate 0.0803   Epoch: 2   Global Step: 86120   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:48,739-Speed 2626.46 samples/sec   Loss 12.5719   LearningRate 0.0803   Epoch: 2   Global Step: 86130   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:23:52,646-Speed 2622.24 samples/sec   Loss 12.5666   LearningRate 0.0803   Epoch: 2   Global Step: 86140   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:23:56,540-Speed 2630.38 samples/sec   Loss 12.6406   LearningRate 0.0803   Epoch: 2   Global Step: 86150   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:00,434-Speed 2630.23 samples/sec   Loss 12.5468   LearningRate 0.0803   Epoch: 2   Global Step: 86160   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:04,332-Speed 2627.84 samples/sec   Loss 12.5976   LearningRate 0.0803   Epoch: 2   Global Step: 86170   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:08,224-Speed 2631.02 samples/sec   Loss 12.5573   LearningRate 0.0803   Epoch: 2   Global Step: 86180   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:12,119-Speed 2629.63 samples/sec   Loss 12.5549   LearningRate 0.0803   Epoch: 2   Global Step: 86190   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:16,017-Speed 2627.91 samples/sec   Loss 12.5582   LearningRate 0.0803   Epoch: 2   Global Step: 86200   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:19,918-Speed 2625.45 samples/sec   Loss 12.7513   LearningRate 0.0803   Epoch: 2   Global Step: 86210   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:23,814-Speed 2629.48 samples/sec   Loss 12.4987   LearningRate 0.0803   Epoch: 2   Global Step: 86220   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:27,710-Speed 2628.95 samples/sec   Loss 12.5670   LearningRate 0.0803   Epoch: 2   Global Step: 86230   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:31,617-Speed 2622.12 samples/sec   Loss 12.5370   LearningRate 0.0803   Epoch: 2   Global Step: 86240   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:35,507-Speed 2632.45 samples/sec   Loss 12.5307   LearningRate 0.0803   Epoch: 2   Global Step: 86250   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:39,404-Speed 2628.72 samples/sec   Loss 12.6861   LearningRate 0.0803   Epoch: 2   Global Step: 86260   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:43,303-Speed 2626.57 samples/sec   Loss 12.6169   LearningRate 0.0803   Epoch: 2   Global Step: 86270   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:47,204-Speed 2625.87 samples/sec   Loss 12.6654   LearningRate 0.0803   Epoch: 2   Global Step: 86280   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:51,119-Speed 2616.50 samples/sec   Loss 12.6449   LearningRate 0.0803   Epoch: 2   Global Step: 86290   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:55,018-Speed 2626.80 samples/sec   Loss 12.7166   LearningRate 0.0803   Epoch: 2   Global Step: 86300   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:24:58,903-Speed 2636.95 samples/sec   Loss 12.6607   LearningRate 0.0803   Epoch: 2   Global Step: 86310   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:25:02,823-Speed 2612.53 samples/sec   Loss 12.5981   LearningRate 0.0803   Epoch: 2   Global Step: 86320   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:25:06,718-Speed 2629.93 samples/sec   Loss 12.7260   LearningRate 0.0803   Epoch: 2   Global Step: 86330   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:25:10,592-Speed 2643.50 samples/sec   Loss 12.5671   LearningRate 0.0803   Epoch: 2   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:14,485-Speed 2631.60 samples/sec   Loss 12.6017   LearningRate 0.0803   Epoch: 2   Global Step: 86350   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:18,381-Speed 2628.94 samples/sec   Loss 12.5866   LearningRate 0.0803   Epoch: 2   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:22,274-Speed 2630.99 samples/sec   Loss 12.5736   LearningRate 0.0803   Epoch: 2   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:26,165-Speed 2632.60 samples/sec   Loss 12.6028   LearningRate 0.0803   Epoch: 2   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:30,055-Speed 2633.43 samples/sec   Loss 12.6121   LearningRate 0.0803   Epoch: 2   Global Step: 86390   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:33,950-Speed 2629.37 samples/sec   Loss 12.6952   LearningRate 0.0803   Epoch: 2   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:37,843-Speed 2630.54 samples/sec   Loss 12.6402   LearningRate 0.0803   Epoch: 2   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:41,737-Speed 2630.25 samples/sec   Loss 12.5248   LearningRate 0.0803   Epoch: 2   Global Step: 86420   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:45,632-Speed 2632.70 samples/sec   Loss 12.6268   LearningRate 0.0802   Epoch: 2   Global Step: 86430   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:25:49,531-Speed 2626.57 samples/sec   Loss 12.7035   LearningRate 0.0802   Epoch: 2   Global Step: 86440   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:25:53,435-Speed 2623.60 samples/sec   Loss 12.6852   LearningRate 0.0802   Epoch: 2   Global Step: 86450   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:25:57,328-Speed 2631.63 samples/sec   Loss 12.5687   LearningRate 0.0802   Epoch: 2   Global Step: 86460   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:01,225-Speed 2628.33 samples/sec   Loss 12.5066   LearningRate 0.0802   Epoch: 2   Global Step: 86470   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:05,120-Speed 2629.44 samples/sec   Loss 12.5660   LearningRate 0.0802   Epoch: 2   Global Step: 86480   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:09,013-Speed 2630.76 samples/sec   Loss 12.6333   LearningRate 0.0802   Epoch: 2   Global Step: 86490   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:12,907-Speed 2630.21 samples/sec   Loss 12.6075   LearningRate 0.0802   Epoch: 2   Global Step: 86500   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:16,802-Speed 2630.31 samples/sec   Loss 12.5996   LearningRate 0.0802   Epoch: 2   Global Step: 86510   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:20,700-Speed 2627.66 samples/sec   Loss 12.5307   LearningRate 0.0802   Epoch: 2   Global Step: 86520   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:24,599-Speed 2626.71 samples/sec   Loss 12.5058   LearningRate 0.0802   Epoch: 2   Global Step: 86530   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:26:28,502-Speed 2624.42 samples/sec   Loss 12.5556   LearningRate 0.0802   Epoch: 2   Global Step: 86540   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:32,409-Speed 2621.55 samples/sec   Loss 12.6668   LearningRate 0.0802   Epoch: 2   Global Step: 86550   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:36,319-Speed 2620.23 samples/sec   Loss 12.6053   LearningRate 0.0802   Epoch: 2   Global Step: 86560   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:40,229-Speed 2619.48 samples/sec   Loss 12.5508   LearningRate 0.0802   Epoch: 2   Global Step: 86570   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:44,125-Speed 2628.65 samples/sec   Loss 12.5460   LearningRate 0.0802   Epoch: 2   Global Step: 86580   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:48,027-Speed 2625.09 samples/sec   Loss 12.6109   LearningRate 0.0802   Epoch: 2   Global Step: 86590   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:51,940-Speed 2617.81 samples/sec   Loss 12.6103   LearningRate 0.0802   Epoch: 2   Global Step: 86600   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:55,857-Speed 2614.95 samples/sec   Loss 12.5916   LearningRate 0.0802   Epoch: 2   Global Step: 86610   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:26:59,763-Speed 2622.45 samples/sec   Loss 12.6642   LearningRate 0.0802   Epoch: 2   Global Step: 86620   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:03,668-Speed 2622.62 samples/sec   Loss 12.5872   LearningRate 0.0802   Epoch: 2   Global Step: 86630   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:07,576-Speed 2621.44 samples/sec   Loss 12.5799   LearningRate 0.0802   Epoch: 2   Global Step: 86640   Fp16 Grad Scale: 524288   Required: 83 hours
Training: 2022-04-13 05:27:11,463-Speed 2635.03 samples/sec   Loss 12.4176   LearningRate 0.0802   Epoch: 2   Global Step: 86650   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:15,376-Speed 2617.31 samples/sec   Loss 12.4862   LearningRate 0.0802   Epoch: 2   Global Step: 86660   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:19,277-Speed 2625.84 samples/sec   Loss 12.6208   LearningRate 0.0802   Epoch: 2   Global Step: 86670   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:23,180-Speed 2624.05 samples/sec   Loss 12.5103   LearningRate 0.0802   Epoch: 2   Global Step: 86680   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:27,072-Speed 2631.74 samples/sec   Loss 12.5556   LearningRate 0.0802   Epoch: 2   Global Step: 86690   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:30,970-Speed 2627.61 samples/sec   Loss 12.5487   LearningRate 0.0802   Epoch: 2   Global Step: 86700   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:34,870-Speed 2626.85 samples/sec   Loss 12.4357   LearningRate 0.0802   Epoch: 2   Global Step: 86710   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:38,773-Speed 2624.08 samples/sec   Loss 12.4331   LearningRate 0.0802   Epoch: 2   Global Step: 86720   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:42,681-Speed 2620.54 samples/sec   Loss 12.3852   LearningRate 0.0802   Epoch: 2   Global Step: 86730   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:46,588-Speed 2621.42 samples/sec   Loss 12.5660   LearningRate 0.0802   Epoch: 2   Global Step: 86740   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:50,480-Speed 2632.11 samples/sec   Loss 12.5761   LearningRate 0.0802   Epoch: 2   Global Step: 86750   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:54,376-Speed 2629.14 samples/sec   Loss 12.4185   LearningRate 0.0802   Epoch: 2   Global Step: 86760   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:27:58,303-Speed 2608.30 samples/sec   Loss 12.5966   LearningRate 0.0802   Epoch: 2   Global Step: 86770   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:02,204-Speed 2625.75 samples/sec   Loss 12.5480   LearningRate 0.0802   Epoch: 2   Global Step: 86780   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:06,104-Speed 2626.47 samples/sec   Loss 12.5770   LearningRate 0.0802   Epoch: 2   Global Step: 86790   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:10,005-Speed 2625.43 samples/sec   Loss 12.5102   LearningRate 0.0802   Epoch: 2   Global Step: 86800   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:13,901-Speed 2628.66 samples/sec   Loss 12.6977   LearningRate 0.0802   Epoch: 2   Global Step: 86810   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:17,797-Speed 2628.89 samples/sec   Loss 12.4139   LearningRate 0.0802   Epoch: 2   Global Step: 86820   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:21,701-Speed 2623.83 samples/sec   Loss 12.5774   LearningRate 0.0802   Epoch: 2   Global Step: 86830   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:25,604-Speed 2624.21 samples/sec   Loss 12.4652   LearningRate 0.0802   Epoch: 2   Global Step: 86840   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:29,491-Speed 2635.11 samples/sec   Loss 12.5266   LearningRate 0.0802   Epoch: 2   Global Step: 86850   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:33,403-Speed 2617.88 samples/sec   Loss 12.6480   LearningRate 0.0802   Epoch: 2   Global Step: 86860   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:37,319-Speed 2616.08 samples/sec   Loss 12.5682   LearningRate 0.0802   Epoch: 2   Global Step: 86870   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:41,218-Speed 2627.26 samples/sec   Loss 12.4467   LearningRate 0.0802   Epoch: 2   Global Step: 86880   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:45,114-Speed 2628.63 samples/sec   Loss 12.4918   LearningRate 0.0801   Epoch: 2   Global Step: 86890   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:49,010-Speed 2629.08 samples/sec   Loss 12.4338   LearningRate 0.0801   Epoch: 2   Global Step: 86900   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:52,909-Speed 2627.60 samples/sec   Loss 12.6246   LearningRate 0.0801   Epoch: 2   Global Step: 86910   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:28:56,807-Speed 2627.80 samples/sec   Loss 12.5798   LearningRate 0.0801   Epoch: 2   Global Step: 86920   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:29:00,699-Speed 2631.35 samples/sec   Loss 12.6099   LearningRate 0.0801   Epoch: 2   Global Step: 86930   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:29:04,578-Speed 2640.84 samples/sec   Loss 12.5608   LearningRate 0.0801   Epoch: 2   Global Step: 86940   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:08,474-Speed 2628.69 samples/sec   Loss 12.5880   LearningRate 0.0801   Epoch: 2   Global Step: 86950   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:12,376-Speed 2624.91 samples/sec   Loss 12.6427   LearningRate 0.0801   Epoch: 2   Global Step: 86960   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:16,284-Speed 2621.45 samples/sec   Loss 12.5669   LearningRate 0.0801   Epoch: 2   Global Step: 86970   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:20,194-Speed 2619.25 samples/sec   Loss 12.5546   LearningRate 0.0801   Epoch: 2   Global Step: 86980   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:24,089-Speed 2630.12 samples/sec   Loss 12.6188   LearningRate 0.0801   Epoch: 2   Global Step: 86990   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:27,999-Speed 2619.76 samples/sec   Loss 12.5125   LearningRate 0.0801   Epoch: 2   Global Step: 87000   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:31,894-Speed 2629.29 samples/sec   Loss 12.6235   LearningRate 0.0801   Epoch: 2   Global Step: 87010   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:35,802-Speed 2621.05 samples/sec   Loss 12.5091   LearningRate 0.0801   Epoch: 2   Global Step: 87020   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:39,699-Speed 2628.67 samples/sec   Loss 12.6974   LearningRate 0.0801   Epoch: 2   Global Step: 87030   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:29:43,631-Speed 2604.39 samples/sec   Loss 12.4327   LearningRate 0.0801   Epoch: 2   Global Step: 87040   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:29:47,532-Speed 2626.06 samples/sec   Loss 12.5347   LearningRate 0.0801   Epoch: 2   Global Step: 87050   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:29:51,428-Speed 2629.17 samples/sec   Loss 12.4707   LearningRate 0.0801   Epoch: 2   Global Step: 87060   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:29:55,332-Speed 2623.75 samples/sec   Loss 12.4642   LearningRate 0.0801   Epoch: 2   Global Step: 87070   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:29:59,225-Speed 2631.13 samples/sec   Loss 12.5715   LearningRate 0.0801   Epoch: 2   Global Step: 87080   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:03,132-Speed 2621.70 samples/sec   Loss 12.7133   LearningRate 0.0801   Epoch: 2   Global Step: 87090   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:07,032-Speed 2625.86 samples/sec   Loss 12.3934   LearningRate 0.0801   Epoch: 2   Global Step: 87100   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:10,926-Speed 2630.13 samples/sec   Loss 12.5151   LearningRate 0.0801   Epoch: 2   Global Step: 87110   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:14,826-Speed 2626.83 samples/sec   Loss 12.6218   LearningRate 0.0801   Epoch: 2   Global Step: 87120   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:18,725-Speed 2626.85 samples/sec   Loss 12.6602   LearningRate 0.0801   Epoch: 2   Global Step: 87130   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:22,615-Speed 2633.64 samples/sec   Loss 12.6137   LearningRate 0.0801   Epoch: 2   Global Step: 87140   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:26,521-Speed 2622.37 samples/sec   Loss 12.5555   LearningRate 0.0801   Epoch: 2   Global Step: 87150   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:30,426-Speed 2622.89 samples/sec   Loss 12.5492   LearningRate 0.0801   Epoch: 2   Global Step: 87160   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:30:34,316-Speed 2632.85 samples/sec   Loss 12.5929   LearningRate 0.0801   Epoch: 2   Global Step: 87170   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:30:38,221-Speed 2622.61 samples/sec   Loss 12.4248   LearningRate 0.0801   Epoch: 2   Global Step: 87180   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:30:42,189-Speed 2581.27 samples/sec   Loss 12.6545   LearningRate 0.0801   Epoch: 2   Global Step: 87190   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:30:46,089-Speed 2626.77 samples/sec   Loss 12.5891   LearningRate 0.0801   Epoch: 2   Global Step: 87200   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:30:49,985-Speed 2628.99 samples/sec   Loss 12.6150   LearningRate 0.0801   Epoch: 2   Global Step: 87210   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:30:53,882-Speed 2628.72 samples/sec   Loss 12.5778   LearningRate 0.0801   Epoch: 2   Global Step: 87220   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:30:57,786-Speed 2623.36 samples/sec   Loss 12.4118   LearningRate 0.0801   Epoch: 2   Global Step: 87230   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:01,680-Speed 2630.53 samples/sec   Loss 12.4444   LearningRate 0.0801   Epoch: 2   Global Step: 87240   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:05,575-Speed 2629.86 samples/sec   Loss 12.5291   LearningRate 0.0801   Epoch: 2   Global Step: 87250   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:09,468-Speed 2630.66 samples/sec   Loss 12.6604   LearningRate 0.0801   Epoch: 2   Global Step: 87260   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:13,361-Speed 2630.64 samples/sec   Loss 12.6222   LearningRate 0.0801   Epoch: 2   Global Step: 87270   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:31:17,257-Speed 2629.47 samples/sec   Loss 12.5832   LearningRate 0.0801   Epoch: 2   Global Step: 87280   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:31:21,149-Speed 2632.06 samples/sec   Loss 12.4772   LearningRate 0.0801   Epoch: 2   Global Step: 87290   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:31:25,043-Speed 2630.24 samples/sec   Loss 12.5263   LearningRate 0.0801   Epoch: 2   Global Step: 87300   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:31:28,926-Speed 2637.54 samples/sec   Loss 12.6298   LearningRate 0.0801   Epoch: 2   Global Step: 87310   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:32,829-Speed 2624.68 samples/sec   Loss 12.5972   LearningRate 0.0801   Epoch: 2   Global Step: 87320   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:36,723-Speed 2630.40 samples/sec   Loss 12.4155   LearningRate 0.0801   Epoch: 2   Global Step: 87330   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:40,615-Speed 2631.29 samples/sec   Loss 12.7000   LearningRate 0.0801   Epoch: 2   Global Step: 87340   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:44,514-Speed 2626.92 samples/sec   Loss 12.3933   LearningRate 0.0800   Epoch: 2   Global Step: 87350   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:48,410-Speed 2628.89 samples/sec   Loss 12.6117   LearningRate 0.0800   Epoch: 2   Global Step: 87360   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:52,312-Speed 2625.04 samples/sec   Loss 12.5403   LearningRate 0.0800   Epoch: 2   Global Step: 87370   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:31:56,239-Speed 2608.34 samples/sec   Loss 12.5401   LearningRate 0.0800   Epoch: 2   Global Step: 87380   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:32:00,143-Speed 2623.65 samples/sec   Loss 12.6504   LearningRate 0.0800   Epoch: 2   Global Step: 87390   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:32:04,036-Speed 2630.54 samples/sec   Loss 12.6262   LearningRate 0.0800   Epoch: 2   Global Step: 87400   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:32:07,923-Speed 2635.64 samples/sec   Loss 12.7488   LearningRate 0.0800   Epoch: 2   Global Step: 87410   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:11,816-Speed 2630.84 samples/sec   Loss 12.8811   LearningRate 0.0800   Epoch: 2   Global Step: 87420   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:15,715-Speed 2626.87 samples/sec   Loss 12.9303   LearningRate 0.0800   Epoch: 2   Global Step: 87430   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:19,638-Speed 2610.84 samples/sec   Loss 12.6039   LearningRate 0.0800   Epoch: 2   Global Step: 87440   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:23,545-Speed 2621.16 samples/sec   Loss 12.7249   LearningRate 0.0800   Epoch: 2   Global Step: 87450   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:27,450-Speed 2623.16 samples/sec   Loss 12.6173   LearningRate 0.0800   Epoch: 2   Global Step: 87460   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:31,345-Speed 2629.23 samples/sec   Loss 12.6143   LearningRate 0.0800   Epoch: 2   Global Step: 87470   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:35,277-Speed 2605.04 samples/sec   Loss 12.5483   LearningRate 0.0800   Epoch: 2   Global Step: 87480   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:39,173-Speed 2629.45 samples/sec   Loss 12.4567   LearningRate 0.0800   Epoch: 2   Global Step: 87490   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:43,069-Speed 2628.88 samples/sec   Loss 12.7219   LearningRate 0.0800   Epoch: 2   Global Step: 87500   Fp16 Grad Scale: 32768   Required: 83 hours
Training: 2022-04-13 05:32:46,977-Speed 2620.31 samples/sec   Loss 12.7702   LearningRate 0.0800   Epoch: 2   Global Step: 87510   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:32:50,879-Speed 2625.15 samples/sec   Loss 12.5715   LearningRate 0.0800   Epoch: 2   Global Step: 87520   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:32:54,777-Speed 2627.66 samples/sec   Loss 12.5338   LearningRate 0.0800   Epoch: 2   Global Step: 87530   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:32:58,764-Speed 2570.83 samples/sec   Loss 12.5566   LearningRate 0.0800   Epoch: 2   Global Step: 87540   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:33:02,833-Speed 2517.28 samples/sec   Loss 12.6085   LearningRate 0.0800   Epoch: 2   Global Step: 87550   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:33:06,867-Speed 2539.06 samples/sec   Loss 12.5691   LearningRate 0.0800   Epoch: 2   Global Step: 87560   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:33:10,765-Speed 2626.94 samples/sec   Loss 12.4803   LearningRate 0.0800   Epoch: 2   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:33:14,662-Speed 2628.42 samples/sec   Loss 12.5810   LearningRate 0.0800   Epoch: 2   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:33:18,561-Speed 2627.47 samples/sec   Loss 12.5741   LearningRate 0.0800   Epoch: 2   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:33:22,458-Speed 2628.31 samples/sec   Loss 12.7871   LearningRate 0.0800   Epoch: 2   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:33:26,355-Speed 2627.95 samples/sec   Loss 12.7247   LearningRate 0.0800   Epoch: 2   Global Step: 87610   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:30,252-Speed 2628.80 samples/sec   Loss 12.5408   LearningRate 0.0800   Epoch: 2   Global Step: 87620   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:34,152-Speed 2626.10 samples/sec   Loss 12.4941   LearningRate 0.0800   Epoch: 2   Global Step: 87630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:38,050-Speed 2627.92 samples/sec   Loss 12.5701   LearningRate 0.0800   Epoch: 2   Global Step: 87640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:41,953-Speed 2624.51 samples/sec   Loss 12.3667   LearningRate 0.0800   Epoch: 2   Global Step: 87650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:45,849-Speed 2628.59 samples/sec   Loss 12.6751   LearningRate 0.0800   Epoch: 2   Global Step: 87660   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:49,747-Speed 2627.65 samples/sec   Loss 12.5827   LearningRate 0.0800   Epoch: 2   Global Step: 87670   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:53,646-Speed 2626.38 samples/sec   Loss 12.6558   LearningRate 0.0800   Epoch: 2   Global Step: 87680   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:33:57,572-Speed 2609.31 samples/sec   Loss 12.5299   LearningRate 0.0800   Epoch: 2   Global Step: 87690   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:34:01,467-Speed 2630.17 samples/sec   Loss 12.5867   LearningRate 0.0800   Epoch: 2   Global Step: 87700   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:34:05,363-Speed 2629.08 samples/sec   Loss 12.4694   LearningRate 0.0800   Epoch: 2   Global Step: 87710   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:09,282-Speed 2613.45 samples/sec   Loss 12.6562   LearningRate 0.0800   Epoch: 2   Global Step: 87720   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:13,185-Speed 2624.59 samples/sec   Loss 12.6967   LearningRate 0.0800   Epoch: 2   Global Step: 87730   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:17,095-Speed 2619.28 samples/sec   Loss 12.5893   LearningRate 0.0800   Epoch: 2   Global Step: 87740   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:20,996-Speed 2625.81 samples/sec   Loss 12.6661   LearningRate 0.0800   Epoch: 2   Global Step: 87750   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:24,897-Speed 2625.48 samples/sec   Loss 12.5430   LearningRate 0.0800   Epoch: 2   Global Step: 87760   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:28,803-Speed 2621.61 samples/sec   Loss 12.5884   LearningRate 0.0800   Epoch: 2   Global Step: 87770   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:32,710-Speed 2622.35 samples/sec   Loss 12.4089   LearningRate 0.0800   Epoch: 2   Global Step: 87780   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:36,607-Speed 2628.35 samples/sec   Loss 12.4172   LearningRate 0.0800   Epoch: 2   Global Step: 87790   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:40,507-Speed 2626.26 samples/sec   Loss 12.4615   LearningRate 0.0800   Epoch: 2   Global Step: 87800   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:44,392-Speed 2636.69 samples/sec   Loss 12.5799   LearningRate 0.0800   Epoch: 2   Global Step: 87810   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:48,292-Speed 2626.00 samples/sec   Loss 12.6235   LearningRate 0.0799   Epoch: 2   Global Step: 87820   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:52,194-Speed 2624.92 samples/sec   Loss 12.6553   LearningRate 0.0799   Epoch: 2   Global Step: 87830   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:34:56,055-Speed 2652.61 samples/sec   Loss 12.4768   LearningRate 0.0799   Epoch: 2   Global Step: 87840   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:34:59,957-Speed 2624.95 samples/sec   Loss 12.5340   LearningRate 0.0799   Epoch: 2   Global Step: 87850   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:03,865-Speed 2620.77 samples/sec   Loss 12.5442   LearningRate 0.0799   Epoch: 2   Global Step: 87860   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:07,770-Speed 2623.57 samples/sec   Loss 12.5715   LearningRate 0.0799   Epoch: 2   Global Step: 87870   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:11,673-Speed 2624.33 samples/sec   Loss 12.5348   LearningRate 0.0799   Epoch: 2   Global Step: 87880   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:15,568-Speed 2629.61 samples/sec   Loss 12.3355   LearningRate 0.0799   Epoch: 2   Global Step: 87890   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:19,460-Speed 2631.42 samples/sec   Loss 12.5946   LearningRate 0.0799   Epoch: 2   Global Step: 87900   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:23,353-Speed 2630.67 samples/sec   Loss 12.4507   LearningRate 0.0799   Epoch: 2   Global Step: 87910   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:27,263-Speed 2619.98 samples/sec   Loss 12.5993   LearningRate 0.0799   Epoch: 2   Global Step: 87920   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:31,155-Speed 2632.07 samples/sec   Loss 12.6165   LearningRate 0.0799   Epoch: 2   Global Step: 87930   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:35:35,049-Speed 2630.29 samples/sec   Loss 12.4224   LearningRate 0.0799   Epoch: 2   Global Step: 87940   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:35:38,943-Speed 2630.43 samples/sec   Loss 12.5546   LearningRate 0.0799   Epoch: 2   Global Step: 87950   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:35:42,833-Speed 2632.89 samples/sec   Loss 12.6466   LearningRate 0.0799   Epoch: 2   Global Step: 87960   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:35:46,727-Speed 2630.70 samples/sec   Loss 12.5660   LearningRate 0.0799   Epoch: 2   Global Step: 87970   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:35:50,631-Speed 2623.45 samples/sec   Loss 12.5411   LearningRate 0.0799   Epoch: 2   Global Step: 87980   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:35:54,540-Speed 2620.25 samples/sec   Loss 12.6598   LearningRate 0.0799   Epoch: 2   Global Step: 87990   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:35:58,446-Speed 2622.30 samples/sec   Loss 12.6939   LearningRate 0.0799   Epoch: 2   Global Step: 88000   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:36:02,347-Speed 2625.82 samples/sec   Loss 12.6619   LearningRate 0.0799   Epoch: 2   Global Step: 88010   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:36:06,245-Speed 2627.11 samples/sec   Loss 12.5813   LearningRate 0.0799   Epoch: 2   Global Step: 88020   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:36:10,140-Speed 2630.05 samples/sec   Loss 12.4665   LearningRate 0.0799   Epoch: 2   Global Step: 88030   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:36:14,046-Speed 2622.37 samples/sec   Loss 12.6313   LearningRate 0.0799   Epoch: 2   Global Step: 88040   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:17,946-Speed 2626.16 samples/sec   Loss 12.4833   LearningRate 0.0799   Epoch: 2   Global Step: 88050   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:21,860-Speed 2616.47 samples/sec   Loss 12.6219   LearningRate 0.0799   Epoch: 2   Global Step: 88060   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:25,766-Speed 2623.06 samples/sec   Loss 12.5271   LearningRate 0.0799   Epoch: 2   Global Step: 88070   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:29,677-Speed 2618.83 samples/sec   Loss 12.5810   LearningRate 0.0799   Epoch: 2   Global Step: 88080   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:33,582-Speed 2622.45 samples/sec   Loss 12.3858   LearningRate 0.0799   Epoch: 2   Global Step: 88090   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:37,477-Speed 2630.03 samples/sec   Loss 12.4228   LearningRate 0.0799   Epoch: 2   Global Step: 88100   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:41,388-Speed 2619.29 samples/sec   Loss 12.6281   LearningRate 0.0799   Epoch: 2   Global Step: 88110   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:45,310-Speed 2611.08 samples/sec   Loss 12.4282   LearningRate 0.0799   Epoch: 2   Global Step: 88120   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:49,227-Speed 2615.06 samples/sec   Loss 12.5433   LearningRate 0.0799   Epoch: 2   Global Step: 88130   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:53,141-Speed 2616.95 samples/sec   Loss 12.6036   LearningRate 0.0799   Epoch: 2   Global Step: 88140   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:36:57,036-Speed 2630.55 samples/sec   Loss 12.5320   LearningRate 0.0799   Epoch: 2   Global Step: 88150   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:00,938-Speed 2624.87 samples/sec   Loss 12.5085   LearningRate 0.0799   Epoch: 2   Global Step: 88160   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:04,837-Speed 2626.92 samples/sec   Loss 12.4027   LearningRate 0.0799   Epoch: 2   Global Step: 88170   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:08,734-Speed 2627.70 samples/sec   Loss 12.5555   LearningRate 0.0799   Epoch: 2   Global Step: 88180   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:12,649-Speed 2616.60 samples/sec   Loss 12.7041   LearningRate 0.0799   Epoch: 2   Global Step: 88190   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:16,537-Speed 2634.09 samples/sec   Loss 12.5259   LearningRate 0.0799   Epoch: 2   Global Step: 88200   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:20,468-Speed 2606.28 samples/sec   Loss 12.6637   LearningRate 0.0799   Epoch: 2   Global Step: 88210   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:24,377-Speed 2620.30 samples/sec   Loss 12.4989   LearningRate 0.0799   Epoch: 2   Global Step: 88220   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:28,289-Speed 2618.40 samples/sec   Loss 12.5286   LearningRate 0.0799   Epoch: 2   Global Step: 88230   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:32,201-Speed 2617.94 samples/sec   Loss 12.7541   LearningRate 0.0799   Epoch: 2   Global Step: 88240   Fp16 Grad Scale: 524288   Required: 83 hours
Training: 2022-04-13 05:37:36,099-Speed 2627.77 samples/sec   Loss 12.4556   LearningRate 0.0799   Epoch: 2   Global Step: 88250   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:40,114-Speed 2550.68 samples/sec   Loss 12.5891   LearningRate 0.0799   Epoch: 2   Global Step: 88260   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:44,021-Speed 2621.60 samples/sec   Loss 12.5489   LearningRate 0.0799   Epoch: 2   Global Step: 88270   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:47,929-Speed 2621.05 samples/sec   Loss 12.5367   LearningRate 0.0798   Epoch: 2   Global Step: 88280   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:51,833-Speed 2623.64 samples/sec   Loss 12.4487   LearningRate 0.0798   Epoch: 2   Global Step: 88290   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:55,732-Speed 2627.08 samples/sec   Loss 12.5531   LearningRate 0.0798   Epoch: 2   Global Step: 88300   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:37:59,630-Speed 2627.41 samples/sec   Loss 12.6015   LearningRate 0.0798   Epoch: 2   Global Step: 88310   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:03,531-Speed 2625.77 samples/sec   Loss 12.5824   LearningRate 0.0798   Epoch: 2   Global Step: 88320   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:07,449-Speed 2614.06 samples/sec   Loss 12.4877   LearningRate 0.0798   Epoch: 2   Global Step: 88330   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:11,360-Speed 2618.94 samples/sec   Loss 12.5974   LearningRate 0.0798   Epoch: 2   Global Step: 88340   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:15,246-Speed 2635.44 samples/sec   Loss 12.6286   LearningRate 0.0798   Epoch: 2   Global Step: 88350   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:19,151-Speed 2622.82 samples/sec   Loss 12.5666   LearningRate 0.0798   Epoch: 2   Global Step: 88360   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:23,063-Speed 2618.55 samples/sec   Loss 12.5198   LearningRate 0.0798   Epoch: 2   Global Step: 88370   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:26,968-Speed 2622.43 samples/sec   Loss 12.6472   LearningRate 0.0798   Epoch: 2   Global Step: 88380   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:30,869-Speed 2627.43 samples/sec   Loss 12.6900   LearningRate 0.0798   Epoch: 2   Global Step: 88390   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:38:34,750-Speed 2639.45 samples/sec   Loss 12.4047   LearningRate 0.0798   Epoch: 2   Global Step: 88400   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:38:38,645-Speed 2629.78 samples/sec   Loss 12.6196   LearningRate 0.0798   Epoch: 2   Global Step: 88410   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:38:42,540-Speed 2628.94 samples/sec   Loss 12.5943   LearningRate 0.0798   Epoch: 2   Global Step: 88420   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:38:46,433-Speed 2631.43 samples/sec   Loss 12.4698   LearningRate 0.0798   Epoch: 2   Global Step: 88430   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:38:50,333-Speed 2625.91 samples/sec   Loss 12.5187   LearningRate 0.0798   Epoch: 2   Global Step: 88440   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:38:54,243-Speed 2619.78 samples/sec   Loss 12.6257   LearningRate 0.0798   Epoch: 2   Global Step: 88450   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:38:58,139-Speed 2629.20 samples/sec   Loss 12.3885   LearningRate 0.0798   Epoch: 2   Global Step: 88460   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:02,039-Speed 2626.60 samples/sec   Loss 12.3993   LearningRate 0.0798   Epoch: 2   Global Step: 88470   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:05,937-Speed 2627.34 samples/sec   Loss 12.5382   LearningRate 0.0798   Epoch: 2   Global Step: 88480   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:09,831-Speed 2630.21 samples/sec   Loss 12.5221   LearningRate 0.0798   Epoch: 2   Global Step: 88490   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:13,724-Speed 2630.75 samples/sec   Loss 12.5484   LearningRate 0.0798   Epoch: 2   Global Step: 88500   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:39:17,628-Speed 2624.04 samples/sec   Loss 12.6300   LearningRate 0.0798   Epoch: 2   Global Step: 88510   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:21,535-Speed 2621.93 samples/sec   Loss 12.4948   LearningRate 0.0798   Epoch: 2   Global Step: 88520   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:25,445-Speed 2619.61 samples/sec   Loss 12.4756   LearningRate 0.0798   Epoch: 2   Global Step: 88530   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:29,357-Speed 2618.18 samples/sec   Loss 12.5561   LearningRate 0.0798   Epoch: 2   Global Step: 88540   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:33,274-Speed 2615.11 samples/sec   Loss 12.4968   LearningRate 0.0798   Epoch: 2   Global Step: 88550   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:37,268-Speed 2564.40 samples/sec   Loss 12.6971   LearningRate 0.0798   Epoch: 2   Global Step: 88560   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:41,345-Speed 2512.23 samples/sec   Loss 12.5720   LearningRate 0.0798   Epoch: 2   Global Step: 88570   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:45,433-Speed 2505.30 samples/sec   Loss 12.6396   LearningRate 0.0798   Epoch: 2   Global Step: 88580   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:49,357-Speed 2610.38 samples/sec   Loss 12.5659   LearningRate 0.0798   Epoch: 2   Global Step: 88590   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:53,262-Speed 2622.77 samples/sec   Loss 12.4098   LearningRate 0.0798   Epoch: 2   Global Step: 88600   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:39:57,183-Speed 2612.09 samples/sec   Loss 12.4489   LearningRate 0.0798   Epoch: 2   Global Step: 88610   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:40:01,085-Speed 2625.06 samples/sec   Loss 12.5019   LearningRate 0.0798   Epoch: 2   Global Step: 88620   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:40:04,989-Speed 2623.74 samples/sec   Loss 12.6385   LearningRate 0.0798   Epoch: 2   Global Step: 88630   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:40:08,900-Speed 2618.80 samples/sec   Loss 12.4512   LearningRate 0.0798   Epoch: 2   Global Step: 88640   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:40:12,779-Speed 2640.03 samples/sec   Loss 12.4608   LearningRate 0.0798   Epoch: 2   Global Step: 88650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:16,675-Speed 2629.85 samples/sec   Loss 12.5847   LearningRate 0.0798   Epoch: 2   Global Step: 88660   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:20,570-Speed 2630.23 samples/sec   Loss 12.5540   LearningRate 0.0798   Epoch: 2   Global Step: 88670   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:24,473-Speed 2623.96 samples/sec   Loss 12.4999   LearningRate 0.0798   Epoch: 2   Global Step: 88680   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:28,374-Speed 2625.85 samples/sec   Loss 12.5879   LearningRate 0.0798   Epoch: 2   Global Step: 88690   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:32,271-Speed 2628.76 samples/sec   Loss 12.5325   LearningRate 0.0798   Epoch: 2   Global Step: 88700   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:36,171-Speed 2626.25 samples/sec   Loss 12.4908   LearningRate 0.0798   Epoch: 2   Global Step: 88710   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:40,090-Speed 2613.53 samples/sec   Loss 12.6923   LearningRate 0.0798   Epoch: 2   Global Step: 88720   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:44,006-Speed 2615.60 samples/sec   Loss 12.7709   LearningRate 0.0798   Epoch: 2   Global Step: 88730   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:47,908-Speed 2625.07 samples/sec   Loss 12.7714   LearningRate 0.0798   Epoch: 2   Global Step: 88740   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:40:51,805-Speed 2629.14 samples/sec   Loss 12.4912   LearningRate 0.0797   Epoch: 2   Global Step: 88750   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:40:55,703-Speed 2627.60 samples/sec   Loss 12.6758   LearningRate 0.0797   Epoch: 2   Global Step: 88760   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:40:59,596-Speed 2630.71 samples/sec   Loss 12.5809   LearningRate 0.0797   Epoch: 2   Global Step: 88770   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:03,492-Speed 2628.53 samples/sec   Loss 12.5217   LearningRate 0.0797   Epoch: 2   Global Step: 88780   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:07,394-Speed 2625.49 samples/sec   Loss 12.5642   LearningRate 0.0797   Epoch: 2   Global Step: 88790   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:11,291-Speed 2628.53 samples/sec   Loss 12.5697   LearningRate 0.0797   Epoch: 2   Global Step: 88800   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:15,230-Speed 2599.89 samples/sec   Loss 12.4660   LearningRate 0.0797   Epoch: 2   Global Step: 88810   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:19,136-Speed 2622.38 samples/sec   Loss 12.3638   LearningRate 0.0797   Epoch: 2   Global Step: 88820   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:23,037-Speed 2625.79 samples/sec   Loss 12.5579   LearningRate 0.0797   Epoch: 2   Global Step: 88830   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:26,944-Speed 2622.24 samples/sec   Loss 12.7039   LearningRate 0.0797   Epoch: 2   Global Step: 88840   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:30,826-Speed 2637.86 samples/sec   Loss 12.3771   LearningRate 0.0797   Epoch: 2   Global Step: 88850   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:34,824-Speed 2561.83 samples/sec   Loss 12.6995   LearningRate 0.0797   Epoch: 2   Global Step: 88860   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:38,719-Speed 2630.10 samples/sec   Loss 12.6446   LearningRate 0.0797   Epoch: 2   Global Step: 88870   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:42,617-Speed 2627.77 samples/sec   Loss 12.5640   LearningRate 0.0797   Epoch: 2   Global Step: 88880   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:41:46,498-Speed 2638.67 samples/sec   Loss 12.5405   LearningRate 0.0797   Epoch: 2   Global Step: 88890   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:41:50,390-Speed 2631.96 samples/sec   Loss 12.4575   LearningRate 0.0797   Epoch: 2   Global Step: 88900   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:41:54,287-Speed 2627.85 samples/sec   Loss 12.4945   LearningRate 0.0797   Epoch: 2   Global Step: 88910   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:41:58,180-Speed 2631.16 samples/sec   Loss 12.6424   LearningRate 0.0797   Epoch: 2   Global Step: 88920   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:42:02,077-Speed 2628.35 samples/sec   Loss 12.5299   LearningRate 0.0797   Epoch: 2   Global Step: 88930   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:42:05,975-Speed 2627.27 samples/sec   Loss 12.5500   LearningRate 0.0797   Epoch: 2   Global Step: 88940   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:42:09,872-Speed 2628.32 samples/sec   Loss 12.5444   LearningRate 0.0797   Epoch: 2   Global Step: 88950   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:42:13,773-Speed 2626.20 samples/sec   Loss 12.4651   LearningRate 0.0797   Epoch: 2   Global Step: 88960   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:42:17,672-Speed 2626.91 samples/sec   Loss 12.5775   LearningRate 0.0797   Epoch: 2   Global Step: 88970   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:42:21,577-Speed 2623.00 samples/sec   Loss 12.6244   LearningRate 0.0797   Epoch: 2   Global Step: 88980   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:42:25,495-Speed 2614.05 samples/sec   Loss 12.4654   LearningRate 0.0797   Epoch: 2   Global Step: 88990   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:29,394-Speed 2627.56 samples/sec   Loss 12.4709   LearningRate 0.0797   Epoch: 2   Global Step: 89000   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:33,288-Speed 2630.08 samples/sec   Loss 12.6773   LearningRate 0.0797   Epoch: 2   Global Step: 89010   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:37,189-Speed 2625.34 samples/sec   Loss 12.6118   LearningRate 0.0797   Epoch: 2   Global Step: 89020   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:41,093-Speed 2623.91 samples/sec   Loss 12.3752   LearningRate 0.0797   Epoch: 2   Global Step: 89030   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:44,988-Speed 2630.08 samples/sec   Loss 12.4470   LearningRate 0.0797   Epoch: 2   Global Step: 89040   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:48,895-Speed 2621.38 samples/sec   Loss 12.6189   LearningRate 0.0797   Epoch: 2   Global Step: 89050   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:52,796-Speed 2625.33 samples/sec   Loss 12.5110   LearningRate 0.0797   Epoch: 2   Global Step: 89060   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:42:56,691-Speed 2630.20 samples/sec   Loss 12.5289   LearningRate 0.0797   Epoch: 2   Global Step: 89070   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:43:00,589-Speed 2627.57 samples/sec   Loss 12.5542   LearningRate 0.0797   Epoch: 2   Global Step: 89080   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:43:04,490-Speed 2625.83 samples/sec   Loss 12.5828   LearningRate 0.0797   Epoch: 2   Global Step: 89090   Fp16 Grad Scale: 524288   Required: 83 hours
Training: 2022-04-13 05:43:08,374-Speed 2636.73 samples/sec   Loss 12.5192   LearningRate 0.0797   Epoch: 2   Global Step: 89100   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:43:12,406-Speed 2540.58 samples/sec   Loss 12.6093   LearningRate 0.0797   Epoch: 2   Global Step: 89110   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:43:16,314-Speed 2620.63 samples/sec   Loss 12.6257   LearningRate 0.0797   Epoch: 2   Global Step: 89120   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:43:20,217-Speed 2624.30 samples/sec   Loss 12.5453   LearningRate 0.0797   Epoch: 2   Global Step: 89130   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:43:24,101-Speed 2637.03 samples/sec   Loss 12.4752   LearningRate 0.0797   Epoch: 2   Global Step: 89140   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:28,001-Speed 2626.58 samples/sec   Loss 12.5401   LearningRate 0.0797   Epoch: 2   Global Step: 89150   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:31,924-Speed 2610.77 samples/sec   Loss 12.4685   LearningRate 0.0797   Epoch: 2   Global Step: 89160   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:35,826-Speed 2625.05 samples/sec   Loss 12.5484   LearningRate 0.0797   Epoch: 2   Global Step: 89170   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:39,775-Speed 2593.21 samples/sec   Loss 12.4803   LearningRate 0.0797   Epoch: 2   Global Step: 89180   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:43,681-Speed 2622.63 samples/sec   Loss 12.4603   LearningRate 0.0797   Epoch: 2   Global Step: 89190   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:47,577-Speed 2629.19 samples/sec   Loss 12.5667   LearningRate 0.0797   Epoch: 2   Global Step: 89200   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:51,489-Speed 2618.02 samples/sec   Loss 12.5252   LearningRate 0.0796   Epoch: 2   Global Step: 89210   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:55,394-Speed 2623.16 samples/sec   Loss 12.6617   LearningRate 0.0796   Epoch: 2   Global Step: 89220   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:43:59,299-Speed 2623.09 samples/sec   Loss 12.3754   LearningRate 0.0796   Epoch: 2   Global Step: 89230   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:44:03,195-Speed 2628.59 samples/sec   Loss 12.6311   LearningRate 0.0796   Epoch: 2   Global Step: 89240   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:07,092-Speed 2627.92 samples/sec   Loss 12.4950   LearningRate 0.0796   Epoch: 2   Global Step: 89250   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:10,990-Speed 2628.36 samples/sec   Loss 12.5885   LearningRate 0.0796   Epoch: 2   Global Step: 89260   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:14,892-Speed 2624.85 samples/sec   Loss 12.4984   LearningRate 0.0796   Epoch: 2   Global Step: 89270   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:18,790-Speed 2627.88 samples/sec   Loss 12.5011   LearningRate 0.0796   Epoch: 2   Global Step: 89280   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:22,693-Speed 2624.40 samples/sec   Loss 12.4283   LearningRate 0.0796   Epoch: 2   Global Step: 89290   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:26,594-Speed 2625.55 samples/sec   Loss 12.6771   LearningRate 0.0796   Epoch: 2   Global Step: 89300   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:30,488-Speed 2631.78 samples/sec   Loss 12.4459   LearningRate 0.0796   Epoch: 2   Global Step: 89310   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:34,390-Speed 2624.61 samples/sec   Loss 12.3607   LearningRate 0.0796   Epoch: 2   Global Step: 89320   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:38,297-Speed 2621.74 samples/sec   Loss 12.2911   LearningRate 0.0796   Epoch: 2   Global Step: 89330   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:42,175-Speed 2640.89 samples/sec   Loss 12.5887   LearningRate 0.0796   Epoch: 2   Global Step: 89340   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:46,081-Speed 2622.33 samples/sec   Loss 12.4955   LearningRate 0.0796   Epoch: 2   Global Step: 89350   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:49,978-Speed 2628.12 samples/sec   Loss 12.6191   LearningRate 0.0796   Epoch: 2   Global Step: 89360   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:44:53,873-Speed 2629.64 samples/sec   Loss 12.5140   LearningRate 0.0796   Epoch: 2   Global Step: 89370   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:44:57,766-Speed 2630.90 samples/sec   Loss 12.4930   LearningRate 0.0796   Epoch: 2   Global Step: 89380   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:01,661-Speed 2629.56 samples/sec   Loss 12.4289   LearningRate 0.0796   Epoch: 2   Global Step: 89390   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:05,560-Speed 2627.72 samples/sec   Loss 12.4442   LearningRate 0.0796   Epoch: 2   Global Step: 89400   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:09,450-Speed 2633.05 samples/sec   Loss 12.5553   LearningRate 0.0796   Epoch: 2   Global Step: 89410   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:13,346-Speed 2628.63 samples/sec   Loss 12.4912   LearningRate 0.0796   Epoch: 2   Global Step: 89420   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:17,246-Speed 2626.15 samples/sec   Loss 12.4156   LearningRate 0.0796   Epoch: 2   Global Step: 89430   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:21,136-Speed 2633.11 samples/sec   Loss 12.3253   LearningRate 0.0796   Epoch: 2   Global Step: 89440   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:25,032-Speed 2628.86 samples/sec   Loss 12.5020   LearningRate 0.0796   Epoch: 2   Global Step: 89450   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:28,932-Speed 2626.21 samples/sec   Loss 12.3129   LearningRate 0.0796   Epoch: 2   Global Step: 89460   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:45:32,825-Speed 2631.45 samples/sec   Loss 12.6258   LearningRate 0.0796   Epoch: 2   Global Step: 89470   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:45:36,721-Speed 2628.92 samples/sec   Loss 12.5448   LearningRate 0.0796   Epoch: 2   Global Step: 89480   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:45:40,614-Speed 2630.71 samples/sec   Loss 12.5847   LearningRate 0.0796   Epoch: 2   Global Step: 89490   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:45:44,512-Speed 2627.15 samples/sec   Loss 12.4487   LearningRate 0.0796   Epoch: 2   Global Step: 89500   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:45:48,419-Speed 2621.72 samples/sec   Loss 12.6035   LearningRate 0.0796   Epoch: 2   Global Step: 89510   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:45:52,321-Speed 2624.93 samples/sec   Loss 12.5525   LearningRate 0.0796   Epoch: 2   Global Step: 89520   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:45:56,215-Speed 2630.21 samples/sec   Loss 12.4599   LearningRate 0.0796   Epoch: 2   Global Step: 89530   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:46:00,137-Speed 2611.50 samples/sec   Loss 12.7046   LearningRate 0.0796   Epoch: 2   Global Step: 89540   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:46:04,068-Speed 2605.58 samples/sec   Loss 12.5206   LearningRate 0.0796   Epoch: 2   Global Step: 89550   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:07,965-Speed 2628.10 samples/sec   Loss 12.3967   LearningRate 0.0796   Epoch: 2   Global Step: 89560   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:11,864-Speed 2627.12 samples/sec   Loss 12.4953   LearningRate 0.0796   Epoch: 2   Global Step: 89570   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:15,759-Speed 2629.79 samples/sec   Loss 12.4986   LearningRate 0.0796   Epoch: 2   Global Step: 89580   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:19,658-Speed 2627.10 samples/sec   Loss 12.4868   LearningRate 0.0796   Epoch: 2   Global Step: 89590   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:23,553-Speed 2629.42 samples/sec   Loss 12.5976   LearningRate 0.0796   Epoch: 2   Global Step: 89600   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:27,446-Speed 2630.94 samples/sec   Loss 12.2993   LearningRate 0.0796   Epoch: 2   Global Step: 89610   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:31,343-Speed 2628.28 samples/sec   Loss 12.6788   LearningRate 0.0796   Epoch: 2   Global Step: 89620   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:35,248-Speed 2622.35 samples/sec   Loss 12.6442   LearningRate 0.0796   Epoch: 2   Global Step: 89630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:39,135-Speed 2635.53 samples/sec   Loss 12.6129   LearningRate 0.0796   Epoch: 2   Global Step: 89640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:46:43,031-Speed 2628.96 samples/sec   Loss 12.4339   LearningRate 0.0796   Epoch: 2   Global Step: 89650   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:46:46,925-Speed 2630.92 samples/sec   Loss 12.5659   LearningRate 0.0796   Epoch: 2   Global Step: 89660   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:46:50,826-Speed 2626.21 samples/sec   Loss 12.3877   LearningRate 0.0796   Epoch: 2   Global Step: 89670   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:46:54,722-Speed 2628.81 samples/sec   Loss 12.4884   LearningRate 0.0795   Epoch: 2   Global Step: 89680   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:46:58,622-Speed 2626.00 samples/sec   Loss 12.3869   LearningRate 0.0795   Epoch: 2   Global Step: 89690   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:47:02,522-Speed 2625.78 samples/sec   Loss 12.3565   LearningRate 0.0795   Epoch: 2   Global Step: 89700   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:47:06,431-Speed 2620.72 samples/sec   Loss 12.5650   LearningRate 0.0795   Epoch: 2   Global Step: 89710   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:47:10,335-Speed 2623.23 samples/sec   Loss 12.4623   LearningRate 0.0795   Epoch: 2   Global Step: 89720   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:47:14,230-Speed 2630.53 samples/sec   Loss 12.4193   LearningRate 0.0795   Epoch: 2   Global Step: 89730   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:18,127-Speed 2629.00 samples/sec   Loss 12.4052   LearningRate 0.0795   Epoch: 2   Global Step: 89740   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:22,023-Speed 2628.70 samples/sec   Loss 12.4174   LearningRate 0.0795   Epoch: 2   Global Step: 89750   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:25,920-Speed 2628.29 samples/sec   Loss 12.3856   LearningRate 0.0795   Epoch: 2   Global Step: 89760   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:29,814-Speed 2629.80 samples/sec   Loss 12.3968   LearningRate 0.0795   Epoch: 2   Global Step: 89770   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:33,713-Speed 2627.29 samples/sec   Loss 12.4098   LearningRate 0.0795   Epoch: 2   Global Step: 89780   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:37,625-Speed 2618.12 samples/sec   Loss 12.3804   LearningRate 0.0795   Epoch: 2   Global Step: 89790   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:41,529-Speed 2623.27 samples/sec   Loss 12.5352   LearningRate 0.0795   Epoch: 2   Global Step: 89800   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:45,457-Speed 2607.95 samples/sec   Loss 12.3776   LearningRate 0.0795   Epoch: 2   Global Step: 89810   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:49,377-Speed 2612.86 samples/sec   Loss 12.4081   LearningRate 0.0795   Epoch: 2   Global Step: 89820   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:47:53,310-Speed 2604.18 samples/sec   Loss 12.6405   LearningRate 0.0795   Epoch: 2   Global Step: 89830   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:47:57,218-Speed 2621.71 samples/sec   Loss 12.5063   LearningRate 0.0795   Epoch: 2   Global Step: 89840   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:01,123-Speed 2622.42 samples/sec   Loss 12.4423   LearningRate 0.0795   Epoch: 2   Global Step: 89850   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:05,023-Speed 2626.43 samples/sec   Loss 12.5655   LearningRate 0.0795   Epoch: 2   Global Step: 89860   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:09,021-Speed 2561.79 samples/sec   Loss 12.4024   LearningRate 0.0795   Epoch: 2   Global Step: 89870   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:12,940-Speed 2614.08 samples/sec   Loss 12.4850   LearningRate 0.0795   Epoch: 2   Global Step: 89880   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:16,840-Speed 2626.27 samples/sec   Loss 12.5925   LearningRate 0.0795   Epoch: 2   Global Step: 89890   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:20,742-Speed 2625.22 samples/sec   Loss 12.5711   LearningRate 0.0795   Epoch: 2   Global Step: 89900   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:24,637-Speed 2629.67 samples/sec   Loss 12.6117   LearningRate 0.0795   Epoch: 2   Global Step: 89910   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:28,546-Speed 2620.37 samples/sec   Loss 12.4748   LearningRate 0.0795   Epoch: 2   Global Step: 89920   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:32,430-Speed 2637.11 samples/sec   Loss 12.5805   LearningRate 0.0795   Epoch: 2   Global Step: 89930   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:36,333-Speed 2624.41 samples/sec   Loss 12.4757   LearningRate 0.0795   Epoch: 2   Global Step: 89940   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:40,227-Speed 2630.10 samples/sec   Loss 12.4222   LearningRate 0.0795   Epoch: 2   Global Step: 89950   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:44,128-Speed 2625.76 samples/sec   Loss 12.3750   LearningRate 0.0795   Epoch: 2   Global Step: 89960   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:48,026-Speed 2627.32 samples/sec   Loss 12.4639   LearningRate 0.0795   Epoch: 2   Global Step: 89970   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:51,925-Speed 2626.73 samples/sec   Loss 12.5041   LearningRate 0.0795   Epoch: 2   Global Step: 89980   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:55,830-Speed 2623.67 samples/sec   Loss 12.4757   LearningRate 0.0795   Epoch: 2   Global Step: 89990   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:48:59,736-Speed 2622.09 samples/sec   Loss 12.4954   LearningRate 0.0795   Epoch: 2   Global Step: 90000   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:49:42,926-[lfw][90000]XNorm: 23.094169
Training: 2022-04-13 05:49:42,927-[lfw][90000]Accuracy-Flip: 0.99667+-0.00258
Training: 2022-04-13 05:49:42,927-[lfw][90000]Accuracy-Highest: 0.99783
Training: 2022-04-13 05:50:33,196-[cfp_fp][90000]XNorm: 21.114501
Training: 2022-04-13 05:50:33,197-[cfp_fp][90000]Accuracy-Flip: 0.97986+-0.00488
Training: 2022-04-13 05:50:33,197-[cfp_fp][90000]Accuracy-Highest: 0.97986
Training: 2022-04-13 05:51:16,475-[agedb_30][90000]XNorm: 23.006428
Training: 2022-04-13 05:51:16,476-[agedb_30][90000]Accuracy-Flip: 0.96333+-0.00650
Training: 2022-04-13 05:51:16,476-[agedb_30][90000]Accuracy-Highest: 0.96600
Training: 2022-04-13 05:51:20,437-Speed 72.78 samples/sec   Loss 12.6087   LearningRate 0.0795   Epoch: 2   Global Step: 90010   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:51:24,313-Speed 2642.72 samples/sec   Loss 12.4703   LearningRate 0.0795   Epoch: 2   Global Step: 90020   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:51:28,175-Speed 2652.19 samples/sec   Loss 12.5309   LearningRate 0.0795   Epoch: 2   Global Step: 90030   Fp16 Grad Scale: 524288   Required: 83 hours
Training: 2022-04-13 05:51:32,033-Speed 2654.34 samples/sec   Loss 12.5832   LearningRate 0.0795   Epoch: 2   Global Step: 90040   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:51:35,901-Speed 2647.82 samples/sec   Loss 12.5604   LearningRate 0.0795   Epoch: 2   Global Step: 90050   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:51:39,769-Speed 2648.56 samples/sec   Loss 12.4957   LearningRate 0.0795   Epoch: 2   Global Step: 90060   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:51:43,644-Speed 2643.48 samples/sec   Loss 12.4759   LearningRate 0.0795   Epoch: 2   Global Step: 90070   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:51:47,512-Speed 2647.61 samples/sec   Loss 12.5796   LearningRate 0.0795   Epoch: 2   Global Step: 90080   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:51:51,390-Speed 2641.22 samples/sec   Loss 12.5081   LearningRate 0.0795   Epoch: 2   Global Step: 90090   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:51:55,268-Speed 2641.37 samples/sec   Loss 12.4691   LearningRate 0.0795   Epoch: 2   Global Step: 90100   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:51:59,155-Speed 2634.70 samples/sec   Loss 12.6911   LearningRate 0.0795   Epoch: 2   Global Step: 90110   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:03,041-Speed 2635.82 samples/sec   Loss 12.5042   LearningRate 0.0795   Epoch: 2   Global Step: 90120   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:06,927-Speed 2635.58 samples/sec   Loss 12.5590   LearningRate 0.0795   Epoch: 2   Global Step: 90130   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:10,813-Speed 2635.31 samples/sec   Loss 12.5113   LearningRate 0.0794   Epoch: 2   Global Step: 90140   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:14,700-Speed 2635.79 samples/sec   Loss 12.4965   LearningRate 0.0794   Epoch: 2   Global Step: 90150   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:18,585-Speed 2636.14 samples/sec   Loss 12.5488   LearningRate 0.0794   Epoch: 2   Global Step: 90160   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:22,478-Speed 2630.92 samples/sec   Loss 12.6433   LearningRate 0.0794   Epoch: 2   Global Step: 90170   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:26,378-Speed 2626.29 samples/sec   Loss 12.4645   LearningRate 0.0794   Epoch: 2   Global Step: 90180   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:52:30,283-Speed 2623.09 samples/sec   Loss 12.5801   LearningRate 0.0794   Epoch: 2   Global Step: 90190   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:52:34,194-Speed 2618.51 samples/sec   Loss 12.5438   LearningRate 0.0794   Epoch: 2   Global Step: 90200   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:52:38,093-Speed 2626.92 samples/sec   Loss 12.3670   LearningRate 0.0794   Epoch: 2   Global Step: 90210   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:52:42,006-Speed 2617.90 samples/sec   Loss 12.5421   LearningRate 0.0794   Epoch: 2   Global Step: 90220   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:52:45,887-Speed 2638.83 samples/sec   Loss 12.6265   LearningRate 0.0794   Epoch: 2   Global Step: 90230   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:49,783-Speed 2629.41 samples/sec   Loss 12.5942   LearningRate 0.0794   Epoch: 2   Global Step: 90240   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:53,676-Speed 2630.92 samples/sec   Loss 12.2895   LearningRate 0.0794   Epoch: 2   Global Step: 90250   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:52:57,569-Speed 2631.12 samples/sec   Loss 12.3536   LearningRate 0.0794   Epoch: 2   Global Step: 90260   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:53:01,459-Speed 2632.98 samples/sec   Loss 12.3689   LearningRate 0.0794   Epoch: 2   Global Step: 90270   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:53:05,353-Speed 2629.62 samples/sec   Loss 12.5258   LearningRate 0.0794   Epoch: 2   Global Step: 90280   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:53:09,263-Speed 2619.35 samples/sec   Loss 12.3166   LearningRate 0.0794   Epoch: 2   Global Step: 90290   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:53:13,163-Speed 2626.98 samples/sec   Loss 12.2435   LearningRate 0.0794   Epoch: 2   Global Step: 90300   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:53:17,061-Speed 2627.25 samples/sec   Loss 12.4989   LearningRate 0.0794   Epoch: 2   Global Step: 90310   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:53:20,960-Speed 2626.83 samples/sec   Loss 12.3757   LearningRate 0.0794   Epoch: 2   Global Step: 90320   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:53:24,852-Speed 2631.48 samples/sec   Loss 12.2589   LearningRate 0.0794   Epoch: 2   Global Step: 90330   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:28,746-Speed 2630.83 samples/sec   Loss 12.5611   LearningRate 0.0794   Epoch: 2   Global Step: 90340   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:32,660-Speed 2616.87 samples/sec   Loss 12.5392   LearningRate 0.0794   Epoch: 2   Global Step: 90350   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:36,552-Speed 2631.79 samples/sec   Loss 12.4399   LearningRate 0.0794   Epoch: 2   Global Step: 90360   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:40,446-Speed 2630.14 samples/sec   Loss 12.4126   LearningRate 0.0794   Epoch: 2   Global Step: 90370   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:44,353-Speed 2621.18 samples/sec   Loss 12.6504   LearningRate 0.0794   Epoch: 2   Global Step: 90380   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:48,246-Speed 2631.18 samples/sec   Loss 12.4746   LearningRate 0.0794   Epoch: 2   Global Step: 90390   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:52,140-Speed 2630.33 samples/sec   Loss 12.5007   LearningRate 0.0794   Epoch: 2   Global Step: 90400   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:56,031-Speed 2631.76 samples/sec   Loss 12.6204   LearningRate 0.0794   Epoch: 2   Global Step: 90410   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:53:59,930-Speed 2627.74 samples/sec   Loss 12.6546   LearningRate 0.0794   Epoch: 2   Global Step: 90420   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:54:03,828-Speed 2627.76 samples/sec   Loss 12.3924   LearningRate 0.0794   Epoch: 2   Global Step: 90430   Fp16 Grad Scale: 524288   Required: 83 hours
Training: 2022-04-13 05:54:07,687-Speed 2653.58 samples/sec   Loss 12.4004   LearningRate 0.0794   Epoch: 2   Global Step: 90440   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:54:11,609-Speed 2611.61 samples/sec   Loss 12.5268   LearningRate 0.0794   Epoch: 2   Global Step: 90450   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:54:15,599-Speed 2566.65 samples/sec   Loss 12.5390   LearningRate 0.0794   Epoch: 2   Global Step: 90460   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:54:19,476-Speed 2642.24 samples/sec   Loss 12.6056   LearningRate 0.0794   Epoch: 2   Global Step: 90470   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:23,368-Speed 2631.61 samples/sec   Loss 12.4067   LearningRate 0.0794   Epoch: 2   Global Step: 90480   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:27,255-Speed 2635.11 samples/sec   Loss 12.5806   LearningRate 0.0794   Epoch: 2   Global Step: 90490   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:31,149-Speed 2630.16 samples/sec   Loss 12.4007   LearningRate 0.0794   Epoch: 2   Global Step: 90500   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:35,043-Speed 2630.65 samples/sec   Loss 12.6431   LearningRate 0.0794   Epoch: 2   Global Step: 90510   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:38,934-Speed 2632.37 samples/sec   Loss 12.3786   LearningRate 0.0794   Epoch: 2   Global Step: 90520   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:42,822-Speed 2634.19 samples/sec   Loss 12.4187   LearningRate 0.0794   Epoch: 2   Global Step: 90530   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:46,713-Speed 2631.98 samples/sec   Loss 12.4650   LearningRate 0.0794   Epoch: 2   Global Step: 90540   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:50,609-Speed 2629.16 samples/sec   Loss 12.5458   LearningRate 0.0794   Epoch: 2   Global Step: 90550   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:54,501-Speed 2632.17 samples/sec   Loss 12.4139   LearningRate 0.0794   Epoch: 2   Global Step: 90560   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:54:58,401-Speed 2625.67 samples/sec   Loss 12.3967   LearningRate 0.0794   Epoch: 2   Global Step: 90570   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:02,293-Speed 2631.63 samples/sec   Loss 12.5470   LearningRate 0.0794   Epoch: 2   Global Step: 90580   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:06,185-Speed 2631.52 samples/sec   Loss 12.3688   LearningRate 0.0794   Epoch: 2   Global Step: 90590   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:10,079-Speed 2630.20 samples/sec   Loss 12.5596   LearningRate 0.0794   Epoch: 2   Global Step: 90600   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:13,975-Speed 2628.73 samples/sec   Loss 12.5415   LearningRate 0.0793   Epoch: 2   Global Step: 90610   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:17,870-Speed 2630.61 samples/sec   Loss 12.4552   LearningRate 0.0793   Epoch: 2   Global Step: 90620   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:21,763-Speed 2630.98 samples/sec   Loss 12.3131   LearningRate 0.0793   Epoch: 2   Global Step: 90630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:25,660-Speed 2627.80 samples/sec   Loss 12.4707   LearningRate 0.0793   Epoch: 2   Global Step: 90640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:29,559-Speed 2627.05 samples/sec   Loss 12.4925   LearningRate 0.0793   Epoch: 2   Global Step: 90650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:33,465-Speed 2622.52 samples/sec   Loss 12.5258   LearningRate 0.0793   Epoch: 2   Global Step: 90660   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:55:37,361-Speed 2628.47 samples/sec   Loss 12.4574   LearningRate 0.0793   Epoch: 2   Global Step: 90670   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:55:41,254-Speed 2630.73 samples/sec   Loss 12.5098   LearningRate 0.0793   Epoch: 2   Global Step: 90680   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:55:45,149-Speed 2629.78 samples/sec   Loss 12.5191   LearningRate 0.0793   Epoch: 2   Global Step: 90690   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:55:49,042-Speed 2630.86 samples/sec   Loss 12.4256   LearningRate 0.0793   Epoch: 2   Global Step: 90700   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:55:52,938-Speed 2629.34 samples/sec   Loss 12.5212   LearningRate 0.0793   Epoch: 2   Global Step: 90710   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:55:56,833-Speed 2629.98 samples/sec   Loss 12.3401   LearningRate 0.0793   Epoch: 2   Global Step: 90720   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:00,732-Speed 2626.35 samples/sec   Loss 12.4726   LearningRate 0.0793   Epoch: 2   Global Step: 90730   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:04,630-Speed 2627.81 samples/sec   Loss 12.3998   LearningRate 0.0793   Epoch: 2   Global Step: 90740   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:08,523-Speed 2630.95 samples/sec   Loss 12.3885   LearningRate 0.0793   Epoch: 2   Global Step: 90750   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:12,415-Speed 2631.19 samples/sec   Loss 12.4149   LearningRate 0.0793   Epoch: 2   Global Step: 90760   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:16,319-Speed 2623.74 samples/sec   Loss 12.4627   LearningRate 0.0793   Epoch: 2   Global Step: 90770   Fp16 Grad Scale: 524288   Required: 83 hours
Training: 2022-04-13 05:56:20,209-Speed 2632.77 samples/sec   Loss 12.5551   LearningRate 0.0793   Epoch: 2   Global Step: 90780   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:24,106-Speed 2628.26 samples/sec   Loss 12.5272   LearningRate 0.0793   Epoch: 2   Global Step: 90790   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:27,998-Speed 2631.76 samples/sec   Loss 12.3950   LearningRate 0.0793   Epoch: 2   Global Step: 90800   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:31,900-Speed 2625.28 samples/sec   Loss 12.4772   LearningRate 0.0793   Epoch: 2   Global Step: 90810   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:35,818-Speed 2614.24 samples/sec   Loss 12.6655   LearningRate 0.0793   Epoch: 2   Global Step: 90820   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:39,715-Speed 2628.01 samples/sec   Loss 12.3999   LearningRate 0.0793   Epoch: 2   Global Step: 90830   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:43,609-Speed 2630.13 samples/sec   Loss 12.2097   LearningRate 0.0793   Epoch: 2   Global Step: 90840   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:47,517-Speed 2621.09 samples/sec   Loss 12.4997   LearningRate 0.0793   Epoch: 2   Global Step: 90850   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:51,409-Speed 2631.29 samples/sec   Loss 12.4486   LearningRate 0.0793   Epoch: 2   Global Step: 90860   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:55,305-Speed 2629.39 samples/sec   Loss 12.4566   LearningRate 0.0793   Epoch: 2   Global Step: 90870   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:56:59,186-Speed 2638.71 samples/sec   Loss 12.4814   LearningRate 0.0793   Epoch: 2   Global Step: 90880   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:57:03,103-Speed 2614.56 samples/sec   Loss 12.6137   LearningRate 0.0793   Epoch: 2   Global Step: 90890   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:57:07,007-Speed 2623.86 samples/sec   Loss 12.4833   LearningRate 0.0793   Epoch: 2   Global Step: 90900   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:57:10,908-Speed 2625.60 samples/sec   Loss 12.4288   LearningRate 0.0793   Epoch: 2   Global Step: 90910   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:57:14,808-Speed 2628.64 samples/sec   Loss 12.6207   LearningRate 0.0793   Epoch: 2   Global Step: 90920   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:57:18,705-Speed 2627.87 samples/sec   Loss 12.4292   LearningRate 0.0793   Epoch: 2   Global Step: 90930   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:57:22,590-Speed 2636.78 samples/sec   Loss 12.4654   LearningRate 0.0793   Epoch: 2   Global Step: 90940   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:26,476-Speed 2635.31 samples/sec   Loss 12.5246   LearningRate 0.0793   Epoch: 2   Global Step: 90950   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:30,374-Speed 2627.95 samples/sec   Loss 12.4688   LearningRate 0.0793   Epoch: 2   Global Step: 90960   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:34,269-Speed 2629.20 samples/sec   Loss 12.5058   LearningRate 0.0793   Epoch: 2   Global Step: 90970   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:38,165-Speed 2629.00 samples/sec   Loss 12.4112   LearningRate 0.0793   Epoch: 2   Global Step: 90980   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:42,060-Speed 2629.83 samples/sec   Loss 12.3955   LearningRate 0.0793   Epoch: 2   Global Step: 90990   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:45,953-Speed 2630.87 samples/sec   Loss 12.5482   LearningRate 0.0793   Epoch: 2   Global Step: 91000   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:49,848-Speed 2629.93 samples/sec   Loss 12.3594   LearningRate 0.0793   Epoch: 2   Global Step: 91010   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:53,743-Speed 2629.04 samples/sec   Loss 12.4187   LearningRate 0.0793   Epoch: 2   Global Step: 91020   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:57:57,640-Speed 2628.37 samples/sec   Loss 12.4988   LearningRate 0.0793   Epoch: 2   Global Step: 91030   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:01,536-Speed 2629.11 samples/sec   Loss 12.4263   LearningRate 0.0793   Epoch: 2   Global Step: 91040   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:58:05,452-Speed 2615.35 samples/sec   Loss 12.3978   LearningRate 0.0793   Epoch: 2   Global Step: 91050   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:58:09,333-Speed 2638.77 samples/sec   Loss 12.5106   LearningRate 0.0793   Epoch: 2   Global Step: 91060   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:13,229-Speed 2629.43 samples/sec   Loss 12.3828   LearningRate 0.0792   Epoch: 2   Global Step: 91070   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:17,124-Speed 2628.93 samples/sec   Loss 12.3700   LearningRate 0.0792   Epoch: 2   Global Step: 91080   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:21,019-Speed 2630.55 samples/sec   Loss 12.5268   LearningRate 0.0792   Epoch: 2   Global Step: 91090   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:24,914-Speed 2629.64 samples/sec   Loss 12.4918   LearningRate 0.0792   Epoch: 2   Global Step: 91100   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:28,820-Speed 2623.76 samples/sec   Loss 12.6271   LearningRate 0.0792   Epoch: 2   Global Step: 91110   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:32,724-Speed 2623.76 samples/sec   Loss 12.3964   LearningRate 0.0792   Epoch: 2   Global Step: 91120   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:36,620-Speed 2628.82 samples/sec   Loss 12.3566   LearningRate 0.0792   Epoch: 2   Global Step: 91130   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:40,512-Speed 2631.00 samples/sec   Loss 12.4972   LearningRate 0.0792   Epoch: 2   Global Step: 91140   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:44,425-Speed 2618.07 samples/sec   Loss 12.4289   LearningRate 0.0792   Epoch: 2   Global Step: 91150   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:58:48,320-Speed 2629.48 samples/sec   Loss 12.3939   LearningRate 0.0792   Epoch: 2   Global Step: 91160   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:58:52,252-Speed 2605.11 samples/sec   Loss 12.5667   LearningRate 0.0792   Epoch: 2   Global Step: 91170   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:58:56,149-Speed 2627.86 samples/sec   Loss 12.4631   LearningRate 0.0792   Epoch: 2   Global Step: 91180   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:59:00,040-Speed 2632.56 samples/sec   Loss 12.4002   LearningRate 0.0792   Epoch: 2   Global Step: 91190   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:03,937-Speed 2628.58 samples/sec   Loss 12.4478   LearningRate 0.0792   Epoch: 2   Global Step: 91200   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:07,841-Speed 2623.57 samples/sec   Loss 12.5689   LearningRate 0.0792   Epoch: 2   Global Step: 91210   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:11,741-Speed 2625.95 samples/sec   Loss 12.4817   LearningRate 0.0792   Epoch: 2   Global Step: 91220   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:15,639-Speed 2627.99 samples/sec   Loss 12.4373   LearningRate 0.0792   Epoch: 2   Global Step: 91230   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:19,557-Speed 2614.64 samples/sec   Loss 12.4574   LearningRate 0.0792   Epoch: 2   Global Step: 91240   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:23,457-Speed 2626.21 samples/sec   Loss 12.5578   LearningRate 0.0792   Epoch: 2   Global Step: 91250   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:27,355-Speed 2627.86 samples/sec   Loss 12.4258   LearningRate 0.0792   Epoch: 2   Global Step: 91260   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:31,259-Speed 2623.47 samples/sec   Loss 12.4576   LearningRate 0.0792   Epoch: 2   Global Step: 91270   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:35,154-Speed 2629.44 samples/sec   Loss 12.5542   LearningRate 0.0792   Epoch: 2   Global Step: 91280   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:39,059-Speed 2623.12 samples/sec   Loss 12.4592   LearningRate 0.0792   Epoch: 2   Global Step: 91290   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:59:42,964-Speed 2623.38 samples/sec   Loss 12.3915   LearningRate 0.0792   Epoch: 2   Global Step: 91300   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 05:59:46,842-Speed 2640.80 samples/sec   Loss 12.5056   LearningRate 0.0792   Epoch: 2   Global Step: 91310   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:50,746-Speed 2623.91 samples/sec   Loss 12.4138   LearningRate 0.0792   Epoch: 2   Global Step: 91320   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 05:59:54,624-Speed 2641.30 samples/sec   Loss 12.4868   LearningRate 0.0792   Epoch: 2   Global Step: 91330   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 05:59:58,522-Speed 2627.96 samples/sec   Loss 12.2947   LearningRate 0.0792   Epoch: 2   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:02,417-Speed 2629.72 samples/sec   Loss 12.4918   LearningRate 0.0792   Epoch: 2   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:06,315-Speed 2627.54 samples/sec   Loss 12.4688   LearningRate 0.0792   Epoch: 2   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:10,223-Speed 2620.88 samples/sec   Loss 12.4252   LearningRate 0.0792   Epoch: 2   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:14,120-Speed 2627.97 samples/sec   Loss 12.3921   LearningRate 0.0792   Epoch: 2   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:18,017-Speed 2628.35 samples/sec   Loss 12.2655   LearningRate 0.0792   Epoch: 2   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:21,913-Speed 2629.29 samples/sec   Loss 12.3958   LearningRate 0.0792   Epoch: 2   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:25,809-Speed 2629.32 samples/sec   Loss 12.3479   LearningRate 0.0792   Epoch: 2   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:29,712-Speed 2624.14 samples/sec   Loss 12.5960   LearningRate 0.0792   Epoch: 2   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:00:33,608-Speed 2629.16 samples/sec   Loss 12.4549   LearningRate 0.0792   Epoch: 2   Global Step: 91430   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:00:37,510-Speed 2624.52 samples/sec   Loss 12.4075   LearningRate 0.0792   Epoch: 2   Global Step: 91440   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:00:41,435-Speed 2609.49 samples/sec   Loss 12.4451   LearningRate 0.0792   Epoch: 2   Global Step: 91450   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:00:45,334-Speed 2626.88 samples/sec   Loss 12.3916   LearningRate 0.0792   Epoch: 2   Global Step: 91460   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:00:49,231-Speed 2628.61 samples/sec   Loss 12.3923   LearningRate 0.0792   Epoch: 2   Global Step: 91470   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:00:53,127-Speed 2628.77 samples/sec   Loss 12.3342   LearningRate 0.0792   Epoch: 2   Global Step: 91480   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:00:57,034-Speed 2621.61 samples/sec   Loss 12.4448   LearningRate 0.0792   Epoch: 2   Global Step: 91490   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:00,929-Speed 2629.57 samples/sec   Loss 12.4216   LearningRate 0.0792   Epoch: 2   Global Step: 91500   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:04,841-Speed 2621.30 samples/sec   Loss 12.6142   LearningRate 0.0792   Epoch: 2   Global Step: 91510   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:08,740-Speed 2627.00 samples/sec   Loss 12.3942   LearningRate 0.0792   Epoch: 2   Global Step: 91520   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:12,633-Speed 2630.88 samples/sec   Loss 12.4394   LearningRate 0.0792   Epoch: 2   Global Step: 91530   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:01:16,530-Speed 2628.50 samples/sec   Loss 12.4907   LearningRate 0.0791   Epoch: 2   Global Step: 91540   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:01:20,428-Speed 2628.03 samples/sec   Loss 12.3239   LearningRate 0.0791   Epoch: 2   Global Step: 91550   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:01:24,310-Speed 2638.78 samples/sec   Loss 12.7019   LearningRate 0.0791   Epoch: 2   Global Step: 91560   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:28,371-Speed 2522.13 samples/sec   Loss 12.4993   LearningRate 0.0791   Epoch: 2   Global Step: 91570   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:32,418-Speed 2533.28 samples/sec   Loss 12.4543   LearningRate 0.0791   Epoch: 2   Global Step: 91580   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:36,320-Speed 2625.21 samples/sec   Loss 12.5234   LearningRate 0.0791   Epoch: 2   Global Step: 91590   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:40,222-Speed 2624.75 samples/sec   Loss 12.4793   LearningRate 0.0791   Epoch: 2   Global Step: 91600   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:44,134-Speed 2618.42 samples/sec   Loss 12.5522   LearningRate 0.0791   Epoch: 2   Global Step: 91610   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:48,040-Speed 2622.10 samples/sec   Loss 12.5634   LearningRate 0.0791   Epoch: 2   Global Step: 91620   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:51,937-Speed 2628.63 samples/sec   Loss 12.4529   LearningRate 0.0791   Epoch: 2   Global Step: 91630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:55,846-Speed 2620.71 samples/sec   Loss 12.5310   LearningRate 0.0791   Epoch: 2   Global Step: 91640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:01:59,744-Speed 2627.32 samples/sec   Loss 12.3468   LearningRate 0.0791   Epoch: 2   Global Step: 91650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:03,641-Speed 2628.52 samples/sec   Loss 12.4632   LearningRate 0.0791   Epoch: 2   Global Step: 91660   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:02:07,545-Speed 2623.66 samples/sec   Loss 12.4375   LearningRate 0.0791   Epoch: 2   Global Step: 91670   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:02:11,431-Speed 2635.40 samples/sec   Loss 12.4488   LearningRate 0.0791   Epoch: 2   Global Step: 91680   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:15,342-Speed 2619.22 samples/sec   Loss 12.3730   LearningRate 0.0791   Epoch: 2   Global Step: 91690   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:19,245-Speed 2623.90 samples/sec   Loss 12.4682   LearningRate 0.0791   Epoch: 2   Global Step: 91700   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:23,143-Speed 2627.47 samples/sec   Loss 12.3608   LearningRate 0.0791   Epoch: 2   Global Step: 91710   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:27,044-Speed 2625.49 samples/sec   Loss 12.3585   LearningRate 0.0791   Epoch: 2   Global Step: 91720   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:30,941-Speed 2628.88 samples/sec   Loss 12.4800   LearningRate 0.0791   Epoch: 2   Global Step: 91730   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:34,839-Speed 2627.55 samples/sec   Loss 12.5400   LearningRate 0.0791   Epoch: 2   Global Step: 91740   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:38,738-Speed 2626.92 samples/sec   Loss 12.5367   LearningRate 0.0791   Epoch: 2   Global Step: 91750   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:42,638-Speed 2626.83 samples/sec   Loss 12.3744   LearningRate 0.0791   Epoch: 2   Global Step: 91760   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:46,570-Speed 2604.72 samples/sec   Loss 12.4668   LearningRate 0.0791   Epoch: 2   Global Step: 91770   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:50,457-Speed 2635.14 samples/sec   Loss 12.4695   LearningRate 0.0791   Epoch: 2   Global Step: 91780   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:54,363-Speed 2622.42 samples/sec   Loss 12.5335   LearningRate 0.0791   Epoch: 2   Global Step: 91790   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:02:58,270-Speed 2621.47 samples/sec   Loss 12.4039   LearningRate 0.0791   Epoch: 2   Global Step: 91800   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:03:02,178-Speed 2620.97 samples/sec   Loss 12.5271   LearningRate 0.0791   Epoch: 2   Global Step: 91810   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:03:06,076-Speed 2627.30 samples/sec   Loss 12.4724   LearningRate 0.0791   Epoch: 2   Global Step: 91820   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:03:09,980-Speed 2623.77 samples/sec   Loss 12.4296   LearningRate 0.0791   Epoch: 2   Global Step: 91830   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:03:13,891-Speed 2618.90 samples/sec   Loss 12.4129   LearningRate 0.0791   Epoch: 2   Global Step: 91840   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:03:17,807-Speed 2615.59 samples/sec   Loss 12.2908   LearningRate 0.0791   Epoch: 2   Global Step: 91850   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:03:21,721-Speed 2617.21 samples/sec   Loss 12.3117   LearningRate 0.0791   Epoch: 2   Global Step: 91860   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:03:25,598-Speed 2641.66 samples/sec   Loss 12.5092   LearningRate 0.0791   Epoch: 2   Global Step: 91870   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:29,503-Speed 2623.10 samples/sec   Loss 12.4458   LearningRate 0.0791   Epoch: 2   Global Step: 91880   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:33,399-Speed 2629.15 samples/sec   Loss 12.4717   LearningRate 0.0791   Epoch: 2   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:37,294-Speed 2629.37 samples/sec   Loss 12.5793   LearningRate 0.0791   Epoch: 2   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:41,192-Speed 2627.52 samples/sec   Loss 12.4189   LearningRate 0.0791   Epoch: 2   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:45,091-Speed 2627.36 samples/sec   Loss 12.5457   LearningRate 0.0791   Epoch: 2   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:48,999-Speed 2620.09 samples/sec   Loss 12.3322   LearningRate 0.0791   Epoch: 2   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:52,913-Speed 2617.34 samples/sec   Loss 12.4618   LearningRate 0.0791   Epoch: 2   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:03:56,818-Speed 2622.45 samples/sec   Loss 12.4108   LearningRate 0.0791   Epoch: 2   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:00,720-Speed 2625.70 samples/sec   Loss 12.4333   LearningRate 0.0791   Epoch: 2   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:04,621-Speed 2625.28 samples/sec   Loss 12.3345   LearningRate 0.0791   Epoch: 2   Global Step: 91970   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:04:08,520-Speed 2626.99 samples/sec   Loss 12.3374   LearningRate 0.0791   Epoch: 2   Global Step: 91980   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:04:12,417-Speed 2628.02 samples/sec   Loss 12.3876   LearningRate 0.0791   Epoch: 2   Global Step: 91990   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:04:16,315-Speed 2627.94 samples/sec   Loss 12.4860   LearningRate 0.0790   Epoch: 2   Global Step: 92000   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:04:20,251-Speed 2602.22 samples/sec   Loss 12.3281   LearningRate 0.0790   Epoch: 2   Global Step: 92010   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:04:24,148-Speed 2628.94 samples/sec   Loss 12.4746   LearningRate 0.0790   Epoch: 2   Global Step: 92020   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:28,059-Speed 2619.12 samples/sec   Loss 12.4352   LearningRate 0.0790   Epoch: 2   Global Step: 92030   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:31,969-Speed 2619.30 samples/sec   Loss 12.5712   LearningRate 0.0790   Epoch: 2   Global Step: 92040   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:35,886-Speed 2615.11 samples/sec   Loss 12.4460   LearningRate 0.0790   Epoch: 2   Global Step: 92050   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:39,793-Speed 2621.58 samples/sec   Loss 12.3491   LearningRate 0.0790   Epoch: 2   Global Step: 92060   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:43,698-Speed 2622.78 samples/sec   Loss 12.4635   LearningRate 0.0790   Epoch: 2   Global Step: 92070   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:47,609-Speed 2618.43 samples/sec   Loss 12.4257   LearningRate 0.0790   Epoch: 2   Global Step: 92080   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:51,514-Speed 2623.18 samples/sec   Loss 12.3188   LearningRate 0.0790   Epoch: 2   Global Step: 92090   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:55,414-Speed 2626.97 samples/sec   Loss 12.4254   LearningRate 0.0790   Epoch: 2   Global Step: 92100   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:04:59,315-Speed 2625.30 samples/sec   Loss 12.3608   LearningRate 0.0790   Epoch: 2   Global Step: 92110   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:05:03,213-Speed 2627.65 samples/sec   Loss 12.4065   LearningRate 0.0790   Epoch: 2   Global Step: 92120   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:07,110-Speed 2628.13 samples/sec   Loss 12.4485   LearningRate 0.0790   Epoch: 2   Global Step: 92130   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:11,008-Speed 2627.72 samples/sec   Loss 12.3711   LearningRate 0.0790   Epoch: 2   Global Step: 92140   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:14,925-Speed 2614.76 samples/sec   Loss 12.5643   LearningRate 0.0790   Epoch: 2   Global Step: 92150   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:18,883-Speed 2587.42 samples/sec   Loss 12.5110   LearningRate 0.0790   Epoch: 2   Global Step: 92160   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:22,826-Speed 2598.31 samples/sec   Loss 12.2933   LearningRate 0.0790   Epoch: 2   Global Step: 92170   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:26,738-Speed 2618.07 samples/sec   Loss 12.4446   LearningRate 0.0790   Epoch: 2   Global Step: 92180   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:30,643-Speed 2622.91 samples/sec   Loss 12.3303   LearningRate 0.0790   Epoch: 2   Global Step: 92190   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:34,545-Speed 2625.19 samples/sec   Loss 12.3154   LearningRate 0.0790   Epoch: 2   Global Step: 92200   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:38,447-Speed 2624.70 samples/sec   Loss 12.5075   LearningRate 0.0790   Epoch: 2   Global Step: 92210   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:05:42,348-Speed 2625.43 samples/sec   Loss 12.4245   LearningRate 0.0790   Epoch: 2   Global Step: 92220   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:05:46,259-Speed 2619.00 samples/sec   Loss 12.3449   LearningRate 0.0790   Epoch: 2   Global Step: 92230   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:05:50,160-Speed 2625.55 samples/sec   Loss 12.4063   LearningRate 0.0790   Epoch: 2   Global Step: 92240   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:05:54,062-Speed 2624.75 samples/sec   Loss 12.5487   LearningRate 0.0790   Epoch: 2   Global Step: 92250   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:05:57,961-Speed 2627.54 samples/sec   Loss 12.3984   LearningRate 0.0790   Epoch: 2   Global Step: 92260   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:06:01,844-Speed 2637.24 samples/sec   Loss 12.3857   LearningRate 0.0790   Epoch: 2   Global Step: 92270   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:05,750-Speed 2622.48 samples/sec   Loss 12.4617   LearningRate 0.0790   Epoch: 2   Global Step: 92280   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:09,689-Speed 2599.99 samples/sec   Loss 12.3088   LearningRate 0.0790   Epoch: 2   Global Step: 92290   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:13,589-Speed 2626.82 samples/sec   Loss 12.4180   LearningRate 0.0790   Epoch: 2   Global Step: 92300   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:17,511-Speed 2611.17 samples/sec   Loss 12.4669   LearningRate 0.0790   Epoch: 2   Global Step: 92310   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:21,413-Speed 2625.24 samples/sec   Loss 12.3219   LearningRate 0.0790   Epoch: 2   Global Step: 92320   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:25,316-Speed 2624.39 samples/sec   Loss 12.3936   LearningRate 0.0790   Epoch: 2   Global Step: 92330   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:29,252-Speed 2601.94 samples/sec   Loss 12.4169   LearningRate 0.0790   Epoch: 2   Global Step: 92340   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:33,168-Speed 2616.47 samples/sec   Loss 12.4345   LearningRate 0.0790   Epoch: 2   Global Step: 92350   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:37,068-Speed 2626.41 samples/sec   Loss 12.5265   LearningRate 0.0790   Epoch: 2   Global Step: 92360   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:40,970-Speed 2624.75 samples/sec   Loss 12.4710   LearningRate 0.0790   Epoch: 2   Global Step: 92370   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:44,879-Speed 2619.93 samples/sec   Loss 12.5328   LearningRate 0.0790   Epoch: 2   Global Step: 92380   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:06:48,781-Speed 2625.25 samples/sec   Loss 12.4016   LearningRate 0.0790   Epoch: 2   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:06:52,695-Speed 2616.49 samples/sec   Loss 12.5488   LearningRate 0.0790   Epoch: 2   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:06:56,601-Speed 2622.72 samples/sec   Loss 12.4961   LearningRate 0.0790   Epoch: 2   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:00,507-Speed 2622.31 samples/sec   Loss 12.4559   LearningRate 0.0790   Epoch: 2   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:04,410-Speed 2624.26 samples/sec   Loss 12.5178   LearningRate 0.0790   Epoch: 2   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:08,397-Speed 2568.93 samples/sec   Loss 12.5564   LearningRate 0.0790   Epoch: 2   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:12,503-Speed 2494.55 samples/sec   Loss 12.5293   LearningRate 0.0790   Epoch: 2   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:16,467-Speed 2583.16 samples/sec   Loss 12.4619   LearningRate 0.0790   Epoch: 2   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:20,374-Speed 2622.18 samples/sec   Loss 12.3521   LearningRate 0.0789   Epoch: 2   Global Step: 92470   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:24,284-Speed 2619.37 samples/sec   Loss 12.3378   LearningRate 0.0789   Epoch: 2   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:07:28,206-Speed 2611.84 samples/sec   Loss 12.3694   LearningRate 0.0789   Epoch: 2   Global Step: 92490   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:32,111-Speed 2622.97 samples/sec   Loss 12.3951   LearningRate 0.0789   Epoch: 2   Global Step: 92500   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:36,026-Speed 2616.26 samples/sec   Loss 12.3580   LearningRate 0.0789   Epoch: 2   Global Step: 92510   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:39,932-Speed 2622.14 samples/sec   Loss 12.5053   LearningRate 0.0789   Epoch: 2   Global Step: 92520   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:43,835-Speed 2624.45 samples/sec   Loss 12.4401   LearningRate 0.0789   Epoch: 2   Global Step: 92530   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:47,734-Speed 2626.67 samples/sec   Loss 12.3168   LearningRate 0.0789   Epoch: 2   Global Step: 92540   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:51,635-Speed 2626.28 samples/sec   Loss 12.3673   LearningRate 0.0789   Epoch: 2   Global Step: 92550   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:55,540-Speed 2623.45 samples/sec   Loss 12.4599   LearningRate 0.0789   Epoch: 2   Global Step: 92560   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:07:59,442-Speed 2624.73 samples/sec   Loss 12.4471   LearningRate 0.0789   Epoch: 2   Global Step: 92570   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:03,348-Speed 2622.12 samples/sec   Loss 12.2734   LearningRate 0.0789   Epoch: 2   Global Step: 92580   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:07,260-Speed 2618.40 samples/sec   Loss 12.3225   LearningRate 0.0789   Epoch: 2   Global Step: 92590   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:08:11,170-Speed 2619.77 samples/sec   Loss 12.4281   LearningRate 0.0789   Epoch: 2   Global Step: 92600   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:08:15,069-Speed 2626.82 samples/sec   Loss 12.4027   LearningRate 0.0789   Epoch: 2   Global Step: 92610   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:18,996-Speed 2608.62 samples/sec   Loss 12.3715   LearningRate 0.0789   Epoch: 2   Global Step: 92620   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:22,892-Speed 2628.25 samples/sec   Loss 12.5354   LearningRate 0.0789   Epoch: 2   Global Step: 92630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:26,797-Speed 2623.86 samples/sec   Loss 12.4225   LearningRate 0.0789   Epoch: 2   Global Step: 92640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:30,710-Speed 2617.83 samples/sec   Loss 12.4115   LearningRate 0.0789   Epoch: 2   Global Step: 92650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:34,617-Speed 2621.28 samples/sec   Loss 12.5527   LearningRate 0.0789   Epoch: 2   Global Step: 92660   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:38,519-Speed 2624.47 samples/sec   Loss 12.3298   LearningRate 0.0789   Epoch: 2   Global Step: 92670   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:42,428-Speed 2620.13 samples/sec   Loss 12.3476   LearningRate 0.0789   Epoch: 2   Global Step: 92680   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:46,337-Speed 2620.37 samples/sec   Loss 12.4444   LearningRate 0.0789   Epoch: 2   Global Step: 92690   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:50,239-Speed 2625.05 samples/sec   Loss 12.3834   LearningRate 0.0789   Epoch: 2   Global Step: 92700   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:08:54,147-Speed 2620.95 samples/sec   Loss 12.4194   LearningRate 0.0789   Epoch: 2   Global Step: 92710   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:08:58,050-Speed 2624.57 samples/sec   Loss 12.3069   LearningRate 0.0789   Epoch: 2   Global Step: 92720   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:01,957-Speed 2621.15 samples/sec   Loss 12.4295   LearningRate 0.0789   Epoch: 2   Global Step: 92730   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:05,875-Speed 2614.53 samples/sec   Loss 12.5686   LearningRate 0.0789   Epoch: 2   Global Step: 92740   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:09,783-Speed 2621.28 samples/sec   Loss 12.3455   LearningRate 0.0789   Epoch: 2   Global Step: 92750   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:13,688-Speed 2622.80 samples/sec   Loss 12.4333   LearningRate 0.0789   Epoch: 2   Global Step: 92760   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:17,587-Speed 2626.50 samples/sec   Loss 12.2531   LearningRate 0.0789   Epoch: 2   Global Step: 92770   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:21,494-Speed 2622.43 samples/sec   Loss 12.5302   LearningRate 0.0789   Epoch: 2   Global Step: 92780   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:25,405-Speed 2618.52 samples/sec   Loss 12.4305   LearningRate 0.0789   Epoch: 2   Global Step: 92790   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:29,326-Speed 2612.75 samples/sec   Loss 12.4207   LearningRate 0.0789   Epoch: 2   Global Step: 92800   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:33,247-Speed 2612.14 samples/sec   Loss 12.6008   LearningRate 0.0789   Epoch: 2   Global Step: 92810   Fp16 Grad Scale: 524288   Required: 83 hours
Training: 2022-04-13 06:09:37,131-Speed 2636.73 samples/sec   Loss 12.6143   LearningRate 0.0789   Epoch: 2   Global Step: 92820   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:09:41,015-Speed 2637.50 samples/sec   Loss 12.4221   LearningRate 0.0789   Epoch: 2   Global Step: 92830   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:09:44,919-Speed 2623.94 samples/sec   Loss 12.3819   LearningRate 0.0789   Epoch: 2   Global Step: 92840   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:09:48,821-Speed 2624.45 samples/sec   Loss 12.4188   LearningRate 0.0789   Epoch: 2   Global Step: 92850   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:09:52,724-Speed 2624.72 samples/sec   Loss 12.4187   LearningRate 0.0789   Epoch: 2   Global Step: 92860   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:09:56,621-Speed 2628.30 samples/sec   Loss 12.3679   LearningRate 0.0789   Epoch: 2   Global Step: 92870   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:10:00,519-Speed 2628.03 samples/sec   Loss 12.4832   LearningRate 0.0789   Epoch: 2   Global Step: 92880   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:10:04,418-Speed 2626.48 samples/sec   Loss 12.3896   LearningRate 0.0789   Epoch: 2   Global Step: 92890   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:10:08,320-Speed 2625.10 samples/sec   Loss 12.2901   LearningRate 0.0789   Epoch: 2   Global Step: 92900   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:10:12,219-Speed 2626.59 samples/sec   Loss 12.5101   LearningRate 0.0789   Epoch: 2   Global Step: 92910   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:10:16,118-Speed 2626.88 samples/sec   Loss 12.3521   LearningRate 0.0789   Epoch: 2   Global Step: 92920   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:10:20,018-Speed 2626.40 samples/sec   Loss 12.4205   LearningRate 0.0789   Epoch: 2   Global Step: 92930   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:23,922-Speed 2623.22 samples/sec   Loss 12.3975   LearningRate 0.0788   Epoch: 2   Global Step: 92940   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:27,828-Speed 2622.64 samples/sec   Loss 12.3854   LearningRate 0.0788   Epoch: 2   Global Step: 92950   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:31,851-Speed 2546.27 samples/sec   Loss 12.5221   LearningRate 0.0788   Epoch: 2   Global Step: 92960   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:35,756-Speed 2622.57 samples/sec   Loss 12.4377   LearningRate 0.0788   Epoch: 2   Global Step: 92970   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:39,658-Speed 2624.62 samples/sec   Loss 12.3526   LearningRate 0.0788   Epoch: 2   Global Step: 92980   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:43,562-Speed 2623.49 samples/sec   Loss 12.3768   LearningRate 0.0788   Epoch: 2   Global Step: 92990   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:47,478-Speed 2615.43 samples/sec   Loss 12.5218   LearningRate 0.0788   Epoch: 2   Global Step: 93000   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:51,408-Speed 2606.71 samples/sec   Loss 12.4254   LearningRate 0.0788   Epoch: 2   Global Step: 93010   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:55,344-Speed 2602.54 samples/sec   Loss 12.2678   LearningRate 0.0788   Epoch: 2   Global Step: 93020   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:10:59,237-Speed 2630.73 samples/sec   Loss 12.5705   LearningRate 0.0788   Epoch: 2   Global Step: 93030   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:03,162-Speed 2609.39 samples/sec   Loss 12.2680   LearningRate 0.0788   Epoch: 2   Global Step: 93040   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:07,074-Speed 2618.94 samples/sec   Loss 12.4655   LearningRate 0.0788   Epoch: 2   Global Step: 93050   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:10,976-Speed 2624.61 samples/sec   Loss 12.3664   LearningRate 0.0788   Epoch: 2   Global Step: 93060   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:14,881-Speed 2629.25 samples/sec   Loss 12.4601   LearningRate 0.0788   Epoch: 2   Global Step: 93070   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:18,783-Speed 2624.44 samples/sec   Loss 12.4148   LearningRate 0.0788   Epoch: 2   Global Step: 93080   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:22,689-Speed 2622.78 samples/sec   Loss 12.4475   LearningRate 0.0788   Epoch: 2   Global Step: 93090   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:26,592-Speed 2624.29 samples/sec   Loss 12.2638   LearningRate 0.0788   Epoch: 2   Global Step: 93100   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:30,506-Speed 2617.05 samples/sec   Loss 12.2595   LearningRate 0.0788   Epoch: 2   Global Step: 93110   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:11:34,404-Speed 2627.81 samples/sec   Loss 12.2703   LearningRate 0.0788   Epoch: 2   Global Step: 93120   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:11:38,306-Speed 2624.93 samples/sec   Loss 12.3557   LearningRate 0.0788   Epoch: 2   Global Step: 93130   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:11:42,212-Speed 2622.46 samples/sec   Loss 12.5449   LearningRate 0.0788   Epoch: 2   Global Step: 93140   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:11:46,258-Speed 2531.92 samples/sec   Loss 12.5243   LearningRate 0.0788   Epoch: 2   Global Step: 93150   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:11:50,269-Speed 2553.54 samples/sec   Loss 12.4441   LearningRate 0.0788   Epoch: 2   Global Step: 93160   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:11:54,180-Speed 2618.93 samples/sec   Loss 12.3306   LearningRate 0.0788   Epoch: 2   Global Step: 93170   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:11:58,071-Speed 2632.83 samples/sec   Loss 12.3233   LearningRate 0.0788   Epoch: 2   Global Step: 93180   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:01,975-Speed 2623.67 samples/sec   Loss 12.4181   LearningRate 0.0788   Epoch: 2   Global Step: 93190   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:05,888-Speed 2617.80 samples/sec   Loss 12.4432   LearningRate 0.0788   Epoch: 2   Global Step: 93200   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:09,791-Speed 2623.82 samples/sec   Loss 12.4207   LearningRate 0.0788   Epoch: 2   Global Step: 93210   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:13,681-Speed 2632.85 samples/sec   Loss 12.4727   LearningRate 0.0788   Epoch: 2   Global Step: 93220   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:17,577-Speed 2629.09 samples/sec   Loss 12.4071   LearningRate 0.0788   Epoch: 2   Global Step: 93230   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:21,473-Speed 2629.47 samples/sec   Loss 12.4357   LearningRate 0.0788   Epoch: 2   Global Step: 93240   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:25,372-Speed 2626.90 samples/sec   Loss 12.4518   LearningRate 0.0788   Epoch: 2   Global Step: 93250   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:29,278-Speed 2622.35 samples/sec   Loss 12.4049   LearningRate 0.0788   Epoch: 2   Global Step: 93260   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:33,209-Speed 2605.81 samples/sec   Loss 12.5645   LearningRate 0.0788   Epoch: 2   Global Step: 93270   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:37,119-Speed 2619.50 samples/sec   Loss 12.3919   LearningRate 0.0788   Epoch: 2   Global Step: 93280   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:41,015-Speed 2628.61 samples/sec   Loss 12.4464   LearningRate 0.0788   Epoch: 2   Global Step: 93290   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:44,924-Speed 2620.35 samples/sec   Loss 12.3727   LearningRate 0.0788   Epoch: 2   Global Step: 93300   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:48,840-Speed 2615.79 samples/sec   Loss 12.5044   LearningRate 0.0788   Epoch: 2   Global Step: 93310   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:12:52,744-Speed 2623.39 samples/sec   Loss 12.4332   LearningRate 0.0788   Epoch: 2   Global Step: 93320   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:12:56,653-Speed 2620.87 samples/sec   Loss 12.5418   LearningRate 0.0788   Epoch: 2   Global Step: 93330   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:13:00,554-Speed 2625.18 samples/sec   Loss 12.2786   LearningRate 0.0788   Epoch: 2   Global Step: 93340   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:13:04,439-Speed 2637.31 samples/sec   Loss 12.3841   LearningRate 0.0788   Epoch: 2   Global Step: 93350   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:08,342-Speed 2623.96 samples/sec   Loss 12.3935   LearningRate 0.0788   Epoch: 2   Global Step: 93360   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:12,251-Speed 2620.01 samples/sec   Loss 12.3972   LearningRate 0.0788   Epoch: 2   Global Step: 93370   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:16,153-Speed 2624.55 samples/sec   Loss 12.3968   LearningRate 0.0788   Epoch: 2   Global Step: 93380   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:20,059-Speed 2622.64 samples/sec   Loss 12.3372   LearningRate 0.0788   Epoch: 2   Global Step: 93390   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:23,959-Speed 2626.55 samples/sec   Loss 12.2385   LearningRate 0.0788   Epoch: 2   Global Step: 93400   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:27,867-Speed 2621.16 samples/sec   Loss 12.4934   LearningRate 0.0787   Epoch: 2   Global Step: 93410   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:31,771-Speed 2623.41 samples/sec   Loss 12.3868   LearningRate 0.0787   Epoch: 2   Global Step: 93420   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:35,671-Speed 2627.24 samples/sec   Loss 12.2568   LearningRate 0.0787   Epoch: 2   Global Step: 93430   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:39,572-Speed 2625.26 samples/sec   Loss 12.3896   LearningRate 0.0787   Epoch: 2   Global Step: 93440   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:43,483-Speed 2618.61 samples/sec   Loss 12.4257   LearningRate 0.0787   Epoch: 2   Global Step: 93450   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:13:47,505-Speed 2546.42 samples/sec   Loss 12.4695   LearningRate 0.0787   Epoch: 2   Global Step: 93460   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:13:51,417-Speed 2618.76 samples/sec   Loss 12.3422   LearningRate 0.0787   Epoch: 2   Global Step: 93470   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:13:55,316-Speed 2626.73 samples/sec   Loss 12.3470   LearningRate 0.0787   Epoch: 2   Global Step: 93480   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:13:59,229-Speed 2617.45 samples/sec   Loss 12.2831   LearningRate 0.0787   Epoch: 2   Global Step: 93490   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:03,142-Speed 2617.85 samples/sec   Loss 12.3306   LearningRate 0.0787   Epoch: 2   Global Step: 93500   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:07,055-Speed 2617.24 samples/sec   Loss 12.4846   LearningRate 0.0787   Epoch: 2   Global Step: 93510   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:10,967-Speed 2618.66 samples/sec   Loss 12.3989   LearningRate 0.0787   Epoch: 2   Global Step: 93520   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:14,877-Speed 2619.20 samples/sec   Loss 12.4299   LearningRate 0.0787   Epoch: 2   Global Step: 93530   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:18,791-Speed 2616.25 samples/sec   Loss 12.3631   LearningRate 0.0787   Epoch: 2   Global Step: 93540   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:22,705-Speed 2617.37 samples/sec   Loss 12.3723   LearningRate 0.0787   Epoch: 2   Global Step: 93550   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:26,615-Speed 2619.55 samples/sec   Loss 12.3333   LearningRate 0.0787   Epoch: 2   Global Step: 93560   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:30,527-Speed 2618.28 samples/sec   Loss 12.3776   LearningRate 0.0787   Epoch: 2   Global Step: 93570   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:34,437-Speed 2619.41 samples/sec   Loss 12.3287   LearningRate 0.0787   Epoch: 2   Global Step: 93580   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:14:38,337-Speed 2626.27 samples/sec   Loss 12.3492   LearningRate 0.0787   Epoch: 2   Global Step: 93590   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:42,244-Speed 2621.77 samples/sec   Loss 12.5750   LearningRate 0.0787   Epoch: 2   Global Step: 93600   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:46,149-Speed 2623.41 samples/sec   Loss 12.4599   LearningRate 0.0787   Epoch: 2   Global Step: 93610   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:50,057-Speed 2621.17 samples/sec   Loss 12.4883   LearningRate 0.0787   Epoch: 2   Global Step: 93620   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:53,974-Speed 2614.49 samples/sec   Loss 12.5040   LearningRate 0.0787   Epoch: 2   Global Step: 93630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:14:57,888-Speed 2617.20 samples/sec   Loss 12.5360   LearningRate 0.0787   Epoch: 2   Global Step: 93640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:01,797-Speed 2620.56 samples/sec   Loss 12.4002   LearningRate 0.0787   Epoch: 2   Global Step: 93650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:05,702-Speed 2622.71 samples/sec   Loss 12.3310   LearningRate 0.0787   Epoch: 2   Global Step: 93660   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:09,606-Speed 2623.22 samples/sec   Loss 12.3921   LearningRate 0.0787   Epoch: 2   Global Step: 93670   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:13,519-Speed 2617.45 samples/sec   Loss 12.2709   LearningRate 0.0787   Epoch: 2   Global Step: 93680   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:17,423-Speed 2624.48 samples/sec   Loss 12.5061   LearningRate 0.0787   Epoch: 2   Global Step: 93690   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:15:21,323-Speed 2626.40 samples/sec   Loss 12.3873   LearningRate 0.0787   Epoch: 2   Global Step: 93700   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:15:25,226-Speed 2624.28 samples/sec   Loss 12.3583   LearningRate 0.0787   Epoch: 2   Global Step: 93710   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:15:29,126-Speed 2625.90 samples/sec   Loss 12.4231   LearningRate 0.0787   Epoch: 2   Global Step: 93720   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:15:33,041-Speed 2616.66 samples/sec   Loss 12.4079   LearningRate 0.0787   Epoch: 2   Global Step: 93730   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:15:36,933-Speed 2631.32 samples/sec   Loss 12.4921   LearningRate 0.0787   Epoch: 2   Global Step: 93740   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:40,848-Speed 2616.14 samples/sec   Loss 12.4686   LearningRate 0.0787   Epoch: 2   Global Step: 93750   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:44,751-Speed 2625.15 samples/sec   Loss 12.3115   LearningRate 0.0787   Epoch: 2   Global Step: 93760   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:48,649-Speed 2627.28 samples/sec   Loss 12.3932   LearningRate 0.0787   Epoch: 2   Global Step: 93770   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:52,558-Speed 2620.32 samples/sec   Loss 12.2341   LearningRate 0.0787   Epoch: 2   Global Step: 93780   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:15:56,461-Speed 2624.16 samples/sec   Loss 12.4485   LearningRate 0.0787   Epoch: 2   Global Step: 93790   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:00,364-Speed 2624.56 samples/sec   Loss 12.3942   LearningRate 0.0787   Epoch: 2   Global Step: 93800   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:04,264-Speed 2626.29 samples/sec   Loss 12.3259   LearningRate 0.0787   Epoch: 2   Global Step: 93810   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:08,169-Speed 2622.59 samples/sec   Loss 12.3153   LearningRate 0.0787   Epoch: 2   Global Step: 93820   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:12,075-Speed 2622.51 samples/sec   Loss 12.2968   LearningRate 0.0787   Epoch: 2   Global Step: 93830   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:15,980-Speed 2622.73 samples/sec   Loss 12.3469   LearningRate 0.0787   Epoch: 2   Global Step: 93840   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:16:19,886-Speed 2622.46 samples/sec   Loss 12.3895   LearningRate 0.0787   Epoch: 2   Global Step: 93850   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:16:23,789-Speed 2624.20 samples/sec   Loss 12.2863   LearningRate 0.0787   Epoch: 2   Global Step: 93860   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:16:27,690-Speed 2625.06 samples/sec   Loss 12.4112   LearningRate 0.0786   Epoch: 2   Global Step: 93870   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:16:31,580-Speed 2633.97 samples/sec   Loss 12.4231   LearningRate 0.0786   Epoch: 2   Global Step: 93880   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:35,479-Speed 2626.68 samples/sec   Loss 12.4350   LearningRate 0.0786   Epoch: 2   Global Step: 93890   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:39,382-Speed 2623.86 samples/sec   Loss 12.3679   LearningRate 0.0786   Epoch: 2   Global Step: 93900   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:43,285-Speed 2624.47 samples/sec   Loss 12.3763   LearningRate 0.0786   Epoch: 2   Global Step: 93910   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:47,187-Speed 2625.19 samples/sec   Loss 12.4318   LearningRate 0.0786   Epoch: 2   Global Step: 93920   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:51,087-Speed 2626.89 samples/sec   Loss 12.3540   LearningRate 0.0786   Epoch: 2   Global Step: 93930   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:54,983-Speed 2628.38 samples/sec   Loss 12.3429   LearningRate 0.0786   Epoch: 2   Global Step: 93940   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:16:58,888-Speed 2623.65 samples/sec   Loss 12.5026   LearningRate 0.0786   Epoch: 2   Global Step: 93950   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:02,791-Speed 2623.83 samples/sec   Loss 12.4750   LearningRate 0.0786   Epoch: 2   Global Step: 93960   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:06,692-Speed 2625.60 samples/sec   Loss 12.4858   LearningRate 0.0786   Epoch: 2   Global Step: 93970   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:10,598-Speed 2622.15 samples/sec   Loss 12.3342   LearningRate 0.0786   Epoch: 2   Global Step: 93980   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:17:14,515-Speed 2615.21 samples/sec   Loss 12.3048   LearningRate 0.0786   Epoch: 2   Global Step: 93990   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:17:18,434-Speed 2613.35 samples/sec   Loss 12.3014   LearningRate 0.0786   Epoch: 2   Global Step: 94000   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:22,348-Speed 2616.82 samples/sec   Loss 12.3041   LearningRate 0.0786   Epoch: 2   Global Step: 94010   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:26,247-Speed 2627.46 samples/sec   Loss 12.4018   LearningRate 0.0786   Epoch: 2   Global Step: 94020   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:30,159-Speed 2618.39 samples/sec   Loss 12.4283   LearningRate 0.0786   Epoch: 2   Global Step: 94030   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:34,059-Speed 2625.90 samples/sec   Loss 12.3740   LearningRate 0.0786   Epoch: 2   Global Step: 94040   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:37,960-Speed 2625.66 samples/sec   Loss 12.4738   LearningRate 0.0786   Epoch: 2   Global Step: 94050   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:41,859-Speed 2627.15 samples/sec   Loss 12.2985   LearningRate 0.0786   Epoch: 2   Global Step: 94060   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:45,771-Speed 2618.45 samples/sec   Loss 12.5016   LearningRate 0.0786   Epoch: 2   Global Step: 94070   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:49,682-Speed 2618.40 samples/sec   Loss 12.3559   LearningRate 0.0786   Epoch: 2   Global Step: 94080   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:53,593-Speed 2619.26 samples/sec   Loss 12.3603   LearningRate 0.0786   Epoch: 2   Global Step: 94090   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:17:57,478-Speed 2636.55 samples/sec   Loss 12.3667   LearningRate 0.0786   Epoch: 2   Global Step: 94100   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:18:01,390-Speed 2618.24 samples/sec   Loss 12.4346   LearningRate 0.0786   Epoch: 2   Global Step: 94110   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:18:05,296-Speed 2622.47 samples/sec   Loss 12.4374   LearningRate 0.0786   Epoch: 2   Global Step: 94120   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:18:09,197-Speed 2625.61 samples/sec   Loss 12.3677   LearningRate 0.0786   Epoch: 2   Global Step: 94130   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:18:13,097-Speed 2626.54 samples/sec   Loss 12.3658   LearningRate 0.0786   Epoch: 2   Global Step: 94140   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:18:17,003-Speed 2622.09 samples/sec   Loss 12.2916   LearningRate 0.0786   Epoch: 2   Global Step: 94150   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:18:20,932-Speed 2608.21 samples/sec   Loss 12.5557   LearningRate 0.0786   Epoch: 2   Global Step: 94160   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:18:24,819-Speed 2635.14 samples/sec   Loss 12.3815   LearningRate 0.0786   Epoch: 2   Global Step: 94170   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:28,727-Speed 2620.94 samples/sec   Loss 12.3773   LearningRate 0.0786   Epoch: 2   Global Step: 94180   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:32,627-Speed 2625.95 samples/sec   Loss 12.4166   LearningRate 0.0786   Epoch: 2   Global Step: 94190   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:36,584-Speed 2589.42 samples/sec   Loss 12.4102   LearningRate 0.0786   Epoch: 2   Global Step: 94200   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:40,482-Speed 2627.38 samples/sec   Loss 12.5074   LearningRate 0.0786   Epoch: 2   Global Step: 94210   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:44,383-Speed 2625.53 samples/sec   Loss 12.5507   LearningRate 0.0786   Epoch: 2   Global Step: 94220   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:48,285-Speed 2625.00 samples/sec   Loss 12.4041   LearningRate 0.0786   Epoch: 2   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:52,185-Speed 2626.07 samples/sec   Loss 12.5660   LearningRate 0.0786   Epoch: 2   Global Step: 94240   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:56,088-Speed 2624.57 samples/sec   Loss 12.3250   LearningRate 0.0786   Epoch: 2   Global Step: 94250   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:18:59,988-Speed 2625.99 samples/sec   Loss 12.3720   LearningRate 0.0786   Epoch: 2   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:19:03,954-Speed 2583.33 samples/sec   Loss 12.1959   LearningRate 0.0786   Epoch: 2   Global Step: 94270   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:07,853-Speed 2626.94 samples/sec   Loss 12.4748   LearningRate 0.0786   Epoch: 2   Global Step: 94280   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:11,759-Speed 2622.10 samples/sec   Loss 12.2754   LearningRate 0.0786   Epoch: 2   Global Step: 94290   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:15,673-Speed 2616.36 samples/sec   Loss 12.4161   LearningRate 0.0786   Epoch: 2   Global Step: 94300   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:19,584-Speed 2619.27 samples/sec   Loss 12.4346   LearningRate 0.0786   Epoch: 2   Global Step: 94310   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:23,486-Speed 2625.01 samples/sec   Loss 12.3549   LearningRate 0.0786   Epoch: 2   Global Step: 94320   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:27,416-Speed 2606.65 samples/sec   Loss 12.3873   LearningRate 0.0786   Epoch: 2   Global Step: 94330   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:31,452-Speed 2537.93 samples/sec   Loss 12.4361   LearningRate 0.0785   Epoch: 2   Global Step: 94340   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:35,493-Speed 2535.03 samples/sec   Loss 12.2632   LearningRate 0.0785   Epoch: 2   Global Step: 94350   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:19:39,385-Speed 2631.24 samples/sec   Loss 12.4849   LearningRate 0.0785   Epoch: 2   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:19:43,368-Speed 2571.60 samples/sec   Loss 12.3731   LearningRate 0.0785   Epoch: 2   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:19:47,270-Speed 2624.47 samples/sec   Loss 12.4836   LearningRate 0.0785   Epoch: 2   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:19:51,181-Speed 2619.44 samples/sec   Loss 12.2414   LearningRate 0.0785   Epoch: 2   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:19:55,076-Speed 2629.88 samples/sec   Loss 12.2957   LearningRate 0.0785   Epoch: 2   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:19:58,972-Speed 2628.93 samples/sec   Loss 12.3698   LearningRate 0.0785   Epoch: 2   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:20:02,875-Speed 2624.10 samples/sec   Loss 12.2326   LearningRate 0.0785   Epoch: 2   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:20:06,777-Speed 2624.97 samples/sec   Loss 12.4585   LearningRate 0.0785   Epoch: 2   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:20:10,676-Speed 2626.70 samples/sec   Loss 12.4076   LearningRate 0.0785   Epoch: 2   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:20:14,590-Speed 2616.55 samples/sec   Loss 12.3624   LearningRate 0.0785   Epoch: 2   Global Step: 94450   Fp16 Grad Scale: 65536   Required: 83 hours
Training: 2022-04-13 06:20:18,494-Speed 2624.13 samples/sec   Loss 12.3479   LearningRate 0.0785   Epoch: 2   Global Step: 94460   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:22,398-Speed 2623.42 samples/sec   Loss 12.4136   LearningRate 0.0785   Epoch: 2   Global Step: 94470   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:26,304-Speed 2622.91 samples/sec   Loss 12.2839   LearningRate 0.0785   Epoch: 2   Global Step: 94480   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:30,206-Speed 2624.99 samples/sec   Loss 12.1890   LearningRate 0.0785   Epoch: 2   Global Step: 94490   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:34,104-Speed 2627.74 samples/sec   Loss 12.3879   LearningRate 0.0785   Epoch: 2   Global Step: 94500   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:38,010-Speed 2622.23 samples/sec   Loss 12.5266   LearningRate 0.0785   Epoch: 2   Global Step: 94510   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:41,912-Speed 2625.76 samples/sec   Loss 12.5609   LearningRate 0.0785   Epoch: 2   Global Step: 94520   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:45,823-Speed 2618.26 samples/sec   Loss 12.4626   LearningRate 0.0785   Epoch: 2   Global Step: 94530   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:49,723-Speed 2626.80 samples/sec   Loss 12.2228   LearningRate 0.0785   Epoch: 2   Global Step: 94540   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:53,623-Speed 2626.33 samples/sec   Loss 12.2862   LearningRate 0.0785   Epoch: 2   Global Step: 94550   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:20:57,538-Speed 2616.52 samples/sec   Loss 12.4248   LearningRate 0.0785   Epoch: 2   Global Step: 94560   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:21:01,455-Speed 2614.80 samples/sec   Loss 12.4841   LearningRate 0.0785   Epoch: 2   Global Step: 94570   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:21:05,355-Speed 2626.12 samples/sec   Loss 12.2809   LearningRate 0.0785   Epoch: 2   Global Step: 94580   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:21:09,259-Speed 2623.63 samples/sec   Loss 12.2275   LearningRate 0.0785   Epoch: 2   Global Step: 94590   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:21:13,171-Speed 2617.82 samples/sec   Loss 12.3622   LearningRate 0.0785   Epoch: 2   Global Step: 94600   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:21:17,099-Speed 2608.54 samples/sec   Loss 12.4808   LearningRate 0.0785   Epoch: 2   Global Step: 94610   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:21:21,007-Speed 2620.92 samples/sec   Loss 12.3472   LearningRate 0.0785   Epoch: 2   Global Step: 94620   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:21:24,887-Speed 2639.60 samples/sec   Loss 12.4973   LearningRate 0.0785   Epoch: 2   Global Step: 94630   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:28,784-Speed 2628.37 samples/sec   Loss 12.2362   LearningRate 0.0785   Epoch: 2   Global Step: 94640   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:32,685-Speed 2626.19 samples/sec   Loss 12.3176   LearningRate 0.0785   Epoch: 2   Global Step: 94650   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:36,587-Speed 2624.76 samples/sec   Loss 12.5733   LearningRate 0.0785   Epoch: 2   Global Step: 94660   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:40,484-Speed 2628.24 samples/sec   Loss 12.4492   LearningRate 0.0785   Epoch: 2   Global Step: 94670   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:44,388-Speed 2623.37 samples/sec   Loss 12.4918   LearningRate 0.0785   Epoch: 2   Global Step: 94680   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:48,292-Speed 2623.59 samples/sec   Loss 12.5127   LearningRate 0.0785   Epoch: 2   Global Step: 94690   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:52,218-Speed 2608.94 samples/sec   Loss 12.2132   LearningRate 0.0785   Epoch: 2   Global Step: 94700   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:21:56,119-Speed 2625.69 samples/sec   Loss 12.2472   LearningRate 0.0785   Epoch: 2   Global Step: 94710   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:22:00,019-Speed 2626.32 samples/sec   Loss 12.1816   LearningRate 0.0785   Epoch: 2   Global Step: 94720   Fp16 Grad Scale: 131072   Required: 83 hours
Training: 2022-04-13 06:22:03,918-Speed 2627.05 samples/sec   Loss 12.3433   LearningRate 0.0785   Epoch: 2   Global Step: 94730   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:22:07,820-Speed 2624.78 samples/sec   Loss 12.3686   LearningRate 0.0785   Epoch: 2   Global Step: 94740   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:22:11,731-Speed 2619.17 samples/sec   Loss 12.4657   LearningRate 0.0785   Epoch: 2   Global Step: 94750   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:22:15,627-Speed 2629.15 samples/sec   Loss 12.5813   LearningRate 0.0785   Epoch: 2   Global Step: 94760   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:22:19,530-Speed 2624.54 samples/sec   Loss 12.3781   LearningRate 0.0785   Epoch: 2   Global Step: 94770   Fp16 Grad Scale: 262144   Required: 83 hours
Training: 2022-04-13 06:22:23,560-Speed 2541.65 samples/sec   Loss 12.4180   LearningRate 0.0785   Epoch: 2   Global Step: 94780   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:22:27,463-Speed 2623.99 samples/sec   Loss 12.5558   LearningRate 0.0785   Epoch: 2   Global Step: 94790   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:22:31,348-Speed 2636.90 samples/sec   Loss 12.4274   LearningRate 0.0785   Epoch: 2   Global Step: 94800   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:22:35,268-Speed 2612.57 samples/sec   Loss 12.4152   LearningRate 0.0784   Epoch: 2   Global Step: 94810   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:22:39,166-Speed 2627.54 samples/sec   Loss 12.3513   LearningRate 0.0784   Epoch: 2   Global Step: 94820   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:22:43,065-Speed 2626.61 samples/sec   Loss 12.3886   LearningRate 0.0784   Epoch: 2   Global Step: 94830   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:22:46,966-Speed 2626.27 samples/sec   Loss 12.3181   LearningRate 0.0784   Epoch: 2   Global Step: 94840   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:22:50,868-Speed 2625.04 samples/sec   Loss 12.4398   LearningRate 0.0784   Epoch: 2   Global Step: 94850   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:22:54,772-Speed 2623.53 samples/sec   Loss 12.2982   LearningRate 0.0784   Epoch: 2   Global Step: 94860   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:22:58,676-Speed 2623.74 samples/sec   Loss 12.3589   LearningRate 0.0784   Epoch: 2   Global Step: 94870   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:23:02,603-Speed 2608.35 samples/sec   Loss 12.2600   LearningRate 0.0784   Epoch: 2   Global Step: 94880   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:23:06,504-Speed 2625.14 samples/sec   Loss 12.2818   LearningRate 0.0784   Epoch: 2   Global Step: 94890   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:23:10,405-Speed 2625.47 samples/sec   Loss 12.2442   LearningRate 0.0784   Epoch: 2   Global Step: 94900   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:23:14,303-Speed 2627.44 samples/sec   Loss 12.5001   LearningRate 0.0784   Epoch: 2   Global Step: 94910   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:23:18,117-Speed 2686.07 samples/sec   Loss 12.4129   LearningRate 0.0784   Epoch: 2   Global Step: 94920   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:22,013-Speed 2629.09 samples/sec   Loss 12.5417   LearningRate 0.0784   Epoch: 2   Global Step: 94930   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:25,909-Speed 2629.02 samples/sec   Loss 12.4410   LearningRate 0.0784   Epoch: 2   Global Step: 94940   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:29,847-Speed 2601.34 samples/sec   Loss 12.3869   LearningRate 0.0784   Epoch: 2   Global Step: 94950   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:33,742-Speed 2629.53 samples/sec   Loss 12.4389   LearningRate 0.0784   Epoch: 2   Global Step: 94960   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:37,642-Speed 2625.53 samples/sec   Loss 12.3814   LearningRate 0.0784   Epoch: 2   Global Step: 94970   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:41,540-Speed 2627.79 samples/sec   Loss 12.4704   LearningRate 0.0784   Epoch: 2   Global Step: 94980   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:45,437-Speed 2628.60 samples/sec   Loss 12.3706   LearningRate 0.0784   Epoch: 2   Global Step: 94990   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:49,346-Speed 2620.24 samples/sec   Loss 12.3828   LearningRate 0.0784   Epoch: 2   Global Step: 95000   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:53,258-Speed 2618.59 samples/sec   Loss 12.4051   LearningRate 0.0784   Epoch: 2   Global Step: 95010   Fp16 Grad Scale: 8192   Required: 82 hours
Training: 2022-04-13 06:23:57,161-Speed 2624.01 samples/sec   Loss 12.5102   LearningRate 0.0784   Epoch: 2   Global Step: 95020   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:01,068-Speed 2621.91 samples/sec   Loss 12.3514   LearningRate 0.0784   Epoch: 2   Global Step: 95030   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:04,969-Speed 2625.21 samples/sec   Loss 12.3649   LearningRate 0.0784   Epoch: 2   Global Step: 95040   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:08,879-Speed 2619.64 samples/sec   Loss 12.5036   LearningRate 0.0784   Epoch: 2   Global Step: 95050   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:12,770-Speed 2632.25 samples/sec   Loss 12.2806   LearningRate 0.0784   Epoch: 2   Global Step: 95060   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:16,669-Speed 2626.94 samples/sec   Loss 12.4772   LearningRate 0.0784   Epoch: 2   Global Step: 95070   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:20,571-Speed 2625.04 samples/sec   Loss 12.3377   LearningRate 0.0784   Epoch: 2   Global Step: 95080   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:24,470-Speed 2627.03 samples/sec   Loss 12.4531   LearningRate 0.0784   Epoch: 2   Global Step: 95090   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:28,391-Speed 2612.58 samples/sec   Loss 12.4295   LearningRate 0.0784   Epoch: 2   Global Step: 95100   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:32,317-Speed 2608.45 samples/sec   Loss 12.1799   LearningRate 0.0784   Epoch: 2   Global Step: 95110   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 06:24:36,286-Speed 2581.01 samples/sec   Loss 12.2615   LearningRate 0.0784   Epoch: 2   Global Step: 95120   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:24:40,201-Speed 2615.93 samples/sec   Loss 12.3562   LearningRate 0.0784   Epoch: 2   Global Step: 95130   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:24:44,134-Speed 2604.67 samples/sec   Loss 12.3918   LearningRate 0.0784   Epoch: 2   Global Step: 95140   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:24:48,048-Speed 2616.41 samples/sec   Loss 12.2960   LearningRate 0.0784   Epoch: 2   Global Step: 95150   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:24:51,978-Speed 2606.75 samples/sec   Loss 12.4647   LearningRate 0.0784   Epoch: 2   Global Step: 95160   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:24:55,875-Speed 2628.40 samples/sec   Loss 12.3674   LearningRate 0.0784   Epoch: 2   Global Step: 95170   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:24:59,789-Speed 2617.34 samples/sec   Loss 12.3257   LearningRate 0.0784   Epoch: 2   Global Step: 95180   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:25:03,687-Speed 2627.72 samples/sec   Loss 12.3591   LearningRate 0.0784   Epoch: 2   Global Step: 95190   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:25:07,590-Speed 2624.22 samples/sec   Loss 12.4246   LearningRate 0.0784   Epoch: 2   Global Step: 95200   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:25:11,493-Speed 2623.85 samples/sec   Loss 12.3298   LearningRate 0.0784   Epoch: 2   Global Step: 95210   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 06:25:15,410-Speed 2615.40 samples/sec   Loss 12.3856   LearningRate 0.0784   Epoch: 2   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:19,320-Speed 2619.69 samples/sec   Loss 12.3294   LearningRate 0.0784   Epoch: 2   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:23,226-Speed 2622.31 samples/sec   Loss 12.4524   LearningRate 0.0784   Epoch: 2   Global Step: 95240   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:27,139-Speed 2618.08 samples/sec   Loss 12.2790   LearningRate 0.0784   Epoch: 2   Global Step: 95250   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:31,042-Speed 2624.16 samples/sec   Loss 12.3578   LearningRate 0.0784   Epoch: 2   Global Step: 95260   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:34,939-Speed 2628.29 samples/sec   Loss 12.2463   LearningRate 0.0784   Epoch: 2   Global Step: 95270   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:38,844-Speed 2622.90 samples/sec   Loss 12.2554   LearningRate 0.0783   Epoch: 2   Global Step: 95280   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:42,749-Speed 2622.92 samples/sec   Loss 12.4621   LearningRate 0.0783   Epoch: 2   Global Step: 95290   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:46,654-Speed 2622.89 samples/sec   Loss 12.5071   LearningRate 0.0783   Epoch: 2   Global Step: 95300   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:50,582-Speed 2607.85 samples/sec   Loss 12.1942   LearningRate 0.0783   Epoch: 2   Global Step: 95310   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:25:54,547-Speed 2583.39 samples/sec   Loss 12.3049   LearningRate 0.0783   Epoch: 2   Global Step: 95320   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:25:58,449-Speed 2625.35 samples/sec   Loss 12.3074   LearningRate 0.0783   Epoch: 2   Global Step: 95330   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:02,350-Speed 2625.38 samples/sec   Loss 12.3310   LearningRate 0.0783   Epoch: 2   Global Step: 95340   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:06,246-Speed 2629.47 samples/sec   Loss 12.2558   LearningRate 0.0783   Epoch: 2   Global Step: 95350   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:10,156-Speed 2619.71 samples/sec   Loss 12.3984   LearningRate 0.0783   Epoch: 2   Global Step: 95360   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:14,057-Speed 2625.37 samples/sec   Loss 12.3498   LearningRate 0.0783   Epoch: 2   Global Step: 95370   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:17,954-Speed 2627.89 samples/sec   Loss 12.3842   LearningRate 0.0783   Epoch: 2   Global Step: 95380   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:21,862-Speed 2621.54 samples/sec   Loss 12.1875   LearningRate 0.0783   Epoch: 2   Global Step: 95390   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:25,780-Speed 2613.71 samples/sec   Loss 12.4236   LearningRate 0.0783   Epoch: 2   Global Step: 95400   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:29,695-Speed 2616.41 samples/sec   Loss 12.2960   LearningRate 0.0783   Epoch: 2   Global Step: 95410   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:33,589-Speed 2630.38 samples/sec   Loss 12.4678   LearningRate 0.0783   Epoch: 2   Global Step: 95420   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:26:37,489-Speed 2626.44 samples/sec   Loss 12.3078   LearningRate 0.0783   Epoch: 2   Global Step: 95430   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:26:41,397-Speed 2620.41 samples/sec   Loss 12.3673   LearningRate 0.0783   Epoch: 2   Global Step: 95440   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:26:45,281-Speed 2636.98 samples/sec   Loss 12.2580   LearningRate 0.0783   Epoch: 2   Global Step: 95450   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:49,189-Speed 2621.36 samples/sec   Loss 12.5007   LearningRate 0.0783   Epoch: 2   Global Step: 95460   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:53,089-Speed 2626.05 samples/sec   Loss 12.5400   LearningRate 0.0783   Epoch: 2   Global Step: 95470   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:26:56,988-Speed 2627.71 samples/sec   Loss 12.2965   LearningRate 0.0783   Epoch: 2   Global Step: 95480   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:00,887-Speed 2626.61 samples/sec   Loss 12.3505   LearningRate 0.0783   Epoch: 2   Global Step: 95490   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:04,791-Speed 2623.41 samples/sec   Loss 12.4877   LearningRate 0.0783   Epoch: 2   Global Step: 95500   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:08,694-Speed 2624.23 samples/sec   Loss 12.6136   LearningRate 0.0783   Epoch: 2   Global Step: 95510   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:12,591-Speed 2628.86 samples/sec   Loss 12.3638   LearningRate 0.0783   Epoch: 2   Global Step: 95520   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:16,506-Speed 2615.86 samples/sec   Loss 12.3673   LearningRate 0.0783   Epoch: 2   Global Step: 95530   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:20,406-Speed 2626.68 samples/sec   Loss 12.3588   LearningRate 0.0783   Epoch: 2   Global Step: 95540   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:24,334-Speed 2607.86 samples/sec   Loss 12.4866   LearningRate 0.0783   Epoch: 2   Global Step: 95550   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:27:28,231-Speed 2628.80 samples/sec   Loss 12.2813   LearningRate 0.0783   Epoch: 2   Global Step: 95560   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:27:32,138-Speed 2620.98 samples/sec   Loss 12.2886   LearningRate 0.0783   Epoch: 2   Global Step: 95570   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:27:36,042-Speed 2624.09 samples/sec   Loss 12.2301   LearningRate 0.0783   Epoch: 2   Global Step: 95580   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:27:39,941-Speed 2626.74 samples/sec   Loss 12.3765   LearningRate 0.0783   Epoch: 2   Global Step: 95590   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:27:43,866-Speed 2609.21 samples/sec   Loss 12.2687   LearningRate 0.0783   Epoch: 2   Global Step: 95600   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:47,768-Speed 2625.54 samples/sec   Loss 12.4446   LearningRate 0.0783   Epoch: 2   Global Step: 95610   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:51,664-Speed 2628.81 samples/sec   Loss 12.3033   LearningRate 0.0783   Epoch: 2   Global Step: 95620   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:27:55,611-Speed 2595.29 samples/sec   Loss 12.3672   LearningRate 0.0783   Epoch: 2   Global Step: 95630   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:27:59,510-Speed 2626.92 samples/sec   Loss 12.9900   LearningRate 0.0783   Epoch: 2   Global Step: 95640   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:03,409-Speed 2626.98 samples/sec   Loss 12.7384   LearningRate 0.0783   Epoch: 2   Global Step: 95650   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:07,336-Speed 2608.49 samples/sec   Loss 12.4522   LearningRate 0.0783   Epoch: 2   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:11,389-Speed 2527.54 samples/sec   Loss 12.5440   LearningRate 0.0783   Epoch: 2   Global Step: 95670   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:15,312-Speed 2610.53 samples/sec   Loss 12.5117   LearningRate 0.0783   Epoch: 2   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:19,203-Speed 2633.18 samples/sec   Loss 12.5504   LearningRate 0.0783   Epoch: 2   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:23,094-Speed 2632.13 samples/sec   Loss 12.3565   LearningRate 0.0783   Epoch: 2   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:26,992-Speed 2627.92 samples/sec   Loss 12.3879   LearningRate 0.0783   Epoch: 2   Global Step: 95710   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:30,893-Speed 2625.54 samples/sec   Loss 12.3992   LearningRate 0.0783   Epoch: 2   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:28:34,788-Speed 2629.41 samples/sec   Loss 12.3691   LearningRate 0.0783   Epoch: 2   Global Step: 95730   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:28:38,680-Speed 2631.82 samples/sec   Loss 12.6148   LearningRate 0.0783   Epoch: 2   Global Step: 95740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:28:42,579-Speed 2627.13 samples/sec   Loss 12.4007   LearningRate 0.0782   Epoch: 2   Global Step: 95750   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:28:46,475-Speed 2628.51 samples/sec   Loss 12.4408   LearningRate 0.0782   Epoch: 2   Global Step: 95760   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:28:50,388-Speed 2618.17 samples/sec   Loss 12.4624   LearningRate 0.0782   Epoch: 2   Global Step: 95770   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:28:54,308-Speed 2612.97 samples/sec   Loss 12.5665   LearningRate 0.0782   Epoch: 2   Global Step: 95780   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:28:58,206-Speed 2627.55 samples/sec   Loss 12.3666   LearningRate 0.0782   Epoch: 2   Global Step: 95790   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:29:02,108-Speed 2624.81 samples/sec   Loss 12.4307   LearningRate 0.0782   Epoch: 2   Global Step: 95800   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:29:06,005-Speed 2627.87 samples/sec   Loss 12.5336   LearningRate 0.0782   Epoch: 2   Global Step: 95810   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:29:09,907-Speed 2625.27 samples/sec   Loss 12.4109   LearningRate 0.0782   Epoch: 2   Global Step: 95820   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:29:13,805-Speed 2627.82 samples/sec   Loss 12.4817   LearningRate 0.0782   Epoch: 2   Global Step: 95830   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:17,707-Speed 2624.68 samples/sec   Loss 12.4335   LearningRate 0.0782   Epoch: 2   Global Step: 95840   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:21,618-Speed 2619.13 samples/sec   Loss 12.1688   LearningRate 0.0782   Epoch: 2   Global Step: 95850   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:25,526-Speed 2620.92 samples/sec   Loss 12.4334   LearningRate 0.0782   Epoch: 2   Global Step: 95860   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:29,437-Speed 2618.98 samples/sec   Loss 12.3509   LearningRate 0.0782   Epoch: 2   Global Step: 95870   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:33,360-Speed 2610.34 samples/sec   Loss 12.5021   LearningRate 0.0782   Epoch: 2   Global Step: 95880   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:37,279-Speed 2613.97 samples/sec   Loss 12.4403   LearningRate 0.0782   Epoch: 2   Global Step: 95890   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:41,176-Speed 2628.07 samples/sec   Loss 12.3506   LearningRate 0.0782   Epoch: 2   Global Step: 95900   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:45,069-Speed 2630.91 samples/sec   Loss 12.3862   LearningRate 0.0782   Epoch: 2   Global Step: 95910   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:48,967-Speed 2629.35 samples/sec   Loss 12.4708   LearningRate 0.0782   Epoch: 2   Global Step: 95920   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:29:52,845-Speed 2641.13 samples/sec   Loss 12.2250   LearningRate 0.0782   Epoch: 2   Global Step: 95930   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:29:56,775-Speed 2606.17 samples/sec   Loss 12.2520   LearningRate 0.0782   Epoch: 2   Global Step: 95940   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:00,671-Speed 2628.90 samples/sec   Loss 12.2116   LearningRate 0.0782   Epoch: 2   Global Step: 95950   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:04,564-Speed 2630.95 samples/sec   Loss 12.3021   LearningRate 0.0782   Epoch: 2   Global Step: 95960   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:08,459-Speed 2629.67 samples/sec   Loss 12.1989   LearningRate 0.0782   Epoch: 2   Global Step: 95970   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:12,360-Speed 2625.80 samples/sec   Loss 12.3298   LearningRate 0.0782   Epoch: 2   Global Step: 95980   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:16,256-Speed 2628.86 samples/sec   Loss 12.2920   LearningRate 0.0782   Epoch: 2   Global Step: 95990   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:20,221-Speed 2583.04 samples/sec   Loss 12.4009   LearningRate 0.0782   Epoch: 2   Global Step: 96000   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:24,155-Speed 2603.09 samples/sec   Loss 12.4395   LearningRate 0.0782   Epoch: 2   Global Step: 96010   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:28,051-Speed 2629.62 samples/sec   Loss 12.5021   LearningRate 0.0782   Epoch: 2   Global Step: 96020   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:30:31,960-Speed 2620.13 samples/sec   Loss 12.3987   LearningRate 0.0782   Epoch: 2   Global Step: 96030   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:30:35,869-Speed 2620.48 samples/sec   Loss 12.5961   LearningRate 0.0782   Epoch: 2   Global Step: 96040   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:30:39,773-Speed 2623.25 samples/sec   Loss 12.3366   LearningRate 0.0782   Epoch: 2   Global Step: 96050   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:30:43,674-Speed 2625.61 samples/sec   Loss 12.3436   LearningRate 0.0782   Epoch: 2   Global Step: 96060   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:30:47,573-Speed 2627.41 samples/sec   Loss 12.4092   LearningRate 0.0782   Epoch: 2   Global Step: 96070   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:30:51,480-Speed 2621.70 samples/sec   Loss 12.5041   LearningRate 0.0782   Epoch: 2   Global Step: 96080   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:30:55,379-Speed 2626.88 samples/sec   Loss 12.5216   LearningRate 0.0782   Epoch: 2   Global Step: 96090   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:30:59,296-Speed 2615.20 samples/sec   Loss 12.3386   LearningRate 0.0782   Epoch: 2   Global Step: 96100   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:03,216-Speed 2612.58 samples/sec   Loss 12.4276   LearningRate 0.0782   Epoch: 2   Global Step: 96110   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:07,114-Speed 2628.29 samples/sec   Loss 12.2136   LearningRate 0.0782   Epoch: 2   Global Step: 96120   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:11,022-Speed 2620.57 samples/sec   Loss 12.3971   LearningRate 0.0782   Epoch: 2   Global Step: 96130   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:14,923-Speed 2625.39 samples/sec   Loss 12.3342   LearningRate 0.0782   Epoch: 2   Global Step: 96140   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:18,825-Speed 2625.13 samples/sec   Loss 12.4786   LearningRate 0.0782   Epoch: 2   Global Step: 96150   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:22,719-Speed 2630.58 samples/sec   Loss 12.4734   LearningRate 0.0782   Epoch: 2   Global Step: 96160   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:26,623-Speed 2623.80 samples/sec   Loss 12.3421   LearningRate 0.0782   Epoch: 2   Global Step: 96170   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:30,515-Speed 2632.48 samples/sec   Loss 12.3205   LearningRate 0.0782   Epoch: 2   Global Step: 96180   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:34,411-Speed 2628.38 samples/sec   Loss 12.6137   LearningRate 0.0782   Epoch: 2   Global Step: 96190   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:38,325-Speed 2617.30 samples/sec   Loss 12.2979   LearningRate 0.0782   Epoch: 2   Global Step: 96200   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:42,218-Speed 2631.43 samples/sec   Loss 12.1817   LearningRate 0.0782   Epoch: 2   Global Step: 96210   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:46,115-Speed 2628.35 samples/sec   Loss 12.1643   LearningRate 0.0781   Epoch: 2   Global Step: 96220   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:50,038-Speed 2610.56 samples/sec   Loss 12.2645   LearningRate 0.0781   Epoch: 2   Global Step: 96230   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:54,058-Speed 2548.10 samples/sec   Loss 12.2766   LearningRate 0.0781   Epoch: 2   Global Step: 96240   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:31:58,131-Speed 2514.86 samples/sec   Loss 12.3582   LearningRate 0.0781   Epoch: 2   Global Step: 96250   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:02,202-Speed 2515.93 samples/sec   Loss 12.3579   LearningRate 0.0781   Epoch: 2   Global Step: 96260   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:06,130-Speed 2607.96 samples/sec   Loss 12.2371   LearningRate 0.0781   Epoch: 2   Global Step: 96270   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:10,032-Speed 2624.37 samples/sec   Loss 12.3373   LearningRate 0.0781   Epoch: 2   Global Step: 96280   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:13,938-Speed 2622.36 samples/sec   Loss 12.3841   LearningRate 0.0781   Epoch: 2   Global Step: 96290   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:17,829-Speed 2631.71 samples/sec   Loss 12.3542   LearningRate 0.0781   Epoch: 2   Global Step: 96300   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:21,721-Speed 2632.13 samples/sec   Loss 12.2693   LearningRate 0.0781   Epoch: 2   Global Step: 96310   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:25,619-Speed 2627.37 samples/sec   Loss 12.2868   LearningRate 0.0781   Epoch: 2   Global Step: 96320   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:29,500-Speed 2639.11 samples/sec   Loss 12.2014   LearningRate 0.0781   Epoch: 2   Global Step: 96330   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:33,394-Speed 2630.70 samples/sec   Loss 12.3983   LearningRate 0.0781   Epoch: 2   Global Step: 96340   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:37,300-Speed 2622.34 samples/sec   Loss 12.3680   LearningRate 0.0781   Epoch: 2   Global Step: 96350   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:41,191-Speed 2632.32 samples/sec   Loss 12.3394   LearningRate 0.0781   Epoch: 2   Global Step: 96360   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:45,084-Speed 2630.88 samples/sec   Loss 12.3035   LearningRate 0.0781   Epoch: 2   Global Step: 96370   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:48,978-Speed 2630.74 samples/sec   Loss 12.4094   LearningRate 0.0781   Epoch: 2   Global Step: 96380   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:32:52,853-Speed 2642.88 samples/sec   Loss 12.2119   LearningRate 0.0781   Epoch: 2   Global Step: 96390   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:32:56,763-Speed 2619.61 samples/sec   Loss 12.3041   LearningRate 0.0781   Epoch: 2   Global Step: 96400   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:00,657-Speed 2630.01 samples/sec   Loss 12.3963   LearningRate 0.0781   Epoch: 2   Global Step: 96410   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:04,551-Speed 2630.13 samples/sec   Loss 12.3643   LearningRate 0.0781   Epoch: 2   Global Step: 96420   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:08,458-Speed 2621.95 samples/sec   Loss 12.3685   LearningRate 0.0781   Epoch: 2   Global Step: 96430   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:12,389-Speed 2606.15 samples/sec   Loss 12.2676   LearningRate 0.0781   Epoch: 2   Global Step: 96440   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:16,279-Speed 2632.46 samples/sec   Loss 12.2976   LearningRate 0.0781   Epoch: 2   Global Step: 96450   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:20,173-Speed 2630.82 samples/sec   Loss 12.2655   LearningRate 0.0781   Epoch: 2   Global Step: 96460   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:24,064-Speed 2631.93 samples/sec   Loss 12.2929   LearningRate 0.0781   Epoch: 2   Global Step: 96470   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:27,956-Speed 2632.15 samples/sec   Loss 12.3740   LearningRate 0.0781   Epoch: 2   Global Step: 96480   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:33:31,852-Speed 2628.71 samples/sec   Loss 12.3753   LearningRate 0.0781   Epoch: 2   Global Step: 96490   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:33:35,752-Speed 2625.73 samples/sec   Loss 12.4954   LearningRate 0.0781   Epoch: 2   Global Step: 96500   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:33:39,658-Speed 2622.21 samples/sec   Loss 12.3497   LearningRate 0.0781   Epoch: 2   Global Step: 96510   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:33:43,553-Speed 2629.98 samples/sec   Loss 12.2957   LearningRate 0.0781   Epoch: 2   Global Step: 96520   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:33:47,446-Speed 2631.39 samples/sec   Loss 12.3652   LearningRate 0.0781   Epoch: 2   Global Step: 96530   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:33:51,361-Speed 2616.37 samples/sec   Loss 12.3492   LearningRate 0.0781   Epoch: 2   Global Step: 96540   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:33:55,262-Speed 2625.72 samples/sec   Loss 12.3450   LearningRate 0.0781   Epoch: 2   Global Step: 96550   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:33:59,163-Speed 2625.91 samples/sec   Loss 12.3397   LearningRate 0.0781   Epoch: 2   Global Step: 96560   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:34:03,048-Speed 2635.92 samples/sec   Loss 12.1974   LearningRate 0.0781   Epoch: 2   Global Step: 96570   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:06,952-Speed 2623.95 samples/sec   Loss 12.5146   LearningRate 0.0781   Epoch: 2   Global Step: 96580   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:10,850-Speed 2627.43 samples/sec   Loss 12.4264   LearningRate 0.0781   Epoch: 2   Global Step: 96590   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:14,755-Speed 2623.36 samples/sec   Loss 12.4100   LearningRate 0.0781   Epoch: 2   Global Step: 96600   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:18,656-Speed 2625.83 samples/sec   Loss 12.2056   LearningRate 0.0781   Epoch: 2   Global Step: 96610   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:22,550-Speed 2630.01 samples/sec   Loss 12.2610   LearningRate 0.0781   Epoch: 2   Global Step: 96620   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:26,441-Speed 2632.13 samples/sec   Loss 12.3153   LearningRate 0.0781   Epoch: 2   Global Step: 96630   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:30,335-Speed 2630.55 samples/sec   Loss 12.4366   LearningRate 0.0781   Epoch: 2   Global Step: 96640   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:34,227-Speed 2631.44 samples/sec   Loss 12.3978   LearningRate 0.0781   Epoch: 2   Global Step: 96650   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:38,142-Speed 2616.01 samples/sec   Loss 12.3955   LearningRate 0.0781   Epoch: 2   Global Step: 96660   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:34:42,036-Speed 2630.75 samples/sec   Loss 12.5405   LearningRate 0.0781   Epoch: 2   Global Step: 96670   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:34:45,940-Speed 2623.15 samples/sec   Loss 12.2334   LearningRate 0.0780   Epoch: 2   Global Step: 96680   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:34:49,834-Speed 2630.97 samples/sec   Loss 12.4347   LearningRate 0.0780   Epoch: 2   Global Step: 96690   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:34:53,733-Speed 2626.43 samples/sec   Loss 12.3697   LearningRate 0.0780   Epoch: 2   Global Step: 96700   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:34:57,664-Speed 2606.07 samples/sec   Loss 12.3712   LearningRate 0.0780   Epoch: 2   Global Step: 96710   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:35:01,550-Speed 2635.75 samples/sec   Loss 12.2798   LearningRate 0.0780   Epoch: 2   Global Step: 96720   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:05,459-Speed 2619.68 samples/sec   Loss 12.2087   LearningRate 0.0780   Epoch: 2   Global Step: 96730   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:09,363-Speed 2623.67 samples/sec   Loss 12.4304   LearningRate 0.0780   Epoch: 2   Global Step: 96740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:13,268-Speed 2623.66 samples/sec   Loss 12.2959   LearningRate 0.0780   Epoch: 2   Global Step: 96750   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:17,185-Speed 2614.46 samples/sec   Loss 12.2907   LearningRate 0.0780   Epoch: 2   Global Step: 96760   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:21,166-Speed 2572.77 samples/sec   Loss 12.3801   LearningRate 0.0780   Epoch: 2   Global Step: 96770   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:25,265-Speed 2498.97 samples/sec   Loss 12.2830   LearningRate 0.0780   Epoch: 2   Global Step: 96780   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:29,170-Speed 2623.34 samples/sec   Loss 12.2197   LearningRate 0.0780   Epoch: 2   Global Step: 96790   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:33,068-Speed 2627.67 samples/sec   Loss 12.3357   LearningRate 0.0780   Epoch: 2   Global Step: 96800   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:36,977-Speed 2619.77 samples/sec   Loss 12.3319   LearningRate 0.0780   Epoch: 2   Global Step: 96810   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:35:40,885-Speed 2620.67 samples/sec   Loss 12.4075   LearningRate 0.0780   Epoch: 2   Global Step: 96820   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:35:44,788-Speed 2624.80 samples/sec   Loss 12.2945   LearningRate 0.0780   Epoch: 2   Global Step: 96830   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:35:48,684-Speed 2629.40 samples/sec   Loss 12.2732   LearningRate 0.0780   Epoch: 2   Global Step: 96840   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:35:52,578-Speed 2629.96 samples/sec   Loss 12.3866   LearningRate 0.0780   Epoch: 2   Global Step: 96850   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:35:56,474-Speed 2629.22 samples/sec   Loss 12.1310   LearningRate 0.0780   Epoch: 2   Global Step: 96860   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:00,369-Speed 2629.33 samples/sec   Loss 12.2598   LearningRate 0.0780   Epoch: 2   Global Step: 96870   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:04,270-Speed 2625.58 samples/sec   Loss 12.4297   LearningRate 0.0780   Epoch: 2   Global Step: 96880   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:08,172-Speed 2624.83 samples/sec   Loss 12.3366   LearningRate 0.0780   Epoch: 2   Global Step: 96890   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:12,069-Speed 2628.37 samples/sec   Loss 12.3643   LearningRate 0.0780   Epoch: 2   Global Step: 96900   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:15,967-Speed 2627.42 samples/sec   Loss 12.4281   LearningRate 0.0780   Epoch: 2   Global Step: 96910   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:19,850-Speed 2637.87 samples/sec   Loss 12.2273   LearningRate 0.0780   Epoch: 2   Global Step: 96920   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:23,740-Speed 2633.24 samples/sec   Loss 12.1373   LearningRate 0.0780   Epoch: 2   Global Step: 96930   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:27,630-Speed 2632.86 samples/sec   Loss 12.1727   LearningRate 0.0780   Epoch: 2   Global Step: 96940   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:31,524-Speed 2630.46 samples/sec   Loss 12.3128   LearningRate 0.0780   Epoch: 2   Global Step: 96950   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:35,421-Speed 2628.34 samples/sec   Loss 12.4147   LearningRate 0.0780   Epoch: 2   Global Step: 96960   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:39,321-Speed 2626.16 samples/sec   Loss 12.3024   LearningRate 0.0780   Epoch: 2   Global Step: 96970   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:43,216-Speed 2629.66 samples/sec   Loss 12.1948   LearningRate 0.0780   Epoch: 2   Global Step: 96980   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:47,111-Speed 2629.45 samples/sec   Loss 12.2093   LearningRate 0.0780   Epoch: 2   Global Step: 96990   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:51,002-Speed 2632.24 samples/sec   Loss 12.3577   LearningRate 0.0780   Epoch: 2   Global Step: 97000   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:54,910-Speed 2620.60 samples/sec   Loss 12.2757   LearningRate 0.0780   Epoch: 2   Global Step: 97010   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:36:58,795-Speed 2636.92 samples/sec   Loss 12.4586   LearningRate 0.0780   Epoch: 2   Global Step: 97020   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:02,691-Speed 2629.32 samples/sec   Loss 12.4469   LearningRate 0.0780   Epoch: 2   Global Step: 97030   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:06,585-Speed 2630.33 samples/sec   Loss 12.3337   LearningRate 0.0780   Epoch: 2   Global Step: 97040   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:10,480-Speed 2630.19 samples/sec   Loss 12.2611   LearningRate 0.0780   Epoch: 2   Global Step: 97050   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:14,386-Speed 2621.58 samples/sec   Loss 12.3325   LearningRate 0.0780   Epoch: 2   Global Step: 97060   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:18,281-Speed 2629.90 samples/sec   Loss 12.2781   LearningRate 0.0780   Epoch: 2   Global Step: 97070   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:22,178-Speed 2627.72 samples/sec   Loss 12.3842   LearningRate 0.0780   Epoch: 2   Global Step: 97080   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:26,101-Speed 2610.96 samples/sec   Loss 12.4398   LearningRate 0.0780   Epoch: 2   Global Step: 97090   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:30,005-Speed 2624.01 samples/sec   Loss 12.1393   LearningRate 0.0780   Epoch: 2   Global Step: 97100   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:37:33,885-Speed 2639.84 samples/sec   Loss 12.1844   LearningRate 0.0780   Epoch: 2   Global Step: 97110   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:37:37,788-Speed 2624.58 samples/sec   Loss 12.2365   LearningRate 0.0780   Epoch: 2   Global Step: 97120   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:37:41,683-Speed 2629.81 samples/sec   Loss 12.4282   LearningRate 0.0780   Epoch: 2   Global Step: 97130   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:37:45,582-Speed 2627.14 samples/sec   Loss 12.4310   LearningRate 0.0780   Epoch: 2   Global Step: 97140   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:37:49,478-Speed 2628.79 samples/sec   Loss 12.3243   LearningRate 0.0779   Epoch: 2   Global Step: 97150   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:37:53,377-Speed 2626.45 samples/sec   Loss 12.4180   LearningRate 0.0779   Epoch: 2   Global Step: 97160   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:37:57,275-Speed 2627.68 samples/sec   Loss 12.3597   LearningRate 0.0779   Epoch: 2   Global Step: 97170   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:01,171-Speed 2629.17 samples/sec   Loss 12.2422   LearningRate 0.0779   Epoch: 2   Global Step: 97180   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:05,065-Speed 2630.85 samples/sec   Loss 12.3515   LearningRate 0.0779   Epoch: 2   Global Step: 97190   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:08,964-Speed 2626.62 samples/sec   Loss 12.3795   LearningRate 0.0779   Epoch: 2   Global Step: 97200   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:12,865-Speed 2626.03 samples/sec   Loss 12.2042   LearningRate 0.0779   Epoch: 2   Global Step: 97210   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:38:16,765-Speed 2626.15 samples/sec   Loss 12.2763   LearningRate 0.0779   Epoch: 2   Global Step: 97220   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:38:20,661-Speed 2628.42 samples/sec   Loss 12.2958   LearningRate 0.0779   Epoch: 2   Global Step: 97230   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:38:24,543-Speed 2638.23 samples/sec   Loss 12.3865   LearningRate 0.0779   Epoch: 2   Global Step: 97240   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:28,439-Speed 2629.30 samples/sec   Loss 12.4271   LearningRate 0.0779   Epoch: 2   Global Step: 97250   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:32,340-Speed 2625.91 samples/sec   Loss 12.2510   LearningRate 0.0779   Epoch: 2   Global Step: 97260   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:36,235-Speed 2629.21 samples/sec   Loss 12.1196   LearningRate 0.0779   Epoch: 2   Global Step: 97270   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:40,149-Speed 2617.40 samples/sec   Loss 12.1788   LearningRate 0.0779   Epoch: 2   Global Step: 97280   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:44,043-Speed 2630.39 samples/sec   Loss 12.4076   LearningRate 0.0779   Epoch: 2   Global Step: 97290   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:47,935-Speed 2631.07 samples/sec   Loss 12.4209   LearningRate 0.0779   Epoch: 2   Global Step: 97300   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:51,826-Speed 2632.58 samples/sec   Loss 12.4034   LearningRate 0.0779   Epoch: 2   Global Step: 97310   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:55,723-Speed 2628.08 samples/sec   Loss 12.4033   LearningRate 0.0779   Epoch: 2   Global Step: 97320   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:38:59,618-Speed 2629.77 samples/sec   Loss 12.2769   LearningRate 0.0779   Epoch: 2   Global Step: 97330   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:03,519-Speed 2625.25 samples/sec   Loss 12.2340   LearningRate 0.0779   Epoch: 2   Global Step: 97340   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:39:07,420-Speed 2625.99 samples/sec   Loss 12.2644   LearningRate 0.0779   Epoch: 2   Global Step: 97350   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:39:11,315-Speed 2629.28 samples/sec   Loss 12.3639   LearningRate 0.0779   Epoch: 2   Global Step: 97360   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:39:15,213-Speed 2627.98 samples/sec   Loss 12.3600   LearningRate 0.0779   Epoch: 2   Global Step: 97370   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:39:19,115-Speed 2625.14 samples/sec   Loss 12.2585   LearningRate 0.0779   Epoch: 2   Global Step: 97380   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:39:22,999-Speed 2636.66 samples/sec   Loss 12.3830   LearningRate 0.0779   Epoch: 2   Global Step: 97390   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:26,895-Speed 2629.43 samples/sec   Loss 12.2077   LearningRate 0.0779   Epoch: 2   Global Step: 97400   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:30,803-Speed 2620.35 samples/sec   Loss 12.3076   LearningRate 0.0779   Epoch: 2   Global Step: 97410   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:34,694-Speed 2632.08 samples/sec   Loss 12.2287   LearningRate 0.0779   Epoch: 2   Global Step: 97420   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:38,590-Speed 2629.40 samples/sec   Loss 12.3457   LearningRate 0.0779   Epoch: 2   Global Step: 97430   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:42,488-Speed 2627.45 samples/sec   Loss 12.3181   LearningRate 0.0779   Epoch: 2   Global Step: 97440   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:46,384-Speed 2629.06 samples/sec   Loss 12.2099   LearningRate 0.0779   Epoch: 2   Global Step: 97450   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:50,278-Speed 2630.66 samples/sec   Loss 12.4412   LearningRate 0.0779   Epoch: 2   Global Step: 97460   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:54,175-Speed 2628.27 samples/sec   Loss 12.3456   LearningRate 0.0779   Epoch: 2   Global Step: 97470   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:39:58,073-Speed 2627.91 samples/sec   Loss 12.1849   LearningRate 0.0779   Epoch: 2   Global Step: 97480   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:01,968-Speed 2629.29 samples/sec   Loss 12.1210   LearningRate 0.0779   Epoch: 2   Global Step: 97490   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:40:05,865-Speed 2628.33 samples/sec   Loss 12.3235   LearningRate 0.0779   Epoch: 2   Global Step: 97500   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:40:09,762-Speed 2627.60 samples/sec   Loss 12.3248   LearningRate 0.0779   Epoch: 2   Global Step: 97510   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:40:13,680-Speed 2615.13 samples/sec   Loss 12.2207   LearningRate 0.0779   Epoch: 2   Global Step: 97520   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:40:17,574-Speed 2630.29 samples/sec   Loss 12.4907   LearningRate 0.0779   Epoch: 2   Global Step: 97530   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:40:21,452-Speed 2640.87 samples/sec   Loss 12.4269   LearningRate 0.0779   Epoch: 2   Global Step: 97540   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:25,354-Speed 2625.17 samples/sec   Loss 12.2611   LearningRate 0.0779   Epoch: 2   Global Step: 97550   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:29,249-Speed 2630.09 samples/sec   Loss 12.2763   LearningRate 0.0779   Epoch: 2   Global Step: 97560   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:33,146-Speed 2628.22 samples/sec   Loss 12.4356   LearningRate 0.0779   Epoch: 2   Global Step: 97570   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:37,070-Speed 2610.30 samples/sec   Loss 12.1865   LearningRate 0.0779   Epoch: 2   Global Step: 97580   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:40,997-Speed 2607.70 samples/sec   Loss 12.3766   LearningRate 0.0779   Epoch: 2   Global Step: 97590   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:44,894-Speed 2629.05 samples/sec   Loss 12.3209   LearningRate 0.0779   Epoch: 2   Global Step: 97600   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:48,813-Speed 2613.75 samples/sec   Loss 12.3284   LearningRate 0.0779   Epoch: 2   Global Step: 97610   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:52,729-Speed 2615.78 samples/sec   Loss 12.4818   LearningRate 0.0778   Epoch: 2   Global Step: 97620   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:40:56,620-Speed 2632.02 samples/sec   Loss 12.4138   LearningRate 0.0778   Epoch: 2   Global Step: 97630   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:00,523-Speed 2624.43 samples/sec   Loss 12.1869   LearningRate 0.0778   Epoch: 2   Global Step: 97640   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:41:04,403-Speed 2640.19 samples/sec   Loss 12.3037   LearningRate 0.0778   Epoch: 2   Global Step: 97650   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:08,300-Speed 2628.19 samples/sec   Loss 12.3170   LearningRate 0.0778   Epoch: 2   Global Step: 97660   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:12,198-Speed 2627.51 samples/sec   Loss 12.2405   LearningRate 0.0778   Epoch: 2   Global Step: 97670   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:16,104-Speed 2622.15 samples/sec   Loss 12.2920   LearningRate 0.0778   Epoch: 2   Global Step: 97680   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:20,015-Speed 2618.93 samples/sec   Loss 12.2653   LearningRate 0.0778   Epoch: 2   Global Step: 97690   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:23,923-Speed 2620.92 samples/sec   Loss 12.2847   LearningRate 0.0778   Epoch: 2   Global Step: 97700   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:27,816-Speed 2631.48 samples/sec   Loss 12.3061   LearningRate 0.0778   Epoch: 2   Global Step: 97710   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:31,712-Speed 2628.72 samples/sec   Loss 12.3854   LearningRate 0.0778   Epoch: 2   Global Step: 97720   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:35,607-Speed 2629.62 samples/sec   Loss 12.7009   LearningRate 0.0778   Epoch: 2   Global Step: 97730   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:39,507-Speed 2625.95 samples/sec   Loss 12.4542   LearningRate 0.0778   Epoch: 2   Global Step: 97740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:41:43,405-Speed 2628.26 samples/sec   Loss 12.3482   LearningRate 0.0778   Epoch: 2   Global Step: 97750   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:41:47,314-Speed 2619.81 samples/sec   Loss 12.3821   LearningRate 0.0778   Epoch: 2   Global Step: 97760   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:41:51,212-Speed 2628.21 samples/sec   Loss 12.3500   LearningRate 0.0778   Epoch: 2   Global Step: 97770   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:41:55,115-Speed 2624.00 samples/sec   Loss 12.2425   LearningRate 0.0778   Epoch: 2   Global Step: 97780   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:41:59,019-Speed 2624.02 samples/sec   Loss 12.3848   LearningRate 0.0778   Epoch: 2   Global Step: 97790   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:42:02,897-Speed 2640.74 samples/sec   Loss 12.3483   LearningRate 0.0778   Epoch: 2   Global Step: 97800   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:06,791-Speed 2630.45 samples/sec   Loss 12.2266   LearningRate 0.0778   Epoch: 2   Global Step: 97810   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:10,683-Speed 2631.39 samples/sec   Loss 12.3804   LearningRate 0.0778   Epoch: 2   Global Step: 97820   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:14,574-Speed 2632.56 samples/sec   Loss 12.0463   LearningRate 0.0778   Epoch: 2   Global Step: 97830   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:18,476-Speed 2625.19 samples/sec   Loss 12.3594   LearningRate 0.0778   Epoch: 2   Global Step: 97840   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:22,388-Speed 2618.42 samples/sec   Loss 12.2352   LearningRate 0.0778   Epoch: 2   Global Step: 97850   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:26,290-Speed 2624.86 samples/sec   Loss 12.3976   LearningRate 0.0778   Epoch: 2   Global Step: 97860   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:30,185-Speed 2628.87 samples/sec   Loss 12.2179   LearningRate 0.0778   Epoch: 2   Global Step: 97870   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:34,080-Speed 2630.38 samples/sec   Loss 12.3030   LearningRate 0.0778   Epoch: 2   Global Step: 97880   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:37,972-Speed 2631.81 samples/sec   Loss 12.3086   LearningRate 0.0778   Epoch: 2   Global Step: 97890   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:41,901-Speed 2606.98 samples/sec   Loss 12.3963   LearningRate 0.0778   Epoch: 2   Global Step: 97900   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:42:45,813-Speed 2618.57 samples/sec   Loss 12.2959   LearningRate 0.0778   Epoch: 2   Global Step: 97910   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:42:49,693-Speed 2639.76 samples/sec   Loss 12.3165   LearningRate 0.0778   Epoch: 2   Global Step: 97920   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:53,591-Speed 2627.94 samples/sec   Loss 12.3054   LearningRate 0.0778   Epoch: 2   Global Step: 97930   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:42:57,483-Speed 2631.60 samples/sec   Loss 12.1860   LearningRate 0.0778   Epoch: 2   Global Step: 97940   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:01,385-Speed 2624.84 samples/sec   Loss 12.4365   LearningRate 0.0778   Epoch: 2   Global Step: 97950   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:05,285-Speed 2626.51 samples/sec   Loss 12.3350   LearningRate 0.0778   Epoch: 2   Global Step: 97960   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:09,185-Speed 2626.53 samples/sec   Loss 12.2568   LearningRate 0.0778   Epoch: 2   Global Step: 97970   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:13,082-Speed 2627.97 samples/sec   Loss 12.3683   LearningRate 0.0778   Epoch: 2   Global Step: 97980   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:16,990-Speed 2621.55 samples/sec   Loss 12.3395   LearningRate 0.0778   Epoch: 2   Global Step: 97990   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:20,894-Speed 2623.31 samples/sec   Loss 12.1997   LearningRate 0.0778   Epoch: 2   Global Step: 98000   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:24,811-Speed 2614.59 samples/sec   Loss 12.2019   LearningRate 0.0778   Epoch: 2   Global Step: 98010   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:28,714-Speed 2624.45 samples/sec   Loss 12.2424   LearningRate 0.0778   Epoch: 2   Global Step: 98020   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:43:32,605-Speed 2632.44 samples/sec   Loss 12.1498   LearningRate 0.0778   Epoch: 2   Global Step: 98030   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:36,517-Speed 2618.06 samples/sec   Loss 12.2320   LearningRate 0.0778   Epoch: 2   Global Step: 98040   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:40,436-Speed 2613.71 samples/sec   Loss 12.2985   LearningRate 0.0778   Epoch: 2   Global Step: 98050   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:44,331-Speed 2629.81 samples/sec   Loss 12.1855   LearningRate 0.0778   Epoch: 2   Global Step: 98060   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:48,260-Speed 2607.26 samples/sec   Loss 12.2905   LearningRate 0.0778   Epoch: 2   Global Step: 98070   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:52,157-Speed 2628.05 samples/sec   Loss 12.3484   LearningRate 0.0778   Epoch: 2   Global Step: 98080   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:56,051-Speed 2630.71 samples/sec   Loss 12.2651   LearningRate 0.0777   Epoch: 2   Global Step: 98090   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:43:59,947-Speed 2628.68 samples/sec   Loss 12.2646   LearningRate 0.0777   Epoch: 2   Global Step: 98100   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:03,877-Speed 2606.37 samples/sec   Loss 12.3203   LearningRate 0.0777   Epoch: 2   Global Step: 98110   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:07,785-Speed 2621.06 samples/sec   Loss 12.3751   LearningRate 0.0777   Epoch: 2   Global Step: 98120   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:11,681-Speed 2629.53 samples/sec   Loss 12.3964   LearningRate 0.0777   Epoch: 2   Global Step: 98130   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:44:15,576-Speed 2629.54 samples/sec   Loss 12.3090   LearningRate 0.0777   Epoch: 2   Global Step: 98140   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:44:19,481-Speed 2622.94 samples/sec   Loss 12.4678   LearningRate 0.0777   Epoch: 2   Global Step: 98150   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:44:23,374-Speed 2631.12 samples/sec   Loss 12.2377   LearningRate 0.0777   Epoch: 2   Global Step: 98160   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:44:27,274-Speed 2625.72 samples/sec   Loss 12.4625   LearningRate 0.0777   Epoch: 2   Global Step: 98170   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:44:31,149-Speed 2643.01 samples/sec   Loss 12.4256   LearningRate 0.0777   Epoch: 2   Global Step: 98180   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:35,041-Speed 2632.33 samples/sec   Loss 12.3008   LearningRate 0.0777   Epoch: 2   Global Step: 98190   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:38,933-Speed 2631.69 samples/sec   Loss 12.0643   LearningRate 0.0777   Epoch: 2   Global Step: 98200   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:42,827-Speed 2630.27 samples/sec   Loss 12.3365   LearningRate 0.0777   Epoch: 2   Global Step: 98210   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:46,741-Speed 2617.10 samples/sec   Loss 12.2994   LearningRate 0.0777   Epoch: 2   Global Step: 98220   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:50,633-Speed 2631.59 samples/sec   Loss 12.2434   LearningRate 0.0777   Epoch: 2   Global Step: 98230   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:54,528-Speed 2629.81 samples/sec   Loss 12.3444   LearningRate 0.0777   Epoch: 2   Global Step: 98240   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:44:58,423-Speed 2629.75 samples/sec   Loss 12.3586   LearningRate 0.0777   Epoch: 2   Global Step: 98250   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:02,317-Speed 2629.93 samples/sec   Loss 12.3026   LearningRate 0.0777   Epoch: 2   Global Step: 98260   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:06,214-Speed 2628.61 samples/sec   Loss 12.3159   LearningRate 0.0777   Epoch: 2   Global Step: 98270   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:10,107-Speed 2630.89 samples/sec   Loss 12.4043   LearningRate 0.0777   Epoch: 2   Global Step: 98280   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:45:13,990-Speed 2638.03 samples/sec   Loss 12.1773   LearningRate 0.0777   Epoch: 2   Global Step: 98290   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:17,888-Speed 2627.82 samples/sec   Loss 11.9870   LearningRate 0.0777   Epoch: 2   Global Step: 98300   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:21,787-Speed 2626.70 samples/sec   Loss 12.2542   LearningRate 0.0777   Epoch: 2   Global Step: 98310   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:25,683-Speed 2629.18 samples/sec   Loss 12.2346   LearningRate 0.0777   Epoch: 2   Global Step: 98320   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:29,577-Speed 2630.47 samples/sec   Loss 12.0580   LearningRate 0.0777   Epoch: 2   Global Step: 98330   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:33,483-Speed 2621.95 samples/sec   Loss 12.2221   LearningRate 0.0777   Epoch: 2   Global Step: 98340   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:37,374-Speed 2631.95 samples/sec   Loss 12.2959   LearningRate 0.0777   Epoch: 2   Global Step: 98350   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:45:41,260-Speed 2636.17 samples/sec   Loss 12.2468   LearningRate 0.0777   Epoch: 2   Global Step: 98360   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:45:45,156-Speed 2629.25 samples/sec   Loss 12.3013   LearningRate 0.0777   Epoch: 2   Global Step: 98370   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:45:49,071-Speed 2616.04 samples/sec   Loss 12.3426   LearningRate 0.0777   Epoch: 2   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:45:52,990-Speed 2613.69 samples/sec   Loss 12.4195   LearningRate 0.0777   Epoch: 2   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:45:56,903-Speed 2617.69 samples/sec   Loss 12.3072   LearningRate 0.0777   Epoch: 2   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:46:00,802-Speed 2626.24 samples/sec   Loss 12.4059   LearningRate 0.0777   Epoch: 2   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:46:04,715-Speed 2617.45 samples/sec   Loss 12.2283   LearningRate 0.0777   Epoch: 2   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:46:08,629-Speed 2616.60 samples/sec   Loss 12.2024   LearningRate 0.0777   Epoch: 2   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:46:12,525-Speed 2629.73 samples/sec   Loss 12.2876   LearningRate 0.0777   Epoch: 2   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:46:16,425-Speed 2626.34 samples/sec   Loss 12.0745   LearningRate 0.0777   Epoch: 2   Global Step: 98450   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:46:20,335-Speed 2619.52 samples/sec   Loss 12.2155   LearningRate 0.0777   Epoch: 2   Global Step: 98460   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:24,236-Speed 2625.49 samples/sec   Loss 12.2746   LearningRate 0.0777   Epoch: 2   Global Step: 98470   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:28,133-Speed 2628.33 samples/sec   Loss 12.2056   LearningRate 0.0777   Epoch: 2   Global Step: 98480   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:32,035-Speed 2624.79 samples/sec   Loss 12.3377   LearningRate 0.0777   Epoch: 2   Global Step: 98490   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:35,929-Speed 2629.96 samples/sec   Loss 12.3427   LearningRate 0.0777   Epoch: 2   Global Step: 98500   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:39,827-Speed 2627.64 samples/sec   Loss 12.2990   LearningRate 0.0777   Epoch: 2   Global Step: 98510   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:43,724-Speed 2628.66 samples/sec   Loss 12.2589   LearningRate 0.0777   Epoch: 2   Global Step: 98520   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:47,619-Speed 2629.91 samples/sec   Loss 12.3811   LearningRate 0.0777   Epoch: 2   Global Step: 98530   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:51,512-Speed 2630.99 samples/sec   Loss 12.1863   LearningRate 0.0777   Epoch: 2   Global Step: 98540   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:55,424-Speed 2617.84 samples/sec   Loss 12.2202   LearningRate 0.0777   Epoch: 2   Global Step: 98550   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:46:59,335-Speed 2619.40 samples/sec   Loss 12.2662   LearningRate 0.0777   Epoch: 2   Global Step: 98560   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:47:03,209-Speed 2643.91 samples/sec   Loss 12.3252   LearningRate 0.0776   Epoch: 2   Global Step: 98570   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:47:07,106-Speed 2628.42 samples/sec   Loss 12.3137   LearningRate 0.0776   Epoch: 2   Global Step: 98580   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:47:11,002-Speed 2629.34 samples/sec   Loss 12.2972   LearningRate 0.0776   Epoch: 2   Global Step: 98590   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:47:14,910-Speed 2620.81 samples/sec   Loss 12.3709   LearningRate 0.0776   Epoch: 2   Global Step: 98600   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:47:18,813-Speed 2625.08 samples/sec   Loss 12.3277   LearningRate 0.0776   Epoch: 2   Global Step: 98610   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:47:22,709-Speed 2628.50 samples/sec   Loss 12.2514   LearningRate 0.0776   Epoch: 2   Global Step: 98620   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:47:26,586-Speed 2642.28 samples/sec   Loss 12.2494   LearningRate 0.0776   Epoch: 2   Global Step: 98630   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:30,521-Speed 2602.45 samples/sec   Loss 12.2572   LearningRate 0.0776   Epoch: 2   Global Step: 98640   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:34,418-Speed 2628.59 samples/sec   Loss 12.1931   LearningRate 0.0776   Epoch: 2   Global Step: 98650   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:38,312-Speed 2630.42 samples/sec   Loss 12.3268   LearningRate 0.0776   Epoch: 2   Global Step: 98660   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:42,210-Speed 2627.86 samples/sec   Loss 12.2915   LearningRate 0.0776   Epoch: 2   Global Step: 98670   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:46,114-Speed 2623.43 samples/sec   Loss 12.2557   LearningRate 0.0776   Epoch: 2   Global Step: 98680   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:50,010-Speed 2628.94 samples/sec   Loss 12.3179   LearningRate 0.0776   Epoch: 2   Global Step: 98690   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:53,908-Speed 2627.54 samples/sec   Loss 12.1940   LearningRate 0.0776   Epoch: 2   Global Step: 98700   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:47:57,855-Speed 2595.54 samples/sec   Loss 12.2895   LearningRate 0.0776   Epoch: 2   Global Step: 98710   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:48:01,749-Speed 2630.26 samples/sec   Loss 12.3345   LearningRate 0.0776   Epoch: 2   Global Step: 98720   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:48:05,645-Speed 2628.61 samples/sec   Loss 12.3096   LearningRate 0.0776   Epoch: 2   Global Step: 98730   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:09,544-Speed 2626.53 samples/sec   Loss 12.3228   LearningRate 0.0776   Epoch: 2   Global Step: 98740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:13,470-Speed 2609.30 samples/sec   Loss 12.3506   LearningRate 0.0776   Epoch: 2   Global Step: 98750   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:17,510-Speed 2535.46 samples/sec   Loss 12.2290   LearningRate 0.0776   Epoch: 2   Global Step: 98760   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:21,436-Speed 2608.69 samples/sec   Loss 12.2514   LearningRate 0.0776   Epoch: 2   Global Step: 98770   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:25,369-Speed 2605.00 samples/sec   Loss 12.2289   LearningRate 0.0776   Epoch: 2   Global Step: 98780   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:29,263-Speed 2630.22 samples/sec   Loss 12.3239   LearningRate 0.0776   Epoch: 2   Global Step: 98790   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:33,183-Speed 2613.02 samples/sec   Loss 12.3932   LearningRate 0.0776   Epoch: 2   Global Step: 98800   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:37,103-Speed 2612.62 samples/sec   Loss 12.2764   LearningRate 0.0776   Epoch: 2   Global Step: 98810   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:41,079-Speed 2576.30 samples/sec   Loss 12.3404   LearningRate 0.0776   Epoch: 2   Global Step: 98820   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:44,994-Speed 2616.40 samples/sec   Loss 12.3094   LearningRate 0.0776   Epoch: 2   Global Step: 98830   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:48:48,880-Speed 2635.78 samples/sec   Loss 12.1403   LearningRate 0.0776   Epoch: 2   Global Step: 98840   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:52,883-Speed 2558.41 samples/sec   Loss 12.2174   LearningRate 0.0776   Epoch: 2   Global Step: 98850   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:48:56,782-Speed 2626.69 samples/sec   Loss 12.1715   LearningRate 0.0776   Epoch: 2   Global Step: 98860   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:00,679-Speed 2628.78 samples/sec   Loss 12.1519   LearningRate 0.0776   Epoch: 2   Global Step: 98870   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:04,578-Speed 2627.20 samples/sec   Loss 12.3221   LearningRate 0.0776   Epoch: 2   Global Step: 98880   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:08,473-Speed 2629.23 samples/sec   Loss 12.2405   LearningRate 0.0776   Epoch: 2   Global Step: 98890   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:12,370-Speed 2628.58 samples/sec   Loss 12.3036   LearningRate 0.0776   Epoch: 2   Global Step: 98900   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:16,276-Speed 2622.42 samples/sec   Loss 12.2763   LearningRate 0.0776   Epoch: 2   Global Step: 98910   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:20,175-Speed 2627.21 samples/sec   Loss 12.2060   LearningRate 0.0776   Epoch: 2   Global Step: 98920   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:24,074-Speed 2626.69 samples/sec   Loss 12.2739   LearningRate 0.0776   Epoch: 2   Global Step: 98930   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:27,969-Speed 2629.58 samples/sec   Loss 12.3316   LearningRate 0.0776   Epoch: 2   Global Step: 98940   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:49:31,877-Speed 2620.64 samples/sec   Loss 12.3206   LearningRate 0.0776   Epoch: 2   Global Step: 98950   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:49:35,777-Speed 2626.51 samples/sec   Loss 12.2627   LearningRate 0.0776   Epoch: 2   Global Step: 98960   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:49:39,653-Speed 2642.64 samples/sec   Loss 12.3404   LearningRate 0.0776   Epoch: 2   Global Step: 98970   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:43,553-Speed 2626.28 samples/sec   Loss 12.2721   LearningRate 0.0776   Epoch: 2   Global Step: 98980   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:47,455-Speed 2625.21 samples/sec   Loss 12.1025   LearningRate 0.0776   Epoch: 2   Global Step: 98990   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:51,344-Speed 2633.18 samples/sec   Loss 12.4266   LearningRate 0.0776   Epoch: 2   Global Step: 99000   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:55,241-Speed 2628.32 samples/sec   Loss 12.1279   LearningRate 0.0776   Epoch: 2   Global Step: 99010   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:49:59,138-Speed 2628.28 samples/sec   Loss 12.2729   LearningRate 0.0776   Epoch: 2   Global Step: 99020   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:50:03,028-Speed 2633.67 samples/sec   Loss 12.2735   LearningRate 0.0776   Epoch: 2   Global Step: 99030   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:50:06,924-Speed 2629.02 samples/sec   Loss 12.3430   LearningRate 0.0775   Epoch: 2   Global Step: 99040   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:50:10,816-Speed 2631.30 samples/sec   Loss 12.3024   LearningRate 0.0775   Epoch: 2   Global Step: 99050   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:50:14,711-Speed 2629.69 samples/sec   Loss 12.2179   LearningRate 0.0775   Epoch: 2   Global Step: 99060   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:50:18,602-Speed 2632.31 samples/sec   Loss 12.3877   LearningRate 0.0775   Epoch: 2   Global Step: 99070   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:22,513-Speed 2618.67 samples/sec   Loss 12.3577   LearningRate 0.0775   Epoch: 2   Global Step: 99080   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:26,406-Speed 2630.94 samples/sec   Loss 12.2985   LearningRate 0.0775   Epoch: 2   Global Step: 99090   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:30,480-Speed 2513.96 samples/sec   Loss 12.3446   LearningRate 0.0775   Epoch: 2   Global Step: 99100   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:34,375-Speed 2630.38 samples/sec   Loss 12.2550   LearningRate 0.0775   Epoch: 2   Global Step: 99110   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:38,268-Speed 2630.93 samples/sec   Loss 12.2807   LearningRate 0.0775   Epoch: 2   Global Step: 99120   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:42,188-Speed 2612.63 samples/sec   Loss 12.3745   LearningRate 0.0775   Epoch: 2   Global Step: 99130   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:46,101-Speed 2617.96 samples/sec   Loss 12.4002   LearningRate 0.0775   Epoch: 2   Global Step: 99140   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:50,003-Speed 2625.32 samples/sec   Loss 12.4008   LearningRate 0.0775   Epoch: 2   Global Step: 99150   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:53,916-Speed 2617.34 samples/sec   Loss 12.2546   LearningRate 0.0775   Epoch: 2   Global Step: 99160   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:50:57,792-Speed 2643.07 samples/sec   Loss 12.3300   LearningRate 0.0775   Epoch: 2   Global Step: 99170   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:51:01,695-Speed 2624.04 samples/sec   Loss 12.0957   LearningRate 0.0775   Epoch: 2   Global Step: 99180   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:51:05,590-Speed 2629.99 samples/sec   Loss 12.3216   LearningRate 0.0775   Epoch: 2   Global Step: 99190   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:51:09,482-Speed 2631.93 samples/sec   Loss 12.2596   LearningRate 0.0775   Epoch: 2   Global Step: 99200   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:51:13,384-Speed 2624.44 samples/sec   Loss 12.4100   LearningRate 0.0775   Epoch: 2   Global Step: 99210   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:51:17,268-Speed 2637.32 samples/sec   Loss 12.3623   LearningRate 0.0775   Epoch: 2   Global Step: 99220   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:21,160-Speed 2631.32 samples/sec   Loss 12.2558   LearningRate 0.0775   Epoch: 2   Global Step: 99230   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:25,057-Speed 2628.90 samples/sec   Loss 12.2057   LearningRate 0.0775   Epoch: 2   Global Step: 99240   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:28,952-Speed 2630.29 samples/sec   Loss 12.1656   LearningRate 0.0775   Epoch: 2   Global Step: 99250   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:32,857-Speed 2622.81 samples/sec   Loss 12.2123   LearningRate 0.0775   Epoch: 2   Global Step: 99260   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:36,753-Speed 2628.98 samples/sec   Loss 12.3698   LearningRate 0.0775   Epoch: 2   Global Step: 99270   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:40,648-Speed 2629.44 samples/sec   Loss 12.2953   LearningRate 0.0775   Epoch: 2   Global Step: 99280   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:44,551-Speed 2624.00 samples/sec   Loss 12.2690   LearningRate 0.0775   Epoch: 2   Global Step: 99290   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:48,462-Speed 2618.97 samples/sec   Loss 12.2229   LearningRate 0.0775   Epoch: 2   Global Step: 99300   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:52,363-Speed 2625.74 samples/sec   Loss 12.2691   LearningRate 0.0775   Epoch: 2   Global Step: 99310   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:51:56,260-Speed 2628.13 samples/sec   Loss 12.3495   LearningRate 0.0775   Epoch: 2   Global Step: 99320   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:52:00,158-Speed 2627.76 samples/sec   Loss 12.2490   LearningRate 0.0775   Epoch: 2   Global Step: 99330   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:52:04,053-Speed 2629.51 samples/sec   Loss 12.2586   LearningRate 0.0775   Epoch: 2   Global Step: 99340   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:52:07,933-Speed 2640.15 samples/sec   Loss 12.4508   LearningRate 0.0775   Epoch: 2   Global Step: 99350   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:52:11,840-Speed 2620.92 samples/sec   Loss 12.1196   LearningRate 0.0775   Epoch: 2   Global Step: 99360   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:52:15,746-Speed 2622.39 samples/sec   Loss 12.2132   LearningRate 0.0775   Epoch: 2   Global Step: 99370   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:52:19,642-Speed 2628.93 samples/sec   Loss 12.2086   LearningRate 0.0775   Epoch: 2   Global Step: 99380   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:52:23,547-Speed 2622.98 samples/sec   Loss 12.3027   LearningRate 0.0775   Epoch: 2   Global Step: 99390   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:52:27,449-Speed 2624.80 samples/sec   Loss 12.3828   LearningRate 0.0775   Epoch: 2   Global Step: 99400   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:52:31,340-Speed 2632.46 samples/sec   Loss 12.0772   LearningRate 0.0775   Epoch: 2   Global Step: 99410   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:52:35,220-Speed 2640.05 samples/sec   Loss 12.2261   LearningRate 0.0775   Epoch: 2   Global Step: 99420   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:52:39,119-Speed 2626.68 samples/sec   Loss 12.2769   LearningRate 0.0775   Epoch: 2   Global Step: 99430   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:52:43,023-Speed 2623.63 samples/sec   Loss 12.2801   LearningRate 0.0775   Epoch: 2   Global Step: 99440   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:52:46,920-Speed 2628.51 samples/sec   Loss 12.1979   LearningRate 0.0775   Epoch: 2   Global Step: 99450   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:52:50,836-Speed 2615.52 samples/sec   Loss 12.1422   LearningRate 0.0775   Epoch: 2   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:52:54,734-Speed 2627.30 samples/sec   Loss 12.2356   LearningRate 0.0775   Epoch: 2   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:52:58,641-Speed 2621.86 samples/sec   Loss 12.2459   LearningRate 0.0775   Epoch: 2   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:53:02,535-Speed 2629.95 samples/sec   Loss 12.2970   LearningRate 0.0775   Epoch: 2   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:53:06,431-Speed 2628.43 samples/sec   Loss 12.3458   LearningRate 0.0775   Epoch: 2   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:53:10,344-Speed 2617.61 samples/sec   Loss 12.3646   LearningRate 0.0774   Epoch: 2   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 06:53:14,300-Speed 2589.69 samples/sec   Loss 12.3010   LearningRate 0.0774   Epoch: 2   Global Step: 99520   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:18,192-Speed 2631.28 samples/sec   Loss 12.2785   LearningRate 0.0774   Epoch: 2   Global Step: 99530   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:22,086-Speed 2632.66 samples/sec   Loss 12.2767   LearningRate 0.0774   Epoch: 2   Global Step: 99540   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:25,975-Speed 2633.38 samples/sec   Loss 12.0599   LearningRate 0.0774   Epoch: 2   Global Step: 99550   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:29,958-Speed 2571.81 samples/sec   Loss 12.3426   LearningRate 0.0774   Epoch: 2   Global Step: 99560   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:33,855-Speed 2627.95 samples/sec   Loss 12.3186   LearningRate 0.0774   Epoch: 2   Global Step: 99570   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:37,912-Speed 2524.35 samples/sec   Loss 12.1964   LearningRate 0.0774   Epoch: 2   Global Step: 99580   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:41,993-Speed 2509.79 samples/sec   Loss 12.3181   LearningRate 0.0774   Epoch: 2   Global Step: 99590   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:45,930-Speed 2601.69 samples/sec   Loss 12.1606   LearningRate 0.0774   Epoch: 2   Global Step: 99600   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:50,002-Speed 2515.32 samples/sec   Loss 12.0989   LearningRate 0.0774   Epoch: 2   Global Step: 99610   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:53:54,073-Speed 2516.09 samples/sec   Loss 12.2968   LearningRate 0.0774   Epoch: 2   Global Step: 99620   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:53:57,996-Speed 2610.85 samples/sec   Loss 12.1463   LearningRate 0.0774   Epoch: 2   Global Step: 99630   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:54:01,899-Speed 2623.85 samples/sec   Loss 12.2792   LearningRate 0.0774   Epoch: 2   Global Step: 99640   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:54:05,794-Speed 2629.79 samples/sec   Loss 12.1388   LearningRate 0.0774   Epoch: 2   Global Step: 99650   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:54:09,679-Speed 2636.06 samples/sec   Loss 12.3330   LearningRate 0.0774   Epoch: 2   Global Step: 99660   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:13,577-Speed 2627.79 samples/sec   Loss 12.1135   LearningRate 0.0774   Epoch: 2   Global Step: 99670   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:17,474-Speed 2628.32 samples/sec   Loss 12.2458   LearningRate 0.0774   Epoch: 2   Global Step: 99680   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:21,370-Speed 2628.46 samples/sec   Loss 12.2339   LearningRate 0.0774   Epoch: 2   Global Step: 99690   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:25,266-Speed 2629.76 samples/sec   Loss 12.1387   LearningRate 0.0774   Epoch: 2   Global Step: 99700   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:29,162-Speed 2629.20 samples/sec   Loss 12.3997   LearningRate 0.0774   Epoch: 2   Global Step: 99710   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:33,079-Speed 2614.15 samples/sec   Loss 12.3275   LearningRate 0.0774   Epoch: 2   Global Step: 99720   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:36,986-Speed 2621.78 samples/sec   Loss 12.0762   LearningRate 0.0774   Epoch: 2   Global Step: 99730   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:40,886-Speed 2626.40 samples/sec   Loss 12.2722   LearningRate 0.0774   Epoch: 2   Global Step: 99740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:44,802-Speed 2615.76 samples/sec   Loss 12.1985   LearningRate 0.0774   Epoch: 2   Global Step: 99750   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:54:48,705-Speed 2623.60 samples/sec   Loss 12.2206   LearningRate 0.0774   Epoch: 2   Global Step: 99760   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:54:52,617-Speed 2619.27 samples/sec   Loss 12.2315   LearningRate 0.0774   Epoch: 2   Global Step: 99770   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:54:56,515-Speed 2627.25 samples/sec   Loss 12.2358   LearningRate 0.0774   Epoch: 2   Global Step: 99780   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:55:00,417-Speed 2624.56 samples/sec   Loss 12.2619   LearningRate 0.0774   Epoch: 2   Global Step: 99790   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:55:04,315-Speed 2627.79 samples/sec   Loss 12.1700   LearningRate 0.0774   Epoch: 2   Global Step: 99800   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:55:08,191-Speed 2642.92 samples/sec   Loss 12.4104   LearningRate 0.0774   Epoch: 2   Global Step: 99810   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:12,099-Speed 2621.19 samples/sec   Loss 12.2275   LearningRate 0.0774   Epoch: 2   Global Step: 99820   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:16,000-Speed 2625.69 samples/sec   Loss 12.2997   LearningRate 0.0774   Epoch: 2   Global Step: 99830   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:19,899-Speed 2626.58 samples/sec   Loss 12.2014   LearningRate 0.0774   Epoch: 2   Global Step: 99840   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:23,798-Speed 2627.50 samples/sec   Loss 12.2107   LearningRate 0.0774   Epoch: 2   Global Step: 99850   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:27,722-Speed 2609.91 samples/sec   Loss 12.2719   LearningRate 0.0774   Epoch: 2   Global Step: 99860   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:31,619-Speed 2628.46 samples/sec   Loss 12.2622   LearningRate 0.0774   Epoch: 2   Global Step: 99870   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:35,515-Speed 2629.07 samples/sec   Loss 12.2528   LearningRate 0.0774   Epoch: 2   Global Step: 99880   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:39,449-Speed 2603.40 samples/sec   Loss 12.2114   LearningRate 0.0774   Epoch: 2   Global Step: 99890   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:43,528-Speed 2510.77 samples/sec   Loss 12.2258   LearningRate 0.0774   Epoch: 2   Global Step: 99900   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:47,542-Speed 2552.00 samples/sec   Loss 12.2826   LearningRate 0.0774   Epoch: 2   Global Step: 99910   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:55:51,434-Speed 2631.93 samples/sec   Loss 12.3184   LearningRate 0.0774   Epoch: 2   Global Step: 99920   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:55:55,309-Speed 2643.37 samples/sec   Loss 12.4011   LearningRate 0.0774   Epoch: 2   Global Step: 99930   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:55:59,200-Speed 2632.57 samples/sec   Loss 12.3258   LearningRate 0.0774   Epoch: 2   Global Step: 99940   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:56:03,100-Speed 2626.30 samples/sec   Loss 12.2245   LearningRate 0.0774   Epoch: 2   Global Step: 99950   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:56:07,010-Speed 2619.24 samples/sec   Loss 12.3686   LearningRate 0.0774   Epoch: 2   Global Step: 99960   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:56:11,008-Speed 2562.01 samples/sec   Loss 12.1977   LearningRate 0.0774   Epoch: 2   Global Step: 99970   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:56:14,956-Speed 2594.89 samples/sec   Loss 12.1902   LearningRate 0.0773   Epoch: 2   Global Step: 99980   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:56:18,862-Speed 2621.78 samples/sec   Loss 12.1553   LearningRate 0.0773   Epoch: 2   Global Step: 99990   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:56:22,766-Speed 2623.89 samples/sec   Loss 12.2024   LearningRate 0.0773   Epoch: 2   Global Step: 100000   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:57:05,822-[lfw][100000]XNorm: 24.093140
Training: 2022-04-13 06:57:05,823-[lfw][100000]Accuracy-Flip: 0.99667+-0.00298
Training: 2022-04-13 06:57:05,823-[lfw][100000]Accuracy-Highest: 0.99783
Training: 2022-04-13 06:57:56,593-[cfp_fp][100000]XNorm: 22.382464
Training: 2022-04-13 06:57:56,594-[cfp_fp][100000]Accuracy-Flip: 0.97614+-0.00658
Training: 2022-04-13 06:57:56,594-[cfp_fp][100000]Accuracy-Highest: 0.97986
Training: 2022-04-13 06:58:39,974-[agedb_30][100000]XNorm: 23.680389
Training: 2022-04-13 06:58:39,975-[agedb_30][100000]Accuracy-Flip: 0.96750+-0.00834
Training: 2022-04-13 06:58:39,975-[agedb_30][100000]Accuracy-Highest: 0.96750
Training: 2022-04-13 06:58:43,869-Speed 72.57 samples/sec   Loss 12.2919   LearningRate 0.0773   Epoch: 2   Global Step: 100010   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:58:47,742-Speed 2644.54 samples/sec   Loss 12.2197   LearningRate 0.0773   Epoch: 2   Global Step: 100020   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:58:51,615-Speed 2644.91 samples/sec   Loss 12.2164   LearningRate 0.0773   Epoch: 2   Global Step: 100030   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:58:55,496-Speed 2638.49 samples/sec   Loss 12.1436   LearningRate 0.0773   Epoch: 2   Global Step: 100040   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:58:59,390-Speed 2630.45 samples/sec   Loss 12.2636   LearningRate 0.0773   Epoch: 2   Global Step: 100050   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:59:03,257-Speed 2648.58 samples/sec   Loss 12.1178   LearningRate 0.0773   Epoch: 2   Global Step: 100060   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:07,138-Speed 2639.49 samples/sec   Loss 12.2830   LearningRate 0.0773   Epoch: 2   Global Step: 100070   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:11,038-Speed 2626.09 samples/sec   Loss 12.2513   LearningRate 0.0773   Epoch: 2   Global Step: 100080   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:14,984-Speed 2595.95 samples/sec   Loss 12.3199   LearningRate 0.0773   Epoch: 2   Global Step: 100090   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:18,870-Speed 2636.21 samples/sec   Loss 12.2728   LearningRate 0.0773   Epoch: 2   Global Step: 100100   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:22,755-Speed 2635.82 samples/sec   Loss 12.2451   LearningRate 0.0773   Epoch: 2   Global Step: 100110   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:26,656-Speed 2626.01 samples/sec   Loss 12.1791   LearningRate 0.0773   Epoch: 2   Global Step: 100120   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:30,551-Speed 2630.20 samples/sec   Loss 12.1800   LearningRate 0.0773   Epoch: 2   Global Step: 100130   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:34,439-Speed 2634.20 samples/sec   Loss 12.3156   LearningRate 0.0773   Epoch: 2   Global Step: 100140   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:38,328-Speed 2633.79 samples/sec   Loss 12.0245   LearningRate 0.0773   Epoch: 2   Global Step: 100150   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 06:59:42,245-Speed 2615.25 samples/sec   Loss 12.2319   LearningRate 0.0773   Epoch: 2   Global Step: 100160   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:59:46,142-Speed 2628.60 samples/sec   Loss 12.0774   LearningRate 0.0773   Epoch: 2   Global Step: 100170   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:59:50,044-Speed 2624.28 samples/sec   Loss 12.4209   LearningRate 0.0773   Epoch: 2   Global Step: 100180   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:59:53,938-Speed 2630.47 samples/sec   Loss 12.2400   LearningRate 0.0773   Epoch: 2   Global Step: 100190   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 06:59:57,836-Speed 2627.57 samples/sec   Loss 12.2784   LearningRate 0.0773   Epoch: 2   Global Step: 100200   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:01,736-Speed 2626.55 samples/sec   Loss 12.2090   LearningRate 0.0773   Epoch: 2   Global Step: 100210   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:05,635-Speed 2627.48 samples/sec   Loss 12.2386   LearningRate 0.0773   Epoch: 2   Global Step: 100220   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:09,531-Speed 2628.83 samples/sec   Loss 12.1740   LearningRate 0.0773   Epoch: 2   Global Step: 100230   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:13,431-Speed 2626.09 samples/sec   Loss 12.2549   LearningRate 0.0773   Epoch: 2   Global Step: 100240   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:17,333-Speed 2625.15 samples/sec   Loss 12.2152   LearningRate 0.0773   Epoch: 2   Global Step: 100250   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:21,230-Speed 2628.23 samples/sec   Loss 12.2302   LearningRate 0.0773   Epoch: 2   Global Step: 100260   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:25,121-Speed 2632.15 samples/sec   Loss 12.1100   LearningRate 0.0773   Epoch: 2   Global Step: 100270   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:29,025-Speed 2624.12 samples/sec   Loss 12.3089   LearningRate 0.0773   Epoch: 2   Global Step: 100280   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:32,922-Speed 2628.11 samples/sec   Loss 12.1838   LearningRate 0.0773   Epoch: 2   Global Step: 100290   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:00:36,801-Speed 2640.56 samples/sec   Loss 12.2294   LearningRate 0.0773   Epoch: 2   Global Step: 100300   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:00:40,701-Speed 2626.87 samples/sec   Loss 12.2548   LearningRate 0.0773   Epoch: 2   Global Step: 100310   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:00:44,607-Speed 2621.75 samples/sec   Loss 12.1533   LearningRate 0.0773   Epoch: 2   Global Step: 100320   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:00:48,508-Speed 2625.87 samples/sec   Loss 12.2825   LearningRate 0.0773   Epoch: 2   Global Step: 100330   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:00:52,406-Speed 2627.53 samples/sec   Loss 12.1042   LearningRate 0.0773   Epoch: 2   Global Step: 100340   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:00:56,344-Speed 2601.65 samples/sec   Loss 12.2951   LearningRate 0.0773   Epoch: 2   Global Step: 100350   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:00,256-Speed 2618.36 samples/sec   Loss 12.3940   LearningRate 0.0773   Epoch: 2   Global Step: 100360   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:04,150-Speed 2630.28 samples/sec   Loss 12.2223   LearningRate 0.0773   Epoch: 2   Global Step: 100370   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:08,043-Speed 2630.85 samples/sec   Loss 12.1884   LearningRate 0.0773   Epoch: 2   Global Step: 100380   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:11,947-Speed 2623.70 samples/sec   Loss 12.2507   LearningRate 0.0773   Epoch: 2   Global Step: 100390   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:15,859-Speed 2618.32 samples/sec   Loss 12.1768   LearningRate 0.0773   Epoch: 2   Global Step: 100400   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:01:19,735-Speed 2642.10 samples/sec   Loss 12.1700   LearningRate 0.0773   Epoch: 2   Global Step: 100410   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:23,637-Speed 2625.67 samples/sec   Loss 12.1008   LearningRate 0.0773   Epoch: 2   Global Step: 100420   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:27,534-Speed 2628.08 samples/sec   Loss 12.2809   LearningRate 0.0773   Epoch: 2   Global Step: 100430   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:31,437-Speed 2624.70 samples/sec   Loss 12.1084   LearningRate 0.0773   Epoch: 2   Global Step: 100440   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:01:35,375-Speed 2600.83 samples/sec   Loss 12.2661   LearningRate 0.0772   Epoch: 2   Global Step: 100450   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:01:39,264-Speed 2633.57 samples/sec   Loss 12.1561   LearningRate 0.0772   Epoch: 2   Global Step: 100460   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:01:43,163-Speed 2627.28 samples/sec   Loss 12.2241   LearningRate 0.0772   Epoch: 2   Global Step: 100470   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:01:47,058-Speed 2629.03 samples/sec   Loss 12.2098   LearningRate 0.0772   Epoch: 2   Global Step: 100480   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:01:50,980-Speed 2612.34 samples/sec   Loss 12.2855   LearningRate 0.0772   Epoch: 2   Global Step: 100490   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:01:54,993-Speed 2552.29 samples/sec   Loss 12.4344   LearningRate 0.0772   Epoch: 2   Global Step: 100500   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:01:58,909-Speed 2615.72 samples/sec   Loss 12.0474   LearningRate 0.0772   Epoch: 2   Global Step: 100510   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:02,808-Speed 2626.97 samples/sec   Loss 12.2450   LearningRate 0.0772   Epoch: 2   Global Step: 100520   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:06,702-Speed 2630.27 samples/sec   Loss 12.0580   LearningRate 0.0772   Epoch: 2   Global Step: 100530   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:10,594-Speed 2631.35 samples/sec   Loss 12.3551   LearningRate 0.0772   Epoch: 2   Global Step: 100540   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:14,521-Speed 2608.45 samples/sec   Loss 12.2549   LearningRate 0.0772   Epoch: 2   Global Step: 100550   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:02:18,398-Speed 2641.66 samples/sec   Loss 12.2078   LearningRate 0.0772   Epoch: 2   Global Step: 100560   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:22,289-Speed 2632.51 samples/sec   Loss 12.2944   LearningRate 0.0772   Epoch: 2   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:26,182-Speed 2631.04 samples/sec   Loss 12.3503   LearningRate 0.0772   Epoch: 2   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:30,076-Speed 2630.82 samples/sec   Loss 12.2689   LearningRate 0.0772   Epoch: 2   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:33,973-Speed 2628.37 samples/sec   Loss 12.3087   LearningRate 0.0772   Epoch: 2   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:37,875-Speed 2624.82 samples/sec   Loss 12.2033   LearningRate 0.0772   Epoch: 2   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:41,775-Speed 2625.79 samples/sec   Loss 12.2495   LearningRate 0.0772   Epoch: 2   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:45,671-Speed 2629.38 samples/sec   Loss 12.3401   LearningRate 0.0772   Epoch: 2   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:49,569-Speed 2626.86 samples/sec   Loss 12.1728   LearningRate 0.0772   Epoch: 2   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:53,474-Speed 2623.28 samples/sec   Loss 12.1578   LearningRate 0.0772   Epoch: 2   Global Step: 100650   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:02:57,370-Speed 2629.12 samples/sec   Loss 12.1944   LearningRate 0.0772   Epoch: 2   Global Step: 100660   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:01,295-Speed 2610.16 samples/sec   Loss 12.2984   LearningRate 0.0772   Epoch: 2   Global Step: 100670   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:05,200-Speed 2622.52 samples/sec   Loss 12.2219   LearningRate 0.0772   Epoch: 2   Global Step: 100680   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:09,103-Speed 2624.59 samples/sec   Loss 12.1074   LearningRate 0.0772   Epoch: 2   Global Step: 100690   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:12,994-Speed 2631.69 samples/sec   Loss 12.0849   LearningRate 0.0772   Epoch: 2   Global Step: 100700   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:16,896-Speed 2625.10 samples/sec   Loss 12.1407   LearningRate 0.0772   Epoch: 2   Global Step: 100710   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:20,807-Speed 2619.75 samples/sec   Loss 12.1448   LearningRate 0.0772   Epoch: 2   Global Step: 100720   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:24,702-Speed 2629.32 samples/sec   Loss 12.3900   LearningRate 0.0772   Epoch: 2   Global Step: 100730   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:28,639-Speed 2602.22 samples/sec   Loss 12.2513   LearningRate 0.0772   Epoch: 2   Global Step: 100740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:32,535-Speed 2628.68 samples/sec   Loss 12.3497   LearningRate 0.0772   Epoch: 2   Global Step: 100750   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:03:36,425-Speed 2633.33 samples/sec   Loss 12.3160   LearningRate 0.0772   Epoch: 2   Global Step: 100760   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:03:40,324-Speed 2626.96 samples/sec   Loss 12.1851   LearningRate 0.0772   Epoch: 2   Global Step: 100770   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:03:44,222-Speed 2627.90 samples/sec   Loss 12.2673   LearningRate 0.0772   Epoch: 2   Global Step: 100780   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:03:48,114-Speed 2631.51 samples/sec   Loss 12.2167   LearningRate 0.0772   Epoch: 2   Global Step: 100790   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:03:52,035-Speed 2612.47 samples/sec   Loss 12.0929   LearningRate 0.0772   Epoch: 2   Global Step: 100800   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:03:55,937-Speed 2624.97 samples/sec   Loss 12.2086   LearningRate 0.0772   Epoch: 2   Global Step: 100810   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:03:59,833-Speed 2628.79 samples/sec   Loss 12.3403   LearningRate 0.0772   Epoch: 2   Global Step: 100820   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:04:03,733-Speed 2626.71 samples/sec   Loss 12.1891   LearningRate 0.0772   Epoch: 2   Global Step: 100830   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:04:07,614-Speed 2639.03 samples/sec   Loss 12.3343   LearningRate 0.0772   Epoch: 2   Global Step: 100840   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:11,513-Speed 2626.88 samples/sec   Loss 12.3359   LearningRate 0.0772   Epoch: 2   Global Step: 100850   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:15,417-Speed 2623.66 samples/sec   Loss 12.2650   LearningRate 0.0772   Epoch: 2   Global Step: 100860   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:19,318-Speed 2625.50 samples/sec   Loss 12.0746   LearningRate 0.0772   Epoch: 2   Global Step: 100870   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:23,232-Speed 2616.71 samples/sec   Loss 12.1503   LearningRate 0.0772   Epoch: 2   Global Step: 100880   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:27,129-Speed 2628.41 samples/sec   Loss 12.1591   LearningRate 0.0772   Epoch: 2   Global Step: 100890   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:31,026-Speed 2628.34 samples/sec   Loss 12.2153   LearningRate 0.0772   Epoch: 2   Global Step: 100900   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:34,970-Speed 2597.27 samples/sec   Loss 12.1915   LearningRate 0.0772   Epoch: 2   Global Step: 100910   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:38,874-Speed 2623.41 samples/sec   Loss 12.2254   LearningRate 0.0771   Epoch: 2   Global Step: 100920   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:42,769-Speed 2629.79 samples/sec   Loss 12.1230   LearningRate 0.0771   Epoch: 2   Global Step: 100930   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:04:46,681-Speed 2617.89 samples/sec   Loss 12.4187   LearningRate 0.0771   Epoch: 2   Global Step: 100940   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:04:50,589-Speed 2621.22 samples/sec   Loss 12.2386   LearningRate 0.0771   Epoch: 2   Global Step: 100950   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:04:54,485-Speed 2629.25 samples/sec   Loss 12.1654   LearningRate 0.0771   Epoch: 2   Global Step: 100960   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:04:58,378-Speed 2631.61 samples/sec   Loss 12.1288   LearningRate 0.0771   Epoch: 2   Global Step: 100970   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:05:02,272-Speed 2629.83 samples/sec   Loss 12.2003   LearningRate 0.0771   Epoch: 2   Global Step: 100980   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:05:06,166-Speed 2630.69 samples/sec   Loss 12.1408   LearningRate 0.0771   Epoch: 2   Global Step: 100990   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:05:10,058-Speed 2632.00 samples/sec   Loss 12.0539   LearningRate 0.0771   Epoch: 2   Global Step: 101000   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:05:13,973-Speed 2615.54 samples/sec   Loss 12.1113   LearningRate 0.0771   Epoch: 2   Global Step: 101010   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:05:17,874-Speed 2626.22 samples/sec   Loss 12.2934   LearningRate 0.0771   Epoch: 2   Global Step: 101020   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:05:21,778-Speed 2623.70 samples/sec   Loss 12.1977   LearningRate 0.0771   Epoch: 2   Global Step: 101030   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:05:25,669-Speed 2632.22 samples/sec   Loss 12.3255   LearningRate 0.0771   Epoch: 2   Global Step: 101040   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:29,577-Speed 2621.07 samples/sec   Loss 12.2311   LearningRate 0.0771   Epoch: 2   Global Step: 101050   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:33,473-Speed 2628.65 samples/sec   Loss 12.1915   LearningRate 0.0771   Epoch: 2   Global Step: 101060   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:37,372-Speed 2627.64 samples/sec   Loss 12.3571   LearningRate 0.0771   Epoch: 2   Global Step: 101070   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:41,269-Speed 2628.31 samples/sec   Loss 12.2085   LearningRate 0.0771   Epoch: 2   Global Step: 101080   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:45,160-Speed 2631.86 samples/sec   Loss 12.2268   LearningRate 0.0771   Epoch: 2   Global Step: 101090   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:49,053-Speed 2631.16 samples/sec   Loss 12.3231   LearningRate 0.0771   Epoch: 2   Global Step: 101100   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:52,947-Speed 2630.05 samples/sec   Loss 12.4115   LearningRate 0.0771   Epoch: 2   Global Step: 101110   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:05:56,845-Speed 2627.91 samples/sec   Loss 12.3528   LearningRate 0.0771   Epoch: 2   Global Step: 101120   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:06:00,748-Speed 2623.95 samples/sec   Loss 12.1512   LearningRate 0.0771   Epoch: 2   Global Step: 101130   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:06:04,653-Speed 2622.80 samples/sec   Loss 12.2121   LearningRate 0.0771   Epoch: 2   Global Step: 101140   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:08,561-Speed 2620.92 samples/sec   Loss 12.2276   LearningRate 0.0771   Epoch: 2   Global Step: 101150   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:12,503-Speed 2598.19 samples/sec   Loss 12.2998   LearningRate 0.0771   Epoch: 2   Global Step: 101160   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:16,417-Speed 2616.85 samples/sec   Loss 12.1156   LearningRate 0.0771   Epoch: 2   Global Step: 101170   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:20,313-Speed 2629.30 samples/sec   Loss 12.2375   LearningRate 0.0771   Epoch: 2   Global Step: 101180   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:24,208-Speed 2629.60 samples/sec   Loss 12.0986   LearningRate 0.0771   Epoch: 2   Global Step: 101190   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:28,106-Speed 2627.12 samples/sec   Loss 12.2704   LearningRate 0.0771   Epoch: 2   Global Step: 101200   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:32,006-Speed 2626.72 samples/sec   Loss 12.1323   LearningRate 0.0771   Epoch: 2   Global Step: 101210   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:35,901-Speed 2629.57 samples/sec   Loss 12.2261   LearningRate 0.0771   Epoch: 2   Global Step: 101220   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:39,808-Speed 2621.38 samples/sec   Loss 12.2239   LearningRate 0.0771   Epoch: 2   Global Step: 101230   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:43,695-Speed 2635.07 samples/sec   Loss 12.2985   LearningRate 0.0771   Epoch: 2   Global Step: 101240   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:06:47,574-Speed 2640.48 samples/sec   Loss 12.3178   LearningRate 0.0771   Epoch: 2   Global Step: 101250   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:06:51,468-Speed 2630.71 samples/sec   Loss 12.2327   LearningRate 0.0771   Epoch: 2   Global Step: 101260   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:06:55,362-Speed 2630.28 samples/sec   Loss 12.1734   LearningRate 0.0771   Epoch: 2   Global Step: 101270   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:06:59,273-Speed 2618.55 samples/sec   Loss 12.2979   LearningRate 0.0771   Epoch: 2   Global Step: 101280   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:03,171-Speed 2627.38 samples/sec   Loss 12.3307   LearningRate 0.0771   Epoch: 2   Global Step: 101290   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:07,063-Speed 2631.92 samples/sec   Loss 12.2384   LearningRate 0.0771   Epoch: 2   Global Step: 101300   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:10,961-Speed 2627.50 samples/sec   Loss 12.1941   LearningRate 0.0771   Epoch: 2   Global Step: 101310   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:14,864-Speed 2624.12 samples/sec   Loss 12.1392   LearningRate 0.0771   Epoch: 2   Global Step: 101320   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:18,758-Speed 2629.83 samples/sec   Loss 12.2062   LearningRate 0.0771   Epoch: 2   Global Step: 101330   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:22,654-Speed 2629.65 samples/sec   Loss 12.1602   LearningRate 0.0771   Epoch: 2   Global Step: 101340   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:26,548-Speed 2630.26 samples/sec   Loss 12.1277   LearningRate 0.0771   Epoch: 2   Global Step: 101350   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:07:30,423-Speed 2643.27 samples/sec   Loss 12.3019   LearningRate 0.0771   Epoch: 2   Global Step: 101360   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:34,321-Speed 2627.38 samples/sec   Loss 12.2482   LearningRate 0.0771   Epoch: 2   Global Step: 101370   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:38,218-Speed 2628.39 samples/sec   Loss 12.2814   LearningRate 0.0771   Epoch: 2   Global Step: 101380   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:42,111-Speed 2630.83 samples/sec   Loss 12.2769   LearningRate 0.0771   Epoch: 2   Global Step: 101390   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:46,005-Speed 2630.41 samples/sec   Loss 12.2905   LearningRate 0.0770   Epoch: 2   Global Step: 101400   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:49,897-Speed 2631.15 samples/sec   Loss 12.1402   LearningRate 0.0770   Epoch: 2   Global Step: 101410   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:53,790-Speed 2631.85 samples/sec   Loss 12.2120   LearningRate 0.0770   Epoch: 2   Global Step: 101420   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:07:57,684-Speed 2630.36 samples/sec   Loss 12.2517   LearningRate 0.0770   Epoch: 2   Global Step: 101430   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:01,576-Speed 2631.48 samples/sec   Loss 12.0966   LearningRate 0.0770   Epoch: 2   Global Step: 101440   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:05,471-Speed 2630.17 samples/sec   Loss 12.2362   LearningRate 0.0770   Epoch: 2   Global Step: 101450   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:09,384-Speed 2616.88 samples/sec   Loss 12.1906   LearningRate 0.0770   Epoch: 2   Global Step: 101460   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:08:13,283-Speed 2627.31 samples/sec   Loss 12.1464   LearningRate 0.0770   Epoch: 2   Global Step: 101470   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:08:17,159-Speed 2642.46 samples/sec   Loss 12.3691   LearningRate 0.0770   Epoch: 2   Global Step: 101480   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:21,051-Speed 2632.15 samples/sec   Loss 12.1966   LearningRate 0.0770   Epoch: 2   Global Step: 101490   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:24,947-Speed 2628.66 samples/sec   Loss 12.1006   LearningRate 0.0770   Epoch: 2   Global Step: 101500   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:28,841-Speed 2630.39 samples/sec   Loss 12.2043   LearningRate 0.0770   Epoch: 2   Global Step: 101510   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:32,743-Speed 2625.22 samples/sec   Loss 12.2626   LearningRate 0.0770   Epoch: 2   Global Step: 101520   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:36,666-Speed 2610.42 samples/sec   Loss 12.3191   LearningRate 0.0770   Epoch: 2   Global Step: 101530   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:40,584-Speed 2614.74 samples/sec   Loss 12.3106   LearningRate 0.0770   Epoch: 2   Global Step: 101540   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:44,482-Speed 2627.51 samples/sec   Loss 12.2297   LearningRate 0.0770   Epoch: 2   Global Step: 101550   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:48,374-Speed 2631.71 samples/sec   Loss 12.1820   LearningRate 0.0770   Epoch: 2   Global Step: 101560   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:52,268-Speed 2630.19 samples/sec   Loss 12.3121   LearningRate 0.0770   Epoch: 2   Global Step: 101570   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:08:56,163-Speed 2629.56 samples/sec   Loss 12.1909   LearningRate 0.0770   Epoch: 2   Global Step: 101580   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:09:00,062-Speed 2626.78 samples/sec   Loss 12.3520   LearningRate 0.0770   Epoch: 2   Global Step: 101590   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:09:03,978-Speed 2615.09 samples/sec   Loss 12.3107   LearningRate 0.0770   Epoch: 2   Global Step: 101600   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:09:07,875-Speed 2628.48 samples/sec   Loss 12.1437   LearningRate 0.0770   Epoch: 2   Global Step: 101610   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:09:11,751-Speed 2642.29 samples/sec   Loss 12.2231   LearningRate 0.0770   Epoch: 2   Global Step: 101620   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:15,670-Speed 2614.07 samples/sec   Loss 12.1652   LearningRate 0.0770   Epoch: 2   Global Step: 101630   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:19,625-Speed 2590.23 samples/sec   Loss 12.1588   LearningRate 0.0770   Epoch: 2   Global Step: 101640   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:23,574-Speed 2593.66 samples/sec   Loss 12.2917   LearningRate 0.0770   Epoch: 2   Global Step: 101650   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:27,470-Speed 2628.92 samples/sec   Loss 12.1052   LearningRate 0.0770   Epoch: 2   Global Step: 101660   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:31,359-Speed 2633.48 samples/sec   Loss 12.3780   LearningRate 0.0770   Epoch: 2   Global Step: 101670   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:35,277-Speed 2614.44 samples/sec   Loss 12.2479   LearningRate 0.0770   Epoch: 2   Global Step: 101680   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:39,168-Speed 2632.05 samples/sec   Loss 12.1207   LearningRate 0.0770   Epoch: 2   Global Step: 101690   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:43,061-Speed 2631.12 samples/sec   Loss 12.2198   LearningRate 0.0770   Epoch: 2   Global Step: 101700   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:46,957-Speed 2629.26 samples/sec   Loss 12.2900   LearningRate 0.0770   Epoch: 2   Global Step: 101710   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:09:50,859-Speed 2625.03 samples/sec   Loss 12.1635   LearningRate 0.0770   Epoch: 2   Global Step: 101720   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:09:54,759-Speed 2625.94 samples/sec   Loss 12.0671   LearningRate 0.0770   Epoch: 2   Global Step: 101730   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:09:58,639-Speed 2640.25 samples/sec   Loss 12.1690   LearningRate 0.0770   Epoch: 2   Global Step: 101740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:10:02,511-Speed 2645.22 samples/sec   Loss 12.1375   LearningRate 0.0770   Epoch: 2   Global Step: 101750   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:10:06,445-Speed 2603.57 samples/sec   Loss 12.3203   LearningRate 0.0770   Epoch: 2   Global Step: 101760   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:10:10,337-Speed 2630.88 samples/sec   Loss 12.1361   LearningRate 0.0770   Epoch: 2   Global Step: 101770   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:10:14,229-Speed 2631.93 samples/sec   Loss 12.2872   LearningRate 0.0770   Epoch: 2   Global Step: 101780   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:10:18,121-Speed 2632.26 samples/sec   Loss 12.3180   LearningRate 0.0770   Epoch: 2   Global Step: 101790   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:10:22,012-Speed 2632.27 samples/sec   Loss 12.2593   LearningRate 0.0770   Epoch: 2   Global Step: 101800   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:25,900-Speed 2634.94 samples/sec   Loss 12.4949   LearningRate 0.0770   Epoch: 2   Global Step: 101810   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:29,905-Speed 2557.42 samples/sec   Loss 12.2437   LearningRate 0.0770   Epoch: 2   Global Step: 101820   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:33,903-Speed 2562.29 samples/sec   Loss 12.3102   LearningRate 0.0770   Epoch: 2   Global Step: 101830   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:37,798-Speed 2629.39 samples/sec   Loss 12.3919   LearningRate 0.0770   Epoch: 2   Global Step: 101840   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:41,699-Speed 2625.39 samples/sec   Loss 12.3819   LearningRate 0.0770   Epoch: 2   Global Step: 101850   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:45,596-Speed 2628.47 samples/sec   Loss 12.6346   LearningRate 0.0770   Epoch: 2   Global Step: 101860   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:49,497-Speed 2625.93 samples/sec   Loss 12.3651   LearningRate 0.0769   Epoch: 2   Global Step: 101870   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:53,399-Speed 2624.45 samples/sec   Loss 12.3805   LearningRate 0.0769   Epoch: 2   Global Step: 101880   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:10:57,301-Speed 2624.91 samples/sec   Loss 12.2019   LearningRate 0.0769   Epoch: 2   Global Step: 101890   Fp16 Grad Scale: 16384   Required: 82 hours
Training: 2022-04-13 07:11:01,210-Speed 2620.43 samples/sec   Loss 12.3155   LearningRate 0.0769   Epoch: 2   Global Step: 101900   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:05,113-Speed 2624.14 samples/sec   Loss 12.0658   LearningRate 0.0769   Epoch: 2   Global Step: 101910   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:09,003-Speed 2632.96 samples/sec   Loss 12.2669   LearningRate 0.0769   Epoch: 2   Global Step: 101920   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:12,895-Speed 2631.60 samples/sec   Loss 12.3025   LearningRate 0.0769   Epoch: 2   Global Step: 101930   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:16,789-Speed 2630.26 samples/sec   Loss 12.2500   LearningRate 0.0769   Epoch: 2   Global Step: 101940   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:20,680-Speed 2632.81 samples/sec   Loss 12.2388   LearningRate 0.0769   Epoch: 2   Global Step: 101950   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:24,573-Speed 2631.09 samples/sec   Loss 12.3048   LearningRate 0.0769   Epoch: 2   Global Step: 101960   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:28,471-Speed 2627.80 samples/sec   Loss 12.1552   LearningRate 0.0769   Epoch: 2   Global Step: 101970   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:32,365-Speed 2630.02 samples/sec   Loss 12.1765   LearningRate 0.0769   Epoch: 2   Global Step: 101980   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:36,259-Speed 2630.47 samples/sec   Loss 12.2038   LearningRate 0.0769   Epoch: 2   Global Step: 101990   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:11:40,162-Speed 2623.94 samples/sec   Loss 12.2109   LearningRate 0.0769   Epoch: 2   Global Step: 102000   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:11:44,066-Speed 2623.47 samples/sec   Loss 12.3778   LearningRate 0.0769   Epoch: 2   Global Step: 102010   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:11:47,960-Speed 2629.93 samples/sec   Loss 12.2938   LearningRate 0.0769   Epoch: 2   Global Step: 102020   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:11:51,858-Speed 2628.12 samples/sec   Loss 12.2434   LearningRate 0.0769   Epoch: 2   Global Step: 102030   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:11:55,757-Speed 2627.07 samples/sec   Loss 12.2913   LearningRate 0.0769   Epoch: 2   Global Step: 102040   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:11:59,649-Speed 2631.59 samples/sec   Loss 12.3403   LearningRate 0.0769   Epoch: 2   Global Step: 102050   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:12:03,542-Speed 2630.70 samples/sec   Loss 12.1955   LearningRate 0.0769   Epoch: 2   Global Step: 102060   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:12:07,438-Speed 2629.47 samples/sec   Loss 12.1081   LearningRate 0.0769   Epoch: 2   Global Step: 102070   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:12:11,330-Speed 2631.29 samples/sec   Loss 12.2871   LearningRate 0.0769   Epoch: 2   Global Step: 102080   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:12:15,223-Speed 2630.79 samples/sec   Loss 12.1583   LearningRate 0.0769   Epoch: 2   Global Step: 102090   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:12:19,119-Speed 2628.57 samples/sec   Loss 12.2729   LearningRate 0.0769   Epoch: 2   Global Step: 102100   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:23,015-Speed 2628.75 samples/sec   Loss 12.2197   LearningRate 0.0769   Epoch: 2   Global Step: 102110   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:26,911-Speed 2630.04 samples/sec   Loss 12.2424   LearningRate 0.0769   Epoch: 2   Global Step: 102120   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:30,804-Speed 2630.40 samples/sec   Loss 12.2355   LearningRate 0.0769   Epoch: 2   Global Step: 102130   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:34,709-Speed 2623.17 samples/sec   Loss 12.2040   LearningRate 0.0769   Epoch: 2   Global Step: 102140   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:38,611-Speed 2624.88 samples/sec   Loss 12.2032   LearningRate 0.0769   Epoch: 2   Global Step: 102150   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:42,508-Speed 2628.90 samples/sec   Loss 12.3380   LearningRate 0.0769   Epoch: 2   Global Step: 102160   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:46,401-Speed 2630.66 samples/sec   Loss 12.2054   LearningRate 0.0769   Epoch: 2   Global Step: 102170   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:50,294-Speed 2631.12 samples/sec   Loss 12.1262   LearningRate 0.0769   Epoch: 2   Global Step: 102180   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:54,186-Speed 2632.00 samples/sec   Loss 12.2094   LearningRate 0.0769   Epoch: 2   Global Step: 102190   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:12:58,089-Speed 2623.84 samples/sec   Loss 12.1615   LearningRate 0.0769   Epoch: 2   Global Step: 102200   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:13:01,986-Speed 2628.19 samples/sec   Loss 12.3645   LearningRate 0.0769   Epoch: 2   Global Step: 102210   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:13:05,881-Speed 2630.15 samples/sec   Loss 12.2009   LearningRate 0.0769   Epoch: 2   Global Step: 102220   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:13:09,777-Speed 2628.79 samples/sec   Loss 12.2743   LearningRate 0.0769   Epoch: 2   Global Step: 102230   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:13:13,669-Speed 2631.23 samples/sec   Loss 12.2560   LearningRate 0.0769   Epoch: 2   Global Step: 102240   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:17,565-Speed 2629.39 samples/sec   Loss 12.2688   LearningRate 0.0769   Epoch: 2   Global Step: 102250   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:21,475-Speed 2619.59 samples/sec   Loss 12.2014   LearningRate 0.0769   Epoch: 2   Global Step: 102260   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:25,370-Speed 2629.38 samples/sec   Loss 12.0947   LearningRate 0.0769   Epoch: 2   Global Step: 102270   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:29,265-Speed 2630.57 samples/sec   Loss 12.2565   LearningRate 0.0769   Epoch: 2   Global Step: 102280   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:33,162-Speed 2627.81 samples/sec   Loss 12.3066   LearningRate 0.0769   Epoch: 2   Global Step: 102290   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:37,072-Speed 2619.50 samples/sec   Loss 12.1499   LearningRate 0.0769   Epoch: 2   Global Step: 102300   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:40,967-Speed 2629.38 samples/sec   Loss 12.0755   LearningRate 0.0769   Epoch: 2   Global Step: 102310   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:44,862-Speed 2629.69 samples/sec   Loss 12.1873   LearningRate 0.0769   Epoch: 2   Global Step: 102320   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:48,756-Speed 2630.44 samples/sec   Loss 12.2116   LearningRate 0.0769   Epoch: 2   Global Step: 102330   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:13:52,653-Speed 2628.48 samples/sec   Loss 12.2640   LearningRate 0.0768   Epoch: 2   Global Step: 102340   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:13:56,532-Speed 2639.97 samples/sec   Loss 12.1860   LearningRate 0.0768   Epoch: 2   Global Step: 102350   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:14:00,514-Speed 2572.52 samples/sec   Loss 12.3142   LearningRate 0.0768   Epoch: 2   Global Step: 102360   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:14:04,591-Speed 2512.54 samples/sec   Loss 12.1283   LearningRate 0.0768   Epoch: 2   Global Step: 102370   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:14:08,589-Speed 2561.62 samples/sec   Loss 12.0942   LearningRate 0.0768   Epoch: 2   Global Step: 102380   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:14:12,484-Speed 2629.77 samples/sec   Loss 12.2319   LearningRate 0.0768   Epoch: 2   Global Step: 102390   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:14:16,379-Speed 2629.29 samples/sec   Loss 12.1836   LearningRate 0.0768   Epoch: 2   Global Step: 102400   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:14:20,284-Speed 2622.88 samples/sec   Loss 12.2646   LearningRate 0.0768   Epoch: 2   Global Step: 102410   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:14:24,166-Speed 2643.38 samples/sec   Loss 12.2556   LearningRate 0.0768   Epoch: 2   Global Step: 102420   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:28,069-Speed 2623.82 samples/sec   Loss 12.0787   LearningRate 0.0768   Epoch: 2   Global Step: 102430   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:31,973-Speed 2624.15 samples/sec   Loss 12.1703   LearningRate 0.0768   Epoch: 2   Global Step: 102440   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:35,867-Speed 2630.16 samples/sec   Loss 12.2292   LearningRate 0.0768   Epoch: 2   Global Step: 102450   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:39,764-Speed 2628.24 samples/sec   Loss 12.3302   LearningRate 0.0768   Epoch: 2   Global Step: 102460   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:43,655-Speed 2632.02 samples/sec   Loss 12.2017   LearningRate 0.0768   Epoch: 2   Global Step: 102470   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:47,555-Speed 2626.58 samples/sec   Loss 12.2753   LearningRate 0.0768   Epoch: 2   Global Step: 102480   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:51,454-Speed 2627.45 samples/sec   Loss 12.3237   LearningRate 0.0768   Epoch: 2   Global Step: 102490   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:55,346-Speed 2631.05 samples/sec   Loss 12.2631   LearningRate 0.0768   Epoch: 2   Global Step: 102500   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:14:59,253-Speed 2621.45 samples/sec   Loss 12.2867   LearningRate 0.0768   Epoch: 2   Global Step: 102510   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:15:03,148-Speed 2629.94 samples/sec   Loss 12.3139   LearningRate 0.0768   Epoch: 2   Global Step: 102520   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:07,043-Speed 2629.37 samples/sec   Loss 12.3185   LearningRate 0.0768   Epoch: 2   Global Step: 102530   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:10,945-Speed 2625.30 samples/sec   Loss 12.1234   LearningRate 0.0768   Epoch: 2   Global Step: 102540   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:14,842-Speed 2628.18 samples/sec   Loss 12.3464   LearningRate 0.0768   Epoch: 2   Global Step: 102550   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:18,741-Speed 2626.55 samples/sec   Loss 12.1836   LearningRate 0.0768   Epoch: 2   Global Step: 102560   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:22,645-Speed 2624.29 samples/sec   Loss 12.3583   LearningRate 0.0768   Epoch: 2   Global Step: 102570   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:26,556-Speed 2618.15 samples/sec   Loss 12.3493   LearningRate 0.0768   Epoch: 2   Global Step: 102580   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:30,457-Speed 2625.87 samples/sec   Loss 12.3093   LearningRate 0.0768   Epoch: 2   Global Step: 102590   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:15:34,334-Speed 2642.06 samples/sec   Loss 12.4760   LearningRate 0.0768   Epoch: 2   Global Step: 102600   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:15:38,225-Speed 2631.57 samples/sec   Loss 12.5732   LearningRate 0.0768   Epoch: 2   Global Step: 102610   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:15:42,122-Speed 2628.65 samples/sec   Loss 12.4080   LearningRate 0.0768   Epoch: 2   Global Step: 102620   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:15:46,018-Speed 2628.87 samples/sec   Loss 12.2528   LearningRate 0.0768   Epoch: 2   Global Step: 102630   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:15:49,913-Speed 2629.89 samples/sec   Loss 12.2990   LearningRate 0.0768   Epoch: 2   Global Step: 102640   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:15:53,807-Speed 2630.29 samples/sec   Loss 12.2819   LearningRate 0.0768   Epoch: 2   Global Step: 102650   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:15:57,713-Speed 2622.42 samples/sec   Loss 12.4035   LearningRate 0.0768   Epoch: 2   Global Step: 102660   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:16:01,605-Speed 2631.60 samples/sec   Loss 12.2369   LearningRate 0.0768   Epoch: 2   Global Step: 102670   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:16:05,508-Speed 2624.17 samples/sec   Loss 12.1342   LearningRate 0.0768   Epoch: 2   Global Step: 102680   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:16:09,415-Speed 2621.28 samples/sec   Loss 12.1245   LearningRate 0.0768   Epoch: 2   Global Step: 102690   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:16:13,326-Speed 2618.89 samples/sec   Loss 12.1826   LearningRate 0.0768   Epoch: 2   Global Step: 102700   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:17,232-Speed 2622.76 samples/sec   Loss 12.2153   LearningRate 0.0768   Epoch: 2   Global Step: 102710   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:21,133-Speed 2625.42 samples/sec   Loss 12.2048   LearningRate 0.0768   Epoch: 2   Global Step: 102720   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:25,045-Speed 2618.16 samples/sec   Loss 12.2632   LearningRate 0.0768   Epoch: 2   Global Step: 102730   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:28,937-Speed 2631.98 samples/sec   Loss 12.4143   LearningRate 0.0768   Epoch: 2   Global Step: 102740   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:32,841-Speed 2623.24 samples/sec   Loss 12.1846   LearningRate 0.0768   Epoch: 2   Global Step: 102750   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:36,736-Speed 2630.12 samples/sec   Loss 12.3480   LearningRate 0.0768   Epoch: 2   Global Step: 102760   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:40,629-Speed 2630.88 samples/sec   Loss 12.2049   LearningRate 0.0768   Epoch: 2   Global Step: 102770   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:44,521-Speed 2631.43 samples/sec   Loss 12.1873   LearningRate 0.0768   Epoch: 2   Global Step: 102780   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:48,417-Speed 2628.97 samples/sec   Loss 12.1127   LearningRate 0.0768   Epoch: 2   Global Step: 102790   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:16:52,310-Speed 2631.44 samples/sec   Loss 12.3044   LearningRate 0.0768   Epoch: 2   Global Step: 102800   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:16:56,221-Speed 2618.61 samples/sec   Loss 12.1643   LearningRate 0.0767   Epoch: 2   Global Step: 102810   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:17:00,141-Speed 2613.28 samples/sec   Loss 12.2840   LearningRate 0.0767   Epoch: 2   Global Step: 102820   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:17:04,024-Speed 2637.82 samples/sec   Loss 12.1622   LearningRate 0.0767   Epoch: 2   Global Step: 102830   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:07,924-Speed 2626.42 samples/sec   Loss 11.9802   LearningRate 0.0767   Epoch: 2   Global Step: 102840   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:11,823-Speed 2627.37 samples/sec   Loss 12.3053   LearningRate 0.0767   Epoch: 2   Global Step: 102850   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:15,723-Speed 2626.05 samples/sec   Loss 12.2704   LearningRate 0.0767   Epoch: 2   Global Step: 102860   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:19,622-Speed 2626.55 samples/sec   Loss 12.1794   LearningRate 0.0767   Epoch: 2   Global Step: 102870   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:23,516-Speed 2630.22 samples/sec   Loss 12.2029   LearningRate 0.0767   Epoch: 2   Global Step: 102880   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:27,447-Speed 2606.38 samples/sec   Loss 12.1963   LearningRate 0.0767   Epoch: 2   Global Step: 102890   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:31,337-Speed 2633.10 samples/sec   Loss 12.1750   LearningRate 0.0767   Epoch: 2   Global Step: 102900   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:17:35,213-Speed 2642.17 samples/sec   Loss 12.1784   LearningRate 0.0767   Epoch: 2   Global Step: 102910   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:17:39,123-Speed 2619.52 samples/sec   Loss 12.1686   LearningRate 0.0767   Epoch: 2   Global Step: 102920   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:17:43,020-Speed 2628.35 samples/sec   Loss 12.1427   LearningRate 0.0767   Epoch: 2   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:17:46,916-Speed 2628.93 samples/sec   Loss 12.2312   LearningRate 0.0767   Epoch: 2   Global Step: 102940   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:17:50,808-Speed 2631.98 samples/sec   Loss 12.2392   LearningRate 0.0767   Epoch: 2   Global Step: 102950   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:17:54,696-Speed 2634.80 samples/sec   Loss 12.1929   LearningRate 0.0767   Epoch: 2   Global Step: 102960   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:17:58,598-Speed 2624.71 samples/sec   Loss 12.0552   LearningRate 0.0767   Epoch: 2   Global Step: 102970   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:02,490-Speed 2631.42 samples/sec   Loss 12.1724   LearningRate 0.0767   Epoch: 2   Global Step: 102980   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:06,381-Speed 2633.23 samples/sec   Loss 12.3818   LearningRate 0.0767   Epoch: 2   Global Step: 102990   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:10,270-Speed 2633.41 samples/sec   Loss 12.2809   LearningRate 0.0767   Epoch: 2   Global Step: 103000   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:14,161-Speed 2632.24 samples/sec   Loss 12.2900   LearningRate 0.0767   Epoch: 2   Global Step: 103010   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:18:18,045-Speed 2637.08 samples/sec   Loss 12.0736   LearningRate 0.0767   Epoch: 2   Global Step: 103020   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:21,951-Speed 2623.12 samples/sec   Loss 12.2291   LearningRate 0.0767   Epoch: 2   Global Step: 103030   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:25,850-Speed 2626.97 samples/sec   Loss 12.1129   LearningRate 0.0767   Epoch: 2   Global Step: 103040   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:29,771-Speed 2611.86 samples/sec   Loss 12.1548   LearningRate 0.0767   Epoch: 2   Global Step: 103050   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:33,676-Speed 2622.85 samples/sec   Loss 12.2434   LearningRate 0.0767   Epoch: 2   Global Step: 103060   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:37,571-Speed 2630.17 samples/sec   Loss 12.1950   LearningRate 0.0767   Epoch: 2   Global Step: 103070   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:41,460-Speed 2633.32 samples/sec   Loss 12.2404   LearningRate 0.0767   Epoch: 2   Global Step: 103080   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:45,387-Speed 2608.39 samples/sec   Loss 12.1179   LearningRate 0.0767   Epoch: 2   Global Step: 103090   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:49,297-Speed 2619.82 samples/sec   Loss 12.1448   LearningRate 0.0767   Epoch: 2   Global Step: 103100   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:53,192-Speed 2629.50 samples/sec   Loss 12.1400   LearningRate 0.0767   Epoch: 2   Global Step: 103110   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:18:57,089-Speed 2628.82 samples/sec   Loss 12.2128   LearningRate 0.0767   Epoch: 2   Global Step: 103120   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:00,980-Speed 2631.69 samples/sec   Loss 12.3559   LearningRate 0.0767   Epoch: 2   Global Step: 103130   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:04,871-Speed 2632.53 samples/sec   Loss 12.2608   LearningRate 0.0767   Epoch: 2   Global Step: 103140   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:08,763-Speed 2631.34 samples/sec   Loss 12.1999   LearningRate 0.0767   Epoch: 2   Global Step: 103150   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:12,654-Speed 2632.93 samples/sec   Loss 12.2596   LearningRate 0.0767   Epoch: 2   Global Step: 103160   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:16,558-Speed 2623.66 samples/sec   Loss 12.3314   LearningRate 0.0767   Epoch: 2   Global Step: 103170   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:20,470-Speed 2617.76 samples/sec   Loss 12.2267   LearningRate 0.0767   Epoch: 2   Global Step: 103180   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:24,379-Speed 2620.94 samples/sec   Loss 12.2213   LearningRate 0.0767   Epoch: 2   Global Step: 103190   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:28,294-Speed 2615.75 samples/sec   Loss 12.2289   LearningRate 0.0767   Epoch: 2   Global Step: 103200   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:32,208-Speed 2617.05 samples/sec   Loss 12.0932   LearningRate 0.0767   Epoch: 2   Global Step: 103210   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:36,120-Speed 2618.81 samples/sec   Loss 12.2105   LearningRate 0.0767   Epoch: 2   Global Step: 103220   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:19:40,022-Speed 2625.02 samples/sec   Loss 12.1696   LearningRate 0.0767   Epoch: 2   Global Step: 103230   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:19:43,913-Speed 2631.66 samples/sec   Loss 12.2234   LearningRate 0.0767   Epoch: 2   Global Step: 103240   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:19:47,809-Speed 2629.42 samples/sec   Loss 12.2353   LearningRate 0.0767   Epoch: 2   Global Step: 103250   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:19:51,685-Speed 2643.25 samples/sec   Loss 12.2068   LearningRate 0.0767   Epoch: 2   Global Step: 103260   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:55,592-Speed 2621.63 samples/sec   Loss 12.1431   LearningRate 0.0767   Epoch: 2   Global Step: 103270   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:19:59,488-Speed 2628.91 samples/sec   Loss 12.2695   LearningRate 0.0767   Epoch: 2   Global Step: 103280   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:20:03,381-Speed 2631.40 samples/sec   Loss 12.2344   LearningRate 0.0766   Epoch: 2   Global Step: 103290   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:20:07,275-Speed 2630.29 samples/sec   Loss 12.1218   LearningRate 0.0766   Epoch: 2   Global Step: 103300   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:20:11,173-Speed 2627.16 samples/sec   Loss 12.2609   LearningRate 0.0766   Epoch: 2   Global Step: 103310   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:20:15,087-Speed 2617.11 samples/sec   Loss 12.2086   LearningRate 0.0766   Epoch: 2   Global Step: 103320   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:20:18,953-Speed 2649.08 samples/sec   Loss 12.3539   LearningRate 0.0766   Epoch: 2   Global Step: 103330   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:22,856-Speed 2624.97 samples/sec   Loss 12.5259   LearningRate 0.0766   Epoch: 2   Global Step: 103340   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:26,766-Speed 2619.86 samples/sec   Loss 12.2670   LearningRate 0.0766   Epoch: 2   Global Step: 103350   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:30,658-Speed 2631.94 samples/sec   Loss 12.2427   LearningRate 0.0766   Epoch: 2   Global Step: 103360   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:34,552-Speed 2630.59 samples/sec   Loss 12.3601   LearningRate 0.0766   Epoch: 2   Global Step: 103370   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:38,458-Speed 2621.82 samples/sec   Loss 12.0875   LearningRate 0.0766   Epoch: 2   Global Step: 103380   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:42,350-Speed 2631.55 samples/sec   Loss 12.1752   LearningRate 0.0766   Epoch: 2   Global Step: 103390   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:46,242-Speed 2631.96 samples/sec   Loss 12.1180   LearningRate 0.0766   Epoch: 2   Global Step: 103400   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:50,138-Speed 2629.21 samples/sec   Loss 12.2851   LearningRate 0.0766   Epoch: 2   Global Step: 103410   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:54,044-Speed 2621.98 samples/sec   Loss 12.1039   LearningRate 0.0766   Epoch: 2   Global Step: 103420   Fp16 Grad Scale: 32768   Required: 82 hours
Training: 2022-04-13 07:20:57,947-Speed 2624.25 samples/sec   Loss 12.3190   LearningRate 0.0766   Epoch: 2   Global Step: 103430   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:01,863-Speed 2615.91 samples/sec   Loss 12.1148   LearningRate 0.0766   Epoch: 2   Global Step: 103440   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:05,763-Speed 2626.66 samples/sec   Loss 12.0321   LearningRate 0.0766   Epoch: 2   Global Step: 103450   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:09,655-Speed 2631.07 samples/sec   Loss 12.1902   LearningRate 0.0766   Epoch: 2   Global Step: 103460   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:13,549-Speed 2630.20 samples/sec   Loss 12.0855   LearningRate 0.0766   Epoch: 2   Global Step: 103470   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:17,465-Speed 2616.09 samples/sec   Loss 12.1602   LearningRate 0.0766   Epoch: 2   Global Step: 103480   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:21,358-Speed 2630.99 samples/sec   Loss 12.1656   LearningRate 0.0766   Epoch: 2   Global Step: 103490   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:25,274-Speed 2615.65 samples/sec   Loss 12.2833   LearningRate 0.0766   Epoch: 2   Global Step: 103500   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:29,174-Speed 2626.53 samples/sec   Loss 12.3899   LearningRate 0.0766   Epoch: 2   Global Step: 103510   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:33,070-Speed 2629.40 samples/sec   Loss 12.1381   LearningRate 0.0766   Epoch: 2   Global Step: 103520   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:21:36,969-Speed 2626.77 samples/sec   Loss 11.9863   LearningRate 0.0766   Epoch: 2   Global Step: 103530   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:21:40,863-Speed 2630.26 samples/sec   Loss 12.2981   LearningRate 0.0766   Epoch: 2   Global Step: 103540   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:21:44,760-Speed 2628.47 samples/sec   Loss 12.1568   LearningRate 0.0766   Epoch: 2   Global Step: 103550   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:21:48,656-Speed 2629.16 samples/sec   Loss 12.2095   LearningRate 0.0766   Epoch: 2   Global Step: 103560   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:21:52,549-Speed 2631.31 samples/sec   Loss 12.0828   LearningRate 0.0766   Epoch: 2   Global Step: 103570   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:21:56,455-Speed 2622.33 samples/sec   Loss 12.1764   LearningRate 0.0766   Epoch: 2   Global Step: 103580   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:00,357-Speed 2625.21 samples/sec   Loss 12.3042   LearningRate 0.0766   Epoch: 2   Global Step: 103590   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:04,253-Speed 2628.62 samples/sec   Loss 12.0015   LearningRate 0.0766   Epoch: 2   Global Step: 103600   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:08,157-Speed 2623.28 samples/sec   Loss 12.3096   LearningRate 0.0766   Epoch: 2   Global Step: 103610   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:12,058-Speed 2625.67 samples/sec   Loss 12.3122   LearningRate 0.0766   Epoch: 2   Global Step: 103620   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:15,952-Speed 2629.97 samples/sec   Loss 12.1485   LearningRate 0.0766   Epoch: 2   Global Step: 103630   Fp16 Grad Scale: 262144   Required: 82 hours
Training: 2022-04-13 07:22:19,836-Speed 2637.29 samples/sec   Loss 12.2795   LearningRate 0.0766   Epoch: 2   Global Step: 103640   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:23,735-Speed 2627.38 samples/sec   Loss 12.2563   LearningRate 0.0766   Epoch: 2   Global Step: 103650   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:27,639-Speed 2624.36 samples/sec   Loss 12.1888   LearningRate 0.0766   Epoch: 2   Global Step: 103660   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:31,537-Speed 2627.74 samples/sec   Loss 12.1773   LearningRate 0.0766   Epoch: 2   Global Step: 103670   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:35,437-Speed 2626.12 samples/sec   Loss 12.1839   LearningRate 0.0766   Epoch: 2   Global Step: 103680   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:39,340-Speed 2624.17 samples/sec   Loss 12.1108   LearningRate 0.0766   Epoch: 2   Global Step: 103690   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:43,239-Speed 2627.01 samples/sec   Loss 12.1221   LearningRate 0.0766   Epoch: 2   Global Step: 103700   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:47,132-Speed 2630.29 samples/sec   Loss 12.1737   LearningRate 0.0766   Epoch: 2   Global Step: 103710   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:22:51,014-Speed 2639.31 samples/sec   Loss 12.2011   LearningRate 0.0766   Epoch: 2   Global Step: 103720   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:22:54,904-Speed 2632.61 samples/sec   Loss 12.2444   LearningRate 0.0766   Epoch: 2   Global Step: 103730   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:22:58,797-Speed 2631.31 samples/sec   Loss 12.1434   LearningRate 0.0766   Epoch: 2   Global Step: 103740   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:02,709-Speed 2618.17 samples/sec   Loss 12.2640   LearningRate 0.0766   Epoch: 2   Global Step: 103750   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:06,601-Speed 2632.32 samples/sec   Loss 12.2150   LearningRate 0.0765   Epoch: 2   Global Step: 103760   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:10,494-Speed 2630.54 samples/sec   Loss 12.1398   LearningRate 0.0765   Epoch: 2   Global Step: 103770   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:14,392-Speed 2627.96 samples/sec   Loss 12.0545   LearningRate 0.0765   Epoch: 2   Global Step: 103780   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:18,292-Speed 2625.86 samples/sec   Loss 12.3392   LearningRate 0.0765   Epoch: 2   Global Step: 103790   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:22,186-Speed 2630.62 samples/sec   Loss 12.1687   LearningRate 0.0765   Epoch: 2   Global Step: 103800   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:26,078-Speed 2631.94 samples/sec   Loss 12.2499   LearningRate 0.0765   Epoch: 2   Global Step: 103810   Fp16 Grad Scale: 65536   Required: 82 hours
Training: 2022-04-13 07:23:29,993-Speed 2616.59 samples/sec   Loss 12.1643   LearningRate 0.0765   Epoch: 2   Global Step: 103820   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:23:33,887-Speed 2630.14 samples/sec   Loss 12.2311   LearningRate 0.0765   Epoch: 2   Global Step: 103830   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:23:37,795-Speed 2621.30 samples/sec   Loss 12.2044   LearningRate 0.0765   Epoch: 2   Global Step: 103840   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:23:41,713-Speed 2614.32 samples/sec   Loss 12.0922   LearningRate 0.0765   Epoch: 2   Global Step: 103850   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:23:45,608-Speed 2629.66 samples/sec   Loss 12.1947   LearningRate 0.0765   Epoch: 2   Global Step: 103860   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:23:49,500-Speed 2631.38 samples/sec   Loss 12.1682   LearningRate 0.0765   Epoch: 2   Global Step: 103870   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:23:53,401-Speed 2625.84 samples/sec   Loss 12.1183   LearningRate 0.0765   Epoch: 2   Global Step: 103880   Fp16 Grad Scale: 131072   Required: 82 hours
Training: 2022-04-13 07:23:57,301-Speed 2626.94 samples/sec   Loss 12.1906   LearningRate 0.0765   Epoch: 2   Global Step: 103890   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:01,196-Speed 2629.68 samples/sec   Loss 12.1330   LearningRate 0.0765   Epoch: 2   Global Step: 103900   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:05,091-Speed 2630.13 samples/sec   Loss 12.1191   LearningRate 0.0765   Epoch: 2   Global Step: 103910   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:09,000-Speed 2620.37 samples/sec   Loss 12.3123   LearningRate 0.0765   Epoch: 2   Global Step: 103920   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:24:12,896-Speed 2628.79 samples/sec   Loss 12.1130   LearningRate 0.0765   Epoch: 2   Global Step: 103930   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:24:16,788-Speed 2631.50 samples/sec   Loss 12.1960   LearningRate 0.0765   Epoch: 2   Global Step: 103940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:20,692-Speed 2623.91 samples/sec   Loss 12.1054   LearningRate 0.0765   Epoch: 2   Global Step: 103950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:24,595-Speed 2624.04 samples/sec   Loss 12.2250   LearningRate 0.0765   Epoch: 2   Global Step: 103960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:28,511-Speed 2616.10 samples/sec   Loss 12.1282   LearningRate 0.0765   Epoch: 2   Global Step: 103970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:32,410-Speed 2626.80 samples/sec   Loss 12.2532   LearningRate 0.0765   Epoch: 2   Global Step: 103980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:36,311-Speed 2626.20 samples/sec   Loss 12.1321   LearningRate 0.0765   Epoch: 2   Global Step: 103990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:40,206-Speed 2629.30 samples/sec   Loss 12.0551   LearningRate 0.0765   Epoch: 2   Global Step: 104000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:44,121-Speed 2616.61 samples/sec   Loss 12.0718   LearningRate 0.0765   Epoch: 2   Global Step: 104010   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:48,031-Speed 2619.44 samples/sec   Loss 12.1762   LearningRate 0.0765   Epoch: 2   Global Step: 104020   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:51,928-Speed 2629.13 samples/sec   Loss 12.1915   LearningRate 0.0765   Epoch: 2   Global Step: 104030   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:24:55,825-Speed 2627.77 samples/sec   Loss 12.1825   LearningRate 0.0765   Epoch: 2   Global Step: 104040   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:24:59,829-Speed 2558.34 samples/sec   Loss 12.2123   LearningRate 0.0765   Epoch: 2   Global Step: 104050   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:25:03,730-Speed 2625.03 samples/sec   Loss 12.2001   LearningRate 0.0765   Epoch: 2   Global Step: 104060   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:25:07,625-Speed 2630.23 samples/sec   Loss 12.3199   LearningRate 0.0765   Epoch: 2   Global Step: 104070   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:25:11,616-Speed 2566.57 samples/sec   Loss 12.2402   LearningRate 0.0765   Epoch: 2   Global Step: 104080   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:15,673-Speed 2524.23 samples/sec   Loss 12.0971   LearningRate 0.0765   Epoch: 2   Global Step: 104090   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:19,746-Speed 2514.73 samples/sec   Loss 12.1841   LearningRate 0.0765   Epoch: 2   Global Step: 104100   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:23,816-Speed 2516.68 samples/sec   Loss 12.1156   LearningRate 0.0765   Epoch: 2   Global Step: 104110   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:27,897-Speed 2510.07 samples/sec   Loss 12.1042   LearningRate 0.0765   Epoch: 2   Global Step: 104120   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:31,938-Speed 2534.42 samples/sec   Loss 12.1305   LearningRate 0.0765   Epoch: 2   Global Step: 104130   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:35,834-Speed 2629.21 samples/sec   Loss 12.1200   LearningRate 0.0765   Epoch: 2   Global Step: 104140   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:39,730-Speed 2628.81 samples/sec   Loss 12.1821   LearningRate 0.0765   Epoch: 2   Global Step: 104150   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:43,631-Speed 2625.77 samples/sec   Loss 12.2137   LearningRate 0.0765   Epoch: 2   Global Step: 104160   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:47,527-Speed 2628.82 samples/sec   Loss 12.2472   LearningRate 0.0765   Epoch: 2   Global Step: 104170   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:25:51,421-Speed 2630.08 samples/sec   Loss 12.2565   LearningRate 0.0765   Epoch: 2   Global Step: 104180   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:25:55,289-Speed 2649.11 samples/sec   Loss 12.1614   LearningRate 0.0765   Epoch: 2   Global Step: 104190   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:25:59,182-Speed 2630.82 samples/sec   Loss 12.3095   LearningRate 0.0765   Epoch: 2   Global Step: 104200   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:03,078-Speed 2628.89 samples/sec   Loss 12.1699   LearningRate 0.0765   Epoch: 2   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:06,987-Speed 2620.05 samples/sec   Loss 12.2803   LearningRate 0.0765   Epoch: 2   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:10,894-Speed 2622.17 samples/sec   Loss 12.0826   LearningRate 0.0765   Epoch: 2   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:14,792-Speed 2627.59 samples/sec   Loss 12.0162   LearningRate 0.0764   Epoch: 2   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:18,685-Speed 2630.88 samples/sec   Loss 12.2482   LearningRate 0.0764   Epoch: 2   Global Step: 104250   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:22,582-Speed 2628.28 samples/sec   Loss 12.0227   LearningRate 0.0764   Epoch: 2   Global Step: 104260   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:26,477-Speed 2630.04 samples/sec   Loss 12.1014   LearningRate 0.0764   Epoch: 2   Global Step: 104270   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:30,397-Speed 2612.43 samples/sec   Loss 12.1500   LearningRate 0.0764   Epoch: 2   Global Step: 104280   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:26:34,291-Speed 2630.57 samples/sec   Loss 12.0578   LearningRate 0.0764   Epoch: 2   Global Step: 104290   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:26:38,193-Speed 2625.30 samples/sec   Loss 12.1094   LearningRate 0.0764   Epoch: 2   Global Step: 104300   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:26:42,091-Speed 2627.34 samples/sec   Loss 12.1258   LearningRate 0.0764   Epoch: 2   Global Step: 104310   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:26:45,988-Speed 2628.26 samples/sec   Loss 12.1136   LearningRate 0.0764   Epoch: 2   Global Step: 104320   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:26:49,965-Speed 2575.25 samples/sec   Loss 12.2484   LearningRate 0.0764   Epoch: 2   Global Step: 104330   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:26:53,874-Speed 2620.77 samples/sec   Loss 12.3049   LearningRate 0.0764   Epoch: 2   Global Step: 104340   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:26:57,770-Speed 2629.15 samples/sec   Loss 12.2111   LearningRate 0.0764   Epoch: 2   Global Step: 104350   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:27:01,663-Speed 2630.58 samples/sec   Loss 12.0884   LearningRate 0.0764   Epoch: 2   Global Step: 104360   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:27:05,539-Speed 2642.82 samples/sec   Loss 12.3076   LearningRate 0.0764   Epoch: 2   Global Step: 104370   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:09,432-Speed 2630.60 samples/sec   Loss 12.1819   LearningRate 0.0764   Epoch: 2   Global Step: 104380   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:13,326-Speed 2630.84 samples/sec   Loss 12.0106   LearningRate 0.0764   Epoch: 2   Global Step: 104390   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:17,218-Speed 2631.29 samples/sec   Loss 12.2035   LearningRate 0.0764   Epoch: 2   Global Step: 104400   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:21,107-Speed 2633.77 samples/sec   Loss 12.1374   LearningRate 0.0764   Epoch: 2   Global Step: 104410   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:25,012-Speed 2622.87 samples/sec   Loss 12.2912   LearningRate 0.0764   Epoch: 2   Global Step: 104420   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:28,913-Speed 2626.02 samples/sec   Loss 12.1247   LearningRate 0.0764   Epoch: 2   Global Step: 104430   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:32,807-Speed 2630.51 samples/sec   Loss 12.1036   LearningRate 0.0764   Epoch: 2   Global Step: 104440   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:36,705-Speed 2627.30 samples/sec   Loss 12.3275   LearningRate 0.0764   Epoch: 2   Global Step: 104450   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:40,602-Speed 2627.92 samples/sec   Loss 12.2157   LearningRate 0.0764   Epoch: 2   Global Step: 104460   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:27:44,503-Speed 2625.66 samples/sec   Loss 12.1639   LearningRate 0.0764   Epoch: 2   Global Step: 104470   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:27:48,411-Speed 2621.11 samples/sec   Loss 12.0892   LearningRate 0.0764   Epoch: 2   Global Step: 104480   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:27:52,308-Speed 2628.03 samples/sec   Loss 12.1159   LearningRate 0.0764   Epoch: 2   Global Step: 104490   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:27:56,204-Speed 2628.80 samples/sec   Loss 12.1715   LearningRate 0.0764   Epoch: 2   Global Step: 104500   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:28:00,102-Speed 2627.82 samples/sec   Loss 12.1546   LearningRate 0.0764   Epoch: 2   Global Step: 104510   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:28:03,995-Speed 2631.20 samples/sec   Loss 12.1585   LearningRate 0.0764   Epoch: 2   Global Step: 104520   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:28:07,887-Speed 2631.25 samples/sec   Loss 12.0926   LearningRate 0.0764   Epoch: 2   Global Step: 104530   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:28:11,782-Speed 2629.91 samples/sec   Loss 12.1943   LearningRate 0.0764   Epoch: 2   Global Step: 104540   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:28:15,676-Speed 2630.00 samples/sec   Loss 12.2820   LearningRate 0.0764   Epoch: 2   Global Step: 104550   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:28:19,578-Speed 2624.66 samples/sec   Loss 12.2065   LearningRate 0.0764   Epoch: 2   Global Step: 104560   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:28:23,508-Speed 2606.50 samples/sec   Loss 12.1263   LearningRate 0.0764   Epoch: 2   Global Step: 104570   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:27,407-Speed 2627.29 samples/sec   Loss 12.1626   LearningRate 0.0764   Epoch: 2   Global Step: 104580   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:31,313-Speed 2622.46 samples/sec   Loss 12.1979   LearningRate 0.0764   Epoch: 2   Global Step: 104590   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:35,198-Speed 2635.92 samples/sec   Loss 12.0360   LearningRate 0.0764   Epoch: 2   Global Step: 104600   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:39,095-Speed 2629.25 samples/sec   Loss 12.2204   LearningRate 0.0764   Epoch: 2   Global Step: 104610   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:43,202-Speed 2493.25 samples/sec   Loss 12.1786   LearningRate 0.0764   Epoch: 2   Global Step: 104620   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:47,157-Speed 2589.76 samples/sec   Loss 12.1608   LearningRate 0.0764   Epoch: 2   Global Step: 104630   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:51,067-Speed 2619.80 samples/sec   Loss 12.2400   LearningRate 0.0764   Epoch: 2   Global Step: 104640   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:54,965-Speed 2627.28 samples/sec   Loss 12.2995   LearningRate 0.0764   Epoch: 2   Global Step: 104650   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:28:58,860-Speed 2630.15 samples/sec   Loss 12.2631   LearningRate 0.0764   Epoch: 2   Global Step: 104660   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:29:02,763-Speed 2623.66 samples/sec   Loss 12.2951   LearningRate 0.0764   Epoch: 2   Global Step: 104670   Fp16 Grad Scale: 524288   Required: 81 hours
Training: 2022-04-13 07:29:06,640-Speed 2642.28 samples/sec   Loss 12.1110   LearningRate 0.0764   Epoch: 2   Global Step: 104680   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:29:10,541-Speed 2626.05 samples/sec   Loss 12.1165   LearningRate 0.0764   Epoch: 2   Global Step: 104690   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:29:14,435-Speed 2629.67 samples/sec   Loss 12.0330   LearningRate 0.0764   Epoch: 2   Global Step: 104700   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:29:18,328-Speed 2631.34 samples/sec   Loss 12.3109   LearningRate 0.0763   Epoch: 2   Global Step: 104710   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:29:22,231-Speed 2624.82 samples/sec   Loss 12.1562   LearningRate 0.0763   Epoch: 2   Global Step: 104720   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:29:26,130-Speed 2627.08 samples/sec   Loss 12.1233   LearningRate 0.0763   Epoch: 2   Global Step: 104730   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:29:30,028-Speed 2627.89 samples/sec   Loss 12.2826   LearningRate 0.0763   Epoch: 2   Global Step: 104740   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:29:33,933-Speed 2622.66 samples/sec   Loss 12.3044   LearningRate 0.0763   Epoch: 2   Global Step: 104750   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:29:37,807-Speed 2644.06 samples/sec   Loss 12.3629   LearningRate 0.0763   Epoch: 2   Global Step: 104760   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:29:41,699-Speed 2631.81 samples/sec   Loss 12.2540   LearningRate 0.0763   Epoch: 2   Global Step: 104770   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:29:45,590-Speed 2632.15 samples/sec   Loss 12.1929   LearningRate 0.0763   Epoch: 2   Global Step: 104780   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:29:49,485-Speed 2629.83 samples/sec   Loss 12.0655   LearningRate 0.0763   Epoch: 2   Global Step: 104790   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:29:53,410-Speed 2610.11 samples/sec   Loss 12.1439   LearningRate 0.0763   Epoch: 2   Global Step: 104800   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:29:57,314-Speed 2623.27 samples/sec   Loss 12.0365   LearningRate 0.0763   Epoch: 2   Global Step: 104810   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:30:01,226-Speed 2618.66 samples/sec   Loss 12.2332   LearningRate 0.0763   Epoch: 2   Global Step: 104820   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:30:05,124-Speed 2628.25 samples/sec   Loss 12.3663   LearningRate 0.0763   Epoch: 2   Global Step: 104830   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:30:09,016-Speed 2631.49 samples/sec   Loss 12.0917   LearningRate 0.0763   Epoch: 2   Global Step: 104840   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:30:12,915-Speed 2626.75 samples/sec   Loss 12.1779   LearningRate 0.0763   Epoch: 2   Global Step: 104850   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:30:16,821-Speed 2622.56 samples/sec   Loss 12.2960   LearningRate 0.0763   Epoch: 2   Global Step: 104860   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:20,713-Speed 2631.33 samples/sec   Loss 12.1904   LearningRate 0.0763   Epoch: 2   Global Step: 104870   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:24,613-Speed 2626.89 samples/sec   Loss 12.3483   LearningRate 0.0763   Epoch: 2   Global Step: 104880   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:28,506-Speed 2631.19 samples/sec   Loss 12.3147   LearningRate 0.0763   Epoch: 2   Global Step: 104890   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:32,398-Speed 2631.73 samples/sec   Loss 12.0905   LearningRate 0.0763   Epoch: 2   Global Step: 104900   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:36,293-Speed 2629.43 samples/sec   Loss 12.1552   LearningRate 0.0763   Epoch: 2   Global Step: 104910   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:40,186-Speed 2630.45 samples/sec   Loss 12.1943   LearningRate 0.0763   Epoch: 2   Global Step: 104920   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:44,090-Speed 2623.95 samples/sec   Loss 12.2414   LearningRate 0.0763   Epoch: 2   Global Step: 104930   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:47,975-Speed 2636.28 samples/sec   Loss 12.2108   LearningRate 0.0763   Epoch: 2   Global Step: 104940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:51,872-Speed 2628.88 samples/sec   Loss 12.1823   LearningRate 0.0763   Epoch: 2   Global Step: 104950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:55,764-Speed 2631.58 samples/sec   Loss 12.0669   LearningRate 0.0763   Epoch: 2   Global Step: 104960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:30:59,662-Speed 2627.49 samples/sec   Loss 12.2513   LearningRate 0.0763   Epoch: 2   Global Step: 104970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:03,562-Speed 2626.26 samples/sec   Loss 12.2454   LearningRate 0.0763   Epoch: 2   Global Step: 104980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:07,461-Speed 2626.63 samples/sec   Loss 12.2952   LearningRate 0.0763   Epoch: 2   Global Step: 104990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:11,368-Speed 2622.35 samples/sec   Loss 12.1808   LearningRate 0.0763   Epoch: 2   Global Step: 105000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:15,264-Speed 2628.74 samples/sec   Loss 12.2372   LearningRate 0.0763   Epoch: 2   Global Step: 105010   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:19,184-Speed 2612.93 samples/sec   Loss 12.0052   LearningRate 0.0763   Epoch: 2   Global Step: 105020   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:23,082-Speed 2628.00 samples/sec   Loss 12.1660   LearningRate 0.0763   Epoch: 2   Global Step: 105030   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:26,977-Speed 2630.00 samples/sec   Loss 12.0542   LearningRate 0.0763   Epoch: 2   Global Step: 105040   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:31:30,890-Speed 2617.72 samples/sec   Loss 12.1448   LearningRate 0.0763   Epoch: 2   Global Step: 105050   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:31:34,783-Speed 2630.69 samples/sec   Loss 11.9839   LearningRate 0.0763   Epoch: 2   Global Step: 105060   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:31:38,680-Speed 2627.99 samples/sec   Loss 12.1629   LearningRate 0.0763   Epoch: 2   Global Step: 105070   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:31:42,574-Speed 2630.65 samples/sec   Loss 12.1680   LearningRate 0.0763   Epoch: 2   Global Step: 105080   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:31:46,467-Speed 2631.16 samples/sec   Loss 12.2777   LearningRate 0.0763   Epoch: 2   Global Step: 105090   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:31:50,371-Speed 2623.92 samples/sec   Loss 12.1626   LearningRate 0.0763   Epoch: 2   Global Step: 105100   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:31:54,260-Speed 2633.54 samples/sec   Loss 12.2306   LearningRate 0.0763   Epoch: 2   Global Step: 105110   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:31:58,150-Speed 2632.85 samples/sec   Loss 12.1406   LearningRate 0.0763   Epoch: 2   Global Step: 105120   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:32:02,042-Speed 2631.91 samples/sec   Loss 12.2782   LearningRate 0.0763   Epoch: 2   Global Step: 105130   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:32:05,932-Speed 2632.45 samples/sec   Loss 12.2338   LearningRate 0.0763   Epoch: 2   Global Step: 105140   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:32:09,830-Speed 2627.75 samples/sec   Loss 12.2786   LearningRate 0.0763   Epoch: 2   Global Step: 105150   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:13,729-Speed 2626.63 samples/sec   Loss 12.2385   LearningRate 0.0763   Epoch: 2   Global Step: 105160   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:17,623-Speed 2630.71 samples/sec   Loss 12.2826   LearningRate 0.0763   Epoch: 2   Global Step: 105170   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:21,516-Speed 2631.55 samples/sec   Loss 12.0359   LearningRate 0.0763   Epoch: 2   Global Step: 105180   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:25,415-Speed 2626.57 samples/sec   Loss 12.1738   LearningRate 0.0762   Epoch: 2   Global Step: 105190   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:29,310-Speed 2629.81 samples/sec   Loss 12.1550   LearningRate 0.0762   Epoch: 2   Global Step: 105200   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:33,211-Speed 2625.55 samples/sec   Loss 12.1621   LearningRate 0.0762   Epoch: 2   Global Step: 105210   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:37,107-Speed 2628.50 samples/sec   Loss 12.2652   LearningRate 0.0762   Epoch: 2   Global Step: 105220   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:40,999-Speed 2632.00 samples/sec   Loss 12.0388   LearningRate 0.0762   Epoch: 2   Global Step: 105230   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:44,894-Speed 2629.74 samples/sec   Loss 12.1801   LearningRate 0.0762   Epoch: 2   Global Step: 105240   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:32:48,787-Speed 2631.11 samples/sec   Loss 12.1633   LearningRate 0.0762   Epoch: 2   Global Step: 105250   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:32:52,683-Speed 2628.78 samples/sec   Loss 12.2276   LearningRate 0.0762   Epoch: 2   Global Step: 105260   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:32:56,580-Speed 2628.98 samples/sec   Loss 12.0434   LearningRate 0.0762   Epoch: 2   Global Step: 105270   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:33:00,448-Speed 2647.92 samples/sec   Loss 11.9990   LearningRate 0.0762   Epoch: 2   Global Step: 105280   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:04,340-Speed 2631.39 samples/sec   Loss 11.9901   LearningRate 0.0762   Epoch: 2   Global Step: 105290   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:08,233-Speed 2631.18 samples/sec   Loss 12.2099   LearningRate 0.0762   Epoch: 2   Global Step: 105300   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:12,123-Speed 2633.19 samples/sec   Loss 12.0475   LearningRate 0.0762   Epoch: 2   Global Step: 105310   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:16,015-Speed 2631.90 samples/sec   Loss 12.0915   LearningRate 0.0762   Epoch: 2   Global Step: 105320   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:19,911-Speed 2629.12 samples/sec   Loss 12.1920   LearningRate 0.0762   Epoch: 2   Global Step: 105330   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:23,850-Speed 2600.23 samples/sec   Loss 12.2253   LearningRate 0.0762   Epoch: 2   Global Step: 105340   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:27,744-Speed 2631.00 samples/sec   Loss 12.2892   LearningRate 0.0762   Epoch: 2   Global Step: 105350   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:31,658-Speed 2616.54 samples/sec   Loss 12.3795   LearningRate 0.0762   Epoch: 2   Global Step: 105360   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:35,572-Speed 2616.73 samples/sec   Loss 12.1746   LearningRate 0.0762   Epoch: 2   Global Step: 105370   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:33:39,489-Speed 2614.25 samples/sec   Loss 12.1443   LearningRate 0.0762   Epoch: 2   Global Step: 105380   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:33:43,401-Speed 2619.08 samples/sec   Loss 11.9622   LearningRate 0.0762   Epoch: 2   Global Step: 105390   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:33:47,320-Speed 2613.10 samples/sec   Loss 12.3554   LearningRate 0.0762   Epoch: 2   Global Step: 105400   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:33:51,219-Speed 2627.44 samples/sec   Loss 12.2863   LearningRate 0.0762   Epoch: 2   Global Step: 105410   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:33:55,119-Speed 2626.02 samples/sec   Loss 12.1485   LearningRate 0.0762   Epoch: 2   Global Step: 105420   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:33:59,028-Speed 2620.85 samples/sec   Loss 12.3475   LearningRate 0.0762   Epoch: 2   Global Step: 105430   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:34:02,941-Speed 2617.07 samples/sec   Loss 12.2081   LearningRate 0.0762   Epoch: 2   Global Step: 105440   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:34:06,895-Speed 2590.25 samples/sec   Loss 12.2380   LearningRate 0.0762   Epoch: 2   Global Step: 105450   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:34:10,881-Speed 2570.00 samples/sec   Loss 12.3191   LearningRate 0.0762   Epoch: 2   Global Step: 105460   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:34:14,773-Speed 2631.87 samples/sec   Loss 12.2630   LearningRate 0.0762   Epoch: 2   Global Step: 105470   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:34:18,662-Speed 2634.30 samples/sec   Loss 12.3122   LearningRate 0.0762   Epoch: 2   Global Step: 105480   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:22,562-Speed 2626.07 samples/sec   Loss 12.6681   LearningRate 0.0762   Epoch: 2   Global Step: 105490   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:26,463-Speed 2625.68 samples/sec   Loss 12.3751   LearningRate 0.0762   Epoch: 2   Global Step: 105500   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:30,357-Speed 2630.44 samples/sec   Loss 12.2226   LearningRate 0.0762   Epoch: 2   Global Step: 105510   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:34,274-Speed 2615.33 samples/sec   Loss 12.2084   LearningRate 0.0762   Epoch: 2   Global Step: 105520   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:38,243-Speed 2581.13 samples/sec   Loss 12.2194   LearningRate 0.0762   Epoch: 2   Global Step: 105530   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:42,145-Speed 2624.97 samples/sec   Loss 12.3660   LearningRate 0.0762   Epoch: 2   Global Step: 105540   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:46,036-Speed 2632.42 samples/sec   Loss 12.1954   LearningRate 0.0762   Epoch: 2   Global Step: 105550   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:49,989-Speed 2590.74 samples/sec   Loss 12.0879   LearningRate 0.0762   Epoch: 2   Global Step: 105560   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:53,890-Speed 2625.82 samples/sec   Loss 12.1549   LearningRate 0.0762   Epoch: 2   Global Step: 105570   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:34:57,780-Speed 2632.78 samples/sec   Loss 12.2021   LearningRate 0.0762   Epoch: 2   Global Step: 105580   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:01,673-Speed 2631.17 samples/sec   Loss 12.1441   LearningRate 0.0762   Epoch: 2   Global Step: 105590   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:05,569-Speed 2629.11 samples/sec   Loss 12.1587   LearningRate 0.0762   Epoch: 2   Global Step: 105600   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:09,474-Speed 2623.19 samples/sec   Loss 12.2875   LearningRate 0.0762   Epoch: 2   Global Step: 105610   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:13,376-Speed 2625.07 samples/sec   Loss 12.2957   LearningRate 0.0762   Epoch: 2   Global Step: 105620   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:17,274-Speed 2627.26 samples/sec   Loss 12.2303   LearningRate 0.0762   Epoch: 2   Global Step: 105630   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:21,177-Speed 2624.44 samples/sec   Loss 12.2017   LearningRate 0.0762   Epoch: 2   Global Step: 105640   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:25,087-Speed 2619.36 samples/sec   Loss 12.0352   LearningRate 0.0762   Epoch: 2   Global Step: 105650   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:28,993-Speed 2622.31 samples/sec   Loss 12.1423   LearningRate 0.0761   Epoch: 2   Global Step: 105660   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:32,894-Speed 2625.35 samples/sec   Loss 12.2646   LearningRate 0.0761   Epoch: 2   Global Step: 105670   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:36,800-Speed 2623.14 samples/sec   Loss 12.1476   LearningRate 0.0761   Epoch: 2   Global Step: 105680   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:35:40,697-Speed 2628.33 samples/sec   Loss 12.0637   LearningRate 0.0761   Epoch: 2   Global Step: 105690   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:35:44,598-Speed 2625.35 samples/sec   Loss 12.1976   LearningRate 0.0761   Epoch: 2   Global Step: 105700   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:35:48,499-Speed 2625.35 samples/sec   Loss 12.2499   LearningRate 0.0761   Epoch: 2   Global Step: 105710   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:35:52,381-Speed 2638.82 samples/sec   Loss 12.2408   LearningRate 0.0761   Epoch: 2   Global Step: 105720   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:35:56,279-Speed 2627.46 samples/sec   Loss 12.2115   LearningRate 0.0761   Epoch: 2   Global Step: 105730   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:00,172-Speed 2631.24 samples/sec   Loss 12.2932   LearningRate 0.0761   Epoch: 2   Global Step: 105740   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:04,079-Speed 2621.25 samples/sec   Loss 12.1140   LearningRate 0.0761   Epoch: 2   Global Step: 105750   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:07,974-Speed 2630.25 samples/sec   Loss 12.0954   LearningRate 0.0761   Epoch: 2   Global Step: 105760   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:11,875-Speed 2625.43 samples/sec   Loss 12.2772   LearningRate 0.0761   Epoch: 2   Global Step: 105770   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:15,850-Speed 2577.10 samples/sec   Loss 12.1174   LearningRate 0.0761   Epoch: 2   Global Step: 105780   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:19,920-Speed 2516.04 samples/sec   Loss 12.2266   LearningRate 0.0761   Epoch: 2   Global Step: 105790   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:24,001-Speed 2509.65 samples/sec   Loss 12.2279   LearningRate 0.0761   Epoch: 2   Global Step: 105800   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:28,045-Speed 2532.90 samples/sec   Loss 12.2610   LearningRate 0.0761   Epoch: 2   Global Step: 105810   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:31,937-Speed 2631.48 samples/sec   Loss 12.0572   LearningRate 0.0761   Epoch: 2   Global Step: 105820   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:36:35,845-Speed 2620.97 samples/sec   Loss 12.1448   LearningRate 0.0761   Epoch: 2   Global Step: 105830   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:36:39,747-Speed 2625.10 samples/sec   Loss 12.0173   LearningRate 0.0761   Epoch: 2   Global Step: 105840   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:36:43,649-Speed 2625.03 samples/sec   Loss 12.1755   LearningRate 0.0761   Epoch: 2   Global Step: 105850   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:36:47,718-Speed 2516.94 samples/sec   Loss 12.1783   LearningRate 0.0761   Epoch: 2   Global Step: 105860   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:36:51,739-Speed 2547.37 samples/sec   Loss 12.0636   LearningRate 0.0761   Epoch: 2   Global Step: 105870   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:55,635-Speed 2629.42 samples/sec   Loss 12.1300   LearningRate 0.0761   Epoch: 2   Global Step: 105880   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:36:59,529-Speed 2630.14 samples/sec   Loss 12.0425   LearningRate 0.0761   Epoch: 2   Global Step: 105890   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:03,431-Speed 2624.48 samples/sec   Loss 12.2914   LearningRate 0.0761   Epoch: 2   Global Step: 105900   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:07,325-Speed 2630.52 samples/sec   Loss 12.1333   LearningRate 0.0761   Epoch: 2   Global Step: 105910   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:11,226-Speed 2625.64 samples/sec   Loss 12.0797   LearningRate 0.0761   Epoch: 2   Global Step: 105920   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:15,124-Speed 2627.82 samples/sec   Loss 12.1944   LearningRate 0.0761   Epoch: 2   Global Step: 105930   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:19,017-Speed 2631.87 samples/sec   Loss 12.2819   LearningRate 0.0761   Epoch: 2   Global Step: 105940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:22,918-Speed 2625.61 samples/sec   Loss 12.1094   LearningRate 0.0761   Epoch: 2   Global Step: 105950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:26,828-Speed 2619.73 samples/sec   Loss 12.1186   LearningRate 0.0761   Epoch: 2   Global Step: 105960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:30,714-Speed 2635.60 samples/sec   Loss 12.2050   LearningRate 0.0761   Epoch: 2   Global Step: 105970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:37:34,587-Speed 2644.86 samples/sec   Loss 12.2162   LearningRate 0.0761   Epoch: 2   Global Step: 105980   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:37:38,479-Speed 2631.30 samples/sec   Loss 12.1719   LearningRate 0.0761   Epoch: 2   Global Step: 105990   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:37:42,379-Speed 2626.75 samples/sec   Loss 12.1747   LearningRate 0.0761   Epoch: 2   Global Step: 106000   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:37:46,272-Speed 2631.19 samples/sec   Loss 12.2167   LearningRate 0.0761   Epoch: 2   Global Step: 106010   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:37:50,174-Speed 2624.87 samples/sec   Loss 12.1952   LearningRate 0.0761   Epoch: 2   Global Step: 106020   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:37:54,066-Speed 2631.50 samples/sec   Loss 12.2484   LearningRate 0.0761   Epoch: 2   Global Step: 106030   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:37:57,954-Speed 2634.67 samples/sec   Loss 12.0159   LearningRate 0.0761   Epoch: 2   Global Step: 106040   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:38:01,848-Speed 2630.33 samples/sec   Loss 12.0201   LearningRate 0.0761   Epoch: 2   Global Step: 106050   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:38:05,742-Speed 2630.78 samples/sec   Loss 12.1652   LearningRate 0.0761   Epoch: 2   Global Step: 106060   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:38:09,637-Speed 2628.99 samples/sec   Loss 12.1575   LearningRate 0.0761   Epoch: 2   Global Step: 106070   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:38:13,537-Speed 2626.51 samples/sec   Loss 12.1539   LearningRate 0.0761   Epoch: 2   Global Step: 106080   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:17,430-Speed 2631.08 samples/sec   Loss 12.1061   LearningRate 0.0761   Epoch: 2   Global Step: 106090   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:21,323-Speed 2631.34 samples/sec   Loss 12.1317   LearningRate 0.0761   Epoch: 2   Global Step: 106100   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:25,215-Speed 2631.93 samples/sec   Loss 11.9791   LearningRate 0.0761   Epoch: 2   Global Step: 106110   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:29,119-Speed 2623.51 samples/sec   Loss 12.0179   LearningRate 0.0761   Epoch: 2   Global Step: 106120   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:33,015-Speed 2629.60 samples/sec   Loss 12.0772   LearningRate 0.0761   Epoch: 2   Global Step: 106130   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:36,907-Speed 2631.54 samples/sec   Loss 12.1443   LearningRate 0.0760   Epoch: 2   Global Step: 106140   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:40,798-Speed 2632.02 samples/sec   Loss 12.2644   LearningRate 0.0760   Epoch: 2   Global Step: 106150   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:44,742-Speed 2597.04 samples/sec   Loss 12.1205   LearningRate 0.0760   Epoch: 2   Global Step: 106160   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:48,636-Speed 2630.68 samples/sec   Loss 12.1859   LearningRate 0.0760   Epoch: 2   Global Step: 106170   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:38:52,528-Speed 2631.32 samples/sec   Loss 12.1485   LearningRate 0.0760   Epoch: 2   Global Step: 106180   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:38:56,476-Speed 2594.46 samples/sec   Loss 12.0924   LearningRate 0.0760   Epoch: 2   Global Step: 106190   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:00,475-Speed 2561.40 samples/sec   Loss 12.1310   LearningRate 0.0760   Epoch: 2   Global Step: 106200   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:04,394-Speed 2613.53 samples/sec   Loss 12.1635   LearningRate 0.0760   Epoch: 2   Global Step: 106210   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:08,290-Speed 2628.47 samples/sec   Loss 12.2093   LearningRate 0.0760   Epoch: 2   Global Step: 106220   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:12,192-Speed 2625.50 samples/sec   Loss 12.1155   LearningRate 0.0760   Epoch: 2   Global Step: 106230   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:16,093-Speed 2624.95 samples/sec   Loss 12.2428   LearningRate 0.0760   Epoch: 2   Global Step: 106240   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:20,002-Speed 2620.76 samples/sec   Loss 12.0527   LearningRate 0.0760   Epoch: 2   Global Step: 106250   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:23,919-Speed 2614.65 samples/sec   Loss 12.0019   LearningRate 0.0760   Epoch: 2   Global Step: 106260   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:27,817-Speed 2627.57 samples/sec   Loss 12.0585   LearningRate 0.0760   Epoch: 2   Global Step: 106270   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:31,717-Speed 2626.23 samples/sec   Loss 12.0607   LearningRate 0.0760   Epoch: 2   Global Step: 106280   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:35,623-Speed 2622.24 samples/sec   Loss 12.0955   LearningRate 0.0760   Epoch: 2   Global Step: 106290   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:39:39,526-Speed 2624.51 samples/sec   Loss 12.2181   LearningRate 0.0760   Epoch: 2   Global Step: 106300   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:39:43,423-Speed 2628.18 samples/sec   Loss 12.2101   LearningRate 0.0760   Epoch: 2   Global Step: 106310   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:39:47,307-Speed 2637.43 samples/sec   Loss 12.0158   LearningRate 0.0760   Epoch: 2   Global Step: 106320   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:51,200-Speed 2630.78 samples/sec   Loss 12.0113   LearningRate 0.0760   Epoch: 2   Global Step: 106330   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:39:55,088-Speed 2634.18 samples/sec   Loss 12.1309   LearningRate 0.0760   Epoch: 2   Global Step: 106340   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:39:58,981-Speed 2631.31 samples/sec   Loss 12.4171   LearningRate 0.0760   Epoch: 2   Global Step: 106350   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:02,878-Speed 2628.12 samples/sec   Loss 12.2583   LearningRate 0.0760   Epoch: 2   Global Step: 106360   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:06,785-Speed 2621.57 samples/sec   Loss 12.2547   LearningRate 0.0760   Epoch: 2   Global Step: 106370   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:10,679-Speed 2630.36 samples/sec   Loss 12.2514   LearningRate 0.0760   Epoch: 2   Global Step: 106380   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:14,579-Speed 2626.92 samples/sec   Loss 12.0663   LearningRate 0.0760   Epoch: 2   Global Step: 106390   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:18,482-Speed 2623.53 samples/sec   Loss 12.2094   LearningRate 0.0760   Epoch: 2   Global Step: 106400   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:22,384-Speed 2624.77 samples/sec   Loss 12.0792   LearningRate 0.0760   Epoch: 2   Global Step: 106410   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:26,288-Speed 2623.77 samples/sec   Loss 12.2405   LearningRate 0.0760   Epoch: 2   Global Step: 106420   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:30,192-Speed 2624.08 samples/sec   Loss 11.9958   LearningRate 0.0760   Epoch: 2   Global Step: 106430   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:40:34,094-Speed 2624.84 samples/sec   Loss 12.0058   LearningRate 0.0760   Epoch: 2   Global Step: 106440   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:40:37,991-Speed 2628.33 samples/sec   Loss 12.1716   LearningRate 0.0760   Epoch: 2   Global Step: 106450   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:40:41,892-Speed 2626.66 samples/sec   Loss 12.1279   LearningRate 0.0760   Epoch: 2   Global Step: 106460   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:40:45,806-Speed 2616.57 samples/sec   Loss 12.1383   LearningRate 0.0760   Epoch: 2   Global Step: 106470   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:40:49,701-Speed 2629.62 samples/sec   Loss 12.0240   LearningRate 0.0760   Epoch: 2   Global Step: 106480   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:40:53,594-Speed 2630.94 samples/sec   Loss 12.1409   LearningRate 0.0760   Epoch: 2   Global Step: 106490   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:40:57,500-Speed 2622.46 samples/sec   Loss 12.2282   LearningRate 0.0760   Epoch: 2   Global Step: 106500   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:01,397-Speed 2627.94 samples/sec   Loss 12.1945   LearningRate 0.0760   Epoch: 2   Global Step: 106510   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:05,289-Speed 2631.99 samples/sec   Loss 12.1384   LearningRate 0.0760   Epoch: 2   Global Step: 106520   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:09,183-Speed 2630.37 samples/sec   Loss 12.1647   LearningRate 0.0760   Epoch: 2   Global Step: 106530   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:13,077-Speed 2630.40 samples/sec   Loss 12.2515   LearningRate 0.0760   Epoch: 2   Global Step: 106540   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:41:16,970-Speed 2631.23 samples/sec   Loss 12.0462   LearningRate 0.0760   Epoch: 2   Global Step: 106550   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:41:20,859-Speed 2633.14 samples/sec   Loss 12.0276   LearningRate 0.0760   Epoch: 2   Global Step: 106560   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:41:24,755-Speed 2628.91 samples/sec   Loss 12.0869   LearningRate 0.0760   Epoch: 2   Global Step: 106570   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:41:28,792-Speed 2537.61 samples/sec   Loss 12.1080   LearningRate 0.0760   Epoch: 2   Global Step: 106580   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:41:32,724-Speed 2604.44 samples/sec   Loss 12.1479   LearningRate 0.0760   Epoch: 2   Global Step: 106590   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:41:36,613-Speed 2633.95 samples/sec   Loss 12.2318   LearningRate 0.0760   Epoch: 2   Global Step: 106600   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:40,501-Speed 2635.00 samples/sec   Loss 12.1395   LearningRate 0.0759   Epoch: 2   Global Step: 106610   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:44,394-Speed 2630.91 samples/sec   Loss 12.1407   LearningRate 0.0759   Epoch: 2   Global Step: 106620   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:48,285-Speed 2632.55 samples/sec   Loss 12.0895   LearningRate 0.0759   Epoch: 2   Global Step: 106630   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:52,182-Speed 2627.71 samples/sec   Loss 12.1133   LearningRate 0.0759   Epoch: 2   Global Step: 106640   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:56,080-Speed 2627.61 samples/sec   Loss 12.0923   LearningRate 0.0759   Epoch: 2   Global Step: 106650   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:41:59,986-Speed 2622.72 samples/sec   Loss 11.9902   LearningRate 0.0759   Epoch: 2   Global Step: 106660   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:03,879-Speed 2631.18 samples/sec   Loss 12.1205   LearningRate 0.0759   Epoch: 2   Global Step: 106670   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:07,772-Speed 2631.15 samples/sec   Loss 12.0455   LearningRate 0.0759   Epoch: 2   Global Step: 106680   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:11,665-Speed 2631.26 samples/sec   Loss 12.2767   LearningRate 0.0759   Epoch: 2   Global Step: 106690   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:15,566-Speed 2625.46 samples/sec   Loss 12.1606   LearningRate 0.0759   Epoch: 2   Global Step: 106700   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:42:19,509-Speed 2597.40 samples/sec   Loss 12.1155   LearningRate 0.0759   Epoch: 2   Global Step: 106710   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:42:23,404-Speed 2629.24 samples/sec   Loss 12.2608   LearningRate 0.0759   Epoch: 2   Global Step: 106720   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:42:27,295-Speed 2632.67 samples/sec   Loss 12.1381   LearningRate 0.0759   Epoch: 2   Global Step: 106730   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:42:31,201-Speed 2622.08 samples/sec   Loss 12.2722   LearningRate 0.0759   Epoch: 2   Global Step: 106740   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:42:35,097-Speed 2629.70 samples/sec   Loss 12.1097   LearningRate 0.0759   Epoch: 2   Global Step: 106750   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:42:38,997-Speed 2626.29 samples/sec   Loss 12.0831   LearningRate 0.0759   Epoch: 2   Global Step: 106760   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:42:42,876-Speed 2640.03 samples/sec   Loss 12.1141   LearningRate 0.0759   Epoch: 2   Global Step: 106770   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:46,784-Speed 2620.84 samples/sec   Loss 12.0299   LearningRate 0.0759   Epoch: 2   Global Step: 106780   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:50,679-Speed 2630.10 samples/sec   Loss 12.0378   LearningRate 0.0759   Epoch: 2   Global Step: 106790   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:54,617-Speed 2601.25 samples/sec   Loss 12.2373   LearningRate 0.0759   Epoch: 2   Global Step: 106800   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:42:58,510-Speed 2630.64 samples/sec   Loss 12.1734   LearningRate 0.0759   Epoch: 2   Global Step: 106810   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:02,406-Speed 2629.15 samples/sec   Loss 12.0204   LearningRate 0.0759   Epoch: 2   Global Step: 106820   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:06,302-Speed 2628.70 samples/sec   Loss 12.2945   LearningRate 0.0759   Epoch: 2   Global Step: 106830   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:10,201-Speed 2627.71 samples/sec   Loss 12.0841   LearningRate 0.0759   Epoch: 2   Global Step: 106840   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:14,134-Speed 2603.87 samples/sec   Loss 12.1454   LearningRate 0.0759   Epoch: 2   Global Step: 106850   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:18,148-Speed 2551.56 samples/sec   Loss 12.1036   LearningRate 0.0759   Epoch: 2   Global Step: 106860   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:22,211-Speed 2521.52 samples/sec   Loss 12.1728   LearningRate 0.0759   Epoch: 2   Global Step: 106870   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:43:26,106-Speed 2629.39 samples/sec   Loss 12.1347   LearningRate 0.0759   Epoch: 2   Global Step: 106880   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:43:29,997-Speed 2632.78 samples/sec   Loss 12.0829   LearningRate 0.0759   Epoch: 2   Global Step: 106890   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:43:33,890-Speed 2630.73 samples/sec   Loss 12.1284   LearningRate 0.0759   Epoch: 2   Global Step: 106900   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:43:37,784-Speed 2630.03 samples/sec   Loss 12.2723   LearningRate 0.0759   Epoch: 2   Global Step: 106910   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:43:41,682-Speed 2627.44 samples/sec   Loss 11.9663   LearningRate 0.0759   Epoch: 2   Global Step: 106920   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:43:45,566-Speed 2637.75 samples/sec   Loss 12.2550   LearningRate 0.0759   Epoch: 2   Global Step: 106930   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:49,459-Speed 2631.57 samples/sec   Loss 12.0306   LearningRate 0.0759   Epoch: 2   Global Step: 106940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:53,372-Speed 2617.51 samples/sec   Loss 12.0112   LearningRate 0.0759   Epoch: 2   Global Step: 106950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:43:57,283-Speed 2619.07 samples/sec   Loss 12.0126   LearningRate 0.0759   Epoch: 2   Global Step: 106960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:01,177-Speed 2629.85 samples/sec   Loss 12.1024   LearningRate 0.0759   Epoch: 2   Global Step: 106970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:05,071-Speed 2630.32 samples/sec   Loss 12.0718   LearningRate 0.0759   Epoch: 2   Global Step: 106980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:08,969-Speed 2627.47 samples/sec   Loss 12.2120   LearningRate 0.0759   Epoch: 2   Global Step: 106990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:12,864-Speed 2630.15 samples/sec   Loss 12.2502   LearningRate 0.0759   Epoch: 2   Global Step: 107000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:16,757-Speed 2630.82 samples/sec   Loss 12.1826   LearningRate 0.0759   Epoch: 2   Global Step: 107010   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:20,664-Speed 2621.81 samples/sec   Loss 12.2013   LearningRate 0.0759   Epoch: 2   Global Step: 107020   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:24,544-Speed 2640.39 samples/sec   Loss 12.2019   LearningRate 0.0759   Epoch: 2   Global Step: 107030   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:28,442-Speed 2627.55 samples/sec   Loss 12.2088   LearningRate 0.0759   Epoch: 2   Global Step: 107040   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:32,346-Speed 2623.73 samples/sec   Loss 12.1201   LearningRate 0.0759   Epoch: 2   Global Step: 107050   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:36,245-Speed 2626.75 samples/sec   Loss 12.2045   LearningRate 0.0759   Epoch: 2   Global Step: 107060   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:40,145-Speed 2626.54 samples/sec   Loss 12.1599   LearningRate 0.0759   Epoch: 2   Global Step: 107070   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:44,039-Speed 2630.18 samples/sec   Loss 12.0279   LearningRate 0.0759   Epoch: 2   Global Step: 107080   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:47,942-Speed 2624.42 samples/sec   Loss 12.0992   LearningRate 0.0758   Epoch: 2   Global Step: 107090   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:51,837-Speed 2629.71 samples/sec   Loss 12.2826   LearningRate 0.0758   Epoch: 2   Global Step: 107100   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:55,740-Speed 2624.26 samples/sec   Loss 12.0651   LearningRate 0.0758   Epoch: 2   Global Step: 107110   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:44:59,638-Speed 2628.01 samples/sec   Loss 12.0807   LearningRate 0.0758   Epoch: 2   Global Step: 107120   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:45:03,543-Speed 2622.53 samples/sec   Loss 11.9453   LearningRate 0.0758   Epoch: 2   Global Step: 107130   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:07,437-Speed 2630.19 samples/sec   Loss 12.1075   LearningRate 0.0758   Epoch: 2   Global Step: 107140   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:11,345-Speed 2620.59 samples/sec   Loss 12.0284   LearningRate 0.0758   Epoch: 2   Global Step: 107150   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:15,252-Speed 2621.84 samples/sec   Loss 12.1348   LearningRate 0.0758   Epoch: 2   Global Step: 107160   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:19,159-Speed 2621.61 samples/sec   Loss 12.1620   LearningRate 0.0758   Epoch: 2   Global Step: 107170   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:23,058-Speed 2627.29 samples/sec   Loss 12.2358   LearningRate 0.0758   Epoch: 2   Global Step: 107180   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:26,958-Speed 2626.71 samples/sec   Loss 12.1587   LearningRate 0.0758   Epoch: 2   Global Step: 107190   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:30,865-Speed 2621.44 samples/sec   Loss 12.2354   LearningRate 0.0758   Epoch: 2   Global Step: 107200   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:45:34,793-Speed 2607.67 samples/sec   Loss 12.0734   LearningRate 0.0758   Epoch: 2   Global Step: 107210   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:45:38,689-Speed 2628.79 samples/sec   Loss 12.0510   LearningRate 0.0758   Epoch: 2   Global Step: 107220   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:45:42,618-Speed 2606.82 samples/sec   Loss 12.1518   LearningRate 0.0758   Epoch: 2   Global Step: 107230   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:45:46,517-Speed 2627.40 samples/sec   Loss 12.2477   LearningRate 0.0758   Epoch: 2   Global Step: 107240   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:45:50,419-Speed 2624.94 samples/sec   Loss 12.0920   LearningRate 0.0758   Epoch: 2   Global Step: 107250   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:45:54,328-Speed 2620.53 samples/sec   Loss 12.2334   LearningRate 0.0758   Epoch: 2   Global Step: 107260   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:45:58,226-Speed 2627.55 samples/sec   Loss 12.1161   LearningRate 0.0758   Epoch: 2   Global Step: 107270   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:02,130-Speed 2623.21 samples/sec   Loss 12.1774   LearningRate 0.0758   Epoch: 2   Global Step: 107280   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:06,053-Speed 2611.04 samples/sec   Loss 12.0556   LearningRate 0.0758   Epoch: 2   Global Step: 107290   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:09,950-Speed 2628.57 samples/sec   Loss 12.0803   LearningRate 0.0758   Epoch: 2   Global Step: 107300   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:13,851-Speed 2626.06 samples/sec   Loss 12.1002   LearningRate 0.0758   Epoch: 2   Global Step: 107310   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:46:17,728-Speed 2641.62 samples/sec   Loss 12.0647   LearningRate 0.0758   Epoch: 2   Global Step: 107320   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:21,626-Speed 2627.66 samples/sec   Loss 12.0520   LearningRate 0.0758   Epoch: 2   Global Step: 107330   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:25,525-Speed 2627.06 samples/sec   Loss 12.0692   LearningRate 0.0758   Epoch: 2   Global Step: 107340   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:29,419-Speed 2630.97 samples/sec   Loss 12.1605   LearningRate 0.0758   Epoch: 2   Global Step: 107350   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:33,316-Speed 2628.15 samples/sec   Loss 11.9945   LearningRate 0.0758   Epoch: 2   Global Step: 107360   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:37,228-Speed 2618.73 samples/sec   Loss 12.0780   LearningRate 0.0758   Epoch: 2   Global Step: 107370   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:41,126-Speed 2627.43 samples/sec   Loss 12.1782   LearningRate 0.0758   Epoch: 2   Global Step: 107380   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:45,034-Speed 2621.10 samples/sec   Loss 12.0654   LearningRate 0.0758   Epoch: 2   Global Step: 107390   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:48,943-Speed 2620.38 samples/sec   Loss 12.0674   LearningRate 0.0758   Epoch: 2   Global Step: 107400   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:52,834-Speed 2631.87 samples/sec   Loss 12.1024   LearningRate 0.0758   Epoch: 2   Global Step: 107410   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:46:56,714-Speed 2640.12 samples/sec   Loss 12.0788   LearningRate 0.0758   Epoch: 2   Global Step: 107420   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:47:00,638-Speed 2609.93 samples/sec   Loss 11.9677   LearningRate 0.0758   Epoch: 2   Global Step: 107430   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:47:04,535-Speed 2628.49 samples/sec   Loss 12.1818   LearningRate 0.0758   Epoch: 2   Global Step: 107440   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:47:08,469-Speed 2603.76 samples/sec   Loss 12.1219   LearningRate 0.0758   Epoch: 2   Global Step: 107450   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:47:12,363-Speed 2630.92 samples/sec   Loss 12.0534   LearningRate 0.0758   Epoch: 2   Global Step: 107460   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:47:16,285-Speed 2610.87 samples/sec   Loss 12.3211   LearningRate 0.0758   Epoch: 2   Global Step: 107470   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:47:20,177-Speed 2632.09 samples/sec   Loss 12.2885   LearningRate 0.0758   Epoch: 2   Global Step: 107480   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:47:24,054-Speed 2641.71 samples/sec   Loss 12.2851   LearningRate 0.0758   Epoch: 2   Global Step: 107490   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:27,944-Speed 2633.50 samples/sec   Loss 12.1051   LearningRate 0.0758   Epoch: 2   Global Step: 107500   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:31,846-Speed 2624.57 samples/sec   Loss 11.9399   LearningRate 0.0758   Epoch: 2   Global Step: 107510   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:35,745-Speed 2627.09 samples/sec   Loss 11.8788   LearningRate 0.0758   Epoch: 2   Global Step: 107520   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:39,641-Speed 2629.38 samples/sec   Loss 11.9618   LearningRate 0.0758   Epoch: 2   Global Step: 107530   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:43,532-Speed 2632.46 samples/sec   Loss 12.1550   LearningRate 0.0758   Epoch: 2   Global Step: 107540   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:47,449-Speed 2614.75 samples/sec   Loss 12.1620   LearningRate 0.0758   Epoch: 2   Global Step: 107550   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:51,351-Speed 2624.71 samples/sec   Loss 11.9934   LearningRate 0.0757   Epoch: 2   Global Step: 107560   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:55,246-Speed 2630.07 samples/sec   Loss 12.0206   LearningRate 0.0757   Epoch: 2   Global Step: 107570   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:47:59,135-Speed 2634.21 samples/sec   Loss 11.9787   LearningRate 0.0757   Epoch: 2   Global Step: 107580   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:48:03,025-Speed 2632.37 samples/sec   Loss 12.1328   LearningRate 0.0757   Epoch: 2   Global Step: 107590   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:07,022-Speed 2563.49 samples/sec   Loss 12.2072   LearningRate 0.0757   Epoch: 2   Global Step: 107600   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:10,913-Speed 2631.81 samples/sec   Loss 12.1734   LearningRate 0.0757   Epoch: 2   Global Step: 107610   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:14,806-Speed 2631.79 samples/sec   Loss 12.0023   LearningRate 0.0757   Epoch: 2   Global Step: 107620   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:18,707-Speed 2625.15 samples/sec   Loss 12.1927   LearningRate 0.0757   Epoch: 2   Global Step: 107630   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:22,620-Speed 2617.99 samples/sec   Loss 12.0313   LearningRate 0.0757   Epoch: 2   Global Step: 107640   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:26,513-Speed 2630.46 samples/sec   Loss 12.0624   LearningRate 0.0757   Epoch: 2   Global Step: 107650   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:30,409-Speed 2629.13 samples/sec   Loss 12.1299   LearningRate 0.0757   Epoch: 2   Global Step: 107660   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:34,313-Speed 2623.86 samples/sec   Loss 12.1157   LearningRate 0.0757   Epoch: 2   Global Step: 107670   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:38,205-Speed 2631.18 samples/sec   Loss 12.0022   LearningRate 0.0757   Epoch: 2   Global Step: 107680   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:42,090-Speed 2636.48 samples/sec   Loss 12.0722   LearningRate 0.0757   Epoch: 2   Global Step: 107690   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:45,977-Speed 2635.83 samples/sec   Loss 12.1946   LearningRate 0.0757   Epoch: 2   Global Step: 107700   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:49,871-Speed 2630.17 samples/sec   Loss 11.9895   LearningRate 0.0757   Epoch: 2   Global Step: 107710   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:53,786-Speed 2615.71 samples/sec   Loss 12.1514   LearningRate 0.0757   Epoch: 2   Global Step: 107720   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:48:57,697-Speed 2619.05 samples/sec   Loss 12.0017   LearningRate 0.0757   Epoch: 2   Global Step: 107730   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:01,597-Speed 2626.44 samples/sec   Loss 12.0900   LearningRate 0.0757   Epoch: 2   Global Step: 107740   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:05,492-Speed 2629.34 samples/sec   Loss 12.0690   LearningRate 0.0757   Epoch: 2   Global Step: 107750   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:09,426-Speed 2604.01 samples/sec   Loss 12.1163   LearningRate 0.0757   Epoch: 2   Global Step: 107760   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:13,320-Speed 2630.51 samples/sec   Loss 12.1115   LearningRate 0.0757   Epoch: 2   Global Step: 107770   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:17,213-Speed 2630.71 samples/sec   Loss 12.1700   LearningRate 0.0757   Epoch: 2   Global Step: 107780   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:21,112-Speed 2627.37 samples/sec   Loss 12.1181   LearningRate 0.0757   Epoch: 2   Global Step: 107790   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:49:24,989-Speed 2642.20 samples/sec   Loss 12.1057   LearningRate 0.0757   Epoch: 2   Global Step: 107800   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:28,906-Speed 2615.05 samples/sec   Loss 12.0184   LearningRate 0.0757   Epoch: 2   Global Step: 107810   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:32,812-Speed 2622.96 samples/sec   Loss 12.0238   LearningRate 0.0757   Epoch: 2   Global Step: 107820   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:36,718-Speed 2622.03 samples/sec   Loss 12.1087   LearningRate 0.0757   Epoch: 2   Global Step: 107830   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:40,649-Speed 2605.39 samples/sec   Loss 12.0065   LearningRate 0.0757   Epoch: 2   Global Step: 107840   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:44,544-Speed 2629.95 samples/sec   Loss 12.0919   LearningRate 0.0757   Epoch: 2   Global Step: 107850   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:48,441-Speed 2627.99 samples/sec   Loss 11.9539   LearningRate 0.0757   Epoch: 2   Global Step: 107860   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:52,340-Speed 2627.70 samples/sec   Loss 12.0886   LearningRate 0.0757   Epoch: 2   Global Step: 107870   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:49:56,236-Speed 2628.40 samples/sec   Loss 12.0829   LearningRate 0.0757   Epoch: 2   Global Step: 107880   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:00,131-Speed 2629.98 samples/sec   Loss 12.1296   LearningRate 0.0757   Epoch: 2   Global Step: 107890   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:04,036-Speed 2623.19 samples/sec   Loss 11.9621   LearningRate 0.0757   Epoch: 2   Global Step: 107900   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:50:07,933-Speed 2627.87 samples/sec   Loss 12.0196   LearningRate 0.0757   Epoch: 2   Global Step: 107910   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:50:11,830-Speed 2628.71 samples/sec   Loss 12.1181   LearningRate 0.0757   Epoch: 2   Global Step: 107920   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:50:15,718-Speed 2634.81 samples/sec   Loss 12.0675   LearningRate 0.0757   Epoch: 2   Global Step: 107930   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:19,614-Speed 2629.11 samples/sec   Loss 12.0756   LearningRate 0.0757   Epoch: 2   Global Step: 107940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:23,507-Speed 2631.19 samples/sec   Loss 12.1412   LearningRate 0.0757   Epoch: 2   Global Step: 107950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:27,402-Speed 2629.49 samples/sec   Loss 12.2661   LearningRate 0.0757   Epoch: 2   Global Step: 107960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:31,304-Speed 2625.18 samples/sec   Loss 12.1781   LearningRate 0.0757   Epoch: 2   Global Step: 107970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:35,201-Speed 2627.83 samples/sec   Loss 11.9912   LearningRate 0.0757   Epoch: 2   Global Step: 107980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:39,092-Speed 2632.51 samples/sec   Loss 12.1391   LearningRate 0.0757   Epoch: 2   Global Step: 107990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:42,990-Speed 2628.22 samples/sec   Loss 12.0429   LearningRate 0.0757   Epoch: 2   Global Step: 108000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:46,883-Speed 2630.60 samples/sec   Loss 12.0831   LearningRate 0.0757   Epoch: 2   Global Step: 108010   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:50,783-Speed 2626.66 samples/sec   Loss 12.1805   LearningRate 0.0757   Epoch: 2   Global Step: 108020   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:50:54,679-Speed 2628.64 samples/sec   Loss 12.0231   LearningRate 0.0757   Epoch: 2   Global Step: 108030   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:50:58,573-Speed 2630.80 samples/sec   Loss 12.0066   LearningRate 0.0756   Epoch: 2   Global Step: 108040   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:51:02,462-Speed 2633.43 samples/sec   Loss 12.2031   LearningRate 0.0756   Epoch: 2   Global Step: 108050   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:51:06,362-Speed 2626.44 samples/sec   Loss 11.9959   LearningRate 0.0756   Epoch: 2   Global Step: 108060   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:51:10,374-Speed 2552.74 samples/sec   Loss 12.0708   LearningRate 0.0756   Epoch: 2   Global Step: 108070   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:51:14,281-Speed 2621.23 samples/sec   Loss 12.1786   LearningRate 0.0756   Epoch: 2   Global Step: 108080   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:51:18,181-Speed 2626.63 samples/sec   Loss 12.0631   LearningRate 0.0756   Epoch: 2   Global Step: 108090   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:51:22,078-Speed 2627.94 samples/sec   Loss 12.0129   LearningRate 0.0756   Epoch: 2   Global Step: 108100   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:51:25,974-Speed 2629.65 samples/sec   Loss 12.2320   LearningRate 0.0756   Epoch: 2   Global Step: 108110   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:51:29,856-Speed 2638.64 samples/sec   Loss 12.2700   LearningRate 0.0756   Epoch: 2   Global Step: 108120   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:51:33,751-Speed 2629.40 samples/sec   Loss 12.1500   LearningRate 0.0756   Epoch: 2   Global Step: 108130   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:51:37,649-Speed 2627.19 samples/sec   Loss 12.1854   LearningRate 0.0756   Epoch: 2   Global Step: 108140   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:51:41,538-Speed 2633.94 samples/sec   Loss 12.0555   LearningRate 0.0756   Epoch: 2   Global Step: 108150   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:51:45,433-Speed 2629.23 samples/sec   Loss 12.0629   LearningRate 0.0756   Epoch: 2   Global Step: 108160   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:51:49,331-Speed 2627.84 samples/sec   Loss 12.0794   LearningRate 0.0756   Epoch: 2   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:51:53,227-Speed 2629.12 samples/sec   Loss 11.9393   LearningRate 0.0756   Epoch: 2   Global Step: 108180   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:51:57,125-Speed 2627.85 samples/sec   Loss 12.0834   LearningRate 0.0756   Epoch: 2   Global Step: 108190   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:52:01,021-Speed 2629.09 samples/sec   Loss 12.0955   LearningRate 0.0756   Epoch: 2   Global Step: 108200   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:52:04,913-Speed 2631.70 samples/sec   Loss 12.2419   LearningRate 0.0756   Epoch: 2   Global Step: 108210   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:52:08,810-Speed 2628.88 samples/sec   Loss 12.0509   LearningRate 0.0756   Epoch: 2   Global Step: 108220   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:12,712-Speed 2624.63 samples/sec   Loss 12.1429   LearningRate 0.0756   Epoch: 2   Global Step: 108230   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:16,640-Speed 2608.01 samples/sec   Loss 11.9966   LearningRate 0.0756   Epoch: 2   Global Step: 108240   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:20,536-Speed 2628.91 samples/sec   Loss 12.1471   LearningRate 0.0756   Epoch: 2   Global Step: 108250   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:24,456-Speed 2613.19 samples/sec   Loss 12.0613   LearningRate 0.0756   Epoch: 2   Global Step: 108260   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:28,455-Speed 2560.88 samples/sec   Loss 12.0975   LearningRate 0.0756   Epoch: 2   Global Step: 108270   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:32,347-Speed 2631.57 samples/sec   Loss 12.1424   LearningRate 0.0756   Epoch: 2   Global Step: 108280   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:36,265-Speed 2614.63 samples/sec   Loss 12.0658   LearningRate 0.0756   Epoch: 2   Global Step: 108290   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:40,228-Speed 2584.55 samples/sec   Loss 12.1081   LearningRate 0.0756   Epoch: 2   Global Step: 108300   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:44,123-Speed 2630.11 samples/sec   Loss 12.0102   LearningRate 0.0756   Epoch: 2   Global Step: 108310   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:52:48,019-Speed 2628.37 samples/sec   Loss 12.1857   LearningRate 0.0756   Epoch: 2   Global Step: 108320   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:52:51,893-Speed 2644.21 samples/sec   Loss 12.1094   LearningRate 0.0756   Epoch: 2   Global Step: 108330   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:52:55,788-Speed 2629.63 samples/sec   Loss 12.0892   LearningRate 0.0756   Epoch: 2   Global Step: 108340   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:52:59,694-Speed 2622.81 samples/sec   Loss 12.0654   LearningRate 0.0756   Epoch: 2   Global Step: 108350   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:03,593-Speed 2626.65 samples/sec   Loss 12.0869   LearningRate 0.0756   Epoch: 2   Global Step: 108360   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:07,489-Speed 2629.64 samples/sec   Loss 12.0914   LearningRate 0.0756   Epoch: 2   Global Step: 108370   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:11,386-Speed 2627.87 samples/sec   Loss 12.2122   LearningRate 0.0756   Epoch: 2   Global Step: 108380   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:15,285-Speed 2627.48 samples/sec   Loss 12.1212   LearningRate 0.0756   Epoch: 2   Global Step: 108390   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:19,183-Speed 2627.51 samples/sec   Loss 12.0516   LearningRate 0.0756   Epoch: 2   Global Step: 108400   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:23,079-Speed 2628.71 samples/sec   Loss 12.0799   LearningRate 0.0756   Epoch: 2   Global Step: 108410   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:26,998-Speed 2613.28 samples/sec   Loss 11.9285   LearningRate 0.0756   Epoch: 2   Global Step: 108420   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:30,894-Speed 2629.31 samples/sec   Loss 12.2517   LearningRate 0.0756   Epoch: 2   Global Step: 108430   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:53:34,799-Speed 2623.07 samples/sec   Loss 12.2082   LearningRate 0.0756   Epoch: 2   Global Step: 108440   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:53:38,720-Speed 2612.66 samples/sec   Loss 12.0538   LearningRate 0.0756   Epoch: 2   Global Step: 108450   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:42,624-Speed 2623.71 samples/sec   Loss 11.9938   LearningRate 0.0756   Epoch: 2   Global Step: 108460   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:46,529-Speed 2623.03 samples/sec   Loss 12.2881   LearningRate 0.0756   Epoch: 2   Global Step: 108470   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:50,431-Speed 2624.73 samples/sec   Loss 12.0087   LearningRate 0.0756   Epoch: 2   Global Step: 108480   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:54,354-Speed 2610.45 samples/sec   Loss 12.1033   LearningRate 0.0756   Epoch: 2   Global Step: 108490   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:53:58,256-Speed 2625.04 samples/sec   Loss 12.2493   LearningRate 0.0756   Epoch: 2   Global Step: 108500   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:54:02,157-Speed 2625.71 samples/sec   Loss 12.0941   LearningRate 0.0756   Epoch: 2   Global Step: 108510   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:54:06,062-Speed 2623.24 samples/sec   Loss 12.0865   LearningRate 0.0755   Epoch: 2   Global Step: 108520   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:54:09,961-Speed 2626.30 samples/sec   Loss 12.0467   LearningRate 0.0755   Epoch: 2   Global Step: 108530   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:54:13,863-Speed 2625.83 samples/sec   Loss 12.0775   LearningRate 0.0755   Epoch: 2   Global Step: 108540   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:54:17,757-Speed 2630.36 samples/sec   Loss 12.0388   LearningRate 0.0755   Epoch: 2   Global Step: 108550   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:21,668-Speed 2618.25 samples/sec   Loss 12.1829   LearningRate 0.0755   Epoch: 2   Global Step: 108560   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:25,566-Speed 2627.96 samples/sec   Loss 12.0848   LearningRate 0.0755   Epoch: 2   Global Step: 108570   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:29,456-Speed 2633.29 samples/sec   Loss 12.0913   LearningRate 0.0755   Epoch: 2   Global Step: 108580   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:33,351-Speed 2629.56 samples/sec   Loss 12.0622   LearningRate 0.0755   Epoch: 2   Global Step: 108590   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:37,260-Speed 2619.74 samples/sec   Loss 12.2147   LearningRate 0.0755   Epoch: 2   Global Step: 108600   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:41,144-Speed 2637.60 samples/sec   Loss 12.1002   LearningRate 0.0755   Epoch: 2   Global Step: 108610   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:45,039-Speed 2630.03 samples/sec   Loss 12.2254   LearningRate 0.0755   Epoch: 2   Global Step: 108620   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:48,931-Speed 2631.71 samples/sec   Loss 12.1150   LearningRate 0.0755   Epoch: 2   Global Step: 108630   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:52,822-Speed 2632.48 samples/sec   Loss 11.9998   LearningRate 0.0755   Epoch: 2   Global Step: 108640   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:54:56,716-Speed 2630.11 samples/sec   Loss 12.2454   LearningRate 0.0755   Epoch: 2   Global Step: 108650   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:00,611-Speed 2629.39 samples/sec   Loss 12.0263   LearningRate 0.0755   Epoch: 2   Global Step: 108660   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:04,558-Speed 2595.09 samples/sec   Loss 12.1181   LearningRate 0.0755   Epoch: 2   Global Step: 108670   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:08,450-Speed 2631.56 samples/sec   Loss 12.0422   LearningRate 0.0755   Epoch: 2   Global Step: 108680   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:12,354-Speed 2623.48 samples/sec   Loss 12.0632   LearningRate 0.0755   Epoch: 2   Global Step: 108690   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:16,248-Speed 2631.06 samples/sec   Loss 11.9831   LearningRate 0.0755   Epoch: 2   Global Step: 108700   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:20,155-Speed 2621.26 samples/sec   Loss 11.9876   LearningRate 0.0755   Epoch: 2   Global Step: 108710   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:24,052-Speed 2628.98 samples/sec   Loss 12.2634   LearningRate 0.0755   Epoch: 2   Global Step: 108720   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:28,002-Speed 2593.19 samples/sec   Loss 12.2988   LearningRate 0.0755   Epoch: 2   Global Step: 108730   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:31,904-Speed 2624.76 samples/sec   Loss 12.0849   LearningRate 0.0755   Epoch: 2   Global Step: 108740   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:35,783-Speed 2640.85 samples/sec   Loss 12.0204   LearningRate 0.0755   Epoch: 2   Global Step: 108750   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:55:39,672-Speed 2633.22 samples/sec   Loss 12.2000   LearningRate 0.0755   Epoch: 2   Global Step: 108760   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:55:43,578-Speed 2622.91 samples/sec   Loss 12.0983   LearningRate 0.0755   Epoch: 2   Global Step: 108770   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:55:47,479-Speed 2625.38 samples/sec   Loss 12.2004   LearningRate 0.0755   Epoch: 2   Global Step: 108780   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:55:51,380-Speed 2625.57 samples/sec   Loss 12.0197   LearningRate 0.0755   Epoch: 2   Global Step: 108790   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:55:55,286-Speed 2622.48 samples/sec   Loss 12.2006   LearningRate 0.0755   Epoch: 2   Global Step: 108800   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:55:59,192-Speed 2622.27 samples/sec   Loss 12.0625   LearningRate 0.0755   Epoch: 2   Global Step: 108810   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:03,096-Speed 2623.33 samples/sec   Loss 12.1204   LearningRate 0.0755   Epoch: 2   Global Step: 108820   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:07,000-Speed 2623.55 samples/sec   Loss 12.0633   LearningRate 0.0755   Epoch: 2   Global Step: 108830   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:10,902-Speed 2624.81 samples/sec   Loss 12.1104   LearningRate 0.0755   Epoch: 2   Global Step: 108840   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:14,804-Speed 2624.82 samples/sec   Loss 12.0619   LearningRate 0.0755   Epoch: 2   Global Step: 108850   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:18,706-Speed 2624.85 samples/sec   Loss 12.0789   LearningRate 0.0755   Epoch: 2   Global Step: 108860   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:56:22,609-Speed 2624.47 samples/sec   Loss 12.1355   LearningRate 0.0755   Epoch: 2   Global Step: 108870   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:56:26,510-Speed 2625.84 samples/sec   Loss 11.9669   LearningRate 0.0755   Epoch: 2   Global Step: 108880   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:56:30,414-Speed 2623.65 samples/sec   Loss 12.0681   LearningRate 0.0755   Epoch: 2   Global Step: 108890   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:56:34,316-Speed 2624.88 samples/sec   Loss 12.2126   LearningRate 0.0755   Epoch: 2   Global Step: 108900   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:56:38,193-Speed 2641.90 samples/sec   Loss 12.1523   LearningRate 0.0755   Epoch: 2   Global Step: 108910   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:42,091-Speed 2627.40 samples/sec   Loss 12.2089   LearningRate 0.0755   Epoch: 2   Global Step: 108920   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:45,993-Speed 2625.00 samples/sec   Loss 12.0243   LearningRate 0.0755   Epoch: 2   Global Step: 108930   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:49,899-Speed 2622.32 samples/sec   Loss 12.1898   LearningRate 0.0755   Epoch: 2   Global Step: 108940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:53,792-Speed 2630.67 samples/sec   Loss 12.2838   LearningRate 0.0755   Epoch: 2   Global Step: 108950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:56:57,688-Speed 2628.93 samples/sec   Loss 11.9669   LearningRate 0.0755   Epoch: 2   Global Step: 108960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:57:01,590-Speed 2625.39 samples/sec   Loss 12.1485   LearningRate 0.0755   Epoch: 2   Global Step: 108970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:57:05,494-Speed 2623.68 samples/sec   Loss 12.1333   LearningRate 0.0755   Epoch: 2   Global Step: 108980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:57:09,383-Speed 2633.30 samples/sec   Loss 12.1019   LearningRate 0.0755   Epoch: 2   Global Step: 108990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:57:13,275-Speed 2631.93 samples/sec   Loss 11.9807   LearningRate 0.0754   Epoch: 2   Global Step: 109000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:57:17,167-Speed 2631.67 samples/sec   Loss 11.9796   LearningRate 0.0754   Epoch: 2   Global Step: 109010   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:21,081-Speed 2617.10 samples/sec   Loss 12.0874   LearningRate 0.0754   Epoch: 2   Global Step: 109020   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:24,973-Speed 2631.60 samples/sec   Loss 12.1108   LearningRate 0.0754   Epoch: 2   Global Step: 109030   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:28,866-Speed 2630.89 samples/sec   Loss 11.9577   LearningRate 0.0754   Epoch: 2   Global Step: 109040   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:32,758-Speed 2631.26 samples/sec   Loss 12.1592   LearningRate 0.0754   Epoch: 2   Global Step: 109050   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:36,655-Speed 2628.26 samples/sec   Loss 12.1800   LearningRate 0.0754   Epoch: 2   Global Step: 109060   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:40,562-Speed 2621.72 samples/sec   Loss 12.0902   LearningRate 0.0754   Epoch: 2   Global Step: 109070   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:44,468-Speed 2622.27 samples/sec   Loss 12.0846   LearningRate 0.0754   Epoch: 2   Global Step: 109080   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:57:48,357-Speed 2633.64 samples/sec   Loss 12.1254   LearningRate 0.0754   Epoch: 2   Global Step: 109090   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:57:52,392-Speed 2538.82 samples/sec   Loss 12.1212   LearningRate 0.0754   Epoch: 2   Global Step: 109100   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:57:56,300-Speed 2621.11 samples/sec   Loss 12.1308   LearningRate 0.0754   Epoch: 2   Global Step: 109110   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:00,203-Speed 2624.49 samples/sec   Loss 12.1590   LearningRate 0.0754   Epoch: 2   Global Step: 109120   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:04,107-Speed 2623.64 samples/sec   Loss 12.1334   LearningRate 0.0754   Epoch: 2   Global Step: 109130   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:08,007-Speed 2625.80 samples/sec   Loss 11.9664   LearningRate 0.0754   Epoch: 2   Global Step: 109140   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:11,912-Speed 2623.05 samples/sec   Loss 12.0657   LearningRate 0.0754   Epoch: 2   Global Step: 109150   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:15,851-Speed 2600.59 samples/sec   Loss 12.0159   LearningRate 0.0754   Epoch: 2   Global Step: 109160   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:19,749-Speed 2627.42 samples/sec   Loss 12.2505   LearningRate 0.0754   Epoch: 2   Global Step: 109170   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:23,647-Speed 2627.52 samples/sec   Loss 12.2474   LearningRate 0.0754   Epoch: 2   Global Step: 109180   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:58:27,553-Speed 2622.80 samples/sec   Loss 12.0146   LearningRate 0.0754   Epoch: 2   Global Step: 109190   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:58:31,456-Speed 2624.16 samples/sec   Loss 12.1351   LearningRate 0.0754   Epoch: 2   Global Step: 109200   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:58:35,361-Speed 2622.66 samples/sec   Loss 12.2021   LearningRate 0.0754   Epoch: 2   Global Step: 109210   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:58:39,264-Speed 2624.06 samples/sec   Loss 12.0420   LearningRate 0.0754   Epoch: 2   Global Step: 109220   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 07:58:43,132-Speed 2648.59 samples/sec   Loss 12.1947   LearningRate 0.0754   Epoch: 2   Global Step: 109230   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:58:47,039-Speed 2621.71 samples/sec   Loss 11.9755   LearningRate 0.0754   Epoch: 2   Global Step: 109240   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:58:50,938-Speed 2627.05 samples/sec   Loss 12.0470   LearningRate 0.0754   Epoch: 2   Global Step: 109250   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:58:54,841-Speed 2624.07 samples/sec   Loss 12.0770   LearningRate 0.0754   Epoch: 2   Global Step: 109260   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:58:58,747-Speed 2621.84 samples/sec   Loss 12.0428   LearningRate 0.0754   Epoch: 2   Global Step: 109270   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:02,655-Speed 2621.20 samples/sec   Loss 12.0817   LearningRate 0.0754   Epoch: 2   Global Step: 109280   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:06,562-Speed 2621.44 samples/sec   Loss 12.0904   LearningRate 0.0754   Epoch: 2   Global Step: 109290   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:10,483-Speed 2611.90 samples/sec   Loss 12.0870   LearningRate 0.0754   Epoch: 2   Global Step: 109300   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:14,386-Speed 2624.53 samples/sec   Loss 12.1005   LearningRate 0.0754   Epoch: 2   Global Step: 109310   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:18,289-Speed 2624.41 samples/sec   Loss 11.9498   LearningRate 0.0754   Epoch: 2   Global Step: 109320   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:22,190-Speed 2625.65 samples/sec   Loss 12.1892   LearningRate 0.0754   Epoch: 2   Global Step: 109330   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 07:59:26,070-Speed 2640.12 samples/sec   Loss 12.0754   LearningRate 0.0754   Epoch: 2   Global Step: 109340   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:29,962-Speed 2631.35 samples/sec   Loss 12.9077   LearningRate 0.0754   Epoch: 2   Global Step: 109350   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 07:59:33,825-Speed 2651.20 samples/sec   Loss 12.7647   LearningRate 0.0754   Epoch: 2   Global Step: 109360   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 07:59:37,728-Speed 2624.37 samples/sec   Loss 12.5082   LearningRate 0.0754   Epoch: 2   Global Step: 109370   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 07:59:41,626-Speed 2628.04 samples/sec   Loss 12.2818   LearningRate 0.0754   Epoch: 2   Global Step: 109380   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 07:59:45,530-Speed 2623.67 samples/sec   Loss 12.2389   LearningRate 0.0754   Epoch: 2   Global Step: 109390   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 07:59:49,424-Speed 2630.68 samples/sec   Loss 12.1722   LearningRate 0.0754   Epoch: 2   Global Step: 109400   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 07:59:53,321-Speed 2628.39 samples/sec   Loss 12.2351   LearningRate 0.0754   Epoch: 2   Global Step: 109410   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 07:59:57,223-Speed 2624.73 samples/sec   Loss 12.2997   LearningRate 0.0754   Epoch: 2   Global Step: 109420   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:00:01,185-Speed 2584.70 samples/sec   Loss 12.1977   LearningRate 0.0754   Epoch: 2   Global Step: 109430   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:00:05,202-Speed 2550.40 samples/sec   Loss 12.1046   LearningRate 0.0754   Epoch: 2   Global Step: 109440   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:00:09,094-Speed 2631.25 samples/sec   Loss 12.0615   LearningRate 0.0754   Epoch: 2   Global Step: 109450   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:00:12,986-Speed 2632.22 samples/sec   Loss 12.2449   LearningRate 0.0754   Epoch: 2   Global Step: 109460   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:16,881-Speed 2629.54 samples/sec   Loss 12.1527   LearningRate 0.0753   Epoch: 2   Global Step: 109470   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:20,784-Speed 2625.02 samples/sec   Loss 12.1119   LearningRate 0.0753   Epoch: 2   Global Step: 109480   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:24,677-Speed 2630.42 samples/sec   Loss 12.2176   LearningRate 0.0753   Epoch: 2   Global Step: 109490   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:28,608-Speed 2605.86 samples/sec   Loss 12.0391   LearningRate 0.0753   Epoch: 2   Global Step: 109500   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:32,503-Speed 2629.26 samples/sec   Loss 12.0630   LearningRate 0.0753   Epoch: 2   Global Step: 109510   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:36,401-Speed 2628.32 samples/sec   Loss 11.9907   LearningRate 0.0753   Epoch: 2   Global Step: 109520   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:40,295-Speed 2629.70 samples/sec   Loss 12.1819   LearningRate 0.0753   Epoch: 2   Global Step: 109530   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:44,202-Speed 2621.34 samples/sec   Loss 12.0584   LearningRate 0.0753   Epoch: 2   Global Step: 109540   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:48,092-Speed 2633.39 samples/sec   Loss 12.0864   LearningRate 0.0753   Epoch: 2   Global Step: 109550   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:00:51,988-Speed 2629.12 samples/sec   Loss 12.0153   LearningRate 0.0753   Epoch: 2   Global Step: 109560   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:00:55,881-Speed 2631.28 samples/sec   Loss 12.1187   LearningRate 0.0753   Epoch: 2   Global Step: 109570   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:00:59,788-Speed 2621.53 samples/sec   Loss 12.1644   LearningRate 0.0753   Epoch: 2   Global Step: 109580   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:03,684-Speed 2628.49 samples/sec   Loss 12.1966   LearningRate 0.0753   Epoch: 2   Global Step: 109590   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:07,575-Speed 2632.76 samples/sec   Loss 12.1865   LearningRate 0.0753   Epoch: 2   Global Step: 109600   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:11,463-Speed 2634.37 samples/sec   Loss 12.1732   LearningRate 0.0753   Epoch: 2   Global Step: 109610   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:15,362-Speed 2626.92 samples/sec   Loss 12.0320   LearningRate 0.0753   Epoch: 2   Global Step: 109620   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:19,267-Speed 2622.44 samples/sec   Loss 11.9451   LearningRate 0.0753   Epoch: 2   Global Step: 109630   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:23,178-Speed 2619.39 samples/sec   Loss 12.0914   LearningRate 0.0753   Epoch: 2   Global Step: 109640   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:27,067-Speed 2633.54 samples/sec   Loss 12.0586   LearningRate 0.0753   Epoch: 2   Global Step: 109650   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:01:30,956-Speed 2633.73 samples/sec   Loss 12.0617   LearningRate 0.0753   Epoch: 2   Global Step: 109660   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:01:34,848-Speed 2631.59 samples/sec   Loss 12.0576   LearningRate 0.0753   Epoch: 2   Global Step: 109670   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:01:38,741-Speed 2630.73 samples/sec   Loss 12.1214   LearningRate 0.0753   Epoch: 2   Global Step: 109680   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:01:42,638-Speed 2628.35 samples/sec   Loss 12.0047   LearningRate 0.0753   Epoch: 2   Global Step: 109690   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:01:46,543-Speed 2623.68 samples/sec   Loss 12.1174   LearningRate 0.0753   Epoch: 2   Global Step: 109700   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:01:50,452-Speed 2620.13 samples/sec   Loss 12.1868   LearningRate 0.0753   Epoch: 2   Global Step: 109710   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:01:54,368-Speed 2615.97 samples/sec   Loss 12.2262   LearningRate 0.0753   Epoch: 2   Global Step: 109720   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:01:58,271-Speed 2624.03 samples/sec   Loss 12.2735   LearningRate 0.0753   Epoch: 2   Global Step: 109730   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:02,196-Speed 2609.58 samples/sec   Loss 12.1244   LearningRate 0.0753   Epoch: 2   Global Step: 109740   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:06,089-Speed 2631.30 samples/sec   Loss 12.0442   LearningRate 0.0753   Epoch: 2   Global Step: 109750   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:09,968-Speed 2640.62 samples/sec   Loss 12.0241   LearningRate 0.0753   Epoch: 2   Global Step: 109760   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:13,869-Speed 2625.44 samples/sec   Loss 12.1831   LearningRate 0.0753   Epoch: 2   Global Step: 109770   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:17,767-Speed 2627.78 samples/sec   Loss 12.0441   LearningRate 0.0753   Epoch: 2   Global Step: 109780   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:21,827-Speed 2522.94 samples/sec   Loss 11.9970   LearningRate 0.0753   Epoch: 2   Global Step: 109790   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:25,925-Speed 2499.01 samples/sec   Loss 12.0530   LearningRate 0.0753   Epoch: 2   Global Step: 109800   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:29,897-Speed 2578.74 samples/sec   Loss 12.0382   LearningRate 0.0753   Epoch: 2   Global Step: 109810   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:33,790-Speed 2631.20 samples/sec   Loss 11.9907   LearningRate 0.0753   Epoch: 2   Global Step: 109820   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:37,700-Speed 2619.74 samples/sec   Loss 12.0735   LearningRate 0.0753   Epoch: 2   Global Step: 109830   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:41,599-Speed 2626.43 samples/sec   Loss 12.0928   LearningRate 0.0753   Epoch: 2   Global Step: 109840   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:45,538-Speed 2600.45 samples/sec   Loss 11.9639   LearningRate 0.0753   Epoch: 2   Global Step: 109850   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:02:49,430-Speed 2631.88 samples/sec   Loss 12.1759   LearningRate 0.0753   Epoch: 2   Global Step: 109860   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:02:53,324-Speed 2630.81 samples/sec   Loss 12.1047   LearningRate 0.0753   Epoch: 2   Global Step: 109870   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:02:57,217-Speed 2631.20 samples/sec   Loss 12.0891   LearningRate 0.0753   Epoch: 2   Global Step: 109880   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:01,111-Speed 2630.38 samples/sec   Loss 12.1207   LearningRate 0.0753   Epoch: 2   Global Step: 109890   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:05,031-Speed 2613.07 samples/sec   Loss 11.9241   LearningRate 0.0753   Epoch: 2   Global Step: 109900   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:08,949-Speed 2614.05 samples/sec   Loss 12.1597   LearningRate 0.0753   Epoch: 2   Global Step: 109910   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:12,879-Speed 2606.25 samples/sec   Loss 11.9903   LearningRate 0.0753   Epoch: 2   Global Step: 109920   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:16,776-Speed 2628.77 samples/sec   Loss 12.1304   LearningRate 0.0753   Epoch: 2   Global Step: 109930   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:20,683-Speed 2621.47 samples/sec   Loss 12.0935   LearningRate 0.0753   Epoch: 2   Global Step: 109940   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:24,591-Speed 2620.92 samples/sec   Loss 12.1041   LearningRate 0.0752   Epoch: 2   Global Step: 109950   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:28,484-Speed 2631.21 samples/sec   Loss 12.1361   LearningRate 0.0752   Epoch: 2   Global Step: 109960   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:32,383-Speed 2626.77 samples/sec   Loss 12.0959   LearningRate 0.0752   Epoch: 2   Global Step: 109970   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:03:36,277-Speed 2630.69 samples/sec   Loss 12.2194   LearningRate 0.0752   Epoch: 2   Global Step: 109980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:03:40,189-Speed 2618.32 samples/sec   Loss 12.1815   LearningRate 0.0752   Epoch: 2   Global Step: 109990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:03:44,102-Speed 2617.02 samples/sec   Loss 11.9682   LearningRate 0.0752   Epoch: 2   Global Step: 110000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:04:27,269-[lfw][110000]XNorm: 22.173644
Training: 2022-04-13 08:04:27,270-[lfw][110000]Accuracy-Flip: 0.99650+-0.00189
Training: 2022-04-13 08:04:27,270-[lfw][110000]Accuracy-Highest: 0.99783
Training: 2022-04-13 08:05:17,321-[cfp_fp][110000]XNorm: 20.136800
Training: 2022-04-13 08:05:17,322-[cfp_fp][110000]Accuracy-Flip: 0.97729+-0.00721
Training: 2022-04-13 08:05:17,323-[cfp_fp][110000]Accuracy-Highest: 0.97986
Training: 2022-04-13 08:06:00,469-[agedb_30][110000]XNorm: 21.814672
Training: 2022-04-13 08:06:00,470-[agedb_30][110000]Accuracy-Flip: 0.96733+-0.00782
Training: 2022-04-13 08:06:00,470-[agedb_30][110000]Accuracy-Highest: 0.96750
Training: 2022-04-13 08:06:04,344-Speed 73.02 samples/sec   Loss 12.1267   LearningRate 0.0752   Epoch: 2   Global Step: 110010   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:06:08,220-Speed 2642.51 samples/sec   Loss 12.1057   LearningRate 0.0752   Epoch: 2   Global Step: 110020   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:06:12,106-Speed 2636.06 samples/sec   Loss 11.7171   LearningRate 0.0752   Epoch: 2   Global Step: 110030   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:06:15,979-Speed 2644.79 samples/sec   Loss 11.9920   LearningRate 0.0752   Epoch: 2   Global Step: 110040   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:06:19,865-Speed 2635.59 samples/sec   Loss 12.0496   LearningRate 0.0752   Epoch: 2   Global Step: 110050   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:06:23,744-Speed 2640.50 samples/sec   Loss 12.1492   LearningRate 0.0752   Epoch: 2   Global Step: 110060   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:06:27,627-Speed 2638.76 samples/sec   Loss 12.0734   LearningRate 0.0752   Epoch: 2   Global Step: 110070   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:06:31,510-Speed 2638.05 samples/sec   Loss 11.9498   LearningRate 0.0752   Epoch: 2   Global Step: 110080   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:06:35,395-Speed 2636.61 samples/sec   Loss 12.0721   LearningRate 0.0752   Epoch: 2   Global Step: 110090   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:06:39,289-Speed 2629.74 samples/sec   Loss 12.1161   LearningRate 0.0752   Epoch: 2   Global Step: 110100   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:06:43,177-Speed 2634.75 samples/sec   Loss 12.2796   LearningRate 0.0752   Epoch: 2   Global Step: 110110   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:06:47,070-Speed 2631.18 samples/sec   Loss 12.0230   LearningRate 0.0752   Epoch: 2   Global Step: 110120   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:06:50,969-Speed 2627.09 samples/sec   Loss 12.1282   LearningRate 0.0752   Epoch: 2   Global Step: 110130   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:06:54,874-Speed 2622.89 samples/sec   Loss 12.1854   LearningRate 0.0752   Epoch: 2   Global Step: 110140   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:06:58,753-Speed 2640.46 samples/sec   Loss 12.0519   LearningRate 0.0752   Epoch: 2   Global Step: 110150   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:02,651-Speed 2627.70 samples/sec   Loss 12.0336   LearningRate 0.0752   Epoch: 2   Global Step: 110160   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:06,552-Speed 2626.15 samples/sec   Loss 12.0500   LearningRate 0.0752   Epoch: 2   Global Step: 110170   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:10,450-Speed 2627.05 samples/sec   Loss 12.0462   LearningRate 0.0752   Epoch: 2   Global Step: 110180   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:14,356-Speed 2622.61 samples/sec   Loss 11.9131   LearningRate 0.0752   Epoch: 2   Global Step: 110190   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:18,283-Speed 2608.21 samples/sec   Loss 11.9506   LearningRate 0.0752   Epoch: 2   Global Step: 110200   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:22,189-Speed 2622.53 samples/sec   Loss 12.1381   LearningRate 0.0752   Epoch: 2   Global Step: 110210   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:26,100-Speed 2618.51 samples/sec   Loss 12.2486   LearningRate 0.0752   Epoch: 2   Global Step: 110220   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:30,008-Speed 2621.40 samples/sec   Loss 11.9714   LearningRate 0.0752   Epoch: 2   Global Step: 110230   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:33,915-Speed 2621.18 samples/sec   Loss 12.1201   LearningRate 0.0752   Epoch: 2   Global Step: 110240   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:37,802-Speed 2635.35 samples/sec   Loss 12.1608   LearningRate 0.0752   Epoch: 2   Global Step: 110250   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:41,713-Speed 2619.32 samples/sec   Loss 11.9358   LearningRate 0.0752   Epoch: 2   Global Step: 110260   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:45,625-Speed 2618.42 samples/sec   Loss 11.9078   LearningRate 0.0752   Epoch: 2   Global Step: 110270   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:49,539-Speed 2617.07 samples/sec   Loss 12.0952   LearningRate 0.0752   Epoch: 2   Global Step: 110280   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:53,444-Speed 2622.70 samples/sec   Loss 12.0628   LearningRate 0.0752   Epoch: 2   Global Step: 110290   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:07:57,345-Speed 2625.85 samples/sec   Loss 11.9827   LearningRate 0.0752   Epoch: 2   Global Step: 110300   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:08:01,382-Speed 2537.32 samples/sec   Loss 12.2289   LearningRate 0.0752   Epoch: 2   Global Step: 110310   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:08:05,280-Speed 2627.76 samples/sec   Loss 12.0559   LearningRate 0.0752   Epoch: 2   Global Step: 110320   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:08:09,195-Speed 2615.73 samples/sec   Loss 11.9937   LearningRate 0.0752   Epoch: 2   Global Step: 110330   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:08:13,086-Speed 2632.62 samples/sec   Loss 12.2329   LearningRate 0.0752   Epoch: 2   Global Step: 110340   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:16,990-Speed 2624.17 samples/sec   Loss 12.1646   LearningRate 0.0752   Epoch: 2   Global Step: 110350   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:20,911-Speed 2612.23 samples/sec   Loss 12.2336   LearningRate 0.0752   Epoch: 2   Global Step: 110360   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:24,809-Speed 2627.16 samples/sec   Loss 12.0953   LearningRate 0.0752   Epoch: 2   Global Step: 110370   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:28,740-Speed 2606.05 samples/sec   Loss 11.9805   LearningRate 0.0752   Epoch: 2   Global Step: 110380   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:32,643-Speed 2623.87 samples/sec   Loss 11.9999   LearningRate 0.0752   Epoch: 2   Global Step: 110390   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:36,543-Speed 2626.83 samples/sec   Loss 12.0766   LearningRate 0.0752   Epoch: 2   Global Step: 110400   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:40,447-Speed 2623.42 samples/sec   Loss 12.1347   LearningRate 0.0752   Epoch: 2   Global Step: 110410   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:44,345-Speed 2628.15 samples/sec   Loss 12.0173   LearningRate 0.0752   Epoch: 2   Global Step: 110420   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:48,242-Speed 2628.26 samples/sec   Loss 12.0979   LearningRate 0.0751   Epoch: 2   Global Step: 110430   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:08:52,143-Speed 2625.26 samples/sec   Loss 12.1335   LearningRate 0.0751   Epoch: 2   Global Step: 110440   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:08:56,039-Speed 2628.63 samples/sec   Loss 12.0513   LearningRate 0.0751   Epoch: 2   Global Step: 110450   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:08:59,942-Speed 2624.78 samples/sec   Loss 12.0830   LearningRate 0.0751   Epoch: 2   Global Step: 110460   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:09:03,840-Speed 2627.82 samples/sec   Loss 12.0301   LearningRate 0.0751   Epoch: 2   Global Step: 110470   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:07,740-Speed 2626.40 samples/sec   Loss 12.0721   LearningRate 0.0751   Epoch: 2   Global Step: 110480   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:11,642-Speed 2624.95 samples/sec   Loss 12.1823   LearningRate 0.0751   Epoch: 2   Global Step: 110490   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:15,540-Speed 2628.24 samples/sec   Loss 12.0432   LearningRate 0.0751   Epoch: 2   Global Step: 110500   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:19,462-Speed 2611.41 samples/sec   Loss 11.9884   LearningRate 0.0751   Epoch: 2   Global Step: 110510   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:23,370-Speed 2620.58 samples/sec   Loss 12.0531   LearningRate 0.0751   Epoch: 2   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:27,281-Speed 2619.01 samples/sec   Loss 11.9284   LearningRate 0.0751   Epoch: 2   Global Step: 110530   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:31,195-Speed 2617.11 samples/sec   Loss 12.0669   LearningRate 0.0751   Epoch: 2   Global Step: 110540   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:35,107-Speed 2618.12 samples/sec   Loss 12.0893   LearningRate 0.0751   Epoch: 2   Global Step: 110550   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:39,020-Speed 2617.47 samples/sec   Loss 12.2299   LearningRate 0.0751   Epoch: 2   Global Step: 110560   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:09:42,917-Speed 2628.83 samples/sec   Loss 12.0055   LearningRate 0.0751   Epoch: 2   Global Step: 110570   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:09:46,818-Speed 2625.75 samples/sec   Loss 12.2020   LearningRate 0.0751   Epoch: 2   Global Step: 110580   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:09:50,717-Speed 2626.64 samples/sec   Loss 11.9720   LearningRate 0.0751   Epoch: 2   Global Step: 110590   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:09:54,616-Speed 2626.89 samples/sec   Loss 12.0128   LearningRate 0.0751   Epoch: 2   Global Step: 110600   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:09:58,510-Speed 2630.75 samples/sec   Loss 12.0750   LearningRate 0.0751   Epoch: 2   Global Step: 110610   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:02,410-Speed 2625.98 samples/sec   Loss 12.0755   LearningRate 0.0751   Epoch: 2   Global Step: 110620   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:06,310-Speed 2627.02 samples/sec   Loss 12.0976   LearningRate 0.0751   Epoch: 2   Global Step: 110630   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:10,233-Speed 2610.42 samples/sec   Loss 11.9559   LearningRate 0.0751   Epoch: 2   Global Step: 110640   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:14,138-Speed 2623.19 samples/sec   Loss 12.0633   LearningRate 0.0751   Epoch: 2   Global Step: 110650   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:18,041-Speed 2624.07 samples/sec   Loss 12.0996   LearningRate 0.0751   Epoch: 2   Global Step: 110660   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:21,979-Speed 2600.72 samples/sec   Loss 12.0363   LearningRate 0.0751   Epoch: 2   Global Step: 110670   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:10:25,898-Speed 2613.90 samples/sec   Loss 12.1118   LearningRate 0.0751   Epoch: 2   Global Step: 110680   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:10:29,823-Speed 2609.84 samples/sec   Loss 11.9896   LearningRate 0.0751   Epoch: 2   Global Step: 110690   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:10:33,720-Speed 2628.74 samples/sec   Loss 11.9183   LearningRate 0.0751   Epoch: 2   Global Step: 110700   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:10:37,617-Speed 2628.07 samples/sec   Loss 11.9805   LearningRate 0.0751   Epoch: 2   Global Step: 110710   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:10:41,501-Speed 2637.46 samples/sec   Loss 12.0580   LearningRate 0.0751   Epoch: 2   Global Step: 110720   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:45,452-Speed 2592.32 samples/sec   Loss 12.1490   LearningRate 0.0751   Epoch: 2   Global Step: 110730   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:49,346-Speed 2630.11 samples/sec   Loss 12.0912   LearningRate 0.0751   Epoch: 2   Global Step: 110740   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:53,251-Speed 2623.50 samples/sec   Loss 12.0314   LearningRate 0.0751   Epoch: 2   Global Step: 110750   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:10:57,150-Speed 2626.81 samples/sec   Loss 12.0158   LearningRate 0.0751   Epoch: 2   Global Step: 110760   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:01,049-Speed 2626.92 samples/sec   Loss 12.1455   LearningRate 0.0751   Epoch: 2   Global Step: 110770   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:04,968-Speed 2613.49 samples/sec   Loss 12.2227   LearningRate 0.0751   Epoch: 2   Global Step: 110780   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:08,877-Speed 2620.65 samples/sec   Loss 12.2660   LearningRate 0.0751   Epoch: 2   Global Step: 110790   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:12,777-Speed 2626.50 samples/sec   Loss 12.0587   LearningRate 0.0751   Epoch: 2   Global Step: 110800   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:16,686-Speed 2620.43 samples/sec   Loss 12.0454   LearningRate 0.0751   Epoch: 2   Global Step: 110810   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:20,583-Speed 2628.19 samples/sec   Loss 12.0303   LearningRate 0.0751   Epoch: 2   Global Step: 110820   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:11:24,473-Speed 2632.88 samples/sec   Loss 12.0510   LearningRate 0.0751   Epoch: 2   Global Step: 110830   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:28,375-Speed 2625.53 samples/sec   Loss 11.9338   LearningRate 0.0751   Epoch: 2   Global Step: 110840   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:32,276-Speed 2625.93 samples/sec   Loss 12.1023   LearningRate 0.0751   Epoch: 2   Global Step: 110850   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:36,180-Speed 2623.02 samples/sec   Loss 11.9962   LearningRate 0.0751   Epoch: 2   Global Step: 110860   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:40,077-Speed 2627.91 samples/sec   Loss 12.0230   LearningRate 0.0751   Epoch: 2   Global Step: 110870   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:43,975-Speed 2628.37 samples/sec   Loss 11.8651   LearningRate 0.0751   Epoch: 2   Global Step: 110880   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:47,876-Speed 2625.07 samples/sec   Loss 12.1158   LearningRate 0.0751   Epoch: 2   Global Step: 110890   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:51,782-Speed 2623.05 samples/sec   Loss 12.2190   LearningRate 0.0751   Epoch: 2   Global Step: 110900   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:55,683-Speed 2625.10 samples/sec   Loss 12.0310   LearningRate 0.0750   Epoch: 2   Global Step: 110910   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:11:59,588-Speed 2623.23 samples/sec   Loss 12.0406   LearningRate 0.0750   Epoch: 2   Global Step: 110920   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:03,476-Speed 2634.24 samples/sec   Loss 12.0697   LearningRate 0.0750   Epoch: 2   Global Step: 110930   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:07,371-Speed 2630.00 samples/sec   Loss 12.0359   LearningRate 0.0750   Epoch: 2   Global Step: 110940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:11,273-Speed 2624.91 samples/sec   Loss 11.9402   LearningRate 0.0750   Epoch: 2   Global Step: 110950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:15,174-Speed 2625.62 samples/sec   Loss 12.2030   LearningRate 0.0750   Epoch: 2   Global Step: 110960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:19,075-Speed 2625.42 samples/sec   Loss 12.0478   LearningRate 0.0750   Epoch: 2   Global Step: 110970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:22,978-Speed 2624.56 samples/sec   Loss 11.9715   LearningRate 0.0750   Epoch: 2   Global Step: 110980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:26,876-Speed 2627.48 samples/sec   Loss 12.0454   LearningRate 0.0750   Epoch: 2   Global Step: 110990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:30,788-Speed 2618.87 samples/sec   Loss 12.1329   LearningRate 0.0750   Epoch: 2   Global Step: 111000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:34,693-Speed 2622.58 samples/sec   Loss 12.1216   LearningRate 0.0750   Epoch: 2   Global Step: 111010   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:38,595-Speed 2625.14 samples/sec   Loss 12.0061   LearningRate 0.0750   Epoch: 2   Global Step: 111020   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:42,494-Speed 2626.64 samples/sec   Loss 12.1361   LearningRate 0.0750   Epoch: 2   Global Step: 111030   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:12:46,391-Speed 2628.73 samples/sec   Loss 12.0415   LearningRate 0.0750   Epoch: 2   Global Step: 111040   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:12:50,275-Speed 2636.84 samples/sec   Loss 11.9773   LearningRate 0.0750   Epoch: 2   Global Step: 111050   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:54,172-Speed 2628.39 samples/sec   Loss 12.1923   LearningRate 0.0750   Epoch: 2   Global Step: 111060   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:12:58,065-Speed 2631.40 samples/sec   Loss 12.0014   LearningRate 0.0750   Epoch: 2   Global Step: 111070   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:01,960-Speed 2629.70 samples/sec   Loss 11.9596   LearningRate 0.0750   Epoch: 2   Global Step: 111080   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:05,866-Speed 2622.09 samples/sec   Loss 12.0078   LearningRate 0.0750   Epoch: 2   Global Step: 111090   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:09,768-Speed 2624.96 samples/sec   Loss 12.1258   LearningRate 0.0750   Epoch: 2   Global Step: 111100   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:13,663-Speed 2629.48 samples/sec   Loss 12.0981   LearningRate 0.0750   Epoch: 2   Global Step: 111110   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:17,568-Speed 2623.15 samples/sec   Loss 11.9268   LearningRate 0.0750   Epoch: 2   Global Step: 111120   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:21,465-Speed 2628.61 samples/sec   Loss 12.0996   LearningRate 0.0750   Epoch: 2   Global Step: 111130   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:25,361-Speed 2628.95 samples/sec   Loss 12.0050   LearningRate 0.0750   Epoch: 2   Global Step: 111140   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:29,277-Speed 2616.06 samples/sec   Loss 12.0250   LearningRate 0.0750   Epoch: 2   Global Step: 111150   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:13:33,238-Speed 2585.54 samples/sec   Loss 12.0573   LearningRate 0.0750   Epoch: 2   Global Step: 111160   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:13:37,139-Speed 2625.77 samples/sec   Loss 12.0115   LearningRate 0.0750   Epoch: 2   Global Step: 111170   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:13:41,049-Speed 2618.98 samples/sec   Loss 11.9367   LearningRate 0.0750   Epoch: 2   Global Step: 111180   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:13:44,930-Speed 2639.59 samples/sec   Loss 12.0700   LearningRate 0.0750   Epoch: 2   Global Step: 111190   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:13:48,813-Speed 2637.89 samples/sec   Loss 12.0464   LearningRate 0.0750   Epoch: 2   Global Step: 111200   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:13:52,719-Speed 2622.27 samples/sec   Loss 12.0130   LearningRate 0.0750   Epoch: 2   Global Step: 111210   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:13:56,614-Speed 2629.69 samples/sec   Loss 12.0982   LearningRate 0.0750   Epoch: 2   Global Step: 111220   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:00,508-Speed 2630.29 samples/sec   Loss 12.0670   LearningRate 0.0750   Epoch: 2   Global Step: 111230   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:04,406-Speed 2627.44 samples/sec   Loss 12.0746   LearningRate 0.0750   Epoch: 2   Global Step: 111240   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:08,298-Speed 2631.51 samples/sec   Loss 12.0986   LearningRate 0.0750   Epoch: 2   Global Step: 111250   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:12,192-Speed 2630.28 samples/sec   Loss 12.1874   LearningRate 0.0750   Epoch: 2   Global Step: 111260   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:16,098-Speed 2622.20 samples/sec   Loss 12.1905   LearningRate 0.0750   Epoch: 2   Global Step: 111270   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:20,000-Speed 2624.79 samples/sec   Loss 12.1586   LearningRate 0.0750   Epoch: 2   Global Step: 111280   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:23,897-Speed 2628.13 samples/sec   Loss 12.0679   LearningRate 0.0750   Epoch: 2   Global Step: 111290   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:14:27,800-Speed 2624.41 samples/sec   Loss 12.1273   LearningRate 0.0750   Epoch: 2   Global Step: 111300   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:31,701-Speed 2625.33 samples/sec   Loss 12.1775   LearningRate 0.0750   Epoch: 2   Global Step: 111310   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:35,600-Speed 2626.95 samples/sec   Loss 11.9247   LearningRate 0.0750   Epoch: 2   Global Step: 111320   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:39,494-Speed 2630.00 samples/sec   Loss 12.0734   LearningRate 0.0750   Epoch: 2   Global Step: 111330   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:43,396-Speed 2625.24 samples/sec   Loss 12.1506   LearningRate 0.0750   Epoch: 2   Global Step: 111340   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:47,308-Speed 2618.11 samples/sec   Loss 12.1163   LearningRate 0.0750   Epoch: 2   Global Step: 111350   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:51,202-Speed 2630.15 samples/sec   Loss 12.0673   LearningRate 0.0750   Epoch: 2   Global Step: 111360   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:55,100-Speed 2627.52 samples/sec   Loss 12.1007   LearningRate 0.0750   Epoch: 2   Global Step: 111370   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:14:58,995-Speed 2629.99 samples/sec   Loss 12.1421   LearningRate 0.0750   Epoch: 2   Global Step: 111380   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:02,891-Speed 2628.86 samples/sec   Loss 12.1346   LearningRate 0.0749   Epoch: 2   Global Step: 111390   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:06,964-Speed 2514.57 samples/sec   Loss 11.9070   LearningRate 0.0749   Epoch: 2   Global Step: 111400   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:15:10,876-Speed 2618.49 samples/sec   Loss 12.0569   LearningRate 0.0749   Epoch: 2   Global Step: 111410   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:15:14,771-Speed 2629.51 samples/sec   Loss 12.1600   LearningRate 0.0749   Epoch: 2   Global Step: 111420   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:15:18,651-Speed 2639.31 samples/sec   Loss 11.9232   LearningRate 0.0749   Epoch: 2   Global Step: 111430   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:22,557-Speed 2629.72 samples/sec   Loss 12.0288   LearningRate 0.0749   Epoch: 2   Global Step: 111440   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:26,477-Speed 2613.05 samples/sec   Loss 12.0736   LearningRate 0.0749   Epoch: 2   Global Step: 111450   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:30,373-Speed 2628.85 samples/sec   Loss 11.9205   LearningRate 0.0749   Epoch: 2   Global Step: 111460   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:34,272-Speed 2627.46 samples/sec   Loss 12.1197   LearningRate 0.0749   Epoch: 2   Global Step: 111470   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:38,169-Speed 2627.92 samples/sec   Loss 11.9801   LearningRate 0.0749   Epoch: 2   Global Step: 111480   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:42,068-Speed 2626.89 samples/sec   Loss 11.8774   LearningRate 0.0749   Epoch: 2   Global Step: 111490   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:45,966-Speed 2627.41 samples/sec   Loss 12.0871   LearningRate 0.0749   Epoch: 2   Global Step: 111500   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:49,869-Speed 2624.28 samples/sec   Loss 11.9174   LearningRate 0.0749   Epoch: 2   Global Step: 111510   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:53,766-Speed 2628.43 samples/sec   Loss 11.9154   LearningRate 0.0749   Epoch: 2   Global Step: 111520   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:15:57,671-Speed 2622.82 samples/sec   Loss 11.9617   LearningRate 0.0749   Epoch: 2   Global Step: 111530   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:01,564-Speed 2630.62 samples/sec   Loss 11.9442   LearningRate 0.0749   Epoch: 2   Global Step: 111540   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:05,467-Speed 2624.52 samples/sec   Loss 11.8987   LearningRate 0.0749   Epoch: 2   Global Step: 111550   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:09,372-Speed 2623.04 samples/sec   Loss 11.8983   LearningRate 0.0749   Epoch: 2   Global Step: 111560   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:13,273-Speed 2625.94 samples/sec   Loss 11.9609   LearningRate 0.0749   Epoch: 2   Global Step: 111570   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:17,178-Speed 2622.50 samples/sec   Loss 11.9902   LearningRate 0.0749   Epoch: 2   Global Step: 111580   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:21,079-Speed 2626.02 samples/sec   Loss 12.0311   LearningRate 0.0749   Epoch: 2   Global Step: 111590   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:24,980-Speed 2625.09 samples/sec   Loss 11.8949   LearningRate 0.0749   Epoch: 2   Global Step: 111600   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:28,882-Speed 2624.62 samples/sec   Loss 12.0549   LearningRate 0.0749   Epoch: 2   Global Step: 111610   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:32,784-Speed 2625.25 samples/sec   Loss 12.0790   LearningRate 0.0749   Epoch: 2   Global Step: 111620   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:36,673-Speed 2633.45 samples/sec   Loss 11.9914   LearningRate 0.0749   Epoch: 2   Global Step: 111630   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:16:40,554-Speed 2640.62 samples/sec   Loss 11.9892   LearningRate 0.0749   Epoch: 2   Global Step: 111640   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:16:44,446-Speed 2631.39 samples/sec   Loss 11.9645   LearningRate 0.0749   Epoch: 2   Global Step: 111650   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:16:48,345-Speed 2627.10 samples/sec   Loss 12.0487   LearningRate 0.0749   Epoch: 2   Global Step: 111660   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:16:52,244-Speed 2627.15 samples/sec   Loss 11.8452   LearningRate 0.0749   Epoch: 2   Global Step: 111670   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:16:56,142-Speed 2627.62 samples/sec   Loss 12.0021   LearningRate 0.0749   Epoch: 2   Global Step: 111680   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:17:00,043-Speed 2625.40 samples/sec   Loss 11.9385   LearningRate 0.0749   Epoch: 2   Global Step: 111690   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:17:03,947-Speed 2623.33 samples/sec   Loss 11.9288   LearningRate 0.0749   Epoch: 2   Global Step: 111700   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:17:07,841-Speed 2629.95 samples/sec   Loss 12.0591   LearningRate 0.0749   Epoch: 2   Global Step: 111710   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:17:11,724-Speed 2638.48 samples/sec   Loss 11.9210   LearningRate 0.0749   Epoch: 2   Global Step: 111720   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:15,619-Speed 2629.90 samples/sec   Loss 12.0594   LearningRate 0.0749   Epoch: 2   Global Step: 111730   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:19,512-Speed 2630.78 samples/sec   Loss 12.0731   LearningRate 0.0749   Epoch: 2   Global Step: 111740   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:23,423-Speed 2619.48 samples/sec   Loss 12.0032   LearningRate 0.0749   Epoch: 2   Global Step: 111750   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:27,326-Speed 2624.00 samples/sec   Loss 12.0037   LearningRate 0.0749   Epoch: 2   Global Step: 111760   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:31,242-Speed 2615.74 samples/sec   Loss 12.0209   LearningRate 0.0749   Epoch: 2   Global Step: 111770   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:35,145-Speed 2624.02 samples/sec   Loss 12.0444   LearningRate 0.0749   Epoch: 2   Global Step: 111780   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:39,058-Speed 2617.04 samples/sec   Loss 12.0102   LearningRate 0.0749   Epoch: 2   Global Step: 111790   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:42,962-Speed 2623.59 samples/sec   Loss 12.0331   LearningRate 0.0749   Epoch: 2   Global Step: 111800   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:46,870-Speed 2621.20 samples/sec   Loss 11.9762   LearningRate 0.0749   Epoch: 2   Global Step: 111810   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:17:50,767-Speed 2628.80 samples/sec   Loss 12.0957   LearningRate 0.0749   Epoch: 2   Global Step: 111820   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:17:54,672-Speed 2622.55 samples/sec   Loss 11.9502   LearningRate 0.0749   Epoch: 2   Global Step: 111830   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:17:58,568-Speed 2629.04 samples/sec   Loss 12.0217   LearningRate 0.0749   Epoch: 2   Global Step: 111840   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:02,468-Speed 2626.09 samples/sec   Loss 11.9913   LearningRate 0.0749   Epoch: 2   Global Step: 111850   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:06,367-Speed 2626.88 samples/sec   Loss 12.0263   LearningRate 0.0749   Epoch: 2   Global Step: 111860   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:10,262-Speed 2629.51 samples/sec   Loss 11.9425   LearningRate 0.0748   Epoch: 2   Global Step: 111870   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:14,177-Speed 2616.55 samples/sec   Loss 11.9433   LearningRate 0.0748   Epoch: 2   Global Step: 111880   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:18,126-Speed 2593.59 samples/sec   Loss 11.9493   LearningRate 0.0748   Epoch: 2   Global Step: 111890   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:22,019-Speed 2631.50 samples/sec   Loss 11.9988   LearningRate 0.0748   Epoch: 2   Global Step: 111900   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:25,920-Speed 2625.92 samples/sec   Loss 11.9539   LearningRate 0.0748   Epoch: 2   Global Step: 111910   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:29,818-Speed 2627.90 samples/sec   Loss 11.9408   LearningRate 0.0748   Epoch: 2   Global Step: 111920   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:18:33,700-Speed 2638.59 samples/sec   Loss 12.1026   LearningRate 0.0748   Epoch: 2   Global Step: 111930   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:37,590-Speed 2632.79 samples/sec   Loss 12.0664   LearningRate 0.0748   Epoch: 2   Global Step: 111940   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:41,485-Speed 2629.02 samples/sec   Loss 12.0809   LearningRate 0.0748   Epoch: 2   Global Step: 111950   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:45,379-Speed 2631.22 samples/sec   Loss 12.1411   LearningRate 0.0748   Epoch: 2   Global Step: 111960   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:49,274-Speed 2629.36 samples/sec   Loss 12.0386   LearningRate 0.0748   Epoch: 2   Global Step: 111970   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:53,168-Speed 2631.02 samples/sec   Loss 12.2088   LearningRate 0.0748   Epoch: 2   Global Step: 111980   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:18:57,061-Speed 2631.01 samples/sec   Loss 12.0630   LearningRate 0.0748   Epoch: 2   Global Step: 111990   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:19:00,953-Speed 2631.97 samples/sec   Loss 12.0293   LearningRate 0.0748   Epoch: 2   Global Step: 112000   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:19:04,855-Speed 2624.50 samples/sec   Loss 11.9446   LearningRate 0.0748   Epoch: 2   Global Step: 112010   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:19:08,760-Speed 2622.90 samples/sec   Loss 11.9822   LearningRate 0.0748   Epoch: 2   Global Step: 112020   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:19:12,668-Speed 2620.80 samples/sec   Loss 12.1547   LearningRate 0.0748   Epoch: 2   Global Step: 112030   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:19:16,637-Speed 2580.86 samples/sec   Loss 12.0148   LearningRate 0.0748   Epoch: 2   Global Step: 112040   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:19:20,619-Speed 2572.68 samples/sec   Loss 12.0033   LearningRate 0.0748   Epoch: 2   Global Step: 112050   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:19:24,524-Speed 2622.88 samples/sec   Loss 12.0635   LearningRate 0.0748   Epoch: 2   Global Step: 112060   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:19:28,418-Speed 2629.65 samples/sec   Loss 11.9274   LearningRate 0.0748   Epoch: 2   Global Step: 112070   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:19:32,293-Speed 2644.20 samples/sec   Loss 12.0352   LearningRate 0.0748   Epoch: 2   Global Step: 112080   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:19:36,218-Speed 2609.38 samples/sec   Loss 12.0042   LearningRate 0.0748   Epoch: 2   Global Step: 112090   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:19:40,121-Speed 2624.24 samples/sec   Loss 11.9203   LearningRate 0.0748   Epoch: 2   Global Step: 112100   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:19:44,027-Speed 2621.75 samples/sec   Loss 11.8764   LearningRate 0.0748   Epoch: 2   Global Step: 112110   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:19:47,925-Speed 2627.78 samples/sec   Loss 11.9597   LearningRate 0.0748   Epoch: 2   Global Step: 112120   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:19:51,824-Speed 2627.12 samples/sec   Loss 11.9333   LearningRate 0.0748   Epoch: 2   Global Step: 112130   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:19:55,724-Speed 2626.53 samples/sec   Loss 12.0401   LearningRate 0.0748   Epoch: 2   Global Step: 112140   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:19:59,619-Speed 2629.36 samples/sec   Loss 11.9055   LearningRate 0.0748   Epoch: 2   Global Step: 112150   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:03,524-Speed 2622.67 samples/sec   Loss 12.1210   LearningRate 0.0748   Epoch: 2   Global Step: 112160   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:07,429-Speed 2622.77 samples/sec   Loss 11.8737   LearningRate 0.0748   Epoch: 2   Global Step: 112170   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:11,316-Speed 2635.16 samples/sec   Loss 12.0191   LearningRate 0.0748   Epoch: 2   Global Step: 112180   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:15,229-Speed 2617.90 samples/sec   Loss 12.1105   LearningRate 0.0748   Epoch: 2   Global Step: 112190   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:19,135-Speed 2621.84 samples/sec   Loss 12.0320   LearningRate 0.0748   Epoch: 2   Global Step: 112200   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:23,030-Speed 2629.85 samples/sec   Loss 12.0288   LearningRate 0.0748   Epoch: 2   Global Step: 112210   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:26,925-Speed 2629.89 samples/sec   Loss 11.9828   LearningRate 0.0748   Epoch: 2   Global Step: 112220   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:30,821-Speed 2628.71 samples/sec   Loss 12.1044   LearningRate 0.0748   Epoch: 2   Global Step: 112230   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:34,720-Speed 2627.60 samples/sec   Loss 11.9982   LearningRate 0.0748   Epoch: 2   Global Step: 112240   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:38,648-Speed 2607.57 samples/sec   Loss 11.9456   LearningRate 0.0748   Epoch: 2   Global Step: 112250   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:42,547-Speed 2626.75 samples/sec   Loss 12.1191   LearningRate 0.0748   Epoch: 2   Global Step: 112260   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:46,442-Speed 2629.77 samples/sec   Loss 12.1059   LearningRate 0.0748   Epoch: 2   Global Step: 112270   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:20:50,343-Speed 2625.79 samples/sec   Loss 11.8242   LearningRate 0.0748   Epoch: 2   Global Step: 112280   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:20:54,241-Speed 2627.90 samples/sec   Loss 12.0107   LearningRate 0.0748   Epoch: 2   Global Step: 112290   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:20:58,134-Speed 2631.25 samples/sec   Loss 12.0785   LearningRate 0.0748   Epoch: 2   Global Step: 112300   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:02,028-Speed 2630.04 samples/sec   Loss 11.9353   LearningRate 0.0748   Epoch: 2   Global Step: 112310   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:05,929-Speed 2626.49 samples/sec   Loss 11.9822   LearningRate 0.0748   Epoch: 2   Global Step: 112320   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:09,836-Speed 2621.69 samples/sec   Loss 11.8550   LearningRate 0.0748   Epoch: 2   Global Step: 112330   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:13,735-Speed 2626.46 samples/sec   Loss 12.0751   LearningRate 0.0748   Epoch: 2   Global Step: 112340   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:17,634-Speed 2626.72 samples/sec   Loss 12.0883   LearningRate 0.0747   Epoch: 2   Global Step: 112350   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:21,538-Speed 2624.18 samples/sec   Loss 12.1154   LearningRate 0.0747   Epoch: 2   Global Step: 112360   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:25,439-Speed 2625.78 samples/sec   Loss 11.9502   LearningRate 0.0747   Epoch: 2   Global Step: 112370   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:29,340-Speed 2625.69 samples/sec   Loss 12.1603   LearningRate 0.0747   Epoch: 2   Global Step: 112380   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:21:33,240-Speed 2626.39 samples/sec   Loss 12.0122   LearningRate 0.0747   Epoch: 2   Global Step: 112390   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:21:37,117-Speed 2641.97 samples/sec   Loss 11.8769   LearningRate 0.0747   Epoch: 2   Global Step: 112400   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:41,011-Speed 2629.94 samples/sec   Loss 12.0427   LearningRate 0.0747   Epoch: 2   Global Step: 112410   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:44,930-Speed 2613.79 samples/sec   Loss 12.1626   LearningRate 0.0747   Epoch: 2   Global Step: 112420   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:48,827-Speed 2627.66 samples/sec   Loss 12.0736   LearningRate 0.0747   Epoch: 2   Global Step: 112430   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:52,724-Speed 2628.65 samples/sec   Loss 12.0678   LearningRate 0.0747   Epoch: 2   Global Step: 112440   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:21:56,618-Speed 2630.79 samples/sec   Loss 12.0033   LearningRate 0.0747   Epoch: 2   Global Step: 112450   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:00,513-Speed 2629.99 samples/sec   Loss 12.2211   LearningRate 0.0747   Epoch: 2   Global Step: 112460   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:04,406-Speed 2631.07 samples/sec   Loss 12.2069   LearningRate 0.0747   Epoch: 2   Global Step: 112470   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:08,302-Speed 2628.36 samples/sec   Loss 12.0117   LearningRate 0.0747   Epoch: 2   Global Step: 112480   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:12,203-Speed 2625.66 samples/sec   Loss 11.8441   LearningRate 0.0747   Epoch: 2   Global Step: 112490   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:16,113-Speed 2619.67 samples/sec   Loss 12.0361   LearningRate 0.0747   Epoch: 2   Global Step: 112500   Fp16 Grad Scale: 262144   Required: 81 hours
Training: 2022-04-13 08:22:19,989-Speed 2642.85 samples/sec   Loss 12.1023   LearningRate 0.0747   Epoch: 2   Global Step: 112510   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:23,885-Speed 2629.12 samples/sec   Loss 12.2041   LearningRate 0.0747   Epoch: 2   Global Step: 112520   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:27,787-Speed 2624.73 samples/sec   Loss 11.9451   LearningRate 0.0747   Epoch: 2   Global Step: 112530   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:22:31,670-Speed 2638.69 samples/sec   Loss 11.8222   LearningRate 0.0747   Epoch: 2   Global Step: 112540   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:22:35,565-Speed 2629.62 samples/sec   Loss 12.0565   LearningRate 0.0747   Epoch: 2   Global Step: 112550   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:22:39,465-Speed 2625.90 samples/sec   Loss 11.8290   LearningRate 0.0747   Epoch: 2   Global Step: 112560   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:22:43,373-Speed 2620.97 samples/sec   Loss 11.7630   LearningRate 0.0747   Epoch: 2   Global Step: 112570   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:22:47,272-Speed 2627.61 samples/sec   Loss 11.9883   LearningRate 0.0747   Epoch: 2   Global Step: 112580   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:22:51,167-Speed 2629.00 samples/sec   Loss 11.9720   LearningRate 0.0747   Epoch: 2   Global Step: 112590   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:22:55,091-Speed 2610.76 samples/sec   Loss 12.0850   LearningRate 0.0747   Epoch: 2   Global Step: 112600   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:22:58,983-Speed 2631.75 samples/sec   Loss 12.0889   LearningRate 0.0747   Epoch: 2   Global Step: 112610   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:23:02,878-Speed 2629.16 samples/sec   Loss 12.1169   LearningRate 0.0747   Epoch: 2   Global Step: 112620   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:23:06,773-Speed 2630.08 samples/sec   Loss 11.9978   LearningRate 0.0747   Epoch: 2   Global Step: 112630   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:23:10,666-Speed 2631.18 samples/sec   Loss 12.0756   LearningRate 0.0747   Epoch: 2   Global Step: 112640   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:23:14,559-Speed 2631.07 samples/sec   Loss 12.0119   LearningRate 0.0747   Epoch: 2   Global Step: 112650   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:23:18,455-Speed 2629.14 samples/sec   Loss 11.9005   LearningRate 0.0747   Epoch: 2   Global Step: 112660   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:23:22,360-Speed 2623.11 samples/sec   Loss 12.1449   LearningRate 0.0747   Epoch: 2   Global Step: 112670   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:23:26,252-Speed 2631.50 samples/sec   Loss 12.0196   LearningRate 0.0747   Epoch: 2   Global Step: 112680   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:23:30,148-Speed 2628.83 samples/sec   Loss 12.0416   LearningRate 0.0747   Epoch: 2   Global Step: 112690   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:23:34,044-Speed 2629.00 samples/sec   Loss 12.0107   LearningRate 0.0747   Epoch: 2   Global Step: 112700   Fp16 Grad Scale: 131072   Required: 81 hours
Training: 2022-04-13 08:23:37,882-Speed 2668.43 samples/sec   Loss 12.0006   LearningRate 0.0747   Epoch: 2   Global Step: 112710   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:23:41,780-Speed 2628.26 samples/sec   Loss 12.1892   LearningRate 0.0747   Epoch: 2   Global Step: 112720   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:23:45,670-Speed 2632.70 samples/sec   Loss 11.8504   LearningRate 0.0747   Epoch: 2   Global Step: 112730   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:23:49,560-Speed 2632.87 samples/sec   Loss 11.9023   LearningRate 0.0747   Epoch: 2   Global Step: 112740   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:23:53,451-Speed 2632.48 samples/sec   Loss 12.1347   LearningRate 0.0747   Epoch: 2   Global Step: 112750   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:23:57,347-Speed 2628.98 samples/sec   Loss 12.1371   LearningRate 0.0747   Epoch: 2   Global Step: 112760   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:24:01,240-Speed 2631.43 samples/sec   Loss 11.9363   LearningRate 0.0747   Epoch: 2   Global Step: 112770   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:24:05,156-Speed 2615.51 samples/sec   Loss 12.0157   LearningRate 0.0747   Epoch: 2   Global Step: 112780   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:24:09,050-Speed 2630.36 samples/sec   Loss 11.8309   LearningRate 0.0747   Epoch: 2   Global Step: 112790   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:24:12,948-Speed 2627.63 samples/sec   Loss 12.1599   LearningRate 0.0747   Epoch: 2   Global Step: 112800   Fp16 Grad Scale: 16384   Required: 81 hours
Training: 2022-04-13 08:24:16,844-Speed 2628.86 samples/sec   Loss 11.9169   LearningRate 0.0747   Epoch: 2   Global Step: 112810   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:20,743-Speed 2626.67 samples/sec   Loss 12.1383   LearningRate 0.0747   Epoch: 2   Global Step: 112820   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:24,648-Speed 2623.03 samples/sec   Loss 12.0775   LearningRate 0.0746   Epoch: 2   Global Step: 112830   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:28,556-Speed 2621.12 samples/sec   Loss 11.9599   LearningRate 0.0746   Epoch: 2   Global Step: 112840   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:32,453-Speed 2628.49 samples/sec   Loss 12.0101   LearningRate 0.0746   Epoch: 2   Global Step: 112850   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:36,348-Speed 2629.43 samples/sec   Loss 11.9501   LearningRate 0.0746   Epoch: 2   Global Step: 112860   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:40,243-Speed 2629.81 samples/sec   Loss 11.9731   LearningRate 0.0746   Epoch: 2   Global Step: 112870   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:44,135-Speed 2631.67 samples/sec   Loss 12.0303   LearningRate 0.0746   Epoch: 2   Global Step: 112880   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:48,031-Speed 2628.87 samples/sec   Loss 12.0651   LearningRate 0.0746   Epoch: 2   Global Step: 112890   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:51,922-Speed 2632.33 samples/sec   Loss 12.0153   LearningRate 0.0746   Epoch: 2   Global Step: 112900   Fp16 Grad Scale: 32768   Required: 81 hours
Training: 2022-04-13 08:24:55,860-Speed 2600.97 samples/sec   Loss 12.0787   LearningRate 0.0746   Epoch: 2   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:24:59,775-Speed 2616.16 samples/sec   Loss 12.3927   LearningRate 0.0746   Epoch: 2   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:25:03,672-Speed 2628.47 samples/sec   Loss 12.2308   LearningRate 0.0746   Epoch: 2   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:25:07,571-Speed 2626.37 samples/sec   Loss 12.1560   LearningRate 0.0746   Epoch: 2   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:25:11,464-Speed 2631.56 samples/sec   Loss 12.1164   LearningRate 0.0746   Epoch: 2   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 81 hours
Training: 2022-04-13 08:25:15,358-Speed 2630.42 samples/sec   Loss 12.1217   LearningRate 0.0746   Epoch: 2   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:25:19,256-Speed 2627.37 samples/sec   Loss 12.2234   LearningRate 0.0746   Epoch: 2   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:25:23,149-Speed 2631.61 samples/sec   Loss 12.1140   LearningRate 0.0746   Epoch: 2   Global Step: 112980   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:25:27,044-Speed 2629.36 samples/sec   Loss 12.1619   LearningRate 0.0746   Epoch: 2   Global Step: 112990   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:25:30,980-Speed 2602.71 samples/sec   Loss 12.0638   LearningRate 0.0746   Epoch: 2   Global Step: 113000   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:25:34,875-Speed 2629.69 samples/sec   Loss 12.0059   LearningRate 0.0746   Epoch: 2   Global Step: 113010   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:25:38,766-Speed 2632.26 samples/sec   Loss 11.8868   LearningRate 0.0746   Epoch: 2   Global Step: 113020   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:25:42,663-Speed 2628.48 samples/sec   Loss 12.0018   LearningRate 0.0746   Epoch: 2   Global Step: 113030   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:25:46,555-Speed 2632.46 samples/sec   Loss 12.1525   LearningRate 0.0746   Epoch: 2   Global Step: 113040   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:25:50,446-Speed 2632.34 samples/sec   Loss 12.2091   LearningRate 0.0746   Epoch: 2   Global Step: 113050   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:25:54,357-Speed 2619.56 samples/sec   Loss 12.1360   LearningRate 0.0746   Epoch: 2   Global Step: 113060   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:25:58,276-Speed 2613.65 samples/sec   Loss 12.0051   LearningRate 0.0746   Epoch: 2   Global Step: 113070   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:02,168-Speed 2631.46 samples/sec   Loss 12.0198   LearningRate 0.0746   Epoch: 2   Global Step: 113080   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:06,059-Speed 2632.51 samples/sec   Loss 11.9788   LearningRate 0.0746   Epoch: 2   Global Step: 113090   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:09,952-Speed 2630.64 samples/sec   Loss 11.9409   LearningRate 0.0746   Epoch: 2   Global Step: 113100   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:13,859-Speed 2621.33 samples/sec   Loss 11.9200   LearningRate 0.0746   Epoch: 2   Global Step: 113110   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:26:17,758-Speed 2627.50 samples/sec   Loss 12.0594   LearningRate 0.0746   Epoch: 2   Global Step: 113120   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:26:21,644-Speed 2636.47 samples/sec   Loss 12.1731   LearningRate 0.0746   Epoch: 2   Global Step: 113130   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:25,542-Speed 2627.34 samples/sec   Loss 12.0195   LearningRate 0.0746   Epoch: 2   Global Step: 113140   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:29,439-Speed 2628.36 samples/sec   Loss 12.0515   LearningRate 0.0746   Epoch: 2   Global Step: 113150   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:33,331-Speed 2631.44 samples/sec   Loss 12.1742   LearningRate 0.0746   Epoch: 2   Global Step: 113160   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:37,226-Speed 2629.50 samples/sec   Loss 11.9841   LearningRate 0.0746   Epoch: 2   Global Step: 113170   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:41,131-Speed 2623.13 samples/sec   Loss 11.9449   LearningRate 0.0746   Epoch: 2   Global Step: 113180   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:45,035-Speed 2623.12 samples/sec   Loss 12.0150   LearningRate 0.0746   Epoch: 2   Global Step: 113190   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:48,924-Speed 2633.63 samples/sec   Loss 12.1008   LearningRate 0.0746   Epoch: 2   Global Step: 113200   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:52,823-Speed 2627.22 samples/sec   Loss 12.0444   LearningRate 0.0746   Epoch: 2   Global Step: 113210   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:26:56,737-Speed 2617.14 samples/sec   Loss 11.9431   LearningRate 0.0746   Epoch: 2   Global Step: 113220   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:00,636-Speed 2627.10 samples/sec   Loss 12.0265   LearningRate 0.0746   Epoch: 2   Global Step: 113230   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:27:04,583-Speed 2603.43 samples/sec   Loss 12.0393   LearningRate 0.0746   Epoch: 2   Global Step: 113240   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:27:08,457-Speed 2643.71 samples/sec   Loss 12.0966   LearningRate 0.0746   Epoch: 2   Global Step: 113250   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:12,359-Speed 2624.76 samples/sec   Loss 12.0331   LearningRate 0.0746   Epoch: 2   Global Step: 113260   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:16,258-Speed 2626.74 samples/sec   Loss 12.1442   LearningRate 0.0746   Epoch: 2   Global Step: 113270   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:20,153-Speed 2629.33 samples/sec   Loss 12.0276   LearningRate 0.0746   Epoch: 2   Global Step: 113280   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:24,048-Speed 2629.92 samples/sec   Loss 12.0091   LearningRate 0.0746   Epoch: 2   Global Step: 113290   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:27,940-Speed 2632.25 samples/sec   Loss 12.1560   LearningRate 0.0746   Epoch: 2   Global Step: 113300   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:31,831-Speed 2631.94 samples/sec   Loss 11.9394   LearningRate 0.0745   Epoch: 2   Global Step: 113310   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:35,732-Speed 2625.55 samples/sec   Loss 12.2046   LearningRate 0.0745   Epoch: 2   Global Step: 113320   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:39,631-Speed 2627.02 samples/sec   Loss 11.9135   LearningRate 0.0745   Epoch: 2   Global Step: 113330   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:43,527-Speed 2628.98 samples/sec   Loss 12.1166   LearningRate 0.0745   Epoch: 2   Global Step: 113340   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:47,403-Speed 2642.37 samples/sec   Loss 12.1063   LearningRate 0.0745   Epoch: 2   Global Step: 113350   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:51,296-Speed 2631.41 samples/sec   Loss 11.8695   LearningRate 0.0745   Epoch: 2   Global Step: 113360   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:55,190-Speed 2630.78 samples/sec   Loss 11.9594   LearningRate 0.0745   Epoch: 2   Global Step: 113370   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:27:59,082-Speed 2631.73 samples/sec   Loss 12.2204   LearningRate 0.0745   Epoch: 2   Global Step: 113380   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:28:02,973-Speed 2632.30 samples/sec   Loss 12.0029   LearningRate 0.0745   Epoch: 2   Global Step: 113390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:28:06,874-Speed 2625.44 samples/sec   Loss 11.9117   LearningRate 0.0745   Epoch: 2   Global Step: 113400   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:28:10,770-Speed 2629.52 samples/sec   Loss 12.0550   LearningRate 0.0745   Epoch: 2   Global Step: 113410   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:28:14,674-Speed 2623.48 samples/sec   Loss 12.1403   LearningRate 0.0745   Epoch: 2   Global Step: 113420   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:28:18,586-Speed 2617.53 samples/sec   Loss 11.8616   LearningRate 0.0745   Epoch: 2   Global Step: 113430   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:28:22,478-Speed 2632.19 samples/sec   Loss 11.9114   LearningRate 0.0745   Epoch: 2   Global Step: 113440   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:28:26,373-Speed 2630.01 samples/sec   Loss 11.9408   LearningRate 0.0745   Epoch: 2   Global Step: 113450   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:30,270-Speed 2627.51 samples/sec   Loss 11.8746   LearningRate 0.0745   Epoch: 2   Global Step: 113460   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:34,166-Speed 2629.33 samples/sec   Loss 12.0750   LearningRate 0.0745   Epoch: 2   Global Step: 113470   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:38,058-Speed 2631.73 samples/sec   Loss 12.0738   LearningRate 0.0745   Epoch: 2   Global Step: 113480   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:41,951-Speed 2631.43 samples/sec   Loss 12.0993   LearningRate 0.0745   Epoch: 2   Global Step: 113490   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:45,860-Speed 2620.38 samples/sec   Loss 12.0828   LearningRate 0.0745   Epoch: 2   Global Step: 113500   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:49,774-Speed 2616.84 samples/sec   Loss 12.0595   LearningRate 0.0745   Epoch: 2   Global Step: 113510   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:53,667-Speed 2631.13 samples/sec   Loss 12.0032   LearningRate 0.0745   Epoch: 2   Global Step: 113520   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:28:57,557-Speed 2632.96 samples/sec   Loss 12.0009   LearningRate 0.0745   Epoch: 2   Global Step: 113530   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:01,449-Speed 2631.65 samples/sec   Loss 12.1176   LearningRate 0.0745   Epoch: 2   Global Step: 113540   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:05,330-Speed 2638.84 samples/sec   Loss 11.9068   LearningRate 0.0745   Epoch: 2   Global Step: 113550   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:09,216-Speed 2635.65 samples/sec   Loss 12.0724   LearningRate 0.0745   Epoch: 2   Global Step: 113560   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:13,110-Speed 2631.00 samples/sec   Loss 12.0748   LearningRate 0.0745   Epoch: 2   Global Step: 113570   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:17,004-Speed 2630.87 samples/sec   Loss 11.9872   LearningRate 0.0745   Epoch: 2   Global Step: 113580   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:20,895-Speed 2631.85 samples/sec   Loss 12.0834   LearningRate 0.0745   Epoch: 2   Global Step: 113590   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:24,790-Speed 2630.30 samples/sec   Loss 12.0336   LearningRate 0.0745   Epoch: 2   Global Step: 113600   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:28,705-Speed 2616.10 samples/sec   Loss 12.0215   LearningRate 0.0745   Epoch: 2   Global Step: 113610   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:32,599-Speed 2630.39 samples/sec   Loss 12.0407   LearningRate 0.0745   Epoch: 2   Global Step: 113620   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:36,510-Speed 2618.68 samples/sec   Loss 11.8106   LearningRate 0.0745   Epoch: 2   Global Step: 113630   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:40,414-Speed 2623.84 samples/sec   Loss 12.0480   LearningRate 0.0745   Epoch: 2   Global Step: 113640   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:44,308-Speed 2630.41 samples/sec   Loss 11.9142   LearningRate 0.0745   Epoch: 2   Global Step: 113650   Fp16 Grad Scale: 524288   Required: 80 hours
Training: 2022-04-13 08:29:48,186-Speed 2641.69 samples/sec   Loss 11.9417   LearningRate 0.0745   Epoch: 2   Global Step: 113660   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:29:52,063-Speed 2641.96 samples/sec   Loss 11.8973   LearningRate 0.0745   Epoch: 2   Global Step: 113670   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:29:55,960-Speed 2628.08 samples/sec   Loss 11.9566   LearningRate 0.0745   Epoch: 2   Global Step: 113680   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:29:59,857-Speed 2628.41 samples/sec   Loss 11.8970   LearningRate 0.0745   Epoch: 2   Global Step: 113690   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:03,757-Speed 2625.91 samples/sec   Loss 12.0579   LearningRate 0.0745   Epoch: 2   Global Step: 113700   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:07,650-Speed 2630.70 samples/sec   Loss 12.0125   LearningRate 0.0745   Epoch: 2   Global Step: 113710   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:11,548-Speed 2627.39 samples/sec   Loss 11.8716   LearningRate 0.0745   Epoch: 2   Global Step: 113720   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:15,452-Speed 2624.32 samples/sec   Loss 12.0256   LearningRate 0.0745   Epoch: 2   Global Step: 113730   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:19,370-Speed 2614.37 samples/sec   Loss 11.9438   LearningRate 0.0745   Epoch: 2   Global Step: 113740   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:23,274-Speed 2623.86 samples/sec   Loss 11.8747   LearningRate 0.0745   Epoch: 2   Global Step: 113750   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:27,168-Speed 2629.52 samples/sec   Loss 11.9960   LearningRate 0.0745   Epoch: 2   Global Step: 113760   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:31,081-Speed 2618.46 samples/sec   Loss 12.0666   LearningRate 0.0745   Epoch: 2   Global Step: 113770   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:30:34,957-Speed 2642.45 samples/sec   Loss 11.9229   LearningRate 0.0745   Epoch: 2   Global Step: 113780   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:38,961-Speed 2557.89 samples/sec   Loss 11.9780   LearningRate 0.0744   Epoch: 2   Global Step: 113790   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:43,024-Speed 2520.59 samples/sec   Loss 11.9591   LearningRate 0.0744   Epoch: 2   Global Step: 113800   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:46,921-Speed 2628.54 samples/sec   Loss 12.0863   LearningRate 0.0744   Epoch: 2   Global Step: 113810   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:50,942-Speed 2547.27 samples/sec   Loss 12.1645   LearningRate 0.0744   Epoch: 2   Global Step: 113820   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:54,840-Speed 2628.19 samples/sec   Loss 11.9533   LearningRate 0.0744   Epoch: 2   Global Step: 113830   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:30:58,735-Speed 2629.76 samples/sec   Loss 12.0593   LearningRate 0.0744   Epoch: 2   Global Step: 113840   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:02,630-Speed 2629.65 samples/sec   Loss 12.0969   LearningRate 0.0744   Epoch: 2   Global Step: 113850   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:06,551-Speed 2612.05 samples/sec   Loss 12.1927   LearningRate 0.0744   Epoch: 2   Global Step: 113860   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:10,448-Speed 2628.13 samples/sec   Loss 12.0592   LearningRate 0.0744   Epoch: 2   Global Step: 113870   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:14,420-Speed 2578.84 samples/sec   Loss 12.1510   LearningRate 0.0744   Epoch: 2   Global Step: 113880   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:18,316-Speed 2629.16 samples/sec   Loss 12.0039   LearningRate 0.0744   Epoch: 2   Global Step: 113890   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:22,213-Speed 2628.40 samples/sec   Loss 11.9059   LearningRate 0.0744   Epoch: 2   Global Step: 113900   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:26,106-Speed 2630.72 samples/sec   Loss 11.9005   LearningRate 0.0744   Epoch: 2   Global Step: 113910   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:30,005-Speed 2627.26 samples/sec   Loss 12.0191   LearningRate 0.0744   Epoch: 2   Global Step: 113920   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:33,901-Speed 2629.13 samples/sec   Loss 11.9930   LearningRate 0.0744   Epoch: 2   Global Step: 113930   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:37,788-Speed 2635.29 samples/sec   Loss 11.9705   LearningRate 0.0744   Epoch: 2   Global Step: 113940   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:41,687-Speed 2626.68 samples/sec   Loss 11.9495   LearningRate 0.0744   Epoch: 2   Global Step: 113950   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:45,591-Speed 2623.53 samples/sec   Loss 12.0699   LearningRate 0.0744   Epoch: 2   Global Step: 113960   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:49,485-Speed 2630.45 samples/sec   Loss 12.0650   LearningRate 0.0744   Epoch: 2   Global Step: 113970   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:31:53,433-Speed 2594.23 samples/sec   Loss 11.9260   LearningRate 0.0744   Epoch: 2   Global Step: 113980   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:31:57,334-Speed 2625.31 samples/sec   Loss 11.9731   LearningRate 0.0744   Epoch: 2   Global Step: 113990   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:32:01,216-Speed 2638.61 samples/sec   Loss 11.9786   LearningRate 0.0744   Epoch: 2   Global Step: 114000   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:05,110-Speed 2630.76 samples/sec   Loss 12.0235   LearningRate 0.0744   Epoch: 2   Global Step: 114010   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:09,014-Speed 2623.14 samples/sec   Loss 12.0379   LearningRate 0.0744   Epoch: 2   Global Step: 114020   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:12,909-Speed 2629.78 samples/sec   Loss 12.0443   LearningRate 0.0744   Epoch: 2   Global Step: 114030   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:16,801-Speed 2631.83 samples/sec   Loss 11.8751   LearningRate 0.0744   Epoch: 2   Global Step: 114040   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:20,694-Speed 2631.07 samples/sec   Loss 11.8452   LearningRate 0.0744   Epoch: 2   Global Step: 114050   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:24,599-Speed 2622.96 samples/sec   Loss 11.8834   LearningRate 0.0744   Epoch: 2   Global Step: 114060   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:28,529-Speed 2606.32 samples/sec   Loss 11.9960   LearningRate 0.0744   Epoch: 2   Global Step: 114070   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:32,434-Speed 2622.99 samples/sec   Loss 11.9713   LearningRate 0.0744   Epoch: 2   Global Step: 114080   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:36,346-Speed 2618.79 samples/sec   Loss 12.0899   LearningRate 0.0744   Epoch: 2   Global Step: 114090   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:32:40,258-Speed 2618.20 samples/sec   Loss 12.0163   LearningRate 0.0744   Epoch: 2   Global Step: 114100   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:32:44,167-Speed 2619.73 samples/sec   Loss 12.0542   LearningRate 0.0744   Epoch: 2   Global Step: 114110   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:32:48,079-Speed 2618.48 samples/sec   Loss 12.0158   LearningRate 0.0744   Epoch: 2   Global Step: 114120   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:32:52,018-Speed 2600.17 samples/sec   Loss 11.9240   LearningRate 0.0744   Epoch: 2   Global Step: 114130   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:32:55,925-Speed 2621.94 samples/sec   Loss 12.1170   LearningRate 0.0744   Epoch: 2   Global Step: 114140   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:32:59,818-Speed 2631.23 samples/sec   Loss 12.0347   LearningRate 0.0744   Epoch: 2   Global Step: 114150   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:03,714-Speed 2628.68 samples/sec   Loss 11.9245   LearningRate 0.0744   Epoch: 2   Global Step: 114160   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:07,613-Speed 2627.43 samples/sec   Loss 12.0317   LearningRate 0.0744   Epoch: 2   Global Step: 114170   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:11,509-Speed 2628.78 samples/sec   Loss 11.9635   LearningRate 0.0744   Epoch: 2   Global Step: 114180   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:15,411-Speed 2624.85 samples/sec   Loss 11.9462   LearningRate 0.0744   Epoch: 2   Global Step: 114190   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:19,370-Speed 2586.93 samples/sec   Loss 11.9873   LearningRate 0.0744   Epoch: 2   Global Step: 114200   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:23,403-Speed 2539.98 samples/sec   Loss 11.9778   LearningRate 0.0744   Epoch: 2   Global Step: 114210   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:27,300-Speed 2628.68 samples/sec   Loss 12.0504   LearningRate 0.0744   Epoch: 2   Global Step: 114220   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:31,196-Speed 2628.68 samples/sec   Loss 11.9754   LearningRate 0.0744   Epoch: 2   Global Step: 114230   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:35,096-Speed 2625.99 samples/sec   Loss 12.0063   LearningRate 0.0744   Epoch: 2   Global Step: 114240   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:38,992-Speed 2629.13 samples/sec   Loss 11.9695   LearningRate 0.0744   Epoch: 2   Global Step: 114250   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:42,908-Speed 2614.93 samples/sec   Loss 11.7187   LearningRate 0.0744   Epoch: 2   Global Step: 114260   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:46,805-Speed 2628.35 samples/sec   Loss 11.8735   LearningRate 0.0743   Epoch: 2   Global Step: 114270   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:50,696-Speed 2632.91 samples/sec   Loss 12.0454   LearningRate 0.0743   Epoch: 2   Global Step: 114280   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:54,591-Speed 2629.56 samples/sec   Loss 11.8948   LearningRate 0.0743   Epoch: 2   Global Step: 114290   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:33:58,480-Speed 2633.91 samples/sec   Loss 11.8688   LearningRate 0.0743   Epoch: 2   Global Step: 114300   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:34:02,376-Speed 2629.04 samples/sec   Loss 12.1054   LearningRate 0.0743   Epoch: 2   Global Step: 114310   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:34:06,256-Speed 2639.71 samples/sec   Loss 12.1213   LearningRate 0.0743   Epoch: 2   Global Step: 114320   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:10,150-Speed 2630.20 samples/sec   Loss 11.9852   LearningRate 0.0743   Epoch: 2   Global Step: 114330   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:14,040-Speed 2633.01 samples/sec   Loss 11.9762   LearningRate 0.0743   Epoch: 2   Global Step: 114340   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:17,934-Speed 2629.85 samples/sec   Loss 11.9954   LearningRate 0.0743   Epoch: 2   Global Step: 114350   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:21,831-Speed 2628.72 samples/sec   Loss 12.0518   LearningRate 0.0743   Epoch: 2   Global Step: 114360   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:25,725-Speed 2630.04 samples/sec   Loss 11.9083   LearningRate 0.0743   Epoch: 2   Global Step: 114370   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:29,643-Speed 2614.50 samples/sec   Loss 12.0162   LearningRate 0.0743   Epoch: 2   Global Step: 114380   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:33,551-Speed 2621.02 samples/sec   Loss 12.0142   LearningRate 0.0743   Epoch: 2   Global Step: 114390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:37,523-Speed 2579.33 samples/sec   Loss 11.9710   LearningRate 0.0743   Epoch: 2   Global Step: 114400   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:41,423-Speed 2626.23 samples/sec   Loss 11.9018   LearningRate 0.0743   Epoch: 2   Global Step: 114410   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:34:45,322-Speed 2627.42 samples/sec   Loss 12.0350   LearningRate 0.0743   Epoch: 2   Global Step: 114420   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:34:49,231-Speed 2619.61 samples/sec   Loss 12.1293   LearningRate 0.0743   Epoch: 2   Global Step: 114430   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:34:53,135-Speed 2623.88 samples/sec   Loss 11.9815   LearningRate 0.0743   Epoch: 2   Global Step: 114440   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:34:57,026-Speed 2632.61 samples/sec   Loss 11.9595   LearningRate 0.0743   Epoch: 2   Global Step: 114450   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:35:00,916-Speed 2633.32 samples/sec   Loss 11.9234   LearningRate 0.0743   Epoch: 2   Global Step: 114460   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:35:04,790-Speed 2643.55 samples/sec   Loss 11.8120   LearningRate 0.0743   Epoch: 2   Global Step: 114470   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:08,771-Speed 2573.01 samples/sec   Loss 11.8705   LearningRate 0.0743   Epoch: 2   Global Step: 114480   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:12,676-Speed 2623.35 samples/sec   Loss 12.0225   LearningRate 0.0743   Epoch: 2   Global Step: 114490   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:16,566-Speed 2633.39 samples/sec   Loss 11.9153   LearningRate 0.0743   Epoch: 2   Global Step: 114500   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:20,454-Speed 2634.23 samples/sec   Loss 12.0544   LearningRate 0.0743   Epoch: 2   Global Step: 114510   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:24,354-Speed 2626.49 samples/sec   Loss 11.9379   LearningRate 0.0743   Epoch: 2   Global Step: 114520   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:28,244-Speed 2632.86 samples/sec   Loss 12.0191   LearningRate 0.0743   Epoch: 2   Global Step: 114530   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:32,137-Speed 2631.24 samples/sec   Loss 12.0199   LearningRate 0.0743   Epoch: 2   Global Step: 114540   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:36,026-Speed 2633.62 samples/sec   Loss 12.1558   LearningRate 0.0743   Epoch: 2   Global Step: 114550   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:39,915-Speed 2633.30 samples/sec   Loss 12.1010   LearningRate 0.0743   Epoch: 2   Global Step: 114560   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:43,808-Speed 2631.26 samples/sec   Loss 12.2688   LearningRate 0.0743   Epoch: 2   Global Step: 114570   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:35:47,688-Speed 2639.65 samples/sec   Loss 12.0816   LearningRate 0.0743   Epoch: 2   Global Step: 114580   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:51,579-Speed 2632.66 samples/sec   Loss 12.0359   LearningRate 0.0743   Epoch: 2   Global Step: 114590   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:55,475-Speed 2629.54 samples/sec   Loss 11.9791   LearningRate 0.0743   Epoch: 2   Global Step: 114600   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:35:59,370-Speed 2629.05 samples/sec   Loss 12.0302   LearningRate 0.0743   Epoch: 2   Global Step: 114610   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:03,278-Speed 2620.84 samples/sec   Loss 12.0294   LearningRate 0.0743   Epoch: 2   Global Step: 114620   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:07,168-Speed 2633.17 samples/sec   Loss 12.1009   LearningRate 0.0743   Epoch: 2   Global Step: 114630   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:11,070-Speed 2625.08 samples/sec   Loss 12.0950   LearningRate 0.0743   Epoch: 2   Global Step: 114640   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:14,986-Speed 2615.80 samples/sec   Loss 12.1627   LearningRate 0.0743   Epoch: 2   Global Step: 114650   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:18,881-Speed 2629.79 samples/sec   Loss 12.1475   LearningRate 0.0743   Epoch: 2   Global Step: 114660   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:22,801-Speed 2612.63 samples/sec   Loss 12.0242   LearningRate 0.0743   Epoch: 2   Global Step: 114670   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:26,719-Speed 2614.34 samples/sec   Loss 12.0991   LearningRate 0.0743   Epoch: 2   Global Step: 114680   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:36:30,622-Speed 2624.12 samples/sec   Loss 12.0551   LearningRate 0.0743   Epoch: 2   Global Step: 114690   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:36:34,530-Speed 2620.92 samples/sec   Loss 12.0762   LearningRate 0.0743   Epoch: 2   Global Step: 114700   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:36:38,429-Speed 2626.99 samples/sec   Loss 12.1232   LearningRate 0.0743   Epoch: 2   Global Step: 114710   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:36:42,330-Speed 2625.64 samples/sec   Loss 11.9574   LearningRate 0.0743   Epoch: 2   Global Step: 114720   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:36:46,228-Speed 2627.34 samples/sec   Loss 12.0588   LearningRate 0.0743   Epoch: 2   Global Step: 114730   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:36:50,127-Speed 2627.79 samples/sec   Loss 12.0740   LearningRate 0.0743   Epoch: 2   Global Step: 114740   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:36:54,001-Speed 2643.93 samples/sec   Loss 12.0796   LearningRate 0.0742   Epoch: 2   Global Step: 114750   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:36:57,891-Speed 2633.10 samples/sec   Loss 12.0184   LearningRate 0.0742   Epoch: 2   Global Step: 114760   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:01,783-Speed 2631.16 samples/sec   Loss 12.0376   LearningRate 0.0742   Epoch: 2   Global Step: 114770   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:05,679-Speed 2628.75 samples/sec   Loss 11.8753   LearningRate 0.0742   Epoch: 2   Global Step: 114780   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:09,573-Speed 2630.65 samples/sec   Loss 12.0686   LearningRate 0.0742   Epoch: 2   Global Step: 114790   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:13,463-Speed 2633.44 samples/sec   Loss 12.0181   LearningRate 0.0742   Epoch: 2   Global Step: 114800   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:17,354-Speed 2632.72 samples/sec   Loss 11.8307   LearningRate 0.0742   Epoch: 2   Global Step: 114810   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:21,248-Speed 2630.70 samples/sec   Loss 12.0137   LearningRate 0.0742   Epoch: 2   Global Step: 114820   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:25,172-Speed 2610.38 samples/sec   Loss 11.9836   LearningRate 0.0742   Epoch: 2   Global Step: 114830   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:29,071-Speed 2626.37 samples/sec   Loss 11.9255   LearningRate 0.0742   Epoch: 2   Global Step: 114840   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:37:32,964-Speed 2631.36 samples/sec   Loss 12.0046   LearningRate 0.0742   Epoch: 2   Global Step: 114850   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:37:36,854-Speed 2632.60 samples/sec   Loss 11.9565   LearningRate 0.0742   Epoch: 2   Global Step: 114860   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:37:40,743-Speed 2634.21 samples/sec   Loss 12.0128   LearningRate 0.0742   Epoch: 2   Global Step: 114870   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:37:44,636-Speed 2630.77 samples/sec   Loss 12.0423   LearningRate 0.0742   Epoch: 2   Global Step: 114880   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:37:48,531-Speed 2629.75 samples/sec   Loss 11.9277   LearningRate 0.0742   Epoch: 2   Global Step: 114890   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:37:52,425-Speed 2630.56 samples/sec   Loss 11.9826   LearningRate 0.0742   Epoch: 2   Global Step: 114900   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:37:56,320-Speed 2630.03 samples/sec   Loss 11.9586   LearningRate 0.0742   Epoch: 2   Global Step: 114910   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:38:00,211-Speed 2632.03 samples/sec   Loss 12.0306   LearningRate 0.0742   Epoch: 2   Global Step: 114920   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:38:04,106-Speed 2629.39 samples/sec   Loss 11.9454   LearningRate 0.0742   Epoch: 2   Global Step: 114930   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:38:08,018-Speed 2617.79 samples/sec   Loss 12.0190   LearningRate 0.0742   Epoch: 2   Global Step: 114940   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:38:11,896-Speed 2641.84 samples/sec   Loss 11.9858   LearningRate 0.0742   Epoch: 2   Global Step: 114950   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:15,789-Speed 2630.88 samples/sec   Loss 12.0955   LearningRate 0.0742   Epoch: 2   Global Step: 114960   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:19,682-Speed 2630.79 samples/sec   Loss 12.0120   LearningRate 0.0742   Epoch: 2   Global Step: 114970   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:23,584-Speed 2625.04 samples/sec   Loss 12.0063   LearningRate 0.0742   Epoch: 2   Global Step: 114980   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:27,476-Speed 2631.76 samples/sec   Loss 11.9452   LearningRate 0.0742   Epoch: 2   Global Step: 114990   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:31,371-Speed 2630.00 samples/sec   Loss 11.9826   LearningRate 0.0742   Epoch: 2   Global Step: 115000   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:35,266-Speed 2629.64 samples/sec   Loss 12.0461   LearningRate 0.0742   Epoch: 2   Global Step: 115010   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:39,160-Speed 2629.67 samples/sec   Loss 11.9784   LearningRate 0.0742   Epoch: 2   Global Step: 115020   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:43,062-Speed 2624.88 samples/sec   Loss 12.0209   LearningRate 0.0742   Epoch: 2   Global Step: 115030   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:46,956-Speed 2630.47 samples/sec   Loss 11.9906   LearningRate 0.0742   Epoch: 2   Global Step: 115040   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:38:50,853-Speed 2627.88 samples/sec   Loss 11.9933   LearningRate 0.0742   Epoch: 2   Global Step: 115050   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:38:54,749-Speed 2629.30 samples/sec   Loss 12.0385   LearningRate 0.0742   Epoch: 2   Global Step: 115060   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:38:58,647-Speed 2628.14 samples/sec   Loss 11.9482   LearningRate 0.0742   Epoch: 2   Global Step: 115070   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:39:02,541-Speed 2630.32 samples/sec   Loss 12.0451   LearningRate 0.0742   Epoch: 2   Global Step: 115080   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:39:06,437-Speed 2629.17 samples/sec   Loss 11.9456   LearningRate 0.0742   Epoch: 2   Global Step: 115090   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:39:10,332-Speed 2629.15 samples/sec   Loss 11.8063   LearningRate 0.0742   Epoch: 2   Global Step: 115100   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:39:14,211-Speed 2640.57 samples/sec   Loss 11.8489   LearningRate 0.0742   Epoch: 2   Global Step: 115110   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:18,113-Speed 2624.75 samples/sec   Loss 12.0178   LearningRate 0.0742   Epoch: 2   Global Step: 115120   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:22,208-Speed 2501.21 samples/sec   Loss 11.9213   LearningRate 0.0742   Epoch: 2   Global Step: 115130   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:26,285-Speed 2512.44 samples/sec   Loss 11.8318   LearningRate 0.0742   Epoch: 2   Global Step: 115140   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:30,362-Speed 2512.45 samples/sec   Loss 11.8547   LearningRate 0.0742   Epoch: 2   Global Step: 115150   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:34,428-Speed 2518.99 samples/sec   Loss 12.0327   LearningRate 0.0742   Epoch: 2   Global Step: 115160   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:38,337-Speed 2620.27 samples/sec   Loss 12.0176   LearningRate 0.0742   Epoch: 2   Global Step: 115170   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:42,232-Speed 2629.70 samples/sec   Loss 11.7844   LearningRate 0.0742   Epoch: 2   Global Step: 115180   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:46,131-Speed 2626.70 samples/sec   Loss 12.0369   LearningRate 0.0742   Epoch: 2   Global Step: 115190   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:50,029-Speed 2627.29 samples/sec   Loss 11.9647   LearningRate 0.0742   Epoch: 2   Global Step: 115200   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:39:53,927-Speed 2628.28 samples/sec   Loss 12.1206   LearningRate 0.0742   Epoch: 2   Global Step: 115210   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:39:57,820-Speed 2630.76 samples/sec   Loss 11.9750   LearningRate 0.0742   Epoch: 2   Global Step: 115220   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:01,724-Speed 2623.61 samples/sec   Loss 12.0770   LearningRate 0.0741   Epoch: 2   Global Step: 115230   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:05,620-Speed 2629.71 samples/sec   Loss 11.9368   LearningRate 0.0741   Epoch: 2   Global Step: 115240   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:09,524-Speed 2623.54 samples/sec   Loss 12.0570   LearningRate 0.0741   Epoch: 2   Global Step: 115250   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:13,423-Speed 2626.49 samples/sec   Loss 11.9952   LearningRate 0.0741   Epoch: 2   Global Step: 115260   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:17,350-Speed 2608.51 samples/sec   Loss 11.9866   LearningRate 0.0741   Epoch: 2   Global Step: 115270   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:21,249-Speed 2626.88 samples/sec   Loss 11.9910   LearningRate 0.0741   Epoch: 2   Global Step: 115280   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:25,167-Speed 2614.47 samples/sec   Loss 11.9483   LearningRate 0.0741   Epoch: 2   Global Step: 115290   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:29,077-Speed 2619.58 samples/sec   Loss 11.9838   LearningRate 0.0741   Epoch: 2   Global Step: 115300   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:32,954-Speed 2642.27 samples/sec   Loss 12.1201   LearningRate 0.0741   Epoch: 2   Global Step: 115310   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:36,843-Speed 2633.62 samples/sec   Loss 11.9510   LearningRate 0.0741   Epoch: 2   Global Step: 115320   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:40,748-Speed 2622.61 samples/sec   Loss 12.0418   LearningRate 0.0741   Epoch: 2   Global Step: 115330   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:44,644-Speed 2629.70 samples/sec   Loss 11.9906   LearningRate 0.0741   Epoch: 2   Global Step: 115340   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:48,538-Speed 2630.61 samples/sec   Loss 11.9791   LearningRate 0.0741   Epoch: 2   Global Step: 115350   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:52,440-Speed 2624.86 samples/sec   Loss 12.0769   LearningRate 0.0741   Epoch: 2   Global Step: 115360   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:40:56,332-Speed 2631.38 samples/sec   Loss 11.9445   LearningRate 0.0741   Epoch: 2   Global Step: 115370   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:41:00,231-Speed 2626.71 samples/sec   Loss 11.9826   LearningRate 0.0741   Epoch: 2   Global Step: 115380   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:41:04,124-Speed 2631.18 samples/sec   Loss 11.9924   LearningRate 0.0741   Epoch: 2   Global Step: 115390   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:41:08,013-Speed 2633.52 samples/sec   Loss 12.1279   LearningRate 0.0741   Epoch: 2   Global Step: 115400   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:41:11,890-Speed 2642.22 samples/sec   Loss 12.0731   LearningRate 0.0741   Epoch: 2   Global Step: 115410   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:41:15,786-Speed 2628.96 samples/sec   Loss 12.1671   LearningRate 0.0741   Epoch: 2   Global Step: 115420   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:41:19,660-Speed 2643.61 samples/sec   Loss 12.0916   LearningRate 0.0741   Epoch: 2   Global Step: 115430   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:41:23,561-Speed 2625.64 samples/sec   Loss 12.1140   LearningRate 0.0741   Epoch: 2   Global Step: 115440   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:41:27,462-Speed 2625.49 samples/sec   Loss 12.1444   LearningRate 0.0741   Epoch: 2   Global Step: 115450   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:41:31,346-Speed 2637.19 samples/sec   Loss 11.8297   LearningRate 0.0741   Epoch: 2   Global Step: 115460   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:41:35,251-Speed 2622.22 samples/sec   Loss 12.0184   LearningRate 0.0741   Epoch: 2   Global Step: 115470   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:41:39,157-Speed 2622.56 samples/sec   Loss 12.0274   LearningRate 0.0741   Epoch: 2   Global Step: 115480   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:41:43,067-Speed 2620.05 samples/sec   Loss 12.0270   LearningRate 0.0741   Epoch: 2   Global Step: 115490   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:41:46,961-Speed 2630.10 samples/sec   Loss 11.9350   LearningRate 0.0741   Epoch: 2   Global Step: 115500   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:41:50,854-Speed 2630.96 samples/sec   Loss 11.9493   LearningRate 0.0741   Epoch: 2   Global Step: 115510   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:41:54,752-Speed 2627.35 samples/sec   Loss 11.9926   LearningRate 0.0741   Epoch: 2   Global Step: 115520   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:41:58,668-Speed 2615.59 samples/sec   Loss 12.0215   LearningRate 0.0741   Epoch: 2   Global Step: 115530   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:42:02,566-Speed 2627.58 samples/sec   Loss 11.9333   LearningRate 0.0741   Epoch: 2   Global Step: 115540   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:42:06,459-Speed 2630.85 samples/sec   Loss 11.9869   LearningRate 0.0741   Epoch: 2   Global Step: 115550   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:42:10,353-Speed 2630.29 samples/sec   Loss 12.0923   LearningRate 0.0741   Epoch: 2   Global Step: 115560   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:14,253-Speed 2626.75 samples/sec   Loss 11.9746   LearningRate 0.0741   Epoch: 2   Global Step: 115570   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:18,146-Speed 2631.42 samples/sec   Loss 12.0114   LearningRate 0.0741   Epoch: 2   Global Step: 115580   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:22,036-Speed 2632.64 samples/sec   Loss 11.9876   LearningRate 0.0741   Epoch: 2   Global Step: 115590   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:25,927-Speed 2632.60 samples/sec   Loss 12.1042   LearningRate 0.0741   Epoch: 2   Global Step: 115600   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:29,820-Speed 2630.97 samples/sec   Loss 11.9436   LearningRate 0.0741   Epoch: 2   Global Step: 115610   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:33,714-Speed 2629.96 samples/sec   Loss 11.9793   LearningRate 0.0741   Epoch: 2   Global Step: 115620   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:37,616-Speed 2624.58 samples/sec   Loss 12.0453   LearningRate 0.0741   Epoch: 2   Global Step: 115630   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:41,512-Speed 2629.28 samples/sec   Loss 11.9692   LearningRate 0.0741   Epoch: 2   Global Step: 115640   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:45,402-Speed 2632.65 samples/sec   Loss 11.8873   LearningRate 0.0741   Epoch: 2   Global Step: 115650   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:42:49,296-Speed 2630.73 samples/sec   Loss 11.9873   LearningRate 0.0741   Epoch: 2   Global Step: 115660   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:42:53,189-Speed 2630.89 samples/sec   Loss 11.9720   LearningRate 0.0741   Epoch: 2   Global Step: 115670   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:42:57,088-Speed 2627.46 samples/sec   Loss 11.9718   LearningRate 0.0741   Epoch: 2   Global Step: 115680   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:00,998-Speed 2618.95 samples/sec   Loss 12.0111   LearningRate 0.0741   Epoch: 2   Global Step: 115690   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:04,894-Speed 2628.62 samples/sec   Loss 12.0040   LearningRate 0.0741   Epoch: 2   Global Step: 115700   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:08,795-Speed 2626.02 samples/sec   Loss 12.0012   LearningRate 0.0740   Epoch: 2   Global Step: 115710   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:12,687-Speed 2631.68 samples/sec   Loss 11.8066   LearningRate 0.0740   Epoch: 2   Global Step: 115720   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:16,581-Speed 2629.52 samples/sec   Loss 11.8829   LearningRate 0.0740   Epoch: 2   Global Step: 115730   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:20,486-Speed 2623.68 samples/sec   Loss 11.8487   LearningRate 0.0740   Epoch: 2   Global Step: 115740   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:24,380-Speed 2629.97 samples/sec   Loss 11.9769   LearningRate 0.0740   Epoch: 2   Global Step: 115750   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:43:28,285-Speed 2623.50 samples/sec   Loss 11.7989   LearningRate 0.0740   Epoch: 2   Global Step: 115760   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:43:32,191-Speed 2622.09 samples/sec   Loss 11.9213   LearningRate 0.0740   Epoch: 2   Global Step: 115770   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:43:36,097-Speed 2621.87 samples/sec   Loss 11.8863   LearningRate 0.0740   Epoch: 2   Global Step: 115780   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:43:39,970-Speed 2644.11 samples/sec   Loss 12.0436   LearningRate 0.0740   Epoch: 2   Global Step: 115790   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:43:43,888-Speed 2614.86 samples/sec   Loss 11.8619   LearningRate 0.0740   Epoch: 2   Global Step: 115800   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:43:47,796-Speed 2620.71 samples/sec   Loss 12.0508   LearningRate 0.0740   Epoch: 2   Global Step: 115810   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:43:51,848-Speed 2528.33 samples/sec   Loss 12.0022   LearningRate 0.0740   Epoch: 2   Global Step: 115820   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:43:55,896-Speed 2529.99 samples/sec   Loss 11.9319   LearningRate 0.0740   Epoch: 2   Global Step: 115830   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:43:59,791-Speed 2629.66 samples/sec   Loss 12.0178   LearningRate 0.0740   Epoch: 2   Global Step: 115840   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:44:03,689-Speed 2627.60 samples/sec   Loss 11.9688   LearningRate 0.0740   Epoch: 2   Global Step: 115850   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:44:07,596-Speed 2621.78 samples/sec   Loss 11.9701   LearningRate 0.0740   Epoch: 2   Global Step: 115860   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:44:11,491-Speed 2629.39 samples/sec   Loss 12.0084   LearningRate 0.0740   Epoch: 2   Global Step: 115870   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:44:15,386-Speed 2629.32 samples/sec   Loss 11.8252   LearningRate 0.0740   Epoch: 2   Global Step: 115880   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:44:19,295-Speed 2620.90 samples/sec   Loss 11.8796   LearningRate 0.0740   Epoch: 2   Global Step: 115890   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:23,190-Speed 2629.73 samples/sec   Loss 12.1267   LearningRate 0.0740   Epoch: 2   Global Step: 115900   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:27,083-Speed 2631.03 samples/sec   Loss 11.8837   LearningRate 0.0740   Epoch: 2   Global Step: 115910   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:30,980-Speed 2628.70 samples/sec   Loss 11.9304   LearningRate 0.0740   Epoch: 2   Global Step: 115920   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:34,878-Speed 2627.52 samples/sec   Loss 11.9426   LearningRate 0.0740   Epoch: 2   Global Step: 115930   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:38,776-Speed 2627.39 samples/sec   Loss 11.9354   LearningRate 0.0740   Epoch: 2   Global Step: 115940   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:42,671-Speed 2629.47 samples/sec   Loss 12.0092   LearningRate 0.0740   Epoch: 2   Global Step: 115950   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:46,568-Speed 2628.84 samples/sec   Loss 11.9550   LearningRate 0.0740   Epoch: 2   Global Step: 115960   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:50,468-Speed 2625.78 samples/sec   Loss 11.9854   LearningRate 0.0740   Epoch: 2   Global Step: 115970   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:54,367-Speed 2627.48 samples/sec   Loss 12.0069   LearningRate 0.0740   Epoch: 2   Global Step: 115980   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:44:58,266-Speed 2627.06 samples/sec   Loss 11.9493   LearningRate 0.0740   Epoch: 2   Global Step: 115990   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:45:02,164-Speed 2628.07 samples/sec   Loss 11.9261   LearningRate 0.0740   Epoch: 2   Global Step: 116000   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:45:06,061-Speed 2627.59 samples/sec   Loss 12.0076   LearningRate 0.0740   Epoch: 2   Global Step: 116010   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:45:09,959-Speed 2627.88 samples/sec   Loss 11.9577   LearningRate 0.0740   Epoch: 2   Global Step: 116020   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:45:13,880-Speed 2612.12 samples/sec   Loss 11.9480   LearningRate 0.0740   Epoch: 2   Global Step: 116030   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:45:17,780-Speed 2626.20 samples/sec   Loss 11.9741   LearningRate 0.0740   Epoch: 2   Global Step: 116040   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:45:21,673-Speed 2631.33 samples/sec   Loss 12.0362   LearningRate 0.0740   Epoch: 2   Global Step: 116050   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:45:25,574-Speed 2625.50 samples/sec   Loss 11.7914   LearningRate 0.0740   Epoch: 2   Global Step: 116060   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:29,468-Speed 2630.37 samples/sec   Loss 11.9448   LearningRate 0.0740   Epoch: 2   Global Step: 116070   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:33,360-Speed 2632.30 samples/sec   Loss 12.0130   LearningRate 0.0740   Epoch: 2   Global Step: 116080   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:37,259-Speed 2626.94 samples/sec   Loss 11.9946   LearningRate 0.0740   Epoch: 2   Global Step: 116090   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:41,160-Speed 2625.38 samples/sec   Loss 11.8986   LearningRate 0.0740   Epoch: 2   Global Step: 116100   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:45,056-Speed 2629.15 samples/sec   Loss 12.0030   LearningRate 0.0740   Epoch: 2   Global Step: 116110   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:48,954-Speed 2627.86 samples/sec   Loss 11.9666   LearningRate 0.0740   Epoch: 2   Global Step: 116120   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:52,876-Speed 2611.34 samples/sec   Loss 12.0139   LearningRate 0.0740   Epoch: 2   Global Step: 116130   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:45:56,809-Speed 2604.32 samples/sec   Loss 12.0147   LearningRate 0.0740   Epoch: 2   Global Step: 116140   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:46:00,729-Speed 2613.32 samples/sec   Loss 12.1951   LearningRate 0.0740   Epoch: 2   Global Step: 116150   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:46:04,629-Speed 2626.34 samples/sec   Loss 12.0257   LearningRate 0.0740   Epoch: 2   Global Step: 116160   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:08,526-Speed 2628.08 samples/sec   Loss 12.0600   LearningRate 0.0740   Epoch: 2   Global Step: 116170   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:12,426-Speed 2626.23 samples/sec   Loss 11.9802   LearningRate 0.0740   Epoch: 2   Global Step: 116180   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:16,331-Speed 2622.52 samples/sec   Loss 12.0393   LearningRate 0.0739   Epoch: 2   Global Step: 116190   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:20,238-Speed 2621.23 samples/sec   Loss 12.0589   LearningRate 0.0739   Epoch: 2   Global Step: 116200   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:24,162-Speed 2610.91 samples/sec   Loss 11.9029   LearningRate 0.0739   Epoch: 2   Global Step: 116210   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:28,066-Speed 2623.55 samples/sec   Loss 11.8582   LearningRate 0.0739   Epoch: 2   Global Step: 116220   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:31,965-Speed 2627.00 samples/sec   Loss 11.7847   LearningRate 0.0739   Epoch: 2   Global Step: 116230   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:35,864-Speed 2627.26 samples/sec   Loss 11.8334   LearningRate 0.0739   Epoch: 2   Global Step: 116240   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:39,758-Speed 2630.33 samples/sec   Loss 11.9196   LearningRate 0.0739   Epoch: 2   Global Step: 116250   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:43,633-Speed 2645.08 samples/sec   Loss 11.9892   LearningRate 0.0739   Epoch: 2   Global Step: 116260   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:47,541-Speed 2621.32 samples/sec   Loss 11.9075   LearningRate 0.0739   Epoch: 2   Global Step: 116270   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:51,444-Speed 2624.13 samples/sec   Loss 11.9103   LearningRate 0.0739   Epoch: 2   Global Step: 116280   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:55,346-Speed 2625.33 samples/sec   Loss 11.8987   LearningRate 0.0739   Epoch: 2   Global Step: 116290   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:46:59,244-Speed 2627.20 samples/sec   Loss 11.9408   LearningRate 0.0739   Epoch: 2   Global Step: 116300   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:03,145-Speed 2625.70 samples/sec   Loss 11.8878   LearningRate 0.0739   Epoch: 2   Global Step: 116310   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:07,048-Speed 2624.38 samples/sec   Loss 11.8787   LearningRate 0.0739   Epoch: 2   Global Step: 116320   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:10,949-Speed 2625.58 samples/sec   Loss 11.9597   LearningRate 0.0739   Epoch: 2   Global Step: 116330   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:14,839-Speed 2633.02 samples/sec   Loss 11.9081   LearningRate 0.0739   Epoch: 2   Global Step: 116340   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:18,741-Speed 2625.12 samples/sec   Loss 12.0817   LearningRate 0.0739   Epoch: 2   Global Step: 116350   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:22,638-Speed 2628.37 samples/sec   Loss 12.0767   LearningRate 0.0739   Epoch: 2   Global Step: 116360   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:47:26,540-Speed 2625.23 samples/sec   Loss 11.9980   LearningRate 0.0739   Epoch: 2   Global Step: 116370   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:47:30,429-Speed 2634.27 samples/sec   Loss 11.9136   LearningRate 0.0739   Epoch: 2   Global Step: 116380   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:34,329-Speed 2625.66 samples/sec   Loss 12.0453   LearningRate 0.0739   Epoch: 2   Global Step: 116390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:38,224-Speed 2629.48 samples/sec   Loss 11.9465   LearningRate 0.0739   Epoch: 2   Global Step: 116400   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:42,117-Speed 2631.54 samples/sec   Loss 11.9256   LearningRate 0.0739   Epoch: 2   Global Step: 116410   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:46,008-Speed 2631.64 samples/sec   Loss 12.0195   LearningRate 0.0739   Epoch: 2   Global Step: 116420   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:49,902-Speed 2630.56 samples/sec   Loss 11.9428   LearningRate 0.0739   Epoch: 2   Global Step: 116430   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:53,797-Speed 2629.60 samples/sec   Loss 11.8956   LearningRate 0.0739   Epoch: 2   Global Step: 116440   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:47:57,696-Speed 2626.95 samples/sec   Loss 11.9867   LearningRate 0.0739   Epoch: 2   Global Step: 116450   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:01,590-Speed 2630.41 samples/sec   Loss 11.8672   LearningRate 0.0739   Epoch: 2   Global Step: 116460   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:05,482-Speed 2631.69 samples/sec   Loss 11.7444   LearningRate 0.0739   Epoch: 2   Global Step: 116470   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:09,378-Speed 2629.07 samples/sec   Loss 12.0091   LearningRate 0.0739   Epoch: 2   Global Step: 116480   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:48:13,265-Speed 2635.16 samples/sec   Loss 11.9117   LearningRate 0.0739   Epoch: 2   Global Step: 116490   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:17,163-Speed 2627.49 samples/sec   Loss 12.0481   LearningRate 0.0739   Epoch: 2   Global Step: 116500   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:21,055-Speed 2631.78 samples/sec   Loss 12.0089   LearningRate 0.0739   Epoch: 2   Global Step: 116510   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:24,949-Speed 2630.24 samples/sec   Loss 11.9381   LearningRate 0.0739   Epoch: 2   Global Step: 116520   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:28,847-Speed 2627.31 samples/sec   Loss 11.9905   LearningRate 0.0739   Epoch: 2   Global Step: 116530   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:32,757-Speed 2619.46 samples/sec   Loss 12.0694   LearningRate 0.0739   Epoch: 2   Global Step: 116540   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:36,653-Speed 2629.19 samples/sec   Loss 12.0641   LearningRate 0.0739   Epoch: 2   Global Step: 116550   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:40,546-Speed 2630.74 samples/sec   Loss 11.9085   LearningRate 0.0739   Epoch: 2   Global Step: 116560   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:44,454-Speed 2621.31 samples/sec   Loss 11.8661   LearningRate 0.0739   Epoch: 2   Global Step: 116570   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:48,365-Speed 2618.99 samples/sec   Loss 11.9924   LearningRate 0.0739   Epoch: 2   Global Step: 116580   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:48:52,262-Speed 2628.15 samples/sec   Loss 11.9982   LearningRate 0.0739   Epoch: 2   Global Step: 116590   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:48:56,174-Speed 2617.97 samples/sec   Loss 11.8722   LearningRate 0.0739   Epoch: 2   Global Step: 116600   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:49:00,127-Speed 2591.25 samples/sec   Loss 11.8385   LearningRate 0.0739   Epoch: 2   Global Step: 116610   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:49:04,020-Speed 2631.01 samples/sec   Loss 11.8685   LearningRate 0.0739   Epoch: 2   Global Step: 116620   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:49:07,916-Speed 2629.02 samples/sec   Loss 12.0318   LearningRate 0.0739   Epoch: 2   Global Step: 116630   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:49:11,815-Speed 2627.33 samples/sec   Loss 11.9657   LearningRate 0.0739   Epoch: 2   Global Step: 116640   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:49:15,711-Speed 2628.89 samples/sec   Loss 11.8466   LearningRate 0.0739   Epoch: 2   Global Step: 116650   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:49:19,617-Speed 2622.34 samples/sec   Loss 11.7905   LearningRate 0.0739   Epoch: 2   Global Step: 116660   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:23,521-Speed 2622.95 samples/sec   Loss 11.9550   LearningRate 0.0739   Epoch: 2   Global Step: 116670   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:27,429-Speed 2621.48 samples/sec   Loss 12.0749   LearningRate 0.0738   Epoch: 2   Global Step: 116680   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:31,323-Speed 2630.22 samples/sec   Loss 11.8952   LearningRate 0.0738   Epoch: 2   Global Step: 116690   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:35,217-Speed 2629.99 samples/sec   Loss 11.9168   LearningRate 0.0738   Epoch: 2   Global Step: 116700   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:39,115-Speed 2627.80 samples/sec   Loss 11.9441   LearningRate 0.0738   Epoch: 2   Global Step: 116710   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:43,009-Speed 2630.67 samples/sec   Loss 12.0480   LearningRate 0.0738   Epoch: 2   Global Step: 116720   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:46,914-Speed 2622.33 samples/sec   Loss 11.8963   LearningRate 0.0738   Epoch: 2   Global Step: 116730   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:50,808-Speed 2630.34 samples/sec   Loss 11.8685   LearningRate 0.0738   Epoch: 2   Global Step: 116740   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:54,704-Speed 2628.87 samples/sec   Loss 11.8547   LearningRate 0.0738   Epoch: 2   Global Step: 116750   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:49:58,570-Speed 2649.52 samples/sec   Loss 11.8233   LearningRate 0.0738   Epoch: 2   Global Step: 116760   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:02,457-Speed 2634.90 samples/sec   Loss 11.9946   LearningRate 0.0738   Epoch: 2   Global Step: 116770   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:06,353-Speed 2629.11 samples/sec   Loss 11.9737   LearningRate 0.0738   Epoch: 2   Global Step: 116780   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:10,247-Speed 2630.48 samples/sec   Loss 12.0038   LearningRate 0.0738   Epoch: 2   Global Step: 116790   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:14,142-Speed 2629.69 samples/sec   Loss 11.9991   LearningRate 0.0738   Epoch: 2   Global Step: 116800   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:18,038-Speed 2628.85 samples/sec   Loss 12.0577   LearningRate 0.0738   Epoch: 2   Global Step: 116810   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:21,930-Speed 2631.31 samples/sec   Loss 11.8173   LearningRate 0.0738   Epoch: 2   Global Step: 116820   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:25,840-Speed 2619.98 samples/sec   Loss 11.9864   LearningRate 0.0738   Epoch: 2   Global Step: 116830   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:29,742-Speed 2624.41 samples/sec   Loss 11.9161   LearningRate 0.0738   Epoch: 2   Global Step: 116840   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:33,635-Speed 2631.19 samples/sec   Loss 11.9112   LearningRate 0.0738   Epoch: 2   Global Step: 116850   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:50:37,529-Speed 2630.22 samples/sec   Loss 11.8549   LearningRate 0.0738   Epoch: 2   Global Step: 116860   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:50:41,423-Speed 2630.31 samples/sec   Loss 11.9584   LearningRate 0.0738   Epoch: 2   Global Step: 116870   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:50:45,331-Speed 2620.96 samples/sec   Loss 11.9555   LearningRate 0.0738   Epoch: 2   Global Step: 116880   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:50:49,223-Speed 2631.65 samples/sec   Loss 11.9068   LearningRate 0.0738   Epoch: 2   Global Step: 116890   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:50:53,116-Speed 2631.42 samples/sec   Loss 11.9706   LearningRate 0.0738   Epoch: 2   Global Step: 116900   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:50:57,010-Speed 2630.58 samples/sec   Loss 11.8385   LearningRate 0.0738   Epoch: 2   Global Step: 116910   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:51:00,901-Speed 2631.61 samples/sec   Loss 11.9940   LearningRate 0.0738   Epoch: 2   Global Step: 116920   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:51:04,802-Speed 2625.92 samples/sec   Loss 11.9851   LearningRate 0.0738   Epoch: 2   Global Step: 116930   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:51:08,695-Speed 2631.09 samples/sec   Loss 11.8600   LearningRate 0.0738   Epoch: 2   Global Step: 116940   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:51:12,588-Speed 2630.74 samples/sec   Loss 11.9452   LearningRate 0.0738   Epoch: 2   Global Step: 116950   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:51:16,482-Speed 2630.26 samples/sec   Loss 11.9690   LearningRate 0.0738   Epoch: 2   Global Step: 116960   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:51:20,390-Speed 2621.33 samples/sec   Loss 11.9535   LearningRate 0.0738   Epoch: 2   Global Step: 116970   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:51:24,239-Speed 2661.06 samples/sec   Loss 11.8362   LearningRate 0.0738   Epoch: 2   Global Step: 116980   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:51:28,106-Speed 2648.69 samples/sec   Loss 12.0698   LearningRate 0.0738   Epoch: 2   Global Step: 116990   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:32,009-Speed 2624.63 samples/sec   Loss 11.9321   LearningRate 0.0738   Epoch: 2   Global Step: 117000   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:35,902-Speed 2631.21 samples/sec   Loss 12.0638   LearningRate 0.0738   Epoch: 2   Global Step: 117010   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:39,797-Speed 2629.32 samples/sec   Loss 11.8074   LearningRate 0.0738   Epoch: 2   Global Step: 117020   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:43,692-Speed 2629.67 samples/sec   Loss 12.0089   LearningRate 0.0738   Epoch: 2   Global Step: 117030   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:47,582-Speed 2632.82 samples/sec   Loss 12.0535   LearningRate 0.0738   Epoch: 2   Global Step: 117040   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:51,471-Speed 2633.83 samples/sec   Loss 12.1024   LearningRate 0.0738   Epoch: 2   Global Step: 117050   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:55,362-Speed 2632.75 samples/sec   Loss 12.0172   LearningRate 0.0738   Epoch: 2   Global Step: 117060   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:51:59,260-Speed 2627.95 samples/sec   Loss 11.9415   LearningRate 0.0738   Epoch: 2   Global Step: 117070   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:52:03,152-Speed 2631.69 samples/sec   Loss 12.0597   LearningRate 0.0738   Epoch: 2   Global Step: 117080   Fp16 Grad Scale: 8192   Required: 80 hours
Training: 2022-04-13 08:52:07,046-Speed 2630.11 samples/sec   Loss 11.9577   LearningRate 0.0738   Epoch: 2   Global Step: 117090   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:10,958-Speed 2618.23 samples/sec   Loss 11.9168   LearningRate 0.0738   Epoch: 2   Global Step: 117100   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:14,846-Speed 2634.06 samples/sec   Loss 11.9171   LearningRate 0.0738   Epoch: 2   Global Step: 117110   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:18,739-Speed 2631.38 samples/sec   Loss 11.8976   LearningRate 0.0738   Epoch: 2   Global Step: 117120   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:22,636-Speed 2628.35 samples/sec   Loss 11.9674   LearningRate 0.0738   Epoch: 2   Global Step: 117130   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:26,543-Speed 2621.48 samples/sec   Loss 12.1788   LearningRate 0.0738   Epoch: 2   Global Step: 117140   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:30,435-Speed 2631.80 samples/sec   Loss 11.9135   LearningRate 0.0738   Epoch: 2   Global Step: 117150   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:34,326-Speed 2632.31 samples/sec   Loss 11.9309   LearningRate 0.0737   Epoch: 2   Global Step: 117160   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:38,222-Speed 2629.31 samples/sec   Loss 12.0753   LearningRate 0.0737   Epoch: 2   Global Step: 117170   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:42,115-Speed 2631.03 samples/sec   Loss 12.4284   LearningRate 0.0737   Epoch: 2   Global Step: 117180   Fp16 Grad Scale: 16384   Required: 80 hours
Training: 2022-04-13 08:52:46,032-Speed 2614.65 samples/sec   Loss 12.1309   LearningRate 0.0737   Epoch: 2   Global Step: 117190   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:52:49,928-Speed 2629.03 samples/sec   Loss 12.0262   LearningRate 0.0737   Epoch: 2   Global Step: 117200   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:52:53,820-Speed 2631.92 samples/sec   Loss 12.1582   LearningRate 0.0737   Epoch: 2   Global Step: 117210   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:52:57,714-Speed 2630.54 samples/sec   Loss 11.9538   LearningRate 0.0737   Epoch: 2   Global Step: 117220   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:53:01,611-Speed 2628.17 samples/sec   Loss 11.9390   LearningRate 0.0737   Epoch: 2   Global Step: 117230   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:53:05,508-Speed 2628.04 samples/sec   Loss 11.9337   LearningRate 0.0737   Epoch: 2   Global Step: 117240   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:53:09,398-Speed 2633.22 samples/sec   Loss 11.9973   LearningRate 0.0737   Epoch: 2   Global Step: 117250   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:53:13,290-Speed 2631.51 samples/sec   Loss 12.0182   LearningRate 0.0737   Epoch: 2   Global Step: 117260   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:53:17,189-Speed 2626.80 samples/sec   Loss 11.8973   LearningRate 0.0737   Epoch: 2   Global Step: 117270   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:53:21,086-Speed 2628.71 samples/sec   Loss 11.8872   LearningRate 0.0737   Epoch: 2   Global Step: 117280   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 08:53:24,979-Speed 2630.74 samples/sec   Loss 12.1040   LearningRate 0.0737   Epoch: 2   Global Step: 117290   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:28,868-Speed 2634.01 samples/sec   Loss 11.8440   LearningRate 0.0737   Epoch: 2   Global Step: 117300   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:32,762-Speed 2629.80 samples/sec   Loss 12.0067   LearningRate 0.0737   Epoch: 2   Global Step: 117310   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:36,654-Speed 2632.11 samples/sec   Loss 11.9581   LearningRate 0.0737   Epoch: 2   Global Step: 117320   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:40,550-Speed 2628.50 samples/sec   Loss 11.8815   LearningRate 0.0737   Epoch: 2   Global Step: 117330   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:44,441-Speed 2632.82 samples/sec   Loss 11.9745   LearningRate 0.0737   Epoch: 2   Global Step: 117340   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:48,334-Speed 2630.83 samples/sec   Loss 12.0768   LearningRate 0.0737   Epoch: 2   Global Step: 117350   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:52,229-Speed 2629.76 samples/sec   Loss 11.9299   LearningRate 0.0737   Epoch: 2   Global Step: 117360   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:53:56,120-Speed 2632.53 samples/sec   Loss 12.1296   LearningRate 0.0737   Epoch: 2   Global Step: 117370   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:54:00,013-Speed 2630.82 samples/sec   Loss 12.0249   LearningRate 0.0737   Epoch: 2   Global Step: 117380   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:54:03,910-Speed 2628.46 samples/sec   Loss 11.8572   LearningRate 0.0737   Epoch: 2   Global Step: 117390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:07,802-Speed 2631.49 samples/sec   Loss 12.0591   LearningRate 0.0737   Epoch: 2   Global Step: 117400   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:11,690-Speed 2634.22 samples/sec   Loss 11.8585   LearningRate 0.0737   Epoch: 2   Global Step: 117410   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:15,580-Speed 2632.94 samples/sec   Loss 11.9499   LearningRate 0.0737   Epoch: 2   Global Step: 117420   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:19,472-Speed 2632.36 samples/sec   Loss 12.0459   LearningRate 0.0737   Epoch: 2   Global Step: 117430   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:23,371-Speed 2626.80 samples/sec   Loss 11.9391   LearningRate 0.0737   Epoch: 2   Global Step: 117440   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:27,268-Speed 2628.82 samples/sec   Loss 11.6792   LearningRate 0.0737   Epoch: 2   Global Step: 117450   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:31,162-Speed 2630.40 samples/sec   Loss 11.9228   LearningRate 0.0737   Epoch: 2   Global Step: 117460   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:35,069-Speed 2621.04 samples/sec   Loss 11.9331   LearningRate 0.0737   Epoch: 2   Global Step: 117470   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:38,972-Speed 2624.17 samples/sec   Loss 11.9872   LearningRate 0.0737   Epoch: 2   Global Step: 117480   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:42,847-Speed 2643.57 samples/sec   Loss 11.9365   LearningRate 0.0737   Epoch: 2   Global Step: 117490   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:46,740-Speed 2631.19 samples/sec   Loss 11.8601   LearningRate 0.0737   Epoch: 2   Global Step: 117500   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:50,657-Speed 2614.83 samples/sec   Loss 11.9074   LearningRate 0.0737   Epoch: 2   Global Step: 117510   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:54,550-Speed 2631.02 samples/sec   Loss 11.9360   LearningRate 0.0737   Epoch: 2   Global Step: 117520   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:54:58,463-Speed 2617.76 samples/sec   Loss 11.8626   LearningRate 0.0737   Epoch: 2   Global Step: 117530   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:55:02,344-Speed 2639.48 samples/sec   Loss 11.9055   LearningRate 0.0737   Epoch: 2   Global Step: 117540   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:06,237-Speed 2631.32 samples/sec   Loss 11.9528   LearningRate 0.0737   Epoch: 2   Global Step: 117550   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:10,139-Speed 2624.54 samples/sec   Loss 11.8604   LearningRate 0.0737   Epoch: 2   Global Step: 117560   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:14,039-Speed 2626.69 samples/sec   Loss 11.9347   LearningRate 0.0737   Epoch: 2   Global Step: 117570   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:17,944-Speed 2622.29 samples/sec   Loss 11.8489   LearningRate 0.0737   Epoch: 2   Global Step: 117580   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:21,852-Speed 2620.95 samples/sec   Loss 11.9981   LearningRate 0.0737   Epoch: 2   Global Step: 117590   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:25,744-Speed 2631.95 samples/sec   Loss 11.9963   LearningRate 0.0737   Epoch: 2   Global Step: 117600   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:29,634-Speed 2633.87 samples/sec   Loss 11.8162   LearningRate 0.0737   Epoch: 2   Global Step: 117610   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:33,523-Speed 2633.34 samples/sec   Loss 11.9221   LearningRate 0.0737   Epoch: 2   Global Step: 117620   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:37,414-Speed 2631.87 samples/sec   Loss 12.0157   LearningRate 0.0737   Epoch: 2   Global Step: 117630   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 08:55:41,321-Speed 2621.40 samples/sec   Loss 11.9540   LearningRate 0.0736   Epoch: 2   Global Step: 117640   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:55:45,251-Speed 2606.51 samples/sec   Loss 12.0406   LearningRate 0.0736   Epoch: 2   Global Step: 117650   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:55:49,152-Speed 2625.71 samples/sec   Loss 11.9340   LearningRate 0.0736   Epoch: 2   Global Step: 117660   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:55:53,062-Speed 2619.81 samples/sec   Loss 11.9973   LearningRate 0.0736   Epoch: 2   Global Step: 117670   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:55:56,972-Speed 2619.58 samples/sec   Loss 12.0556   LearningRate 0.0736   Epoch: 2   Global Step: 117680   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:00,877-Speed 2622.95 samples/sec   Loss 11.8589   LearningRate 0.0736   Epoch: 2   Global Step: 117690   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:04,790-Speed 2617.44 samples/sec   Loss 11.8397   LearningRate 0.0736   Epoch: 2   Global Step: 117700   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:08,705-Speed 2615.99 samples/sec   Loss 11.9817   LearningRate 0.0736   Epoch: 2   Global Step: 117710   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:12,601-Speed 2629.14 samples/sec   Loss 11.9664   LearningRate 0.0736   Epoch: 2   Global Step: 117720   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:16,490-Speed 2633.62 samples/sec   Loss 11.9719   LearningRate 0.0736   Epoch: 2   Global Step: 117730   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:20,385-Speed 2629.50 samples/sec   Loss 11.9412   LearningRate 0.0736   Epoch: 2   Global Step: 117740   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:56:24,279-Speed 2630.41 samples/sec   Loss 11.8853   LearningRate 0.0736   Epoch: 2   Global Step: 117750   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:56:28,173-Speed 2630.50 samples/sec   Loss 12.0130   LearningRate 0.0736   Epoch: 2   Global Step: 117760   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:56:32,066-Speed 2631.44 samples/sec   Loss 11.9533   LearningRate 0.0736   Epoch: 2   Global Step: 117770   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:56:35,945-Speed 2640.07 samples/sec   Loss 11.8916   LearningRate 0.0736   Epoch: 2   Global Step: 117780   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:39,909-Speed 2583.72 samples/sec   Loss 11.8579   LearningRate 0.0736   Epoch: 2   Global Step: 117790   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:43,864-Speed 2589.82 samples/sec   Loss 11.9144   LearningRate 0.0736   Epoch: 2   Global Step: 117800   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:47,766-Speed 2624.94 samples/sec   Loss 11.9713   LearningRate 0.0736   Epoch: 2   Global Step: 117810   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:51,679-Speed 2617.11 samples/sec   Loss 11.9801   LearningRate 0.0736   Epoch: 2   Global Step: 117820   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:55,585-Speed 2623.05 samples/sec   Loss 12.0006   LearningRate 0.0736   Epoch: 2   Global Step: 117830   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:56:59,486-Speed 2625.91 samples/sec   Loss 12.0635   LearningRate 0.0736   Epoch: 2   Global Step: 117840   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:57:03,386-Speed 2626.06 samples/sec   Loss 11.9006   LearningRate 0.0736   Epoch: 2   Global Step: 117850   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:57:07,287-Speed 2625.35 samples/sec   Loss 11.9145   LearningRate 0.0736   Epoch: 2   Global Step: 117860   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:57:11,197-Speed 2619.27 samples/sec   Loss 11.9371   LearningRate 0.0736   Epoch: 2   Global Step: 117870   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:57:15,104-Speed 2621.74 samples/sec   Loss 11.8372   LearningRate 0.0736   Epoch: 2   Global Step: 117880   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:19,033-Speed 2607.37 samples/sec   Loss 11.8798   LearningRate 0.0736   Epoch: 2   Global Step: 117890   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:22,945-Speed 2618.10 samples/sec   Loss 11.9389   LearningRate 0.0736   Epoch: 2   Global Step: 117900   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:26,848-Speed 2624.48 samples/sec   Loss 11.8592   LearningRate 0.0736   Epoch: 2   Global Step: 117910   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:30,755-Speed 2621.52 samples/sec   Loss 11.9575   LearningRate 0.0736   Epoch: 2   Global Step: 117920   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:34,672-Speed 2615.14 samples/sec   Loss 11.9312   LearningRate 0.0736   Epoch: 2   Global Step: 117930   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:38,581-Speed 2620.06 samples/sec   Loss 12.0170   LearningRate 0.0736   Epoch: 2   Global Step: 117940   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:42,490-Speed 2620.11 samples/sec   Loss 12.0212   LearningRate 0.0736   Epoch: 2   Global Step: 117950   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:46,384-Speed 2629.97 samples/sec   Loss 11.8646   LearningRate 0.0736   Epoch: 2   Global Step: 117960   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:50,282-Speed 2628.31 samples/sec   Loss 11.9074   LearningRate 0.0736   Epoch: 2   Global Step: 117970   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:54,160-Speed 2640.91 samples/sec   Loss 11.9334   LearningRate 0.0736   Epoch: 2   Global Step: 117980   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:57:58,075-Speed 2616.60 samples/sec   Loss 11.7371   LearningRate 0.0736   Epoch: 2   Global Step: 117990   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:58:01,993-Speed 2613.95 samples/sec   Loss 11.8595   LearningRate 0.0736   Epoch: 2   Global Step: 118000   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:58:05,870-Speed 2642.32 samples/sec   Loss 11.9039   LearningRate 0.0736   Epoch: 2   Global Step: 118010   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:09,763-Speed 2631.05 samples/sec   Loss 11.9608   LearningRate 0.0736   Epoch: 2   Global Step: 118020   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:13,660-Speed 2628.28 samples/sec   Loss 11.9086   LearningRate 0.0736   Epoch: 2   Global Step: 118030   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:17,553-Speed 2630.96 samples/sec   Loss 11.7136   LearningRate 0.0736   Epoch: 2   Global Step: 118040   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:21,449-Speed 2628.29 samples/sec   Loss 11.8635   LearningRate 0.0736   Epoch: 2   Global Step: 118050   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:25,344-Speed 2630.23 samples/sec   Loss 12.0012   LearningRate 0.0736   Epoch: 2   Global Step: 118060   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:29,236-Speed 2632.10 samples/sec   Loss 12.0541   LearningRate 0.0736   Epoch: 2   Global Step: 118070   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:33,141-Speed 2622.27 samples/sec   Loss 11.8097   LearningRate 0.0736   Epoch: 2   Global Step: 118080   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:37,035-Speed 2630.57 samples/sec   Loss 11.7451   LearningRate 0.0736   Epoch: 2   Global Step: 118090   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:40,929-Speed 2630.81 samples/sec   Loss 11.7566   LearningRate 0.0736   Epoch: 2   Global Step: 118100   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:58:44,822-Speed 2631.40 samples/sec   Loss 11.8925   LearningRate 0.0736   Epoch: 2   Global Step: 118110   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:58:48,713-Speed 2631.74 samples/sec   Loss 11.8493   LearningRate 0.0736   Epoch: 2   Global Step: 118120   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:58:52,624-Speed 2619.00 samples/sec   Loss 11.7599   LearningRate 0.0735   Epoch: 2   Global Step: 118130   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:58:56,521-Speed 2628.86 samples/sec   Loss 11.7724   LearningRate 0.0735   Epoch: 2   Global Step: 118140   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:59:00,425-Speed 2623.26 samples/sec   Loss 11.9398   LearningRate 0.0735   Epoch: 2   Global Step: 118150   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:59:04,322-Speed 2627.89 samples/sec   Loss 12.0056   LearningRate 0.0735   Epoch: 2   Global Step: 118160   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:59:08,219-Speed 2629.43 samples/sec   Loss 11.9017   LearningRate 0.0735   Epoch: 2   Global Step: 118170   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:59:12,098-Speed 2640.38 samples/sec   Loss 11.9225   LearningRate 0.0735   Epoch: 2   Global Step: 118180   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:15,997-Speed 2627.74 samples/sec   Loss 11.9236   LearningRate 0.0735   Epoch: 2   Global Step: 118190   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:19,914-Speed 2615.10 samples/sec   Loss 11.7372   LearningRate 0.0735   Epoch: 2   Global Step: 118200   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:23,810-Speed 2628.87 samples/sec   Loss 11.8968   LearningRate 0.0735   Epoch: 2   Global Step: 118210   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:27,705-Speed 2630.33 samples/sec   Loss 11.8301   LearningRate 0.0735   Epoch: 2   Global Step: 118220   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:31,599-Speed 2629.55 samples/sec   Loss 11.8247   LearningRate 0.0735   Epoch: 2   Global Step: 118230   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:35,493-Speed 2630.56 samples/sec   Loss 11.9002   LearningRate 0.0735   Epoch: 2   Global Step: 118240   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:39,412-Speed 2612.86 samples/sec   Loss 11.8573   LearningRate 0.0735   Epoch: 2   Global Step: 118250   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:43,313-Speed 2626.35 samples/sec   Loss 11.9724   LearningRate 0.0735   Epoch: 2   Global Step: 118260   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:47,204-Speed 2632.15 samples/sec   Loss 11.7616   LearningRate 0.0735   Epoch: 2   Global Step: 118270   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 08:59:51,097-Speed 2631.61 samples/sec   Loss 11.9315   LearningRate 0.0735   Epoch: 2   Global Step: 118280   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:59:54,989-Speed 2631.54 samples/sec   Loss 11.7485   LearningRate 0.0735   Epoch: 2   Global Step: 118290   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 08:59:58,881-Speed 2631.76 samples/sec   Loss 11.9002   LearningRate 0.0735   Epoch: 2   Global Step: 118300   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:00:02,775-Speed 2630.17 samples/sec   Loss 11.9119   LearningRate 0.0735   Epoch: 2   Global Step: 118310   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:00:06,671-Speed 2628.37 samples/sec   Loss 11.9136   LearningRate 0.0735   Epoch: 2   Global Step: 118320   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:00:10,568-Speed 2628.09 samples/sec   Loss 11.9895   LearningRate 0.0735   Epoch: 2   Global Step: 118330   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:00:14,617-Speed 2530.09 samples/sec   Loss 11.8178   LearningRate 0.0735   Epoch: 2   Global Step: 118340   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:00:18,683-Speed 2519.03 samples/sec   Loss 11.7766   LearningRate 0.0735   Epoch: 2   Global Step: 118350   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:00:22,721-Speed 2536.93 samples/sec   Loss 12.0801   LearningRate 0.0735   Epoch: 2   Global Step: 118360   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:26,684-Speed 2583.85 samples/sec   Loss 11.9333   LearningRate 0.0735   Epoch: 2   Global Step: 118370   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:30,748-Speed 2520.74 samples/sec   Loss 11.8618   LearningRate 0.0735   Epoch: 2   Global Step: 118380   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:34,826-Speed 2511.24 samples/sec   Loss 11.9710   LearningRate 0.0735   Epoch: 2   Global Step: 118390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:38,865-Speed 2536.05 samples/sec   Loss 11.8974   LearningRate 0.0735   Epoch: 2   Global Step: 118400   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:42,758-Speed 2631.08 samples/sec   Loss 11.9326   LearningRate 0.0735   Epoch: 2   Global Step: 118410   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:46,746-Speed 2568.34 samples/sec   Loss 11.9914   LearningRate 0.0735   Epoch: 2   Global Step: 118420   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:50,641-Speed 2629.74 samples/sec   Loss 11.8481   LearningRate 0.0735   Epoch: 2   Global Step: 118430   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:54,565-Speed 2610.75 samples/sec   Loss 11.6769   LearningRate 0.0735   Epoch: 2   Global Step: 118440   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:00:58,455-Speed 2633.05 samples/sec   Loss 11.8882   LearningRate 0.0735   Epoch: 2   Global Step: 118450   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:02,355-Speed 2626.51 samples/sec   Loss 12.0606   LearningRate 0.0735   Epoch: 2   Global Step: 118460   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:01:06,248-Speed 2630.83 samples/sec   Loss 11.8521   LearningRate 0.0735   Epoch: 2   Global Step: 118470   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:01:10,139-Speed 2631.64 samples/sec   Loss 11.9110   LearningRate 0.0735   Epoch: 2   Global Step: 118480   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:01:14,016-Speed 2642.67 samples/sec   Loss 12.0019   LearningRate 0.0735   Epoch: 2   Global Step: 118490   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:17,903-Speed 2634.77 samples/sec   Loss 11.8310   LearningRate 0.0735   Epoch: 2   Global Step: 118500   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:21,803-Speed 2626.78 samples/sec   Loss 11.8743   LearningRate 0.0735   Epoch: 2   Global Step: 118510   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:25,700-Speed 2627.79 samples/sec   Loss 12.0285   LearningRate 0.0735   Epoch: 2   Global Step: 118520   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:29,598-Speed 2627.64 samples/sec   Loss 11.9841   LearningRate 0.0735   Epoch: 2   Global Step: 118530   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:33,517-Speed 2613.32 samples/sec   Loss 11.9995   LearningRate 0.0735   Epoch: 2   Global Step: 118540   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:37,409-Speed 2631.76 samples/sec   Loss 12.0445   LearningRate 0.0735   Epoch: 2   Global Step: 118550   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:41,309-Speed 2626.37 samples/sec   Loss 11.9895   LearningRate 0.0735   Epoch: 2   Global Step: 118560   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:45,205-Speed 2629.03 samples/sec   Loss 11.8650   LearningRate 0.0735   Epoch: 2   Global Step: 118570   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:49,097-Speed 2631.99 samples/sec   Loss 11.9302   LearningRate 0.0735   Epoch: 2   Global Step: 118580   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:01:52,991-Speed 2630.18 samples/sec   Loss 11.9371   LearningRate 0.0735   Epoch: 2   Global Step: 118590   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:01:56,889-Speed 2627.83 samples/sec   Loss 12.1032   LearningRate 0.0735   Epoch: 2   Global Step: 118600   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:02:00,803-Speed 2617.35 samples/sec   Loss 11.9373   LearningRate 0.0734   Epoch: 2   Global Step: 118610   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:02:04,701-Speed 2627.46 samples/sec   Loss 11.9241   LearningRate 0.0734   Epoch: 2   Global Step: 118620   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:02:08,584-Speed 2637.81 samples/sec   Loss 11.8163   LearningRate 0.0734   Epoch: 2   Global Step: 118630   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:12,478-Speed 2629.89 samples/sec   Loss 11.7069   LearningRate 0.0734   Epoch: 2   Global Step: 118640   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:16,475-Speed 2563.16 samples/sec   Loss 11.7726   LearningRate 0.0734   Epoch: 2   Global Step: 118650   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:20,371-Speed 2629.30 samples/sec   Loss 11.9193   LearningRate 0.0734   Epoch: 2   Global Step: 118660   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:24,266-Speed 2629.33 samples/sec   Loss 11.7560   LearningRate 0.0734   Epoch: 2   Global Step: 118670   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:28,156-Speed 2633.22 samples/sec   Loss 11.9829   LearningRate 0.0734   Epoch: 2   Global Step: 118680   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:32,063-Speed 2621.25 samples/sec   Loss 12.0131   LearningRate 0.0734   Epoch: 2   Global Step: 118690   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:35,953-Speed 2634.00 samples/sec   Loss 11.9088   LearningRate 0.0734   Epoch: 2   Global Step: 118700   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:39,844-Speed 2631.61 samples/sec   Loss 11.9151   LearningRate 0.0734   Epoch: 2   Global Step: 118710   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:43,740-Speed 2628.93 samples/sec   Loss 11.9617   LearningRate 0.0734   Epoch: 2   Global Step: 118720   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:47,632-Speed 2631.86 samples/sec   Loss 11.8606   LearningRate 0.0734   Epoch: 2   Global Step: 118730   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:02:51,537-Speed 2623.08 samples/sec   Loss 12.0077   LearningRate 0.0734   Epoch: 2   Global Step: 118740   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:55,438-Speed 2625.88 samples/sec   Loss 11.7717   LearningRate 0.0734   Epoch: 2   Global Step: 118750   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:02:59,330-Speed 2631.55 samples/sec   Loss 11.9851   LearningRate 0.0734   Epoch: 2   Global Step: 118760   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:03,239-Speed 2620.35 samples/sec   Loss 11.8885   LearningRate 0.0734   Epoch: 2   Global Step: 118770   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:07,131-Speed 2631.58 samples/sec   Loss 11.9190   LearningRate 0.0734   Epoch: 2   Global Step: 118780   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:11,025-Speed 2630.49 samples/sec   Loss 11.9659   LearningRate 0.0734   Epoch: 2   Global Step: 118790   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:14,920-Speed 2629.72 samples/sec   Loss 11.8391   LearningRate 0.0734   Epoch: 2   Global Step: 118800   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:18,814-Speed 2630.34 samples/sec   Loss 11.8987   LearningRate 0.0734   Epoch: 2   Global Step: 118810   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:22,711-Speed 2628.60 samples/sec   Loss 11.7759   LearningRate 0.0734   Epoch: 2   Global Step: 118820   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:26,609-Speed 2627.76 samples/sec   Loss 11.8881   LearningRate 0.0734   Epoch: 2   Global Step: 118830   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:30,512-Speed 2623.76 samples/sec   Loss 11.7901   LearningRate 0.0734   Epoch: 2   Global Step: 118840   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:03:34,396-Speed 2637.71 samples/sec   Loss 12.0241   LearningRate 0.0734   Epoch: 2   Global Step: 118850   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:38,297-Speed 2625.69 samples/sec   Loss 11.9551   LearningRate 0.0734   Epoch: 2   Global Step: 118860   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:42,222-Speed 2609.71 samples/sec   Loss 11.9040   LearningRate 0.0734   Epoch: 2   Global Step: 118870   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:46,124-Speed 2624.49 samples/sec   Loss 11.8104   LearningRate 0.0734   Epoch: 2   Global Step: 118880   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:03:49,990-Speed 2650.05 samples/sec   Loss 11.9411   LearningRate 0.0734   Epoch: 2   Global Step: 118890   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:03:53,893-Speed 2623.56 samples/sec   Loss 11.8937   LearningRate 0.0734   Epoch: 2   Global Step: 118900   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:03:57,802-Speed 2620.34 samples/sec   Loss 11.9929   LearningRate 0.0734   Epoch: 2   Global Step: 118910   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:01,708-Speed 2622.59 samples/sec   Loss 11.8118   LearningRate 0.0734   Epoch: 2   Global Step: 118920   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:05,609-Speed 2625.95 samples/sec   Loss 12.0314   LearningRate 0.0734   Epoch: 2   Global Step: 118930   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:09,507-Speed 2627.12 samples/sec   Loss 11.9818   LearningRate 0.0734   Epoch: 2   Global Step: 118940   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:13,411-Speed 2623.51 samples/sec   Loss 11.8860   LearningRate 0.0734   Epoch: 2   Global Step: 118950   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:17,308-Speed 2628.53 samples/sec   Loss 11.8929   LearningRate 0.0734   Epoch: 2   Global Step: 118960   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:21,203-Speed 2629.89 samples/sec   Loss 11.7208   LearningRate 0.0734   Epoch: 2   Global Step: 118970   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:25,098-Speed 2629.68 samples/sec   Loss 11.8515   LearningRate 0.0734   Epoch: 2   Global Step: 118980   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:04:28,998-Speed 2626.35 samples/sec   Loss 11.9491   LearningRate 0.0734   Epoch: 2   Global Step: 118990   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:04:32,888-Speed 2633.44 samples/sec   Loss 11.8419   LearningRate 0.0734   Epoch: 2   Global Step: 119000   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:04:36,805-Speed 2614.99 samples/sec   Loss 11.8090   LearningRate 0.0734   Epoch: 2   Global Step: 119010   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:04:40,721-Speed 2615.21 samples/sec   Loss 11.8018   LearningRate 0.0734   Epoch: 2   Global Step: 119020   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:04:44,644-Speed 2610.72 samples/sec   Loss 11.8772   LearningRate 0.0734   Epoch: 2   Global Step: 119030   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:04:48,539-Speed 2630.31 samples/sec   Loss 11.8381   LearningRate 0.0734   Epoch: 2   Global Step: 119040   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:04:52,433-Speed 2630.55 samples/sec   Loss 11.7580   LearningRate 0.0734   Epoch: 2   Global Step: 119050   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:04:56,349-Speed 2615.48 samples/sec   Loss 11.7459   LearningRate 0.0734   Epoch: 2   Global Step: 119060   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:05:00,258-Speed 2620.71 samples/sec   Loss 11.9167   LearningRate 0.0734   Epoch: 2   Global Step: 119070   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:05:04,157-Speed 2626.73 samples/sec   Loss 11.9433   LearningRate 0.0734   Epoch: 2   Global Step: 119080   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:05:08,048-Speed 2632.30 samples/sec   Loss 11.9873   LearningRate 0.0733   Epoch: 2   Global Step: 119090   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:11,940-Speed 2632.25 samples/sec   Loss 11.8174   LearningRate 0.0733   Epoch: 2   Global Step: 119100   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:15,835-Speed 2630.13 samples/sec   Loss 11.8452   LearningRate 0.0733   Epoch: 2   Global Step: 119110   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:19,730-Speed 2629.73 samples/sec   Loss 11.8233   LearningRate 0.0733   Epoch: 2   Global Step: 119120   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:23,636-Speed 2622.46 samples/sec   Loss 11.7752   LearningRate 0.0733   Epoch: 2   Global Step: 119130   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:27,534-Speed 2627.39 samples/sec   Loss 11.8988   LearningRate 0.0733   Epoch: 2   Global Step: 119140   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:31,426-Speed 2631.83 samples/sec   Loss 11.8180   LearningRate 0.0733   Epoch: 2   Global Step: 119150   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:35,327-Speed 2625.64 samples/sec   Loss 11.7974   LearningRate 0.0733   Epoch: 2   Global Step: 119160   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:39,223-Speed 2628.88 samples/sec   Loss 11.9197   LearningRate 0.0733   Epoch: 2   Global Step: 119170   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:43,122-Speed 2626.59 samples/sec   Loss 11.9297   LearningRate 0.0733   Epoch: 2   Global Step: 119180   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:47,029-Speed 2621.68 samples/sec   Loss 11.8786   LearningRate 0.0733   Epoch: 2   Global Step: 119190   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:05:50,920-Speed 2632.95 samples/sec   Loss 11.8557   LearningRate 0.0733   Epoch: 2   Global Step: 119200   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:05:54,805-Speed 2636.19 samples/sec   Loss 11.8640   LearningRate 0.0733   Epoch: 2   Global Step: 119210   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:05:58,694-Speed 2634.34 samples/sec   Loss 11.9833   LearningRate 0.0733   Epoch: 2   Global Step: 119220   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:06:02,574-Speed 2639.18 samples/sec   Loss 11.8615   LearningRate 0.0733   Epoch: 2   Global Step: 119230   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:06,469-Speed 2629.71 samples/sec   Loss 11.8564   LearningRate 0.0733   Epoch: 2   Global Step: 119240   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:10,365-Speed 2628.65 samples/sec   Loss 11.9028   LearningRate 0.0733   Epoch: 2   Global Step: 119250   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:14,276-Speed 2619.00 samples/sec   Loss 11.9499   LearningRate 0.0733   Epoch: 2   Global Step: 119260   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:18,169-Speed 2631.00 samples/sec   Loss 11.9006   LearningRate 0.0733   Epoch: 2   Global Step: 119270   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:22,062-Speed 2631.25 samples/sec   Loss 11.9585   LearningRate 0.0733   Epoch: 2   Global Step: 119280   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:25,961-Speed 2627.18 samples/sec   Loss 11.8451   LearningRate 0.0733   Epoch: 2   Global Step: 119290   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:29,853-Speed 2632.01 samples/sec   Loss 11.9082   LearningRate 0.0733   Epoch: 2   Global Step: 119300   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:33,746-Speed 2631.08 samples/sec   Loss 11.7541   LearningRate 0.0733   Epoch: 2   Global Step: 119310   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:37,637-Speed 2632.16 samples/sec   Loss 11.9750   LearningRate 0.0733   Epoch: 2   Global Step: 119320   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:06:41,560-Speed 2610.82 samples/sec   Loss 11.9140   LearningRate 0.0733   Epoch: 2   Global Step: 119330   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:06:45,458-Speed 2627.38 samples/sec   Loss 11.8632   LearningRate 0.0733   Epoch: 2   Global Step: 119340   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:06:49,384-Speed 2608.77 samples/sec   Loss 11.9346   LearningRate 0.0733   Epoch: 2   Global Step: 119350   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:06:53,330-Speed 2596.52 samples/sec   Loss 11.7785   LearningRate 0.0733   Epoch: 2   Global Step: 119360   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:06:57,220-Speed 2632.88 samples/sec   Loss 11.7927   LearningRate 0.0733   Epoch: 2   Global Step: 119370   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:01,137-Speed 2616.11 samples/sec   Loss 11.9336   LearningRate 0.0733   Epoch: 2   Global Step: 119380   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:05,035-Speed 2627.98 samples/sec   Loss 12.0248   LearningRate 0.0733   Epoch: 2   Global Step: 119390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:08,953-Speed 2613.98 samples/sec   Loss 11.8378   LearningRate 0.0733   Epoch: 2   Global Step: 119400   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:12,845-Speed 2631.32 samples/sec   Loss 11.8458   LearningRate 0.0733   Epoch: 2   Global Step: 119410   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:16,739-Speed 2630.78 samples/sec   Loss 11.8719   LearningRate 0.0733   Epoch: 2   Global Step: 119420   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:20,637-Speed 2628.07 samples/sec   Loss 11.8802   LearningRate 0.0733   Epoch: 2   Global Step: 119430   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:07:24,537-Speed 2626.08 samples/sec   Loss 11.7986   LearningRate 0.0733   Epoch: 2   Global Step: 119440   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:07:28,432-Speed 2629.76 samples/sec   Loss 12.0575   LearningRate 0.0733   Epoch: 2   Global Step: 119450   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:07:32,330-Speed 2627.91 samples/sec   Loss 11.8551   LearningRate 0.0733   Epoch: 2   Global Step: 119460   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:07:36,224-Speed 2630.02 samples/sec   Loss 11.9343   LearningRate 0.0733   Epoch: 2   Global Step: 119470   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:07:40,103-Speed 2640.50 samples/sec   Loss 11.8946   LearningRate 0.0733   Epoch: 2   Global Step: 119480   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:44,043-Speed 2599.49 samples/sec   Loss 11.9431   LearningRate 0.0733   Epoch: 2   Global Step: 119490   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:47,952-Speed 2620.56 samples/sec   Loss 11.8454   LearningRate 0.0733   Epoch: 2   Global Step: 119500   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:51,846-Speed 2630.82 samples/sec   Loss 11.8195   LearningRate 0.0733   Epoch: 2   Global Step: 119510   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:55,743-Speed 2627.69 samples/sec   Loss 11.9112   LearningRate 0.0733   Epoch: 2   Global Step: 119520   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:07:59,638-Speed 2630.53 samples/sec   Loss 11.9521   LearningRate 0.0733   Epoch: 2   Global Step: 119530   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:08:03,530-Speed 2631.48 samples/sec   Loss 11.9301   LearningRate 0.0733   Epoch: 2   Global Step: 119540   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:08:07,421-Speed 2632.40 samples/sec   Loss 11.9364   LearningRate 0.0733   Epoch: 2   Global Step: 119550   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:08:11,314-Speed 2630.42 samples/sec   Loss 11.9972   LearningRate 0.0733   Epoch: 2   Global Step: 119560   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:08:15,220-Speed 2622.76 samples/sec   Loss 11.9433   LearningRate 0.0733   Epoch: 2   Global Step: 119570   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:08:19,115-Speed 2629.09 samples/sec   Loss 11.9360   LearningRate 0.0732   Epoch: 2   Global Step: 119580   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:08:22,973-Speed 2654.95 samples/sec   Loss 11.8682   LearningRate 0.0732   Epoch: 2   Global Step: 119590   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:26,871-Speed 2627.89 samples/sec   Loss 11.8578   LearningRate 0.0732   Epoch: 2   Global Step: 119600   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:30,763-Speed 2632.17 samples/sec   Loss 11.8526   LearningRate 0.0732   Epoch: 2   Global Step: 119610   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:34,652-Speed 2633.14 samples/sec   Loss 11.9640   LearningRate 0.0732   Epoch: 2   Global Step: 119620   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:38,544-Speed 2631.83 samples/sec   Loss 11.9566   LearningRate 0.0732   Epoch: 2   Global Step: 119630   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:42,438-Speed 2630.07 samples/sec   Loss 11.8657   LearningRate 0.0732   Epoch: 2   Global Step: 119640   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:46,328-Speed 2632.85 samples/sec   Loss 11.9772   LearningRate 0.0732   Epoch: 2   Global Step: 119650   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:50,221-Speed 2630.97 samples/sec   Loss 11.9010   LearningRate 0.0732   Epoch: 2   Global Step: 119660   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:54,114-Speed 2631.86 samples/sec   Loss 11.8531   LearningRate 0.0732   Epoch: 2   Global Step: 119670   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:08:58,005-Speed 2631.91 samples/sec   Loss 11.9425   LearningRate 0.0732   Epoch: 2   Global Step: 119680   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:09:01,901-Speed 2629.44 samples/sec   Loss 12.0005   LearningRate 0.0732   Epoch: 2   Global Step: 119690   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:05,793-Speed 2631.30 samples/sec   Loss 11.8622   LearningRate 0.0732   Epoch: 2   Global Step: 119700   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:09,686-Speed 2631.54 samples/sec   Loss 11.9400   LearningRate 0.0732   Epoch: 2   Global Step: 119710   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:13,585-Speed 2626.79 samples/sec   Loss 11.7860   LearningRate 0.0732   Epoch: 2   Global Step: 119720   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:17,478-Speed 2630.20 samples/sec   Loss 11.8984   LearningRate 0.0732   Epoch: 2   Global Step: 119730   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:21,384-Speed 2622.50 samples/sec   Loss 11.8313   LearningRate 0.0732   Epoch: 2   Global Step: 119740   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:25,270-Speed 2636.04 samples/sec   Loss 12.0126   LearningRate 0.0732   Epoch: 2   Global Step: 119750   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:29,168-Speed 2627.21 samples/sec   Loss 11.8969   LearningRate 0.0732   Epoch: 2   Global Step: 119760   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:33,067-Speed 2627.31 samples/sec   Loss 11.7784   LearningRate 0.0732   Epoch: 2   Global Step: 119770   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:36,962-Speed 2629.53 samples/sec   Loss 11.8658   LearningRate 0.0732   Epoch: 2   Global Step: 119780   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:09:40,854-Speed 2632.07 samples/sec   Loss 11.9293   LearningRate 0.0732   Epoch: 2   Global Step: 119790   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:09:44,747-Speed 2631.04 samples/sec   Loss 11.8130   LearningRate 0.0732   Epoch: 2   Global Step: 119800   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:09:48,640-Speed 2630.66 samples/sec   Loss 12.0266   LearningRate 0.0732   Epoch: 2   Global Step: 119810   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:09:52,531-Speed 2631.80 samples/sec   Loss 11.9329   LearningRate 0.0732   Epoch: 2   Global Step: 119820   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:09:56,473-Speed 2598.60 samples/sec   Loss 11.9035   LearningRate 0.0732   Epoch: 2   Global Step: 119830   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:00,373-Speed 2626.83 samples/sec   Loss 11.8449   LearningRate 0.0732   Epoch: 2   Global Step: 119840   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:04,267-Speed 2630.42 samples/sec   Loss 11.8094   LearningRate 0.0732   Epoch: 2   Global Step: 119850   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:08,170-Speed 2624.07 samples/sec   Loss 11.8808   LearningRate 0.0732   Epoch: 2   Global Step: 119860   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:12,069-Speed 2627.55 samples/sec   Loss 11.8251   LearningRate 0.0732   Epoch: 2   Global Step: 119870   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:15,968-Speed 2626.49 samples/sec   Loss 11.8827   LearningRate 0.0732   Epoch: 2   Global Step: 119880   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:19,868-Speed 2626.21 samples/sec   Loss 11.9792   LearningRate 0.0732   Epoch: 2   Global Step: 119890   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:23,768-Speed 2626.45 samples/sec   Loss 11.8391   LearningRate 0.0732   Epoch: 2   Global Step: 119900   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:27,675-Speed 2621.39 samples/sec   Loss 11.7490   LearningRate 0.0732   Epoch: 2   Global Step: 119910   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:31,586-Speed 2618.90 samples/sec   Loss 11.9006   LearningRate 0.0732   Epoch: 2   Global Step: 119920   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:35,489-Speed 2623.93 samples/sec   Loss 11.8699   LearningRate 0.0732   Epoch: 2   Global Step: 119930   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:10:39,386-Speed 2628.67 samples/sec   Loss 11.8096   LearningRate 0.0732   Epoch: 2   Global Step: 119940   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:43,279-Speed 2630.44 samples/sec   Loss 11.7952   LearningRate 0.0732   Epoch: 2   Global Step: 119950   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:47,180-Speed 2625.66 samples/sec   Loss 11.9398   LearningRate 0.0732   Epoch: 2   Global Step: 119960   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:51,072-Speed 2631.42 samples/sec   Loss 11.7875   LearningRate 0.0732   Epoch: 2   Global Step: 119970   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:54,969-Speed 2628.57 samples/sec   Loss 11.7599   LearningRate 0.0732   Epoch: 2   Global Step: 119980   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:10:58,863-Speed 2630.14 samples/sec   Loss 11.8931   LearningRate 0.0732   Epoch: 2   Global Step: 119990   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:11:02,757-Speed 2630.58 samples/sec   Loss 11.8223   LearningRate 0.0732   Epoch: 2   Global Step: 120000   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:11:45,583-[lfw][120000]XNorm: 23.355378
Training: 2022-04-13 09:11:45,584-[lfw][120000]Accuracy-Flip: 0.99750+-0.00250
Training: 2022-04-13 09:11:45,585-[lfw][120000]Accuracy-Highest: 0.99783
Training: 2022-04-13 09:12:35,723-[cfp_fp][120000]XNorm: 21.516966
Training: 2022-04-13 09:12:35,724-[cfp_fp][120000]Accuracy-Flip: 0.97971+-0.00790
Training: 2022-04-13 09:12:35,724-[cfp_fp][120000]Accuracy-Highest: 0.97986
Training: 2022-04-13 09:13:18,693-[agedb_30][120000]XNorm: 23.355863
Training: 2022-04-13 09:13:18,694-[agedb_30][120000]Accuracy-Flip: 0.96800+-0.00852
Training: 2022-04-13 09:13:18,694-[agedb_30][120000]Accuracy-Highest: 0.96800
Training: 2022-04-13 09:13:22,572-Speed 73.24 samples/sec   Loss 11.7953   LearningRate 0.0732   Epoch: 2   Global Step: 120010   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:13:26,437-Speed 2650.30 samples/sec   Loss 11.9058   LearningRate 0.0732   Epoch: 2   Global Step: 120020   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:13:30,304-Speed 2648.67 samples/sec   Loss 11.8279   LearningRate 0.0732   Epoch: 2   Global Step: 120030   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:13:34,162-Speed 2654.41 samples/sec   Loss 11.8315   LearningRate 0.0732   Epoch: 2   Global Step: 120040   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:13:38,042-Speed 2640.03 samples/sec   Loss 11.9734   LearningRate 0.0732   Epoch: 2   Global Step: 120050   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:13:41,923-Speed 2638.97 samples/sec   Loss 11.8406   LearningRate 0.0731   Epoch: 2   Global Step: 120060   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:13:45,809-Speed 2636.01 samples/sec   Loss 11.8424   LearningRate 0.0731   Epoch: 2   Global Step: 120070   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:13:49,710-Speed 2625.62 samples/sec   Loss 11.8074   LearningRate 0.0731   Epoch: 2   Global Step: 120080   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:13:53,585-Speed 2643.00 samples/sec   Loss 11.9036   LearningRate 0.0731   Epoch: 2   Global Step: 120090   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:13:57,466-Speed 2638.65 samples/sec   Loss 11.8632   LearningRate 0.0731   Epoch: 2   Global Step: 120100   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:01,408-Speed 2598.70 samples/sec   Loss 11.8836   LearningRate 0.0731   Epoch: 2   Global Step: 120110   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:05,362-Speed 2590.47 samples/sec   Loss 11.8211   LearningRate 0.0731   Epoch: 2   Global Step: 120120   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:09,263-Speed 2625.53 samples/sec   Loss 11.7182   LearningRate 0.0731   Epoch: 2   Global Step: 120130   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:13,150-Speed 2634.57 samples/sec   Loss 11.7325   LearningRate 0.0731   Epoch: 2   Global Step: 120140   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:14:17,090-Speed 2599.95 samples/sec   Loss 11.9489   LearningRate 0.0731   Epoch: 2   Global Step: 120150   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:14:20,995-Speed 2623.18 samples/sec   Loss 11.8308   LearningRate 0.0731   Epoch: 2   Global Step: 120160   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:14:24,893-Speed 2627.45 samples/sec   Loss 11.9262   LearningRate 0.0731   Epoch: 2   Global Step: 120170   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:14:28,792-Speed 2626.58 samples/sec   Loss 11.7641   LearningRate 0.0731   Epoch: 2   Global Step: 120180   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:14:32,681-Speed 2633.72 samples/sec   Loss 11.8209   LearningRate 0.0731   Epoch: 2   Global Step: 120190   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:14:36,555-Speed 2644.21 samples/sec   Loss 11.8369   LearningRate 0.0731   Epoch: 2   Global Step: 120200   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:40,464-Speed 2619.43 samples/sec   Loss 11.9501   LearningRate 0.0731   Epoch: 2   Global Step: 120210   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:44,359-Speed 2630.00 samples/sec   Loss 11.9429   LearningRate 0.0731   Epoch: 2   Global Step: 120220   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:48,256-Speed 2627.96 samples/sec   Loss 11.9547   LearningRate 0.0731   Epoch: 2   Global Step: 120230   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:52,157-Speed 2626.06 samples/sec   Loss 11.8869   LearningRate 0.0731   Epoch: 2   Global Step: 120240   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:56,043-Speed 2635.88 samples/sec   Loss 12.0201   LearningRate 0.0731   Epoch: 2   Global Step: 120250   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:14:59,928-Speed 2636.25 samples/sec   Loss 11.8795   LearningRate 0.0731   Epoch: 2   Global Step: 120260   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:15:03,818-Speed 2632.43 samples/sec   Loss 11.9430   LearningRate 0.0731   Epoch: 2   Global Step: 120270   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:15:07,711-Speed 2631.45 samples/sec   Loss 11.9478   LearningRate 0.0731   Epoch: 2   Global Step: 120280   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:15:11,630-Speed 2612.97 samples/sec   Loss 11.8194   LearningRate 0.0731   Epoch: 2   Global Step: 120290   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:15:15,528-Speed 2627.94 samples/sec   Loss 11.8980   LearningRate 0.0731   Epoch: 2   Global Step: 120300   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:19,418-Speed 2632.57 samples/sec   Loss 11.7210   LearningRate 0.0731   Epoch: 2   Global Step: 120310   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:23,327-Speed 2620.71 samples/sec   Loss 11.9406   LearningRate 0.0731   Epoch: 2   Global Step: 120320   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:27,236-Speed 2620.00 samples/sec   Loss 11.9140   LearningRate 0.0731   Epoch: 2   Global Step: 120330   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:31,119-Speed 2638.23 samples/sec   Loss 11.8058   LearningRate 0.0731   Epoch: 2   Global Step: 120340   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:35,009-Speed 2632.76 samples/sec   Loss 11.9221   LearningRate 0.0731   Epoch: 2   Global Step: 120350   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:38,902-Speed 2630.81 samples/sec   Loss 11.8298   LearningRate 0.0731   Epoch: 2   Global Step: 120360   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:42,791-Speed 2633.33 samples/sec   Loss 11.9415   LearningRate 0.0731   Epoch: 2   Global Step: 120370   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:46,686-Speed 2629.80 samples/sec   Loss 11.7900   LearningRate 0.0731   Epoch: 2   Global Step: 120380   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:50,576-Speed 2632.53 samples/sec   Loss 11.9282   LearningRate 0.0731   Epoch: 2   Global Step: 120390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:15:54,466-Speed 2633.46 samples/sec   Loss 11.9167   LearningRate 0.0731   Epoch: 2   Global Step: 120400   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:15:58,346-Speed 2639.45 samples/sec   Loss 11.7739   LearningRate 0.0731   Epoch: 2   Global Step: 120410   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:02,240-Speed 2630.37 samples/sec   Loss 11.9026   LearningRate 0.0731   Epoch: 2   Global Step: 120420   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:06,132-Speed 2631.88 samples/sec   Loss 11.9568   LearningRate 0.0731   Epoch: 2   Global Step: 120430   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:10,023-Speed 2632.47 samples/sec   Loss 11.8828   LearningRate 0.0731   Epoch: 2   Global Step: 120440   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:13,924-Speed 2625.10 samples/sec   Loss 11.8330   LearningRate 0.0731   Epoch: 2   Global Step: 120450   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:17,816-Speed 2631.77 samples/sec   Loss 11.6569   LearningRate 0.0731   Epoch: 2   Global Step: 120460   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:21,703-Speed 2635.72 samples/sec   Loss 11.8438   LearningRate 0.0731   Epoch: 2   Global Step: 120470   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:25,605-Speed 2624.80 samples/sec   Loss 11.7752   LearningRate 0.0731   Epoch: 2   Global Step: 120480   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:29,499-Speed 2629.77 samples/sec   Loss 11.9160   LearningRate 0.0731   Epoch: 2   Global Step: 120490   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:33,398-Speed 2626.96 samples/sec   Loss 11.7556   LearningRate 0.0731   Epoch: 2   Global Step: 120500   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:37,291-Speed 2630.75 samples/sec   Loss 11.8437   LearningRate 0.0731   Epoch: 2   Global Step: 120510   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:16:41,161-Speed 2646.88 samples/sec   Loss 11.8332   LearningRate 0.0731   Epoch: 2   Global Step: 120520   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:16:45,031-Speed 2647.07 samples/sec   Loss 11.9421   LearningRate 0.0731   Epoch: 2   Global Step: 120530   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:16:48,903-Speed 2645.14 samples/sec   Loss 11.8893   LearningRate 0.0731   Epoch: 2   Global Step: 120540   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:16:52,924-Speed 2555.98 samples/sec   Loss 11.9597   LearningRate 0.0730   Epoch: 2   Global Step: 120550   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:16:56,811-Speed 2634.84 samples/sec   Loss 11.9002   LearningRate 0.0730   Epoch: 2   Global Step: 120560   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:00,699-Speed 2634.40 samples/sec   Loss 11.9216   LearningRate 0.0730   Epoch: 2   Global Step: 120570   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:04,595-Speed 2628.68 samples/sec   Loss 11.7957   LearningRate 0.0730   Epoch: 2   Global Step: 120580   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:08,506-Speed 2618.49 samples/sec   Loss 11.9622   LearningRate 0.0730   Epoch: 2   Global Step: 120590   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:12,410-Speed 2623.87 samples/sec   Loss 11.7891   LearningRate 0.0730   Epoch: 2   Global Step: 120600   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:16,309-Speed 2638.96 samples/sec   Loss 11.9462   LearningRate 0.0730   Epoch: 2   Global Step: 120610   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:20,204-Speed 2629.66 samples/sec   Loss 11.7573   LearningRate 0.0730   Epoch: 2   Global Step: 120620   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:24,145-Speed 2630.56 samples/sec   Loss 11.8937   LearningRate 0.0730   Epoch: 2   Global Step: 120630   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:17:28,050-Speed 2622.42 samples/sec   Loss 11.9023   LearningRate 0.0730   Epoch: 2   Global Step: 120640   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:17:31,943-Speed 2638.99 samples/sec   Loss 11.7641   LearningRate 0.0730   Epoch: 2   Global Step: 120650   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:17:35,830-Speed 2634.59 samples/sec   Loss 11.7250   LearningRate 0.0730   Epoch: 2   Global Step: 120660   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:17:39,797-Speed 2581.77 samples/sec   Loss 11.9324   LearningRate 0.0730   Epoch: 2   Global Step: 120670   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:17:43,743-Speed 2595.64 samples/sec   Loss 11.8826   LearningRate 0.0730   Epoch: 2   Global Step: 120680   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:17:47,740-Speed 2640.96 samples/sec   Loss 11.8526   LearningRate 0.0730   Epoch: 2   Global Step: 120690   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:17:51,889-Speed 2641.89 samples/sec   Loss 11.8919   LearningRate 0.0730   Epoch: 2   Global Step: 120700   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:17:55,807-Speed 2613.88 samples/sec   Loss 11.8648   LearningRate 0.0730   Epoch: 2   Global Step: 120710   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:00,106-Speed 2634.30 samples/sec   Loss 11.8405   LearningRate 0.0730   Epoch: 2   Global Step: 120720   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:04,005-Speed 2626.65 samples/sec   Loss 11.8168   LearningRate 0.0730   Epoch: 2   Global Step: 120730   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:07,936-Speed 2605.87 samples/sec   Loss 11.9654   LearningRate 0.0730   Epoch: 2   Global Step: 120740   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:18:11,843-Speed 2622.16 samples/sec   Loss 11.8105   LearningRate 0.0730   Epoch: 2   Global Step: 120750   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:18:15,742-Speed 2626.35 samples/sec   Loss 11.9260   LearningRate 0.0730   Epoch: 2   Global Step: 120760   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:18:19,620-Speed 2641.13 samples/sec   Loss 11.7581   LearningRate 0.0730   Epoch: 2   Global Step: 120770   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:23,521-Speed 2625.39 samples/sec   Loss 12.0620   LearningRate 0.0730   Epoch: 2   Global Step: 120780   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:27,453-Speed 2605.17 samples/sec   Loss 12.3688   LearningRate 0.0730   Epoch: 2   Global Step: 120790   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:31,359-Speed 2622.48 samples/sec   Loss 11.9747   LearningRate 0.0730   Epoch: 2   Global Step: 120800   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:35,253-Speed 2630.41 samples/sec   Loss 12.0341   LearningRate 0.0730   Epoch: 2   Global Step: 120810   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:39,154-Speed 2625.92 samples/sec   Loss 11.9469   LearningRate 0.0730   Epoch: 2   Global Step: 120820   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:43,057-Speed 2624.10 samples/sec   Loss 11.8490   LearningRate 0.0730   Epoch: 2   Global Step: 120830   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:46,960-Speed 2624.25 samples/sec   Loss 11.6752   LearningRate 0.0730   Epoch: 2   Global Step: 120840   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:50,863-Speed 2623.84 samples/sec   Loss 11.9498   LearningRate 0.0730   Epoch: 2   Global Step: 120850   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:54,769-Speed 2622.38 samples/sec   Loss 11.7887   LearningRate 0.0730   Epoch: 2   Global Step: 120860   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:18:58,673-Speed 2624.23 samples/sec   Loss 11.9175   LearningRate 0.0730   Epoch: 2   Global Step: 120870   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:02,579-Speed 2621.65 samples/sec   Loss 11.8170   LearningRate 0.0730   Epoch: 2   Global Step: 120880   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:06,484-Speed 2623.69 samples/sec   Loss 11.7956   LearningRate 0.0730   Epoch: 2   Global Step: 120890   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:10,387-Speed 2623.74 samples/sec   Loss 11.8179   LearningRate 0.0730   Epoch: 2   Global Step: 120900   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:14,287-Speed 2626.28 samples/sec   Loss 11.9583   LearningRate 0.0730   Epoch: 2   Global Step: 120910   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:18,181-Speed 2630.30 samples/sec   Loss 11.8592   LearningRate 0.0730   Epoch: 2   Global Step: 120920   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:22,081-Speed 2626.36 samples/sec   Loss 11.9992   LearningRate 0.0730   Epoch: 2   Global Step: 120930   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:25,977-Speed 2629.35 samples/sec   Loss 11.8931   LearningRate 0.0730   Epoch: 2   Global Step: 120940   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:29,874-Speed 2628.24 samples/sec   Loss 12.0012   LearningRate 0.0730   Epoch: 2   Global Step: 120950   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:33,777-Speed 2624.03 samples/sec   Loss 11.9725   LearningRate 0.0730   Epoch: 2   Global Step: 120960   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:37,676-Speed 2627.70 samples/sec   Loss 11.8555   LearningRate 0.0730   Epoch: 2   Global Step: 120970   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:19:41,571-Speed 2629.21 samples/sec   Loss 11.7684   LearningRate 0.0730   Epoch: 2   Global Step: 120980   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:19:45,470-Speed 2626.50 samples/sec   Loss 11.9957   LearningRate 0.0730   Epoch: 2   Global Step: 120990   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:19:49,371-Speed 2625.95 samples/sec   Loss 11.8924   LearningRate 0.0730   Epoch: 2   Global Step: 121000   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:19:53,249-Speed 2641.28 samples/sec   Loss 11.7337   LearningRate 0.0730   Epoch: 2   Global Step: 121010   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:19:57,144-Speed 2630.26 samples/sec   Loss 11.7660   LearningRate 0.0730   Epoch: 2   Global Step: 121020   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:01,058-Speed 2616.82 samples/sec   Loss 11.9353   LearningRate 0.0729   Epoch: 2   Global Step: 121030   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:04,967-Speed 2620.13 samples/sec   Loss 11.7850   LearningRate 0.0729   Epoch: 2   Global Step: 121040   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:08,863-Speed 2629.29 samples/sec   Loss 11.9033   LearningRate 0.0729   Epoch: 2   Global Step: 121050   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:12,760-Speed 2628.49 samples/sec   Loss 11.7507   LearningRate 0.0729   Epoch: 2   Global Step: 121060   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:16,657-Speed 2627.87 samples/sec   Loss 11.8643   LearningRate 0.0729   Epoch: 2   Global Step: 121070   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:20,556-Speed 2627.09 samples/sec   Loss 11.6592   LearningRate 0.0729   Epoch: 2   Global Step: 121080   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:24,451-Speed 2629.39 samples/sec   Loss 11.8951   LearningRate 0.0729   Epoch: 2   Global Step: 121090   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:20:28,331-Speed 2640.42 samples/sec   Loss 11.8588   LearningRate 0.0729   Epoch: 2   Global Step: 121100   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:20:32,216-Speed 2636.50 samples/sec   Loss 11.9981   LearningRate 0.0729   Epoch: 2   Global Step: 121110   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:20:36,110-Speed 2629.98 samples/sec   Loss 11.7821   LearningRate 0.0729   Epoch: 2   Global Step: 121120   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:20:40,005-Speed 2629.81 samples/sec   Loss 11.9489   LearningRate 0.0729   Epoch: 2   Global Step: 121130   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:20:43,912-Speed 2620.98 samples/sec   Loss 11.8071   LearningRate 0.0729   Epoch: 2   Global Step: 121140   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:20:47,823-Speed 2619.38 samples/sec   Loss 11.9687   LearningRate 0.0729   Epoch: 2   Global Step: 121150   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:20:51,731-Speed 2620.81 samples/sec   Loss 11.9156   LearningRate 0.0729   Epoch: 2   Global Step: 121160   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:20:55,638-Speed 2621.78 samples/sec   Loss 11.6326   LearningRate 0.0729   Epoch: 2   Global Step: 121170   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:20:59,543-Speed 2622.60 samples/sec   Loss 11.9023   LearningRate 0.0729   Epoch: 2   Global Step: 121180   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:21:03,440-Speed 2628.58 samples/sec   Loss 11.8131   LearningRate 0.0729   Epoch: 2   Global Step: 121190   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:21:07,335-Speed 2629.59 samples/sec   Loss 11.7590   LearningRate 0.0729   Epoch: 2   Global Step: 121200   Fp16 Grad Scale: 32768   Required: 80 hours
Training: 2022-04-13 09:21:11,228-Speed 2630.88 samples/sec   Loss 11.7416   LearningRate 0.0729   Epoch: 2   Global Step: 121210   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:15,135-Speed 2621.82 samples/sec   Loss 11.8528   LearningRate 0.0729   Epoch: 2   Global Step: 121220   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:19,027-Speed 2631.43 samples/sec   Loss 11.7822   LearningRate 0.0729   Epoch: 2   Global Step: 121230   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:22,935-Speed 2621.34 samples/sec   Loss 11.7380   LearningRate 0.0729   Epoch: 2   Global Step: 121240   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:26,836-Speed 2625.41 samples/sec   Loss 11.8334   LearningRate 0.0729   Epoch: 2   Global Step: 121250   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:30,733-Speed 2629.01 samples/sec   Loss 11.7743   LearningRate 0.0729   Epoch: 2   Global Step: 121260   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:34,628-Speed 2629.48 samples/sec   Loss 11.7383   LearningRate 0.0729   Epoch: 2   Global Step: 121270   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:38,525-Speed 2627.57 samples/sec   Loss 11.6860   LearningRate 0.0729   Epoch: 2   Global Step: 121280   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:42,420-Speed 2629.50 samples/sec   Loss 11.7783   LearningRate 0.0729   Epoch: 2   Global Step: 121290   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:46,327-Speed 2621.76 samples/sec   Loss 11.9261   LearningRate 0.0729   Epoch: 2   Global Step: 121300   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:21:50,239-Speed 2618.52 samples/sec   Loss 11.7621   LearningRate 0.0729   Epoch: 2   Global Step: 121310   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:21:54,141-Speed 2625.35 samples/sec   Loss 12.0549   LearningRate 0.0729   Epoch: 2   Global Step: 121320   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:21:58,039-Speed 2627.71 samples/sec   Loss 11.7720   LearningRate 0.0729   Epoch: 2   Global Step: 121330   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:01,937-Speed 2627.63 samples/sec   Loss 11.8362   LearningRate 0.0729   Epoch: 2   Global Step: 121340   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:05,834-Speed 2628.38 samples/sec   Loss 11.8056   LearningRate 0.0729   Epoch: 2   Global Step: 121350   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:09,734-Speed 2625.47 samples/sec   Loss 12.0086   LearningRate 0.0729   Epoch: 2   Global Step: 121360   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:13,635-Speed 2625.91 samples/sec   Loss 11.8193   LearningRate 0.0729   Epoch: 2   Global Step: 121370   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:17,532-Speed 2628.24 samples/sec   Loss 11.7585   LearningRate 0.0729   Epoch: 2   Global Step: 121380   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:21,436-Speed 2623.59 samples/sec   Loss 11.6961   LearningRate 0.0729   Epoch: 2   Global Step: 121390   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:25,337-Speed 2625.78 samples/sec   Loss 11.8371   LearningRate 0.0729   Epoch: 2   Global Step: 121400   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:29,234-Speed 2628.41 samples/sec   Loss 11.8788   LearningRate 0.0729   Epoch: 2   Global Step: 121410   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:22:33,131-Speed 2628.22 samples/sec   Loss 11.7448   LearningRate 0.0729   Epoch: 2   Global Step: 121420   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:22:37,012-Speed 2639.38 samples/sec   Loss 11.9600   LearningRate 0.0729   Epoch: 2   Global Step: 121430   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:40,913-Speed 2624.97 samples/sec   Loss 11.9530   LearningRate 0.0729   Epoch: 2   Global Step: 121440   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:44,822-Speed 2620.16 samples/sec   Loss 11.9358   LearningRate 0.0729   Epoch: 2   Global Step: 121450   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:48,728-Speed 2621.82 samples/sec   Loss 11.9734   LearningRate 0.0729   Epoch: 2   Global Step: 121460   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:52,644-Speed 2616.32 samples/sec   Loss 11.9264   LearningRate 0.0729   Epoch: 2   Global Step: 121470   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:22:56,555-Speed 2618.65 samples/sec   Loss 11.8084   LearningRate 0.0729   Epoch: 2   Global Step: 121480   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:23:00,468-Speed 2618.33 samples/sec   Loss 11.9569   LearningRate 0.0729   Epoch: 2   Global Step: 121490   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:23:04,352-Speed 2637.09 samples/sec   Loss 11.8706   LearningRate 0.0729   Epoch: 2   Global Step: 121500   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:08,260-Speed 2620.89 samples/sec   Loss 11.9519   LearningRate 0.0729   Epoch: 2   Global Step: 121510   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:12,173-Speed 2617.11 samples/sec   Loss 11.8341   LearningRate 0.0728   Epoch: 2   Global Step: 121520   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:16,078-Speed 2623.26 samples/sec   Loss 11.6919   LearningRate 0.0728   Epoch: 2   Global Step: 121530   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:19,977-Speed 2626.64 samples/sec   Loss 11.8432   LearningRate 0.0728   Epoch: 2   Global Step: 121540   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:23,878-Speed 2626.22 samples/sec   Loss 11.9429   LearningRate 0.0728   Epoch: 2   Global Step: 121550   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:27,781-Speed 2623.70 samples/sec   Loss 11.8419   LearningRate 0.0728   Epoch: 2   Global Step: 121560   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:31,687-Speed 2622.91 samples/sec   Loss 11.8540   LearningRate 0.0728   Epoch: 2   Global Step: 121570   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:35,588-Speed 2625.05 samples/sec   Loss 11.8310   LearningRate 0.0728   Epoch: 2   Global Step: 121580   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:39,491-Speed 2624.39 samples/sec   Loss 11.8564   LearningRate 0.0728   Epoch: 2   Global Step: 121590   Fp16 Grad Scale: 65536   Required: 80 hours
Training: 2022-04-13 09:23:43,393-Speed 2624.68 samples/sec   Loss 11.8116   LearningRate 0.0728   Epoch: 2   Global Step: 121600   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:23:47,291-Speed 2627.93 samples/sec   Loss 11.9172   LearningRate 0.0728   Epoch: 2   Global Step: 121610   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:23:51,190-Speed 2626.88 samples/sec   Loss 11.7626   LearningRate 0.0728   Epoch: 2   Global Step: 121620   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:23:55,091-Speed 2625.60 samples/sec   Loss 11.8741   LearningRate 0.0728   Epoch: 2   Global Step: 121630   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:23:59,021-Speed 2606.47 samples/sec   Loss 11.8300   LearningRate 0.0728   Epoch: 2   Global Step: 121640   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:24:02,922-Speed 2625.97 samples/sec   Loss 11.8681   LearningRate 0.0728   Epoch: 2   Global Step: 121650   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:24:06,950-Speed 2542.97 samples/sec   Loss 11.7689   LearningRate 0.0728   Epoch: 2   Global Step: 121660   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:24:10,843-Speed 2630.70 samples/sec   Loss 11.9191   LearningRate 0.0728   Epoch: 2   Global Step: 121670   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:24:14,746-Speed 2624.40 samples/sec   Loss 11.7271   LearningRate 0.0728   Epoch: 2   Global Step: 121680   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:24:18,648-Speed 2624.98 samples/sec   Loss 11.9771   LearningRate 0.0728   Epoch: 2   Global Step: 121690   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:24:22,669-Speed 2548.10 samples/sec   Loss 11.7928   LearningRate 0.0728   Epoch: 2   Global Step: 121700   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:26,648-Speed 2573.64 samples/sec   Loss 11.7094   LearningRate 0.0728   Epoch: 2   Global Step: 121710   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:30,542-Speed 2630.45 samples/sec   Loss 11.7722   LearningRate 0.0728   Epoch: 2   Global Step: 121720   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:34,449-Speed 2621.77 samples/sec   Loss 11.7590   LearningRate 0.0728   Epoch: 2   Global Step: 121730   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:38,344-Speed 2629.85 samples/sec   Loss 11.9857   LearningRate 0.0728   Epoch: 2   Global Step: 121740   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:42,239-Speed 2629.10 samples/sec   Loss 11.9674   LearningRate 0.0728   Epoch: 2   Global Step: 121750   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:46,137-Speed 2627.44 samples/sec   Loss 11.9523   LearningRate 0.0728   Epoch: 2   Global Step: 121760   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:50,039-Speed 2625.31 samples/sec   Loss 11.9558   LearningRate 0.0728   Epoch: 2   Global Step: 121770   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:53,936-Speed 2628.04 samples/sec   Loss 11.8151   LearningRate 0.0728   Epoch: 2   Global Step: 121780   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:24:57,834-Speed 2627.47 samples/sec   Loss 11.9651   LearningRate 0.0728   Epoch: 2   Global Step: 121790   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:01,714-Speed 2639.71 samples/sec   Loss 11.6776   LearningRate 0.0728   Epoch: 2   Global Step: 121800   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:05,612-Speed 2627.87 samples/sec   Loss 11.8682   LearningRate 0.0728   Epoch: 2   Global Step: 121810   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:09,512-Speed 2625.96 samples/sec   Loss 11.8300   LearningRate 0.0728   Epoch: 2   Global Step: 121820   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:13,415-Speed 2624.54 samples/sec   Loss 11.8248   LearningRate 0.0728   Epoch: 2   Global Step: 121830   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:17,319-Speed 2623.13 samples/sec   Loss 11.8811   LearningRate 0.0728   Epoch: 2   Global Step: 121840   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:21,228-Speed 2620.67 samples/sec   Loss 11.8216   LearningRate 0.0728   Epoch: 2   Global Step: 121850   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:25,127-Speed 2626.70 samples/sec   Loss 11.7984   LearningRate 0.0728   Epoch: 2   Global Step: 121860   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:29,024-Speed 2628.50 samples/sec   Loss 11.9762   LearningRate 0.0728   Epoch: 2   Global Step: 121870   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:32,923-Speed 2626.88 samples/sec   Loss 11.8984   LearningRate 0.0728   Epoch: 2   Global Step: 121880   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:36,825-Speed 2625.25 samples/sec   Loss 11.9986   LearningRate 0.0728   Epoch: 2   Global Step: 121890   Fp16 Grad Scale: 262144   Required: 80 hours
Training: 2022-04-13 09:25:40,705-Speed 2639.70 samples/sec   Loss 11.8537   LearningRate 0.0728   Epoch: 2   Global Step: 121900   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:25:44,602-Speed 2628.07 samples/sec   Loss 11.8113   LearningRate 0.0728   Epoch: 2   Global Step: 121910   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:25:48,502-Speed 2626.00 samples/sec   Loss 11.7939   LearningRate 0.0728   Epoch: 2   Global Step: 121920   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:25:52,402-Speed 2627.24 samples/sec   Loss 11.7835   LearningRate 0.0728   Epoch: 2   Global Step: 121930   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:25:56,298-Speed 2628.36 samples/sec   Loss 11.7996   LearningRate 0.0728   Epoch: 2   Global Step: 121940   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:26:00,201-Speed 2624.63 samples/sec   Loss 11.9093   LearningRate 0.0728   Epoch: 2   Global Step: 121950   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:26:04,102-Speed 2625.32 samples/sec   Loss 11.8187   LearningRate 0.0728   Epoch: 2   Global Step: 121960   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:26:07,999-Speed 2628.33 samples/sec   Loss 11.6305   LearningRate 0.0728   Epoch: 2   Global Step: 121970   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:26:11,898-Speed 2626.85 samples/sec   Loss 11.6242   LearningRate 0.0728   Epoch: 2   Global Step: 121980   Fp16 Grad Scale: 131072   Required: 80 hours
Training: 2022-04-13 09:26:15,797-Speed 2626.63 samples/sec   Loss 11.8262   LearningRate 0.0728   Epoch: 2   Global Step: 121990   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:26:19,662-Speed 2649.99 samples/sec   Loss 12.2163   LearningRate 0.0728   Epoch: 2   Global Step: 122000   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:23,561-Speed 2627.48 samples/sec   Loss 12.4100   LearningRate 0.0727   Epoch: 2   Global Step: 122010   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:27,470-Speed 2620.31 samples/sec   Loss 12.0318   LearningRate 0.0727   Epoch: 2   Global Step: 122020   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:31,371-Speed 2625.42 samples/sec   Loss 11.9977   LearningRate 0.0727   Epoch: 2   Global Step: 122030   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:35,290-Speed 2613.86 samples/sec   Loss 12.0182   LearningRate 0.0727   Epoch: 2   Global Step: 122040   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:39,186-Speed 2628.88 samples/sec   Loss 11.9878   LearningRate 0.0727   Epoch: 2   Global Step: 122050   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:43,087-Speed 2625.94 samples/sec   Loss 11.8040   LearningRate 0.0727   Epoch: 2   Global Step: 122060   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:46,987-Speed 2626.17 samples/sec   Loss 11.7951   LearningRate 0.0727   Epoch: 2   Global Step: 122070   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:50,887-Speed 2626.12 samples/sec   Loss 11.9277   LearningRate 0.0727   Epoch: 2   Global Step: 122080   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:54,789-Speed 2625.36 samples/sec   Loss 11.8413   LearningRate 0.0727   Epoch: 2   Global Step: 122090   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:26:58,690-Speed 2625.64 samples/sec   Loss 11.7738   LearningRate 0.0727   Epoch: 2   Global Step: 122100   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:02,600-Speed 2619.60 samples/sec   Loss 11.8651   LearningRate 0.0727   Epoch: 2   Global Step: 122110   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:06,500-Speed 2626.36 samples/sec   Loss 11.9163   LearningRate 0.0727   Epoch: 2   Global Step: 122120   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:10,404-Speed 2622.97 samples/sec   Loss 11.8260   LearningRate 0.0727   Epoch: 2   Global Step: 122130   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:14,317-Speed 2617.25 samples/sec   Loss 11.8428   LearningRate 0.0727   Epoch: 2   Global Step: 122140   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:18,217-Speed 2626.45 samples/sec   Loss 11.9809   LearningRate 0.0727   Epoch: 2   Global Step: 122150   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:22,118-Speed 2625.59 samples/sec   Loss 11.9449   LearningRate 0.0727   Epoch: 2   Global Step: 122160   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:26,025-Speed 2622.53 samples/sec   Loss 11.9919   LearningRate 0.0727   Epoch: 2   Global Step: 122170   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:29,954-Speed 2606.78 samples/sec   Loss 11.9123   LearningRate 0.0727   Epoch: 2   Global Step: 122180   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:33,852-Speed 2627.25 samples/sec   Loss 11.8406   LearningRate 0.0727   Epoch: 2   Global Step: 122190   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:27:37,750-Speed 2627.21 samples/sec   Loss 11.8255   LearningRate 0.0727   Epoch: 2   Global Step: 122200   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:27:41,647-Speed 2628.64 samples/sec   Loss 11.8377   LearningRate 0.0727   Epoch: 2   Global Step: 122210   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:27:45,550-Speed 2623.85 samples/sec   Loss 11.7523   LearningRate 0.0727   Epoch: 2   Global Step: 122220   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:27:49,447-Speed 2628.66 samples/sec   Loss 11.7737   LearningRate 0.0727   Epoch: 2   Global Step: 122230   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:27:53,347-Speed 2626.59 samples/sec   Loss 11.9187   LearningRate 0.0727   Epoch: 2   Global Step: 122240   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:27:57,245-Speed 2627.85 samples/sec   Loss 11.9339   LearningRate 0.0727   Epoch: 2   Global Step: 122250   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:28:01,152-Speed 2621.39 samples/sec   Loss 11.8834   LearningRate 0.0727   Epoch: 2   Global Step: 122260   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:28:05,052-Speed 2626.37 samples/sec   Loss 11.9217   LearningRate 0.0727   Epoch: 2   Global Step: 122270   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:28:08,964-Speed 2618.13 samples/sec   Loss 11.5762   LearningRate 0.0727   Epoch: 2   Global Step: 122280   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:28:12,862-Speed 2627.55 samples/sec   Loss 11.8152   LearningRate 0.0727   Epoch: 2   Global Step: 122290   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:28:16,760-Speed 2627.35 samples/sec   Loss 11.9525   LearningRate 0.0727   Epoch: 2   Global Step: 122300   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:28:20,666-Speed 2622.38 samples/sec   Loss 11.9825   LearningRate 0.0727   Epoch: 2   Global Step: 122310   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:28:24,568-Speed 2625.42 samples/sec   Loss 11.8499   LearningRate 0.0727   Epoch: 2   Global Step: 122320   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:28:28,457-Speed 2634.17 samples/sec   Loss 11.8926   LearningRate 0.0727   Epoch: 2   Global Step: 122330   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:32,370-Speed 2617.32 samples/sec   Loss 11.7684   LearningRate 0.0727   Epoch: 2   Global Step: 122340   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:36,265-Speed 2629.31 samples/sec   Loss 11.8037   LearningRate 0.0727   Epoch: 2   Global Step: 122350   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:40,170-Speed 2622.79 samples/sec   Loss 11.8498   LearningRate 0.0727   Epoch: 2   Global Step: 122360   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:44,076-Speed 2622.50 samples/sec   Loss 12.0399   LearningRate 0.0727   Epoch: 2   Global Step: 122370   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:47,980-Speed 2623.42 samples/sec   Loss 11.8729   LearningRate 0.0727   Epoch: 2   Global Step: 122380   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:51,896-Speed 2615.24 samples/sec   Loss 11.9809   LearningRate 0.0727   Epoch: 2   Global Step: 122390   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:55,798-Speed 2625.67 samples/sec   Loss 11.7854   LearningRate 0.0727   Epoch: 2   Global Step: 122400   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:28:59,700-Speed 2624.61 samples/sec   Loss 11.8766   LearningRate 0.0727   Epoch: 2   Global Step: 122410   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:03,607-Speed 2621.44 samples/sec   Loss 11.7952   LearningRate 0.0727   Epoch: 2   Global Step: 122420   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:07,513-Speed 2622.62 samples/sec   Loss 11.8860   LearningRate 0.0727   Epoch: 2   Global Step: 122430   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:29:11,414-Speed 2625.70 samples/sec   Loss 11.9470   LearningRate 0.0727   Epoch: 2   Global Step: 122440   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:29:15,354-Speed 2599.76 samples/sec   Loss 11.8821   LearningRate 0.0727   Epoch: 2   Global Step: 122450   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:29:19,247-Speed 2631.05 samples/sec   Loss 11.6876   LearningRate 0.0727   Epoch: 2   Global Step: 122460   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:29:23,128-Speed 2638.65 samples/sec   Loss 11.8168   LearningRate 0.0727   Epoch: 2   Global Step: 122470   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:27,051-Speed 2611.13 samples/sec   Loss 11.9437   LearningRate 0.0727   Epoch: 2   Global Step: 122480   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:30,961-Speed 2619.50 samples/sec   Loss 11.9653   LearningRate 0.0726   Epoch: 2   Global Step: 122490   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:34,857-Speed 2628.98 samples/sec   Loss 11.8267   LearningRate 0.0726   Epoch: 2   Global Step: 122500   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:38,758-Speed 2625.58 samples/sec   Loss 11.7818   LearningRate 0.0726   Epoch: 2   Global Step: 122510   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:42,655-Speed 2628.56 samples/sec   Loss 11.8563   LearningRate 0.0726   Epoch: 2   Global Step: 122520   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:46,552-Speed 2628.38 samples/sec   Loss 11.8464   LearningRate 0.0726   Epoch: 2   Global Step: 122530   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:50,454-Speed 2624.43 samples/sec   Loss 11.7533   LearningRate 0.0726   Epoch: 2   Global Step: 122540   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:54,351-Speed 2628.34 samples/sec   Loss 11.7482   LearningRate 0.0726   Epoch: 2   Global Step: 122550   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:29:58,253-Speed 2624.90 samples/sec   Loss 11.7288   LearningRate 0.0726   Epoch: 2   Global Step: 122560   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:30:02,152-Speed 2627.02 samples/sec   Loss 11.9754   LearningRate 0.0726   Epoch: 2   Global Step: 122570   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:06,052-Speed 2626.71 samples/sec   Loss 11.8080   LearningRate 0.0726   Epoch: 2   Global Step: 122580   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:09,960-Speed 2620.62 samples/sec   Loss 11.7659   LearningRate 0.0726   Epoch: 2   Global Step: 122590   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:13,898-Speed 2601.17 samples/sec   Loss 11.7456   LearningRate 0.0726   Epoch: 2   Global Step: 122600   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:17,820-Speed 2611.47 samples/sec   Loss 11.8179   LearningRate 0.0726   Epoch: 2   Global Step: 122610   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:21,731-Speed 2619.08 samples/sec   Loss 11.8097   LearningRate 0.0726   Epoch: 2   Global Step: 122620   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:25,658-Speed 2607.88 samples/sec   Loss 11.7864   LearningRate 0.0726   Epoch: 2   Global Step: 122630   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:29,557-Speed 2626.77 samples/sec   Loss 11.7339   LearningRate 0.0726   Epoch: 2   Global Step: 122640   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:33,457-Speed 2626.71 samples/sec   Loss 11.8004   LearningRate 0.0726   Epoch: 2   Global Step: 122650   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:37,357-Speed 2626.04 samples/sec   Loss 11.9122   LearningRate 0.0726   Epoch: 2   Global Step: 122660   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:30:41,256-Speed 2626.99 samples/sec   Loss 11.9685   LearningRate 0.0726   Epoch: 2   Global Step: 122670   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:30:45,160-Speed 2623.48 samples/sec   Loss 11.8268   LearningRate 0.0726   Epoch: 2   Global Step: 122680   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:30:49,063-Speed 2624.58 samples/sec   Loss 11.7625   LearningRate 0.0726   Epoch: 2   Global Step: 122690   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:30:52,963-Speed 2626.03 samples/sec   Loss 11.8637   LearningRate 0.0726   Epoch: 2   Global Step: 122700   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:30:56,847-Speed 2637.02 samples/sec   Loss 11.8649   LearningRate 0.0726   Epoch: 2   Global Step: 122710   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:00,747-Speed 2626.88 samples/sec   Loss 11.7112   LearningRate 0.0726   Epoch: 2   Global Step: 122720   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:04,663-Speed 2615.60 samples/sec   Loss 11.8120   LearningRate 0.0726   Epoch: 2   Global Step: 122730   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:08,576-Speed 2617.03 samples/sec   Loss 11.8884   LearningRate 0.0726   Epoch: 2   Global Step: 122740   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:12,475-Speed 2627.32 samples/sec   Loss 11.6631   LearningRate 0.0726   Epoch: 2   Global Step: 122750   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:16,376-Speed 2625.73 samples/sec   Loss 11.7905   LearningRate 0.0726   Epoch: 2   Global Step: 122760   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:20,272-Speed 2628.63 samples/sec   Loss 11.9228   LearningRate 0.0726   Epoch: 2   Global Step: 122770   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:24,205-Speed 2604.32 samples/sec   Loss 11.7029   LearningRate 0.0726   Epoch: 2   Global Step: 122780   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:28,104-Speed 2626.94 samples/sec   Loss 11.7505   LearningRate 0.0726   Epoch: 2   Global Step: 122790   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:32,033-Speed 2607.41 samples/sec   Loss 11.9277   LearningRate 0.0726   Epoch: 2   Global Step: 122800   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:35,937-Speed 2623.77 samples/sec   Loss 11.6777   LearningRate 0.0726   Epoch: 2   Global Step: 122810   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:31:39,851-Speed 2616.69 samples/sec   Loss 11.8578   LearningRate 0.0726   Epoch: 2   Global Step: 122820   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:31:43,737-Speed 2635.45 samples/sec   Loss 11.8592   LearningRate 0.0726   Epoch: 2   Global Step: 122830   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:47,716-Speed 2574.29 samples/sec   Loss 11.7583   LearningRate 0.0726   Epoch: 2   Global Step: 122840   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:51,620-Speed 2623.39 samples/sec   Loss 11.9020   LearningRate 0.0726   Epoch: 2   Global Step: 122850   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:55,524-Speed 2624.17 samples/sec   Loss 11.7362   LearningRate 0.0726   Epoch: 2   Global Step: 122860   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:31:59,424-Speed 2626.17 samples/sec   Loss 11.9008   LearningRate 0.0726   Epoch: 2   Global Step: 122870   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:03,325-Speed 2626.15 samples/sec   Loss 11.8599   LearningRate 0.0726   Epoch: 2   Global Step: 122880   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:07,222-Speed 2627.58 samples/sec   Loss 11.7925   LearningRate 0.0726   Epoch: 2   Global Step: 122890   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:11,125-Speed 2624.30 samples/sec   Loss 11.7926   LearningRate 0.0726   Epoch: 2   Global Step: 122900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:15,052-Speed 2608.20 samples/sec   Loss 11.7769   LearningRate 0.0726   Epoch: 2   Global Step: 122910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:18,951-Speed 2627.12 samples/sec   Loss 11.8210   LearningRate 0.0726   Epoch: 2   Global Step: 122920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:22,841-Speed 2633.20 samples/sec   Loss 11.5902   LearningRate 0.0726   Epoch: 2   Global Step: 122930   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:26,738-Speed 2628.20 samples/sec   Loss 11.8646   LearningRate 0.0726   Epoch: 2   Global Step: 122940   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:30,635-Speed 2628.45 samples/sec   Loss 11.7951   LearningRate 0.0726   Epoch: 2   Global Step: 122950   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:34,532-Speed 2627.78 samples/sec   Loss 11.8079   LearningRate 0.0726   Epoch: 2   Global Step: 122960   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:38,433-Speed 2625.88 samples/sec   Loss 11.7106   LearningRate 0.0726   Epoch: 2   Global Step: 122970   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:42,336-Speed 2624.30 samples/sec   Loss 11.7364   LearningRate 0.0725   Epoch: 2   Global Step: 122980   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:46,244-Speed 2621.17 samples/sec   Loss 11.7981   LearningRate 0.0725   Epoch: 2   Global Step: 122990   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:50,154-Speed 2619.56 samples/sec   Loss 11.8677   LearningRate 0.0725   Epoch: 2   Global Step: 123000   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:54,076-Speed 2611.66 samples/sec   Loss 11.8484   LearningRate 0.0725   Epoch: 2   Global Step: 123010   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:32:57,976-Speed 2626.19 samples/sec   Loss 11.7385   LearningRate 0.0725   Epoch: 2   Global Step: 123020   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:01,882-Speed 2622.23 samples/sec   Loss 11.8070   LearningRate 0.0725   Epoch: 2   Global Step: 123030   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:33:05,804-Speed 2611.40 samples/sec   Loss 11.8015   LearningRate 0.0725   Epoch: 2   Global Step: 123040   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:33:09,693-Speed 2633.80 samples/sec   Loss 11.7360   LearningRate 0.0725   Epoch: 2   Global Step: 123050   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:13,596-Speed 2624.31 samples/sec   Loss 11.8730   LearningRate 0.0725   Epoch: 2   Global Step: 123060   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:17,513-Speed 2614.64 samples/sec   Loss 11.7912   LearningRate 0.0725   Epoch: 2   Global Step: 123070   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:21,435-Speed 2611.85 samples/sec   Loss 11.8747   LearningRate 0.0725   Epoch: 2   Global Step: 123080   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:25,336-Speed 2626.08 samples/sec   Loss 11.8608   LearningRate 0.0725   Epoch: 2   Global Step: 123090   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:29,252-Speed 2615.66 samples/sec   Loss 11.8621   LearningRate 0.0725   Epoch: 2   Global Step: 123100   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:33,146-Speed 2630.04 samples/sec   Loss 11.7494   LearningRate 0.0725   Epoch: 2   Global Step: 123110   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:37,049-Speed 2624.51 samples/sec   Loss 11.7124   LearningRate 0.0725   Epoch: 2   Global Step: 123120   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:40,950-Speed 2625.73 samples/sec   Loss 11.8833   LearningRate 0.0725   Epoch: 2   Global Step: 123130   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:44,845-Speed 2629.38 samples/sec   Loss 11.8988   LearningRate 0.0725   Epoch: 2   Global Step: 123140   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:33:48,745-Speed 2626.20 samples/sec   Loss 11.7250   LearningRate 0.0725   Epoch: 2   Global Step: 123150   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:33:52,646-Speed 2625.18 samples/sec   Loss 11.8196   LearningRate 0.0725   Epoch: 2   Global Step: 123160   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:33:56,526-Speed 2640.69 samples/sec   Loss 11.7819   LearningRate 0.0725   Epoch: 2   Global Step: 123170   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:00,448-Speed 2611.62 samples/sec   Loss 11.7879   LearningRate 0.0725   Epoch: 2   Global Step: 123180   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:04,351-Speed 2624.07 samples/sec   Loss 11.8508   LearningRate 0.0725   Epoch: 2   Global Step: 123190   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:08,254-Speed 2624.12 samples/sec   Loss 11.6372   LearningRate 0.0725   Epoch: 2   Global Step: 123200   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:12,160-Speed 2622.39 samples/sec   Loss 11.7380   LearningRate 0.0725   Epoch: 2   Global Step: 123210   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:16,080-Speed 2613.04 samples/sec   Loss 11.7770   LearningRate 0.0725   Epoch: 2   Global Step: 123220   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:19,993-Speed 2618.04 samples/sec   Loss 11.6381   LearningRate 0.0725   Epoch: 2   Global Step: 123230   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:23,923-Speed 2606.25 samples/sec   Loss 11.7807   LearningRate 0.0725   Epoch: 2   Global Step: 123240   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:27,826-Speed 2624.13 samples/sec   Loss 11.6850   LearningRate 0.0725   Epoch: 2   Global Step: 123250   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:31,754-Speed 2607.99 samples/sec   Loss 11.7742   LearningRate 0.0725   Epoch: 2   Global Step: 123260   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:35,643-Speed 2633.24 samples/sec   Loss 11.8477   LearningRate 0.0725   Epoch: 2   Global Step: 123270   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:39,543-Speed 2626.55 samples/sec   Loss 11.7729   LearningRate 0.0725   Epoch: 2   Global Step: 123280   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:43,446-Speed 2624.08 samples/sec   Loss 11.8566   LearningRate 0.0725   Epoch: 2   Global Step: 123290   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:47,352-Speed 2622.24 samples/sec   Loss 11.6844   LearningRate 0.0725   Epoch: 2   Global Step: 123300   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:51,251-Speed 2627.31 samples/sec   Loss 11.9179   LearningRate 0.0725   Epoch: 2   Global Step: 123310   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:55,150-Speed 2626.45 samples/sec   Loss 11.8574   LearningRate 0.0725   Epoch: 2   Global Step: 123320   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:34:59,049-Speed 2627.78 samples/sec   Loss 11.7602   LearningRate 0.0725   Epoch: 2   Global Step: 123330   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:02,946-Speed 2627.95 samples/sec   Loss 11.7008   LearningRate 0.0725   Epoch: 2   Global Step: 123340   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:06,846-Speed 2626.21 samples/sec   Loss 11.8657   LearningRate 0.0725   Epoch: 2   Global Step: 123350   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:10,750-Speed 2623.42 samples/sec   Loss 11.8312   LearningRate 0.0725   Epoch: 2   Global Step: 123360   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:14,654-Speed 2623.58 samples/sec   Loss 11.9368   LearningRate 0.0725   Epoch: 2   Global Step: 123370   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:35:18,552-Speed 2628.02 samples/sec   Loss 11.8768   LearningRate 0.0725   Epoch: 2   Global Step: 123380   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:35:22,438-Speed 2635.84 samples/sec   Loss 11.7038   LearningRate 0.0725   Epoch: 2   Global Step: 123390   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:26,349-Speed 2619.74 samples/sec   Loss 11.7516   LearningRate 0.0725   Epoch: 2   Global Step: 123400   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:30,257-Speed 2620.31 samples/sec   Loss 11.8085   LearningRate 0.0725   Epoch: 2   Global Step: 123410   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:34,251-Speed 2564.48 samples/sec   Loss 11.8257   LearningRate 0.0725   Epoch: 2   Global Step: 123420   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:38,172-Speed 2612.15 samples/sec   Loss 11.8632   LearningRate 0.0725   Epoch: 2   Global Step: 123430   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:42,076-Speed 2623.71 samples/sec   Loss 11.7448   LearningRate 0.0725   Epoch: 2   Global Step: 123440   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:45,979-Speed 2624.46 samples/sec   Loss 11.7232   LearningRate 0.0725   Epoch: 2   Global Step: 123450   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:49,877-Speed 2627.95 samples/sec   Loss 11.8808   LearningRate 0.0725   Epoch: 2   Global Step: 123460   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:53,777-Speed 2625.88 samples/sec   Loss 11.6846   LearningRate 0.0724   Epoch: 2   Global Step: 123470   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:35:57,677-Speed 2626.38 samples/sec   Loss 11.6540   LearningRate 0.0724   Epoch: 2   Global Step: 123480   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:01,577-Speed 2626.00 samples/sec   Loss 11.8238   LearningRate 0.0724   Epoch: 2   Global Step: 123490   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:36:05,491-Speed 2616.88 samples/sec   Loss 11.9202   LearningRate 0.0724   Epoch: 2   Global Step: 123500   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:36:09,394-Speed 2623.95 samples/sec   Loss 11.7743   LearningRate 0.0724   Epoch: 2   Global Step: 123510   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:36:13,328-Speed 2603.98 samples/sec   Loss 11.6460   LearningRate 0.0724   Epoch: 2   Global Step: 123520   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:36:17,207-Speed 2640.97 samples/sec   Loss 11.7637   LearningRate 0.0724   Epoch: 2   Global Step: 123530   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:21,120-Speed 2617.51 samples/sec   Loss 11.8375   LearningRate 0.0724   Epoch: 2   Global Step: 123540   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:25,024-Speed 2623.60 samples/sec   Loss 11.7961   LearningRate 0.0724   Epoch: 2   Global Step: 123550   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:28,927-Speed 2624.41 samples/sec   Loss 11.8193   LearningRate 0.0724   Epoch: 2   Global Step: 123560   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:32,831-Speed 2623.49 samples/sec   Loss 11.6527   LearningRate 0.0724   Epoch: 2   Global Step: 123570   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:36,735-Speed 2623.66 samples/sec   Loss 11.8526   LearningRate 0.0724   Epoch: 2   Global Step: 123580   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:40,647-Speed 2617.75 samples/sec   Loss 11.8455   LearningRate 0.0724   Epoch: 2   Global Step: 123590   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:44,544-Speed 2628.92 samples/sec   Loss 11.6886   LearningRate 0.0724   Epoch: 2   Global Step: 123600   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:48,446-Speed 2624.88 samples/sec   Loss 11.7370   LearningRate 0.0724   Epoch: 2   Global Step: 123610   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:52,363-Speed 2614.84 samples/sec   Loss 11.7370   LearningRate 0.0724   Epoch: 2   Global Step: 123620   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:36:56,249-Speed 2636.26 samples/sec   Loss 11.7274   LearningRate 0.0724   Epoch: 2   Global Step: 123630   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:00,148-Speed 2626.71 samples/sec   Loss 11.8203   LearningRate 0.0724   Epoch: 2   Global Step: 123640   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:04,047-Speed 2627.15 samples/sec   Loss 11.8788   LearningRate 0.0724   Epoch: 2   Global Step: 123650   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:07,949-Speed 2625.06 samples/sec   Loss 11.7873   LearningRate 0.0724   Epoch: 2   Global Step: 123660   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:11,848-Speed 2626.97 samples/sec   Loss 11.8868   LearningRate 0.0724   Epoch: 2   Global Step: 123670   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:15,748-Speed 2626.11 samples/sec   Loss 11.7978   LearningRate 0.0724   Epoch: 2   Global Step: 123680   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:19,653-Speed 2623.05 samples/sec   Loss 11.8075   LearningRate 0.0724   Epoch: 2   Global Step: 123690   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:23,555-Speed 2625.73 samples/sec   Loss 11.7920   LearningRate 0.0724   Epoch: 2   Global Step: 123700   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:27,471-Speed 2615.58 samples/sec   Loss 11.8945   LearningRate 0.0724   Epoch: 2   Global Step: 123710   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:31,391-Speed 2612.84 samples/sec   Loss 11.7737   LearningRate 0.0724   Epoch: 2   Global Step: 123720   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:35,296-Speed 2623.36 samples/sec   Loss 11.8205   LearningRate 0.0724   Epoch: 2   Global Step: 123730   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:37:39,205-Speed 2620.08 samples/sec   Loss 11.8773   LearningRate 0.0724   Epoch: 2   Global Step: 123740   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:37:43,110-Speed 2622.79 samples/sec   Loss 11.7602   LearningRate 0.0724   Epoch: 2   Global Step: 123750   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:37:46,998-Speed 2633.99 samples/sec   Loss 11.8038   LearningRate 0.0724   Epoch: 2   Global Step: 123760   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:50,907-Speed 2620.50 samples/sec   Loss 11.8458   LearningRate 0.0724   Epoch: 2   Global Step: 123770   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:54,816-Speed 2620.31 samples/sec   Loss 11.8431   LearningRate 0.0724   Epoch: 2   Global Step: 123780   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:37:58,725-Speed 2620.16 samples/sec   Loss 11.7515   LearningRate 0.0724   Epoch: 2   Global Step: 123790   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:02,670-Speed 2596.66 samples/sec   Loss 11.9315   LearningRate 0.0724   Epoch: 2   Global Step: 123800   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:06,571-Speed 2625.68 samples/sec   Loss 11.7631   LearningRate 0.0724   Epoch: 2   Global Step: 123810   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:10,473-Speed 2624.71 samples/sec   Loss 11.6866   LearningRate 0.0724   Epoch: 2   Global Step: 123820   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:14,384-Speed 2618.94 samples/sec   Loss 11.8883   LearningRate 0.0724   Epoch: 2   Global Step: 123830   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:18,426-Speed 2534.70 samples/sec   Loss 11.7535   LearningRate 0.0724   Epoch: 2   Global Step: 123840   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:22,447-Speed 2547.06 samples/sec   Loss 11.7352   LearningRate 0.0724   Epoch: 2   Global Step: 123850   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:26,354-Speed 2621.16 samples/sec   Loss 11.6854   LearningRate 0.0724   Epoch: 2   Global Step: 123860   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:38:30,270-Speed 2615.41 samples/sec   Loss 11.9382   LearningRate 0.0724   Epoch: 2   Global Step: 123870   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:38:34,157-Speed 2635.38 samples/sec   Loss 11.8460   LearningRate 0.0724   Epoch: 2   Global Step: 123880   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:38,059-Speed 2624.53 samples/sec   Loss 11.6859   LearningRate 0.0724   Epoch: 2   Global Step: 123890   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:41,968-Speed 2620.33 samples/sec   Loss 11.8456   LearningRate 0.0724   Epoch: 2   Global Step: 123900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:45,863-Speed 2629.71 samples/sec   Loss 11.7904   LearningRate 0.0724   Epoch: 2   Global Step: 123910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:49,768-Speed 2623.02 samples/sec   Loss 11.8729   LearningRate 0.0724   Epoch: 2   Global Step: 123920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:53,670-Speed 2624.88 samples/sec   Loss 11.9724   LearningRate 0.0724   Epoch: 2   Global Step: 123930   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:38:57,598-Speed 2607.24 samples/sec   Loss 11.6316   LearningRate 0.0724   Epoch: 2   Global Step: 123940   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:01,502-Speed 2623.53 samples/sec   Loss 11.8404   LearningRate 0.0723   Epoch: 2   Global Step: 123950   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:05,402-Speed 2626.44 samples/sec   Loss 11.7150   LearningRate 0.0723   Epoch: 2   Global Step: 123960   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:09,306-Speed 2623.85 samples/sec   Loss 11.7722   LearningRate 0.0723   Epoch: 2   Global Step: 123970   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:13,208-Speed 2625.17 samples/sec   Loss 11.7193   LearningRate 0.0723   Epoch: 2   Global Step: 123980   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:39:17,098-Speed 2632.60 samples/sec   Loss 11.7745   LearningRate 0.0723   Epoch: 2   Global Step: 123990   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:20,990-Speed 2631.98 samples/sec   Loss 11.8023   LearningRate 0.0723   Epoch: 2   Global Step: 124000   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:24,902-Speed 2618.04 samples/sec   Loss 11.9746   LearningRate 0.0723   Epoch: 2   Global Step: 124010   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:28,817-Speed 2616.83 samples/sec   Loss 11.7262   LearningRate 0.0723   Epoch: 2   Global Step: 124020   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:32,734-Speed 2614.87 samples/sec   Loss 11.9272   LearningRate 0.0723   Epoch: 2   Global Step: 124030   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:36,631-Speed 2627.91 samples/sec   Loss 11.8676   LearningRate 0.0723   Epoch: 2   Global Step: 124040   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:40,538-Speed 2621.93 samples/sec   Loss 11.7195   LearningRate 0.0723   Epoch: 2   Global Step: 124050   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:44,447-Speed 2620.61 samples/sec   Loss 11.7235   LearningRate 0.0723   Epoch: 2   Global Step: 124060   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:48,351-Speed 2623.39 samples/sec   Loss 11.7162   LearningRate 0.0723   Epoch: 2   Global Step: 124070   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:52,271-Speed 2613.55 samples/sec   Loss 11.9289   LearningRate 0.0723   Epoch: 2   Global Step: 124080   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:39:56,180-Speed 2620.36 samples/sec   Loss 11.9181   LearningRate 0.0723   Epoch: 2   Global Step: 124090   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:40:00,095-Speed 2616.25 samples/sec   Loss 11.7750   LearningRate 0.0723   Epoch: 2   Global Step: 124100   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:40:03,989-Speed 2630.33 samples/sec   Loss 11.7366   LearningRate 0.0723   Epoch: 2   Global Step: 124110   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:07,902-Speed 2617.18 samples/sec   Loss 11.9672   LearningRate 0.0723   Epoch: 2   Global Step: 124120   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:11,811-Speed 2620.40 samples/sec   Loss 11.8273   LearningRate 0.0723   Epoch: 2   Global Step: 124130   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:15,720-Speed 2620.42 samples/sec   Loss 11.8792   LearningRate 0.0723   Epoch: 2   Global Step: 124140   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:19,635-Speed 2616.14 samples/sec   Loss 11.7589   LearningRate 0.0723   Epoch: 2   Global Step: 124150   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:23,539-Speed 2623.24 samples/sec   Loss 11.7384   LearningRate 0.0723   Epoch: 2   Global Step: 124160   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:27,462-Speed 2611.28 samples/sec   Loss 11.7829   LearningRate 0.0723   Epoch: 2   Global Step: 124170   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:31,368-Speed 2622.06 samples/sec   Loss 11.9226   LearningRate 0.0723   Epoch: 2   Global Step: 124180   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:35,279-Speed 2618.78 samples/sec   Loss 11.9701   LearningRate 0.0723   Epoch: 2   Global Step: 124190   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:39,193-Speed 2616.63 samples/sec   Loss 11.7124   LearningRate 0.0723   Epoch: 2   Global Step: 124200   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:43,086-Speed 2631.61 samples/sec   Loss 11.7927   LearningRate 0.0723   Epoch: 2   Global Step: 124210   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:46,993-Speed 2621.42 samples/sec   Loss 11.6731   LearningRate 0.0723   Epoch: 2   Global Step: 124220   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:50,951-Speed 2588.26 samples/sec   Loss 11.7843   LearningRate 0.0723   Epoch: 2   Global Step: 124230   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:40:54,855-Speed 2623.91 samples/sec   Loss 11.5960   LearningRate 0.0723   Epoch: 2   Global Step: 124240   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:40:58,749-Speed 2630.36 samples/sec   Loss 11.7524   LearningRate 0.0723   Epoch: 2   Global Step: 124250   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:02,649-Speed 2626.20 samples/sec   Loss 11.7790   LearningRate 0.0723   Epoch: 2   Global Step: 124260   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:06,556-Speed 2621.61 samples/sec   Loss 11.8606   LearningRate 0.0723   Epoch: 2   Global Step: 124270   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:10,459-Speed 2623.44 samples/sec   Loss 11.7290   LearningRate 0.0723   Epoch: 2   Global Step: 124280   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:14,364-Speed 2623.59 samples/sec   Loss 11.6771   LearningRate 0.0723   Epoch: 2   Global Step: 124290   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:18,276-Speed 2618.28 samples/sec   Loss 11.8401   LearningRate 0.0723   Epoch: 2   Global Step: 124300   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:22,204-Speed 2608.00 samples/sec   Loss 11.7549   LearningRate 0.0723   Epoch: 2   Global Step: 124310   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:26,118-Speed 2617.28 samples/sec   Loss 11.8570   LearningRate 0.0723   Epoch: 2   Global Step: 124320   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:30,028-Speed 2619.60 samples/sec   Loss 11.7917   LearningRate 0.0723   Epoch: 2   Global Step: 124330   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:41:33,948-Speed 2612.75 samples/sec   Loss 11.7748   LearningRate 0.0723   Epoch: 2   Global Step: 124340   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:41:37,855-Speed 2621.33 samples/sec   Loss 11.9142   LearningRate 0.0723   Epoch: 2   Global Step: 124350   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:41:41,761-Speed 2621.97 samples/sec   Loss 11.5553   LearningRate 0.0723   Epoch: 2   Global Step: 124360   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:41:45,701-Speed 2600.36 samples/sec   Loss 11.7386   LearningRate 0.0723   Epoch: 2   Global Step: 124370   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:41:49,598-Speed 2628.58 samples/sec   Loss 11.8113   LearningRate 0.0723   Epoch: 2   Global Step: 124380   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:41:53,496-Speed 2627.03 samples/sec   Loss 11.7655   LearningRate 0.0723   Epoch: 2   Global Step: 124390   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:41:57,427-Speed 2606.27 samples/sec   Loss 11.9761   LearningRate 0.0723   Epoch: 2   Global Step: 124400   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:01,330-Speed 2623.94 samples/sec   Loss 11.8139   LearningRate 0.0723   Epoch: 2   Global Step: 124410   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:05,304-Speed 2577.41 samples/sec   Loss 11.7064   LearningRate 0.0723   Epoch: 2   Global Step: 124420   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:09,201-Speed 2628.63 samples/sec   Loss 11.7881   LearningRate 0.0723   Epoch: 2   Global Step: 124430   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:30,304-Speed 485.27 samples/sec   Loss 11.9024   LearningRate 0.0722   Epoch: 3   Global Step: 124440   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:34,183-Speed 2641.13 samples/sec   Loss 11.7427   LearningRate 0.0722   Epoch: 3   Global Step: 124450   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:38,075-Speed 2631.38 samples/sec   Loss 11.7811   LearningRate 0.0722   Epoch: 3   Global Step: 124460   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:41,989-Speed 2617.14 samples/sec   Loss 11.8198   LearningRate 0.0722   Epoch: 3   Global Step: 124470   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:45,877-Speed 2634.52 samples/sec   Loss 11.7142   LearningRate 0.0722   Epoch: 3   Global Step: 124480   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:49,762-Speed 2636.12 samples/sec   Loss 11.7939   LearningRate 0.0722   Epoch: 3   Global Step: 124490   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:53,652-Speed 2633.23 samples/sec   Loss 11.8602   LearningRate 0.0722   Epoch: 3   Global Step: 124500   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:42:57,542-Speed 2633.44 samples/sec   Loss 11.7628   LearningRate 0.0722   Epoch: 3   Global Step: 124510   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:01,443-Speed 2625.93 samples/sec   Loss 11.6352   LearningRate 0.0722   Epoch: 3   Global Step: 124520   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:05,340-Speed 2627.60 samples/sec   Loss 11.8756   LearningRate 0.0722   Epoch: 3   Global Step: 124530   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:09,238-Speed 2627.35 samples/sec   Loss 11.8486   LearningRate 0.0722   Epoch: 3   Global Step: 124540   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:43:13,113-Speed 2643.28 samples/sec   Loss 11.8849   LearningRate 0.0722   Epoch: 3   Global Step: 124550   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:17,038-Speed 2609.86 samples/sec   Loss 11.7464   LearningRate 0.0722   Epoch: 3   Global Step: 124560   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:20,930-Speed 2631.63 samples/sec   Loss 11.8035   LearningRate 0.0722   Epoch: 3   Global Step: 124570   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:24,857-Speed 2608.74 samples/sec   Loss 11.7912   LearningRate 0.0722   Epoch: 3   Global Step: 124580   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:28,866-Speed 2554.90 samples/sec   Loss 11.8408   LearningRate 0.0722   Epoch: 3   Global Step: 124590   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:32,761-Speed 2629.83 samples/sec   Loss 11.8870   LearningRate 0.0722   Epoch: 3   Global Step: 124600   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:36,683-Speed 2611.53 samples/sec   Loss 11.8761   LearningRate 0.0722   Epoch: 3   Global Step: 124610   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:40,577-Speed 2630.70 samples/sec   Loss 11.8428   LearningRate 0.0722   Epoch: 3   Global Step: 124620   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:44,472-Speed 2628.89 samples/sec   Loss 11.8663   LearningRate 0.0722   Epoch: 3   Global Step: 124630   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:48,369-Speed 2628.51 samples/sec   Loss 11.8508   LearningRate 0.0722   Epoch: 3   Global Step: 124640   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:52,252-Speed 2638.05 samples/sec   Loss 11.7146   LearningRate 0.0722   Epoch: 3   Global Step: 124650   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:43:56,147-Speed 2629.70 samples/sec   Loss 11.6892   LearningRate 0.0722   Epoch: 3   Global Step: 124660   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:00,042-Speed 2629.76 samples/sec   Loss 11.7921   LearningRate 0.0722   Epoch: 3   Global Step: 124670   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:03,939-Speed 2628.46 samples/sec   Loss 11.8371   LearningRate 0.0722   Epoch: 3   Global Step: 124680   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:07,855-Speed 2615.32 samples/sec   Loss 11.7702   LearningRate 0.0722   Epoch: 3   Global Step: 124690   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:11,764-Speed 2620.36 samples/sec   Loss 11.6421   LearningRate 0.0722   Epoch: 3   Global Step: 124700   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:15,662-Speed 2627.64 samples/sec   Loss 11.7817   LearningRate 0.0722   Epoch: 3   Global Step: 124710   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:19,562-Speed 2626.54 samples/sec   Loss 11.7138   LearningRate 0.0722   Epoch: 3   Global Step: 124720   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:23,462-Speed 2626.46 samples/sec   Loss 11.8039   LearningRate 0.0722   Epoch: 3   Global Step: 124730   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:27,360-Speed 2627.56 samples/sec   Loss 11.7614   LearningRate 0.0722   Epoch: 3   Global Step: 124740   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:44:31,260-Speed 2625.79 samples/sec   Loss 11.7497   LearningRate 0.0722   Epoch: 3   Global Step: 124750   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:44:35,167-Speed 2622.06 samples/sec   Loss 11.8649   LearningRate 0.0722   Epoch: 3   Global Step: 124760   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:44:39,073-Speed 2622.58 samples/sec   Loss 11.8077   LearningRate 0.0722   Epoch: 3   Global Step: 124770   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:44:42,975-Speed 2624.79 samples/sec   Loss 11.7902   LearningRate 0.0722   Epoch: 3   Global Step: 124780   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:44:46,878-Speed 2624.63 samples/sec   Loss 11.7090   LearningRate 0.0722   Epoch: 3   Global Step: 124790   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:44:50,786-Speed 2621.03 samples/sec   Loss 11.6608   LearningRate 0.0722   Epoch: 3   Global Step: 124800   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:44:54,703-Speed 2615.51 samples/sec   Loss 11.8120   LearningRate 0.0722   Epoch: 3   Global Step: 124810   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:44:58,590-Speed 2634.64 samples/sec   Loss 11.7161   LearningRate 0.0722   Epoch: 3   Global Step: 124820   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:45:02,469-Speed 2641.03 samples/sec   Loss 12.0170   LearningRate 0.0722   Epoch: 3   Global Step: 124830   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:06,369-Speed 2625.95 samples/sec   Loss 11.8668   LearningRate 0.0722   Epoch: 3   Global Step: 124840   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:10,285-Speed 2615.81 samples/sec   Loss 11.6913   LearningRate 0.0722   Epoch: 3   Global Step: 124850   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:14,203-Speed 2614.56 samples/sec   Loss 11.7715   LearningRate 0.0722   Epoch: 3   Global Step: 124860   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:18,117-Speed 2616.72 samples/sec   Loss 11.8018   LearningRate 0.0722   Epoch: 3   Global Step: 124870   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:22,021-Speed 2623.94 samples/sec   Loss 11.7117   LearningRate 0.0722   Epoch: 3   Global Step: 124880   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:25,921-Speed 2626.53 samples/sec   Loss 11.8762   LearningRate 0.0722   Epoch: 3   Global Step: 124890   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:29,856-Speed 2603.07 samples/sec   Loss 11.8324   LearningRate 0.0722   Epoch: 3   Global Step: 124900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:33,757-Speed 2625.55 samples/sec   Loss 11.6956   LearningRate 0.0722   Epoch: 3   Global Step: 124910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:37,658-Speed 2625.88 samples/sec   Loss 11.7471   LearningRate 0.0722   Epoch: 3   Global Step: 124920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:41,561-Speed 2623.95 samples/sec   Loss 11.7149   LearningRate 0.0721   Epoch: 3   Global Step: 124930   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:45:45,482-Speed 2612.83 samples/sec   Loss 11.6047   LearningRate 0.0721   Epoch: 3   Global Step: 124940   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:45:49,363-Speed 2638.93 samples/sec   Loss 11.8329   LearningRate 0.0721   Epoch: 3   Global Step: 124950   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:53,286-Speed 2611.50 samples/sec   Loss 11.6939   LearningRate 0.0721   Epoch: 3   Global Step: 124960   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:45:57,193-Speed 2621.95 samples/sec   Loss 11.8994   LearningRate 0.0721   Epoch: 3   Global Step: 124970   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:01,109-Speed 2615.07 samples/sec   Loss 11.8009   LearningRate 0.0721   Epoch: 3   Global Step: 124980   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:05,000-Speed 2632.95 samples/sec   Loss 11.7105   LearningRate 0.0721   Epoch: 3   Global Step: 124990   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:08,900-Speed 2625.89 samples/sec   Loss 11.9212   LearningRate 0.0721   Epoch: 3   Global Step: 125000   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:12,904-Speed 2558.51 samples/sec   Loss 11.7223   LearningRate 0.0721   Epoch: 3   Global Step: 125010   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:16,815-Speed 2618.69 samples/sec   Loss 11.8019   LearningRate 0.0721   Epoch: 3   Global Step: 125020   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:20,717-Speed 2625.49 samples/sec   Loss 11.8101   LearningRate 0.0721   Epoch: 3   Global Step: 125030   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:24,623-Speed 2622.07 samples/sec   Loss 11.8491   LearningRate 0.0721   Epoch: 3   Global Step: 125040   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:28,530-Speed 2621.85 samples/sec   Loss 11.8313   LearningRate 0.0721   Epoch: 3   Global Step: 125050   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:46:32,433-Speed 2624.24 samples/sec   Loss 11.7796   LearningRate 0.0721   Epoch: 3   Global Step: 125060   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:46:36,331-Speed 2627.25 samples/sec   Loss 11.6037   LearningRate 0.0721   Epoch: 3   Global Step: 125070   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:46:40,238-Speed 2621.61 samples/sec   Loss 11.7677   LearningRate 0.0721   Epoch: 3   Global Step: 125080   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:46:44,124-Speed 2635.74 samples/sec   Loss 11.7357   LearningRate 0.0721   Epoch: 3   Global Step: 125090   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:48,028-Speed 2623.99 samples/sec   Loss 11.6632   LearningRate 0.0721   Epoch: 3   Global Step: 125100   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:51,933-Speed 2622.72 samples/sec   Loss 11.8857   LearningRate 0.0721   Epoch: 3   Global Step: 125110   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:55,844-Speed 2619.01 samples/sec   Loss 11.6855   LearningRate 0.0721   Epoch: 3   Global Step: 125120   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:46:59,742-Speed 2627.91 samples/sec   Loss 11.7927   LearningRate 0.0721   Epoch: 3   Global Step: 125130   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:47:03,621-Speed 2639.95 samples/sec   Loss 11.7749   LearningRate 0.0721   Epoch: 3   Global Step: 125140   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:07,518-Speed 2628.55 samples/sec   Loss 11.6872   LearningRate 0.0721   Epoch: 3   Global Step: 125150   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:11,421-Speed 2624.15 samples/sec   Loss 11.7602   LearningRate 0.0721   Epoch: 3   Global Step: 125160   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:15,323-Speed 2624.65 samples/sec   Loss 11.7404   LearningRate 0.0721   Epoch: 3   Global Step: 125170   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:19,254-Speed 2606.35 samples/sec   Loss 11.7002   LearningRate 0.0721   Epoch: 3   Global Step: 125180   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:23,190-Speed 2601.92 samples/sec   Loss 11.8565   LearningRate 0.0721   Epoch: 3   Global Step: 125190   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:27,110-Speed 2613.56 samples/sec   Loss 11.7151   LearningRate 0.0721   Epoch: 3   Global Step: 125200   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:31,013-Speed 2624.09 samples/sec   Loss 11.8245   LearningRate 0.0721   Epoch: 3   Global Step: 125210   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:34,934-Speed 2612.15 samples/sec   Loss 11.5812   LearningRate 0.0721   Epoch: 3   Global Step: 125220   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:38,818-Speed 2636.64 samples/sec   Loss 12.1883   LearningRate 0.0721   Epoch: 3   Global Step: 125230   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:47:42,718-Speed 2626.76 samples/sec   Loss 12.0810   LearningRate 0.0721   Epoch: 3   Global Step: 125240   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:47:46,612-Speed 2630.68 samples/sec   Loss 11.8369   LearningRate 0.0721   Epoch: 3   Global Step: 125250   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:47:50,508-Speed 2629.03 samples/sec   Loss 11.8549   LearningRate 0.0721   Epoch: 3   Global Step: 125260   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:47:54,417-Speed 2620.00 samples/sec   Loss 11.7561   LearningRate 0.0721   Epoch: 3   Global Step: 125270   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:47:58,314-Speed 2628.25 samples/sec   Loss 11.6946   LearningRate 0.0721   Epoch: 3   Global Step: 125280   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:48:02,209-Speed 2630.14 samples/sec   Loss 11.8778   LearningRate 0.0721   Epoch: 3   Global Step: 125290   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:48:06,110-Speed 2625.16 samples/sec   Loss 11.7200   LearningRate 0.0721   Epoch: 3   Global Step: 125300   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:48:10,018-Speed 2620.40 samples/sec   Loss 11.7535   LearningRate 0.0721   Epoch: 3   Global Step: 125310   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:48:13,906-Speed 2634.65 samples/sec   Loss 11.6515   LearningRate 0.0721   Epoch: 3   Global Step: 125320   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:17,809-Speed 2624.04 samples/sec   Loss 11.9309   LearningRate 0.0721   Epoch: 3   Global Step: 125330   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:21,719-Speed 2619.56 samples/sec   Loss 11.7230   LearningRate 0.0721   Epoch: 3   Global Step: 125340   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:25,624-Speed 2622.76 samples/sec   Loss 11.7816   LearningRate 0.0721   Epoch: 3   Global Step: 125350   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:29,520-Speed 2629.26 samples/sec   Loss 11.8508   LearningRate 0.0721   Epoch: 3   Global Step: 125360   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:33,422-Speed 2625.46 samples/sec   Loss 11.7080   LearningRate 0.0721   Epoch: 3   Global Step: 125370   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:37,320-Speed 2627.62 samples/sec   Loss 11.7730   LearningRate 0.0721   Epoch: 3   Global Step: 125380   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:41,221-Speed 2625.31 samples/sec   Loss 11.7817   LearningRate 0.0721   Epoch: 3   Global Step: 125390   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:45,117-Speed 2629.42 samples/sec   Loss 11.6860   LearningRate 0.0721   Epoch: 3   Global Step: 125400   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:49,012-Speed 2629.47 samples/sec   Loss 11.8136   LearningRate 0.0721   Epoch: 3   Global Step: 125410   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:48:52,908-Speed 2628.93 samples/sec   Loss 11.7977   LearningRate 0.0720   Epoch: 3   Global Step: 125420   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:48:56,804-Speed 2628.41 samples/sec   Loss 11.7171   LearningRate 0.0720   Epoch: 3   Global Step: 125430   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:00,704-Speed 2626.97 samples/sec   Loss 11.8609   LearningRate 0.0720   Epoch: 3   Global Step: 125440   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:04,602-Speed 2627.16 samples/sec   Loss 11.8207   LearningRate 0.0720   Epoch: 3   Global Step: 125450   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:08,496-Speed 2630.32 samples/sec   Loss 11.7031   LearningRate 0.0720   Epoch: 3   Global Step: 125460   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:12,402-Speed 2622.35 samples/sec   Loss 11.7806   LearningRate 0.0720   Epoch: 3   Global Step: 125470   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:16,296-Speed 2630.23 samples/sec   Loss 11.7989   LearningRate 0.0720   Epoch: 3   Global Step: 125480   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:20,196-Speed 2626.12 samples/sec   Loss 11.6625   LearningRate 0.0720   Epoch: 3   Global Step: 125490   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:24,115-Speed 2613.50 samples/sec   Loss 11.7329   LearningRate 0.0720   Epoch: 3   Global Step: 125500   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:28,009-Speed 2630.48 samples/sec   Loss 11.6270   LearningRate 0.0720   Epoch: 3   Global Step: 125510   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:31,906-Speed 2629.19 samples/sec   Loss 11.8414   LearningRate 0.0720   Epoch: 3   Global Step: 125520   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:49:35,782-Speed 2641.79 samples/sec   Loss 11.7079   LearningRate 0.0720   Epoch: 3   Global Step: 125530   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:39,678-Speed 2629.41 samples/sec   Loss 11.7534   LearningRate 0.0720   Epoch: 3   Global Step: 125540   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:43,577-Speed 2627.41 samples/sec   Loss 11.8694   LearningRate 0.0720   Epoch: 3   Global Step: 125550   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:47,479-Speed 2624.20 samples/sec   Loss 11.6584   LearningRate 0.0720   Epoch: 3   Global Step: 125560   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:51,377-Speed 2627.78 samples/sec   Loss 11.6447   LearningRate 0.0720   Epoch: 3   Global Step: 125570   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:55,308-Speed 2606.03 samples/sec   Loss 11.7634   LearningRate 0.0720   Epoch: 3   Global Step: 125580   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:49:59,212-Speed 2623.34 samples/sec   Loss 11.7685   LearningRate 0.0720   Epoch: 3   Global Step: 125590   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:50:03,138-Speed 2608.98 samples/sec   Loss 11.5462   LearningRate 0.0720   Epoch: 3   Global Step: 125600   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:50:07,031-Speed 2631.18 samples/sec   Loss 11.7244   LearningRate 0.0720   Epoch: 3   Global Step: 125610   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:50:10,948-Speed 2615.34 samples/sec   Loss 11.8987   LearningRate 0.0720   Epoch: 3   Global Step: 125620   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:50:14,870-Speed 2610.94 samples/sec   Loss 11.6117   LearningRate 0.0720   Epoch: 3   Global Step: 125630   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:50:18,770-Speed 2626.68 samples/sec   Loss 11.7343   LearningRate 0.0720   Epoch: 3   Global Step: 125640   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:50:22,640-Speed 2646.14 samples/sec   Loss 11.7820   LearningRate 0.0720   Epoch: 3   Global Step: 125650   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:26,534-Speed 2630.76 samples/sec   Loss 11.9704   LearningRate 0.0720   Epoch: 3   Global Step: 125660   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:30,427-Speed 2631.04 samples/sec   Loss 11.6898   LearningRate 0.0720   Epoch: 3   Global Step: 125670   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:34,321-Speed 2630.22 samples/sec   Loss 11.5668   LearningRate 0.0720   Epoch: 3   Global Step: 125680   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:38,215-Speed 2630.28 samples/sec   Loss 11.8325   LearningRate 0.0720   Epoch: 3   Global Step: 125690   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:42,108-Speed 2631.18 samples/sec   Loss 11.8294   LearningRate 0.0720   Epoch: 3   Global Step: 125700   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:46,015-Speed 2621.30 samples/sec   Loss 11.8587   LearningRate 0.0720   Epoch: 3   Global Step: 125710   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:49,911-Speed 2628.83 samples/sec   Loss 11.7516   LearningRate 0.0720   Epoch: 3   Global Step: 125720   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:53,816-Speed 2623.11 samples/sec   Loss 11.8139   LearningRate 0.0720   Epoch: 3   Global Step: 125730   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:50:57,709-Speed 2631.07 samples/sec   Loss 11.7749   LearningRate 0.0720   Epoch: 3   Global Step: 125740   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:51:01,603-Speed 2630.00 samples/sec   Loss 11.8247   LearningRate 0.0720   Epoch: 3   Global Step: 125750   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:05,505-Speed 2625.29 samples/sec   Loss 11.6999   LearningRate 0.0720   Epoch: 3   Global Step: 125760   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:09,411-Speed 2622.26 samples/sec   Loss 11.7603   LearningRate 0.0720   Epoch: 3   Global Step: 125770   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:13,311-Speed 2626.32 samples/sec   Loss 11.8147   LearningRate 0.0720   Epoch: 3   Global Step: 125780   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:17,229-Speed 2614.00 samples/sec   Loss 11.7187   LearningRate 0.0720   Epoch: 3   Global Step: 125790   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:21,125-Speed 2628.89 samples/sec   Loss 11.8011   LearningRate 0.0720   Epoch: 3   Global Step: 125800   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:25,020-Speed 2630.30 samples/sec   Loss 11.7634   LearningRate 0.0720   Epoch: 3   Global Step: 125810   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:28,990-Speed 2579.58 samples/sec   Loss 11.7813   LearningRate 0.0720   Epoch: 3   Global Step: 125820   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:32,905-Speed 2615.99 samples/sec   Loss 11.8500   LearningRate 0.0720   Epoch: 3   Global Step: 125830   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:36,827-Speed 2611.73 samples/sec   Loss 11.6418   LearningRate 0.0720   Epoch: 3   Global Step: 125840   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:51:40,746-Speed 2613.80 samples/sec   Loss 11.7762   LearningRate 0.0720   Epoch: 3   Global Step: 125850   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:51:44,647-Speed 2625.90 samples/sec   Loss 11.8140   LearningRate 0.0720   Epoch: 3   Global Step: 125860   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:51:48,554-Speed 2621.55 samples/sec   Loss 11.8122   LearningRate 0.0720   Epoch: 3   Global Step: 125870   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:51:52,452-Speed 2627.54 samples/sec   Loss 11.7132   LearningRate 0.0720   Epoch: 3   Global Step: 125880   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:51:56,361-Speed 2620.36 samples/sec   Loss 11.6106   LearningRate 0.0720   Epoch: 3   Global Step: 125890   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:52:00,272-Speed 2619.02 samples/sec   Loss 11.8679   LearningRate 0.0720   Epoch: 3   Global Step: 125900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:52:04,179-Speed 2621.57 samples/sec   Loss 11.9712   LearningRate 0.0719   Epoch: 3   Global Step: 125910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:52:08,083-Speed 2623.05 samples/sec   Loss 11.8165   LearningRate 0.0719   Epoch: 3   Global Step: 125920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:52:11,945-Speed 2651.86 samples/sec   Loss 11.7136   LearningRate 0.0719   Epoch: 3   Global Step: 125930   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:15,853-Speed 2621.51 samples/sec   Loss 11.6964   LearningRate 0.0719   Epoch: 3   Global Step: 125940   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:19,749-Speed 2629.13 samples/sec   Loss 11.6822   LearningRate 0.0719   Epoch: 3   Global Step: 125950   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:23,654-Speed 2622.58 samples/sec   Loss 11.6891   LearningRate 0.0719   Epoch: 3   Global Step: 125960   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:27,559-Speed 2622.89 samples/sec   Loss 11.8130   LearningRate 0.0719   Epoch: 3   Global Step: 125970   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:31,464-Speed 2623.02 samples/sec   Loss 11.6893   LearningRate 0.0719   Epoch: 3   Global Step: 125980   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:35,370-Speed 2622.42 samples/sec   Loss 11.5974   LearningRate 0.0719   Epoch: 3   Global Step: 125990   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:39,264-Speed 2629.89 samples/sec   Loss 11.6534   LearningRate 0.0719   Epoch: 3   Global Step: 126000   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:43,187-Speed 2611.46 samples/sec   Loss 11.8859   LearningRate 0.0719   Epoch: 3   Global Step: 126010   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:47,083-Speed 2628.86 samples/sec   Loss 11.6930   LearningRate 0.0719   Epoch: 3   Global Step: 126020   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 09:52:50,997-Speed 2625.36 samples/sec   Loss 11.8475   LearningRate 0.0719   Epoch: 3   Global Step: 126030   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:52:54,896-Speed 2627.06 samples/sec   Loss 11.8469   LearningRate 0.0719   Epoch: 3   Global Step: 126040   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:52:58,791-Speed 2629.86 samples/sec   Loss 11.7350   LearningRate 0.0719   Epoch: 3   Global Step: 126050   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:02,690-Speed 2626.98 samples/sec   Loss 11.8048   LearningRate 0.0719   Epoch: 3   Global Step: 126060   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:06,588-Speed 2627.62 samples/sec   Loss 11.8054   LearningRate 0.0719   Epoch: 3   Global Step: 126070   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:10,488-Speed 2626.46 samples/sec   Loss 11.8105   LearningRate 0.0719   Epoch: 3   Global Step: 126080   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:14,383-Speed 2630.17 samples/sec   Loss 11.7707   LearningRate 0.0719   Epoch: 3   Global Step: 126090   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:18,275-Speed 2630.99 samples/sec   Loss 11.8958   LearningRate 0.0719   Epoch: 3   Global Step: 126100   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:22,175-Speed 2626.97 samples/sec   Loss 11.7294   LearningRate 0.0719   Epoch: 3   Global Step: 126110   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:26,075-Speed 2625.63 samples/sec   Loss 12.0591   LearningRate 0.0719   Epoch: 3   Global Step: 126120   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:29,968-Speed 2631.54 samples/sec   Loss 11.7605   LearningRate 0.0719   Epoch: 3   Global Step: 126130   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:53:33,860-Speed 2631.74 samples/sec   Loss 11.7266   LearningRate 0.0719   Epoch: 3   Global Step: 126140   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:53:37,760-Speed 2625.71 samples/sec   Loss 11.7707   LearningRate 0.0719   Epoch: 3   Global Step: 126150   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:53:41,655-Speed 2629.66 samples/sec   Loss 11.8367   LearningRate 0.0719   Epoch: 3   Global Step: 126160   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:53:45,554-Speed 2627.35 samples/sec   Loss 11.8123   LearningRate 0.0719   Epoch: 3   Global Step: 126170   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:53:49,454-Speed 2626.27 samples/sec   Loss 11.7837   LearningRate 0.0719   Epoch: 3   Global Step: 126180   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:53:53,351-Speed 2628.20 samples/sec   Loss 11.6929   LearningRate 0.0719   Epoch: 3   Global Step: 126190   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:53:57,287-Speed 2602.28 samples/sec   Loss 11.8090   LearningRate 0.0719   Epoch: 3   Global Step: 126200   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:01,182-Speed 2630.07 samples/sec   Loss 11.6795   LearningRate 0.0719   Epoch: 3   Global Step: 126210   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:05,086-Speed 2623.83 samples/sec   Loss 11.7475   LearningRate 0.0719   Epoch: 3   Global Step: 126220   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:08,987-Speed 2625.50 samples/sec   Loss 11.6720   LearningRate 0.0719   Epoch: 3   Global Step: 126230   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:12,884-Speed 2628.63 samples/sec   Loss 11.8032   LearningRate 0.0719   Epoch: 3   Global Step: 126240   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:16,784-Speed 2625.95 samples/sec   Loss 11.7044   LearningRate 0.0719   Epoch: 3   Global Step: 126250   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:20,679-Speed 2629.47 samples/sec   Loss 12.0247   LearningRate 0.0719   Epoch: 3   Global Step: 126260   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:24,574-Speed 2629.94 samples/sec   Loss 11.6536   LearningRate 0.0719   Epoch: 3   Global Step: 126270   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:28,508-Speed 2603.21 samples/sec   Loss 11.8452   LearningRate 0.0719   Epoch: 3   Global Step: 126280   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:54:32,422-Speed 2617.26 samples/sec   Loss 11.8246   LearningRate 0.0719   Epoch: 3   Global Step: 126290   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:54:36,319-Speed 2628.65 samples/sec   Loss 11.7651   LearningRate 0.0719   Epoch: 3   Global Step: 126300   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:54:40,215-Speed 2628.87 samples/sec   Loss 11.7735   LearningRate 0.0719   Epoch: 3   Global Step: 126310   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:54:44,109-Speed 2630.30 samples/sec   Loss 11.7611   LearningRate 0.0719   Epoch: 3   Global Step: 126320   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:54:48,003-Speed 2630.59 samples/sec   Loss 11.6454   LearningRate 0.0719   Epoch: 3   Global Step: 126330   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:54:51,902-Speed 2626.72 samples/sec   Loss 11.8674   LearningRate 0.0719   Epoch: 3   Global Step: 126340   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:54:55,800-Speed 2628.04 samples/sec   Loss 11.8534   LearningRate 0.0719   Epoch: 3   Global Step: 126350   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:54:59,718-Speed 2614.30 samples/sec   Loss 11.8707   LearningRate 0.0719   Epoch: 3   Global Step: 126360   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:55:03,681-Speed 2585.00 samples/sec   Loss 11.8062   LearningRate 0.0719   Epoch: 3   Global Step: 126370   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:55:07,584-Speed 2624.40 samples/sec   Loss 11.7972   LearningRate 0.0719   Epoch: 3   Global Step: 126380   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:55:11,493-Speed 2620.04 samples/sec   Loss 11.6972   LearningRate 0.0719   Epoch: 3   Global Step: 126390   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:15,397-Speed 2623.47 samples/sec   Loss 11.8872   LearningRate 0.0718   Epoch: 3   Global Step: 126400   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:19,308-Speed 2619.26 samples/sec   Loss 11.8070   LearningRate 0.0718   Epoch: 3   Global Step: 126410   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:23,210-Speed 2624.87 samples/sec   Loss 11.7108   LearningRate 0.0718   Epoch: 3   Global Step: 126420   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:27,107-Speed 2627.52 samples/sec   Loss 11.7121   LearningRate 0.0718   Epoch: 3   Global Step: 126430   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:31,005-Speed 2630.76 samples/sec   Loss 11.7034   LearningRate 0.0718   Epoch: 3   Global Step: 126440   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:34,927-Speed 2612.02 samples/sec   Loss 11.6629   LearningRate 0.0718   Epoch: 3   Global Step: 126450   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:38,820-Speed 2630.66 samples/sec   Loss 11.7394   LearningRate 0.0718   Epoch: 3   Global Step: 126460   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:42,715-Speed 2629.61 samples/sec   Loss 11.7016   LearningRate 0.0718   Epoch: 3   Global Step: 126470   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:46,620-Speed 2622.87 samples/sec   Loss 11.7009   LearningRate 0.0718   Epoch: 3   Global Step: 126480   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:50,502-Speed 2638.03 samples/sec   Loss 11.8752   LearningRate 0.0718   Epoch: 3   Global Step: 126490   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:54,403-Speed 2626.55 samples/sec   Loss 11.4873   LearningRate 0.0718   Epoch: 3   Global Step: 126500   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:55:58,302-Speed 2626.62 samples/sec   Loss 11.8502   LearningRate 0.0718   Epoch: 3   Global Step: 126510   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:56:02,199-Speed 2628.18 samples/sec   Loss 11.6859   LearningRate 0.0718   Epoch: 3   Global Step: 126520   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:56:06,201-Speed 2559.88 samples/sec   Loss 11.6523   LearningRate 0.0718   Epoch: 3   Global Step: 126530   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:10,136-Speed 2602.76 samples/sec   Loss 11.8235   LearningRate 0.0718   Epoch: 3   Global Step: 126540   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:14,032-Speed 2628.52 samples/sec   Loss 11.6859   LearningRate 0.0718   Epoch: 3   Global Step: 126550   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:17,927-Speed 2629.45 samples/sec   Loss 11.7031   LearningRate 0.0718   Epoch: 3   Global Step: 126560   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:21,832-Speed 2623.05 samples/sec   Loss 11.8364   LearningRate 0.0718   Epoch: 3   Global Step: 126570   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:25,729-Speed 2628.46 samples/sec   Loss 11.7766   LearningRate 0.0718   Epoch: 3   Global Step: 126580   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:29,639-Speed 2619.30 samples/sec   Loss 11.8224   LearningRate 0.0718   Epoch: 3   Global Step: 126590   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:33,531-Speed 2631.15 samples/sec   Loss 11.6581   LearningRate 0.0718   Epoch: 3   Global Step: 126600   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:37,432-Speed 2625.86 samples/sec   Loss 11.7993   LearningRate 0.0718   Epoch: 3   Global Step: 126610   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:41,338-Speed 2622.30 samples/sec   Loss 11.8217   LearningRate 0.0718   Epoch: 3   Global Step: 126620   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:56:45,232-Speed 2630.62 samples/sec   Loss 11.7364   LearningRate 0.0718   Epoch: 3   Global Step: 126630   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:56:49,134-Speed 2624.93 samples/sec   Loss 11.8398   LearningRate 0.0718   Epoch: 3   Global Step: 126640   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:56:53,027-Speed 2630.67 samples/sec   Loss 11.6745   LearningRate 0.0718   Epoch: 3   Global Step: 126650   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:56:56,925-Speed 2627.76 samples/sec   Loss 11.7806   LearningRate 0.0718   Epoch: 3   Global Step: 126660   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:00,827-Speed 2624.83 samples/sec   Loss 11.7038   LearningRate 0.0718   Epoch: 3   Global Step: 126670   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:04,733-Speed 2621.72 samples/sec   Loss 11.7896   LearningRate 0.0718   Epoch: 3   Global Step: 126680   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:08,636-Speed 2624.65 samples/sec   Loss 11.5782   LearningRate 0.0718   Epoch: 3   Global Step: 126690   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:12,537-Speed 2625.34 samples/sec   Loss 11.7581   LearningRate 0.0718   Epoch: 3   Global Step: 126700   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:16,442-Speed 2623.60 samples/sec   Loss 11.8433   LearningRate 0.0718   Epoch: 3   Global Step: 126710   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:20,344-Speed 2624.90 samples/sec   Loss 11.8178   LearningRate 0.0718   Epoch: 3   Global Step: 126720   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:24,244-Speed 2626.13 samples/sec   Loss 11.5851   LearningRate 0.0718   Epoch: 3   Global Step: 126730   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:57:28,148-Speed 2623.77 samples/sec   Loss 11.7596   LearningRate 0.0718   Epoch: 3   Global Step: 126740   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:57:32,051-Speed 2623.88 samples/sec   Loss 11.6734   LearningRate 0.0718   Epoch: 3   Global Step: 126750   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:57:35,957-Speed 2622.08 samples/sec   Loss 11.8245   LearningRate 0.0718   Epoch: 3   Global Step: 126760   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:57:39,840-Speed 2637.38 samples/sec   Loss 11.7989   LearningRate 0.0718   Epoch: 3   Global Step: 126770   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:43,739-Speed 2626.98 samples/sec   Loss 11.7142   LearningRate 0.0718   Epoch: 3   Global Step: 126780   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:47,639-Speed 2626.72 samples/sec   Loss 11.8023   LearningRate 0.0718   Epoch: 3   Global Step: 126790   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:51,550-Speed 2618.87 samples/sec   Loss 11.8689   LearningRate 0.0718   Epoch: 3   Global Step: 126800   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:55,626-Speed 2513.15 samples/sec   Loss 11.5002   LearningRate 0.0718   Epoch: 3   Global Step: 126810   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:57:59,631-Speed 2557.27 samples/sec   Loss 11.6105   LearningRate 0.0718   Epoch: 3   Global Step: 126820   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:03,527-Speed 2628.77 samples/sec   Loss 11.8101   LearningRate 0.0718   Epoch: 3   Global Step: 126830   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:07,424-Speed 2628.31 samples/sec   Loss 11.7724   LearningRate 0.0718   Epoch: 3   Global Step: 126840   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:11,319-Speed 2629.24 samples/sec   Loss 11.6755   LearningRate 0.0718   Epoch: 3   Global Step: 126850   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:15,224-Speed 2623.48 samples/sec   Loss 11.7169   LearningRate 0.0718   Epoch: 3   Global Step: 126860   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:19,129-Speed 2622.86 samples/sec   Loss 11.7280   LearningRate 0.0718   Epoch: 3   Global Step: 126870   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:58:23,025-Speed 2628.88 samples/sec   Loss 11.6532   LearningRate 0.0718   Epoch: 3   Global Step: 126880   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 09:58:26,908-Speed 2638.08 samples/sec   Loss 11.7903   LearningRate 0.0717   Epoch: 3   Global Step: 126890   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:30,803-Speed 2630.08 samples/sec   Loss 11.8497   LearningRate 0.0717   Epoch: 3   Global Step: 126900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:34,704-Speed 2625.23 samples/sec   Loss 11.6460   LearningRate 0.0717   Epoch: 3   Global Step: 126910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:38,602-Speed 2627.10 samples/sec   Loss 11.5945   LearningRate 0.0717   Epoch: 3   Global Step: 126920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:58:42,485-Speed 2638.06 samples/sec   Loss 11.5061   LearningRate 0.0717   Epoch: 3   Global Step: 126930   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:58:46,385-Speed 2626.19 samples/sec   Loss 11.7154   LearningRate 0.0717   Epoch: 3   Global Step: 126940   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:58:50,283-Speed 2628.11 samples/sec   Loss 11.6601   LearningRate 0.0717   Epoch: 3   Global Step: 126950   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:58:54,181-Speed 2627.50 samples/sec   Loss 11.9510   LearningRate 0.0717   Epoch: 3   Global Step: 126960   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:58:58,073-Speed 2631.64 samples/sec   Loss 11.7020   LearningRate 0.0717   Epoch: 3   Global Step: 126970   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:01,971-Speed 2627.93 samples/sec   Loss 12.1375   LearningRate 0.0717   Epoch: 3   Global Step: 126980   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:05,864-Speed 2630.61 samples/sec   Loss 12.0162   LearningRate 0.0717   Epoch: 3   Global Step: 126990   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:09,755-Speed 2632.42 samples/sec   Loss 11.8112   LearningRate 0.0717   Epoch: 3   Global Step: 127000   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:13,650-Speed 2629.69 samples/sec   Loss 11.9964   LearningRate 0.0717   Epoch: 3   Global Step: 127010   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:17,578-Speed 2607.22 samples/sec   Loss 11.9440   LearningRate 0.0717   Epoch: 3   Global Step: 127020   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:21,476-Speed 2627.83 samples/sec   Loss 11.6849   LearningRate 0.0717   Epoch: 3   Global Step: 127030   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 09:59:25,366-Speed 2633.16 samples/sec   Loss 11.8174   LearningRate 0.0717   Epoch: 3   Global Step: 127040   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:29,262-Speed 2628.90 samples/sec   Loss 11.6445   LearningRate 0.0717   Epoch: 3   Global Step: 127050   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:33,161-Speed 2627.23 samples/sec   Loss 11.7393   LearningRate 0.0717   Epoch: 3   Global Step: 127060   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:37,053-Speed 2631.70 samples/sec   Loss 11.7862   LearningRate 0.0717   Epoch: 3   Global Step: 127070   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:40,949-Speed 2628.29 samples/sec   Loss 11.8749   LearningRate 0.0717   Epoch: 3   Global Step: 127080   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:44,842-Speed 2631.62 samples/sec   Loss 11.7609   LearningRate 0.0717   Epoch: 3   Global Step: 127090   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:48,735-Speed 2630.40 samples/sec   Loss 11.8234   LearningRate 0.0717   Epoch: 3   Global Step: 127100   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:52,634-Speed 2627.05 samples/sec   Loss 11.7725   LearningRate 0.0717   Epoch: 3   Global Step: 127110   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 09:59:56,528-Speed 2630.43 samples/sec   Loss 11.8166   LearningRate 0.0717   Epoch: 3   Global Step: 127120   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:00:00,421-Speed 2631.32 samples/sec   Loss 11.7567   LearningRate 0.0717   Epoch: 3   Global Step: 127130   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:00:04,316-Speed 2629.66 samples/sec   Loss 11.7742   LearningRate 0.0717   Epoch: 3   Global Step: 127140   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:08,231-Speed 2616.31 samples/sec   Loss 11.7316   LearningRate 0.0717   Epoch: 3   Global Step: 127150   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:12,138-Speed 2621.06 samples/sec   Loss 11.5890   LearningRate 0.0717   Epoch: 3   Global Step: 127160   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:16,036-Speed 2628.51 samples/sec   Loss 11.7564   LearningRate 0.0717   Epoch: 3   Global Step: 127170   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:19,935-Speed 2626.88 samples/sec   Loss 11.5854   LearningRate 0.0717   Epoch: 3   Global Step: 127180   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:23,836-Speed 2624.99 samples/sec   Loss 11.6478   LearningRate 0.0717   Epoch: 3   Global Step: 127190   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:27,734-Speed 2628.06 samples/sec   Loss 11.5889   LearningRate 0.0717   Epoch: 3   Global Step: 127200   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:31,628-Speed 2630.75 samples/sec   Loss 11.7099   LearningRate 0.0717   Epoch: 3   Global Step: 127210   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:35,572-Speed 2596.83 samples/sec   Loss 11.6834   LearningRate 0.0717   Epoch: 3   Global Step: 127220   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:39,465-Speed 2630.99 samples/sec   Loss 11.7758   LearningRate 0.0717   Epoch: 3   Global Step: 127230   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:43,357-Speed 2631.77 samples/sec   Loss 11.8183   LearningRate 0.0717   Epoch: 3   Global Step: 127240   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:00:47,254-Speed 2628.43 samples/sec   Loss 11.7385   LearningRate 0.0717   Epoch: 3   Global Step: 127250   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:00:51,148-Speed 2630.31 samples/sec   Loss 11.9327   LearningRate 0.0717   Epoch: 3   Global Step: 127260   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:00:55,032-Speed 2636.64 samples/sec   Loss 11.4623   LearningRate 0.0717   Epoch: 3   Global Step: 127270   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:00:58,938-Speed 2622.77 samples/sec   Loss 11.8957   LearningRate 0.0717   Epoch: 3   Global Step: 127280   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:02,839-Speed 2625.26 samples/sec   Loss 11.6629   LearningRate 0.0717   Epoch: 3   Global Step: 127290   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:06,734-Speed 2629.97 samples/sec   Loss 11.7334   LearningRate 0.0717   Epoch: 3   Global Step: 127300   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:10,629-Speed 2629.66 samples/sec   Loss 11.8341   LearningRate 0.0717   Epoch: 3   Global Step: 127310   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:14,532-Speed 2624.54 samples/sec   Loss 11.7315   LearningRate 0.0717   Epoch: 3   Global Step: 127320   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:18,440-Speed 2620.86 samples/sec   Loss 11.7153   LearningRate 0.0717   Epoch: 3   Global Step: 127330   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:22,335-Speed 2629.08 samples/sec   Loss 11.7529   LearningRate 0.0717   Epoch: 3   Global Step: 127340   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:26,242-Speed 2621.98 samples/sec   Loss 11.7672   LearningRate 0.0717   Epoch: 3   Global Step: 127350   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:30,144-Speed 2624.69 samples/sec   Loss 11.7032   LearningRate 0.0717   Epoch: 3   Global Step: 127360   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:01:34,048-Speed 2624.10 samples/sec   Loss 11.8327   LearningRate 0.0717   Epoch: 3   Global Step: 127370   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:01:37,953-Speed 2623.16 samples/sec   Loss 11.6655   LearningRate 0.0716   Epoch: 3   Global Step: 127380   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:01:41,857-Speed 2623.68 samples/sec   Loss 11.6925   LearningRate 0.0716   Epoch: 3   Global Step: 127390   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:01:45,766-Speed 2620.78 samples/sec   Loss 11.7120   LearningRate 0.0716   Epoch: 3   Global Step: 127400   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:01:49,705-Speed 2599.77 samples/sec   Loss 11.8277   LearningRate 0.0716   Epoch: 3   Global Step: 127410   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:01:53,581-Speed 2642.89 samples/sec   Loss 11.6146   LearningRate 0.0716   Epoch: 3   Global Step: 127420   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:01:57,483-Speed 2624.62 samples/sec   Loss 11.8467   LearningRate 0.0716   Epoch: 3   Global Step: 127430   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:01,393-Speed 2619.86 samples/sec   Loss 11.7672   LearningRate 0.0716   Epoch: 3   Global Step: 127440   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:05,296-Speed 2623.71 samples/sec   Loss 11.7897   LearningRate 0.0716   Epoch: 3   Global Step: 127450   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:09,207-Speed 2619.52 samples/sec   Loss 11.8402   LearningRate 0.0716   Epoch: 3   Global Step: 127460   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:13,162-Speed 2590.25 samples/sec   Loss 11.9202   LearningRate 0.0716   Epoch: 3   Global Step: 127470   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:17,067-Speed 2622.91 samples/sec   Loss 11.6238   LearningRate 0.0716   Epoch: 3   Global Step: 127480   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:20,960-Speed 2631.29 samples/sec   Loss 11.6669   LearningRate 0.0716   Epoch: 3   Global Step: 127490   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:24,848-Speed 2634.37 samples/sec   Loss 11.7893   LearningRate 0.0716   Epoch: 3   Global Step: 127500   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:28,740-Speed 2630.97 samples/sec   Loss 11.7150   LearningRate 0.0716   Epoch: 3   Global Step: 127510   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:02:32,630-Speed 2633.26 samples/sec   Loss 11.7563   LearningRate 0.0716   Epoch: 3   Global Step: 127520   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:02:36,519-Speed 2633.82 samples/sec   Loss 11.7666   LearningRate 0.0716   Epoch: 3   Global Step: 127530   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:02:40,419-Speed 2626.10 samples/sec   Loss 11.7183   LearningRate 0.0716   Epoch: 3   Global Step: 127540   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:02:44,320-Speed 2625.54 samples/sec   Loss 11.5827   LearningRate 0.0716   Epoch: 3   Global Step: 127550   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:02:48,228-Speed 2620.94 samples/sec   Loss 11.6888   LearningRate 0.0716   Epoch: 3   Global Step: 127560   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:02:52,131-Speed 2624.77 samples/sec   Loss 11.6174   LearningRate 0.0716   Epoch: 3   Global Step: 127570   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:02:56,031-Speed 2626.40 samples/sec   Loss 11.7408   LearningRate 0.0716   Epoch: 3   Global Step: 127580   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:02:59,933-Speed 2624.72 samples/sec   Loss 11.5449   LearningRate 0.0716   Epoch: 3   Global Step: 127590   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:03:03,837-Speed 2623.42 samples/sec   Loss 11.6841   LearningRate 0.0716   Epoch: 3   Global Step: 127600   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:03:07,748-Speed 2618.81 samples/sec   Loss 11.7059   LearningRate 0.0716   Epoch: 3   Global Step: 127610   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:03:11,661-Speed 2617.75 samples/sec   Loss 11.5596   LearningRate 0.0716   Epoch: 3   Global Step: 127620   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:03:15,560-Speed 2626.99 samples/sec   Loss 11.8608   LearningRate 0.0716   Epoch: 3   Global Step: 127630   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:03:19,443-Speed 2637.63 samples/sec   Loss 11.7442   LearningRate 0.0716   Epoch: 3   Global Step: 127640   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:03:23,345-Speed 2625.33 samples/sec   Loss 11.6519   LearningRate 0.0716   Epoch: 3   Global Step: 127650   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:03:27,244-Speed 2626.55 samples/sec   Loss 11.8642   LearningRate 0.0716   Epoch: 3   Global Step: 127660   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:03:31,156-Speed 2618.85 samples/sec   Loss 11.7843   LearningRate 0.0716   Epoch: 3   Global Step: 127670   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:03:35,070-Speed 2616.27 samples/sec   Loss 11.7061   LearningRate 0.0716   Epoch: 3   Global Step: 127680   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:03:38,956-Speed 2635.86 samples/sec   Loss 11.7380   LearningRate 0.0716   Epoch: 3   Global Step: 127690   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:03:42,830-Speed 2643.56 samples/sec   Loss 11.9166   LearningRate 0.0716   Epoch: 3   Global Step: 127700   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:03:46,725-Speed 2630.45 samples/sec   Loss 11.8284   LearningRate 0.0716   Epoch: 3   Global Step: 127710   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:03:50,615-Speed 2633.25 samples/sec   Loss 11.8292   LearningRate 0.0716   Epoch: 3   Global Step: 127720   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:03:54,518-Speed 2623.53 samples/sec   Loss 11.6588   LearningRate 0.0716   Epoch: 3   Global Step: 127730   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:03:58,421-Speed 2624.48 samples/sec   Loss 11.6777   LearningRate 0.0716   Epoch: 3   Global Step: 127740   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:04:02,329-Speed 2620.97 samples/sec   Loss 11.8488   LearningRate 0.0716   Epoch: 3   Global Step: 127750   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:04:06,219-Speed 2632.58 samples/sec   Loss 11.8841   LearningRate 0.0716   Epoch: 3   Global Step: 127760   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:04:10,127-Speed 2621.24 samples/sec   Loss 11.7802   LearningRate 0.0716   Epoch: 3   Global Step: 127770   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:04:14,024-Speed 2628.45 samples/sec   Loss 11.6980   LearningRate 0.0716   Epoch: 3   Global Step: 127780   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:04:17,928-Speed 2623.39 samples/sec   Loss 11.8324   LearningRate 0.0716   Epoch: 3   Global Step: 127790   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:04:21,837-Speed 2620.73 samples/sec   Loss 11.7195   LearningRate 0.0716   Epoch: 3   Global Step: 127800   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:25,741-Speed 2623.04 samples/sec   Loss 11.9143   LearningRate 0.0716   Epoch: 3   Global Step: 127810   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:29,639-Speed 2628.14 samples/sec   Loss 11.7653   LearningRate 0.0716   Epoch: 3   Global Step: 127820   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:33,545-Speed 2622.22 samples/sec   Loss 11.7578   LearningRate 0.0716   Epoch: 3   Global Step: 127830   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:37,448-Speed 2624.08 samples/sec   Loss 11.6173   LearningRate 0.0716   Epoch: 3   Global Step: 127840   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:41,353-Speed 2622.52 samples/sec   Loss 11.8387   LearningRate 0.0716   Epoch: 3   Global Step: 127850   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:45,255-Speed 2624.65 samples/sec   Loss 11.7561   LearningRate 0.0716   Epoch: 3   Global Step: 127860   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:49,159-Speed 2623.91 samples/sec   Loss 11.6417   LearningRate 0.0715   Epoch: 3   Global Step: 127870   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:53,059-Speed 2626.90 samples/sec   Loss 11.7431   LearningRate 0.0715   Epoch: 3   Global Step: 127880   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:04:57,087-Speed 2542.68 samples/sec   Loss 11.6699   LearningRate 0.0715   Epoch: 3   Global Step: 127890   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:05:00,982-Speed 2629.69 samples/sec   Loss 11.7977   LearningRate 0.0715   Epoch: 3   Global Step: 127900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:04,889-Speed 2621.46 samples/sec   Loss 11.6704   LearningRate 0.0715   Epoch: 3   Global Step: 127910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:08,784-Speed 2629.14 samples/sec   Loss 11.7678   LearningRate 0.0715   Epoch: 3   Global Step: 127920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:12,701-Speed 2615.06 samples/sec   Loss 11.6242   LearningRate 0.0715   Epoch: 3   Global Step: 127930   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:16,606-Speed 2622.44 samples/sec   Loss 11.8245   LearningRate 0.0715   Epoch: 3   Global Step: 127940   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:20,511-Speed 2623.63 samples/sec   Loss 11.5857   LearningRate 0.0715   Epoch: 3   Global Step: 127950   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:24,416-Speed 2623.00 samples/sec   Loss 11.6356   LearningRate 0.0715   Epoch: 3   Global Step: 127960   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:28,321-Speed 2622.51 samples/sec   Loss 11.7350   LearningRate 0.0715   Epoch: 3   Global Step: 127970   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:32,232-Speed 2619.33 samples/sec   Loss 11.5740   LearningRate 0.0715   Epoch: 3   Global Step: 127980   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:36,204-Speed 2578.64 samples/sec   Loss 11.7084   LearningRate 0.0715   Epoch: 3   Global Step: 127990   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:40,111-Speed 2621.32 samples/sec   Loss 11.8476   LearningRate 0.0715   Epoch: 3   Global Step: 128000   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:05:43,996-Speed 2636.33 samples/sec   Loss 11.6992   LearningRate 0.0715   Epoch: 3   Global Step: 128010   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:47,903-Speed 2620.87 samples/sec   Loss 11.6715   LearningRate 0.0715   Epoch: 3   Global Step: 128020   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:51,801-Speed 2627.98 samples/sec   Loss 11.7136   LearningRate 0.0715   Epoch: 3   Global Step: 128030   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:55,781-Speed 2573.21 samples/sec   Loss 11.7036   LearningRate 0.0715   Epoch: 3   Global Step: 128040   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:05:59,686-Speed 2623.16 samples/sec   Loss 11.5263   LearningRate 0.0715   Epoch: 3   Global Step: 128050   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:06:03,590-Speed 2623.48 samples/sec   Loss 11.6887   LearningRate 0.0715   Epoch: 3   Global Step: 128060   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:06:07,490-Speed 2625.93 samples/sec   Loss 11.7474   LearningRate 0.0715   Epoch: 3   Global Step: 128070   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:06:11,385-Speed 2629.44 samples/sec   Loss 11.7938   LearningRate 0.0715   Epoch: 3   Global Step: 128080   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:06:15,275-Speed 2632.98 samples/sec   Loss 11.7374   LearningRate 0.0715   Epoch: 3   Global Step: 128090   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:06:19,213-Speed 2600.60 samples/sec   Loss 11.7177   LearningRate 0.0715   Epoch: 3   Global Step: 128100   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:06:23,313-Speed 2498.02 samples/sec   Loss 11.6393   LearningRate 0.0715   Epoch: 3   Global Step: 128110   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:27,213-Speed 2626.69 samples/sec   Loss 11.7182   LearningRate 0.0715   Epoch: 3   Global Step: 128120   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:31,110-Speed 2629.18 samples/sec   Loss 11.7207   LearningRate 0.0715   Epoch: 3   Global Step: 128130   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:35,007-Speed 2628.14 samples/sec   Loss 11.8235   LearningRate 0.0715   Epoch: 3   Global Step: 128140   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:38,909-Speed 2624.99 samples/sec   Loss 11.6127   LearningRate 0.0715   Epoch: 3   Global Step: 128150   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:42,821-Speed 2617.96 samples/sec   Loss 11.7297   LearningRate 0.0715   Epoch: 3   Global Step: 128160   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:46,728-Speed 2621.19 samples/sec   Loss 11.6254   LearningRate 0.0715   Epoch: 3   Global Step: 128170   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:50,631-Speed 2624.60 samples/sec   Loss 11.6357   LearningRate 0.0715   Epoch: 3   Global Step: 128180   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:54,540-Speed 2619.85 samples/sec   Loss 11.6868   LearningRate 0.0715   Epoch: 3   Global Step: 128190   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:06:58,463-Speed 2610.34 samples/sec   Loss 11.5698   LearningRate 0.0715   Epoch: 3   Global Step: 128200   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:02,338-Speed 2643.46 samples/sec   Loss 11.6836   LearningRate 0.0715   Epoch: 3   Global Step: 128210   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:06,233-Speed 2630.47 samples/sec   Loss 11.6300   LearningRate 0.0715   Epoch: 3   Global Step: 128220   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:10,134-Speed 2625.16 samples/sec   Loss 11.7347   LearningRate 0.0715   Epoch: 3   Global Step: 128230   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:14,034-Speed 2626.38 samples/sec   Loss 11.7834   LearningRate 0.0715   Epoch: 3   Global Step: 128240   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:17,936-Speed 2624.83 samples/sec   Loss 11.8168   LearningRate 0.0715   Epoch: 3   Global Step: 128250   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:21,842-Speed 2621.63 samples/sec   Loss 11.6936   LearningRate 0.0715   Epoch: 3   Global Step: 128260   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:25,742-Speed 2626.21 samples/sec   Loss 11.7518   LearningRate 0.0715   Epoch: 3   Global Step: 128270   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:29,650-Speed 2621.04 samples/sec   Loss 11.7885   LearningRate 0.0715   Epoch: 3   Global Step: 128280   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:33,554-Speed 2623.58 samples/sec   Loss 11.7560   LearningRate 0.0715   Epoch: 3   Global Step: 128290   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:37,454-Speed 2626.55 samples/sec   Loss 11.8401   LearningRate 0.0715   Epoch: 3   Global Step: 128300   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:07:41,363-Speed 2619.75 samples/sec   Loss 11.5934   LearningRate 0.0715   Epoch: 3   Global Step: 128310   Fp16 Grad Scale: 524288   Required: 79 hours
Training: 2022-04-13 10:07:45,238-Speed 2644.07 samples/sec   Loss 11.6903   LearningRate 0.0715   Epoch: 3   Global Step: 128320   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:07:49,132-Speed 2629.83 samples/sec   Loss 11.6651   LearningRate 0.0715   Epoch: 3   Global Step: 128330   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:07:53,020-Speed 2634.19 samples/sec   Loss 11.6715   LearningRate 0.0715   Epoch: 3   Global Step: 128340   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:07:56,913-Speed 2630.54 samples/sec   Loss 11.7267   LearningRate 0.0715   Epoch: 3   Global Step: 128350   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:08:00,810-Speed 2628.53 samples/sec   Loss 11.5234   LearningRate 0.0714   Epoch: 3   Global Step: 128360   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:08:04,702-Speed 2631.54 samples/sec   Loss 11.7411   LearningRate 0.0714   Epoch: 3   Global Step: 128370   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:08:08,598-Speed 2629.05 samples/sec   Loss 11.5286   LearningRate 0.0714   Epoch: 3   Global Step: 128380   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:08:12,471-Speed 2644.22 samples/sec   Loss 11.6463   LearningRate 0.0714   Epoch: 3   Global Step: 128390   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:16,362-Speed 2632.90 samples/sec   Loss 11.6588   LearningRate 0.0714   Epoch: 3   Global Step: 128400   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:20,255-Speed 2631.02 samples/sec   Loss 11.8049   LearningRate 0.0714   Epoch: 3   Global Step: 128410   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:24,142-Speed 2635.07 samples/sec   Loss 11.8943   LearningRate 0.0714   Epoch: 3   Global Step: 128420   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:28,036-Speed 2629.72 samples/sec   Loss 11.7710   LearningRate 0.0714   Epoch: 3   Global Step: 128430   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:31,939-Speed 2624.57 samples/sec   Loss 11.7064   LearningRate 0.0714   Epoch: 3   Global Step: 128440   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:35,828-Speed 2633.66 samples/sec   Loss 11.6428   LearningRate 0.0714   Epoch: 3   Global Step: 128450   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:39,720-Speed 2631.49 samples/sec   Loss 11.8221   LearningRate 0.0714   Epoch: 3   Global Step: 128460   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:43,612-Speed 2631.06 samples/sec   Loss 11.5941   LearningRate 0.0714   Epoch: 3   Global Step: 128470   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:47,517-Speed 2622.99 samples/sec   Loss 11.7650   LearningRate 0.0714   Epoch: 3   Global Step: 128480   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:08:51,414-Speed 2628.99 samples/sec   Loss 11.6997   LearningRate 0.0714   Epoch: 3   Global Step: 128490   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:08:55,305-Speed 2632.69 samples/sec   Loss 11.6834   LearningRate 0.0714   Epoch: 3   Global Step: 128500   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:08:59,183-Speed 2640.65 samples/sec   Loss 11.5863   LearningRate 0.0714   Epoch: 3   Global Step: 128510   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:03,080-Speed 2627.85 samples/sec   Loss 11.6975   LearningRate 0.0714   Epoch: 3   Global Step: 128520   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:06,973-Speed 2631.08 samples/sec   Loss 11.7245   LearningRate 0.0714   Epoch: 3   Global Step: 128530   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:10,879-Speed 2622.44 samples/sec   Loss 11.7807   LearningRate 0.0714   Epoch: 3   Global Step: 128540   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:14,772-Speed 2630.76 samples/sec   Loss 11.8172   LearningRate 0.0714   Epoch: 3   Global Step: 128550   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:18,672-Speed 2625.93 samples/sec   Loss 11.6944   LearningRate 0.0714   Epoch: 3   Global Step: 128560   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:22,563-Speed 2632.96 samples/sec   Loss 11.6172   LearningRate 0.0714   Epoch: 3   Global Step: 128570   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:26,455-Speed 2631.68 samples/sec   Loss 11.6357   LearningRate 0.0714   Epoch: 3   Global Step: 128580   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:30,350-Speed 2629.51 samples/sec   Loss 11.5783   LearningRate 0.0714   Epoch: 3   Global Step: 128590   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:34,242-Speed 2631.52 samples/sec   Loss 11.7839   LearningRate 0.0714   Epoch: 3   Global Step: 128600   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:09:38,134-Speed 2631.43 samples/sec   Loss 11.6835   LearningRate 0.0714   Epoch: 3   Global Step: 128610   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:09:42,027-Speed 2631.43 samples/sec   Loss 11.6715   LearningRate 0.0714   Epoch: 3   Global Step: 128620   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:09:45,926-Speed 2627.18 samples/sec   Loss 11.5259   LearningRate 0.0714   Epoch: 3   Global Step: 128630   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:09:49,817-Speed 2632.17 samples/sec   Loss 11.6515   LearningRate 0.0714   Epoch: 3   Global Step: 128640   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:09:53,726-Speed 2620.73 samples/sec   Loss 11.7352   LearningRate 0.0714   Epoch: 3   Global Step: 128650   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:09:57,620-Speed 2630.37 samples/sec   Loss 11.6599   LearningRate 0.0714   Epoch: 3   Global Step: 128660   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:01,522-Speed 2625.51 samples/sec   Loss 11.5428   LearningRate 0.0714   Epoch: 3   Global Step: 128670   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:05,416-Speed 2630.11 samples/sec   Loss 11.8359   LearningRate 0.0714   Epoch: 3   Global Step: 128680   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:09,308-Speed 2631.49 samples/sec   Loss 11.5571   LearningRate 0.0714   Epoch: 3   Global Step: 128690   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:13,202-Speed 2630.52 samples/sec   Loss 11.6692   LearningRate 0.0714   Epoch: 3   Global Step: 128700   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:17,100-Speed 2627.39 samples/sec   Loss 11.6835   LearningRate 0.0714   Epoch: 3   Global Step: 128710   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:10:20,998-Speed 2627.49 samples/sec   Loss 11.7466   LearningRate 0.0714   Epoch: 3   Global Step: 128720   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:10:24,949-Speed 2593.05 samples/sec   Loss 11.7119   LearningRate 0.0714   Epoch: 3   Global Step: 128730   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:10:28,838-Speed 2633.37 samples/sec   Loss 11.7225   LearningRate 0.0714   Epoch: 3   Global Step: 128740   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:10:32,732-Speed 2630.90 samples/sec   Loss 11.6952   LearningRate 0.0714   Epoch: 3   Global Step: 128750   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:10:36,624-Speed 2630.97 samples/sec   Loss 11.8177   LearningRate 0.0714   Epoch: 3   Global Step: 128760   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:10:40,498-Speed 2643.66 samples/sec   Loss 11.6017   LearningRate 0.0714   Epoch: 3   Global Step: 128770   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:44,425-Speed 2608.05 samples/sec   Loss 11.6138   LearningRate 0.0714   Epoch: 3   Global Step: 128780   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:48,324-Speed 2626.82 samples/sec   Loss 11.6755   LearningRate 0.0714   Epoch: 3   Global Step: 128790   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:52,229-Speed 2623.13 samples/sec   Loss 11.5965   LearningRate 0.0714   Epoch: 3   Global Step: 128800   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:10:56,128-Speed 2627.15 samples/sec   Loss 11.5430   LearningRate 0.0714   Epoch: 3   Global Step: 128810   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:00,041-Speed 2617.85 samples/sec   Loss 11.7462   LearningRate 0.0714   Epoch: 3   Global Step: 128820   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:03,937-Speed 2628.49 samples/sec   Loss 11.6722   LearningRate 0.0714   Epoch: 3   Global Step: 128830   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:07,833-Speed 2629.24 samples/sec   Loss 11.6413   LearningRate 0.0714   Epoch: 3   Global Step: 128840   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:11,730-Speed 2628.24 samples/sec   Loss 11.7569   LearningRate 0.0713   Epoch: 3   Global Step: 128850   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:15,623-Speed 2631.08 samples/sec   Loss 11.6295   LearningRate 0.0713   Epoch: 3   Global Step: 128860   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:19,509-Speed 2635.49 samples/sec   Loss 11.8304   LearningRate 0.0713   Epoch: 3   Global Step: 128870   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:23,396-Speed 2635.60 samples/sec   Loss 11.6842   LearningRate 0.0713   Epoch: 3   Global Step: 128880   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:27,291-Speed 2629.74 samples/sec   Loss 11.5885   LearningRate 0.0713   Epoch: 3   Global Step: 128890   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:31,184-Speed 2630.92 samples/sec   Loss 11.5692   LearningRate 0.0713   Epoch: 3   Global Step: 128900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:35,079-Speed 2629.45 samples/sec   Loss 11.6548   LearningRate 0.0713   Epoch: 3   Global Step: 128910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:38,974-Speed 2629.10 samples/sec   Loss 11.5265   LearningRate 0.0713   Epoch: 3   Global Step: 128920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:42,867-Speed 2631.06 samples/sec   Loss 11.5897   LearningRate 0.0713   Epoch: 3   Global Step: 128930   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:46,760-Speed 2631.53 samples/sec   Loss 11.7354   LearningRate 0.0713   Epoch: 3   Global Step: 128940   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:50,662-Speed 2624.76 samples/sec   Loss 11.7769   LearningRate 0.0713   Epoch: 3   Global Step: 128950   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:54,564-Speed 2625.64 samples/sec   Loss 11.6207   LearningRate 0.0713   Epoch: 3   Global Step: 128960   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:11:58,459-Speed 2629.38 samples/sec   Loss 11.8681   LearningRate 0.0713   Epoch: 3   Global Step: 128970   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:12:02,335-Speed 2643.08 samples/sec   Loss 11.7734   LearningRate 0.0713   Epoch: 3   Global Step: 128980   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:06,241-Speed 2621.53 samples/sec   Loss 11.7628   LearningRate 0.0713   Epoch: 3   Global Step: 128990   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:10,140-Speed 2627.32 samples/sec   Loss 11.5741   LearningRate 0.0713   Epoch: 3   Global Step: 129000   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:14,035-Speed 2629.45 samples/sec   Loss 11.7280   LearningRate 0.0713   Epoch: 3   Global Step: 129010   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:17,930-Speed 2629.79 samples/sec   Loss 11.7230   LearningRate 0.0713   Epoch: 3   Global Step: 129020   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:21,851-Speed 2612.42 samples/sec   Loss 11.6406   LearningRate 0.0713   Epoch: 3   Global Step: 129030   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:25,832-Speed 2573.05 samples/sec   Loss 11.6540   LearningRate 0.0713   Epoch: 3   Global Step: 129040   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:29,758-Speed 2609.05 samples/sec   Loss 11.6916   LearningRate 0.0713   Epoch: 3   Global Step: 129050   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:33,650-Speed 2631.06 samples/sec   Loss 11.7591   LearningRate 0.0713   Epoch: 3   Global Step: 129060   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:37,543-Speed 2632.23 samples/sec   Loss 11.6781   LearningRate 0.0713   Epoch: 3   Global Step: 129070   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:41,442-Speed 2627.08 samples/sec   Loss 11.6315   LearningRate 0.0713   Epoch: 3   Global Step: 129080   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:12:45,341-Speed 2627.23 samples/sec   Loss 11.5289   LearningRate 0.0713   Epoch: 3   Global Step: 129090   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:12:49,268-Speed 2607.48 samples/sec   Loss 11.7113   LearningRate 0.0713   Epoch: 3   Global Step: 129100   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:12:53,154-Speed 2636.36 samples/sec   Loss 11.7425   LearningRate 0.0713   Epoch: 3   Global Step: 129110   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:12:57,047-Speed 2630.48 samples/sec   Loss 11.6616   LearningRate 0.0713   Epoch: 3   Global Step: 129120   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:00,940-Speed 2631.51 samples/sec   Loss 11.7736   LearningRate 0.0713   Epoch: 3   Global Step: 129130   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:04,836-Speed 2629.08 samples/sec   Loss 11.8007   LearningRate 0.0713   Epoch: 3   Global Step: 129140   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:08,733-Speed 2628.55 samples/sec   Loss 11.5360   LearningRate 0.0713   Epoch: 3   Global Step: 129150   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:12,719-Speed 2569.84 samples/sec   Loss 11.7113   LearningRate 0.0713   Epoch: 3   Global Step: 129160   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:16,614-Speed 2629.86 samples/sec   Loss 11.5717   LearningRate 0.0713   Epoch: 3   Global Step: 129170   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:20,518-Speed 2623.80 samples/sec   Loss 11.6460   LearningRate 0.0713   Epoch: 3   Global Step: 129180   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:24,413-Speed 2629.24 samples/sec   Loss 11.5806   LearningRate 0.0713   Epoch: 3   Global Step: 129190   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:28,308-Speed 2629.40 samples/sec   Loss 11.6107   LearningRate 0.0713   Epoch: 3   Global Step: 129200   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:32,205-Speed 2629.22 samples/sec   Loss 11.6507   LearningRate 0.0713   Epoch: 3   Global Step: 129210   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:13:36,084-Speed 2640.22 samples/sec   Loss 11.7114   LearningRate 0.0713   Epoch: 3   Global Step: 129220   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:39,983-Speed 2626.83 samples/sec   Loss 11.7732   LearningRate 0.0713   Epoch: 3   Global Step: 129230   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:43,881-Speed 2627.67 samples/sec   Loss 11.6165   LearningRate 0.0713   Epoch: 3   Global Step: 129240   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:47,780-Speed 2627.42 samples/sec   Loss 11.6731   LearningRate 0.0713   Epoch: 3   Global Step: 129250   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:51,676-Speed 2629.17 samples/sec   Loss 11.8101   LearningRate 0.0713   Epoch: 3   Global Step: 129260   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:55,571-Speed 2629.40 samples/sec   Loss 11.5246   LearningRate 0.0713   Epoch: 3   Global Step: 129270   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:13:59,466-Speed 2629.15 samples/sec   Loss 11.7880   LearningRate 0.0713   Epoch: 3   Global Step: 129280   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:03,361-Speed 2630.05 samples/sec   Loss 11.6739   LearningRate 0.0713   Epoch: 3   Global Step: 129290   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:07,254-Speed 2631.25 samples/sec   Loss 11.5421   LearningRate 0.0713   Epoch: 3   Global Step: 129300   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:11,150-Speed 2629.05 samples/sec   Loss 11.6635   LearningRate 0.0713   Epoch: 3   Global Step: 129310   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:15,055-Speed 2623.31 samples/sec   Loss 11.5858   LearningRate 0.0713   Epoch: 3   Global Step: 129320   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:14:18,977-Speed 2611.52 samples/sec   Loss 11.6494   LearningRate 0.0713   Epoch: 3   Global Step: 129330   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:14:22,880-Speed 2624.11 samples/sec   Loss 11.7222   LearningRate 0.0712   Epoch: 3   Global Step: 129340   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:14:26,784-Speed 2622.90 samples/sec   Loss 11.7358   LearningRate 0.0712   Epoch: 3   Global Step: 129350   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:14:30,686-Speed 2625.15 samples/sec   Loss 11.4577   LearningRate 0.0712   Epoch: 3   Global Step: 129360   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:14:34,582-Speed 2628.96 samples/sec   Loss 11.7443   LearningRate 0.0712   Epoch: 3   Global Step: 129370   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:14:38,479-Speed 2628.02 samples/sec   Loss 11.8184   LearningRate 0.0712   Epoch: 3   Global Step: 129380   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:14:42,360-Speed 2639.54 samples/sec   Loss 11.6264   LearningRate 0.0712   Epoch: 3   Global Step: 129390   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:46,257-Speed 2628.86 samples/sec   Loss 11.6301   LearningRate 0.0712   Epoch: 3   Global Step: 129400   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:50,152-Speed 2629.63 samples/sec   Loss 11.7189   LearningRate 0.0712   Epoch: 3   Global Step: 129410   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:54,063-Speed 2618.58 samples/sec   Loss 11.5964   LearningRate 0.0712   Epoch: 3   Global Step: 129420   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:14:57,959-Speed 2628.89 samples/sec   Loss 11.4963   LearningRate 0.0712   Epoch: 3   Global Step: 129430   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:01,855-Speed 2629.48 samples/sec   Loss 11.5767   LearningRate 0.0712   Epoch: 3   Global Step: 129440   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:05,749-Speed 2630.06 samples/sec   Loss 11.6359   LearningRate 0.0712   Epoch: 3   Global Step: 129450   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:09,643-Speed 2630.48 samples/sec   Loss 11.6951   LearningRate 0.0712   Epoch: 3   Global Step: 129460   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:13,534-Speed 2632.97 samples/sec   Loss 11.7909   LearningRate 0.0712   Epoch: 3   Global Step: 129470   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:17,429-Speed 2629.56 samples/sec   Loss 11.8139   LearningRate 0.0712   Epoch: 3   Global Step: 129480   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:21,351-Speed 2611.46 samples/sec   Loss 11.7944   LearningRate 0.0712   Epoch: 3   Global Step: 129490   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:15:25,241-Speed 2633.19 samples/sec   Loss 11.7548   LearningRate 0.0712   Epoch: 3   Global Step: 129500   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:29,141-Speed 2626.07 samples/sec   Loss 11.5775   LearningRate 0.0712   Epoch: 3   Global Step: 129510   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:33,048-Speed 2621.93 samples/sec   Loss 11.7562   LearningRate 0.0712   Epoch: 3   Global Step: 129520   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:36,940-Speed 2631.53 samples/sec   Loss 11.7417   LearningRate 0.0712   Epoch: 3   Global Step: 129530   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:40,844-Speed 2623.99 samples/sec   Loss 11.7458   LearningRate 0.0712   Epoch: 3   Global Step: 129540   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:44,738-Speed 2629.99 samples/sec   Loss 11.7864   LearningRate 0.0712   Epoch: 3   Global Step: 129550   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:48,632-Speed 2630.98 samples/sec   Loss 11.7403   LearningRate 0.0712   Epoch: 3   Global Step: 129560   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:52,520-Speed 2634.59 samples/sec   Loss 11.8427   LearningRate 0.0712   Epoch: 3   Global Step: 129570   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:15:56,414-Speed 2630.08 samples/sec   Loss 11.6789   LearningRate 0.0712   Epoch: 3   Global Step: 129580   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:16:00,311-Speed 2628.32 samples/sec   Loss 11.7211   LearningRate 0.0712   Epoch: 3   Global Step: 129590   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:16:04,172-Speed 2652.40 samples/sec   Loss 11.8143   LearningRate 0.0712   Epoch: 3   Global Step: 129600   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:08,072-Speed 2626.36 samples/sec   Loss 11.6665   LearningRate 0.0712   Epoch: 3   Global Step: 129610   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:11,970-Speed 2628.31 samples/sec   Loss 11.6816   LearningRate 0.0712   Epoch: 3   Global Step: 129620   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:15,858-Speed 2633.98 samples/sec   Loss 11.5975   LearningRate 0.0712   Epoch: 3   Global Step: 129630   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:19,765-Speed 2621.82 samples/sec   Loss 11.7367   LearningRate 0.0712   Epoch: 3   Global Step: 129640   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:23,688-Speed 2610.72 samples/sec   Loss 11.7312   LearningRate 0.0712   Epoch: 3   Global Step: 129650   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:27,583-Speed 2629.81 samples/sec   Loss 11.7814   LearningRate 0.0712   Epoch: 3   Global Step: 129660   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:31,483-Speed 2625.87 samples/sec   Loss 11.5874   LearningRate 0.0712   Epoch: 3   Global Step: 129670   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:16:35,385-Speed 2624.98 samples/sec   Loss 11.7559   LearningRate 0.0712   Epoch: 3   Global Step: 129680   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:16:39,286-Speed 2625.18 samples/sec   Loss 11.5554   LearningRate 0.0712   Epoch: 3   Global Step: 129690   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:16:43,182-Speed 2629.84 samples/sec   Loss 11.7087   LearningRate 0.0712   Epoch: 3   Global Step: 129700   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:16:47,079-Speed 2629.02 samples/sec   Loss 11.6165   LearningRate 0.0712   Epoch: 3   Global Step: 129710   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:16:50,972-Speed 2630.38 samples/sec   Loss 11.5548   LearningRate 0.0712   Epoch: 3   Global Step: 129720   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:16:54,883-Speed 2619.61 samples/sec   Loss 11.6000   LearningRate 0.0712   Epoch: 3   Global Step: 129730   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:16:58,778-Speed 2629.61 samples/sec   Loss 11.5385   LearningRate 0.0712   Epoch: 3   Global Step: 129740   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:02,680-Speed 2625.16 samples/sec   Loss 11.6961   LearningRate 0.0712   Epoch: 3   Global Step: 129750   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:06,594-Speed 2617.06 samples/sec   Loss 11.7657   LearningRate 0.0712   Epoch: 3   Global Step: 129760   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:10,482-Speed 2634.65 samples/sec   Loss 11.7706   LearningRate 0.0712   Epoch: 3   Global Step: 129770   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:14,376-Speed 2630.34 samples/sec   Loss 11.6970   LearningRate 0.0712   Epoch: 3   Global Step: 129780   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:18,279-Speed 2623.98 samples/sec   Loss 11.7285   LearningRate 0.0712   Epoch: 3   Global Step: 129790   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:22,174-Speed 2629.63 samples/sec   Loss 11.7934   LearningRate 0.0712   Epoch: 3   Global Step: 129800   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:17:26,068-Speed 2630.59 samples/sec   Loss 11.6423   LearningRate 0.0712   Epoch: 3   Global Step: 129810   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:17:29,962-Speed 2630.25 samples/sec   Loss 11.5774   LearningRate 0.0712   Epoch: 3   Global Step: 129820   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:17:33,841-Speed 2641.04 samples/sec   Loss 11.6797   LearningRate 0.0711   Epoch: 3   Global Step: 129830   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:37,735-Speed 2630.35 samples/sec   Loss 11.7330   LearningRate 0.0711   Epoch: 3   Global Step: 129840   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:41,630-Speed 2629.35 samples/sec   Loss 11.6626   LearningRate 0.0711   Epoch: 3   Global Step: 129850   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:45,530-Speed 2626.10 samples/sec   Loss 12.0724   LearningRate 0.0711   Epoch: 3   Global Step: 129860   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:49,425-Speed 2629.77 samples/sec   Loss 11.8813   LearningRate 0.0711   Epoch: 3   Global Step: 129870   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:17:53,309-Speed 2637.74 samples/sec   Loss 11.6924   LearningRate 0.0711   Epoch: 3   Global Step: 129880   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:17:57,203-Speed 2629.95 samples/sec   Loss 11.8102   LearningRate 0.0711   Epoch: 3   Global Step: 129890   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:01,111-Speed 2621.15 samples/sec   Loss 11.7022   LearningRate 0.0711   Epoch: 3   Global Step: 129900   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:05,024-Speed 2617.00 samples/sec   Loss 11.6343   LearningRate 0.0711   Epoch: 3   Global Step: 129910   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:08,939-Speed 2616.18 samples/sec   Loss 11.6581   LearningRate 0.0711   Epoch: 3   Global Step: 129920   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:12,882-Speed 2597.93 samples/sec   Loss 11.7582   LearningRate 0.0711   Epoch: 3   Global Step: 129930   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:16,786-Speed 2623.27 samples/sec   Loss 11.8050   LearningRate 0.0711   Epoch: 3   Global Step: 129940   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:20,686-Speed 2626.38 samples/sec   Loss 11.7228   LearningRate 0.0711   Epoch: 3   Global Step: 129950   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:24,575-Speed 2633.72 samples/sec   Loss 11.7239   LearningRate 0.0711   Epoch: 3   Global Step: 129960   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:28,466-Speed 2632.33 samples/sec   Loss 11.7294   LearningRate 0.0711   Epoch: 3   Global Step: 129970   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:18:32,316-Speed 2660.54 samples/sec   Loss 11.7065   LearningRate 0.0711   Epoch: 3   Global Step: 129980   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:18:36,202-Speed 2635.92 samples/sec   Loss 11.8400   LearningRate 0.0711   Epoch: 3   Global Step: 129990   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:18:40,097-Speed 2629.50 samples/sec   Loss 11.8558   LearningRate 0.0711   Epoch: 3   Global Step: 130000   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:19:22,789-[lfw][130000]XNorm: 22.624960
Training: 2022-04-13 10:19:22,790-[lfw][130000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-13 10:19:22,790-[lfw][130000]Accuracy-Highest: 0.99783
Training: 2022-04-13 10:20:12,922-[cfp_fp][130000]XNorm: 20.408098
Training: 2022-04-13 10:20:12,923-[cfp_fp][130000]Accuracy-Flip: 0.97771+-0.00698
Training: 2022-04-13 10:20:12,924-[cfp_fp][130000]Accuracy-Highest: 0.97986
Training: 2022-04-13 10:20:56,101-[agedb_30][130000]XNorm: 22.158565
Training: 2022-04-13 10:20:56,102-[agedb_30][130000]Accuracy-Flip: 0.96717+-0.00646
Training: 2022-04-13 10:20:56,103-[agedb_30][130000]Accuracy-Highest: 0.96800
Training: 2022-04-13 10:20:59,971-Speed 73.21 samples/sec   Loss 11.8310   LearningRate 0.0711   Epoch: 3   Global Step: 130010   Fp16 Grad Scale: 16384   Required: 79 hours
Training: 2022-04-13 10:21:03,837-Speed 2649.01 samples/sec   Loss 11.4757   LearningRate 0.0711   Epoch: 3   Global Step: 130020   Fp16 Grad Scale: 16384   Required: 79 hours
Training: 2022-04-13 10:21:07,707-Speed 2646.98 samples/sec   Loss 11.7318   LearningRate 0.0711   Epoch: 3   Global Step: 130030   Fp16 Grad Scale: 16384   Required: 79 hours
Training: 2022-04-13 10:21:11,609-Speed 2625.49 samples/sec   Loss 11.6377   LearningRate 0.0711   Epoch: 3   Global Step: 130040   Fp16 Grad Scale: 16384   Required: 79 hours
Training: 2022-04-13 10:21:15,577-Speed 2581.55 samples/sec   Loss 11.6190   LearningRate 0.0711   Epoch: 3   Global Step: 130050   Fp16 Grad Scale: 16384   Required: 79 hours
Training: 2022-04-13 10:21:19,451-Speed 2643.17 samples/sec   Loss 11.6699   LearningRate 0.0711   Epoch: 3   Global Step: 130060   Fp16 Grad Scale: 16384   Required: 79 hours
Training: 2022-04-13 10:21:23,332-Speed 2639.56 samples/sec   Loss 11.7546   LearningRate 0.0711   Epoch: 3   Global Step: 130070   Fp16 Grad Scale: 16384   Required: 79 hours
Training: 2022-04-13 10:21:27,210-Speed 2641.15 samples/sec   Loss 11.7167   LearningRate 0.0711   Epoch: 3   Global Step: 130080   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:31,090-Speed 2640.36 samples/sec   Loss 11.6134   LearningRate 0.0711   Epoch: 3   Global Step: 130090   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:34,989-Speed 2626.39 samples/sec   Loss 11.5885   LearningRate 0.0711   Epoch: 3   Global Step: 130100   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:38,875-Speed 2636.31 samples/sec   Loss 11.7321   LearningRate 0.0711   Epoch: 3   Global Step: 130110   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:42,754-Speed 2640.26 samples/sec   Loss 11.4579   LearningRate 0.0711   Epoch: 3   Global Step: 130120   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:46,651-Speed 2629.03 samples/sec   Loss 11.7277   LearningRate 0.0711   Epoch: 3   Global Step: 130130   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:50,535-Speed 2636.50 samples/sec   Loss 11.7479   LearningRate 0.0711   Epoch: 3   Global Step: 130140   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:54,423-Speed 2634.47 samples/sec   Loss 11.6941   LearningRate 0.0711   Epoch: 3   Global Step: 130150   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:21:58,358-Speed 2602.93 samples/sec   Loss 11.7482   LearningRate 0.0711   Epoch: 3   Global Step: 130160   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:22:02,255-Speed 2628.36 samples/sec   Loss 11.6573   LearningRate 0.0711   Epoch: 3   Global Step: 130170   Fp16 Grad Scale: 32768   Required: 79 hours
Training: 2022-04-13 10:22:06,149-Speed 2630.54 samples/sec   Loss 11.6893   LearningRate 0.0711   Epoch: 3   Global Step: 130180   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:10,042-Speed 2631.12 samples/sec   Loss 11.6086   LearningRate 0.0711   Epoch: 3   Global Step: 130190   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:13,944-Speed 2624.66 samples/sec   Loss 11.6355   LearningRate 0.0711   Epoch: 3   Global Step: 130200   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:17,840-Speed 2629.26 samples/sec   Loss 11.7123   LearningRate 0.0711   Epoch: 3   Global Step: 130210   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:21,732-Speed 2631.79 samples/sec   Loss 11.5457   LearningRate 0.0711   Epoch: 3   Global Step: 130220   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:25,627-Speed 2629.44 samples/sec   Loss 11.7810   LearningRate 0.0711   Epoch: 3   Global Step: 130230   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:29,523-Speed 2628.49 samples/sec   Loss 11.6746   LearningRate 0.0711   Epoch: 3   Global Step: 130240   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:33,437-Speed 2617.22 samples/sec   Loss 11.5519   LearningRate 0.0711   Epoch: 3   Global Step: 130250   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:37,445-Speed 2555.01 samples/sec   Loss 11.6709   LearningRate 0.0711   Epoch: 3   Global Step: 130260   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:41,432-Speed 2569.40 samples/sec   Loss 11.7373   LearningRate 0.0711   Epoch: 3   Global Step: 130270   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:22:45,345-Speed 2617.60 samples/sec   Loss 11.6232   LearningRate 0.0711   Epoch: 3   Global Step: 130280   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:22:49,241-Speed 2628.67 samples/sec   Loss 11.5235   LearningRate 0.0711   Epoch: 3   Global Step: 130290   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:22:53,139-Speed 2627.74 samples/sec   Loss 11.7606   LearningRate 0.0711   Epoch: 3   Global Step: 130300   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:22:57,036-Speed 2628.09 samples/sec   Loss 11.7700   LearningRate 0.0711   Epoch: 3   Global Step: 130310   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:23:00,931-Speed 2629.12 samples/sec   Loss 11.6757   LearningRate 0.0710   Epoch: 3   Global Step: 130320   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:23:04,842-Speed 2618.98 samples/sec   Loss 11.6078   LearningRate 0.0710   Epoch: 3   Global Step: 130330   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:23:08,740-Speed 2627.87 samples/sec   Loss 11.7829   LearningRate 0.0710   Epoch: 3   Global Step: 130340   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:23:12,635-Speed 2629.50 samples/sec   Loss 11.7930   LearningRate 0.0710   Epoch: 3   Global Step: 130350   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:23:16,543-Speed 2620.60 samples/sec   Loss 11.6241   LearningRate 0.0710   Epoch: 3   Global Step: 130360   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:23:20,463-Speed 2613.47 samples/sec   Loss 11.7131   LearningRate 0.0710   Epoch: 3   Global Step: 130370   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:23:24,399-Speed 2602.42 samples/sec   Loss 11.7312   LearningRate 0.0710   Epoch: 3   Global Step: 130380   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:28,330-Speed 2605.35 samples/sec   Loss 11.7498   LearningRate 0.0710   Epoch: 3   Global Step: 130390   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:32,251-Speed 2611.85 samples/sec   Loss 11.5767   LearningRate 0.0710   Epoch: 3   Global Step: 130400   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:36,147-Speed 2628.89 samples/sec   Loss 11.8385   LearningRate 0.0710   Epoch: 3   Global Step: 130410   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:40,047-Speed 2626.32 samples/sec   Loss 11.5604   LearningRate 0.0710   Epoch: 3   Global Step: 130420   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:43,951-Speed 2623.84 samples/sec   Loss 11.6179   LearningRate 0.0710   Epoch: 3   Global Step: 130430   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:47,846-Speed 2629.33 samples/sec   Loss 11.4821   LearningRate 0.0710   Epoch: 3   Global Step: 130440   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:51,742-Speed 2629.25 samples/sec   Loss 11.6301   LearningRate 0.0710   Epoch: 3   Global Step: 130450   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:55,663-Speed 2612.45 samples/sec   Loss 11.6790   LearningRate 0.0710   Epoch: 3   Global Step: 130460   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:23:59,563-Speed 2626.03 samples/sec   Loss 11.5938   LearningRate 0.0710   Epoch: 3   Global Step: 130470   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:03,458-Speed 2629.75 samples/sec   Loss 11.6501   LearningRate 0.0710   Epoch: 3   Global Step: 130480   Fp16 Grad Scale: 524288   Required: 79 hours
Training: 2022-04-13 10:24:07,337-Speed 2640.21 samples/sec   Loss 11.5515   LearningRate 0.0710   Epoch: 3   Global Step: 130490   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:11,232-Speed 2629.59 samples/sec   Loss 11.5735   LearningRate 0.0710   Epoch: 3   Global Step: 130500   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:15,126-Speed 2629.88 samples/sec   Loss 11.7582   LearningRate 0.0710   Epoch: 3   Global Step: 130510   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:19,021-Speed 2629.61 samples/sec   Loss 11.7323   LearningRate 0.0710   Epoch: 3   Global Step: 130520   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:22,921-Speed 2626.44 samples/sec   Loss 11.6834   LearningRate 0.0710   Epoch: 3   Global Step: 130530   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:26,831-Speed 2619.67 samples/sec   Loss 11.6225   LearningRate 0.0710   Epoch: 3   Global Step: 130540   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:30,730-Speed 2627.19 samples/sec   Loss 11.5860   LearningRate 0.0710   Epoch: 3   Global Step: 130550   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:34,636-Speed 2621.64 samples/sec   Loss 11.6876   LearningRate 0.0710   Epoch: 3   Global Step: 130560   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:38,538-Speed 2625.02 samples/sec   Loss 11.5007   LearningRate 0.0710   Epoch: 3   Global Step: 130570   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:42,434-Speed 2629.26 samples/sec   Loss 11.7150   LearningRate 0.0710   Epoch: 3   Global Step: 130580   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:46,323-Speed 2633.28 samples/sec   Loss 11.6496   LearningRate 0.0710   Epoch: 3   Global Step: 130590   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:50,219-Speed 2628.76 samples/sec   Loss 11.6718   LearningRate 0.0710   Epoch: 3   Global Step: 130600   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:54,121-Speed 2624.99 samples/sec   Loss 11.7986   LearningRate 0.0710   Epoch: 3   Global Step: 130610   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:24:58,029-Speed 2621.19 samples/sec   Loss 11.8100   LearningRate 0.0710   Epoch: 3   Global Step: 130620   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:25:01,941-Speed 2617.69 samples/sec   Loss 11.6844   LearningRate 0.0710   Epoch: 3   Global Step: 130630   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:25:05,837-Speed 2628.87 samples/sec   Loss 11.6988   LearningRate 0.0710   Epoch: 3   Global Step: 130640   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:25:09,734-Speed 2628.24 samples/sec   Loss 11.6518   LearningRate 0.0710   Epoch: 3   Global Step: 130650   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:25:13,606-Speed 2645.66 samples/sec   Loss 11.6971   LearningRate 0.0710   Epoch: 3   Global Step: 130660   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:17,501-Speed 2629.27 samples/sec   Loss 11.6767   LearningRate 0.0710   Epoch: 3   Global Step: 130670   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:21,402-Speed 2625.81 samples/sec   Loss 11.6809   LearningRate 0.0710   Epoch: 3   Global Step: 130680   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:25,316-Speed 2616.86 samples/sec   Loss 11.5681   LearningRate 0.0710   Epoch: 3   Global Step: 130690   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:29,308-Speed 2565.44 samples/sec   Loss 11.6237   LearningRate 0.0710   Epoch: 3   Global Step: 130700   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:33,257-Speed 2593.86 samples/sec   Loss 11.7184   LearningRate 0.0710   Epoch: 3   Global Step: 130710   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:37,157-Speed 2626.09 samples/sec   Loss 11.7849   LearningRate 0.0710   Epoch: 3   Global Step: 130720   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:41,051-Speed 2629.83 samples/sec   Loss 11.6852   LearningRate 0.0710   Epoch: 3   Global Step: 130730   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:44,953-Speed 2625.30 samples/sec   Loss 11.7048   LearningRate 0.0710   Epoch: 3   Global Step: 130740   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:48,853-Speed 2626.05 samples/sec   Loss 11.6292   LearningRate 0.0710   Epoch: 3   Global Step: 130750   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:25:52,756-Speed 2624.36 samples/sec   Loss 11.7477   LearningRate 0.0710   Epoch: 3   Global Step: 130760   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:25:56,635-Speed 2640.81 samples/sec   Loss 11.6020   LearningRate 0.0710   Epoch: 3   Global Step: 130770   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:00,534-Speed 2626.30 samples/sec   Loss 11.6360   LearningRate 0.0710   Epoch: 3   Global Step: 130780   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:04,432-Speed 2627.46 samples/sec   Loss 11.7221   LearningRate 0.0710   Epoch: 3   Global Step: 130790   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:08,332-Speed 2626.39 samples/sec   Loss 11.5646   LearningRate 0.0710   Epoch: 3   Global Step: 130800   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:12,227-Speed 2629.89 samples/sec   Loss 11.5655   LearningRate 0.0709   Epoch: 3   Global Step: 130810   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:16,123-Speed 2628.78 samples/sec   Loss 11.6771   LearningRate 0.0709   Epoch: 3   Global Step: 130820   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:20,019-Speed 2628.98 samples/sec   Loss 11.6224   LearningRate 0.0709   Epoch: 3   Global Step: 130830   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:24,035-Speed 2550.32 samples/sec   Loss 11.6163   LearningRate 0.0709   Epoch: 3   Global Step: 130840   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:27,948-Speed 2617.89 samples/sec   Loss 11.4852   LearningRate 0.0709   Epoch: 3   Global Step: 130850   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:31,855-Speed 2621.42 samples/sec   Loss 11.6118   LearningRate 0.0709   Epoch: 3   Global Step: 130860   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:35,740-Speed 2636.20 samples/sec   Loss 11.5589   LearningRate 0.0709   Epoch: 3   Global Step: 130870   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:39,643-Speed 2624.56 samples/sec   Loss 11.5994   LearningRate 0.0709   Epoch: 3   Global Step: 130880   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:43,546-Speed 2624.28 samples/sec   Loss 11.7393   LearningRate 0.0709   Epoch: 3   Global Step: 130890   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:47,448-Speed 2624.99 samples/sec   Loss 11.7128   LearningRate 0.0709   Epoch: 3   Global Step: 130900   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:51,353-Speed 2622.76 samples/sec   Loss 11.8304   LearningRate 0.0709   Epoch: 3   Global Step: 130910   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:55,253-Speed 2626.27 samples/sec   Loss 11.4677   LearningRate 0.0709   Epoch: 3   Global Step: 130920   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:26:59,176-Speed 2611.08 samples/sec   Loss 11.7953   LearningRate 0.0709   Epoch: 3   Global Step: 130930   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:03,082-Speed 2621.92 samples/sec   Loss 11.5820   LearningRate 0.0709   Epoch: 3   Global Step: 130940   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:06,982-Speed 2626.06 samples/sec   Loss 11.6757   LearningRate 0.0709   Epoch: 3   Global Step: 130950   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:10,880-Speed 2627.46 samples/sec   Loss 11.5301   LearningRate 0.0709   Epoch: 3   Global Step: 130960   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:14,861-Speed 2573.13 samples/sec   Loss 11.6684   LearningRate 0.0709   Epoch: 3   Global Step: 130970   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:27:18,768-Speed 2621.30 samples/sec   Loss 11.5921   LearningRate 0.0709   Epoch: 3   Global Step: 130980   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:27:22,680-Speed 2618.31 samples/sec   Loss 11.7501   LearningRate 0.0709   Epoch: 3   Global Step: 130990   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:27:26,582-Speed 2624.79 samples/sec   Loss 11.5596   LearningRate 0.0709   Epoch: 3   Global Step: 131000   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:27:30,478-Speed 2629.39 samples/sec   Loss 11.6178   LearningRate 0.0709   Epoch: 3   Global Step: 131010   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:27:34,376-Speed 2627.44 samples/sec   Loss 11.6126   LearningRate 0.0709   Epoch: 3   Global Step: 131020   Fp16 Grad Scale: 262144   Required: 79 hours
Training: 2022-04-13 10:27:38,257-Speed 2639.21 samples/sec   Loss 11.8022   LearningRate 0.0709   Epoch: 3   Global Step: 131030   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:42,158-Speed 2625.49 samples/sec   Loss 11.6823   LearningRate 0.0709   Epoch: 3   Global Step: 131040   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:46,057-Speed 2627.02 samples/sec   Loss 11.6356   LearningRate 0.0709   Epoch: 3   Global Step: 131050   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:49,959-Speed 2624.74 samples/sec   Loss 11.6858   LearningRate 0.0709   Epoch: 3   Global Step: 131060   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:53,859-Speed 2626.22 samples/sec   Loss 11.6387   LearningRate 0.0709   Epoch: 3   Global Step: 131070   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:27:57,735-Speed 2642.41 samples/sec   Loss 11.6363   LearningRate 0.0709   Epoch: 3   Global Step: 131080   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:01,647-Speed 2617.85 samples/sec   Loss 11.5730   LearningRate 0.0709   Epoch: 3   Global Step: 131090   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:05,542-Speed 2630.13 samples/sec   Loss 11.5276   LearningRate 0.0709   Epoch: 3   Global Step: 131100   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:09,440-Speed 2627.77 samples/sec   Loss 11.7688   LearningRate 0.0709   Epoch: 3   Global Step: 131110   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:13,337-Speed 2628.65 samples/sec   Loss 11.5457   LearningRate 0.0709   Epoch: 3   Global Step: 131120   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:17,234-Speed 2627.83 samples/sec   Loss 11.6374   LearningRate 0.0709   Epoch: 3   Global Step: 131130   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:21,138-Speed 2623.62 samples/sec   Loss 11.7521   LearningRate 0.0709   Epoch: 3   Global Step: 131140   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:25,042-Speed 2623.31 samples/sec   Loss 11.6679   LearningRate 0.0709   Epoch: 3   Global Step: 131150   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:28,950-Speed 2620.84 samples/sec   Loss 11.6260   LearningRate 0.0709   Epoch: 3   Global Step: 131160   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:32,842-Speed 2631.34 samples/sec   Loss 11.7089   LearningRate 0.0709   Epoch: 3   Global Step: 131170   Fp16 Grad Scale: 65536   Required: 79 hours
Training: 2022-04-13 10:28:36,736-Speed 2630.52 samples/sec   Loss 11.6426   LearningRate 0.0709   Epoch: 3   Global Step: 131180   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:28:40,636-Speed 2626.84 samples/sec   Loss 11.8865   LearningRate 0.0709   Epoch: 3   Global Step: 131190   Fp16 Grad Scale: 131072   Required: 79 hours
Training: 2022-04-13 10:28:44,531-Speed 2629.61 samples/sec   Loss 11.6168   LearningRate 0.0709   Epoch: 3   Global Step: 131200   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:28:48,423-Speed 2631.49 samples/sec   Loss 11.7150   LearningRate 0.0709   Epoch: 3   Global Step: 131210   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:28:52,317-Speed 2630.82 samples/sec   Loss 11.7038   LearningRate 0.0709   Epoch: 3   Global Step: 131220   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:28:56,261-Speed 2596.80 samples/sec   Loss 11.6625   LearningRate 0.0709   Epoch: 3   Global Step: 131230   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:00,165-Speed 2623.12 samples/sec   Loss 11.6725   LearningRate 0.0709   Epoch: 3   Global Step: 131240   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:04,139-Speed 2577.73 samples/sec   Loss 11.5363   LearningRate 0.0709   Epoch: 3   Global Step: 131250   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:08,056-Speed 2614.92 samples/sec   Loss 11.4468   LearningRate 0.0709   Epoch: 3   Global Step: 131260   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:11,958-Speed 2625.00 samples/sec   Loss 11.7225   LearningRate 0.0709   Epoch: 3   Global Step: 131270   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:15,868-Speed 2619.41 samples/sec   Loss 11.6620   LearningRate 0.0709   Epoch: 3   Global Step: 131280   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:19,773-Speed 2622.73 samples/sec   Loss 11.5147   LearningRate 0.0709   Epoch: 3   Global Step: 131290   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:23,675-Speed 2624.58 samples/sec   Loss 11.5812   LearningRate 0.0709   Epoch: 3   Global Step: 131300   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:27,576-Speed 2625.95 samples/sec   Loss 11.6308   LearningRate 0.0708   Epoch: 3   Global Step: 131310   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:31,483-Speed 2621.63 samples/sec   Loss 11.6793   LearningRate 0.0708   Epoch: 3   Global Step: 131320   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:35,393-Speed 2619.46 samples/sec   Loss 11.5187   LearningRate 0.0708   Epoch: 3   Global Step: 131330   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:39,298-Speed 2622.70 samples/sec   Loss 11.6699   LearningRate 0.0708   Epoch: 3   Global Step: 131340   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:43,197-Speed 2627.50 samples/sec   Loss 11.6702   LearningRate 0.0708   Epoch: 3   Global Step: 131350   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:29:47,081-Speed 2636.68 samples/sec   Loss 11.5974   LearningRate 0.0708   Epoch: 3   Global Step: 131360   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:50,985-Speed 2623.84 samples/sec   Loss 11.7054   LearningRate 0.0708   Epoch: 3   Global Step: 131370   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:54,902-Speed 2614.28 samples/sec   Loss 11.6298   LearningRate 0.0708   Epoch: 3   Global Step: 131380   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:29:58,823-Speed 2613.05 samples/sec   Loss 11.7362   LearningRate 0.0708   Epoch: 3   Global Step: 131390   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:30:02,730-Speed 2621.52 samples/sec   Loss 11.6145   LearningRate 0.0708   Epoch: 3   Global Step: 131400   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:30:06,630-Speed 2625.94 samples/sec   Loss 11.6530   LearningRate 0.0708   Epoch: 3   Global Step: 131410   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:30:10,535-Speed 2622.43 samples/sec   Loss 11.5070   LearningRate 0.0708   Epoch: 3   Global Step: 131420   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:30:14,434-Speed 2627.51 samples/sec   Loss 11.6405   LearningRate 0.0708   Epoch: 3   Global Step: 131430   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:30:18,332-Speed 2628.21 samples/sec   Loss 11.6565   LearningRate 0.0708   Epoch: 3   Global Step: 131440   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:30:22,269-Speed 2601.45 samples/sec   Loss 11.5151   LearningRate 0.0708   Epoch: 3   Global Step: 131450   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:30:26,314-Speed 2531.80 samples/sec   Loss 11.6151   LearningRate 0.0708   Epoch: 3   Global Step: 131460   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:30,205-Speed 2632.28 samples/sec   Loss 11.6601   LearningRate 0.0708   Epoch: 3   Global Step: 131470   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:34,102-Speed 2628.41 samples/sec   Loss 11.6007   LearningRate 0.0708   Epoch: 3   Global Step: 131480   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:37,996-Speed 2630.41 samples/sec   Loss 11.5821   LearningRate 0.0708   Epoch: 3   Global Step: 131490   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:41,894-Speed 2627.64 samples/sec   Loss 11.5112   LearningRate 0.0708   Epoch: 3   Global Step: 131500   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:45,790-Speed 2628.82 samples/sec   Loss 11.6430   LearningRate 0.0708   Epoch: 3   Global Step: 131510   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:49,692-Speed 2624.97 samples/sec   Loss 11.3972   LearningRate 0.0708   Epoch: 3   Global Step: 131520   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:53,595-Speed 2624.43 samples/sec   Loss 11.5338   LearningRate 0.0708   Epoch: 3   Global Step: 131530   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:30:57,532-Speed 2601.99 samples/sec   Loss 11.7182   LearningRate 0.0708   Epoch: 3   Global Step: 131540   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:01,424-Speed 2631.08 samples/sec   Loss 11.5581   LearningRate 0.0708   Epoch: 3   Global Step: 131550   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:05,354-Speed 2606.44 samples/sec   Loss 11.5886   LearningRate 0.0708   Epoch: 3   Global Step: 131560   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:09,263-Speed 2620.24 samples/sec   Loss 11.7704   LearningRate 0.0708   Epoch: 3   Global Step: 131570   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:13,199-Speed 2602.22 samples/sec   Loss 11.5889   LearningRate 0.0708   Epoch: 3   Global Step: 131580   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:17,092-Speed 2631.66 samples/sec   Loss 11.6828   LearningRate 0.0708   Epoch: 3   Global Step: 131590   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:20,985-Speed 2630.92 samples/sec   Loss 11.6327   LearningRate 0.0708   Epoch: 3   Global Step: 131600   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:24,880-Speed 2629.45 samples/sec   Loss 11.5536   LearningRate 0.0708   Epoch: 3   Global Step: 131610   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:28,786-Speed 2622.54 samples/sec   Loss 11.5667   LearningRate 0.0708   Epoch: 3   Global Step: 131620   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:32,681-Speed 2629.54 samples/sec   Loss 11.5548   LearningRate 0.0708   Epoch: 3   Global Step: 131630   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:36,574-Speed 2630.74 samples/sec   Loss 11.5324   LearningRate 0.0708   Epoch: 3   Global Step: 131640   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:40,476-Speed 2625.62 samples/sec   Loss 11.6559   LearningRate 0.0708   Epoch: 3   Global Step: 131650   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:44,352-Speed 2642.43 samples/sec   Loss 11.5339   LearningRate 0.0708   Epoch: 3   Global Step: 131660   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:31:48,230-Speed 2641.30 samples/sec   Loss 11.6160   LearningRate 0.0708   Epoch: 3   Global Step: 131670   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:31:52,123-Speed 2631.27 samples/sec   Loss 11.6815   LearningRate 0.0708   Epoch: 3   Global Step: 131680   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:31:56,023-Speed 2625.79 samples/sec   Loss 11.6855   LearningRate 0.0708   Epoch: 3   Global Step: 131690   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:31:59,911-Speed 2634.49 samples/sec   Loss 11.6885   LearningRate 0.0708   Epoch: 3   Global Step: 131700   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:03,806-Speed 2630.00 samples/sec   Loss 11.6953   LearningRate 0.0708   Epoch: 3   Global Step: 131710   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:07,704-Speed 2627.35 samples/sec   Loss 11.6884   LearningRate 0.0708   Epoch: 3   Global Step: 131720   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:11,601-Speed 2628.47 samples/sec   Loss 11.7547   LearningRate 0.0708   Epoch: 3   Global Step: 131730   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:15,505-Speed 2622.89 samples/sec   Loss 11.5665   LearningRate 0.0708   Epoch: 3   Global Step: 131740   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:19,398-Speed 2631.48 samples/sec   Loss 11.6925   LearningRate 0.0708   Epoch: 3   Global Step: 131750   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:23,293-Speed 2629.64 samples/sec   Loss 11.5755   LearningRate 0.0708   Epoch: 3   Global Step: 131760   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:27,189-Speed 2628.33 samples/sec   Loss 11.5907   LearningRate 0.0708   Epoch: 3   Global Step: 131770   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:32:31,091-Speed 2624.85 samples/sec   Loss 11.6843   LearningRate 0.0708   Epoch: 3   Global Step: 131780   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:32:34,992-Speed 2625.99 samples/sec   Loss 11.5815   LearningRate 0.0708   Epoch: 3   Global Step: 131790   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:32:38,889-Speed 2628.49 samples/sec   Loss 11.6162   LearningRate 0.0707   Epoch: 3   Global Step: 131800   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:32:42,783-Speed 2629.91 samples/sec   Loss 11.6352   LearningRate 0.0707   Epoch: 3   Global Step: 131810   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:32:46,677-Speed 2630.56 samples/sec   Loss 11.6210   LearningRate 0.0707   Epoch: 3   Global Step: 131820   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:32:50,556-Speed 2640.22 samples/sec   Loss 11.7731   LearningRate 0.0707   Epoch: 3   Global Step: 131830   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:54,447-Speed 2632.59 samples/sec   Loss 11.5961   LearningRate 0.0707   Epoch: 3   Global Step: 131840   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:32:58,351-Speed 2623.46 samples/sec   Loss 11.5371   LearningRate 0.0707   Epoch: 3   Global Step: 131850   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:02,335-Speed 2570.56 samples/sec   Loss 11.6905   LearningRate 0.0707   Epoch: 3   Global Step: 131860   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:06,238-Speed 2624.27 samples/sec   Loss 11.6398   LearningRate 0.0707   Epoch: 3   Global Step: 131870   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:10,131-Speed 2631.55 samples/sec   Loss 11.6965   LearningRate 0.0707   Epoch: 3   Global Step: 131880   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:14,029-Speed 2628.07 samples/sec   Loss 11.6484   LearningRate 0.0707   Epoch: 3   Global Step: 131890   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:17,925-Speed 2628.82 samples/sec   Loss 11.6522   LearningRate 0.0707   Epoch: 3   Global Step: 131900   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:21,820-Speed 2629.23 samples/sec   Loss 11.6208   LearningRate 0.0707   Epoch: 3   Global Step: 131910   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:25,717-Speed 2628.64 samples/sec   Loss 11.6191   LearningRate 0.0707   Epoch: 3   Global Step: 131920   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:29,612-Speed 2629.26 samples/sec   Loss 11.6349   LearningRate 0.0707   Epoch: 3   Global Step: 131930   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:33:33,490-Speed 2641.11 samples/sec   Loss 11.6536   LearningRate 0.0707   Epoch: 3   Global Step: 131940   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:37,389-Speed 2627.40 samples/sec   Loss 11.6102   LearningRate 0.0707   Epoch: 3   Global Step: 131950   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:41,289-Speed 2625.87 samples/sec   Loss 11.5882   LearningRate 0.0707   Epoch: 3   Global Step: 131960   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:45,199-Speed 2629.96 samples/sec   Loss 11.5079   LearningRate 0.0707   Epoch: 3   Global Step: 131970   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:49,104-Speed 2622.31 samples/sec   Loss 11.4768   LearningRate 0.0707   Epoch: 3   Global Step: 131980   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:53,003-Speed 2627.17 samples/sec   Loss 11.6120   LearningRate 0.0707   Epoch: 3   Global Step: 131990   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:33:56,901-Speed 2627.97 samples/sec   Loss 11.7379   LearningRate 0.0707   Epoch: 3   Global Step: 132000   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:00,798-Speed 2628.59 samples/sec   Loss 11.5009   LearningRate 0.0707   Epoch: 3   Global Step: 132010   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:04,695-Speed 2627.76 samples/sec   Loss 11.4716   LearningRate 0.0707   Epoch: 3   Global Step: 132020   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:08,588-Speed 2630.99 samples/sec   Loss 11.5330   LearningRate 0.0707   Epoch: 3   Global Step: 132030   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:12,492-Speed 2623.79 samples/sec   Loss 11.6266   LearningRate 0.0707   Epoch: 3   Global Step: 132040   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:34:16,387-Speed 2629.71 samples/sec   Loss 11.6154   LearningRate 0.0707   Epoch: 3   Global Step: 132050   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:34:20,278-Speed 2632.33 samples/sec   Loss 11.7653   LearningRate 0.0707   Epoch: 3   Global Step: 132060   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:34:24,162-Speed 2637.21 samples/sec   Loss 11.4799   LearningRate 0.0707   Epoch: 3   Global Step: 132070   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:28,060-Speed 2627.47 samples/sec   Loss 11.6062   LearningRate 0.0707   Epoch: 3   Global Step: 132080   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:31,962-Speed 2625.04 samples/sec   Loss 11.4549   LearningRate 0.0707   Epoch: 3   Global Step: 132090   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:35,861-Speed 2626.56 samples/sec   Loss 11.5004   LearningRate 0.0707   Epoch: 3   Global Step: 132100   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:39,775-Speed 2617.31 samples/sec   Loss 11.6063   LearningRate 0.0707   Epoch: 3   Global Step: 132110   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:43,679-Speed 2623.56 samples/sec   Loss 11.5965   LearningRate 0.0707   Epoch: 3   Global Step: 132120   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:47,573-Speed 2630.89 samples/sec   Loss 11.6719   LearningRate 0.0707   Epoch: 3   Global Step: 132130   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:51,463-Speed 2633.26 samples/sec   Loss 11.5924   LearningRate 0.0707   Epoch: 3   Global Step: 132140   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:55,359-Speed 2628.63 samples/sec   Loss 11.6518   LearningRate 0.0707   Epoch: 3   Global Step: 132150   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:34:59,256-Speed 2628.36 samples/sec   Loss 11.6391   LearningRate 0.0707   Epoch: 3   Global Step: 132160   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:03,142-Speed 2635.89 samples/sec   Loss 11.7046   LearningRate 0.0707   Epoch: 3   Global Step: 132170   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:07,083-Speed 2599.07 samples/sec   Loss 11.6648   LearningRate 0.0707   Epoch: 3   Global Step: 132180   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:10,979-Speed 2628.51 samples/sec   Loss 11.6728   LearningRate 0.0707   Epoch: 3   Global Step: 132190   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:14,869-Speed 2633.42 samples/sec   Loss 11.5634   LearningRate 0.0707   Epoch: 3   Global Step: 132200   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:18,761-Speed 2631.87 samples/sec   Loss 11.6220   LearningRate 0.0707   Epoch: 3   Global Step: 132210   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:22,655-Speed 2630.12 samples/sec   Loss 11.7314   LearningRate 0.0707   Epoch: 3   Global Step: 132220   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:26,555-Speed 2626.81 samples/sec   Loss 11.5099   LearningRate 0.0707   Epoch: 3   Global Step: 132230   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:30,459-Speed 2623.30 samples/sec   Loss 11.5264   LearningRate 0.0707   Epoch: 3   Global Step: 132240   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:34,359-Speed 2626.62 samples/sec   Loss 11.6168   LearningRate 0.0707   Epoch: 3   Global Step: 132250   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:38,262-Speed 2624.20 samples/sec   Loss 11.6399   LearningRate 0.0707   Epoch: 3   Global Step: 132260   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:35:42,153-Speed 2632.31 samples/sec   Loss 11.4820   LearningRate 0.0707   Epoch: 3   Global Step: 132270   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:35:46,046-Speed 2631.00 samples/sec   Loss 11.8038   LearningRate 0.0707   Epoch: 3   Global Step: 132280   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:35:49,939-Speed 2631.02 samples/sec   Loss 11.5656   LearningRate 0.0706   Epoch: 3   Global Step: 132290   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:35:53,839-Speed 2626.62 samples/sec   Loss 11.7122   LearningRate 0.0706   Epoch: 3   Global Step: 132300   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:35:57,755-Speed 2615.24 samples/sec   Loss 11.7507   LearningRate 0.0706   Epoch: 3   Global Step: 132310   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:36:01,633-Speed 2641.06 samples/sec   Loss 11.4210   LearningRate 0.0706   Epoch: 3   Global Step: 132320   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:05,533-Speed 2626.24 samples/sec   Loss 11.4761   LearningRate 0.0706   Epoch: 3   Global Step: 132330   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:09,430-Speed 2628.46 samples/sec   Loss 11.7087   LearningRate 0.0706   Epoch: 3   Global Step: 132340   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:13,322-Speed 2631.86 samples/sec   Loss 11.6741   LearningRate 0.0706   Epoch: 3   Global Step: 132350   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:17,223-Speed 2625.60 samples/sec   Loss 11.5716   LearningRate 0.0706   Epoch: 3   Global Step: 132360   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:21,116-Speed 2630.98 samples/sec   Loss 11.4780   LearningRate 0.0706   Epoch: 3   Global Step: 132370   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:25,015-Speed 2627.27 samples/sec   Loss 11.6841   LearningRate 0.0706   Epoch: 3   Global Step: 132380   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:28,909-Speed 2630.70 samples/sec   Loss 11.7462   LearningRate 0.0706   Epoch: 3   Global Step: 132390   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:32,805-Speed 2628.76 samples/sec   Loss 11.6635   LearningRate 0.0706   Epoch: 3   Global Step: 132400   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:36,697-Speed 2631.35 samples/sec   Loss 11.5555   LearningRate 0.0706   Epoch: 3   Global Step: 132410   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:40,604-Speed 2621.54 samples/sec   Loss 11.6731   LearningRate 0.0706   Epoch: 3   Global Step: 132420   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:36:44,500-Speed 2628.64 samples/sec   Loss 11.7145   LearningRate 0.0706   Epoch: 3   Global Step: 132430   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:36:48,395-Speed 2629.77 samples/sec   Loss 11.5566   LearningRate 0.0706   Epoch: 3   Global Step: 132440   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:52,326-Speed 2605.82 samples/sec   Loss 11.5639   LearningRate 0.0706   Epoch: 3   Global Step: 132450   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:36:56,221-Speed 2629.82 samples/sec   Loss 11.6114   LearningRate 0.0706   Epoch: 3   Global Step: 132460   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:00,124-Speed 2624.32 samples/sec   Loss 11.5906   LearningRate 0.0706   Epoch: 3   Global Step: 132470   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:04,026-Speed 2624.69 samples/sec   Loss 11.5713   LearningRate 0.0706   Epoch: 3   Global Step: 132480   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:07,933-Speed 2621.19 samples/sec   Loss 11.5921   LearningRate 0.0706   Epoch: 3   Global Step: 132490   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:11,827-Speed 2630.16 samples/sec   Loss 11.6905   LearningRate 0.0706   Epoch: 3   Global Step: 132500   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:15,722-Speed 2630.09 samples/sec   Loss 11.5197   LearningRate 0.0706   Epoch: 3   Global Step: 132510   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:19,626-Speed 2623.88 samples/sec   Loss 11.5684   LearningRate 0.0706   Epoch: 3   Global Step: 132520   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:23,521-Speed 2629.32 samples/sec   Loss 11.5887   LearningRate 0.0706   Epoch: 3   Global Step: 132530   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:27,421-Speed 2626.78 samples/sec   Loss 11.5885   LearningRate 0.0706   Epoch: 3   Global Step: 132540   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:37:31,303-Speed 2637.95 samples/sec   Loss 11.5138   LearningRate 0.0706   Epoch: 3   Global Step: 132550   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:37:35,200-Speed 2628.20 samples/sec   Loss 11.7519   LearningRate 0.0706   Epoch: 3   Global Step: 132560   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:37:39,095-Speed 2629.84 samples/sec   Loss 11.6478   LearningRate 0.0706   Epoch: 3   Global Step: 132570   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:37:42,993-Speed 2627.92 samples/sec   Loss 11.6120   LearningRate 0.0706   Epoch: 3   Global Step: 132580   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:37:46,884-Speed 2631.95 samples/sec   Loss 11.4982   LearningRate 0.0706   Epoch: 3   Global Step: 132590   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:37:50,767-Speed 2638.48 samples/sec   Loss 11.6666   LearningRate 0.0706   Epoch: 3   Global Step: 132600   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:37:54,673-Speed 2622.21 samples/sec   Loss 11.4874   LearningRate 0.0706   Epoch: 3   Global Step: 132610   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:37:58,567-Speed 2630.54 samples/sec   Loss 11.7601   LearningRate 0.0706   Epoch: 3   Global Step: 132620   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:02,460-Speed 2630.43 samples/sec   Loss 11.6875   LearningRate 0.0706   Epoch: 3   Global Step: 132630   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:06,356-Speed 2628.79 samples/sec   Loss 11.7452   LearningRate 0.0706   Epoch: 3   Global Step: 132640   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:10,248-Speed 2631.63 samples/sec   Loss 11.5797   LearningRate 0.0706   Epoch: 3   Global Step: 132650   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:14,162-Speed 2617.08 samples/sec   Loss 11.5994   LearningRate 0.0706   Epoch: 3   Global Step: 132660   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:18,052-Speed 2633.11 samples/sec   Loss 11.4833   LearningRate 0.0706   Epoch: 3   Global Step: 132670   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:21,949-Speed 2628.09 samples/sec   Loss 11.7313   LearningRate 0.0706   Epoch: 3   Global Step: 132680   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:25,849-Speed 2626.59 samples/sec   Loss 11.5778   LearningRate 0.0706   Epoch: 3   Global Step: 132690   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:38:29,753-Speed 2623.57 samples/sec   Loss 11.6587   LearningRate 0.0706   Epoch: 3   Global Step: 132700   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:38:33,653-Speed 2626.63 samples/sec   Loss 11.5357   LearningRate 0.0706   Epoch: 3   Global Step: 132710   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:38:37,548-Speed 2629.52 samples/sec   Loss 11.7466   LearningRate 0.0706   Epoch: 3   Global Step: 132720   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:38:41,444-Speed 2628.52 samples/sec   Loss 11.6107   LearningRate 0.0706   Epoch: 3   Global Step: 132730   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:38:45,343-Speed 2627.32 samples/sec   Loss 11.6313   LearningRate 0.0706   Epoch: 3   Global Step: 132740   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:38:49,236-Speed 2630.93 samples/sec   Loss 11.6628   LearningRate 0.0706   Epoch: 3   Global Step: 132750   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:38:53,130-Speed 2630.75 samples/sec   Loss 11.5045   LearningRate 0.0706   Epoch: 3   Global Step: 132760   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:38:57,029-Speed 2626.84 samples/sec   Loss 11.5759   LearningRate 0.0706   Epoch: 3   Global Step: 132770   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:39:00,925-Speed 2628.79 samples/sec   Loss 11.5485   LearningRate 0.0706   Epoch: 3   Global Step: 132780   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:39:04,818-Speed 2630.45 samples/sec   Loss 11.4600   LearningRate 0.0705   Epoch: 3   Global Step: 132790   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:39:08,715-Speed 2628.77 samples/sec   Loss 11.5293   LearningRate 0.0705   Epoch: 3   Global Step: 132800   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:12,756-Speed 2534.45 samples/sec   Loss 11.5582   LearningRate 0.0705   Epoch: 3   Global Step: 132810   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:16,650-Speed 2630.18 samples/sec   Loss 11.5753   LearningRate 0.0705   Epoch: 3   Global Step: 132820   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:20,544-Speed 2630.11 samples/sec   Loss 11.7071   LearningRate 0.0705   Epoch: 3   Global Step: 132830   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:24,451-Speed 2621.97 samples/sec   Loss 11.4428   LearningRate 0.0705   Epoch: 3   Global Step: 132840   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:28,347-Speed 2629.23 samples/sec   Loss 11.7813   LearningRate 0.0705   Epoch: 3   Global Step: 132850   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:32,244-Speed 2628.25 samples/sec   Loss 11.6460   LearningRate 0.0705   Epoch: 3   Global Step: 132860   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:36,141-Speed 2628.01 samples/sec   Loss 11.5843   LearningRate 0.0705   Epoch: 3   Global Step: 132870   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:40,037-Speed 2629.22 samples/sec   Loss 11.6366   LearningRate 0.0705   Epoch: 3   Global Step: 132880   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:43,927-Speed 2632.20 samples/sec   Loss 11.5944   LearningRate 0.0705   Epoch: 3   Global Step: 132890   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:39:47,824-Speed 2628.89 samples/sec   Loss 11.7066   LearningRate 0.0705   Epoch: 3   Global Step: 132900   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:39:51,720-Speed 2628.72 samples/sec   Loss 11.5481   LearningRate 0.0705   Epoch: 3   Global Step: 132910   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:39:55,616-Speed 2629.19 samples/sec   Loss 11.5261   LearningRate 0.0705   Epoch: 3   Global Step: 132920   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:39:59,512-Speed 2628.58 samples/sec   Loss 11.4973   LearningRate 0.0705   Epoch: 3   Global Step: 132930   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:03,413-Speed 2625.62 samples/sec   Loss 11.6371   LearningRate 0.0705   Epoch: 3   Global Step: 132940   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:07,311-Speed 2627.46 samples/sec   Loss 11.4838   LearningRate 0.0705   Epoch: 3   Global Step: 132950   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:11,207-Speed 2629.34 samples/sec   Loss 11.6856   LearningRate 0.0705   Epoch: 3   Global Step: 132960   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:15,105-Speed 2627.62 samples/sec   Loss 11.7552   LearningRate 0.0705   Epoch: 3   Global Step: 132970   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:19,021-Speed 2615.35 samples/sec   Loss 11.7819   LearningRate 0.0705   Epoch: 3   Global Step: 132980   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:23,012-Speed 2566.70 samples/sec   Loss 11.4676   LearningRate 0.0705   Epoch: 3   Global Step: 132990   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:26,885-Speed 2644.81 samples/sec   Loss 11.6046   LearningRate 0.0705   Epoch: 3   Global Step: 133000   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:30,783-Speed 2627.07 samples/sec   Loss 11.6062   LearningRate 0.0705   Epoch: 3   Global Step: 133010   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:34,727-Speed 2597.21 samples/sec   Loss 11.5378   LearningRate 0.0705   Epoch: 3   Global Step: 133020   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:38,620-Speed 2631.40 samples/sec   Loss 11.5841   LearningRate 0.0705   Epoch: 3   Global Step: 133030   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:42,510-Speed 2633.19 samples/sec   Loss 11.6309   LearningRate 0.0705   Epoch: 3   Global Step: 133040   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:46,401-Speed 2632.26 samples/sec   Loss 11.5073   LearningRate 0.0705   Epoch: 3   Global Step: 133050   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:50,294-Speed 2631.46 samples/sec   Loss 11.5607   LearningRate 0.0705   Epoch: 3   Global Step: 133060   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:54,185-Speed 2632.02 samples/sec   Loss 11.6157   LearningRate 0.0705   Epoch: 3   Global Step: 133070   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:40:58,077-Speed 2631.81 samples/sec   Loss 11.4837   LearningRate 0.0705   Epoch: 3   Global Step: 133080   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:01,972-Speed 2629.65 samples/sec   Loss 11.4255   LearningRate 0.0705   Epoch: 3   Global Step: 133090   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:05,851-Speed 2640.20 samples/sec   Loss 11.5988   LearningRate 0.0705   Epoch: 3   Global Step: 133100   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:09,801-Speed 2594.21 samples/sec   Loss 11.6191   LearningRate 0.0705   Epoch: 3   Global Step: 133110   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:13,699-Speed 2626.99 samples/sec   Loss 11.5812   LearningRate 0.0705   Epoch: 3   Global Step: 133120   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:17,600-Speed 2625.50 samples/sec   Loss 11.6169   LearningRate 0.0705   Epoch: 3   Global Step: 133130   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:21,529-Speed 2607.04 samples/sec   Loss 11.5939   LearningRate 0.0705   Epoch: 3   Global Step: 133140   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:25,429-Speed 2626.52 samples/sec   Loss 11.7920   LearningRate 0.0705   Epoch: 3   Global Step: 133150   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:29,331-Speed 2625.29 samples/sec   Loss 11.6077   LearningRate 0.0705   Epoch: 3   Global Step: 133160   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:33,231-Speed 2626.29 samples/sec   Loss 11.5559   LearningRate 0.0705   Epoch: 3   Global Step: 133170   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:37,124-Speed 2630.29 samples/sec   Loss 11.6271   LearningRate 0.0705   Epoch: 3   Global Step: 133180   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:41,021-Speed 2628.41 samples/sec   Loss 11.5988   LearningRate 0.0705   Epoch: 3   Global Step: 133190   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:44,898-Speed 2642.42 samples/sec   Loss 11.4847   LearningRate 0.0705   Epoch: 3   Global Step: 133200   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:41:48,770-Speed 2644.87 samples/sec   Loss 11.6153   LearningRate 0.0705   Epoch: 3   Global Step: 133210   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:41:52,667-Speed 2628.87 samples/sec   Loss 11.5202   LearningRate 0.0705   Epoch: 3   Global Step: 133220   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:41:56,571-Speed 2623.79 samples/sec   Loss 11.6682   LearningRate 0.0705   Epoch: 3   Global Step: 133230   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:00,473-Speed 2624.96 samples/sec   Loss 11.6488   LearningRate 0.0705   Epoch: 3   Global Step: 133240   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:04,381-Speed 2620.98 samples/sec   Loss 11.4974   LearningRate 0.0705   Epoch: 3   Global Step: 133250   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:08,273-Speed 2631.77 samples/sec   Loss 11.6921   LearningRate 0.0705   Epoch: 3   Global Step: 133260   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:12,164-Speed 2631.92 samples/sec   Loss 11.5991   LearningRate 0.0705   Epoch: 3   Global Step: 133270   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:16,056-Speed 2632.12 samples/sec   Loss 11.5603   LearningRate 0.0704   Epoch: 3   Global Step: 133280   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:19,956-Speed 2626.16 samples/sec   Loss 11.4816   LearningRate 0.0704   Epoch: 3   Global Step: 133290   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:23,851-Speed 2630.26 samples/sec   Loss 11.6036   LearningRate 0.0704   Epoch: 3   Global Step: 133300   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:27,727-Speed 2642.64 samples/sec   Loss 11.6086   LearningRate 0.0704   Epoch: 3   Global Step: 133310   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:31,650-Speed 2610.75 samples/sec   Loss 11.5653   LearningRate 0.0704   Epoch: 3   Global Step: 133320   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:35,542-Speed 2632.01 samples/sec   Loss 11.6005   LearningRate 0.0704   Epoch: 3   Global Step: 133330   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:42:39,419-Speed 2641.68 samples/sec   Loss 11.5294   LearningRate 0.0704   Epoch: 3   Global Step: 133340   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:42:43,313-Speed 2630.12 samples/sec   Loss 11.5948   LearningRate 0.0704   Epoch: 3   Global Step: 133350   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:42:47,208-Speed 2629.40 samples/sec   Loss 11.7172   LearningRate 0.0704   Epoch: 3   Global Step: 133360   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:42:51,102-Speed 2630.83 samples/sec   Loss 11.7959   LearningRate 0.0704   Epoch: 3   Global Step: 133370   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:42:54,998-Speed 2628.96 samples/sec   Loss 11.5854   LearningRate 0.0704   Epoch: 3   Global Step: 133380   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:42:58,894-Speed 2629.49 samples/sec   Loss 11.5753   LearningRate 0.0704   Epoch: 3   Global Step: 133390   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:43:02,796-Speed 2624.72 samples/sec   Loss 11.6310   LearningRate 0.0704   Epoch: 3   Global Step: 133400   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:43:06,689-Speed 2630.92 samples/sec   Loss 11.5749   LearningRate 0.0704   Epoch: 3   Global Step: 133410   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:43:10,581-Speed 2631.01 samples/sec   Loss 11.5319   LearningRate 0.0704   Epoch: 3   Global Step: 133420   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:43:14,473-Speed 2631.82 samples/sec   Loss 11.5469   LearningRate 0.0704   Epoch: 3   Global Step: 133430   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:43:18,380-Speed 2621.75 samples/sec   Loss 11.5000   LearningRate 0.0704   Epoch: 3   Global Step: 133440   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:22,276-Speed 2628.78 samples/sec   Loss 11.6647   LearningRate 0.0704   Epoch: 3   Global Step: 133450   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:26,173-Speed 2628.70 samples/sec   Loss 11.6541   LearningRate 0.0704   Epoch: 3   Global Step: 133460   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:30,071-Speed 2627.86 samples/sec   Loss 11.4877   LearningRate 0.0704   Epoch: 3   Global Step: 133470   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:33,963-Speed 2631.33 samples/sec   Loss 11.5041   LearningRate 0.0704   Epoch: 3   Global Step: 133480   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:37,859-Speed 2629.38 samples/sec   Loss 11.6541   LearningRate 0.0704   Epoch: 3   Global Step: 133490   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:41,815-Speed 2588.97 samples/sec   Loss 11.5325   LearningRate 0.0704   Epoch: 3   Global Step: 133500   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:45,709-Speed 2630.17 samples/sec   Loss 11.5030   LearningRate 0.0704   Epoch: 3   Global Step: 133510   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:49,608-Speed 2627.23 samples/sec   Loss 11.5876   LearningRate 0.0704   Epoch: 3   Global Step: 133520   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:53,505-Speed 2628.30 samples/sec   Loss 11.6236   LearningRate 0.0704   Epoch: 3   Global Step: 133530   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:43:57,399-Speed 2630.64 samples/sec   Loss 11.3722   LearningRate 0.0704   Epoch: 3   Global Step: 133540   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:44:01,294-Speed 2629.18 samples/sec   Loss 11.5298   LearningRate 0.0704   Epoch: 3   Global Step: 133550   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:44:05,172-Speed 2641.38 samples/sec   Loss 11.5872   LearningRate 0.0704   Epoch: 3   Global Step: 133560   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:09,065-Speed 2630.59 samples/sec   Loss 11.6993   LearningRate 0.0704   Epoch: 3   Global Step: 133570   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:12,962-Speed 2628.51 samples/sec   Loss 11.5076   LearningRate 0.0704   Epoch: 3   Global Step: 133580   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:16,866-Speed 2623.58 samples/sec   Loss 11.4944   LearningRate 0.0704   Epoch: 3   Global Step: 133590   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:20,764-Speed 2627.53 samples/sec   Loss 11.6442   LearningRate 0.0704   Epoch: 3   Global Step: 133600   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:24,667-Speed 2624.41 samples/sec   Loss 11.5588   LearningRate 0.0704   Epoch: 3   Global Step: 133610   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:28,562-Speed 2629.91 samples/sec   Loss 11.6546   LearningRate 0.0704   Epoch: 3   Global Step: 133620   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:32,459-Speed 2628.19 samples/sec   Loss 11.6194   LearningRate 0.0704   Epoch: 3   Global Step: 133630   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:36,373-Speed 2616.94 samples/sec   Loss 11.6447   LearningRate 0.0704   Epoch: 3   Global Step: 133640   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:40,390-Speed 2549.87 samples/sec   Loss 11.5231   LearningRate 0.0704   Epoch: 3   Global Step: 133650   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:44:44,345-Speed 2590.26 samples/sec   Loss 11.5545   LearningRate 0.0704   Epoch: 3   Global Step: 133660   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:44:48,268-Speed 2611.35 samples/sec   Loss 11.5400   LearningRate 0.0704   Epoch: 3   Global Step: 133670   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:44:52,158-Speed 2632.59 samples/sec   Loss 11.5890   LearningRate 0.0704   Epoch: 3   Global Step: 133680   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:44:56,025-Speed 2649.28 samples/sec   Loss 11.7191   LearningRate 0.0704   Epoch: 3   Global Step: 133690   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:44:59,881-Speed 2656.33 samples/sec   Loss 12.3524   LearningRate 0.0704   Epoch: 3   Global Step: 133700   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:03,769-Speed 2633.99 samples/sec   Loss 12.0529   LearningRate 0.0704   Epoch: 3   Global Step: 133710   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:07,664-Speed 2629.83 samples/sec   Loss 11.8304   LearningRate 0.0704   Epoch: 3   Global Step: 133720   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:11,553-Speed 2633.48 samples/sec   Loss 11.8141   LearningRate 0.0704   Epoch: 3   Global Step: 133730   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:15,483-Speed 2606.80 samples/sec   Loss 11.6434   LearningRate 0.0704   Epoch: 3   Global Step: 133740   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:19,376-Speed 2630.76 samples/sec   Loss 11.7892   LearningRate 0.0704   Epoch: 3   Global Step: 133750   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:23,286-Speed 2619.75 samples/sec   Loss 11.7020   LearningRate 0.0704   Epoch: 3   Global Step: 133760   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:27,178-Speed 2632.09 samples/sec   Loss 11.8907   LearningRate 0.0704   Epoch: 3   Global Step: 133770   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:31,075-Speed 2628.18 samples/sec   Loss 11.6437   LearningRate 0.0703   Epoch: 3   Global Step: 133780   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:34,965-Speed 2633.14 samples/sec   Loss 11.6913   LearningRate 0.0703   Epoch: 3   Global Step: 133790   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:45:38,857-Speed 2631.87 samples/sec   Loss 11.5976   LearningRate 0.0703   Epoch: 3   Global Step: 133800   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:45:42,756-Speed 2626.22 samples/sec   Loss 11.6673   LearningRate 0.0703   Epoch: 3   Global Step: 133810   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:45:46,652-Speed 2629.52 samples/sec   Loss 11.6888   LearningRate 0.0703   Epoch: 3   Global Step: 133820   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:45:50,544-Speed 2631.81 samples/sec   Loss 11.5885   LearningRate 0.0703   Epoch: 3   Global Step: 133830   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:45:54,436-Speed 2631.77 samples/sec   Loss 11.6038   LearningRate 0.0703   Epoch: 3   Global Step: 133840   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:45:58,340-Speed 2623.61 samples/sec   Loss 11.8245   LearningRate 0.0703   Epoch: 3   Global Step: 133850   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:46:02,243-Speed 2624.11 samples/sec   Loss 11.5630   LearningRate 0.0703   Epoch: 3   Global Step: 133860   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:46:06,138-Speed 2629.24 samples/sec   Loss 11.7120   LearningRate 0.0703   Epoch: 3   Global Step: 133870   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:46:10,031-Speed 2630.67 samples/sec   Loss 11.5564   LearningRate 0.0703   Epoch: 3   Global Step: 133880   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:46:13,929-Speed 2627.78 samples/sec   Loss 11.5643   LearningRate 0.0703   Epoch: 3   Global Step: 133890   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:46:17,822-Speed 2631.38 samples/sec   Loss 11.4765   LearningRate 0.0703   Epoch: 3   Global Step: 133900   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:21,716-Speed 2630.01 samples/sec   Loss 11.8559   LearningRate 0.0703   Epoch: 3   Global Step: 133910   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:25,608-Speed 2632.37 samples/sec   Loss 11.5534   LearningRate 0.0703   Epoch: 3   Global Step: 133920   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:29,503-Speed 2629.33 samples/sec   Loss 11.6680   LearningRate 0.0703   Epoch: 3   Global Step: 133930   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:33,408-Speed 2622.61 samples/sec   Loss 11.7201   LearningRate 0.0703   Epoch: 3   Global Step: 133940   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:37,301-Speed 2630.86 samples/sec   Loss 11.5994   LearningRate 0.0703   Epoch: 3   Global Step: 133950   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:41,196-Speed 2630.07 samples/sec   Loss 11.5128   LearningRate 0.0703   Epoch: 3   Global Step: 133960   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:45,101-Speed 2622.93 samples/sec   Loss 11.6230   LearningRate 0.0703   Epoch: 3   Global Step: 133970   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:48,996-Speed 2629.63 samples/sec   Loss 11.6423   LearningRate 0.0703   Epoch: 3   Global Step: 133980   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:52,885-Speed 2633.99 samples/sec   Loss 11.7190   LearningRate 0.0703   Epoch: 3   Global Step: 133990   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:46:56,788-Speed 2624.23 samples/sec   Loss 11.7071   LearningRate 0.0703   Epoch: 3   Global Step: 134000   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:47:00,687-Speed 2627.18 samples/sec   Loss 11.6817   LearningRate 0.0703   Epoch: 3   Global Step: 134010   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:47:04,588-Speed 2625.44 samples/sec   Loss 11.6253   LearningRate 0.0703   Epoch: 3   Global Step: 134020   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:47:08,479-Speed 2632.15 samples/sec   Loss 11.6674   LearningRate 0.0703   Epoch: 3   Global Step: 134030   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:47:12,395-Speed 2615.98 samples/sec   Loss 11.5434   LearningRate 0.0703   Epoch: 3   Global Step: 134040   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:47:16,295-Speed 2626.17 samples/sec   Loss 11.4943   LearningRate 0.0703   Epoch: 3   Global Step: 134050   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:47:20,190-Speed 2630.30 samples/sec   Loss 11.5192   LearningRate 0.0703   Epoch: 3   Global Step: 134060   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:47:24,082-Speed 2631.13 samples/sec   Loss 11.5370   LearningRate 0.0703   Epoch: 3   Global Step: 134070   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:27,972-Speed 2634.07 samples/sec   Loss 11.5983   LearningRate 0.0703   Epoch: 3   Global Step: 134080   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:31,866-Speed 2629.99 samples/sec   Loss 11.5277   LearningRate 0.0703   Epoch: 3   Global Step: 134090   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:35,767-Speed 2625.52 samples/sec   Loss 11.6680   LearningRate 0.0703   Epoch: 3   Global Step: 134100   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:39,827-Speed 2522.72 samples/sec   Loss 11.4629   LearningRate 0.0703   Epoch: 3   Global Step: 134110   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:43,743-Speed 2615.61 samples/sec   Loss 11.6495   LearningRate 0.0703   Epoch: 3   Global Step: 134120   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:47,634-Speed 2632.63 samples/sec   Loss 11.6043   LearningRate 0.0703   Epoch: 3   Global Step: 134130   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:51,540-Speed 2622.38 samples/sec   Loss 11.4870   LearningRate 0.0703   Epoch: 3   Global Step: 134140   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:55,430-Speed 2633.23 samples/sec   Loss 11.6495   LearningRate 0.0703   Epoch: 3   Global Step: 134150   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:47:59,446-Speed 2550.63 samples/sec   Loss 11.6193   LearningRate 0.0703   Epoch: 3   Global Step: 134160   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:48:03,341-Speed 2629.10 samples/sec   Loss 11.6491   LearningRate 0.0703   Epoch: 3   Global Step: 134170   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:07,237-Speed 2629.29 samples/sec   Loss 11.6879   LearningRate 0.0703   Epoch: 3   Global Step: 134180   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:11,132-Speed 2629.30 samples/sec   Loss 11.5835   LearningRate 0.0703   Epoch: 3   Global Step: 134190   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:15,029-Speed 2628.91 samples/sec   Loss 11.6910   LearningRate 0.0703   Epoch: 3   Global Step: 134200   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:18,928-Speed 2626.77 samples/sec   Loss 11.6770   LearningRate 0.0703   Epoch: 3   Global Step: 134210   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:22,828-Speed 2626.56 samples/sec   Loss 11.4125   LearningRate 0.0703   Epoch: 3   Global Step: 134220   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:26,727-Speed 2626.71 samples/sec   Loss 11.5451   LearningRate 0.0703   Epoch: 3   Global Step: 134230   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:30,624-Speed 2628.30 samples/sec   Loss 11.6972   LearningRate 0.0703   Epoch: 3   Global Step: 134240   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:34,520-Speed 2628.90 samples/sec   Loss 11.5195   LearningRate 0.0703   Epoch: 3   Global Step: 134250   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:38,416-Speed 2628.95 samples/sec   Loss 11.6375   LearningRate 0.0703   Epoch: 3   Global Step: 134260   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:42,310-Speed 2629.54 samples/sec   Loss 11.6423   LearningRate 0.0702   Epoch: 3   Global Step: 134270   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:48:46,201-Speed 2633.11 samples/sec   Loss 11.4529   LearningRate 0.0702   Epoch: 3   Global Step: 134280   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:50,089-Speed 2634.53 samples/sec   Loss 11.6529   LearningRate 0.0702   Epoch: 3   Global Step: 134290   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:53,988-Speed 2626.71 samples/sec   Loss 11.4976   LearningRate 0.0702   Epoch: 3   Global Step: 134300   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:48:57,890-Speed 2625.47 samples/sec   Loss 11.6144   LearningRate 0.0702   Epoch: 3   Global Step: 134310   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:01,795-Speed 2622.77 samples/sec   Loss 11.5566   LearningRate 0.0702   Epoch: 3   Global Step: 134320   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:05,708-Speed 2617.83 samples/sec   Loss 11.6062   LearningRate 0.0702   Epoch: 3   Global Step: 134330   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:09,695-Speed 2568.88 samples/sec   Loss 11.4839   LearningRate 0.0702   Epoch: 3   Global Step: 134340   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:13,649-Speed 2590.47 samples/sec   Loss 11.6883   LearningRate 0.0702   Epoch: 3   Global Step: 134350   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:17,541-Speed 2631.50 samples/sec   Loss 11.6113   LearningRate 0.0702   Epoch: 3   Global Step: 134360   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:21,434-Speed 2631.02 samples/sec   Loss 11.5007   LearningRate 0.0702   Epoch: 3   Global Step: 134370   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:25,335-Speed 2625.66 samples/sec   Loss 11.5615   LearningRate 0.0702   Epoch: 3   Global Step: 134380   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:49:29,217-Speed 2638.54 samples/sec   Loss 11.7182   LearningRate 0.0702   Epoch: 3   Global Step: 134390   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:33,132-Speed 2616.19 samples/sec   Loss 11.5894   LearningRate 0.0702   Epoch: 3   Global Step: 134400   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:37,043-Speed 2619.16 samples/sec   Loss 11.5624   LearningRate 0.0702   Epoch: 3   Global Step: 134410   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:40,946-Speed 2624.46 samples/sec   Loss 11.7549   LearningRate 0.0702   Epoch: 3   Global Step: 134420   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:44,838-Speed 2631.80 samples/sec   Loss 11.4084   LearningRate 0.0702   Epoch: 3   Global Step: 134430   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:48,728-Speed 2632.47 samples/sec   Loss 11.4616   LearningRate 0.0702   Epoch: 3   Global Step: 134440   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:52,625-Speed 2628.97 samples/sec   Loss 11.5842   LearningRate 0.0702   Epoch: 3   Global Step: 134450   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:49:56,520-Speed 2629.21 samples/sec   Loss 11.5309   LearningRate 0.0702   Epoch: 3   Global Step: 134460   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:50:00,413-Speed 2631.51 samples/sec   Loss 11.5509   LearningRate 0.0702   Epoch: 3   Global Step: 134470   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:50:04,307-Speed 2630.55 samples/sec   Loss 11.4827   LearningRate 0.0702   Epoch: 3   Global Step: 134480   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:50:08,207-Speed 2625.56 samples/sec   Loss 11.6408   LearningRate 0.0702   Epoch: 3   Global Step: 134490   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:50:12,105-Speed 2627.84 samples/sec   Loss 11.5171   LearningRate 0.0702   Epoch: 3   Global Step: 134500   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:50:15,985-Speed 2639.83 samples/sec   Loss 11.6606   LearningRate 0.0702   Epoch: 3   Global Step: 134510   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:19,883-Speed 2628.08 samples/sec   Loss 11.5656   LearningRate 0.0702   Epoch: 3   Global Step: 134520   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:23,774-Speed 2632.02 samples/sec   Loss 11.7297   LearningRate 0.0702   Epoch: 3   Global Step: 134530   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:27,681-Speed 2621.53 samples/sec   Loss 11.5550   LearningRate 0.0702   Epoch: 3   Global Step: 134540   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:31,576-Speed 2630.04 samples/sec   Loss 11.5822   LearningRate 0.0702   Epoch: 3   Global Step: 134550   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:35,470-Speed 2630.64 samples/sec   Loss 11.5515   LearningRate 0.0702   Epoch: 3   Global Step: 134560   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:39,367-Speed 2628.39 samples/sec   Loss 11.6242   LearningRate 0.0702   Epoch: 3   Global Step: 134570   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:43,257-Speed 2632.49 samples/sec   Loss 11.4957   LearningRate 0.0702   Epoch: 3   Global Step: 134580   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:47,152-Speed 2630.29 samples/sec   Loss 11.5745   LearningRate 0.0702   Epoch: 3   Global Step: 134590   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:51,045-Speed 2631.10 samples/sec   Loss 11.5453   LearningRate 0.0702   Epoch: 3   Global Step: 134600   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:50:54,944-Speed 2627.17 samples/sec   Loss 11.5343   LearningRate 0.0702   Epoch: 3   Global Step: 134610   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:50:58,842-Speed 2627.85 samples/sec   Loss 11.6593   LearningRate 0.0702   Epoch: 3   Global Step: 134620   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:51:02,734-Speed 2631.29 samples/sec   Loss 11.4863   LearningRate 0.0702   Epoch: 3   Global Step: 134630   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:51:06,629-Speed 2629.24 samples/sec   Loss 11.5836   LearningRate 0.0702   Epoch: 3   Global Step: 134640   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:51:10,525-Speed 2629.63 samples/sec   Loss 11.4042   LearningRate 0.0702   Epoch: 3   Global Step: 134650   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:51:14,461-Speed 2602.08 samples/sec   Loss 11.4465   LearningRate 0.0702   Epoch: 3   Global Step: 134660   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:51:18,359-Speed 2628.03 samples/sec   Loss 11.6025   LearningRate 0.0702   Epoch: 3   Global Step: 134670   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:51:22,254-Speed 2629.14 samples/sec   Loss 11.6955   LearningRate 0.0702   Epoch: 3   Global Step: 134680   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:51:26,138-Speed 2637.56 samples/sec   Loss 11.6315   LearningRate 0.0702   Epoch: 3   Global Step: 134690   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:51:29,991-Speed 2658.86 samples/sec   Loss 11.5960   LearningRate 0.0702   Epoch: 3   Global Step: 134700   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:51:33,882-Speed 2631.83 samples/sec   Loss 11.7744   LearningRate 0.0702   Epoch: 3   Global Step: 134710   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:51:37,776-Speed 2630.21 samples/sec   Loss 11.8374   LearningRate 0.0702   Epoch: 3   Global Step: 134720   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:51:41,671-Speed 2630.00 samples/sec   Loss 11.7264   LearningRate 0.0702   Epoch: 3   Global Step: 134730   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:51:45,582-Speed 2619.35 samples/sec   Loss 11.4783   LearningRate 0.0702   Epoch: 3   Global Step: 134740   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:51:49,499-Speed 2614.98 samples/sec   Loss 11.5012   LearningRate 0.0702   Epoch: 3   Global Step: 134750   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:51:53,390-Speed 2632.79 samples/sec   Loss 11.6680   LearningRate 0.0702   Epoch: 3   Global Step: 134760   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:51:57,285-Speed 2629.43 samples/sec   Loss 11.6351   LearningRate 0.0701   Epoch: 3   Global Step: 134770   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:52:01,182-Speed 2628.38 samples/sec   Loss 11.6531   LearningRate 0.0701   Epoch: 3   Global Step: 134780   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:52:05,076-Speed 2630.35 samples/sec   Loss 11.5570   LearningRate 0.0701   Epoch: 3   Global Step: 134790   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-13 10:52:08,966-Speed 2633.38 samples/sec   Loss 11.5497   LearningRate 0.0701   Epoch: 3   Global Step: 134800   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:12,877-Speed 2619.02 samples/sec   Loss 11.6832   LearningRate 0.0701   Epoch: 3   Global Step: 134810   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:16,774-Speed 2628.60 samples/sec   Loss 11.6206   LearningRate 0.0701   Epoch: 3   Global Step: 134820   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:20,670-Speed 2628.75 samples/sec   Loss 11.7719   LearningRate 0.0701   Epoch: 3   Global Step: 134830   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:24,564-Speed 2631.04 samples/sec   Loss 11.6579   LearningRate 0.0701   Epoch: 3   Global Step: 134840   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:28,458-Speed 2630.37 samples/sec   Loss 11.5301   LearningRate 0.0701   Epoch: 3   Global Step: 134850   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:32,361-Speed 2624.10 samples/sec   Loss 11.7231   LearningRate 0.0701   Epoch: 3   Global Step: 134860   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:36,261-Speed 2626.21 samples/sec   Loss 11.7121   LearningRate 0.0701   Epoch: 3   Global Step: 134870   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:40,156-Speed 2629.83 samples/sec   Loss 11.6189   LearningRate 0.0701   Epoch: 3   Global Step: 134880   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:44,056-Speed 2626.38 samples/sec   Loss 11.5075   LearningRate 0.0701   Epoch: 3   Global Step: 134890   Fp16 Grad Scale: 16384   Required: 78 hours
Training: 2022-04-13 10:52:47,955-Speed 2626.72 samples/sec   Loss 11.6423   LearningRate 0.0701   Epoch: 3   Global Step: 134900   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:52:51,856-Speed 2625.75 samples/sec   Loss 11.5796   LearningRate 0.0701   Epoch: 3   Global Step: 134910   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:52:55,758-Speed 2624.96 samples/sec   Loss 11.6600   LearningRate 0.0701   Epoch: 3   Global Step: 134920   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:52:59,704-Speed 2595.67 samples/sec   Loss 11.4675   LearningRate 0.0701   Epoch: 3   Global Step: 134930   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:53:03,625-Speed 2611.80 samples/sec   Loss 11.5948   LearningRate 0.0701   Epoch: 3   Global Step: 134940   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:53:07,526-Speed 2625.61 samples/sec   Loss 11.4812   LearningRate 0.0701   Epoch: 3   Global Step: 134950   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:53:11,425-Speed 2627.17 samples/sec   Loss 11.5275   LearningRate 0.0701   Epoch: 3   Global Step: 134960   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:53:15,325-Speed 2626.56 samples/sec   Loss 11.5222   LearningRate 0.0701   Epoch: 3   Global Step: 134970   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:53:19,221-Speed 2628.66 samples/sec   Loss 11.6119   LearningRate 0.0701   Epoch: 3   Global Step: 134980   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:53:23,127-Speed 2622.30 samples/sec   Loss 11.5870   LearningRate 0.0701   Epoch: 3   Global Step: 134990   Fp16 Grad Scale: 32768   Required: 78 hours
Training: 2022-04-13 10:53:27,039-Speed 2617.73 samples/sec   Loss 11.6308   LearningRate 0.0701   Epoch: 3   Global Step: 135000   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:30,949-Speed 2620.13 samples/sec   Loss 11.5785   LearningRate 0.0701   Epoch: 3   Global Step: 135010   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:34,841-Speed 2632.03 samples/sec   Loss 11.4670   LearningRate 0.0701   Epoch: 3   Global Step: 135020   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:38,731-Speed 2632.54 samples/sec   Loss 11.5507   LearningRate 0.0701   Epoch: 3   Global Step: 135030   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:42,655-Speed 2610.31 samples/sec   Loss 11.6952   LearningRate 0.0701   Epoch: 3   Global Step: 135040   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:46,554-Speed 2627.62 samples/sec   Loss 11.5926   LearningRate 0.0701   Epoch: 3   Global Step: 135050   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:50,445-Speed 2632.70 samples/sec   Loss 11.6392   LearningRate 0.0701   Epoch: 3   Global Step: 135060   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:54,360-Speed 2615.57 samples/sec   Loss 11.6038   LearningRate 0.0701   Epoch: 3   Global Step: 135070   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:53:58,260-Speed 2626.47 samples/sec   Loss 11.5251   LearningRate 0.0701   Epoch: 3   Global Step: 135080   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:54:02,156-Speed 2629.27 samples/sec   Loss 11.5236   LearningRate 0.0701   Epoch: 3   Global Step: 135090   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:54:06,062-Speed 2622.01 samples/sec   Loss 11.6134   LearningRate 0.0701   Epoch: 3   Global Step: 135100   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:09,959-Speed 2628.10 samples/sec   Loss 11.5675   LearningRate 0.0701   Epoch: 3   Global Step: 135110   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:13,855-Speed 2629.54 samples/sec   Loss 11.5390   LearningRate 0.0701   Epoch: 3   Global Step: 135120   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:17,749-Speed 2630.23 samples/sec   Loss 11.5648   LearningRate 0.0701   Epoch: 3   Global Step: 135130   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:21,641-Speed 2631.57 samples/sec   Loss 11.5660   LearningRate 0.0701   Epoch: 3   Global Step: 135140   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:25,535-Speed 2630.62 samples/sec   Loss 11.5827   LearningRate 0.0701   Epoch: 3   Global Step: 135150   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:29,444-Speed 2620.71 samples/sec   Loss 11.4991   LearningRate 0.0701   Epoch: 3   Global Step: 135160   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:33,363-Speed 2613.26 samples/sec   Loss 11.6794   LearningRate 0.0701   Epoch: 3   Global Step: 135170   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:37,280-Speed 2614.69 samples/sec   Loss 11.5801   LearningRate 0.0701   Epoch: 3   Global Step: 135180   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:41,198-Speed 2614.25 samples/sec   Loss 11.7379   LearningRate 0.0701   Epoch: 3   Global Step: 135190   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:54:45,096-Speed 2627.81 samples/sec   Loss 11.5638   LearningRate 0.0701   Epoch: 3   Global Step: 135200   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:54:49,013-Speed 2614.60 samples/sec   Loss 11.6269   LearningRate 0.0701   Epoch: 3   Global Step: 135210   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:54:52,911-Speed 2627.93 samples/sec   Loss 11.4920   LearningRate 0.0701   Epoch: 3   Global Step: 135220   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:54:56,808-Speed 2628.60 samples/sec   Loss 11.4208   LearningRate 0.0701   Epoch: 3   Global Step: 135230   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:55:00,707-Speed 2627.32 samples/sec   Loss 11.5028   LearningRate 0.0701   Epoch: 3   Global Step: 135240   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:55:04,604-Speed 2627.75 samples/sec   Loss 11.5117   LearningRate 0.0701   Epoch: 3   Global Step: 135250   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:55:08,501-Speed 2628.47 samples/sec   Loss 11.5344   LearningRate 0.0700   Epoch: 3   Global Step: 135260   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:55:12,396-Speed 2629.32 samples/sec   Loss 11.4725   LearningRate 0.0700   Epoch: 3   Global Step: 135270   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:55:16,274-Speed 2646.27 samples/sec   Loss 11.5936   LearningRate 0.0700   Epoch: 3   Global Step: 135280   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:55:20,170-Speed 2628.95 samples/sec   Loss 11.4799   LearningRate 0.0700   Epoch: 3   Global Step: 135290   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:55:24,076-Speed 2622.44 samples/sec   Loss 11.5111   LearningRate 0.0700   Epoch: 3   Global Step: 135300   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:55:27,976-Speed 2626.38 samples/sec   Loss 11.7210   LearningRate 0.0700   Epoch: 3   Global Step: 135310   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:55:31,867-Speed 2632.86 samples/sec   Loss 11.5341   LearningRate 0.0700   Epoch: 3   Global Step: 135320   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:55:35,740-Speed 2644.23 samples/sec   Loss 11.5810   LearningRate 0.0700   Epoch: 3   Global Step: 135330   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:55:39,630-Speed 2632.87 samples/sec   Loss 11.5383   LearningRate 0.0700   Epoch: 3   Global Step: 135340   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:55:43,523-Speed 2631.02 samples/sec   Loss 11.6884   LearningRate 0.0700   Epoch: 3   Global Step: 135350   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:55:47,422-Speed 2627.36 samples/sec   Loss 11.4249   LearningRate 0.0700   Epoch: 3   Global Step: 135360   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:55:51,318-Speed 2629.21 samples/sec   Loss 11.6263   LearningRate 0.0700   Epoch: 3   Global Step: 135370   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:55:55,227-Speed 2619.94 samples/sec   Loss 11.5308   LearningRate 0.0700   Epoch: 3   Global Step: 135380   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:55:59,122-Speed 2630.35 samples/sec   Loss 11.6522   LearningRate 0.0700   Epoch: 3   Global Step: 135390   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:56:03,012-Speed 2632.65 samples/sec   Loss 11.5984   LearningRate 0.0700   Epoch: 3   Global Step: 135400   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:56:06,903-Speed 2632.18 samples/sec   Loss 11.5609   LearningRate 0.0700   Epoch: 3   Global Step: 135410   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:56:10,799-Speed 2628.77 samples/sec   Loss 11.6644   LearningRate 0.0700   Epoch: 3   Global Step: 135420   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:56:14,701-Speed 2625.38 samples/sec   Loss 11.4908   LearningRate 0.0700   Epoch: 3   Global Step: 135430   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:18,604-Speed 2624.61 samples/sec   Loss 11.6816   LearningRate 0.0700   Epoch: 3   Global Step: 135440   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:22,538-Speed 2603.45 samples/sec   Loss 11.2966   LearningRate 0.0700   Epoch: 3   Global Step: 135450   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:26,446-Speed 2622.04 samples/sec   Loss 11.5621   LearningRate 0.0700   Epoch: 3   Global Step: 135460   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:30,340-Speed 2630.03 samples/sec   Loss 11.6559   LearningRate 0.0700   Epoch: 3   Global Step: 135470   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:34,236-Speed 2628.60 samples/sec   Loss 11.6468   LearningRate 0.0700   Epoch: 3   Global Step: 135480   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:38,138-Speed 2624.44 samples/sec   Loss 11.5372   LearningRate 0.0700   Epoch: 3   Global Step: 135490   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:42,081-Speed 2598.37 samples/sec   Loss 11.4963   LearningRate 0.0700   Epoch: 3   Global Step: 135500   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:45,970-Speed 2634.12 samples/sec   Loss 11.4736   LearningRate 0.0700   Epoch: 3   Global Step: 135510   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:49,878-Speed 2620.68 samples/sec   Loss 11.5390   LearningRate 0.0700   Epoch: 3   Global Step: 135520   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:56:53,749-Speed 2646.57 samples/sec   Loss 11.5734   LearningRate 0.0700   Epoch: 3   Global Step: 135530   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:56:57,682-Speed 2604.22 samples/sec   Loss 11.4712   LearningRate 0.0700   Epoch: 3   Global Step: 135540   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:01,573-Speed 2632.01 samples/sec   Loss 11.5340   LearningRate 0.0700   Epoch: 3   Global Step: 135550   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:05,471-Speed 2627.58 samples/sec   Loss 11.5158   LearningRate 0.0700   Epoch: 3   Global Step: 135560   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:09,363-Speed 2631.85 samples/sec   Loss 11.5992   LearningRate 0.0700   Epoch: 3   Global Step: 135570   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:13,261-Speed 2627.65 samples/sec   Loss 11.5302   LearningRate 0.0700   Epoch: 3   Global Step: 135580   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:17,166-Speed 2623.57 samples/sec   Loss 11.4441   LearningRate 0.0700   Epoch: 3   Global Step: 135590   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:21,061-Speed 2629.05 samples/sec   Loss 11.6616   LearningRate 0.0700   Epoch: 3   Global Step: 135600   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:24,953-Speed 2632.38 samples/sec   Loss 11.7223   LearningRate 0.0700   Epoch: 3   Global Step: 135610   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:28,846-Speed 2631.24 samples/sec   Loss 11.5864   LearningRate 0.0700   Epoch: 3   Global Step: 135620   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:57:32,747-Speed 2625.07 samples/sec   Loss 11.5286   LearningRate 0.0700   Epoch: 3   Global Step: 135630   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:57:36,641-Speed 2630.64 samples/sec   Loss 11.4317   LearningRate 0.0700   Epoch: 3   Global Step: 135640   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:57:40,539-Speed 2627.69 samples/sec   Loss 11.4492   LearningRate 0.0700   Epoch: 3   Global Step: 135650   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:57:44,430-Speed 2631.93 samples/sec   Loss 11.4834   LearningRate 0.0700   Epoch: 3   Global Step: 135660   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:57:48,323-Speed 2631.83 samples/sec   Loss 11.6194   LearningRate 0.0700   Epoch: 3   Global Step: 135670   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:57:52,218-Speed 2628.89 samples/sec   Loss 11.5839   LearningRate 0.0700   Epoch: 3   Global Step: 135680   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:57:56,124-Speed 2622.98 samples/sec   Loss 11.5447   LearningRate 0.0700   Epoch: 3   Global Step: 135690   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:00,034-Speed 2619.12 samples/sec   Loss 11.4437   LearningRate 0.0700   Epoch: 3   Global Step: 135700   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:03,938-Speed 2623.14 samples/sec   Loss 11.5566   LearningRate 0.0700   Epoch: 3   Global Step: 135710   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:07,837-Speed 2626.81 samples/sec   Loss 11.5878   LearningRate 0.0700   Epoch: 3   Global Step: 135720   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:11,731-Speed 2630.53 samples/sec   Loss 11.6635   LearningRate 0.0700   Epoch: 3   Global Step: 135730   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:58:15,613-Speed 2638.93 samples/sec   Loss 11.6345   LearningRate 0.0700   Epoch: 3   Global Step: 135740   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:19,509-Speed 2629.06 samples/sec   Loss 11.6030   LearningRate 0.0700   Epoch: 3   Global Step: 135750   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:23,399-Speed 2632.65 samples/sec   Loss 11.6567   LearningRate 0.0699   Epoch: 3   Global Step: 135760   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:27,292-Speed 2631.04 samples/sec   Loss 11.5543   LearningRate 0.0699   Epoch: 3   Global Step: 135770   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:31,190-Speed 2627.42 samples/sec   Loss 11.5095   LearningRate 0.0699   Epoch: 3   Global Step: 135780   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:35,081-Speed 2632.10 samples/sec   Loss 11.4151   LearningRate 0.0699   Epoch: 3   Global Step: 135790   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:38,974-Speed 2630.74 samples/sec   Loss 11.5531   LearningRate 0.0699   Epoch: 3   Global Step: 135800   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:42,875-Speed 2625.64 samples/sec   Loss 11.4896   LearningRate 0.0699   Epoch: 3   Global Step: 135810   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:46,779-Speed 2623.28 samples/sec   Loss 11.5587   LearningRate 0.0699   Epoch: 3   Global Step: 135820   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:50,683-Speed 2624.05 samples/sec   Loss 11.3565   LearningRate 0.0699   Epoch: 3   Global Step: 135830   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:58:54,581-Speed 2627.82 samples/sec   Loss 11.5510   LearningRate 0.0699   Epoch: 3   Global Step: 135840   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 10:58:58,461-Speed 2639.33 samples/sec   Loss 11.5008   LearningRate 0.0699   Epoch: 3   Global Step: 135850   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:59:02,356-Speed 2629.48 samples/sec   Loss 11.4029   LearningRate 0.0699   Epoch: 3   Global Step: 135860   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:59:06,253-Speed 2628.71 samples/sec   Loss 11.5612   LearningRate 0.0699   Epoch: 3   Global Step: 135870   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:59:10,154-Speed 2625.07 samples/sec   Loss 11.4969   LearningRate 0.0699   Epoch: 3   Global Step: 135880   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:59:14,037-Speed 2637.61 samples/sec   Loss 11.5893   LearningRate 0.0699   Epoch: 3   Global Step: 135890   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:17,938-Speed 2625.74 samples/sec   Loss 11.5370   LearningRate 0.0699   Epoch: 3   Global Step: 135900   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:21,834-Speed 2628.82 samples/sec   Loss 11.4928   LearningRate 0.0699   Epoch: 3   Global Step: 135910   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:25,728-Speed 2630.61 samples/sec   Loss 11.4630   LearningRate 0.0699   Epoch: 3   Global Step: 135920   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:29,624-Speed 2628.88 samples/sec   Loss 11.3223   LearningRate 0.0699   Epoch: 3   Global Step: 135930   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:33,525-Speed 2625.84 samples/sec   Loss 11.5889   LearningRate 0.0699   Epoch: 3   Global Step: 135940   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:37,419-Speed 2630.36 samples/sec   Loss 11.6363   LearningRate 0.0699   Epoch: 3   Global Step: 135950   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:41,311-Speed 2631.03 samples/sec   Loss 11.5993   LearningRate 0.0699   Epoch: 3   Global Step: 135960   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:45,206-Speed 2629.70 samples/sec   Loss 11.6287   LearningRate 0.0699   Epoch: 3   Global Step: 135970   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:49,103-Speed 2628.29 samples/sec   Loss 11.3723   LearningRate 0.0699   Epoch: 3   Global Step: 135980   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 10:59:53,000-Speed 2628.20 samples/sec   Loss 11.5798   LearningRate 0.0699   Epoch: 3   Global Step: 135990   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 10:59:56,901-Speed 2625.72 samples/sec   Loss 11.3053   LearningRate 0.0699   Epoch: 3   Global Step: 136000   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:00,796-Speed 2629.75 samples/sec   Loss 11.7188   LearningRate 0.0699   Epoch: 3   Global Step: 136010   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:04,692-Speed 2628.76 samples/sec   Loss 11.5153   LearningRate 0.0699   Epoch: 3   Global Step: 136020   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:08,591-Speed 2626.99 samples/sec   Loss 11.6386   LearningRate 0.0699   Epoch: 3   Global Step: 136030   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:12,495-Speed 2623.49 samples/sec   Loss 11.5624   LearningRate 0.0699   Epoch: 3   Global Step: 136040   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:16,394-Speed 2627.13 samples/sec   Loss 11.4880   LearningRate 0.0699   Epoch: 3   Global Step: 136050   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:20,288-Speed 2629.91 samples/sec   Loss 11.5004   LearningRate 0.0699   Epoch: 3   Global Step: 136060   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:24,188-Speed 2626.60 samples/sec   Loss 11.6447   LearningRate 0.0699   Epoch: 3   Global Step: 136070   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:28,089-Speed 2625.32 samples/sec   Loss 11.6392   LearningRate 0.0699   Epoch: 3   Global Step: 136080   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:00:31,984-Speed 2630.03 samples/sec   Loss 11.5706   LearningRate 0.0699   Epoch: 3   Global Step: 136090   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:00:35,881-Speed 2627.83 samples/sec   Loss 11.6929   LearningRate 0.0699   Epoch: 3   Global Step: 136100   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:00:39,777-Speed 2629.14 samples/sec   Loss 11.6570   LearningRate 0.0699   Epoch: 3   Global Step: 136110   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:00:43,670-Speed 2631.09 samples/sec   Loss 11.5705   LearningRate 0.0699   Epoch: 3   Global Step: 136120   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:00:47,566-Speed 2629.24 samples/sec   Loss 11.6216   LearningRate 0.0699   Epoch: 3   Global Step: 136130   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:00:51,501-Speed 2603.33 samples/sec   Loss 11.4529   LearningRate 0.0699   Epoch: 3   Global Step: 136140   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:00:55,397-Speed 2628.69 samples/sec   Loss 11.6085   LearningRate 0.0699   Epoch: 3   Global Step: 136150   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:00:59,272-Speed 2643.51 samples/sec   Loss 11.4014   LearningRate 0.0699   Epoch: 3   Global Step: 136160   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:03,164-Speed 2631.97 samples/sec   Loss 11.3992   LearningRate 0.0699   Epoch: 3   Global Step: 136170   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:07,059-Speed 2629.48 samples/sec   Loss 11.6215   LearningRate 0.0699   Epoch: 3   Global Step: 136180   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:10,954-Speed 2629.48 samples/sec   Loss 11.5997   LearningRate 0.0699   Epoch: 3   Global Step: 136190   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:14,870-Speed 2615.93 samples/sec   Loss 11.5497   LearningRate 0.0699   Epoch: 3   Global Step: 136200   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:18,768-Speed 2627.85 samples/sec   Loss 11.5974   LearningRate 0.0699   Epoch: 3   Global Step: 136210   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:22,665-Speed 2628.70 samples/sec   Loss 11.7044   LearningRate 0.0699   Epoch: 3   Global Step: 136220   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:26,596-Speed 2605.19 samples/sec   Loss 11.5245   LearningRate 0.0699   Epoch: 3   Global Step: 136230   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:30,496-Speed 2626.13 samples/sec   Loss 11.5560   LearningRate 0.0699   Epoch: 3   Global Step: 136240   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:34,393-Speed 2628.72 samples/sec   Loss 11.5659   LearningRate 0.0698   Epoch: 3   Global Step: 136250   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:01:38,289-Speed 2628.82 samples/sec   Loss 11.5584   LearningRate 0.0698   Epoch: 3   Global Step: 136260   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:01:42,184-Speed 2629.52 samples/sec   Loss 11.5701   LearningRate 0.0698   Epoch: 3   Global Step: 136270   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:01:46,080-Speed 2629.24 samples/sec   Loss 11.5391   LearningRate 0.0698   Epoch: 3   Global Step: 136280   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:01:49,979-Speed 2627.19 samples/sec   Loss 11.5964   LearningRate 0.0698   Epoch: 3   Global Step: 136290   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:01:53,877-Speed 2627.82 samples/sec   Loss 11.4918   LearningRate 0.0698   Epoch: 3   Global Step: 136300   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:01:57,773-Speed 2628.85 samples/sec   Loss 11.4593   LearningRate 0.0698   Epoch: 3   Global Step: 136310   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:02:01,669-Speed 2628.98 samples/sec   Loss 11.6281   LearningRate 0.0698   Epoch: 3   Global Step: 136320   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:02:05,569-Speed 2626.52 samples/sec   Loss 11.5384   LearningRate 0.0698   Epoch: 3   Global Step: 136330   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:02:09,457-Speed 2633.78 samples/sec   Loss 11.6494   LearningRate 0.0698   Epoch: 3   Global Step: 136340   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:13,361-Speed 2623.45 samples/sec   Loss 11.5969   LearningRate 0.0698   Epoch: 3   Global Step: 136350   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:17,262-Speed 2626.14 samples/sec   Loss 11.5706   LearningRate 0.0698   Epoch: 3   Global Step: 136360   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:21,160-Speed 2627.62 samples/sec   Loss 11.4490   LearningRate 0.0698   Epoch: 3   Global Step: 136370   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:25,058-Speed 2627.97 samples/sec   Loss 11.4407   LearningRate 0.0698   Epoch: 3   Global Step: 136380   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:28,959-Speed 2625.04 samples/sec   Loss 11.5465   LearningRate 0.0698   Epoch: 3   Global Step: 136390   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:32,855-Speed 2629.60 samples/sec   Loss 11.5955   LearningRate 0.0698   Epoch: 3   Global Step: 136400   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:36,749-Speed 2630.30 samples/sec   Loss 11.5668   LearningRate 0.0698   Epoch: 3   Global Step: 136410   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:40,643-Speed 2630.49 samples/sec   Loss 11.4540   LearningRate 0.0698   Epoch: 3   Global Step: 136420   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:44,537-Speed 2629.73 samples/sec   Loss 11.6208   LearningRate 0.0698   Epoch: 3   Global Step: 136430   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:02:48,456-Speed 2613.57 samples/sec   Loss 11.5230   LearningRate 0.0698   Epoch: 3   Global Step: 136440   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:02:52,352-Speed 2629.02 samples/sec   Loss 11.4845   LearningRate 0.0698   Epoch: 3   Global Step: 136450   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:02:56,247-Speed 2630.14 samples/sec   Loss 11.4230   LearningRate 0.0698   Epoch: 3   Global Step: 136460   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:03:00,143-Speed 2629.32 samples/sec   Loss 11.4886   LearningRate 0.0698   Epoch: 3   Global Step: 136470   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:03:04,037-Speed 2629.62 samples/sec   Loss 11.5187   LearningRate 0.0698   Epoch: 3   Global Step: 136480   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:03:07,946-Speed 2620.44 samples/sec   Loss 11.5798   LearningRate 0.0698   Epoch: 3   Global Step: 136490   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:03:11,834-Speed 2634.43 samples/sec   Loss 11.6183   LearningRate 0.0698   Epoch: 3   Global Step: 136500   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:15,741-Speed 2621.33 samples/sec   Loss 11.6505   LearningRate 0.0698   Epoch: 3   Global Step: 136510   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:19,639-Speed 2627.60 samples/sec   Loss 11.4346   LearningRate 0.0698   Epoch: 3   Global Step: 136520   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:23,530-Speed 2632.06 samples/sec   Loss 11.4490   LearningRate 0.0698   Epoch: 3   Global Step: 136530   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:27,431-Speed 2626.13 samples/sec   Loss 11.4682   LearningRate 0.0698   Epoch: 3   Global Step: 136540   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:31,334-Speed 2623.78 samples/sec   Loss 11.5408   LearningRate 0.0698   Epoch: 3   Global Step: 136550   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:35,241-Speed 2621.57 samples/sec   Loss 11.6310   LearningRate 0.0698   Epoch: 3   Global Step: 136560   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:39,142-Speed 2626.10 samples/sec   Loss 11.6072   LearningRate 0.0698   Epoch: 3   Global Step: 136570   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:43,048-Speed 2622.63 samples/sec   Loss 11.5219   LearningRate 0.0698   Epoch: 3   Global Step: 136580   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:46,940-Speed 2631.50 samples/sec   Loss 11.5831   LearningRate 0.0698   Epoch: 3   Global Step: 136590   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:03:50,837-Speed 2628.41 samples/sec   Loss 11.5559   LearningRate 0.0698   Epoch: 3   Global Step: 136600   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:03:54,734-Speed 2629.04 samples/sec   Loss 11.4516   LearningRate 0.0698   Epoch: 3   Global Step: 136610   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:03:58,611-Speed 2641.50 samples/sec   Loss 11.5520   LearningRate 0.0698   Epoch: 3   Global Step: 136620   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:02,507-Speed 2629.07 samples/sec   Loss 11.4462   LearningRate 0.0698   Epoch: 3   Global Step: 136630   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:06,403-Speed 2628.65 samples/sec   Loss 11.6671   LearningRate 0.0698   Epoch: 3   Global Step: 136640   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:10,298-Speed 2630.27 samples/sec   Loss 11.4686   LearningRate 0.0698   Epoch: 3   Global Step: 136650   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:14,191-Speed 2630.42 samples/sec   Loss 11.5738   LearningRate 0.0698   Epoch: 3   Global Step: 136660   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:18,102-Speed 2619.67 samples/sec   Loss 11.5205   LearningRate 0.0698   Epoch: 3   Global Step: 136670   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:22,033-Speed 2605.39 samples/sec   Loss 11.5965   LearningRate 0.0698   Epoch: 3   Global Step: 136680   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:25,937-Speed 2623.91 samples/sec   Loss 11.5343   LearningRate 0.0698   Epoch: 3   Global Step: 136690   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:29,838-Speed 2625.06 samples/sec   Loss 11.4087   LearningRate 0.0698   Epoch: 3   Global Step: 136700   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:33,735-Speed 2628.29 samples/sec   Loss 11.6903   LearningRate 0.0698   Epoch: 3   Global Step: 136710   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:04:37,636-Speed 2625.91 samples/sec   Loss 11.4423   LearningRate 0.0698   Epoch: 3   Global Step: 136720   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:04:41,546-Speed 2619.64 samples/sec   Loss 11.4867   LearningRate 0.0698   Epoch: 3   Global Step: 136730   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:04:45,465-Speed 2613.21 samples/sec   Loss 11.4910   LearningRate 0.0698   Epoch: 3   Global Step: 136740   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:04:49,376-Speed 2619.39 samples/sec   Loss 11.6742   LearningRate 0.0697   Epoch: 3   Global Step: 136750   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:04:53,430-Speed 2526.69 samples/sec   Loss 11.5582   LearningRate 0.0697   Epoch: 3   Global Step: 136760   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:04:57,348-Speed 2614.60 samples/sec   Loss 11.5719   LearningRate 0.0697   Epoch: 3   Global Step: 136770   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:05:01,226-Speed 2641.05 samples/sec   Loss 11.6627   LearningRate 0.0697   Epoch: 3   Global Step: 136780   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:05:05,123-Speed 2628.35 samples/sec   Loss 11.4207   LearningRate 0.0697   Epoch: 3   Global Step: 136790   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:05:09,016-Speed 2631.29 samples/sec   Loss 11.4194   LearningRate 0.0697   Epoch: 3   Global Step: 136800   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:12,918-Speed 2625.07 samples/sec   Loss 11.4676   LearningRate 0.0697   Epoch: 3   Global Step: 136810   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:16,839-Speed 2612.54 samples/sec   Loss 11.5828   LearningRate 0.0697   Epoch: 3   Global Step: 136820   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:20,732-Speed 2631.32 samples/sec   Loss 11.4887   LearningRate 0.0697   Epoch: 3   Global Step: 136830   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:24,623-Speed 2632.23 samples/sec   Loss 11.4415   LearningRate 0.0697   Epoch: 3   Global Step: 136840   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:28,523-Speed 2626.53 samples/sec   Loss 11.5950   LearningRate 0.0697   Epoch: 3   Global Step: 136850   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:32,411-Speed 2634.62 samples/sec   Loss 11.4731   LearningRate 0.0697   Epoch: 3   Global Step: 136860   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:36,312-Speed 2625.68 samples/sec   Loss 11.4553   LearningRate 0.0697   Epoch: 3   Global Step: 136870   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:40,209-Speed 2628.13 samples/sec   Loss 11.5747   LearningRate 0.0697   Epoch: 3   Global Step: 136880   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:44,113-Speed 2623.69 samples/sec   Loss 11.4742   LearningRate 0.0697   Epoch: 3   Global Step: 136890   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:05:48,018-Speed 2623.35 samples/sec   Loss 11.6188   LearningRate 0.0697   Epoch: 3   Global Step: 136900   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:05:51,917-Speed 2627.02 samples/sec   Loss 11.6199   LearningRate 0.0697   Epoch: 3   Global Step: 136910   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:05:55,818-Speed 2625.67 samples/sec   Loss 11.5780   LearningRate 0.0697   Epoch: 3   Global Step: 136920   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:05:59,710-Speed 2631.57 samples/sec   Loss 11.6326   LearningRate 0.0697   Epoch: 3   Global Step: 136930   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:03,603-Speed 2630.78 samples/sec   Loss 11.5799   LearningRate 0.0697   Epoch: 3   Global Step: 136940   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:07,497-Speed 2630.60 samples/sec   Loss 11.5812   LearningRate 0.0697   Epoch: 3   Global Step: 136950   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:11,389-Speed 2631.39 samples/sec   Loss 11.4543   LearningRate 0.0697   Epoch: 3   Global Step: 136960   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:15,292-Speed 2624.37 samples/sec   Loss 11.5845   LearningRate 0.0697   Epoch: 3   Global Step: 136970   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:19,201-Speed 2619.97 samples/sec   Loss 11.6585   LearningRate 0.0697   Epoch: 3   Global Step: 136980   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:23,101-Speed 2626.79 samples/sec   Loss 11.5073   LearningRate 0.0697   Epoch: 3   Global Step: 136990   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:27,022-Speed 2611.81 samples/sec   Loss 11.6532   LearningRate 0.0697   Epoch: 3   Global Step: 137000   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:06:30,909-Speed 2635.62 samples/sec   Loss 11.5124   LearningRate 0.0697   Epoch: 3   Global Step: 137010   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:06:34,803-Speed 2630.64 samples/sec   Loss 11.5828   LearningRate 0.0697   Epoch: 3   Global Step: 137020   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:06:38,697-Speed 2630.18 samples/sec   Loss 11.4998   LearningRate 0.0697   Epoch: 3   Global Step: 137030   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:06:42,575-Speed 2641.30 samples/sec   Loss 11.4175   LearningRate 0.0697   Epoch: 3   Global Step: 137040   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:46,468-Speed 2630.51 samples/sec   Loss 11.4476   LearningRate 0.0697   Epoch: 3   Global Step: 137050   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:50,359-Speed 2632.35 samples/sec   Loss 11.4649   LearningRate 0.0697   Epoch: 3   Global Step: 137060   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:54,257-Speed 2627.94 samples/sec   Loss 11.4745   LearningRate 0.0697   Epoch: 3   Global Step: 137070   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:06:58,155-Speed 2627.24 samples/sec   Loss 11.5695   LearningRate 0.0697   Epoch: 3   Global Step: 137080   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:02,053-Speed 2628.09 samples/sec   Loss 11.5158   LearningRate 0.0697   Epoch: 3   Global Step: 137090   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:05,957-Speed 2623.89 samples/sec   Loss 11.5173   LearningRate 0.0697   Epoch: 3   Global Step: 137100   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:09,865-Speed 2620.70 samples/sec   Loss 11.5287   LearningRate 0.0697   Epoch: 3   Global Step: 137110   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:13,768-Speed 2623.79 samples/sec   Loss 11.4965   LearningRate 0.0697   Epoch: 3   Global Step: 137120   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:17,666-Speed 2628.39 samples/sec   Loss 11.4810   LearningRate 0.0697   Epoch: 3   Global Step: 137130   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:21,574-Speed 2620.34 samples/sec   Loss 11.4991   LearningRate 0.0697   Epoch: 3   Global Step: 137140   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:07:25,468-Speed 2630.52 samples/sec   Loss 11.4694   LearningRate 0.0697   Epoch: 3   Global Step: 137150   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:07:29,372-Speed 2623.44 samples/sec   Loss 11.4497   LearningRate 0.0697   Epoch: 3   Global Step: 137160   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:07:33,282-Speed 2620.26 samples/sec   Loss 11.4464   LearningRate 0.0697   Epoch: 3   Global Step: 137170   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:07:37,188-Speed 2622.02 samples/sec   Loss 11.4037   LearningRate 0.0697   Epoch: 3   Global Step: 137180   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:07:41,089-Speed 2625.21 samples/sec   Loss 11.3076   LearningRate 0.0697   Epoch: 3   Global Step: 137190   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:07:44,995-Speed 2622.12 samples/sec   Loss 11.5710   LearningRate 0.0697   Epoch: 3   Global Step: 137200   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:07:48,938-Speed 2597.77 samples/sec   Loss 11.5136   LearningRate 0.0697   Epoch: 3   Global Step: 137210   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:52,837-Speed 2627.11 samples/sec   Loss 11.4959   LearningRate 0.0697   Epoch: 3   Global Step: 137220   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:07:56,737-Speed 2626.08 samples/sec   Loss 11.4146   LearningRate 0.0697   Epoch: 3   Global Step: 137230   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:00,642-Speed 2623.43 samples/sec   Loss 11.5058   LearningRate 0.0697   Epoch: 3   Global Step: 137240   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:04,670-Speed 2542.58 samples/sec   Loss 11.5080   LearningRate 0.0696   Epoch: 3   Global Step: 137250   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:08,579-Speed 2619.69 samples/sec   Loss 11.4937   LearningRate 0.0696   Epoch: 3   Global Step: 137260   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:12,481-Speed 2625.11 samples/sec   Loss 11.5564   LearningRate 0.0696   Epoch: 3   Global Step: 137270   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:16,385-Speed 2623.47 samples/sec   Loss 11.4429   LearningRate 0.0696   Epoch: 3   Global Step: 137280   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:20,280-Speed 2630.37 samples/sec   Loss 11.6270   LearningRate 0.0696   Epoch: 3   Global Step: 137290   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:24,196-Speed 2614.96 samples/sec   Loss 11.6202   LearningRate 0.0696   Epoch: 3   Global Step: 137300   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:08:28,092-Speed 2629.25 samples/sec   Loss 11.5218   LearningRate 0.0696   Epoch: 3   Global Step: 137310   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:31,985-Speed 2631.20 samples/sec   Loss 11.4935   LearningRate 0.0696   Epoch: 3   Global Step: 137320   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:35,885-Speed 2626.19 samples/sec   Loss 11.6729   LearningRate 0.0696   Epoch: 3   Global Step: 137330   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:39,804-Speed 2613.35 samples/sec   Loss 11.5734   LearningRate 0.0696   Epoch: 3   Global Step: 137340   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:43,695-Speed 2632.32 samples/sec   Loss 11.4111   LearningRate 0.0696   Epoch: 3   Global Step: 137350   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:47,589-Speed 2630.48 samples/sec   Loss 11.5089   LearningRate 0.0696   Epoch: 3   Global Step: 137360   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:51,520-Speed 2605.71 samples/sec   Loss 11.3941   LearningRate 0.0696   Epoch: 3   Global Step: 137370   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:55,415-Speed 2629.84 samples/sec   Loss 11.5494   LearningRate 0.0696   Epoch: 3   Global Step: 137380   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:08:59,289-Speed 2643.57 samples/sec   Loss 11.4086   LearningRate 0.0696   Epoch: 3   Global Step: 137390   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:03,185-Speed 2628.97 samples/sec   Loss 11.5060   LearningRate 0.0696   Epoch: 3   Global Step: 137400   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:07,104-Speed 2613.98 samples/sec   Loss 11.5417   LearningRate 0.0696   Epoch: 3   Global Step: 137410   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:11,118-Speed 2551.62 samples/sec   Loss 11.4050   LearningRate 0.0696   Epoch: 3   Global Step: 137420   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:15,010-Speed 2631.51 samples/sec   Loss 11.4032   LearningRate 0.0696   Epoch: 3   Global Step: 137430   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:18,914-Speed 2623.98 samples/sec   Loss 11.4911   LearningRate 0.0696   Epoch: 3   Global Step: 137440   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:22,815-Speed 2625.12 samples/sec   Loss 11.4763   LearningRate 0.0696   Epoch: 3   Global Step: 137450   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:26,709-Speed 2631.20 samples/sec   Loss 11.5065   LearningRate 0.0696   Epoch: 3   Global Step: 137460   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:30,604-Speed 2629.62 samples/sec   Loss 11.5966   LearningRate 0.0696   Epoch: 3   Global Step: 137470   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:34,516-Speed 2617.92 samples/sec   Loss 11.6085   LearningRate 0.0696   Epoch: 3   Global Step: 137480   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:09:38,396-Speed 2639.91 samples/sec   Loss 11.5430   LearningRate 0.0696   Epoch: 3   Global Step: 137490   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:09:42,288-Speed 2632.03 samples/sec   Loss 11.6471   LearningRate 0.0696   Epoch: 3   Global Step: 137500   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:09:46,180-Speed 2631.48 samples/sec   Loss 11.3988   LearningRate 0.0696   Epoch: 3   Global Step: 137510   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:09:50,084-Speed 2623.61 samples/sec   Loss 11.5404   LearningRate 0.0696   Epoch: 3   Global Step: 137520   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:09:53,977-Speed 2630.59 samples/sec   Loss 11.4839   LearningRate 0.0696   Epoch: 3   Global Step: 137530   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:09:57,871-Speed 2631.01 samples/sec   Loss 11.5059   LearningRate 0.0696   Epoch: 3   Global Step: 137540   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:01,774-Speed 2624.44 samples/sec   Loss 11.4238   LearningRate 0.0696   Epoch: 3   Global Step: 137550   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:05,676-Speed 2624.70 samples/sec   Loss 11.3554   LearningRate 0.0696   Epoch: 3   Global Step: 137560   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:09,577-Speed 2625.59 samples/sec   Loss 11.5425   LearningRate 0.0696   Epoch: 3   Global Step: 137570   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:13,562-Speed 2569.84 samples/sec   Loss 11.4663   LearningRate 0.0696   Epoch: 3   Global Step: 137580   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:17,447-Speed 2636.75 samples/sec   Loss 11.6018   LearningRate 0.0696   Epoch: 3   Global Step: 137590   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:21,342-Speed 2629.35 samples/sec   Loss 11.4889   LearningRate 0.0696   Epoch: 3   Global Step: 137600   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:25,237-Speed 2629.84 samples/sec   Loss 11.5338   LearningRate 0.0696   Epoch: 3   Global Step: 137610   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:29,135-Speed 2627.92 samples/sec   Loss 11.5610   LearningRate 0.0696   Epoch: 3   Global Step: 137620   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:33,029-Speed 2630.25 samples/sec   Loss 11.5077   LearningRate 0.0696   Epoch: 3   Global Step: 137630   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:36,921-Speed 2631.26 samples/sec   Loss 11.4088   LearningRate 0.0696   Epoch: 3   Global Step: 137640   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:40,852-Speed 2605.86 samples/sec   Loss 11.5476   LearningRate 0.0696   Epoch: 3   Global Step: 137650   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:10:44,735-Speed 2637.57 samples/sec   Loss 11.4011   LearningRate 0.0696   Epoch: 3   Global Step: 137660   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:10:48,620-Speed 2636.74 samples/sec   Loss 11.4342   LearningRate 0.0696   Epoch: 3   Global Step: 137670   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:10:52,509-Speed 2633.28 samples/sec   Loss 11.3969   LearningRate 0.0696   Epoch: 3   Global Step: 137680   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:10:56,407-Speed 2628.17 samples/sec   Loss 11.5616   LearningRate 0.0696   Epoch: 3   Global Step: 137690   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:11:00,301-Speed 2629.67 samples/sec   Loss 11.3364   LearningRate 0.0696   Epoch: 3   Global Step: 137700   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:11:04,195-Speed 2630.46 samples/sec   Loss 11.3219   LearningRate 0.0696   Epoch: 3   Global Step: 137710   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:11:08,087-Speed 2632.10 samples/sec   Loss 11.5017   LearningRate 0.0696   Epoch: 3   Global Step: 137720   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:11:11,987-Speed 2625.99 samples/sec   Loss 11.4787   LearningRate 0.0696   Epoch: 3   Global Step: 137730   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:11:15,883-Speed 2629.35 samples/sec   Loss 11.5068   LearningRate 0.0695   Epoch: 3   Global Step: 137740   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:11:19,793-Speed 2619.84 samples/sec   Loss 11.5222   LearningRate 0.0695   Epoch: 3   Global Step: 137750   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:11:23,703-Speed 2619.28 samples/sec   Loss 11.3695   LearningRate 0.0695   Epoch: 3   Global Step: 137760   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:27,632-Speed 2607.16 samples/sec   Loss 11.4330   LearningRate 0.0695   Epoch: 3   Global Step: 137770   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:31,526-Speed 2630.19 samples/sec   Loss 11.4253   LearningRate 0.0695   Epoch: 3   Global Step: 137780   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:35,425-Speed 2627.00 samples/sec   Loss 11.5624   LearningRate 0.0695   Epoch: 3   Global Step: 137790   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:39,325-Speed 2626.57 samples/sec   Loss 11.5141   LearningRate 0.0695   Epoch: 3   Global Step: 137800   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:43,224-Speed 2626.80 samples/sec   Loss 11.4720   LearningRate 0.0695   Epoch: 3   Global Step: 137810   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:47,129-Speed 2623.00 samples/sec   Loss 11.4816   LearningRate 0.0695   Epoch: 3   Global Step: 137820   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:51,026-Speed 2628.35 samples/sec   Loss 11.5032   LearningRate 0.0695   Epoch: 3   Global Step: 137830   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:54,925-Speed 2627.43 samples/sec   Loss 11.5485   LearningRate 0.0695   Epoch: 3   Global Step: 137840   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:11:58,823-Speed 2627.20 samples/sec   Loss 11.5931   LearningRate 0.0695   Epoch: 3   Global Step: 137850   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:12:02,725-Speed 2624.95 samples/sec   Loss 11.5961   LearningRate 0.0695   Epoch: 3   Global Step: 137860   Fp16 Grad Scale: 524288   Required: 78 hours
Training: 2022-04-13 11:12:06,608-Speed 2637.52 samples/sec   Loss 11.5071   LearningRate 0.0695   Epoch: 3   Global Step: 137870   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:12:10,515-Speed 2622.07 samples/sec   Loss 11.5325   LearningRate 0.0695   Epoch: 3   Global Step: 137880   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:12:14,411-Speed 2628.94 samples/sec   Loss 11.5771   LearningRate 0.0695   Epoch: 3   Global Step: 137890   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:12:18,313-Speed 2625.50 samples/sec   Loss 11.5272   LearningRate 0.0695   Epoch: 3   Global Step: 137900   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:22,198-Speed 2636.33 samples/sec   Loss 11.6404   LearningRate 0.0695   Epoch: 3   Global Step: 137910   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:26,108-Speed 2620.09 samples/sec   Loss 11.5025   LearningRate 0.0695   Epoch: 3   Global Step: 137920   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:30,001-Speed 2630.57 samples/sec   Loss 11.4281   LearningRate 0.0695   Epoch: 3   Global Step: 137930   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:33,905-Speed 2623.35 samples/sec   Loss 11.4946   LearningRate 0.0695   Epoch: 3   Global Step: 137940   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:37,812-Speed 2621.61 samples/sec   Loss 11.4084   LearningRate 0.0695   Epoch: 3   Global Step: 137950   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:41,707-Speed 2630.13 samples/sec   Loss 11.5458   LearningRate 0.0695   Epoch: 3   Global Step: 137960   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:45,602-Speed 2629.87 samples/sec   Loss 11.5193   LearningRate 0.0695   Epoch: 3   Global Step: 137970   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:49,517-Speed 2616.02 samples/sec   Loss 11.3555   LearningRate 0.0695   Epoch: 3   Global Step: 137980   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:53,418-Speed 2625.56 samples/sec   Loss 11.4343   LearningRate 0.0695   Epoch: 3   Global Step: 137990   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:12:57,321-Speed 2624.61 samples/sec   Loss 11.4299   LearningRate 0.0695   Epoch: 3   Global Step: 138000   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:13:01,219-Speed 2627.71 samples/sec   Loss 11.5280   LearningRate 0.0695   Epoch: 3   Global Step: 138010   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:05,127-Speed 2621.03 samples/sec   Loss 11.4687   LearningRate 0.0695   Epoch: 3   Global Step: 138020   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:09,025-Speed 2627.68 samples/sec   Loss 11.6047   LearningRate 0.0695   Epoch: 3   Global Step: 138030   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:12,921-Speed 2628.98 samples/sec   Loss 11.5437   LearningRate 0.0695   Epoch: 3   Global Step: 138040   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:16,815-Speed 2630.52 samples/sec   Loss 11.4681   LearningRate 0.0695   Epoch: 3   Global Step: 138050   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:20,715-Speed 2626.74 samples/sec   Loss 11.5167   LearningRate 0.0695   Epoch: 3   Global Step: 138060   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:24,610-Speed 2629.28 samples/sec   Loss 11.4287   LearningRate 0.0695   Epoch: 3   Global Step: 138070   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:28,506-Speed 2629.54 samples/sec   Loss 11.6876   LearningRate 0.0695   Epoch: 3   Global Step: 138080   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:32,405-Speed 2626.56 samples/sec   Loss 11.4170   LearningRate 0.0695   Epoch: 3   Global Step: 138090   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:36,302-Speed 2628.73 samples/sec   Loss 11.4325   LearningRate 0.0695   Epoch: 3   Global Step: 138100   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:13:40,199-Speed 2628.16 samples/sec   Loss 11.4635   LearningRate 0.0695   Epoch: 3   Global Step: 138110   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:13:44,101-Speed 2625.08 samples/sec   Loss 11.4811   LearningRate 0.0695   Epoch: 3   Global Step: 138120   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:13:47,997-Speed 2629.03 samples/sec   Loss 11.4078   LearningRate 0.0695   Epoch: 3   Global Step: 138130   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:13:51,896-Speed 2627.02 samples/sec   Loss 11.5274   LearningRate 0.0695   Epoch: 3   Global Step: 138140   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:13:55,822-Speed 2608.90 samples/sec   Loss 11.4921   LearningRate 0.0695   Epoch: 3   Global Step: 138150   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:13:59,720-Speed 2628.26 samples/sec   Loss 11.5089   LearningRate 0.0695   Epoch: 3   Global Step: 138160   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:14:03,627-Speed 2621.04 samples/sec   Loss 11.5071   LearningRate 0.0695   Epoch: 3   Global Step: 138170   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:14:07,522-Speed 2629.92 samples/sec   Loss 11.3047   LearningRate 0.0695   Epoch: 3   Global Step: 138180   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:14:11,417-Speed 2629.35 samples/sec   Loss 11.5076   LearningRate 0.0695   Epoch: 3   Global Step: 138190   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:14:15,315-Speed 2627.84 samples/sec   Loss 11.5063   LearningRate 0.0695   Epoch: 3   Global Step: 138200   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:14:19,209-Speed 2631.08 samples/sec   Loss 11.5223   LearningRate 0.0695   Epoch: 3   Global Step: 138210   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:14:23,109-Speed 2626.78 samples/sec   Loss 11.4328   LearningRate 0.0695   Epoch: 3   Global Step: 138220   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:14:26,998-Speed 2633.05 samples/sec   Loss 11.4730   LearningRate 0.0695   Epoch: 3   Global Step: 138230   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:30,921-Speed 2610.74 samples/sec   Loss 11.3737   LearningRate 0.0694   Epoch: 3   Global Step: 138240   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:34,813-Speed 2631.79 samples/sec   Loss 11.4526   LearningRate 0.0694   Epoch: 3   Global Step: 138250   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:38,711-Speed 2628.41 samples/sec   Loss 11.4552   LearningRate 0.0694   Epoch: 3   Global Step: 138260   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:42,604-Speed 2630.79 samples/sec   Loss 11.4683   LearningRate 0.0694   Epoch: 3   Global Step: 138270   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:46,503-Speed 2627.18 samples/sec   Loss 11.5682   LearningRate 0.0694   Epoch: 3   Global Step: 138280   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:50,394-Speed 2632.23 samples/sec   Loss 11.3903   LearningRate 0.0694   Epoch: 3   Global Step: 138290   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:54,290-Speed 2628.82 samples/sec   Loss 11.4553   LearningRate 0.0694   Epoch: 3   Global Step: 138300   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:14:58,183-Speed 2631.59 samples/sec   Loss 11.4477   LearningRate 0.0694   Epoch: 3   Global Step: 138310   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:15:02,081-Speed 2627.44 samples/sec   Loss 11.6249   LearningRate 0.0694   Epoch: 3   Global Step: 138320   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:15:05,981-Speed 2625.80 samples/sec   Loss 11.5226   LearningRate 0.0694   Epoch: 3   Global Step: 138330   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:09,880-Speed 2626.79 samples/sec   Loss 11.5584   LearningRate 0.0694   Epoch: 3   Global Step: 138340   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:13,780-Speed 2626.99 samples/sec   Loss 11.4621   LearningRate 0.0694   Epoch: 3   Global Step: 138350   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:17,682-Speed 2625.09 samples/sec   Loss 11.4659   LearningRate 0.0694   Epoch: 3   Global Step: 138360   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:21,582-Speed 2626.18 samples/sec   Loss 11.4728   LearningRate 0.0694   Epoch: 3   Global Step: 138370   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:25,477-Speed 2629.61 samples/sec   Loss 11.4289   LearningRate 0.0694   Epoch: 3   Global Step: 138380   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:29,377-Speed 2626.49 samples/sec   Loss 11.4176   LearningRate 0.0694   Epoch: 3   Global Step: 138390   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:33,275-Speed 2627.22 samples/sec   Loss 11.3821   LearningRate 0.0694   Epoch: 3   Global Step: 138400   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:37,170-Speed 2629.87 samples/sec   Loss 11.3653   LearningRate 0.0694   Epoch: 3   Global Step: 138410   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:41,076-Speed 2622.44 samples/sec   Loss 11.6983   LearningRate 0.0694   Epoch: 3   Global Step: 138420   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:44,970-Speed 2629.94 samples/sec   Loss 11.6812   LearningRate 0.0694   Epoch: 3   Global Step: 138430   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:15:48,862-Speed 2631.73 samples/sec   Loss 11.4805   LearningRate 0.0694   Epoch: 3   Global Step: 138440   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:52,766-Speed 2623.66 samples/sec   Loss 11.6304   LearningRate 0.0694   Epoch: 3   Global Step: 138450   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:15:56,660-Speed 2630.44 samples/sec   Loss 11.5149   LearningRate 0.0694   Epoch: 3   Global Step: 138460   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:16:00,561-Speed 2625.86 samples/sec   Loss 11.5372   LearningRate 0.0694   Epoch: 3   Global Step: 138470   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:16:04,459-Speed 2627.45 samples/sec   Loss 11.4286   LearningRate 0.0694   Epoch: 3   Global Step: 138480   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:16:08,365-Speed 2621.87 samples/sec   Loss 11.4786   LearningRate 0.0694   Epoch: 3   Global Step: 138490   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:16:12,261-Speed 2629.19 samples/sec   Loss 11.6147   LearningRate 0.0694   Epoch: 3   Global Step: 138500   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:16:16,143-Speed 2638.44 samples/sec   Loss 11.5786   LearningRate 0.0694   Epoch: 3   Global Step: 138510   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:20,040-Speed 2628.44 samples/sec   Loss 11.4549   LearningRate 0.0694   Epoch: 3   Global Step: 138520   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:23,944-Speed 2623.22 samples/sec   Loss 11.5639   LearningRate 0.0694   Epoch: 3   Global Step: 138530   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:27,841-Speed 2629.18 samples/sec   Loss 11.4950   LearningRate 0.0694   Epoch: 3   Global Step: 138540   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:31,736-Speed 2629.15 samples/sec   Loss 11.4718   LearningRate 0.0694   Epoch: 3   Global Step: 138550   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:35,629-Speed 2631.31 samples/sec   Loss 11.4254   LearningRate 0.0694   Epoch: 3   Global Step: 138560   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:39,525-Speed 2628.46 samples/sec   Loss 11.6618   LearningRate 0.0694   Epoch: 3   Global Step: 138570   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:43,424-Speed 2627.10 samples/sec   Loss 11.4042   LearningRate 0.0694   Epoch: 3   Global Step: 138580   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:47,320-Speed 2628.51 samples/sec   Loss 11.2088   LearningRate 0.0694   Epoch: 3   Global Step: 138590   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:51,218-Speed 2628.16 samples/sec   Loss 11.4535   LearningRate 0.0694   Epoch: 3   Global Step: 138600   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:16:55,120-Speed 2625.10 samples/sec   Loss 11.4674   LearningRate 0.0694   Epoch: 3   Global Step: 138610   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:16:59,019-Speed 2627.56 samples/sec   Loss 11.4633   LearningRate 0.0694   Epoch: 3   Global Step: 138620   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:02,915-Speed 2628.65 samples/sec   Loss 11.5419   LearningRate 0.0694   Epoch: 3   Global Step: 138630   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:06,810-Speed 2629.45 samples/sec   Loss 11.5184   LearningRate 0.0694   Epoch: 3   Global Step: 138640   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:10,725-Speed 2615.99 samples/sec   Loss 11.6214   LearningRate 0.0694   Epoch: 3   Global Step: 138650   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:14,615-Speed 2633.33 samples/sec   Loss 11.4813   LearningRate 0.0694   Epoch: 3   Global Step: 138660   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:18,515-Speed 2626.76 samples/sec   Loss 11.5295   LearningRate 0.0694   Epoch: 3   Global Step: 138670   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:22,412-Speed 2627.96 samples/sec   Loss 11.5294   LearningRate 0.0694   Epoch: 3   Global Step: 138680   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:26,312-Speed 2626.50 samples/sec   Loss 11.3569   LearningRate 0.0694   Epoch: 3   Global Step: 138690   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:30,206-Speed 2630.80 samples/sec   Loss 11.5122   LearningRate 0.0694   Epoch: 3   Global Step: 138700   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:34,213-Speed 2556.11 samples/sec   Loss 11.6187   LearningRate 0.0694   Epoch: 3   Global Step: 138710   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:17:38,109-Speed 2628.76 samples/sec   Loss 11.4552   LearningRate 0.0694   Epoch: 3   Global Step: 138720   Fp16 Grad Scale: 262144   Required: 78 hours
Training: 2022-04-13 11:17:41,987-Speed 2641.11 samples/sec   Loss 11.5110   LearningRate 0.0694   Epoch: 3   Global Step: 138730   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:45,897-Speed 2620.12 samples/sec   Loss 11.3615   LearningRate 0.0693   Epoch: 3   Global Step: 138740   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:17:49,961-Speed 2520.17 samples/sec   Loss 11.4666   LearningRate 0.0693   Epoch: 3   Global Step: 138750   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:17:53,841-Speed 2639.44 samples/sec   Loss 11.4790   LearningRate 0.0693   Epoch: 3   Global Step: 138760   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:17:57,830-Speed 2568.29 samples/sec   Loss 11.4805   LearningRate 0.0693   Epoch: 3   Global Step: 138770   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:01,865-Speed 2538.84 samples/sec   Loss 11.4170   LearningRate 0.0693   Epoch: 3   Global Step: 138780   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:05,758-Speed 2630.99 samples/sec   Loss 11.4590   LearningRate 0.0693   Epoch: 3   Global Step: 138790   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:09,656-Speed 2627.11 samples/sec   Loss 11.4936   LearningRate 0.0693   Epoch: 3   Global Step: 138800   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:13,555-Speed 2627.68 samples/sec   Loss 11.4167   LearningRate 0.0693   Epoch: 3   Global Step: 138810   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:17,450-Speed 2629.44 samples/sec   Loss 11.5533   LearningRate 0.0693   Epoch: 3   Global Step: 138820   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:21,370-Speed 2613.12 samples/sec   Loss 11.4843   LearningRate 0.0693   Epoch: 3   Global Step: 138830   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:25,262-Speed 2631.89 samples/sec   Loss 11.7894   LearningRate 0.0693   Epoch: 3   Global Step: 138840   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:29,172-Speed 2619.63 samples/sec   Loss 11.8044   LearningRate 0.0693   Epoch: 3   Global Step: 138850   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:18:33,077-Speed 2623.21 samples/sec   Loss 11.5338   LearningRate 0.0693   Epoch: 3   Global Step: 138860   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:18:36,975-Speed 2626.88 samples/sec   Loss 11.6367   LearningRate 0.0693   Epoch: 3   Global Step: 138870   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:18:40,885-Speed 2619.53 samples/sec   Loss 11.5650   LearningRate 0.0693   Epoch: 3   Global Step: 138880   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:18:44,790-Speed 2622.60 samples/sec   Loss 11.5028   LearningRate 0.0693   Epoch: 3   Global Step: 138890   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:18:48,709-Speed 2614.28 samples/sec   Loss 11.6762   LearningRate 0.0693   Epoch: 3   Global Step: 138900   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:18:52,601-Speed 2631.73 samples/sec   Loss 11.6278   LearningRate 0.0693   Epoch: 3   Global Step: 138910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:18:56,529-Speed 2607.65 samples/sec   Loss 11.7421   LearningRate 0.0693   Epoch: 3   Global Step: 138920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:00,423-Speed 2630.34 samples/sec   Loss 11.5870   LearningRate 0.0693   Epoch: 3   Global Step: 138930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:04,319-Speed 2629.03 samples/sec   Loss 11.5055   LearningRate 0.0693   Epoch: 3   Global Step: 138940   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:08,218-Speed 2626.81 samples/sec   Loss 11.5478   LearningRate 0.0693   Epoch: 3   Global Step: 138950   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:12,108-Speed 2632.83 samples/sec   Loss 11.6567   LearningRate 0.0693   Epoch: 3   Global Step: 138960   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:16,017-Speed 2620.76 samples/sec   Loss 11.5085   LearningRate 0.0693   Epoch: 3   Global Step: 138970   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:19,914-Speed 2627.85 samples/sec   Loss 11.5337   LearningRate 0.0693   Epoch: 3   Global Step: 138980   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:23,814-Speed 2626.91 samples/sec   Loss 11.5262   LearningRate 0.0693   Epoch: 3   Global Step: 138990   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:27,720-Speed 2621.98 samples/sec   Loss 11.5880   LearningRate 0.0693   Epoch: 3   Global Step: 139000   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:31,628-Speed 2621.22 samples/sec   Loss 11.3177   LearningRate 0.0693   Epoch: 3   Global Step: 139010   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:35,532-Speed 2623.62 samples/sec   Loss 11.4083   LearningRate 0.0693   Epoch: 3   Global Step: 139020   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:39,445-Speed 2617.17 samples/sec   Loss 11.6097   LearningRate 0.0693   Epoch: 3   Global Step: 139030   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:43,352-Speed 2621.76 samples/sec   Loss 11.3927   LearningRate 0.0693   Epoch: 3   Global Step: 139040   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:47,256-Speed 2623.90 samples/sec   Loss 11.6832   LearningRate 0.0693   Epoch: 3   Global Step: 139050   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:51,159-Speed 2624.32 samples/sec   Loss 11.5999   LearningRate 0.0693   Epoch: 3   Global Step: 139060   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:19:55,049-Speed 2633.17 samples/sec   Loss 11.4915   LearningRate 0.0693   Epoch: 3   Global Step: 139070   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:19:58,943-Speed 2630.15 samples/sec   Loss 11.5676   LearningRate 0.0693   Epoch: 3   Global Step: 139080   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:20:02,848-Speed 2623.83 samples/sec   Loss 11.5259   LearningRate 0.0693   Epoch: 3   Global Step: 139090   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:20:06,725-Speed 2641.14 samples/sec   Loss 11.6779   LearningRate 0.0693   Epoch: 3   Global Step: 139100   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:10,619-Speed 2630.57 samples/sec   Loss 11.5151   LearningRate 0.0693   Epoch: 3   Global Step: 139110   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:14,522-Speed 2623.67 samples/sec   Loss 11.5945   LearningRate 0.0693   Epoch: 3   Global Step: 139120   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:18,419-Speed 2628.90 samples/sec   Loss 11.5805   LearningRate 0.0693   Epoch: 3   Global Step: 139130   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:22,315-Speed 2629.02 samples/sec   Loss 11.5616   LearningRate 0.0693   Epoch: 3   Global Step: 139140   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:26,214-Speed 2626.90 samples/sec   Loss 11.3740   LearningRate 0.0693   Epoch: 3   Global Step: 139150   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:30,185-Speed 2578.89 samples/sec   Loss 11.5706   LearningRate 0.0693   Epoch: 3   Global Step: 139160   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:34,087-Speed 2624.85 samples/sec   Loss 11.5281   LearningRate 0.0693   Epoch: 3   Global Step: 139170   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:37,990-Speed 2624.95 samples/sec   Loss 11.5730   LearningRate 0.0693   Epoch: 3   Global Step: 139180   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:41,885-Speed 2629.32 samples/sec   Loss 11.4237   LearningRate 0.0693   Epoch: 3   Global Step: 139190   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:20:45,786-Speed 2625.90 samples/sec   Loss 11.4499   LearningRate 0.0693   Epoch: 3   Global Step: 139200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:20:49,684-Speed 2628.05 samples/sec   Loss 11.4868   LearningRate 0.0693   Epoch: 3   Global Step: 139210   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:20:53,599-Speed 2615.98 samples/sec   Loss 11.2867   LearningRate 0.0693   Epoch: 3   Global Step: 139220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:20:57,508-Speed 2620.83 samples/sec   Loss 11.3984   LearningRate 0.0693   Epoch: 3   Global Step: 139230   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:01,403-Speed 2629.49 samples/sec   Loss 11.4265   LearningRate 0.0692   Epoch: 3   Global Step: 139240   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:05,298-Speed 2629.69 samples/sec   Loss 11.4359   LearningRate 0.0692   Epoch: 3   Global Step: 139250   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:09,194-Speed 2628.85 samples/sec   Loss 11.4189   LearningRate 0.0692   Epoch: 3   Global Step: 139260   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:13,089-Speed 2629.66 samples/sec   Loss 11.5822   LearningRate 0.0692   Epoch: 3   Global Step: 139270   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:16,985-Speed 2629.29 samples/sec   Loss 11.3172   LearningRate 0.0692   Epoch: 3   Global Step: 139280   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:20,896-Speed 2618.86 samples/sec   Loss 11.6055   LearningRate 0.0692   Epoch: 3   Global Step: 139290   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:24,790-Speed 2630.37 samples/sec   Loss 11.3424   LearningRate 0.0692   Epoch: 3   Global Step: 139300   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:21:28,667-Speed 2641.86 samples/sec   Loss 11.4830   LearningRate 0.0692   Epoch: 3   Global Step: 139310   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:21:32,542-Speed 2643.35 samples/sec   Loss 11.4411   LearningRate 0.0692   Epoch: 3   Global Step: 139320   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:21:36,443-Speed 2625.22 samples/sec   Loss 11.6338   LearningRate 0.0692   Epoch: 3   Global Step: 139330   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:21:40,345-Speed 2624.83 samples/sec   Loss 11.6626   LearningRate 0.0692   Epoch: 3   Global Step: 139340   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:21:44,258-Speed 2619.48 samples/sec   Loss 11.5856   LearningRate 0.0692   Epoch: 3   Global Step: 139350   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:21:48,155-Speed 2628.65 samples/sec   Loss 11.5247   LearningRate 0.0692   Epoch: 3   Global Step: 139360   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:21:52,064-Speed 2619.57 samples/sec   Loss 11.5669   LearningRate 0.0692   Epoch: 3   Global Step: 139370   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:21:55,977-Speed 2618.37 samples/sec   Loss 11.5507   LearningRate 0.0692   Epoch: 3   Global Step: 139380   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:21:59,873-Speed 2628.79 samples/sec   Loss 11.4529   LearningRate 0.0692   Epoch: 3   Global Step: 139390   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:03,768-Speed 2629.42 samples/sec   Loss 11.5102   LearningRate 0.0692   Epoch: 3   Global Step: 139400   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:07,664-Speed 2629.44 samples/sec   Loss 11.5368   LearningRate 0.0692   Epoch: 3   Global Step: 139410   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:11,585-Speed 2612.58 samples/sec   Loss 11.3993   LearningRate 0.0692   Epoch: 3   Global Step: 139420   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:22:15,484-Speed 2627.12 samples/sec   Loss 11.6713   LearningRate 0.0692   Epoch: 3   Global Step: 139430   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:22:19,634-Speed 2468.35 samples/sec   Loss 11.4380   LearningRate 0.0692   Epoch: 3   Global Step: 139440   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:22:23,539-Speed 2622.66 samples/sec   Loss 11.4658   LearningRate 0.0692   Epoch: 3   Global Step: 139450   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:22:27,422-Speed 2639.03 samples/sec   Loss 11.5489   LearningRate 0.0692   Epoch: 3   Global Step: 139460   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:31,374-Speed 2591.75 samples/sec   Loss 11.5234   LearningRate 0.0692   Epoch: 3   Global Step: 139470   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:35,277-Speed 2623.78 samples/sec   Loss 11.5054   LearningRate 0.0692   Epoch: 3   Global Step: 139480   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:39,171-Speed 2630.75 samples/sec   Loss 11.6054   LearningRate 0.0692   Epoch: 3   Global Step: 139490   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:43,066-Speed 2629.66 samples/sec   Loss 11.4039   LearningRate 0.0692   Epoch: 3   Global Step: 139500   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:46,958-Speed 2631.37 samples/sec   Loss 11.6920   LearningRate 0.0692   Epoch: 3   Global Step: 139510   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:50,855-Speed 2629.08 samples/sec   Loss 11.4551   LearningRate 0.0692   Epoch: 3   Global Step: 139520   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:54,749-Speed 2630.22 samples/sec   Loss 11.4349   LearningRate 0.0692   Epoch: 3   Global Step: 139530   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:22:58,646-Speed 2628.31 samples/sec   Loss 11.2923   LearningRate 0.0692   Epoch: 3   Global Step: 139540   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:02,545-Speed 2627.11 samples/sec   Loss 11.4934   LearningRate 0.0692   Epoch: 3   Global Step: 139550   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:06,440-Speed 2630.00 samples/sec   Loss 11.4976   LearningRate 0.0692   Epoch: 3   Global Step: 139560   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:23:10,334-Speed 2630.19 samples/sec   Loss 11.4292   LearningRate 0.0692   Epoch: 3   Global Step: 139570   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:23:14,230-Speed 2629.02 samples/sec   Loss 11.3972   LearningRate 0.0692   Epoch: 3   Global Step: 139580   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:23:18,128-Speed 2627.69 samples/sec   Loss 11.4578   LearningRate 0.0692   Epoch: 3   Global Step: 139590   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:23:22,061-Speed 2604.43 samples/sec   Loss 11.5704   LearningRate 0.0692   Epoch: 3   Global Step: 139600   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:26,066-Speed 2557.48 samples/sec   Loss 11.3798   LearningRate 0.0692   Epoch: 3   Global Step: 139610   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:29,960-Speed 2630.80 samples/sec   Loss 11.6420   LearningRate 0.0692   Epoch: 3   Global Step: 139620   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:33,851-Speed 2631.96 samples/sec   Loss 11.6040   LearningRate 0.0692   Epoch: 3   Global Step: 139630   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:37,745-Speed 2630.67 samples/sec   Loss 11.4649   LearningRate 0.0692   Epoch: 3   Global Step: 139640   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:41,649-Speed 2623.33 samples/sec   Loss 11.5198   LearningRate 0.0692   Epoch: 3   Global Step: 139650   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:45,577-Speed 2607.58 samples/sec   Loss 11.3524   LearningRate 0.0692   Epoch: 3   Global Step: 139660   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:49,490-Speed 2617.71 samples/sec   Loss 11.5823   LearningRate 0.0692   Epoch: 3   Global Step: 139670   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:53,385-Speed 2629.94 samples/sec   Loss 11.5077   LearningRate 0.0692   Epoch: 3   Global Step: 139680   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:23:57,314-Speed 2607.12 samples/sec   Loss 11.4814   LearningRate 0.0692   Epoch: 3   Global Step: 139690   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:01,211-Speed 2628.76 samples/sec   Loss 11.5425   LearningRate 0.0692   Epoch: 3   Global Step: 139700   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:24:05,119-Speed 2620.39 samples/sec   Loss 11.5945   LearningRate 0.0692   Epoch: 3   Global Step: 139710   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:24:09,025-Speed 2622.52 samples/sec   Loss 11.4116   LearningRate 0.0692   Epoch: 3   Global Step: 139720   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:24:12,925-Speed 2626.56 samples/sec   Loss 11.5413   LearningRate 0.0692   Epoch: 3   Global Step: 139730   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:24:16,828-Speed 2624.60 samples/sec   Loss 11.4098   LearningRate 0.0691   Epoch: 3   Global Step: 139740   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:24:20,730-Speed 2624.17 samples/sec   Loss 11.3048   LearningRate 0.0691   Epoch: 3   Global Step: 139750   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:24:24,628-Speed 2628.37 samples/sec   Loss 11.5183   LearningRate 0.0691   Epoch: 3   Global Step: 139760   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:24:28,514-Speed 2635.46 samples/sec   Loss 11.5297   LearningRate 0.0691   Epoch: 3   Global Step: 139770   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:32,411-Speed 2628.82 samples/sec   Loss 11.5358   LearningRate 0.0691   Epoch: 3   Global Step: 139780   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:36,313-Speed 2624.64 samples/sec   Loss 11.4005   LearningRate 0.0691   Epoch: 3   Global Step: 139790   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:40,216-Speed 2624.03 samples/sec   Loss 11.4532   LearningRate 0.0691   Epoch: 3   Global Step: 139800   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:44,242-Speed 2543.71 samples/sec   Loss 11.4930   LearningRate 0.0691   Epoch: 3   Global Step: 139810   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:48,199-Speed 2589.18 samples/sec   Loss 11.5384   LearningRate 0.0691   Epoch: 3   Global Step: 139820   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:52,099-Speed 2626.57 samples/sec   Loss 11.4486   LearningRate 0.0691   Epoch: 3   Global Step: 139830   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:56,007-Speed 2620.82 samples/sec   Loss 11.5250   LearningRate 0.0691   Epoch: 3   Global Step: 139840   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:24:59,894-Speed 2635.54 samples/sec   Loss 11.3960   LearningRate 0.0691   Epoch: 3   Global Step: 139850   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:25:03,787-Speed 2631.50 samples/sec   Loss 11.5522   LearningRate 0.0691   Epoch: 3   Global Step: 139860   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:25:07,680-Speed 2630.50 samples/sec   Loss 11.3853   LearningRate 0.0691   Epoch: 3   Global Step: 139870   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:11,574-Speed 2630.11 samples/sec   Loss 11.4223   LearningRate 0.0691   Epoch: 3   Global Step: 139880   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:15,473-Speed 2626.67 samples/sec   Loss 11.4669   LearningRate 0.0691   Epoch: 3   Global Step: 139890   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:19,370-Speed 2628.75 samples/sec   Loss 11.4567   LearningRate 0.0691   Epoch: 3   Global Step: 139900   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:23,272-Speed 2625.16 samples/sec   Loss 11.4065   LearningRate 0.0691   Epoch: 3   Global Step: 139910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:27,173-Speed 2625.65 samples/sec   Loss 11.3955   LearningRate 0.0691   Epoch: 3   Global Step: 139920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:31,072-Speed 2627.22 samples/sec   Loss 11.5157   LearningRate 0.0691   Epoch: 3   Global Step: 139930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:35,087-Speed 2551.26 samples/sec   Loss 11.4261   LearningRate 0.0691   Epoch: 3   Global Step: 139940   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:39,052-Speed 2582.59 samples/sec   Loss 11.5169   LearningRate 0.0691   Epoch: 3   Global Step: 139950   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:42,953-Speed 2625.67 samples/sec   Loss 11.3971   LearningRate 0.0691   Epoch: 3   Global Step: 139960   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:46,832-Speed 2640.26 samples/sec   Loss 11.4252   LearningRate 0.0691   Epoch: 3   Global Step: 139970   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:50,730-Speed 2627.41 samples/sec   Loss 11.3244   LearningRate 0.0691   Epoch: 3   Global Step: 139980   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:54,638-Speed 2621.29 samples/sec   Loss 11.3840   LearningRate 0.0691   Epoch: 3   Global Step: 139990   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:25:58,542-Speed 2623.70 samples/sec   Loss 11.4631   LearningRate 0.0691   Epoch: 3   Global Step: 140000   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:26:41,925-[lfw][140000]XNorm: 23.652834
Training: 2022-04-13 11:26:41,926-[lfw][140000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-13 11:26:41,926-[lfw][140000]Accuracy-Highest: 0.99783
Training: 2022-04-13 11:27:31,837-[cfp_fp][140000]XNorm: 21.368183
Training: 2022-04-13 11:27:31,838-[cfp_fp][140000]Accuracy-Flip: 0.97786+-0.00767
Training: 2022-04-13 11:27:31,839-[cfp_fp][140000]Accuracy-Highest: 0.97986
Training: 2022-04-13 11:28:15,252-[agedb_30][140000]XNorm: 23.439239
Training: 2022-04-13 11:28:15,253-[agedb_30][140000]Accuracy-Flip: 0.96683+-0.00848
Training: 2022-04-13 11:28:15,254-[agedb_30][140000]Accuracy-Highest: 0.96800
Training: 2022-04-13 11:28:19,105-Speed 72.85 samples/sec   Loss 11.4378   LearningRate 0.0691   Epoch: 3   Global Step: 140010   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:22,972-Speed 2648.64 samples/sec   Loss 11.5861   LearningRate 0.0691   Epoch: 3   Global Step: 140020   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:26,845-Speed 2645.02 samples/sec   Loss 11.3642   LearningRate 0.0691   Epoch: 3   Global Step: 140030   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:30,718-Speed 2645.06 samples/sec   Loss 11.4575   LearningRate 0.0691   Epoch: 3   Global Step: 140040   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:34,593-Speed 2642.83 samples/sec   Loss 11.6345   LearningRate 0.0691   Epoch: 3   Global Step: 140050   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:38,481-Speed 2634.40 samples/sec   Loss 11.5239   LearningRate 0.0691   Epoch: 3   Global Step: 140060   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:42,360-Speed 2641.40 samples/sec   Loss 11.5056   LearningRate 0.0691   Epoch: 3   Global Step: 140070   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:46,238-Speed 2641.57 samples/sec   Loss 11.3834   LearningRate 0.0691   Epoch: 3   Global Step: 140080   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:50,117-Speed 2640.26 samples/sec   Loss 11.4860   LearningRate 0.0691   Epoch: 3   Global Step: 140090   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:54,006-Speed 2633.76 samples/sec   Loss 11.5370   LearningRate 0.0691   Epoch: 3   Global Step: 140100   Fp16 Grad Scale: 65536   Required: 78 hours
Training: 2022-04-13 11:28:57,903-Speed 2628.66 samples/sec   Loss 11.5312   LearningRate 0.0691   Epoch: 3   Global Step: 140110   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:29:01,790-Speed 2635.08 samples/sec   Loss 11.4537   LearningRate 0.0691   Epoch: 3   Global Step: 140120   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:29:05,679-Speed 2633.54 samples/sec   Loss 11.4460   LearningRate 0.0691   Epoch: 3   Global Step: 140130   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:29:09,570-Speed 2632.67 samples/sec   Loss 11.3876   LearningRate 0.0691   Epoch: 3   Global Step: 140140   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:29:13,459-Speed 2633.58 samples/sec   Loss 11.4551   LearningRate 0.0691   Epoch: 3   Global Step: 140150   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:29:17,367-Speed 2621.12 samples/sec   Loss 11.4543   LearningRate 0.0691   Epoch: 3   Global Step: 140160   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:29:21,267-Speed 2625.96 samples/sec   Loss 11.2977   LearningRate 0.0691   Epoch: 3   Global Step: 140170   Fp16 Grad Scale: 131072   Required: 78 hours
Training: 2022-04-13 11:29:25,168-Speed 2625.77 samples/sec   Loss 11.4462   LearningRate 0.0691   Epoch: 3   Global Step: 140180   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:29,082-Speed 2617.20 samples/sec   Loss 11.4731   LearningRate 0.0691   Epoch: 3   Global Step: 140190   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:32,997-Speed 2616.31 samples/sec   Loss 11.5266   LearningRate 0.0691   Epoch: 3   Global Step: 140200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:36,881-Speed 2636.83 samples/sec   Loss 11.4023   LearningRate 0.0691   Epoch: 3   Global Step: 140210   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:40,778-Speed 2628.16 samples/sec   Loss 11.4398   LearningRate 0.0691   Epoch: 3   Global Step: 140220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:44,676-Speed 2627.40 samples/sec   Loss 11.4072   LearningRate 0.0690   Epoch: 3   Global Step: 140230   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:48,587-Speed 2619.35 samples/sec   Loss 11.4482   LearningRate 0.0690   Epoch: 3   Global Step: 140240   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:52,486-Speed 2627.18 samples/sec   Loss 11.4549   LearningRate 0.0690   Epoch: 3   Global Step: 140250   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:29:56,387-Speed 2625.25 samples/sec   Loss 11.5308   LearningRate 0.0690   Epoch: 3   Global Step: 140260   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:00,297-Speed 2620.24 samples/sec   Loss 11.6045   LearningRate 0.0690   Epoch: 3   Global Step: 140270   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:04,201-Speed 2623.77 samples/sec   Loss 11.3271   LearningRate 0.0690   Epoch: 3   Global Step: 140280   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:08,107-Speed 2621.71 samples/sec   Loss 11.5415   LearningRate 0.0690   Epoch: 3   Global Step: 140290   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:12,019-Speed 2618.34 samples/sec   Loss 11.4101   LearningRate 0.0690   Epoch: 3   Global Step: 140300   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:15,927-Speed 2621.05 samples/sec   Loss 11.5158   LearningRate 0.0690   Epoch: 3   Global Step: 140310   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:19,824-Speed 2628.79 samples/sec   Loss 11.3239   LearningRate 0.0690   Epoch: 3   Global Step: 140320   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:23,726-Speed 2625.14 samples/sec   Loss 11.4142   LearningRate 0.0690   Epoch: 3   Global Step: 140330   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:27,630-Speed 2623.49 samples/sec   Loss 11.5095   LearningRate 0.0690   Epoch: 3   Global Step: 140340   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:31,530-Speed 2625.81 samples/sec   Loss 11.3349   LearningRate 0.0690   Epoch: 3   Global Step: 140350   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:35,454-Speed 2610.39 samples/sec   Loss 11.6060   LearningRate 0.0690   Epoch: 3   Global Step: 140360   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:39,358-Speed 2623.76 samples/sec   Loss 11.5652   LearningRate 0.0690   Epoch: 3   Global Step: 140370   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:43,264-Speed 2622.66 samples/sec   Loss 11.5602   LearningRate 0.0690   Epoch: 3   Global Step: 140380   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:47,173-Speed 2620.08 samples/sec   Loss 11.5877   LearningRate 0.0690   Epoch: 3   Global Step: 140390   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:51,092-Speed 2613.12 samples/sec   Loss 11.5007   LearningRate 0.0690   Epoch: 3   Global Step: 140400   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:30:55,007-Speed 2616.75 samples/sec   Loss 11.4437   LearningRate 0.0690   Epoch: 3   Global Step: 140410   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:30:58,944-Speed 2601.45 samples/sec   Loss 11.4228   LearningRate 0.0690   Epoch: 3   Global Step: 140420   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:31:02,843-Speed 2627.76 samples/sec   Loss 11.3630   LearningRate 0.0690   Epoch: 3   Global Step: 140430   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:31:06,725-Speed 2638.12 samples/sec   Loss 11.4592   LearningRate 0.0690   Epoch: 3   Global Step: 140440   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:10,651-Speed 2609.36 samples/sec   Loss 11.9044   LearningRate 0.0690   Epoch: 3   Global Step: 140450   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:14,560-Speed 2620.48 samples/sec   Loss 11.4944   LearningRate 0.0690   Epoch: 3   Global Step: 140460   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:18,466-Speed 2621.89 samples/sec   Loss 11.4162   LearningRate 0.0690   Epoch: 3   Global Step: 140470   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:22,391-Speed 2609.66 samples/sec   Loss 11.4396   LearningRate 0.0690   Epoch: 3   Global Step: 140480   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:26,300-Speed 2620.20 samples/sec   Loss 11.4124   LearningRate 0.0690   Epoch: 3   Global Step: 140490   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:30,217-Speed 2615.37 samples/sec   Loss 11.4965   LearningRate 0.0690   Epoch: 3   Global Step: 140500   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:34,134-Speed 2615.02 samples/sec   Loss 11.3716   LearningRate 0.0690   Epoch: 3   Global Step: 140510   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:38,048-Speed 2616.76 samples/sec   Loss 11.2961   LearningRate 0.0690   Epoch: 3   Global Step: 140520   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:41,962-Speed 2616.96 samples/sec   Loss 11.7287   LearningRate 0.0690   Epoch: 3   Global Step: 140530   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:31:45,881-Speed 2613.43 samples/sec   Loss 11.4125   LearningRate 0.0690   Epoch: 3   Global Step: 140540   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:31:49,825-Speed 2596.90 samples/sec   Loss 11.3419   LearningRate 0.0690   Epoch: 3   Global Step: 140550   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:31:53,752-Speed 2608.72 samples/sec   Loss 11.4240   LearningRate 0.0690   Epoch: 3   Global Step: 140560   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:31:57,682-Speed 2606.15 samples/sec   Loss 11.4415   LearningRate 0.0690   Epoch: 3   Global Step: 140570   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:01,610-Speed 2607.81 samples/sec   Loss 11.5156   LearningRate 0.0690   Epoch: 3   Global Step: 140580   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:05,609-Speed 2561.84 samples/sec   Loss 11.4472   LearningRate 0.0690   Epoch: 3   Global Step: 140590   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:09,515-Speed 2622.07 samples/sec   Loss 11.3307   LearningRate 0.0690   Epoch: 3   Global Step: 140600   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:13,427-Speed 2618.34 samples/sec   Loss 11.5070   LearningRate 0.0690   Epoch: 3   Global Step: 140610   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:17,342-Speed 2616.20 samples/sec   Loss 11.5071   LearningRate 0.0690   Epoch: 3   Global Step: 140620   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:21,250-Speed 2620.77 samples/sec   Loss 11.4577   LearningRate 0.0690   Epoch: 3   Global Step: 140630   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:25,148-Speed 2627.65 samples/sec   Loss 11.4826   LearningRate 0.0690   Epoch: 3   Global Step: 140640   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:29,053-Speed 2622.99 samples/sec   Loss 11.4960   LearningRate 0.0690   Epoch: 3   Global Step: 140650   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:32:32,943-Speed 2633.49 samples/sec   Loss 11.6009   LearningRate 0.0690   Epoch: 3   Global Step: 140660   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:32:36,860-Speed 2614.66 samples/sec   Loss 11.4315   LearningRate 0.0690   Epoch: 3   Global Step: 140670   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:32:40,775-Speed 2616.19 samples/sec   Loss 11.5443   LearningRate 0.0690   Epoch: 3   Global Step: 140680   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:32:44,684-Speed 2620.35 samples/sec   Loss 11.3306   LearningRate 0.0690   Epoch: 3   Global Step: 140690   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:32:48,594-Speed 2619.46 samples/sec   Loss 11.5870   LearningRate 0.0690   Epoch: 3   Global Step: 140700   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:32:52,500-Speed 2622.40 samples/sec   Loss 11.5220   LearningRate 0.0690   Epoch: 3   Global Step: 140710   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:32:56,404-Speed 2623.56 samples/sec   Loss 11.4744   LearningRate 0.0690   Epoch: 3   Global Step: 140720   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:33:00,309-Speed 2622.96 samples/sec   Loss 11.4578   LearningRate 0.0689   Epoch: 3   Global Step: 140730   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:33:04,218-Speed 2620.26 samples/sec   Loss 11.4407   LearningRate 0.0689   Epoch: 3   Global Step: 140740   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:33:08,125-Speed 2621.39 samples/sec   Loss 11.5646   LearningRate 0.0689   Epoch: 3   Global Step: 140750   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:33:12,036-Speed 2619.28 samples/sec   Loss 11.3965   LearningRate 0.0689   Epoch: 3   Global Step: 140760   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:15,946-Speed 2619.74 samples/sec   Loss 11.5331   LearningRate 0.0689   Epoch: 3   Global Step: 140770   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:19,855-Speed 2620.21 samples/sec   Loss 11.4870   LearningRate 0.0689   Epoch: 3   Global Step: 140780   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:23,762-Speed 2621.48 samples/sec   Loss 11.5189   LearningRate 0.0689   Epoch: 3   Global Step: 140790   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:27,682-Speed 2613.45 samples/sec   Loss 11.4621   LearningRate 0.0689   Epoch: 3   Global Step: 140800   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:31,610-Speed 2607.63 samples/sec   Loss 11.4119   LearningRate 0.0689   Epoch: 3   Global Step: 140810   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:35,516-Speed 2621.80 samples/sec   Loss 11.5202   LearningRate 0.0689   Epoch: 3   Global Step: 140820   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:39,429-Speed 2617.38 samples/sec   Loss 11.4485   LearningRate 0.0689   Epoch: 3   Global Step: 140830   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:43,342-Speed 2618.22 samples/sec   Loss 11.3970   LearningRate 0.0689   Epoch: 3   Global Step: 140840   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:33:47,241-Speed 2626.78 samples/sec   Loss 11.5087   LearningRate 0.0689   Epoch: 3   Global Step: 140850   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:33:51,157-Speed 2615.45 samples/sec   Loss 11.5148   LearningRate 0.0689   Epoch: 3   Global Step: 140860   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:33:55,084-Speed 2608.10 samples/sec   Loss 11.3518   LearningRate 0.0689   Epoch: 3   Global Step: 140870   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:33:59,019-Speed 2603.11 samples/sec   Loss 11.5160   LearningRate 0.0689   Epoch: 3   Global Step: 140880   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:34:02,927-Speed 2621.14 samples/sec   Loss 11.3086   LearningRate 0.0689   Epoch: 3   Global Step: 140890   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:34:06,962-Speed 2537.96 samples/sec   Loss 11.3646   LearningRate 0.0689   Epoch: 3   Global Step: 140900   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:34:10,878-Speed 2615.85 samples/sec   Loss 11.5812   LearningRate 0.0689   Epoch: 3   Global Step: 140910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:34:14,794-Speed 2615.33 samples/sec   Loss 11.4661   LearningRate 0.0689   Epoch: 3   Global Step: 140920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:34:18,715-Speed 2612.58 samples/sec   Loss 11.4968   LearningRate 0.0689   Epoch: 3   Global Step: 140930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:34:22,640-Speed 2609.72 samples/sec   Loss 11.5453   LearningRate 0.0689   Epoch: 3   Global Step: 140940   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:34:26,560-Speed 2613.16 samples/sec   Loss 11.5508   LearningRate 0.0689   Epoch: 3   Global Step: 140950   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:30,477-Speed 2614.60 samples/sec   Loss 11.4078   LearningRate 0.0689   Epoch: 3   Global Step: 140960   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:34,386-Speed 2619.86 samples/sec   Loss 11.3966   LearningRate 0.0689   Epoch: 3   Global Step: 140970   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:38,298-Speed 2618.44 samples/sec   Loss 11.3262   LearningRate 0.0689   Epoch: 3   Global Step: 140980   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:42,207-Speed 2621.12 samples/sec   Loss 11.4912   LearningRate 0.0689   Epoch: 3   Global Step: 140990   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:46,120-Speed 2617.02 samples/sec   Loss 11.4893   LearningRate 0.0689   Epoch: 3   Global Step: 141000   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:50,034-Speed 2617.55 samples/sec   Loss 11.3086   LearningRate 0.0689   Epoch: 3   Global Step: 141010   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:53,947-Speed 2616.99 samples/sec   Loss 11.4281   LearningRate 0.0689   Epoch: 3   Global Step: 141020   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:34:57,857-Speed 2620.30 samples/sec   Loss 11.4373   LearningRate 0.0689   Epoch: 3   Global Step: 141030   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:35:01,766-Speed 2619.97 samples/sec   Loss 11.6408   LearningRate 0.0689   Epoch: 3   Global Step: 141040   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:35:05,678-Speed 2618.26 samples/sec   Loss 11.4914   LearningRate 0.0689   Epoch: 3   Global Step: 141050   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:35:09,584-Speed 2621.92 samples/sec   Loss 11.3488   LearningRate 0.0689   Epoch: 3   Global Step: 141060   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:35:13,499-Speed 2616.46 samples/sec   Loss 11.2188   LearningRate 0.0689   Epoch: 3   Global Step: 141070   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:35:17,421-Speed 2611.38 samples/sec   Loss 11.3704   LearningRate 0.0689   Epoch: 3   Global Step: 141080   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:35:21,310-Speed 2634.45 samples/sec   Loss 11.4613   LearningRate 0.0689   Epoch: 3   Global Step: 141090   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:25,215-Speed 2622.51 samples/sec   Loss 11.3017   LearningRate 0.0689   Epoch: 3   Global Step: 141100   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:29,119-Speed 2623.79 samples/sec   Loss 11.3073   LearningRate 0.0689   Epoch: 3   Global Step: 141110   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:33,028-Speed 2620.55 samples/sec   Loss 11.2259   LearningRate 0.0689   Epoch: 3   Global Step: 141120   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:36,931-Speed 2623.69 samples/sec   Loss 11.4873   LearningRate 0.0689   Epoch: 3   Global Step: 141130   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:40,848-Speed 2615.36 samples/sec   Loss 11.5308   LearningRate 0.0689   Epoch: 3   Global Step: 141140   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:44,787-Speed 2600.33 samples/sec   Loss 11.4312   LearningRate 0.0689   Epoch: 3   Global Step: 141150   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:48,759-Speed 2579.08 samples/sec   Loss 11.2547   LearningRate 0.0689   Epoch: 3   Global Step: 141160   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:52,698-Speed 2600.10 samples/sec   Loss 11.4434   LearningRate 0.0689   Epoch: 3   Global Step: 141170   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:35:56,622-Speed 2610.28 samples/sec   Loss 11.3866   LearningRate 0.0689   Epoch: 3   Global Step: 141180   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:36:00,536-Speed 2617.28 samples/sec   Loss 11.5506   LearningRate 0.0689   Epoch: 3   Global Step: 141190   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:04,449-Speed 2617.65 samples/sec   Loss 11.3638   LearningRate 0.0689   Epoch: 3   Global Step: 141200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:08,382-Speed 2603.84 samples/sec   Loss 11.4456   LearningRate 0.0689   Epoch: 3   Global Step: 141210   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:12,308-Speed 2608.89 samples/sec   Loss 11.3798   LearningRate 0.0689   Epoch: 3   Global Step: 141220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:16,305-Speed 2562.99 samples/sec   Loss 11.2569   LearningRate 0.0688   Epoch: 3   Global Step: 141230   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:20,225-Speed 2612.52 samples/sec   Loss 11.3921   LearningRate 0.0688   Epoch: 3   Global Step: 141240   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:24,128-Speed 2624.60 samples/sec   Loss 11.3214   LearningRate 0.0688   Epoch: 3   Global Step: 141250   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:28,038-Speed 2620.38 samples/sec   Loss 11.4252   LearningRate 0.0688   Epoch: 3   Global Step: 141260   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:31,947-Speed 2620.02 samples/sec   Loss 11.3511   LearningRate 0.0688   Epoch: 3   Global Step: 141270   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:35,864-Speed 2615.44 samples/sec   Loss 11.3453   LearningRate 0.0688   Epoch: 3   Global Step: 141280   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:36:39,769-Speed 2622.58 samples/sec   Loss 11.4773   LearningRate 0.0688   Epoch: 3   Global Step: 141290   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:36:43,693-Speed 2610.46 samples/sec   Loss 11.5312   LearningRate 0.0688   Epoch: 3   Global Step: 141300   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:36:47,614-Speed 2612.48 samples/sec   Loss 11.3539   LearningRate 0.0688   Epoch: 3   Global Step: 141310   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:36:51,523-Speed 2620.10 samples/sec   Loss 11.4182   LearningRate 0.0688   Epoch: 3   Global Step: 141320   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:36:55,443-Speed 2612.28 samples/sec   Loss 11.3839   LearningRate 0.0688   Epoch: 3   Global Step: 141330   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:36:59,379-Speed 2602.94 samples/sec   Loss 11.2267   LearningRate 0.0688   Epoch: 3   Global Step: 141340   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:37:03,268-Speed 2633.99 samples/sec   Loss 11.4737   LearningRate 0.0688   Epoch: 3   Global Step: 141350   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:07,184-Speed 2615.32 samples/sec   Loss 11.4666   LearningRate 0.0688   Epoch: 3   Global Step: 141360   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:11,095-Speed 2618.99 samples/sec   Loss 11.4450   LearningRate 0.0688   Epoch: 3   Global Step: 141370   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:15,001-Speed 2622.54 samples/sec   Loss 11.4541   LearningRate 0.0688   Epoch: 3   Global Step: 141380   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:18,907-Speed 2621.95 samples/sec   Loss 11.2949   LearningRate 0.0688   Epoch: 3   Global Step: 141390   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:22,821-Speed 2616.75 samples/sec   Loss 11.4561   LearningRate 0.0688   Epoch: 3   Global Step: 141400   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:26,786-Speed 2583.67 samples/sec   Loss 11.3978   LearningRate 0.0688   Epoch: 3   Global Step: 141410   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:30,692-Speed 2622.34 samples/sec   Loss 11.4393   LearningRate 0.0688   Epoch: 3   Global Step: 141420   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:34,602-Speed 2620.00 samples/sec   Loss 11.2692   LearningRate 0.0688   Epoch: 3   Global Step: 141430   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:38,517-Speed 2615.79 samples/sec   Loss 11.4151   LearningRate 0.0688   Epoch: 3   Global Step: 141440   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:37:42,428-Speed 2619.45 samples/sec   Loss 11.3961   LearningRate 0.0688   Epoch: 3   Global Step: 141450   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:37:46,335-Speed 2621.32 samples/sec   Loss 11.3946   LearningRate 0.0688   Epoch: 3   Global Step: 141460   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:37:50,243-Speed 2620.98 samples/sec   Loss 11.4378   LearningRate 0.0688   Epoch: 3   Global Step: 141470   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:37:54,166-Speed 2610.32 samples/sec   Loss 11.5497   LearningRate 0.0688   Epoch: 3   Global Step: 141480   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:37:58,073-Speed 2622.20 samples/sec   Loss 11.4073   LearningRate 0.0688   Epoch: 3   Global Step: 141490   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:38:01,975-Speed 2624.89 samples/sec   Loss 11.4073   LearningRate 0.0688   Epoch: 3   Global Step: 141500   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:05,909-Speed 2605.20 samples/sec   Loss 11.2899   LearningRate 0.0688   Epoch: 3   Global Step: 141510   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:09,814-Speed 2622.70 samples/sec   Loss 11.5003   LearningRate 0.0688   Epoch: 3   Global Step: 141520   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:13,724-Speed 2619.64 samples/sec   Loss 11.4092   LearningRate 0.0688   Epoch: 3   Global Step: 141530   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:17,641-Speed 2615.16 samples/sec   Loss 11.5568   LearningRate 0.0688   Epoch: 3   Global Step: 141540   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:21,550-Speed 2619.54 samples/sec   Loss 11.4394   LearningRate 0.0688   Epoch: 3   Global Step: 141550   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:25,457-Speed 2621.92 samples/sec   Loss 11.3653   LearningRate 0.0688   Epoch: 3   Global Step: 141560   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:29,366-Speed 2620.44 samples/sec   Loss 11.3104   LearningRate 0.0688   Epoch: 3   Global Step: 141570   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:33,275-Speed 2620.55 samples/sec   Loss 11.4000   LearningRate 0.0688   Epoch: 3   Global Step: 141580   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:37,180-Speed 2622.72 samples/sec   Loss 11.4327   LearningRate 0.0688   Epoch: 3   Global Step: 141590   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:38:41,087-Speed 2621.27 samples/sec   Loss 11.3998   LearningRate 0.0688   Epoch: 3   Global Step: 141600   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:38:45,022-Speed 2603.22 samples/sec   Loss 11.4430   LearningRate 0.0688   Epoch: 3   Global Step: 141610   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:38:48,929-Speed 2621.70 samples/sec   Loss 11.2411   LearningRate 0.0688   Epoch: 3   Global Step: 141620   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:38:52,928-Speed 2561.38 samples/sec   Loss 11.3899   LearningRate 0.0688   Epoch: 3   Global Step: 141630   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:38:57,041-Speed 2490.41 samples/sec   Loss 11.1615   LearningRate 0.0688   Epoch: 3   Global Step: 141640   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:01,050-Speed 2554.64 samples/sec   Loss 11.4369   LearningRate 0.0688   Epoch: 3   Global Step: 141650   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:04,962-Speed 2618.67 samples/sec   Loss 11.3843   LearningRate 0.0688   Epoch: 3   Global Step: 141660   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:08,875-Speed 2617.47 samples/sec   Loss 11.3637   LearningRate 0.0688   Epoch: 3   Global Step: 141670   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:12,800-Speed 2609.48 samples/sec   Loss 11.3543   LearningRate 0.0688   Epoch: 3   Global Step: 141680   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:16,702-Speed 2624.64 samples/sec   Loss 11.3089   LearningRate 0.0688   Epoch: 3   Global Step: 141690   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:20,597-Speed 2629.88 samples/sec   Loss 11.3449   LearningRate 0.0688   Epoch: 3   Global Step: 141700   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:24,511-Speed 2617.20 samples/sec   Loss 11.3541   LearningRate 0.0688   Epoch: 3   Global Step: 141710   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:28,442-Speed 2605.52 samples/sec   Loss 11.4136   LearningRate 0.0688   Epoch: 3   Global Step: 141720   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:32,348-Speed 2621.96 samples/sec   Loss 11.3649   LearningRate 0.0687   Epoch: 3   Global Step: 141730   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:36,255-Speed 2621.35 samples/sec   Loss 11.3684   LearningRate 0.0687   Epoch: 3   Global Step: 141740   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:40,166-Speed 2619.25 samples/sec   Loss 11.4180   LearningRate 0.0687   Epoch: 3   Global Step: 141750   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:44,074-Speed 2621.06 samples/sec   Loss 11.4867   LearningRate 0.0687   Epoch: 3   Global Step: 141760   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:47,979-Speed 2623.02 samples/sec   Loss 11.3242   LearningRate 0.0687   Epoch: 3   Global Step: 141770   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:51,887-Speed 2620.82 samples/sec   Loss 11.2706   LearningRate 0.0687   Epoch: 3   Global Step: 141780   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:55,794-Speed 2621.58 samples/sec   Loss 11.4904   LearningRate 0.0687   Epoch: 3   Global Step: 141790   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:39:59,683-Speed 2633.86 samples/sec   Loss 11.4283   LearningRate 0.0687   Epoch: 3   Global Step: 141800   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:03,595-Speed 2617.85 samples/sec   Loss 11.5027   LearningRate 0.0687   Epoch: 3   Global Step: 141810   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:07,530-Speed 2602.62 samples/sec   Loss 11.4241   LearningRate 0.0687   Epoch: 3   Global Step: 141820   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:11,441-Speed 2619.21 samples/sec   Loss 11.4088   LearningRate 0.0687   Epoch: 3   Global Step: 141830   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:15,356-Speed 2616.15 samples/sec   Loss 11.3979   LearningRate 0.0687   Epoch: 3   Global Step: 141840   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:19,486-Speed 2480.17 samples/sec   Loss 11.3195   LearningRate 0.0687   Epoch: 3   Global Step: 141850   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:23,405-Speed 2613.19 samples/sec   Loss 11.3441   LearningRate 0.0687   Epoch: 3   Global Step: 141860   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:27,329-Speed 2610.63 samples/sec   Loss 11.4124   LearningRate 0.0687   Epoch: 3   Global Step: 141870   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:31,250-Speed 2611.94 samples/sec   Loss 11.3360   LearningRate 0.0687   Epoch: 3   Global Step: 141880   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:35,193-Speed 2597.16 samples/sec   Loss 11.3684   LearningRate 0.0687   Epoch: 3   Global Step: 141890   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:39,105-Speed 2618.38 samples/sec   Loss 11.2552   LearningRate 0.0687   Epoch: 3   Global Step: 141900   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:43,081-Speed 2576.52 samples/sec   Loss 11.3602   LearningRate 0.0687   Epoch: 3   Global Step: 141910   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:47,003-Speed 2611.50 samples/sec   Loss 11.3318   LearningRate 0.0687   Epoch: 3   Global Step: 141920   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:50,935-Speed 2605.82 samples/sec   Loss 11.3419   LearningRate 0.0687   Epoch: 3   Global Step: 141930   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:54,855-Speed 2613.17 samples/sec   Loss 11.3849   LearningRate 0.0687   Epoch: 3   Global Step: 141940   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:40:58,778-Speed 2610.78 samples/sec   Loss 11.5047   LearningRate 0.0687   Epoch: 3   Global Step: 141950   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:41:02,701-Speed 2611.14 samples/sec   Loss 11.4682   LearningRate 0.0687   Epoch: 3   Global Step: 141960   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:41:06,604-Speed 2623.58 samples/sec   Loss 11.3661   LearningRate 0.0687   Epoch: 3   Global Step: 141970   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:41:10,513-Speed 2620.03 samples/sec   Loss 11.4929   LearningRate 0.0687   Epoch: 3   Global Step: 141980   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:14,424-Speed 2619.65 samples/sec   Loss 11.3642   LearningRate 0.0687   Epoch: 3   Global Step: 141990   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:18,329-Speed 2622.61 samples/sec   Loss 11.5485   LearningRate 0.0687   Epoch: 3   Global Step: 142000   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:22,238-Speed 2620.72 samples/sec   Loss 11.3228   LearningRate 0.0687   Epoch: 3   Global Step: 142010   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:26,141-Speed 2624.01 samples/sec   Loss 11.2822   LearningRate 0.0687   Epoch: 3   Global Step: 142020   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:30,045-Speed 2623.46 samples/sec   Loss 11.3175   LearningRate 0.0687   Epoch: 3   Global Step: 142030   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:33,971-Speed 2609.10 samples/sec   Loss 11.3594   LearningRate 0.0687   Epoch: 3   Global Step: 142040   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:37,903-Speed 2604.63 samples/sec   Loss 11.2956   LearningRate 0.0687   Epoch: 3   Global Step: 142050   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:41,823-Speed 2612.43 samples/sec   Loss 11.3615   LearningRate 0.0687   Epoch: 3   Global Step: 142060   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:45,749-Speed 2608.70 samples/sec   Loss 11.4769   LearningRate 0.0687   Epoch: 3   Global Step: 142070   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:41:49,666-Speed 2615.54 samples/sec   Loss 11.3290   LearningRate 0.0687   Epoch: 3   Global Step: 142080   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:41:53,570-Speed 2623.30 samples/sec   Loss 11.4997   LearningRate 0.0687   Epoch: 3   Global Step: 142090   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:41:57,473-Speed 2624.81 samples/sec   Loss 11.4257   LearningRate 0.0687   Epoch: 3   Global Step: 142100   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:42:01,362-Speed 2633.34 samples/sec   Loss 11.4061   LearningRate 0.0687   Epoch: 3   Global Step: 142110   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:05,285-Speed 2610.80 samples/sec   Loss 11.3164   LearningRate 0.0687   Epoch: 3   Global Step: 142120   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:09,194-Speed 2620.09 samples/sec   Loss 11.4694   LearningRate 0.0687   Epoch: 3   Global Step: 142130   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:13,097-Speed 2624.71 samples/sec   Loss 11.4833   LearningRate 0.0687   Epoch: 3   Global Step: 142140   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:16,998-Speed 2625.06 samples/sec   Loss 11.3224   LearningRate 0.0687   Epoch: 3   Global Step: 142150   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:20,920-Speed 2611.26 samples/sec   Loss 11.4785   LearningRate 0.0687   Epoch: 3   Global Step: 142160   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:24,839-Speed 2614.13 samples/sec   Loss 11.2885   LearningRate 0.0687   Epoch: 3   Global Step: 142170   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:28,740-Speed 2626.94 samples/sec   Loss 11.3219   LearningRate 0.0687   Epoch: 3   Global Step: 142180   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:32,648-Speed 2620.49 samples/sec   Loss 11.3411   LearningRate 0.0687   Epoch: 3   Global Step: 142190   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:36,559-Speed 2618.97 samples/sec   Loss 11.3157   LearningRate 0.0687   Epoch: 3   Global Step: 142200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:40,474-Speed 2616.42 samples/sec   Loss 11.4341   LearningRate 0.0687   Epoch: 3   Global Step: 142210   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:42:44,375-Speed 2625.56 samples/sec   Loss 11.3482   LearningRate 0.0687   Epoch: 3   Global Step: 142220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:42:48,270-Speed 2629.85 samples/sec   Loss 11.3652   LearningRate 0.0686   Epoch: 3   Global Step: 142230   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:42:52,147-Speed 2641.49 samples/sec   Loss 12.0246   LearningRate 0.0686   Epoch: 3   Global Step: 142240   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:42:56,049-Speed 2625.27 samples/sec   Loss 11.7898   LearningRate 0.0686   Epoch: 3   Global Step: 142250   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:00,028-Speed 2573.98 samples/sec   Loss 11.7476   LearningRate 0.0686   Epoch: 3   Global Step: 142260   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:04,082-Speed 2526.19 samples/sec   Loss 11.6340   LearningRate 0.0686   Epoch: 3   Global Step: 142270   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:07,994-Speed 2618.31 samples/sec   Loss 11.5403   LearningRate 0.0686   Epoch: 3   Global Step: 142280   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:11,911-Speed 2615.31 samples/sec   Loss 11.4514   LearningRate 0.0686   Epoch: 3   Global Step: 142290   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:15,809-Speed 2627.19 samples/sec   Loss 11.3795   LearningRate 0.0686   Epoch: 3   Global Step: 142300   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:19,723-Speed 2617.16 samples/sec   Loss 11.3141   LearningRate 0.0686   Epoch: 3   Global Step: 142310   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:23,632-Speed 2620.03 samples/sec   Loss 11.4755   LearningRate 0.0686   Epoch: 3   Global Step: 142320   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:27,542-Speed 2619.93 samples/sec   Loss 11.4946   LearningRate 0.0686   Epoch: 3   Global Step: 142330   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 11:43:31,454-Speed 2618.05 samples/sec   Loss 11.4763   LearningRate 0.0686   Epoch: 3   Global Step: 142340   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:43:35,367-Speed 2617.43 samples/sec   Loss 11.4352   LearningRate 0.0686   Epoch: 3   Global Step: 142350   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:43:39,366-Speed 2561.07 samples/sec   Loss 11.4701   LearningRate 0.0686   Epoch: 3   Global Step: 142360   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:43:43,268-Speed 2624.84 samples/sec   Loss 11.4895   LearningRate 0.0686   Epoch: 3   Global Step: 142370   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:43:47,177-Speed 2621.11 samples/sec   Loss 11.3880   LearningRate 0.0686   Epoch: 3   Global Step: 142380   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:43:51,123-Speed 2595.45 samples/sec   Loss 11.4908   LearningRate 0.0686   Epoch: 3   Global Step: 142390   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:43:55,026-Speed 2624.07 samples/sec   Loss 11.4450   LearningRate 0.0686   Epoch: 3   Global Step: 142400   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:43:58,944-Speed 2614.58 samples/sec   Loss 11.4029   LearningRate 0.0686   Epoch: 3   Global Step: 142410   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:44:02,845-Speed 2625.82 samples/sec   Loss 11.3852   LearningRate 0.0686   Epoch: 3   Global Step: 142420   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:44:06,764-Speed 2613.71 samples/sec   Loss 11.4392   LearningRate 0.0686   Epoch: 3   Global Step: 142430   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:44:10,691-Speed 2608.61 samples/sec   Loss 11.4646   LearningRate 0.0686   Epoch: 3   Global Step: 142440   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:14,606-Speed 2616.02 samples/sec   Loss 11.3485   LearningRate 0.0686   Epoch: 3   Global Step: 142450   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:18,510-Speed 2623.43 samples/sec   Loss 11.3384   LearningRate 0.0686   Epoch: 3   Global Step: 142460   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:22,411-Speed 2626.12 samples/sec   Loss 11.4016   LearningRate 0.0686   Epoch: 3   Global Step: 142470   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:26,312-Speed 2625.15 samples/sec   Loss 11.3393   LearningRate 0.0686   Epoch: 3   Global Step: 142480   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:30,218-Speed 2622.50 samples/sec   Loss 11.3671   LearningRate 0.0686   Epoch: 3   Global Step: 142490   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:34,120-Speed 2625.06 samples/sec   Loss 11.4317   LearningRate 0.0686   Epoch: 3   Global Step: 142500   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:38,031-Speed 2618.40 samples/sec   Loss 11.3466   LearningRate 0.0686   Epoch: 3   Global Step: 142510   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:41,944-Speed 2617.85 samples/sec   Loss 11.2513   LearningRate 0.0686   Epoch: 3   Global Step: 142520   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:45,854-Speed 2619.46 samples/sec   Loss 11.2853   LearningRate 0.0686   Epoch: 3   Global Step: 142530   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:44:49,775-Speed 2612.16 samples/sec   Loss 11.4241   LearningRate 0.0686   Epoch: 3   Global Step: 142540   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:44:53,688-Speed 2617.67 samples/sec   Loss 11.3326   LearningRate 0.0686   Epoch: 3   Global Step: 142550   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:44:57,614-Speed 2608.89 samples/sec   Loss 11.2993   LearningRate 0.0686   Epoch: 3   Global Step: 142560   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:45:01,538-Speed 2609.89 samples/sec   Loss 11.4769   LearningRate 0.0686   Epoch: 3   Global Step: 142570   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:45:05,466-Speed 2607.21 samples/sec   Loss 11.3171   LearningRate 0.0686   Epoch: 3   Global Step: 142580   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:45:09,380-Speed 2616.97 samples/sec   Loss 11.4017   LearningRate 0.0686   Epoch: 3   Global Step: 142590   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:45:13,306-Speed 2609.01 samples/sec   Loss 11.3690   LearningRate 0.0686   Epoch: 3   Global Step: 142600   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:45:17,228-Speed 2611.55 samples/sec   Loss 11.3179   LearningRate 0.0686   Epoch: 3   Global Step: 142610   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:45:21,132-Speed 2624.19 samples/sec   Loss 11.4728   LearningRate 0.0686   Epoch: 3   Global Step: 142620   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:45:25,030-Speed 2627.37 samples/sec   Loss 11.3587   LearningRate 0.0686   Epoch: 3   Global Step: 142630   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:28,939-Speed 2620.32 samples/sec   Loss 11.2927   LearningRate 0.0686   Epoch: 3   Global Step: 142640   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:32,847-Speed 2620.65 samples/sec   Loss 11.3845   LearningRate 0.0686   Epoch: 3   Global Step: 142650   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:36,750-Speed 2623.90 samples/sec   Loss 11.2975   LearningRate 0.0686   Epoch: 3   Global Step: 142660   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:40,649-Speed 2627.21 samples/sec   Loss 11.3435   LearningRate 0.0686   Epoch: 3   Global Step: 142670   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:44,555-Speed 2622.50 samples/sec   Loss 11.2960   LearningRate 0.0686   Epoch: 3   Global Step: 142680   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:48,456-Speed 2625.31 samples/sec   Loss 11.3365   LearningRate 0.0686   Epoch: 3   Global Step: 142690   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:52,365-Speed 2620.36 samples/sec   Loss 11.4033   LearningRate 0.0686   Epoch: 3   Global Step: 142700   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:45:56,266-Speed 2625.67 samples/sec   Loss 11.4208   LearningRate 0.0686   Epoch: 3   Global Step: 142710   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:00,164-Speed 2627.85 samples/sec   Loss 11.2866   LearningRate 0.0686   Epoch: 3   Global Step: 142720   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:04,071-Speed 2621.14 samples/sec   Loss 11.4143   LearningRate 0.0685   Epoch: 3   Global Step: 142730   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:46:07,961-Speed 2633.17 samples/sec   Loss 11.4677   LearningRate 0.0685   Epoch: 3   Global Step: 142740   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:11,866-Speed 2622.87 samples/sec   Loss 11.3709   LearningRate 0.0685   Epoch: 3   Global Step: 142750   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:15,770-Speed 2623.41 samples/sec   Loss 11.3866   LearningRate 0.0685   Epoch: 3   Global Step: 142760   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:19,686-Speed 2616.32 samples/sec   Loss 11.3407   LearningRate 0.0685   Epoch: 3   Global Step: 142770   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:23,600-Speed 2616.89 samples/sec   Loss 11.4436   LearningRate 0.0685   Epoch: 3   Global Step: 142780   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:27,512-Speed 2617.94 samples/sec   Loss 11.4021   LearningRate 0.0685   Epoch: 3   Global Step: 142790   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:31,427-Speed 2616.54 samples/sec   Loss 11.2958   LearningRate 0.0685   Epoch: 3   Global Step: 142800   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:35,347-Speed 2613.09 samples/sec   Loss 11.4198   LearningRate 0.0685   Epoch: 3   Global Step: 142810   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:39,260-Speed 2616.94 samples/sec   Loss 11.4151   LearningRate 0.0685   Epoch: 3   Global Step: 142820   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:43,191-Speed 2605.92 samples/sec   Loss 11.5835   LearningRate 0.0685   Epoch: 3   Global Step: 142830   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:47,102-Speed 2618.89 samples/sec   Loss 11.5020   LearningRate 0.0685   Epoch: 3   Global Step: 142840   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:46:50,982-Speed 2640.53 samples/sec   Loss 11.4261   LearningRate 0.0685   Epoch: 3   Global Step: 142850   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:54,897-Speed 2615.72 samples/sec   Loss 11.5781   LearningRate 0.0685   Epoch: 3   Global Step: 142860   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:46:58,804-Speed 2621.86 samples/sec   Loss 11.2675   LearningRate 0.0685   Epoch: 3   Global Step: 142870   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:02,707-Speed 2624.23 samples/sec   Loss 11.4017   LearningRate 0.0685   Epoch: 3   Global Step: 142880   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:06,614-Speed 2621.64 samples/sec   Loss 11.3940   LearningRate 0.0685   Epoch: 3   Global Step: 142890   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:10,532-Speed 2613.58 samples/sec   Loss 11.2714   LearningRate 0.0685   Epoch: 3   Global Step: 142900   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:14,433-Speed 2626.04 samples/sec   Loss 11.5382   LearningRate 0.0685   Epoch: 3   Global Step: 142910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:18,333-Speed 2626.66 samples/sec   Loss 11.4913   LearningRate 0.0685   Epoch: 3   Global Step: 142920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:22,233-Speed 2626.01 samples/sec   Loss 11.3118   LearningRate 0.0685   Epoch: 3   Global Step: 142930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:26,137-Speed 2623.94 samples/sec   Loss 11.5241   LearningRate 0.0685   Epoch: 3   Global Step: 142940   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:30,037-Speed 2626.89 samples/sec   Loss 11.3024   LearningRate 0.0685   Epoch: 3   Global Step: 142950   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:47:33,937-Speed 2625.76 samples/sec   Loss 11.3797   LearningRate 0.0685   Epoch: 3   Global Step: 142960   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:37,853-Speed 2615.56 samples/sec   Loss 11.4275   LearningRate 0.0685   Epoch: 3   Global Step: 142970   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:41,762-Speed 2620.48 samples/sec   Loss 11.3797   LearningRate 0.0685   Epoch: 3   Global Step: 142980   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:45,662-Speed 2626.09 samples/sec   Loss 11.3124   LearningRate 0.0685   Epoch: 3   Global Step: 142990   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:49,565-Speed 2627.18 samples/sec   Loss 11.4924   LearningRate 0.0685   Epoch: 3   Global Step: 143000   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:53,466-Speed 2625.19 samples/sec   Loss 11.3694   LearningRate 0.0685   Epoch: 3   Global Step: 143010   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:47:57,389-Speed 2610.64 samples/sec   Loss 11.3455   LearningRate 0.0685   Epoch: 3   Global Step: 143020   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:48:01,369-Speed 2573.39 samples/sec   Loss 11.3581   LearningRate 0.0685   Epoch: 3   Global Step: 143030   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:05,272-Speed 2624.55 samples/sec   Loss 11.2923   LearningRate 0.0685   Epoch: 3   Global Step: 143040   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:09,188-Speed 2615.13 samples/sec   Loss 11.5425   LearningRate 0.0685   Epoch: 3   Global Step: 143050   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:13,089-Speed 2626.11 samples/sec   Loss 11.1669   LearningRate 0.0685   Epoch: 3   Global Step: 143060   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:16,989-Speed 2625.83 samples/sec   Loss 11.2289   LearningRate 0.0685   Epoch: 3   Global Step: 143070   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:20,896-Speed 2621.97 samples/sec   Loss 11.3269   LearningRate 0.0685   Epoch: 3   Global Step: 143080   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:24,800-Speed 2623.80 samples/sec   Loss 11.3883   LearningRate 0.0685   Epoch: 3   Global Step: 143090   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:28,699-Speed 2626.44 samples/sec   Loss 11.4420   LearningRate 0.0685   Epoch: 3   Global Step: 143100   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:32,608-Speed 2620.33 samples/sec   Loss 11.4190   LearningRate 0.0685   Epoch: 3   Global Step: 143110   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:36,515-Speed 2621.84 samples/sec   Loss 11.2121   LearningRate 0.0685   Epoch: 3   Global Step: 143120   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 11:48:40,418-Speed 2624.56 samples/sec   Loss 11.4982   LearningRate 0.0685   Epoch: 3   Global Step: 143130   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:48:44,322-Speed 2623.84 samples/sec   Loss 11.3652   LearningRate 0.0685   Epoch: 3   Global Step: 143140   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:48:48,221-Speed 2626.53 samples/sec   Loss 11.4596   LearningRate 0.0685   Epoch: 3   Global Step: 143150   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:48:52,122-Speed 2625.76 samples/sec   Loss 11.3742   LearningRate 0.0685   Epoch: 3   Global Step: 143160   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:48:56,022-Speed 2626.13 samples/sec   Loss 11.3343   LearningRate 0.0685   Epoch: 3   Global Step: 143170   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:48:59,919-Speed 2628.30 samples/sec   Loss 11.4312   LearningRate 0.0685   Epoch: 3   Global Step: 143180   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:03,829-Speed 2619.10 samples/sec   Loss 11.4237   LearningRate 0.0685   Epoch: 3   Global Step: 143190   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:07,757-Speed 2608.55 samples/sec   Loss 11.2716   LearningRate 0.0685   Epoch: 3   Global Step: 143200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:11,658-Speed 2625.92 samples/sec   Loss 11.4015   LearningRate 0.0685   Epoch: 3   Global Step: 143210   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:15,555-Speed 2627.97 samples/sec   Loss 11.2810   LearningRate 0.0685   Epoch: 3   Global Step: 143220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:19,462-Speed 2621.56 samples/sec   Loss 11.4352   LearningRate 0.0685   Epoch: 3   Global Step: 143230   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:49:23,366-Speed 2623.57 samples/sec   Loss 11.3429   LearningRate 0.0684   Epoch: 3   Global Step: 143240   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:49:27,270-Speed 2623.96 samples/sec   Loss 11.4525   LearningRate 0.0684   Epoch: 3   Global Step: 143250   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:49:31,167-Speed 2628.15 samples/sec   Loss 11.3752   LearningRate 0.0684   Epoch: 3   Global Step: 143260   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:49:35,065-Speed 2627.60 samples/sec   Loss 11.3636   LearningRate 0.0684   Epoch: 3   Global Step: 143270   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:49:38,963-Speed 2627.50 samples/sec   Loss 11.3162   LearningRate 0.0684   Epoch: 3   Global Step: 143280   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:49:42,865-Speed 2624.86 samples/sec   Loss 11.3827   LearningRate 0.0684   Epoch: 3   Global Step: 143290   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:49:46,747-Speed 2638.89 samples/sec   Loss 11.2654   LearningRate 0.0684   Epoch: 3   Global Step: 143300   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:50,648-Speed 2625.55 samples/sec   Loss 11.4236   LearningRate 0.0684   Epoch: 3   Global Step: 143310   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:54,552-Speed 2623.72 samples/sec   Loss 11.3961   LearningRate 0.0684   Epoch: 3   Global Step: 143320   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:49:58,475-Speed 2611.27 samples/sec   Loss 11.4203   LearningRate 0.0684   Epoch: 3   Global Step: 143330   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:50:02,399-Speed 2609.83 samples/sec   Loss 11.2528   LearningRate 0.0684   Epoch: 3   Global Step: 143340   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:50:06,317-Speed 2614.41 samples/sec   Loss 11.3385   LearningRate 0.0684   Epoch: 3   Global Step: 143350   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:50:10,248-Speed 2605.78 samples/sec   Loss 11.1412   LearningRate 0.0684   Epoch: 3   Global Step: 143360   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:50:14,143-Speed 2629.59 samples/sec   Loss 11.4868   LearningRate 0.0684   Epoch: 3   Global Step: 143370   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:50:18,053-Speed 2619.68 samples/sec   Loss 11.2580   LearningRate 0.0684   Epoch: 3   Global Step: 143380   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:50:21,961-Speed 2620.75 samples/sec   Loss 11.2077   LearningRate 0.0684   Epoch: 3   Global Step: 143390   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:50:25,866-Speed 2623.28 samples/sec   Loss 11.2748   LearningRate 0.0684   Epoch: 3   Global Step: 143400   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:29,762-Speed 2628.87 samples/sec   Loss 11.5051   LearningRate 0.0684   Epoch: 3   Global Step: 143410   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:33,667-Speed 2622.71 samples/sec   Loss 11.3778   LearningRate 0.0684   Epoch: 3   Global Step: 143420   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:37,576-Speed 2620.11 samples/sec   Loss 11.5139   LearningRate 0.0684   Epoch: 3   Global Step: 143430   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:41,487-Speed 2619.15 samples/sec   Loss 11.4445   LearningRate 0.0684   Epoch: 3   Global Step: 143440   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:45,420-Speed 2604.22 samples/sec   Loss 11.2727   LearningRate 0.0684   Epoch: 3   Global Step: 143450   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:49,343-Speed 2611.10 samples/sec   Loss 11.3448   LearningRate 0.0684   Epoch: 3   Global Step: 143460   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:53,256-Speed 2618.12 samples/sec   Loss 11.4153   LearningRate 0.0684   Epoch: 3   Global Step: 143470   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:50:57,161-Speed 2622.35 samples/sec   Loss 11.4458   LearningRate 0.0684   Epoch: 3   Global Step: 143480   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:51:01,078-Speed 2615.08 samples/sec   Loss 11.3737   LearningRate 0.0684   Epoch: 3   Global Step: 143490   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:51:04,986-Speed 2621.10 samples/sec   Loss 11.1666   LearningRate 0.0684   Epoch: 3   Global Step: 143500   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:51:08,944-Speed 2587.53 samples/sec   Loss 11.3288   LearningRate 0.0684   Epoch: 3   Global Step: 143510   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:51:12,846-Speed 2624.98 samples/sec   Loss 11.3731   LearningRate 0.0684   Epoch: 3   Global Step: 143520   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:51:16,743-Speed 2628.66 samples/sec   Loss 11.4612   LearningRate 0.0684   Epoch: 3   Global Step: 143530   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:51:20,620-Speed 2641.72 samples/sec   Loss 11.2823   LearningRate 0.0684   Epoch: 3   Global Step: 143540   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:24,515-Speed 2629.81 samples/sec   Loss 11.4197   LearningRate 0.0684   Epoch: 3   Global Step: 143550   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:28,436-Speed 2612.72 samples/sec   Loss 11.3761   LearningRate 0.0684   Epoch: 3   Global Step: 143560   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:32,357-Speed 2612.03 samples/sec   Loss 11.3301   LearningRate 0.0684   Epoch: 3   Global Step: 143570   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:36,264-Speed 2621.38 samples/sec   Loss 11.3465   LearningRate 0.0684   Epoch: 3   Global Step: 143580   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:40,163-Speed 2627.14 samples/sec   Loss 11.4766   LearningRate 0.0684   Epoch: 3   Global Step: 143590   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:44,082-Speed 2613.51 samples/sec   Loss 11.3247   LearningRate 0.0684   Epoch: 3   Global Step: 143600   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:47,975-Speed 2631.50 samples/sec   Loss 11.2839   LearningRate 0.0684   Epoch: 3   Global Step: 143610   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:51,890-Speed 2616.58 samples/sec   Loss 11.3397   LearningRate 0.0684   Epoch: 3   Global Step: 143620   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:55,779-Speed 2633.27 samples/sec   Loss 11.2591   LearningRate 0.0684   Epoch: 3   Global Step: 143630   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:51:59,681-Speed 2625.66 samples/sec   Loss 11.3374   LearningRate 0.0684   Epoch: 3   Global Step: 143640   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:03,576-Speed 2629.42 samples/sec   Loss 11.3049   LearningRate 0.0684   Epoch: 3   Global Step: 143650   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:07,473-Speed 2628.56 samples/sec   Loss 11.4005   LearningRate 0.0684   Epoch: 3   Global Step: 143660   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:11,380-Speed 2621.15 samples/sec   Loss 11.4552   LearningRate 0.0684   Epoch: 3   Global Step: 143670   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:15,276-Speed 2629.02 samples/sec   Loss 11.4094   LearningRate 0.0684   Epoch: 3   Global Step: 143680   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:19,173-Speed 2628.55 samples/sec   Loss 11.1909   LearningRate 0.0684   Epoch: 3   Global Step: 143690   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:23,073-Speed 2626.61 samples/sec   Loss 11.3525   LearningRate 0.0684   Epoch: 3   Global Step: 143700   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:26,993-Speed 2612.25 samples/sec   Loss 11.3216   LearningRate 0.0684   Epoch: 3   Global Step: 143710   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:30,892-Speed 2627.86 samples/sec   Loss 11.3622   LearningRate 0.0684   Epoch: 3   Global Step: 143720   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:34,789-Speed 2628.04 samples/sec   Loss 11.4707   LearningRate 0.0684   Epoch: 3   Global Step: 143730   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:38,676-Speed 2634.77 samples/sec   Loss 11.3827   LearningRate 0.0683   Epoch: 3   Global Step: 143740   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:42,573-Speed 2628.37 samples/sec   Loss 11.2181   LearningRate 0.0683   Epoch: 3   Global Step: 143750   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:46,473-Speed 2626.68 samples/sec   Loss 11.2367   LearningRate 0.0683   Epoch: 3   Global Step: 143760   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:50,373-Speed 2626.56 samples/sec   Loss 11.3859   LearningRate 0.0683   Epoch: 3   Global Step: 143770   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:54,269-Speed 2628.68 samples/sec   Loss 11.2890   LearningRate 0.0683   Epoch: 3   Global Step: 143780   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:52:58,192-Speed 2611.11 samples/sec   Loss 11.4129   LearningRate 0.0683   Epoch: 3   Global Step: 143790   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:53:02,095-Speed 2624.75 samples/sec   Loss 11.3436   LearningRate 0.0683   Epoch: 3   Global Step: 143800   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:53:06,037-Speed 2597.90 samples/sec   Loss 11.5055   LearningRate 0.0683   Epoch: 3   Global Step: 143810   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:53:09,936-Speed 2627.15 samples/sec   Loss 11.4390   LearningRate 0.0683   Epoch: 3   Global Step: 143820   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:53:13,839-Speed 2624.21 samples/sec   Loss 11.2670   LearningRate 0.0683   Epoch: 3   Global Step: 143830   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:53:17,714-Speed 2643.13 samples/sec   Loss 11.3703   LearningRate 0.0683   Epoch: 3   Global Step: 143840   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:21,612-Speed 2628.33 samples/sec   Loss 11.3018   LearningRate 0.0683   Epoch: 3   Global Step: 143850   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:25,509-Speed 2627.76 samples/sec   Loss 11.2876   LearningRate 0.0683   Epoch: 3   Global Step: 143860   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:29,798-Speed 2388.38 samples/sec   Loss 11.4827   LearningRate 0.0683   Epoch: 3   Global Step: 143870   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:33,692-Speed 2630.56 samples/sec   Loss 11.3545   LearningRate 0.0683   Epoch: 3   Global Step: 143880   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:37,585-Speed 2630.59 samples/sec   Loss 11.3555   LearningRate 0.0683   Epoch: 3   Global Step: 143890   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:41,484-Speed 2627.60 samples/sec   Loss 11.4138   LearningRate 0.0683   Epoch: 3   Global Step: 143900   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:45,380-Speed 2628.74 samples/sec   Loss 11.3944   LearningRate 0.0683   Epoch: 3   Global Step: 143910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:49,273-Speed 2630.96 samples/sec   Loss 11.3637   LearningRate 0.0683   Epoch: 3   Global Step: 143920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:53,172-Speed 2627.35 samples/sec   Loss 11.3023   LearningRate 0.0683   Epoch: 3   Global Step: 143930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:53:57,069-Speed 2628.03 samples/sec   Loss 11.4037   LearningRate 0.0683   Epoch: 3   Global Step: 143940   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:54:00,973-Speed 2623.77 samples/sec   Loss 11.2783   LearningRate 0.0683   Epoch: 3   Global Step: 143950   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:54:04,875-Speed 2624.96 samples/sec   Loss 11.2719   LearningRate 0.0683   Epoch: 3   Global Step: 143960   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:54:08,755-Speed 2639.59 samples/sec   Loss 11.4395   LearningRate 0.0683   Epoch: 3   Global Step: 143970   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:12,653-Speed 2627.71 samples/sec   Loss 11.4096   LearningRate 0.0683   Epoch: 3   Global Step: 143980   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:16,553-Speed 2626.70 samples/sec   Loss 11.1862   LearningRate 0.0683   Epoch: 3   Global Step: 143990   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:20,451-Speed 2627.83 samples/sec   Loss 11.3790   LearningRate 0.0683   Epoch: 3   Global Step: 144000   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:24,349-Speed 2627.45 samples/sec   Loss 11.4311   LearningRate 0.0683   Epoch: 3   Global Step: 144010   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:28,265-Speed 2615.46 samples/sec   Loss 11.3483   LearningRate 0.0683   Epoch: 3   Global Step: 144020   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:32,161-Speed 2629.44 samples/sec   Loss 11.3314   LearningRate 0.0683   Epoch: 3   Global Step: 144030   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:36,058-Speed 2628.59 samples/sec   Loss 11.1951   LearningRate 0.0683   Epoch: 3   Global Step: 144040   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:39,956-Speed 2627.35 samples/sec   Loss 11.4210   LearningRate 0.0683   Epoch: 3   Global Step: 144050   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:43,856-Speed 2626.85 samples/sec   Loss 11.3370   LearningRate 0.0683   Epoch: 3   Global Step: 144060   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:54:47,762-Speed 2622.52 samples/sec   Loss 11.4236   LearningRate 0.0683   Epoch: 3   Global Step: 144070   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:54:51,685-Speed 2610.62 samples/sec   Loss 11.4796   LearningRate 0.0683   Epoch: 3   Global Step: 144080   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:54:55,591-Speed 2622.94 samples/sec   Loss 11.3222   LearningRate 0.0683   Epoch: 3   Global Step: 144090   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:54:59,494-Speed 2623.72 samples/sec   Loss 11.4548   LearningRate 0.0683   Epoch: 3   Global Step: 144100   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:55:03,440-Speed 2595.74 samples/sec   Loss 11.3135   LearningRate 0.0683   Epoch: 3   Global Step: 144110   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:55:07,356-Speed 2615.54 samples/sec   Loss 11.4123   LearningRate 0.0683   Epoch: 3   Global Step: 144120   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:55:11,272-Speed 2616.08 samples/sec   Loss 11.2495   LearningRate 0.0683   Epoch: 3   Global Step: 144130   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:55:15,192-Speed 2612.94 samples/sec   Loss 11.2535   LearningRate 0.0683   Epoch: 3   Global Step: 144140   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:55:19,108-Speed 2615.49 samples/sec   Loss 11.5966   LearningRate 0.0683   Epoch: 3   Global Step: 144150   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:55:22,988-Speed 2639.76 samples/sec   Loss 11.3114   LearningRate 0.0683   Epoch: 3   Global Step: 144160   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:26,885-Speed 2628.35 samples/sec   Loss 11.4691   LearningRate 0.0683   Epoch: 3   Global Step: 144170   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:30,798-Speed 2617.54 samples/sec   Loss 11.2002   LearningRate 0.0683   Epoch: 3   Global Step: 144180   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:34,706-Speed 2621.11 samples/sec   Loss 11.2825   LearningRate 0.0683   Epoch: 3   Global Step: 144190   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:38,630-Speed 2610.06 samples/sec   Loss 11.5294   LearningRate 0.0683   Epoch: 3   Global Step: 144200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:42,536-Speed 2622.33 samples/sec   Loss 11.2418   LearningRate 0.0683   Epoch: 3   Global Step: 144210   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:46,431-Speed 2629.85 samples/sec   Loss 11.3954   LearningRate 0.0683   Epoch: 3   Global Step: 144220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:50,336-Speed 2623.06 samples/sec   Loss 11.2670   LearningRate 0.0683   Epoch: 3   Global Step: 144230   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:54,234-Speed 2627.80 samples/sec   Loss 11.3348   LearningRate 0.0682   Epoch: 3   Global Step: 144240   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:55:58,134-Speed 2626.22 samples/sec   Loss 11.2415   LearningRate 0.0682   Epoch: 3   Global Step: 144250   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:02,032-Speed 2627.59 samples/sec   Loss 11.3362   LearningRate 0.0682   Epoch: 3   Global Step: 144260   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:56:05,932-Speed 2625.77 samples/sec   Loss 11.2445   LearningRate 0.0682   Epoch: 3   Global Step: 144270   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:56:09,832-Speed 2627.13 samples/sec   Loss 11.3054   LearningRate 0.0682   Epoch: 3   Global Step: 144280   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:56:13,729-Speed 2628.11 samples/sec   Loss 11.5105   LearningRate 0.0682   Epoch: 3   Global Step: 144290   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:56:17,658-Speed 2606.98 samples/sec   Loss 11.3318   LearningRate 0.0682   Epoch: 3   Global Step: 144300   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:56:21,558-Speed 2626.48 samples/sec   Loss 11.2847   LearningRate 0.0682   Epoch: 3   Global Step: 144310   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:56:25,437-Speed 2640.58 samples/sec   Loss 11.3461   LearningRate 0.0682   Epoch: 3   Global Step: 144320   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:29,348-Speed 2619.15 samples/sec   Loss 11.3295   LearningRate 0.0682   Epoch: 3   Global Step: 144330   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:33,246-Speed 2627.67 samples/sec   Loss 11.3617   LearningRate 0.0682   Epoch: 3   Global Step: 144340   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:37,146-Speed 2625.96 samples/sec   Loss 11.3379   LearningRate 0.0682   Epoch: 3   Global Step: 144350   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:41,058-Speed 2618.87 samples/sec   Loss 11.1887   LearningRate 0.0682   Epoch: 3   Global Step: 144360   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:44,953-Speed 2629.91 samples/sec   Loss 11.3410   LearningRate 0.0682   Epoch: 3   Global Step: 144370   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:48,846-Speed 2631.17 samples/sec   Loss 11.1467   LearningRate 0.0682   Epoch: 3   Global Step: 144380   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:52,768-Speed 2611.58 samples/sec   Loss 11.2636   LearningRate 0.0682   Epoch: 3   Global Step: 144390   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:56:56,669-Speed 2625.91 samples/sec   Loss 11.4993   LearningRate 0.0682   Epoch: 3   Global Step: 144400   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:00,580-Speed 2618.78 samples/sec   Loss 11.3079   LearningRate 0.0682   Epoch: 3   Global Step: 144410   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:04,488-Speed 2620.89 samples/sec   Loss 11.5640   LearningRate 0.0682   Epoch: 3   Global Step: 144420   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:57:08,380-Speed 2631.53 samples/sec   Loss 11.3615   LearningRate 0.0682   Epoch: 3   Global Step: 144430   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:57:12,275-Speed 2629.60 samples/sec   Loss 11.4308   LearningRate 0.0682   Epoch: 3   Global Step: 144440   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:57:16,180-Speed 2623.85 samples/sec   Loss 11.3126   LearningRate 0.0682   Epoch: 3   Global Step: 144450   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:57:20,074-Speed 2630.77 samples/sec   Loss 11.2255   LearningRate 0.0682   Epoch: 3   Global Step: 144460   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:57:23,951-Speed 2641.64 samples/sec   Loss 11.3897   LearningRate 0.0682   Epoch: 3   Global Step: 144470   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:27,856-Speed 2623.30 samples/sec   Loss 11.2246   LearningRate 0.0682   Epoch: 3   Global Step: 144480   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:31,750-Speed 2630.02 samples/sec   Loss 11.3708   LearningRate 0.0682   Epoch: 3   Global Step: 144490   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:35,643-Speed 2630.68 samples/sec   Loss 11.4184   LearningRate 0.0682   Epoch: 3   Global Step: 144500   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:39,539-Speed 2629.28 samples/sec   Loss 11.3603   LearningRate 0.0682   Epoch: 3   Global Step: 144510   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:43,447-Speed 2620.54 samples/sec   Loss 11.3762   LearningRate 0.0682   Epoch: 3   Global Step: 144520   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:47,347-Speed 2626.82 samples/sec   Loss 11.3577   LearningRate 0.0682   Epoch: 3   Global Step: 144530   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:51,245-Speed 2627.71 samples/sec   Loss 11.3368   LearningRate 0.0682   Epoch: 3   Global Step: 144540   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:55,145-Speed 2626.02 samples/sec   Loss 11.3305   LearningRate 0.0682   Epoch: 3   Global Step: 144550   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:57:59,040-Speed 2630.59 samples/sec   Loss 11.3005   LearningRate 0.0682   Epoch: 3   Global Step: 144560   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:02,939-Speed 2626.19 samples/sec   Loss 11.4359   LearningRate 0.0682   Epoch: 3   Global Step: 144570   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:58:06,822-Speed 2637.83 samples/sec   Loss 11.2130   LearningRate 0.0682   Epoch: 3   Global Step: 144580   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:10,740-Speed 2614.44 samples/sec   Loss 11.3349   LearningRate 0.0682   Epoch: 3   Global Step: 144590   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:14,637-Speed 2628.64 samples/sec   Loss 11.3918   LearningRate 0.0682   Epoch: 3   Global Step: 144600   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:18,532-Speed 2629.25 samples/sec   Loss 11.1919   LearningRate 0.0682   Epoch: 3   Global Step: 144610   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:22,436-Speed 2624.16 samples/sec   Loss 11.2349   LearningRate 0.0682   Epoch: 3   Global Step: 144620   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:26,352-Speed 2615.42 samples/sec   Loss 11.2528   LearningRate 0.0682   Epoch: 3   Global Step: 144630   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:30,247-Speed 2630.64 samples/sec   Loss 11.3412   LearningRate 0.0682   Epoch: 3   Global Step: 144640   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:34,145-Speed 2627.74 samples/sec   Loss 11.2090   LearningRate 0.0682   Epoch: 3   Global Step: 144650   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:38,045-Speed 2625.83 samples/sec   Loss 11.3877   LearningRate 0.0682   Epoch: 3   Global Step: 144660   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:41,946-Speed 2625.15 samples/sec   Loss 11.2594   LearningRate 0.0682   Epoch: 3   Global Step: 144670   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:58:45,866-Speed 2612.98 samples/sec   Loss 11.3496   LearningRate 0.0682   Epoch: 3   Global Step: 144680   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:58:49,774-Speed 2620.80 samples/sec   Loss 11.2996   LearningRate 0.0682   Epoch: 3   Global Step: 144690   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:58:53,674-Speed 2627.16 samples/sec   Loss 11.4878   LearningRate 0.0682   Epoch: 3   Global Step: 144700   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:58:57,554-Speed 2639.45 samples/sec   Loss 11.3865   LearningRate 0.0682   Epoch: 3   Global Step: 144710   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:01,448-Speed 2631.09 samples/sec   Loss 11.2829   LearningRate 0.0682   Epoch: 3   Global Step: 144720   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:05,345-Speed 2628.40 samples/sec   Loss 11.4582   LearningRate 0.0682   Epoch: 3   Global Step: 144730   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:09,247-Speed 2624.61 samples/sec   Loss 11.2651   LearningRate 0.0681   Epoch: 3   Global Step: 144740   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:13,141-Speed 2629.80 samples/sec   Loss 11.4304   LearningRate 0.0681   Epoch: 3   Global Step: 144750   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:17,037-Speed 2629.32 samples/sec   Loss 11.2763   LearningRate 0.0681   Epoch: 3   Global Step: 144760   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:20,930-Speed 2631.43 samples/sec   Loss 11.2207   LearningRate 0.0681   Epoch: 3   Global Step: 144770   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:24,822-Speed 2631.75 samples/sec   Loss 11.3156   LearningRate 0.0681   Epoch: 3   Global Step: 144780   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:28,722-Speed 2626.60 samples/sec   Loss 11.2079   LearningRate 0.0681   Epoch: 3   Global Step: 144790   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:32,621-Speed 2627.07 samples/sec   Loss 11.3867   LearningRate 0.0681   Epoch: 3   Global Step: 144800   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 11:59:36,528-Speed 2621.74 samples/sec   Loss 11.2268   LearningRate 0.0681   Epoch: 3   Global Step: 144810   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:59:40,421-Speed 2630.84 samples/sec   Loss 11.3820   LearningRate 0.0681   Epoch: 3   Global Step: 144820   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:59:44,313-Speed 2631.66 samples/sec   Loss 11.4718   LearningRate 0.0681   Epoch: 3   Global Step: 144830   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:59:48,214-Speed 2625.74 samples/sec   Loss 11.4413   LearningRate 0.0681   Epoch: 3   Global Step: 144840   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:59:52,115-Speed 2625.51 samples/sec   Loss 11.3103   LearningRate 0.0681   Epoch: 3   Global Step: 144850   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:59:56,033-Speed 2614.05 samples/sec   Loss 11.3323   LearningRate 0.0681   Epoch: 3   Global Step: 144860   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 11:59:59,943-Speed 2620.14 samples/sec   Loss 11.3601   LearningRate 0.0681   Epoch: 3   Global Step: 144870   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:00:03,849-Speed 2622.59 samples/sec   Loss 11.3537   LearningRate 0.0681   Epoch: 3   Global Step: 144880   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:00:07,785-Speed 2601.56 samples/sec   Loss 11.1541   LearningRate 0.0681   Epoch: 3   Global Step: 144890   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:00:11,693-Speed 2620.99 samples/sec   Loss 11.3767   LearningRate 0.0681   Epoch: 3   Global Step: 144900   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:00:15,559-Speed 2649.88 samples/sec   Loss 11.3110   LearningRate 0.0681   Epoch: 3   Global Step: 144910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:19,454-Speed 2629.66 samples/sec   Loss 11.3541   LearningRate 0.0681   Epoch: 3   Global Step: 144920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:23,359-Speed 2622.39 samples/sec   Loss 11.3432   LearningRate 0.0681   Epoch: 3   Global Step: 144930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:27,261-Speed 2625.29 samples/sec   Loss 11.3400   LearningRate 0.0681   Epoch: 3   Global Step: 144940   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:31,157-Speed 2629.22 samples/sec   Loss 11.3020   LearningRate 0.0681   Epoch: 3   Global Step: 144950   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:35,052-Speed 2629.87 samples/sec   Loss 11.3396   LearningRate 0.0681   Epoch: 3   Global Step: 144960   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:38,982-Speed 2605.53 samples/sec   Loss 11.3423   LearningRate 0.0681   Epoch: 3   Global Step: 144970   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:42,887-Speed 2622.83 samples/sec   Loss 11.3327   LearningRate 0.0681   Epoch: 3   Global Step: 144980   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:46,791-Speed 2624.17 samples/sec   Loss 11.2974   LearningRate 0.0681   Epoch: 3   Global Step: 144990   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:00:50,658-Speed 2648.84 samples/sec   Loss 11.3329   LearningRate 0.0681   Epoch: 3   Global Step: 145000   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:00:54,550-Speed 2631.60 samples/sec   Loss 11.4481   LearningRate 0.0681   Epoch: 3   Global Step: 145010   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:00:58,447-Speed 2628.24 samples/sec   Loss 11.2476   LearningRate 0.0681   Epoch: 3   Global Step: 145020   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:02,338-Speed 2632.36 samples/sec   Loss 11.3089   LearningRate 0.0681   Epoch: 3   Global Step: 145030   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:06,231-Speed 2631.27 samples/sec   Loss 11.2747   LearningRate 0.0681   Epoch: 3   Global Step: 145040   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:10,126-Speed 2629.69 samples/sec   Loss 11.1733   LearningRate 0.0681   Epoch: 3   Global Step: 145050   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:14,017-Speed 2632.10 samples/sec   Loss 11.2600   LearningRate 0.0681   Epoch: 3   Global Step: 145060   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:17,907-Speed 2633.60 samples/sec   Loss 11.3667   LearningRate 0.0681   Epoch: 3   Global Step: 145070   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:21,805-Speed 2627.91 samples/sec   Loss 11.3362   LearningRate 0.0681   Epoch: 3   Global Step: 145080   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:25,694-Speed 2633.60 samples/sec   Loss 11.4117   LearningRate 0.0681   Epoch: 3   Global Step: 145090   Fp16 Grad Scale: 16384   Required: 77 hours
Training: 2022-04-13 12:01:29,583-Speed 2633.57 samples/sec   Loss 11.3965   LearningRate 0.0681   Epoch: 3   Global Step: 145100   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:01:33,473-Speed 2632.95 samples/sec   Loss 11.2860   LearningRate 0.0681   Epoch: 3   Global Step: 145110   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:01:37,363-Speed 2632.95 samples/sec   Loss 11.2945   LearningRate 0.0681   Epoch: 3   Global Step: 145120   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:01:41,255-Speed 2632.12 samples/sec   Loss 11.3820   LearningRate 0.0681   Epoch: 3   Global Step: 145130   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:01:45,154-Speed 2626.67 samples/sec   Loss 11.4400   LearningRate 0.0681   Epoch: 3   Global Step: 145140   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:01:49,045-Speed 2632.58 samples/sec   Loss 11.2296   LearningRate 0.0681   Epoch: 3   Global Step: 145150   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:01:52,935-Speed 2633.15 samples/sec   Loss 11.3105   LearningRate 0.0681   Epoch: 3   Global Step: 145160   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:01:56,825-Speed 2633.27 samples/sec   Loss 11.2113   LearningRate 0.0681   Epoch: 3   Global Step: 145170   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:02:00,721-Speed 2629.28 samples/sec   Loss 11.1974   LearningRate 0.0681   Epoch: 3   Global Step: 145180   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:02:04,614-Speed 2630.88 samples/sec   Loss 11.4205   LearningRate 0.0681   Epoch: 3   Global Step: 145190   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:02:08,509-Speed 2628.92 samples/sec   Loss 11.3576   LearningRate 0.0681   Epoch: 3   Global Step: 145200   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:12,407-Speed 2628.24 samples/sec   Loss 11.3191   LearningRate 0.0681   Epoch: 3   Global Step: 145210   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:16,306-Speed 2626.75 samples/sec   Loss 11.2745   LearningRate 0.0681   Epoch: 3   Global Step: 145220   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:20,202-Speed 2628.96 samples/sec   Loss 11.2213   LearningRate 0.0681   Epoch: 3   Global Step: 145230   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:24,093-Speed 2632.38 samples/sec   Loss 11.1040   LearningRate 0.0680   Epoch: 3   Global Step: 145240   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:27,987-Speed 2630.66 samples/sec   Loss 11.3500   LearningRate 0.0680   Epoch: 3   Global Step: 145250   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:31,876-Speed 2633.09 samples/sec   Loss 11.3860   LearningRate 0.0680   Epoch: 3   Global Step: 145260   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:35,771-Speed 2629.32 samples/sec   Loss 11.2709   LearningRate 0.0680   Epoch: 3   Global Step: 145270   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:39,666-Speed 2630.31 samples/sec   Loss 11.3316   LearningRate 0.0680   Epoch: 3   Global Step: 145280   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:43,615-Speed 2593.85 samples/sec   Loss 11.2007   LearningRate 0.0680   Epoch: 3   Global Step: 145290   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:02:47,531-Speed 2615.63 samples/sec   Loss 11.2019   LearningRate 0.0680   Epoch: 3   Global Step: 145300   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:02:51,448-Speed 2614.98 samples/sec   Loss 11.4168   LearningRate 0.0680   Epoch: 3   Global Step: 145310   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:02:55,361-Speed 2617.73 samples/sec   Loss 11.3705   LearningRate 0.0680   Epoch: 3   Global Step: 145320   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:02:59,274-Speed 2617.53 samples/sec   Loss 11.4740   LearningRate 0.0680   Epoch: 3   Global Step: 145330   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:03,187-Speed 2617.62 samples/sec   Loss 11.2923   LearningRate 0.0680   Epoch: 3   Global Step: 145340   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:07,101-Speed 2616.44 samples/sec   Loss 11.3271   LearningRate 0.0680   Epoch: 3   Global Step: 145350   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:11,017-Speed 2615.61 samples/sec   Loss 11.4143   LearningRate 0.0680   Epoch: 3   Global Step: 145360   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:14,928-Speed 2618.96 samples/sec   Loss 11.3862   LearningRate 0.0680   Epoch: 3   Global Step: 145370   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:18,829-Speed 2625.70 samples/sec   Loss 11.2450   LearningRate 0.0680   Epoch: 3   Global Step: 145380   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:22,720-Speed 2632.41 samples/sec   Loss 11.3539   LearningRate 0.0680   Epoch: 3   Global Step: 145390   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:26,615-Speed 2629.22 samples/sec   Loss 11.2771   LearningRate 0.0680   Epoch: 3   Global Step: 145400   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:03:30,503-Speed 2635.58 samples/sec   Loss 11.4279   LearningRate 0.0680   Epoch: 3   Global Step: 145410   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:34,422-Speed 2613.45 samples/sec   Loss 11.2700   LearningRate 0.0680   Epoch: 3   Global Step: 145420   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:38,344-Speed 2611.25 samples/sec   Loss 11.4144   LearningRate 0.0680   Epoch: 3   Global Step: 145430   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:42,270-Speed 2609.10 samples/sec   Loss 11.4399   LearningRate 0.0680   Epoch: 3   Global Step: 145440   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:46,171-Speed 2625.29 samples/sec   Loss 11.3131   LearningRate 0.0680   Epoch: 3   Global Step: 145450   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:50,064-Speed 2631.11 samples/sec   Loss 11.5128   LearningRate 0.0680   Epoch: 3   Global Step: 145460   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:53,970-Speed 2622.96 samples/sec   Loss 11.2560   LearningRate 0.0680   Epoch: 3   Global Step: 145470   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:03:57,866-Speed 2628.50 samples/sec   Loss 11.3009   LearningRate 0.0680   Epoch: 3   Global Step: 145480   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:04:01,759-Speed 2631.71 samples/sec   Loss 11.3572   LearningRate 0.0680   Epoch: 3   Global Step: 145490   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:04:05,656-Speed 2628.34 samples/sec   Loss 11.2709   LearningRate 0.0680   Epoch: 3   Global Step: 145500   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:04:09,548-Speed 2631.30 samples/sec   Loss 11.2755   LearningRate 0.0680   Epoch: 3   Global Step: 145510   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:13,444-Speed 2628.70 samples/sec   Loss 11.2194   LearningRate 0.0680   Epoch: 3   Global Step: 145520   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:17,338-Speed 2630.64 samples/sec   Loss 11.2545   LearningRate 0.0680   Epoch: 3   Global Step: 145530   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:21,284-Speed 2596.30 samples/sec   Loss 11.2436   LearningRate 0.0680   Epoch: 3   Global Step: 145540   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:25,197-Speed 2617.32 samples/sec   Loss 11.2325   LearningRate 0.0680   Epoch: 3   Global Step: 145550   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:29,108-Speed 2618.95 samples/sec   Loss 11.2291   LearningRate 0.0680   Epoch: 3   Global Step: 145560   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:33,011-Speed 2624.88 samples/sec   Loss 11.3407   LearningRate 0.0680   Epoch: 3   Global Step: 145570   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:36,906-Speed 2629.28 samples/sec   Loss 11.1417   LearningRate 0.0680   Epoch: 3   Global Step: 145580   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:40,809-Speed 2624.24 samples/sec   Loss 11.3615   LearningRate 0.0680   Epoch: 3   Global Step: 145590   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:44,709-Speed 2626.99 samples/sec   Loss 11.4244   LearningRate 0.0680   Epoch: 3   Global Step: 145600   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:48,769-Speed 2522.22 samples/sec   Loss 11.3314   LearningRate 0.0680   Epoch: 3   Global Step: 145610   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:52,845-Speed 2513.09 samples/sec   Loss 11.2026   LearningRate 0.0680   Epoch: 3   Global Step: 145620   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:04:56,737-Speed 2631.98 samples/sec   Loss 11.4365   LearningRate 0.0680   Epoch: 3   Global Step: 145630   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:05:00,630-Speed 2630.85 samples/sec   Loss 11.1501   LearningRate 0.0680   Epoch: 3   Global Step: 145640   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:05:04,528-Speed 2628.13 samples/sec   Loss 11.3399   LearningRate 0.0680   Epoch: 3   Global Step: 145650   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:05:08,406-Speed 2640.66 samples/sec   Loss 11.3366   LearningRate 0.0680   Epoch: 3   Global Step: 145660   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:05:12,286-Speed 2639.81 samples/sec   Loss 11.2301   LearningRate 0.0680   Epoch: 3   Global Step: 145670   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:16,186-Speed 2626.28 samples/sec   Loss 11.2316   LearningRate 0.0680   Epoch: 3   Global Step: 145680   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:20,089-Speed 2624.62 samples/sec   Loss 11.3405   LearningRate 0.0680   Epoch: 3   Global Step: 145690   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:24,002-Speed 2618.04 samples/sec   Loss 11.4510   LearningRate 0.0680   Epoch: 3   Global Step: 145700   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:27,927-Speed 2609.07 samples/sec   Loss 11.3627   LearningRate 0.0680   Epoch: 3   Global Step: 145710   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:31,828-Speed 2626.26 samples/sec   Loss 11.2470   LearningRate 0.0680   Epoch: 3   Global Step: 145720   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:35,746-Speed 2614.24 samples/sec   Loss 11.3217   LearningRate 0.0680   Epoch: 3   Global Step: 145730   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:39,647-Speed 2625.92 samples/sec   Loss 11.3879   LearningRate 0.0680   Epoch: 3   Global Step: 145740   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:43,552-Speed 2622.43 samples/sec   Loss 11.2502   LearningRate 0.0679   Epoch: 3   Global Step: 145750   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:47,457-Speed 2623.60 samples/sec   Loss 11.3596   LearningRate 0.0679   Epoch: 3   Global Step: 145760   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:05:51,364-Speed 2621.73 samples/sec   Loss 11.2809   LearningRate 0.0679   Epoch: 3   Global Step: 145770   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:05:55,269-Speed 2622.58 samples/sec   Loss 11.2404   LearningRate 0.0679   Epoch: 3   Global Step: 145780   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:05:59,185-Speed 2616.25 samples/sec   Loss 11.1463   LearningRate 0.0679   Epoch: 3   Global Step: 145790   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:03,095-Speed 2619.14 samples/sec   Loss 11.3900   LearningRate 0.0679   Epoch: 3   Global Step: 145800   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:06,997-Speed 2625.01 samples/sec   Loss 11.2950   LearningRate 0.0679   Epoch: 3   Global Step: 145810   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:10,893-Speed 2629.36 samples/sec   Loss 11.2848   LearningRate 0.0679   Epoch: 3   Global Step: 145820   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:14,800-Speed 2621.25 samples/sec   Loss 11.2542   LearningRate 0.0679   Epoch: 3   Global Step: 145830   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:18,697-Speed 2628.61 samples/sec   Loss 11.2598   LearningRate 0.0679   Epoch: 3   Global Step: 145840   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:22,589-Speed 2631.86 samples/sec   Loss 11.4223   LearningRate 0.0679   Epoch: 3   Global Step: 145850   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:26,489-Speed 2626.90 samples/sec   Loss 11.4958   LearningRate 0.0679   Epoch: 3   Global Step: 145860   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:30,415-Speed 2608.49 samples/sec   Loss 11.2237   LearningRate 0.0679   Epoch: 3   Global Step: 145870   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:06:34,324-Speed 2620.26 samples/sec   Loss 11.2599   LearningRate 0.0679   Epoch: 3   Global Step: 145880   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:06:38,202-Speed 2641.32 samples/sec   Loss 11.4089   LearningRate 0.0679   Epoch: 3   Global Step: 145890   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:42,123-Speed 2612.78 samples/sec   Loss 11.4443   LearningRate 0.0679   Epoch: 3   Global Step: 145900   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:46,018-Speed 2629.57 samples/sec   Loss 11.2553   LearningRate 0.0679   Epoch: 3   Global Step: 145910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:49,926-Speed 2621.04 samples/sec   Loss 11.3647   LearningRate 0.0679   Epoch: 3   Global Step: 145920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:53,820-Speed 2630.54 samples/sec   Loss 11.4305   LearningRate 0.0679   Epoch: 3   Global Step: 145930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:06:57,716-Speed 2629.40 samples/sec   Loss 11.3808   LearningRate 0.0679   Epoch: 3   Global Step: 145940   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:01,649-Speed 2604.09 samples/sec   Loss 11.4077   LearningRate 0.0679   Epoch: 3   Global Step: 145950   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:05,543-Speed 2630.51 samples/sec   Loss 11.4871   LearningRate 0.0679   Epoch: 3   Global Step: 145960   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:09,436-Speed 2631.30 samples/sec   Loss 11.2719   LearningRate 0.0679   Epoch: 3   Global Step: 145970   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:13,333-Speed 2628.30 samples/sec   Loss 11.1158   LearningRate 0.0679   Epoch: 3   Global Step: 145980   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:17,234-Speed 2625.76 samples/sec   Loss 11.3053   LearningRate 0.0679   Epoch: 3   Global Step: 145990   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:07:21,135-Speed 2625.53 samples/sec   Loss 11.2831   LearningRate 0.0679   Epoch: 3   Global Step: 146000   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:07:25,035-Speed 2626.57 samples/sec   Loss 11.2733   LearningRate 0.0679   Epoch: 3   Global Step: 146010   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:07:28,936-Speed 2625.52 samples/sec   Loss 11.3092   LearningRate 0.0679   Epoch: 3   Global Step: 146020   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:07:32,843-Speed 2621.26 samples/sec   Loss 11.2789   LearningRate 0.0679   Epoch: 3   Global Step: 146030   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:07:36,731-Speed 2634.35 samples/sec   Loss 11.2107   LearningRate 0.0679   Epoch: 3   Global Step: 146040   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:40,641-Speed 2619.98 samples/sec   Loss 11.4569   LearningRate 0.0679   Epoch: 3   Global Step: 146050   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:44,535-Speed 2630.42 samples/sec   Loss 11.2603   LearningRate 0.0679   Epoch: 3   Global Step: 146060   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:48,434-Speed 2626.97 samples/sec   Loss 11.2999   LearningRate 0.0679   Epoch: 3   Global Step: 146070   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:52,342-Speed 2621.35 samples/sec   Loss 11.3248   LearningRate 0.0679   Epoch: 3   Global Step: 146080   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:07:56,245-Speed 2624.47 samples/sec   Loss 11.3705   LearningRate 0.0679   Epoch: 3   Global Step: 146090   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:00,139-Speed 2630.25 samples/sec   Loss 11.4370   LearningRate 0.0679   Epoch: 3   Global Step: 146100   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:04,061-Speed 2611.12 samples/sec   Loss 11.2795   LearningRate 0.0679   Epoch: 3   Global Step: 146110   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:07,961-Speed 2626.02 samples/sec   Loss 11.2829   LearningRate 0.0679   Epoch: 3   Global Step: 146120   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:11,863-Speed 2625.90 samples/sec   Loss 11.4025   LearningRate 0.0679   Epoch: 3   Global Step: 146130   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:15,786-Speed 2610.49 samples/sec   Loss 11.2511   LearningRate 0.0679   Epoch: 3   Global Step: 146140   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:08:19,720-Speed 2604.12 samples/sec   Loss 11.3757   LearningRate 0.0679   Epoch: 3   Global Step: 146150   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:08:23,680-Speed 2586.81 samples/sec   Loss 11.3109   LearningRate 0.0679   Epoch: 3   Global Step: 146160   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:08:27,574-Speed 2629.93 samples/sec   Loss 11.2844   LearningRate 0.0679   Epoch: 3   Global Step: 146170   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:08:31,458-Speed 2637.29 samples/sec   Loss 11.2822   LearningRate 0.0679   Epoch: 3   Global Step: 146180   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:35,347-Speed 2633.68 samples/sec   Loss 11.4779   LearningRate 0.0679   Epoch: 3   Global Step: 146190   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:39,243-Speed 2629.18 samples/sec   Loss 11.3031   LearningRate 0.0679   Epoch: 3   Global Step: 146200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:43,142-Speed 2626.91 samples/sec   Loss 11.2598   LearningRate 0.0679   Epoch: 3   Global Step: 146210   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:47,040-Speed 2628.33 samples/sec   Loss 11.3307   LearningRate 0.0679   Epoch: 3   Global Step: 146220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:50,933-Speed 2630.84 samples/sec   Loss 11.3382   LearningRate 0.0679   Epoch: 3   Global Step: 146230   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:54,829-Speed 2629.04 samples/sec   Loss 11.4272   LearningRate 0.0679   Epoch: 3   Global Step: 146240   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:08:58,723-Speed 2630.62 samples/sec   Loss 11.3826   LearningRate 0.0678   Epoch: 3   Global Step: 146250   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:02,635-Speed 2617.64 samples/sec   Loss 11.2442   LearningRate 0.0678   Epoch: 3   Global Step: 146260   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:06,531-Speed 2629.28 samples/sec   Loss 11.4507   LearningRate 0.0678   Epoch: 3   Global Step: 146270   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:10,424-Speed 2631.22 samples/sec   Loss 11.3004   LearningRate 0.0678   Epoch: 3   Global Step: 146280   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:09:14,313-Speed 2633.90 samples/sec   Loss 11.3484   LearningRate 0.0678   Epoch: 3   Global Step: 146290   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:18,219-Speed 2621.95 samples/sec   Loss 11.3721   LearningRate 0.0678   Epoch: 3   Global Step: 146300   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:22,113-Speed 2630.91 samples/sec   Loss 11.3666   LearningRate 0.0678   Epoch: 3   Global Step: 146310   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:26,013-Speed 2625.75 samples/sec   Loss 11.2261   LearningRate 0.0678   Epoch: 3   Global Step: 146320   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:29,907-Speed 2630.69 samples/sec   Loss 11.4024   LearningRate 0.0678   Epoch: 3   Global Step: 146330   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:33,803-Speed 2629.06 samples/sec   Loss 11.2871   LearningRate 0.0678   Epoch: 3   Global Step: 146340   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:37,713-Speed 2619.22 samples/sec   Loss 11.2620   LearningRate 0.0678   Epoch: 3   Global Step: 146350   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:41,606-Speed 2631.28 samples/sec   Loss 11.3792   LearningRate 0.0678   Epoch: 3   Global Step: 146360   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:45,515-Speed 2620.27 samples/sec   Loss 11.3260   LearningRate 0.0678   Epoch: 3   Global Step: 146370   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:49,408-Speed 2630.80 samples/sec   Loss 11.3946   LearningRate 0.0678   Epoch: 3   Global Step: 146380   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:09:53,302-Speed 2629.99 samples/sec   Loss 11.3673   LearningRate 0.0678   Epoch: 3   Global Step: 146390   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:09:57,183-Speed 2639.62 samples/sec   Loss 11.4074   LearningRate 0.0678   Epoch: 3   Global Step: 146400   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:01,075-Speed 2632.05 samples/sec   Loss 11.1735   LearningRate 0.0678   Epoch: 3   Global Step: 146410   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:04,972-Speed 2627.89 samples/sec   Loss 11.2649   LearningRate 0.0678   Epoch: 3   Global Step: 146420   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:08,869-Speed 2628.43 samples/sec   Loss 11.0970   LearningRate 0.0678   Epoch: 3   Global Step: 146430   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:12,774-Speed 2622.67 samples/sec   Loss 11.1764   LearningRate 0.0678   Epoch: 3   Global Step: 146440   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:16,669-Speed 2629.66 samples/sec   Loss 11.1774   LearningRate 0.0678   Epoch: 3   Global Step: 146450   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:20,574-Speed 2622.35 samples/sec   Loss 11.3438   LearningRate 0.0678   Epoch: 3   Global Step: 146460   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:24,472-Speed 2627.85 samples/sec   Loss 11.4060   LearningRate 0.0678   Epoch: 3   Global Step: 146470   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:28,381-Speed 2620.40 samples/sec   Loss 11.2783   LearningRate 0.0678   Epoch: 3   Global Step: 146480   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:32,274-Speed 2631.36 samples/sec   Loss 11.3725   LearningRate 0.0678   Epoch: 3   Global Step: 146490   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:10:36,175-Speed 2625.78 samples/sec   Loss 11.1520   LearningRate 0.0678   Epoch: 3   Global Step: 146500   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:10:40,087-Speed 2617.94 samples/sec   Loss 11.3444   LearningRate 0.0678   Epoch: 3   Global Step: 146510   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:10:44,042-Speed 2589.54 samples/sec   Loss 11.1636   LearningRate 0.0678   Epoch: 3   Global Step: 146520   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:10:47,991-Speed 2594.29 samples/sec   Loss 11.9880   LearningRate 0.0678   Epoch: 3   Global Step: 146530   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:10:52,038-Speed 2530.61 samples/sec   Loss 11.8950   LearningRate 0.0678   Epoch: 3   Global Step: 146540   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:10:55,944-Speed 2622.34 samples/sec   Loss 11.6548   LearningRate 0.0678   Epoch: 3   Global Step: 146550   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:10:59,878-Speed 2604.10 samples/sec   Loss 11.5833   LearningRate 0.0678   Epoch: 3   Global Step: 146560   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:11:03,802-Speed 2610.36 samples/sec   Loss 11.5135   LearningRate 0.0678   Epoch: 3   Global Step: 146570   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:11:07,756-Speed 2590.46 samples/sec   Loss 11.4158   LearningRate 0.0678   Epoch: 3   Global Step: 146580   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:11:11,728-Speed 2578.80 samples/sec   Loss 11.2318   LearningRate 0.0678   Epoch: 3   Global Step: 146590   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:11:15,631-Speed 2624.67 samples/sec   Loss 11.2839   LearningRate 0.0678   Epoch: 3   Global Step: 146600   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:11:19,539-Speed 2620.82 samples/sec   Loss 11.3175   LearningRate 0.0678   Epoch: 3   Global Step: 146610   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:11:23,444-Speed 2623.11 samples/sec   Loss 11.3788   LearningRate 0.0678   Epoch: 3   Global Step: 146620   Fp16 Grad Scale: 32768   Required: 77 hours
Training: 2022-04-13 12:11:27,368-Speed 2610.20 samples/sec   Loss 11.3432   LearningRate 0.0678   Epoch: 3   Global Step: 146630   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:31,279-Speed 2619.20 samples/sec   Loss 11.2490   LearningRate 0.0678   Epoch: 3   Global Step: 146640   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:35,180-Speed 2625.94 samples/sec   Loss 11.3967   LearningRate 0.0678   Epoch: 3   Global Step: 146650   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:39,082-Speed 2624.35 samples/sec   Loss 11.3202   LearningRate 0.0678   Epoch: 3   Global Step: 146660   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:43,008-Speed 2609.24 samples/sec   Loss 11.3080   LearningRate 0.0678   Epoch: 3   Global Step: 146670   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:46,914-Speed 2622.25 samples/sec   Loss 11.3516   LearningRate 0.0678   Epoch: 3   Global Step: 146680   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:50,809-Speed 2629.49 samples/sec   Loss 11.3803   LearningRate 0.0678   Epoch: 3   Global Step: 146690   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:54,713-Speed 2623.77 samples/sec   Loss 11.4331   LearningRate 0.0678   Epoch: 3   Global Step: 146700   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:11:58,617-Speed 2623.28 samples/sec   Loss 11.3241   LearningRate 0.0678   Epoch: 3   Global Step: 146710   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:12:02,514-Speed 2629.07 samples/sec   Loss 11.1940   LearningRate 0.0678   Epoch: 3   Global Step: 146720   Fp16 Grad Scale: 65536   Required: 77 hours
Training: 2022-04-13 12:12:06,413-Speed 2627.14 samples/sec   Loss 11.0832   LearningRate 0.0678   Epoch: 3   Global Step: 146730   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:10,309-Speed 2628.90 samples/sec   Loss 11.4040   LearningRate 0.0678   Epoch: 3   Global Step: 146740   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:14,206-Speed 2627.96 samples/sec   Loss 11.2842   LearningRate 0.0677   Epoch: 3   Global Step: 146750   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:18,101-Speed 2629.95 samples/sec   Loss 11.3860   LearningRate 0.0677   Epoch: 3   Global Step: 146760   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:21,993-Speed 2631.22 samples/sec   Loss 11.2762   LearningRate 0.0677   Epoch: 3   Global Step: 146770   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:25,897-Speed 2624.28 samples/sec   Loss 11.1613   LearningRate 0.0677   Epoch: 3   Global Step: 146780   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:29,891-Speed 2563.99 samples/sec   Loss 11.2646   LearningRate 0.0677   Epoch: 3   Global Step: 146790   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:33,799-Speed 2620.74 samples/sec   Loss 11.3656   LearningRate 0.0677   Epoch: 3   Global Step: 146800   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:37,791-Speed 2566.38 samples/sec   Loss 11.2888   LearningRate 0.0677   Epoch: 3   Global Step: 146810   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:41,704-Speed 2618.15 samples/sec   Loss 11.2350   LearningRate 0.0677   Epoch: 3   Global Step: 146820   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:45,705-Speed 2559.88 samples/sec   Loss 11.2363   LearningRate 0.0677   Epoch: 3   Global Step: 146830   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:12:49,799-Speed 2502.58 samples/sec   Loss 11.3201   LearningRate 0.0677   Epoch: 3   Global Step: 146840   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:12:53,679-Speed 2639.78 samples/sec   Loss 11.2446   LearningRate 0.0677   Epoch: 3   Global Step: 146850   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:12:57,580-Speed 2625.70 samples/sec   Loss 11.2813   LearningRate 0.0677   Epoch: 3   Global Step: 146860   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:01,474-Speed 2629.79 samples/sec   Loss 11.2666   LearningRate 0.0677   Epoch: 3   Global Step: 146870   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:05,375-Speed 2625.82 samples/sec   Loss 11.4939   LearningRate 0.0677   Epoch: 3   Global Step: 146880   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:09,283-Speed 2620.79 samples/sec   Loss 11.4164   LearningRate 0.0677   Epoch: 3   Global Step: 146890   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:13,176-Speed 2631.52 samples/sec   Loss 11.3223   LearningRate 0.0677   Epoch: 3   Global Step: 146900   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:17,076-Speed 2626.47 samples/sec   Loss 11.2946   LearningRate 0.0677   Epoch: 3   Global Step: 146910   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:20,973-Speed 2628.24 samples/sec   Loss 11.3891   LearningRate 0.0677   Epoch: 3   Global Step: 146920   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:24,869-Speed 2629.55 samples/sec   Loss 11.5075   LearningRate 0.0677   Epoch: 3   Global Step: 146930   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:28,766-Speed 2628.46 samples/sec   Loss 11.2081   LearningRate 0.0677   Epoch: 3   Global Step: 146940   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:13:32,661-Speed 2629.28 samples/sec   Loss 11.4110   LearningRate 0.0677   Epoch: 3   Global Step: 146950   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:13:36,564-Speed 2624.16 samples/sec   Loss 11.3713   LearningRate 0.0677   Epoch: 3   Global Step: 146960   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:13:40,469-Speed 2623.60 samples/sec   Loss 11.2517   LearningRate 0.0677   Epoch: 3   Global Step: 146970   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:13:44,382-Speed 2617.14 samples/sec   Loss 11.1841   LearningRate 0.0677   Epoch: 3   Global Step: 146980   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:13:48,386-Speed 2558.31 samples/sec   Loss 11.3186   LearningRate 0.0677   Epoch: 3   Global Step: 146990   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:13:52,298-Speed 2618.63 samples/sec   Loss 11.2218   LearningRate 0.0677   Epoch: 3   Global Step: 147000   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:13:56,190-Speed 2631.81 samples/sec   Loss 11.2600   LearningRate 0.0677   Epoch: 3   Global Step: 147010   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:00,090-Speed 2626.57 samples/sec   Loss 11.2880   LearningRate 0.0677   Epoch: 3   Global Step: 147020   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:03,990-Speed 2626.30 samples/sec   Loss 11.3082   LearningRate 0.0677   Epoch: 3   Global Step: 147030   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:07,881-Speed 2632.31 samples/sec   Loss 11.4971   LearningRate 0.0677   Epoch: 3   Global Step: 147040   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:11,767-Speed 2635.73 samples/sec   Loss 11.3934   LearningRate 0.0677   Epoch: 3   Global Step: 147050   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:15,662-Speed 2629.82 samples/sec   Loss 11.3078   LearningRate 0.0677   Epoch: 3   Global Step: 147060   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:19,554-Speed 2631.94 samples/sec   Loss 11.3724   LearningRate 0.0677   Epoch: 3   Global Step: 147070   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:23,471-Speed 2614.91 samples/sec   Loss 11.3861   LearningRate 0.0677   Epoch: 3   Global Step: 147080   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:27,389-Speed 2614.39 samples/sec   Loss 11.3917   LearningRate 0.0677   Epoch: 3   Global Step: 147090   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:31,284-Speed 2629.30 samples/sec   Loss 11.2707   LearningRate 0.0677   Epoch: 3   Global Step: 147100   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:35,181-Speed 2628.43 samples/sec   Loss 11.3048   LearningRate 0.0677   Epoch: 3   Global Step: 147110   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:39,083-Speed 2624.73 samples/sec   Loss 11.2521   LearningRate 0.0677   Epoch: 3   Global Step: 147120   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:43,133-Speed 2529.76 samples/sec   Loss 11.1816   LearningRate 0.0677   Epoch: 3   Global Step: 147130   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:47,047-Speed 2617.12 samples/sec   Loss 11.3071   LearningRate 0.0677   Epoch: 3   Global Step: 147140   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:50,962-Speed 2616.12 samples/sec   Loss 11.3238   LearningRate 0.0677   Epoch: 3   Global Step: 147150   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:54,894-Speed 2604.94 samples/sec   Loss 11.2782   LearningRate 0.0677   Epoch: 3   Global Step: 147160   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:14:58,791-Speed 2628.75 samples/sec   Loss 11.3070   LearningRate 0.0677   Epoch: 3   Global Step: 147170   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:15:02,713-Speed 2611.02 samples/sec   Loss 11.4247   LearningRate 0.0677   Epoch: 3   Global Step: 147180   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:15:06,610-Speed 2628.91 samples/sec   Loss 11.2508   LearningRate 0.0677   Epoch: 3   Global Step: 147190   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:15:10,506-Speed 2628.95 samples/sec   Loss 11.3248   LearningRate 0.0677   Epoch: 3   Global Step: 147200   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:14,417-Speed 2618.98 samples/sec   Loss 11.2722   LearningRate 0.0677   Epoch: 3   Global Step: 147210   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:18,367-Speed 2593.01 samples/sec   Loss 11.1313   LearningRate 0.0677   Epoch: 3   Global Step: 147220   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:22,399-Speed 2540.81 samples/sec   Loss 11.2744   LearningRate 0.0677   Epoch: 3   Global Step: 147230   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:26,291-Speed 2631.25 samples/sec   Loss 11.3657   LearningRate 0.0677   Epoch: 3   Global Step: 147240   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:30,251-Speed 2586.94 samples/sec   Loss 11.2744   LearningRate 0.0677   Epoch: 3   Global Step: 147250   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:34,160-Speed 2620.54 samples/sec   Loss 11.2641   LearningRate 0.0676   Epoch: 3   Global Step: 147260   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:38,068-Speed 2620.68 samples/sec   Loss 11.1760   LearningRate 0.0676   Epoch: 3   Global Step: 147270   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:41,987-Speed 2613.79 samples/sec   Loss 11.3725   LearningRate 0.0676   Epoch: 3   Global Step: 147280   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:45,901-Speed 2616.97 samples/sec   Loss 11.3348   LearningRate 0.0676   Epoch: 3   Global Step: 147290   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:15:49,801-Speed 2626.19 samples/sec   Loss 11.3367   LearningRate 0.0676   Epoch: 3   Global Step: 147300   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:15:53,697-Speed 2629.17 samples/sec   Loss 11.2718   LearningRate 0.0676   Epoch: 3   Global Step: 147310   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:15:57,573-Speed 2642.45 samples/sec   Loss 11.2572   LearningRate 0.0676   Epoch: 3   Global Step: 147320   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:01,465-Speed 2632.05 samples/sec   Loss 11.1721   LearningRate 0.0676   Epoch: 3   Global Step: 147330   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:05,361-Speed 2628.91 samples/sec   Loss 11.2933   LearningRate 0.0676   Epoch: 3   Global Step: 147340   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:09,258-Speed 2627.96 samples/sec   Loss 11.4295   LearningRate 0.0676   Epoch: 3   Global Step: 147350   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:13,175-Speed 2614.70 samples/sec   Loss 11.3158   LearningRate 0.0676   Epoch: 3   Global Step: 147360   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:17,088-Speed 2618.34 samples/sec   Loss 11.1355   LearningRate 0.0676   Epoch: 3   Global Step: 147370   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:20,977-Speed 2633.41 samples/sec   Loss 11.2267   LearningRate 0.0676   Epoch: 3   Global Step: 147380   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:24,870-Speed 2631.01 samples/sec   Loss 11.3691   LearningRate 0.0676   Epoch: 3   Global Step: 147390   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:28,770-Speed 2626.81 samples/sec   Loss 11.1278   LearningRate 0.0676   Epoch: 3   Global Step: 147400   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:32,664-Speed 2630.23 samples/sec   Loss 11.3830   LearningRate 0.0676   Epoch: 3   Global Step: 147410   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:36,594-Speed 2606.43 samples/sec   Loss 11.2859   LearningRate 0.0676   Epoch: 3   Global Step: 147420   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:16:40,493-Speed 2627.11 samples/sec   Loss 11.1605   LearningRate 0.0676   Epoch: 3   Global Step: 147430   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:16:44,387-Speed 2630.00 samples/sec   Loss 11.1559   LearningRate 0.0676   Epoch: 3   Global Step: 147440   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:16:48,282-Speed 2629.66 samples/sec   Loss 11.2945   LearningRate 0.0676   Epoch: 3   Global Step: 147450   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:16:52,170-Speed 2635.21 samples/sec   Loss 11.3289   LearningRate 0.0676   Epoch: 3   Global Step: 147460   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:56,063-Speed 2631.00 samples/sec   Loss 11.2367   LearningRate 0.0676   Epoch: 3   Global Step: 147470   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:16:59,975-Speed 2618.38 samples/sec   Loss 11.3395   LearningRate 0.0676   Epoch: 3   Global Step: 147480   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:03,889-Speed 2616.87 samples/sec   Loss 11.3951   LearningRate 0.0676   Epoch: 3   Global Step: 147490   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:07,786-Speed 2628.17 samples/sec   Loss 11.2951   LearningRate 0.0676   Epoch: 3   Global Step: 147500   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:11,696-Speed 2619.89 samples/sec   Loss 11.2977   LearningRate 0.0676   Epoch: 3   Global Step: 147510   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:15,599-Speed 2624.68 samples/sec   Loss 11.1410   LearningRate 0.0676   Epoch: 3   Global Step: 147520   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:19,487-Speed 2633.98 samples/sec   Loss 11.2410   LearningRate 0.0676   Epoch: 3   Global Step: 147530   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:23,388-Speed 2626.27 samples/sec   Loss 11.2198   LearningRate 0.0676   Epoch: 3   Global Step: 147540   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:27,295-Speed 2621.81 samples/sec   Loss 11.4223   LearningRate 0.0676   Epoch: 3   Global Step: 147550   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:17:31,194-Speed 2627.25 samples/sec   Loss 11.4217   LearningRate 0.0676   Epoch: 3   Global Step: 147560   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:17:35,120-Speed 2608.85 samples/sec   Loss 11.1682   LearningRate 0.0676   Epoch: 3   Global Step: 147570   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:17:39,015-Speed 2629.35 samples/sec   Loss 11.3656   LearningRate 0.0676   Epoch: 3   Global Step: 147580   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:17:42,909-Speed 2630.51 samples/sec   Loss 11.2635   LearningRate 0.0676   Epoch: 3   Global Step: 147590   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:17:46,811-Speed 2624.71 samples/sec   Loss 11.2710   LearningRate 0.0676   Epoch: 3   Global Step: 147600   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:17:50,703-Speed 2631.50 samples/sec   Loss 11.1677   LearningRate 0.0676   Epoch: 3   Global Step: 147610   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:17:54,595-Speed 2632.03 samples/sec   Loss 11.4823   LearningRate 0.0676   Epoch: 3   Global Step: 147620   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:17:58,494-Speed 2627.32 samples/sec   Loss 11.1746   LearningRate 0.0676   Epoch: 3   Global Step: 147630   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:02,387-Speed 2630.97 samples/sec   Loss 11.2976   LearningRate 0.0676   Epoch: 3   Global Step: 147640   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:06,285-Speed 2627.63 samples/sec   Loss 11.3105   LearningRate 0.0676   Epoch: 3   Global Step: 147650   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:10,170-Speed 2636.05 samples/sec   Loss 11.1982   LearningRate 0.0676   Epoch: 3   Global Step: 147660   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:14,122-Speed 2591.15 samples/sec   Loss 11.3162   LearningRate 0.0676   Epoch: 3   Global Step: 147670   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:18,028-Speed 2622.75 samples/sec   Loss 11.1553   LearningRate 0.0676   Epoch: 3   Global Step: 147680   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:21,925-Speed 2628.50 samples/sec   Loss 11.2276   LearningRate 0.0676   Epoch: 3   Global Step: 147690   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:25,816-Speed 2632.32 samples/sec   Loss 11.2788   LearningRate 0.0676   Epoch: 3   Global Step: 147700   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:29,713-Speed 2627.75 samples/sec   Loss 11.1792   LearningRate 0.0676   Epoch: 3   Global Step: 147710   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:33,615-Speed 2625.48 samples/sec   Loss 11.2086   LearningRate 0.0676   Epoch: 3   Global Step: 147720   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:37,514-Speed 2627.20 samples/sec   Loss 11.2033   LearningRate 0.0676   Epoch: 3   Global Step: 147730   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:41,417-Speed 2624.34 samples/sec   Loss 11.1510   LearningRate 0.0676   Epoch: 3   Global Step: 147740   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:45,325-Speed 2620.59 samples/sec   Loss 11.3314   LearningRate 0.0676   Epoch: 3   Global Step: 147750   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:18:49,195-Speed 2647.30 samples/sec   Loss 11.3010   LearningRate 0.0675   Epoch: 3   Global Step: 147760   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:18:53,085-Speed 2633.30 samples/sec   Loss 11.1880   LearningRate 0.0675   Epoch: 3   Global Step: 147770   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:18:56,981-Speed 2628.64 samples/sec   Loss 11.3444   LearningRate 0.0675   Epoch: 3   Global Step: 147780   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:00,885-Speed 2623.62 samples/sec   Loss 11.2322   LearningRate 0.0675   Epoch: 3   Global Step: 147790   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:04,772-Speed 2634.72 samples/sec   Loss 11.1788   LearningRate 0.0675   Epoch: 3   Global Step: 147800   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:08,668-Speed 2628.98 samples/sec   Loss 11.2328   LearningRate 0.0675   Epoch: 3   Global Step: 147810   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:12,565-Speed 2628.61 samples/sec   Loss 11.3483   LearningRate 0.0675   Epoch: 3   Global Step: 147820   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:16,464-Speed 2627.10 samples/sec   Loss 11.0399   LearningRate 0.0675   Epoch: 3   Global Step: 147830   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:20,357-Speed 2630.98 samples/sec   Loss 11.2294   LearningRate 0.0675   Epoch: 3   Global Step: 147840   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:24,252-Speed 2629.68 samples/sec   Loss 11.2343   LearningRate 0.0675   Epoch: 3   Global Step: 147850   Fp16 Grad Scale: 131072   Required: 77 hours
Training: 2022-04-13 12:19:28,146-Speed 2630.94 samples/sec   Loss 11.2732   LearningRate 0.0675   Epoch: 3   Global Step: 147860   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:19:32,043-Speed 2628.12 samples/sec   Loss 11.3297   LearningRate 0.0675   Epoch: 3   Global Step: 147870   Fp16 Grad Scale: 262144   Required: 77 hours
Training: 2022-04-13 12:19:35,950-Speed 2621.20 samples/sec   Loss 11.3081   LearningRate 0.0675   Epoch: 3   Global Step: 147880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:19:39,860-Speed 2619.61 samples/sec   Loss 11.3774   LearningRate 0.0675   Epoch: 3   Global Step: 147890   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:19:43,757-Speed 2628.90 samples/sec   Loss 11.1073   LearningRate 0.0675   Epoch: 3   Global Step: 147900   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:19:47,665-Speed 2620.84 samples/sec   Loss 11.2816   LearningRate 0.0675   Epoch: 3   Global Step: 147910   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:19:51,558-Speed 2631.06 samples/sec   Loss 11.2661   LearningRate 0.0675   Epoch: 3   Global Step: 147920   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:19:55,471-Speed 2617.52 samples/sec   Loss 11.2831   LearningRate 0.0675   Epoch: 3   Global Step: 147930   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:19:59,365-Speed 2630.82 samples/sec   Loss 11.2980   LearningRate 0.0675   Epoch: 3   Global Step: 147940   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:03,262-Speed 2628.06 samples/sec   Loss 11.3030   LearningRate 0.0675   Epoch: 3   Global Step: 147950   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:07,158-Speed 2628.63 samples/sec   Loss 11.2001   LearningRate 0.0675   Epoch: 3   Global Step: 147960   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:11,050-Speed 2631.66 samples/sec   Loss 11.4102   LearningRate 0.0675   Epoch: 3   Global Step: 147970   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:14,948-Speed 2627.74 samples/sec   Loss 11.2580   LearningRate 0.0675   Epoch: 3   Global Step: 147980   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:20:18,850-Speed 2625.39 samples/sec   Loss 11.2938   LearningRate 0.0675   Epoch: 3   Global Step: 147990   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:20:22,750-Speed 2626.54 samples/sec   Loss 11.1379   LearningRate 0.0675   Epoch: 3   Global Step: 148000   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:20:26,644-Speed 2630.44 samples/sec   Loss 11.1784   LearningRate 0.0675   Epoch: 3   Global Step: 148010   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:20:30,542-Speed 2627.44 samples/sec   Loss 11.2337   LearningRate 0.0675   Epoch: 3   Global Step: 148020   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:20:34,442-Speed 2626.08 samples/sec   Loss 11.1793   LearningRate 0.0675   Epoch: 3   Global Step: 148030   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:20:38,352-Speed 2619.75 samples/sec   Loss 11.1736   LearningRate 0.0675   Epoch: 3   Global Step: 148040   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:20:42,237-Speed 2636.86 samples/sec   Loss 11.1883   LearningRate 0.0675   Epoch: 3   Global Step: 148050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:46,137-Speed 2626.32 samples/sec   Loss 11.1377   LearningRate 0.0675   Epoch: 3   Global Step: 148060   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:50,042-Speed 2622.68 samples/sec   Loss 11.2101   LearningRate 0.0675   Epoch: 3   Global Step: 148070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:53,939-Speed 2629.03 samples/sec   Loss 11.2359   LearningRate 0.0675   Epoch: 3   Global Step: 148080   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:20:57,841-Speed 2625.61 samples/sec   Loss 11.2752   LearningRate 0.0675   Epoch: 3   Global Step: 148090   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:01,738-Speed 2628.36 samples/sec   Loss 11.1303   LearningRate 0.0675   Epoch: 3   Global Step: 148100   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:05,658-Speed 2612.85 samples/sec   Loss 11.3733   LearningRate 0.0675   Epoch: 3   Global Step: 148110   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:09,555-Speed 2627.79 samples/sec   Loss 11.1863   LearningRate 0.0675   Epoch: 3   Global Step: 148120   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:13,483-Speed 2608.07 samples/sec   Loss 11.3325   LearningRate 0.0675   Epoch: 3   Global Step: 148130   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:17,380-Speed 2629.28 samples/sec   Loss 11.2392   LearningRate 0.0675   Epoch: 3   Global Step: 148140   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:21,284-Speed 2622.89 samples/sec   Loss 11.3009   LearningRate 0.0675   Epoch: 3   Global Step: 148150   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:21:25,219-Speed 2603.60 samples/sec   Loss 11.1720   LearningRate 0.0675   Epoch: 3   Global Step: 148160   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:21:29,102-Speed 2637.88 samples/sec   Loss 11.2496   LearningRate 0.0675   Epoch: 3   Global Step: 148170   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:33,004-Speed 2624.74 samples/sec   Loss 11.1808   LearningRate 0.0675   Epoch: 3   Global Step: 148180   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:36,898-Speed 2630.47 samples/sec   Loss 11.3508   LearningRate 0.0675   Epoch: 3   Global Step: 148190   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:40,792-Speed 2630.29 samples/sec   Loss 11.2602   LearningRate 0.0675   Epoch: 3   Global Step: 148200   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:44,691-Speed 2626.92 samples/sec   Loss 11.2166   LearningRate 0.0675   Epoch: 3   Global Step: 148210   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:48,588-Speed 2628.78 samples/sec   Loss 11.3948   LearningRate 0.0675   Epoch: 3   Global Step: 148220   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:21:52,468-Speed 2639.64 samples/sec   Loss 11.2095   LearningRate 0.0675   Epoch: 3   Global Step: 148230   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:21:56,364-Speed 2629.08 samples/sec   Loss 11.2067   LearningRate 0.0675   Epoch: 3   Global Step: 148240   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:00,263-Speed 2626.90 samples/sec   Loss 11.2218   LearningRate 0.0675   Epoch: 3   Global Step: 148250   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:04,162-Speed 2626.78 samples/sec   Loss 11.3320   LearningRate 0.0675   Epoch: 3   Global Step: 148260   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:08,097-Speed 2603.10 samples/sec   Loss 11.3767   LearningRate 0.0674   Epoch: 3   Global Step: 148270   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:12,003-Speed 2622.88 samples/sec   Loss 11.1937   LearningRate 0.0674   Epoch: 3   Global Step: 148280   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:15,948-Speed 2596.32 samples/sec   Loss 11.2512   LearningRate 0.0674   Epoch: 3   Global Step: 148290   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:19,863-Speed 2616.59 samples/sec   Loss 11.3114   LearningRate 0.0674   Epoch: 3   Global Step: 148300   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:23,771-Speed 2620.80 samples/sec   Loss 11.3157   LearningRate 0.0674   Epoch: 3   Global Step: 148310   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:27,683-Speed 2618.40 samples/sec   Loss 11.1930   LearningRate 0.0674   Epoch: 3   Global Step: 148320   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:22:31,589-Speed 2621.99 samples/sec   Loss 11.3112   LearningRate 0.0674   Epoch: 3   Global Step: 148330   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:22:35,485-Speed 2628.92 samples/sec   Loss 11.2844   LearningRate 0.0674   Epoch: 3   Global Step: 148340   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:22:39,395-Speed 2619.66 samples/sec   Loss 11.1759   LearningRate 0.0674   Epoch: 3   Global Step: 148350   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:22:43,300-Speed 2622.83 samples/sec   Loss 11.1560   LearningRate 0.0674   Epoch: 3   Global Step: 148360   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:22:47,204-Speed 2623.57 samples/sec   Loss 11.2356   LearningRate 0.0674   Epoch: 3   Global Step: 148370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:22:51,100-Speed 2629.56 samples/sec   Loss 11.2493   LearningRate 0.0674   Epoch: 3   Global Step: 148380   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:22:54,996-Speed 2628.60 samples/sec   Loss 11.3436   LearningRate 0.0674   Epoch: 3   Global Step: 148390   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:22:58,914-Speed 2614.28 samples/sec   Loss 11.2941   LearningRate 0.0674   Epoch: 3   Global Step: 148400   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:02,818-Speed 2623.60 samples/sec   Loss 11.4579   LearningRate 0.0674   Epoch: 3   Global Step: 148410   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:06,711-Speed 2630.78 samples/sec   Loss 11.2315   LearningRate 0.0674   Epoch: 3   Global Step: 148420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:10,606-Speed 2629.60 samples/sec   Loss 11.0127   LearningRate 0.0674   Epoch: 3   Global Step: 148430   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:23:14,502-Speed 2628.88 samples/sec   Loss 11.2043   LearningRate 0.0674   Epoch: 3   Global Step: 148440   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:23:18,404-Speed 2626.14 samples/sec   Loss 11.2275   LearningRate 0.0674   Epoch: 3   Global Step: 148450   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:23:22,299-Speed 2629.51 samples/sec   Loss 11.1839   LearningRate 0.0674   Epoch: 3   Global Step: 148460   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:23:26,195-Speed 2628.79 samples/sec   Loss 11.1506   LearningRate 0.0674   Epoch: 3   Global Step: 148470   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:23:30,096-Speed 2625.30 samples/sec   Loss 11.3451   LearningRate 0.0674   Epoch: 3   Global Step: 148480   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:23:33,979-Speed 2638.12 samples/sec   Loss 11.1499   LearningRate 0.0674   Epoch: 3   Global Step: 148490   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:37,870-Speed 2631.72 samples/sec   Loss 11.3024   LearningRate 0.0674   Epoch: 3   Global Step: 148500   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:41,767-Speed 2628.79 samples/sec   Loss 11.2164   LearningRate 0.0674   Epoch: 3   Global Step: 148510   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:45,669-Speed 2625.13 samples/sec   Loss 11.2351   LearningRate 0.0674   Epoch: 3   Global Step: 148520   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:49,569-Speed 2626.45 samples/sec   Loss 11.2583   LearningRate 0.0674   Epoch: 3   Global Step: 148530   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:53,462-Speed 2630.99 samples/sec   Loss 11.1214   LearningRate 0.0674   Epoch: 3   Global Step: 148540   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:23:57,353-Speed 2632.17 samples/sec   Loss 11.2412   LearningRate 0.0674   Epoch: 3   Global Step: 148550   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:24:01,251-Speed 2627.69 samples/sec   Loss 11.3874   LearningRate 0.0674   Epoch: 3   Global Step: 148560   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:24:05,096-Speed 2663.79 samples/sec   Loss 11.2341   LearningRate 0.0674   Epoch: 3   Global Step: 148570   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:08,995-Speed 2626.57 samples/sec   Loss 11.3096   LearningRate 0.0674   Epoch: 3   Global Step: 148580   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:12,899-Speed 2623.88 samples/sec   Loss 11.4962   LearningRate 0.0674   Epoch: 3   Global Step: 148590   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:16,799-Speed 2625.80 samples/sec   Loss 11.1962   LearningRate 0.0674   Epoch: 3   Global Step: 148600   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:20,701-Speed 2625.11 samples/sec   Loss 11.4817   LearningRate 0.0674   Epoch: 3   Global Step: 148610   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:24,601-Speed 2626.46 samples/sec   Loss 11.3998   LearningRate 0.0674   Epoch: 3   Global Step: 148620   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:28,503-Speed 2624.99 samples/sec   Loss 11.2013   LearningRate 0.0674   Epoch: 3   Global Step: 148630   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:32,405-Speed 2624.72 samples/sec   Loss 11.2708   LearningRate 0.0674   Epoch: 3   Global Step: 148640   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:36,388-Speed 2571.59 samples/sec   Loss 11.3011   LearningRate 0.0674   Epoch: 3   Global Step: 148650   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:40,390-Speed 2558.76 samples/sec   Loss 11.2079   LearningRate 0.0674   Epoch: 3   Global Step: 148660   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:24:44,408-Speed 2549.71 samples/sec   Loss 11.3193   LearningRate 0.0674   Epoch: 3   Global Step: 148670   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:24:48,311-Speed 2623.95 samples/sec   Loss 11.2873   LearningRate 0.0674   Epoch: 3   Global Step: 148680   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:24:52,207-Speed 2629.75 samples/sec   Loss 11.2130   LearningRate 0.0674   Epoch: 3   Global Step: 148690   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:24:56,108-Speed 2625.40 samples/sec   Loss 11.3460   LearningRate 0.0674   Epoch: 3   Global Step: 148700   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:00,002-Speed 2630.07 samples/sec   Loss 11.2346   LearningRate 0.0674   Epoch: 3   Global Step: 148710   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:03,904-Speed 2624.98 samples/sec   Loss 11.4171   LearningRate 0.0674   Epoch: 3   Global Step: 148720   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:07,816-Speed 2618.45 samples/sec   Loss 11.3762   LearningRate 0.0674   Epoch: 3   Global Step: 148730   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:11,712-Speed 2629.49 samples/sec   Loss 11.4111   LearningRate 0.0674   Epoch: 3   Global Step: 148740   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:15,612-Speed 2625.78 samples/sec   Loss 11.2425   LearningRate 0.0674   Epoch: 3   Global Step: 148750   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:19,520-Speed 2621.44 samples/sec   Loss 11.2509   LearningRate 0.0674   Epoch: 3   Global Step: 148760   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:23,406-Speed 2636.09 samples/sec   Loss 11.1529   LearningRate 0.0673   Epoch: 3   Global Step: 148770   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:25:27,300-Speed 2630.35 samples/sec   Loss 11.1064   LearningRate 0.0673   Epoch: 3   Global Step: 148780   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:25:31,199-Speed 2626.76 samples/sec   Loss 11.2624   LearningRate 0.0673   Epoch: 3   Global Step: 148790   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:25:35,108-Speed 2619.87 samples/sec   Loss 11.2600   LearningRate 0.0673   Epoch: 3   Global Step: 148800   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:25:39,001-Speed 2631.11 samples/sec   Loss 11.4188   LearningRate 0.0673   Epoch: 3   Global Step: 148810   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:25:42,927-Speed 2609.59 samples/sec   Loss 11.2670   LearningRate 0.0673   Epoch: 3   Global Step: 148820   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:25:46,851-Speed 2609.72 samples/sec   Loss 11.2676   LearningRate 0.0673   Epoch: 3   Global Step: 148830   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:25:50,733-Speed 2642.49 samples/sec   Loss 11.3357   LearningRate 0.0673   Epoch: 3   Global Step: 148840   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:54,622-Speed 2633.69 samples/sec   Loss 11.2700   LearningRate 0.0673   Epoch: 3   Global Step: 148850   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:25:58,534-Speed 2618.38 samples/sec   Loss 11.3101   LearningRate 0.0673   Epoch: 3   Global Step: 148860   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:02,451-Speed 2614.97 samples/sec   Loss 11.2806   LearningRate 0.0673   Epoch: 3   Global Step: 148870   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:06,355-Speed 2623.48 samples/sec   Loss 11.1791   LearningRate 0.0673   Epoch: 3   Global Step: 148880   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:10,255-Speed 2625.94 samples/sec   Loss 11.3550   LearningRate 0.0673   Epoch: 3   Global Step: 148890   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:14,168-Speed 2618.33 samples/sec   Loss 11.3085   LearningRate 0.0673   Epoch: 3   Global Step: 148900   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:18,110-Speed 2598.00 samples/sec   Loss 11.2584   LearningRate 0.0673   Epoch: 3   Global Step: 148910   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:22,007-Speed 2628.32 samples/sec   Loss 11.2910   LearningRate 0.0673   Epoch: 3   Global Step: 148920   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:25,904-Speed 2628.64 samples/sec   Loss 11.2716   LearningRate 0.0673   Epoch: 3   Global Step: 148930   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:26:29,810-Speed 2622.03 samples/sec   Loss 11.3054   LearningRate 0.0673   Epoch: 3   Global Step: 148940   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:26:33,721-Speed 2619.27 samples/sec   Loss 11.2482   LearningRate 0.0673   Epoch: 3   Global Step: 148950   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:26:37,628-Speed 2621.61 samples/sec   Loss 11.3159   LearningRate 0.0673   Epoch: 3   Global Step: 148960   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:26:41,520-Speed 2632.23 samples/sec   Loss 11.3510   LearningRate 0.0673   Epoch: 3   Global Step: 148970   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:26:45,412-Speed 2631.29 samples/sec   Loss 11.1523   LearningRate 0.0673   Epoch: 3   Global Step: 148980   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:26:49,329-Speed 2615.48 samples/sec   Loss 11.1632   LearningRate 0.0673   Epoch: 3   Global Step: 148990   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:26:53,239-Speed 2619.22 samples/sec   Loss 11.1516   LearningRate 0.0673   Epoch: 3   Global Step: 149000   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:26:57,157-Speed 2615.10 samples/sec   Loss 11.2781   LearningRate 0.0673   Epoch: 3   Global Step: 149010   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:27:01,064-Speed 2621.31 samples/sec   Loss 11.1255   LearningRate 0.0673   Epoch: 3   Global Step: 149020   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:27:04,965-Speed 2625.55 samples/sec   Loss 11.4366   LearningRate 0.0673   Epoch: 3   Global Step: 149030   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:27:08,888-Speed 2610.53 samples/sec   Loss 11.4131   LearningRate 0.0673   Epoch: 3   Global Step: 149040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:27:12,788-Speed 2626.66 samples/sec   Loss 11.0699   LearningRate 0.0673   Epoch: 3   Global Step: 149050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:27:16,663-Speed 2643.82 samples/sec   Loss 11.2967   LearningRate 0.0673   Epoch: 3   Global Step: 149060   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:27:20,536-Speed 2645.16 samples/sec   Loss 11.2360   LearningRate 0.0673   Epoch: 3   Global Step: 149070   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:24,433-Speed 2627.77 samples/sec   Loss 11.1698   LearningRate 0.0673   Epoch: 3   Global Step: 149080   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:28,336-Speed 2625.48 samples/sec   Loss 11.1338   LearningRate 0.0673   Epoch: 3   Global Step: 149090   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:32,234-Speed 2627.49 samples/sec   Loss 11.2707   LearningRate 0.0673   Epoch: 3   Global Step: 149100   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:36,132-Speed 2627.52 samples/sec   Loss 11.1545   LearningRate 0.0673   Epoch: 3   Global Step: 149110   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:40,036-Speed 2623.12 samples/sec   Loss 11.1665   LearningRate 0.0673   Epoch: 3   Global Step: 149120   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:43,937-Speed 2626.18 samples/sec   Loss 11.1816   LearningRate 0.0673   Epoch: 3   Global Step: 149130   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:47,829-Speed 2631.65 samples/sec   Loss 11.1888   LearningRate 0.0673   Epoch: 3   Global Step: 149140   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:51,717-Speed 2634.76 samples/sec   Loss 11.3147   LearningRate 0.0673   Epoch: 3   Global Step: 149150   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:55,618-Speed 2625.13 samples/sec   Loss 11.1450   LearningRate 0.0673   Epoch: 3   Global Step: 149160   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 12:27:59,509-Speed 2632.69 samples/sec   Loss 11.0684   LearningRate 0.0673   Epoch: 3   Global Step: 149170   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:03,403-Speed 2630.33 samples/sec   Loss 11.2868   LearningRate 0.0673   Epoch: 3   Global Step: 149180   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:07,294-Speed 2632.14 samples/sec   Loss 11.4429   LearningRate 0.0673   Epoch: 3   Global Step: 149190   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:11,249-Speed 2589.15 samples/sec   Loss 11.2643   LearningRate 0.0673   Epoch: 3   Global Step: 149200   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:15,138-Speed 2633.49 samples/sec   Loss 11.3004   LearningRate 0.0673   Epoch: 3   Global Step: 149210   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:19,044-Speed 2622.72 samples/sec   Loss 11.2726   LearningRate 0.0673   Epoch: 3   Global Step: 149220   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:22,953-Speed 2620.17 samples/sec   Loss 11.1921   LearningRate 0.0673   Epoch: 3   Global Step: 149230   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:26,864-Speed 2619.54 samples/sec   Loss 11.0629   LearningRate 0.0673   Epoch: 3   Global Step: 149240   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:30,762-Speed 2627.39 samples/sec   Loss 11.0748   LearningRate 0.0673   Epoch: 3   Global Step: 149250   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:34,683-Speed 2612.45 samples/sec   Loss 11.2977   LearningRate 0.0673   Epoch: 3   Global Step: 149260   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:28:38,578-Speed 2629.50 samples/sec   Loss 11.2025   LearningRate 0.0673   Epoch: 3   Global Step: 149270   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:28:42,543-Speed 2583.73 samples/sec   Loss 11.1572   LearningRate 0.0672   Epoch: 3   Global Step: 149280   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:28:46,524-Speed 2572.85 samples/sec   Loss 11.0558   LearningRate 0.0672   Epoch: 3   Global Step: 149290   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:28:50,455-Speed 2605.61 samples/sec   Loss 11.2233   LearningRate 0.0672   Epoch: 3   Global Step: 149300   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:28:54,363-Speed 2620.69 samples/sec   Loss 11.0965   LearningRate 0.0672   Epoch: 3   Global Step: 149310   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:28:58,270-Speed 2622.43 samples/sec   Loss 11.1488   LearningRate 0.0672   Epoch: 3   Global Step: 149320   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:29:02,185-Speed 2615.70 samples/sec   Loss 11.0926   LearningRate 0.0672   Epoch: 3   Global Step: 149330   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:29:06,083-Speed 2627.87 samples/sec   Loss 11.1771   LearningRate 0.0672   Epoch: 3   Global Step: 149340   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:29:09,981-Speed 2627.88 samples/sec   Loss 11.1843   LearningRate 0.0672   Epoch: 3   Global Step: 149350   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:29:13,877-Speed 2628.95 samples/sec   Loss 11.2019   LearningRate 0.0672   Epoch: 3   Global Step: 149360   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:29:17,775-Speed 2627.15 samples/sec   Loss 11.2119   LearningRate 0.0672   Epoch: 3   Global Step: 149370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:21,706-Speed 2606.49 samples/sec   Loss 11.2318   LearningRate 0.0672   Epoch: 3   Global Step: 149380   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:25,611-Speed 2622.81 samples/sec   Loss 11.1882   LearningRate 0.0672   Epoch: 3   Global Step: 149390   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:29,507-Speed 2628.94 samples/sec   Loss 11.1599   LearningRate 0.0672   Epoch: 3   Global Step: 149400   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:33,411-Speed 2624.05 samples/sec   Loss 11.2349   LearningRate 0.0672   Epoch: 3   Global Step: 149410   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:37,304-Speed 2630.77 samples/sec   Loss 11.3685   LearningRate 0.0672   Epoch: 3   Global Step: 149420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:41,227-Speed 2610.71 samples/sec   Loss 11.2356   LearningRate 0.0672   Epoch: 3   Global Step: 149430   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:45,113-Speed 2635.78 samples/sec   Loss 11.0778   LearningRate 0.0672   Epoch: 3   Global Step: 149440   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:49,004-Speed 2632.43 samples/sec   Loss 11.1142   LearningRate 0.0672   Epoch: 3   Global Step: 149450   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:52,898-Speed 2630.39 samples/sec   Loss 11.2669   LearningRate 0.0672   Epoch: 3   Global Step: 149460   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:29:56,790-Speed 2631.71 samples/sec   Loss 11.2282   LearningRate 0.0672   Epoch: 3   Global Step: 149470   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:30:00,692-Speed 2625.03 samples/sec   Loss 11.1693   LearningRate 0.0672   Epoch: 3   Global Step: 149480   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:30:04,585-Speed 2630.91 samples/sec   Loss 11.1908   LearningRate 0.0672   Epoch: 3   Global Step: 149490   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:30:08,517-Speed 2604.93 samples/sec   Loss 11.3058   LearningRate 0.0672   Epoch: 3   Global Step: 149500   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:30:12,403-Speed 2636.00 samples/sec   Loss 11.2068   LearningRate 0.0672   Epoch: 3   Global Step: 149510   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:30:16,301-Speed 2627.90 samples/sec   Loss 11.1962   LearningRate 0.0672   Epoch: 3   Global Step: 149520   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:30:20,209-Speed 2620.71 samples/sec   Loss 11.3849   LearningRate 0.0672   Epoch: 3   Global Step: 149530   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:30:24,088-Speed 2640.91 samples/sec   Loss 11.1054   LearningRate 0.0672   Epoch: 3   Global Step: 149540   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:27,975-Speed 2634.60 samples/sec   Loss 11.1353   LearningRate 0.0672   Epoch: 3   Global Step: 149550   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:31,869-Speed 2630.57 samples/sec   Loss 11.3026   LearningRate 0.0672   Epoch: 3   Global Step: 149560   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:35,762-Speed 2630.51 samples/sec   Loss 11.1207   LearningRate 0.0672   Epoch: 3   Global Step: 149570   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:39,655-Speed 2631.68 samples/sec   Loss 11.2394   LearningRate 0.0672   Epoch: 3   Global Step: 149580   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:43,552-Speed 2628.24 samples/sec   Loss 11.2139   LearningRate 0.0672   Epoch: 3   Global Step: 149590   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:47,446-Speed 2629.82 samples/sec   Loss 11.2326   LearningRate 0.0672   Epoch: 3   Global Step: 149600   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:51,344-Speed 2628.08 samples/sec   Loss 11.2401   LearningRate 0.0672   Epoch: 3   Global Step: 149610   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:55,248-Speed 2623.34 samples/sec   Loss 11.1541   LearningRate 0.0672   Epoch: 3   Global Step: 149620   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:30:59,142-Speed 2630.39 samples/sec   Loss 11.2465   LearningRate 0.0672   Epoch: 3   Global Step: 149630   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:31:03,034-Speed 2631.72 samples/sec   Loss 11.3367   LearningRate 0.0672   Epoch: 3   Global Step: 149640   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:06,930-Speed 2629.40 samples/sec   Loss 11.2130   LearningRate 0.0672   Epoch: 3   Global Step: 149650   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:10,842-Speed 2617.93 samples/sec   Loss 11.3370   LearningRate 0.0672   Epoch: 3   Global Step: 149660   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:14,931-Speed 2504.72 samples/sec   Loss 11.2858   LearningRate 0.0672   Epoch: 3   Global Step: 149670   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:18,832-Speed 2626.25 samples/sec   Loss 11.2174   LearningRate 0.0672   Epoch: 3   Global Step: 149680   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:22,731-Speed 2626.64 samples/sec   Loss 11.2369   LearningRate 0.0672   Epoch: 3   Global Step: 149690   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:26,661-Speed 2606.51 samples/sec   Loss 11.1978   LearningRate 0.0672   Epoch: 3   Global Step: 149700   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:30,564-Speed 2624.66 samples/sec   Loss 11.2191   LearningRate 0.0672   Epoch: 3   Global Step: 149710   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:34,456-Speed 2631.53 samples/sec   Loss 11.3366   LearningRate 0.0672   Epoch: 3   Global Step: 149720   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:38,350-Speed 2629.78 samples/sec   Loss 11.2383   LearningRate 0.0672   Epoch: 3   Global Step: 149730   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:31:42,291-Speed 2599.39 samples/sec   Loss 11.1974   LearningRate 0.0672   Epoch: 3   Global Step: 149740   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:31:46,186-Speed 2629.82 samples/sec   Loss 11.2605   LearningRate 0.0672   Epoch: 3   Global Step: 149750   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:31:50,119-Speed 2604.57 samples/sec   Loss 11.2805   LearningRate 0.0672   Epoch: 3   Global Step: 149760   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:31:54,026-Speed 2621.77 samples/sec   Loss 11.3401   LearningRate 0.0672   Epoch: 3   Global Step: 149770   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:31:57,934-Speed 2621.23 samples/sec   Loss 11.3692   LearningRate 0.0671   Epoch: 3   Global Step: 149780   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:32:01,827-Speed 2630.68 samples/sec   Loss 11.1901   LearningRate 0.0671   Epoch: 3   Global Step: 149790   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:32:05,736-Speed 2620.40 samples/sec   Loss 11.1693   LearningRate 0.0671   Epoch: 3   Global Step: 149800   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:32:09,629-Speed 2631.08 samples/sec   Loss 11.0744   LearningRate 0.0671   Epoch: 3   Global Step: 149810   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:32:13,524-Speed 2630.13 samples/sec   Loss 11.3814   LearningRate 0.0671   Epoch: 3   Global Step: 149820   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:32:17,409-Speed 2636.01 samples/sec   Loss 11.3633   LearningRate 0.0671   Epoch: 3   Global Step: 149830   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:21,308-Speed 2627.30 samples/sec   Loss 11.1319   LearningRate 0.0671   Epoch: 3   Global Step: 149840   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:25,201-Speed 2631.34 samples/sec   Loss 11.2255   LearningRate 0.0671   Epoch: 3   Global Step: 149850   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:29,108-Speed 2621.69 samples/sec   Loss 11.2961   LearningRate 0.0671   Epoch: 3   Global Step: 149860   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:33,009-Speed 2625.46 samples/sec   Loss 11.3622   LearningRate 0.0671   Epoch: 3   Global Step: 149870   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:36,904-Speed 2629.79 samples/sec   Loss 11.2460   LearningRate 0.0671   Epoch: 3   Global Step: 149880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:40,798-Speed 2630.15 samples/sec   Loss 11.1109   LearningRate 0.0671   Epoch: 3   Global Step: 149890   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:44,693-Speed 2630.06 samples/sec   Loss 11.1618   LearningRate 0.0671   Epoch: 3   Global Step: 149900   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:48,586-Speed 2631.18 samples/sec   Loss 11.3483   LearningRate 0.0671   Epoch: 3   Global Step: 149910   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:52,485-Speed 2627.05 samples/sec   Loss 11.3089   LearningRate 0.0671   Epoch: 3   Global Step: 149920   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:32:56,386-Speed 2625.67 samples/sec   Loss 11.3132   LearningRate 0.0671   Epoch: 3   Global Step: 149930   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:33:00,281-Speed 2629.91 samples/sec   Loss 11.0319   LearningRate 0.0671   Epoch: 3   Global Step: 149940   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:33:04,160-Speed 2639.82 samples/sec   Loss 11.2636   LearningRate 0.0671   Epoch: 3   Global Step: 149950   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:33:08,058-Speed 2627.84 samples/sec   Loss 11.2228   LearningRate 0.0671   Epoch: 3   Global Step: 149960   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:33:11,950-Speed 2632.02 samples/sec   Loss 11.3525   LearningRate 0.0671   Epoch: 3   Global Step: 149970   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:33:15,849-Speed 2627.21 samples/sec   Loss 11.2732   LearningRate 0.0671   Epoch: 3   Global Step: 149980   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:33:19,742-Speed 2631.30 samples/sec   Loss 11.2130   LearningRate 0.0671   Epoch: 3   Global Step: 149990   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:33:23,640-Speed 2627.59 samples/sec   Loss 11.1958   LearningRate 0.0671   Epoch: 3   Global Step: 150000   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:34:07,413-[lfw][150000]XNorm: 23.773956
Training: 2022-04-13 12:34:07,414-[lfw][150000]Accuracy-Flip: 0.99750+-0.00271
Training: 2022-04-13 12:34:07,415-[lfw][150000]Accuracy-Highest: 0.99783
Training: 2022-04-13 12:34:57,644-[cfp_fp][150000]XNorm: 21.455639
Training: 2022-04-13 12:34:57,645-[cfp_fp][150000]Accuracy-Flip: 0.98100+-0.00626
Training: 2022-04-13 12:34:57,646-[cfp_fp][150000]Accuracy-Highest: 0.98100
Training: 2022-04-13 12:35:41,139-[agedb_30][150000]XNorm: 23.533388
Training: 2022-04-13 12:35:41,140-[agedb_30][150000]Accuracy-Flip: 0.97000+-0.00645
Training: 2022-04-13 12:35:41,141-[agedb_30][150000]Accuracy-Highest: 0.97000
Training: 2022-04-13 12:35:45,007-Speed 72.44 samples/sec   Loss 11.2658   LearningRate 0.0671   Epoch: 3   Global Step: 150010   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:35:48,877-Speed 2646.72 samples/sec   Loss 11.3336   LearningRate 0.0671   Epoch: 3   Global Step: 150020   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:35:52,760-Speed 2638.03 samples/sec   Loss 11.4217   LearningRate 0.0671   Epoch: 3   Global Step: 150030   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:35:56,636-Speed 2642.86 samples/sec   Loss 11.3243   LearningRate 0.0671   Epoch: 3   Global Step: 150040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:00,511-Speed 2643.62 samples/sec   Loss 11.3368   LearningRate 0.0671   Epoch: 3   Global Step: 150050   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:36:04,392-Speed 2640.03 samples/sec   Loss 11.3843   LearningRate 0.0671   Epoch: 3   Global Step: 150060   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:36:08,258-Speed 2649.38 samples/sec   Loss 11.1806   LearningRate 0.0671   Epoch: 3   Global Step: 150070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:12,135-Speed 2641.39 samples/sec   Loss 11.1321   LearningRate 0.0671   Epoch: 3   Global Step: 150080   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:16,095-Speed 2587.12 samples/sec   Loss 11.3816   LearningRate 0.0671   Epoch: 3   Global Step: 150090   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:19,997-Speed 2625.27 samples/sec   Loss 11.2845   LearningRate 0.0671   Epoch: 3   Global Step: 150100   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:23,884-Speed 2634.58 samples/sec   Loss 11.0953   LearningRate 0.0671   Epoch: 3   Global Step: 150110   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:27,771-Speed 2634.84 samples/sec   Loss 11.2218   LearningRate 0.0671   Epoch: 3   Global Step: 150120   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:31,656-Speed 2637.31 samples/sec   Loss 11.1449   LearningRate 0.0671   Epoch: 3   Global Step: 150130   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:35,543-Speed 2634.58 samples/sec   Loss 11.2068   LearningRate 0.0671   Epoch: 3   Global Step: 150140   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:39,443-Speed 2625.98 samples/sec   Loss 11.2226   LearningRate 0.0671   Epoch: 3   Global Step: 150150   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:43,332-Speed 2634.29 samples/sec   Loss 11.2967   LearningRate 0.0671   Epoch: 3   Global Step: 150160   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:36:47,241-Speed 2620.06 samples/sec   Loss 11.2505   LearningRate 0.0671   Epoch: 3   Global Step: 150170   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:36:51,140-Speed 2627.53 samples/sec   Loss 11.3009   LearningRate 0.0671   Epoch: 3   Global Step: 150180   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:36:55,029-Speed 2634.04 samples/sec   Loss 11.2855   LearningRate 0.0671   Epoch: 3   Global Step: 150190   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:36:58,886-Speed 2655.06 samples/sec   Loss 11.1107   LearningRate 0.0671   Epoch: 3   Global Step: 150200   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:02,786-Speed 2626.32 samples/sec   Loss 10.9861   LearningRate 0.0671   Epoch: 3   Global Step: 150210   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:06,674-Speed 2634.50 samples/sec   Loss 11.2671   LearningRate 0.0671   Epoch: 3   Global Step: 150220   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:10,570-Speed 2629.42 samples/sec   Loss 11.1470   LearningRate 0.0671   Epoch: 3   Global Step: 150230   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:14,475-Speed 2622.53 samples/sec   Loss 11.2473   LearningRate 0.0671   Epoch: 3   Global Step: 150240   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:18,376-Speed 2625.56 samples/sec   Loss 11.1963   LearningRate 0.0671   Epoch: 3   Global Step: 150250   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:22,271-Speed 2629.70 samples/sec   Loss 11.1708   LearningRate 0.0671   Epoch: 3   Global Step: 150260   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:26,169-Speed 2628.01 samples/sec   Loss 11.1549   LearningRate 0.0671   Epoch: 3   Global Step: 150270   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:30,094-Speed 2609.37 samples/sec   Loss 11.1859   LearningRate 0.0671   Epoch: 3   Global Step: 150280   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:34,014-Speed 2613.27 samples/sec   Loss 11.2560   LearningRate 0.0670   Epoch: 3   Global Step: 150290   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:37:37,918-Speed 2623.04 samples/sec   Loss 11.2425   LearningRate 0.0670   Epoch: 3   Global Step: 150300   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:37:41,888-Speed 2580.77 samples/sec   Loss 11.1935   LearningRate 0.0670   Epoch: 3   Global Step: 150310   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:37:45,783-Speed 2629.64 samples/sec   Loss 11.1098   LearningRate 0.0670   Epoch: 3   Global Step: 150320   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:37:49,688-Speed 2622.77 samples/sec   Loss 11.0595   LearningRate 0.0670   Epoch: 3   Global Step: 150330   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:37:53,608-Speed 2612.98 samples/sec   Loss 11.2660   LearningRate 0.0670   Epoch: 3   Global Step: 150340   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:37:57,520-Speed 2618.38 samples/sec   Loss 11.2344   LearningRate 0.0670   Epoch: 3   Global Step: 150350   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:01,445-Speed 2609.73 samples/sec   Loss 11.1905   LearningRate 0.0670   Epoch: 3   Global Step: 150360   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:05,374-Speed 2606.60 samples/sec   Loss 11.1843   LearningRate 0.0670   Epoch: 3   Global Step: 150370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:09,268-Speed 2630.61 samples/sec   Loss 11.1058   LearningRate 0.0670   Epoch: 3   Global Step: 150380   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:13,163-Speed 2629.95 samples/sec   Loss 11.2435   LearningRate 0.0670   Epoch: 3   Global Step: 150390   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:17,055-Speed 2632.09 samples/sec   Loss 11.3649   LearningRate 0.0670   Epoch: 3   Global Step: 150400   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:38:20,942-Speed 2634.82 samples/sec   Loss 11.1204   LearningRate 0.0670   Epoch: 3   Global Step: 150410   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:24,850-Speed 2621.27 samples/sec   Loss 11.2142   LearningRate 0.0670   Epoch: 3   Global Step: 150420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:28,759-Speed 2620.12 samples/sec   Loss 11.2280   LearningRate 0.0670   Epoch: 3   Global Step: 150430   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:32,659-Speed 2626.06 samples/sec   Loss 11.2499   LearningRate 0.0670   Epoch: 3   Global Step: 150440   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:36,564-Speed 2622.88 samples/sec   Loss 11.1691   LearningRate 0.0670   Epoch: 3   Global Step: 150450   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:40,461-Speed 2628.72 samples/sec   Loss 11.1873   LearningRate 0.0670   Epoch: 3   Global Step: 150460   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:44,353-Speed 2631.39 samples/sec   Loss 11.2816   LearningRate 0.0670   Epoch: 3   Global Step: 150470   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:48,250-Speed 2628.53 samples/sec   Loss 11.0650   LearningRate 0.0670   Epoch: 3   Global Step: 150480   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:52,140-Speed 2632.73 samples/sec   Loss 11.1118   LearningRate 0.0670   Epoch: 3   Global Step: 150490   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:56,075-Speed 2603.05 samples/sec   Loss 11.3001   LearningRate 0.0670   Epoch: 3   Global Step: 150500   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:38:59,974-Speed 2627.30 samples/sec   Loss 11.3237   LearningRate 0.0670   Epoch: 3   Global Step: 150510   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:39:03,877-Speed 2624.51 samples/sec   Loss 11.2472   LearningRate 0.0670   Epoch: 3   Global Step: 150520   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:39:07,774-Speed 2628.11 samples/sec   Loss 11.2070   LearningRate 0.0670   Epoch: 3   Global Step: 150530   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:39:11,690-Speed 2616.29 samples/sec   Loss 11.1619   LearningRate 0.0670   Epoch: 3   Global Step: 150540   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:15,631-Speed 2598.43 samples/sec   Loss 11.1989   LearningRate 0.0670   Epoch: 3   Global Step: 150550   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:19,543-Speed 2619.04 samples/sec   Loss 11.3531   LearningRate 0.0670   Epoch: 3   Global Step: 150560   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:23,450-Speed 2621.50 samples/sec   Loss 11.2000   LearningRate 0.0670   Epoch: 3   Global Step: 150570   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:27,356-Speed 2621.91 samples/sec   Loss 11.2943   LearningRate 0.0670   Epoch: 3   Global Step: 150580   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:31,249-Speed 2631.58 samples/sec   Loss 11.2218   LearningRate 0.0670   Epoch: 3   Global Step: 150590   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:35,138-Speed 2633.52 samples/sec   Loss 11.2876   LearningRate 0.0670   Epoch: 3   Global Step: 150600   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:39,029-Speed 2632.10 samples/sec   Loss 11.2725   LearningRate 0.0670   Epoch: 3   Global Step: 150610   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:42,920-Speed 2632.25 samples/sec   Loss 11.2155   LearningRate 0.0670   Epoch: 3   Global Step: 150620   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:46,816-Speed 2628.72 samples/sec   Loss 11.3199   LearningRate 0.0670   Epoch: 3   Global Step: 150630   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:50,709-Speed 2631.53 samples/sec   Loss 11.2476   LearningRate 0.0670   Epoch: 3   Global Step: 150640   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:39:54,584-Speed 2643.24 samples/sec   Loss 11.2546   LearningRate 0.0670   Epoch: 3   Global Step: 150650   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:39:58,475-Speed 2632.56 samples/sec   Loss 11.2389   LearningRate 0.0670   Epoch: 3   Global Step: 150660   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:02,371-Speed 2629.08 samples/sec   Loss 11.1580   LearningRate 0.0670   Epoch: 3   Global Step: 150670   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:06,262-Speed 2631.96 samples/sec   Loss 11.1848   LearningRate 0.0670   Epoch: 3   Global Step: 150680   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:10,156-Speed 2630.58 samples/sec   Loss 11.3783   LearningRate 0.0670   Epoch: 3   Global Step: 150690   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:14,055-Speed 2627.10 samples/sec   Loss 11.1968   LearningRate 0.0670   Epoch: 3   Global Step: 150700   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:17,950-Speed 2629.47 samples/sec   Loss 11.2736   LearningRate 0.0670   Epoch: 3   Global Step: 150710   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:21,845-Speed 2630.19 samples/sec   Loss 11.3500   LearningRate 0.0670   Epoch: 3   Global Step: 150720   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:25,741-Speed 2628.88 samples/sec   Loss 11.2064   LearningRate 0.0670   Epoch: 3   Global Step: 150730   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:29,635-Speed 2630.48 samples/sec   Loss 11.3264   LearningRate 0.0670   Epoch: 3   Global Step: 150740   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:40:33,540-Speed 2622.58 samples/sec   Loss 11.1431   LearningRate 0.0670   Epoch: 3   Global Step: 150750   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:40:37,446-Speed 2622.47 samples/sec   Loss 11.1300   LearningRate 0.0670   Epoch: 3   Global Step: 150760   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:40:41,353-Speed 2621.27 samples/sec   Loss 11.3382   LearningRate 0.0670   Epoch: 3   Global Step: 150770   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:40:45,246-Speed 2630.56 samples/sec   Loss 11.1434   LearningRate 0.0670   Epoch: 3   Global Step: 150780   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:40:49,105-Speed 2654.11 samples/sec   Loss 11.2413   LearningRate 0.0670   Epoch: 3   Global Step: 150790   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:40:53,000-Speed 2630.35 samples/sec   Loss 11.9287   LearningRate 0.0669   Epoch: 3   Global Step: 150800   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:40:56,902-Speed 2624.68 samples/sec   Loss 11.6009   LearningRate 0.0669   Epoch: 3   Global Step: 150810   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:00,796-Speed 2630.88 samples/sec   Loss 11.2951   LearningRate 0.0669   Epoch: 3   Global Step: 150820   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:04,695-Speed 2626.66 samples/sec   Loss 11.3713   LearningRate 0.0669   Epoch: 3   Global Step: 150830   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:08,595-Speed 2626.46 samples/sec   Loss 11.2565   LearningRate 0.0669   Epoch: 3   Global Step: 150840   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:12,498-Speed 2624.20 samples/sec   Loss 11.1790   LearningRate 0.0669   Epoch: 3   Global Step: 150850   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:16,413-Speed 2615.89 samples/sec   Loss 11.2375   LearningRate 0.0669   Epoch: 3   Global Step: 150860   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:20,348-Speed 2602.96 samples/sec   Loss 11.2973   LearningRate 0.0669   Epoch: 3   Global Step: 150870   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:24,246-Speed 2628.32 samples/sec   Loss 11.1466   LearningRate 0.0669   Epoch: 3   Global Step: 150880   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 12:41:28,153-Speed 2621.57 samples/sec   Loss 11.2189   LearningRate 0.0669   Epoch: 3   Global Step: 150890   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:32,054-Speed 2625.86 samples/sec   Loss 11.1747   LearningRate 0.0669   Epoch: 3   Global Step: 150900   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:35,954-Speed 2626.45 samples/sec   Loss 11.2326   LearningRate 0.0669   Epoch: 3   Global Step: 150910   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:39,864-Speed 2619.36 samples/sec   Loss 11.3105   LearningRate 0.0669   Epoch: 3   Global Step: 150920   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:43,759-Speed 2629.77 samples/sec   Loss 11.1017   LearningRate 0.0669   Epoch: 3   Global Step: 150930   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:47,658-Speed 2626.69 samples/sec   Loss 11.2284   LearningRate 0.0669   Epoch: 3   Global Step: 150940   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:51,552-Speed 2630.47 samples/sec   Loss 11.0929   LearningRate 0.0669   Epoch: 3   Global Step: 150950   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:55,476-Speed 2609.95 samples/sec   Loss 11.2216   LearningRate 0.0669   Epoch: 3   Global Step: 150960   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:41:59,393-Speed 2615.33 samples/sec   Loss 11.2697   LearningRate 0.0669   Epoch: 3   Global Step: 150970   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:42:03,293-Speed 2626.26 samples/sec   Loss 11.2435   LearningRate 0.0669   Epoch: 3   Global Step: 150980   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:42:07,205-Speed 2618.51 samples/sec   Loss 11.1684   LearningRate 0.0669   Epoch: 3   Global Step: 150990   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:11,101-Speed 2628.90 samples/sec   Loss 11.2546   LearningRate 0.0669   Epoch: 3   Global Step: 151000   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:14,993-Speed 2631.48 samples/sec   Loss 11.1903   LearningRate 0.0669   Epoch: 3   Global Step: 151010   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:18,898-Speed 2623.40 samples/sec   Loss 11.0984   LearningRate 0.0669   Epoch: 3   Global Step: 151020   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:22,800-Speed 2624.85 samples/sec   Loss 11.2372   LearningRate 0.0669   Epoch: 3   Global Step: 151030   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:26,691-Speed 2632.14 samples/sec   Loss 11.2275   LearningRate 0.0669   Epoch: 3   Global Step: 151040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:30,586-Speed 2630.08 samples/sec   Loss 11.0791   LearningRate 0.0669   Epoch: 3   Global Step: 151050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:34,476-Speed 2632.87 samples/sec   Loss 11.1524   LearningRate 0.0669   Epoch: 3   Global Step: 151060   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:38,370-Speed 2630.29 samples/sec   Loss 11.2594   LearningRate 0.0669   Epoch: 3   Global Step: 151070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:42,265-Speed 2629.48 samples/sec   Loss 11.3653   LearningRate 0.0669   Epoch: 3   Global Step: 151080   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:42:46,165-Speed 2626.60 samples/sec   Loss 11.2158   LearningRate 0.0669   Epoch: 3   Global Step: 151090   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:42:50,059-Speed 2630.35 samples/sec   Loss 11.0708   LearningRate 0.0669   Epoch: 3   Global Step: 151100   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:42:53,956-Speed 2628.62 samples/sec   Loss 11.1676   LearningRate 0.0669   Epoch: 3   Global Step: 151110   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:42:57,854-Speed 2627.67 samples/sec   Loss 11.3767   LearningRate 0.0669   Epoch: 3   Global Step: 151120   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:43:01,774-Speed 2612.95 samples/sec   Loss 11.2296   LearningRate 0.0669   Epoch: 3   Global Step: 151130   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:43:05,678-Speed 2623.36 samples/sec   Loss 11.2419   LearningRate 0.0669   Epoch: 3   Global Step: 151140   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:43:09,573-Speed 2629.64 samples/sec   Loss 11.1024   LearningRate 0.0669   Epoch: 3   Global Step: 151150   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:43:13,450-Speed 2641.96 samples/sec   Loss 11.2318   LearningRate 0.0669   Epoch: 3   Global Step: 151160   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:43:17,346-Speed 2628.82 samples/sec   Loss 11.3622   LearningRate 0.0669   Epoch: 3   Global Step: 151170   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:43:21,221-Speed 2643.46 samples/sec   Loss 11.1885   LearningRate 0.0669   Epoch: 3   Global Step: 151180   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:25,130-Speed 2619.85 samples/sec   Loss 11.1192   LearningRate 0.0669   Epoch: 3   Global Step: 151190   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:29,023-Speed 2631.59 samples/sec   Loss 11.2900   LearningRate 0.0669   Epoch: 3   Global Step: 151200   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:32,918-Speed 2629.68 samples/sec   Loss 11.1474   LearningRate 0.0669   Epoch: 3   Global Step: 151210   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:36,810-Speed 2631.70 samples/sec   Loss 11.2814   LearningRate 0.0669   Epoch: 3   Global Step: 151220   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:40,701-Speed 2631.74 samples/sec   Loss 11.1652   LearningRate 0.0669   Epoch: 3   Global Step: 151230   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:44,594-Speed 2630.93 samples/sec   Loss 11.3139   LearningRate 0.0669   Epoch: 3   Global Step: 151240   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:48,490-Speed 2630.21 samples/sec   Loss 11.1118   LearningRate 0.0669   Epoch: 3   Global Step: 151250   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:52,384-Speed 2629.71 samples/sec   Loss 11.1918   LearningRate 0.0669   Epoch: 3   Global Step: 151260   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:43:56,279-Speed 2630.49 samples/sec   Loss 11.2053   LearningRate 0.0669   Epoch: 3   Global Step: 151270   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:00,178-Speed 2626.83 samples/sec   Loss 11.0513   LearningRate 0.0669   Epoch: 3   Global Step: 151280   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:44:04,080-Speed 2625.03 samples/sec   Loss 11.2635   LearningRate 0.0669   Epoch: 3   Global Step: 151290   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:44:07,979-Speed 2626.32 samples/sec   Loss 11.2654   LearningRate 0.0669   Epoch: 3   Global Step: 151300   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:44:11,895-Speed 2616.08 samples/sec   Loss 11.2495   LearningRate 0.0668   Epoch: 3   Global Step: 151310   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:15,785-Speed 2632.93 samples/sec   Loss 11.1123   LearningRate 0.0668   Epoch: 3   Global Step: 151320   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:19,721-Speed 2602.64 samples/sec   Loss 11.1178   LearningRate 0.0668   Epoch: 3   Global Step: 151330   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:23,611-Speed 2633.16 samples/sec   Loss 11.1610   LearningRate 0.0668   Epoch: 3   Global Step: 151340   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:27,503-Speed 2633.86 samples/sec   Loss 11.2857   LearningRate 0.0668   Epoch: 3   Global Step: 151350   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:31,396-Speed 2630.74 samples/sec   Loss 11.1893   LearningRate 0.0668   Epoch: 3   Global Step: 151360   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:35,288-Speed 2631.36 samples/sec   Loss 11.0503   LearningRate 0.0668   Epoch: 3   Global Step: 151370   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:39,184-Speed 2629.05 samples/sec   Loss 11.2136   LearningRate 0.0668   Epoch: 3   Global Step: 151380   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:43,096-Speed 2617.78 samples/sec   Loss 11.1732   LearningRate 0.0668   Epoch: 3   Global Step: 151390   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:47,009-Speed 2618.14 samples/sec   Loss 11.2076   LearningRate 0.0668   Epoch: 3   Global Step: 151400   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:44:50,921-Speed 2618.59 samples/sec   Loss 11.3212   LearningRate 0.0668   Epoch: 3   Global Step: 151410   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:44:54,831-Speed 2618.86 samples/sec   Loss 11.0932   LearningRate 0.0668   Epoch: 3   Global Step: 151420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:44:58,727-Speed 2629.56 samples/sec   Loss 11.1054   LearningRate 0.0668   Epoch: 3   Global Step: 151430   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:02,623-Speed 2628.71 samples/sec   Loss 11.2769   LearningRate 0.0668   Epoch: 3   Global Step: 151440   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:06,516-Speed 2630.92 samples/sec   Loss 11.1636   LearningRate 0.0668   Epoch: 3   Global Step: 151450   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:10,409-Speed 2630.76 samples/sec   Loss 11.0719   LearningRate 0.0668   Epoch: 3   Global Step: 151460   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:14,305-Speed 2629.10 samples/sec   Loss 11.2080   LearningRate 0.0668   Epoch: 3   Global Step: 151470   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:18,208-Speed 2624.99 samples/sec   Loss 11.0664   LearningRate 0.0668   Epoch: 3   Global Step: 151480   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:22,110-Speed 2624.33 samples/sec   Loss 11.2349   LearningRate 0.0668   Epoch: 3   Global Step: 151490   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:26,009-Speed 2627.43 samples/sec   Loss 11.2000   LearningRate 0.0668   Epoch: 3   Global Step: 151500   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:29,906-Speed 2628.18 samples/sec   Loss 11.1942   LearningRate 0.0668   Epoch: 3   Global Step: 151510   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:45:33,777-Speed 2645.20 samples/sec   Loss 11.1060   LearningRate 0.0668   Epoch: 3   Global Step: 151520   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:37,670-Speed 2631.36 samples/sec   Loss 11.1539   LearningRate 0.0668   Epoch: 3   Global Step: 151530   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:41,571-Speed 2625.47 samples/sec   Loss 11.0251   LearningRate 0.0668   Epoch: 3   Global Step: 151540   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:45,475-Speed 2623.25 samples/sec   Loss 11.1792   LearningRate 0.0668   Epoch: 3   Global Step: 151550   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:49,382-Speed 2621.90 samples/sec   Loss 11.1662   LearningRate 0.0668   Epoch: 3   Global Step: 151560   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:53,284-Speed 2624.92 samples/sec   Loss 11.2041   LearningRate 0.0668   Epoch: 3   Global Step: 151570   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:45:57,196-Speed 2618.85 samples/sec   Loss 11.1114   LearningRate 0.0668   Epoch: 3   Global Step: 151580   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:01,089-Speed 2630.75 samples/sec   Loss 11.1029   LearningRate 0.0668   Epoch: 3   Global Step: 151590   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:05,000-Speed 2618.66 samples/sec   Loss 11.2332   LearningRate 0.0668   Epoch: 3   Global Step: 151600   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:08,893-Speed 2630.49 samples/sec   Loss 11.0893   LearningRate 0.0668   Epoch: 3   Global Step: 151610   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:12,816-Speed 2611.38 samples/sec   Loss 11.2141   LearningRate 0.0668   Epoch: 3   Global Step: 151620   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:46:16,701-Speed 2636.21 samples/sec   Loss 11.1309   LearningRate 0.0668   Epoch: 3   Global Step: 151630   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:20,604-Speed 2624.20 samples/sec   Loss 11.3117   LearningRate 0.0668   Epoch: 3   Global Step: 151640   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:24,614-Speed 2554.24 samples/sec   Loss 11.2298   LearningRate 0.0668   Epoch: 3   Global Step: 151650   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:28,527-Speed 2618.12 samples/sec   Loss 11.1459   LearningRate 0.0668   Epoch: 3   Global Step: 151660   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:32,430-Speed 2624.20 samples/sec   Loss 11.1483   LearningRate 0.0668   Epoch: 3   Global Step: 151670   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:36,324-Speed 2630.17 samples/sec   Loss 11.1465   LearningRate 0.0668   Epoch: 3   Global Step: 151680   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:40,229-Speed 2622.80 samples/sec   Loss 11.1426   LearningRate 0.0668   Epoch: 3   Global Step: 151690   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:44,122-Speed 2632.01 samples/sec   Loss 11.1959   LearningRate 0.0668   Epoch: 3   Global Step: 151700   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:48,014-Speed 2631.74 samples/sec   Loss 11.2421   LearningRate 0.0668   Epoch: 3   Global Step: 151710   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:51,912-Speed 2627.52 samples/sec   Loss 11.3525   LearningRate 0.0668   Epoch: 3   Global Step: 151720   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:46:55,807-Speed 2629.52 samples/sec   Loss 11.1625   LearningRate 0.0668   Epoch: 3   Global Step: 151730   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:46:59,702-Speed 2630.18 samples/sec   Loss 11.3711   LearningRate 0.0668   Epoch: 3   Global Step: 151740   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:47:03,597-Speed 2629.47 samples/sec   Loss 11.2449   LearningRate 0.0668   Epoch: 3   Global Step: 151750   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:47:07,499-Speed 2625.15 samples/sec   Loss 11.2041   LearningRate 0.0668   Epoch: 3   Global Step: 151760   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:47:11,519-Speed 2548.20 samples/sec   Loss 11.2124   LearningRate 0.0668   Epoch: 3   Global Step: 151770   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:15,468-Speed 2593.31 samples/sec   Loss 11.3074   LearningRate 0.0668   Epoch: 3   Global Step: 151780   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:19,365-Speed 2628.21 samples/sec   Loss 11.2162   LearningRate 0.0668   Epoch: 3   Global Step: 151790   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:23,260-Speed 2629.67 samples/sec   Loss 11.2146   LearningRate 0.0668   Epoch: 3   Global Step: 151800   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:27,153-Speed 2631.29 samples/sec   Loss 11.2773   LearningRate 0.0667   Epoch: 3   Global Step: 151810   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:31,048-Speed 2629.38 samples/sec   Loss 11.2531   LearningRate 0.0667   Epoch: 3   Global Step: 151820   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:34,950-Speed 2625.27 samples/sec   Loss 11.1653   LearningRate 0.0667   Epoch: 3   Global Step: 151830   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:38,854-Speed 2622.92 samples/sec   Loss 11.2518   LearningRate 0.0667   Epoch: 3   Global Step: 151840   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:42,756-Speed 2625.32 samples/sec   Loss 11.0536   LearningRate 0.0667   Epoch: 3   Global Step: 151850   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:46,666-Speed 2619.94 samples/sec   Loss 11.1362   LearningRate 0.0667   Epoch: 3   Global Step: 151860   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:50,546-Speed 2639.76 samples/sec   Loss 11.0961   LearningRate 0.0667   Epoch: 3   Global Step: 151870   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:54,443-Speed 2628.03 samples/sec   Loss 11.0880   LearningRate 0.0667   Epoch: 3   Global Step: 151880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:47:58,341-Speed 2627.78 samples/sec   Loss 11.1794   LearningRate 0.0667   Epoch: 3   Global Step: 151890   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:02,333-Speed 2565.44 samples/sec   Loss 11.2260   LearningRate 0.0667   Epoch: 3   Global Step: 151900   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:06,228-Speed 2629.79 samples/sec   Loss 11.0702   LearningRate 0.0667   Epoch: 3   Global Step: 151910   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:10,120-Speed 2631.24 samples/sec   Loss 11.1800   LearningRate 0.0667   Epoch: 3   Global Step: 151920   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:14,016-Speed 2629.71 samples/sec   Loss 11.2489   LearningRate 0.0667   Epoch: 3   Global Step: 151930   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:17,914-Speed 2627.26 samples/sec   Loss 11.3266   LearningRate 0.0667   Epoch: 3   Global Step: 151940   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:21,809-Speed 2629.67 samples/sec   Loss 11.3156   LearningRate 0.0667   Epoch: 3   Global Step: 151950   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:25,706-Speed 2628.26 samples/sec   Loss 11.2760   LearningRate 0.0667   Epoch: 3   Global Step: 151960   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:29,610-Speed 2623.76 samples/sec   Loss 11.1628   LearningRate 0.0667   Epoch: 3   Global Step: 151970   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:48:33,496-Speed 2636.48 samples/sec   Loss 11.1918   LearningRate 0.0667   Epoch: 3   Global Step: 151980   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:37,402-Speed 2621.93 samples/sec   Loss 11.3401   LearningRate 0.0667   Epoch: 3   Global Step: 151990   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:41,306-Speed 2623.40 samples/sec   Loss 11.0194   LearningRate 0.0667   Epoch: 3   Global Step: 152000   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:45,212-Speed 2623.04 samples/sec   Loss 11.1987   LearningRate 0.0667   Epoch: 3   Global Step: 152010   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:49,114-Speed 2624.46 samples/sec   Loss 11.1670   LearningRate 0.0667   Epoch: 3   Global Step: 152020   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:53,031-Speed 2614.81 samples/sec   Loss 11.0334   LearningRate 0.0667   Epoch: 3   Global Step: 152030   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:48:56,931-Speed 2626.06 samples/sec   Loss 11.1658   LearningRate 0.0667   Epoch: 3   Global Step: 152040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:00,836-Speed 2623.17 samples/sec   Loss 11.1822   LearningRate 0.0667   Epoch: 3   Global Step: 152050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:04,743-Speed 2621.30 samples/sec   Loss 11.1838   LearningRate 0.0667   Epoch: 3   Global Step: 152060   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:08,637-Speed 2630.73 samples/sec   Loss 11.2061   LearningRate 0.0667   Epoch: 3   Global Step: 152070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:12,535-Speed 2627.23 samples/sec   Loss 11.2304   LearningRate 0.0667   Epoch: 3   Global Step: 152080   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:49:16,426-Speed 2632.24 samples/sec   Loss 11.1708   LearningRate 0.0667   Epoch: 3   Global Step: 152090   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:49:20,329-Speed 2624.91 samples/sec   Loss 11.0306   LearningRate 0.0667   Epoch: 3   Global Step: 152100   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:49:24,221-Speed 2631.02 samples/sec   Loss 11.0912   LearningRate 0.0667   Epoch: 3   Global Step: 152110   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:49:28,108-Speed 2635.23 samples/sec   Loss 11.3664   LearningRate 0.0667   Epoch: 3   Global Step: 152120   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:49:32,000-Speed 2631.74 samples/sec   Loss 11.1163   LearningRate 0.0667   Epoch: 3   Global Step: 152130   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:49:35,904-Speed 2623.94 samples/sec   Loss 11.1366   LearningRate 0.0667   Epoch: 3   Global Step: 152140   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:49:39,785-Speed 2638.53 samples/sec   Loss 11.1841   LearningRate 0.0667   Epoch: 3   Global Step: 152150   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:43,681-Speed 2629.43 samples/sec   Loss 11.1442   LearningRate 0.0667   Epoch: 3   Global Step: 152160   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:47,610-Speed 2606.66 samples/sec   Loss 11.2008   LearningRate 0.0667   Epoch: 3   Global Step: 152170   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:51,526-Speed 2615.85 samples/sec   Loss 11.1621   LearningRate 0.0667   Epoch: 3   Global Step: 152180   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:55,450-Speed 2609.93 samples/sec   Loss 11.2031   LearningRate 0.0667   Epoch: 3   Global Step: 152190   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:49:59,366-Speed 2615.95 samples/sec   Loss 11.1007   LearningRate 0.0667   Epoch: 3   Global Step: 152200   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:03,263-Speed 2627.78 samples/sec   Loss 11.1351   LearningRate 0.0667   Epoch: 3   Global Step: 152210   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:07,183-Speed 2612.90 samples/sec   Loss 11.3012   LearningRate 0.0667   Epoch: 3   Global Step: 152220   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:11,082-Speed 2626.84 samples/sec   Loss 11.1555   LearningRate 0.0667   Epoch: 3   Global Step: 152230   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:15,048-Speed 2582.88 samples/sec   Loss 10.9694   LearningRate 0.0667   Epoch: 3   Global Step: 152240   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:18,952-Speed 2623.96 samples/sec   Loss 11.1953   LearningRate 0.0667   Epoch: 3   Global Step: 152250   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:50:22,853-Speed 2625.48 samples/sec   Loss 11.1276   LearningRate 0.0667   Epoch: 3   Global Step: 152260   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:50:26,747-Speed 2630.82 samples/sec   Loss 11.2728   LearningRate 0.0667   Epoch: 3   Global Step: 152270   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:50:30,638-Speed 2632.26 samples/sec   Loss 11.1773   LearningRate 0.0667   Epoch: 3   Global Step: 152280   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:50:34,536-Speed 2627.44 samples/sec   Loss 11.2014   LearningRate 0.0667   Epoch: 3   Global Step: 152290   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:50:38,409-Speed 2644.45 samples/sec   Loss 11.0222   LearningRate 0.0667   Epoch: 3   Global Step: 152300   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:42,309-Speed 2626.55 samples/sec   Loss 11.1134   LearningRate 0.0667   Epoch: 3   Global Step: 152310   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:46,205-Speed 2628.86 samples/sec   Loss 11.2858   LearningRate 0.0666   Epoch: 3   Global Step: 152320   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:50,125-Speed 2613.26 samples/sec   Loss 11.2296   LearningRate 0.0666   Epoch: 3   Global Step: 152330   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:54,047-Speed 2611.51 samples/sec   Loss 11.0949   LearningRate 0.0666   Epoch: 3   Global Step: 152340   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:50:57,952-Speed 2623.22 samples/sec   Loss 11.2012   LearningRate 0.0666   Epoch: 3   Global Step: 152350   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:01,852-Speed 2626.25 samples/sec   Loss 11.1367   LearningRate 0.0666   Epoch: 3   Global Step: 152360   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:05,759-Speed 2621.50 samples/sec   Loss 11.1134   LearningRate 0.0666   Epoch: 3   Global Step: 152370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:09,663-Speed 2623.18 samples/sec   Loss 11.1164   LearningRate 0.0666   Epoch: 3   Global Step: 152380   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:13,577-Speed 2616.90 samples/sec   Loss 11.0711   LearningRate 0.0666   Epoch: 3   Global Step: 152390   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:17,475-Speed 2627.71 samples/sec   Loss 11.2832   LearningRate 0.0666   Epoch: 3   Global Step: 152400   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:51:21,372-Speed 2629.01 samples/sec   Loss 11.2427   LearningRate 0.0666   Epoch: 3   Global Step: 152410   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:51:25,254-Speed 2638.13 samples/sec   Loss 11.0830   LearningRate 0.0666   Epoch: 3   Global Step: 152420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:29,164-Speed 2619.90 samples/sec   Loss 11.2852   LearningRate 0.0666   Epoch: 3   Global Step: 152430   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:33,070-Speed 2622.01 samples/sec   Loss 11.2057   LearningRate 0.0666   Epoch: 3   Global Step: 152440   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:36,965-Speed 2629.37 samples/sec   Loss 11.1609   LearningRate 0.0666   Epoch: 3   Global Step: 152450   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:40,865-Speed 2625.77 samples/sec   Loss 11.2043   LearningRate 0.0666   Epoch: 3   Global Step: 152460   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:44,766-Speed 2626.19 samples/sec   Loss 11.0801   LearningRate 0.0666   Epoch: 3   Global Step: 152470   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:48,660-Speed 2630.71 samples/sec   Loss 11.1485   LearningRate 0.0666   Epoch: 3   Global Step: 152480   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:52,551-Speed 2632.23 samples/sec   Loss 11.2306   LearningRate 0.0666   Epoch: 3   Global Step: 152490   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:51:56,447-Speed 2629.31 samples/sec   Loss 11.1771   LearningRate 0.0666   Epoch: 3   Global Step: 152500   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:00,338-Speed 2631.62 samples/sec   Loss 11.2742   LearningRate 0.0666   Epoch: 3   Global Step: 152510   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:04,223-Speed 2636.42 samples/sec   Loss 11.2088   LearningRate 0.0666   Epoch: 3   Global Step: 152520   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:08,119-Speed 2628.94 samples/sec   Loss 11.1705   LearningRate 0.0666   Epoch: 3   Global Step: 152530   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:12,018-Speed 2627.14 samples/sec   Loss 11.1337   LearningRate 0.0666   Epoch: 3   Global Step: 152540   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:15,906-Speed 2634.53 samples/sec   Loss 11.0117   LearningRate 0.0666   Epoch: 3   Global Step: 152550   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:19,815-Speed 2620.14 samples/sec   Loss 11.0470   LearningRate 0.0666   Epoch: 3   Global Step: 152560   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:23,710-Speed 2630.13 samples/sec   Loss 11.3354   LearningRate 0.0666   Epoch: 3   Global Step: 152570   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:27,607-Speed 2628.73 samples/sec   Loss 11.0470   LearningRate 0.0666   Epoch: 3   Global Step: 152580   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:31,505-Speed 2627.32 samples/sec   Loss 11.0485   LearningRate 0.0666   Epoch: 3   Global Step: 152590   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:35,397-Speed 2631.52 samples/sec   Loss 11.1520   LearningRate 0.0666   Epoch: 3   Global Step: 152600   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:39,295-Speed 2627.96 samples/sec   Loss 11.1474   LearningRate 0.0666   Epoch: 3   Global Step: 152610   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:52:43,190-Speed 2629.84 samples/sec   Loss 11.1756   LearningRate 0.0666   Epoch: 3   Global Step: 152620   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:52:47,082-Speed 2631.48 samples/sec   Loss 11.0534   LearningRate 0.0666   Epoch: 3   Global Step: 152630   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:52:50,977-Speed 2629.89 samples/sec   Loss 11.1026   LearningRate 0.0666   Epoch: 3   Global Step: 152640   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:52:54,870-Speed 2630.19 samples/sec   Loss 11.2301   LearningRate 0.0666   Epoch: 3   Global Step: 152650   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:52:58,750-Speed 2640.25 samples/sec   Loss 11.2485   LearningRate 0.0666   Epoch: 3   Global Step: 152660   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:02,645-Speed 2629.74 samples/sec   Loss 11.2554   LearningRate 0.0666   Epoch: 3   Global Step: 152670   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:06,541-Speed 2629.37 samples/sec   Loss 11.1421   LearningRate 0.0666   Epoch: 3   Global Step: 152680   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:10,438-Speed 2628.32 samples/sec   Loss 11.1446   LearningRate 0.0666   Epoch: 3   Global Step: 152690   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:14,332-Speed 2630.66 samples/sec   Loss 11.1798   LearningRate 0.0666   Epoch: 3   Global Step: 152700   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:18,243-Speed 2618.78 samples/sec   Loss 11.2548   LearningRate 0.0666   Epoch: 3   Global Step: 152710   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:22,146-Speed 2624.24 samples/sec   Loss 11.1468   LearningRate 0.0666   Epoch: 3   Global Step: 152720   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:26,039-Speed 2630.62 samples/sec   Loss 11.2225   LearningRate 0.0666   Epoch: 3   Global Step: 152730   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:29,933-Speed 2630.72 samples/sec   Loss 11.2447   LearningRate 0.0666   Epoch: 3   Global Step: 152740   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:33,830-Speed 2629.15 samples/sec   Loss 11.1689   LearningRate 0.0666   Epoch: 3   Global Step: 152750   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:53:37,727-Speed 2628.06 samples/sec   Loss 11.2180   LearningRate 0.0666   Epoch: 3   Global Step: 152760   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:53:41,627-Speed 2626.21 samples/sec   Loss 11.1419   LearningRate 0.0666   Epoch: 3   Global Step: 152770   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:53:45,528-Speed 2625.73 samples/sec   Loss 11.1147   LearningRate 0.0666   Epoch: 3   Global Step: 152780   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:53:49,437-Speed 2620.33 samples/sec   Loss 11.2238   LearningRate 0.0666   Epoch: 3   Global Step: 152790   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:53:53,341-Speed 2623.79 samples/sec   Loss 11.1714   LearningRate 0.0666   Epoch: 3   Global Step: 152800   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:53:57,238-Speed 2628.48 samples/sec   Loss 11.1661   LearningRate 0.0666   Epoch: 3   Global Step: 152810   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:01,173-Speed 2602.51 samples/sec   Loss 11.1061   LearningRate 0.0666   Epoch: 3   Global Step: 152820   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:05,073-Speed 2626.40 samples/sec   Loss 11.2564   LearningRate 0.0665   Epoch: 3   Global Step: 152830   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:08,976-Speed 2624.42 samples/sec   Loss 11.0701   LearningRate 0.0665   Epoch: 3   Global Step: 152840   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:12,878-Speed 2625.19 samples/sec   Loss 10.9925   LearningRate 0.0665   Epoch: 3   Global Step: 152850   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:16,779-Speed 2625.29 samples/sec   Loss 11.4090   LearningRate 0.0665   Epoch: 3   Global Step: 152860   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:20,687-Speed 2621.02 samples/sec   Loss 11.2099   LearningRate 0.0665   Epoch: 3   Global Step: 152870   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:24,581-Speed 2630.03 samples/sec   Loss 11.2344   LearningRate 0.0665   Epoch: 3   Global Step: 152880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:54:28,460-Speed 2640.24 samples/sec   Loss 11.3058   LearningRate 0.0665   Epoch: 3   Global Step: 152890   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:32,357-Speed 2628.13 samples/sec   Loss 11.1195   LearningRate 0.0665   Epoch: 3   Global Step: 152900   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:36,247-Speed 2633.32 samples/sec   Loss 10.9399   LearningRate 0.0665   Epoch: 3   Global Step: 152910   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:40,138-Speed 2632.89 samples/sec   Loss 11.1953   LearningRate 0.0665   Epoch: 3   Global Step: 152920   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:44,031-Speed 2630.52 samples/sec   Loss 11.1390   LearningRate 0.0665   Epoch: 3   Global Step: 152930   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:47,924-Speed 2631.05 samples/sec   Loss 11.2131   LearningRate 0.0665   Epoch: 3   Global Step: 152940   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:51,816-Speed 2631.98 samples/sec   Loss 11.0334   LearningRate 0.0665   Epoch: 3   Global Step: 152950   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:55,705-Speed 2633.29 samples/sec   Loss 11.0654   LearningRate 0.0665   Epoch: 3   Global Step: 152960   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:54:59,597-Speed 2631.50 samples/sec   Loss 11.1513   LearningRate 0.0665   Epoch: 3   Global Step: 152970   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:55:03,486-Speed 2633.51 samples/sec   Loss 11.3248   LearningRate 0.0665   Epoch: 3   Global Step: 152980   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:55:07,386-Speed 2626.23 samples/sec   Loss 11.0641   LearningRate 0.0665   Epoch: 3   Global Step: 152990   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:11,288-Speed 2624.88 samples/sec   Loss 11.1964   LearningRate 0.0665   Epoch: 3   Global Step: 153000   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:15,190-Speed 2625.18 samples/sec   Loss 11.2106   LearningRate 0.0665   Epoch: 3   Global Step: 153010   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:19,089-Speed 2627.63 samples/sec   Loss 11.0460   LearningRate 0.0665   Epoch: 3   Global Step: 153020   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:22,982-Speed 2630.31 samples/sec   Loss 10.9780   LearningRate 0.0665   Epoch: 3   Global Step: 153030   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:26,879-Speed 2628.42 samples/sec   Loss 11.1734   LearningRate 0.0665   Epoch: 3   Global Step: 153040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:30,773-Speed 2630.71 samples/sec   Loss 11.1327   LearningRate 0.0665   Epoch: 3   Global Step: 153050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:34,674-Speed 2625.38 samples/sec   Loss 11.0324   LearningRate 0.0665   Epoch: 3   Global Step: 153060   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:38,568-Speed 2629.80 samples/sec   Loss 10.9947   LearningRate 0.0665   Epoch: 3   Global Step: 153070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:42,460-Speed 2631.79 samples/sec   Loss 11.0680   LearningRate 0.0665   Epoch: 3   Global Step: 153080   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:46,340-Speed 2640.07 samples/sec   Loss 11.1551   LearningRate 0.0665   Epoch: 3   Global Step: 153090   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:50,233-Speed 2631.02 samples/sec   Loss 11.2564   LearningRate 0.0665   Epoch: 3   Global Step: 153100   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:54,130-Speed 2628.36 samples/sec   Loss 11.0994   LearningRate 0.0665   Epoch: 3   Global Step: 153110   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:55:58,024-Speed 2630.48 samples/sec   Loss 11.0944   LearningRate 0.0665   Epoch: 3   Global Step: 153120   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:01,917-Speed 2630.37 samples/sec   Loss 11.0647   LearningRate 0.0665   Epoch: 3   Global Step: 153130   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:05,814-Speed 2628.66 samples/sec   Loss 11.0261   LearningRate 0.0665   Epoch: 3   Global Step: 153140   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:09,717-Speed 2623.65 samples/sec   Loss 11.2049   LearningRate 0.0665   Epoch: 3   Global Step: 153150   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:13,612-Speed 2630.01 samples/sec   Loss 11.1586   LearningRate 0.0665   Epoch: 3   Global Step: 153160   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:17,503-Speed 2631.86 samples/sec   Loss 11.0555   LearningRate 0.0665   Epoch: 3   Global Step: 153170   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:21,398-Speed 2630.07 samples/sec   Loss 11.0623   LearningRate 0.0665   Epoch: 3   Global Step: 153180   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:25,300-Speed 2624.73 samples/sec   Loss 11.1400   LearningRate 0.0665   Epoch: 3   Global Step: 153190   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:56:29,202-Speed 2625.33 samples/sec   Loss 11.1805   LearningRate 0.0665   Epoch: 3   Global Step: 153200   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:56:33,106-Speed 2623.01 samples/sec   Loss 11.1873   LearningRate 0.0665   Epoch: 3   Global Step: 153210   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:56:36,987-Speed 2638.95 samples/sec   Loss 11.2265   LearningRate 0.0665   Epoch: 3   Global Step: 153220   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:40,880-Speed 2630.98 samples/sec   Loss 11.1994   LearningRate 0.0665   Epoch: 3   Global Step: 153230   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:44,777-Speed 2628.08 samples/sec   Loss 11.1919   LearningRate 0.0665   Epoch: 3   Global Step: 153240   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:56:48,661-Speed 2637.31 samples/sec   Loss 11.2584   LearningRate 0.0665   Epoch: 3   Global Step: 153250   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:56:52,565-Speed 2623.55 samples/sec   Loss 11.1318   LearningRate 0.0665   Epoch: 3   Global Step: 153260   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:56:56,456-Speed 2632.05 samples/sec   Loss 11.1825   LearningRate 0.0665   Epoch: 3   Global Step: 153270   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:00,350-Speed 2630.83 samples/sec   Loss 11.1371   LearningRate 0.0665   Epoch: 3   Global Step: 153280   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:04,240-Speed 2633.05 samples/sec   Loss 11.0479   LearningRate 0.0665   Epoch: 3   Global Step: 153290   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:08,130-Speed 2632.48 samples/sec   Loss 11.1723   LearningRate 0.0665   Epoch: 3   Global Step: 153300   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:12,038-Speed 2621.18 samples/sec   Loss 11.0045   LearningRate 0.0665   Epoch: 3   Global Step: 153310   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:15,934-Speed 2628.48 samples/sec   Loss 11.2263   LearningRate 0.0665   Epoch: 3   Global Step: 153320   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:19,829-Speed 2629.80 samples/sec   Loss 11.0331   LearningRate 0.0665   Epoch: 3   Global Step: 153330   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:23,722-Speed 2630.61 samples/sec   Loss 11.2547   LearningRate 0.0664   Epoch: 3   Global Step: 153340   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 12:57:27,636-Speed 2617.36 samples/sec   Loss 11.2030   LearningRate 0.0664   Epoch: 3   Global Step: 153350   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:31,529-Speed 2630.30 samples/sec   Loss 11.0956   LearningRate 0.0664   Epoch: 3   Global Step: 153360   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:35,428-Speed 2627.02 samples/sec   Loss 11.1824   LearningRate 0.0664   Epoch: 3   Global Step: 153370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:39,327-Speed 2627.03 samples/sec   Loss 11.1927   LearningRate 0.0664   Epoch: 3   Global Step: 153380   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:43,218-Speed 2632.78 samples/sec   Loss 11.2725   LearningRate 0.0664   Epoch: 3   Global Step: 153390   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:47,113-Speed 2629.40 samples/sec   Loss 11.0960   LearningRate 0.0664   Epoch: 3   Global Step: 153400   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:51,007-Speed 2630.27 samples/sec   Loss 10.9756   LearningRate 0.0664   Epoch: 3   Global Step: 153410   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:54,903-Speed 2628.62 samples/sec   Loss 11.1302   LearningRate 0.0664   Epoch: 3   Global Step: 153420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:57:58,798-Speed 2629.97 samples/sec   Loss 11.2066   LearningRate 0.0664   Epoch: 3   Global Step: 153430   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:58:02,691-Speed 2630.77 samples/sec   Loss 11.1585   LearningRate 0.0664   Epoch: 3   Global Step: 153440   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:58:06,591-Speed 2625.87 samples/sec   Loss 11.2233   LearningRate 0.0664   Epoch: 3   Global Step: 153450   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:10,490-Speed 2627.43 samples/sec   Loss 11.1819   LearningRate 0.0664   Epoch: 3   Global Step: 153460   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:14,390-Speed 2626.35 samples/sec   Loss 11.1103   LearningRate 0.0664   Epoch: 3   Global Step: 153470   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:18,293-Speed 2623.83 samples/sec   Loss 11.1711   LearningRate 0.0664   Epoch: 3   Global Step: 153480   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:22,205-Speed 2618.33 samples/sec   Loss 11.1068   LearningRate 0.0664   Epoch: 3   Global Step: 153490   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:26,112-Speed 2621.47 samples/sec   Loss 11.1455   LearningRate 0.0664   Epoch: 3   Global Step: 153500   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:30,009-Speed 2627.99 samples/sec   Loss 11.0256   LearningRate 0.0664   Epoch: 3   Global Step: 153510   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:33,910-Speed 2625.39 samples/sec   Loss 10.9933   LearningRate 0.0664   Epoch: 3   Global Step: 153520   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:37,806-Speed 2629.37 samples/sec   Loss 11.1211   LearningRate 0.0664   Epoch: 3   Global Step: 153530   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:58:41,700-Speed 2629.76 samples/sec   Loss 10.9795   LearningRate 0.0664   Epoch: 3   Global Step: 153540   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:58:45,600-Speed 2627.03 samples/sec   Loss 11.1762   LearningRate 0.0664   Epoch: 3   Global Step: 153550   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:58:49,492-Speed 2631.16 samples/sec   Loss 11.1052   LearningRate 0.0664   Epoch: 3   Global Step: 153560   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:58:53,391-Speed 2627.07 samples/sec   Loss 11.0156   LearningRate 0.0664   Epoch: 3   Global Step: 153570   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:58:57,286-Speed 2629.43 samples/sec   Loss 11.1075   LearningRate 0.0664   Epoch: 3   Global Step: 153580   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:01,180-Speed 2630.05 samples/sec   Loss 11.1249   LearningRate 0.0664   Epoch: 3   Global Step: 153590   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:05,077-Speed 2628.20 samples/sec   Loss 11.0677   LearningRate 0.0664   Epoch: 3   Global Step: 153600   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:08,979-Speed 2625.02 samples/sec   Loss 11.0685   LearningRate 0.0664   Epoch: 3   Global Step: 153610   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:12,873-Speed 2630.23 samples/sec   Loss 11.0729   LearningRate 0.0664   Epoch: 3   Global Step: 153620   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:16,772-Speed 2626.69 samples/sec   Loss 11.1103   LearningRate 0.0664   Epoch: 3   Global Step: 153630   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:20,675-Speed 2624.42 samples/sec   Loss 11.1908   LearningRate 0.0664   Epoch: 3   Global Step: 153640   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 12:59:24,552-Speed 2641.68 samples/sec   Loss 11.0547   LearningRate 0.0664   Epoch: 3   Global Step: 153650   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:28,444-Speed 2631.93 samples/sec   Loss 11.1407   LearningRate 0.0664   Epoch: 3   Global Step: 153660   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:32,343-Speed 2627.07 samples/sec   Loss 11.0619   LearningRate 0.0664   Epoch: 3   Global Step: 153670   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:36,267-Speed 2609.94 samples/sec   Loss 11.1776   LearningRate 0.0664   Epoch: 3   Global Step: 153680   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:40,163-Speed 2629.15 samples/sec   Loss 11.0929   LearningRate 0.0664   Epoch: 3   Global Step: 153690   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:44,083-Speed 2612.94 samples/sec   Loss 11.1283   LearningRate 0.0664   Epoch: 3   Global Step: 153700   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:47,971-Speed 2633.68 samples/sec   Loss 11.0947   LearningRate 0.0664   Epoch: 3   Global Step: 153710   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:51,865-Speed 2630.30 samples/sec   Loss 11.1717   LearningRate 0.0664   Epoch: 3   Global Step: 153720   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:55,765-Speed 2625.88 samples/sec   Loss 11.1524   LearningRate 0.0664   Epoch: 3   Global Step: 153730   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 12:59:59,660-Speed 2630.43 samples/sec   Loss 11.1155   LearningRate 0.0664   Epoch: 3   Global Step: 153740   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:03,555-Speed 2629.54 samples/sec   Loss 11.1552   LearningRate 0.0664   Epoch: 3   Global Step: 153750   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:00:07,457-Speed 2624.77 samples/sec   Loss 11.0533   LearningRate 0.0664   Epoch: 3   Global Step: 153760   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:00:11,372-Speed 2615.97 samples/sec   Loss 11.0910   LearningRate 0.0664   Epoch: 3   Global Step: 153770   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:00:15,284-Speed 2618.63 samples/sec   Loss 11.0040   LearningRate 0.0664   Epoch: 3   Global Step: 153780   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:00:19,182-Speed 2626.92 samples/sec   Loss 11.2189   LearningRate 0.0664   Epoch: 3   Global Step: 153790   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:00:23,073-Speed 2632.34 samples/sec   Loss 11.1419   LearningRate 0.0664   Epoch: 3   Global Step: 153800   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:26,977-Speed 2623.60 samples/sec   Loss 11.0913   LearningRate 0.0664   Epoch: 3   Global Step: 153810   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:30,865-Speed 2634.51 samples/sec   Loss 11.0292   LearningRate 0.0664   Epoch: 3   Global Step: 153820   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:34,757-Speed 2631.62 samples/sec   Loss 11.2055   LearningRate 0.0664   Epoch: 3   Global Step: 153830   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:38,647-Speed 2632.90 samples/sec   Loss 11.0932   LearningRate 0.0664   Epoch: 3   Global Step: 153840   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:42,539-Speed 2632.00 samples/sec   Loss 11.0270   LearningRate 0.0663   Epoch: 3   Global Step: 153850   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:46,431-Speed 2631.34 samples/sec   Loss 11.1711   LearningRate 0.0663   Epoch: 3   Global Step: 153860   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:50,334-Speed 2624.26 samples/sec   Loss 10.9366   LearningRate 0.0663   Epoch: 3   Global Step: 153870   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:54,238-Speed 2623.87 samples/sec   Loss 11.1085   LearningRate 0.0663   Epoch: 3   Global Step: 153880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:00:58,134-Speed 2628.54 samples/sec   Loss 11.0749   LearningRate 0.0663   Epoch: 3   Global Step: 153890   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:02,018-Speed 2636.92 samples/sec   Loss 11.0250   LearningRate 0.0663   Epoch: 3   Global Step: 153900   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:05,924-Speed 2622.14 samples/sec   Loss 11.1311   LearningRate 0.0663   Epoch: 3   Global Step: 153910   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:09,833-Speed 2620.20 samples/sec   Loss 10.9843   LearningRate 0.0663   Epoch: 3   Global Step: 153920   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:13,749-Speed 2615.36 samples/sec   Loss 10.9672   LearningRate 0.0663   Epoch: 3   Global Step: 153930   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:17,660-Speed 2619.06 samples/sec   Loss 11.1322   LearningRate 0.0663   Epoch: 3   Global Step: 153940   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:21,570-Speed 2619.81 samples/sec   Loss 11.1331   LearningRate 0.0663   Epoch: 3   Global Step: 153950   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:25,479-Speed 2619.87 samples/sec   Loss 11.3021   LearningRate 0.0663   Epoch: 3   Global Step: 153960   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:29,386-Speed 2621.66 samples/sec   Loss 11.1292   LearningRate 0.0663   Epoch: 3   Global Step: 153970   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:33,279-Speed 2630.72 samples/sec   Loss 11.1533   LearningRate 0.0663   Epoch: 3   Global Step: 153980   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:37,171-Speed 2631.31 samples/sec   Loss 10.9978   LearningRate 0.0663   Epoch: 3   Global Step: 153990   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:41,064-Speed 2631.34 samples/sec   Loss 11.2738   LearningRate 0.0663   Epoch: 3   Global Step: 154000   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:01:44,944-Speed 2639.68 samples/sec   Loss 11.3387   LearningRate 0.0663   Epoch: 3   Global Step: 154010   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:48,841-Speed 2628.06 samples/sec   Loss 11.1073   LearningRate 0.0663   Epoch: 3   Global Step: 154020   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:52,739-Speed 2628.11 samples/sec   Loss 11.0249   LearningRate 0.0663   Epoch: 3   Global Step: 154030   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:01:56,632-Speed 2630.99 samples/sec   Loss 11.1470   LearningRate 0.0663   Epoch: 3   Global Step: 154040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:00,525-Speed 2630.93 samples/sec   Loss 11.1367   LearningRate 0.0663   Epoch: 3   Global Step: 154050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:04,426-Speed 2625.34 samples/sec   Loss 11.1698   LearningRate 0.0663   Epoch: 3   Global Step: 154060   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:08,336-Speed 2619.62 samples/sec   Loss 11.0824   LearningRate 0.0663   Epoch: 3   Global Step: 154070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:12,232-Speed 2628.25 samples/sec   Loss 10.9822   LearningRate 0.0663   Epoch: 3   Global Step: 154080   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:16,127-Speed 2630.21 samples/sec   Loss 11.3523   LearningRate 0.0663   Epoch: 3   Global Step: 154090   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:20,024-Speed 2628.27 samples/sec   Loss 11.0428   LearningRate 0.0663   Epoch: 3   Global Step: 154100   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:23,932-Speed 2620.81 samples/sec   Loss 11.2207   LearningRate 0.0663   Epoch: 3   Global Step: 154110   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:02:27,819-Speed 2635.60 samples/sec   Loss 11.0648   LearningRate 0.0663   Epoch: 3   Global Step: 154120   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:31,718-Speed 2626.26 samples/sec   Loss 11.0872   LearningRate 0.0663   Epoch: 3   Global Step: 154130   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:35,620-Speed 2624.73 samples/sec   Loss 11.2273   LearningRate 0.0663   Epoch: 3   Global Step: 154140   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:39,546-Speed 2609.16 samples/sec   Loss 11.1283   LearningRate 0.0663   Epoch: 3   Global Step: 154150   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:43,440-Speed 2630.28 samples/sec   Loss 11.2649   LearningRate 0.0663   Epoch: 3   Global Step: 154160   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:47,333-Speed 2630.60 samples/sec   Loss 11.1884   LearningRate 0.0663   Epoch: 3   Global Step: 154170   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:51,231-Speed 2627.77 samples/sec   Loss 11.1084   LearningRate 0.0663   Epoch: 3   Global Step: 154180   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:55,135-Speed 2623.62 samples/sec   Loss 11.1827   LearningRate 0.0663   Epoch: 3   Global Step: 154190   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:02:59,027-Speed 2631.61 samples/sec   Loss 11.1352   LearningRate 0.0663   Epoch: 3   Global Step: 154200   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:02,917-Speed 2633.12 samples/sec   Loss 11.2734   LearningRate 0.0663   Epoch: 3   Global Step: 154210   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:06,836-Speed 2613.62 samples/sec   Loss 11.2369   LearningRate 0.0663   Epoch: 3   Global Step: 154220   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:03:10,734-Speed 2627.13 samples/sec   Loss 11.1224   LearningRate 0.0663   Epoch: 3   Global Step: 154230   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:14,642-Speed 2621.07 samples/sec   Loss 11.2052   LearningRate 0.0663   Epoch: 3   Global Step: 154240   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:18,538-Speed 2629.10 samples/sec   Loss 11.1296   LearningRate 0.0663   Epoch: 3   Global Step: 154250   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:22,435-Speed 2628.22 samples/sec   Loss 11.1088   LearningRate 0.0663   Epoch: 3   Global Step: 154260   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:26,336-Speed 2625.34 samples/sec   Loss 11.1466   LearningRate 0.0663   Epoch: 3   Global Step: 154270   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:30,234-Speed 2627.93 samples/sec   Loss 11.1188   LearningRate 0.0663   Epoch: 3   Global Step: 154280   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:34,134-Speed 2625.98 samples/sec   Loss 11.0670   LearningRate 0.0663   Epoch: 3   Global Step: 154290   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:38,029-Speed 2629.49 samples/sec   Loss 11.0644   LearningRate 0.0663   Epoch: 3   Global Step: 154300   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:41,924-Speed 2629.42 samples/sec   Loss 11.1319   LearningRate 0.0663   Epoch: 3   Global Step: 154310   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:45,846-Speed 2611.88 samples/sec   Loss 11.1728   LearningRate 0.0663   Epoch: 3   Global Step: 154320   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:03:49,747-Speed 2625.47 samples/sec   Loss 11.1623   LearningRate 0.0663   Epoch: 3   Global Step: 154330   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:03:53,647-Speed 2626.12 samples/sec   Loss 11.2026   LearningRate 0.0663   Epoch: 3   Global Step: 154340   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:03:57,520-Speed 2645.06 samples/sec   Loss 11.0222   LearningRate 0.0663   Epoch: 3   Global Step: 154350   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:04:01,422-Speed 2624.75 samples/sec   Loss 11.1582   LearningRate 0.0662   Epoch: 3   Global Step: 154360   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:04:05,316-Speed 2629.59 samples/sec   Loss 11.0706   LearningRate 0.0662   Epoch: 3   Global Step: 154370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:04:09,188-Speed 2645.11 samples/sec   Loss 11.1160   LearningRate 0.0662   Epoch: 3   Global Step: 154380   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:04:13,090-Speed 2625.05 samples/sec   Loss 11.0010   LearningRate 0.0662   Epoch: 3   Global Step: 154390   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:04:16,993-Speed 2624.82 samples/sec   Loss 10.8389   LearningRate 0.0662   Epoch: 3   Global Step: 154400   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:04:20,852-Speed 2654.04 samples/sec   Loss 11.2702   LearningRate 0.0662   Epoch: 3   Global Step: 154410   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:04:24,744-Speed 2631.77 samples/sec   Loss 11.1618   LearningRate 0.0662   Epoch: 3   Global Step: 154420   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:04:28,639-Speed 2629.90 samples/sec   Loss 11.3617   LearningRate 0.0662   Epoch: 3   Global Step: 154430   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:04:32,540-Speed 2625.49 samples/sec   Loss 11.2685   LearningRate 0.0662   Epoch: 3   Global Step: 154440   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:04:36,432-Speed 2631.83 samples/sec   Loss 11.2392   LearningRate 0.0662   Epoch: 3   Global Step: 154450   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:04:40,310-Speed 2640.77 samples/sec   Loss 11.4682   LearningRate 0.0662   Epoch: 3   Global Step: 154460   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:04:44,198-Speed 2634.67 samples/sec   Loss 11.9239   LearningRate 0.0662   Epoch: 3   Global Step: 154470   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:04:48,097-Speed 2626.80 samples/sec   Loss 11.6128   LearningRate 0.0662   Epoch: 3   Global Step: 154480   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:04:51,998-Speed 2625.54 samples/sec   Loss 11.3572   LearningRate 0.0662   Epoch: 3   Global Step: 154490   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:04:55,909-Speed 2618.49 samples/sec   Loss 11.4364   LearningRate 0.0662   Epoch: 3   Global Step: 154500   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:04:59,804-Speed 2629.84 samples/sec   Loss 11.3108   LearningRate 0.0662   Epoch: 3   Global Step: 154510   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:05:03,710-Speed 2622.45 samples/sec   Loss 11.2527   LearningRate 0.0662   Epoch: 3   Global Step: 154520   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:05:07,598-Speed 2634.06 samples/sec   Loss 11.2783   LearningRate 0.0662   Epoch: 3   Global Step: 154530   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:05:11,489-Speed 2632.20 samples/sec   Loss 11.2326   LearningRate 0.0662   Epoch: 3   Global Step: 154540   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:05:15,379-Speed 2633.17 samples/sec   Loss 11.2500   LearningRate 0.0662   Epoch: 3   Global Step: 154550   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-13 13:05:19,280-Speed 2625.35 samples/sec   Loss 11.0948   LearningRate 0.0662   Epoch: 3   Global Step: 154560   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:23,174-Speed 2629.92 samples/sec   Loss 11.2072   LearningRate 0.0662   Epoch: 3   Global Step: 154570   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:27,067-Speed 2630.75 samples/sec   Loss 11.2084   LearningRate 0.0662   Epoch: 3   Global Step: 154580   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:30,964-Speed 2628.77 samples/sec   Loss 11.2586   LearningRate 0.0662   Epoch: 3   Global Step: 154590   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:34,859-Speed 2630.22 samples/sec   Loss 11.2965   LearningRate 0.0662   Epoch: 3   Global Step: 154600   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:38,749-Speed 2632.58 samples/sec   Loss 11.1579   LearningRate 0.0662   Epoch: 3   Global Step: 154610   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:42,646-Speed 2628.14 samples/sec   Loss 11.2257   LearningRate 0.0662   Epoch: 3   Global Step: 154620   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:46,538-Speed 2631.70 samples/sec   Loss 11.1648   LearningRate 0.0662   Epoch: 3   Global Step: 154630   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:50,431-Speed 2630.95 samples/sec   Loss 11.1040   LearningRate 0.0662   Epoch: 3   Global Step: 154640   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:54,324-Speed 2630.75 samples/sec   Loss 11.2622   LearningRate 0.0662   Epoch: 3   Global Step: 154650   Fp16 Grad Scale: 16384   Required: 76 hours
Training: 2022-04-13 13:05:58,215-Speed 2632.22 samples/sec   Loss 11.3156   LearningRate 0.0662   Epoch: 3   Global Step: 154660   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:02,119-Speed 2623.74 samples/sec   Loss 11.0873   LearningRate 0.0662   Epoch: 3   Global Step: 154670   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:06,014-Speed 2629.72 samples/sec   Loss 11.2779   LearningRate 0.0662   Epoch: 3   Global Step: 154680   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:09,907-Speed 2630.79 samples/sec   Loss 11.0945   LearningRate 0.0662   Epoch: 3   Global Step: 154690   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:13,801-Speed 2630.52 samples/sec   Loss 11.2349   LearningRate 0.0662   Epoch: 3   Global Step: 154700   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:17,698-Speed 2628.04 samples/sec   Loss 11.0770   LearningRate 0.0662   Epoch: 3   Global Step: 154710   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:21,591-Speed 2631.19 samples/sec   Loss 10.9490   LearningRate 0.0662   Epoch: 3   Global Step: 154720   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:25,498-Speed 2621.46 samples/sec   Loss 11.0302   LearningRate 0.0662   Epoch: 3   Global Step: 154730   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:29,430-Speed 2604.89 samples/sec   Loss 11.0803   LearningRate 0.0662   Epoch: 3   Global Step: 154740   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:33,332-Speed 2624.66 samples/sec   Loss 11.2095   LearningRate 0.0662   Epoch: 3   Global Step: 154750   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:06:37,234-Speed 2624.72 samples/sec   Loss 11.2619   LearningRate 0.0662   Epoch: 3   Global Step: 154760   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:06:41,127-Speed 2630.85 samples/sec   Loss 11.0578   LearningRate 0.0662   Epoch: 3   Global Step: 154770   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:06:45,021-Speed 2630.76 samples/sec   Loss 11.1763   LearningRate 0.0662   Epoch: 3   Global Step: 154780   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:06:48,922-Speed 2625.83 samples/sec   Loss 11.1638   LearningRate 0.0662   Epoch: 3   Global Step: 154790   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:06:52,815-Speed 2631.11 samples/sec   Loss 11.0629   LearningRate 0.0662   Epoch: 3   Global Step: 154800   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:06:56,714-Speed 2626.60 samples/sec   Loss 11.0309   LearningRate 0.0662   Epoch: 3   Global Step: 154810   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:00,604-Speed 2633.08 samples/sec   Loss 11.1127   LearningRate 0.0662   Epoch: 3   Global Step: 154820   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:04,502-Speed 2627.47 samples/sec   Loss 11.0586   LearningRate 0.0662   Epoch: 3   Global Step: 154830   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:08,399-Speed 2628.15 samples/sec   Loss 11.0392   LearningRate 0.0662   Epoch: 3   Global Step: 154840   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:12,291-Speed 2631.66 samples/sec   Loss 11.1442   LearningRate 0.0662   Epoch: 3   Global Step: 154850   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:16,187-Speed 2628.70 samples/sec   Loss 11.0905   LearningRate 0.0662   Epoch: 3   Global Step: 154860   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:20,090-Speed 2624.63 samples/sec   Loss 11.0205   LearningRate 0.0661   Epoch: 3   Global Step: 154870   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:23,983-Speed 2631.15 samples/sec   Loss 11.0566   LearningRate 0.0661   Epoch: 3   Global Step: 154880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:27,880-Speed 2628.32 samples/sec   Loss 11.2532   LearningRate 0.0661   Epoch: 3   Global Step: 154890   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:31,774-Speed 2630.53 samples/sec   Loss 11.1466   LearningRate 0.0661   Epoch: 3   Global Step: 154900   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:35,671-Speed 2627.87 samples/sec   Loss 11.1996   LearningRate 0.0661   Epoch: 3   Global Step: 154910   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:39,563-Speed 2631.39 samples/sec   Loss 11.0892   LearningRate 0.0661   Epoch: 3   Global Step: 154920   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:43,462-Speed 2627.33 samples/sec   Loss 11.1998   LearningRate 0.0661   Epoch: 3   Global Step: 154930   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:07:47,337-Speed 2642.75 samples/sec   Loss 11.1689   LearningRate 0.0661   Epoch: 3   Global Step: 154940   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:51,228-Speed 2632.42 samples/sec   Loss 11.1040   LearningRate 0.0661   Epoch: 3   Global Step: 154950   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:55,120-Speed 2631.39 samples/sec   Loss 11.2039   LearningRate 0.0661   Epoch: 3   Global Step: 154960   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:07:59,037-Speed 2615.16 samples/sec   Loss 11.0236   LearningRate 0.0661   Epoch: 3   Global Step: 154970   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:08:02,936-Speed 2627.02 samples/sec   Loss 11.1073   LearningRate 0.0661   Epoch: 3   Global Step: 154980   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:08:06,843-Speed 2621.31 samples/sec   Loss 11.1283   LearningRate 0.0661   Epoch: 3   Global Step: 154990   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:08:10,748-Speed 2622.92 samples/sec   Loss 11.0359   LearningRate 0.0661   Epoch: 3   Global Step: 155000   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:08:14,655-Speed 2621.77 samples/sec   Loss 11.2346   LearningRate 0.0661   Epoch: 3   Global Step: 155010   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:08:18,550-Speed 2629.11 samples/sec   Loss 11.1344   LearningRate 0.0661   Epoch: 3   Global Step: 155020   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:08:22,442-Speed 2632.03 samples/sec   Loss 11.1645   LearningRate 0.0661   Epoch: 3   Global Step: 155030   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:08:26,333-Speed 2631.83 samples/sec   Loss 11.1771   LearningRate 0.0661   Epoch: 3   Global Step: 155040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:30,235-Speed 2625.16 samples/sec   Loss 11.1061   LearningRate 0.0661   Epoch: 3   Global Step: 155050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:34,128-Speed 2631.08 samples/sec   Loss 11.1151   LearningRate 0.0661   Epoch: 3   Global Step: 155060   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:38,031-Speed 2623.57 samples/sec   Loss 11.0212   LearningRate 0.0661   Epoch: 3   Global Step: 155070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:41,926-Speed 2629.79 samples/sec   Loss 11.2620   LearningRate 0.0661   Epoch: 3   Global Step: 155080   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:45,822-Speed 2629.08 samples/sec   Loss 11.1746   LearningRate 0.0661   Epoch: 3   Global Step: 155090   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:49,737-Speed 2616.62 samples/sec   Loss 11.1275   LearningRate 0.0661   Epoch: 3   Global Step: 155100   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:53,633-Speed 2629.03 samples/sec   Loss 11.0373   LearningRate 0.0661   Epoch: 3   Global Step: 155110   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:08:57,529-Speed 2628.34 samples/sec   Loss 11.1096   LearningRate 0.0661   Epoch: 3   Global Step: 155120   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:01,423-Speed 2630.66 samples/sec   Loss 11.0306   LearningRate 0.0661   Epoch: 3   Global Step: 155130   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:05,317-Speed 2629.82 samples/sec   Loss 11.0057   LearningRate 0.0661   Epoch: 3   Global Step: 155140   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:09:09,213-Speed 2628.82 samples/sec   Loss 11.1980   LearningRate 0.0661   Epoch: 3   Global Step: 155150   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:09:13,113-Speed 2626.54 samples/sec   Loss 11.1525   LearningRate 0.0661   Epoch: 3   Global Step: 155160   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:09:17,019-Speed 2622.26 samples/sec   Loss 11.0545   LearningRate 0.0661   Epoch: 3   Global Step: 155170   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:09:20,936-Speed 2615.02 samples/sec   Loss 11.0369   LearningRate 0.0661   Epoch: 3   Global Step: 155180   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:09:24,831-Speed 2629.04 samples/sec   Loss 11.1004   LearningRate 0.0661   Epoch: 3   Global Step: 155190   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:09:28,737-Speed 2622.46 samples/sec   Loss 11.1051   LearningRate 0.0661   Epoch: 3   Global Step: 155200   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:09:32,613-Speed 2642.67 samples/sec   Loss 11.2863   LearningRate 0.0661   Epoch: 3   Global Step: 155210   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:36,510-Speed 2628.06 samples/sec   Loss 11.1310   LearningRate 0.0661   Epoch: 3   Global Step: 155220   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:40,404-Speed 2630.12 samples/sec   Loss 11.0279   LearningRate 0.0661   Epoch: 3   Global Step: 155230   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:44,321-Speed 2615.13 samples/sec   Loss 11.1707   LearningRate 0.0661   Epoch: 3   Global Step: 155240   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:48,212-Speed 2631.64 samples/sec   Loss 10.9466   LearningRate 0.0661   Epoch: 3   Global Step: 155250   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:52,109-Speed 2629.00 samples/sec   Loss 11.0198   LearningRate 0.0661   Epoch: 3   Global Step: 155260   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:56,005-Speed 2628.58 samples/sec   Loss 11.1537   LearningRate 0.0661   Epoch: 3   Global Step: 155270   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:09:59,907-Speed 2625.57 samples/sec   Loss 10.9516   LearningRate 0.0661   Epoch: 3   Global Step: 155280   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:03,806-Speed 2626.25 samples/sec   Loss 10.9647   LearningRate 0.0661   Epoch: 3   Global Step: 155290   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:07,706-Speed 2626.49 samples/sec   Loss 11.1923   LearningRate 0.0661   Epoch: 3   Global Step: 155300   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:11,619-Speed 2617.30 samples/sec   Loss 11.2441   LearningRate 0.0661   Epoch: 3   Global Step: 155310   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:10:15,535-Speed 2615.77 samples/sec   Loss 11.1786   LearningRate 0.0661   Epoch: 3   Global Step: 155320   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:10:19,425-Speed 2632.73 samples/sec   Loss 11.1067   LearningRate 0.0661   Epoch: 3   Global Step: 155330   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:23,328-Speed 2623.92 samples/sec   Loss 11.2660   LearningRate 0.0661   Epoch: 3   Global Step: 155340   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:27,219-Speed 2631.99 samples/sec   Loss 11.0809   LearningRate 0.0661   Epoch: 3   Global Step: 155350   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:31,117-Speed 2628.36 samples/sec   Loss 11.2058   LearningRate 0.0661   Epoch: 3   Global Step: 155360   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:35,019-Speed 2624.48 samples/sec   Loss 11.0810   LearningRate 0.0661   Epoch: 3   Global Step: 155370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:38,915-Speed 2629.27 samples/sec   Loss 11.1280   LearningRate 0.0660   Epoch: 3   Global Step: 155380   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:42,820-Speed 2622.61 samples/sec   Loss 11.0929   LearningRate 0.0660   Epoch: 3   Global Step: 155390   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:46,741-Speed 2612.06 samples/sec   Loss 11.0019   LearningRate 0.0660   Epoch: 3   Global Step: 155400   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:50,647-Speed 2622.11 samples/sec   Loss 11.1628   LearningRate 0.0660   Epoch: 3   Global Step: 155410   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:54,557-Speed 2619.82 samples/sec   Loss 11.0644   LearningRate 0.0660   Epoch: 3   Global Step: 155420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:10:58,465-Speed 2620.80 samples/sec   Loss 11.1797   LearningRate 0.0660   Epoch: 3   Global Step: 155430   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:11:02,350-Speed 2636.27 samples/sec   Loss 11.2162   LearningRate 0.0660   Epoch: 3   Global Step: 155440   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:06,250-Speed 2626.16 samples/sec   Loss 11.1218   LearningRate 0.0660   Epoch: 3   Global Step: 155450   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:10,165-Speed 2616.19 samples/sec   Loss 11.0670   LearningRate 0.0660   Epoch: 3   Global Step: 155460   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:14,060-Speed 2629.64 samples/sec   Loss 11.0749   LearningRate 0.0660   Epoch: 3   Global Step: 155470   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:17,953-Speed 2631.28 samples/sec   Loss 11.1091   LearningRate 0.0660   Epoch: 3   Global Step: 155480   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:21,853-Speed 2626.28 samples/sec   Loss 10.9433   LearningRate 0.0660   Epoch: 3   Global Step: 155490   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:25,751-Speed 2627.08 samples/sec   Loss 11.0463   LearningRate 0.0660   Epoch: 3   Global Step: 155500   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:29,646-Speed 2629.59 samples/sec   Loss 11.0457   LearningRate 0.0660   Epoch: 3   Global Step: 155510   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:33,545-Speed 2627.09 samples/sec   Loss 10.9713   LearningRate 0.0660   Epoch: 3   Global Step: 155520   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:37,442-Speed 2628.13 samples/sec   Loss 10.9327   LearningRate 0.0660   Epoch: 3   Global Step: 155530   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:11:41,341-Speed 2626.80 samples/sec   Loss 11.1966   LearningRate 0.0660   Epoch: 3   Global Step: 155540   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:11:45,236-Speed 2629.66 samples/sec   Loss 11.0890   LearningRate 0.0660   Epoch: 3   Global Step: 155550   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:11:49,138-Speed 2625.31 samples/sec   Loss 10.9215   LearningRate 0.0660   Epoch: 3   Global Step: 155560   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:11:53,036-Speed 2627.35 samples/sec   Loss 11.0630   LearningRate 0.0660   Epoch: 3   Global Step: 155570   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:11:56,912-Speed 2642.07 samples/sec   Loss 11.0531   LearningRate 0.0660   Epoch: 3   Global Step: 155580   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:00,811-Speed 2627.10 samples/sec   Loss 11.1597   LearningRate 0.0660   Epoch: 3   Global Step: 155590   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:04,708-Speed 2627.96 samples/sec   Loss 11.0594   LearningRate 0.0660   Epoch: 3   Global Step: 155600   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:08,604-Speed 2629.03 samples/sec   Loss 11.0803   LearningRate 0.0660   Epoch: 3   Global Step: 155610   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:12,502-Speed 2627.43 samples/sec   Loss 11.1249   LearningRate 0.0660   Epoch: 3   Global Step: 155620   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:16,397-Speed 2629.90 samples/sec   Loss 11.0106   LearningRate 0.0660   Epoch: 3   Global Step: 155630   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:20,298-Speed 2625.99 samples/sec   Loss 10.9701   LearningRate 0.0660   Epoch: 3   Global Step: 155640   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:24,210-Speed 2618.05 samples/sec   Loss 10.9952   LearningRate 0.0660   Epoch: 3   Global Step: 155650   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:28,103-Speed 2630.71 samples/sec   Loss 11.0470   LearningRate 0.0660   Epoch: 3   Global Step: 155660   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:32,000-Speed 2628.37 samples/sec   Loss 11.1336   LearningRate 0.0660   Epoch: 3   Global Step: 155670   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:35,895-Speed 2629.31 samples/sec   Loss 11.1476   LearningRate 0.0660   Epoch: 3   Global Step: 155680   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:12:39,776-Speed 2639.45 samples/sec   Loss 11.1385   LearningRate 0.0660   Epoch: 3   Global Step: 155690   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:12:43,682-Speed 2621.65 samples/sec   Loss 11.1845   LearningRate 0.0660   Epoch: 3   Global Step: 155700   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:12:47,578-Speed 2629.24 samples/sec   Loss 11.1162   LearningRate 0.0660   Epoch: 3   Global Step: 155710   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:12:51,477-Speed 2626.80 samples/sec   Loss 11.0453   LearningRate 0.0660   Epoch: 3   Global Step: 155720   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:12:55,380-Speed 2624.48 samples/sec   Loss 11.1715   LearningRate 0.0660   Epoch: 3   Global Step: 155730   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:12:59,275-Speed 2629.51 samples/sec   Loss 11.1776   LearningRate 0.0660   Epoch: 3   Global Step: 155740   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:13:03,182-Speed 2621.79 samples/sec   Loss 11.0721   LearningRate 0.0660   Epoch: 3   Global Step: 155750   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:13:07,078-Speed 2628.76 samples/sec   Loss 10.9655   LearningRate 0.0660   Epoch: 3   Global Step: 155760   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:13:10,977-Speed 2630.88 samples/sec   Loss 11.1837   LearningRate 0.0660   Epoch: 3   Global Step: 155770   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:13:14,875-Speed 2627.44 samples/sec   Loss 11.1915   LearningRate 0.0660   Epoch: 3   Global Step: 155780   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:13:18,771-Speed 2628.24 samples/sec   Loss 11.2692   LearningRate 0.0660   Epoch: 3   Global Step: 155790   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:22,676-Speed 2623.70 samples/sec   Loss 11.2151   LearningRate 0.0660   Epoch: 3   Global Step: 155800   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:26,577-Speed 2625.72 samples/sec   Loss 11.2024   LearningRate 0.0660   Epoch: 3   Global Step: 155810   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:30,471-Speed 2630.18 samples/sec   Loss 11.1498   LearningRate 0.0660   Epoch: 3   Global Step: 155820   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:34,362-Speed 2632.48 samples/sec   Loss 11.2559   LearningRate 0.0660   Epoch: 3   Global Step: 155830   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:38,284-Speed 2611.30 samples/sec   Loss 11.2214   LearningRate 0.0660   Epoch: 3   Global Step: 155840   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:42,176-Speed 2631.30 samples/sec   Loss 11.2537   LearningRate 0.0660   Epoch: 3   Global Step: 155850   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:46,066-Speed 2633.67 samples/sec   Loss 11.0825   LearningRate 0.0660   Epoch: 3   Global Step: 155860   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:49,965-Speed 2626.33 samples/sec   Loss 11.1621   LearningRate 0.0660   Epoch: 3   Global Step: 155870   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:53,861-Speed 2629.48 samples/sec   Loss 11.0486   LearningRate 0.0660   Epoch: 3   Global Step: 155880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:13:57,751-Speed 2632.21 samples/sec   Loss 11.0191   LearningRate 0.0659   Epoch: 3   Global Step: 155890   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:14:01,648-Speed 2628.87 samples/sec   Loss 11.1005   LearningRate 0.0659   Epoch: 3   Global Step: 155900   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:14:05,527-Speed 2640.23 samples/sec   Loss 11.1874   LearningRate 0.0659   Epoch: 3   Global Step: 155910   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:09,419-Speed 2631.40 samples/sec   Loss 11.1232   LearningRate 0.0659   Epoch: 3   Global Step: 155920   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:13,316-Speed 2628.64 samples/sec   Loss 11.0633   LearningRate 0.0659   Epoch: 3   Global Step: 155930   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:17,220-Speed 2623.56 samples/sec   Loss 11.0006   LearningRate 0.0659   Epoch: 3   Global Step: 155940   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:21,135-Speed 2616.21 samples/sec   Loss 11.0747   LearningRate 0.0659   Epoch: 3   Global Step: 155950   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:25,041-Speed 2622.02 samples/sec   Loss 11.0650   LearningRate 0.0659   Epoch: 3   Global Step: 155960   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:28,964-Speed 2610.95 samples/sec   Loss 11.0233   LearningRate 0.0659   Epoch: 3   Global Step: 155970   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:32,876-Speed 2618.37 samples/sec   Loss 11.1282   LearningRate 0.0659   Epoch: 3   Global Step: 155980   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:36,783-Speed 2621.22 samples/sec   Loss 11.0429   LearningRate 0.0659   Epoch: 3   Global Step: 155990   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:40,679-Speed 2628.97 samples/sec   Loss 11.0727   LearningRate 0.0659   Epoch: 3   Global Step: 156000   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:44,561-Speed 2639.28 samples/sec   Loss 11.1412   LearningRate 0.0659   Epoch: 3   Global Step: 156010   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:48,447-Speed 2635.99 samples/sec   Loss 11.1675   LearningRate 0.0659   Epoch: 3   Global Step: 156020   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:52,343-Speed 2628.56 samples/sec   Loss 10.9344   LearningRate 0.0659   Epoch: 3   Global Step: 156030   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:14:56,237-Speed 2630.64 samples/sec   Loss 11.1678   LearningRate 0.0659   Epoch: 3   Global Step: 156040   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:00,128-Speed 2632.41 samples/sec   Loss 11.1144   LearningRate 0.0659   Epoch: 3   Global Step: 156050   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:04,021-Speed 2630.62 samples/sec   Loss 11.2482   LearningRate 0.0659   Epoch: 3   Global Step: 156060   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:07,914-Speed 2630.68 samples/sec   Loss 11.1419   LearningRate 0.0659   Epoch: 3   Global Step: 156070   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:11,808-Speed 2630.31 samples/sec   Loss 11.2035   LearningRate 0.0659   Epoch: 3   Global Step: 156080   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:15,706-Speed 2627.24 samples/sec   Loss 11.0633   LearningRate 0.0659   Epoch: 3   Global Step: 156090   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:19,601-Speed 2629.67 samples/sec   Loss 11.0256   LearningRate 0.0659   Epoch: 3   Global Step: 156100   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:23,500-Speed 2627.15 samples/sec   Loss 10.9724   LearningRate 0.0659   Epoch: 3   Global Step: 156110   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:15:27,411-Speed 2618.96 samples/sec   Loss 11.0605   LearningRate 0.0659   Epoch: 3   Global Step: 156120   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:15:31,307-Speed 2629.72 samples/sec   Loss 11.0189   LearningRate 0.0659   Epoch: 3   Global Step: 156130   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:15:35,216-Speed 2620.02 samples/sec   Loss 11.0058   LearningRate 0.0659   Epoch: 3   Global Step: 156140   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:15:39,124-Speed 2620.20 samples/sec   Loss 11.2181   LearningRate 0.0659   Epoch: 3   Global Step: 156150   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:15:43,009-Speed 2636.76 samples/sec   Loss 11.1166   LearningRate 0.0659   Epoch: 3   Global Step: 156160   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:46,905-Speed 2628.84 samples/sec   Loss 11.0157   LearningRate 0.0659   Epoch: 3   Global Step: 156170   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:50,818-Speed 2618.01 samples/sec   Loss 11.0484   LearningRate 0.0659   Epoch: 3   Global Step: 156180   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:54,720-Speed 2624.51 samples/sec   Loss 11.0188   LearningRate 0.0659   Epoch: 3   Global Step: 156190   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:15:58,615-Speed 2630.06 samples/sec   Loss 11.0867   LearningRate 0.0659   Epoch: 3   Global Step: 156200   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:02,509-Speed 2630.02 samples/sec   Loss 11.1431   LearningRate 0.0659   Epoch: 3   Global Step: 156210   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:06,406-Speed 2628.23 samples/sec   Loss 10.9353   LearningRate 0.0659   Epoch: 3   Global Step: 156220   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:10,307-Speed 2625.54 samples/sec   Loss 11.0423   LearningRate 0.0659   Epoch: 3   Global Step: 156230   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:14,202-Speed 2629.39 samples/sec   Loss 11.1525   LearningRate 0.0659   Epoch: 3   Global Step: 156240   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:18,097-Speed 2629.91 samples/sec   Loss 11.0312   LearningRate 0.0659   Epoch: 3   Global Step: 156250   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:21,997-Speed 2628.10 samples/sec   Loss 11.1252   LearningRate 0.0659   Epoch: 3   Global Step: 156260   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:16:25,900-Speed 2624.37 samples/sec   Loss 11.0805   LearningRate 0.0659   Epoch: 3   Global Step: 156270   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:16:29,803-Speed 2624.11 samples/sec   Loss 11.0007   LearningRate 0.0659   Epoch: 3   Global Step: 156280   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:16:33,686-Speed 2637.63 samples/sec   Loss 11.1079   LearningRate 0.0659   Epoch: 3   Global Step: 156290   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:37,596-Speed 2620.30 samples/sec   Loss 11.1496   LearningRate 0.0659   Epoch: 3   Global Step: 156300   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:41,503-Speed 2621.60 samples/sec   Loss 11.1082   LearningRate 0.0659   Epoch: 3   Global Step: 156310   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:45,409-Speed 2621.75 samples/sec   Loss 11.0826   LearningRate 0.0659   Epoch: 3   Global Step: 156320   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:49,325-Speed 2615.93 samples/sec   Loss 11.0307   LearningRate 0.0659   Epoch: 3   Global Step: 156330   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:53,226-Speed 2625.11 samples/sec   Loss 11.1166   LearningRate 0.0659   Epoch: 3   Global Step: 156340   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:16:57,122-Speed 2629.18 samples/sec   Loss 11.2659   LearningRate 0.0659   Epoch: 3   Global Step: 156350   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:01,032-Speed 2618.84 samples/sec   Loss 11.0473   LearningRate 0.0659   Epoch: 3   Global Step: 156360   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:04,963-Speed 2606.28 samples/sec   Loss 11.0401   LearningRate 0.0659   Epoch: 3   Global Step: 156370   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:08,855-Speed 2631.00 samples/sec   Loss 11.0974   LearningRate 0.0659   Epoch: 3   Global Step: 156380   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:12,749-Speed 2630.53 samples/sec   Loss 11.1351   LearningRate 0.0659   Epoch: 3   Global Step: 156390   Fp16 Grad Scale: 262144   Required: 76 hours
Training: 2022-04-13 13:17:16,634-Speed 2636.71 samples/sec   Loss 11.0258   LearningRate 0.0658   Epoch: 3   Global Step: 156400   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:20,534-Speed 2626.26 samples/sec   Loss 11.0545   LearningRate 0.0658   Epoch: 3   Global Step: 156410   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:24,451-Speed 2614.46 samples/sec   Loss 10.9695   LearningRate 0.0658   Epoch: 3   Global Step: 156420   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:28,366-Speed 2616.03 samples/sec   Loss 11.0415   LearningRate 0.0658   Epoch: 3   Global Step: 156430   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:32,261-Speed 2629.71 samples/sec   Loss 11.1805   LearningRate 0.0658   Epoch: 3   Global Step: 156440   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:36,156-Speed 2629.60 samples/sec   Loss 11.1316   LearningRate 0.0658   Epoch: 3   Global Step: 156450   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:40,060-Speed 2623.28 samples/sec   Loss 10.9509   LearningRate 0.0658   Epoch: 3   Global Step: 156460   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:17:43,948-Speed 2634.46 samples/sec   Loss 11.0199   LearningRate 0.0658   Epoch: 3   Global Step: 156470   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:17:47,855-Speed 2621.50 samples/sec   Loss 11.0394   LearningRate 0.0658   Epoch: 3   Global Step: 156480   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:17:51,762-Speed 2621.99 samples/sec   Loss 11.1266   LearningRate 0.0658   Epoch: 3   Global Step: 156490   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:17:55,658-Speed 2628.94 samples/sec   Loss 10.9918   LearningRate 0.0658   Epoch: 3   Global Step: 156500   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:17:59,552-Speed 2630.01 samples/sec   Loss 11.1837   LearningRate 0.0658   Epoch: 3   Global Step: 156510   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:18:03,447-Speed 2629.38 samples/sec   Loss 11.1767   LearningRate 0.0658   Epoch: 3   Global Step: 156520   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:18:07,346-Speed 2627.64 samples/sec   Loss 11.1315   LearningRate 0.0658   Epoch: 3   Global Step: 156530   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:18:11,239-Speed 2630.46 samples/sec   Loss 11.2450   LearningRate 0.0658   Epoch: 3   Global Step: 156540   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:18:15,129-Speed 2633.24 samples/sec   Loss 11.1292   LearningRate 0.0658   Epoch: 3   Global Step: 156550   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:18:19,022-Speed 2630.94 samples/sec   Loss 11.1617   LearningRate 0.0658   Epoch: 3   Global Step: 156560   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:18:22,924-Speed 2625.05 samples/sec   Loss 11.0917   LearningRate 0.0658   Epoch: 3   Global Step: 156570   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:26,816-Speed 2631.10 samples/sec   Loss 11.0569   LearningRate 0.0658   Epoch: 3   Global Step: 156580   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:30,707-Speed 2632.27 samples/sec   Loss 11.0285   LearningRate 0.0658   Epoch: 3   Global Step: 156590   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:34,599-Speed 2632.52 samples/sec   Loss 11.1067   LearningRate 0.0658   Epoch: 3   Global Step: 156600   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:38,496-Speed 2627.74 samples/sec   Loss 11.1090   LearningRate 0.0658   Epoch: 3   Global Step: 156610   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:42,401-Speed 2623.41 samples/sec   Loss 11.3132   LearningRate 0.0658   Epoch: 3   Global Step: 156620   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:46,301-Speed 2626.07 samples/sec   Loss 11.0221   LearningRate 0.0658   Epoch: 3   Global Step: 156630   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:50,195-Speed 2629.84 samples/sec   Loss 11.0408   LearningRate 0.0658   Epoch: 3   Global Step: 156640   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:18:54,067-Speed 2645.41 samples/sec   Loss 11.7435   LearningRate 0.0658   Epoch: 3   Global Step: 156650   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:18:57,961-Speed 2630.21 samples/sec   Loss 11.6227   LearningRate 0.0658   Epoch: 3   Global Step: 156660   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:01,858-Speed 2628.21 samples/sec   Loss 11.2409   LearningRate 0.0658   Epoch: 3   Global Step: 156670   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:05,837-Speed 2573.84 samples/sec   Loss 11.1777   LearningRate 0.0658   Epoch: 3   Global Step: 156680   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:09,735-Speed 2628.18 samples/sec   Loss 11.0120   LearningRate 0.0658   Epoch: 3   Global Step: 156690   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:13,644-Speed 2620.04 samples/sec   Loss 10.9237   LearningRate 0.0658   Epoch: 3   Global Step: 156700   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:17,536-Speed 2631.53 samples/sec   Loss 11.3260   LearningRate 0.0658   Epoch: 3   Global Step: 156710   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:21,442-Speed 2622.77 samples/sec   Loss 11.0820   LearningRate 0.0658   Epoch: 3   Global Step: 156720   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:25,364-Speed 2610.97 samples/sec   Loss 11.1929   LearningRate 0.0658   Epoch: 3   Global Step: 156730   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:29,270-Speed 2622.15 samples/sec   Loss 11.2631   LearningRate 0.0658   Epoch: 3   Global Step: 156740   Fp16 Grad Scale: 32768   Required: 76 hours
Training: 2022-04-13 13:19:33,168-Speed 2627.40 samples/sec   Loss 10.9760   LearningRate 0.0658   Epoch: 3   Global Step: 156750   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:19:37,068-Speed 2626.70 samples/sec   Loss 11.1255   LearningRate 0.0658   Epoch: 3   Global Step: 156760   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:19:40,965-Speed 2627.94 samples/sec   Loss 11.0867   LearningRate 0.0658   Epoch: 3   Global Step: 156770   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:19:44,920-Speed 2589.77 samples/sec   Loss 11.1070   LearningRate 0.0658   Epoch: 3   Global Step: 156780   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:19:48,814-Speed 2630.56 samples/sec   Loss 11.1037   LearningRate 0.0658   Epoch: 3   Global Step: 156790   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:19:52,710-Speed 2628.97 samples/sec   Loss 11.1353   LearningRate 0.0658   Epoch: 3   Global Step: 156800   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:19:56,608-Speed 2627.48 samples/sec   Loss 11.1322   LearningRate 0.0658   Epoch: 3   Global Step: 156810   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:20:00,508-Speed 2626.71 samples/sec   Loss 11.3646   LearningRate 0.0658   Epoch: 3   Global Step: 156820   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:20:04,406-Speed 2627.40 samples/sec   Loss 11.0753   LearningRate 0.0658   Epoch: 3   Global Step: 156830   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:20:08,301-Speed 2628.84 samples/sec   Loss 11.2847   LearningRate 0.0658   Epoch: 3   Global Step: 156840   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:20:12,200-Speed 2627.23 samples/sec   Loss 10.9723   LearningRate 0.0658   Epoch: 3   Global Step: 156850   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:20:16,088-Speed 2634.71 samples/sec   Loss 11.0152   LearningRate 0.0658   Epoch: 3   Global Step: 156860   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:20:19,995-Speed 2621.33 samples/sec   Loss 11.1577   LearningRate 0.0658   Epoch: 3   Global Step: 156870   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:20:23,889-Speed 2630.70 samples/sec   Loss 11.1669   LearningRate 0.0658   Epoch: 3   Global Step: 156880   Fp16 Grad Scale: 131072   Required: 76 hours
Training: 2022-04-13 13:20:27,765-Speed 2642.63 samples/sec   Loss 11.1136   LearningRate 0.0658   Epoch: 3   Global Step: 156890   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:20:31,663-Speed 2627.62 samples/sec   Loss 11.1051   LearningRate 0.0658   Epoch: 3   Global Step: 156900   Fp16 Grad Scale: 65536   Required: 76 hours
Training: 2022-04-13 13:20:35,555-Speed 2631.31 samples/sec   Loss 10.9863   LearningRate 0.0657   Epoch: 3   Global Step: 156910   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:20:39,452-Speed 2628.25 samples/sec   Loss 10.9806   LearningRate 0.0657   Epoch: 3   Global Step: 156920   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:20:43,346-Speed 2630.21 samples/sec   Loss 11.2286   LearningRate 0.0657   Epoch: 3   Global Step: 156930   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:20:47,250-Speed 2623.37 samples/sec   Loss 11.1935   LearningRate 0.0657   Epoch: 3   Global Step: 156940   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:20:51,144-Speed 2630.38 samples/sec   Loss 11.1306   LearningRate 0.0657   Epoch: 3   Global Step: 156950   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:20:55,052-Speed 2620.57 samples/sec   Loss 11.1238   LearningRate 0.0657   Epoch: 3   Global Step: 156960   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:20:58,943-Speed 2632.61 samples/sec   Loss 11.0949   LearningRate 0.0657   Epoch: 3   Global Step: 156970   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:21:02,841-Speed 2627.93 samples/sec   Loss 11.0471   LearningRate 0.0657   Epoch: 3   Global Step: 156980   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:21:06,735-Speed 2630.36 samples/sec   Loss 10.9186   LearningRate 0.0657   Epoch: 3   Global Step: 156990   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:10,641-Speed 2621.89 samples/sec   Loss 11.0851   LearningRate 0.0657   Epoch: 3   Global Step: 157000   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:14,536-Speed 2629.70 samples/sec   Loss 10.9802   LearningRate 0.0657   Epoch: 3   Global Step: 157010   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:18,432-Speed 2628.80 samples/sec   Loss 11.1047   LearningRate 0.0657   Epoch: 3   Global Step: 157020   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:22,339-Speed 2621.75 samples/sec   Loss 10.9409   LearningRate 0.0657   Epoch: 3   Global Step: 157030   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:26,232-Speed 2630.42 samples/sec   Loss 11.0474   LearningRate 0.0657   Epoch: 3   Global Step: 157040   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:30,131-Speed 2627.28 samples/sec   Loss 11.0304   LearningRate 0.0657   Epoch: 3   Global Step: 157050   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:34,024-Speed 2631.32 samples/sec   Loss 11.1547   LearningRate 0.0657   Epoch: 3   Global Step: 157060   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:37,929-Speed 2622.57 samples/sec   Loss 11.2623   LearningRate 0.0657   Epoch: 3   Global Step: 157070   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:41,825-Speed 2628.99 samples/sec   Loss 11.1166   LearningRate 0.0657   Epoch: 3   Global Step: 157080   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:45,718-Speed 2630.81 samples/sec   Loss 11.0781   LearningRate 0.0657   Epoch: 3   Global Step: 157090   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:21:49,616-Speed 2627.70 samples/sec   Loss 11.1216   LearningRate 0.0657   Epoch: 3   Global Step: 157100   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:21:53,501-Speed 2636.50 samples/sec   Loss 11.0033   LearningRate 0.0657   Epoch: 3   Global Step: 157110   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:21:57,410-Speed 2619.93 samples/sec   Loss 10.9505   LearningRate 0.0657   Epoch: 3   Global Step: 157120   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:01,305-Speed 2629.79 samples/sec   Loss 10.9679   LearningRate 0.0657   Epoch: 3   Global Step: 157130   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:05,204-Speed 2626.55 samples/sec   Loss 11.0108   LearningRate 0.0657   Epoch: 3   Global Step: 157140   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:09,096-Speed 2631.99 samples/sec   Loss 11.0585   LearningRate 0.0657   Epoch: 3   Global Step: 157150   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:12,992-Speed 2629.18 samples/sec   Loss 11.1795   LearningRate 0.0657   Epoch: 3   Global Step: 157160   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:16,890-Speed 2627.58 samples/sec   Loss 11.1119   LearningRate 0.0657   Epoch: 3   Global Step: 157170   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:20,783-Speed 2630.95 samples/sec   Loss 11.0065   LearningRate 0.0657   Epoch: 3   Global Step: 157180   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:24,686-Speed 2624.18 samples/sec   Loss 11.1773   LearningRate 0.0657   Epoch: 3   Global Step: 157190   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:28,660-Speed 2577.12 samples/sec   Loss 11.0295   LearningRate 0.0657   Epoch: 3   Global Step: 157200   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:32,570-Speed 2619.48 samples/sec   Loss 11.0100   LearningRate 0.0657   Epoch: 3   Global Step: 157210   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:22:36,476-Speed 2623.10 samples/sec   Loss 11.2059   LearningRate 0.0657   Epoch: 3   Global Step: 157220   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:22:40,485-Speed 2554.60 samples/sec   Loss 11.1905   LearningRate 0.0657   Epoch: 3   Global Step: 157230   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:22:44,362-Speed 2641.70 samples/sec   Loss 11.0598   LearningRate 0.0657   Epoch: 3   Global Step: 157240   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:48,255-Speed 2631.28 samples/sec   Loss 10.9256   LearningRate 0.0657   Epoch: 3   Global Step: 157250   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:52,175-Speed 2613.36 samples/sec   Loss 11.1204   LearningRate 0.0657   Epoch: 3   Global Step: 157260   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:22:56,117-Speed 2597.86 samples/sec   Loss 11.1828   LearningRate 0.0657   Epoch: 3   Global Step: 157270   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:00,028-Speed 2618.56 samples/sec   Loss 11.1459   LearningRate 0.0657   Epoch: 3   Global Step: 157280   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:03,939-Speed 2618.89 samples/sec   Loss 11.1567   LearningRate 0.0657   Epoch: 3   Global Step: 157290   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:07,838-Speed 2627.10 samples/sec   Loss 11.0276   LearningRate 0.0657   Epoch: 3   Global Step: 157300   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:11,732-Speed 2630.56 samples/sec   Loss 11.0263   LearningRate 0.0657   Epoch: 3   Global Step: 157310   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:15,624-Speed 2631.43 samples/sec   Loss 11.0590   LearningRate 0.0657   Epoch: 3   Global Step: 157320   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:19,518-Speed 2630.30 samples/sec   Loss 11.1296   LearningRate 0.0657   Epoch: 3   Global Step: 157330   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:23,418-Speed 2626.63 samples/sec   Loss 11.0013   LearningRate 0.0657   Epoch: 3   Global Step: 157340   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:27,309-Speed 2631.80 samples/sec   Loss 11.0813   LearningRate 0.0657   Epoch: 3   Global Step: 157350   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:31,208-Speed 2626.72 samples/sec   Loss 11.1384   LearningRate 0.0657   Epoch: 3   Global Step: 157360   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:23:35,112-Speed 2623.86 samples/sec   Loss 11.1501   LearningRate 0.0657   Epoch: 3   Global Step: 157370   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:23:39,021-Speed 2620.29 samples/sec   Loss 11.0319   LearningRate 0.0657   Epoch: 3   Global Step: 157380   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:23:42,930-Speed 2620.33 samples/sec   Loss 10.9809   LearningRate 0.0657   Epoch: 3   Global Step: 157390   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:23:46,840-Speed 2619.21 samples/sec   Loss 11.1430   LearningRate 0.0657   Epoch: 3   Global Step: 157400   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:23:50,742-Speed 2625.11 samples/sec   Loss 11.0852   LearningRate 0.0657   Epoch: 3   Global Step: 157410   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:23:54,638-Speed 2628.67 samples/sec   Loss 11.0447   LearningRate 0.0656   Epoch: 3   Global Step: 157420   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:23:58,547-Speed 2619.88 samples/sec   Loss 10.9671   LearningRate 0.0656   Epoch: 3   Global Step: 157430   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:02,446-Speed 2627.38 samples/sec   Loss 11.0544   LearningRate 0.0656   Epoch: 3   Global Step: 157440   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:06,342-Speed 2629.41 samples/sec   Loss 11.0970   LearningRate 0.0656   Epoch: 3   Global Step: 157450   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:10,234-Speed 2630.96 samples/sec   Loss 11.1641   LearningRate 0.0656   Epoch: 3   Global Step: 157460   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:14,131-Speed 2628.49 samples/sec   Loss 11.1483   LearningRate 0.0656   Epoch: 3   Global Step: 157470   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:24:18,032-Speed 2625.83 samples/sec   Loss 11.0114   LearningRate 0.0656   Epoch: 3   Global Step: 157480   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:24:21,919-Speed 2635.22 samples/sec   Loss 11.0215   LearningRate 0.0656   Epoch: 3   Global Step: 157490   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:25,822-Speed 2624.07 samples/sec   Loss 11.1728   LearningRate 0.0656   Epoch: 3   Global Step: 157500   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:29,735-Speed 2616.74 samples/sec   Loss 11.0298   LearningRate 0.0656   Epoch: 3   Global Step: 157510   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:33,632-Speed 2628.19 samples/sec   Loss 11.1431   LearningRate 0.0656   Epoch: 3   Global Step: 157520   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:37,530-Speed 2628.20 samples/sec   Loss 10.9670   LearningRate 0.0656   Epoch: 3   Global Step: 157530   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:41,428-Speed 2627.97 samples/sec   Loss 11.0530   LearningRate 0.0656   Epoch: 3   Global Step: 157540   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:45,324-Speed 2628.55 samples/sec   Loss 10.9979   LearningRate 0.0656   Epoch: 3   Global Step: 157550   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:49,217-Speed 2631.26 samples/sec   Loss 11.1027   LearningRate 0.0656   Epoch: 3   Global Step: 157560   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:53,116-Speed 2626.32 samples/sec   Loss 11.0778   LearningRate 0.0656   Epoch: 3   Global Step: 157570   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:24:57,010-Speed 2630.47 samples/sec   Loss 11.0453   LearningRate 0.0656   Epoch: 3   Global Step: 157580   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:00,906-Speed 2628.70 samples/sec   Loss 11.0441   LearningRate 0.0656   Epoch: 3   Global Step: 157590   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:25:04,803-Speed 2628.57 samples/sec   Loss 10.9532   LearningRate 0.0656   Epoch: 3   Global Step: 157600   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:25:08,683-Speed 2639.19 samples/sec   Loss 11.0461   LearningRate 0.0656   Epoch: 3   Global Step: 157610   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:12,595-Speed 2618.27 samples/sec   Loss 11.1466   LearningRate 0.0656   Epoch: 3   Global Step: 157620   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:16,492-Speed 2628.64 samples/sec   Loss 11.0284   LearningRate 0.0656   Epoch: 3   Global Step: 157630   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:20,385-Speed 2630.73 samples/sec   Loss 11.0187   LearningRate 0.0656   Epoch: 3   Global Step: 157640   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:24,279-Speed 2630.42 samples/sec   Loss 11.0660   LearningRate 0.0656   Epoch: 3   Global Step: 157650   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:28,176-Speed 2628.19 samples/sec   Loss 10.8698   LearningRate 0.0656   Epoch: 3   Global Step: 157660   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:32,078-Speed 2624.99 samples/sec   Loss 11.1079   LearningRate 0.0656   Epoch: 3   Global Step: 157670   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:35,975-Speed 2628.50 samples/sec   Loss 11.0752   LearningRate 0.0656   Epoch: 3   Global Step: 157680   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:39,880-Speed 2622.59 samples/sec   Loss 11.1239   LearningRate 0.0656   Epoch: 3   Global Step: 157690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:43,784-Speed 2623.09 samples/sec   Loss 11.1230   LearningRate 0.0656   Epoch: 3   Global Step: 157700   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:47,682-Speed 2628.20 samples/sec   Loss 10.9456   LearningRate 0.0656   Epoch: 3   Global Step: 157710   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:25:51,580-Speed 2627.21 samples/sec   Loss 11.1009   LearningRate 0.0656   Epoch: 3   Global Step: 157720   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:55,563-Speed 2571.84 samples/sec   Loss 11.0926   LearningRate 0.0656   Epoch: 3   Global Step: 157730   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:25:59,468-Speed 2622.73 samples/sec   Loss 11.0888   LearningRate 0.0656   Epoch: 3   Global Step: 157740   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:26:03,357-Speed 2634.03 samples/sec   Loss 10.9887   LearningRate 0.0656   Epoch: 3   Global Step: 157750   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:07,257-Speed 2626.07 samples/sec   Loss 11.0603   LearningRate 0.0656   Epoch: 3   Global Step: 157760   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:11,156-Speed 2626.82 samples/sec   Loss 10.9405   LearningRate 0.0656   Epoch: 3   Global Step: 157770   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:15,051-Speed 2629.81 samples/sec   Loss 10.9580   LearningRate 0.0656   Epoch: 3   Global Step: 157780   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:18,944-Speed 2630.38 samples/sec   Loss 10.8636   LearningRate 0.0656   Epoch: 3   Global Step: 157790   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:22,853-Speed 2620.63 samples/sec   Loss 10.9487   LearningRate 0.0656   Epoch: 3   Global Step: 157800   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:26,751-Speed 2627.26 samples/sec   Loss 11.0870   LearningRate 0.0656   Epoch: 3   Global Step: 157810   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:30,648-Speed 2628.61 samples/sec   Loss 11.2181   LearningRate 0.0656   Epoch: 3   Global Step: 157820   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:34,544-Speed 2629.15 samples/sec   Loss 11.1467   LearningRate 0.0656   Epoch: 3   Global Step: 157830   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:38,490-Speed 2595.58 samples/sec   Loss 11.0427   LearningRate 0.0656   Epoch: 3   Global Step: 157840   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:26:42,392-Speed 2624.73 samples/sec   Loss 11.1799   LearningRate 0.0656   Epoch: 3   Global Step: 157850   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:26:46,287-Speed 2629.70 samples/sec   Loss 11.1548   LearningRate 0.0656   Epoch: 3   Global Step: 157860   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:26:50,181-Speed 2630.31 samples/sec   Loss 11.1292   LearningRate 0.0656   Epoch: 3   Global Step: 157870   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:26:54,078-Speed 2628.58 samples/sec   Loss 10.9326   LearningRate 0.0656   Epoch: 3   Global Step: 157880   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:26:57,973-Speed 2629.64 samples/sec   Loss 11.0211   LearningRate 0.0656   Epoch: 3   Global Step: 157890   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:01,875-Speed 2624.95 samples/sec   Loss 11.1477   LearningRate 0.0656   Epoch: 3   Global Step: 157900   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:05,779-Speed 2623.58 samples/sec   Loss 11.0835   LearningRate 0.0656   Epoch: 3   Global Step: 157910   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:09,684-Speed 2622.85 samples/sec   Loss 10.9023   LearningRate 0.0656   Epoch: 3   Global Step: 157920   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:13,584-Speed 2626.19 samples/sec   Loss 11.1285   LearningRate 0.0655   Epoch: 3   Global Step: 157930   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:17,483-Speed 2627.03 samples/sec   Loss 10.9227   LearningRate 0.0655   Epoch: 3   Global Step: 157940   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:21,402-Speed 2613.61 samples/sec   Loss 11.0143   LearningRate 0.0655   Epoch: 3   Global Step: 157950   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:27:25,290-Speed 2634.61 samples/sec   Loss 11.0854   LearningRate 0.0655   Epoch: 3   Global Step: 157960   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:27:29,276-Speed 2569.54 samples/sec   Loss 11.0898   LearningRate 0.0655   Epoch: 3   Global Step: 157970   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:27:33,233-Speed 2588.12 samples/sec   Loss 11.0233   LearningRate 0.0655   Epoch: 3   Global Step: 157980   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:37,135-Speed 2624.78 samples/sec   Loss 10.9743   LearningRate 0.0655   Epoch: 3   Global Step: 157990   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:41,035-Speed 2626.36 samples/sec   Loss 11.0052   LearningRate 0.0655   Epoch: 3   Global Step: 158000   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:44,940-Speed 2623.15 samples/sec   Loss 11.1171   LearningRate 0.0655   Epoch: 3   Global Step: 158010   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:48,851-Speed 2618.60 samples/sec   Loss 11.1392   LearningRate 0.0655   Epoch: 3   Global Step: 158020   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:52,773-Speed 2611.48 samples/sec   Loss 10.9524   LearningRate 0.0655   Epoch: 3   Global Step: 158030   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:27:56,662-Speed 2633.64 samples/sec   Loss 11.0757   LearningRate 0.0655   Epoch: 3   Global Step: 158040   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:00,567-Speed 2622.85 samples/sec   Loss 11.0646   LearningRate 0.0655   Epoch: 3   Global Step: 158050   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:04,612-Speed 2532.02 samples/sec   Loss 10.9918   LearningRate 0.0655   Epoch: 3   Global Step: 158060   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:08,539-Speed 2608.61 samples/sec   Loss 10.9907   LearningRate 0.0655   Epoch: 3   Global Step: 158070   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:12,532-Speed 2564.67 samples/sec   Loss 11.1038   LearningRate 0.0655   Epoch: 3   Global Step: 158080   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:16,441-Speed 2620.68 samples/sec   Loss 11.1671   LearningRate 0.0655   Epoch: 3   Global Step: 158090   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:20,340-Speed 2626.85 samples/sec   Loss 11.0691   LearningRate 0.0655   Epoch: 3   Global Step: 158100   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:24,242-Speed 2624.86 samples/sec   Loss 10.9592   LearningRate 0.0655   Epoch: 3   Global Step: 158110   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:28,162-Speed 2613.27 samples/sec   Loss 11.0784   LearningRate 0.0655   Epoch: 3   Global Step: 158120   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:32,166-Speed 2557.98 samples/sec   Loss 11.1611   LearningRate 0.0655   Epoch: 3   Global Step: 158130   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:28:36,059-Speed 2630.84 samples/sec   Loss 11.2649   LearningRate 0.0655   Epoch: 3   Global Step: 158140   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:28:39,957-Speed 2627.51 samples/sec   Loss 10.9287   LearningRate 0.0655   Epoch: 3   Global Step: 158150   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:28:43,868-Speed 2619.52 samples/sec   Loss 11.0348   LearningRate 0.0655   Epoch: 3   Global Step: 158160   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:28:47,772-Speed 2623.22 samples/sec   Loss 11.1075   LearningRate 0.0655   Epoch: 3   Global Step: 158170   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:28:51,683-Speed 2619.78 samples/sec   Loss 11.0075   LearningRate 0.0655   Epoch: 3   Global Step: 158180   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:28:55,591-Speed 2620.50 samples/sec   Loss 11.1345   LearningRate 0.0655   Epoch: 3   Global Step: 158190   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:28:59,498-Speed 2621.25 samples/sec   Loss 10.9439   LearningRate 0.0655   Epoch: 3   Global Step: 158200   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:03,396-Speed 2627.92 samples/sec   Loss 11.0241   LearningRate 0.0655   Epoch: 3   Global Step: 158210   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:07,290-Speed 2630.12 samples/sec   Loss 11.0618   LearningRate 0.0655   Epoch: 3   Global Step: 158220   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:11,194-Speed 2623.55 samples/sec   Loss 11.0416   LearningRate 0.0655   Epoch: 3   Global Step: 158230   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:15,099-Speed 2622.99 samples/sec   Loss 11.1084   LearningRate 0.0655   Epoch: 3   Global Step: 158240   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:29:19,019-Speed 2612.66 samples/sec   Loss 10.8599   LearningRate 0.0655   Epoch: 3   Global Step: 158250   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:22,922-Speed 2624.58 samples/sec   Loss 10.9186   LearningRate 0.0655   Epoch: 3   Global Step: 158260   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:26,830-Speed 2620.96 samples/sec   Loss 10.9959   LearningRate 0.0655   Epoch: 3   Global Step: 158270   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:30,727-Speed 2628.30 samples/sec   Loss 11.1525   LearningRate 0.0655   Epoch: 3   Global Step: 158280   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:34,620-Speed 2630.40 samples/sec   Loss 11.0176   LearningRate 0.0655   Epoch: 3   Global Step: 158290   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:38,519-Speed 2626.94 samples/sec   Loss 10.9298   LearningRate 0.0655   Epoch: 3   Global Step: 158300   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:42,418-Speed 2626.99 samples/sec   Loss 10.9840   LearningRate 0.0655   Epoch: 3   Global Step: 158310   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:46,311-Speed 2631.21 samples/sec   Loss 10.9284   LearningRate 0.0655   Epoch: 3   Global Step: 158320   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:50,206-Speed 2629.42 samples/sec   Loss 11.0663   LearningRate 0.0655   Epoch: 3   Global Step: 158330   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:54,100-Speed 2630.30 samples/sec   Loss 11.0194   LearningRate 0.0655   Epoch: 3   Global Step: 158340   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:29:58,003-Speed 2623.84 samples/sec   Loss 11.0874   LearningRate 0.0655   Epoch: 3   Global Step: 158350   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:30:01,882-Speed 2640.89 samples/sec   Loss 10.9742   LearningRate 0.0655   Epoch: 3   Global Step: 158360   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:05,782-Speed 2626.35 samples/sec   Loss 10.9956   LearningRate 0.0655   Epoch: 3   Global Step: 158370   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:09,698-Speed 2615.71 samples/sec   Loss 10.8955   LearningRate 0.0655   Epoch: 3   Global Step: 158380   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:13,595-Speed 2628.10 samples/sec   Loss 10.9382   LearningRate 0.0655   Epoch: 3   Global Step: 158390   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:17,489-Speed 2630.64 samples/sec   Loss 11.0688   LearningRate 0.0655   Epoch: 3   Global Step: 158400   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:21,385-Speed 2628.59 samples/sec   Loss 11.1027   LearningRate 0.0655   Epoch: 3   Global Step: 158410   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:25,280-Speed 2629.37 samples/sec   Loss 11.0543   LearningRate 0.0655   Epoch: 3   Global Step: 158420   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:29,172-Speed 2631.87 samples/sec   Loss 11.1253   LearningRate 0.0655   Epoch: 3   Global Step: 158430   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:30:33,055-Speed 2637.69 samples/sec   Loss 11.0113   LearningRate 0.0655   Epoch: 3   Global Step: 158440   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:30:36,951-Speed 2629.23 samples/sec   Loss 10.9324   LearningRate 0.0654   Epoch: 3   Global Step: 158450   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:30:40,843-Speed 2631.75 samples/sec   Loss 11.1827   LearningRate 0.0654   Epoch: 3   Global Step: 158460   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:30:44,733-Speed 2632.96 samples/sec   Loss 11.0009   LearningRate 0.0654   Epoch: 3   Global Step: 158470   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:30:48,626-Speed 2630.62 samples/sec   Loss 11.1596   LearningRate 0.0654   Epoch: 3   Global Step: 158480   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:30:52,518-Speed 2631.61 samples/sec   Loss 11.0842   LearningRate 0.0654   Epoch: 3   Global Step: 158490   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:30:56,411-Speed 2631.18 samples/sec   Loss 11.0328   LearningRate 0.0654   Epoch: 3   Global Step: 158500   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:31:00,306-Speed 2629.78 samples/sec   Loss 11.1082   LearningRate 0.0654   Epoch: 3   Global Step: 158510   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:31:04,205-Speed 2626.63 samples/sec   Loss 11.1368   LearningRate 0.0654   Epoch: 3   Global Step: 158520   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:31:08,102-Speed 2628.78 samples/sec   Loss 11.0776   LearningRate 0.0654   Epoch: 3   Global Step: 158530   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:31:12,037-Speed 2602.99 samples/sec   Loss 11.0585   LearningRate 0.0654   Epoch: 3   Global Step: 158540   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:15,930-Speed 2630.90 samples/sec   Loss 11.1082   LearningRate 0.0654   Epoch: 3   Global Step: 158550   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:19,827-Speed 2628.54 samples/sec   Loss 10.9775   LearningRate 0.0654   Epoch: 3   Global Step: 158560   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:23,724-Speed 2628.36 samples/sec   Loss 10.8590   LearningRate 0.0654   Epoch: 3   Global Step: 158570   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:27,673-Speed 2594.00 samples/sec   Loss 11.0476   LearningRate 0.0654   Epoch: 3   Global Step: 158580   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:31,569-Speed 2629.04 samples/sec   Loss 11.1358   LearningRate 0.0654   Epoch: 3   Global Step: 158590   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:35,489-Speed 2612.84 samples/sec   Loss 10.9978   LearningRate 0.0654   Epoch: 3   Global Step: 158600   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:39,432-Speed 2597.84 samples/sec   Loss 11.1069   LearningRate 0.0654   Epoch: 3   Global Step: 158610   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:43,339-Speed 2621.76 samples/sec   Loss 10.9404   LearningRate 0.0654   Epoch: 3   Global Step: 158620   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:47,242-Speed 2625.01 samples/sec   Loss 11.1078   LearningRate 0.0654   Epoch: 3   Global Step: 158630   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:51,129-Speed 2634.26 samples/sec   Loss 11.1671   LearningRate 0.0654   Epoch: 3   Global Step: 158640   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:55,024-Speed 2630.79 samples/sec   Loss 11.0949   LearningRate 0.0654   Epoch: 3   Global Step: 158650   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:31:58,919-Speed 2629.09 samples/sec   Loss 11.0548   LearningRate 0.0654   Epoch: 3   Global Step: 158660   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:02,814-Speed 2629.67 samples/sec   Loss 11.0745   LearningRate 0.0654   Epoch: 3   Global Step: 158670   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:06,733-Speed 2613.21 samples/sec   Loss 11.0801   LearningRate 0.0654   Epoch: 3   Global Step: 158680   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:10,626-Speed 2631.46 samples/sec   Loss 10.9052   LearningRate 0.0654   Epoch: 3   Global Step: 158690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:14,523-Speed 2628.62 samples/sec   Loss 11.1758   LearningRate 0.0654   Epoch: 3   Global Step: 158700   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:18,422-Speed 2627.12 samples/sec   Loss 10.8589   LearningRate 0.0654   Epoch: 3   Global Step: 158710   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:22,321-Speed 2627.00 samples/sec   Loss 11.0772   LearningRate 0.0654   Epoch: 3   Global Step: 158720   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:26,213-Speed 2631.71 samples/sec   Loss 11.0902   LearningRate 0.0654   Epoch: 3   Global Step: 158730   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:30,111-Speed 2627.40 samples/sec   Loss 11.0294   LearningRate 0.0654   Epoch: 3   Global Step: 158740   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:32:34,028-Speed 2614.57 samples/sec   Loss 11.0041   LearningRate 0.0654   Epoch: 3   Global Step: 158750   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:32:37,931-Speed 2623.83 samples/sec   Loss 10.9601   LearningRate 0.0654   Epoch: 3   Global Step: 158760   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:41,838-Speed 2621.77 samples/sec   Loss 10.9906   LearningRate 0.0654   Epoch: 3   Global Step: 158770   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:45,868-Speed 2541.67 samples/sec   Loss 11.0142   LearningRate 0.0654   Epoch: 3   Global Step: 158780   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:49,782-Speed 2617.38 samples/sec   Loss 10.9692   LearningRate 0.0654   Epoch: 3   Global Step: 158790   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:53,682-Speed 2626.10 samples/sec   Loss 11.0966   LearningRate 0.0654   Epoch: 3   Global Step: 158800   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:32:57,576-Speed 2630.67 samples/sec   Loss 11.0024   LearningRate 0.0654   Epoch: 3   Global Step: 158810   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:33:01,478-Speed 2624.43 samples/sec   Loss 10.9753   LearningRate 0.0654   Epoch: 3   Global Step: 158820   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:33:05,370-Speed 2631.68 samples/sec   Loss 11.1323   LearningRate 0.0654   Epoch: 3   Global Step: 158830   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:33:09,250-Speed 2639.77 samples/sec   Loss 11.1611   LearningRate 0.0654   Epoch: 3   Global Step: 158840   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:13,134-Speed 2637.16 samples/sec   Loss 10.9784   LearningRate 0.0654   Epoch: 3   Global Step: 158850   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:17,041-Speed 2621.72 samples/sec   Loss 10.9163   LearningRate 0.0654   Epoch: 3   Global Step: 158860   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:21,000-Speed 2587.16 samples/sec   Loss 11.0519   LearningRate 0.0654   Epoch: 3   Global Step: 158870   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:24,893-Speed 2631.03 samples/sec   Loss 10.9338   LearningRate 0.0654   Epoch: 3   Global Step: 158880   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:28,786-Speed 2630.98 samples/sec   Loss 11.0241   LearningRate 0.0654   Epoch: 3   Global Step: 158890   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:32,675-Speed 2633.62 samples/sec   Loss 10.9925   LearningRate 0.0654   Epoch: 3   Global Step: 158900   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:36,565-Speed 2633.30 samples/sec   Loss 11.0176   LearningRate 0.0654   Epoch: 3   Global Step: 158910   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:40,455-Speed 2632.70 samples/sec   Loss 11.0207   LearningRate 0.0654   Epoch: 3   Global Step: 158920   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:44,350-Speed 2629.42 samples/sec   Loss 11.0176   LearningRate 0.0654   Epoch: 3   Global Step: 158930   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:33:48,244-Speed 2630.68 samples/sec   Loss 11.0826   LearningRate 0.0654   Epoch: 3   Global Step: 158940   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:33:52,155-Speed 2618.46 samples/sec   Loss 10.8754   LearningRate 0.0654   Epoch: 3   Global Step: 158950   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:33:56,066-Speed 2618.96 samples/sec   Loss 11.0415   LearningRate 0.0653   Epoch: 3   Global Step: 158960   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:33:59,967-Speed 2625.69 samples/sec   Loss 10.9902   LearningRate 0.0653   Epoch: 3   Global Step: 158970   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:03,863-Speed 2629.15 samples/sec   Loss 10.9272   LearningRate 0.0653   Epoch: 3   Global Step: 158980   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:07,758-Speed 2629.26 samples/sec   Loss 11.0971   LearningRate 0.0653   Epoch: 3   Global Step: 158990   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:11,852-Speed 2501.94 samples/sec   Loss 11.0825   LearningRate 0.0653   Epoch: 3   Global Step: 159000   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:15,830-Speed 2574.87 samples/sec   Loss 10.9934   LearningRate 0.0653   Epoch: 3   Global Step: 159010   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:19,725-Speed 2629.14 samples/sec   Loss 11.0094   LearningRate 0.0653   Epoch: 3   Global Step: 159020   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:23,624-Speed 2627.70 samples/sec   Loss 10.9966   LearningRate 0.0653   Epoch: 3   Global Step: 159030   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:27,518-Speed 2630.58 samples/sec   Loss 10.9761   LearningRate 0.0653   Epoch: 3   Global Step: 159040   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:34:31,426-Speed 2620.71 samples/sec   Loss 10.8631   LearningRate 0.0653   Epoch: 3   Global Step: 159050   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:35,326-Speed 2626.36 samples/sec   Loss 10.8999   LearningRate 0.0653   Epoch: 3   Global Step: 159060   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:39,236-Speed 2619.86 samples/sec   Loss 11.0969   LearningRate 0.0653   Epoch: 3   Global Step: 159070   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:43,131-Speed 2629.49 samples/sec   Loss 10.9509   LearningRate 0.0653   Epoch: 3   Global Step: 159080   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:47,026-Speed 2629.87 samples/sec   Loss 11.0387   LearningRate 0.0653   Epoch: 3   Global Step: 159090   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:50,920-Speed 2631.02 samples/sec   Loss 10.9035   LearningRate 0.0653   Epoch: 3   Global Step: 159100   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:54,831-Speed 2618.41 samples/sec   Loss 11.0478   LearningRate 0.0653   Epoch: 3   Global Step: 159110   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:34:58,732-Speed 2625.71 samples/sec   Loss 10.7785   LearningRate 0.0653   Epoch: 3   Global Step: 159120   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:02,651-Speed 2613.04 samples/sec   Loss 11.0932   LearningRate 0.0653   Epoch: 3   Global Step: 159130   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:06,545-Speed 2630.85 samples/sec   Loss 11.0589   LearningRate 0.0653   Epoch: 3   Global Step: 159140   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:10,439-Speed 2630.19 samples/sec   Loss 11.0685   LearningRate 0.0653   Epoch: 3   Global Step: 159150   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:35:14,335-Speed 2629.31 samples/sec   Loss 11.0339   LearningRate 0.0653   Epoch: 3   Global Step: 159160   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:35:18,235-Speed 2625.94 samples/sec   Loss 10.9380   LearningRate 0.0653   Epoch: 3   Global Step: 159170   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:35:22,128-Speed 2631.21 samples/sec   Loss 10.9841   LearningRate 0.0653   Epoch: 3   Global Step: 159180   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:35:26,027-Speed 2627.39 samples/sec   Loss 11.0233   LearningRate 0.0653   Epoch: 3   Global Step: 159190   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:35:29,914-Speed 2635.06 samples/sec   Loss 11.0068   LearningRate 0.0653   Epoch: 3   Global Step: 159200   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:33,831-Speed 2614.23 samples/sec   Loss 11.0771   LearningRate 0.0653   Epoch: 3   Global Step: 159210   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:37,728-Speed 2628.98 samples/sec   Loss 11.0237   LearningRate 0.0653   Epoch: 3   Global Step: 159220   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:41,627-Speed 2626.77 samples/sec   Loss 10.9516   LearningRate 0.0653   Epoch: 3   Global Step: 159230   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:45,521-Speed 2630.39 samples/sec   Loss 10.9383   LearningRate 0.0653   Epoch: 3   Global Step: 159240   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:49,420-Speed 2627.26 samples/sec   Loss 10.9717   LearningRate 0.0653   Epoch: 3   Global Step: 159250   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:53,326-Speed 2622.52 samples/sec   Loss 10.9242   LearningRate 0.0653   Epoch: 3   Global Step: 159260   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:35:57,228-Speed 2624.56 samples/sec   Loss 10.9373   LearningRate 0.0653   Epoch: 3   Global Step: 159270   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:01,154-Speed 2608.89 samples/sec   Loss 11.1220   LearningRate 0.0653   Epoch: 3   Global Step: 159280   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:05,050-Speed 2629.27 samples/sec   Loss 11.1101   LearningRate 0.0653   Epoch: 3   Global Step: 159290   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:08,981-Speed 2605.96 samples/sec   Loss 11.0193   LearningRate 0.0653   Epoch: 3   Global Step: 159300   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:36:12,887-Speed 2622.42 samples/sec   Loss 11.0508   LearningRate 0.0653   Epoch: 3   Global Step: 159310   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:36:16,788-Speed 2626.10 samples/sec   Loss 11.0537   LearningRate 0.0653   Epoch: 3   Global Step: 159320   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:36:20,764-Speed 2575.81 samples/sec   Loss 10.9994   LearningRate 0.0653   Epoch: 3   Global Step: 159330   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:36:24,754-Speed 2567.17 samples/sec   Loss 11.0186   LearningRate 0.0653   Epoch: 3   Global Step: 159340   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:28,685-Speed 2605.96 samples/sec   Loss 10.9081   LearningRate 0.0653   Epoch: 3   Global Step: 159350   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:32,666-Speed 2572.41 samples/sec   Loss 10.9585   LearningRate 0.0653   Epoch: 3   Global Step: 159360   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:36,560-Speed 2630.69 samples/sec   Loss 11.2307   LearningRate 0.0653   Epoch: 3   Global Step: 159370   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:40,497-Speed 2601.01 samples/sec   Loss 10.8503   LearningRate 0.0653   Epoch: 3   Global Step: 159380   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:44,446-Speed 2594.29 samples/sec   Loss 11.0791   LearningRate 0.0653   Epoch: 3   Global Step: 159390   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:48,351-Speed 2623.12 samples/sec   Loss 10.8380   LearningRate 0.0653   Epoch: 3   Global Step: 159400   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:52,265-Speed 2616.86 samples/sec   Loss 11.0274   LearningRate 0.0653   Epoch: 3   Global Step: 159410   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:36:56,163-Speed 2627.21 samples/sec   Loss 10.9564   LearningRate 0.0653   Epoch: 3   Global Step: 159420   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:37:00,056-Speed 2631.29 samples/sec   Loss 11.0029   LearningRate 0.0653   Epoch: 3   Global Step: 159430   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:37:03,936-Speed 2639.13 samples/sec   Loss 11.0265   LearningRate 0.0653   Epoch: 3   Global Step: 159440   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:37:07,836-Speed 2626.67 samples/sec   Loss 11.0949   LearningRate 0.0653   Epoch: 3   Global Step: 159450   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:37:11,727-Speed 2632.22 samples/sec   Loss 11.1059   LearningRate 0.0653   Epoch: 3   Global Step: 159460   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:37:15,620-Speed 2631.03 samples/sec   Loss 10.9951   LearningRate 0.0652   Epoch: 3   Global Step: 159470   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:37:19,515-Speed 2629.83 samples/sec   Loss 10.8879   LearningRate 0.0652   Epoch: 3   Global Step: 159480   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:37:23,398-Speed 2637.97 samples/sec   Loss 11.2656   LearningRate 0.0652   Epoch: 3   Global Step: 159490   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:27,299-Speed 2625.43 samples/sec   Loss 11.5980   LearningRate 0.0652   Epoch: 3   Global Step: 159500   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:31,197-Speed 2627.52 samples/sec   Loss 11.3739   LearningRate 0.0652   Epoch: 3   Global Step: 159510   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:35,094-Speed 2628.08 samples/sec   Loss 11.2510   LearningRate 0.0652   Epoch: 3   Global Step: 159520   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:38,986-Speed 2631.72 samples/sec   Loss 11.1301   LearningRate 0.0652   Epoch: 3   Global Step: 159530   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:42,876-Speed 2632.73 samples/sec   Loss 10.9917   LearningRate 0.0652   Epoch: 3   Global Step: 159540   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:46,770-Speed 2630.43 samples/sec   Loss 11.0527   LearningRate 0.0652   Epoch: 3   Global Step: 159550   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:50,663-Speed 2631.68 samples/sec   Loss 11.1123   LearningRate 0.0652   Epoch: 3   Global Step: 159560   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:54,557-Speed 2630.15 samples/sec   Loss 11.1206   LearningRate 0.0652   Epoch: 3   Global Step: 159570   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:37:58,447-Speed 2632.78 samples/sec   Loss 11.1547   LearningRate 0.0652   Epoch: 3   Global Step: 159580   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:38:02,347-Speed 2626.08 samples/sec   Loss 11.0379   LearningRate 0.0652   Epoch: 3   Global Step: 159590   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:06,248-Speed 2625.32 samples/sec   Loss 10.9958   LearningRate 0.0652   Epoch: 3   Global Step: 159600   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:10,153-Speed 2623.25 samples/sec   Loss 11.1787   LearningRate 0.0652   Epoch: 3   Global Step: 159610   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:14,055-Speed 2625.10 samples/sec   Loss 11.0766   LearningRate 0.0652   Epoch: 3   Global Step: 159620   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:17,958-Speed 2624.25 samples/sec   Loss 11.0770   LearningRate 0.0652   Epoch: 3   Global Step: 159630   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:21,863-Speed 2623.17 samples/sec   Loss 11.2137   LearningRate 0.0652   Epoch: 3   Global Step: 159640   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:25,767-Speed 2623.34 samples/sec   Loss 11.2070   LearningRate 0.0652   Epoch: 3   Global Step: 159650   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:29,670-Speed 2624.48 samples/sec   Loss 11.0141   LearningRate 0.0652   Epoch: 3   Global Step: 159660   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:33,580-Speed 2619.28 samples/sec   Loss 11.0365   LearningRate 0.0652   Epoch: 3   Global Step: 159670   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:37,484-Speed 2623.24 samples/sec   Loss 11.2478   LearningRate 0.0652   Epoch: 3   Global Step: 159680   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:41,383-Speed 2626.97 samples/sec   Loss 11.0513   LearningRate 0.0652   Epoch: 3   Global Step: 159690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:38:45,258-Speed 2643.26 samples/sec   Loss 11.0504   LearningRate 0.0652   Epoch: 3   Global Step: 159700   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:49,162-Speed 2623.64 samples/sec   Loss 10.8439   LearningRate 0.0652   Epoch: 3   Global Step: 159710   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:38:53,052-Speed 2633.39 samples/sec   Loss 11.2633   LearningRate 0.0652   Epoch: 3   Global Step: 159720   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:38:56,979-Speed 2608.22 samples/sec   Loss 11.1171   LearningRate 0.0652   Epoch: 3   Global Step: 159730   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:00,896-Speed 2615.00 samples/sec   Loss 11.0260   LearningRate 0.0652   Epoch: 3   Global Step: 159740   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:04,792-Speed 2628.61 samples/sec   Loss 10.9534   LearningRate 0.0652   Epoch: 3   Global Step: 159750   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:08,702-Speed 2619.28 samples/sec   Loss 10.9722   LearningRate 0.0652   Epoch: 3   Global Step: 159760   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:12,592-Speed 2633.31 samples/sec   Loss 10.9683   LearningRate 0.0652   Epoch: 3   Global Step: 159770   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:16,487-Speed 2629.94 samples/sec   Loss 11.1234   LearningRate 0.0652   Epoch: 3   Global Step: 159780   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:20,381-Speed 2630.24 samples/sec   Loss 11.1539   LearningRate 0.0652   Epoch: 3   Global Step: 159790   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:24,288-Speed 2621.84 samples/sec   Loss 11.2041   LearningRate 0.0652   Epoch: 3   Global Step: 159800   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:28,217-Speed 2607.42 samples/sec   Loss 11.0727   LearningRate 0.0652   Epoch: 3   Global Step: 159810   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:39:32,114-Speed 2628.08 samples/sec   Loss 10.8594   LearningRate 0.0652   Epoch: 3   Global Step: 159820   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:39:36,035-Speed 2611.72 samples/sec   Loss 10.9972   LearningRate 0.0652   Epoch: 3   Global Step: 159830   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:39:39,943-Speed 2621.27 samples/sec   Loss 11.0341   LearningRate 0.0652   Epoch: 3   Global Step: 159840   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:39:43,845-Speed 2625.00 samples/sec   Loss 11.0067   LearningRate 0.0652   Epoch: 3   Global Step: 159850   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:39:47,757-Speed 2618.45 samples/sec   Loss 11.1268   LearningRate 0.0652   Epoch: 3   Global Step: 159860   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:39:51,675-Speed 2614.70 samples/sec   Loss 11.0921   LearningRate 0.0652   Epoch: 3   Global Step: 159870   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:39:55,596-Speed 2611.88 samples/sec   Loss 10.9021   LearningRate 0.0652   Epoch: 3   Global Step: 159880   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:39:59,488-Speed 2632.07 samples/sec   Loss 11.0784   LearningRate 0.0652   Epoch: 3   Global Step: 159890   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:40:03,386-Speed 2627.76 samples/sec   Loss 10.9704   LearningRate 0.0652   Epoch: 3   Global Step: 159900   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:40:07,287-Speed 2625.41 samples/sec   Loss 10.9634   LearningRate 0.0652   Epoch: 3   Global Step: 159910   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:40:11,189-Speed 2624.71 samples/sec   Loss 11.0143   LearningRate 0.0652   Epoch: 3   Global Step: 159920   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:15,086-Speed 2628.29 samples/sec   Loss 10.9031   LearningRate 0.0652   Epoch: 3   Global Step: 159930   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:18,980-Speed 2630.76 samples/sec   Loss 11.0354   LearningRate 0.0652   Epoch: 3   Global Step: 159940   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:22,877-Speed 2627.95 samples/sec   Loss 11.1331   LearningRate 0.0652   Epoch: 3   Global Step: 159950   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:26,777-Speed 2626.23 samples/sec   Loss 11.0678   LearningRate 0.0652   Epoch: 3   Global Step: 159960   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:30,669-Speed 2631.47 samples/sec   Loss 11.0199   LearningRate 0.0652   Epoch: 3   Global Step: 159970   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:34,571-Speed 2625.44 samples/sec   Loss 11.0497   LearningRate 0.0651   Epoch: 3   Global Step: 159980   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:38,474-Speed 2623.94 samples/sec   Loss 10.9674   LearningRate 0.0651   Epoch: 3   Global Step: 159990   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:40:42,380-Speed 2622.25 samples/sec   Loss 11.0598   LearningRate 0.0651   Epoch: 3   Global Step: 160000   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:41:26,157-[lfw][160000]XNorm: 23.239836
Training: 2022-04-13 13:41:26,158-[lfw][160000]Accuracy-Flip: 0.99783+-0.00299
Training: 2022-04-13 13:41:26,159-[lfw][160000]Accuracy-Highest: 0.99783
Training: 2022-04-13 13:42:16,619-[cfp_fp][160000]XNorm: 20.861491
Training: 2022-04-13 13:42:16,620-[cfp_fp][160000]Accuracy-Flip: 0.97843+-0.00972
Training: 2022-04-13 13:42:16,622-[cfp_fp][160000]Accuracy-Highest: 0.98100
Training: 2022-04-13 13:43:00,184-[agedb_30][160000]XNorm: 23.004241
Training: 2022-04-13 13:43:00,185-[agedb_30][160000]Accuracy-Flip: 0.97050+-0.00853
Training: 2022-04-13 13:43:00,186-[agedb_30][160000]Accuracy-Highest: 0.97050
Training: 2022-04-13 13:43:04,052-Speed 72.28 samples/sec   Loss 11.0553   LearningRate 0.0651   Epoch: 3   Global Step: 160010   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:43:07,909-Speed 2655.15 samples/sec   Loss 11.0809   LearningRate 0.0651   Epoch: 3   Global Step: 160020   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:43:11,765-Speed 2656.64 samples/sec   Loss 10.9895   LearningRate 0.0651   Epoch: 3   Global Step: 160030   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:43:15,612-Speed 2662.19 samples/sec   Loss 11.1210   LearningRate 0.0651   Epoch: 3   Global Step: 160040   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:19,510-Speed 2627.82 samples/sec   Loss 11.0020   LearningRate 0.0651   Epoch: 3   Global Step: 160050   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:23,391-Speed 2639.25 samples/sec   Loss 11.0242   LearningRate 0.0651   Epoch: 3   Global Step: 160060   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:27,262-Speed 2647.18 samples/sec   Loss 11.0286   LearningRate 0.0651   Epoch: 3   Global Step: 160070   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:31,138-Speed 2642.89 samples/sec   Loss 11.0776   LearningRate 0.0651   Epoch: 3   Global Step: 160080   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:35,013-Speed 2643.49 samples/sec   Loss 11.1476   LearningRate 0.0651   Epoch: 3   Global Step: 160090   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:38,890-Speed 2641.21 samples/sec   Loss 11.1376   LearningRate 0.0651   Epoch: 3   Global Step: 160100   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:42,776-Speed 2636.37 samples/sec   Loss 10.8194   LearningRate 0.0651   Epoch: 3   Global Step: 160110   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:46,659-Speed 2637.50 samples/sec   Loss 10.9650   LearningRate 0.0651   Epoch: 3   Global Step: 160120   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:50,543-Speed 2637.32 samples/sec   Loss 10.9648   LearningRate 0.0651   Epoch: 3   Global Step: 160130   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:43:54,431-Speed 2635.22 samples/sec   Loss 11.0046   LearningRate 0.0651   Epoch: 3   Global Step: 160140   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:43:58,320-Speed 2633.26 samples/sec   Loss 11.0694   LearningRate 0.0651   Epoch: 3   Global Step: 160150   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:02,209-Speed 2635.06 samples/sec   Loss 11.0220   LearningRate 0.0651   Epoch: 3   Global Step: 160160   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:06,105-Speed 2628.69 samples/sec   Loss 10.9520   LearningRate 0.0651   Epoch: 3   Global Step: 160170   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:09,999-Speed 2630.19 samples/sec   Loss 10.9794   LearningRate 0.0651   Epoch: 3   Global Step: 160180   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:13,892-Speed 2630.94 samples/sec   Loss 10.8064   LearningRate 0.0651   Epoch: 3   Global Step: 160190   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:17,791-Speed 2627.30 samples/sec   Loss 10.9545   LearningRate 0.0651   Epoch: 3   Global Step: 160200   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:21,688-Speed 2628.01 samples/sec   Loss 11.0500   LearningRate 0.0651   Epoch: 3   Global Step: 160210   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:25,580-Speed 2631.97 samples/sec   Loss 11.0423   LearningRate 0.0651   Epoch: 3   Global Step: 160220   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:29,471-Speed 2632.61 samples/sec   Loss 10.9753   LearningRate 0.0651   Epoch: 3   Global Step: 160230   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:33,362-Speed 2632.81 samples/sec   Loss 10.9783   LearningRate 0.0651   Epoch: 3   Global Step: 160240   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:44:37,261-Speed 2626.88 samples/sec   Loss 10.9169   LearningRate 0.0651   Epoch: 3   Global Step: 160250   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:44:41,137-Speed 2641.96 samples/sec   Loss 11.1172   LearningRate 0.0651   Epoch: 3   Global Step: 160260   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:45,041-Speed 2623.70 samples/sec   Loss 10.9855   LearningRate 0.0651   Epoch: 3   Global Step: 160270   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:48,949-Speed 2620.66 samples/sec   Loss 11.0250   LearningRate 0.0651   Epoch: 3   Global Step: 160280   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:52,862-Speed 2617.46 samples/sec   Loss 11.0423   LearningRate 0.0651   Epoch: 3   Global Step: 160290   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:44:56,755-Speed 2631.17 samples/sec   Loss 11.1508   LearningRate 0.0651   Epoch: 3   Global Step: 160300   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:45:00,654-Speed 2626.58 samples/sec   Loss 10.9996   LearningRate 0.0651   Epoch: 3   Global Step: 160310   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:45:04,551-Speed 2628.67 samples/sec   Loss 11.0327   LearningRate 0.0651   Epoch: 3   Global Step: 160320   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:45:08,448-Speed 2628.77 samples/sec   Loss 10.9797   LearningRate 0.0651   Epoch: 3   Global Step: 160330   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:45:12,336-Speed 2634.25 samples/sec   Loss 10.9576   LearningRate 0.0651   Epoch: 3   Global Step: 160340   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:16,244-Speed 2620.33 samples/sec   Loss 10.9118   LearningRate 0.0651   Epoch: 3   Global Step: 160350   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:20,138-Speed 2630.71 samples/sec   Loss 10.8092   LearningRate 0.0651   Epoch: 3   Global Step: 160360   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:24,032-Speed 2630.22 samples/sec   Loss 11.0090   LearningRate 0.0651   Epoch: 3   Global Step: 160370   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:27,935-Speed 2624.16 samples/sec   Loss 10.9201   LearningRate 0.0651   Epoch: 3   Global Step: 160380   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:31,852-Speed 2615.51 samples/sec   Loss 11.2216   LearningRate 0.0651   Epoch: 3   Global Step: 160390   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:35,758-Speed 2622.17 samples/sec   Loss 10.9905   LearningRate 0.0651   Epoch: 3   Global Step: 160400   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:39,665-Speed 2621.96 samples/sec   Loss 11.0033   LearningRate 0.0651   Epoch: 3   Global Step: 160410   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:43,572-Speed 2620.92 samples/sec   Loss 10.9868   LearningRate 0.0651   Epoch: 3   Global Step: 160420   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:47,473-Speed 2626.44 samples/sec   Loss 11.0350   LearningRate 0.0651   Epoch: 3   Global Step: 160430   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:45:51,366-Speed 2630.60 samples/sec   Loss 11.0211   LearningRate 0.0651   Epoch: 3   Global Step: 160440   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:45:55,271-Speed 2623.27 samples/sec   Loss 10.9463   LearningRate 0.0651   Epoch: 3   Global Step: 160450   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:45:59,168-Speed 2628.23 samples/sec   Loss 10.8495   LearningRate 0.0651   Epoch: 3   Global Step: 160460   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:46:03,068-Speed 2626.91 samples/sec   Loss 11.0614   LearningRate 0.0651   Epoch: 3   Global Step: 160470   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:46:06,966-Speed 2627.18 samples/sec   Loss 10.9681   LearningRate 0.0651   Epoch: 3   Global Step: 160480   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:46:10,867-Speed 2625.61 samples/sec   Loss 11.0133   LearningRate 0.0651   Epoch: 3   Global Step: 160490   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:46:14,763-Speed 2629.05 samples/sec   Loss 10.9794   LearningRate 0.0650   Epoch: 3   Global Step: 160500   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:46:18,666-Speed 2624.35 samples/sec   Loss 11.0792   LearningRate 0.0650   Epoch: 3   Global Step: 160510   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:46:22,558-Speed 2631.95 samples/sec   Loss 10.8852   LearningRate 0.0650   Epoch: 3   Global Step: 160520   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:26,459-Speed 2625.49 samples/sec   Loss 10.9815   LearningRate 0.0650   Epoch: 3   Global Step: 160530   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:30,360-Speed 2625.48 samples/sec   Loss 11.0609   LearningRate 0.0650   Epoch: 3   Global Step: 160540   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:34,257-Speed 2628.73 samples/sec   Loss 11.0984   LearningRate 0.0650   Epoch: 3   Global Step: 160550   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:38,155-Speed 2627.77 samples/sec   Loss 11.0105   LearningRate 0.0650   Epoch: 3   Global Step: 160560   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:42,064-Speed 2620.42 samples/sec   Loss 11.0694   LearningRate 0.0650   Epoch: 3   Global Step: 160570   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:45,996-Speed 2604.92 samples/sec   Loss 10.9604   LearningRate 0.0650   Epoch: 3   Global Step: 160580   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:49,905-Speed 2620.59 samples/sec   Loss 11.1437   LearningRate 0.0650   Epoch: 3   Global Step: 160590   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:53,824-Speed 2613.12 samples/sec   Loss 10.8739   LearningRate 0.0650   Epoch: 3   Global Step: 160600   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:46:57,742-Speed 2614.56 samples/sec   Loss 11.0632   LearningRate 0.0650   Epoch: 3   Global Step: 160610   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:01,646-Speed 2623.63 samples/sec   Loss 10.9777   LearningRate 0.0650   Epoch: 3   Global Step: 160620   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:47:05,559-Speed 2618.27 samples/sec   Loss 10.9190   LearningRate 0.0650   Epoch: 3   Global Step: 160630   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:47:09,449-Speed 2633.07 samples/sec   Loss 10.8692   LearningRate 0.0650   Epoch: 3   Global Step: 160640   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:13,353-Speed 2623.76 samples/sec   Loss 10.9865   LearningRate 0.0650   Epoch: 3   Global Step: 160650   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:17,260-Speed 2621.36 samples/sec   Loss 10.9842   LearningRate 0.0650   Epoch: 3   Global Step: 160660   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:21,167-Speed 2620.89 samples/sec   Loss 10.9515   LearningRate 0.0650   Epoch: 3   Global Step: 160670   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:25,075-Speed 2621.30 samples/sec   Loss 10.9004   LearningRate 0.0650   Epoch: 3   Global Step: 160680   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:28,988-Speed 2617.78 samples/sec   Loss 11.1353   LearningRate 0.0650   Epoch: 3   Global Step: 160690   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:32,892-Speed 2623.93 samples/sec   Loss 11.0403   LearningRate 0.0650   Epoch: 3   Global Step: 160700   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:36,797-Speed 2622.84 samples/sec   Loss 11.0690   LearningRate 0.0650   Epoch: 3   Global Step: 160710   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:40,703-Speed 2622.46 samples/sec   Loss 11.0175   LearningRate 0.0650   Epoch: 3   Global Step: 160720   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:44,643-Speed 2599.21 samples/sec   Loss 10.8331   LearningRate 0.0650   Epoch: 3   Global Step: 160730   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:47:48,554-Speed 2619.18 samples/sec   Loss 11.0453   LearningRate 0.0650   Epoch: 3   Global Step: 160740   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:47:52,464-Speed 2619.48 samples/sec   Loss 10.9701   LearningRate 0.0650   Epoch: 3   Global Step: 160750   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:47:56,389-Speed 2609.54 samples/sec   Loss 10.9400   LearningRate 0.0650   Epoch: 3   Global Step: 160760   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:00,306-Speed 2615.23 samples/sec   Loss 11.0223   LearningRate 0.0650   Epoch: 3   Global Step: 160770   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:04,236-Speed 2606.23 samples/sec   Loss 10.9962   LearningRate 0.0650   Epoch: 3   Global Step: 160780   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:08,161-Speed 2609.96 samples/sec   Loss 11.0467   LearningRate 0.0650   Epoch: 3   Global Step: 160790   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:12,082-Speed 2611.74 samples/sec   Loss 11.1246   LearningRate 0.0650   Epoch: 3   Global Step: 160800   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:16,002-Speed 2613.84 samples/sec   Loss 10.9609   LearningRate 0.0650   Epoch: 3   Global Step: 160810   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:19,923-Speed 2612.52 samples/sec   Loss 10.8932   LearningRate 0.0650   Epoch: 3   Global Step: 160820   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:23,835-Speed 2618.23 samples/sec   Loss 10.9143   LearningRate 0.0650   Epoch: 3   Global Step: 160830   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:27,755-Speed 2613.03 samples/sec   Loss 11.0143   LearningRate 0.0650   Epoch: 3   Global Step: 160840   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:48:31,680-Speed 2609.29 samples/sec   Loss 11.0515   LearningRate 0.0650   Epoch: 3   Global Step: 160850   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:48:35,636-Speed 2589.69 samples/sec   Loss 10.8679   LearningRate 0.0650   Epoch: 3   Global Step: 160860   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:48:39,536-Speed 2626.17 samples/sec   Loss 10.9572   LearningRate 0.0650   Epoch: 3   Global Step: 160870   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:43,451-Speed 2616.45 samples/sec   Loss 10.9156   LearningRate 0.0650   Epoch: 3   Global Step: 160880   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:47,365-Speed 2616.77 samples/sec   Loss 11.0035   LearningRate 0.0650   Epoch: 3   Global Step: 160890   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:51,277-Speed 2618.33 samples/sec   Loss 10.8471   LearningRate 0.0650   Epoch: 3   Global Step: 160900   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:55,198-Speed 2612.68 samples/sec   Loss 11.0861   LearningRate 0.0650   Epoch: 3   Global Step: 160910   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:48:59,114-Speed 2615.39 samples/sec   Loss 11.0114   LearningRate 0.0650   Epoch: 3   Global Step: 160920   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:49:03,032-Speed 2613.98 samples/sec   Loss 10.9156   LearningRate 0.0650   Epoch: 3   Global Step: 160930   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:49:06,964-Speed 2604.83 samples/sec   Loss 11.0080   LearningRate 0.0650   Epoch: 3   Global Step: 160940   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:49:10,883-Speed 2614.17 samples/sec   Loss 11.0503   LearningRate 0.0650   Epoch: 3   Global Step: 160950   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:49:14,802-Speed 2613.52 samples/sec   Loss 10.8828   LearningRate 0.0650   Epoch: 3   Global Step: 160960   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:49:18,719-Speed 2614.60 samples/sec   Loss 11.0044   LearningRate 0.0650   Epoch: 3   Global Step: 160970   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:49:22,615-Speed 2629.16 samples/sec   Loss 10.9602   LearningRate 0.0650   Epoch: 3   Global Step: 160980   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:26,558-Speed 2598.04 samples/sec   Loss 10.9390   LearningRate 0.0650   Epoch: 3   Global Step: 160990   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:30,497-Speed 2600.50 samples/sec   Loss 10.9519   LearningRate 0.0650   Epoch: 3   Global Step: 161000   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:34,432-Speed 2602.87 samples/sec   Loss 10.9483   LearningRate 0.0649   Epoch: 3   Global Step: 161010   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:38,349-Speed 2614.45 samples/sec   Loss 11.0854   LearningRate 0.0649   Epoch: 3   Global Step: 161020   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:42,268-Speed 2613.48 samples/sec   Loss 11.0612   LearningRate 0.0649   Epoch: 3   Global Step: 161030   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:46,186-Speed 2614.72 samples/sec   Loss 10.9933   LearningRate 0.0649   Epoch: 3   Global Step: 161040   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:50,104-Speed 2614.49 samples/sec   Loss 11.0999   LearningRate 0.0649   Epoch: 3   Global Step: 161050   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:54,050-Speed 2595.43 samples/sec   Loss 11.0406   LearningRate 0.0649   Epoch: 3   Global Step: 161060   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:49:58,088-Speed 2536.55 samples/sec   Loss 10.9733   LearningRate 0.0649   Epoch: 3   Global Step: 161070   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:50:02,010-Speed 2612.03 samples/sec   Loss 10.9133   LearningRate 0.0649   Epoch: 3   Global Step: 161080   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:50:05,931-Speed 2611.79 samples/sec   Loss 10.9337   LearningRate 0.0649   Epoch: 3   Global Step: 161090   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:50:09,857-Speed 2609.09 samples/sec   Loss 10.9579   LearningRate 0.0649   Epoch: 3   Global Step: 161100   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:50:13,773-Speed 2615.62 samples/sec   Loss 11.1953   LearningRate 0.0649   Epoch: 3   Global Step: 161110   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:50:17,694-Speed 2612.14 samples/sec   Loss 11.6893   LearningRate 0.0649   Epoch: 3   Global Step: 161120   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:50:21,610-Speed 2615.77 samples/sec   Loss 11.4058   LearningRate 0.0649   Epoch: 3   Global Step: 161130   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:50:25,523-Speed 2617.56 samples/sec   Loss 11.0811   LearningRate 0.0649   Epoch: 3   Global Step: 161140   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:50:29,418-Speed 2629.88 samples/sec   Loss 11.1700   LearningRate 0.0649   Epoch: 3   Global Step: 161150   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:50:33,323-Speed 2622.98 samples/sec   Loss 11.1597   LearningRate 0.0649   Epoch: 3   Global Step: 161160   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:50:37,266-Speed 2598.07 samples/sec   Loss 11.0234   LearningRate 0.0649   Epoch: 3   Global Step: 161170   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:50:41,210-Speed 2596.23 samples/sec   Loss 11.0049   LearningRate 0.0649   Epoch: 3   Global Step: 161180   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:50:45,125-Speed 2617.04 samples/sec   Loss 11.0671   LearningRate 0.0649   Epoch: 3   Global Step: 161190   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:50:49,037-Speed 2617.83 samples/sec   Loss 10.9759   LearningRate 0.0649   Epoch: 3   Global Step: 161200   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:50:52,946-Speed 2620.89 samples/sec   Loss 10.9412   LearningRate 0.0649   Epoch: 3   Global Step: 161210   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:50:56,859-Speed 2617.53 samples/sec   Loss 10.9200   LearningRate 0.0649   Epoch: 3   Global Step: 161220   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:51:00,773-Speed 2617.35 samples/sec   Loss 11.0304   LearningRate 0.0649   Epoch: 3   Global Step: 161230   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:51:04,683-Speed 2618.94 samples/sec   Loss 10.9520   LearningRate 0.0649   Epoch: 3   Global Step: 161240   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 13:51:08,613-Speed 2606.67 samples/sec   Loss 11.0119   LearningRate 0.0649   Epoch: 3   Global Step: 161250   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:12,550-Speed 2601.49 samples/sec   Loss 11.1321   LearningRate 0.0649   Epoch: 3   Global Step: 161260   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:16,463-Speed 2617.64 samples/sec   Loss 11.0399   LearningRate 0.0649   Epoch: 3   Global Step: 161270   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:20,434-Speed 2579.05 samples/sec   Loss 11.0540   LearningRate 0.0649   Epoch: 3   Global Step: 161280   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:24,351-Speed 2614.81 samples/sec   Loss 11.1016   LearningRate 0.0649   Epoch: 3   Global Step: 161290   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:28,258-Speed 2621.99 samples/sec   Loss 11.1115   LearningRate 0.0649   Epoch: 3   Global Step: 161300   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:32,171-Speed 2618.20 samples/sec   Loss 10.7734   LearningRate 0.0649   Epoch: 3   Global Step: 161310   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:36,078-Speed 2620.79 samples/sec   Loss 11.0235   LearningRate 0.0649   Epoch: 3   Global Step: 161320   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:39,990-Speed 2618.19 samples/sec   Loss 10.9893   LearningRate 0.0649   Epoch: 3   Global Step: 161330   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:43,902-Speed 2618.50 samples/sec   Loss 10.9967   LearningRate 0.0649   Epoch: 3   Global Step: 161340   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 13:51:47,825-Speed 2610.73 samples/sec   Loss 10.9572   LearningRate 0.0649   Epoch: 3   Global Step: 161350   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:51:51,764-Speed 2600.48 samples/sec   Loss 11.0865   LearningRate 0.0649   Epoch: 3   Global Step: 161360   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:51:55,673-Speed 2620.64 samples/sec   Loss 10.9735   LearningRate 0.0649   Epoch: 3   Global Step: 161370   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:51:59,576-Speed 2624.00 samples/sec   Loss 10.9865   LearningRate 0.0649   Epoch: 3   Global Step: 161380   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:52:03,487-Speed 2619.18 samples/sec   Loss 10.9713   LearningRate 0.0649   Epoch: 3   Global Step: 161390   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:52:07,391-Speed 2623.26 samples/sec   Loss 11.1946   LearningRate 0.0649   Epoch: 3   Global Step: 161400   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:52:11,297-Speed 2621.95 samples/sec   Loss 10.9062   LearningRate 0.0649   Epoch: 3   Global Step: 161410   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:52:15,213-Speed 2616.07 samples/sec   Loss 10.8883   LearningRate 0.0649   Epoch: 3   Global Step: 161420   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:52:19,116-Speed 2623.73 samples/sec   Loss 11.0296   LearningRate 0.0649   Epoch: 3   Global Step: 161430   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:52:23,024-Speed 2621.30 samples/sec   Loss 10.8705   LearningRate 0.0649   Epoch: 3   Global Step: 161440   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:52:27,039-Speed 2550.72 samples/sec   Loss 10.9316   LearningRate 0.0649   Epoch: 3   Global Step: 161450   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:30,959-Speed 2613.05 samples/sec   Loss 10.8400   LearningRate 0.0649   Epoch: 3   Global Step: 161460   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:34,863-Speed 2623.83 samples/sec   Loss 10.8783   LearningRate 0.0649   Epoch: 3   Global Step: 161470   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:38,770-Speed 2621.17 samples/sec   Loss 10.9094   LearningRate 0.0649   Epoch: 3   Global Step: 161480   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:42,680-Speed 2619.24 samples/sec   Loss 10.8981   LearningRate 0.0649   Epoch: 3   Global Step: 161490   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:46,592-Speed 2617.97 samples/sec   Loss 11.2373   LearningRate 0.0649   Epoch: 3   Global Step: 161500   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:50,507-Speed 2616.94 samples/sec   Loss 11.0604   LearningRate 0.0649   Epoch: 3   Global Step: 161510   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:54,413-Speed 2621.79 samples/sec   Loss 10.9507   LearningRate 0.0649   Epoch: 3   Global Step: 161520   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:52:58,347-Speed 2603.70 samples/sec   Loss 11.0306   LearningRate 0.0648   Epoch: 3   Global Step: 161530   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:02,258-Speed 2619.65 samples/sec   Loss 11.0565   LearningRate 0.0648   Epoch: 3   Global Step: 161540   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:06,172-Speed 2616.38 samples/sec   Loss 10.9149   LearningRate 0.0648   Epoch: 3   Global Step: 161550   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:53:10,071-Speed 2627.08 samples/sec   Loss 11.0272   LearningRate 0.0648   Epoch: 3   Global Step: 161560   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:13,984-Speed 2617.92 samples/sec   Loss 10.9892   LearningRate 0.0648   Epoch: 3   Global Step: 161570   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:17,897-Speed 2617.31 samples/sec   Loss 11.0065   LearningRate 0.0648   Epoch: 3   Global Step: 161580   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:21,813-Speed 2615.94 samples/sec   Loss 11.0052   LearningRate 0.0648   Epoch: 3   Global Step: 161590   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:25,748-Speed 2602.70 samples/sec   Loss 11.0827   LearningRate 0.0648   Epoch: 3   Global Step: 161600   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:29,660-Speed 2618.96 samples/sec   Loss 11.1151   LearningRate 0.0648   Epoch: 3   Global Step: 161610   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:33,581-Speed 2611.70 samples/sec   Loss 10.9419   LearningRate 0.0648   Epoch: 3   Global Step: 161620   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:37,489-Speed 2620.73 samples/sec   Loss 10.8974   LearningRate 0.0648   Epoch: 3   Global Step: 161630   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:41,423-Speed 2603.61 samples/sec   Loss 10.9977   LearningRate 0.0648   Epoch: 3   Global Step: 161640   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:45,337-Speed 2616.98 samples/sec   Loss 11.0799   LearningRate 0.0648   Epoch: 3   Global Step: 161650   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:49,246-Speed 2620.36 samples/sec   Loss 10.8862   LearningRate 0.0648   Epoch: 3   Global Step: 161660   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:53:53,142-Speed 2628.94 samples/sec   Loss 10.9816   LearningRate 0.0648   Epoch: 3   Global Step: 161670   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:53:57,039-Speed 2628.33 samples/sec   Loss 10.9106   LearningRate 0.0648   Epoch: 3   Global Step: 161680   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:00,946-Speed 2621.96 samples/sec   Loss 10.8934   LearningRate 0.0648   Epoch: 3   Global Step: 161690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:04,856-Speed 2619.63 samples/sec   Loss 10.9718   LearningRate 0.0648   Epoch: 3   Global Step: 161700   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:08,759-Speed 2624.37 samples/sec   Loss 10.9574   LearningRate 0.0648   Epoch: 3   Global Step: 161710   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:12,670-Speed 2618.74 samples/sec   Loss 11.0266   LearningRate 0.0648   Epoch: 3   Global Step: 161720   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:16,570-Speed 2626.25 samples/sec   Loss 10.8213   LearningRate 0.0648   Epoch: 3   Global Step: 161730   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:20,471-Speed 2625.88 samples/sec   Loss 10.8257   LearningRate 0.0648   Epoch: 3   Global Step: 161740   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:24,374-Speed 2624.06 samples/sec   Loss 10.8371   LearningRate 0.0648   Epoch: 3   Global Step: 161750   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:28,279-Speed 2623.40 samples/sec   Loss 11.0967   LearningRate 0.0648   Epoch: 3   Global Step: 161760   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:32,187-Speed 2620.77 samples/sec   Loss 10.8865   LearningRate 0.0648   Epoch: 3   Global Step: 161770   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:54:36,148-Speed 2586.29 samples/sec   Loss 10.9402   LearningRate 0.0648   Epoch: 3   Global Step: 161780   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:54:40,061-Speed 2618.19 samples/sec   Loss 10.9140   LearningRate 0.0648   Epoch: 3   Global Step: 161790   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:54:43,983-Speed 2611.42 samples/sec   Loss 10.9727   LearningRate 0.0648   Epoch: 3   Global Step: 161800   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:47,884-Speed 2625.94 samples/sec   Loss 11.0596   LearningRate 0.0648   Epoch: 3   Global Step: 161810   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:51,790-Speed 2622.11 samples/sec   Loss 10.8905   LearningRate 0.0648   Epoch: 3   Global Step: 161820   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:55,716-Speed 2609.24 samples/sec   Loss 10.8789   LearningRate 0.0648   Epoch: 3   Global Step: 161830   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:54:59,635-Speed 2612.96 samples/sec   Loss 11.0299   LearningRate 0.0648   Epoch: 3   Global Step: 161840   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:03,559-Speed 2610.35 samples/sec   Loss 10.9173   LearningRate 0.0648   Epoch: 3   Global Step: 161850   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:07,467-Speed 2620.87 samples/sec   Loss 10.9936   LearningRate 0.0648   Epoch: 3   Global Step: 161860   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:11,384-Speed 2615.64 samples/sec   Loss 11.0288   LearningRate 0.0648   Epoch: 3   Global Step: 161870   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:15,286-Speed 2624.50 samples/sec   Loss 10.9858   LearningRate 0.0648   Epoch: 3   Global Step: 161880   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:19,191-Speed 2622.91 samples/sec   Loss 10.9319   LearningRate 0.0648   Epoch: 3   Global Step: 161890   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:23,095-Speed 2623.63 samples/sec   Loss 10.9143   LearningRate 0.0648   Epoch: 3   Global Step: 161900   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:55:26,998-Speed 2624.54 samples/sec   Loss 11.1582   LearningRate 0.0648   Epoch: 3   Global Step: 161910   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:55:30,889-Speed 2632.27 samples/sec   Loss 11.0631   LearningRate 0.0648   Epoch: 3   Global Step: 161920   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:34,790-Speed 2625.43 samples/sec   Loss 10.8930   LearningRate 0.0648   Epoch: 3   Global Step: 161930   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:38,694-Speed 2623.32 samples/sec   Loss 10.9116   LearningRate 0.0648   Epoch: 3   Global Step: 161940   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:42,600-Speed 2622.32 samples/sec   Loss 11.0898   LearningRate 0.0648   Epoch: 3   Global Step: 161950   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:46,504-Speed 2623.92 samples/sec   Loss 11.0097   LearningRate 0.0648   Epoch: 3   Global Step: 161960   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:50,423-Speed 2614.00 samples/sec   Loss 10.9757   LearningRate 0.0648   Epoch: 3   Global Step: 161970   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:54,324-Speed 2625.52 samples/sec   Loss 10.9646   LearningRate 0.0648   Epoch: 3   Global Step: 161980   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:55:58,236-Speed 2618.55 samples/sec   Loss 10.8754   LearningRate 0.0648   Epoch: 3   Global Step: 161990   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:02,145-Speed 2620.19 samples/sec   Loss 10.9727   LearningRate 0.0648   Epoch: 3   Global Step: 162000   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:06,056-Speed 2619.19 samples/sec   Loss 11.0431   LearningRate 0.0648   Epoch: 3   Global Step: 162010   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:09,947-Speed 2631.76 samples/sec   Loss 10.9211   LearningRate 0.0648   Epoch: 3   Global Step: 162020   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:13,847-Speed 2627.10 samples/sec   Loss 10.8532   LearningRate 0.0648   Epoch: 3   Global Step: 162030   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:17,746-Speed 2626.49 samples/sec   Loss 11.0445   LearningRate 0.0647   Epoch: 3   Global Step: 162040   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:21,652-Speed 2623.26 samples/sec   Loss 10.9038   LearningRate 0.0647   Epoch: 3   Global Step: 162050   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:25,565-Speed 2617.17 samples/sec   Loss 11.0513   LearningRate 0.0647   Epoch: 3   Global Step: 162060   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:29,474-Speed 2620.23 samples/sec   Loss 10.9505   LearningRate 0.0647   Epoch: 3   Global Step: 162070   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:33,385-Speed 2619.39 samples/sec   Loss 11.0889   LearningRate 0.0647   Epoch: 3   Global Step: 162080   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:37,288-Speed 2624.05 samples/sec   Loss 11.1723   LearningRate 0.0647   Epoch: 3   Global Step: 162090   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:41,192-Speed 2623.15 samples/sec   Loss 10.8746   LearningRate 0.0647   Epoch: 3   Global Step: 162100   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:45,119-Speed 2608.61 samples/sec   Loss 10.9221   LearningRate 0.0647   Epoch: 3   Global Step: 162110   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:56:49,024-Speed 2623.16 samples/sec   Loss 11.0800   LearningRate 0.0647   Epoch: 3   Global Step: 162120   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:56:52,933-Speed 2620.66 samples/sec   Loss 10.9346   LearningRate 0.0647   Epoch: 3   Global Step: 162130   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:56:56,844-Speed 2618.56 samples/sec   Loss 11.0235   LearningRate 0.0647   Epoch: 3   Global Step: 162140   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:57:00,749-Speed 2623.17 samples/sec   Loss 10.9611   LearningRate 0.0647   Epoch: 3   Global Step: 162150   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:57:04,640-Speed 2632.22 samples/sec   Loss 10.9898   LearningRate 0.0647   Epoch: 3   Global Step: 162160   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:57:08,534-Speed 2630.47 samples/sec   Loss 10.9014   LearningRate 0.0647   Epoch: 3   Global Step: 162170   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:12,429-Speed 2629.53 samples/sec   Loss 10.9344   LearningRate 0.0647   Epoch: 3   Global Step: 162180   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:16,333-Speed 2624.32 samples/sec   Loss 10.9136   LearningRate 0.0647   Epoch: 3   Global Step: 162190   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:20,236-Speed 2624.18 samples/sec   Loss 10.8794   LearningRate 0.0647   Epoch: 3   Global Step: 162200   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:24,153-Speed 2614.81 samples/sec   Loss 11.0097   LearningRate 0.0647   Epoch: 3   Global Step: 162210   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:28,058-Speed 2623.21 samples/sec   Loss 11.0591   LearningRate 0.0647   Epoch: 3   Global Step: 162220   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:31,962-Speed 2624.02 samples/sec   Loss 11.0903   LearningRate 0.0647   Epoch: 3   Global Step: 162230   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:35,860-Speed 2626.87 samples/sec   Loss 10.9612   LearningRate 0.0647   Epoch: 3   Global Step: 162240   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:39,758-Speed 2627.41 samples/sec   Loss 10.9189   LearningRate 0.0647   Epoch: 3   Global Step: 162250   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:43,663-Speed 2623.34 samples/sec   Loss 10.9945   LearningRate 0.0647   Epoch: 3   Global Step: 162260   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:57:47,569-Speed 2622.34 samples/sec   Loss 10.8974   LearningRate 0.0647   Epoch: 3   Global Step: 162270   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:57:51,488-Speed 2613.56 samples/sec   Loss 10.9186   LearningRate 0.0647   Epoch: 3   Global Step: 162280   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:57:55,400-Speed 2618.58 samples/sec   Loss 11.1610   LearningRate 0.0647   Epoch: 3   Global Step: 162290   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:57:59,309-Speed 2620.57 samples/sec   Loss 10.8216   LearningRate 0.0647   Epoch: 3   Global Step: 162300   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:03,221-Speed 2618.36 samples/sec   Loss 10.9046   LearningRate 0.0647   Epoch: 3   Global Step: 162310   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:07,124-Speed 2624.20 samples/sec   Loss 10.9091   LearningRate 0.0647   Epoch: 3   Global Step: 162320   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:11,031-Speed 2621.39 samples/sec   Loss 11.0537   LearningRate 0.0647   Epoch: 3   Global Step: 162330   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:14,941-Speed 2619.92 samples/sec   Loss 10.9346   LearningRate 0.0647   Epoch: 3   Global Step: 162340   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:18,844-Speed 2624.62 samples/sec   Loss 11.0203   LearningRate 0.0647   Epoch: 3   Global Step: 162350   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:22,746-Speed 2624.76 samples/sec   Loss 10.8797   LearningRate 0.0647   Epoch: 3   Global Step: 162360   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:26,703-Speed 2588.48 samples/sec   Loss 10.9468   LearningRate 0.0647   Epoch: 3   Global Step: 162370   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:58:30,610-Speed 2621.94 samples/sec   Loss 10.9031   LearningRate 0.0647   Epoch: 3   Global Step: 162380   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:58:34,521-Speed 2619.07 samples/sec   Loss 10.9080   LearningRate 0.0647   Epoch: 3   Global Step: 162390   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:58:38,434-Speed 2617.03 samples/sec   Loss 10.9258   LearningRate 0.0647   Epoch: 3   Global Step: 162400   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 13:58:42,328-Speed 2630.49 samples/sec   Loss 10.9565   LearningRate 0.0647   Epoch: 3   Global Step: 162410   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:46,237-Speed 2620.45 samples/sec   Loss 10.8477   LearningRate 0.0647   Epoch: 3   Global Step: 162420   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:58:50,301-Speed 2520.62 samples/sec   Loss 10.8874   LearningRate 0.0647   Epoch: 3   Global Step: 162430   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:58:54,339-Speed 2536.56 samples/sec   Loss 11.1556   LearningRate 0.0647   Epoch: 3   Global Step: 162440   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:58:58,241-Speed 2624.89 samples/sec   Loss 11.0139   LearningRate 0.0647   Epoch: 3   Global Step: 162450   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:02,146-Speed 2622.93 samples/sec   Loss 10.9189   LearningRate 0.0647   Epoch: 3   Global Step: 162460   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:06,048-Speed 2625.21 samples/sec   Loss 11.0007   LearningRate 0.0647   Epoch: 3   Global Step: 162470   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:09,952-Speed 2623.50 samples/sec   Loss 10.8582   LearningRate 0.0647   Epoch: 3   Global Step: 162480   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:13,858-Speed 2622.57 samples/sec   Loss 10.9539   LearningRate 0.0647   Epoch: 3   Global Step: 162490   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:17,766-Speed 2620.60 samples/sec   Loss 10.8772   LearningRate 0.0647   Epoch: 3   Global Step: 162500   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:21,676-Speed 2620.01 samples/sec   Loss 11.0532   LearningRate 0.0647   Epoch: 3   Global Step: 162510   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:25,576-Speed 2626.70 samples/sec   Loss 11.0255   LearningRate 0.0647   Epoch: 3   Global Step: 162520   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 13:59:29,474-Speed 2627.44 samples/sec   Loss 11.0422   LearningRate 0.0647   Epoch: 3   Global Step: 162530   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:59:33,381-Speed 2621.95 samples/sec   Loss 10.9264   LearningRate 0.0647   Epoch: 3   Global Step: 162540   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:59:37,289-Speed 2620.51 samples/sec   Loss 10.9139   LearningRate 0.0647   Epoch: 3   Global Step: 162550   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:59:41,197-Speed 2620.90 samples/sec   Loss 10.8377   LearningRate 0.0646   Epoch: 3   Global Step: 162560   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:59:45,095-Speed 2627.53 samples/sec   Loss 10.9642   LearningRate 0.0646   Epoch: 3   Global Step: 162570   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:59:48,994-Speed 2627.39 samples/sec   Loss 10.8296   LearningRate 0.0646   Epoch: 3   Global Step: 162580   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:59:52,893-Speed 2626.54 samples/sec   Loss 10.9892   LearningRate 0.0646   Epoch: 3   Global Step: 162590   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 13:59:56,812-Speed 2614.50 samples/sec   Loss 10.9705   LearningRate 0.0646   Epoch: 3   Global Step: 162600   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:00,804-Speed 2565.55 samples/sec   Loss 11.0877   LearningRate 0.0646   Epoch: 3   Global Step: 162610   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:04,709-Speed 2623.42 samples/sec   Loss 11.0123   LearningRate 0.0646   Epoch: 3   Global Step: 162620   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:08,616-Speed 2621.21 samples/sec   Loss 10.8443   LearningRate 0.0646   Epoch: 3   Global Step: 162630   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:00:12,525-Speed 2619.81 samples/sec   Loss 10.8230   LearningRate 0.0646   Epoch: 3   Global Step: 162640   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:00:16,442-Speed 2614.72 samples/sec   Loss 10.8962   LearningRate 0.0646   Epoch: 3   Global Step: 162650   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:20,370-Speed 2608.04 samples/sec   Loss 10.8180   LearningRate 0.0646   Epoch: 3   Global Step: 162660   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:24,313-Speed 2598.26 samples/sec   Loss 11.1184   LearningRate 0.0646   Epoch: 3   Global Step: 162670   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:28,233-Speed 2612.59 samples/sec   Loss 10.8961   LearningRate 0.0646   Epoch: 3   Global Step: 162680   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:32,129-Speed 2629.00 samples/sec   Loss 10.8859   LearningRate 0.0646   Epoch: 3   Global Step: 162690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:36,040-Speed 2619.77 samples/sec   Loss 10.8796   LearningRate 0.0646   Epoch: 3   Global Step: 162700   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:39,957-Speed 2614.62 samples/sec   Loss 10.9962   LearningRate 0.0646   Epoch: 3   Global Step: 162710   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:43,855-Speed 2627.53 samples/sec   Loss 11.0315   LearningRate 0.0646   Epoch: 3   Global Step: 162720   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:47,756-Speed 2625.67 samples/sec   Loss 11.0609   LearningRate 0.0646   Epoch: 3   Global Step: 162730   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:51,657-Speed 2625.81 samples/sec   Loss 11.0567   LearningRate 0.0646   Epoch: 3   Global Step: 162740   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:00:55,557-Speed 2626.49 samples/sec   Loss 10.9373   LearningRate 0.0646   Epoch: 3   Global Step: 162750   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:00:59,458-Speed 2625.58 samples/sec   Loss 10.8119   LearningRate 0.0646   Epoch: 3   Global Step: 162760   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:01:03,360-Speed 2624.51 samples/sec   Loss 10.8598   LearningRate 0.0646   Epoch: 3   Global Step: 162770   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:01:07,247-Speed 2635.22 samples/sec   Loss 10.9757   LearningRate 0.0646   Epoch: 3   Global Step: 162780   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:01:11,140-Speed 2630.89 samples/sec   Loss 11.2021   LearningRate 0.0646   Epoch: 3   Global Step: 162790   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:01:15,070-Speed 2605.78 samples/sec   Loss 10.9515   LearningRate 0.0646   Epoch: 3   Global Step: 162800   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:01:18,977-Speed 2621.68 samples/sec   Loss 10.9650   LearningRate 0.0646   Epoch: 3   Global Step: 162810   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:01:22,992-Speed 2551.35 samples/sec   Loss 10.9701   LearningRate 0.0646   Epoch: 3   Global Step: 162820   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:26,963-Speed 2579.55 samples/sec   Loss 10.8594   LearningRate 0.0646   Epoch: 3   Global Step: 162830   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:30,878-Speed 2616.55 samples/sec   Loss 10.9784   LearningRate 0.0646   Epoch: 3   Global Step: 162840   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:34,776-Speed 2627.90 samples/sec   Loss 11.0815   LearningRate 0.0646   Epoch: 3   Global Step: 162850   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:38,674-Speed 2627.49 samples/sec   Loss 10.9478   LearningRate 0.0646   Epoch: 3   Global Step: 162860   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:42,591-Speed 2614.58 samples/sec   Loss 10.8980   LearningRate 0.0646   Epoch: 3   Global Step: 162870   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:46,486-Speed 2629.80 samples/sec   Loss 10.8971   LearningRate 0.0646   Epoch: 3   Global Step: 162880   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:50,384-Speed 2628.08 samples/sec   Loss 10.9926   LearningRate 0.0646   Epoch: 3   Global Step: 162890   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:54,283-Speed 2627.43 samples/sec   Loss 11.0128   LearningRate 0.0646   Epoch: 3   Global Step: 162900   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:01:58,183-Speed 2625.86 samples/sec   Loss 10.9129   LearningRate 0.0646   Epoch: 3   Global Step: 162910   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:02:02,080-Speed 2628.96 samples/sec   Loss 11.0612   LearningRate 0.0646   Epoch: 3   Global Step: 162920   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:05,972-Speed 2631.25 samples/sec   Loss 10.9481   LearningRate 0.0646   Epoch: 3   Global Step: 162930   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:09,868-Speed 2629.30 samples/sec   Loss 10.8985   LearningRate 0.0646   Epoch: 3   Global Step: 162940   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:13,770-Speed 2624.60 samples/sec   Loss 11.0660   LearningRate 0.0646   Epoch: 3   Global Step: 162950   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:17,720-Speed 2592.72 samples/sec   Loss 10.7478   LearningRate 0.0646   Epoch: 3   Global Step: 162960   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:21,619-Speed 2627.14 samples/sec   Loss 10.8909   LearningRate 0.0646   Epoch: 3   Global Step: 162970   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:25,534-Speed 2616.49 samples/sec   Loss 10.8388   LearningRate 0.0646   Epoch: 3   Global Step: 162980   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:29,440-Speed 2622.66 samples/sec   Loss 10.9614   LearningRate 0.0646   Epoch: 3   Global Step: 162990   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:33,342-Speed 2624.71 samples/sec   Loss 10.9150   LearningRate 0.0646   Epoch: 3   Global Step: 163000   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:02:37,219-Speed 2641.18 samples/sec   Loss 10.8422   LearningRate 0.0646   Epoch: 3   Global Step: 163010   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:02:41,123-Speed 2623.67 samples/sec   Loss 11.0231   LearningRate 0.0646   Epoch: 3   Global Step: 163020   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:02:45,029-Speed 2623.14 samples/sec   Loss 11.1006   LearningRate 0.0646   Epoch: 3   Global Step: 163030   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:02:48,933-Speed 2623.41 samples/sec   Loss 10.9859   LearningRate 0.0646   Epoch: 3   Global Step: 163040   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:02:52,835-Speed 2624.87 samples/sec   Loss 11.0140   LearningRate 0.0646   Epoch: 3   Global Step: 163050   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:02:56,742-Speed 2623.23 samples/sec   Loss 10.8129   LearningRate 0.0646   Epoch: 3   Global Step: 163060   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:03:00,640-Speed 2627.78 samples/sec   Loss 10.8823   LearningRate 0.0646   Epoch: 3   Global Step: 163070   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:03:04,540-Speed 2625.57 samples/sec   Loss 10.8996   LearningRate 0.0645   Epoch: 3   Global Step: 163080   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:03:08,436-Speed 2629.00 samples/sec   Loss 10.9029   LearningRate 0.0645   Epoch: 3   Global Step: 163090   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:03:12,340-Speed 2623.60 samples/sec   Loss 10.9264   LearningRate 0.0645   Epoch: 3   Global Step: 163100   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:03:16,242-Speed 2625.42 samples/sec   Loss 10.9929   LearningRate 0.0645   Epoch: 3   Global Step: 163110   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:20,150-Speed 2621.07 samples/sec   Loss 10.9187   LearningRate 0.0645   Epoch: 3   Global Step: 163120   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:24,052-Speed 2625.00 samples/sec   Loss 11.0033   LearningRate 0.0645   Epoch: 3   Global Step: 163130   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:27,958-Speed 2622.15 samples/sec   Loss 11.0705   LearningRate 0.0645   Epoch: 3   Global Step: 163140   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:31,861-Speed 2624.71 samples/sec   Loss 10.9662   LearningRate 0.0645   Epoch: 3   Global Step: 163150   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:35,810-Speed 2593.55 samples/sec   Loss 11.1604   LearningRate 0.0645   Epoch: 3   Global Step: 163160   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:39,768-Speed 2587.75 samples/sec   Loss 10.8614   LearningRate 0.0645   Epoch: 3   Global Step: 163170   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:43,667-Speed 2626.96 samples/sec   Loss 10.9434   LearningRate 0.0645   Epoch: 3   Global Step: 163180   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:47,596-Speed 2607.28 samples/sec   Loss 11.1093   LearningRate 0.0645   Epoch: 3   Global Step: 163190   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:51,509-Speed 2617.47 samples/sec   Loss 11.0872   LearningRate 0.0645   Epoch: 3   Global Step: 163200   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:03:55,407-Speed 2627.54 samples/sec   Loss 11.0019   LearningRate 0.0645   Epoch: 3   Global Step: 163210   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:03:59,315-Speed 2621.70 samples/sec   Loss 11.0308   LearningRate 0.0645   Epoch: 3   Global Step: 163220   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:03,216-Speed 2625.22 samples/sec   Loss 11.0455   LearningRate 0.0645   Epoch: 3   Global Step: 163230   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:07,118-Speed 2624.85 samples/sec   Loss 10.9602   LearningRate 0.0645   Epoch: 3   Global Step: 163240   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:11,017-Speed 2626.61 samples/sec   Loss 10.9231   LearningRate 0.0645   Epoch: 3   Global Step: 163250   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:14,952-Speed 2603.37 samples/sec   Loss 10.7435   LearningRate 0.0645   Epoch: 3   Global Step: 163260   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:18,853-Speed 2625.88 samples/sec   Loss 10.9389   LearningRate 0.0645   Epoch: 3   Global Step: 163270   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:22,753-Speed 2626.18 samples/sec   Loss 11.0810   LearningRate 0.0645   Epoch: 3   Global Step: 163280   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:26,652-Speed 2627.34 samples/sec   Loss 10.8992   LearningRate 0.0645   Epoch: 3   Global Step: 163290   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:04:30,541-Speed 2633.48 samples/sec   Loss 11.0333   LearningRate 0.0645   Epoch: 3   Global Step: 163300   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:04:34,442-Speed 2625.90 samples/sec   Loss 10.9838   LearningRate 0.0645   Epoch: 3   Global Step: 163310   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:04:38,337-Speed 2629.37 samples/sec   Loss 10.8345   LearningRate 0.0645   Epoch: 3   Global Step: 163320   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:04:42,236-Speed 2626.91 samples/sec   Loss 10.8858   LearningRate 0.0645   Epoch: 3   Global Step: 163330   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:04:46,139-Speed 2624.23 samples/sec   Loss 10.9768   LearningRate 0.0645   Epoch: 3   Global Step: 163340   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:04:50,033-Speed 2630.58 samples/sec   Loss 10.8694   LearningRate 0.0645   Epoch: 3   Global Step: 163350   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:04:53,932-Speed 2627.04 samples/sec   Loss 11.0759   LearningRate 0.0645   Epoch: 3   Global Step: 163360   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:04:57,849-Speed 2614.50 samples/sec   Loss 11.0369   LearningRate 0.0645   Epoch: 3   Global Step: 163370   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:01,744-Speed 2629.85 samples/sec   Loss 10.9376   LearningRate 0.0645   Epoch: 3   Global Step: 163380   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:05,654-Speed 2619.35 samples/sec   Loss 10.8490   LearningRate 0.0645   Epoch: 3   Global Step: 163390   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:09,566-Speed 2618.20 samples/sec   Loss 10.8674   LearningRate 0.0645   Epoch: 3   Global Step: 163400   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:13,467-Speed 2625.86 samples/sec   Loss 10.8802   LearningRate 0.0645   Epoch: 3   Global Step: 163410   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:17,364-Speed 2628.30 samples/sec   Loss 10.8784   LearningRate 0.0645   Epoch: 3   Global Step: 163420   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:21,263-Speed 2627.03 samples/sec   Loss 10.9053   LearningRate 0.0645   Epoch: 3   Global Step: 163430   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:25,163-Speed 2626.44 samples/sec   Loss 10.9024   LearningRate 0.0645   Epoch: 3   Global Step: 163440   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:29,071-Speed 2620.67 samples/sec   Loss 10.9665   LearningRate 0.0645   Epoch: 3   Global Step: 163450   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:32,968-Speed 2628.10 samples/sec   Loss 10.9914   LearningRate 0.0645   Epoch: 3   Global Step: 163460   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:36,866-Speed 2627.96 samples/sec   Loss 10.9675   LearningRate 0.0645   Epoch: 3   Global Step: 163470   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:40,763-Speed 2627.48 samples/sec   Loss 11.0897   LearningRate 0.0645   Epoch: 3   Global Step: 163480   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:44,660-Speed 2629.43 samples/sec   Loss 10.8488   LearningRate 0.0645   Epoch: 3   Global Step: 163490   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:48,574-Speed 2616.66 samples/sec   Loss 10.9809   LearningRate 0.0645   Epoch: 3   Global Step: 163500   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:05:52,462-Speed 2634.55 samples/sec   Loss 11.0659   LearningRate 0.0645   Epoch: 3   Global Step: 163510   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:05:56,339-Speed 2641.61 samples/sec   Loss 10.7656   LearningRate 0.0645   Epoch: 3   Global Step: 163520   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:00,233-Speed 2630.68 samples/sec   Loss 10.9432   LearningRate 0.0645   Epoch: 3   Global Step: 163530   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:04,138-Speed 2622.40 samples/sec   Loss 10.8004   LearningRate 0.0645   Epoch: 3   Global Step: 163540   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:08,041-Speed 2624.06 samples/sec   Loss 10.8499   LearningRate 0.0645   Epoch: 3   Global Step: 163550   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:11,937-Speed 2629.27 samples/sec   Loss 10.9630   LearningRate 0.0645   Epoch: 3   Global Step: 163560   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:15,832-Speed 2630.07 samples/sec   Loss 10.8683   LearningRate 0.0645   Epoch: 3   Global Step: 163570   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:19,733-Speed 2625.78 samples/sec   Loss 10.9317   LearningRate 0.0645   Epoch: 3   Global Step: 163580   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:23,626-Speed 2630.63 samples/sec   Loss 10.8380   LearningRate 0.0644   Epoch: 3   Global Step: 163590   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:27,522-Speed 2629.28 samples/sec   Loss 10.8084   LearningRate 0.0644   Epoch: 3   Global Step: 163600   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:31,453-Speed 2606.15 samples/sec   Loss 10.9542   LearningRate 0.0644   Epoch: 3   Global Step: 163610   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:06:35,363-Speed 2619.54 samples/sec   Loss 10.8166   LearningRate 0.0644   Epoch: 3   Global Step: 163620   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:06:39,266-Speed 2624.32 samples/sec   Loss 10.8538   LearningRate 0.0644   Epoch: 3   Global Step: 163630   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:06:43,218-Speed 2592.15 samples/sec   Loss 10.9999   LearningRate 0.0644   Epoch: 3   Global Step: 163640   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:06:47,164-Speed 2595.51 samples/sec   Loss 10.8345   LearningRate 0.0644   Epoch: 3   Global Step: 163650   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:06:51,069-Speed 2623.16 samples/sec   Loss 11.0921   LearningRate 0.0644   Epoch: 3   Global Step: 163660   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:06:54,984-Speed 2616.00 samples/sec   Loss 11.0768   LearningRate 0.0644   Epoch: 3   Global Step: 163670   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:06:58,944-Speed 2586.25 samples/sec   Loss 10.8516   LearningRate 0.0644   Epoch: 3   Global Step: 163680   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:02,891-Speed 2595.11 samples/sec   Loss 10.8898   LearningRate 0.0644   Epoch: 3   Global Step: 163690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:06,804-Speed 2617.81 samples/sec   Loss 10.9466   LearningRate 0.0644   Epoch: 3   Global Step: 163700   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:10,752-Speed 2594.15 samples/sec   Loss 10.8563   LearningRate 0.0644   Epoch: 3   Global Step: 163710   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:14,655-Speed 2624.53 samples/sec   Loss 10.9083   LearningRate 0.0644   Epoch: 3   Global Step: 163720   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:07:18,557-Speed 2624.70 samples/sec   Loss 10.9884   LearningRate 0.0644   Epoch: 3   Global Step: 163730   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:22,475-Speed 2614.34 samples/sec   Loss 10.7579   LearningRate 0.0644   Epoch: 3   Global Step: 163740   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:26,379-Speed 2623.95 samples/sec   Loss 11.0168   LearningRate 0.0644   Epoch: 3   Global Step: 163750   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:30,295-Speed 2615.59 samples/sec   Loss 11.0018   LearningRate 0.0644   Epoch: 3   Global Step: 163760   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:34,206-Speed 2619.26 samples/sec   Loss 11.0297   LearningRate 0.0644   Epoch: 3   Global Step: 163770   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:38,110-Speed 2623.08 samples/sec   Loss 10.9619   LearningRate 0.0644   Epoch: 3   Global Step: 163780   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:42,021-Speed 2618.60 samples/sec   Loss 11.0149   LearningRate 0.0644   Epoch: 3   Global Step: 163790   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:07:45,875-Speed 2658.00 samples/sec   Loss 11.0789   LearningRate 0.0644   Epoch: 3   Global Step: 163800   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:07:49,780-Speed 2622.80 samples/sec   Loss 10.9086   LearningRate 0.0644   Epoch: 3   Global Step: 163810   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:07:53,677-Speed 2628.91 samples/sec   Loss 11.0581   LearningRate 0.0644   Epoch: 3   Global Step: 163820   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:07:57,579-Speed 2624.56 samples/sec   Loss 11.0664   LearningRate 0.0644   Epoch: 3   Global Step: 163830   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:08:01,512-Speed 2604.93 samples/sec   Loss 10.9626   LearningRate 0.0644   Epoch: 3   Global Step: 163840   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:08:05,429-Speed 2614.79 samples/sec   Loss 11.0219   LearningRate 0.0644   Epoch: 3   Global Step: 163850   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:08:09,360-Speed 2605.34 samples/sec   Loss 10.9675   LearningRate 0.0644   Epoch: 3   Global Step: 163860   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:08:13,269-Speed 2620.48 samples/sec   Loss 11.0453   LearningRate 0.0644   Epoch: 3   Global Step: 163870   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:08:17,176-Speed 2621.38 samples/sec   Loss 10.9668   LearningRate 0.0644   Epoch: 3   Global Step: 163880   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:08:21,091-Speed 2616.94 samples/sec   Loss 10.8767   LearningRate 0.0644   Epoch: 3   Global Step: 163890   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:08:24,988-Speed 2628.44 samples/sec   Loss 11.0245   LearningRate 0.0644   Epoch: 3   Global Step: 163900   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:28,885-Speed 2628.13 samples/sec   Loss 11.0223   LearningRate 0.0644   Epoch: 3   Global Step: 163910   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:32,785-Speed 2626.41 samples/sec   Loss 10.9511   LearningRate 0.0644   Epoch: 3   Global Step: 163920   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:36,690-Speed 2622.42 samples/sec   Loss 11.0186   LearningRate 0.0644   Epoch: 3   Global Step: 163930   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:40,581-Speed 2632.27 samples/sec   Loss 10.9575   LearningRate 0.0644   Epoch: 3   Global Step: 163940   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:44,479-Speed 2628.01 samples/sec   Loss 11.1620   LearningRate 0.0644   Epoch: 3   Global Step: 163950   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:48,389-Speed 2619.22 samples/sec   Loss 10.9932   LearningRate 0.0644   Epoch: 3   Global Step: 163960   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:52,281-Speed 2632.23 samples/sec   Loss 10.8964   LearningRate 0.0644   Epoch: 3   Global Step: 163970   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:08:56,172-Speed 2632.24 samples/sec   Loss 10.9794   LearningRate 0.0644   Epoch: 3   Global Step: 163980   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:09:00,076-Speed 2624.13 samples/sec   Loss 10.8667   LearningRate 0.0644   Epoch: 3   Global Step: 163990   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:09:03,987-Speed 2618.47 samples/sec   Loss 10.8452   LearningRate 0.0644   Epoch: 3   Global Step: 164000   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:07,886-Speed 2627.43 samples/sec   Loss 10.9458   LearningRate 0.0644   Epoch: 3   Global Step: 164010   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:11,783-Speed 2627.86 samples/sec   Loss 10.8459   LearningRate 0.0644   Epoch: 3   Global Step: 164020   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:15,685-Speed 2625.43 samples/sec   Loss 10.9420   LearningRate 0.0644   Epoch: 3   Global Step: 164030   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:19,579-Speed 2630.19 samples/sec   Loss 10.9411   LearningRate 0.0644   Epoch: 3   Global Step: 164040   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:23,471-Speed 2632.07 samples/sec   Loss 10.8511   LearningRate 0.0644   Epoch: 3   Global Step: 164050   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:27,367-Speed 2628.63 samples/sec   Loss 10.8697   LearningRate 0.0644   Epoch: 3   Global Step: 164060   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:31,261-Speed 2630.42 samples/sec   Loss 11.0348   LearningRate 0.0644   Epoch: 3   Global Step: 164070   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:35,153-Speed 2631.75 samples/sec   Loss 10.9775   LearningRate 0.0644   Epoch: 3   Global Step: 164080   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:39,049-Speed 2628.88 samples/sec   Loss 11.0851   LearningRate 0.0644   Epoch: 3   Global Step: 164090   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:09:42,941-Speed 2631.43 samples/sec   Loss 11.0488   LearningRate 0.0644   Epoch: 3   Global Step: 164100   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:09:46,882-Speed 2599.05 samples/sec   Loss 10.8680   LearningRate 0.0643   Epoch: 3   Global Step: 164110   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:09:50,789-Speed 2621.97 samples/sec   Loss 11.0148   LearningRate 0.0643   Epoch: 3   Global Step: 164120   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:09:54,687-Speed 2629.23 samples/sec   Loss 10.9061   LearningRate 0.0643   Epoch: 3   Global Step: 164130   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:09:58,589-Speed 2625.12 samples/sec   Loss 10.9993   LearningRate 0.0643   Epoch: 3   Global Step: 164140   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:02,487-Speed 2628.10 samples/sec   Loss 10.8930   LearningRate 0.0643   Epoch: 3   Global Step: 164150   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:06,389-Speed 2624.88 samples/sec   Loss 10.8806   LearningRate 0.0643   Epoch: 3   Global Step: 164160   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:10,286-Speed 2628.59 samples/sec   Loss 11.0413   LearningRate 0.0643   Epoch: 3   Global Step: 164170   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:14,183-Speed 2628.13 samples/sec   Loss 10.7614   LearningRate 0.0643   Epoch: 3   Global Step: 164180   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:18,083-Speed 2626.11 samples/sec   Loss 11.0682   LearningRate 0.0643   Epoch: 3   Global Step: 164190   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:21,989-Speed 2622.83 samples/sec   Loss 11.0231   LearningRate 0.0643   Epoch: 3   Global Step: 164200   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:10:25,884-Speed 2629.22 samples/sec   Loss 11.0412   LearningRate 0.0643   Epoch: 3   Global Step: 164210   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:10:29,783-Speed 2627.30 samples/sec   Loss 10.9128   LearningRate 0.0643   Epoch: 3   Global Step: 164220   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:10:33,675-Speed 2631.93 samples/sec   Loss 10.8374   LearningRate 0.0643   Epoch: 3   Global Step: 164230   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:10:37,611-Speed 2602.11 samples/sec   Loss 10.9962   LearningRate 0.0643   Epoch: 3   Global Step: 164240   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:10:41,511-Speed 2626.32 samples/sec   Loss 10.7986   LearningRate 0.0643   Epoch: 3   Global Step: 164250   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:10:45,396-Speed 2636.76 samples/sec   Loss 11.1009   LearningRate 0.0643   Epoch: 3   Global Step: 164260   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:49,300-Speed 2623.25 samples/sec   Loss 11.0798   LearningRate 0.0643   Epoch: 3   Global Step: 164270   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:53,198-Speed 2628.22 samples/sec   Loss 10.9081   LearningRate 0.0643   Epoch: 3   Global Step: 164280   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:10:57,094-Speed 2629.34 samples/sec   Loss 10.8270   LearningRate 0.0643   Epoch: 3   Global Step: 164290   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:01,005-Speed 2618.78 samples/sec   Loss 10.9867   LearningRate 0.0643   Epoch: 3   Global Step: 164300   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:04,898-Speed 2631.50 samples/sec   Loss 10.9100   LearningRate 0.0643   Epoch: 3   Global Step: 164310   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:08,792-Speed 2629.85 samples/sec   Loss 10.7536   LearningRate 0.0643   Epoch: 3   Global Step: 164320   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:12,686-Speed 2630.29 samples/sec   Loss 10.7388   LearningRate 0.0643   Epoch: 3   Global Step: 164330   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:16,679-Speed 2564.89 samples/sec   Loss 10.8810   LearningRate 0.0643   Epoch: 3   Global Step: 164340   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:20,579-Speed 2626.27 samples/sec   Loss 11.0337   LearningRate 0.0643   Epoch: 3   Global Step: 164350   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:24,475-Speed 2629.09 samples/sec   Loss 10.8927   LearningRate 0.0643   Epoch: 3   Global Step: 164360   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:11:28,383-Speed 2621.34 samples/sec   Loss 10.8043   LearningRate 0.0643   Epoch: 3   Global Step: 164370   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:11:32,268-Speed 2636.24 samples/sec   Loss 10.9068   LearningRate 0.0643   Epoch: 3   Global Step: 164380   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:36,228-Speed 2587.02 samples/sec   Loss 10.8396   LearningRate 0.0643   Epoch: 3   Global Step: 164390   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:40,231-Speed 2558.31 samples/sec   Loss 10.9819   LearningRate 0.0643   Epoch: 3   Global Step: 164400   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:44,121-Speed 2633.07 samples/sec   Loss 10.8879   LearningRate 0.0643   Epoch: 3   Global Step: 164410   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:48,016-Speed 2629.36 samples/sec   Loss 10.9090   LearningRate 0.0643   Epoch: 3   Global Step: 164420   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:51,989-Speed 2579.08 samples/sec   Loss 10.9561   LearningRate 0.0643   Epoch: 3   Global Step: 164430   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:55,899-Speed 2619.07 samples/sec   Loss 10.8323   LearningRate 0.0643   Epoch: 3   Global Step: 164440   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:11:59,911-Speed 2552.90 samples/sec   Loss 10.9161   LearningRate 0.0643   Epoch: 3   Global Step: 164450   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:03,824-Speed 2617.75 samples/sec   Loss 10.9242   LearningRate 0.0643   Epoch: 3   Global Step: 164460   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:07,724-Speed 2626.37 samples/sec   Loss 10.8250   LearningRate 0.0643   Epoch: 3   Global Step: 164470   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:11,621-Speed 2628.44 samples/sec   Loss 10.9552   LearningRate 0.0643   Epoch: 3   Global Step: 164480   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:12:15,551-Speed 2606.34 samples/sec   Loss 10.9011   LearningRate 0.0643   Epoch: 3   Global Step: 164490   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:19,449-Speed 2627.44 samples/sec   Loss 10.8493   LearningRate 0.0643   Epoch: 3   Global Step: 164500   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:23,344-Speed 2630.00 samples/sec   Loss 10.8890   LearningRate 0.0643   Epoch: 3   Global Step: 164510   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:27,240-Speed 2628.88 samples/sec   Loss 11.0028   LearningRate 0.0643   Epoch: 3   Global Step: 164520   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:31,157-Speed 2614.76 samples/sec   Loss 10.8711   LearningRate 0.0643   Epoch: 3   Global Step: 164530   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:35,057-Speed 2626.74 samples/sec   Loss 10.9015   LearningRate 0.0643   Epoch: 3   Global Step: 164540   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:38,948-Speed 2631.82 samples/sec   Loss 10.7872   LearningRate 0.0643   Epoch: 3   Global Step: 164550   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:42,839-Speed 2632.99 samples/sec   Loss 11.0008   LearningRate 0.0643   Epoch: 3   Global Step: 164560   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:46,731-Speed 2631.52 samples/sec   Loss 10.7964   LearningRate 0.0643   Epoch: 3   Global Step: 164570   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:50,628-Speed 2628.47 samples/sec   Loss 10.9001   LearningRate 0.0643   Epoch: 3   Global Step: 164580   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:12:54,648-Speed 2547.70 samples/sec   Loss 10.8688   LearningRate 0.0643   Epoch: 3   Global Step: 164590   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:12:58,546-Speed 2627.60 samples/sec   Loss 10.9641   LearningRate 0.0643   Epoch: 3   Global Step: 164600   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:13:02,422-Speed 2642.47 samples/sec   Loss 10.9158   LearningRate 0.0643   Epoch: 3   Global Step: 164610   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:06,320-Speed 2627.30 samples/sec   Loss 10.9138   LearningRate 0.0643   Epoch: 3   Global Step: 164620   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:10,216-Speed 2630.04 samples/sec   Loss 10.9936   LearningRate 0.0642   Epoch: 3   Global Step: 164630   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:14,119-Speed 2623.95 samples/sec   Loss 10.8836   LearningRate 0.0642   Epoch: 3   Global Step: 164640   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:18,025-Speed 2622.61 samples/sec   Loss 11.0052   LearningRate 0.0642   Epoch: 3   Global Step: 164650   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:21,919-Speed 2630.48 samples/sec   Loss 10.9046   LearningRate 0.0642   Epoch: 3   Global Step: 164660   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:25,812-Speed 2630.75 samples/sec   Loss 10.9061   LearningRate 0.0642   Epoch: 3   Global Step: 164670   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:29,706-Speed 2630.71 samples/sec   Loss 10.8712   LearningRate 0.0642   Epoch: 3   Global Step: 164680   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:33,603-Speed 2628.27 samples/sec   Loss 10.9413   LearningRate 0.0642   Epoch: 3   Global Step: 164690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:37,496-Speed 2630.63 samples/sec   Loss 10.8932   LearningRate 0.0642   Epoch: 3   Global Step: 164700   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:13:41,397-Speed 2625.25 samples/sec   Loss 10.9718   LearningRate 0.0642   Epoch: 3   Global Step: 164710   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:13:45,295-Speed 2627.78 samples/sec   Loss 10.9277   LearningRate 0.0642   Epoch: 3   Global Step: 164720   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:13:49,198-Speed 2624.25 samples/sec   Loss 10.8622   LearningRate 0.0642   Epoch: 3   Global Step: 164730   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:13:53,076-Speed 2641.14 samples/sec   Loss 10.9196   LearningRate 0.0642   Epoch: 3   Global Step: 164740   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:13:56,973-Speed 2628.00 samples/sec   Loss 10.9475   LearningRate 0.0642   Epoch: 3   Global Step: 164750   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:00,870-Speed 2629.19 samples/sec   Loss 10.9717   LearningRate 0.0642   Epoch: 3   Global Step: 164760   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:04,769-Speed 2627.00 samples/sec   Loss 11.0732   LearningRate 0.0642   Epoch: 3   Global Step: 164770   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:08,673-Speed 2623.39 samples/sec   Loss 10.8268   LearningRate 0.0642   Epoch: 3   Global Step: 164780   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:12,565-Speed 2631.25 samples/sec   Loss 10.7352   LearningRate 0.0642   Epoch: 3   Global Step: 164790   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:16,459-Speed 2630.85 samples/sec   Loss 10.8922   LearningRate 0.0642   Epoch: 3   Global Step: 164800   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:20,360-Speed 2625.53 samples/sec   Loss 10.8651   LearningRate 0.0642   Epoch: 3   Global Step: 164810   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:24,263-Speed 2624.72 samples/sec   Loss 11.0050   LearningRate 0.0642   Epoch: 3   Global Step: 164820   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:28,205-Speed 2597.72 samples/sec   Loss 10.9926   LearningRate 0.0642   Epoch: 3   Global Step: 164830   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:14:32,149-Speed 2597.40 samples/sec   Loss 10.8458   LearningRate 0.0642   Epoch: 3   Global Step: 164840   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:14:36,070-Speed 2612.43 samples/sec   Loss 10.9954   LearningRate 0.0642   Epoch: 3   Global Step: 164850   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:14:39,996-Speed 2609.03 samples/sec   Loss 10.9001   LearningRate 0.0642   Epoch: 3   Global Step: 164860   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:14:44,017-Speed 2547.22 samples/sec   Loss 10.8517   LearningRate 0.0642   Epoch: 3   Global Step: 164870   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:14:47,919-Speed 2624.93 samples/sec   Loss 10.7881   LearningRate 0.0642   Epoch: 3   Global Step: 164880   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:14:51,817-Speed 2628.08 samples/sec   Loss 10.8168   LearningRate 0.0642   Epoch: 3   Global Step: 164890   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:14:55,723-Speed 2621.93 samples/sec   Loss 10.9518   LearningRate 0.0642   Epoch: 3   Global Step: 164900   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:14:59,621-Speed 2627.60 samples/sec   Loss 11.0440   LearningRate 0.0642   Epoch: 3   Global Step: 164910   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:15:03,499-Speed 2641.61 samples/sec   Loss 10.9297   LearningRate 0.0642   Epoch: 3   Global Step: 164920   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:07,392-Speed 2630.77 samples/sec   Loss 10.8497   LearningRate 0.0642   Epoch: 3   Global Step: 164930   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:11,295-Speed 2624.35 samples/sec   Loss 10.8735   LearningRate 0.0642   Epoch: 3   Global Step: 164940   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:15,202-Speed 2621.56 samples/sec   Loss 10.9329   LearningRate 0.0642   Epoch: 3   Global Step: 164950   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:19,096-Speed 2630.38 samples/sec   Loss 11.0297   LearningRate 0.0642   Epoch: 3   Global Step: 164960   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:22,993-Speed 2628.82 samples/sec   Loss 10.9306   LearningRate 0.0642   Epoch: 3   Global Step: 164970   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:26,891-Speed 2626.99 samples/sec   Loss 10.9129   LearningRate 0.0642   Epoch: 3   Global Step: 164980   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:30,805-Speed 2617.67 samples/sec   Loss 10.9431   LearningRate 0.0642   Epoch: 3   Global Step: 164990   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:34,702-Speed 2628.34 samples/sec   Loss 11.0150   LearningRate 0.0642   Epoch: 3   Global Step: 165000   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:38,594-Speed 2631.39 samples/sec   Loss 10.8445   LearningRate 0.0642   Epoch: 3   Global Step: 165010   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:15:42,491-Speed 2628.23 samples/sec   Loss 10.8902   LearningRate 0.0642   Epoch: 3   Global Step: 165020   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:15:46,385-Speed 2630.70 samples/sec   Loss 10.8699   LearningRate 0.0642   Epoch: 3   Global Step: 165030   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:15:50,278-Speed 2631.14 samples/sec   Loss 10.7863   LearningRate 0.0642   Epoch: 3   Global Step: 165040   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:15:54,173-Speed 2629.67 samples/sec   Loss 10.7478   LearningRate 0.0642   Epoch: 3   Global Step: 165050   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:15:58,089-Speed 2615.79 samples/sec   Loss 10.9768   LearningRate 0.0642   Epoch: 3   Global Step: 165060   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:16:01,992-Speed 2624.55 samples/sec   Loss 10.9343   LearningRate 0.0642   Epoch: 3   Global Step: 165070   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:16:05,912-Speed 2612.42 samples/sec   Loss 10.9832   LearningRate 0.0642   Epoch: 3   Global Step: 165080   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:16:09,804-Speed 2632.21 samples/sec   Loss 10.8236   LearningRate 0.0642   Epoch: 3   Global Step: 165090   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:16:13,697-Speed 2630.75 samples/sec   Loss 10.7955   LearningRate 0.0642   Epoch: 3   Global Step: 165100   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:16:17,590-Speed 2631.11 samples/sec   Loss 10.7674   LearningRate 0.0642   Epoch: 3   Global Step: 165110   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:16:21,481-Speed 2632.65 samples/sec   Loss 10.9549   LearningRate 0.0642   Epoch: 3   Global Step: 165120   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:25,371-Speed 2632.49 samples/sec   Loss 10.9576   LearningRate 0.0642   Epoch: 3   Global Step: 165130   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:29,263-Speed 2632.42 samples/sec   Loss 10.8442   LearningRate 0.0641   Epoch: 3   Global Step: 165140   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:33,160-Speed 2627.94 samples/sec   Loss 11.0635   LearningRate 0.0641   Epoch: 3   Global Step: 165150   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:37,060-Speed 2626.67 samples/sec   Loss 10.9554   LearningRate 0.0641   Epoch: 3   Global Step: 165160   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:40,977-Speed 2614.80 samples/sec   Loss 10.9926   LearningRate 0.0641   Epoch: 3   Global Step: 165170   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:44,873-Speed 2628.59 samples/sec   Loss 11.0049   LearningRate 0.0641   Epoch: 3   Global Step: 165180   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:48,768-Speed 2629.63 samples/sec   Loss 11.1392   LearningRate 0.0641   Epoch: 3   Global Step: 165190   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:16:52,654-Speed 2636.00 samples/sec   Loss 10.9094   LearningRate 0.0641   Epoch: 3   Global Step: 165200   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:16:56,558-Speed 2623.31 samples/sec   Loss 10.9079   LearningRate 0.0641   Epoch: 3   Global Step: 165210   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:00,457-Speed 2627.26 samples/sec   Loss 10.9659   LearningRate 0.0641   Epoch: 3   Global Step: 165220   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:04,366-Speed 2620.34 samples/sec   Loss 10.8910   LearningRate 0.0641   Epoch: 3   Global Step: 165230   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:08,266-Speed 2626.32 samples/sec   Loss 10.9315   LearningRate 0.0641   Epoch: 3   Global Step: 165240   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:12,165-Speed 2626.89 samples/sec   Loss 10.9864   LearningRate 0.0641   Epoch: 3   Global Step: 165250   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:16,075-Speed 2619.48 samples/sec   Loss 10.7931   LearningRate 0.0641   Epoch: 3   Global Step: 165260   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:19,974-Speed 2626.41 samples/sec   Loss 11.0944   LearningRate 0.0641   Epoch: 3   Global Step: 165270   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:23,885-Speed 2619.48 samples/sec   Loss 10.9086   LearningRate 0.0641   Epoch: 3   Global Step: 165280   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:27,784-Speed 2627.35 samples/sec   Loss 10.9571   LearningRate 0.0641   Epoch: 3   Global Step: 165290   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:31,678-Speed 2629.62 samples/sec   Loss 10.8494   LearningRate 0.0641   Epoch: 3   Global Step: 165300   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:35,573-Speed 2630.23 samples/sec   Loss 10.9323   LearningRate 0.0641   Epoch: 3   Global Step: 165310   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:17:39,464-Speed 2631.86 samples/sec   Loss 11.0264   LearningRate 0.0641   Epoch: 3   Global Step: 165320   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:17:43,364-Speed 2626.46 samples/sec   Loss 10.9654   LearningRate 0.0641   Epoch: 3   Global Step: 165330   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:47,259-Speed 2629.22 samples/sec   Loss 10.8620   LearningRate 0.0641   Epoch: 3   Global Step: 165340   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:51,155-Speed 2629.57 samples/sec   Loss 10.8100   LearningRate 0.0641   Epoch: 3   Global Step: 165350   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:55,073-Speed 2614.16 samples/sec   Loss 10.8889   LearningRate 0.0641   Epoch: 3   Global Step: 165360   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:17:58,983-Speed 2620.09 samples/sec   Loss 10.8631   LearningRate 0.0641   Epoch: 3   Global Step: 165370   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:18:02,881-Speed 2627.30 samples/sec   Loss 10.8381   LearningRate 0.0641   Epoch: 3   Global Step: 165380   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:18:06,802-Speed 2612.69 samples/sec   Loss 10.7955   LearningRate 0.0641   Epoch: 3   Global Step: 165390   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:18:10,713-Speed 2618.90 samples/sec   Loss 10.8390   LearningRate 0.0641   Epoch: 3   Global Step: 165400   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:18:14,612-Speed 2627.12 samples/sec   Loss 10.8736   LearningRate 0.0641   Epoch: 3   Global Step: 165410   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:18:18,515-Speed 2624.23 samples/sec   Loss 10.9831   LearningRate 0.0641   Epoch: 3   Global Step: 165420   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:18:22,409-Speed 2630.65 samples/sec   Loss 10.9479   LearningRate 0.0641   Epoch: 3   Global Step: 165430   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:26,315-Speed 2621.92 samples/sec   Loss 10.8576   LearningRate 0.0641   Epoch: 3   Global Step: 165440   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:30,211-Speed 2628.93 samples/sec   Loss 10.9091   LearningRate 0.0641   Epoch: 3   Global Step: 165450   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:34,110-Speed 2626.64 samples/sec   Loss 11.0012   LearningRate 0.0641   Epoch: 3   Global Step: 165460   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:38,004-Speed 2631.03 samples/sec   Loss 10.8923   LearningRate 0.0641   Epoch: 3   Global Step: 165470   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:41,911-Speed 2620.91 samples/sec   Loss 10.9568   LearningRate 0.0641   Epoch: 3   Global Step: 165480   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:45,810-Speed 2627.57 samples/sec   Loss 10.8093   LearningRate 0.0641   Epoch: 3   Global Step: 165490   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:49,717-Speed 2621.14 samples/sec   Loss 10.9434   LearningRate 0.0641   Epoch: 3   Global Step: 165500   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:53,626-Speed 2620.32 samples/sec   Loss 10.9625   LearningRate 0.0641   Epoch: 3   Global Step: 165510   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:18:57,528-Speed 2625.01 samples/sec   Loss 11.0629   LearningRate 0.0641   Epoch: 3   Global Step: 165520   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:19:01,428-Speed 2626.16 samples/sec   Loss 10.8915   LearningRate 0.0641   Epoch: 3   Global Step: 165530   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:19:05,324-Speed 2628.90 samples/sec   Loss 11.0324   LearningRate 0.0641   Epoch: 3   Global Step: 165540   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:19:09,220-Speed 2629.00 samples/sec   Loss 10.9543   LearningRate 0.0641   Epoch: 3   Global Step: 165550   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:19:13,114-Speed 2630.33 samples/sec   Loss 10.8658   LearningRate 0.0641   Epoch: 3   Global Step: 165560   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:19:16,995-Speed 2638.78 samples/sec   Loss 11.0409   LearningRate 0.0641   Epoch: 3   Global Step: 165570   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:19:20,876-Speed 2639.82 samples/sec   Loss 11.0410   LearningRate 0.0641   Epoch: 3   Global Step: 165580   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:24,784-Speed 2620.51 samples/sec   Loss 10.9511   LearningRate 0.0641   Epoch: 3   Global Step: 165590   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:28,688-Speed 2624.25 samples/sec   Loss 10.9386   LearningRate 0.0641   Epoch: 3   Global Step: 165600   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:32,580-Speed 2631.15 samples/sec   Loss 10.9426   LearningRate 0.0641   Epoch: 3   Global Step: 165610   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:36,474-Speed 2630.44 samples/sec   Loss 11.0673   LearningRate 0.0641   Epoch: 3   Global Step: 165620   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:40,397-Speed 2611.11 samples/sec   Loss 10.8708   LearningRate 0.0641   Epoch: 3   Global Step: 165630   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:44,290-Speed 2631.06 samples/sec   Loss 10.8027   LearningRate 0.0641   Epoch: 3   Global Step: 165640   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:48,330-Speed 2535.03 samples/sec   Loss 10.9088   LearningRate 0.0641   Epoch: 3   Global Step: 165650   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:52,223-Speed 2631.16 samples/sec   Loss 10.8888   LearningRate 0.0640   Epoch: 3   Global Step: 165660   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:19:56,119-Speed 2628.61 samples/sec   Loss 11.0723   LearningRate 0.0640   Epoch: 3   Global Step: 165670   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:20:00,016-Speed 2628.24 samples/sec   Loss 10.8624   LearningRate 0.0640   Epoch: 3   Global Step: 165680   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:03,926-Speed 2619.85 samples/sec   Loss 10.8624   LearningRate 0.0640   Epoch: 3   Global Step: 165690   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:07,823-Speed 2628.32 samples/sec   Loss 10.8057   LearningRate 0.0640   Epoch: 3   Global Step: 165700   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:11,722-Speed 2627.35 samples/sec   Loss 10.9920   LearningRate 0.0640   Epoch: 3   Global Step: 165710   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:15,626-Speed 2623.23 samples/sec   Loss 10.9615   LearningRate 0.0640   Epoch: 3   Global Step: 165720   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:19,525-Speed 2627.47 samples/sec   Loss 10.9060   LearningRate 0.0640   Epoch: 3   Global Step: 165730   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:23,427-Speed 2625.00 samples/sec   Loss 10.8880   LearningRate 0.0640   Epoch: 3   Global Step: 165740   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:27,331-Speed 2623.83 samples/sec   Loss 11.1735   LearningRate 0.0640   Epoch: 3   Global Step: 165750   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:31,248-Speed 2614.72 samples/sec   Loss 10.8404   LearningRate 0.0640   Epoch: 3   Global Step: 165760   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:35,144-Speed 2628.86 samples/sec   Loss 11.0280   LearningRate 0.0640   Epoch: 3   Global Step: 165770   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:39,040-Speed 2629.50 samples/sec   Loss 10.7781   LearningRate 0.0640   Epoch: 3   Global Step: 165780   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:20:42,935-Speed 2629.37 samples/sec   Loss 10.8957   LearningRate 0.0640   Epoch: 3   Global Step: 165790   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:20:46,832-Speed 2628.19 samples/sec   Loss 10.9039   LearningRate 0.0640   Epoch: 3   Global Step: 165800   Fp16 Grad Scale: 262144   Required: 75 hours
Training: 2022-04-13 14:20:50,711-Speed 2640.55 samples/sec   Loss 10.8810   LearningRate 0.0640   Epoch: 3   Global Step: 165810   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:54,611-Speed 2626.39 samples/sec   Loss 10.8913   LearningRate 0.0640   Epoch: 3   Global Step: 165820   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:20:58,511-Speed 2626.96 samples/sec   Loss 10.8704   LearningRate 0.0640   Epoch: 3   Global Step: 165830   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:21:02,420-Speed 2619.87 samples/sec   Loss 10.9257   LearningRate 0.0640   Epoch: 3   Global Step: 165840   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:21:06,315-Speed 2629.50 samples/sec   Loss 10.8983   LearningRate 0.0640   Epoch: 3   Global Step: 165850   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:21:10,209-Speed 2630.16 samples/sec   Loss 10.8850   LearningRate 0.0640   Epoch: 3   Global Step: 165860   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:21:14,118-Speed 2620.20 samples/sec   Loss 11.0602   LearningRate 0.0640   Epoch: 3   Global Step: 165870   Fp16 Grad Scale: 131072   Required: 75 hours
Training: 2022-04-13 14:21:18,015-Speed 2628.54 samples/sec   Loss 10.8924   LearningRate 0.0640   Epoch: 3   Global Step: 165880   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:21:21,948-Speed 2603.73 samples/sec   Loss 10.9192   LearningRate 0.0640   Epoch: 3   Global Step: 165890   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:21:25,842-Speed 2630.97 samples/sec   Loss 10.8564   LearningRate 0.0640   Epoch: 3   Global Step: 165900   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:21:29,750-Speed 2620.96 samples/sec   Loss 10.9676   LearningRate 0.0640   Epoch: 3   Global Step: 165910   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:21:50,890-Speed 484.41 samples/sec   Loss 11.0112   LearningRate 0.0640   Epoch: 4   Global Step: 165920   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:21:54,779-Speed 2634.58 samples/sec   Loss 10.9454   LearningRate 0.0640   Epoch: 4   Global Step: 165930   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:21:58,664-Speed 2636.62 samples/sec   Loss 10.9245   LearningRate 0.0640   Epoch: 4   Global Step: 165940   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:22:02,558-Speed 2630.73 samples/sec   Loss 10.9101   LearningRate 0.0640   Epoch: 4   Global Step: 165950   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:22:06,456-Speed 2627.55 samples/sec   Loss 10.8445   LearningRate 0.0640   Epoch: 4   Global Step: 165960   Fp16 Grad Scale: 65536   Required: 75 hours
Training: 2022-04-13 14:22:10,311-Speed 2656.70 samples/sec   Loss 10.9454   LearningRate 0.0640   Epoch: 4   Global Step: 165970   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:14,255-Speed 2597.05 samples/sec   Loss 11.6734   LearningRate 0.0640   Epoch: 4   Global Step: 165980   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:18,138-Speed 2637.78 samples/sec   Loss 11.0802   LearningRate 0.0640   Epoch: 4   Global Step: 165990   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:22,025-Speed 2635.34 samples/sec   Loss 10.9931   LearningRate 0.0640   Epoch: 4   Global Step: 166000   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:25,909-Speed 2637.48 samples/sec   Loss 10.9837   LearningRate 0.0640   Epoch: 4   Global Step: 166010   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:29,797-Speed 2633.89 samples/sec   Loss 10.9347   LearningRate 0.0640   Epoch: 4   Global Step: 166020   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:33,679-Speed 2638.83 samples/sec   Loss 10.9395   LearningRate 0.0640   Epoch: 4   Global Step: 166030   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:37,577-Speed 2627.57 samples/sec   Loss 11.0745   LearningRate 0.0640   Epoch: 4   Global Step: 166040   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:41,465-Speed 2634.74 samples/sec   Loss 11.0384   LearningRate 0.0640   Epoch: 4   Global Step: 166050   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:45,356-Speed 2632.44 samples/sec   Loss 10.9228   LearningRate 0.0640   Epoch: 4   Global Step: 166060   Fp16 Grad Scale: 16384   Required: 75 hours
Training: 2022-04-13 14:22:49,255-Speed 2627.06 samples/sec   Loss 10.9700   LearningRate 0.0640   Epoch: 4   Global Step: 166070   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:22:53,146-Speed 2632.21 samples/sec   Loss 10.8792   LearningRate 0.0640   Epoch: 4   Global Step: 166080   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:22:57,038-Speed 2631.73 samples/sec   Loss 10.8374   LearningRate 0.0640   Epoch: 4   Global Step: 166090   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:23:00,933-Speed 2629.31 samples/sec   Loss 10.9539   LearningRate 0.0640   Epoch: 4   Global Step: 166100   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:23:04,842-Speed 2620.34 samples/sec   Loss 10.8036   LearningRate 0.0640   Epoch: 4   Global Step: 166110   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:23:08,745-Speed 2623.98 samples/sec   Loss 10.8780   LearningRate 0.0640   Epoch: 4   Global Step: 166120   Fp16 Grad Scale: 32768   Required: 75 hours
Training: 2022-04-13 14:23:12,640-Speed 2630.12 samples/sec   Loss 10.8332   LearningRate 0.0640   Epoch: 4   Global Step: 166130   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:23:16,541-Speed 2625.42 samples/sec   Loss 10.8140   LearningRate 0.0640   Epoch: 4   Global Step: 166140   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:23:20,436-Speed 2629.69 samples/sec   Loss 10.8718   LearningRate 0.0640   Epoch: 4   Global Step: 166150   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:23:24,331-Speed 2629.26 samples/sec   Loss 10.9417   LearningRate 0.0640   Epoch: 4   Global Step: 166160   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:23:28,235-Speed 2623.78 samples/sec   Loss 10.8876   LearningRate 0.0640   Epoch: 4   Global Step: 166170   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:32,235-Speed 2561.00 samples/sec   Loss 10.9525   LearningRate 0.0639   Epoch: 4   Global Step: 166180   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:36,135-Speed 2625.56 samples/sec   Loss 10.8792   LearningRate 0.0639   Epoch: 4   Global Step: 166190   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:40,032-Speed 2628.36 samples/sec   Loss 10.7112   LearningRate 0.0639   Epoch: 4   Global Step: 166200   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:43,931-Speed 2626.94 samples/sec   Loss 10.8979   LearningRate 0.0639   Epoch: 4   Global Step: 166210   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:47,873-Speed 2598.87 samples/sec   Loss 10.8686   LearningRate 0.0639   Epoch: 4   Global Step: 166220   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:51,767-Speed 2630.18 samples/sec   Loss 11.0111   LearningRate 0.0639   Epoch: 4   Global Step: 166230   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:55,741-Speed 2577.81 samples/sec   Loss 10.9062   LearningRate 0.0639   Epoch: 4   Global Step: 166240   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:23:59,649-Speed 2620.70 samples/sec   Loss 11.0863   LearningRate 0.0639   Epoch: 4   Global Step: 166250   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:24:03,594-Speed 2596.61 samples/sec   Loss 10.8673   LearningRate 0.0639   Epoch: 4   Global Step: 166260   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:24:07,510-Speed 2615.42 samples/sec   Loss 10.8728   LearningRate 0.0639   Epoch: 4   Global Step: 166270   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:11,413-Speed 2624.04 samples/sec   Loss 10.9223   LearningRate 0.0639   Epoch: 4   Global Step: 166280   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:15,309-Speed 2628.99 samples/sec   Loss 10.9859   LearningRate 0.0639   Epoch: 4   Global Step: 166290   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:19,208-Speed 2627.29 samples/sec   Loss 10.9415   LearningRate 0.0639   Epoch: 4   Global Step: 166300   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:23,106-Speed 2628.23 samples/sec   Loss 10.8897   LearningRate 0.0639   Epoch: 4   Global Step: 166310   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:27,002-Speed 2628.25 samples/sec   Loss 10.9874   LearningRate 0.0639   Epoch: 4   Global Step: 166320   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:30,902-Speed 2626.58 samples/sec   Loss 10.8920   LearningRate 0.0639   Epoch: 4   Global Step: 166330   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:34,912-Speed 2553.65 samples/sec   Loss 10.9370   LearningRate 0.0639   Epoch: 4   Global Step: 166340   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:38,839-Speed 2608.82 samples/sec   Loss 10.9532   LearningRate 0.0639   Epoch: 4   Global Step: 166350   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:42,778-Speed 2600.43 samples/sec   Loss 10.8756   LearningRate 0.0639   Epoch: 4   Global Step: 166360   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:24:46,671-Speed 2630.99 samples/sec   Loss 10.7874   LearningRate 0.0639   Epoch: 4   Global Step: 166370   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:24:50,567-Speed 2629.39 samples/sec   Loss 10.7718   LearningRate 0.0639   Epoch: 4   Global Step: 166380   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:24:54,466-Speed 2626.40 samples/sec   Loss 10.8953   LearningRate 0.0639   Epoch: 4   Global Step: 166390   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:24:58,375-Speed 2620.42 samples/sec   Loss 10.8186   LearningRate 0.0639   Epoch: 4   Global Step: 166400   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:02,275-Speed 2626.48 samples/sec   Loss 10.9683   LearningRate 0.0639   Epoch: 4   Global Step: 166410   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:06,178-Speed 2624.54 samples/sec   Loss 10.9713   LearningRate 0.0639   Epoch: 4   Global Step: 166420   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:10,073-Speed 2629.55 samples/sec   Loss 10.9294   LearningRate 0.0639   Epoch: 4   Global Step: 166430   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:13,981-Speed 2621.09 samples/sec   Loss 10.7508   LearningRate 0.0639   Epoch: 4   Global Step: 166440   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:17,878-Speed 2628.70 samples/sec   Loss 10.8753   LearningRate 0.0639   Epoch: 4   Global Step: 166450   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:21,789-Speed 2618.57 samples/sec   Loss 10.9423   LearningRate 0.0639   Epoch: 4   Global Step: 166460   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:25,682-Speed 2631.28 samples/sec   Loss 10.7409   LearningRate 0.0639   Epoch: 4   Global Step: 166470   Fp16 Grad Scale: 524288   Required: 74 hours
Training: 2022-04-13 14:25:29,560-Speed 2641.58 samples/sec   Loss 11.0004   LearningRate 0.0639   Epoch: 4   Global Step: 166480   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:33,456-Speed 2628.99 samples/sec   Loss 10.8695   LearningRate 0.0639   Epoch: 4   Global Step: 166490   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:37,353-Speed 2627.73 samples/sec   Loss 10.8116   LearningRate 0.0639   Epoch: 4   Global Step: 166500   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:41,245-Speed 2632.08 samples/sec   Loss 10.9856   LearningRate 0.0639   Epoch: 4   Global Step: 166510   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:25:45,127-Speed 2638.32 samples/sec   Loss 10.8548   LearningRate 0.0639   Epoch: 4   Global Step: 166520   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:25:49,036-Speed 2620.71 samples/sec   Loss 10.9104   LearningRate 0.0639   Epoch: 4   Global Step: 166530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:25:52,934-Speed 2627.47 samples/sec   Loss 10.8311   LearningRate 0.0639   Epoch: 4   Global Step: 166540   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:25:56,833-Speed 2627.30 samples/sec   Loss 10.9935   LearningRate 0.0639   Epoch: 4   Global Step: 166550   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:00,735-Speed 2624.83 samples/sec   Loss 10.9520   LearningRate 0.0639   Epoch: 4   Global Step: 166560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:04,632-Speed 2628.21 samples/sec   Loss 10.8624   LearningRate 0.0639   Epoch: 4   Global Step: 166570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:08,571-Speed 2600.09 samples/sec   Loss 10.9174   LearningRate 0.0639   Epoch: 4   Global Step: 166580   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:12,487-Speed 2616.24 samples/sec   Loss 10.9446   LearningRate 0.0639   Epoch: 4   Global Step: 166590   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:16,389-Speed 2624.71 samples/sec   Loss 10.5952   LearningRate 0.0639   Epoch: 4   Global Step: 166600   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:20,281-Speed 2632.05 samples/sec   Loss 10.8732   LearningRate 0.0639   Epoch: 4   Global Step: 166610   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:24,177-Speed 2628.87 samples/sec   Loss 10.8492   LearningRate 0.0639   Epoch: 4   Global Step: 166620   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:26:28,089-Speed 2618.65 samples/sec   Loss 10.8223   LearningRate 0.0639   Epoch: 4   Global Step: 166630   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:26:31,994-Speed 2623.19 samples/sec   Loss 10.8288   LearningRate 0.0639   Epoch: 4   Global Step: 166640   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:26:35,901-Speed 2621.00 samples/sec   Loss 10.9069   LearningRate 0.0639   Epoch: 4   Global Step: 166650   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:26:39,819-Speed 2614.52 samples/sec   Loss 10.8087   LearningRate 0.0639   Epoch: 4   Global Step: 166660   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:26:43,713-Speed 2630.85 samples/sec   Loss 10.8250   LearningRate 0.0639   Epoch: 4   Global Step: 166670   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:26:47,608-Speed 2629.17 samples/sec   Loss 10.7349   LearningRate 0.0639   Epoch: 4   Global Step: 166680   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:26:51,483-Speed 2643.59 samples/sec   Loss 11.0178   LearningRate 0.0639   Epoch: 4   Global Step: 166690   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:55,378-Speed 2629.61 samples/sec   Loss 10.7513   LearningRate 0.0638   Epoch: 4   Global Step: 166700   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:26:59,274-Speed 2629.35 samples/sec   Loss 10.8219   LearningRate 0.0638   Epoch: 4   Global Step: 166710   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:03,176-Speed 2624.48 samples/sec   Loss 10.8765   LearningRate 0.0638   Epoch: 4   Global Step: 166720   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:07,097-Speed 2612.40 samples/sec   Loss 10.8427   LearningRate 0.0638   Epoch: 4   Global Step: 166730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:11,009-Speed 2617.74 samples/sec   Loss 10.9023   LearningRate 0.0638   Epoch: 4   Global Step: 166740   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:14,913-Speed 2623.91 samples/sec   Loss 10.8391   LearningRate 0.0638   Epoch: 4   Global Step: 166750   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:18,822-Speed 2620.76 samples/sec   Loss 10.9604   LearningRate 0.0638   Epoch: 4   Global Step: 166760   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:22,713-Speed 2632.06 samples/sec   Loss 10.9360   LearningRate 0.0638   Epoch: 4   Global Step: 166770   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:26,609-Speed 2629.29 samples/sec   Loss 10.9757   LearningRate 0.0638   Epoch: 4   Global Step: 166780   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:27:30,504-Speed 2629.04 samples/sec   Loss 10.9240   LearningRate 0.0638   Epoch: 4   Global Step: 166790   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:27:34,362-Speed 2654.66 samples/sec   Loss 10.9736   LearningRate 0.0638   Epoch: 4   Global Step: 166800   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:27:38,262-Speed 2626.43 samples/sec   Loss 10.9264   LearningRate 0.0638   Epoch: 4   Global Step: 166810   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:27:42,169-Speed 2621.69 samples/sec   Loss 10.8554   LearningRate 0.0638   Epoch: 4   Global Step: 166820   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:27:46,074-Speed 2623.35 samples/sec   Loss 10.9287   LearningRate 0.0638   Epoch: 4   Global Step: 166830   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:27:49,990-Speed 2615.55 samples/sec   Loss 10.9666   LearningRate 0.0638   Epoch: 4   Global Step: 166840   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:27:53,893-Speed 2624.41 samples/sec   Loss 10.9769   LearningRate 0.0638   Epoch: 4   Global Step: 166850   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:27:57,816-Speed 2611.37 samples/sec   Loss 11.0164   LearningRate 0.0638   Epoch: 4   Global Step: 166860   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:01,707-Speed 2631.82 samples/sec   Loss 10.9618   LearningRate 0.0638   Epoch: 4   Global Step: 166870   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:05,613-Speed 2622.17 samples/sec   Loss 10.8318   LearningRate 0.0638   Epoch: 4   Global Step: 166880   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:09,531-Speed 2614.57 samples/sec   Loss 10.9235   LearningRate 0.0638   Epoch: 4   Global Step: 166890   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:13,443-Speed 2618.20 samples/sec   Loss 10.6930   LearningRate 0.0638   Epoch: 4   Global Step: 166900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:28:17,337-Speed 2629.91 samples/sec   Loss 10.9575   LearningRate 0.0638   Epoch: 4   Global Step: 166910   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:28:21,251-Speed 2617.15 samples/sec   Loss 10.7190   LearningRate 0.0638   Epoch: 4   Global Step: 166920   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:28:25,165-Speed 2616.44 samples/sec   Loss 10.7871   LearningRate 0.0638   Epoch: 4   Global Step: 166930   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:28:29,064-Speed 2627.22 samples/sec   Loss 10.9264   LearningRate 0.0638   Epoch: 4   Global Step: 166940   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:28:32,942-Speed 2641.26 samples/sec   Loss 10.8679   LearningRate 0.0638   Epoch: 4   Global Step: 166950   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:36,837-Speed 2630.11 samples/sec   Loss 10.8995   LearningRate 0.0638   Epoch: 4   Global Step: 166960   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:40,731-Speed 2629.80 samples/sec   Loss 10.9415   LearningRate 0.0638   Epoch: 4   Global Step: 166970   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:44,628-Speed 2628.58 samples/sec   Loss 10.7775   LearningRate 0.0638   Epoch: 4   Global Step: 166980   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:48,563-Speed 2602.22 samples/sec   Loss 10.8062   LearningRate 0.0638   Epoch: 4   Global Step: 166990   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:52,475-Speed 2618.67 samples/sec   Loss 10.8774   LearningRate 0.0638   Epoch: 4   Global Step: 167000   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:28:56,368-Speed 2631.23 samples/sec   Loss 10.7924   LearningRate 0.0638   Epoch: 4   Global Step: 167010   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:29:00,261-Speed 2630.89 samples/sec   Loss 10.8417   LearningRate 0.0638   Epoch: 4   Global Step: 167020   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:29:04,225-Speed 2583.84 samples/sec   Loss 10.8530   LearningRate 0.0638   Epoch: 4   Global Step: 167030   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:29:08,312-Speed 2506.32 samples/sec   Loss 10.9114   LearningRate 0.0638   Epoch: 4   Global Step: 167040   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:29:12,405-Speed 2502.42 samples/sec   Loss 10.7906   LearningRate 0.0638   Epoch: 4   Global Step: 167050   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:16,303-Speed 2627.69 samples/sec   Loss 10.7801   LearningRate 0.0638   Epoch: 4   Global Step: 167060   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:20,198-Speed 2629.52 samples/sec   Loss 10.8878   LearningRate 0.0638   Epoch: 4   Global Step: 167070   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:24,093-Speed 2630.01 samples/sec   Loss 10.8856   LearningRate 0.0638   Epoch: 4   Global Step: 167080   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:27,987-Speed 2630.29 samples/sec   Loss 11.0918   LearningRate 0.0638   Epoch: 4   Global Step: 167090   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:31,880-Speed 2631.22 samples/sec   Loss 10.8141   LearningRate 0.0638   Epoch: 4   Global Step: 167100   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:35,778-Speed 2627.71 samples/sec   Loss 10.8175   LearningRate 0.0638   Epoch: 4   Global Step: 167110   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:39,671-Speed 2630.67 samples/sec   Loss 10.8642   LearningRate 0.0638   Epoch: 4   Global Step: 167120   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:43,582-Speed 2619.14 samples/sec   Loss 10.8634   LearningRate 0.0638   Epoch: 4   Global Step: 167130   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:47,478-Speed 2628.96 samples/sec   Loss 10.8827   LearningRate 0.0638   Epoch: 4   Global Step: 167140   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:51,376-Speed 2627.17 samples/sec   Loss 10.8517   LearningRate 0.0638   Epoch: 4   Global Step: 167150   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:55,276-Speed 2626.05 samples/sec   Loss 10.8948   LearningRate 0.0638   Epoch: 4   Global Step: 167160   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:29:59,185-Speed 2621.03 samples/sec   Loss 10.8623   LearningRate 0.0638   Epoch: 4   Global Step: 167170   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:30:03,080-Speed 2628.85 samples/sec   Loss 11.0594   LearningRate 0.0638   Epoch: 4   Global Step: 167180   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:30:06,981-Speed 2626.57 samples/sec   Loss 10.8739   LearningRate 0.0638   Epoch: 4   Global Step: 167190   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:30:10,858-Speed 2641.96 samples/sec   Loss 10.8163   LearningRate 0.0638   Epoch: 4   Global Step: 167200   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:14,763-Speed 2622.45 samples/sec   Loss 12.1286   LearningRate 0.0638   Epoch: 4   Global Step: 167210   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:18,663-Speed 2626.48 samples/sec   Loss 11.5601   LearningRate 0.0637   Epoch: 4   Global Step: 167220   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:22,555-Speed 2632.13 samples/sec   Loss 11.2354   LearningRate 0.0637   Epoch: 4   Global Step: 167230   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:26,481-Speed 2608.57 samples/sec   Loss 11.1638   LearningRate 0.0637   Epoch: 4   Global Step: 167240   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:30,377-Speed 2629.05 samples/sec   Loss 11.1064   LearningRate 0.0637   Epoch: 4   Global Step: 167250   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:34,356-Speed 2574.23 samples/sec   Loss 10.9744   LearningRate 0.0637   Epoch: 4   Global Step: 167260   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:38,279-Speed 2611.42 samples/sec   Loss 11.0518   LearningRate 0.0637   Epoch: 4   Global Step: 167270   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:42,186-Speed 2621.41 samples/sec   Loss 10.7631   LearningRate 0.0637   Epoch: 4   Global Step: 167280   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:46,077-Speed 2632.77 samples/sec   Loss 11.0137   LearningRate 0.0637   Epoch: 4   Global Step: 167290   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:30:49,969-Speed 2631.25 samples/sec   Loss 10.7934   LearningRate 0.0637   Epoch: 4   Global Step: 167300   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:30:53,861-Speed 2632.37 samples/sec   Loss 10.8723   LearningRate 0.0637   Epoch: 4   Global Step: 167310   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:30:57,771-Speed 2619.76 samples/sec   Loss 10.9893   LearningRate 0.0637   Epoch: 4   Global Step: 167320   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:01,666-Speed 2629.16 samples/sec   Loss 10.9357   LearningRate 0.0637   Epoch: 4   Global Step: 167330   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:05,587-Speed 2611.96 samples/sec   Loss 10.8828   LearningRate 0.0637   Epoch: 4   Global Step: 167340   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:09,482-Speed 2630.05 samples/sec   Loss 10.9224   LearningRate 0.0637   Epoch: 4   Global Step: 167350   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:13,381-Speed 2627.44 samples/sec   Loss 10.9445   LearningRate 0.0637   Epoch: 4   Global Step: 167360   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:17,280-Speed 2627.02 samples/sec   Loss 10.9801   LearningRate 0.0637   Epoch: 4   Global Step: 167370   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:21,177-Speed 2628.73 samples/sec   Loss 10.7823   LearningRate 0.0637   Epoch: 4   Global Step: 167380   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:25,075-Speed 2627.24 samples/sec   Loss 10.9843   LearningRate 0.0637   Epoch: 4   Global Step: 167390   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:31:28,972-Speed 2628.80 samples/sec   Loss 11.0350   LearningRate 0.0637   Epoch: 4   Global Step: 167400   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:31:32,871-Speed 2626.71 samples/sec   Loss 10.9582   LearningRate 0.0637   Epoch: 4   Global Step: 167410   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:31:36,773-Speed 2624.83 samples/sec   Loss 10.8611   LearningRate 0.0637   Epoch: 4   Global Step: 167420   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:31:40,671-Speed 2627.26 samples/sec   Loss 10.8008   LearningRate 0.0637   Epoch: 4   Global Step: 167430   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:31:44,569-Speed 2628.28 samples/sec   Loss 10.8873   LearningRate 0.0637   Epoch: 4   Global Step: 167440   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:31:48,467-Speed 2627.66 samples/sec   Loss 10.9177   LearningRate 0.0637   Epoch: 4   Global Step: 167450   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:31:52,363-Speed 2629.13 samples/sec   Loss 10.9442   LearningRate 0.0637   Epoch: 4   Global Step: 167460   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:31:56,274-Speed 2618.97 samples/sec   Loss 10.9605   LearningRate 0.0637   Epoch: 4   Global Step: 167470   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:00,343-Speed 2517.12 samples/sec   Loss 10.8941   LearningRate 0.0637   Epoch: 4   Global Step: 167480   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:04,416-Speed 2514.30 samples/sec   Loss 10.7886   LearningRate 0.0637   Epoch: 4   Global Step: 167490   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:08,417-Speed 2560.20 samples/sec   Loss 11.0714   LearningRate 0.0637   Epoch: 4   Global Step: 167500   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:32:12,296-Speed 2640.48 samples/sec   Loss 10.8346   LearningRate 0.0637   Epoch: 4   Global Step: 167510   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:16,197-Speed 2625.70 samples/sec   Loss 10.9776   LearningRate 0.0637   Epoch: 4   Global Step: 167520   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:20,096-Speed 2626.72 samples/sec   Loss 10.9006   LearningRate 0.0637   Epoch: 4   Global Step: 167530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:23,991-Speed 2630.24 samples/sec   Loss 10.9260   LearningRate 0.0637   Epoch: 4   Global Step: 167540   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:27,893-Speed 2624.76 samples/sec   Loss 10.7571   LearningRate 0.0637   Epoch: 4   Global Step: 167550   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:31,787-Speed 2630.29 samples/sec   Loss 10.8744   LearningRate 0.0637   Epoch: 4   Global Step: 167560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:35,686-Speed 2627.18 samples/sec   Loss 10.7480   LearningRate 0.0637   Epoch: 4   Global Step: 167570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:39,590-Speed 2623.22 samples/sec   Loss 10.9048   LearningRate 0.0637   Epoch: 4   Global Step: 167580   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:43,491-Speed 2625.87 samples/sec   Loss 10.9235   LearningRate 0.0637   Epoch: 4   Global Step: 167590   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:47,391-Speed 2626.48 samples/sec   Loss 11.0814   LearningRate 0.0637   Epoch: 4   Global Step: 167600   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:32:51,293-Speed 2625.23 samples/sec   Loss 10.9855   LearningRate 0.0637   Epoch: 4   Global Step: 167610   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:32:55,200-Speed 2620.94 samples/sec   Loss 10.7891   LearningRate 0.0637   Epoch: 4   Global Step: 167620   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:32:59,101-Speed 2625.92 samples/sec   Loss 10.7830   LearningRate 0.0637   Epoch: 4   Global Step: 167630   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:03,002-Speed 2625.17 samples/sec   Loss 10.8159   LearningRate 0.0637   Epoch: 4   Global Step: 167640   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:06,902-Speed 2626.43 samples/sec   Loss 10.7402   LearningRate 0.0637   Epoch: 4   Global Step: 167650   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:10,797-Speed 2629.15 samples/sec   Loss 10.9173   LearningRate 0.0637   Epoch: 4   Global Step: 167660   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:14,698-Speed 2626.30 samples/sec   Loss 10.9457   LearningRate 0.0637   Epoch: 4   Global Step: 167670   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:18,593-Speed 2629.88 samples/sec   Loss 11.0028   LearningRate 0.0637   Epoch: 4   Global Step: 167680   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:22,485-Speed 2631.70 samples/sec   Loss 10.9536   LearningRate 0.0637   Epoch: 4   Global Step: 167690   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:26,381-Speed 2629.26 samples/sec   Loss 10.9292   LearningRate 0.0637   Epoch: 4   Global Step: 167700   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:33:30,253-Speed 2645.49 samples/sec   Loss 10.9740   LearningRate 0.0637   Epoch: 4   Global Step: 167710   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:33:34,152-Speed 2626.75 samples/sec   Loss 10.8057   LearningRate 0.0637   Epoch: 4   Global Step: 167720   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:33:38,056-Speed 2623.33 samples/sec   Loss 10.8268   LearningRate 0.0637   Epoch: 4   Global Step: 167730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:33:41,950-Speed 2630.59 samples/sec   Loss 11.0218   LearningRate 0.0636   Epoch: 4   Global Step: 167740   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:33:45,839-Speed 2634.00 samples/sec   Loss 10.7145   LearningRate 0.0636   Epoch: 4   Global Step: 167750   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:33:49,758-Speed 2613.40 samples/sec   Loss 10.8518   LearningRate 0.0636   Epoch: 4   Global Step: 167760   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:33:53,685-Speed 2608.62 samples/sec   Loss 10.9035   LearningRate 0.0636   Epoch: 4   Global Step: 167770   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:33:57,591-Speed 2622.10 samples/sec   Loss 10.8465   LearningRate 0.0636   Epoch: 4   Global Step: 167780   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:01,502-Speed 2618.86 samples/sec   Loss 10.9523   LearningRate 0.0636   Epoch: 4   Global Step: 167790   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:05,402-Speed 2625.91 samples/sec   Loss 10.8682   LearningRate 0.0636   Epoch: 4   Global Step: 167800   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:09,299-Speed 2628.52 samples/sec   Loss 10.8621   LearningRate 0.0636   Epoch: 4   Global Step: 167810   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:13,212-Speed 2617.33 samples/sec   Loss 10.9394   LearningRate 0.0636   Epoch: 4   Global Step: 167820   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:17,127-Speed 2616.70 samples/sec   Loss 10.9743   LearningRate 0.0636   Epoch: 4   Global Step: 167830   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:21,037-Speed 2619.13 samples/sec   Loss 10.9306   LearningRate 0.0636   Epoch: 4   Global Step: 167840   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:24,939-Speed 2624.99 samples/sec   Loss 10.9132   LearningRate 0.0636   Epoch: 4   Global Step: 167850   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:28,854-Speed 2616.70 samples/sec   Loss 10.7688   LearningRate 0.0636   Epoch: 4   Global Step: 167860   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:32,777-Speed 2610.60 samples/sec   Loss 10.8973   LearningRate 0.0636   Epoch: 4   Global Step: 167870   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:36,682-Speed 2623.01 samples/sec   Loss 10.9279   LearningRate 0.0636   Epoch: 4   Global Step: 167880   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:40,586-Speed 2623.15 samples/sec   Loss 10.9077   LearningRate 0.0636   Epoch: 4   Global Step: 167890   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:44,503-Speed 2615.32 samples/sec   Loss 10.8090   LearningRate 0.0636   Epoch: 4   Global Step: 167900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:34:48,399-Speed 2628.81 samples/sec   Loss 10.8561   LearningRate 0.0636   Epoch: 4   Global Step: 167910   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:34:52,304-Speed 2623.60 samples/sec   Loss 10.9155   LearningRate 0.0636   Epoch: 4   Global Step: 167920   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:34:56,215-Speed 2618.85 samples/sec   Loss 10.8991   LearningRate 0.0636   Epoch: 4   Global Step: 167930   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:00,123-Speed 2620.98 samples/sec   Loss 10.7906   LearningRate 0.0636   Epoch: 4   Global Step: 167940   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:04,028-Speed 2622.78 samples/sec   Loss 10.8875   LearningRate 0.0636   Epoch: 4   Global Step: 167950   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:07,940-Speed 2618.24 samples/sec   Loss 10.9149   LearningRate 0.0636   Epoch: 4   Global Step: 167960   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:11,852-Speed 2617.82 samples/sec   Loss 11.0222   LearningRate 0.0636   Epoch: 4   Global Step: 167970   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:15,760-Speed 2620.99 samples/sec   Loss 10.8700   LearningRate 0.0636   Epoch: 4   Global Step: 167980   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:19,703-Speed 2597.43 samples/sec   Loss 10.8854   LearningRate 0.0636   Epoch: 4   Global Step: 167990   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:23,649-Speed 2596.14 samples/sec   Loss 10.8608   LearningRate 0.0636   Epoch: 4   Global Step: 168000   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:27,537-Speed 2634.35 samples/sec   Loss 10.8577   LearningRate 0.0636   Epoch: 4   Global Step: 168010   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:31,436-Speed 2627.05 samples/sec   Loss 10.8728   LearningRate 0.0636   Epoch: 4   Global Step: 168020   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:35,346-Speed 2619.27 samples/sec   Loss 10.9872   LearningRate 0.0636   Epoch: 4   Global Step: 168030   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:39,249-Speed 2624.37 samples/sec   Loss 10.8839   LearningRate 0.0636   Epoch: 4   Global Step: 168040   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:43,151-Speed 2625.26 samples/sec   Loss 10.9563   LearningRate 0.0636   Epoch: 4   Global Step: 168050   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:47,053-Speed 2624.55 samples/sec   Loss 10.8694   LearningRate 0.0636   Epoch: 4   Global Step: 168060   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:50,963-Speed 2619.22 samples/sec   Loss 10.7732   LearningRate 0.0636   Epoch: 4   Global Step: 168070   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:54,873-Speed 2619.90 samples/sec   Loss 10.9464   LearningRate 0.0636   Epoch: 4   Global Step: 168080   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:35:58,755-Speed 2638.60 samples/sec   Loss 10.7835   LearningRate 0.0636   Epoch: 4   Global Step: 168090   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:36:02,650-Speed 2629.72 samples/sec   Loss 10.8740   LearningRate 0.0636   Epoch: 4   Global Step: 168100   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:36:06,508-Speed 2655.08 samples/sec   Loss 11.0024   LearningRate 0.0636   Epoch: 4   Global Step: 168110   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:36:10,380-Speed 2645.31 samples/sec   Loss 10.8499   LearningRate 0.0636   Epoch: 4   Global Step: 168120   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:14,274-Speed 2630.25 samples/sec   Loss 10.6917   LearningRate 0.0636   Epoch: 4   Global Step: 168130   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:18,184-Speed 2619.36 samples/sec   Loss 10.8943   LearningRate 0.0636   Epoch: 4   Global Step: 168140   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:22,074-Speed 2632.79 samples/sec   Loss 10.9558   LearningRate 0.0636   Epoch: 4   Global Step: 168150   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:25,971-Speed 2628.79 samples/sec   Loss 10.8383   LearningRate 0.0636   Epoch: 4   Global Step: 168160   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:29,866-Speed 2629.47 samples/sec   Loss 10.6689   LearningRate 0.0636   Epoch: 4   Global Step: 168170   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:33,758-Speed 2632.52 samples/sec   Loss 10.9208   LearningRate 0.0636   Epoch: 4   Global Step: 168180   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:37,655-Speed 2627.47 samples/sec   Loss 10.8023   LearningRate 0.0636   Epoch: 4   Global Step: 168190   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:41,551-Speed 2629.31 samples/sec   Loss 10.7389   LearningRate 0.0636   Epoch: 4   Global Step: 168200   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:45,448-Speed 2628.85 samples/sec   Loss 10.7973   LearningRate 0.0636   Epoch: 4   Global Step: 168210   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:36:49,508-Speed 2522.18 samples/sec   Loss 10.9844   LearningRate 0.0636   Epoch: 4   Global Step: 168220   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:36:53,424-Speed 2615.72 samples/sec   Loss 10.9070   LearningRate 0.0636   Epoch: 4   Global Step: 168230   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:36:57,331-Speed 2621.43 samples/sec   Loss 10.9415   LearningRate 0.0636   Epoch: 4   Global Step: 168240   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:37:01,292-Speed 2585.96 samples/sec   Loss 10.9537   LearningRate 0.0636   Epoch: 4   Global Step: 168250   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:37:05,396-Speed 2495.51 samples/sec   Loss 10.7509   LearningRate 0.0635   Epoch: 4   Global Step: 168260   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:37:09,499-Speed 2496.77 samples/sec   Loss 10.8043   LearningRate 0.0635   Epoch: 4   Global Step: 168270   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:37:13,612-Speed 2490.20 samples/sec   Loss 10.8856   LearningRate 0.0635   Epoch: 4   Global Step: 168280   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:37:17,513-Speed 2625.16 samples/sec   Loss 10.7765   LearningRate 0.0635   Epoch: 4   Global Step: 168290   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:37:21,402-Speed 2634.01 samples/sec   Loss 10.8809   LearningRate 0.0635   Epoch: 4   Global Step: 168300   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:37:25,277-Speed 2643.21 samples/sec   Loss 10.9296   LearningRate 0.0635   Epoch: 4   Global Step: 168310   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:29,172-Speed 2629.84 samples/sec   Loss 10.7748   LearningRate 0.0635   Epoch: 4   Global Step: 168320   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:33,067-Speed 2629.35 samples/sec   Loss 10.7700   LearningRate 0.0635   Epoch: 4   Global Step: 168330   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:36,973-Speed 2622.71 samples/sec   Loss 10.8143   LearningRate 0.0635   Epoch: 4   Global Step: 168340   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:40,871-Speed 2627.06 samples/sec   Loss 10.9175   LearningRate 0.0635   Epoch: 4   Global Step: 168350   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:44,833-Speed 2585.81 samples/sec   Loss 10.8068   LearningRate 0.0635   Epoch: 4   Global Step: 168360   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:48,925-Speed 2502.59 samples/sec   Loss 10.7991   LearningRate 0.0635   Epoch: 4   Global Step: 168370   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:52,881-Speed 2589.52 samples/sec   Loss 10.8701   LearningRate 0.0635   Epoch: 4   Global Step: 168380   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:37:56,781-Speed 2625.64 samples/sec   Loss 10.8314   LearningRate 0.0635   Epoch: 4   Global Step: 168390   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:00,677-Speed 2629.16 samples/sec   Loss 10.9435   LearningRate 0.0635   Epoch: 4   Global Step: 168400   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:04,578-Speed 2626.17 samples/sec   Loss 10.9113   LearningRate 0.0635   Epoch: 4   Global Step: 168410   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:38:08,475-Speed 2627.98 samples/sec   Loss 10.8168   LearningRate 0.0635   Epoch: 4   Global Step: 168420   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:38:12,380-Speed 2623.05 samples/sec   Loss 10.9760   LearningRate 0.0635   Epoch: 4   Global Step: 168430   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:38:16,281-Speed 2625.21 samples/sec   Loss 10.7986   LearningRate 0.0635   Epoch: 4   Global Step: 168440   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:38:20,165-Speed 2637.08 samples/sec   Loss 10.9145   LearningRate 0.0635   Epoch: 4   Global Step: 168450   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:24,064-Speed 2626.94 samples/sec   Loss 11.0139   LearningRate 0.0635   Epoch: 4   Global Step: 168460   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:27,986-Speed 2612.14 samples/sec   Loss 11.0297   LearningRate 0.0635   Epoch: 4   Global Step: 168470   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:31,884-Speed 2627.37 samples/sec   Loss 10.8637   LearningRate 0.0635   Epoch: 4   Global Step: 168480   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:35,774-Speed 2632.89 samples/sec   Loss 10.9090   LearningRate 0.0635   Epoch: 4   Global Step: 168490   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:39,691-Speed 2614.75 samples/sec   Loss 10.8270   LearningRate 0.0635   Epoch: 4   Global Step: 168500   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:43,592-Speed 2625.92 samples/sec   Loss 10.8076   LearningRate 0.0635   Epoch: 4   Global Step: 168510   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:47,494-Speed 2624.99 samples/sec   Loss 10.8312   LearningRate 0.0635   Epoch: 4   Global Step: 168520   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:51,410-Speed 2616.50 samples/sec   Loss 10.8941   LearningRate 0.0635   Epoch: 4   Global Step: 168530   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:55,311-Speed 2625.30 samples/sec   Loss 10.8376   LearningRate 0.0635   Epoch: 4   Global Step: 168540   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:38:59,216-Speed 2623.40 samples/sec   Loss 10.8035   LearningRate 0.0635   Epoch: 4   Global Step: 168550   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:03,130-Speed 2616.94 samples/sec   Loss 10.7926   LearningRate 0.0635   Epoch: 4   Global Step: 168560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:07,029-Speed 2626.64 samples/sec   Loss 10.8542   LearningRate 0.0635   Epoch: 4   Global Step: 168570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:10,963-Speed 2603.46 samples/sec   Loss 10.7163   LearningRate 0.0635   Epoch: 4   Global Step: 168580   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:14,895-Speed 2605.60 samples/sec   Loss 10.7991   LearningRate 0.0635   Epoch: 4   Global Step: 168590   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:18,790-Speed 2629.23 samples/sec   Loss 10.7987   LearningRate 0.0635   Epoch: 4   Global Step: 168600   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:22,709-Speed 2613.57 samples/sec   Loss 10.9212   LearningRate 0.0635   Epoch: 4   Global Step: 168610   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:26,612-Speed 2624.41 samples/sec   Loss 10.9234   LearningRate 0.0635   Epoch: 4   Global Step: 168620   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:39:30,525-Speed 2617.90 samples/sec   Loss 10.8422   LearningRate 0.0635   Epoch: 4   Global Step: 168630   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:39:34,453-Speed 2607.59 samples/sec   Loss 10.8392   LearningRate 0.0635   Epoch: 4   Global Step: 168640   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:39:38,352-Speed 2627.19 samples/sec   Loss 10.8371   LearningRate 0.0635   Epoch: 4   Global Step: 168650   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:39:42,250-Speed 2627.56 samples/sec   Loss 10.8701   LearningRate 0.0635   Epoch: 4   Global Step: 168660   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:39:46,148-Speed 2627.54 samples/sec   Loss 10.8869   LearningRate 0.0635   Epoch: 4   Global Step: 168670   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:39:50,044-Speed 2629.25 samples/sec   Loss 10.9157   LearningRate 0.0635   Epoch: 4   Global Step: 168680   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:39:53,937-Speed 2630.35 samples/sec   Loss 10.9160   LearningRate 0.0635   Epoch: 4   Global Step: 168690   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:39:57,836-Speed 2627.86 samples/sec   Loss 10.9230   LearningRate 0.0635   Epoch: 4   Global Step: 168700   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:40:01,730-Speed 2630.06 samples/sec   Loss 10.7169   LearningRate 0.0635   Epoch: 4   Global Step: 168710   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:40:05,632-Speed 2624.91 samples/sec   Loss 10.8118   LearningRate 0.0635   Epoch: 4   Global Step: 168720   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:40:09,529-Speed 2628.22 samples/sec   Loss 10.9902   LearningRate 0.0635   Epoch: 4   Global Step: 168730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:13,428-Speed 2627.57 samples/sec   Loss 10.9293   LearningRate 0.0635   Epoch: 4   Global Step: 168740   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:17,330-Speed 2624.62 samples/sec   Loss 10.7354   LearningRate 0.0635   Epoch: 4   Global Step: 168750   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:21,227-Speed 2628.18 samples/sec   Loss 10.8580   LearningRate 0.0635   Epoch: 4   Global Step: 168760   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:25,124-Speed 2628.57 samples/sec   Loss 10.6964   LearningRate 0.0635   Epoch: 4   Global Step: 168770   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:29,020-Speed 2628.96 samples/sec   Loss 10.8570   LearningRate 0.0634   Epoch: 4   Global Step: 168780   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:32,932-Speed 2617.78 samples/sec   Loss 10.7401   LearningRate 0.0634   Epoch: 4   Global Step: 168790   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:36,830-Speed 2627.94 samples/sec   Loss 10.7566   LearningRate 0.0634   Epoch: 4   Global Step: 168800   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:40,727-Speed 2628.28 samples/sec   Loss 10.8376   LearningRate 0.0634   Epoch: 4   Global Step: 168810   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:44,634-Speed 2622.21 samples/sec   Loss 10.6976   LearningRate 0.0634   Epoch: 4   Global Step: 168820   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:48,510-Speed 2642.14 samples/sec   Loss 10.8791   LearningRate 0.0634   Epoch: 4   Global Step: 168830   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:52,406-Speed 2628.87 samples/sec   Loss 10.8513   LearningRate 0.0634   Epoch: 4   Global Step: 168840   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:40:56,306-Speed 2626.38 samples/sec   Loss 11.0031   LearningRate 0.0634   Epoch: 4   Global Step: 168850   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:00,213-Speed 2621.47 samples/sec   Loss 10.6927   LearningRate 0.0634   Epoch: 4   Global Step: 168860   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:04,130-Speed 2615.41 samples/sec   Loss 10.8947   LearningRate 0.0634   Epoch: 4   Global Step: 168870   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:08,044-Speed 2616.73 samples/sec   Loss 10.9192   LearningRate 0.0634   Epoch: 4   Global Step: 168880   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:11,942-Speed 2627.98 samples/sec   Loss 10.7492   LearningRate 0.0634   Epoch: 4   Global Step: 168890   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:15,837-Speed 2629.86 samples/sec   Loss 10.8229   LearningRate 0.0634   Epoch: 4   Global Step: 168900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:19,734-Speed 2627.75 samples/sec   Loss 10.9207   LearningRate 0.0634   Epoch: 4   Global Step: 168910   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:23,632-Speed 2627.99 samples/sec   Loss 10.8541   LearningRate 0.0634   Epoch: 4   Global Step: 168920   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:27,528-Speed 2628.97 samples/sec   Loss 10.7704   LearningRate 0.0634   Epoch: 4   Global Step: 168930   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:41:31,425-Speed 2627.80 samples/sec   Loss 10.7944   LearningRate 0.0634   Epoch: 4   Global Step: 168940   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:41:35,326-Speed 2626.03 samples/sec   Loss 10.9315   LearningRate 0.0634   Epoch: 4   Global Step: 168950   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:41:39,227-Speed 2625.33 samples/sec   Loss 10.8068   LearningRate 0.0634   Epoch: 4   Global Step: 168960   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:41:43,111-Speed 2637.37 samples/sec   Loss 10.8905   LearningRate 0.0634   Epoch: 4   Global Step: 168970   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:47,009-Speed 2627.45 samples/sec   Loss 10.7923   LearningRate 0.0634   Epoch: 4   Global Step: 168980   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:50,902-Speed 2630.61 samples/sec   Loss 10.8006   LearningRate 0.0634   Epoch: 4   Global Step: 168990   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:54,800-Speed 2627.68 samples/sec   Loss 10.7580   LearningRate 0.0634   Epoch: 4   Global Step: 169000   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:41:58,714-Speed 2617.28 samples/sec   Loss 10.7472   LearningRate 0.0634   Epoch: 4   Global Step: 169010   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:02,609-Speed 2629.52 samples/sec   Loss 10.7549   LearningRate 0.0634   Epoch: 4   Global Step: 169020   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:06,509-Speed 2626.54 samples/sec   Loss 10.7566   LearningRate 0.0634   Epoch: 4   Global Step: 169030   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:10,410-Speed 2625.16 samples/sec   Loss 10.8527   LearningRate 0.0634   Epoch: 4   Global Step: 169040   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:14,308-Speed 2627.78 samples/sec   Loss 10.8871   LearningRate 0.0634   Epoch: 4   Global Step: 169050   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:18,210-Speed 2624.49 samples/sec   Loss 10.8986   LearningRate 0.0634   Epoch: 4   Global Step: 169060   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:22,106-Speed 2629.15 samples/sec   Loss 10.8517   LearningRate 0.0634   Epoch: 4   Global Step: 169070   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:42:26,002-Speed 2629.39 samples/sec   Loss 10.9152   LearningRate 0.0634   Epoch: 4   Global Step: 169080   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:42:29,896-Speed 2630.59 samples/sec   Loss 10.8386   LearningRate 0.0634   Epoch: 4   Global Step: 169090   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:42:33,796-Speed 2625.73 samples/sec   Loss 10.8781   LearningRate 0.0634   Epoch: 4   Global Step: 169100   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:42:37,690-Speed 2630.16 samples/sec   Loss 10.7530   LearningRate 0.0634   Epoch: 4   Global Step: 169110   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:42:41,672-Speed 2572.02 samples/sec   Loss 10.8076   LearningRate 0.0634   Epoch: 4   Global Step: 169120   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:42:45,612-Speed 2599.85 samples/sec   Loss 10.7672   LearningRate 0.0634   Epoch: 4   Global Step: 169130   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:42:49,490-Speed 2640.87 samples/sec   Loss 10.7980   LearningRate 0.0634   Epoch: 4   Global Step: 169140   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:53,385-Speed 2630.11 samples/sec   Loss 10.9353   LearningRate 0.0634   Epoch: 4   Global Step: 169150   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:42:57,283-Speed 2627.83 samples/sec   Loss 10.9227   LearningRate 0.0634   Epoch: 4   Global Step: 169160   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:01,179-Speed 2628.39 samples/sec   Loss 10.8782   LearningRate 0.0634   Epoch: 4   Global Step: 169170   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:05,082-Speed 2624.21 samples/sec   Loss 10.7342   LearningRate 0.0634   Epoch: 4   Global Step: 169180   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:09,018-Speed 2602.61 samples/sec   Loss 10.9147   LearningRate 0.0634   Epoch: 4   Global Step: 169190   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:12,914-Speed 2628.97 samples/sec   Loss 10.8871   LearningRate 0.0634   Epoch: 4   Global Step: 169200   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:16,815-Speed 2625.47 samples/sec   Loss 10.8155   LearningRate 0.0634   Epoch: 4   Global Step: 169210   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:20,713-Speed 2627.97 samples/sec   Loss 10.8832   LearningRate 0.0634   Epoch: 4   Global Step: 169220   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:24,612-Speed 2626.80 samples/sec   Loss 10.7419   LearningRate 0.0634   Epoch: 4   Global Step: 169230   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:28,505-Speed 2631.30 samples/sec   Loss 10.7228   LearningRate 0.0634   Epoch: 4   Global Step: 169240   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:43:32,411-Speed 2622.44 samples/sec   Loss 10.8407   LearningRate 0.0634   Epoch: 4   Global Step: 169250   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:43:36,306-Speed 2629.59 samples/sec   Loss 10.8764   LearningRate 0.0634   Epoch: 4   Global Step: 169260   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:43:40,201-Speed 2628.91 samples/sec   Loss 10.9409   LearningRate 0.0634   Epoch: 4   Global Step: 169270   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:43:44,086-Speed 2637.02 samples/sec   Loss 10.8578   LearningRate 0.0634   Epoch: 4   Global Step: 169280   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:47,972-Speed 2635.98 samples/sec   Loss 10.8748   LearningRate 0.0634   Epoch: 4   Global Step: 169290   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:51,868-Speed 2629.32 samples/sec   Loss 10.8943   LearningRate 0.0633   Epoch: 4   Global Step: 169300   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:55,769-Speed 2625.30 samples/sec   Loss 10.6833   LearningRate 0.0633   Epoch: 4   Global Step: 169310   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:43:59,665-Speed 2629.18 samples/sec   Loss 10.9778   LearningRate 0.0633   Epoch: 4   Global Step: 169320   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:44:03,560-Speed 2629.34 samples/sec   Loss 10.8632   LearningRate 0.0633   Epoch: 4   Global Step: 169330   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:44:07,466-Speed 2622.23 samples/sec   Loss 10.8101   LearningRate 0.0633   Epoch: 4   Global Step: 169340   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:44:11,366-Speed 2625.69 samples/sec   Loss 10.9235   LearningRate 0.0633   Epoch: 4   Global Step: 169350   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:44:15,261-Speed 2630.53 samples/sec   Loss 10.8409   LearningRate 0.0633   Epoch: 4   Global Step: 169360   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:44:19,200-Speed 2599.85 samples/sec   Loss 10.8915   LearningRate 0.0633   Epoch: 4   Global Step: 169370   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:44:23,087-Speed 2635.62 samples/sec   Loss 10.7732   LearningRate 0.0633   Epoch: 4   Global Step: 169380   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:26,982-Speed 2629.79 samples/sec   Loss 10.8197   LearningRate 0.0633   Epoch: 4   Global Step: 169390   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:30,877-Speed 2629.88 samples/sec   Loss 10.7959   LearningRate 0.0633   Epoch: 4   Global Step: 169400   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:34,869-Speed 2565.72 samples/sec   Loss 10.8175   LearningRate 0.0633   Epoch: 4   Global Step: 169410   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:38,892-Speed 2545.76 samples/sec   Loss 10.6244   LearningRate 0.0633   Epoch: 4   Global Step: 169420   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:43,481-Speed 2231.43 samples/sec   Loss 10.9064   LearningRate 0.0633   Epoch: 4   Global Step: 169430   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:47,375-Speed 2630.63 samples/sec   Loss 10.8168   LearningRate 0.0633   Epoch: 4   Global Step: 169440   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:51,269-Speed 2630.78 samples/sec   Loss 10.9448   LearningRate 0.0633   Epoch: 4   Global Step: 169450   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:55,165-Speed 2628.91 samples/sec   Loss 10.7307   LearningRate 0.0633   Epoch: 4   Global Step: 169460   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:44:59,086-Speed 2612.76 samples/sec   Loss 10.8200   LearningRate 0.0633   Epoch: 4   Global Step: 169470   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:45:02,976-Speed 2632.86 samples/sec   Loss 10.8574   LearningRate 0.0633   Epoch: 4   Global Step: 169480   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:06,905-Speed 2606.99 samples/sec   Loss 10.7305   LearningRate 0.0633   Epoch: 4   Global Step: 169490   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:10,809-Speed 2623.71 samples/sec   Loss 10.8797   LearningRate 0.0633   Epoch: 4   Global Step: 169500   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:14,709-Speed 2626.65 samples/sec   Loss 10.8448   LearningRate 0.0633   Epoch: 4   Global Step: 169510   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:18,616-Speed 2620.80 samples/sec   Loss 10.8014   LearningRate 0.0633   Epoch: 4   Global Step: 169520   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:22,533-Speed 2615.17 samples/sec   Loss 10.8299   LearningRate 0.0633   Epoch: 4   Global Step: 169530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:26,431-Speed 2627.75 samples/sec   Loss 10.7785   LearningRate 0.0633   Epoch: 4   Global Step: 169540   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:30,333-Speed 2625.25 samples/sec   Loss 10.8068   LearningRate 0.0633   Epoch: 4   Global Step: 169550   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:34,236-Speed 2624.37 samples/sec   Loss 10.9437   LearningRate 0.0633   Epoch: 4   Global Step: 169560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:38,141-Speed 2623.20 samples/sec   Loss 10.7512   LearningRate 0.0633   Epoch: 4   Global Step: 169570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:42,048-Speed 2621.38 samples/sec   Loss 10.7983   LearningRate 0.0633   Epoch: 4   Global Step: 169580   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:45:45,935-Speed 2635.54 samples/sec   Loss 10.7457   LearningRate 0.0633   Epoch: 4   Global Step: 169590   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:45:49,837-Speed 2624.71 samples/sec   Loss 10.7566   LearningRate 0.0633   Epoch: 4   Global Step: 169600   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:45:53,725-Speed 2634.64 samples/sec   Loss 10.8635   LearningRate 0.0633   Epoch: 4   Global Step: 169610   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:45:57,621-Speed 2629.22 samples/sec   Loss 10.7313   LearningRate 0.0633   Epoch: 4   Global Step: 169620   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:01,524-Speed 2623.76 samples/sec   Loss 10.8766   LearningRate 0.0633   Epoch: 4   Global Step: 169630   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:05,416-Speed 2632.15 samples/sec   Loss 10.8297   LearningRate 0.0633   Epoch: 4   Global Step: 169640   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:09,314-Speed 2627.45 samples/sec   Loss 10.8052   LearningRate 0.0633   Epoch: 4   Global Step: 169650   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:13,210-Speed 2629.79 samples/sec   Loss 10.7659   LearningRate 0.0633   Epoch: 4   Global Step: 169660   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:17,104-Speed 2630.34 samples/sec   Loss 10.9454   LearningRate 0.0633   Epoch: 4   Global Step: 169670   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:20,995-Speed 2632.52 samples/sec   Loss 10.7667   LearningRate 0.0633   Epoch: 4   Global Step: 169680   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:24,890-Speed 2629.79 samples/sec   Loss 10.7241   LearningRate 0.0633   Epoch: 4   Global Step: 169690   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:28,784-Speed 2629.89 samples/sec   Loss 10.7315   LearningRate 0.0633   Epoch: 4   Global Step: 169700   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:46:32,686-Speed 2625.05 samples/sec   Loss 10.6250   LearningRate 0.0633   Epoch: 4   Global Step: 169710   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:46:36,580-Speed 2630.54 samples/sec   Loss 10.7790   LearningRate 0.0633   Epoch: 4   Global Step: 169720   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:46:40,437-Speed 2655.30 samples/sec   Loss 10.8038   LearningRate 0.0633   Epoch: 4   Global Step: 169730   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:46:44,333-Speed 2628.96 samples/sec   Loss 10.7615   LearningRate 0.0633   Epoch: 4   Global Step: 169740   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:46:48,269-Speed 2602.26 samples/sec   Loss 10.7717   LearningRate 0.0633   Epoch: 4   Global Step: 169750   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:46:52,180-Speed 2619.05 samples/sec   Loss 10.8169   LearningRate 0.0633   Epoch: 4   Global Step: 169760   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:46:56,075-Speed 2629.04 samples/sec   Loss 10.8249   LearningRate 0.0633   Epoch: 4   Global Step: 169770   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:46:59,967-Speed 2632.34 samples/sec   Loss 10.7470   LearningRate 0.0633   Epoch: 4   Global Step: 169780   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:47:03,861-Speed 2629.90 samples/sec   Loss 10.8131   LearningRate 0.0633   Epoch: 4   Global Step: 169790   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:47:07,764-Speed 2624.81 samples/sec   Loss 10.8351   LearningRate 0.0633   Epoch: 4   Global Step: 169800   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:47:11,653-Speed 2633.67 samples/sec   Loss 10.9434   LearningRate 0.0633   Epoch: 4   Global Step: 169810   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:47:15,548-Speed 2629.21 samples/sec   Loss 10.8935   LearningRate 0.0632   Epoch: 4   Global Step: 169820   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:47:19,451-Speed 2624.55 samples/sec   Loss 10.7454   LearningRate 0.0632   Epoch: 4   Global Step: 169830   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:23,334-Speed 2637.69 samples/sec   Loss 10.8903   LearningRate 0.0632   Epoch: 4   Global Step: 169840   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:27,230-Speed 2628.77 samples/sec   Loss 10.7526   LearningRate 0.0632   Epoch: 4   Global Step: 169850   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:31,134-Speed 2624.00 samples/sec   Loss 10.7558   LearningRate 0.0632   Epoch: 4   Global Step: 169860   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:35,043-Speed 2620.61 samples/sec   Loss 10.7063   LearningRate 0.0632   Epoch: 4   Global Step: 169870   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:38,951-Speed 2620.25 samples/sec   Loss 10.7951   LearningRate 0.0632   Epoch: 4   Global Step: 169880   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:42,854-Speed 2624.88 samples/sec   Loss 10.7814   LearningRate 0.0632   Epoch: 4   Global Step: 169890   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:46,750-Speed 2629.25 samples/sec   Loss 10.7414   LearningRate 0.0632   Epoch: 4   Global Step: 169900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:50,643-Speed 2630.73 samples/sec   Loss 10.9030   LearningRate 0.0632   Epoch: 4   Global Step: 169910   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:54,542-Speed 2626.46 samples/sec   Loss 10.6726   LearningRate 0.0632   Epoch: 4   Global Step: 169920   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:47:58,428-Speed 2635.89 samples/sec   Loss 10.7241   LearningRate 0.0632   Epoch: 4   Global Step: 169930   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:48:02,335-Speed 2621.78 samples/sec   Loss 10.7837   LearningRate 0.0632   Epoch: 4   Global Step: 169940   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:48:06,232-Speed 2628.79 samples/sec   Loss 10.8128   LearningRate 0.0632   Epoch: 4   Global Step: 169950   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:48:10,124-Speed 2631.59 samples/sec   Loss 10.8520   LearningRate 0.0632   Epoch: 4   Global Step: 169960   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:48:14,023-Speed 2626.95 samples/sec   Loss 10.8551   LearningRate 0.0632   Epoch: 4   Global Step: 169970   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:48:17,920-Speed 2628.09 samples/sec   Loss 10.8173   LearningRate 0.0632   Epoch: 4   Global Step: 169980   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:48:21,876-Speed 2588.85 samples/sec   Loss 10.7367   LearningRate 0.0632   Epoch: 4   Global Step: 169990   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:48:25,934-Speed 2524.08 samples/sec   Loss 10.7718   LearningRate 0.0632   Epoch: 4   Global Step: 170000   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:49:09,339-[lfw][170000]XNorm: 22.842724
Training: 2022-04-13 14:49:09,340-[lfw][170000]Accuracy-Flip: 0.99767+-0.00309
Training: 2022-04-13 14:49:09,341-[lfw][170000]Accuracy-Highest: 0.99783
Training: 2022-04-13 14:49:59,490-[cfp_fp][170000]XNorm: 21.100989
Training: 2022-04-13 14:49:59,491-[cfp_fp][170000]Accuracy-Flip: 0.98071+-0.00565
Training: 2022-04-13 14:49:59,492-[cfp_fp][170000]Accuracy-Highest: 0.98100
Training: 2022-04-13 14:50:42,685-[agedb_30][170000]XNorm: 22.645779
Training: 2022-04-13 14:50:42,686-[agedb_30][170000]Accuracy-Flip: 0.97133+-0.00741
Training: 2022-04-13 14:50:42,686-[agedb_30][170000]Accuracy-Highest: 0.97133
Training: 2022-04-13 14:50:46,560-Speed 72.82 samples/sec   Loss 10.9538   LearningRate 0.0632   Epoch: 4   Global Step: 170010   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:50:50,431-Speed 2645.36 samples/sec   Loss 10.8029   LearningRate 0.0632   Epoch: 4   Global Step: 170020   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:50:54,286-Speed 2657.06 samples/sec   Loss 10.7809   LearningRate 0.0632   Epoch: 4   Global Step: 170030   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:50:58,163-Speed 2642.16 samples/sec   Loss 10.6764   LearningRate 0.0632   Epoch: 4   Global Step: 170040   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:51:02,036-Speed 2644.21 samples/sec   Loss 10.7266   LearningRate 0.0632   Epoch: 4   Global Step: 170050   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:51:05,918-Speed 2639.17 samples/sec   Loss 10.7595   LearningRate 0.0632   Epoch: 4   Global Step: 170060   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:51:09,805-Speed 2635.38 samples/sec   Loss 10.8372   LearningRate 0.0632   Epoch: 4   Global Step: 170070   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:51:13,690-Speed 2636.34 samples/sec   Loss 10.7721   LearningRate 0.0632   Epoch: 4   Global Step: 170080   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:51:17,569-Speed 2640.68 samples/sec   Loss 10.8722   LearningRate 0.0632   Epoch: 4   Global Step: 170090   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:21,457-Speed 2634.10 samples/sec   Loss 10.7622   LearningRate 0.0632   Epoch: 4   Global Step: 170100   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:25,353-Speed 2628.85 samples/sec   Loss 10.7884   LearningRate 0.0632   Epoch: 4   Global Step: 170110   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:29,246-Speed 2631.38 samples/sec   Loss 10.9057   LearningRate 0.0632   Epoch: 4   Global Step: 170120   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:33,134-Speed 2634.14 samples/sec   Loss 10.7943   LearningRate 0.0632   Epoch: 4   Global Step: 170130   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:37,032-Speed 2628.19 samples/sec   Loss 10.7729   LearningRate 0.0632   Epoch: 4   Global Step: 170140   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:40,933-Speed 2625.41 samples/sec   Loss 10.8175   LearningRate 0.0632   Epoch: 4   Global Step: 170150   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:44,851-Speed 2614.23 samples/sec   Loss 10.7973   LearningRate 0.0632   Epoch: 4   Global Step: 170160   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:48,747-Speed 2628.92 samples/sec   Loss 10.9252   LearningRate 0.0632   Epoch: 4   Global Step: 170170   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:52,642-Speed 2629.80 samples/sec   Loss 10.7637   LearningRate 0.0632   Epoch: 4   Global Step: 170180   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:51:56,535-Speed 2631.09 samples/sec   Loss 10.9238   LearningRate 0.0632   Epoch: 4   Global Step: 170190   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:52:00,438-Speed 2624.22 samples/sec   Loss 10.7465   LearningRate 0.0632   Epoch: 4   Global Step: 170200   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:52:04,344-Speed 2625.15 samples/sec   Loss 10.7089   LearningRate 0.0632   Epoch: 4   Global Step: 170210   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:08,241-Speed 2628.80 samples/sec   Loss 10.8033   LearningRate 0.0632   Epoch: 4   Global Step: 170220   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:12,151-Speed 2619.33 samples/sec   Loss 10.9080   LearningRate 0.0632   Epoch: 4   Global Step: 170230   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:16,053-Speed 2624.79 samples/sec   Loss 10.7291   LearningRate 0.0632   Epoch: 4   Global Step: 170240   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:19,967-Speed 2616.97 samples/sec   Loss 10.8922   LearningRate 0.0632   Epoch: 4   Global Step: 170250   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:23,879-Speed 2618.20 samples/sec   Loss 10.9011   LearningRate 0.0632   Epoch: 4   Global Step: 170260   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:27,800-Speed 2612.70 samples/sec   Loss 10.9171   LearningRate 0.0632   Epoch: 4   Global Step: 170270   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:31,703-Speed 2624.71 samples/sec   Loss 10.7181   LearningRate 0.0632   Epoch: 4   Global Step: 170280   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:35,630-Speed 2607.98 samples/sec   Loss 10.8054   LearningRate 0.0632   Epoch: 4   Global Step: 170290   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:39,533-Speed 2624.45 samples/sec   Loss 10.8273   LearningRate 0.0632   Epoch: 4   Global Step: 170300   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:52:43,437-Speed 2623.59 samples/sec   Loss 10.7953   LearningRate 0.0632   Epoch: 4   Global Step: 170310   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:52:47,337-Speed 2625.79 samples/sec   Loss 10.7408   LearningRate 0.0632   Epoch: 4   Global Step: 170320   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:52:51,239-Speed 2625.45 samples/sec   Loss 10.8347   LearningRate 0.0632   Epoch: 4   Global Step: 170330   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:52:55,136-Speed 2628.02 samples/sec   Loss 10.7605   LearningRate 0.0631   Epoch: 4   Global Step: 170340   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:52:59,034-Speed 2627.51 samples/sec   Loss 10.6719   LearningRate 0.0631   Epoch: 4   Global Step: 170350   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:02,935-Speed 2625.69 samples/sec   Loss 10.7925   LearningRate 0.0631   Epoch: 4   Global Step: 170360   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:06,830-Speed 2629.76 samples/sec   Loss 10.8636   LearningRate 0.0631   Epoch: 4   Global Step: 170370   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:10,730-Speed 2626.01 samples/sec   Loss 10.9459   LearningRate 0.0631   Epoch: 4   Global Step: 170380   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:14,632-Speed 2625.26 samples/sec   Loss 10.8832   LearningRate 0.0631   Epoch: 4   Global Step: 170390   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:18,536-Speed 2623.25 samples/sec   Loss 10.8902   LearningRate 0.0631   Epoch: 4   Global Step: 170400   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:22,448-Speed 2618.16 samples/sec   Loss 10.9881   LearningRate 0.0631   Epoch: 4   Global Step: 170410   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:26,340-Speed 2632.08 samples/sec   Loss 10.7620   LearningRate 0.0631   Epoch: 4   Global Step: 170420   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:30,236-Speed 2629.06 samples/sec   Loss 10.8349   LearningRate 0.0631   Epoch: 4   Global Step: 170430   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:34,131-Speed 2629.62 samples/sec   Loss 10.7660   LearningRate 0.0631   Epoch: 4   Global Step: 170440   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:38,006-Speed 2642.96 samples/sec   Loss 10.7443   LearningRate 0.0631   Epoch: 4   Global Step: 170450   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:41,901-Speed 2629.94 samples/sec   Loss 10.8312   LearningRate 0.0631   Epoch: 4   Global Step: 170460   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:45,815-Speed 2616.36 samples/sec   Loss 10.7489   LearningRate 0.0631   Epoch: 4   Global Step: 170470   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:49,755-Speed 2600.43 samples/sec   Loss 10.8926   LearningRate 0.0631   Epoch: 4   Global Step: 170480   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:53,651-Speed 2628.92 samples/sec   Loss 10.8003   LearningRate 0.0631   Epoch: 4   Global Step: 170490   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:53:57,557-Speed 2622.61 samples/sec   Loss 10.8559   LearningRate 0.0631   Epoch: 4   Global Step: 170500   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:01,461-Speed 2623.32 samples/sec   Loss 10.9130   LearningRate 0.0631   Epoch: 4   Global Step: 170510   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:05,356-Speed 2629.55 samples/sec   Loss 10.8523   LearningRate 0.0631   Epoch: 4   Global Step: 170520   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:09,253-Speed 2628.06 samples/sec   Loss 10.7294   LearningRate 0.0631   Epoch: 4   Global Step: 170530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:13,157-Speed 2623.90 samples/sec   Loss 10.8096   LearningRate 0.0631   Epoch: 4   Global Step: 170540   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:17,064-Speed 2621.68 samples/sec   Loss 10.7651   LearningRate 0.0631   Epoch: 4   Global Step: 170550   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:54:20,948-Speed 2637.14 samples/sec   Loss 10.8537   LearningRate 0.0631   Epoch: 4   Global Step: 170560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:24,897-Speed 2593.58 samples/sec   Loss 10.8311   LearningRate 0.0631   Epoch: 4   Global Step: 170570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:28,798-Speed 2626.28 samples/sec   Loss 10.8120   LearningRate 0.0631   Epoch: 4   Global Step: 170580   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:32,695-Speed 2628.24 samples/sec   Loss 10.6872   LearningRate 0.0631   Epoch: 4   Global Step: 170590   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:36,594-Speed 2626.31 samples/sec   Loss 10.9455   LearningRate 0.0631   Epoch: 4   Global Step: 170600   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:40,490-Speed 2628.78 samples/sec   Loss 10.9038   LearningRate 0.0631   Epoch: 4   Global Step: 170610   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:44,388-Speed 2627.37 samples/sec   Loss 10.8175   LearningRate 0.0631   Epoch: 4   Global Step: 170620   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:48,284-Speed 2629.80 samples/sec   Loss 10.9410   LearningRate 0.0631   Epoch: 4   Global Step: 170630   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:52,184-Speed 2625.46 samples/sec   Loss 10.8301   LearningRate 0.0631   Epoch: 4   Global Step: 170640   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:56,083-Speed 2627.36 samples/sec   Loss 10.8605   LearningRate 0.0631   Epoch: 4   Global Step: 170650   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:54:59,983-Speed 2626.34 samples/sec   Loss 10.7350   LearningRate 0.0631   Epoch: 4   Global Step: 170660   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:55:04,643-Speed 2198.09 samples/sec   Loss 10.8400   LearningRate 0.0631   Epoch: 4   Global Step: 170670   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:55:08,526-Speed 2637.48 samples/sec   Loss 10.7616   LearningRate 0.0631   Epoch: 4   Global Step: 170680   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:12,479-Speed 2591.29 samples/sec   Loss 10.8837   LearningRate 0.0631   Epoch: 4   Global Step: 170690   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:16,426-Speed 2595.37 samples/sec   Loss 10.8788   LearningRate 0.0631   Epoch: 4   Global Step: 170700   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:20,339-Speed 2617.52 samples/sec   Loss 10.7320   LearningRate 0.0631   Epoch: 4   Global Step: 170710   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:24,248-Speed 2620.13 samples/sec   Loss 10.8584   LearningRate 0.0631   Epoch: 4   Global Step: 170720   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:28,153-Speed 2624.63 samples/sec   Loss 10.7120   LearningRate 0.0631   Epoch: 4   Global Step: 170730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:32,061-Speed 2620.92 samples/sec   Loss 10.8788   LearningRate 0.0631   Epoch: 4   Global Step: 170740   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:35,963-Speed 2624.48 samples/sec   Loss 10.8363   LearningRate 0.0631   Epoch: 4   Global Step: 170750   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:39,874-Speed 2618.66 samples/sec   Loss 10.8878   LearningRate 0.0631   Epoch: 4   Global Step: 170760   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:43,802-Speed 2608.36 samples/sec   Loss 10.7838   LearningRate 0.0631   Epoch: 4   Global Step: 170770   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:55:47,719-Speed 2615.16 samples/sec   Loss 10.9075   LearningRate 0.0631   Epoch: 4   Global Step: 170780   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:55:51,618-Speed 2626.56 samples/sec   Loss 10.7762   LearningRate 0.0631   Epoch: 4   Global Step: 170790   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:55:55,522-Speed 2624.31 samples/sec   Loss 10.7452   LearningRate 0.0631   Epoch: 4   Global Step: 170800   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:55:59,415-Speed 2630.84 samples/sec   Loss 10.8465   LearningRate 0.0631   Epoch: 4   Global Step: 170810   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:56:03,311-Speed 2628.64 samples/sec   Loss 10.9224   LearningRate 0.0631   Epoch: 4   Global Step: 170820   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:56:07,197-Speed 2635.39 samples/sec   Loss 10.5966   LearningRate 0.0631   Epoch: 4   Global Step: 170830   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:11,095-Speed 2628.18 samples/sec   Loss 10.8583   LearningRate 0.0631   Epoch: 4   Global Step: 170840   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:14,987-Speed 2631.48 samples/sec   Loss 10.7017   LearningRate 0.0631   Epoch: 4   Global Step: 170850   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:18,881-Speed 2630.74 samples/sec   Loss 10.8001   LearningRate 0.0631   Epoch: 4   Global Step: 170860   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:22,779-Speed 2627.57 samples/sec   Loss 10.6964   LearningRate 0.0630   Epoch: 4   Global Step: 170870   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:26,756-Speed 2575.40 samples/sec   Loss 10.8482   LearningRate 0.0630   Epoch: 4   Global Step: 170880   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:30,687-Speed 2605.90 samples/sec   Loss 10.7218   LearningRate 0.0630   Epoch: 4   Global Step: 170890   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:34,594-Speed 2621.23 samples/sec   Loss 10.7677   LearningRate 0.0630   Epoch: 4   Global Step: 170900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:38,497-Speed 2624.59 samples/sec   Loss 10.7405   LearningRate 0.0630   Epoch: 4   Global Step: 170910   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:42,408-Speed 2618.39 samples/sec   Loss 10.7030   LearningRate 0.0630   Epoch: 4   Global Step: 170920   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:56:46,318-Speed 2620.39 samples/sec   Loss 10.6657   LearningRate 0.0630   Epoch: 4   Global Step: 170930   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:56:50,299-Speed 2572.69 samples/sec   Loss 10.8200   LearningRate 0.0630   Epoch: 4   Global Step: 170940   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:56:54,307-Speed 2556.32 samples/sec   Loss 10.7527   LearningRate 0.0630   Epoch: 4   Global Step: 170950   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:56:58,222-Speed 2616.02 samples/sec   Loss 10.7428   LearningRate 0.0630   Epoch: 4   Global Step: 170960   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:57:02,072-Speed 2660.53 samples/sec   Loss 11.2906   LearningRate 0.0630   Epoch: 4   Global Step: 170970   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:05,981-Speed 2620.08 samples/sec   Loss 11.2391   LearningRate 0.0630   Epoch: 4   Global Step: 170980   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:09,883-Speed 2625.08 samples/sec   Loss 11.0576   LearningRate 0.0630   Epoch: 4   Global Step: 170990   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:13,824-Speed 2598.69 samples/sec   Loss 10.8896   LearningRate 0.0630   Epoch: 4   Global Step: 171000   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:17,724-Speed 2626.85 samples/sec   Loss 10.8454   LearningRate 0.0630   Epoch: 4   Global Step: 171010   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:21,689-Speed 2582.90 samples/sec   Loss 10.9602   LearningRate 0.0630   Epoch: 4   Global Step: 171020   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:25,637-Speed 2594.51 samples/sec   Loss 10.9397   LearningRate 0.0630   Epoch: 4   Global Step: 171030   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:29,532-Speed 2629.37 samples/sec   Loss 10.7468   LearningRate 0.0630   Epoch: 4   Global Step: 171040   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:33,425-Speed 2631.35 samples/sec   Loss 10.9140   LearningRate 0.0630   Epoch: 4   Global Step: 171050   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:37,316-Speed 2632.52 samples/sec   Loss 10.8477   LearningRate 0.0630   Epoch: 4   Global Step: 171060   Fp16 Grad Scale: 32768   Required: 74 hours
Training: 2022-04-13 14:57:41,235-Speed 2613.39 samples/sec   Loss 10.8083   LearningRate 0.0630   Epoch: 4   Global Step: 171070   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:57:45,139-Speed 2623.98 samples/sec   Loss 10.8775   LearningRate 0.0630   Epoch: 4   Global Step: 171080   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:57:49,038-Speed 2627.26 samples/sec   Loss 10.7247   LearningRate 0.0630   Epoch: 4   Global Step: 171090   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:57:52,930-Speed 2631.80 samples/sec   Loss 10.8801   LearningRate 0.0630   Epoch: 4   Global Step: 171100   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:57:57,942-Speed 2043.11 samples/sec   Loss 10.9099   LearningRate 0.0630   Epoch: 4   Global Step: 171110   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:58:01,956-Speed 2552.76 samples/sec   Loss 10.8206   LearningRate 0.0630   Epoch: 4   Global Step: 171120   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:58:05,845-Speed 2633.83 samples/sec   Loss 10.7613   LearningRate 0.0630   Epoch: 4   Global Step: 171130   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:58:09,757-Speed 2617.55 samples/sec   Loss 10.7903   LearningRate 0.0630   Epoch: 4   Global Step: 171140   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:58:13,661-Speed 2623.81 samples/sec   Loss 10.8582   LearningRate 0.0630   Epoch: 4   Global Step: 171150   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:58:17,558-Speed 2629.02 samples/sec   Loss 10.7767   LearningRate 0.0630   Epoch: 4   Global Step: 171160   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 14:58:21,460-Speed 2624.14 samples/sec   Loss 10.7972   LearningRate 0.0630   Epoch: 4   Global Step: 171170   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:25,366-Speed 2622.44 samples/sec   Loss 10.6315   LearningRate 0.0630   Epoch: 4   Global Step: 171180   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:29,296-Speed 2606.56 samples/sec   Loss 10.6412   LearningRate 0.0630   Epoch: 4   Global Step: 171190   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:33,211-Speed 2616.50 samples/sec   Loss 10.7645   LearningRate 0.0630   Epoch: 4   Global Step: 171200   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:37,129-Speed 2614.08 samples/sec   Loss 10.7257   LearningRate 0.0630   Epoch: 4   Global Step: 171210   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:41,024-Speed 2629.83 samples/sec   Loss 10.8126   LearningRate 0.0630   Epoch: 4   Global Step: 171220   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:44,917-Speed 2631.60 samples/sec   Loss 10.7253   LearningRate 0.0630   Epoch: 4   Global Step: 171230   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:48,823-Speed 2621.87 samples/sec   Loss 10.8373   LearningRate 0.0630   Epoch: 4   Global Step: 171240   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:52,718-Speed 2630.30 samples/sec   Loss 10.8687   LearningRate 0.0630   Epoch: 4   Global Step: 171250   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:58:56,612-Speed 2630.05 samples/sec   Loss 10.9822   LearningRate 0.0630   Epoch: 4   Global Step: 171260   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:00,509-Speed 2627.81 samples/sec   Loss 10.7715   LearningRate 0.0630   Epoch: 4   Global Step: 171270   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:59:04,432-Speed 2611.21 samples/sec   Loss 10.6589   LearningRate 0.0630   Epoch: 4   Global Step: 171280   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:59:08,344-Speed 2618.27 samples/sec   Loss 10.9594   LearningRate 0.0630   Epoch: 4   Global Step: 171290   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:59:12,241-Speed 2628.46 samples/sec   Loss 10.7787   LearningRate 0.0630   Epoch: 4   Global Step: 171300   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:59:16,136-Speed 2629.90 samples/sec   Loss 10.8857   LearningRate 0.0630   Epoch: 4   Global Step: 171310   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 14:59:20,048-Speed 2618.57 samples/sec   Loss 10.9644   LearningRate 0.0630   Epoch: 4   Global Step: 171320   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:23,945-Speed 2627.84 samples/sec   Loss 10.8266   LearningRate 0.0630   Epoch: 4   Global Step: 171330   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:27,837-Speed 2632.12 samples/sec   Loss 10.9453   LearningRate 0.0630   Epoch: 4   Global Step: 171340   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:31,738-Speed 2625.76 samples/sec   Loss 10.8112   LearningRate 0.0630   Epoch: 4   Global Step: 171350   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:35,636-Speed 2627.58 samples/sec   Loss 10.9075   LearningRate 0.0630   Epoch: 4   Global Step: 171360   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:39,529-Speed 2630.66 samples/sec   Loss 10.8749   LearningRate 0.0630   Epoch: 4   Global Step: 171370   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:43,452-Speed 2611.80 samples/sec   Loss 10.9388   LearningRate 0.0630   Epoch: 4   Global Step: 171380   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:47,364-Speed 2617.54 samples/sec   Loss 10.8823   LearningRate 0.0629   Epoch: 4   Global Step: 171390   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:51,273-Speed 2620.71 samples/sec   Loss 10.8174   LearningRate 0.0629   Epoch: 4   Global Step: 171400   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:55,176-Speed 2624.17 samples/sec   Loss 10.9107   LearningRate 0.0629   Epoch: 4   Global Step: 171410   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 14:59:59,072-Speed 2628.82 samples/sec   Loss 10.8537   LearningRate 0.0629   Epoch: 4   Global Step: 171420   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:00:02,968-Speed 2629.54 samples/sec   Loss 10.7312   LearningRate 0.0629   Epoch: 4   Global Step: 171430   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:00:06,844-Speed 2642.04 samples/sec   Loss 10.8411   LearningRate 0.0629   Epoch: 4   Global Step: 171440   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:10,745-Speed 2625.77 samples/sec   Loss 10.6726   LearningRate 0.0629   Epoch: 4   Global Step: 171450   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:14,640-Speed 2629.46 samples/sec   Loss 10.6338   LearningRate 0.0629   Epoch: 4   Global Step: 171460   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:18,575-Speed 2603.39 samples/sec   Loss 10.7419   LearningRate 0.0629   Epoch: 4   Global Step: 171470   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:22,473-Speed 2627.91 samples/sec   Loss 10.8858   LearningRate 0.0629   Epoch: 4   Global Step: 171480   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:26,366-Speed 2630.73 samples/sec   Loss 10.8297   LearningRate 0.0629   Epoch: 4   Global Step: 171490   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:30,262-Speed 2629.17 samples/sec   Loss 10.8536   LearningRate 0.0629   Epoch: 4   Global Step: 171500   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:34,161-Speed 2627.18 samples/sec   Loss 10.7999   LearningRate 0.0629   Epoch: 4   Global Step: 171510   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:38,056-Speed 2629.23 samples/sec   Loss 10.9471   LearningRate 0.0629   Epoch: 4   Global Step: 171520   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:41,955-Speed 2627.60 samples/sec   Loss 10.8861   LearningRate 0.0629   Epoch: 4   Global Step: 171530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:45,863-Speed 2620.19 samples/sec   Loss 10.7219   LearningRate 0.0629   Epoch: 4   Global Step: 171540   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:00:49,751-Speed 2634.79 samples/sec   Loss 10.9466   LearningRate 0.0629   Epoch: 4   Global Step: 171550   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:53,647-Speed 2629.20 samples/sec   Loss 10.7245   LearningRate 0.0629   Epoch: 4   Global Step: 171560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:00:57,545-Speed 2628.02 samples/sec   Loss 10.7904   LearningRate 0.0629   Epoch: 4   Global Step: 171570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:01:01,439-Speed 2630.03 samples/sec   Loss 10.7776   LearningRate 0.0629   Epoch: 4   Global Step: 171580   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:01:05,337-Speed 2627.36 samples/sec   Loss 10.8350   LearningRate 0.0629   Epoch: 4   Global Step: 171590   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:01:09,252-Speed 2616.02 samples/sec   Loss 10.8726   LearningRate 0.0629   Epoch: 4   Global Step: 171600   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:01:13,177-Speed 2609.55 samples/sec   Loss 10.6939   LearningRate 0.0629   Epoch: 4   Global Step: 171610   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:01:17,716-Speed 2256.39 samples/sec   Loss 10.7711   LearningRate 0.0629   Epoch: 4   Global Step: 171620   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:21,610-Speed 2630.88 samples/sec   Loss 10.7439   LearningRate 0.0629   Epoch: 4   Global Step: 171630   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:25,510-Speed 2626.64 samples/sec   Loss 10.7036   LearningRate 0.0629   Epoch: 4   Global Step: 171640   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:29,406-Speed 2628.64 samples/sec   Loss 10.8593   LearningRate 0.0629   Epoch: 4   Global Step: 171650   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:33,310-Speed 2623.64 samples/sec   Loss 10.8831   LearningRate 0.0629   Epoch: 4   Global Step: 171660   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:37,211-Speed 2625.29 samples/sec   Loss 10.7542   LearningRate 0.0629   Epoch: 4   Global Step: 171670   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:41,107-Speed 2628.85 samples/sec   Loss 10.8625   LearningRate 0.0629   Epoch: 4   Global Step: 171680   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:45,021-Speed 2616.82 samples/sec   Loss 10.9070   LearningRate 0.0629   Epoch: 4   Global Step: 171690   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:48,967-Speed 2595.57 samples/sec   Loss 10.8027   LearningRate 0.0629   Epoch: 4   Global Step: 171700   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:52,867-Speed 2626.51 samples/sec   Loss 10.6318   LearningRate 0.0629   Epoch: 4   Global Step: 171710   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:01:56,769-Speed 2625.21 samples/sec   Loss 10.8198   LearningRate 0.0629   Epoch: 4   Global Step: 171720   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:00,710-Speed 2598.69 samples/sec   Loss 10.8320   LearningRate 0.0629   Epoch: 4   Global Step: 171730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:04,606-Speed 2629.03 samples/sec   Loss 10.8543   LearningRate 0.0629   Epoch: 4   Global Step: 171740   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:08,511-Speed 2623.05 samples/sec   Loss 10.7318   LearningRate 0.0629   Epoch: 4   Global Step: 171750   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:12,516-Speed 2557.61 samples/sec   Loss 10.8245   LearningRate 0.0629   Epoch: 4   Global Step: 171760   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:16,433-Speed 2614.68 samples/sec   Loss 10.7077   LearningRate 0.0629   Epoch: 4   Global Step: 171770   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:20,331-Speed 2627.16 samples/sec   Loss 10.6873   LearningRate 0.0629   Epoch: 4   Global Step: 171780   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:24,230-Speed 2626.98 samples/sec   Loss 10.8784   LearningRate 0.0629   Epoch: 4   Global Step: 171790   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:28,133-Speed 2624.09 samples/sec   Loss 10.7770   LearningRate 0.0629   Epoch: 4   Global Step: 171800   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:32,027-Speed 2630.78 samples/sec   Loss 10.6041   LearningRate 0.0629   Epoch: 4   Global Step: 171810   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:35,927-Speed 2626.44 samples/sec   Loss 10.8865   LearningRate 0.0629   Epoch: 4   Global Step: 171820   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:02:39,828-Speed 2625.57 samples/sec   Loss 10.6881   LearningRate 0.0629   Epoch: 4   Global Step: 171830   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:02:43,723-Speed 2629.37 samples/sec   Loss 10.8657   LearningRate 0.0629   Epoch: 4   Global Step: 171840   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:02:47,613-Speed 2632.91 samples/sec   Loss 10.9765   LearningRate 0.0629   Epoch: 4   Global Step: 171850   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:51,512-Speed 2627.23 samples/sec   Loss 10.7244   LearningRate 0.0629   Epoch: 4   Global Step: 171860   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:55,414-Speed 2625.14 samples/sec   Loss 10.8713   LearningRate 0.0629   Epoch: 4   Global Step: 171870   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:02:59,313-Speed 2626.68 samples/sec   Loss 10.8089   LearningRate 0.0629   Epoch: 4   Global Step: 171880   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:03,208-Speed 2630.35 samples/sec   Loss 10.8048   LearningRate 0.0629   Epoch: 4   Global Step: 171890   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:07,102-Speed 2630.29 samples/sec   Loss 10.8504   LearningRate 0.0629   Epoch: 4   Global Step: 171900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:11,007-Speed 2622.73 samples/sec   Loss 10.7104   LearningRate 0.0628   Epoch: 4   Global Step: 171910   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:14,901-Speed 2630.58 samples/sec   Loss 10.8256   LearningRate 0.0628   Epoch: 4   Global Step: 171920   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:18,918-Speed 2549.89 samples/sec   Loss 10.7599   LearningRate 0.0628   Epoch: 4   Global Step: 171930   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:22,822-Speed 2623.31 samples/sec   Loss 10.7853   LearningRate 0.0628   Epoch: 4   Global Step: 171940   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:26,715-Speed 2631.22 samples/sec   Loss 10.6901   LearningRate 0.0628   Epoch: 4   Global Step: 171950   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:03:30,611-Speed 2628.59 samples/sec   Loss 10.7629   LearningRate 0.0628   Epoch: 4   Global Step: 171960   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:03:34,509-Speed 2628.14 samples/sec   Loss 10.8013   LearningRate 0.0628   Epoch: 4   Global Step: 171970   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:03:38,403-Speed 2630.68 samples/sec   Loss 10.6717   LearningRate 0.0628   Epoch: 4   Global Step: 171980   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:03:42,297-Speed 2630.15 samples/sec   Loss 10.7060   LearningRate 0.0628   Epoch: 4   Global Step: 171990   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:03:46,195-Speed 2627.47 samples/sec   Loss 10.8420   LearningRate 0.0628   Epoch: 4   Global Step: 172000   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:03:50,075-Speed 2639.55 samples/sec   Loss 10.7104   LearningRate 0.0628   Epoch: 4   Global Step: 172010   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:53,964-Speed 2633.93 samples/sec   Loss 10.8185   LearningRate 0.0628   Epoch: 4   Global Step: 172020   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:03:57,860-Speed 2629.27 samples/sec   Loss 11.0383   LearningRate 0.0628   Epoch: 4   Global Step: 172030   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:01,777-Speed 2614.80 samples/sec   Loss 10.8193   LearningRate 0.0628   Epoch: 4   Global Step: 172040   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:05,673-Speed 2628.92 samples/sec   Loss 10.8381   LearningRate 0.0628   Epoch: 4   Global Step: 172050   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:09,569-Speed 2629.38 samples/sec   Loss 10.7874   LearningRate 0.0628   Epoch: 4   Global Step: 172060   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:13,467-Speed 2627.07 samples/sec   Loss 10.7885   LearningRate 0.0628   Epoch: 4   Global Step: 172070   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:17,391-Speed 2610.66 samples/sec   Loss 10.7678   LearningRate 0.0628   Epoch: 4   Global Step: 172080   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:21,284-Speed 2631.34 samples/sec   Loss 10.6034   LearningRate 0.0628   Epoch: 4   Global Step: 172090   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:25,177-Speed 2630.76 samples/sec   Loss 10.7601   LearningRate 0.0628   Epoch: 4   Global Step: 172100   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:29,092-Speed 2616.62 samples/sec   Loss 10.9031   LearningRate 0.0628   Epoch: 4   Global Step: 172110   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:04:32,987-Speed 2629.39 samples/sec   Loss 10.8284   LearningRate 0.0628   Epoch: 4   Global Step: 172120   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:04:36,892-Speed 2622.76 samples/sec   Loss 10.6740   LearningRate 0.0628   Epoch: 4   Global Step: 172130   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:04:40,783-Speed 2632.48 samples/sec   Loss 10.6820   LearningRate 0.0628   Epoch: 4   Global Step: 172140   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:44,677-Speed 2630.46 samples/sec   Loss 10.6698   LearningRate 0.0628   Epoch: 4   Global Step: 172150   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:48,570-Speed 2630.89 samples/sec   Loss 10.7947   LearningRate 0.0628   Epoch: 4   Global Step: 172160   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:52,475-Speed 2622.85 samples/sec   Loss 10.7849   LearningRate 0.0628   Epoch: 4   Global Step: 172170   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:04:56,389-Speed 2617.16 samples/sec   Loss 10.7802   LearningRate 0.0628   Epoch: 4   Global Step: 172180   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:05:00,284-Speed 2629.73 samples/sec   Loss 10.6254   LearningRate 0.0628   Epoch: 4   Global Step: 172190   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:04,199-Speed 2615.73 samples/sec   Loss 10.8117   LearningRate 0.0628   Epoch: 4   Global Step: 172200   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:08,122-Speed 2611.16 samples/sec   Loss 10.7770   LearningRate 0.0628   Epoch: 4   Global Step: 172210   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:12,026-Speed 2623.54 samples/sec   Loss 10.6998   LearningRate 0.0628   Epoch: 4   Global Step: 172220   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:16,158-Speed 2478.92 samples/sec   Loss 10.7446   LearningRate 0.0628   Epoch: 4   Global Step: 172230   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:20,063-Speed 2622.45 samples/sec   Loss 10.8250   LearningRate 0.0628   Epoch: 4   Global Step: 172240   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:23,954-Speed 2632.19 samples/sec   Loss 10.8549   LearningRate 0.0628   Epoch: 4   Global Step: 172250   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:27,846-Speed 2632.19 samples/sec   Loss 10.7645   LearningRate 0.0628   Epoch: 4   Global Step: 172260   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:31,739-Speed 2631.55 samples/sec   Loss 10.6717   LearningRate 0.0628   Epoch: 4   Global Step: 172270   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:35,629-Speed 2632.82 samples/sec   Loss 10.7440   LearningRate 0.0628   Epoch: 4   Global Step: 172280   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:05:39,526-Speed 2628.16 samples/sec   Loss 10.7260   LearningRate 0.0628   Epoch: 4   Global Step: 172290   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:05:43,419-Speed 2631.47 samples/sec   Loss 10.7482   LearningRate 0.0628   Epoch: 4   Global Step: 172300   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:05:47,315-Speed 2628.55 samples/sec   Loss 10.9120   LearningRate 0.0628   Epoch: 4   Global Step: 172310   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:05:51,208-Speed 2631.63 samples/sec   Loss 10.7968   LearningRate 0.0628   Epoch: 4   Global Step: 172320   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:05:55,110-Speed 2624.83 samples/sec   Loss 10.7646   LearningRate 0.0628   Epoch: 4   Global Step: 172330   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:05:58,993-Speed 2638.77 samples/sec   Loss 10.7389   LearningRate 0.0628   Epoch: 4   Global Step: 172340   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:02,888-Speed 2629.51 samples/sec   Loss 10.7611   LearningRate 0.0628   Epoch: 4   Global Step: 172350   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:06,790-Speed 2624.84 samples/sec   Loss 10.6705   LearningRate 0.0628   Epoch: 4   Global Step: 172360   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:10,730-Speed 2599.70 samples/sec   Loss 10.7131   LearningRate 0.0628   Epoch: 4   Global Step: 172370   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:14,625-Speed 2630.28 samples/sec   Loss 11.0811   LearningRate 0.0628   Epoch: 4   Global Step: 172380   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:18,534-Speed 2620.36 samples/sec   Loss 10.7106   LearningRate 0.0628   Epoch: 4   Global Step: 172390   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:22,433-Speed 2627.15 samples/sec   Loss 10.8176   LearningRate 0.0628   Epoch: 4   Global Step: 172400   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:26,331-Speed 2627.52 samples/sec   Loss 10.7779   LearningRate 0.0628   Epoch: 4   Global Step: 172410   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:30,223-Speed 2631.69 samples/sec   Loss 10.7210   LearningRate 0.0628   Epoch: 4   Global Step: 172420   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:34,121-Speed 2627.97 samples/sec   Loss 10.8464   LearningRate 0.0627   Epoch: 4   Global Step: 172430   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:06:38,012-Speed 2632.17 samples/sec   Loss 10.7880   LearningRate 0.0627   Epoch: 4   Global Step: 172440   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:06:41,910-Speed 2627.59 samples/sec   Loss 10.8712   LearningRate 0.0627   Epoch: 4   Global Step: 172450   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:06:45,815-Speed 2623.10 samples/sec   Loss 10.7549   LearningRate 0.0627   Epoch: 4   Global Step: 172460   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:06:49,727-Speed 2617.92 samples/sec   Loss 10.6915   LearningRate 0.0627   Epoch: 4   Global Step: 172470   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:06:53,619-Speed 2631.65 samples/sec   Loss 10.6299   LearningRate 0.0627   Epoch: 4   Global Step: 172480   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:06:57,516-Speed 2628.52 samples/sec   Loss 10.6828   LearningRate 0.0627   Epoch: 4   Global Step: 172490   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:07:01,412-Speed 2629.44 samples/sec   Loss 10.7927   LearningRate 0.0627   Epoch: 4   Global Step: 172500   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:07:05,320-Speed 2620.46 samples/sec   Loss 10.7660   LearningRate 0.0627   Epoch: 4   Global Step: 172510   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:07:09,214-Speed 2629.91 samples/sec   Loss 10.8600   LearningRate 0.0627   Epoch: 4   Global Step: 172520   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:07:13,110-Speed 2629.60 samples/sec   Loss 10.9167   LearningRate 0.0627   Epoch: 4   Global Step: 172530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:07:16,991-Speed 2638.94 samples/sec   Loss 10.8000   LearningRate 0.0627   Epoch: 4   Global Step: 172540   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:07:20,882-Speed 2632.66 samples/sec   Loss 10.6127   LearningRate 0.0627   Epoch: 4   Global Step: 172550   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:24,812-Speed 2606.44 samples/sec   Loss 10.8799   LearningRate 0.0627   Epoch: 4   Global Step: 172560   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:28,705-Speed 2631.51 samples/sec   Loss 10.6614   LearningRate 0.0627   Epoch: 4   Global Step: 172570   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:32,609-Speed 2623.21 samples/sec   Loss 10.7932   LearningRate 0.0627   Epoch: 4   Global Step: 172580   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:36,500-Speed 2632.25 samples/sec   Loss 10.8804   LearningRate 0.0627   Epoch: 4   Global Step: 172590   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:40,397-Speed 2628.56 samples/sec   Loss 10.7321   LearningRate 0.0627   Epoch: 4   Global Step: 172600   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:44,323-Speed 2609.10 samples/sec   Loss 10.9244   LearningRate 0.0627   Epoch: 4   Global Step: 172610   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:48,237-Speed 2616.25 samples/sec   Loss 10.8323   LearningRate 0.0627   Epoch: 4   Global Step: 172620   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:52,146-Speed 2620.28 samples/sec   Loss 10.7039   LearningRate 0.0627   Epoch: 4   Global Step: 172630   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:56,041-Speed 2630.04 samples/sec   Loss 10.6951   LearningRate 0.0627   Epoch: 4   Global Step: 172640   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:07:59,936-Speed 2629.90 samples/sec   Loss 10.6590   LearningRate 0.0627   Epoch: 4   Global Step: 172650   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:03,833-Speed 2628.28 samples/sec   Loss 10.8381   LearningRate 0.0627   Epoch: 4   Global Step: 172660   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:07,748-Speed 2616.10 samples/sec   Loss 10.8175   LearningRate 0.0627   Epoch: 4   Global Step: 172670   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:11,644-Speed 2629.07 samples/sec   Loss 10.7102   LearningRate 0.0627   Epoch: 4   Global Step: 172680   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:15,536-Speed 2631.63 samples/sec   Loss 10.7960   LearningRate 0.0627   Epoch: 4   Global Step: 172690   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:19,429-Speed 2631.59 samples/sec   Loss 10.7356   LearningRate 0.0627   Epoch: 4   Global Step: 172700   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:23,323-Speed 2630.02 samples/sec   Loss 10.7800   LearningRate 0.0627   Epoch: 4   Global Step: 172710   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:27,231-Speed 2620.78 samples/sec   Loss 10.8633   LearningRate 0.0627   Epoch: 4   Global Step: 172720   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:31,131-Speed 2626.27 samples/sec   Loss 10.6385   LearningRate 0.0627   Epoch: 4   Global Step: 172730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:35,027-Speed 2629.72 samples/sec   Loss 10.8101   LearningRate 0.0627   Epoch: 4   Global Step: 172740   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:38,902-Speed 2642.71 samples/sec   Loss 10.9086   LearningRate 0.0627   Epoch: 4   Global Step: 172750   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:42,796-Speed 2630.16 samples/sec   Loss 10.8875   LearningRate 0.0627   Epoch: 4   Global Step: 172760   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:46,692-Speed 2629.07 samples/sec   Loss 10.6318   LearningRate 0.0627   Epoch: 4   Global Step: 172770   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:50,594-Speed 2624.98 samples/sec   Loss 10.6061   LearningRate 0.0627   Epoch: 4   Global Step: 172780   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:54,493-Speed 2627.14 samples/sec   Loss 10.7264   LearningRate 0.0627   Epoch: 4   Global Step: 172790   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:08:58,390-Speed 2628.36 samples/sec   Loss 10.6995   LearningRate 0.0627   Epoch: 4   Global Step: 172800   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:09:02,282-Speed 2631.89 samples/sec   Loss 10.8567   LearningRate 0.0627   Epoch: 4   Global Step: 172810   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:06,190-Speed 2620.73 samples/sec   Loss 10.8632   LearningRate 0.0627   Epoch: 4   Global Step: 172820   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:10,087-Speed 2628.61 samples/sec   Loss 10.6662   LearningRate 0.0627   Epoch: 4   Global Step: 172830   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:13,988-Speed 2625.39 samples/sec   Loss 10.7887   LearningRate 0.0627   Epoch: 4   Global Step: 172840   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:17,886-Speed 2627.68 samples/sec   Loss 10.7631   LearningRate 0.0627   Epoch: 4   Global Step: 172850   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:21,780-Speed 2630.25 samples/sec   Loss 10.7577   LearningRate 0.0627   Epoch: 4   Global Step: 172860   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:25,672-Speed 2632.06 samples/sec   Loss 10.6995   LearningRate 0.0627   Epoch: 4   Global Step: 172870   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:29,568-Speed 2629.17 samples/sec   Loss 10.8272   LearningRate 0.0627   Epoch: 4   Global Step: 172880   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:33,463-Speed 2629.77 samples/sec   Loss 10.7659   LearningRate 0.0627   Epoch: 4   Global Step: 172890   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:37,381-Speed 2613.80 samples/sec   Loss 10.8155   LearningRate 0.0627   Epoch: 4   Global Step: 172900   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:09:41,274-Speed 2630.74 samples/sec   Loss 10.8455   LearningRate 0.0627   Epoch: 4   Global Step: 172910   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:09:45,170-Speed 2629.07 samples/sec   Loss 10.8940   LearningRate 0.0627   Epoch: 4   Global Step: 172920   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:09:49,077-Speed 2621.74 samples/sec   Loss 10.8651   LearningRate 0.0627   Epoch: 4   Global Step: 172930   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:09:52,968-Speed 2632.37 samples/sec   Loss 10.9635   LearningRate 0.0627   Epoch: 4   Global Step: 172940   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:09:56,843-Speed 2643.31 samples/sec   Loss 11.3575   LearningRate 0.0627   Epoch: 4   Global Step: 172950   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:00,738-Speed 2629.54 samples/sec   Loss 10.9786   LearningRate 0.0626   Epoch: 4   Global Step: 172960   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:04,674-Speed 2602.58 samples/sec   Loss 10.8423   LearningRate 0.0626   Epoch: 4   Global Step: 172970   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:08,563-Speed 2633.55 samples/sec   Loss 11.1827   LearningRate 0.0626   Epoch: 4   Global Step: 172980   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:12,454-Speed 2632.62 samples/sec   Loss 10.8828   LearningRate 0.0626   Epoch: 4   Global Step: 172990   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:16,346-Speed 2631.49 samples/sec   Loss 10.7993   LearningRate 0.0626   Epoch: 4   Global Step: 173000   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:20,260-Speed 2616.87 samples/sec   Loss 11.0231   LearningRate 0.0626   Epoch: 4   Global Step: 173010   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:24,165-Speed 2623.50 samples/sec   Loss 10.8502   LearningRate 0.0626   Epoch: 4   Global Step: 173020   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:28,056-Speed 2632.33 samples/sec   Loss 10.8482   LearningRate 0.0626   Epoch: 4   Global Step: 173030   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:31,988-Speed 2604.60 samples/sec   Loss 10.7245   LearningRate 0.0626   Epoch: 4   Global Step: 173040   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:35,891-Speed 2624.32 samples/sec   Loss 10.8305   LearningRate 0.0626   Epoch: 4   Global Step: 173050   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:10:39,788-Speed 2628.40 samples/sec   Loss 10.8836   LearningRate 0.0626   Epoch: 4   Global Step: 173060   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:10:43,701-Speed 2617.51 samples/sec   Loss 10.8909   LearningRate 0.0626   Epoch: 4   Global Step: 173070   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:10:47,593-Speed 2631.70 samples/sec   Loss 10.6543   LearningRate 0.0626   Epoch: 4   Global Step: 173080   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:10:51,472-Speed 2640.89 samples/sec   Loss 10.8102   LearningRate 0.0626   Epoch: 4   Global Step: 173090   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:55,368-Speed 2629.01 samples/sec   Loss 10.6681   LearningRate 0.0626   Epoch: 4   Global Step: 173100   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:10:59,275-Speed 2622.05 samples/sec   Loss 10.9143   LearningRate 0.0626   Epoch: 4   Global Step: 173110   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:03,175-Speed 2625.84 samples/sec   Loss 10.7019   LearningRate 0.0626   Epoch: 4   Global Step: 173120   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:07,065-Speed 2632.70 samples/sec   Loss 10.8697   LearningRate 0.0626   Epoch: 4   Global Step: 173130   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:10,964-Speed 2626.90 samples/sec   Loss 10.8202   LearningRate 0.0626   Epoch: 4   Global Step: 173140   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:14,866-Speed 2625.27 samples/sec   Loss 10.8934   LearningRate 0.0626   Epoch: 4   Global Step: 173150   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:18,759-Speed 2630.28 samples/sec   Loss 10.7429   LearningRate 0.0626   Epoch: 4   Global Step: 173160   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:22,651-Speed 2632.36 samples/sec   Loss 10.8234   LearningRate 0.0626   Epoch: 4   Global Step: 173170   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:26,550-Speed 2627.20 samples/sec   Loss 10.7531   LearningRate 0.0626   Epoch: 4   Global Step: 173180   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:11:30,465-Speed 2615.99 samples/sec   Loss 10.7279   LearningRate 0.0626   Epoch: 4   Global Step: 173190   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:11:34,382-Speed 2614.74 samples/sec   Loss 10.7632   LearningRate 0.0626   Epoch: 4   Global Step: 173200   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:11:38,292-Speed 2619.40 samples/sec   Loss 10.8112   LearningRate 0.0626   Epoch: 4   Global Step: 173210   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:11:42,187-Speed 2629.75 samples/sec   Loss 10.9360   LearningRate 0.0626   Epoch: 4   Global Step: 173220   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:11:46,091-Speed 2623.25 samples/sec   Loss 10.8704   LearningRate 0.0626   Epoch: 4   Global Step: 173230   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:11:49,987-Speed 2629.13 samples/sec   Loss 10.7579   LearningRate 0.0626   Epoch: 4   Global Step: 173240   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:11:53,880-Speed 2631.02 samples/sec   Loss 10.8565   LearningRate 0.0626   Epoch: 4   Global Step: 173250   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:11:57,915-Speed 2538.25 samples/sec   Loss 10.7464   LearningRate 0.0626   Epoch: 4   Global Step: 173260   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:01,810-Speed 2629.71 samples/sec   Loss 10.8736   LearningRate 0.0626   Epoch: 4   Global Step: 173270   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:05,704-Speed 2629.99 samples/sec   Loss 10.8719   LearningRate 0.0626   Epoch: 4   Global Step: 173280   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:09,603-Speed 2627.32 samples/sec   Loss 10.7459   LearningRate 0.0626   Epoch: 4   Global Step: 173290   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:12:13,496-Speed 2631.07 samples/sec   Loss 10.7601   LearningRate 0.0626   Epoch: 4   Global Step: 173300   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:12:17,418-Speed 2610.90 samples/sec   Loss 10.8656   LearningRate 0.0626   Epoch: 4   Global Step: 173310   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:21,400-Speed 2572.26 samples/sec   Loss 10.8739   LearningRate 0.0626   Epoch: 4   Global Step: 173320   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:25,295-Speed 2629.51 samples/sec   Loss 10.7896   LearningRate 0.0626   Epoch: 4   Global Step: 173330   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:29,191-Speed 2628.98 samples/sec   Loss 10.5838   LearningRate 0.0626   Epoch: 4   Global Step: 173340   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:33,085-Speed 2630.64 samples/sec   Loss 10.7678   LearningRate 0.0626   Epoch: 4   Global Step: 173350   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:36,974-Speed 2633.35 samples/sec   Loss 10.7598   LearningRate 0.0626   Epoch: 4   Global Step: 173360   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:40,864-Speed 2632.79 samples/sec   Loss 10.7016   LearningRate 0.0626   Epoch: 4   Global Step: 173370   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:44,780-Speed 2616.09 samples/sec   Loss 10.7955   LearningRate 0.0626   Epoch: 4   Global Step: 173380   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:48,700-Speed 2612.37 samples/sec   Loss 10.7060   LearningRate 0.0626   Epoch: 4   Global Step: 173390   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:52,604-Speed 2623.58 samples/sec   Loss 10.9020   LearningRate 0.0626   Epoch: 4   Global Step: 173400   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:12:56,499-Speed 2629.32 samples/sec   Loss 10.5962   LearningRate 0.0626   Epoch: 4   Global Step: 173410   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:13:00,376-Speed 2642.14 samples/sec   Loss 10.7046   LearningRate 0.0626   Epoch: 4   Global Step: 173420   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:04,267-Speed 2632.17 samples/sec   Loss 10.8801   LearningRate 0.0626   Epoch: 4   Global Step: 173430   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:08,157-Speed 2633.10 samples/sec   Loss 10.7361   LearningRate 0.0626   Epoch: 4   Global Step: 173440   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:12,046-Speed 2633.45 samples/sec   Loss 10.6341   LearningRate 0.0626   Epoch: 4   Global Step: 173450   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:15,936-Speed 2633.24 samples/sec   Loss 10.7365   LearningRate 0.0626   Epoch: 4   Global Step: 173460   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:19,830-Speed 2630.54 samples/sec   Loss 10.7617   LearningRate 0.0626   Epoch: 4   Global Step: 173470   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:23,730-Speed 2625.93 samples/sec   Loss 10.7822   LearningRate 0.0625   Epoch: 4   Global Step: 173480   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:27,629-Speed 2627.03 samples/sec   Loss 10.7678   LearningRate 0.0625   Epoch: 4   Global Step: 173490   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:31,532-Speed 2624.23 samples/sec   Loss 10.7415   LearningRate 0.0625   Epoch: 4   Global Step: 173500   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:35,434-Speed 2624.88 samples/sec   Loss 10.6994   LearningRate 0.0625   Epoch: 4   Global Step: 173510   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:39,384-Speed 2592.89 samples/sec   Loss 10.7858   LearningRate 0.0625   Epoch: 4   Global Step: 173520   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:13:43,258-Speed 2643.47 samples/sec   Loss 10.8031   LearningRate 0.0625   Epoch: 4   Global Step: 173530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:47,157-Speed 2626.90 samples/sec   Loss 10.7423   LearningRate 0.0625   Epoch: 4   Global Step: 173540   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:51,052-Speed 2630.15 samples/sec   Loss 10.7350   LearningRate 0.0625   Epoch: 4   Global Step: 173550   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:54,946-Speed 2629.97 samples/sec   Loss 10.7598   LearningRate 0.0625   Epoch: 4   Global Step: 173560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:13:58,844-Speed 2628.14 samples/sec   Loss 10.7751   LearningRate 0.0625   Epoch: 4   Global Step: 173570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:02,740-Speed 2628.75 samples/sec   Loss 10.6500   LearningRate 0.0625   Epoch: 4   Global Step: 173580   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:06,635-Speed 2628.97 samples/sec   Loss 10.7573   LearningRate 0.0625   Epoch: 4   Global Step: 173590   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:10,543-Speed 2620.95 samples/sec   Loss 10.8009   LearningRate 0.0625   Epoch: 4   Global Step: 173600   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:14,438-Speed 2629.45 samples/sec   Loss 10.7831   LearningRate 0.0625   Epoch: 4   Global Step: 173610   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:18,332-Speed 2630.44 samples/sec   Loss 10.7263   LearningRate 0.0625   Epoch: 4   Global Step: 173620   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:22,234-Speed 2625.13 samples/sec   Loss 10.7962   LearningRate 0.0625   Epoch: 4   Global Step: 173630   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:14:26,126-Speed 2631.81 samples/sec   Loss 10.7702   LearningRate 0.0625   Epoch: 4   Global Step: 173640   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:14:30,012-Speed 2635.60 samples/sec   Loss 10.8154   LearningRate 0.0625   Epoch: 4   Global Step: 173650   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:33,898-Speed 2635.46 samples/sec   Loss 10.7184   LearningRate 0.0625   Epoch: 4   Global Step: 173660   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:37,789-Speed 2632.49 samples/sec   Loss 10.5403   LearningRate 0.0625   Epoch: 4   Global Step: 173670   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:41,680-Speed 2632.14 samples/sec   Loss 10.8104   LearningRate 0.0625   Epoch: 4   Global Step: 173680   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:45,584-Speed 2623.36 samples/sec   Loss 10.6946   LearningRate 0.0625   Epoch: 4   Global Step: 173690   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:49,468-Speed 2637.32 samples/sec   Loss 10.9046   LearningRate 0.0625   Epoch: 4   Global Step: 173700   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:53,357-Speed 2633.40 samples/sec   Loss 10.6771   LearningRate 0.0625   Epoch: 4   Global Step: 173710   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:14:57,249-Speed 2631.66 samples/sec   Loss 10.7664   LearningRate 0.0625   Epoch: 4   Global Step: 173720   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:15:01,139-Speed 2633.33 samples/sec   Loss 10.8431   LearningRate 0.0625   Epoch: 4   Global Step: 173730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:15:05,043-Speed 2623.60 samples/sec   Loss 10.8031   LearningRate 0.0625   Epoch: 4   Global Step: 173740   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:08,934-Speed 2632.35 samples/sec   Loss 10.8083   LearningRate 0.0625   Epoch: 4   Global Step: 173750   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:12,832-Speed 2627.43 samples/sec   Loss 10.7510   LearningRate 0.0625   Epoch: 4   Global Step: 173760   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:16,722-Speed 2632.55 samples/sec   Loss 10.6500   LearningRate 0.0625   Epoch: 4   Global Step: 173770   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:20,611-Speed 2634.09 samples/sec   Loss 10.7489   LearningRate 0.0625   Epoch: 4   Global Step: 173780   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:24,514-Speed 2624.05 samples/sec   Loss 10.7568   LearningRate 0.0625   Epoch: 4   Global Step: 173790   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:28,418-Speed 2623.69 samples/sec   Loss 10.6574   LearningRate 0.0625   Epoch: 4   Global Step: 173800   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:32,324-Speed 2621.80 samples/sec   Loss 10.7708   LearningRate 0.0625   Epoch: 4   Global Step: 173810   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:36,224-Speed 2626.57 samples/sec   Loss 10.7326   LearningRate 0.0625   Epoch: 4   Global Step: 173820   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:40,124-Speed 2626.03 samples/sec   Loss 10.8245   LearningRate 0.0625   Epoch: 4   Global Step: 173830   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:15:44,016-Speed 2631.44 samples/sec   Loss 10.7205   LearningRate 0.0625   Epoch: 4   Global Step: 173840   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:15:47,912-Speed 2628.99 samples/sec   Loss 10.7694   LearningRate 0.0625   Epoch: 4   Global Step: 173850   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:15:51,805-Speed 2631.13 samples/sec   Loss 10.6566   LearningRate 0.0625   Epoch: 4   Global Step: 173860   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:15:55,698-Speed 2631.07 samples/sec   Loss 10.8494   LearningRate 0.0625   Epoch: 4   Global Step: 173870   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:15:59,594-Speed 2628.92 samples/sec   Loss 10.7814   LearningRate 0.0625   Epoch: 4   Global Step: 173880   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:16:03,491-Speed 2628.38 samples/sec   Loss 10.9482   LearningRate 0.0625   Epoch: 4   Global Step: 173890   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:16:07,386-Speed 2629.77 samples/sec   Loss 10.7851   LearningRate 0.0625   Epoch: 4   Global Step: 173900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:16:11,601-Speed 2429.77 samples/sec   Loss 10.7236   LearningRate 0.0625   Epoch: 4   Global Step: 173910   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:16:15,494-Speed 2631.03 samples/sec   Loss 10.7261   LearningRate 0.0625   Epoch: 4   Global Step: 173920   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:16:19,391-Speed 2628.26 samples/sec   Loss 10.8019   LearningRate 0.0625   Epoch: 4   Global Step: 173930   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:16:23,285-Speed 2629.89 samples/sec   Loss 10.6601   LearningRate 0.0625   Epoch: 4   Global Step: 173940   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:16:27,177-Speed 2631.84 samples/sec   Loss 10.7178   LearningRate 0.0625   Epoch: 4   Global Step: 173950   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:16:31,075-Speed 2627.95 samples/sec   Loss 10.7053   LearningRate 0.0625   Epoch: 4   Global Step: 173960   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:16:34,970-Speed 2629.70 samples/sec   Loss 10.7943   LearningRate 0.0625   Epoch: 4   Global Step: 173970   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:16:38,857-Speed 2634.27 samples/sec   Loss 10.7280   LearningRate 0.0625   Epoch: 4   Global Step: 173980   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:16:42,733-Speed 2642.77 samples/sec   Loss 10.7603   LearningRate 0.0625   Epoch: 4   Global Step: 173990   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:16:46,625-Speed 2631.99 samples/sec   Loss 10.6746   LearningRate 0.0625   Epoch: 4   Global Step: 174000   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:16:50,519-Speed 2630.10 samples/sec   Loss 10.6649   LearningRate 0.0624   Epoch: 4   Global Step: 174010   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:16:54,413-Speed 2630.46 samples/sec   Loss 10.7759   LearningRate 0.0624   Epoch: 4   Global Step: 174020   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:16:58,305-Speed 2631.30 samples/sec   Loss 10.6874   LearningRate 0.0624   Epoch: 4   Global Step: 174030   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:17:02,212-Speed 2621.70 samples/sec   Loss 10.5995   LearningRate 0.0624   Epoch: 4   Global Step: 174040   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:17:06,134-Speed 2611.54 samples/sec   Loss 10.8607   LearningRate 0.0624   Epoch: 4   Global Step: 174050   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:17:10,051-Speed 2614.60 samples/sec   Loss 10.7701   LearningRate 0.0624   Epoch: 4   Global Step: 174060   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:17:13,990-Speed 2600.08 samples/sec   Loss 10.7226   LearningRate 0.0624   Epoch: 4   Global Step: 174070   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:17:17,911-Speed 2611.84 samples/sec   Loss 10.8025   LearningRate 0.0624   Epoch: 4   Global Step: 174080   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:17:21,814-Speed 2624.75 samples/sec   Loss 10.8090   LearningRate 0.0624   Epoch: 4   Global Step: 174090   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:25,711-Speed 2628.10 samples/sec   Loss 10.8929   LearningRate 0.0624   Epoch: 4   Global Step: 174100   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:29,611-Speed 2626.59 samples/sec   Loss 10.6142   LearningRate 0.0624   Epoch: 4   Global Step: 174110   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:33,501-Speed 2633.09 samples/sec   Loss 10.6885   LearningRate 0.0624   Epoch: 4   Global Step: 174120   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:37,392-Speed 2631.68 samples/sec   Loss 10.9025   LearningRate 0.0624   Epoch: 4   Global Step: 174130   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:41,285-Speed 2631.09 samples/sec   Loss 10.7672   LearningRate 0.0624   Epoch: 4   Global Step: 174140   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:45,184-Speed 2627.24 samples/sec   Loss 10.8159   LearningRate 0.0624   Epoch: 4   Global Step: 174150   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:50,415-Speed 1957.52 samples/sec   Loss 10.6434   LearningRate 0.0624   Epoch: 4   Global Step: 174160   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:54,306-Speed 2632.58 samples/sec   Loss 10.7697   LearningRate 0.0624   Epoch: 4   Global Step: 174170   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:17:58,204-Speed 2627.40 samples/sec   Loss 10.9157   LearningRate 0.0624   Epoch: 4   Global Step: 174180   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:02,095-Speed 2632.57 samples/sec   Loss 10.8239   LearningRate 0.0624   Epoch: 4   Global Step: 174190   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:18:05,989-Speed 2630.21 samples/sec   Loss 10.8369   LearningRate 0.0624   Epoch: 4   Global Step: 174200   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:18:09,896-Speed 2621.40 samples/sec   Loss 10.8767   LearningRate 0.0624   Epoch: 4   Global Step: 174210   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:18:13,791-Speed 2629.66 samples/sec   Loss 10.6767   LearningRate 0.0624   Epoch: 4   Global Step: 174220   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:18:17,684-Speed 2630.98 samples/sec   Loss 10.6976   LearningRate 0.0624   Epoch: 4   Global Step: 174230   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:18:21,557-Speed 2644.57 samples/sec   Loss 10.7081   LearningRate 0.0624   Epoch: 4   Global Step: 174240   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:25,449-Speed 2631.35 samples/sec   Loss 10.7545   LearningRate 0.0624   Epoch: 4   Global Step: 174250   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:29,342-Speed 2631.38 samples/sec   Loss 10.7009   LearningRate 0.0624   Epoch: 4   Global Step: 174260   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:33,247-Speed 2622.39 samples/sec   Loss 10.7810   LearningRate 0.0624   Epoch: 4   Global Step: 174270   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:37,140-Speed 2630.75 samples/sec   Loss 10.7507   LearningRate 0.0624   Epoch: 4   Global Step: 174280   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:41,029-Speed 2633.50 samples/sec   Loss 10.8639   LearningRate 0.0624   Epoch: 4   Global Step: 174290   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:44,928-Speed 2627.81 samples/sec   Loss 10.7496   LearningRate 0.0624   Epoch: 4   Global Step: 174300   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:48,822-Speed 2630.29 samples/sec   Loss 10.6550   LearningRate 0.0624   Epoch: 4   Global Step: 174310   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:52,717-Speed 2629.56 samples/sec   Loss 10.6201   LearningRate 0.0624   Epoch: 4   Global Step: 174320   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:18:56,612-Speed 2629.34 samples/sec   Loss 10.6121   LearningRate 0.0624   Epoch: 4   Global Step: 174330   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:19:00,490-Speed 2641.29 samples/sec   Loss 11.4523   LearningRate 0.0624   Epoch: 4   Global Step: 174340   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:04,389-Speed 2626.48 samples/sec   Loss 11.2824   LearningRate 0.0624   Epoch: 4   Global Step: 174350   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:08,293-Speed 2623.82 samples/sec   Loss 10.9648   LearningRate 0.0624   Epoch: 4   Global Step: 174360   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:12,187-Speed 2629.96 samples/sec   Loss 10.9998   LearningRate 0.0624   Epoch: 4   Global Step: 174370   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:16,078-Speed 2632.48 samples/sec   Loss 10.8710   LearningRate 0.0624   Epoch: 4   Global Step: 174380   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:19,969-Speed 2632.19 samples/sec   Loss 10.6671   LearningRate 0.0624   Epoch: 4   Global Step: 174390   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:23,868-Speed 2627.12 samples/sec   Loss 10.8038   LearningRate 0.0624   Epoch: 4   Global Step: 174400   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:27,776-Speed 2621.41 samples/sec   Loss 10.8161   LearningRate 0.0624   Epoch: 4   Global Step: 174410   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:31,667-Speed 2631.95 samples/sec   Loss 10.7887   LearningRate 0.0624   Epoch: 4   Global Step: 174420   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:35,575-Speed 2620.65 samples/sec   Loss 10.7683   LearningRate 0.0624   Epoch: 4   Global Step: 174430   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:19:39,489-Speed 2616.57 samples/sec   Loss 10.8141   LearningRate 0.0624   Epoch: 4   Global Step: 174440   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:19:43,384-Speed 2630.19 samples/sec   Loss 10.6233   LearningRate 0.0624   Epoch: 4   Global Step: 174450   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:19:47,353-Speed 2580.27 samples/sec   Loss 10.6904   LearningRate 0.0624   Epoch: 4   Global Step: 174460   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:19:51,447-Speed 2501.97 samples/sec   Loss 10.8299   LearningRate 0.0624   Epoch: 4   Global Step: 174470   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:19:55,459-Speed 2553.08 samples/sec   Loss 10.8537   LearningRate 0.0624   Epoch: 4   Global Step: 174480   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:19:59,352-Speed 2631.24 samples/sec   Loss 10.7206   LearningRate 0.0624   Epoch: 4   Global Step: 174490   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:03,245-Speed 2630.49 samples/sec   Loss 10.8175   LearningRate 0.0624   Epoch: 4   Global Step: 174500   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:07,141-Speed 2629.33 samples/sec   Loss 10.6776   LearningRate 0.0624   Epoch: 4   Global Step: 174510   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:11,045-Speed 2623.14 samples/sec   Loss 10.7478   LearningRate 0.0624   Epoch: 4   Global Step: 174520   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:14,951-Speed 2622.37 samples/sec   Loss 10.6877   LearningRate 0.0623   Epoch: 4   Global Step: 174530   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:18,833-Speed 2638.16 samples/sec   Loss 10.9442   LearningRate 0.0623   Epoch: 4   Global Step: 174540   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:22,725-Speed 2631.75 samples/sec   Loss 10.7491   LearningRate 0.0623   Epoch: 4   Global Step: 174550   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:27,787-Speed 2023.45 samples/sec   Loss 10.7297   LearningRate 0.0623   Epoch: 4   Global Step: 174560   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:31,691-Speed 2623.24 samples/sec   Loss 10.7083   LearningRate 0.0623   Epoch: 4   Global Step: 174570   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:35,593-Speed 2625.53 samples/sec   Loss 10.6207   LearningRate 0.0623   Epoch: 4   Global Step: 174580   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:39,487-Speed 2630.14 samples/sec   Loss 10.7620   LearningRate 0.0623   Epoch: 4   Global Step: 174590   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:43,380-Speed 2630.48 samples/sec   Loss 10.6981   LearningRate 0.0623   Epoch: 4   Global Step: 174600   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:47,285-Speed 2622.43 samples/sec   Loss 10.6995   LearningRate 0.0623   Epoch: 4   Global Step: 174610   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:51,190-Speed 2623.43 samples/sec   Loss 10.7465   LearningRate 0.0623   Epoch: 4   Global Step: 174620   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:55,087-Speed 2628.01 samples/sec   Loss 10.6480   LearningRate 0.0623   Epoch: 4   Global Step: 174630   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:20:58,987-Speed 2626.64 samples/sec   Loss 10.5357   LearningRate 0.0623   Epoch: 4   Global Step: 174640   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:21:02,883-Speed 2628.65 samples/sec   Loss 10.8567   LearningRate 0.0623   Epoch: 4   Global Step: 174650   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:21:06,778-Speed 2629.94 samples/sec   Loss 10.6441   LearningRate 0.0623   Epoch: 4   Global Step: 174660   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:21:10,663-Speed 2636.57 samples/sec   Loss 10.5831   LearningRate 0.0623   Epoch: 4   Global Step: 174670   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:14,547-Speed 2636.60 samples/sec   Loss 10.8303   LearningRate 0.0623   Epoch: 4   Global Step: 174680   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:18,440-Speed 2631.30 samples/sec   Loss 10.7286   LearningRate 0.0623   Epoch: 4   Global Step: 174690   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:22,333-Speed 2630.60 samples/sec   Loss 10.7322   LearningRate 0.0623   Epoch: 4   Global Step: 174700   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:26,223-Speed 2633.24 samples/sec   Loss 10.7805   LearningRate 0.0623   Epoch: 4   Global Step: 174710   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:30,143-Speed 2612.68 samples/sec   Loss 10.7420   LearningRate 0.0623   Epoch: 4   Global Step: 174720   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:34,051-Speed 2620.72 samples/sec   Loss 10.6903   LearningRate 0.0623   Epoch: 4   Global Step: 174730   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:37,962-Speed 2618.89 samples/sec   Loss 10.8575   LearningRate 0.0623   Epoch: 4   Global Step: 174740   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:41,858-Speed 2628.95 samples/sec   Loss 10.7745   LearningRate 0.0623   Epoch: 4   Global Step: 174750   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:45,752-Speed 2630.36 samples/sec   Loss 10.8433   LearningRate 0.0623   Epoch: 4   Global Step: 174760   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:21:49,648-Speed 2629.20 samples/sec   Loss 10.8626   LearningRate 0.0623   Epoch: 4   Global Step: 174770   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:21:53,544-Speed 2628.57 samples/sec   Loss 10.7883   LearningRate 0.0623   Epoch: 4   Global Step: 174780   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:21:57,445-Speed 2626.02 samples/sec   Loss 10.5689   LearningRate 0.0623   Epoch: 4   Global Step: 174790   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:22:01,342-Speed 2627.87 samples/sec   Loss 10.7796   LearningRate 0.0623   Epoch: 4   Global Step: 174800   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:22:05,223-Speed 2639.39 samples/sec   Loss 10.7048   LearningRate 0.0623   Epoch: 4   Global Step: 174810   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:09,126-Speed 2624.30 samples/sec   Loss 10.7274   LearningRate 0.0623   Epoch: 4   Global Step: 174820   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:13,023-Speed 2627.51 samples/sec   Loss 10.7090   LearningRate 0.0623   Epoch: 4   Global Step: 174830   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:16,943-Speed 2612.84 samples/sec   Loss 10.6173   LearningRate 0.0623   Epoch: 4   Global Step: 174840   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:20,840-Speed 2629.24 samples/sec   Loss 10.5902   LearningRate 0.0623   Epoch: 4   Global Step: 174850   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:24,738-Speed 2627.09 samples/sec   Loss 10.7916   LearningRate 0.0623   Epoch: 4   Global Step: 174860   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:28,657-Speed 2613.50 samples/sec   Loss 10.7958   LearningRate 0.0623   Epoch: 4   Global Step: 174870   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:32,562-Speed 2623.00 samples/sec   Loss 10.7846   LearningRate 0.0623   Epoch: 4   Global Step: 174880   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:36,586-Speed 2544.82 samples/sec   Loss 10.7296   LearningRate 0.0623   Epoch: 4   Global Step: 174890   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:40,668-Speed 2509.62 samples/sec   Loss 10.6569   LearningRate 0.0623   Epoch: 4   Global Step: 174900   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:22:44,724-Speed 2524.64 samples/sec   Loss 10.7406   LearningRate 0.0623   Epoch: 4   Global Step: 174910   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:22:48,622-Speed 2628.01 samples/sec   Loss 10.8790   LearningRate 0.0623   Epoch: 4   Global Step: 174920   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:22:52,518-Speed 2628.51 samples/sec   Loss 10.6011   LearningRate 0.0623   Epoch: 4   Global Step: 174930   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:22:56,417-Speed 2627.87 samples/sec   Loss 10.6952   LearningRate 0.0623   Epoch: 4   Global Step: 174940   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:23:00,313-Speed 2629.02 samples/sec   Loss 10.8053   LearningRate 0.0623   Epoch: 4   Global Step: 174950   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:23:04,243-Speed 2605.96 samples/sec   Loss 10.8025   LearningRate 0.0623   Epoch: 4   Global Step: 174960   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:08,138-Speed 2629.65 samples/sec   Loss 10.7369   LearningRate 0.0623   Epoch: 4   Global Step: 174970   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:12,030-Speed 2631.34 samples/sec   Loss 10.7133   LearningRate 0.0623   Epoch: 4   Global Step: 174980   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:15,925-Speed 2629.60 samples/sec   Loss 10.7463   LearningRate 0.0623   Epoch: 4   Global Step: 174990   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:19,816-Speed 2632.26 samples/sec   Loss 10.7484   LearningRate 0.0623   Epoch: 4   Global Step: 175000   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:23,706-Speed 2632.91 samples/sec   Loss 10.6811   LearningRate 0.0623   Epoch: 4   Global Step: 175010   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:27,599-Speed 2630.84 samples/sec   Loss 10.6397   LearningRate 0.0623   Epoch: 4   Global Step: 175020   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:31,568-Speed 2580.61 samples/sec   Loss 10.8277   LearningRate 0.0623   Epoch: 4   Global Step: 175030   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:35,494-Speed 2608.82 samples/sec   Loss 10.8153   LearningRate 0.0623   Epoch: 4   Global Step: 175040   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:39,395-Speed 2625.90 samples/sec   Loss 10.7740   LearningRate 0.0623   Epoch: 4   Global Step: 175050   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:43,291-Speed 2628.59 samples/sec   Loss 10.7886   LearningRate 0.0622   Epoch: 4   Global Step: 175060   Fp16 Grad Scale: 262144   Required: 74 hours
Training: 2022-04-13 15:23:47,164-Speed 2644.55 samples/sec   Loss 10.6622   LearningRate 0.0622   Epoch: 4   Global Step: 175070   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:51,058-Speed 2630.41 samples/sec   Loss 10.8645   LearningRate 0.0622   Epoch: 4   Global Step: 175080   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:54,951-Speed 2630.69 samples/sec   Loss 10.8176   LearningRate 0.0622   Epoch: 4   Global Step: 175090   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:23:58,845-Speed 2630.18 samples/sec   Loss 10.6177   LearningRate 0.0622   Epoch: 4   Global Step: 175100   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:24:02,756-Speed 2618.59 samples/sec   Loss 10.7145   LearningRate 0.0622   Epoch: 4   Global Step: 175110   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:24:06,671-Speed 2616.57 samples/sec   Loss 10.6010   LearningRate 0.0622   Epoch: 4   Global Step: 175120   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:24:10,586-Speed 2615.71 samples/sec   Loss 10.7332   LearningRate 0.0622   Epoch: 4   Global Step: 175130   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:24:14,484-Speed 2628.08 samples/sec   Loss 10.8067   LearningRate 0.0622   Epoch: 4   Global Step: 175140   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:24:18,374-Speed 2633.14 samples/sec   Loss 10.9011   LearningRate 0.0622   Epoch: 4   Global Step: 175150   Fp16 Grad Scale: 131072   Required: 74 hours
Training: 2022-04-13 15:24:22,253-Speed 2640.63 samples/sec   Loss 10.7377   LearningRate 0.0622   Epoch: 4   Global Step: 175160   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:24:26,144-Speed 2631.82 samples/sec   Loss 10.7899   LearningRate 0.0622   Epoch: 4   Global Step: 175170   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:24:30,038-Speed 2630.46 samples/sec   Loss 10.6972   LearningRate 0.0622   Epoch: 4   Global Step: 175180   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:24:33,929-Speed 2631.84 samples/sec   Loss 10.7613   LearningRate 0.0622   Epoch: 4   Global Step: 175190   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:24:37,824-Speed 2630.01 samples/sec   Loss 10.6620   LearningRate 0.0622   Epoch: 4   Global Step: 175200   Fp16 Grad Scale: 65536   Required: 74 hours
Training: 2022-04-13 15:24:41,715-Speed 2632.00 samples/sec   Loss 10.5800   LearningRate 0.0622   Epoch: 4   Global Step: 175210   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:24:45,609-Speed 2630.01 samples/sec   Loss 10.6840   LearningRate 0.0622   Epoch: 4   Global Step: 175220   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:24:49,502-Speed 2631.16 samples/sec   Loss 10.9055   LearningRate 0.0622   Epoch: 4   Global Step: 175230   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:24:53,395-Speed 2631.51 samples/sec   Loss 10.7669   LearningRate 0.0622   Epoch: 4   Global Step: 175240   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:24:57,286-Speed 2632.72 samples/sec   Loss 10.5624   LearningRate 0.0622   Epoch: 4   Global Step: 175250   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:25:01,187-Speed 2625.70 samples/sec   Loss 10.6342   LearningRate 0.0622   Epoch: 4   Global Step: 175260   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:05,092-Speed 2623.10 samples/sec   Loss 10.7093   LearningRate 0.0622   Epoch: 4   Global Step: 175270   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:09,017-Speed 2609.22 samples/sec   Loss 10.7988   LearningRate 0.0622   Epoch: 4   Global Step: 175280   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:12,913-Speed 2628.29 samples/sec   Loss 10.7208   LearningRate 0.0622   Epoch: 4   Global Step: 175290   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:16,811-Speed 2627.73 samples/sec   Loss 10.6920   LearningRate 0.0622   Epoch: 4   Global Step: 175300   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:20,705-Speed 2630.50 samples/sec   Loss 10.6972   LearningRate 0.0622   Epoch: 4   Global Step: 175310   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:24,736-Speed 2541.08 samples/sec   Loss 10.6796   LearningRate 0.0622   Epoch: 4   Global Step: 175320   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:28,628-Speed 2632.06 samples/sec   Loss 10.9342   LearningRate 0.0622   Epoch: 4   Global Step: 175330   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:25:32,505-Speed 2641.93 samples/sec   Loss 10.7003   LearningRate 0.0622   Epoch: 4   Global Step: 175340   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:25:36,448-Speed 2596.99 samples/sec   Loss 10.6969   LearningRate 0.0622   Epoch: 4   Global Step: 175350   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:25:40,409-Speed 2586.11 samples/sec   Loss 10.8403   LearningRate 0.0622   Epoch: 4   Global Step: 175360   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:25:44,304-Speed 2629.14 samples/sec   Loss 10.6189   LearningRate 0.0622   Epoch: 4   Global Step: 175370   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:25:48,206-Speed 2625.28 samples/sec   Loss 10.6624   LearningRate 0.0622   Epoch: 4   Global Step: 175380   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:25:52,095-Speed 2633.53 samples/sec   Loss 10.7478   LearningRate 0.0622   Epoch: 4   Global Step: 175390   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:25:55,985-Speed 2632.93 samples/sec   Loss 10.7284   LearningRate 0.0622   Epoch: 4   Global Step: 175400   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:25:59,884-Speed 2626.77 samples/sec   Loss 10.6303   LearningRate 0.0622   Epoch: 4   Global Step: 175410   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:26:03,778-Speed 2630.53 samples/sec   Loss 10.7890   LearningRate 0.0622   Epoch: 4   Global Step: 175420   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:26:07,669-Speed 2632.32 samples/sec   Loss 10.7887   LearningRate 0.0622   Epoch: 4   Global Step: 175430   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:26:11,562-Speed 2630.99 samples/sec   Loss 10.7722   LearningRate 0.0622   Epoch: 4   Global Step: 175440   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:26:15,452-Speed 2633.23 samples/sec   Loss 10.6236   LearningRate 0.0622   Epoch: 4   Global Step: 175450   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:19,343-Speed 2631.67 samples/sec   Loss 10.8594   LearningRate 0.0622   Epoch: 4   Global Step: 175460   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:23,236-Speed 2631.10 samples/sec   Loss 10.8926   LearningRate 0.0622   Epoch: 4   Global Step: 175470   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:27,128-Speed 2631.50 samples/sec   Loss 10.7877   LearningRate 0.0622   Epoch: 4   Global Step: 175480   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:31,022-Speed 2630.34 samples/sec   Loss 10.7071   LearningRate 0.0622   Epoch: 4   Global Step: 175490   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:34,913-Speed 2632.52 samples/sec   Loss 10.6716   LearningRate 0.0622   Epoch: 4   Global Step: 175500   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:38,802-Speed 2633.05 samples/sec   Loss 10.7889   LearningRate 0.0622   Epoch: 4   Global Step: 175510   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:42,694-Speed 2631.87 samples/sec   Loss 10.7865   LearningRate 0.0622   Epoch: 4   Global Step: 175520   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:46,587-Speed 2631.20 samples/sec   Loss 10.8531   LearningRate 0.0622   Epoch: 4   Global Step: 175530   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:50,484-Speed 2628.32 samples/sec   Loss 10.7706   LearningRate 0.0622   Epoch: 4   Global Step: 175540   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:26:54,374-Speed 2632.55 samples/sec   Loss 10.6387   LearningRate 0.0622   Epoch: 4   Global Step: 175550   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:26:58,267-Speed 2631.08 samples/sec   Loss 10.8343   LearningRate 0.0622   Epoch: 4   Global Step: 175560   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:02,159-Speed 2631.94 samples/sec   Loss 10.7015   LearningRate 0.0622   Epoch: 4   Global Step: 175570   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:06,053-Speed 2630.00 samples/sec   Loss 10.6858   LearningRate 0.0621   Epoch: 4   Global Step: 175580   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:09,956-Speed 2624.10 samples/sec   Loss 10.8392   LearningRate 0.0621   Epoch: 4   Global Step: 175590   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:13,842-Speed 2636.01 samples/sec   Loss 10.7554   LearningRate 0.0621   Epoch: 4   Global Step: 175600   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:17,736-Speed 2629.49 samples/sec   Loss 10.6580   LearningRate 0.0621   Epoch: 4   Global Step: 175610   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:21,627-Speed 2632.83 samples/sec   Loss 10.8468   LearningRate 0.0621   Epoch: 4   Global Step: 175620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:25,533-Speed 2622.88 samples/sec   Loss 10.7791   LearningRate 0.0621   Epoch: 4   Global Step: 175630   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:29,433-Speed 2626.14 samples/sec   Loss 10.6719   LearningRate 0.0621   Epoch: 4   Global Step: 175640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:27:33,339-Speed 2622.04 samples/sec   Loss 10.7758   LearningRate 0.0621   Epoch: 4   Global Step: 175650   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:27:37,233-Speed 2630.05 samples/sec   Loss 10.7553   LearningRate 0.0621   Epoch: 4   Global Step: 175660   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:27:41,134-Speed 2625.56 samples/sec   Loss 10.7350   LearningRate 0.0621   Epoch: 4   Global Step: 175670   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:27:45,036-Speed 2624.86 samples/sec   Loss 10.9179   LearningRate 0.0621   Epoch: 4   Global Step: 175680   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:27:48,937-Speed 2625.44 samples/sec   Loss 10.7973   LearningRate 0.0621   Epoch: 4   Global Step: 175690   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:27:52,836-Speed 2627.10 samples/sec   Loss 10.5898   LearningRate 0.0621   Epoch: 4   Global Step: 175700   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:27:56,755-Speed 2613.35 samples/sec   Loss 10.6639   LearningRate 0.0621   Epoch: 4   Global Step: 175710   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:28:00,632-Speed 2642.26 samples/sec   Loss 10.6602   LearningRate 0.0621   Epoch: 4   Global Step: 175720   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:04,617-Speed 2569.87 samples/sec   Loss 10.7264   LearningRate 0.0621   Epoch: 4   Global Step: 175730   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:08,515-Speed 2627.70 samples/sec   Loss 10.6572   LearningRate 0.0621   Epoch: 4   Global Step: 175740   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:12,411-Speed 2628.90 samples/sec   Loss 10.6827   LearningRate 0.0621   Epoch: 4   Global Step: 175750   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:16,306-Speed 2629.25 samples/sec   Loss 10.6635   LearningRate 0.0621   Epoch: 4   Global Step: 175760   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:20,206-Speed 2626.25 samples/sec   Loss 10.7135   LearningRate 0.0621   Epoch: 4   Global Step: 175770   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:24,109-Speed 2624.26 samples/sec   Loss 10.6920   LearningRate 0.0621   Epoch: 4   Global Step: 175780   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:28,024-Speed 2616.14 samples/sec   Loss 10.8571   LearningRate 0.0621   Epoch: 4   Global Step: 175790   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:31,926-Speed 2624.97 samples/sec   Loss 10.6844   LearningRate 0.0621   Epoch: 4   Global Step: 175800   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:35,821-Speed 2630.32 samples/sec   Loss 10.6087   LearningRate 0.0621   Epoch: 4   Global Step: 175810   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:39,732-Speed 2618.32 samples/sec   Loss 10.6794   LearningRate 0.0621   Epoch: 4   Global Step: 175820   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:28:43,618-Speed 2635.65 samples/sec   Loss 10.7008   LearningRate 0.0621   Epoch: 4   Global Step: 175830   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:47,516-Speed 2628.12 samples/sec   Loss 10.7695   LearningRate 0.0621   Epoch: 4   Global Step: 175840   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:51,418-Speed 2624.63 samples/sec   Loss 10.7057   LearningRate 0.0621   Epoch: 4   Global Step: 175850   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:55,326-Speed 2620.58 samples/sec   Loss 10.6863   LearningRate 0.0621   Epoch: 4   Global Step: 175860   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:28:59,221-Speed 2629.98 samples/sec   Loss 10.7089   LearningRate 0.0621   Epoch: 4   Global Step: 175870   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:03,116-Speed 2629.10 samples/sec   Loss 10.8783   LearningRate 0.0621   Epoch: 4   Global Step: 175880   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:07,010-Speed 2630.31 samples/sec   Loss 10.7363   LearningRate 0.0621   Epoch: 4   Global Step: 175890   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:10,908-Speed 2627.88 samples/sec   Loss 10.6791   LearningRate 0.0621   Epoch: 4   Global Step: 175900   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:14,799-Speed 2632.56 samples/sec   Loss 10.7971   LearningRate 0.0621   Epoch: 4   Global Step: 175910   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:18,694-Speed 2629.42 samples/sec   Loss 10.6907   LearningRate 0.0621   Epoch: 4   Global Step: 175920   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:22,587-Speed 2631.10 samples/sec   Loss 10.5945   LearningRate 0.0621   Epoch: 4   Global Step: 175930   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:29:26,480-Speed 2630.67 samples/sec   Loss 10.8076   LearningRate 0.0621   Epoch: 4   Global Step: 175940   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:29:30,373-Speed 2631.34 samples/sec   Loss 10.6623   LearningRate 0.0621   Epoch: 4   Global Step: 175950   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:29:34,271-Speed 2627.44 samples/sec   Loss 10.6712   LearningRate 0.0621   Epoch: 4   Global Step: 175960   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:29:38,165-Speed 2629.67 samples/sec   Loss 10.6622   LearningRate 0.0621   Epoch: 4   Global Step: 175970   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:29:42,055-Speed 2633.26 samples/sec   Loss 10.7350   LearningRate 0.0621   Epoch: 4   Global Step: 175980   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:45,963-Speed 2620.65 samples/sec   Loss 10.7615   LearningRate 0.0621   Epoch: 4   Global Step: 175990   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:49,861-Speed 2627.94 samples/sec   Loss 10.7375   LearningRate 0.0621   Epoch: 4   Global Step: 176000   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:53,754-Speed 2631.16 samples/sec   Loss 10.6707   LearningRate 0.0621   Epoch: 4   Global Step: 176010   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:29:57,651-Speed 2627.76 samples/sec   Loss 10.7087   LearningRate 0.0621   Epoch: 4   Global Step: 176020   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:01,546-Speed 2630.10 samples/sec   Loss 10.5741   LearningRate 0.0621   Epoch: 4   Global Step: 176030   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:05,441-Speed 2629.31 samples/sec   Loss 10.7027   LearningRate 0.0621   Epoch: 4   Global Step: 176040   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:09,332-Speed 2632.36 samples/sec   Loss 10.7372   LearningRate 0.0621   Epoch: 4   Global Step: 176050   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:13,224-Speed 2631.44 samples/sec   Loss 10.7157   LearningRate 0.0621   Epoch: 4   Global Step: 176060   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:17,225-Speed 2559.82 samples/sec   Loss 10.6932   LearningRate 0.0621   Epoch: 4   Global Step: 176070   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:21,120-Speed 2630.01 samples/sec   Loss 10.7112   LearningRate 0.0621   Epoch: 4   Global Step: 176080   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:30:25,021-Speed 2625.13 samples/sec   Loss 10.8546   LearningRate 0.0621   Epoch: 4   Global Step: 176090   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:30:28,927-Speed 2623.02 samples/sec   Loss 10.7189   LearningRate 0.0621   Epoch: 4   Global Step: 176100   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:30:32,821-Speed 2629.96 samples/sec   Loss 10.7999   LearningRate 0.0620   Epoch: 4   Global Step: 176110   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:36,714-Speed 2630.91 samples/sec   Loss 10.7788   LearningRate 0.0620   Epoch: 4   Global Step: 176120   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:40,614-Speed 2626.01 samples/sec   Loss 10.7185   LearningRate 0.0620   Epoch: 4   Global Step: 176130   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:44,514-Speed 2626.64 samples/sec   Loss 10.5909   LearningRate 0.0620   Epoch: 4   Global Step: 176140   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:48,412-Speed 2626.78 samples/sec   Loss 10.8180   LearningRate 0.0620   Epoch: 4   Global Step: 176150   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:52,318-Speed 2622.76 samples/sec   Loss 10.7990   LearningRate 0.0620   Epoch: 4   Global Step: 176160   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:30:56,211-Speed 2630.75 samples/sec   Loss 10.6689   LearningRate 0.0620   Epoch: 4   Global Step: 176170   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:00,116-Speed 2623.46 samples/sec   Loss 10.8172   LearningRate 0.0620   Epoch: 4   Global Step: 176180   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:04,016-Speed 2625.89 samples/sec   Loss 10.7653   LearningRate 0.0620   Epoch: 4   Global Step: 176190   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:07,916-Speed 2626.51 samples/sec   Loss 10.6732   LearningRate 0.0620   Epoch: 4   Global Step: 176200   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:11,809-Speed 2630.80 samples/sec   Loss 10.5973   LearningRate 0.0620   Epoch: 4   Global Step: 176210   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:31:15,685-Speed 2642.03 samples/sec   Loss 10.6459   LearningRate 0.0620   Epoch: 4   Global Step: 176220   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:19,591-Speed 2622.15 samples/sec   Loss 10.8487   LearningRate 0.0620   Epoch: 4   Global Step: 176230   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:23,485-Speed 2631.16 samples/sec   Loss 10.8097   LearningRate 0.0620   Epoch: 4   Global Step: 176240   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:27,381-Speed 2628.64 samples/sec   Loss 10.7433   LearningRate 0.0620   Epoch: 4   Global Step: 176250   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:31,277-Speed 2628.78 samples/sec   Loss 10.6186   LearningRate 0.0620   Epoch: 4   Global Step: 176260   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:35,183-Speed 2622.35 samples/sec   Loss 10.7398   LearningRate 0.0620   Epoch: 4   Global Step: 176270   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:39,149-Speed 2582.43 samples/sec   Loss 10.8364   LearningRate 0.0620   Epoch: 4   Global Step: 176280   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:43,063-Speed 2617.06 samples/sec   Loss 10.6513   LearningRate 0.0620   Epoch: 4   Global Step: 176290   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:46,954-Speed 2632.78 samples/sec   Loss 10.7697   LearningRate 0.0620   Epoch: 4   Global Step: 176300   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:50,859-Speed 2622.74 samples/sec   Loss 10.7081   LearningRate 0.0620   Epoch: 4   Global Step: 176310   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:31:54,761-Speed 2624.53 samples/sec   Loss 10.6848   LearningRate 0.0620   Epoch: 4   Global Step: 176320   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:31:58,666-Speed 2623.44 samples/sec   Loss 10.7962   LearningRate 0.0620   Epoch: 4   Global Step: 176330   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:32:02,561-Speed 2629.46 samples/sec   Loss 10.7381   LearningRate 0.0620   Epoch: 4   Global Step: 176340   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:32:06,454-Speed 2630.54 samples/sec   Loss 10.8210   LearningRate 0.0620   Epoch: 4   Global Step: 176350   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:32:10,354-Speed 2626.10 samples/sec   Loss 10.6238   LearningRate 0.0620   Epoch: 4   Global Step: 176360   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:32:14,265-Speed 2618.81 samples/sec   Loss 10.5145   LearningRate 0.0620   Epoch: 4   Global Step: 176370   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:32:18,195-Speed 2606.20 samples/sec   Loss 10.6527   LearningRate 0.0620   Epoch: 4   Global Step: 176380   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:32:22,106-Speed 2619.41 samples/sec   Loss 10.7560   LearningRate 0.0620   Epoch: 4   Global Step: 176390   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:32:25,986-Speed 2639.53 samples/sec   Loss 10.7127   LearningRate 0.0620   Epoch: 4   Global Step: 176400   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:29,895-Speed 2620.06 samples/sec   Loss 10.7619   LearningRate 0.0620   Epoch: 4   Global Step: 176410   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:33,800-Speed 2623.24 samples/sec   Loss 10.7750   LearningRate 0.0620   Epoch: 4   Global Step: 176420   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:38,085-Speed 2389.85 samples/sec   Loss 10.6890   LearningRate 0.0620   Epoch: 4   Global Step: 176430   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:41,986-Speed 2625.58 samples/sec   Loss 10.6991   LearningRate 0.0620   Epoch: 4   Global Step: 176440   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:45,888-Speed 2625.16 samples/sec   Loss 10.6888   LearningRate 0.0620   Epoch: 4   Global Step: 176450   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:49,783-Speed 2629.65 samples/sec   Loss 10.6977   LearningRate 0.0620   Epoch: 4   Global Step: 176460   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:53,681-Speed 2627.48 samples/sec   Loss 10.7055   LearningRate 0.0620   Epoch: 4   Global Step: 176470   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:32:57,586-Speed 2623.03 samples/sec   Loss 10.7352   LearningRate 0.0620   Epoch: 4   Global Step: 176480   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:01,483-Speed 2627.95 samples/sec   Loss 10.7077   LearningRate 0.0620   Epoch: 4   Global Step: 176490   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:05,380-Speed 2628.27 samples/sec   Loss 10.6808   LearningRate 0.0620   Epoch: 4   Global Step: 176500   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:33:09,281-Speed 2625.84 samples/sec   Loss 10.7724   LearningRate 0.0620   Epoch: 4   Global Step: 176510   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:33:13,173-Speed 2631.35 samples/sec   Loss 10.5734   LearningRate 0.0620   Epoch: 4   Global Step: 176520   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:33:17,057-Speed 2636.77 samples/sec   Loss 10.6240   LearningRate 0.0620   Epoch: 4   Global Step: 176530   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:20,943-Speed 2636.13 samples/sec   Loss 10.6511   LearningRate 0.0620   Epoch: 4   Global Step: 176540   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:24,837-Speed 2630.34 samples/sec   Loss 10.8826   LearningRate 0.0620   Epoch: 4   Global Step: 176550   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:28,741-Speed 2623.70 samples/sec   Loss 10.7811   LearningRate 0.0620   Epoch: 4   Global Step: 176560   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:32,655-Speed 2616.92 samples/sec   Loss 10.6497   LearningRate 0.0620   Epoch: 4   Global Step: 176570   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:36,572-Speed 2633.99 samples/sec   Loss 10.6863   LearningRate 0.0620   Epoch: 4   Global Step: 176580   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:40,463-Speed 2632.11 samples/sec   Loss 10.6572   LearningRate 0.0620   Epoch: 4   Global Step: 176590   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:44,354-Speed 2631.93 samples/sec   Loss 10.7450   LearningRate 0.0620   Epoch: 4   Global Step: 176600   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:48,251-Speed 2627.90 samples/sec   Loss 10.6378   LearningRate 0.0620   Epoch: 4   Global Step: 176610   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:52,293-Speed 2632.38 samples/sec   Loss 10.7356   LearningRate 0.0620   Epoch: 4   Global Step: 176620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:33:56,366-Speed 2609.40 samples/sec   Loss 10.7222   LearningRate 0.0620   Epoch: 4   Global Step: 176630   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:34:00,265-Speed 2627.23 samples/sec   Loss 10.6636   LearningRate 0.0619   Epoch: 4   Global Step: 176640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:04,299-Speed 2635.71 samples/sec   Loss 10.6590   LearningRate 0.0619   Epoch: 4   Global Step: 176650   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:08,201-Speed 2625.49 samples/sec   Loss 10.6195   LearningRate 0.0619   Epoch: 4   Global Step: 176660   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:12,110-Speed 2619.96 samples/sec   Loss 10.7043   LearningRate 0.0619   Epoch: 4   Global Step: 176670   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:16,016-Speed 2622.14 samples/sec   Loss 10.7197   LearningRate 0.0619   Epoch: 4   Global Step: 176680   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:19,946-Speed 2634.07 samples/sec   Loss 10.5888   LearningRate 0.0619   Epoch: 4   Global Step: 176690   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:23,853-Speed 2621.57 samples/sec   Loss 10.7515   LearningRate 0.0619   Epoch: 4   Global Step: 176700   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:27,756-Speed 2624.26 samples/sec   Loss 10.6949   LearningRate 0.0619   Epoch: 4   Global Step: 176710   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:31,666-Speed 2619.41 samples/sec   Loss 10.6482   LearningRate 0.0619   Epoch: 4   Global Step: 176720   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:35,572-Speed 2622.13 samples/sec   Loss 10.8454   LearningRate 0.0619   Epoch: 4   Global Step: 176730   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:34:39,471-Speed 2626.37 samples/sec   Loss 10.7773   LearningRate 0.0619   Epoch: 4   Global Step: 176740   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:34:43,370-Speed 2626.82 samples/sec   Loss 10.6781   LearningRate 0.0619   Epoch: 4   Global Step: 176750   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:34:47,267-Speed 2629.08 samples/sec   Loss 10.6456   LearningRate 0.0619   Epoch: 4   Global Step: 176760   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:34:51,158-Speed 2631.78 samples/sec   Loss 10.7127   LearningRate 0.0619   Epoch: 4   Global Step: 176770   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:34:55,050-Speed 2632.32 samples/sec   Loss 10.8820   LearningRate 0.0619   Epoch: 4   Global Step: 176780   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:34:59,020-Speed 2579.20 samples/sec   Loss 10.7867   LearningRate 0.0619   Epoch: 4   Global Step: 176790   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:35:02,921-Speed 2625.43 samples/sec   Loss 10.7515   LearningRate 0.0619   Epoch: 4   Global Step: 176800   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:35:06,832-Speed 2619.11 samples/sec   Loss 10.6160   LearningRate 0.0619   Epoch: 4   Global Step: 176810   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:35:10,728-Speed 2628.59 samples/sec   Loss 10.7990   LearningRate 0.0619   Epoch: 4   Global Step: 176820   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:14,632-Speed 2624.09 samples/sec   Loss 10.7396   LearningRate 0.0619   Epoch: 4   Global Step: 176830   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:18,546-Speed 2616.55 samples/sec   Loss 10.6700   LearningRate 0.0619   Epoch: 4   Global Step: 176840   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:22,454-Speed 2620.75 samples/sec   Loss 10.6871   LearningRate 0.0619   Epoch: 4   Global Step: 176850   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:26,350-Speed 2629.56 samples/sec   Loss 10.6838   LearningRate 0.0619   Epoch: 4   Global Step: 176860   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:30,240-Speed 2633.15 samples/sec   Loss 10.6471   LearningRate 0.0619   Epoch: 4   Global Step: 176870   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:34,133-Speed 2630.42 samples/sec   Loss 10.7171   LearningRate 0.0619   Epoch: 4   Global Step: 176880   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:38,024-Speed 2632.44 samples/sec   Loss 10.5183   LearningRate 0.0619   Epoch: 4   Global Step: 176890   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:41,915-Speed 2632.17 samples/sec   Loss 10.6828   LearningRate 0.0619   Epoch: 4   Global Step: 176900   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:45,809-Speed 2629.86 samples/sec   Loss 10.7663   LearningRate 0.0619   Epoch: 4   Global Step: 176910   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:49,712-Speed 2624.05 samples/sec   Loss 10.6216   LearningRate 0.0619   Epoch: 4   Global Step: 176920   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:35:53,592-Speed 2640.31 samples/sec   Loss 10.7263   LearningRate 0.0619   Epoch: 4   Global Step: 176930   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:35:57,498-Speed 2621.57 samples/sec   Loss 10.6348   LearningRate 0.0619   Epoch: 4   Global Step: 176940   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:01,392-Speed 2631.52 samples/sec   Loss 10.7550   LearningRate 0.0619   Epoch: 4   Global Step: 176950   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:05,300-Speed 2620.94 samples/sec   Loss 10.8086   LearningRate 0.0619   Epoch: 4   Global Step: 176960   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:09,193-Speed 2630.68 samples/sec   Loss 10.7674   LearningRate 0.0619   Epoch: 4   Global Step: 176970   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:13,088-Speed 2629.81 samples/sec   Loss 10.6824   LearningRate 0.0619   Epoch: 4   Global Step: 176980   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:16,983-Speed 2629.73 samples/sec   Loss 10.6777   LearningRate 0.0619   Epoch: 4   Global Step: 176990   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:20,886-Speed 2624.28 samples/sec   Loss 10.6437   LearningRate 0.0619   Epoch: 4   Global Step: 177000   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:24,790-Speed 2623.63 samples/sec   Loss 10.6668   LearningRate 0.0619   Epoch: 4   Global Step: 177010   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:28,694-Speed 2622.86 samples/sec   Loss 10.7645   LearningRate 0.0619   Epoch: 4   Global Step: 177020   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:36:32,595-Speed 2625.73 samples/sec   Loss 10.5933   LearningRate 0.0619   Epoch: 4   Global Step: 177030   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:36:36,498-Speed 2624.64 samples/sec   Loss 10.7008   LearningRate 0.0619   Epoch: 4   Global Step: 177040   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:36:40,393-Speed 2629.60 samples/sec   Loss 10.6659   LearningRate 0.0619   Epoch: 4   Global Step: 177050   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:36:44,284-Speed 2632.31 samples/sec   Loss 10.7712   LearningRate 0.0619   Epoch: 4   Global Step: 177060   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:36:48,177-Speed 2630.98 samples/sec   Loss 10.7532   LearningRate 0.0619   Epoch: 4   Global Step: 177070   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:36:52,076-Speed 2626.70 samples/sec   Loss 10.6842   LearningRate 0.0619   Epoch: 4   Global Step: 177080   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:36:56,076-Speed 2560.51 samples/sec   Loss 10.5223   LearningRate 0.0619   Epoch: 4   Global Step: 177090   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:36:59,979-Speed 2624.39 samples/sec   Loss 10.6365   LearningRate 0.0619   Epoch: 4   Global Step: 177100   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:37:03,872-Speed 2631.06 samples/sec   Loss 10.6782   LearningRate 0.0619   Epoch: 4   Global Step: 177110   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:37:07,755-Speed 2637.27 samples/sec   Loss 10.7403   LearningRate 0.0619   Epoch: 4   Global Step: 177120   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:37:11,641-Speed 2635.84 samples/sec   Loss 10.5900   LearningRate 0.0619   Epoch: 4   Global Step: 177130   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:37:15,541-Speed 2626.40 samples/sec   Loss 10.4670   LearningRate 0.0619   Epoch: 4   Global Step: 177140   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:37:19,447-Speed 2622.63 samples/sec   Loss 10.6346   LearningRate 0.0619   Epoch: 4   Global Step: 177150   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:37:23,322-Speed 2643.89 samples/sec   Loss 10.7860   LearningRate 0.0618   Epoch: 4   Global Step: 177160   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:27,215-Speed 2630.36 samples/sec   Loss 10.7039   LearningRate 0.0618   Epoch: 4   Global Step: 177170   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:31,107-Speed 2632.03 samples/sec   Loss 10.7579   LearningRate 0.0618   Epoch: 4   Global Step: 177180   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:35,008-Speed 2625.45 samples/sec   Loss 10.7255   LearningRate 0.0618   Epoch: 4   Global Step: 177190   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:38,898-Speed 2632.47 samples/sec   Loss 10.7378   LearningRate 0.0618   Epoch: 4   Global Step: 177200   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:42,793-Speed 2629.75 samples/sec   Loss 10.6898   LearningRate 0.0618   Epoch: 4   Global Step: 177210   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:46,687-Speed 2630.33 samples/sec   Loss 10.6790   LearningRate 0.0618   Epoch: 4   Global Step: 177220   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:50,925-Speed 2416.62 samples/sec   Loss 10.7366   LearningRate 0.0618   Epoch: 4   Global Step: 177230   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:54,821-Speed 2629.77 samples/sec   Loss 10.8375   LearningRate 0.0618   Epoch: 4   Global Step: 177240   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:37:58,723-Speed 2624.49 samples/sec   Loss 10.6167   LearningRate 0.0618   Epoch: 4   Global Step: 177250   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:38:02,614-Speed 2632.10 samples/sec   Loss 10.5710   LearningRate 0.0618   Epoch: 4   Global Step: 177260   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:06,507-Speed 2631.37 samples/sec   Loss 10.7370   LearningRate 0.0618   Epoch: 4   Global Step: 177270   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:10,427-Speed 2612.78 samples/sec   Loss 10.7185   LearningRate 0.0618   Epoch: 4   Global Step: 177280   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:14,334-Speed 2621.13 samples/sec   Loss 10.7166   LearningRate 0.0618   Epoch: 4   Global Step: 177290   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:18,254-Speed 2613.06 samples/sec   Loss 10.6369   LearningRate 0.0618   Epoch: 4   Global Step: 177300   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:22,154-Speed 2626.50 samples/sec   Loss 10.6174   LearningRate 0.0618   Epoch: 4   Global Step: 177310   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:26,069-Speed 2615.61 samples/sec   Loss 10.7197   LearningRate 0.0618   Epoch: 4   Global Step: 177320   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:29,987-Speed 2615.00 samples/sec   Loss 10.6972   LearningRate 0.0618   Epoch: 4   Global Step: 177330   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:33,890-Speed 2624.14 samples/sec   Loss 10.6162   LearningRate 0.0618   Epoch: 4   Global Step: 177340   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:37,817-Speed 2608.30 samples/sec   Loss 10.6185   LearningRate 0.0618   Epoch: 4   Global Step: 177350   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:41,720-Speed 2624.09 samples/sec   Loss 10.8037   LearningRate 0.0618   Epoch: 4   Global Step: 177360   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:38:45,602-Speed 2637.81 samples/sec   Loss 10.9066   LearningRate 0.0618   Epoch: 4   Global Step: 177370   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:49,495-Speed 2631.20 samples/sec   Loss 10.8019   LearningRate 0.0618   Epoch: 4   Global Step: 177380   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:53,388-Speed 2630.65 samples/sec   Loss 10.7085   LearningRate 0.0618   Epoch: 4   Global Step: 177390   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:38:57,280-Speed 2632.67 samples/sec   Loss 10.5987   LearningRate 0.0618   Epoch: 4   Global Step: 177400   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:01,178-Speed 2627.42 samples/sec   Loss 10.6530   LearningRate 0.0618   Epoch: 4   Global Step: 177410   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:05,085-Speed 2622.31 samples/sec   Loss 10.7282   LearningRate 0.0618   Epoch: 4   Global Step: 177420   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:08,975-Speed 2632.86 samples/sec   Loss 10.7099   LearningRate 0.0618   Epoch: 4   Global Step: 177430   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:12,874-Speed 2626.92 samples/sec   Loss 10.6297   LearningRate 0.0618   Epoch: 4   Global Step: 177440   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:16,792-Speed 2613.68 samples/sec   Loss 10.5446   LearningRate 0.0618   Epoch: 4   Global Step: 177450   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:20,686-Speed 2630.22 samples/sec   Loss 10.5561   LearningRate 0.0618   Epoch: 4   Global Step: 177460   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:24,568-Speed 2638.44 samples/sec   Loss 10.6395   LearningRate 0.0618   Epoch: 4   Global Step: 177470   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:28,464-Speed 2629.49 samples/sec   Loss 10.7069   LearningRate 0.0618   Epoch: 4   Global Step: 177480   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:32,362-Speed 2626.98 samples/sec   Loss 10.7957   LearningRate 0.0618   Epoch: 4   Global Step: 177490   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:36,254-Speed 2632.13 samples/sec   Loss 10.7293   LearningRate 0.0618   Epoch: 4   Global Step: 177500   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:40,153-Speed 2626.92 samples/sec   Loss 10.6744   LearningRate 0.0618   Epoch: 4   Global Step: 177510   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:44,041-Speed 2633.92 samples/sec   Loss 10.6719   LearningRate 0.0618   Epoch: 4   Global Step: 177520   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:47,943-Speed 2625.26 samples/sec   Loss 10.7571   LearningRate 0.0618   Epoch: 4   Global Step: 177530   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:51,846-Speed 2623.80 samples/sec   Loss 10.7392   LearningRate 0.0618   Epoch: 4   Global Step: 177540   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:55,742-Speed 2629.35 samples/sec   Loss 10.6444   LearningRate 0.0618   Epoch: 4   Global Step: 177550   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:39:59,635-Speed 2630.84 samples/sec   Loss 10.6213   LearningRate 0.0618   Epoch: 4   Global Step: 177560   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:03,529-Speed 2630.41 samples/sec   Loss 10.8568   LearningRate 0.0618   Epoch: 4   Global Step: 177570   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:40:07,403-Speed 2643.43 samples/sec   Loss 10.7055   LearningRate 0.0618   Epoch: 4   Global Step: 177580   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:11,294-Speed 2632.32 samples/sec   Loss 10.7894   LearningRate 0.0618   Epoch: 4   Global Step: 177590   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:15,190-Speed 2628.89 samples/sec   Loss 10.6767   LearningRate 0.0618   Epoch: 4   Global Step: 177600   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:19,091-Speed 2625.45 samples/sec   Loss 10.7030   LearningRate 0.0618   Epoch: 4   Global Step: 177610   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:22,991-Speed 2626.49 samples/sec   Loss 10.6369   LearningRate 0.0618   Epoch: 4   Global Step: 177620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:26,901-Speed 2620.32 samples/sec   Loss 10.5948   LearningRate 0.0618   Epoch: 4   Global Step: 177630   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:30,801-Speed 2625.93 samples/sec   Loss 10.8084   LearningRate 0.0618   Epoch: 4   Global Step: 177640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:34,696-Speed 2629.33 samples/sec   Loss 10.8131   LearningRate 0.0618   Epoch: 4   Global Step: 177650   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:40:38,574-Speed 2641.48 samples/sec   Loss 10.6426   LearningRate 0.0618   Epoch: 4   Global Step: 177660   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:40:42,450-Speed 2642.23 samples/sec   Loss 10.7322   LearningRate 0.0618   Epoch: 4   Global Step: 177670   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:40:46,352-Speed 2624.85 samples/sec   Loss 10.7065   LearningRate 0.0618   Epoch: 4   Global Step: 177680   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:40:50,245-Speed 2630.81 samples/sec   Loss 10.6451   LearningRate 0.0617   Epoch: 4   Global Step: 177690   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:40:54,140-Speed 2629.84 samples/sec   Loss 10.8034   LearningRate 0.0617   Epoch: 4   Global Step: 177700   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:40:58,035-Speed 2629.82 samples/sec   Loss 10.6571   LearningRate 0.0617   Epoch: 4   Global Step: 177710   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:41:01,935-Speed 2626.00 samples/sec   Loss 10.6810   LearningRate 0.0617   Epoch: 4   Global Step: 177720   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:41:05,831-Speed 2629.34 samples/sec   Loss 10.7038   LearningRate 0.0617   Epoch: 4   Global Step: 177730   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:41:09,725-Speed 2629.57 samples/sec   Loss 10.7648   LearningRate 0.0617   Epoch: 4   Global Step: 177740   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:41:13,618-Speed 2631.65 samples/sec   Loss 10.7444   LearningRate 0.0617   Epoch: 4   Global Step: 177750   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:41:17,507-Speed 2633.59 samples/sec   Loss 10.6907   LearningRate 0.0617   Epoch: 4   Global Step: 177760   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:41:21,400-Speed 2630.51 samples/sec   Loss 10.6934   LearningRate 0.0617   Epoch: 4   Global Step: 177770   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:25,292-Speed 2631.87 samples/sec   Loss 10.7140   LearningRate 0.0617   Epoch: 4   Global Step: 177780   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:29,187-Speed 2629.56 samples/sec   Loss 10.6796   LearningRate 0.0617   Epoch: 4   Global Step: 177790   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:33,092-Speed 2623.01 samples/sec   Loss 10.8943   LearningRate 0.0617   Epoch: 4   Global Step: 177800   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:36,984-Speed 2631.55 samples/sec   Loss 10.8446   LearningRate 0.0617   Epoch: 4   Global Step: 177810   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:40,878-Speed 2630.11 samples/sec   Loss 10.7733   LearningRate 0.0617   Epoch: 4   Global Step: 177820   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:44,782-Speed 2623.54 samples/sec   Loss 10.7231   LearningRate 0.0617   Epoch: 4   Global Step: 177830   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:48,673-Speed 2634.99 samples/sec   Loss 10.7276   LearningRate 0.0617   Epoch: 4   Global Step: 177840   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:52,569-Speed 2628.97 samples/sec   Loss 10.6892   LearningRate 0.0617   Epoch: 4   Global Step: 177850   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:41:56,460-Speed 2632.63 samples/sec   Loss 10.6792   LearningRate 0.0617   Epoch: 4   Global Step: 177860   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:42:00,357-Speed 2628.10 samples/sec   Loss 10.6002   LearningRate 0.0617   Epoch: 4   Global Step: 177870   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:04,253-Speed 2628.99 samples/sec   Loss 10.7297   LearningRate 0.0617   Epoch: 4   Global Step: 177880   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:08,146-Speed 2630.87 samples/sec   Loss 10.7052   LearningRate 0.0617   Epoch: 4   Global Step: 177890   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:12,047-Speed 2625.62 samples/sec   Loss 10.7505   LearningRate 0.0617   Epoch: 4   Global Step: 177900   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:15,940-Speed 2630.73 samples/sec   Loss 10.7265   LearningRate 0.0617   Epoch: 4   Global Step: 177910   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:19,830-Speed 2633.81 samples/sec   Loss 10.6468   LearningRate 0.0617   Epoch: 4   Global Step: 177920   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:23,723-Speed 2630.30 samples/sec   Loss 10.7412   LearningRate 0.0617   Epoch: 4   Global Step: 177930   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:27,618-Speed 2629.83 samples/sec   Loss 10.7731   LearningRate 0.0617   Epoch: 4   Global Step: 177940   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:31,520-Speed 2625.09 samples/sec   Loss 10.7229   LearningRate 0.0617   Epoch: 4   Global Step: 177950   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:35,441-Speed 2611.78 samples/sec   Loss 10.7386   LearningRate 0.0617   Epoch: 4   Global Step: 177960   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:39,340-Speed 2626.89 samples/sec   Loss 10.8662   LearningRate 0.0617   Epoch: 4   Global Step: 177970   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:42:43,231-Speed 2632.46 samples/sec   Loss 10.6399   LearningRate 0.0617   Epoch: 4   Global Step: 177980   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:42:47,124-Speed 2631.52 samples/sec   Loss 10.7031   LearningRate 0.0617   Epoch: 4   Global Step: 177990   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:42:51,020-Speed 2629.01 samples/sec   Loss 10.7105   LearningRate 0.0617   Epoch: 4   Global Step: 178000   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:42:54,896-Speed 2642.01 samples/sec   Loss 10.5960   LearningRate 0.0617   Epoch: 4   Global Step: 178010   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:42:58,793-Speed 2628.05 samples/sec   Loss 10.5568   LearningRate 0.0617   Epoch: 4   Global Step: 178020   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:02,689-Speed 2629.39 samples/sec   Loss 10.6569   LearningRate 0.0617   Epoch: 4   Global Step: 178030   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:06,583-Speed 2629.97 samples/sec   Loss 10.7621   LearningRate 0.0617   Epoch: 4   Global Step: 178040   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:10,492-Speed 2620.12 samples/sec   Loss 10.7767   LearningRate 0.0617   Epoch: 4   Global Step: 178050   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:14,390-Speed 2627.71 samples/sec   Loss 10.6641   LearningRate 0.0617   Epoch: 4   Global Step: 178060   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:18,285-Speed 2629.56 samples/sec   Loss 10.6103   LearningRate 0.0617   Epoch: 4   Global Step: 178070   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:22,179-Speed 2630.16 samples/sec   Loss 10.8313   LearningRate 0.0617   Epoch: 4   Global Step: 178080   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:26,075-Speed 2629.07 samples/sec   Loss 10.7969   LearningRate 0.0617   Epoch: 4   Global Step: 178090   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:29,968-Speed 2631.14 samples/sec   Loss 10.6215   LearningRate 0.0617   Epoch: 4   Global Step: 178100   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:43:33,862-Speed 2630.06 samples/sec   Loss 10.7521   LearningRate 0.0617   Epoch: 4   Global Step: 178110   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:43:37,754-Speed 2631.28 samples/sec   Loss 10.7286   LearningRate 0.0617   Epoch: 4   Global Step: 178120   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:43:41,650-Speed 2629.64 samples/sec   Loss 10.7450   LearningRate 0.0617   Epoch: 4   Global Step: 178130   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:43:45,560-Speed 2618.97 samples/sec   Loss 10.7084   LearningRate 0.0617   Epoch: 4   Global Step: 178140   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:43:49,463-Speed 2624.30 samples/sec   Loss 10.7583   LearningRate 0.0617   Epoch: 4   Global Step: 178150   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:43:53,356-Speed 2630.66 samples/sec   Loss 10.7655   LearningRate 0.0617   Epoch: 4   Global Step: 178160   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:43:57,249-Speed 2631.52 samples/sec   Loss 10.6423   LearningRate 0.0617   Epoch: 4   Global Step: 178170   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:01,188-Speed 2600.41 samples/sec   Loss 10.7013   LearningRate 0.0617   Epoch: 4   Global Step: 178180   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:05,084-Speed 2628.53 samples/sec   Loss 10.5544   LearningRate 0.0617   Epoch: 4   Global Step: 178190   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:08,984-Speed 2625.90 samples/sec   Loss 10.7151   LearningRate 0.0617   Epoch: 4   Global Step: 178200   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:12,963-Speed 2574.20 samples/sec   Loss 10.8585   LearningRate 0.0617   Epoch: 4   Global Step: 178210   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:16,916-Speed 2590.92 samples/sec   Loss 10.6912   LearningRate 0.0616   Epoch: 4   Global Step: 178220   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:20,812-Speed 2628.75 samples/sec   Loss 10.7455   LearningRate 0.0616   Epoch: 4   Global Step: 178230   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:24,712-Speed 2626.24 samples/sec   Loss 10.6219   LearningRate 0.0616   Epoch: 4   Global Step: 178240   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:28,619-Speed 2621.78 samples/sec   Loss 10.6290   LearningRate 0.0616   Epoch: 4   Global Step: 178250   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:32,513-Speed 2630.60 samples/sec   Loss 10.6873   LearningRate 0.0616   Epoch: 4   Global Step: 178260   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:36,404-Speed 2632.71 samples/sec   Loss 10.8280   LearningRate 0.0616   Epoch: 4   Global Step: 178270   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:40,312-Speed 2620.81 samples/sec   Loss 10.6867   LearningRate 0.0616   Epoch: 4   Global Step: 178280   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:44,230-Speed 2614.08 samples/sec   Loss 10.7628   LearningRate 0.0616   Epoch: 4   Global Step: 178290   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:48,138-Speed 2620.84 samples/sec   Loss 10.7297   LearningRate 0.0616   Epoch: 4   Global Step: 178300   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:52,022-Speed 2636.88 samples/sec   Loss 10.6877   LearningRate 0.0616   Epoch: 4   Global Step: 178310   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:55,928-Speed 2622.52 samples/sec   Loss 10.6968   LearningRate 0.0616   Epoch: 4   Global Step: 178320   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:44:59,820-Speed 2631.43 samples/sec   Loss 10.6623   LearningRate 0.0616   Epoch: 4   Global Step: 178330   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:03,713-Speed 2630.59 samples/sec   Loss 10.8234   LearningRate 0.0616   Epoch: 4   Global Step: 178340   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:07,606-Speed 2631.58 samples/sec   Loss 10.6855   LearningRate 0.0616   Epoch: 4   Global Step: 178350   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:11,499-Speed 2631.23 samples/sec   Loss 10.6041   LearningRate 0.0616   Epoch: 4   Global Step: 178360   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:15,394-Speed 2629.08 samples/sec   Loss 10.5299   LearningRate 0.0616   Epoch: 4   Global Step: 178370   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:19,294-Speed 2626.08 samples/sec   Loss 10.6934   LearningRate 0.0616   Epoch: 4   Global Step: 178380   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:23,190-Speed 2628.84 samples/sec   Loss 10.6114   LearningRate 0.0616   Epoch: 4   Global Step: 178390   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:27,089-Speed 2627.10 samples/sec   Loss 10.6748   LearningRate 0.0616   Epoch: 4   Global Step: 178400   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:30,992-Speed 2623.96 samples/sec   Loss 10.6957   LearningRate 0.0616   Epoch: 4   Global Step: 178410   Fp16 Grad Scale: 524288   Required: 73 hours
Training: 2022-04-13 15:45:34,876-Speed 2637.20 samples/sec   Loss 10.6529   LearningRate 0.0616   Epoch: 4   Global Step: 178420   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:38,770-Speed 2630.70 samples/sec   Loss 10.7334   LearningRate 0.0616   Epoch: 4   Global Step: 178430   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:42,665-Speed 2629.65 samples/sec   Loss 10.6328   LearningRate 0.0616   Epoch: 4   Global Step: 178440   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:46,557-Speed 2631.20 samples/sec   Loss 10.6947   LearningRate 0.0616   Epoch: 4   Global Step: 178450   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:50,450-Speed 2630.95 samples/sec   Loss 10.5275   LearningRate 0.0616   Epoch: 4   Global Step: 178460   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:54,351-Speed 2625.72 samples/sec   Loss 10.8108   LearningRate 0.0616   Epoch: 4   Global Step: 178470   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:45:58,246-Speed 2629.35 samples/sec   Loss 10.6088   LearningRate 0.0616   Epoch: 4   Global Step: 178480   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:02,142-Speed 2629.14 samples/sec   Loss 10.5937   LearningRate 0.0616   Epoch: 4   Global Step: 178490   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:06,039-Speed 2628.52 samples/sec   Loss 10.7525   LearningRate 0.0616   Epoch: 4   Global Step: 178500   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:09,935-Speed 2628.54 samples/sec   Loss 10.7295   LearningRate 0.0616   Epoch: 4   Global Step: 178510   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:13,815-Speed 2639.29 samples/sec   Loss 10.7554   LearningRate 0.0616   Epoch: 4   Global Step: 178520   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:17,710-Speed 2629.47 samples/sec   Loss 10.7040   LearningRate 0.0616   Epoch: 4   Global Step: 178530   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:21,605-Speed 2630.63 samples/sec   Loss 10.7745   LearningRate 0.0616   Epoch: 4   Global Step: 178540   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:25,495-Speed 2632.38 samples/sec   Loss 10.7126   LearningRate 0.0616   Epoch: 4   Global Step: 178550   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:29,389-Speed 2630.57 samples/sec   Loss 10.6019   LearningRate 0.0616   Epoch: 4   Global Step: 178560   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:33,282-Speed 2631.17 samples/sec   Loss 10.6452   LearningRate 0.0616   Epoch: 4   Global Step: 178570   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:37,176-Speed 2629.60 samples/sec   Loss 10.8244   LearningRate 0.0616   Epoch: 4   Global Step: 178580   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:41,072-Speed 2628.89 samples/sec   Loss 10.6246   LearningRate 0.0616   Epoch: 4   Global Step: 178590   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:44,972-Speed 2626.50 samples/sec   Loss 10.6387   LearningRate 0.0616   Epoch: 4   Global Step: 178600   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:46:48,851-Speed 2639.91 samples/sec   Loss 10.6630   LearningRate 0.0616   Epoch: 4   Global Step: 178610   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:46:52,758-Speed 2622.12 samples/sec   Loss 10.7916   LearningRate 0.0616   Epoch: 4   Global Step: 178620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:46:56,659-Speed 2624.99 samples/sec   Loss 10.8277   LearningRate 0.0616   Epoch: 4   Global Step: 178630   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:00,559-Speed 2627.03 samples/sec   Loss 10.6252   LearningRate 0.0616   Epoch: 4   Global Step: 178640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:04,451-Speed 2631.30 samples/sec   Loss 10.7371   LearningRate 0.0616   Epoch: 4   Global Step: 178650   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:08,347-Speed 2629.29 samples/sec   Loss 10.6699   LearningRate 0.0616   Epoch: 4   Global Step: 178660   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:12,241-Speed 2629.93 samples/sec   Loss 10.5969   LearningRate 0.0616   Epoch: 4   Global Step: 178670   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:16,136-Speed 2629.53 samples/sec   Loss 10.5907   LearningRate 0.0616   Epoch: 4   Global Step: 178680   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:20,030-Speed 2630.25 samples/sec   Loss 10.6133   LearningRate 0.0616   Epoch: 4   Global Step: 178690   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:23,936-Speed 2622.20 samples/sec   Loss 10.5881   LearningRate 0.0616   Epoch: 4   Global Step: 178700   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:27,827-Speed 2632.51 samples/sec   Loss 10.8299   LearningRate 0.0616   Epoch: 4   Global Step: 178710   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:47:31,722-Speed 2628.89 samples/sec   Loss 10.6099   LearningRate 0.0616   Epoch: 4   Global Step: 178720   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:47:35,617-Speed 2629.98 samples/sec   Loss 10.7039   LearningRate 0.0616   Epoch: 4   Global Step: 178730   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:47:39,522-Speed 2623.15 samples/sec   Loss 10.6398   LearningRate 0.0616   Epoch: 4   Global Step: 178740   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:47:43,429-Speed 2621.58 samples/sec   Loss 10.4824   LearningRate 0.0615   Epoch: 4   Global Step: 178750   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:47:47,330-Speed 2625.20 samples/sec   Loss 10.4868   LearningRate 0.0615   Epoch: 4   Global Step: 178760   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:47:51,224-Speed 2630.64 samples/sec   Loss 10.5890   LearningRate 0.0615   Epoch: 4   Global Step: 178770   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:47:55,098-Speed 2643.72 samples/sec   Loss 10.5201   LearningRate 0.0615   Epoch: 4   Global Step: 178780   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:47:58,988-Speed 2632.70 samples/sec   Loss 10.6658   LearningRate 0.0615   Epoch: 4   Global Step: 178790   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:02,882-Speed 2630.28 samples/sec   Loss 10.7692   LearningRate 0.0615   Epoch: 4   Global Step: 178800   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:06,777-Speed 2629.24 samples/sec   Loss 10.7885   LearningRate 0.0615   Epoch: 4   Global Step: 178810   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:10,670-Speed 2630.95 samples/sec   Loss 10.7999   LearningRate 0.0615   Epoch: 4   Global Step: 178820   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:14,565-Speed 2630.11 samples/sec   Loss 10.5970   LearningRate 0.0615   Epoch: 4   Global Step: 178830   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:18,459-Speed 2630.36 samples/sec   Loss 10.7011   LearningRate 0.0615   Epoch: 4   Global Step: 178840   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:22,351-Speed 2631.29 samples/sec   Loss 10.6921   LearningRate 0.0615   Epoch: 4   Global Step: 178850   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:26,244-Speed 2631.07 samples/sec   Loss 10.5355   LearningRate 0.0615   Epoch: 4   Global Step: 178860   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:30,318-Speed 2514.26 samples/sec   Loss 10.7501   LearningRate 0.0615   Epoch: 4   Global Step: 178870   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:34,355-Speed 2536.71 samples/sec   Loss 10.6171   LearningRate 0.0615   Epoch: 4   Global Step: 178880   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:38,286-Speed 2605.51 samples/sec   Loss 10.6193   LearningRate 0.0615   Epoch: 4   Global Step: 178890   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:42,222-Speed 2602.38 samples/sec   Loss 10.7977   LearningRate 0.0615   Epoch: 4   Global Step: 178900   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:46,140-Speed 2613.74 samples/sec   Loss 10.6376   LearningRate 0.0615   Epoch: 4   Global Step: 178910   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:50,033-Speed 2631.44 samples/sec   Loss 10.7728   LearningRate 0.0615   Epoch: 4   Global Step: 178920   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:53,931-Speed 2627.75 samples/sec   Loss 10.5955   LearningRate 0.0615   Epoch: 4   Global Step: 178930   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:48:57,840-Speed 2621.01 samples/sec   Loss 10.6787   LearningRate 0.0615   Epoch: 4   Global Step: 178940   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:01,745-Speed 2622.37 samples/sec   Loss 10.6520   LearningRate 0.0615   Epoch: 4   Global Step: 178950   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:05,637-Speed 2631.57 samples/sec   Loss 10.7467   LearningRate 0.0615   Epoch: 4   Global Step: 178960   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:09,532-Speed 2629.26 samples/sec   Loss 10.7096   LearningRate 0.0615   Epoch: 4   Global Step: 178970   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:13,493-Speed 2586.29 samples/sec   Loss 10.8246   LearningRate 0.0615   Epoch: 4   Global Step: 178980   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:49:17,394-Speed 2625.31 samples/sec   Loss 10.6916   LearningRate 0.0615   Epoch: 4   Global Step: 178990   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:49:21,271-Speed 2641.74 samples/sec   Loss 10.6205   LearningRate 0.0615   Epoch: 4   Global Step: 179000   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:25,163-Speed 2631.73 samples/sec   Loss 10.7163   LearningRate 0.0615   Epoch: 4   Global Step: 179010   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:29,057-Speed 2630.77 samples/sec   Loss 10.6433   LearningRate 0.0615   Epoch: 4   Global Step: 179020   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:32,951-Speed 2630.28 samples/sec   Loss 10.5948   LearningRate 0.0615   Epoch: 4   Global Step: 179030   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:36,843-Speed 2631.65 samples/sec   Loss 10.6182   LearningRate 0.0615   Epoch: 4   Global Step: 179040   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:40,735-Speed 2631.05 samples/sec   Loss 10.6851   LearningRate 0.0615   Epoch: 4   Global Step: 179050   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:44,631-Speed 2629.08 samples/sec   Loss 10.4721   LearningRate 0.0615   Epoch: 4   Global Step: 179060   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:49:48,478-Speed 2662.72 samples/sec   Loss 11.4678   LearningRate 0.0615   Epoch: 4   Global Step: 179070   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:49:52,371-Speed 2630.80 samples/sec   Loss 11.3414   LearningRate 0.0615   Epoch: 4   Global Step: 179080   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:49:56,262-Speed 2632.50 samples/sec   Loss 11.0457   LearningRate 0.0615   Epoch: 4   Global Step: 179090   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:00,158-Speed 2628.37 samples/sec   Loss 10.8251   LearningRate 0.0615   Epoch: 4   Global Step: 179100   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:04,049-Speed 2632.94 samples/sec   Loss 10.6977   LearningRate 0.0615   Epoch: 4   Global Step: 179110   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:07,937-Speed 2634.26 samples/sec   Loss 10.7251   LearningRate 0.0615   Epoch: 4   Global Step: 179120   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:11,830-Speed 2631.24 samples/sec   Loss 10.7323   LearningRate 0.0615   Epoch: 4   Global Step: 179130   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:15,720-Speed 2632.38 samples/sec   Loss 10.6973   LearningRate 0.0615   Epoch: 4   Global Step: 179140   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:19,615-Speed 2629.35 samples/sec   Loss 10.7236   LearningRate 0.0615   Epoch: 4   Global Step: 179150   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:23,516-Speed 2625.79 samples/sec   Loss 10.7575   LearningRate 0.0615   Epoch: 4   Global Step: 179160   Fp16 Grad Scale: 16384   Required: 73 hours
Training: 2022-04-13 15:50:27,399-Speed 2637.80 samples/sec   Loss 10.6121   LearningRate 0.0615   Epoch: 4   Global Step: 179170   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:31,294-Speed 2629.71 samples/sec   Loss 10.7194   LearningRate 0.0615   Epoch: 4   Global Step: 179180   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:35,189-Speed 2629.34 samples/sec   Loss 10.7210   LearningRate 0.0615   Epoch: 4   Global Step: 179190   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:39,083-Speed 2630.86 samples/sec   Loss 10.8128   LearningRate 0.0615   Epoch: 4   Global Step: 179200   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:42,976-Speed 2631.26 samples/sec   Loss 10.8504   LearningRate 0.0615   Epoch: 4   Global Step: 179210   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:46,865-Speed 2633.12 samples/sec   Loss 10.7281   LearningRate 0.0615   Epoch: 4   Global Step: 179220   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:50,756-Speed 2632.29 samples/sec   Loss 10.8003   LearningRate 0.0615   Epoch: 4   Global Step: 179230   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:54,645-Speed 2633.32 samples/sec   Loss 10.7831   LearningRate 0.0615   Epoch: 4   Global Step: 179240   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:50:58,538-Speed 2631.54 samples/sec   Loss 10.6746   LearningRate 0.0615   Epoch: 4   Global Step: 179250   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:51:02,432-Speed 2630.33 samples/sec   Loss 10.5514   LearningRate 0.0615   Epoch: 4   Global Step: 179260   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 15:51:06,324-Speed 2631.98 samples/sec   Loss 10.6155   LearningRate 0.0615   Epoch: 4   Global Step: 179270   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:10,216-Speed 2631.62 samples/sec   Loss 10.5372   LearningRate 0.0614   Epoch: 4   Global Step: 179280   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:14,109-Speed 2631.31 samples/sec   Loss 10.6337   LearningRate 0.0614   Epoch: 4   Global Step: 179290   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:18,004-Speed 2629.74 samples/sec   Loss 11.4906   LearningRate 0.0614   Epoch: 4   Global Step: 179300   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:21,895-Speed 2632.23 samples/sec   Loss 11.0863   LearningRate 0.0614   Epoch: 4   Global Step: 179310   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:25,790-Speed 2629.39 samples/sec   Loss 10.8555   LearningRate 0.0614   Epoch: 4   Global Step: 179320   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:29,681-Speed 2632.50 samples/sec   Loss 10.8270   LearningRate 0.0614   Epoch: 4   Global Step: 179330   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:33,574-Speed 2630.85 samples/sec   Loss 10.7821   LearningRate 0.0614   Epoch: 4   Global Step: 179340   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:37,466-Speed 2631.54 samples/sec   Loss 10.7586   LearningRate 0.0614   Epoch: 4   Global Step: 179350   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:41,356-Speed 2632.88 samples/sec   Loss 10.6942   LearningRate 0.0614   Epoch: 4   Global Step: 179360   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:51:45,248-Speed 2631.72 samples/sec   Loss 10.7069   LearningRate 0.0614   Epoch: 4   Global Step: 179370   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:51:49,141-Speed 2630.64 samples/sec   Loss 10.7330   LearningRate 0.0614   Epoch: 4   Global Step: 179380   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:51:53,070-Speed 2607.33 samples/sec   Loss 10.8350   LearningRate 0.0614   Epoch: 4   Global Step: 179390   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:51:56,994-Speed 2610.61 samples/sec   Loss 10.6367   LearningRate 0.0614   Epoch: 4   Global Step: 179400   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:00,887-Speed 2630.55 samples/sec   Loss 10.6038   LearningRate 0.0614   Epoch: 4   Global Step: 179410   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:04,809-Speed 2612.06 samples/sec   Loss 10.7514   LearningRate 0.0614   Epoch: 4   Global Step: 179420   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:08,706-Speed 2628.32 samples/sec   Loss 10.6394   LearningRate 0.0614   Epoch: 4   Global Step: 179430   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:12,598-Speed 2631.94 samples/sec   Loss 10.6900   LearningRate 0.0614   Epoch: 4   Global Step: 179440   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:16,515-Speed 2614.72 samples/sec   Loss 10.6217   LearningRate 0.0614   Epoch: 4   Global Step: 179450   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:20,410-Speed 2630.00 samples/sec   Loss 10.7656   LearningRate 0.0614   Epoch: 4   Global Step: 179460   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:24,310-Speed 2626.08 samples/sec   Loss 10.7729   LearningRate 0.0614   Epoch: 4   Global Step: 179470   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:52:28,202-Speed 2631.67 samples/sec   Loss 10.6612   LearningRate 0.0614   Epoch: 4   Global Step: 179480   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:52:32,104-Speed 2624.75 samples/sec   Loss 10.7255   LearningRate 0.0614   Epoch: 4   Global Step: 179490   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:52:35,980-Speed 2642.33 samples/sec   Loss 10.6804   LearningRate 0.0614   Epoch: 4   Global Step: 179500   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:39,872-Speed 2631.80 samples/sec   Loss 10.6216   LearningRate 0.0614   Epoch: 4   Global Step: 179510   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:43,762-Speed 2633.08 samples/sec   Loss 10.7651   LearningRate 0.0614   Epoch: 4   Global Step: 179520   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:47,653-Speed 2632.22 samples/sec   Loss 10.5641   LearningRate 0.0614   Epoch: 4   Global Step: 179530   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:51,547-Speed 2630.63 samples/sec   Loss 10.6669   LearningRate 0.0614   Epoch: 4   Global Step: 179540   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:55,438-Speed 2631.78 samples/sec   Loss 10.7712   LearningRate 0.0614   Epoch: 4   Global Step: 179550   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:52:59,330-Speed 2631.68 samples/sec   Loss 10.5663   LearningRate 0.0614   Epoch: 4   Global Step: 179560   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:03,224-Speed 2630.07 samples/sec   Loss 10.9089   LearningRate 0.0614   Epoch: 4   Global Step: 179570   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:07,119-Speed 2629.98 samples/sec   Loss 10.6575   LearningRate 0.0614   Epoch: 4   Global Step: 179580   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:11,012-Speed 2631.42 samples/sec   Loss 10.6964   LearningRate 0.0614   Epoch: 4   Global Step: 179590   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:14,893-Speed 2638.74 samples/sec   Loss 10.8384   LearningRate 0.0614   Epoch: 4   Global Step: 179600   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:18,790-Speed 2628.45 samples/sec   Loss 10.7343   LearningRate 0.0614   Epoch: 4   Global Step: 179610   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:22,680-Speed 2632.77 samples/sec   Loss 10.6719   LearningRate 0.0614   Epoch: 4   Global Step: 179620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:26,583-Speed 2624.86 samples/sec   Loss 10.8384   LearningRate 0.0614   Epoch: 4   Global Step: 179630   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:30,473-Speed 2632.87 samples/sec   Loss 10.6800   LearningRate 0.0614   Epoch: 4   Global Step: 179640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:53:34,356-Speed 2638.04 samples/sec   Loss 10.7496   LearningRate 0.0614   Epoch: 4   Global Step: 179650   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:53:38,246-Speed 2632.98 samples/sec   Loss 10.8088   LearningRate 0.0614   Epoch: 4   Global Step: 179660   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:53:42,139-Speed 2631.30 samples/sec   Loss 10.6878   LearningRate 0.0614   Epoch: 4   Global Step: 179670   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:53:46,030-Speed 2631.97 samples/sec   Loss 10.6731   LearningRate 0.0614   Epoch: 4   Global Step: 179680   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:53:49,922-Speed 2632.04 samples/sec   Loss 10.8416   LearningRate 0.0614   Epoch: 4   Global Step: 179690   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:53:53,816-Speed 2629.93 samples/sec   Loss 10.6913   LearningRate 0.0614   Epoch: 4   Global Step: 179700   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:53:57,708-Speed 2632.20 samples/sec   Loss 10.7023   LearningRate 0.0614   Epoch: 4   Global Step: 179710   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:54:01,598-Speed 2633.07 samples/sec   Loss 10.6951   LearningRate 0.0614   Epoch: 4   Global Step: 179720   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:54:05,505-Speed 2621.16 samples/sec   Loss 10.6363   LearningRate 0.0614   Epoch: 4   Global Step: 179730   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:54:09,400-Speed 2629.97 samples/sec   Loss 10.6136   LearningRate 0.0614   Epoch: 4   Global Step: 179740   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 15:54:13,296-Speed 2629.48 samples/sec   Loss 10.7743   LearningRate 0.0614   Epoch: 4   Global Step: 179750   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:17,194-Speed 2627.55 samples/sec   Loss 10.5550   LearningRate 0.0614   Epoch: 4   Global Step: 179760   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:21,091-Speed 2628.12 samples/sec   Loss 10.5721   LearningRate 0.0614   Epoch: 4   Global Step: 179770   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:24,982-Speed 2632.46 samples/sec   Loss 10.6277   LearningRate 0.0614   Epoch: 4   Global Step: 179780   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:28,875-Speed 2631.35 samples/sec   Loss 10.7339   LearningRate 0.0614   Epoch: 4   Global Step: 179790   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:32,766-Speed 2631.99 samples/sec   Loss 10.7212   LearningRate 0.0614   Epoch: 4   Global Step: 179800   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:36,671-Speed 2622.63 samples/sec   Loss 10.8057   LearningRate 0.0613   Epoch: 4   Global Step: 179810   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:40,574-Speed 2624.23 samples/sec   Loss 10.5653   LearningRate 0.0613   Epoch: 4   Global Step: 179820   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:44,469-Speed 2630.05 samples/sec   Loss 10.7153   LearningRate 0.0613   Epoch: 4   Global Step: 179830   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:48,376-Speed 2621.89 samples/sec   Loss 10.7245   LearningRate 0.0613   Epoch: 4   Global Step: 179840   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:54:52,289-Speed 2617.62 samples/sec   Loss 10.6649   LearningRate 0.0613   Epoch: 4   Global Step: 179850   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:54:56,240-Speed 2592.14 samples/sec   Loss 10.5395   LearningRate 0.0613   Epoch: 4   Global Step: 179860   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:00,139-Speed 2627.10 samples/sec   Loss 10.7672   LearningRate 0.0613   Epoch: 4   Global Step: 179870   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:04,037-Speed 2627.84 samples/sec   Loss 10.7356   LearningRate 0.0613   Epoch: 4   Global Step: 179880   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:07,930-Speed 2630.56 samples/sec   Loss 10.6357   LearningRate 0.0613   Epoch: 4   Global Step: 179890   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:11,825-Speed 2630.17 samples/sec   Loss 10.7863   LearningRate 0.0613   Epoch: 4   Global Step: 179900   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:15,753-Speed 2607.18 samples/sec   Loss 10.6860   LearningRate 0.0613   Epoch: 4   Global Step: 179910   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:19,648-Speed 2630.43 samples/sec   Loss 10.6230   LearningRate 0.0613   Epoch: 4   Global Step: 179920   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:23,552-Speed 2623.49 samples/sec   Loss 10.5793   LearningRate 0.0613   Epoch: 4   Global Step: 179930   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:27,453-Speed 2625.59 samples/sec   Loss 10.6637   LearningRate 0.0613   Epoch: 4   Global Step: 179940   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:31,348-Speed 2630.10 samples/sec   Loss 10.5728   LearningRate 0.0613   Epoch: 4   Global Step: 179950   Fp16 Grad Scale: 524288   Required: 73 hours
Training: 2022-04-13 15:55:35,229-Speed 2639.09 samples/sec   Loss 10.7175   LearningRate 0.0613   Epoch: 4   Global Step: 179960   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:39,118-Speed 2633.78 samples/sec   Loss 10.7092   LearningRate 0.0613   Epoch: 4   Global Step: 179970   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:43,112-Speed 2564.91 samples/sec   Loss 10.7567   LearningRate 0.0613   Epoch: 4   Global Step: 179980   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:47,098-Speed 2569.48 samples/sec   Loss 10.4948   LearningRate 0.0613   Epoch: 4   Global Step: 179990   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:55:50,989-Speed 2631.95 samples/sec   Loss 10.6943   LearningRate 0.0613   Epoch: 4   Global Step: 180000   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:56:33,642-[lfw][180000]XNorm: 23.741681
Training: 2022-04-13 15:56:33,643-[lfw][180000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-13 15:56:33,644-[lfw][180000]Accuracy-Highest: 0.99783
Training: 2022-04-13 15:57:23,685-[cfp_fp][180000]XNorm: 21.689816
Training: 2022-04-13 15:57:23,686-[cfp_fp][180000]Accuracy-Flip: 0.98043+-0.00667
Training: 2022-04-13 15:57:23,687-[cfp_fp][180000]Accuracy-Highest: 0.98100
Training: 2022-04-13 15:58:06,863-[agedb_30][180000]XNorm: 23.205789
Training: 2022-04-13 15:58:06,867-[agedb_30][180000]Accuracy-Flip: 0.97150+-0.00769
Training: 2022-04-13 15:58:06,867-[agedb_30][180000]Accuracy-Highest: 0.97150
Training: 2022-04-13 15:58:10,730-Speed 73.28 samples/sec   Loss 10.5278   LearningRate 0.0613   Epoch: 4   Global Step: 180010   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:14,592-Speed 2652.22 samples/sec   Loss 10.6196   LearningRate 0.0613   Epoch: 4   Global Step: 180020   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:18,459-Speed 2648.84 samples/sec   Loss 10.6917   LearningRate 0.0613   Epoch: 4   Global Step: 180030   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:22,322-Speed 2651.41 samples/sec   Loss 10.5523   LearningRate 0.0613   Epoch: 4   Global Step: 180040   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:26,189-Speed 2648.36 samples/sec   Loss 10.6624   LearningRate 0.0613   Epoch: 4   Global Step: 180050   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:30,062-Speed 2644.66 samples/sec   Loss 10.7234   LearningRate 0.0613   Epoch: 4   Global Step: 180060   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:33,931-Speed 2648.06 samples/sec   Loss 10.6265   LearningRate 0.0613   Epoch: 4   Global Step: 180070   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:37,805-Speed 2643.89 samples/sec   Loss 10.6824   LearningRate 0.0613   Epoch: 4   Global Step: 180080   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:41,692-Speed 2635.19 samples/sec   Loss 10.7101   LearningRate 0.0613   Epoch: 4   Global Step: 180090   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:45,573-Speed 2639.67 samples/sec   Loss 10.6837   LearningRate 0.0613   Epoch: 4   Global Step: 180100   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:49,453-Speed 2639.66 samples/sec   Loss 10.6291   LearningRate 0.0613   Epoch: 4   Global Step: 180110   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:53,337-Speed 2637.50 samples/sec   Loss 10.5625   LearningRate 0.0613   Epoch: 4   Global Step: 180120   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:58:57,204-Speed 2648.41 samples/sec   Loss 10.6789   LearningRate 0.0613   Epoch: 4   Global Step: 180130   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:01,088-Speed 2637.47 samples/sec   Loss 10.6072   LearningRate 0.0613   Epoch: 4   Global Step: 180140   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:04,975-Speed 2634.96 samples/sec   Loss 10.5723   LearningRate 0.0613   Epoch: 4   Global Step: 180150   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:08,871-Speed 2629.16 samples/sec   Loss 10.6580   LearningRate 0.0613   Epoch: 4   Global Step: 180160   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:12,759-Speed 2637.26 samples/sec   Loss 10.6919   LearningRate 0.0613   Epoch: 4   Global Step: 180170   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:16,649-Speed 2633.00 samples/sec   Loss 10.5645   LearningRate 0.0613   Epoch: 4   Global Step: 180180   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:20,542-Speed 2632.45 samples/sec   Loss 10.8033   LearningRate 0.0613   Epoch: 4   Global Step: 180190   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:24,433-Speed 2632.10 samples/sec   Loss 10.6376   LearningRate 0.0613   Epoch: 4   Global Step: 180200   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:28,344-Speed 2618.25 samples/sec   Loss 10.5730   LearningRate 0.0613   Epoch: 4   Global Step: 180210   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:32,240-Speed 2629.24 samples/sec   Loss 10.6707   LearningRate 0.0613   Epoch: 4   Global Step: 180220   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:36,133-Speed 2631.32 samples/sec   Loss 10.6471   LearningRate 0.0613   Epoch: 4   Global Step: 180230   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:59:40,023-Speed 2633.05 samples/sec   Loss 10.8297   LearningRate 0.0613   Epoch: 4   Global Step: 180240   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:59:43,913-Speed 2633.14 samples/sec   Loss 10.6352   LearningRate 0.0613   Epoch: 4   Global Step: 180250   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:59:47,809-Speed 2628.75 samples/sec   Loss 10.7809   LearningRate 0.0613   Epoch: 4   Global Step: 180260   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:59:51,704-Speed 2630.24 samples/sec   Loss 10.4698   LearningRate 0.0613   Epoch: 4   Global Step: 180270   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 15:59:55,588-Speed 2636.84 samples/sec   Loss 10.7069   LearningRate 0.0613   Epoch: 4   Global Step: 180280   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 15:59:59,460-Speed 2645.71 samples/sec   Loss 10.6178   LearningRate 0.0613   Epoch: 4   Global Step: 180290   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:03,354-Speed 2630.48 samples/sec   Loss 10.5063   LearningRate 0.0613   Epoch: 4   Global Step: 180300   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:07,246-Speed 2631.62 samples/sec   Loss 10.7155   LearningRate 0.0613   Epoch: 4   Global Step: 180310   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:11,138-Speed 2631.93 samples/sec   Loss 10.8032   LearningRate 0.0613   Epoch: 4   Global Step: 180320   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:15,039-Speed 2625.64 samples/sec   Loss 10.6509   LearningRate 0.0613   Epoch: 4   Global Step: 180330   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:18,952-Speed 2618.09 samples/sec   Loss 10.6850   LearningRate 0.0612   Epoch: 4   Global Step: 180340   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:22,855-Speed 2623.65 samples/sec   Loss 10.7459   LearningRate 0.0612   Epoch: 4   Global Step: 180350   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:26,745-Speed 2633.42 samples/sec   Loss 10.7226   LearningRate 0.0612   Epoch: 4   Global Step: 180360   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:30,634-Speed 2633.53 samples/sec   Loss 10.6855   LearningRate 0.0612   Epoch: 4   Global Step: 180370   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:34,524-Speed 2633.47 samples/sec   Loss 10.6966   LearningRate 0.0612   Epoch: 4   Global Step: 180380   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:00:38,567-Speed 2533.00 samples/sec   Loss 10.8163   LearningRate 0.0612   Epoch: 4   Global Step: 180390   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:00:42,457-Speed 2633.50 samples/sec   Loss 10.6296   LearningRate 0.0612   Epoch: 4   Global Step: 180400   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:00:46,359-Speed 2625.30 samples/sec   Loss 10.6958   LearningRate 0.0612   Epoch: 4   Global Step: 180410   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:00:50,241-Speed 2639.01 samples/sec   Loss 10.6559   LearningRate 0.0612   Epoch: 4   Global Step: 180420   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:00:54,129-Speed 2633.95 samples/sec   Loss 10.7385   LearningRate 0.0612   Epoch: 4   Global Step: 180430   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:00:58,016-Speed 2635.55 samples/sec   Loss 10.7052   LearningRate 0.0612   Epoch: 4   Global Step: 180440   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:01,910-Speed 2630.01 samples/sec   Loss 10.5470   LearningRate 0.0612   Epoch: 4   Global Step: 180450   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:05,818-Speed 2621.08 samples/sec   Loss 10.6318   LearningRate 0.0612   Epoch: 4   Global Step: 180460   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:10,584-Speed 2149.00 samples/sec   Loss 10.6697   LearningRate 0.0612   Epoch: 4   Global Step: 180470   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:14,472-Speed 2634.88 samples/sec   Loss 10.7839   LearningRate 0.0612   Epoch: 4   Global Step: 180480   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:18,338-Speed 2648.70 samples/sec   Loss 10.8303   LearningRate 0.0612   Epoch: 4   Global Step: 180490   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:22,310-Speed 2579.14 samples/sec   Loss 10.5812   LearningRate 0.0612   Epoch: 4   Global Step: 180500   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:26,316-Speed 2556.49 samples/sec   Loss 10.5814   LearningRate 0.0612   Epoch: 4   Global Step: 180510   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:30,201-Speed 2635.97 samples/sec   Loss 10.7076   LearningRate 0.0612   Epoch: 4   Global Step: 180520   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:34,093-Speed 2631.66 samples/sec   Loss 10.6563   LearningRate 0.0612   Epoch: 4   Global Step: 180530   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:37,987-Speed 2630.97 samples/sec   Loss 10.6241   LearningRate 0.0612   Epoch: 4   Global Step: 180540   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:41,887-Speed 2626.25 samples/sec   Loss 10.7666   LearningRate 0.0612   Epoch: 4   Global Step: 180550   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:45,782-Speed 2629.40 samples/sec   Loss 10.7071   LearningRate 0.0612   Epoch: 4   Global Step: 180560   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:49,681-Speed 2627.84 samples/sec   Loss 10.7200   LearningRate 0.0612   Epoch: 4   Global Step: 180570   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:53,573-Speed 2631.80 samples/sec   Loss 10.6394   LearningRate 0.0612   Epoch: 4   Global Step: 180580   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:01:57,462-Speed 2633.59 samples/sec   Loss 10.6807   LearningRate 0.0612   Epoch: 4   Global Step: 180590   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:02:01,443-Speed 2572.16 samples/sec   Loss 10.6814   LearningRate 0.0612   Epoch: 4   Global Step: 180600   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:02:05,500-Speed 2525.33 samples/sec   Loss 10.6107   LearningRate 0.0612   Epoch: 4   Global Step: 180610   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:09,595-Speed 2501.00 samples/sec   Loss 10.6533   LearningRate 0.0612   Epoch: 4   Global Step: 180620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:13,535-Speed 2599.77 samples/sec   Loss 10.5806   LearningRate 0.0612   Epoch: 4   Global Step: 180630   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:17,439-Speed 2623.96 samples/sec   Loss 10.5941   LearningRate 0.0612   Epoch: 4   Global Step: 180640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:21,329-Speed 2632.65 samples/sec   Loss 10.6386   LearningRate 0.0612   Epoch: 4   Global Step: 180650   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:25,219-Speed 2633.27 samples/sec   Loss 10.5028   LearningRate 0.0612   Epoch: 4   Global Step: 180660   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:29,124-Speed 2622.47 samples/sec   Loss 10.8272   LearningRate 0.0612   Epoch: 4   Global Step: 180670   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:33,014-Speed 2633.17 samples/sec   Loss 10.5021   LearningRate 0.0612   Epoch: 4   Global Step: 180680   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:36,905-Speed 2632.43 samples/sec   Loss 10.7250   LearningRate 0.0612   Epoch: 4   Global Step: 180690   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:40,803-Speed 2627.39 samples/sec   Loss 10.7216   LearningRate 0.0612   Epoch: 4   Global Step: 180700   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:02:44,695-Speed 2631.51 samples/sec   Loss 10.6586   LearningRate 0.0612   Epoch: 4   Global Step: 180710   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:02:48,606-Speed 2619.37 samples/sec   Loss 10.7203   LearningRate 0.0612   Epoch: 4   Global Step: 180720   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:02:52,505-Speed 2626.87 samples/sec   Loss 10.6982   LearningRate 0.0612   Epoch: 4   Global Step: 180730   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:02:56,440-Speed 2602.74 samples/sec   Loss 10.5833   LearningRate 0.0612   Epoch: 4   Global Step: 180740   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:00,336-Speed 2629.09 samples/sec   Loss 10.5545   LearningRate 0.0612   Epoch: 4   Global Step: 180750   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:04,229-Speed 2630.83 samples/sec   Loss 10.6596   LearningRate 0.0612   Epoch: 4   Global Step: 180760   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:08,121-Speed 2631.44 samples/sec   Loss 10.5933   LearningRate 0.0612   Epoch: 4   Global Step: 180770   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:12,021-Speed 2626.22 samples/sec   Loss 10.7196   LearningRate 0.0612   Epoch: 4   Global Step: 180780   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:15,921-Speed 2626.77 samples/sec   Loss 10.6851   LearningRate 0.0612   Epoch: 4   Global Step: 180790   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:19,819-Speed 2627.41 samples/sec   Loss 10.5461   LearningRate 0.0612   Epoch: 4   Global Step: 180800   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:23,699-Speed 2639.65 samples/sec   Loss 10.5943   LearningRate 0.0612   Epoch: 4   Global Step: 180810   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:27,587-Speed 2634.60 samples/sec   Loss 10.7442   LearningRate 0.0612   Epoch: 4   Global Step: 180820   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:31,487-Speed 2626.54 samples/sec   Loss 10.7036   LearningRate 0.0612   Epoch: 4   Global Step: 180830   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:35,464-Speed 2575.19 samples/sec   Loss 10.7254   LearningRate 0.0612   Epoch: 4   Global Step: 180840   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:39,370-Speed 2622.74 samples/sec   Loss 10.6948   LearningRate 0.0612   Epoch: 4   Global Step: 180850   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:43,274-Speed 2623.59 samples/sec   Loss 10.6432   LearningRate 0.0612   Epoch: 4   Global Step: 180860   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:47,168-Speed 2630.66 samples/sec   Loss 10.6108   LearningRate 0.0611   Epoch: 4   Global Step: 180870   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:51,061-Speed 2630.62 samples/sec   Loss 10.5412   LearningRate 0.0611   Epoch: 4   Global Step: 180880   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:54,988-Speed 2608.43 samples/sec   Loss 10.6483   LearningRate 0.0611   Epoch: 4   Global Step: 180890   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:03:58,885-Speed 2628.43 samples/sec   Loss 10.6220   LearningRate 0.0611   Epoch: 4   Global Step: 180900   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:04:02,762-Speed 2645.49 samples/sec   Loss 10.6241   LearningRate 0.0611   Epoch: 4   Global Step: 180910   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:04:06,656-Speed 2630.00 samples/sec   Loss 10.7727   LearningRate 0.0611   Epoch: 4   Global Step: 180920   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:04:10,539-Speed 2637.44 samples/sec   Loss 10.5844   LearningRate 0.0611   Epoch: 4   Global Step: 180930   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:14,431-Speed 2631.98 samples/sec   Loss 10.5316   LearningRate 0.0611   Epoch: 4   Global Step: 180940   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:18,335-Speed 2623.38 samples/sec   Loss 10.6153   LearningRate 0.0611   Epoch: 4   Global Step: 180950   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:22,253-Speed 2614.08 samples/sec   Loss 10.6959   LearningRate 0.0611   Epoch: 4   Global Step: 180960   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:26,172-Speed 2613.55 samples/sec   Loss 10.6533   LearningRate 0.0611   Epoch: 4   Global Step: 180970   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:30,088-Speed 2616.20 samples/sec   Loss 10.8060   LearningRate 0.0611   Epoch: 4   Global Step: 180980   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:34,439-Speed 2354.05 samples/sec   Loss 10.5894   LearningRate 0.0611   Epoch: 4   Global Step: 180990   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:38,329-Speed 2632.34 samples/sec   Loss 10.4178   LearningRate 0.0611   Epoch: 4   Global Step: 181000   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:42,245-Speed 2616.03 samples/sec   Loss 10.5947   LearningRate 0.0611   Epoch: 4   Global Step: 181010   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:46,132-Speed 2635.24 samples/sec   Loss 10.6188   LearningRate 0.0611   Epoch: 4   Global Step: 181020   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:04:50,026-Speed 2629.85 samples/sec   Loss 10.6427   LearningRate 0.0611   Epoch: 4   Global Step: 181030   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:04:53,921-Speed 2630.28 samples/sec   Loss 10.6929   LearningRate 0.0611   Epoch: 4   Global Step: 181040   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:04:57,818-Speed 2628.64 samples/sec   Loss 10.5807   LearningRate 0.0611   Epoch: 4   Global Step: 181050   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:05:01,713-Speed 2629.35 samples/sec   Loss 10.5619   LearningRate 0.0611   Epoch: 4   Global Step: 181060   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:05:05,616-Speed 2624.52 samples/sec   Loss 10.6897   LearningRate 0.0611   Epoch: 4   Global Step: 181070   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:05:09,513-Speed 2628.18 samples/sec   Loss 10.5009   LearningRate 0.0611   Epoch: 4   Global Step: 181080   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:05:13,391-Speed 2641.06 samples/sec   Loss 10.5480   LearningRate 0.0611   Epoch: 4   Global Step: 181090   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:17,285-Speed 2630.64 samples/sec   Loss 10.6286   LearningRate 0.0611   Epoch: 4   Global Step: 181100   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:21,178-Speed 2630.42 samples/sec   Loss 10.6525   LearningRate 0.0611   Epoch: 4   Global Step: 181110   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:25,077-Speed 2627.70 samples/sec   Loss 10.5186   LearningRate 0.0611   Epoch: 4   Global Step: 181120   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:28,992-Speed 2616.06 samples/sec   Loss 10.6526   LearningRate 0.0611   Epoch: 4   Global Step: 181130   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:32,897-Speed 2623.05 samples/sec   Loss 10.6625   LearningRate 0.0611   Epoch: 4   Global Step: 181140   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:36,794-Speed 2627.57 samples/sec   Loss 10.5616   LearningRate 0.0611   Epoch: 4   Global Step: 181150   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:40,691-Speed 2628.51 samples/sec   Loss 10.5912   LearningRate 0.0611   Epoch: 4   Global Step: 181160   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:44,599-Speed 2620.58 samples/sec   Loss 10.6122   LearningRate 0.0611   Epoch: 4   Global Step: 181170   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:48,495-Speed 2629.18 samples/sec   Loss 10.5695   LearningRate 0.0611   Epoch: 4   Global Step: 181180   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:52,380-Speed 2636.57 samples/sec   Loss 10.6333   LearningRate 0.0611   Epoch: 4   Global Step: 181190   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:05:56,277-Speed 2627.96 samples/sec   Loss 10.6033   LearningRate 0.0611   Epoch: 4   Global Step: 181200   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:00,174-Speed 2628.73 samples/sec   Loss 10.6872   LearningRate 0.0611   Epoch: 4   Global Step: 181210   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:04,085-Speed 2618.83 samples/sec   Loss 10.6635   LearningRate 0.0611   Epoch: 4   Global Step: 181220   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:07,990-Speed 2622.82 samples/sec   Loss 10.4773   LearningRate 0.0611   Epoch: 4   Global Step: 181230   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:11,888-Speed 2627.26 samples/sec   Loss 10.6653   LearningRate 0.0611   Epoch: 4   Global Step: 181240   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:15,786-Speed 2627.63 samples/sec   Loss 10.5633   LearningRate 0.0611   Epoch: 4   Global Step: 181250   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:19,684-Speed 2627.74 samples/sec   Loss 10.6247   LearningRate 0.0611   Epoch: 4   Global Step: 181260   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:23,580-Speed 2628.68 samples/sec   Loss 10.5597   LearningRate 0.0611   Epoch: 4   Global Step: 181270   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:27,481-Speed 2626.15 samples/sec   Loss 10.6892   LearningRate 0.0611   Epoch: 4   Global Step: 181280   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:06:31,383-Speed 2624.80 samples/sec   Loss 10.8397   LearningRate 0.0611   Epoch: 4   Global Step: 181290   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:06:35,318-Speed 2602.77 samples/sec   Loss 10.5164   LearningRate 0.0611   Epoch: 4   Global Step: 181300   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:06:39,225-Speed 2621.72 samples/sec   Loss 10.6927   LearningRate 0.0611   Epoch: 4   Global Step: 181310   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:06:43,121-Speed 2628.46 samples/sec   Loss 10.5991   LearningRate 0.0611   Epoch: 4   Global Step: 181320   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:06:47,018-Speed 2628.64 samples/sec   Loss 10.6495   LearningRate 0.0611   Epoch: 4   Global Step: 181330   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:06:50,912-Speed 2629.96 samples/sec   Loss 10.6532   LearningRate 0.0611   Epoch: 4   Global Step: 181340   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:06:54,813-Speed 2625.95 samples/sec   Loss 10.5700   LearningRate 0.0611   Epoch: 4   Global Step: 181350   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:06:58,715-Speed 2625.35 samples/sec   Loss 10.4529   LearningRate 0.0611   Epoch: 4   Global Step: 181360   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:07:02,617-Speed 2624.93 samples/sec   Loss 10.6327   LearningRate 0.0611   Epoch: 4   Global Step: 181370   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:07:06,512-Speed 2629.61 samples/sec   Loss 10.6759   LearningRate 0.0611   Epoch: 4   Global Step: 181380   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:07:10,410-Speed 2627.40 samples/sec   Loss 10.5057   LearningRate 0.0611   Epoch: 4   Global Step: 181390   Fp16 Grad Scale: 524288   Required: 73 hours
Training: 2022-04-13 16:07:14,289-Speed 2640.50 samples/sec   Loss 10.7592   LearningRate 0.0610   Epoch: 4   Global Step: 181400   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:07:18,171-Speed 2638.37 samples/sec   Loss 10.5395   LearningRate 0.0610   Epoch: 4   Global Step: 181410   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:22,067-Speed 2628.80 samples/sec   Loss 10.5523   LearningRate 0.0610   Epoch: 4   Global Step: 181420   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:25,960-Speed 2630.89 samples/sec   Loss 10.6261   LearningRate 0.0610   Epoch: 4   Global Step: 181430   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:29,862-Speed 2625.65 samples/sec   Loss 10.6696   LearningRate 0.0610   Epoch: 4   Global Step: 181440   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:33,753-Speed 2632.40 samples/sec   Loss 10.5726   LearningRate 0.0610   Epoch: 4   Global Step: 181450   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:37,669-Speed 2615.37 samples/sec   Loss 10.7882   LearningRate 0.0610   Epoch: 4   Global Step: 181460   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:41,588-Speed 2613.72 samples/sec   Loss 10.7497   LearningRate 0.0610   Epoch: 4   Global Step: 181470   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:45,507-Speed 2613.52 samples/sec   Loss 10.6367   LearningRate 0.0610   Epoch: 4   Global Step: 181480   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:49,416-Speed 2620.02 samples/sec   Loss 10.4848   LearningRate 0.0610   Epoch: 4   Global Step: 181490   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:53,302-Speed 2635.99 samples/sec   Loss 10.5479   LearningRate 0.0610   Epoch: 4   Global Step: 181500   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:07:57,200-Speed 2628.47 samples/sec   Loss 10.5169   LearningRate 0.0610   Epoch: 4   Global Step: 181510   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:01,117-Speed 2614.85 samples/sec   Loss 10.6270   LearningRate 0.0610   Epoch: 4   Global Step: 181520   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:05,038-Speed 2611.80 samples/sec   Loss 10.5744   LearningRate 0.0610   Epoch: 4   Global Step: 181530   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:08,969-Speed 2605.53 samples/sec   Loss 10.5640   LearningRate 0.0610   Epoch: 4   Global Step: 181540   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:12,885-Speed 2615.67 samples/sec   Loss 10.6765   LearningRate 0.0610   Epoch: 4   Global Step: 181550   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:16,783-Speed 2627.64 samples/sec   Loss 10.7293   LearningRate 0.0610   Epoch: 4   Global Step: 181560   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:20,677-Speed 2630.44 samples/sec   Loss 10.5180   LearningRate 0.0610   Epoch: 4   Global Step: 181570   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:24,572-Speed 2629.61 samples/sec   Loss 10.4729   LearningRate 0.0610   Epoch: 4   Global Step: 181580   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:28,486-Speed 2617.30 samples/sec   Loss 10.5045   LearningRate 0.0610   Epoch: 4   Global Step: 181590   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:08:32,375-Speed 2634.19 samples/sec   Loss 10.6519   LearningRate 0.0610   Epoch: 4   Global Step: 181600   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:08:36,268-Speed 2630.38 samples/sec   Loss 10.6497   LearningRate 0.0610   Epoch: 4   Global Step: 181610   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:08:40,180-Speed 2618.19 samples/sec   Loss 10.7381   LearningRate 0.0610   Epoch: 4   Global Step: 181620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:08:44,074-Speed 2630.74 samples/sec   Loss 10.6309   LearningRate 0.0610   Epoch: 4   Global Step: 181630   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:08:47,967-Speed 2631.08 samples/sec   Loss 10.4371   LearningRate 0.0610   Epoch: 4   Global Step: 181640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:08:51,865-Speed 2628.10 samples/sec   Loss 10.5574   LearningRate 0.0610   Epoch: 4   Global Step: 181650   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:08:55,765-Speed 2626.11 samples/sec   Loss 10.5925   LearningRate 0.0610   Epoch: 4   Global Step: 181660   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:08:59,674-Speed 2620.22 samples/sec   Loss 10.6029   LearningRate 0.0610   Epoch: 4   Global Step: 181670   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:09:03,568-Speed 2630.15 samples/sec   Loss 10.5914   LearningRate 0.0610   Epoch: 4   Global Step: 181680   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:09:07,466-Speed 2627.85 samples/sec   Loss 10.6755   LearningRate 0.0610   Epoch: 4   Global Step: 181690   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:09:11,360-Speed 2630.28 samples/sec   Loss 10.6614   LearningRate 0.0610   Epoch: 4   Global Step: 181700   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:09:15,257-Speed 2628.46 samples/sec   Loss 10.5906   LearningRate 0.0610   Epoch: 4   Global Step: 181710   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:09:19,153-Speed 2628.97 samples/sec   Loss 10.5792   LearningRate 0.0610   Epoch: 4   Global Step: 181720   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:09:23,054-Speed 2625.61 samples/sec   Loss 10.6830   LearningRate 0.0610   Epoch: 4   Global Step: 181730   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:09:26,936-Speed 2639.11 samples/sec   Loss 10.5417   LearningRate 0.0610   Epoch: 4   Global Step: 181740   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:09:30,834-Speed 2627.39 samples/sec   Loss 10.7139   LearningRate 0.0610   Epoch: 4   Global Step: 181750   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:09:34,738-Speed 2623.25 samples/sec   Loss 10.5911   LearningRate 0.0610   Epoch: 4   Global Step: 181760   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:09:38,617-Speed 2640.44 samples/sec   Loss 10.5306   LearningRate 0.0610   Epoch: 4   Global Step: 181770   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:09:42,517-Speed 2626.44 samples/sec   Loss 10.6007   LearningRate 0.0610   Epoch: 4   Global Step: 181780   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:09:46,409-Speed 2632.09 samples/sec   Loss 10.6638   LearningRate 0.0610   Epoch: 4   Global Step: 181790   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:09:50,304-Speed 2629.62 samples/sec   Loss 10.6790   LearningRate 0.0610   Epoch: 4   Global Step: 181800   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:09:54,204-Speed 2626.24 samples/sec   Loss 10.4568   LearningRate 0.0610   Epoch: 4   Global Step: 181810   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:09:58,094-Speed 2633.07 samples/sec   Loss 10.6224   LearningRate 0.0610   Epoch: 4   Global Step: 181820   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:01,987-Speed 2631.05 samples/sec   Loss 10.5340   LearningRate 0.0610   Epoch: 4   Global Step: 181830   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:05,902-Speed 2616.41 samples/sec   Loss 10.7226   LearningRate 0.0610   Epoch: 4   Global Step: 181840   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:09,829-Speed 2608.07 samples/sec   Loss 10.6149   LearningRate 0.0610   Epoch: 4   Global Step: 181850   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:13,725-Speed 2629.85 samples/sec   Loss 10.6026   LearningRate 0.0610   Epoch: 4   Global Step: 181860   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:17,617-Speed 2632.20 samples/sec   Loss 10.6847   LearningRate 0.0610   Epoch: 4   Global Step: 181870   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:21,521-Speed 2623.28 samples/sec   Loss 10.7361   LearningRate 0.0610   Epoch: 4   Global Step: 181880   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:25,414-Speed 2631.36 samples/sec   Loss 10.6512   LearningRate 0.0610   Epoch: 4   Global Step: 181890   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:29,315-Speed 2625.55 samples/sec   Loss 10.6573   LearningRate 0.0610   Epoch: 4   Global Step: 181900   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:33,211-Speed 2628.49 samples/sec   Loss 10.5991   LearningRate 0.0610   Epoch: 4   Global Step: 181910   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:10:37,124-Speed 2617.97 samples/sec   Loss 10.5044   LearningRate 0.0610   Epoch: 4   Global Step: 181920   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:10:41,025-Speed 2625.75 samples/sec   Loss 10.5258   LearningRate 0.0609   Epoch: 4   Global Step: 181930   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:10:44,922-Speed 2628.35 samples/sec   Loss 10.5621   LearningRate 0.0609   Epoch: 4   Global Step: 181940   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:10:48,818-Speed 2630.05 samples/sec   Loss 10.7525   LearningRate 0.0609   Epoch: 4   Global Step: 181950   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:10:52,714-Speed 2628.43 samples/sec   Loss 10.5647   LearningRate 0.0609   Epoch: 4   Global Step: 181960   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:10:56,613-Speed 2627.59 samples/sec   Loss 10.6026   LearningRate 0.0609   Epoch: 4   Global Step: 181970   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:11:00,508-Speed 2629.49 samples/sec   Loss 10.6529   LearningRate 0.0609   Epoch: 4   Global Step: 181980   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:11:04,419-Speed 2618.92 samples/sec   Loss 10.6301   LearningRate 0.0609   Epoch: 4   Global Step: 181990   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:11:08,325-Speed 2621.94 samples/sec   Loss 10.5934   LearningRate 0.0609   Epoch: 4   Global Step: 182000   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:11:12,226-Speed 2626.53 samples/sec   Loss 10.6935   LearningRate 0.0609   Epoch: 4   Global Step: 182010   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:11:16,124-Speed 2627.43 samples/sec   Loss 10.7150   LearningRate 0.0609   Epoch: 4   Global Step: 182020   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:20,029-Speed 2623.01 samples/sec   Loss 10.6341   LearningRate 0.0609   Epoch: 4   Global Step: 182030   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:23,950-Speed 2612.12 samples/sec   Loss 10.5878   LearningRate 0.0609   Epoch: 4   Global Step: 182040   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:27,853-Speed 2624.88 samples/sec   Loss 10.7228   LearningRate 0.0609   Epoch: 4   Global Step: 182050   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:31,750-Speed 2627.99 samples/sec   Loss 10.6698   LearningRate 0.0609   Epoch: 4   Global Step: 182060   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:35,650-Speed 2626.18 samples/sec   Loss 10.5196   LearningRate 0.0609   Epoch: 4   Global Step: 182070   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:39,548-Speed 2627.35 samples/sec   Loss 10.4910   LearningRate 0.0609   Epoch: 4   Global Step: 182080   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:43,450-Speed 2624.85 samples/sec   Loss 10.5626   LearningRate 0.0609   Epoch: 4   Global Step: 182090   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:47,486-Speed 2538.62 samples/sec   Loss 10.7266   LearningRate 0.0609   Epoch: 4   Global Step: 182100   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:51,450-Speed 2584.08 samples/sec   Loss 10.6156   LearningRate 0.0609   Epoch: 4   Global Step: 182110   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:11:55,350-Speed 2626.08 samples/sec   Loss 10.7578   LearningRate 0.0609   Epoch: 4   Global Step: 182120   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:11:59,248-Speed 2627.35 samples/sec   Loss 10.5461   LearningRate 0.0609   Epoch: 4   Global Step: 182130   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:03,150-Speed 2624.58 samples/sec   Loss 10.6829   LearningRate 0.0609   Epoch: 4   Global Step: 182140   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:07,049-Speed 2627.36 samples/sec   Loss 10.5379   LearningRate 0.0609   Epoch: 4   Global Step: 182150   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:10,959-Speed 2619.51 samples/sec   Loss 10.7760   LearningRate 0.0609   Epoch: 4   Global Step: 182160   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:14,878-Speed 2613.55 samples/sec   Loss 10.5968   LearningRate 0.0609   Epoch: 4   Global Step: 182170   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:18,781-Speed 2624.79 samples/sec   Loss 10.5206   LearningRate 0.0609   Epoch: 4   Global Step: 182180   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:22,691-Speed 2620.11 samples/sec   Loss 10.7389   LearningRate 0.0609   Epoch: 4   Global Step: 182190   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:26,586-Speed 2629.42 samples/sec   Loss 10.5760   LearningRate 0.0609   Epoch: 4   Global Step: 182200   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:30,483-Speed 2629.01 samples/sec   Loss 10.5448   LearningRate 0.0609   Epoch: 4   Global Step: 182210   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:34,399-Speed 2615.33 samples/sec   Loss 10.5898   LearningRate 0.0609   Epoch: 4   Global Step: 182220   Fp16 Grad Scale: 524288   Required: 73 hours
Training: 2022-04-13 16:12:38,285-Speed 2635.81 samples/sec   Loss 10.5977   LearningRate 0.0609   Epoch: 4   Global Step: 182230   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:42,183-Speed 2627.45 samples/sec   Loss 10.6495   LearningRate 0.0609   Epoch: 4   Global Step: 182240   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:46,085-Speed 2625.37 samples/sec   Loss 10.5565   LearningRate 0.0609   Epoch: 4   Global Step: 182250   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:49,985-Speed 2625.86 samples/sec   Loss 10.5493   LearningRate 0.0609   Epoch: 4   Global Step: 182260   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:53,938-Speed 2591.19 samples/sec   Loss 10.6555   LearningRate 0.0609   Epoch: 4   Global Step: 182270   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:12:57,959-Speed 2547.69 samples/sec   Loss 10.8131   LearningRate 0.0609   Epoch: 4   Global Step: 182280   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:01,856-Speed 2628.16 samples/sec   Loss 10.6297   LearningRate 0.0609   Epoch: 4   Global Step: 182290   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:05,785-Speed 2607.03 samples/sec   Loss 10.6467   LearningRate 0.0609   Epoch: 4   Global Step: 182300   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:09,685-Speed 2626.60 samples/sec   Loss 10.6107   LearningRate 0.0609   Epoch: 4   Global Step: 182310   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:13,710-Speed 2544.72 samples/sec   Loss 10.7768   LearningRate 0.0609   Epoch: 4   Global Step: 182320   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:17,734-Speed 2545.45 samples/sec   Loss 10.6302   LearningRate 0.0609   Epoch: 4   Global Step: 182330   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:21,638-Speed 2623.61 samples/sec   Loss 10.5243   LearningRate 0.0609   Epoch: 4   Global Step: 182340   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:25,535-Speed 2628.10 samples/sec   Loss 10.7775   LearningRate 0.0609   Epoch: 4   Global Step: 182350   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:29,441-Speed 2622.39 samples/sec   Loss 10.5437   LearningRate 0.0609   Epoch: 4   Global Step: 182360   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:33,382-Speed 2599.46 samples/sec   Loss 10.7610   LearningRate 0.0609   Epoch: 4   Global Step: 182370   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:37,292-Speed 2619.67 samples/sec   Loss 10.6245   LearningRate 0.0609   Epoch: 4   Global Step: 182380   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:13:41,178-Speed 2635.19 samples/sec   Loss 10.7282   LearningRate 0.0609   Epoch: 4   Global Step: 182390   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:13:45,087-Speed 2620.23 samples/sec   Loss 10.5268   LearningRate 0.0609   Epoch: 4   Global Step: 182400   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:13:48,994-Speed 2622.19 samples/sec   Loss 10.6992   LearningRate 0.0609   Epoch: 4   Global Step: 182410   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:13:52,900-Speed 2622.63 samples/sec   Loss 10.6386   LearningRate 0.0609   Epoch: 4   Global Step: 182420   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:13:56,811-Speed 2619.17 samples/sec   Loss 10.5813   LearningRate 0.0609   Epoch: 4   Global Step: 182430   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:00,742-Speed 2605.69 samples/sec   Loss 10.6616   LearningRate 0.0609   Epoch: 4   Global Step: 182440   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:04,647-Speed 2622.79 samples/sec   Loss 10.6459   LearningRate 0.0609   Epoch: 4   Global Step: 182450   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:08,560-Speed 2617.74 samples/sec   Loss 10.6187   LearningRate 0.0608   Epoch: 4   Global Step: 182460   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:12,470-Speed 2619.70 samples/sec   Loss 10.4066   LearningRate 0.0608   Epoch: 4   Global Step: 182470   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:16,377-Speed 2621.76 samples/sec   Loss 10.5464   LearningRate 0.0608   Epoch: 4   Global Step: 182480   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:20,274-Speed 2628.05 samples/sec   Loss 10.5142   LearningRate 0.0608   Epoch: 4   Global Step: 182490   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:24,174-Speed 2627.09 samples/sec   Loss 10.6155   LearningRate 0.0608   Epoch: 4   Global Step: 182500   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:28,150-Speed 2575.98 samples/sec   Loss 10.6917   LearningRate 0.0608   Epoch: 4   Global Step: 182510   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:32,080-Speed 2606.18 samples/sec   Loss 10.5755   LearningRate 0.0608   Epoch: 4   Global Step: 182520   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:35,988-Speed 2621.38 samples/sec   Loss 10.5887   LearningRate 0.0608   Epoch: 4   Global Step: 182530   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:39,883-Speed 2628.95 samples/sec   Loss 10.6140   LearningRate 0.0608   Epoch: 4   Global Step: 182540   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:43,794-Speed 2619.22 samples/sec   Loss 10.5853   LearningRate 0.0608   Epoch: 4   Global Step: 182550   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:47,703-Speed 2620.14 samples/sec   Loss 10.6279   LearningRate 0.0608   Epoch: 4   Global Step: 182560   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:51,620-Speed 2615.09 samples/sec   Loss 10.6035   LearningRate 0.0608   Epoch: 4   Global Step: 182570   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:55,542-Speed 2611.73 samples/sec   Loss 10.7647   LearningRate 0.0608   Epoch: 4   Global Step: 182580   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:14:59,444-Speed 2624.90 samples/sec   Loss 10.6958   LearningRate 0.0608   Epoch: 4   Global Step: 182590   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:03,344-Speed 2626.34 samples/sec   Loss 10.5617   LearningRate 0.0608   Epoch: 4   Global Step: 182600   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:07,240-Speed 2628.73 samples/sec   Loss 10.5553   LearningRate 0.0608   Epoch: 4   Global Step: 182610   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:11,136-Speed 2629.12 samples/sec   Loss 10.5350   LearningRate 0.0608   Epoch: 4   Global Step: 182620   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:15,033-Speed 2628.31 samples/sec   Loss 10.6299   LearningRate 0.0608   Epoch: 4   Global Step: 182630   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:18,956-Speed 2610.90 samples/sec   Loss 10.6529   LearningRate 0.0608   Epoch: 4   Global Step: 182640   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:22,851-Speed 2629.70 samples/sec   Loss 10.5863   LearningRate 0.0608   Epoch: 4   Global Step: 182650   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:26,748-Speed 2628.90 samples/sec   Loss 10.5848   LearningRate 0.0608   Epoch: 4   Global Step: 182660   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:30,651-Speed 2624.14 samples/sec   Loss 10.6103   LearningRate 0.0608   Epoch: 4   Global Step: 182670   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:34,559-Speed 2620.76 samples/sec   Loss 10.4880   LearningRate 0.0608   Epoch: 4   Global Step: 182680   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:15:38,471-Speed 2618.15 samples/sec   Loss 10.6970   LearningRate 0.0608   Epoch: 4   Global Step: 182690   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:15:42,370-Speed 2627.61 samples/sec   Loss 10.6535   LearningRate 0.0608   Epoch: 4   Global Step: 182700   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:15:46,265-Speed 2629.37 samples/sec   Loss 10.5523   LearningRate 0.0608   Epoch: 4   Global Step: 182710   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:15:50,161-Speed 2628.90 samples/sec   Loss 10.5487   LearningRate 0.0608   Epoch: 4   Global Step: 182720   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:15:54,060-Speed 2627.12 samples/sec   Loss 10.4938   LearningRate 0.0608   Epoch: 4   Global Step: 182730   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:15:57,959-Speed 2627.47 samples/sec   Loss 10.6251   LearningRate 0.0608   Epoch: 4   Global Step: 182740   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:16:01,856-Speed 2628.13 samples/sec   Loss 10.5306   LearningRate 0.0608   Epoch: 4   Global Step: 182750   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:16:05,763-Speed 2621.58 samples/sec   Loss 10.5696   LearningRate 0.0608   Epoch: 4   Global Step: 182760   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:16:09,661-Speed 2627.87 samples/sec   Loss 10.6759   LearningRate 0.0608   Epoch: 4   Global Step: 182770   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:16:13,560-Speed 2627.07 samples/sec   Loss 10.5406   LearningRate 0.0608   Epoch: 4   Global Step: 182780   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:16:17,469-Speed 2620.35 samples/sec   Loss 10.5706   LearningRate 0.0608   Epoch: 4   Global Step: 182790   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:21,365-Speed 2629.20 samples/sec   Loss 10.5733   LearningRate 0.0608   Epoch: 4   Global Step: 182800   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:25,266-Speed 2625.53 samples/sec   Loss 10.5642   LearningRate 0.0608   Epoch: 4   Global Step: 182810   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:29,168-Speed 2624.65 samples/sec   Loss 10.5697   LearningRate 0.0608   Epoch: 4   Global Step: 182820   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:33,068-Speed 2626.52 samples/sec   Loss 10.6267   LearningRate 0.0608   Epoch: 4   Global Step: 182830   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:36,966-Speed 2627.61 samples/sec   Loss 10.5356   LearningRate 0.0608   Epoch: 4   Global Step: 182840   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:40,863-Speed 2628.29 samples/sec   Loss 10.5762   LearningRate 0.0608   Epoch: 4   Global Step: 182850   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:44,766-Speed 2624.33 samples/sec   Loss 10.4712   LearningRate 0.0608   Epoch: 4   Global Step: 182860   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:48,667-Speed 2625.62 samples/sec   Loss 10.6671   LearningRate 0.0608   Epoch: 4   Global Step: 182870   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:52,563-Speed 2628.95 samples/sec   Loss 10.7410   LearningRate 0.0608   Epoch: 4   Global Step: 182880   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:16:56,428-Speed 2650.84 samples/sec   Loss 10.6647   LearningRate 0.0608   Epoch: 4   Global Step: 182890   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:17:00,335-Speed 2621.49 samples/sec   Loss 10.7148   LearningRate 0.0608   Epoch: 4   Global Step: 182900   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:17:04,253-Speed 2613.77 samples/sec   Loss 10.7989   LearningRate 0.0608   Epoch: 4   Global Step: 182910   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:17:08,144-Speed 2632.76 samples/sec   Loss 10.7834   LearningRate 0.0608   Epoch: 4   Global Step: 182920   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:17:12,027-Speed 2637.84 samples/sec   Loss 10.4767   LearningRate 0.0608   Epoch: 4   Global Step: 182930   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:15,918-Speed 2632.25 samples/sec   Loss 11.2953   LearningRate 0.0608   Epoch: 4   Global Step: 182940   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:19,844-Speed 2609.17 samples/sec   Loss 10.8910   LearningRate 0.0608   Epoch: 4   Global Step: 182950   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:23,748-Speed 2623.69 samples/sec   Loss 10.7037   LearningRate 0.0608   Epoch: 4   Global Step: 182960   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:27,660-Speed 2618.68 samples/sec   Loss 10.6013   LearningRate 0.0608   Epoch: 4   Global Step: 182970   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:31,598-Speed 2600.35 samples/sec   Loss 10.6207   LearningRate 0.0608   Epoch: 4   Global Step: 182980   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:35,507-Speed 2620.82 samples/sec   Loss 10.6187   LearningRate 0.0607   Epoch: 4   Global Step: 182990   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:39,406-Speed 2626.83 samples/sec   Loss 10.5801   LearningRate 0.0607   Epoch: 4   Global Step: 183000   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:43,305-Speed 2626.83 samples/sec   Loss 10.7020   LearningRate 0.0607   Epoch: 4   Global Step: 183010   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:47,211-Speed 2622.49 samples/sec   Loss 10.7388   LearningRate 0.0607   Epoch: 4   Global Step: 183020   Fp16 Grad Scale: 32768   Required: 73 hours
Training: 2022-04-13 16:17:51,147-Speed 2602.46 samples/sec   Loss 10.4915   LearningRate 0.0607   Epoch: 4   Global Step: 183030   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:17:55,057-Speed 2619.34 samples/sec   Loss 10.7029   LearningRate 0.0607   Epoch: 4   Global Step: 183040   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:17:58,985-Speed 2607.77 samples/sec   Loss 10.6336   LearningRate 0.0607   Epoch: 4   Global Step: 183050   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:02,901-Speed 2615.97 samples/sec   Loss 10.5048   LearningRate 0.0607   Epoch: 4   Global Step: 183060   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:06,813-Speed 2617.87 samples/sec   Loss 10.6367   LearningRate 0.0607   Epoch: 4   Global Step: 183070   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:10,725-Speed 2618.06 samples/sec   Loss 10.4730   LearningRate 0.0607   Epoch: 4   Global Step: 183080   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:14,733-Speed 2555.67 samples/sec   Loss 10.7263   LearningRate 0.0607   Epoch: 4   Global Step: 183090   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:18,642-Speed 2620.40 samples/sec   Loss 10.6576   LearningRate 0.0607   Epoch: 4   Global Step: 183100   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:22,554-Speed 2617.67 samples/sec   Loss 10.7076   LearningRate 0.0607   Epoch: 4   Global Step: 183110   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:26,455-Speed 2626.32 samples/sec   Loss 10.5671   LearningRate 0.0607   Epoch: 4   Global Step: 183120   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:18:30,353-Speed 2627.20 samples/sec   Loss 10.6844   LearningRate 0.0607   Epoch: 4   Global Step: 183130   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:18:34,265-Speed 2618.96 samples/sec   Loss 10.6549   LearningRate 0.0607   Epoch: 4   Global Step: 183140   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:18:38,185-Speed 2612.49 samples/sec   Loss 10.6312   LearningRate 0.0607   Epoch: 4   Global Step: 183150   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:18:42,084-Speed 2627.17 samples/sec   Loss 10.5925   LearningRate 0.0607   Epoch: 4   Global Step: 183160   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:18:45,981-Speed 2628.27 samples/sec   Loss 10.5358   LearningRate 0.0607   Epoch: 4   Global Step: 183170   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:18:49,881-Speed 2626.30 samples/sec   Loss 10.6163   LearningRate 0.0607   Epoch: 4   Global Step: 183180   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:18:53,779-Speed 2627.39 samples/sec   Loss 10.5251   LearningRate 0.0607   Epoch: 4   Global Step: 183190   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:18:57,681-Speed 2624.64 samples/sec   Loss 10.5916   LearningRate 0.0607   Epoch: 4   Global Step: 183200   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:01,581-Speed 2626.65 samples/sec   Loss 10.6361   LearningRate 0.0607   Epoch: 4   Global Step: 183210   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:05,491-Speed 2619.98 samples/sec   Loss 10.5543   LearningRate 0.0607   Epoch: 4   Global Step: 183220   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:09,414-Speed 2610.62 samples/sec   Loss 10.4912   LearningRate 0.0607   Epoch: 4   Global Step: 183230   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:19:13,329-Speed 2615.56 samples/sec   Loss 10.5684   LearningRate 0.0607   Epoch: 4   Global Step: 183240   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:19:17,259-Speed 2606.86 samples/sec   Loss 10.4742   LearningRate 0.0607   Epoch: 4   Global Step: 183250   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:19:21,178-Speed 2613.38 samples/sec   Loss 10.5769   LearningRate 0.0607   Epoch: 4   Global Step: 183260   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:19:25,085-Speed 2621.53 samples/sec   Loss 10.5223   LearningRate 0.0607   Epoch: 4   Global Step: 183270   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:19:28,995-Speed 2619.80 samples/sec   Loss 10.4908   LearningRate 0.0607   Epoch: 4   Global Step: 183280   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:19:32,903-Speed 2621.19 samples/sec   Loss 10.4481   LearningRate 0.0607   Epoch: 4   Global Step: 183290   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:19:36,788-Speed 2636.60 samples/sec   Loss 10.5416   LearningRate 0.0607   Epoch: 4   Global Step: 183300   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:40,682-Speed 2629.79 samples/sec   Loss 10.5438   LearningRate 0.0607   Epoch: 4   Global Step: 183310   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:44,588-Speed 2622.11 samples/sec   Loss 10.6805   LearningRate 0.0607   Epoch: 4   Global Step: 183320   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:48,504-Speed 2615.96 samples/sec   Loss 10.5735   LearningRate 0.0607   Epoch: 4   Global Step: 183330   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:52,415-Speed 2618.74 samples/sec   Loss 10.6156   LearningRate 0.0607   Epoch: 4   Global Step: 183340   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:19:56,329-Speed 2616.89 samples/sec   Loss 10.5934   LearningRate 0.0607   Epoch: 4   Global Step: 183350   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:00,249-Speed 2613.21 samples/sec   Loss 10.7278   LearningRate 0.0607   Epoch: 4   Global Step: 183360   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:04,161-Speed 2618.41 samples/sec   Loss 10.5636   LearningRate 0.0607   Epoch: 4   Global Step: 183370   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:08,079-Speed 2613.90 samples/sec   Loss 10.5440   LearningRate 0.0607   Epoch: 4   Global Step: 183380   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:11,988-Speed 2620.74 samples/sec   Loss 10.5809   LearningRate 0.0607   Epoch: 4   Global Step: 183390   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:15,898-Speed 2619.25 samples/sec   Loss 10.7647   LearningRate 0.0607   Epoch: 4   Global Step: 183400   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:20:19,810-Speed 2618.22 samples/sec   Loss 10.6400   LearningRate 0.0607   Epoch: 4   Global Step: 183410   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:20:23,749-Speed 2600.56 samples/sec   Loss 10.6121   LearningRate 0.0607   Epoch: 4   Global Step: 183420   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:20:27,638-Speed 2633.71 samples/sec   Loss 10.6260   LearningRate 0.0607   Epoch: 4   Global Step: 183430   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:31,547-Speed 2620.57 samples/sec   Loss 10.3317   LearningRate 0.0607   Epoch: 4   Global Step: 183440   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:35,456-Speed 2620.14 samples/sec   Loss 10.6135   LearningRate 0.0607   Epoch: 4   Global Step: 183450   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:39,359-Speed 2624.60 samples/sec   Loss 10.7293   LearningRate 0.0607   Epoch: 4   Global Step: 183460   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:43,265-Speed 2622.31 samples/sec   Loss 10.5256   LearningRate 0.0607   Epoch: 4   Global Step: 183470   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:47,169-Speed 2623.82 samples/sec   Loss 10.5807   LearningRate 0.0607   Epoch: 4   Global Step: 183480   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:51,081-Speed 2618.68 samples/sec   Loss 10.6004   LearningRate 0.0607   Epoch: 4   Global Step: 183490   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:54,990-Speed 2620.20 samples/sec   Loss 10.5726   LearningRate 0.0607   Epoch: 4   Global Step: 183500   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:20:58,898-Speed 2620.54 samples/sec   Loss 10.5900   LearningRate 0.0607   Epoch: 4   Global Step: 183510   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:21:02,815-Speed 2614.35 samples/sec   Loss 10.4934   LearningRate 0.0606   Epoch: 4   Global Step: 183520   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:21:06,714-Speed 2627.52 samples/sec   Loss 10.6406   LearningRate 0.0606   Epoch: 4   Global Step: 183530   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:10,623-Speed 2620.33 samples/sec   Loss 10.5690   LearningRate 0.0606   Epoch: 4   Global Step: 183540   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:14,523-Speed 2626.40 samples/sec   Loss 10.5748   LearningRate 0.0606   Epoch: 4   Global Step: 183550   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:18,530-Speed 2556.21 samples/sec   Loss 10.5404   LearningRate 0.0606   Epoch: 4   Global Step: 183560   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:22,450-Speed 2612.31 samples/sec   Loss 10.6120   LearningRate 0.0606   Epoch: 4   Global Step: 183570   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:26,347-Speed 2629.31 samples/sec   Loss 10.6559   LearningRate 0.0606   Epoch: 4   Global Step: 183580   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:30,248-Speed 2625.33 samples/sec   Loss 10.4617   LearningRate 0.0606   Epoch: 4   Global Step: 183590   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:34,150-Speed 2624.73 samples/sec   Loss 10.5733   LearningRate 0.0606   Epoch: 4   Global Step: 183600   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:38,084-Speed 2603.38 samples/sec   Loss 10.5447   LearningRate 0.0606   Epoch: 4   Global Step: 183610   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:21:41,968-Speed 2637.48 samples/sec   Loss 10.5933   LearningRate 0.0606   Epoch: 4   Global Step: 183620   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:21:45,869-Speed 2625.74 samples/sec   Loss 10.5365   LearningRate 0.0606   Epoch: 4   Global Step: 183630   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:21:49,771-Speed 2625.06 samples/sec   Loss 10.5783   LearningRate 0.0606   Epoch: 4   Global Step: 183640   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:21:53,701-Speed 2605.79 samples/sec   Loss 10.5058   LearningRate 0.0606   Epoch: 4   Global Step: 183650   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:21:57,612-Speed 2619.27 samples/sec   Loss 10.7114   LearningRate 0.0606   Epoch: 4   Global Step: 183660   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:01,519-Speed 2621.69 samples/sec   Loss 10.6911   LearningRate 0.0606   Epoch: 4   Global Step: 183670   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:05,452-Speed 2604.11 samples/sec   Loss 10.5334   LearningRate 0.0606   Epoch: 4   Global Step: 183680   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:09,364-Speed 2618.46 samples/sec   Loss 10.6005   LearningRate 0.0606   Epoch: 4   Global Step: 183690   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:13,262-Speed 2627.54 samples/sec   Loss 10.4854   LearningRate 0.0606   Epoch: 4   Global Step: 183700   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:17,167-Speed 2623.05 samples/sec   Loss 10.5098   LearningRate 0.0606   Epoch: 4   Global Step: 183710   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:21,070-Speed 2624.29 samples/sec   Loss 10.5284   LearningRate 0.0606   Epoch: 4   Global Step: 183720   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:24,984-Speed 2617.09 samples/sec   Loss 10.5763   LearningRate 0.0606   Epoch: 4   Global Step: 183730   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:28,893-Speed 2620.04 samples/sec   Loss 10.5038   LearningRate 0.0606   Epoch: 4   Global Step: 183740   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:32,796-Speed 2624.28 samples/sec   Loss 10.5257   LearningRate 0.0606   Epoch: 4   Global Step: 183750   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:36,712-Speed 2615.34 samples/sec   Loss 10.6148   LearningRate 0.0606   Epoch: 4   Global Step: 183760   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:40,628-Speed 2615.75 samples/sec   Loss 10.5216   LearningRate 0.0606   Epoch: 4   Global Step: 183770   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:44,529-Speed 2625.64 samples/sec   Loss 10.7179   LearningRate 0.0606   Epoch: 4   Global Step: 183780   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:48,454-Speed 2610.04 samples/sec   Loss 10.5817   LearningRate 0.0606   Epoch: 4   Global Step: 183790   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:52,357-Speed 2624.25 samples/sec   Loss 10.6328   LearningRate 0.0606   Epoch: 4   Global Step: 183800   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:22:56,257-Speed 2626.81 samples/sec   Loss 10.6825   LearningRate 0.0606   Epoch: 4   Global Step: 183810   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:00,159-Speed 2624.69 samples/sec   Loss 10.6161   LearningRate 0.0606   Epoch: 4   Global Step: 183820   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:04,060-Speed 2625.59 samples/sec   Loss 10.5105   LearningRate 0.0606   Epoch: 4   Global Step: 183830   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:08,104-Speed 2532.98 samples/sec   Loss 10.5950   LearningRate 0.0606   Epoch: 4   Global Step: 183840   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:12,243-Speed 2474.79 samples/sec   Loss 10.6063   LearningRate 0.0606   Epoch: 4   Global Step: 183850   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:16,141-Speed 2627.29 samples/sec   Loss 10.6862   LearningRate 0.0606   Epoch: 4   Global Step: 183860   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:20,049-Speed 2621.34 samples/sec   Loss 10.6321   LearningRate 0.0606   Epoch: 4   Global Step: 183870   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:23,955-Speed 2621.73 samples/sec   Loss 10.6371   LearningRate 0.0606   Epoch: 4   Global Step: 183880   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:27,862-Speed 2622.24 samples/sec   Loss 10.7326   LearningRate 0.0606   Epoch: 4   Global Step: 183890   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:23:31,745-Speed 2637.55 samples/sec   Loss 10.5023   LearningRate 0.0606   Epoch: 4   Global Step: 183900   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:35,649-Speed 2623.90 samples/sec   Loss 10.6916   LearningRate 0.0606   Epoch: 4   Global Step: 183910   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:39,622-Speed 2577.38 samples/sec   Loss 10.5836   LearningRate 0.0606   Epoch: 4   Global Step: 183920   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:43,528-Speed 2622.93 samples/sec   Loss 10.5217   LearningRate 0.0606   Epoch: 4   Global Step: 183930   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:47,425-Speed 2628.12 samples/sec   Loss 10.4956   LearningRate 0.0606   Epoch: 4   Global Step: 183940   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:51,327-Speed 2625.06 samples/sec   Loss 10.5683   LearningRate 0.0606   Epoch: 4   Global Step: 183950   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:55,232-Speed 2622.90 samples/sec   Loss 10.7415   LearningRate 0.0606   Epoch: 4   Global Step: 183960   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:23:59,131-Speed 2626.83 samples/sec   Loss 10.4834   LearningRate 0.0606   Epoch: 4   Global Step: 183970   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:24:03,034-Speed 2623.94 samples/sec   Loss 10.4396   LearningRate 0.0606   Epoch: 4   Global Step: 183980   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:24:06,934-Speed 2626.15 samples/sec   Loss 10.4998   LearningRate 0.0606   Epoch: 4   Global Step: 183990   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:24:10,834-Speed 2626.00 samples/sec   Loss 10.4997   LearningRate 0.0606   Epoch: 4   Global Step: 184000   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:14,736-Speed 2626.08 samples/sec   Loss 10.6454   LearningRate 0.0606   Epoch: 4   Global Step: 184010   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:18,639-Speed 2624.32 samples/sec   Loss 10.7241   LearningRate 0.0606   Epoch: 4   Global Step: 184020   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:22,541-Speed 2624.71 samples/sec   Loss 10.6992   LearningRate 0.0606   Epoch: 4   Global Step: 184030   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:26,447-Speed 2622.41 samples/sec   Loss 10.6502   LearningRate 0.0606   Epoch: 4   Global Step: 184040   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:30,352-Speed 2623.02 samples/sec   Loss 10.5445   LearningRate 0.0606   Epoch: 4   Global Step: 184050   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:34,257-Speed 2622.89 samples/sec   Loss 10.6267   LearningRate 0.0605   Epoch: 4   Global Step: 184060   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:38,161-Speed 2623.85 samples/sec   Loss 10.4949   LearningRate 0.0605   Epoch: 4   Global Step: 184070   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:42,069-Speed 2621.03 samples/sec   Loss 10.4095   LearningRate 0.0605   Epoch: 4   Global Step: 184080   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:46,064-Speed 2563.74 samples/sec   Loss 10.7075   LearningRate 0.0605   Epoch: 4   Global Step: 184090   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:50,021-Speed 2588.80 samples/sec   Loss 10.5862   LearningRate 0.0605   Epoch: 4   Global Step: 184100   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:53,925-Speed 2623.38 samples/sec   Loss 10.5465   LearningRate 0.0605   Epoch: 4   Global Step: 184110   Fp16 Grad Scale: 262144   Required: 73 hours
Training: 2022-04-13 16:24:57,952-Speed 2543.54 samples/sec   Loss 10.5998   LearningRate 0.0605   Epoch: 4   Global Step: 184120   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:02,009-Speed 2524.96 samples/sec   Loss 10.5300   LearningRate 0.0605   Epoch: 4   Global Step: 184130   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:05,924-Speed 2616.27 samples/sec   Loss 10.5639   LearningRate 0.0605   Epoch: 4   Global Step: 184140   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:09,823-Speed 2626.88 samples/sec   Loss 10.6990   LearningRate 0.0605   Epoch: 4   Global Step: 184150   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:13,724-Speed 2625.30 samples/sec   Loss 10.4950   LearningRate 0.0605   Epoch: 4   Global Step: 184160   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:17,627-Speed 2623.98 samples/sec   Loss 10.5793   LearningRate 0.0605   Epoch: 4   Global Step: 184170   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:21,530-Speed 2624.33 samples/sec   Loss 10.6595   LearningRate 0.0605   Epoch: 4   Global Step: 184180   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:25,432-Speed 2624.88 samples/sec   Loss 10.6213   LearningRate 0.0605   Epoch: 4   Global Step: 184190   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:29,332-Speed 2626.24 samples/sec   Loss 10.5110   LearningRate 0.0605   Epoch: 4   Global Step: 184200   Fp16 Grad Scale: 131072   Required: 73 hours
Training: 2022-04-13 16:25:33,217-Speed 2636.96 samples/sec   Loss 10.6273   LearningRate 0.0605   Epoch: 4   Global Step: 184210   Fp16 Grad Scale: 65536   Required: 73 hours
Training: 2022-04-13 16:25:37,121-Speed 2623.69 samples/sec   Loss 10.4822   LearningRate 0.0605   Epoch: 4   Global Step: 184220   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:25:41,009-Speed 2633.79 samples/sec   Loss 10.6419   LearningRate 0.0605   Epoch: 4   Global Step: 184230   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:25:44,907-Speed 2627.54 samples/sec   Loss 10.7038   LearningRate 0.0605   Epoch: 4   Global Step: 184240   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:25:48,808-Speed 2625.49 samples/sec   Loss 10.6483   LearningRate 0.0605   Epoch: 4   Global Step: 184250   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:25:52,708-Speed 2626.58 samples/sec   Loss 10.6231   LearningRate 0.0605   Epoch: 4   Global Step: 184260   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:25:56,615-Speed 2621.86 samples/sec   Loss 10.5994   LearningRate 0.0605   Epoch: 4   Global Step: 184270   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:00,543-Speed 2607.63 samples/sec   Loss 10.5681   LearningRate 0.0605   Epoch: 4   Global Step: 184280   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:04,446-Speed 2624.94 samples/sec   Loss 10.6835   LearningRate 0.0605   Epoch: 4   Global Step: 184290   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:08,346-Speed 2626.22 samples/sec   Loss 10.6623   LearningRate 0.0605   Epoch: 4   Global Step: 184300   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:12,244-Speed 2626.93 samples/sec   Loss 10.5720   LearningRate 0.0605   Epoch: 4   Global Step: 184310   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:16,154-Speed 2619.90 samples/sec   Loss 10.4455   LearningRate 0.0605   Epoch: 4   Global Step: 184320   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:20,054-Speed 2626.35 samples/sec   Loss 10.6809   LearningRate 0.0605   Epoch: 4   Global Step: 184330   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:26:23,953-Speed 2627.44 samples/sec   Loss 10.6126   LearningRate 0.0605   Epoch: 4   Global Step: 184340   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:26:27,837-Speed 2636.58 samples/sec   Loss 10.6414   LearningRate 0.0605   Epoch: 4   Global Step: 184350   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:31,738-Speed 2625.50 samples/sec   Loss 10.5546   LearningRate 0.0605   Epoch: 4   Global Step: 184360   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:35,636-Speed 2627.81 samples/sec   Loss 10.8078   LearningRate 0.0605   Epoch: 4   Global Step: 184370   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:39,547-Speed 2618.84 samples/sec   Loss 10.4602   LearningRate 0.0605   Epoch: 4   Global Step: 184380   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:43,446-Speed 2627.09 samples/sec   Loss 10.6898   LearningRate 0.0605   Epoch: 4   Global Step: 184390   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:47,349-Speed 2624.11 samples/sec   Loss 10.5053   LearningRate 0.0605   Epoch: 4   Global Step: 184400   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:51,256-Speed 2622.20 samples/sec   Loss 10.6110   LearningRate 0.0605   Epoch: 4   Global Step: 184410   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:55,160-Speed 2623.20 samples/sec   Loss 10.5676   LearningRate 0.0605   Epoch: 4   Global Step: 184420   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:26:59,063-Speed 2623.99 samples/sec   Loss 10.7032   LearningRate 0.0605   Epoch: 4   Global Step: 184430   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:27:02,977-Speed 2617.26 samples/sec   Loss 10.5663   LearningRate 0.0605   Epoch: 4   Global Step: 184440   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:27:06,919-Speed 2598.33 samples/sec   Loss 10.6549   LearningRate 0.0605   Epoch: 4   Global Step: 184450   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:10,849-Speed 2606.70 samples/sec   Loss 10.5947   LearningRate 0.0605   Epoch: 4   Global Step: 184460   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:14,754-Speed 2623.27 samples/sec   Loss 10.6335   LearningRate 0.0605   Epoch: 4   Global Step: 184470   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:18,686-Speed 2604.94 samples/sec   Loss 10.5769   LearningRate 0.0605   Epoch: 4   Global Step: 184480   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:22,584-Speed 2627.16 samples/sec   Loss 10.5360   LearningRate 0.0605   Epoch: 4   Global Step: 184490   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:26,491-Speed 2621.81 samples/sec   Loss 10.7477   LearningRate 0.0605   Epoch: 4   Global Step: 184500   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:30,403-Speed 2618.86 samples/sec   Loss 10.6674   LearningRate 0.0605   Epoch: 4   Global Step: 184510   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:34,316-Speed 2617.43 samples/sec   Loss 10.5243   LearningRate 0.0605   Epoch: 4   Global Step: 184520   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:38,218-Speed 2624.31 samples/sec   Loss 10.7643   LearningRate 0.0605   Epoch: 4   Global Step: 184530   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:42,138-Speed 2613.29 samples/sec   Loss 10.6321   LearningRate 0.0605   Epoch: 4   Global Step: 184540   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:27:46,053-Speed 2615.86 samples/sec   Loss 10.4938   LearningRate 0.0605   Epoch: 4   Global Step: 184550   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:27:49,975-Speed 2611.81 samples/sec   Loss 10.5331   LearningRate 0.0605   Epoch: 4   Global Step: 184560   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:27:53,902-Speed 2608.54 samples/sec   Loss 10.5423   LearningRate 0.0605   Epoch: 4   Global Step: 184570   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:27:57,821-Speed 2613.39 samples/sec   Loss 10.5167   LearningRate 0.0605   Epoch: 4   Global Step: 184580   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:01,742-Speed 2612.36 samples/sec   Loss 10.5292   LearningRate 0.0604   Epoch: 4   Global Step: 184590   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:05,644-Speed 2624.78 samples/sec   Loss 10.6661   LearningRate 0.0604   Epoch: 4   Global Step: 184600   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:09,546-Speed 2624.48 samples/sec   Loss 10.5439   LearningRate 0.0604   Epoch: 4   Global Step: 184610   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:13,445-Speed 2627.14 samples/sec   Loss 10.5977   LearningRate 0.0604   Epoch: 4   Global Step: 184620   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:17,344-Speed 2626.89 samples/sec   Loss 10.7084   LearningRate 0.0604   Epoch: 4   Global Step: 184630   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:21,248-Speed 2623.98 samples/sec   Loss 10.6098   LearningRate 0.0604   Epoch: 4   Global Step: 184640   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:25,148-Speed 2625.73 samples/sec   Loss 10.5456   LearningRate 0.0604   Epoch: 4   Global Step: 184650   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:28:29,048-Speed 2626.69 samples/sec   Loss 10.6197   LearningRate 0.0604   Epoch: 4   Global Step: 184660   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:28:33,009-Speed 2585.82 samples/sec   Loss 10.6041   LearningRate 0.0604   Epoch: 4   Global Step: 184670   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:37,022-Speed 2551.79 samples/sec   Loss 10.4547   LearningRate 0.0604   Epoch: 4   Global Step: 184680   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:40,925-Speed 2624.09 samples/sec   Loss 10.4919   LearningRate 0.0604   Epoch: 4   Global Step: 184690   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:44,827-Speed 2625.31 samples/sec   Loss 10.5920   LearningRate 0.0604   Epoch: 4   Global Step: 184700   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:48,734-Speed 2621.91 samples/sec   Loss 10.4396   LearningRate 0.0604   Epoch: 4   Global Step: 184710   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:52,637-Speed 2624.71 samples/sec   Loss 10.5333   LearningRate 0.0604   Epoch: 4   Global Step: 184720   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:28:56,560-Speed 2610.93 samples/sec   Loss 10.5034   LearningRate 0.0604   Epoch: 4   Global Step: 184730   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:00,457-Speed 2628.06 samples/sec   Loss 10.6697   LearningRate 0.0604   Epoch: 4   Global Step: 184740   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:04,383-Speed 2609.13 samples/sec   Loss 10.4962   LearningRate 0.0604   Epoch: 4   Global Step: 184750   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:08,414-Speed 2540.30 samples/sec   Loss 10.5709   LearningRate 0.0604   Epoch: 4   Global Step: 184760   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:12,319-Speed 2623.72 samples/sec   Loss 10.5591   LearningRate 0.0604   Epoch: 4   Global Step: 184770   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:29:16,218-Speed 2626.73 samples/sec   Loss 10.5961   LearningRate 0.0604   Epoch: 4   Global Step: 184780   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:29:20,106-Speed 2634.60 samples/sec   Loss 10.5266   LearningRate 0.0604   Epoch: 4   Global Step: 184790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:23,998-Speed 2631.77 samples/sec   Loss 10.5995   LearningRate 0.0604   Epoch: 4   Global Step: 184800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:27,908-Speed 2620.12 samples/sec   Loss 10.6684   LearningRate 0.0604   Epoch: 4   Global Step: 184810   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:31,832-Speed 2610.44 samples/sec   Loss 10.6136   LearningRate 0.0604   Epoch: 4   Global Step: 184820   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:35,759-Speed 2608.10 samples/sec   Loss 10.6992   LearningRate 0.0604   Epoch: 4   Global Step: 184830   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:39,661-Speed 2624.71 samples/sec   Loss 10.6487   LearningRate 0.0604   Epoch: 4   Global Step: 184840   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:43,565-Speed 2623.80 samples/sec   Loss 10.6348   LearningRate 0.0604   Epoch: 4   Global Step: 184850   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:47,468-Speed 2624.28 samples/sec   Loss 10.5335   LearningRate 0.0604   Epoch: 4   Global Step: 184860   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:51,368-Speed 2626.93 samples/sec   Loss 10.6502   LearningRate 0.0604   Epoch: 4   Global Step: 184870   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:55,274-Speed 2622.00 samples/sec   Loss 10.4774   LearningRate 0.0604   Epoch: 4   Global Step: 184880   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:29:59,177-Speed 2624.36 samples/sec   Loss 10.7059   LearningRate 0.0604   Epoch: 4   Global Step: 184890   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:30:03,082-Speed 2622.78 samples/sec   Loss 10.7126   LearningRate 0.0604   Epoch: 4   Global Step: 184900   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:30:06,991-Speed 2620.52 samples/sec   Loss 10.5312   LearningRate 0.0604   Epoch: 4   Global Step: 184910   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:30:10,872-Speed 2638.74 samples/sec   Loss 10.5638   LearningRate 0.0604   Epoch: 4   Global Step: 184920   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:14,785-Speed 2618.14 samples/sec   Loss 10.6599   LearningRate 0.0604   Epoch: 4   Global Step: 184930   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:18,680-Speed 2629.75 samples/sec   Loss 10.6631   LearningRate 0.0604   Epoch: 4   Global Step: 184940   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:22,577-Speed 2627.97 samples/sec   Loss 10.5227   LearningRate 0.0604   Epoch: 4   Global Step: 184950   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:26,475-Speed 2627.71 samples/sec   Loss 10.5775   LearningRate 0.0604   Epoch: 4   Global Step: 184960   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:30,369-Speed 2629.88 samples/sec   Loss 10.4100   LearningRate 0.0604   Epoch: 4   Global Step: 184970   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:34,273-Speed 2624.44 samples/sec   Loss 10.4814   LearningRate 0.0604   Epoch: 4   Global Step: 184980   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:38,167-Speed 2630.09 samples/sec   Loss 10.4690   LearningRate 0.0604   Epoch: 4   Global Step: 184990   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:42,062-Speed 2629.64 samples/sec   Loss 10.4557   LearningRate 0.0604   Epoch: 4   Global Step: 185000   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:45,960-Speed 2627.08 samples/sec   Loss 10.5270   LearningRate 0.0604   Epoch: 4   Global Step: 185010   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:30:49,855-Speed 2630.32 samples/sec   Loss 10.4744   LearningRate 0.0604   Epoch: 4   Global Step: 185020   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:30:53,751-Speed 2628.80 samples/sec   Loss 10.5129   LearningRate 0.0604   Epoch: 4   Global Step: 185030   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:30:57,650-Speed 2627.39 samples/sec   Loss 10.6388   LearningRate 0.0604   Epoch: 4   Global Step: 185040   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:31:01,550-Speed 2626.36 samples/sec   Loss 10.6687   LearningRate 0.0604   Epoch: 4   Global Step: 185050   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:31:05,437-Speed 2635.00 samples/sec   Loss 10.7029   LearningRate 0.0604   Epoch: 4   Global Step: 185060   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:31:09,345-Speed 2621.13 samples/sec   Loss 10.6603   LearningRate 0.0604   Epoch: 4   Global Step: 185070   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:31:13,251-Speed 2622.13 samples/sec   Loss 10.5859   LearningRate 0.0604   Epoch: 4   Global Step: 185080   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:31:17,161-Speed 2619.47 samples/sec   Loss 10.4911   LearningRate 0.0604   Epoch: 4   Global Step: 185090   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:31:21,029-Speed 2648.01 samples/sec   Loss 10.6052   LearningRate 0.0604   Epoch: 4   Global Step: 185100   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:24,947-Speed 2614.12 samples/sec   Loss 10.5178   LearningRate 0.0604   Epoch: 4   Global Step: 185110   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:28,853-Speed 2622.20 samples/sec   Loss 10.4556   LearningRate 0.0603   Epoch: 4   Global Step: 185120   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:32,762-Speed 2620.62 samples/sec   Loss 10.5906   LearningRate 0.0603   Epoch: 4   Global Step: 185130   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:36,674-Speed 2618.51 samples/sec   Loss 10.4838   LearningRate 0.0603   Epoch: 4   Global Step: 185140   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:40,606-Speed 2604.49 samples/sec   Loss 11.2617   LearningRate 0.0603   Epoch: 4   Global Step: 185150   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:44,520-Speed 2616.68 samples/sec   Loss 11.0940   LearningRate 0.0603   Epoch: 4   Global Step: 185160   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:48,436-Speed 2615.55 samples/sec   Loss 10.9711   LearningRate 0.0603   Epoch: 4   Global Step: 185170   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:52,349-Speed 2617.76 samples/sec   Loss 10.8780   LearningRate 0.0603   Epoch: 4   Global Step: 185180   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:31:56,267-Speed 2614.53 samples/sec   Loss 10.5780   LearningRate 0.0603   Epoch: 4   Global Step: 185190   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:32:00,163-Speed 2629.21 samples/sec   Loss 10.7208   LearningRate 0.0603   Epoch: 4   Global Step: 185200   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:04,058-Speed 2630.09 samples/sec   Loss 10.7208   LearningRate 0.0603   Epoch: 4   Global Step: 185210   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:07,954-Speed 2628.56 samples/sec   Loss 10.6942   LearningRate 0.0603   Epoch: 4   Global Step: 185220   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:11,853-Speed 2626.55 samples/sec   Loss 10.6701   LearningRate 0.0603   Epoch: 4   Global Step: 185230   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:15,748-Speed 2629.70 samples/sec   Loss 10.6755   LearningRate 0.0603   Epoch: 4   Global Step: 185240   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:19,650-Speed 2625.71 samples/sec   Loss 10.6153   LearningRate 0.0603   Epoch: 4   Global Step: 185250   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:23,553-Speed 2624.14 samples/sec   Loss 10.6715   LearningRate 0.0603   Epoch: 4   Global Step: 185260   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:27,597-Speed 2532.53 samples/sec   Loss 10.5906   LearningRate 0.0603   Epoch: 4   Global Step: 185270   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:31,493-Speed 2628.66 samples/sec   Loss 10.6303   LearningRate 0.0603   Epoch: 4   Global Step: 185280   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:35,386-Speed 2631.65 samples/sec   Loss 10.5088   LearningRate 0.0603   Epoch: 4   Global Step: 185290   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:32:39,280-Speed 2630.80 samples/sec   Loss 10.5092   LearningRate 0.0603   Epoch: 4   Global Step: 185300   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:32:43,172-Speed 2631.69 samples/sec   Loss 10.6468   LearningRate 0.0603   Epoch: 4   Global Step: 185310   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:32:47,066-Speed 2630.38 samples/sec   Loss 10.6270   LearningRate 0.0603   Epoch: 4   Global Step: 185320   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:32:50,959-Speed 2630.84 samples/sec   Loss 10.6446   LearningRate 0.0603   Epoch: 4   Global Step: 185330   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:32:54,853-Speed 2631.05 samples/sec   Loss 10.6117   LearningRate 0.0603   Epoch: 4   Global Step: 185340   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:32:58,751-Speed 2627.62 samples/sec   Loss 10.4957   LearningRate 0.0603   Epoch: 4   Global Step: 185350   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:33:02,630-Speed 2640.13 samples/sec   Loss 10.6169   LearningRate 0.0603   Epoch: 4   Global Step: 185360   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:06,528-Speed 2627.54 samples/sec   Loss 10.7107   LearningRate 0.0603   Epoch: 4   Global Step: 185370   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:10,428-Speed 2626.73 samples/sec   Loss 10.5924   LearningRate 0.0603   Epoch: 4   Global Step: 185380   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:14,356-Speed 2607.71 samples/sec   Loss 10.7093   LearningRate 0.0603   Epoch: 4   Global Step: 185390   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:18,257-Speed 2625.63 samples/sec   Loss 10.7544   LearningRate 0.0603   Epoch: 4   Global Step: 185400   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:22,158-Speed 2626.23 samples/sec   Loss 10.6201   LearningRate 0.0603   Epoch: 4   Global Step: 185410   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:26,132-Speed 2576.85 samples/sec   Loss 10.6731   LearningRate 0.0603   Epoch: 4   Global Step: 185420   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:30,037-Speed 2623.43 samples/sec   Loss 10.7073   LearningRate 0.0603   Epoch: 4   Global Step: 185430   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:33,933-Speed 2628.81 samples/sec   Loss 10.5707   LearningRate 0.0603   Epoch: 4   Global Step: 185440   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:37,828-Speed 2629.78 samples/sec   Loss 10.6469   LearningRate 0.0603   Epoch: 4   Global Step: 185450   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:33:41,737-Speed 2620.31 samples/sec   Loss 10.7055   LearningRate 0.0603   Epoch: 4   Global Step: 185460   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:33:45,645-Speed 2621.07 samples/sec   Loss 10.5284   LearningRate 0.0603   Epoch: 4   Global Step: 185470   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:33:49,542-Speed 2627.95 samples/sec   Loss 10.4981   LearningRate 0.0603   Epoch: 4   Global Step: 185480   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:33:53,439-Speed 2628.50 samples/sec   Loss 10.6515   LearningRate 0.0603   Epoch: 4   Global Step: 185490   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:33:57,338-Speed 2626.78 samples/sec   Loss 10.6495   LearningRate 0.0603   Epoch: 4   Global Step: 185500   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:01,244-Speed 2622.38 samples/sec   Loss 10.6250   LearningRate 0.0603   Epoch: 4   Global Step: 185510   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:05,151-Speed 2621.29 samples/sec   Loss 10.5220   LearningRate 0.0603   Epoch: 4   Global Step: 185520   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:09,048-Speed 2627.99 samples/sec   Loss 10.6619   LearningRate 0.0603   Epoch: 4   Global Step: 185530   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:12,953-Speed 2623.28 samples/sec   Loss 10.6647   LearningRate 0.0603   Epoch: 4   Global Step: 185540   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:16,852-Speed 2627.22 samples/sec   Loss 10.4801   LearningRate 0.0603   Epoch: 4   Global Step: 185550   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:20,735-Speed 2637.76 samples/sec   Loss 10.6487   LearningRate 0.0603   Epoch: 4   Global Step: 185560   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:24,634-Speed 2626.78 samples/sec   Loss 10.6922   LearningRate 0.0603   Epoch: 4   Global Step: 185570   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:28,534-Speed 2626.31 samples/sec   Loss 10.5140   LearningRate 0.0603   Epoch: 4   Global Step: 185580   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:32,436-Speed 2625.17 samples/sec   Loss 10.6767   LearningRate 0.0603   Epoch: 4   Global Step: 185590   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:36,343-Speed 2621.41 samples/sec   Loss 10.3920   LearningRate 0.0603   Epoch: 4   Global Step: 185600   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:40,252-Speed 2620.09 samples/sec   Loss 10.6595   LearningRate 0.0603   Epoch: 4   Global Step: 185610   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:44,205-Speed 2591.48 samples/sec   Loss 10.5993   LearningRate 0.0603   Epoch: 4   Global Step: 185620   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:48,222-Speed 2549.71 samples/sec   Loss 10.6928   LearningRate 0.0603   Epoch: 4   Global Step: 185630   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:52,138-Speed 2615.71 samples/sec   Loss 10.7423   LearningRate 0.0603   Epoch: 4   Global Step: 185640   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:56,060-Speed 2611.85 samples/sec   Loss 10.5828   LearningRate 0.0603   Epoch: 4   Global Step: 185650   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:34:59,957-Speed 2628.26 samples/sec   Loss 10.5885   LearningRate 0.0602   Epoch: 4   Global Step: 185660   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:03,861-Speed 2623.68 samples/sec   Loss 10.6927   LearningRate 0.0602   Epoch: 4   Global Step: 185670   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:07,765-Speed 2623.00 samples/sec   Loss 10.5922   LearningRate 0.0602   Epoch: 4   Global Step: 185680   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:11,662-Speed 2628.11 samples/sec   Loss 10.6038   LearningRate 0.0602   Epoch: 4   Global Step: 185690   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:15,559-Speed 2628.83 samples/sec   Loss 10.6264   LearningRate 0.0602   Epoch: 4   Global Step: 185700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:19,460-Speed 2626.26 samples/sec   Loss 10.6714   LearningRate 0.0602   Epoch: 4   Global Step: 185710   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:23,385-Speed 2609.28 samples/sec   Loss 10.6732   LearningRate 0.0602   Epoch: 4   Global Step: 185720   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:27,303-Speed 2614.10 samples/sec   Loss 10.5748   LearningRate 0.0602   Epoch: 4   Global Step: 185730   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:31,210-Speed 2621.68 samples/sec   Loss 10.5597   LearningRate 0.0602   Epoch: 4   Global Step: 185740   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:35,121-Speed 2618.61 samples/sec   Loss 10.4787   LearningRate 0.0602   Epoch: 4   Global Step: 185750   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:39,033-Speed 2617.90 samples/sec   Loss 10.5546   LearningRate 0.0602   Epoch: 4   Global Step: 185760   Fp16 Grad Scale: 524288   Required: 72 hours
Training: 2022-04-13 16:35:42,932-Speed 2630.93 samples/sec   Loss 10.3505   LearningRate 0.0602   Epoch: 4   Global Step: 185770   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:46,846-Speed 2617.09 samples/sec   Loss 10.6557   LearningRate 0.0602   Epoch: 4   Global Step: 185780   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:50,745-Speed 2626.80 samples/sec   Loss 10.5838   LearningRate 0.0602   Epoch: 4   Global Step: 185790   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:54,671-Speed 2608.96 samples/sec   Loss 10.5370   LearningRate 0.0602   Epoch: 4   Global Step: 185800   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:35:58,585-Speed 2616.97 samples/sec   Loss 10.3702   LearningRate 0.0602   Epoch: 4   Global Step: 185810   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:02,482-Speed 2628.42 samples/sec   Loss 10.3938   LearningRate 0.0602   Epoch: 4   Global Step: 185820   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:06,386-Speed 2623.61 samples/sec   Loss 10.5216   LearningRate 0.0602   Epoch: 4   Global Step: 185830   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:10,289-Speed 2624.07 samples/sec   Loss 10.5938   LearningRate 0.0602   Epoch: 4   Global Step: 185840   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:14,193-Speed 2623.83 samples/sec   Loss 10.5293   LearningRate 0.0602   Epoch: 4   Global Step: 185850   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:18,102-Speed 2619.65 samples/sec   Loss 10.5330   LearningRate 0.0602   Epoch: 4   Global Step: 185860   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:22,009-Speed 2621.79 samples/sec   Loss 10.6690   LearningRate 0.0602   Epoch: 4   Global Step: 185870   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:25,908-Speed 2627.07 samples/sec   Loss 10.4654   LearningRate 0.0602   Epoch: 4   Global Step: 185880   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:29,811-Speed 2624.25 samples/sec   Loss 10.6081   LearningRate 0.0602   Epoch: 4   Global Step: 185890   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:33,715-Speed 2624.15 samples/sec   Loss 10.4066   LearningRate 0.0602   Epoch: 4   Global Step: 185900   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:36:37,619-Speed 2623.08 samples/sec   Loss 10.5715   LearningRate 0.0602   Epoch: 4   Global Step: 185910   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:36:41,518-Speed 2627.01 samples/sec   Loss 10.4880   LearningRate 0.0602   Epoch: 4   Global Step: 185920   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:36:45,419-Speed 2624.89 samples/sec   Loss 10.6168   LearningRate 0.0602   Epoch: 4   Global Step: 185930   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:36:49,328-Speed 2620.57 samples/sec   Loss 10.5328   LearningRate 0.0602   Epoch: 4   Global Step: 185940   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:36:53,232-Speed 2623.87 samples/sec   Loss 10.6299   LearningRate 0.0602   Epoch: 4   Global Step: 185950   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:36:57,134-Speed 2625.24 samples/sec   Loss 10.6442   LearningRate 0.0602   Epoch: 4   Global Step: 185960   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:01,034-Speed 2626.19 samples/sec   Loss 10.5992   LearningRate 0.0602   Epoch: 4   Global Step: 185970   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:04,938-Speed 2631.19 samples/sec   Loss 10.5362   LearningRate 0.0602   Epoch: 4   Global Step: 185980   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:08,842-Speed 2623.69 samples/sec   Loss 10.5972   LearningRate 0.0602   Epoch: 4   Global Step: 185990   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:12,744-Speed 2624.80 samples/sec   Loss 10.4121   LearningRate 0.0602   Epoch: 4   Global Step: 186000   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:16,630-Speed 2635.40 samples/sec   Loss 10.6424   LearningRate 0.0602   Epoch: 4   Global Step: 186010   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:20,546-Speed 2615.85 samples/sec   Loss 10.5703   LearningRate 0.0602   Epoch: 4   Global Step: 186020   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:24,437-Speed 2631.97 samples/sec   Loss 10.3762   LearningRate 0.0602   Epoch: 4   Global Step: 186030   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:28,340-Speed 2624.43 samples/sec   Loss 10.6347   LearningRate 0.0602   Epoch: 4   Global Step: 186040   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:32,295-Speed 2590.01 samples/sec   Loss 10.4400   LearningRate 0.0602   Epoch: 4   Global Step: 186050   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:36,201-Speed 2622.52 samples/sec   Loss 10.5505   LearningRate 0.0602   Epoch: 4   Global Step: 186060   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:40,103-Speed 2624.16 samples/sec   Loss 10.5984   LearningRate 0.0602   Epoch: 4   Global Step: 186070   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:44,020-Speed 2614.98 samples/sec   Loss 10.7104   LearningRate 0.0602   Epoch: 4   Global Step: 186080   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:47,921-Speed 2625.13 samples/sec   Loss 10.5839   LearningRate 0.0602   Epoch: 4   Global Step: 186090   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:51,824-Speed 2624.08 samples/sec   Loss 10.3272   LearningRate 0.0602   Epoch: 4   Global Step: 186100   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:55,716-Speed 2632.31 samples/sec   Loss 10.5290   LearningRate 0.0602   Epoch: 4   Global Step: 186110   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:37:59,629-Speed 2616.78 samples/sec   Loss 10.6250   LearningRate 0.0602   Epoch: 4   Global Step: 186120   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:38:03,533-Speed 2623.94 samples/sec   Loss 10.6114   LearningRate 0.0602   Epoch: 4   Global Step: 186130   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:38:07,419-Speed 2635.67 samples/sec   Loss 10.6750   LearningRate 0.0602   Epoch: 4   Global Step: 186140   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:11,321-Speed 2624.88 samples/sec   Loss 10.4252   LearningRate 0.0602   Epoch: 4   Global Step: 186150   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:15,224-Speed 2624.10 samples/sec   Loss 10.4898   LearningRate 0.0602   Epoch: 4   Global Step: 186160   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:19,135-Speed 2619.33 samples/sec   Loss 10.4463   LearningRate 0.0602   Epoch: 4   Global Step: 186170   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:23,038-Speed 2623.86 samples/sec   Loss 10.5784   LearningRate 0.0602   Epoch: 4   Global Step: 186180   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:26,940-Speed 2624.75 samples/sec   Loss 10.5654   LearningRate 0.0601   Epoch: 4   Global Step: 186190   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:30,844-Speed 2623.77 samples/sec   Loss 10.4988   LearningRate 0.0601   Epoch: 4   Global Step: 186200   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:34,744-Speed 2626.14 samples/sec   Loss 10.4930   LearningRate 0.0601   Epoch: 4   Global Step: 186210   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:38,660-Speed 2615.13 samples/sec   Loss 10.5333   LearningRate 0.0601   Epoch: 4   Global Step: 186220   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:42,566-Speed 2622.67 samples/sec   Loss 10.6241   LearningRate 0.0601   Epoch: 4   Global Step: 186230   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:46,467-Speed 2625.49 samples/sec   Loss 10.4176   LearningRate 0.0601   Epoch: 4   Global Step: 186240   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:38:50,357-Speed 2633.28 samples/sec   Loss 10.6028   LearningRate 0.0601   Epoch: 4   Global Step: 186250   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:54,278-Speed 2612.25 samples/sec   Loss 10.4645   LearningRate 0.0601   Epoch: 4   Global Step: 186260   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:38:58,132-Speed 2657.40 samples/sec   Loss 11.4375   LearningRate 0.0601   Epoch: 4   Global Step: 186270   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:02,023-Speed 2632.10 samples/sec   Loss 11.5140   LearningRate 0.0601   Epoch: 4   Global Step: 186280   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:05,928-Speed 2622.78 samples/sec   Loss 11.2130   LearningRate 0.0601   Epoch: 4   Global Step: 186290   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:09,834-Speed 2621.86 samples/sec   Loss 10.8443   LearningRate 0.0601   Epoch: 4   Global Step: 186300   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:13,731-Speed 2628.32 samples/sec   Loss 10.6664   LearningRate 0.0601   Epoch: 4   Global Step: 186310   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:17,632-Speed 2625.63 samples/sec   Loss 10.5800   LearningRate 0.0601   Epoch: 4   Global Step: 186320   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:21,532-Speed 2626.31 samples/sec   Loss 10.5862   LearningRate 0.0601   Epoch: 4   Global Step: 186330   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:25,429-Speed 2628.17 samples/sec   Loss 10.5918   LearningRate 0.0601   Epoch: 4   Global Step: 186340   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:29,322-Speed 2631.33 samples/sec   Loss 10.4820   LearningRate 0.0601   Epoch: 4   Global Step: 186350   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:33,214-Speed 2631.39 samples/sec   Loss 10.5446   LearningRate 0.0601   Epoch: 4   Global Step: 186360   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 16:39:37,123-Speed 2620.31 samples/sec   Loss 10.6743   LearningRate 0.0601   Epoch: 4   Global Step: 186370   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:39:41,021-Speed 2627.13 samples/sec   Loss 10.5657   LearningRate 0.0601   Epoch: 4   Global Step: 186380   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:39:44,923-Speed 2625.26 samples/sec   Loss 10.6091   LearningRate 0.0601   Epoch: 4   Global Step: 186390   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:39:48,825-Speed 2624.32 samples/sec   Loss 10.5826   LearningRate 0.0601   Epoch: 4   Global Step: 186400   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:39:52,729-Speed 2623.69 samples/sec   Loss 10.6542   LearningRate 0.0601   Epoch: 4   Global Step: 186410   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:39:56,640-Speed 2619.02 samples/sec   Loss 10.5544   LearningRate 0.0601   Epoch: 4   Global Step: 186420   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:40:00,568-Speed 2607.91 samples/sec   Loss 10.7568   LearningRate 0.0601   Epoch: 4   Global Step: 186430   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:40:04,483-Speed 2615.89 samples/sec   Loss 11.0367   LearningRate 0.0601   Epoch: 4   Global Step: 186440   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:40:08,379-Speed 2629.23 samples/sec   Loss 10.8165   LearningRate 0.0601   Epoch: 4   Global Step: 186450   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:40:12,284-Speed 2622.26 samples/sec   Loss 10.6589   LearningRate 0.0601   Epoch: 4   Global Step: 186460   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:40:16,186-Speed 2625.28 samples/sec   Loss 10.6777   LearningRate 0.0601   Epoch: 4   Global Step: 186470   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:20,097-Speed 2618.67 samples/sec   Loss 10.5233   LearningRate 0.0601   Epoch: 4   Global Step: 186480   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:24,007-Speed 2619.47 samples/sec   Loss 10.6226   LearningRate 0.0601   Epoch: 4   Global Step: 186490   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:27,910-Speed 2624.42 samples/sec   Loss 10.6019   LearningRate 0.0601   Epoch: 4   Global Step: 186500   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:31,801-Speed 2632.36 samples/sec   Loss 10.5466   LearningRate 0.0601   Epoch: 4   Global Step: 186510   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:35,693-Speed 2632.03 samples/sec   Loss 10.6319   LearningRate 0.0601   Epoch: 4   Global Step: 186520   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:39,588-Speed 2629.32 samples/sec   Loss 10.5208   LearningRate 0.0601   Epoch: 4   Global Step: 186530   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:43,484-Speed 2628.74 samples/sec   Loss 10.5828   LearningRate 0.0601   Epoch: 4   Global Step: 186540   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:47,377-Speed 2630.92 samples/sec   Loss 10.5851   LearningRate 0.0601   Epoch: 4   Global Step: 186550   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:51,285-Speed 2621.25 samples/sec   Loss 10.6229   LearningRate 0.0601   Epoch: 4   Global Step: 186560   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:40:55,183-Speed 2627.48 samples/sec   Loss 10.5823   LearningRate 0.0601   Epoch: 4   Global Step: 186570   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:40:59,074-Speed 2632.05 samples/sec   Loss 10.5791   LearningRate 0.0601   Epoch: 4   Global Step: 186580   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:02,970-Speed 2629.29 samples/sec   Loss 10.5299   LearningRate 0.0601   Epoch: 4   Global Step: 186590   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:06,871-Speed 2625.81 samples/sec   Loss 10.7638   LearningRate 0.0601   Epoch: 4   Global Step: 186600   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:10,775-Speed 2623.01 samples/sec   Loss 10.6339   LearningRate 0.0601   Epoch: 4   Global Step: 186610   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:14,669-Speed 2630.44 samples/sec   Loss 10.6755   LearningRate 0.0601   Epoch: 4   Global Step: 186620   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:18,569-Speed 2626.33 samples/sec   Loss 10.6200   LearningRate 0.0601   Epoch: 4   Global Step: 186630   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:22,546-Speed 2575.43 samples/sec   Loss 10.6563   LearningRate 0.0601   Epoch: 4   Global Step: 186640   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:26,455-Speed 2620.03 samples/sec   Loss 10.6602   LearningRate 0.0601   Epoch: 4   Global Step: 186650   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:30,339-Speed 2637.25 samples/sec   Loss 10.6547   LearningRate 0.0601   Epoch: 4   Global Step: 186660   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:41:34,256-Speed 2614.97 samples/sec   Loss 10.5280   LearningRate 0.0601   Epoch: 4   Global Step: 186670   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:41:38,164-Speed 2620.47 samples/sec   Loss 10.5670   LearningRate 0.0601   Epoch: 4   Global Step: 186680   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:41:42,068-Speed 2623.26 samples/sec   Loss 10.5878   LearningRate 0.0601   Epoch: 4   Global Step: 186690   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:41:45,968-Speed 2627.56 samples/sec   Loss 10.6878   LearningRate 0.0601   Epoch: 4   Global Step: 186700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:41:49,870-Speed 2624.81 samples/sec   Loss 10.4425   LearningRate 0.0601   Epoch: 4   Global Step: 186710   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:41:53,771-Speed 2625.81 samples/sec   Loss 10.6894   LearningRate 0.0601   Epoch: 4   Global Step: 186720   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:41:57,673-Speed 2624.34 samples/sec   Loss 10.6315   LearningRate 0.0600   Epoch: 4   Global Step: 186730   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:42:01,578-Speed 2623.08 samples/sec   Loss 10.4926   LearningRate 0.0600   Epoch: 4   Global Step: 186740   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:42:05,463-Speed 2636.30 samples/sec   Loss 10.5588   LearningRate 0.0600   Epoch: 4   Global Step: 186750   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:09,360-Speed 2628.26 samples/sec   Loss 10.4621   LearningRate 0.0600   Epoch: 4   Global Step: 186760   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:13,261-Speed 2625.16 samples/sec   Loss 10.6299   LearningRate 0.0600   Epoch: 4   Global Step: 186770   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:17,163-Speed 2624.61 samples/sec   Loss 10.5018   LearningRate 0.0600   Epoch: 4   Global Step: 186780   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:21,065-Speed 2625.15 samples/sec   Loss 10.4914   LearningRate 0.0600   Epoch: 4   Global Step: 186790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:24,972-Speed 2625.15 samples/sec   Loss 10.5077   LearningRate 0.0600   Epoch: 4   Global Step: 186800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:28,874-Speed 2624.50 samples/sec   Loss 10.4358   LearningRate 0.0600   Epoch: 4   Global Step: 186810   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:32,776-Speed 2625.05 samples/sec   Loss 10.4901   LearningRate 0.0600   Epoch: 4   Global Step: 186820   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:36,677-Speed 2625.36 samples/sec   Loss 10.5502   LearningRate 0.0600   Epoch: 4   Global Step: 186830   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:40,572-Speed 2629.87 samples/sec   Loss 10.5861   LearningRate 0.0600   Epoch: 4   Global Step: 186840   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:42:44,464-Speed 2630.90 samples/sec   Loss 10.5704   LearningRate 0.0600   Epoch: 4   Global Step: 186850   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:42:48,357-Speed 2631.15 samples/sec   Loss 10.6421   LearningRate 0.0600   Epoch: 4   Global Step: 186860   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:42:52,248-Speed 2632.09 samples/sec   Loss 10.6998   LearningRate 0.0600   Epoch: 4   Global Step: 186870   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:42:56,123-Speed 2643.71 samples/sec   Loss 10.4447   LearningRate 0.0600   Epoch: 4   Global Step: 186880   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:00,016-Speed 2631.09 samples/sec   Loss 10.6650   LearningRate 0.0600   Epoch: 4   Global Step: 186890   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:03,908-Speed 2631.90 samples/sec   Loss 10.5083   LearningRate 0.0600   Epoch: 4   Global Step: 186900   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:07,801-Speed 2631.02 samples/sec   Loss 10.4122   LearningRate 0.0600   Epoch: 4   Global Step: 186910   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:11,693-Speed 2631.36 samples/sec   Loss 10.6849   LearningRate 0.0600   Epoch: 4   Global Step: 186920   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:15,585-Speed 2631.40 samples/sec   Loss 10.6880   LearningRate 0.0600   Epoch: 4   Global Step: 186930   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:19,497-Speed 2618.74 samples/sec   Loss 10.4990   LearningRate 0.0600   Epoch: 4   Global Step: 186940   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:23,443-Speed 2595.80 samples/sec   Loss 10.6097   LearningRate 0.0600   Epoch: 4   Global Step: 186950   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:27,353-Speed 2619.46 samples/sec   Loss 10.6315   LearningRate 0.0600   Epoch: 4   Global Step: 186960   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:31,247-Speed 2629.92 samples/sec   Loss 10.6655   LearningRate 0.0600   Epoch: 4   Global Step: 186970   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:35,145-Speed 2627.97 samples/sec   Loss 10.4514   LearningRate 0.0600   Epoch: 4   Global Step: 186980   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:43:39,034-Speed 2633.92 samples/sec   Loss 10.6183   LearningRate 0.0600   Epoch: 4   Global Step: 186990   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:43:42,925-Speed 2631.85 samples/sec   Loss 10.6322   LearningRate 0.0600   Epoch: 4   Global Step: 187000   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:43:46,818-Speed 2631.12 samples/sec   Loss 10.6246   LearningRate 0.0600   Epoch: 4   Global Step: 187010   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:43:50,698-Speed 2639.70 samples/sec   Loss 10.6591   LearningRate 0.0600   Epoch: 4   Global Step: 187020   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:43:54,575-Speed 2642.00 samples/sec   Loss 10.5420   LearningRate 0.0600   Epoch: 4   Global Step: 187030   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:43:58,462-Speed 2635.18 samples/sec   Loss 10.6892   LearningRate 0.0600   Epoch: 4   Global Step: 187040   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:02,359-Speed 2627.98 samples/sec   Loss 10.7331   LearningRate 0.0600   Epoch: 4   Global Step: 187050   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:06,254-Speed 2629.58 samples/sec   Loss 10.7130   LearningRate 0.0600   Epoch: 4   Global Step: 187060   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:10,155-Speed 2626.04 samples/sec   Loss 10.6622   LearningRate 0.0600   Epoch: 4   Global Step: 187070   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:14,052-Speed 2628.01 samples/sec   Loss 10.5504   LearningRate 0.0600   Epoch: 4   Global Step: 187080   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:17,952-Speed 2626.08 samples/sec   Loss 10.5250   LearningRate 0.0600   Epoch: 4   Global Step: 187090   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:21,848-Speed 2628.72 samples/sec   Loss 10.6355   LearningRate 0.0600   Epoch: 4   Global Step: 187100   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:25,748-Speed 2626.73 samples/sec   Loss 10.6229   LearningRate 0.0600   Epoch: 4   Global Step: 187110   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:29,646-Speed 2627.06 samples/sec   Loss 10.6098   LearningRate 0.0600   Epoch: 4   Global Step: 187120   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:33,568-Speed 2611.63 samples/sec   Loss 10.6042   LearningRate 0.0600   Epoch: 4   Global Step: 187130   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 16:44:37,498-Speed 2606.06 samples/sec   Loss 10.5648   LearningRate 0.0600   Epoch: 4   Global Step: 187140   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:44:41,422-Speed 2610.14 samples/sec   Loss 10.6832   LearningRate 0.0600   Epoch: 4   Global Step: 187150   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:44:45,344-Speed 2612.01 samples/sec   Loss 10.5489   LearningRate 0.0600   Epoch: 4   Global Step: 187160   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:44:49,239-Speed 2629.32 samples/sec   Loss 10.6238   LearningRate 0.0600   Epoch: 4   Global Step: 187170   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:44:53,130-Speed 2632.68 samples/sec   Loss 10.6794   LearningRate 0.0600   Epoch: 4   Global Step: 187180   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:44:57,021-Speed 2631.88 samples/sec   Loss 10.5339   LearningRate 0.0600   Epoch: 4   Global Step: 187190   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:00,914-Speed 2631.01 samples/sec   Loss 10.5048   LearningRate 0.0600   Epoch: 4   Global Step: 187200   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:04,807-Speed 2630.60 samples/sec   Loss 10.6867   LearningRate 0.0600   Epoch: 4   Global Step: 187210   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:08,707-Speed 2626.59 samples/sec   Loss 10.6137   LearningRate 0.0600   Epoch: 4   Global Step: 187220   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:12,600-Speed 2630.99 samples/sec   Loss 10.5960   LearningRate 0.0600   Epoch: 4   Global Step: 187230   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:16,492-Speed 2631.91 samples/sec   Loss 10.5985   LearningRate 0.0600   Epoch: 4   Global Step: 187240   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:45:20,386-Speed 2630.06 samples/sec   Loss 10.5914   LearningRate 0.0600   Epoch: 4   Global Step: 187250   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:45:24,277-Speed 2632.59 samples/sec   Loss 10.5621   LearningRate 0.0599   Epoch: 4   Global Step: 187260   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:45:28,170-Speed 2631.01 samples/sec   Loss 10.5079   LearningRate 0.0599   Epoch: 4   Global Step: 187270   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:32,078-Speed 2620.81 samples/sec   Loss 10.5303   LearningRate 0.0599   Epoch: 4   Global Step: 187280   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:35,986-Speed 2620.68 samples/sec   Loss 10.4235   LearningRate 0.0599   Epoch: 4   Global Step: 187290   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:39,884-Speed 2627.48 samples/sec   Loss 10.5743   LearningRate 0.0599   Epoch: 4   Global Step: 187300   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:43,779-Speed 2629.93 samples/sec   Loss 10.5982   LearningRate 0.0599   Epoch: 4   Global Step: 187310   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:47,678-Speed 2626.73 samples/sec   Loss 10.5930   LearningRate 0.0599   Epoch: 4   Global Step: 187320   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:51,569-Speed 2632.65 samples/sec   Loss 10.5661   LearningRate 0.0599   Epoch: 4   Global Step: 187330   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:55,464-Speed 2629.07 samples/sec   Loss 10.4675   LearningRate 0.0599   Epoch: 4   Global Step: 187340   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:45:59,364-Speed 2627.22 samples/sec   Loss 10.4862   LearningRate 0.0599   Epoch: 4   Global Step: 187350   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:46:03,262-Speed 2627.54 samples/sec   Loss 10.5555   LearningRate 0.0599   Epoch: 4   Global Step: 187360   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 16:46:07,162-Speed 2626.04 samples/sec   Loss 10.6979   LearningRate 0.0599   Epoch: 4   Global Step: 187370   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:11,057-Speed 2629.04 samples/sec   Loss 10.4765   LearningRate 0.0599   Epoch: 4   Global Step: 187380   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:14,978-Speed 2612.67 samples/sec   Loss 10.4594   LearningRate 0.0599   Epoch: 4   Global Step: 187390   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:18,873-Speed 2630.13 samples/sec   Loss 10.5156   LearningRate 0.0599   Epoch: 4   Global Step: 187400   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:22,771-Speed 2627.22 samples/sec   Loss 10.6569   LearningRate 0.0599   Epoch: 4   Global Step: 187410   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:26,680-Speed 2620.46 samples/sec   Loss 10.4233   LearningRate 0.0599   Epoch: 4   Global Step: 187420   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:30,569-Speed 2634.28 samples/sec   Loss 10.5055   LearningRate 0.0599   Epoch: 4   Global Step: 187430   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:34,478-Speed 2620.11 samples/sec   Loss 10.5126   LearningRate 0.0599   Epoch: 4   Global Step: 187440   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:38,375-Speed 2628.09 samples/sec   Loss 10.6673   LearningRate 0.0599   Epoch: 4   Global Step: 187450   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:42,272-Speed 2628.21 samples/sec   Loss 10.4025   LearningRate 0.0599   Epoch: 4   Global Step: 187460   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:46,168-Speed 2629.11 samples/sec   Loss 10.5022   LearningRate 0.0599   Epoch: 4   Global Step: 187470   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:46:50,044-Speed 2642.65 samples/sec   Loss 10.4502   LearningRate 0.0599   Epoch: 4   Global Step: 187480   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:53,939-Speed 2630.18 samples/sec   Loss 10.4803   LearningRate 0.0599   Epoch: 4   Global Step: 187490   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:46:57,831-Speed 2631.39 samples/sec   Loss 10.4543   LearningRate 0.0599   Epoch: 4   Global Step: 187500   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:01,728-Speed 2628.28 samples/sec   Loss 10.5173   LearningRate 0.0599   Epoch: 4   Global Step: 187510   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:05,618-Speed 2632.57 samples/sec   Loss 10.6053   LearningRate 0.0599   Epoch: 4   Global Step: 187520   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:09,507-Speed 2633.51 samples/sec   Loss 10.6049   LearningRate 0.0599   Epoch: 4   Global Step: 187530   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:13,404-Speed 2628.46 samples/sec   Loss 10.5100   LearningRate 0.0599   Epoch: 4   Global Step: 187540   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:17,301-Speed 2628.52 samples/sec   Loss 10.6536   LearningRate 0.0599   Epoch: 4   Global Step: 187550   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:21,196-Speed 2629.93 samples/sec   Loss 10.5619   LearningRate 0.0599   Epoch: 4   Global Step: 187560   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:25,087-Speed 2632.07 samples/sec   Loss 10.5578   LearningRate 0.0599   Epoch: 4   Global Step: 187570   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:47:28,993-Speed 2622.97 samples/sec   Loss 10.6286   LearningRate 0.0599   Epoch: 4   Global Step: 187580   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:47:32,889-Speed 2628.58 samples/sec   Loss 10.5452   LearningRate 0.0599   Epoch: 4   Global Step: 187590   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:47:36,781-Speed 2631.31 samples/sec   Loss 10.5183   LearningRate 0.0599   Epoch: 4   Global Step: 187600   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:47:40,675-Speed 2630.11 samples/sec   Loss 10.4594   LearningRate 0.0599   Epoch: 4   Global Step: 187610   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:47:44,579-Speed 2624.01 samples/sec   Loss 10.5644   LearningRate 0.0599   Epoch: 4   Global Step: 187620   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:47:48,473-Speed 2630.07 samples/sec   Loss 10.6456   LearningRate 0.0599   Epoch: 4   Global Step: 187630   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:47:52,370-Speed 2628.68 samples/sec   Loss 10.5082   LearningRate 0.0599   Epoch: 4   Global Step: 187640   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:47:56,296-Speed 2608.69 samples/sec   Loss 10.4825   LearningRate 0.0599   Epoch: 4   Global Step: 187650   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:00,192-Speed 2629.09 samples/sec   Loss 10.5098   LearningRate 0.0599   Epoch: 4   Global Step: 187660   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:04,094-Speed 2624.87 samples/sec   Loss 10.5094   LearningRate 0.0599   Epoch: 4   Global Step: 187670   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:07,978-Speed 2636.91 samples/sec   Loss 10.6363   LearningRate 0.0599   Epoch: 4   Global Step: 187680   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:11,871-Speed 2631.20 samples/sec   Loss 10.6011   LearningRate 0.0599   Epoch: 4   Global Step: 187690   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:15,776-Speed 2622.66 samples/sec   Loss 10.5870   LearningRate 0.0599   Epoch: 4   Global Step: 187700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:19,673-Speed 2628.65 samples/sec   Loss 10.5833   LearningRate 0.0599   Epoch: 4   Global Step: 187710   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:23,568-Speed 2629.43 samples/sec   Loss 10.6704   LearningRate 0.0599   Epoch: 4   Global Step: 187720   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:27,461-Speed 2631.31 samples/sec   Loss 10.5115   LearningRate 0.0599   Epoch: 4   Global Step: 187730   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:48:31,337-Speed 2642.52 samples/sec   Loss 10.5593   LearningRate 0.0599   Epoch: 4   Global Step: 187740   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:48:35,227-Speed 2633.09 samples/sec   Loss 10.5916   LearningRate 0.0599   Epoch: 4   Global Step: 187750   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:48:39,121-Speed 2630.67 samples/sec   Loss 10.6684   LearningRate 0.0599   Epoch: 4   Global Step: 187760   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:48:43,011-Speed 2632.69 samples/sec   Loss 10.5890   LearningRate 0.0599   Epoch: 4   Global Step: 187770   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:48:46,902-Speed 2632.23 samples/sec   Loss 10.4179   LearningRate 0.0599   Epoch: 4   Global Step: 187780   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:48:50,796-Speed 2630.57 samples/sec   Loss 10.6200   LearningRate 0.0599   Epoch: 4   Global Step: 187790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:48:54,693-Speed 2628.47 samples/sec   Loss 10.4958   LearningRate 0.0598   Epoch: 4   Global Step: 187800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:48:58,585-Speed 2631.46 samples/sec   Loss 10.5773   LearningRate 0.0598   Epoch: 4   Global Step: 187810   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:02,482-Speed 2628.06 samples/sec   Loss 10.6375   LearningRate 0.0598   Epoch: 4   Global Step: 187820   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:06,376-Speed 2630.82 samples/sec   Loss 10.6877   LearningRate 0.0598   Epoch: 4   Global Step: 187830   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:10,281-Speed 2622.65 samples/sec   Loss 10.4339   LearningRate 0.0598   Epoch: 4   Global Step: 187840   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:49:14,178-Speed 2628.00 samples/sec   Loss 10.5151   LearningRate 0.0598   Epoch: 4   Global Step: 187850   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:49:18,073-Speed 2629.54 samples/sec   Loss 10.3564   LearningRate 0.0598   Epoch: 4   Global Step: 187860   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:49:21,949-Speed 2642.36 samples/sec   Loss 10.7312   LearningRate 0.0598   Epoch: 4   Global Step: 187870   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:25,842-Speed 2631.77 samples/sec   Loss 10.4175   LearningRate 0.0598   Epoch: 4   Global Step: 187880   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:29,749-Speed 2621.72 samples/sec   Loss 10.6202   LearningRate 0.0598   Epoch: 4   Global Step: 187890   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:33,641-Speed 2631.65 samples/sec   Loss 10.6131   LearningRate 0.0598   Epoch: 4   Global Step: 187900   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:37,543-Speed 2624.59 samples/sec   Loss 10.4425   LearningRate 0.0598   Epoch: 4   Global Step: 187910   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:41,442-Speed 2626.77 samples/sec   Loss 10.5132   LearningRate 0.0598   Epoch: 4   Global Step: 187920   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:45,347-Speed 2623.00 samples/sec   Loss 10.7070   LearningRate 0.0598   Epoch: 4   Global Step: 187930   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:49,251-Speed 2623.80 samples/sec   Loss 10.4266   LearningRate 0.0598   Epoch: 4   Global Step: 187940   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:53,165-Speed 2616.60 samples/sec   Loss 10.6033   LearningRate 0.0598   Epoch: 4   Global Step: 187950   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:49:57,070-Speed 2622.90 samples/sec   Loss 10.4511   LearningRate 0.0598   Epoch: 4   Global Step: 187960   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:00,973-Speed 2624.65 samples/sec   Loss 10.5942   LearningRate 0.0598   Epoch: 4   Global Step: 187970   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:50:04,879-Speed 2622.24 samples/sec   Loss 10.4375   LearningRate 0.0598   Epoch: 4   Global Step: 187980   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:50:08,784-Speed 2623.07 samples/sec   Loss 10.3505   LearningRate 0.0598   Epoch: 4   Global Step: 187990   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:50:12,666-Speed 2638.34 samples/sec   Loss 10.6777   LearningRate 0.0598   Epoch: 4   Global Step: 188000   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:16,562-Speed 2628.63 samples/sec   Loss 10.6346   LearningRate 0.0598   Epoch: 4   Global Step: 188010   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:20,457-Speed 2629.88 samples/sec   Loss 10.6032   LearningRate 0.0598   Epoch: 4   Global Step: 188020   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:24,376-Speed 2613.51 samples/sec   Loss 10.5192   LearningRate 0.0598   Epoch: 4   Global Step: 188030   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:28,269-Speed 2630.62 samples/sec   Loss 10.4713   LearningRate 0.0598   Epoch: 4   Global Step: 188040   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:32,179-Speed 2620.23 samples/sec   Loss 10.5585   LearningRate 0.0598   Epoch: 4   Global Step: 188050   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:36,099-Speed 2613.09 samples/sec   Loss 10.5279   LearningRate 0.0598   Epoch: 4   Global Step: 188060   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:39,995-Speed 2628.55 samples/sec   Loss 10.4093   LearningRate 0.0598   Epoch: 4   Global Step: 188070   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:43,886-Speed 2632.81 samples/sec   Loss 10.3749   LearningRate 0.0598   Epoch: 4   Global Step: 188080   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:47,782-Speed 2628.82 samples/sec   Loss 10.5468   LearningRate 0.0598   Epoch: 4   Global Step: 188090   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:50:51,680-Speed 2627.40 samples/sec   Loss 10.4930   LearningRate 0.0598   Epoch: 4   Global Step: 188100   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:50:55,611-Speed 2606.56 samples/sec   Loss 10.4804   LearningRate 0.0598   Epoch: 4   Global Step: 188110   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:50:59,509-Speed 2627.14 samples/sec   Loss 10.6600   LearningRate 0.0598   Epoch: 4   Global Step: 188120   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:51:03,399-Speed 2633.41 samples/sec   Loss 10.6941   LearningRate 0.0598   Epoch: 4   Global Step: 188130   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:51:07,293-Speed 2630.13 samples/sec   Loss 10.5225   LearningRate 0.0598   Epoch: 4   Global Step: 188140   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:51:11,191-Speed 2627.69 samples/sec   Loss 10.5506   LearningRate 0.0598   Epoch: 4   Global Step: 188150   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:51:15,063-Speed 2645.16 samples/sec   Loss 10.5293   LearningRate 0.0598   Epoch: 4   Global Step: 188160   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:18,967-Speed 2623.91 samples/sec   Loss 10.5275   LearningRate 0.0598   Epoch: 4   Global Step: 188170   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:22,866-Speed 2626.89 samples/sec   Loss 10.5690   LearningRate 0.0598   Epoch: 4   Global Step: 188180   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:26,759-Speed 2630.62 samples/sec   Loss 10.5950   LearningRate 0.0598   Epoch: 4   Global Step: 188190   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:30,652-Speed 2631.85 samples/sec   Loss 10.3900   LearningRate 0.0598   Epoch: 4   Global Step: 188200   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:34,550-Speed 2627.68 samples/sec   Loss 10.5584   LearningRate 0.0598   Epoch: 4   Global Step: 188210   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:38,444-Speed 2629.81 samples/sec   Loss 10.5079   LearningRate 0.0598   Epoch: 4   Global Step: 188220   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:42,336-Speed 2631.47 samples/sec   Loss 10.5920   LearningRate 0.0598   Epoch: 4   Global Step: 188230   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:46,230-Speed 2630.85 samples/sec   Loss 10.6104   LearningRate 0.0598   Epoch: 4   Global Step: 188240   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:50,125-Speed 2629.45 samples/sec   Loss 10.6349   LearningRate 0.0598   Epoch: 4   Global Step: 188250   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:51:54,021-Speed 2629.38 samples/sec   Loss 10.5141   LearningRate 0.0598   Epoch: 4   Global Step: 188260   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:51:57,902-Speed 2638.68 samples/sec   Loss 10.5972   LearningRate 0.0598   Epoch: 4   Global Step: 188270   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:01,798-Speed 2629.39 samples/sec   Loss 10.4540   LearningRate 0.0598   Epoch: 4   Global Step: 188280   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:05,691-Speed 2631.14 samples/sec   Loss 10.4539   LearningRate 0.0598   Epoch: 4   Global Step: 188290   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:09,583-Speed 2631.62 samples/sec   Loss 10.5220   LearningRate 0.0598   Epoch: 4   Global Step: 188300   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:13,477-Speed 2630.01 samples/sec   Loss 10.6716   LearningRate 0.0598   Epoch: 4   Global Step: 188310   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:17,370-Speed 2631.21 samples/sec   Loss 10.4980   LearningRate 0.0598   Epoch: 4   Global Step: 188320   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:21,267-Speed 2628.60 samples/sec   Loss 10.4403   LearningRate 0.0598   Epoch: 4   Global Step: 188330   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:25,171-Speed 2623.78 samples/sec   Loss 10.4099   LearningRate 0.0597   Epoch: 4   Global Step: 188340   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:29,067-Speed 2629.21 samples/sec   Loss 10.4585   LearningRate 0.0597   Epoch: 4   Global Step: 188350   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:32,962-Speed 2629.47 samples/sec   Loss 10.6576   LearningRate 0.0597   Epoch: 4   Global Step: 188360   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:52:36,858-Speed 2628.93 samples/sec   Loss 10.4857   LearningRate 0.0597   Epoch: 4   Global Step: 188370   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:52:40,760-Speed 2624.45 samples/sec   Loss 10.5338   LearningRate 0.0597   Epoch: 4   Global Step: 188380   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:52:44,657-Speed 2628.83 samples/sec   Loss 10.5389   LearningRate 0.0597   Epoch: 4   Global Step: 188390   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:52:48,566-Speed 2620.43 samples/sec   Loss 10.6752   LearningRate 0.0597   Epoch: 4   Global Step: 188400   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:52:52,476-Speed 2619.84 samples/sec   Loss 10.6253   LearningRate 0.0597   Epoch: 4   Global Step: 188410   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:52:56,378-Speed 2624.34 samples/sec   Loss 10.6157   LearningRate 0.0597   Epoch: 4   Global Step: 188420   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:00,285-Speed 2622.28 samples/sec   Loss 10.4068   LearningRate 0.0597   Epoch: 4   Global Step: 188430   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:04,206-Speed 2611.56 samples/sec   Loss 10.3940   LearningRate 0.0597   Epoch: 4   Global Step: 188440   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:08,103-Speed 2627.99 samples/sec   Loss 10.5349   LearningRate 0.0597   Epoch: 4   Global Step: 188450   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:12,007-Speed 2623.59 samples/sec   Loss 10.5692   LearningRate 0.0597   Epoch: 4   Global Step: 188460   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:15,889-Speed 2639.07 samples/sec   Loss 10.6236   LearningRate 0.0597   Epoch: 4   Global Step: 188470   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:19,789-Speed 2625.62 samples/sec   Loss 10.5436   LearningRate 0.0597   Epoch: 4   Global Step: 188480   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:23,692-Speed 2624.63 samples/sec   Loss 10.5635   LearningRate 0.0597   Epoch: 4   Global Step: 188490   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:27,647-Speed 2590.31 samples/sec   Loss 10.6575   LearningRate 0.0597   Epoch: 4   Global Step: 188500   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:31,553-Speed 2622.32 samples/sec   Loss 10.3975   LearningRate 0.0597   Epoch: 4   Global Step: 188510   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:35,458-Speed 2622.31 samples/sec   Loss 10.6141   LearningRate 0.0597   Epoch: 4   Global Step: 188520   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:39,361-Speed 2624.25 samples/sec   Loss 10.5599   LearningRate 0.0597   Epoch: 4   Global Step: 188530   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:43,323-Speed 2584.97 samples/sec   Loss 10.5793   LearningRate 0.0597   Epoch: 4   Global Step: 188540   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:47,219-Speed 2629.66 samples/sec   Loss 10.5825   LearningRate 0.0597   Epoch: 4   Global Step: 188550   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:51,125-Speed 2622.24 samples/sec   Loss 10.5474   LearningRate 0.0597   Epoch: 4   Global Step: 188560   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:55,004-Speed 2640.25 samples/sec   Loss 10.5616   LearningRate 0.0597   Epoch: 4   Global Step: 188570   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:53:58,888-Speed 2637.26 samples/sec   Loss 10.5254   LearningRate 0.0597   Epoch: 4   Global Step: 188580   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:02,779-Speed 2632.68 samples/sec   Loss 10.4823   LearningRate 0.0597   Epoch: 4   Global Step: 188590   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:06,675-Speed 2628.66 samples/sec   Loss 10.5075   LearningRate 0.0597   Epoch: 4   Global Step: 188600   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:10,566-Speed 2632.60 samples/sec   Loss 10.6056   LearningRate 0.0597   Epoch: 4   Global Step: 188610   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:14,461-Speed 2629.92 samples/sec   Loss 10.5104   LearningRate 0.0597   Epoch: 4   Global Step: 188620   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:18,354-Speed 2631.05 samples/sec   Loss 10.4864   LearningRate 0.0597   Epoch: 4   Global Step: 188630   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:22,246-Speed 2631.69 samples/sec   Loss 10.5838   LearningRate 0.0597   Epoch: 4   Global Step: 188640   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:26,143-Speed 2628.50 samples/sec   Loss 10.5098   LearningRate 0.0597   Epoch: 4   Global Step: 188650   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:30,037-Speed 2630.20 samples/sec   Loss 10.5311   LearningRate 0.0597   Epoch: 4   Global Step: 188660   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:33,929-Speed 2631.71 samples/sec   Loss 10.6492   LearningRate 0.0597   Epoch: 4   Global Step: 188670   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:37,824-Speed 2630.02 samples/sec   Loss 10.5138   LearningRate 0.0597   Epoch: 4   Global Step: 188680   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:54:41,741-Speed 2614.41 samples/sec   Loss 10.5098   LearningRate 0.0597   Epoch: 4   Global Step: 188690   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:54:45,676-Speed 2602.85 samples/sec   Loss 10.7023   LearningRate 0.0597   Epoch: 4   Global Step: 188700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:54:49,559-Speed 2637.96 samples/sec   Loss 10.5941   LearningRate 0.0597   Epoch: 4   Global Step: 188710   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:53,458-Speed 2626.71 samples/sec   Loss 10.5436   LearningRate 0.0597   Epoch: 4   Global Step: 188720   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:54:57,360-Speed 2625.37 samples/sec   Loss 10.5774   LearningRate 0.0597   Epoch: 4   Global Step: 188730   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:01,255-Speed 2629.85 samples/sec   Loss 10.3777   LearningRate 0.0597   Epoch: 4   Global Step: 188740   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:05,171-Speed 2615.82 samples/sec   Loss 10.6520   LearningRate 0.0597   Epoch: 4   Global Step: 188750   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:09,090-Speed 2613.47 samples/sec   Loss 10.5316   LearningRate 0.0597   Epoch: 4   Global Step: 188760   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:12,988-Speed 2627.42 samples/sec   Loss 10.5007   LearningRate 0.0597   Epoch: 4   Global Step: 188770   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:16,884-Speed 2628.92 samples/sec   Loss 10.4725   LearningRate 0.0597   Epoch: 4   Global Step: 188780   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:20,779-Speed 2629.59 samples/sec   Loss 10.4902   LearningRate 0.0597   Epoch: 4   Global Step: 188790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:24,674-Speed 2629.81 samples/sec   Loss 10.4950   LearningRate 0.0597   Epoch: 4   Global Step: 188800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:55:28,573-Speed 2627.45 samples/sec   Loss 10.4425   LearningRate 0.0597   Epoch: 4   Global Step: 188810   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:32,466-Speed 2630.67 samples/sec   Loss 10.4759   LearningRate 0.0597   Epoch: 4   Global Step: 188820   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:36,387-Speed 2612.33 samples/sec   Loss 10.4692   LearningRate 0.0597   Epoch: 4   Global Step: 188830   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:40,290-Speed 2624.64 samples/sec   Loss 10.5670   LearningRate 0.0597   Epoch: 4   Global Step: 188840   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:44,187-Speed 2628.02 samples/sec   Loss 10.4246   LearningRate 0.0597   Epoch: 4   Global Step: 188850   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:48,092-Speed 2622.63 samples/sec   Loss 10.4728   LearningRate 0.0597   Epoch: 4   Global Step: 188860   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:51,990-Speed 2627.76 samples/sec   Loss 10.4553   LearningRate 0.0596   Epoch: 4   Global Step: 188870   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:55,886-Speed 2628.89 samples/sec   Loss 10.3828   LearningRate 0.0596   Epoch: 4   Global Step: 188880   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:55:59,770-Speed 2637.74 samples/sec   Loss 10.4221   LearningRate 0.0596   Epoch: 4   Global Step: 188890   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:03,674-Speed 2623.30 samples/sec   Loss 10.5587   LearningRate 0.0596   Epoch: 4   Global Step: 188900   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:07,569-Speed 2630.36 samples/sec   Loss 10.5024   LearningRate 0.0596   Epoch: 4   Global Step: 188910   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:11,473-Speed 2622.85 samples/sec   Loss 10.4777   LearningRate 0.0596   Epoch: 4   Global Step: 188920   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:15,374-Speed 2625.47 samples/sec   Loss 10.6322   LearningRate 0.0596   Epoch: 4   Global Step: 188930   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:19,277-Speed 2624.59 samples/sec   Loss 10.5715   LearningRate 0.0596   Epoch: 4   Global Step: 188940   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:23,167-Speed 2633.07 samples/sec   Loss 10.4818   LearningRate 0.0596   Epoch: 4   Global Step: 188950   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:27,061-Speed 2630.80 samples/sec   Loss 10.5365   LearningRate 0.0596   Epoch: 4   Global Step: 188960   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:30,972-Speed 2618.76 samples/sec   Loss 10.5647   LearningRate 0.0596   Epoch: 4   Global Step: 188970   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:34,884-Speed 2618.32 samples/sec   Loss 10.5226   LearningRate 0.0596   Epoch: 4   Global Step: 188980   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:56:38,784-Speed 2626.29 samples/sec   Loss 10.6781   LearningRate 0.0596   Epoch: 4   Global Step: 188990   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:56:42,673-Speed 2633.80 samples/sec   Loss 10.6533   LearningRate 0.0596   Epoch: 4   Global Step: 189000   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:56:46,567-Speed 2630.84 samples/sec   Loss 10.4945   LearningRate 0.0596   Epoch: 4   Global Step: 189010   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:56:50,464-Speed 2628.13 samples/sec   Loss 10.3796   LearningRate 0.0596   Epoch: 4   Global Step: 189020   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:56:54,360-Speed 2628.84 samples/sec   Loss 10.5220   LearningRate 0.0596   Epoch: 4   Global Step: 189030   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:56:58,237-Speed 2641.78 samples/sec   Loss 10.3986   LearningRate 0.0596   Epoch: 4   Global Step: 189040   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:02,127-Speed 2633.26 samples/sec   Loss 10.4290   LearningRate 0.0596   Epoch: 4   Global Step: 189050   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:06,020-Speed 2631.02 samples/sec   Loss 10.5022   LearningRate 0.0596   Epoch: 4   Global Step: 189060   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:09,911-Speed 2632.55 samples/sec   Loss 10.5786   LearningRate 0.0596   Epoch: 4   Global Step: 189070   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:13,802-Speed 2631.76 samples/sec   Loss 10.5241   LearningRate 0.0596   Epoch: 4   Global Step: 189080   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:17,691-Speed 2634.22 samples/sec   Loss 10.5457   LearningRate 0.0596   Epoch: 4   Global Step: 189090   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:21,582-Speed 2632.17 samples/sec   Loss 10.6496   LearningRate 0.0596   Epoch: 4   Global Step: 189100   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:25,470-Speed 2634.32 samples/sec   Loss 10.4341   LearningRate 0.0596   Epoch: 4   Global Step: 189110   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:29,359-Speed 2633.61 samples/sec   Loss 10.4533   LearningRate 0.0596   Epoch: 4   Global Step: 189120   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:33,250-Speed 2632.64 samples/sec   Loss 10.4929   LearningRate 0.0596   Epoch: 4   Global Step: 189130   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:37,148-Speed 2627.75 samples/sec   Loss 10.5382   LearningRate 0.0596   Epoch: 4   Global Step: 189140   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:57:41,020-Speed 2644.99 samples/sec   Loss 10.6221   LearningRate 0.0596   Epoch: 4   Global Step: 189150   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:44,913-Speed 2631.08 samples/sec   Loss 10.4533   LearningRate 0.0596   Epoch: 4   Global Step: 189160   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:48,806-Speed 2630.52 samples/sec   Loss 10.5076   LearningRate 0.0596   Epoch: 4   Global Step: 189170   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:52,696-Speed 2633.33 samples/sec   Loss 10.5468   LearningRate 0.0596   Epoch: 4   Global Step: 189180   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:57:56,586-Speed 2633.34 samples/sec   Loss 10.5621   LearningRate 0.0596   Epoch: 4   Global Step: 189190   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:00,479-Speed 2630.63 samples/sec   Loss 10.5538   LearningRate 0.0596   Epoch: 4   Global Step: 189200   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:04,378-Speed 2627.21 samples/sec   Loss 10.4538   LearningRate 0.0596   Epoch: 4   Global Step: 189210   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:08,283-Speed 2622.71 samples/sec   Loss 10.6164   LearningRate 0.0596   Epoch: 4   Global Step: 189220   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:12,172-Speed 2633.75 samples/sec   Loss 10.4241   LearningRate 0.0596   Epoch: 4   Global Step: 189230   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:16,063-Speed 2632.42 samples/sec   Loss 10.4067   LearningRate 0.0596   Epoch: 4   Global Step: 189240   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:19,960-Speed 2628.45 samples/sec   Loss 10.4888   LearningRate 0.0596   Epoch: 4   Global Step: 189250   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:58:23,861-Speed 2625.44 samples/sec   Loss 10.5865   LearningRate 0.0596   Epoch: 4   Global Step: 189260   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:58:27,765-Speed 2623.24 samples/sec   Loss 10.3425   LearningRate 0.0596   Epoch: 4   Global Step: 189270   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:58:31,674-Speed 2620.51 samples/sec   Loss 10.4800   LearningRate 0.0596   Epoch: 4   Global Step: 189280   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:58:35,572-Speed 2627.70 samples/sec   Loss 10.6627   LearningRate 0.0596   Epoch: 4   Global Step: 189290   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:58:39,458-Speed 2635.34 samples/sec   Loss 10.6046   LearningRate 0.0596   Epoch: 4   Global Step: 189300   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:43,356-Speed 2627.70 samples/sec   Loss 10.4624   LearningRate 0.0596   Epoch: 4   Global Step: 189310   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:47,266-Speed 2619.76 samples/sec   Loss 10.4929   LearningRate 0.0596   Epoch: 4   Global Step: 189320   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:51,165-Speed 2626.81 samples/sec   Loss 10.6571   LearningRate 0.0596   Epoch: 4   Global Step: 189330   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:55,065-Speed 2626.84 samples/sec   Loss 10.5338   LearningRate 0.0596   Epoch: 4   Global Step: 189340   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:58:58,974-Speed 2619.79 samples/sec   Loss 10.5571   LearningRate 0.0596   Epoch: 4   Global Step: 189350   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:02,873-Speed 2626.92 samples/sec   Loss 10.5913   LearningRate 0.0596   Epoch: 4   Global Step: 189360   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:06,767-Speed 2630.30 samples/sec   Loss 10.4842   LearningRate 0.0596   Epoch: 4   Global Step: 189370   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:10,662-Speed 2629.17 samples/sec   Loss 10.6622   LearningRate 0.0596   Epoch: 4   Global Step: 189380   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:14,564-Speed 2625.22 samples/sec   Loss 10.4648   LearningRate 0.0596   Epoch: 4   Global Step: 189390   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:18,466-Speed 2624.86 samples/sec   Loss 10.3647   LearningRate 0.0596   Epoch: 4   Global Step: 189400   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:59:22,360-Speed 2630.70 samples/sec   Loss 10.4286   LearningRate 0.0595   Epoch: 4   Global Step: 189410   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:59:26,255-Speed 2629.82 samples/sec   Loss 10.4091   LearningRate 0.0595   Epoch: 4   Global Step: 189420   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 16:59:30,141-Speed 2635.61 samples/sec   Loss 10.5437   LearningRate 0.0595   Epoch: 4   Global Step: 189430   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:34,049-Speed 2621.08 samples/sec   Loss 10.5333   LearningRate 0.0595   Epoch: 4   Global Step: 189440   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:37,949-Speed 2625.94 samples/sec   Loss 10.5275   LearningRate 0.0595   Epoch: 4   Global Step: 189450   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:41,851-Speed 2624.86 samples/sec   Loss 10.4434   LearningRate 0.0595   Epoch: 4   Global Step: 189460   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:45,758-Speed 2621.89 samples/sec   Loss 10.5698   LearningRate 0.0595   Epoch: 4   Global Step: 189470   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:49,646-Speed 2633.61 samples/sec   Loss 10.3985   LearningRate 0.0595   Epoch: 4   Global Step: 189480   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:53,541-Speed 2630.70 samples/sec   Loss 10.3998   LearningRate 0.0595   Epoch: 4   Global Step: 189490   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 16:59:57,434-Speed 2630.44 samples/sec   Loss 10.4035   LearningRate 0.0595   Epoch: 4   Global Step: 189500   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:01,325-Speed 2632.59 samples/sec   Loss 10.5844   LearningRate 0.0595   Epoch: 4   Global Step: 189510   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:05,217-Speed 2631.44 samples/sec   Loss 10.4401   LearningRate 0.0595   Epoch: 4   Global Step: 189520   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:09,106-Speed 2633.85 samples/sec   Loss 10.5408   LearningRate 0.0595   Epoch: 4   Global Step: 189530   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:00:12,999-Speed 2630.59 samples/sec   Loss 10.5481   LearningRate 0.0595   Epoch: 4   Global Step: 189540   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:00:16,890-Speed 2632.43 samples/sec   Loss 10.4053   LearningRate 0.0595   Epoch: 4   Global Step: 189550   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:00:20,788-Speed 2627.55 samples/sec   Loss 10.3970   LearningRate 0.0595   Epoch: 4   Global Step: 189560   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:00:24,679-Speed 2632.20 samples/sec   Loss 10.5072   LearningRate 0.0595   Epoch: 4   Global Step: 189570   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:00:28,571-Speed 2631.89 samples/sec   Loss 10.4413   LearningRate 0.0595   Epoch: 4   Global Step: 189580   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:00:32,483-Speed 2618.35 samples/sec   Loss 10.5960   LearningRate 0.0595   Epoch: 4   Global Step: 189590   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:36,371-Speed 2634.28 samples/sec   Loss 10.5894   LearningRate 0.0595   Epoch: 4   Global Step: 189600   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:40,264-Speed 2631.02 samples/sec   Loss 10.5283   LearningRate 0.0595   Epoch: 4   Global Step: 189610   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:44,156-Speed 2631.75 samples/sec   Loss 10.5360   LearningRate 0.0595   Epoch: 4   Global Step: 189620   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:48,045-Speed 2633.06 samples/sec   Loss 10.5062   LearningRate 0.0595   Epoch: 4   Global Step: 189630   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:51,944-Speed 2627.90 samples/sec   Loss 10.4863   LearningRate 0.0595   Epoch: 4   Global Step: 189640   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:55,840-Speed 2629.01 samples/sec   Loss 10.5333   LearningRate 0.0595   Epoch: 4   Global Step: 189650   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:00:59,764-Speed 2610.38 samples/sec   Loss 10.3506   LearningRate 0.0595   Epoch: 4   Global Step: 189660   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:03,664-Speed 2625.96 samples/sec   Loss 10.4475   LearningRate 0.0595   Epoch: 4   Global Step: 189670   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:07,553-Speed 2633.95 samples/sec   Loss 10.3458   LearningRate 0.0595   Epoch: 4   Global Step: 189680   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:11,446-Speed 2631.32 samples/sec   Loss 10.5997   LearningRate 0.0595   Epoch: 4   Global Step: 189690   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:01:15,353-Speed 2621.25 samples/sec   Loss 10.3963   LearningRate 0.0595   Epoch: 4   Global Step: 189700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:01:19,241-Speed 2634.32 samples/sec   Loss 10.5665   LearningRate 0.0595   Epoch: 4   Global Step: 189710   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:01:23,139-Speed 2628.01 samples/sec   Loss 10.3996   LearningRate 0.0595   Epoch: 4   Global Step: 189720   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:01:27,035-Speed 2629.00 samples/sec   Loss 10.4833   LearningRate 0.0595   Epoch: 4   Global Step: 189730   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:01:30,951-Speed 2615.63 samples/sec   Loss 10.4912   LearningRate 0.0595   Epoch: 4   Global Step: 189740   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:35,014-Speed 2520.77 samples/sec   Loss 10.5216   LearningRate 0.0595   Epoch: 4   Global Step: 189750   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:38,984-Speed 2580.23 samples/sec   Loss 10.4604   LearningRate 0.0595   Epoch: 4   Global Step: 189760   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:42,877-Speed 2630.26 samples/sec   Loss 10.5101   LearningRate 0.0595   Epoch: 4   Global Step: 189770   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:46,778-Speed 2625.83 samples/sec   Loss 10.5915   LearningRate 0.0595   Epoch: 4   Global Step: 189780   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:50,669-Speed 2632.75 samples/sec   Loss 10.5055   LearningRate 0.0595   Epoch: 4   Global Step: 189790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:54,562-Speed 2631.08 samples/sec   Loss 10.4294   LearningRate 0.0595   Epoch: 4   Global Step: 189800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:01:58,460-Speed 2627.90 samples/sec   Loss 10.6250   LearningRate 0.0595   Epoch: 4   Global Step: 189810   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:02:02,354-Speed 2630.20 samples/sec   Loss 10.4573   LearningRate 0.0595   Epoch: 4   Global Step: 189820   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:02:06,247-Speed 2631.27 samples/sec   Loss 10.4817   LearningRate 0.0595   Epoch: 4   Global Step: 189830   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:02:10,143-Speed 2628.67 samples/sec   Loss 10.4053   LearningRate 0.0595   Epoch: 4   Global Step: 189840   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:14,038-Speed 2629.20 samples/sec   Loss 10.4245   LearningRate 0.0595   Epoch: 4   Global Step: 189850   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:17,928-Speed 2633.18 samples/sec   Loss 10.5061   LearningRate 0.0595   Epoch: 4   Global Step: 189860   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:21,826-Speed 2628.12 samples/sec   Loss 10.4600   LearningRate 0.0595   Epoch: 4   Global Step: 189870   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:25,738-Speed 2618.05 samples/sec   Loss 10.2931   LearningRate 0.0595   Epoch: 4   Global Step: 189880   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:29,630-Speed 2631.36 samples/sec   Loss 10.4296   LearningRate 0.0595   Epoch: 4   Global Step: 189890   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:33,533-Speed 2624.90 samples/sec   Loss 10.3907   LearningRate 0.0595   Epoch: 4   Global Step: 189900   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:37,431-Speed 2627.71 samples/sec   Loss 10.5253   LearningRate 0.0595   Epoch: 4   Global Step: 189910   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:41,326-Speed 2629.89 samples/sec   Loss 10.5326   LearningRate 0.0595   Epoch: 4   Global Step: 189920   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:45,238-Speed 2618.71 samples/sec   Loss 10.5214   LearningRate 0.0595   Epoch: 4   Global Step: 189930   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:49,119-Speed 2639.13 samples/sec   Loss 10.5698   LearningRate 0.0595   Epoch: 4   Global Step: 189940   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:53,031-Speed 2618.50 samples/sec   Loss 10.5068   LearningRate 0.0594   Epoch: 4   Global Step: 189950   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:02:56,931-Speed 2626.50 samples/sec   Loss 10.5631   LearningRate 0.0594   Epoch: 4   Global Step: 189960   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:03:00,825-Speed 2630.47 samples/sec   Loss 10.3840   LearningRate 0.0594   Epoch: 4   Global Step: 189970   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:03:04,719-Speed 2629.70 samples/sec   Loss 10.4495   LearningRate 0.0594   Epoch: 4   Global Step: 189980   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:03:08,614-Speed 2630.54 samples/sec   Loss 10.4605   LearningRate 0.0594   Epoch: 4   Global Step: 189990   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:03:12,541-Speed 2608.47 samples/sec   Loss 10.5089   LearningRate 0.0594   Epoch: 4   Global Step: 190000   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:03:55,704-[lfw][190000]XNorm: 23.847233
Training: 2022-04-13 17:03:55,704-[lfw][190000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-13 17:03:55,705-[lfw][190000]Accuracy-Highest: 0.99783
Training: 2022-04-13 17:04:46,102-[cfp_fp][190000]XNorm: 21.677466
Training: 2022-04-13 17:04:46,103-[cfp_fp][190000]Accuracy-Flip: 0.98071+-0.00804
Training: 2022-04-13 17:04:46,104-[cfp_fp][190000]Accuracy-Highest: 0.98100
Training: 2022-04-13 17:05:28,829-[agedb_30][190000]XNorm: 23.758015
Training: 2022-04-13 17:05:28,830-[agedb_30][190000]Accuracy-Flip: 0.96967+-0.00785
Training: 2022-04-13 17:05:28,831-[agedb_30][190000]Accuracy-Highest: 0.97150
Training: 2022-04-13 17:05:32,703-Speed 73.06 samples/sec   Loss 10.4995   LearningRate 0.0594   Epoch: 4   Global Step: 190010   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:05:36,530-Speed 2676.24 samples/sec   Loss 10.4946   LearningRate 0.0594   Epoch: 4   Global Step: 190020   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:05:40,400-Speed 2647.36 samples/sec   Loss 10.6074   LearningRate 0.0594   Epoch: 4   Global Step: 190030   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:05:44,283-Speed 2637.17 samples/sec   Loss 10.4411   LearningRate 0.0594   Epoch: 4   Global Step: 190040   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:05:48,284-Speed 2561.59 samples/sec   Loss 10.4263   LearningRate 0.0594   Epoch: 4   Global Step: 190050   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:05:52,156-Speed 2645.62 samples/sec   Loss 11.1960   LearningRate 0.0594   Epoch: 4   Global Step: 190060   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:05:56,037-Speed 2639.45 samples/sec   Loss 11.1806   LearningRate 0.0594   Epoch: 4   Global Step: 190070   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:05:59,916-Speed 2639.78 samples/sec   Loss 10.7807   LearningRate 0.0594   Epoch: 4   Global Step: 190080   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:03,803-Speed 2635.90 samples/sec   Loss 10.5264   LearningRate 0.0594   Epoch: 4   Global Step: 190090   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:07,696-Speed 2630.58 samples/sec   Loss 10.5076   LearningRate 0.0594   Epoch: 4   Global Step: 190100   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:11,579-Speed 2638.17 samples/sec   Loss 10.4712   LearningRate 0.0594   Epoch: 4   Global Step: 190110   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:15,461-Speed 2638.00 samples/sec   Loss 10.5662   LearningRate 0.0594   Epoch: 4   Global Step: 190120   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:19,356-Speed 2629.79 samples/sec   Loss 10.5367   LearningRate 0.0594   Epoch: 4   Global Step: 190130   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:23,243-Speed 2635.81 samples/sec   Loss 10.8120   LearningRate 0.0594   Epoch: 4   Global Step: 190140   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:27,129-Speed 2635.55 samples/sec   Loss 10.5976   LearningRate 0.0594   Epoch: 4   Global Step: 190150   Fp16 Grad Scale: 16384   Required: 72 hours
Training: 2022-04-13 17:06:31,018-Speed 2634.12 samples/sec   Loss 10.5918   LearningRate 0.0594   Epoch: 4   Global Step: 190160   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:06:34,913-Speed 2629.08 samples/sec   Loss 10.4924   LearningRate 0.0594   Epoch: 4   Global Step: 190170   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:06:38,806-Speed 2631.10 samples/sec   Loss 10.5546   LearningRate 0.0594   Epoch: 4   Global Step: 190180   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:06:42,699-Speed 2631.14 samples/sec   Loss 10.5572   LearningRate 0.0594   Epoch: 4   Global Step: 190190   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:06:46,603-Speed 2623.29 samples/sec   Loss 10.5023   LearningRate 0.0594   Epoch: 4   Global Step: 190200   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:06:50,497-Speed 2630.54 samples/sec   Loss 10.5750   LearningRate 0.0594   Epoch: 4   Global Step: 190210   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:06:54,392-Speed 2629.76 samples/sec   Loss 10.6720   LearningRate 0.0594   Epoch: 4   Global Step: 190220   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:06:58,289-Speed 2628.73 samples/sec   Loss 10.5994   LearningRate 0.0594   Epoch: 4   Global Step: 190230   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:07:02,192-Speed 2624.00 samples/sec   Loss 10.7873   LearningRate 0.0594   Epoch: 4   Global Step: 190240   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:07:06,100-Speed 2621.39 samples/sec   Loss 10.5660   LearningRate 0.0594   Epoch: 4   Global Step: 190250   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:07:10,086-Speed 2569.48 samples/sec   Loss 10.4971   LearningRate 0.0594   Epoch: 4   Global Step: 190260   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:14,155-Speed 2517.45 samples/sec   Loss 10.5623   LearningRate 0.0594   Epoch: 4   Global Step: 190270   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:18,108-Speed 2590.96 samples/sec   Loss 10.5113   LearningRate 0.0594   Epoch: 4   Global Step: 190280   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:22,212-Speed 2495.77 samples/sec   Loss 10.5859   LearningRate 0.0594   Epoch: 4   Global Step: 190290   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:26,141-Speed 2607.52 samples/sec   Loss 10.7468   LearningRate 0.0594   Epoch: 4   Global Step: 190300   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:30,070-Speed 2606.92 samples/sec   Loss 10.3825   LearningRate 0.0594   Epoch: 4   Global Step: 190310   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:33,961-Speed 2631.87 samples/sec   Loss 10.5035   LearningRate 0.0594   Epoch: 4   Global Step: 190320   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:37,857-Speed 2628.84 samples/sec   Loss 10.6838   LearningRate 0.0594   Epoch: 4   Global Step: 190330   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:41,752-Speed 2630.30 samples/sec   Loss 10.5093   LearningRate 0.0594   Epoch: 4   Global Step: 190340   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:45,645-Speed 2630.71 samples/sec   Loss 10.6347   LearningRate 0.0594   Epoch: 4   Global Step: 190350   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:07:49,665-Speed 2548.36 samples/sec   Loss 10.5058   LearningRate 0.0594   Epoch: 4   Global Step: 190360   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:07:53,644-Speed 2573.94 samples/sec   Loss 10.4967   LearningRate 0.0594   Epoch: 4   Global Step: 190370   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:07:57,537-Speed 2631.77 samples/sec   Loss 10.4387   LearningRate 0.0594   Epoch: 4   Global Step: 190380   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:01,433-Speed 2628.93 samples/sec   Loss 10.6314   LearningRate 0.0594   Epoch: 4   Global Step: 190390   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:05,328-Speed 2628.94 samples/sec   Loss 10.5006   LearningRate 0.0594   Epoch: 4   Global Step: 190400   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:09,235-Speed 2621.53 samples/sec   Loss 10.4913   LearningRate 0.0594   Epoch: 4   Global Step: 190410   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:13,162-Speed 2608.71 samples/sec   Loss 10.4664   LearningRate 0.0594   Epoch: 4   Global Step: 190420   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:17,061-Speed 2627.18 samples/sec   Loss 10.6770   LearningRate 0.0594   Epoch: 4   Global Step: 190430   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:20,984-Speed 2610.44 samples/sec   Loss 10.5858   LearningRate 0.0594   Epoch: 4   Global Step: 190440   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:24,886-Speed 2625.18 samples/sec   Loss 10.4522   LearningRate 0.0594   Epoch: 4   Global Step: 190450   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:28,785-Speed 2626.46 samples/sec   Loss 10.5944   LearningRate 0.0594   Epoch: 4   Global Step: 190460   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:32,686-Speed 2625.92 samples/sec   Loss 10.4850   LearningRate 0.0594   Epoch: 4   Global Step: 190470   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:08:36,584-Speed 2627.91 samples/sec   Loss 10.3938   LearningRate 0.0594   Epoch: 4   Global Step: 190480   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:08:40,478-Speed 2629.81 samples/sec   Loss 10.5246   LearningRate 0.0593   Epoch: 4   Global Step: 190490   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:08:44,389-Speed 2618.67 samples/sec   Loss 10.4154   LearningRate 0.0593   Epoch: 4   Global Step: 190500   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:08:48,286-Speed 2628.94 samples/sec   Loss 10.3993   LearningRate 0.0593   Epoch: 4   Global Step: 190510   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:08:52,186-Speed 2626.36 samples/sec   Loss 10.5274   LearningRate 0.0593   Epoch: 4   Global Step: 190520   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:08:56,109-Speed 2610.79 samples/sec   Loss 10.3678   LearningRate 0.0593   Epoch: 4   Global Step: 190530   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:09:00,071-Speed 2584.91 samples/sec   Loss 10.5340   LearningRate 0.0593   Epoch: 4   Global Step: 190540   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:09:03,965-Speed 2630.23 samples/sec   Loss 10.6263   LearningRate 0.0593   Epoch: 4   Global Step: 190550   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:09:07,871-Speed 2622.52 samples/sec   Loss 10.4625   LearningRate 0.0593   Epoch: 4   Global Step: 190560   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:09:11,768-Speed 2627.69 samples/sec   Loss 10.4980   LearningRate 0.0593   Epoch: 4   Global Step: 190570   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:09:15,670-Speed 2625.32 samples/sec   Loss 10.4501   LearningRate 0.0593   Epoch: 4   Global Step: 190580   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:19,571-Speed 2625.44 samples/sec   Loss 10.5338   LearningRate 0.0593   Epoch: 4   Global Step: 190590   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:23,470-Speed 2626.97 samples/sec   Loss 10.6639   LearningRate 0.0593   Epoch: 4   Global Step: 190600   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:27,370-Speed 2625.98 samples/sec   Loss 10.4374   LearningRate 0.0593   Epoch: 4   Global Step: 190610   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:31,266-Speed 2629.25 samples/sec   Loss 10.4807   LearningRate 0.0593   Epoch: 4   Global Step: 190620   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:35,203-Speed 2601.37 samples/sec   Loss 10.5609   LearningRate 0.0593   Epoch: 4   Global Step: 190630   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:39,300-Speed 2499.93 samples/sec   Loss 10.4715   LearningRate 0.0593   Epoch: 4   Global Step: 190640   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:43,224-Speed 2609.84 samples/sec   Loss 10.5445   LearningRate 0.0593   Epoch: 4   Global Step: 190650   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:47,123-Speed 2627.84 samples/sec   Loss 10.4267   LearningRate 0.0593   Epoch: 4   Global Step: 190660   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:51,023-Speed 2626.33 samples/sec   Loss 10.5045   LearningRate 0.0593   Epoch: 4   Global Step: 190670   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:54,918-Speed 2629.38 samples/sec   Loss 10.5177   LearningRate 0.0593   Epoch: 4   Global Step: 190680   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:09:58,812-Speed 2630.59 samples/sec   Loss 10.3624   LearningRate 0.0593   Epoch: 4   Global Step: 190690   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:10:02,711-Speed 2627.67 samples/sec   Loss 10.2636   LearningRate 0.0593   Epoch: 4   Global Step: 190700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:10:06,604-Speed 2630.48 samples/sec   Loss 10.5871   LearningRate 0.0593   Epoch: 4   Global Step: 190710   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:10:10,483-Speed 2640.28 samples/sec   Loss 10.4481   LearningRate 0.0593   Epoch: 4   Global Step: 190720   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:14,381-Speed 2627.77 samples/sec   Loss 10.5135   LearningRate 0.0593   Epoch: 4   Global Step: 190730   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:18,282-Speed 2626.02 samples/sec   Loss 10.5842   LearningRate 0.0593   Epoch: 4   Global Step: 190740   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:22,183-Speed 2625.72 samples/sec   Loss 10.5903   LearningRate 0.0593   Epoch: 4   Global Step: 190750   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:26,080-Speed 2627.71 samples/sec   Loss 10.5571   LearningRate 0.0593   Epoch: 4   Global Step: 190760   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:29,980-Speed 2626.37 samples/sec   Loss 10.4954   LearningRate 0.0593   Epoch: 4   Global Step: 190770   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:33,876-Speed 2628.71 samples/sec   Loss 10.5377   LearningRate 0.0593   Epoch: 4   Global Step: 190780   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:37,774-Speed 2628.04 samples/sec   Loss 10.5018   LearningRate 0.0593   Epoch: 4   Global Step: 190790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:41,681-Speed 2621.16 samples/sec   Loss 10.4894   LearningRate 0.0593   Epoch: 4   Global Step: 190800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:45,579-Speed 2627.52 samples/sec   Loss 10.4479   LearningRate 0.0593   Epoch: 4   Global Step: 190810   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:10:49,520-Speed 2598.92 samples/sec   Loss 10.5496   LearningRate 0.0593   Epoch: 4   Global Step: 190820   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:10:53,419-Speed 2627.27 samples/sec   Loss 10.5843   LearningRate 0.0593   Epoch: 4   Global Step: 190830   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:10:57,302-Speed 2637.94 samples/sec   Loss 10.5491   LearningRate 0.0593   Epoch: 4   Global Step: 190840   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:01,200-Speed 2627.45 samples/sec   Loss 10.3084   LearningRate 0.0593   Epoch: 4   Global Step: 190850   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:05,109-Speed 2620.11 samples/sec   Loss 10.5249   LearningRate 0.0593   Epoch: 4   Global Step: 190860   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:09,181-Speed 2515.10 samples/sec   Loss 10.4921   LearningRate 0.0593   Epoch: 4   Global Step: 190870   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:13,251-Speed 2516.85 samples/sec   Loss 10.5815   LearningRate 0.0593   Epoch: 4   Global Step: 190880   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:17,207-Speed 2588.41 samples/sec   Loss 10.5238   LearningRate 0.0593   Epoch: 4   Global Step: 190890   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:21,108-Speed 2625.71 samples/sec   Loss 10.3851   LearningRate 0.0593   Epoch: 4   Global Step: 190900   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:25,020-Speed 2618.86 samples/sec   Loss 10.5488   LearningRate 0.0593   Epoch: 4   Global Step: 190910   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:28,921-Speed 2625.90 samples/sec   Loss 10.4772   LearningRate 0.0593   Epoch: 4   Global Step: 190920   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:32,820-Speed 2626.80 samples/sec   Loss 10.4737   LearningRate 0.0593   Epoch: 4   Global Step: 190930   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:11:36,716-Speed 2628.76 samples/sec   Loss 10.5006   LearningRate 0.0593   Epoch: 4   Global Step: 190940   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:11:40,612-Speed 2628.82 samples/sec   Loss 10.5006   LearningRate 0.0593   Epoch: 4   Global Step: 190950   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:11:44,512-Speed 2625.99 samples/sec   Loss 10.4179   LearningRate 0.0593   Epoch: 4   Global Step: 190960   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:11:48,411-Speed 2627.28 samples/sec   Loss 10.4668   LearningRate 0.0593   Epoch: 4   Global Step: 190970   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:11:52,306-Speed 2629.08 samples/sec   Loss 10.4917   LearningRate 0.0593   Epoch: 4   Global Step: 190980   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:11:56,210-Speed 2623.77 samples/sec   Loss 10.5002   LearningRate 0.0593   Epoch: 4   Global Step: 190990   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:12:00,111-Speed 2625.60 samples/sec   Loss 10.6046   LearningRate 0.0593   Epoch: 4   Global Step: 191000   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:12:04,008-Speed 2628.17 samples/sec   Loss 10.5510   LearningRate 0.0593   Epoch: 4   Global Step: 191010   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:12:07,904-Speed 2629.30 samples/sec   Loss 10.5355   LearningRate 0.0592   Epoch: 4   Global Step: 191020   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:12:11,798-Speed 2630.17 samples/sec   Loss 10.5403   LearningRate 0.0592   Epoch: 4   Global Step: 191030   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:12:15,677-Speed 2640.32 samples/sec   Loss 10.4950   LearningRate 0.0592   Epoch: 4   Global Step: 191040   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:12:19,561-Speed 2637.46 samples/sec   Loss 10.3723   LearningRate 0.0592   Epoch: 4   Global Step: 191050   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:12:23,437-Speed 2642.42 samples/sec   Loss 10.5365   LearningRate 0.0592   Epoch: 4   Global Step: 191060   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:27,343-Speed 2622.61 samples/sec   Loss 11.3496   LearningRate 0.0592   Epoch: 4   Global Step: 191070   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:31,237-Speed 2629.68 samples/sec   Loss 10.9315   LearningRate 0.0592   Epoch: 4   Global Step: 191080   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:35,137-Speed 2626.16 samples/sec   Loss 10.6548   LearningRate 0.0592   Epoch: 4   Global Step: 191090   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:39,034-Speed 2628.75 samples/sec   Loss 10.7133   LearningRate 0.0592   Epoch: 4   Global Step: 191100   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:42,926-Speed 2632.10 samples/sec   Loss 10.6835   LearningRate 0.0592   Epoch: 4   Global Step: 191110   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:46,818-Speed 2631.02 samples/sec   Loss 10.6263   LearningRate 0.0592   Epoch: 4   Global Step: 191120   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:50,719-Speed 2625.58 samples/sec   Loss 10.5824   LearningRate 0.0592   Epoch: 4   Global Step: 191130   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:54,637-Speed 2614.18 samples/sec   Loss 10.3810   LearningRate 0.0592   Epoch: 4   Global Step: 191140   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:12:58,548-Speed 2618.96 samples/sec   Loss 10.4915   LearningRate 0.0592   Epoch: 4   Global Step: 191150   Fp16 Grad Scale: 32768   Required: 72 hours
Training: 2022-04-13 17:13:02,440-Speed 2631.40 samples/sec   Loss 10.5862   LearningRate 0.0592   Epoch: 4   Global Step: 191160   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:06,337-Speed 2628.25 samples/sec   Loss 10.5384   LearningRate 0.0592   Epoch: 4   Global Step: 191170   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:10,238-Speed 2625.37 samples/sec   Loss 10.7065   LearningRate 0.0592   Epoch: 4   Global Step: 191180   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:14,132-Speed 2630.92 samples/sec   Loss 10.4927   LearningRate 0.0592   Epoch: 4   Global Step: 191190   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:18,035-Speed 2624.52 samples/sec   Loss 10.4100   LearningRate 0.0592   Epoch: 4   Global Step: 191200   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:21,934-Speed 2626.87 samples/sec   Loss 10.6214   LearningRate 0.0592   Epoch: 4   Global Step: 191210   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:25,840-Speed 2623.13 samples/sec   Loss 10.6271   LearningRate 0.0592   Epoch: 4   Global Step: 191220   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:29,733-Speed 2631.02 samples/sec   Loss 10.5808   LearningRate 0.0592   Epoch: 4   Global Step: 191230   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:33,626-Speed 2630.29 samples/sec   Loss 10.4136   LearningRate 0.0592   Epoch: 4   Global Step: 191240   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:37,545-Speed 2613.85 samples/sec   Loss 10.4926   LearningRate 0.0592   Epoch: 4   Global Step: 191250   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:13:41,441-Speed 2629.60 samples/sec   Loss 10.4247   LearningRate 0.0592   Epoch: 4   Global Step: 191260   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:13:45,349-Speed 2620.51 samples/sec   Loss 10.3414   LearningRate 0.0592   Epoch: 4   Global Step: 191270   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:13:49,264-Speed 2616.28 samples/sec   Loss 10.4203   LearningRate 0.0592   Epoch: 4   Global Step: 191280   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:13:53,184-Speed 2612.84 samples/sec   Loss 10.5427   LearningRate 0.0592   Epoch: 4   Global Step: 191290   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:13:57,116-Speed 2605.52 samples/sec   Loss 10.7028   LearningRate 0.0592   Epoch: 4   Global Step: 191300   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:01,046-Speed 2605.74 samples/sec   Loss 10.6545   LearningRate 0.0592   Epoch: 4   Global Step: 191310   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:04,956-Speed 2619.79 samples/sec   Loss 10.4746   LearningRate 0.0592   Epoch: 4   Global Step: 191320   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:08,858-Speed 2624.41 samples/sec   Loss 10.5597   LearningRate 0.0592   Epoch: 4   Global Step: 191330   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:12,763-Speed 2623.83 samples/sec   Loss 10.5071   LearningRate 0.0592   Epoch: 4   Global Step: 191340   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:16,667-Speed 2623.10 samples/sec   Loss 10.6524   LearningRate 0.0592   Epoch: 4   Global Step: 191350   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:20,571-Speed 2623.30 samples/sec   Loss 10.5086   LearningRate 0.0592   Epoch: 4   Global Step: 191360   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:14:24,477-Speed 2622.68 samples/sec   Loss 10.7572   LearningRate 0.0592   Epoch: 4   Global Step: 191370   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:14:28,385-Speed 2620.84 samples/sec   Loss 10.6984   LearningRate 0.0592   Epoch: 4   Global Step: 191380   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:32,283-Speed 2628.06 samples/sec   Loss 10.5030   LearningRate 0.0592   Epoch: 4   Global Step: 191390   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:36,181-Speed 2628.12 samples/sec   Loss 10.5052   LearningRate 0.0592   Epoch: 4   Global Step: 191400   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:40,075-Speed 2630.12 samples/sec   Loss 10.5307   LearningRate 0.0592   Epoch: 4   Global Step: 191410   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:43,971-Speed 2628.96 samples/sec   Loss 10.4809   LearningRate 0.0592   Epoch: 4   Global Step: 191420   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:47,879-Speed 2620.79 samples/sec   Loss 10.4106   LearningRate 0.0592   Epoch: 4   Global Step: 191430   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:51,943-Speed 2520.99 samples/sec   Loss 10.5279   LearningRate 0.0592   Epoch: 4   Global Step: 191440   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:14:55,914-Speed 2579.15 samples/sec   Loss 10.6795   LearningRate 0.0592   Epoch: 4   Global Step: 191450   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:14:59,839-Speed 2609.48 samples/sec   Loss 10.6149   LearningRate 0.0592   Epoch: 4   Global Step: 191460   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:03,733-Speed 2630.78 samples/sec   Loss 10.4043   LearningRate 0.0592   Epoch: 4   Global Step: 191470   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:07,630-Speed 2628.40 samples/sec   Loss 10.5212   LearningRate 0.0592   Epoch: 4   Global Step: 191480   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:11,536-Speed 2621.76 samples/sec   Loss 10.4960   LearningRate 0.0592   Epoch: 4   Global Step: 191490   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:15,435-Speed 2627.50 samples/sec   Loss 10.4771   LearningRate 0.0592   Epoch: 4   Global Step: 191500   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:19,330-Speed 2629.27 samples/sec   Loss 10.4369   LearningRate 0.0592   Epoch: 4   Global Step: 191510   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:23,223-Speed 2631.54 samples/sec   Loss 10.4728   LearningRate 0.0592   Epoch: 4   Global Step: 191520   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:27,118-Speed 2629.04 samples/sec   Loss 10.5501   LearningRate 0.0592   Epoch: 4   Global Step: 191530   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:31,021-Speed 2624.53 samples/sec   Loss 10.4331   LearningRate 0.0592   Epoch: 4   Global Step: 191540   Fp16 Grad Scale: 65536   Required: 72 hours
Training: 2022-04-13 17:15:34,916-Speed 2629.58 samples/sec   Loss 10.4178   LearningRate 0.0592   Epoch: 4   Global Step: 191550   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:15:38,814-Speed 2628.41 samples/sec   Loss 10.4856   LearningRate 0.0591   Epoch: 4   Global Step: 191560   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:15:42,711-Speed 2628.23 samples/sec   Loss 10.4312   LearningRate 0.0591   Epoch: 4   Global Step: 191570   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:15:46,602-Speed 2631.82 samples/sec   Loss 10.5477   LearningRate 0.0591   Epoch: 4   Global Step: 191580   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:15:50,496-Speed 2630.52 samples/sec   Loss 10.4769   LearningRate 0.0591   Epoch: 4   Global Step: 191590   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:15:54,391-Speed 2629.35 samples/sec   Loss 10.4061   LearningRate 0.0591   Epoch: 4   Global Step: 191600   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:15:58,292-Speed 2625.87 samples/sec   Loss 10.5409   LearningRate 0.0591   Epoch: 4   Global Step: 191610   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:02,190-Speed 2627.20 samples/sec   Loss 10.4956   LearningRate 0.0591   Epoch: 4   Global Step: 191620   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:06,091-Speed 2627.01 samples/sec   Loss 10.5650   LearningRate 0.0591   Epoch: 4   Global Step: 191630   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:09,993-Speed 2624.91 samples/sec   Loss 10.5025   LearningRate 0.0591   Epoch: 4   Global Step: 191640   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:13,903-Speed 2618.97 samples/sec   Loss 10.4961   LearningRate 0.0591   Epoch: 4   Global Step: 191650   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:16:17,974-Speed 2516.30 samples/sec   Loss 10.5350   LearningRate 0.0591   Epoch: 4   Global Step: 191660   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:16:21,937-Speed 2585.15 samples/sec   Loss 10.5844   LearningRate 0.0591   Epoch: 4   Global Step: 191670   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:16:25,832-Speed 2629.44 samples/sec   Loss 10.6002   LearningRate 0.0591   Epoch: 4   Global Step: 191680   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:16:29,728-Speed 2628.65 samples/sec   Loss 10.4022   LearningRate 0.0591   Epoch: 4   Global Step: 191690   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:16:33,634-Speed 2622.96 samples/sec   Loss 10.5772   LearningRate 0.0591   Epoch: 4   Global Step: 191700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:16:37,509-Speed 2643.47 samples/sec   Loss 10.5031   LearningRate 0.0591   Epoch: 4   Global Step: 191710   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:41,408-Speed 2626.63 samples/sec   Loss 10.4935   LearningRate 0.0591   Epoch: 4   Global Step: 191720   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:45,302-Speed 2630.61 samples/sec   Loss 10.6202   LearningRate 0.0591   Epoch: 4   Global Step: 191730   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:49,197-Speed 2629.43 samples/sec   Loss 10.5857   LearningRate 0.0591   Epoch: 4   Global Step: 191740   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:53,252-Speed 2526.24 samples/sec   Loss 10.5287   LearningRate 0.0591   Epoch: 4   Global Step: 191750   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:16:57,148-Speed 2628.89 samples/sec   Loss 10.4343   LearningRate 0.0591   Epoch: 4   Global Step: 191760   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:01,044-Speed 2629.00 samples/sec   Loss 10.4776   LearningRate 0.0591   Epoch: 4   Global Step: 191770   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:04,950-Speed 2621.85 samples/sec   Loss 10.4395   LearningRate 0.0591   Epoch: 4   Global Step: 191780   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:08,881-Speed 2606.15 samples/sec   Loss 10.5774   LearningRate 0.0591   Epoch: 4   Global Step: 191790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:12,775-Speed 2630.39 samples/sec   Loss 10.6506   LearningRate 0.0591   Epoch: 4   Global Step: 191800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:16,681-Speed 2622.30 samples/sec   Loss 10.4952   LearningRate 0.0591   Epoch: 4   Global Step: 191810   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:20,599-Speed 2614.37 samples/sec   Loss 10.4556   LearningRate 0.0591   Epoch: 4   Global Step: 191820   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:24,499-Speed 2626.31 samples/sec   Loss 10.5630   LearningRate 0.0591   Epoch: 4   Global Step: 191830   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:28,393-Speed 2630.48 samples/sec   Loss 10.5401   LearningRate 0.0591   Epoch: 4   Global Step: 191840   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:32,333-Speed 2599.29 samples/sec   Loss 10.4607   LearningRate 0.0591   Epoch: 4   Global Step: 191850   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:36,274-Speed 2599.15 samples/sec   Loss 10.5014   LearningRate 0.0591   Epoch: 4   Global Step: 191860   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:40,176-Speed 2625.25 samples/sec   Loss 10.4124   LearningRate 0.0591   Epoch: 4   Global Step: 191870   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:44,065-Speed 2633.52 samples/sec   Loss 10.4306   LearningRate 0.0591   Epoch: 4   Global Step: 191880   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:17:47,940-Speed 2642.96 samples/sec   Loss 10.5148   LearningRate 0.0591   Epoch: 4   Global Step: 191890   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:51,863-Speed 2611.58 samples/sec   Loss 10.4844   LearningRate 0.0591   Epoch: 4   Global Step: 191900   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:55,757-Speed 2630.14 samples/sec   Loss 10.5163   LearningRate 0.0591   Epoch: 4   Global Step: 191910   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:17:59,655-Speed 2628.37 samples/sec   Loss 10.4989   LearningRate 0.0591   Epoch: 4   Global Step: 191920   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:18:03,549-Speed 2630.22 samples/sec   Loss 10.6097   LearningRate 0.0591   Epoch: 4   Global Step: 191930   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:18:07,482-Speed 2604.04 samples/sec   Loss 10.5250   LearningRate 0.0591   Epoch: 4   Global Step: 191940   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:18:11,378-Speed 2628.87 samples/sec   Loss 10.6141   LearningRate 0.0591   Epoch: 4   Global Step: 191950   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:18:15,279-Speed 2626.17 samples/sec   Loss 10.3741   LearningRate 0.0591   Epoch: 4   Global Step: 191960   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:18:19,172-Speed 2631.58 samples/sec   Loss 10.5229   LearningRate 0.0591   Epoch: 4   Global Step: 191970   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:18:23,070-Speed 2627.41 samples/sec   Loss 10.4623   LearningRate 0.0591   Epoch: 4   Global Step: 191980   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:18:26,969-Speed 2627.45 samples/sec   Loss 10.4178   LearningRate 0.0591   Epoch: 4   Global Step: 191990   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:30,867-Speed 2627.45 samples/sec   Loss 10.4474   LearningRate 0.0591   Epoch: 4   Global Step: 192000   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:34,770-Speed 2624.37 samples/sec   Loss 10.3882   LearningRate 0.0591   Epoch: 4   Global Step: 192010   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:38,670-Speed 2626.10 samples/sec   Loss 10.5311   LearningRate 0.0591   Epoch: 4   Global Step: 192020   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:42,565-Speed 2629.76 samples/sec   Loss 10.4435   LearningRate 0.0591   Epoch: 4   Global Step: 192030   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:46,465-Speed 2626.38 samples/sec   Loss 10.6205   LearningRate 0.0591   Epoch: 4   Global Step: 192040   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:50,376-Speed 2618.83 samples/sec   Loss 10.4176   LearningRate 0.0591   Epoch: 4   Global Step: 192050   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:54,280-Speed 2623.28 samples/sec   Loss 10.5343   LearningRate 0.0591   Epoch: 4   Global Step: 192060   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:18:58,155-Speed 2643.84 samples/sec   Loss 10.4303   LearningRate 0.0591   Epoch: 4   Global Step: 192070   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:02,063-Speed 2621.13 samples/sec   Loss 10.4623   LearningRate 0.0591   Epoch: 4   Global Step: 192080   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:05,960-Speed 2627.90 samples/sec   Loss 10.4371   LearningRate 0.0591   Epoch: 4   Global Step: 192090   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:09,915-Speed 2589.93 samples/sec   Loss 10.4647   LearningRate 0.0590   Epoch: 4   Global Step: 192100   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:13,829-Speed 2617.01 samples/sec   Loss 10.4058   LearningRate 0.0590   Epoch: 4   Global Step: 192110   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:17,770-Speed 2599.12 samples/sec   Loss 10.4531   LearningRate 0.0590   Epoch: 4   Global Step: 192120   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:21,674-Speed 2624.12 samples/sec   Loss 10.5838   LearningRate 0.0590   Epoch: 4   Global Step: 192130   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:25,581-Speed 2621.21 samples/sec   Loss 10.6053   LearningRate 0.0590   Epoch: 4   Global Step: 192140   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:29,472-Speed 2632.55 samples/sec   Loss 10.5158   LearningRate 0.0590   Epoch: 4   Global Step: 192150   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:33,367-Speed 2630.05 samples/sec   Loss 10.3791   LearningRate 0.0590   Epoch: 4   Global Step: 192160   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:19:37,260-Speed 2630.53 samples/sec   Loss 10.5939   LearningRate 0.0590   Epoch: 4   Global Step: 192170   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:19:41,154-Speed 2630.14 samples/sec   Loss 10.5032   LearningRate 0.0590   Epoch: 4   Global Step: 192180   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:19:45,049-Speed 2630.17 samples/sec   Loss 10.4860   LearningRate 0.0590   Epoch: 4   Global Step: 192190   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:19:48,942-Speed 2631.11 samples/sec   Loss 10.5053   LearningRate 0.0590   Epoch: 4   Global Step: 192200   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:19:52,834-Speed 2632.19 samples/sec   Loss 10.5103   LearningRate 0.0590   Epoch: 4   Global Step: 192210   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:19:56,728-Speed 2630.05 samples/sec   Loss 10.4409   LearningRate 0.0590   Epoch: 4   Global Step: 192220   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:20:00,631-Speed 2624.44 samples/sec   Loss 10.3759   LearningRate 0.0590   Epoch: 4   Global Step: 192230   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:20:04,542-Speed 2618.94 samples/sec   Loss 10.4965   LearningRate 0.0590   Epoch: 4   Global Step: 192240   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:08,621-Speed 2510.65 samples/sec   Loss 10.5306   LearningRate 0.0590   Epoch: 4   Global Step: 192250   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:12,550-Speed 2607.24 samples/sec   Loss 10.3114   LearningRate 0.0590   Epoch: 4   Global Step: 192260   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:16,446-Speed 2629.07 samples/sec   Loss 10.6083   LearningRate 0.0590   Epoch: 4   Global Step: 192270   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:20,346-Speed 2626.47 samples/sec   Loss 10.5971   LearningRate 0.0590   Epoch: 4   Global Step: 192280   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:24,246-Speed 2627.01 samples/sec   Loss 10.4582   LearningRate 0.0590   Epoch: 4   Global Step: 192290   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:28,148-Speed 2624.86 samples/sec   Loss 10.4621   LearningRate 0.0590   Epoch: 4   Global Step: 192300   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:32,046-Speed 2627.67 samples/sec   Loss 10.4431   LearningRate 0.0590   Epoch: 4   Global Step: 192310   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:35,938-Speed 2631.05 samples/sec   Loss 10.3284   LearningRate 0.0590   Epoch: 4   Global Step: 192320   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:39,830-Speed 2632.46 samples/sec   Loss 10.4806   LearningRate 0.0590   Epoch: 4   Global Step: 192330   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:20:43,720-Speed 2633.29 samples/sec   Loss 10.4774   LearningRate 0.0590   Epoch: 4   Global Step: 192340   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:20:47,613-Speed 2630.54 samples/sec   Loss 10.5255   LearningRate 0.0590   Epoch: 4   Global Step: 192350   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:20:51,505-Speed 2631.73 samples/sec   Loss 10.5499   LearningRate 0.0590   Epoch: 4   Global Step: 192360   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:20:55,396-Speed 2632.32 samples/sec   Loss 10.5138   LearningRate 0.0590   Epoch: 4   Global Step: 192370   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:20:59,286-Speed 2633.72 samples/sec   Loss 10.5129   LearningRate 0.0590   Epoch: 4   Global Step: 192380   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:03,174-Speed 2633.91 samples/sec   Loss 10.4724   LearningRate 0.0590   Epoch: 4   Global Step: 192390   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:07,069-Speed 2629.60 samples/sec   Loss 10.5467   LearningRate 0.0590   Epoch: 4   Global Step: 192400   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:10,970-Speed 2626.07 samples/sec   Loss 10.3416   LearningRate 0.0590   Epoch: 4   Global Step: 192410   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:14,872-Speed 2624.60 samples/sec   Loss 10.5119   LearningRate 0.0590   Epoch: 4   Global Step: 192420   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:18,769-Speed 2628.36 samples/sec   Loss 10.4790   LearningRate 0.0590   Epoch: 4   Global Step: 192430   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:22,666-Speed 2628.20 samples/sec   Loss 10.3944   LearningRate 0.0590   Epoch: 4   Global Step: 192440   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:26,564-Speed 2627.87 samples/sec   Loss 10.4448   LearningRate 0.0590   Epoch: 4   Global Step: 192450   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:30,460-Speed 2629.49 samples/sec   Loss 10.4803   LearningRate 0.0590   Epoch: 4   Global Step: 192460   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:34,367-Speed 2621.47 samples/sec   Loss 10.4869   LearningRate 0.0590   Epoch: 4   Global Step: 192470   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:38,263-Speed 2629.12 samples/sec   Loss 10.5119   LearningRate 0.0590   Epoch: 4   Global Step: 192480   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:21:42,140-Speed 2642.06 samples/sec   Loss 10.4901   LearningRate 0.0590   Epoch: 4   Global Step: 192490   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:46,035-Speed 2629.10 samples/sec   Loss 10.5229   LearningRate 0.0590   Epoch: 4   Global Step: 192500   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:49,929-Speed 2629.92 samples/sec   Loss 10.3956   LearningRate 0.0590   Epoch: 4   Global Step: 192510   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:53,840-Speed 2619.27 samples/sec   Loss 10.3481   LearningRate 0.0590   Epoch: 4   Global Step: 192520   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:21:57,734-Speed 2630.56 samples/sec   Loss 10.5195   LearningRate 0.0590   Epoch: 4   Global Step: 192530   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:01,631-Speed 2628.22 samples/sec   Loss 10.6734   LearningRate 0.0590   Epoch: 4   Global Step: 192540   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:05,526-Speed 2629.44 samples/sec   Loss 10.6376   LearningRate 0.0590   Epoch: 4   Global Step: 192550   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:09,465-Speed 2601.11 samples/sec   Loss 10.5139   LearningRate 0.0590   Epoch: 4   Global Step: 192560   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:13,359-Speed 2629.86 samples/sec   Loss 10.4673   LearningRate 0.0590   Epoch: 4   Global Step: 192570   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:17,251-Speed 2631.85 samples/sec   Loss 10.3801   LearningRate 0.0590   Epoch: 4   Global Step: 192580   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:21,156-Speed 2622.47 samples/sec   Loss 10.6125   LearningRate 0.0590   Epoch: 4   Global Step: 192590   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:22:25,038-Speed 2638.92 samples/sec   Loss 10.5090   LearningRate 0.0590   Epoch: 4   Global Step: 192600   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:28,935-Speed 2628.97 samples/sec   Loss 10.5034   LearningRate 0.0590   Epoch: 4   Global Step: 192610   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:32,833-Speed 2627.54 samples/sec   Loss 10.4223   LearningRate 0.0590   Epoch: 4   Global Step: 192620   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:36,731-Speed 2627.34 samples/sec   Loss 10.4503   LearningRate 0.0590   Epoch: 4   Global Step: 192630   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:40,646-Speed 2616.44 samples/sec   Loss 10.5290   LearningRate 0.0589   Epoch: 4   Global Step: 192640   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:44,559-Speed 2617.28 samples/sec   Loss 10.6451   LearningRate 0.0589   Epoch: 4   Global Step: 192650   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:48,453-Speed 2631.01 samples/sec   Loss 10.3437   LearningRate 0.0589   Epoch: 4   Global Step: 192660   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:52,348-Speed 2629.83 samples/sec   Loss 10.6338   LearningRate 0.0589   Epoch: 4   Global Step: 192670   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:22:56,243-Speed 2629.16 samples/sec   Loss 10.4286   LearningRate 0.0589   Epoch: 4   Global Step: 192680   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:00,137-Speed 2630.59 samples/sec   Loss 10.5150   LearningRate 0.0589   Epoch: 4   Global Step: 192690   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:04,029-Speed 2631.08 samples/sec   Loss 10.4191   LearningRate 0.0589   Epoch: 4   Global Step: 192700   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:23:07,928-Speed 2627.09 samples/sec   Loss 10.5200   LearningRate 0.0589   Epoch: 4   Global Step: 192710   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:23:11,834-Speed 2622.63 samples/sec   Loss 10.5364   LearningRate 0.0589   Epoch: 4   Global Step: 192720   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:23:15,732-Speed 2627.18 samples/sec   Loss 10.3980   LearningRate 0.0589   Epoch: 4   Global Step: 192730   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:23:19,630-Speed 2628.49 samples/sec   Loss 10.4335   LearningRate 0.0589   Epoch: 4   Global Step: 192740   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:23:23,529-Speed 2626.60 samples/sec   Loss 10.5531   LearningRate 0.0589   Epoch: 4   Global Step: 192750   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:23:27,427-Speed 2628.26 samples/sec   Loss 10.4418   LearningRate 0.0589   Epoch: 4   Global Step: 192760   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:23:31,302-Speed 2642.48 samples/sec   Loss 10.3948   LearningRate 0.0589   Epoch: 4   Global Step: 192770   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:35,194-Speed 2631.87 samples/sec   Loss 10.5187   LearningRate 0.0589   Epoch: 4   Global Step: 192780   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:39,092-Speed 2627.86 samples/sec   Loss 10.3317   LearningRate 0.0589   Epoch: 4   Global Step: 192790   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:42,986-Speed 2630.28 samples/sec   Loss 10.4212   LearningRate 0.0589   Epoch: 4   Global Step: 192800   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:46,883-Speed 2628.57 samples/sec   Loss 10.4529   LearningRate 0.0589   Epoch: 4   Global Step: 192810   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:50,778-Speed 2629.31 samples/sec   Loss 10.4418   LearningRate 0.0589   Epoch: 4   Global Step: 192820   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:54,669-Speed 2632.19 samples/sec   Loss 10.4392   LearningRate 0.0589   Epoch: 4   Global Step: 192830   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:23:58,570-Speed 2626.00 samples/sec   Loss 10.4866   LearningRate 0.0589   Epoch: 4   Global Step: 192840   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:24:02,466-Speed 2628.98 samples/sec   Loss 10.3803   LearningRate 0.0589   Epoch: 4   Global Step: 192850   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:24:06,364-Speed 2627.79 samples/sec   Loss 10.4751   LearningRate 0.0589   Epoch: 4   Global Step: 192860   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:24:10,256-Speed 2631.38 samples/sec   Loss 10.4258   LearningRate 0.0589   Epoch: 4   Global Step: 192870   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:14,153-Speed 2628.45 samples/sec   Loss 10.3057   LearningRate 0.0589   Epoch: 4   Global Step: 192880   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:18,044-Speed 2632.27 samples/sec   Loss 10.5834   LearningRate 0.0589   Epoch: 4   Global Step: 192890   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:21,938-Speed 2630.52 samples/sec   Loss 10.4333   LearningRate 0.0589   Epoch: 4   Global Step: 192900   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:25,831-Speed 2630.47 samples/sec   Loss 10.4258   LearningRate 0.0589   Epoch: 4   Global Step: 192910   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:29,728-Speed 2628.85 samples/sec   Loss 10.5334   LearningRate 0.0589   Epoch: 4   Global Step: 192920   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:33,620-Speed 2631.63 samples/sec   Loss 10.4144   LearningRate 0.0589   Epoch: 4   Global Step: 192930   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:37,518-Speed 2627.53 samples/sec   Loss 10.2876   LearningRate 0.0589   Epoch: 4   Global Step: 192940   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:41,410-Speed 2631.36 samples/sec   Loss 10.5343   LearningRate 0.0589   Epoch: 4   Global Step: 192950   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:45,305-Speed 2630.54 samples/sec   Loss 10.5487   LearningRate 0.0589   Epoch: 4   Global Step: 192960   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:49,183-Speed 2640.99 samples/sec   Loss 10.4622   LearningRate 0.0589   Epoch: 4   Global Step: 192970   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:24:53,071-Speed 2634.26 samples/sec   Loss 10.5165   LearningRate 0.0589   Epoch: 4   Global Step: 192980   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:24:56,966-Speed 2629.93 samples/sec   Loss 10.3910   LearningRate 0.0589   Epoch: 4   Global Step: 192990   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:00,862-Speed 2628.72 samples/sec   Loss 10.3098   LearningRate 0.0589   Epoch: 4   Global Step: 193000   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:04,760-Speed 2627.83 samples/sec   Loss 10.4170   LearningRate 0.0589   Epoch: 4   Global Step: 193010   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:08,655-Speed 2629.84 samples/sec   Loss 10.3986   LearningRate 0.0589   Epoch: 4   Global Step: 193020   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:12,556-Speed 2625.51 samples/sec   Loss 10.4623   LearningRate 0.0589   Epoch: 4   Global Step: 193030   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:16,454-Speed 2627.08 samples/sec   Loss 10.5320   LearningRate 0.0589   Epoch: 4   Global Step: 193040   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:20,347-Speed 2631.66 samples/sec   Loss 10.3433   LearningRate 0.0589   Epoch: 4   Global Step: 193050   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:24,251-Speed 2623.04 samples/sec   Loss 10.4526   LearningRate 0.0589   Epoch: 4   Global Step: 193060   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:28,144-Speed 2631.23 samples/sec   Loss 10.4225   LearningRate 0.0589   Epoch: 4   Global Step: 193070   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:32,050-Speed 2622.21 samples/sec   Loss 10.4402   LearningRate 0.0589   Epoch: 4   Global Step: 193080   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:25:35,931-Speed 2638.97 samples/sec   Loss 10.4358   LearningRate 0.0589   Epoch: 4   Global Step: 193090   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:39,827-Speed 2628.99 samples/sec   Loss 10.3895   LearningRate 0.0589   Epoch: 4   Global Step: 193100   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:43,719-Speed 2631.59 samples/sec   Loss 10.5987   LearningRate 0.0589   Epoch: 4   Global Step: 193110   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:47,616-Speed 2628.35 samples/sec   Loss 10.4339   LearningRate 0.0589   Epoch: 4   Global Step: 193120   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:51,510-Speed 2630.60 samples/sec   Loss 10.4200   LearningRate 0.0589   Epoch: 4   Global Step: 193130   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:55,403-Speed 2630.25 samples/sec   Loss 10.4893   LearningRate 0.0589   Epoch: 4   Global Step: 193140   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:25:59,310-Speed 2622.36 samples/sec   Loss 10.5237   LearningRate 0.0589   Epoch: 4   Global Step: 193150   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:26:03,223-Speed 2617.11 samples/sec   Loss 10.3940   LearningRate 0.0589   Epoch: 4   Global Step: 193160   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:26:07,122-Speed 2627.02 samples/sec   Loss 10.4343   LearningRate 0.0589   Epoch: 4   Global Step: 193170   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:26:11,015-Speed 2631.15 samples/sec   Loss 10.4905   LearningRate 0.0588   Epoch: 4   Global Step: 193180   Fp16 Grad Scale: 131072   Required: 72 hours
Training: 2022-04-13 17:26:14,912-Speed 2628.22 samples/sec   Loss 10.3805   LearningRate 0.0588   Epoch: 4   Global Step: 193190   Fp16 Grad Scale: 262144   Required: 72 hours
Training: 2022-04-13 17:26:18,806-Speed 2630.61 samples/sec   Loss 10.4434   LearningRate 0.0588   Epoch: 4   Global Step: 193200   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:26:22,702-Speed 2628.47 samples/sec   Loss 10.5285   LearningRate 0.0588   Epoch: 4   Global Step: 193210   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:26:26,594-Speed 2631.50 samples/sec   Loss 10.4103   LearningRate 0.0588   Epoch: 4   Global Step: 193220   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:26:30,492-Speed 2627.92 samples/sec   Loss 10.3601   LearningRate 0.0588   Epoch: 4   Global Step: 193230   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:26:34,369-Speed 2641.48 samples/sec   Loss 10.3899   LearningRate 0.0588   Epoch: 4   Global Step: 193240   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:26:38,274-Speed 2623.41 samples/sec   Loss 10.6618   LearningRate 0.0588   Epoch: 4   Global Step: 193250   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:26:42,166-Speed 2631.32 samples/sec   Loss 10.3574   LearningRate 0.0588   Epoch: 4   Global Step: 193260   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:26:46,057-Speed 2632.26 samples/sec   Loss 10.4385   LearningRate 0.0588   Epoch: 4   Global Step: 193270   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:26:49,954-Speed 2628.12 samples/sec   Loss 10.3976   LearningRate 0.0588   Epoch: 4   Global Step: 193280   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:26:53,847-Speed 2631.14 samples/sec   Loss 10.4743   LearningRate 0.0588   Epoch: 4   Global Step: 193290   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:26:57,750-Speed 2624.48 samples/sec   Loss 10.5361   LearningRate 0.0588   Epoch: 4   Global Step: 193300   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:01,657-Speed 2621.62 samples/sec   Loss 10.4404   LearningRate 0.0588   Epoch: 4   Global Step: 193310   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:05,553-Speed 2628.28 samples/sec   Loss 10.4547   LearningRate 0.0588   Epoch: 4   Global Step: 193320   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:09,467-Speed 2617.18 samples/sec   Loss 10.4065   LearningRate 0.0588   Epoch: 4   Global Step: 193330   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:13,364-Speed 2627.93 samples/sec   Loss 10.5066   LearningRate 0.0588   Epoch: 4   Global Step: 193340   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:27:17,252-Speed 2634.74 samples/sec   Loss 10.3510   LearningRate 0.0588   Epoch: 4   Global Step: 193350   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:21,144-Speed 2631.44 samples/sec   Loss 10.3582   LearningRate 0.0588   Epoch: 4   Global Step: 193360   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:25,039-Speed 2629.44 samples/sec   Loss 10.4325   LearningRate 0.0588   Epoch: 4   Global Step: 193370   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:28,939-Speed 2626.46 samples/sec   Loss 10.4872   LearningRate 0.0588   Epoch: 4   Global Step: 193380   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:32,838-Speed 2627.38 samples/sec   Loss 10.5159   LearningRate 0.0588   Epoch: 4   Global Step: 193390   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:36,730-Speed 2631.08 samples/sec   Loss 10.4044   LearningRate 0.0588   Epoch: 4   Global Step: 193400   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:40,623-Speed 2631.24 samples/sec   Loss 10.4220   LearningRate 0.0588   Epoch: 4   Global Step: 193410   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:44,532-Speed 2619.97 samples/sec   Loss 10.5120   LearningRate 0.0588   Epoch: 4   Global Step: 193420   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:48,424-Speed 2631.78 samples/sec   Loss 10.5098   LearningRate 0.0588   Epoch: 4   Global Step: 193430   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:52,322-Speed 2627.13 samples/sec   Loss 10.4596   LearningRate 0.0588   Epoch: 4   Global Step: 193440   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:27:56,213-Speed 2632.68 samples/sec   Loss 10.4354   LearningRate 0.0588   Epoch: 4   Global Step: 193450   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:28:00,087-Speed 2644.21 samples/sec   Loss 10.3342   LearningRate 0.0588   Epoch: 4   Global Step: 193460   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:03,981-Speed 2630.06 samples/sec   Loss 10.5046   LearningRate 0.0588   Epoch: 4   Global Step: 193470   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:07,886-Speed 2623.16 samples/sec   Loss 10.5819   LearningRate 0.0588   Epoch: 4   Global Step: 193480   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:11,778-Speed 2631.68 samples/sec   Loss 10.4953   LearningRate 0.0588   Epoch: 4   Global Step: 193490   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:15,678-Speed 2626.04 samples/sec   Loss 10.5536   LearningRate 0.0588   Epoch: 4   Global Step: 193500   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:19,582-Speed 2623.38 samples/sec   Loss 10.3898   LearningRate 0.0588   Epoch: 4   Global Step: 193510   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:23,488-Speed 2624.53 samples/sec   Loss 10.4309   LearningRate 0.0588   Epoch: 4   Global Step: 193520   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:27,383-Speed 2629.42 samples/sec   Loss 10.4654   LearningRate 0.0588   Epoch: 4   Global Step: 193530   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:31,287-Speed 2623.82 samples/sec   Loss 10.3823   LearningRate 0.0588   Epoch: 4   Global Step: 193540   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:35,179-Speed 2631.52 samples/sec   Loss 10.5580   LearningRate 0.0588   Epoch: 4   Global Step: 193550   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:39,070-Speed 2632.30 samples/sec   Loss 10.4463   LearningRate 0.0588   Epoch: 4   Global Step: 193560   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:28:42,962-Speed 2631.93 samples/sec   Loss 10.2640   LearningRate 0.0588   Epoch: 4   Global Step: 193570   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:28:46,842-Speed 2639.62 samples/sec   Loss 10.3917   LearningRate 0.0588   Epoch: 4   Global Step: 193580   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:50,743-Speed 2625.46 samples/sec   Loss 10.3143   LearningRate 0.0588   Epoch: 4   Global Step: 193590   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:54,649-Speed 2622.67 samples/sec   Loss 10.3842   LearningRate 0.0588   Epoch: 4   Global Step: 193600   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:28:58,560-Speed 2618.68 samples/sec   Loss 10.3889   LearningRate 0.0588   Epoch: 4   Global Step: 193610   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:29:02,451-Speed 2632.27 samples/sec   Loss 10.4834   LearningRate 0.0588   Epoch: 4   Global Step: 193620   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:29:06,305-Speed 2657.03 samples/sec   Loss 10.7338   LearningRate 0.0588   Epoch: 4   Global Step: 193630   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:10,198-Speed 2631.46 samples/sec   Loss 10.6741   LearningRate 0.0588   Epoch: 4   Global Step: 193640   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:14,088-Speed 2632.75 samples/sec   Loss 10.4991   LearningRate 0.0588   Epoch: 4   Global Step: 193650   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:17,990-Speed 2625.23 samples/sec   Loss 10.5148   LearningRate 0.0588   Epoch: 4   Global Step: 193660   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:21,898-Speed 2621.55 samples/sec   Loss 10.4970   LearningRate 0.0588   Epoch: 4   Global Step: 193670   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:25,805-Speed 2621.28 samples/sec   Loss 10.3676   LearningRate 0.0588   Epoch: 4   Global Step: 193680   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:29,703-Speed 2627.48 samples/sec   Loss 10.5299   LearningRate 0.0588   Epoch: 4   Global Step: 193690   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:33,609-Speed 2622.27 samples/sec   Loss 10.6203   LearningRate 0.0588   Epoch: 4   Global Step: 193700   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:37,500-Speed 2632.49 samples/sec   Loss 10.5414   LearningRate 0.0588   Epoch: 4   Global Step: 193710   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:41,392-Speed 2631.15 samples/sec   Loss 10.5456   LearningRate 0.0587   Epoch: 4   Global Step: 193720   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:29:45,283-Speed 2632.12 samples/sec   Loss 10.4822   LearningRate 0.0587   Epoch: 4   Global Step: 193730   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:29:49,192-Speed 2620.42 samples/sec   Loss 10.4345   LearningRate 0.0587   Epoch: 4   Global Step: 193740   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:29:53,084-Speed 2631.41 samples/sec   Loss 10.4839   LearningRate 0.0587   Epoch: 4   Global Step: 193750   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:29:56,976-Speed 2631.91 samples/sec   Loss 10.4016   LearningRate 0.0587   Epoch: 4   Global Step: 193760   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:00,869-Speed 2630.96 samples/sec   Loss 10.4564   LearningRate 0.0587   Epoch: 4   Global Step: 193770   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:04,762-Speed 2631.09 samples/sec   Loss 10.4015   LearningRate 0.0587   Epoch: 4   Global Step: 193780   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:08,658-Speed 2628.67 samples/sec   Loss 10.5133   LearningRate 0.0587   Epoch: 4   Global Step: 193790   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:12,560-Speed 2624.98 samples/sec   Loss 10.3703   LearningRate 0.0587   Epoch: 4   Global Step: 193800   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:16,462-Speed 2625.12 samples/sec   Loss 10.4176   LearningRate 0.0587   Epoch: 4   Global Step: 193810   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:20,387-Speed 2609.80 samples/sec   Loss 10.5472   LearningRate 0.0587   Epoch: 4   Global Step: 193820   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:24,287-Speed 2625.65 samples/sec   Loss 10.3967   LearningRate 0.0587   Epoch: 4   Global Step: 193830   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:28,189-Speed 2630.61 samples/sec   Loss 10.5139   LearningRate 0.0587   Epoch: 4   Global Step: 193840   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:32,098-Speed 2620.59 samples/sec   Loss 10.4967   LearningRate 0.0587   Epoch: 4   Global Step: 193850   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:35,996-Speed 2627.34 samples/sec   Loss 10.4487   LearningRate 0.0587   Epoch: 4   Global Step: 193860   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:39,902-Speed 2622.36 samples/sec   Loss 10.4200   LearningRate 0.0587   Epoch: 4   Global Step: 193870   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:43,813-Speed 2619.05 samples/sec   Loss 10.4419   LearningRate 0.0587   Epoch: 4   Global Step: 193880   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:47,724-Speed 2618.25 samples/sec   Loss 10.5825   LearningRate 0.0587   Epoch: 4   Global Step: 193890   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:51,621-Speed 2628.30 samples/sec   Loss 10.5541   LearningRate 0.0587   Epoch: 4   Global Step: 193900   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:30:55,497-Speed 2642.34 samples/sec   Loss 10.4614   LearningRate 0.0587   Epoch: 4   Global Step: 193910   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:30:59,390-Speed 2631.57 samples/sec   Loss 10.5560   LearningRate 0.0587   Epoch: 4   Global Step: 193920   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:03,285-Speed 2629.25 samples/sec   Loss 10.5408   LearningRate 0.0587   Epoch: 4   Global Step: 193930   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:07,179-Speed 2630.59 samples/sec   Loss 10.5044   LearningRate 0.0587   Epoch: 4   Global Step: 193940   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:11,072-Speed 2630.72 samples/sec   Loss 10.5688   LearningRate 0.0587   Epoch: 4   Global Step: 193950   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:14,968-Speed 2628.78 samples/sec   Loss 10.3114   LearningRate 0.0587   Epoch: 4   Global Step: 193960   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:18,862-Speed 2630.33 samples/sec   Loss 10.4915   LearningRate 0.0587   Epoch: 4   Global Step: 193970   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:22,754-Speed 2631.91 samples/sec   Loss 10.3883   LearningRate 0.0587   Epoch: 4   Global Step: 193980   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:26,647-Speed 2630.54 samples/sec   Loss 10.3266   LearningRate 0.0587   Epoch: 4   Global Step: 193990   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:30,540-Speed 2631.43 samples/sec   Loss 10.5439   LearningRate 0.0587   Epoch: 4   Global Step: 194000   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:31:34,438-Speed 2627.23 samples/sec   Loss 10.3950   LearningRate 0.0587   Epoch: 4   Global Step: 194010   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:31:38,337-Speed 2626.79 samples/sec   Loss 10.6151   LearningRate 0.0587   Epoch: 4   Global Step: 194020   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:31:42,237-Speed 2626.30 samples/sec   Loss 10.4826   LearningRate 0.0587   Epoch: 4   Global Step: 194030   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:31:46,149-Speed 2618.33 samples/sec   Loss 10.4776   LearningRate 0.0587   Epoch: 4   Global Step: 194040   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:31:50,049-Speed 2626.59 samples/sec   Loss 10.5517   LearningRate 0.0587   Epoch: 4   Global Step: 194050   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:31:53,998-Speed 2593.53 samples/sec   Loss 10.5884   LearningRate 0.0587   Epoch: 4   Global Step: 194060   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:31:57,902-Speed 2623.63 samples/sec   Loss 10.3668   LearningRate 0.0587   Epoch: 4   Global Step: 194070   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:32:01,819-Speed 2614.37 samples/sec   Loss 10.5485   LearningRate 0.0587   Epoch: 4   Global Step: 194080   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:32:05,727-Speed 2620.91 samples/sec   Loss 10.5255   LearningRate 0.0587   Epoch: 4   Global Step: 194090   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:32:09,596-Speed 2647.48 samples/sec   Loss 10.3380   LearningRate 0.0587   Epoch: 4   Global Step: 194100   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:13,486-Speed 2632.52 samples/sec   Loss 10.6794   LearningRate 0.0587   Epoch: 4   Global Step: 194110   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:17,402-Speed 2615.84 samples/sec   Loss 10.5922   LearningRate 0.0587   Epoch: 4   Global Step: 194120   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:21,290-Speed 2633.96 samples/sec   Loss 10.5169   LearningRate 0.0587   Epoch: 4   Global Step: 194130   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:25,191-Speed 2635.16 samples/sec   Loss 10.3884   LearningRate 0.0587   Epoch: 4   Global Step: 194140   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:29,082-Speed 2632.19 samples/sec   Loss 10.5331   LearningRate 0.0587   Epoch: 4   Global Step: 194150   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:32,971-Speed 2633.41 samples/sec   Loss 10.4854   LearningRate 0.0587   Epoch: 4   Global Step: 194160   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:36,863-Speed 2631.36 samples/sec   Loss 10.4795   LearningRate 0.0587   Epoch: 4   Global Step: 194170   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:40,756-Speed 2631.53 samples/sec   Loss 10.3662   LearningRate 0.0587   Epoch: 4   Global Step: 194180   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:44,658-Speed 2624.33 samples/sec   Loss 10.4130   LearningRate 0.0587   Epoch: 4   Global Step: 194190   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:32:48,565-Speed 2621.86 samples/sec   Loss 10.4290   LearningRate 0.0587   Epoch: 4   Global Step: 194200   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:32:52,476-Speed 2618.28 samples/sec   Loss 10.3865   LearningRate 0.0587   Epoch: 4   Global Step: 194210   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:32:56,383-Speed 2621.52 samples/sec   Loss 10.5444   LearningRate 0.0587   Epoch: 4   Global Step: 194220   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:00,289-Speed 2622.83 samples/sec   Loss 10.5760   LearningRate 0.0587   Epoch: 4   Global Step: 194230   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:04,187-Speed 2627.71 samples/sec   Loss 10.4786   LearningRate 0.0587   Epoch: 4   Global Step: 194240   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:08,131-Speed 2596.99 samples/sec   Loss 10.5105   LearningRate 0.0587   Epoch: 4   Global Step: 194250   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:12,020-Speed 2633.70 samples/sec   Loss 10.3389   LearningRate 0.0587   Epoch: 4   Global Step: 194260   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:15,915-Speed 2629.63 samples/sec   Loss 10.3588   LearningRate 0.0586   Epoch: 4   Global Step: 194270   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:19,805-Speed 2632.64 samples/sec   Loss 10.6322   LearningRate 0.0586   Epoch: 4   Global Step: 194280   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:23,693-Speed 2634.58 samples/sec   Loss 10.3447   LearningRate 0.0586   Epoch: 4   Global Step: 194290   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:33:27,581-Speed 2634.03 samples/sec   Loss 10.4917   LearningRate 0.0586   Epoch: 4   Global Step: 194300   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:31,470-Speed 2633.97 samples/sec   Loss 10.5911   LearningRate 0.0586   Epoch: 4   Global Step: 194310   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:35,361-Speed 2632.19 samples/sec   Loss 10.4286   LearningRate 0.0586   Epoch: 4   Global Step: 194320   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:39,270-Speed 2620.84 samples/sec   Loss 10.5158   LearningRate 0.0586   Epoch: 4   Global Step: 194330   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:43,173-Speed 2623.66 samples/sec   Loss 10.4981   LearningRate 0.0586   Epoch: 4   Global Step: 194340   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:47,070-Speed 2628.12 samples/sec   Loss 10.4288   LearningRate 0.0586   Epoch: 4   Global Step: 194350   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:50,976-Speed 2622.00 samples/sec   Loss 10.5846   LearningRate 0.0586   Epoch: 4   Global Step: 194360   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:54,882-Speed 2622.31 samples/sec   Loss 10.5262   LearningRate 0.0586   Epoch: 4   Global Step: 194370   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:33:58,791-Speed 2619.97 samples/sec   Loss 10.4031   LearningRate 0.0586   Epoch: 4   Global Step: 194380   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:02,694-Speed 2624.35 samples/sec   Loss 10.3984   LearningRate 0.0586   Epoch: 4   Global Step: 194390   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:06,590-Speed 2628.75 samples/sec   Loss 10.5938   LearningRate 0.0586   Epoch: 4   Global Step: 194400   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:34:10,489-Speed 2627.42 samples/sec   Loss 10.4459   LearningRate 0.0586   Epoch: 4   Global Step: 194410   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:34:14,372-Speed 2637.66 samples/sec   Loss 10.5859   LearningRate 0.0586   Epoch: 4   Global Step: 194420   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:18,272-Speed 2626.18 samples/sec   Loss 10.4843   LearningRate 0.0586   Epoch: 4   Global Step: 194430   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:22,168-Speed 2628.79 samples/sec   Loss 10.4234   LearningRate 0.0586   Epoch: 4   Global Step: 194440   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:26,062-Speed 2630.62 samples/sec   Loss 10.4070   LearningRate 0.0586   Epoch: 4   Global Step: 194450   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:29,968-Speed 2622.43 samples/sec   Loss 10.2644   LearningRate 0.0586   Epoch: 4   Global Step: 194460   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:33,868-Speed 2625.66 samples/sec   Loss 10.3778   LearningRate 0.0586   Epoch: 4   Global Step: 194470   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:38,174-Speed 2378.76 samples/sec   Loss 10.6105   LearningRate 0.0586   Epoch: 4   Global Step: 194480   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:42,094-Speed 2612.45 samples/sec   Loss 10.4944   LearningRate 0.0586   Epoch: 4   Global Step: 194490   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:46,000-Speed 2622.55 samples/sec   Loss 10.4377   LearningRate 0.0586   Epoch: 4   Global Step: 194500   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:49,899-Speed 2627.27 samples/sec   Loss 10.3634   LearningRate 0.0586   Epoch: 4   Global Step: 194510   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:34:53,793-Speed 2630.22 samples/sec   Loss 10.4524   LearningRate 0.0586   Epoch: 4   Global Step: 194520   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:34:57,687-Speed 2630.19 samples/sec   Loss 10.5385   LearningRate 0.0586   Epoch: 4   Global Step: 194530   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:35:01,582-Speed 2629.59 samples/sec   Loss 10.4636   LearningRate 0.0586   Epoch: 4   Global Step: 194540   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:35:05,475-Speed 2630.38 samples/sec   Loss 10.4604   LearningRate 0.0586   Epoch: 4   Global Step: 194550   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:35:09,348-Speed 2644.79 samples/sec   Loss 10.4315   LearningRate 0.0586   Epoch: 4   Global Step: 194560   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:13,247-Speed 2626.97 samples/sec   Loss 10.4704   LearningRate 0.0586   Epoch: 4   Global Step: 194570   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:17,143-Speed 2628.96 samples/sec   Loss 10.4368   LearningRate 0.0586   Epoch: 4   Global Step: 194580   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:21,054-Speed 2619.74 samples/sec   Loss 10.2237   LearningRate 0.0586   Epoch: 4   Global Step: 194590   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:24,947-Speed 2630.92 samples/sec   Loss 10.4520   LearningRate 0.0586   Epoch: 4   Global Step: 194600   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:28,842-Speed 2629.53 samples/sec   Loss 10.3830   LearningRate 0.0586   Epoch: 4   Global Step: 194610   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:32,736-Speed 2629.96 samples/sec   Loss 10.4642   LearningRate 0.0586   Epoch: 4   Global Step: 194620   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:36,632-Speed 2628.90 samples/sec   Loss 10.4451   LearningRate 0.0586   Epoch: 4   Global Step: 194630   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:40,538-Speed 2622.28 samples/sec   Loss 10.4560   LearningRate 0.0586   Epoch: 4   Global Step: 194640   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:44,439-Speed 2625.34 samples/sec   Loss 10.4652   LearningRate 0.0586   Epoch: 4   Global Step: 194650   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:48,335-Speed 2628.89 samples/sec   Loss 10.3180   LearningRate 0.0586   Epoch: 4   Global Step: 194660   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:52,225-Speed 2633.54 samples/sec   Loss 10.4037   LearningRate 0.0586   Epoch: 4   Global Step: 194670   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:35:56,119-Speed 2630.43 samples/sec   Loss 10.4611   LearningRate 0.0586   Epoch: 4   Global Step: 194680   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:36:00,032-Speed 2617.25 samples/sec   Loss 10.3290   LearningRate 0.0586   Epoch: 4   Global Step: 194690   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:36:03,930-Speed 2627.87 samples/sec   Loss 10.4149   LearningRate 0.0586   Epoch: 4   Global Step: 194700   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:36:07,828-Speed 2627.63 samples/sec   Loss 10.4126   LearningRate 0.0586   Epoch: 4   Global Step: 194710   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:36:11,703-Speed 2642.66 samples/sec   Loss 10.5091   LearningRate 0.0586   Epoch: 4   Global Step: 194720   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:15,596-Speed 2630.92 samples/sec   Loss 10.3087   LearningRate 0.0586   Epoch: 4   Global Step: 194730   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:19,489-Speed 2631.34 samples/sec   Loss 10.3361   LearningRate 0.0586   Epoch: 4   Global Step: 194740   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:23,383-Speed 2629.96 samples/sec   Loss 10.5851   LearningRate 0.0586   Epoch: 4   Global Step: 194750   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:27,288-Speed 2622.97 samples/sec   Loss 10.5224   LearningRate 0.0586   Epoch: 4   Global Step: 194760   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:31,210-Speed 2611.58 samples/sec   Loss 10.6179   LearningRate 0.0586   Epoch: 4   Global Step: 194770   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:35,103-Speed 2631.09 samples/sec   Loss 10.4611   LearningRate 0.0586   Epoch: 4   Global Step: 194780   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:38,998-Speed 2629.55 samples/sec   Loss 10.4132   LearningRate 0.0586   Epoch: 4   Global Step: 194790   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:42,891-Speed 2630.83 samples/sec   Loss 10.2616   LearningRate 0.0586   Epoch: 4   Global Step: 194800   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:46,788-Speed 2628.36 samples/sec   Loss 10.5292   LearningRate 0.0585   Epoch: 4   Global Step: 194810   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:36:50,770-Speed 2572.18 samples/sec   Loss 10.5019   LearningRate 0.0585   Epoch: 4   Global Step: 194820   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:36:54,663-Speed 2630.71 samples/sec   Loss 10.5847   LearningRate 0.0585   Epoch: 4   Global Step: 194830   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:36:58,539-Speed 2642.78 samples/sec   Loss 10.5374   LearningRate 0.0585   Epoch: 4   Global Step: 194840   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:37:02,433-Speed 2630.08 samples/sec   Loss 10.2645   LearningRate 0.0585   Epoch: 4   Global Step: 194850   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:37:06,327-Speed 2630.36 samples/sec   Loss 10.3010   LearningRate 0.0585   Epoch: 4   Global Step: 194860   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:37:10,208-Speed 2639.07 samples/sec   Loss 10.5895   LearningRate 0.0585   Epoch: 4   Global Step: 194870   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:14,101-Speed 2631.46 samples/sec   Loss 10.9760   LearningRate 0.0585   Epoch: 4   Global Step: 194880   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:17,991-Speed 2632.76 samples/sec   Loss 10.5610   LearningRate 0.0585   Epoch: 4   Global Step: 194890   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:21,887-Speed 2629.07 samples/sec   Loss 10.6208   LearningRate 0.0585   Epoch: 4   Global Step: 194900   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:25,778-Speed 2631.88 samples/sec   Loss 10.4875   LearningRate 0.0585   Epoch: 4   Global Step: 194910   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:29,672-Speed 2630.67 samples/sec   Loss 10.3605   LearningRate 0.0585   Epoch: 4   Global Step: 194920   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:33,567-Speed 2629.35 samples/sec   Loss 10.4060   LearningRate 0.0585   Epoch: 4   Global Step: 194930   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:37,470-Speed 2624.12 samples/sec   Loss 10.5207   LearningRate 0.0585   Epoch: 4   Global Step: 194940   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:41,368-Speed 2627.46 samples/sec   Loss 10.3897   LearningRate 0.0585   Epoch: 4   Global Step: 194950   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:45,262-Speed 2630.74 samples/sec   Loss 10.4983   LearningRate 0.0585   Epoch: 4   Global Step: 194960   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:37:49,157-Speed 2629.60 samples/sec   Loss 10.4698   LearningRate 0.0585   Epoch: 4   Global Step: 194970   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:37:53,052-Speed 2629.45 samples/sec   Loss 10.3695   LearningRate 0.0585   Epoch: 4   Global Step: 194980   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:37:56,950-Speed 2628.05 samples/sec   Loss 10.5045   LearningRate 0.0585   Epoch: 4   Global Step: 194990   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:00,851-Speed 2625.46 samples/sec   Loss 10.3719   LearningRate 0.0585   Epoch: 4   Global Step: 195000   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:04,742-Speed 2632.20 samples/sec   Loss 10.4637   LearningRate 0.0585   Epoch: 4   Global Step: 195010   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:08,631-Speed 2633.51 samples/sec   Loss 10.5118   LearningRate 0.0585   Epoch: 4   Global Step: 195020   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:12,519-Speed 2633.80 samples/sec   Loss 10.4103   LearningRate 0.0585   Epoch: 4   Global Step: 195030   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:16,432-Speed 2618.03 samples/sec   Loss 10.5204   LearningRate 0.0585   Epoch: 4   Global Step: 195040   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:20,333-Speed 2625.82 samples/sec   Loss 10.5117   LearningRate 0.0585   Epoch: 4   Global Step: 195050   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:24,257-Speed 2610.24 samples/sec   Loss 10.4717   LearningRate 0.0585   Epoch: 4   Global Step: 195060   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:38:28,174-Speed 2614.92 samples/sec   Loss 10.4220   LearningRate 0.0585   Epoch: 4   Global Step: 195070   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:32,087-Speed 2617.66 samples/sec   Loss 10.3991   LearningRate 0.0585   Epoch: 4   Global Step: 195080   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:35,986-Speed 2626.86 samples/sec   Loss 10.4104   LearningRate 0.0585   Epoch: 4   Global Step: 195090   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:39,883-Speed 2628.05 samples/sec   Loss 10.4155   LearningRate 0.0585   Epoch: 4   Global Step: 195100   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:43,782-Speed 2627.17 samples/sec   Loss 10.4960   LearningRate 0.0585   Epoch: 4   Global Step: 195110   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:47,684-Speed 2624.20 samples/sec   Loss 10.4392   LearningRate 0.0585   Epoch: 4   Global Step: 195120   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:51,588-Speed 2623.55 samples/sec   Loss 10.5189   LearningRate 0.0585   Epoch: 4   Global Step: 195130   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:55,488-Speed 2626.23 samples/sec   Loss 10.4035   LearningRate 0.0585   Epoch: 4   Global Step: 195140   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:38:59,420-Speed 2605.29 samples/sec   Loss 10.6137   LearningRate 0.0585   Epoch: 4   Global Step: 195150   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:03,445-Speed 2544.38 samples/sec   Loss 10.3402   LearningRate 0.0585   Epoch: 4   Global Step: 195160   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:07,339-Speed 2630.50 samples/sec   Loss 10.5238   LearningRate 0.0585   Epoch: 4   Global Step: 195170   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:39:11,240-Speed 2625.55 samples/sec   Loss 10.5159   LearningRate 0.0585   Epoch: 4   Global Step: 195180   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:39:15,134-Speed 2630.18 samples/sec   Loss 10.3269   LearningRate 0.0585   Epoch: 4   Global Step: 195190   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:39:19,046-Speed 2618.15 samples/sec   Loss 10.3445   LearningRate 0.0585   Epoch: 4   Global Step: 195200   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:39:22,922-Speed 2642.24 samples/sec   Loss 10.4759   LearningRate 0.0585   Epoch: 4   Global Step: 195210   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:26,823-Speed 2625.66 samples/sec   Loss 10.4147   LearningRate 0.0585   Epoch: 4   Global Step: 195220   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:30,718-Speed 2629.81 samples/sec   Loss 10.4385   LearningRate 0.0585   Epoch: 4   Global Step: 195230   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:34,611-Speed 2631.02 samples/sec   Loss 10.3898   LearningRate 0.0585   Epoch: 4   Global Step: 195240   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:38,501-Speed 2633.01 samples/sec   Loss 10.2816   LearningRate 0.0585   Epoch: 4   Global Step: 195250   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:42,392-Speed 2632.16 samples/sec   Loss 10.2661   LearningRate 0.0585   Epoch: 4   Global Step: 195260   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:46,283-Speed 2632.68 samples/sec   Loss 10.4566   LearningRate 0.0585   Epoch: 4   Global Step: 195270   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:50,175-Speed 2631.25 samples/sec   Loss 10.4649   LearningRate 0.0585   Epoch: 4   Global Step: 195280   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:54,065-Speed 2632.90 samples/sec   Loss 10.4242   LearningRate 0.0585   Epoch: 4   Global Step: 195290   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:39:57,957-Speed 2631.71 samples/sec   Loss 10.4424   LearningRate 0.0585   Epoch: 4   Global Step: 195300   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:01,853-Speed 2628.64 samples/sec   Loss 10.5314   LearningRate 0.0585   Epoch: 4   Global Step: 195310   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:40:05,731-Speed 2641.34 samples/sec   Loss 10.3340   LearningRate 0.0585   Epoch: 4   Global Step: 195320   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:09,635-Speed 2624.14 samples/sec   Loss 10.3515   LearningRate 0.0585   Epoch: 4   Global Step: 195330   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:13,522-Speed 2635.36 samples/sec   Loss 10.4366   LearningRate 0.0585   Epoch: 4   Global Step: 195340   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:17,434-Speed 2617.47 samples/sec   Loss 10.4311   LearningRate 0.0584   Epoch: 4   Global Step: 195350   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:21,324-Speed 2632.77 samples/sec   Loss 10.3511   LearningRate 0.0584   Epoch: 4   Global Step: 195360   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:25,216-Speed 2631.74 samples/sec   Loss 10.4466   LearningRate 0.0584   Epoch: 4   Global Step: 195370   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:29,105-Speed 2634.11 samples/sec   Loss 10.4421   LearningRate 0.0584   Epoch: 4   Global Step: 195380   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:32,998-Speed 2630.28 samples/sec   Loss 10.5444   LearningRate 0.0584   Epoch: 4   Global Step: 195390   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:36,895-Speed 2628.42 samples/sec   Loss 10.3556   LearningRate 0.0584   Epoch: 4   Global Step: 195400   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:40,790-Speed 2630.11 samples/sec   Loss 10.4540   LearningRate 0.0584   Epoch: 4   Global Step: 195410   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:44,667-Speed 2641.19 samples/sec   Loss 10.4165   LearningRate 0.0584   Epoch: 4   Global Step: 195420   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:48,567-Speed 2627.12 samples/sec   Loss 10.5216   LearningRate 0.0584   Epoch: 4   Global Step: 195430   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:52,474-Speed 2621.58 samples/sec   Loss 10.3916   LearningRate 0.0584   Epoch: 4   Global Step: 195440   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:40:56,360-Speed 2635.38 samples/sec   Loss 10.4956   LearningRate 0.0584   Epoch: 4   Global Step: 195450   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:00,259-Speed 2627.26 samples/sec   Loss 10.4520   LearningRate 0.0584   Epoch: 4   Global Step: 195460   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:04,162-Speed 2623.66 samples/sec   Loss 10.4518   LearningRate 0.0584   Epoch: 4   Global Step: 195470   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:08,063-Speed 2625.54 samples/sec   Loss 10.4115   LearningRate 0.0584   Epoch: 4   Global Step: 195480   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:11,967-Speed 2623.37 samples/sec   Loss 10.5708   LearningRate 0.0584   Epoch: 4   Global Step: 195490   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:15,875-Speed 2620.95 samples/sec   Loss 10.5797   LearningRate 0.0584   Epoch: 4   Global Step: 195500   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:19,785-Speed 2620.22 samples/sec   Loss 10.3262   LearningRate 0.0584   Epoch: 4   Global Step: 195510   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:23,691-Speed 2622.34 samples/sec   Loss 10.3824   LearningRate 0.0584   Epoch: 4   Global Step: 195520   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:41:27,598-Speed 2621.37 samples/sec   Loss 10.4856   LearningRate 0.0584   Epoch: 4   Global Step: 195530   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:41:31,487-Speed 2633.80 samples/sec   Loss 10.5168   LearningRate 0.0584   Epoch: 4   Global Step: 195540   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:35,388-Speed 2625.50 samples/sec   Loss 10.4482   LearningRate 0.0584   Epoch: 4   Global Step: 195550   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:39,290-Speed 2624.53 samples/sec   Loss 10.4520   LearningRate 0.0584   Epoch: 4   Global Step: 195560   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:43,188-Speed 2627.43 samples/sec   Loss 10.4964   LearningRate 0.0584   Epoch: 4   Global Step: 195570   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:47,089-Speed 2625.70 samples/sec   Loss 10.4238   LearningRate 0.0584   Epoch: 4   Global Step: 195580   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:50,985-Speed 2629.11 samples/sec   Loss 10.6008   LearningRate 0.0584   Epoch: 4   Global Step: 195590   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:54,879-Speed 2630.41 samples/sec   Loss 10.3144   LearningRate 0.0584   Epoch: 4   Global Step: 195600   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:41:58,773-Speed 2630.28 samples/sec   Loss 10.4682   LearningRate 0.0584   Epoch: 4   Global Step: 195610   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:42:02,662-Speed 2633.98 samples/sec   Loss 10.3484   LearningRate 0.0584   Epoch: 4   Global Step: 195620   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:42:06,521-Speed 2653.74 samples/sec   Loss 10.4623   LearningRate 0.0584   Epoch: 4   Global Step: 195630   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:10,419-Speed 2627.21 samples/sec   Loss 10.4403   LearningRate 0.0584   Epoch: 4   Global Step: 195640   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:14,320-Speed 2626.15 samples/sec   Loss 10.3305   LearningRate 0.0584   Epoch: 4   Global Step: 195650   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:18,221-Speed 2625.16 samples/sec   Loss 10.5424   LearningRate 0.0584   Epoch: 4   Global Step: 195660   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:22,120-Speed 2626.98 samples/sec   Loss 10.4219   LearningRate 0.0584   Epoch: 4   Global Step: 195670   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:26,020-Speed 2625.95 samples/sec   Loss 10.4380   LearningRate 0.0584   Epoch: 4   Global Step: 195680   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:29,919-Speed 2626.93 samples/sec   Loss 10.4683   LearningRate 0.0584   Epoch: 4   Global Step: 195690   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:33,816-Speed 2628.52 samples/sec   Loss 10.3514   LearningRate 0.0584   Epoch: 4   Global Step: 195700   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:37,704-Speed 2634.50 samples/sec   Loss 10.4938   LearningRate 0.0584   Epoch: 4   Global Step: 195710   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:41,591-Speed 2634.56 samples/sec   Loss 10.3844   LearningRate 0.0584   Epoch: 4   Global Step: 195720   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:42:45,480-Speed 2633.77 samples/sec   Loss 10.4108   LearningRate 0.0584   Epoch: 4   Global Step: 195730   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:42:49,377-Speed 2628.42 samples/sec   Loss 10.4036   LearningRate 0.0584   Epoch: 4   Global Step: 195740   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:42:53,286-Speed 2620.49 samples/sec   Loss 10.4991   LearningRate 0.0584   Epoch: 4   Global Step: 195750   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:42:57,176-Speed 2632.44 samples/sec   Loss 10.3666   LearningRate 0.0584   Epoch: 4   Global Step: 195760   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:01,068-Speed 2632.06 samples/sec   Loss 10.5063   LearningRate 0.0584   Epoch: 4   Global Step: 195770   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:04,975-Speed 2620.91 samples/sec   Loss 10.4933   LearningRate 0.0584   Epoch: 4   Global Step: 195780   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:08,869-Speed 2630.65 samples/sec   Loss 10.3949   LearningRate 0.0584   Epoch: 4   Global Step: 195790   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:12,760-Speed 2631.89 samples/sec   Loss 10.4316   LearningRate 0.0584   Epoch: 4   Global Step: 195800   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:16,653-Speed 2631.74 samples/sec   Loss 10.4356   LearningRate 0.0584   Epoch: 4   Global Step: 195810   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:20,550-Speed 2627.77 samples/sec   Loss 10.3790   LearningRate 0.0584   Epoch: 4   Global Step: 195820   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:24,424-Speed 2643.94 samples/sec   Loss 10.5013   LearningRate 0.0584   Epoch: 4   Global Step: 195830   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:43:28,301-Speed 2641.61 samples/sec   Loss 10.5331   LearningRate 0.0584   Epoch: 4   Global Step: 195840   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:43:32,216-Speed 2616.68 samples/sec   Loss 10.3322   LearningRate 0.0584   Epoch: 4   Global Step: 195850   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:43:36,108-Speed 2632.19 samples/sec   Loss 10.3776   LearningRate 0.0584   Epoch: 4   Global Step: 195860   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:43:40,858-Speed 2156.19 samples/sec   Loss 10.4236   LearningRate 0.0584   Epoch: 4   Global Step: 195870   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:43:44,748-Speed 2633.25 samples/sec   Loss 10.4972   LearningRate 0.0584   Epoch: 4   Global Step: 195880   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:43:48,638-Speed 2632.73 samples/sec   Loss 10.4866   LearningRate 0.0583   Epoch: 4   Global Step: 195890   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:43:52,538-Speed 2626.51 samples/sec   Loss 10.4607   LearningRate 0.0583   Epoch: 4   Global Step: 195900   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:43:56,434-Speed 2629.35 samples/sec   Loss 10.3716   LearningRate 0.0583   Epoch: 4   Global Step: 195910   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:00,328-Speed 2630.87 samples/sec   Loss 10.3816   LearningRate 0.0583   Epoch: 4   Global Step: 195920   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:04,223-Speed 2629.36 samples/sec   Loss 10.4443   LearningRate 0.0583   Epoch: 4   Global Step: 195930   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:08,160-Speed 2601.24 samples/sec   Loss 10.3697   LearningRate 0.0583   Epoch: 4   Global Step: 195940   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:44:12,111-Speed 2592.32 samples/sec   Loss 10.4352   LearningRate 0.0583   Epoch: 4   Global Step: 195950   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:44:16,054-Speed 2598.11 samples/sec   Loss 10.2621   LearningRate 0.0583   Epoch: 4   Global Step: 195960   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:44:19,952-Speed 2627.24 samples/sec   Loss 10.3767   LearningRate 0.0583   Epoch: 4   Global Step: 195970   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:44:23,832-Speed 2639.79 samples/sec   Loss 10.3697   LearningRate 0.0583   Epoch: 4   Global Step: 195980   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:27,732-Speed 2626.79 samples/sec   Loss 10.6179   LearningRate 0.0583   Epoch: 4   Global Step: 195990   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:31,631-Speed 2627.02 samples/sec   Loss 10.5081   LearningRate 0.0583   Epoch: 4   Global Step: 196000   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:35,524-Speed 2631.53 samples/sec   Loss 10.5198   LearningRate 0.0583   Epoch: 4   Global Step: 196010   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:39,432-Speed 2620.87 samples/sec   Loss 10.3783   LearningRate 0.0583   Epoch: 4   Global Step: 196020   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:43,315-Speed 2637.61 samples/sec   Loss 10.5284   LearningRate 0.0583   Epoch: 4   Global Step: 196030   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:47,208-Speed 2631.75 samples/sec   Loss 10.4940   LearningRate 0.0583   Epoch: 4   Global Step: 196040   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:51,110-Speed 2624.40 samples/sec   Loss 10.5327   LearningRate 0.0583   Epoch: 4   Global Step: 196050   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:54,999-Speed 2633.42 samples/sec   Loss 10.4428   LearningRate 0.0583   Epoch: 4   Global Step: 196060   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:44:58,887-Speed 2634.08 samples/sec   Loss 10.4469   LearningRate 0.0583   Epoch: 4   Global Step: 196070   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:45:02,782-Speed 2630.65 samples/sec   Loss 10.4707   LearningRate 0.0583   Epoch: 4   Global Step: 196080   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:06,673-Speed 2632.56 samples/sec   Loss 10.5234   LearningRate 0.0583   Epoch: 4   Global Step: 196090   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:10,575-Speed 2624.55 samples/sec   Loss 10.3899   LearningRate 0.0583   Epoch: 4   Global Step: 196100   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:14,507-Speed 2605.56 samples/sec   Loss 10.3976   LearningRate 0.0583   Epoch: 4   Global Step: 196110   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:18,398-Speed 2632.32 samples/sec   Loss 10.4360   LearningRate 0.0583   Epoch: 4   Global Step: 196120   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:22,288-Speed 2632.67 samples/sec   Loss 10.5279   LearningRate 0.0583   Epoch: 4   Global Step: 196130   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:26,179-Speed 2632.46 samples/sec   Loss 10.4037   LearningRate 0.0583   Epoch: 4   Global Step: 196140   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:30,071-Speed 2631.71 samples/sec   Loss 10.4984   LearningRate 0.0583   Epoch: 4   Global Step: 196150   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:33,971-Speed 2626.64 samples/sec   Loss 10.3996   LearningRate 0.0583   Epoch: 4   Global Step: 196160   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:37,867-Speed 2629.40 samples/sec   Loss 10.4503   LearningRate 0.0583   Epoch: 4   Global Step: 196170   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:45:41,766-Speed 2626.81 samples/sec   Loss 10.4866   LearningRate 0.0583   Epoch: 4   Global Step: 196180   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:45:45,658-Speed 2631.58 samples/sec   Loss 10.3683   LearningRate 0.0583   Epoch: 4   Global Step: 196190   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:45:49,553-Speed 2630.02 samples/sec   Loss 10.5584   LearningRate 0.0583   Epoch: 4   Global Step: 196200   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:45:53,456-Speed 2624.29 samples/sec   Loss 10.3382   LearningRate 0.0583   Epoch: 4   Global Step: 196210   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:45:57,390-Speed 2603.01 samples/sec   Loss 10.4543   LearningRate 0.0583   Epoch: 4   Global Step: 196220   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:01,286-Speed 2629.85 samples/sec   Loss 10.3824   LearningRate 0.0583   Epoch: 4   Global Step: 196230   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:05,187-Speed 2625.69 samples/sec   Loss 10.4958   LearningRate 0.0583   Epoch: 4   Global Step: 196240   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:09,084-Speed 2628.73 samples/sec   Loss 10.4756   LearningRate 0.0583   Epoch: 4   Global Step: 196250   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:12,997-Speed 2617.12 samples/sec   Loss 10.4927   LearningRate 0.0583   Epoch: 4   Global Step: 196260   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:16,890-Speed 2631.57 samples/sec   Loss 10.5423   LearningRate 0.0583   Epoch: 4   Global Step: 196270   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:20,782-Speed 2631.76 samples/sec   Loss 10.4132   LearningRate 0.0583   Epoch: 4   Global Step: 196280   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:46:24,676-Speed 2629.98 samples/sec   Loss 10.5662   LearningRate 0.0583   Epoch: 4   Global Step: 196290   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:46:28,599-Speed 2610.62 samples/sec   Loss 10.6715   LearningRate 0.0583   Epoch: 4   Global Step: 196300   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:46:32,493-Speed 2630.33 samples/sec   Loss 10.3744   LearningRate 0.0583   Epoch: 4   Global Step: 196310   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:46:36,388-Speed 2630.54 samples/sec   Loss 10.4869   LearningRate 0.0583   Epoch: 4   Global Step: 196320   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:46:40,263-Speed 2642.97 samples/sec   Loss 10.5147   LearningRate 0.0583   Epoch: 4   Global Step: 196330   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:44,162-Speed 2627.03 samples/sec   Loss 10.3240   LearningRate 0.0583   Epoch: 4   Global Step: 196340   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:48,060-Speed 2628.28 samples/sec   Loss 10.5427   LearningRate 0.0583   Epoch: 4   Global Step: 196350   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:51,962-Speed 2625.31 samples/sec   Loss 10.4806   LearningRate 0.0583   Epoch: 4   Global Step: 196360   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:55,860-Speed 2627.32 samples/sec   Loss 10.2553   LearningRate 0.0583   Epoch: 4   Global Step: 196370   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:46:59,763-Speed 2624.54 samples/sec   Loss 10.5110   LearningRate 0.0583   Epoch: 4   Global Step: 196380   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:03,663-Speed 2626.36 samples/sec   Loss 10.2777   LearningRate 0.0583   Epoch: 4   Global Step: 196390   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:07,564-Speed 2625.41 samples/sec   Loss 10.5187   LearningRate 0.0583   Epoch: 4   Global Step: 196400   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:11,458-Speed 2631.06 samples/sec   Loss 10.4711   LearningRate 0.0583   Epoch: 4   Global Step: 196410   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:15,359-Speed 2625.57 samples/sec   Loss 10.4209   LearningRate 0.0583   Epoch: 4   Global Step: 196420   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:19,261-Speed 2624.77 samples/sec   Loss 10.4228   LearningRate 0.0583   Epoch: 4   Global Step: 196430   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:47:23,154-Speed 2630.51 samples/sec   Loss 10.3993   LearningRate 0.0582   Epoch: 4   Global Step: 196440   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:47:27,030-Speed 2642.66 samples/sec   Loss 10.3544   LearningRate 0.0582   Epoch: 4   Global Step: 196450   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:30,923-Speed 2631.26 samples/sec   Loss 10.5714   LearningRate 0.0582   Epoch: 4   Global Step: 196460   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:34,818-Speed 2629.69 samples/sec   Loss 10.4527   LearningRate 0.0582   Epoch: 4   Global Step: 196470   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:38,727-Speed 2619.96 samples/sec   Loss 10.3883   LearningRate 0.0582   Epoch: 4   Global Step: 196480   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:47:42,610-Speed 2638.36 samples/sec   Loss 10.4453   LearningRate 0.0582   Epoch: 4   Global Step: 196490   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:47:46,509-Speed 2627.33 samples/sec   Loss 10.3745   LearningRate 0.0582   Epoch: 4   Global Step: 196500   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:47:50,411-Speed 2624.52 samples/sec   Loss 10.3026   LearningRate 0.0582   Epoch: 4   Global Step: 196510   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:47:54,351-Speed 2600.15 samples/sec   Loss 10.5075   LearningRate 0.0582   Epoch: 4   Global Step: 196520   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:47:58,252-Speed 2625.50 samples/sec   Loss 10.4627   LearningRate 0.0582   Epoch: 4   Global Step: 196530   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:48:02,146-Speed 2630.46 samples/sec   Loss 10.3505   LearningRate 0.0582   Epoch: 4   Global Step: 196540   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:48:06,075-Speed 2607.27 samples/sec   Loss 10.4308   LearningRate 0.0582   Epoch: 4   Global Step: 196550   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:48:09,977-Speed 2624.74 samples/sec   Loss 10.5093   LearningRate 0.0582   Epoch: 4   Global Step: 196560   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:48:13,869-Speed 2631.25 samples/sec   Loss 10.3869   LearningRate 0.0582   Epoch: 4   Global Step: 196570   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:48:17,833-Speed 2584.80 samples/sec   Loss 10.3372   LearningRate 0.0582   Epoch: 4   Global Step: 196580   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:48:21,729-Speed 2629.43 samples/sec   Loss 10.3753   LearningRate 0.0582   Epoch: 4   Global Step: 196590   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:25,641-Speed 2617.99 samples/sec   Loss 10.3114   LearningRate 0.0582   Epoch: 4   Global Step: 196600   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:29,536-Speed 2630.17 samples/sec   Loss 10.3678   LearningRate 0.0582   Epoch: 4   Global Step: 196610   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:33,425-Speed 2633.34 samples/sec   Loss 10.4687   LearningRate 0.0582   Epoch: 4   Global Step: 196620   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:37,317-Speed 2631.31 samples/sec   Loss 10.4292   LearningRate 0.0582   Epoch: 4   Global Step: 196630   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:41,209-Speed 2631.72 samples/sec   Loss 10.2840   LearningRate 0.0582   Epoch: 4   Global Step: 196640   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:45,106-Speed 2628.50 samples/sec   Loss 10.5408   LearningRate 0.0582   Epoch: 4   Global Step: 196650   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:48,995-Speed 2633.94 samples/sec   Loss 10.4216   LearningRate 0.0582   Epoch: 4   Global Step: 196660   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:52,887-Speed 2631.84 samples/sec   Loss 10.2949   LearningRate 0.0582   Epoch: 4   Global Step: 196670   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:48:56,781-Speed 2629.91 samples/sec   Loss 10.3655   LearningRate 0.0582   Epoch: 4   Global Step: 196680   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:00,672-Speed 2632.96 samples/sec   Loss 10.3716   LearningRate 0.0582   Epoch: 4   Global Step: 196690   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:49:04,549-Speed 2641.41 samples/sec   Loss 10.2886   LearningRate 0.0582   Epoch: 4   Global Step: 196700   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:08,444-Speed 2629.77 samples/sec   Loss 10.4190   LearningRate 0.0582   Epoch: 4   Global Step: 196710   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:12,340-Speed 2628.43 samples/sec   Loss 10.2947   LearningRate 0.0582   Epoch: 4   Global Step: 196720   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:16,235-Speed 2630.18 samples/sec   Loss 10.2878   LearningRate 0.0582   Epoch: 4   Global Step: 196730   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:20,135-Speed 2626.55 samples/sec   Loss 10.4809   LearningRate 0.0582   Epoch: 4   Global Step: 196740   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:24,025-Speed 2633.40 samples/sec   Loss 10.4769   LearningRate 0.0582   Epoch: 4   Global Step: 196750   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:27,915-Speed 2632.76 samples/sec   Loss 10.3076   LearningRate 0.0582   Epoch: 4   Global Step: 196760   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:31,831-Speed 2615.60 samples/sec   Loss 10.4215   LearningRate 0.0582   Epoch: 4   Global Step: 196770   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:49:35,710-Speed 2640.96 samples/sec   Loss 10.5399   LearningRate 0.0582   Epoch: 4   Global Step: 196780   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:49:39,614-Speed 2623.78 samples/sec   Loss 10.3658   LearningRate 0.0582   Epoch: 4   Global Step: 196790   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:49:43,575-Speed 2585.72 samples/sec   Loss 10.4174   LearningRate 0.0582   Epoch: 4   Global Step: 196800   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:49:47,442-Speed 2648.80 samples/sec   Loss 10.9131   LearningRate 0.0582   Epoch: 4   Global Step: 196810   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:49:51,337-Speed 2628.89 samples/sec   Loss 11.1056   LearningRate 0.0582   Epoch: 4   Global Step: 196820   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:49:55,234-Speed 2629.11 samples/sec   Loss 10.6958   LearningRate 0.0582   Epoch: 4   Global Step: 196830   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:49:59,139-Speed 2622.97 samples/sec   Loss 10.6166   LearningRate 0.0582   Epoch: 4   Global Step: 196840   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:50:03,065-Speed 2608.59 samples/sec   Loss 10.5500   LearningRate 0.0582   Epoch: 4   Global Step: 196850   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:50:06,957-Speed 2632.25 samples/sec   Loss 10.4917   LearningRate 0.0582   Epoch: 4   Global Step: 196860   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:50:10,846-Speed 2633.08 samples/sec   Loss 10.5279   LearningRate 0.0582   Epoch: 4   Global Step: 196870   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:50:14,735-Speed 2634.11 samples/sec   Loss 10.4238   LearningRate 0.0582   Epoch: 4   Global Step: 196880   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:50:18,806-Speed 2515.62 samples/sec   Loss 10.3944   LearningRate 0.0582   Epoch: 4   Global Step: 196890   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:50:22,728-Speed 2612.18 samples/sec   Loss 10.4822   LearningRate 0.0582   Epoch: 4   Global Step: 196900   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:50:26,617-Speed 2633.72 samples/sec   Loss 10.5178   LearningRate 0.0582   Epoch: 4   Global Step: 196910   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:30,505-Speed 2634.62 samples/sec   Loss 10.4282   LearningRate 0.0582   Epoch: 4   Global Step: 196920   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:34,399-Speed 2630.16 samples/sec   Loss 10.4445   LearningRate 0.0582   Epoch: 4   Global Step: 196930   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:38,290-Speed 2633.98 samples/sec   Loss 10.4601   LearningRate 0.0582   Epoch: 4   Global Step: 196940   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:42,196-Speed 2623.19 samples/sec   Loss 10.6059   LearningRate 0.0582   Epoch: 4   Global Step: 196950   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:46,103-Speed 2621.07 samples/sec   Loss 10.5952   LearningRate 0.0582   Epoch: 4   Global Step: 196960   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:50,022-Speed 2614.60 samples/sec   Loss 10.4728   LearningRate 0.0582   Epoch: 4   Global Step: 196970   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:53,945-Speed 2610.92 samples/sec   Loss 10.3693   LearningRate 0.0581   Epoch: 4   Global Step: 196980   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:50:57,839-Speed 2630.59 samples/sec   Loss 10.3948   LearningRate 0.0581   Epoch: 4   Global Step: 196990   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:51:01,730-Speed 2631.78 samples/sec   Loss 10.4573   LearningRate 0.0581   Epoch: 4   Global Step: 197000   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:51:05,627-Speed 2628.82 samples/sec   Loss 10.5263   LearningRate 0.0581   Epoch: 4   Global Step: 197010   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:09,516-Speed 2633.27 samples/sec   Loss 10.6233   LearningRate 0.0581   Epoch: 4   Global Step: 197020   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:13,409-Speed 2631.09 samples/sec   Loss 10.3667   LearningRate 0.0581   Epoch: 4   Global Step: 197030   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:17,301-Speed 2632.02 samples/sec   Loss 10.3658   LearningRate 0.0581   Epoch: 4   Global Step: 197040   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:21,194-Speed 2630.76 samples/sec   Loss 10.4213   LearningRate 0.0581   Epoch: 4   Global Step: 197050   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:25,090-Speed 2628.78 samples/sec   Loss 10.4672   LearningRate 0.0581   Epoch: 4   Global Step: 197060   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:28,981-Speed 2632.31 samples/sec   Loss 10.4857   LearningRate 0.0581   Epoch: 4   Global Step: 197070   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:32,873-Speed 2631.88 samples/sec   Loss 10.3606   LearningRate 0.0581   Epoch: 4   Global Step: 197080   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:36,766-Speed 2631.01 samples/sec   Loss 10.3250   LearningRate 0.0581   Epoch: 4   Global Step: 197090   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:40,658-Speed 2631.30 samples/sec   Loss 10.5526   LearningRate 0.0581   Epoch: 4   Global Step: 197100   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:51:44,554-Speed 2629.50 samples/sec   Loss 10.5863   LearningRate 0.0581   Epoch: 4   Global Step: 197110   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:51:48,469-Speed 2616.27 samples/sec   Loss 10.4409   LearningRate 0.0581   Epoch: 4   Global Step: 197120   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:51:52,361-Speed 2631.38 samples/sec   Loss 10.4660   LearningRate 0.0581   Epoch: 4   Global Step: 197130   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:51:56,255-Speed 2630.39 samples/sec   Loss 10.4518   LearningRate 0.0581   Epoch: 4   Global Step: 197140   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:52:00,147-Speed 2631.43 samples/sec   Loss 10.3312   LearningRate 0.0581   Epoch: 4   Global Step: 197150   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:52:04,038-Speed 2632.12 samples/sec   Loss 10.3926   LearningRate 0.0581   Epoch: 4   Global Step: 197160   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:52:07,930-Speed 2631.75 samples/sec   Loss 10.4467   LearningRate 0.0581   Epoch: 4   Global Step: 197170   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:52:11,821-Speed 2632.76 samples/sec   Loss 10.3357   LearningRate 0.0581   Epoch: 4   Global Step: 197180   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:52:15,712-Speed 2631.88 samples/sec   Loss 10.4009   LearningRate 0.0581   Epoch: 4   Global Step: 197190   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:52:19,611-Speed 2627.14 samples/sec   Loss 10.4595   LearningRate 0.0581   Epoch: 4   Global Step: 197200   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:52:23,503-Speed 2631.86 samples/sec   Loss 10.4756   LearningRate 0.0581   Epoch: 4   Global Step: 197210   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:27,399-Speed 2628.85 samples/sec   Loss 10.3292   LearningRate 0.0581   Epoch: 4   Global Step: 197220   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:31,297-Speed 2627.13 samples/sec   Loss 10.3188   LearningRate 0.0581   Epoch: 4   Global Step: 197230   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:35,202-Speed 2623.12 samples/sec   Loss 10.3516   LearningRate 0.0581   Epoch: 4   Global Step: 197240   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:39,097-Speed 2629.81 samples/sec   Loss 10.2428   LearningRate 0.0581   Epoch: 4   Global Step: 197250   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:42,988-Speed 2632.37 samples/sec   Loss 10.3009   LearningRate 0.0581   Epoch: 4   Global Step: 197260   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:46,879-Speed 2631.84 samples/sec   Loss 10.4962   LearningRate 0.0581   Epoch: 4   Global Step: 197270   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:50,772-Speed 2631.43 samples/sec   Loss 10.3290   LearningRate 0.0581   Epoch: 4   Global Step: 197280   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:54,671-Speed 2627.41 samples/sec   Loss 10.4644   LearningRate 0.0581   Epoch: 4   Global Step: 197290   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:52:58,563-Speed 2631.47 samples/sec   Loss 10.3060   LearningRate 0.0581   Epoch: 4   Global Step: 197300   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:02,457-Speed 2630.41 samples/sec   Loss 10.3693   LearningRate 0.0581   Epoch: 4   Global Step: 197310   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:53:06,337-Speed 2639.45 samples/sec   Loss 10.5333   LearningRate 0.0581   Epoch: 4   Global Step: 197320   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:10,231-Speed 2630.32 samples/sec   Loss 10.4675   LearningRate 0.0581   Epoch: 4   Global Step: 197330   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:14,122-Speed 2632.04 samples/sec   Loss 10.3704   LearningRate 0.0581   Epoch: 4   Global Step: 197340   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:18,036-Speed 2617.59 samples/sec   Loss 10.3461   LearningRate 0.0581   Epoch: 4   Global Step: 197350   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:21,927-Speed 2632.42 samples/sec   Loss 10.4917   LearningRate 0.0581   Epoch: 4   Global Step: 197360   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:25,819-Speed 2631.45 samples/sec   Loss 10.4343   LearningRate 0.0581   Epoch: 4   Global Step: 197370   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:29,726-Speed 2621.52 samples/sec   Loss 10.4759   LearningRate 0.0581   Epoch: 4   Global Step: 197380   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:33,618-Speed 2631.72 samples/sec   Loss 10.3831   LearningRate 0.0581   Epoch: 4   Global Step: 197390   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:37,507-Speed 2633.70 samples/sec   Loss 10.3751   LearningRate 0.0581   Epoch: 4   Global Step: 197400   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:41,399-Speed 2631.89 samples/sec   Loss 10.4450   LearningRate 0.0581   Epoch: 4   Global Step: 197410   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:45,288-Speed 2634.12 samples/sec   Loss 10.4758   LearningRate 0.0581   Epoch: 4   Global Step: 197420   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:53:49,180-Speed 2632.15 samples/sec   Loss 10.3804   LearningRate 0.0581   Epoch: 4   Global Step: 197430   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:53:53,083-Speed 2624.23 samples/sec   Loss 10.5227   LearningRate 0.0581   Epoch: 4   Global Step: 197440   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:53:56,992-Speed 2620.22 samples/sec   Loss 10.2125   LearningRate 0.0581   Epoch: 4   Global Step: 197450   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:00,883-Speed 2632.34 samples/sec   Loss 10.4716   LearningRate 0.0581   Epoch: 4   Global Step: 197460   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:04,782-Speed 2626.66 samples/sec   Loss 10.4624   LearningRate 0.0581   Epoch: 4   Global Step: 197470   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:08,688-Speed 2622.40 samples/sec   Loss 10.5621   LearningRate 0.0581   Epoch: 4   Global Step: 197480   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:12,578-Speed 2633.42 samples/sec   Loss 10.4772   LearningRate 0.0581   Epoch: 4   Global Step: 197490   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:16,470-Speed 2631.90 samples/sec   Loss 10.4322   LearningRate 0.0581   Epoch: 4   Global Step: 197500   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:20,360-Speed 2632.98 samples/sec   Loss 10.4675   LearningRate 0.0581   Epoch: 4   Global Step: 197510   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:24,260-Speed 2625.73 samples/sec   Loss 10.4205   LearningRate 0.0580   Epoch: 4   Global Step: 197520   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:28,156-Speed 2629.41 samples/sec   Loss 10.2901   LearningRate 0.0580   Epoch: 4   Global Step: 197530   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:54:31,976-Speed 2681.76 samples/sec   Loss 11.1399   LearningRate 0.0580   Epoch: 4   Global Step: 197540   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:54:35,870-Speed 2630.00 samples/sec   Loss 10.6462   LearningRate 0.0580   Epoch: 4   Global Step: 197550   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:54:39,768-Speed 2627.20 samples/sec   Loss 10.3412   LearningRate 0.0580   Epoch: 4   Global Step: 197560   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:54:43,768-Speed 2561.02 samples/sec   Loss 10.3980   LearningRate 0.0580   Epoch: 4   Global Step: 197570   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:54:47,692-Speed 2611.07 samples/sec   Loss 10.4657   LearningRate 0.0580   Epoch: 4   Global Step: 197580   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:54:51,582-Speed 2632.34 samples/sec   Loss 10.3875   LearningRate 0.0580   Epoch: 4   Global Step: 197590   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:54:55,479-Speed 2628.35 samples/sec   Loss 10.4700   LearningRate 0.0580   Epoch: 4   Global Step: 197600   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:54:59,477-Speed 2562.33 samples/sec   Loss 10.4853   LearningRate 0.0580   Epoch: 4   Global Step: 197610   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:55:03,550-Speed 2514.67 samples/sec   Loss 10.3904   LearningRate 0.0580   Epoch: 4   Global Step: 197620   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:55:07,482-Speed 2604.58 samples/sec   Loss 10.5059   LearningRate 0.0580   Epoch: 4   Global Step: 197630   Fp16 Grad Scale: 8192   Required: 71 hours
Training: 2022-04-13 17:55:11,382-Speed 2626.37 samples/sec   Loss 10.3158   LearningRate 0.0580   Epoch: 4   Global Step: 197640   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:15,270-Speed 2634.77 samples/sec   Loss 10.3801   LearningRate 0.0580   Epoch: 4   Global Step: 197650   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:19,228-Speed 2587.71 samples/sec   Loss 10.4093   LearningRate 0.0580   Epoch: 4   Global Step: 197660   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:23,197-Speed 2580.23 samples/sec   Loss 10.3120   LearningRate 0.0580   Epoch: 4   Global Step: 197670   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:27,113-Speed 2616.12 samples/sec   Loss 10.3512   LearningRate 0.0580   Epoch: 4   Global Step: 197680   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:31,006-Speed 2631.10 samples/sec   Loss 10.3528   LearningRate 0.0580   Epoch: 4   Global Step: 197690   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:34,913-Speed 2621.19 samples/sec   Loss 10.2624   LearningRate 0.0580   Epoch: 4   Global Step: 197700   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:38,819-Speed 2622.51 samples/sec   Loss 10.4612   LearningRate 0.0580   Epoch: 4   Global Step: 197710   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:42,715-Speed 2629.19 samples/sec   Loss 10.5356   LearningRate 0.0580   Epoch: 4   Global Step: 197720   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:46,604-Speed 2633.14 samples/sec   Loss 10.3982   LearningRate 0.0580   Epoch: 4   Global Step: 197730   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:55:50,498-Speed 2630.54 samples/sec   Loss 10.3856   LearningRate 0.0580   Epoch: 4   Global Step: 197740   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:55:54,406-Speed 2621.09 samples/sec   Loss 10.5326   LearningRate 0.0580   Epoch: 4   Global Step: 197750   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:55:58,308-Speed 2624.66 samples/sec   Loss 10.4146   LearningRate 0.0580   Epoch: 4   Global Step: 197760   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:02,203-Speed 2629.97 samples/sec   Loss 10.2138   LearningRate 0.0580   Epoch: 4   Global Step: 197770   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:06,099-Speed 2629.14 samples/sec   Loss 10.4380   LearningRate 0.0580   Epoch: 4   Global Step: 197780   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:09,992-Speed 2631.02 samples/sec   Loss 10.4261   LearningRate 0.0580   Epoch: 4   Global Step: 197790   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:13,888-Speed 2628.67 samples/sec   Loss 10.4682   LearningRate 0.0580   Epoch: 4   Global Step: 197800   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:17,785-Speed 2629.18 samples/sec   Loss 10.3841   LearningRate 0.0580   Epoch: 4   Global Step: 197810   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:21,674-Speed 2633.27 samples/sec   Loss 10.3649   LearningRate 0.0580   Epoch: 4   Global Step: 197820   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:25,571-Speed 2628.37 samples/sec   Loss 10.4876   LearningRate 0.0580   Epoch: 4   Global Step: 197830   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:56:29,467-Speed 2629.08 samples/sec   Loss 10.3514   LearningRate 0.0580   Epoch: 4   Global Step: 197840   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:56:33,358-Speed 2632.96 samples/sec   Loss 10.4595   LearningRate 0.0580   Epoch: 4   Global Step: 197850   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:56:37,249-Speed 2632.42 samples/sec   Loss 10.4034   LearningRate 0.0580   Epoch: 4   Global Step: 197860   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:56:41,144-Speed 2629.89 samples/sec   Loss 10.2291   LearningRate 0.0580   Epoch: 4   Global Step: 197870   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:56:45,035-Speed 2632.25 samples/sec   Loss 10.4837   LearningRate 0.0580   Epoch: 4   Global Step: 197880   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:56:48,927-Speed 2631.40 samples/sec   Loss 10.4227   LearningRate 0.0580   Epoch: 4   Global Step: 197890   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:56:52,854-Speed 2608.80 samples/sec   Loss 10.5299   LearningRate 0.0580   Epoch: 4   Global Step: 197900   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:56:56,750-Speed 2629.03 samples/sec   Loss 10.4116   LearningRate 0.0580   Epoch: 4   Global Step: 197910   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:57:00,659-Speed 2619.87 samples/sec   Loss 10.4247   LearningRate 0.0580   Epoch: 4   Global Step: 197920   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:57:04,554-Speed 2629.82 samples/sec   Loss 10.4715   LearningRate 0.0580   Epoch: 4   Global Step: 197930   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:57:08,447-Speed 2631.45 samples/sec   Loss 10.2220   LearningRate 0.0580   Epoch: 4   Global Step: 197940   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:12,346-Speed 2627.16 samples/sec   Loss 10.3490   LearningRate 0.0580   Epoch: 4   Global Step: 197950   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:16,241-Speed 2629.44 samples/sec   Loss 10.4925   LearningRate 0.0580   Epoch: 4   Global Step: 197960   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:20,141-Speed 2626.52 samples/sec   Loss 10.4743   LearningRate 0.0580   Epoch: 4   Global Step: 197970   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:24,042-Speed 2625.66 samples/sec   Loss 10.4589   LearningRate 0.0580   Epoch: 4   Global Step: 197980   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:27,934-Speed 2631.49 samples/sec   Loss 10.4698   LearningRate 0.0580   Epoch: 4   Global Step: 197990   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:31,858-Speed 2610.04 samples/sec   Loss 10.4029   LearningRate 0.0580   Epoch: 4   Global Step: 198000   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:35,810-Speed 2592.47 samples/sec   Loss 10.4302   LearningRate 0.0580   Epoch: 4   Global Step: 198010   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:39,821-Speed 2553.50 samples/sec   Loss 10.3776   LearningRate 0.0580   Epoch: 4   Global Step: 198020   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:43,716-Speed 2630.50 samples/sec   Loss 10.2587   LearningRate 0.0580   Epoch: 4   Global Step: 198030   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:57:47,622-Speed 2621.86 samples/sec   Loss 10.4220   LearningRate 0.0580   Epoch: 4   Global Step: 198040   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:57:51,516-Speed 2630.27 samples/sec   Loss 10.4292   LearningRate 0.0580   Epoch: 4   Global Step: 198050   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:57:55,413-Speed 2628.48 samples/sec   Loss 10.2373   LearningRate 0.0580   Epoch: 4   Global Step: 198060   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 17:57:59,297-Speed 2636.93 samples/sec   Loss 10.4108   LearningRate 0.0579   Epoch: 4   Global Step: 198070   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:03,188-Speed 2632.17 samples/sec   Loss 10.4480   LearningRate 0.0579   Epoch: 4   Global Step: 198080   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:07,108-Speed 2613.11 samples/sec   Loss 10.3403   LearningRate 0.0579   Epoch: 4   Global Step: 198090   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:11,117-Speed 2555.08 samples/sec   Loss 10.3406   LearningRate 0.0579   Epoch: 4   Global Step: 198100   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:15,019-Speed 2625.44 samples/sec   Loss 10.4041   LearningRate 0.0579   Epoch: 4   Global Step: 198110   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:19,031-Speed 2552.65 samples/sec   Loss 10.4557   LearningRate 0.0579   Epoch: 4   Global Step: 198120   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:22,924-Speed 2632.49 samples/sec   Loss 10.2965   LearningRate 0.0579   Epoch: 4   Global Step: 198130   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:26,818-Speed 2629.76 samples/sec   Loss 10.4455   LearningRate 0.0579   Epoch: 4   Global Step: 198140   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:30,718-Speed 2626.69 samples/sec   Loss 10.5091   LearningRate 0.0579   Epoch: 4   Global Step: 198150   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 17:58:34,599-Speed 2639.15 samples/sec   Loss 10.4013   LearningRate 0.0579   Epoch: 4   Global Step: 198160   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:58:38,493-Speed 2631.02 samples/sec   Loss 10.5431   LearningRate 0.0579   Epoch: 4   Global Step: 198170   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:58:42,387-Speed 2630.13 samples/sec   Loss 10.4734   LearningRate 0.0579   Epoch: 4   Global Step: 198180   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:58:46,280-Speed 2631.65 samples/sec   Loss 10.5082   LearningRate 0.0579   Epoch: 4   Global Step: 198190   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:58:50,170-Speed 2632.84 samples/sec   Loss 10.4451   LearningRate 0.0579   Epoch: 4   Global Step: 198200   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:58:54,097-Speed 2608.51 samples/sec   Loss 10.4188   LearningRate 0.0579   Epoch: 4   Global Step: 198210   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:58:57,992-Speed 2629.77 samples/sec   Loss 10.3816   LearningRate 0.0579   Epoch: 4   Global Step: 198220   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:59:01,882-Speed 2632.98 samples/sec   Loss 10.3799   LearningRate 0.0579   Epoch: 4   Global Step: 198230   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 17:59:05,756-Speed 2644.23 samples/sec   Loss 10.5500   LearningRate 0.0579   Epoch: 4   Global Step: 198240   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:09,657-Speed 2625.57 samples/sec   Loss 10.5172   LearningRate 0.0579   Epoch: 4   Global Step: 198250   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:13,556-Speed 2627.28 samples/sec   Loss 10.3226   LearningRate 0.0579   Epoch: 4   Global Step: 198260   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:17,460-Speed 2624.03 samples/sec   Loss 10.4706   LearningRate 0.0579   Epoch: 4   Global Step: 198270   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:21,358-Speed 2627.46 samples/sec   Loss 10.5123   LearningRate 0.0579   Epoch: 4   Global Step: 198280   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:25,260-Speed 2624.98 samples/sec   Loss 10.3765   LearningRate 0.0579   Epoch: 4   Global Step: 198290   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:29,159-Speed 2627.23 samples/sec   Loss 10.4044   LearningRate 0.0579   Epoch: 4   Global Step: 198300   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:33,093-Speed 2603.46 samples/sec   Loss 10.4614   LearningRate 0.0579   Epoch: 4   Global Step: 198310   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:37,221-Speed 2480.76 samples/sec   Loss 10.4310   LearningRate 0.0579   Epoch: 4   Global Step: 198320   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:41,120-Speed 2627.24 samples/sec   Loss 10.2911   LearningRate 0.0579   Epoch: 4   Global Step: 198330   Fp16 Grad Scale: 16384   Required: 71 hours
Training: 2022-04-13 17:59:45,028-Speed 2620.55 samples/sec   Loss 10.4362   LearningRate 0.0579   Epoch: 4   Global Step: 198340   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:59:48,942-Speed 2617.63 samples/sec   Loss 10.3099   LearningRate 0.0579   Epoch: 4   Global Step: 198350   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:59:52,842-Speed 2626.37 samples/sec   Loss 10.4934   LearningRate 0.0579   Epoch: 4   Global Step: 198360   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 17:59:56,740-Speed 2627.33 samples/sec   Loss 10.1897   LearningRate 0.0579   Epoch: 4   Global Step: 198370   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:00:00,632-Speed 2631.86 samples/sec   Loss 10.2833   LearningRate 0.0579   Epoch: 4   Global Step: 198380   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:00:04,525-Speed 2630.98 samples/sec   Loss 10.3524   LearningRate 0.0579   Epoch: 4   Global Step: 198390   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:00:08,421-Speed 2628.72 samples/sec   Loss 10.3483   LearningRate 0.0579   Epoch: 4   Global Step: 198400   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:00:12,330-Speed 2620.35 samples/sec   Loss 10.4928   LearningRate 0.0579   Epoch: 4   Global Step: 198410   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:00:16,226-Speed 2629.52 samples/sec   Loss 10.5135   LearningRate 0.0579   Epoch: 4   Global Step: 198420   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:00:20,120-Speed 2630.38 samples/sec   Loss 10.4320   LearningRate 0.0579   Epoch: 4   Global Step: 198430   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:00:24,030-Speed 2619.56 samples/sec   Loss 10.4642   LearningRate 0.0579   Epoch: 4   Global Step: 198440   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:27,927-Speed 2628.37 samples/sec   Loss 10.5113   LearningRate 0.0579   Epoch: 4   Global Step: 198450   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:31,825-Speed 2627.99 samples/sec   Loss 10.3928   LearningRate 0.0579   Epoch: 4   Global Step: 198460   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:35,714-Speed 2632.96 samples/sec   Loss 10.3958   LearningRate 0.0579   Epoch: 4   Global Step: 198470   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:39,616-Speed 2625.06 samples/sec   Loss 10.4309   LearningRate 0.0579   Epoch: 4   Global Step: 198480   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:43,509-Speed 2631.70 samples/sec   Loss 10.4315   LearningRate 0.0579   Epoch: 4   Global Step: 198490   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:47,406-Speed 2628.04 samples/sec   Loss 10.3515   LearningRate 0.0579   Epoch: 4   Global Step: 198500   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:51,319-Speed 2617.87 samples/sec   Loss 10.3621   LearningRate 0.0579   Epoch: 4   Global Step: 198510   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:55,218-Speed 2626.56 samples/sec   Loss 10.4952   LearningRate 0.0579   Epoch: 4   Global Step: 198520   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:00:59,111-Speed 2631.96 samples/sec   Loss 10.4245   LearningRate 0.0579   Epoch: 4   Global Step: 198530   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:01:03,001-Speed 2632.55 samples/sec   Loss 10.3728   LearningRate 0.0579   Epoch: 4   Global Step: 198540   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:06,898-Speed 2627.98 samples/sec   Loss 10.2532   LearningRate 0.0579   Epoch: 4   Global Step: 198550   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:10,792-Speed 2630.64 samples/sec   Loss 10.4546   LearningRate 0.0579   Epoch: 4   Global Step: 198560   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:14,691-Speed 2626.39 samples/sec   Loss 10.4405   LearningRate 0.0579   Epoch: 4   Global Step: 198570   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:18,586-Speed 2630.36 samples/sec   Loss 10.3963   LearningRate 0.0579   Epoch: 4   Global Step: 198580   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:22,479-Speed 2631.02 samples/sec   Loss 10.5806   LearningRate 0.0579   Epoch: 4   Global Step: 198590   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:26,369-Speed 2632.51 samples/sec   Loss 10.4830   LearningRate 0.0579   Epoch: 4   Global Step: 198600   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:30,262-Speed 2630.89 samples/sec   Loss 10.3835   LearningRate 0.0578   Epoch: 4   Global Step: 198610   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:34,155-Speed 2630.82 samples/sec   Loss 10.4958   LearningRate 0.0578   Epoch: 4   Global Step: 198620   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:38,046-Speed 2632.07 samples/sec   Loss 10.5085   LearningRate 0.0578   Epoch: 4   Global Step: 198630   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:41,936-Speed 2633.16 samples/sec   Loss 10.4809   LearningRate 0.0578   Epoch: 4   Global Step: 198640   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:01:45,811-Speed 2643.49 samples/sec   Loss 10.4385   LearningRate 0.0578   Epoch: 4   Global Step: 198650   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:49,708-Speed 2628.46 samples/sec   Loss 10.2259   LearningRate 0.0578   Epoch: 4   Global Step: 198660   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:53,606-Speed 2627.52 samples/sec   Loss 10.3577   LearningRate 0.0578   Epoch: 4   Global Step: 198670   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:01:57,500-Speed 2630.63 samples/sec   Loss 10.4017   LearningRate 0.0578   Epoch: 4   Global Step: 198680   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:02:01,406-Speed 2622.02 samples/sec   Loss 10.4402   LearningRate 0.0578   Epoch: 4   Global Step: 198690   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:02:05,304-Speed 2626.99 samples/sec   Loss 10.4360   LearningRate 0.0578   Epoch: 4   Global Step: 198700   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:02:09,195-Speed 2632.52 samples/sec   Loss 10.3393   LearningRate 0.0578   Epoch: 4   Global Step: 198710   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:02:13,096-Speed 2625.64 samples/sec   Loss 10.3022   LearningRate 0.0578   Epoch: 4   Global Step: 198720   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:02:16,991-Speed 2629.10 samples/sec   Loss 10.4760   LearningRate 0.0578   Epoch: 4   Global Step: 198730   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:02:20,882-Speed 2632.06 samples/sec   Loss 10.3442   LearningRate 0.0578   Epoch: 4   Global Step: 198740   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:02:24,780-Speed 2628.13 samples/sec   Loss 10.3906   LearningRate 0.0578   Epoch: 4   Global Step: 198750   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:28,673-Speed 2631.24 samples/sec   Loss 10.2228   LearningRate 0.0578   Epoch: 4   Global Step: 198760   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:32,567-Speed 2630.16 samples/sec   Loss 10.4550   LearningRate 0.0578   Epoch: 4   Global Step: 198770   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:36,502-Speed 2602.74 samples/sec   Loss 10.2655   LearningRate 0.0578   Epoch: 4   Global Step: 198780   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:40,403-Speed 2625.24 samples/sec   Loss 10.5210   LearningRate 0.0578   Epoch: 4   Global Step: 198790   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:44,306-Speed 2624.37 samples/sec   Loss 10.4149   LearningRate 0.0578   Epoch: 4   Global Step: 198800   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:48,213-Speed 2621.80 samples/sec   Loss 10.4094   LearningRate 0.0578   Epoch: 4   Global Step: 198810   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:52,107-Speed 2630.28 samples/sec   Loss 10.3558   LearningRate 0.0578   Epoch: 4   Global Step: 198820   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:56,007-Speed 2626.21 samples/sec   Loss 10.2301   LearningRate 0.0578   Epoch: 4   Global Step: 198830   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:02:59,918-Speed 2618.70 samples/sec   Loss 10.3536   LearningRate 0.0578   Epoch: 4   Global Step: 198840   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:03:03,800-Speed 2638.70 samples/sec   Loss 10.4048   LearningRate 0.0578   Epoch: 4   Global Step: 198850   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:03:07,701-Speed 2625.27 samples/sec   Loss 10.3115   LearningRate 0.0578   Epoch: 4   Global Step: 198860   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:03:11,586-Speed 2636.58 samples/sec   Loss 10.4263   LearningRate 0.0578   Epoch: 4   Global Step: 198870   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:15,489-Speed 2624.13 samples/sec   Loss 10.3710   LearningRate 0.0578   Epoch: 4   Global Step: 198880   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:19,389-Speed 2626.26 samples/sec   Loss 10.3086   LearningRate 0.0578   Epoch: 4   Global Step: 198890   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:23,294-Speed 2623.34 samples/sec   Loss 10.4092   LearningRate 0.0578   Epoch: 4   Global Step: 198900   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:27,205-Speed 2618.61 samples/sec   Loss 10.3059   LearningRate 0.0578   Epoch: 4   Global Step: 198910   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:31,125-Speed 2612.42 samples/sec   Loss 10.5314   LearningRate 0.0578   Epoch: 4   Global Step: 198920   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:35,026-Speed 2625.46 samples/sec   Loss 10.4327   LearningRate 0.0578   Epoch: 4   Global Step: 198930   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:38,931-Speed 2623.65 samples/sec   Loss 10.2482   LearningRate 0.0578   Epoch: 4   Global Step: 198940   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:42,829-Speed 2627.53 samples/sec   Loss 10.4961   LearningRate 0.0578   Epoch: 4   Global Step: 198950   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:46,747-Speed 2614.43 samples/sec   Loss 10.3463   LearningRate 0.0578   Epoch: 4   Global Step: 198960   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:03:50,636-Speed 2633.19 samples/sec   Loss 10.3763   LearningRate 0.0578   Epoch: 4   Global Step: 198970   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:03:54,550-Speed 2617.26 samples/sec   Loss 10.4557   LearningRate 0.0578   Epoch: 4   Global Step: 198980   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:03:58,473-Speed 2610.80 samples/sec   Loss 10.4232   LearningRate 0.0578   Epoch: 4   Global Step: 198990   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:02,375-Speed 2624.72 samples/sec   Loss 10.3001   LearningRate 0.0578   Epoch: 4   Global Step: 199000   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:06,270-Speed 2629.27 samples/sec   Loss 10.2978   LearningRate 0.0578   Epoch: 4   Global Step: 199010   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:10,172-Speed 2624.91 samples/sec   Loss 10.4877   LearningRate 0.0578   Epoch: 4   Global Step: 199020   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:14,065-Speed 2631.18 samples/sec   Loss 10.4304   LearningRate 0.0578   Epoch: 4   Global Step: 199030   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:17,968-Speed 2624.38 samples/sec   Loss 10.4121   LearningRate 0.0578   Epoch: 4   Global Step: 199040   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:21,860-Speed 2632.04 samples/sec   Loss 10.4827   LearningRate 0.0578   Epoch: 4   Global Step: 199050   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:25,754-Speed 2630.31 samples/sec   Loss 10.4122   LearningRate 0.0578   Epoch: 4   Global Step: 199060   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:04:29,659-Speed 2622.37 samples/sec   Loss 10.1884   LearningRate 0.0578   Epoch: 4   Global Step: 199070   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:04:33,566-Speed 2621.22 samples/sec   Loss 10.3624   LearningRate 0.0578   Epoch: 4   Global Step: 199080   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:04:37,466-Speed 2626.95 samples/sec   Loss 10.3697   LearningRate 0.0578   Epoch: 4   Global Step: 199090   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:04:41,360-Speed 2630.05 samples/sec   Loss 10.3972   LearningRate 0.0578   Epoch: 4   Global Step: 199100   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:04:45,262-Speed 2624.80 samples/sec   Loss 10.4243   LearningRate 0.0578   Epoch: 4   Global Step: 199110   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:04:49,165-Speed 2624.09 samples/sec   Loss 10.2614   LearningRate 0.0578   Epoch: 4   Global Step: 199120   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:04:53,059-Speed 2631.86 samples/sec   Loss 10.3766   LearningRate 0.0578   Epoch: 4   Global Step: 199130   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:04:56,953-Speed 2630.64 samples/sec   Loss 10.3419   LearningRate 0.0578   Epoch: 4   Global Step: 199140   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:00,849-Speed 2628.95 samples/sec   Loss 10.4512   LearningRate 0.0578   Epoch: 4   Global Step: 199150   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:04,748-Speed 2627.00 samples/sec   Loss 10.3635   LearningRate 0.0577   Epoch: 4   Global Step: 199160   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:08,653-Speed 2622.64 samples/sec   Loss 10.4523   LearningRate 0.0577   Epoch: 4   Global Step: 199170   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:05:12,553-Speed 2626.54 samples/sec   Loss 10.3363   LearningRate 0.0577   Epoch: 4   Global Step: 199180   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:05:16,445-Speed 2631.01 samples/sec   Loss 10.3246   LearningRate 0.0577   Epoch: 4   Global Step: 199190   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:05:20,348-Speed 2624.29 samples/sec   Loss 10.3816   LearningRate 0.0577   Epoch: 4   Global Step: 199200   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:05:24,253-Speed 2623.04 samples/sec   Loss 10.1874   LearningRate 0.0577   Epoch: 4   Global Step: 199210   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:05:28,145-Speed 2631.78 samples/sec   Loss 10.3675   LearningRate 0.0577   Epoch: 4   Global Step: 199220   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:05:32,041-Speed 2628.66 samples/sec   Loss 10.2827   LearningRate 0.0577   Epoch: 4   Global Step: 199230   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:05:35,915-Speed 2644.17 samples/sec   Loss 10.5425   LearningRate 0.0577   Epoch: 4   Global Step: 199240   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:39,820-Speed 2622.67 samples/sec   Loss 10.3043   LearningRate 0.0577   Epoch: 4   Global Step: 199250   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:43,724-Speed 2623.63 samples/sec   Loss 10.4082   LearningRate 0.0577   Epoch: 4   Global Step: 199260   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:47,629-Speed 2622.62 samples/sec   Loss 10.3106   LearningRate 0.0577   Epoch: 4   Global Step: 199270   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:51,534-Speed 2623.16 samples/sec   Loss 10.3620   LearningRate 0.0577   Epoch: 4   Global Step: 199280   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:55,436-Speed 2624.92 samples/sec   Loss 10.2897   LearningRate 0.0577   Epoch: 4   Global Step: 199290   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:05:59,328-Speed 2631.92 samples/sec   Loss 10.4026   LearningRate 0.0577   Epoch: 4   Global Step: 199300   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:03,224-Speed 2628.28 samples/sec   Loss 10.3689   LearningRate 0.0577   Epoch: 4   Global Step: 199310   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:07,121-Speed 2628.24 samples/sec   Loss 10.5717   LearningRate 0.0577   Epoch: 4   Global Step: 199320   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:11,014-Speed 2631.14 samples/sec   Loss 10.3764   LearningRate 0.0577   Epoch: 4   Global Step: 199330   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:14,922-Speed 2621.47 samples/sec   Loss 10.4316   LearningRate 0.0577   Epoch: 4   Global Step: 199340   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:06:18,889-Speed 2581.88 samples/sec   Loss 10.3267   LearningRate 0.0577   Epoch: 4   Global Step: 199350   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:06:22,771-Speed 2638.19 samples/sec   Loss 10.4749   LearningRate 0.0577   Epoch: 4   Global Step: 199360   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:26,670-Speed 2627.04 samples/sec   Loss 10.4410   LearningRate 0.0577   Epoch: 4   Global Step: 199370   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:30,571-Speed 2625.50 samples/sec   Loss 10.3568   LearningRate 0.0577   Epoch: 4   Global Step: 199380   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:34,494-Speed 2610.65 samples/sec   Loss 10.2446   LearningRate 0.0577   Epoch: 4   Global Step: 199390   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:38,394-Speed 2626.34 samples/sec   Loss 10.3340   LearningRate 0.0577   Epoch: 4   Global Step: 199400   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:42,287-Speed 2631.39 samples/sec   Loss 10.2946   LearningRate 0.0577   Epoch: 4   Global Step: 199410   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:46,181-Speed 2629.92 samples/sec   Loss 10.3393   LearningRate 0.0577   Epoch: 4   Global Step: 199420   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:50,075-Speed 2630.38 samples/sec   Loss 10.2619   LearningRate 0.0577   Epoch: 4   Global Step: 199430   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:06:53,954-Speed 2641.00 samples/sec   Loss 10.4370   LearningRate 0.0577   Epoch: 4   Global Step: 199440   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:06:57,848-Speed 2630.27 samples/sec   Loss 10.4986   LearningRate 0.0577   Epoch: 4   Global Step: 199450   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:01,738-Speed 2632.43 samples/sec   Loss 10.3995   LearningRate 0.0577   Epoch: 4   Global Step: 199460   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:05,629-Speed 2632.58 samples/sec   Loss 10.3696   LearningRate 0.0577   Epoch: 4   Global Step: 199470   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:09,520-Speed 2632.19 samples/sec   Loss 10.4180   LearningRate 0.0577   Epoch: 4   Global Step: 199480   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:13,411-Speed 2632.13 samples/sec   Loss 10.3220   LearningRate 0.0577   Epoch: 4   Global Step: 199490   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:17,301-Speed 2632.95 samples/sec   Loss 10.2431   LearningRate 0.0577   Epoch: 4   Global Step: 199500   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:21,198-Speed 2628.60 samples/sec   Loss 10.2741   LearningRate 0.0577   Epoch: 4   Global Step: 199510   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:25,096-Speed 2627.48 samples/sec   Loss 10.4599   LearningRate 0.0577   Epoch: 4   Global Step: 199520   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:28,993-Speed 2628.87 samples/sec   Loss 10.3936   LearningRate 0.0577   Epoch: 4   Global Step: 199530   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:07:32,884-Speed 2631.64 samples/sec   Loss 10.4091   LearningRate 0.0577   Epoch: 4   Global Step: 199540   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:07:36,773-Speed 2634.05 samples/sec   Loss 10.3351   LearningRate 0.0577   Epoch: 4   Global Step: 199550   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:07:40,660-Speed 2634.33 samples/sec   Loss 10.3175   LearningRate 0.0577   Epoch: 4   Global Step: 199560   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:07:44,550-Speed 2633.30 samples/sec   Loss 10.2832   LearningRate 0.0577   Epoch: 4   Global Step: 199570   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:07:48,441-Speed 2633.25 samples/sec   Loss 10.2602   LearningRate 0.0577   Epoch: 4   Global Step: 199580   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:07:52,423-Speed 2572.11 samples/sec   Loss 10.4254   LearningRate 0.0577   Epoch: 4   Global Step: 199590   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:07:56,326-Speed 2624.11 samples/sec   Loss 10.2727   LearningRate 0.0577   Epoch: 4   Global Step: 199600   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:00,223-Speed 2628.42 samples/sec   Loss 10.3611   LearningRate 0.0577   Epoch: 4   Global Step: 199610   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:04,145-Speed 2611.39 samples/sec   Loss 10.4112   LearningRate 0.0577   Epoch: 4   Global Step: 199620   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:08,038-Speed 2631.26 samples/sec   Loss 10.3248   LearningRate 0.0577   Epoch: 4   Global Step: 199630   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:11,931-Speed 2631.05 samples/sec   Loss 10.3844   LearningRate 0.0577   Epoch: 4   Global Step: 199640   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:08:15,807-Speed 2642.79 samples/sec   Loss 10.2962   LearningRate 0.0577   Epoch: 4   Global Step: 199650   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:19,724-Speed 2614.84 samples/sec   Loss 10.2848   LearningRate 0.0577   Epoch: 4   Global Step: 199660   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:23,623-Speed 2627.44 samples/sec   Loss 10.2787   LearningRate 0.0577   Epoch: 4   Global Step: 199670   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:27,519-Speed 2628.93 samples/sec   Loss 10.4202   LearningRate 0.0577   Epoch: 4   Global Step: 199680   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:31,414-Speed 2629.35 samples/sec   Loss 10.4731   LearningRate 0.0577   Epoch: 4   Global Step: 199690   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:35,312-Speed 2627.82 samples/sec   Loss 10.4287   LearningRate 0.0576   Epoch: 4   Global Step: 199700   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:39,295-Speed 2571.84 samples/sec   Loss 10.2613   LearningRate 0.0576   Epoch: 4   Global Step: 199710   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:43,192-Speed 2628.25 samples/sec   Loss 10.2827   LearningRate 0.0576   Epoch: 4   Global Step: 199720   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:47,083-Speed 2632.23 samples/sec   Loss 10.4764   LearningRate 0.0576   Epoch: 4   Global Step: 199730   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:50,989-Speed 2622.93 samples/sec   Loss 10.3902   LearningRate 0.0576   Epoch: 4   Global Step: 199740   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:54,864-Speed 2643.47 samples/sec   Loss 10.3001   LearningRate 0.0576   Epoch: 4   Global Step: 199750   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:08:58,756-Speed 2630.99 samples/sec   Loss 10.3683   LearningRate 0.0576   Epoch: 4   Global Step: 199760   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:02,649-Speed 2631.26 samples/sec   Loss 10.4276   LearningRate 0.0576   Epoch: 4   Global Step: 199770   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:06,542-Speed 2630.91 samples/sec   Loss 10.3477   LearningRate 0.0576   Epoch: 4   Global Step: 199780   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:10,432-Speed 2633.05 samples/sec   Loss 10.1873   LearningRate 0.0576   Epoch: 4   Global Step: 199790   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:14,360-Speed 2608.35 samples/sec   Loss 10.4312   LearningRate 0.0576   Epoch: 4   Global Step: 199800   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:18,255-Speed 2629.23 samples/sec   Loss 10.3294   LearningRate 0.0576   Epoch: 4   Global Step: 199810   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:22,147-Speed 2632.36 samples/sec   Loss 10.5087   LearningRate 0.0576   Epoch: 4   Global Step: 199820   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:26,070-Speed 2610.66 samples/sec   Loss 10.4282   LearningRate 0.0576   Epoch: 4   Global Step: 199830   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:29,961-Speed 2632.47 samples/sec   Loss 10.3917   LearningRate 0.0576   Epoch: 4   Global Step: 199840   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:33,852-Speed 2632.19 samples/sec   Loss 10.3286   LearningRate 0.0576   Epoch: 4   Global Step: 199850   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:09:37,745-Speed 2631.22 samples/sec   Loss 10.1622   LearningRate 0.0576   Epoch: 4   Global Step: 199860   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:09:41,642-Speed 2628.49 samples/sec   Loss 10.3089   LearningRate 0.0576   Epoch: 4   Global Step: 199870   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:09:45,536-Speed 2630.71 samples/sec   Loss 10.3292   LearningRate 0.0576   Epoch: 4   Global Step: 199880   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:09:49,439-Speed 2623.76 samples/sec   Loss 10.3811   LearningRate 0.0576   Epoch: 4   Global Step: 199890   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:53,336-Speed 2628.95 samples/sec   Loss 10.4237   LearningRate 0.0576   Epoch: 4   Global Step: 199900   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:09:57,226-Speed 2632.95 samples/sec   Loss 10.5008   LearningRate 0.0576   Epoch: 4   Global Step: 199910   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:01,119-Speed 2630.72 samples/sec   Loss 10.3433   LearningRate 0.0576   Epoch: 4   Global Step: 199920   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:05,017-Speed 2627.17 samples/sec   Loss 10.3834   LearningRate 0.0576   Epoch: 4   Global Step: 199930   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:08,912-Speed 2630.57 samples/sec   Loss 10.3877   LearningRate 0.0576   Epoch: 4   Global Step: 199940   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:12,808-Speed 2628.82 samples/sec   Loss 10.2307   LearningRate 0.0576   Epoch: 4   Global Step: 199950   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:16,700-Speed 2631.46 samples/sec   Loss 10.3340   LearningRate 0.0576   Epoch: 4   Global Step: 199960   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:20,602-Speed 2625.01 samples/sec   Loss 10.3875   LearningRate 0.0576   Epoch: 4   Global Step: 199970   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:24,502-Speed 2626.64 samples/sec   Loss 10.3997   LearningRate 0.0576   Epoch: 4   Global Step: 199980   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:10:28,394-Speed 2631.97 samples/sec   Loss 10.4073   LearningRate 0.0576   Epoch: 4   Global Step: 199990   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:10:32,297-Speed 2623.94 samples/sec   Loss 10.2945   LearningRate 0.0576   Epoch: 4   Global Step: 200000   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:11:15,517-[lfw][200000]XNorm: 23.224185
Training: 2022-04-13 18:11:15,518-[lfw][200000]Accuracy-Flip: 0.99767+-0.00226
Training: 2022-04-13 18:11:15,518-[lfw][200000]Accuracy-Highest: 0.99783
Training: 2022-04-13 18:12:05,937-[cfp_fp][200000]XNorm: 20.991996
Training: 2022-04-13 18:12:05,938-[cfp_fp][200000]Accuracy-Flip: 0.98314+-0.00477
Training: 2022-04-13 18:12:05,939-[cfp_fp][200000]Accuracy-Highest: 0.98314
Training: 2022-04-13 18:12:49,331-[agedb_30][200000]XNorm: 23.042334
Training: 2022-04-13 18:12:49,332-[agedb_30][200000]Accuracy-Flip: 0.96933+-0.00731
Training: 2022-04-13 18:12:49,333-[agedb_30][200000]Accuracy-Highest: 0.97150
Training: 2022-04-13 18:12:53,172-Speed 72.69 samples/sec   Loss 10.4043   LearningRate 0.0576   Epoch: 4   Global Step: 200010   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:12:57,039-Speed 2648.66 samples/sec   Loss 10.3826   LearningRate 0.0576   Epoch: 4   Global Step: 200020   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:00,906-Speed 2648.98 samples/sec   Loss 10.1876   LearningRate 0.0576   Epoch: 4   Global Step: 200030   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:04,780-Speed 2644.20 samples/sec   Loss 10.3414   LearningRate 0.0576   Epoch: 4   Global Step: 200040   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:08,647-Speed 2648.14 samples/sec   Loss 10.4036   LearningRate 0.0576   Epoch: 4   Global Step: 200050   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:12,534-Speed 2636.13 samples/sec   Loss 10.4969   LearningRate 0.0576   Epoch: 4   Global Step: 200060   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:16,409-Speed 2643.51 samples/sec   Loss 10.4154   LearningRate 0.0576   Epoch: 4   Global Step: 200070   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:20,280-Speed 2645.73 samples/sec   Loss 10.3380   LearningRate 0.0576   Epoch: 4   Global Step: 200080   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:24,164-Speed 2637.74 samples/sec   Loss 10.2099   LearningRate 0.0576   Epoch: 4   Global Step: 200090   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:28,040-Speed 2641.99 samples/sec   Loss 10.2916   LearningRate 0.0576   Epoch: 4   Global Step: 200100   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:31,924-Speed 2637.72 samples/sec   Loss 10.4134   LearningRate 0.0576   Epoch: 4   Global Step: 200110   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:13:35,807-Speed 2637.67 samples/sec   Loss 10.3706   LearningRate 0.0576   Epoch: 4   Global Step: 200120   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:13:39,699-Speed 2631.26 samples/sec   Loss 10.4044   LearningRate 0.0576   Epoch: 4   Global Step: 200130   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:13:43,609-Speed 2619.49 samples/sec   Loss 10.3783   LearningRate 0.0576   Epoch: 4   Global Step: 200140   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:47,494-Speed 2637.22 samples/sec   Loss 10.4498   LearningRate 0.0576   Epoch: 4   Global Step: 200150   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:51,382-Speed 2633.96 samples/sec   Loss 10.4382   LearningRate 0.0576   Epoch: 4   Global Step: 200160   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:55,271-Speed 2634.33 samples/sec   Loss 10.4895   LearningRate 0.0576   Epoch: 4   Global Step: 200170   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:13:59,173-Speed 2624.17 samples/sec   Loss 10.2979   LearningRate 0.0576   Epoch: 4   Global Step: 200180   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:14:03,057-Speed 2637.23 samples/sec   Loss 10.3013   LearningRate 0.0576   Epoch: 4   Global Step: 200190   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:14:06,958-Speed 2625.98 samples/sec   Loss 10.3608   LearningRate 0.0576   Epoch: 4   Global Step: 200200   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:14:10,852-Speed 2630.32 samples/sec   Loss 10.2863   LearningRate 0.0576   Epoch: 4   Global Step: 200210   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:14:14,745-Speed 2630.44 samples/sec   Loss 10.3409   LearningRate 0.0576   Epoch: 4   Global Step: 200220   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:14:18,638-Speed 2631.03 samples/sec   Loss 10.2493   LearningRate 0.0576   Epoch: 4   Global Step: 200230   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:14:22,532-Speed 2630.54 samples/sec   Loss 10.2352   LearningRate 0.0576   Epoch: 4   Global Step: 200240   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:14:26,410-Speed 2640.65 samples/sec   Loss 10.4536   LearningRate 0.0575   Epoch: 4   Global Step: 200250   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:14:30,284-Speed 2644.24 samples/sec   Loss 10.3163   LearningRate 0.0575   Epoch: 4   Global Step: 200260   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:14:34,297-Speed 2552.47 samples/sec   Loss 10.3092   LearningRate 0.0575   Epoch: 4   Global Step: 200270   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:14:38,247-Speed 2593.13 samples/sec   Loss 10.2065   LearningRate 0.0575   Epoch: 4   Global Step: 200280   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:14:42,227-Speed 2573.58 samples/sec   Loss 10.4271   LearningRate 0.0575   Epoch: 4   Global Step: 200290   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:14:46,298-Speed 2516.04 samples/sec   Loss 10.3799   LearningRate 0.0575   Epoch: 4   Global Step: 200300   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:14:50,294-Speed 2562.52 samples/sec   Loss 10.4337   LearningRate 0.0575   Epoch: 4   Global Step: 200310   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:14:54,184-Speed 2633.70 samples/sec   Loss 10.3961   LearningRate 0.0575   Epoch: 4   Global Step: 200320   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:14:58,077-Speed 2630.69 samples/sec   Loss 10.3465   LearningRate 0.0575   Epoch: 4   Global Step: 200330   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:15:01,971-Speed 2629.95 samples/sec   Loss 10.2584   LearningRate 0.0575   Epoch: 4   Global Step: 200340   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:15:05,861-Speed 2632.98 samples/sec   Loss 10.3747   LearningRate 0.0575   Epoch: 4   Global Step: 200350   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:15:09,751-Speed 2633.63 samples/sec   Loss 10.2967   LearningRate 0.0575   Epoch: 4   Global Step: 200360   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:13,663-Speed 2618.34 samples/sec   Loss 10.2796   LearningRate 0.0575   Epoch: 4   Global Step: 200370   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:17,560-Speed 2627.94 samples/sec   Loss 10.3790   LearningRate 0.0575   Epoch: 4   Global Step: 200380   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:21,456-Speed 2629.26 samples/sec   Loss 10.3223   LearningRate 0.0575   Epoch: 4   Global Step: 200390   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:25,346-Speed 2632.54 samples/sec   Loss 10.3495   LearningRate 0.0575   Epoch: 4   Global Step: 200400   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:29,242-Speed 2629.21 samples/sec   Loss 10.4922   LearningRate 0.0575   Epoch: 4   Global Step: 200410   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:33,146-Speed 2623.46 samples/sec   Loss 10.2768   LearningRate 0.0575   Epoch: 4   Global Step: 200420   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:37,203-Speed 2524.35 samples/sec   Loss 10.2515   LearningRate 0.0575   Epoch: 4   Global Step: 200430   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:41,095-Speed 2631.78 samples/sec   Loss 10.3976   LearningRate 0.0575   Epoch: 4   Global Step: 200440   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:44,997-Speed 2625.14 samples/sec   Loss 10.3555   LearningRate 0.0575   Epoch: 4   Global Step: 200450   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:15:48,890-Speed 2631.46 samples/sec   Loss 10.4310   LearningRate 0.0575   Epoch: 4   Global Step: 200460   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:15:52,785-Speed 2629.46 samples/sec   Loss 10.3209   LearningRate 0.0575   Epoch: 4   Global Step: 200470   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:15:56,683-Speed 2627.43 samples/sec   Loss 10.2903   LearningRate 0.0575   Epoch: 4   Global Step: 200480   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:16:00,569-Speed 2635.63 samples/sec   Loss 10.3903   LearningRate 0.0575   Epoch: 4   Global Step: 200490   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:04,464-Speed 2629.13 samples/sec   Loss 10.2712   LearningRate 0.0575   Epoch: 4   Global Step: 200500   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:08,365-Speed 2625.85 samples/sec   Loss 10.3528   LearningRate 0.0575   Epoch: 4   Global Step: 200510   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:12,250-Speed 2636.37 samples/sec   Loss 10.2308   LearningRate 0.0575   Epoch: 4   Global Step: 200520   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:16,140-Speed 2632.58 samples/sec   Loss 10.5774   LearningRate 0.0575   Epoch: 4   Global Step: 200530   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:20,146-Speed 2557.55 samples/sec   Loss 10.1639   LearningRate 0.0575   Epoch: 4   Global Step: 200540   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:24,066-Speed 2612.74 samples/sec   Loss 10.3310   LearningRate 0.0575   Epoch: 4   Global Step: 200550   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:27,955-Speed 2633.75 samples/sec   Loss 10.4402   LearningRate 0.0575   Epoch: 4   Global Step: 200560   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:31,850-Speed 2629.80 samples/sec   Loss 10.3769   LearningRate 0.0575   Epoch: 4   Global Step: 200570   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:35,761-Speed 2618.57 samples/sec   Loss 10.4585   LearningRate 0.0575   Epoch: 4   Global Step: 200580   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:39,640-Speed 2640.15 samples/sec   Loss 10.3648   LearningRate 0.0575   Epoch: 4   Global Step: 200590   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:43,535-Speed 2629.51 samples/sec   Loss 10.3323   LearningRate 0.0575   Epoch: 4   Global Step: 200600   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:47,438-Speed 2624.61 samples/sec   Loss 10.3710   LearningRate 0.0575   Epoch: 4   Global Step: 200610   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:51,342-Speed 2623.26 samples/sec   Loss 10.2220   LearningRate 0.0575   Epoch: 4   Global Step: 200620   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:55,233-Speed 2632.19 samples/sec   Loss 10.2811   LearningRate 0.0575   Epoch: 4   Global Step: 200630   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:16:59,126-Speed 2631.26 samples/sec   Loss 10.3466   LearningRate 0.0575   Epoch: 4   Global Step: 200640   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:17:03,023-Speed 2628.64 samples/sec   Loss 10.1749   LearningRate 0.0575   Epoch: 4   Global Step: 200650   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:17:06,918-Speed 2629.38 samples/sec   Loss 10.3942   LearningRate 0.0575   Epoch: 4   Global Step: 200660   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:17:10,796-Speed 2640.95 samples/sec   Loss 10.2925   LearningRate 0.0575   Epoch: 4   Global Step: 200670   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:14,691-Speed 2629.76 samples/sec   Loss 10.2613   LearningRate 0.0575   Epoch: 4   Global Step: 200680   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:18,586-Speed 2629.34 samples/sec   Loss 10.2377   LearningRate 0.0575   Epoch: 4   Global Step: 200690   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:22,481-Speed 2629.97 samples/sec   Loss 10.3873   LearningRate 0.0575   Epoch: 4   Global Step: 200700   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:26,379-Speed 2627.36 samples/sec   Loss 10.2964   LearningRate 0.0575   Epoch: 4   Global Step: 200710   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:30,288-Speed 2620.55 samples/sec   Loss 10.4622   LearningRate 0.0575   Epoch: 4   Global Step: 200720   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:34,185-Speed 2627.79 samples/sec   Loss 10.4409   LearningRate 0.0575   Epoch: 4   Global Step: 200730   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:38,083-Speed 2627.52 samples/sec   Loss 10.2889   LearningRate 0.0575   Epoch: 4   Global Step: 200740   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:41,977-Speed 2630.40 samples/sec   Loss 10.2983   LearningRate 0.0575   Epoch: 4   Global Step: 200750   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:45,880-Speed 2624.39 samples/sec   Loss 10.3795   LearningRate 0.0575   Epoch: 4   Global Step: 200760   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:17:49,798-Speed 2614.67 samples/sec   Loss 10.2564   LearningRate 0.0575   Epoch: 4   Global Step: 200770   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:17:53,734-Speed 2601.48 samples/sec   Loss 10.3226   LearningRate 0.0575   Epoch: 4   Global Step: 200780   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:17:57,652-Speed 2614.82 samples/sec   Loss 10.2650   LearningRate 0.0575   Epoch: 4   Global Step: 200790   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:01,556-Speed 2623.13 samples/sec   Loss 10.4238   LearningRate 0.0574   Epoch: 4   Global Step: 200800   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:05,477-Speed 2611.74 samples/sec   Loss 10.3200   LearningRate 0.0574   Epoch: 4   Global Step: 200810   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:09,381-Speed 2623.76 samples/sec   Loss 10.2479   LearningRate 0.0574   Epoch: 4   Global Step: 200820   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:13,287-Speed 2622.89 samples/sec   Loss 10.3806   LearningRate 0.0574   Epoch: 4   Global Step: 200830   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:17,194-Speed 2621.56 samples/sec   Loss 10.4470   LearningRate 0.0574   Epoch: 4   Global Step: 200840   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:21,098-Speed 2624.13 samples/sec   Loss 10.3558   LearningRate 0.0574   Epoch: 4   Global Step: 200850   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:25,001-Speed 2624.27 samples/sec   Loss 10.4616   LearningRate 0.0574   Epoch: 4   Global Step: 200860   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:28,904-Speed 2624.29 samples/sec   Loss 10.4647   LearningRate 0.0574   Epoch: 4   Global Step: 200870   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:32,815-Speed 2618.88 samples/sec   Loss 10.2822   LearningRate 0.0574   Epoch: 4   Global Step: 200880   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:36,721-Speed 2621.95 samples/sec   Loss 10.5319   LearningRate 0.0574   Epoch: 4   Global Step: 200890   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:40,678-Speed 2588.00 samples/sec   Loss 10.4486   LearningRate 0.0574   Epoch: 4   Global Step: 200900   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:44,578-Speed 2625.97 samples/sec   Loss 10.3828   LearningRate 0.0574   Epoch: 4   Global Step: 200910   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:48,480-Speed 2626.02 samples/sec   Loss 10.3694   LearningRate 0.0574   Epoch: 4   Global Step: 200920   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:52,382-Speed 2624.83 samples/sec   Loss 10.2716   LearningRate 0.0574   Epoch: 4   Global Step: 200930   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:18:56,283-Speed 2625.60 samples/sec   Loss 10.3191   LearningRate 0.0574   Epoch: 4   Global Step: 200940   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:19:00,190-Speed 2621.82 samples/sec   Loss 10.4931   LearningRate 0.0574   Epoch: 4   Global Step: 200950   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:19:04,096-Speed 2622.20 samples/sec   Loss 10.3818   LearningRate 0.0574   Epoch: 4   Global Step: 200960   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:19:07,987-Speed 2632.37 samples/sec   Loss 10.5187   LearningRate 0.0574   Epoch: 4   Global Step: 200970   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:11,892-Speed 2622.89 samples/sec   Loss 10.4167   LearningRate 0.0574   Epoch: 4   Global Step: 200980   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:15,795-Speed 2623.68 samples/sec   Loss 10.3395   LearningRate 0.0574   Epoch: 4   Global Step: 200990   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:19,707-Speed 2618.40 samples/sec   Loss 10.3017   LearningRate 0.0574   Epoch: 4   Global Step: 201000   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:23,608-Speed 2625.57 samples/sec   Loss 10.2783   LearningRate 0.0574   Epoch: 4   Global Step: 201010   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:27,510-Speed 2624.52 samples/sec   Loss 10.2057   LearningRate 0.0574   Epoch: 4   Global Step: 201020   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:31,413-Speed 2624.81 samples/sec   Loss 10.2202   LearningRate 0.0574   Epoch: 4   Global Step: 201030   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:35,314-Speed 2625.40 samples/sec   Loss 10.2703   LearningRate 0.0574   Epoch: 4   Global Step: 201040   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:39,219-Speed 2623.30 samples/sec   Loss 10.4041   LearningRate 0.0574   Epoch: 4   Global Step: 201050   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:43,151-Speed 2604.65 samples/sec   Loss 10.4855   LearningRate 0.0574   Epoch: 4   Global Step: 201060   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:19:47,067-Speed 2615.18 samples/sec   Loss 10.2893   LearningRate 0.0574   Epoch: 4   Global Step: 201070   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:19:50,978-Speed 2618.82 samples/sec   Loss 10.2592   LearningRate 0.0574   Epoch: 4   Global Step: 201080   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:19:54,890-Speed 2618.59 samples/sec   Loss 10.2758   LearningRate 0.0574   Epoch: 4   Global Step: 201090   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:19:58,798-Speed 2620.89 samples/sec   Loss 10.3186   LearningRate 0.0574   Epoch: 4   Global Step: 201100   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:02,711-Speed 2617.48 samples/sec   Loss 10.4393   LearningRate 0.0574   Epoch: 4   Global Step: 201110   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:06,620-Speed 2620.08 samples/sec   Loss 10.3071   LearningRate 0.0574   Epoch: 4   Global Step: 201120   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:10,524-Speed 2623.21 samples/sec   Loss 10.3154   LearningRate 0.0574   Epoch: 4   Global Step: 201130   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:14,425-Speed 2625.72 samples/sec   Loss 10.5162   LearningRate 0.0574   Epoch: 4   Global Step: 201140   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:18,331-Speed 2622.57 samples/sec   Loss 10.4712   LearningRate 0.0574   Epoch: 4   Global Step: 201150   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:22,243-Speed 2618.21 samples/sec   Loss 10.4910   LearningRate 0.0574   Epoch: 4   Global Step: 201160   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:26,131-Speed 2634.44 samples/sec   Loss 10.4604   LearningRate 0.0574   Epoch: 4   Global Step: 201170   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:30,050-Speed 2613.45 samples/sec   Loss 10.4167   LearningRate 0.0574   Epoch: 4   Global Step: 201180   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:33,958-Speed 2620.59 samples/sec   Loss 10.3073   LearningRate 0.0574   Epoch: 4   Global Step: 201190   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:37,880-Speed 2611.50 samples/sec   Loss 10.2696   LearningRate 0.0574   Epoch: 4   Global Step: 201200   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:41,793-Speed 2617.56 samples/sec   Loss 10.3561   LearningRate 0.0574   Epoch: 4   Global Step: 201210   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:45,702-Speed 2619.92 samples/sec   Loss 10.3029   LearningRate 0.0574   Epoch: 4   Global Step: 201220   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:49,622-Speed 2613.08 samples/sec   Loss 10.2778   LearningRate 0.0574   Epoch: 4   Global Step: 201230   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:53,544-Speed 2612.06 samples/sec   Loss 10.4619   LearningRate 0.0574   Epoch: 4   Global Step: 201240   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:20:57,459-Speed 2615.91 samples/sec   Loss 10.2682   LearningRate 0.0574   Epoch: 4   Global Step: 201250   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:21:01,364-Speed 2622.41 samples/sec   Loss 10.2486   LearningRate 0.0574   Epoch: 4   Global Step: 201260   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:21:05,255-Speed 2632.49 samples/sec   Loss 10.3128   LearningRate 0.0574   Epoch: 4   Global Step: 201270   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:21:09,160-Speed 2622.57 samples/sec   Loss 10.3626   LearningRate 0.0574   Epoch: 4   Global Step: 201280   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:21:13,067-Speed 2621.82 samples/sec   Loss 10.4133   LearningRate 0.0574   Epoch: 4   Global Step: 201290   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:21:16,954-Speed 2634.92 samples/sec   Loss 10.3380   LearningRate 0.0574   Epoch: 4   Global Step: 201300   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:20,885-Speed 2605.45 samples/sec   Loss 10.2953   LearningRate 0.0574   Epoch: 4   Global Step: 201310   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:24,790-Speed 2622.83 samples/sec   Loss 10.3719   LearningRate 0.0574   Epoch: 4   Global Step: 201320   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:28,696-Speed 2622.59 samples/sec   Loss 10.3280   LearningRate 0.0574   Epoch: 4   Global Step: 201330   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:32,600-Speed 2623.53 samples/sec   Loss 10.4203   LearningRate 0.0574   Epoch: 4   Global Step: 201340   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:36,507-Speed 2621.72 samples/sec   Loss 10.1271   LearningRate 0.0573   Epoch: 4   Global Step: 201350   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:40,410-Speed 2624.02 samples/sec   Loss 10.2698   LearningRate 0.0573   Epoch: 4   Global Step: 201360   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:44,316-Speed 2622.23 samples/sec   Loss 10.2664   LearningRate 0.0573   Epoch: 4   Global Step: 201370   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:48,217-Speed 2625.38 samples/sec   Loss 10.3412   LearningRate 0.0573   Epoch: 4   Global Step: 201380   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:52,135-Speed 2614.24 samples/sec   Loss 10.3065   LearningRate 0.0573   Epoch: 4   Global Step: 201390   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:21:56,044-Speed 2619.82 samples/sec   Loss 10.2351   LearningRate 0.0573   Epoch: 4   Global Step: 201400   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:21:59,951-Speed 2621.53 samples/sec   Loss 10.3788   LearningRate 0.0573   Epoch: 4   Global Step: 201410   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:03,858-Speed 2621.65 samples/sec   Loss 10.5261   LearningRate 0.0573   Epoch: 4   Global Step: 201420   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:07,764-Speed 2622.21 samples/sec   Loss 10.3217   LearningRate 0.0573   Epoch: 4   Global Step: 201430   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:11,668-Speed 2623.76 samples/sec   Loss 10.3905   LearningRate 0.0573   Epoch: 4   Global Step: 201440   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:15,574-Speed 2622.41 samples/sec   Loss 10.4972   LearningRate 0.0573   Epoch: 4   Global Step: 201450   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:19,488-Speed 2616.36 samples/sec   Loss 10.2720   LearningRate 0.0573   Epoch: 4   Global Step: 201460   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:23,421-Speed 2604.35 samples/sec   Loss 10.1958   LearningRate 0.0573   Epoch: 4   Global Step: 201470   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:27,476-Speed 2525.94 samples/sec   Loss 10.4877   LearningRate 0.0573   Epoch: 4   Global Step: 201480   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:31,523-Speed 2538.21 samples/sec   Loss 10.2434   LearningRate 0.0573   Epoch: 4   Global Step: 201490   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:22:35,417-Speed 2630.11 samples/sec   Loss 10.3084   LearningRate 0.0573   Epoch: 4   Global Step: 201500   Fp16 Grad Scale: 262144   Required: 71 hours
Training: 2022-04-13 18:22:39,279-Speed 2651.69 samples/sec   Loss 10.5117   LearningRate 0.0573   Epoch: 4   Global Step: 201510   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:22:43,184-Speed 2623.38 samples/sec   Loss 10.3410   LearningRate 0.0573   Epoch: 4   Global Step: 201520   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:22:47,088-Speed 2623.51 samples/sec   Loss 10.2819   LearningRate 0.0573   Epoch: 4   Global Step: 201530   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:22:50,991-Speed 2623.90 samples/sec   Loss 10.3414   LearningRate 0.0573   Epoch: 4   Global Step: 201540   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:22:54,897-Speed 2621.87 samples/sec   Loss 10.4695   LearningRate 0.0573   Epoch: 4   Global Step: 201550   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:22:58,802-Speed 2623.54 samples/sec   Loss 10.3834   LearningRate 0.0573   Epoch: 4   Global Step: 201560   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:23:02,717-Speed 2615.95 samples/sec   Loss 10.3872   LearningRate 0.0573   Epoch: 4   Global Step: 201570   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:23:06,635-Speed 2614.03 samples/sec   Loss 10.4523   LearningRate 0.0573   Epoch: 4   Global Step: 201580   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:23:10,540-Speed 2622.32 samples/sec   Loss 10.3353   LearningRate 0.0573   Epoch: 4   Global Step: 201590   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:23:14,451-Speed 2619.39 samples/sec   Loss 10.2683   LearningRate 0.0573   Epoch: 4   Global Step: 201600   Fp16 Grad Scale: 32768   Required: 71 hours
Training: 2022-04-13 18:23:18,363-Speed 2618.14 samples/sec   Loss 10.3702   LearningRate 0.0573   Epoch: 4   Global Step: 201610   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:22,294-Speed 2605.84 samples/sec   Loss 10.4157   LearningRate 0.0573   Epoch: 4   Global Step: 201620   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:26,198-Speed 2623.95 samples/sec   Loss 10.3672   LearningRate 0.0573   Epoch: 4   Global Step: 201630   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:30,118-Speed 2612.92 samples/sec   Loss 10.6150   LearningRate 0.0573   Epoch: 4   Global Step: 201640   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:34,017-Speed 2626.62 samples/sec   Loss 10.3129   LearningRate 0.0573   Epoch: 4   Global Step: 201650   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:37,923-Speed 2622.29 samples/sec   Loss 10.3904   LearningRate 0.0573   Epoch: 4   Global Step: 201660   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:41,843-Speed 2613.14 samples/sec   Loss 10.3802   LearningRate 0.0573   Epoch: 4   Global Step: 201670   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:45,749-Speed 2622.27 samples/sec   Loss 10.2976   LearningRate 0.0573   Epoch: 4   Global Step: 201680   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:49,651-Speed 2624.97 samples/sec   Loss 10.3534   LearningRate 0.0573   Epoch: 4   Global Step: 201690   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:53,561-Speed 2620.10 samples/sec   Loss 10.4000   LearningRate 0.0573   Epoch: 4   Global Step: 201700   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:23:57,494-Speed 2604.40 samples/sec   Loss 10.4569   LearningRate 0.0573   Epoch: 4   Global Step: 201710   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:24:01,462-Speed 2581.53 samples/sec   Loss 10.1986   LearningRate 0.0573   Epoch: 4   Global Step: 201720   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:24:05,374-Speed 2618.55 samples/sec   Loss 10.4465   LearningRate 0.0573   Epoch: 4   Global Step: 201730   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:24:09,284-Speed 2619.38 samples/sec   Loss 10.3938   LearningRate 0.0573   Epoch: 4   Global Step: 201740   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:24:13,209-Speed 2609.36 samples/sec   Loss 10.2406   LearningRate 0.0573   Epoch: 4   Global Step: 201750   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:24:17,101-Speed 2632.02 samples/sec   Loss 10.2313   LearningRate 0.0573   Epoch: 4   Global Step: 201760   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:21,023-Speed 2611.57 samples/sec   Loss 10.2049   LearningRate 0.0573   Epoch: 4   Global Step: 201770   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:24,930-Speed 2621.87 samples/sec   Loss 10.3739   LearningRate 0.0573   Epoch: 4   Global Step: 201780   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:28,835-Speed 2623.06 samples/sec   Loss 10.3621   LearningRate 0.0573   Epoch: 4   Global Step: 201790   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:32,806-Speed 2580.19 samples/sec   Loss 10.4749   LearningRate 0.0573   Epoch: 4   Global Step: 201800   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:36,715-Speed 2619.94 samples/sec   Loss 10.4446   LearningRate 0.0573   Epoch: 4   Global Step: 201810   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:40,657-Speed 2598.06 samples/sec   Loss 10.3831   LearningRate 0.0573   Epoch: 4   Global Step: 201820   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:44,572-Speed 2616.49 samples/sec   Loss 10.2731   LearningRate 0.0573   Epoch: 4   Global Step: 201830   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:48,477-Speed 2625.51 samples/sec   Loss 10.3336   LearningRate 0.0573   Epoch: 4   Global Step: 201840   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:52,384-Speed 2621.32 samples/sec   Loss 10.3421   LearningRate 0.0573   Epoch: 4   Global Step: 201850   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:24:56,297-Speed 2617.59 samples/sec   Loss 10.3256   LearningRate 0.0573   Epoch: 4   Global Step: 201860   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:25:00,209-Speed 2618.04 samples/sec   Loss 10.3059   LearningRate 0.0573   Epoch: 4   Global Step: 201870   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:04,119-Speed 2619.93 samples/sec   Loss 10.4974   LearningRate 0.0573   Epoch: 4   Global Step: 201880   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:08,022-Speed 2624.40 samples/sec   Loss 10.2912   LearningRate 0.0572   Epoch: 4   Global Step: 201890   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:11,927-Speed 2623.03 samples/sec   Loss 10.3058   LearningRate 0.0572   Epoch: 4   Global Step: 201900   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:15,841-Speed 2616.68 samples/sec   Loss 10.3577   LearningRate 0.0572   Epoch: 4   Global Step: 201910   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:19,750-Speed 2620.46 samples/sec   Loss 10.2604   LearningRate 0.0572   Epoch: 4   Global Step: 201920   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:23,667-Speed 2614.76 samples/sec   Loss 10.4215   LearningRate 0.0572   Epoch: 4   Global Step: 201930   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:27,574-Speed 2622.01 samples/sec   Loss 10.1342   LearningRate 0.0572   Epoch: 4   Global Step: 201940   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:31,477-Speed 2624.24 samples/sec   Loss 10.2372   LearningRate 0.0572   Epoch: 4   Global Step: 201950   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:35,380-Speed 2624.46 samples/sec   Loss 10.2974   LearningRate 0.0572   Epoch: 4   Global Step: 201960   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:39,289-Speed 2619.51 samples/sec   Loss 10.2658   LearningRate 0.0572   Epoch: 4   Global Step: 201970   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:25:43,177-Speed 2634.83 samples/sec   Loss 10.3843   LearningRate 0.0572   Epoch: 4   Global Step: 201980   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:47,076-Speed 2626.92 samples/sec   Loss 10.3624   LearningRate 0.0572   Epoch: 4   Global Step: 201990   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:50,985-Speed 2620.31 samples/sec   Loss 10.3574   LearningRate 0.0572   Epoch: 4   Global Step: 202000   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:54,890-Speed 2622.78 samples/sec   Loss 10.3056   LearningRate 0.0572   Epoch: 4   Global Step: 202010   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:25:58,805-Speed 2616.31 samples/sec   Loss 10.4280   LearningRate 0.0572   Epoch: 4   Global Step: 202020   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:26:02,711-Speed 2621.86 samples/sec   Loss 10.3060   LearningRate 0.0572   Epoch: 4   Global Step: 202030   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:26:06,615-Speed 2623.50 samples/sec   Loss 10.2442   LearningRate 0.0572   Epoch: 4   Global Step: 202040   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:26:10,527-Speed 2618.50 samples/sec   Loss 10.4084   LearningRate 0.0572   Epoch: 4   Global Step: 202050   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:26:14,430-Speed 2624.07 samples/sec   Loss 10.1889   LearningRate 0.0572   Epoch: 4   Global Step: 202060   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:26:18,332-Speed 2624.90 samples/sec   Loss 10.5120   LearningRate 0.0572   Epoch: 4   Global Step: 202070   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:26:22,234-Speed 2624.92 samples/sec   Loss 10.2824   LearningRate 0.0572   Epoch: 4   Global Step: 202080   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:26,300-Speed 2518.98 samples/sec   Loss 10.3773   LearningRate 0.0572   Epoch: 4   Global Step: 202090   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:30,201-Speed 2626.34 samples/sec   Loss 10.2204   LearningRate 0.0572   Epoch: 4   Global Step: 202100   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:34,103-Speed 2624.44 samples/sec   Loss 10.3483   LearningRate 0.0572   Epoch: 4   Global Step: 202110   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:38,004-Speed 2625.69 samples/sec   Loss 10.3124   LearningRate 0.0572   Epoch: 4   Global Step: 202120   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:41,984-Speed 2573.63 samples/sec   Loss 10.1134   LearningRate 0.0572   Epoch: 4   Global Step: 202130   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:45,897-Speed 2618.04 samples/sec   Loss 10.3395   LearningRate 0.0572   Epoch: 4   Global Step: 202140   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:49,847-Speed 2593.37 samples/sec   Loss 10.3456   LearningRate 0.0572   Epoch: 4   Global Step: 202150   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:53,747-Speed 2626.26 samples/sec   Loss 10.2930   LearningRate 0.0572   Epoch: 4   Global Step: 202160   Fp16 Grad Scale: 131072   Required: 71 hours
Training: 2022-04-13 18:26:57,640-Speed 2630.74 samples/sec   Loss 10.1797   LearningRate 0.0572   Epoch: 4   Global Step: 202170   Fp16 Grad Scale: 65536   Required: 71 hours
Training: 2022-04-13 18:27:01,535-Speed 2629.71 samples/sec   Loss 10.2714   LearningRate 0.0572   Epoch: 4   Global Step: 202180   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:05,441-Speed 2622.29 samples/sec   Loss 10.3675   LearningRate 0.0572   Epoch: 4   Global Step: 202190   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:09,347-Speed 2622.52 samples/sec   Loss 10.4090   LearningRate 0.0572   Epoch: 4   Global Step: 202200   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:13,249-Speed 2624.50 samples/sec   Loss 10.3138   LearningRate 0.0572   Epoch: 4   Global Step: 202210   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:17,161-Speed 2617.97 samples/sec   Loss 10.4232   LearningRate 0.0572   Epoch: 4   Global Step: 202220   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:21,068-Speed 2621.95 samples/sec   Loss 10.3558   LearningRate 0.0572   Epoch: 4   Global Step: 202230   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:24,994-Speed 2609.01 samples/sec   Loss 10.3599   LearningRate 0.0572   Epoch: 4   Global Step: 202240   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:28,918-Speed 2610.37 samples/sec   Loss 10.3696   LearningRate 0.0572   Epoch: 4   Global Step: 202250   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:32,837-Speed 2613.96 samples/sec   Loss 10.2741   LearningRate 0.0572   Epoch: 4   Global Step: 202260   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:27:36,737-Speed 2626.15 samples/sec   Loss 10.2507   LearningRate 0.0572   Epoch: 4   Global Step: 202270   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:27:40,636-Speed 2626.55 samples/sec   Loss 10.3358   LearningRate 0.0572   Epoch: 4   Global Step: 202280   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:27:44,567-Speed 2605.96 samples/sec   Loss 10.3579   LearningRate 0.0572   Epoch: 4   Global Step: 202290   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:27:48,466-Speed 2626.89 samples/sec   Loss 10.2818   LearningRate 0.0572   Epoch: 4   Global Step: 202300   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:27:53,196-Speed 2165.45 samples/sec   Loss 10.4024   LearningRate 0.0572   Epoch: 4   Global Step: 202310   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:27:57,090-Speed 2629.88 samples/sec   Loss 10.2841   LearningRate 0.0572   Epoch: 4   Global Step: 202320   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:00,988-Speed 2628.93 samples/sec   Loss 10.3818   LearningRate 0.0572   Epoch: 4   Global Step: 202330   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:04,892-Speed 2623.49 samples/sec   Loss 10.2622   LearningRate 0.0572   Epoch: 4   Global Step: 202340   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:08,796-Speed 2623.56 samples/sec   Loss 10.2386   LearningRate 0.0572   Epoch: 4   Global Step: 202350   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:12,712-Speed 2615.75 samples/sec   Loss 10.2011   LearningRate 0.0572   Epoch: 4   Global Step: 202360   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:16,611-Speed 2627.16 samples/sec   Loss 10.3406   LearningRate 0.0572   Epoch: 4   Global Step: 202370   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:28:20,501-Speed 2633.13 samples/sec   Loss 10.3185   LearningRate 0.0572   Epoch: 4   Global Step: 202380   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:24,403-Speed 2624.84 samples/sec   Loss 10.2250   LearningRate 0.0572   Epoch: 4   Global Step: 202390   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:28,307-Speed 2624.49 samples/sec   Loss 10.2576   LearningRate 0.0572   Epoch: 4   Global Step: 202400   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:32,203-Speed 2628.35 samples/sec   Loss 10.2137   LearningRate 0.0572   Epoch: 4   Global Step: 202410   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:36,130-Speed 2608.35 samples/sec   Loss 10.4293   LearningRate 0.0572   Epoch: 4   Global Step: 202420   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:40,043-Speed 2617.82 samples/sec   Loss 10.3706   LearningRate 0.0572   Epoch: 4   Global Step: 202430   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:43,942-Speed 2626.78 samples/sec   Loss 10.4389   LearningRate 0.0571   Epoch: 4   Global Step: 202440   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:47,840-Speed 2627.85 samples/sec   Loss 10.4578   LearningRate 0.0571   Epoch: 4   Global Step: 202450   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:51,756-Speed 2615.64 samples/sec   Loss 10.1901   LearningRate 0.0571   Epoch: 4   Global Step: 202460   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:55,656-Speed 2626.04 samples/sec   Loss 10.3187   LearningRate 0.0571   Epoch: 4   Global Step: 202470   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:28:59,554-Speed 2628.40 samples/sec   Loss 10.3157   LearningRate 0.0571   Epoch: 4   Global Step: 202480   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:29:03,455-Speed 2625.43 samples/sec   Loss 10.3455   LearningRate 0.0571   Epoch: 4   Global Step: 202490   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:29:07,341-Speed 2635.57 samples/sec   Loss 10.3501   LearningRate 0.0571   Epoch: 4   Global Step: 202500   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:29:11,256-Speed 2616.08 samples/sec   Loss 10.3920   LearningRate 0.0571   Epoch: 4   Global Step: 202510   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:29:15,150-Speed 2630.96 samples/sec   Loss 10.4077   LearningRate 0.0571   Epoch: 4   Global Step: 202520   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:29:19,074-Speed 2610.16 samples/sec   Loss 10.3641   LearningRate 0.0571   Epoch: 4   Global Step: 202530   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:29:22,976-Speed 2624.86 samples/sec   Loss 10.2482   LearningRate 0.0571   Epoch: 4   Global Step: 202540   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:29:26,876-Speed 2626.09 samples/sec   Loss 10.2359   LearningRate 0.0571   Epoch: 4   Global Step: 202550   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:29:30,829-Speed 2591.33 samples/sec   Loss 10.3318   LearningRate 0.0571   Epoch: 4   Global Step: 202560   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:29:34,729-Speed 2626.26 samples/sec   Loss 10.2995   LearningRate 0.0571   Epoch: 4   Global Step: 202570   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:29:38,631-Speed 2624.80 samples/sec   Loss 10.3629   LearningRate 0.0571   Epoch: 4   Global Step: 202580   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:29:42,531-Speed 2626.34 samples/sec   Loss 10.4088   LearningRate 0.0571   Epoch: 4   Global Step: 202590   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:29:46,455-Speed 2610.66 samples/sec   Loss 10.3405   LearningRate 0.0571   Epoch: 4   Global Step: 202600   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:29:50,359-Speed 2623.36 samples/sec   Loss 10.2797   LearningRate 0.0571   Epoch: 4   Global Step: 202610   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:29:54,258-Speed 2627.72 samples/sec   Loss 10.2088   LearningRate 0.0571   Epoch: 4   Global Step: 202620   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:29:58,153-Speed 2629.50 samples/sec   Loss 10.2691   LearningRate 0.0571   Epoch: 4   Global Step: 202630   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:30:02,066-Speed 2617.23 samples/sec   Loss 10.4444   LearningRate 0.0571   Epoch: 4   Global Step: 202640   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:30:06,067-Speed 2560.43 samples/sec   Loss 10.3843   LearningRate 0.0571   Epoch: 4   Global Step: 202650   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:30:09,964-Speed 2628.35 samples/sec   Loss 10.3414   LearningRate 0.0571   Epoch: 4   Global Step: 202660   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:13,865-Speed 2625.24 samples/sec   Loss 10.3180   LearningRate 0.0571   Epoch: 4   Global Step: 202670   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:17,778-Speed 2617.88 samples/sec   Loss 10.3536   LearningRate 0.0571   Epoch: 4   Global Step: 202680   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:21,851-Speed 2514.63 samples/sec   Loss 10.2562   LearningRate 0.0571   Epoch: 4   Global Step: 202690   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:25,785-Speed 2604.14 samples/sec   Loss 10.4002   LearningRate 0.0571   Epoch: 4   Global Step: 202700   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:30,190-Speed 2324.97 samples/sec   Loss 10.2708   LearningRate 0.0571   Epoch: 4   Global Step: 202710   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:34,093-Speed 2624.57 samples/sec   Loss 10.2935   LearningRate 0.0571   Epoch: 4   Global Step: 202720   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:37,985-Speed 2631.60 samples/sec   Loss 10.2613   LearningRate 0.0571   Epoch: 4   Global Step: 202730   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:30:41,862-Speed 2642.03 samples/sec   Loss 10.2380   LearningRate 0.0571   Epoch: 4   Global Step: 202740   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:30:45,758-Speed 2628.40 samples/sec   Loss 10.3516   LearningRate 0.0571   Epoch: 4   Global Step: 202750   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:30:49,663-Speed 2622.99 samples/sec   Loss 10.3343   LearningRate 0.0571   Epoch: 4   Global Step: 202760   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:30:53,580-Speed 2614.33 samples/sec   Loss 10.3271   LearningRate 0.0571   Epoch: 4   Global Step: 202770   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:30:57,474-Speed 2630.55 samples/sec   Loss 10.1916   LearningRate 0.0571   Epoch: 4   Global Step: 202780   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:31:01,412-Speed 2601.58 samples/sec   Loss 10.5179   LearningRate 0.0571   Epoch: 4   Global Step: 202790   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:31:05,304-Speed 2631.36 samples/sec   Loss 10.4698   LearningRate 0.0571   Epoch: 4   Global Step: 202800   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:31:09,200-Speed 2629.36 samples/sec   Loss 10.3037   LearningRate 0.0571   Epoch: 4   Global Step: 202810   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:31:13,093-Speed 2630.75 samples/sec   Loss 10.2740   LearningRate 0.0571   Epoch: 4   Global Step: 202820   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:31:16,986-Speed 2630.33 samples/sec   Loss 10.1639   LearningRate 0.0571   Epoch: 4   Global Step: 202830   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:31:20,870-Speed 2636.93 samples/sec   Loss 10.2326   LearningRate 0.0571   Epoch: 4   Global Step: 202840   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:31:24,736-Speed 2649.93 samples/sec   Loss 11.5342   LearningRate 0.0571   Epoch: 4   Global Step: 202850   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:28,637-Speed 2625.33 samples/sec   Loss 10.9231   LearningRate 0.0571   Epoch: 4   Global Step: 202860   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:32,660-Speed 2546.38 samples/sec   Loss 10.5368   LearningRate 0.0571   Epoch: 4   Global Step: 202870   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:36,684-Speed 2545.80 samples/sec   Loss 10.4977   LearningRate 0.0571   Epoch: 4   Global Step: 202880   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:40,576-Speed 2631.71 samples/sec   Loss 10.5378   LearningRate 0.0571   Epoch: 4   Global Step: 202890   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:44,484-Speed 2620.74 samples/sec   Loss 10.4855   LearningRate 0.0571   Epoch: 4   Global Step: 202900   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:48,393-Speed 2619.75 samples/sec   Loss 10.3557   LearningRate 0.0571   Epoch: 4   Global Step: 202910   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:52,301-Speed 2620.97 samples/sec   Loss 10.4171   LearningRate 0.0571   Epoch: 4   Global Step: 202920   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:31:56,194-Speed 2631.13 samples/sec   Loss 10.3597   LearningRate 0.0571   Epoch: 4   Global Step: 202930   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:32:00,089-Speed 2628.99 samples/sec   Loss 10.2792   LearningRate 0.0571   Epoch: 4   Global Step: 202940   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:32:03,983-Speed 2630.92 samples/sec   Loss 10.3669   LearningRate 0.0571   Epoch: 4   Global Step: 202950   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:07,880-Speed 2627.65 samples/sec   Loss 10.3227   LearningRate 0.0571   Epoch: 4   Global Step: 202960   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:11,782-Speed 2625.33 samples/sec   Loss 10.4495   LearningRate 0.0571   Epoch: 4   Global Step: 202970   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:15,675-Speed 2631.26 samples/sec   Loss 10.2559   LearningRate 0.0571   Epoch: 4   Global Step: 202980   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:19,568-Speed 2630.96 samples/sec   Loss 10.3955   LearningRate 0.0570   Epoch: 4   Global Step: 202990   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:23,461-Speed 2630.19 samples/sec   Loss 10.4653   LearningRate 0.0570   Epoch: 4   Global Step: 203000   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:27,399-Speed 2601.65 samples/sec   Loss 10.2367   LearningRate 0.0570   Epoch: 4   Global Step: 203010   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:31,292-Speed 2630.97 samples/sec   Loss 10.3600   LearningRate 0.0570   Epoch: 4   Global Step: 203020   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:35,183-Speed 2632.17 samples/sec   Loss 10.2954   LearningRate 0.0570   Epoch: 4   Global Step: 203030   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:39,076-Speed 2630.45 samples/sec   Loss 10.2831   LearningRate 0.0570   Epoch: 4   Global Step: 203040   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:32:42,972-Speed 2628.88 samples/sec   Loss 10.3914   LearningRate 0.0570   Epoch: 4   Global Step: 203050   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:32:46,886-Speed 2617.32 samples/sec   Loss 10.2662   LearningRate 0.0570   Epoch: 4   Global Step: 203060   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:32:50,805-Speed 2613.43 samples/sec   Loss 10.3283   LearningRate 0.0570   Epoch: 4   Global Step: 203070   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:32:54,701-Speed 2629.32 samples/sec   Loss 10.3708   LearningRate 0.0570   Epoch: 4   Global Step: 203080   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:32:58,733-Speed 2539.90 samples/sec   Loss 10.3791   LearningRate 0.0570   Epoch: 4   Global Step: 203090   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:33:02,805-Speed 2515.40 samples/sec   Loss 10.3486   LearningRate 0.0570   Epoch: 4   Global Step: 203100   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:33:06,727-Speed 2611.83 samples/sec   Loss 10.5007   LearningRate 0.0570   Epoch: 4   Global Step: 203110   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:33:10,622-Speed 2629.47 samples/sec   Loss 10.3721   LearningRate 0.0570   Epoch: 4   Global Step: 203120   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:33:14,515-Speed 2630.74 samples/sec   Loss 10.3153   LearningRate 0.0570   Epoch: 4   Global Step: 203130   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:33:18,445-Speed 2605.74 samples/sec   Loss 10.5571   LearningRate 0.0570   Epoch: 4   Global Step: 203140   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:22,337-Speed 2631.90 samples/sec   Loss 10.3252   LearningRate 0.0570   Epoch: 4   Global Step: 203150   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:26,230-Speed 2631.37 samples/sec   Loss 10.3450   LearningRate 0.0570   Epoch: 4   Global Step: 203160   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:30,122-Speed 2632.00 samples/sec   Loss 10.3465   LearningRate 0.0570   Epoch: 4   Global Step: 203170   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:34,014-Speed 2631.15 samples/sec   Loss 10.3506   LearningRate 0.0570   Epoch: 4   Global Step: 203180   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:37,905-Speed 2631.96 samples/sec   Loss 10.3483   LearningRate 0.0570   Epoch: 4   Global Step: 203190   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:41,799-Speed 2630.43 samples/sec   Loss 10.3289   LearningRate 0.0570   Epoch: 4   Global Step: 203200   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:45,692-Speed 2631.12 samples/sec   Loss 10.2776   LearningRate 0.0570   Epoch: 4   Global Step: 203210   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:49,595-Speed 2624.25 samples/sec   Loss 10.3095   LearningRate 0.0570   Epoch: 4   Global Step: 203220   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:53,479-Speed 2637.05 samples/sec   Loss 10.4911   LearningRate 0.0570   Epoch: 4   Global Step: 203230   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:33:57,379-Speed 2626.48 samples/sec   Loss 10.4234   LearningRate 0.0570   Epoch: 4   Global Step: 203240   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:01,276-Speed 2628.34 samples/sec   Loss 10.3254   LearningRate 0.0570   Epoch: 4   Global Step: 203250   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:05,176-Speed 2626.24 samples/sec   Loss 10.2598   LearningRate 0.0570   Epoch: 4   Global Step: 203260   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:09,074-Speed 2627.45 samples/sec   Loss 10.3372   LearningRate 0.0570   Epoch: 4   Global Step: 203270   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:12,968-Speed 2630.52 samples/sec   Loss 10.2527   LearningRate 0.0570   Epoch: 4   Global Step: 203280   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:16,899-Speed 2606.45 samples/sec   Loss 10.2866   LearningRate 0.0570   Epoch: 4   Global Step: 203290   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:20,816-Speed 2614.97 samples/sec   Loss 10.2451   LearningRate 0.0570   Epoch: 4   Global Step: 203300   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:24,707-Speed 2632.34 samples/sec   Loss 10.3618   LearningRate 0.0570   Epoch: 4   Global Step: 203310   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:28,606-Speed 2627.54 samples/sec   Loss 10.3280   LearningRate 0.0570   Epoch: 4   Global Step: 203320   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:32,539-Speed 2603.87 samples/sec   Loss 10.4834   LearningRate 0.0570   Epoch: 4   Global Step: 203330   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:34:36,447-Speed 2621.43 samples/sec   Loss 10.3859   LearningRate 0.0570   Epoch: 4   Global Step: 203340   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:34:40,344-Speed 2627.77 samples/sec   Loss 10.3711   LearningRate 0.0570   Epoch: 4   Global Step: 203350   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:34:44,361-Speed 2550.17 samples/sec   Loss 10.4033   LearningRate 0.0570   Epoch: 4   Global Step: 203360   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:34:48,256-Speed 2629.71 samples/sec   Loss 10.2251   LearningRate 0.0570   Epoch: 4   Global Step: 203370   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:34:52,153-Speed 2628.49 samples/sec   Loss 10.2522   LearningRate 0.0570   Epoch: 4   Global Step: 203380   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:34:56,047-Speed 2629.97 samples/sec   Loss 10.8794   LearningRate 0.0570   Epoch: 4   Global Step: 203390   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:34:59,940-Speed 2631.23 samples/sec   Loss 10.4137   LearningRate 0.0570   Epoch: 4   Global Step: 203400   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:03,833-Speed 2631.37 samples/sec   Loss 10.3778   LearningRate 0.0570   Epoch: 4   Global Step: 203410   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:07,726-Speed 2630.61 samples/sec   Loss 10.2914   LearningRate 0.0570   Epoch: 4   Global Step: 203420   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:11,631-Speed 2622.95 samples/sec   Loss 10.4338   LearningRate 0.0570   Epoch: 4   Global Step: 203430   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:15,525-Speed 2630.46 samples/sec   Loss 10.4656   LearningRate 0.0570   Epoch: 4   Global Step: 203440   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:35:19,425-Speed 2625.57 samples/sec   Loss 10.4303   LearningRate 0.0570   Epoch: 4   Global Step: 203450   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:35:23,330-Speed 2623.55 samples/sec   Loss 10.1896   LearningRate 0.0570   Epoch: 4   Global Step: 203460   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:35:27,223-Speed 2630.63 samples/sec   Loss 10.3304   LearningRate 0.0570   Epoch: 4   Global Step: 203470   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:35:31,119-Speed 2629.70 samples/sec   Loss 10.2772   LearningRate 0.0570   Epoch: 4   Global Step: 203480   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:35:34,997-Speed 2640.78 samples/sec   Loss 10.1107   LearningRate 0.0570   Epoch: 4   Global Step: 203490   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:38,894-Speed 2628.06 samples/sec   Loss 10.3120   LearningRate 0.0570   Epoch: 4   Global Step: 203500   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:42,787-Speed 2630.96 samples/sec   Loss 10.1539   LearningRate 0.0570   Epoch: 4   Global Step: 203510   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:46,685-Speed 2627.75 samples/sec   Loss 10.1909   LearningRate 0.0570   Epoch: 4   Global Step: 203520   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:50,583-Speed 2628.14 samples/sec   Loss 10.2619   LearningRate 0.0570   Epoch: 4   Global Step: 203530   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:35:54,442-Speed 2654.09 samples/sec   Loss 10.2899   LearningRate 0.0569   Epoch: 4   Global Step: 203540   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:35:58,367-Speed 2609.33 samples/sec   Loss 10.4997   LearningRate 0.0569   Epoch: 4   Global Step: 203550   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:02,259-Speed 2632.06 samples/sec   Loss 10.4775   LearningRate 0.0569   Epoch: 4   Global Step: 203560   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:06,150-Speed 2632.59 samples/sec   Loss 10.2056   LearningRate 0.0569   Epoch: 4   Global Step: 203570   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:10,053-Speed 2623.91 samples/sec   Loss 10.3093   LearningRate 0.0569   Epoch: 4   Global Step: 203580   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:13,944-Speed 2632.20 samples/sec   Loss 10.4330   LearningRate 0.0569   Epoch: 4   Global Step: 203590   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:17,845-Speed 2626.28 samples/sec   Loss 10.3353   LearningRate 0.0569   Epoch: 4   Global Step: 203600   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:21,739-Speed 2630.70 samples/sec   Loss 10.2117   LearningRate 0.0569   Epoch: 4   Global Step: 203610   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:25,636-Speed 2627.53 samples/sec   Loss 10.2872   LearningRate 0.0569   Epoch: 4   Global Step: 203620   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:29,528-Speed 2632.19 samples/sec   Loss 10.5829   LearningRate 0.0569   Epoch: 4   Global Step: 203630   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:36:33,420-Speed 2631.82 samples/sec   Loss 10.3993   LearningRate 0.0569   Epoch: 4   Global Step: 203640   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:36:37,312-Speed 2631.58 samples/sec   Loss 10.2476   LearningRate 0.0569   Epoch: 4   Global Step: 203650   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:36:41,221-Speed 2620.19 samples/sec   Loss 10.3281   LearningRate 0.0569   Epoch: 4   Global Step: 203660   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:36:45,120-Speed 2626.52 samples/sec   Loss 10.4878   LearningRate 0.0569   Epoch: 4   Global Step: 203670   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:36:49,016-Speed 2629.66 samples/sec   Loss 10.2664   LearningRate 0.0569   Epoch: 4   Global Step: 203680   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:36:52,915-Speed 2626.77 samples/sec   Loss 10.3128   LearningRate 0.0569   Epoch: 4   Global Step: 203690   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:36:56,814-Speed 2627.17 samples/sec   Loss 10.2862   LearningRate 0.0569   Epoch: 4   Global Step: 203700   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:37:00,709-Speed 2629.43 samples/sec   Loss 10.2066   LearningRate 0.0569   Epoch: 4   Global Step: 203710   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:37:04,603-Speed 2630.18 samples/sec   Loss 10.1825   LearningRate 0.0569   Epoch: 4   Global Step: 203720   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:37:08,596-Speed 2564.86 samples/sec   Loss 10.2483   LearningRate 0.0569   Epoch: 4   Global Step: 203730   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:37:12,496-Speed 2626.53 samples/sec   Loss 10.2025   LearningRate 0.0569   Epoch: 4   Global Step: 203740   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:16,388-Speed 2631.78 samples/sec   Loss 10.2777   LearningRate 0.0569   Epoch: 4   Global Step: 203750   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:20,291-Speed 2624.11 samples/sec   Loss 10.2301   LearningRate 0.0569   Epoch: 4   Global Step: 203760   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:24,222-Speed 2605.66 samples/sec   Loss 10.1485   LearningRate 0.0569   Epoch: 4   Global Step: 203770   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:28,112-Speed 2633.34 samples/sec   Loss 10.2616   LearningRate 0.0569   Epoch: 4   Global Step: 203780   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:32,004-Speed 2631.42 samples/sec   Loss 10.2097   LearningRate 0.0569   Epoch: 4   Global Step: 203790   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:35,903-Speed 2626.90 samples/sec   Loss 10.3104   LearningRate 0.0569   Epoch: 4   Global Step: 203800   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:39,801-Speed 2627.95 samples/sec   Loss 10.3226   LearningRate 0.0569   Epoch: 4   Global Step: 203810   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:43,705-Speed 2623.33 samples/sec   Loss 10.4569   LearningRate 0.0569   Epoch: 4   Global Step: 203820   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:47,635-Speed 2606.37 samples/sec   Loss 10.3469   LearningRate 0.0569   Epoch: 4   Global Step: 203830   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:37:51,533-Speed 2627.72 samples/sec   Loss 10.4684   LearningRate 0.0569   Epoch: 4   Global Step: 203840   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:37:55,433-Speed 2625.94 samples/sec   Loss 10.2775   LearningRate 0.0569   Epoch: 4   Global Step: 203850   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:37:59,329-Speed 2629.36 samples/sec   Loss 10.2928   LearningRate 0.0569   Epoch: 4   Global Step: 203860   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:38:03,210-Speed 2639.15 samples/sec   Loss 10.1512   LearningRate 0.0569   Epoch: 4   Global Step: 203870   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:07,124-Speed 2616.84 samples/sec   Loss 10.3809   LearningRate 0.0569   Epoch: 4   Global Step: 203880   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:11,017-Speed 2630.86 samples/sec   Loss 10.2318   LearningRate 0.0569   Epoch: 4   Global Step: 203890   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:14,915-Speed 2627.29 samples/sec   Loss 10.4688   LearningRate 0.0569   Epoch: 4   Global Step: 203900   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:18,809-Speed 2630.12 samples/sec   Loss 10.2795   LearningRate 0.0569   Epoch: 4   Global Step: 203910   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:22,704-Speed 2629.69 samples/sec   Loss 10.2807   LearningRate 0.0569   Epoch: 4   Global Step: 203920   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:26,599-Speed 2629.94 samples/sec   Loss 10.2797   LearningRate 0.0569   Epoch: 4   Global Step: 203930   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:30,494-Speed 2630.10 samples/sec   Loss 10.2660   LearningRate 0.0569   Epoch: 4   Global Step: 203940   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:34,387-Speed 2630.62 samples/sec   Loss 10.4464   LearningRate 0.0569   Epoch: 4   Global Step: 203950   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:38,280-Speed 2631.01 samples/sec   Loss 10.1889   LearningRate 0.0569   Epoch: 4   Global Step: 203960   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:42,180-Speed 2626.21 samples/sec   Loss 10.1856   LearningRate 0.0569   Epoch: 4   Global Step: 203970   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:38:46,059-Speed 2640.31 samples/sec   Loss 10.3671   LearningRate 0.0569   Epoch: 4   Global Step: 203980   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:49,951-Speed 2632.04 samples/sec   Loss 10.2983   LearningRate 0.0569   Epoch: 4   Global Step: 203990   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:53,843-Speed 2631.55 samples/sec   Loss 10.3187   LearningRate 0.0569   Epoch: 4   Global Step: 204000   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:38:57,742-Speed 2627.26 samples/sec   Loss 10.3440   LearningRate 0.0569   Epoch: 4   Global Step: 204010   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:01,635-Speed 2631.15 samples/sec   Loss 10.2745   LearningRate 0.0569   Epoch: 4   Global Step: 204020   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:05,535-Speed 2626.42 samples/sec   Loss 10.1974   LearningRate 0.0569   Epoch: 4   Global Step: 204030   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:09,429-Speed 2629.99 samples/sec   Loss 10.2210   LearningRate 0.0569   Epoch: 4   Global Step: 204040   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:13,320-Speed 2632.17 samples/sec   Loss 10.3358   LearningRate 0.0569   Epoch: 4   Global Step: 204050   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:17,215-Speed 2630.22 samples/sec   Loss 10.2898   LearningRate 0.0569   Epoch: 4   Global Step: 204060   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:21,111-Speed 2629.01 samples/sec   Loss 10.1560   LearningRate 0.0569   Epoch: 4   Global Step: 204070   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:25,036-Speed 2609.88 samples/sec   Loss 10.2297   LearningRate 0.0569   Epoch: 4   Global Step: 204080   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:39:28,918-Speed 2638.87 samples/sec   Loss 10.3151   LearningRate 0.0568   Epoch: 4   Global Step: 204090   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:32,811-Speed 2631.31 samples/sec   Loss 10.1852   LearningRate 0.0568   Epoch: 4   Global Step: 204100   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:36,708-Speed 2627.81 samples/sec   Loss 10.3536   LearningRate 0.0568   Epoch: 4   Global Step: 204110   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:40,606-Speed 2628.06 samples/sec   Loss 10.3520   LearningRate 0.0568   Epoch: 4   Global Step: 204120   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:44,501-Speed 2629.55 samples/sec   Loss 10.3973   LearningRate 0.0568   Epoch: 4   Global Step: 204130   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:48,407-Speed 2622.16 samples/sec   Loss 10.2924   LearningRate 0.0568   Epoch: 4   Global Step: 204140   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:52,331-Speed 2610.67 samples/sec   Loss 10.2506   LearningRate 0.0568   Epoch: 4   Global Step: 204150   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:39:56,239-Speed 2621.02 samples/sec   Loss 10.2505   LearningRate 0.0568   Epoch: 4   Global Step: 204160   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:00,163-Speed 2610.49 samples/sec   Loss 10.4084   LearningRate 0.0568   Epoch: 4   Global Step: 204170   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:04,057-Speed 2630.20 samples/sec   Loss 10.4093   LearningRate 0.0568   Epoch: 4   Global Step: 204180   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:07,950-Speed 2631.14 samples/sec   Loss 10.2733   LearningRate 0.0568   Epoch: 4   Global Step: 204190   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:40:11,846-Speed 2629.14 samples/sec   Loss 10.3285   LearningRate 0.0568   Epoch: 4   Global Step: 204200   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:40:15,774-Speed 2607.72 samples/sec   Loss 10.1302   LearningRate 0.0568   Epoch: 4   Global Step: 204210   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:19,660-Speed 2635.91 samples/sec   Loss 10.4072   LearningRate 0.0568   Epoch: 4   Global Step: 204220   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:23,553-Speed 2631.14 samples/sec   Loss 10.3276   LearningRate 0.0568   Epoch: 4   Global Step: 204230   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:27,462-Speed 2619.71 samples/sec   Loss 10.4047   LearningRate 0.0568   Epoch: 4   Global Step: 204240   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:31,362-Speed 2626.88 samples/sec   Loss 10.3891   LearningRate 0.0568   Epoch: 4   Global Step: 204250   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:35,257-Speed 2629.79 samples/sec   Loss 10.2768   LearningRate 0.0568   Epoch: 4   Global Step: 204260   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:39,149-Speed 2631.58 samples/sec   Loss 10.2003   LearningRate 0.0568   Epoch: 4   Global Step: 204270   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:43,049-Speed 2625.77 samples/sec   Loss 10.2865   LearningRate 0.0568   Epoch: 4   Global Step: 204280   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:46,955-Speed 2623.03 samples/sec   Loss 10.2141   LearningRate 0.0568   Epoch: 4   Global Step: 204290   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:40:50,836-Speed 2639.01 samples/sec   Loss 10.2661   LearningRate 0.0568   Epoch: 4   Global Step: 204300   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:40:54,770-Speed 2603.86 samples/sec   Loss 10.2149   LearningRate 0.0568   Epoch: 4   Global Step: 204310   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:40:58,665-Speed 2629.23 samples/sec   Loss 10.1552   LearningRate 0.0568   Epoch: 4   Global Step: 204320   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:02,566-Speed 2626.60 samples/sec   Loss 10.3635   LearningRate 0.0568   Epoch: 4   Global Step: 204330   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:06,457-Speed 2631.75 samples/sec   Loss 10.1789   LearningRate 0.0568   Epoch: 4   Global Step: 204340   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:10,355-Speed 2627.46 samples/sec   Loss 10.4146   LearningRate 0.0568   Epoch: 4   Global Step: 204350   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:14,249-Speed 2630.27 samples/sec   Loss 10.2491   LearningRate 0.0568   Epoch: 4   Global Step: 204360   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:18,255-Speed 2557.02 samples/sec   Loss 10.1658   LearningRate 0.0568   Epoch: 4   Global Step: 204370   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:22,152-Speed 2628.96 samples/sec   Loss 10.2969   LearningRate 0.0568   Epoch: 4   Global Step: 204380   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:26,052-Speed 2626.52 samples/sec   Loss 10.3156   LearningRate 0.0568   Epoch: 4   Global Step: 204390   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:29,932-Speed 2640.12 samples/sec   Loss 10.3187   LearningRate 0.0568   Epoch: 4   Global Step: 204400   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:33,848-Speed 2615.38 samples/sec   Loss 10.4201   LearningRate 0.0568   Epoch: 4   Global Step: 204410   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:37,754-Speed 2621.87 samples/sec   Loss 10.3263   LearningRate 0.0568   Epoch: 4   Global Step: 204420   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:41,648-Speed 2630.58 samples/sec   Loss 10.3449   LearningRate 0.0568   Epoch: 4   Global Step: 204430   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:45,544-Speed 2634.60 samples/sec   Loss 10.3183   LearningRate 0.0568   Epoch: 4   Global Step: 204440   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:49,441-Speed 2628.70 samples/sec   Loss 10.3286   LearningRate 0.0568   Epoch: 4   Global Step: 204450   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:53,340-Speed 2626.63 samples/sec   Loss 10.2944   LearningRate 0.0568   Epoch: 4   Global Step: 204460   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:41:57,232-Speed 2632.04 samples/sec   Loss 10.3630   LearningRate 0.0568   Epoch: 4   Global Step: 204470   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:01,124-Speed 2631.89 samples/sec   Loss 10.3990   LearningRate 0.0568   Epoch: 4   Global Step: 204480   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:05,019-Speed 2629.80 samples/sec   Loss 10.2107   LearningRate 0.0568   Epoch: 4   Global Step: 204490   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:08,917-Speed 2627.34 samples/sec   Loss 10.3875   LearningRate 0.0568   Epoch: 4   Global Step: 204500   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:42:12,799-Speed 2638.29 samples/sec   Loss 10.2618   LearningRate 0.0568   Epoch: 4   Global Step: 204510   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:16,685-Speed 2635.86 samples/sec   Loss 10.2901   LearningRate 0.0568   Epoch: 4   Global Step: 204520   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:20,591-Speed 2622.05 samples/sec   Loss 10.3299   LearningRate 0.0568   Epoch: 4   Global Step: 204530   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:24,533-Speed 2598.27 samples/sec   Loss 10.4106   LearningRate 0.0568   Epoch: 4   Global Step: 204540   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:28,425-Speed 2631.74 samples/sec   Loss 10.2430   LearningRate 0.0568   Epoch: 4   Global Step: 204550   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:32,325-Speed 2627.17 samples/sec   Loss 10.2451   LearningRate 0.0568   Epoch: 4   Global Step: 204560   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:36,222-Speed 2628.18 samples/sec   Loss 10.1198   LearningRate 0.0568   Epoch: 4   Global Step: 204570   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:40,121-Speed 2627.27 samples/sec   Loss 10.3569   LearningRate 0.0568   Epoch: 4   Global Step: 204580   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:42:44,016-Speed 2628.94 samples/sec   Loss 10.5012   LearningRate 0.0568   Epoch: 4   Global Step: 204590   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:42:47,918-Speed 2625.18 samples/sec   Loss 11.1485   LearningRate 0.0568   Epoch: 4   Global Step: 204600   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:42:51,944-Speed 2543.66 samples/sec   Loss 10.7269   LearningRate 0.0568   Epoch: 4   Global Step: 204610   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:42:55,821-Speed 2642.14 samples/sec   Loss 10.7903   LearningRate 0.0568   Epoch: 4   Global Step: 204620   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:42:59,719-Speed 2627.50 samples/sec   Loss 10.5225   LearningRate 0.0568   Epoch: 4   Global Step: 204630   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:03,614-Speed 2629.65 samples/sec   Loss 10.4841   LearningRate 0.0567   Epoch: 4   Global Step: 204640   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:07,513-Speed 2627.15 samples/sec   Loss 10.3535   LearningRate 0.0567   Epoch: 4   Global Step: 204650   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:11,409-Speed 2629.16 samples/sec   Loss 10.3338   LearningRate 0.0567   Epoch: 4   Global Step: 204660   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:15,304-Speed 2629.43 samples/sec   Loss 10.2820   LearningRate 0.0567   Epoch: 4   Global Step: 204670   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:19,198-Speed 2629.93 samples/sec   Loss 10.2677   LearningRate 0.0567   Epoch: 4   Global Step: 204680   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:23,091-Speed 2631.30 samples/sec   Loss 10.3610   LearningRate 0.0567   Epoch: 4   Global Step: 204690   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:26,983-Speed 2631.35 samples/sec   Loss 10.3176   LearningRate 0.0567   Epoch: 4   Global Step: 204700   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:30,874-Speed 2632.63 samples/sec   Loss 10.2077   LearningRate 0.0567   Epoch: 4   Global Step: 204710   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:43:34,765-Speed 2632.53 samples/sec   Loss 10.2684   LearningRate 0.0567   Epoch: 4   Global Step: 204720   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:43:38,657-Speed 2631.06 samples/sec   Loss 10.2316   LearningRate 0.0567   Epoch: 4   Global Step: 204730   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:43:42,551-Speed 2635.46 samples/sec   Loss 10.3056   LearningRate 0.0567   Epoch: 4   Global Step: 204740   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:43:46,443-Speed 2631.54 samples/sec   Loss 10.2954   LearningRate 0.0567   Epoch: 4   Global Step: 204750   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:43:50,337-Speed 2630.56 samples/sec   Loss 10.2069   LearningRate 0.0567   Epoch: 4   Global Step: 204760   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:43:54,229-Speed 2631.29 samples/sec   Loss 10.3918   LearningRate 0.0567   Epoch: 4   Global Step: 204770   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:43:58,126-Speed 2628.90 samples/sec   Loss 10.3407   LearningRate 0.0567   Epoch: 4   Global Step: 204780   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:44:02,017-Speed 2632.26 samples/sec   Loss 10.1028   LearningRate 0.0567   Epoch: 4   Global Step: 204790   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:44:05,909-Speed 2631.70 samples/sec   Loss 10.2874   LearningRate 0.0567   Epoch: 4   Global Step: 204800   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:44:09,802-Speed 2631.25 samples/sec   Loss 10.2549   LearningRate 0.0567   Epoch: 4   Global Step: 204810   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:44:13,695-Speed 2630.26 samples/sec   Loss 10.2350   LearningRate 0.0567   Epoch: 4   Global Step: 204820   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:17,600-Speed 2623.55 samples/sec   Loss 10.3439   LearningRate 0.0567   Epoch: 4   Global Step: 204830   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:21,494-Speed 2630.34 samples/sec   Loss 10.3742   LearningRate 0.0567   Epoch: 4   Global Step: 204840   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:25,383-Speed 2633.63 samples/sec   Loss 10.2739   LearningRate 0.0567   Epoch: 4   Global Step: 204850   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:29,283-Speed 2626.43 samples/sec   Loss 10.2532   LearningRate 0.0567   Epoch: 4   Global Step: 204860   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:33,180-Speed 2628.19 samples/sec   Loss 10.3628   LearningRate 0.0567   Epoch: 4   Global Step: 204870   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:37,075-Speed 2629.67 samples/sec   Loss 10.3176   LearningRate 0.0567   Epoch: 4   Global Step: 204880   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:40,968-Speed 2631.08 samples/sec   Loss 10.3018   LearningRate 0.0567   Epoch: 4   Global Step: 204890   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:44,862-Speed 2629.77 samples/sec   Loss 10.3376   LearningRate 0.0567   Epoch: 4   Global Step: 204900   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:48,764-Speed 2625.22 samples/sec   Loss 10.3023   LearningRate 0.0567   Epoch: 4   Global Step: 204910   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:44:52,657-Speed 2630.78 samples/sec   Loss 10.2403   LearningRate 0.0567   Epoch: 4   Global Step: 204920   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:44:56,553-Speed 2629.66 samples/sec   Loss 10.2757   LearningRate 0.0567   Epoch: 4   Global Step: 204930   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:00,445-Speed 2631.88 samples/sec   Loss 10.1436   LearningRate 0.0567   Epoch: 4   Global Step: 204940   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:04,342-Speed 2627.91 samples/sec   Loss 10.2279   LearningRate 0.0567   Epoch: 4   Global Step: 204950   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:08,235-Speed 2630.83 samples/sec   Loss 10.3373   LearningRate 0.0567   Epoch: 4   Global Step: 204960   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:12,126-Speed 2632.96 samples/sec   Loss 10.1707   LearningRate 0.0567   Epoch: 4   Global Step: 204970   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:16,016-Speed 2633.13 samples/sec   Loss 10.3810   LearningRate 0.0567   Epoch: 4   Global Step: 204980   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:19,905-Speed 2633.41 samples/sec   Loss 10.2784   LearningRate 0.0567   Epoch: 4   Global Step: 204990   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:23,803-Speed 2627.45 samples/sec   Loss 10.3048   LearningRate 0.0567   Epoch: 4   Global Step: 205000   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:27,705-Speed 2625.87 samples/sec   Loss 10.3435   LearningRate 0.0567   Epoch: 4   Global Step: 205010   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:31,600-Speed 2629.41 samples/sec   Loss 10.3481   LearningRate 0.0567   Epoch: 4   Global Step: 205020   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:45:35,496-Speed 2628.81 samples/sec   Loss 10.2181   LearningRate 0.0567   Epoch: 4   Global Step: 205030   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:45:39,400-Speed 2623.31 samples/sec   Loss 10.3828   LearningRate 0.0567   Epoch: 4   Global Step: 205040   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:45:43,295-Speed 2630.00 samples/sec   Loss 10.3935   LearningRate 0.0567   Epoch: 4   Global Step: 205050   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:45:47,193-Speed 2627.92 samples/sec   Loss 10.3161   LearningRate 0.0567   Epoch: 4   Global Step: 205060   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:45:51,095-Speed 2624.43 samples/sec   Loss 10.1193   LearningRate 0.0567   Epoch: 4   Global Step: 205070   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:54,990-Speed 2630.00 samples/sec   Loss 10.3624   LearningRate 0.0567   Epoch: 4   Global Step: 205080   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:45:58,871-Speed 2639.35 samples/sec   Loss 10.3411   LearningRate 0.0567   Epoch: 4   Global Step: 205090   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:46:02,785-Speed 2616.54 samples/sec   Loss 10.3078   LearningRate 0.0567   Epoch: 4   Global Step: 205100   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:46:06,684-Speed 2626.42 samples/sec   Loss 10.2477   LearningRate 0.0567   Epoch: 4   Global Step: 205110   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:46:10,589-Speed 2623.86 samples/sec   Loss 10.1790   LearningRate 0.0567   Epoch: 4   Global Step: 205120   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:46:14,485-Speed 2628.78 samples/sec   Loss 10.3540   LearningRate 0.0567   Epoch: 4   Global Step: 205130   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:18,381-Speed 2629.21 samples/sec   Loss 10.2100   LearningRate 0.0567   Epoch: 4   Global Step: 205140   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:22,284-Speed 2623.73 samples/sec   Loss 10.1932   LearningRate 0.0567   Epoch: 4   Global Step: 205150   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:26,184-Speed 2627.00 samples/sec   Loss 10.2906   LearningRate 0.0567   Epoch: 4   Global Step: 205160   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:30,084-Speed 2626.49 samples/sec   Loss 10.2999   LearningRate 0.0567   Epoch: 4   Global Step: 205170   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:33,993-Speed 2619.66 samples/sec   Loss 10.3178   LearningRate 0.0567   Epoch: 4   Global Step: 205180   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:37,901-Speed 2620.96 samples/sec   Loss 10.2045   LearningRate 0.0566   Epoch: 4   Global Step: 205190   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:41,806-Speed 2623.20 samples/sec   Loss 10.2883   LearningRate 0.0566   Epoch: 4   Global Step: 205200   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:45,699-Speed 2631.01 samples/sec   Loss 10.3563   LearningRate 0.0566   Epoch: 4   Global Step: 205210   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:49,584-Speed 2636.34 samples/sec   Loss 10.2753   LearningRate 0.0566   Epoch: 4   Global Step: 205220   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 18:46:53,474-Speed 2633.19 samples/sec   Loss 10.2899   LearningRate 0.0566   Epoch: 4   Global Step: 205230   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:46:57,393-Speed 2613.67 samples/sec   Loss 10.3289   LearningRate 0.0566   Epoch: 4   Global Step: 205240   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:01,283-Speed 2633.31 samples/sec   Loss 10.2388   LearningRate 0.0566   Epoch: 4   Global Step: 205250   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:05,185-Speed 2624.69 samples/sec   Loss 10.2007   LearningRate 0.0566   Epoch: 4   Global Step: 205260   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:09,067-Speed 2638.24 samples/sec   Loss 10.3315   LearningRate 0.0566   Epoch: 4   Global Step: 205270   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:12,957-Speed 2633.18 samples/sec   Loss 10.2288   LearningRate 0.0566   Epoch: 4   Global Step: 205280   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:16,847-Speed 2633.16 samples/sec   Loss 10.4193   LearningRate 0.0566   Epoch: 4   Global Step: 205290   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:20,761-Speed 2617.41 samples/sec   Loss 10.1154   LearningRate 0.0566   Epoch: 4   Global Step: 205300   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:24,668-Speed 2621.84 samples/sec   Loss 10.4729   LearningRate 0.0566   Epoch: 4   Global Step: 205310   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:28,564-Speed 2629.13 samples/sec   Loss 10.4085   LearningRate 0.0566   Epoch: 4   Global Step: 205320   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 18:47:32,461-Speed 2628.29 samples/sec   Loss 10.4225   LearningRate 0.0566   Epoch: 4   Global Step: 205330   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:47:36,351-Speed 2633.36 samples/sec   Loss 10.2530   LearningRate 0.0566   Epoch: 4   Global Step: 205340   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:47:40,246-Speed 2629.25 samples/sec   Loss 10.3073   LearningRate 0.0566   Epoch: 4   Global Step: 205350   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:47:44,135-Speed 2633.52 samples/sec   Loss 10.3168   LearningRate 0.0566   Epoch: 4   Global Step: 205360   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:47:48,046-Speed 2619.77 samples/sec   Loss 10.3818   LearningRate 0.0566   Epoch: 4   Global Step: 205370   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:47:51,937-Speed 2631.83 samples/sec   Loss 10.3360   LearningRate 0.0566   Epoch: 4   Global Step: 205380   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:47:55,830-Speed 2631.65 samples/sec   Loss 10.2839   LearningRate 0.0566   Epoch: 4   Global Step: 205390   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:47:59,749-Speed 2613.44 samples/sec   Loss 10.1118   LearningRate 0.0566   Epoch: 4   Global Step: 205400   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:48:03,638-Speed 2633.82 samples/sec   Loss 10.2283   LearningRate 0.0566   Epoch: 4   Global Step: 205410   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:48:07,530-Speed 2631.03 samples/sec   Loss 10.3300   LearningRate 0.0566   Epoch: 4   Global Step: 205420   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:48:11,435-Speed 2623.31 samples/sec   Loss 10.2219   LearningRate 0.0566   Epoch: 4   Global Step: 205430   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:15,327-Speed 2631.18 samples/sec   Loss 10.4791   LearningRate 0.0566   Epoch: 4   Global Step: 205440   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:19,216-Speed 2633.72 samples/sec   Loss 10.2120   LearningRate 0.0566   Epoch: 4   Global Step: 205450   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:23,107-Speed 2632.55 samples/sec   Loss 10.2731   LearningRate 0.0566   Epoch: 4   Global Step: 205460   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:27,012-Speed 2622.83 samples/sec   Loss 10.2196   LearningRate 0.0566   Epoch: 4   Global Step: 205470   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:30,909-Speed 2628.16 samples/sec   Loss 10.1469   LearningRate 0.0566   Epoch: 4   Global Step: 205480   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:34,799-Speed 2633.59 samples/sec   Loss 10.2617   LearningRate 0.0566   Epoch: 4   Global Step: 205490   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:38,703-Speed 2623.29 samples/sec   Loss 10.1501   LearningRate 0.0566   Epoch: 4   Global Step: 205500   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:42,598-Speed 2629.75 samples/sec   Loss 10.2886   LearningRate 0.0566   Epoch: 4   Global Step: 205510   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:46,495-Speed 2628.07 samples/sec   Loss 10.1293   LearningRate 0.0566   Epoch: 4   Global Step: 205520   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:48:50,393-Speed 2627.51 samples/sec   Loss 10.1011   LearningRate 0.0566   Epoch: 4   Global Step: 205530   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:48:54,291-Speed 2627.32 samples/sec   Loss 10.1766   LearningRate 0.0566   Epoch: 4   Global Step: 205540   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:48:58,184-Speed 2631.73 samples/sec   Loss 10.3087   LearningRate 0.0566   Epoch: 4   Global Step: 205550   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:49:02,075-Speed 2632.24 samples/sec   Loss 10.1897   LearningRate 0.0566   Epoch: 4   Global Step: 205560   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:49:05,977-Speed 2624.51 samples/sec   Loss 10.2535   LearningRate 0.0566   Epoch: 4   Global Step: 205570   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:09,889-Speed 2619.50 samples/sec   Loss 10.2133   LearningRate 0.0566   Epoch: 4   Global Step: 205580   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:13,780-Speed 2632.73 samples/sec   Loss 10.3322   LearningRate 0.0566   Epoch: 4   Global Step: 205590   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:17,671-Speed 2632.12 samples/sec   Loss 10.3177   LearningRate 0.0566   Epoch: 4   Global Step: 205600   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:21,565-Speed 2630.15 samples/sec   Loss 10.3312   LearningRate 0.0566   Epoch: 4   Global Step: 205610   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:25,455-Speed 2633.44 samples/sec   Loss 10.3323   LearningRate 0.0566   Epoch: 4   Global Step: 205620   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:29,354-Speed 2626.42 samples/sec   Loss 10.2682   LearningRate 0.0566   Epoch: 4   Global Step: 205630   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:33,258-Speed 2623.89 samples/sec   Loss 10.2596   LearningRate 0.0566   Epoch: 4   Global Step: 205640   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:37,147-Speed 2633.60 samples/sec   Loss 10.3462   LearningRate 0.0566   Epoch: 4   Global Step: 205650   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:41,037-Speed 2633.83 samples/sec   Loss 10.3564   LearningRate 0.0566   Epoch: 4   Global Step: 205660   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:44,927-Speed 2632.77 samples/sec   Loss 10.2406   LearningRate 0.0566   Epoch: 4   Global Step: 205670   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:48,818-Speed 2632.46 samples/sec   Loss 10.2658   LearningRate 0.0566   Epoch: 4   Global Step: 205680   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:52,731-Speed 2617.06 samples/sec   Loss 10.4572   LearningRate 0.0566   Epoch: 4   Global Step: 205690   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:49:56,629-Speed 2628.02 samples/sec   Loss 10.2648   LearningRate 0.0566   Epoch: 4   Global Step: 205700   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:00,522-Speed 2630.96 samples/sec   Loss 10.3032   LearningRate 0.0566   Epoch: 4   Global Step: 205710   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:04,426-Speed 2624.15 samples/sec   Loss 10.3160   LearningRate 0.0566   Epoch: 4   Global Step: 205720   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:08,321-Speed 2629.82 samples/sec   Loss 10.2813   LearningRate 0.0566   Epoch: 4   Global Step: 205730   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:12,232-Speed 2619.14 samples/sec   Loss 10.2908   LearningRate 0.0565   Epoch: 4   Global Step: 205740   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:16,130-Speed 2627.69 samples/sec   Loss 10.2138   LearningRate 0.0565   Epoch: 4   Global Step: 205750   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:20,036-Speed 2621.88 samples/sec   Loss 10.1492   LearningRate 0.0565   Epoch: 4   Global Step: 205760   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:23,955-Speed 2613.35 samples/sec   Loss 10.0790   LearningRate 0.0565   Epoch: 4   Global Step: 205770   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:50:27,892-Speed 2601.71 samples/sec   Loss 10.2860   LearningRate 0.0565   Epoch: 4   Global Step: 205780   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:50:31,777-Speed 2636.60 samples/sec   Loss 10.1903   LearningRate 0.0565   Epoch: 4   Global Step: 205790   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:35,681-Speed 2623.76 samples/sec   Loss 10.3287   LearningRate 0.0565   Epoch: 4   Global Step: 205800   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:39,580-Speed 2626.89 samples/sec   Loss 10.2165   LearningRate 0.0565   Epoch: 4   Global Step: 205810   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:43,485-Speed 2622.81 samples/sec   Loss 10.3478   LearningRate 0.0565   Epoch: 4   Global Step: 205820   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:47,385-Speed 2626.33 samples/sec   Loss 10.1149   LearningRate 0.0565   Epoch: 4   Global Step: 205830   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:50:51,277-Speed 2632.37 samples/sec   Loss 10.2550   LearningRate 0.0565   Epoch: 4   Global Step: 205840   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:50:55,175-Speed 2627.41 samples/sec   Loss 10.1919   LearningRate 0.0565   Epoch: 4   Global Step: 205850   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:50:59,069-Speed 2630.22 samples/sec   Loss 10.2684   LearningRate 0.0565   Epoch: 4   Global Step: 205860   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:02,977-Speed 2620.90 samples/sec   Loss 10.2981   LearningRate 0.0565   Epoch: 4   Global Step: 205870   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:06,878-Speed 2626.02 samples/sec   Loss 10.1892   LearningRate 0.0565   Epoch: 4   Global Step: 205880   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:10,766-Speed 2634.07 samples/sec   Loss 10.2073   LearningRate 0.0565   Epoch: 4   Global Step: 205890   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:14,666-Speed 2626.12 samples/sec   Loss 10.1509   LearningRate 0.0565   Epoch: 4   Global Step: 205900   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:18,554-Speed 2635.04 samples/sec   Loss 10.1997   LearningRate 0.0565   Epoch: 4   Global Step: 205910   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:22,450-Speed 2629.09 samples/sec   Loss 10.1774   LearningRate 0.0565   Epoch: 4   Global Step: 205920   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:26,342-Speed 2631.99 samples/sec   Loss 10.2694   LearningRate 0.0565   Epoch: 4   Global Step: 205930   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:51:30,231-Speed 2633.91 samples/sec   Loss 10.2712   LearningRate 0.0565   Epoch: 4   Global Step: 205940   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:51:34,134-Speed 2624.16 samples/sec   Loss 10.1286   LearningRate 0.0565   Epoch: 4   Global Step: 205950   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:51:38,021-Speed 2635.14 samples/sec   Loss 10.1476   LearningRate 0.0565   Epoch: 4   Global Step: 205960   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:51:41,912-Speed 2632.35 samples/sec   Loss 10.2498   LearningRate 0.0565   Epoch: 4   Global Step: 205970   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:51:45,805-Speed 2630.60 samples/sec   Loss 10.2202   LearningRate 0.0565   Epoch: 4   Global Step: 205980   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:51:49,698-Speed 2631.73 samples/sec   Loss 10.0856   LearningRate 0.0565   Epoch: 4   Global Step: 205990   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:51:53,590-Speed 2631.52 samples/sec   Loss 10.2553   LearningRate 0.0565   Epoch: 4   Global Step: 206000   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:51:57,481-Speed 2632.34 samples/sec   Loss 10.2475   LearningRate 0.0565   Epoch: 4   Global Step: 206010   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:52:01,371-Speed 2633.22 samples/sec   Loss 10.2330   LearningRate 0.0565   Epoch: 4   Global Step: 206020   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:52:05,263-Speed 2631.62 samples/sec   Loss 10.3422   LearningRate 0.0565   Epoch: 4   Global Step: 206030   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:52:09,164-Speed 2625.68 samples/sec   Loss 10.2610   LearningRate 0.0565   Epoch: 4   Global Step: 206040   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:52:13,057-Speed 2631.05 samples/sec   Loss 10.2592   LearningRate 0.0565   Epoch: 4   Global Step: 206050   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:52:16,951-Speed 2630.19 samples/sec   Loss 10.3231   LearningRate 0.0565   Epoch: 4   Global Step: 206060   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:52:20,845-Speed 2630.55 samples/sec   Loss 10.1802   LearningRate 0.0565   Epoch: 4   Global Step: 206070   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:52:24,708-Speed 2651.93 samples/sec   Loss 10.2339   LearningRate 0.0565   Epoch: 4   Global Step: 206080   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:28,603-Speed 2630.33 samples/sec   Loss 10.2397   LearningRate 0.0565   Epoch: 4   Global Step: 206090   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:32,501-Speed 2627.03 samples/sec   Loss 10.1519   LearningRate 0.0565   Epoch: 4   Global Step: 206100   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:36,393-Speed 2631.48 samples/sec   Loss 10.2817   LearningRate 0.0565   Epoch: 4   Global Step: 206110   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:40,286-Speed 2630.82 samples/sec   Loss 10.2748   LearningRate 0.0565   Epoch: 4   Global Step: 206120   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:44,179-Speed 2631.05 samples/sec   Loss 10.2215   LearningRate 0.0565   Epoch: 4   Global Step: 206130   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:48,080-Speed 2626.18 samples/sec   Loss 10.2644   LearningRate 0.0565   Epoch: 4   Global Step: 206140   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:51,972-Speed 2631.80 samples/sec   Loss 10.2231   LearningRate 0.0565   Epoch: 4   Global Step: 206150   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:55,872-Speed 2626.26 samples/sec   Loss 10.2600   LearningRate 0.0565   Epoch: 4   Global Step: 206160   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:52:59,773-Speed 2625.38 samples/sec   Loss 10.2204   LearningRate 0.0565   Epoch: 4   Global Step: 206170   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:53:03,671-Speed 2628.00 samples/sec   Loss 10.2032   LearningRate 0.0565   Epoch: 4   Global Step: 206180   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:07,566-Speed 2629.27 samples/sec   Loss 10.2872   LearningRate 0.0565   Epoch: 4   Global Step: 206190   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:11,473-Speed 2621.82 samples/sec   Loss 10.2237   LearningRate 0.0565   Epoch: 4   Global Step: 206200   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:15,368-Speed 2629.83 samples/sec   Loss 10.3095   LearningRate 0.0565   Epoch: 4   Global Step: 206210   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:19,264-Speed 2629.31 samples/sec   Loss 10.3964   LearningRate 0.0565   Epoch: 4   Global Step: 206220   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:23,156-Speed 2631.96 samples/sec   Loss 10.2083   LearningRate 0.0565   Epoch: 4   Global Step: 206230   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:27,160-Speed 2558.50 samples/sec   Loss 10.2757   LearningRate 0.0565   Epoch: 4   Global Step: 206240   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:31,087-Speed 2608.22 samples/sec   Loss 10.3265   LearningRate 0.0565   Epoch: 4   Global Step: 206250   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:34,982-Speed 2629.21 samples/sec   Loss 10.2276   LearningRate 0.0565   Epoch: 4   Global Step: 206260   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:38,878-Speed 2629.10 samples/sec   Loss 10.1861   LearningRate 0.0565   Epoch: 4   Global Step: 206270   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:42,784-Speed 2622.18 samples/sec   Loss 10.2174   LearningRate 0.0565   Epoch: 4   Global Step: 206280   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:53:46,674-Speed 2633.01 samples/sec   Loss 10.3533   LearningRate 0.0564   Epoch: 4   Global Step: 206290   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:53:50,566-Speed 2632.03 samples/sec   Loss 10.1808   LearningRate 0.0564   Epoch: 4   Global Step: 206300   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:53:54,443-Speed 2641.58 samples/sec   Loss 10.2743   LearningRate 0.0564   Epoch: 4   Global Step: 206310   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:53:58,337-Speed 2630.52 samples/sec   Loss 10.1632   LearningRate 0.0564   Epoch: 4   Global Step: 206320   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:02,230-Speed 2631.22 samples/sec   Loss 10.1266   LearningRate 0.0564   Epoch: 4   Global Step: 206330   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:06,124-Speed 2629.68 samples/sec   Loss 10.1586   LearningRate 0.0564   Epoch: 4   Global Step: 206340   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:10,029-Speed 2623.17 samples/sec   Loss 10.2597   LearningRate 0.0564   Epoch: 4   Global Step: 206350   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:13,924-Speed 2630.10 samples/sec   Loss 10.3938   LearningRate 0.0564   Epoch: 4   Global Step: 206360   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:17,822-Speed 2628.05 samples/sec   Loss 10.3069   LearningRate 0.0564   Epoch: 4   Global Step: 206370   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:21,718-Speed 2628.34 samples/sec   Loss 10.2749   LearningRate 0.0564   Epoch: 4   Global Step: 206380   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:25,618-Speed 2626.26 samples/sec   Loss 10.3739   LearningRate 0.0564   Epoch: 4   Global Step: 206390   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:29,550-Speed 2605.51 samples/sec   Loss 10.3179   LearningRate 0.0564   Epoch: 4   Global Step: 206400   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:54:33,491-Speed 2598.98 samples/sec   Loss 10.2002   LearningRate 0.0564   Epoch: 4   Global Step: 206410   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:54:37,382-Speed 2632.56 samples/sec   Loss 10.1633   LearningRate 0.0564   Epoch: 4   Global Step: 206420   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:54:41,286-Speed 2624.26 samples/sec   Loss 10.1936   LearningRate 0.0564   Epoch: 4   Global Step: 206430   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:54:45,189-Speed 2623.91 samples/sec   Loss 10.3172   LearningRate 0.0564   Epoch: 4   Global Step: 206440   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:54:49,092-Speed 2623.97 samples/sec   Loss 10.2171   LearningRate 0.0564   Epoch: 4   Global Step: 206450   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:54:52,997-Speed 2623.18 samples/sec   Loss 10.3120   LearningRate 0.0564   Epoch: 4   Global Step: 206460   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:54:56,894-Speed 2628.73 samples/sec   Loss 10.1387   LearningRate 0.0564   Epoch: 4   Global Step: 206470   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:55:00,818-Speed 2609.38 samples/sec   Loss 10.0336   LearningRate 0.0564   Epoch: 4   Global Step: 206480   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:55:04,719-Speed 2625.56 samples/sec   Loss 10.1511   LearningRate 0.0564   Epoch: 4   Global Step: 206490   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:55:08,631-Speed 2618.89 samples/sec   Loss 10.1632   LearningRate 0.0564   Epoch: 4   Global Step: 206500   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:55:12,535-Speed 2623.21 samples/sec   Loss 10.2656   LearningRate 0.0564   Epoch: 4   Global Step: 206510   Fp16 Grad Scale: 524288   Required: 70 hours
Training: 2022-04-13 18:55:16,410-Speed 2643.60 samples/sec   Loss 10.1910   LearningRate 0.0564   Epoch: 4   Global Step: 206520   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:20,312-Speed 2625.30 samples/sec   Loss 10.1811   LearningRate 0.0564   Epoch: 4   Global Step: 206530   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:24,326-Speed 2551.98 samples/sec   Loss 10.1974   LearningRate 0.0564   Epoch: 4   Global Step: 206540   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:28,232-Speed 2621.55 samples/sec   Loss 10.3151   LearningRate 0.0564   Epoch: 4   Global Step: 206550   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:32,138-Speed 2622.86 samples/sec   Loss 10.2599   LearningRate 0.0564   Epoch: 4   Global Step: 206560   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:36,128-Speed 2566.94 samples/sec   Loss 10.3835   LearningRate 0.0564   Epoch: 4   Global Step: 206570   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:40,082-Speed 2590.44 samples/sec   Loss 10.2470   LearningRate 0.0564   Epoch: 4   Global Step: 206580   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:43,994-Speed 2618.61 samples/sec   Loss 10.2505   LearningRate 0.0564   Epoch: 4   Global Step: 206590   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:47,893-Speed 2627.17 samples/sec   Loss 10.1751   LearningRate 0.0564   Epoch: 4   Global Step: 206600   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:51,818-Speed 2609.15 samples/sec   Loss 10.1713   LearningRate 0.0564   Epoch: 4   Global Step: 206610   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:55:55,708-Speed 2633.42 samples/sec   Loss 10.2782   LearningRate 0.0564   Epoch: 4   Global Step: 206620   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:55:59,581-Speed 2643.95 samples/sec   Loss 10.1930   LearningRate 0.0564   Epoch: 4   Global Step: 206630   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:03,471-Speed 2633.58 samples/sec   Loss 10.2761   LearningRate 0.0564   Epoch: 4   Global Step: 206640   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:07,362-Speed 2631.82 samples/sec   Loss 10.2165   LearningRate 0.0564   Epoch: 4   Global Step: 206650   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:11,253-Speed 2632.91 samples/sec   Loss 10.2356   LearningRate 0.0564   Epoch: 4   Global Step: 206660   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:15,145-Speed 2630.99 samples/sec   Loss 10.3037   LearningRate 0.0564   Epoch: 4   Global Step: 206670   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:19,047-Speed 2625.72 samples/sec   Loss 10.1478   LearningRate 0.0564   Epoch: 4   Global Step: 206680   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:22,934-Speed 2634.77 samples/sec   Loss 10.1948   LearningRate 0.0564   Epoch: 4   Global Step: 206690   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:26,827-Speed 2630.82 samples/sec   Loss 10.2767   LearningRate 0.0564   Epoch: 4   Global Step: 206700   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:30,718-Speed 2632.45 samples/sec   Loss 10.2230   LearningRate 0.0564   Epoch: 4   Global Step: 206710   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:34,634-Speed 2619.00 samples/sec   Loss 10.1750   LearningRate 0.0564   Epoch: 4   Global Step: 206720   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:38,515-Speed 2640.04 samples/sec   Loss 10.3754   LearningRate 0.0564   Epoch: 4   Global Step: 206730   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:42,410-Speed 2629.33 samples/sec   Loss 10.2502   LearningRate 0.0564   Epoch: 4   Global Step: 206740   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:46,304-Speed 2630.44 samples/sec   Loss 10.3147   LearningRate 0.0564   Epoch: 4   Global Step: 206750   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:50,195-Speed 2632.14 samples/sec   Loss 10.2821   LearningRate 0.0564   Epoch: 4   Global Step: 206760   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:54,086-Speed 2632.83 samples/sec   Loss 10.1641   LearningRate 0.0564   Epoch: 4   Global Step: 206770   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:56:57,985-Speed 2626.71 samples/sec   Loss 10.3617   LearningRate 0.0564   Epoch: 4   Global Step: 206780   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:01,887-Speed 2624.57 samples/sec   Loss 10.3324   LearningRate 0.0564   Epoch: 4   Global Step: 206790   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:05,794-Speed 2621.47 samples/sec   Loss 10.1734   LearningRate 0.0564   Epoch: 4   Global Step: 206800   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:09,697-Speed 2624.53 samples/sec   Loss 10.1399   LearningRate 0.0564   Epoch: 4   Global Step: 206810   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:13,599-Speed 2625.71 samples/sec   Loss 10.0145   LearningRate 0.0564   Epoch: 4   Global Step: 206820   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:17,498-Speed 2626.24 samples/sec   Loss 10.2673   LearningRate 0.0564   Epoch: 4   Global Step: 206830   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:57:21,403-Speed 2623.25 samples/sec   Loss 10.3352   LearningRate 0.0564   Epoch: 4   Global Step: 206840   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:57:25,313-Speed 2619.24 samples/sec   Loss 10.2177   LearningRate 0.0563   Epoch: 4   Global Step: 206850   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:57:29,205-Speed 2632.28 samples/sec   Loss 10.2682   LearningRate 0.0563   Epoch: 4   Global Step: 206860   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:57:33,105-Speed 2625.49 samples/sec   Loss 10.3002   LearningRate 0.0563   Epoch: 4   Global Step: 206870   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:57:37,004-Speed 2626.78 samples/sec   Loss 10.1814   LearningRate 0.0563   Epoch: 4   Global Step: 206880   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:57:40,874-Speed 2647.08 samples/sec   Loss 10.2023   LearningRate 0.0563   Epoch: 4   Global Step: 206890   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:44,771-Speed 2628.17 samples/sec   Loss 10.1508   LearningRate 0.0563   Epoch: 4   Global Step: 206900   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:48,663-Speed 2631.85 samples/sec   Loss 10.1792   LearningRate 0.0563   Epoch: 4   Global Step: 206910   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:57:52,549-Speed 2636.07 samples/sec   Loss 10.1440   LearningRate 0.0563   Epoch: 4   Global Step: 206920   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:57:56,450-Speed 2625.04 samples/sec   Loss 10.2070   LearningRate 0.0563   Epoch: 4   Global Step: 206930   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:00,354-Speed 2623.79 samples/sec   Loss 10.2303   LearningRate 0.0563   Epoch: 4   Global Step: 206940   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:04,244-Speed 2632.70 samples/sec   Loss 10.3011   LearningRate 0.0563   Epoch: 4   Global Step: 206950   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:08,144-Speed 2626.05 samples/sec   Loss 10.3708   LearningRate 0.0563   Epoch: 4   Global Step: 206960   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:12,036-Speed 2631.53 samples/sec   Loss 10.3821   LearningRate 0.0563   Epoch: 4   Global Step: 206970   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:15,930-Speed 2630.21 samples/sec   Loss 10.2297   LearningRate 0.0563   Epoch: 4   Global Step: 206980   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:19,828-Speed 2628.17 samples/sec   Loss 10.2548   LearningRate 0.0563   Epoch: 4   Global Step: 206990   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:23,722-Speed 2630.14 samples/sec   Loss 10.2683   LearningRate 0.0563   Epoch: 4   Global Step: 207000   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:27,615-Speed 2631.29 samples/sec   Loss 10.2533   LearningRate 0.0563   Epoch: 4   Global Step: 207010   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 18:58:31,505-Speed 2632.55 samples/sec   Loss 10.1371   LearningRate 0.0563   Epoch: 4   Global Step: 207020   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:58:35,397-Speed 2632.00 samples/sec   Loss 10.0513   LearningRate 0.0563   Epoch: 4   Global Step: 207030   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:58:39,294-Speed 2627.77 samples/sec   Loss 10.3981   LearningRate 0.0563   Epoch: 4   Global Step: 207040   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:58:43,237-Speed 2598.29 samples/sec   Loss 10.2875   LearningRate 0.0563   Epoch: 4   Global Step: 207050   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:58:47,267-Speed 2541.32 samples/sec   Loss 10.3357   LearningRate 0.0563   Epoch: 4   Global Step: 207060   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:58:51,184-Speed 2615.02 samples/sec   Loss 10.1166   LearningRate 0.0563   Epoch: 4   Global Step: 207070   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:58:55,077-Speed 2631.00 samples/sec   Loss 10.3260   LearningRate 0.0563   Epoch: 4   Global Step: 207080   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:58:58,977-Speed 2626.60 samples/sec   Loss 10.1833   LearningRate 0.0563   Epoch: 4   Global Step: 207090   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:02,873-Speed 2628.54 samples/sec   Loss 10.1157   LearningRate 0.0563   Epoch: 4   Global Step: 207100   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:06,764-Speed 2632.42 samples/sec   Loss 10.2392   LearningRate 0.0563   Epoch: 4   Global Step: 207110   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:10,645-Speed 2639.07 samples/sec   Loss 10.2330   LearningRate 0.0563   Epoch: 4   Global Step: 207120   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:14,541-Speed 2628.82 samples/sec   Loss 10.3843   LearningRate 0.0563   Epoch: 4   Global Step: 207130   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:18,441-Speed 2627.07 samples/sec   Loss 10.2725   LearningRate 0.0563   Epoch: 4   Global Step: 207140   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:22,337-Speed 2628.54 samples/sec   Loss 10.3561   LearningRate 0.0563   Epoch: 4   Global Step: 207150   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:26,269-Speed 2605.48 samples/sec   Loss 10.0333   LearningRate 0.0563   Epoch: 4   Global Step: 207160   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:30,167-Speed 2627.80 samples/sec   Loss 10.1933   LearningRate 0.0563   Epoch: 4   Global Step: 207170   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:34,065-Speed 2627.04 samples/sec   Loss 10.0970   LearningRate 0.0563   Epoch: 4   Global Step: 207180   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:37,960-Speed 2629.73 samples/sec   Loss 10.2746   LearningRate 0.0563   Epoch: 4   Global Step: 207190   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:41,853-Speed 2631.49 samples/sec   Loss 10.0865   LearningRate 0.0563   Epoch: 4   Global Step: 207200   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:45,782-Speed 2607.03 samples/sec   Loss 10.2238   LearningRate 0.0563   Epoch: 4   Global Step: 207210   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 18:59:49,689-Speed 2621.67 samples/sec   Loss 10.2111   LearningRate 0.0563   Epoch: 4   Global Step: 207220   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:59:53,588-Speed 2626.62 samples/sec   Loss 10.1879   LearningRate 0.0563   Epoch: 4   Global Step: 207230   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 18:59:57,489-Speed 2625.70 samples/sec   Loss 10.3176   LearningRate 0.0563   Epoch: 4   Global Step: 207240   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:00:01,387-Speed 2627.43 samples/sec   Loss 10.2092   LearningRate 0.0563   Epoch: 4   Global Step: 207250   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:00:05,282-Speed 2630.19 samples/sec   Loss 10.2484   LearningRate 0.0563   Epoch: 4   Global Step: 207260   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:00:09,179-Speed 2628.13 samples/sec   Loss 10.1984   LearningRate 0.0563   Epoch: 4   Global Step: 207270   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:00:13,074-Speed 2630.62 samples/sec   Loss 10.3023   LearningRate 0.0563   Epoch: 4   Global Step: 207280   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:00:16,980-Speed 2621.98 samples/sec   Loss 10.2435   LearningRate 0.0563   Epoch: 4   Global Step: 207290   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:00:20,876-Speed 2629.03 samples/sec   Loss 10.2460   LearningRate 0.0563   Epoch: 4   Global Step: 207300   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:00:24,814-Speed 2601.37 samples/sec   Loss 10.1800   LearningRate 0.0563   Epoch: 4   Global Step: 207310   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:28,717-Speed 2624.53 samples/sec   Loss 10.2498   LearningRate 0.0563   Epoch: 4   Global Step: 207320   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:32,610-Speed 2630.87 samples/sec   Loss 10.2448   LearningRate 0.0563   Epoch: 4   Global Step: 207330   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:36,500-Speed 2632.67 samples/sec   Loss 10.3323   LearningRate 0.0563   Epoch: 4   Global Step: 207340   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:40,392-Speed 2631.81 samples/sec   Loss 10.2795   LearningRate 0.0563   Epoch: 4   Global Step: 207350   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:44,286-Speed 2630.15 samples/sec   Loss 10.0809   LearningRate 0.0563   Epoch: 4   Global Step: 207360   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:48,180-Speed 2630.73 samples/sec   Loss 10.1817   LearningRate 0.0563   Epoch: 4   Global Step: 207370   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:52,073-Speed 2630.47 samples/sec   Loss 10.1995   LearningRate 0.0563   Epoch: 4   Global Step: 207380   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:00:55,966-Speed 2631.30 samples/sec   Loss 10.2686   LearningRate 0.0563   Epoch: 4   Global Step: 207390   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:18,026-Speed 464.22 samples/sec   Loss 10.2975   LearningRate 0.0562   Epoch: 5   Global Step: 207400   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:21,936-Speed 2619.62 samples/sec   Loss 10.3200   LearningRate 0.0562   Epoch: 5   Global Step: 207410   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:01:25,831-Speed 2629.65 samples/sec   Loss 10.2283   LearningRate 0.0562   Epoch: 5   Global Step: 207420   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:01:29,702-Speed 2646.40 samples/sec   Loss 10.2658   LearningRate 0.0562   Epoch: 5   Global Step: 207430   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:33,600-Speed 2627.41 samples/sec   Loss 10.3411   LearningRate 0.0562   Epoch: 5   Global Step: 207440   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:37,502-Speed 2625.03 samples/sec   Loss 10.0276   LearningRate 0.0562   Epoch: 5   Global Step: 207450   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:41,402-Speed 2626.05 samples/sec   Loss 10.2505   LearningRate 0.0562   Epoch: 5   Global Step: 207460   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:45,315-Speed 2617.46 samples/sec   Loss 10.1436   LearningRate 0.0562   Epoch: 5   Global Step: 207470   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:49,211-Speed 2629.24 samples/sec   Loss 10.2258   LearningRate 0.0562   Epoch: 5   Global Step: 207480   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:53,104-Speed 2630.85 samples/sec   Loss 10.2635   LearningRate 0.0562   Epoch: 5   Global Step: 207490   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:01:56,987-Speed 2637.66 samples/sec   Loss 10.2554   LearningRate 0.0562   Epoch: 5   Global Step: 207500   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:00,877-Speed 2633.56 samples/sec   Loss 10.2796   LearningRate 0.0562   Epoch: 5   Global Step: 207510   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:04,760-Speed 2637.23 samples/sec   Loss 10.2798   LearningRate 0.0562   Epoch: 5   Global Step: 207520   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:08,644-Speed 2637.32 samples/sec   Loss 10.2898   LearningRate 0.0562   Epoch: 5   Global Step: 207530   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:12,560-Speed 2615.80 samples/sec   Loss 10.2261   LearningRate 0.0562   Epoch: 5   Global Step: 207540   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:16,446-Speed 2635.12 samples/sec   Loss 10.1726   LearningRate 0.0562   Epoch: 5   Global Step: 207550   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:20,341-Speed 2630.26 samples/sec   Loss 10.2015   LearningRate 0.0562   Epoch: 5   Global Step: 207560   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:24,228-Speed 2634.38 samples/sec   Loss 10.2307   LearningRate 0.0562   Epoch: 5   Global Step: 207570   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:28,117-Speed 2634.27 samples/sec   Loss 10.2276   LearningRate 0.0562   Epoch: 5   Global Step: 207580   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:32,007-Speed 2632.78 samples/sec   Loss 10.3049   LearningRate 0.0562   Epoch: 5   Global Step: 207590   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:35,899-Speed 2631.87 samples/sec   Loss 10.2986   LearningRate 0.0562   Epoch: 5   Global Step: 207600   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:02:39,773-Speed 2643.68 samples/sec   Loss 10.2951   LearningRate 0.0562   Epoch: 5   Global Step: 207610   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:43,715-Speed 2598.87 samples/sec   Loss 10.3588   LearningRate 0.0562   Epoch: 5   Global Step: 207620   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:47,606-Speed 2632.28 samples/sec   Loss 10.1834   LearningRate 0.0562   Epoch: 5   Global Step: 207630   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:51,501-Speed 2629.59 samples/sec   Loss 10.2311   LearningRate 0.0562   Epoch: 5   Global Step: 207640   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:55,404-Speed 2624.57 samples/sec   Loss 10.2852   LearningRate 0.0562   Epoch: 5   Global Step: 207650   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:02:59,293-Speed 2633.66 samples/sec   Loss 10.1762   LearningRate 0.0562   Epoch: 5   Global Step: 207660   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:03:03,197-Speed 2623.41 samples/sec   Loss 10.3404   LearningRate 0.0562   Epoch: 5   Global Step: 207670   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:03:07,091-Speed 2630.61 samples/sec   Loss 10.2018   LearningRate 0.0562   Epoch: 5   Global Step: 207680   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:03:10,987-Speed 2628.72 samples/sec   Loss 10.3055   LearningRate 0.0562   Epoch: 5   Global Step: 207690   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:03:14,882-Speed 2629.82 samples/sec   Loss 10.1820   LearningRate 0.0562   Epoch: 5   Global Step: 207700   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:03:18,784-Speed 2625.51 samples/sec   Loss 10.1384   LearningRate 0.0562   Epoch: 5   Global Step: 207710   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:03:22,640-Speed 2655.85 samples/sec   Loss 10.0547   LearningRate 0.0562   Epoch: 5   Global Step: 207720   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:26,531-Speed 2633.03 samples/sec   Loss 10.3268   LearningRate 0.0562   Epoch: 5   Global Step: 207730   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:30,437-Speed 2621.95 samples/sec   Loss 10.1329   LearningRate 0.0562   Epoch: 5   Global Step: 207740   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:34,340-Speed 2624.17 samples/sec   Loss 10.0766   LearningRate 0.0562   Epoch: 5   Global Step: 207750   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:38,256-Speed 2615.33 samples/sec   Loss 10.3164   LearningRate 0.0562   Epoch: 5   Global Step: 207760   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:42,161-Speed 2623.57 samples/sec   Loss 10.3755   LearningRate 0.0562   Epoch: 5   Global Step: 207770   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:46,059-Speed 2627.29 samples/sec   Loss 10.2363   LearningRate 0.0562   Epoch: 5   Global Step: 207780   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:49,944-Speed 2636.67 samples/sec   Loss 10.0848   LearningRate 0.0562   Epoch: 5   Global Step: 207790   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:53,868-Speed 2610.43 samples/sec   Loss 10.1099   LearningRate 0.0562   Epoch: 5   Global Step: 207800   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:03:57,765-Speed 2628.69 samples/sec   Loss 10.1998   LearningRate 0.0562   Epoch: 5   Global Step: 207810   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:04:01,662-Speed 2628.32 samples/sec   Loss 10.1235   LearningRate 0.0562   Epoch: 5   Global Step: 207820   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:05,554-Speed 2630.98 samples/sec   Loss 9.9540   LearningRate 0.0562   Epoch: 5   Global Step: 207830   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:09,452-Speed 2628.05 samples/sec   Loss 10.0919   LearningRate 0.0562   Epoch: 5   Global Step: 207840   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:13,355-Speed 2624.59 samples/sec   Loss 10.2729   LearningRate 0.0562   Epoch: 5   Global Step: 207850   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:17,249-Speed 2630.25 samples/sec   Loss 10.1988   LearningRate 0.0562   Epoch: 5   Global Step: 207860   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:21,151-Speed 2624.52 samples/sec   Loss 10.2423   LearningRate 0.0562   Epoch: 5   Global Step: 207870   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:25,046-Speed 2629.70 samples/sec   Loss 10.0729   LearningRate 0.0562   Epoch: 5   Global Step: 207880   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:28,940-Speed 2630.40 samples/sec   Loss 10.1158   LearningRate 0.0562   Epoch: 5   Global Step: 207890   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:32,844-Speed 2623.19 samples/sec   Loss 10.1440   LearningRate 0.0562   Epoch: 5   Global Step: 207900   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:36,745-Speed 2625.82 samples/sec   Loss 10.2384   LearningRate 0.0562   Epoch: 5   Global Step: 207910   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:40,626-Speed 2638.71 samples/sec   Loss 10.1496   LearningRate 0.0562   Epoch: 5   Global Step: 207920   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:44,517-Speed 2631.97 samples/sec   Loss 10.2422   LearningRate 0.0562   Epoch: 5   Global Step: 207930   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:04:48,395-Speed 2641.99 samples/sec   Loss 10.0844   LearningRate 0.0562   Epoch: 5   Global Step: 207940   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:04:52,287-Speed 2631.61 samples/sec   Loss 10.1690   LearningRate 0.0561   Epoch: 5   Global Step: 207950   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:04:56,178-Speed 2632.11 samples/sec   Loss 10.1670   LearningRate 0.0561   Epoch: 5   Global Step: 207960   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:00,068-Speed 2632.77 samples/sec   Loss 10.2187   LearningRate 0.0561   Epoch: 5   Global Step: 207970   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:03,965-Speed 2628.70 samples/sec   Loss 10.2447   LearningRate 0.0561   Epoch: 5   Global Step: 207980   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:07,848-Speed 2637.36 samples/sec   Loss 10.1757   LearningRate 0.0561   Epoch: 5   Global Step: 207990   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:11,740-Speed 2631.66 samples/sec   Loss 10.2504   LearningRate 0.0561   Epoch: 5   Global Step: 208000   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:15,628-Speed 2634.06 samples/sec   Loss 10.2388   LearningRate 0.0561   Epoch: 5   Global Step: 208010   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:19,518-Speed 2633.56 samples/sec   Loss 10.2922   LearningRate 0.0561   Epoch: 5   Global Step: 208020   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:23,407-Speed 2633.97 samples/sec   Loss 10.1399   LearningRate 0.0561   Epoch: 5   Global Step: 208030   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:05:27,296-Speed 2633.42 samples/sec   Loss 10.2039   LearningRate 0.0561   Epoch: 5   Global Step: 208040   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:31,187-Speed 2632.30 samples/sec   Loss 10.1591   LearningRate 0.0561   Epoch: 5   Global Step: 208050   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:35,090-Speed 2624.48 samples/sec   Loss 10.1732   LearningRate 0.0561   Epoch: 5   Global Step: 208060   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:38,989-Speed 2626.73 samples/sec   Loss 10.2063   LearningRate 0.0561   Epoch: 5   Global Step: 208070   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:42,883-Speed 2629.62 samples/sec   Loss 10.2183   LearningRate 0.0561   Epoch: 5   Global Step: 208080   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:46,776-Speed 2631.07 samples/sec   Loss 10.2398   LearningRate 0.0561   Epoch: 5   Global Step: 208090   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:50,674-Speed 2627.44 samples/sec   Loss 10.2120   LearningRate 0.0561   Epoch: 5   Global Step: 208100   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:54,567-Speed 2631.36 samples/sec   Loss 10.0830   LearningRate 0.0561   Epoch: 5   Global Step: 208110   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:05:58,463-Speed 2629.07 samples/sec   Loss 10.1203   LearningRate 0.0561   Epoch: 5   Global Step: 208120   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:06:02,358-Speed 2630.09 samples/sec   Loss 10.1373   LearningRate 0.0561   Epoch: 5   Global Step: 208130   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:06:06,252-Speed 2629.73 samples/sec   Loss 10.1185   LearningRate 0.0561   Epoch: 5   Global Step: 208140   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:06:10,209-Speed 2588.38 samples/sec   Loss 10.2124   LearningRate 0.0561   Epoch: 5   Global Step: 208150   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:06:14,106-Speed 2627.98 samples/sec   Loss 10.1902   LearningRate 0.0561   Epoch: 5   Global Step: 208160   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:06:18,001-Speed 2629.70 samples/sec   Loss 10.2645   LearningRate 0.0561   Epoch: 5   Global Step: 208170   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:06:21,896-Speed 2629.49 samples/sec   Loss 10.2559   LearningRate 0.0561   Epoch: 5   Global Step: 208180   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:06:25,772-Speed 2642.49 samples/sec   Loss 10.2590   LearningRate 0.0561   Epoch: 5   Global Step: 208190   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:06:29,667-Speed 2630.20 samples/sec   Loss 10.1508   LearningRate 0.0561   Epoch: 5   Global Step: 208200   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:06:33,560-Speed 2630.61 samples/sec   Loss 10.2492   LearningRate 0.0561   Epoch: 5   Global Step: 208210   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:06:37,462-Speed 2624.85 samples/sec   Loss 10.2859   LearningRate 0.0561   Epoch: 5   Global Step: 208220   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:06:41,356-Speed 2630.35 samples/sec   Loss 10.1174   LearningRate 0.0561   Epoch: 5   Global Step: 208230   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:06:45,205-Speed 2661.38 samples/sec   Loss 10.7692   LearningRate 0.0561   Epoch: 5   Global Step: 208240   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:06:49,058-Speed 2658.29 samples/sec   Loss 11.1137   LearningRate 0.0561   Epoch: 5   Global Step: 208250   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:06:52,983-Speed 2609.74 samples/sec   Loss 10.7215   LearningRate 0.0561   Epoch: 5   Global Step: 208260   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:06:56,878-Speed 2630.13 samples/sec   Loss 10.3438   LearningRate 0.0561   Epoch: 5   Global Step: 208270   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:00,770-Speed 2631.61 samples/sec   Loss 10.3542   LearningRate 0.0561   Epoch: 5   Global Step: 208280   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:04,686-Speed 2615.23 samples/sec   Loss 10.2814   LearningRate 0.0561   Epoch: 5   Global Step: 208290   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:08,583-Speed 2628.03 samples/sec   Loss 10.2290   LearningRate 0.0561   Epoch: 5   Global Step: 208300   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:12,473-Speed 2633.27 samples/sec   Loss 10.3113   LearningRate 0.0561   Epoch: 5   Global Step: 208310   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:16,361-Speed 2634.36 samples/sec   Loss 10.3821   LearningRate 0.0561   Epoch: 5   Global Step: 208320   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:20,280-Speed 2613.60 samples/sec   Loss 10.2131   LearningRate 0.0561   Epoch: 5   Global Step: 208330   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:24,173-Speed 2631.32 samples/sec   Loss 10.2192   LearningRate 0.0561   Epoch: 5   Global Step: 208340   Fp16 Grad Scale: 4096   Required: 70 hours
Training: 2022-04-13 19:07:28,060-Speed 2635.53 samples/sec   Loss 10.2328   LearningRate 0.0561   Epoch: 5   Global Step: 208350   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:31,959-Speed 2626.59 samples/sec   Loss 10.2900   LearningRate 0.0561   Epoch: 5   Global Step: 208360   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:35,848-Speed 2633.84 samples/sec   Loss 10.2369   LearningRate 0.0561   Epoch: 5   Global Step: 208370   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:39,735-Speed 2634.59 samples/sec   Loss 10.1767   LearningRate 0.0561   Epoch: 5   Global Step: 208380   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:43,633-Speed 2628.35 samples/sec   Loss 10.4207   LearningRate 0.0561   Epoch: 5   Global Step: 208390   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:47,524-Speed 2631.83 samples/sec   Loss 10.2725   LearningRate 0.0561   Epoch: 5   Global Step: 208400   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:51,416-Speed 2632.34 samples/sec   Loss 10.1260   LearningRate 0.0561   Epoch: 5   Global Step: 208410   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:55,318-Speed 2624.84 samples/sec   Loss 10.3363   LearningRate 0.0561   Epoch: 5   Global Step: 208420   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:07:59,205-Speed 2635.03 samples/sec   Loss 10.2080   LearningRate 0.0561   Epoch: 5   Global Step: 208430   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:08:03,091-Speed 2635.41 samples/sec   Loss 10.2364   LearningRate 0.0561   Epoch: 5   Global Step: 208440   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:08:06,983-Speed 2632.46 samples/sec   Loss 10.2732   LearningRate 0.0561   Epoch: 5   Global Step: 208450   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:10,874-Speed 2631.86 samples/sec   Loss 10.2375   LearningRate 0.0561   Epoch: 5   Global Step: 208460   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:14,766-Speed 2632.12 samples/sec   Loss 10.2847   LearningRate 0.0561   Epoch: 5   Global Step: 208470   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:18,655-Speed 2633.45 samples/sec   Loss 10.2029   LearningRate 0.0561   Epoch: 5   Global Step: 208480   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:22,545-Speed 2633.54 samples/sec   Loss 10.2913   LearningRate 0.0561   Epoch: 5   Global Step: 208490   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:26,432-Speed 2634.95 samples/sec   Loss 10.1835   LearningRate 0.0561   Epoch: 5   Global Step: 208500   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:30,325-Speed 2631.06 samples/sec   Loss 10.2781   LearningRate 0.0560   Epoch: 5   Global Step: 208510   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:34,238-Speed 2617.69 samples/sec   Loss 10.2770   LearningRate 0.0560   Epoch: 5   Global Step: 208520   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:38,284-Speed 2531.56 samples/sec   Loss 10.2585   LearningRate 0.0560   Epoch: 5   Global Step: 208530   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:42,173-Speed 2633.25 samples/sec   Loss 10.2131   LearningRate 0.0560   Epoch: 5   Global Step: 208540   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:08:46,060-Speed 2634.78 samples/sec   Loss 10.1797   LearningRate 0.0560   Epoch: 5   Global Step: 208550   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:08:49,950-Speed 2632.99 samples/sec   Loss 10.1315   LearningRate 0.0560   Epoch: 5   Global Step: 208560   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:08:53,847-Speed 2628.44 samples/sec   Loss 10.3651   LearningRate 0.0560   Epoch: 5   Global Step: 208570   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:08:57,740-Speed 2631.27 samples/sec   Loss 10.1471   LearningRate 0.0560   Epoch: 5   Global Step: 208580   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:09:01,632-Speed 2631.24 samples/sec   Loss 10.3860   LearningRate 0.0560   Epoch: 5   Global Step: 208590   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:09:05,533-Speed 2625.59 samples/sec   Loss 10.1614   LearningRate 0.0560   Epoch: 5   Global Step: 208600   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:09:09,423-Speed 2632.90 samples/sec   Loss 10.0886   LearningRate 0.0560   Epoch: 5   Global Step: 208610   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:09:13,316-Speed 2631.44 samples/sec   Loss 10.2571   LearningRate 0.0560   Epoch: 5   Global Step: 208620   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:09:17,204-Speed 2634.20 samples/sec   Loss 10.2493   LearningRate 0.0560   Epoch: 5   Global Step: 208630   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:09:21,101-Speed 2627.78 samples/sec   Loss 10.2137   LearningRate 0.0560   Epoch: 5   Global Step: 208640   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:09:24,992-Speed 2632.80 samples/sec   Loss 10.2175   LearningRate 0.0560   Epoch: 5   Global Step: 208650   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:09:28,878-Speed 2635.17 samples/sec   Loss 10.3209   LearningRate 0.0560   Epoch: 5   Global Step: 208660   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:09:32,769-Speed 2632.67 samples/sec   Loss 10.1688   LearningRate 0.0560   Epoch: 5   Global Step: 208670   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:09:36,657-Speed 2634.56 samples/sec   Loss 10.2904   LearningRate 0.0560   Epoch: 5   Global Step: 208680   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:09:40,553-Speed 2628.70 samples/sec   Loss 10.3774   LearningRate 0.0560   Epoch: 5   Global Step: 208690   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:09:44,507-Speed 2590.23 samples/sec   Loss 10.2943   LearningRate 0.0560   Epoch: 5   Global Step: 208700   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:09:48,372-Speed 2650.19 samples/sec   Loss 10.8591   LearningRate 0.0560   Epoch: 5   Global Step: 208710   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:09:52,301-Speed 2607.08 samples/sec   Loss 10.2306   LearningRate 0.0560   Epoch: 5   Global Step: 208720   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:09:56,204-Speed 2624.07 samples/sec   Loss 10.1797   LearningRate 0.0560   Epoch: 5   Global Step: 208730   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:00,099-Speed 2629.83 samples/sec   Loss 10.2020   LearningRate 0.0560   Epoch: 5   Global Step: 208740   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:03,992-Speed 2630.95 samples/sec   Loss 10.1961   LearningRate 0.0560   Epoch: 5   Global Step: 208750   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:07,881-Speed 2634.10 samples/sec   Loss 10.2556   LearningRate 0.0560   Epoch: 5   Global Step: 208760   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:11,771-Speed 2632.81 samples/sec   Loss 10.0751   LearningRate 0.0560   Epoch: 5   Global Step: 208770   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:15,662-Speed 2632.78 samples/sec   Loss 10.1959   LearningRate 0.0560   Epoch: 5   Global Step: 208780   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:19,554-Speed 2631.85 samples/sec   Loss 10.1936   LearningRate 0.0560   Epoch: 5   Global Step: 208790   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:23,440-Speed 2635.00 samples/sec   Loss 10.1590   LearningRate 0.0560   Epoch: 5   Global Step: 208800   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:10:27,329-Speed 2634.28 samples/sec   Loss 10.1302   LearningRate 0.0560   Epoch: 5   Global Step: 208810   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:31,222-Speed 2630.89 samples/sec   Loss 10.2293   LearningRate 0.0560   Epoch: 5   Global Step: 208820   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:35,116-Speed 2630.82 samples/sec   Loss 10.1616   LearningRate 0.0560   Epoch: 5   Global Step: 208830   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:39,009-Speed 2630.66 samples/sec   Loss 10.4207   LearningRate 0.0560   Epoch: 5   Global Step: 208840   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:42,900-Speed 2632.43 samples/sec   Loss 10.2717   LearningRate 0.0560   Epoch: 5   Global Step: 208850   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:46,828-Speed 2607.43 samples/sec   Loss 10.1430   LearningRate 0.0560   Epoch: 5   Global Step: 208860   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:50,718-Speed 2633.59 samples/sec   Loss 10.2189   LearningRate 0.0560   Epoch: 5   Global Step: 208870   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:54,611-Speed 2630.71 samples/sec   Loss 10.3740   LearningRate 0.0560   Epoch: 5   Global Step: 208880   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:10:58,504-Speed 2630.92 samples/sec   Loss 10.2807   LearningRate 0.0560   Epoch: 5   Global Step: 208890   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:11:02,407-Speed 2624.66 samples/sec   Loss 10.3255   LearningRate 0.0560   Epoch: 5   Global Step: 208900   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:11:06,294-Speed 2635.07 samples/sec   Loss 10.2573   LearningRate 0.0560   Epoch: 5   Global Step: 208910   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:10,187-Speed 2630.64 samples/sec   Loss 10.2411   LearningRate 0.0560   Epoch: 5   Global Step: 208920   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:14,114-Speed 2608.72 samples/sec   Loss 10.2109   LearningRate 0.0560   Epoch: 5   Global Step: 208930   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:18,009-Speed 2629.71 samples/sec   Loss 10.1777   LearningRate 0.0560   Epoch: 5   Global Step: 208940   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:21,897-Speed 2633.94 samples/sec   Loss 10.2104   LearningRate 0.0560   Epoch: 5   Global Step: 208950   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:25,788-Speed 2632.26 samples/sec   Loss 10.1081   LearningRate 0.0560   Epoch: 5   Global Step: 208960   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:29,677-Speed 2633.66 samples/sec   Loss 10.2000   LearningRate 0.0560   Epoch: 5   Global Step: 208970   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:33,569-Speed 2631.90 samples/sec   Loss 10.0917   LearningRate 0.0560   Epoch: 5   Global Step: 208980   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:37,459-Speed 2632.55 samples/sec   Loss 10.0864   LearningRate 0.0560   Epoch: 5   Global Step: 208990   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:41,353-Speed 2630.54 samples/sec   Loss 10.2728   LearningRate 0.0560   Epoch: 5   Global Step: 209000   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:11:45,248-Speed 2629.55 samples/sec   Loss 10.2300   LearningRate 0.0560   Epoch: 5   Global Step: 209010   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:11:49,140-Speed 2631.73 samples/sec   Loss 10.2805   LearningRate 0.0560   Epoch: 5   Global Step: 209020   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:11:53,033-Speed 2631.19 samples/sec   Loss 10.2026   LearningRate 0.0560   Epoch: 5   Global Step: 209030   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:11:56,927-Speed 2630.21 samples/sec   Loss 10.1828   LearningRate 0.0560   Epoch: 5   Global Step: 209040   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:12:00,833-Speed 2622.16 samples/sec   Loss 10.2208   LearningRate 0.0560   Epoch: 5   Global Step: 209050   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:12:04,726-Speed 2630.62 samples/sec   Loss 10.1583   LearningRate 0.0559   Epoch: 5   Global Step: 209060   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:12:08,621-Speed 2629.48 samples/sec   Loss 10.3044   LearningRate 0.0559   Epoch: 5   Global Step: 209070   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:12:12,517-Speed 2628.85 samples/sec   Loss 10.3321   LearningRate 0.0559   Epoch: 5   Global Step: 209080   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:12:16,420-Speed 2624.65 samples/sec   Loss 10.2407   LearningRate 0.0559   Epoch: 5   Global Step: 209090   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:12:20,391-Speed 2579.58 samples/sec   Loss 10.1396   LearningRate 0.0559   Epoch: 5   Global Step: 209100   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:24,284-Speed 2630.29 samples/sec   Loss 10.3872   LearningRate 0.0559   Epoch: 5   Global Step: 209110   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:28,179-Speed 2629.68 samples/sec   Loss 10.1163   LearningRate 0.0559   Epoch: 5   Global Step: 209120   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:32,076-Speed 2628.12 samples/sec   Loss 10.1792   LearningRate 0.0559   Epoch: 5   Global Step: 209130   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:35,969-Speed 2631.52 samples/sec   Loss 10.1904   LearningRate 0.0559   Epoch: 5   Global Step: 209140   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:39,866-Speed 2628.55 samples/sec   Loss 10.2136   LearningRate 0.0559   Epoch: 5   Global Step: 209150   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:43,753-Speed 2635.21 samples/sec   Loss 10.1372   LearningRate 0.0559   Epoch: 5   Global Step: 209160   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:47,646-Speed 2630.88 samples/sec   Loss 10.3020   LearningRate 0.0559   Epoch: 5   Global Step: 209170   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:51,538-Speed 2631.25 samples/sec   Loss 10.2102   LearningRate 0.0559   Epoch: 5   Global Step: 209180   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:12:56,293-Speed 2154.50 samples/sec   Loss 10.1995   LearningRate 0.0559   Epoch: 5   Global Step: 209190   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:00,188-Speed 2629.58 samples/sec   Loss 10.2636   LearningRate 0.0559   Epoch: 5   Global Step: 209200   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:13:04,079-Speed 2632.19 samples/sec   Loss 10.0835   LearningRate 0.0559   Epoch: 5   Global Step: 209210   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:13:07,969-Speed 2633.03 samples/sec   Loss 10.1723   LearningRate 0.0559   Epoch: 5   Global Step: 209220   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:13:11,866-Speed 2628.13 samples/sec   Loss 10.3657   LearningRate 0.0559   Epoch: 5   Global Step: 209230   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:13:15,761-Speed 2629.88 samples/sec   Loss 10.1915   LearningRate 0.0559   Epoch: 5   Global Step: 209240   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:13:19,657-Speed 2629.79 samples/sec   Loss 10.2623   LearningRate 0.0559   Epoch: 5   Global Step: 209250   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:13:23,552-Speed 2629.11 samples/sec   Loss 10.2230   LearningRate 0.0559   Epoch: 5   Global Step: 209260   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:13:27,428-Speed 2642.78 samples/sec   Loss 10.1954   LearningRate 0.0559   Epoch: 5   Global Step: 209270   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:31,396-Speed 2581.46 samples/sec   Loss 10.2502   LearningRate 0.0559   Epoch: 5   Global Step: 209280   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:35,321-Speed 2609.20 samples/sec   Loss 10.2160   LearningRate 0.0559   Epoch: 5   Global Step: 209290   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:39,274-Speed 2591.25 samples/sec   Loss 10.1025   LearningRate 0.0559   Epoch: 5   Global Step: 209300   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:43,170-Speed 2629.14 samples/sec   Loss 10.3507   LearningRate 0.0559   Epoch: 5   Global Step: 209310   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:47,062-Speed 2631.68 samples/sec   Loss 10.2316   LearningRate 0.0559   Epoch: 5   Global Step: 209320   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:50,964-Speed 2625.11 samples/sec   Loss 10.1128   LearningRate 0.0559   Epoch: 5   Global Step: 209330   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:54,860-Speed 2629.37 samples/sec   Loss 10.2683   LearningRate 0.0559   Epoch: 5   Global Step: 209340   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:13:58,749-Speed 2634.19 samples/sec   Loss 10.1848   LearningRate 0.0559   Epoch: 5   Global Step: 209350   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:14:02,640-Speed 2631.67 samples/sec   Loss 10.3750   LearningRate 0.0559   Epoch: 5   Global Step: 209360   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:14:06,665-Speed 2545.12 samples/sec   Loss 10.1466   LearningRate 0.0559   Epoch: 5   Global Step: 209370   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:10,756-Speed 2503.36 samples/sec   Loss 10.3312   LearningRate 0.0559   Epoch: 5   Global Step: 209380   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:14,795-Speed 2536.43 samples/sec   Loss 10.2260   LearningRate 0.0559   Epoch: 5   Global Step: 209390   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:18,710-Speed 2615.56 samples/sec   Loss 10.2792   LearningRate 0.0559   Epoch: 5   Global Step: 209400   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:22,606-Speed 2629.64 samples/sec   Loss 10.1602   LearningRate 0.0559   Epoch: 5   Global Step: 209410   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:26,500-Speed 2630.04 samples/sec   Loss 10.1357   LearningRate 0.0559   Epoch: 5   Global Step: 209420   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:30,415-Speed 2616.57 samples/sec   Loss 10.1345   LearningRate 0.0559   Epoch: 5   Global Step: 209430   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:34,322-Speed 2621.61 samples/sec   Loss 10.2345   LearningRate 0.0559   Epoch: 5   Global Step: 209440   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:38,245-Speed 2610.77 samples/sec   Loss 10.2325   LearningRate 0.0559   Epoch: 5   Global Step: 209450   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:42,147-Speed 2624.55 samples/sec   Loss 10.2051   LearningRate 0.0559   Epoch: 5   Global Step: 209460   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:46,051-Speed 2624.65 samples/sec   Loss 10.2689   LearningRate 0.0559   Epoch: 5   Global Step: 209470   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:14:49,935-Speed 2637.18 samples/sec   Loss 10.1544   LearningRate 0.0559   Epoch: 5   Global Step: 209480   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:53,839-Speed 2623.84 samples/sec   Loss 10.2268   LearningRate 0.0559   Epoch: 5   Global Step: 209490   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:14:57,760-Speed 2612.32 samples/sec   Loss 10.1811   LearningRate 0.0559   Epoch: 5   Global Step: 209500   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:01,667-Speed 2621.68 samples/sec   Loss 10.3559   LearningRate 0.0559   Epoch: 5   Global Step: 209510   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:05,558-Speed 2632.12 samples/sec   Loss 10.1576   LearningRate 0.0559   Epoch: 5   Global Step: 209520   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:09,449-Speed 2632.06 samples/sec   Loss 10.1893   LearningRate 0.0559   Epoch: 5   Global Step: 209530   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:13,346-Speed 2629.07 samples/sec   Loss 10.1281   LearningRate 0.0559   Epoch: 5   Global Step: 209540   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:17,236-Speed 2632.61 samples/sec   Loss 10.2050   LearningRate 0.0559   Epoch: 5   Global Step: 209550   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:21,125-Speed 2633.86 samples/sec   Loss 10.2056   LearningRate 0.0559   Epoch: 5   Global Step: 209560   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:25,029-Speed 2623.50 samples/sec   Loss 10.1803   LearningRate 0.0559   Epoch: 5   Global Step: 209570   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:28,910-Speed 2639.14 samples/sec   Loss 10.2458   LearningRate 0.0559   Epoch: 5   Global Step: 209580   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:32,807-Speed 2628.67 samples/sec   Loss 10.1349   LearningRate 0.0559   Epoch: 5   Global Step: 209590   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:36,701-Speed 2629.91 samples/sec   Loss 10.2076   LearningRate 0.0559   Epoch: 5   Global Step: 209600   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:40,594-Speed 2630.62 samples/sec   Loss 10.1696   LearningRate 0.0559   Epoch: 5   Global Step: 209610   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:44,484-Speed 2633.62 samples/sec   Loss 10.2533   LearningRate 0.0558   Epoch: 5   Global Step: 209620   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:48,379-Speed 2629.70 samples/sec   Loss 10.2718   LearningRate 0.0558   Epoch: 5   Global Step: 209630   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:52,279-Speed 2626.31 samples/sec   Loss 10.1063   LearningRate 0.0558   Epoch: 5   Global Step: 209640   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:15:56,198-Speed 2613.42 samples/sec   Loss 10.0807   LearningRate 0.0558   Epoch: 5   Global Step: 209650   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:16:00,094-Speed 2629.21 samples/sec   Loss 10.1517   LearningRate 0.0558   Epoch: 5   Global Step: 209660   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:16:03,986-Speed 2631.97 samples/sec   Loss 10.1760   LearningRate 0.0558   Epoch: 5   Global Step: 209670   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:16:07,879-Speed 2630.72 samples/sec   Loss 10.1047   LearningRate 0.0558   Epoch: 5   Global Step: 209680   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:16:11,757-Speed 2641.56 samples/sec   Loss 10.1884   LearningRate 0.0558   Epoch: 5   Global Step: 209690   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:16:15,632-Speed 2643.34 samples/sec   Loss 10.2628   LearningRate 0.0558   Epoch: 5   Global Step: 209700   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:19,527-Speed 2629.30 samples/sec   Loss 10.2766   LearningRate 0.0558   Epoch: 5   Global Step: 209710   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:23,419-Speed 2631.55 samples/sec   Loss 10.3392   LearningRate 0.0558   Epoch: 5   Global Step: 209720   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:27,315-Speed 2629.40 samples/sec   Loss 10.1931   LearningRate 0.0558   Epoch: 5   Global Step: 209730   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:31,216-Speed 2625.67 samples/sec   Loss 10.0785   LearningRate 0.0558   Epoch: 5   Global Step: 209740   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:35,122-Speed 2622.57 samples/sec   Loss 10.1159   LearningRate 0.0558   Epoch: 5   Global Step: 209750   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:39,034-Speed 2617.83 samples/sec   Loss 10.1624   LearningRate 0.0558   Epoch: 5   Global Step: 209760   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:42,936-Speed 2625.15 samples/sec   Loss 10.2672   LearningRate 0.0558   Epoch: 5   Global Step: 209770   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:46,842-Speed 2622.36 samples/sec   Loss 10.2930   LearningRate 0.0558   Epoch: 5   Global Step: 209780   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:50,733-Speed 2632.23 samples/sec   Loss 10.2097   LearningRate 0.0558   Epoch: 5   Global Step: 209790   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:16:54,625-Speed 2631.76 samples/sec   Loss 10.0654   LearningRate 0.0558   Epoch: 5   Global Step: 209800   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:16:58,522-Speed 2628.63 samples/sec   Loss 10.0295   LearningRate 0.0558   Epoch: 5   Global Step: 209810   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:17:02,412-Speed 2632.88 samples/sec   Loss 10.1445   LearningRate 0.0558   Epoch: 5   Global Step: 209820   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:17:06,309-Speed 2628.22 samples/sec   Loss 10.2062   LearningRate 0.0558   Epoch: 5   Global Step: 209830   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:17:10,187-Speed 2641.49 samples/sec   Loss 10.1323   LearningRate 0.0558   Epoch: 5   Global Step: 209840   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:14,083-Speed 2628.88 samples/sec   Loss 10.1948   LearningRate 0.0558   Epoch: 5   Global Step: 209850   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:17,975-Speed 2631.32 samples/sec   Loss 10.1950   LearningRate 0.0558   Epoch: 5   Global Step: 209860   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:21,871-Speed 2629.14 samples/sec   Loss 10.2500   LearningRate 0.0558   Epoch: 5   Global Step: 209870   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:25,819-Speed 2594.39 samples/sec   Loss 10.1438   LearningRate 0.0558   Epoch: 5   Global Step: 209880   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:29,712-Speed 2631.26 samples/sec   Loss 10.2083   LearningRate 0.0558   Epoch: 5   Global Step: 209890   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:33,604-Speed 2631.73 samples/sec   Loss 10.2260   LearningRate 0.0558   Epoch: 5   Global Step: 209900   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:37,494-Speed 2632.76 samples/sec   Loss 10.2018   LearningRate 0.0558   Epoch: 5   Global Step: 209910   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:41,386-Speed 2632.20 samples/sec   Loss 10.1746   LearningRate 0.0558   Epoch: 5   Global Step: 209920   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:45,277-Speed 2632.29 samples/sec   Loss 10.2572   LearningRate 0.0558   Epoch: 5   Global Step: 209930   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:17:49,171-Speed 2630.46 samples/sec   Loss 10.1229   LearningRate 0.0558   Epoch: 5   Global Step: 209940   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:17:53,070-Speed 2626.59 samples/sec   Loss 10.0628   LearningRate 0.0558   Epoch: 5   Global Step: 209950   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:17:56,967-Speed 2628.67 samples/sec   Loss 10.2470   LearningRate 0.0558   Epoch: 5   Global Step: 209960   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:18:00,864-Speed 2628.37 samples/sec   Loss 10.1227   LearningRate 0.0558   Epoch: 5   Global Step: 209970   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:18:04,764-Speed 2626.74 samples/sec   Loss 10.1346   LearningRate 0.0558   Epoch: 5   Global Step: 209980   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:18:08,661-Speed 2627.69 samples/sec   Loss 10.1441   LearningRate 0.0558   Epoch: 5   Global Step: 209990   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:18:12,555-Speed 2630.29 samples/sec   Loss 10.1887   LearningRate 0.0558   Epoch: 5   Global Step: 210000   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:18:55,793-[lfw][210000]XNorm: 24.450740
Training: 2022-04-13 19:18:55,794-[lfw][210000]Accuracy-Flip: 0.99650+-0.00345
Training: 2022-04-13 19:18:55,794-[lfw][210000]Accuracy-Highest: 0.99783
Training: 2022-04-13 19:19:45,881-[cfp_fp][210000]XNorm: 22.013891
Training: 2022-04-13 19:19:45,881-[cfp_fp][210000]Accuracy-Flip: 0.98214+-0.00607
Training: 2022-04-13 19:19:45,883-[cfp_fp][210000]Accuracy-Highest: 0.98314
Training: 2022-04-13 19:20:28,899-[agedb_30][210000]XNorm: 24.174806
Training: 2022-04-13 19:20:28,900-[agedb_30][210000]Accuracy-Flip: 0.96983+-0.00769
Training: 2022-04-13 19:20:28,901-[agedb_30][210000]Accuracy-Highest: 0.97150
Training: 2022-04-13 19:20:32,768-Speed 73.03 samples/sec   Loss 10.1919   LearningRate 0.0558   Epoch: 5   Global Step: 210010   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:20:36,627-Speed 2654.07 samples/sec   Loss 10.1917   LearningRate 0.0558   Epoch: 5   Global Step: 210020   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:20:40,493-Speed 2649.14 samples/sec   Loss 10.2132   LearningRate 0.0558   Epoch: 5   Global Step: 210030   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:20:44,473-Speed 2573.71 samples/sec   Loss 10.1918   LearningRate 0.0558   Epoch: 5   Global Step: 210040   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:20:48,349-Speed 2642.72 samples/sec   Loss 9.9840   LearningRate 0.0558   Epoch: 5   Global Step: 210050   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:20:52,219-Speed 2646.49 samples/sec   Loss 10.1846   LearningRate 0.0558   Epoch: 5   Global Step: 210060   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:20:56,128-Speed 2620.13 samples/sec   Loss 10.2476   LearningRate 0.0558   Epoch: 5   Global Step: 210070   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:00,005-Speed 2642.23 samples/sec   Loss 10.1166   LearningRate 0.0558   Epoch: 5   Global Step: 210080   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:03,885-Speed 2639.99 samples/sec   Loss 10.1512   LearningRate 0.0558   Epoch: 5   Global Step: 210090   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:07,763-Speed 2640.97 samples/sec   Loss 10.3491   LearningRate 0.0558   Epoch: 5   Global Step: 210100   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:11,651-Speed 2634.53 samples/sec   Loss 10.2785   LearningRate 0.0558   Epoch: 5   Global Step: 210110   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:15,535-Speed 2638.02 samples/sec   Loss 10.2470   LearningRate 0.0558   Epoch: 5   Global Step: 210120   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:19,423-Speed 2634.80 samples/sec   Loss 10.0660   LearningRate 0.0558   Epoch: 5   Global Step: 210130   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:23,349-Speed 2608.46 samples/sec   Loss 10.1811   LearningRate 0.0558   Epoch: 5   Global Step: 210140   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:27,239-Speed 2633.48 samples/sec   Loss 10.1758   LearningRate 0.0558   Epoch: 5   Global Step: 210150   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:21:31,124-Speed 2636.29 samples/sec   Loss 10.0067   LearningRate 0.0558   Epoch: 5   Global Step: 210160   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:21:34,996-Speed 2645.20 samples/sec   Loss 10.2034   LearningRate 0.0557   Epoch: 5   Global Step: 210170   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:38,886-Speed 2633.23 samples/sec   Loss 10.2643   LearningRate 0.0557   Epoch: 5   Global Step: 210180   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:42,776-Speed 2633.22 samples/sec   Loss 10.1790   LearningRate 0.0557   Epoch: 5   Global Step: 210190   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:46,665-Speed 2633.87 samples/sec   Loss 10.2525   LearningRate 0.0557   Epoch: 5   Global Step: 210200   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:50,572-Speed 2621.50 samples/sec   Loss 10.2492   LearningRate 0.0557   Epoch: 5   Global Step: 210210   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:54,491-Speed 2613.27 samples/sec   Loss 10.1375   LearningRate 0.0557   Epoch: 5   Global Step: 210220   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:21:58,391-Speed 2626.65 samples/sec   Loss 10.0901   LearningRate 0.0557   Epoch: 5   Global Step: 210230   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:22:02,285-Speed 2630.37 samples/sec   Loss 10.1774   LearningRate 0.0557   Epoch: 5   Global Step: 210240   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:06,241-Speed 2589.12 samples/sec   Loss 10.1780   LearningRate 0.0557   Epoch: 5   Global Step: 210250   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:10,133-Speed 2632.32 samples/sec   Loss 10.1509   LearningRate 0.0557   Epoch: 5   Global Step: 210260   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:14,029-Speed 2628.96 samples/sec   Loss 10.0641   LearningRate 0.0557   Epoch: 5   Global Step: 210270   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:17,927-Speed 2627.25 samples/sec   Loss 10.2250   LearningRate 0.0557   Epoch: 5   Global Step: 210280   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:21,825-Speed 2628.30 samples/sec   Loss 10.2398   LearningRate 0.0557   Epoch: 5   Global Step: 210290   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:25,720-Speed 2629.02 samples/sec   Loss 10.0868   LearningRate 0.0557   Epoch: 5   Global Step: 210300   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:29,613-Speed 2630.92 samples/sec   Loss 10.0943   LearningRate 0.0557   Epoch: 5   Global Step: 210310   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:33,510-Speed 2629.11 samples/sec   Loss 10.1441   LearningRate 0.0557   Epoch: 5   Global Step: 210320   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:37,400-Speed 2633.17 samples/sec   Loss 10.1125   LearningRate 0.0557   Epoch: 5   Global Step: 210330   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:22:41,290-Speed 2633.22 samples/sec   Loss 10.2013   LearningRate 0.0557   Epoch: 5   Global Step: 210340   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:22:45,186-Speed 2628.84 samples/sec   Loss 10.0513   LearningRate 0.0557   Epoch: 5   Global Step: 210350   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:22:49,076-Speed 2633.25 samples/sec   Loss 10.1727   LearningRate 0.0557   Epoch: 5   Global Step: 210360   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:22:52,965-Speed 2633.52 samples/sec   Loss 10.3022   LearningRate 0.0557   Epoch: 5   Global Step: 210370   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:22:56,856-Speed 2632.18 samples/sec   Loss 10.2713   LearningRate 0.0557   Epoch: 5   Global Step: 210380   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:23:00,749-Speed 2631.23 samples/sec   Loss 10.1817   LearningRate 0.0557   Epoch: 5   Global Step: 210390   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:23:04,645-Speed 2628.66 samples/sec   Loss 10.2585   LearningRate 0.0557   Epoch: 5   Global Step: 210400   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:23:08,540-Speed 2629.76 samples/sec   Loss 10.0575   LearningRate 0.0557   Epoch: 5   Global Step: 210410   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:23:12,480-Speed 2600.00 samples/sec   Loss 10.2934   LearningRate 0.0557   Epoch: 5   Global Step: 210420   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:23:16,378-Speed 2628.13 samples/sec   Loss 10.2617   LearningRate 0.0557   Epoch: 5   Global Step: 210430   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:23:20,284-Speed 2622.09 samples/sec   Loss 10.2623   LearningRate 0.0557   Epoch: 5   Global Step: 210440   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:23:24,225-Speed 2598.95 samples/sec   Loss 10.1867   LearningRate 0.0557   Epoch: 5   Global Step: 210450   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:23:28,092-Speed 2648.63 samples/sec   Loss 10.2117   LearningRate 0.0557   Epoch: 5   Global Step: 210460   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:31,988-Speed 2628.57 samples/sec   Loss 9.9907   LearningRate 0.0557   Epoch: 5   Global Step: 210470   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:35,894-Speed 2622.88 samples/sec   Loss 10.1518   LearningRate 0.0557   Epoch: 5   Global Step: 210480   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:39,800-Speed 2621.98 samples/sec   Loss 10.1344   LearningRate 0.0557   Epoch: 5   Global Step: 210490   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:43,705-Speed 2622.94 samples/sec   Loss 10.3725   LearningRate 0.0557   Epoch: 5   Global Step: 210500   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:47,607-Speed 2624.83 samples/sec   Loss 10.1875   LearningRate 0.0557   Epoch: 5   Global Step: 210510   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:51,514-Speed 2621.92 samples/sec   Loss 10.2259   LearningRate 0.0557   Epoch: 5   Global Step: 210520   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:55,427-Speed 2616.96 samples/sec   Loss 10.2289   LearningRate 0.0557   Epoch: 5   Global Step: 210530   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:23:59,339-Speed 2618.60 samples/sec   Loss 10.1435   LearningRate 0.0557   Epoch: 5   Global Step: 210540   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:03,239-Speed 2625.80 samples/sec   Loss 10.1141   LearningRate 0.0557   Epoch: 5   Global Step: 210550   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:07,137-Speed 2627.47 samples/sec   Loss 10.2015   LearningRate 0.0557   Epoch: 5   Global Step: 210560   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:24:11,034-Speed 2628.27 samples/sec   Loss 9.9223   LearningRate 0.0557   Epoch: 5   Global Step: 210570   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:24:14,934-Speed 2626.37 samples/sec   Loss 10.1115   LearningRate 0.0557   Epoch: 5   Global Step: 210580   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:24:18,835-Speed 2625.83 samples/sec   Loss 10.1705   LearningRate 0.0557   Epoch: 5   Global Step: 210590   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:24:22,733-Speed 2627.47 samples/sec   Loss 10.1948   LearningRate 0.0557   Epoch: 5   Global Step: 210600   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:24:26,630-Speed 2628.19 samples/sec   Loss 10.0486   LearningRate 0.0557   Epoch: 5   Global Step: 210610   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:24:30,518-Speed 2634.11 samples/sec   Loss 10.2177   LearningRate 0.0557   Epoch: 5   Global Step: 210620   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:34,414-Speed 2629.19 samples/sec   Loss 10.1990   LearningRate 0.0557   Epoch: 5   Global Step: 210630   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:38,317-Speed 2624.08 samples/sec   Loss 10.3333   LearningRate 0.0557   Epoch: 5   Global Step: 210640   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:42,218-Speed 2625.34 samples/sec   Loss 10.2414   LearningRate 0.0557   Epoch: 5   Global Step: 210650   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:46,117-Speed 2627.62 samples/sec   Loss 10.1277   LearningRate 0.0557   Epoch: 5   Global Step: 210660   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:50,028-Speed 2618.48 samples/sec   Loss 10.1846   LearningRate 0.0557   Epoch: 5   Global Step: 210670   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:53,927-Speed 2626.68 samples/sec   Loss 10.2920   LearningRate 0.0557   Epoch: 5   Global Step: 210680   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:24:57,829-Speed 2625.57 samples/sec   Loss 10.0526   LearningRate 0.0557   Epoch: 5   Global Step: 210690   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:25:01,728-Speed 2626.43 samples/sec   Loss 10.1982   LearningRate 0.0557   Epoch: 5   Global Step: 210700   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:25:05,638-Speed 2619.47 samples/sec   Loss 10.1887   LearningRate 0.0557   Epoch: 5   Global Step: 210710   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:25:09,539-Speed 2625.62 samples/sec   Loss 10.2778   LearningRate 0.0557   Epoch: 5   Global Step: 210720   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:13,478-Speed 2600.05 samples/sec   Loss 10.1750   LearningRate 0.0556   Epoch: 5   Global Step: 210730   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:17,380-Speed 2624.92 samples/sec   Loss 10.1711   LearningRate 0.0556   Epoch: 5   Global Step: 210740   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:21,285-Speed 2623.29 samples/sec   Loss 9.9844   LearningRate 0.0556   Epoch: 5   Global Step: 210750   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:25,188-Speed 2624.48 samples/sec   Loss 10.1572   LearningRate 0.0556   Epoch: 5   Global Step: 210760   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:29,098-Speed 2619.07 samples/sec   Loss 10.1847   LearningRate 0.0556   Epoch: 5   Global Step: 210770   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:32,997-Speed 2626.77 samples/sec   Loss 10.0585   LearningRate 0.0556   Epoch: 5   Global Step: 210780   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:36,914-Speed 2614.64 samples/sec   Loss 10.2087   LearningRate 0.0556   Epoch: 5   Global Step: 210790   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:40,824-Speed 2620.16 samples/sec   Loss 10.3117   LearningRate 0.0556   Epoch: 5   Global Step: 210800   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:44,732-Speed 2620.94 samples/sec   Loss 10.1242   LearningRate 0.0556   Epoch: 5   Global Step: 210810   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:25:48,633-Speed 2625.20 samples/sec   Loss 10.1794   LearningRate 0.0556   Epoch: 5   Global Step: 210820   Fp16 Grad Scale: 262144   Required: 70 hours
Training: 2022-04-13 19:25:52,471-Speed 2668.48 samples/sec   Loss 10.2761   LearningRate 0.0556   Epoch: 5   Global Step: 210830   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:25:56,345-Speed 2644.39 samples/sec   Loss 10.3384   LearningRate 0.0556   Epoch: 5   Global Step: 210840   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:00,242-Speed 2627.94 samples/sec   Loss 10.1931   LearningRate 0.0556   Epoch: 5   Global Step: 210850   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:04,149-Speed 2621.52 samples/sec   Loss 10.1985   LearningRate 0.0556   Epoch: 5   Global Step: 210860   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:08,058-Speed 2620.29 samples/sec   Loss 10.2912   LearningRate 0.0556   Epoch: 5   Global Step: 210870   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:11,968-Speed 2619.36 samples/sec   Loss 10.2146   LearningRate 0.0556   Epoch: 5   Global Step: 210880   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:15,866-Speed 2627.52 samples/sec   Loss 10.0060   LearningRate 0.0556   Epoch: 5   Global Step: 210890   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:19,769-Speed 2624.59 samples/sec   Loss 10.2251   LearningRate 0.0556   Epoch: 5   Global Step: 210900   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:23,674-Speed 2622.73 samples/sec   Loss 10.3475   LearningRate 0.0556   Epoch: 5   Global Step: 210910   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:27,572-Speed 2627.83 samples/sec   Loss 10.1959   LearningRate 0.0556   Epoch: 5   Global Step: 210920   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:31,471-Speed 2626.95 samples/sec   Loss 10.2786   LearningRate 0.0556   Epoch: 5   Global Step: 210930   Fp16 Grad Scale: 8192   Required: 70 hours
Training: 2022-04-13 19:26:35,370-Speed 2626.33 samples/sec   Loss 10.0667   LearningRate 0.0556   Epoch: 5   Global Step: 210940   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:26:39,267-Speed 2628.38 samples/sec   Loss 10.1806   LearningRate 0.0556   Epoch: 5   Global Step: 210950   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:26:43,165-Speed 2627.86 samples/sec   Loss 10.1512   LearningRate 0.0556   Epoch: 5   Global Step: 210960   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:26:47,066-Speed 2624.99 samples/sec   Loss 10.1626   LearningRate 0.0556   Epoch: 5   Global Step: 210970   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:26:50,979-Speed 2617.80 samples/sec   Loss 10.2326   LearningRate 0.0556   Epoch: 5   Global Step: 210980   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:26:54,888-Speed 2620.57 samples/sec   Loss 10.2345   LearningRate 0.0556   Epoch: 5   Global Step: 210990   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:26:58,790-Speed 2624.91 samples/sec   Loss 10.0684   LearningRate 0.0556   Epoch: 5   Global Step: 211000   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:27:02,691-Speed 2625.23 samples/sec   Loss 10.3161   LearningRate 0.0556   Epoch: 5   Global Step: 211010   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:27:06,603-Speed 2618.53 samples/sec   Loss 10.1371   LearningRate 0.0556   Epoch: 5   Global Step: 211020   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:27:10,500-Speed 2627.79 samples/sec   Loss 10.1921   LearningRate 0.0556   Epoch: 5   Global Step: 211030   Fp16 Grad Scale: 16384   Required: 70 hours
Training: 2022-04-13 19:27:14,401-Speed 2626.02 samples/sec   Loss 10.1511   LearningRate 0.0556   Epoch: 5   Global Step: 211040   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:18,316-Speed 2616.79 samples/sec   Loss 10.1384   LearningRate 0.0556   Epoch: 5   Global Step: 211050   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:22,266-Speed 2592.92 samples/sec   Loss 10.2221   LearningRate 0.0556   Epoch: 5   Global Step: 211060   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:26,167-Speed 2625.65 samples/sec   Loss 10.3201   LearningRate 0.0556   Epoch: 5   Global Step: 211070   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:30,076-Speed 2619.86 samples/sec   Loss 10.1340   LearningRate 0.0556   Epoch: 5   Global Step: 211080   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:33,983-Speed 2621.65 samples/sec   Loss 10.1888   LearningRate 0.0556   Epoch: 5   Global Step: 211090   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:37,892-Speed 2620.23 samples/sec   Loss 10.0708   LearningRate 0.0556   Epoch: 5   Global Step: 211100   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:41,801-Speed 2620.52 samples/sec   Loss 10.2007   LearningRate 0.0556   Epoch: 5   Global Step: 211110   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:45,739-Speed 2600.74 samples/sec   Loss 10.1394   LearningRate 0.0556   Epoch: 5   Global Step: 211120   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:49,816-Speed 2512.36 samples/sec   Loss 10.1230   LearningRate 0.0556   Epoch: 5   Global Step: 211130   Fp16 Grad Scale: 32768   Required: 70 hours
Training: 2022-04-13 19:27:53,806-Speed 2566.61 samples/sec   Loss 10.1460   LearningRate 0.0556   Epoch: 5   Global Step: 211140   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:27:57,723-Speed 2614.70 samples/sec   Loss 9.9662   LearningRate 0.0556   Epoch: 5   Global Step: 211150   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:01,628-Speed 2622.78 samples/sec   Loss 10.1193   LearningRate 0.0556   Epoch: 5   Global Step: 211160   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:05,536-Speed 2621.00 samples/sec   Loss 10.0186   LearningRate 0.0556   Epoch: 5   Global Step: 211170   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:09,450-Speed 2616.79 samples/sec   Loss 10.1189   LearningRate 0.0556   Epoch: 5   Global Step: 211180   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:13,356-Speed 2622.39 samples/sec   Loss 10.2578   LearningRate 0.0556   Epoch: 5   Global Step: 211190   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:17,262-Speed 2622.11 samples/sec   Loss 10.0660   LearningRate 0.0556   Epoch: 5   Global Step: 211200   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:21,175-Speed 2617.80 samples/sec   Loss 9.9885   LearningRate 0.0556   Epoch: 5   Global Step: 211210   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:25,082-Speed 2621.39 samples/sec   Loss 10.0794   LearningRate 0.0556   Epoch: 5   Global Step: 211220   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:28,985-Speed 2624.14 samples/sec   Loss 10.1331   LearningRate 0.0556   Epoch: 5   Global Step: 211230   Fp16 Grad Scale: 65536   Required: 70 hours
Training: 2022-04-13 19:28:32,891-Speed 2622.28 samples/sec   Loss 10.0917   LearningRate 0.0556   Epoch: 5   Global Step: 211240   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:28:36,804-Speed 2617.53 samples/sec   Loss 10.1733   LearningRate 0.0556   Epoch: 5   Global Step: 211250   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:28:40,704-Speed 2626.28 samples/sec   Loss 10.0458   LearningRate 0.0556   Epoch: 5   Global Step: 211260   Fp16 Grad Scale: 131072   Required: 70 hours
Training: 2022-04-13 19:28:44,609-Speed 2622.50 samples/sec   Loss 10.0672   LearningRate 0.0556   Epoch: 5   Global Step: 211270   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:28:48,512-Speed 2624.64 samples/sec   Loss 10.1543   LearningRate 0.0555   Epoch: 5   Global Step: 211280   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:28:52,416-Speed 2623.25 samples/sec   Loss 10.0823   LearningRate 0.0555   Epoch: 5   Global Step: 211290   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:28:56,317-Speed 2625.89 samples/sec   Loss 10.1985   LearningRate 0.0555   Epoch: 5   Global Step: 211300   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:00,230-Speed 2617.19 samples/sec   Loss 9.9412   LearningRate 0.0555   Epoch: 5   Global Step: 211310   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:04,139-Speed 2620.20 samples/sec   Loss 10.3201   LearningRate 0.0555   Epoch: 5   Global Step: 211320   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:08,056-Speed 2614.99 samples/sec   Loss 10.1608   LearningRate 0.0555   Epoch: 5   Global Step: 211330   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:11,947-Speed 2632.53 samples/sec   Loss 10.1609   LearningRate 0.0555   Epoch: 5   Global Step: 211340   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:15,854-Speed 2621.59 samples/sec   Loss 10.1651   LearningRate 0.0555   Epoch: 5   Global Step: 211350   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:19,764-Speed 2618.88 samples/sec   Loss 9.9713   LearningRate 0.0555   Epoch: 5   Global Step: 211360   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:23,668-Speed 2623.79 samples/sec   Loss 10.1756   LearningRate 0.0555   Epoch: 5   Global Step: 211370   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:27,569-Speed 2625.37 samples/sec   Loss 10.0961   LearningRate 0.0555   Epoch: 5   Global Step: 211380   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:31,472-Speed 2624.52 samples/sec   Loss 10.2439   LearningRate 0.0555   Epoch: 5   Global Step: 211390   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:35,376-Speed 2623.50 samples/sec   Loss 10.1738   LearningRate 0.0555   Epoch: 5   Global Step: 211400   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:39,280-Speed 2623.64 samples/sec   Loss 10.0216   LearningRate 0.0555   Epoch: 5   Global Step: 211410   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:43,181-Speed 2625.78 samples/sec   Loss 10.2341   LearningRate 0.0555   Epoch: 5   Global Step: 211420   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:47,087-Speed 2622.36 samples/sec   Loss 10.1856   LearningRate 0.0555   Epoch: 5   Global Step: 211430   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:50,977-Speed 2632.46 samples/sec   Loss 10.2407   LearningRate 0.0555   Epoch: 5   Global Step: 211440   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:54,882-Speed 2623.13 samples/sec   Loss 10.0431   LearningRate 0.0555   Epoch: 5   Global Step: 211450   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:29:58,784-Speed 2624.22 samples/sec   Loss 10.0909   LearningRate 0.0555   Epoch: 5   Global Step: 211460   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:30:02,689-Speed 2623.68 samples/sec   Loss 10.0293   LearningRate 0.0555   Epoch: 5   Global Step: 211470   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:30:06,594-Speed 2622.34 samples/sec   Loss 10.1527   LearningRate 0.0555   Epoch: 5   Global Step: 211480   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:30:10,504-Speed 2619.97 samples/sec   Loss 10.0855   LearningRate 0.0555   Epoch: 5   Global Step: 211490   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:30:14,408-Speed 2623.37 samples/sec   Loss 10.2199   LearningRate 0.0555   Epoch: 5   Global Step: 211500   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:30:18,312-Speed 2623.66 samples/sec   Loss 10.1278   LearningRate 0.0555   Epoch: 5   Global Step: 211510   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:30:22,214-Speed 2625.29 samples/sec   Loss 10.1111   LearningRate 0.0555   Epoch: 5   Global Step: 211520   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:30:26,102-Speed 2633.91 samples/sec   Loss 10.1553   LearningRate 0.0555   Epoch: 5   Global Step: 211530   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:30,012-Speed 2619.25 samples/sec   Loss 10.2680   LearningRate 0.0555   Epoch: 5   Global Step: 211540   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:33,914-Speed 2625.31 samples/sec   Loss 10.1072   LearningRate 0.0555   Epoch: 5   Global Step: 211550   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:37,826-Speed 2617.75 samples/sec   Loss 10.1725   LearningRate 0.0555   Epoch: 5   Global Step: 211560   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:41,726-Speed 2626.25 samples/sec   Loss 10.0600   LearningRate 0.0555   Epoch: 5   Global Step: 211570   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:45,634-Speed 2621.15 samples/sec   Loss 10.0241   LearningRate 0.0555   Epoch: 5   Global Step: 211580   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:49,536-Speed 2625.33 samples/sec   Loss 10.2179   LearningRate 0.0555   Epoch: 5   Global Step: 211590   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:53,436-Speed 2626.42 samples/sec   Loss 10.1039   LearningRate 0.0555   Epoch: 5   Global Step: 211600   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:30:57,337-Speed 2625.43 samples/sec   Loss 10.1268   LearningRate 0.0555   Epoch: 5   Global Step: 211610   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:01,238-Speed 2625.02 samples/sec   Loss 10.2151   LearningRate 0.0555   Epoch: 5   Global Step: 211620   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:05,142-Speed 2623.29 samples/sec   Loss 10.1066   LearningRate 0.0555   Epoch: 5   Global Step: 211630   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:31:09,033-Speed 2632.38 samples/sec   Loss 10.0953   LearningRate 0.0555   Epoch: 5   Global Step: 211640   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:12,929-Speed 2629.54 samples/sec   Loss 10.1013   LearningRate 0.0555   Epoch: 5   Global Step: 211650   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:16,834-Speed 2622.78 samples/sec   Loss 10.0850   LearningRate 0.0555   Epoch: 5   Global Step: 211660   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:20,740-Speed 2622.50 samples/sec   Loss 10.1922   LearningRate 0.0555   Epoch: 5   Global Step: 211670   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:24,644-Speed 2624.05 samples/sec   Loss 10.2748   LearningRate 0.0555   Epoch: 5   Global Step: 211680   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:28,547-Speed 2623.93 samples/sec   Loss 10.2802   LearningRate 0.0555   Epoch: 5   Global Step: 211690   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:32,457-Speed 2619.63 samples/sec   Loss 10.2168   LearningRate 0.0555   Epoch: 5   Global Step: 211700   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:36,359-Speed 2624.61 samples/sec   Loss 10.2276   LearningRate 0.0555   Epoch: 5   Global Step: 211710   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:40,261-Speed 2625.03 samples/sec   Loss 10.1493   LearningRate 0.0555   Epoch: 5   Global Step: 211720   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:44,162-Speed 2625.95 samples/sec   Loss 10.1244   LearningRate 0.0555   Epoch: 5   Global Step: 211730   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:31:48,060-Speed 2627.04 samples/sec   Loss 10.2042   LearningRate 0.0555   Epoch: 5   Global Step: 211740   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:31:51,964-Speed 2623.61 samples/sec   Loss 10.1774   LearningRate 0.0555   Epoch: 5   Global Step: 211750   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:31:55,867-Speed 2624.67 samples/sec   Loss 10.1623   LearningRate 0.0555   Epoch: 5   Global Step: 211760   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:31:59,769-Speed 2624.35 samples/sec   Loss 10.1256   LearningRate 0.0555   Epoch: 5   Global Step: 211770   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:32:03,678-Speed 2620.84 samples/sec   Loss 10.2684   LearningRate 0.0555   Epoch: 5   Global Step: 211780   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:32:07,578-Speed 2625.99 samples/sec   Loss 9.9127   LearningRate 0.0555   Epoch: 5   Global Step: 211790   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:32:11,480-Speed 2625.04 samples/sec   Loss 10.0331   LearningRate 0.0555   Epoch: 5   Global Step: 211800   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:32:15,382-Speed 2624.36 samples/sec   Loss 10.1048   LearningRate 0.0555   Epoch: 5   Global Step: 211810   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:32:19,287-Speed 2623.19 samples/sec   Loss 10.1211   LearningRate 0.0555   Epoch: 5   Global Step: 211820   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:32:23,195-Speed 2620.53 samples/sec   Loss 10.1074   LearningRate 0.0555   Epoch: 5   Global Step: 211830   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:32:27,099-Speed 2624.07 samples/sec   Loss 10.0940   LearningRate 0.0554   Epoch: 5   Global Step: 211840   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:32:31,004-Speed 2622.91 samples/sec   Loss 10.0175   LearningRate 0.0554   Epoch: 5   Global Step: 211850   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:32:34,836-Speed 2672.64 samples/sec   Loss 10.9874   LearningRate 0.0554   Epoch: 5   Global Step: 211860   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:32:38,738-Speed 2624.53 samples/sec   Loss 10.5663   LearningRate 0.0554   Epoch: 5   Global Step: 211870   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:32:42,638-Speed 2626.84 samples/sec   Loss 10.2804   LearningRate 0.0554   Epoch: 5   Global Step: 211880   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:32:46,549-Speed 2618.44 samples/sec   Loss 10.2284   LearningRate 0.0554   Epoch: 5   Global Step: 211890   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:32:50,448-Speed 2627.20 samples/sec   Loss 10.2934   LearningRate 0.0554   Epoch: 5   Global Step: 211900   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:32:54,354-Speed 2622.16 samples/sec   Loss 10.2174   LearningRate 0.0554   Epoch: 5   Global Step: 211910   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:32:58,267-Speed 2617.44 samples/sec   Loss 10.1605   LearningRate 0.0554   Epoch: 5   Global Step: 211920   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:33:02,174-Speed 2621.70 samples/sec   Loss 10.3022   LearningRate 0.0554   Epoch: 5   Global Step: 211930   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:33:06,097-Speed 2610.69 samples/sec   Loss 10.1307   LearningRate 0.0554   Epoch: 5   Global Step: 211940   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:33:10,004-Speed 2621.44 samples/sec   Loss 10.1084   LearningRate 0.0554   Epoch: 5   Global Step: 211950   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:33:13,910-Speed 2621.90 samples/sec   Loss 10.0934   LearningRate 0.0554   Epoch: 5   Global Step: 211960   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:17,831-Speed 2612.58 samples/sec   Loss 10.0829   LearningRate 0.0554   Epoch: 5   Global Step: 211970   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:21,729-Speed 2627.51 samples/sec   Loss 10.2402   LearningRate 0.0554   Epoch: 5   Global Step: 211980   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:25,640-Speed 2619.21 samples/sec   Loss 10.2310   LearningRate 0.0554   Epoch: 5   Global Step: 211990   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:29,552-Speed 2617.67 samples/sec   Loss 10.2151   LearningRate 0.0554   Epoch: 5   Global Step: 212000   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:33,456-Speed 2623.54 samples/sec   Loss 10.0501   LearningRate 0.0554   Epoch: 5   Global Step: 212010   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:37,353-Speed 2628.46 samples/sec   Loss 10.1725   LearningRate 0.0554   Epoch: 5   Global Step: 212020   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:41,253-Speed 2626.17 samples/sec   Loss 10.0609   LearningRate 0.0554   Epoch: 5   Global Step: 212030   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:45,150-Speed 2627.81 samples/sec   Loss 10.1716   LearningRate 0.0554   Epoch: 5   Global Step: 212040   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:49,052-Speed 2625.06 samples/sec   Loss 10.2343   LearningRate 0.0554   Epoch: 5   Global Step: 212050   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:33:52,956-Speed 2623.03 samples/sec   Loss 10.1094   LearningRate 0.0554   Epoch: 5   Global Step: 212060   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:33:56,860-Speed 2624.21 samples/sec   Loss 10.1718   LearningRate 0.0554   Epoch: 5   Global Step: 212070   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:00,758-Speed 2627.55 samples/sec   Loss 10.0912   LearningRate 0.0554   Epoch: 5   Global Step: 212080   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:04,661-Speed 2624.52 samples/sec   Loss 10.2962   LearningRate 0.0554   Epoch: 5   Global Step: 212090   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:08,564-Speed 2623.88 samples/sec   Loss 10.0589   LearningRate 0.0554   Epoch: 5   Global Step: 212100   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:12,467-Speed 2624.11 samples/sec   Loss 10.1192   LearningRate 0.0554   Epoch: 5   Global Step: 212110   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:16,368-Speed 2625.63 samples/sec   Loss 10.1077   LearningRate 0.0554   Epoch: 5   Global Step: 212120   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:20,277-Speed 2619.80 samples/sec   Loss 10.1732   LearningRate 0.0554   Epoch: 5   Global Step: 212130   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:24,181-Speed 2623.87 samples/sec   Loss 10.1707   LearningRate 0.0554   Epoch: 5   Global Step: 212140   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:28,082-Speed 2625.67 samples/sec   Loss 10.2095   LearningRate 0.0554   Epoch: 5   Global Step: 212150   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:34:31,987-Speed 2622.72 samples/sec   Loss 10.2704   LearningRate 0.0554   Epoch: 5   Global Step: 212160   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:34:35,897-Speed 2619.56 samples/sec   Loss 10.1100   LearningRate 0.0554   Epoch: 5   Global Step: 212170   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:34:39,809-Speed 2618.21 samples/sec   Loss 10.2212   LearningRate 0.0554   Epoch: 5   Global Step: 212180   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:34:43,714-Speed 2623.09 samples/sec   Loss 10.2064   LearningRate 0.0554   Epoch: 5   Global Step: 212190   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:34:47,615-Speed 2625.77 samples/sec   Loss 10.2096   LearningRate 0.0554   Epoch: 5   Global Step: 212200   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:34:51,531-Speed 2615.95 samples/sec   Loss 10.0939   LearningRate 0.0554   Epoch: 5   Global Step: 212210   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:34:55,451-Speed 2613.03 samples/sec   Loss 10.0855   LearningRate 0.0554   Epoch: 5   Global Step: 212220   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:34:59,354-Speed 2624.29 samples/sec   Loss 10.1417   LearningRate 0.0554   Epoch: 5   Global Step: 212230   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:35:03,267-Speed 2617.65 samples/sec   Loss 10.0282   LearningRate 0.0554   Epoch: 5   Global Step: 212240   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:35:07,174-Speed 2621.58 samples/sec   Loss 9.9935   LearningRate 0.0554   Epoch: 5   Global Step: 212250   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:35:11,078-Speed 2623.55 samples/sec   Loss 10.0121   LearningRate 0.0554   Epoch: 5   Global Step: 212260   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:35:14,946-Speed 2648.45 samples/sec   Loss 10.0243   LearningRate 0.0554   Epoch: 5   Global Step: 212270   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:18,852-Speed 2621.86 samples/sec   Loss 10.0981   LearningRate 0.0554   Epoch: 5   Global Step: 212280   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:22,760-Speed 2621.24 samples/sec   Loss 10.1227   LearningRate 0.0554   Epoch: 5   Global Step: 212290   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:26,665-Speed 2623.42 samples/sec   Loss 10.3192   LearningRate 0.0554   Epoch: 5   Global Step: 212300   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:30,568-Speed 2623.91 samples/sec   Loss 10.2721   LearningRate 0.0554   Epoch: 5   Global Step: 212310   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:34,475-Speed 2621.71 samples/sec   Loss 10.1312   LearningRate 0.0554   Epoch: 5   Global Step: 212320   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:38,388-Speed 2617.86 samples/sec   Loss 10.2010   LearningRate 0.0554   Epoch: 5   Global Step: 212330   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:42,308-Speed 2612.23 samples/sec   Loss 10.2171   LearningRate 0.0554   Epoch: 5   Global Step: 212340   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:46,218-Speed 2619.65 samples/sec   Loss 10.1024   LearningRate 0.0554   Epoch: 5   Global Step: 212350   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:50,127-Speed 2620.34 samples/sec   Loss 10.1842   LearningRate 0.0554   Epoch: 5   Global Step: 212360   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:35:54,032-Speed 2623.64 samples/sec   Loss 10.2526   LearningRate 0.0554   Epoch: 5   Global Step: 212370   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:35:57,934-Speed 2624.58 samples/sec   Loss 10.1520   LearningRate 0.0554   Epoch: 5   Global Step: 212380   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:01,836-Speed 2624.68 samples/sec   Loss 10.2281   LearningRate 0.0554   Epoch: 5   Global Step: 212390   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:05,743-Speed 2621.46 samples/sec   Loss 10.2213   LearningRate 0.0553   Epoch: 5   Global Step: 212400   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:09,650-Speed 2621.80 samples/sec   Loss 10.1360   LearningRate 0.0553   Epoch: 5   Global Step: 212410   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:13,562-Speed 2618.61 samples/sec   Loss 9.9852   LearningRate 0.0553   Epoch: 5   Global Step: 212420   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:17,469-Speed 2621.22 samples/sec   Loss 10.1296   LearningRate 0.0553   Epoch: 5   Global Step: 212430   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:21,495-Speed 2543.88 samples/sec   Loss 10.1634   LearningRate 0.0553   Epoch: 5   Global Step: 212440   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:25,398-Speed 2624.11 samples/sec   Loss 9.9626   LearningRate 0.0553   Epoch: 5   Global Step: 212450   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:29,343-Speed 2596.25 samples/sec   Loss 10.0982   LearningRate 0.0553   Epoch: 5   Global Step: 212460   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:33,248-Speed 2622.96 samples/sec   Loss 10.1854   LearningRate 0.0553   Epoch: 5   Global Step: 212470   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:36:37,152-Speed 2623.97 samples/sec   Loss 9.9703   LearningRate 0.0553   Epoch: 5   Global Step: 212480   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:36:41,057-Speed 2622.98 samples/sec   Loss 10.1460   LearningRate 0.0553   Epoch: 5   Global Step: 212490   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:36:44,961-Speed 2623.21 samples/sec   Loss 10.0228   LearningRate 0.0553   Epoch: 5   Global Step: 212500   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:36:48,868-Speed 2621.47 samples/sec   Loss 10.2066   LearningRate 0.0553   Epoch: 5   Global Step: 212510   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:36:52,756-Speed 2634.65 samples/sec   Loss 10.0838   LearningRate 0.0553   Epoch: 5   Global Step: 212520   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:36:56,679-Speed 2611.03 samples/sec   Loss 10.0979   LearningRate 0.0553   Epoch: 5   Global Step: 212530   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:37:00,581-Speed 2624.28 samples/sec   Loss 10.1037   LearningRate 0.0553   Epoch: 5   Global Step: 212540   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:37:04,493-Speed 2617.93 samples/sec   Loss 10.1119   LearningRate 0.0553   Epoch: 5   Global Step: 212550   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:37:08,360-Speed 2649.40 samples/sec   Loss 10.6067   LearningRate 0.0553   Epoch: 5   Global Step: 212560   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:12,261-Speed 2625.37 samples/sec   Loss 10.8599   LearningRate 0.0553   Epoch: 5   Global Step: 212570   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:16,169-Speed 2620.93 samples/sec   Loss 10.3965   LearningRate 0.0553   Epoch: 5   Global Step: 212580   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:20,071-Speed 2625.19 samples/sec   Loss 10.1178   LearningRate 0.0553   Epoch: 5   Global Step: 212590   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:23,969-Speed 2627.25 samples/sec   Loss 10.2907   LearningRate 0.0553   Epoch: 5   Global Step: 212600   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:27,888-Speed 2613.42 samples/sec   Loss 10.1987   LearningRate 0.0553   Epoch: 5   Global Step: 212610   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:31,788-Speed 2626.31 samples/sec   Loss 10.1748   LearningRate 0.0553   Epoch: 5   Global Step: 212620   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:35,687-Speed 2626.45 samples/sec   Loss 10.1578   LearningRate 0.0553   Epoch: 5   Global Step: 212630   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:39,587-Speed 2626.71 samples/sec   Loss 10.1653   LearningRate 0.0553   Epoch: 5   Global Step: 212640   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:43,486-Speed 2632.40 samples/sec   Loss 10.1844   LearningRate 0.0553   Epoch: 5   Global Step: 212650   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:37:47,385-Speed 2627.08 samples/sec   Loss 10.1652   LearningRate 0.0553   Epoch: 5   Global Step: 212660   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:37:51,285-Speed 2626.50 samples/sec   Loss 10.3488   LearningRate 0.0553   Epoch: 5   Global Step: 212670   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:37:55,188-Speed 2624.47 samples/sec   Loss 10.1354   LearningRate 0.0553   Epoch: 5   Global Step: 212680   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:37:59,091-Speed 2623.86 samples/sec   Loss 10.3222   LearningRate 0.0553   Epoch: 5   Global Step: 212690   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:03,010-Speed 2613.46 samples/sec   Loss 10.1712   LearningRate 0.0553   Epoch: 5   Global Step: 212700   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:06,910-Speed 2626.36 samples/sec   Loss 10.1502   LearningRate 0.0553   Epoch: 5   Global Step: 212710   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:10,808-Speed 2627.48 samples/sec   Loss 10.1835   LearningRate 0.0553   Epoch: 5   Global Step: 212720   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:14,708-Speed 2626.50 samples/sec   Loss 10.0940   LearningRate 0.0553   Epoch: 5   Global Step: 212730   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:18,605-Speed 2627.94 samples/sec   Loss 10.1035   LearningRate 0.0553   Epoch: 5   Global Step: 212740   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:22,505-Speed 2625.94 samples/sec   Loss 10.1605   LearningRate 0.0553   Epoch: 5   Global Step: 212750   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:26,402-Speed 2628.37 samples/sec   Loss 10.3384   LearningRate 0.0553   Epoch: 5   Global Step: 212760   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:38:30,303-Speed 2625.83 samples/sec   Loss 10.0925   LearningRate 0.0553   Epoch: 5   Global Step: 212770   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:38:34,200-Speed 2628.28 samples/sec   Loss 10.1741   LearningRate 0.0553   Epoch: 5   Global Step: 212780   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:38:38,099-Speed 2627.16 samples/sec   Loss 10.0695   LearningRate 0.0553   Epoch: 5   Global Step: 212790   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:38:41,997-Speed 2627.38 samples/sec   Loss 10.2096   LearningRate 0.0553   Epoch: 5   Global Step: 212800   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:38:45,882-Speed 2636.26 samples/sec   Loss 10.1160   LearningRate 0.0553   Epoch: 5   Global Step: 212810   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:49,784-Speed 2624.99 samples/sec   Loss 10.1628   LearningRate 0.0553   Epoch: 5   Global Step: 212820   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:53,685-Speed 2625.34 samples/sec   Loss 10.1123   LearningRate 0.0553   Epoch: 5   Global Step: 212830   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:38:57,588-Speed 2624.55 samples/sec   Loss 10.0966   LearningRate 0.0553   Epoch: 5   Global Step: 212840   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:39:01,494-Speed 2621.73 samples/sec   Loss 10.0848   LearningRate 0.0553   Epoch: 5   Global Step: 212850   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:39:05,400-Speed 2621.95 samples/sec   Loss 10.3336   LearningRate 0.0553   Epoch: 5   Global Step: 212860   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:39:09,311-Speed 2619.23 samples/sec   Loss 10.2227   LearningRate 0.0553   Epoch: 5   Global Step: 212870   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:39:13,217-Speed 2622.76 samples/sec   Loss 10.0907   LearningRate 0.0553   Epoch: 5   Global Step: 212880   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:39:17,119-Speed 2624.52 samples/sec   Loss 9.9937   LearningRate 0.0553   Epoch: 5   Global Step: 212890   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:39:21,022-Speed 2623.98 samples/sec   Loss 10.2966   LearningRate 0.0553   Epoch: 5   Global Step: 212900   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:39:24,929-Speed 2621.79 samples/sec   Loss 10.1534   LearningRate 0.0553   Epoch: 5   Global Step: 212910   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:28,837-Speed 2620.75 samples/sec   Loss 10.2650   LearningRate 0.0553   Epoch: 5   Global Step: 212920   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:32,753-Speed 2615.37 samples/sec   Loss 10.2010   LearningRate 0.0553   Epoch: 5   Global Step: 212930   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:36,659-Speed 2621.71 samples/sec   Loss 10.0295   LearningRate 0.0553   Epoch: 5   Global Step: 212940   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:40,573-Speed 2616.76 samples/sec   Loss 10.1349   LearningRate 0.0553   Epoch: 5   Global Step: 212950   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:44,483-Speed 2620.07 samples/sec   Loss 10.0060   LearningRate 0.0552   Epoch: 5   Global Step: 212960   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:48,387-Speed 2623.26 samples/sec   Loss 10.1486   LearningRate 0.0552   Epoch: 5   Global Step: 212970   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:52,293-Speed 2622.87 samples/sec   Loss 10.1675   LearningRate 0.0552   Epoch: 5   Global Step: 212980   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:39:56,213-Speed 2612.61 samples/sec   Loss 10.1735   LearningRate 0.0552   Epoch: 5   Global Step: 212990   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:40:00,120-Speed 2621.56 samples/sec   Loss 10.1420   LearningRate 0.0552   Epoch: 5   Global Step: 213000   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:40:04,028-Speed 2620.74 samples/sec   Loss 10.1809   LearningRate 0.0552   Epoch: 5   Global Step: 213010   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:07,934-Speed 2621.85 samples/sec   Loss 10.0385   LearningRate 0.0552   Epoch: 5   Global Step: 213020   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:11,842-Speed 2620.67 samples/sec   Loss 10.1539   LearningRate 0.0552   Epoch: 5   Global Step: 213030   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:15,756-Speed 2617.36 samples/sec   Loss 10.0369   LearningRate 0.0552   Epoch: 5   Global Step: 213040   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:19,660-Speed 2623.85 samples/sec   Loss 9.9586   LearningRate 0.0552   Epoch: 5   Global Step: 213050   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:23,565-Speed 2622.34 samples/sec   Loss 10.1241   LearningRate 0.0552   Epoch: 5   Global Step: 213060   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:27,467-Speed 2625.06 samples/sec   Loss 10.2414   LearningRate 0.0552   Epoch: 5   Global Step: 213070   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:31,367-Speed 2626.48 samples/sec   Loss 10.1469   LearningRate 0.0552   Epoch: 5   Global Step: 213080   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:35,267-Speed 2626.07 samples/sec   Loss 10.0809   LearningRate 0.0552   Epoch: 5   Global Step: 213090   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:39,172-Speed 2622.76 samples/sec   Loss 10.1759   LearningRate 0.0552   Epoch: 5   Global Step: 213100   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:40:43,073-Speed 2625.67 samples/sec   Loss 10.2441   LearningRate 0.0552   Epoch: 5   Global Step: 213110   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:40:46,975-Speed 2624.87 samples/sec   Loss 10.3381   LearningRate 0.0552   Epoch: 5   Global Step: 213120   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:40:50,888-Speed 2617.95 samples/sec   Loss 10.2304   LearningRate 0.0552   Epoch: 5   Global Step: 213130   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:40:54,805-Speed 2614.75 samples/sec   Loss 10.1481   LearningRate 0.0552   Epoch: 5   Global Step: 213140   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:40:58,715-Speed 2619.27 samples/sec   Loss 10.3498   LearningRate 0.0552   Epoch: 5   Global Step: 213150   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:02,616-Speed 2625.52 samples/sec   Loss 10.2120   LearningRate 0.0552   Epoch: 5   Global Step: 213160   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:06,518-Speed 2624.93 samples/sec   Loss 10.1768   LearningRate 0.0552   Epoch: 5   Global Step: 213170   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:10,423-Speed 2623.08 samples/sec   Loss 10.1734   LearningRate 0.0552   Epoch: 5   Global Step: 213180   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:14,329-Speed 2622.61 samples/sec   Loss 10.0751   LearningRate 0.0552   Epoch: 5   Global Step: 213190   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:18,233-Speed 2623.34 samples/sec   Loss 10.0828   LearningRate 0.0552   Epoch: 5   Global Step: 213200   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:22,123-Speed 2633.04 samples/sec   Loss 10.1075   LearningRate 0.0552   Epoch: 5   Global Step: 213210   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:26,025-Speed 2625.04 samples/sec   Loss 10.1665   LearningRate 0.0552   Epoch: 5   Global Step: 213220   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:29,927-Speed 2624.76 samples/sec   Loss 10.0532   LearningRate 0.0552   Epoch: 5   Global Step: 213230   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:33,826-Speed 2626.95 samples/sec   Loss 9.9607   LearningRate 0.0552   Epoch: 5   Global Step: 213240   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:37,725-Speed 2627.26 samples/sec   Loss 10.0450   LearningRate 0.0552   Epoch: 5   Global Step: 213250   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:41,639-Speed 2616.90 samples/sec   Loss 9.9743   LearningRate 0.0552   Epoch: 5   Global Step: 213260   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:45,535-Speed 2628.71 samples/sec   Loss 10.0555   LearningRate 0.0552   Epoch: 5   Global Step: 213270   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:49,431-Speed 2628.82 samples/sec   Loss 10.2828   LearningRate 0.0552   Epoch: 5   Global Step: 213280   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:53,334-Speed 2624.81 samples/sec   Loss 10.2069   LearningRate 0.0552   Epoch: 5   Global Step: 213290   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:41:57,231-Speed 2628.16 samples/sec   Loss 10.3347   LearningRate 0.0552   Epoch: 5   Global Step: 213300   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:42:01,179-Speed 2594.28 samples/sec   Loss 10.1107   LearningRate 0.0552   Epoch: 5   Global Step: 213310   Fp16 Grad Scale: 524288   Required: 69 hours
Training: 2022-04-13 19:42:05,118-Speed 2600.01 samples/sec   Loss 10.0960   LearningRate 0.0552   Epoch: 5   Global Step: 213320   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:42:09,022-Speed 2624.61 samples/sec   Loss 10.0347   LearningRate 0.0552   Epoch: 5   Global Step: 213330   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:42:12,919-Speed 2627.91 samples/sec   Loss 10.1402   LearningRate 0.0552   Epoch: 5   Global Step: 213340   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:42:16,817-Speed 2627.78 samples/sec   Loss 10.1330   LearningRate 0.0552   Epoch: 5   Global Step: 213350   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:42:20,702-Speed 2636.48 samples/sec   Loss 10.1890   LearningRate 0.0552   Epoch: 5   Global Step: 213360   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:24,621-Speed 2614.02 samples/sec   Loss 9.9674   LearningRate 0.0552   Epoch: 5   Global Step: 213370   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:28,553-Speed 2604.77 samples/sec   Loss 10.0064   LearningRate 0.0552   Epoch: 5   Global Step: 213380   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:32,462-Speed 2620.47 samples/sec   Loss 10.1349   LearningRate 0.0552   Epoch: 5   Global Step: 213390   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:36,358-Speed 2628.50 samples/sec   Loss 10.0540   LearningRate 0.0552   Epoch: 5   Global Step: 213400   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:40,256-Speed 2627.89 samples/sec   Loss 10.1493   LearningRate 0.0552   Epoch: 5   Global Step: 213410   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:44,155-Speed 2627.06 samples/sec   Loss 10.0716   LearningRate 0.0552   Epoch: 5   Global Step: 213420   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:48,056-Speed 2625.31 samples/sec   Loss 10.0279   LearningRate 0.0552   Epoch: 5   Global Step: 213430   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:51,951-Speed 2630.15 samples/sec   Loss 10.2648   LearningRate 0.0552   Epoch: 5   Global Step: 213440   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:55,847-Speed 2628.77 samples/sec   Loss 10.0211   LearningRate 0.0552   Epoch: 5   Global Step: 213450   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:42:59,772-Speed 2609.44 samples/sec   Loss 10.1115   LearningRate 0.0552   Epoch: 5   Global Step: 213460   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:43:03,643-Speed 2646.30 samples/sec   Loss 10.0691   LearningRate 0.0552   Epoch: 5   Global Step: 213470   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:07,546-Speed 2624.76 samples/sec   Loss 10.0490   LearningRate 0.0552   Epoch: 5   Global Step: 213480   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:11,474-Speed 2607.10 samples/sec   Loss 10.1847   LearningRate 0.0552   Epoch: 5   Global Step: 213490   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:15,369-Speed 2630.41 samples/sec   Loss 10.0792   LearningRate 0.0552   Epoch: 5   Global Step: 213500   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:19,267-Speed 2627.83 samples/sec   Loss 9.9246   LearningRate 0.0551   Epoch: 5   Global Step: 213510   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:23,171-Speed 2623.63 samples/sec   Loss 10.1290   LearningRate 0.0551   Epoch: 5   Global Step: 213520   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:27,067-Speed 2629.24 samples/sec   Loss 10.1581   LearningRate 0.0551   Epoch: 5   Global Step: 213530   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:30,972-Speed 2622.84 samples/sec   Loss 10.0891   LearningRate 0.0551   Epoch: 5   Global Step: 213540   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:34,871-Speed 2626.38 samples/sec   Loss 10.0260   LearningRate 0.0551   Epoch: 5   Global Step: 213550   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:38,767-Speed 2629.22 samples/sec   Loss 10.1020   LearningRate 0.0551   Epoch: 5   Global Step: 213560   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:42,662-Speed 2629.60 samples/sec   Loss 10.2256   LearningRate 0.0551   Epoch: 5   Global Step: 213570   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:43:46,587-Speed 2609.92 samples/sec   Loss 10.2028   LearningRate 0.0551   Epoch: 5   Global Step: 213580   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:43:50,480-Speed 2631.33 samples/sec   Loss 10.0497   LearningRate 0.0551   Epoch: 5   Global Step: 213590   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:43:54,353-Speed 2644.37 samples/sec   Loss 10.1815   LearningRate 0.0551   Epoch: 5   Global Step: 213600   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:43:58,253-Speed 2626.00 samples/sec   Loss 10.0912   LearningRate 0.0551   Epoch: 5   Global Step: 213610   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:02,158-Speed 2623.42 samples/sec   Loss 10.1352   LearningRate 0.0551   Epoch: 5   Global Step: 213620   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:06,055-Speed 2628.17 samples/sec   Loss 10.2677   LearningRate 0.0551   Epoch: 5   Global Step: 213630   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:09,952-Speed 2627.77 samples/sec   Loss 10.1049   LearningRate 0.0551   Epoch: 5   Global Step: 213640   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:13,856-Speed 2624.50 samples/sec   Loss 10.1611   LearningRate 0.0551   Epoch: 5   Global Step: 213650   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:17,761-Speed 2622.76 samples/sec   Loss 10.1771   LearningRate 0.0551   Epoch: 5   Global Step: 213660   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:21,677-Speed 2615.81 samples/sec   Loss 10.1722   LearningRate 0.0551   Epoch: 5   Global Step: 213670   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:25,597-Speed 2612.42 samples/sec   Loss 10.0932   LearningRate 0.0551   Epoch: 5   Global Step: 213680   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:29,514-Speed 2615.54 samples/sec   Loss 10.1383   LearningRate 0.0551   Epoch: 5   Global Step: 213690   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:44:33,442-Speed 2607.59 samples/sec   Loss 10.1069   LearningRate 0.0551   Epoch: 5   Global Step: 213700   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:44:37,345-Speed 2623.82 samples/sec   Loss 10.1284   LearningRate 0.0551   Epoch: 5   Global Step: 213710   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:44:41,249-Speed 2623.56 samples/sec   Loss 10.0913   LearningRate 0.0551   Epoch: 5   Global Step: 213720   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:44:45,144-Speed 2629.53 samples/sec   Loss 10.1550   LearningRate 0.0551   Epoch: 5   Global Step: 213730   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:44:49,038-Speed 2630.29 samples/sec   Loss 10.1705   LearningRate 0.0551   Epoch: 5   Global Step: 213740   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:44:52,936-Speed 2628.12 samples/sec   Loss 10.1346   LearningRate 0.0551   Epoch: 5   Global Step: 213750   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:44:56,837-Speed 2625.54 samples/sec   Loss 10.1142   LearningRate 0.0551   Epoch: 5   Global Step: 213760   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:00,740-Speed 2624.24 samples/sec   Loss 10.1865   LearningRate 0.0551   Epoch: 5   Global Step: 213770   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:04,674-Speed 2603.63 samples/sec   Loss 10.0326   LearningRate 0.0551   Epoch: 5   Global Step: 213780   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:08,585-Speed 2618.82 samples/sec   Loss 9.9870   LearningRate 0.0551   Epoch: 5   Global Step: 213790   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:12,468-Speed 2637.20 samples/sec   Loss 10.1744   LearningRate 0.0551   Epoch: 5   Global Step: 213800   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:16,367-Speed 2627.21 samples/sec   Loss 10.1329   LearningRate 0.0551   Epoch: 5   Global Step: 213810   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:20,266-Speed 2627.33 samples/sec   Loss 10.2364   LearningRate 0.0551   Epoch: 5   Global Step: 213820   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:24,157-Speed 2631.88 samples/sec   Loss 10.1638   LearningRate 0.0551   Epoch: 5   Global Step: 213830   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:28,049-Speed 2631.44 samples/sec   Loss 10.1014   LearningRate 0.0551   Epoch: 5   Global Step: 213840   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:31,944-Speed 2630.02 samples/sec   Loss 10.1271   LearningRate 0.0551   Epoch: 5   Global Step: 213850   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:35,841-Speed 2628.05 samples/sec   Loss 9.9681   LearningRate 0.0551   Epoch: 5   Global Step: 213860   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:39,838-Speed 2562.03 samples/sec   Loss 10.1946   LearningRate 0.0551   Epoch: 5   Global Step: 213870   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:43,732-Speed 2630.42 samples/sec   Loss 9.9576   LearningRate 0.0551   Epoch: 5   Global Step: 213880   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:45:47,621-Speed 2633.89 samples/sec   Loss 10.0210   LearningRate 0.0551   Epoch: 5   Global Step: 213890   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:45:51,516-Speed 2629.17 samples/sec   Loss 10.0735   LearningRate 0.0551   Epoch: 5   Global Step: 213900   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:45:55,414-Speed 2627.39 samples/sec   Loss 10.1393   LearningRate 0.0551   Epoch: 5   Global Step: 213910   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:45:59,305-Speed 2633.12 samples/sec   Loss 10.1055   LearningRate 0.0551   Epoch: 5   Global Step: 213920   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:46:03,198-Speed 2631.10 samples/sec   Loss 10.0852   LearningRate 0.0551   Epoch: 5   Global Step: 213930   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:46:07,092-Speed 2630.03 samples/sec   Loss 10.0834   LearningRate 0.0551   Epoch: 5   Global Step: 213940   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:46:10,987-Speed 2629.57 samples/sec   Loss 10.1673   LearningRate 0.0551   Epoch: 5   Global Step: 213950   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:46:14,880-Speed 2630.85 samples/sec   Loss 9.8407   LearningRate 0.0551   Epoch: 5   Global Step: 213960   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:46:18,774-Speed 2630.33 samples/sec   Loss 10.0365   LearningRate 0.0551   Epoch: 5   Global Step: 213970   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:46:22,681-Speed 2621.44 samples/sec   Loss 9.9783   LearningRate 0.0551   Epoch: 5   Global Step: 213980   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:46:26,582-Speed 2625.27 samples/sec   Loss 10.0372   LearningRate 0.0551   Epoch: 5   Global Step: 213990   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:30,481-Speed 2626.95 samples/sec   Loss 10.1813   LearningRate 0.0551   Epoch: 5   Global Step: 214000   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:34,390-Speed 2620.81 samples/sec   Loss 10.1282   LearningRate 0.0551   Epoch: 5   Global Step: 214010   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:38,299-Speed 2619.77 samples/sec   Loss 10.1578   LearningRate 0.0551   Epoch: 5   Global Step: 214020   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:42,192-Speed 2630.81 samples/sec   Loss 10.1586   LearningRate 0.0551   Epoch: 5   Global Step: 214030   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:46,085-Speed 2631.00 samples/sec   Loss 10.1800   LearningRate 0.0551   Epoch: 5   Global Step: 214040   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:49,979-Speed 2630.59 samples/sec   Loss 9.9574   LearningRate 0.0551   Epoch: 5   Global Step: 214050   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:53,880-Speed 2625.82 samples/sec   Loss 10.0342   LearningRate 0.0551   Epoch: 5   Global Step: 214060   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:46:57,774-Speed 2630.17 samples/sec   Loss 10.0879   LearningRate 0.0550   Epoch: 5   Global Step: 214070   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:47:01,671-Speed 2628.06 samples/sec   Loss 10.1276   LearningRate 0.0550   Epoch: 5   Global Step: 214080   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:47:05,562-Speed 2632.04 samples/sec   Loss 10.0394   LearningRate 0.0550   Epoch: 5   Global Step: 214090   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:47:09,453-Speed 2632.48 samples/sec   Loss 10.1381   LearningRate 0.0550   Epoch: 5   Global Step: 214100   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:47:13,332-Speed 2640.32 samples/sec   Loss 10.1154   LearningRate 0.0550   Epoch: 5   Global Step: 214110   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:17,228-Speed 2629.50 samples/sec   Loss 9.9950   LearningRate 0.0550   Epoch: 5   Global Step: 214120   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:21,133-Speed 2622.40 samples/sec   Loss 10.1754   LearningRate 0.0550   Epoch: 5   Global Step: 214130   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:25,029-Speed 2628.94 samples/sec   Loss 10.0531   LearningRate 0.0550   Epoch: 5   Global Step: 214140   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:28,926-Speed 2628.19 samples/sec   Loss 10.0373   LearningRate 0.0550   Epoch: 5   Global Step: 214150   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:32,943-Speed 2549.86 samples/sec   Loss 10.0658   LearningRate 0.0550   Epoch: 5   Global Step: 214160   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:36,834-Speed 2631.91 samples/sec   Loss 10.1497   LearningRate 0.0550   Epoch: 5   Global Step: 214170   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:40,765-Speed 2605.95 samples/sec   Loss 10.1005   LearningRate 0.0550   Epoch: 5   Global Step: 214180   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:44,658-Speed 2630.68 samples/sec   Loss 10.1297   LearningRate 0.0550   Epoch: 5   Global Step: 214190   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:48,552-Speed 2630.68 samples/sec   Loss 9.9419   LearningRate 0.0550   Epoch: 5   Global Step: 214200   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:52,428-Speed 2642.71 samples/sec   Loss 10.1148   LearningRate 0.0550   Epoch: 5   Global Step: 214210   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:47:56,321-Speed 2630.95 samples/sec   Loss 10.1190   LearningRate 0.0550   Epoch: 5   Global Step: 214220   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:00,218-Speed 2627.89 samples/sec   Loss 10.0422   LearningRate 0.0550   Epoch: 5   Global Step: 214230   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:04,115-Speed 2628.05 samples/sec   Loss 10.0708   LearningRate 0.0550   Epoch: 5   Global Step: 214240   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:08,008-Speed 2630.86 samples/sec   Loss 10.0866   LearningRate 0.0550   Epoch: 5   Global Step: 214250   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:11,904-Speed 2629.06 samples/sec   Loss 10.1565   LearningRate 0.0550   Epoch: 5   Global Step: 214260   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:15,804-Speed 2626.08 samples/sec   Loss 10.1215   LearningRate 0.0550   Epoch: 5   Global Step: 214270   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:19,705-Speed 2625.52 samples/sec   Loss 10.0653   LearningRate 0.0550   Epoch: 5   Global Step: 214280   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:23,615-Speed 2619.88 samples/sec   Loss 10.1824   LearningRate 0.0550   Epoch: 5   Global Step: 214290   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:27,508-Speed 2630.84 samples/sec   Loss 10.1327   LearningRate 0.0550   Epoch: 5   Global Step: 214300   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:48:31,402-Speed 2630.64 samples/sec   Loss 10.1063   LearningRate 0.0550   Epoch: 5   Global Step: 214310   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:48:35,297-Speed 2629.04 samples/sec   Loss 10.1547   LearningRate 0.0550   Epoch: 5   Global Step: 214320   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:48:39,193-Speed 2629.10 samples/sec   Loss 10.0620   LearningRate 0.0550   Epoch: 5   Global Step: 214330   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:48:43,089-Speed 2628.81 samples/sec   Loss 10.0049   LearningRate 0.0550   Epoch: 5   Global Step: 214340   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:48:46,985-Speed 2629.07 samples/sec   Loss 10.0172   LearningRate 0.0550   Epoch: 5   Global Step: 214350   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:48:50,887-Speed 2624.99 samples/sec   Loss 10.0646   LearningRate 0.0550   Epoch: 5   Global Step: 214360   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:48:54,796-Speed 2619.96 samples/sec   Loss 10.0037   LearningRate 0.0550   Epoch: 5   Global Step: 214370   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:48:58,691-Speed 2629.39 samples/sec   Loss 10.2389   LearningRate 0.0550   Epoch: 5   Global Step: 214380   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:49:02,590-Speed 2626.96 samples/sec   Loss 10.1073   LearningRate 0.0550   Epoch: 5   Global Step: 214390   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:49:06,485-Speed 2629.81 samples/sec   Loss 10.0018   LearningRate 0.0550   Epoch: 5   Global Step: 214400   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:49:10,386-Speed 2625.76 samples/sec   Loss 10.1334   LearningRate 0.0550   Epoch: 5   Global Step: 214410   Fp16 Grad Scale: 524288   Required: 69 hours
Training: 2022-04-13 19:49:14,265-Speed 2640.13 samples/sec   Loss 10.0744   LearningRate 0.0550   Epoch: 5   Global Step: 214420   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:49:18,164-Speed 2627.28 samples/sec   Loss 9.9934   LearningRate 0.0550   Epoch: 5   Global Step: 214430   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:49:22,046-Speed 2638.32 samples/sec   Loss 9.9855   LearningRate 0.0550   Epoch: 5   Global Step: 214440   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:25,944-Speed 2627.64 samples/sec   Loss 10.0936   LearningRate 0.0550   Epoch: 5   Global Step: 214450   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:29,844-Speed 2626.31 samples/sec   Loss 10.1699   LearningRate 0.0550   Epoch: 5   Global Step: 214460   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:33,740-Speed 2628.91 samples/sec   Loss 10.1738   LearningRate 0.0550   Epoch: 5   Global Step: 214470   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:37,634-Speed 2629.69 samples/sec   Loss 10.1349   LearningRate 0.0550   Epoch: 5   Global Step: 214480   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:41,528-Speed 2630.38 samples/sec   Loss 9.9568   LearningRate 0.0550   Epoch: 5   Global Step: 214490   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:45,422-Speed 2630.69 samples/sec   Loss 10.1405   LearningRate 0.0550   Epoch: 5   Global Step: 214500   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:49,353-Speed 2605.23 samples/sec   Loss 10.1247   LearningRate 0.0550   Epoch: 5   Global Step: 214510   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:53,431-Speed 2511.67 samples/sec   Loss 10.2048   LearningRate 0.0550   Epoch: 5   Global Step: 214520   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:49:57,328-Speed 2628.33 samples/sec   Loss 10.0751   LearningRate 0.0550   Epoch: 5   Global Step: 214530   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:01,220-Speed 2631.64 samples/sec   Loss 10.0005   LearningRate 0.0550   Epoch: 5   Global Step: 214540   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:50:05,118-Speed 2627.45 samples/sec   Loss 10.0453   LearningRate 0.0550   Epoch: 5   Global Step: 214550   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:50:09,012-Speed 2630.21 samples/sec   Loss 10.1243   LearningRate 0.0550   Epoch: 5   Global Step: 214560   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:50:12,909-Speed 2628.25 samples/sec   Loss 10.1515   LearningRate 0.0550   Epoch: 5   Global Step: 214570   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:50:16,788-Speed 2640.44 samples/sec   Loss 10.2128   LearningRate 0.0550   Epoch: 5   Global Step: 214580   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:20,828-Speed 2535.78 samples/sec   Loss 10.2314   LearningRate 0.0550   Epoch: 5   Global Step: 214590   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:24,724-Speed 2628.72 samples/sec   Loss 10.0388   LearningRate 0.0550   Epoch: 5   Global Step: 214600   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:28,619-Speed 2630.61 samples/sec   Loss 10.0905   LearningRate 0.0550   Epoch: 5   Global Step: 214610   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:32,511-Speed 2631.31 samples/sec   Loss 9.9867   LearningRate 0.0550   Epoch: 5   Global Step: 214620   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:36,401-Speed 2632.98 samples/sec   Loss 10.0139   LearningRate 0.0549   Epoch: 5   Global Step: 214630   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:40,292-Speed 2631.93 samples/sec   Loss 10.1939   LearningRate 0.0549   Epoch: 5   Global Step: 214640   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:44,189-Speed 2629.34 samples/sec   Loss 10.0823   LearningRate 0.0549   Epoch: 5   Global Step: 214650   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:48,115-Speed 2608.61 samples/sec   Loss 10.1066   LearningRate 0.0549   Epoch: 5   Global Step: 214660   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:52,004-Speed 2633.59 samples/sec   Loss 9.9786   LearningRate 0.0549   Epoch: 5   Global Step: 214670   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:50:55,895-Speed 2632.54 samples/sec   Loss 10.0649   LearningRate 0.0549   Epoch: 5   Global Step: 214680   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:50:59,792-Speed 2627.93 samples/sec   Loss 10.1151   LearningRate 0.0549   Epoch: 5   Global Step: 214690   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:03,693-Speed 2625.72 samples/sec   Loss 10.1462   LearningRate 0.0549   Epoch: 5   Global Step: 214700   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:07,585-Speed 2631.37 samples/sec   Loss 9.9792   LearningRate 0.0549   Epoch: 5   Global Step: 214710   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:11,482-Speed 2628.41 samples/sec   Loss 10.0255   LearningRate 0.0549   Epoch: 5   Global Step: 214720   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:15,389-Speed 2621.74 samples/sec   Loss 10.1003   LearningRate 0.0549   Epoch: 5   Global Step: 214730   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:19,279-Speed 2632.46 samples/sec   Loss 10.1039   LearningRate 0.0549   Epoch: 5   Global Step: 214740   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:23,175-Speed 2629.51 samples/sec   Loss 10.2006   LearningRate 0.0549   Epoch: 5   Global Step: 214750   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:27,078-Speed 2623.70 samples/sec   Loss 9.9297   LearningRate 0.0549   Epoch: 5   Global Step: 214760   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:30,980-Speed 2625.39 samples/sec   Loss 9.9596   LearningRate 0.0549   Epoch: 5   Global Step: 214770   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:51:34,861-Speed 2639.29 samples/sec   Loss 10.1970   LearningRate 0.0549   Epoch: 5   Global Step: 214780   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:51:38,753-Speed 2630.96 samples/sec   Loss 10.0613   LearningRate 0.0549   Epoch: 5   Global Step: 214790   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:51:42,646-Speed 2630.98 samples/sec   Loss 10.1730   LearningRate 0.0549   Epoch: 5   Global Step: 214800   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:51:46,545-Speed 2627.33 samples/sec   Loss 10.0581   LearningRate 0.0549   Epoch: 5   Global Step: 214810   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:51:50,438-Speed 2630.88 samples/sec   Loss 10.0977   LearningRate 0.0549   Epoch: 5   Global Step: 214820   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:51:54,331-Speed 2630.71 samples/sec   Loss 10.1010   LearningRate 0.0549   Epoch: 5   Global Step: 214830   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:51:58,224-Speed 2631.20 samples/sec   Loss 10.0416   LearningRate 0.0549   Epoch: 5   Global Step: 214840   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:02,127-Speed 2623.82 samples/sec   Loss 10.0765   LearningRate 0.0549   Epoch: 5   Global Step: 214850   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:06,027-Speed 2626.21 samples/sec   Loss 10.1832   LearningRate 0.0549   Epoch: 5   Global Step: 214860   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:09,924-Speed 2628.38 samples/sec   Loss 10.0473   LearningRate 0.0549   Epoch: 5   Global Step: 214870   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:13,818-Speed 2630.51 samples/sec   Loss 10.1552   LearningRate 0.0549   Epoch: 5   Global Step: 214880   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:52:17,687-Speed 2647.13 samples/sec   Loss 10.0774   LearningRate 0.0549   Epoch: 5   Global Step: 214890   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:21,599-Speed 2618.18 samples/sec   Loss 10.0829   LearningRate 0.0549   Epoch: 5   Global Step: 214900   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:25,491-Speed 2632.00 samples/sec   Loss 10.0875   LearningRate 0.0549   Epoch: 5   Global Step: 214910   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:29,389-Speed 2627.25 samples/sec   Loss 10.1231   LearningRate 0.0549   Epoch: 5   Global Step: 214920   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:33,323-Speed 2603.36 samples/sec   Loss 10.2137   LearningRate 0.0549   Epoch: 5   Global Step: 214930   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:37,287-Speed 2584.30 samples/sec   Loss 10.1433   LearningRate 0.0549   Epoch: 5   Global Step: 214940   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:41,179-Speed 2631.18 samples/sec   Loss 10.0920   LearningRate 0.0549   Epoch: 5   Global Step: 214950   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:45,073-Speed 2630.58 samples/sec   Loss 10.1711   LearningRate 0.0549   Epoch: 5   Global Step: 214960   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:48,966-Speed 2631.02 samples/sec   Loss 10.1188   LearningRate 0.0549   Epoch: 5   Global Step: 214970   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:52,860-Speed 2630.47 samples/sec   Loss 10.0397   LearningRate 0.0549   Epoch: 5   Global Step: 214980   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:52:56,752-Speed 2631.61 samples/sec   Loss 10.0868   LearningRate 0.0549   Epoch: 5   Global Step: 214990   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:53:00,643-Speed 2632.37 samples/sec   Loss 10.1237   LearningRate 0.0549   Epoch: 5   Global Step: 215000   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:53:04,536-Speed 2630.51 samples/sec   Loss 10.0398   LearningRate 0.0549   Epoch: 5   Global Step: 215010   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:53:08,430-Speed 2630.26 samples/sec   Loss 10.0859   LearningRate 0.0549   Epoch: 5   Global Step: 215020   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:53:12,318-Speed 2634.44 samples/sec   Loss 10.0759   LearningRate 0.0549   Epoch: 5   Global Step: 215030   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:16,211-Speed 2631.29 samples/sec   Loss 10.1056   LearningRate 0.0549   Epoch: 5   Global Step: 215040   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:20,111-Speed 2625.72 samples/sec   Loss 10.0657   LearningRate 0.0549   Epoch: 5   Global Step: 215050   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:24,002-Speed 2633.03 samples/sec   Loss 10.0260   LearningRate 0.0549   Epoch: 5   Global Step: 215060   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:27,897-Speed 2629.69 samples/sec   Loss 10.0463   LearningRate 0.0549   Epoch: 5   Global Step: 215070   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:31,788-Speed 2632.31 samples/sec   Loss 10.0727   LearningRate 0.0549   Epoch: 5   Global Step: 215080   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:35,682-Speed 2630.13 samples/sec   Loss 10.0551   LearningRate 0.0549   Epoch: 5   Global Step: 215090   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:39,575-Speed 2630.69 samples/sec   Loss 10.0459   LearningRate 0.0549   Epoch: 5   Global Step: 215100   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:43,468-Speed 2630.59 samples/sec   Loss 10.2447   LearningRate 0.0549   Epoch: 5   Global Step: 215110   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:47,367-Speed 2627.33 samples/sec   Loss 10.1985   LearningRate 0.0549   Epoch: 5   Global Step: 215120   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:51,241-Speed 2643.58 samples/sec   Loss 10.0566   LearningRate 0.0549   Epoch: 5   Global Step: 215130   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:55,145-Speed 2623.57 samples/sec   Loss 10.0327   LearningRate 0.0549   Epoch: 5   Global Step: 215140   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:53:59,037-Speed 2631.46 samples/sec   Loss 10.0899   LearningRate 0.0549   Epoch: 5   Global Step: 215150   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:54:02,932-Speed 2629.68 samples/sec   Loss 10.0036   LearningRate 0.0549   Epoch: 5   Global Step: 215160   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:54:06,830-Speed 2628.06 samples/sec   Loss 10.0705   LearningRate 0.0549   Epoch: 5   Global Step: 215170   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:54:10,734-Speed 2623.29 samples/sec   Loss 10.0096   LearningRate 0.0549   Epoch: 5   Global Step: 215180   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:54:14,648-Speed 2616.97 samples/sec   Loss 10.0743   LearningRate 0.0548   Epoch: 5   Global Step: 215190   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:54:18,566-Speed 2614.09 samples/sec   Loss 9.9948   LearningRate 0.0548   Epoch: 5   Global Step: 215200   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:54:22,486-Speed 2613.02 samples/sec   Loss 10.1949   LearningRate 0.0548   Epoch: 5   Global Step: 215210   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:54:26,337-Speed 2659.64 samples/sec   Loss 10.6052   LearningRate 0.0548   Epoch: 5   Global Step: 215220   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:30,243-Speed 2622.16 samples/sec   Loss 10.3042   LearningRate 0.0548   Epoch: 5   Global Step: 215230   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:34,136-Speed 2630.82 samples/sec   Loss 10.3436   LearningRate 0.0548   Epoch: 5   Global Step: 215240   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:38,033-Speed 2628.48 samples/sec   Loss 10.0639   LearningRate 0.0548   Epoch: 5   Global Step: 215250   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:41,924-Speed 2635.58 samples/sec   Loss 10.1076   LearningRate 0.0548   Epoch: 5   Global Step: 215260   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:45,820-Speed 2628.59 samples/sec   Loss 10.1824   LearningRate 0.0548   Epoch: 5   Global Step: 215270   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:49,714-Speed 2630.03 samples/sec   Loss 10.1345   LearningRate 0.0548   Epoch: 5   Global Step: 215280   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:53,614-Speed 2626.75 samples/sec   Loss 9.9370   LearningRate 0.0548   Epoch: 5   Global Step: 215290   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:54:57,505-Speed 2632.12 samples/sec   Loss 10.1115   LearningRate 0.0548   Epoch: 5   Global Step: 215300   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:55:01,404-Speed 2626.97 samples/sec   Loss 10.0184   LearningRate 0.0548   Epoch: 5   Global Step: 215310   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 19:55:05,295-Speed 2632.04 samples/sec   Loss 10.1159   LearningRate 0.0548   Epoch: 5   Global Step: 215320   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:09,191-Speed 2628.92 samples/sec   Loss 10.1747   LearningRate 0.0548   Epoch: 5   Global Step: 215330   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:13,083-Speed 2631.68 samples/sec   Loss 10.2457   LearningRate 0.0548   Epoch: 5   Global Step: 215340   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:16,983-Speed 2626.57 samples/sec   Loss 10.1608   LearningRate 0.0548   Epoch: 5   Global Step: 215350   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:20,875-Speed 2631.93 samples/sec   Loss 10.1179   LearningRate 0.0548   Epoch: 5   Global Step: 215360   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:24,773-Speed 2626.98 samples/sec   Loss 10.4762   LearningRate 0.0548   Epoch: 5   Global Step: 215370   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:28,677-Speed 2623.56 samples/sec   Loss 10.8365   LearningRate 0.0548   Epoch: 5   Global Step: 215380   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:32,572-Speed 2629.67 samples/sec   Loss 10.2773   LearningRate 0.0548   Epoch: 5   Global Step: 215390   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:36,462-Speed 2632.75 samples/sec   Loss 10.0210   LearningRate 0.0548   Epoch: 5   Global Step: 215400   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:40,353-Speed 2632.18 samples/sec   Loss 10.1552   LearningRate 0.0548   Epoch: 5   Global Step: 215410   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 19:55:44,270-Speed 2615.15 samples/sec   Loss 10.0499   LearningRate 0.0548   Epoch: 5   Global Step: 215420   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:55:48,292-Speed 2546.57 samples/sec   Loss 10.1392   LearningRate 0.0548   Epoch: 5   Global Step: 215430   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:55:52,189-Speed 2628.75 samples/sec   Loss 10.1152   LearningRate 0.0548   Epoch: 5   Global Step: 215440   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:55:56,083-Speed 2630.11 samples/sec   Loss 10.2149   LearningRate 0.0548   Epoch: 5   Global Step: 215450   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:56:00,089-Speed 2556.83 samples/sec   Loss 10.1835   LearningRate 0.0548   Epoch: 5   Global Step: 215460   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:56:03,986-Speed 2627.97 samples/sec   Loss 10.2761   LearningRate 0.0548   Epoch: 5   Global Step: 215470   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:56:07,885-Speed 2627.03 samples/sec   Loss 10.2039   LearningRate 0.0548   Epoch: 5   Global Step: 215480   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:56:11,779-Speed 2630.07 samples/sec   Loss 10.2547   LearningRate 0.0548   Epoch: 5   Global Step: 215490   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:56:15,680-Speed 2625.37 samples/sec   Loss 10.0131   LearningRate 0.0548   Epoch: 5   Global Step: 215500   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:56:19,593-Speed 2617.34 samples/sec   Loss 10.0749   LearningRate 0.0548   Epoch: 5   Global Step: 215510   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 19:56:23,492-Speed 2627.93 samples/sec   Loss 10.2214   LearningRate 0.0548   Epoch: 5   Global Step: 215520   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:27,388-Speed 2628.66 samples/sec   Loss 10.1414   LearningRate 0.0548   Epoch: 5   Global Step: 215530   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:31,284-Speed 2628.99 samples/sec   Loss 10.0247   LearningRate 0.0548   Epoch: 5   Global Step: 215540   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:35,177-Speed 2630.92 samples/sec   Loss 10.1898   LearningRate 0.0548   Epoch: 5   Global Step: 215550   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:39,068-Speed 2632.52 samples/sec   Loss 10.0449   LearningRate 0.0548   Epoch: 5   Global Step: 215560   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:42,960-Speed 2631.07 samples/sec   Loss 10.1265   LearningRate 0.0548   Epoch: 5   Global Step: 215570   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:46,852-Speed 2631.98 samples/sec   Loss 10.0250   LearningRate 0.0548   Epoch: 5   Global Step: 215580   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:50,744-Speed 2631.72 samples/sec   Loss 10.1180   LearningRate 0.0548   Epoch: 5   Global Step: 215590   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:54,655-Speed 2619.01 samples/sec   Loss 9.9520   LearningRate 0.0548   Epoch: 5   Global Step: 215600   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:56:58,550-Speed 2629.39 samples/sec   Loss 10.1569   LearningRate 0.0548   Epoch: 5   Global Step: 215610   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:02,448-Speed 2627.52 samples/sec   Loss 10.0692   LearningRate 0.0548   Epoch: 5   Global Step: 215620   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:57:06,346-Speed 2627.89 samples/sec   Loss 10.1952   LearningRate 0.0548   Epoch: 5   Global Step: 215630   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:57:10,241-Speed 2629.39 samples/sec   Loss 10.0606   LearningRate 0.0548   Epoch: 5   Global Step: 215640   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:57:14,136-Speed 2630.37 samples/sec   Loss 10.1459   LearningRate 0.0548   Epoch: 5   Global Step: 215650   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:57:18,012-Speed 2641.74 samples/sec   Loss 10.0123   LearningRate 0.0548   Epoch: 5   Global Step: 215660   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:21,905-Speed 2631.61 samples/sec   Loss 10.0523   LearningRate 0.0548   Epoch: 5   Global Step: 215670   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:25,801-Speed 2628.77 samples/sec   Loss 9.9774   LearningRate 0.0548   Epoch: 5   Global Step: 215680   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:29,693-Speed 2631.49 samples/sec   Loss 10.0688   LearningRate 0.0548   Epoch: 5   Global Step: 215690   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:33,594-Speed 2625.49 samples/sec   Loss 9.9664   LearningRate 0.0548   Epoch: 5   Global Step: 215700   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:37,500-Speed 2622.32 samples/sec   Loss 10.1447   LearningRate 0.0548   Epoch: 5   Global Step: 215710   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:41,396-Speed 2628.76 samples/sec   Loss 10.0694   LearningRate 0.0548   Epoch: 5   Global Step: 215720   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:45,288-Speed 2631.74 samples/sec   Loss 9.8772   LearningRate 0.0548   Epoch: 5   Global Step: 215730   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:49,179-Speed 2632.30 samples/sec   Loss 9.9155   LearningRate 0.0548   Epoch: 5   Global Step: 215740   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:53,070-Speed 2632.75 samples/sec   Loss 10.1016   LearningRate 0.0547   Epoch: 5   Global Step: 215750   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:57:56,963-Speed 2630.62 samples/sec   Loss 10.1050   LearningRate 0.0547   Epoch: 5   Global Step: 215760   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:58:00,858-Speed 2629.96 samples/sec   Loss 10.1032   LearningRate 0.0547   Epoch: 5   Global Step: 215770   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:58:04,751-Speed 2630.62 samples/sec   Loss 9.9849   LearningRate 0.0547   Epoch: 5   Global Step: 215780   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:58:08,651-Speed 2625.94 samples/sec   Loss 10.0752   LearningRate 0.0547   Epoch: 5   Global Step: 215790   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:12,551-Speed 2625.83 samples/sec   Loss 10.1828   LearningRate 0.0547   Epoch: 5   Global Step: 215800   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:16,459-Speed 2621.02 samples/sec   Loss 10.0465   LearningRate 0.0547   Epoch: 5   Global Step: 215810   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:20,357-Speed 2627.72 samples/sec   Loss 10.0983   LearningRate 0.0547   Epoch: 5   Global Step: 215820   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:24,256-Speed 2627.00 samples/sec   Loss 9.9483   LearningRate 0.0547   Epoch: 5   Global Step: 215830   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:28,164-Speed 2620.64 samples/sec   Loss 10.1484   LearningRate 0.0547   Epoch: 5   Global Step: 215840   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:32,083-Speed 2613.58 samples/sec   Loss 10.1368   LearningRate 0.0547   Epoch: 5   Global Step: 215850   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:36,003-Speed 2613.18 samples/sec   Loss 10.0650   LearningRate 0.0547   Epoch: 5   Global Step: 215860   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:40,078-Speed 2513.36 samples/sec   Loss 10.0661   LearningRate 0.0547   Epoch: 5   Global Step: 215870   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:44,002-Speed 2609.82 samples/sec   Loss 10.0620   LearningRate 0.0547   Epoch: 5   Global Step: 215880   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:58:47,934-Speed 2605.89 samples/sec   Loss 10.0619   LearningRate 0.0547   Epoch: 5   Global Step: 215890   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:58:51,831-Speed 2628.16 samples/sec   Loss 9.9651   LearningRate 0.0547   Epoch: 5   Global Step: 215900   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:58:55,736-Speed 2623.22 samples/sec   Loss 9.9596   LearningRate 0.0547   Epoch: 5   Global Step: 215910   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:58:59,636-Speed 2626.26 samples/sec   Loss 9.9378   LearningRate 0.0547   Epoch: 5   Global Step: 215920   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:59:03,541-Speed 2622.82 samples/sec   Loss 9.9800   LearningRate 0.0547   Epoch: 5   Global Step: 215930   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:59:07,452-Speed 2618.49 samples/sec   Loss 10.1137   LearningRate 0.0547   Epoch: 5   Global Step: 215940   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:59:11,357-Speed 2622.97 samples/sec   Loss 10.0145   LearningRate 0.0547   Epoch: 5   Global Step: 215950   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:59:15,247-Speed 2633.23 samples/sec   Loss 9.9867   LearningRate 0.0547   Epoch: 5   Global Step: 215960   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:19,143-Speed 2629.26 samples/sec   Loss 10.1360   LearningRate 0.0547   Epoch: 5   Global Step: 215970   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:23,040-Speed 2628.91 samples/sec   Loss 10.0032   LearningRate 0.0547   Epoch: 5   Global Step: 215980   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:26,930-Speed 2633.94 samples/sec   Loss 10.0657   LearningRate 0.0547   Epoch: 5   Global Step: 215990   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:30,842-Speed 2618.38 samples/sec   Loss 10.2476   LearningRate 0.0547   Epoch: 5   Global Step: 216000   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:34,738-Speed 2629.11 samples/sec   Loss 10.0435   LearningRate 0.0547   Epoch: 5   Global Step: 216010   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:38,631-Speed 2631.23 samples/sec   Loss 9.9358   LearningRate 0.0547   Epoch: 5   Global Step: 216020   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:42,523-Speed 2631.58 samples/sec   Loss 9.9274   LearningRate 0.0547   Epoch: 5   Global Step: 216030   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:46,465-Speed 2598.38 samples/sec   Loss 10.1142   LearningRate 0.0547   Epoch: 5   Global Step: 216040   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:50,381-Speed 2615.51 samples/sec   Loss 10.1660   LearningRate 0.0547   Epoch: 5   Global Step: 216050   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 19:59:54,272-Speed 2632.58 samples/sec   Loss 10.1042   LearningRate 0.0547   Epoch: 5   Global Step: 216060   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 19:59:58,167-Speed 2629.67 samples/sec   Loss 10.0708   LearningRate 0.0547   Epoch: 5   Global Step: 216070   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:00:02,081-Speed 2617.15 samples/sec   Loss 10.1124   LearningRate 0.0547   Epoch: 5   Global Step: 216080   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:00:05,981-Speed 2626.35 samples/sec   Loss 10.0829   LearningRate 0.0547   Epoch: 5   Global Step: 216090   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:00:09,890-Speed 2620.48 samples/sec   Loss 10.2089   LearningRate 0.0547   Epoch: 5   Global Step: 216100   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:00:13,786-Speed 2628.62 samples/sec   Loss 9.9972   LearningRate 0.0547   Epoch: 5   Global Step: 216110   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:00:17,688-Speed 2625.08 samples/sec   Loss 10.0061   LearningRate 0.0547   Epoch: 5   Global Step: 216120   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:00:21,587-Speed 2627.39 samples/sec   Loss 9.9419   LearningRate 0.0547   Epoch: 5   Global Step: 216130   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:00:25,482-Speed 2629.96 samples/sec   Loss 10.0756   LearningRate 0.0547   Epoch: 5   Global Step: 216140   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:00:29,606-Speed 2483.05 samples/sec   Loss 10.1056   LearningRate 0.0547   Epoch: 5   Global Step: 216150   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:00:33,509-Speed 2624.43 samples/sec   Loss 10.0427   LearningRate 0.0547   Epoch: 5   Global Step: 216160   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:00:37,382-Speed 2644.78 samples/sec   Loss 10.0735   LearningRate 0.0547   Epoch: 5   Global Step: 216170   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:00:41,275-Speed 2630.99 samples/sec   Loss 10.1282   LearningRate 0.0547   Epoch: 5   Global Step: 216180   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:00:45,170-Speed 2629.71 samples/sec   Loss 10.0710   LearningRate 0.0547   Epoch: 5   Global Step: 216190   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:00:49,063-Speed 2630.86 samples/sec   Loss 10.0142   LearningRate 0.0547   Epoch: 5   Global Step: 216200   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:00:52,957-Speed 2630.16 samples/sec   Loss 9.9890   LearningRate 0.0547   Epoch: 5   Global Step: 216210   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:00:56,850-Speed 2631.41 samples/sec   Loss 10.0780   LearningRate 0.0547   Epoch: 5   Global Step: 216220   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:01:00,759-Speed 2620.35 samples/sec   Loss 10.1171   LearningRate 0.0547   Epoch: 5   Global Step: 216230   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:01:04,672-Speed 2617.34 samples/sec   Loss 10.1073   LearningRate 0.0547   Epoch: 5   Global Step: 216240   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:01:08,572-Speed 2626.40 samples/sec   Loss 10.1200   LearningRate 0.0547   Epoch: 5   Global Step: 216250   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:01:12,479-Speed 2621.58 samples/sec   Loss 10.0703   LearningRate 0.0547   Epoch: 5   Global Step: 216260   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:01:16,372-Speed 2631.16 samples/sec   Loss 10.0508   LearningRate 0.0547   Epoch: 5   Global Step: 216270   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:20,266-Speed 2630.66 samples/sec   Loss 9.9793   LearningRate 0.0547   Epoch: 5   Global Step: 216280   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:24,160-Speed 2630.39 samples/sec   Loss 10.0980   LearningRate 0.0547   Epoch: 5   Global Step: 216290   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:28,056-Speed 2629.04 samples/sec   Loss 10.1142   LearningRate 0.0547   Epoch: 5   Global Step: 216300   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:31,956-Speed 2626.58 samples/sec   Loss 10.1500   LearningRate 0.0546   Epoch: 5   Global Step: 216310   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:35,850-Speed 2630.27 samples/sec   Loss 10.0766   LearningRate 0.0546   Epoch: 5   Global Step: 216320   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:39,743-Speed 2630.65 samples/sec   Loss 10.1190   LearningRate 0.0546   Epoch: 5   Global Step: 216330   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:43,633-Speed 2633.25 samples/sec   Loss 10.0501   LearningRate 0.0546   Epoch: 5   Global Step: 216340   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:47,527-Speed 2630.58 samples/sec   Loss 10.0521   LearningRate 0.0546   Epoch: 5   Global Step: 216350   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:51,419-Speed 2631.85 samples/sec   Loss 10.0829   LearningRate 0.0546   Epoch: 5   Global Step: 216360   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:01:55,309-Speed 2632.93 samples/sec   Loss 10.1391   LearningRate 0.0546   Epoch: 5   Global Step: 216370   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:01:59,219-Speed 2619.90 samples/sec   Loss 10.1233   LearningRate 0.0546   Epoch: 5   Global Step: 216380   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:03,110-Speed 2631.80 samples/sec   Loss 10.1611   LearningRate 0.0546   Epoch: 5   Global Step: 216390   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:07,003-Speed 2631.33 samples/sec   Loss 10.0832   LearningRate 0.0546   Epoch: 5   Global Step: 216400   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:10,898-Speed 2629.46 samples/sec   Loss 10.1673   LearningRate 0.0546   Epoch: 5   Global Step: 216410   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:14,793-Speed 2630.27 samples/sec   Loss 10.0024   LearningRate 0.0546   Epoch: 5   Global Step: 216420   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:18,688-Speed 2629.75 samples/sec   Loss 10.1689   LearningRate 0.0546   Epoch: 5   Global Step: 216430   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:22,579-Speed 2632.74 samples/sec   Loss 9.9777   LearningRate 0.0546   Epoch: 5   Global Step: 216440   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:27,298-Speed 2170.07 samples/sec   Loss 10.2525   LearningRate 0.0546   Epoch: 5   Global Step: 216450   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:31,187-Speed 2633.96 samples/sec   Loss 10.1546   LearningRate 0.0546   Epoch: 5   Global Step: 216460   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:35,061-Speed 2643.90 samples/sec   Loss 9.9999   LearningRate 0.0546   Epoch: 5   Global Step: 216470   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:02:38,939-Speed 2641.19 samples/sec   Loss 9.9806   LearningRate 0.0546   Epoch: 5   Global Step: 216480   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:02:42,832-Speed 2630.87 samples/sec   Loss 10.1740   LearningRate 0.0546   Epoch: 5   Global Step: 216490   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:02:46,743-Speed 2618.92 samples/sec   Loss 10.0428   LearningRate 0.0546   Epoch: 5   Global Step: 216500   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:02:50,646-Speed 2624.74 samples/sec   Loss 10.1231   LearningRate 0.0546   Epoch: 5   Global Step: 216510   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:02:54,540-Speed 2630.09 samples/sec   Loss 10.0081   LearningRate 0.0546   Epoch: 5   Global Step: 216520   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:02:58,427-Speed 2635.26 samples/sec   Loss 10.0590   LearningRate 0.0546   Epoch: 5   Global Step: 216530   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:02,321-Speed 2631.10 samples/sec   Loss 10.0873   LearningRate 0.0546   Epoch: 5   Global Step: 216540   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:06,218-Speed 2627.92 samples/sec   Loss 10.1131   LearningRate 0.0546   Epoch: 5   Global Step: 216550   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:10,145-Speed 2607.84 samples/sec   Loss 10.0188   LearningRate 0.0546   Epoch: 5   Global Step: 216560   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:14,038-Speed 2630.96 samples/sec   Loss 10.0041   LearningRate 0.0546   Epoch: 5   Global Step: 216570   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:17,930-Speed 2632.35 samples/sec   Loss 10.0229   LearningRate 0.0546   Epoch: 5   Global Step: 216580   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:21,837-Speed 2621.79 samples/sec   Loss 10.0692   LearningRate 0.0546   Epoch: 5   Global Step: 216590   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:25,729-Speed 2631.19 samples/sec   Loss 10.0824   LearningRate 0.0546   Epoch: 5   Global Step: 216600   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:29,619-Speed 2633.71 samples/sec   Loss 10.0409   LearningRate 0.0546   Epoch: 5   Global Step: 216610   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:33,529-Speed 2619.01 samples/sec   Loss 10.0942   LearningRate 0.0546   Epoch: 5   Global Step: 216620   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:03:37,422-Speed 2631.14 samples/sec   Loss 10.0293   LearningRate 0.0546   Epoch: 5   Global Step: 216630   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:03:41,315-Speed 2630.95 samples/sec   Loss 9.9682   LearningRate 0.0546   Epoch: 5   Global Step: 216640   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:03:45,211-Speed 2628.99 samples/sec   Loss 9.9969   LearningRate 0.0546   Epoch: 5   Global Step: 216650   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:03:49,111-Speed 2626.49 samples/sec   Loss 9.9300   LearningRate 0.0546   Epoch: 5   Global Step: 216660   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:03:53,009-Speed 2627.99 samples/sec   Loss 10.0362   LearningRate 0.0546   Epoch: 5   Global Step: 216670   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:03:56,907-Speed 2627.02 samples/sec   Loss 10.1355   LearningRate 0.0546   Epoch: 5   Global Step: 216680   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:04:00,815-Speed 2621.12 samples/sec   Loss 10.1593   LearningRate 0.0546   Epoch: 5   Global Step: 216690   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:04:04,718-Speed 2624.22 samples/sec   Loss 9.9772   LearningRate 0.0546   Epoch: 5   Global Step: 216700   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:04:08,621-Speed 2624.48 samples/sec   Loss 10.1129   LearningRate 0.0546   Epoch: 5   Global Step: 216710   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:04:12,513-Speed 2631.76 samples/sec   Loss 10.0196   LearningRate 0.0546   Epoch: 5   Global Step: 216720   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:04:16,405-Speed 2631.28 samples/sec   Loss 9.9015   LearningRate 0.0546   Epoch: 5   Global Step: 216730   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:20,303-Speed 2627.88 samples/sec   Loss 10.0913   LearningRate 0.0546   Epoch: 5   Global Step: 216740   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:24,207-Speed 2623.36 samples/sec   Loss 10.0472   LearningRate 0.0546   Epoch: 5   Global Step: 216750   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:28,105-Speed 2627.42 samples/sec   Loss 9.8549   LearningRate 0.0546   Epoch: 5   Global Step: 216760   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:31,999-Speed 2630.28 samples/sec   Loss 10.1018   LearningRate 0.0546   Epoch: 5   Global Step: 216770   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:35,916-Speed 2614.68 samples/sec   Loss 10.0443   LearningRate 0.0546   Epoch: 5   Global Step: 216780   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:39,817-Speed 2625.77 samples/sec   Loss 9.9793   LearningRate 0.0546   Epoch: 5   Global Step: 216790   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:43,711-Speed 2629.95 samples/sec   Loss 9.9900   LearningRate 0.0546   Epoch: 5   Global Step: 216800   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:47,609-Speed 2628.12 samples/sec   Loss 10.0682   LearningRate 0.0546   Epoch: 5   Global Step: 216810   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:51,503-Speed 2630.10 samples/sec   Loss 10.0313   LearningRate 0.0546   Epoch: 5   Global Step: 216820   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:04:55,400-Speed 2628.39 samples/sec   Loss 10.1056   LearningRate 0.0546   Epoch: 5   Global Step: 216830   Fp16 Grad Scale: 524288   Required: 69 hours
Training: 2022-04-13 20:04:59,279-Speed 2640.50 samples/sec   Loss 9.9827   LearningRate 0.0546   Epoch: 5   Global Step: 216840   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:03,181-Speed 2624.79 samples/sec   Loss 10.0372   LearningRate 0.0546   Epoch: 5   Global Step: 216850   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:07,073-Speed 2631.35 samples/sec   Loss 10.0318   LearningRate 0.0546   Epoch: 5   Global Step: 216860   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:10,968-Speed 2629.60 samples/sec   Loss 10.0534   LearningRate 0.0545   Epoch: 5   Global Step: 216870   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:14,863-Speed 2629.90 samples/sec   Loss 10.0579   LearningRate 0.0545   Epoch: 5   Global Step: 216880   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:18,760-Speed 2628.92 samples/sec   Loss 10.0201   LearningRate 0.0545   Epoch: 5   Global Step: 216890   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:22,652-Speed 2631.41 samples/sec   Loss 10.0954   LearningRate 0.0545   Epoch: 5   Global Step: 216900   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:26,544-Speed 2631.82 samples/sec   Loss 10.0812   LearningRate 0.0545   Epoch: 5   Global Step: 216910   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:30,470-Speed 2608.37 samples/sec   Loss 9.9815   LearningRate 0.0545   Epoch: 5   Global Step: 216920   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:05:34,346-Speed 2642.82 samples/sec   Loss 10.1541   LearningRate 0.0545   Epoch: 5   Global Step: 216930   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:05:38,256-Speed 2619.62 samples/sec   Loss 10.0795   LearningRate 0.0545   Epoch: 5   Global Step: 216940   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:05:42,157-Speed 2625.80 samples/sec   Loss 9.9195   LearningRate 0.0545   Epoch: 5   Global Step: 216950   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:05:46,051-Speed 2630.31 samples/sec   Loss 10.1355   LearningRate 0.0545   Epoch: 5   Global Step: 216960   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:05:49,954-Speed 2624.18 samples/sec   Loss 9.9105   LearningRate 0.0545   Epoch: 5   Global Step: 216970   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:05:53,859-Speed 2622.56 samples/sec   Loss 9.9455   LearningRate 0.0545   Epoch: 5   Global Step: 216980   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:05:57,885-Speed 2544.68 samples/sec   Loss 9.9500   LearningRate 0.0545   Epoch: 5   Global Step: 216990   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:01,784-Speed 2626.88 samples/sec   Loss 10.0066   LearningRate 0.0545   Epoch: 5   Global Step: 217000   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:05,671-Speed 2634.70 samples/sec   Loss 10.1468   LearningRate 0.0545   Epoch: 5   Global Step: 217010   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:09,560-Speed 2633.53 samples/sec   Loss 10.0161   LearningRate 0.0545   Epoch: 5   Global Step: 217020   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:13,434-Speed 2644.00 samples/sec   Loss 10.1798   LearningRate 0.0545   Epoch: 5   Global Step: 217030   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:17,345-Speed 2619.16 samples/sec   Loss 9.9199   LearningRate 0.0545   Epoch: 5   Global Step: 217040   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:21,237-Speed 2631.25 samples/sec   Loss 10.1146   LearningRate 0.0545   Epoch: 5   Global Step: 217050   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:25,130-Speed 2631.12 samples/sec   Loss 10.0760   LearningRate 0.0545   Epoch: 5   Global Step: 217060   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:29,027-Speed 2628.78 samples/sec   Loss 10.0989   LearningRate 0.0545   Epoch: 5   Global Step: 217070   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:06:32,955-Speed 2606.93 samples/sec   Loss 10.1636   LearningRate 0.0545   Epoch: 5   Global Step: 217080   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:06:36,815-Speed 2653.80 samples/sec   Loss 10.4585   LearningRate 0.0545   Epoch: 5   Global Step: 217090   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:06:40,704-Speed 2634.17 samples/sec   Loss 10.2140   LearningRate 0.0545   Epoch: 5   Global Step: 217100   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:06:44,590-Speed 2635.43 samples/sec   Loss 10.1633   LearningRate 0.0545   Epoch: 5   Global Step: 217110   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:06:48,476-Speed 2636.03 samples/sec   Loss 10.2315   LearningRate 0.0545   Epoch: 5   Global Step: 217120   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:06:52,366-Speed 2632.87 samples/sec   Loss 10.1710   LearningRate 0.0545   Epoch: 5   Global Step: 217130   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:06:56,252-Speed 2636.27 samples/sec   Loss 10.1395   LearningRate 0.0545   Epoch: 5   Global Step: 217140   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:07:00,138-Speed 2635.51 samples/sec   Loss 10.1025   LearningRate 0.0545   Epoch: 5   Global Step: 217150   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:07:04,032-Speed 2629.97 samples/sec   Loss 10.1127   LearningRate 0.0545   Epoch: 5   Global Step: 217160   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:07:07,934-Speed 2624.50 samples/sec   Loss 10.0332   LearningRate 0.0545   Epoch: 5   Global Step: 217170   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:07:11,846-Speed 2618.83 samples/sec   Loss 10.0382   LearningRate 0.0545   Epoch: 5   Global Step: 217180   Fp16 Grad Scale: 16384   Required: 69 hours
Training: 2022-04-13 20:07:15,753-Speed 2621.29 samples/sec   Loss 9.9599   LearningRate 0.0545   Epoch: 5   Global Step: 217190   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:19,654-Speed 2625.47 samples/sec   Loss 10.0123   LearningRate 0.0545   Epoch: 5   Global Step: 217200   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:23,572-Speed 2614.66 samples/sec   Loss 10.1065   LearningRate 0.0545   Epoch: 5   Global Step: 217210   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:27,469-Speed 2628.61 samples/sec   Loss 10.0463   LearningRate 0.0545   Epoch: 5   Global Step: 217220   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:31,360-Speed 2632.67 samples/sec   Loss 10.0176   LearningRate 0.0545   Epoch: 5   Global Step: 217230   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:35,254-Speed 2629.69 samples/sec   Loss 10.0639   LearningRate 0.0545   Epoch: 5   Global Step: 217240   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:39,247-Speed 2565.12 samples/sec   Loss 10.0764   LearningRate 0.0545   Epoch: 5   Global Step: 217250   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:43,137-Speed 2632.50 samples/sec   Loss 10.1571   LearningRate 0.0545   Epoch: 5   Global Step: 217260   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:47,028-Speed 2632.80 samples/sec   Loss 10.0825   LearningRate 0.0545   Epoch: 5   Global Step: 217270   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:50,923-Speed 2629.54 samples/sec   Loss 10.0185   LearningRate 0.0545   Epoch: 5   Global Step: 217280   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:07:54,817-Speed 2631.21 samples/sec   Loss 10.0667   LearningRate 0.0545   Epoch: 5   Global Step: 217290   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:07:58,710-Speed 2630.22 samples/sec   Loss 10.1981   LearningRate 0.0545   Epoch: 5   Global Step: 217300   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:08:02,607-Speed 2628.76 samples/sec   Loss 10.0923   LearningRate 0.0545   Epoch: 5   Global Step: 217310   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:08:06,499-Speed 2631.62 samples/sec   Loss 10.0721   LearningRate 0.0545   Epoch: 5   Global Step: 217320   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:08:10,428-Speed 2606.83 samples/sec   Loss 10.1296   LearningRate 0.0545   Epoch: 5   Global Step: 217330   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:08:14,323-Speed 2628.94 samples/sec   Loss 10.0501   LearningRate 0.0545   Epoch: 5   Global Step: 217340   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:08:18,215-Speed 2632.03 samples/sec   Loss 10.1830   LearningRate 0.0545   Epoch: 5   Global Step: 217350   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:08:22,109-Speed 2630.82 samples/sec   Loss 10.0462   LearningRate 0.0545   Epoch: 5   Global Step: 217360   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:08:26,022-Speed 2617.40 samples/sec   Loss 10.4574   LearningRate 0.0545   Epoch: 5   Global Step: 217370   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:29,920-Speed 2627.74 samples/sec   Loss 11.1287   LearningRate 0.0545   Epoch: 5   Global Step: 217380   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:33,820-Speed 2626.42 samples/sec   Loss 10.4590   LearningRate 0.0545   Epoch: 5   Global Step: 217390   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:37,703-Speed 2637.58 samples/sec   Loss 10.3181   LearningRate 0.0545   Epoch: 5   Global Step: 217400   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:41,594-Speed 2632.01 samples/sec   Loss 10.1067   LearningRate 0.0545   Epoch: 5   Global Step: 217410   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:45,487-Speed 2631.50 samples/sec   Loss 10.1444   LearningRate 0.0545   Epoch: 5   Global Step: 217420   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:49,376-Speed 2633.52 samples/sec   Loss 10.1066   LearningRate 0.0545   Epoch: 5   Global Step: 217430   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:53,267-Speed 2632.66 samples/sec   Loss 10.1837   LearningRate 0.0544   Epoch: 5   Global Step: 217440   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:08:57,156-Speed 2633.89 samples/sec   Loss 10.2390   LearningRate 0.0544   Epoch: 5   Global Step: 217450   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:09:01,045-Speed 2633.99 samples/sec   Loss 10.1422   LearningRate 0.0544   Epoch: 5   Global Step: 217460   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:09:05,114-Speed 2516.87 samples/sec   Loss 10.0577   LearningRate 0.0544   Epoch: 5   Global Step: 217470   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:09,204-Speed 2504.06 samples/sec   Loss 10.1584   LearningRate 0.0544   Epoch: 5   Global Step: 217480   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:13,243-Speed 2535.91 samples/sec   Loss 10.1698   LearningRate 0.0544   Epoch: 5   Global Step: 217490   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:17,135-Speed 2631.87 samples/sec   Loss 10.2079   LearningRate 0.0544   Epoch: 5   Global Step: 217500   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:21,085-Speed 2593.28 samples/sec   Loss 10.1691   LearningRate 0.0544   Epoch: 5   Global Step: 217510   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:25,044-Speed 2586.99 samples/sec   Loss 10.0856   LearningRate 0.0544   Epoch: 5   Global Step: 217520   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:28,934-Speed 2633.64 samples/sec   Loss 10.1531   LearningRate 0.0544   Epoch: 5   Global Step: 217530   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:32,848-Speed 2616.65 samples/sec   Loss 10.0526   LearningRate 0.0544   Epoch: 5   Global Step: 217540   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:36,740-Speed 2631.91 samples/sec   Loss 10.1068   LearningRate 0.0544   Epoch: 5   Global Step: 217550   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:40,633-Speed 2630.67 samples/sec   Loss 10.1644   LearningRate 0.0544   Epoch: 5   Global Step: 217560   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:09:44,539-Speed 2622.18 samples/sec   Loss 10.0139   LearningRate 0.0544   Epoch: 5   Global Step: 217570   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:09:48,497-Speed 2587.80 samples/sec   Loss 10.0515   LearningRate 0.0544   Epoch: 5   Global Step: 217580   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:09:52,408-Speed 2619.38 samples/sec   Loss 10.0225   LearningRate 0.0544   Epoch: 5   Global Step: 217590   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:09:56,300-Speed 2631.80 samples/sec   Loss 10.1342   LearningRate 0.0544   Epoch: 5   Global Step: 217600   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:00,193-Speed 2630.56 samples/sec   Loss 9.9988   LearningRate 0.0544   Epoch: 5   Global Step: 217610   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:04,092-Speed 2627.15 samples/sec   Loss 9.9976   LearningRate 0.0544   Epoch: 5   Global Step: 217620   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:07,987-Speed 2629.28 samples/sec   Loss 10.0973   LearningRate 0.0544   Epoch: 5   Global Step: 217630   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:11,883-Speed 2628.70 samples/sec   Loss 10.0947   LearningRate 0.0544   Epoch: 5   Global Step: 217640   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:15,776-Speed 2631.56 samples/sec   Loss 10.0439   LearningRate 0.0544   Epoch: 5   Global Step: 217650   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:19,670-Speed 2629.96 samples/sec   Loss 9.9518   LearningRate 0.0544   Epoch: 5   Global Step: 217660   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:23,568-Speed 2628.55 samples/sec   Loss 9.9444   LearningRate 0.0544   Epoch: 5   Global Step: 217670   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:10:27,471-Speed 2624.04 samples/sec   Loss 10.0400   LearningRate 0.0544   Epoch: 5   Global Step: 217680   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:10:31,363-Speed 2631.70 samples/sec   Loss 10.1288   LearningRate 0.0544   Epoch: 5   Global Step: 217690   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:10:35,289-Speed 2608.38 samples/sec   Loss 10.0135   LearningRate 0.0544   Epoch: 5   Global Step: 217700   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:10:39,163-Speed 2644.08 samples/sec   Loss 10.0862   LearningRate 0.0544   Epoch: 5   Global Step: 217710   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:43,056-Speed 2631.10 samples/sec   Loss 9.9968   LearningRate 0.0544   Epoch: 5   Global Step: 217720   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:46,947-Speed 2632.17 samples/sec   Loss 10.0521   LearningRate 0.0544   Epoch: 5   Global Step: 217730   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:50,851-Speed 2623.65 samples/sec   Loss 10.0330   LearningRate 0.0544   Epoch: 5   Global Step: 217740   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:54,738-Speed 2635.50 samples/sec   Loss 10.0074   LearningRate 0.0544   Epoch: 5   Global Step: 217750   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:10:58,630-Speed 2631.31 samples/sec   Loss 9.9975   LearningRate 0.0544   Epoch: 5   Global Step: 217760   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:11:02,520-Speed 2632.96 samples/sec   Loss 10.1640   LearningRate 0.0544   Epoch: 5   Global Step: 217770   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:11:06,416-Speed 2629.19 samples/sec   Loss 10.0885   LearningRate 0.0544   Epoch: 5   Global Step: 217780   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:11:10,283-Speed 2648.83 samples/sec   Loss 10.4367   LearningRate 0.0544   Epoch: 5   Global Step: 217790   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:14,182-Speed 2626.79 samples/sec   Loss 10.1568   LearningRate 0.0544   Epoch: 5   Global Step: 217800   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:18,246-Speed 2520.51 samples/sec   Loss 10.1481   LearningRate 0.0544   Epoch: 5   Global Step: 217810   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:22,138-Speed 2631.29 samples/sec   Loss 10.0982   LearningRate 0.0544   Epoch: 5   Global Step: 217820   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:26,044-Speed 2622.82 samples/sec   Loss 10.1176   LearningRate 0.0544   Epoch: 5   Global Step: 217830   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:29,940-Speed 2629.28 samples/sec   Loss 9.9946   LearningRate 0.0544   Epoch: 5   Global Step: 217840   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:33,840-Speed 2625.95 samples/sec   Loss 10.1233   LearningRate 0.0544   Epoch: 5   Global Step: 217850   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:37,734-Speed 2630.47 samples/sec   Loss 10.0687   LearningRate 0.0544   Epoch: 5   Global Step: 217860   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:41,622-Speed 2634.16 samples/sec   Loss 10.1467   LearningRate 0.0544   Epoch: 5   Global Step: 217870   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:45,513-Speed 2632.34 samples/sec   Loss 10.0776   LearningRate 0.0544   Epoch: 5   Global Step: 217880   Fp16 Grad Scale: 32768   Required: 69 hours
Training: 2022-04-13 20:11:49,408-Speed 2629.56 samples/sec   Loss 10.0334   LearningRate 0.0544   Epoch: 5   Global Step: 217890   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:11:53,299-Speed 2632.76 samples/sec   Loss 9.9675   LearningRate 0.0544   Epoch: 5   Global Step: 217900   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:11:57,192-Speed 2630.96 samples/sec   Loss 10.1356   LearningRate 0.0544   Epoch: 5   Global Step: 217910   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:01,083-Speed 2632.11 samples/sec   Loss 9.9899   LearningRate 0.0544   Epoch: 5   Global Step: 217920   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:04,977-Speed 2630.35 samples/sec   Loss 9.8191   LearningRate 0.0544   Epoch: 5   Global Step: 217930   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:08,865-Speed 2634.25 samples/sec   Loss 10.0460   LearningRate 0.0544   Epoch: 5   Global Step: 217940   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:12,757-Speed 2631.61 samples/sec   Loss 10.0970   LearningRate 0.0544   Epoch: 5   Global Step: 217950   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:16,649-Speed 2631.88 samples/sec   Loss 10.0359   LearningRate 0.0544   Epoch: 5   Global Step: 217960   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:20,566-Speed 2614.84 samples/sec   Loss 9.8988   LearningRate 0.0544   Epoch: 5   Global Step: 217970   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:24,489-Speed 2611.40 samples/sec   Loss 10.0387   LearningRate 0.0544   Epoch: 5   Global Step: 217980   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:12:28,390-Speed 2625.77 samples/sec   Loss 10.0061   LearningRate 0.0544   Epoch: 5   Global Step: 217990   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:32,297-Speed 2621.49 samples/sec   Loss 10.0660   LearningRate 0.0543   Epoch: 5   Global Step: 218000   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:36,192-Speed 2629.25 samples/sec   Loss 10.0600   LearningRate 0.0543   Epoch: 5   Global Step: 218010   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:40,094-Speed 2624.87 samples/sec   Loss 10.0744   LearningRate 0.0543   Epoch: 5   Global Step: 218020   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:43,992-Speed 2627.61 samples/sec   Loss 10.0627   LearningRate 0.0543   Epoch: 5   Global Step: 218030   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:47,914-Speed 2611.83 samples/sec   Loss 10.0427   LearningRate 0.0543   Epoch: 5   Global Step: 218040   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:51,809-Speed 2629.14 samples/sec   Loss 10.0970   LearningRate 0.0543   Epoch: 5   Global Step: 218050   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:55,719-Speed 2619.97 samples/sec   Loss 10.0568   LearningRate 0.0543   Epoch: 5   Global Step: 218060   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:12:59,620-Speed 2625.51 samples/sec   Loss 10.0643   LearningRate 0.0543   Epoch: 5   Global Step: 218070   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:03,523-Speed 2624.40 samples/sec   Loss 10.0285   LearningRate 0.0543   Epoch: 5   Global Step: 218080   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:07,425-Speed 2625.12 samples/sec   Loss 9.9531   LearningRate 0.0543   Epoch: 5   Global Step: 218090   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:13:11,317-Speed 2631.83 samples/sec   Loss 10.0583   LearningRate 0.0543   Epoch: 5   Global Step: 218100   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:13:15,213-Speed 2628.60 samples/sec   Loss 10.1306   LearningRate 0.0543   Epoch: 5   Global Step: 218110   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:13:19,091-Speed 2641.49 samples/sec   Loss 10.0554   LearningRate 0.0543   Epoch: 5   Global Step: 218120   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:22,983-Speed 2631.60 samples/sec   Loss 10.0395   LearningRate 0.0543   Epoch: 5   Global Step: 218130   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:27,019-Speed 2538.29 samples/sec   Loss 10.0835   LearningRate 0.0543   Epoch: 5   Global Step: 218140   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:31,084-Speed 2519.42 samples/sec   Loss 9.9614   LearningRate 0.0543   Epoch: 5   Global Step: 218150   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:35,082-Speed 2561.83 samples/sec   Loss 9.9924   LearningRate 0.0543   Epoch: 5   Global Step: 218160   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:39,010-Speed 2607.14 samples/sec   Loss 10.1951   LearningRate 0.0543   Epoch: 5   Global Step: 218170   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:42,902-Speed 2631.79 samples/sec   Loss 10.0745   LearningRate 0.0543   Epoch: 5   Global Step: 218180   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:46,803-Speed 2625.06 samples/sec   Loss 10.0729   LearningRate 0.0543   Epoch: 5   Global Step: 218190   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:50,695-Speed 2631.85 samples/sec   Loss 9.9785   LearningRate 0.0543   Epoch: 5   Global Step: 218200   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:54,586-Speed 2632.22 samples/sec   Loss 9.9833   LearningRate 0.0543   Epoch: 5   Global Step: 218210   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:13:58,476-Speed 2633.95 samples/sec   Loss 9.9666   LearningRate 0.0543   Epoch: 5   Global Step: 218220   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:14:02,351-Speed 2643.12 samples/sec   Loss 9.9895   LearningRate 0.0543   Epoch: 5   Global Step: 218230   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:06,243-Speed 2631.33 samples/sec   Loss 9.9958   LearningRate 0.0543   Epoch: 5   Global Step: 218240   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:10,134-Speed 2632.23 samples/sec   Loss 10.1030   LearningRate 0.0543   Epoch: 5   Global Step: 218250   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:14,023-Speed 2633.64 samples/sec   Loss 9.9691   LearningRate 0.0543   Epoch: 5   Global Step: 218260   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:17,914-Speed 2632.05 samples/sec   Loss 10.0582   LearningRate 0.0543   Epoch: 5   Global Step: 218270   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:21,816-Speed 2624.84 samples/sec   Loss 10.0572   LearningRate 0.0543   Epoch: 5   Global Step: 218280   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:25,732-Speed 2615.93 samples/sec   Loss 10.0170   LearningRate 0.0543   Epoch: 5   Global Step: 218290   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:29,622-Speed 2632.44 samples/sec   Loss 9.9769   LearningRate 0.0543   Epoch: 5   Global Step: 218300   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:33,515-Speed 2631.51 samples/sec   Loss 10.0784   LearningRate 0.0543   Epoch: 5   Global Step: 218310   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:37,409-Speed 2630.11 samples/sec   Loss 10.0153   LearningRate 0.0543   Epoch: 5   Global Step: 218320   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:41,326-Speed 2615.15 samples/sec   Loss 9.9660   LearningRate 0.0543   Epoch: 5   Global Step: 218330   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:14:45,253-Speed 2607.94 samples/sec   Loss 10.1008   LearningRate 0.0543   Epoch: 5   Global Step: 218340   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:49,147-Speed 2630.28 samples/sec   Loss 9.9166   LearningRate 0.0543   Epoch: 5   Global Step: 218350   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:53,045-Speed 2627.82 samples/sec   Loss 10.0474   LearningRate 0.0543   Epoch: 5   Global Step: 218360   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:14:56,945-Speed 2626.62 samples/sec   Loss 10.1493   LearningRate 0.0543   Epoch: 5   Global Step: 218370   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:00,845-Speed 2626.68 samples/sec   Loss 10.1483   LearningRate 0.0543   Epoch: 5   Global Step: 218380   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:04,743-Speed 2627.34 samples/sec   Loss 10.0854   LearningRate 0.0543   Epoch: 5   Global Step: 218390   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:08,639-Speed 2628.94 samples/sec   Loss 10.0507   LearningRate 0.0543   Epoch: 5   Global Step: 218400   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:12,534-Speed 2629.96 samples/sec   Loss 9.9535   LearningRate 0.0543   Epoch: 5   Global Step: 218410   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:16,427-Speed 2630.92 samples/sec   Loss 10.0315   LearningRate 0.0543   Epoch: 5   Global Step: 218420   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:20,321-Speed 2630.07 samples/sec   Loss 10.0884   LearningRate 0.0543   Epoch: 5   Global Step: 218430   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:24,212-Speed 2632.64 samples/sec   Loss 9.8550   LearningRate 0.0543   Epoch: 5   Global Step: 218440   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:15:28,102-Speed 2632.75 samples/sec   Loss 9.9957   LearningRate 0.0543   Epoch: 5   Global Step: 218450   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:15:31,997-Speed 2630.55 samples/sec   Loss 10.0913   LearningRate 0.0543   Epoch: 5   Global Step: 218460   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:15:35,902-Speed 2622.40 samples/sec   Loss 10.1362   LearningRate 0.0543   Epoch: 5   Global Step: 218470   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:15:39,799-Speed 2628.40 samples/sec   Loss 10.1242   LearningRate 0.0543   Epoch: 5   Global Step: 218480   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:15:43,699-Speed 2626.13 samples/sec   Loss 9.9719   LearningRate 0.0543   Epoch: 5   Global Step: 218490   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:47,585-Speed 2635.67 samples/sec   Loss 10.1098   LearningRate 0.0543   Epoch: 5   Global Step: 218500   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:51,483-Speed 2627.50 samples/sec   Loss 10.0373   LearningRate 0.0543   Epoch: 5   Global Step: 218510   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:55,383-Speed 2626.74 samples/sec   Loss 9.9240   LearningRate 0.0543   Epoch: 5   Global Step: 218520   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:15:59,279-Speed 2629.09 samples/sec   Loss 10.0497   LearningRate 0.0543   Epoch: 5   Global Step: 218530   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:16:03,237-Speed 2587.82 samples/sec   Loss 10.0975   LearningRate 0.0543   Epoch: 5   Global Step: 218540   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:16:07,111-Speed 2643.61 samples/sec   Loss 10.0148   LearningRate 0.0543   Epoch: 5   Global Step: 218550   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:11,026-Speed 2616.94 samples/sec   Loss 10.1473   LearningRate 0.0542   Epoch: 5   Global Step: 218560   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:14,916-Speed 2632.66 samples/sec   Loss 10.0384   LearningRate 0.0542   Epoch: 5   Global Step: 218570   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:18,812-Speed 2629.27 samples/sec   Loss 9.9203   LearningRate 0.0542   Epoch: 5   Global Step: 218580   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:22,737-Speed 2610.30 samples/sec   Loss 10.0120   LearningRate 0.0542   Epoch: 5   Global Step: 218590   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:26,632-Speed 2629.62 samples/sec   Loss 10.1177   LearningRate 0.0542   Epoch: 5   Global Step: 218600   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:30,531-Speed 2626.72 samples/sec   Loss 10.0681   LearningRate 0.0542   Epoch: 5   Global Step: 218610   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:34,427-Speed 2628.91 samples/sec   Loss 10.0942   LearningRate 0.0542   Epoch: 5   Global Step: 218620   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:38,319-Speed 2631.88 samples/sec   Loss 9.9943   LearningRate 0.0542   Epoch: 5   Global Step: 218630   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:42,219-Speed 2626.46 samples/sec   Loss 10.0674   LearningRate 0.0542   Epoch: 5   Global Step: 218640   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:16:46,223-Speed 2557.95 samples/sec   Loss 9.9767   LearningRate 0.0542   Epoch: 5   Global Step: 218650   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:16:50,118-Speed 2629.78 samples/sec   Loss 10.1444   LearningRate 0.0542   Epoch: 5   Global Step: 218660   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:16:54,016-Speed 2627.85 samples/sec   Loss 9.8594   LearningRate 0.0542   Epoch: 5   Global Step: 218670   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:16:57,912-Speed 2628.79 samples/sec   Loss 10.0918   LearningRate 0.0542   Epoch: 5   Global Step: 218680   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:01,806-Speed 2630.22 samples/sec   Loss 10.0289   LearningRate 0.0542   Epoch: 5   Global Step: 218690   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:05,702-Speed 2629.48 samples/sec   Loss 9.9871   LearningRate 0.0542   Epoch: 5   Global Step: 218700   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:09,603-Speed 2625.16 samples/sec   Loss 10.1083   LearningRate 0.0542   Epoch: 5   Global Step: 218710   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:13,499-Speed 2629.39 samples/sec   Loss 10.1507   LearningRate 0.0542   Epoch: 5   Global Step: 218720   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:17,398-Speed 2627.72 samples/sec   Loss 9.9886   LearningRate 0.0542   Epoch: 5   Global Step: 218730   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:21,343-Speed 2596.20 samples/sec   Loss 10.1028   LearningRate 0.0542   Epoch: 5   Global Step: 218740   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:25,242-Speed 2627.13 samples/sec   Loss 10.0931   LearningRate 0.0542   Epoch: 5   Global Step: 218750   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:17:29,144-Speed 2624.72 samples/sec   Loss 10.1606   LearningRate 0.0542   Epoch: 5   Global Step: 218760   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:17:33,022-Speed 2641.78 samples/sec   Loss 9.9648   LearningRate 0.0542   Epoch: 5   Global Step: 218770   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:36,920-Speed 2627.09 samples/sec   Loss 10.0013   LearningRate 0.0542   Epoch: 5   Global Step: 218780   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:40,820-Speed 2626.39 samples/sec   Loss 10.0684   LearningRate 0.0542   Epoch: 5   Global Step: 218790   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:44,718-Speed 2627.90 samples/sec   Loss 9.9735   LearningRate 0.0542   Epoch: 5   Global Step: 218800   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:48,617-Speed 2627.33 samples/sec   Loss 9.9950   LearningRate 0.0542   Epoch: 5   Global Step: 218810   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:52,510-Speed 2631.13 samples/sec   Loss 10.0293   LearningRate 0.0542   Epoch: 5   Global Step: 218820   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:17:56,400-Speed 2632.85 samples/sec   Loss 10.1088   LearningRate 0.0542   Epoch: 5   Global Step: 218830   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:00,290-Speed 2633.14 samples/sec   Loss 9.9623   LearningRate 0.0542   Epoch: 5   Global Step: 218840   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:04,184-Speed 2629.59 samples/sec   Loss 9.9415   LearningRate 0.0542   Epoch: 5   Global Step: 218850   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:08,088-Speed 2623.79 samples/sec   Loss 9.9904   LearningRate 0.0542   Epoch: 5   Global Step: 218860   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:11,985-Speed 2628.44 samples/sec   Loss 10.0636   LearningRate 0.0542   Epoch: 5   Global Step: 218870   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:15,893-Speed 2621.21 samples/sec   Loss 10.0654   LearningRate 0.0542   Epoch: 5   Global Step: 218880   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:19,799-Speed 2622.59 samples/sec   Loss 9.9790   LearningRate 0.0542   Epoch: 5   Global Step: 218890   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:23,717-Speed 2614.24 samples/sec   Loss 9.9989   LearningRate 0.0542   Epoch: 5   Global Step: 218900   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:27,608-Speed 2632.91 samples/sec   Loss 9.9674   LearningRate 0.0542   Epoch: 5   Global Step: 218910   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:31,498-Speed 2632.64 samples/sec   Loss 10.0210   LearningRate 0.0542   Epoch: 5   Global Step: 218920   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:35,390-Speed 2631.27 samples/sec   Loss 9.9453   LearningRate 0.0542   Epoch: 5   Global Step: 218930   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:39,292-Speed 2625.07 samples/sec   Loss 9.9155   LearningRate 0.0542   Epoch: 5   Global Step: 218940   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:43,195-Speed 2624.45 samples/sec   Loss 9.9185   LearningRate 0.0542   Epoch: 5   Global Step: 218950   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:47,091-Speed 2629.37 samples/sec   Loss 10.0309   LearningRate 0.0542   Epoch: 5   Global Step: 218960   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:18:50,982-Speed 2632.08 samples/sec   Loss 10.0034   LearningRate 0.0542   Epoch: 5   Global Step: 218970   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:18:54,907-Speed 2609.61 samples/sec   Loss 9.9939   LearningRate 0.0542   Epoch: 5   Global Step: 218980   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:18:58,798-Speed 2632.74 samples/sec   Loss 9.9940   LearningRate 0.0542   Epoch: 5   Global Step: 218990   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:19:02,693-Speed 2629.85 samples/sec   Loss 9.9783   LearningRate 0.0542   Epoch: 5   Global Step: 219000   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:19:06,570-Speed 2641.31 samples/sec   Loss 10.0626   LearningRate 0.0542   Epoch: 5   Global Step: 219010   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:19:10,456-Speed 2635.38 samples/sec   Loss 10.1523   LearningRate 0.0542   Epoch: 5   Global Step: 219020   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:14,360-Speed 2624.35 samples/sec   Loss 10.0614   LearningRate 0.0542   Epoch: 5   Global Step: 219030   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:18,253-Speed 2630.95 samples/sec   Loss 9.9222   LearningRate 0.0542   Epoch: 5   Global Step: 219040   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:22,145-Speed 2631.75 samples/sec   Loss 10.0708   LearningRate 0.0542   Epoch: 5   Global Step: 219050   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:26,037-Speed 2632.03 samples/sec   Loss 10.1807   LearningRate 0.0542   Epoch: 5   Global Step: 219060   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:29,931-Speed 2630.31 samples/sec   Loss 9.9998   LearningRate 0.0542   Epoch: 5   Global Step: 219070   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:33,834-Speed 2623.59 samples/sec   Loss 10.0663   LearningRate 0.0542   Epoch: 5   Global Step: 219080   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:37,726-Speed 2631.47 samples/sec   Loss 10.0483   LearningRate 0.0542   Epoch: 5   Global Step: 219090   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:41,618-Speed 2632.31 samples/sec   Loss 10.0424   LearningRate 0.0542   Epoch: 5   Global Step: 219100   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:45,515-Speed 2628.24 samples/sec   Loss 9.9180   LearningRate 0.0542   Epoch: 5   Global Step: 219110   Fp16 Grad Scale: 65536   Required: 69 hours
Training: 2022-04-13 20:19:49,439-Speed 2610.16 samples/sec   Loss 9.9792   LearningRate 0.0541   Epoch: 5   Global Step: 219120   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:19:53,512-Speed 2514.73 samples/sec   Loss 9.9893   LearningRate 0.0541   Epoch: 5   Global Step: 219130   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:19:57,588-Speed 2512.89 samples/sec   Loss 9.9848   LearningRate 0.0541   Epoch: 5   Global Step: 219140   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:01,658-Speed 2516.45 samples/sec   Loss 10.0273   LearningRate 0.0541   Epoch: 5   Global Step: 219150   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:05,641-Speed 2571.69 samples/sec   Loss 10.1292   LearningRate 0.0541   Epoch: 5   Global Step: 219160   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:09,546-Speed 2622.57 samples/sec   Loss 9.9445   LearningRate 0.0541   Epoch: 5   Global Step: 219170   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:13,449-Speed 2624.49 samples/sec   Loss 10.0377   LearningRate 0.0541   Epoch: 5   Global Step: 219180   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:17,356-Speed 2621.64 samples/sec   Loss 10.1857   LearningRate 0.0541   Epoch: 5   Global Step: 219190   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:21,250-Speed 2630.12 samples/sec   Loss 10.1430   LearningRate 0.0541   Epoch: 5   Global Step: 219200   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:25,155-Speed 2623.14 samples/sec   Loss 9.9388   LearningRate 0.0541   Epoch: 5   Global Step: 219210   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:20:29,050-Speed 2629.47 samples/sec   Loss 9.9866   LearningRate 0.0541   Epoch: 5   Global Step: 219220   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:20:32,948-Speed 2627.73 samples/sec   Loss 9.9781   LearningRate 0.0541   Epoch: 5   Global Step: 219230   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:20:36,852-Speed 2623.75 samples/sec   Loss 9.9573   LearningRate 0.0541   Epoch: 5   Global Step: 219240   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:20:40,736-Speed 2636.47 samples/sec   Loss 9.9393   LearningRate 0.0541   Epoch: 5   Global Step: 219250   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:20:44,630-Speed 2630.69 samples/sec   Loss 10.1807   LearningRate 0.0541   Epoch: 5   Global Step: 219260   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:20:48,521-Speed 2632.28 samples/sec   Loss 9.9795   LearningRate 0.0541   Epoch: 5   Global Step: 219270   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:20:52,410-Speed 2633.56 samples/sec   Loss 10.0123   LearningRate 0.0541   Epoch: 5   Global Step: 219280   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:20:56,302-Speed 2631.80 samples/sec   Loss 10.0497   LearningRate 0.0541   Epoch: 5   Global Step: 219290   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:21:00,198-Speed 2629.47 samples/sec   Loss 10.0614   LearningRate 0.0541   Epoch: 5   Global Step: 219300   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:21:04,091-Speed 2631.25 samples/sec   Loss 9.9597   LearningRate 0.0541   Epoch: 5   Global Step: 219310   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:21:07,967-Speed 2641.87 samples/sec   Loss 10.1479   LearningRate 0.0541   Epoch: 5   Global Step: 219320   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:21:11,857-Speed 2632.92 samples/sec   Loss 9.9663   LearningRate 0.0541   Epoch: 5   Global Step: 219330   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:21:15,749-Speed 2631.77 samples/sec   Loss 10.0250   LearningRate 0.0541   Epoch: 5   Global Step: 219340   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:21:19,641-Speed 2632.06 samples/sec   Loss 9.9026   LearningRate 0.0541   Epoch: 5   Global Step: 219350   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:21:23,519-Speed 2640.88 samples/sec   Loss 10.0417   LearningRate 0.0541   Epoch: 5   Global Step: 219360   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:21:27,403-Speed 2636.93 samples/sec   Loss 10.0396   LearningRate 0.0541   Epoch: 5   Global Step: 219370   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:21:31,297-Speed 2630.99 samples/sec   Loss 10.0664   LearningRate 0.0541   Epoch: 5   Global Step: 219380   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:21:35,247-Speed 2592.82 samples/sec   Loss 9.9745   LearningRate 0.0541   Epoch: 5   Global Step: 219390   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:21:39,145-Speed 2627.64 samples/sec   Loss 9.9969   LearningRate 0.0541   Epoch: 5   Global Step: 219400   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:21:43,036-Speed 2631.92 samples/sec   Loss 9.9790   LearningRate 0.0541   Epoch: 5   Global Step: 219410   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:21:46,927-Speed 2632.77 samples/sec   Loss 9.9665   LearningRate 0.0541   Epoch: 5   Global Step: 219420   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:21:50,817-Speed 2633.06 samples/sec   Loss 10.0347   LearningRate 0.0541   Epoch: 5   Global Step: 219430   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:21:54,708-Speed 2632.60 samples/sec   Loss 10.0895   LearningRate 0.0541   Epoch: 5   Global Step: 219440   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:21:58,602-Speed 2630.53 samples/sec   Loss 10.0710   LearningRate 0.0541   Epoch: 5   Global Step: 219450   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:02,498-Speed 2628.70 samples/sec   Loss 10.0501   LearningRate 0.0541   Epoch: 5   Global Step: 219460   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:22:06,395-Speed 2628.40 samples/sec   Loss 10.1238   LearningRate 0.0541   Epoch: 5   Global Step: 219470   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:22:10,296-Speed 2625.73 samples/sec   Loss 9.8773   LearningRate 0.0541   Epoch: 5   Global Step: 219480   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:22:14,210-Speed 2617.00 samples/sec   Loss 9.9269   LearningRate 0.0541   Epoch: 5   Global Step: 219490   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:22:18,104-Speed 2630.24 samples/sec   Loss 10.0223   LearningRate 0.0541   Epoch: 5   Global Step: 219500   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:22:22,002-Speed 2627.59 samples/sec   Loss 10.1305   LearningRate 0.0541   Epoch: 5   Global Step: 219510   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:22:25,896-Speed 2630.23 samples/sec   Loss 10.0255   LearningRate 0.0541   Epoch: 5   Global Step: 219520   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:29,810-Speed 2616.89 samples/sec   Loss 10.0652   LearningRate 0.0541   Epoch: 5   Global Step: 219530   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:33,706-Speed 2628.80 samples/sec   Loss 9.9454   LearningRate 0.0541   Epoch: 5   Global Step: 219540   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:37,611-Speed 2623.31 samples/sec   Loss 10.0909   LearningRate 0.0541   Epoch: 5   Global Step: 219550   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:41,527-Speed 2615.05 samples/sec   Loss 9.9847   LearningRate 0.0541   Epoch: 5   Global Step: 219560   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:45,435-Speed 2621.09 samples/sec   Loss 10.0680   LearningRate 0.0541   Epoch: 5   Global Step: 219570   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:49,330-Speed 2629.53 samples/sec   Loss 10.0363   LearningRate 0.0541   Epoch: 5   Global Step: 219580   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:53,222-Speed 2632.08 samples/sec   Loss 10.0211   LearningRate 0.0541   Epoch: 5   Global Step: 219590   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:22:57,112-Speed 2632.85 samples/sec   Loss 10.0274   LearningRate 0.0541   Epoch: 5   Global Step: 219600   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:01,002-Speed 2632.55 samples/sec   Loss 10.0392   LearningRate 0.0541   Epoch: 5   Global Step: 219610   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:04,902-Speed 2626.98 samples/sec   Loss 10.0463   LearningRate 0.0541   Epoch: 5   Global Step: 219620   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:23:08,793-Speed 2632.15 samples/sec   Loss 10.0101   LearningRate 0.0541   Epoch: 5   Global Step: 219630   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:23:12,661-Speed 2647.95 samples/sec   Loss 10.0421   LearningRate 0.0541   Epoch: 5   Global Step: 219640   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:16,566-Speed 2622.68 samples/sec   Loss 9.9684   LearningRate 0.0541   Epoch: 5   Global Step: 219650   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:20,456-Speed 2633.30 samples/sec   Loss 9.9565   LearningRate 0.0541   Epoch: 5   Global Step: 219660   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:24,350-Speed 2630.12 samples/sec   Loss 10.0608   LearningRate 0.0541   Epoch: 5   Global Step: 219670   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:28,245-Speed 2629.53 samples/sec   Loss 9.9962   LearningRate 0.0541   Epoch: 5   Global Step: 219680   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:32,137-Speed 2631.50 samples/sec   Loss 10.0436   LearningRate 0.0540   Epoch: 5   Global Step: 219690   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:36,028-Speed 2632.69 samples/sec   Loss 10.1261   LearningRate 0.0540   Epoch: 5   Global Step: 219700   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:39,921-Speed 2630.91 samples/sec   Loss 10.0081   LearningRate 0.0540   Epoch: 5   Global Step: 219710   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:43,814-Speed 2631.19 samples/sec   Loss 10.0462   LearningRate 0.0540   Epoch: 5   Global Step: 219720   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:47,722-Speed 2620.77 samples/sec   Loss 10.0509   LearningRate 0.0540   Epoch: 5   Global Step: 219730   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:23:51,632-Speed 2619.49 samples/sec   Loss 10.0479   LearningRate 0.0540   Epoch: 5   Global Step: 219740   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:23:55,541-Speed 2620.08 samples/sec   Loss 9.9296   LearningRate 0.0540   Epoch: 5   Global Step: 219750   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:23:59,463-Speed 2611.62 samples/sec   Loss 9.8947   LearningRate 0.0540   Epoch: 5   Global Step: 219760   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:24:03,371-Speed 2621.50 samples/sec   Loss 10.0902   LearningRate 0.0540   Epoch: 5   Global Step: 219770   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:24:07,275-Speed 2623.16 samples/sec   Loss 10.1367   LearningRate 0.0540   Epoch: 5   Global Step: 219780   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:24:11,171-Speed 2628.97 samples/sec   Loss 10.0022   LearningRate 0.0540   Epoch: 5   Global Step: 219790   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:15,091-Speed 2613.26 samples/sec   Loss 10.0311   LearningRate 0.0540   Epoch: 5   Global Step: 219800   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:18,989-Speed 2627.79 samples/sec   Loss 9.9355   LearningRate 0.0540   Epoch: 5   Global Step: 219810   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:22,919-Speed 2606.44 samples/sec   Loss 9.9337   LearningRate 0.0540   Epoch: 5   Global Step: 219820   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:26,810-Speed 2632.83 samples/sec   Loss 10.0110   LearningRate 0.0540   Epoch: 5   Global Step: 219830   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:30,701-Speed 2632.02 samples/sec   Loss 9.9141   LearningRate 0.0540   Epoch: 5   Global Step: 219840   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:34,594-Speed 2630.86 samples/sec   Loss 10.1088   LearningRate 0.0540   Epoch: 5   Global Step: 219850   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:38,485-Speed 2632.26 samples/sec   Loss 9.9927   LearningRate 0.0540   Epoch: 5   Global Step: 219860   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:24:42,369-Speed 2637.47 samples/sec   Loss 9.9011   LearningRate 0.0540   Epoch: 5   Global Step: 219870   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:24:46,268-Speed 2626.75 samples/sec   Loss 9.9875   LearningRate 0.0540   Epoch: 5   Global Step: 219880   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:24:50,169-Speed 2626.04 samples/sec   Loss 9.8731   LearningRate 0.0540   Epoch: 5   Global Step: 219890   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:24:54,074-Speed 2622.71 samples/sec   Loss 10.0153   LearningRate 0.0540   Epoch: 5   Global Step: 219900   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:24:57,976-Speed 2625.57 samples/sec   Loss 10.1247   LearningRate 0.0540   Epoch: 5   Global Step: 219910   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:25:01,877-Speed 2625.61 samples/sec   Loss 9.9480   LearningRate 0.0540   Epoch: 5   Global Step: 219920   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:25:05,774-Speed 2627.60 samples/sec   Loss 10.0217   LearningRate 0.0540   Epoch: 5   Global Step: 219930   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:25:09,676-Speed 2625.31 samples/sec   Loss 10.0062   LearningRate 0.0540   Epoch: 5   Global Step: 219940   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:25:13,565-Speed 2633.38 samples/sec   Loss 10.0309   LearningRate 0.0540   Epoch: 5   Global Step: 219950   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:25:17,454-Speed 2633.62 samples/sec   Loss 10.1350   LearningRate 0.0540   Epoch: 5   Global Step: 219960   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:25:21,353-Speed 2627.76 samples/sec   Loss 9.8697   LearningRate 0.0540   Epoch: 5   Global Step: 219970   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:25:25,243-Speed 2633.08 samples/sec   Loss 10.0525   LearningRate 0.0540   Epoch: 5   Global Step: 219980   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:25:29,147-Speed 2623.71 samples/sec   Loss 9.9963   LearningRate 0.0540   Epoch: 5   Global Step: 219990   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:25:33,051-Speed 2623.29 samples/sec   Loss 9.9675   LearningRate 0.0540   Epoch: 5   Global Step: 220000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:26:16,230-[lfw][220000]XNorm: 23.200060
Training: 2022-04-13 20:26:16,231-[lfw][220000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-13 20:26:16,232-[lfw][220000]Accuracy-Highest: 0.99783
Training: 2022-04-13 20:27:06,276-[cfp_fp][220000]XNorm: 21.186069
Training: 2022-04-13 20:27:06,277-[cfp_fp][220000]Accuracy-Flip: 0.98314+-0.00635
Training: 2022-04-13 20:27:06,278-[cfp_fp][220000]Accuracy-Highest: 0.98314
Training: 2022-04-13 20:27:49,348-[agedb_30][220000]XNorm: 22.932315
Training: 2022-04-13 20:27:49,349-[agedb_30][220000]Accuracy-Flip: 0.97133+-0.00653
Training: 2022-04-13 20:27:49,349-[agedb_30][220000]Accuracy-Highest: 0.97150
Training: 2022-04-13 20:27:53,241-Speed 73.04 samples/sec   Loss 10.0306   LearningRate 0.0540   Epoch: 5   Global Step: 220010   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:27:57,124-Speed 2637.24 samples/sec   Loss 9.9739   LearningRate 0.0540   Epoch: 5   Global Step: 220020   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:01,049-Speed 2610.26 samples/sec   Loss 10.1418   LearningRate 0.0540   Epoch: 5   Global Step: 220030   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:04,965-Speed 2615.20 samples/sec   Loss 9.9842   LearningRate 0.0540   Epoch: 5   Global Step: 220040   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:08,838-Speed 2644.69 samples/sec   Loss 10.0220   LearningRate 0.0540   Epoch: 5   Global Step: 220050   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:12,842-Speed 2558.22 samples/sec   Loss 9.9369   LearningRate 0.0540   Epoch: 5   Global Step: 220060   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:16,822-Speed 2574.39 samples/sec   Loss 10.1259   LearningRate 0.0540   Epoch: 5   Global Step: 220070   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:28:20,712-Speed 2633.18 samples/sec   Loss 9.8278   LearningRate 0.0540   Epoch: 5   Global Step: 220080   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:28:24,602-Speed 2633.24 samples/sec   Loss 10.1270   LearningRate 0.0540   Epoch: 5   Global Step: 220090   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:28:28,487-Speed 2636.40 samples/sec   Loss 9.8763   LearningRate 0.0540   Epoch: 5   Global Step: 220100   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:28:32,370-Speed 2638.58 samples/sec   Loss 10.0805   LearningRate 0.0540   Epoch: 5   Global Step: 220110   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:36,265-Speed 2629.07 samples/sec   Loss 10.0684   LearningRate 0.0540   Epoch: 5   Global Step: 220120   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:40,165-Speed 2626.84 samples/sec   Loss 9.9581   LearningRate 0.0540   Epoch: 5   Global Step: 220130   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:44,053-Speed 2634.11 samples/sec   Loss 10.0655   LearningRate 0.0540   Epoch: 5   Global Step: 220140   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:47,944-Speed 2632.16 samples/sec   Loss 9.8805   LearningRate 0.0540   Epoch: 5   Global Step: 220150   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:51,837-Speed 2631.33 samples/sec   Loss 9.9495   LearningRate 0.0540   Epoch: 5   Global Step: 220160   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:55,739-Speed 2624.68 samples/sec   Loss 9.7935   LearningRate 0.0540   Epoch: 5   Global Step: 220170   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:28:59,647-Speed 2620.86 samples/sec   Loss 9.9669   LearningRate 0.0540   Epoch: 5   Global Step: 220180   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:29:03,548-Speed 2626.10 samples/sec   Loss 9.9065   LearningRate 0.0540   Epoch: 5   Global Step: 220190   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:29:07,440-Speed 2631.68 samples/sec   Loss 10.0515   LearningRate 0.0540   Epoch: 5   Global Step: 220200   Fp16 Grad Scale: 131072   Required: 69 hours
Training: 2022-04-13 20:29:11,343-Speed 2623.81 samples/sec   Loss 10.0182   LearningRate 0.0540   Epoch: 5   Global Step: 220210   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:29:15,238-Speed 2629.97 samples/sec   Loss 10.0195   LearningRate 0.0540   Epoch: 5   Global Step: 220220   Fp16 Grad Scale: 262144   Required: 69 hours
Training: 2022-04-13 20:29:19,136-Speed 2627.35 samples/sec   Loss 9.8854   LearningRate 0.0540   Epoch: 5   Global Step: 220230   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:29:23,026-Speed 2632.97 samples/sec   Loss 9.9001   LearningRate 0.0540   Epoch: 5   Global Step: 220240   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:26,928-Speed 2625.51 samples/sec   Loss 10.0019   LearningRate 0.0539   Epoch: 5   Global Step: 220250   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:30,822-Speed 2630.79 samples/sec   Loss 10.0835   LearningRate 0.0539   Epoch: 5   Global Step: 220260   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:34,719-Speed 2628.76 samples/sec   Loss 9.9621   LearningRate 0.0539   Epoch: 5   Global Step: 220270   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:38,641-Speed 2611.14 samples/sec   Loss 9.9796   LearningRate 0.0539   Epoch: 5   Global Step: 220280   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:42,537-Speed 2629.41 samples/sec   Loss 9.9820   LearningRate 0.0539   Epoch: 5   Global Step: 220290   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:46,436-Speed 2626.58 samples/sec   Loss 9.8204   LearningRate 0.0539   Epoch: 5   Global Step: 220300   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:50,357-Speed 2612.51 samples/sec   Loss 9.9489   LearningRate 0.0539   Epoch: 5   Global Step: 220310   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:54,251-Speed 2630.68 samples/sec   Loss 10.0093   LearningRate 0.0539   Epoch: 5   Global Step: 220320   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:29:58,143-Speed 2631.69 samples/sec   Loss 9.9711   LearningRate 0.0539   Epoch: 5   Global Step: 220330   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:02,056-Speed 2617.95 samples/sec   Loss 10.0277   LearningRate 0.0539   Epoch: 5   Global Step: 220340   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:30:05,950-Speed 2630.23 samples/sec   Loss 10.0085   LearningRate 0.0539   Epoch: 5   Global Step: 220350   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:09,859-Speed 2619.94 samples/sec   Loss 10.0123   LearningRate 0.0539   Epoch: 5   Global Step: 220360   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:13,749-Speed 2633.11 samples/sec   Loss 9.9321   LearningRate 0.0539   Epoch: 5   Global Step: 220370   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:17,644-Speed 2629.80 samples/sec   Loss 10.0915   LearningRate 0.0539   Epoch: 5   Global Step: 220380   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:21,547-Speed 2624.56 samples/sec   Loss 10.1267   LearningRate 0.0539   Epoch: 5   Global Step: 220390   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:25,444-Speed 2628.29 samples/sec   Loss 10.0198   LearningRate 0.0539   Epoch: 5   Global Step: 220400   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:29,340-Speed 2629.32 samples/sec   Loss 10.0362   LearningRate 0.0539   Epoch: 5   Global Step: 220410   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:33,237-Speed 2628.65 samples/sec   Loss 9.9091   LearningRate 0.0539   Epoch: 5   Global Step: 220420   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:37,128-Speed 2631.91 samples/sec   Loss 9.8926   LearningRate 0.0539   Epoch: 5   Global Step: 220430   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:41,020-Speed 2632.17 samples/sec   Loss 10.0105   LearningRate 0.0539   Epoch: 5   Global Step: 220440   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:44,892-Speed 2644.56 samples/sec   Loss 10.1809   LearningRate 0.0539   Epoch: 5   Global Step: 220450   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:48,785-Speed 2631.68 samples/sec   Loss 10.0105   LearningRate 0.0539   Epoch: 5   Global Step: 220460   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:52,678-Speed 2630.73 samples/sec   Loss 9.9888   LearningRate 0.0539   Epoch: 5   Global Step: 220470   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:30:56,571-Speed 2631.20 samples/sec   Loss 10.1567   LearningRate 0.0539   Epoch: 5   Global Step: 220480   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:00,465-Speed 2630.27 samples/sec   Loss 10.0783   LearningRate 0.0539   Epoch: 5   Global Step: 220490   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:04,393-Speed 2607.58 samples/sec   Loss 10.0672   LearningRate 0.0539   Epoch: 5   Global Step: 220500   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:08,289-Speed 2629.65 samples/sec   Loss 10.0104   LearningRate 0.0539   Epoch: 5   Global Step: 220510   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:12,208-Speed 2613.54 samples/sec   Loss 9.9724   LearningRate 0.0539   Epoch: 5   Global Step: 220520   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:16,110-Speed 2624.63 samples/sec   Loss 9.8584   LearningRate 0.0539   Epoch: 5   Global Step: 220530   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:20,013-Speed 2624.24 samples/sec   Loss 9.9497   LearningRate 0.0539   Epoch: 5   Global Step: 220540   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:23,914-Speed 2625.47 samples/sec   Loss 10.0391   LearningRate 0.0539   Epoch: 5   Global Step: 220550   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:31:27,813-Speed 2626.71 samples/sec   Loss 9.9497   LearningRate 0.0539   Epoch: 5   Global Step: 220560   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:31:31,691-Speed 2642.27 samples/sec   Loss 9.8686   LearningRate 0.0539   Epoch: 5   Global Step: 220570   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:31:35,547-Speed 2655.94 samples/sec   Loss 9.9410   LearningRate 0.0539   Epoch: 5   Global Step: 220580   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:31:39,442-Speed 2629.23 samples/sec   Loss 10.2647   LearningRate 0.0539   Epoch: 5   Global Step: 220590   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:31:43,341-Speed 2626.76 samples/sec   Loss 10.2443   LearningRate 0.0539   Epoch: 5   Global Step: 220600   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:31:47,236-Speed 2629.84 samples/sec   Loss 9.9690   LearningRate 0.0539   Epoch: 5   Global Step: 220610   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:31:51,136-Speed 2626.02 samples/sec   Loss 9.9935   LearningRate 0.0539   Epoch: 5   Global Step: 220620   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:31:55,047-Speed 2619.00 samples/sec   Loss 10.0623   LearningRate 0.0539   Epoch: 5   Global Step: 220630   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:31:58,933-Speed 2635.53 samples/sec   Loss 10.0607   LearningRate 0.0539   Epoch: 5   Global Step: 220640   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:32:02,843-Speed 2620.19 samples/sec   Loss 10.0129   LearningRate 0.0539   Epoch: 5   Global Step: 220650   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:32:06,734-Speed 2632.42 samples/sec   Loss 9.8744   LearningRate 0.0539   Epoch: 5   Global Step: 220660   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:32:10,662-Speed 2607.11 samples/sec   Loss 9.9987   LearningRate 0.0539   Epoch: 5   Global Step: 220670   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:32:14,554-Speed 2631.71 samples/sec   Loss 10.0584   LearningRate 0.0539   Epoch: 5   Global Step: 220680   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:18,443-Speed 2634.11 samples/sec   Loss 9.9873   LearningRate 0.0539   Epoch: 5   Global Step: 220690   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:22,333-Speed 2633.09 samples/sec   Loss 10.0505   LearningRate 0.0539   Epoch: 5   Global Step: 220700   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:26,225-Speed 2631.34 samples/sec   Loss 9.9631   LearningRate 0.0539   Epoch: 5   Global Step: 220710   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:30,124-Speed 2627.21 samples/sec   Loss 10.0928   LearningRate 0.0539   Epoch: 5   Global Step: 220720   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:34,040-Speed 2616.12 samples/sec   Loss 10.0408   LearningRate 0.0539   Epoch: 5   Global Step: 220730   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:37,938-Speed 2627.43 samples/sec   Loss 9.9417   LearningRate 0.0539   Epoch: 5   Global Step: 220740   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:41,838-Speed 2626.46 samples/sec   Loss 10.1104   LearningRate 0.0539   Epoch: 5   Global Step: 220750   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:45,740-Speed 2624.71 samples/sec   Loss 9.9397   LearningRate 0.0539   Epoch: 5   Global Step: 220760   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:49,647-Speed 2621.83 samples/sec   Loss 10.0041   LearningRate 0.0539   Epoch: 5   Global Step: 220770   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:32:53,556-Speed 2619.56 samples/sec   Loss 9.9856   LearningRate 0.0539   Epoch: 5   Global Step: 220780   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:32:57,467-Speed 2619.50 samples/sec   Loss 10.0469   LearningRate 0.0539   Epoch: 5   Global Step: 220790   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:01,361-Speed 2630.61 samples/sec   Loss 10.0396   LearningRate 0.0539   Epoch: 5   Global Step: 220800   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:05,255-Speed 2630.43 samples/sec   Loss 10.0961   LearningRate 0.0539   Epoch: 5   Global Step: 220810   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:09,149-Speed 2630.13 samples/sec   Loss 9.9185   LearningRate 0.0538   Epoch: 5   Global Step: 220820   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:13,046-Speed 2628.37 samples/sec   Loss 9.9684   LearningRate 0.0538   Epoch: 5   Global Step: 220830   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:16,942-Speed 2628.87 samples/sec   Loss 10.0556   LearningRate 0.0538   Epoch: 5   Global Step: 220840   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:20,839-Speed 2628.70 samples/sec   Loss 9.8995   LearningRate 0.0538   Epoch: 5   Global Step: 220850   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:24,732-Speed 2630.45 samples/sec   Loss 9.9357   LearningRate 0.0538   Epoch: 5   Global Step: 220860   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:28,644-Speed 2618.68 samples/sec   Loss 10.0277   LearningRate 0.0538   Epoch: 5   Global Step: 220870   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:32,534-Speed 2632.47 samples/sec   Loss 10.0318   LearningRate 0.0538   Epoch: 5   Global Step: 220880   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:33:36,437-Speed 2625.32 samples/sec   Loss 9.8649   LearningRate 0.0538   Epoch: 5   Global Step: 220890   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:33:40,334-Speed 2628.01 samples/sec   Loss 10.0395   LearningRate 0.0538   Epoch: 5   Global Step: 220900   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:33:44,226-Speed 2631.89 samples/sec   Loss 9.9568   LearningRate 0.0538   Epoch: 5   Global Step: 220910   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:33:48,123-Speed 2628.20 samples/sec   Loss 9.8580   LearningRate 0.0538   Epoch: 5   Global Step: 220920   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:33:52,016-Speed 2631.05 samples/sec   Loss 10.0262   LearningRate 0.0538   Epoch: 5   Global Step: 220930   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:33:55,888-Speed 2645.19 samples/sec   Loss 10.9438   LearningRate 0.0538   Epoch: 5   Global Step: 220940   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:33:59,801-Speed 2617.70 samples/sec   Loss 10.4241   LearningRate 0.0538   Epoch: 5   Global Step: 220950   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:03,711-Speed 2619.17 samples/sec   Loss 10.2572   LearningRate 0.0538   Epoch: 5   Global Step: 220960   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:07,655-Speed 2597.34 samples/sec   Loss 10.0677   LearningRate 0.0538   Epoch: 5   Global Step: 220970   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:11,548-Speed 2631.45 samples/sec   Loss 10.1055   LearningRate 0.0538   Epoch: 5   Global Step: 220980   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:15,489-Speed 2598.95 samples/sec   Loss 10.0243   LearningRate 0.0538   Epoch: 5   Global Step: 220990   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:19,386-Speed 2628.66 samples/sec   Loss 10.2253   LearningRate 0.0538   Epoch: 5   Global Step: 221000   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:23,328-Speed 2598.24 samples/sec   Loss 9.8980   LearningRate 0.0538   Epoch: 5   Global Step: 221010   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:27,251-Speed 2610.72 samples/sec   Loss 9.9987   LearningRate 0.0538   Epoch: 5   Global Step: 221020   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:31,154-Speed 2624.39 samples/sec   Loss 10.0776   LearningRate 0.0538   Epoch: 5   Global Step: 221030   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:34:35,045-Speed 2632.56 samples/sec   Loss 10.1081   LearningRate 0.0538   Epoch: 5   Global Step: 221040   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:34:39,049-Speed 2557.73 samples/sec   Loss 10.0113   LearningRate 0.0538   Epoch: 5   Global Step: 221050   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:34:42,960-Speed 2618.91 samples/sec   Loss 9.9864   LearningRate 0.0538   Epoch: 5   Global Step: 221060   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:34:46,856-Speed 2629.34 samples/sec   Loss 9.7910   LearningRate 0.0538   Epoch: 5   Global Step: 221070   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:34:50,758-Speed 2624.96 samples/sec   Loss 9.9383   LearningRate 0.0538   Epoch: 5   Global Step: 221080   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:34:54,651-Speed 2631.56 samples/sec   Loss 9.9549   LearningRate 0.0538   Epoch: 5   Global Step: 221090   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:34:58,546-Speed 2629.27 samples/sec   Loss 9.8907   LearningRate 0.0538   Epoch: 5   Global Step: 221100   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:35:02,467-Speed 2611.70 samples/sec   Loss 9.9447   LearningRate 0.0538   Epoch: 5   Global Step: 221110   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:35:06,397-Speed 2606.70 samples/sec   Loss 9.8893   LearningRate 0.0538   Epoch: 5   Global Step: 221120   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:35:10,303-Speed 2622.39 samples/sec   Loss 9.9943   LearningRate 0.0538   Epoch: 5   Global Step: 221130   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:35:14,217-Speed 2617.03 samples/sec   Loss 10.0138   LearningRate 0.0538   Epoch: 5   Global Step: 221140   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:18,123-Speed 2621.88 samples/sec   Loss 10.0316   LearningRate 0.0538   Epoch: 5   Global Step: 221150   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:22,024-Speed 2626.33 samples/sec   Loss 9.8782   LearningRate 0.0538   Epoch: 5   Global Step: 221160   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:25,926-Speed 2624.89 samples/sec   Loss 9.8930   LearningRate 0.0538   Epoch: 5   Global Step: 221170   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:29,870-Speed 2597.16 samples/sec   Loss 9.8945   LearningRate 0.0538   Epoch: 5   Global Step: 221180   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:33,768-Speed 2628.02 samples/sec   Loss 9.9212   LearningRate 0.0538   Epoch: 5   Global Step: 221190   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:37,688-Speed 2612.30 samples/sec   Loss 9.9390   LearningRate 0.0538   Epoch: 5   Global Step: 221200   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:41,593-Speed 2622.96 samples/sec   Loss 9.9658   LearningRate 0.0538   Epoch: 5   Global Step: 221210   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:45,493-Speed 2626.61 samples/sec   Loss 9.9981   LearningRate 0.0538   Epoch: 5   Global Step: 221220   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:49,392-Speed 2627.13 samples/sec   Loss 9.9202   LearningRate 0.0538   Epoch: 5   Global Step: 221230   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:35:53,294-Speed 2625.14 samples/sec   Loss 9.9435   LearningRate 0.0538   Epoch: 5   Global Step: 221240   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:35:57,191-Speed 2628.47 samples/sec   Loss 10.0340   LearningRate 0.0538   Epoch: 5   Global Step: 221250   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:36:01,093-Speed 2625.02 samples/sec   Loss 9.9066   LearningRate 0.0538   Epoch: 5   Global Step: 221260   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:36:05,003-Speed 2619.70 samples/sec   Loss 10.0100   LearningRate 0.0538   Epoch: 5   Global Step: 221270   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:36:08,906-Speed 2624.00 samples/sec   Loss 10.0515   LearningRate 0.0538   Epoch: 5   Global Step: 221280   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:36:12,809-Speed 2624.28 samples/sec   Loss 10.0242   LearningRate 0.0538   Epoch: 5   Global Step: 221290   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:36:16,695-Speed 2635.49 samples/sec   Loss 10.1204   LearningRate 0.0538   Epoch: 5   Global Step: 221300   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:20,599-Speed 2624.56 samples/sec   Loss 10.1266   LearningRate 0.0538   Epoch: 5   Global Step: 221310   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:24,495-Speed 2628.68 samples/sec   Loss 9.9080   LearningRate 0.0538   Epoch: 5   Global Step: 221320   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:28,407-Speed 2618.24 samples/sec   Loss 9.9866   LearningRate 0.0538   Epoch: 5   Global Step: 221330   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:32,313-Speed 2622.37 samples/sec   Loss 9.7967   LearningRate 0.0538   Epoch: 5   Global Step: 221340   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:36,212-Speed 2626.85 samples/sec   Loss 10.0145   LearningRate 0.0538   Epoch: 5   Global Step: 221350   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:40,106-Speed 2630.17 samples/sec   Loss 10.0062   LearningRate 0.0538   Epoch: 5   Global Step: 221360   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:44,037-Speed 2606.10 samples/sec   Loss 9.7928   LearningRate 0.0538   Epoch: 5   Global Step: 221370   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:47,946-Speed 2620.55 samples/sec   Loss 9.8781   LearningRate 0.0537   Epoch: 5   Global Step: 221380   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:51,913-Speed 2582.32 samples/sec   Loss 9.9520   LearningRate 0.0537   Epoch: 5   Global Step: 221390   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:36:55,837-Speed 2610.04 samples/sec   Loss 9.8845   LearningRate 0.0537   Epoch: 5   Global Step: 221400   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:36:59,734-Speed 2628.38 samples/sec   Loss 9.9165   LearningRate 0.0537   Epoch: 5   Global Step: 221410   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:37:03,626-Speed 2631.45 samples/sec   Loss 9.9415   LearningRate 0.0537   Epoch: 5   Global Step: 221420   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:37:07,522-Speed 2629.27 samples/sec   Loss 10.0071   LearningRate 0.0537   Epoch: 5   Global Step: 221430   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:37:11,414-Speed 2631.48 samples/sec   Loss 9.9442   LearningRate 0.0537   Epoch: 5   Global Step: 221440   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:37:15,314-Speed 2626.18 samples/sec   Loss 10.0824   LearningRate 0.0537   Epoch: 5   Global Step: 221450   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:37:19,234-Speed 2613.82 samples/sec   Loss 9.9524   LearningRate 0.0537   Epoch: 5   Global Step: 221460   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:37:23,143-Speed 2619.63 samples/sec   Loss 10.0173   LearningRate 0.0537   Epoch: 5   Global Step: 221470   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:27,065-Speed 2612.17 samples/sec   Loss 9.9579   LearningRate 0.0537   Epoch: 5   Global Step: 221480   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:30,965-Speed 2626.27 samples/sec   Loss 10.0224   LearningRate 0.0537   Epoch: 5   Global Step: 221490   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:34,862-Speed 2628.18 samples/sec   Loss 10.0503   LearningRate 0.0537   Epoch: 5   Global Step: 221500   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:38,752-Speed 2632.77 samples/sec   Loss 9.8846   LearningRate 0.0537   Epoch: 5   Global Step: 221510   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:42,644-Speed 2631.95 samples/sec   Loss 9.9804   LearningRate 0.0537   Epoch: 5   Global Step: 221520   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:46,542-Speed 2627.31 samples/sec   Loss 10.1132   LearningRate 0.0537   Epoch: 5   Global Step: 221530   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:50,436-Speed 2630.66 samples/sec   Loss 9.9822   LearningRate 0.0537   Epoch: 5   Global Step: 221540   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:54,335-Speed 2626.85 samples/sec   Loss 10.0538   LearningRate 0.0537   Epoch: 5   Global Step: 221550   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:37:58,229-Speed 2630.86 samples/sec   Loss 9.9929   LearningRate 0.0537   Epoch: 5   Global Step: 221560   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:38:02,127-Speed 2627.49 samples/sec   Loss 9.9637   LearningRate 0.0537   Epoch: 5   Global Step: 221570   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:06,020-Speed 2631.08 samples/sec   Loss 9.9839   LearningRate 0.0537   Epoch: 5   Global Step: 221580   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:09,925-Speed 2622.64 samples/sec   Loss 9.9822   LearningRate 0.0537   Epoch: 5   Global Step: 221590   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:13,816-Speed 2632.28 samples/sec   Loss 9.9101   LearningRate 0.0537   Epoch: 5   Global Step: 221600   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:17,713-Speed 2628.62 samples/sec   Loss 9.8222   LearningRate 0.0537   Epoch: 5   Global Step: 221610   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:21,642-Speed 2606.58 samples/sec   Loss 10.0665   LearningRate 0.0537   Epoch: 5   Global Step: 221620   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:25,539-Speed 2628.85 samples/sec   Loss 9.9446   LearningRate 0.0537   Epoch: 5   Global Step: 221630   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:29,430-Speed 2632.65 samples/sec   Loss 10.0640   LearningRate 0.0537   Epoch: 5   Global Step: 221640   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:33,322-Speed 2631.87 samples/sec   Loss 9.9477   LearningRate 0.0537   Epoch: 5   Global Step: 221650   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:37,216-Speed 2630.51 samples/sec   Loss 9.7971   LearningRate 0.0537   Epoch: 5   Global Step: 221660   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:41,101-Speed 2636.34 samples/sec   Loss 9.9805   LearningRate 0.0537   Epoch: 5   Global Step: 221670   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:45,020-Speed 2613.72 samples/sec   Loss 9.9391   LearningRate 0.0537   Epoch: 5   Global Step: 221680   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:38:48,925-Speed 2622.90 samples/sec   Loss 9.9267   LearningRate 0.0537   Epoch: 5   Global Step: 221690   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:38:52,825-Speed 2626.09 samples/sec   Loss 9.9418   LearningRate 0.0537   Epoch: 5   Global Step: 221700   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:38:56,716-Speed 2632.54 samples/sec   Loss 10.0152   LearningRate 0.0537   Epoch: 5   Global Step: 221710   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:00,607-Speed 2632.68 samples/sec   Loss 9.9849   LearningRate 0.0537   Epoch: 5   Global Step: 221720   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:04,506-Speed 2627.00 samples/sec   Loss 9.8714   LearningRate 0.0537   Epoch: 5   Global Step: 221730   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:08,406-Speed 2626.33 samples/sec   Loss 10.0591   LearningRate 0.0537   Epoch: 5   Global Step: 221740   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:12,326-Speed 2613.04 samples/sec   Loss 9.9576   LearningRate 0.0537   Epoch: 5   Global Step: 221750   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:16,252-Speed 2608.97 samples/sec   Loss 10.0310   LearningRate 0.0537   Epoch: 5   Global Step: 221760   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:20,159-Speed 2621.68 samples/sec   Loss 9.9116   LearningRate 0.0537   Epoch: 5   Global Step: 221770   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:24,062-Speed 2624.19 samples/sec   Loss 9.9616   LearningRate 0.0537   Epoch: 5   Global Step: 221780   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:27,959-Speed 2628.33 samples/sec   Loss 9.8618   LearningRate 0.0537   Epoch: 5   Global Step: 221790   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:39:31,871-Speed 2617.85 samples/sec   Loss 9.9557   LearningRate 0.0537   Epoch: 5   Global Step: 221800   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:39:35,774-Speed 2624.99 samples/sec   Loss 10.1888   LearningRate 0.0537   Epoch: 5   Global Step: 221810   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:39:39,657-Speed 2637.74 samples/sec   Loss 9.9377   LearningRate 0.0537   Epoch: 5   Global Step: 221820   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:43,544-Speed 2635.55 samples/sec   Loss 10.0117   LearningRate 0.0537   Epoch: 5   Global Step: 221830   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:47,437-Speed 2630.55 samples/sec   Loss 10.0307   LearningRate 0.0537   Epoch: 5   Global Step: 221840   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:51,334-Speed 2627.80 samples/sec   Loss 10.1571   LearningRate 0.0537   Epoch: 5   Global Step: 221850   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:55,260-Speed 2608.92 samples/sec   Loss 9.9051   LearningRate 0.0537   Epoch: 5   Global Step: 221860   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:39:59,162-Speed 2624.83 samples/sec   Loss 9.9544   LearningRate 0.0537   Epoch: 5   Global Step: 221870   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:40:03,227-Speed 2519.57 samples/sec   Loss 10.1159   LearningRate 0.0537   Epoch: 5   Global Step: 221880   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:07,318-Speed 2504.03 samples/sec   Loss 10.0403   LearningRate 0.0537   Epoch: 5   Global Step: 221890   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:11,409-Speed 2503.82 samples/sec   Loss 9.7826   LearningRate 0.0537   Epoch: 5   Global Step: 221900   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:15,349-Speed 2599.95 samples/sec   Loss 9.9961   LearningRate 0.0537   Epoch: 5   Global Step: 221910   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:19,244-Speed 2629.42 samples/sec   Loss 10.1118   LearningRate 0.0537   Epoch: 5   Global Step: 221920   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:23,166-Speed 2611.67 samples/sec   Loss 9.9938   LearningRate 0.0537   Epoch: 5   Global Step: 221930   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:27,080-Speed 2617.00 samples/sec   Loss 9.8600   LearningRate 0.0537   Epoch: 5   Global Step: 221940   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:30,972-Speed 2631.62 samples/sec   Loss 10.0447   LearningRate 0.0536   Epoch: 5   Global Step: 221950   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:34,866-Speed 2630.86 samples/sec   Loss 10.0652   LearningRate 0.0536   Epoch: 5   Global Step: 221960   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:38,761-Speed 2629.72 samples/sec   Loss 9.9364   LearningRate 0.0536   Epoch: 5   Global Step: 221970   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:40:42,655-Speed 2629.94 samples/sec   Loss 9.8763   LearningRate 0.0536   Epoch: 5   Global Step: 221980   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:40:46,551-Speed 2629.26 samples/sec   Loss 9.9005   LearningRate 0.0536   Epoch: 5   Global Step: 221990   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:40:50,448-Speed 2628.27 samples/sec   Loss 9.9638   LearningRate 0.0536   Epoch: 5   Global Step: 222000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:40:54,342-Speed 2630.64 samples/sec   Loss 10.0484   LearningRate 0.0536   Epoch: 5   Global Step: 222010   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:40:58,218-Speed 2642.65 samples/sec   Loss 9.9676   LearningRate 0.0536   Epoch: 5   Global Step: 222020   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:02,115-Speed 2628.25 samples/sec   Loss 9.9068   LearningRate 0.0536   Epoch: 5   Global Step: 222030   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:06,010-Speed 2629.85 samples/sec   Loss 10.1066   LearningRate 0.0536   Epoch: 5   Global Step: 222040   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:09,907-Speed 2628.78 samples/sec   Loss 10.0034   LearningRate 0.0536   Epoch: 5   Global Step: 222050   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:13,807-Speed 2625.99 samples/sec   Loss 9.9523   LearningRate 0.0536   Epoch: 5   Global Step: 222060   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:17,709-Speed 2624.37 samples/sec   Loss 9.9761   LearningRate 0.0536   Epoch: 5   Global Step: 222070   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:21,608-Speed 2627.50 samples/sec   Loss 9.9762   LearningRate 0.0536   Epoch: 5   Global Step: 222080   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:25,505-Speed 2628.15 samples/sec   Loss 9.8709   LearningRate 0.0536   Epoch: 5   Global Step: 222090   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:29,402-Speed 2628.68 samples/sec   Loss 10.0299   LearningRate 0.0536   Epoch: 5   Global Step: 222100   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:33,294-Speed 2631.75 samples/sec   Loss 10.0269   LearningRate 0.0536   Epoch: 5   Global Step: 222110   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:41:37,188-Speed 2630.20 samples/sec   Loss 10.0309   LearningRate 0.0536   Epoch: 5   Global Step: 222120   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:41:41,087-Speed 2626.50 samples/sec   Loss 10.0736   LearningRate 0.0536   Epoch: 5   Global Step: 222130   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:41:45,070-Speed 2571.94 samples/sec   Loss 10.0035   LearningRate 0.0536   Epoch: 5   Global Step: 222140   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:41:48,983-Speed 2617.59 samples/sec   Loss 9.8634   LearningRate 0.0536   Epoch: 5   Global Step: 222150   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:41:52,886-Speed 2624.79 samples/sec   Loss 9.9059   LearningRate 0.0536   Epoch: 5   Global Step: 222160   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:41:56,820-Speed 2603.45 samples/sec   Loss 9.9098   LearningRate 0.0536   Epoch: 5   Global Step: 222170   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:42:00,729-Speed 2620.24 samples/sec   Loss 9.9441   LearningRate 0.0536   Epoch: 5   Global Step: 222180   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:42:04,627-Speed 2627.69 samples/sec   Loss 10.0259   LearningRate 0.0536   Epoch: 5   Global Step: 222190   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:42:08,536-Speed 2620.49 samples/sec   Loss 9.8892   LearningRate 0.0536   Epoch: 5   Global Step: 222200   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:42:12,439-Speed 2623.88 samples/sec   Loss 10.0029   LearningRate 0.0536   Epoch: 5   Global Step: 222210   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:42:16,323-Speed 2637.11 samples/sec   Loss 10.0500   LearningRate 0.0536   Epoch: 5   Global Step: 222220   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:42:20,159-Speed 2670.64 samples/sec   Loss 10.3783   LearningRate 0.0536   Epoch: 5   Global Step: 222230   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:42:24,032-Speed 2644.54 samples/sec   Loss 10.0954   LearningRate 0.0536   Epoch: 5   Global Step: 222240   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:27,924-Speed 2632.01 samples/sec   Loss 9.9727   LearningRate 0.0536   Epoch: 5   Global Step: 222250   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:31,817-Speed 2631.08 samples/sec   Loss 10.1716   LearningRate 0.0536   Epoch: 5   Global Step: 222260   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:35,706-Speed 2632.86 samples/sec   Loss 9.7784   LearningRate 0.0536   Epoch: 5   Global Step: 222270   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:39,596-Speed 2632.91 samples/sec   Loss 10.0891   LearningRate 0.0536   Epoch: 5   Global Step: 222280   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:43,485-Speed 2634.68 samples/sec   Loss 9.9355   LearningRate 0.0536   Epoch: 5   Global Step: 222290   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:47,384-Speed 2626.66 samples/sec   Loss 9.8839   LearningRate 0.0536   Epoch: 5   Global Step: 222300   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:51,277-Speed 2631.45 samples/sec   Loss 10.0169   LearningRate 0.0536   Epoch: 5   Global Step: 222310   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:55,169-Speed 2631.71 samples/sec   Loss 9.9630   LearningRate 0.0536   Epoch: 5   Global Step: 222320   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:42:59,055-Speed 2635.63 samples/sec   Loss 9.9222   LearningRate 0.0536   Epoch: 5   Global Step: 222330   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 20:43:02,945-Speed 2633.37 samples/sec   Loss 9.9072   LearningRate 0.0536   Epoch: 5   Global Step: 222340   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:06,836-Speed 2632.62 samples/sec   Loss 9.9409   LearningRate 0.0536   Epoch: 5   Global Step: 222350   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:10,725-Speed 2633.50 samples/sec   Loss 10.1096   LearningRate 0.0536   Epoch: 5   Global Step: 222360   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:14,616-Speed 2632.07 samples/sec   Loss 9.9633   LearningRate 0.0536   Epoch: 5   Global Step: 222370   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:18,508-Speed 2631.91 samples/sec   Loss 10.0456   LearningRate 0.0536   Epoch: 5   Global Step: 222380   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:22,402-Speed 2630.86 samples/sec   Loss 9.8395   LearningRate 0.0536   Epoch: 5   Global Step: 222390   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:26,292-Speed 2632.76 samples/sec   Loss 9.9683   LearningRate 0.0536   Epoch: 5   Global Step: 222400   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:30,181-Speed 2633.47 samples/sec   Loss 10.0553   LearningRate 0.0536   Epoch: 5   Global Step: 222410   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:34,081-Speed 2626.44 samples/sec   Loss 9.8993   LearningRate 0.0536   Epoch: 5   Global Step: 222420   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:37,972-Speed 2632.64 samples/sec   Loss 10.0165   LearningRate 0.0536   Epoch: 5   Global Step: 222430   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:43:41,865-Speed 2630.99 samples/sec   Loss 10.0069   LearningRate 0.0536   Epoch: 5   Global Step: 222440   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:43:45,761-Speed 2629.31 samples/sec   Loss 9.9758   LearningRate 0.0536   Epoch: 5   Global Step: 222450   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:43:49,650-Speed 2633.73 samples/sec   Loss 10.0067   LearningRate 0.0536   Epoch: 5   Global Step: 222460   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:43:53,548-Speed 2628.08 samples/sec   Loss 10.0894   LearningRate 0.0536   Epoch: 5   Global Step: 222470   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:43:57,443-Speed 2629.38 samples/sec   Loss 9.7530   LearningRate 0.0536   Epoch: 5   Global Step: 222480   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:44:01,335-Speed 2631.85 samples/sec   Loss 9.8219   LearningRate 0.0536   Epoch: 5   Global Step: 222490   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:44:05,225-Speed 2633.13 samples/sec   Loss 9.9561   LearningRate 0.0536   Epoch: 5   Global Step: 222500   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:44:09,116-Speed 2632.07 samples/sec   Loss 9.8875   LearningRate 0.0536   Epoch: 5   Global Step: 222510   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:44:13,012-Speed 2628.51 samples/sec   Loss 10.0546   LearningRate 0.0535   Epoch: 5   Global Step: 222520   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:44:16,915-Speed 2624.27 samples/sec   Loss 9.8989   LearningRate 0.0535   Epoch: 5   Global Step: 222530   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:44:20,866-Speed 2593.29 samples/sec   Loss 9.9713   LearningRate 0.0535   Epoch: 5   Global Step: 222540   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:24,890-Speed 2544.63 samples/sec   Loss 9.8810   LearningRate 0.0535   Epoch: 5   Global Step: 222550   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:28,803-Speed 2617.52 samples/sec   Loss 9.7996   LearningRate 0.0535   Epoch: 5   Global Step: 222560   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:32,702-Speed 2626.92 samples/sec   Loss 10.0661   LearningRate 0.0535   Epoch: 5   Global Step: 222570   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:36,594-Speed 2632.18 samples/sec   Loss 10.0310   LearningRate 0.0535   Epoch: 5   Global Step: 222580   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:40,522-Speed 2607.57 samples/sec   Loss 9.8175   LearningRate 0.0535   Epoch: 5   Global Step: 222590   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:44,416-Speed 2630.49 samples/sec   Loss 9.8700   LearningRate 0.0535   Epoch: 5   Global Step: 222600   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:48,325-Speed 2620.51 samples/sec   Loss 10.0659   LearningRate 0.0535   Epoch: 5   Global Step: 222610   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:52,217-Speed 2631.57 samples/sec   Loss 9.8055   LearningRate 0.0535   Epoch: 5   Global Step: 222620   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:56,106-Speed 2634.41 samples/sec   Loss 9.9905   LearningRate 0.0535   Epoch: 5   Global Step: 222630   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:44:59,998-Speed 2631.79 samples/sec   Loss 9.8704   LearningRate 0.0535   Epoch: 5   Global Step: 222640   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:45:03,892-Speed 2630.10 samples/sec   Loss 9.7757   LearningRate 0.0535   Epoch: 5   Global Step: 222650   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:45:07,789-Speed 2628.27 samples/sec   Loss 10.0900   LearningRate 0.0535   Epoch: 5   Global Step: 222660   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:45:11,680-Speed 2632.47 samples/sec   Loss 10.1294   LearningRate 0.0535   Epoch: 5   Global Step: 222670   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:45:15,555-Speed 2643.93 samples/sec   Loss 10.0662   LearningRate 0.0535   Epoch: 5   Global Step: 222680   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:19,469-Speed 2616.25 samples/sec   Loss 9.9463   LearningRate 0.0535   Epoch: 5   Global Step: 222690   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:23,360-Speed 2633.22 samples/sec   Loss 9.9107   LearningRate 0.0535   Epoch: 5   Global Step: 222700   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:27,248-Speed 2634.55 samples/sec   Loss 9.9043   LearningRate 0.0535   Epoch: 5   Global Step: 222710   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:31,139-Speed 2631.72 samples/sec   Loss 10.0594   LearningRate 0.0535   Epoch: 5   Global Step: 222720   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:35,038-Speed 2627.23 samples/sec   Loss 9.8619   LearningRate 0.0535   Epoch: 5   Global Step: 222730   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:38,931-Speed 2631.08 samples/sec   Loss 9.9575   LearningRate 0.0535   Epoch: 5   Global Step: 222740   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:42,842-Speed 2618.97 samples/sec   Loss 9.8973   LearningRate 0.0535   Epoch: 5   Global Step: 222750   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:46,730-Speed 2634.56 samples/sec   Loss 9.9674   LearningRate 0.0535   Epoch: 5   Global Step: 222760   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:50,621-Speed 2632.16 samples/sec   Loss 10.0083   LearningRate 0.0535   Epoch: 5   Global Step: 222770   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:45:54,567-Speed 2596.47 samples/sec   Loss 9.8616   LearningRate 0.0535   Epoch: 5   Global Step: 222780   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:45:58,464-Speed 2627.73 samples/sec   Loss 9.8128   LearningRate 0.0535   Epoch: 5   Global Step: 222790   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:02,359-Speed 2629.87 samples/sec   Loss 9.9636   LearningRate 0.0535   Epoch: 5   Global Step: 222800   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:06,285-Speed 2608.51 samples/sec   Loss 9.8985   LearningRate 0.0535   Epoch: 5   Global Step: 222810   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:10,177-Speed 2632.34 samples/sec   Loss 9.9453   LearningRate 0.0535   Epoch: 5   Global Step: 222820   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:14,078-Speed 2625.76 samples/sec   Loss 9.8297   LearningRate 0.0535   Epoch: 5   Global Step: 222830   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:17,972-Speed 2630.10 samples/sec   Loss 9.9408   LearningRate 0.0535   Epoch: 5   Global Step: 222840   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:21,868-Speed 2628.56 samples/sec   Loss 10.1231   LearningRate 0.0535   Epoch: 5   Global Step: 222850   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:25,755-Speed 2635.85 samples/sec   Loss 9.9355   LearningRate 0.0535   Epoch: 5   Global Step: 222860   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:29,648-Speed 2630.65 samples/sec   Loss 9.9057   LearningRate 0.0535   Epoch: 5   Global Step: 222870   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:46:33,583-Speed 2603.01 samples/sec   Loss 9.8778   LearningRate 0.0535   Epoch: 5   Global Step: 222880   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:46:37,543-Speed 2586.44 samples/sec   Loss 9.8476   LearningRate 0.0535   Epoch: 5   Global Step: 222890   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:46:41,447-Speed 2623.86 samples/sec   Loss 9.8638   LearningRate 0.0535   Epoch: 5   Global Step: 222900   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:46:45,336-Speed 2633.41 samples/sec   Loss 9.9004   LearningRate 0.0535   Epoch: 5   Global Step: 222910   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:46:49,283-Speed 2595.04 samples/sec   Loss 9.9720   LearningRate 0.0535   Epoch: 5   Global Step: 222920   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:46:53,184-Speed 2626.03 samples/sec   Loss 9.9502   LearningRate 0.0535   Epoch: 5   Global Step: 222930   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:46:57,092-Speed 2620.77 samples/sec   Loss 9.8201   LearningRate 0.0535   Epoch: 5   Global Step: 222940   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:47:00,984-Speed 2631.74 samples/sec   Loss 9.9970   LearningRate 0.0535   Epoch: 5   Global Step: 222950   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:47:04,886-Speed 2624.84 samples/sec   Loss 9.8847   LearningRate 0.0535   Epoch: 5   Global Step: 222960   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:47:08,785-Speed 2626.69 samples/sec   Loss 9.9449   LearningRate 0.0535   Epoch: 5   Global Step: 222970   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:47:12,670-Speed 2637.21 samples/sec   Loss 9.9037   LearningRate 0.0535   Epoch: 5   Global Step: 222980   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:47:16,548-Speed 2641.04 samples/sec   Loss 9.8392   LearningRate 0.0535   Epoch: 5   Global Step: 222990   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:20,445-Speed 2628.84 samples/sec   Loss 10.0679   LearningRate 0.0535   Epoch: 5   Global Step: 223000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:24,343-Speed 2627.02 samples/sec   Loss 10.0924   LearningRate 0.0535   Epoch: 5   Global Step: 223010   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:28,234-Speed 2633.23 samples/sec   Loss 9.8146   LearningRate 0.0535   Epoch: 5   Global Step: 223020   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:32,126-Speed 2631.69 samples/sec   Loss 10.0594   LearningRate 0.0535   Epoch: 5   Global Step: 223030   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:36,018-Speed 2630.93 samples/sec   Loss 9.9995   LearningRate 0.0535   Epoch: 5   Global Step: 223040   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:39,915-Speed 2628.69 samples/sec   Loss 9.8340   LearningRate 0.0535   Epoch: 5   Global Step: 223050   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:43,805-Speed 2633.13 samples/sec   Loss 9.8454   LearningRate 0.0535   Epoch: 5   Global Step: 223060   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:47,694-Speed 2633.41 samples/sec   Loss 9.9458   LearningRate 0.0535   Epoch: 5   Global Step: 223070   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:51,617-Speed 2611.59 samples/sec   Loss 10.0023   LearningRate 0.0534   Epoch: 5   Global Step: 223080   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:47:55,507-Speed 2632.57 samples/sec   Loss 10.0731   LearningRate 0.0534   Epoch: 5   Global Step: 223090   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:47:59,421-Speed 2617.15 samples/sec   Loss 9.8902   LearningRate 0.0534   Epoch: 5   Global Step: 223100   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:48:03,302-Speed 2639.60 samples/sec   Loss 9.9975   LearningRate 0.0534   Epoch: 5   Global Step: 223110   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:48:07,178-Speed 2642.67 samples/sec   Loss 9.9069   LearningRate 0.0534   Epoch: 5   Global Step: 223120   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:11,072-Speed 2629.79 samples/sec   Loss 10.0021   LearningRate 0.0534   Epoch: 5   Global Step: 223130   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:14,966-Speed 2630.14 samples/sec   Loss 9.7332   LearningRate 0.0534   Epoch: 5   Global Step: 223140   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:18,857-Speed 2632.23 samples/sec   Loss 9.9114   LearningRate 0.0534   Epoch: 5   Global Step: 223150   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:22,749-Speed 2632.64 samples/sec   Loss 9.9074   LearningRate 0.0534   Epoch: 5   Global Step: 223160   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:26,642-Speed 2630.60 samples/sec   Loss 10.0004   LearningRate 0.0534   Epoch: 5   Global Step: 223170   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:30,561-Speed 2614.55 samples/sec   Loss 10.0383   LearningRate 0.0534   Epoch: 5   Global Step: 223180   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:34,495-Speed 2603.69 samples/sec   Loss 9.9453   LearningRate 0.0534   Epoch: 5   Global Step: 223190   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:38,386-Speed 2632.18 samples/sec   Loss 10.0069   LearningRate 0.0534   Epoch: 5   Global Step: 223200   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:42,291-Speed 2622.65 samples/sec   Loss 9.9091   LearningRate 0.0534   Epoch: 5   Global Step: 223210   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:48:46,191-Speed 2626.08 samples/sec   Loss 9.9508   LearningRate 0.0534   Epoch: 5   Global Step: 223220   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:48:50,084-Speed 2631.31 samples/sec   Loss 10.0275   LearningRate 0.0534   Epoch: 5   Global Step: 223230   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:48:53,978-Speed 2630.83 samples/sec   Loss 9.7795   LearningRate 0.0534   Epoch: 5   Global Step: 223240   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:48:57,871-Speed 2630.95 samples/sec   Loss 9.8413   LearningRate 0.0534   Epoch: 5   Global Step: 223250   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:01,823-Speed 2591.64 samples/sec   Loss 9.8719   LearningRate 0.0534   Epoch: 5   Global Step: 223260   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:05,924-Speed 2497.14 samples/sec   Loss 9.8953   LearningRate 0.0534   Epoch: 5   Global Step: 223270   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:09,903-Speed 2574.35 samples/sec   Loss 10.0541   LearningRate 0.0534   Epoch: 5   Global Step: 223280   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:13,794-Speed 2632.67 samples/sec   Loss 9.9652   LearningRate 0.0534   Epoch: 5   Global Step: 223290   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:17,691-Speed 2627.98 samples/sec   Loss 10.0003   LearningRate 0.0534   Epoch: 5   Global Step: 223300   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:21,590-Speed 2627.64 samples/sec   Loss 10.0383   LearningRate 0.0534   Epoch: 5   Global Step: 223310   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:25,499-Speed 2620.33 samples/sec   Loss 9.9839   LearningRate 0.0534   Epoch: 5   Global Step: 223320   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:49:29,386-Speed 2634.95 samples/sec   Loss 9.9028   LearningRate 0.0534   Epoch: 5   Global Step: 223330   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:49:33,280-Speed 2630.05 samples/sec   Loss 9.7972   LearningRate 0.0534   Epoch: 5   Global Step: 223340   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:49:37,174-Speed 2630.58 samples/sec   Loss 10.1121   LearningRate 0.0534   Epoch: 5   Global Step: 223350   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:49:41,066-Speed 2631.77 samples/sec   Loss 9.9421   LearningRate 0.0534   Epoch: 5   Global Step: 223360   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:49:44,954-Speed 2634.75 samples/sec   Loss 9.7636   LearningRate 0.0534   Epoch: 5   Global Step: 223370   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:49:48,833-Speed 2640.11 samples/sec   Loss 9.9285   LearningRate 0.0534   Epoch: 5   Global Step: 223380   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:52,724-Speed 2632.97 samples/sec   Loss 9.8530   LearningRate 0.0534   Epoch: 5   Global Step: 223390   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:49:56,615-Speed 2632.33 samples/sec   Loss 9.9277   LearningRate 0.0534   Epoch: 5   Global Step: 223400   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:00,519-Speed 2624.23 samples/sec   Loss 9.9369   LearningRate 0.0534   Epoch: 5   Global Step: 223410   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:04,518-Speed 2560.95 samples/sec   Loss 9.7782   LearningRate 0.0534   Epoch: 5   Global Step: 223420   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:08,412-Speed 2630.46 samples/sec   Loss 9.9601   LearningRate 0.0534   Epoch: 5   Global Step: 223430   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:12,312-Speed 2625.83 samples/sec   Loss 9.8683   LearningRate 0.0534   Epoch: 5   Global Step: 223440   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:16,217-Speed 2623.54 samples/sec   Loss 9.8748   LearningRate 0.0534   Epoch: 5   Global Step: 223450   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:20,137-Speed 2612.41 samples/sec   Loss 9.7779   LearningRate 0.0534   Epoch: 5   Global Step: 223460   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:24,040-Speed 2624.10 samples/sec   Loss 10.1118   LearningRate 0.0534   Epoch: 5   Global Step: 223470   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:27,942-Speed 2625.35 samples/sec   Loss 9.8733   LearningRate 0.0534   Epoch: 5   Global Step: 223480   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:50:31,837-Speed 2630.05 samples/sec   Loss 9.9905   LearningRate 0.0534   Epoch: 5   Global Step: 223490   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:50:35,747-Speed 2619.40 samples/sec   Loss 9.9618   LearningRate 0.0534   Epoch: 5   Global Step: 223500   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:50:39,647-Speed 2625.85 samples/sec   Loss 10.0169   LearningRate 0.0534   Epoch: 5   Global Step: 223510   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:50:43,522-Speed 2643.25 samples/sec   Loss 10.0026   LearningRate 0.0534   Epoch: 5   Global Step: 223520   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:47,414-Speed 2631.70 samples/sec   Loss 9.9424   LearningRate 0.0534   Epoch: 5   Global Step: 223530   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:51,313-Speed 2626.99 samples/sec   Loss 10.0219   LearningRate 0.0534   Epoch: 5   Global Step: 223540   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:55,213-Speed 2626.69 samples/sec   Loss 10.0499   LearningRate 0.0534   Epoch: 5   Global Step: 223550   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:50:59,114-Speed 2625.49 samples/sec   Loss 9.7926   LearningRate 0.0534   Epoch: 5   Global Step: 223560   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:51:03,032-Speed 2614.01 samples/sec   Loss 9.9254   LearningRate 0.0534   Epoch: 5   Global Step: 223570   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:51:06,936-Speed 2623.74 samples/sec   Loss 9.7633   LearningRate 0.0534   Epoch: 5   Global Step: 223580   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:51:10,832-Speed 2629.13 samples/sec   Loss 10.0974   LearningRate 0.0534   Epoch: 5   Global Step: 223590   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:51:14,730-Speed 2627.01 samples/sec   Loss 9.9393   LearningRate 0.0534   Epoch: 5   Global Step: 223600   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:51:18,647-Speed 2615.54 samples/sec   Loss 9.8615   LearningRate 0.0534   Epoch: 5   Global Step: 223610   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:51:22,549-Speed 2625.02 samples/sec   Loss 10.0826   LearningRate 0.0534   Epoch: 5   Global Step: 223620   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:26,436-Speed 2635.06 samples/sec   Loss 9.9477   LearningRate 0.0534   Epoch: 5   Global Step: 223630   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:30,328-Speed 2631.45 samples/sec   Loss 10.0366   LearningRate 0.0534   Epoch: 5   Global Step: 223640   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:34,215-Speed 2635.07 samples/sec   Loss 9.9425   LearningRate 0.0533   Epoch: 5   Global Step: 223650   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:38,111-Speed 2628.96 samples/sec   Loss 9.8353   LearningRate 0.0533   Epoch: 5   Global Step: 223660   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:42,010-Speed 2627.02 samples/sec   Loss 9.9725   LearningRate 0.0533   Epoch: 5   Global Step: 223670   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:45,950-Speed 2600.11 samples/sec   Loss 10.0483   LearningRate 0.0533   Epoch: 5   Global Step: 223680   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:49,899-Speed 2593.78 samples/sec   Loss 9.9384   LearningRate 0.0533   Epoch: 5   Global Step: 223690   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:53,792-Speed 2631.36 samples/sec   Loss 9.8662   LearningRate 0.0533   Epoch: 5   Global Step: 223700   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:51:57,687-Speed 2629.35 samples/sec   Loss 10.0280   LearningRate 0.0533   Epoch: 5   Global Step: 223710   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:52:01,565-Speed 2641.74 samples/sec   Loss 9.8743   LearningRate 0.0533   Epoch: 5   Global Step: 223720   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:52:05,457-Speed 2631.08 samples/sec   Loss 10.0845   LearningRate 0.0533   Epoch: 5   Global Step: 223730   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:52:09,349-Speed 2632.79 samples/sec   Loss 9.9653   LearningRate 0.0533   Epoch: 5   Global Step: 223740   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:52:13,236-Speed 2634.92 samples/sec   Loss 9.8704   LearningRate 0.0533   Epoch: 5   Global Step: 223750   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:17,130-Speed 2630.41 samples/sec   Loss 9.9584   LearningRate 0.0533   Epoch: 5   Global Step: 223760   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:21,027-Speed 2628.48 samples/sec   Loss 9.9776   LearningRate 0.0533   Epoch: 5   Global Step: 223770   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:25,050-Speed 2545.86 samples/sec   Loss 9.8518   LearningRate 0.0533   Epoch: 5   Global Step: 223780   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:29,118-Speed 2518.12 samples/sec   Loss 9.8563   LearningRate 0.0533   Epoch: 5   Global Step: 223790   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:33,189-Speed 2516.13 samples/sec   Loss 9.9064   LearningRate 0.0533   Epoch: 5   Global Step: 223800   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:37,258-Speed 2517.37 samples/sec   Loss 9.9785   LearningRate 0.0533   Epoch: 5   Global Step: 223810   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:41,270-Speed 2552.71 samples/sec   Loss 9.9337   LearningRate 0.0533   Epoch: 5   Global Step: 223820   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:45,168-Speed 2627.62 samples/sec   Loss 9.9514   LearningRate 0.0533   Epoch: 5   Global Step: 223830   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:49,078-Speed 2619.41 samples/sec   Loss 9.8206   LearningRate 0.0533   Epoch: 5   Global Step: 223840   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:52:52,976-Speed 2627.78 samples/sec   Loss 9.9791   LearningRate 0.0533   Epoch: 5   Global Step: 223850   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:52:56,873-Speed 2628.37 samples/sec   Loss 9.8044   LearningRate 0.0533   Epoch: 5   Global Step: 223860   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:00,777-Speed 2623.95 samples/sec   Loss 10.0532   LearningRate 0.0533   Epoch: 5   Global Step: 223870   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:04,676-Speed 2626.68 samples/sec   Loss 9.8862   LearningRate 0.0533   Epoch: 5   Global Step: 223880   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:08,582-Speed 2622.13 samples/sec   Loss 10.0209   LearningRate 0.0533   Epoch: 5   Global Step: 223890   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:12,478-Speed 2629.04 samples/sec   Loss 9.9271   LearningRate 0.0533   Epoch: 5   Global Step: 223900   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:16,371-Speed 2631.45 samples/sec   Loss 9.9515   LearningRate 0.0533   Epoch: 5   Global Step: 223910   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:20,262-Speed 2632.17 samples/sec   Loss 9.9501   LearningRate 0.0533   Epoch: 5   Global Step: 223920   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:24,151-Speed 2633.99 samples/sec   Loss 9.8312   LearningRate 0.0533   Epoch: 5   Global Step: 223930   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:28,043-Speed 2631.33 samples/sec   Loss 10.0079   LearningRate 0.0533   Epoch: 5   Global Step: 223940   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:31,935-Speed 2632.05 samples/sec   Loss 10.0324   LearningRate 0.0533   Epoch: 5   Global Step: 223950   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:35,836-Speed 2625.75 samples/sec   Loss 9.8009   LearningRate 0.0533   Epoch: 5   Global Step: 223960   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:39,729-Speed 2630.65 samples/sec   Loss 9.8744   LearningRate 0.0533   Epoch: 5   Global Step: 223970   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:53:43,609-Speed 2639.58 samples/sec   Loss 9.9999   LearningRate 0.0533   Epoch: 5   Global Step: 223980   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:53:47,494-Speed 2636.72 samples/sec   Loss 9.8168   LearningRate 0.0533   Epoch: 5   Global Step: 223990   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:53:51,389-Speed 2629.92 samples/sec   Loss 9.9550   LearningRate 0.0533   Epoch: 5   Global Step: 224000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:53:55,278-Speed 2633.98 samples/sec   Loss 9.8599   LearningRate 0.0533   Epoch: 5   Global Step: 224010   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:53:59,171-Speed 2630.67 samples/sec   Loss 9.8806   LearningRate 0.0533   Epoch: 5   Global Step: 224020   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:54:03,068-Speed 2628.68 samples/sec   Loss 9.8718   LearningRate 0.0533   Epoch: 5   Global Step: 224030   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:54:06,961-Speed 2630.39 samples/sec   Loss 9.9094   LearningRate 0.0533   Epoch: 5   Global Step: 224040   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:54:10,849-Speed 2634.32 samples/sec   Loss 9.8365   LearningRate 0.0533   Epoch: 5   Global Step: 224050   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:54:14,742-Speed 2631.06 samples/sec   Loss 10.0507   LearningRate 0.0533   Epoch: 5   Global Step: 224060   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:54:18,634-Speed 2631.69 samples/sec   Loss 9.9663   LearningRate 0.0533   Epoch: 5   Global Step: 224070   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:54:22,528-Speed 2630.41 samples/sec   Loss 9.8775   LearningRate 0.0533   Epoch: 5   Global Step: 224080   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:26,419-Speed 2632.57 samples/sec   Loss 9.9971   LearningRate 0.0533   Epoch: 5   Global Step: 224090   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:30,312-Speed 2631.28 samples/sec   Loss 9.9837   LearningRate 0.0533   Epoch: 5   Global Step: 224100   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:34,202-Speed 2632.28 samples/sec   Loss 9.8657   LearningRate 0.0533   Epoch: 5   Global Step: 224110   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:38,099-Speed 2628.47 samples/sec   Loss 9.9011   LearningRate 0.0533   Epoch: 5   Global Step: 224120   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:41,989-Speed 2632.53 samples/sec   Loss 9.8177   LearningRate 0.0533   Epoch: 5   Global Step: 224130   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:45,879-Speed 2633.37 samples/sec   Loss 9.8377   LearningRate 0.0533   Epoch: 5   Global Step: 224140   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:49,772-Speed 2631.06 samples/sec   Loss 9.8346   LearningRate 0.0533   Epoch: 5   Global Step: 224150   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:54:53,652-Speed 2640.37 samples/sec   Loss 9.7820   LearningRate 0.0533   Epoch: 5   Global Step: 224160   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:54:57,567-Speed 2615.81 samples/sec   Loss 9.8801   LearningRate 0.0533   Epoch: 5   Global Step: 224170   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:55:01,530-Speed 2584.85 samples/sec   Loss 9.8018   LearningRate 0.0533   Epoch: 5   Global Step: 224180   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:55:05,423-Speed 2630.60 samples/sec   Loss 9.9661   LearningRate 0.0533   Epoch: 5   Global Step: 224190   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:55:09,265-Speed 2665.78 samples/sec   Loss 9.9443   LearningRate 0.0533   Epoch: 5   Global Step: 224200   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:13,169-Speed 2623.31 samples/sec   Loss 10.2531   LearningRate 0.0533   Epoch: 5   Global Step: 224210   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:17,272-Speed 2496.69 samples/sec   Loss 10.3264   LearningRate 0.0532   Epoch: 5   Global Step: 224220   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:21,169-Speed 2628.78 samples/sec   Loss 10.2215   LearningRate 0.0532   Epoch: 5   Global Step: 224230   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:25,055-Speed 2635.91 samples/sec   Loss 9.9916   LearningRate 0.0532   Epoch: 5   Global Step: 224240   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:28,946-Speed 2632.42 samples/sec   Loss 9.8787   LearningRate 0.0532   Epoch: 5   Global Step: 224250   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:32,831-Speed 2636.75 samples/sec   Loss 10.0229   LearningRate 0.0532   Epoch: 5   Global Step: 224260   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:36,723-Speed 2631.70 samples/sec   Loss 9.9440   LearningRate 0.0532   Epoch: 5   Global Step: 224270   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:40,613-Speed 2633.00 samples/sec   Loss 10.0088   LearningRate 0.0532   Epoch: 5   Global Step: 224280   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:44,506-Speed 2630.85 samples/sec   Loss 9.9179   LearningRate 0.0532   Epoch: 5   Global Step: 224290   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 20:55:48,403-Speed 2628.66 samples/sec   Loss 10.0302   LearningRate 0.0532   Epoch: 5   Global Step: 224300   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:55:52,294-Speed 2632.95 samples/sec   Loss 10.0066   LearningRate 0.0532   Epoch: 5   Global Step: 224310   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:55:56,200-Speed 2621.76 samples/sec   Loss 10.1273   LearningRate 0.0532   Epoch: 5   Global Step: 224320   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:00,096-Speed 2629.28 samples/sec   Loss 9.9389   LearningRate 0.0532   Epoch: 5   Global Step: 224330   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:03,994-Speed 2627.84 samples/sec   Loss 9.9657   LearningRate 0.0532   Epoch: 5   Global Step: 224340   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:07,903-Speed 2620.06 samples/sec   Loss 9.8748   LearningRate 0.0532   Epoch: 5   Global Step: 224350   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:11,797-Speed 2630.72 samples/sec   Loss 9.8609   LearningRate 0.0532   Epoch: 5   Global Step: 224360   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:15,692-Speed 2629.98 samples/sec   Loss 9.9681   LearningRate 0.0532   Epoch: 5   Global Step: 224370   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:19,591-Speed 2626.41 samples/sec   Loss 9.9890   LearningRate 0.0532   Epoch: 5   Global Step: 224380   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:23,487-Speed 2628.76 samples/sec   Loss 10.0159   LearningRate 0.0532   Epoch: 5   Global Step: 224390   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 20:56:27,384-Speed 2628.42 samples/sec   Loss 10.0307   LearningRate 0.0532   Epoch: 5   Global Step: 224400   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:31,277-Speed 2631.68 samples/sec   Loss 9.8227   LearningRate 0.0532   Epoch: 5   Global Step: 224410   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:35,168-Speed 2632.47 samples/sec   Loss 10.0589   LearningRate 0.0532   Epoch: 5   Global Step: 224420   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:39,067-Speed 2626.42 samples/sec   Loss 9.9223   LearningRate 0.0532   Epoch: 5   Global Step: 224430   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:43,029-Speed 2585.33 samples/sec   Loss 10.0054   LearningRate 0.0532   Epoch: 5   Global Step: 224440   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:46,934-Speed 2623.33 samples/sec   Loss 9.8702   LearningRate 0.0532   Epoch: 5   Global Step: 224450   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:50,826-Speed 2632.02 samples/sec   Loss 10.0326   LearningRate 0.0532   Epoch: 5   Global Step: 224460   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:54,718-Speed 2631.86 samples/sec   Loss 9.7734   LearningRate 0.0532   Epoch: 5   Global Step: 224470   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:56:58,611-Speed 2630.80 samples/sec   Loss 9.8825   LearningRate 0.0532   Epoch: 5   Global Step: 224480   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:02,552-Speed 2598.86 samples/sec   Loss 10.0180   LearningRate 0.0532   Epoch: 5   Global Step: 224490   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:06,444-Speed 2631.83 samples/sec   Loss 9.9771   LearningRate 0.0532   Epoch: 5   Global Step: 224500   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:57:10,334-Speed 2632.98 samples/sec   Loss 9.9258   LearningRate 0.0532   Epoch: 5   Global Step: 224510   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:57:14,209-Speed 2642.94 samples/sec   Loss 9.8681   LearningRate 0.0532   Epoch: 5   Global Step: 224520   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:18,113-Speed 2624.21 samples/sec   Loss 9.9952   LearningRate 0.0532   Epoch: 5   Global Step: 224530   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:22,014-Speed 2625.10 samples/sec   Loss 9.9513   LearningRate 0.0532   Epoch: 5   Global Step: 224540   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:25,910-Speed 2629.55 samples/sec   Loss 9.9112   LearningRate 0.0532   Epoch: 5   Global Step: 224550   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:29,800-Speed 2632.56 samples/sec   Loss 9.8678   LearningRate 0.0532   Epoch: 5   Global Step: 224560   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:33,699-Speed 2627.04 samples/sec   Loss 9.9772   LearningRate 0.0532   Epoch: 5   Global Step: 224570   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:37,593-Speed 2630.31 samples/sec   Loss 9.9936   LearningRate 0.0532   Epoch: 5   Global Step: 224580   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:41,481-Speed 2634.11 samples/sec   Loss 9.9751   LearningRate 0.0532   Epoch: 5   Global Step: 224590   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:45,384-Speed 2624.59 samples/sec   Loss 9.9290   LearningRate 0.0532   Epoch: 5   Global Step: 224600   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:49,281-Speed 2628.42 samples/sec   Loss 9.9046   LearningRate 0.0532   Epoch: 5   Global Step: 224610   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 20:57:53,169-Speed 2634.47 samples/sec   Loss 9.7644   LearningRate 0.0532   Epoch: 5   Global Step: 224620   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:57:57,060-Speed 2632.39 samples/sec   Loss 10.0718   LearningRate 0.0532   Epoch: 5   Global Step: 224630   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:00,950-Speed 2633.46 samples/sec   Loss 9.9740   LearningRate 0.0532   Epoch: 5   Global Step: 224640   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:04,853-Speed 2624.34 samples/sec   Loss 9.7975   LearningRate 0.0532   Epoch: 5   Global Step: 224650   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:08,747-Speed 2630.03 samples/sec   Loss 9.8088   LearningRate 0.0532   Epoch: 5   Global Step: 224660   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:12,687-Speed 2599.87 samples/sec   Loss 10.0030   LearningRate 0.0532   Epoch: 5   Global Step: 224670   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:16,578-Speed 2632.59 samples/sec   Loss 10.1395   LearningRate 0.0532   Epoch: 5   Global Step: 224680   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:20,475-Speed 2628.59 samples/sec   Loss 9.9603   LearningRate 0.0532   Epoch: 5   Global Step: 224690   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:24,367-Speed 2631.97 samples/sec   Loss 9.9104   LearningRate 0.0532   Epoch: 5   Global Step: 224700   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:28,259-Speed 2631.80 samples/sec   Loss 9.9652   LearningRate 0.0532   Epoch: 5   Global Step: 224710   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:32,160-Speed 2625.50 samples/sec   Loss 9.7889   LearningRate 0.0532   Epoch: 5   Global Step: 224720   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:58:36,083-Speed 2610.36 samples/sec   Loss 9.8432   LearningRate 0.0532   Epoch: 5   Global Step: 224730   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:58:39,980-Speed 2629.02 samples/sec   Loss 9.8835   LearningRate 0.0532   Epoch: 5   Global Step: 224740   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:58:43,914-Speed 2603.89 samples/sec   Loss 9.8571   LearningRate 0.0532   Epoch: 5   Global Step: 224750   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:58:47,789-Speed 2643.27 samples/sec   Loss 9.9702   LearningRate 0.0532   Epoch: 5   Global Step: 224760   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:51,818-Speed 2541.95 samples/sec   Loss 9.7622   LearningRate 0.0532   Epoch: 5   Global Step: 224770   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:55,744-Speed 2609.25 samples/sec   Loss 10.0066   LearningRate 0.0532   Epoch: 5   Global Step: 224780   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:58:59,651-Speed 2621.96 samples/sec   Loss 10.0500   LearningRate 0.0531   Epoch: 5   Global Step: 224790   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:03,546-Speed 2628.97 samples/sec   Loss 9.8071   LearningRate 0.0531   Epoch: 5   Global Step: 224800   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:07,437-Speed 2632.46 samples/sec   Loss 9.8555   LearningRate 0.0531   Epoch: 5   Global Step: 224810   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:11,341-Speed 2624.08 samples/sec   Loss 9.9674   LearningRate 0.0531   Epoch: 5   Global Step: 224820   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:15,232-Speed 2632.18 samples/sec   Loss 9.9435   LearningRate 0.0531   Epoch: 5   Global Step: 224830   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:19,123-Speed 2632.31 samples/sec   Loss 9.8121   LearningRate 0.0531   Epoch: 5   Global Step: 224840   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:23,019-Speed 2628.83 samples/sec   Loss 10.0046   LearningRate 0.0531   Epoch: 5   Global Step: 224850   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:26,913-Speed 2630.38 samples/sec   Loss 9.8554   LearningRate 0.0531   Epoch: 5   Global Step: 224860   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 20:59:30,783-Speed 2646.91 samples/sec   Loss 9.8210   LearningRate 0.0531   Epoch: 5   Global Step: 224870   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:34,681-Speed 2627.77 samples/sec   Loss 9.8464   LearningRate 0.0531   Epoch: 5   Global Step: 224880   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:38,574-Speed 2630.23 samples/sec   Loss 9.8833   LearningRate 0.0531   Epoch: 5   Global Step: 224890   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:42,468-Speed 2630.67 samples/sec   Loss 9.9301   LearningRate 0.0531   Epoch: 5   Global Step: 224900   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:46,367-Speed 2627.48 samples/sec   Loss 9.7920   LearningRate 0.0531   Epoch: 5   Global Step: 224910   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:50,280-Speed 2617.25 samples/sec   Loss 9.9007   LearningRate 0.0531   Epoch: 5   Global Step: 224920   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:54,168-Speed 2634.83 samples/sec   Loss 9.9045   LearningRate 0.0531   Epoch: 5   Global Step: 224930   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 20:59:58,064-Speed 2628.67 samples/sec   Loss 9.9983   LearningRate 0.0531   Epoch: 5   Global Step: 224940   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:01,954-Speed 2633.99 samples/sec   Loss 9.7964   LearningRate 0.0531   Epoch: 5   Global Step: 224950   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:05,847-Speed 2630.87 samples/sec   Loss 9.9651   LearningRate 0.0531   Epoch: 5   Global Step: 224960   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:09,735-Speed 2633.89 samples/sec   Loss 9.8456   LearningRate 0.0531   Epoch: 5   Global Step: 224970   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:00:13,629-Speed 2630.26 samples/sec   Loss 9.7829   LearningRate 0.0531   Epoch: 5   Global Step: 224980   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:00:17,536-Speed 2621.97 samples/sec   Loss 9.8832   LearningRate 0.0531   Epoch: 5   Global Step: 224990   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:00:21,432-Speed 2629.03 samples/sec   Loss 9.8628   LearningRate 0.0531   Epoch: 5   Global Step: 225000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:25,319-Speed 2635.00 samples/sec   Loss 9.8885   LearningRate 0.0531   Epoch: 5   Global Step: 225010   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:29,221-Speed 2625.03 samples/sec   Loss 9.8711   LearningRate 0.0531   Epoch: 5   Global Step: 225020   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:33,110-Speed 2633.54 samples/sec   Loss 9.9835   LearningRate 0.0531   Epoch: 5   Global Step: 225030   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:37,000-Speed 2633.19 samples/sec   Loss 9.8811   LearningRate 0.0531   Epoch: 5   Global Step: 225040   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:40,892-Speed 2631.49 samples/sec   Loss 9.8894   LearningRate 0.0531   Epoch: 5   Global Step: 225050   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:44,854-Speed 2585.02 samples/sec   Loss 9.8009   LearningRate 0.0531   Epoch: 5   Global Step: 225060   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:48,824-Speed 2579.88 samples/sec   Loss 9.9235   LearningRate 0.0531   Epoch: 5   Global Step: 225070   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:52,712-Speed 2634.30 samples/sec   Loss 9.7276   LearningRate 0.0531   Epoch: 5   Global Step: 225080   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:00:56,603-Speed 2632.54 samples/sec   Loss 9.6960   LearningRate 0.0531   Epoch: 5   Global Step: 225090   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:00,494-Speed 2634.47 samples/sec   Loss 9.7407   LearningRate 0.0531   Epoch: 5   Global Step: 225100   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:01:04,388-Speed 2630.13 samples/sec   Loss 9.8953   LearningRate 0.0531   Epoch: 5   Global Step: 225110   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:01:08,285-Speed 2628.62 samples/sec   Loss 9.7714   LearningRate 0.0531   Epoch: 5   Global Step: 225120   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:01:12,175-Speed 2632.80 samples/sec   Loss 9.8345   LearningRate 0.0531   Epoch: 5   Global Step: 225130   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:01:16,068-Speed 2630.78 samples/sec   Loss 9.8705   LearningRate 0.0531   Epoch: 5   Global Step: 225140   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:01:19,961-Speed 2630.64 samples/sec   Loss 9.9753   LearningRate 0.0531   Epoch: 5   Global Step: 225150   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:01:23,851-Speed 2633.16 samples/sec   Loss 9.9479   LearningRate 0.0531   Epoch: 5   Global Step: 225160   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:01:27,735-Speed 2637.27 samples/sec   Loss 9.9552   LearningRate 0.0531   Epoch: 5   Global Step: 225170   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:31,633-Speed 2628.38 samples/sec   Loss 9.9410   LearningRate 0.0531   Epoch: 5   Global Step: 225180   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:35,524-Speed 2632.14 samples/sec   Loss 9.9377   LearningRate 0.0531   Epoch: 5   Global Step: 225190   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:39,417-Speed 2630.69 samples/sec   Loss 9.8561   LearningRate 0.0531   Epoch: 5   Global Step: 225200   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:43,306-Speed 2633.32 samples/sec   Loss 9.8944   LearningRate 0.0531   Epoch: 5   Global Step: 225210   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:47,205-Speed 2627.51 samples/sec   Loss 9.8515   LearningRate 0.0531   Epoch: 5   Global Step: 225220   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:51,115-Speed 2618.98 samples/sec   Loss 9.9554   LearningRate 0.0531   Epoch: 5   Global Step: 225230   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:55,020-Speed 2623.03 samples/sec   Loss 9.9009   LearningRate 0.0531   Epoch: 5   Global Step: 225240   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:01:58,910-Speed 2633.26 samples/sec   Loss 9.8546   LearningRate 0.0531   Epoch: 5   Global Step: 225250   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:02,823-Speed 2617.55 samples/sec   Loss 9.8796   LearningRate 0.0531   Epoch: 5   Global Step: 225260   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:06,743-Speed 2612.77 samples/sec   Loss 9.9280   LearningRate 0.0531   Epoch: 5   Global Step: 225270   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:02:10,638-Speed 2629.50 samples/sec   Loss 9.8144   LearningRate 0.0531   Epoch: 5   Global Step: 225280   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:02:14,528-Speed 2633.45 samples/sec   Loss 9.9028   LearningRate 0.0531   Epoch: 5   Global Step: 225290   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:02:18,423-Speed 2629.58 samples/sec   Loss 9.9500   LearningRate 0.0531   Epoch: 5   Global Step: 225300   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:02:22,361-Speed 2600.98 samples/sec   Loss 9.8282   LearningRate 0.0531   Epoch: 5   Global Step: 225310   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:02:26,235-Speed 2643.83 samples/sec   Loss 9.8044   LearningRate 0.0531   Epoch: 5   Global Step: 225320   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:30,134-Speed 2627.36 samples/sec   Loss 9.7558   LearningRate 0.0531   Epoch: 5   Global Step: 225330   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:34,034-Speed 2625.90 samples/sec   Loss 9.9243   LearningRate 0.0531   Epoch: 5   Global Step: 225340   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:37,926-Speed 2632.05 samples/sec   Loss 9.8330   LearningRate 0.0531   Epoch: 5   Global Step: 225350   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:41,821-Speed 2630.32 samples/sec   Loss 9.9333   LearningRate 0.0530   Epoch: 5   Global Step: 225360   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:45,718-Speed 2627.55 samples/sec   Loss 9.8278   LearningRate 0.0530   Epoch: 5   Global Step: 225370   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:49,621-Speed 2624.56 samples/sec   Loss 9.9343   LearningRate 0.0530   Epoch: 5   Global Step: 225380   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:53,523-Speed 2625.16 samples/sec   Loss 9.8656   LearningRate 0.0530   Epoch: 5   Global Step: 225390   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:02:57,420-Speed 2628.90 samples/sec   Loss 9.8388   LearningRate 0.0530   Epoch: 5   Global Step: 225400   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:03:01,340-Speed 2612.36 samples/sec   Loss 9.8457   LearningRate 0.0530   Epoch: 5   Global Step: 225410   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:03:05,237-Speed 2628.93 samples/sec   Loss 9.8211   LearningRate 0.0530   Epoch: 5   Global Step: 225420   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:03:09,132-Speed 2629.18 samples/sec   Loss 9.8868   LearningRate 0.0530   Epoch: 5   Global Step: 225430   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:03:12,988-Speed 2656.68 samples/sec   Loss 9.7363   LearningRate 0.0530   Epoch: 5   Global Step: 225440   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:16,913-Speed 2609.60 samples/sec   Loss 9.8877   LearningRate 0.0530   Epoch: 5   Global Step: 225450   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:20,807-Speed 2630.39 samples/sec   Loss 9.8981   LearningRate 0.0530   Epoch: 5   Global Step: 225460   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:24,747-Speed 2600.11 samples/sec   Loss 9.9329   LearningRate 0.0530   Epoch: 5   Global Step: 225470   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:28,644-Speed 2628.22 samples/sec   Loss 9.9714   LearningRate 0.0530   Epoch: 5   Global Step: 225480   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:32,536-Speed 2631.45 samples/sec   Loss 9.8181   LearningRate 0.0530   Epoch: 5   Global Step: 225490   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:36,429-Speed 2631.08 samples/sec   Loss 9.8691   LearningRate 0.0530   Epoch: 5   Global Step: 225500   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:40,318-Speed 2634.10 samples/sec   Loss 9.7142   LearningRate 0.0530   Epoch: 5   Global Step: 225510   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:44,208-Speed 2632.95 samples/sec   Loss 9.8789   LearningRate 0.0530   Epoch: 5   Global Step: 225520   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:48,126-Speed 2614.20 samples/sec   Loss 9.8394   LearningRate 0.0530   Epoch: 5   Global Step: 225530   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:03:52,018-Speed 2631.75 samples/sec   Loss 9.7966   LearningRate 0.0530   Epoch: 5   Global Step: 225540   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:03:55,910-Speed 2631.85 samples/sec   Loss 9.9231   LearningRate 0.0530   Epoch: 5   Global Step: 225550   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:03:59,808-Speed 2627.25 samples/sec   Loss 9.9690   LearningRate 0.0530   Epoch: 5   Global Step: 225560   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:03,702-Speed 2630.39 samples/sec   Loss 9.8700   LearningRate 0.0530   Epoch: 5   Global Step: 225570   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:07,601-Speed 2627.30 samples/sec   Loss 9.9112   LearningRate 0.0530   Epoch: 5   Global Step: 225580   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:11,493-Speed 2632.14 samples/sec   Loss 9.9762   LearningRate 0.0530   Epoch: 5   Global Step: 225590   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:15,387-Speed 2630.00 samples/sec   Loss 9.9101   LearningRate 0.0530   Epoch: 5   Global Step: 225600   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:19,285-Speed 2627.89 samples/sec   Loss 10.0678   LearningRate 0.0530   Epoch: 5   Global Step: 225610   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:23,181-Speed 2628.80 samples/sec   Loss 9.9200   LearningRate 0.0530   Epoch: 5   Global Step: 225620   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:27,075-Speed 2630.70 samples/sec   Loss 9.8766   LearningRate 0.0530   Epoch: 5   Global Step: 225630   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:30,970-Speed 2629.95 samples/sec   Loss 9.8959   LearningRate 0.0530   Epoch: 5   Global Step: 225640   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:04:34,865-Speed 2629.63 samples/sec   Loss 9.9377   LearningRate 0.0530   Epoch: 5   Global Step: 225650   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:04:38,756-Speed 2632.13 samples/sec   Loss 9.8460   LearningRate 0.0530   Epoch: 5   Global Step: 225660   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:04:42,653-Speed 2628.90 samples/sec   Loss 9.8461   LearningRate 0.0530   Epoch: 5   Global Step: 225670   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:04:46,541-Speed 2634.07 samples/sec   Loss 9.9429   LearningRate 0.0530   Epoch: 5   Global Step: 225680   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:04:50,424-Speed 2637.60 samples/sec   Loss 9.9131   LearningRate 0.0530   Epoch: 5   Global Step: 225690   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:54,320-Speed 2628.73 samples/sec   Loss 9.9352   LearningRate 0.0530   Epoch: 5   Global Step: 225700   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:04:58,211-Speed 2632.66 samples/sec   Loss 9.8361   LearningRate 0.0530   Epoch: 5   Global Step: 225710   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:02,106-Speed 2629.70 samples/sec   Loss 9.9805   LearningRate 0.0530   Epoch: 5   Global Step: 225720   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:05,995-Speed 2633.96 samples/sec   Loss 9.8376   LearningRate 0.0530   Epoch: 5   Global Step: 225730   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:09,883-Speed 2634.47 samples/sec   Loss 9.8924   LearningRate 0.0530   Epoch: 5   Global Step: 225740   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:13,774-Speed 2632.01 samples/sec   Loss 9.9249   LearningRate 0.0530   Epoch: 5   Global Step: 225750   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:17,664-Speed 2633.14 samples/sec   Loss 9.8687   LearningRate 0.0530   Epoch: 5   Global Step: 225760   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:21,551-Speed 2635.27 samples/sec   Loss 9.9754   LearningRate 0.0530   Epoch: 5   Global Step: 225770   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:25,441-Speed 2633.14 samples/sec   Loss 9.9145   LearningRate 0.0530   Epoch: 5   Global Step: 225780   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:29,336-Speed 2629.71 samples/sec   Loss 9.8998   LearningRate 0.0530   Epoch: 5   Global Step: 225790   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:05:33,229-Speed 2630.99 samples/sec   Loss 9.9975   LearningRate 0.0530   Epoch: 5   Global Step: 225800   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:05:37,119-Speed 2632.58 samples/sec   Loss 9.8822   LearningRate 0.0530   Epoch: 5   Global Step: 225810   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:05:41,013-Speed 2631.15 samples/sec   Loss 9.8139   LearningRate 0.0530   Epoch: 5   Global Step: 225820   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:05:44,918-Speed 2622.97 samples/sec   Loss 9.8483   LearningRate 0.0530   Epoch: 5   Global Step: 225830   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:05:48,808-Speed 2633.14 samples/sec   Loss 9.8473   LearningRate 0.0530   Epoch: 5   Global Step: 225840   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:52,722-Speed 2616.98 samples/sec   Loss 9.9438   LearningRate 0.0530   Epoch: 5   Global Step: 225850   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:05:56,614-Speed 2631.68 samples/sec   Loss 9.9323   LearningRate 0.0530   Epoch: 5   Global Step: 225860   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:00,508-Speed 2630.22 samples/sec   Loss 9.9712   LearningRate 0.0530   Epoch: 5   Global Step: 225870   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:04,406-Speed 2627.43 samples/sec   Loss 10.0379   LearningRate 0.0530   Epoch: 5   Global Step: 225880   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:08,297-Speed 2632.55 samples/sec   Loss 9.9290   LearningRate 0.0530   Epoch: 5   Global Step: 225890   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:12,200-Speed 2624.07 samples/sec   Loss 9.8584   LearningRate 0.0530   Epoch: 5   Global Step: 225900   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:16,091-Speed 2632.95 samples/sec   Loss 9.8207   LearningRate 0.0530   Epoch: 5   Global Step: 225910   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:19,992-Speed 2625.48 samples/sec   Loss 9.7512   LearningRate 0.0530   Epoch: 5   Global Step: 225920   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:23,887-Speed 2629.88 samples/sec   Loss 9.8351   LearningRate 0.0529   Epoch: 5   Global Step: 225930   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:27,783-Speed 2628.58 samples/sec   Loss 9.9161   LearningRate 0.0529   Epoch: 5   Global Step: 225940   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:06:31,754-Speed 2580.17 samples/sec   Loss 9.8685   LearningRate 0.0529   Epoch: 5   Global Step: 225950   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:06:35,641-Speed 2634.90 samples/sec   Loss 9.8877   LearningRate 0.0529   Epoch: 5   Global Step: 225960   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:06:39,588-Speed 2594.90 samples/sec   Loss 9.7178   LearningRate 0.0529   Epoch: 5   Global Step: 225970   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:06:43,545-Speed 2588.82 samples/sec   Loss 9.8415   LearningRate 0.0529   Epoch: 5   Global Step: 225980   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:47,438-Speed 2631.17 samples/sec   Loss 9.8302   LearningRate 0.0529   Epoch: 5   Global Step: 225990   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:51,329-Speed 2632.62 samples/sec   Loss 9.6877   LearningRate 0.0529   Epoch: 5   Global Step: 226000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:55,228-Speed 2626.70 samples/sec   Loss 9.7771   LearningRate 0.0529   Epoch: 5   Global Step: 226010   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:06:59,135-Speed 2621.83 samples/sec   Loss 9.7988   LearningRate 0.0529   Epoch: 5   Global Step: 226020   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:07:03,035-Speed 2626.72 samples/sec   Loss 9.7365   LearningRate 0.0529   Epoch: 5   Global Step: 226030   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:07:06,942-Speed 2621.06 samples/sec   Loss 9.7143   LearningRate 0.0529   Epoch: 5   Global Step: 226040   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:07:10,838-Speed 2629.51 samples/sec   Loss 10.0356   LearningRate 0.0529   Epoch: 5   Global Step: 226050   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:07:14,727-Speed 2633.56 samples/sec   Loss 9.9277   LearningRate 0.0529   Epoch: 5   Global Step: 226060   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:07:18,627-Speed 2626.51 samples/sec   Loss 9.9412   LearningRate 0.0529   Epoch: 5   Global Step: 226070   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:07:22,508-Speed 2639.67 samples/sec   Loss 9.8453   LearningRate 0.0529   Epoch: 5   Global Step: 226080   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:26,400-Speed 2631.58 samples/sec   Loss 9.8129   LearningRate 0.0529   Epoch: 5   Global Step: 226090   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:30,295-Speed 2629.72 samples/sec   Loss 9.8300   LearningRate 0.0529   Epoch: 5   Global Step: 226100   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:34,185-Speed 2633.29 samples/sec   Loss 9.8265   LearningRate 0.0529   Epoch: 5   Global Step: 226110   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:38,077-Speed 2631.18 samples/sec   Loss 9.8776   LearningRate 0.0529   Epoch: 5   Global Step: 226120   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:41,968-Speed 2632.03 samples/sec   Loss 9.8815   LearningRate 0.0529   Epoch: 5   Global Step: 226130   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:45,873-Speed 2623.55 samples/sec   Loss 10.0422   LearningRate 0.0529   Epoch: 5   Global Step: 226140   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:49,775-Speed 2624.61 samples/sec   Loss 9.7725   LearningRate 0.0529   Epoch: 5   Global Step: 226150   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:53,684-Speed 2620.85 samples/sec   Loss 9.8758   LearningRate 0.0529   Epoch: 5   Global Step: 226160   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:07:57,576-Speed 2631.48 samples/sec   Loss 9.9495   LearningRate 0.0529   Epoch: 5   Global Step: 226170   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:08:01,473-Speed 2628.09 samples/sec   Loss 9.8432   LearningRate 0.0529   Epoch: 5   Global Step: 226180   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:05,365-Speed 2632.09 samples/sec   Loss 9.7617   LearningRate 0.0529   Epoch: 5   Global Step: 226190   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:09,267-Speed 2624.71 samples/sec   Loss 9.8240   LearningRate 0.0529   Epoch: 5   Global Step: 226200   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:13,161-Speed 2630.43 samples/sec   Loss 9.8898   LearningRate 0.0529   Epoch: 5   Global Step: 226210   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:17,054-Speed 2630.93 samples/sec   Loss 9.9547   LearningRate 0.0529   Epoch: 5   Global Step: 226220   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:20,950-Speed 2628.73 samples/sec   Loss 9.8866   LearningRate 0.0529   Epoch: 5   Global Step: 226230   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:24,842-Speed 2631.83 samples/sec   Loss 9.7861   LearningRate 0.0529   Epoch: 5   Global Step: 226240   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:28,735-Speed 2631.95 samples/sec   Loss 9.7419   LearningRate 0.0529   Epoch: 5   Global Step: 226250   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:32,628-Speed 2631.09 samples/sec   Loss 9.9339   LearningRate 0.0529   Epoch: 5   Global Step: 226260   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:36,530-Speed 2624.71 samples/sec   Loss 9.6784   LearningRate 0.0529   Epoch: 5   Global Step: 226270   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:40,422-Speed 2631.50 samples/sec   Loss 9.8624   LearningRate 0.0529   Epoch: 5   Global Step: 226280   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:08:44,319-Speed 2628.61 samples/sec   Loss 9.9416   LearningRate 0.0529   Epoch: 5   Global Step: 226290   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:08:48,196-Speed 2641.74 samples/sec   Loss 10.0205   LearningRate 0.0529   Epoch: 5   Global Step: 226300   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:52,096-Speed 2626.74 samples/sec   Loss 9.8431   LearningRate 0.0529   Epoch: 5   Global Step: 226310   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:55,992-Speed 2628.53 samples/sec   Loss 9.9061   LearningRate 0.0529   Epoch: 5   Global Step: 226320   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:08:59,919-Speed 2607.97 samples/sec   Loss 9.9165   LearningRate 0.0529   Epoch: 5   Global Step: 226330   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:09:03,815-Speed 2629.62 samples/sec   Loss 9.7797   LearningRate 0.0529   Epoch: 5   Global Step: 226340   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:09:07,724-Speed 2620.20 samples/sec   Loss 9.9701   LearningRate 0.0529   Epoch: 5   Global Step: 226350   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:09:11,641-Speed 2614.52 samples/sec   Loss 9.9066   LearningRate 0.0529   Epoch: 5   Global Step: 226360   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:09:15,557-Speed 2615.70 samples/sec   Loss 9.8721   LearningRate 0.0529   Epoch: 5   Global Step: 226370   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:09:19,519-Speed 2585.88 samples/sec   Loss 9.7727   LearningRate 0.0529   Epoch: 5   Global Step: 226380   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:09:23,429-Speed 2619.42 samples/sec   Loss 9.9427   LearningRate 0.0529   Epoch: 5   Global Step: 226390   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:09:27,326-Speed 2628.44 samples/sec   Loss 9.8236   LearningRate 0.0529   Epoch: 5   Global Step: 226400   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:31,210-Speed 2636.69 samples/sec   Loss 10.0187   LearningRate 0.0529   Epoch: 5   Global Step: 226410   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:35,106-Speed 2628.89 samples/sec   Loss 9.7853   LearningRate 0.0529   Epoch: 5   Global Step: 226420   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:39,005-Speed 2627.01 samples/sec   Loss 9.8686   LearningRate 0.0529   Epoch: 5   Global Step: 226430   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:42,906-Speed 2625.55 samples/sec   Loss 9.9708   LearningRate 0.0529   Epoch: 5   Global Step: 226440   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:46,809-Speed 2624.54 samples/sec   Loss 9.9198   LearningRate 0.0529   Epoch: 5   Global Step: 226450   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:50,699-Speed 2633.34 samples/sec   Loss 9.8728   LearningRate 0.0529   Epoch: 5   Global Step: 226460   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:54,589-Speed 2632.78 samples/sec   Loss 9.8466   LearningRate 0.0529   Epoch: 5   Global Step: 226470   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:09:58,478-Speed 2633.74 samples/sec   Loss 9.9398   LearningRate 0.0529   Epoch: 5   Global Step: 226480   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:10:02,373-Speed 2629.55 samples/sec   Loss 9.9029   LearningRate 0.0529   Epoch: 5   Global Step: 226490   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:10:06,269-Speed 2629.28 samples/sec   Loss 9.8788   LearningRate 0.0528   Epoch: 5   Global Step: 226500   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:10,159-Speed 2632.61 samples/sec   Loss 9.7614   LearningRate 0.0528   Epoch: 5   Global Step: 226510   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:14,061-Speed 2625.32 samples/sec   Loss 9.9428   LearningRate 0.0528   Epoch: 5   Global Step: 226520   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:17,957-Speed 2629.45 samples/sec   Loss 9.8262   LearningRate 0.0528   Epoch: 5   Global Step: 226530   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:21,847-Speed 2632.87 samples/sec   Loss 9.8847   LearningRate 0.0528   Epoch: 5   Global Step: 226540   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:25,737-Speed 2633.25 samples/sec   Loss 9.9376   LearningRate 0.0528   Epoch: 5   Global Step: 226550   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:29,629-Speed 2631.39 samples/sec   Loss 9.9114   LearningRate 0.0528   Epoch: 5   Global Step: 226560   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:33,521-Speed 2631.74 samples/sec   Loss 9.7601   LearningRate 0.0528   Epoch: 5   Global Step: 226570   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:37,431-Speed 2619.55 samples/sec   Loss 9.9706   LearningRate 0.0528   Epoch: 5   Global Step: 226580   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:41,323-Speed 2632.07 samples/sec   Loss 9.8925   LearningRate 0.0528   Epoch: 5   Global Step: 226590   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:10:45,220-Speed 2628.91 samples/sec   Loss 9.7990   LearningRate 0.0528   Epoch: 5   Global Step: 226600   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:10:49,114-Speed 2630.77 samples/sec   Loss 9.9947   LearningRate 0.0528   Epoch: 5   Global Step: 226610   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:10:53,004-Speed 2632.86 samples/sec   Loss 9.8000   LearningRate 0.0528   Epoch: 5   Global Step: 226620   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:10:56,880-Speed 2642.67 samples/sec   Loss 9.8742   LearningRate 0.0528   Epoch: 5   Global Step: 226630   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:00,779-Speed 2626.47 samples/sec   Loss 9.9762   LearningRate 0.0528   Epoch: 5   Global Step: 226640   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:04,675-Speed 2629.27 samples/sec   Loss 9.7696   LearningRate 0.0528   Epoch: 5   Global Step: 226650   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:08,571-Speed 2628.39 samples/sec   Loss 9.8425   LearningRate 0.0528   Epoch: 5   Global Step: 226660   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:12,468-Speed 2628.77 samples/sec   Loss 9.8359   LearningRate 0.0528   Epoch: 5   Global Step: 226670   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:16,362-Speed 2630.07 samples/sec   Loss 9.8227   LearningRate 0.0528   Epoch: 5   Global Step: 226680   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:20,252-Speed 2633.43 samples/sec   Loss 9.8461   LearningRate 0.0528   Epoch: 5   Global Step: 226690   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:24,141-Speed 2633.65 samples/sec   Loss 9.9862   LearningRate 0.0528   Epoch: 5   Global Step: 226700   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:28,032-Speed 2632.02 samples/sec   Loss 10.0625   LearningRate 0.0528   Epoch: 5   Global Step: 226710   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:31,927-Speed 2629.98 samples/sec   Loss 9.9421   LearningRate 0.0528   Epoch: 5   Global Step: 226720   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:35,817-Speed 2632.37 samples/sec   Loss 9.7232   LearningRate 0.0528   Epoch: 5   Global Step: 226730   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:11:39,711-Speed 2630.17 samples/sec   Loss 10.0695   LearningRate 0.0528   Epoch: 5   Global Step: 226740   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:11:43,584-Speed 2645.29 samples/sec   Loss 9.9310   LearningRate 0.0528   Epoch: 5   Global Step: 226750   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:47,473-Speed 2633.82 samples/sec   Loss 9.6964   LearningRate 0.0528   Epoch: 5   Global Step: 226760   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:51,365-Speed 2631.83 samples/sec   Loss 9.8832   LearningRate 0.0528   Epoch: 5   Global Step: 226770   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:55,258-Speed 2631.12 samples/sec   Loss 9.8677   LearningRate 0.0528   Epoch: 5   Global Step: 226780   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:11:59,149-Speed 2632.47 samples/sec   Loss 9.7987   LearningRate 0.0528   Epoch: 5   Global Step: 226790   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:12:03,040-Speed 2632.56 samples/sec   Loss 9.8798   LearningRate 0.0528   Epoch: 5   Global Step: 226800   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:12:06,930-Speed 2632.78 samples/sec   Loss 9.7775   LearningRate 0.0528   Epoch: 5   Global Step: 226810   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:12:10,832-Speed 2625.10 samples/sec   Loss 9.8182   LearningRate 0.0528   Epoch: 5   Global Step: 226820   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:12:14,723-Speed 2632.24 samples/sec   Loss 9.8398   LearningRate 0.0528   Epoch: 5   Global Step: 226830   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:12:18,619-Speed 2629.17 samples/sec   Loss 9.8796   LearningRate 0.0528   Epoch: 5   Global Step: 226840   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:12:22,515-Speed 2628.94 samples/sec   Loss 9.8380   LearningRate 0.0528   Epoch: 5   Global Step: 226850   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:26,415-Speed 2626.10 samples/sec   Loss 9.8218   LearningRate 0.0528   Epoch: 5   Global Step: 226860   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:30,305-Speed 2633.15 samples/sec   Loss 9.8286   LearningRate 0.0528   Epoch: 5   Global Step: 226870   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:34,205-Speed 2626.46 samples/sec   Loss 9.8502   LearningRate 0.0528   Epoch: 5   Global Step: 226880   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:38,102-Speed 2628.73 samples/sec   Loss 9.8977   LearningRate 0.0528   Epoch: 5   Global Step: 226890   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:42,121-Speed 2548.61 samples/sec   Loss 9.8004   LearningRate 0.0528   Epoch: 5   Global Step: 226900   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:46,032-Speed 2618.43 samples/sec   Loss 9.8753   LearningRate 0.0528   Epoch: 5   Global Step: 226910   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:49,924-Speed 2632.02 samples/sec   Loss 9.9656   LearningRate 0.0528   Epoch: 5   Global Step: 226920   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:53,813-Speed 2633.81 samples/sec   Loss 9.8443   LearningRate 0.0528   Epoch: 5   Global Step: 226930   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:12:57,692-Speed 2640.92 samples/sec   Loss 9.8162   LearningRate 0.0528   Epoch: 5   Global Step: 226940   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:01,578-Speed 2635.51 samples/sec   Loss 9.7680   LearningRate 0.0528   Epoch: 5   Global Step: 226950   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:05,470-Speed 2631.37 samples/sec   Loss 9.8323   LearningRate 0.0528   Epoch: 5   Global Step: 226960   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:09,364-Speed 2630.27 samples/sec   Loss 9.8226   LearningRate 0.0528   Epoch: 5   Global Step: 226970   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:13,262-Speed 2627.43 samples/sec   Loss 9.8559   LearningRate 0.0528   Epoch: 5   Global Step: 226980   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:17,161-Speed 2627.36 samples/sec   Loss 9.8507   LearningRate 0.0528   Epoch: 5   Global Step: 226990   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:21,059-Speed 2627.46 samples/sec   Loss 9.8497   LearningRate 0.0528   Epoch: 5   Global Step: 227000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:24,953-Speed 2630.76 samples/sec   Loss 9.7569   LearningRate 0.0528   Epoch: 5   Global Step: 227010   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:28,844-Speed 2632.06 samples/sec   Loss 9.9044   LearningRate 0.0528   Epoch: 5   Global Step: 227020   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:32,736-Speed 2632.22 samples/sec   Loss 9.7592   LearningRate 0.0528   Epoch: 5   Global Step: 227030   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:36,627-Speed 2632.06 samples/sec   Loss 9.7386   LearningRate 0.0528   Epoch: 5   Global Step: 227040   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:13:40,518-Speed 2632.34 samples/sec   Loss 9.8670   LearningRate 0.0528   Epoch: 5   Global Step: 227050   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:13:44,409-Speed 2631.85 samples/sec   Loss 9.8762   LearningRate 0.0528   Epoch: 5   Global Step: 227060   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:13:48,299-Speed 2633.31 samples/sec   Loss 9.8265   LearningRate 0.0527   Epoch: 5   Global Step: 227070   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:13:52,183-Speed 2637.32 samples/sec   Loss 9.8186   LearningRate 0.0527   Epoch: 5   Global Step: 227080   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:56,067-Speed 2637.44 samples/sec   Loss 9.8738   LearningRate 0.0527   Epoch: 5   Global Step: 227090   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:13:59,956-Speed 2633.74 samples/sec   Loss 9.8987   LearningRate 0.0527   Epoch: 5   Global Step: 227100   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:03,848-Speed 2632.19 samples/sec   Loss 9.8156   LearningRate 0.0527   Epoch: 5   Global Step: 227110   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:07,745-Speed 2628.05 samples/sec   Loss 9.8858   LearningRate 0.0527   Epoch: 5   Global Step: 227120   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:11,653-Speed 2620.72 samples/sec   Loss 9.7881   LearningRate 0.0527   Epoch: 5   Global Step: 227130   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:15,578-Speed 2609.20 samples/sec   Loss 9.9087   LearningRate 0.0527   Epoch: 5   Global Step: 227140   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:19,491-Speed 2617.90 samples/sec   Loss 9.7901   LearningRate 0.0527   Epoch: 5   Global Step: 227150   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:23,398-Speed 2621.86 samples/sec   Loss 9.8208   LearningRate 0.0527   Epoch: 5   Global Step: 227160   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:27,356-Speed 2587.63 samples/sec   Loss 9.7692   LearningRate 0.0527   Epoch: 5   Global Step: 227170   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:31,265-Speed 2620.53 samples/sec   Loss 9.9342   LearningRate 0.0527   Epoch: 5   Global Step: 227180   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:14:35,172-Speed 2621.15 samples/sec   Loss 9.9150   LearningRate 0.0527   Epoch: 5   Global Step: 227190   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:14:39,071-Speed 2627.58 samples/sec   Loss 9.8596   LearningRate 0.0527   Epoch: 5   Global Step: 227200   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:14:42,969-Speed 2627.63 samples/sec   Loss 9.7946   LearningRate 0.0527   Epoch: 5   Global Step: 227210   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:14:46,863-Speed 2629.75 samples/sec   Loss 9.8059   LearningRate 0.0527   Epoch: 5   Global Step: 227220   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:14:50,758-Speed 2630.40 samples/sec   Loss 9.9264   LearningRate 0.0527   Epoch: 5   Global Step: 227230   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:14:54,643-Speed 2636.14 samples/sec   Loss 9.8543   LearningRate 0.0527   Epoch: 5   Global Step: 227240   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:14:58,532-Speed 2633.59 samples/sec   Loss 9.9215   LearningRate 0.0527   Epoch: 5   Global Step: 227250   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:02,429-Speed 2627.94 samples/sec   Loss 9.7911   LearningRate 0.0527   Epoch: 5   Global Step: 227260   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:06,325-Speed 2629.17 samples/sec   Loss 9.8505   LearningRate 0.0527   Epoch: 5   Global Step: 227270   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:10,220-Speed 2629.82 samples/sec   Loss 10.0209   LearningRate 0.0527   Epoch: 5   Global Step: 227280   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:14,120-Speed 2626.70 samples/sec   Loss 9.8580   LearningRate 0.0527   Epoch: 5   Global Step: 227290   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:18,024-Speed 2624.06 samples/sec   Loss 9.8922   LearningRate 0.0527   Epoch: 5   Global Step: 227300   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:21,920-Speed 2628.72 samples/sec   Loss 9.8653   LearningRate 0.0527   Epoch: 5   Global Step: 227310   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:25,815-Speed 2629.91 samples/sec   Loss 9.8166   LearningRate 0.0527   Epoch: 5   Global Step: 227320   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:29,713-Speed 2627.25 samples/sec   Loss 9.9071   LearningRate 0.0527   Epoch: 5   Global Step: 227330   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:33,777-Speed 2520.19 samples/sec   Loss 9.7583   LearningRate 0.0527   Epoch: 5   Global Step: 227340   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:15:37,852-Speed 2513.60 samples/sec   Loss 9.8765   LearningRate 0.0527   Epoch: 5   Global Step: 227350   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:15:41,818-Speed 2582.56 samples/sec   Loss 9.8323   LearningRate 0.0527   Epoch: 5   Global Step: 227360   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:15:45,697-Speed 2640.61 samples/sec   Loss 9.9044   LearningRate 0.0527   Epoch: 5   Global Step: 227370   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:49,590-Speed 2631.26 samples/sec   Loss 9.8999   LearningRate 0.0527   Epoch: 5   Global Step: 227380   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:53,485-Speed 2629.35 samples/sec   Loss 9.9318   LearningRate 0.0527   Epoch: 5   Global Step: 227390   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:15:57,388-Speed 2625.19 samples/sec   Loss 9.8124   LearningRate 0.0527   Epoch: 5   Global Step: 227400   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:16:01,286-Speed 2627.16 samples/sec   Loss 9.8219   LearningRate 0.0527   Epoch: 5   Global Step: 227410   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:16:05,140-Speed 2657.81 samples/sec   Loss 9.9917   LearningRate 0.0527   Epoch: 5   Global Step: 227420   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:16:09,005-Speed 2649.58 samples/sec   Loss 10.1925   LearningRate 0.0527   Epoch: 5   Global Step: 227430   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:12,896-Speed 2633.00 samples/sec   Loss 10.5108   LearningRate 0.0527   Epoch: 5   Global Step: 227440   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:16,792-Speed 2628.98 samples/sec   Loss 10.2685   LearningRate 0.0527   Epoch: 5   Global Step: 227450   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:20,685-Speed 2631.04 samples/sec   Loss 10.4492   LearningRate 0.0527   Epoch: 5   Global Step: 227460   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:24,626-Speed 2599.28 samples/sec   Loss 10.1342   LearningRate 0.0527   Epoch: 5   Global Step: 227470   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:28,516-Speed 2633.67 samples/sec   Loss 10.1101   LearningRate 0.0527   Epoch: 5   Global Step: 227480   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:32,407-Speed 2632.12 samples/sec   Loss 10.1559   LearningRate 0.0527   Epoch: 5   Global Step: 227490   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:36,294-Speed 2634.81 samples/sec   Loss 9.9477   LearningRate 0.0527   Epoch: 5   Global Step: 227500   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:40,190-Speed 2629.05 samples/sec   Loss 10.1452   LearningRate 0.0527   Epoch: 5   Global Step: 227510   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:44,080-Speed 2633.17 samples/sec   Loss 10.2475   LearningRate 0.0527   Epoch: 5   Global Step: 227520   Fp16 Grad Scale: 4096   Required: 68 hours
Training: 2022-04-13 21:16:47,984-Speed 2623.78 samples/sec   Loss 10.0801   LearningRate 0.0527   Epoch: 5   Global Step: 227530   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:16:51,873-Speed 2633.71 samples/sec   Loss 9.8441   LearningRate 0.0527   Epoch: 5   Global Step: 227540   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:16:55,765-Speed 2632.57 samples/sec   Loss 10.0396   LearningRate 0.0527   Epoch: 5   Global Step: 227550   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:16:59,658-Speed 2630.51 samples/sec   Loss 10.1784   LearningRate 0.0527   Epoch: 5   Global Step: 227560   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:17:03,564-Speed 2622.63 samples/sec   Loss 10.0526   LearningRate 0.0527   Epoch: 5   Global Step: 227570   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:17:07,461-Speed 2627.92 samples/sec   Loss 9.8167   LearningRate 0.0527   Epoch: 5   Global Step: 227580   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:17:11,352-Speed 2633.81 samples/sec   Loss 9.9052   LearningRate 0.0527   Epoch: 5   Global Step: 227590   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:17:15,244-Speed 2631.38 samples/sec   Loss 9.8902   LearningRate 0.0527   Epoch: 5   Global Step: 227600   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:17:19,139-Speed 2630.05 samples/sec   Loss 9.9874   LearningRate 0.0527   Epoch: 5   Global Step: 227610   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:17:23,027-Speed 2633.76 samples/sec   Loss 9.9426   LearningRate 0.0527   Epoch: 5   Global Step: 227620   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-13 21:17:26,917-Speed 2633.93 samples/sec   Loss 9.7821   LearningRate 0.0527   Epoch: 5   Global Step: 227630   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:30,809-Speed 2631.47 samples/sec   Loss 9.8416   LearningRate 0.0526   Epoch: 5   Global Step: 227640   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:34,825-Speed 2549.74 samples/sec   Loss 9.8864   LearningRate 0.0526   Epoch: 5   Global Step: 227650   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:38,922-Speed 2500.28 samples/sec   Loss 9.9466   LearningRate 0.0526   Epoch: 5   Global Step: 227660   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:42,809-Speed 2635.01 samples/sec   Loss 9.8985   LearningRate 0.0526   Epoch: 5   Global Step: 227670   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:46,699-Speed 2633.02 samples/sec   Loss 9.8022   LearningRate 0.0526   Epoch: 5   Global Step: 227680   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:50,669-Speed 2580.38 samples/sec   Loss 9.9261   LearningRate 0.0526   Epoch: 5   Global Step: 227690   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:54,556-Speed 2635.28 samples/sec   Loss 9.9278   LearningRate 0.0526   Epoch: 5   Global Step: 227700   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:17:58,464-Speed 2621.61 samples/sec   Loss 9.9068   LearningRate 0.0526   Epoch: 5   Global Step: 227710   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:18:02,356-Speed 2631.16 samples/sec   Loss 9.8973   LearningRate 0.0526   Epoch: 5   Global Step: 227720   Fp16 Grad Scale: 16384   Required: 68 hours
Training: 2022-04-13 21:18:06,256-Speed 2626.42 samples/sec   Loss 10.0978   LearningRate 0.0526   Epoch: 5   Global Step: 227730   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:10,149-Speed 2630.78 samples/sec   Loss 9.8081   LearningRate 0.0526   Epoch: 5   Global Step: 227740   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:14,044-Speed 2629.73 samples/sec   Loss 9.9210   LearningRate 0.0526   Epoch: 5   Global Step: 227750   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:17,928-Speed 2637.23 samples/sec   Loss 9.7745   LearningRate 0.0526   Epoch: 5   Global Step: 227760   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:21,820-Speed 2631.55 samples/sec   Loss 9.8985   LearningRate 0.0526   Epoch: 5   Global Step: 227770   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:25,721-Speed 2626.06 samples/sec   Loss 9.7383   LearningRate 0.0526   Epoch: 5   Global Step: 227780   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:29,627-Speed 2622.39 samples/sec   Loss 9.9062   LearningRate 0.0526   Epoch: 5   Global Step: 227790   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:33,525-Speed 2627.52 samples/sec   Loss 9.9404   LearningRate 0.0526   Epoch: 5   Global Step: 227800   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:37,429-Speed 2623.10 samples/sec   Loss 9.8179   LearningRate 0.0526   Epoch: 5   Global Step: 227810   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:41,329-Speed 2626.12 samples/sec   Loss 9.7933   LearningRate 0.0526   Epoch: 5   Global Step: 227820   Fp16 Grad Scale: 32768   Required: 68 hours
Training: 2022-04-13 21:18:45,233-Speed 2623.48 samples/sec   Loss 9.9860   LearningRate 0.0526   Epoch: 5   Global Step: 227830   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:18:49,129-Speed 2629.89 samples/sec   Loss 9.9819   LearningRate 0.0526   Epoch: 5   Global Step: 227840   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:18:53,023-Speed 2630.06 samples/sec   Loss 9.7907   LearningRate 0.0526   Epoch: 5   Global Step: 227850   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:18:57,014-Speed 2566.45 samples/sec   Loss 9.9148   LearningRate 0.0526   Epoch: 5   Global Step: 227860   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:19:01,028-Speed 2551.87 samples/sec   Loss 9.9249   LearningRate 0.0526   Epoch: 5   Global Step: 227870   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:19:05,059-Speed 2540.89 samples/sec   Loss 9.9128   LearningRate 0.0526   Epoch: 5   Global Step: 227880   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:19:08,954-Speed 2630.30 samples/sec   Loss 9.8933   LearningRate 0.0526   Epoch: 5   Global Step: 227890   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:19:12,970-Speed 2549.98 samples/sec   Loss 10.0497   LearningRate 0.0526   Epoch: 5   Global Step: 227900   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:19:16,875-Speed 2623.12 samples/sec   Loss 9.9188   LearningRate 0.0526   Epoch: 5   Global Step: 227910   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:19:20,773-Speed 2627.02 samples/sec   Loss 9.8926   LearningRate 0.0526   Epoch: 5   Global Step: 227920   Fp16 Grad Scale: 65536   Required: 68 hours
Training: 2022-04-13 21:19:24,665-Speed 2632.19 samples/sec   Loss 9.9236   LearningRate 0.0526   Epoch: 5   Global Step: 227930   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:28,556-Speed 2631.95 samples/sec   Loss 9.8049   LearningRate 0.0526   Epoch: 5   Global Step: 227940   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:32,447-Speed 2633.26 samples/sec   Loss 10.0675   LearningRate 0.0526   Epoch: 5   Global Step: 227950   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:36,338-Speed 2632.20 samples/sec   Loss 9.8432   LearningRate 0.0526   Epoch: 5   Global Step: 227960   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:40,262-Speed 2610.09 samples/sec   Loss 9.8670   LearningRate 0.0526   Epoch: 5   Global Step: 227970   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:44,157-Speed 2629.07 samples/sec   Loss 9.8002   LearningRate 0.0526   Epoch: 5   Global Step: 227980   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:48,051-Speed 2630.55 samples/sec   Loss 9.7892   LearningRate 0.0526   Epoch: 5   Global Step: 227990   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:51,943-Speed 2631.40 samples/sec   Loss 10.0635   LearningRate 0.0526   Epoch: 5   Global Step: 228000   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:55,835-Speed 2632.06 samples/sec   Loss 9.8422   LearningRate 0.0526   Epoch: 5   Global Step: 228010   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:19:59,732-Speed 2628.49 samples/sec   Loss 9.6603   LearningRate 0.0526   Epoch: 5   Global Step: 228020   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:03,661-Speed 2607.10 samples/sec   Loss 9.8730   LearningRate 0.0526   Epoch: 5   Global Step: 228030   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:20:07,557-Speed 2628.83 samples/sec   Loss 9.8153   LearningRate 0.0526   Epoch: 5   Global Step: 228040   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:20:11,447-Speed 2633.44 samples/sec   Loss 9.8006   LearningRate 0.0526   Epoch: 5   Global Step: 228050   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:20:15,339-Speed 2630.98 samples/sec   Loss 9.8018   LearningRate 0.0526   Epoch: 5   Global Step: 228060   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:20:19,215-Speed 2643.73 samples/sec   Loss 9.7958   LearningRate 0.0526   Epoch: 5   Global Step: 228070   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:23,105-Speed 2632.39 samples/sec   Loss 9.8084   LearningRate 0.0526   Epoch: 5   Global Step: 228080   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:26,999-Speed 2630.59 samples/sec   Loss 9.9294   LearningRate 0.0526   Epoch: 5   Global Step: 228090   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:30,891-Speed 2631.47 samples/sec   Loss 9.8010   LearningRate 0.0526   Epoch: 5   Global Step: 228100   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:34,782-Speed 2632.42 samples/sec   Loss 9.8490   LearningRate 0.0526   Epoch: 5   Global Step: 228110   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:38,673-Speed 2632.80 samples/sec   Loss 9.7908   LearningRate 0.0526   Epoch: 5   Global Step: 228120   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:42,567-Speed 2630.35 samples/sec   Loss 9.8342   LearningRate 0.0526   Epoch: 5   Global Step: 228130   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:46,498-Speed 2605.87 samples/sec   Loss 9.8004   LearningRate 0.0526   Epoch: 5   Global Step: 228140   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:50,394-Speed 2629.06 samples/sec   Loss 9.7694   LearningRate 0.0526   Epoch: 5   Global Step: 228150   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:54,291-Speed 2628.44 samples/sec   Loss 9.8967   LearningRate 0.0526   Epoch: 5   Global Step: 228160   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:20:58,190-Speed 2626.96 samples/sec   Loss 9.8275   LearningRate 0.0526   Epoch: 5   Global Step: 228170   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:21:02,097-Speed 2622.04 samples/sec   Loss 9.9173   LearningRate 0.0526   Epoch: 5   Global Step: 228180   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:21:06,094-Speed 2562.13 samples/sec   Loss 9.8699   LearningRate 0.0526   Epoch: 5   Global Step: 228190   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:21:09,993-Speed 2627.56 samples/sec   Loss 9.8566   LearningRate 0.0526   Epoch: 5   Global Step: 228200   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:21:13,871-Speed 2641.62 samples/sec   Loss 9.7890   LearningRate 0.0525   Epoch: 5   Global Step: 228210   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:17,775-Speed 2622.98 samples/sec   Loss 9.7980   LearningRate 0.0525   Epoch: 5   Global Step: 228220   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:21,672-Speed 2628.44 samples/sec   Loss 9.9529   LearningRate 0.0525   Epoch: 5   Global Step: 228230   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:25,605-Speed 2604.22 samples/sec   Loss 9.8907   LearningRate 0.0525   Epoch: 5   Global Step: 228240   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:29,500-Speed 2630.56 samples/sec   Loss 9.8995   LearningRate 0.0525   Epoch: 5   Global Step: 228250   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:33,394-Speed 2629.68 samples/sec   Loss 9.8172   LearningRate 0.0525   Epoch: 5   Global Step: 228260   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:37,287-Speed 2630.94 samples/sec   Loss 9.8757   LearningRate 0.0525   Epoch: 5   Global Step: 228270   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:41,180-Speed 2630.91 samples/sec   Loss 9.7587   LearningRate 0.0525   Epoch: 5   Global Step: 228280   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:45,084-Speed 2623.87 samples/sec   Loss 9.7156   LearningRate 0.0525   Epoch: 5   Global Step: 228290   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:48,992-Speed 2621.27 samples/sec   Loss 9.6410   LearningRate 0.0525   Epoch: 5   Global Step: 228300   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:21:52,900-Speed 2621.28 samples/sec   Loss 9.7655   LearningRate 0.0525   Epoch: 5   Global Step: 228310   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:21:56,793-Speed 2631.12 samples/sec   Loss 9.9339   LearningRate 0.0525   Epoch: 5   Global Step: 228320   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:22:00,688-Speed 2629.41 samples/sec   Loss 9.8453   LearningRate 0.0525   Epoch: 5   Global Step: 228330   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:22:04,592-Speed 2623.00 samples/sec   Loss 9.7382   LearningRate 0.0525   Epoch: 5   Global Step: 228340   Fp16 Grad Scale: 262144   Required: 68 hours
Training: 2022-04-13 21:22:08,470-Speed 2641.15 samples/sec   Loss 9.6318   LearningRate 0.0525   Epoch: 5   Global Step: 228350   Fp16 Grad Scale: 131072   Required: 68 hours
Training: 2022-04-13 21:22:12,368-Speed 2627.97 samples/sec   Loss 9.8867   LearningRate 0.0525   Epoch: 5   Global Step: 228360   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:16,258-Speed 2632.81 samples/sec   Loss 9.7298   LearningRate 0.0525   Epoch: 5   Global Step: 228370   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:20,158-Speed 2627.00 samples/sec   Loss 9.8544   LearningRate 0.0525   Epoch: 5   Global Step: 228380   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:24,053-Speed 2629.02 samples/sec   Loss 9.8291   LearningRate 0.0525   Epoch: 5   Global Step: 228390   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:27,946-Speed 2631.82 samples/sec   Loss 9.7749   LearningRate 0.0525   Epoch: 5   Global Step: 228400   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:31,834-Speed 2634.16 samples/sec   Loss 9.9054   LearningRate 0.0525   Epoch: 5   Global Step: 228410   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:35,727-Speed 2631.33 samples/sec   Loss 9.8730   LearningRate 0.0525   Epoch: 5   Global Step: 228420   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:39,625-Speed 2627.05 samples/sec   Loss 9.8657   LearningRate 0.0525   Epoch: 5   Global Step: 228430   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:43,519-Speed 2630.18 samples/sec   Loss 9.8124   LearningRate 0.0525   Epoch: 5   Global Step: 228440   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:47,423-Speed 2623.76 samples/sec   Loss 9.8531   LearningRate 0.0525   Epoch: 5   Global Step: 228450   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:22:51,303-Speed 2640.03 samples/sec   Loss 9.9762   LearningRate 0.0525   Epoch: 5   Global Step: 228460   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:55,195-Speed 2631.82 samples/sec   Loss 9.9897   LearningRate 0.0525   Epoch: 5   Global Step: 228470   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:22:59,093-Speed 2627.47 samples/sec   Loss 9.8956   LearningRate 0.0525   Epoch: 5   Global Step: 228480   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:02,995-Speed 2624.77 samples/sec   Loss 9.8224   LearningRate 0.0525   Epoch: 5   Global Step: 228490   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:06,888-Speed 2630.82 samples/sec   Loss 9.9958   LearningRate 0.0525   Epoch: 5   Global Step: 228500   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:10,781-Speed 2631.12 samples/sec   Loss 9.7713   LearningRate 0.0525   Epoch: 5   Global Step: 228510   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:14,674-Speed 2631.17 samples/sec   Loss 9.7802   LearningRate 0.0525   Epoch: 5   Global Step: 228520   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:18,576-Speed 2624.56 samples/sec   Loss 9.8220   LearningRate 0.0525   Epoch: 5   Global Step: 228530   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:22,470-Speed 2630.17 samples/sec   Loss 9.7678   LearningRate 0.0525   Epoch: 5   Global Step: 228540   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:26,362-Speed 2631.66 samples/sec   Loss 9.8270   LearningRate 0.0525   Epoch: 5   Global Step: 228550   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:23:30,255-Speed 2631.22 samples/sec   Loss 9.8184   LearningRate 0.0525   Epoch: 5   Global Step: 228560   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:23:34,150-Speed 2630.15 samples/sec   Loss 9.8497   LearningRate 0.0525   Epoch: 5   Global Step: 228570   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:23:38,043-Speed 2630.35 samples/sec   Loss 9.6895   LearningRate 0.0525   Epoch: 5   Global Step: 228580   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:23:41,935-Speed 2632.11 samples/sec   Loss 9.8083   LearningRate 0.0525   Epoch: 5   Global Step: 228590   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:23:45,825-Speed 2632.62 samples/sec   Loss 9.8014   LearningRate 0.0525   Epoch: 5   Global Step: 228600   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:23:49,719-Speed 2630.01 samples/sec   Loss 9.8100   LearningRate 0.0525   Epoch: 5   Global Step: 228610   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:23:53,620-Speed 2625.61 samples/sec   Loss 9.7099   LearningRate 0.0525   Epoch: 5   Global Step: 228620   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:23:57,537-Speed 2615.07 samples/sec   Loss 9.8838   LearningRate 0.0525   Epoch: 5   Global Step: 228630   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:24:01,426-Speed 2633.41 samples/sec   Loss 9.7078   LearningRate 0.0525   Epoch: 5   Global Step: 228640   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:24:05,321-Speed 2629.27 samples/sec   Loss 9.8971   LearningRate 0.0525   Epoch: 5   Global Step: 228650   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:24:09,203-Speed 2638.30 samples/sec   Loss 9.7835   LearningRate 0.0525   Epoch: 5   Global Step: 228660   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:24:13,091-Speed 2634.89 samples/sec   Loss 9.8990   LearningRate 0.0525   Epoch: 5   Global Step: 228670   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:24:17,000-Speed 2620.16 samples/sec   Loss 9.9254   LearningRate 0.0525   Epoch: 5   Global Step: 228680   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:24:20,930-Speed 2606.25 samples/sec   Loss 9.7564   LearningRate 0.0525   Epoch: 5   Global Step: 228690   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:24:24,798-Speed 2648.24 samples/sec   Loss 9.8552   LearningRate 0.0525   Epoch: 5   Global Step: 228700   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:28,693-Speed 2629.46 samples/sec   Loss 9.9765   LearningRate 0.0525   Epoch: 5   Global Step: 228710   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:32,586-Speed 2631.98 samples/sec   Loss 9.8288   LearningRate 0.0525   Epoch: 5   Global Step: 228720   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:36,480-Speed 2629.92 samples/sec   Loss 9.8658   LearningRate 0.0525   Epoch: 5   Global Step: 228730   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:40,372-Speed 2631.72 samples/sec   Loss 9.7838   LearningRate 0.0525   Epoch: 5   Global Step: 228740   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:44,267-Speed 2629.39 samples/sec   Loss 9.8776   LearningRate 0.0525   Epoch: 5   Global Step: 228750   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:48,157-Speed 2632.93 samples/sec   Loss 9.9648   LearningRate 0.0525   Epoch: 5   Global Step: 228760   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:52,054-Speed 2628.76 samples/sec   Loss 9.7620   LearningRate 0.0525   Epoch: 5   Global Step: 228770   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:55,940-Speed 2635.40 samples/sec   Loss 9.8367   LearningRate 0.0524   Epoch: 5   Global Step: 228780   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:24:59,831-Speed 2632.08 samples/sec   Loss 9.8701   LearningRate 0.0524   Epoch: 5   Global Step: 228790   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:03,730-Speed 2627.33 samples/sec   Loss 9.7737   LearningRate 0.0524   Epoch: 5   Global Step: 228800   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:25:07,615-Speed 2636.48 samples/sec   Loss 9.7795   LearningRate 0.0524   Epoch: 5   Global Step: 228810   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:11,509-Speed 2629.66 samples/sec   Loss 9.7346   LearningRate 0.0524   Epoch: 5   Global Step: 228820   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:15,407-Speed 2627.99 samples/sec   Loss 9.7857   LearningRate 0.0524   Epoch: 5   Global Step: 228830   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:19,303-Speed 2628.94 samples/sec   Loss 9.9292   LearningRate 0.0524   Epoch: 5   Global Step: 228840   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:23,205-Speed 2625.09 samples/sec   Loss 9.9171   LearningRate 0.0524   Epoch: 5   Global Step: 228850   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:27,103-Speed 2627.75 samples/sec   Loss 9.8420   LearningRate 0.0524   Epoch: 5   Global Step: 228860   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:31,113-Speed 2554.08 samples/sec   Loss 9.6799   LearningRate 0.0524   Epoch: 5   Global Step: 228870   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:35,005-Speed 2631.65 samples/sec   Loss 9.8173   LearningRate 0.0524   Epoch: 5   Global Step: 228880   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:38,911-Speed 2622.66 samples/sec   Loss 9.8063   LearningRate 0.0524   Epoch: 5   Global Step: 228890   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:42,797-Speed 2635.47 samples/sec   Loss 9.7215   LearningRate 0.0524   Epoch: 5   Global Step: 228900   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:46,686-Speed 2633.37 samples/sec   Loss 9.7699   LearningRate 0.0524   Epoch: 5   Global Step: 228910   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:25:50,561-Speed 2643.19 samples/sec   Loss 9.7514   LearningRate 0.0524   Epoch: 5   Global Step: 228920   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:54,465-Speed 2624.02 samples/sec   Loss 9.7724   LearningRate 0.0524   Epoch: 5   Global Step: 228930   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:25:58,350-Speed 2636.09 samples/sec   Loss 9.7929   LearningRate 0.0524   Epoch: 5   Global Step: 228940   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:02,243-Speed 2631.05 samples/sec   Loss 9.9832   LearningRate 0.0524   Epoch: 5   Global Step: 228950   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:06,138-Speed 2629.37 samples/sec   Loss 9.9217   LearningRate 0.0524   Epoch: 5   Global Step: 228960   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:10,034-Speed 2629.31 samples/sec   Loss 9.7519   LearningRate 0.0524   Epoch: 5   Global Step: 228970   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:13,928-Speed 2629.85 samples/sec   Loss 9.9709   LearningRate 0.0524   Epoch: 5   Global Step: 228980   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:17,820-Speed 2631.82 samples/sec   Loss 9.7709   LearningRate 0.0524   Epoch: 5   Global Step: 228990   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:21,716-Speed 2628.92 samples/sec   Loss 9.8292   LearningRate 0.0524   Epoch: 5   Global Step: 229000   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:25,619-Speed 2624.37 samples/sec   Loss 9.7905   LearningRate 0.0524   Epoch: 5   Global Step: 229010   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:29,504-Speed 2636.79 samples/sec   Loss 9.7845   LearningRate 0.0524   Epoch: 5   Global Step: 229020   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:33,411-Speed 2621.22 samples/sec   Loss 9.6040   LearningRate 0.0524   Epoch: 5   Global Step: 229030   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:37,320-Speed 2620.04 samples/sec   Loss 9.7803   LearningRate 0.0524   Epoch: 5   Global Step: 229040   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:41,215-Speed 2629.35 samples/sec   Loss 9.8802   LearningRate 0.0524   Epoch: 5   Global Step: 229050   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:45,112-Speed 2628.59 samples/sec   Loss 9.7287   LearningRate 0.0524   Epoch: 5   Global Step: 229060   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:49,007-Speed 2629.55 samples/sec   Loss 9.8175   LearningRate 0.0524   Epoch: 5   Global Step: 229070   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:52,901-Speed 2630.57 samples/sec   Loss 9.8914   LearningRate 0.0524   Epoch: 5   Global Step: 229080   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:26:56,796-Speed 2629.42 samples/sec   Loss 9.8856   LearningRate 0.0524   Epoch: 5   Global Step: 229090   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:27:00,692-Speed 2628.98 samples/sec   Loss 9.8603   LearningRate 0.0524   Epoch: 5   Global Step: 229100   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:27:04,588-Speed 2628.70 samples/sec   Loss 9.7127   LearningRate 0.0524   Epoch: 5   Global Step: 229110   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:27:08,485-Speed 2628.36 samples/sec   Loss 9.8773   LearningRate 0.0524   Epoch: 5   Global Step: 229120   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:12,381-Speed 2628.81 samples/sec   Loss 9.8683   LearningRate 0.0524   Epoch: 5   Global Step: 229130   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:16,281-Speed 2626.27 samples/sec   Loss 9.9217   LearningRate 0.0524   Epoch: 5   Global Step: 229140   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:20,180-Speed 2626.87 samples/sec   Loss 9.7654   LearningRate 0.0524   Epoch: 5   Global Step: 229150   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:24,075-Speed 2629.58 samples/sec   Loss 9.8050   LearningRate 0.0524   Epoch: 5   Global Step: 229160   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:27,970-Speed 2629.90 samples/sec   Loss 9.8452   LearningRate 0.0524   Epoch: 5   Global Step: 229170   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:31,875-Speed 2623.02 samples/sec   Loss 9.9138   LearningRate 0.0524   Epoch: 5   Global Step: 229180   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:35,775-Speed 2626.16 samples/sec   Loss 9.7683   LearningRate 0.0524   Epoch: 5   Global Step: 229190   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:39,672-Speed 2627.92 samples/sec   Loss 9.8034   LearningRate 0.0524   Epoch: 5   Global Step: 229200   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:43,565-Speed 2631.27 samples/sec   Loss 9.8375   LearningRate 0.0524   Epoch: 5   Global Step: 229210   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:47,488-Speed 2611.04 samples/sec   Loss 9.8806   LearningRate 0.0524   Epoch: 5   Global Step: 229220   Fp16 Grad Scale: 524288   Required: 67 hours
Training: 2022-04-13 21:27:51,383-Speed 2629.68 samples/sec   Loss 9.9527   LearningRate 0.0524   Epoch: 5   Global Step: 229230   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:27:55,259-Speed 2642.59 samples/sec   Loss 9.7682   LearningRate 0.0524   Epoch: 5   Global Step: 229240   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:27:59,150-Speed 2632.25 samples/sec   Loss 9.9179   LearningRate 0.0524   Epoch: 5   Global Step: 229250   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:03,039-Speed 2633.91 samples/sec   Loss 9.7879   LearningRate 0.0524   Epoch: 5   Global Step: 229260   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:06,927-Speed 2633.93 samples/sec   Loss 9.9731   LearningRate 0.0524   Epoch: 5   Global Step: 229270   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:10,823-Speed 2628.91 samples/sec   Loss 9.8915   LearningRate 0.0524   Epoch: 5   Global Step: 229280   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:14,731-Speed 2620.97 samples/sec   Loss 9.8824   LearningRate 0.0524   Epoch: 5   Global Step: 229290   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:18,634-Speed 2624.07 samples/sec   Loss 9.7522   LearningRate 0.0524   Epoch: 5   Global Step: 229300   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:22,531-Speed 2628.72 samples/sec   Loss 9.8321   LearningRate 0.0524   Epoch: 5   Global Step: 229310   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:26,428-Speed 2627.90 samples/sec   Loss 9.7910   LearningRate 0.0524   Epoch: 5   Global Step: 229320   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:30,335-Speed 2621.86 samples/sec   Loss 9.8892   LearningRate 0.0524   Epoch: 5   Global Step: 229330   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:28:34,240-Speed 2622.47 samples/sec   Loss 9.8043   LearningRate 0.0524   Epoch: 5   Global Step: 229340   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:28:38,147-Speed 2621.23 samples/sec   Loss 9.7912   LearningRate 0.0524   Epoch: 5   Global Step: 229350   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:28:42,046-Speed 2627.23 samples/sec   Loss 9.6682   LearningRate 0.0523   Epoch: 5   Global Step: 229360   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:28:45,943-Speed 2628.23 samples/sec   Loss 9.7437   LearningRate 0.0523   Epoch: 5   Global Step: 229370   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:28:49,835-Speed 2632.36 samples/sec   Loss 9.7033   LearningRate 0.0523   Epoch: 5   Global Step: 229380   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:28:53,728-Speed 2630.70 samples/sec   Loss 9.6556   LearningRate 0.0523   Epoch: 5   Global Step: 229390   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:28:57,604-Speed 2642.53 samples/sec   Loss 9.8847   LearningRate 0.0523   Epoch: 5   Global Step: 229400   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:01,512-Speed 2620.44 samples/sec   Loss 9.7251   LearningRate 0.0523   Epoch: 5   Global Step: 229410   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:05,395-Speed 2637.84 samples/sec   Loss 9.8833   LearningRate 0.0523   Epoch: 5   Global Step: 229420   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:09,285-Speed 2632.76 samples/sec   Loss 9.8807   LearningRate 0.0523   Epoch: 5   Global Step: 229430   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:13,181-Speed 2629.29 samples/sec   Loss 9.7266   LearningRate 0.0523   Epoch: 5   Global Step: 229440   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:17,073-Speed 2631.59 samples/sec   Loss 9.8985   LearningRate 0.0523   Epoch: 5   Global Step: 229450   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:20,966-Speed 2631.70 samples/sec   Loss 9.8841   LearningRate 0.0523   Epoch: 5   Global Step: 229460   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:24,856-Speed 2632.82 samples/sec   Loss 9.7409   LearningRate 0.0523   Epoch: 5   Global Step: 229470   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:28,748-Speed 2632.04 samples/sec   Loss 9.8716   LearningRate 0.0523   Epoch: 5   Global Step: 229480   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:32,640-Speed 2631.47 samples/sec   Loss 9.8441   LearningRate 0.0523   Epoch: 5   Global Step: 229490   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:29:36,536-Speed 2628.83 samples/sec   Loss 9.7414   LearningRate 0.0523   Epoch: 5   Global Step: 229500   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:29:40,470-Speed 2603.13 samples/sec   Loss 9.9053   LearningRate 0.0523   Epoch: 5   Global Step: 229510   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:29:44,475-Speed 2557.80 samples/sec   Loss 9.7993   LearningRate 0.0523   Epoch: 5   Global Step: 229520   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:29:48,371-Speed 2629.11 samples/sec   Loss 9.8965   LearningRate 0.0523   Epoch: 5   Global Step: 229530   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:29:52,271-Speed 2626.35 samples/sec   Loss 9.6851   LearningRate 0.0523   Epoch: 5   Global Step: 229540   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:29:56,188-Speed 2615.10 samples/sec   Loss 9.7830   LearningRate 0.0523   Epoch: 5   Global Step: 229550   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:30:00,080-Speed 2631.87 samples/sec   Loss 9.9539   LearningRate 0.0523   Epoch: 5   Global Step: 229560   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:30:03,969-Speed 2633.61 samples/sec   Loss 9.9666   LearningRate 0.0523   Epoch: 5   Global Step: 229570   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:30:07,882-Speed 2617.57 samples/sec   Loss 9.6941   LearningRate 0.0523   Epoch: 5   Global Step: 229580   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:30:11,755-Speed 2644.90 samples/sec   Loss 9.8202   LearningRate 0.0523   Epoch: 5   Global Step: 229590   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:15,648-Speed 2630.86 samples/sec   Loss 9.7357   LearningRate 0.0523   Epoch: 5   Global Step: 229600   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:19,536-Speed 2634.32 samples/sec   Loss 9.6396   LearningRate 0.0523   Epoch: 5   Global Step: 229610   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:23,426-Speed 2633.06 samples/sec   Loss 9.8508   LearningRate 0.0523   Epoch: 5   Global Step: 229620   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:27,317-Speed 2632.98 samples/sec   Loss 9.7181   LearningRate 0.0523   Epoch: 5   Global Step: 229630   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:31,208-Speed 2632.49 samples/sec   Loss 9.7654   LearningRate 0.0523   Epoch: 5   Global Step: 229640   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:35,097-Speed 2633.37 samples/sec   Loss 9.7813   LearningRate 0.0523   Epoch: 5   Global Step: 229650   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:38,993-Speed 2629.25 samples/sec   Loss 9.7371   LearningRate 0.0523   Epoch: 5   Global Step: 229660   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:42,896-Speed 2623.66 samples/sec   Loss 9.8867   LearningRate 0.0523   Epoch: 5   Global Step: 229670   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:46,787-Speed 2632.02 samples/sec   Loss 9.8386   LearningRate 0.0523   Epoch: 5   Global Step: 229680   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:30:50,678-Speed 2632.59 samples/sec   Loss 9.8681   LearningRate 0.0523   Epoch: 5   Global Step: 229690   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:30:54,575-Speed 2629.21 samples/sec   Loss 9.7223   LearningRate 0.0523   Epoch: 5   Global Step: 229700   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:30:58,470-Speed 2629.29 samples/sec   Loss 9.8495   LearningRate 0.0523   Epoch: 5   Global Step: 229710   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:31:02,342-Speed 2645.67 samples/sec   Loss 9.8324   LearningRate 0.0523   Epoch: 5   Global Step: 229720   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:06,238-Speed 2628.47 samples/sec   Loss 9.9164   LearningRate 0.0523   Epoch: 5   Global Step: 229730   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:10,137-Speed 2627.50 samples/sec   Loss 9.6812   LearningRate 0.0523   Epoch: 5   Global Step: 229740   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:14,036-Speed 2626.55 samples/sec   Loss 9.9089   LearningRate 0.0523   Epoch: 5   Global Step: 229750   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:18,029-Speed 2565.35 samples/sec   Loss 9.8591   LearningRate 0.0523   Epoch: 5   Global Step: 229760   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:22,127-Speed 2499.28 samples/sec   Loss 9.9088   LearningRate 0.0523   Epoch: 5   Global Step: 229770   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:26,061-Speed 2604.21 samples/sec   Loss 9.7362   LearningRate 0.0523   Epoch: 5   Global Step: 229780   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:29,957-Speed 2628.42 samples/sec   Loss 9.8069   LearningRate 0.0523   Epoch: 5   Global Step: 229790   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:33,851-Speed 2630.88 samples/sec   Loss 9.8735   LearningRate 0.0523   Epoch: 5   Global Step: 229800   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:37,750-Speed 2626.25 samples/sec   Loss 9.8636   LearningRate 0.0523   Epoch: 5   Global Step: 229810   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:41,644-Speed 2630.77 samples/sec   Loss 9.7094   LearningRate 0.0523   Epoch: 5   Global Step: 229820   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:31:45,518-Speed 2643.63 samples/sec   Loss 9.9173   LearningRate 0.0523   Epoch: 5   Global Step: 229830   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:49,419-Speed 2625.47 samples/sec   Loss 9.8395   LearningRate 0.0523   Epoch: 5   Global Step: 229840   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:53,309-Speed 2633.34 samples/sec   Loss 9.8648   LearningRate 0.0523   Epoch: 5   Global Step: 229850   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:31:57,198-Speed 2634.12 samples/sec   Loss 9.9796   LearningRate 0.0523   Epoch: 5   Global Step: 229860   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:01,128-Speed 2605.80 samples/sec   Loss 9.9202   LearningRate 0.0523   Epoch: 5   Global Step: 229870   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:05,035-Speed 2622.38 samples/sec   Loss 9.8551   LearningRate 0.0523   Epoch: 5   Global Step: 229880   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:08,931-Speed 2628.93 samples/sec   Loss 9.7935   LearningRate 0.0523   Epoch: 5   Global Step: 229890   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:12,831-Speed 2625.93 samples/sec   Loss 9.8396   LearningRate 0.0523   Epoch: 5   Global Step: 229900   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:16,738-Speed 2621.30 samples/sec   Loss 9.7856   LearningRate 0.0523   Epoch: 5   Global Step: 229910   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:20,627-Speed 2634.16 samples/sec   Loss 9.7553   LearningRate 0.0523   Epoch: 5   Global Step: 229920   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:24,542-Speed 2616.25 samples/sec   Loss 9.6696   LearningRate 0.0522   Epoch: 5   Global Step: 229930   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:32:28,437-Speed 2629.96 samples/sec   Loss 9.9108   LearningRate 0.0522   Epoch: 5   Global Step: 229940   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:32:32,347-Speed 2619.72 samples/sec   Loss 9.7605   LearningRate 0.0522   Epoch: 5   Global Step: 229950   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:32:36,239-Speed 2632.01 samples/sec   Loss 9.7759   LearningRate 0.0522   Epoch: 5   Global Step: 229960   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:40,128-Speed 2633.85 samples/sec   Loss 9.7984   LearningRate 0.0522   Epoch: 5   Global Step: 229970   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:44,017-Speed 2633.59 samples/sec   Loss 9.8140   LearningRate 0.0522   Epoch: 5   Global Step: 229980   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:47,919-Speed 2625.36 samples/sec   Loss 9.7710   LearningRate 0.0522   Epoch: 5   Global Step: 229990   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:32:51,806-Speed 2635.54 samples/sec   Loss 9.9782   LearningRate 0.0522   Epoch: 5   Global Step: 230000   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:33:35,754-[lfw][230000]XNorm: 23.869069
Training: 2022-04-13 21:33:35,755-[lfw][230000]Accuracy-Flip: 0.99667+-0.00357
Training: 2022-04-13 21:33:35,755-[lfw][230000]Accuracy-Highest: 0.99783
Training: 2022-04-13 21:34:26,737-[cfp_fp][230000]XNorm: 21.752470
Training: 2022-04-13 21:34:26,738-[cfp_fp][230000]Accuracy-Flip: 0.98300+-0.00601
Training: 2022-04-13 21:34:26,739-[cfp_fp][230000]Accuracy-Highest: 0.98314
Training: 2022-04-13 21:35:10,391-[agedb_30][230000]XNorm: 23.469151
Training: 2022-04-13 21:35:10,392-[agedb_30][230000]Accuracy-Flip: 0.97250+-0.00704
Training: 2022-04-13 21:35:10,392-[agedb_30][230000]Accuracy-Highest: 0.97250
Training: 2022-04-13 21:35:14,253-Speed 71.89 samples/sec   Loss 9.8625   LearningRate 0.0522   Epoch: 5   Global Step: 230010   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:18,116-Speed 2651.71 samples/sec   Loss 9.9053   LearningRate 0.0522   Epoch: 5   Global Step: 230020   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:22,007-Speed 2632.87 samples/sec   Loss 9.8077   LearningRate 0.0522   Epoch: 5   Global Step: 230030   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:25,870-Speed 2651.21 samples/sec   Loss 9.6645   LearningRate 0.0522   Epoch: 5   Global Step: 230040   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:29,739-Speed 2647.51 samples/sec   Loss 9.8851   LearningRate 0.0522   Epoch: 5   Global Step: 230050   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:33,629-Speed 2632.55 samples/sec   Loss 9.9087   LearningRate 0.0522   Epoch: 5   Global Step: 230060   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:37,500-Speed 2647.39 samples/sec   Loss 9.7955   LearningRate 0.0522   Epoch: 5   Global Step: 230070   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:41,375-Speed 2643.20 samples/sec   Loss 9.8599   LearningRate 0.0522   Epoch: 5   Global Step: 230080   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:45,262-Speed 2635.06 samples/sec   Loss 9.8509   LearningRate 0.0522   Epoch: 5   Global Step: 230090   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:49,144-Speed 2638.42 samples/sec   Loss 9.8000   LearningRate 0.0522   Epoch: 5   Global Step: 230100   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:53,031-Speed 2634.35 samples/sec   Loss 9.9286   LearningRate 0.0522   Epoch: 5   Global Step: 230110   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:35:56,911-Speed 2640.11 samples/sec   Loss 9.8851   LearningRate 0.0522   Epoch: 5   Global Step: 230120   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:36:00,790-Speed 2640.62 samples/sec   Loss 9.9577   LearningRate 0.0522   Epoch: 5   Global Step: 230130   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:36:04,671-Speed 2639.47 samples/sec   Loss 9.6234   LearningRate 0.0522   Epoch: 5   Global Step: 230140   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:36:08,565-Speed 2629.61 samples/sec   Loss 9.6021   LearningRate 0.0522   Epoch: 5   Global Step: 230150   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:36:12,490-Speed 2609.90 samples/sec   Loss 9.7450   LearningRate 0.0522   Epoch: 5   Global Step: 230160   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:36:16,372-Speed 2638.87 samples/sec   Loss 9.7543   LearningRate 0.0522   Epoch: 5   Global Step: 230170   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:36:20,258-Speed 2635.68 samples/sec   Loss 9.7625   LearningRate 0.0522   Epoch: 5   Global Step: 230180   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:36:24,130-Speed 2645.06 samples/sec   Loss 9.8229   LearningRate 0.0522   Epoch: 5   Global Step: 230190   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:36:27,994-Speed 2651.24 samples/sec   Loss 10.0426   LearningRate 0.0522   Epoch: 5   Global Step: 230200   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:31,880-Speed 2635.35 samples/sec   Loss 10.5652   LearningRate 0.0522   Epoch: 5   Global Step: 230210   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:35,765-Speed 2636.84 samples/sec   Loss 10.3296   LearningRate 0.0522   Epoch: 5   Global Step: 230220   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:39,648-Speed 2638.08 samples/sec   Loss 9.9743   LearningRate 0.0522   Epoch: 5   Global Step: 230230   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:43,532-Speed 2636.97 samples/sec   Loss 9.8951   LearningRate 0.0522   Epoch: 5   Global Step: 230240   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:47,417-Speed 2636.28 samples/sec   Loss 9.8186   LearningRate 0.0522   Epoch: 5   Global Step: 230250   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:51,305-Speed 2634.38 samples/sec   Loss 9.8904   LearningRate 0.0522   Epoch: 5   Global Step: 230260   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:55,229-Speed 2610.39 samples/sec   Loss 9.7790   LearningRate 0.0522   Epoch: 5   Global Step: 230270   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:36:59,149-Speed 2613.17 samples/sec   Loss 9.8529   LearningRate 0.0522   Epoch: 5   Global Step: 230280   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:37:03,036-Speed 2634.51 samples/sec   Loss 9.7580   LearningRate 0.0522   Epoch: 5   Global Step: 230290   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:37:06,931-Speed 2630.21 samples/sec   Loss 9.8505   LearningRate 0.0522   Epoch: 5   Global Step: 230300   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:10,819-Speed 2633.76 samples/sec   Loss 9.8186   LearningRate 0.0522   Epoch: 5   Global Step: 230310   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:14,707-Speed 2635.37 samples/sec   Loss 9.7626   LearningRate 0.0522   Epoch: 5   Global Step: 230320   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:18,600-Speed 2630.96 samples/sec   Loss 9.8483   LearningRate 0.0522   Epoch: 5   Global Step: 230330   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:22,490-Speed 2632.75 samples/sec   Loss 9.6805   LearningRate 0.0522   Epoch: 5   Global Step: 230340   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:26,378-Speed 2634.70 samples/sec   Loss 9.8696   LearningRate 0.0522   Epoch: 5   Global Step: 230350   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:30,268-Speed 2633.09 samples/sec   Loss 9.7255   LearningRate 0.0522   Epoch: 5   Global Step: 230360   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:34,157-Speed 2634.18 samples/sec   Loss 9.7267   LearningRate 0.0522   Epoch: 5   Global Step: 230370   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:38,060-Speed 2624.33 samples/sec   Loss 9.9357   LearningRate 0.0522   Epoch: 5   Global Step: 230380   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:41,951-Speed 2632.87 samples/sec   Loss 9.8404   LearningRate 0.0522   Epoch: 5   Global Step: 230390   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:37:45,839-Speed 2634.36 samples/sec   Loss 9.8091   LearningRate 0.0522   Epoch: 5   Global Step: 230400   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:37:49,740-Speed 2625.70 samples/sec   Loss 9.6754   LearningRate 0.0522   Epoch: 5   Global Step: 230410   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:37:53,627-Speed 2634.35 samples/sec   Loss 9.8196   LearningRate 0.0522   Epoch: 5   Global Step: 230420   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:37:57,521-Speed 2630.72 samples/sec   Loss 9.7971   LearningRate 0.0522   Epoch: 5   Global Step: 230430   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:38:01,409-Speed 2634.15 samples/sec   Loss 9.8717   LearningRate 0.0522   Epoch: 5   Global Step: 230440   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:38:05,310-Speed 2626.16 samples/sec   Loss 9.9549   LearningRate 0.0522   Epoch: 5   Global Step: 230450   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:38:09,220-Speed 2619.29 samples/sec   Loss 9.8492   LearningRate 0.0522   Epoch: 5   Global Step: 230460   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:38:13,113-Speed 2631.72 samples/sec   Loss 9.7719   LearningRate 0.0522   Epoch: 5   Global Step: 230470   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:38:17,005-Speed 2631.46 samples/sec   Loss 9.8339   LearningRate 0.0522   Epoch: 5   Global Step: 230480   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:38:20,895-Speed 2632.78 samples/sec   Loss 9.8051   LearningRate 0.0522   Epoch: 5   Global Step: 230490   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:38:24,789-Speed 2629.77 samples/sec   Loss 9.9133   LearningRate 0.0521   Epoch: 5   Global Step: 230500   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:28,683-Speed 2630.27 samples/sec   Loss 9.5236   LearningRate 0.0521   Epoch: 5   Global Step: 230510   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:32,604-Speed 2612.13 samples/sec   Loss 9.8038   LearningRate 0.0521   Epoch: 5   Global Step: 230520   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:36,512-Speed 2620.69 samples/sec   Loss 9.8068   LearningRate 0.0521   Epoch: 5   Global Step: 230530   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:40,401-Speed 2634.33 samples/sec   Loss 9.7807   LearningRate 0.0521   Epoch: 5   Global Step: 230540   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:44,295-Speed 2630.38 samples/sec   Loss 9.7948   LearningRate 0.0521   Epoch: 5   Global Step: 230550   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:48,195-Speed 2626.30 samples/sec   Loss 9.7360   LearningRate 0.0521   Epoch: 5   Global Step: 230560   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:52,088-Speed 2630.81 samples/sec   Loss 9.7249   LearningRate 0.0521   Epoch: 5   Global Step: 230570   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:55,980-Speed 2631.44 samples/sec   Loss 9.8685   LearningRate 0.0521   Epoch: 5   Global Step: 230580   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:38:59,869-Speed 2634.10 samples/sec   Loss 9.7505   LearningRate 0.0521   Epoch: 5   Global Step: 230590   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:03,765-Speed 2628.80 samples/sec   Loss 9.7403   LearningRate 0.0521   Epoch: 5   Global Step: 230600   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:39:07,659-Speed 2630.08 samples/sec   Loss 9.7601   LearningRate 0.0521   Epoch: 5   Global Step: 230610   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:39:11,731-Speed 2515.71 samples/sec   Loss 9.7380   LearningRate 0.0521   Epoch: 5   Global Step: 230620   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:39:15,621-Speed 2632.95 samples/sec   Loss 9.8582   LearningRate 0.0521   Epoch: 5   Global Step: 230630   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:39:19,513-Speed 2632.06 samples/sec   Loss 9.6587   LearningRate 0.0521   Epoch: 5   Global Step: 230640   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:39:23,384-Speed 2645.56 samples/sec   Loss 9.8743   LearningRate 0.0521   Epoch: 5   Global Step: 230650   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:27,273-Speed 2633.91 samples/sec   Loss 9.7950   LearningRate 0.0521   Epoch: 5   Global Step: 230660   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:31,163-Speed 2632.40 samples/sec   Loss 9.7500   LearningRate 0.0521   Epoch: 5   Global Step: 230670   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:35,060-Speed 2628.98 samples/sec   Loss 9.7974   LearningRate 0.0521   Epoch: 5   Global Step: 230680   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:38,969-Speed 2620.23 samples/sec   Loss 9.6792   LearningRate 0.0521   Epoch: 5   Global Step: 230690   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:42,870-Speed 2625.87 samples/sec   Loss 9.8514   LearningRate 0.0521   Epoch: 5   Global Step: 230700   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:46,761-Speed 2631.84 samples/sec   Loss 9.6318   LearningRate 0.0521   Epoch: 5   Global Step: 230710   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:50,663-Speed 2625.10 samples/sec   Loss 9.7846   LearningRate 0.0521   Epoch: 5   Global Step: 230720   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:54,558-Speed 2629.36 samples/sec   Loss 9.8558   LearningRate 0.0521   Epoch: 5   Global Step: 230730   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:39:58,452-Speed 2630.65 samples/sec   Loss 9.8073   LearningRate 0.0521   Epoch: 5   Global Step: 230740   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:40:02,359-Speed 2621.14 samples/sec   Loss 9.6216   LearningRate 0.0521   Epoch: 5   Global Step: 230750   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:40:06,249-Speed 2633.28 samples/sec   Loss 9.9251   LearningRate 0.0521   Epoch: 5   Global Step: 230760   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:40:10,143-Speed 2630.89 samples/sec   Loss 9.8116   LearningRate 0.0521   Epoch: 5   Global Step: 230770   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:40:14,001-Speed 2654.59 samples/sec   Loss 10.2534   LearningRate 0.0521   Epoch: 5   Global Step: 230780   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:17,887-Speed 2636.12 samples/sec   Loss 9.8844   LearningRate 0.0521   Epoch: 5   Global Step: 230790   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:21,791-Speed 2622.83 samples/sec   Loss 9.9411   LearningRate 0.0521   Epoch: 5   Global Step: 230800   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:25,690-Speed 2627.26 samples/sec   Loss 10.1853   LearningRate 0.0521   Epoch: 5   Global Step: 230810   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:29,593-Speed 2623.91 samples/sec   Loss 9.7436   LearningRate 0.0521   Epoch: 5   Global Step: 230820   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:33,481-Speed 2634.74 samples/sec   Loss 9.8479   LearningRate 0.0521   Epoch: 5   Global Step: 230830   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:37,370-Speed 2633.55 samples/sec   Loss 9.7450   LearningRate 0.0521   Epoch: 5   Global Step: 230840   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:41,260-Speed 2633.26 samples/sec   Loss 9.9395   LearningRate 0.0521   Epoch: 5   Global Step: 230850   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:45,144-Speed 2636.95 samples/sec   Loss 9.8519   LearningRate 0.0521   Epoch: 5   Global Step: 230860   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:49,034-Speed 2632.83 samples/sec   Loss 9.7411   LearningRate 0.0521   Epoch: 5   Global Step: 230870   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:40:52,927-Speed 2631.27 samples/sec   Loss 9.8191   LearningRate 0.0521   Epoch: 5   Global Step: 230880   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:40:56,822-Speed 2629.41 samples/sec   Loss 9.7529   LearningRate 0.0521   Epoch: 5   Global Step: 230890   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:00,718-Speed 2628.87 samples/sec   Loss 9.8076   LearningRate 0.0521   Epoch: 5   Global Step: 230900   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:04,621-Speed 2624.07 samples/sec   Loss 9.7680   LearningRate 0.0521   Epoch: 5   Global Step: 230910   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:08,527-Speed 2623.18 samples/sec   Loss 9.8391   LearningRate 0.0521   Epoch: 5   Global Step: 230920   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:12,418-Speed 2632.24 samples/sec   Loss 9.7800   LearningRate 0.0521   Epoch: 5   Global Step: 230930   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:16,309-Speed 2632.11 samples/sec   Loss 9.7455   LearningRate 0.0521   Epoch: 5   Global Step: 230940   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:20,202-Speed 2631.88 samples/sec   Loss 9.6255   LearningRate 0.0521   Epoch: 5   Global Step: 230950   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:24,105-Speed 2624.00 samples/sec   Loss 9.8212   LearningRate 0.0521   Epoch: 5   Global Step: 230960   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:27,999-Speed 2630.45 samples/sec   Loss 9.9593   LearningRate 0.0521   Epoch: 5   Global Step: 230970   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:41:31,902-Speed 2624.49 samples/sec   Loss 9.7792   LearningRate 0.0521   Epoch: 5   Global Step: 230980   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:41:35,801-Speed 2626.49 samples/sec   Loss 9.7873   LearningRate 0.0521   Epoch: 5   Global Step: 230990   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:41:39,706-Speed 2623.04 samples/sec   Loss 9.9744   LearningRate 0.0521   Epoch: 5   Global Step: 231000   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:41:43,602-Speed 2629.12 samples/sec   Loss 9.8963   LearningRate 0.0521   Epoch: 5   Global Step: 231010   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:41:47,495-Speed 2630.75 samples/sec   Loss 9.8595   LearningRate 0.0521   Epoch: 5   Global Step: 231020   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:41:51,404-Speed 2620.72 samples/sec   Loss 9.7018   LearningRate 0.0521   Epoch: 5   Global Step: 231030   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:41:55,304-Speed 2626.05 samples/sec   Loss 9.6928   LearningRate 0.0521   Epoch: 5   Global Step: 231040   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:41:59,198-Speed 2630.35 samples/sec   Loss 9.7146   LearningRate 0.0521   Epoch: 5   Global Step: 231050   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:42:03,101-Speed 2624.72 samples/sec   Loss 9.7601   LearningRate 0.0521   Epoch: 5   Global Step: 231060   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:42:06,994-Speed 2630.65 samples/sec   Loss 9.9034   LearningRate 0.0521   Epoch: 5   Global Step: 231070   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:42:10,886-Speed 2631.10 samples/sec   Loss 9.9238   LearningRate 0.0520   Epoch: 5   Global Step: 231080   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:14,777-Speed 2632.92 samples/sec   Loss 9.6877   LearningRate 0.0520   Epoch: 5   Global Step: 231090   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:18,668-Speed 2632.40 samples/sec   Loss 9.7121   LearningRate 0.0520   Epoch: 5   Global Step: 231100   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:22,557-Speed 2633.80 samples/sec   Loss 9.7665   LearningRate 0.0520   Epoch: 5   Global Step: 231110   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:26,451-Speed 2629.89 samples/sec   Loss 9.8416   LearningRate 0.0520   Epoch: 5   Global Step: 231120   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:30,342-Speed 2632.59 samples/sec   Loss 9.8422   LearningRate 0.0520   Epoch: 5   Global Step: 231130   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:34,237-Speed 2629.66 samples/sec   Loss 9.7206   LearningRate 0.0520   Epoch: 5   Global Step: 231140   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:38,131-Speed 2629.93 samples/sec   Loss 9.8123   LearningRate 0.0520   Epoch: 5   Global Step: 231150   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:42,022-Speed 2632.01 samples/sec   Loss 9.7194   LearningRate 0.0520   Epoch: 5   Global Step: 231160   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:45,918-Speed 2629.61 samples/sec   Loss 9.6622   LearningRate 0.0520   Epoch: 5   Global Step: 231170   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:42:49,816-Speed 2628.74 samples/sec   Loss 9.6802   LearningRate 0.0520   Epoch: 5   Global Step: 231180   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:42:53,709-Speed 2630.71 samples/sec   Loss 9.8650   LearningRate 0.0520   Epoch: 5   Global Step: 231190   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:42:57,600-Speed 2632.72 samples/sec   Loss 9.8409   LearningRate 0.0520   Epoch: 5   Global Step: 231200   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:01,495-Speed 2629.18 samples/sec   Loss 9.8954   LearningRate 0.0520   Epoch: 5   Global Step: 231210   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:05,389-Speed 2630.44 samples/sec   Loss 9.8246   LearningRate 0.0520   Epoch: 5   Global Step: 231220   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:09,280-Speed 2632.23 samples/sec   Loss 9.7654   LearningRate 0.0520   Epoch: 5   Global Step: 231230   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:13,176-Speed 2628.75 samples/sec   Loss 9.7698   LearningRate 0.0520   Epoch: 5   Global Step: 231240   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:17,072-Speed 2628.76 samples/sec   Loss 9.6777   LearningRate 0.0520   Epoch: 5   Global Step: 231250   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:20,968-Speed 2629.42 samples/sec   Loss 9.6634   LearningRate 0.0520   Epoch: 5   Global Step: 231260   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:24,861-Speed 2631.18 samples/sec   Loss 9.7781   LearningRate 0.0520   Epoch: 5   Global Step: 231270   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:43:28,754-Speed 2631.83 samples/sec   Loss 9.8227   LearningRate 0.0520   Epoch: 5   Global Step: 231280   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:32,648-Speed 2629.75 samples/sec   Loss 9.7856   LearningRate 0.0520   Epoch: 5   Global Step: 231290   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:36,542-Speed 2629.87 samples/sec   Loss 9.7307   LearningRate 0.0520   Epoch: 5   Global Step: 231300   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:40,437-Speed 2629.95 samples/sec   Loss 9.8062   LearningRate 0.0520   Epoch: 5   Global Step: 231310   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:44,332-Speed 2629.98 samples/sec   Loss 9.8873   LearningRate 0.0520   Epoch: 5   Global Step: 231320   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:48,228-Speed 2628.56 samples/sec   Loss 9.6944   LearningRate 0.0520   Epoch: 5   Global Step: 231330   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:52,125-Speed 2628.38 samples/sec   Loss 9.7146   LearningRate 0.0520   Epoch: 5   Global Step: 231340   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:56,035-Speed 2619.24 samples/sec   Loss 9.7236   LearningRate 0.0520   Epoch: 5   Global Step: 231350   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:43:59,945-Speed 2620.26 samples/sec   Loss 9.8859   LearningRate 0.0520   Epoch: 5   Global Step: 231360   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:44:03,844-Speed 2626.82 samples/sec   Loss 9.7829   LearningRate 0.0520   Epoch: 5   Global Step: 231370   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:44:07,737-Speed 2630.67 samples/sec   Loss 9.7376   LearningRate 0.0520   Epoch: 5   Global Step: 231380   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:44:11,619-Speed 2638.90 samples/sec   Loss 9.8859   LearningRate 0.0520   Epoch: 5   Global Step: 231390   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:15,512-Speed 2631.04 samples/sec   Loss 9.7900   LearningRate 0.0520   Epoch: 5   Global Step: 231400   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:19,408-Speed 2629.04 samples/sec   Loss 9.7217   LearningRate 0.0520   Epoch: 5   Global Step: 231410   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:23,303-Speed 2629.26 samples/sec   Loss 9.7442   LearningRate 0.0520   Epoch: 5   Global Step: 231420   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:27,198-Speed 2630.01 samples/sec   Loss 9.7979   LearningRate 0.0520   Epoch: 5   Global Step: 231430   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:31,096-Speed 2627.38 samples/sec   Loss 9.8394   LearningRate 0.0520   Epoch: 5   Global Step: 231440   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:34,992-Speed 2629.70 samples/sec   Loss 9.8832   LearningRate 0.0520   Epoch: 5   Global Step: 231450   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:38,891-Speed 2626.98 samples/sec   Loss 9.6881   LearningRate 0.0520   Epoch: 5   Global Step: 231460   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:44:42,780-Speed 2633.17 samples/sec   Loss 9.8174   LearningRate 0.0520   Epoch: 5   Global Step: 231470   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:44:46,643-Speed 2651.32 samples/sec   Loss 10.0610   LearningRate 0.0520   Epoch: 5   Global Step: 231480   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:44:50,541-Speed 2628.10 samples/sec   Loss 10.0430   LearningRate 0.0520   Epoch: 5   Global Step: 231490   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:44:54,433-Speed 2631.27 samples/sec   Loss 10.9428   LearningRate 0.0520   Epoch: 5   Global Step: 231500   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:44:58,330-Speed 2629.53 samples/sec   Loss 10.5008   LearningRate 0.0520   Epoch: 5   Global Step: 231510   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:02,236-Speed 2622.23 samples/sec   Loss 10.0203   LearningRate 0.0520   Epoch: 5   Global Step: 231520   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:06,130-Speed 2630.09 samples/sec   Loss 9.9930   LearningRate 0.0520   Epoch: 5   Global Step: 231530   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:10,026-Speed 2628.56 samples/sec   Loss 9.7291   LearningRate 0.0520   Epoch: 5   Global Step: 231540   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:13,922-Speed 2629.14 samples/sec   Loss 9.9213   LearningRate 0.0520   Epoch: 5   Global Step: 231550   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:17,814-Speed 2632.14 samples/sec   Loss 9.8422   LearningRate 0.0520   Epoch: 5   Global Step: 231560   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:21,708-Speed 2629.94 samples/sec   Loss 9.8898   LearningRate 0.0520   Epoch: 5   Global Step: 231570   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:25,620-Speed 2619.23 samples/sec   Loss 9.8806   LearningRate 0.0520   Epoch: 5   Global Step: 231580   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:29,509-Speed 2633.77 samples/sec   Loss 9.9224   LearningRate 0.0520   Epoch: 5   Global Step: 231590   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 21:45:33,402-Speed 2631.02 samples/sec   Loss 9.8334   LearningRate 0.0520   Epoch: 5   Global Step: 231600   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:45:37,310-Speed 2621.34 samples/sec   Loss 9.8104   LearningRate 0.0520   Epoch: 5   Global Step: 231610   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:45:41,204-Speed 2630.25 samples/sec   Loss 9.8004   LearningRate 0.0520   Epoch: 5   Global Step: 231620   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:45:45,094-Speed 2632.58 samples/sec   Loss 9.7949   LearningRate 0.0520   Epoch: 5   Global Step: 231630   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:45:48,987-Speed 2631.66 samples/sec   Loss 9.8596   LearningRate 0.0520   Epoch: 5   Global Step: 231640   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:45:52,890-Speed 2623.75 samples/sec   Loss 9.7850   LearningRate 0.0519   Epoch: 5   Global Step: 231650   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:45:56,787-Speed 2628.71 samples/sec   Loss 9.6964   LearningRate 0.0519   Epoch: 5   Global Step: 231660   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:46:00,682-Speed 2629.70 samples/sec   Loss 9.9002   LearningRate 0.0519   Epoch: 5   Global Step: 231670   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:46:04,573-Speed 2632.47 samples/sec   Loss 9.7571   LearningRate 0.0519   Epoch: 5   Global Step: 231680   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:46:08,468-Speed 2629.77 samples/sec   Loss 9.7008   LearningRate 0.0519   Epoch: 5   Global Step: 231690   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 21:46:12,366-Speed 2627.91 samples/sec   Loss 9.8077   LearningRate 0.0519   Epoch: 5   Global Step: 231700   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:16,260-Speed 2629.66 samples/sec   Loss 9.9805   LearningRate 0.0519   Epoch: 5   Global Step: 231710   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:20,246-Speed 2570.26 samples/sec   Loss 9.7473   LearningRate 0.0519   Epoch: 5   Global Step: 231720   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:24,142-Speed 2628.91 samples/sec   Loss 9.8181   LearningRate 0.0519   Epoch: 5   Global Step: 231730   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:28,034-Speed 2631.38 samples/sec   Loss 9.8934   LearningRate 0.0519   Epoch: 5   Global Step: 231740   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:31,935-Speed 2625.75 samples/sec   Loss 9.8305   LearningRate 0.0519   Epoch: 5   Global Step: 231750   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:35,832-Speed 2628.03 samples/sec   Loss 9.7533   LearningRate 0.0519   Epoch: 5   Global Step: 231760   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:39,726-Speed 2630.48 samples/sec   Loss 9.7168   LearningRate 0.0519   Epoch: 5   Global Step: 231770   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:43,642-Speed 2615.29 samples/sec   Loss 9.8785   LearningRate 0.0519   Epoch: 5   Global Step: 231780   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:47,540-Speed 2627.96 samples/sec   Loss 9.9097   LearningRate 0.0519   Epoch: 5   Global Step: 231790   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 21:46:51,438-Speed 2627.07 samples/sec   Loss 9.9153   LearningRate 0.0519   Epoch: 5   Global Step: 231800   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:46:55,333-Speed 2630.09 samples/sec   Loss 9.8737   LearningRate 0.0519   Epoch: 5   Global Step: 231810   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:46:59,227-Speed 2630.10 samples/sec   Loss 9.7101   LearningRate 0.0519   Epoch: 5   Global Step: 231820   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:03,135-Speed 2621.20 samples/sec   Loss 9.7496   LearningRate 0.0519   Epoch: 5   Global Step: 231830   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:07,031-Speed 2628.50 samples/sec   Loss 9.7450   LearningRate 0.0519   Epoch: 5   Global Step: 231840   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:10,929-Speed 2627.72 samples/sec   Loss 9.8928   LearningRate 0.0519   Epoch: 5   Global Step: 231850   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:14,832-Speed 2624.37 samples/sec   Loss 9.8142   LearningRate 0.0519   Epoch: 5   Global Step: 231860   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:18,755-Speed 2610.54 samples/sec   Loss 9.8191   LearningRate 0.0519   Epoch: 5   Global Step: 231870   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:22,653-Speed 2627.63 samples/sec   Loss 9.8057   LearningRate 0.0519   Epoch: 5   Global Step: 231880   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:26,550-Speed 2628.67 samples/sec   Loss 9.7159   LearningRate 0.0519   Epoch: 5   Global Step: 231890   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:47:30,457-Speed 2621.75 samples/sec   Loss 9.6906   LearningRate 0.0519   Epoch: 5   Global Step: 231900   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:47:34,359-Speed 2625.19 samples/sec   Loss 9.8551   LearningRate 0.0519   Epoch: 5   Global Step: 231910   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:47:38,268-Speed 2619.97 samples/sec   Loss 9.8518   LearningRate 0.0519   Epoch: 5   Global Step: 231920   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:47:42,236-Speed 2580.98 samples/sec   Loss 9.8471   LearningRate 0.0519   Epoch: 5   Global Step: 231930   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:47:46,138-Speed 2625.80 samples/sec   Loss 9.7912   LearningRate 0.0519   Epoch: 5   Global Step: 231940   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:47:50,035-Speed 2628.39 samples/sec   Loss 9.6639   LearningRate 0.0519   Epoch: 5   Global Step: 231950   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:47:53,978-Speed 2597.34 samples/sec   Loss 9.7980   LearningRate 0.0519   Epoch: 5   Global Step: 231960   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:47:57,886-Speed 2621.23 samples/sec   Loss 9.8351   LearningRate 0.0519   Epoch: 5   Global Step: 231970   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:48:01,795-Speed 2619.91 samples/sec   Loss 9.6150   LearningRate 0.0519   Epoch: 5   Global Step: 231980   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:48:05,697-Speed 2625.22 samples/sec   Loss 9.6512   LearningRate 0.0519   Epoch: 5   Global Step: 231990   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:48:09,602-Speed 2622.54 samples/sec   Loss 9.6359   LearningRate 0.0519   Epoch: 5   Global Step: 232000   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:13,569-Speed 2583.65 samples/sec   Loss 9.7138   LearningRate 0.0519   Epoch: 5   Global Step: 232010   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:17,477-Speed 2620.34 samples/sec   Loss 9.8355   LearningRate 0.0519   Epoch: 5   Global Step: 232020   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:21,391-Speed 2617.08 samples/sec   Loss 9.7443   LearningRate 0.0519   Epoch: 5   Global Step: 232030   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:25,298-Speed 2621.61 samples/sec   Loss 9.6662   LearningRate 0.0519   Epoch: 5   Global Step: 232040   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:29,199-Speed 2625.91 samples/sec   Loss 9.7995   LearningRate 0.0519   Epoch: 5   Global Step: 232050   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:33,110-Speed 2618.64 samples/sec   Loss 9.6479   LearningRate 0.0519   Epoch: 5   Global Step: 232060   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:37,140-Speed 2541.53 samples/sec   Loss 9.5514   LearningRate 0.0519   Epoch: 5   Global Step: 232070   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:48:41,065-Speed 2609.68 samples/sec   Loss 9.6364   LearningRate 0.0519   Epoch: 5   Global Step: 232080   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:48:44,969-Speed 2623.46 samples/sec   Loss 9.8109   LearningRate 0.0519   Epoch: 5   Global Step: 232090   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:48:48,872-Speed 2624.26 samples/sec   Loss 9.6854   LearningRate 0.0519   Epoch: 5   Global Step: 232100   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:48:52,776-Speed 2624.02 samples/sec   Loss 9.7537   LearningRate 0.0519   Epoch: 5   Global Step: 232110   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:48:56,678-Speed 2624.80 samples/sec   Loss 9.8796   LearningRate 0.0519   Epoch: 5   Global Step: 232120   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:49:00,573-Speed 2629.63 samples/sec   Loss 9.6682   LearningRate 0.0519   Epoch: 5   Global Step: 232130   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:49:04,473-Speed 2626.64 samples/sec   Loss 9.8155   LearningRate 0.0519   Epoch: 5   Global Step: 232140   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:49:08,391-Speed 2613.71 samples/sec   Loss 9.7665   LearningRate 0.0519   Epoch: 5   Global Step: 232150   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:49:12,283-Speed 2631.96 samples/sec   Loss 9.7417   LearningRate 0.0519   Epoch: 5   Global Step: 232160   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:49:16,191-Speed 2621.17 samples/sec   Loss 9.8342   LearningRate 0.0519   Epoch: 5   Global Step: 232170   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:49:20,103-Speed 2618.35 samples/sec   Loss 9.7386   LearningRate 0.0519   Epoch: 5   Global Step: 232180   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:24,022-Speed 2613.31 samples/sec   Loss 9.7969   LearningRate 0.0519   Epoch: 5   Global Step: 232190   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:27,934-Speed 2618.73 samples/sec   Loss 9.7117   LearningRate 0.0519   Epoch: 5   Global Step: 232200   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:31,843-Speed 2620.29 samples/sec   Loss 9.6739   LearningRate 0.0519   Epoch: 5   Global Step: 232210   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:35,765-Speed 2611.53 samples/sec   Loss 9.7795   LearningRate 0.0519   Epoch: 5   Global Step: 232220   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:39,672-Speed 2621.31 samples/sec   Loss 9.8126   LearningRate 0.0518   Epoch: 5   Global Step: 232230   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:43,590-Speed 2614.74 samples/sec   Loss 9.7087   LearningRate 0.0518   Epoch: 5   Global Step: 232240   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:47,501-Speed 2619.21 samples/sec   Loss 9.6367   LearningRate 0.0518   Epoch: 5   Global Step: 232250   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:51,406-Speed 2622.61 samples/sec   Loss 9.8581   LearningRate 0.0518   Epoch: 5   Global Step: 232260   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:55,323-Speed 2615.14 samples/sec   Loss 9.8919   LearningRate 0.0518   Epoch: 5   Global Step: 232270   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:49:59,229-Speed 2629.44 samples/sec   Loss 9.7216   LearningRate 0.0518   Epoch: 5   Global Step: 232280   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:50:03,138-Speed 2620.78 samples/sec   Loss 9.5848   LearningRate 0.0518   Epoch: 5   Global Step: 232290   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:50:07,022-Speed 2636.83 samples/sec   Loss 9.7468   LearningRate 0.0518   Epoch: 5   Global Step: 232300   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:10,931-Speed 2619.90 samples/sec   Loss 9.6552   LearningRate 0.0518   Epoch: 5   Global Step: 232310   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:14,853-Speed 2633.69 samples/sec   Loss 9.7744   LearningRate 0.0518   Epoch: 5   Global Step: 232320   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:18,752-Speed 2626.85 samples/sec   Loss 9.6841   LearningRate 0.0518   Epoch: 5   Global Step: 232330   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:22,648-Speed 2629.26 samples/sec   Loss 9.7162   LearningRate 0.0518   Epoch: 5   Global Step: 232340   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:26,579-Speed 2627.67 samples/sec   Loss 9.7553   LearningRate 0.0518   Epoch: 5   Global Step: 232350   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:30,473-Speed 2630.02 samples/sec   Loss 9.7059   LearningRate 0.0518   Epoch: 5   Global Step: 232360   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:34,518-Speed 2610.45 samples/sec   Loss 9.8521   LearningRate 0.0518   Epoch: 5   Global Step: 232370   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:50:38,399-Speed 2639.65 samples/sec   Loss 9.7351   LearningRate 0.0518   Epoch: 5   Global Step: 232380   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:50:42,295-Speed 2628.92 samples/sec   Loss 9.5892   LearningRate 0.0518   Epoch: 5   Global Step: 232390   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:50:46,192-Speed 2628.21 samples/sec   Loss 9.7444   LearningRate 0.0518   Epoch: 5   Global Step: 232400   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:50:50,386-Speed 2624.31 samples/sec   Loss 9.7410   LearningRate 0.0518   Epoch: 5   Global Step: 232410   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:50:54,282-Speed 2628.89 samples/sec   Loss 9.7363   LearningRate 0.0518   Epoch: 5   Global Step: 232420   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:50:58,193-Speed 2618.75 samples/sec   Loss 9.5443   LearningRate 0.0518   Epoch: 5   Global Step: 232430   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:51:02,107-Speed 2617.13 samples/sec   Loss 9.8728   LearningRate 0.0518   Epoch: 5   Global Step: 232440   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:51:06,007-Speed 2626.33 samples/sec   Loss 9.5993   LearningRate 0.0518   Epoch: 5   Global Step: 232450   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:51:09,918-Speed 2619.04 samples/sec   Loss 9.6884   LearningRate 0.0518   Epoch: 5   Global Step: 232460   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:51:13,813-Speed 2629.04 samples/sec   Loss 9.6427   LearningRate 0.0518   Epoch: 5   Global Step: 232470   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:51:17,709-Speed 2628.93 samples/sec   Loss 9.6900   LearningRate 0.0518   Epoch: 5   Global Step: 232480   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:21,608-Speed 2627.49 samples/sec   Loss 9.7604   LearningRate 0.0518   Epoch: 5   Global Step: 232490   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:25,506-Speed 2627.34 samples/sec   Loss 9.6651   LearningRate 0.0518   Epoch: 5   Global Step: 232500   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:29,411-Speed 2622.64 samples/sec   Loss 9.6750   LearningRate 0.0518   Epoch: 5   Global Step: 232510   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:33,309-Speed 2628.61 samples/sec   Loss 9.7617   LearningRate 0.0518   Epoch: 5   Global Step: 232520   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:37,218-Speed 2620.32 samples/sec   Loss 9.6899   LearningRate 0.0518   Epoch: 5   Global Step: 232530   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:41,115-Speed 2627.94 samples/sec   Loss 9.7294   LearningRate 0.0518   Epoch: 5   Global Step: 232540   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:45,017-Speed 2625.26 samples/sec   Loss 9.8289   LearningRate 0.0518   Epoch: 5   Global Step: 232550   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:48,917-Speed 2625.61 samples/sec   Loss 9.6832   LearningRate 0.0518   Epoch: 5   Global Step: 232560   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:52,815-Speed 2627.83 samples/sec   Loss 9.7667   LearningRate 0.0518   Epoch: 5   Global Step: 232570   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:51:56,715-Speed 2626.54 samples/sec   Loss 9.8103   LearningRate 0.0518   Epoch: 5   Global Step: 232580   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:52:00,617-Speed 2625.39 samples/sec   Loss 9.7692   LearningRate 0.0518   Epoch: 5   Global Step: 232590   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:52:04,513-Speed 2628.43 samples/sec   Loss 9.8072   LearningRate 0.0518   Epoch: 5   Global Step: 232600   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:52:08,420-Speed 2622.27 samples/sec   Loss 9.8087   LearningRate 0.0518   Epoch: 5   Global Step: 232610   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:52:12,337-Speed 2614.78 samples/sec   Loss 9.8579   LearningRate 0.0518   Epoch: 5   Global Step: 232620   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:52:16,238-Speed 2625.78 samples/sec   Loss 9.9213   LearningRate 0.0518   Epoch: 5   Global Step: 232630   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:52:20,152-Speed 2616.16 samples/sec   Loss 9.7420   LearningRate 0.0518   Epoch: 5   Global Step: 232640   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:52:24,034-Speed 2638.81 samples/sec   Loss 9.7674   LearningRate 0.0518   Epoch: 5   Global Step: 232650   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:27,935-Speed 2625.76 samples/sec   Loss 9.7215   LearningRate 0.0518   Epoch: 5   Global Step: 232660   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:31,837-Speed 2625.02 samples/sec   Loss 9.7328   LearningRate 0.0518   Epoch: 5   Global Step: 232670   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:35,737-Speed 2626.81 samples/sec   Loss 9.6441   LearningRate 0.0518   Epoch: 5   Global Step: 232680   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:39,638-Speed 2625.65 samples/sec   Loss 9.7006   LearningRate 0.0518   Epoch: 5   Global Step: 232690   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:43,617-Speed 2574.15 samples/sec   Loss 9.7523   LearningRate 0.0518   Epoch: 5   Global Step: 232700   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:47,533-Speed 2624.55 samples/sec   Loss 9.7486   LearningRate 0.0518   Epoch: 5   Global Step: 232710   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:51,452-Speed 2613.67 samples/sec   Loss 9.6615   LearningRate 0.0518   Epoch: 5   Global Step: 232720   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:55,354-Speed 2624.78 samples/sec   Loss 9.8118   LearningRate 0.0518   Epoch: 5   Global Step: 232730   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:52:59,262-Speed 2621.17 samples/sec   Loss 9.7848   LearningRate 0.0518   Epoch: 5   Global Step: 232740   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:03,180-Speed 2613.93 samples/sec   Loss 9.9054   LearningRate 0.0518   Epoch: 5   Global Step: 232750   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:53:07,175-Speed 2563.62 samples/sec   Loss 9.6542   LearningRate 0.0518   Epoch: 5   Global Step: 232760   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:53:11,082-Speed 2622.24 samples/sec   Loss 9.7388   LearningRate 0.0518   Epoch: 5   Global Step: 232770   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:15,102-Speed 2547.72 samples/sec   Loss 9.8142   LearningRate 0.0518   Epoch: 5   Global Step: 232780   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:19,003-Speed 2625.01 samples/sec   Loss 9.6755   LearningRate 0.0518   Epoch: 5   Global Step: 232790   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:22,903-Speed 2627.17 samples/sec   Loss 9.7317   LearningRate 0.0518   Epoch: 5   Global Step: 232800   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:26,799-Speed 2628.57 samples/sec   Loss 9.7418   LearningRate 0.0517   Epoch: 5   Global Step: 232810   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:30,699-Speed 2626.83 samples/sec   Loss 9.7311   LearningRate 0.0517   Epoch: 5   Global Step: 232820   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:34,601-Speed 2624.94 samples/sec   Loss 9.7169   LearningRate 0.0517   Epoch: 5   Global Step: 232830   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:38,500-Speed 2626.91 samples/sec   Loss 9.7494   LearningRate 0.0517   Epoch: 5   Global Step: 232840   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:42,412-Speed 2617.59 samples/sec   Loss 9.7701   LearningRate 0.0517   Epoch: 5   Global Step: 232850   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:46,310-Speed 2628.61 samples/sec   Loss 9.7571   LearningRate 0.0517   Epoch: 5   Global Step: 232860   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:53:50,206-Speed 2628.69 samples/sec   Loss 9.7013   LearningRate 0.0517   Epoch: 5   Global Step: 232870   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:53:54,111-Speed 2622.65 samples/sec   Loss 9.6785   LearningRate 0.0517   Epoch: 5   Global Step: 232880   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:53:57,990-Speed 2640.86 samples/sec   Loss 9.6876   LearningRate 0.0517   Epoch: 5   Global Step: 232890   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:01,891-Speed 2625.91 samples/sec   Loss 9.8004   LearningRate 0.0517   Epoch: 5   Global Step: 232900   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:05,803-Speed 2617.84 samples/sec   Loss 9.8267   LearningRate 0.0517   Epoch: 5   Global Step: 232910   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:09,713-Speed 2619.69 samples/sec   Loss 9.7475   LearningRate 0.0517   Epoch: 5   Global Step: 232920   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:13,615-Speed 2624.63 samples/sec   Loss 9.7006   LearningRate 0.0517   Epoch: 5   Global Step: 232930   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:17,520-Speed 2623.63 samples/sec   Loss 9.8258   LearningRate 0.0517   Epoch: 5   Global Step: 232940   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:21,420-Speed 2626.58 samples/sec   Loss 9.8286   LearningRate 0.0517   Epoch: 5   Global Step: 232950   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:25,328-Speed 2620.68 samples/sec   Loss 9.7703   LearningRate 0.0517   Epoch: 5   Global Step: 232960   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:29,338-Speed 2554.39 samples/sec   Loss 9.6291   LearningRate 0.0517   Epoch: 5   Global Step: 232970   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:33,260-Speed 2611.09 samples/sec   Loss 9.7009   LearningRate 0.0517   Epoch: 5   Global Step: 232980   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:54:37,170-Speed 2619.64 samples/sec   Loss 9.6802   LearningRate 0.0517   Epoch: 5   Global Step: 232990   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:54:41,078-Speed 2621.15 samples/sec   Loss 9.8060   LearningRate 0.0517   Epoch: 5   Global Step: 233000   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:54:44,984-Speed 2622.65 samples/sec   Loss 9.8255   LearningRate 0.0517   Epoch: 5   Global Step: 233010   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:54:48,884-Speed 2626.48 samples/sec   Loss 9.8890   LearningRate 0.0517   Epoch: 5   Global Step: 233020   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:54:52,786-Speed 2625.29 samples/sec   Loss 9.6707   LearningRate 0.0517   Epoch: 5   Global Step: 233030   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:54:56,796-Speed 2553.74 samples/sec   Loss 9.8292   LearningRate 0.0517   Epoch: 5   Global Step: 233040   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:00,847-Speed 2528.67 samples/sec   Loss 9.7821   LearningRate 0.0517   Epoch: 5   Global Step: 233050   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:04,750-Speed 2624.23 samples/sec   Loss 9.7041   LearningRate 0.0517   Epoch: 5   Global Step: 233060   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:08,655-Speed 2622.89 samples/sec   Loss 9.7532   LearningRate 0.0517   Epoch: 5   Global Step: 233070   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:12,552-Speed 2627.93 samples/sec   Loss 9.7795   LearningRate 0.0517   Epoch: 5   Global Step: 233080   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:16,436-Speed 2637.69 samples/sec   Loss 9.6811   LearningRate 0.0517   Epoch: 5   Global Step: 233090   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:20,339-Speed 2624.24 samples/sec   Loss 9.6263   LearningRate 0.0517   Epoch: 5   Global Step: 233100   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:24,273-Speed 2603.93 samples/sec   Loss 9.7415   LearningRate 0.0517   Epoch: 5   Global Step: 233110   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:55:28,156-Speed 2638.09 samples/sec   Loss 9.7885   LearningRate 0.0517   Epoch: 5   Global Step: 233120   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:32,066-Speed 2620.01 samples/sec   Loss 9.7925   LearningRate 0.0517   Epoch: 5   Global Step: 233130   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:35,970-Speed 2623.41 samples/sec   Loss 9.8500   LearningRate 0.0517   Epoch: 5   Global Step: 233140   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:39,872-Speed 2624.83 samples/sec   Loss 9.7569   LearningRate 0.0517   Epoch: 5   Global Step: 233150   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:43,770-Speed 2627.44 samples/sec   Loss 9.7345   LearningRate 0.0517   Epoch: 5   Global Step: 233160   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:47,670-Speed 2626.28 samples/sec   Loss 9.8273   LearningRate 0.0517   Epoch: 5   Global Step: 233170   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:51,572-Speed 2625.44 samples/sec   Loss 9.9768   LearningRate 0.0517   Epoch: 5   Global Step: 233180   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:55,484-Speed 2618.03 samples/sec   Loss 9.7270   LearningRate 0.0517   Epoch: 5   Global Step: 233190   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:55:59,382-Speed 2627.62 samples/sec   Loss 9.6823   LearningRate 0.0517   Epoch: 5   Global Step: 233200   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:03,284-Speed 2625.09 samples/sec   Loss 9.7188   LearningRate 0.0517   Epoch: 5   Global Step: 233210   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:07,172-Speed 2634.37 samples/sec   Loss 9.7304   LearningRate 0.0517   Epoch: 5   Global Step: 233220   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:11,069-Speed 2627.77 samples/sec   Loss 9.7133   LearningRate 0.0517   Epoch: 5   Global Step: 233230   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:15,001-Speed 2605.44 samples/sec   Loss 9.7049   LearningRate 0.0517   Epoch: 5   Global Step: 233240   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:18,899-Speed 2627.05 samples/sec   Loss 9.8155   LearningRate 0.0517   Epoch: 5   Global Step: 233250   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:22,810-Speed 2619.47 samples/sec   Loss 9.8673   LearningRate 0.0517   Epoch: 5   Global Step: 233260   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:26,711-Speed 2625.83 samples/sec   Loss 9.7987   LearningRate 0.0517   Epoch: 5   Global Step: 233270   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:30,627-Speed 2615.15 samples/sec   Loss 9.6865   LearningRate 0.0517   Epoch: 5   Global Step: 233280   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:34,524-Speed 2628.71 samples/sec   Loss 9.7004   LearningRate 0.0517   Epoch: 5   Global Step: 233290   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:38,428-Speed 2623.27 samples/sec   Loss 9.7518   LearningRate 0.0517   Epoch: 5   Global Step: 233300   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:42,332-Speed 2624.04 samples/sec   Loss 9.7403   LearningRate 0.0517   Epoch: 5   Global Step: 233310   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:56:46,231-Speed 2626.77 samples/sec   Loss 9.7040   LearningRate 0.0517   Epoch: 5   Global Step: 233320   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:56:50,132-Speed 2625.44 samples/sec   Loss 9.7062   LearningRate 0.0517   Epoch: 5   Global Step: 233330   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:56:54,038-Speed 2621.87 samples/sec   Loss 9.7217   LearningRate 0.0517   Epoch: 5   Global Step: 233340   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:56:57,947-Speed 2620.51 samples/sec   Loss 9.6391   LearningRate 0.0517   Epoch: 5   Global Step: 233350   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:57:01,867-Speed 2612.75 samples/sec   Loss 9.7192   LearningRate 0.0517   Epoch: 5   Global Step: 233360   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:57:05,765-Speed 2627.58 samples/sec   Loss 9.8140   LearningRate 0.0517   Epoch: 5   Global Step: 233370   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:57:09,649-Speed 2637.82 samples/sec   Loss 9.7906   LearningRate 0.0516   Epoch: 5   Global Step: 233380   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:57:13,550-Speed 2625.65 samples/sec   Loss 9.7835   LearningRate 0.0516   Epoch: 5   Global Step: 233390   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:57:17,453-Speed 2624.14 samples/sec   Loss 9.7347   LearningRate 0.0516   Epoch: 5   Global Step: 233400   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:57:21,382-Speed 2606.98 samples/sec   Loss 9.7432   LearningRate 0.0516   Epoch: 5   Global Step: 233410   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:57:25,302-Speed 2612.65 samples/sec   Loss 9.6952   LearningRate 0.0516   Epoch: 5   Global Step: 233420   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:57:29,214-Speed 2618.09 samples/sec   Loss 9.8663   LearningRate 0.0516   Epoch: 5   Global Step: 233430   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:57:33,125-Speed 2618.85 samples/sec   Loss 9.6149   LearningRate 0.0516   Epoch: 5   Global Step: 233440   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:57:37,012-Speed 2634.98 samples/sec   Loss 9.6401   LearningRate 0.0516   Epoch: 5   Global Step: 233450   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:57:40,916-Speed 2624.00 samples/sec   Loss 9.6578   LearningRate 0.0516   Epoch: 5   Global Step: 233460   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:57:44,823-Speed 2621.39 samples/sec   Loss 9.7915   LearningRate 0.0516   Epoch: 5   Global Step: 233470   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:57:48,736-Speed 2617.57 samples/sec   Loss 9.7008   LearningRate 0.0516   Epoch: 5   Global Step: 233480   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:57:52,638-Speed 2624.69 samples/sec   Loss 9.6981   LearningRate 0.0516   Epoch: 5   Global Step: 233490   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:57:56,549-Speed 2619.08 samples/sec   Loss 9.7230   LearningRate 0.0516   Epoch: 5   Global Step: 233500   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:58:00,458-Speed 2620.23 samples/sec   Loss 9.8299   LearningRate 0.0516   Epoch: 5   Global Step: 233510   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:58:04,545-Speed 2506.09 samples/sec   Loss 9.6847   LearningRate 0.0516   Epoch: 5   Global Step: 233520   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:58:08,490-Speed 2596.37 samples/sec   Loss 9.6998   LearningRate 0.0516   Epoch: 5   Global Step: 233530   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:58:12,395-Speed 2623.54 samples/sec   Loss 9.7523   LearningRate 0.0516   Epoch: 5   Global Step: 233540   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 21:58:16,311-Speed 2615.14 samples/sec   Loss 9.6691   LearningRate 0.0516   Epoch: 5   Global Step: 233550   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:20,216-Speed 2623.05 samples/sec   Loss 9.6761   LearningRate 0.0516   Epoch: 5   Global Step: 233560   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:24,120-Speed 2623.45 samples/sec   Loss 9.6091   LearningRate 0.0516   Epoch: 5   Global Step: 233570   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:28,033-Speed 2618.27 samples/sec   Loss 9.9069   LearningRate 0.0516   Epoch: 5   Global Step: 233580   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:31,939-Speed 2622.38 samples/sec   Loss 9.8050   LearningRate 0.0516   Epoch: 5   Global Step: 233590   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:35,842-Speed 2623.84 samples/sec   Loss 9.7090   LearningRate 0.0516   Epoch: 5   Global Step: 233600   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:39,744-Speed 2624.75 samples/sec   Loss 9.7013   LearningRate 0.0516   Epoch: 5   Global Step: 233610   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:43,647-Speed 2624.44 samples/sec   Loss 9.7485   LearningRate 0.0516   Epoch: 5   Global Step: 233620   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:47,548-Speed 2625.49 samples/sec   Loss 9.7105   LearningRate 0.0516   Epoch: 5   Global Step: 233630   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:51,451-Speed 2624.62 samples/sec   Loss 9.7282   LearningRate 0.0516   Epoch: 5   Global Step: 233640   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:58:55,353-Speed 2624.70 samples/sec   Loss 9.7370   LearningRate 0.0516   Epoch: 5   Global Step: 233650   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:58:59,267-Speed 2617.18 samples/sec   Loss 9.7608   LearningRate 0.0516   Epoch: 5   Global Step: 233660   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:03,176-Speed 2620.10 samples/sec   Loss 9.7230   LearningRate 0.0516   Epoch: 5   Global Step: 233670   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:07,082-Speed 2622.12 samples/sec   Loss 9.6919   LearningRate 0.0516   Epoch: 5   Global Step: 233680   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:10,983-Speed 2625.84 samples/sec   Loss 9.6308   LearningRate 0.0516   Epoch: 5   Global Step: 233690   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:14,881-Speed 2627.88 samples/sec   Loss 9.8770   LearningRate 0.0516   Epoch: 5   Global Step: 233700   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:18,782-Speed 2625.54 samples/sec   Loss 9.7603   LearningRate 0.0516   Epoch: 5   Global Step: 233710   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:22,691-Speed 2620.39 samples/sec   Loss 9.6441   LearningRate 0.0516   Epoch: 5   Global Step: 233720   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:26,589-Speed 2627.29 samples/sec   Loss 9.7212   LearningRate 0.0516   Epoch: 5   Global Step: 233730   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:30,487-Speed 2627.76 samples/sec   Loss 9.7339   LearningRate 0.0516   Epoch: 5   Global Step: 233740   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:34,378-Speed 2632.77 samples/sec   Loss 9.7961   LearningRate 0.0516   Epoch: 5   Global Step: 233750   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:38,298-Speed 2612.41 samples/sec   Loss 9.8724   LearningRate 0.0516   Epoch: 5   Global Step: 233760   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:42,200-Speed 2624.54 samples/sec   Loss 9.8087   LearningRate 0.0516   Epoch: 5   Global Step: 233770   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:46,110-Speed 2619.70 samples/sec   Loss 9.8057   LearningRate 0.0516   Epoch: 5   Global Step: 233780   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:50,012-Speed 2624.81 samples/sec   Loss 9.7914   LearningRate 0.0516   Epoch: 5   Global Step: 233790   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 21:59:53,949-Speed 2602.24 samples/sec   Loss 9.7257   LearningRate 0.0516   Epoch: 5   Global Step: 233800   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 21:59:57,853-Speed 2622.93 samples/sec   Loss 9.7371   LearningRate 0.0516   Epoch: 5   Global Step: 233810   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:01,771-Speed 2615.03 samples/sec   Loss 9.7383   LearningRate 0.0516   Epoch: 5   Global Step: 233820   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:05,669-Speed 2627.25 samples/sec   Loss 9.6887   LearningRate 0.0516   Epoch: 5   Global Step: 233830   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:09,574-Speed 2622.69 samples/sec   Loss 9.6303   LearningRate 0.0516   Epoch: 5   Global Step: 233840   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:13,476-Speed 2625.06 samples/sec   Loss 9.6508   LearningRate 0.0516   Epoch: 5   Global Step: 233850   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:17,376-Speed 2626.21 samples/sec   Loss 9.7708   LearningRate 0.0516   Epoch: 5   Global Step: 233860   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:21,277-Speed 2625.37 samples/sec   Loss 9.6842   LearningRate 0.0516   Epoch: 5   Global Step: 233870   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:25,185-Speed 2621.23 samples/sec   Loss 9.6681   LearningRate 0.0516   Epoch: 5   Global Step: 233880   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:29,100-Speed 2616.17 samples/sec   Loss 9.5982   LearningRate 0.0516   Epoch: 5   Global Step: 233890   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:33,003-Speed 2624.76 samples/sec   Loss 9.6731   LearningRate 0.0516   Epoch: 5   Global Step: 233900   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:00:36,900-Speed 2627.61 samples/sec   Loss 9.7050   LearningRate 0.0516   Epoch: 5   Global Step: 233910   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:00:40,803-Speed 2624.42 samples/sec   Loss 9.6333   LearningRate 0.0516   Epoch: 5   Global Step: 233920   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:00:44,738-Speed 2603.00 samples/sec   Loss 9.6948   LearningRate 0.0516   Epoch: 5   Global Step: 233930   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:48,664-Speed 2608.77 samples/sec   Loss 9.6118   LearningRate 0.0516   Epoch: 5   Global Step: 233940   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:52,566-Speed 2625.52 samples/sec   Loss 9.8251   LearningRate 0.0516   Epoch: 5   Global Step: 233950   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:00:56,468-Speed 2624.32 samples/sec   Loss 9.6850   LearningRate 0.0515   Epoch: 5   Global Step: 233960   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:00,374-Speed 2622.56 samples/sec   Loss 9.6183   LearningRate 0.0515   Epoch: 5   Global Step: 233970   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:04,296-Speed 2611.22 samples/sec   Loss 9.7376   LearningRate 0.0515   Epoch: 5   Global Step: 233980   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:08,205-Speed 2620.56 samples/sec   Loss 9.7863   LearningRate 0.0515   Epoch: 5   Global Step: 233990   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:12,110-Speed 2622.70 samples/sec   Loss 9.7109   LearningRate 0.0515   Epoch: 5   Global Step: 234000   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:16,009-Speed 2626.96 samples/sec   Loss 9.7741   LearningRate 0.0515   Epoch: 5   Global Step: 234010   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:19,910-Speed 2625.91 samples/sec   Loss 9.6156   LearningRate 0.0515   Epoch: 5   Global Step: 234020   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:23,810-Speed 2626.12 samples/sec   Loss 9.7286   LearningRate 0.0515   Epoch: 5   Global Step: 234030   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:01:27,710-Speed 2626.29 samples/sec   Loss 9.7384   LearningRate 0.0515   Epoch: 5   Global Step: 234040   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:01:31,610-Speed 2626.03 samples/sec   Loss 9.8157   LearningRate 0.0515   Epoch: 5   Global Step: 234050   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:01:35,523-Speed 2617.44 samples/sec   Loss 9.7646   LearningRate 0.0515   Epoch: 5   Global Step: 234060   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:01:39,415-Speed 2632.06 samples/sec   Loss 9.8225   LearningRate 0.0515   Epoch: 5   Global Step: 234070   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:43,305-Speed 2632.42 samples/sec   Loss 9.8607   LearningRate 0.0515   Epoch: 5   Global Step: 234080   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:47,209-Speed 2623.94 samples/sec   Loss 9.6685   LearningRate 0.0515   Epoch: 5   Global Step: 234090   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:51,114-Speed 2622.95 samples/sec   Loss 9.6380   LearningRate 0.0515   Epoch: 5   Global Step: 234100   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:55,014-Speed 2626.24 samples/sec   Loss 9.6690   LearningRate 0.0515   Epoch: 5   Global Step: 234110   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:01:58,913-Speed 2627.00 samples/sec   Loss 9.7547   LearningRate 0.0515   Epoch: 5   Global Step: 234120   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:02:02,834-Speed 2613.01 samples/sec   Loss 9.7151   LearningRate 0.0515   Epoch: 5   Global Step: 234130   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:02:06,732-Speed 2627.24 samples/sec   Loss 9.8126   LearningRate 0.0515   Epoch: 5   Global Step: 234140   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:02:10,636-Speed 2623.57 samples/sec   Loss 9.7886   LearningRate 0.0515   Epoch: 5   Global Step: 234150   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:02:14,546-Speed 2619.44 samples/sec   Loss 9.7274   LearningRate 0.0515   Epoch: 5   Global Step: 234160   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:02:18,450-Speed 2623.82 samples/sec   Loss 9.8461   LearningRate 0.0515   Epoch: 5   Global Step: 234170   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:22,352-Speed 2625.62 samples/sec   Loss 9.7313   LearningRate 0.0515   Epoch: 5   Global Step: 234180   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:26,257-Speed 2623.21 samples/sec   Loss 9.6920   LearningRate 0.0515   Epoch: 5   Global Step: 234190   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:30,169-Speed 2618.02 samples/sec   Loss 9.8085   LearningRate 0.0515   Epoch: 5   Global Step: 234200   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:34,096-Speed 2608.78 samples/sec   Loss 9.6495   LearningRate 0.0515   Epoch: 5   Global Step: 234210   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:38,011-Speed 2615.80 samples/sec   Loss 9.6200   LearningRate 0.0515   Epoch: 5   Global Step: 234220   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:41,914-Speed 2624.94 samples/sec   Loss 9.7383   LearningRate 0.0515   Epoch: 5   Global Step: 234230   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:45,816-Speed 2624.45 samples/sec   Loss 9.7445   LearningRate 0.0515   Epoch: 5   Global Step: 234240   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:49,721-Speed 2622.68 samples/sec   Loss 9.7992   LearningRate 0.0515   Epoch: 5   Global Step: 234250   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:02:53,619-Speed 2627.39 samples/sec   Loss 9.6189   LearningRate 0.0515   Epoch: 5   Global Step: 234260   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:02:57,522-Speed 2624.46 samples/sec   Loss 9.7324   LearningRate 0.0515   Epoch: 5   Global Step: 234270   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:01,423-Speed 2625.39 samples/sec   Loss 9.6063   LearningRate 0.0515   Epoch: 5   Global Step: 234280   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:05,326-Speed 2624.99 samples/sec   Loss 9.6588   LearningRate 0.0515   Epoch: 5   Global Step: 234290   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:09,226-Speed 2625.87 samples/sec   Loss 9.7344   LearningRate 0.0515   Epoch: 5   Global Step: 234300   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:13,128-Speed 2625.48 samples/sec   Loss 9.7507   LearningRate 0.0515   Epoch: 5   Global Step: 234310   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:17,032-Speed 2623.37 samples/sec   Loss 9.6607   LearningRate 0.0515   Epoch: 5   Global Step: 234320   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:20,934-Speed 2624.92 samples/sec   Loss 9.7709   LearningRate 0.0515   Epoch: 5   Global Step: 234330   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:24,958-Speed 2545.14 samples/sec   Loss 9.8205   LearningRate 0.0515   Epoch: 5   Global Step: 234340   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:29,051-Speed 2502.02 samples/sec   Loss 9.6415   LearningRate 0.0515   Epoch: 5   Global Step: 234350   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:33,146-Speed 2501.70 samples/sec   Loss 9.7782   LearningRate 0.0515   Epoch: 5   Global Step: 234360   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:03:37,185-Speed 2535.90 samples/sec   Loss 9.7401   LearningRate 0.0515   Epoch: 5   Global Step: 234370   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:41,117-Speed 2605.16 samples/sec   Loss 9.6131   LearningRate 0.0515   Epoch: 5   Global Step: 234380   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:45,032-Speed 2616.04 samples/sec   Loss 9.7507   LearningRate 0.0515   Epoch: 5   Global Step: 234390   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:48,940-Speed 2620.86 samples/sec   Loss 9.7706   LearningRate 0.0515   Epoch: 5   Global Step: 234400   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:52,905-Speed 2583.48 samples/sec   Loss 9.5920   LearningRate 0.0515   Epoch: 5   Global Step: 234410   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:03:56,843-Speed 2600.96 samples/sec   Loss 9.6953   LearningRate 0.0515   Epoch: 5   Global Step: 234420   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:04:00,744-Speed 2625.02 samples/sec   Loss 9.7451   LearningRate 0.0515   Epoch: 5   Global Step: 234430   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:04:04,641-Speed 2629.24 samples/sec   Loss 9.7459   LearningRate 0.0515   Epoch: 5   Global Step: 234440   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:04:08,551-Speed 2619.35 samples/sec   Loss 9.6761   LearningRate 0.0515   Epoch: 5   Global Step: 234450   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:04:12,457-Speed 2622.31 samples/sec   Loss 9.6921   LearningRate 0.0515   Epoch: 5   Global Step: 234460   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:04:16,356-Speed 2627.15 samples/sec   Loss 9.7441   LearningRate 0.0515   Epoch: 5   Global Step: 234470   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:20,260-Speed 2623.38 samples/sec   Loss 9.6468   LearningRate 0.0515   Epoch: 5   Global Step: 234480   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:24,159-Speed 2626.45 samples/sec   Loss 9.6106   LearningRate 0.0515   Epoch: 5   Global Step: 234490   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:28,074-Speed 2616.59 samples/sec   Loss 9.6778   LearningRate 0.0515   Epoch: 5   Global Step: 234500   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:31,990-Speed 2615.71 samples/sec   Loss 9.6197   LearningRate 0.0515   Epoch: 5   Global Step: 234510   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:35,982-Speed 2565.56 samples/sec   Loss 9.6722   LearningRate 0.0515   Epoch: 5   Global Step: 234520   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:39,937-Speed 2589.98 samples/sec   Loss 9.7275   LearningRate 0.0515   Epoch: 5   Global Step: 234530   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:43,858-Speed 2612.81 samples/sec   Loss 9.7165   LearningRate 0.0514   Epoch: 5   Global Step: 234540   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:47,762-Speed 2623.13 samples/sec   Loss 9.8305   LearningRate 0.0514   Epoch: 5   Global Step: 234550   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:51,661-Speed 2627.10 samples/sec   Loss 9.7353   LearningRate 0.0514   Epoch: 5   Global Step: 234560   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:55,537-Speed 2642.63 samples/sec   Loss 9.7669   LearningRate 0.0514   Epoch: 5   Global Step: 234570   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:04:59,459-Speed 2611.39 samples/sec   Loss 9.6030   LearningRate 0.0514   Epoch: 5   Global Step: 234580   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:05:03,358-Speed 2626.76 samples/sec   Loss 9.6305   LearningRate 0.0514   Epoch: 5   Global Step: 234590   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:05:07,240-Speed 2638.10 samples/sec   Loss 9.5646   LearningRate 0.0514   Epoch: 5   Global Step: 234600   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:11,138-Speed 2627.40 samples/sec   Loss 9.8047   LearningRate 0.0514   Epoch: 5   Global Step: 234610   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:15,046-Speed 2621.39 samples/sec   Loss 9.5724   LearningRate 0.0514   Epoch: 5   Global Step: 234620   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:18,955-Speed 2620.57 samples/sec   Loss 9.6911   LearningRate 0.0514   Epoch: 5   Global Step: 234630   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:22,868-Speed 2617.39 samples/sec   Loss 9.6458   LearningRate 0.0514   Epoch: 5   Global Step: 234640   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:26,772-Speed 2624.07 samples/sec   Loss 9.5640   LearningRate 0.0514   Epoch: 5   Global Step: 234650   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:30,672-Speed 2625.80 samples/sec   Loss 9.6515   LearningRate 0.0514   Epoch: 5   Global Step: 234660   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:34,587-Speed 2616.18 samples/sec   Loss 9.7988   LearningRate 0.0514   Epoch: 5   Global Step: 234670   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:38,502-Speed 2615.75 samples/sec   Loss 9.6617   LearningRate 0.0514   Epoch: 5   Global Step: 234680   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:42,401-Speed 2627.21 samples/sec   Loss 9.7477   LearningRate 0.0514   Epoch: 5   Global Step: 234690   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:46,282-Speed 2639.11 samples/sec   Loss 9.6757   LearningRate 0.0514   Epoch: 5   Global Step: 234700   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:50,197-Speed 2616.86 samples/sec   Loss 9.7726   LearningRate 0.0514   Epoch: 5   Global Step: 234710   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:54,105-Speed 2620.15 samples/sec   Loss 9.6646   LearningRate 0.0514   Epoch: 5   Global Step: 234720   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:05:58,016-Speed 2619.48 samples/sec   Loss 9.5845   LearningRate 0.0514   Epoch: 5   Global Step: 234730   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:01,919-Speed 2623.62 samples/sec   Loss 9.6412   LearningRate 0.0514   Epoch: 5   Global Step: 234740   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:05,831-Speed 2618.21 samples/sec   Loss 9.8052   LearningRate 0.0514   Epoch: 5   Global Step: 234750   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:09,744-Speed 2617.66 samples/sec   Loss 9.6234   LearningRate 0.0514   Epoch: 5   Global Step: 234760   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:13,656-Speed 2618.47 samples/sec   Loss 9.7499   LearningRate 0.0514   Epoch: 5   Global Step: 234770   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:17,566-Speed 2620.04 samples/sec   Loss 9.8322   LearningRate 0.0514   Epoch: 5   Global Step: 234780   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:21,481-Speed 2616.12 samples/sec   Loss 9.7636   LearningRate 0.0514   Epoch: 5   Global Step: 234790   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:25,379-Speed 2627.97 samples/sec   Loss 9.7988   LearningRate 0.0514   Epoch: 5   Global Step: 234800   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:06:29,267-Speed 2634.72 samples/sec   Loss 9.6107   LearningRate 0.0514   Epoch: 5   Global Step: 234810   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:33,165-Speed 2627.41 samples/sec   Loss 9.6638   LearningRate 0.0514   Epoch: 5   Global Step: 234820   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:37,064-Speed 2626.80 samples/sec   Loss 9.6293   LearningRate 0.0514   Epoch: 5   Global Step: 234830   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:40,961-Speed 2628.74 samples/sec   Loss 9.7686   LearningRate 0.0514   Epoch: 5   Global Step: 234840   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:44,862-Speed 2625.34 samples/sec   Loss 9.8022   LearningRate 0.0514   Epoch: 5   Global Step: 234850   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:48,774-Speed 2618.91 samples/sec   Loss 9.7111   LearningRate 0.0514   Epoch: 5   Global Step: 234860   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:52,671-Speed 2627.74 samples/sec   Loss 9.6136   LearningRate 0.0514   Epoch: 5   Global Step: 234870   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:06:56,571-Speed 2626.54 samples/sec   Loss 9.7552   LearningRate 0.0514   Epoch: 5   Global Step: 234880   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:07:00,471-Speed 2626.53 samples/sec   Loss 9.7458   LearningRate 0.0514   Epoch: 5   Global Step: 234890   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:07:04,385-Speed 2616.39 samples/sec   Loss 9.7816   LearningRate 0.0514   Epoch: 5   Global Step: 234900   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:07:08,282-Speed 2627.89 samples/sec   Loss 9.8849   LearningRate 0.0514   Epoch: 5   Global Step: 234910   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:12,189-Speed 2622.58 samples/sec   Loss 9.7079   LearningRate 0.0514   Epoch: 5   Global Step: 234920   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:16,087-Speed 2627.15 samples/sec   Loss 9.6327   LearningRate 0.0514   Epoch: 5   Global Step: 234930   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:20,019-Speed 2605.30 samples/sec   Loss 9.6362   LearningRate 0.0514   Epoch: 5   Global Step: 234940   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:23,917-Speed 2627.62 samples/sec   Loss 9.6131   LearningRate 0.0514   Epoch: 5   Global Step: 234950   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:27,821-Speed 2624.28 samples/sec   Loss 9.7220   LearningRate 0.0514   Epoch: 5   Global Step: 234960   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:31,718-Speed 2628.35 samples/sec   Loss 9.7012   LearningRate 0.0514   Epoch: 5   Global Step: 234970   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:35,614-Speed 2628.46 samples/sec   Loss 9.6241   LearningRate 0.0514   Epoch: 5   Global Step: 234980   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:39,512-Speed 2628.13 samples/sec   Loss 9.6818   LearningRate 0.0514   Epoch: 5   Global Step: 234990   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:43,409-Speed 2628.31 samples/sec   Loss 9.5934   LearningRate 0.0514   Epoch: 5   Global Step: 235000   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:07:47,262-Speed 2658.57 samples/sec   Loss 9.6311   LearningRate 0.0514   Epoch: 5   Global Step: 235010   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:07:51,160-Speed 2627.86 samples/sec   Loss 9.8216   LearningRate 0.0514   Epoch: 5   Global Step: 235020   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:07:55,054-Speed 2630.29 samples/sec   Loss 9.6516   LearningRate 0.0514   Epoch: 5   Global Step: 235030   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:07:58,951-Speed 2628.07 samples/sec   Loss 9.7291   LearningRate 0.0514   Epoch: 5   Global Step: 235040   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:08:02,846-Speed 2629.81 samples/sec   Loss 9.6002   LearningRate 0.0514   Epoch: 5   Global Step: 235050   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:08:06,749-Speed 2624.22 samples/sec   Loss 9.6283   LearningRate 0.0514   Epoch: 5   Global Step: 235060   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:08:10,646-Speed 2628.15 samples/sec   Loss 9.5898   LearningRate 0.0514   Epoch: 5   Global Step: 235070   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:08:14,552-Speed 2622.48 samples/sec   Loss 9.7587   LearningRate 0.0514   Epoch: 5   Global Step: 235080   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:08:18,455-Speed 2624.45 samples/sec   Loss 9.6995   LearningRate 0.0514   Epoch: 5   Global Step: 235090   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:08:22,359-Speed 2624.05 samples/sec   Loss 9.6330   LearningRate 0.0514   Epoch: 5   Global Step: 235100   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:08:26,213-Speed 2657.32 samples/sec   Loss 10.1494   LearningRate 0.0514   Epoch: 5   Global Step: 235110   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:30,123-Speed 2619.17 samples/sec   Loss 10.1164   LearningRate 0.0513   Epoch: 5   Global Step: 235120   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:34,024-Speed 2626.26 samples/sec   Loss 10.7050   LearningRate 0.0513   Epoch: 5   Global Step: 235130   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:37,940-Speed 2615.38 samples/sec   Loss 10.2369   LearningRate 0.0513   Epoch: 5   Global Step: 235140   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:41,848-Speed 2620.90 samples/sec   Loss 9.9680   LearningRate 0.0513   Epoch: 5   Global Step: 235150   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:45,766-Speed 2613.94 samples/sec   Loss 9.6812   LearningRate 0.0513   Epoch: 5   Global Step: 235160   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:49,662-Speed 2629.32 samples/sec   Loss 9.6828   LearningRate 0.0513   Epoch: 5   Global Step: 235170   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:53,568-Speed 2621.92 samples/sec   Loss 9.9019   LearningRate 0.0513   Epoch: 5   Global Step: 235180   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:08:57,488-Speed 2613.28 samples/sec   Loss 9.8400   LearningRate 0.0513   Epoch: 5   Global Step: 235190   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:09:01,394-Speed 2622.29 samples/sec   Loss 9.8673   LearningRate 0.0513   Epoch: 5   Global Step: 235200   Fp16 Grad Scale: 8192   Required: 67 hours
Training: 2022-04-13 22:09:05,326-Speed 2604.90 samples/sec   Loss 9.7918   LearningRate 0.0513   Epoch: 5   Global Step: 235210   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:09,221-Speed 2630.28 samples/sec   Loss 9.7706   LearningRate 0.0513   Epoch: 5   Global Step: 235220   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:13,130-Speed 2620.25 samples/sec   Loss 9.6315   LearningRate 0.0513   Epoch: 5   Global Step: 235230   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:17,022-Speed 2631.73 samples/sec   Loss 9.5911   LearningRate 0.0513   Epoch: 5   Global Step: 235240   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:20,916-Speed 2630.32 samples/sec   Loss 9.8050   LearningRate 0.0513   Epoch: 5   Global Step: 235250   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:24,847-Speed 2605.35 samples/sec   Loss 9.8296   LearningRate 0.0513   Epoch: 5   Global Step: 235260   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:28,751-Speed 2624.10 samples/sec   Loss 9.5250   LearningRate 0.0513   Epoch: 5   Global Step: 235270   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:32,651-Speed 2626.87 samples/sec   Loss 9.6874   LearningRate 0.0513   Epoch: 5   Global Step: 235280   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:36,553-Speed 2624.68 samples/sec   Loss 9.7870   LearningRate 0.0513   Epoch: 5   Global Step: 235290   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:40,482-Speed 2608.02 samples/sec   Loss 9.8816   LearningRate 0.0513   Epoch: 5   Global Step: 235300   Fp16 Grad Scale: 16384   Required: 67 hours
Training: 2022-04-13 22:09:44,390-Speed 2620.54 samples/sec   Loss 9.7102   LearningRate 0.0513   Epoch: 5   Global Step: 235310   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:09:48,290-Speed 2626.51 samples/sec   Loss 9.7193   LearningRate 0.0513   Epoch: 5   Global Step: 235320   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:09:52,192-Speed 2624.89 samples/sec   Loss 9.6638   LearningRate 0.0513   Epoch: 5   Global Step: 235330   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:09:56,100-Speed 2620.74 samples/sec   Loss 9.6739   LearningRate 0.0513   Epoch: 5   Global Step: 235340   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:10:00,008-Speed 2621.07 samples/sec   Loss 9.6905   LearningRate 0.0513   Epoch: 5   Global Step: 235350   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:10:03,956-Speed 2595.01 samples/sec   Loss 9.6328   LearningRate 0.0513   Epoch: 5   Global Step: 235360   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:10:07,866-Speed 2619.71 samples/sec   Loss 9.8303   LearningRate 0.0513   Epoch: 5   Global Step: 235370   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:10:11,760-Speed 2629.86 samples/sec   Loss 9.6382   LearningRate 0.0513   Epoch: 5   Global Step: 235380   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:10:15,655-Speed 2630.25 samples/sec   Loss 9.8365   LearningRate 0.0513   Epoch: 5   Global Step: 235390   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:10:19,557-Speed 2624.51 samples/sec   Loss 9.6368   LearningRate 0.0513   Epoch: 5   Global Step: 235400   Fp16 Grad Scale: 32768   Required: 67 hours
Training: 2022-04-13 22:10:23,449-Speed 2631.96 samples/sec   Loss 9.8367   LearningRate 0.0513   Epoch: 5   Global Step: 235410   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:27,344-Speed 2629.21 samples/sec   Loss 9.6563   LearningRate 0.0513   Epoch: 5   Global Step: 235420   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:31,245-Speed 2626.14 samples/sec   Loss 9.6945   LearningRate 0.0513   Epoch: 5   Global Step: 235430   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:35,138-Speed 2630.85 samples/sec   Loss 9.7751   LearningRate 0.0513   Epoch: 5   Global Step: 235440   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:39,032-Speed 2630.70 samples/sec   Loss 9.6506   LearningRate 0.0513   Epoch: 5   Global Step: 235450   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:42,924-Speed 2631.93 samples/sec   Loss 9.6121   LearningRate 0.0513   Epoch: 5   Global Step: 235460   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:46,836-Speed 2617.76 samples/sec   Loss 9.6069   LearningRate 0.0513   Epoch: 5   Global Step: 235470   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:50,733-Speed 2628.96 samples/sec   Loss 9.6682   LearningRate 0.0513   Epoch: 5   Global Step: 235480   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:54,629-Speed 2628.56 samples/sec   Loss 9.7235   LearningRate 0.0513   Epoch: 5   Global Step: 235490   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:10:58,535-Speed 2622.38 samples/sec   Loss 9.6425   LearningRate 0.0513   Epoch: 5   Global Step: 235500   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:11:02,425-Speed 2632.48 samples/sec   Loss 9.6079   LearningRate 0.0513   Epoch: 5   Global Step: 235510   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:06,315-Speed 2633.60 samples/sec   Loss 9.7048   LearningRate 0.0513   Epoch: 5   Global Step: 235520   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:10,211-Speed 2629.15 samples/sec   Loss 9.6408   LearningRate 0.0513   Epoch: 5   Global Step: 235530   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:14,110-Speed 2626.95 samples/sec   Loss 9.7062   LearningRate 0.0513   Epoch: 5   Global Step: 235540   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:18,004-Speed 2630.64 samples/sec   Loss 9.7928   LearningRate 0.0513   Epoch: 5   Global Step: 235550   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:21,896-Speed 2631.50 samples/sec   Loss 9.6764   LearningRate 0.0513   Epoch: 5   Global Step: 235560   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:25,801-Speed 2623.58 samples/sec   Loss 9.8308   LearningRate 0.0513   Epoch: 5   Global Step: 235570   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:29,694-Speed 2630.23 samples/sec   Loss 9.6778   LearningRate 0.0513   Epoch: 5   Global Step: 235580   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:33,590-Speed 2629.63 samples/sec   Loss 9.8219   LearningRate 0.0513   Epoch: 5   Global Step: 235590   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:37,481-Speed 2632.02 samples/sec   Loss 9.7869   LearningRate 0.0513   Epoch: 5   Global Step: 235600   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:11:41,376-Speed 2629.73 samples/sec   Loss 9.6826   LearningRate 0.0513   Epoch: 5   Global Step: 235610   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:11:45,272-Speed 2628.96 samples/sec   Loss 9.6824   LearningRate 0.0513   Epoch: 5   Global Step: 235620   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:11:49,177-Speed 2623.83 samples/sec   Loss 9.5521   LearningRate 0.0513   Epoch: 5   Global Step: 235630   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:11:53,233-Speed 2525.15 samples/sec   Loss 9.6498   LearningRate 0.0513   Epoch: 5   Global Step: 235640   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:11:57,343-Speed 2492.52 samples/sec   Loss 9.7593   LearningRate 0.0513   Epoch: 5   Global Step: 235650   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:12:01,277-Speed 2602.88 samples/sec   Loss 9.8059   LearningRate 0.0513   Epoch: 5   Global Step: 235660   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:05,176-Speed 2626.79 samples/sec   Loss 9.6771   LearningRate 0.0513   Epoch: 5   Global Step: 235670   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:09,074-Speed 2627.86 samples/sec   Loss 9.7064   LearningRate 0.0513   Epoch: 5   Global Step: 235680   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:12,978-Speed 2623.62 samples/sec   Loss 9.7535   LearningRate 0.0513   Epoch: 5   Global Step: 235690   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:16,884-Speed 2622.25 samples/sec   Loss 9.7739   LearningRate 0.0512   Epoch: 5   Global Step: 235700   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:20,785-Speed 2625.94 samples/sec   Loss 9.6103   LearningRate 0.0512   Epoch: 5   Global Step: 235710   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:24,682-Speed 2628.36 samples/sec   Loss 9.7072   LearningRate 0.0512   Epoch: 5   Global Step: 235720   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:28,574-Speed 2632.09 samples/sec   Loss 9.6485   LearningRate 0.0512   Epoch: 5   Global Step: 235730   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:32,472-Speed 2627.81 samples/sec   Loss 9.7072   LearningRate 0.0512   Epoch: 5   Global Step: 235740   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:36,366-Speed 2630.15 samples/sec   Loss 9.6284   LearningRate 0.0512   Epoch: 5   Global Step: 235750   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:40,260-Speed 2629.78 samples/sec   Loss 9.7014   LearningRate 0.0512   Epoch: 5   Global Step: 235760   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:12:44,151-Speed 2632.39 samples/sec   Loss 9.7314   LearningRate 0.0512   Epoch: 5   Global Step: 235770   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:48,046-Speed 2630.53 samples/sec   Loss 9.8262   LearningRate 0.0512   Epoch: 5   Global Step: 235780   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:51,937-Speed 2631.80 samples/sec   Loss 9.7813   LearningRate 0.0512   Epoch: 5   Global Step: 235790   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:55,832-Speed 2629.88 samples/sec   Loss 9.5084   LearningRate 0.0512   Epoch: 5   Global Step: 235800   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:12:59,727-Speed 2629.56 samples/sec   Loss 9.6315   LearningRate 0.0512   Epoch: 5   Global Step: 235810   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:03,634-Speed 2621.68 samples/sec   Loss 9.6594   LearningRate 0.0512   Epoch: 5   Global Step: 235820   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:07,533-Speed 2626.57 samples/sec   Loss 9.7877   LearningRate 0.0512   Epoch: 5   Global Step: 235830   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:11,430-Speed 2628.74 samples/sec   Loss 9.6317   LearningRate 0.0512   Epoch: 5   Global Step: 235840   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:15,334-Speed 2623.35 samples/sec   Loss 9.6186   LearningRate 0.0512   Epoch: 5   Global Step: 235850   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:19,234-Speed 2626.46 samples/sec   Loss 9.6559   LearningRate 0.0512   Epoch: 5   Global Step: 235860   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:23,135-Speed 2625.51 samples/sec   Loss 9.6028   LearningRate 0.0512   Epoch: 5   Global Step: 235870   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:13:27,032-Speed 2628.94 samples/sec   Loss 9.7335   LearningRate 0.0512   Epoch: 5   Global Step: 235880   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:13:30,927-Speed 2629.66 samples/sec   Loss 9.6884   LearningRate 0.0512   Epoch: 5   Global Step: 235890   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:13:34,825-Speed 2627.13 samples/sec   Loss 9.7448   LearningRate 0.0512   Epoch: 5   Global Step: 235900   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:13:38,726-Speed 2625.40 samples/sec   Loss 9.6309   LearningRate 0.0512   Epoch: 5   Global Step: 235910   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:13:42,608-Speed 2639.26 samples/sec   Loss 9.5963   LearningRate 0.0512   Epoch: 5   Global Step: 235920   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:46,506-Speed 2627.50 samples/sec   Loss 9.5436   LearningRate 0.0512   Epoch: 5   Global Step: 235930   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:50,416-Speed 2619.74 samples/sec   Loss 9.6195   LearningRate 0.0512   Epoch: 5   Global Step: 235940   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:54,306-Speed 2632.44 samples/sec   Loss 9.5578   LearningRate 0.0512   Epoch: 5   Global Step: 235950   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:13:58,209-Speed 2625.00 samples/sec   Loss 9.7493   LearningRate 0.0512   Epoch: 5   Global Step: 235960   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:02,106-Speed 2628.11 samples/sec   Loss 9.6478   LearningRate 0.0512   Epoch: 5   Global Step: 235970   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:06,001-Speed 2629.42 samples/sec   Loss 9.6296   LearningRate 0.0512   Epoch: 5   Global Step: 235980   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:09,899-Speed 2627.33 samples/sec   Loss 9.5784   LearningRate 0.0512   Epoch: 5   Global Step: 235990   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:13,795-Speed 2629.68 samples/sec   Loss 9.7253   LearningRate 0.0512   Epoch: 5   Global Step: 236000   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:17,696-Speed 2625.54 samples/sec   Loss 9.7426   LearningRate 0.0512   Epoch: 5   Global Step: 236010   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:21,600-Speed 2623.97 samples/sec   Loss 9.7399   LearningRate 0.0512   Epoch: 5   Global Step: 236020   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:14:25,496-Speed 2628.91 samples/sec   Loss 9.4397   LearningRate 0.0512   Epoch: 5   Global Step: 236030   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:14:29,379-Speed 2638.09 samples/sec   Loss 9.7469   LearningRate 0.0512   Epoch: 5   Global Step: 236040   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:33,276-Speed 2628.57 samples/sec   Loss 9.7430   LearningRate 0.0512   Epoch: 5   Global Step: 236050   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:37,212-Speed 2602.09 samples/sec   Loss 9.8899   LearningRate 0.0512   Epoch: 5   Global Step: 236060   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:41,133-Speed 2612.47 samples/sec   Loss 9.6538   LearningRate 0.0512   Epoch: 5   Global Step: 236070   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:45,027-Speed 2630.28 samples/sec   Loss 9.7708   LearningRate 0.0512   Epoch: 5   Global Step: 236080   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:48,926-Speed 2627.72 samples/sec   Loss 9.6010   LearningRate 0.0512   Epoch: 5   Global Step: 236090   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:52,823-Speed 2628.24 samples/sec   Loss 9.6986   LearningRate 0.0512   Epoch: 5   Global Step: 236100   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:14:56,719-Speed 2628.91 samples/sec   Loss 9.7660   LearningRate 0.0512   Epoch: 5   Global Step: 236110   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:00,626-Speed 2621.49 samples/sec   Loss 9.5844   LearningRate 0.0512   Epoch: 5   Global Step: 236120   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:04,521-Speed 2629.46 samples/sec   Loss 9.5850   LearningRate 0.0512   Epoch: 5   Global Step: 236130   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:08,413-Speed 2631.79 samples/sec   Loss 9.5367   LearningRate 0.0512   Epoch: 5   Global Step: 236140   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:15:12,297-Speed 2637.35 samples/sec   Loss 9.6912   LearningRate 0.0512   Epoch: 5   Global Step: 236150   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:16,192-Speed 2630.11 samples/sec   Loss 9.6956   LearningRate 0.0512   Epoch: 5   Global Step: 236160   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:20,122-Speed 2605.93 samples/sec   Loss 9.6834   LearningRate 0.0512   Epoch: 5   Global Step: 236170   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:24,016-Speed 2630.47 samples/sec   Loss 9.6983   LearningRate 0.0512   Epoch: 5   Global Step: 236180   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:27,910-Speed 2630.40 samples/sec   Loss 9.6665   LearningRate 0.0512   Epoch: 5   Global Step: 236190   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:31,923-Speed 2552.07 samples/sec   Loss 9.5978   LearningRate 0.0512   Epoch: 5   Global Step: 236200   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:35,812-Speed 2633.69 samples/sec   Loss 9.7145   LearningRate 0.0512   Epoch: 5   Global Step: 236210   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:39,715-Speed 2625.13 samples/sec   Loss 9.7228   LearningRate 0.0512   Epoch: 5   Global Step: 236220   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:43,610-Speed 2628.86 samples/sec   Loss 9.6890   LearningRate 0.0512   Epoch: 5   Global Step: 236230   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:47,541-Speed 2606.46 samples/sec   Loss 9.6790   LearningRate 0.0512   Epoch: 5   Global Step: 236240   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:15:51,447-Speed 2621.78 samples/sec   Loss 9.5257   LearningRate 0.0512   Epoch: 5   Global Step: 236250   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:15:55,353-Speed 2622.67 samples/sec   Loss 9.7073   LearningRate 0.0512   Epoch: 5   Global Step: 236260   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:15:59,251-Speed 2627.71 samples/sec   Loss 9.5046   LearningRate 0.0512   Epoch: 5   Global Step: 236270   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:16:03,145-Speed 2630.46 samples/sec   Loss 9.5542   LearningRate 0.0511   Epoch: 5   Global Step: 236280   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:16:07,039-Speed 2630.26 samples/sec   Loss 9.6944   LearningRate 0.0511   Epoch: 5   Global Step: 236290   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:16:10,915-Speed 2642.52 samples/sec   Loss 9.5768   LearningRate 0.0511   Epoch: 5   Global Step: 236300   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:14,815-Speed 2626.68 samples/sec   Loss 9.6690   LearningRate 0.0511   Epoch: 5   Global Step: 236310   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:18,732-Speed 2614.62 samples/sec   Loss 9.6634   LearningRate 0.0511   Epoch: 5   Global Step: 236320   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:22,628-Speed 2629.53 samples/sec   Loss 9.6047   LearningRate 0.0511   Epoch: 5   Global Step: 236330   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:26,531-Speed 2623.85 samples/sec   Loss 9.6344   LearningRate 0.0511   Epoch: 5   Global Step: 236340   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:30,427-Speed 2628.85 samples/sec   Loss 9.6071   LearningRate 0.0511   Epoch: 5   Global Step: 236350   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:34,356-Speed 2607.26 samples/sec   Loss 9.5685   LearningRate 0.0511   Epoch: 5   Global Step: 236360   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:38,249-Speed 2631.16 samples/sec   Loss 9.6619   LearningRate 0.0511   Epoch: 5   Global Step: 236370   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:42,153-Speed 2623.89 samples/sec   Loss 9.5389   LearningRate 0.0511   Epoch: 5   Global Step: 236380   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:46,067-Speed 2617.25 samples/sec   Loss 9.5701   LearningRate 0.0511   Epoch: 5   Global Step: 236390   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:16:50,019-Speed 2591.69 samples/sec   Loss 9.7119   LearningRate 0.0511   Epoch: 5   Global Step: 236400   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:16:53,910-Speed 2632.17 samples/sec   Loss 9.7484   LearningRate 0.0511   Epoch: 5   Global Step: 236410   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:16:57,818-Speed 2621.34 samples/sec   Loss 9.9769   LearningRate 0.0511   Epoch: 5   Global Step: 236420   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:17:01,715-Speed 2628.07 samples/sec   Loss 9.6886   LearningRate 0.0511   Epoch: 5   Global Step: 236430   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:17:05,610-Speed 2629.82 samples/sec   Loss 9.6117   LearningRate 0.0511   Epoch: 5   Global Step: 236440   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:17:09,508-Speed 2628.00 samples/sec   Loss 9.6962   LearningRate 0.0511   Epoch: 5   Global Step: 236450   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:17:13,404-Speed 2629.60 samples/sec   Loss 9.6447   LearningRate 0.0511   Epoch: 5   Global Step: 236460   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:17:17,279-Speed 2642.87 samples/sec   Loss 9.6442   LearningRate 0.0511   Epoch: 5   Global Step: 236470   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:21,185-Speed 2622.59 samples/sec   Loss 9.7325   LearningRate 0.0511   Epoch: 5   Global Step: 236480   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:25,077-Speed 2631.80 samples/sec   Loss 9.5067   LearningRate 0.0511   Epoch: 5   Global Step: 236490   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:28,969-Speed 2632.15 samples/sec   Loss 9.6437   LearningRate 0.0511   Epoch: 5   Global Step: 236500   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:32,866-Speed 2627.96 samples/sec   Loss 9.5476   LearningRate 0.0511   Epoch: 5   Global Step: 236510   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:36,763-Speed 2628.30 samples/sec   Loss 9.6733   LearningRate 0.0511   Epoch: 5   Global Step: 236520   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:40,658-Speed 2629.44 samples/sec   Loss 9.6113   LearningRate 0.0511   Epoch: 5   Global Step: 236530   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:44,556-Speed 2627.91 samples/sec   Loss 9.6417   LearningRate 0.0511   Epoch: 5   Global Step: 236540   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:48,480-Speed 2610.80 samples/sec   Loss 9.5774   LearningRate 0.0511   Epoch: 5   Global Step: 236550   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:52,384-Speed 2623.63 samples/sec   Loss 9.6518   LearningRate 0.0511   Epoch: 5   Global Step: 236560   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:17:56,286-Speed 2625.36 samples/sec   Loss 9.7686   LearningRate 0.0511   Epoch: 5   Global Step: 236570   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:18:00,205-Speed 2613.56 samples/sec   Loss 9.5466   LearningRate 0.0511   Epoch: 5   Global Step: 236580   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:18:04,100-Speed 2629.27 samples/sec   Loss 9.6511   LearningRate 0.0511   Epoch: 5   Global Step: 236590   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:18:07,973-Speed 2644.33 samples/sec   Loss 9.6534   LearningRate 0.0511   Epoch: 5   Global Step: 236600   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:11,874-Speed 2626.03 samples/sec   Loss 9.6949   LearningRate 0.0511   Epoch: 5   Global Step: 236610   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:15,773-Speed 2626.97 samples/sec   Loss 9.5971   LearningRate 0.0511   Epoch: 5   Global Step: 236620   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:19,667-Speed 2630.46 samples/sec   Loss 9.7359   LearningRate 0.0511   Epoch: 5   Global Step: 236630   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:23,563-Speed 2628.62 samples/sec   Loss 9.6852   LearningRate 0.0511   Epoch: 5   Global Step: 236640   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:27,456-Speed 2631.49 samples/sec   Loss 9.6621   LearningRate 0.0511   Epoch: 5   Global Step: 236650   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:31,351-Speed 2629.64 samples/sec   Loss 9.5832   LearningRate 0.0511   Epoch: 5   Global Step: 236660   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:35,245-Speed 2630.27 samples/sec   Loss 9.5781   LearningRate 0.0511   Epoch: 5   Global Step: 236670   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:39,139-Speed 2630.22 samples/sec   Loss 9.5112   LearningRate 0.0511   Epoch: 5   Global Step: 236680   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:43,037-Speed 2627.83 samples/sec   Loss 9.5185   LearningRate 0.0511   Epoch: 5   Global Step: 236690   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:46,932-Speed 2629.48 samples/sec   Loss 9.7118   LearningRate 0.0511   Epoch: 5   Global Step: 236700   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:18:50,828-Speed 2629.39 samples/sec   Loss 9.6263   LearningRate 0.0511   Epoch: 5   Global Step: 236710   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:18:54,710-Speed 2638.53 samples/sec   Loss 9.6693   LearningRate 0.0511   Epoch: 5   Global Step: 236720   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:18:58,605-Speed 2630.16 samples/sec   Loss 9.7265   LearningRate 0.0511   Epoch: 5   Global Step: 236730   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:02,529-Speed 2610.16 samples/sec   Loss 9.7383   LearningRate 0.0511   Epoch: 5   Global Step: 236740   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:06,425-Speed 2629.16 samples/sec   Loss 9.6659   LearningRate 0.0511   Epoch: 5   Global Step: 236750   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:10,321-Speed 2628.71 samples/sec   Loss 9.5037   LearningRate 0.0511   Epoch: 5   Global Step: 236760   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:14,226-Speed 2622.86 samples/sec   Loss 9.7815   LearningRate 0.0511   Epoch: 5   Global Step: 236770   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:18,132-Speed 2622.51 samples/sec   Loss 9.5558   LearningRate 0.0511   Epoch: 5   Global Step: 236780   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:22,024-Speed 2632.03 samples/sec   Loss 9.6067   LearningRate 0.0511   Epoch: 5   Global Step: 236790   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:25,915-Speed 2632.76 samples/sec   Loss 9.5507   LearningRate 0.0511   Epoch: 5   Global Step: 236800   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:29,811-Speed 2628.59 samples/sec   Loss 9.6969   LearningRate 0.0511   Epoch: 5   Global Step: 236810   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:33,705-Speed 2630.32 samples/sec   Loss 9.5628   LearningRate 0.0511   Epoch: 5   Global Step: 236820   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:19:37,601-Speed 2628.93 samples/sec   Loss 9.5332   LearningRate 0.0511   Epoch: 5   Global Step: 236830   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:19:41,540-Speed 2600.72 samples/sec   Loss 9.6556   LearningRate 0.0511   Epoch: 5   Global Step: 236840   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:19:45,421-Speed 2639.35 samples/sec   Loss 9.7448   LearningRate 0.0511   Epoch: 5   Global Step: 236850   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:49,315-Speed 2630.61 samples/sec   Loss 9.7453   LearningRate 0.0510   Epoch: 5   Global Step: 236860   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:53,211-Speed 2628.39 samples/sec   Loss 9.6810   LearningRate 0.0510   Epoch: 5   Global Step: 236870   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:19:57,105-Speed 2630.83 samples/sec   Loss 9.6214   LearningRate 0.0510   Epoch: 5   Global Step: 236880   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:00,998-Speed 2631.03 samples/sec   Loss 9.7246   LearningRate 0.0510   Epoch: 5   Global Step: 236890   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:04,888-Speed 2632.99 samples/sec   Loss 9.6048   LearningRate 0.0510   Epoch: 5   Global Step: 236900   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:08,811-Speed 2611.13 samples/sec   Loss 9.5717   LearningRate 0.0510   Epoch: 5   Global Step: 236910   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:12,705-Speed 2630.27 samples/sec   Loss 9.5346   LearningRate 0.0510   Epoch: 5   Global Step: 236920   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:16,630-Speed 2609.68 samples/sec   Loss 9.6144   LearningRate 0.0510   Epoch: 5   Global Step: 236930   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:20,526-Speed 2629.11 samples/sec   Loss 9.7973   LearningRate 0.0510   Epoch: 5   Global Step: 236940   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:24,423-Speed 2628.45 samples/sec   Loss 9.7516   LearningRate 0.0510   Epoch: 5   Global Step: 236950   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:20:28,318-Speed 2630.04 samples/sec   Loss 9.6064   LearningRate 0.0510   Epoch: 5   Global Step: 236960   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:20:32,210-Speed 2631.87 samples/sec   Loss 9.7135   LearningRate 0.0510   Epoch: 5   Global Step: 236970   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:20:36,108-Speed 2627.54 samples/sec   Loss 9.6018   LearningRate 0.0510   Epoch: 5   Global Step: 236980   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:20:39,989-Speed 2639.40 samples/sec   Loss 9.7401   LearningRate 0.0510   Epoch: 5   Global Step: 236990   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:43,882-Speed 2631.56 samples/sec   Loss 9.6216   LearningRate 0.0510   Epoch: 5   Global Step: 237000   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:47,777-Speed 2629.12 samples/sec   Loss 9.6820   LearningRate 0.0510   Epoch: 5   Global Step: 237010   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:51,668-Speed 2632.69 samples/sec   Loss 9.6148   LearningRate 0.0510   Epoch: 5   Global Step: 237020   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:55,571-Speed 2624.80 samples/sec   Loss 9.6471   LearningRate 0.0510   Epoch: 5   Global Step: 237030   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:20:59,477-Speed 2622.10 samples/sec   Loss 9.6663   LearningRate 0.0510   Epoch: 5   Global Step: 237040   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:03,389-Speed 2617.92 samples/sec   Loss 9.6127   LearningRate 0.0510   Epoch: 5   Global Step: 237050   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:07,285-Speed 2629.56 samples/sec   Loss 9.7094   LearningRate 0.0510   Epoch: 5   Global Step: 237060   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:11,183-Speed 2627.08 samples/sec   Loss 9.8181   LearningRate 0.0510   Epoch: 5   Global Step: 237070   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:15,079-Speed 2629.37 samples/sec   Loss 9.6885   LearningRate 0.0510   Epoch: 5   Global Step: 237080   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:18,958-Speed 2640.21 samples/sec   Loss 9.7662   LearningRate 0.0510   Epoch: 5   Global Step: 237090   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:22,853-Speed 2629.80 samples/sec   Loss 9.7612   LearningRate 0.0510   Epoch: 5   Global Step: 237100   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:26,765-Speed 2618.58 samples/sec   Loss 9.7758   LearningRate 0.0510   Epoch: 5   Global Step: 237110   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:30,674-Speed 2620.01 samples/sec   Loss 9.7971   LearningRate 0.0510   Epoch: 5   Global Step: 237120   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:34,570-Speed 2628.94 samples/sec   Loss 9.6956   LearningRate 0.0510   Epoch: 5   Global Step: 237130   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:38,504-Speed 2603.48 samples/sec   Loss 9.4805   LearningRate 0.0510   Epoch: 5   Global Step: 237140   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:42,399-Speed 2630.19 samples/sec   Loss 9.6208   LearningRate 0.0510   Epoch: 5   Global Step: 237150   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:46,297-Speed 2627.98 samples/sec   Loss 9.6919   LearningRate 0.0510   Epoch: 5   Global Step: 237160   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:50,192-Speed 2629.49 samples/sec   Loss 9.6490   LearningRate 0.0510   Epoch: 5   Global Step: 237170   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:54,096-Speed 2623.74 samples/sec   Loss 9.5316   LearningRate 0.0510   Epoch: 5   Global Step: 237180   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:21:57,997-Speed 2625.77 samples/sec   Loss 9.6983   LearningRate 0.0510   Epoch: 5   Global Step: 237190   Fp16 Grad Scale: 262144   Required: 67 hours
Training: 2022-04-13 22:22:01,871-Speed 2643.81 samples/sec   Loss 9.5725   LearningRate 0.0510   Epoch: 5   Global Step: 237200   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:22:05,763-Speed 2630.96 samples/sec   Loss 9.5487   LearningRate 0.0510   Epoch: 5   Global Step: 237210   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:22:09,664-Speed 2632.52 samples/sec   Loss 9.6562   LearningRate 0.0510   Epoch: 5   Global Step: 237220   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:22:13,559-Speed 2629.95 samples/sec   Loss 9.6381   LearningRate 0.0510   Epoch: 5   Global Step: 237230   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:22:17,436-Speed 2642.23 samples/sec   Loss 9.6046   LearningRate 0.0510   Epoch: 5   Global Step: 237240   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:21,342-Speed 2622.31 samples/sec   Loss 9.5628   LearningRate 0.0510   Epoch: 5   Global Step: 237250   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:25,259-Speed 2614.85 samples/sec   Loss 9.7132   LearningRate 0.0510   Epoch: 5   Global Step: 237260   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:29,152-Speed 2631.06 samples/sec   Loss 9.6412   LearningRate 0.0510   Epoch: 5   Global Step: 237270   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:33,042-Speed 2632.70 samples/sec   Loss 9.6422   LearningRate 0.0510   Epoch: 5   Global Step: 237280   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:36,992-Speed 2593.41 samples/sec   Loss 9.6570   LearningRate 0.0510   Epoch: 5   Global Step: 237290   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:40,879-Speed 2635.08 samples/sec   Loss 9.8004   LearningRate 0.0510   Epoch: 5   Global Step: 237300   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:44,791-Speed 2618.25 samples/sec   Loss 9.7383   LearningRate 0.0510   Epoch: 5   Global Step: 237310   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:48,696-Speed 2622.86 samples/sec   Loss 9.5697   LearningRate 0.0510   Epoch: 5   Global Step: 237320   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:52,652-Speed 2589.37 samples/sec   Loss 9.6855   LearningRate 0.0510   Epoch: 5   Global Step: 237330   Fp16 Grad Scale: 65536   Required: 67 hours
Training: 2022-04-13 22:22:56,545-Speed 2630.96 samples/sec   Loss 9.5905   LearningRate 0.0510   Epoch: 5   Global Step: 237340   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:23:00,438-Speed 2631.07 samples/sec   Loss 9.7095   LearningRate 0.0510   Epoch: 5   Global Step: 237350   Fp16 Grad Scale: 131072   Required: 67 hours
Training: 2022-04-13 22:23:04,370-Speed 2604.88 samples/sec   Loss 9.6726   LearningRate 0.0510   Epoch: 5   Global Step: 237360   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:08,264-Speed 2630.72 samples/sec   Loss 9.5924   LearningRate 0.0510   Epoch: 5   Global Step: 237370   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:12,159-Speed 2629.19 samples/sec   Loss 9.6212   LearningRate 0.0510   Epoch: 5   Global Step: 237380   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:16,075-Speed 2616.00 samples/sec   Loss 9.6786   LearningRate 0.0510   Epoch: 5   Global Step: 237390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:19,968-Speed 2630.67 samples/sec   Loss 9.6644   LearningRate 0.0510   Epoch: 5   Global Step: 237400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:23,869-Speed 2625.71 samples/sec   Loss 9.6284   LearningRate 0.0510   Epoch: 5   Global Step: 237410   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:27,774-Speed 2623.23 samples/sec   Loss 9.7717   LearningRate 0.0510   Epoch: 5   Global Step: 237420   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:31,681-Speed 2621.61 samples/sec   Loss 9.6909   LearningRate 0.0510   Epoch: 5   Global Step: 237430   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:35,586-Speed 2622.63 samples/sec   Loss 9.8044   LearningRate 0.0509   Epoch: 5   Global Step: 237440   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:23:39,511-Speed 2609.09 samples/sec   Loss 9.5783   LearningRate 0.0509   Epoch: 5   Global Step: 237450   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:23:43,407-Speed 2629.83 samples/sec   Loss 9.6575   LearningRate 0.0509   Epoch: 5   Global Step: 237460   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:23:47,302-Speed 2629.57 samples/sec   Loss 9.7507   LearningRate 0.0509   Epoch: 5   Global Step: 237470   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:23:51,214-Speed 2618.41 samples/sec   Loss 9.6082   LearningRate 0.0509   Epoch: 5   Global Step: 237480   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:23:55,093-Speed 2640.62 samples/sec   Loss 9.6826   LearningRate 0.0509   Epoch: 5   Global Step: 237490   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:23:58,988-Speed 2630.24 samples/sec   Loss 9.6751   LearningRate 0.0509   Epoch: 5   Global Step: 237500   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:02,887-Speed 2626.85 samples/sec   Loss 9.7269   LearningRate 0.0509   Epoch: 5   Global Step: 237510   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:06,784-Speed 2628.18 samples/sec   Loss 9.7018   LearningRate 0.0509   Epoch: 5   Global Step: 237520   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:10,681-Speed 2628.36 samples/sec   Loss 9.6690   LearningRate 0.0509   Epoch: 5   Global Step: 237530   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:14,597-Speed 2616.26 samples/sec   Loss 9.5834   LearningRate 0.0509   Epoch: 5   Global Step: 237540   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:18,553-Speed 2588.94 samples/sec   Loss 9.5216   LearningRate 0.0509   Epoch: 5   Global Step: 237550   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:22,462-Speed 2620.58 samples/sec   Loss 9.5822   LearningRate 0.0509   Epoch: 5   Global Step: 237560   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:26,366-Speed 2623.33 samples/sec   Loss 9.5265   LearningRate 0.0509   Epoch: 5   Global Step: 237570   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:30,310-Speed 2597.60 samples/sec   Loss 9.7859   LearningRate 0.0509   Epoch: 5   Global Step: 237580   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:24:34,224-Speed 2616.30 samples/sec   Loss 9.7176   LearningRate 0.0509   Epoch: 5   Global Step: 237590   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:24:38,145-Speed 2612.06 samples/sec   Loss 9.7267   LearningRate 0.0509   Epoch: 5   Global Step: 237600   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:24:42,055-Speed 2619.82 samples/sec   Loss 9.5985   LearningRate 0.0509   Epoch: 5   Global Step: 237610   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:24:46,061-Speed 2556.64 samples/sec   Loss 9.6001   LearningRate 0.0509   Epoch: 5   Global Step: 237620   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:24:49,956-Speed 2630.31 samples/sec   Loss 9.6581   LearningRate 0.0509   Epoch: 5   Global Step: 237630   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:24:53,852-Speed 2628.48 samples/sec   Loss 9.6555   LearningRate 0.0509   Epoch: 5   Global Step: 237640   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:24:57,746-Speed 2630.68 samples/sec   Loss 9.7336   LearningRate 0.0509   Epoch: 5   Global Step: 237650   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:25:01,639-Speed 2631.40 samples/sec   Loss 9.5738   LearningRate 0.0509   Epoch: 5   Global Step: 237660   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:25:05,514-Speed 2643.01 samples/sec   Loss 9.6523   LearningRate 0.0509   Epoch: 5   Global Step: 237670   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:25:09,418-Speed 2622.90 samples/sec   Loss 9.6131   LearningRate 0.0509   Epoch: 5   Global Step: 237680   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:25:13,313-Speed 2630.83 samples/sec   Loss 9.5005   LearningRate 0.0509   Epoch: 5   Global Step: 237690   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:25:17,212-Speed 2626.63 samples/sec   Loss 9.6487   LearningRate 0.0509   Epoch: 5   Global Step: 237700   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:25:21,049-Speed 2669.87 samples/sec   Loss 10.9180   LearningRate 0.0509   Epoch: 5   Global Step: 237710   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:24,949-Speed 2626.16 samples/sec   Loss 10.5578   LearningRate 0.0509   Epoch: 5   Global Step: 237720   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:28,846-Speed 2628.98 samples/sec   Loss 9.9241   LearningRate 0.0509   Epoch: 5   Global Step: 237730   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:32,830-Speed 2570.25 samples/sec   Loss 9.9336   LearningRate 0.0509   Epoch: 5   Global Step: 237740   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:36,720-Speed 2632.94 samples/sec   Loss 9.9190   LearningRate 0.0509   Epoch: 5   Global Step: 237750   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:40,620-Speed 2626.45 samples/sec   Loss 9.7368   LearningRate 0.0509   Epoch: 5   Global Step: 237760   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:44,520-Speed 2626.77 samples/sec   Loss 9.7254   LearningRate 0.0509   Epoch: 5   Global Step: 237770   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:48,414-Speed 2629.98 samples/sec   Loss 9.8848   LearningRate 0.0509   Epoch: 5   Global Step: 237780   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:52,305-Speed 2632.29 samples/sec   Loss 9.8414   LearningRate 0.0509   Epoch: 5   Global Step: 237790   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:25:56,196-Speed 2632.71 samples/sec   Loss 9.7727   LearningRate 0.0509   Epoch: 5   Global Step: 237800   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:26:00,122-Speed 2608.97 samples/sec   Loss 9.7137   LearningRate 0.0509   Epoch: 5   Global Step: 237810   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:04,014-Speed 2631.52 samples/sec   Loss 9.7447   LearningRate 0.0509   Epoch: 5   Global Step: 237820   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:07,920-Speed 2622.22 samples/sec   Loss 9.6381   LearningRate 0.0509   Epoch: 5   Global Step: 237830   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:11,821-Speed 2625.81 samples/sec   Loss 9.7811   LearningRate 0.0509   Epoch: 5   Global Step: 237840   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:15,726-Speed 2623.33 samples/sec   Loss 9.8625   LearningRate 0.0509   Epoch: 5   Global Step: 237850   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:19,630-Speed 2623.97 samples/sec   Loss 9.7596   LearningRate 0.0509   Epoch: 5   Global Step: 237860   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:23,521-Speed 2631.95 samples/sec   Loss 9.7315   LearningRate 0.0509   Epoch: 5   Global Step: 237870   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:27,430-Speed 2621.00 samples/sec   Loss 9.7744   LearningRate 0.0509   Epoch: 5   Global Step: 237880   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:31,325-Speed 2629.24 samples/sec   Loss 9.5963   LearningRate 0.0509   Epoch: 5   Global Step: 237890   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:35,229-Speed 2624.44 samples/sec   Loss 9.6569   LearningRate 0.0509   Epoch: 5   Global Step: 237900   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:26:39,144-Speed 2616.31 samples/sec   Loss 9.9626   LearningRate 0.0509   Epoch: 5   Global Step: 237910   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:26:43,044-Speed 2626.49 samples/sec   Loss 10.2195   LearningRate 0.0509   Epoch: 5   Global Step: 237920   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:26:46,939-Speed 2629.41 samples/sec   Loss 9.9637   LearningRate 0.0509   Epoch: 5   Global Step: 237930   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:26:50,837-Speed 2628.24 samples/sec   Loss 9.8019   LearningRate 0.0509   Epoch: 5   Global Step: 237940   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:26:54,744-Speed 2621.81 samples/sec   Loss 9.9155   LearningRate 0.0509   Epoch: 5   Global Step: 237950   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:26:58,660-Speed 2615.56 samples/sec   Loss 9.7602   LearningRate 0.0509   Epoch: 5   Global Step: 237960   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:27:02,638-Speed 2574.62 samples/sec   Loss 9.7123   LearningRate 0.0509   Epoch: 5   Global Step: 237970   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:27:06,606-Speed 2581.27 samples/sec   Loss 9.6146   LearningRate 0.0509   Epoch: 5   Global Step: 237980   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:27:10,513-Speed 2621.91 samples/sec   Loss 9.6617   LearningRate 0.0509   Epoch: 5   Global Step: 237990   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:27:14,415-Speed 2625.19 samples/sec   Loss 9.6413   LearningRate 0.0509   Epoch: 5   Global Step: 238000   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:27:18,312-Speed 2628.17 samples/sec   Loss 9.6201   LearningRate 0.0509   Epoch: 5   Global Step: 238010   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:22,219-Speed 2621.60 samples/sec   Loss 9.6604   LearningRate 0.0508   Epoch: 5   Global Step: 238020   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:26,111-Speed 2632.03 samples/sec   Loss 9.7983   LearningRate 0.0508   Epoch: 5   Global Step: 238030   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:30,002-Speed 2632.22 samples/sec   Loss 9.8264   LearningRate 0.0508   Epoch: 5   Global Step: 238040   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:33,895-Speed 2631.17 samples/sec   Loss 9.7625   LearningRate 0.0508   Epoch: 5   Global Step: 238050   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:37,793-Speed 2627.54 samples/sec   Loss 9.6017   LearningRate 0.0508   Epoch: 5   Global Step: 238060   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:41,694-Speed 2625.50 samples/sec   Loss 9.7294   LearningRate 0.0508   Epoch: 5   Global Step: 238070   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:45,593-Speed 2627.26 samples/sec   Loss 9.6919   LearningRate 0.0508   Epoch: 5   Global Step: 238080   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:49,488-Speed 2629.35 samples/sec   Loss 9.7363   LearningRate 0.0508   Epoch: 5   Global Step: 238090   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:53,379-Speed 2633.18 samples/sec   Loss 9.6182   LearningRate 0.0508   Epoch: 5   Global Step: 238100   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:27:57,277-Speed 2627.32 samples/sec   Loss 9.5102   LearningRate 0.0508   Epoch: 5   Global Step: 238110   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:01,176-Speed 2626.68 samples/sec   Loss 9.7035   LearningRate 0.0508   Epoch: 5   Global Step: 238120   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:05,116-Speed 2599.69 samples/sec   Loss 9.7175   LearningRate 0.0508   Epoch: 5   Global Step: 238130   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:09,037-Speed 2612.46 samples/sec   Loss 9.8575   LearningRate 0.0508   Epoch: 5   Global Step: 238140   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:12,942-Speed 2622.79 samples/sec   Loss 9.6981   LearningRate 0.0508   Epoch: 5   Global Step: 238150   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:16,837-Speed 2630.72 samples/sec   Loss 9.6875   LearningRate 0.0508   Epoch: 5   Global Step: 238160   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:20,733-Speed 2628.56 samples/sec   Loss 9.6225   LearningRate 0.0508   Epoch: 5   Global Step: 238170   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:24,633-Speed 2626.85 samples/sec   Loss 9.6387   LearningRate 0.0508   Epoch: 5   Global Step: 238180   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:28,550-Speed 2614.70 samples/sec   Loss 9.5213   LearningRate 0.0508   Epoch: 5   Global Step: 238190   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:32,469-Speed 2613.40 samples/sec   Loss 9.6932   LearningRate 0.0508   Epoch: 5   Global Step: 238200   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:28:36,374-Speed 2623.12 samples/sec   Loss 9.6191   LearningRate 0.0508   Epoch: 5   Global Step: 238210   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:28:40,268-Speed 2630.73 samples/sec   Loss 9.7425   LearningRate 0.0508   Epoch: 5   Global Step: 238220   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:28:44,172-Speed 2623.64 samples/sec   Loss 9.6397   LearningRate 0.0508   Epoch: 5   Global Step: 238230   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:28:48,083-Speed 2618.75 samples/sec   Loss 9.5447   LearningRate 0.0508   Epoch: 5   Global Step: 238240   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:28:51,979-Speed 2629.14 samples/sec   Loss 9.6771   LearningRate 0.0508   Epoch: 5   Global Step: 238250   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:28:55,871-Speed 2631.91 samples/sec   Loss 9.6930   LearningRate 0.0508   Epoch: 5   Global Step: 238260   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:28:59,767-Speed 2628.85 samples/sec   Loss 9.5158   LearningRate 0.0508   Epoch: 5   Global Step: 238270   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:29:03,646-Speed 2639.71 samples/sec   Loss 9.5982   LearningRate 0.0508   Epoch: 5   Global Step: 238280   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:07,548-Speed 2625.37 samples/sec   Loss 9.7792   LearningRate 0.0508   Epoch: 5   Global Step: 238290   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:11,445-Speed 2628.73 samples/sec   Loss 9.5279   LearningRate 0.0508   Epoch: 5   Global Step: 238300   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:15,350-Speed 2622.89 samples/sec   Loss 9.7304   LearningRate 0.0508   Epoch: 5   Global Step: 238310   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:19,256-Speed 2623.16 samples/sec   Loss 9.5259   LearningRate 0.0508   Epoch: 5   Global Step: 238320   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:23,158-Speed 2625.08 samples/sec   Loss 9.5878   LearningRate 0.0508   Epoch: 5   Global Step: 238330   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:27,053-Speed 2630.04 samples/sec   Loss 9.6982   LearningRate 0.0508   Epoch: 5   Global Step: 238340   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:30,949-Speed 2628.36 samples/sec   Loss 9.6239   LearningRate 0.0508   Epoch: 5   Global Step: 238350   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:34,840-Speed 2632.05 samples/sec   Loss 9.6458   LearningRate 0.0508   Epoch: 5   Global Step: 238360   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:38,734-Speed 2630.63 samples/sec   Loss 9.8311   LearningRate 0.0508   Epoch: 5   Global Step: 238370   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:29:42,627-Speed 2631.47 samples/sec   Loss 9.5519   LearningRate 0.0508   Epoch: 5   Global Step: 238380   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:29:46,521-Speed 2629.88 samples/sec   Loss 9.6572   LearningRate 0.0508   Epoch: 5   Global Step: 238390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:29:50,444-Speed 2611.50 samples/sec   Loss 9.7158   LearningRate 0.0508   Epoch: 5   Global Step: 238400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:29:54,407-Speed 2584.21 samples/sec   Loss 9.6974   LearningRate 0.0508   Epoch: 5   Global Step: 238410   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:29:58,304-Speed 2628.52 samples/sec   Loss 9.7066   LearningRate 0.0508   Epoch: 5   Global Step: 238420   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:02,200-Speed 2629.48 samples/sec   Loss 9.7074   LearningRate 0.0508   Epoch: 5   Global Step: 238430   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:06,096-Speed 2628.96 samples/sec   Loss 9.7128   LearningRate 0.0508   Epoch: 5   Global Step: 238440   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:09,992-Speed 2628.37 samples/sec   Loss 9.5827   LearningRate 0.0508   Epoch: 5   Global Step: 238450   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:13,894-Speed 2625.32 samples/sec   Loss 9.4651   LearningRate 0.0508   Epoch: 5   Global Step: 238460   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:17,789-Speed 2629.52 samples/sec   Loss 9.6660   LearningRate 0.0508   Epoch: 5   Global Step: 238470   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:21,684-Speed 2630.21 samples/sec   Loss 9.5593   LearningRate 0.0508   Epoch: 5   Global Step: 238480   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:30:25,564-Speed 2639.45 samples/sec   Loss 9.6959   LearningRate 0.0508   Epoch: 5   Global Step: 238490   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:29,459-Speed 2630.08 samples/sec   Loss 9.6461   LearningRate 0.0508   Epoch: 5   Global Step: 238500   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:33,361-Speed 2624.58 samples/sec   Loss 9.6648   LearningRate 0.0508   Epoch: 5   Global Step: 238510   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:37,255-Speed 2630.33 samples/sec   Loss 9.6478   LearningRate 0.0508   Epoch: 5   Global Step: 238520   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:41,156-Speed 2625.14 samples/sec   Loss 9.5149   LearningRate 0.0508   Epoch: 5   Global Step: 238530   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:45,069-Speed 2618.49 samples/sec   Loss 9.7228   LearningRate 0.0508   Epoch: 5   Global Step: 238540   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:48,968-Speed 2626.70 samples/sec   Loss 9.5042   LearningRate 0.0508   Epoch: 5   Global Step: 238550   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:52,866-Speed 2628.01 samples/sec   Loss 9.5073   LearningRate 0.0508   Epoch: 5   Global Step: 238560   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:30:56,782-Speed 2615.10 samples/sec   Loss 9.5730   LearningRate 0.0508   Epoch: 5   Global Step: 238570   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:00,677-Speed 2629.99 samples/sec   Loss 9.7565   LearningRate 0.0508   Epoch: 5   Global Step: 238580   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:04,579-Speed 2624.98 samples/sec   Loss 9.6984   LearningRate 0.0508   Epoch: 5   Global Step: 238590   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:31:08,476-Speed 2628.48 samples/sec   Loss 9.6380   LearningRate 0.0507   Epoch: 5   Global Step: 238600   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:31:12,350-Speed 2643.67 samples/sec   Loss 9.6669   LearningRate 0.0507   Epoch: 5   Global Step: 238610   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:16,245-Speed 2629.44 samples/sec   Loss 9.6810   LearningRate 0.0507   Epoch: 5   Global Step: 238620   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:20,148-Speed 2625.05 samples/sec   Loss 9.6389   LearningRate 0.0507   Epoch: 5   Global Step: 238630   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:24,045-Speed 2628.30 samples/sec   Loss 9.5675   LearningRate 0.0507   Epoch: 5   Global Step: 238640   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:27,944-Speed 2626.79 samples/sec   Loss 9.6317   LearningRate 0.0507   Epoch: 5   Global Step: 238650   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:31,838-Speed 2630.15 samples/sec   Loss 9.6046   LearningRate 0.0507   Epoch: 5   Global Step: 238660   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:35,756-Speed 2613.90 samples/sec   Loss 9.6210   LearningRate 0.0507   Epoch: 5   Global Step: 238670   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:39,653-Speed 2628.53 samples/sec   Loss 9.5847   LearningRate 0.0507   Epoch: 5   Global Step: 238680   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:43,550-Speed 2627.97 samples/sec   Loss 9.6570   LearningRate 0.0507   Epoch: 5   Global Step: 238690   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:47,443-Speed 2630.78 samples/sec   Loss 9.5857   LearningRate 0.0507   Epoch: 5   Global Step: 238700   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:31:51,342-Speed 2627.36 samples/sec   Loss 9.5127   LearningRate 0.0507   Epoch: 5   Global Step: 238710   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:31:55,244-Speed 2625.38 samples/sec   Loss 9.7246   LearningRate 0.0507   Epoch: 5   Global Step: 238720   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:31:59,141-Speed 2628.00 samples/sec   Loss 9.6734   LearningRate 0.0507   Epoch: 5   Global Step: 238730   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:03,043-Speed 2625.27 samples/sec   Loss 9.7323   LearningRate 0.0507   Epoch: 5   Global Step: 238740   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:06,945-Speed 2624.76 samples/sec   Loss 9.6729   LearningRate 0.0507   Epoch: 5   Global Step: 238750   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:10,849-Speed 2623.62 samples/sec   Loss 9.5945   LearningRate 0.0507   Epoch: 5   Global Step: 238760   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:14,755-Speed 2622.42 samples/sec   Loss 9.6388   LearningRate 0.0507   Epoch: 5   Global Step: 238770   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:18,685-Speed 2605.89 samples/sec   Loss 9.5103   LearningRate 0.0507   Epoch: 5   Global Step: 238780   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:22,627-Speed 2598.29 samples/sec   Loss 9.5403   LearningRate 0.0507   Epoch: 5   Global Step: 238790   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:26,530-Speed 2624.31 samples/sec   Loss 9.5917   LearningRate 0.0507   Epoch: 5   Global Step: 238800   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:30,433-Speed 2624.44 samples/sec   Loss 9.5826   LearningRate 0.0507   Epoch: 5   Global Step: 238810   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:32:34,325-Speed 2631.68 samples/sec   Loss 9.4727   LearningRate 0.0507   Epoch: 5   Global Step: 238820   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:32:38,226-Speed 2626.01 samples/sec   Loss 9.6242   LearningRate 0.0507   Epoch: 5   Global Step: 238830   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:32:42,125-Speed 2626.92 samples/sec   Loss 9.6659   LearningRate 0.0507   Epoch: 5   Global Step: 238840   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:32:46,060-Speed 2602.88 samples/sec   Loss 9.6400   LearningRate 0.0507   Epoch: 5   Global Step: 238850   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:32:49,960-Speed 2626.14 samples/sec   Loss 9.6881   LearningRate 0.0507   Epoch: 5   Global Step: 238860   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:32:53,898-Speed 2601.17 samples/sec   Loss 9.7768   LearningRate 0.0507   Epoch: 5   Global Step: 238870   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:32:57,881-Speed 2571.29 samples/sec   Loss 9.6120   LearningRate 0.0507   Epoch: 5   Global Step: 238880   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:33:01,798-Speed 2615.29 samples/sec   Loss 9.6991   LearningRate 0.0507   Epoch: 5   Global Step: 238890   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:33:05,691-Speed 2630.21 samples/sec   Loss 9.5686   LearningRate 0.0507   Epoch: 5   Global Step: 238900   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:33:09,626-Speed 2603.82 samples/sec   Loss 9.5242   LearningRate 0.0507   Epoch: 5   Global Step: 238910   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:33:13,521-Speed 2629.46 samples/sec   Loss 9.6910   LearningRate 0.0507   Epoch: 5   Global Step: 238920   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:17,415-Speed 2630.33 samples/sec   Loss 9.4934   LearningRate 0.0507   Epoch: 5   Global Step: 238930   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:21,309-Speed 2630.53 samples/sec   Loss 9.6179   LearningRate 0.0507   Epoch: 5   Global Step: 238940   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:25,208-Speed 2626.32 samples/sec   Loss 9.7767   LearningRate 0.0507   Epoch: 5   Global Step: 238950   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:29,115-Speed 2622.20 samples/sec   Loss 9.6037   LearningRate 0.0507   Epoch: 5   Global Step: 238960   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:33,017-Speed 2624.80 samples/sec   Loss 9.5474   LearningRate 0.0507   Epoch: 5   Global Step: 238970   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:36,926-Speed 2620.11 samples/sec   Loss 9.8367   LearningRate 0.0507   Epoch: 5   Global Step: 238980   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:40,826-Speed 2626.07 samples/sec   Loss 9.5452   LearningRate 0.0507   Epoch: 5   Global Step: 238990   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:44,723-Speed 2634.94 samples/sec   Loss 9.5831   LearningRate 0.0507   Epoch: 5   Global Step: 239000   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:33:48,598-Speed 2642.93 samples/sec   Loss 9.6567   LearningRate 0.0507   Epoch: 5   Global Step: 239010   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:33:52,498-Speed 2626.77 samples/sec   Loss 9.5911   LearningRate 0.0507   Epoch: 5   Global Step: 239020   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:33:56,394-Speed 2629.16 samples/sec   Loss 9.6234   LearningRate 0.0507   Epoch: 5   Global Step: 239030   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:00,295-Speed 2626.12 samples/sec   Loss 9.6211   LearningRate 0.0507   Epoch: 5   Global Step: 239040   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:04,205-Speed 2619.60 samples/sec   Loss 9.6483   LearningRate 0.0507   Epoch: 5   Global Step: 239050   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:08,109-Speed 2623.08 samples/sec   Loss 9.7200   LearningRate 0.0507   Epoch: 5   Global Step: 239060   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:12,029-Speed 2612.55 samples/sec   Loss 9.6056   LearningRate 0.0507   Epoch: 5   Global Step: 239070   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:15,932-Speed 2624.70 samples/sec   Loss 9.7006   LearningRate 0.0507   Epoch: 5   Global Step: 239080   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:19,825-Speed 2631.34 samples/sec   Loss 9.6033   LearningRate 0.0507   Epoch: 5   Global Step: 239090   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:23,719-Speed 2630.24 samples/sec   Loss 9.6665   LearningRate 0.0507   Epoch: 5   Global Step: 239100   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:27,628-Speed 2620.74 samples/sec   Loss 9.7288   LearningRate 0.0507   Epoch: 5   Global Step: 239110   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:34:31,497-Speed 2647.21 samples/sec   Loss 9.6068   LearningRate 0.0507   Epoch: 5   Global Step: 239120   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:35,391-Speed 2630.07 samples/sec   Loss 9.7058   LearningRate 0.0507   Epoch: 5   Global Step: 239130   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:39,289-Speed 2627.81 samples/sec   Loss 9.8129   LearningRate 0.0507   Epoch: 5   Global Step: 239140   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:43,183-Speed 2630.28 samples/sec   Loss 9.6292   LearningRate 0.0507   Epoch: 5   Global Step: 239150   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:47,078-Speed 2630.23 samples/sec   Loss 9.6994   LearningRate 0.0507   Epoch: 5   Global Step: 239160   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:50,972-Speed 2630.67 samples/sec   Loss 9.7340   LearningRate 0.0507   Epoch: 5   Global Step: 239170   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:54,868-Speed 2628.84 samples/sec   Loss 9.4168   LearningRate 0.0506   Epoch: 5   Global Step: 239180   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:34:58,762-Speed 2630.04 samples/sec   Loss 9.5664   LearningRate 0.0506   Epoch: 5   Global Step: 239190   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:35:02,659-Speed 2628.77 samples/sec   Loss 9.5271   LearningRate 0.0506   Epoch: 5   Global Step: 239200   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:35:06,570-Speed 2618.87 samples/sec   Loss 9.5342   LearningRate 0.0506   Epoch: 5   Global Step: 239210   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:35:10,495-Speed 2609.45 samples/sec   Loss 9.5612   LearningRate 0.0506   Epoch: 5   Global Step: 239220   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:14,404-Speed 2620.60 samples/sec   Loss 9.5705   LearningRate 0.0506   Epoch: 5   Global Step: 239230   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:18,289-Speed 2636.54 samples/sec   Loss 9.3532   LearningRate 0.0506   Epoch: 5   Global Step: 239240   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:22,208-Speed 2613.42 samples/sec   Loss 9.5635   LearningRate 0.0506   Epoch: 5   Global Step: 239250   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:26,104-Speed 2628.70 samples/sec   Loss 9.7912   LearningRate 0.0506   Epoch: 5   Global Step: 239260   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:30,137-Speed 2540.46 samples/sec   Loss 9.7369   LearningRate 0.0506   Epoch: 5   Global Step: 239270   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:34,033-Speed 2628.80 samples/sec   Loss 9.5918   LearningRate 0.0506   Epoch: 5   Global Step: 239280   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:37,934-Speed 2625.73 samples/sec   Loss 9.7405   LearningRate 0.0506   Epoch: 5   Global Step: 239290   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:35:41,825-Speed 2631.88 samples/sec   Loss 9.7451   LearningRate 0.0506   Epoch: 5   Global Step: 239300   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:35:45,720-Speed 2629.94 samples/sec   Loss 9.5564   LearningRate 0.0506   Epoch: 5   Global Step: 239310   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:35:49,617-Speed 2628.45 samples/sec   Loss 9.5339   LearningRate 0.0506   Epoch: 5   Global Step: 239320   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:35:53,511-Speed 2630.25 samples/sec   Loss 9.7271   LearningRate 0.0506   Epoch: 5   Global Step: 239330   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:35:57,405-Speed 2630.40 samples/sec   Loss 9.5353   LearningRate 0.0506   Epoch: 5   Global Step: 239340   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:01,299-Speed 2630.24 samples/sec   Loss 9.6525   LearningRate 0.0506   Epoch: 5   Global Step: 239350   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:05,203-Speed 2624.03 samples/sec   Loss 9.7727   LearningRate 0.0506   Epoch: 5   Global Step: 239360   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:09,092-Speed 2633.77 samples/sec   Loss 9.6135   LearningRate 0.0506   Epoch: 5   Global Step: 239370   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:12,986-Speed 2629.99 samples/sec   Loss 9.6289   LearningRate 0.0506   Epoch: 5   Global Step: 239380   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:16,878-Speed 2631.50 samples/sec   Loss 9.6026   LearningRate 0.0506   Epoch: 5   Global Step: 239390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:20,760-Speed 2638.89 samples/sec   Loss 9.5305   LearningRate 0.0506   Epoch: 5   Global Step: 239400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:24,662-Speed 2625.39 samples/sec   Loss 9.5489   LearningRate 0.0506   Epoch: 5   Global Step: 239410   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:28,578-Speed 2615.55 samples/sec   Loss 9.7963   LearningRate 0.0506   Epoch: 5   Global Step: 239420   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:32,489-Speed 2619.24 samples/sec   Loss 9.7612   LearningRate 0.0506   Epoch: 5   Global Step: 239430   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:36,382-Speed 2630.78 samples/sec   Loss 9.5960   LearningRate 0.0506   Epoch: 5   Global Step: 239440   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:40,293-Speed 2618.90 samples/sec   Loss 9.5748   LearningRate 0.0506   Epoch: 5   Global Step: 239450   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:44,185-Speed 2631.91 samples/sec   Loss 9.5992   LearningRate 0.0506   Epoch: 5   Global Step: 239460   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:48,101-Speed 2615.42 samples/sec   Loss 9.5432   LearningRate 0.0506   Epoch: 5   Global Step: 239470   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:51,996-Speed 2629.53 samples/sec   Loss 9.5112   LearningRate 0.0506   Epoch: 5   Global Step: 239480   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:55,896-Speed 2626.12 samples/sec   Loss 9.6065   LearningRate 0.0506   Epoch: 5   Global Step: 239490   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:36:59,809-Speed 2618.43 samples/sec   Loss 9.6752   LearningRate 0.0506   Epoch: 5   Global Step: 239500   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:37:03,706-Speed 2628.26 samples/sec   Loss 9.5553   LearningRate 0.0506   Epoch: 5   Global Step: 239510   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:37:07,600-Speed 2629.60 samples/sec   Loss 9.6292   LearningRate 0.0506   Epoch: 5   Global Step: 239520   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:37:11,501-Speed 2625.51 samples/sec   Loss 9.7144   LearningRate 0.0506   Epoch: 5   Global Step: 239530   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:37:15,389-Speed 2634.96 samples/sec   Loss 9.6763   LearningRate 0.0506   Epoch: 5   Global Step: 239540   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:19,284-Speed 2629.25 samples/sec   Loss 9.6021   LearningRate 0.0506   Epoch: 5   Global Step: 239550   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:23,208-Speed 2610.84 samples/sec   Loss 9.5623   LearningRate 0.0506   Epoch: 5   Global Step: 239560   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:27,104-Speed 2629.06 samples/sec   Loss 9.5855   LearningRate 0.0506   Epoch: 5   Global Step: 239570   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:31,052-Speed 2594.49 samples/sec   Loss 9.6279   LearningRate 0.0506   Epoch: 5   Global Step: 239580   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:34,954-Speed 2625.12 samples/sec   Loss 9.5373   LearningRate 0.0506   Epoch: 5   Global Step: 239590   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:38,852-Speed 2627.17 samples/sec   Loss 9.5985   LearningRate 0.0506   Epoch: 5   Global Step: 239600   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:42,748-Speed 2629.05 samples/sec   Loss 9.5146   LearningRate 0.0506   Epoch: 5   Global Step: 239610   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:46,644-Speed 2629.78 samples/sec   Loss 9.6765   LearningRate 0.0506   Epoch: 5   Global Step: 239620   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:50,540-Speed 2628.89 samples/sec   Loss 9.5264   LearningRate 0.0506   Epoch: 5   Global Step: 239630   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:37:54,433-Speed 2631.12 samples/sec   Loss 9.6811   LearningRate 0.0506   Epoch: 5   Global Step: 239640   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:37:58,329-Speed 2628.84 samples/sec   Loss 9.6771   LearningRate 0.0506   Epoch: 5   Global Step: 239650   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:38:02,231-Speed 2624.84 samples/sec   Loss 9.7347   LearningRate 0.0506   Epoch: 5   Global Step: 239660   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:38:06,126-Speed 2629.62 samples/sec   Loss 9.5918   LearningRate 0.0506   Epoch: 5   Global Step: 239670   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:38:10,021-Speed 2630.00 samples/sec   Loss 9.6581   LearningRate 0.0506   Epoch: 5   Global Step: 239680   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:38:13,898-Speed 2641.51 samples/sec   Loss 9.7239   LearningRate 0.0506   Epoch: 5   Global Step: 239690   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:17,793-Speed 2629.93 samples/sec   Loss 9.6733   LearningRate 0.0506   Epoch: 5   Global Step: 239700   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:21,687-Speed 2630.21 samples/sec   Loss 9.6765   LearningRate 0.0506   Epoch: 5   Global Step: 239710   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:25,580-Speed 2631.26 samples/sec   Loss 9.7637   LearningRate 0.0506   Epoch: 5   Global Step: 239720   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:29,494-Speed 2617.11 samples/sec   Loss 9.7022   LearningRate 0.0506   Epoch: 5   Global Step: 239730   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:33,409-Speed 2615.83 samples/sec   Loss 9.7183   LearningRate 0.0506   Epoch: 5   Global Step: 239740   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:37,308-Speed 2627.16 samples/sec   Loss 9.6106   LearningRate 0.0506   Epoch: 5   Global Step: 239750   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:41,204-Speed 2629.35 samples/sec   Loss 9.6950   LearningRate 0.0506   Epoch: 5   Global Step: 239760   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:45,100-Speed 2629.37 samples/sec   Loss 9.7001   LearningRate 0.0505   Epoch: 5   Global Step: 239770   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:48,996-Speed 2628.41 samples/sec   Loss 9.5944   LearningRate 0.0505   Epoch: 5   Global Step: 239780   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:38:52,902-Speed 2622.77 samples/sec   Loss 9.6273   LearningRate 0.0505   Epoch: 5   Global Step: 239790   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:38:56,715-Speed 2686.47 samples/sec   Loss 9.9507   LearningRate 0.0505   Epoch: 5   Global Step: 239800   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:00,610-Speed 2629.55 samples/sec   Loss 10.8635   LearningRate 0.0505   Epoch: 5   Global Step: 239810   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:04,511-Speed 2624.81 samples/sec   Loss 10.1667   LearningRate 0.0505   Epoch: 5   Global Step: 239820   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:08,417-Speed 2622.58 samples/sec   Loss 9.8011   LearningRate 0.0505   Epoch: 5   Global Step: 239830   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:12,317-Speed 2626.27 samples/sec   Loss 9.7404   LearningRate 0.0505   Epoch: 5   Global Step: 239840   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:16,212-Speed 2630.69 samples/sec   Loss 9.7233   LearningRate 0.0505   Epoch: 5   Global Step: 239850   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:20,102-Speed 2633.07 samples/sec   Loss 9.6349   LearningRate 0.0505   Epoch: 5   Global Step: 239860   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:23,993-Speed 2632.05 samples/sec   Loss 9.5466   LearningRate 0.0505   Epoch: 5   Global Step: 239870   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:27,891-Speed 2627.72 samples/sec   Loss 9.7187   LearningRate 0.0505   Epoch: 5   Global Step: 239880   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:31,782-Speed 2631.87 samples/sec   Loss 9.7336   LearningRate 0.0505   Epoch: 5   Global Step: 239890   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:39:35,677-Speed 2629.57 samples/sec   Loss 9.5187   LearningRate 0.0505   Epoch: 5   Global Step: 239900   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:39:39,568-Speed 2632.44 samples/sec   Loss 9.7353   LearningRate 0.0505   Epoch: 5   Global Step: 239910   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:39:43,464-Speed 2629.10 samples/sec   Loss 9.6673   LearningRate 0.0505   Epoch: 5   Global Step: 239920   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:39:47,360-Speed 2629.01 samples/sec   Loss 9.5959   LearningRate 0.0505   Epoch: 5   Global Step: 239930   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:39:51,282-Speed 2612.38 samples/sec   Loss 9.6061   LearningRate 0.0505   Epoch: 5   Global Step: 239940   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:39:55,174-Speed 2631.42 samples/sec   Loss 9.7010   LearningRate 0.0505   Epoch: 5   Global Step: 239950   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:39:59,080-Speed 2622.66 samples/sec   Loss 9.5629   LearningRate 0.0505   Epoch: 5   Global Step: 239960   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:40:02,982-Speed 2624.95 samples/sec   Loss 9.4253   LearningRate 0.0505   Epoch: 5   Global Step: 239970   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:40:06,873-Speed 2632.28 samples/sec   Loss 9.7121   LearningRate 0.0505   Epoch: 5   Global Step: 239980   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:40:10,763-Speed 2632.55 samples/sec   Loss 9.6184   LearningRate 0.0505   Epoch: 5   Global Step: 239990   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:40:14,661-Speed 2627.97 samples/sec   Loss 9.5810   LearningRate 0.0505   Epoch: 5   Global Step: 240000   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:40:57,423-[lfw][240000]XNorm: 23.453357
Training: 2022-04-13 22:40:57,424-[lfw][240000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-04-13 22:40:57,425-[lfw][240000]Accuracy-Highest: 0.99783
Training: 2022-04-13 22:41:47,529-[cfp_fp][240000]XNorm: 21.397198
Training: 2022-04-13 22:41:47,529-[cfp_fp][240000]Accuracy-Flip: 0.97871+-0.00689
Training: 2022-04-13 22:41:47,531-[cfp_fp][240000]Accuracy-Highest: 0.98314
Training: 2022-04-13 22:42:30,745-[agedb_30][240000]XNorm: 23.193004
Training: 2022-04-13 22:42:30,746-[agedb_30][240000]Accuracy-Flip: 0.97267+-0.00814
Training: 2022-04-13 22:42:30,746-[agedb_30][240000]Accuracy-Highest: 0.97267
Training: 2022-04-13 22:42:34,620-Speed 73.16 samples/sec   Loss 9.6308   LearningRate 0.0505   Epoch: 5   Global Step: 240010   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:42:38,486-Speed 2649.22 samples/sec   Loss 9.6315   LearningRate 0.0505   Epoch: 5   Global Step: 240020   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:42:42,354-Speed 2647.98 samples/sec   Loss 9.6037   LearningRate 0.0505   Epoch: 5   Global Step: 240030   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:42:46,253-Speed 2627.16 samples/sec   Loss 9.6040   LearningRate 0.0505   Epoch: 5   Global Step: 240040   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:42:50,128-Speed 2643.02 samples/sec   Loss 9.6319   LearningRate 0.0505   Epoch: 5   Global Step: 240050   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:42:54,009-Speed 2639.43 samples/sec   Loss 9.6521   LearningRate 0.0505   Epoch: 5   Global Step: 240060   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:42:57,898-Speed 2635.08 samples/sec   Loss 9.6619   LearningRate 0.0505   Epoch: 5   Global Step: 240070   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:01,779-Speed 2639.98 samples/sec   Loss 9.6239   LearningRate 0.0505   Epoch: 5   Global Step: 240080   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:05,659-Speed 2639.73 samples/sec   Loss 9.7536   LearningRate 0.0505   Epoch: 5   Global Step: 240090   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:09,612-Speed 2590.83 samples/sec   Loss 9.5619   LearningRate 0.0505   Epoch: 5   Global Step: 240100   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:43:13,494-Speed 2637.99 samples/sec   Loss 9.8461   LearningRate 0.0505   Epoch: 5   Global Step: 240110   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:17,384-Speed 2634.59 samples/sec   Loss 9.6905   LearningRate 0.0505   Epoch: 5   Global Step: 240120   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:21,268-Speed 2636.91 samples/sec   Loss 9.5922   LearningRate 0.0505   Epoch: 5   Global Step: 240130   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:25,154-Speed 2635.89 samples/sec   Loss 9.7915   LearningRate 0.0505   Epoch: 5   Global Step: 240140   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:29,047-Speed 2631.23 samples/sec   Loss 9.6019   LearningRate 0.0505   Epoch: 5   Global Step: 240150   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:32,933-Speed 2636.01 samples/sec   Loss 9.5559   LearningRate 0.0505   Epoch: 5   Global Step: 240160   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:36,824-Speed 2632.28 samples/sec   Loss 9.6549   LearningRate 0.0505   Epoch: 5   Global Step: 240170   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:40,714-Speed 2632.72 samples/sec   Loss 9.6300   LearningRate 0.0505   Epoch: 5   Global Step: 240180   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:44,609-Speed 2630.19 samples/sec   Loss 9.5507   LearningRate 0.0505   Epoch: 5   Global Step: 240190   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:48,496-Speed 2635.21 samples/sec   Loss 9.5817   LearningRate 0.0505   Epoch: 5   Global Step: 240200   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:43:52,389-Speed 2630.88 samples/sec   Loss 9.5426   LearningRate 0.0505   Epoch: 5   Global Step: 240210   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:43:56,286-Speed 2628.62 samples/sec   Loss 9.5314   LearningRate 0.0505   Epoch: 5   Global Step: 240220   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:00,214-Speed 2607.93 samples/sec   Loss 9.6816   LearningRate 0.0505   Epoch: 5   Global Step: 240230   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:04,105-Speed 2632.38 samples/sec   Loss 9.4340   LearningRate 0.0505   Epoch: 5   Global Step: 240240   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:07,998-Speed 2630.82 samples/sec   Loss 9.7597   LearningRate 0.0505   Epoch: 5   Global Step: 240250   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:11,904-Speed 2622.33 samples/sec   Loss 9.5818   LearningRate 0.0505   Epoch: 5   Global Step: 240260   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:15,798-Speed 2630.66 samples/sec   Loss 9.6377   LearningRate 0.0505   Epoch: 5   Global Step: 240270   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:19,691-Speed 2630.95 samples/sec   Loss 9.6915   LearningRate 0.0505   Epoch: 5   Global Step: 240280   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:23,585-Speed 2630.34 samples/sec   Loss 9.6877   LearningRate 0.0505   Epoch: 5   Global Step: 240290   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:27,499-Speed 2617.47 samples/sec   Loss 9.6008   LearningRate 0.0505   Epoch: 5   Global Step: 240300   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:44:31,385-Speed 2635.97 samples/sec   Loss 9.6503   LearningRate 0.0505   Epoch: 5   Global Step: 240310   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:44:35,277-Speed 2631.46 samples/sec   Loss 9.7214   LearningRate 0.0505   Epoch: 5   Global Step: 240320   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:44:39,172-Speed 2629.40 samples/sec   Loss 9.5754   LearningRate 0.0505   Epoch: 5   Global Step: 240330   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:44:43,065-Speed 2631.36 samples/sec   Loss 9.5089   LearningRate 0.0505   Epoch: 5   Global Step: 240340   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:44:46,964-Speed 2626.87 samples/sec   Loss 9.6744   LearningRate 0.0504   Epoch: 5   Global Step: 240350   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:44:50,855-Speed 2632.52 samples/sec   Loss 9.6775   LearningRate 0.0504   Epoch: 5   Global Step: 240360   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:44:54,750-Speed 2629.40 samples/sec   Loss 9.5177   LearningRate 0.0504   Epoch: 5   Global Step: 240370   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:44:58,645-Speed 2629.75 samples/sec   Loss 9.4817   LearningRate 0.0504   Epoch: 5   Global Step: 240380   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:02,541-Speed 2629.30 samples/sec   Loss 9.5052   LearningRate 0.0504   Epoch: 5   Global Step: 240390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:06,436-Speed 2629.01 samples/sec   Loss 9.6230   LearningRate 0.0504   Epoch: 5   Global Step: 240400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:10,340-Speed 2623.63 samples/sec   Loss 9.6899   LearningRate 0.0504   Epoch: 5   Global Step: 240410   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:45:14,235-Speed 2629.87 samples/sec   Loss 9.6543   LearningRate 0.0504   Epoch: 5   Global Step: 240420   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:45:18,136-Speed 2625.69 samples/sec   Loss 9.5262   LearningRate 0.0504   Epoch: 5   Global Step: 240430   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:45:22,031-Speed 2630.04 samples/sec   Loss 9.5618   LearningRate 0.0504   Epoch: 5   Global Step: 240440   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:45:25,919-Speed 2634.39 samples/sec   Loss 9.6032   LearningRate 0.0504   Epoch: 5   Global Step: 240450   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:29,812-Speed 2631.25 samples/sec   Loss 9.6756   LearningRate 0.0504   Epoch: 5   Global Step: 240460   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:33,707-Speed 2629.91 samples/sec   Loss 9.7120   LearningRate 0.0504   Epoch: 5   Global Step: 240470   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:37,600-Speed 2630.54 samples/sec   Loss 9.6119   LearningRate 0.0504   Epoch: 5   Global Step: 240480   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:41,504-Speed 2623.62 samples/sec   Loss 9.6405   LearningRate 0.0504   Epoch: 5   Global Step: 240490   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:45,403-Speed 2626.98 samples/sec   Loss 9.5614   LearningRate 0.0504   Epoch: 5   Global Step: 240500   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:49,297-Speed 2630.63 samples/sec   Loss 9.6849   LearningRate 0.0504   Epoch: 5   Global Step: 240510   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:53,196-Speed 2627.14 samples/sec   Loss 9.5409   LearningRate 0.0504   Epoch: 5   Global Step: 240520   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:45:57,086-Speed 2632.87 samples/sec   Loss 9.6413   LearningRate 0.0504   Epoch: 5   Global Step: 240530   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:46:00,882-Speed 2698.52 samples/sec   Loss 9.6836   LearningRate 0.0504   Epoch: 5   Global Step: 240540   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:04,776-Speed 2630.63 samples/sec   Loss 9.9694   LearningRate 0.0504   Epoch: 5   Global Step: 240550   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:08,672-Speed 2628.66 samples/sec   Loss 9.7306   LearningRate 0.0504   Epoch: 5   Global Step: 240560   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:12,558-Speed 2635.27 samples/sec   Loss 9.6238   LearningRate 0.0504   Epoch: 5   Global Step: 240570   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:16,456-Speed 2627.73 samples/sec   Loss 9.4066   LearningRate 0.0504   Epoch: 5   Global Step: 240580   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:20,349-Speed 2631.05 samples/sec   Loss 9.6675   LearningRate 0.0504   Epoch: 5   Global Step: 240590   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:24,242-Speed 2631.80 samples/sec   Loss 9.6137   LearningRate 0.0504   Epoch: 5   Global Step: 240600   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:28,132-Speed 2632.74 samples/sec   Loss 9.5432   LearningRate 0.0504   Epoch: 5   Global Step: 240610   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:32,023-Speed 2632.56 samples/sec   Loss 9.7246   LearningRate 0.0504   Epoch: 5   Global Step: 240620   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:35,914-Speed 2632.18 samples/sec   Loss 9.5727   LearningRate 0.0504   Epoch: 5   Global Step: 240630   Fp16 Grad Scale: 2048   Required: 66 hours
Training: 2022-04-13 22:46:39,803-Speed 2633.55 samples/sec   Loss 9.5412   LearningRate 0.0504   Epoch: 5   Global Step: 240640   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:46:43,696-Speed 2630.65 samples/sec   Loss 9.5173   LearningRate 0.0504   Epoch: 5   Global Step: 240650   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:46:47,600-Speed 2624.04 samples/sec   Loss 9.6442   LearningRate 0.0504   Epoch: 5   Global Step: 240660   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:46:51,519-Speed 2613.72 samples/sec   Loss 9.7092   LearningRate 0.0504   Epoch: 5   Global Step: 240670   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:46:55,410-Speed 2632.63 samples/sec   Loss 9.5415   LearningRate 0.0504   Epoch: 5   Global Step: 240680   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:46:59,303-Speed 2631.30 samples/sec   Loss 9.5696   LearningRate 0.0504   Epoch: 5   Global Step: 240690   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:47:03,205-Speed 2624.68 samples/sec   Loss 9.7120   LearningRate 0.0504   Epoch: 5   Global Step: 240700   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:47:07,104-Speed 2627.02 samples/sec   Loss 9.6448   LearningRate 0.0504   Epoch: 5   Global Step: 240710   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:47:11,003-Speed 2627.00 samples/sec   Loss 9.6730   LearningRate 0.0504   Epoch: 5   Global Step: 240720   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:47:14,900-Speed 2628.07 samples/sec   Loss 9.5785   LearningRate 0.0504   Epoch: 5   Global Step: 240730   Fp16 Grad Scale: 4096   Required: 66 hours
Training: 2022-04-13 22:47:18,809-Speed 2620.46 samples/sec   Loss 9.6927   LearningRate 0.0504   Epoch: 5   Global Step: 240740   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:22,712-Speed 2624.69 samples/sec   Loss 9.7495   LearningRate 0.0504   Epoch: 5   Global Step: 240750   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:26,640-Speed 2607.03 samples/sec   Loss 9.7443   LearningRate 0.0504   Epoch: 5   Global Step: 240760   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:30,542-Speed 2625.03 samples/sec   Loss 9.6745   LearningRate 0.0504   Epoch: 5   Global Step: 240770   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:34,441-Speed 2627.64 samples/sec   Loss 9.6713   LearningRate 0.0504   Epoch: 5   Global Step: 240780   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:38,339-Speed 2627.22 samples/sec   Loss 9.6373   LearningRate 0.0504   Epoch: 5   Global Step: 240790   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:42,258-Speed 2613.43 samples/sec   Loss 9.4895   LearningRate 0.0504   Epoch: 5   Global Step: 240800   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:46,159-Speed 2626.72 samples/sec   Loss 9.6245   LearningRate 0.0504   Epoch: 5   Global Step: 240810   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:50,226-Speed 2518.41 samples/sec   Loss 9.6234   LearningRate 0.0504   Epoch: 5   Global Step: 240820   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:54,262-Speed 2537.74 samples/sec   Loss 9.6951   LearningRate 0.0504   Epoch: 5   Global Step: 240830   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:47:58,149-Speed 2634.81 samples/sec   Loss 9.5792   LearningRate 0.0504   Epoch: 5   Global Step: 240840   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:02,039-Speed 2633.54 samples/sec   Loss 9.5714   LearningRate 0.0504   Epoch: 5   Global Step: 240850   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:05,938-Speed 2627.27 samples/sec   Loss 9.5532   LearningRate 0.0504   Epoch: 5   Global Step: 240860   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:09,834-Speed 2629.04 samples/sec   Loss 9.5669   LearningRate 0.0504   Epoch: 5   Global Step: 240870   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:13,727-Speed 2630.50 samples/sec   Loss 9.5145   LearningRate 0.0504   Epoch: 5   Global Step: 240880   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:17,616-Speed 2633.96 samples/sec   Loss 9.5479   LearningRate 0.0504   Epoch: 5   Global Step: 240890   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:21,511-Speed 2630.57 samples/sec   Loss 9.5530   LearningRate 0.0504   Epoch: 5   Global Step: 240900   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:25,406-Speed 2629.40 samples/sec   Loss 9.6303   LearningRate 0.0504   Epoch: 5   Global Step: 240910   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:29,318-Speed 2618.26 samples/sec   Loss 9.7224   LearningRate 0.0504   Epoch: 5   Global Step: 240920   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:33,210-Speed 2632.13 samples/sec   Loss 9.6214   LearningRate 0.0503   Epoch: 5   Global Step: 240930   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:48:37,106-Speed 2628.36 samples/sec   Loss 9.6509   LearningRate 0.0503   Epoch: 5   Global Step: 240940   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:48:41,003-Speed 2628.58 samples/sec   Loss 9.5858   LearningRate 0.0503   Epoch: 5   Global Step: 240950   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:48:44,905-Speed 2625.16 samples/sec   Loss 9.7072   LearningRate 0.0503   Epoch: 5   Global Step: 240960   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:48:48,798-Speed 2630.83 samples/sec   Loss 9.5562   LearningRate 0.0503   Epoch: 5   Global Step: 240970   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:48:52,730-Speed 2604.67 samples/sec   Loss 9.5693   LearningRate 0.0503   Epoch: 5   Global Step: 240980   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:48:56,623-Speed 2631.64 samples/sec   Loss 9.5509   LearningRate 0.0503   Epoch: 5   Global Step: 240990   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:49:00,517-Speed 2630.33 samples/sec   Loss 9.6085   LearningRate 0.0503   Epoch: 5   Global Step: 241000   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:49:04,410-Speed 2631.04 samples/sec   Loss 9.5667   LearningRate 0.0503   Epoch: 5   Global Step: 241010   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:49:08,307-Speed 2628.45 samples/sec   Loss 9.7293   LearningRate 0.0503   Epoch: 5   Global Step: 241020   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:49:12,206-Speed 2627.43 samples/sec   Loss 9.4664   LearningRate 0.0503   Epoch: 5   Global Step: 241030   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:49:16,129-Speed 2610.91 samples/sec   Loss 9.5814   LearningRate 0.0503   Epoch: 5   Global Step: 241040   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:20,021-Speed 2631.34 samples/sec   Loss 9.6152   LearningRate 0.0503   Epoch: 5   Global Step: 241050   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:23,912-Speed 2632.21 samples/sec   Loss 9.5086   LearningRate 0.0503   Epoch: 5   Global Step: 241060   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:27,811-Speed 2627.15 samples/sec   Loss 9.4464   LearningRate 0.0503   Epoch: 5   Global Step: 241070   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:31,710-Speed 2627.07 samples/sec   Loss 9.6022   LearningRate 0.0503   Epoch: 5   Global Step: 241080   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:35,607-Speed 2628.51 samples/sec   Loss 9.6741   LearningRate 0.0503   Epoch: 5   Global Step: 241090   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:39,511-Speed 2624.03 samples/sec   Loss 9.7050   LearningRate 0.0503   Epoch: 5   Global Step: 241100   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:43,401-Speed 2633.11 samples/sec   Loss 9.4665   LearningRate 0.0503   Epoch: 5   Global Step: 241110   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:47,294-Speed 2630.58 samples/sec   Loss 9.6026   LearningRate 0.0503   Epoch: 5   Global Step: 241120   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:51,192-Speed 2627.15 samples/sec   Loss 9.5717   LearningRate 0.0503   Epoch: 5   Global Step: 241130   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:49:55,084-Speed 2631.99 samples/sec   Loss 9.4032   LearningRate 0.0503   Epoch: 5   Global Step: 241140   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:49:58,984-Speed 2625.84 samples/sec   Loss 9.6200   LearningRate 0.0503   Epoch: 5   Global Step: 241150   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:02,896-Speed 2618.84 samples/sec   Loss 9.6502   LearningRate 0.0503   Epoch: 5   Global Step: 241160   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:06,790-Speed 2629.88 samples/sec   Loss 9.6076   LearningRate 0.0503   Epoch: 5   Global Step: 241170   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:10,699-Speed 2620.55 samples/sec   Loss 9.7879   LearningRate 0.0503   Epoch: 5   Global Step: 241180   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:14,595-Speed 2628.90 samples/sec   Loss 9.6629   LearningRate 0.0503   Epoch: 5   Global Step: 241190   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:18,632-Speed 2537.19 samples/sec   Loss 9.6317   LearningRate 0.0503   Epoch: 5   Global Step: 241200   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:22,639-Speed 2555.77 samples/sec   Loss 9.6151   LearningRate 0.0503   Epoch: 5   Global Step: 241210   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:26,534-Speed 2629.87 samples/sec   Loss 9.4833   LearningRate 0.0503   Epoch: 5   Global Step: 241220   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:30,429-Speed 2629.55 samples/sec   Loss 9.3730   LearningRate 0.0503   Epoch: 5   Global Step: 241230   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:34,335-Speed 2622.14 samples/sec   Loss 9.5630   LearningRate 0.0503   Epoch: 5   Global Step: 241240   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:50:38,217-Speed 2637.87 samples/sec   Loss 9.6796   LearningRate 0.0503   Epoch: 5   Global Step: 241250   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:42,116-Speed 2627.44 samples/sec   Loss 9.5776   LearningRate 0.0503   Epoch: 5   Global Step: 241260   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:46,016-Speed 2626.17 samples/sec   Loss 9.5396   LearningRate 0.0503   Epoch: 5   Global Step: 241270   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:49,906-Speed 2633.09 samples/sec   Loss 9.5468   LearningRate 0.0503   Epoch: 5   Global Step: 241280   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:50:53,902-Speed 2563.07 samples/sec   Loss 9.4856   LearningRate 0.0503   Epoch: 5   Global Step: 241290   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:50:57,870-Speed 2581.90 samples/sec   Loss 9.4349   LearningRate 0.0503   Epoch: 5   Global Step: 241300   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:01,767-Speed 2627.63 samples/sec   Loss 9.6910   LearningRate 0.0503   Epoch: 5   Global Step: 241310   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:05,672-Speed 2622.90 samples/sec   Loss 9.3746   LearningRate 0.0503   Epoch: 5   Global Step: 241320   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:09,567-Speed 2629.19 samples/sec   Loss 9.5807   LearningRate 0.0503   Epoch: 5   Global Step: 241330   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:13,475-Speed 2621.28 samples/sec   Loss 9.5275   LearningRate 0.0503   Epoch: 5   Global Step: 241340   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:17,370-Speed 2629.35 samples/sec   Loss 9.5242   LearningRate 0.0503   Epoch: 5   Global Step: 241350   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:21,264-Speed 2630.29 samples/sec   Loss 9.7167   LearningRate 0.0503   Epoch: 5   Global Step: 241360   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:25,161-Speed 2628.90 samples/sec   Loss 9.7536   LearningRate 0.0503   Epoch: 5   Global Step: 241370   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:29,057-Speed 2628.89 samples/sec   Loss 9.6219   LearningRate 0.0503   Epoch: 5   Global Step: 241380   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:51:32,949-Speed 2631.51 samples/sec   Loss 9.4199   LearningRate 0.0503   Epoch: 5   Global Step: 241390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:51:36,843-Speed 2629.94 samples/sec   Loss 9.5039   LearningRate 0.0503   Epoch: 5   Global Step: 241400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:51:40,738-Speed 2629.74 samples/sec   Loss 9.6803   LearningRate 0.0503   Epoch: 5   Global Step: 241410   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:51:44,631-Speed 2631.17 samples/sec   Loss 9.6459   LearningRate 0.0503   Epoch: 5   Global Step: 241420   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:51:48,533-Speed 2624.42 samples/sec   Loss 9.6261   LearningRate 0.0503   Epoch: 5   Global Step: 241430   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:51:52,425-Speed 2631.81 samples/sec   Loss 9.5267   LearningRate 0.0503   Epoch: 5   Global Step: 241440   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:51:56,317-Speed 2631.77 samples/sec   Loss 9.5857   LearningRate 0.0503   Epoch: 5   Global Step: 241450   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:00,213-Speed 2629.10 samples/sec   Loss 9.6555   LearningRate 0.0503   Epoch: 5   Global Step: 241460   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:04,128-Speed 2615.83 samples/sec   Loss 9.5818   LearningRate 0.0503   Epoch: 5   Global Step: 241470   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:08,019-Speed 2632.78 samples/sec   Loss 9.4933   LearningRate 0.0503   Epoch: 5   Global Step: 241480   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:11,913-Speed 2630.18 samples/sec   Loss 9.5944   LearningRate 0.0503   Epoch: 5   Global Step: 241490   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:52:15,817-Speed 2623.73 samples/sec   Loss 9.6481   LearningRate 0.0503   Epoch: 5   Global Step: 241500   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:52:19,714-Speed 2628.06 samples/sec   Loss 9.4666   LearningRate 0.0503   Epoch: 5   Global Step: 241510   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:52:23,609-Speed 2629.62 samples/sec   Loss 9.5775   LearningRate 0.0502   Epoch: 5   Global Step: 241520   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:52:27,494-Speed 2636.28 samples/sec   Loss 9.5784   LearningRate 0.0502   Epoch: 5   Global Step: 241530   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:31,382-Speed 2634.85 samples/sec   Loss 9.6397   LearningRate 0.0502   Epoch: 5   Global Step: 241540   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:35,278-Speed 2628.87 samples/sec   Loss 9.6433   LearningRate 0.0502   Epoch: 5   Global Step: 241550   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:39,171-Speed 2630.57 samples/sec   Loss 9.6789   LearningRate 0.0502   Epoch: 5   Global Step: 241560   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:52:43,048-Speed 2642.45 samples/sec   Loss 9.5991   LearningRate 0.0502   Epoch: 5   Global Step: 241570   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:52:46,948-Speed 2626.42 samples/sec   Loss 9.5586   LearningRate 0.0502   Epoch: 5   Global Step: 241580   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:52:50,839-Speed 2631.93 samples/sec   Loss 9.6669   LearningRate 0.0502   Epoch: 5   Global Step: 241590   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:52:54,732-Speed 2631.18 samples/sec   Loss 9.5931   LearningRate 0.0502   Epoch: 5   Global Step: 241600   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:52:58,626-Speed 2630.00 samples/sec   Loss 9.6079   LearningRate 0.0502   Epoch: 5   Global Step: 241610   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:53:02,525-Speed 2627.07 samples/sec   Loss 9.6771   LearningRate 0.0502   Epoch: 5   Global Step: 241620   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:53:06,419-Speed 2630.07 samples/sec   Loss 9.5190   LearningRate 0.0502   Epoch: 5   Global Step: 241630   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:53:10,323-Speed 2623.60 samples/sec   Loss 9.5424   LearningRate 0.0502   Epoch: 5   Global Step: 241640   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:53:14,218-Speed 2629.57 samples/sec   Loss 9.5316   LearningRate 0.0502   Epoch: 5   Global Step: 241650   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:53:18,115-Speed 2628.20 samples/sec   Loss 9.5656   LearningRate 0.0502   Epoch: 5   Global Step: 241660   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:53:22,044-Speed 2607.30 samples/sec   Loss 9.5833   LearningRate 0.0502   Epoch: 5   Global Step: 241670   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:25,938-Speed 2629.83 samples/sec   Loss 9.7235   LearningRate 0.0502   Epoch: 5   Global Step: 241680   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:29,831-Speed 2631.45 samples/sec   Loss 9.6158   LearningRate 0.0502   Epoch: 5   Global Step: 241690   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:33,741-Speed 2619.12 samples/sec   Loss 9.6054   LearningRate 0.0502   Epoch: 5   Global Step: 241700   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:37,638-Speed 2628.23 samples/sec   Loss 9.5648   LearningRate 0.0502   Epoch: 5   Global Step: 241710   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:41,534-Speed 2628.82 samples/sec   Loss 9.5490   LearningRate 0.0502   Epoch: 5   Global Step: 241720   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:45,433-Speed 2627.19 samples/sec   Loss 9.6323   LearningRate 0.0502   Epoch: 5   Global Step: 241730   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:49,335-Speed 2624.37 samples/sec   Loss 9.7011   LearningRate 0.0502   Epoch: 5   Global Step: 241740   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:53,226-Speed 2632.77 samples/sec   Loss 9.5842   LearningRate 0.0502   Epoch: 5   Global Step: 241750   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:53:57,121-Speed 2629.72 samples/sec   Loss 9.6947   LearningRate 0.0502   Epoch: 5   Global Step: 241760   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:54:01,015-Speed 2630.65 samples/sec   Loss 9.4814   LearningRate 0.0502   Epoch: 5   Global Step: 241770   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:04,909-Speed 2630.22 samples/sec   Loss 9.6277   LearningRate 0.0502   Epoch: 5   Global Step: 241780   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:08,802-Speed 2630.96 samples/sec   Loss 9.6722   LearningRate 0.0502   Epoch: 5   Global Step: 241790   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:12,694-Speed 2631.01 samples/sec   Loss 9.3911   LearningRate 0.0502   Epoch: 5   Global Step: 241800   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:16,597-Speed 2624.83 samples/sec   Loss 9.5473   LearningRate 0.0502   Epoch: 5   Global Step: 241810   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:20,500-Speed 2623.53 samples/sec   Loss 9.6183   LearningRate 0.0502   Epoch: 5   Global Step: 241820   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:24,398-Speed 2627.94 samples/sec   Loss 9.6044   LearningRate 0.0502   Epoch: 5   Global Step: 241830   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:28,294-Speed 2628.56 samples/sec   Loss 9.5543   LearningRate 0.0502   Epoch: 5   Global Step: 241840   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:32,192-Speed 2628.28 samples/sec   Loss 9.5905   LearningRate 0.0502   Epoch: 5   Global Step: 241850   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:36,090-Speed 2627.34 samples/sec   Loss 9.5688   LearningRate 0.0502   Epoch: 5   Global Step: 241860   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:39,977-Speed 2635.55 samples/sec   Loss 9.4925   LearningRate 0.0502   Epoch: 5   Global Step: 241870   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:43,870-Speed 2630.91 samples/sec   Loss 9.4454   LearningRate 0.0502   Epoch: 5   Global Step: 241880   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:47,766-Speed 2628.84 samples/sec   Loss 9.5981   LearningRate 0.0502   Epoch: 5   Global Step: 241890   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:51,665-Speed 2626.90 samples/sec   Loss 9.5134   LearningRate 0.0502   Epoch: 5   Global Step: 241900   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:55,560-Speed 2629.24 samples/sec   Loss 9.4869   LearningRate 0.0502   Epoch: 5   Global Step: 241910   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:54:59,465-Speed 2623.04 samples/sec   Loss 9.6220   LearningRate 0.0502   Epoch: 5   Global Step: 241920   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:55:03,347-Speed 2638.23 samples/sec   Loss 9.4374   LearningRate 0.0502   Epoch: 5   Global Step: 241930   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:55:07,249-Speed 2625.11 samples/sec   Loss 9.5672   LearningRate 0.0502   Epoch: 5   Global Step: 241940   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:55:11,120-Speed 2645.54 samples/sec   Loss 9.5963   LearningRate 0.0502   Epoch: 5   Global Step: 241950   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:55:15,149-Speed 2542.66 samples/sec   Loss 11.4892   LearningRate 0.0502   Epoch: 5   Global Step: 241960   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:19,045-Speed 2629.14 samples/sec   Loss 10.7380   LearningRate 0.0502   Epoch: 5   Global Step: 241970   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:22,940-Speed 2629.65 samples/sec   Loss 10.0265   LearningRate 0.0502   Epoch: 5   Global Step: 241980   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:26,834-Speed 2629.75 samples/sec   Loss 9.7536   LearningRate 0.0502   Epoch: 5   Global Step: 241990   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:30,734-Speed 2626.80 samples/sec   Loss 9.6523   LearningRate 0.0502   Epoch: 5   Global Step: 242000   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:34,632-Speed 2627.12 samples/sec   Loss 9.6295   LearningRate 0.0502   Epoch: 5   Global Step: 242010   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:38,534-Speed 2625.17 samples/sec   Loss 9.6919   LearningRate 0.0502   Epoch: 5   Global Step: 242020   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:42,442-Speed 2620.20 samples/sec   Loss 9.6211   LearningRate 0.0502   Epoch: 5   Global Step: 242030   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:46,343-Speed 2625.96 samples/sec   Loss 9.6253   LearningRate 0.0502   Epoch: 5   Global Step: 242040   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:50,236-Speed 2630.88 samples/sec   Loss 9.6718   LearningRate 0.0502   Epoch: 5   Global Step: 242050   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 22:55:54,131-Speed 2629.47 samples/sec   Loss 9.7397   LearningRate 0.0502   Epoch: 5   Global Step: 242060   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:55:58,031-Speed 2626.76 samples/sec   Loss 9.6491   LearningRate 0.0502   Epoch: 5   Global Step: 242070   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:01,920-Speed 2633.53 samples/sec   Loss 9.6305   LearningRate 0.0502   Epoch: 5   Global Step: 242080   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:05,807-Speed 2635.08 samples/sec   Loss 9.6038   LearningRate 0.0502   Epoch: 5   Global Step: 242090   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:09,704-Speed 2628.24 samples/sec   Loss 9.6552   LearningRate 0.0501   Epoch: 5   Global Step: 242100   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:13,595-Speed 2632.17 samples/sec   Loss 9.5818   LearningRate 0.0501   Epoch: 5   Global Step: 242110   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:17,495-Speed 2626.01 samples/sec   Loss 9.6714   LearningRate 0.0501   Epoch: 5   Global Step: 242120   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:21,391-Speed 2629.11 samples/sec   Loss 9.5844   LearningRate 0.0501   Epoch: 5   Global Step: 242130   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:25,283-Speed 2631.11 samples/sec   Loss 9.4961   LearningRate 0.0501   Epoch: 5   Global Step: 242140   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:29,179-Speed 2629.18 samples/sec   Loss 9.5645   LearningRate 0.0501   Epoch: 5   Global Step: 242150   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 22:56:33,079-Speed 2626.83 samples/sec   Loss 9.6289   LearningRate 0.0501   Epoch: 5   Global Step: 242160   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:56:36,981-Speed 2624.65 samples/sec   Loss 9.7160   LearningRate 0.0501   Epoch: 5   Global Step: 242170   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:56:40,878-Speed 2628.05 samples/sec   Loss 9.6856   LearningRate 0.0501   Epoch: 5   Global Step: 242180   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:56:44,767-Speed 2633.52 samples/sec   Loss 9.7048   LearningRate 0.0501   Epoch: 5   Global Step: 242190   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:56:48,662-Speed 2630.28 samples/sec   Loss 9.7238   LearningRate 0.0501   Epoch: 5   Global Step: 242200   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:56:52,552-Speed 2632.66 samples/sec   Loss 9.5040   LearningRate 0.0501   Epoch: 5   Global Step: 242210   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:56:56,445-Speed 2630.89 samples/sec   Loss 9.7094   LearningRate 0.0501   Epoch: 5   Global Step: 242220   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:57:00,339-Speed 2630.58 samples/sec   Loss 9.5797   LearningRate 0.0501   Epoch: 5   Global Step: 242230   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:57:04,236-Speed 2628.32 samples/sec   Loss 9.7126   LearningRate 0.0501   Epoch: 5   Global Step: 242240   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:57:08,132-Speed 2628.75 samples/sec   Loss 9.6384   LearningRate 0.0501   Epoch: 5   Global Step: 242250   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 22:57:12,024-Speed 2631.85 samples/sec   Loss 9.6486   LearningRate 0.0501   Epoch: 5   Global Step: 242260   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:15,917-Speed 2630.90 samples/sec   Loss 9.7077   LearningRate 0.0501   Epoch: 5   Global Step: 242270   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:19,817-Speed 2626.09 samples/sec   Loss 9.7032   LearningRate 0.0501   Epoch: 5   Global Step: 242280   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:23,722-Speed 2623.14 samples/sec   Loss 9.6351   LearningRate 0.0501   Epoch: 5   Global Step: 242290   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:27,614-Speed 2631.84 samples/sec   Loss 9.5485   LearningRate 0.0501   Epoch: 5   Global Step: 242300   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:31,506-Speed 2631.39 samples/sec   Loss 9.6364   LearningRate 0.0501   Epoch: 5   Global Step: 242310   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:35,436-Speed 2605.53 samples/sec   Loss 9.6132   LearningRate 0.0501   Epoch: 5   Global Step: 242320   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:39,416-Speed 2573.41 samples/sec   Loss 9.7375   LearningRate 0.0501   Epoch: 5   Global Step: 242330   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:43,315-Speed 2627.16 samples/sec   Loss 9.5880   LearningRate 0.0501   Epoch: 5   Global Step: 242340   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:47,221-Speed 2622.52 samples/sec   Loss 9.6511   LearningRate 0.0501   Epoch: 5   Global Step: 242350   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 22:57:51,238-Speed 2550.15 samples/sec   Loss 9.5500   LearningRate 0.0501   Epoch: 5   Global Step: 242360   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:57:55,212-Speed 2577.17 samples/sec   Loss 9.6620   LearningRate 0.0501   Epoch: 5   Global Step: 242370   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:57:59,105-Speed 2631.00 samples/sec   Loss 9.5893   LearningRate 0.0501   Epoch: 5   Global Step: 242380   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:02,997-Speed 2631.34 samples/sec   Loss 9.6520   LearningRate 0.0501   Epoch: 5   Global Step: 242390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:06,895-Speed 2627.27 samples/sec   Loss 9.5425   LearningRate 0.0501   Epoch: 5   Global Step: 242400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:10,791-Speed 2629.34 samples/sec   Loss 9.5294   LearningRate 0.0501   Epoch: 5   Global Step: 242410   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:14,683-Speed 2631.65 samples/sec   Loss 9.5137   LearningRate 0.0501   Epoch: 5   Global Step: 242420   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:18,580-Speed 2628.44 samples/sec   Loss 9.5825   LearningRate 0.0501   Epoch: 5   Global Step: 242430   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:22,482-Speed 2624.58 samples/sec   Loss 9.6119   LearningRate 0.0501   Epoch: 5   Global Step: 242440   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:26,373-Speed 2632.56 samples/sec   Loss 9.5348   LearningRate 0.0501   Epoch: 5   Global Step: 242450   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:30,266-Speed 2630.90 samples/sec   Loss 9.6036   LearningRate 0.0501   Epoch: 5   Global Step: 242460   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:58:34,159-Speed 2630.86 samples/sec   Loss 9.4674   LearningRate 0.0501   Epoch: 5   Global Step: 242470   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:58:38,034-Speed 2642.99 samples/sec   Loss 9.5108   LearningRate 0.0501   Epoch: 5   Global Step: 242480   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:41,924-Speed 2633.35 samples/sec   Loss 9.4541   LearningRate 0.0501   Epoch: 5   Global Step: 242490   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:45,817-Speed 2630.93 samples/sec   Loss 9.5117   LearningRate 0.0501   Epoch: 5   Global Step: 242500   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:49,813-Speed 2563.02 samples/sec   Loss 9.5961   LearningRate 0.0501   Epoch: 5   Global Step: 242510   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:53,705-Speed 2631.51 samples/sec   Loss 9.6013   LearningRate 0.0501   Epoch: 5   Global Step: 242520   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:58:57,595-Speed 2632.92 samples/sec   Loss 9.6053   LearningRate 0.0501   Epoch: 5   Global Step: 242530   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:01,490-Speed 2630.02 samples/sec   Loss 9.6510   LearningRate 0.0501   Epoch: 5   Global Step: 242540   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:05,381-Speed 2632.56 samples/sec   Loss 9.6339   LearningRate 0.0501   Epoch: 5   Global Step: 242550   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:09,277-Speed 2628.32 samples/sec   Loss 9.5529   LearningRate 0.0501   Epoch: 5   Global Step: 242560   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:13,170-Speed 2631.58 samples/sec   Loss 9.5424   LearningRate 0.0501   Epoch: 5   Global Step: 242570   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:17,065-Speed 2630.09 samples/sec   Loss 9.5287   LearningRate 0.0501   Epoch: 5   Global Step: 242580   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:59:20,961-Speed 2629.23 samples/sec   Loss 9.5470   LearningRate 0.0501   Epoch: 5   Global Step: 242590   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:59:24,852-Speed 2632.10 samples/sec   Loss 9.4078   LearningRate 0.0501   Epoch: 5   Global Step: 242600   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:59:28,744-Speed 2631.84 samples/sec   Loss 9.6398   LearningRate 0.0501   Epoch: 5   Global Step: 242610   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 22:59:32,618-Speed 2643.33 samples/sec   Loss 9.5391   LearningRate 0.0501   Epoch: 5   Global Step: 242620   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:36,509-Speed 2632.37 samples/sec   Loss 9.5516   LearningRate 0.0501   Epoch: 5   Global Step: 242630   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:40,402-Speed 2631.03 samples/sec   Loss 9.6545   LearningRate 0.0501   Epoch: 5   Global Step: 242640   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:44,300-Speed 2628.22 samples/sec   Loss 9.4493   LearningRate 0.0501   Epoch: 5   Global Step: 242650   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:48,191-Speed 2632.62 samples/sec   Loss 9.5623   LearningRate 0.0501   Epoch: 5   Global Step: 242660   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:52,086-Speed 2629.44 samples/sec   Loss 9.4283   LearningRate 0.0501   Epoch: 5   Global Step: 242670   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:55,979-Speed 2631.33 samples/sec   Loss 9.4749   LearningRate 0.0501   Epoch: 5   Global Step: 242680   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 22:59:59,868-Speed 2633.50 samples/sec   Loss 9.5422   LearningRate 0.0500   Epoch: 5   Global Step: 242690   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:03,768-Speed 2625.55 samples/sec   Loss 9.5512   LearningRate 0.0500   Epoch: 5   Global Step: 242700   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:07,686-Speed 2614.61 samples/sec   Loss 9.5925   LearningRate 0.0500   Epoch: 5   Global Step: 242710   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:11,583-Speed 2627.93 samples/sec   Loss 9.5093   LearningRate 0.0500   Epoch: 5   Global Step: 242720   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:00:15,476-Speed 2630.66 samples/sec   Loss 9.6019   LearningRate 0.0500   Epoch: 5   Global Step: 242730   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:00:19,353-Speed 2641.92 samples/sec   Loss 9.5133   LearningRate 0.0500   Epoch: 5   Global Step: 242740   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:23,253-Speed 2626.70 samples/sec   Loss 9.5171   LearningRate 0.0500   Epoch: 5   Global Step: 242750   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:27,146-Speed 2630.93 samples/sec   Loss 9.5982   LearningRate 0.0500   Epoch: 5   Global Step: 242760   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:31,044-Speed 2627.34 samples/sec   Loss 9.6362   LearningRate 0.0500   Epoch: 5   Global Step: 242770   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:34,940-Speed 2628.81 samples/sec   Loss 9.6093   LearningRate 0.0500   Epoch: 5   Global Step: 242780   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:38,835-Speed 2629.94 samples/sec   Loss 9.6093   LearningRate 0.0500   Epoch: 5   Global Step: 242790   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:42,729-Speed 2630.71 samples/sec   Loss 9.5227   LearningRate 0.0500   Epoch: 5   Global Step: 242800   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:46,622-Speed 2630.77 samples/sec   Loss 9.4961   LearningRate 0.0500   Epoch: 5   Global Step: 242810   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:50,545-Speed 2610.65 samples/sec   Loss 9.5147   LearningRate 0.0500   Epoch: 5   Global Step: 242820   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:54,438-Speed 2631.03 samples/sec   Loss 9.6220   LearningRate 0.0500   Epoch: 5   Global Step: 242830   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:00:58,329-Speed 2632.05 samples/sec   Loss 9.4557   LearningRate 0.0500   Epoch: 5   Global Step: 242840   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:01:02,223-Speed 2630.68 samples/sec   Loss 9.5262   LearningRate 0.0500   Epoch: 5   Global Step: 242850   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:01:06,115-Speed 2631.84 samples/sec   Loss 9.5740   LearningRate 0.0500   Epoch: 5   Global Step: 242860   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:01:10,003-Speed 2634.32 samples/sec   Loss 9.5194   LearningRate 0.0500   Epoch: 5   Global Step: 242870   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:13,923-Speed 2612.75 samples/sec   Loss 9.6196   LearningRate 0.0500   Epoch: 5   Global Step: 242880   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:17,841-Speed 2614.09 samples/sec   Loss 9.6638   LearningRate 0.0500   Epoch: 5   Global Step: 242890   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:21,737-Speed 2629.06 samples/sec   Loss 9.6983   LearningRate 0.0500   Epoch: 5   Global Step: 242900   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:25,633-Speed 2628.53 samples/sec   Loss 9.5624   LearningRate 0.0500   Epoch: 5   Global Step: 242910   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:29,534-Speed 2625.62 samples/sec   Loss 9.5927   LearningRate 0.0500   Epoch: 5   Global Step: 242920   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:33,431-Speed 2628.28 samples/sec   Loss 9.5097   LearningRate 0.0500   Epoch: 5   Global Step: 242930   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:37,329-Speed 2627.77 samples/sec   Loss 9.5724   LearningRate 0.0500   Epoch: 5   Global Step: 242940   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:41,222-Speed 2631.10 samples/sec   Loss 9.5709   LearningRate 0.0500   Epoch: 5   Global Step: 242950   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:45,115-Speed 2631.16 samples/sec   Loss 9.5346   LearningRate 0.0500   Epoch: 5   Global Step: 242960   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:01:49,007-Speed 2631.57 samples/sec   Loss 9.5385   LearningRate 0.0500   Epoch: 5   Global Step: 242970   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:01:52,911-Speed 2623.46 samples/sec   Loss 9.5208   LearningRate 0.0500   Epoch: 5   Global Step: 242980   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:01:56,804-Speed 2630.97 samples/sec   Loss 9.6250   LearningRate 0.0500   Epoch: 5   Global Step: 242990   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:02:00,703-Speed 2626.48 samples/sec   Loss 9.6781   LearningRate 0.0500   Epoch: 5   Global Step: 243000   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:02:04,604-Speed 2625.73 samples/sec   Loss 9.4186   LearningRate 0.0500   Epoch: 5   Global Step: 243010   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:02:08,507-Speed 2624.36 samples/sec   Loss 9.4201   LearningRate 0.0500   Epoch: 5   Global Step: 243020   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:02:12,391-Speed 2637.30 samples/sec   Loss 9.5773   LearningRate 0.0500   Epoch: 5   Global Step: 243030   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:16,282-Speed 2632.12 samples/sec   Loss 9.5785   LearningRate 0.0500   Epoch: 5   Global Step: 243040   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:20,174-Speed 2632.11 samples/sec   Loss 9.5238   LearningRate 0.0500   Epoch: 5   Global Step: 243050   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:24,064-Speed 2632.80 samples/sec   Loss 9.4708   LearningRate 0.0500   Epoch: 5   Global Step: 243060   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:27,958-Speed 2630.06 samples/sec   Loss 9.6038   LearningRate 0.0500   Epoch: 5   Global Step: 243070   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:31,859-Speed 2625.30 samples/sec   Loss 9.5677   LearningRate 0.0500   Epoch: 5   Global Step: 243080   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:35,767-Speed 2620.80 samples/sec   Loss 9.6036   LearningRate 0.0500   Epoch: 5   Global Step: 243090   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:39,675-Speed 2620.93 samples/sec   Loss 9.5077   LearningRate 0.0500   Epoch: 5   Global Step: 243100   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:43,577-Speed 2625.08 samples/sec   Loss 9.4909   LearningRate 0.0500   Epoch: 5   Global Step: 243110   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:47,473-Speed 2628.68 samples/sec   Loss 9.5404   LearningRate 0.0500   Epoch: 5   Global Step: 243120   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:51,365-Speed 2631.92 samples/sec   Loss 9.6138   LearningRate 0.0500   Epoch: 5   Global Step: 243130   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:02:55,251-Speed 2635.96 samples/sec   Loss 9.6331   LearningRate 0.0500   Epoch: 5   Global Step: 243140   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:02:59,144-Speed 2630.89 samples/sec   Loss 9.6430   LearningRate 0.0500   Epoch: 5   Global Step: 243150   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:03,036-Speed 2631.18 samples/sec   Loss 9.4864   LearningRate 0.0500   Epoch: 5   Global Step: 243160   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:06,929-Speed 2631.02 samples/sec   Loss 9.5007   LearningRate 0.0500   Epoch: 5   Global Step: 243170   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:10,827-Speed 2627.39 samples/sec   Loss 9.7045   LearningRate 0.0500   Epoch: 5   Global Step: 243180   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:14,720-Speed 2630.80 samples/sec   Loss 9.6017   LearningRate 0.0500   Epoch: 5   Global Step: 243190   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:18,614-Speed 2631.27 samples/sec   Loss 9.4976   LearningRate 0.0500   Epoch: 5   Global Step: 243200   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:22,507-Speed 2630.45 samples/sec   Loss 9.5074   LearningRate 0.0500   Epoch: 5   Global Step: 243210   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:26,395-Speed 2634.42 samples/sec   Loss 9.3759   LearningRate 0.0500   Epoch: 5   Global Step: 243220   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:30,299-Speed 2623.97 samples/sec   Loss 9.5247   LearningRate 0.0500   Epoch: 5   Global Step: 243230   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:34,191-Speed 2631.44 samples/sec   Loss 9.5673   LearningRate 0.0500   Epoch: 5   Global Step: 243240   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:03:38,066-Speed 2643.33 samples/sec   Loss 9.4863   LearningRate 0.0500   Epoch: 5   Global Step: 243250   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:41,959-Speed 2630.73 samples/sec   Loss 9.5296   LearningRate 0.0500   Epoch: 5   Global Step: 243260   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:45,866-Speed 2621.81 samples/sec   Loss 9.6507   LearningRate 0.0500   Epoch: 5   Global Step: 243270   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:49,762-Speed 2628.98 samples/sec   Loss 9.6225   LearningRate 0.0499   Epoch: 5   Global Step: 243280   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:53,667-Speed 2622.72 samples/sec   Loss 9.3916   LearningRate 0.0499   Epoch: 5   Global Step: 243290   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:03:57,560-Speed 2630.86 samples/sec   Loss 9.6218   LearningRate 0.0499   Epoch: 5   Global Step: 243300   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:01,453-Speed 2630.86 samples/sec   Loss 9.6018   LearningRate 0.0499   Epoch: 5   Global Step: 243310   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:05,423-Speed 2579.78 samples/sec   Loss 9.4607   LearningRate 0.0499   Epoch: 5   Global Step: 243320   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:09,317-Speed 2630.54 samples/sec   Loss 9.4764   LearningRate 0.0499   Epoch: 5   Global Step: 243330   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:13,208-Speed 2632.61 samples/sec   Loss 9.4637   LearningRate 0.0499   Epoch: 5   Global Step: 243340   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:17,099-Speed 2632.23 samples/sec   Loss 9.3648   LearningRate 0.0499   Epoch: 5   Global Step: 243350   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:04:21,009-Speed 2619.74 samples/sec   Loss 9.5423   LearningRate 0.0499   Epoch: 5   Global Step: 243360   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:24,899-Speed 2632.54 samples/sec   Loss 9.6360   LearningRate 0.0499   Epoch: 5   Global Step: 243370   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:28,786-Speed 2635.41 samples/sec   Loss 9.5939   LearningRate 0.0499   Epoch: 5   Global Step: 243380   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:32,676-Speed 2632.93 samples/sec   Loss 9.5576   LearningRate 0.0499   Epoch: 5   Global Step: 243390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:36,571-Speed 2629.47 samples/sec   Loss 9.5428   LearningRate 0.0499   Epoch: 5   Global Step: 243400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:40,460-Speed 2633.25 samples/sec   Loss 9.5032   LearningRate 0.0499   Epoch: 5   Global Step: 243410   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:44,354-Speed 2630.54 samples/sec   Loss 9.4540   LearningRate 0.0499   Epoch: 5   Global Step: 243420   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:48,262-Speed 2620.83 samples/sec   Loss 9.5002   LearningRate 0.0499   Epoch: 5   Global Step: 243430   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:52,166-Speed 2623.93 samples/sec   Loss 9.5840   LearningRate 0.0499   Epoch: 5   Global Step: 243440   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:56,069-Speed 2623.99 samples/sec   Loss 9.4318   LearningRate 0.0499   Epoch: 5   Global Step: 243450   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:04:59,968-Speed 2626.87 samples/sec   Loss 9.5688   LearningRate 0.0499   Epoch: 5   Global Step: 243460   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:05:03,863-Speed 2629.58 samples/sec   Loss 9.5579   LearningRate 0.0499   Epoch: 5   Global Step: 243470   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:05:07,768-Speed 2622.72 samples/sec   Loss 9.5493   LearningRate 0.0499   Epoch: 5   Global Step: 243480   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:05:11,668-Speed 2626.68 samples/sec   Loss 9.5798   LearningRate 0.0499   Epoch: 5   Global Step: 243490   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:05:15,563-Speed 2629.51 samples/sec   Loss 9.4727   LearningRate 0.0499   Epoch: 5   Global Step: 243500   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:19,458-Speed 2629.39 samples/sec   Loss 9.4792   LearningRate 0.0499   Epoch: 5   Global Step: 243510   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:23,354-Speed 2628.73 samples/sec   Loss 9.4273   LearningRate 0.0499   Epoch: 5   Global Step: 243520   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:27,256-Speed 2625.34 samples/sec   Loss 9.5826   LearningRate 0.0499   Epoch: 5   Global Step: 243530   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:31,158-Speed 2624.47 samples/sec   Loss 9.5092   LearningRate 0.0499   Epoch: 5   Global Step: 243540   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:35,054-Speed 2629.15 samples/sec   Loss 9.5678   LearningRate 0.0499   Epoch: 5   Global Step: 243550   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:38,952-Speed 2627.22 samples/sec   Loss 9.5847   LearningRate 0.0499   Epoch: 5   Global Step: 243560   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:42,853-Speed 2625.82 samples/sec   Loss 9.4845   LearningRate 0.0499   Epoch: 5   Global Step: 243570   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:46,752-Speed 2626.75 samples/sec   Loss 9.5772   LearningRate 0.0499   Epoch: 5   Global Step: 243580   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:05:50,644-Speed 2631.95 samples/sec   Loss 9.4587   LearningRate 0.0499   Epoch: 5   Global Step: 243590   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:05:54,545-Speed 2625.39 samples/sec   Loss 9.5407   LearningRate 0.0499   Epoch: 5   Global Step: 243600   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:05:58,443-Speed 2627.56 samples/sec   Loss 9.5189   LearningRate 0.0499   Epoch: 5   Global Step: 243610   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:02,351-Speed 2620.87 samples/sec   Loss 9.5395   LearningRate 0.0499   Epoch: 5   Global Step: 243620   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:06,247-Speed 2628.87 samples/sec   Loss 9.5943   LearningRate 0.0499   Epoch: 5   Global Step: 243630   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:10,142-Speed 2629.55 samples/sec   Loss 9.4794   LearningRate 0.0499   Epoch: 5   Global Step: 243640   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:14,035-Speed 2631.57 samples/sec   Loss 9.5402   LearningRate 0.0499   Epoch: 5   Global Step: 243650   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:17,935-Speed 2626.72 samples/sec   Loss 9.5519   LearningRate 0.0499   Epoch: 5   Global Step: 243660   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:21,828-Speed 2630.42 samples/sec   Loss 9.6836   LearningRate 0.0499   Epoch: 5   Global Step: 243670   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:25,902-Speed 2514.16 samples/sec   Loss 9.6297   LearningRate 0.0499   Epoch: 5   Global Step: 243680   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:06:29,809-Speed 2621.31 samples/sec   Loss 9.4276   LearningRate 0.0499   Epoch: 5   Global Step: 243690   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:06:33,713-Speed 2624.05 samples/sec   Loss 9.6045   LearningRate 0.0499   Epoch: 5   Global Step: 243700   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:06:37,703-Speed 2566.84 samples/sec   Loss 9.4886   LearningRate 0.0499   Epoch: 5   Global Step: 243710   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:06:41,628-Speed 2609.25 samples/sec   Loss 9.5183   LearningRate 0.0499   Epoch: 5   Global Step: 243720   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:06:45,528-Speed 2626.99 samples/sec   Loss 9.6091   LearningRate 0.0499   Epoch: 5   Global Step: 243730   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:06:49,424-Speed 2628.36 samples/sec   Loss 9.5656   LearningRate 0.0499   Epoch: 5   Global Step: 243740   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:06:53,340-Speed 2616.11 samples/sec   Loss 9.4950   LearningRate 0.0499   Epoch: 5   Global Step: 243750   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:06:57,240-Speed 2625.44 samples/sec   Loss 9.6014   LearningRate 0.0499   Epoch: 5   Global Step: 243760   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:07:01,135-Speed 2629.99 samples/sec   Loss 9.6773   LearningRate 0.0499   Epoch: 5   Global Step: 243770   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:07:05,033-Speed 2627.10 samples/sec   Loss 9.5384   LearningRate 0.0499   Epoch: 5   Global Step: 243780   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:07:08,939-Speed 2622.99 samples/sec   Loss 9.6108   LearningRate 0.0499   Epoch: 5   Global Step: 243790   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:12,843-Speed 2623.69 samples/sec   Loss 9.5664   LearningRate 0.0499   Epoch: 5   Global Step: 243800   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:16,746-Speed 2623.96 samples/sec   Loss 9.5914   LearningRate 0.0499   Epoch: 5   Global Step: 243810   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:20,648-Speed 2625.12 samples/sec   Loss 9.5369   LearningRate 0.0499   Epoch: 5   Global Step: 243820   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:25,057-Speed 2323.12 samples/sec   Loss 9.6036   LearningRate 0.0499   Epoch: 5   Global Step: 243830   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:28,949-Speed 2631.92 samples/sec   Loss 9.4308   LearningRate 0.0499   Epoch: 5   Global Step: 243840   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:32,846-Speed 2628.40 samples/sec   Loss 9.4898   LearningRate 0.0499   Epoch: 5   Global Step: 243850   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:36,740-Speed 2630.12 samples/sec   Loss 9.5318   LearningRate 0.0498   Epoch: 5   Global Step: 243860   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:40,646-Speed 2622.03 samples/sec   Loss 9.5273   LearningRate 0.0498   Epoch: 5   Global Step: 243870   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:07:44,535-Speed 2633.40 samples/sec   Loss 9.4509   LearningRate 0.0498   Epoch: 5   Global Step: 243880   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:07:48,439-Speed 2623.43 samples/sec   Loss 9.4978   LearningRate 0.0498   Epoch: 5   Global Step: 243890   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:07:52,361-Speed 2611.53 samples/sec   Loss 9.4145   LearningRate 0.0498   Epoch: 5   Global Step: 243900   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:07:56,264-Speed 2624.00 samples/sec   Loss 9.4847   LearningRate 0.0498   Epoch: 5   Global Step: 243910   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:00,176-Speed 2618.52 samples/sec   Loss 9.3742   LearningRate 0.0498   Epoch: 5   Global Step: 243920   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:04,091-Speed 2616.14 samples/sec   Loss 9.5231   LearningRate 0.0498   Epoch: 5   Global Step: 243930   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:07,984-Speed 2631.39 samples/sec   Loss 9.5326   LearningRate 0.0498   Epoch: 5   Global Step: 243940   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:12,051-Speed 2518.18 samples/sec   Loss 9.4407   LearningRate 0.0498   Epoch: 5   Global Step: 243950   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:16,145-Speed 2502.04 samples/sec   Loss 9.4794   LearningRate 0.0498   Epoch: 5   Global Step: 243960   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:20,059-Speed 2616.63 samples/sec   Loss 9.5419   LearningRate 0.0498   Epoch: 5   Global Step: 243970   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:23,959-Speed 2626.37 samples/sec   Loss 9.5483   LearningRate 0.0498   Epoch: 5   Global Step: 243980   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:08:27,864-Speed 2623.12 samples/sec   Loss 9.4537   LearningRate 0.0498   Epoch: 5   Global Step: 243990   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:08:31,773-Speed 2620.25 samples/sec   Loss 9.5237   LearningRate 0.0498   Epoch: 5   Global Step: 244000   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:35,788-Speed 2550.85 samples/sec   Loss 9.5870   LearningRate 0.0498   Epoch: 5   Global Step: 244010   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:39,699-Speed 2618.79 samples/sec   Loss 9.6192   LearningRate 0.0498   Epoch: 5   Global Step: 244020   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:43,608-Speed 2620.35 samples/sec   Loss 9.4953   LearningRate 0.0498   Epoch: 5   Global Step: 244030   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:47,502-Speed 2630.65 samples/sec   Loss 9.5935   LearningRate 0.0498   Epoch: 5   Global Step: 244040   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:51,400-Speed 2627.23 samples/sec   Loss 9.4358   LearningRate 0.0498   Epoch: 5   Global Step: 244050   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:55,297-Speed 2628.65 samples/sec   Loss 9.3532   LearningRate 0.0498   Epoch: 5   Global Step: 244060   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:08:59,193-Speed 2628.83 samples/sec   Loss 9.2878   LearningRate 0.0498   Epoch: 5   Global Step: 244070   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:03,091-Speed 2627.31 samples/sec   Loss 9.5918   LearningRate 0.0498   Epoch: 5   Global Step: 244080   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:06,985-Speed 2630.54 samples/sec   Loss 9.4669   LearningRate 0.0498   Epoch: 5   Global Step: 244090   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:10,880-Speed 2629.39 samples/sec   Loss 9.5101   LearningRate 0.0498   Epoch: 5   Global Step: 244100   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:09:14,773-Speed 2631.53 samples/sec   Loss 9.5884   LearningRate 0.0498   Epoch: 5   Global Step: 244110   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:09:18,669-Speed 2628.85 samples/sec   Loss 9.5769   LearningRate 0.0498   Epoch: 5   Global Step: 244120   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:09:22,550-Speed 2639.13 samples/sec   Loss 9.5879   LearningRate 0.0498   Epoch: 5   Global Step: 244130   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:26,445-Speed 2629.35 samples/sec   Loss 9.5015   LearningRate 0.0498   Epoch: 5   Global Step: 244140   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:30,355-Speed 2619.52 samples/sec   Loss 9.6124   LearningRate 0.0498   Epoch: 5   Global Step: 244150   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:34,257-Speed 2624.81 samples/sec   Loss 9.5030   LearningRate 0.0498   Epoch: 5   Global Step: 244160   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:38,149-Speed 2631.34 samples/sec   Loss 9.5326   LearningRate 0.0498   Epoch: 5   Global Step: 244170   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:42,040-Speed 2632.57 samples/sec   Loss 9.4647   LearningRate 0.0498   Epoch: 5   Global Step: 244180   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:47,025-Speed 2054.36 samples/sec   Loss 9.5337   LearningRate 0.0498   Epoch: 5   Global Step: 244190   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:50,917-Speed 2632.00 samples/sec   Loss 9.4710   LearningRate 0.0498   Epoch: 5   Global Step: 244200   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:54,806-Speed 2634.09 samples/sec   Loss 9.5339   LearningRate 0.0498   Epoch: 5   Global Step: 244210   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:09:58,712-Speed 2622.03 samples/sec   Loss 9.5856   LearningRate 0.0498   Epoch: 5   Global Step: 244220   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:10:02,608-Speed 2628.92 samples/sec   Loss 9.4389   LearningRate 0.0498   Epoch: 5   Global Step: 244230   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:10:06,498-Speed 2632.54 samples/sec   Loss 9.5809   LearningRate 0.0498   Epoch: 5   Global Step: 244240   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:10:10,380-Speed 2638.42 samples/sec   Loss 9.6281   LearningRate 0.0498   Epoch: 5   Global Step: 244250   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:10:14,252-Speed 2645.42 samples/sec   Loss 9.6348   LearningRate 0.0498   Epoch: 5   Global Step: 244260   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:10:18,154-Speed 2625.13 samples/sec   Loss 9.5880   LearningRate 0.0498   Epoch: 5   Global Step: 244270   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:10:22,044-Speed 2632.61 samples/sec   Loss 9.6289   LearningRate 0.0498   Epoch: 5   Global Step: 244280   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:10:25,886-Speed 2666.10 samples/sec   Loss 10.4292   LearningRate 0.0498   Epoch: 5   Global Step: 244290   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:29,782-Speed 2629.41 samples/sec   Loss 10.0445   LearningRate 0.0498   Epoch: 5   Global Step: 244300   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:33,673-Speed 2631.86 samples/sec   Loss 10.2844   LearningRate 0.0498   Epoch: 5   Global Step: 244310   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:37,563-Speed 2632.86 samples/sec   Loss 9.9684   LearningRate 0.0498   Epoch: 5   Global Step: 244320   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:41,454-Speed 2632.39 samples/sec   Loss 9.6799   LearningRate 0.0498   Epoch: 5   Global Step: 244330   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:45,343-Speed 2633.95 samples/sec   Loss 9.7636   LearningRate 0.0498   Epoch: 5   Global Step: 244340   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:49,233-Speed 2632.63 samples/sec   Loss 9.5752   LearningRate 0.0498   Epoch: 5   Global Step: 244350   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:53,123-Speed 2632.70 samples/sec   Loss 9.5306   LearningRate 0.0498   Epoch: 5   Global Step: 244360   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:10:57,066-Speed 2597.81 samples/sec   Loss 9.5039   LearningRate 0.0498   Epoch: 5   Global Step: 244370   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:11:00,962-Speed 2628.77 samples/sec   Loss 9.5097   LearningRate 0.0498   Epoch: 5   Global Step: 244380   Fp16 Grad Scale: 8192   Required: 66 hours
Training: 2022-04-13 23:11:04,855-Speed 2631.23 samples/sec   Loss 9.5638   LearningRate 0.0498   Epoch: 5   Global Step: 244390   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:08,744-Speed 2633.92 samples/sec   Loss 9.6094   LearningRate 0.0498   Epoch: 5   Global Step: 244400   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:12,635-Speed 2632.41 samples/sec   Loss 9.5353   LearningRate 0.0498   Epoch: 5   Global Step: 244410   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:16,534-Speed 2626.55 samples/sec   Loss 9.5719   LearningRate 0.0498   Epoch: 5   Global Step: 244420   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:20,442-Speed 2621.20 samples/sec   Loss 9.5188   LearningRate 0.0498   Epoch: 5   Global Step: 244430   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:24,335-Speed 2631.03 samples/sec   Loss 9.5126   LearningRate 0.0498   Epoch: 5   Global Step: 244440   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:28,225-Speed 2632.78 samples/sec   Loss 9.6500   LearningRate 0.0497   Epoch: 5   Global Step: 244450   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:32,117-Speed 2631.61 samples/sec   Loss 9.5831   LearningRate 0.0497   Epoch: 5   Global Step: 244460   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:36,033-Speed 2615.49 samples/sec   Loss 9.5298   LearningRate 0.0497   Epoch: 5   Global Step: 244470   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:39,926-Speed 2630.79 samples/sec   Loss 9.4112   LearningRate 0.0497   Epoch: 5   Global Step: 244480   Fp16 Grad Scale: 16384   Required: 66 hours
Training: 2022-04-13 23:11:43,826-Speed 2626.34 samples/sec   Loss 9.5163   LearningRate 0.0497   Epoch: 5   Global Step: 244490   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:11:47,718-Speed 2631.57 samples/sec   Loss 9.4381   LearningRate 0.0497   Epoch: 5   Global Step: 244500   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:11:51,615-Speed 2628.45 samples/sec   Loss 9.4471   LearningRate 0.0497   Epoch: 5   Global Step: 244510   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:11:55,509-Speed 2630.18 samples/sec   Loss 9.5059   LearningRate 0.0497   Epoch: 5   Global Step: 244520   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:11:59,400-Speed 2632.75 samples/sec   Loss 9.6157   LearningRate 0.0497   Epoch: 5   Global Step: 244530   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:12:03,302-Speed 2624.32 samples/sec   Loss 9.5382   LearningRate 0.0497   Epoch: 5   Global Step: 244540   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:12:07,195-Speed 2630.84 samples/sec   Loss 9.6516   LearningRate 0.0497   Epoch: 5   Global Step: 244550   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:12:11,087-Speed 2631.97 samples/sec   Loss 9.5524   LearningRate 0.0497   Epoch: 5   Global Step: 244560   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:12:14,977-Speed 2632.86 samples/sec   Loss 9.4898   LearningRate 0.0497   Epoch: 5   Global Step: 244570   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:12:18,871-Speed 2630.58 samples/sec   Loss 9.6360   LearningRate 0.0497   Epoch: 5   Global Step: 244580   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:12:22,762-Speed 2632.30 samples/sec   Loss 9.6281   LearningRate 0.0497   Epoch: 5   Global Step: 244590   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:26,661-Speed 2626.71 samples/sec   Loss 9.4374   LearningRate 0.0497   Epoch: 5   Global Step: 244600   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:30,557-Speed 2629.06 samples/sec   Loss 9.5340   LearningRate 0.0497   Epoch: 5   Global Step: 244610   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:34,485-Speed 2607.80 samples/sec   Loss 9.5603   LearningRate 0.0497   Epoch: 5   Global Step: 244620   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:38,377-Speed 2631.52 samples/sec   Loss 9.4732   LearningRate 0.0497   Epoch: 5   Global Step: 244630   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:42,269-Speed 2631.63 samples/sec   Loss 9.4795   LearningRate 0.0497   Epoch: 5   Global Step: 244640   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:46,159-Speed 2632.60 samples/sec   Loss 9.6481   LearningRate 0.0497   Epoch: 5   Global Step: 244650   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:50,050-Speed 2632.60 samples/sec   Loss 9.6600   LearningRate 0.0497   Epoch: 5   Global Step: 244660   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:53,951-Speed 2625.81 samples/sec   Loss 9.6427   LearningRate 0.0497   Epoch: 5   Global Step: 244670   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:12:57,839-Speed 2633.74 samples/sec   Loss 9.5495   LearningRate 0.0497   Epoch: 5   Global Step: 244680   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:13:01,730-Speed 2632.52 samples/sec   Loss 9.3985   LearningRate 0.0497   Epoch: 5   Global Step: 244690   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:05,621-Speed 2632.46 samples/sec   Loss 9.4884   LearningRate 0.0497   Epoch: 5   Global Step: 244700   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:09,536-Speed 2616.35 samples/sec   Loss 9.5271   LearningRate 0.0497   Epoch: 5   Global Step: 244710   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:13,431-Speed 2629.43 samples/sec   Loss 9.3309   LearningRate 0.0497   Epoch: 5   Global Step: 244720   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:17,328-Speed 2628.49 samples/sec   Loss 9.6025   LearningRate 0.0497   Epoch: 5   Global Step: 244730   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:21,222-Speed 2629.90 samples/sec   Loss 9.5742   LearningRate 0.0497   Epoch: 5   Global Step: 244740   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:25,115-Speed 2631.41 samples/sec   Loss 9.4002   LearningRate 0.0497   Epoch: 5   Global Step: 244750   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:29,019-Speed 2623.23 samples/sec   Loss 9.5843   LearningRate 0.0497   Epoch: 5   Global Step: 244760   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:32,923-Speed 2623.40 samples/sec   Loss 9.5983   LearningRate 0.0497   Epoch: 5   Global Step: 244770   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:36,826-Speed 2624.37 samples/sec   Loss 9.5729   LearningRate 0.0497   Epoch: 5   Global Step: 244780   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:40,724-Speed 2627.64 samples/sec   Loss 9.4820   LearningRate 0.0497   Epoch: 5   Global Step: 244790   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:13:44,604-Speed 2640.14 samples/sec   Loss 9.5908   LearningRate 0.0497   Epoch: 5   Global Step: 244800   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:48,493-Speed 2633.64 samples/sec   Loss 9.5456   LearningRate 0.0497   Epoch: 5   Global Step: 244810   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:52,385-Speed 2631.18 samples/sec   Loss 9.3413   LearningRate 0.0497   Epoch: 5   Global Step: 244820   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:13:56,279-Speed 2630.40 samples/sec   Loss 9.6042   LearningRate 0.0497   Epoch: 5   Global Step: 244830   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:14:00,172-Speed 2630.83 samples/sec   Loss 9.3242   LearningRate 0.0497   Epoch: 5   Global Step: 244840   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:14:04,084-Speed 2618.39 samples/sec   Loss 9.5815   LearningRate 0.0497   Epoch: 5   Global Step: 244850   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:14:07,967-Speed 2637.37 samples/sec   Loss 9.4715   LearningRate 0.0497   Epoch: 5   Global Step: 244860   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:14:11,869-Speed 2625.44 samples/sec   Loss 9.5135   LearningRate 0.0497   Epoch: 5   Global Step: 244870   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:14:15,766-Speed 2627.81 samples/sec   Loss 9.4639   LearningRate 0.0497   Epoch: 5   Global Step: 244880   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:14:19,664-Speed 2628.06 samples/sec   Loss 9.5543   LearningRate 0.0497   Epoch: 5   Global Step: 244890   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:14:23,581-Speed 2614.58 samples/sec   Loss 9.5567   LearningRate 0.0497   Epoch: 5   Global Step: 244900   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:14:27,473-Speed 2632.22 samples/sec   Loss 9.4505   LearningRate 0.0497   Epoch: 5   Global Step: 244910   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:14:31,360-Speed 2635.01 samples/sec   Loss 9.7151   LearningRate 0.0497   Epoch: 5   Global Step: 244920   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:14:35,247-Speed 2634.62 samples/sec   Loss 9.5913   LearningRate 0.0497   Epoch: 5   Global Step: 244930   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:14:39,138-Speed 2631.91 samples/sec   Loss 9.5374   LearningRate 0.0497   Epoch: 5   Global Step: 244940   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:14:43,032-Speed 2630.45 samples/sec   Loss 9.5653   LearningRate 0.0497   Epoch: 5   Global Step: 244950   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:14:46,927-Speed 2629.74 samples/sec   Loss 9.4287   LearningRate 0.0497   Epoch: 5   Global Step: 244960   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:14:50,826-Speed 2627.11 samples/sec   Loss 9.4067   LearningRate 0.0497   Epoch: 5   Global Step: 244970   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:14:54,773-Speed 2594.74 samples/sec   Loss 9.6900   LearningRate 0.0497   Epoch: 5   Global Step: 244980   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:14:58,671-Speed 2628.11 samples/sec   Loss 9.5027   LearningRate 0.0497   Epoch: 5   Global Step: 244990   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:15:02,565-Speed 2629.86 samples/sec   Loss 9.6579   LearningRate 0.0497   Epoch: 5   Global Step: 245000   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:15:06,464-Speed 2626.61 samples/sec   Loss 9.4506   LearningRate 0.0497   Epoch: 5   Global Step: 245010   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:15:10,358-Speed 2630.50 samples/sec   Loss 9.5236   LearningRate 0.0497   Epoch: 5   Global Step: 245020   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:15:14,250-Speed 2631.51 samples/sec   Loss 9.5722   LearningRate 0.0497   Epoch: 5   Global Step: 245030   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:18,151-Speed 2625.33 samples/sec   Loss 9.5003   LearningRate 0.0496   Epoch: 5   Global Step: 245040   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:22,047-Speed 2629.30 samples/sec   Loss 9.4521   LearningRate 0.0496   Epoch: 5   Global Step: 245050   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:25,940-Speed 2630.79 samples/sec   Loss 9.4213   LearningRate 0.0496   Epoch: 5   Global Step: 245060   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:29,836-Speed 2629.16 samples/sec   Loss 9.5540   LearningRate 0.0496   Epoch: 5   Global Step: 245070   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:33,732-Speed 2629.08 samples/sec   Loss 9.6330   LearningRate 0.0496   Epoch: 5   Global Step: 245080   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:37,625-Speed 2631.08 samples/sec   Loss 9.4697   LearningRate 0.0496   Epoch: 5   Global Step: 245090   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:41,523-Speed 2626.98 samples/sec   Loss 9.4399   LearningRate 0.0496   Epoch: 5   Global Step: 245100   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:45,422-Speed 2627.04 samples/sec   Loss 9.5548   LearningRate 0.0496   Epoch: 5   Global Step: 245110   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:49,321-Speed 2627.15 samples/sec   Loss 9.5632   LearningRate 0.0496   Epoch: 5   Global Step: 245120   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:15:53,182-Speed 2653.00 samples/sec   Loss 9.6432   LearningRate 0.0496   Epoch: 5   Global Step: 245130   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:15:57,079-Speed 2627.83 samples/sec   Loss 9.7183   LearningRate 0.0496   Epoch: 5   Global Step: 245140   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:00,975-Speed 2628.88 samples/sec   Loss 9.5990   LearningRate 0.0496   Epoch: 5   Global Step: 245150   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:04,872-Speed 2628.80 samples/sec   Loss 9.4572   LearningRate 0.0496   Epoch: 5   Global Step: 245160   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:08,761-Speed 2633.31 samples/sec   Loss 9.4389   LearningRate 0.0496   Epoch: 5   Global Step: 245170   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:12,652-Speed 2632.93 samples/sec   Loss 9.6179   LearningRate 0.0496   Epoch: 5   Global Step: 245180   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:16,547-Speed 2629.95 samples/sec   Loss 9.6185   LearningRate 0.0496   Epoch: 5   Global Step: 245190   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:20,438-Speed 2632.00 samples/sec   Loss 9.4326   LearningRate 0.0496   Epoch: 5   Global Step: 245200   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:24,332-Speed 2630.43 samples/sec   Loss 9.5039   LearningRate 0.0496   Epoch: 5   Global Step: 245210   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:28,236-Speed 2623.25 samples/sec   Loss 9.6326   LearningRate 0.0496   Epoch: 5   Global Step: 245220   Fp16 Grad Scale: 32768   Required: 66 hours
Training: 2022-04-13 23:16:32,125-Speed 2633.83 samples/sec   Loss 9.4809   LearningRate 0.0496   Epoch: 5   Global Step: 245230   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:16:36,023-Speed 2627.34 samples/sec   Loss 9.5188   LearningRate 0.0496   Epoch: 5   Global Step: 245240   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:16:39,911-Speed 2634.26 samples/sec   Loss 9.5549   LearningRate 0.0496   Epoch: 5   Global Step: 245250   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:16:43,804-Speed 2631.40 samples/sec   Loss 9.4520   LearningRate 0.0496   Epoch: 5   Global Step: 245260   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:16:47,699-Speed 2629.84 samples/sec   Loss 9.4896   LearningRate 0.0496   Epoch: 5   Global Step: 245270   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:16:51,593-Speed 2630.42 samples/sec   Loss 9.5749   LearningRate 0.0496   Epoch: 5   Global Step: 245280   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:16:55,485-Speed 2631.16 samples/sec   Loss 9.4860   LearningRate 0.0496   Epoch: 5   Global Step: 245290   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:16:59,389-Speed 2623.91 samples/sec   Loss 9.4873   LearningRate 0.0496   Epoch: 5   Global Step: 245300   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:17:03,306-Speed 2614.80 samples/sec   Loss 9.5904   LearningRate 0.0496   Epoch: 5   Global Step: 245310   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:17:07,201-Speed 2629.01 samples/sec   Loss 9.5188   LearningRate 0.0496   Epoch: 5   Global Step: 245320   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:17:11,105-Speed 2623.85 samples/sec   Loss 9.4256   LearningRate 0.0496   Epoch: 5   Global Step: 245330   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:14,996-Speed 2631.99 samples/sec   Loss 9.5615   LearningRate 0.0496   Epoch: 5   Global Step: 245340   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:18,895-Speed 2627.56 samples/sec   Loss 9.5036   LearningRate 0.0496   Epoch: 5   Global Step: 245350   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:22,795-Speed 2626.05 samples/sec   Loss 9.5128   LearningRate 0.0496   Epoch: 5   Global Step: 245360   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:26,693-Speed 2627.72 samples/sec   Loss 9.4971   LearningRate 0.0496   Epoch: 5   Global Step: 245370   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:30,591-Speed 2627.81 samples/sec   Loss 9.5387   LearningRate 0.0496   Epoch: 5   Global Step: 245380   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:34,481-Speed 2632.83 samples/sec   Loss 9.7222   LearningRate 0.0496   Epoch: 5   Global Step: 245390   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:38,379-Speed 2627.43 samples/sec   Loss 9.3275   LearningRate 0.0496   Epoch: 5   Global Step: 245400   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:42,273-Speed 2630.37 samples/sec   Loss 9.5118   LearningRate 0.0496   Epoch: 5   Global Step: 245410   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:46,163-Speed 2632.44 samples/sec   Loss 9.4769   LearningRate 0.0496   Epoch: 5   Global Step: 245420   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:50,055-Speed 2632.22 samples/sec   Loss 9.6651   LearningRate 0.0496   Epoch: 5   Global Step: 245430   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:17:53,927-Speed 2644.56 samples/sec   Loss 9.4596   LearningRate 0.0496   Epoch: 5   Global Step: 245440   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:17:57,819-Speed 2632.21 samples/sec   Loss 9.4676   LearningRate 0.0496   Epoch: 5   Global Step: 245450   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:18:01,712-Speed 2630.91 samples/sec   Loss 9.4487   LearningRate 0.0496   Epoch: 5   Global Step: 245460   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:18:05,590-Speed 2641.50 samples/sec   Loss 9.5453   LearningRate 0.0496   Epoch: 5   Global Step: 245470   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:09,483-Speed 2630.45 samples/sec   Loss 9.4230   LearningRate 0.0496   Epoch: 5   Global Step: 245480   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:13,373-Speed 2633.53 samples/sec   Loss 9.5638   LearningRate 0.0496   Epoch: 5   Global Step: 245490   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:17,285-Speed 2617.83 samples/sec   Loss 9.5743   LearningRate 0.0496   Epoch: 5   Global Step: 245500   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:21,223-Speed 2601.24 samples/sec   Loss 9.5884   LearningRate 0.0496   Epoch: 5   Global Step: 245510   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:25,117-Speed 2630.16 samples/sec   Loss 9.3782   LearningRate 0.0496   Epoch: 5   Global Step: 245520   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:29,010-Speed 2631.26 samples/sec   Loss 9.4843   LearningRate 0.0496   Epoch: 5   Global Step: 245530   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:32,910-Speed 2625.92 samples/sec   Loss 9.6982   LearningRate 0.0496   Epoch: 5   Global Step: 245540   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:36,813-Speed 2624.56 samples/sec   Loss 9.6400   LearningRate 0.0496   Epoch: 5   Global Step: 245550   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:40,708-Speed 2629.63 samples/sec   Loss 9.4664   LearningRate 0.0496   Epoch: 5   Global Step: 245560   Fp16 Grad Scale: 65536   Required: 66 hours
Training: 2022-04-13 23:18:44,601-Speed 2631.19 samples/sec   Loss 9.6289   LearningRate 0.0496   Epoch: 5   Global Step: 245570   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:18:48,499-Speed 2627.55 samples/sec   Loss 9.4742   LearningRate 0.0496   Epoch: 5   Global Step: 245580   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:18:52,388-Speed 2633.76 samples/sec   Loss 9.4050   LearningRate 0.0496   Epoch: 5   Global Step: 245590   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:18:56,283-Speed 2629.46 samples/sec   Loss 9.5620   LearningRate 0.0496   Epoch: 5   Global Step: 245600   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:00,175-Speed 2631.10 samples/sec   Loss 9.3687   LearningRate 0.0496   Epoch: 5   Global Step: 245610   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:04,070-Speed 2629.87 samples/sec   Loss 9.4938   LearningRate 0.0496   Epoch: 5   Global Step: 245620   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:07,966-Speed 2629.17 samples/sec   Loss 9.5758   LearningRate 0.0495   Epoch: 5   Global Step: 245630   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:11,861-Speed 2629.64 samples/sec   Loss 9.3954   LearningRate 0.0495   Epoch: 5   Global Step: 245640   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:15,756-Speed 2629.62 samples/sec   Loss 9.6540   LearningRate 0.0495   Epoch: 5   Global Step: 245650   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:19,648-Speed 2631.43 samples/sec   Loss 9.5716   LearningRate 0.0495   Epoch: 5   Global Step: 245660   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:23,545-Speed 2628.69 samples/sec   Loss 9.5915   LearningRate 0.0495   Epoch: 5   Global Step: 245670   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:19:27,447-Speed 2624.95 samples/sec   Loss 9.4593   LearningRate 0.0495   Epoch: 5   Global Step: 245680   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:19:31,325-Speed 2640.84 samples/sec   Loss 9.6289   LearningRate 0.0495   Epoch: 5   Global Step: 245690   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:35,220-Speed 2629.06 samples/sec   Loss 9.5068   LearningRate 0.0495   Epoch: 5   Global Step: 245700   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:39,136-Speed 2615.57 samples/sec   Loss 9.5931   LearningRate 0.0495   Epoch: 5   Global Step: 245710   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:43,026-Speed 2633.57 samples/sec   Loss 9.4722   LearningRate 0.0495   Epoch: 5   Global Step: 245720   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:46,916-Speed 2632.98 samples/sec   Loss 9.5446   LearningRate 0.0495   Epoch: 5   Global Step: 245730   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:50,827-Speed 2619.59 samples/sec   Loss 9.6133   LearningRate 0.0495   Epoch: 5   Global Step: 245740   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:54,718-Speed 2632.17 samples/sec   Loss 9.5165   LearningRate 0.0495   Epoch: 5   Global Step: 245750   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:19:58,606-Speed 2634.14 samples/sec   Loss 9.5269   LearningRate 0.0495   Epoch: 5   Global Step: 245760   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:02,498-Speed 2631.65 samples/sec   Loss 9.5341   LearningRate 0.0495   Epoch: 5   Global Step: 245770   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:06,387-Speed 2633.11 samples/sec   Loss 9.6066   LearningRate 0.0495   Epoch: 5   Global Step: 245780   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:10,263-Speed 2642.43 samples/sec   Loss 9.4708   LearningRate 0.0495   Epoch: 5   Global Step: 245790   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:14,165-Speed 2625.24 samples/sec   Loss 9.4873   LearningRate 0.0495   Epoch: 5   Global Step: 245800   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:18,058-Speed 2630.59 samples/sec   Loss 9.4257   LearningRate 0.0495   Epoch: 5   Global Step: 245810   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:21,959-Speed 2625.89 samples/sec   Loss 9.5249   LearningRate 0.0495   Epoch: 5   Global Step: 245820   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:25,854-Speed 2629.86 samples/sec   Loss 9.4715   LearningRate 0.0495   Epoch: 5   Global Step: 245830   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:29,757-Speed 2624.59 samples/sec   Loss 9.6067   LearningRate 0.0495   Epoch: 5   Global Step: 245840   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:33,653-Speed 2628.87 samples/sec   Loss 9.5344   LearningRate 0.0495   Epoch: 5   Global Step: 245850   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:37,549-Speed 2628.45 samples/sec   Loss 9.5790   LearningRate 0.0495   Epoch: 5   Global Step: 245860   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:41,440-Speed 2632.06 samples/sec   Loss 9.4988   LearningRate 0.0495   Epoch: 5   Global Step: 245870   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:45,348-Speed 2621.02 samples/sec   Loss 9.6179   LearningRate 0.0495   Epoch: 5   Global Step: 245880   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:49,245-Speed 2628.78 samples/sec   Loss 9.4863   LearningRate 0.0495   Epoch: 5   Global Step: 245890   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:20:53,130-Speed 2636.02 samples/sec   Loss 9.4308   LearningRate 0.0495   Epoch: 5   Global Step: 245900   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:20:57,016-Speed 2636.23 samples/sec   Loss 9.4794   LearningRate 0.0495   Epoch: 5   Global Step: 245910   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:00,909-Speed 2630.22 samples/sec   Loss 9.5567   LearningRate 0.0495   Epoch: 5   Global Step: 245920   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:04,806-Speed 2628.63 samples/sec   Loss 9.4828   LearningRate 0.0495   Epoch: 5   Global Step: 245930   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:08,697-Speed 2632.18 samples/sec   Loss 9.5852   LearningRate 0.0495   Epoch: 5   Global Step: 245940   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:12,589-Speed 2632.03 samples/sec   Loss 9.4544   LearningRate 0.0495   Epoch: 5   Global Step: 245950   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:16,483-Speed 2630.02 samples/sec   Loss 9.5163   LearningRate 0.0495   Epoch: 5   Global Step: 245960   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:20,384-Speed 2625.83 samples/sec   Loss 9.5153   LearningRate 0.0495   Epoch: 5   Global Step: 245970   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:24,278-Speed 2630.23 samples/sec   Loss 9.5120   LearningRate 0.0495   Epoch: 5   Global Step: 245980   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:28,179-Speed 2626.23 samples/sec   Loss 9.5111   LearningRate 0.0495   Epoch: 5   Global Step: 245990   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:32,075-Speed 2628.67 samples/sec   Loss 9.2883   LearningRate 0.0495   Epoch: 5   Global Step: 246000   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:21:35,985-Speed 2619.23 samples/sec   Loss 9.4712   LearningRate 0.0495   Epoch: 5   Global Step: 246010   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:21:39,868-Speed 2637.77 samples/sec   Loss 9.4856   LearningRate 0.0495   Epoch: 5   Global Step: 246020   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:43,764-Speed 2628.91 samples/sec   Loss 9.4596   LearningRate 0.0495   Epoch: 5   Global Step: 246030   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:47,655-Speed 2632.74 samples/sec   Loss 9.4530   LearningRate 0.0495   Epoch: 5   Global Step: 246040   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:51,546-Speed 2632.37 samples/sec   Loss 9.5612   LearningRate 0.0495   Epoch: 5   Global Step: 246050   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:55,437-Speed 2632.20 samples/sec   Loss 9.6297   LearningRate 0.0495   Epoch: 5   Global Step: 246060   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:21:59,329-Speed 2631.35 samples/sec   Loss 9.4544   LearningRate 0.0495   Epoch: 5   Global Step: 246070   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:03,223-Speed 2630.50 samples/sec   Loss 9.6352   LearningRate 0.0495   Epoch: 5   Global Step: 246080   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:07,117-Speed 2630.28 samples/sec   Loss 9.5611   LearningRate 0.0495   Epoch: 5   Global Step: 246090   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:11,012-Speed 2629.19 samples/sec   Loss 9.5247   LearningRate 0.0495   Epoch: 5   Global Step: 246100   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:14,908-Speed 2628.73 samples/sec   Loss 9.5208   LearningRate 0.0495   Epoch: 5   Global Step: 246110   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:18,839-Speed 2605.71 samples/sec   Loss 9.4050   LearningRate 0.0495   Epoch: 5   Global Step: 246120   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:22:22,719-Speed 2639.86 samples/sec   Loss 9.6034   LearningRate 0.0495   Epoch: 5   Global Step: 246130   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:26,613-Speed 2630.55 samples/sec   Loss 9.5929   LearningRate 0.0495   Epoch: 5   Global Step: 246140   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:30,505-Speed 2631.44 samples/sec   Loss 9.4005   LearningRate 0.0495   Epoch: 5   Global Step: 246150   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:34,400-Speed 2629.90 samples/sec   Loss 9.3717   LearningRate 0.0495   Epoch: 5   Global Step: 246160   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:38,301-Speed 2625.75 samples/sec   Loss 9.5968   LearningRate 0.0495   Epoch: 5   Global Step: 246170   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:42,198-Speed 2627.80 samples/sec   Loss 9.5320   LearningRate 0.0495   Epoch: 5   Global Step: 246180   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:46,105-Speed 2621.67 samples/sec   Loss 9.4852   LearningRate 0.0495   Epoch: 5   Global Step: 246190   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:50,009-Speed 2623.39 samples/sec   Loss 9.4553   LearningRate 0.0495   Epoch: 5   Global Step: 246200   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:53,897-Speed 2634.10 samples/sec   Loss 9.5129   LearningRate 0.0495   Epoch: 5   Global Step: 246210   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:22:57,790-Speed 2630.83 samples/sec   Loss 9.4790   LearningRate 0.0494   Epoch: 5   Global Step: 246220   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:23:01,681-Speed 2632.73 samples/sec   Loss 9.5185   LearningRate 0.0494   Epoch: 5   Global Step: 246230   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:23:05,573-Speed 2632.12 samples/sec   Loss 9.5821   LearningRate 0.0494   Epoch: 5   Global Step: 246240   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:23:09,463-Speed 2632.72 samples/sec   Loss 9.7049   LearningRate 0.0494   Epoch: 5   Global Step: 246250   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:23:13,352-Speed 2633.51 samples/sec   Loss 9.6951   LearningRate 0.0494   Epoch: 5   Global Step: 246260   Fp16 Grad Scale: 262144   Required: 66 hours
Training: 2022-04-13 23:23:17,223-Speed 2645.49 samples/sec   Loss 9.5504   LearningRate 0.0494   Epoch: 5   Global Step: 246270   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:23:21,114-Speed 2632.75 samples/sec   Loss 9.5188   LearningRate 0.0494   Epoch: 5   Global Step: 246280   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:23:25,005-Speed 2632.49 samples/sec   Loss 9.4864   LearningRate 0.0494   Epoch: 5   Global Step: 246290   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:23:28,896-Speed 2632.26 samples/sec   Loss 9.4730   LearningRate 0.0494   Epoch: 5   Global Step: 246300   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:23:32,789-Speed 2631.00 samples/sec   Loss 9.5173   LearningRate 0.0494   Epoch: 5   Global Step: 246310   Fp16 Grad Scale: 131072   Required: 66 hours
Training: 2022-04-13 23:23:36,681-Speed 2631.31 samples/sec   Loss 9.5009   LearningRate 0.0494   Epoch: 5   Global Step: 246320   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:23:40,576-Speed 2630.00 samples/sec   Loss 9.5282   LearningRate 0.0494   Epoch: 5   Global Step: 246330   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:23:44,465-Speed 2633.35 samples/sec   Loss 9.3546   LearningRate 0.0494   Epoch: 5   Global Step: 246340   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:23:48,357-Speed 2631.89 samples/sec   Loss 9.4394   LearningRate 0.0494   Epoch: 5   Global Step: 246350   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:23:52,255-Speed 2627.54 samples/sec   Loss 9.4334   LearningRate 0.0494   Epoch: 5   Global Step: 246360   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:23:56,151-Speed 2629.08 samples/sec   Loss 9.4848   LearningRate 0.0494   Epoch: 5   Global Step: 246370   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:24:00,053-Speed 2624.37 samples/sec   Loss 9.4792   LearningRate 0.0494   Epoch: 5   Global Step: 246380   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:24:03,924-Speed 2646.47 samples/sec   Loss 9.5001   LearningRate 0.0494   Epoch: 5   Global Step: 246390   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:07,814-Speed 2632.53 samples/sec   Loss 9.4353   LearningRate 0.0494   Epoch: 5   Global Step: 246400   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:11,708-Speed 2630.04 samples/sec   Loss 9.4539   LearningRate 0.0494   Epoch: 5   Global Step: 246410   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:15,609-Speed 2625.65 samples/sec   Loss 9.5909   LearningRate 0.0494   Epoch: 5   Global Step: 246420   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:19,503-Speed 2630.31 samples/sec   Loss 9.6451   LearningRate 0.0494   Epoch: 5   Global Step: 246430   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:23,397-Speed 2630.71 samples/sec   Loss 9.5123   LearningRate 0.0494   Epoch: 5   Global Step: 246440   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:27,290-Speed 2630.68 samples/sec   Loss 9.5545   LearningRate 0.0494   Epoch: 5   Global Step: 246450   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:31,182-Speed 2631.89 samples/sec   Loss 9.5017   LearningRate 0.0494   Epoch: 5   Global Step: 246460   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:35,073-Speed 2632.58 samples/sec   Loss 9.4450   LearningRate 0.0494   Epoch: 5   Global Step: 246470   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:38,964-Speed 2632.03 samples/sec   Loss 9.5850   LearningRate 0.0494   Epoch: 5   Global Step: 246480   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:42,844-Speed 2639.36 samples/sec   Loss 9.5815   LearningRate 0.0494   Epoch: 5   Global Step: 246490   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:46,729-Speed 2637.09 samples/sec   Loss 9.5430   LearningRate 0.0494   Epoch: 5   Global Step: 246500   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:50,624-Speed 2629.65 samples/sec   Loss 9.4508   LearningRate 0.0494   Epoch: 5   Global Step: 246510   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:24:54,502-Speed 2641.16 samples/sec   Loss 9.4488   LearningRate 0.0494   Epoch: 5   Global Step: 246520   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:24:58,398-Speed 2628.94 samples/sec   Loss 9.6648   LearningRate 0.0494   Epoch: 5   Global Step: 246530   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:02,296-Speed 2627.95 samples/sec   Loss 9.5843   LearningRate 0.0494   Epoch: 5   Global Step: 246540   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:06,188-Speed 2631.29 samples/sec   Loss 9.3273   LearningRate 0.0494   Epoch: 5   Global Step: 246550   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:10,079-Speed 2632.58 samples/sec   Loss 9.2950   LearningRate 0.0494   Epoch: 5   Global Step: 246560   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:13,969-Speed 2632.52 samples/sec   Loss 9.5634   LearningRate 0.0494   Epoch: 5   Global Step: 246570   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:17,870-Speed 2625.65 samples/sec   Loss 9.6061   LearningRate 0.0494   Epoch: 5   Global Step: 246580   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:21,777-Speed 2621.79 samples/sec   Loss 9.5308   LearningRate 0.0494   Epoch: 5   Global Step: 246590   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:25,681-Speed 2623.51 samples/sec   Loss 9.4987   LearningRate 0.0494   Epoch: 5   Global Step: 246600   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:29,571-Speed 2633.33 samples/sec   Loss 9.4526   LearningRate 0.0494   Epoch: 5   Global Step: 246610   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:33,466-Speed 2629.73 samples/sec   Loss 9.3456   LearningRate 0.0494   Epoch: 5   Global Step: 246620   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:25:37,346-Speed 2640.02 samples/sec   Loss 9.4790   LearningRate 0.0494   Epoch: 5   Global Step: 246630   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:41,236-Speed 2632.48 samples/sec   Loss 9.5100   LearningRate 0.0494   Epoch: 5   Global Step: 246640   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:45,136-Speed 2626.38 samples/sec   Loss 9.5086   LearningRate 0.0494   Epoch: 5   Global Step: 246650   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:49,049-Speed 2617.74 samples/sec   Loss 9.5084   LearningRate 0.0494   Epoch: 5   Global Step: 246660   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:25:52,935-Speed 2635.78 samples/sec   Loss 9.4968   LearningRate 0.0494   Epoch: 5   Global Step: 246670   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:25:56,833-Speed 2627.71 samples/sec   Loss 9.4380   LearningRate 0.0494   Epoch: 5   Global Step: 246680   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:00,722-Speed 2633.77 samples/sec   Loss 9.5739   LearningRate 0.0494   Epoch: 5   Global Step: 246690   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:04,617-Speed 2630.03 samples/sec   Loss 9.5049   LearningRate 0.0494   Epoch: 5   Global Step: 246700   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:08,519-Speed 2624.13 samples/sec   Loss 9.3442   LearningRate 0.0494   Epoch: 5   Global Step: 246710   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:12,410-Speed 2632.71 samples/sec   Loss 9.5678   LearningRate 0.0494   Epoch: 5   Global Step: 246720   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:16,301-Speed 2632.24 samples/sec   Loss 9.5976   LearningRate 0.0494   Epoch: 5   Global Step: 246730   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:20,193-Speed 2631.93 samples/sec   Loss 9.4155   LearningRate 0.0494   Epoch: 5   Global Step: 246740   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:24,108-Speed 2616.00 samples/sec   Loss 9.4416   LearningRate 0.0494   Epoch: 5   Global Step: 246750   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:28,010-Speed 2625.17 samples/sec   Loss 9.5018   LearningRate 0.0494   Epoch: 5   Global Step: 246760   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:26:31,903-Speed 2631.54 samples/sec   Loss 9.4580   LearningRate 0.0494   Epoch: 5   Global Step: 246770   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:26:35,793-Speed 2632.67 samples/sec   Loss 9.5319   LearningRate 0.0494   Epoch: 5   Global Step: 246780   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:26:39,687-Speed 2629.93 samples/sec   Loss 9.4249   LearningRate 0.0494   Epoch: 5   Global Step: 246790   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:26:43,580-Speed 2631.36 samples/sec   Loss 9.4487   LearningRate 0.0494   Epoch: 5   Global Step: 246800   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:26:47,475-Speed 2629.55 samples/sec   Loss 9.6767   LearningRate 0.0493   Epoch: 5   Global Step: 246810   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:26:51,368-Speed 2630.90 samples/sec   Loss 9.4131   LearningRate 0.0493   Epoch: 5   Global Step: 246820   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:26:55,259-Speed 2632.28 samples/sec   Loss 9.4214   LearningRate 0.0493   Epoch: 5   Global Step: 246830   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:26:59,176-Speed 2615.01 samples/sec   Loss 9.5140   LearningRate 0.0493   Epoch: 5   Global Step: 246840   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:27:03,085-Speed 2620.60 samples/sec   Loss 9.5358   LearningRate 0.0493   Epoch: 5   Global Step: 246850   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:27:06,975-Speed 2632.79 samples/sec   Loss 9.6177   LearningRate 0.0493   Epoch: 5   Global Step: 246860   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:27:10,866-Speed 2632.43 samples/sec   Loss 9.5170   LearningRate 0.0493   Epoch: 5   Global Step: 246870   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:14,764-Speed 2627.62 samples/sec   Loss 9.5328   LearningRate 0.0493   Epoch: 5   Global Step: 246880   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:18,658-Speed 2630.47 samples/sec   Loss 9.5197   LearningRate 0.0493   Epoch: 5   Global Step: 246890   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:22,555-Speed 2628.79 samples/sec   Loss 9.5363   LearningRate 0.0493   Epoch: 5   Global Step: 246900   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:26,447-Speed 2631.36 samples/sec   Loss 9.4983   LearningRate 0.0493   Epoch: 5   Global Step: 246910   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:30,342-Speed 2629.86 samples/sec   Loss 9.4820   LearningRate 0.0493   Epoch: 5   Global Step: 246920   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:34,244-Speed 2624.90 samples/sec   Loss 9.5489   LearningRate 0.0493   Epoch: 5   Global Step: 246930   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:38,137-Speed 2630.60 samples/sec   Loss 9.4980   LearningRate 0.0493   Epoch: 5   Global Step: 246940   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:42,032-Speed 2630.11 samples/sec   Loss 9.4854   LearningRate 0.0493   Epoch: 5   Global Step: 246950   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:27:45,904-Speed 2645.54 samples/sec   Loss 9.5613   LearningRate 0.0493   Epoch: 5   Global Step: 246960   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:27:49,763-Speed 2654.47 samples/sec   Loss 11.1611   LearningRate 0.0493   Epoch: 5   Global Step: 246970   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:27:53,648-Speed 2636.16 samples/sec   Loss 10.2777   LearningRate 0.0493   Epoch: 5   Global Step: 246980   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:27:57,543-Speed 2629.52 samples/sec   Loss 9.9330   LearningRate 0.0493   Epoch: 5   Global Step: 246990   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:01,435-Speed 2631.72 samples/sec   Loss 9.5338   LearningRate 0.0493   Epoch: 5   Global Step: 247000   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:05,323-Speed 2634.18 samples/sec   Loss 9.4666   LearningRate 0.0493   Epoch: 5   Global Step: 247010   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:09,211-Speed 2634.02 samples/sec   Loss 9.5782   LearningRate 0.0493   Epoch: 5   Global Step: 247020   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:13,101-Speed 2633.89 samples/sec   Loss 9.6396   LearningRate 0.0493   Epoch: 5   Global Step: 247030   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:16,988-Speed 2634.49 samples/sec   Loss 9.4538   LearningRate 0.0493   Epoch: 5   Global Step: 247040   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:20,883-Speed 2630.22 samples/sec   Loss 9.6430   LearningRate 0.0493   Epoch: 5   Global Step: 247050   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:24,770-Speed 2634.76 samples/sec   Loss 9.3908   LearningRate 0.0493   Epoch: 5   Global Step: 247060   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:28:28,663-Speed 2631.08 samples/sec   Loss 9.6417   LearningRate 0.0493   Epoch: 5   Global Step: 247070   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:32,554-Speed 2632.62 samples/sec   Loss 9.5588   LearningRate 0.0493   Epoch: 5   Global Step: 247080   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:36,444-Speed 2632.97 samples/sec   Loss 9.5378   LearningRate 0.0493   Epoch: 5   Global Step: 247090   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:40,330-Speed 2635.27 samples/sec   Loss 9.5037   LearningRate 0.0493   Epoch: 5   Global Step: 247100   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:44,222-Speed 2632.06 samples/sec   Loss 9.4120   LearningRate 0.0493   Epoch: 5   Global Step: 247110   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:48,118-Speed 2629.57 samples/sec   Loss 9.5266   LearningRate 0.0493   Epoch: 5   Global Step: 247120   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:52,011-Speed 2630.76 samples/sec   Loss 9.4221   LearningRate 0.0493   Epoch: 5   Global Step: 247130   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:55,900-Speed 2633.73 samples/sec   Loss 9.6960   LearningRate 0.0493   Epoch: 5   Global Step: 247140   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:28:59,807-Speed 2621.57 samples/sec   Loss 9.5500   LearningRate 0.0493   Epoch: 5   Global Step: 247150   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:29:03,719-Speed 2618.89 samples/sec   Loss 9.6194   LearningRate 0.0493   Epoch: 5   Global Step: 247160   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:29:07,624-Speed 2623.04 samples/sec   Loss 9.4456   LearningRate 0.0493   Epoch: 5   Global Step: 247170   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:29:11,548-Speed 2610.01 samples/sec   Loss 9.5187   LearningRate 0.0493   Epoch: 5   Global Step: 247180   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:29:15,447-Speed 2627.27 samples/sec   Loss 9.5582   LearningRate 0.0493   Epoch: 5   Global Step: 247190   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:29:19,344-Speed 2628.28 samples/sec   Loss 9.6012   LearningRate 0.0493   Epoch: 5   Global Step: 247200   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:29:23,241-Speed 2628.32 samples/sec   Loss 9.6363   LearningRate 0.0493   Epoch: 5   Global Step: 247210   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:29:27,091-Speed 2660.64 samples/sec   Loss 10.2801   LearningRate 0.0493   Epoch: 5   Global Step: 247220   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:29:30,982-Speed 2632.00 samples/sec   Loss 9.9408   LearningRate 0.0493   Epoch: 5   Global Step: 247230   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:29:34,859-Speed 2642.45 samples/sec   Loss 9.7733   LearningRate 0.0493   Epoch: 5   Global Step: 247240   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:29:38,745-Speed 2635.33 samples/sec   Loss 9.4521   LearningRate 0.0493   Epoch: 5   Global Step: 247250   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:29:42,632-Speed 2635.04 samples/sec   Loss 9.6497   LearningRate 0.0493   Epoch: 5   Global Step: 247260   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:29:46,525-Speed 2631.01 samples/sec   Loss 9.5586   LearningRate 0.0493   Epoch: 5   Global Step: 247270   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:29:50,536-Speed 2553.50 samples/sec   Loss 9.5815   LearningRate 0.0493   Epoch: 5   Global Step: 247280   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:29:54,416-Speed 2640.14 samples/sec   Loss 9.6092   LearningRate 0.0493   Epoch: 5   Global Step: 247290   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:29:58,323-Speed 2621.45 samples/sec   Loss 9.5481   LearningRate 0.0493   Epoch: 5   Global Step: 247300   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:30:02,220-Speed 2628.99 samples/sec   Loss 9.5804   LearningRate 0.0493   Epoch: 5   Global Step: 247310   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:30:06,115-Speed 2630.03 samples/sec   Loss 9.4982   LearningRate 0.0493   Epoch: 5   Global Step: 247320   Fp16 Grad Scale: 4096   Required: 65 hours
Training: 2022-04-13 23:30:10,006-Speed 2631.91 samples/sec   Loss 9.5348   LearningRate 0.0493   Epoch: 5   Global Step: 247330   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:13,902-Speed 2628.99 samples/sec   Loss 9.5320   LearningRate 0.0493   Epoch: 5   Global Step: 247340   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:17,793-Speed 2631.87 samples/sec   Loss 9.4938   LearningRate 0.0493   Epoch: 5   Global Step: 247350   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:21,688-Speed 2630.02 samples/sec   Loss 9.5110   LearningRate 0.0493   Epoch: 5   Global Step: 247360   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:25,579-Speed 2633.92 samples/sec   Loss 9.4957   LearningRate 0.0493   Epoch: 5   Global Step: 247370   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:29,503-Speed 2610.43 samples/sec   Loss 9.5665   LearningRate 0.0493   Epoch: 5   Global Step: 247380   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:33,396-Speed 2630.26 samples/sec   Loss 9.6129   LearningRate 0.0493   Epoch: 5   Global Step: 247390   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:37,289-Speed 2631.10 samples/sec   Loss 9.6245   LearningRate 0.0492   Epoch: 5   Global Step: 247400   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:41,178-Speed 2633.93 samples/sec   Loss 9.5903   LearningRate 0.0492   Epoch: 5   Global Step: 247410   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:45,063-Speed 2636.86 samples/sec   Loss 9.5536   LearningRate 0.0492   Epoch: 5   Global Step: 247420   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:30:48,951-Speed 2634.53 samples/sec   Loss 9.4534   LearningRate 0.0492   Epoch: 5   Global Step: 247430   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:30:52,837-Speed 2635.85 samples/sec   Loss 9.4697   LearningRate 0.0492   Epoch: 5   Global Step: 247440   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:30:56,726-Speed 2633.55 samples/sec   Loss 9.3993   LearningRate 0.0492   Epoch: 5   Global Step: 247450   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:00,612-Speed 2635.93 samples/sec   Loss 9.4525   LearningRate 0.0492   Epoch: 5   Global Step: 247460   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:04,501-Speed 2633.06 samples/sec   Loss 9.3200   LearningRate 0.0492   Epoch: 5   Global Step: 247470   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:08,412-Speed 2619.11 samples/sec   Loss 9.5724   LearningRate 0.0492   Epoch: 5   Global Step: 247480   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:12,310-Speed 2627.53 samples/sec   Loss 9.5571   LearningRate 0.0492   Epoch: 5   Global Step: 247490   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:16,202-Speed 2632.03 samples/sec   Loss 9.5423   LearningRate 0.0492   Epoch: 5   Global Step: 247500   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:20,094-Speed 2631.60 samples/sec   Loss 9.5337   LearningRate 0.0492   Epoch: 5   Global Step: 247510   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:23,995-Speed 2625.57 samples/sec   Loss 9.4674   LearningRate 0.0492   Epoch: 5   Global Step: 247520   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:31:27,893-Speed 2627.54 samples/sec   Loss 9.5570   LearningRate 0.0492   Epoch: 5   Global Step: 247530   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:31,781-Speed 2634.75 samples/sec   Loss 9.5704   LearningRate 0.0492   Epoch: 5   Global Step: 247540   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:35,680-Speed 2627.06 samples/sec   Loss 9.5659   LearningRate 0.0492   Epoch: 5   Global Step: 247550   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:39,572-Speed 2631.45 samples/sec   Loss 9.4392   LearningRate 0.0492   Epoch: 5   Global Step: 247560   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:43,466-Speed 2630.61 samples/sec   Loss 9.5027   LearningRate 0.0492   Epoch: 5   Global Step: 247570   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:47,367-Speed 2625.22 samples/sec   Loss 9.6044   LearningRate 0.0492   Epoch: 5   Global Step: 247580   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:51,257-Speed 2633.70 samples/sec   Loss 9.4583   LearningRate 0.0492   Epoch: 5   Global Step: 247590   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:55,151-Speed 2630.05 samples/sec   Loss 9.4611   LearningRate 0.0492   Epoch: 5   Global Step: 247600   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:31:59,043-Speed 2632.05 samples/sec   Loss 9.5308   LearningRate 0.0492   Epoch: 5   Global Step: 247610   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:32:02,933-Speed 2633.12 samples/sec   Loss 9.4002   LearningRate 0.0492   Epoch: 5   Global Step: 247620   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:32:06,832-Speed 2626.90 samples/sec   Loss 9.4878   LearningRate 0.0492   Epoch: 5   Global Step: 247630   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:10,724-Speed 2631.66 samples/sec   Loss 9.4217   LearningRate 0.0492   Epoch: 5   Global Step: 247640   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:14,637-Speed 2617.92 samples/sec   Loss 9.4637   LearningRate 0.0492   Epoch: 5   Global Step: 247650   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:18,536-Speed 2627.12 samples/sec   Loss 9.4736   LearningRate 0.0492   Epoch: 5   Global Step: 247660   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:22,425-Speed 2633.37 samples/sec   Loss 9.5300   LearningRate 0.0492   Epoch: 5   Global Step: 247670   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:26,321-Speed 2628.96 samples/sec   Loss 9.4464   LearningRate 0.0492   Epoch: 5   Global Step: 247680   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:30,236-Speed 2616.52 samples/sec   Loss 9.4886   LearningRate 0.0492   Epoch: 5   Global Step: 247690   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:34,142-Speed 2622.29 samples/sec   Loss 9.4753   LearningRate 0.0492   Epoch: 5   Global Step: 247700   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:38,039-Speed 2628.27 samples/sec   Loss 9.3649   LearningRate 0.0492   Epoch: 5   Global Step: 247710   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:41,934-Speed 2629.58 samples/sec   Loss 9.5172   LearningRate 0.0492   Epoch: 5   Global Step: 247720   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:32:45,858-Speed 2610.24 samples/sec   Loss 9.4279   LearningRate 0.0492   Epoch: 5   Global Step: 247730   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:32:49,751-Speed 2631.31 samples/sec   Loss 9.4792   LearningRate 0.0492   Epoch: 5   Global Step: 247740   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:32:53,684-Speed 2604.81 samples/sec   Loss 9.6541   LearningRate 0.0492   Epoch: 5   Global Step: 247750   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:32:57,609-Speed 2609.11 samples/sec   Loss 9.6085   LearningRate 0.0492   Epoch: 5   Global Step: 247760   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:01,502-Speed 2631.19 samples/sec   Loss 9.5938   LearningRate 0.0492   Epoch: 5   Global Step: 247770   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:05,392-Speed 2633.43 samples/sec   Loss 9.3586   LearningRate 0.0492   Epoch: 5   Global Step: 247780   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:09,280-Speed 2633.81 samples/sec   Loss 9.4507   LearningRate 0.0492   Epoch: 5   Global Step: 247790   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:13,172-Speed 2631.79 samples/sec   Loss 9.4023   LearningRate 0.0492   Epoch: 5   Global Step: 247800   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:17,068-Speed 2628.76 samples/sec   Loss 9.5912   LearningRate 0.0492   Epoch: 5   Global Step: 247810   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:20,982-Speed 2617.73 samples/sec   Loss 9.5501   LearningRate 0.0492   Epoch: 5   Global Step: 247820   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:24,876-Speed 2630.12 samples/sec   Loss 9.3740   LearningRate 0.0492   Epoch: 5   Global Step: 247830   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:33:28,747-Speed 2646.20 samples/sec   Loss 9.4285   LearningRate 0.0492   Epoch: 5   Global Step: 247840   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:32,638-Speed 2632.30 samples/sec   Loss 9.5019   LearningRate 0.0492   Epoch: 5   Global Step: 247850   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:36,534-Speed 2628.44 samples/sec   Loss 9.6026   LearningRate 0.0492   Epoch: 5   Global Step: 247860   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:40,426-Speed 2631.31 samples/sec   Loss 9.5223   LearningRate 0.0492   Epoch: 5   Global Step: 247870   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:44,320-Speed 2630.64 samples/sec   Loss 9.5117   LearningRate 0.0492   Epoch: 5   Global Step: 247880   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:48,210-Speed 2633.09 samples/sec   Loss 9.4472   LearningRate 0.0492   Epoch: 5   Global Step: 247890   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:52,101-Speed 2632.53 samples/sec   Loss 9.5637   LearningRate 0.0492   Epoch: 5   Global Step: 247900   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:56,001-Speed 2626.48 samples/sec   Loss 9.5537   LearningRate 0.0492   Epoch: 5   Global Step: 247910   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:33:59,910-Speed 2620.08 samples/sec   Loss 9.3801   LearningRate 0.0492   Epoch: 5   Global Step: 247920   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:03,808-Speed 2627.53 samples/sec   Loss 9.2518   LearningRate 0.0492   Epoch: 5   Global Step: 247930   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:07,705-Speed 2628.15 samples/sec   Loss 9.6209   LearningRate 0.0492   Epoch: 5   Global Step: 247940   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:34:11,597-Speed 2631.74 samples/sec   Loss 9.4764   LearningRate 0.0492   Epoch: 5   Global Step: 247950   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:34:15,481-Speed 2637.09 samples/sec   Loss 9.5373   LearningRate 0.0492   Epoch: 5   Global Step: 247960   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:19,404-Speed 2610.68 samples/sec   Loss 9.5635   LearningRate 0.0492   Epoch: 5   Global Step: 247970   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:23,298-Speed 2630.35 samples/sec   Loss 9.4709   LearningRate 0.0492   Epoch: 5   Global Step: 247980   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:27,191-Speed 2631.36 samples/sec   Loss 9.4743   LearningRate 0.0491   Epoch: 5   Global Step: 247990   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:31,080-Speed 2633.50 samples/sec   Loss 9.5458   LearningRate 0.0491   Epoch: 5   Global Step: 248000   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:34,970-Speed 2633.29 samples/sec   Loss 9.5879   LearningRate 0.0491   Epoch: 5   Global Step: 248010   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:38,863-Speed 2630.84 samples/sec   Loss 9.5188   LearningRate 0.0491   Epoch: 5   Global Step: 248020   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:42,756-Speed 2631.09 samples/sec   Loss 9.5124   LearningRate 0.0491   Epoch: 5   Global Step: 248030   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:46,649-Speed 2630.58 samples/sec   Loss 9.4197   LearningRate 0.0491   Epoch: 5   Global Step: 248040   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:50,547-Speed 2628.37 samples/sec   Loss 9.4812   LearningRate 0.0491   Epoch: 5   Global Step: 248050   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:54,438-Speed 2632.54 samples/sec   Loss 9.4440   LearningRate 0.0491   Epoch: 5   Global Step: 248060   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:34:58,369-Speed 2605.74 samples/sec   Loss 9.5516   LearningRate 0.0491   Epoch: 5   Global Step: 248070   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:02,257-Speed 2634.37 samples/sec   Loss 9.6292   LearningRate 0.0491   Epoch: 5   Global Step: 248080   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:06,157-Speed 2626.76 samples/sec   Loss 9.5712   LearningRate 0.0491   Epoch: 5   Global Step: 248090   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:10,064-Speed 2620.84 samples/sec   Loss 9.4463   LearningRate 0.0491   Epoch: 5   Global Step: 248100   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:13,956-Speed 2631.77 samples/sec   Loss 9.5136   LearningRate 0.0491   Epoch: 5   Global Step: 248110   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:17,849-Speed 2631.10 samples/sec   Loss 9.4908   LearningRate 0.0491   Epoch: 5   Global Step: 248120   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:21,756-Speed 2621.55 samples/sec   Loss 9.5279   LearningRate 0.0491   Epoch: 5   Global Step: 248130   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:25,659-Speed 2624.48 samples/sec   Loss 9.4909   LearningRate 0.0491   Epoch: 5   Global Step: 248140   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:29,554-Speed 2629.76 samples/sec   Loss 9.4608   LearningRate 0.0491   Epoch: 5   Global Step: 248150   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:33,450-Speed 2629.26 samples/sec   Loss 9.6543   LearningRate 0.0491   Epoch: 5   Global Step: 248160   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:35:37,343-Speed 2631.06 samples/sec   Loss 9.4186   LearningRate 0.0491   Epoch: 5   Global Step: 248170   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:35:41,273-Speed 2606.41 samples/sec   Loss 9.5465   LearningRate 0.0491   Epoch: 5   Global Step: 248180   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:35:45,179-Speed 2621.89 samples/sec   Loss 9.4689   LearningRate 0.0491   Epoch: 5   Global Step: 248190   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:35:49,080-Speed 2626.27 samples/sec   Loss 9.6112   LearningRate 0.0491   Epoch: 5   Global Step: 248200   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:35:52,982-Speed 2624.66 samples/sec   Loss 9.4852   LearningRate 0.0491   Epoch: 5   Global Step: 248210   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:35:56,879-Speed 2628.54 samples/sec   Loss 9.5967   LearningRate 0.0491   Epoch: 5   Global Step: 248220   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:00,796-Speed 2614.35 samples/sec   Loss 9.5618   LearningRate 0.0491   Epoch: 5   Global Step: 248230   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:04,699-Speed 2625.22 samples/sec   Loss 9.3876   LearningRate 0.0491   Epoch: 5   Global Step: 248240   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:08,602-Speed 2624.08 samples/sec   Loss 9.4614   LearningRate 0.0491   Epoch: 5   Global Step: 248250   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:12,516-Speed 2617.11 samples/sec   Loss 9.2963   LearningRate 0.0491   Epoch: 5   Global Step: 248260   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:16,402-Speed 2635.85 samples/sec   Loss 9.5491   LearningRate 0.0491   Epoch: 5   Global Step: 248270   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:36:20,290-Speed 2634.11 samples/sec   Loss 9.4949   LearningRate 0.0491   Epoch: 5   Global Step: 248280   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:36:24,166-Speed 2642.53 samples/sec   Loss 9.5197   LearningRate 0.0491   Epoch: 5   Global Step: 248290   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:28,059-Speed 2630.89 samples/sec   Loss 9.4746   LearningRate 0.0491   Epoch: 5   Global Step: 248300   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:31,971-Speed 2618.52 samples/sec   Loss 9.4196   LearningRate 0.0491   Epoch: 5   Global Step: 248310   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:35,861-Speed 2633.23 samples/sec   Loss 9.3797   LearningRate 0.0491   Epoch: 5   Global Step: 248320   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:39,763-Speed 2625.18 samples/sec   Loss 9.4763   LearningRate 0.0491   Epoch: 5   Global Step: 248330   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:43,662-Speed 2626.75 samples/sec   Loss 9.4439   LearningRate 0.0491   Epoch: 5   Global Step: 248340   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:47,595-Speed 2604.23 samples/sec   Loss 9.3295   LearningRate 0.0491   Epoch: 5   Global Step: 248350   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:51,526-Speed 2605.73 samples/sec   Loss 9.4576   LearningRate 0.0491   Epoch: 5   Global Step: 248360   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:36:55,400-Speed 2644.20 samples/sec   Loss 9.4841   LearningRate 0.0491   Epoch: 5   Global Step: 248370   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:36:59,290-Speed 2633.07 samples/sec   Loss 9.5818   LearningRate 0.0491   Epoch: 5   Global Step: 248380   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:03,181-Speed 2632.25 samples/sec   Loss 9.5118   LearningRate 0.0491   Epoch: 5   Global Step: 248390   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:07,098-Speed 2615.12 samples/sec   Loss 9.4670   LearningRate 0.0491   Epoch: 5   Global Step: 248400   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:11,016-Speed 2614.22 samples/sec   Loss 9.4847   LearningRate 0.0491   Epoch: 5   Global Step: 248410   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:14,994-Speed 2574.63 samples/sec   Loss 9.4926   LearningRate 0.0491   Epoch: 5   Global Step: 248420   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:18,888-Speed 2630.28 samples/sec   Loss 9.5388   LearningRate 0.0491   Epoch: 5   Global Step: 248430   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:22,779-Speed 2632.79 samples/sec   Loss 9.4482   LearningRate 0.0491   Epoch: 5   Global Step: 248440   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:26,667-Speed 2634.13 samples/sec   Loss 9.4247   LearningRate 0.0491   Epoch: 5   Global Step: 248450   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:30,559-Speed 2631.91 samples/sec   Loss 9.4933   LearningRate 0.0491   Epoch: 5   Global Step: 248460   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:37:34,457-Speed 2627.19 samples/sec   Loss 9.3606   LearningRate 0.0491   Epoch: 5   Global Step: 248470   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:37:38,354-Speed 2628.47 samples/sec   Loss 9.4377   LearningRate 0.0491   Epoch: 5   Global Step: 248480   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:37:42,257-Speed 2623.82 samples/sec   Loss 9.5610   LearningRate 0.0491   Epoch: 5   Global Step: 248490   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:37:46,146-Speed 2634.06 samples/sec   Loss 9.3322   LearningRate 0.0491   Epoch: 5   Global Step: 248500   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:37:50,036-Speed 2633.04 samples/sec   Loss 9.5179   LearningRate 0.0491   Epoch: 5   Global Step: 248510   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:37:53,928-Speed 2632.12 samples/sec   Loss 9.4951   LearningRate 0.0491   Epoch: 5   Global Step: 248520   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:37:57,828-Speed 2625.79 samples/sec   Loss 9.4132   LearningRate 0.0491   Epoch: 5   Global Step: 248530   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:01,739-Speed 2618.93 samples/sec   Loss 9.4243   LearningRate 0.0491   Epoch: 5   Global Step: 248540   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:05,630-Speed 2632.55 samples/sec   Loss 9.5829   LearningRate 0.0491   Epoch: 5   Global Step: 248550   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:09,529-Speed 2626.34 samples/sec   Loss 9.5176   LearningRate 0.0491   Epoch: 5   Global Step: 248560   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:13,443-Speed 2617.33 samples/sec   Loss 9.5063   LearningRate 0.0491   Epoch: 5   Global Step: 248570   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:38:17,423-Speed 2573.55 samples/sec   Loss 9.5146   LearningRate 0.0490   Epoch: 5   Global Step: 248580   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:38:21,313-Speed 2633.61 samples/sec   Loss 9.4220   LearningRate 0.0490   Epoch: 5   Global Step: 248590   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:38:25,207-Speed 2629.94 samples/sec   Loss 9.6004   LearningRate 0.0490   Epoch: 5   Global Step: 248600   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:29,107-Speed 2626.53 samples/sec   Loss 9.5224   LearningRate 0.0490   Epoch: 5   Global Step: 248610   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:32,998-Speed 2632.73 samples/sec   Loss 9.4652   LearningRate 0.0490   Epoch: 5   Global Step: 248620   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:36,887-Speed 2633.68 samples/sec   Loss 9.4678   LearningRate 0.0490   Epoch: 5   Global Step: 248630   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:40,794-Speed 2620.91 samples/sec   Loss 9.3503   LearningRate 0.0490   Epoch: 5   Global Step: 248640   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:44,674-Speed 2639.93 samples/sec   Loss 9.5733   LearningRate 0.0490   Epoch: 5   Global Step: 248650   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:48,565-Speed 2632.42 samples/sec   Loss 9.5011   LearningRate 0.0490   Epoch: 5   Global Step: 248660   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:38:52,442-Speed 2642.19 samples/sec   Loss 9.3746   LearningRate 0.0490   Epoch: 5   Global Step: 248670   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:38:56,335-Speed 2631.34 samples/sec   Loss 9.5417   LearningRate 0.0490   Epoch: 5   Global Step: 248680   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:39:00,179-Speed 2664.70 samples/sec   Loss 9.9596   LearningRate 0.0490   Epoch: 5   Global Step: 248690   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:04,073-Speed 2630.04 samples/sec   Loss 10.5796   LearningRate 0.0490   Epoch: 5   Global Step: 248700   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:07,960-Speed 2634.89 samples/sec   Loss 9.8318   LearningRate 0.0490   Epoch: 5   Global Step: 248710   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:11,850-Speed 2632.55 samples/sec   Loss 9.4759   LearningRate 0.0490   Epoch: 5   Global Step: 248720   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:15,740-Speed 2633.43 samples/sec   Loss 9.5001   LearningRate 0.0490   Epoch: 5   Global Step: 248730   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:19,643-Speed 2623.97 samples/sec   Loss 9.4250   LearningRate 0.0490   Epoch: 5   Global Step: 248740   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:23,536-Speed 2631.21 samples/sec   Loss 9.4808   LearningRate 0.0490   Epoch: 5   Global Step: 248750   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:27,432-Speed 2628.83 samples/sec   Loss 9.4480   LearningRate 0.0490   Epoch: 5   Global Step: 248760   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:31,329-Speed 2628.77 samples/sec   Loss 9.3496   LearningRate 0.0490   Epoch: 5   Global Step: 248770   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:35,221-Speed 2631.65 samples/sec   Loss 9.3858   LearningRate 0.0490   Epoch: 5   Global Step: 248780   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-13 23:39:39,115-Speed 2629.99 samples/sec   Loss 9.5104   LearningRate 0.0490   Epoch: 5   Global Step: 248790   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:39:43,018-Speed 2624.14 samples/sec   Loss 9.5013   LearningRate 0.0490   Epoch: 5   Global Step: 248800   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:39:46,926-Speed 2621.10 samples/sec   Loss 9.5317   LearningRate 0.0490   Epoch: 5   Global Step: 248810   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:39:50,815-Speed 2633.22 samples/sec   Loss 9.6170   LearningRate 0.0490   Epoch: 5   Global Step: 248820   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:39:54,712-Speed 2628.23 samples/sec   Loss 9.5691   LearningRate 0.0490   Epoch: 5   Global Step: 248830   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:39:58,601-Speed 2634.15 samples/sec   Loss 9.6470   LearningRate 0.0490   Epoch: 5   Global Step: 248840   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:40:02,494-Speed 2631.53 samples/sec   Loss 9.3551   LearningRate 0.0490   Epoch: 5   Global Step: 248850   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:40:06,387-Speed 2630.54 samples/sec   Loss 9.4282   LearningRate 0.0490   Epoch: 5   Global Step: 248860   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:40:10,292-Speed 2623.02 samples/sec   Loss 9.4197   LearningRate 0.0490   Epoch: 5   Global Step: 248870   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:40:32,124-Speed 469.05 samples/sec   Loss 9.4151   LearningRate 0.0490   Epoch: 6   Global Step: 248880   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:40:36,018-Speed 2630.43 samples/sec   Loss 9.3991   LearningRate 0.0490   Epoch: 6   Global Step: 248890   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:40:39,907-Speed 2633.73 samples/sec   Loss 9.4376   LearningRate 0.0490   Epoch: 6   Global Step: 248900   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:40:43,826-Speed 2614.61 samples/sec   Loss 9.4657   LearningRate 0.0490   Epoch: 6   Global Step: 248910   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:40:47,749-Speed 2610.95 samples/sec   Loss 9.5165   LearningRate 0.0490   Epoch: 6   Global Step: 248920   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:40:51,642-Speed 2631.45 samples/sec   Loss 9.4526   LearningRate 0.0490   Epoch: 6   Global Step: 248930   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:40:55,533-Speed 2632.09 samples/sec   Loss 9.3744   LearningRate 0.0490   Epoch: 6   Global Step: 248940   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:40:59,424-Speed 2632.87 samples/sec   Loss 9.5917   LearningRate 0.0490   Epoch: 6   Global Step: 248950   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:41:03,348-Speed 2610.23 samples/sec   Loss 9.5356   LearningRate 0.0490   Epoch: 6   Global Step: 248960   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:41:07,254-Speed 2621.89 samples/sec   Loss 9.5657   LearningRate 0.0490   Epoch: 6   Global Step: 248970   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:41:11,154-Speed 2626.24 samples/sec   Loss 9.5554   LearningRate 0.0490   Epoch: 6   Global Step: 248980   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:41:15,063-Speed 2620.69 samples/sec   Loss 9.4879   LearningRate 0.0490   Epoch: 6   Global Step: 248990   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:18,974-Speed 2618.81 samples/sec   Loss 9.4240   LearningRate 0.0490   Epoch: 6   Global Step: 249000   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:22,879-Speed 2623.17 samples/sec   Loss 9.4620   LearningRate 0.0490   Epoch: 6   Global Step: 249010   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:26,784-Speed 2622.62 samples/sec   Loss 9.3585   LearningRate 0.0490   Epoch: 6   Global Step: 249020   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:30,699-Speed 2616.71 samples/sec   Loss 9.5357   LearningRate 0.0490   Epoch: 6   Global Step: 249030   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:34,618-Speed 2613.36 samples/sec   Loss 9.4126   LearningRate 0.0490   Epoch: 6   Global Step: 249040   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:38,526-Speed 2620.49 samples/sec   Loss 9.3555   LearningRate 0.0490   Epoch: 6   Global Step: 249050   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:42,435-Speed 2620.50 samples/sec   Loss 9.3726   LearningRate 0.0490   Epoch: 6   Global Step: 249060   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:46,353-Speed 2614.35 samples/sec   Loss 9.4706   LearningRate 0.0490   Epoch: 6   Global Step: 249070   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:50,269-Speed 2614.90 samples/sec   Loss 9.4136   LearningRate 0.0490   Epoch: 6   Global Step: 249080   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:41:54,176-Speed 2621.92 samples/sec   Loss 9.5219   LearningRate 0.0490   Epoch: 6   Global Step: 249090   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:41:58,079-Speed 2624.18 samples/sec   Loss 9.4934   LearningRate 0.0490   Epoch: 6   Global Step: 249100   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:02,003-Speed 2610.86 samples/sec   Loss 9.4199   LearningRate 0.0490   Epoch: 6   Global Step: 249110   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:05,928-Speed 2609.35 samples/sec   Loss 9.4692   LearningRate 0.0490   Epoch: 6   Global Step: 249120   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:09,833-Speed 2622.53 samples/sec   Loss 9.4948   LearningRate 0.0490   Epoch: 6   Global Step: 249130   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:13,741-Speed 2621.39 samples/sec   Loss 9.7260   LearningRate 0.0490   Epoch: 6   Global Step: 249140   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:17,647-Speed 2622.41 samples/sec   Loss 9.4086   LearningRate 0.0490   Epoch: 6   Global Step: 249150   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:21,564-Speed 2614.59 samples/sec   Loss 9.5127   LearningRate 0.0490   Epoch: 6   Global Step: 249160   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:25,492-Speed 2607.70 samples/sec   Loss 9.5220   LearningRate 0.0490   Epoch: 6   Global Step: 249170   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:29,398-Speed 2622.32 samples/sec   Loss 9.3732   LearningRate 0.0489   Epoch: 6   Global Step: 249180   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:33,299-Speed 2625.22 samples/sec   Loss 9.5379   LearningRate 0.0489   Epoch: 6   Global Step: 249190   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:42:37,267-Speed 2581.98 samples/sec   Loss 9.3875   LearningRate 0.0489   Epoch: 6   Global Step: 249200   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:42:41,194-Speed 2608.02 samples/sec   Loss 9.5787   LearningRate 0.0489   Epoch: 6   Global Step: 249210   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:45,095-Speed 2626.05 samples/sec   Loss 9.4610   LearningRate 0.0489   Epoch: 6   Global Step: 249220   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:48,995-Speed 2626.30 samples/sec   Loss 9.3558   LearningRate 0.0489   Epoch: 6   Global Step: 249230   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:52,902-Speed 2621.41 samples/sec   Loss 9.2167   LearningRate 0.0489   Epoch: 6   Global Step: 249240   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:42:56,822-Speed 2613.52 samples/sec   Loss 9.3626   LearningRate 0.0489   Epoch: 6   Global Step: 249250   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:00,726-Speed 2623.21 samples/sec   Loss 9.4149   LearningRate 0.0489   Epoch: 6   Global Step: 249260   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:04,628-Speed 2625.28 samples/sec   Loss 9.4750   LearningRate 0.0489   Epoch: 6   Global Step: 249270   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:08,528-Speed 2625.85 samples/sec   Loss 9.3846   LearningRate 0.0489   Epoch: 6   Global Step: 249280   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:12,430-Speed 2624.79 samples/sec   Loss 9.5522   LearningRate 0.0489   Epoch: 6   Global Step: 249290   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:16,339-Speed 2621.17 samples/sec   Loss 9.3321   LearningRate 0.0489   Epoch: 6   Global Step: 249300   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:20,237-Speed 2627.21 samples/sec   Loss 9.4543   LearningRate 0.0489   Epoch: 6   Global Step: 249310   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:43:24,167-Speed 2606.42 samples/sec   Loss 9.4917   LearningRate 0.0489   Epoch: 6   Global Step: 249320   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:43:28,072-Speed 2622.98 samples/sec   Loss 9.5252   LearningRate 0.0489   Epoch: 6   Global Step: 249330   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:43:31,979-Speed 2622.28 samples/sec   Loss 9.1780   LearningRate 0.0489   Epoch: 6   Global Step: 249340   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:43:35,864-Speed 2636.02 samples/sec   Loss 9.5262   LearningRate 0.0489   Epoch: 6   Global Step: 249350   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:39,770-Speed 2622.35 samples/sec   Loss 9.4032   LearningRate 0.0489   Epoch: 6   Global Step: 249360   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:43,672-Speed 2624.33 samples/sec   Loss 9.4291   LearningRate 0.0489   Epoch: 6   Global Step: 249370   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:47,575-Speed 2625.01 samples/sec   Loss 9.4990   LearningRate 0.0489   Epoch: 6   Global Step: 249380   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:51,478-Speed 2624.25 samples/sec   Loss 9.4268   LearningRate 0.0489   Epoch: 6   Global Step: 249390   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:43:55,338-Speed 2653.71 samples/sec   Loss 10.2778   LearningRate 0.0489   Epoch: 6   Global Step: 249400   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:43:59,238-Speed 2626.54 samples/sec   Loss 10.1882   LearningRate 0.0489   Epoch: 6   Global Step: 249410   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:03,145-Speed 2620.94 samples/sec   Loss 9.8791   LearningRate 0.0489   Epoch: 6   Global Step: 249420   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:07,048-Speed 2624.24 samples/sec   Loss 9.4665   LearningRate 0.0489   Epoch: 6   Global Step: 249430   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:10,960-Speed 2617.87 samples/sec   Loss 9.6702   LearningRate 0.0489   Epoch: 6   Global Step: 249440   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:14,894-Speed 2604.54 samples/sec   Loss 9.4674   LearningRate 0.0489   Epoch: 6   Global Step: 249450   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:18,796-Speed 2625.44 samples/sec   Loss 9.2932   LearningRate 0.0489   Epoch: 6   Global Step: 249460   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:22,699-Speed 2624.41 samples/sec   Loss 9.3700   LearningRate 0.0489   Epoch: 6   Global Step: 249470   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:26,604-Speed 2623.19 samples/sec   Loss 9.3945   LearningRate 0.0489   Epoch: 6   Global Step: 249480   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:30,506-Speed 2624.93 samples/sec   Loss 9.5187   LearningRate 0.0489   Epoch: 6   Global Step: 249490   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-13 23:44:34,416-Speed 2619.07 samples/sec   Loss 9.4341   LearningRate 0.0489   Epoch: 6   Global Step: 249500   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:44:38,345-Speed 2606.85 samples/sec   Loss 9.4035   LearningRate 0.0489   Epoch: 6   Global Step: 249510   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:44:42,249-Speed 2623.76 samples/sec   Loss 9.6053   LearningRate 0.0489   Epoch: 6   Global Step: 249520   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:44:46,154-Speed 2622.95 samples/sec   Loss 9.3582   LearningRate 0.0489   Epoch: 6   Global Step: 249530   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:44:50,058-Speed 2623.94 samples/sec   Loss 9.4756   LearningRate 0.0489   Epoch: 6   Global Step: 249540   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:44:53,965-Speed 2621.33 samples/sec   Loss 9.4923   LearningRate 0.0489   Epoch: 6   Global Step: 249550   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:44:57,866-Speed 2625.47 samples/sec   Loss 9.5627   LearningRate 0.0489   Epoch: 6   Global Step: 249560   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:45:01,775-Speed 2620.90 samples/sec   Loss 9.3328   LearningRate 0.0489   Epoch: 6   Global Step: 249570   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:45:05,680-Speed 2622.69 samples/sec   Loss 9.5177   LearningRate 0.0489   Epoch: 6   Global Step: 249580   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:45:09,592-Speed 2618.16 samples/sec   Loss 9.4032   LearningRate 0.0489   Epoch: 6   Global Step: 249590   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:45:13,499-Speed 2621.32 samples/sec   Loss 9.6054   LearningRate 0.0489   Epoch: 6   Global Step: 249600   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:17,419-Speed 2613.41 samples/sec   Loss 9.4426   LearningRate 0.0489   Epoch: 6   Global Step: 249610   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:21,326-Speed 2621.60 samples/sec   Loss 9.3167   LearningRate 0.0489   Epoch: 6   Global Step: 249620   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:25,226-Speed 2626.38 samples/sec   Loss 9.5632   LearningRate 0.0489   Epoch: 6   Global Step: 249630   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:29,129-Speed 2624.17 samples/sec   Loss 9.4452   LearningRate 0.0489   Epoch: 6   Global Step: 249640   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:33,039-Speed 2620.25 samples/sec   Loss 9.5188   LearningRate 0.0489   Epoch: 6   Global Step: 249650   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:36,942-Speed 2623.92 samples/sec   Loss 9.6226   LearningRate 0.0489   Epoch: 6   Global Step: 249660   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:40,854-Speed 2617.98 samples/sec   Loss 9.3679   LearningRate 0.0489   Epoch: 6   Global Step: 249670   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:44,768-Speed 2616.65 samples/sec   Loss 9.6470   LearningRate 0.0489   Epoch: 6   Global Step: 249680   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:48,682-Speed 2617.50 samples/sec   Loss 9.5597   LearningRate 0.0489   Epoch: 6   Global Step: 249690   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:45:52,601-Speed 2613.30 samples/sec   Loss 9.4092   LearningRate 0.0489   Epoch: 6   Global Step: 249700   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:45:56,516-Speed 2616.10 samples/sec   Loss 9.5076   LearningRate 0.0489   Epoch: 6   Global Step: 249710   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:00,451-Speed 2603.65 samples/sec   Loss 9.5039   LearningRate 0.0489   Epoch: 6   Global Step: 249720   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:04,364-Speed 2617.00 samples/sec   Loss 9.4566   LearningRate 0.0489   Epoch: 6   Global Step: 249730   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:08,396-Speed 2540.56 samples/sec   Loss 9.3634   LearningRate 0.0489   Epoch: 6   Global Step: 249740   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:12,319-Speed 2610.37 samples/sec   Loss 9.3496   LearningRate 0.0489   Epoch: 6   Global Step: 249750   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:16,239-Speed 2614.52 samples/sec   Loss 9.4532   LearningRate 0.0489   Epoch: 6   Global Step: 249760   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:20,149-Speed 2619.09 samples/sec   Loss 9.4841   LearningRate 0.0488   Epoch: 6   Global Step: 249770   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:24,060-Speed 2619.26 samples/sec   Loss 9.3839   LearningRate 0.0488   Epoch: 6   Global Step: 249780   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:27,979-Speed 2613.34 samples/sec   Loss 9.5379   LearningRate 0.0488   Epoch: 6   Global Step: 249790   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:31,890-Speed 2619.42 samples/sec   Loss 9.5682   LearningRate 0.0488   Epoch: 6   Global Step: 249800   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:46:35,807-Speed 2614.66 samples/sec   Loss 9.3601   LearningRate 0.0488   Epoch: 6   Global Step: 249810   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:46:39,754-Speed 2595.70 samples/sec   Loss 9.5184   LearningRate 0.0488   Epoch: 6   Global Step: 249820   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:46:43,662-Speed 2620.66 samples/sec   Loss 9.5268   LearningRate 0.0488   Epoch: 6   Global Step: 249830   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:46:47,562-Speed 2625.98 samples/sec   Loss 9.4600   LearningRate 0.0488   Epoch: 6   Global Step: 249840   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:51,469-Speed 2621.67 samples/sec   Loss 9.4771   LearningRate 0.0488   Epoch: 6   Global Step: 249850   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:55,374-Speed 2622.85 samples/sec   Loss 9.4950   LearningRate 0.0488   Epoch: 6   Global Step: 249860   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:46:59,276-Speed 2624.75 samples/sec   Loss 9.4730   LearningRate 0.0488   Epoch: 6   Global Step: 249870   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:03,179-Speed 2624.84 samples/sec   Loss 9.4417   LearningRate 0.0488   Epoch: 6   Global Step: 249880   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:07,089-Speed 2619.23 samples/sec   Loss 9.5983   LearningRate 0.0488   Epoch: 6   Global Step: 249890   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:11,022-Speed 2604.74 samples/sec   Loss 9.3588   LearningRate 0.0488   Epoch: 6   Global Step: 249900   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:14,925-Speed 2624.51 samples/sec   Loss 9.4094   LearningRate 0.0488   Epoch: 6   Global Step: 249910   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:18,826-Speed 2625.28 samples/sec   Loss 9.4231   LearningRate 0.0488   Epoch: 6   Global Step: 249920   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:22,735-Speed 2620.26 samples/sec   Loss 9.4332   LearningRate 0.0488   Epoch: 6   Global Step: 249930   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:26,623-Speed 2634.66 samples/sec   Loss 9.5170   LearningRate 0.0488   Epoch: 6   Global Step: 249940   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:30,526-Speed 2624.17 samples/sec   Loss 9.3892   LearningRate 0.0488   Epoch: 6   Global Step: 249950   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:34,462-Speed 2601.80 samples/sec   Loss 9.4715   LearningRate 0.0488   Epoch: 6   Global Step: 249960   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:38,387-Speed 2610.05 samples/sec   Loss 9.4741   LearningRate 0.0488   Epoch: 6   Global Step: 249970   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:42,301-Speed 2617.27 samples/sec   Loss 9.2392   LearningRate 0.0488   Epoch: 6   Global Step: 249980   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:46,205-Speed 2622.92 samples/sec   Loss 9.4043   LearningRate 0.0488   Epoch: 6   Global Step: 249990   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:47:50,114-Speed 2620.51 samples/sec   Loss 9.4199   LearningRate 0.0488   Epoch: 6   Global Step: 250000   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:48:32,686-[lfw][250000]XNorm: 23.060224
Training: 2022-04-13 23:48:32,687-[lfw][250000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-13 23:48:32,687-[lfw][250000]Accuracy-Highest: 0.99783
Training: 2022-04-13 23:49:23,677-[cfp_fp][250000]XNorm: 21.058497
Training: 2022-04-13 23:49:23,678-[cfp_fp][250000]Accuracy-Flip: 0.98643+-0.00630
Training: 2022-04-13 23:49:23,679-[cfp_fp][250000]Accuracy-Highest: 0.98643
Training: 2022-04-13 23:50:06,750-[agedb_30][250000]XNorm: 22.844863
Training: 2022-04-13 23:50:06,751-[agedb_30][250000]Accuracy-Flip: 0.97350+-0.00497
Training: 2022-04-13 23:50:06,752-[agedb_30][250000]Accuracy-Highest: 0.97350
Training: 2022-04-13 23:50:10,633-Speed 72.87 samples/sec   Loss 9.4795   LearningRate 0.0488   Epoch: 6   Global Step: 250010   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:50:14,512-Speed 2640.83 samples/sec   Loss 9.5247   LearningRate 0.0488   Epoch: 6   Global Step: 250020   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:50:18,393-Speed 2639.27 samples/sec   Loss 9.4675   LearningRate 0.0488   Epoch: 6   Global Step: 250030   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:50:22,278-Speed 2636.03 samples/sec   Loss 9.4229   LearningRate 0.0488   Epoch: 6   Global Step: 250040   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:50:26,163-Speed 2636.19 samples/sec   Loss 9.3145   LearningRate 0.0488   Epoch: 6   Global Step: 250050   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:50:30,056-Speed 2630.61 samples/sec   Loss 9.4436   LearningRate 0.0488   Epoch: 6   Global Step: 250060   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:50:33,954-Speed 2628.23 samples/sec   Loss 9.4900   LearningRate 0.0488   Epoch: 6   Global Step: 250070   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:50:37,859-Speed 2622.24 samples/sec   Loss 9.3488   LearningRate 0.0488   Epoch: 6   Global Step: 250080   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:50:41,755-Speed 2629.38 samples/sec   Loss 9.5109   LearningRate 0.0488   Epoch: 6   Global Step: 250090   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:50:45,649-Speed 2629.82 samples/sec   Loss 9.4124   LearningRate 0.0488   Epoch: 6   Global Step: 250100   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:50:49,539-Speed 2633.93 samples/sec   Loss 9.3919   LearningRate 0.0488   Epoch: 6   Global Step: 250110   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:50:53,427-Speed 2634.24 samples/sec   Loss 9.3098   LearningRate 0.0488   Epoch: 6   Global Step: 250120   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:50:57,319-Speed 2631.70 samples/sec   Loss 9.3235   LearningRate 0.0488   Epoch: 6   Global Step: 250130   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:01,207-Speed 2634.66 samples/sec   Loss 9.3885   LearningRate 0.0488   Epoch: 6   Global Step: 250140   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:05,101-Speed 2630.68 samples/sec   Loss 9.5117   LearningRate 0.0488   Epoch: 6   Global Step: 250150   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:08,997-Speed 2628.73 samples/sec   Loss 9.5828   LearningRate 0.0488   Epoch: 6   Global Step: 250160   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:12,908-Speed 2619.04 samples/sec   Loss 9.5401   LearningRate 0.0488   Epoch: 6   Global Step: 250170   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:16,801-Speed 2630.81 samples/sec   Loss 9.4196   LearningRate 0.0488   Epoch: 6   Global Step: 250180   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:20,704-Speed 2624.72 samples/sec   Loss 9.4573   LearningRate 0.0488   Epoch: 6   Global Step: 250190   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:51:24,597-Speed 2631.16 samples/sec   Loss 9.5196   LearningRate 0.0488   Epoch: 6   Global Step: 250200   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:51:28,496-Speed 2626.72 samples/sec   Loss 9.5458   LearningRate 0.0488   Epoch: 6   Global Step: 250210   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:51:32,390-Speed 2630.17 samples/sec   Loss 9.4914   LearningRate 0.0488   Epoch: 6   Global Step: 250220   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:51:36,282-Speed 2631.53 samples/sec   Loss 9.4577   LearningRate 0.0488   Epoch: 6   Global Step: 250230   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:51:40,161-Speed 2641.32 samples/sec   Loss 9.3600   LearningRate 0.0488   Epoch: 6   Global Step: 250240   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:44,061-Speed 2625.55 samples/sec   Loss 9.3471   LearningRate 0.0488   Epoch: 6   Global Step: 250250   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:47,955-Speed 2630.40 samples/sec   Loss 9.4754   LearningRate 0.0488   Epoch: 6   Global Step: 250260   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:51,866-Speed 2618.97 samples/sec   Loss 9.3924   LearningRate 0.0488   Epoch: 6   Global Step: 250270   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:55,762-Speed 2629.26 samples/sec   Loss 9.4724   LearningRate 0.0488   Epoch: 6   Global Step: 250280   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:51:59,660-Speed 2627.70 samples/sec   Loss 9.3624   LearningRate 0.0488   Epoch: 6   Global Step: 250290   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:03,605-Speed 2595.94 samples/sec   Loss 9.4878   LearningRate 0.0488   Epoch: 6   Global Step: 250300   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:07,497-Speed 2632.09 samples/sec   Loss 9.3383   LearningRate 0.0488   Epoch: 6   Global Step: 250310   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:11,393-Speed 2629.28 samples/sec   Loss 9.3810   LearningRate 0.0488   Epoch: 6   Global Step: 250320   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:15,293-Speed 2625.72 samples/sec   Loss 9.3959   LearningRate 0.0488   Epoch: 6   Global Step: 250330   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:19,198-Speed 2623.75 samples/sec   Loss 9.3561   LearningRate 0.0488   Epoch: 6   Global Step: 250340   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:52:23,100-Speed 2624.51 samples/sec   Loss 9.4484   LearningRate 0.0488   Epoch: 6   Global Step: 250350   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:52:26,993-Speed 2630.81 samples/sec   Loss 9.4237   LearningRate 0.0487   Epoch: 6   Global Step: 250360   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:52:30,871-Speed 2641.31 samples/sec   Loss 9.3676   LearningRate 0.0487   Epoch: 6   Global Step: 250370   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:34,764-Speed 2630.89 samples/sec   Loss 9.4265   LearningRate 0.0487   Epoch: 6   Global Step: 250380   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:38,659-Speed 2629.39 samples/sec   Loss 9.3274   LearningRate 0.0487   Epoch: 6   Global Step: 250390   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:42,551-Speed 2632.10 samples/sec   Loss 9.4898   LearningRate 0.0487   Epoch: 6   Global Step: 250400   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:46,442-Speed 2632.90 samples/sec   Loss 9.4714   LearningRate 0.0487   Epoch: 6   Global Step: 250410   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:50,334-Speed 2631.07 samples/sec   Loss 9.3755   LearningRate 0.0487   Epoch: 6   Global Step: 250420   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:54,255-Speed 2612.77 samples/sec   Loss 9.4086   LearningRate 0.0487   Epoch: 6   Global Step: 250430   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:52:58,149-Speed 2630.42 samples/sec   Loss 9.3413   LearningRate 0.0487   Epoch: 6   Global Step: 250440   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:02,047-Speed 2627.52 samples/sec   Loss 9.2456   LearningRate 0.0487   Epoch: 6   Global Step: 250450   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:05,944-Speed 2627.98 samples/sec   Loss 9.4180   LearningRate 0.0487   Epoch: 6   Global Step: 250460   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:09,838-Speed 2630.70 samples/sec   Loss 9.4440   LearningRate 0.0487   Epoch: 6   Global Step: 250470   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:53:13,737-Speed 2627.21 samples/sec   Loss 9.4338   LearningRate 0.0487   Epoch: 6   Global Step: 250480   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:53:17,627-Speed 2632.86 samples/sec   Loss 9.5037   LearningRate 0.0487   Epoch: 6   Global Step: 250490   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:21,519-Speed 2631.67 samples/sec   Loss 9.4525   LearningRate 0.0487   Epoch: 6   Global Step: 250500   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:25,414-Speed 2630.02 samples/sec   Loss 9.4378   LearningRate 0.0487   Epoch: 6   Global Step: 250510   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:29,308-Speed 2630.23 samples/sec   Loss 9.5830   LearningRate 0.0487   Epoch: 6   Global Step: 250520   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:33,217-Speed 2620.46 samples/sec   Loss 9.4648   LearningRate 0.0487   Epoch: 6   Global Step: 250530   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:37,128-Speed 2618.13 samples/sec   Loss 9.3446   LearningRate 0.0487   Epoch: 6   Global Step: 250540   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:41,022-Speed 2630.85 samples/sec   Loss 9.3836   LearningRate 0.0487   Epoch: 6   Global Step: 250550   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:44,914-Speed 2631.81 samples/sec   Loss 9.4484   LearningRate 0.0487   Epoch: 6   Global Step: 250560   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:48,804-Speed 2632.75 samples/sec   Loss 9.3939   LearningRate 0.0487   Epoch: 6   Global Step: 250570   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:52,696-Speed 2631.86 samples/sec   Loss 9.2878   LearningRate 0.0487   Epoch: 6   Global Step: 250580   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:53:56,591-Speed 2629.74 samples/sec   Loss 9.4067   LearningRate 0.0487   Epoch: 6   Global Step: 250590   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:54:00,486-Speed 2629.66 samples/sec   Loss 9.5045   LearningRate 0.0487   Epoch: 6   Global Step: 250600   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:54:04,382-Speed 2628.62 samples/sec   Loss 9.5429   LearningRate 0.0487   Epoch: 6   Global Step: 250610   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:54:08,278-Speed 2629.00 samples/sec   Loss 9.4358   LearningRate 0.0487   Epoch: 6   Global Step: 250620   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:54:12,176-Speed 2627.12 samples/sec   Loss 9.4058   LearningRate 0.0487   Epoch: 6   Global Step: 250630   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:54:16,072-Speed 2630.01 samples/sec   Loss 9.4923   LearningRate 0.0487   Epoch: 6   Global Step: 250640   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:54:19,946-Speed 2643.24 samples/sec   Loss 9.5148   LearningRate 0.0487   Epoch: 6   Global Step: 250650   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:23,840-Speed 2630.74 samples/sec   Loss 9.5599   LearningRate 0.0487   Epoch: 6   Global Step: 250660   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:27,735-Speed 2629.36 samples/sec   Loss 9.4298   LearningRate 0.0487   Epoch: 6   Global Step: 250670   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:31,640-Speed 2623.01 samples/sec   Loss 9.4514   LearningRate 0.0487   Epoch: 6   Global Step: 250680   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:35,544-Speed 2623.43 samples/sec   Loss 9.3914   LearningRate 0.0487   Epoch: 6   Global Step: 250690   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:39,452-Speed 2620.93 samples/sec   Loss 9.4688   LearningRate 0.0487   Epoch: 6   Global Step: 250700   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:43,357-Speed 2622.89 samples/sec   Loss 9.4397   LearningRate 0.0487   Epoch: 6   Global Step: 250710   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:47,263-Speed 2622.02 samples/sec   Loss 9.4321   LearningRate 0.0487   Epoch: 6   Global Step: 250720   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:51,168-Speed 2623.38 samples/sec   Loss 9.3600   LearningRate 0.0487   Epoch: 6   Global Step: 250730   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:55,079-Speed 2618.60 samples/sec   Loss 9.3223   LearningRate 0.0487   Epoch: 6   Global Step: 250740   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:54:58,975-Speed 2628.99 samples/sec   Loss 9.4056   LearningRate 0.0487   Epoch: 6   Global Step: 250750   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:55:02,886-Speed 2618.67 samples/sec   Loss 9.3621   LearningRate 0.0487   Epoch: 6   Global Step: 250760   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:55:06,809-Speed 2611.09 samples/sec   Loss 9.4855   LearningRate 0.0487   Epoch: 6   Global Step: 250770   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:55:10,707-Speed 2627.24 samples/sec   Loss 9.4017   LearningRate 0.0487   Epoch: 6   Global Step: 250780   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:14,607-Speed 2626.18 samples/sec   Loss 9.4804   LearningRate 0.0487   Epoch: 6   Global Step: 250790   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:18,512-Speed 2622.72 samples/sec   Loss 9.5169   LearningRate 0.0487   Epoch: 6   Global Step: 250800   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:22,423-Speed 2619.62 samples/sec   Loss 9.4991   LearningRate 0.0487   Epoch: 6   Global Step: 250810   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:26,318-Speed 2629.39 samples/sec   Loss 9.4095   LearningRate 0.0487   Epoch: 6   Global Step: 250820   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:30,225-Speed 2622.03 samples/sec   Loss 9.4573   LearningRate 0.0487   Epoch: 6   Global Step: 250830   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:34,132-Speed 2621.62 samples/sec   Loss 9.3957   LearningRate 0.0487   Epoch: 6   Global Step: 250840   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:38,049-Speed 2614.86 samples/sec   Loss 9.4991   LearningRate 0.0487   Epoch: 6   Global Step: 250850   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:41,957-Speed 2621.53 samples/sec   Loss 9.4137   LearningRate 0.0487   Epoch: 6   Global Step: 250860   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:45,868-Speed 2618.96 samples/sec   Loss 9.5137   LearningRate 0.0487   Epoch: 6   Global Step: 250870   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:55:49,815-Speed 2594.95 samples/sec   Loss 9.4216   LearningRate 0.0487   Epoch: 6   Global Step: 250880   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:55:53,711-Speed 2629.26 samples/sec   Loss 9.4598   LearningRate 0.0487   Epoch: 6   Global Step: 250890   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:55:57,607-Speed 2629.20 samples/sec   Loss 9.5282   LearningRate 0.0487   Epoch: 6   Global Step: 250900   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:56:01,493-Speed 2636.08 samples/sec   Loss 9.4723   LearningRate 0.0487   Epoch: 6   Global Step: 250910   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:05,396-Speed 2624.16 samples/sec   Loss 9.4627   LearningRate 0.0487   Epoch: 6   Global Step: 250920   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:09,300-Speed 2623.34 samples/sec   Loss 9.4100   LearningRate 0.0487   Epoch: 6   Global Step: 250930   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:13,204-Speed 2623.33 samples/sec   Loss 9.3773   LearningRate 0.0487   Epoch: 6   Global Step: 250940   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:17,105-Speed 2625.61 samples/sec   Loss 9.3816   LearningRate 0.0487   Epoch: 6   Global Step: 250950   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:21,011-Speed 2622.28 samples/sec   Loss 9.4757   LearningRate 0.0486   Epoch: 6   Global Step: 250960   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:24,915-Speed 2623.37 samples/sec   Loss 9.2176   LearningRate 0.0486   Epoch: 6   Global Step: 250970   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:28,809-Speed 2630.30 samples/sec   Loss 9.2546   LearningRate 0.0486   Epoch: 6   Global Step: 250980   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:33,216-Speed 2324.37 samples/sec   Loss 9.5268   LearningRate 0.0486   Epoch: 6   Global Step: 250990   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:37,180-Speed 2584.09 samples/sec   Loss 9.3457   LearningRate 0.0486   Epoch: 6   Global Step: 251000   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:41,066-Speed 2635.63 samples/sec   Loss 9.4429   LearningRate 0.0486   Epoch: 6   Global Step: 251010   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:44,954-Speed 2634.25 samples/sec   Loss 9.4228   LearningRate 0.0486   Epoch: 6   Global Step: 251020   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:48,850-Speed 2628.48 samples/sec   Loss 9.3850   LearningRate 0.0486   Epoch: 6   Global Step: 251030   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:52,746-Speed 2629.22 samples/sec   Loss 9.3057   LearningRate 0.0486   Epoch: 6   Global Step: 251040   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:56:56,644-Speed 2627.27 samples/sec   Loss 9.5021   LearningRate 0.0486   Epoch: 6   Global Step: 251050   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:57:00,540-Speed 2629.15 samples/sec   Loss 9.4535   LearningRate 0.0486   Epoch: 6   Global Step: 251060   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:57:04,437-Speed 2628.33 samples/sec   Loss 9.4585   LearningRate 0.0486   Epoch: 6   Global Step: 251070   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:57:08,331-Speed 2630.33 samples/sec   Loss 9.3224   LearningRate 0.0486   Epoch: 6   Global Step: 251080   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:57:12,214-Speed 2637.54 samples/sec   Loss 9.4492   LearningRate 0.0486   Epoch: 6   Global Step: 251090   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:16,105-Speed 2632.51 samples/sec   Loss 9.2040   LearningRate 0.0486   Epoch: 6   Global Step: 251100   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:20,010-Speed 2622.72 samples/sec   Loss 9.4946   LearningRate 0.0486   Epoch: 6   Global Step: 251110   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:23,921-Speed 2618.92 samples/sec   Loss 9.4182   LearningRate 0.0486   Epoch: 6   Global Step: 251120   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:27,828-Speed 2621.98 samples/sec   Loss 9.3053   LearningRate 0.0486   Epoch: 6   Global Step: 251130   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:31,731-Speed 2623.74 samples/sec   Loss 9.3866   LearningRate 0.0486   Epoch: 6   Global Step: 251140   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:35,637-Speed 2622.50 samples/sec   Loss 9.4247   LearningRate 0.0486   Epoch: 6   Global Step: 251150   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:39,544-Speed 2621.27 samples/sec   Loss 9.5253   LearningRate 0.0486   Epoch: 6   Global Step: 251160   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:43,458-Speed 2616.83 samples/sec   Loss 9.4614   LearningRate 0.0486   Epoch: 6   Global Step: 251170   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:47,355-Speed 2628.43 samples/sec   Loss 9.4296   LearningRate 0.0486   Epoch: 6   Global Step: 251180   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:57:51,254-Speed 2627.18 samples/sec   Loss 9.4460   LearningRate 0.0486   Epoch: 6   Global Step: 251190   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:57:55,154-Speed 2626.12 samples/sec   Loss 9.3367   LearningRate 0.0486   Epoch: 6   Global Step: 251200   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:57:59,055-Speed 2625.66 samples/sec   Loss 9.4120   LearningRate 0.0486   Epoch: 6   Global Step: 251210   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:02,975-Speed 2612.72 samples/sec   Loss 9.4276   LearningRate 0.0486   Epoch: 6   Global Step: 251220   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:06,872-Speed 2628.20 samples/sec   Loss 9.4086   LearningRate 0.0486   Epoch: 6   Global Step: 251230   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:10,767-Speed 2629.47 samples/sec   Loss 9.4558   LearningRate 0.0486   Epoch: 6   Global Step: 251240   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:14,662-Speed 2629.88 samples/sec   Loss 9.4585   LearningRate 0.0486   Epoch: 6   Global Step: 251250   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:18,573-Speed 2618.12 samples/sec   Loss 9.3473   LearningRate 0.0486   Epoch: 6   Global Step: 251260   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:22,471-Speed 2627.82 samples/sec   Loss 9.3434   LearningRate 0.0486   Epoch: 6   Global Step: 251270   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:26,367-Speed 2630.01 samples/sec   Loss 9.4197   LearningRate 0.0486   Epoch: 6   Global Step: 251280   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:30,260-Speed 2631.13 samples/sec   Loss 9.5521   LearningRate 0.0486   Epoch: 6   Global Step: 251290   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-13 23:58:34,142-Speed 2638.48 samples/sec   Loss 9.4919   LearningRate 0.0486   Epoch: 6   Global Step: 251300   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:38,099-Speed 2588.10 samples/sec   Loss 9.3230   LearningRate 0.0486   Epoch: 6   Global Step: 251310   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:42,191-Speed 2503.05 samples/sec   Loss 9.5084   LearningRate 0.0486   Epoch: 6   Global Step: 251320   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:46,164-Speed 2578.31 samples/sec   Loss 9.3383   LearningRate 0.0486   Epoch: 6   Global Step: 251330   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-13 23:58:50,021-Speed 2654.97 samples/sec   Loss 9.4361   LearningRate 0.0486   Epoch: 6   Global Step: 251340   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:58:53,983-Speed 2585.49 samples/sec   Loss 9.5937   LearningRate 0.0486   Epoch: 6   Global Step: 251350   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:58:57,875-Speed 2631.22 samples/sec   Loss 9.4681   LearningRate 0.0486   Epoch: 6   Global Step: 251360   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:01,764-Speed 2633.71 samples/sec   Loss 9.5214   LearningRate 0.0486   Epoch: 6   Global Step: 251370   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:05,656-Speed 2631.26 samples/sec   Loss 9.2972   LearningRate 0.0486   Epoch: 6   Global Step: 251380   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:09,552-Speed 2629.65 samples/sec   Loss 9.5050   LearningRate 0.0486   Epoch: 6   Global Step: 251390   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:13,446-Speed 2630.39 samples/sec   Loss 9.4187   LearningRate 0.0486   Epoch: 6   Global Step: 251400   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:17,339-Speed 2630.72 samples/sec   Loss 9.4854   LearningRate 0.0486   Epoch: 6   Global Step: 251410   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:21,232-Speed 2631.02 samples/sec   Loss 9.3733   LearningRate 0.0486   Epoch: 6   Global Step: 251420   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:25,129-Speed 2628.31 samples/sec   Loss 9.5612   LearningRate 0.0486   Epoch: 6   Global Step: 251430   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:29,030-Speed 2625.32 samples/sec   Loss 9.4489   LearningRate 0.0486   Epoch: 6   Global Step: 251440   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:59:32,924-Speed 2630.24 samples/sec   Loss 9.4247   LearningRate 0.0486   Epoch: 6   Global Step: 251450   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:59:36,824-Speed 2626.08 samples/sec   Loss 9.2978   LearningRate 0.0486   Epoch: 6   Global Step: 251460   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-13 23:59:40,715-Speed 2632.64 samples/sec   Loss 9.9335   LearningRate 0.0486   Epoch: 6   Global Step: 251470   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:44,608-Speed 2631.11 samples/sec   Loss 10.0452   LearningRate 0.0486   Epoch: 6   Global Step: 251480   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:48,508-Speed 2626.04 samples/sec   Loss 9.8083   LearningRate 0.0486   Epoch: 6   Global Step: 251490   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:52,409-Speed 2625.65 samples/sec   Loss 9.7663   LearningRate 0.0486   Epoch: 6   Global Step: 251500   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-13 23:59:56,316-Speed 2621.86 samples/sec   Loss 9.7303   LearningRate 0.0486   Epoch: 6   Global Step: 251510   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:00:00,205-Speed 2633.38 samples/sec   Loss 9.5860   LearningRate 0.0486   Epoch: 6   Global Step: 251520   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:00:04,122-Speed 2614.90 samples/sec   Loss 9.6441   LearningRate 0.0486   Epoch: 6   Global Step: 251530   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:00:08,025-Speed 2624.08 samples/sec   Loss 9.6077   LearningRate 0.0486   Epoch: 6   Global Step: 251540   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:00:11,927-Speed 2624.59 samples/sec   Loss 9.6171   LearningRate 0.0485   Epoch: 6   Global Step: 251550   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:00:15,833-Speed 2622.94 samples/sec   Loss 9.5051   LearningRate 0.0485   Epoch: 6   Global Step: 251560   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:00:19,726-Speed 2630.84 samples/sec   Loss 9.2651   LearningRate 0.0485   Epoch: 6   Global Step: 251570   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:00:23,624-Speed 2628.65 samples/sec   Loss 9.4983   LearningRate 0.0485   Epoch: 6   Global Step: 251580   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:00:27,514-Speed 2632.78 samples/sec   Loss 9.5020   LearningRate 0.0485   Epoch: 6   Global Step: 251590   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:00:31,405-Speed 2633.03 samples/sec   Loss 9.5140   LearningRate 0.0485   Epoch: 6   Global Step: 251600   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:00:35,294-Speed 2633.08 samples/sec   Loss 9.5048   LearningRate 0.0485   Epoch: 6   Global Step: 251610   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:00:39,187-Speed 2630.96 samples/sec   Loss 9.5146   LearningRate 0.0485   Epoch: 6   Global Step: 251620   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:00:43,147-Speed 2586.30 samples/sec   Loss 9.3200   LearningRate 0.0485   Epoch: 6   Global Step: 251630   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:00:47,010-Speed 2651.20 samples/sec   Loss 9.5977   LearningRate 0.0485   Epoch: 6   Global Step: 251640   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:00:50,913-Speed 2624.22 samples/sec   Loss 9.5351   LearningRate 0.0485   Epoch: 6   Global Step: 251650   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:00:54,803-Speed 2632.90 samples/sec   Loss 9.4801   LearningRate 0.0485   Epoch: 6   Global Step: 251660   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:00:58,693-Speed 2633.05 samples/sec   Loss 9.4278   LearningRate 0.0485   Epoch: 6   Global Step: 251670   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:01:02,585-Speed 2631.72 samples/sec   Loss 9.3629   LearningRate 0.0485   Epoch: 6   Global Step: 251680   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:01:06,479-Speed 2630.38 samples/sec   Loss 9.4279   LearningRate 0.0485   Epoch: 6   Global Step: 251690   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:01:10,368-Speed 2633.41 samples/sec   Loss 9.4417   LearningRate 0.0485   Epoch: 6   Global Step: 251700   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:01:14,255-Speed 2635.39 samples/sec   Loss 9.4812   LearningRate 0.0485   Epoch: 6   Global Step: 251710   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:01:18,145-Speed 2632.80 samples/sec   Loss 9.4336   LearningRate 0.0485   Epoch: 6   Global Step: 251720   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:01:22,035-Speed 2633.12 samples/sec   Loss 9.4536   LearningRate 0.0485   Epoch: 6   Global Step: 251730   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:01:25,924-Speed 2633.23 samples/sec   Loss 9.5544   LearningRate 0.0485   Epoch: 6   Global Step: 251740   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:29,814-Speed 2633.11 samples/sec   Loss 9.4111   LearningRate 0.0485   Epoch: 6   Global Step: 251750   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:33,713-Speed 2626.76 samples/sec   Loss 9.2984   LearningRate 0.0485   Epoch: 6   Global Step: 251760   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:37,639-Speed 2608.86 samples/sec   Loss 9.4468   LearningRate 0.0485   Epoch: 6   Global Step: 251770   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:41,527-Speed 2634.40 samples/sec   Loss 9.3687   LearningRate 0.0485   Epoch: 6   Global Step: 251780   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:45,419-Speed 2631.81 samples/sec   Loss 9.5083   LearningRate 0.0485   Epoch: 6   Global Step: 251790   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:49,312-Speed 2631.11 samples/sec   Loss 9.5243   LearningRate 0.0485   Epoch: 6   Global Step: 251800   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:53,206-Speed 2630.23 samples/sec   Loss 9.3359   LearningRate 0.0485   Epoch: 6   Global Step: 251810   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:01:57,102-Speed 2628.75 samples/sec   Loss 9.5942   LearningRate 0.0485   Epoch: 6   Global Step: 251820   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:02:00,993-Speed 2632.74 samples/sec   Loss 9.4217   LearningRate 0.0485   Epoch: 6   Global Step: 251830   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:02:04,884-Speed 2632.27 samples/sec   Loss 9.5020   LearningRate 0.0485   Epoch: 6   Global Step: 251840   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:08,795-Speed 2618.78 samples/sec   Loss 9.5082   LearningRate 0.0485   Epoch: 6   Global Step: 251850   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:12,703-Speed 2620.38 samples/sec   Loss 9.4866   LearningRate 0.0485   Epoch: 6   Global Step: 251860   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:16,595-Speed 2632.17 samples/sec   Loss 9.2952   LearningRate 0.0485   Epoch: 6   Global Step: 251870   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:20,564-Speed 2580.61 samples/sec   Loss 9.3871   LearningRate 0.0485   Epoch: 6   Global Step: 251880   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:24,482-Speed 2614.07 samples/sec   Loss 9.2919   LearningRate 0.0485   Epoch: 6   Global Step: 251890   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:28,379-Speed 2628.39 samples/sec   Loss 9.4255   LearningRate 0.0485   Epoch: 6   Global Step: 251900   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:32,283-Speed 2623.44 samples/sec   Loss 9.3593   LearningRate 0.0485   Epoch: 6   Global Step: 251910   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:36,179-Speed 2628.66 samples/sec   Loss 9.3582   LearningRate 0.0485   Epoch: 6   Global Step: 251920   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:40,094-Speed 2616.27 samples/sec   Loss 9.4645   LearningRate 0.0485   Epoch: 6   Global Step: 251930   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:02:43,993-Speed 2626.43 samples/sec   Loss 9.3517   LearningRate 0.0485   Epoch: 6   Global Step: 251940   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:02:47,910-Speed 2614.80 samples/sec   Loss 9.5079   LearningRate 0.0485   Epoch: 6   Global Step: 251950   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:02:51,811-Speed 2625.99 samples/sec   Loss 9.4325   LearningRate 0.0485   Epoch: 6   Global Step: 251960   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:02:55,712-Speed 2625.34 samples/sec   Loss 9.4171   LearningRate 0.0485   Epoch: 6   Global Step: 251970   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:02:59,626-Speed 2617.22 samples/sec   Loss 9.3582   LearningRate 0.0485   Epoch: 6   Global Step: 251980   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:03:03,536-Speed 2619.65 samples/sec   Loss 9.4337   LearningRate 0.0485   Epoch: 6   Global Step: 251990   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:03:07,425-Speed 2633.69 samples/sec   Loss 9.8376   LearningRate 0.0485   Epoch: 6   Global Step: 252000   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:03:11,269-Speed 2664.51 samples/sec   Loss 10.1109   LearningRate 0.0485   Epoch: 6   Global Step: 252010   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:15,246-Speed 2575.33 samples/sec   Loss 9.6810   LearningRate 0.0485   Epoch: 6   Global Step: 252020   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:19,155-Speed 2619.95 samples/sec   Loss 9.4125   LearningRate 0.0485   Epoch: 6   Global Step: 252030   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:23,062-Speed 2621.07 samples/sec   Loss 9.3281   LearningRate 0.0485   Epoch: 6   Global Step: 252040   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:27,071-Speed 2555.21 samples/sec   Loss 9.3749   LearningRate 0.0485   Epoch: 6   Global Step: 252050   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:30,970-Speed 2627.14 samples/sec   Loss 9.4468   LearningRate 0.0485   Epoch: 6   Global Step: 252060   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:34,867-Speed 2628.75 samples/sec   Loss 9.5038   LearningRate 0.0485   Epoch: 6   Global Step: 252070   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:38,764-Speed 2628.38 samples/sec   Loss 9.5788   LearningRate 0.0485   Epoch: 6   Global Step: 252080   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:42,663-Speed 2626.60 samples/sec   Loss 9.4654   LearningRate 0.0485   Epoch: 6   Global Step: 252090   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:46,562-Speed 2626.50 samples/sec   Loss 9.5197   LearningRate 0.0485   Epoch: 6   Global Step: 252100   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-14 00:03:50,467-Speed 2623.44 samples/sec   Loss 9.5352   LearningRate 0.0485   Epoch: 6   Global Step: 252110   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:03:54,371-Speed 2622.84 samples/sec   Loss 9.5479   LearningRate 0.0485   Epoch: 6   Global Step: 252120   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:03:58,271-Speed 2626.61 samples/sec   Loss 9.5011   LearningRate 0.0485   Epoch: 6   Global Step: 252130   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:02,162-Speed 2632.56 samples/sec   Loss 9.4276   LearningRate 0.0485   Epoch: 6   Global Step: 252140   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:06,065-Speed 2624.61 samples/sec   Loss 9.5564   LearningRate 0.0484   Epoch: 6   Global Step: 252150   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:09,968-Speed 2624.28 samples/sec   Loss 9.4295   LearningRate 0.0484   Epoch: 6   Global Step: 252160   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:13,875-Speed 2621.33 samples/sec   Loss 9.5223   LearningRate 0.0484   Epoch: 6   Global Step: 252170   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:17,769-Speed 2630.11 samples/sec   Loss 9.3926   LearningRate 0.0484   Epoch: 6   Global Step: 252180   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:21,660-Speed 2632.57 samples/sec   Loss 9.4471   LearningRate 0.0484   Epoch: 6   Global Step: 252190   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:25,554-Speed 2630.47 samples/sec   Loss 9.4198   LearningRate 0.0484   Epoch: 6   Global Step: 252200   Fp16 Grad Scale: 16384   Required: 65 hours
Training: 2022-04-14 00:04:29,445-Speed 2632.08 samples/sec   Loss 9.3389   LearningRate 0.0484   Epoch: 6   Global Step: 252210   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:04:33,335-Speed 2633.12 samples/sec   Loss 9.4040   LearningRate 0.0484   Epoch: 6   Global Step: 252220   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:04:37,229-Speed 2630.48 samples/sec   Loss 9.4637   LearningRate 0.0484   Epoch: 6   Global Step: 252230   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:04:41,120-Speed 2632.51 samples/sec   Loss 9.4646   LearningRate 0.0484   Epoch: 6   Global Step: 252240   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:04:45,020-Speed 2626.50 samples/sec   Loss 9.4161   LearningRate 0.0484   Epoch: 6   Global Step: 252250   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:04:48,924-Speed 2623.87 samples/sec   Loss 9.5107   LearningRate 0.0484   Epoch: 6   Global Step: 252260   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:04:52,838-Speed 2616.71 samples/sec   Loss 9.4835   LearningRate 0.0484   Epoch: 6   Global Step: 252270   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:04:56,740-Speed 2625.49 samples/sec   Loss 9.3066   LearningRate 0.0484   Epoch: 6   Global Step: 252280   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:05:00,638-Speed 2627.11 samples/sec   Loss 9.3419   LearningRate 0.0484   Epoch: 6   Global Step: 252290   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:05:04,539-Speed 2625.57 samples/sec   Loss 9.4849   LearningRate 0.0484   Epoch: 6   Global Step: 252300   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:05:08,435-Speed 2628.64 samples/sec   Loss 9.4173   LearningRate 0.0484   Epoch: 6   Global Step: 252310   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:12,334-Speed 2627.26 samples/sec   Loss 9.4621   LearningRate 0.0484   Epoch: 6   Global Step: 252320   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:16,235-Speed 2625.73 samples/sec   Loss 9.4722   LearningRate 0.0484   Epoch: 6   Global Step: 252330   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:20,130-Speed 2629.97 samples/sec   Loss 9.4955   LearningRate 0.0484   Epoch: 6   Global Step: 252340   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:24,021-Speed 2632.37 samples/sec   Loss 9.5071   LearningRate 0.0484   Epoch: 6   Global Step: 252350   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:27,917-Speed 2629.23 samples/sec   Loss 9.4428   LearningRate 0.0484   Epoch: 6   Global Step: 252360   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:31,807-Speed 2633.12 samples/sec   Loss 9.3581   LearningRate 0.0484   Epoch: 6   Global Step: 252370   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:35,719-Speed 2618.02 samples/sec   Loss 9.4784   LearningRate 0.0484   Epoch: 6   Global Step: 252380   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:39,611-Speed 2631.92 samples/sec   Loss 9.3502   LearningRate 0.0484   Epoch: 6   Global Step: 252390   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:43,518-Speed 2621.49 samples/sec   Loss 9.5433   LearningRate 0.0484   Epoch: 6   Global Step: 252400   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:05:47,435-Speed 2615.24 samples/sec   Loss 9.4299   LearningRate 0.0484   Epoch: 6   Global Step: 252410   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:05:51,328-Speed 2631.24 samples/sec   Loss 9.4332   LearningRate 0.0484   Epoch: 6   Global Step: 252420   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:05:55,227-Speed 2626.88 samples/sec   Loss 9.3745   LearningRate 0.0484   Epoch: 6   Global Step: 252430   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:05:59,142-Speed 2616.56 samples/sec   Loss 9.3318   LearningRate 0.0484   Epoch: 6   Global Step: 252440   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:06:03,025-Speed 2637.28 samples/sec   Loss 9.3531   LearningRate 0.0484   Epoch: 6   Global Step: 252450   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:06,943-Speed 2614.04 samples/sec   Loss 9.5271   LearningRate 0.0484   Epoch: 6   Global Step: 252460   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:10,869-Speed 2609.30 samples/sec   Loss 9.4452   LearningRate 0.0484   Epoch: 6   Global Step: 252470   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:14,777-Speed 2621.01 samples/sec   Loss 9.3570   LearningRate 0.0484   Epoch: 6   Global Step: 252480   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:18,679-Speed 2625.24 samples/sec   Loss 9.4265   LearningRate 0.0484   Epoch: 6   Global Step: 252490   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:22,589-Speed 2619.70 samples/sec   Loss 9.5435   LearningRate 0.0484   Epoch: 6   Global Step: 252500   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:26,957-Speed 2344.58 samples/sec   Loss 9.5941   LearningRate 0.0484   Epoch: 6   Global Step: 252510   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:30,859-Speed 2625.48 samples/sec   Loss 9.3541   LearningRate 0.0484   Epoch: 6   Global Step: 252520   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:34,755-Speed 2628.45 samples/sec   Loss 9.4209   LearningRate 0.0484   Epoch: 6   Global Step: 252530   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:38,651-Speed 2629.22 samples/sec   Loss 9.4453   LearningRate 0.0484   Epoch: 6   Global Step: 252540   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:06:42,574-Speed 2610.78 samples/sec   Loss 9.4179   LearningRate 0.0484   Epoch: 6   Global Step: 252550   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:06:46,471-Speed 2629.29 samples/sec   Loss 9.4406   LearningRate 0.0484   Epoch: 6   Global Step: 252560   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:06:50,374-Speed 2624.03 samples/sec   Loss 9.5032   LearningRate 0.0484   Epoch: 6   Global Step: 252570   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:06:54,306-Speed 2605.29 samples/sec   Loss 9.4537   LearningRate 0.0484   Epoch: 6   Global Step: 252580   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:06:58,206-Speed 2625.95 samples/sec   Loss 9.3712   LearningRate 0.0484   Epoch: 6   Global Step: 252590   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:02,110-Speed 2624.28 samples/sec   Loss 9.3186   LearningRate 0.0484   Epoch: 6   Global Step: 252600   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:06,013-Speed 2624.34 samples/sec   Loss 9.4636   LearningRate 0.0484   Epoch: 6   Global Step: 252610   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:09,908-Speed 2629.34 samples/sec   Loss 9.4142   LearningRate 0.0484   Epoch: 6   Global Step: 252620   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:13,803-Speed 2629.27 samples/sec   Loss 9.3145   LearningRate 0.0484   Epoch: 6   Global Step: 252630   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:17,703-Speed 2626.43 samples/sec   Loss 9.3416   LearningRate 0.0484   Epoch: 6   Global Step: 252640   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:21,608-Speed 2623.07 samples/sec   Loss 9.4111   LearningRate 0.0484   Epoch: 6   Global Step: 252650   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:07:25,524-Speed 2615.61 samples/sec   Loss 9.4148   LearningRate 0.0484   Epoch: 6   Global Step: 252660   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:07:29,430-Speed 2622.49 samples/sec   Loss 9.3465   LearningRate 0.0484   Epoch: 6   Global Step: 252670   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:07:33,337-Speed 2621.69 samples/sec   Loss 9.3383   LearningRate 0.0484   Epoch: 6   Global Step: 252680   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:07:37,235-Speed 2627.15 samples/sec   Loss 9.2627   LearningRate 0.0484   Epoch: 6   Global Step: 252690   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:07:41,124-Speed 2633.71 samples/sec   Loss 9.3866   LearningRate 0.0484   Epoch: 6   Global Step: 252700   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:45,017-Speed 2631.12 samples/sec   Loss 9.4513   LearningRate 0.0484   Epoch: 6   Global Step: 252710   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:49,088-Speed 2515.98 samples/sec   Loss 9.3831   LearningRate 0.0484   Epoch: 6   Global Step: 252720   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:53,139-Speed 2528.30 samples/sec   Loss 9.3708   LearningRate 0.0484   Epoch: 6   Global Step: 252730   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:07:57,028-Speed 2633.52 samples/sec   Loss 9.5279   LearningRate 0.0483   Epoch: 6   Global Step: 252740   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:00,931-Speed 2624.24 samples/sec   Loss 9.4012   LearningRate 0.0483   Epoch: 6   Global Step: 252750   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:04,824-Speed 2631.06 samples/sec   Loss 9.3135   LearningRate 0.0483   Epoch: 6   Global Step: 252760   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:08,721-Speed 2627.97 samples/sec   Loss 9.2369   LearningRate 0.0483   Epoch: 6   Global Step: 252770   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:12,622-Speed 2625.50 samples/sec   Loss 9.4375   LearningRate 0.0483   Epoch: 6   Global Step: 252780   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:16,554-Speed 2605.49 samples/sec   Loss 9.2156   LearningRate 0.0483   Epoch: 6   Global Step: 252790   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:20,457-Speed 2623.85 samples/sec   Loss 9.2856   LearningRate 0.0483   Epoch: 6   Global Step: 252800   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:08:24,327-Speed 2646.43 samples/sec   Loss 9.3443   LearningRate 0.0483   Epoch: 6   Global Step: 252810   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:28,220-Speed 2631.55 samples/sec   Loss 9.3690   LearningRate 0.0483   Epoch: 6   Global Step: 252820   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:08:32,095-Speed 2642.90 samples/sec   Loss 9.4245   LearningRate 0.0483   Epoch: 6   Global Step: 252830   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:08:35,995-Speed 2626.88 samples/sec   Loss 9.4008   LearningRate 0.0483   Epoch: 6   Global Step: 252840   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:08:39,886-Speed 2631.85 samples/sec   Loss 9.3114   LearningRate 0.0483   Epoch: 6   Global Step: 252850   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:08:43,779-Speed 2631.16 samples/sec   Loss 9.4414   LearningRate 0.0483   Epoch: 6   Global Step: 252860   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:08:47,684-Speed 2622.55 samples/sec   Loss 9.2895   LearningRate 0.0483   Epoch: 6   Global Step: 252870   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:08:51,575-Speed 2632.90 samples/sec   Loss 9.4244   LearningRate 0.0483   Epoch: 6   Global Step: 252880   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:08:55,461-Speed 2635.31 samples/sec   Loss 9.3397   LearningRate 0.0483   Epoch: 6   Global Step: 252890   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:08:59,403-Speed 2598.47 samples/sec   Loss 9.3726   LearningRate 0.0483   Epoch: 6   Global Step: 252900   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:09:03,302-Speed 2626.77 samples/sec   Loss 9.2781   LearningRate 0.0483   Epoch: 6   Global Step: 252910   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:09:07,192-Speed 2633.48 samples/sec   Loss 9.4458   LearningRate 0.0483   Epoch: 6   Global Step: 252920   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:09:11,093-Speed 2625.39 samples/sec   Loss 9.4163   LearningRate 0.0483   Epoch: 6   Global Step: 252930   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:09:15,004-Speed 2618.92 samples/sec   Loss 9.4878   LearningRate 0.0483   Epoch: 6   Global Step: 252940   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:09:18,898-Speed 2629.81 samples/sec   Loss 9.3606   LearningRate 0.0483   Epoch: 6   Global Step: 252950   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:09:22,790-Speed 2632.02 samples/sec   Loss 9.4504   LearningRate 0.0483   Epoch: 6   Global Step: 252960   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:09:26,682-Speed 2632.18 samples/sec   Loss 9.4150   LearningRate 0.0483   Epoch: 6   Global Step: 252970   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:30,578-Speed 2628.84 samples/sec   Loss 9.3797   LearningRate 0.0483   Epoch: 6   Global Step: 252980   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:34,484-Speed 2622.74 samples/sec   Loss 9.3068   LearningRate 0.0483   Epoch: 6   Global Step: 252990   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:38,372-Speed 2633.87 samples/sec   Loss 9.3599   LearningRate 0.0483   Epoch: 6   Global Step: 253000   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:42,269-Speed 2628.74 samples/sec   Loss 9.2946   LearningRate 0.0483   Epoch: 6   Global Step: 253010   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:46,162-Speed 2630.73 samples/sec   Loss 9.4395   LearningRate 0.0483   Epoch: 6   Global Step: 253020   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:50,052-Speed 2633.24 samples/sec   Loss 9.4563   LearningRate 0.0483   Epoch: 6   Global Step: 253030   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:53,960-Speed 2620.53 samples/sec   Loss 9.3712   LearningRate 0.0483   Epoch: 6   Global Step: 253040   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:09:57,849-Speed 2633.87 samples/sec   Loss 9.3822   LearningRate 0.0483   Epoch: 6   Global Step: 253050   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:10:01,738-Speed 2633.53 samples/sec   Loss 9.3181   LearningRate 0.0483   Epoch: 6   Global Step: 253060   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:10:05,635-Speed 2628.98 samples/sec   Loss 9.4531   LearningRate 0.0483   Epoch: 6   Global Step: 253070   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:09,525-Speed 2632.80 samples/sec   Loss 9.4350   LearningRate 0.0483   Epoch: 6   Global Step: 253080   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:13,418-Speed 2631.06 samples/sec   Loss 9.4511   LearningRate 0.0483   Epoch: 6   Global Step: 253090   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:17,317-Speed 2626.86 samples/sec   Loss 9.3431   LearningRate 0.0483   Epoch: 6   Global Step: 253100   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:21,208-Speed 2632.15 samples/sec   Loss 9.3729   LearningRate 0.0483   Epoch: 6   Global Step: 253110   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:25,114-Speed 2622.26 samples/sec   Loss 9.3504   LearningRate 0.0483   Epoch: 6   Global Step: 253120   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:29,003-Speed 2633.76 samples/sec   Loss 9.4420   LearningRate 0.0483   Epoch: 6   Global Step: 253130   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:32,893-Speed 2633.76 samples/sec   Loss 9.4977   LearningRate 0.0483   Epoch: 6   Global Step: 253140   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:36,782-Speed 2633.69 samples/sec   Loss 9.4500   LearningRate 0.0483   Epoch: 6   Global Step: 253150   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:40,683-Speed 2625.73 samples/sec   Loss 9.3107   LearningRate 0.0483   Epoch: 6   Global Step: 253160   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:44,555-Speed 2644.68 samples/sec   Loss 9.3187   LearningRate 0.0483   Epoch: 6   Global Step: 253170   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:48,476-Speed 2613.05 samples/sec   Loss 9.4640   LearningRate 0.0483   Epoch: 6   Global Step: 253180   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:52,377-Speed 2625.57 samples/sec   Loss 9.2665   LearningRate 0.0483   Epoch: 6   Global Step: 253190   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:10:56,289-Speed 2620.21 samples/sec   Loss 9.4145   LearningRate 0.0483   Epoch: 6   Global Step: 253200   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:00,181-Speed 2631.84 samples/sec   Loss 9.3432   LearningRate 0.0483   Epoch: 6   Global Step: 253210   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:04,074-Speed 2631.39 samples/sec   Loss 9.3764   LearningRate 0.0483   Epoch: 6   Global Step: 253220   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:07,971-Speed 2628.18 samples/sec   Loss 9.3073   LearningRate 0.0483   Epoch: 6   Global Step: 253230   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:11,874-Speed 2624.27 samples/sec   Loss 9.4201   LearningRate 0.0483   Epoch: 6   Global Step: 253240   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:15,784-Speed 2619.29 samples/sec   Loss 9.2494   LearningRate 0.0483   Epoch: 6   Global Step: 253250   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:19,675-Speed 2632.22 samples/sec   Loss 9.4385   LearningRate 0.0483   Epoch: 6   Global Step: 253260   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:23,572-Speed 2628.74 samples/sec   Loss 9.4092   LearningRate 0.0483   Epoch: 6   Global Step: 253270   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:11:27,450-Speed 2641.42 samples/sec   Loss 9.4206   LearningRate 0.0483   Epoch: 6   Global Step: 253280   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:31,353-Speed 2624.18 samples/sec   Loss 9.5244   LearningRate 0.0483   Epoch: 6   Global Step: 253290   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:35,251-Speed 2626.92 samples/sec   Loss 9.4353   LearningRate 0.0483   Epoch: 6   Global Step: 253300   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:39,147-Speed 2628.83 samples/sec   Loss 9.4166   LearningRate 0.0483   Epoch: 6   Global Step: 253310   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:43,050-Speed 2625.13 samples/sec   Loss 9.3224   LearningRate 0.0483   Epoch: 6   Global Step: 253320   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:46,953-Speed 2624.72 samples/sec   Loss 9.4245   LearningRate 0.0483   Epoch: 6   Global Step: 253330   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:50,849-Speed 2628.73 samples/sec   Loss 9.4358   LearningRate 0.0482   Epoch: 6   Global Step: 253340   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:54,768-Speed 2613.76 samples/sec   Loss 9.4151   LearningRate 0.0482   Epoch: 6   Global Step: 253350   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:11:58,673-Speed 2623.19 samples/sec   Loss 9.3522   LearningRate 0.0482   Epoch: 6   Global Step: 253360   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:02,560-Speed 2634.38 samples/sec   Loss 9.4009   LearningRate 0.0482   Epoch: 6   Global Step: 253370   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:06,453-Speed 2631.08 samples/sec   Loss 9.3560   LearningRate 0.0482   Epoch: 6   Global Step: 253380   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:10,345-Speed 2632.24 samples/sec   Loss 9.4311   LearningRate 0.0482   Epoch: 6   Global Step: 253390   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:14,231-Speed 2635.46 samples/sec   Loss 9.3714   LearningRate 0.0482   Epoch: 6   Global Step: 253400   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:18,133-Speed 2625.52 samples/sec   Loss 9.3641   LearningRate 0.0482   Epoch: 6   Global Step: 253410   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:22,039-Speed 2622.14 samples/sec   Loss 9.1858   LearningRate 0.0482   Epoch: 6   Global Step: 253420   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:25,943-Speed 2623.82 samples/sec   Loss 9.3700   LearningRate 0.0482   Epoch: 6   Global Step: 253430   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:29,832-Speed 2633.92 samples/sec   Loss 9.2845   LearningRate 0.0482   Epoch: 6   Global Step: 253440   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:33,734-Speed 2624.40 samples/sec   Loss 9.2886   LearningRate 0.0482   Epoch: 6   Global Step: 253450   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:37,624-Speed 2632.94 samples/sec   Loss 9.3624   LearningRate 0.0482   Epoch: 6   Global Step: 253460   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:41,521-Speed 2628.58 samples/sec   Loss 9.3885   LearningRate 0.0482   Epoch: 6   Global Step: 253470   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:45,416-Speed 2629.56 samples/sec   Loss 9.5156   LearningRate 0.0482   Epoch: 6   Global Step: 253480   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:12:49,326-Speed 2620.30 samples/sec   Loss 9.4809   LearningRate 0.0482   Epoch: 6   Global Step: 253490   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:53,226-Speed 2626.47 samples/sec   Loss 9.2725   LearningRate 0.0482   Epoch: 6   Global Step: 253500   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:12:57,494-Speed 2399.97 samples/sec   Loss 9.4320   LearningRate 0.0482   Epoch: 6   Global Step: 253510   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:01,397-Speed 2623.70 samples/sec   Loss 9.4817   LearningRate 0.0482   Epoch: 6   Global Step: 253520   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:05,285-Speed 2634.24 samples/sec   Loss 9.4958   LearningRate 0.0482   Epoch: 6   Global Step: 253530   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:09,180-Speed 2630.08 samples/sec   Loss 9.5266   LearningRate 0.0482   Epoch: 6   Global Step: 253540   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:13,075-Speed 2629.92 samples/sec   Loss 9.3996   LearningRate 0.0482   Epoch: 6   Global Step: 253550   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:16,975-Speed 2626.34 samples/sec   Loss 9.5002   LearningRate 0.0482   Epoch: 6   Global Step: 253560   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:20,873-Speed 2627.17 samples/sec   Loss 9.3431   LearningRate 0.0482   Epoch: 6   Global Step: 253570   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:24,783-Speed 2620.12 samples/sec   Loss 9.3227   LearningRate 0.0482   Epoch: 6   Global Step: 253580   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:28,712-Speed 2607.08 samples/sec   Loss 9.3212   LearningRate 0.0482   Epoch: 6   Global Step: 253590   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:13:32,616-Speed 2623.68 samples/sec   Loss 9.2732   LearningRate 0.0482   Epoch: 6   Global Step: 253600   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:13:36,511-Speed 2629.35 samples/sec   Loss 9.4549   LearningRate 0.0482   Epoch: 6   Global Step: 253610   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:13:40,380-Speed 2647.15 samples/sec   Loss 9.4954   LearningRate 0.0482   Epoch: 6   Global Step: 253620   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:44,275-Speed 2629.23 samples/sec   Loss 9.3920   LearningRate 0.0482   Epoch: 6   Global Step: 253630   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:48,171-Speed 2629.56 samples/sec   Loss 9.3857   LearningRate 0.0482   Epoch: 6   Global Step: 253640   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:52,065-Speed 2630.22 samples/sec   Loss 9.4604   LearningRate 0.0482   Epoch: 6   Global Step: 253650   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:13:55,926-Speed 2654.11 samples/sec   Loss 9.3340   LearningRate 0.0482   Epoch: 6   Global Step: 253660   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:13:59,818-Speed 2631.82 samples/sec   Loss 9.4157   LearningRate 0.0482   Epoch: 6   Global Step: 253670   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:03,720-Speed 2624.95 samples/sec   Loss 9.3732   LearningRate 0.0482   Epoch: 6   Global Step: 253680   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:07,614-Speed 2630.60 samples/sec   Loss 9.3076   LearningRate 0.0482   Epoch: 6   Global Step: 253690   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:11,511-Speed 2628.32 samples/sec   Loss 9.4626   LearningRate 0.0482   Epoch: 6   Global Step: 253700   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:15,412-Speed 2625.26 samples/sec   Loss 9.4240   LearningRate 0.0482   Epoch: 6   Global Step: 253710   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:19,309-Speed 2628.11 samples/sec   Loss 9.2921   LearningRate 0.0482   Epoch: 6   Global Step: 253720   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:23,205-Speed 2629.47 samples/sec   Loss 9.3452   LearningRate 0.0482   Epoch: 6   Global Step: 253730   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:27,098-Speed 2630.43 samples/sec   Loss 9.1973   LearningRate 0.0482   Epoch: 6   Global Step: 253740   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:30,998-Speed 2627.04 samples/sec   Loss 9.4426   LearningRate 0.0482   Epoch: 6   Global Step: 253750   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:14:34,891-Speed 2630.96 samples/sec   Loss 9.3620   LearningRate 0.0482   Epoch: 6   Global Step: 253760   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:14:38,784-Speed 2630.83 samples/sec   Loss 9.3367   LearningRate 0.0482   Epoch: 6   Global Step: 253770   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:14:42,675-Speed 2631.78 samples/sec   Loss 9.4745   LearningRate 0.0482   Epoch: 6   Global Step: 253780   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:14:46,572-Speed 2628.42 samples/sec   Loss 9.3304   LearningRate 0.0482   Epoch: 6   Global Step: 253790   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:14:50,468-Speed 2629.16 samples/sec   Loss 9.4082   LearningRate 0.0482   Epoch: 6   Global Step: 253800   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:14:54,366-Speed 2627.97 samples/sec   Loss 9.3235   LearningRate 0.0482   Epoch: 6   Global Step: 253810   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:14:58,260-Speed 2630.33 samples/sec   Loss 9.3663   LearningRate 0.0482   Epoch: 6   Global Step: 253820   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:15:02,168-Speed 2621.38 samples/sec   Loss 9.3124   LearningRate 0.0482   Epoch: 6   Global Step: 253830   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:15:06,073-Speed 2622.71 samples/sec   Loss 9.5028   LearningRate 0.0482   Epoch: 6   Global Step: 253840   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:09,966-Speed 2630.86 samples/sec   Loss 9.4900   LearningRate 0.0482   Epoch: 6   Global Step: 253850   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:13,862-Speed 2629.15 samples/sec   Loss 9.3643   LearningRate 0.0482   Epoch: 6   Global Step: 253860   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:17,781-Speed 2613.32 samples/sec   Loss 9.4642   LearningRate 0.0482   Epoch: 6   Global Step: 253870   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:21,688-Speed 2621.80 samples/sec   Loss 9.2150   LearningRate 0.0482   Epoch: 6   Global Step: 253880   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:25,579-Speed 2632.44 samples/sec   Loss 9.3354   LearningRate 0.0482   Epoch: 6   Global Step: 253890   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:29,472-Speed 2630.96 samples/sec   Loss 9.4717   LearningRate 0.0482   Epoch: 6   Global Step: 253900   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:33,365-Speed 2631.48 samples/sec   Loss 9.3613   LearningRate 0.0482   Epoch: 6   Global Step: 253910   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:37,257-Speed 2632.07 samples/sec   Loss 9.3564   LearningRate 0.0482   Epoch: 6   Global Step: 253920   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:41,150-Speed 2630.51 samples/sec   Loss 9.4285   LearningRate 0.0482   Epoch: 6   Global Step: 253930   Fp16 Grad Scale: 32768   Required: 65 hours
Training: 2022-04-14 00:15:45,044-Speed 2630.59 samples/sec   Loss 9.5535   LearningRate 0.0481   Epoch: 6   Global Step: 253940   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:15:48,938-Speed 2630.46 samples/sec   Loss 9.3318   LearningRate 0.0481   Epoch: 6   Global Step: 253950   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:15:52,832-Speed 2630.31 samples/sec   Loss 9.4854   LearningRate 0.0481   Epoch: 6   Global Step: 253960   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:15:56,726-Speed 2630.26 samples/sec   Loss 9.1871   LearningRate 0.0481   Epoch: 6   Global Step: 253970   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:16:00,619-Speed 2631.53 samples/sec   Loss 9.3354   LearningRate 0.0481   Epoch: 6   Global Step: 253980   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:16:04,510-Speed 2632.22 samples/sec   Loss 9.4565   LearningRate 0.0481   Epoch: 6   Global Step: 253990   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:16:08,402-Speed 2631.54 samples/sec   Loss 9.2906   LearningRate 0.0481   Epoch: 6   Global Step: 254000   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:16:12,311-Speed 2620.12 samples/sec   Loss 9.4804   LearningRate 0.0481   Epoch: 6   Global Step: 254010   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:16:16,228-Speed 2614.55 samples/sec   Loss 9.3428   LearningRate 0.0481   Epoch: 6   Global Step: 254020   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:16:20,149-Speed 2612.03 samples/sec   Loss 9.2807   LearningRate 0.0481   Epoch: 6   Global Step: 254030   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:16:24,054-Speed 2623.23 samples/sec   Loss 9.3086   LearningRate 0.0481   Epoch: 6   Global Step: 254040   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:27,955-Speed 2625.93 samples/sec   Loss 9.5311   LearningRate 0.0481   Epoch: 6   Global Step: 254050   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:31,847-Speed 2631.91 samples/sec   Loss 9.4259   LearningRate 0.0481   Epoch: 6   Global Step: 254060   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:35,737-Speed 2632.63 samples/sec   Loss 9.2874   LearningRate 0.0481   Epoch: 6   Global Step: 254070   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:39,631-Speed 2630.36 samples/sec   Loss 9.3541   LearningRate 0.0481   Epoch: 6   Global Step: 254080   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:43,522-Speed 2632.64 samples/sec   Loss 9.5260   LearningRate 0.0481   Epoch: 6   Global Step: 254090   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:47,414-Speed 2631.03 samples/sec   Loss 9.3094   LearningRate 0.0481   Epoch: 6   Global Step: 254100   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:51,306-Speed 2633.25 samples/sec   Loss 9.3980   LearningRate 0.0481   Epoch: 6   Global Step: 254110   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:55,198-Speed 2631.55 samples/sec   Loss 9.3522   LearningRate 0.0481   Epoch: 6   Global Step: 254120   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:16:59,094-Speed 2628.90 samples/sec   Loss 9.3988   LearningRate 0.0481   Epoch: 6   Global Step: 254130   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:17:02,989-Speed 2629.34 samples/sec   Loss 9.4884   LearningRate 0.0481   Epoch: 6   Global Step: 254140   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:06,898-Speed 2620.83 samples/sec   Loss 9.3284   LearningRate 0.0481   Epoch: 6   Global Step: 254150   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:10,791-Speed 2630.80 samples/sec   Loss 9.4247   LearningRate 0.0481   Epoch: 6   Global Step: 254160   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:14,755-Speed 2583.45 samples/sec   Loss 9.4412   LearningRate 0.0481   Epoch: 6   Global Step: 254170   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:18,659-Speed 2623.73 samples/sec   Loss 9.4746   LearningRate 0.0481   Epoch: 6   Global Step: 254180   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:22,552-Speed 2630.96 samples/sec   Loss 9.2639   LearningRate 0.0481   Epoch: 6   Global Step: 254190   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:26,444-Speed 2632.38 samples/sec   Loss 9.1681   LearningRate 0.0481   Epoch: 6   Global Step: 254200   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:30,332-Speed 2633.80 samples/sec   Loss 9.3389   LearningRate 0.0481   Epoch: 6   Global Step: 254210   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:34,229-Speed 2628.73 samples/sec   Loss 9.3739   LearningRate 0.0481   Epoch: 6   Global Step: 254220   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:17:38,120-Speed 2631.81 samples/sec   Loss 9.3663   LearningRate 0.0481   Epoch: 6   Global Step: 254230   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:17:42,011-Speed 2632.20 samples/sec   Loss 9.4813   LearningRate 0.0481   Epoch: 6   Global Step: 254240   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:17:45,910-Speed 2626.84 samples/sec   Loss 9.5216   LearningRate 0.0481   Epoch: 6   Global Step: 254250   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:17:49,806-Speed 2629.31 samples/sec   Loss 9.4011   LearningRate 0.0481   Epoch: 6   Global Step: 254260   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:17:53,703-Speed 2628.46 samples/sec   Loss 9.5079   LearningRate 0.0481   Epoch: 6   Global Step: 254270   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:17:57,622-Speed 2614.01 samples/sec   Loss 9.4274   LearningRate 0.0481   Epoch: 6   Global Step: 254280   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:01,512-Speed 2632.47 samples/sec   Loss 9.3396   LearningRate 0.0481   Epoch: 6   Global Step: 254290   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:05,404-Speed 2631.64 samples/sec   Loss 9.2364   LearningRate 0.0481   Epoch: 6   Global Step: 254300   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:09,298-Speed 2630.68 samples/sec   Loss 9.4154   LearningRate 0.0481   Epoch: 6   Global Step: 254310   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:13,191-Speed 2631.09 samples/sec   Loss 9.3154   LearningRate 0.0481   Epoch: 6   Global Step: 254320   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:17,082-Speed 2631.98 samples/sec   Loss 9.4108   LearningRate 0.0481   Epoch: 6   Global Step: 254330   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:18:20,976-Speed 2630.74 samples/sec   Loss 9.4294   LearningRate 0.0481   Epoch: 6   Global Step: 254340   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:18:24,868-Speed 2631.59 samples/sec   Loss 9.5172   LearningRate 0.0481   Epoch: 6   Global Step: 254350   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:18:28,764-Speed 2629.16 samples/sec   Loss 9.2955   LearningRate 0.0481   Epoch: 6   Global Step: 254360   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:18:32,658-Speed 2630.29 samples/sec   Loss 9.4180   LearningRate 0.0481   Epoch: 6   Global Step: 254370   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:18:36,556-Speed 2627.79 samples/sec   Loss 9.4184   LearningRate 0.0481   Epoch: 6   Global Step: 254380   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:18:40,440-Speed 2636.76 samples/sec   Loss 9.2774   LearningRate 0.0481   Epoch: 6   Global Step: 254390   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:44,338-Speed 2627.53 samples/sec   Loss 9.3389   LearningRate 0.0481   Epoch: 6   Global Step: 254400   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:48,232-Speed 2630.90 samples/sec   Loss 9.3813   LearningRate 0.0481   Epoch: 6   Global Step: 254410   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:52,127-Speed 2628.86 samples/sec   Loss 9.4033   LearningRate 0.0481   Epoch: 6   Global Step: 254420   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:18:56,104-Speed 2576.14 samples/sec   Loss 9.5909   LearningRate 0.0481   Epoch: 6   Global Step: 254430   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:00,016-Speed 2618.08 samples/sec   Loss 9.3831   LearningRate 0.0481   Epoch: 6   Global Step: 254440   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:03,964-Speed 2594.05 samples/sec   Loss 9.3552   LearningRate 0.0481   Epoch: 6   Global Step: 254450   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:07,880-Speed 2615.52 samples/sec   Loss 9.3644   LearningRate 0.0481   Epoch: 6   Global Step: 254460   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:11,785-Speed 2623.72 samples/sec   Loss 9.2155   LearningRate 0.0481   Epoch: 6   Global Step: 254470   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:15,687-Speed 2624.91 samples/sec   Loss 9.4367   LearningRate 0.0481   Epoch: 6   Global Step: 254480   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:19,620-Speed 2604.14 samples/sec   Loss 9.3810   LearningRate 0.0481   Epoch: 6   Global Step: 254490   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:19:23,535-Speed 2616.42 samples/sec   Loss 9.4112   LearningRate 0.0481   Epoch: 6   Global Step: 254500   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:19:27,499-Speed 2584.31 samples/sec   Loss 9.2350   LearningRate 0.0481   Epoch: 6   Global Step: 254510   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:19:31,395-Speed 2628.42 samples/sec   Loss 9.3694   LearningRate 0.0481   Epoch: 6   Global Step: 254520   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:35,309-Speed 2617.51 samples/sec   Loss 9.3182   LearningRate 0.0481   Epoch: 6   Global Step: 254530   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:39,215-Speed 2621.56 samples/sec   Loss 9.4574   LearningRate 0.0480   Epoch: 6   Global Step: 254540   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:19:43,108-Speed 2632.09 samples/sec   Loss 9.2283   LearningRate 0.0480   Epoch: 6   Global Step: 254550   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:19:47,008-Speed 2625.79 samples/sec   Loss 9.3506   LearningRate 0.0480   Epoch: 6   Global Step: 254560   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:19:50,908-Speed 2627.12 samples/sec   Loss 9.3377   LearningRate 0.0480   Epoch: 6   Global Step: 254570   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:19:54,800-Speed 2631.32 samples/sec   Loss 9.4233   LearningRate 0.0480   Epoch: 6   Global Step: 254580   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:19:58,695-Speed 2629.65 samples/sec   Loss 9.3478   LearningRate 0.0480   Epoch: 6   Global Step: 254590   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:20:02,597-Speed 2625.46 samples/sec   Loss 9.3723   LearningRate 0.0480   Epoch: 6   Global Step: 254600   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:20:06,494-Speed 2628.27 samples/sec   Loss 9.3435   LearningRate 0.0480   Epoch: 6   Global Step: 254610   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:20:10,403-Speed 2620.14 samples/sec   Loss 9.4181   LearningRate 0.0480   Epoch: 6   Global Step: 254620   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:20:14,305-Speed 2625.02 samples/sec   Loss 9.3003   LearningRate 0.0480   Epoch: 6   Global Step: 254630   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:20:18,203-Speed 2628.10 samples/sec   Loss 9.4184   LearningRate 0.0480   Epoch: 6   Global Step: 254640   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:20:22,102-Speed 2626.81 samples/sec   Loss 9.3461   LearningRate 0.0480   Epoch: 6   Global Step: 254650   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:25,993-Speed 2632.56 samples/sec   Loss 9.2933   LearningRate 0.0480   Epoch: 6   Global Step: 254660   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:29,887-Speed 2629.83 samples/sec   Loss 9.1438   LearningRate 0.0480   Epoch: 6   Global Step: 254670   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:33,783-Speed 2629.69 samples/sec   Loss 9.3008   LearningRate 0.0480   Epoch: 6   Global Step: 254680   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:37,677-Speed 2630.34 samples/sec   Loss 9.3001   LearningRate 0.0480   Epoch: 6   Global Step: 254690   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:41,570-Speed 2630.39 samples/sec   Loss 9.2207   LearningRate 0.0480   Epoch: 6   Global Step: 254700   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:45,475-Speed 2622.74 samples/sec   Loss 9.3652   LearningRate 0.0480   Epoch: 6   Global Step: 254710   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:49,364-Speed 2634.08 samples/sec   Loss 9.5513   LearningRate 0.0480   Epoch: 6   Global Step: 254720   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:53,277-Speed 2617.85 samples/sec   Loss 9.3312   LearningRate 0.0480   Epoch: 6   Global Step: 254730   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:20:57,176-Speed 2626.84 samples/sec   Loss 9.3915   LearningRate 0.0480   Epoch: 6   Global Step: 254740   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:21:01,068-Speed 2631.88 samples/sec   Loss 9.3970   LearningRate 0.0480   Epoch: 6   Global Step: 254750   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:04,960-Speed 2631.67 samples/sec   Loss 9.2229   LearningRate 0.0480   Epoch: 6   Global Step: 254760   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:08,852-Speed 2631.82 samples/sec   Loss 9.4840   LearningRate 0.0480   Epoch: 6   Global Step: 254770   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:12,744-Speed 2632.00 samples/sec   Loss 9.5189   LearningRate 0.0480   Epoch: 6   Global Step: 254780   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:16,640-Speed 2628.68 samples/sec   Loss 9.4431   LearningRate 0.0480   Epoch: 6   Global Step: 254790   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:20,534-Speed 2630.24 samples/sec   Loss 9.4731   LearningRate 0.0480   Epoch: 6   Global Step: 254800   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:24,424-Speed 2632.96 samples/sec   Loss 9.4406   LearningRate 0.0480   Epoch: 6   Global Step: 254810   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:28,315-Speed 2632.59 samples/sec   Loss 9.3865   LearningRate 0.0480   Epoch: 6   Global Step: 254820   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:32,207-Speed 2631.81 samples/sec   Loss 9.4392   LearningRate 0.0480   Epoch: 6   Global Step: 254830   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:21:36,081-Speed 2643.81 samples/sec   Loss 9.3988   LearningRate 0.0480   Epoch: 6   Global Step: 254840   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:21:39,985-Speed 2623.58 samples/sec   Loss 9.3001   LearningRate 0.0480   Epoch: 6   Global Step: 254850   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:21:43,876-Speed 2632.17 samples/sec   Loss 9.2859   LearningRate 0.0480   Epoch: 6   Global Step: 254860   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:21:47,802-Speed 2610.14 samples/sec   Loss 9.4067   LearningRate 0.0480   Epoch: 6   Global Step: 254870   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:21:51,691-Speed 2633.59 samples/sec   Loss 9.4027   LearningRate 0.0480   Epoch: 6   Global Step: 254880   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:21:55,584-Speed 2631.43 samples/sec   Loss 9.2539   LearningRate 0.0480   Epoch: 6   Global Step: 254890   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:21:59,478-Speed 2630.03 samples/sec   Loss 9.3397   LearningRate 0.0480   Epoch: 6   Global Step: 254900   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:03,378-Speed 2626.86 samples/sec   Loss 9.4470   LearningRate 0.0480   Epoch: 6   Global Step: 254910   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:07,265-Speed 2635.14 samples/sec   Loss 9.4021   LearningRate 0.0480   Epoch: 6   Global Step: 254920   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:11,157-Speed 2630.92 samples/sec   Loss 9.2563   LearningRate 0.0480   Epoch: 6   Global Step: 254930   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:15,055-Speed 2627.88 samples/sec   Loss 9.3652   LearningRate 0.0480   Epoch: 6   Global Step: 254940   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:22:18,954-Speed 2627.02 samples/sec   Loss 9.2756   LearningRate 0.0480   Epoch: 6   Global Step: 254950   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:22:22,841-Speed 2635.68 samples/sec   Loss 9.2873   LearningRate 0.0480   Epoch: 6   Global Step: 254960   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:26,768-Speed 2607.68 samples/sec   Loss 9.3885   LearningRate 0.0480   Epoch: 6   Global Step: 254970   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:30,660-Speed 2631.77 samples/sec   Loss 9.4474   LearningRate 0.0480   Epoch: 6   Global Step: 254980   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:34,552-Speed 2632.51 samples/sec   Loss 9.3241   LearningRate 0.0480   Epoch: 6   Global Step: 254990   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:38,442-Speed 2633.17 samples/sec   Loss 9.2574   LearningRate 0.0480   Epoch: 6   Global Step: 255000   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:42,336-Speed 2630.36 samples/sec   Loss 9.3613   LearningRate 0.0480   Epoch: 6   Global Step: 255010   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:46,230-Speed 2629.95 samples/sec   Loss 9.3216   LearningRate 0.0480   Epoch: 6   Global Step: 255020   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:50,126-Speed 2628.95 samples/sec   Loss 9.3919   LearningRate 0.0480   Epoch: 6   Global Step: 255030   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:54,019-Speed 2631.65 samples/sec   Loss 9.2392   LearningRate 0.0480   Epoch: 6   Global Step: 255040   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:22:57,904-Speed 2636.07 samples/sec   Loss 9.4204   LearningRate 0.0480   Epoch: 6   Global Step: 255050   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:01,802-Speed 2627.63 samples/sec   Loss 9.4325   LearningRate 0.0480   Epoch: 6   Global Step: 255060   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:05,695-Speed 2631.05 samples/sec   Loss 9.3752   LearningRate 0.0480   Epoch: 6   Global Step: 255070   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:09,592-Speed 2628.10 samples/sec   Loss 9.3678   LearningRate 0.0480   Epoch: 6   Global Step: 255080   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:13,495-Speed 2624.68 samples/sec   Loss 9.5213   LearningRate 0.0480   Epoch: 6   Global Step: 255090   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:17,422-Speed 2608.25 samples/sec   Loss 9.3958   LearningRate 0.0480   Epoch: 6   Global Step: 255100   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:21,320-Speed 2628.09 samples/sec   Loss 9.3312   LearningRate 0.0480   Epoch: 6   Global Step: 255110   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:25,395-Speed 2513.38 samples/sec   Loss 9.4893   LearningRate 0.0480   Epoch: 6   Global Step: 255120   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:29,490-Speed 2501.07 samples/sec   Loss 9.2761   LearningRate 0.0479   Epoch: 6   Global Step: 255130   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:33,583-Speed 2502.50 samples/sec   Loss 9.3233   LearningRate 0.0479   Epoch: 6   Global Step: 255140   Fp16 Grad Scale: 65536   Required: 65 hours
Training: 2022-04-14 00:23:37,643-Speed 2522.63 samples/sec   Loss 9.3012   LearningRate 0.0479   Epoch: 6   Global Step: 255150   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:23:41,536-Speed 2630.68 samples/sec   Loss 9.2848   LearningRate 0.0479   Epoch: 6   Global Step: 255160   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:23:45,442-Speed 2622.54 samples/sec   Loss 9.3239   LearningRate 0.0479   Epoch: 6   Global Step: 255170   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:23:49,336-Speed 2630.91 samples/sec   Loss 9.3636   LearningRate 0.0479   Epoch: 6   Global Step: 255180   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:23:53,228-Speed 2631.59 samples/sec   Loss 9.3235   LearningRate 0.0479   Epoch: 6   Global Step: 255190   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:23:57,134-Speed 2622.95 samples/sec   Loss 9.1791   LearningRate 0.0479   Epoch: 6   Global Step: 255200   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:01,024-Speed 2632.76 samples/sec   Loss 9.2292   LearningRate 0.0479   Epoch: 6   Global Step: 255210   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:04,915-Speed 2632.14 samples/sec   Loss 9.3869   LearningRate 0.0479   Epoch: 6   Global Step: 255220   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:08,808-Speed 2631.09 samples/sec   Loss 9.3195   LearningRate 0.0479   Epoch: 6   Global Step: 255230   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:12,715-Speed 2621.75 samples/sec   Loss 9.2643   LearningRate 0.0479   Epoch: 6   Global Step: 255240   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:16,625-Speed 2619.49 samples/sec   Loss 9.4006   LearningRate 0.0479   Epoch: 6   Global Step: 255250   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:24:20,524-Speed 2627.28 samples/sec   Loss 9.2903   LearningRate 0.0479   Epoch: 6   Global Step: 255260   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:24:24,479-Speed 2589.92 samples/sec   Loss 9.3417   LearningRate 0.0479   Epoch: 6   Global Step: 255270   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:24:28,378-Speed 2626.74 samples/sec   Loss 9.3303   LearningRate 0.0479   Epoch: 6   Global Step: 255280   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:24:32,286-Speed 2621.25 samples/sec   Loss 9.3043   LearningRate 0.0479   Epoch: 6   Global Step: 255290   Fp16 Grad Scale: 262144   Required: 65 hours
Training: 2022-04-14 00:24:36,161-Speed 2643.17 samples/sec   Loss 9.3658   LearningRate 0.0479   Epoch: 6   Global Step: 255300   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:40,056-Speed 2629.38 samples/sec   Loss 9.3841   LearningRate 0.0479   Epoch: 6   Global Step: 255310   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:43,955-Speed 2627.01 samples/sec   Loss 9.4411   LearningRate 0.0479   Epoch: 6   Global Step: 255320   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:47,843-Speed 2634.16 samples/sec   Loss 9.2321   LearningRate 0.0479   Epoch: 6   Global Step: 255330   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:51,737-Speed 2630.98 samples/sec   Loss 9.3488   LearningRate 0.0479   Epoch: 6   Global Step: 255340   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:55,632-Speed 2629.37 samples/sec   Loss 9.3183   LearningRate 0.0479   Epoch: 6   Global Step: 255350   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:24:59,534-Speed 2625.33 samples/sec   Loss 9.3482   LearningRate 0.0479   Epoch: 6   Global Step: 255360   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:25:03,427-Speed 2630.71 samples/sec   Loss 9.2486   LearningRate 0.0479   Epoch: 6   Global Step: 255370   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:25:07,829-Speed 2326.92 samples/sec   Loss 9.2527   LearningRate 0.0479   Epoch: 6   Global Step: 255380   Fp16 Grad Scale: 131072   Required: 65 hours
Training: 2022-04-14 00:25:11,730-Speed 2625.30 samples/sec   Loss 9.2879   LearningRate 0.0479   Epoch: 6   Global Step: 255390   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:25:15,638-Speed 2620.74 samples/sec   Loss 9.3393   LearningRate 0.0479   Epoch: 6   Global Step: 255400   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:19,534-Speed 2629.60 samples/sec   Loss 9.2640   LearningRate 0.0479   Epoch: 6   Global Step: 255410   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:23,433-Speed 2626.87 samples/sec   Loss 9.2852   LearningRate 0.0479   Epoch: 6   Global Step: 255420   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:27,342-Speed 2620.29 samples/sec   Loss 9.2366   LearningRate 0.0479   Epoch: 6   Global Step: 255430   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:31,247-Speed 2623.27 samples/sec   Loss 9.3034   LearningRate 0.0479   Epoch: 6   Global Step: 255440   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:35,139-Speed 2631.34 samples/sec   Loss 9.4197   LearningRate 0.0479   Epoch: 6   Global Step: 255450   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:39,032-Speed 2631.03 samples/sec   Loss 9.4302   LearningRate 0.0479   Epoch: 6   Global Step: 255460   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:42,983-Speed 2592.57 samples/sec   Loss 9.3632   LearningRate 0.0479   Epoch: 6   Global Step: 255470   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:46,865-Speed 2638.87 samples/sec   Loss 9.3711   LearningRate 0.0479   Epoch: 6   Global Step: 255480   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:25:50,744-Speed 2640.88 samples/sec   Loss 9.2383   LearningRate 0.0479   Epoch: 6   Global Step: 255490   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:25:54,657-Speed 2617.42 samples/sec   Loss 9.3260   LearningRate 0.0479   Epoch: 6   Global Step: 255500   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:25:58,554-Speed 2628.99 samples/sec   Loss 9.3045   LearningRate 0.0479   Epoch: 6   Global Step: 255510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:02,450-Speed 2629.05 samples/sec   Loss 9.3607   LearningRate 0.0479   Epoch: 6   Global Step: 255520   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:06,347-Speed 2628.07 samples/sec   Loss 9.3788   LearningRate 0.0479   Epoch: 6   Global Step: 255530   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:10,241-Speed 2630.14 samples/sec   Loss 9.2419   LearningRate 0.0479   Epoch: 6   Global Step: 255540   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:14,134-Speed 2631.27 samples/sec   Loss 9.3859   LearningRate 0.0479   Epoch: 6   Global Step: 255550   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:18,033-Speed 2627.05 samples/sec   Loss 9.3852   LearningRate 0.0479   Epoch: 6   Global Step: 255560   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:21,926-Speed 2631.23 samples/sec   Loss 9.2674   LearningRate 0.0479   Epoch: 6   Global Step: 255570   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:25,852-Speed 2608.62 samples/sec   Loss 9.3234   LearningRate 0.0479   Epoch: 6   Global Step: 255580   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:29,741-Speed 2633.89 samples/sec   Loss 9.2325   LearningRate 0.0479   Epoch: 6   Global Step: 255590   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:26:33,618-Speed 2642.40 samples/sec   Loss 9.3203   LearningRate 0.0479   Epoch: 6   Global Step: 255600   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:37,506-Speed 2633.91 samples/sec   Loss 9.3882   LearningRate 0.0479   Epoch: 6   Global Step: 255610   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:41,399-Speed 2631.31 samples/sec   Loss 9.3583   LearningRate 0.0479   Epoch: 6   Global Step: 255620   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:45,292-Speed 2631.36 samples/sec   Loss 9.3957   LearningRate 0.0479   Epoch: 6   Global Step: 255630   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:49,198-Speed 2622.01 samples/sec   Loss 9.3278   LearningRate 0.0479   Epoch: 6   Global Step: 255640   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:53,086-Speed 2633.87 samples/sec   Loss 9.3688   LearningRate 0.0479   Epoch: 6   Global Step: 255650   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:26:56,981-Speed 2630.17 samples/sec   Loss 9.3843   LearningRate 0.0479   Epoch: 6   Global Step: 255660   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:00,881-Speed 2626.01 samples/sec   Loss 9.1826   LearningRate 0.0479   Epoch: 6   Global Step: 255670   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:04,779-Speed 2627.57 samples/sec   Loss 9.2576   LearningRate 0.0479   Epoch: 6   Global Step: 255680   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:08,694-Speed 2616.56 samples/sec   Loss 9.3502   LearningRate 0.0479   Epoch: 6   Global Step: 255690   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:12,601-Speed 2621.83 samples/sec   Loss 9.2874   LearningRate 0.0479   Epoch: 6   Global Step: 255700   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:27:16,494-Speed 2630.28 samples/sec   Loss 9.1917   LearningRate 0.0479   Epoch: 6   Global Step: 255710   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:27:20,367-Speed 2644.56 samples/sec   Loss 9.3818   LearningRate 0.0479   Epoch: 6   Global Step: 255720   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:24,265-Speed 2627.22 samples/sec   Loss 9.3183   LearningRate 0.0478   Epoch: 6   Global Step: 255730   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:28,167-Speed 2625.16 samples/sec   Loss 9.4427   LearningRate 0.0478   Epoch: 6   Global Step: 255740   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:32,063-Speed 2629.16 samples/sec   Loss 9.3313   LearningRate 0.0478   Epoch: 6   Global Step: 255750   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:35,969-Speed 2621.96 samples/sec   Loss 9.2427   LearningRate 0.0478   Epoch: 6   Global Step: 255760   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:39,866-Speed 2628.65 samples/sec   Loss 9.4379   LearningRate 0.0478   Epoch: 6   Global Step: 255770   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:43,755-Speed 2633.87 samples/sec   Loss 9.3683   LearningRate 0.0478   Epoch: 6   Global Step: 255780   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:47,662-Speed 2621.19 samples/sec   Loss 9.4403   LearningRate 0.0478   Epoch: 6   Global Step: 255790   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:51,553-Speed 2632.00 samples/sec   Loss 9.3348   LearningRate 0.0478   Epoch: 6   Global Step: 255800   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:27:55,420-Speed 2649.08 samples/sec   Loss 9.2580   LearningRate 0.0478   Epoch: 6   Global Step: 255810   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:27:59,311-Speed 2632.11 samples/sec   Loss 9.2096   LearningRate 0.0478   Epoch: 6   Global Step: 255820   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:03,199-Speed 2634.11 samples/sec   Loss 9.3387   LearningRate 0.0478   Epoch: 6   Global Step: 255830   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:07,088-Speed 2633.91 samples/sec   Loss 9.2939   LearningRate 0.0478   Epoch: 6   Global Step: 255840   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:10,986-Speed 2627.28 samples/sec   Loss 9.3893   LearningRate 0.0478   Epoch: 6   Global Step: 255850   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:14,873-Speed 2635.25 samples/sec   Loss 9.3125   LearningRate 0.0478   Epoch: 6   Global Step: 255860   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:18,766-Speed 2631.05 samples/sec   Loss 9.4206   LearningRate 0.0478   Epoch: 6   Global Step: 255870   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:22,654-Speed 2634.15 samples/sec   Loss 9.3570   LearningRate 0.0478   Epoch: 6   Global Step: 255880   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:26,542-Speed 2634.44 samples/sec   Loss 9.2210   LearningRate 0.0478   Epoch: 6   Global Step: 255890   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:30,435-Speed 2630.91 samples/sec   Loss 9.2343   LearningRate 0.0478   Epoch: 6   Global Step: 255900   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:28:34,328-Speed 2631.22 samples/sec   Loss 9.1719   LearningRate 0.0478   Epoch: 6   Global Step: 255910   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:28:38,234-Speed 2621.72 samples/sec   Loss 9.2611   LearningRate 0.0478   Epoch: 6   Global Step: 255920   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:28:42,133-Speed 2626.77 samples/sec   Loss 9.2826   LearningRate 0.0478   Epoch: 6   Global Step: 255930   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:28:46,027-Speed 2630.15 samples/sec   Loss 9.2198   LearningRate 0.0478   Epoch: 6   Global Step: 255940   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:28:49,958-Speed 2605.66 samples/sec   Loss 9.3051   LearningRate 0.0478   Epoch: 6   Global Step: 255950   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:28:53,868-Speed 2619.95 samples/sec   Loss 9.3085   LearningRate 0.0478   Epoch: 6   Global Step: 255960   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:28:57,763-Speed 2629.43 samples/sec   Loss 9.4268   LearningRate 0.0478   Epoch: 6   Global Step: 255970   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:01,652-Speed 2634.08 samples/sec   Loss 9.2555   LearningRate 0.0478   Epoch: 6   Global Step: 255980   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:05,542-Speed 2632.51 samples/sec   Loss 9.1527   LearningRate 0.0478   Epoch: 6   Global Step: 255990   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:09,441-Speed 2627.29 samples/sec   Loss 9.4246   LearningRate 0.0478   Epoch: 6   Global Step: 256000   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:13,335-Speed 2630.27 samples/sec   Loss 9.3888   LearningRate 0.0478   Epoch: 6   Global Step: 256010   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:29:17,214-Speed 2639.98 samples/sec   Loss 9.2636   LearningRate 0.0478   Epoch: 6   Global Step: 256020   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:21,106-Speed 2631.52 samples/sec   Loss 9.2981   LearningRate 0.0478   Epoch: 6   Global Step: 256030   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:24,996-Speed 2633.43 samples/sec   Loss 9.3076   LearningRate 0.0478   Epoch: 6   Global Step: 256040   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:28,887-Speed 2631.93 samples/sec   Loss 9.2668   LearningRate 0.0478   Epoch: 6   Global Step: 256050   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:32,780-Speed 2631.38 samples/sec   Loss 9.2631   LearningRate 0.0478   Epoch: 6   Global Step: 256060   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:36,669-Speed 2633.79 samples/sec   Loss 9.3412   LearningRate 0.0478   Epoch: 6   Global Step: 256070   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:40,559-Speed 2632.46 samples/sec   Loss 9.3459   LearningRate 0.0478   Epoch: 6   Global Step: 256080   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:44,478-Speed 2614.12 samples/sec   Loss 9.2466   LearningRate 0.0478   Epoch: 6   Global Step: 256090   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:48,373-Speed 2629.36 samples/sec   Loss 9.4224   LearningRate 0.0478   Epoch: 6   Global Step: 256100   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:52,269-Speed 2628.80 samples/sec   Loss 9.4310   LearningRate 0.0478   Epoch: 6   Global Step: 256110   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:29:56,172-Speed 2624.37 samples/sec   Loss 9.4097   LearningRate 0.0478   Epoch: 6   Global Step: 256120   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:30:00,065-Speed 2630.94 samples/sec   Loss 9.3323   LearningRate 0.0478   Epoch: 6   Global Step: 256130   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:30:03,959-Speed 2630.52 samples/sec   Loss 9.2956   LearningRate 0.0478   Epoch: 6   Global Step: 256140   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:30:07,854-Speed 2629.56 samples/sec   Loss 9.2999   LearningRate 0.0478   Epoch: 6   Global Step: 256150   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:30:11,757-Speed 2624.24 samples/sec   Loss 9.2648   LearningRate 0.0478   Epoch: 6   Global Step: 256160   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:30:15,665-Speed 2620.85 samples/sec   Loss 9.4173   LearningRate 0.0478   Epoch: 6   Global Step: 256170   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:30:19,563-Speed 2627.55 samples/sec   Loss 9.3289   LearningRate 0.0478   Epoch: 6   Global Step: 256180   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:30:23,433-Speed 2646.58 samples/sec   Loss 9.6394   LearningRate 0.0478   Epoch: 6   Global Step: 256190   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:30:27,324-Speed 2632.61 samples/sec   Loss 10.2517   LearningRate 0.0478   Epoch: 6   Global Step: 256200   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:31,215-Speed 2632.29 samples/sec   Loss 9.6302   LearningRate 0.0478   Epoch: 6   Global Step: 256210   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:35,113-Speed 2627.47 samples/sec   Loss 9.4730   LearningRate 0.0478   Epoch: 6   Global Step: 256220   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:39,009-Speed 2629.27 samples/sec   Loss 9.4542   LearningRate 0.0478   Epoch: 6   Global Step: 256230   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:42,909-Speed 2626.40 samples/sec   Loss 9.4651   LearningRate 0.0478   Epoch: 6   Global Step: 256240   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:46,808-Speed 2627.18 samples/sec   Loss 9.4570   LearningRate 0.0478   Epoch: 6   Global Step: 256250   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:50,702-Speed 2630.00 samples/sec   Loss 9.4355   LearningRate 0.0478   Epoch: 6   Global Step: 256260   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:54,596-Speed 2630.14 samples/sec   Loss 9.3151   LearningRate 0.0478   Epoch: 6   Global Step: 256270   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:30:58,510-Speed 2617.56 samples/sec   Loss 9.5244   LearningRate 0.0478   Epoch: 6   Global Step: 256280   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:31:02,413-Speed 2624.48 samples/sec   Loss 9.3561   LearningRate 0.0478   Epoch: 6   Global Step: 256290   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:31:06,305-Speed 2631.20 samples/sec   Loss 9.2891   LearningRate 0.0478   Epoch: 6   Global Step: 256300   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:10,199-Speed 2630.42 samples/sec   Loss 9.3955   LearningRate 0.0478   Epoch: 6   Global Step: 256310   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:14,097-Speed 2627.20 samples/sec   Loss 9.3477   LearningRate 0.0478   Epoch: 6   Global Step: 256320   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:18,056-Speed 2587.27 samples/sec   Loss 9.2997   LearningRate 0.0477   Epoch: 6   Global Step: 256330   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:21,974-Speed 2614.48 samples/sec   Loss 9.3726   LearningRate 0.0477   Epoch: 6   Global Step: 256340   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:25,865-Speed 2632.75 samples/sec   Loss 9.3744   LearningRate 0.0477   Epoch: 6   Global Step: 256350   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:29,766-Speed 2625.51 samples/sec   Loss 9.4466   LearningRate 0.0477   Epoch: 6   Global Step: 256360   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:33,656-Speed 2633.24 samples/sec   Loss 9.2423   LearningRate 0.0477   Epoch: 6   Global Step: 256370   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:37,546-Speed 2632.78 samples/sec   Loss 9.2881   LearningRate 0.0477   Epoch: 6   Global Step: 256380   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:41,437-Speed 2632.60 samples/sec   Loss 9.3142   LearningRate 0.0477   Epoch: 6   Global Step: 256390   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:31:45,331-Speed 2629.90 samples/sec   Loss 9.2245   LearningRate 0.0477   Epoch: 6   Global Step: 256400   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:31:49,226-Speed 2630.23 samples/sec   Loss 9.4184   LearningRate 0.0477   Epoch: 6   Global Step: 256410   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:31:53,114-Speed 2634.27 samples/sec   Loss 9.2702   LearningRate 0.0477   Epoch: 6   Global Step: 256420   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:31:57,005-Speed 2632.86 samples/sec   Loss 9.4496   LearningRate 0.0477   Epoch: 6   Global Step: 256430   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:00,896-Speed 2631.91 samples/sec   Loss 9.2843   LearningRate 0.0477   Epoch: 6   Global Step: 256440   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:04,791-Speed 2629.41 samples/sec   Loss 9.2783   LearningRate 0.0477   Epoch: 6   Global Step: 256450   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:08,689-Speed 2627.38 samples/sec   Loss 9.2719   LearningRate 0.0477   Epoch: 6   Global Step: 256460   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:12,584-Speed 2630.30 samples/sec   Loss 9.1924   LearningRate 0.0477   Epoch: 6   Global Step: 256470   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:16,488-Speed 2622.81 samples/sec   Loss 9.3790   LearningRate 0.0477   Epoch: 6   Global Step: 256480   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:20,388-Speed 2627.37 samples/sec   Loss 9.4057   LearningRate 0.0477   Epoch: 6   Global Step: 256490   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:24,278-Speed 2632.80 samples/sec   Loss 9.4259   LearningRate 0.0477   Epoch: 6   Global Step: 256500   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:28,279-Speed 2560.28 samples/sec   Loss 9.3610   LearningRate 0.0477   Epoch: 6   Global Step: 256510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:32,261-Speed 2571.94 samples/sec   Loss 9.5499   LearningRate 0.0477   Epoch: 6   Global Step: 256520   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:36,152-Speed 2632.95 samples/sec   Loss 9.3627   LearningRate 0.0477   Epoch: 6   Global Step: 256530   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:40,043-Speed 2631.96 samples/sec   Loss 9.3386   LearningRate 0.0477   Epoch: 6   Global Step: 256540   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:43,957-Speed 2617.01 samples/sec   Loss 9.2978   LearningRate 0.0477   Epoch: 6   Global Step: 256550   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:47,859-Speed 2625.44 samples/sec   Loss 9.3443   LearningRate 0.0477   Epoch: 6   Global Step: 256560   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:51,749-Speed 2632.72 samples/sec   Loss 9.1997   LearningRate 0.0477   Epoch: 6   Global Step: 256570   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:55,697-Speed 2593.93 samples/sec   Loss 9.3028   LearningRate 0.0477   Epoch: 6   Global Step: 256580   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:32:59,611-Speed 2617.55 samples/sec   Loss 9.2852   LearningRate 0.0477   Epoch: 6   Global Step: 256590   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:03,505-Speed 2629.97 samples/sec   Loss 9.3353   LearningRate 0.0477   Epoch: 6   Global Step: 256600   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:33:07,392-Speed 2635.06 samples/sec   Loss 9.2139   LearningRate 0.0477   Epoch: 6   Global Step: 256610   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:11,287-Speed 2629.97 samples/sec   Loss 9.2390   LearningRate 0.0477   Epoch: 6   Global Step: 256620   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:15,182-Speed 2630.65 samples/sec   Loss 9.3686   LearningRate 0.0477   Epoch: 6   Global Step: 256630   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:19,072-Speed 2632.93 samples/sec   Loss 9.1924   LearningRate 0.0477   Epoch: 6   Global Step: 256640   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:22,999-Speed 2607.76 samples/sec   Loss 9.3151   LearningRate 0.0477   Epoch: 6   Global Step: 256650   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:26,894-Speed 2629.76 samples/sec   Loss 9.2622   LearningRate 0.0477   Epoch: 6   Global Step: 256660   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:30,785-Speed 2632.85 samples/sec   Loss 9.4232   LearningRate 0.0477   Epoch: 6   Global Step: 256670   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:34,701-Speed 2615.87 samples/sec   Loss 9.3352   LearningRate 0.0477   Epoch: 6   Global Step: 256680   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:38,597-Speed 2628.81 samples/sec   Loss 9.3637   LearningRate 0.0477   Epoch: 6   Global Step: 256690   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:33:42,470-Speed 2644.69 samples/sec   Loss 9.2595   LearningRate 0.0477   Epoch: 6   Global Step: 256700   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:33:46,367-Speed 2628.41 samples/sec   Loss 9.3306   LearningRate 0.0477   Epoch: 6   Global Step: 256710   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:33:50,253-Speed 2635.55 samples/sec   Loss 9.2974   LearningRate 0.0477   Epoch: 6   Global Step: 256720   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:33:54,144-Speed 2632.89 samples/sec   Loss 9.3492   LearningRate 0.0477   Epoch: 6   Global Step: 256730   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:33:58,035-Speed 2633.02 samples/sec   Loss 9.3463   LearningRate 0.0477   Epoch: 6   Global Step: 256740   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:34:01,925-Speed 2632.65 samples/sec   Loss 9.2700   LearningRate 0.0477   Epoch: 6   Global Step: 256750   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:34:05,820-Speed 2629.95 samples/sec   Loss 9.3756   LearningRate 0.0477   Epoch: 6   Global Step: 256760   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:34:09,726-Speed 2622.65 samples/sec   Loss 9.3795   LearningRate 0.0477   Epoch: 6   Global Step: 256770   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:34:13,619-Speed 2630.87 samples/sec   Loss 9.4077   LearningRate 0.0477   Epoch: 6   Global Step: 256780   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:34:17,609-Speed 2566.89 samples/sec   Loss 9.4251   LearningRate 0.0477   Epoch: 6   Global Step: 256790   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:34:21,588-Speed 2573.66 samples/sec   Loss 9.3112   LearningRate 0.0477   Epoch: 6   Global Step: 256800   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:25,680-Speed 2503.46 samples/sec   Loss 9.3031   LearningRate 0.0477   Epoch: 6   Global Step: 256810   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:29,781-Speed 2497.34 samples/sec   Loss 9.3439   LearningRate 0.0477   Epoch: 6   Global Step: 256820   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:33,877-Speed 2500.37 samples/sec   Loss 9.3793   LearningRate 0.0477   Epoch: 6   Global Step: 256830   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:37,968-Speed 2503.92 samples/sec   Loss 9.3710   LearningRate 0.0477   Epoch: 6   Global Step: 256840   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:42,109-Speed 2473.18 samples/sec   Loss 9.2353   LearningRate 0.0477   Epoch: 6   Global Step: 256850   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:46,080-Speed 2579.76 samples/sec   Loss 9.2402   LearningRate 0.0477   Epoch: 6   Global Step: 256860   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:49,982-Speed 2624.62 samples/sec   Loss 9.2874   LearningRate 0.0477   Epoch: 6   Global Step: 256870   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:53,875-Speed 2631.22 samples/sec   Loss 9.2494   LearningRate 0.0477   Epoch: 6   Global Step: 256880   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:34:57,793-Speed 2613.86 samples/sec   Loss 9.3731   LearningRate 0.0477   Epoch: 6   Global Step: 256890   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:35:01,684-Speed 2632.31 samples/sec   Loss 9.2457   LearningRate 0.0477   Epoch: 6   Global Step: 256900   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:35:05,593-Speed 2620.14 samples/sec   Loss 9.2212   LearningRate 0.0477   Epoch: 6   Global Step: 256910   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:35:09,479-Speed 2635.90 samples/sec   Loss 9.2917   LearningRate 0.0477   Epoch: 6   Global Step: 256920   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:35:13,362-Speed 2637.71 samples/sec   Loss 9.2781   LearningRate 0.0476   Epoch: 6   Global Step: 256930   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:17,252-Speed 2633.10 samples/sec   Loss 9.3387   LearningRate 0.0476   Epoch: 6   Global Step: 256940   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:21,146-Speed 2630.45 samples/sec   Loss 9.3964   LearningRate 0.0476   Epoch: 6   Global Step: 256950   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:25,038-Speed 2631.67 samples/sec   Loss 9.4241   LearningRate 0.0476   Epoch: 6   Global Step: 256960   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:28,946-Speed 2621.14 samples/sec   Loss 9.4986   LearningRate 0.0476   Epoch: 6   Global Step: 256970   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:33,384-Speed 2307.76 samples/sec   Loss 9.3272   LearningRate 0.0476   Epoch: 6   Global Step: 256980   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:37,289-Speed 2622.97 samples/sec   Loss 9.3493   LearningRate 0.0476   Epoch: 6   Global Step: 256990   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:41,181-Speed 2631.99 samples/sec   Loss 9.4710   LearningRate 0.0476   Epoch: 6   Global Step: 257000   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:45,074-Speed 2630.94 samples/sec   Loss 9.4081   LearningRate 0.0476   Epoch: 6   Global Step: 257010   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:48,966-Speed 2631.57 samples/sec   Loss 9.2253   LearningRate 0.0476   Epoch: 6   Global Step: 257020   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:35:52,860-Speed 2630.12 samples/sec   Loss 9.2520   LearningRate 0.0476   Epoch: 6   Global Step: 257030   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:35:56,755-Speed 2630.08 samples/sec   Loss 9.2456   LearningRate 0.0476   Epoch: 6   Global Step: 257040   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:00,642-Speed 2635.06 samples/sec   Loss 9.3387   LearningRate 0.0476   Epoch: 6   Global Step: 257050   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:04,538-Speed 2629.00 samples/sec   Loss 9.3387   LearningRate 0.0476   Epoch: 6   Global Step: 257060   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:08,443-Speed 2622.53 samples/sec   Loss 9.3559   LearningRate 0.0476   Epoch: 6   Global Step: 257070   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:12,341-Speed 2627.46 samples/sec   Loss 9.4001   LearningRate 0.0476   Epoch: 6   Global Step: 257080   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:16,234-Speed 2631.35 samples/sec   Loss 9.4275   LearningRate 0.0476   Epoch: 6   Global Step: 257090   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:20,128-Speed 2630.22 samples/sec   Loss 9.2943   LearningRate 0.0476   Epoch: 6   Global Step: 257100   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:24,013-Speed 2636.42 samples/sec   Loss 9.4513   LearningRate 0.0476   Epoch: 6   Global Step: 257110   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:27,903-Speed 2633.24 samples/sec   Loss 9.2191   LearningRate 0.0476   Epoch: 6   Global Step: 257120   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:31,775-Speed 2645.23 samples/sec   Loss 9.2981   LearningRate 0.0476   Epoch: 6   Global Step: 257130   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:35,670-Speed 2629.25 samples/sec   Loss 9.1914   LearningRate 0.0476   Epoch: 6   Global Step: 257140   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:39,560-Speed 2633.43 samples/sec   Loss 9.1680   LearningRate 0.0476   Epoch: 6   Global Step: 257150   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:43,453-Speed 2630.81 samples/sec   Loss 9.2664   LearningRate 0.0476   Epoch: 6   Global Step: 257160   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:47,349-Speed 2628.72 samples/sec   Loss 9.2161   LearningRate 0.0476   Epoch: 6   Global Step: 257170   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:51,251-Speed 2625.37 samples/sec   Loss 9.3330   LearningRate 0.0476   Epoch: 6   Global Step: 257180   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:55,147-Speed 2629.15 samples/sec   Loss 9.3369   LearningRate 0.0476   Epoch: 6   Global Step: 257190   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:36:59,067-Speed 2613.22 samples/sec   Loss 9.3564   LearningRate 0.0476   Epoch: 6   Global Step: 257200   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:37:02,971-Speed 2622.80 samples/sec   Loss 9.2337   LearningRate 0.0476   Epoch: 6   Global Step: 257210   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:37:06,863-Speed 2631.94 samples/sec   Loss 9.2711   LearningRate 0.0476   Epoch: 6   Global Step: 257220   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:37:10,761-Speed 2627.95 samples/sec   Loss 9.4395   LearningRate 0.0476   Epoch: 6   Global Step: 257230   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:14,653-Speed 2631.63 samples/sec   Loss 9.3455   LearningRate 0.0476   Epoch: 6   Global Step: 257240   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:18,553-Speed 2625.94 samples/sec   Loss 9.2083   LearningRate 0.0476   Epoch: 6   Global Step: 257250   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:22,445-Speed 2631.82 samples/sec   Loss 9.1862   LearningRate 0.0476   Epoch: 6   Global Step: 257260   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:26,346-Speed 2625.58 samples/sec   Loss 9.3805   LearningRate 0.0476   Epoch: 6   Global Step: 257270   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:30,244-Speed 2627.85 samples/sec   Loss 9.1761   LearningRate 0.0476   Epoch: 6   Global Step: 257280   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:34,234-Speed 2567.00 samples/sec   Loss 9.0991   LearningRate 0.0476   Epoch: 6   Global Step: 257290   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:38,127-Speed 2631.20 samples/sec   Loss 9.1915   LearningRate 0.0476   Epoch: 6   Global Step: 257300   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:42,031-Speed 2622.83 samples/sec   Loss 9.2351   LearningRate 0.0476   Epoch: 6   Global Step: 257310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:45,964-Speed 2604.77 samples/sec   Loss 9.2158   LearningRate 0.0476   Epoch: 6   Global Step: 257320   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:49,842-Speed 2641.80 samples/sec   Loss 9.2230   LearningRate 0.0476   Epoch: 6   Global Step: 257330   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:37:53,726-Speed 2637.03 samples/sec   Loss 9.3332   LearningRate 0.0476   Epoch: 6   Global Step: 257340   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:37:57,633-Speed 2621.60 samples/sec   Loss 9.2765   LearningRate 0.0476   Epoch: 6   Global Step: 257350   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:01,529-Speed 2629.38 samples/sec   Loss 9.3562   LearningRate 0.0476   Epoch: 6   Global Step: 257360   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:05,456-Speed 2607.77 samples/sec   Loss 9.2915   LearningRate 0.0476   Epoch: 6   Global Step: 257370   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:09,463-Speed 2555.79 samples/sec   Loss 9.1731   LearningRate 0.0476   Epoch: 6   Global Step: 257380   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:13,375-Speed 2618.90 samples/sec   Loss 9.2042   LearningRate 0.0476   Epoch: 6   Global Step: 257390   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:17,284-Speed 2619.85 samples/sec   Loss 9.3544   LearningRate 0.0476   Epoch: 6   Global Step: 257400   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:21,183-Speed 2627.70 samples/sec   Loss 9.3654   LearningRate 0.0476   Epoch: 6   Global Step: 257410   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:25,077-Speed 2629.92 samples/sec   Loss 9.2745   LearningRate 0.0476   Epoch: 6   Global Step: 257420   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:28,985-Speed 2621.34 samples/sec   Loss 9.3035   LearningRate 0.0476   Epoch: 6   Global Step: 257430   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:38:32,877-Speed 2631.30 samples/sec   Loss 9.1818   LearningRate 0.0476   Epoch: 6   Global Step: 257440   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:38:36,768-Speed 2632.29 samples/sec   Loss 9.4194   LearningRate 0.0476   Epoch: 6   Global Step: 257450   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:38:40,904-Speed 2476.34 samples/sec   Loss 9.3030   LearningRate 0.0476   Epoch: 6   Global Step: 257460   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:38:44,797-Speed 2630.56 samples/sec   Loss 9.3137   LearningRate 0.0476   Epoch: 6   Global Step: 257470   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:38:48,695-Speed 2628.37 samples/sec   Loss 9.3646   LearningRate 0.0476   Epoch: 6   Global Step: 257480   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:38:52,601-Speed 2622.18 samples/sec   Loss 9.2218   LearningRate 0.0476   Epoch: 6   Global Step: 257490   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:38:56,506-Speed 2622.37 samples/sec   Loss 9.2664   LearningRate 0.0476   Epoch: 6   Global Step: 257500   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:39:00,401-Speed 2629.24 samples/sec   Loss 9.3190   LearningRate 0.0476   Epoch: 6   Global Step: 257510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:39:04,299-Speed 2627.91 samples/sec   Loss 9.3380   LearningRate 0.0476   Epoch: 6   Global Step: 257520   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:39:08,202-Speed 2624.48 samples/sec   Loss 9.2043   LearningRate 0.0476   Epoch: 6   Global Step: 257530   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:39:12,083-Speed 2639.35 samples/sec   Loss 9.6132   LearningRate 0.0475   Epoch: 6   Global Step: 257540   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:15,990-Speed 2620.81 samples/sec   Loss 9.2332   LearningRate 0.0475   Epoch: 6   Global Step: 257550   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:19,899-Speed 2621.02 samples/sec   Loss 9.2408   LearningRate 0.0475   Epoch: 6   Global Step: 257560   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:23,796-Speed 2628.73 samples/sec   Loss 9.2913   LearningRate 0.0475   Epoch: 6   Global Step: 257570   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:27,728-Speed 2604.47 samples/sec   Loss 9.4731   LearningRate 0.0475   Epoch: 6   Global Step: 257580   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:31,626-Speed 2627.77 samples/sec   Loss 9.2252   LearningRate 0.0475   Epoch: 6   Global Step: 257590   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:35,533-Speed 2621.87 samples/sec   Loss 9.4229   LearningRate 0.0475   Epoch: 6   Global Step: 257600   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:39,424-Speed 2632.48 samples/sec   Loss 9.3440   LearningRate 0.0475   Epoch: 6   Global Step: 257610   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:43,316-Speed 2631.83 samples/sec   Loss 9.3792   LearningRate 0.0475   Epoch: 6   Global Step: 257620   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:47,209-Speed 2631.06 samples/sec   Loss 9.3579   LearningRate 0.0475   Epoch: 6   Global Step: 257630   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:39:51,108-Speed 2626.96 samples/sec   Loss 9.2052   LearningRate 0.0475   Epoch: 6   Global Step: 257640   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:39:55,006-Speed 2627.10 samples/sec   Loss 9.2409   LearningRate 0.0475   Epoch: 6   Global Step: 257650   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:39:58,901-Speed 2629.95 samples/sec   Loss 9.3507   LearningRate 0.0475   Epoch: 6   Global Step: 257660   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:40:02,794-Speed 2630.94 samples/sec   Loss 9.3944   LearningRate 0.0475   Epoch: 6   Global Step: 257670   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:40:06,691-Speed 2627.94 samples/sec   Loss 9.3275   LearningRate 0.0475   Epoch: 6   Global Step: 257680   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:40:10,596-Speed 2623.56 samples/sec   Loss 9.4472   LearningRate 0.0475   Epoch: 6   Global Step: 257690   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:14,484-Speed 2634.60 samples/sec   Loss 9.3378   LearningRate 0.0475   Epoch: 6   Global Step: 257700   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:19,212-Speed 2166.11 samples/sec   Loss 9.3988   LearningRate 0.0475   Epoch: 6   Global Step: 257710   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:23,106-Speed 2630.14 samples/sec   Loss 9.2480   LearningRate 0.0475   Epoch: 6   Global Step: 257720   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:27,002-Speed 2629.17 samples/sec   Loss 9.3789   LearningRate 0.0475   Epoch: 6   Global Step: 257730   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:30,892-Speed 2633.56 samples/sec   Loss 9.4614   LearningRate 0.0475   Epoch: 6   Global Step: 257740   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:34,783-Speed 2632.32 samples/sec   Loss 9.3156   LearningRate 0.0475   Epoch: 6   Global Step: 257750   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:38,674-Speed 2631.62 samples/sec   Loss 9.3270   LearningRate 0.0475   Epoch: 6   Global Step: 257760   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:42,576-Speed 2624.96 samples/sec   Loss 9.3457   LearningRate 0.0475   Epoch: 6   Global Step: 257770   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:46,476-Speed 2626.93 samples/sec   Loss 9.2466   LearningRate 0.0475   Epoch: 6   Global Step: 257780   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:40:50,375-Speed 2626.58 samples/sec   Loss 9.3595   LearningRate 0.0475   Epoch: 6   Global Step: 257790   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:40:54,272-Speed 2628.78 samples/sec   Loss 9.3127   LearningRate 0.0475   Epoch: 6   Global Step: 257800   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:40:58,170-Speed 2627.51 samples/sec   Loss 9.4071   LearningRate 0.0475   Epoch: 6   Global Step: 257810   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:02,081-Speed 2618.62 samples/sec   Loss 9.3302   LearningRate 0.0475   Epoch: 6   Global Step: 257820   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:05,985-Speed 2622.92 samples/sec   Loss 9.5456   LearningRate 0.0475   Epoch: 6   Global Step: 257830   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:09,877-Speed 2632.26 samples/sec   Loss 9.3997   LearningRate 0.0475   Epoch: 6   Global Step: 257840   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:13,781-Speed 2623.52 samples/sec   Loss 9.4751   LearningRate 0.0475   Epoch: 6   Global Step: 257850   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:17,670-Speed 2633.62 samples/sec   Loss 9.3040   LearningRate 0.0475   Epoch: 6   Global Step: 257860   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:21,559-Speed 2634.17 samples/sec   Loss 9.3652   LearningRate 0.0475   Epoch: 6   Global Step: 257870   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:25,450-Speed 2632.13 samples/sec   Loss 9.3169   LearningRate 0.0475   Epoch: 6   Global Step: 257880   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:29,340-Speed 2633.34 samples/sec   Loss 9.2846   LearningRate 0.0475   Epoch: 6   Global Step: 257890   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:41:33,233-Speed 2631.28 samples/sec   Loss 9.2311   LearningRate 0.0475   Epoch: 6   Global Step: 257900   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:41:37,128-Speed 2629.19 samples/sec   Loss 9.2855   LearningRate 0.0475   Epoch: 6   Global Step: 257910   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:41:41,019-Speed 2632.41 samples/sec   Loss 9.3601   LearningRate 0.0475   Epoch: 6   Global Step: 257920   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:44,916-Speed 2628.56 samples/sec   Loss 9.2201   LearningRate 0.0475   Epoch: 6   Global Step: 257930   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:48,807-Speed 2632.16 samples/sec   Loss 9.3665   LearningRate 0.0475   Epoch: 6   Global Step: 257940   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:52,722-Speed 2616.68 samples/sec   Loss 9.3279   LearningRate 0.0475   Epoch: 6   Global Step: 257950   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:41:56,618-Speed 2629.31 samples/sec   Loss 9.3540   LearningRate 0.0475   Epoch: 6   Global Step: 257960   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:42:00,529-Speed 2618.93 samples/sec   Loss 9.3081   LearningRate 0.0475   Epoch: 6   Global Step: 257970   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:42:04,440-Speed 2618.97 samples/sec   Loss 9.3246   LearningRate 0.0475   Epoch: 6   Global Step: 257980   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:42:08,345-Speed 2622.48 samples/sec   Loss 9.3309   LearningRate 0.0475   Epoch: 6   Global Step: 257990   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:42:12,226-Speed 2638.98 samples/sec   Loss 9.3146   LearningRate 0.0475   Epoch: 6   Global Step: 258000   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:16,119-Speed 2631.49 samples/sec   Loss 9.2187   LearningRate 0.0475   Epoch: 6   Global Step: 258010   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:20,023-Speed 2623.89 samples/sec   Loss 9.3413   LearningRate 0.0475   Epoch: 6   Global Step: 258020   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:23,926-Speed 2623.79 samples/sec   Loss 9.2451   LearningRate 0.0475   Epoch: 6   Global Step: 258030   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:27,826-Speed 2626.66 samples/sec   Loss 9.1660   LearningRate 0.0475   Epoch: 6   Global Step: 258040   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:31,725-Speed 2626.83 samples/sec   Loss 9.3019   LearningRate 0.0475   Epoch: 6   Global Step: 258050   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:35,624-Speed 2627.15 samples/sec   Loss 9.2972   LearningRate 0.0475   Epoch: 6   Global Step: 258060   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:39,520-Speed 2629.08 samples/sec   Loss 9.2013   LearningRate 0.0475   Epoch: 6   Global Step: 258070   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:43,422-Speed 2624.99 samples/sec   Loss 9.1813   LearningRate 0.0475   Epoch: 6   Global Step: 258080   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:47,331-Speed 2620.20 samples/sec   Loss 9.2575   LearningRate 0.0475   Epoch: 6   Global Step: 258090   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:42:51,224-Speed 2630.88 samples/sec   Loss 9.2935   LearningRate 0.0475   Epoch: 6   Global Step: 258100   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:42:55,123-Speed 2627.03 samples/sec   Loss 9.4688   LearningRate 0.0475   Epoch: 6   Global Step: 258110   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:42:59,023-Speed 2626.27 samples/sec   Loss 9.2976   LearningRate 0.0475   Epoch: 6   Global Step: 258120   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:43:02,913-Speed 2633.03 samples/sec   Loss 9.2504   LearningRate 0.0475   Epoch: 6   Global Step: 258130   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:43:06,814-Speed 2625.58 samples/sec   Loss 9.2448   LearningRate 0.0474   Epoch: 6   Global Step: 258140   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:43:10,687-Speed 2644.75 samples/sec   Loss 9.2246   LearningRate 0.0474   Epoch: 6   Global Step: 258150   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:14,583-Speed 2629.41 samples/sec   Loss 9.2861   LearningRate 0.0474   Epoch: 6   Global Step: 258160   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:18,482-Speed 2626.41 samples/sec   Loss 9.2918   LearningRate 0.0474   Epoch: 6   Global Step: 258170   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:22,397-Speed 2616.69 samples/sec   Loss 9.3700   LearningRate 0.0474   Epoch: 6   Global Step: 258180   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:26,290-Speed 2630.91 samples/sec   Loss 9.2886   LearningRate 0.0474   Epoch: 6   Global Step: 258190   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:30,184-Speed 2630.60 samples/sec   Loss 9.2520   LearningRate 0.0474   Epoch: 6   Global Step: 258200   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:34,082-Speed 2627.99 samples/sec   Loss 9.2962   LearningRate 0.0474   Epoch: 6   Global Step: 258210   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:37,987-Speed 2622.39 samples/sec   Loss 9.2664   LearningRate 0.0474   Epoch: 6   Global Step: 258220   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:41,882-Speed 2630.06 samples/sec   Loss 9.5074   LearningRate 0.0474   Epoch: 6   Global Step: 258230   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:45,775-Speed 2630.68 samples/sec   Loss 9.3547   LearningRate 0.0474   Epoch: 6   Global Step: 258240   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:43:49,669-Speed 2630.59 samples/sec   Loss 9.4494   LearningRate 0.0474   Epoch: 6   Global Step: 258250   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:43:53,562-Speed 2631.31 samples/sec   Loss 9.3129   LearningRate 0.0474   Epoch: 6   Global Step: 258260   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:43:57,459-Speed 2628.05 samples/sec   Loss 9.4075   LearningRate 0.0474   Epoch: 6   Global Step: 258270   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:44:01,353-Speed 2630.89 samples/sec   Loss 9.4261   LearningRate 0.0474   Epoch: 6   Global Step: 258280   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:44:05,247-Speed 2629.74 samples/sec   Loss 9.3017   LearningRate 0.0474   Epoch: 6   Global Step: 258290   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:44:09,143-Speed 2629.18 samples/sec   Loss 9.3250   LearningRate 0.0474   Epoch: 6   Global Step: 258300   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:44:13,039-Speed 2628.21 samples/sec   Loss 9.1945   LearningRate 0.0474   Epoch: 6   Global Step: 258310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:44:16,915-Speed 2642.87 samples/sec   Loss 9.2853   LearningRate 0.0474   Epoch: 6   Global Step: 258320   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:20,811-Speed 2628.85 samples/sec   Loss 9.2984   LearningRate 0.0474   Epoch: 6   Global Step: 258330   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:24,714-Speed 2624.31 samples/sec   Loss 9.2386   LearningRate 0.0474   Epoch: 6   Global Step: 258340   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:28,618-Speed 2623.78 samples/sec   Loss 9.3428   LearningRate 0.0474   Epoch: 6   Global Step: 258350   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:32,506-Speed 2634.11 samples/sec   Loss 9.3050   LearningRate 0.0474   Epoch: 6   Global Step: 258360   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:36,403-Speed 2628.71 samples/sec   Loss 9.3346   LearningRate 0.0474   Epoch: 6   Global Step: 258370   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:40,307-Speed 2623.80 samples/sec   Loss 9.3338   LearningRate 0.0474   Epoch: 6   Global Step: 258380   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:44,198-Speed 2632.24 samples/sec   Loss 9.3300   LearningRate 0.0474   Epoch: 6   Global Step: 258390   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:48,110-Speed 2618.16 samples/sec   Loss 9.2801   LearningRate 0.0474   Epoch: 6   Global Step: 258400   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:52,007-Speed 2628.59 samples/sec   Loss 9.1841   LearningRate 0.0474   Epoch: 6   Global Step: 258410   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:44:55,903-Speed 2629.20 samples/sec   Loss 9.2164   LearningRate 0.0474   Epoch: 6   Global Step: 258420   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:44:59,794-Speed 2632.18 samples/sec   Loss 9.2835   LearningRate 0.0474   Epoch: 6   Global Step: 258430   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:03,692-Speed 2627.95 samples/sec   Loss 9.2134   LearningRate 0.0474   Epoch: 6   Global Step: 258440   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:07,586-Speed 2629.99 samples/sec   Loss 9.2375   LearningRate 0.0474   Epoch: 6   Global Step: 258450   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:11,476-Speed 2633.22 samples/sec   Loss 9.3099   LearningRate 0.0474   Epoch: 6   Global Step: 258460   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:15,380-Speed 2623.92 samples/sec   Loss 9.2507   LearningRate 0.0474   Epoch: 6   Global Step: 258470   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:19,272-Speed 2631.63 samples/sec   Loss 9.3861   LearningRate 0.0474   Epoch: 6   Global Step: 258480   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:23,176-Speed 2623.38 samples/sec   Loss 9.2074   LearningRate 0.0474   Epoch: 6   Global Step: 258490   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:27,068-Speed 2631.48 samples/sec   Loss 9.2637   LearningRate 0.0474   Epoch: 6   Global Step: 258500   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:30,966-Speed 2627.92 samples/sec   Loss 9.3454   LearningRate 0.0474   Epoch: 6   Global Step: 258510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:34,861-Speed 2629.43 samples/sec   Loss 9.2652   LearningRate 0.0474   Epoch: 6   Global Step: 258520   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:45:38,753-Speed 2631.77 samples/sec   Loss 9.2454   LearningRate 0.0474   Epoch: 6   Global Step: 258530   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:45:42,626-Speed 2644.24 samples/sec   Loss 9.3229   LearningRate 0.0474   Epoch: 6   Global Step: 258540   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:46,515-Speed 2634.55 samples/sec   Loss 9.2622   LearningRate 0.0474   Epoch: 6   Global Step: 258550   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:50,409-Speed 2629.83 samples/sec   Loss 9.3645   LearningRate 0.0474   Epoch: 6   Global Step: 258560   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:54,319-Speed 2620.12 samples/sec   Loss 9.3005   LearningRate 0.0474   Epoch: 6   Global Step: 258570   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:45:58,228-Speed 2619.97 samples/sec   Loss 9.3783   LearningRate 0.0474   Epoch: 6   Global Step: 258580   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:46:02,120-Speed 2631.13 samples/sec   Loss 9.2230   LearningRate 0.0474   Epoch: 6   Global Step: 258590   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:46:06,016-Speed 2629.23 samples/sec   Loss 9.3962   LearningRate 0.0474   Epoch: 6   Global Step: 258600   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:46:09,926-Speed 2619.82 samples/sec   Loss 9.3597   LearningRate 0.0474   Epoch: 6   Global Step: 258610   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:46:13,825-Speed 2626.94 samples/sec   Loss 9.4502   LearningRate 0.0474   Epoch: 6   Global Step: 258620   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:46:17,727-Speed 2625.05 samples/sec   Loss 9.3258   LearningRate 0.0474   Epoch: 6   Global Step: 258630   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:46:21,624-Speed 2628.07 samples/sec   Loss 9.1703   LearningRate 0.0474   Epoch: 6   Global Step: 258640   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:25,519-Speed 2630.47 samples/sec   Loss 9.3282   LearningRate 0.0474   Epoch: 6   Global Step: 258650   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:29,449-Speed 2606.01 samples/sec   Loss 9.3260   LearningRate 0.0474   Epoch: 6   Global Step: 258660   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:33,355-Speed 2622.06 samples/sec   Loss 9.2505   LearningRate 0.0474   Epoch: 6   Global Step: 258670   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:37,317-Speed 2585.25 samples/sec   Loss 9.3255   LearningRate 0.0474   Epoch: 6   Global Step: 258680   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:41,230-Speed 2617.67 samples/sec   Loss 9.2693   LearningRate 0.0474   Epoch: 6   Global Step: 258690   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:45,124-Speed 2630.55 samples/sec   Loss 9.3186   LearningRate 0.0474   Epoch: 6   Global Step: 258700   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:49,020-Speed 2629.28 samples/sec   Loss 9.1871   LearningRate 0.0474   Epoch: 6   Global Step: 258710   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:52,931-Speed 2619.18 samples/sec   Loss 9.2124   LearningRate 0.0474   Epoch: 6   Global Step: 258720   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:46:56,820-Speed 2633.86 samples/sec   Loss 9.3573   LearningRate 0.0474   Epoch: 6   Global Step: 258730   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:00,735-Speed 2616.40 samples/sec   Loss 9.1195   LearningRate 0.0473   Epoch: 6   Global Step: 258740   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:04,641-Speed 2621.91 samples/sec   Loss 9.2279   LearningRate 0.0473   Epoch: 6   Global Step: 258750   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:08,552-Speed 2619.11 samples/sec   Loss 9.2606   LearningRate 0.0473   Epoch: 6   Global Step: 258760   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:12,452-Speed 2625.63 samples/sec   Loss 9.3489   LearningRate 0.0473   Epoch: 6   Global Step: 258770   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:16,348-Speed 2629.67 samples/sec   Loss 9.3376   LearningRate 0.0473   Epoch: 6   Global Step: 258780   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:20,243-Speed 2629.66 samples/sec   Loss 9.2892   LearningRate 0.0473   Epoch: 6   Global Step: 258790   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:24,140-Speed 2628.74 samples/sec   Loss 9.3506   LearningRate 0.0473   Epoch: 6   Global Step: 258800   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:28,035-Speed 2630.23 samples/sec   Loss 9.4524   LearningRate 0.0473   Epoch: 6   Global Step: 258810   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:31,946-Speed 2618.23 samples/sec   Loss 9.3649   LearningRate 0.0473   Epoch: 6   Global Step: 258820   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:35,847-Speed 2625.91 samples/sec   Loss 9.3421   LearningRate 0.0473   Epoch: 6   Global Step: 258830   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:47:39,744-Speed 2628.52 samples/sec   Loss 9.4250   LearningRate 0.0473   Epoch: 6   Global Step: 258840   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:47:43,638-Speed 2629.89 samples/sec   Loss 9.2958   LearningRate 0.0473   Epoch: 6   Global Step: 258850   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:47:47,535-Speed 2629.04 samples/sec   Loss 9.2431   LearningRate 0.0473   Epoch: 6   Global Step: 258860   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:47:51,418-Speed 2637.50 samples/sec   Loss 9.2359   LearningRate 0.0473   Epoch: 6   Global Step: 258870   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:55,318-Speed 2626.12 samples/sec   Loss 9.4550   LearningRate 0.0473   Epoch: 6   Global Step: 258880   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:47:59,211-Speed 2631.52 samples/sec   Loss 9.3604   LearningRate 0.0473   Epoch: 6   Global Step: 258890   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:03,149-Speed 2601.06 samples/sec   Loss 9.2946   LearningRate 0.0473   Epoch: 6   Global Step: 258900   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:07,043-Speed 2630.16 samples/sec   Loss 9.1932   LearningRate 0.0473   Epoch: 6   Global Step: 258910   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:10,984-Speed 2598.82 samples/sec   Loss 9.1500   LearningRate 0.0473   Epoch: 6   Global Step: 258920   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:14,877-Speed 2631.33 samples/sec   Loss 9.2639   LearningRate 0.0473   Epoch: 6   Global Step: 258930   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:18,770-Speed 2630.69 samples/sec   Loss 9.1816   LearningRate 0.0473   Epoch: 6   Global Step: 258940   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:22,661-Speed 2632.45 samples/sec   Loss 9.2249   LearningRate 0.0473   Epoch: 6   Global Step: 258950   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:26,551-Speed 2632.86 samples/sec   Loss 9.3060   LearningRate 0.0473   Epoch: 6   Global Step: 258960   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:30,444-Speed 2631.79 samples/sec   Loss 9.2927   LearningRate 0.0473   Epoch: 6   Global Step: 258970   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:48:34,345-Speed 2625.66 samples/sec   Loss 9.2062   LearningRate 0.0473   Epoch: 6   Global Step: 258980   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:48:38,241-Speed 2628.39 samples/sec   Loss 9.2550   LearningRate 0.0473   Epoch: 6   Global Step: 258990   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:48:42,134-Speed 2631.21 samples/sec   Loss 9.2743   LearningRate 0.0473   Epoch: 6   Global Step: 259000   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:46,033-Speed 2626.84 samples/sec   Loss 9.2341   LearningRate 0.0473   Epoch: 6   Global Step: 259010   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:49,930-Speed 2628.49 samples/sec   Loss 9.4150   LearningRate 0.0473   Epoch: 6   Global Step: 259020   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:53,834-Speed 2623.48 samples/sec   Loss 9.3058   LearningRate 0.0473   Epoch: 6   Global Step: 259030   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:48:57,754-Speed 2612.65 samples/sec   Loss 9.1531   LearningRate 0.0473   Epoch: 6   Global Step: 259040   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:01,647-Speed 2631.15 samples/sec   Loss 9.1386   LearningRate 0.0473   Epoch: 6   Global Step: 259050   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:05,545-Speed 2627.58 samples/sec   Loss 9.3299   LearningRate 0.0473   Epoch: 6   Global Step: 259060   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:09,450-Speed 2623.17 samples/sec   Loss 9.2274   LearningRate 0.0473   Epoch: 6   Global Step: 259070   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:13,341-Speed 2632.05 samples/sec   Loss 9.2936   LearningRate 0.0473   Epoch: 6   Global Step: 259080   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:17,232-Speed 2632.74 samples/sec   Loss 9.0871   LearningRate 0.0473   Epoch: 6   Global Step: 259090   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:21,124-Speed 2631.51 samples/sec   Loss 9.1796   LearningRate 0.0473   Epoch: 6   Global Step: 259100   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:49:25,015-Speed 2631.96 samples/sec   Loss 9.1784   LearningRate 0.0473   Epoch: 6   Global Step: 259110   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:49:28,906-Speed 2632.89 samples/sec   Loss 9.3877   LearningRate 0.0473   Epoch: 6   Global Step: 259120   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:49:33,378-Speed 2290.20 samples/sec   Loss 9.3622   LearningRate 0.0473   Epoch: 6   Global Step: 259130   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:37,269-Speed 2632.24 samples/sec   Loss 9.3102   LearningRate 0.0473   Epoch: 6   Global Step: 259140   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:41,176-Speed 2621.66 samples/sec   Loss 9.2454   LearningRate 0.0473   Epoch: 6   Global Step: 259150   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:45,069-Speed 2630.77 samples/sec   Loss 9.2485   LearningRate 0.0473   Epoch: 6   Global Step: 259160   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:48,962-Speed 2631.28 samples/sec   Loss 9.2820   LearningRate 0.0473   Epoch: 6   Global Step: 259170   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:52,862-Speed 2626.56 samples/sec   Loss 9.3817   LearningRate 0.0473   Epoch: 6   Global Step: 259180   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:49:56,753-Speed 2632.07 samples/sec   Loss 9.2470   LearningRate 0.0473   Epoch: 6   Global Step: 259190   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:00,649-Speed 2629.25 samples/sec   Loss 9.2634   LearningRate 0.0473   Epoch: 6   Global Step: 259200   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:04,556-Speed 2621.66 samples/sec   Loss 9.2956   LearningRate 0.0473   Epoch: 6   Global Step: 259210   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:08,448-Speed 2631.76 samples/sec   Loss 9.2740   LearningRate 0.0473   Epoch: 6   Global Step: 259220   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:12,336-Speed 2634.16 samples/sec   Loss 9.3669   LearningRate 0.0473   Epoch: 6   Global Step: 259230   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:50:16,229-Speed 2631.30 samples/sec   Loss 9.3159   LearningRate 0.0473   Epoch: 6   Global Step: 259240   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:50:20,123-Speed 2629.89 samples/sec   Loss 9.3500   LearningRate 0.0473   Epoch: 6   Global Step: 259250   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:50:24,014-Speed 2632.32 samples/sec   Loss 9.2463   LearningRate 0.0473   Epoch: 6   Global Step: 259260   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:50:27,893-Speed 2640.77 samples/sec   Loss 9.3466   LearningRate 0.0473   Epoch: 6   Global Step: 259270   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:31,782-Speed 2633.88 samples/sec   Loss 9.2109   LearningRate 0.0473   Epoch: 6   Global Step: 259280   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:35,673-Speed 2632.72 samples/sec   Loss 9.2991   LearningRate 0.0473   Epoch: 6   Global Step: 259290   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:39,567-Speed 2630.03 samples/sec   Loss 9.1871   LearningRate 0.0473   Epoch: 6   Global Step: 259300   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:43,455-Speed 2634.03 samples/sec   Loss 9.1792   LearningRate 0.0473   Epoch: 6   Global Step: 259310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:47,375-Speed 2613.31 samples/sec   Loss 9.2774   LearningRate 0.0473   Epoch: 6   Global Step: 259320   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:51,404-Speed 2542.41 samples/sec   Loss 9.2240   LearningRate 0.0473   Epoch: 6   Global Step: 259330   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:55,300-Speed 2629.11 samples/sec   Loss 9.2550   LearningRate 0.0472   Epoch: 6   Global Step: 259340   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:50:59,210-Speed 2619.98 samples/sec   Loss 9.3015   LearningRate 0.0472   Epoch: 6   Global Step: 259350   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:03,112-Speed 2624.83 samples/sec   Loss 9.2088   LearningRate 0.0472   Epoch: 6   Global Step: 259360   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:07,017-Speed 2622.44 samples/sec   Loss 9.1403   LearningRate 0.0472   Epoch: 6   Global Step: 259370   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:51:10,924-Speed 2621.83 samples/sec   Loss 9.1900   LearningRate 0.0472   Epoch: 6   Global Step: 259380   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:51:14,835-Speed 2619.35 samples/sec   Loss 9.3320   LearningRate 0.0472   Epoch: 6   Global Step: 259390   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:51:18,737-Speed 2625.20 samples/sec   Loss 9.3308   LearningRate 0.0472   Epoch: 6   Global Step: 259400   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:51:22,634-Speed 2627.63 samples/sec   Loss 9.2413   LearningRate 0.0472   Epoch: 6   Global Step: 259410   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:51:26,530-Speed 2629.90 samples/sec   Loss 9.1968   LearningRate 0.0472   Epoch: 6   Global Step: 259420   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:51:30,426-Speed 2628.64 samples/sec   Loss 9.3696   LearningRate 0.0472   Epoch: 6   Global Step: 259430   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:51:34,300-Speed 2643.74 samples/sec   Loss 9.2565   LearningRate 0.0472   Epoch: 6   Global Step: 259440   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:38,212-Speed 2617.92 samples/sec   Loss 9.3413   LearningRate 0.0472   Epoch: 6   Global Step: 259450   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:42,098-Speed 2636.31 samples/sec   Loss 9.3065   LearningRate 0.0472   Epoch: 6   Global Step: 259460   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:45,989-Speed 2632.67 samples/sec   Loss 9.2756   LearningRate 0.0472   Epoch: 6   Global Step: 259470   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:49,878-Speed 2633.33 samples/sec   Loss 9.4087   LearningRate 0.0472   Epoch: 6   Global Step: 259480   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:53,787-Speed 2620.98 samples/sec   Loss 9.3959   LearningRate 0.0472   Epoch: 6   Global Step: 259490   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:51:57,681-Speed 2630.31 samples/sec   Loss 9.2322   LearningRate 0.0472   Epoch: 6   Global Step: 259500   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:52:01,569-Speed 2634.50 samples/sec   Loss 9.2074   LearningRate 0.0472   Epoch: 6   Global Step: 259510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:52:05,460-Speed 2631.78 samples/sec   Loss 9.2823   LearningRate 0.0472   Epoch: 6   Global Step: 259520   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:52:09,356-Speed 2629.89 samples/sec   Loss 9.2931   LearningRate 0.0472   Epoch: 6   Global Step: 259530   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:52:13,297-Speed 2598.82 samples/sec   Loss 9.2783   LearningRate 0.0472   Epoch: 6   Global Step: 259540   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:52:17,191-Speed 2630.26 samples/sec   Loss 9.1967   LearningRate 0.0472   Epoch: 6   Global Step: 259550   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:52:21,069-Speed 2641.08 samples/sec   Loss 9.3083   LearningRate 0.0472   Epoch: 6   Global Step: 259560   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:52:24,958-Speed 2633.53 samples/sec   Loss 9.3590   LearningRate 0.0472   Epoch: 6   Global Step: 259570   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:52:28,851-Speed 2631.36 samples/sec   Loss 9.1579   LearningRate 0.0472   Epoch: 6   Global Step: 259580   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:52:32,737-Speed 2635.65 samples/sec   Loss 9.2513   LearningRate 0.0472   Epoch: 6   Global Step: 259590   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:52:36,642-Speed 2622.67 samples/sec   Loss 9.2381   LearningRate 0.0472   Epoch: 6   Global Step: 259600   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:52:40,533-Speed 2632.99 samples/sec   Loss 9.3988   LearningRate 0.0472   Epoch: 6   Global Step: 259610   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:52:44,436-Speed 2624.40 samples/sec   Loss 9.3679   LearningRate 0.0472   Epoch: 6   Global Step: 259620   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:52:48,366-Speed 2606.30 samples/sec   Loss 9.3976   LearningRate 0.0472   Epoch: 6   Global Step: 259630   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:52:52,272-Speed 2622.26 samples/sec   Loss 9.2715   LearningRate 0.0472   Epoch: 6   Global Step: 259640   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:52:56,158-Speed 2635.70 samples/sec   Loss 9.1641   LearningRate 0.0472   Epoch: 6   Global Step: 259650   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:53:00,047-Speed 2633.93 samples/sec   Loss 9.4508   LearningRate 0.0472   Epoch: 6   Global Step: 259660   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:53:03,934-Speed 2634.73 samples/sec   Loss 9.3374   LearningRate 0.0472   Epoch: 6   Global Step: 259670   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:53:07,837-Speed 2623.91 samples/sec   Loss 9.4126   LearningRate 0.0472   Epoch: 6   Global Step: 259680   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:53:11,736-Speed 2627.62 samples/sec   Loss 9.3526   LearningRate 0.0472   Epoch: 6   Global Step: 259690   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:53:15,636-Speed 2626.12 samples/sec   Loss 9.4899   LearningRate 0.0472   Epoch: 6   Global Step: 259700   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 00:53:19,537-Speed 2625.72 samples/sec   Loss 9.3522   LearningRate 0.0472   Epoch: 6   Global Step: 259710   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:23,428-Speed 2632.80 samples/sec   Loss 9.5188   LearningRate 0.0472   Epoch: 6   Global Step: 259720   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:27,315-Speed 2634.27 samples/sec   Loss 9.2959   LearningRate 0.0472   Epoch: 6   Global Step: 259730   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:31,203-Speed 2634.78 samples/sec   Loss 9.2792   LearningRate 0.0472   Epoch: 6   Global Step: 259740   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:35,102-Speed 2626.59 samples/sec   Loss 9.2457   LearningRate 0.0472   Epoch: 6   Global Step: 259750   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:38,993-Speed 2632.62 samples/sec   Loss 9.3375   LearningRate 0.0472   Epoch: 6   Global Step: 259760   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:42,887-Speed 2630.43 samples/sec   Loss 9.2777   LearningRate 0.0472   Epoch: 6   Global Step: 259770   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:46,775-Speed 2634.73 samples/sec   Loss 9.3129   LearningRate 0.0472   Epoch: 6   Global Step: 259780   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:50,668-Speed 2630.71 samples/sec   Loss 9.4397   LearningRate 0.0472   Epoch: 6   Global Step: 259790   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:54,556-Speed 2634.50 samples/sec   Loss 9.1472   LearningRate 0.0472   Epoch: 6   Global Step: 259800   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 00:53:58,446-Speed 2632.70 samples/sec   Loss 9.1649   LearningRate 0.0472   Epoch: 6   Global Step: 259810   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:02,356-Speed 2620.14 samples/sec   Loss 9.2767   LearningRate 0.0472   Epoch: 6   Global Step: 259820   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:06,243-Speed 2634.29 samples/sec   Loss 9.2436   LearningRate 0.0472   Epoch: 6   Global Step: 259830   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:10,135-Speed 2631.86 samples/sec   Loss 9.2647   LearningRate 0.0472   Epoch: 6   Global Step: 259840   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:14,027-Speed 2631.37 samples/sec   Loss 9.2154   LearningRate 0.0472   Epoch: 6   Global Step: 259850   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:17,918-Speed 2632.42 samples/sec   Loss 9.2670   LearningRate 0.0472   Epoch: 6   Global Step: 259860   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:21,809-Speed 2634.40 samples/sec   Loss 9.1465   LearningRate 0.0472   Epoch: 6   Global Step: 259870   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:25,700-Speed 2632.30 samples/sec   Loss 9.1874   LearningRate 0.0472   Epoch: 6   Global Step: 259880   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:29,597-Speed 2628.66 samples/sec   Loss 9.1450   LearningRate 0.0472   Epoch: 6   Global Step: 259890   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:33,488-Speed 2632.06 samples/sec   Loss 9.2378   LearningRate 0.0472   Epoch: 6   Global Step: 259900   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:54:37,377-Speed 2633.05 samples/sec   Loss 9.3374   LearningRate 0.0472   Epoch: 6   Global Step: 259910   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:54:41,277-Speed 2626.65 samples/sec   Loss 9.3231   LearningRate 0.0472   Epoch: 6   Global Step: 259920   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:54:45,171-Speed 2629.99 samples/sec   Loss 9.1656   LearningRate 0.0472   Epoch: 6   Global Step: 259930   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:54:49,061-Speed 2633.70 samples/sec   Loss 9.4582   LearningRate 0.0472   Epoch: 6   Global Step: 259940   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:54:52,952-Speed 2632.70 samples/sec   Loss 9.3525   LearningRate 0.0471   Epoch: 6   Global Step: 259950   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:54:56,845-Speed 2630.49 samples/sec   Loss 9.1429   LearningRate 0.0471   Epoch: 6   Global Step: 259960   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:55:00,767-Speed 2611.84 samples/sec   Loss 9.1893   LearningRate 0.0471   Epoch: 6   Global Step: 259970   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:55:04,657-Speed 2632.89 samples/sec   Loss 9.1382   LearningRate 0.0471   Epoch: 6   Global Step: 259980   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:55:08,539-Speed 2638.60 samples/sec   Loss 9.1973   LearningRate 0.0471   Epoch: 6   Global Step: 259990   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:55:12,439-Speed 2626.35 samples/sec   Loss 9.4235   LearningRate 0.0471   Epoch: 6   Global Step: 260000   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:55:55,770-[lfw][260000]XNorm: 23.913672
Training: 2022-04-14 00:55:55,770-[lfw][260000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-14 00:55:55,771-[lfw][260000]Accuracy-Highest: 0.99783
Training: 2022-04-14 00:56:46,222-[cfp_fp][260000]XNorm: 22.023019
Training: 2022-04-14 00:56:46,222-[cfp_fp][260000]Accuracy-Flip: 0.98214+-0.00471
Training: 2022-04-14 00:56:46,223-[cfp_fp][260000]Accuracy-Highest: 0.98643
Training: 2022-04-14 00:57:29,826-[agedb_30][260000]XNorm: 23.688854
Training: 2022-04-14 00:57:29,827-[agedb_30][260000]Accuracy-Flip: 0.97317+-0.00643
Training: 2022-04-14 00:57:29,827-[agedb_30][260000]Accuracy-Highest: 0.97350
Training: 2022-04-14 00:57:33,685-Speed 72.50 samples/sec   Loss 9.2724   LearningRate 0.0471   Epoch: 6   Global Step: 260010   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:57:37,554-Speed 2646.58 samples/sec   Loss 9.3040   LearningRate 0.0471   Epoch: 6   Global Step: 260020   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:57:41,415-Speed 2653.06 samples/sec   Loss 9.2536   LearningRate 0.0471   Epoch: 6   Global Step: 260030   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:57:45,275-Speed 2653.59 samples/sec   Loss 9.3815   LearningRate 0.0471   Epoch: 6   Global Step: 260040   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:57:49,142-Speed 2648.32 samples/sec   Loss 9.2068   LearningRate 0.0471   Epoch: 6   Global Step: 260050   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:57:53,016-Speed 2644.38 samples/sec   Loss 9.2394   LearningRate 0.0471   Epoch: 6   Global Step: 260060   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:57:56,881-Speed 2650.12 samples/sec   Loss 9.3356   LearningRate 0.0471   Epoch: 6   Global Step: 260070   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:00,750-Speed 2647.22 samples/sec   Loss 9.2719   LearningRate 0.0471   Epoch: 6   Global Step: 260080   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:04,619-Speed 2647.21 samples/sec   Loss 9.4407   LearningRate 0.0471   Epoch: 6   Global Step: 260090   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:58:08,499-Speed 2639.31 samples/sec   Loss 9.3249   LearningRate 0.0471   Epoch: 6   Global Step: 260100   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:58:12,379-Speed 2640.23 samples/sec   Loss 9.2983   LearningRate 0.0471   Epoch: 6   Global Step: 260110   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:58:16,254-Speed 2643.57 samples/sec   Loss 9.2421   LearningRate 0.0471   Epoch: 6   Global Step: 260120   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:58:20,131-Speed 2641.85 samples/sec   Loss 9.3179   LearningRate 0.0471   Epoch: 6   Global Step: 260130   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:58:23,994-Speed 2652.25 samples/sec   Loss 9.2956   LearningRate 0.0471   Epoch: 6   Global Step: 260140   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:27,881-Speed 2634.67 samples/sec   Loss 9.1237   LearningRate 0.0471   Epoch: 6   Global Step: 260150   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:31,762-Speed 2639.44 samples/sec   Loss 9.3284   LearningRate 0.0471   Epoch: 6   Global Step: 260160   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:35,648-Speed 2635.57 samples/sec   Loss 9.2577   LearningRate 0.0471   Epoch: 6   Global Step: 260170   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:39,546-Speed 2627.81 samples/sec   Loss 9.2829   LearningRate 0.0471   Epoch: 6   Global Step: 260180   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:43,440-Speed 2630.41 samples/sec   Loss 9.3043   LearningRate 0.0471   Epoch: 6   Global Step: 260190   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:47,339-Speed 2626.93 samples/sec   Loss 9.3381   LearningRate 0.0471   Epoch: 6   Global Step: 260200   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:51,248-Speed 2620.31 samples/sec   Loss 9.3855   LearningRate 0.0471   Epoch: 6   Global Step: 260210   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:55,149-Speed 2626.23 samples/sec   Loss 9.3459   LearningRate 0.0471   Epoch: 6   Global Step: 260220   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:58:59,054-Speed 2622.82 samples/sec   Loss 9.3257   LearningRate 0.0471   Epoch: 6   Global Step: 260230   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:59:02,943-Speed 2633.40 samples/sec   Loss 9.2594   LearningRate 0.0471   Epoch: 6   Global Step: 260240   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:06,831-Speed 2634.56 samples/sec   Loss 9.1176   LearningRate 0.0471   Epoch: 6   Global Step: 260250   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:10,718-Speed 2634.52 samples/sec   Loss 9.1402   LearningRate 0.0471   Epoch: 6   Global Step: 260260   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:14,607-Speed 2633.75 samples/sec   Loss 9.3969   LearningRate 0.0471   Epoch: 6   Global Step: 260270   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:18,496-Speed 2633.63 samples/sec   Loss 9.1800   LearningRate 0.0471   Epoch: 6   Global Step: 260280   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:22,389-Speed 2630.99 samples/sec   Loss 9.2420   LearningRate 0.0471   Epoch: 6   Global Step: 260290   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:26,278-Speed 2633.70 samples/sec   Loss 9.3097   LearningRate 0.0471   Epoch: 6   Global Step: 260300   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:30,169-Speed 2632.49 samples/sec   Loss 9.3353   LearningRate 0.0471   Epoch: 6   Global Step: 260310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:34,081-Speed 2618.42 samples/sec   Loss 9.1351   LearningRate 0.0471   Epoch: 6   Global Step: 260320   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:37,970-Speed 2633.45 samples/sec   Loss 9.2650   LearningRate 0.0471   Epoch: 6   Global Step: 260330   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:41,868-Speed 2627.46 samples/sec   Loss 9.2781   LearningRate 0.0471   Epoch: 6   Global Step: 260340   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:59:45,757-Speed 2634.00 samples/sec   Loss 9.2407   LearningRate 0.0471   Epoch: 6   Global Step: 260350   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 00:59:49,632-Speed 2643.02 samples/sec   Loss 9.2036   LearningRate 0.0471   Epoch: 6   Global Step: 260360   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 00:59:53,505-Speed 2644.68 samples/sec   Loss 9.2483   LearningRate 0.0471   Epoch: 6   Global Step: 260370   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 00:59:57,388-Speed 2637.95 samples/sec   Loss 9.2243   LearningRate 0.0471   Epoch: 6   Global Step: 260380   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:01,397-Speed 2554.86 samples/sec   Loss 9.2508   LearningRate 0.0471   Epoch: 6   Global Step: 260390   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:05,300-Speed 2624.22 samples/sec   Loss 9.2135   LearningRate 0.0471   Epoch: 6   Global Step: 260400   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:09,241-Speed 2598.77 samples/sec   Loss 9.2521   LearningRate 0.0471   Epoch: 6   Global Step: 260410   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:13,142-Speed 2625.48 samples/sec   Loss 9.1829   LearningRate 0.0471   Epoch: 6   Global Step: 260420   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:17,032-Speed 2633.61 samples/sec   Loss 9.1878   LearningRate 0.0471   Epoch: 6   Global Step: 260430   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:20,931-Speed 2626.67 samples/sec   Loss 9.2132   LearningRate 0.0471   Epoch: 6   Global Step: 260440   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:24,823-Speed 2631.75 samples/sec   Loss 9.3421   LearningRate 0.0471   Epoch: 6   Global Step: 260450   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:28,714-Speed 2632.75 samples/sec   Loss 9.2498   LearningRate 0.0471   Epoch: 6   Global Step: 260460   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:00:32,603-Speed 2633.11 samples/sec   Loss 9.1950   LearningRate 0.0471   Epoch: 6   Global Step: 260470   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:00:36,494-Speed 2632.64 samples/sec   Loss 9.2662   LearningRate 0.0471   Epoch: 6   Global Step: 260480   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:00:40,387-Speed 2630.79 samples/sec   Loss 9.2957   LearningRate 0.0471   Epoch: 6   Global Step: 260490   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:00:44,277-Speed 2632.84 samples/sec   Loss 9.2002   LearningRate 0.0471   Epoch: 6   Global Step: 260500   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:00:48,174-Speed 2628.18 samples/sec   Loss 9.1870   LearningRate 0.0471   Epoch: 6   Global Step: 260510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:00:52,070-Speed 2628.92 samples/sec   Loss 9.2567   LearningRate 0.0471   Epoch: 6   Global Step: 260520   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:00:55,965-Speed 2629.88 samples/sec   Loss 9.2238   LearningRate 0.0471   Epoch: 6   Global Step: 260530   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:00:59,828-Speed 2651.85 samples/sec   Loss 10.1081   LearningRate 0.0471   Epoch: 6   Global Step: 260540   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:01:03,704-Speed 2642.03 samples/sec   Loss 9.9400   LearningRate 0.0470   Epoch: 6   Global Step: 260550   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:07,597-Speed 2631.67 samples/sec   Loss 9.6538   LearningRate 0.0470   Epoch: 6   Global Step: 260560   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:11,494-Speed 2627.88 samples/sec   Loss 9.3360   LearningRate 0.0470   Epoch: 6   Global Step: 260570   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:15,391-Speed 2628.22 samples/sec   Loss 9.3206   LearningRate 0.0470   Epoch: 6   Global Step: 260580   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:19,295-Speed 2623.27 samples/sec   Loss 9.4960   LearningRate 0.0470   Epoch: 6   Global Step: 260590   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:23,193-Speed 2627.62 samples/sec   Loss 9.1986   LearningRate 0.0470   Epoch: 6   Global Step: 260600   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:27,092-Speed 2627.50 samples/sec   Loss 9.1352   LearningRate 0.0470   Epoch: 6   Global Step: 260610   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:30,984-Speed 2631.69 samples/sec   Loss 9.2217   LearningRate 0.0470   Epoch: 6   Global Step: 260620   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:34,889-Speed 2622.87 samples/sec   Loss 9.2340   LearningRate 0.0470   Epoch: 6   Global Step: 260630   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:38,786-Speed 2628.61 samples/sec   Loss 9.3027   LearningRate 0.0470   Epoch: 6   Global Step: 260640   Fp16 Grad Scale: 8192   Required: 64 hours
Training: 2022-04-14 01:01:42,676-Speed 2632.57 samples/sec   Loss 9.0990   LearningRate 0.0470   Epoch: 6   Global Step: 260650   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:01:46,568-Speed 2631.42 samples/sec   Loss 9.2187   LearningRate 0.0470   Epoch: 6   Global Step: 260660   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:01:50,475-Speed 2621.66 samples/sec   Loss 9.2319   LearningRate 0.0470   Epoch: 6   Global Step: 260670   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:01:54,370-Speed 2629.47 samples/sec   Loss 9.2991   LearningRate 0.0470   Epoch: 6   Global Step: 260680   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:01:58,262-Speed 2632.58 samples/sec   Loss 9.3289   LearningRate 0.0470   Epoch: 6   Global Step: 260690   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:02:02,221-Speed 2586.70 samples/sec   Loss 9.2110   LearningRate 0.0470   Epoch: 6   Global Step: 260700   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:02:06,208-Speed 2569.26 samples/sec   Loss 9.1589   LearningRate 0.0470   Epoch: 6   Global Step: 260710   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:02:10,117-Speed 2620.39 samples/sec   Loss 9.1763   LearningRate 0.0470   Epoch: 6   Global Step: 260720   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:02:14,011-Speed 2630.02 samples/sec   Loss 9.1826   LearningRate 0.0470   Epoch: 6   Global Step: 260730   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:02:17,912-Speed 2625.77 samples/sec   Loss 9.1789   LearningRate 0.0470   Epoch: 6   Global Step: 260740   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:02:21,805-Speed 2631.00 samples/sec   Loss 9.2150   LearningRate 0.0470   Epoch: 6   Global Step: 260750   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:25,697-Speed 2631.81 samples/sec   Loss 9.2670   LearningRate 0.0470   Epoch: 6   Global Step: 260760   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:29,588-Speed 2631.99 samples/sec   Loss 9.2770   LearningRate 0.0470   Epoch: 6   Global Step: 260770   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:33,483-Speed 2630.03 samples/sec   Loss 9.2247   LearningRate 0.0470   Epoch: 6   Global Step: 260780   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:37,374-Speed 2632.99 samples/sec   Loss 9.0764   LearningRate 0.0470   Epoch: 6   Global Step: 260790   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:41,270-Speed 2628.85 samples/sec   Loss 9.2535   LearningRate 0.0470   Epoch: 6   Global Step: 260800   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:45,169-Speed 2626.37 samples/sec   Loss 9.2443   LearningRate 0.0470   Epoch: 6   Global Step: 260810   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:49,073-Speed 2623.89 samples/sec   Loss 9.3572   LearningRate 0.0470   Epoch: 6   Global Step: 260820   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:52,975-Speed 2624.63 samples/sec   Loss 9.2036   LearningRate 0.0470   Epoch: 6   Global Step: 260830   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:02:56,870-Speed 2630.47 samples/sec   Loss 9.1784   LearningRate 0.0470   Epoch: 6   Global Step: 260840   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:03:00,794-Speed 2610.29 samples/sec   Loss 9.2854   LearningRate 0.0470   Epoch: 6   Global Step: 260850   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:04,695-Speed 2624.93 samples/sec   Loss 9.2734   LearningRate 0.0470   Epoch: 6   Global Step: 260860   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:08,606-Speed 2618.86 samples/sec   Loss 9.3007   LearningRate 0.0470   Epoch: 6   Global Step: 260870   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:12,540-Speed 2603.65 samples/sec   Loss 9.2543   LearningRate 0.0470   Epoch: 6   Global Step: 260880   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:16,441-Speed 2625.87 samples/sec   Loss 9.1682   LearningRate 0.0470   Epoch: 6   Global Step: 260890   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:20,344-Speed 2624.94 samples/sec   Loss 9.3656   LearningRate 0.0470   Epoch: 6   Global Step: 260900   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:24,242-Speed 2627.34 samples/sec   Loss 9.3395   LearningRate 0.0470   Epoch: 6   Global Step: 260910   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:28,145-Speed 2624.32 samples/sec   Loss 9.3962   LearningRate 0.0470   Epoch: 6   Global Step: 260920   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:32,045-Speed 2626.68 samples/sec   Loss 9.2309   LearningRate 0.0470   Epoch: 6   Global Step: 260930   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:35,947-Speed 2624.68 samples/sec   Loss 9.1626   LearningRate 0.0470   Epoch: 6   Global Step: 260940   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:03:39,848-Speed 2625.28 samples/sec   Loss 9.3604   LearningRate 0.0470   Epoch: 6   Global Step: 260950   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:03:43,744-Speed 2629.07 samples/sec   Loss 9.1980   LearningRate 0.0470   Epoch: 6   Global Step: 260960   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:03:47,639-Speed 2630.12 samples/sec   Loss 9.2100   LearningRate 0.0470   Epoch: 6   Global Step: 260970   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:03:51,579-Speed 2599.49 samples/sec   Loss 9.2555   LearningRate 0.0470   Epoch: 6   Global Step: 260980   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:03:55,478-Speed 2627.57 samples/sec   Loss 9.3299   LearningRate 0.0470   Epoch: 6   Global Step: 260990   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:03:59,376-Speed 2627.13 samples/sec   Loss 9.3467   LearningRate 0.0470   Epoch: 6   Global Step: 261000   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:03,276-Speed 2626.57 samples/sec   Loss 9.2966   LearningRate 0.0470   Epoch: 6   Global Step: 261010   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:07,172-Speed 2629.01 samples/sec   Loss 9.3155   LearningRate 0.0470   Epoch: 6   Global Step: 261020   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:11,079-Speed 2621.26 samples/sec   Loss 9.2296   LearningRate 0.0470   Epoch: 6   Global Step: 261030   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:14,987-Speed 2620.68 samples/sec   Loss 9.1885   LearningRate 0.0470   Epoch: 6   Global Step: 261040   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:18,876-Speed 2634.35 samples/sec   Loss 9.2891   LearningRate 0.0470   Epoch: 6   Global Step: 261050   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:22,799-Speed 2611.06 samples/sec   Loss 9.2019   LearningRate 0.0470   Epoch: 6   Global Step: 261060   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:26,721-Speed 2611.79 samples/sec   Loss 9.2837   LearningRate 0.0470   Epoch: 6   Global Step: 261070   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:30,633-Speed 2618.34 samples/sec   Loss 9.3010   LearningRate 0.0470   Epoch: 6   Global Step: 261080   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:34,555-Speed 2611.18 samples/sec   Loss 9.3459   LearningRate 0.0470   Epoch: 6   Global Step: 261090   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:38,452-Speed 2628.61 samples/sec   Loss 9.2476   LearningRate 0.0470   Epoch: 6   Global Step: 261100   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:42,350-Speed 2627.47 samples/sec   Loss 9.2778   LearningRate 0.0470   Epoch: 6   Global Step: 261110   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:46,275-Speed 2609.09 samples/sec   Loss 9.2627   LearningRate 0.0470   Epoch: 6   Global Step: 261120   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:50,292-Speed 2551.31 samples/sec   Loss 9.3790   LearningRate 0.0470   Epoch: 6   Global Step: 261130   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:54,182-Speed 2632.67 samples/sec   Loss 9.1724   LearningRate 0.0470   Epoch: 6   Global Step: 261140   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:04:58,080-Speed 2627.57 samples/sec   Loss 9.1947   LearningRate 0.0470   Epoch: 6   Global Step: 261150   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:05:02,002-Speed 2611.48 samples/sec   Loss 9.2995   LearningRate 0.0469   Epoch: 6   Global Step: 261160   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:05:05,883-Speed 2639.20 samples/sec   Loss 9.1701   LearningRate 0.0469   Epoch: 6   Global Step: 261170   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:09,779-Speed 2628.88 samples/sec   Loss 9.3574   LearningRate 0.0469   Epoch: 6   Global Step: 261180   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:13,682-Speed 2624.54 samples/sec   Loss 9.2555   LearningRate 0.0469   Epoch: 6   Global Step: 261190   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:17,579-Speed 2628.59 samples/sec   Loss 9.3104   LearningRate 0.0469   Epoch: 6   Global Step: 261200   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:21,478-Speed 2626.86 samples/sec   Loss 9.2424   LearningRate 0.0469   Epoch: 6   Global Step: 261210   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:25,408-Speed 2607.10 samples/sec   Loss 9.0989   LearningRate 0.0469   Epoch: 6   Global Step: 261220   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:29,321-Speed 2617.60 samples/sec   Loss 9.1232   LearningRate 0.0469   Epoch: 6   Global Step: 261230   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:33,223-Speed 2625.11 samples/sec   Loss 9.2238   LearningRate 0.0469   Epoch: 6   Global Step: 261240   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:37,125-Speed 2624.44 samples/sec   Loss 9.3301   LearningRate 0.0469   Epoch: 6   Global Step: 261250   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:41,031-Speed 2622.25 samples/sec   Loss 9.1405   LearningRate 0.0469   Epoch: 6   Global Step: 261260   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:44,913-Speed 2638.56 samples/sec   Loss 9.0649   LearningRate 0.0469   Epoch: 6   Global Step: 261270   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:48,810-Speed 2628.52 samples/sec   Loss 9.2868   LearningRate 0.0469   Epoch: 6   Global Step: 261280   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:52,712-Speed 2624.93 samples/sec   Loss 9.2370   LearningRate 0.0469   Epoch: 6   Global Step: 261290   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:05:56,616-Speed 2623.45 samples/sec   Loss 9.2468   LearningRate 0.0469   Epoch: 6   Global Step: 261300   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:00,516-Speed 2626.45 samples/sec   Loss 9.1639   LearningRate 0.0469   Epoch: 6   Global Step: 261310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:04,433-Speed 2614.37 samples/sec   Loss 9.1618   LearningRate 0.0469   Epoch: 6   Global Step: 261320   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:08,334-Speed 2625.82 samples/sec   Loss 9.2549   LearningRate 0.0469   Epoch: 6   Global Step: 261330   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:12,249-Speed 2616.30 samples/sec   Loss 9.2546   LearningRate 0.0469   Epoch: 6   Global Step: 261340   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:16,155-Speed 2621.81 samples/sec   Loss 9.4286   LearningRate 0.0469   Epoch: 6   Global Step: 261350   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:20,064-Speed 2619.99 samples/sec   Loss 9.2630   LearningRate 0.0469   Epoch: 6   Global Step: 261360   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:23,970-Speed 2622.72 samples/sec   Loss 9.2905   LearningRate 0.0469   Epoch: 6   Global Step: 261370   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:06:27,875-Speed 2623.17 samples/sec   Loss 9.1993   LearningRate 0.0469   Epoch: 6   Global Step: 261380   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:06:31,764-Speed 2633.62 samples/sec   Loss 9.2192   LearningRate 0.0469   Epoch: 6   Global Step: 261390   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:35,673-Speed 2620.35 samples/sec   Loss 9.3156   LearningRate 0.0469   Epoch: 6   Global Step: 261400   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:39,620-Speed 2594.75 samples/sec   Loss 9.3245   LearningRate 0.0469   Epoch: 6   Global Step: 261410   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:43,540-Speed 2612.94 samples/sec   Loss 9.3047   LearningRate 0.0469   Epoch: 6   Global Step: 261420   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:47,452-Speed 2617.67 samples/sec   Loss 9.1664   LearningRate 0.0469   Epoch: 6   Global Step: 261430   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:51,373-Speed 2612.20 samples/sec   Loss 9.1545   LearningRate 0.0469   Epoch: 6   Global Step: 261440   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:55,283-Speed 2619.32 samples/sec   Loss 9.1567   LearningRate 0.0469   Epoch: 6   Global Step: 261450   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:06:59,193-Speed 2619.37 samples/sec   Loss 9.2579   LearningRate 0.0469   Epoch: 6   Global Step: 261460   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:03,103-Speed 2621.95 samples/sec   Loss 9.2190   LearningRate 0.0469   Epoch: 6   Global Step: 261470   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:07,009-Speed 2621.89 samples/sec   Loss 9.2224   LearningRate 0.0469   Epoch: 6   Global Step: 261480   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:10,892-Speed 2637.78 samples/sec   Loss 9.2570   LearningRate 0.0469   Epoch: 6   Global Step: 261490   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:14,793-Speed 2625.53 samples/sec   Loss 9.3710   LearningRate 0.0469   Epoch: 6   Global Step: 261500   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:18,693-Speed 2626.00 samples/sec   Loss 9.2780   LearningRate 0.0469   Epoch: 6   Global Step: 261510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:22,593-Speed 2626.02 samples/sec   Loss 9.1665   LearningRate 0.0469   Epoch: 6   Global Step: 261520   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:26,497-Speed 2623.99 samples/sec   Loss 9.1884   LearningRate 0.0469   Epoch: 6   Global Step: 261530   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:30,399-Speed 2624.83 samples/sec   Loss 9.1978   LearningRate 0.0469   Epoch: 6   Global Step: 261540   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:34,300-Speed 2625.85 samples/sec   Loss 9.4194   LearningRate 0.0469   Epoch: 6   Global Step: 261550   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:38,202-Speed 2624.98 samples/sec   Loss 9.2858   LearningRate 0.0469   Epoch: 6   Global Step: 261560   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:42,187-Speed 2570.45 samples/sec   Loss 9.1380   LearningRate 0.0469   Epoch: 6   Global Step: 261570   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:46,087-Speed 2625.56 samples/sec   Loss 9.2373   LearningRate 0.0469   Epoch: 6   Global Step: 261580   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:07:49,991-Speed 2623.48 samples/sec   Loss 9.1890   LearningRate 0.0469   Epoch: 6   Global Step: 261590   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:07:53,901-Speed 2619.34 samples/sec   Loss 9.1681   LearningRate 0.0469   Epoch: 6   Global Step: 261600   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:07:57,801-Speed 2626.92 samples/sec   Loss 9.0903   LearningRate 0.0469   Epoch: 6   Global Step: 261610   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:08:01,708-Speed 2621.20 samples/sec   Loss 9.3436   LearningRate 0.0469   Epoch: 6   Global Step: 261620   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:08:05,619-Speed 2619.04 samples/sec   Loss 9.2264   LearningRate 0.0469   Epoch: 6   Global Step: 261630   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:08:09,526-Speed 2621.29 samples/sec   Loss 9.2654   LearningRate 0.0469   Epoch: 6   Global Step: 261640   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:08:13,426-Speed 2626.25 samples/sec   Loss 9.2074   LearningRate 0.0469   Epoch: 6   Global Step: 261650   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:17,332-Speed 2622.22 samples/sec   Loss 9.1564   LearningRate 0.0469   Epoch: 6   Global Step: 261660   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:21,240-Speed 2621.10 samples/sec   Loss 9.3567   LearningRate 0.0469   Epoch: 6   Global Step: 261670   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:25,152-Speed 2617.70 samples/sec   Loss 9.2837   LearningRate 0.0469   Epoch: 6   Global Step: 261680   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:29,061-Speed 2620.81 samples/sec   Loss 9.2734   LearningRate 0.0469   Epoch: 6   Global Step: 261690   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:32,975-Speed 2616.72 samples/sec   Loss 9.2225   LearningRate 0.0469   Epoch: 6   Global Step: 261700   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:36,877-Speed 2624.56 samples/sec   Loss 9.3101   LearningRate 0.0469   Epoch: 6   Global Step: 261710   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:40,778-Speed 2625.34 samples/sec   Loss 9.3182   LearningRate 0.0469   Epoch: 6   Global Step: 261720   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:44,689-Speed 2619.41 samples/sec   Loss 9.1571   LearningRate 0.0469   Epoch: 6   Global Step: 261730   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:48,601-Speed 2618.24 samples/sec   Loss 9.0678   LearningRate 0.0469   Epoch: 6   Global Step: 261740   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:08:52,505-Speed 2624.39 samples/sec   Loss 9.2497   LearningRate 0.0469   Epoch: 6   Global Step: 261750   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:08:56,391-Speed 2635.47 samples/sec   Loss 9.2452   LearningRate 0.0468   Epoch: 6   Global Step: 261760   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:00,314-Speed 2610.94 samples/sec   Loss 9.2832   LearningRate 0.0468   Epoch: 6   Global Step: 261770   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:04,217-Speed 2624.15 samples/sec   Loss 9.1378   LearningRate 0.0468   Epoch: 6   Global Step: 261780   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:08,171-Speed 2590.08 samples/sec   Loss 9.1352   LearningRate 0.0468   Epoch: 6   Global Step: 261790   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:12,073-Speed 2624.75 samples/sec   Loss 9.0706   LearningRate 0.0468   Epoch: 6   Global Step: 261800   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:15,976-Speed 2624.41 samples/sec   Loss 9.2424   LearningRate 0.0468   Epoch: 6   Global Step: 261810   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:19,877-Speed 2625.27 samples/sec   Loss 9.2490   LearningRate 0.0468   Epoch: 6   Global Step: 261820   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:23,780-Speed 2624.35 samples/sec   Loss 9.2125   LearningRate 0.0468   Epoch: 6   Global Step: 261830   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:27,686-Speed 2622.34 samples/sec   Loss 9.1749   LearningRate 0.0468   Epoch: 6   Global Step: 261840   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:31,596-Speed 2620.01 samples/sec   Loss 9.3120   LearningRate 0.0468   Epoch: 6   Global Step: 261850   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:35,499-Speed 2624.01 samples/sec   Loss 9.2362   LearningRate 0.0468   Epoch: 6   Global Step: 261860   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:09:39,402-Speed 2624.26 samples/sec   Loss 9.2307   LearningRate 0.0468   Epoch: 6   Global Step: 261870   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:09:43,304-Speed 2624.67 samples/sec   Loss 9.2804   LearningRate 0.0468   Epoch: 6   Global Step: 261880   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:09:47,201-Speed 2628.00 samples/sec   Loss 9.2343   LearningRate 0.0468   Epoch: 6   Global Step: 261890   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:09:51,079-Speed 2641.50 samples/sec   Loss 9.2482   LearningRate 0.0468   Epoch: 6   Global Step: 261900   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:09:54,975-Speed 2629.20 samples/sec   Loss 9.4344   LearningRate 0.0468   Epoch: 6   Global Step: 261910   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:09:58,882-Speed 2621.52 samples/sec   Loss 9.2806   LearningRate 0.0468   Epoch: 6   Global Step: 261920   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:02,788-Speed 2622.23 samples/sec   Loss 9.2688   LearningRate 0.0468   Epoch: 6   Global Step: 261930   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:06,697-Speed 2620.25 samples/sec   Loss 9.2624   LearningRate 0.0468   Epoch: 6   Global Step: 261940   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:10,607-Speed 2619.64 samples/sec   Loss 9.1308   LearningRate 0.0468   Epoch: 6   Global Step: 261950   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:14,514-Speed 2621.01 samples/sec   Loss 9.2742   LearningRate 0.0468   Epoch: 6   Global Step: 261960   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:18,421-Speed 2622.15 samples/sec   Loss 9.2846   LearningRate 0.0468   Epoch: 6   Global Step: 261970   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:22,332-Speed 2618.59 samples/sec   Loss 9.3354   LearningRate 0.0468   Epoch: 6   Global Step: 261980   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:26,231-Speed 2626.71 samples/sec   Loss 9.1928   LearningRate 0.0468   Epoch: 6   Global Step: 261990   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:30,136-Speed 2623.16 samples/sec   Loss 9.2115   LearningRate 0.0468   Epoch: 6   Global Step: 262000   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:34,019-Speed 2637.73 samples/sec   Loss 9.9037   LearningRate 0.0468   Epoch: 6   Global Step: 262010   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:37,923-Speed 2623.19 samples/sec   Loss 9.6131   LearningRate 0.0468   Epoch: 6   Global Step: 262020   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:41,823-Speed 2626.99 samples/sec   Loss 9.2629   LearningRate 0.0468   Epoch: 6   Global Step: 262030   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:45,746-Speed 2610.79 samples/sec   Loss 9.4049   LearningRate 0.0468   Epoch: 6   Global Step: 262040   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:49,659-Speed 2617.25 samples/sec   Loss 9.2102   LearningRate 0.0468   Epoch: 6   Global Step: 262050   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:53,562-Speed 2624.41 samples/sec   Loss 9.2468   LearningRate 0.0468   Epoch: 6   Global Step: 262060   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:10:57,461-Speed 2626.83 samples/sec   Loss 9.1777   LearningRate 0.0468   Epoch: 6   Global Step: 262070   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:11:01,389-Speed 2607.46 samples/sec   Loss 9.3041   LearningRate 0.0468   Epoch: 6   Global Step: 262080   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:11:05,293-Speed 2623.59 samples/sec   Loss 9.2939   LearningRate 0.0468   Epoch: 6   Global Step: 262090   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:11:09,196-Speed 2623.70 samples/sec   Loss 9.2053   LearningRate 0.0468   Epoch: 6   Global Step: 262100   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:11:13,101-Speed 2623.26 samples/sec   Loss 9.2765   LearningRate 0.0468   Epoch: 6   Global Step: 262110   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:17,004-Speed 2624.43 samples/sec   Loss 9.1356   LearningRate 0.0468   Epoch: 6   Global Step: 262120   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:20,913-Speed 2620.51 samples/sec   Loss 9.2192   LearningRate 0.0468   Epoch: 6   Global Step: 262130   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:24,814-Speed 2625.41 samples/sec   Loss 9.2142   LearningRate 0.0468   Epoch: 6   Global Step: 262140   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:28,713-Speed 2626.97 samples/sec   Loss 9.1812   LearningRate 0.0468   Epoch: 6   Global Step: 262150   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:32,623-Speed 2619.17 samples/sec   Loss 9.1961   LearningRate 0.0468   Epoch: 6   Global Step: 262160   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:36,526-Speed 2623.92 samples/sec   Loss 9.3491   LearningRate 0.0468   Epoch: 6   Global Step: 262170   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:40,427-Speed 2625.44 samples/sec   Loss 9.1941   LearningRate 0.0468   Epoch: 6   Global Step: 262180   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:44,331-Speed 2624.45 samples/sec   Loss 9.1652   LearningRate 0.0468   Epoch: 6   Global Step: 262190   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:48,235-Speed 2623.97 samples/sec   Loss 9.3160   LearningRate 0.0468   Epoch: 6   Global Step: 262200   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:11:52,171-Speed 2602.06 samples/sec   Loss 9.1676   LearningRate 0.0468   Epoch: 6   Global Step: 262210   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:11:56,086-Speed 2616.80 samples/sec   Loss 9.2593   LearningRate 0.0468   Epoch: 6   Global Step: 262220   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:00,003-Speed 2614.86 samples/sec   Loss 9.3155   LearningRate 0.0468   Epoch: 6   Global Step: 262230   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:03,917-Speed 2616.45 samples/sec   Loss 9.3277   LearningRate 0.0468   Epoch: 6   Global Step: 262240   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:07,842-Speed 2609.75 samples/sec   Loss 9.1389   LearningRate 0.0468   Epoch: 6   Global Step: 262250   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:11,738-Speed 2629.21 samples/sec   Loss 9.1594   LearningRate 0.0468   Epoch: 6   Global Step: 262260   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:15,646-Speed 2621.08 samples/sec   Loss 9.2517   LearningRate 0.0468   Epoch: 6   Global Step: 262270   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:19,564-Speed 2614.07 samples/sec   Loss 9.1566   LearningRate 0.0468   Epoch: 6   Global Step: 262280   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:23,485-Speed 2612.55 samples/sec   Loss 9.1424   LearningRate 0.0468   Epoch: 6   Global Step: 262290   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:27,388-Speed 2624.10 samples/sec   Loss 9.1153   LearningRate 0.0468   Epoch: 6   Global Step: 262300   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:12:31,289-Speed 2625.96 samples/sec   Loss 9.2618   LearningRate 0.0468   Epoch: 6   Global Step: 262310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:12:35,195-Speed 2622.68 samples/sec   Loss 9.2762   LearningRate 0.0468   Epoch: 6   Global Step: 262320   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:12:39,097-Speed 2624.85 samples/sec   Loss 9.2411   LearningRate 0.0468   Epoch: 6   Global Step: 262330   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:12:43,000-Speed 2624.70 samples/sec   Loss 9.1421   LearningRate 0.0468   Epoch: 6   Global Step: 262340   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:12:46,908-Speed 2620.58 samples/sec   Loss 9.2093   LearningRate 0.0468   Epoch: 6   Global Step: 262350   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:12:50,842-Speed 2603.25 samples/sec   Loss 9.2418   LearningRate 0.0468   Epoch: 6   Global Step: 262360   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:12:54,746-Speed 2623.99 samples/sec   Loss 9.2634   LearningRate 0.0467   Epoch: 6   Global Step: 262370   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:12:58,646-Speed 2626.28 samples/sec   Loss 9.1771   LearningRate 0.0467   Epoch: 6   Global Step: 262380   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:13:02,549-Speed 2624.11 samples/sec   Loss 9.1892   LearningRate 0.0467   Epoch: 6   Global Step: 262390   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:13:06,452-Speed 2624.33 samples/sec   Loss 9.0898   LearningRate 0.0467   Epoch: 6   Global Step: 262400   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:13:10,356-Speed 2623.74 samples/sec   Loss 9.3280   LearningRate 0.0467   Epoch: 6   Global Step: 262410   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:14,261-Speed 2623.70 samples/sec   Loss 9.2930   LearningRate 0.0467   Epoch: 6   Global Step: 262420   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:18,166-Speed 2622.52 samples/sec   Loss 9.3030   LearningRate 0.0467   Epoch: 6   Global Step: 262430   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:22,068-Speed 2625.13 samples/sec   Loss 9.0957   LearningRate 0.0467   Epoch: 6   Global Step: 262440   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:25,969-Speed 2625.09 samples/sec   Loss 9.2138   LearningRate 0.0467   Epoch: 6   Global Step: 262450   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:29,892-Speed 2611.16 samples/sec   Loss 9.3274   LearningRate 0.0467   Epoch: 6   Global Step: 262460   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:33,799-Speed 2622.06 samples/sec   Loss 9.2066   LearningRate 0.0467   Epoch: 6   Global Step: 262470   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:37,707-Speed 2620.46 samples/sec   Loss 9.2782   LearningRate 0.0467   Epoch: 6   Global Step: 262480   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:41,608-Speed 2626.04 samples/sec   Loss 9.3124   LearningRate 0.0467   Epoch: 6   Global Step: 262490   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:45,513-Speed 2622.47 samples/sec   Loss 9.2547   LearningRate 0.0467   Epoch: 6   Global Step: 262500   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:13:49,411-Speed 2627.80 samples/sec   Loss 9.1423   LearningRate 0.0467   Epoch: 6   Global Step: 262510   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:13:53,317-Speed 2622.21 samples/sec   Loss 9.1490   LearningRate 0.0467   Epoch: 6   Global Step: 262520   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:13:57,220-Speed 2625.00 samples/sec   Loss 9.2157   LearningRate 0.0467   Epoch: 6   Global Step: 262530   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:01,133-Speed 2616.87 samples/sec   Loss 9.1942   LearningRate 0.0467   Epoch: 6   Global Step: 262540   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:05,053-Speed 2613.07 samples/sec   Loss 9.0535   LearningRate 0.0467   Epoch: 6   Global Step: 262550   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:09,053-Speed 2560.60 samples/sec   Loss 9.1599   LearningRate 0.0467   Epoch: 6   Global Step: 262560   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:13,087-Speed 2539.40 samples/sec   Loss 9.2909   LearningRate 0.0467   Epoch: 6   Global Step: 262570   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:17,006-Speed 2613.44 samples/sec   Loss 9.2658   LearningRate 0.0467   Epoch: 6   Global Step: 262580   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:20,906-Speed 2626.63 samples/sec   Loss 9.1921   LearningRate 0.0467   Epoch: 6   Global Step: 262590   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:24,827-Speed 2611.88 samples/sec   Loss 9.1271   LearningRate 0.0467   Epoch: 6   Global Step: 262600   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:14:28,729-Speed 2625.04 samples/sec   Loss 9.2491   LearningRate 0.0467   Epoch: 6   Global Step: 262610   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:32,632-Speed 2624.52 samples/sec   Loss 9.2344   LearningRate 0.0467   Epoch: 6   Global Step: 262620   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:36,541-Speed 2619.88 samples/sec   Loss 9.2355   LearningRate 0.0467   Epoch: 6   Global Step: 262630   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:40,465-Speed 2609.95 samples/sec   Loss 9.1555   LearningRate 0.0467   Epoch: 6   Global Step: 262640   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:44,378-Speed 2618.14 samples/sec   Loss 9.0573   LearningRate 0.0467   Epoch: 6   Global Step: 262650   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:48,283-Speed 2622.62 samples/sec   Loss 9.1819   LearningRate 0.0467   Epoch: 6   Global Step: 262660   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:52,191-Speed 2621.51 samples/sec   Loss 9.2511   LearningRate 0.0467   Epoch: 6   Global Step: 262670   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:56,091-Speed 2626.24 samples/sec   Loss 9.2722   LearningRate 0.0467   Epoch: 6   Global Step: 262680   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:14:59,991-Speed 2626.45 samples/sec   Loss 9.2072   LearningRate 0.0467   Epoch: 6   Global Step: 262690   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:03,891-Speed 2626.16 samples/sec   Loss 9.3122   LearningRate 0.0467   Epoch: 6   Global Step: 262700   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:08,267-Speed 2340.54 samples/sec   Loss 9.3831   LearningRate 0.0467   Epoch: 6   Global Step: 262710   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:12,177-Speed 2618.90 samples/sec   Loss 9.1450   LearningRate 0.0467   Epoch: 6   Global Step: 262720   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:16,069-Speed 2631.93 samples/sec   Loss 9.1126   LearningRate 0.0467   Epoch: 6   Global Step: 262730   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:19,970-Speed 2626.04 samples/sec   Loss 9.3413   LearningRate 0.0467   Epoch: 6   Global Step: 262740   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:23,872-Speed 2625.04 samples/sec   Loss 9.3232   LearningRate 0.0467   Epoch: 6   Global Step: 262750   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:27,782-Speed 2619.26 samples/sec   Loss 9.0779   LearningRate 0.0467   Epoch: 6   Global Step: 262760   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:31,683-Speed 2625.34 samples/sec   Loss 9.1160   LearningRate 0.0467   Epoch: 6   Global Step: 262770   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:35,582-Speed 2627.39 samples/sec   Loss 9.1678   LearningRate 0.0467   Epoch: 6   Global Step: 262780   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:39,494-Speed 2617.37 samples/sec   Loss 9.1487   LearningRate 0.0467   Epoch: 6   Global Step: 262790   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:43,425-Speed 2606.13 samples/sec   Loss 9.0731   LearningRate 0.0467   Epoch: 6   Global Step: 262800   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:47,312-Speed 2635.28 samples/sec   Loss 9.2119   LearningRate 0.0467   Epoch: 6   Global Step: 262810   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:51,211-Speed 2627.03 samples/sec   Loss 9.3134   LearningRate 0.0467   Epoch: 6   Global Step: 262820   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:55,110-Speed 2626.81 samples/sec   Loss 9.1003   LearningRate 0.0467   Epoch: 6   Global Step: 262830   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:15:59,007-Speed 2629.06 samples/sec   Loss 9.2042   LearningRate 0.0467   Epoch: 6   Global Step: 262840   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:16:02,906-Speed 2626.67 samples/sec   Loss 9.1625   LearningRate 0.0467   Epoch: 6   Global Step: 262850   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:16:06,795-Speed 2633.15 samples/sec   Loss 9.2072   LearningRate 0.0467   Epoch: 6   Global Step: 262860   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:16:10,642-Speed 2662.22 samples/sec   Loss 10.1797   LearningRate 0.0467   Epoch: 6   Global Step: 262870   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:14,543-Speed 2625.98 samples/sec   Loss 9.4990   LearningRate 0.0467   Epoch: 6   Global Step: 262880   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:18,441-Speed 2628.25 samples/sec   Loss 9.4066   LearningRate 0.0467   Epoch: 6   Global Step: 262890   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:22,340-Speed 2626.95 samples/sec   Loss 9.5263   LearningRate 0.0467   Epoch: 6   Global Step: 262900   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:26,230-Speed 2633.31 samples/sec   Loss 9.5334   LearningRate 0.0467   Epoch: 6   Global Step: 262910   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:30,132-Speed 2624.32 samples/sec   Loss 9.5311   LearningRate 0.0467   Epoch: 6   Global Step: 262920   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:34,041-Speed 2620.22 samples/sec   Loss 9.2314   LearningRate 0.0467   Epoch: 6   Global Step: 262930   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:37,938-Speed 2628.39 samples/sec   Loss 9.1933   LearningRate 0.0467   Epoch: 6   Global Step: 262940   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:41,838-Speed 2626.43 samples/sec   Loss 9.1598   LearningRate 0.0467   Epoch: 6   Global Step: 262950   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:45,744-Speed 2621.92 samples/sec   Loss 9.1579   LearningRate 0.0467   Epoch: 6   Global Step: 262960   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:16:49,664-Speed 2613.14 samples/sec   Loss 9.1998   LearningRate 0.0467   Epoch: 6   Global Step: 262970   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:16:53,575-Speed 2618.87 samples/sec   Loss 9.1647   LearningRate 0.0466   Epoch: 6   Global Step: 262980   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:16:57,478-Speed 2624.65 samples/sec   Loss 9.3311   LearningRate 0.0466   Epoch: 6   Global Step: 262990   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:17:01,370-Speed 2631.97 samples/sec   Loss 9.4422   LearningRate 0.0466   Epoch: 6   Global Step: 263000   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:05,276-Speed 2621.77 samples/sec   Loss 9.4562   LearningRate 0.0466   Epoch: 6   Global Step: 263010   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:09,173-Speed 2628.34 samples/sec   Loss 9.1567   LearningRate 0.0466   Epoch: 6   Global Step: 263020   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:13,068-Speed 2629.88 samples/sec   Loss 9.3790   LearningRate 0.0466   Epoch: 6   Global Step: 263030   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:16,962-Speed 2629.64 samples/sec   Loss 9.1704   LearningRate 0.0466   Epoch: 6   Global Step: 263040   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:20,858-Speed 2629.48 samples/sec   Loss 9.2678   LearningRate 0.0466   Epoch: 6   Global Step: 263050   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:24,759-Speed 2625.90 samples/sec   Loss 9.2913   LearningRate 0.0466   Epoch: 6   Global Step: 263060   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:28,677-Speed 2614.23 samples/sec   Loss 9.1328   LearningRate 0.0466   Epoch: 6   Global Step: 263070   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:32,579-Speed 2625.29 samples/sec   Loss 9.1627   LearningRate 0.0466   Epoch: 6   Global Step: 263080   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:36,482-Speed 2624.20 samples/sec   Loss 9.1861   LearningRate 0.0466   Epoch: 6   Global Step: 263090   Fp16 Grad Scale: 16384   Required: 64 hours
Training: 2022-04-14 01:17:40,383-Speed 2625.24 samples/sec   Loss 9.3165   LearningRate 0.0466   Epoch: 6   Global Step: 263100   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:17:44,335-Speed 2591.93 samples/sec   Loss 9.3166   LearningRate 0.0466   Epoch: 6   Global Step: 263110   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:17:48,230-Speed 2630.02 samples/sec   Loss 9.0572   LearningRate 0.0466   Epoch: 6   Global Step: 263120   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:17:52,247-Speed 2549.74 samples/sec   Loss 9.2407   LearningRate 0.0466   Epoch: 6   Global Step: 263130   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:17:56,149-Speed 2625.59 samples/sec   Loss 9.1151   LearningRate 0.0466   Epoch: 6   Global Step: 263140   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:18:00,047-Speed 2627.37 samples/sec   Loss 9.2782   LearningRate 0.0466   Epoch: 6   Global Step: 263150   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:18:03,941-Speed 2630.13 samples/sec   Loss 9.2057   LearningRate 0.0466   Epoch: 6   Global Step: 263160   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:18:07,836-Speed 2629.44 samples/sec   Loss 9.2333   LearningRate 0.0466   Epoch: 6   Global Step: 263170   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:18:11,736-Speed 2626.65 samples/sec   Loss 9.3420   LearningRate 0.0466   Epoch: 6   Global Step: 263180   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:18:15,639-Speed 2624.46 samples/sec   Loss 9.2935   LearningRate 0.0466   Epoch: 6   Global Step: 263190   Fp16 Grad Scale: 32768   Required: 64 hours
Training: 2022-04-14 01:18:19,637-Speed 2561.94 samples/sec   Loss 9.3636   LearningRate 0.0466   Epoch: 6   Global Step: 263200   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:23,534-Speed 2628.75 samples/sec   Loss 9.2897   LearningRate 0.0466   Epoch: 6   Global Step: 263210   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:27,438-Speed 2623.32 samples/sec   Loss 9.2388   LearningRate 0.0466   Epoch: 6   Global Step: 263220   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:31,334-Speed 2628.80 samples/sec   Loss 9.3316   LearningRate 0.0466   Epoch: 6   Global Step: 263230   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:35,235-Speed 2625.88 samples/sec   Loss 9.0562   LearningRate 0.0466   Epoch: 6   Global Step: 263240   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:39,144-Speed 2620.52 samples/sec   Loss 9.2089   LearningRate 0.0466   Epoch: 6   Global Step: 263250   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:43,049-Speed 2622.94 samples/sec   Loss 9.2495   LearningRate 0.0466   Epoch: 6   Global Step: 263260   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:46,992-Speed 2597.81 samples/sec   Loss 9.2239   LearningRate 0.0466   Epoch: 6   Global Step: 263270   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:50,903-Speed 2619.07 samples/sec   Loss 9.2950   LearningRate 0.0466   Epoch: 6   Global Step: 263280   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:54,822-Speed 2613.60 samples/sec   Loss 9.2554   LearningRate 0.0466   Epoch: 6   Global Step: 263290   Fp16 Grad Scale: 65536   Required: 64 hours
Training: 2022-04-14 01:18:58,716-Speed 2630.79 samples/sec   Loss 9.2371   LearningRate 0.0466   Epoch: 6   Global Step: 263300   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:02,621-Speed 2623.13 samples/sec   Loss 9.3310   LearningRate 0.0466   Epoch: 6   Global Step: 263310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:06,530-Speed 2620.03 samples/sec   Loss 9.2226   LearningRate 0.0466   Epoch: 6   Global Step: 263320   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:10,428-Speed 2627.42 samples/sec   Loss 9.2552   LearningRate 0.0466   Epoch: 6   Global Step: 263330   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:14,338-Speed 2619.62 samples/sec   Loss 9.1275   LearningRate 0.0466   Epoch: 6   Global Step: 263340   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:18,255-Speed 2614.25 samples/sec   Loss 9.2388   LearningRate 0.0466   Epoch: 6   Global Step: 263350   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:22,160-Speed 2623.58 samples/sec   Loss 9.3164   LearningRate 0.0466   Epoch: 6   Global Step: 263360   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:26,065-Speed 2623.12 samples/sec   Loss 9.1982   LearningRate 0.0466   Epoch: 6   Global Step: 263370   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:29,985-Speed 2612.35 samples/sec   Loss 9.2265   LearningRate 0.0466   Epoch: 6   Global Step: 263380   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:33,889-Speed 2623.62 samples/sec   Loss 9.3696   LearningRate 0.0466   Epoch: 6   Global Step: 263390   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:19:37,787-Speed 2627.71 samples/sec   Loss 9.2311   LearningRate 0.0466   Epoch: 6   Global Step: 263400   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:19:41,684-Speed 2628.17 samples/sec   Loss 9.1499   LearningRate 0.0466   Epoch: 6   Global Step: 263410   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:19:45,580-Speed 2629.34 samples/sec   Loss 9.3494   LearningRate 0.0466   Epoch: 6   Global Step: 263420   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:19:49,491-Speed 2618.87 samples/sec   Loss 9.2067   LearningRate 0.0466   Epoch: 6   Global Step: 263430   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:19:53,388-Speed 2628.01 samples/sec   Loss 9.1828   LearningRate 0.0466   Epoch: 6   Global Step: 263440   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:19:57,319-Speed 2605.50 samples/sec   Loss 9.1426   LearningRate 0.0466   Epoch: 6   Global Step: 263450   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:01,218-Speed 2627.24 samples/sec   Loss 9.0667   LearningRate 0.0466   Epoch: 6   Global Step: 263460   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:05,120-Speed 2624.68 samples/sec   Loss 9.2244   LearningRate 0.0466   Epoch: 6   Global Step: 263470   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:09,036-Speed 2616.42 samples/sec   Loss 9.1824   LearningRate 0.0466   Epoch: 6   Global Step: 263480   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:12,942-Speed 2621.79 samples/sec   Loss 9.2147   LearningRate 0.0466   Epoch: 6   Global Step: 263490   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:16,834-Speed 2632.43 samples/sec   Loss 9.1513   LearningRate 0.0466   Epoch: 6   Global Step: 263500   Fp16 Grad Scale: 524288   Required: 64 hours
Training: 2022-04-14 01:20:20,709-Speed 2642.66 samples/sec   Loss 9.1465   LearningRate 0.0466   Epoch: 6   Global Step: 263510   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:24,607-Speed 2628.44 samples/sec   Loss 9.0161   LearningRate 0.0466   Epoch: 6   Global Step: 263520   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:28,508-Speed 2625.53 samples/sec   Loss 9.1992   LearningRate 0.0466   Epoch: 6   Global Step: 263530   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:32,414-Speed 2621.57 samples/sec   Loss 9.2987   LearningRate 0.0466   Epoch: 6   Global Step: 263540   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:36,309-Speed 2629.59 samples/sec   Loss 9.2718   LearningRate 0.0466   Epoch: 6   Global Step: 263550   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:40,204-Speed 2630.16 samples/sec   Loss 9.1791   LearningRate 0.0466   Epoch: 6   Global Step: 263560   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:20:44,087-Speed 2638.09 samples/sec   Loss 9.1690   LearningRate 0.0466   Epoch: 6   Global Step: 263570   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:20:47,984-Speed 2628.24 samples/sec   Loss 9.2729   LearningRate 0.0465   Epoch: 6   Global Step: 263580   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:20:51,878-Speed 2630.43 samples/sec   Loss 9.2890   LearningRate 0.0465   Epoch: 6   Global Step: 263590   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:20:55,771-Speed 2630.34 samples/sec   Loss 9.0941   LearningRate 0.0465   Epoch: 6   Global Step: 263600   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:20:59,665-Speed 2630.36 samples/sec   Loss 9.1732   LearningRate 0.0465   Epoch: 6   Global Step: 263610   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:03,579-Speed 2616.74 samples/sec   Loss 9.1104   LearningRate 0.0465   Epoch: 6   Global Step: 263620   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:07,489-Speed 2619.63 samples/sec   Loss 9.1513   LearningRate 0.0465   Epoch: 6   Global Step: 263630   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:11,382-Speed 2630.60 samples/sec   Loss 9.1482   LearningRate 0.0465   Epoch: 6   Global Step: 263640   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:15,282-Speed 2627.17 samples/sec   Loss 9.1857   LearningRate 0.0465   Epoch: 6   Global Step: 263650   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:19,204-Speed 2611.49 samples/sec   Loss 9.2532   LearningRate 0.0465   Epoch: 6   Global Step: 263660   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:23,101-Speed 2628.41 samples/sec   Loss 9.1827   LearningRate 0.0465   Epoch: 6   Global Step: 263670   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:21:26,995-Speed 2630.83 samples/sec   Loss 9.2047   LearningRate 0.0465   Epoch: 6   Global Step: 263680   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:21:30,888-Speed 2631.14 samples/sec   Loss 9.1886   LearningRate 0.0465   Epoch: 6   Global Step: 263690   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:21:34,783-Speed 2629.17 samples/sec   Loss 9.2098   LearningRate 0.0465   Epoch: 6   Global Step: 263700   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:21:38,678-Speed 2629.63 samples/sec   Loss 9.1538   LearningRate 0.0465   Epoch: 6   Global Step: 263710   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:21:42,556-Speed 2641.26 samples/sec   Loss 9.2764   LearningRate 0.0465   Epoch: 6   Global Step: 263720   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:46,474-Speed 2614.20 samples/sec   Loss 9.1297   LearningRate 0.0465   Epoch: 6   Global Step: 263730   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:50,371-Speed 2628.60 samples/sec   Loss 9.2397   LearningRate 0.0465   Epoch: 6   Global Step: 263740   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:54,266-Speed 2629.43 samples/sec   Loss 9.2802   LearningRate 0.0465   Epoch: 6   Global Step: 263750   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:21:58,168-Speed 2625.17 samples/sec   Loss 9.2292   LearningRate 0.0465   Epoch: 6   Global Step: 263760   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:22:02,062-Speed 2630.14 samples/sec   Loss 9.1145   LearningRate 0.0465   Epoch: 6   Global Step: 263770   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:22:05,958-Speed 2628.96 samples/sec   Loss 9.2279   LearningRate 0.0465   Epoch: 6   Global Step: 263780   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:22:09,855-Speed 2628.64 samples/sec   Loss 9.1273   LearningRate 0.0465   Epoch: 6   Global Step: 263790   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:22:13,750-Speed 2629.62 samples/sec   Loss 9.3160   LearningRate 0.0465   Epoch: 6   Global Step: 263800   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:22:17,647-Speed 2628.12 samples/sec   Loss 9.2509   LearningRate 0.0465   Epoch: 6   Global Step: 263810   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:22:21,589-Speed 2601.57 samples/sec   Loss 9.1596   LearningRate 0.0465   Epoch: 6   Global Step: 263820   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:25,513-Speed 2610.71 samples/sec   Loss 9.2304   LearningRate 0.0465   Epoch: 6   Global Step: 263830   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:29,408-Speed 2629.53 samples/sec   Loss 9.1339   LearningRate 0.0465   Epoch: 6   Global Step: 263840   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:33,336-Speed 2607.33 samples/sec   Loss 9.3021   LearningRate 0.0465   Epoch: 6   Global Step: 263850   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:37,237-Speed 2625.87 samples/sec   Loss 9.2391   LearningRate 0.0465   Epoch: 6   Global Step: 263860   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:41,128-Speed 2632.26 samples/sec   Loss 9.0957   LearningRate 0.0465   Epoch: 6   Global Step: 263870   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:45,019-Speed 2632.52 samples/sec   Loss 9.2584   LearningRate 0.0465   Epoch: 6   Global Step: 263880   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:48,913-Speed 2630.45 samples/sec   Loss 9.1363   LearningRate 0.0465   Epoch: 6   Global Step: 263890   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:52,809-Speed 2628.73 samples/sec   Loss 9.1756   LearningRate 0.0465   Epoch: 6   Global Step: 263900   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:22:56,709-Speed 2626.85 samples/sec   Loss 9.2217   LearningRate 0.0465   Epoch: 6   Global Step: 263910   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:00,593-Speed 2637.42 samples/sec   Loss 9.3410   LearningRate 0.0465   Epoch: 6   Global Step: 263920   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:04,523-Speed 2605.79 samples/sec   Loss 9.0970   LearningRate 0.0465   Epoch: 6   Global Step: 263930   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:08,423-Speed 2626.20 samples/sec   Loss 9.0977   LearningRate 0.0465   Epoch: 6   Global Step: 263940   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:12,412-Speed 2568.33 samples/sec   Loss 9.1443   LearningRate 0.0465   Epoch: 6   Global Step: 263950   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:16,310-Speed 2627.50 samples/sec   Loss 9.0902   LearningRate 0.0465   Epoch: 6   Global Step: 263960   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:20,211-Speed 2625.43 samples/sec   Loss 9.2658   LearningRate 0.0465   Epoch: 6   Global Step: 263970   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:24,107-Speed 2629.00 samples/sec   Loss 9.0931   LearningRate 0.0465   Epoch: 6   Global Step: 263980   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:28,003-Speed 2629.29 samples/sec   Loss 9.2630   LearningRate 0.0465   Epoch: 6   Global Step: 263990   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:31,902-Speed 2627.01 samples/sec   Loss 9.2446   LearningRate 0.0465   Epoch: 6   Global Step: 264000   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:23:35,781-Speed 2640.44 samples/sec   Loss 9.3000   LearningRate 0.0465   Epoch: 6   Global Step: 264010   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:23:39,674-Speed 2630.87 samples/sec   Loss 9.2545   LearningRate 0.0465   Epoch: 6   Global Step: 264020   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:23:43,591-Speed 2614.48 samples/sec   Loss 9.2766   LearningRate 0.0465   Epoch: 6   Global Step: 264030   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:23:47,495-Speed 2623.68 samples/sec   Loss 9.1744   LearningRate 0.0465   Epoch: 6   Global Step: 264040   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:23:51,399-Speed 2623.49 samples/sec   Loss 9.2429   LearningRate 0.0465   Epoch: 6   Global Step: 264050   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:23:55,302-Speed 2624.38 samples/sec   Loss 9.1579   LearningRate 0.0465   Epoch: 6   Global Step: 264060   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:23:59,200-Speed 2627.75 samples/sec   Loss 9.1081   LearningRate 0.0465   Epoch: 6   Global Step: 264070   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:24:03,096-Speed 2628.97 samples/sec   Loss 9.1781   LearningRate 0.0465   Epoch: 6   Global Step: 264080   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:24:06,989-Speed 2631.02 samples/sec   Loss 9.0835   LearningRate 0.0465   Epoch: 6   Global Step: 264090   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:24:10,884-Speed 2629.32 samples/sec   Loss 9.3056   LearningRate 0.0465   Epoch: 6   Global Step: 264100   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:24:14,787-Speed 2624.38 samples/sec   Loss 9.0850   LearningRate 0.0465   Epoch: 6   Global Step: 264110   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:18,779-Speed 2565.37 samples/sec   Loss 9.2030   LearningRate 0.0465   Epoch: 6   Global Step: 264120   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:22,674-Speed 2629.72 samples/sec   Loss 9.1087   LearningRate 0.0465   Epoch: 6   Global Step: 264130   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:26,574-Speed 2626.17 samples/sec   Loss 9.1901   LearningRate 0.0465   Epoch: 6   Global Step: 264140   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:30,486-Speed 2618.66 samples/sec   Loss 9.3093   LearningRate 0.0465   Epoch: 6   Global Step: 264150   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:34,382-Speed 2628.78 samples/sec   Loss 9.2321   LearningRate 0.0465   Epoch: 6   Global Step: 264160   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:38,277-Speed 2629.99 samples/sec   Loss 9.2625   LearningRate 0.0465   Epoch: 6   Global Step: 264170   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:42,174-Speed 2628.16 samples/sec   Loss 9.1524   LearningRate 0.0465   Epoch: 6   Global Step: 264180   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:46,070-Speed 2629.40 samples/sec   Loss 9.2506   LearningRate 0.0464   Epoch: 6   Global Step: 264190   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:49,971-Speed 2625.14 samples/sec   Loss 9.1457   LearningRate 0.0464   Epoch: 6   Global Step: 264200   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:53,850-Speed 2640.79 samples/sec   Loss 9.2871   LearningRate 0.0464   Epoch: 6   Global Step: 264210   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:24:57,754-Speed 2623.54 samples/sec   Loss 9.1198   LearningRate 0.0464   Epoch: 6   Global Step: 264220   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:25:01,624-Speed 2646.69 samples/sec   Loss 9.1953   LearningRate 0.0464   Epoch: 6   Global Step: 264230   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:05,517-Speed 2630.35 samples/sec   Loss 9.1582   LearningRate 0.0464   Epoch: 6   Global Step: 264240   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:09,412-Speed 2630.07 samples/sec   Loss 9.2408   LearningRate 0.0464   Epoch: 6   Global Step: 264250   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:13,305-Speed 2631.08 samples/sec   Loss 9.3129   LearningRate 0.0464   Epoch: 6   Global Step: 264260   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:17,202-Speed 2628.74 samples/sec   Loss 9.2099   LearningRate 0.0464   Epoch: 6   Global Step: 264270   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:21,102-Speed 2626.00 samples/sec   Loss 9.2662   LearningRate 0.0464   Epoch: 6   Global Step: 264280   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:25,007-Speed 2623.54 samples/sec   Loss 9.1563   LearningRate 0.0464   Epoch: 6   Global Step: 264290   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:28,914-Speed 2621.72 samples/sec   Loss 9.2395   LearningRate 0.0464   Epoch: 6   Global Step: 264300   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:32,813-Speed 2626.48 samples/sec   Loss 9.0805   LearningRate 0.0464   Epoch: 6   Global Step: 264310   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:36,741-Speed 2607.30 samples/sec   Loss 9.3564   LearningRate 0.0464   Epoch: 6   Global Step: 264320   Fp16 Grad Scale: 131072   Required: 64 hours
Training: 2022-04-14 01:25:40,638-Speed 2628.80 samples/sec   Loss 9.1900   LearningRate 0.0464   Epoch: 6   Global Step: 264330   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:25:44,534-Speed 2628.87 samples/sec   Loss 9.2690   LearningRate 0.0464   Epoch: 6   Global Step: 264340   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:25:48,440-Speed 2622.46 samples/sec   Loss 9.0355   LearningRate 0.0464   Epoch: 6   Global Step: 264350   Fp16 Grad Scale: 262144   Required: 64 hours
Training: 2022-04-14 01:25:52,339-Speed 2627.14 samples/sec   Loss 9.0888   LearningRate 0.0464   Epoch: 6   Global Step: 264360   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:25:56,246-Speed 2620.90 samples/sec   Loss 9.2013   LearningRate 0.0464   Epoch: 6   Global Step: 264370   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:26:00,141-Speed 2629.81 samples/sec   Loss 9.2131   LearningRate 0.0464   Epoch: 6   Global Step: 264380   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:26:04,040-Speed 2627.37 samples/sec   Loss 9.2771   LearningRate 0.0464   Epoch: 6   Global Step: 264390   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:26:07,947-Speed 2621.34 samples/sec   Loss 9.1014   LearningRate 0.0464   Epoch: 6   Global Step: 264400   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:26:11,842-Speed 2629.72 samples/sec   Loss 9.2060   LearningRate 0.0464   Epoch: 6   Global Step: 264410   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:26:15,702-Speed 2653.37 samples/sec   Loss 9.2021   LearningRate 0.0464   Epoch: 6   Global Step: 264420   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:19,597-Speed 2629.72 samples/sec   Loss 9.0939   LearningRate 0.0464   Epoch: 6   Global Step: 264430   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:23,498-Speed 2625.78 samples/sec   Loss 9.2565   LearningRate 0.0464   Epoch: 6   Global Step: 264440   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:27,413-Speed 2615.70 samples/sec   Loss 9.3083   LearningRate 0.0464   Epoch: 6   Global Step: 264450   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:31,309-Speed 2628.74 samples/sec   Loss 9.1897   LearningRate 0.0464   Epoch: 6   Global Step: 264460   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:35,210-Speed 2626.21 samples/sec   Loss 9.2161   LearningRate 0.0464   Epoch: 6   Global Step: 264470   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:39,105-Speed 2629.75 samples/sec   Loss 9.2354   LearningRate 0.0464   Epoch: 6   Global Step: 264480   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:42,999-Speed 2629.70 samples/sec   Loss 9.2656   LearningRate 0.0464   Epoch: 6   Global Step: 264490   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:46,900-Speed 2626.19 samples/sec   Loss 9.0600   LearningRate 0.0464   Epoch: 6   Global Step: 264500   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:50,805-Speed 2622.31 samples/sec   Loss 9.0897   LearningRate 0.0464   Epoch: 6   Global Step: 264510   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:26:54,711-Speed 2623.03 samples/sec   Loss 9.0864   LearningRate 0.0464   Epoch: 6   Global Step: 264520   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:26:58,612-Speed 2625.63 samples/sec   Loss 9.1468   LearningRate 0.0464   Epoch: 6   Global Step: 264530   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:02,533-Speed 2611.64 samples/sec   Loss 9.2488   LearningRate 0.0464   Epoch: 6   Global Step: 264540   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:06,433-Speed 2626.06 samples/sec   Loss 9.1851   LearningRate 0.0464   Epoch: 6   Global Step: 264550   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:10,334-Speed 2625.79 samples/sec   Loss 9.1958   LearningRate 0.0464   Epoch: 6   Global Step: 264560   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:14,226-Speed 2632.07 samples/sec   Loss 9.2520   LearningRate 0.0464   Epoch: 6   Global Step: 264570   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:18,119-Speed 2631.02 samples/sec   Loss 9.0856   LearningRate 0.0464   Epoch: 6   Global Step: 264580   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:22,016-Speed 2628.18 samples/sec   Loss 9.1361   LearningRate 0.0464   Epoch: 6   Global Step: 264590   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:25,919-Speed 2624.44 samples/sec   Loss 9.1702   LearningRate 0.0464   Epoch: 6   Global Step: 264600   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:27:29,804-Speed 2636.32 samples/sec   Loss 9.1747   LearningRate 0.0464   Epoch: 6   Global Step: 264610   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:27:33,677-Speed 2644.37 samples/sec   Loss 9.2879   LearningRate 0.0464   Epoch: 6   Global Step: 264620   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:27:37,578-Speed 2625.38 samples/sec   Loss 9.1029   LearningRate 0.0464   Epoch: 6   Global Step: 264630   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:27:41,472-Speed 2630.06 samples/sec   Loss 9.2817   LearningRate 0.0464   Epoch: 6   Global Step: 264640   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:27:45,363-Speed 2632.58 samples/sec   Loss 9.3088   LearningRate 0.0464   Epoch: 6   Global Step: 264650   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:27:49,254-Speed 2632.21 samples/sec   Loss 9.2329   LearningRate 0.0464   Epoch: 6   Global Step: 264660   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:27:53,147-Speed 2631.50 samples/sec   Loss 9.1758   LearningRate 0.0464   Epoch: 6   Global Step: 264670   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:27:57,043-Speed 2629.05 samples/sec   Loss 9.1226   LearningRate 0.0464   Epoch: 6   Global Step: 264680   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:28:00,943-Speed 2625.71 samples/sec   Loss 9.2058   LearningRate 0.0464   Epoch: 6   Global Step: 264690   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:28:04,836-Speed 2631.04 samples/sec   Loss 9.1217   LearningRate 0.0464   Epoch: 6   Global Step: 264700   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:28:08,728-Speed 2631.60 samples/sec   Loss 9.1670   LearningRate 0.0464   Epoch: 6   Global Step: 264710   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:28:12,620-Speed 2631.62 samples/sec   Loss 9.1946   LearningRate 0.0464   Epoch: 6   Global Step: 264720   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:16,520-Speed 2626.50 samples/sec   Loss 9.2835   LearningRate 0.0464   Epoch: 6   Global Step: 264730   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:20,413-Speed 2630.91 samples/sec   Loss 9.1700   LearningRate 0.0464   Epoch: 6   Global Step: 264740   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:24,316-Speed 2624.53 samples/sec   Loss 9.1058   LearningRate 0.0464   Epoch: 6   Global Step: 264750   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:28,218-Speed 2624.48 samples/sec   Loss 9.2139   LearningRate 0.0464   Epoch: 6   Global Step: 264760   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:32,169-Speed 2592.97 samples/sec   Loss 9.1715   LearningRate 0.0464   Epoch: 6   Global Step: 264770   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:36,069-Speed 2625.92 samples/sec   Loss 9.1192   LearningRate 0.0464   Epoch: 6   Global Step: 264780   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:39,963-Speed 2630.56 samples/sec   Loss 9.2642   LearningRate 0.0464   Epoch: 6   Global Step: 264790   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:43,857-Speed 2630.16 samples/sec   Loss 9.1170   LearningRate 0.0463   Epoch: 6   Global Step: 264800   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:47,750-Speed 2631.02 samples/sec   Loss 9.2077   LearningRate 0.0463   Epoch: 6   Global Step: 264810   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:28:51,645-Speed 2629.56 samples/sec   Loss 9.1888   LearningRate 0.0463   Epoch: 6   Global Step: 264820   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:28:55,538-Speed 2630.55 samples/sec   Loss 9.2259   LearningRate 0.0463   Epoch: 6   Global Step: 264830   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:28:59,472-Speed 2603.63 samples/sec   Loss 9.1364   LearningRate 0.0463   Epoch: 6   Global Step: 264840   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:03,367-Speed 2629.65 samples/sec   Loss 9.2314   LearningRate 0.0463   Epoch: 6   Global Step: 264850   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:07,267-Speed 2626.08 samples/sec   Loss 9.2554   LearningRate 0.0463   Epoch: 6   Global Step: 264860   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:11,162-Speed 2629.87 samples/sec   Loss 9.1884   LearningRate 0.0463   Epoch: 6   Global Step: 264870   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:15,056-Speed 2630.05 samples/sec   Loss 9.0456   LearningRate 0.0463   Epoch: 6   Global Step: 264880   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:18,947-Speed 2632.68 samples/sec   Loss 9.1341   LearningRate 0.0463   Epoch: 6   Global Step: 264890   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:22,858-Speed 2618.56 samples/sec   Loss 9.1847   LearningRate 0.0463   Epoch: 6   Global Step: 264900   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:26,755-Speed 2628.37 samples/sec   Loss 9.2878   LearningRate 0.0463   Epoch: 6   Global Step: 264910   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:30,652-Speed 2628.22 samples/sec   Loss 9.2322   LearningRate 0.0463   Epoch: 6   Global Step: 264920   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:29:34,558-Speed 2622.27 samples/sec   Loss 9.1522   LearningRate 0.0463   Epoch: 6   Global Step: 264930   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:29:38,452-Speed 2630.00 samples/sec   Loss 9.1455   LearningRate 0.0463   Epoch: 6   Global Step: 264940   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:29:42,346-Speed 2630.15 samples/sec   Loss 9.2020   LearningRate 0.0463   Epoch: 6   Global Step: 264950   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:29:46,243-Speed 2629.00 samples/sec   Loss 9.3399   LearningRate 0.0463   Epoch: 6   Global Step: 264960   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:29:50,132-Speed 2633.33 samples/sec   Loss 9.1852   LearningRate 0.0463   Epoch: 6   Global Step: 264970   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:54,036-Speed 2623.86 samples/sec   Loss 9.2086   LearningRate 0.0463   Epoch: 6   Global Step: 264980   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:29:57,944-Speed 2621.08 samples/sec   Loss 9.2156   LearningRate 0.0463   Epoch: 6   Global Step: 264990   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:01,849-Speed 2622.23 samples/sec   Loss 9.1064   LearningRate 0.0463   Epoch: 6   Global Step: 265000   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:05,744-Speed 2629.83 samples/sec   Loss 9.1058   LearningRate 0.0463   Epoch: 6   Global Step: 265010   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:09,643-Speed 2626.45 samples/sec   Loss 9.1622   LearningRate 0.0463   Epoch: 6   Global Step: 265020   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:13,550-Speed 2621.87 samples/sec   Loss 9.2011   LearningRate 0.0463   Epoch: 6   Global Step: 265030   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:17,449-Speed 2626.86 samples/sec   Loss 9.1821   LearningRate 0.0463   Epoch: 6   Global Step: 265040   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:21,348-Speed 2627.00 samples/sec   Loss 9.3022   LearningRate 0.0463   Epoch: 6   Global Step: 265050   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:25,244-Speed 2629.34 samples/sec   Loss 9.3022   LearningRate 0.0463   Epoch: 6   Global Step: 265060   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:29,138-Speed 2629.71 samples/sec   Loss 9.1682   LearningRate 0.0463   Epoch: 6   Global Step: 265070   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:30:33,035-Speed 2628.12 samples/sec   Loss 9.1386   LearningRate 0.0463   Epoch: 6   Global Step: 265080   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:30:36,953-Speed 2613.96 samples/sec   Loss 9.2544   LearningRate 0.0463   Epoch: 6   Global Step: 265090   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:30:40,853-Speed 2626.76 samples/sec   Loss 9.2569   LearningRate 0.0463   Epoch: 6   Global Step: 265100   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:30:44,756-Speed 2624.44 samples/sec   Loss 9.1780   LearningRate 0.0463   Epoch: 6   Global Step: 265110   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:30:48,808-Speed 2527.11 samples/sec   Loss 9.1205   LearningRate 0.0463   Epoch: 6   Global Step: 265120   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:30:52,699-Speed 2633.01 samples/sec   Loss 9.3194   LearningRate 0.0463   Epoch: 6   Global Step: 265130   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:30:56,653-Speed 2589.90 samples/sec   Loss 9.2631   LearningRate 0.0463   Epoch: 6   Global Step: 265140   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:00,548-Speed 2630.06 samples/sec   Loss 9.1085   LearningRate 0.0463   Epoch: 6   Global Step: 265150   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:04,446-Speed 2627.62 samples/sec   Loss 9.1874   LearningRate 0.0463   Epoch: 6   Global Step: 265160   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:08,345-Speed 2626.94 samples/sec   Loss 9.2438   LearningRate 0.0463   Epoch: 6   Global Step: 265170   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:12,246-Speed 2625.07 samples/sec   Loss 9.2611   LearningRate 0.0463   Epoch: 6   Global Step: 265180   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:16,144-Speed 2628.15 samples/sec   Loss 9.1815   LearningRate 0.0463   Epoch: 6   Global Step: 265190   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:20,046-Speed 2624.58 samples/sec   Loss 9.1817   LearningRate 0.0463   Epoch: 6   Global Step: 265200   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:23,950-Speed 2623.31 samples/sec   Loss 9.1011   LearningRate 0.0463   Epoch: 6   Global Step: 265210   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:27,849-Speed 2626.95 samples/sec   Loss 9.0888   LearningRate 0.0463   Epoch: 6   Global Step: 265220   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:31,746-Speed 2628.77 samples/sec   Loss 9.0715   LearningRate 0.0463   Epoch: 6   Global Step: 265230   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:31:35,642-Speed 2628.70 samples/sec   Loss 9.1624   LearningRate 0.0463   Epoch: 6   Global Step: 265240   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:31:39,545-Speed 2624.13 samples/sec   Loss 9.1057   LearningRate 0.0463   Epoch: 6   Global Step: 265250   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:31:43,435-Speed 2633.15 samples/sec   Loss 9.1433   LearningRate 0.0463   Epoch: 6   Global Step: 265260   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:31:47,311-Speed 2642.73 samples/sec   Loss 9.1483   LearningRate 0.0463   Epoch: 6   Global Step: 265270   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:51,206-Speed 2629.36 samples/sec   Loss 9.1457   LearningRate 0.0463   Epoch: 6   Global Step: 265280   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:55,106-Speed 2626.41 samples/sec   Loss 9.1816   LearningRate 0.0463   Epoch: 6   Global Step: 265290   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:31:59,005-Speed 2626.75 samples/sec   Loss 9.1957   LearningRate 0.0463   Epoch: 6   Global Step: 265300   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:02,903-Speed 2627.46 samples/sec   Loss 9.3195   LearningRate 0.0463   Epoch: 6   Global Step: 265310   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:06,801-Speed 2627.62 samples/sec   Loss 9.3575   LearningRate 0.0463   Epoch: 6   Global Step: 265320   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:10,694-Speed 2630.87 samples/sec   Loss 9.2072   LearningRate 0.0463   Epoch: 6   Global Step: 265330   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:14,590-Speed 2631.57 samples/sec   Loss 9.1894   LearningRate 0.0463   Epoch: 6   Global Step: 265340   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:18,485-Speed 2629.62 samples/sec   Loss 9.1208   LearningRate 0.0463   Epoch: 6   Global Step: 265350   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:22,380-Speed 2629.58 samples/sec   Loss 9.0974   LearningRate 0.0463   Epoch: 6   Global Step: 265360   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:26,277-Speed 2628.42 samples/sec   Loss 9.0885   LearningRate 0.0463   Epoch: 6   Global Step: 265370   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:32:30,176-Speed 2626.59 samples/sec   Loss 9.1062   LearningRate 0.0463   Epoch: 6   Global Step: 265380   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:32:34,068-Speed 2631.56 samples/sec   Loss 9.1701   LearningRate 0.0463   Epoch: 6   Global Step: 265390   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:32:37,961-Speed 2630.81 samples/sec   Loss 9.2199   LearningRate 0.0463   Epoch: 6   Global Step: 265400   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:41,856-Speed 2630.22 samples/sec   Loss 9.0588   LearningRate 0.0462   Epoch: 6   Global Step: 265410   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:45,749-Speed 2631.24 samples/sec   Loss 9.1332   LearningRate 0.0462   Epoch: 6   Global Step: 265420   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:49,648-Speed 2626.32 samples/sec   Loss 9.1749   LearningRate 0.0462   Epoch: 6   Global Step: 265430   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:53,549-Speed 2626.13 samples/sec   Loss 9.2501   LearningRate 0.0462   Epoch: 6   Global Step: 265440   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:32:57,444-Speed 2629.17 samples/sec   Loss 9.1865   LearningRate 0.0462   Epoch: 6   Global Step: 265450   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:01,342-Speed 2627.19 samples/sec   Loss 9.0980   LearningRate 0.0462   Epoch: 6   Global Step: 265460   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:05,243-Speed 2625.61 samples/sec   Loss 9.1739   LearningRate 0.0462   Epoch: 6   Global Step: 265470   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:09,145-Speed 2625.20 samples/sec   Loss 9.0882   LearningRate 0.0462   Epoch: 6   Global Step: 265480   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:13,042-Speed 2627.73 samples/sec   Loss 9.1090   LearningRate 0.0462   Epoch: 6   Global Step: 265490   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:16,940-Speed 2627.86 samples/sec   Loss 9.1197   LearningRate 0.0462   Epoch: 6   Global Step: 265500   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:33:20,820-Speed 2639.83 samples/sec   Loss 9.1602   LearningRate 0.0462   Epoch: 6   Global Step: 265510   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:24,716-Speed 2629.27 samples/sec   Loss 9.0818   LearningRate 0.0462   Epoch: 6   Global Step: 265520   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:28,610-Speed 2630.45 samples/sec   Loss 9.2642   LearningRate 0.0462   Epoch: 6   Global Step: 265530   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:32,509-Speed 2626.57 samples/sec   Loss 9.2282   LearningRate 0.0462   Epoch: 6   Global Step: 265540   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:36,409-Speed 2626.21 samples/sec   Loss 9.1838   LearningRate 0.0462   Epoch: 6   Global Step: 265550   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:40,307-Speed 2627.59 samples/sec   Loss 9.1829   LearningRate 0.0462   Epoch: 6   Global Step: 265560   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:44,206-Speed 2626.88 samples/sec   Loss 9.1831   LearningRate 0.0462   Epoch: 6   Global Step: 265570   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:48,100-Speed 2629.79 samples/sec   Loss 9.1310   LearningRate 0.0462   Epoch: 6   Global Step: 265580   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:51,993-Speed 2631.54 samples/sec   Loss 9.1898   LearningRate 0.0462   Epoch: 6   Global Step: 265590   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:55,886-Speed 2630.30 samples/sec   Loss 9.0862   LearningRate 0.0462   Epoch: 6   Global Step: 265600   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:33:59,851-Speed 2583.89 samples/sec   Loss 9.1468   LearningRate 0.0462   Epoch: 6   Global Step: 265610   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:34:03,753-Speed 2624.78 samples/sec   Loss 9.1972   LearningRate 0.0462   Epoch: 6   Global Step: 265620   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:34:07,648-Speed 2629.62 samples/sec   Loss 9.0979   LearningRate 0.0462   Epoch: 6   Global Step: 265630   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:34:11,526-Speed 2641.00 samples/sec   Loss 9.2230   LearningRate 0.0462   Epoch: 6   Global Step: 265640   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:34:15,420-Speed 2630.34 samples/sec   Loss 9.1888   LearningRate 0.0462   Epoch: 6   Global Step: 265650   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:34:19,312-Speed 2631.38 samples/sec   Loss 9.1273   LearningRate 0.0462   Epoch: 6   Global Step: 265660   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:34:23,189-Speed 2642.26 samples/sec   Loss 9.2424   LearningRate 0.0462   Epoch: 6   Global Step: 265670   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:27,080-Speed 2631.99 samples/sec   Loss 9.2086   LearningRate 0.0462   Epoch: 6   Global Step: 265680   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:30,973-Speed 2630.63 samples/sec   Loss 9.1889   LearningRate 0.0462   Epoch: 6   Global Step: 265690   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:34,870-Speed 2628.69 samples/sec   Loss 9.2061   LearningRate 0.0462   Epoch: 6   Global Step: 265700   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:38,780-Speed 2619.30 samples/sec   Loss 9.0879   LearningRate 0.0462   Epoch: 6   Global Step: 265710   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:42,681-Speed 2625.30 samples/sec   Loss 9.0281   LearningRate 0.0462   Epoch: 6   Global Step: 265720   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:46,581-Speed 2626.85 samples/sec   Loss 9.1777   LearningRate 0.0462   Epoch: 6   Global Step: 265730   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:50,479-Speed 2627.31 samples/sec   Loss 9.1412   LearningRate 0.0462   Epoch: 6   Global Step: 265740   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:54,379-Speed 2626.41 samples/sec   Loss 9.1138   LearningRate 0.0462   Epoch: 6   Global Step: 265750   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:34:58,285-Speed 2621.90 samples/sec   Loss 9.0203   LearningRate 0.0462   Epoch: 6   Global Step: 265760   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:35:02,199-Speed 2616.71 samples/sec   Loss 9.1845   LearningRate 0.0462   Epoch: 6   Global Step: 265770   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:06,096-Speed 2628.39 samples/sec   Loss 9.1656   LearningRate 0.0462   Epoch: 6   Global Step: 265780   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:09,996-Speed 2626.38 samples/sec   Loss 9.3102   LearningRate 0.0462   Epoch: 6   Global Step: 265790   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:13,887-Speed 2632.43 samples/sec   Loss 9.2031   LearningRate 0.0462   Epoch: 6   Global Step: 265800   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:17,779-Speed 2631.73 samples/sec   Loss 9.1359   LearningRate 0.0462   Epoch: 6   Global Step: 265810   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:21,672-Speed 2630.71 samples/sec   Loss 9.0629   LearningRate 0.0462   Epoch: 6   Global Step: 265820   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:25,565-Speed 2630.99 samples/sec   Loss 9.1627   LearningRate 0.0462   Epoch: 6   Global Step: 265830   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:29,457-Speed 2631.65 samples/sec   Loss 9.0543   LearningRate 0.0462   Epoch: 6   Global Step: 265840   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:33,351-Speed 2630.67 samples/sec   Loss 9.0558   LearningRate 0.0462   Epoch: 6   Global Step: 265850   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:37,246-Speed 2629.00 samples/sec   Loss 9.1681   LearningRate 0.0462   Epoch: 6   Global Step: 265860   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:35:41,147-Speed 2625.89 samples/sec   Loss 9.1783   LearningRate 0.0462   Epoch: 6   Global Step: 265870   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:35:45,040-Speed 2631.10 samples/sec   Loss 9.0727   LearningRate 0.0462   Epoch: 6   Global Step: 265880   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:35:48,936-Speed 2628.83 samples/sec   Loss 9.1216   LearningRate 0.0462   Epoch: 6   Global Step: 265890   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:35:52,837-Speed 2625.47 samples/sec   Loss 9.1237   LearningRate 0.0462   Epoch: 6   Global Step: 265900   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:35:56,727-Speed 2633.35 samples/sec   Loss 9.1288   LearningRate 0.0462   Epoch: 6   Global Step: 265910   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:00,619-Speed 2631.60 samples/sec   Loss 9.2038   LearningRate 0.0462   Epoch: 6   Global Step: 265920   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:04,517-Speed 2627.20 samples/sec   Loss 9.1121   LearningRate 0.0462   Epoch: 6   Global Step: 265930   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:08,407-Speed 2633.41 samples/sec   Loss 9.1696   LearningRate 0.0462   Epoch: 6   Global Step: 265940   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:12,301-Speed 2629.66 samples/sec   Loss 9.2043   LearningRate 0.0462   Epoch: 6   Global Step: 265950   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:16,196-Speed 2630.25 samples/sec   Loss 9.1610   LearningRate 0.0462   Epoch: 6   Global Step: 265960   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:20,082-Speed 2635.25 samples/sec   Loss 9.1834   LearningRate 0.0462   Epoch: 6   Global Step: 265970   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:23,975-Speed 2631.06 samples/sec   Loss 9.1859   LearningRate 0.0462   Epoch: 6   Global Step: 265980   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:27,866-Speed 2631.92 samples/sec   Loss 9.1711   LearningRate 0.0462   Epoch: 6   Global Step: 265990   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:31,761-Speed 2630.49 samples/sec   Loss 9.2010   LearningRate 0.0462   Epoch: 6   Global Step: 266000   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:36:35,639-Speed 2641.42 samples/sec   Loss 9.2503   LearningRate 0.0462   Epoch: 6   Global Step: 266010   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:36:39,531-Speed 2631.19 samples/sec   Loss 9.2122   LearningRate 0.0461   Epoch: 6   Global Step: 266020   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:36:43,426-Speed 2629.61 samples/sec   Loss 9.1820   LearningRate 0.0461   Epoch: 6   Global Step: 266030   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:36:47,319-Speed 2631.32 samples/sec   Loss 9.0522   LearningRate 0.0461   Epoch: 6   Global Step: 266040   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:36:51,217-Speed 2627.46 samples/sec   Loss 9.0558   LearningRate 0.0461   Epoch: 6   Global Step: 266050   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:36:55,112-Speed 2629.73 samples/sec   Loss 9.0779   LearningRate 0.0461   Epoch: 6   Global Step: 266060   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:36:59,008-Speed 2628.17 samples/sec   Loss 9.1016   LearningRate 0.0461   Epoch: 6   Global Step: 266070   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:37:02,901-Speed 2631.01 samples/sec   Loss 9.3237   LearningRate 0.0461   Epoch: 6   Global Step: 266080   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:37:06,794-Speed 2631.54 samples/sec   Loss 9.1533   LearningRate 0.0461   Epoch: 6   Global Step: 266090   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:37:10,698-Speed 2623.76 samples/sec   Loss 9.1501   LearningRate 0.0461   Epoch: 6   Global Step: 266100   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:37:14,591-Speed 2630.72 samples/sec   Loss 9.1268   LearningRate 0.0461   Epoch: 6   Global Step: 266110   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:37:18,483-Speed 2631.87 samples/sec   Loss 9.2441   LearningRate 0.0461   Epoch: 6   Global Step: 266120   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:37:22,393-Speed 2618.96 samples/sec   Loss 9.1390   LearningRate 0.0461   Epoch: 6   Global Step: 266130   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:37:26,294-Speed 2626.11 samples/sec   Loss 9.0206   LearningRate 0.0461   Epoch: 6   Global Step: 266140   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:37:30,185-Speed 2631.83 samples/sec   Loss 9.1701   LearningRate 0.0461   Epoch: 6   Global Step: 266150   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:37:34,064-Speed 2640.73 samples/sec   Loss 9.1470   LearningRate 0.0461   Epoch: 6   Global Step: 266160   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:37:37,929-Speed 2649.44 samples/sec   Loss 9.3504   LearningRate 0.0461   Epoch: 6   Global Step: 266170   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:37:41,784-Speed 2657.75 samples/sec   Loss 10.3604   LearningRate 0.0461   Epoch: 6   Global Step: 266180   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:37:45,673-Speed 2634.58 samples/sec   Loss 9.4956   LearningRate 0.0461   Epoch: 6   Global Step: 266190   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:37:49,569-Speed 2628.88 samples/sec   Loss 9.3731   LearningRate 0.0461   Epoch: 6   Global Step: 266200   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:37:53,462-Speed 2630.80 samples/sec   Loss 9.1694   LearningRate 0.0461   Epoch: 6   Global Step: 266210   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:37:57,361-Speed 2627.01 samples/sec   Loss 9.3986   LearningRate 0.0461   Epoch: 6   Global Step: 266220   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:38:01,271-Speed 2619.46 samples/sec   Loss 9.3582   LearningRate 0.0461   Epoch: 6   Global Step: 266230   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:38:05,168-Speed 2627.87 samples/sec   Loss 9.3504   LearningRate 0.0461   Epoch: 6   Global Step: 266240   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:38:09,069-Speed 2625.89 samples/sec   Loss 9.2250   LearningRate 0.0461   Epoch: 6   Global Step: 266250   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:38:12,958-Speed 2633.26 samples/sec   Loss 9.2518   LearningRate 0.0461   Epoch: 6   Global Step: 266260   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:38:16,844-Speed 2635.99 samples/sec   Loss 9.2282   LearningRate 0.0461   Epoch: 6   Global Step: 266270   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:38:20,737-Speed 2631.62 samples/sec   Loss 9.0972   LearningRate 0.0461   Epoch: 6   Global Step: 266280   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:24,628-Speed 2632.12 samples/sec   Loss 9.2932   LearningRate 0.0461   Epoch: 6   Global Step: 266290   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:28,517-Speed 2633.42 samples/sec   Loss 9.0843   LearningRate 0.0461   Epoch: 6   Global Step: 266300   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:32,409-Speed 2631.49 samples/sec   Loss 9.2862   LearningRate 0.0461   Epoch: 6   Global Step: 266310   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:36,313-Speed 2623.72 samples/sec   Loss 9.2598   LearningRate 0.0461   Epoch: 6   Global Step: 266320   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:40,240-Speed 2608.44 samples/sec   Loss 9.2813   LearningRate 0.0461   Epoch: 6   Global Step: 266330   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:44,219-Speed 2574.55 samples/sec   Loss 9.1196   LearningRate 0.0461   Epoch: 6   Global Step: 266340   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:48,110-Speed 2631.99 samples/sec   Loss 9.3377   LearningRate 0.0461   Epoch: 6   Global Step: 266350   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:52,016-Speed 2622.80 samples/sec   Loss 9.2073   LearningRate 0.0461   Epoch: 6   Global Step: 266360   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:55,905-Speed 2633.78 samples/sec   Loss 9.7803   LearningRate 0.0461   Epoch: 6   Global Step: 266370   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:38:59,804-Speed 2627.32 samples/sec   Loss 9.3257   LearningRate 0.0461   Epoch: 6   Global Step: 266380   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:03,696-Speed 2631.55 samples/sec   Loss 9.3188   LearningRate 0.0461   Epoch: 6   Global Step: 266390   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:07,593-Speed 2628.28 samples/sec   Loss 9.1749   LearningRate 0.0461   Epoch: 6   Global Step: 266400   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:11,491-Speed 2627.60 samples/sec   Loss 9.3855   LearningRate 0.0461   Epoch: 6   Global Step: 266410   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:15,389-Speed 2628.18 samples/sec   Loss 9.2611   LearningRate 0.0461   Epoch: 6   Global Step: 266420   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:19,286-Speed 2628.41 samples/sec   Loss 9.2166   LearningRate 0.0461   Epoch: 6   Global Step: 266430   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:23,177-Speed 2632.44 samples/sec   Loss 9.1428   LearningRate 0.0461   Epoch: 6   Global Step: 266440   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:27,070-Speed 2630.77 samples/sec   Loss 9.0774   LearningRate 0.0461   Epoch: 6   Global Step: 266450   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:30,966-Speed 2629.19 samples/sec   Loss 9.0415   LearningRate 0.0461   Epoch: 6   Global Step: 266460   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:34,859-Speed 2630.48 samples/sec   Loss 9.2308   LearningRate 0.0461   Epoch: 6   Global Step: 266470   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:38,739-Speed 2640.17 samples/sec   Loss 9.1615   LearningRate 0.0461   Epoch: 6   Global Step: 266480   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:42,633-Speed 2630.00 samples/sec   Loss 9.0656   LearningRate 0.0461   Epoch: 6   Global Step: 266490   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:39:46,514-Speed 2639.43 samples/sec   Loss 9.4187   LearningRate 0.0461   Epoch: 6   Global Step: 266500   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:39:50,380-Speed 2649.33 samples/sec   Loss 9.2080   LearningRate 0.0461   Epoch: 6   Global Step: 266510   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:39:54,267-Speed 2635.46 samples/sec   Loss 9.5471   LearningRate 0.0461   Epoch: 6   Global Step: 266520   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:39:58,225-Speed 2587.83 samples/sec   Loss 9.3387   LearningRate 0.0461   Epoch: 6   Global Step: 266530   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:02,128-Speed 2624.13 samples/sec   Loss 9.2242   LearningRate 0.0461   Epoch: 6   Global Step: 266540   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:06,021-Speed 2631.24 samples/sec   Loss 9.2322   LearningRate 0.0461   Epoch: 6   Global Step: 266550   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:09,910-Speed 2633.86 samples/sec   Loss 9.2469   LearningRate 0.0461   Epoch: 6   Global Step: 266560   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:13,800-Speed 2632.79 samples/sec   Loss 9.2257   LearningRate 0.0461   Epoch: 6   Global Step: 266570   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:17,692-Speed 2631.53 samples/sec   Loss 9.2191   LearningRate 0.0461   Epoch: 6   Global Step: 266580   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:21,585-Speed 2631.62 samples/sec   Loss 9.0826   LearningRate 0.0461   Epoch: 6   Global Step: 266590   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:25,489-Speed 2623.43 samples/sec   Loss 9.1764   LearningRate 0.0461   Epoch: 6   Global Step: 266600   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:40:29,387-Speed 2627.65 samples/sec   Loss 9.2248   LearningRate 0.0461   Epoch: 6   Global Step: 266610   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:40:33,277-Speed 2633.38 samples/sec   Loss 9.0953   LearningRate 0.0461   Epoch: 6   Global Step: 266620   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:40:37,168-Speed 2632.25 samples/sec   Loss 9.1415   LearningRate 0.0460   Epoch: 6   Global Step: 266630   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:40:41,056-Speed 2633.76 samples/sec   Loss 9.2451   LearningRate 0.0460   Epoch: 6   Global Step: 266640   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:40:44,955-Speed 2627.52 samples/sec   Loss 9.1765   LearningRate 0.0460   Epoch: 6   Global Step: 266650   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:40:48,842-Speed 2634.92 samples/sec   Loss 9.1716   LearningRate 0.0460   Epoch: 6   Global Step: 266660   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:40:52,738-Speed 2629.08 samples/sec   Loss 9.0849   LearningRate 0.0460   Epoch: 6   Global Step: 266670   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:40:56,634-Speed 2629.10 samples/sec   Loss 9.2545   LearningRate 0.0460   Epoch: 6   Global Step: 266680   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:41:00,519-Speed 2636.31 samples/sec   Loss 9.4418   LearningRate 0.0460   Epoch: 6   Global Step: 266690   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:41:04,437-Speed 2614.15 samples/sec   Loss 9.2953   LearningRate 0.0460   Epoch: 6   Global Step: 266700   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:41:08,327-Speed 2633.12 samples/sec   Loss 9.2514   LearningRate 0.0460   Epoch: 6   Global Step: 266710   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:12,217-Speed 2633.24 samples/sec   Loss 9.2841   LearningRate 0.0460   Epoch: 6   Global Step: 266720   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:16,118-Speed 2625.22 samples/sec   Loss 9.4557   LearningRate 0.0460   Epoch: 6   Global Step: 266730   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:20,013-Speed 2630.08 samples/sec   Loss 9.2274   LearningRate 0.0460   Epoch: 6   Global Step: 266740   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:23,903-Speed 2633.24 samples/sec   Loss 9.2323   LearningRate 0.0460   Epoch: 6   Global Step: 266750   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:27,795-Speed 2631.28 samples/sec   Loss 9.0541   LearningRate 0.0460   Epoch: 6   Global Step: 266760   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:31,689-Speed 2630.99 samples/sec   Loss 9.1400   LearningRate 0.0460   Epoch: 6   Global Step: 266770   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:35,582-Speed 2630.66 samples/sec   Loss 9.1131   LearningRate 0.0460   Epoch: 6   Global Step: 266780   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:39,476-Speed 2629.74 samples/sec   Loss 9.1192   LearningRate 0.0460   Epoch: 6   Global Step: 266790   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:43,369-Speed 2631.12 samples/sec   Loss 9.1786   LearningRate 0.0460   Epoch: 6   Global Step: 266800   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:41:47,270-Speed 2625.53 samples/sec   Loss 9.1742   LearningRate 0.0460   Epoch: 6   Global Step: 266810   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:41:51,162-Speed 2632.07 samples/sec   Loss 9.1666   LearningRate 0.0460   Epoch: 6   Global Step: 266820   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:41:55,052-Speed 2632.72 samples/sec   Loss 9.0722   LearningRate 0.0460   Epoch: 6   Global Step: 266830   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:41:58,943-Speed 2632.81 samples/sec   Loss 9.1309   LearningRate 0.0460   Epoch: 6   Global Step: 266840   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:42:02,834-Speed 2631.99 samples/sec   Loss 9.1093   LearningRate 0.0460   Epoch: 6   Global Step: 266850   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:42:06,727-Speed 2631.02 samples/sec   Loss 9.2125   LearningRate 0.0460   Epoch: 6   Global Step: 266860   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:42:10,619-Speed 2631.16 samples/sec   Loss 9.1259   LearningRate 0.0460   Epoch: 6   Global Step: 266870   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:42:14,511-Speed 2632.08 samples/sec   Loss 9.2960   LearningRate 0.0460   Epoch: 6   Global Step: 266880   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:42:18,407-Speed 2628.86 samples/sec   Loss 9.2261   LearningRate 0.0460   Epoch: 6   Global Step: 266890   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:42:22,302-Speed 2629.73 samples/sec   Loss 9.1497   LearningRate 0.0460   Epoch: 6   Global Step: 266900   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:42:26,193-Speed 2631.93 samples/sec   Loss 9.1184   LearningRate 0.0460   Epoch: 6   Global Step: 266910   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:42:30,036-Speed 2665.17 samples/sec   Loss 10.5463   LearningRate 0.0460   Epoch: 6   Global Step: 266920   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:42:33,930-Speed 2630.50 samples/sec   Loss 10.1699   LearningRate 0.0460   Epoch: 6   Global Step: 266930   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:42:37,822-Speed 2631.99 samples/sec   Loss 9.4872   LearningRate 0.0460   Epoch: 6   Global Step: 266940   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:42:41,714-Speed 2631.49 samples/sec   Loss 9.2826   LearningRate 0.0460   Epoch: 6   Global Step: 266950   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:42:45,606-Speed 2632.12 samples/sec   Loss 9.3340   LearningRate 0.0460   Epoch: 6   Global Step: 266960   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:42:49,500-Speed 2630.10 samples/sec   Loss 9.3453   LearningRate 0.0460   Epoch: 6   Global Step: 266970   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:42:53,388-Speed 2633.97 samples/sec   Loss 9.2801   LearningRate 0.0460   Epoch: 6   Global Step: 266980   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:42:57,278-Speed 2632.98 samples/sec   Loss 9.1569   LearningRate 0.0460   Epoch: 6   Global Step: 266990   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:43:01,169-Speed 2632.23 samples/sec   Loss 9.3122   LearningRate 0.0460   Epoch: 6   Global Step: 267000   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:43:05,061-Speed 2631.25 samples/sec   Loss 9.0867   LearningRate 0.0460   Epoch: 6   Global Step: 267010   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:43:08,951-Speed 2633.26 samples/sec   Loss 9.2511   LearningRate 0.0460   Epoch: 6   Global Step: 267020   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:12,842-Speed 2631.94 samples/sec   Loss 9.1416   LearningRate 0.0460   Epoch: 6   Global Step: 267030   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:16,733-Speed 2632.97 samples/sec   Loss 9.0925   LearningRate 0.0460   Epoch: 6   Global Step: 267040   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:20,626-Speed 2631.63 samples/sec   Loss 9.1270   LearningRate 0.0460   Epoch: 6   Global Step: 267050   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:24,516-Speed 2632.69 samples/sec   Loss 9.1804   LearningRate 0.0460   Epoch: 6   Global Step: 267060   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:28,406-Speed 2633.33 samples/sec   Loss 9.2760   LearningRate 0.0460   Epoch: 6   Global Step: 267070   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:32,300-Speed 2629.81 samples/sec   Loss 9.1959   LearningRate 0.0460   Epoch: 6   Global Step: 267080   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:36,216-Speed 2615.31 samples/sec   Loss 9.1730   LearningRate 0.0460   Epoch: 6   Global Step: 267090   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:40,109-Speed 2631.09 samples/sec   Loss 9.3232   LearningRate 0.0460   Epoch: 6   Global Step: 267100   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:44,005-Speed 2628.75 samples/sec   Loss 9.1902   LearningRate 0.0460   Epoch: 6   Global Step: 267110   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:43:47,901-Speed 2628.93 samples/sec   Loss 9.1487   LearningRate 0.0460   Epoch: 6   Global Step: 267120   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:43:51,790-Speed 2633.73 samples/sec   Loss 9.0621   LearningRate 0.0460   Epoch: 6   Global Step: 267130   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:43:55,681-Speed 2632.47 samples/sec   Loss 9.2116   LearningRate 0.0460   Epoch: 6   Global Step: 267140   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:43:59,585-Speed 2623.38 samples/sec   Loss 9.1523   LearningRate 0.0460   Epoch: 6   Global Step: 267150   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:44:03,477-Speed 2631.50 samples/sec   Loss 9.1569   LearningRate 0.0460   Epoch: 6   Global Step: 267160   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:44:07,373-Speed 2628.97 samples/sec   Loss 9.1786   LearningRate 0.0460   Epoch: 6   Global Step: 267170   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:44:11,264-Speed 2632.33 samples/sec   Loss 9.1109   LearningRate 0.0460   Epoch: 6   Global Step: 267180   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:44:15,154-Speed 2632.83 samples/sec   Loss 9.1845   LearningRate 0.0460   Epoch: 6   Global Step: 267190   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:44:19,048-Speed 2630.56 samples/sec   Loss 9.2540   LearningRate 0.0460   Epoch: 6   Global Step: 267200   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:44:22,942-Speed 2630.29 samples/sec   Loss 9.3528   LearningRate 0.0460   Epoch: 6   Global Step: 267210   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:44:26,836-Speed 2630.28 samples/sec   Loss 9.2025   LearningRate 0.0460   Epoch: 6   Global Step: 267220   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:44:30,731-Speed 2630.19 samples/sec   Loss 9.2620   LearningRate 0.0460   Epoch: 6   Global Step: 267230   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:44:34,630-Speed 2626.75 samples/sec   Loss 9.1044   LearningRate 0.0459   Epoch: 6   Global Step: 267240   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:44:38,489-Speed 2653.60 samples/sec   Loss 10.0610   LearningRate 0.0459   Epoch: 6   Global Step: 267250   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:44:42,374-Speed 2636.61 samples/sec   Loss 9.7661   LearningRate 0.0459   Epoch: 6   Global Step: 267260   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:44:46,265-Speed 2631.92 samples/sec   Loss 9.3829   LearningRate 0.0459   Epoch: 6   Global Step: 267270   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:44:50,162-Speed 2628.30 samples/sec   Loss 9.3004   LearningRate 0.0459   Epoch: 6   Global Step: 267280   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:44:54,050-Speed 2634.18 samples/sec   Loss 9.2730   LearningRate 0.0459   Epoch: 6   Global Step: 267290   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:44:57,952-Speed 2624.86 samples/sec   Loss 9.2176   LearningRate 0.0459   Epoch: 6   Global Step: 267300   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:45:01,855-Speed 2624.52 samples/sec   Loss 9.2507   LearningRate 0.0459   Epoch: 6   Global Step: 267310   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:45:05,744-Speed 2633.82 samples/sec   Loss 9.1882   LearningRate 0.0459   Epoch: 6   Global Step: 267320   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:45:09,639-Speed 2629.36 samples/sec   Loss 9.0132   LearningRate 0.0459   Epoch: 6   Global Step: 267330   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:45:13,529-Speed 2632.78 samples/sec   Loss 9.3156   LearningRate 0.0459   Epoch: 6   Global Step: 267340   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:45:17,425-Speed 2629.81 samples/sec   Loss 9.1435   LearningRate 0.0459   Epoch: 6   Global Step: 267350   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:21,317-Speed 2632.12 samples/sec   Loss 9.1580   LearningRate 0.0459   Epoch: 6   Global Step: 267360   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:25,209-Speed 2630.86 samples/sec   Loss 9.2676   LearningRate 0.0459   Epoch: 6   Global Step: 267370   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:29,103-Speed 2630.45 samples/sec   Loss 9.0773   LearningRate 0.0459   Epoch: 6   Global Step: 267380   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:33,004-Speed 2625.33 samples/sec   Loss 9.1655   LearningRate 0.0459   Epoch: 6   Global Step: 267390   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:36,895-Speed 2632.25 samples/sec   Loss 9.2141   LearningRate 0.0459   Epoch: 6   Global Step: 267400   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:40,796-Speed 2625.58 samples/sec   Loss 9.2807   LearningRate 0.0459   Epoch: 6   Global Step: 267410   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:44,768-Speed 2579.08 samples/sec   Loss 9.1256   LearningRate 0.0459   Epoch: 6   Global Step: 267420   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:48,675-Speed 2621.35 samples/sec   Loss 9.2097   LearningRate 0.0459   Epoch: 6   Global Step: 267430   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:52,574-Speed 2626.98 samples/sec   Loss 9.1410   LearningRate 0.0459   Epoch: 6   Global Step: 267440   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:45:56,470-Speed 2628.79 samples/sec   Loss 9.1573   LearningRate 0.0459   Epoch: 6   Global Step: 267450   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:00,362-Speed 2631.72 samples/sec   Loss 9.1304   LearningRate 0.0459   Epoch: 6   Global Step: 267460   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:04,258-Speed 2628.76 samples/sec   Loss 9.1206   LearningRate 0.0459   Epoch: 6   Global Step: 267470   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:08,153-Speed 2629.47 samples/sec   Loss 9.2798   LearningRate 0.0459   Epoch: 6   Global Step: 267480   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:12,053-Speed 2626.08 samples/sec   Loss 9.2727   LearningRate 0.0459   Epoch: 6   Global Step: 267490   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:15,947-Speed 2630.94 samples/sec   Loss 9.2133   LearningRate 0.0459   Epoch: 6   Global Step: 267500   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:19,838-Speed 2632.44 samples/sec   Loss 9.2090   LearningRate 0.0459   Epoch: 6   Global Step: 267510   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:23,732-Speed 2630.10 samples/sec   Loss 9.3152   LearningRate 0.0459   Epoch: 6   Global Step: 267520   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:27,626-Speed 2630.70 samples/sec   Loss 9.1938   LearningRate 0.0459   Epoch: 6   Global Step: 267530   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:31,519-Speed 2630.40 samples/sec   Loss 9.3048   LearningRate 0.0459   Epoch: 6   Global Step: 267540   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:46:35,411-Speed 2631.74 samples/sec   Loss 9.2459   LearningRate 0.0459   Epoch: 6   Global Step: 267550   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:46:39,315-Speed 2623.14 samples/sec   Loss 9.2494   LearningRate 0.0459   Epoch: 6   Global Step: 267560   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:46:43,208-Speed 2631.47 samples/sec   Loss 9.2269   LearningRate 0.0459   Epoch: 6   Global Step: 267570   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:46:47,102-Speed 2629.96 samples/sec   Loss 9.0547   LearningRate 0.0459   Epoch: 6   Global Step: 267580   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:46:51,002-Speed 2626.50 samples/sec   Loss 9.2585   LearningRate 0.0459   Epoch: 6   Global Step: 267590   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:46:54,890-Speed 2634.00 samples/sec   Loss 9.1076   LearningRate 0.0459   Epoch: 6   Global Step: 267600   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:46:58,784-Speed 2630.76 samples/sec   Loss 9.2237   LearningRate 0.0459   Epoch: 6   Global Step: 267610   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:47:02,677-Speed 2630.89 samples/sec   Loss 9.1706   LearningRate 0.0459   Epoch: 6   Global Step: 267620   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:47:06,574-Speed 2628.58 samples/sec   Loss 9.1382   LearningRate 0.0459   Epoch: 6   Global Step: 267630   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:47:10,475-Speed 2624.86 samples/sec   Loss 9.2575   LearningRate 0.0459   Epoch: 6   Global Step: 267640   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:47:14,375-Speed 2626.01 samples/sec   Loss 9.0491   LearningRate 0.0459   Epoch: 6   Global Step: 267650   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:18,266-Speed 2632.17 samples/sec   Loss 9.1341   LearningRate 0.0459   Epoch: 6   Global Step: 267660   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:22,155-Speed 2634.24 samples/sec   Loss 9.0575   LearningRate 0.0459   Epoch: 6   Global Step: 267670   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:26,060-Speed 2623.08 samples/sec   Loss 9.1250   LearningRate 0.0459   Epoch: 6   Global Step: 267680   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:29,959-Speed 2626.89 samples/sec   Loss 9.1191   LearningRate 0.0459   Epoch: 6   Global Step: 267690   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:33,851-Speed 2631.87 samples/sec   Loss 9.2159   LearningRate 0.0459   Epoch: 6   Global Step: 267700   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:37,744-Speed 2630.49 samples/sec   Loss 9.3464   LearningRate 0.0459   Epoch: 6   Global Step: 267710   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:41,635-Speed 2632.47 samples/sec   Loss 9.1230   LearningRate 0.0459   Epoch: 6   Global Step: 267720   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:45,540-Speed 2622.37 samples/sec   Loss 9.1680   LearningRate 0.0459   Epoch: 6   Global Step: 267730   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:49,432-Speed 2632.04 samples/sec   Loss 9.3370   LearningRate 0.0459   Epoch: 6   Global Step: 267740   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:53,309-Speed 2641.91 samples/sec   Loss 9.1873   LearningRate 0.0459   Epoch: 6   Global Step: 267750   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:47:57,182-Speed 2644.45 samples/sec   Loss 9.2112   LearningRate 0.0459   Epoch: 6   Global Step: 267760   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:01,072-Speed 2633.04 samples/sec   Loss 9.2474   LearningRate 0.0459   Epoch: 6   Global Step: 267770   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:04,984-Speed 2618.00 samples/sec   Loss 9.3296   LearningRate 0.0459   Epoch: 6   Global Step: 267780   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:08,879-Speed 2629.30 samples/sec   Loss 9.1859   LearningRate 0.0459   Epoch: 6   Global Step: 267790   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:12,771-Speed 2631.77 samples/sec   Loss 9.0392   LearningRate 0.0459   Epoch: 6   Global Step: 267800   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:16,674-Speed 2624.46 samples/sec   Loss 9.2447   LearningRate 0.0459   Epoch: 6   Global Step: 267810   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:20,566-Speed 2632.44 samples/sec   Loss 9.1000   LearningRate 0.0459   Epoch: 6   Global Step: 267820   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:24,455-Speed 2633.20 samples/sec   Loss 8.9644   LearningRate 0.0459   Epoch: 6   Global Step: 267830   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:28,348-Speed 2630.97 samples/sec   Loss 9.2387   LearningRate 0.0459   Epoch: 6   Global Step: 267840   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:48:32,248-Speed 2626.47 samples/sec   Loss 9.1431   LearningRate 0.0458   Epoch: 6   Global Step: 267850   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:48:36,132-Speed 2637.45 samples/sec   Loss 9.2797   LearningRate 0.0458   Epoch: 6   Global Step: 267860   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:48:40,022-Speed 2632.81 samples/sec   Loss 9.1682   LearningRate 0.0458   Epoch: 6   Global Step: 267870   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:48:43,912-Speed 2633.05 samples/sec   Loss 9.2243   LearningRate 0.0458   Epoch: 6   Global Step: 267880   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:48:47,812-Speed 2626.24 samples/sec   Loss 9.1913   LearningRate 0.0458   Epoch: 6   Global Step: 267890   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:48:51,704-Speed 2631.82 samples/sec   Loss 9.3623   LearningRate 0.0458   Epoch: 6   Global Step: 267900   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:48:55,597-Speed 2631.35 samples/sec   Loss 9.2358   LearningRate 0.0458   Epoch: 6   Global Step: 267910   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:48:59,485-Speed 2633.60 samples/sec   Loss 9.2960   LearningRate 0.0458   Epoch: 6   Global Step: 267920   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:49:03,377-Speed 2631.46 samples/sec   Loss 9.0651   LearningRate 0.0458   Epoch: 6   Global Step: 267930   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:49:07,269-Speed 2631.91 samples/sec   Loss 9.1021   LearningRate 0.0458   Epoch: 6   Global Step: 267940   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:49:11,158-Speed 2633.87 samples/sec   Loss 9.1856   LearningRate 0.0458   Epoch: 6   Global Step: 267950   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:49:14,989-Speed 2672.95 samples/sec   Loss 9.5729   LearningRate 0.0458   Epoch: 6   Global Step: 267960   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:18,877-Speed 2634.70 samples/sec   Loss 9.2156   LearningRate 0.0458   Epoch: 6   Global Step: 267970   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:22,763-Speed 2635.32 samples/sec   Loss 9.2090   LearningRate 0.0458   Epoch: 6   Global Step: 267980   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:26,661-Speed 2627.74 samples/sec   Loss 9.0292   LearningRate 0.0458   Epoch: 6   Global Step: 267990   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:30,559-Speed 2627.77 samples/sec   Loss 9.2254   LearningRate 0.0458   Epoch: 6   Global Step: 268000   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:34,453-Speed 2630.78 samples/sec   Loss 9.1225   LearningRate 0.0458   Epoch: 6   Global Step: 268010   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:38,348-Speed 2629.39 samples/sec   Loss 9.0253   LearningRate 0.0458   Epoch: 6   Global Step: 268020   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:42,239-Speed 2631.91 samples/sec   Loss 9.2111   LearningRate 0.0458   Epoch: 6   Global Step: 268030   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:46,131-Speed 2631.90 samples/sec   Loss 9.1300   LearningRate 0.0458   Epoch: 6   Global Step: 268040   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:50,024-Speed 2630.90 samples/sec   Loss 9.0211   LearningRate 0.0458   Epoch: 6   Global Step: 268050   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:49:53,924-Speed 2626.49 samples/sec   Loss 9.1664   LearningRate 0.0458   Epoch: 6   Global Step: 268060   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:49:57,812-Speed 2634.04 samples/sec   Loss 9.0721   LearningRate 0.0458   Epoch: 6   Global Step: 268070   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:01,705-Speed 2631.52 samples/sec   Loss 9.2144   LearningRate 0.0458   Epoch: 6   Global Step: 268080   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:05,594-Speed 2633.41 samples/sec   Loss 9.2032   LearningRate 0.0458   Epoch: 6   Global Step: 268090   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:09,487-Speed 2630.77 samples/sec   Loss 9.1543   LearningRate 0.0458   Epoch: 6   Global Step: 268100   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:13,380-Speed 2631.07 samples/sec   Loss 9.1297   LearningRate 0.0458   Epoch: 6   Global Step: 268110   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:17,273-Speed 2630.65 samples/sec   Loss 9.2631   LearningRate 0.0458   Epoch: 6   Global Step: 268120   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:21,175-Speed 2625.07 samples/sec   Loss 9.0455   LearningRate 0.0458   Epoch: 6   Global Step: 268130   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:25,087-Speed 2618.09 samples/sec   Loss 9.1653   LearningRate 0.0458   Epoch: 6   Global Step: 268140   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:28,988-Speed 2625.30 samples/sec   Loss 9.0919   LearningRate 0.0458   Epoch: 6   Global Step: 268150   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:50:32,889-Speed 2625.69 samples/sec   Loss 9.2184   LearningRate 0.0458   Epoch: 6   Global Step: 268160   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:50:36,790-Speed 2625.74 samples/sec   Loss 9.0628   LearningRate 0.0458   Epoch: 6   Global Step: 268170   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:50:40,692-Speed 2625.09 samples/sec   Loss 9.1862   LearningRate 0.0458   Epoch: 6   Global Step: 268180   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:50:44,590-Speed 2627.79 samples/sec   Loss 9.2036   LearningRate 0.0458   Epoch: 6   Global Step: 268190   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:50:48,502-Speed 2618.02 samples/sec   Loss 9.0463   LearningRate 0.0458   Epoch: 6   Global Step: 268200   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:50:52,417-Speed 2616.93 samples/sec   Loss 9.1768   LearningRate 0.0458   Epoch: 6   Global Step: 268210   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:50:56,316-Speed 2626.72 samples/sec   Loss 9.3072   LearningRate 0.0458   Epoch: 6   Global Step: 268220   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:51:00,215-Speed 2626.81 samples/sec   Loss 9.1510   LearningRate 0.0458   Epoch: 6   Global Step: 268230   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:51:04,110-Speed 2629.15 samples/sec   Loss 9.0620   LearningRate 0.0458   Epoch: 6   Global Step: 268240   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:51:08,012-Speed 2624.70 samples/sec   Loss 9.2038   LearningRate 0.0458   Epoch: 6   Global Step: 268250   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:51:11,912-Speed 2626.37 samples/sec   Loss 9.2141   LearningRate 0.0458   Epoch: 6   Global Step: 268260   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:15,809-Speed 2628.18 samples/sec   Loss 9.0850   LearningRate 0.0458   Epoch: 6   Global Step: 268270   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:19,709-Speed 2626.64 samples/sec   Loss 9.2841   LearningRate 0.0458   Epoch: 6   Global Step: 268280   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:23,602-Speed 2631.42 samples/sec   Loss 9.2963   LearningRate 0.0458   Epoch: 6   Global Step: 268290   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:27,494-Speed 2631.46 samples/sec   Loss 9.2716   LearningRate 0.0458   Epoch: 6   Global Step: 268300   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:31,390-Speed 2628.94 samples/sec   Loss 9.1219   LearningRate 0.0458   Epoch: 6   Global Step: 268310   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:35,279-Speed 2633.42 samples/sec   Loss 9.3093   LearningRate 0.0458   Epoch: 6   Global Step: 268320   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:39,171-Speed 2631.76 samples/sec   Loss 9.2101   LearningRate 0.0458   Epoch: 6   Global Step: 268330   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:43,068-Speed 2627.67 samples/sec   Loss 9.2050   LearningRate 0.0458   Epoch: 6   Global Step: 268340   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:46,975-Speed 2621.54 samples/sec   Loss 9.2335   LearningRate 0.0458   Epoch: 6   Global Step: 268350   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:51:50,868-Speed 2630.95 samples/sec   Loss 9.0954   LearningRate 0.0458   Epoch: 6   Global Step: 268360   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:51:54,763-Speed 2629.94 samples/sec   Loss 9.2823   LearningRate 0.0458   Epoch: 6   Global Step: 268370   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:51:58,668-Speed 2623.00 samples/sec   Loss 9.1728   LearningRate 0.0458   Epoch: 6   Global Step: 268380   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:02,561-Speed 2631.47 samples/sec   Loss 9.1748   LearningRate 0.0458   Epoch: 6   Global Step: 268390   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:06,481-Speed 2612.77 samples/sec   Loss 9.1087   LearningRate 0.0458   Epoch: 6   Global Step: 268400   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:10,581-Speed 2497.80 samples/sec   Loss 9.1412   LearningRate 0.0458   Epoch: 6   Global Step: 268410   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:14,674-Speed 2502.20 samples/sec   Loss 9.0897   LearningRate 0.0458   Epoch: 6   Global Step: 268420   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:18,773-Speed 2498.42 samples/sec   Loss 9.2223   LearningRate 0.0458   Epoch: 6   Global Step: 268430   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:22,667-Speed 2630.69 samples/sec   Loss 9.1864   LearningRate 0.0458   Epoch: 6   Global Step: 268440   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:26,560-Speed 2630.91 samples/sec   Loss 9.2008   LearningRate 0.0458   Epoch: 6   Global Step: 268450   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:52:30,452-Speed 2631.72 samples/sec   Loss 9.1448   LearningRate 0.0458   Epoch: 6   Global Step: 268460   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:52:34,341-Speed 2633.70 samples/sec   Loss 9.2019   LearningRate 0.0457   Epoch: 6   Global Step: 268470   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:52:38,249-Speed 2620.95 samples/sec   Loss 9.2118   LearningRate 0.0457   Epoch: 6   Global Step: 268480   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:52:42,140-Speed 2631.86 samples/sec   Loss 9.1443   LearningRate 0.0457   Epoch: 6   Global Step: 268490   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:52:46,033-Speed 2631.07 samples/sec   Loss 9.1033   LearningRate 0.0457   Epoch: 6   Global Step: 268500   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:52:49,929-Speed 2629.42 samples/sec   Loss 9.1153   LearningRate 0.0457   Epoch: 6   Global Step: 268510   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:52:53,826-Speed 2627.59 samples/sec   Loss 9.1183   LearningRate 0.0457   Epoch: 6   Global Step: 268520   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:52:57,757-Speed 2605.90 samples/sec   Loss 9.0094   LearningRate 0.0457   Epoch: 6   Global Step: 268530   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:01,649-Speed 2631.71 samples/sec   Loss 9.0682   LearningRate 0.0457   Epoch: 6   Global Step: 268540   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:05,543-Speed 2630.25 samples/sec   Loss 9.0503   LearningRate 0.0457   Epoch: 6   Global Step: 268550   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:09,420-Speed 2641.92 samples/sec   Loss 9.0889   LearningRate 0.0457   Epoch: 6   Global Step: 268560   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:13,317-Speed 2627.97 samples/sec   Loss 9.1871   LearningRate 0.0457   Epoch: 6   Global Step: 268570   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:17,214-Speed 2628.02 samples/sec   Loss 9.0346   LearningRate 0.0457   Epoch: 6   Global Step: 268580   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:21,113-Speed 2627.19 samples/sec   Loss 9.2283   LearningRate 0.0457   Epoch: 6   Global Step: 268590   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:25,010-Speed 2628.08 samples/sec   Loss 9.2022   LearningRate 0.0457   Epoch: 6   Global Step: 268600   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:28,921-Speed 2618.87 samples/sec   Loss 9.0251   LearningRate 0.0457   Epoch: 6   Global Step: 268610   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 01:53:32,809-Speed 2634.44 samples/sec   Loss 9.1621   LearningRate 0.0457   Epoch: 6   Global Step: 268620   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:53:36,722-Speed 2617.90 samples/sec   Loss 9.1180   LearningRate 0.0457   Epoch: 6   Global Step: 268630   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:53:40,619-Speed 2627.69 samples/sec   Loss 9.2702   LearningRate 0.0457   Epoch: 6   Global Step: 268640   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:53:44,674-Speed 2526.14 samples/sec   Loss 9.2492   LearningRate 0.0457   Epoch: 6   Global Step: 268650   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:53:48,725-Speed 2528.35 samples/sec   Loss 9.0661   LearningRate 0.0457   Epoch: 6   Global Step: 268660   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:53:52,525-Speed 2695.63 samples/sec   Loss 9.5631   LearningRate 0.0457   Epoch: 6   Global Step: 268670   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:53:56,389-Speed 2650.57 samples/sec   Loss 9.9581   LearningRate 0.0457   Epoch: 6   Global Step: 268680   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:00,276-Speed 2635.17 samples/sec   Loss 9.2565   LearningRate 0.0457   Epoch: 6   Global Step: 268690   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:04,173-Speed 2628.28 samples/sec   Loss 9.2040   LearningRate 0.0457   Epoch: 6   Global Step: 268700   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:08,071-Speed 2627.64 samples/sec   Loss 9.1258   LearningRate 0.0457   Epoch: 6   Global Step: 268710   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:11,968-Speed 2627.95 samples/sec   Loss 9.1247   LearningRate 0.0457   Epoch: 6   Global Step: 268720   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:15,868-Speed 2626.22 samples/sec   Loss 9.1598   LearningRate 0.0457   Epoch: 6   Global Step: 268730   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:19,768-Speed 2625.71 samples/sec   Loss 9.2367   LearningRate 0.0457   Epoch: 6   Global Step: 268740   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:23,666-Speed 2627.58 samples/sec   Loss 9.1420   LearningRate 0.0457   Epoch: 6   Global Step: 268750   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:27,559-Speed 2631.59 samples/sec   Loss 9.0880   LearningRate 0.0457   Epoch: 6   Global Step: 268760   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:31,447-Speed 2633.96 samples/sec   Loss 9.0544   LearningRate 0.0457   Epoch: 6   Global Step: 268770   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 01:54:35,334-Speed 2634.85 samples/sec   Loss 9.1698   LearningRate 0.0457   Epoch: 6   Global Step: 268780   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:54:39,225-Speed 2632.20 samples/sec   Loss 9.1856   LearningRate 0.0457   Epoch: 6   Global Step: 268790   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:54:43,130-Speed 2623.44 samples/sec   Loss 9.0475   LearningRate 0.0457   Epoch: 6   Global Step: 268800   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:54:47,021-Speed 2632.09 samples/sec   Loss 9.1468   LearningRate 0.0457   Epoch: 6   Global Step: 268810   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:54:50,919-Speed 2627.84 samples/sec   Loss 9.1548   LearningRate 0.0457   Epoch: 6   Global Step: 268820   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:54:54,811-Speed 2631.55 samples/sec   Loss 9.1372   LearningRate 0.0457   Epoch: 6   Global Step: 268830   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:54:58,697-Speed 2635.02 samples/sec   Loss 9.0984   LearningRate 0.0457   Epoch: 6   Global Step: 268840   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:55:02,588-Speed 2633.15 samples/sec   Loss 9.1668   LearningRate 0.0457   Epoch: 6   Global Step: 268850   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:55:06,497-Speed 2620.21 samples/sec   Loss 9.1635   LearningRate 0.0457   Epoch: 6   Global Step: 268860   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:55:10,387-Speed 2632.88 samples/sec   Loss 9.1835   LearningRate 0.0457   Epoch: 6   Global Step: 268870   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 01:55:14,275-Speed 2634.16 samples/sec   Loss 9.1107   LearningRate 0.0457   Epoch: 6   Global Step: 268880   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:18,181-Speed 2622.56 samples/sec   Loss 9.1757   LearningRate 0.0457   Epoch: 6   Global Step: 268890   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:22,074-Speed 2631.14 samples/sec   Loss 9.1194   LearningRate 0.0457   Epoch: 6   Global Step: 268900   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:25,963-Speed 2633.66 samples/sec   Loss 9.0952   LearningRate 0.0457   Epoch: 6   Global Step: 268910   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:29,856-Speed 2630.99 samples/sec   Loss 9.2855   LearningRate 0.0457   Epoch: 6   Global Step: 268920   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:33,746-Speed 2632.41 samples/sec   Loss 9.2702   LearningRate 0.0457   Epoch: 6   Global Step: 268930   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:37,635-Speed 2634.04 samples/sec   Loss 9.1805   LearningRate 0.0457   Epoch: 6   Global Step: 268940   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:41,525-Speed 2632.68 samples/sec   Loss 9.0697   LearningRate 0.0457   Epoch: 6   Global Step: 268950   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:45,418-Speed 2631.65 samples/sec   Loss 8.9927   LearningRate 0.0457   Epoch: 6   Global Step: 268960   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:49,306-Speed 2633.92 samples/sec   Loss 9.0862   LearningRate 0.0457   Epoch: 6   Global Step: 268970   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 01:55:53,194-Speed 2634.70 samples/sec   Loss 9.2092   LearningRate 0.0457   Epoch: 6   Global Step: 268980   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:55:57,086-Speed 2631.32 samples/sec   Loss 9.2305   LearningRate 0.0457   Epoch: 6   Global Step: 268990   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:00,996-Speed 2619.75 samples/sec   Loss 9.1049   LearningRate 0.0457   Epoch: 6   Global Step: 269000   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:04,887-Speed 2632.27 samples/sec   Loss 9.2217   LearningRate 0.0457   Epoch: 6   Global Step: 269010   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:08,780-Speed 2630.42 samples/sec   Loss 9.1229   LearningRate 0.0457   Epoch: 6   Global Step: 269020   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:12,669-Speed 2633.71 samples/sec   Loss 9.2245   LearningRate 0.0457   Epoch: 6   Global Step: 269030   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:16,559-Speed 2633.16 samples/sec   Loss 9.1123   LearningRate 0.0457   Epoch: 6   Global Step: 269040   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:20,447-Speed 2634.50 samples/sec   Loss 9.2719   LearningRate 0.0457   Epoch: 6   Global Step: 269050   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:24,335-Speed 2634.76 samples/sec   Loss 9.1591   LearningRate 0.0457   Epoch: 6   Global Step: 269060   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:28,224-Speed 2633.73 samples/sec   Loss 9.4100   LearningRate 0.0457   Epoch: 6   Global Step: 269070   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 01:56:32,125-Speed 2626.69 samples/sec   Loss 9.2763   LearningRate 0.0456   Epoch: 6   Global Step: 269080   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:56:36,020-Speed 2629.41 samples/sec   Loss 9.3571   LearningRate 0.0456   Epoch: 6   Global Step: 269090   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:56:39,905-Speed 2635.77 samples/sec   Loss 9.2778   LearningRate 0.0456   Epoch: 6   Global Step: 269100   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:56:43,797-Speed 2631.58 samples/sec   Loss 9.1069   LearningRate 0.0456   Epoch: 6   Global Step: 269110   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:56:47,709-Speed 2618.82 samples/sec   Loss 9.1829   LearningRate 0.0456   Epoch: 6   Global Step: 269120   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:56:51,731-Speed 2546.27 samples/sec   Loss 9.1981   LearningRate 0.0456   Epoch: 6   Global Step: 269130   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:56:55,624-Speed 2631.42 samples/sec   Loss 9.1931   LearningRate 0.0456   Epoch: 6   Global Step: 269140   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:56:59,529-Speed 2622.59 samples/sec   Loss 9.1725   LearningRate 0.0456   Epoch: 6   Global Step: 269150   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:03,434-Speed 2623.24 samples/sec   Loss 9.2072   LearningRate 0.0456   Epoch: 6   Global Step: 269160   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:07,334-Speed 2626.48 samples/sec   Loss 9.1949   LearningRate 0.0456   Epoch: 6   Global Step: 269170   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:11,209-Speed 2642.49 samples/sec   Loss 9.3185   LearningRate 0.0456   Epoch: 6   Global Step: 269180   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:15,103-Speed 2630.60 samples/sec   Loss 9.2522   LearningRate 0.0456   Epoch: 6   Global Step: 269190   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:18,998-Speed 2629.51 samples/sec   Loss 9.5083   LearningRate 0.0456   Epoch: 6   Global Step: 269200   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:22,897-Speed 2627.49 samples/sec   Loss 9.3266   LearningRate 0.0456   Epoch: 6   Global Step: 269210   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:26,788-Speed 2632.10 samples/sec   Loss 9.2419   LearningRate 0.0456   Epoch: 6   Global Step: 269220   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:30,680-Speed 2631.71 samples/sec   Loss 9.0598   LearningRate 0.0456   Epoch: 6   Global Step: 269230   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:34,569-Speed 2633.68 samples/sec   Loss 9.1555   LearningRate 0.0456   Epoch: 6   Global Step: 269240   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:38,483-Speed 2616.77 samples/sec   Loss 9.1251   LearningRate 0.0456   Epoch: 6   Global Step: 269250   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:42,541-Speed 2524.12 samples/sec   Loss 9.1182   LearningRate 0.0456   Epoch: 6   Global Step: 269260   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:46,615-Speed 2514.61 samples/sec   Loss 9.0999   LearningRate 0.0456   Epoch: 6   Global Step: 269270   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 01:57:50,532-Speed 2614.33 samples/sec   Loss 9.2130   LearningRate 0.0456   Epoch: 6   Global Step: 269280   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:57:54,446-Speed 2617.46 samples/sec   Loss 9.2605   LearningRate 0.0456   Epoch: 6   Global Step: 269290   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:57:58,338-Speed 2631.26 samples/sec   Loss 9.1481   LearningRate 0.0456   Epoch: 6   Global Step: 269300   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:02,234-Speed 2629.48 samples/sec   Loss 9.0933   LearningRate 0.0456   Epoch: 6   Global Step: 269310   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:06,125-Speed 2632.47 samples/sec   Loss 9.1332   LearningRate 0.0456   Epoch: 6   Global Step: 269320   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:10,019-Speed 2629.98 samples/sec   Loss 9.0286   LearningRate 0.0456   Epoch: 6   Global Step: 269330   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:13,917-Speed 2627.19 samples/sec   Loss 9.0254   LearningRate 0.0456   Epoch: 6   Global Step: 269340   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:17,813-Speed 2629.74 samples/sec   Loss 9.2052   LearningRate 0.0456   Epoch: 6   Global Step: 269350   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:21,726-Speed 2617.15 samples/sec   Loss 9.1417   LearningRate 0.0456   Epoch: 6   Global Step: 269360   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:25,667-Speed 2598.70 samples/sec   Loss 9.2790   LearningRate 0.0456   Epoch: 6   Global Step: 269370   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 01:58:29,562-Speed 2629.72 samples/sec   Loss 9.2202   LearningRate 0.0456   Epoch: 6   Global Step: 269380   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:58:33,459-Speed 2628.21 samples/sec   Loss 9.0683   LearningRate 0.0456   Epoch: 6   Global Step: 269390   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:58:37,351-Speed 2632.01 samples/sec   Loss 9.0703   LearningRate 0.0456   Epoch: 6   Global Step: 269400   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:58:41,245-Speed 2630.42 samples/sec   Loss 9.2846   LearningRate 0.0456   Epoch: 6   Global Step: 269410   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:58:45,144-Speed 2626.63 samples/sec   Loss 9.0753   LearningRate 0.0456   Epoch: 6   Global Step: 269420   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:58:49,035-Speed 2632.16 samples/sec   Loss 9.0994   LearningRate 0.0456   Epoch: 6   Global Step: 269430   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:58:52,926-Speed 2632.87 samples/sec   Loss 9.1568   LearningRate 0.0456   Epoch: 6   Global Step: 269440   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:58:56,817-Speed 2632.59 samples/sec   Loss 9.1248   LearningRate 0.0456   Epoch: 6   Global Step: 269450   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:59:00,709-Speed 2631.50 samples/sec   Loss 9.0883   LearningRate 0.0456   Epoch: 6   Global Step: 269460   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:59:04,756-Speed 2530.94 samples/sec   Loss 9.1651   LearningRate 0.0456   Epoch: 6   Global Step: 269470   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 01:59:08,675-Speed 2614.16 samples/sec   Loss 9.1019   LearningRate 0.0456   Epoch: 6   Global Step: 269480   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:12,605-Speed 2606.50 samples/sec   Loss 9.1191   LearningRate 0.0456   Epoch: 6   Global Step: 269490   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:16,497-Speed 2631.01 samples/sec   Loss 9.0802   LearningRate 0.0456   Epoch: 6   Global Step: 269500   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:20,399-Speed 2624.76 samples/sec   Loss 9.1310   LearningRate 0.0456   Epoch: 6   Global Step: 269510   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:24,293-Speed 2631.02 samples/sec   Loss 9.1247   LearningRate 0.0456   Epoch: 6   Global Step: 269520   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:28,183-Speed 2633.25 samples/sec   Loss 9.0995   LearningRate 0.0456   Epoch: 6   Global Step: 269530   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:32,101-Speed 2614.09 samples/sec   Loss 9.0800   LearningRate 0.0456   Epoch: 6   Global Step: 269540   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:35,996-Speed 2630.18 samples/sec   Loss 9.1482   LearningRate 0.0456   Epoch: 6   Global Step: 269550   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:39,891-Speed 2629.33 samples/sec   Loss 9.0423   LearningRate 0.0456   Epoch: 6   Global Step: 269560   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:43,784-Speed 2631.20 samples/sec   Loss 9.0153   LearningRate 0.0456   Epoch: 6   Global Step: 269570   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:47,658-Speed 2643.39 samples/sec   Loss 9.0404   LearningRate 0.0456   Epoch: 6   Global Step: 269580   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:51,549-Speed 2632.59 samples/sec   Loss 9.0595   LearningRate 0.0456   Epoch: 6   Global Step: 269590   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:55,440-Speed 2632.23 samples/sec   Loss 9.1197   LearningRate 0.0456   Epoch: 6   Global Step: 269600   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 01:59:59,333-Speed 2630.96 samples/sec   Loss 9.1980   LearningRate 0.0456   Epoch: 6   Global Step: 269610   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:00:03,245-Speed 2618.72 samples/sec   Loss 9.1376   LearningRate 0.0456   Epoch: 6   Global Step: 269620   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:00:07,149-Speed 2623.70 samples/sec   Loss 9.2206   LearningRate 0.0456   Epoch: 6   Global Step: 269630   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:00:11,046-Speed 2628.31 samples/sec   Loss 9.1564   LearningRate 0.0456   Epoch: 6   Global Step: 269640   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:00:14,938-Speed 2631.70 samples/sec   Loss 9.1459   LearningRate 0.0456   Epoch: 6   Global Step: 269650   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:00:18,834-Speed 2629.14 samples/sec   Loss 9.0089   LearningRate 0.0456   Epoch: 6   Global Step: 269660   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:00:22,735-Speed 2625.69 samples/sec   Loss 9.0946   LearningRate 0.0456   Epoch: 6   Global Step: 269670   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:00:26,621-Speed 2635.05 samples/sec   Loss 9.1567   LearningRate 0.0456   Epoch: 6   Global Step: 269680   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:00:30,519-Speed 2627.49 samples/sec   Loss 9.1374   LearningRate 0.0456   Epoch: 6   Global Step: 269690   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:00:34,438-Speed 2613.96 samples/sec   Loss 9.1618   LearningRate 0.0455   Epoch: 6   Global Step: 269700   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:00:38,340-Speed 2625.15 samples/sec   Loss 9.0348   LearningRate 0.0455   Epoch: 6   Global Step: 269710   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:00:42,207-Speed 2648.91 samples/sec   Loss 9.1594   LearningRate 0.0455   Epoch: 6   Global Step: 269720   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:00:46,110-Speed 2624.11 samples/sec   Loss 9.1597   LearningRate 0.0455   Epoch: 6   Global Step: 269730   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:00:50,013-Speed 2624.38 samples/sec   Loss 9.0662   LearningRate 0.0455   Epoch: 6   Global Step: 269740   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:00:53,916-Speed 2624.26 samples/sec   Loss 9.0704   LearningRate 0.0455   Epoch: 6   Global Step: 269750   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:00:57,817-Speed 2625.44 samples/sec   Loss 9.1675   LearningRate 0.0455   Epoch: 6   Global Step: 269760   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:01:01,721-Speed 2623.39 samples/sec   Loss 9.1074   LearningRate 0.0455   Epoch: 6   Global Step: 269770   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:01:05,621-Speed 2626.09 samples/sec   Loss 9.1198   LearningRate 0.0455   Epoch: 6   Global Step: 269780   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:01:09,522-Speed 2625.40 samples/sec   Loss 9.0081   LearningRate 0.0455   Epoch: 6   Global Step: 269790   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:01:13,423-Speed 2626.20 samples/sec   Loss 9.1608   LearningRate 0.0455   Epoch: 6   Global Step: 269800   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:01:17,305-Speed 2638.56 samples/sec   Loss 9.6563   LearningRate 0.0455   Epoch: 6   Global Step: 269810   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:01:21,214-Speed 2620.16 samples/sec   Loss 9.4555   LearningRate 0.0455   Epoch: 6   Global Step: 269820   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:01:25,130-Speed 2615.90 samples/sec   Loss 9.0788   LearningRate 0.0455   Epoch: 6   Global Step: 269830   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:01:29,020-Speed 2632.70 samples/sec   Loss 9.2156   LearningRate 0.0455   Epoch: 6   Global Step: 269840   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:01:32,880-Speed 2653.53 samples/sec   Loss 9.4486   LearningRate 0.0455   Epoch: 6   Global Step: 269850   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:01:36,761-Speed 2639.20 samples/sec   Loss 9.5067   LearningRate 0.0455   Epoch: 6   Global Step: 269860   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:01:40,655-Speed 2630.43 samples/sec   Loss 9.1704   LearningRate 0.0455   Epoch: 6   Global Step: 269870   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:01:44,552-Speed 2628.09 samples/sec   Loss 10.2979   LearningRate 0.0455   Epoch: 6   Global Step: 269880   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:01:48,490-Speed 2601.47 samples/sec   Loss 10.0212   LearningRate 0.0455   Epoch: 6   Global Step: 269890   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:01:52,397-Speed 2621.30 samples/sec   Loss 9.7023   LearningRate 0.0455   Epoch: 6   Global Step: 269900   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:01:56,281-Speed 2637.93 samples/sec   Loss 9.4155   LearningRate 0.0455   Epoch: 6   Global Step: 269910   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:02:00,167-Speed 2635.61 samples/sec   Loss 9.3079   LearningRate 0.0455   Epoch: 6   Global Step: 269920   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:02:04,055-Speed 2634.08 samples/sec   Loss 9.1971   LearningRate 0.0455   Epoch: 6   Global Step: 269930   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:02:07,942-Speed 2634.75 samples/sec   Loss 9.2992   LearningRate 0.0455   Epoch: 6   Global Step: 269940   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:02:11,832-Speed 2633.80 samples/sec   Loss 9.1423   LearningRate 0.0455   Epoch: 6   Global Step: 269950   Fp16 Grad Scale: 512   Required: 63 hours
Training: 2022-04-14 02:02:15,721-Speed 2633.62 samples/sec   Loss 9.1950   LearningRate 0.0455   Epoch: 6   Global Step: 269960   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:02:19,616-Speed 2630.18 samples/sec   Loss 9.1468   LearningRate 0.0455   Epoch: 6   Global Step: 269970   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:02:23,503-Speed 2635.03 samples/sec   Loss 9.1318   LearningRate 0.0455   Epoch: 6   Global Step: 269980   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:02:27,392-Speed 2633.66 samples/sec   Loss 9.1776   LearningRate 0.0455   Epoch: 6   Global Step: 269990   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:02:31,314-Speed 2611.69 samples/sec   Loss 9.1283   LearningRate 0.0455   Epoch: 6   Global Step: 270000   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:03:14,635-[lfw][270000]XNorm: 23.538378
Training: 2022-04-14 02:03:14,636-[lfw][270000]Accuracy-Flip: 0.99717+-0.00248
Training: 2022-04-14 02:03:14,636-[lfw][270000]Accuracy-Highest: 0.99783
Training: 2022-04-14 02:04:05,255-[cfp_fp][270000]XNorm: 21.506028
Training: 2022-04-14 02:04:05,256-[cfp_fp][270000]Accuracy-Flip: 0.98400+-0.00423
Training: 2022-04-14 02:04:05,257-[cfp_fp][270000]Accuracy-Highest: 0.98643
Training: 2022-04-14 02:04:48,760-[agedb_30][270000]XNorm: 23.227426
Training: 2022-04-14 02:04:48,761-[agedb_30][270000]Accuracy-Flip: 0.97367+-0.00552
Training: 2022-04-14 02:04:48,762-[agedb_30][270000]Accuracy-Highest: 0.97367
Training: 2022-04-14 02:04:52,633-Speed 72.46 samples/sec   Loss 9.0449   LearningRate 0.0455   Epoch: 6   Global Step: 270010   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:04:56,545-Speed 2618.26 samples/sec   Loss 9.1670   LearningRate 0.0455   Epoch: 6   Global Step: 270020   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:05:00,427-Speed 2639.24 samples/sec   Loss 9.0819   LearningRate 0.0455   Epoch: 6   Global Step: 270030   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:05:04,354-Speed 2608.66 samples/sec   Loss 9.1276   LearningRate 0.0455   Epoch: 6   Global Step: 270040   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:05:08,230-Speed 2643.20 samples/sec   Loss 9.2126   LearningRate 0.0455   Epoch: 6   Global Step: 270050   Fp16 Grad Scale: 1024   Required: 63 hours
Training: 2022-04-14 02:05:12,151-Speed 2611.57 samples/sec   Loss 8.9853   LearningRate 0.0455   Epoch: 6   Global Step: 270060   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:16,024-Speed 2645.35 samples/sec   Loss 9.1526   LearningRate 0.0455   Epoch: 6   Global Step: 270070   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:19,902-Speed 2641.02 samples/sec   Loss 9.1344   LearningRate 0.0455   Epoch: 6   Global Step: 270080   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:23,783-Speed 2638.85 samples/sec   Loss 9.2694   LearningRate 0.0455   Epoch: 6   Global Step: 270090   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:27,664-Speed 2639.81 samples/sec   Loss 9.0678   LearningRate 0.0455   Epoch: 6   Global Step: 270100   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:31,548-Speed 2636.96 samples/sec   Loss 9.0944   LearningRate 0.0455   Epoch: 6   Global Step: 270110   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:35,447-Speed 2627.14 samples/sec   Loss 9.1972   LearningRate 0.0455   Epoch: 6   Global Step: 270120   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:39,343-Speed 2628.91 samples/sec   Loss 9.1729   LearningRate 0.0455   Epoch: 6   Global Step: 270130   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:43,239-Speed 2629.23 samples/sec   Loss 9.3025   LearningRate 0.0455   Epoch: 6   Global Step: 270140   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:47,117-Speed 2641.51 samples/sec   Loss 9.0491   LearningRate 0.0455   Epoch: 6   Global Step: 270150   Fp16 Grad Scale: 2048   Required: 63 hours
Training: 2022-04-14 02:05:51,002-Speed 2636.25 samples/sec   Loss 9.2114   LearningRate 0.0455   Epoch: 6   Global Step: 270160   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:05:54,898-Speed 2629.10 samples/sec   Loss 9.1614   LearningRate 0.0455   Epoch: 6   Global Step: 270170   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:05:58,790-Speed 2632.06 samples/sec   Loss 9.2348   LearningRate 0.0455   Epoch: 6   Global Step: 270180   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:02,677-Speed 2635.01 samples/sec   Loss 9.2104   LearningRate 0.0455   Epoch: 6   Global Step: 270190   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:06,564-Speed 2635.31 samples/sec   Loss 9.1985   LearningRate 0.0455   Epoch: 6   Global Step: 270200   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:10,455-Speed 2632.32 samples/sec   Loss 9.0861   LearningRate 0.0455   Epoch: 6   Global Step: 270210   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:14,346-Speed 2632.95 samples/sec   Loss 9.0845   LearningRate 0.0455   Epoch: 6   Global Step: 270220   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:18,243-Speed 2628.60 samples/sec   Loss 9.1891   LearningRate 0.0455   Epoch: 6   Global Step: 270230   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:22,142-Speed 2627.07 samples/sec   Loss 9.0076   LearningRate 0.0455   Epoch: 6   Global Step: 270240   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:26,035-Speed 2631.27 samples/sec   Loss 9.2130   LearningRate 0.0455   Epoch: 6   Global Step: 270250   Fp16 Grad Scale: 4096   Required: 63 hours
Training: 2022-04-14 02:06:29,924-Speed 2633.07 samples/sec   Loss 9.1656   LearningRate 0.0455   Epoch: 6   Global Step: 270260   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:06:33,820-Speed 2629.11 samples/sec   Loss 9.0832   LearningRate 0.0455   Epoch: 6   Global Step: 270270   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:06:37,718-Speed 2627.88 samples/sec   Loss 9.0835   LearningRate 0.0455   Epoch: 6   Global Step: 270280   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:06:41,617-Speed 2627.27 samples/sec   Loss 8.9163   LearningRate 0.0455   Epoch: 6   Global Step: 270290   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:06:45,520-Speed 2623.94 samples/sec   Loss 9.2278   LearningRate 0.0455   Epoch: 6   Global Step: 270300   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:06:49,425-Speed 2622.77 samples/sec   Loss 9.2327   LearningRate 0.0454   Epoch: 6   Global Step: 270310   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:06:53,349-Speed 2610.46 samples/sec   Loss 9.0614   LearningRate 0.0454   Epoch: 6   Global Step: 270320   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:06:57,262-Speed 2618.10 samples/sec   Loss 9.1127   LearningRate 0.0454   Epoch: 6   Global Step: 270330   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:07:01,149-Speed 2634.85 samples/sec   Loss 8.9733   LearningRate 0.0454   Epoch: 6   Global Step: 270340   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:07:05,152-Speed 2558.84 samples/sec   Loss 9.1382   LearningRate 0.0454   Epoch: 6   Global Step: 270350   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:07:09,217-Speed 2519.89 samples/sec   Loss 9.0729   LearningRate 0.0454   Epoch: 6   Global Step: 270360   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:13,958-Speed 2160.38 samples/sec   Loss 9.2538   LearningRate 0.0454   Epoch: 6   Global Step: 270370   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:17,862-Speed 2622.89 samples/sec   Loss 9.1760   LearningRate 0.0454   Epoch: 6   Global Step: 270380   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:21,766-Speed 2624.19 samples/sec   Loss 9.1868   LearningRate 0.0454   Epoch: 6   Global Step: 270390   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:25,665-Speed 2626.79 samples/sec   Loss 9.1874   LearningRate 0.0454   Epoch: 6   Global Step: 270400   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:29,561-Speed 2629.79 samples/sec   Loss 9.1103   LearningRate 0.0454   Epoch: 6   Global Step: 270410   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:33,454-Speed 2630.81 samples/sec   Loss 9.1034   LearningRate 0.0454   Epoch: 6   Global Step: 270420   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:37,343-Speed 2633.57 samples/sec   Loss 9.2100   LearningRate 0.0454   Epoch: 6   Global Step: 270430   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:41,230-Speed 2634.65 samples/sec   Loss 9.0315   LearningRate 0.0454   Epoch: 6   Global Step: 270440   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:45,124-Speed 2630.50 samples/sec   Loss 9.1871   LearningRate 0.0454   Epoch: 6   Global Step: 270450   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:07:49,024-Speed 2626.70 samples/sec   Loss 9.2010   LearningRate 0.0454   Epoch: 6   Global Step: 270460   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:07:52,915-Speed 2632.60 samples/sec   Loss 9.1460   LearningRate 0.0454   Epoch: 6   Global Step: 270470   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:07:56,801-Speed 2635.68 samples/sec   Loss 9.1486   LearningRate 0.0454   Epoch: 6   Global Step: 270480   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:00,687-Speed 2635.65 samples/sec   Loss 9.1179   LearningRate 0.0454   Epoch: 6   Global Step: 270490   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:04,576-Speed 2633.32 samples/sec   Loss 9.1230   LearningRate 0.0454   Epoch: 6   Global Step: 270500   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:08,486-Speed 2619.52 samples/sec   Loss 9.0775   LearningRate 0.0454   Epoch: 6   Global Step: 270510   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:12,414-Speed 2607.77 samples/sec   Loss 9.3501   LearningRate 0.0454   Epoch: 6   Global Step: 270520   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:16,313-Speed 2626.39 samples/sec   Loss 9.0490   LearningRate 0.0454   Epoch: 6   Global Step: 270530   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:20,214-Speed 2625.96 samples/sec   Loss 9.1464   LearningRate 0.0454   Epoch: 6   Global Step: 270540   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:24,105-Speed 2632.64 samples/sec   Loss 9.2749   LearningRate 0.0454   Epoch: 6   Global Step: 270550   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:08:28,017-Speed 2618.24 samples/sec   Loss 9.1247   LearningRate 0.0454   Epoch: 6   Global Step: 270560   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:31,909-Speed 2631.90 samples/sec   Loss 9.0588   LearningRate 0.0454   Epoch: 6   Global Step: 270570   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:35,808-Speed 2626.68 samples/sec   Loss 9.1182   LearningRate 0.0454   Epoch: 6   Global Step: 270580   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:39,731-Speed 2611.20 samples/sec   Loss 9.2150   LearningRate 0.0454   Epoch: 6   Global Step: 270590   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:43,625-Speed 2630.55 samples/sec   Loss 9.0785   LearningRate 0.0454   Epoch: 6   Global Step: 270600   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:47,516-Speed 2631.92 samples/sec   Loss 9.2137   LearningRate 0.0454   Epoch: 6   Global Step: 270610   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:51,404-Speed 2634.44 samples/sec   Loss 9.0487   LearningRate 0.0454   Epoch: 6   Global Step: 270620   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:55,297-Speed 2631.64 samples/sec   Loss 8.9464   LearningRate 0.0454   Epoch: 6   Global Step: 270630   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:08:59,192-Speed 2630.11 samples/sec   Loss 9.1150   LearningRate 0.0454   Epoch: 6   Global Step: 270640   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:03,082-Speed 2632.39 samples/sec   Loss 9.0587   LearningRate 0.0454   Epoch: 6   Global Step: 270650   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:06,982-Speed 2626.27 samples/sec   Loss 9.1351   LearningRate 0.0454   Epoch: 6   Global Step: 270660   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:09:10,877-Speed 2629.53 samples/sec   Loss 9.1047   LearningRate 0.0454   Epoch: 6   Global Step: 270670   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:09:14,770-Speed 2635.76 samples/sec   Loss 9.1351   LearningRate 0.0454   Epoch: 6   Global Step: 270680   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:09:18,663-Speed 2631.48 samples/sec   Loss 9.2324   LearningRate 0.0454   Epoch: 6   Global Step: 270690   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:09:22,556-Speed 2631.23 samples/sec   Loss 9.0936   LearningRate 0.0454   Epoch: 6   Global Step: 270700   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:09:26,457-Speed 2625.85 samples/sec   Loss 9.0997   LearningRate 0.0454   Epoch: 6   Global Step: 270710   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:09:30,353-Speed 2628.38 samples/sec   Loss 9.1235   LearningRate 0.0454   Epoch: 6   Global Step: 270720   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:09:34,231-Speed 2640.81 samples/sec   Loss 9.1874   LearningRate 0.0454   Epoch: 6   Global Step: 270730   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:38,140-Speed 2620.06 samples/sec   Loss 9.1059   LearningRate 0.0454   Epoch: 6   Global Step: 270740   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:42,054-Speed 2617.49 samples/sec   Loss 9.1740   LearningRate 0.0454   Epoch: 6   Global Step: 270750   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:46,045-Speed 2566.34 samples/sec   Loss 9.1508   LearningRate 0.0454   Epoch: 6   Global Step: 270760   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:49,969-Speed 2610.74 samples/sec   Loss 9.0312   LearningRate 0.0454   Epoch: 6   Global Step: 270770   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:53,868-Speed 2626.85 samples/sec   Loss 9.0760   LearningRate 0.0454   Epoch: 6   Global Step: 270780   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:09:57,779-Speed 2619.48 samples/sec   Loss 9.1552   LearningRate 0.0454   Epoch: 6   Global Step: 270790   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:10:01,659-Speed 2639.65 samples/sec   Loss 9.1552   LearningRate 0.0454   Epoch: 6   Global Step: 270800   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:10:05,533-Speed 2644.23 samples/sec   Loss 10.4164   LearningRate 0.0454   Epoch: 6   Global Step: 270810   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:09,426-Speed 2630.63 samples/sec   Loss 9.3996   LearningRate 0.0454   Epoch: 6   Global Step: 270820   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:13,354-Speed 2607.51 samples/sec   Loss 9.1364   LearningRate 0.0454   Epoch: 6   Global Step: 270830   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:17,260-Speed 2622.46 samples/sec   Loss 9.2270   LearningRate 0.0454   Epoch: 6   Global Step: 270840   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:21,148-Speed 2634.49 samples/sec   Loss 9.2777   LearningRate 0.0454   Epoch: 6   Global Step: 270850   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:25,038-Speed 2633.10 samples/sec   Loss 9.1491   LearningRate 0.0454   Epoch: 6   Global Step: 270860   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:28,927-Speed 2634.04 samples/sec   Loss 9.2240   LearningRate 0.0454   Epoch: 6   Global Step: 270870   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:32,824-Speed 2628.15 samples/sec   Loss 9.2931   LearningRate 0.0454   Epoch: 6   Global Step: 270880   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:36,715-Speed 2632.47 samples/sec   Loss 9.3030   LearningRate 0.0454   Epoch: 6   Global Step: 270890   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:40,614-Speed 2626.84 samples/sec   Loss 9.1960   LearningRate 0.0454   Epoch: 6   Global Step: 270900   Fp16 Grad Scale: 8192   Required: 63 hours
Training: 2022-04-14 02:10:44,504-Speed 2633.46 samples/sec   Loss 9.1650   LearningRate 0.0454   Epoch: 6   Global Step: 270910   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:10:48,400-Speed 2628.78 samples/sec   Loss 9.2469   LearningRate 0.0454   Epoch: 6   Global Step: 270920   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:10:52,289-Speed 2634.25 samples/sec   Loss 9.1162   LearningRate 0.0453   Epoch: 6   Global Step: 270930   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:10:56,189-Speed 2625.75 samples/sec   Loss 9.2315   LearningRate 0.0453   Epoch: 6   Global Step: 270940   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:00,080-Speed 2632.89 samples/sec   Loss 9.0535   LearningRate 0.0453   Epoch: 6   Global Step: 270950   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:03,971-Speed 2631.77 samples/sec   Loss 9.0634   LearningRate 0.0453   Epoch: 6   Global Step: 270960   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:07,865-Speed 2630.37 samples/sec   Loss 9.1429   LearningRate 0.0453   Epoch: 6   Global Step: 270970   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:11,756-Speed 2632.28 samples/sec   Loss 9.0520   LearningRate 0.0453   Epoch: 6   Global Step: 270980   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:15,645-Speed 2633.43 samples/sec   Loss 9.1484   LearningRate 0.0453   Epoch: 6   Global Step: 270990   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:19,548-Speed 2624.91 samples/sec   Loss 9.1962   LearningRate 0.0453   Epoch: 6   Global Step: 271000   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:23,437-Speed 2633.09 samples/sec   Loss 9.1372   LearningRate 0.0453   Epoch: 6   Global Step: 271010   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:11:27,333-Speed 2629.51 samples/sec   Loss 9.3175   LearningRate 0.0453   Epoch: 6   Global Step: 271020   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:11:31,226-Speed 2631.49 samples/sec   Loss 9.1155   LearningRate 0.0453   Epoch: 6   Global Step: 271030   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:11:35,098-Speed 2645.06 samples/sec   Loss 9.2446   LearningRate 0.0453   Epoch: 6   Global Step: 271040   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:39,002-Speed 2623.00 samples/sec   Loss 9.1921   LearningRate 0.0453   Epoch: 6   Global Step: 271050   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:42,904-Speed 2625.92 samples/sec   Loss 9.2182   LearningRate 0.0453   Epoch: 6   Global Step: 271060   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:46,799-Speed 2629.42 samples/sec   Loss 9.1075   LearningRate 0.0453   Epoch: 6   Global Step: 271070   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:50,686-Speed 2635.97 samples/sec   Loss 9.0749   LearningRate 0.0453   Epoch: 6   Global Step: 271080   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:54,577-Speed 2631.93 samples/sec   Loss 9.2306   LearningRate 0.0453   Epoch: 6   Global Step: 271090   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:11:58,470-Speed 2631.28 samples/sec   Loss 9.1135   LearningRate 0.0453   Epoch: 6   Global Step: 271100   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:12:02,469-Speed 2560.85 samples/sec   Loss 9.2025   LearningRate 0.0453   Epoch: 6   Global Step: 271110   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:12:06,365-Speed 2628.82 samples/sec   Loss 9.1623   LearningRate 0.0453   Epoch: 6   Global Step: 271120   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:12:10,254-Speed 2633.64 samples/sec   Loss 9.0675   LearningRate 0.0453   Epoch: 6   Global Step: 271130   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:12:14,144-Speed 2633.04 samples/sec   Loss 9.3496   LearningRate 0.0453   Epoch: 6   Global Step: 271140   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:18,036-Speed 2631.87 samples/sec   Loss 9.2246   LearningRate 0.0453   Epoch: 6   Global Step: 271150   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:21,931-Speed 2629.42 samples/sec   Loss 9.0531   LearningRate 0.0453   Epoch: 6   Global Step: 271160   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:25,833-Speed 2624.98 samples/sec   Loss 9.0349   LearningRate 0.0453   Epoch: 6   Global Step: 271170   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:29,726-Speed 2630.92 samples/sec   Loss 9.0798   LearningRate 0.0453   Epoch: 6   Global Step: 271180   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:33,626-Speed 2626.90 samples/sec   Loss 9.1832   LearningRate 0.0453   Epoch: 6   Global Step: 271190   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:37,525-Speed 2626.70 samples/sec   Loss 9.1298   LearningRate 0.0453   Epoch: 6   Global Step: 271200   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:41,417-Speed 2631.72 samples/sec   Loss 9.0529   LearningRate 0.0453   Epoch: 6   Global Step: 271210   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:45,347-Speed 2606.35 samples/sec   Loss 9.0269   LearningRate 0.0453   Epoch: 6   Global Step: 271220   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:49,242-Speed 2629.53 samples/sec   Loss 9.1795   LearningRate 0.0453   Epoch: 6   Global Step: 271230   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:12:53,137-Speed 2630.02 samples/sec   Loss 9.2210   LearningRate 0.0453   Epoch: 6   Global Step: 271240   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:12:57,030-Speed 2630.88 samples/sec   Loss 9.2076   LearningRate 0.0453   Epoch: 6   Global Step: 271250   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:00,919-Speed 2633.90 samples/sec   Loss 9.1369   LearningRate 0.0453   Epoch: 6   Global Step: 271260   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:04,810-Speed 2632.61 samples/sec   Loss 9.1059   LearningRate 0.0453   Epoch: 6   Global Step: 271270   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:08,699-Speed 2633.41 samples/sec   Loss 9.1601   LearningRate 0.0453   Epoch: 6   Global Step: 271280   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:12,603-Speed 2623.25 samples/sec   Loss 9.2113   LearningRate 0.0453   Epoch: 6   Global Step: 271290   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:16,502-Speed 2626.93 samples/sec   Loss 9.0220   LearningRate 0.0453   Epoch: 6   Global Step: 271300   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:20,404-Speed 2625.14 samples/sec   Loss 9.0734   LearningRate 0.0453   Epoch: 6   Global Step: 271310   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:24,306-Speed 2625.07 samples/sec   Loss 9.1686   LearningRate 0.0453   Epoch: 6   Global Step: 271320   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:28,210-Speed 2623.97 samples/sec   Loss 9.1982   LearningRate 0.0453   Epoch: 6   Global Step: 271330   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:13:32,114-Speed 2623.29 samples/sec   Loss 9.0533   LearningRate 0.0453   Epoch: 6   Global Step: 271340   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:13:36,017-Speed 2623.60 samples/sec   Loss 9.1025   LearningRate 0.0453   Epoch: 6   Global Step: 271350   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:13:39,916-Speed 2627.25 samples/sec   Loss 9.2158   LearningRate 0.0453   Epoch: 6   Global Step: 271360   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:13:43,820-Speed 2623.78 samples/sec   Loss 9.0814   LearningRate 0.0453   Epoch: 6   Global Step: 271370   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:13:47,713-Speed 2630.40 samples/sec   Loss 9.2049   LearningRate 0.0453   Epoch: 6   Global Step: 271380   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:13:51,614-Speed 2626.11 samples/sec   Loss 9.2041   LearningRate 0.0453   Epoch: 6   Global Step: 271390   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:13:55,510-Speed 2628.66 samples/sec   Loss 9.1719   LearningRate 0.0453   Epoch: 6   Global Step: 271400   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:13:59,401-Speed 2632.56 samples/sec   Loss 9.0678   LearningRate 0.0453   Epoch: 6   Global Step: 271410   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:03,300-Speed 2627.28 samples/sec   Loss 9.2481   LearningRate 0.0453   Epoch: 6   Global Step: 271420   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:07,191-Speed 2632.04 samples/sec   Loss 8.9610   LearningRate 0.0453   Epoch: 6   Global Step: 271430   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:11,080-Speed 2633.19 samples/sec   Loss 9.1730   LearningRate 0.0453   Epoch: 6   Global Step: 271440   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:14:14,976-Speed 2629.60 samples/sec   Loss 9.1125   LearningRate 0.0453   Epoch: 6   Global Step: 271450   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:14:18,866-Speed 2632.80 samples/sec   Loss 9.1159   LearningRate 0.0453   Epoch: 6   Global Step: 271460   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:14:22,770-Speed 2623.39 samples/sec   Loss 9.0579   LearningRate 0.0453   Epoch: 6   Global Step: 271470   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:14:26,663-Speed 2631.03 samples/sec   Loss 9.0725   LearningRate 0.0453   Epoch: 6   Global Step: 271480   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:14:30,549-Speed 2636.03 samples/sec   Loss 9.0398   LearningRate 0.0453   Epoch: 6   Global Step: 271490   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:34,445-Speed 2628.79 samples/sec   Loss 9.1499   LearningRate 0.0453   Epoch: 6   Global Step: 271500   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:38,339-Speed 2629.88 samples/sec   Loss 9.0691   LearningRate 0.0453   Epoch: 6   Global Step: 271510   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:42,242-Speed 2624.63 samples/sec   Loss 9.0351   LearningRate 0.0453   Epoch: 6   Global Step: 271520   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:46,146-Speed 2623.11 samples/sec   Loss 9.0761   LearningRate 0.0453   Epoch: 6   Global Step: 271530   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:50,044-Speed 2627.70 samples/sec   Loss 9.1224   LearningRate 0.0452   Epoch: 6   Global Step: 271540   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:53,937-Speed 2631.13 samples/sec   Loss 9.1614   LearningRate 0.0452   Epoch: 6   Global Step: 271550   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:14:57,834-Speed 2628.51 samples/sec   Loss 9.1691   LearningRate 0.0452   Epoch: 6   Global Step: 271560   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:15:01,800-Speed 2582.54 samples/sec   Loss 9.2008   LearningRate 0.0452   Epoch: 6   Global Step: 271570   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:15:05,694-Speed 2629.75 samples/sec   Loss 9.0378   LearningRate 0.0452   Epoch: 6   Global Step: 271580   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:15:09,588-Speed 2630.94 samples/sec   Loss 9.1030   LearningRate 0.0452   Epoch: 6   Global Step: 271590   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:13,477-Speed 2633.47 samples/sec   Loss 9.1971   LearningRate 0.0452   Epoch: 6   Global Step: 271600   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:17,376-Speed 2627.07 samples/sec   Loss 8.9549   LearningRate 0.0452   Epoch: 6   Global Step: 271610   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:21,273-Speed 2627.90 samples/sec   Loss 9.0286   LearningRate 0.0452   Epoch: 6   Global Step: 271620   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:25,176-Speed 2624.34 samples/sec   Loss 9.1546   LearningRate 0.0452   Epoch: 6   Global Step: 271630   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:29,075-Speed 2626.86 samples/sec   Loss 9.1388   LearningRate 0.0452   Epoch: 6   Global Step: 271640   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:32,973-Speed 2627.83 samples/sec   Loss 9.1137   LearningRate 0.0452   Epoch: 6   Global Step: 271650   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:36,880-Speed 2621.12 samples/sec   Loss 8.9945   LearningRate 0.0452   Epoch: 6   Global Step: 271660   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:40,772-Speed 2631.56 samples/sec   Loss 9.1259   LearningRate 0.0452   Epoch: 6   Global Step: 271670   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:44,663-Speed 2632.37 samples/sec   Loss 9.0475   LearningRate 0.0452   Epoch: 6   Global Step: 271680   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:48,546-Speed 2637.76 samples/sec   Loss 9.0812   LearningRate 0.0452   Epoch: 6   Global Step: 271690   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:52,439-Speed 2631.51 samples/sec   Loss 9.0946   LearningRate 0.0452   Epoch: 6   Global Step: 271700   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:15:56,331-Speed 2631.57 samples/sec   Loss 9.1211   LearningRate 0.0452   Epoch: 6   Global Step: 271710   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:16:00,223-Speed 2632.00 samples/sec   Loss 9.0482   LearningRate 0.0452   Epoch: 6   Global Step: 271720   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:16:04,103-Speed 2639.57 samples/sec   Loss 9.1513   LearningRate 0.0452   Epoch: 6   Global Step: 271730   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:08,002-Speed 2626.39 samples/sec   Loss 9.0349   LearningRate 0.0452   Epoch: 6   Global Step: 271740   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:11,900-Speed 2627.40 samples/sec   Loss 9.2121   LearningRate 0.0452   Epoch: 6   Global Step: 271750   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:15,794-Speed 2630.50 samples/sec   Loss 9.2539   LearningRate 0.0452   Epoch: 6   Global Step: 271760   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:19,692-Speed 2627.65 samples/sec   Loss 9.2860   LearningRate 0.0452   Epoch: 6   Global Step: 271770   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:23,584-Speed 2631.59 samples/sec   Loss 9.0873   LearningRate 0.0452   Epoch: 6   Global Step: 271780   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:27,475-Speed 2632.14 samples/sec   Loss 9.0989   LearningRate 0.0452   Epoch: 6   Global Step: 271790   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:31,365-Speed 2633.33 samples/sec   Loss 9.1550   LearningRate 0.0452   Epoch: 6   Global Step: 271800   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:35,259-Speed 2630.38 samples/sec   Loss 9.0844   LearningRate 0.0452   Epoch: 6   Global Step: 271810   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:39,149-Speed 2633.31 samples/sec   Loss 9.1804   LearningRate 0.0452   Epoch: 6   Global Step: 271820   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:43,042-Speed 2630.22 samples/sec   Loss 8.9586   LearningRate 0.0452   Epoch: 6   Global Step: 271830   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:16:46,940-Speed 2628.26 samples/sec   Loss 8.9621   LearningRate 0.0452   Epoch: 6   Global Step: 271840   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:16:50,813-Speed 2643.80 samples/sec   Loss 9.0887   LearningRate 0.0452   Epoch: 6   Global Step: 271850   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:54,721-Speed 2621.28 samples/sec   Loss 9.0103   LearningRate 0.0452   Epoch: 6   Global Step: 271860   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:16:58,612-Speed 2632.30 samples/sec   Loss 9.2593   LearningRate 0.0452   Epoch: 6   Global Step: 271870   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:02,503-Speed 2631.97 samples/sec   Loss 8.9974   LearningRate 0.0452   Epoch: 6   Global Step: 271880   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:06,394-Speed 2632.40 samples/sec   Loss 9.2117   LearningRate 0.0452   Epoch: 6   Global Step: 271890   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:10,298-Speed 2623.86 samples/sec   Loss 9.1182   LearningRate 0.0452   Epoch: 6   Global Step: 271900   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:14,193-Speed 2629.57 samples/sec   Loss 9.1037   LearningRate 0.0452   Epoch: 6   Global Step: 271910   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:18,090-Speed 2628.26 samples/sec   Loss 8.9867   LearningRate 0.0452   Epoch: 6   Global Step: 271920   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:21,995-Speed 2623.47 samples/sec   Loss 9.0497   LearningRate 0.0452   Epoch: 6   Global Step: 271930   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:25,888-Speed 2630.72 samples/sec   Loss 9.0934   LearningRate 0.0452   Epoch: 6   Global Step: 271940   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:17:29,779-Speed 2631.96 samples/sec   Loss 9.1297   LearningRate 0.0452   Epoch: 6   Global Step: 271950   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:17:33,683-Speed 2623.82 samples/sec   Loss 9.1317   LearningRate 0.0452   Epoch: 6   Global Step: 271960   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:17:37,587-Speed 2623.16 samples/sec   Loss 9.0998   LearningRate 0.0452   Epoch: 6   Global Step: 271970   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:17:41,482-Speed 2630.02 samples/sec   Loss 8.9317   LearningRate 0.0452   Epoch: 6   Global Step: 271980   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:17:45,370-Speed 2634.74 samples/sec   Loss 9.1644   LearningRate 0.0452   Epoch: 6   Global Step: 271990   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:17:49,194-Speed 2678.51 samples/sec   Loss 9.1725   LearningRate 0.0452   Epoch: 6   Global Step: 272000   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:17:53,087-Speed 2630.94 samples/sec   Loss 9.1632   LearningRate 0.0452   Epoch: 6   Global Step: 272010   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:17:56,985-Speed 2627.51 samples/sec   Loss 9.0492   LearningRate 0.0452   Epoch: 6   Global Step: 272020   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:00,871-Speed 2635.37 samples/sec   Loss 9.0960   LearningRate 0.0452   Epoch: 6   Global Step: 272030   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:04,773-Speed 2625.09 samples/sec   Loss 9.2208   LearningRate 0.0452   Epoch: 6   Global Step: 272040   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:08,694-Speed 2611.47 samples/sec   Loss 9.0638   LearningRate 0.0452   Epoch: 6   Global Step: 272050   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:12,590-Speed 2629.44 samples/sec   Loss 9.0947   LearningRate 0.0452   Epoch: 6   Global Step: 272060   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:16,502-Speed 2618.66 samples/sec   Loss 9.1800   LearningRate 0.0452   Epoch: 6   Global Step: 272070   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:20,547-Speed 2532.32 samples/sec   Loss 9.1584   LearningRate 0.0452   Epoch: 6   Global Step: 272080   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:24,451-Speed 2623.28 samples/sec   Loss 9.1210   LearningRate 0.0452   Epoch: 6   Global Step: 272090   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:18:28,350-Speed 2626.90 samples/sec   Loss 9.1383   LearningRate 0.0452   Epoch: 6   Global Step: 272100   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:32,272-Speed 2611.84 samples/sec   Loss 9.1006   LearningRate 0.0452   Epoch: 6   Global Step: 272110   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:36,234-Speed 2584.97 samples/sec   Loss 9.1667   LearningRate 0.0452   Epoch: 6   Global Step: 272120   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:40,127-Speed 2630.49 samples/sec   Loss 9.0688   LearningRate 0.0452   Epoch: 6   Global Step: 272130   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:44,070-Speed 2598.08 samples/sec   Loss 9.1572   LearningRate 0.0452   Epoch: 6   Global Step: 272140   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:48,115-Speed 2532.11 samples/sec   Loss 8.8733   LearningRate 0.0452   Epoch: 6   Global Step: 272150   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:52,005-Speed 2632.81 samples/sec   Loss 9.0902   LearningRate 0.0451   Epoch: 6   Global Step: 272160   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:55,903-Speed 2628.14 samples/sec   Loss 9.1426   LearningRate 0.0451   Epoch: 6   Global Step: 272170   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:18:59,801-Speed 2627.49 samples/sec   Loss 9.1896   LearningRate 0.0451   Epoch: 6   Global Step: 272180   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:19:03,699-Speed 2627.34 samples/sec   Loss 9.0063   LearningRate 0.0451   Epoch: 6   Global Step: 272190   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:19:07,609-Speed 2619.51 samples/sec   Loss 9.0152   LearningRate 0.0451   Epoch: 6   Global Step: 272200   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:11,507-Speed 2627.25 samples/sec   Loss 9.1573   LearningRate 0.0451   Epoch: 6   Global Step: 272210   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:15,409-Speed 2625.04 samples/sec   Loss 9.1389   LearningRate 0.0451   Epoch: 6   Global Step: 272220   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:19,303-Speed 2630.98 samples/sec   Loss 8.9970   LearningRate 0.0451   Epoch: 6   Global Step: 272230   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:23,194-Speed 2632.16 samples/sec   Loss 8.9117   LearningRate 0.0451   Epoch: 6   Global Step: 272240   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:27,090-Speed 2628.69 samples/sec   Loss 9.0752   LearningRate 0.0451   Epoch: 6   Global Step: 272250   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:30,985-Speed 2629.84 samples/sec   Loss 9.0162   LearningRate 0.0451   Epoch: 6   Global Step: 272260   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:34,878-Speed 2630.53 samples/sec   Loss 9.1673   LearningRate 0.0451   Epoch: 6   Global Step: 272270   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:38,772-Speed 2630.34 samples/sec   Loss 9.1945   LearningRate 0.0451   Epoch: 6   Global Step: 272280   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:42,662-Speed 2633.04 samples/sec   Loss 9.0748   LearningRate 0.0451   Epoch: 6   Global Step: 272290   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:19:46,557-Speed 2629.81 samples/sec   Loss 9.0188   LearningRate 0.0451   Epoch: 6   Global Step: 272300   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:19:50,452-Speed 2629.48 samples/sec   Loss 9.0961   LearningRate 0.0451   Epoch: 6   Global Step: 272310   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:19:54,352-Speed 2626.19 samples/sec   Loss 9.1485   LearningRate 0.0451   Epoch: 6   Global Step: 272320   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:19:58,249-Speed 2628.32 samples/sec   Loss 9.1161   LearningRate 0.0451   Epoch: 6   Global Step: 272330   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:20:02,147-Speed 2627.38 samples/sec   Loss 9.1769   LearningRate 0.0451   Epoch: 6   Global Step: 272340   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:20:06,043-Speed 2629.19 samples/sec   Loss 9.0535   LearningRate 0.0451   Epoch: 6   Global Step: 272350   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:20:09,913-Speed 2646.28 samples/sec   Loss 9.1662   LearningRate 0.0451   Epoch: 6   Global Step: 272360   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:13,816-Speed 2624.41 samples/sec   Loss 8.9721   LearningRate 0.0451   Epoch: 6   Global Step: 272370   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:17,726-Speed 2619.66 samples/sec   Loss 9.0023   LearningRate 0.0451   Epoch: 6   Global Step: 272380   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:21,625-Speed 2626.91 samples/sec   Loss 9.0944   LearningRate 0.0451   Epoch: 6   Global Step: 272390   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:25,520-Speed 2629.42 samples/sec   Loss 9.0521   LearningRate 0.0451   Epoch: 6   Global Step: 272400   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:29,419-Speed 2626.93 samples/sec   Loss 8.9872   LearningRate 0.0451   Epoch: 6   Global Step: 272410   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:33,316-Speed 2628.06 samples/sec   Loss 9.2541   LearningRate 0.0451   Epoch: 6   Global Step: 272420   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:37,222-Speed 2622.61 samples/sec   Loss 9.1797   LearningRate 0.0451   Epoch: 6   Global Step: 272430   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:41,113-Speed 2631.75 samples/sec   Loss 9.0082   LearningRate 0.0451   Epoch: 6   Global Step: 272440   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:45,129-Speed 2550.25 samples/sec   Loss 9.0742   LearningRate 0.0451   Epoch: 6   Global Step: 272450   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:20:49,198-Speed 2517.62 samples/sec   Loss 8.9884   LearningRate 0.0451   Epoch: 6   Global Step: 272460   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:20:53,210-Speed 2552.98 samples/sec   Loss 9.0929   LearningRate 0.0451   Epoch: 6   Global Step: 272470   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:20:57,103-Speed 2630.98 samples/sec   Loss 9.0814   LearningRate 0.0451   Epoch: 6   Global Step: 272480   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:00,997-Speed 2630.25 samples/sec   Loss 9.1110   LearningRate 0.0451   Epoch: 6   Global Step: 272490   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:04,894-Speed 2628.14 samples/sec   Loss 9.0227   LearningRate 0.0451   Epoch: 6   Global Step: 272500   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:08,790-Speed 2629.09 samples/sec   Loss 9.0382   LearningRate 0.0451   Epoch: 6   Global Step: 272510   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:12,685-Speed 2629.63 samples/sec   Loss 9.0675   LearningRate 0.0451   Epoch: 6   Global Step: 272520   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:16,594-Speed 2619.92 samples/sec   Loss 8.9888   LearningRate 0.0451   Epoch: 6   Global Step: 272530   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:20,511-Speed 2614.83 samples/sec   Loss 9.0601   LearningRate 0.0451   Epoch: 6   Global Step: 272540   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:24,429-Speed 2614.66 samples/sec   Loss 8.9756   LearningRate 0.0451   Epoch: 6   Global Step: 272550   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:28,320-Speed 2632.12 samples/sec   Loss 9.0741   LearningRate 0.0451   Epoch: 6   Global Step: 272560   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:21:32,253-Speed 2604.56 samples/sec   Loss 9.2027   LearningRate 0.0451   Epoch: 6   Global Step: 272570   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:21:36,154-Speed 2625.00 samples/sec   Loss 9.0464   LearningRate 0.0451   Epoch: 6   Global Step: 272580   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:21:40,063-Speed 2620.54 samples/sec   Loss 9.0050   LearningRate 0.0451   Epoch: 6   Global Step: 272590   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:21:43,982-Speed 2613.36 samples/sec   Loss 9.1593   LearningRate 0.0451   Epoch: 6   Global Step: 272600   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:21:47,877-Speed 2629.83 samples/sec   Loss 8.9886   LearningRate 0.0451   Epoch: 6   Global Step: 272610   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:21:51,752-Speed 2643.02 samples/sec   Loss 9.2335   LearningRate 0.0451   Epoch: 6   Global Step: 272620   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:55,646-Speed 2630.12 samples/sec   Loss 9.0525   LearningRate 0.0451   Epoch: 6   Global Step: 272630   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:21:59,540-Speed 2630.51 samples/sec   Loss 8.9383   LearningRate 0.0451   Epoch: 6   Global Step: 272640   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:03,436-Speed 2629.09 samples/sec   Loss 9.0615   LearningRate 0.0451   Epoch: 6   Global Step: 272650   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:07,329-Speed 2630.65 samples/sec   Loss 9.2300   LearningRate 0.0451   Epoch: 6   Global Step: 272660   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:11,222-Speed 2630.92 samples/sec   Loss 8.9946   LearningRate 0.0451   Epoch: 6   Global Step: 272670   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:15,115-Speed 2631.13 samples/sec   Loss 9.1087   LearningRate 0.0451   Epoch: 6   Global Step: 272680   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:19,012-Speed 2628.38 samples/sec   Loss 9.1115   LearningRate 0.0451   Epoch: 6   Global Step: 272690   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:22,908-Speed 2628.91 samples/sec   Loss 9.1837   LearningRate 0.0451   Epoch: 6   Global Step: 272700   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:26,801-Speed 2630.49 samples/sec   Loss 9.1801   LearningRate 0.0451   Epoch: 6   Global Step: 272710   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:30,701-Speed 2626.33 samples/sec   Loss 9.1115   LearningRate 0.0451   Epoch: 6   Global Step: 272720   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:22:34,594-Speed 2631.36 samples/sec   Loss 8.9063   LearningRate 0.0451   Epoch: 6   Global Step: 272730   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:22:38,497-Speed 2624.15 samples/sec   Loss 9.1312   LearningRate 0.0451   Epoch: 6   Global Step: 272740   Fp16 Grad Scale: 262144   Required: 63 hours
Training: 2022-04-14 02:22:42,367-Speed 2646.53 samples/sec   Loss 9.0609   LearningRate 0.0451   Epoch: 6   Global Step: 272750   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:46,267-Speed 2626.18 samples/sec   Loss 9.0327   LearningRate 0.0451   Epoch: 6   Global Step: 272760   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:50,170-Speed 2624.48 samples/sec   Loss 8.9783   LearningRate 0.0451   Epoch: 6   Global Step: 272770   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:54,064-Speed 2630.35 samples/sec   Loss 9.0692   LearningRate 0.0450   Epoch: 6   Global Step: 272780   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:22:57,958-Speed 2630.38 samples/sec   Loss 9.1075   LearningRate 0.0450   Epoch: 6   Global Step: 272790   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:23:01,863-Speed 2622.66 samples/sec   Loss 9.1074   LearningRate 0.0450   Epoch: 6   Global Step: 272800   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:23:05,755-Speed 2631.19 samples/sec   Loss 9.0316   LearningRate 0.0450   Epoch: 6   Global Step: 272810   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:23:09,649-Speed 2630.82 samples/sec   Loss 9.0110   LearningRate 0.0450   Epoch: 6   Global Step: 272820   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:23:13,531-Speed 2638.20 samples/sec   Loss 9.0142   LearningRate 0.0450   Epoch: 6   Global Step: 272830   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:17,426-Speed 2629.29 samples/sec   Loss 9.2166   LearningRate 0.0450   Epoch: 6   Global Step: 272840   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:21,323-Speed 2628.52 samples/sec   Loss 9.0436   LearningRate 0.0450   Epoch: 6   Global Step: 272850   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:25,217-Speed 2630.00 samples/sec   Loss 9.0564   LearningRate 0.0450   Epoch: 6   Global Step: 272860   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:29,117-Speed 2626.64 samples/sec   Loss 9.2659   LearningRate 0.0450   Epoch: 6   Global Step: 272870   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:33,008-Speed 2631.93 samples/sec   Loss 9.0812   LearningRate 0.0450   Epoch: 6   Global Step: 272880   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:36,901-Speed 2631.05 samples/sec   Loss 8.9512   LearningRate 0.0450   Epoch: 6   Global Step: 272890   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:40,794-Speed 2631.19 samples/sec   Loss 9.0290   LearningRate 0.0450   Epoch: 6   Global Step: 272900   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:44,686-Speed 2631.32 samples/sec   Loss 9.0601   LearningRate 0.0450   Epoch: 6   Global Step: 272910   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:48,593-Speed 2621.40 samples/sec   Loss 9.0539   LearningRate 0.0450   Epoch: 6   Global Step: 272920   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:23:52,507-Speed 2617.33 samples/sec   Loss 8.9983   LearningRate 0.0450   Epoch: 6   Global Step: 272930   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:23:56,417-Speed 2619.27 samples/sec   Loss 9.0926   LearningRate 0.0450   Epoch: 6   Global Step: 272940   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:00,318-Speed 2625.58 samples/sec   Loss 9.0170   LearningRate 0.0450   Epoch: 6   Global Step: 272950   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:04,216-Speed 2627.44 samples/sec   Loss 9.0911   LearningRate 0.0450   Epoch: 6   Global Step: 272960   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:08,114-Speed 2628.10 samples/sec   Loss 8.9428   LearningRate 0.0450   Epoch: 6   Global Step: 272970   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:12,011-Speed 2627.95 samples/sec   Loss 9.1126   LearningRate 0.0450   Epoch: 6   Global Step: 272980   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:15,907-Speed 2628.79 samples/sec   Loss 9.0505   LearningRate 0.0450   Epoch: 6   Global Step: 272990   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:19,803-Speed 2629.10 samples/sec   Loss 9.0296   LearningRate 0.0450   Epoch: 6   Global Step: 273000   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:23,697-Speed 2630.08 samples/sec   Loss 9.0091   LearningRate 0.0450   Epoch: 6   Global Step: 273010   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:27,589-Speed 2631.83 samples/sec   Loss 9.0784   LearningRate 0.0450   Epoch: 6   Global Step: 273020   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:24:31,485-Speed 2628.66 samples/sec   Loss 9.0220   LearningRate 0.0450   Epoch: 6   Global Step: 273030   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:24:35,382-Speed 2628.51 samples/sec   Loss 9.2219   LearningRate 0.0450   Epoch: 6   Global Step: 273040   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:24:39,274-Speed 2631.53 samples/sec   Loss 9.0816   LearningRate 0.0450   Epoch: 6   Global Step: 273050   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:24:43,168-Speed 2630.63 samples/sec   Loss 9.2257   LearningRate 0.0450   Epoch: 6   Global Step: 273060   Fp16 Grad Scale: 131072   Required: 63 hours
Training: 2022-04-14 02:24:46,998-Speed 2674.12 samples/sec   Loss 9.5799   LearningRate 0.0450   Epoch: 6   Global Step: 273070   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:24:50,891-Speed 2630.79 samples/sec   Loss 9.0185   LearningRate 0.0450   Epoch: 6   Global Step: 273080   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:24:54,796-Speed 2622.75 samples/sec   Loss 9.2515   LearningRate 0.0450   Epoch: 6   Global Step: 273090   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:24:58,689-Speed 2631.10 samples/sec   Loss 9.1252   LearningRate 0.0450   Epoch: 6   Global Step: 273100   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:25:02,585-Speed 2629.14 samples/sec   Loss 9.1247   LearningRate 0.0450   Epoch: 6   Global Step: 273110   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:25:06,489-Speed 2623.50 samples/sec   Loss 9.1615   LearningRate 0.0450   Epoch: 6   Global Step: 273120   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:25:10,393-Speed 2623.06 samples/sec   Loss 8.9382   LearningRate 0.0450   Epoch: 6   Global Step: 273130   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:25:14,318-Speed 2610.09 samples/sec   Loss 8.9621   LearningRate 0.0450   Epoch: 6   Global Step: 273140   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:25:18,211-Speed 2630.76 samples/sec   Loss 9.1391   LearningRate 0.0450   Epoch: 6   Global Step: 273150   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:25:22,104-Speed 2631.38 samples/sec   Loss 9.1207   LearningRate 0.0450   Epoch: 6   Global Step: 273160   Fp16 Grad Scale: 16384   Required: 63 hours
Training: 2022-04-14 02:25:26,000-Speed 2629.05 samples/sec   Loss 9.0570   LearningRate 0.0450   Epoch: 6   Global Step: 273170   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:29,893-Speed 2630.88 samples/sec   Loss 9.0310   LearningRate 0.0450   Epoch: 6   Global Step: 273180   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:33,785-Speed 2631.57 samples/sec   Loss 9.1909   LearningRate 0.0450   Epoch: 6   Global Step: 273190   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:37,676-Speed 2632.40 samples/sec   Loss 9.1139   LearningRate 0.0450   Epoch: 6   Global Step: 273200   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:41,572-Speed 2628.32 samples/sec   Loss 9.1304   LearningRate 0.0450   Epoch: 6   Global Step: 273210   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:45,467-Speed 2629.65 samples/sec   Loss 8.9547   LearningRate 0.0450   Epoch: 6   Global Step: 273220   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:49,362-Speed 2629.93 samples/sec   Loss 9.0597   LearningRate 0.0450   Epoch: 6   Global Step: 273230   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:53,260-Speed 2627.05 samples/sec   Loss 9.0596   LearningRate 0.0450   Epoch: 6   Global Step: 273240   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:25:57,152-Speed 2632.09 samples/sec   Loss 9.1033   LearningRate 0.0450   Epoch: 6   Global Step: 273250   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:26:01,045-Speed 2631.36 samples/sec   Loss 9.0466   LearningRate 0.0450   Epoch: 6   Global Step: 273260   Fp16 Grad Scale: 32768   Required: 63 hours
Training: 2022-04-14 02:26:04,942-Speed 2628.32 samples/sec   Loss 9.0866   LearningRate 0.0450   Epoch: 6   Global Step: 273270   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:26:08,836-Speed 2630.03 samples/sec   Loss 9.0503   LearningRate 0.0450   Epoch: 6   Global Step: 273280   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:26:12,732-Speed 2628.57 samples/sec   Loss 9.0056   LearningRate 0.0450   Epoch: 6   Global Step: 273290   Fp16 Grad Scale: 65536   Required: 63 hours
Training: 2022-04-14 02:26:16,622-Speed 2633.07 samples/sec   Loss 9.0413   LearningRate 0.0450   Epoch: 6   Global Step: 273300   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:26:20,516-Speed 2630.21 samples/sec   Loss 9.0370   LearningRate 0.0450   Epoch: 6   Global Step: 273310   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:26:24,423-Speed 2621.78 samples/sec   Loss 9.1355   LearningRate 0.0450   Epoch: 6   Global Step: 273320   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:26:28,323-Speed 2625.94 samples/sec   Loss 9.1432   LearningRate 0.0450   Epoch: 6   Global Step: 273330   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:26:32,215-Speed 2632.09 samples/sec   Loss 9.1759   LearningRate 0.0450   Epoch: 6   Global Step: 273340   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:26:36,114-Speed 2626.98 samples/sec   Loss 9.1769   LearningRate 0.0450   Epoch: 6   Global Step: 273350   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:26:40,008-Speed 2630.14 samples/sec   Loss 9.0470   LearningRate 0.0450   Epoch: 6   Global Step: 273360   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:26:43,903-Speed 2629.51 samples/sec   Loss 8.9972   LearningRate 0.0450   Epoch: 6   Global Step: 273370   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:26:47,800-Speed 2628.37 samples/sec   Loss 9.0702   LearningRate 0.0450   Epoch: 6   Global Step: 273380   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:26:51,693-Speed 2631.23 samples/sec   Loss 9.2020   LearningRate 0.0450   Epoch: 6   Global Step: 273390   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:26:55,588-Speed 2629.68 samples/sec   Loss 9.0192   LearningRate 0.0449   Epoch: 6   Global Step: 273400   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:26:59,480-Speed 2631.18 samples/sec   Loss 9.0765   LearningRate 0.0449   Epoch: 6   Global Step: 273410   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:03,372-Speed 2631.57 samples/sec   Loss 9.0369   LearningRate 0.0449   Epoch: 6   Global Step: 273420   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:07,266-Speed 2630.59 samples/sec   Loss 9.0294   LearningRate 0.0449   Epoch: 6   Global Step: 273430   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:11,168-Speed 2624.87 samples/sec   Loss 9.1363   LearningRate 0.0449   Epoch: 6   Global Step: 273440   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:15,072-Speed 2623.30 samples/sec   Loss 9.0556   LearningRate 0.0449   Epoch: 6   Global Step: 273450   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:18,977-Speed 2623.08 samples/sec   Loss 9.0369   LearningRate 0.0449   Epoch: 6   Global Step: 273460   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:22,878-Speed 2625.83 samples/sec   Loss 9.0781   LearningRate 0.0449   Epoch: 6   Global Step: 273470   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:27:26,771-Speed 2630.59 samples/sec   Loss 9.2498   LearningRate 0.0449   Epoch: 6   Global Step: 273480   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:27:30,663-Speed 2631.57 samples/sec   Loss 9.3477   LearningRate 0.0449   Epoch: 6   Global Step: 273490   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:27:34,559-Speed 2629.29 samples/sec   Loss 9.0495   LearningRate 0.0449   Epoch: 6   Global Step: 273500   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:27:38,459-Speed 2626.01 samples/sec   Loss 9.1121   LearningRate 0.0449   Epoch: 6   Global Step: 273510   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:27:42,341-Speed 2638.36 samples/sec   Loss 9.1372   LearningRate 0.0449   Epoch: 6   Global Step: 273520   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:46,243-Speed 2624.47 samples/sec   Loss 8.9546   LearningRate 0.0449   Epoch: 6   Global Step: 273530   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:50,153-Speed 2619.89 samples/sec   Loss 9.0067   LearningRate 0.0449   Epoch: 6   Global Step: 273540   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:54,049-Speed 2629.20 samples/sec   Loss 9.0878   LearningRate 0.0449   Epoch: 6   Global Step: 273550   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:27:57,956-Speed 2621.41 samples/sec   Loss 9.0045   LearningRate 0.0449   Epoch: 6   Global Step: 273560   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:01,850-Speed 2630.54 samples/sec   Loss 9.0865   LearningRate 0.0449   Epoch: 6   Global Step: 273570   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:05,744-Speed 2630.13 samples/sec   Loss 8.9950   LearningRate 0.0449   Epoch: 6   Global Step: 273580   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:09,640-Speed 2628.94 samples/sec   Loss 9.0696   LearningRate 0.0449   Epoch: 6   Global Step: 273590   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:13,546-Speed 2621.80 samples/sec   Loss 9.1241   LearningRate 0.0449   Epoch: 6   Global Step: 273600   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:17,445-Speed 2626.77 samples/sec   Loss 9.0012   LearningRate 0.0449   Epoch: 6   Global Step: 273610   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:21,338-Speed 2631.59 samples/sec   Loss 9.0782   LearningRate 0.0449   Epoch: 6   Global Step: 273620   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:28:25,260-Speed 2611.64 samples/sec   Loss 9.0597   LearningRate 0.0449   Epoch: 6   Global Step: 273630   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:28:29,161-Speed 2625.74 samples/sec   Loss 9.0552   LearningRate 0.0449   Epoch: 6   Global Step: 273640   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:28:33,052-Speed 2632.37 samples/sec   Loss 8.9383   LearningRate 0.0449   Epoch: 6   Global Step: 273650   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:28:36,928-Speed 2642.17 samples/sec   Loss 9.1006   LearningRate 0.0449   Epoch: 6   Global Step: 273660   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:40,822-Speed 2630.45 samples/sec   Loss 9.0059   LearningRate 0.0449   Epoch: 6   Global Step: 273670   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:44,718-Speed 2628.58 samples/sec   Loss 9.0531   LearningRate 0.0449   Epoch: 6   Global Step: 273680   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:48,613-Speed 2629.97 samples/sec   Loss 9.0196   LearningRate 0.0449   Epoch: 6   Global Step: 273690   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:52,506-Speed 2630.78 samples/sec   Loss 9.0164   LearningRate 0.0449   Epoch: 6   Global Step: 273700   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:28:56,400-Speed 2629.95 samples/sec   Loss 9.0793   LearningRate 0.0449   Epoch: 6   Global Step: 273710   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:29:00,293-Speed 2631.62 samples/sec   Loss 9.1050   LearningRate 0.0449   Epoch: 6   Global Step: 273720   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:29:04,188-Speed 2629.49 samples/sec   Loss 9.0998   LearningRate 0.0449   Epoch: 6   Global Step: 273730   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:29:08,065-Speed 2641.89 samples/sec   Loss 8.9995   LearningRate 0.0449   Epoch: 6   Global Step: 273740   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:11,960-Speed 2629.62 samples/sec   Loss 9.1166   LearningRate 0.0449   Epoch: 6   Global Step: 273750   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:15,851-Speed 2632.50 samples/sec   Loss 9.0992   LearningRate 0.0449   Epoch: 6   Global Step: 273760   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:19,754-Speed 2623.61 samples/sec   Loss 8.8886   LearningRate 0.0449   Epoch: 6   Global Step: 273770   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:23,652-Speed 2627.59 samples/sec   Loss 9.0703   LearningRate 0.0449   Epoch: 6   Global Step: 273780   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:27,550-Speed 2627.61 samples/sec   Loss 9.0335   LearningRate 0.0449   Epoch: 6   Global Step: 273790   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:31,455-Speed 2622.98 samples/sec   Loss 9.0189   LearningRate 0.0449   Epoch: 6   Global Step: 273800   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:35,347-Speed 2631.72 samples/sec   Loss 8.9884   LearningRate 0.0449   Epoch: 6   Global Step: 273810   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:39,243-Speed 2629.24 samples/sec   Loss 9.1956   LearningRate 0.0449   Epoch: 6   Global Step: 273820   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:43,137-Speed 2630.00 samples/sec   Loss 9.0691   LearningRate 0.0449   Epoch: 6   Global Step: 273830   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:29:47,043-Speed 2621.83 samples/sec   Loss 8.9649   LearningRate 0.0449   Epoch: 6   Global Step: 273840   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:29:50,939-Speed 2628.98 samples/sec   Loss 8.9655   LearningRate 0.0449   Epoch: 6   Global Step: 273850   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:29:54,834-Speed 2629.54 samples/sec   Loss 9.0924   LearningRate 0.0449   Epoch: 6   Global Step: 273860   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:29:58,726-Speed 2631.77 samples/sec   Loss 9.1389   LearningRate 0.0449   Epoch: 6   Global Step: 273870   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:30:02,618-Speed 2632.04 samples/sec   Loss 9.0764   LearningRate 0.0449   Epoch: 6   Global Step: 273880   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:30:06,533-Speed 2615.84 samples/sec   Loss 9.0577   LearningRate 0.0449   Epoch: 6   Global Step: 273890   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:30:10,391-Speed 2654.74 samples/sec   Loss 9.1627   LearningRate 0.0449   Epoch: 6   Global Step: 273900   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:14,283-Speed 2631.55 samples/sec   Loss 9.2238   LearningRate 0.0449   Epoch: 6   Global Step: 273910   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:18,177-Speed 2630.25 samples/sec   Loss 9.0911   LearningRate 0.0449   Epoch: 6   Global Step: 273920   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:22,070-Speed 2631.67 samples/sec   Loss 9.1113   LearningRate 0.0449   Epoch: 6   Global Step: 273930   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:25,960-Speed 2632.61 samples/sec   Loss 9.1411   LearningRate 0.0449   Epoch: 6   Global Step: 273940   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:29,855-Speed 2629.74 samples/sec   Loss 9.3182   LearningRate 0.0449   Epoch: 6   Global Step: 273950   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:33,747-Speed 2631.96 samples/sec   Loss 9.1127   LearningRate 0.0449   Epoch: 6   Global Step: 273960   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:37,644-Speed 2628.40 samples/sec   Loss 9.1040   LearningRate 0.0449   Epoch: 6   Global Step: 273970   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:41,576-Speed 2604.34 samples/sec   Loss 9.0840   LearningRate 0.0449   Epoch: 6   Global Step: 273980   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:45,490-Speed 2620.57 samples/sec   Loss 9.0764   LearningRate 0.0449   Epoch: 6   Global Step: 273990   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:30:49,384-Speed 2630.17 samples/sec   Loss 9.1094   LearningRate 0.0449   Epoch: 6   Global Step: 274000   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:30:53,325-Speed 2599.74 samples/sec   Loss 9.0822   LearningRate 0.0448   Epoch: 6   Global Step: 274010   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:30:57,221-Speed 2629.18 samples/sec   Loss 9.0481   LearningRate 0.0448   Epoch: 6   Global Step: 274020   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:01,112-Speed 2632.07 samples/sec   Loss 9.0356   LearningRate 0.0448   Epoch: 6   Global Step: 274030   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:05,023-Speed 2619.41 samples/sec   Loss 9.0993   LearningRate 0.0448   Epoch: 6   Global Step: 274040   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:08,932-Speed 2620.15 samples/sec   Loss 9.2268   LearningRate 0.0448   Epoch: 6   Global Step: 274050   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:12,841-Speed 2619.98 samples/sec   Loss 9.0539   LearningRate 0.0448   Epoch: 6   Global Step: 274060   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:16,737-Speed 2629.12 samples/sec   Loss 9.1194   LearningRate 0.0448   Epoch: 6   Global Step: 274070   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:20,634-Speed 2628.60 samples/sec   Loss 8.8718   LearningRate 0.0448   Epoch: 6   Global Step: 274080   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:24,541-Speed 2621.38 samples/sec   Loss 9.2029   LearningRate 0.0448   Epoch: 6   Global Step: 274090   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:31:28,439-Speed 2628.49 samples/sec   Loss 8.9262   LearningRate 0.0448   Epoch: 6   Global Step: 274100   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:32,345-Speed 2622.05 samples/sec   Loss 9.1334   LearningRate 0.0448   Epoch: 6   Global Step: 274110   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:36,232-Speed 2634.68 samples/sec   Loss 9.1357   LearningRate 0.0448   Epoch: 6   Global Step: 274120   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:40,124-Speed 2632.02 samples/sec   Loss 9.1065   LearningRate 0.0448   Epoch: 6   Global Step: 274130   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:44,022-Speed 2627.56 samples/sec   Loss 9.0799   LearningRate 0.0448   Epoch: 6   Global Step: 274140   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:47,921-Speed 2627.19 samples/sec   Loss 8.9372   LearningRate 0.0448   Epoch: 6   Global Step: 274150   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:51,815-Speed 2630.13 samples/sec   Loss 9.1956   LearningRate 0.0448   Epoch: 6   Global Step: 274160   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:55,712-Speed 2628.54 samples/sec   Loss 9.0864   LearningRate 0.0448   Epoch: 6   Global Step: 274170   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:31:59,606-Speed 2630.37 samples/sec   Loss 9.0944   LearningRate 0.0448   Epoch: 6   Global Step: 274180   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:32:03,499-Speed 2630.99 samples/sec   Loss 8.9958   LearningRate 0.0448   Epoch: 6   Global Step: 274190   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:32:07,391-Speed 2631.40 samples/sec   Loss 9.1259   LearningRate 0.0448   Epoch: 6   Global Step: 274200   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:11,284-Speed 2631.15 samples/sec   Loss 9.0404   LearningRate 0.0448   Epoch: 6   Global Step: 274210   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:15,188-Speed 2623.64 samples/sec   Loss 8.9970   LearningRate 0.0448   Epoch: 6   Global Step: 274220   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:19,087-Speed 2626.75 samples/sec   Loss 8.9732   LearningRate 0.0448   Epoch: 6   Global Step: 274230   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:22,980-Speed 2630.82 samples/sec   Loss 9.0264   LearningRate 0.0448   Epoch: 6   Global Step: 274240   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:26,873-Speed 2631.09 samples/sec   Loss 9.0787   LearningRate 0.0448   Epoch: 6   Global Step: 274250   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:30,767-Speed 2630.47 samples/sec   Loss 9.0424   LearningRate 0.0448   Epoch: 6   Global Step: 274260   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:34,657-Speed 2634.80 samples/sec   Loss 9.1284   LearningRate 0.0448   Epoch: 6   Global Step: 274270   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:38,552-Speed 2629.46 samples/sec   Loss 9.0661   LearningRate 0.0448   Epoch: 6   Global Step: 274280   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:42,443-Speed 2632.13 samples/sec   Loss 9.0221   LearningRate 0.0448   Epoch: 6   Global Step: 274290   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:32:46,336-Speed 2630.39 samples/sec   Loss 9.1187   LearningRate 0.0448   Epoch: 6   Global Step: 274300   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:32:50,236-Speed 2626.91 samples/sec   Loss 9.0285   LearningRate 0.0448   Epoch: 6   Global Step: 274310   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:32:54,135-Speed 2626.55 samples/sec   Loss 9.1385   LearningRate 0.0448   Epoch: 6   Global Step: 274320   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:32:58,118-Speed 2571.93 samples/sec   Loss 9.0175   LearningRate 0.0448   Epoch: 6   Global Step: 274330   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:02,014-Speed 2629.52 samples/sec   Loss 9.1562   LearningRate 0.0448   Epoch: 6   Global Step: 274340   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:05,910-Speed 2628.94 samples/sec   Loss 9.0310   LearningRate 0.0448   Epoch: 6   Global Step: 274350   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:09,824-Speed 2616.96 samples/sec   Loss 8.9929   LearningRate 0.0448   Epoch: 6   Global Step: 274360   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:13,717-Speed 2630.81 samples/sec   Loss 8.9792   LearningRate 0.0448   Epoch: 6   Global Step: 274370   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:17,613-Speed 2628.81 samples/sec   Loss 9.0559   LearningRate 0.0448   Epoch: 6   Global Step: 274380   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:21,515-Speed 2625.70 samples/sec   Loss 9.0287   LearningRate 0.0448   Epoch: 6   Global Step: 274390   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:25,414-Speed 2627.44 samples/sec   Loss 9.1096   LearningRate 0.0448   Epoch: 6   Global Step: 274400   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:29,329-Speed 2615.95 samples/sec   Loss 8.9958   LearningRate 0.0448   Epoch: 6   Global Step: 274410   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:33:33,211-Speed 2638.54 samples/sec   Loss 9.0789   LearningRate 0.0448   Epoch: 6   Global Step: 274420   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:33:37,110-Speed 2626.89 samples/sec   Loss 9.0951   LearningRate 0.0448   Epoch: 6   Global Step: 274430   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:33:41,103-Speed 2565.45 samples/sec   Loss 9.1546   LearningRate 0.0448   Epoch: 6   Global Step: 274440   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:33:45,052-Speed 2593.24 samples/sec   Loss 9.1426   LearningRate 0.0448   Epoch: 6   Global Step: 274450   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:33:48,983-Speed 2606.36 samples/sec   Loss 9.0890   LearningRate 0.0448   Epoch: 6   Global Step: 274460   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:33:52,879-Speed 2628.61 samples/sec   Loss 9.0332   LearningRate 0.0448   Epoch: 6   Global Step: 274470   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:33:56,811-Speed 2605.93 samples/sec   Loss 9.1470   LearningRate 0.0448   Epoch: 6   Global Step: 274480   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:34:00,743-Speed 2604.61 samples/sec   Loss 9.1715   LearningRate 0.0448   Epoch: 6   Global Step: 274490   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:34:04,641-Speed 2627.53 samples/sec   Loss 9.1017   LearningRate 0.0448   Epoch: 6   Global Step: 274500   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:34:08,532-Speed 2632.63 samples/sec   Loss 9.0686   LearningRate 0.0448   Epoch: 6   Global Step: 274510   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:34:12,425-Speed 2631.13 samples/sec   Loss 9.0506   LearningRate 0.0448   Epoch: 6   Global Step: 274520   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:16,315-Speed 2632.94 samples/sec   Loss 8.9238   LearningRate 0.0448   Epoch: 6   Global Step: 274530   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:20,216-Speed 2625.81 samples/sec   Loss 8.9795   LearningRate 0.0448   Epoch: 6   Global Step: 274540   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:24,106-Speed 2632.33 samples/sec   Loss 8.8285   LearningRate 0.0448   Epoch: 6   Global Step: 274550   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:27,998-Speed 2631.98 samples/sec   Loss 8.9468   LearningRate 0.0448   Epoch: 6   Global Step: 274560   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:31,897-Speed 2631.26 samples/sec   Loss 9.0005   LearningRate 0.0448   Epoch: 6   Global Step: 274570   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:35,804-Speed 2621.72 samples/sec   Loss 9.0935   LearningRate 0.0448   Epoch: 6   Global Step: 274580   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:39,709-Speed 2623.05 samples/sec   Loss 8.9895   LearningRate 0.0448   Epoch: 6   Global Step: 274590   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:43,606-Speed 2628.12 samples/sec   Loss 9.0154   LearningRate 0.0448   Epoch: 6   Global Step: 274600   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:47,501-Speed 2630.05 samples/sec   Loss 9.1054   LearningRate 0.0448   Epoch: 6   Global Step: 274610   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:51,401-Speed 2625.99 samples/sec   Loss 9.1361   LearningRate 0.0448   Epoch: 6   Global Step: 274620   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:34:55,295-Speed 2630.44 samples/sec   Loss 9.1187   LearningRate 0.0447   Epoch: 6   Global Step: 274630   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:34:59,193-Speed 2627.53 samples/sec   Loss 9.1495   LearningRate 0.0447   Epoch: 6   Global Step: 274640   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:03,092-Speed 2627.09 samples/sec   Loss 9.1373   LearningRate 0.0447   Epoch: 6   Global Step: 274650   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:06,998-Speed 2623.86 samples/sec   Loss 9.0060   LearningRate 0.0447   Epoch: 6   Global Step: 274660   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:10,898-Speed 2627.01 samples/sec   Loss 9.1457   LearningRate 0.0447   Epoch: 6   Global Step: 274670   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:14,795-Speed 2628.02 samples/sec   Loss 8.9856   LearningRate 0.0447   Epoch: 6   Global Step: 274680   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:18,698-Speed 2624.50 samples/sec   Loss 9.1154   LearningRate 0.0447   Epoch: 6   Global Step: 274690   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:22,595-Speed 2629.13 samples/sec   Loss 9.1519   LearningRate 0.0447   Epoch: 6   Global Step: 274700   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:26,494-Speed 2626.48 samples/sec   Loss 9.0739   LearningRate 0.0447   Epoch: 6   Global Step: 274710   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:30,394-Speed 2626.61 samples/sec   Loss 9.1502   LearningRate 0.0447   Epoch: 6   Global Step: 274720   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:34,294-Speed 2626.54 samples/sec   Loss 9.1135   LearningRate 0.0447   Epoch: 6   Global Step: 274730   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:35:38,236-Speed 2598.08 samples/sec   Loss 9.0944   LearningRate 0.0447   Epoch: 6   Global Step: 274740   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:35:42,131-Speed 2629.87 samples/sec   Loss 9.1323   LearningRate 0.0447   Epoch: 6   Global Step: 274750   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:35:46,022-Speed 2632.33 samples/sec   Loss 9.1096   LearningRate 0.0447   Epoch: 6   Global Step: 274760   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:35:49,892-Speed 2646.52 samples/sec   Loss 9.0398   LearningRate 0.0447   Epoch: 6   Global Step: 274770   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:53,798-Speed 2622.29 samples/sec   Loss 8.8713   LearningRate 0.0447   Epoch: 6   Global Step: 274780   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:35:57,705-Speed 2621.67 samples/sec   Loss 9.0592   LearningRate 0.0447   Epoch: 6   Global Step: 274790   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:36:01,587-Speed 2639.06 samples/sec   Loss 9.0519   LearningRate 0.0447   Epoch: 6   Global Step: 274800   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:05,491-Speed 2623.28 samples/sec   Loss 8.9809   LearningRate 0.0447   Epoch: 6   Global Step: 274810   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:09,408-Speed 2614.68 samples/sec   Loss 9.0447   LearningRate 0.0447   Epoch: 6   Global Step: 274820   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:13,304-Speed 2629.07 samples/sec   Loss 9.0727   LearningRate 0.0447   Epoch: 6   Global Step: 274830   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:17,220-Speed 2615.58 samples/sec   Loss 9.1816   LearningRate 0.0447   Epoch: 6   Global Step: 274840   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:21,113-Speed 2631.43 samples/sec   Loss 9.0411   LearningRate 0.0447   Epoch: 6   Global Step: 274850   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:25,004-Speed 2632.11 samples/sec   Loss 8.9740   LearningRate 0.0447   Epoch: 6   Global Step: 274860   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:28,896-Speed 2631.73 samples/sec   Loss 9.0570   LearningRate 0.0447   Epoch: 6   Global Step: 274870   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:32,792-Speed 2629.19 samples/sec   Loss 9.0544   LearningRate 0.0447   Epoch: 6   Global Step: 274880   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:36:36,665-Speed 2644.48 samples/sec   Loss 8.9348   LearningRate 0.0447   Epoch: 6   Global Step: 274890   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:36:40,534-Speed 2647.69 samples/sec   Loss 9.4221   LearningRate 0.0447   Epoch: 6   Global Step: 274900   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:36:44,418-Speed 2636.79 samples/sec   Loss 9.0815   LearningRate 0.0447   Epoch: 6   Global Step: 274910   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:36:48,309-Speed 2632.46 samples/sec   Loss 9.1308   LearningRate 0.0447   Epoch: 6   Global Step: 274920   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:36:52,245-Speed 2602.84 samples/sec   Loss 9.0556   LearningRate 0.0447   Epoch: 6   Global Step: 274930   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:36:56,143-Speed 2627.46 samples/sec   Loss 9.1050   LearningRate 0.0447   Epoch: 6   Global Step: 274940   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:37:00,036-Speed 2631.08 samples/sec   Loss 9.0902   LearningRate 0.0447   Epoch: 6   Global Step: 274950   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:37:03,930-Speed 2630.05 samples/sec   Loss 9.0904   LearningRate 0.0447   Epoch: 6   Global Step: 274960   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:37:07,822-Speed 2631.74 samples/sec   Loss 9.0725   LearningRate 0.0447   Epoch: 6   Global Step: 274970   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:37:11,718-Speed 2628.74 samples/sec   Loss 9.0295   LearningRate 0.0447   Epoch: 6   Global Step: 274980   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:37:15,619-Speed 2625.92 samples/sec   Loss 9.0278   LearningRate 0.0447   Epoch: 6   Global Step: 274990   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:37:19,514-Speed 2629.48 samples/sec   Loss 9.0413   LearningRate 0.0447   Epoch: 6   Global Step: 275000   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:23,408-Speed 2630.49 samples/sec   Loss 8.9908   LearningRate 0.0447   Epoch: 6   Global Step: 275010   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:27,307-Speed 2626.83 samples/sec   Loss 9.1073   LearningRate 0.0447   Epoch: 6   Global Step: 275020   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:31,211-Speed 2623.83 samples/sec   Loss 9.0880   LearningRate 0.0447   Epoch: 6   Global Step: 275030   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:35,105-Speed 2630.40 samples/sec   Loss 9.0310   LearningRate 0.0447   Epoch: 6   Global Step: 275040   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:39,025-Speed 2612.69 samples/sec   Loss 8.9886   LearningRate 0.0447   Epoch: 6   Global Step: 275050   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:42,990-Speed 2582.88 samples/sec   Loss 9.1065   LearningRate 0.0447   Epoch: 6   Global Step: 275060   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:46,882-Speed 2632.43 samples/sec   Loss 9.1028   LearningRate 0.0447   Epoch: 6   Global Step: 275070   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:50,776-Speed 2629.92 samples/sec   Loss 8.9660   LearningRate 0.0447   Epoch: 6   Global Step: 275080   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:54,666-Speed 2633.46 samples/sec   Loss 9.0888   LearningRate 0.0447   Epoch: 6   Global Step: 275090   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:37:58,557-Speed 2632.07 samples/sec   Loss 9.0976   LearningRate 0.0447   Epoch: 6   Global Step: 275100   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:02,451-Speed 2630.88 samples/sec   Loss 9.0992   LearningRate 0.0447   Epoch: 6   Global Step: 275110   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:06,358-Speed 2621.35 samples/sec   Loss 9.0682   LearningRate 0.0447   Epoch: 6   Global Step: 275120   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:10,254-Speed 2628.78 samples/sec   Loss 9.1132   LearningRate 0.0447   Epoch: 6   Global Step: 275130   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:14,141-Speed 2634.96 samples/sec   Loss 9.1619   LearningRate 0.0447   Epoch: 6   Global Step: 275140   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:18,034-Speed 2631.02 samples/sec   Loss 9.2207   LearningRate 0.0447   Epoch: 6   Global Step: 275150   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:21,925-Speed 2633.02 samples/sec   Loss 9.0451   LearningRate 0.0447   Epoch: 6   Global Step: 275160   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:25,826-Speed 2625.42 samples/sec   Loss 8.9939   LearningRate 0.0447   Epoch: 6   Global Step: 275170   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:29,726-Speed 2626.55 samples/sec   Loss 9.0324   LearningRate 0.0447   Epoch: 6   Global Step: 275180   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:33,616-Speed 2632.94 samples/sec   Loss 9.0204   LearningRate 0.0447   Epoch: 6   Global Step: 275190   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:38:37,514-Speed 2627.38 samples/sec   Loss 9.0423   LearningRate 0.0447   Epoch: 6   Global Step: 275200   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:38:41,413-Speed 2627.51 samples/sec   Loss 9.1613   LearningRate 0.0447   Epoch: 6   Global Step: 275210   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:38:45,307-Speed 2630.05 samples/sec   Loss 9.2489   LearningRate 0.0447   Epoch: 6   Global Step: 275220   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:38:49,246-Speed 2600.47 samples/sec   Loss 9.1214   LearningRate 0.0447   Epoch: 6   Global Step: 275230   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:38:53,148-Speed 2625.02 samples/sec   Loss 8.9993   LearningRate 0.0447   Epoch: 6   Global Step: 275240   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:38:57,050-Speed 2625.18 samples/sec   Loss 9.0327   LearningRate 0.0446   Epoch: 6   Global Step: 275250   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:39:00,965-Speed 2616.11 samples/sec   Loss 9.0799   LearningRate 0.0446   Epoch: 6   Global Step: 275260   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:39:04,889-Speed 2610.37 samples/sec   Loss 8.9443   LearningRate 0.0446   Epoch: 6   Global Step: 275270   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:39:08,779-Speed 2632.72 samples/sec   Loss 9.1322   LearningRate 0.0446   Epoch: 6   Global Step: 275280   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:39:12,693-Speed 2617.19 samples/sec   Loss 9.0330   LearningRate 0.0446   Epoch: 6   Global Step: 275290   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:39:16,594-Speed 2625.21 samples/sec   Loss 9.0024   LearningRate 0.0446   Epoch: 6   Global Step: 275300   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:20,493-Speed 2627.28 samples/sec   Loss 9.0026   LearningRate 0.0446   Epoch: 6   Global Step: 275310   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:24,393-Speed 2626.67 samples/sec   Loss 9.1172   LearningRate 0.0446   Epoch: 6   Global Step: 275320   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:28,299-Speed 2621.70 samples/sec   Loss 9.1167   LearningRate 0.0446   Epoch: 6   Global Step: 275330   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:32,198-Speed 2627.16 samples/sec   Loss 9.0252   LearningRate 0.0446   Epoch: 6   Global Step: 275340   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:36,094-Speed 2629.02 samples/sec   Loss 9.1314   LearningRate 0.0446   Epoch: 6   Global Step: 275350   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:39,986-Speed 2631.51 samples/sec   Loss 9.1173   LearningRate 0.0446   Epoch: 6   Global Step: 275360   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:43,876-Speed 2633.01 samples/sec   Loss 9.0741   LearningRate 0.0446   Epoch: 6   Global Step: 275370   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:47,766-Speed 2632.67 samples/sec   Loss 8.9137   LearningRate 0.0446   Epoch: 6   Global Step: 275380   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:51,668-Speed 2625.05 samples/sec   Loss 9.1454   LearningRate 0.0446   Epoch: 6   Global Step: 275390   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:39:55,565-Speed 2628.15 samples/sec   Loss 9.1138   LearningRate 0.0446   Epoch: 6   Global Step: 275400   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:39:59,446-Speed 2639.75 samples/sec   Loss 9.0892   LearningRate 0.0446   Epoch: 6   Global Step: 275410   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:03,339-Speed 2630.18 samples/sec   Loss 9.1725   LearningRate 0.0446   Epoch: 6   Global Step: 275420   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:07,230-Speed 2633.01 samples/sec   Loss 9.0755   LearningRate 0.0446   Epoch: 6   Global Step: 275430   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:11,117-Speed 2634.69 samples/sec   Loss 8.9741   LearningRate 0.0446   Epoch: 6   Global Step: 275440   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:15,006-Speed 2633.79 samples/sec   Loss 8.9143   LearningRate 0.0446   Epoch: 6   Global Step: 275450   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:18,897-Speed 2632.09 samples/sec   Loss 9.0316   LearningRate 0.0446   Epoch: 6   Global Step: 275460   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:22,788-Speed 2632.70 samples/sec   Loss 9.0079   LearningRate 0.0446   Epoch: 6   Global Step: 275470   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:26,676-Speed 2634.29 samples/sec   Loss 8.9824   LearningRate 0.0446   Epoch: 6   Global Step: 275480   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:30,565-Speed 2633.31 samples/sec   Loss 9.0314   LearningRate 0.0446   Epoch: 6   Global Step: 275490   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:34,461-Speed 2629.18 samples/sec   Loss 9.1013   LearningRate 0.0446   Epoch: 6   Global Step: 275500   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:38,336-Speed 2642.62 samples/sec   Loss 9.0099   LearningRate 0.0446   Epoch: 6   Global Step: 275510   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:42,237-Speed 2625.98 samples/sec   Loss 8.8642   LearningRate 0.0446   Epoch: 6   Global Step: 275520   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:46,131-Speed 2630.37 samples/sec   Loss 9.0474   LearningRate 0.0446   Epoch: 6   Global Step: 275530   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:50,025-Speed 2630.48 samples/sec   Loss 9.0593   LearningRate 0.0446   Epoch: 6   Global Step: 275540   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:53,922-Speed 2634.38 samples/sec   Loss 9.0398   LearningRate 0.0446   Epoch: 6   Global Step: 275550   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:40:57,831-Speed 2620.20 samples/sec   Loss 9.0580   LearningRate 0.0446   Epoch: 6   Global Step: 275560   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:01,722-Speed 2631.78 samples/sec   Loss 9.0585   LearningRate 0.0446   Epoch: 6   Global Step: 275570   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:05,619-Speed 2628.57 samples/sec   Loss 8.9013   LearningRate 0.0446   Epoch: 6   Global Step: 275580   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:09,513-Speed 2630.02 samples/sec   Loss 9.0767   LearningRate 0.0446   Epoch: 6   Global Step: 275590   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:13,406-Speed 2630.80 samples/sec   Loss 8.7813   LearningRate 0.0446   Epoch: 6   Global Step: 275600   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:17,297-Speed 2632.75 samples/sec   Loss 9.0610   LearningRate 0.0446   Epoch: 6   Global Step: 275610   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:41:21,190-Speed 2630.57 samples/sec   Loss 8.8983   LearningRate 0.0446   Epoch: 6   Global Step: 275620   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:41:25,166-Speed 2575.80 samples/sec   Loss 9.0037   LearningRate 0.0446   Epoch: 6   Global Step: 275630   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:41:29,052-Speed 2636.23 samples/sec   Loss 9.0879   LearningRate 0.0446   Epoch: 6   Global Step: 275640   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:32,966-Speed 2616.75 samples/sec   Loss 9.0606   LearningRate 0.0446   Epoch: 6   Global Step: 275650   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:36,876-Speed 2619.46 samples/sec   Loss 8.9662   LearningRate 0.0446   Epoch: 6   Global Step: 275660   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:40,792-Speed 2615.26 samples/sec   Loss 8.9947   LearningRate 0.0446   Epoch: 6   Global Step: 275670   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:44,703-Speed 2618.96 samples/sec   Loss 9.1233   LearningRate 0.0446   Epoch: 6   Global Step: 275680   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:41:48,599-Speed 2629.16 samples/sec   Loss 9.0191   LearningRate 0.0446   Epoch: 6   Global Step: 275690   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:41:52,516-Speed 2615.00 samples/sec   Loss 8.9448   LearningRate 0.0446   Epoch: 6   Global Step: 275700   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:41:56,410-Speed 2630.04 samples/sec   Loss 9.0298   LearningRate 0.0446   Epoch: 6   Global Step: 275710   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:00,303-Speed 2630.99 samples/sec   Loss 8.8883   LearningRate 0.0446   Epoch: 6   Global Step: 275720   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:04,205-Speed 2625.06 samples/sec   Loss 8.9870   LearningRate 0.0446   Epoch: 6   Global Step: 275730   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:08,096-Speed 2632.07 samples/sec   Loss 8.9914   LearningRate 0.0446   Epoch: 6   Global Step: 275740   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:11,985-Speed 2633.66 samples/sec   Loss 8.9403   LearningRate 0.0446   Epoch: 6   Global Step: 275750   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:15,874-Speed 2634.01 samples/sec   Loss 9.0478   LearningRate 0.0446   Epoch: 6   Global Step: 275760   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:19,763-Speed 2633.29 samples/sec   Loss 8.9611   LearningRate 0.0446   Epoch: 6   Global Step: 275770   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:23,654-Speed 2632.64 samples/sec   Loss 9.0574   LearningRate 0.0446   Epoch: 6   Global Step: 275780   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:42:27,544-Speed 2632.75 samples/sec   Loss 9.0086   LearningRate 0.0446   Epoch: 6   Global Step: 275790   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:31,434-Speed 2633.04 samples/sec   Loss 9.1585   LearningRate 0.0446   Epoch: 6   Global Step: 275800   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:35,323-Speed 2633.24 samples/sec   Loss 9.0164   LearningRate 0.0446   Epoch: 6   Global Step: 275810   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:39,218-Speed 2629.67 samples/sec   Loss 8.9396   LearningRate 0.0446   Epoch: 6   Global Step: 275820   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:43,109-Speed 2632.07 samples/sec   Loss 9.0792   LearningRate 0.0446   Epoch: 6   Global Step: 275830   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:47,008-Speed 2627.52 samples/sec   Loss 8.9270   LearningRate 0.0446   Epoch: 6   Global Step: 275840   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:50,903-Speed 2629.28 samples/sec   Loss 8.8109   LearningRate 0.0446   Epoch: 6   Global Step: 275850   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:54,808-Speed 2623.58 samples/sec   Loss 8.9965   LearningRate 0.0446   Epoch: 6   Global Step: 275860   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:42:58,704-Speed 2628.97 samples/sec   Loss 9.1074   LearningRate 0.0446   Epoch: 6   Global Step: 275870   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:02,698-Speed 2564.91 samples/sec   Loss 9.2022   LearningRate 0.0445   Epoch: 6   Global Step: 275880   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:06,581-Speed 2637.29 samples/sec   Loss 9.1152   LearningRate 0.0445   Epoch: 6   Global Step: 275890   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:43:10,479-Speed 2627.50 samples/sec   Loss 9.1229   LearningRate 0.0445   Epoch: 6   Global Step: 275900   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:43:14,359-Speed 2639.80 samples/sec   Loss 9.0744   LearningRate 0.0445   Epoch: 6   Global Step: 275910   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:18,249-Speed 2632.69 samples/sec   Loss 9.0938   LearningRate 0.0445   Epoch: 6   Global Step: 275920   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:22,146-Speed 2628.38 samples/sec   Loss 9.1223   LearningRate 0.0445   Epoch: 6   Global Step: 275930   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:26,044-Speed 2627.69 samples/sec   Loss 8.9926   LearningRate 0.0445   Epoch: 6   Global Step: 275940   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:29,941-Speed 2630.28 samples/sec   Loss 9.1735   LearningRate 0.0445   Epoch: 6   Global Step: 275950   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:33,833-Speed 2631.39 samples/sec   Loss 9.0263   LearningRate 0.0445   Epoch: 6   Global Step: 275960   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:37,728-Speed 2628.99 samples/sec   Loss 9.0667   LearningRate 0.0445   Epoch: 6   Global Step: 275970   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:41,625-Speed 2628.68 samples/sec   Loss 9.0150   LearningRate 0.0445   Epoch: 6   Global Step: 275980   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:45,519-Speed 2630.12 samples/sec   Loss 8.9241   LearningRate 0.0445   Epoch: 6   Global Step: 275990   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:43:49,397-Speed 2641.11 samples/sec   Loss 8.9738   LearningRate 0.0445   Epoch: 6   Global Step: 276000   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:43:53,307-Speed 2619.36 samples/sec   Loss 9.0757   LearningRate 0.0445   Epoch: 6   Global Step: 276010   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:43:57,197-Speed 2633.22 samples/sec   Loss 8.8957   LearningRate 0.0445   Epoch: 6   Global Step: 276020   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:44:01,116-Speed 2613.49 samples/sec   Loss 9.0034   LearningRate 0.0445   Epoch: 6   Global Step: 276030   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:44:05,014-Speed 2627.98 samples/sec   Loss 9.1013   LearningRate 0.0445   Epoch: 6   Global Step: 276040   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:44:08,913-Speed 2626.37 samples/sec   Loss 8.9554   LearningRate 0.0445   Epoch: 6   Global Step: 276050   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:44:12,801-Speed 2634.63 samples/sec   Loss 8.9608   LearningRate 0.0445   Epoch: 6   Global Step: 276060   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:44:16,686-Speed 2636.44 samples/sec   Loss 9.9828   LearningRate 0.0445   Epoch: 6   Global Step: 276070   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:20,576-Speed 2632.64 samples/sec   Loss 9.2565   LearningRate 0.0445   Epoch: 6   Global Step: 276080   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:24,468-Speed 2631.42 samples/sec   Loss 8.9631   LearningRate 0.0445   Epoch: 6   Global Step: 276090   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:28,360-Speed 2632.18 samples/sec   Loss 8.9721   LearningRate 0.0445   Epoch: 6   Global Step: 276100   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:32,251-Speed 2631.66 samples/sec   Loss 9.0377   LearningRate 0.0445   Epoch: 6   Global Step: 276110   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:36,142-Speed 2632.75 samples/sec   Loss 9.0455   LearningRate 0.0445   Epoch: 6   Global Step: 276120   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:40,032-Speed 2633.22 samples/sec   Loss 9.2326   LearningRate 0.0445   Epoch: 6   Global Step: 276130   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:43,923-Speed 2632.23 samples/sec   Loss 8.9542   LearningRate 0.0445   Epoch: 6   Global Step: 276140   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:47,818-Speed 2629.54 samples/sec   Loss 8.9695   LearningRate 0.0445   Epoch: 6   Global Step: 276150   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:51,708-Speed 2633.06 samples/sec   Loss 9.0648   LearningRate 0.0445   Epoch: 6   Global Step: 276160   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:44:55,603-Speed 2629.21 samples/sec   Loss 8.8742   LearningRate 0.0445   Epoch: 6   Global Step: 276170   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:44:59,507-Speed 2623.75 samples/sec   Loss 9.0309   LearningRate 0.0445   Epoch: 6   Global Step: 276180   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:03,399-Speed 2631.52 samples/sec   Loss 8.9437   LearningRate 0.0445   Epoch: 6   Global Step: 276190   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:07,293-Speed 2630.78 samples/sec   Loss 9.1598   LearningRate 0.0445   Epoch: 6   Global Step: 276200   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:11,190-Speed 2628.31 samples/sec   Loss 9.1717   LearningRate 0.0445   Epoch: 6   Global Step: 276210   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:15,081-Speed 2632.32 samples/sec   Loss 9.1189   LearningRate 0.0445   Epoch: 6   Global Step: 276220   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:18,974-Speed 2630.92 samples/sec   Loss 8.9501   LearningRate 0.0445   Epoch: 6   Global Step: 276230   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:22,866-Speed 2631.92 samples/sec   Loss 9.0750   LearningRate 0.0445   Epoch: 6   Global Step: 276240   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:26,760-Speed 2630.35 samples/sec   Loss 9.0945   LearningRate 0.0445   Epoch: 6   Global Step: 276250   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:30,651-Speed 2632.18 samples/sec   Loss 9.1089   LearningRate 0.0445   Epoch: 6   Global Step: 276260   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:45:34,541-Speed 2632.59 samples/sec   Loss 8.8600   LearningRate 0.0445   Epoch: 6   Global Step: 276270   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:45:38,442-Speed 2625.46 samples/sec   Loss 9.0019   LearningRate 0.0445   Epoch: 6   Global Step: 276280   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:45:42,334-Speed 2631.58 samples/sec   Loss 8.9781   LearningRate 0.0445   Epoch: 6   Global Step: 276290   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:45:46,229-Speed 2629.91 samples/sec   Loss 9.0723   LearningRate 0.0445   Epoch: 6   Global Step: 276300   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:45:50,120-Speed 2632.90 samples/sec   Loss 9.0380   LearningRate 0.0445   Epoch: 6   Global Step: 276310   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:45:54,017-Speed 2628.09 samples/sec   Loss 9.2052   LearningRate 0.0445   Epoch: 6   Global Step: 276320   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:45:57,909-Speed 2632.05 samples/sec   Loss 8.9615   LearningRate 0.0445   Epoch: 6   Global Step: 276330   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:01,812-Speed 2624.19 samples/sec   Loss 9.0832   LearningRate 0.0445   Epoch: 6   Global Step: 276340   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:05,707-Speed 2629.58 samples/sec   Loss 9.0658   LearningRate 0.0445   Epoch: 6   Global Step: 276350   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:09,601-Speed 2630.73 samples/sec   Loss 8.9684   LearningRate 0.0445   Epoch: 6   Global Step: 276360   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:13,494-Speed 2630.67 samples/sec   Loss 8.9288   LearningRate 0.0445   Epoch: 6   Global Step: 276370   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:46:17,389-Speed 2629.22 samples/sec   Loss 8.9473   LearningRate 0.0445   Epoch: 6   Global Step: 276380   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:46:21,274-Speed 2636.59 samples/sec   Loss 9.0991   LearningRate 0.0445   Epoch: 6   Global Step: 276390   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:25,167-Speed 2630.77 samples/sec   Loss 8.9935   LearningRate 0.0445   Epoch: 6   Global Step: 276400   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:29,055-Speed 2634.03 samples/sec   Loss 8.9068   LearningRate 0.0445   Epoch: 6   Global Step: 276410   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:32,947-Speed 2632.36 samples/sec   Loss 8.9687   LearningRate 0.0445   Epoch: 6   Global Step: 276420   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:36,837-Speed 2632.90 samples/sec   Loss 9.0202   LearningRate 0.0445   Epoch: 6   Global Step: 276430   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:46:40,719-Speed 2638.80 samples/sec   Loss 8.9308   LearningRate 0.0445   Epoch: 6   Global Step: 276440   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:46:44,614-Speed 2628.92 samples/sec   Loss 9.0477   LearningRate 0.0445   Epoch: 6   Global Step: 276450   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:46:48,505-Speed 2632.62 samples/sec   Loss 9.0896   LearningRate 0.0445   Epoch: 6   Global Step: 276460   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:46:52,395-Speed 2633.03 samples/sec   Loss 8.8678   LearningRate 0.0445   Epoch: 6   Global Step: 276470   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:46:56,288-Speed 2630.86 samples/sec   Loss 8.9724   LearningRate 0.0445   Epoch: 6   Global Step: 276480   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:47:00,182-Speed 2630.27 samples/sec   Loss 9.0040   LearningRate 0.0445   Epoch: 6   Global Step: 276490   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:47:04,087-Speed 2622.89 samples/sec   Loss 9.0540   LearningRate 0.0444   Epoch: 6   Global Step: 276500   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:47:07,974-Speed 2634.71 samples/sec   Loss 8.9655   LearningRate 0.0444   Epoch: 6   Global Step: 276510   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:47:11,880-Speed 2622.58 samples/sec   Loss 9.0025   LearningRate 0.0444   Epoch: 6   Global Step: 276520   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:47:15,784-Speed 2623.17 samples/sec   Loss 8.9154   LearningRate 0.0444   Epoch: 6   Global Step: 276530   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:47:19,694-Speed 2619.98 samples/sec   Loss 8.8858   LearningRate 0.0444   Epoch: 6   Global Step: 276540   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:23,601-Speed 2622.09 samples/sec   Loss 9.1899   LearningRate 0.0444   Epoch: 6   Global Step: 276550   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:27,505-Speed 2623.15 samples/sec   Loss 9.0248   LearningRate 0.0444   Epoch: 6   Global Step: 276560   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:31,402-Speed 2628.35 samples/sec   Loss 9.1546   LearningRate 0.0444   Epoch: 6   Global Step: 276570   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:35,307-Speed 2622.67 samples/sec   Loss 9.1235   LearningRate 0.0444   Epoch: 6   Global Step: 276580   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:39,203-Speed 2628.70 samples/sec   Loss 8.9532   LearningRate 0.0444   Epoch: 6   Global Step: 276590   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:43,095-Speed 2631.77 samples/sec   Loss 8.9852   LearningRate 0.0444   Epoch: 6   Global Step: 276600   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:46,985-Speed 2632.83 samples/sec   Loss 9.0151   LearningRate 0.0444   Epoch: 6   Global Step: 276610   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:50,883-Speed 2627.96 samples/sec   Loss 9.1891   LearningRate 0.0444   Epoch: 6   Global Step: 276620   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:54,799-Speed 2615.57 samples/sec   Loss 9.1338   LearningRate 0.0444   Epoch: 6   Global Step: 276630   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:47:58,714-Speed 2616.12 samples/sec   Loss 8.9941   LearningRate 0.0444   Epoch: 6   Global Step: 276640   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:02,615-Speed 2625.93 samples/sec   Loss 9.0463   LearningRate 0.0444   Epoch: 6   Global Step: 276650   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:06,511-Speed 2628.15 samples/sec   Loss 8.9332   LearningRate 0.0444   Epoch: 6   Global Step: 276660   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:10,406-Speed 2630.10 samples/sec   Loss 9.0515   LearningRate 0.0444   Epoch: 6   Global Step: 276670   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:14,301-Speed 2629.04 samples/sec   Loss 9.1195   LearningRate 0.0444   Epoch: 6   Global Step: 276680   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:18,213-Speed 2618.06 samples/sec   Loss 9.0538   LearningRate 0.0444   Epoch: 6   Global Step: 276690   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:22,112-Speed 2627.27 samples/sec   Loss 9.0545   LearningRate 0.0444   Epoch: 6   Global Step: 276700   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:26,009-Speed 2628.08 samples/sec   Loss 8.9145   LearningRate 0.0444   Epoch: 6   Global Step: 276710   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:29,899-Speed 2633.30 samples/sec   Loss 8.9627   LearningRate 0.0444   Epoch: 6   Global Step: 276720   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:33,791-Speed 2631.64 samples/sec   Loss 9.1191   LearningRate 0.0444   Epoch: 6   Global Step: 276730   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:37,683-Speed 2631.85 samples/sec   Loss 8.9794   LearningRate 0.0444   Epoch: 6   Global Step: 276740   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:48:41,555-Speed 2644.88 samples/sec   Loss 8.9992   LearningRate 0.0444   Epoch: 6   Global Step: 276750   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:45,448-Speed 2631.12 samples/sec   Loss 9.0680   LearningRate 0.0444   Epoch: 6   Global Step: 276760   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:49,339-Speed 2632.20 samples/sec   Loss 9.0591   LearningRate 0.0444   Epoch: 6   Global Step: 276770   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:53,229-Speed 2632.79 samples/sec   Loss 9.1203   LearningRate 0.0444   Epoch: 6   Global Step: 276780   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:48:57,119-Speed 2632.87 samples/sec   Loss 8.9693   LearningRate 0.0444   Epoch: 6   Global Step: 276790   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:01,006-Speed 2635.32 samples/sec   Loss 9.0464   LearningRate 0.0444   Epoch: 6   Global Step: 276800   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:04,913-Speed 2621.57 samples/sec   Loss 9.1247   LearningRate 0.0444   Epoch: 6   Global Step: 276810   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:08,810-Speed 2628.04 samples/sec   Loss 8.9989   LearningRate 0.0444   Epoch: 6   Global Step: 276820   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:12,703-Speed 2630.97 samples/sec   Loss 9.0413   LearningRate 0.0444   Epoch: 6   Global Step: 276830   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:16,595-Speed 2631.67 samples/sec   Loss 9.0566   LearningRate 0.0444   Epoch: 6   Global Step: 276840   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:20,488-Speed 2630.67 samples/sec   Loss 9.0227   LearningRate 0.0444   Epoch: 6   Global Step: 276850   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:49:24,391-Speed 2624.73 samples/sec   Loss 9.0557   LearningRate 0.0444   Epoch: 6   Global Step: 276860   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:28,292-Speed 2625.36 samples/sec   Loss 9.0833   LearningRate 0.0444   Epoch: 6   Global Step: 276870   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:32,187-Speed 2629.46 samples/sec   Loss 9.0066   LearningRate 0.0444   Epoch: 6   Global Step: 276880   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:36,096-Speed 2620.24 samples/sec   Loss 8.9822   LearningRate 0.0444   Epoch: 6   Global Step: 276890   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:39,997-Speed 2625.37 samples/sec   Loss 9.0291   LearningRate 0.0444   Epoch: 6   Global Step: 276900   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:43,928-Speed 2605.37 samples/sec   Loss 9.0104   LearningRate 0.0444   Epoch: 6   Global Step: 276910   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:47,831-Speed 2625.02 samples/sec   Loss 8.9314   LearningRate 0.0444   Epoch: 6   Global Step: 276920   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:51,730-Speed 2626.55 samples/sec   Loss 9.0053   LearningRate 0.0444   Epoch: 6   Global Step: 276930   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:55,627-Speed 2627.98 samples/sec   Loss 8.9029   LearningRate 0.0444   Epoch: 6   Global Step: 276940   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:49:59,529-Speed 2625.25 samples/sec   Loss 8.8978   LearningRate 0.0444   Epoch: 6   Global Step: 276950   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:50:03,435-Speed 2622.07 samples/sec   Loss 9.0802   LearningRate 0.0444   Epoch: 6   Global Step: 276960   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 02:50:07,320-Speed 2636.53 samples/sec   Loss 9.0903   LearningRate 0.0444   Epoch: 6   Global Step: 276970   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:50:11,217-Speed 2627.49 samples/sec   Loss 8.9225   LearningRate 0.0444   Epoch: 6   Global Step: 276980   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:50:15,094-Speed 2642.19 samples/sec   Loss 8.9467   LearningRate 0.0444   Epoch: 6   Global Step: 276990   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:50:19,000-Speed 2622.15 samples/sec   Loss 8.9133   LearningRate 0.0444   Epoch: 6   Global Step: 277000   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:50:22,853-Speed 2658.47 samples/sec   Loss 10.4272   LearningRate 0.0444   Epoch: 6   Global Step: 277010   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:26,761-Speed 2620.95 samples/sec   Loss 9.7567   LearningRate 0.0444   Epoch: 6   Global Step: 277020   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:30,654-Speed 2630.85 samples/sec   Loss 9.3507   LearningRate 0.0444   Epoch: 6   Global Step: 277030   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:34,543-Speed 2633.59 samples/sec   Loss 9.1629   LearningRate 0.0444   Epoch: 6   Global Step: 277040   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:38,462-Speed 2614.05 samples/sec   Loss 9.1675   LearningRate 0.0444   Epoch: 6   Global Step: 277050   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:42,384-Speed 2611.30 samples/sec   Loss 9.0715   LearningRate 0.0444   Epoch: 6   Global Step: 277060   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:46,280-Speed 2628.42 samples/sec   Loss 9.0038   LearningRate 0.0444   Epoch: 6   Global Step: 277070   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:50,179-Speed 2626.71 samples/sec   Loss 9.0204   LearningRate 0.0444   Epoch: 6   Global Step: 277080   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:54,083-Speed 2623.37 samples/sec   Loss 8.9527   LearningRate 0.0444   Epoch: 6   Global Step: 277090   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:50:58,011-Speed 2607.70 samples/sec   Loss 9.1512   LearningRate 0.0444   Epoch: 6   Global Step: 277100   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:51:01,910-Speed 2626.98 samples/sec   Loss 8.9931   LearningRate 0.0444   Epoch: 6   Global Step: 277110   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:05,814-Speed 2623.90 samples/sec   Loss 9.0037   LearningRate 0.0443   Epoch: 6   Global Step: 277120   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:09,713-Speed 2626.53 samples/sec   Loss 9.0782   LearningRate 0.0443   Epoch: 6   Global Step: 277130   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:13,614-Speed 2625.91 samples/sec   Loss 9.0677   LearningRate 0.0443   Epoch: 6   Global Step: 277140   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:17,517-Speed 2623.88 samples/sec   Loss 9.0674   LearningRate 0.0443   Epoch: 6   Global Step: 277150   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:21,417-Speed 2626.19 samples/sec   Loss 9.0026   LearningRate 0.0443   Epoch: 6   Global Step: 277160   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:25,319-Speed 2624.72 samples/sec   Loss 8.9905   LearningRate 0.0443   Epoch: 6   Global Step: 277170   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:29,220-Speed 2625.82 samples/sec   Loss 8.9556   LearningRate 0.0443   Epoch: 6   Global Step: 277180   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:33,113-Speed 2630.60 samples/sec   Loss 8.9209   LearningRate 0.0443   Epoch: 6   Global Step: 277190   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:37,022-Speed 2620.65 samples/sec   Loss 8.9214   LearningRate 0.0443   Epoch: 6   Global Step: 277200   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:51:40,922-Speed 2626.53 samples/sec   Loss 9.0091   LearningRate 0.0443   Epoch: 6   Global Step: 277210   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:51:44,832-Speed 2619.22 samples/sec   Loss 9.1305   LearningRate 0.0443   Epoch: 6   Global Step: 277220   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:51:48,730-Speed 2627.41 samples/sec   Loss 9.0519   LearningRate 0.0443   Epoch: 6   Global Step: 277230   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:51:52,629-Speed 2627.51 samples/sec   Loss 9.0510   LearningRate 0.0443   Epoch: 6   Global Step: 277240   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:51:56,524-Speed 2629.27 samples/sec   Loss 9.0653   LearningRate 0.0443   Epoch: 6   Global Step: 277250   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:52:00,426-Speed 2624.64 samples/sec   Loss 8.9521   LearningRate 0.0443   Epoch: 6   Global Step: 277260   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:52:04,347-Speed 2612.06 samples/sec   Loss 9.1505   LearningRate 0.0443   Epoch: 6   Global Step: 277270   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:52:08,238-Speed 2632.46 samples/sec   Loss 8.9143   LearningRate 0.0443   Epoch: 6   Global Step: 277280   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:52:12,131-Speed 2630.94 samples/sec   Loss 8.9926   LearningRate 0.0443   Epoch: 6   Global Step: 277290   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:52:16,026-Speed 2629.57 samples/sec   Loss 8.9500   LearningRate 0.0443   Epoch: 6   Global Step: 277300   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:52:19,921-Speed 2630.04 samples/sec   Loss 9.1072   LearningRate 0.0443   Epoch: 6   Global Step: 277310   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:23,809-Speed 2634.42 samples/sec   Loss 8.9963   LearningRate 0.0443   Epoch: 6   Global Step: 277320   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:27,700-Speed 2633.19 samples/sec   Loss 9.1754   LearningRate 0.0443   Epoch: 6   Global Step: 277330   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:31,589-Speed 2633.09 samples/sec   Loss 9.1734   LearningRate 0.0443   Epoch: 6   Global Step: 277340   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:35,494-Speed 2623.12 samples/sec   Loss 8.8445   LearningRate 0.0443   Epoch: 6   Global Step: 277350   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:39,381-Speed 2634.24 samples/sec   Loss 8.9494   LearningRate 0.0443   Epoch: 6   Global Step: 277360   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:43,274-Speed 2631.33 samples/sec   Loss 8.9140   LearningRate 0.0443   Epoch: 6   Global Step: 277370   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:47,162-Speed 2634.18 samples/sec   Loss 8.9851   LearningRate 0.0443   Epoch: 6   Global Step: 277380   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:51,063-Speed 2625.97 samples/sec   Loss 8.9002   LearningRate 0.0443   Epoch: 6   Global Step: 277390   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:54,967-Speed 2623.29 samples/sec   Loss 8.9900   LearningRate 0.0443   Epoch: 6   Global Step: 277400   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:52:58,883-Speed 2615.40 samples/sec   Loss 9.0751   LearningRate 0.0443   Epoch: 6   Global Step: 277410   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:02,778-Speed 2629.34 samples/sec   Loss 9.0031   LearningRate 0.0443   Epoch: 6   Global Step: 277420   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:06,669-Speed 2632.48 samples/sec   Loss 8.9778   LearningRate 0.0443   Epoch: 6   Global Step: 277430   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:10,565-Speed 2629.19 samples/sec   Loss 9.0595   LearningRate 0.0443   Epoch: 6   Global Step: 277440   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:14,457-Speed 2631.37 samples/sec   Loss 9.0309   LearningRate 0.0443   Epoch: 6   Global Step: 277450   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:18,359-Speed 2624.80 samples/sec   Loss 9.0122   LearningRate 0.0443   Epoch: 6   Global Step: 277460   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:22,256-Speed 2628.03 samples/sec   Loss 9.0936   LearningRate 0.0443   Epoch: 6   Global Step: 277470   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:26,158-Speed 2625.65 samples/sec   Loss 9.0688   LearningRate 0.0443   Epoch: 6   Global Step: 277480   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:53:30,046-Speed 2633.92 samples/sec   Loss 9.0156   LearningRate 0.0443   Epoch: 6   Global Step: 277490   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:53:33,842-Speed 2698.51 samples/sec   Loss 9.5045   LearningRate 0.0443   Epoch: 6   Global Step: 277500   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:53:37,742-Speed 2626.48 samples/sec   Loss 9.6891   LearningRate 0.0443   Epoch: 6   Global Step: 277510   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:53:41,641-Speed 2626.73 samples/sec   Loss 9.4107   LearningRate 0.0443   Epoch: 6   Global Step: 277520   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:53:45,544-Speed 2624.02 samples/sec   Loss 9.0700   LearningRate 0.0443   Epoch: 6   Global Step: 277530   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:53:49,456-Speed 2618.11 samples/sec   Loss 9.1172   LearningRate 0.0443   Epoch: 6   Global Step: 277540   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:53:53,412-Speed 2588.92 samples/sec   Loss 8.9188   LearningRate 0.0443   Epoch: 6   Global Step: 277550   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:53:57,313-Speed 2626.06 samples/sec   Loss 8.9332   LearningRate 0.0443   Epoch: 6   Global Step: 277560   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:54:01,203-Speed 2632.66 samples/sec   Loss 9.1221   LearningRate 0.0443   Epoch: 6   Global Step: 277570   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:54:05,094-Speed 2632.27 samples/sec   Loss 9.0326   LearningRate 0.0443   Epoch: 6   Global Step: 277580   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:54:09,020-Speed 2608.88 samples/sec   Loss 9.1520   LearningRate 0.0443   Epoch: 6   Global Step: 277590   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 02:54:12,914-Speed 2630.35 samples/sec   Loss 9.1262   LearningRate 0.0443   Epoch: 6   Global Step: 277600   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:16,801-Speed 2635.28 samples/sec   Loss 8.9184   LearningRate 0.0443   Epoch: 6   Global Step: 277610   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:20,689-Speed 2634.34 samples/sec   Loss 8.8883   LearningRate 0.0443   Epoch: 6   Global Step: 277620   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:24,587-Speed 2627.68 samples/sec   Loss 8.9355   LearningRate 0.0443   Epoch: 6   Global Step: 277630   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:28,473-Speed 2635.79 samples/sec   Loss 8.8760   LearningRate 0.0443   Epoch: 6   Global Step: 277640   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:32,378-Speed 2622.48 samples/sec   Loss 8.9678   LearningRate 0.0443   Epoch: 6   Global Step: 277650   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:36,267-Speed 2633.73 samples/sec   Loss 9.0321   LearningRate 0.0443   Epoch: 6   Global Step: 277660   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:40,155-Speed 2634.10 samples/sec   Loss 9.0278   LearningRate 0.0443   Epoch: 6   Global Step: 277670   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:44,046-Speed 2632.12 samples/sec   Loss 9.0445   LearningRate 0.0443   Epoch: 6   Global Step: 277680   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:48,107-Speed 2522.28 samples/sec   Loss 9.0047   LearningRate 0.0443   Epoch: 6   Global Step: 277690   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 02:54:52,073-Speed 2582.88 samples/sec   Loss 8.9879   LearningRate 0.0443   Epoch: 6   Global Step: 277700   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:54:55,967-Speed 2630.49 samples/sec   Loss 9.1299   LearningRate 0.0443   Epoch: 6   Global Step: 277710   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:54:59,858-Speed 2632.05 samples/sec   Loss 8.9635   LearningRate 0.0443   Epoch: 6   Global Step: 277720   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:03,749-Speed 2632.45 samples/sec   Loss 8.9971   LearningRate 0.0443   Epoch: 6   Global Step: 277730   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:07,642-Speed 2630.59 samples/sec   Loss 8.9687   LearningRate 0.0442   Epoch: 6   Global Step: 277740   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:11,530-Speed 2634.30 samples/sec   Loss 8.9989   LearningRate 0.0442   Epoch: 6   Global Step: 277750   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:15,429-Speed 2627.08 samples/sec   Loss 8.9195   LearningRate 0.0442   Epoch: 6   Global Step: 277760   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:19,322-Speed 2631.21 samples/sec   Loss 9.1430   LearningRate 0.0442   Epoch: 6   Global Step: 277770   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:23,214-Speed 2631.53 samples/sec   Loss 8.9309   LearningRate 0.0442   Epoch: 6   Global Step: 277780   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:27,125-Speed 2619.02 samples/sec   Loss 8.9962   LearningRate 0.0442   Epoch: 6   Global Step: 277790   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 02:55:31,015-Speed 2632.49 samples/sec   Loss 8.9418   LearningRate 0.0442   Epoch: 6   Global Step: 277800   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:55:34,903-Speed 2634.37 samples/sec   Loss 9.1413   LearningRate 0.0442   Epoch: 6   Global Step: 277810   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:55:38,795-Speed 2631.69 samples/sec   Loss 9.0943   LearningRate 0.0442   Epoch: 6   Global Step: 277820   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:55:42,687-Speed 2631.69 samples/sec   Loss 8.8337   LearningRate 0.0442   Epoch: 6   Global Step: 277830   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:55:46,577-Speed 2632.85 samples/sec   Loss 8.9245   LearningRate 0.0442   Epoch: 6   Global Step: 277840   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:55:50,468-Speed 2633.05 samples/sec   Loss 8.9308   LearningRate 0.0442   Epoch: 6   Global Step: 277850   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:55:54,360-Speed 2631.41 samples/sec   Loss 9.0975   LearningRate 0.0442   Epoch: 6   Global Step: 277860   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:55:58,247-Speed 2634.87 samples/sec   Loss 9.0419   LearningRate 0.0442   Epoch: 6   Global Step: 277870   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:56:02,134-Speed 2634.83 samples/sec   Loss 8.9586   LearningRate 0.0442   Epoch: 6   Global Step: 277880   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:56:06,025-Speed 2632.10 samples/sec   Loss 9.0307   LearningRate 0.0442   Epoch: 6   Global Step: 277890   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:56:09,915-Speed 2632.73 samples/sec   Loss 9.1291   LearningRate 0.0442   Epoch: 6   Global Step: 277900   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:13,817-Speed 2625.49 samples/sec   Loss 8.8499   LearningRate 0.0442   Epoch: 6   Global Step: 277910   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:17,714-Speed 2628.86 samples/sec   Loss 8.9917   LearningRate 0.0442   Epoch: 6   Global Step: 277920   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:21,609-Speed 2628.90 samples/sec   Loss 9.0485   LearningRate 0.0442   Epoch: 6   Global Step: 277930   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:25,505-Speed 2629.11 samples/sec   Loss 8.9522   LearningRate 0.0442   Epoch: 6   Global Step: 277940   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:29,397-Speed 2631.75 samples/sec   Loss 9.0368   LearningRate 0.0442   Epoch: 6   Global Step: 277950   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:33,291-Speed 2630.16 samples/sec   Loss 8.9376   LearningRate 0.0442   Epoch: 6   Global Step: 277960   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:37,181-Speed 2632.69 samples/sec   Loss 9.0874   LearningRate 0.0442   Epoch: 6   Global Step: 277970   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:41,071-Speed 2633.19 samples/sec   Loss 8.9794   LearningRate 0.0442   Epoch: 6   Global Step: 277980   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:44,964-Speed 2630.65 samples/sec   Loss 8.9658   LearningRate 0.0442   Epoch: 6   Global Step: 277990   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:56:48,857-Speed 2631.34 samples/sec   Loss 8.9922   LearningRate 0.0442   Epoch: 6   Global Step: 278000   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:56:52,749-Speed 2632.11 samples/sec   Loss 8.8166   LearningRate 0.0442   Epoch: 6   Global Step: 278010   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:56:56,651-Speed 2624.49 samples/sec   Loss 8.9576   LearningRate 0.0442   Epoch: 6   Global Step: 278020   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:00,541-Speed 2633.41 samples/sec   Loss 9.0017   LearningRate 0.0442   Epoch: 6   Global Step: 278030   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:04,434-Speed 2630.74 samples/sec   Loss 8.9956   LearningRate 0.0442   Epoch: 6   Global Step: 278040   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:08,332-Speed 2627.25 samples/sec   Loss 9.0721   LearningRate 0.0442   Epoch: 6   Global Step: 278050   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:12,245-Speed 2617.37 samples/sec   Loss 8.9560   LearningRate 0.0442   Epoch: 6   Global Step: 278060   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:16,170-Speed 2609.93 samples/sec   Loss 8.8745   LearningRate 0.0442   Epoch: 6   Global Step: 278070   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:20,064-Speed 2630.01 samples/sec   Loss 9.0135   LearningRate 0.0442   Epoch: 6   Global Step: 278080   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:23,956-Speed 2631.99 samples/sec   Loss 8.9615   LearningRate 0.0442   Epoch: 6   Global Step: 278090   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:57:27,855-Speed 2627.29 samples/sec   Loss 9.0905   LearningRate 0.0442   Epoch: 6   Global Step: 278100   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:31,752-Speed 2628.44 samples/sec   Loss 8.9499   LearningRate 0.0442   Epoch: 6   Global Step: 278110   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:35,650-Speed 2627.79 samples/sec   Loss 9.0840   LearningRate 0.0442   Epoch: 6   Global Step: 278120   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:39,544-Speed 2629.78 samples/sec   Loss 9.0457   LearningRate 0.0442   Epoch: 6   Global Step: 278130   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:43,439-Speed 2629.62 samples/sec   Loss 9.0397   LearningRate 0.0442   Epoch: 6   Global Step: 278140   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:47,333-Speed 2630.62 samples/sec   Loss 9.0452   LearningRate 0.0442   Epoch: 6   Global Step: 278150   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:51,227-Speed 2629.85 samples/sec   Loss 9.0264   LearningRate 0.0442   Epoch: 6   Global Step: 278160   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:55,121-Speed 2631.13 samples/sec   Loss 8.9524   LearningRate 0.0442   Epoch: 6   Global Step: 278170   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:57:59,014-Speed 2630.65 samples/sec   Loss 8.7909   LearningRate 0.0442   Epoch: 6   Global Step: 278180   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:02,947-Speed 2604.35 samples/sec   Loss 9.0750   LearningRate 0.0442   Epoch: 6   Global Step: 278190   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:06,809-Speed 2652.25 samples/sec   Loss 9.1158   LearningRate 0.0442   Epoch: 6   Global Step: 278200   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:10,701-Speed 2631.80 samples/sec   Loss 9.0542   LearningRate 0.0442   Epoch: 6   Global Step: 278210   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:14,599-Speed 2627.06 samples/sec   Loss 9.0434   LearningRate 0.0442   Epoch: 6   Global Step: 278220   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:18,499-Speed 2627.06 samples/sec   Loss 9.0670   LearningRate 0.0442   Epoch: 6   Global Step: 278230   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:22,391-Speed 2631.71 samples/sec   Loss 9.0737   LearningRate 0.0442   Epoch: 6   Global Step: 278240   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:26,284-Speed 2630.61 samples/sec   Loss 9.0138   LearningRate 0.0442   Epoch: 6   Global Step: 278250   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:30,177-Speed 2631.18 samples/sec   Loss 8.9717   LearningRate 0.0442   Epoch: 6   Global Step: 278260   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:34,126-Speed 2593.84 samples/sec   Loss 9.0575   LearningRate 0.0442   Epoch: 6   Global Step: 278270   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 02:58:38,014-Speed 2634.56 samples/sec   Loss 8.9040   LearningRate 0.0442   Epoch: 6   Global Step: 278280   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:58:41,908-Speed 2629.98 samples/sec   Loss 8.9646   LearningRate 0.0442   Epoch: 6   Global Step: 278290   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:58:45,802-Speed 2631.10 samples/sec   Loss 9.1065   LearningRate 0.0442   Epoch: 6   Global Step: 278300   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:58:49,703-Speed 2625.17 samples/sec   Loss 9.0127   LearningRate 0.0442   Epoch: 6   Global Step: 278310   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:58:53,602-Speed 2627.55 samples/sec   Loss 8.9398   LearningRate 0.0442   Epoch: 6   Global Step: 278320   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 02:58:57,489-Speed 2634.52 samples/sec   Loss 8.8893   LearningRate 0.0442   Epoch: 6   Global Step: 278330   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:59:01,370-Speed 2639.37 samples/sec   Loss 10.4496   LearningRate 0.0442   Epoch: 6   Global Step: 278340   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:05,259-Speed 2633.37 samples/sec   Loss 9.3183   LearningRate 0.0442   Epoch: 6   Global Step: 278350   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:09,165-Speed 2622.80 samples/sec   Loss 9.0245   LearningRate 0.0442   Epoch: 6   Global Step: 278360   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:13,056-Speed 2632.42 samples/sec   Loss 9.1371   LearningRate 0.0441   Epoch: 6   Global Step: 278370   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:16,953-Speed 2628.35 samples/sec   Loss 9.0456   LearningRate 0.0441   Epoch: 6   Global Step: 278380   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:20,842-Speed 2634.01 samples/sec   Loss 9.1619   LearningRate 0.0441   Epoch: 6   Global Step: 278390   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:24,732-Speed 2632.91 samples/sec   Loss 9.0527   LearningRate 0.0441   Epoch: 6   Global Step: 278400   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:28,621-Speed 2633.96 samples/sec   Loss 8.9310   LearningRate 0.0441   Epoch: 6   Global Step: 278410   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:32,512-Speed 2631.88 samples/sec   Loss 9.0816   LearningRate 0.0441   Epoch: 6   Global Step: 278420   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:36,404-Speed 2631.80 samples/sec   Loss 9.0097   LearningRate 0.0441   Epoch: 6   Global Step: 278430   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 02:59:40,299-Speed 2629.18 samples/sec   Loss 8.9990   LearningRate 0.0441   Epoch: 6   Global Step: 278440   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:59:44,197-Speed 2627.94 samples/sec   Loss 9.1718   LearningRate 0.0441   Epoch: 6   Global Step: 278450   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:59:48,096-Speed 2627.38 samples/sec   Loss 9.0540   LearningRate 0.0441   Epoch: 6   Global Step: 278460   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:59:51,995-Speed 2627.03 samples/sec   Loss 9.0670   LearningRate 0.0441   Epoch: 6   Global Step: 278470   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:59:55,885-Speed 2633.36 samples/sec   Loss 9.0236   LearningRate 0.0441   Epoch: 6   Global Step: 278480   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 02:59:59,780-Speed 2629.86 samples/sec   Loss 9.0997   LearningRate 0.0441   Epoch: 6   Global Step: 278490   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:00:03,672-Speed 2631.99 samples/sec   Loss 8.9716   LearningRate 0.0441   Epoch: 6   Global Step: 278500   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:00:07,566-Speed 2630.08 samples/sec   Loss 8.9815   LearningRate 0.0441   Epoch: 6   Global Step: 278510   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:00:11,456-Speed 2632.98 samples/sec   Loss 8.9321   LearningRate 0.0441   Epoch: 6   Global Step: 278520   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:00:15,347-Speed 2632.73 samples/sec   Loss 8.9983   LearningRate 0.0441   Epoch: 6   Global Step: 278530   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:00:19,238-Speed 2632.18 samples/sec   Loss 9.0005   LearningRate 0.0441   Epoch: 6   Global Step: 278540   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:23,134-Speed 2629.26 samples/sec   Loss 9.0043   LearningRate 0.0441   Epoch: 6   Global Step: 278550   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:27,032-Speed 2627.74 samples/sec   Loss 8.9910   LearningRate 0.0441   Epoch: 6   Global Step: 278560   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:30,923-Speed 2632.30 samples/sec   Loss 9.0256   LearningRate 0.0441   Epoch: 6   Global Step: 278570   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:34,814-Speed 2632.57 samples/sec   Loss 8.9620   LearningRate 0.0441   Epoch: 6   Global Step: 278580   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:38,731-Speed 2614.42 samples/sec   Loss 9.0308   LearningRate 0.0441   Epoch: 6   Global Step: 278590   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:42,630-Speed 2627.80 samples/sec   Loss 9.0202   LearningRate 0.0441   Epoch: 6   Global Step: 278600   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:46,536-Speed 2622.29 samples/sec   Loss 8.9684   LearningRate 0.0441   Epoch: 6   Global Step: 278610   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:50,429-Speed 2630.62 samples/sec   Loss 9.0199   LearningRate 0.0441   Epoch: 6   Global Step: 278620   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:54,331-Speed 2625.40 samples/sec   Loss 8.9362   LearningRate 0.0441   Epoch: 6   Global Step: 278630   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:00:58,231-Speed 2626.27 samples/sec   Loss 9.0653   LearningRate 0.0441   Epoch: 6   Global Step: 278640   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:01:02,139-Speed 2621.19 samples/sec   Loss 9.0079   LearningRate 0.0441   Epoch: 6   Global Step: 278650   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:01:06,032-Speed 2630.53 samples/sec   Loss 8.9335   LearningRate 0.0441   Epoch: 6   Global Step: 278660   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:01:09,925-Speed 2630.99 samples/sec   Loss 9.1521   LearningRate 0.0441   Epoch: 6   Global Step: 278670   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:01:13,816-Speed 2633.01 samples/sec   Loss 9.0895   LearningRate 0.0441   Epoch: 6   Global Step: 278680   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:01:17,719-Speed 2623.72 samples/sec   Loss 9.0110   LearningRate 0.0441   Epoch: 6   Global Step: 278690   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:01:21,603-Speed 2637.30 samples/sec   Loss 9.1152   LearningRate 0.0441   Epoch: 6   Global Step: 278700   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:01:25,480-Speed 2641.65 samples/sec   Loss 9.0820   LearningRate 0.0441   Epoch: 6   Global Step: 278710   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:29,376-Speed 2629.57 samples/sec   Loss 8.8736   LearningRate 0.0441   Epoch: 6   Global Step: 278720   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:33,264-Speed 2634.14 samples/sec   Loss 8.8816   LearningRate 0.0441   Epoch: 6   Global Step: 278730   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:37,158-Speed 2630.47 samples/sec   Loss 8.9730   LearningRate 0.0441   Epoch: 6   Global Step: 278740   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:41,050-Speed 2631.38 samples/sec   Loss 8.9492   LearningRate 0.0441   Epoch: 6   Global Step: 278750   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:44,939-Speed 2633.77 samples/sec   Loss 9.0489   LearningRate 0.0441   Epoch: 6   Global Step: 278760   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:48,837-Speed 2628.01 samples/sec   Loss 8.8933   LearningRate 0.0441   Epoch: 6   Global Step: 278770   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:52,766-Speed 2606.66 samples/sec   Loss 8.9841   LearningRate 0.0441   Epoch: 6   Global Step: 278780   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:01:56,804-Speed 2536.99 samples/sec   Loss 9.0055   LearningRate 0.0441   Epoch: 6   Global Step: 278790   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:02:00,704-Speed 2626.03 samples/sec   Loss 9.0059   LearningRate 0.0441   Epoch: 6   Global Step: 278800   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:02:04,598-Speed 2630.40 samples/sec   Loss 8.9355   LearningRate 0.0441   Epoch: 6   Global Step: 278810   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:08,495-Speed 2627.90 samples/sec   Loss 9.1970   LearningRate 0.0441   Epoch: 6   Global Step: 278820   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:12,392-Speed 2628.51 samples/sec   Loss 9.0509   LearningRate 0.0441   Epoch: 6   Global Step: 278830   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:16,306-Speed 2616.57 samples/sec   Loss 9.0913   LearningRate 0.0441   Epoch: 6   Global Step: 278840   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:20,219-Speed 2617.52 samples/sec   Loss 8.8813   LearningRate 0.0441   Epoch: 6   Global Step: 278850   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:24,116-Speed 2628.49 samples/sec   Loss 9.0504   LearningRate 0.0441   Epoch: 6   Global Step: 278860   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:28,013-Speed 2627.99 samples/sec   Loss 9.1014   LearningRate 0.0441   Epoch: 6   Global Step: 278870   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:31,917-Speed 2624.60 samples/sec   Loss 9.0511   LearningRate 0.0441   Epoch: 6   Global Step: 278880   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:35,813-Speed 2628.61 samples/sec   Loss 8.9463   LearningRate 0.0441   Epoch: 6   Global Step: 278890   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:39,708-Speed 2630.59 samples/sec   Loss 8.9307   LearningRate 0.0441   Epoch: 6   Global Step: 278900   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:43,581-Speed 2644.57 samples/sec   Loss 9.0581   LearningRate 0.0441   Epoch: 6   Global Step: 278910   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:47,484-Speed 2624.29 samples/sec   Loss 8.9225   LearningRate 0.0441   Epoch: 6   Global Step: 278920   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:51,376-Speed 2631.52 samples/sec   Loss 8.9744   LearningRate 0.0441   Epoch: 6   Global Step: 278930   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:55,377-Speed 2559.77 samples/sec   Loss 9.1344   LearningRate 0.0441   Epoch: 6   Global Step: 278940   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:02:59,269-Speed 2633.15 samples/sec   Loss 9.0223   LearningRate 0.0441   Epoch: 6   Global Step: 278950   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:03:03,199-Speed 2605.97 samples/sec   Loss 9.0515   LearningRate 0.0441   Epoch: 6   Global Step: 278960   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:03:07,337-Speed 2475.51 samples/sec   Loss 9.0594   LearningRate 0.0441   Epoch: 6   Global Step: 278970   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:03:11,330-Speed 2565.04 samples/sec   Loss 8.9325   LearningRate 0.0441   Epoch: 6   Global Step: 278980   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:03:15,202-Speed 2645.46 samples/sec   Loss 8.9693   LearningRate 0.0440   Epoch: 6   Global Step: 278990   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:19,106-Speed 2624.08 samples/sec   Loss 8.9084   LearningRate 0.0440   Epoch: 6   Global Step: 279000   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:23,005-Speed 2626.80 samples/sec   Loss 8.9853   LearningRate 0.0440   Epoch: 6   Global Step: 279010   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:26,897-Speed 2631.05 samples/sec   Loss 8.9623   LearningRate 0.0440   Epoch: 6   Global Step: 279020   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:30,800-Speed 2624.72 samples/sec   Loss 8.9611   LearningRate 0.0440   Epoch: 6   Global Step: 279030   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:34,693-Speed 2631.16 samples/sec   Loss 8.9535   LearningRate 0.0440   Epoch: 6   Global Step: 279040   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:38,598-Speed 2623.35 samples/sec   Loss 9.1517   LearningRate 0.0440   Epoch: 6   Global Step: 279050   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:42,495-Speed 2627.76 samples/sec   Loss 9.0432   LearningRate 0.0440   Epoch: 6   Global Step: 279060   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:46,391-Speed 2628.88 samples/sec   Loss 8.9982   LearningRate 0.0440   Epoch: 6   Global Step: 279070   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:50,298-Speed 2621.35 samples/sec   Loss 9.0345   LearningRate 0.0440   Epoch: 6   Global Step: 279080   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:03:54,182-Speed 2638.43 samples/sec   Loss 9.0306   LearningRate 0.0440   Epoch: 6   Global Step: 279090   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:03:58,076-Speed 2630.61 samples/sec   Loss 9.1267   LearningRate 0.0440   Epoch: 6   Global Step: 279100   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:01,967-Speed 2632.20 samples/sec   Loss 9.1098   LearningRate 0.0440   Epoch: 6   Global Step: 279110   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:05,857-Speed 2632.78 samples/sec   Loss 8.9204   LearningRate 0.0440   Epoch: 6   Global Step: 279120   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:09,762-Speed 2623.42 samples/sec   Loss 8.8393   LearningRate 0.0440   Epoch: 6   Global Step: 279130   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:13,660-Speed 2627.65 samples/sec   Loss 9.0099   LearningRate 0.0440   Epoch: 6   Global Step: 279140   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:17,553-Speed 2630.85 samples/sec   Loss 9.1117   LearningRate 0.0440   Epoch: 6   Global Step: 279150   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:21,448-Speed 2629.27 samples/sec   Loss 8.8690   LearningRate 0.0440   Epoch: 6   Global Step: 279160   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:25,343-Speed 2630.12 samples/sec   Loss 9.0466   LearningRate 0.0440   Epoch: 6   Global Step: 279170   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:29,235-Speed 2632.13 samples/sec   Loss 8.9296   LearningRate 0.0440   Epoch: 6   Global Step: 279180   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:33,127-Speed 2631.83 samples/sec   Loss 9.1035   LearningRate 0.0440   Epoch: 6   Global Step: 279190   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:04:37,034-Speed 2621.48 samples/sec   Loss 8.9707   LearningRate 0.0440   Epoch: 6   Global Step: 279200   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:04:40,902-Speed 2648.07 samples/sec   Loss 9.1641   LearningRate 0.0440   Epoch: 6   Global Step: 279210   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:44,793-Speed 2632.73 samples/sec   Loss 8.9883   LearningRate 0.0440   Epoch: 6   Global Step: 279220   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:04:48,667-Speed 2643.91 samples/sec   Loss 8.9760   LearningRate 0.0440   Epoch: 6   Global Step: 279230   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:04:52,566-Speed 2626.72 samples/sec   Loss 9.1030   LearningRate 0.0440   Epoch: 6   Global Step: 279240   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:04:56,454-Speed 2633.99 samples/sec   Loss 8.9655   LearningRate 0.0440   Epoch: 6   Global Step: 279250   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:00,343-Speed 2633.74 samples/sec   Loss 9.0973   LearningRate 0.0440   Epoch: 6   Global Step: 279260   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:04,237-Speed 2630.41 samples/sec   Loss 9.1752   LearningRate 0.0440   Epoch: 6   Global Step: 279270   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:08,137-Speed 2626.91 samples/sec   Loss 8.7684   LearningRate 0.0440   Epoch: 6   Global Step: 279280   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:12,042-Speed 2622.79 samples/sec   Loss 8.9119   LearningRate 0.0440   Epoch: 6   Global Step: 279290   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:15,937-Speed 2629.35 samples/sec   Loss 8.9617   LearningRate 0.0440   Epoch: 6   Global Step: 279300   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:19,834-Speed 2628.37 samples/sec   Loss 8.9659   LearningRate 0.0440   Epoch: 6   Global Step: 279310   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:23,726-Speed 2632.00 samples/sec   Loss 8.9045   LearningRate 0.0440   Epoch: 6   Global Step: 279320   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:27,618-Speed 2631.92 samples/sec   Loss 8.9818   LearningRate 0.0440   Epoch: 6   Global Step: 279330   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:05:31,540-Speed 2611.34 samples/sec   Loss 8.8630   LearningRate 0.0440   Epoch: 6   Global Step: 279340   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:05:35,437-Speed 2628.36 samples/sec   Loss 9.0226   LearningRate 0.0440   Epoch: 6   Global Step: 279350   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:05:39,328-Speed 2632.12 samples/sec   Loss 9.0932   LearningRate 0.0440   Epoch: 6   Global Step: 279360   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:05:43,215-Speed 2635.86 samples/sec   Loss 9.0954   LearningRate 0.0440   Epoch: 6   Global Step: 279370   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:47,110-Speed 2629.73 samples/sec   Loss 9.0028   LearningRate 0.0440   Epoch: 6   Global Step: 279380   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:51,001-Speed 2632.05 samples/sec   Loss 9.0266   LearningRate 0.0440   Epoch: 6   Global Step: 279390   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:54,890-Speed 2633.93 samples/sec   Loss 9.0738   LearningRate 0.0440   Epoch: 6   Global Step: 279400   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:05:58,781-Speed 2632.61 samples/sec   Loss 8.9515   LearningRate 0.0440   Epoch: 6   Global Step: 279410   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:06:02,678-Speed 2628.15 samples/sec   Loss 9.0879   LearningRate 0.0440   Epoch: 6   Global Step: 279420   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:06:06,566-Speed 2634.05 samples/sec   Loss 8.8812   LearningRate 0.0440   Epoch: 6   Global Step: 279430   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:06:10,458-Speed 2631.45 samples/sec   Loss 8.9554   LearningRate 0.0440   Epoch: 6   Global Step: 279440   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:06:14,366-Speed 2621.43 samples/sec   Loss 8.9898   LearningRate 0.0440   Epoch: 6   Global Step: 279450   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:06:18,260-Speed 2630.63 samples/sec   Loss 8.9311   LearningRate 0.0440   Epoch: 6   Global Step: 279460   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:06:22,152-Speed 2631.76 samples/sec   Loss 9.0198   LearningRate 0.0440   Epoch: 6   Global Step: 279470   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:26,046-Speed 2630.12 samples/sec   Loss 9.0839   LearningRate 0.0440   Epoch: 6   Global Step: 279480   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:29,939-Speed 2631.29 samples/sec   Loss 9.0238   LearningRate 0.0440   Epoch: 6   Global Step: 279490   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:33,835-Speed 2629.20 samples/sec   Loss 9.0049   LearningRate 0.0440   Epoch: 6   Global Step: 279500   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:37,730-Speed 2628.95 samples/sec   Loss 9.0078   LearningRate 0.0440   Epoch: 6   Global Step: 279510   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:41,620-Speed 2632.94 samples/sec   Loss 8.9178   LearningRate 0.0440   Epoch: 6   Global Step: 279520   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:45,521-Speed 2625.73 samples/sec   Loss 8.9392   LearningRate 0.0440   Epoch: 6   Global Step: 279530   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:49,406-Speed 2636.68 samples/sec   Loss 8.9866   LearningRate 0.0440   Epoch: 6   Global Step: 279540   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:53,323-Speed 2615.18 samples/sec   Loss 8.9090   LearningRate 0.0440   Epoch: 6   Global Step: 279550   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:06:57,227-Speed 2623.62 samples/sec   Loss 9.0745   LearningRate 0.0440   Epoch: 6   Global Step: 279560   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:01,107-Speed 2640.10 samples/sec   Loss 9.1133   LearningRate 0.0440   Epoch: 6   Global Step: 279570   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:05,026-Speed 2612.96 samples/sec   Loss 8.8836   LearningRate 0.0440   Epoch: 6   Global Step: 279580   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:09,137-Speed 2491.34 samples/sec   Loss 8.9301   LearningRate 0.0440   Epoch: 6   Global Step: 279590   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:13,043-Speed 2622.73 samples/sec   Loss 8.9580   LearningRate 0.0440   Epoch: 6   Global Step: 279600   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:16,948-Speed 2622.80 samples/sec   Loss 8.9343   LearningRate 0.0440   Epoch: 6   Global Step: 279610   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:20,905-Speed 2588.84 samples/sec   Loss 9.0927   LearningRate 0.0439   Epoch: 6   Global Step: 279620   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:24,800-Speed 2629.22 samples/sec   Loss 9.0413   LearningRate 0.0439   Epoch: 6   Global Step: 279630   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:28,709-Speed 2620.39 samples/sec   Loss 8.9017   LearningRate 0.0439   Epoch: 6   Global Step: 279640   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:32,600-Speed 2632.31 samples/sec   Loss 8.9519   LearningRate 0.0439   Epoch: 6   Global Step: 279650   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:36,492-Speed 2631.78 samples/sec   Loss 8.9735   LearningRate 0.0439   Epoch: 6   Global Step: 279660   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:07:40,511-Speed 2548.16 samples/sec   Loss 8.8835   LearningRate 0.0439   Epoch: 6   Global Step: 279670   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:07:44,486-Speed 2577.46 samples/sec   Loss 8.8599   LearningRate 0.0439   Epoch: 6   Global Step: 279680   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:07:48,389-Speed 2624.47 samples/sec   Loss 8.9864   LearningRate 0.0439   Epoch: 6   Global Step: 279690   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:07:52,314-Speed 2609.75 samples/sec   Loss 8.9950   LearningRate 0.0439   Epoch: 6   Global Step: 279700   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:07:56,207-Speed 2631.02 samples/sec   Loss 8.9535   LearningRate 0.0439   Epoch: 6   Global Step: 279710   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:08:00,076-Speed 2648.07 samples/sec   Loss 8.8302   LearningRate 0.0439   Epoch: 6   Global Step: 279720   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:03,970-Speed 2630.10 samples/sec   Loss 9.0154   LearningRate 0.0439   Epoch: 6   Global Step: 279730   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:07,860-Speed 2633.12 samples/sec   Loss 8.9597   LearningRate 0.0439   Epoch: 6   Global Step: 279740   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:11,755-Speed 2629.55 samples/sec   Loss 8.9614   LearningRate 0.0439   Epoch: 6   Global Step: 279750   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:15,645-Speed 2633.18 samples/sec   Loss 8.8942   LearningRate 0.0439   Epoch: 6   Global Step: 279760   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:19,537-Speed 2632.18 samples/sec   Loss 9.1038   LearningRate 0.0439   Epoch: 6   Global Step: 279770   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:23,426-Speed 2633.67 samples/sec   Loss 8.9829   LearningRate 0.0439   Epoch: 6   Global Step: 279780   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:27,318-Speed 2631.29 samples/sec   Loss 8.9764   LearningRate 0.0439   Epoch: 6   Global Step: 279790   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:31,220-Speed 2625.79 samples/sec   Loss 9.0265   LearningRate 0.0439   Epoch: 6   Global Step: 279800   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:35,115-Speed 2629.76 samples/sec   Loss 9.0534   LearningRate 0.0439   Epoch: 6   Global Step: 279810   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:39,011-Speed 2629.73 samples/sec   Loss 9.0303   LearningRate 0.0439   Epoch: 6   Global Step: 279820   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:08:42,908-Speed 2627.86 samples/sec   Loss 8.9436   LearningRate 0.0439   Epoch: 6   Global Step: 279830   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:08:46,809-Speed 2626.03 samples/sec   Loss 8.8809   LearningRate 0.0439   Epoch: 6   Global Step: 279840   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:08:50,687-Speed 2641.20 samples/sec   Loss 8.8413   LearningRate 0.0439   Epoch: 6   Global Step: 279850   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:54,580-Speed 2631.01 samples/sec   Loss 8.8164   LearningRate 0.0439   Epoch: 6   Global Step: 279860   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:08:58,481-Speed 2625.71 samples/sec   Loss 8.8733   LearningRate 0.0439   Epoch: 6   Global Step: 279870   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:02,389-Speed 2620.97 samples/sec   Loss 9.0040   LearningRate 0.0439   Epoch: 6   Global Step: 279880   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:06,289-Speed 2626.37 samples/sec   Loss 9.0797   LearningRate 0.0439   Epoch: 6   Global Step: 279890   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:10,189-Speed 2626.92 samples/sec   Loss 9.0121   LearningRate 0.0439   Epoch: 6   Global Step: 279900   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:14,087-Speed 2627.65 samples/sec   Loss 8.9514   LearningRate 0.0439   Epoch: 6   Global Step: 279910   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:18,026-Speed 2599.80 samples/sec   Loss 8.9799   LearningRate 0.0439   Epoch: 6   Global Step: 279920   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:21,919-Speed 2631.13 samples/sec   Loss 9.0018   LearningRate 0.0439   Epoch: 6   Global Step: 279930   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:25,813-Speed 2630.89 samples/sec   Loss 8.8582   LearningRate 0.0439   Epoch: 6   Global Step: 279940   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:29,707-Speed 2630.76 samples/sec   Loss 9.0563   LearningRate 0.0439   Epoch: 6   Global Step: 279950   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:09:33,579-Speed 2645.13 samples/sec   Loss 8.9739   LearningRate 0.0439   Epoch: 6   Global Step: 279960   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:37,480-Speed 2625.60 samples/sec   Loss 8.9084   LearningRate 0.0439   Epoch: 6   Global Step: 279970   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:41,372-Speed 2631.49 samples/sec   Loss 8.9583   LearningRate 0.0439   Epoch: 6   Global Step: 279980   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:45,266-Speed 2630.58 samples/sec   Loss 8.8825   LearningRate 0.0439   Epoch: 6   Global Step: 279990   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:09:49,161-Speed 2629.66 samples/sec   Loss 8.8247   LearningRate 0.0439   Epoch: 6   Global Step: 280000   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:10:32,439-[lfw][280000]XNorm: 23.485694
Training: 2022-04-14 03:10:32,440-[lfw][280000]Accuracy-Flip: 0.99767+-0.00200
Training: 2022-04-14 03:10:32,441-[lfw][280000]Accuracy-Highest: 0.99783
Training: 2022-04-14 03:11:22,285-[cfp_fp][280000]XNorm: 21.563512
Training: 2022-04-14 03:11:22,286-[cfp_fp][280000]Accuracy-Flip: 0.98457+-0.00750
Training: 2022-04-14 03:11:22,288-[cfp_fp][280000]Accuracy-Highest: 0.98643
Training: 2022-04-14 03:12:04,966-[agedb_30][280000]XNorm: 23.049351
Training: 2022-04-14 03:12:04,967-[agedb_30][280000]Accuracy-Flip: 0.97567+-0.00898
Training: 2022-04-14 03:12:04,967-[agedb_30][280000]Accuracy-Highest: 0.97567
Training: 2022-04-14 03:12:08,836-Speed 73.31 samples/sec   Loss 8.9456   LearningRate 0.0439   Epoch: 6   Global Step: 280010   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:12,703-Speed 2648.44 samples/sec   Loss 9.0089   LearningRate 0.0439   Epoch: 6   Global Step: 280020   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:16,573-Speed 2646.83 samples/sec   Loss 8.8560   LearningRate 0.0439   Epoch: 6   Global Step: 280030   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:20,445-Speed 2644.97 samples/sec   Loss 8.9283   LearningRate 0.0439   Epoch: 6   Global Step: 280040   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:24,322-Speed 2641.94 samples/sec   Loss 8.9151   LearningRate 0.0439   Epoch: 6   Global Step: 280050   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:28,201-Speed 2640.34 samples/sec   Loss 8.9687   LearningRate 0.0439   Epoch: 6   Global Step: 280060   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:12:32,080-Speed 2641.03 samples/sec   Loss 8.9975   LearningRate 0.0439   Epoch: 6   Global Step: 280070   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:12:35,940-Speed 2653.27 samples/sec   Loss 9.0290   LearningRate 0.0439   Epoch: 6   Global Step: 280080   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:39,842-Speed 2625.36 samples/sec   Loss 8.8998   LearningRate 0.0439   Epoch: 6   Global Step: 280090   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:43,729-Speed 2635.43 samples/sec   Loss 8.9782   LearningRate 0.0439   Epoch: 6   Global Step: 280100   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:47,618-Speed 2633.87 samples/sec   Loss 9.0031   LearningRate 0.0439   Epoch: 6   Global Step: 280110   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:51,611-Speed 2565.33 samples/sec   Loss 9.0930   LearningRate 0.0439   Epoch: 6   Global Step: 280120   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:55,505-Speed 2630.43 samples/sec   Loss 9.0085   LearningRate 0.0439   Epoch: 6   Global Step: 280130   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:12:59,441-Speed 2602.16 samples/sec   Loss 8.9029   LearningRate 0.0439   Epoch: 6   Global Step: 280140   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:03,362-Speed 2612.27 samples/sec   Loss 8.8970   LearningRate 0.0439   Epoch: 6   Global Step: 280150   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:07,262-Speed 2626.66 samples/sec   Loss 8.9113   LearningRate 0.0439   Epoch: 6   Global Step: 280160   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:11,156-Speed 2630.41 samples/sec   Loss 8.8401   LearningRate 0.0439   Epoch: 6   Global Step: 280170   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:15,053-Speed 2628.12 samples/sec   Loss 9.0423   LearningRate 0.0439   Epoch: 6   Global Step: 280180   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:13:18,928-Speed 2643.80 samples/sec   Loss 9.0882   LearningRate 0.0439   Epoch: 6   Global Step: 280190   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:22,819-Speed 2632.33 samples/sec   Loss 8.8513   LearningRate 0.0439   Epoch: 6   Global Step: 280200   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:26,713-Speed 2630.17 samples/sec   Loss 9.0091   LearningRate 0.0439   Epoch: 6   Global Step: 280210   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:30,607-Speed 2629.55 samples/sec   Loss 8.9569   LearningRate 0.0439   Epoch: 6   Global Step: 280220   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:34,507-Speed 2627.27 samples/sec   Loss 9.0598   LearningRate 0.0439   Epoch: 6   Global Step: 280230   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:38,402-Speed 2629.59 samples/sec   Loss 8.8813   LearningRate 0.0438   Epoch: 6   Global Step: 280240   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:42,306-Speed 2623.53 samples/sec   Loss 9.0159   LearningRate 0.0438   Epoch: 6   Global Step: 280250   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:13:46,185-Speed 2640.31 samples/sec   Loss 9.0656   LearningRate 0.0438   Epoch: 6   Global Step: 280260   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:13:50,083-Speed 2627.79 samples/sec   Loss 8.9253   LearningRate 0.0438   Epoch: 6   Global Step: 280270   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:13:53,982-Speed 2627.14 samples/sec   Loss 8.9244   LearningRate 0.0438   Epoch: 6   Global Step: 280280   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:13:57,879-Speed 2628.27 samples/sec   Loss 9.0218   LearningRate 0.0438   Epoch: 6   Global Step: 280290   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:14:01,779-Speed 2625.54 samples/sec   Loss 9.0316   LearningRate 0.0438   Epoch: 6   Global Step: 280300   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:14:05,680-Speed 2626.24 samples/sec   Loss 8.9545   LearningRate 0.0438   Epoch: 6   Global Step: 280310   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:14:09,577-Speed 2627.78 samples/sec   Loss 9.0684   LearningRate 0.0438   Epoch: 6   Global Step: 280320   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:14:13,477-Speed 2626.58 samples/sec   Loss 9.0044   LearningRate 0.0438   Epoch: 6   Global Step: 280330   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:14:17,381-Speed 2623.20 samples/sec   Loss 9.0814   LearningRate 0.0438   Epoch: 6   Global Step: 280340   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:14:21,277-Speed 2629.59 samples/sec   Loss 9.0173   LearningRate 0.0438   Epoch: 6   Global Step: 280350   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:14:25,176-Speed 2626.61 samples/sec   Loss 8.9822   LearningRate 0.0438   Epoch: 6   Global Step: 280360   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:29,072-Speed 2629.41 samples/sec   Loss 8.9960   LearningRate 0.0438   Epoch: 6   Global Step: 280370   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:32,968-Speed 2628.17 samples/sec   Loss 8.9358   LearningRate 0.0438   Epoch: 6   Global Step: 280380   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:36,864-Speed 2629.01 samples/sec   Loss 8.9109   LearningRate 0.0438   Epoch: 6   Global Step: 280390   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:40,762-Speed 2628.17 samples/sec   Loss 8.7813   LearningRate 0.0438   Epoch: 6   Global Step: 280400   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:44,657-Speed 2629.52 samples/sec   Loss 8.9115   LearningRate 0.0438   Epoch: 6   Global Step: 280410   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:48,552-Speed 2629.25 samples/sec   Loss 8.9847   LearningRate 0.0438   Epoch: 6   Global Step: 280420   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:52,457-Speed 2622.76 samples/sec   Loss 8.8875   LearningRate 0.0438   Epoch: 6   Global Step: 280430   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:14:56,361-Speed 2625.53 samples/sec   Loss 8.9508   LearningRate 0.0438   Epoch: 6   Global Step: 280440   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:00,257-Speed 2628.80 samples/sec   Loss 8.9338   LearningRate 0.0438   Epoch: 6   Global Step: 280450   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:04,154-Speed 2627.97 samples/sec   Loss 8.9023   LearningRate 0.0438   Epoch: 6   Global Step: 280460   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:15:08,058-Speed 2623.60 samples/sec   Loss 8.9377   LearningRate 0.0438   Epoch: 6   Global Step: 280470   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:15:11,940-Speed 2638.58 samples/sec   Loss 8.9073   LearningRate 0.0438   Epoch: 6   Global Step: 280480   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:15,835-Speed 2629.80 samples/sec   Loss 9.0006   LearningRate 0.0438   Epoch: 6   Global Step: 280490   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:19,736-Speed 2625.87 samples/sec   Loss 8.8645   LearningRate 0.0438   Epoch: 6   Global Step: 280500   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:23,636-Speed 2626.58 samples/sec   Loss 8.9572   LearningRate 0.0438   Epoch: 6   Global Step: 280510   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:27,569-Speed 2604.39 samples/sec   Loss 8.9949   LearningRate 0.0438   Epoch: 6   Global Step: 280520   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:31,475-Speed 2622.75 samples/sec   Loss 8.9765   LearningRate 0.0438   Epoch: 6   Global Step: 280530   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:35,368-Speed 2630.46 samples/sec   Loss 8.9749   LearningRate 0.0438   Epoch: 6   Global Step: 280540   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:39,266-Speed 2627.49 samples/sec   Loss 8.9542   LearningRate 0.0438   Epoch: 6   Global Step: 280550   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:43,164-Speed 2627.82 samples/sec   Loss 8.9421   LearningRate 0.0438   Epoch: 6   Global Step: 280560   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:47,061-Speed 2628.21 samples/sec   Loss 9.0020   LearningRate 0.0438   Epoch: 6   Global Step: 280570   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:50,940-Speed 2640.74 samples/sec   Loss 9.0008   LearningRate 0.0438   Epoch: 6   Global Step: 280580   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:54,835-Speed 2629.50 samples/sec   Loss 9.0894   LearningRate 0.0438   Epoch: 6   Global Step: 280590   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:15:58,736-Speed 2626.09 samples/sec   Loss 9.0437   LearningRate 0.0438   Epoch: 6   Global Step: 280600   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:02,635-Speed 2626.32 samples/sec   Loss 8.8511   LearningRate 0.0438   Epoch: 6   Global Step: 280610   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:06,535-Speed 2626.65 samples/sec   Loss 8.9105   LearningRate 0.0438   Epoch: 6   Global Step: 280620   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:10,436-Speed 2625.73 samples/sec   Loss 8.9800   LearningRate 0.0438   Epoch: 6   Global Step: 280630   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:14,336-Speed 2625.99 samples/sec   Loss 9.0714   LearningRate 0.0438   Epoch: 6   Global Step: 280640   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:18,235-Speed 2626.99 samples/sec   Loss 8.9980   LearningRate 0.0438   Epoch: 6   Global Step: 280650   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:22,224-Speed 2567.68 samples/sec   Loss 8.9944   LearningRate 0.0438   Epoch: 6   Global Step: 280660   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:26,125-Speed 2626.06 samples/sec   Loss 8.9840   LearningRate 0.0438   Epoch: 6   Global Step: 280670   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:30,024-Speed 2627.64 samples/sec   Loss 8.8323   LearningRate 0.0438   Epoch: 6   Global Step: 280680   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:16:33,919-Speed 2629.28 samples/sec   Loss 8.9555   LearningRate 0.0438   Epoch: 6   Global Step: 280690   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:16:37,800-Speed 2639.00 samples/sec   Loss 9.0310   LearningRate 0.0438   Epoch: 6   Global Step: 280700   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:41,704-Speed 2623.33 samples/sec   Loss 8.9441   LearningRate 0.0438   Epoch: 6   Global Step: 280710   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:45,607-Speed 2625.09 samples/sec   Loss 8.9997   LearningRate 0.0438   Epoch: 6   Global Step: 280720   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:49,508-Speed 2624.88 samples/sec   Loss 8.9671   LearningRate 0.0438   Epoch: 6   Global Step: 280730   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:53,414-Speed 2622.51 samples/sec   Loss 9.0527   LearningRate 0.0438   Epoch: 6   Global Step: 280740   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:16:57,284-Speed 2646.85 samples/sec   Loss 9.1476   LearningRate 0.0438   Epoch: 6   Global Step: 280750   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:17:01,128-Speed 2664.95 samples/sec   Loss 9.2360   LearningRate 0.0438   Epoch: 6   Global Step: 280760   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:17:05,017-Speed 2633.46 samples/sec   Loss 9.3282   LearningRate 0.0438   Epoch: 6   Global Step: 280770   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:08,910-Speed 2630.67 samples/sec   Loss 9.0494   LearningRate 0.0438   Epoch: 6   Global Step: 280780   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:12,803-Speed 2630.93 samples/sec   Loss 9.0928   LearningRate 0.0438   Epoch: 6   Global Step: 280790   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:16,702-Speed 2627.29 samples/sec   Loss 8.9651   LearningRate 0.0438   Epoch: 6   Global Step: 280800   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:20,623-Speed 2612.73 samples/sec   Loss 8.9473   LearningRate 0.0438   Epoch: 6   Global Step: 280810   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:24,535-Speed 2618.08 samples/sec   Loss 8.9585   LearningRate 0.0438   Epoch: 6   Global Step: 280820   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:28,444-Speed 2620.33 samples/sec   Loss 8.9572   LearningRate 0.0438   Epoch: 6   Global Step: 280830   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:32,337-Speed 2630.98 samples/sec   Loss 9.3185   LearningRate 0.0438   Epoch: 6   Global Step: 280840   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:36,238-Speed 2625.54 samples/sec   Loss 8.9271   LearningRate 0.0438   Epoch: 6   Global Step: 280850   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:40,133-Speed 2629.61 samples/sec   Loss 9.0746   LearningRate 0.0438   Epoch: 6   Global Step: 280860   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:17:44,038-Speed 2623.02 samples/sec   Loss 8.9255   LearningRate 0.0437   Epoch: 6   Global Step: 280870   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:17:47,937-Speed 2626.90 samples/sec   Loss 9.0843   LearningRate 0.0437   Epoch: 6   Global Step: 280880   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:17:51,832-Speed 2629.35 samples/sec   Loss 8.9828   LearningRate 0.0437   Epoch: 6   Global Step: 280890   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:17:55,724-Speed 2631.83 samples/sec   Loss 8.9213   LearningRate 0.0437   Epoch: 6   Global Step: 280900   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:17:59,617-Speed 2631.61 samples/sec   Loss 8.9702   LearningRate 0.0437   Epoch: 6   Global Step: 280910   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:18:03,479-Speed 2652.23 samples/sec   Loss 9.4241   LearningRate 0.0437   Epoch: 6   Global Step: 280920   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:07,369-Speed 2632.26 samples/sec   Loss 9.2552   LearningRate 0.0437   Epoch: 6   Global Step: 280930   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:11,276-Speed 2621.68 samples/sec   Loss 8.9660   LearningRate 0.0437   Epoch: 6   Global Step: 280940   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:15,168-Speed 2631.65 samples/sec   Loss 9.0553   LearningRate 0.0437   Epoch: 6   Global Step: 280950   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:19,065-Speed 2628.21 samples/sec   Loss 8.9928   LearningRate 0.0437   Epoch: 6   Global Step: 280960   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:22,969-Speed 2623.53 samples/sec   Loss 9.0390   LearningRate 0.0437   Epoch: 6   Global Step: 280970   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:26,860-Speed 2632.89 samples/sec   Loss 9.0432   LearningRate 0.0437   Epoch: 6   Global Step: 280980   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:30,754-Speed 2630.43 samples/sec   Loss 9.0705   LearningRate 0.0437   Epoch: 6   Global Step: 280990   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:34,648-Speed 2630.13 samples/sec   Loss 9.0612   LearningRate 0.0437   Epoch: 6   Global Step: 281000   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:38,540-Speed 2631.87 samples/sec   Loss 9.0888   LearningRate 0.0437   Epoch: 6   Global Step: 281010   Fp16 Grad Scale: 1024   Required: 62 hours
Training: 2022-04-14 03:18:42,434-Speed 2629.70 samples/sec   Loss 9.1420   LearningRate 0.0437   Epoch: 6   Global Step: 281020   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:18:46,328-Speed 2630.39 samples/sec   Loss 9.0120   LearningRate 0.0437   Epoch: 6   Global Step: 281030   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:18:50,220-Speed 2631.38 samples/sec   Loss 9.0516   LearningRate 0.0437   Epoch: 6   Global Step: 281040   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:18:54,125-Speed 2623.04 samples/sec   Loss 8.8993   LearningRate 0.0437   Epoch: 6   Global Step: 281050   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:18:58,025-Speed 2626.52 samples/sec   Loss 8.9613   LearningRate 0.0437   Epoch: 6   Global Step: 281060   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:19:01,928-Speed 2624.15 samples/sec   Loss 9.0560   LearningRate 0.0437   Epoch: 6   Global Step: 281070   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:19:05,832-Speed 2623.75 samples/sec   Loss 9.0815   LearningRate 0.0437   Epoch: 6   Global Step: 281080   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:19:09,726-Speed 2630.18 samples/sec   Loss 9.0891   LearningRate 0.0437   Epoch: 6   Global Step: 281090   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:19:13,623-Speed 2628.43 samples/sec   Loss 8.9693   LearningRate 0.0437   Epoch: 6   Global Step: 281100   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:19:17,571-Speed 2594.37 samples/sec   Loss 8.9380   LearningRate 0.0437   Epoch: 6   Global Step: 281110   Fp16 Grad Scale: 2048   Required: 62 hours
Training: 2022-04-14 03:19:21,547-Speed 2575.76 samples/sec   Loss 8.9945   LearningRate 0.0437   Epoch: 6   Global Step: 281120   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:25,446-Speed 2627.55 samples/sec   Loss 9.0591   LearningRate 0.0437   Epoch: 6   Global Step: 281130   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:29,342-Speed 2629.21 samples/sec   Loss 9.0176   LearningRate 0.0437   Epoch: 6   Global Step: 281140   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:33,236-Speed 2630.21 samples/sec   Loss 9.0158   LearningRate 0.0437   Epoch: 6   Global Step: 281150   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:37,129-Speed 2631.58 samples/sec   Loss 8.9700   LearningRate 0.0437   Epoch: 6   Global Step: 281160   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:41,020-Speed 2631.86 samples/sec   Loss 8.9311   LearningRate 0.0437   Epoch: 6   Global Step: 281170   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:44,913-Speed 2631.09 samples/sec   Loss 9.1067   LearningRate 0.0437   Epoch: 6   Global Step: 281180   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:48,807-Speed 2630.64 samples/sec   Loss 9.1425   LearningRate 0.0437   Epoch: 6   Global Step: 281190   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:52,700-Speed 2630.45 samples/sec   Loss 8.9268   LearningRate 0.0437   Epoch: 6   Global Step: 281200   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:19:56,595-Speed 2629.92 samples/sec   Loss 8.8681   LearningRate 0.0437   Epoch: 6   Global Step: 281210   Fp16 Grad Scale: 4096   Required: 62 hours
Training: 2022-04-14 03:20:00,486-Speed 2632.37 samples/sec   Loss 9.0335   LearningRate 0.0437   Epoch: 6   Global Step: 281220   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:04,391-Speed 2623.26 samples/sec   Loss 9.0426   LearningRate 0.0437   Epoch: 6   Global Step: 281230   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:08,281-Speed 2632.91 samples/sec   Loss 8.9539   LearningRate 0.0437   Epoch: 6   Global Step: 281240   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:12,178-Speed 2628.05 samples/sec   Loss 9.0671   LearningRate 0.0437   Epoch: 6   Global Step: 281250   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:16,072-Speed 2630.51 samples/sec   Loss 8.9817   LearningRate 0.0437   Epoch: 6   Global Step: 281260   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:19,964-Speed 2631.78 samples/sec   Loss 9.0183   LearningRate 0.0437   Epoch: 6   Global Step: 281270   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:23,863-Speed 2626.49 samples/sec   Loss 8.9837   LearningRate 0.0437   Epoch: 6   Global Step: 281280   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:27,759-Speed 2628.91 samples/sec   Loss 9.0317   LearningRate 0.0437   Epoch: 6   Global Step: 281290   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:31,659-Speed 2626.89 samples/sec   Loss 8.9265   LearningRate 0.0437   Epoch: 6   Global Step: 281300   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:35,566-Speed 2621.73 samples/sec   Loss 8.9251   LearningRate 0.0437   Epoch: 6   Global Step: 281310   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:39,458-Speed 2631.40 samples/sec   Loss 8.9856   LearningRate 0.0437   Epoch: 6   Global Step: 281320   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:20:43,346-Speed 2634.56 samples/sec   Loss 8.9205   LearningRate 0.0437   Epoch: 6   Global Step: 281330   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:47,238-Speed 2631.29 samples/sec   Loss 8.8283   LearningRate 0.0437   Epoch: 6   Global Step: 281340   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:51,137-Speed 2627.31 samples/sec   Loss 9.1450   LearningRate 0.0437   Epoch: 6   Global Step: 281350   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:55,031-Speed 2630.39 samples/sec   Loss 8.8764   LearningRate 0.0437   Epoch: 6   Global Step: 281360   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:20:58,973-Speed 2598.63 samples/sec   Loss 8.8675   LearningRate 0.0437   Epoch: 6   Global Step: 281370   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:21:02,872-Speed 2626.98 samples/sec   Loss 8.8885   LearningRate 0.0437   Epoch: 6   Global Step: 281380   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:21:06,809-Speed 2601.58 samples/sec   Loss 9.0188   LearningRate 0.0437   Epoch: 6   Global Step: 281390   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:21:10,704-Speed 2629.62 samples/sec   Loss 8.8985   LearningRate 0.0437   Epoch: 6   Global Step: 281400   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:21:14,605-Speed 2625.25 samples/sec   Loss 9.0631   LearningRate 0.0437   Epoch: 6   Global Step: 281410   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:21:18,501-Speed 2628.87 samples/sec   Loss 9.0718   LearningRate 0.0437   Epoch: 6   Global Step: 281420   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-14 03:21:22,397-Speed 2629.35 samples/sec   Loss 9.1348   LearningRate 0.0437   Epoch: 6   Global Step: 281430   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:26,296-Speed 2627.01 samples/sec   Loss 8.8798   LearningRate 0.0437   Epoch: 6   Global Step: 281440   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:30,301-Speed 2557.84 samples/sec   Loss 9.0280   LearningRate 0.0437   Epoch: 6   Global Step: 281450   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:34,211-Speed 2619.34 samples/sec   Loss 8.9755   LearningRate 0.0437   Epoch: 6   Global Step: 281460   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:38,120-Speed 2619.81 samples/sec   Loss 9.0067   LearningRate 0.0437   Epoch: 6   Global Step: 281470   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:42,021-Speed 2625.59 samples/sec   Loss 9.1671   LearningRate 0.0437   Epoch: 6   Global Step: 281480   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:45,914-Speed 2630.85 samples/sec   Loss 8.9879   LearningRate 0.0437   Epoch: 6   Global Step: 281490   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:49,809-Speed 2630.25 samples/sec   Loss 8.8654   LearningRate 0.0436   Epoch: 6   Global Step: 281500   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:53,739-Speed 2606.42 samples/sec   Loss 9.1164   LearningRate 0.0436   Epoch: 6   Global Step: 281510   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:21:57,653-Speed 2616.31 samples/sec   Loss 9.1049   LearningRate 0.0436   Epoch: 6   Global Step: 281520   Fp16 Grad Scale: 16384   Required: 62 hours
Training: 2022-04-14 03:22:01,555-Speed 2625.70 samples/sec   Loss 8.9823   LearningRate 0.0436   Epoch: 6   Global Step: 281530   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:05,452-Speed 2627.92 samples/sec   Loss 8.9096   LearningRate 0.0436   Epoch: 6   Global Step: 281540   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:09,357-Speed 2623.19 samples/sec   Loss 8.9282   LearningRate 0.0436   Epoch: 6   Global Step: 281550   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:13,282-Speed 2609.01 samples/sec   Loss 8.9781   LearningRate 0.0436   Epoch: 6   Global Step: 281560   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:17,177-Speed 2630.39 samples/sec   Loss 8.8895   LearningRate 0.0436   Epoch: 6   Global Step: 281570   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:21,086-Speed 2620.58 samples/sec   Loss 8.9911   LearningRate 0.0436   Epoch: 6   Global Step: 281580   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:24,982-Speed 2628.47 samples/sec   Loss 8.8412   LearningRate 0.0436   Epoch: 6   Global Step: 281590   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:28,884-Speed 2625.41 samples/sec   Loss 8.9361   LearningRate 0.0436   Epoch: 6   Global Step: 281600   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:32,776-Speed 2631.74 samples/sec   Loss 8.8902   LearningRate 0.0436   Epoch: 6   Global Step: 281610   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:36,668-Speed 2631.21 samples/sec   Loss 8.9763   LearningRate 0.0436   Epoch: 6   Global Step: 281620   Fp16 Grad Scale: 32768   Required: 62 hours
Training: 2022-04-14 03:22:40,577-Speed 2620.51 samples/sec   Loss 8.9413   LearningRate 0.0436   Epoch: 6   Global Step: 281630   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:22:44,469-Speed 2632.25 samples/sec   Loss 8.9328   LearningRate 0.0436   Epoch: 6   Global Step: 281640   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:22:48,364-Speed 2629.53 samples/sec   Loss 8.8902   LearningRate 0.0436   Epoch: 6   Global Step: 281650   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:22:52,257-Speed 2630.92 samples/sec   Loss 8.8618   LearningRate 0.0436   Epoch: 6   Global Step: 281660   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:22:56,149-Speed 2631.47 samples/sec   Loss 8.7509   LearningRate 0.0436   Epoch: 6   Global Step: 281670   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:23:00,046-Speed 2628.78 samples/sec   Loss 9.0099   LearningRate 0.0436   Epoch: 6   Global Step: 281680   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:23:03,947-Speed 2625.18 samples/sec   Loss 9.0073   LearningRate 0.0436   Epoch: 6   Global Step: 281690   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:23:07,842-Speed 2629.84 samples/sec   Loss 8.9276   LearningRate 0.0436   Epoch: 6   Global Step: 281700   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:23:11,749-Speed 2621.68 samples/sec   Loss 9.0180   LearningRate 0.0436   Epoch: 6   Global Step: 281710   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:23:15,649-Speed 2626.64 samples/sec   Loss 8.8211   LearningRate 0.0436   Epoch: 6   Global Step: 281720   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:23:19,541-Speed 2631.33 samples/sec   Loss 9.0041   LearningRate 0.0436   Epoch: 6   Global Step: 281730   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:23,437-Speed 2629.12 samples/sec   Loss 8.9672   LearningRate 0.0436   Epoch: 6   Global Step: 281740   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:27,333-Speed 2628.64 samples/sec   Loss 8.9152   LearningRate 0.0436   Epoch: 6   Global Step: 281750   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:31,227-Speed 2630.49 samples/sec   Loss 9.0015   LearningRate 0.0436   Epoch: 6   Global Step: 281760   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:35,123-Speed 2628.91 samples/sec   Loss 9.0612   LearningRate 0.0436   Epoch: 6   Global Step: 281770   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:39,019-Speed 2628.96 samples/sec   Loss 9.0239   LearningRate 0.0436   Epoch: 6   Global Step: 281780   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:42,916-Speed 2628.49 samples/sec   Loss 8.8624   LearningRate 0.0436   Epoch: 6   Global Step: 281790   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:46,811-Speed 2630.22 samples/sec   Loss 8.8957   LearningRate 0.0436   Epoch: 6   Global Step: 281800   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:50,707-Speed 2628.90 samples/sec   Loss 8.9611   LearningRate 0.0436   Epoch: 6   Global Step: 281810   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:54,601-Speed 2630.42 samples/sec   Loss 8.9604   LearningRate 0.0436   Epoch: 6   Global Step: 281820   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:23:58,497-Speed 2629.18 samples/sec   Loss 9.0328   LearningRate 0.0436   Epoch: 6   Global Step: 281830   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:24:02,394-Speed 2627.84 samples/sec   Loss 8.9101   LearningRate 0.0436   Epoch: 6   Global Step: 281840   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:24:06,290-Speed 2629.06 samples/sec   Loss 9.0393   LearningRate 0.0436   Epoch: 6   Global Step: 281850   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:24:10,184-Speed 2630.34 samples/sec   Loss 8.9201   LearningRate 0.0436   Epoch: 6   Global Step: 281860   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:24:14,095-Speed 2618.90 samples/sec   Loss 8.9401   LearningRate 0.0436   Epoch: 6   Global Step: 281870   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:24:18,012-Speed 2614.44 samples/sec   Loss 8.8353   LearningRate 0.0436   Epoch: 6   Global Step: 281880   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:24:21,998-Speed 2569.98 samples/sec   Loss 8.8345   LearningRate 0.0436   Epoch: 6   Global Step: 281890   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:25,887-Speed 2634.33 samples/sec   Loss 8.9828   LearningRate 0.0436   Epoch: 6   Global Step: 281900   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:29,790-Speed 2623.69 samples/sec   Loss 8.9775   LearningRate 0.0436   Epoch: 6   Global Step: 281910   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:33,690-Speed 2626.51 samples/sec   Loss 8.8142   LearningRate 0.0436   Epoch: 6   Global Step: 281920   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:37,587-Speed 2628.60 samples/sec   Loss 9.0971   LearningRate 0.0436   Epoch: 6   Global Step: 281930   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:41,500-Speed 2617.34 samples/sec   Loss 8.9270   LearningRate 0.0436   Epoch: 6   Global Step: 281940   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:45,400-Speed 2626.15 samples/sec   Loss 8.8469   LearningRate 0.0436   Epoch: 6   Global Step: 281950   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:49,292-Speed 2631.70 samples/sec   Loss 8.9425   LearningRate 0.0436   Epoch: 6   Global Step: 281960   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:53,212-Speed 2612.88 samples/sec   Loss 8.9375   LearningRate 0.0436   Epoch: 6   Global Step: 281970   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:24:57,106-Speed 2630.25 samples/sec   Loss 8.9487   LearningRate 0.0436   Epoch: 6   Global Step: 281980   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:01,003-Speed 2628.13 samples/sec   Loss 9.0739   LearningRate 0.0436   Epoch: 6   Global Step: 281990   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:25:04,897-Speed 2630.79 samples/sec   Loss 9.0035   LearningRate 0.0436   Epoch: 6   Global Step: 282000   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:25:08,789-Speed 2631.39 samples/sec   Loss 8.9741   LearningRate 0.0436   Epoch: 6   Global Step: 282010   Fp16 Grad Scale: 262144   Required: 62 hours
Training: 2022-04-14 03:25:12,695-Speed 2622.38 samples/sec   Loss 8.8845   LearningRate 0.0436   Epoch: 6   Global Step: 282020   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:16,711-Speed 2550.27 samples/sec   Loss 8.9392   LearningRate 0.0436   Epoch: 6   Global Step: 282030   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:20,637-Speed 2608.97 samples/sec   Loss 8.9805   LearningRate 0.0436   Epoch: 6   Global Step: 282040   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:24,553-Speed 2615.66 samples/sec   Loss 8.9431   LearningRate 0.0436   Epoch: 6   Global Step: 282050   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:28,628-Speed 2513.27 samples/sec   Loss 9.0294   LearningRate 0.0436   Epoch: 6   Global Step: 282060   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:32,628-Speed 2560.87 samples/sec   Loss 8.9947   LearningRate 0.0436   Epoch: 6   Global Step: 282070   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:36,529-Speed 2625.98 samples/sec   Loss 8.9471   LearningRate 0.0436   Epoch: 6   Global Step: 282080   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:25:40,406-Speed 2641.80 samples/sec   Loss 8.9559   LearningRate 0.0436   Epoch: 6   Global Step: 282090   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:25:44,308-Speed 2625.17 samples/sec   Loss 8.9892   LearningRate 0.0436   Epoch: 6   Global Step: 282100   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:25:48,203-Speed 2629.02 samples/sec   Loss 8.8704   LearningRate 0.0436   Epoch: 6   Global Step: 282110   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:25:52,102-Speed 2627.76 samples/sec   Loss 8.8955   LearningRate 0.0436   Epoch: 6   Global Step: 282120   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:25:56,001-Speed 2627.42 samples/sec   Loss 8.8446   LearningRate 0.0435   Epoch: 6   Global Step: 282130   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:25:59,899-Speed 2627.06 samples/sec   Loss 8.9933   LearningRate 0.0435   Epoch: 6   Global Step: 282140   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:26:03,793-Speed 2631.27 samples/sec   Loss 9.0237   LearningRate 0.0435   Epoch: 6   Global Step: 282150   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:26:07,832-Speed 2535.42 samples/sec   Loss 9.0079   LearningRate 0.0435   Epoch: 6   Global Step: 282160   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:26:11,905-Speed 2514.60 samples/sec   Loss 8.9651   LearningRate 0.0435   Epoch: 6   Global Step: 282170   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:26:15,992-Speed 2506.41 samples/sec   Loss 9.0572   LearningRate 0.0435   Epoch: 6   Global Step: 282180   Fp16 Grad Scale: 65536   Required: 62 hours
Training: 2022-04-14 03:26:20,069-Speed 2512.64 samples/sec   Loss 9.0076   LearningRate 0.0435   Epoch: 6   Global Step: 282190   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:26:24,089-Speed 2547.37 samples/sec   Loss 9.0730   LearningRate 0.0435   Epoch: 6   Global Step: 282200   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:26:27,981-Speed 2631.48 samples/sec   Loss 8.8980   LearningRate 0.0435   Epoch: 6   Global Step: 282210   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:26:31,881-Speed 2626.77 samples/sec   Loss 9.0204   LearningRate 0.0435   Epoch: 6   Global Step: 282220   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:26:35,776-Speed 2629.71 samples/sec   Loss 8.9595   LearningRate 0.0435   Epoch: 6   Global Step: 282230   Fp16 Grad Scale: 131072   Required: 62 hours
Training: 2022-04-14 03:26:39,670-Speed 2629.87 samples/sec   Loss 8.9385   LearningRate 0.0435   Epoch: 6   Global Step: 282240   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:26:43,576-Speed 2622.34 samples/sec   Loss 8.9436   LearningRate 0.0435   Epoch: 6   Global Step: 282250   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:26:47,486-Speed 2619.58 samples/sec   Loss 9.0003   LearningRate 0.0435   Epoch: 6   Global Step: 282260   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:26:51,380-Speed 2630.34 samples/sec   Loss 8.9251   LearningRate 0.0435   Epoch: 6   Global Step: 282270   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:26:55,274-Speed 2630.87 samples/sec   Loss 8.9841   LearningRate 0.0435   Epoch: 6   Global Step: 282280   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:26:59,187-Speed 2617.35 samples/sec   Loss 8.9209   LearningRate 0.0435   Epoch: 6   Global Step: 282290   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:27:03,080-Speed 2630.86 samples/sec   Loss 8.9781   LearningRate 0.0435   Epoch: 6   Global Step: 282300   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:27:06,977-Speed 2628.25 samples/sec   Loss 8.8820   LearningRate 0.0435   Epoch: 6   Global Step: 282310   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:27:10,877-Speed 2626.67 samples/sec   Loss 8.9492   LearningRate 0.0435   Epoch: 6   Global Step: 282320   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:27:14,754-Speed 2641.88 samples/sec   Loss 9.0781   LearningRate 0.0435   Epoch: 6   Global Step: 282330   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:27:18,646-Speed 2631.35 samples/sec   Loss 8.8727   LearningRate 0.0435   Epoch: 6   Global Step: 282340   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:27:22,539-Speed 2631.18 samples/sec   Loss 8.9967   LearningRate 0.0435   Epoch: 6   Global Step: 282350   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:27:26,432-Speed 2631.02 samples/sec   Loss 8.9524   LearningRate 0.0435   Epoch: 6   Global Step: 282360   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:27:30,324-Speed 2631.78 samples/sec   Loss 8.7617   LearningRate 0.0435   Epoch: 6   Global Step: 282370   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:27:34,226-Speed 2624.61 samples/sec   Loss 9.0567   LearningRate 0.0435   Epoch: 6   Global Step: 282380   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:27:38,100-Speed 2644.07 samples/sec   Loss 8.9479   LearningRate 0.0435   Epoch: 6   Global Step: 282390   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:27:41,994-Speed 2630.19 samples/sec   Loss 8.8291   LearningRate 0.0435   Epoch: 6   Global Step: 282400   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:27:45,915-Speed 2612.65 samples/sec   Loss 8.9099   LearningRate 0.0435   Epoch: 6   Global Step: 282410   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:27:49,807-Speed 2631.49 samples/sec   Loss 8.9919   LearningRate 0.0435   Epoch: 6   Global Step: 282420   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:27:53,699-Speed 2632.09 samples/sec   Loss 8.9133   LearningRate 0.0435   Epoch: 6   Global Step: 282430   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:27:57,596-Speed 2628.15 samples/sec   Loss 8.9549   LearningRate 0.0435   Epoch: 6   Global Step: 282440   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:28:01,487-Speed 2632.67 samples/sec   Loss 8.8331   LearningRate 0.0435   Epoch: 6   Global Step: 282450   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:28:05,393-Speed 2621.89 samples/sec   Loss 8.8774   LearningRate 0.0435   Epoch: 6   Global Step: 282460   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:28:09,308-Speed 2616.46 samples/sec   Loss 8.9156   LearningRate 0.0435   Epoch: 6   Global Step: 282470   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:28:13,210-Speed 2624.80 samples/sec   Loss 8.9302   LearningRate 0.0435   Epoch: 6   Global Step: 282480   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:28:17,110-Speed 2626.37 samples/sec   Loss 8.8316   LearningRate 0.0435   Epoch: 6   Global Step: 282490   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:21,015-Speed 2623.12 samples/sec   Loss 9.0360   LearningRate 0.0435   Epoch: 6   Global Step: 282500   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:24,916-Speed 2625.37 samples/sec   Loss 8.8750   LearningRate 0.0435   Epoch: 6   Global Step: 282510   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:28,822-Speed 2622.50 samples/sec   Loss 8.8773   LearningRate 0.0435   Epoch: 6   Global Step: 282520   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:32,720-Speed 2628.10 samples/sec   Loss 9.0990   LearningRate 0.0435   Epoch: 6   Global Step: 282530   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:36,621-Speed 2625.35 samples/sec   Loss 8.9860   LearningRate 0.0435   Epoch: 6   Global Step: 282540   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:40,525-Speed 2623.28 samples/sec   Loss 8.8866   LearningRate 0.0435   Epoch: 6   Global Step: 282550   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:44,431-Speed 2622.52 samples/sec   Loss 8.7707   LearningRate 0.0435   Epoch: 6   Global Step: 282560   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:48,346-Speed 2615.69 samples/sec   Loss 8.8888   LearningRate 0.0435   Epoch: 6   Global Step: 282570   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:52,240-Speed 2630.92 samples/sec   Loss 9.0367   LearningRate 0.0435   Epoch: 6   Global Step: 282580   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:28:56,118-Speed 2641.09 samples/sec   Loss 8.8452   LearningRate 0.0435   Epoch: 6   Global Step: 282590   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:00,011-Speed 2631.22 samples/sec   Loss 8.9118   LearningRate 0.0435   Epoch: 6   Global Step: 282600   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:03,920-Speed 2620.07 samples/sec   Loss 8.9233   LearningRate 0.0435   Epoch: 6   Global Step: 282610   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:07,813-Speed 2631.21 samples/sec   Loss 9.0544   LearningRate 0.0435   Epoch: 6   Global Step: 282620   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:11,708-Speed 2629.30 samples/sec   Loss 8.9487   LearningRate 0.0435   Epoch: 6   Global Step: 282630   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:15,604-Speed 2629.56 samples/sec   Loss 8.9414   LearningRate 0.0435   Epoch: 6   Global Step: 282640   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:19,508-Speed 2623.47 samples/sec   Loss 8.9213   LearningRate 0.0435   Epoch: 6   Global Step: 282650   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:23,399-Speed 2632.68 samples/sec   Loss 9.0602   LearningRate 0.0435   Epoch: 6   Global Step: 282660   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:27,311-Speed 2618.00 samples/sec   Loss 8.9149   LearningRate 0.0435   Epoch: 6   Global Step: 282670   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:29:31,169-Speed 2654.79 samples/sec   Loss 9.7200   LearningRate 0.0435   Epoch: 6   Global Step: 282680   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:29:35,043-Speed 2644.37 samples/sec   Loss 10.1879   LearningRate 0.0435   Epoch: 6   Global Step: 282690   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:29:38,938-Speed 2629.37 samples/sec   Loss 9.4880   LearningRate 0.0435   Epoch: 6   Global Step: 282700   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:29:42,830-Speed 2631.81 samples/sec   Loss 9.1766   LearningRate 0.0435   Epoch: 6   Global Step: 282710   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:29:46,721-Speed 2632.28 samples/sec   Loss 9.0414   LearningRate 0.0435   Epoch: 6   Global Step: 282720   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:29:50,618-Speed 2627.97 samples/sec   Loss 9.0536   LearningRate 0.0435   Epoch: 6   Global Step: 282730   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:29:54,550-Speed 2605.84 samples/sec   Loss 8.9925   LearningRate 0.0435   Epoch: 6   Global Step: 282740   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:29:58,443-Speed 2630.62 samples/sec   Loss 9.2231   LearningRate 0.0434   Epoch: 6   Global Step: 282750   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:30:02,338-Speed 2630.21 samples/sec   Loss 8.8733   LearningRate 0.0434   Epoch: 6   Global Step: 282760   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:30:06,230-Speed 2631.06 samples/sec   Loss 8.9299   LearningRate 0.0434   Epoch: 6   Global Step: 282770   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:30:10,133-Speed 2624.87 samples/sec   Loss 8.8776   LearningRate 0.0434   Epoch: 6   Global Step: 282780   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:30:14,024-Speed 2631.73 samples/sec   Loss 8.8722   LearningRate 0.0434   Epoch: 6   Global Step: 282790   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:17,919-Speed 2630.45 samples/sec   Loss 8.9566   LearningRate 0.0434   Epoch: 6   Global Step: 282800   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:21,809-Speed 2633.27 samples/sec   Loss 8.8503   LearningRate 0.0434   Epoch: 6   Global Step: 282810   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:25,716-Speed 2621.42 samples/sec   Loss 8.9178   LearningRate 0.0434   Epoch: 6   Global Step: 282820   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:29,609-Speed 2630.57 samples/sec   Loss 8.9567   LearningRate 0.0434   Epoch: 6   Global Step: 282830   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:33,500-Speed 2632.40 samples/sec   Loss 8.9086   LearningRate 0.0434   Epoch: 6   Global Step: 282840   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:37,389-Speed 2634.13 samples/sec   Loss 9.0240   LearningRate 0.0434   Epoch: 6   Global Step: 282850   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:41,345-Speed 2589.26 samples/sec   Loss 8.8817   LearningRate 0.0434   Epoch: 6   Global Step: 282860   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:45,288-Speed 2597.59 samples/sec   Loss 8.8911   LearningRate 0.0434   Epoch: 6   Global Step: 282870   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:49,227-Speed 2599.89 samples/sec   Loss 8.9283   LearningRate 0.0434   Epoch: 6   Global Step: 282880   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:30:53,181-Speed 2590.99 samples/sec   Loss 9.0108   LearningRate 0.0434   Epoch: 6   Global Step: 282890   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:30:57,088-Speed 2621.52 samples/sec   Loss 8.8391   LearningRate 0.0434   Epoch: 6   Global Step: 282900   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:00,988-Speed 2626.04 samples/sec   Loss 8.8304   LearningRate 0.0434   Epoch: 6   Global Step: 282910   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:04,890-Speed 2625.69 samples/sec   Loss 8.9451   LearningRate 0.0434   Epoch: 6   Global Step: 282920   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:08,782-Speed 2631.29 samples/sec   Loss 8.8829   LearningRate 0.0434   Epoch: 6   Global Step: 282930   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:12,670-Speed 2634.27 samples/sec   Loss 8.8565   LearningRate 0.0434   Epoch: 6   Global Step: 282940   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:16,564-Speed 2630.30 samples/sec   Loss 9.0117   LearningRate 0.0434   Epoch: 6   Global Step: 282950   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:20,492-Speed 2608.05 samples/sec   Loss 8.8951   LearningRate 0.0434   Epoch: 6   Global Step: 282960   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:24,397-Speed 2622.59 samples/sec   Loss 9.0913   LearningRate 0.0434   Epoch: 6   Global Step: 282970   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:28,288-Speed 2633.07 samples/sec   Loss 9.0802   LearningRate 0.0434   Epoch: 6   Global Step: 282980   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:31:32,183-Speed 2629.09 samples/sec   Loss 8.9298   LearningRate 0.0434   Epoch: 6   Global Step: 282990   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:31:36,074-Speed 2633.10 samples/sec   Loss 9.1370   LearningRate 0.0434   Epoch: 6   Global Step: 283000   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:31:39,977-Speed 2624.14 samples/sec   Loss 8.8356   LearningRate 0.0434   Epoch: 6   Global Step: 283010   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:31:43,869-Speed 2631.49 samples/sec   Loss 8.9387   LearningRate 0.0434   Epoch: 6   Global Step: 283020   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:31:47,760-Speed 2632.11 samples/sec   Loss 8.8209   LearningRate 0.0434   Epoch: 6   Global Step: 283030   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:31:51,652-Speed 2631.43 samples/sec   Loss 8.9578   LearningRate 0.0434   Epoch: 6   Global Step: 283040   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:31:55,548-Speed 2629.44 samples/sec   Loss 8.9158   LearningRate 0.0434   Epoch: 6   Global Step: 283050   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:31:59,437-Speed 2633.83 samples/sec   Loss 8.8162   LearningRate 0.0434   Epoch: 6   Global Step: 283060   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:03,336-Speed 2626.86 samples/sec   Loss 8.8478   LearningRate 0.0434   Epoch: 6   Global Step: 283070   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:07,233-Speed 2628.70 samples/sec   Loss 8.8345   LearningRate 0.0434   Epoch: 6   Global Step: 283080   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:11,109-Speed 2642.12 samples/sec   Loss 8.9750   LearningRate 0.0434   Epoch: 6   Global Step: 283090   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:15,001-Speed 2631.60 samples/sec   Loss 8.9799   LearningRate 0.0434   Epoch: 6   Global Step: 283100   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:18,892-Speed 2631.92 samples/sec   Loss 8.9213   LearningRate 0.0434   Epoch: 6   Global Step: 283110   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:22,791-Speed 2627.70 samples/sec   Loss 9.0041   LearningRate 0.0434   Epoch: 6   Global Step: 283120   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:26,685-Speed 2630.48 samples/sec   Loss 9.0127   LearningRate 0.0434   Epoch: 6   Global Step: 283130   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:30,590-Speed 2622.61 samples/sec   Loss 8.8208   LearningRate 0.0434   Epoch: 6   Global Step: 283140   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:34,612-Speed 2546.88 samples/sec   Loss 8.8410   LearningRate 0.0434   Epoch: 6   Global Step: 283150   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:38,731-Speed 2486.13 samples/sec   Loss 8.9548   LearningRate 0.0434   Epoch: 6   Global Step: 283160   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:42,627-Speed 2629.93 samples/sec   Loss 9.1274   LearningRate 0.0434   Epoch: 6   Global Step: 283170   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:46,628-Speed 2560.08 samples/sec   Loss 8.9281   LearningRate 0.0434   Epoch: 6   Global Step: 283180   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:32:50,517-Speed 2632.89 samples/sec   Loss 8.8762   LearningRate 0.0434   Epoch: 6   Global Step: 283190   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:32:54,409-Speed 2632.19 samples/sec   Loss 8.7945   LearningRate 0.0434   Epoch: 6   Global Step: 283200   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:32:58,303-Speed 2630.24 samples/sec   Loss 8.9922   LearningRate 0.0434   Epoch: 6   Global Step: 283210   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:33:02,201-Speed 2627.65 samples/sec   Loss 8.8654   LearningRate 0.0434   Epoch: 6   Global Step: 283220   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:33:06,075-Speed 2643.78 samples/sec   Loss 8.9529   LearningRate 0.0434   Epoch: 6   Global Step: 283230   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:09,966-Speed 2632.63 samples/sec   Loss 8.9257   LearningRate 0.0434   Epoch: 6   Global Step: 283240   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:13,861-Speed 2628.91 samples/sec   Loss 8.9442   LearningRate 0.0434   Epoch: 6   Global Step: 283250   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:17,757-Speed 2629.35 samples/sec   Loss 8.9082   LearningRate 0.0434   Epoch: 6   Global Step: 283260   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:21,655-Speed 2628.17 samples/sec   Loss 8.9432   LearningRate 0.0434   Epoch: 6   Global Step: 283270   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:25,544-Speed 2633.03 samples/sec   Loss 8.9398   LearningRate 0.0434   Epoch: 6   Global Step: 283280   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:29,454-Speed 2619.81 samples/sec   Loss 8.8568   LearningRate 0.0434   Epoch: 6   Global Step: 283290   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:33,353-Speed 2627.40 samples/sec   Loss 8.9499   LearningRate 0.0434   Epoch: 6   Global Step: 283300   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:37,281-Speed 2607.29 samples/sec   Loss 8.9313   LearningRate 0.0434   Epoch: 6   Global Step: 283310   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:41,182-Speed 2625.36 samples/sec   Loss 8.8944   LearningRate 0.0434   Epoch: 6   Global Step: 283320   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:33:45,088-Speed 2622.28 samples/sec   Loss 8.8433   LearningRate 0.0434   Epoch: 6   Global Step: 283330   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:33:48,986-Speed 2627.93 samples/sec   Loss 8.9407   LearningRate 0.0434   Epoch: 6   Global Step: 283340   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:33:52,885-Speed 2627.46 samples/sec   Loss 8.8699   LearningRate 0.0434   Epoch: 6   Global Step: 283350   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:33:56,788-Speed 2623.63 samples/sec   Loss 8.8830   LearningRate 0.0434   Epoch: 6   Global Step: 283360   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:00,685-Speed 2628.66 samples/sec   Loss 9.0905   LearningRate 0.0434   Epoch: 6   Global Step: 283370   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:04,584-Speed 2627.17 samples/sec   Loss 8.9905   LearningRate 0.0433   Epoch: 6   Global Step: 283380   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:08,487-Speed 2624.26 samples/sec   Loss 8.8887   LearningRate 0.0433   Epoch: 6   Global Step: 283390   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:12,391-Speed 2623.39 samples/sec   Loss 8.9381   LearningRate 0.0433   Epoch: 6   Global Step: 283400   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:16,298-Speed 2621.39 samples/sec   Loss 8.8762   LearningRate 0.0433   Epoch: 6   Global Step: 283410   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:20,205-Speed 2621.73 samples/sec   Loss 8.9485   LearningRate 0.0433   Epoch: 6   Global Step: 283420   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:24,108-Speed 2624.78 samples/sec   Loss 8.8977   LearningRate 0.0433   Epoch: 6   Global Step: 283430   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:34:28,007-Speed 2626.87 samples/sec   Loss 8.9242   LearningRate 0.0433   Epoch: 6   Global Step: 283440   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:34:31,892-Speed 2636.37 samples/sec   Loss 8.8448   LearningRate 0.0433   Epoch: 6   Global Step: 283450   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:35,793-Speed 2625.10 samples/sec   Loss 8.8434   LearningRate 0.0433   Epoch: 6   Global Step: 283460   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:39,697-Speed 2623.54 samples/sec   Loss 9.0238   LearningRate 0.0433   Epoch: 6   Global Step: 283470   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:43,622-Speed 2609.53 samples/sec   Loss 8.8630   LearningRate 0.0433   Epoch: 6   Global Step: 283480   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:47,529-Speed 2621.87 samples/sec   Loss 8.8017   LearningRate 0.0433   Epoch: 6   Global Step: 283490   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:51,425-Speed 2628.38 samples/sec   Loss 8.8091   LearningRate 0.0433   Epoch: 6   Global Step: 283500   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:55,326-Speed 2626.33 samples/sec   Loss 8.8419   LearningRate 0.0433   Epoch: 6   Global Step: 283510   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:34:59,217-Speed 2631.96 samples/sec   Loss 8.8987   LearningRate 0.0433   Epoch: 6   Global Step: 283520   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:03,110-Speed 2631.02 samples/sec   Loss 8.9971   LearningRate 0.0433   Epoch: 6   Global Step: 283530   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:07,028-Speed 2614.00 samples/sec   Loss 8.8823   LearningRate 0.0433   Epoch: 6   Global Step: 283540   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:10,928-Speed 2626.09 samples/sec   Loss 8.8846   LearningRate 0.0433   Epoch: 6   Global Step: 283550   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:35:14,816-Speed 2634.65 samples/sec   Loss 8.9660   LearningRate 0.0433   Epoch: 6   Global Step: 283560   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:18,728-Speed 2617.70 samples/sec   Loss 8.9007   LearningRate 0.0433   Epoch: 6   Global Step: 283570   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:22,640-Speed 2619.06 samples/sec   Loss 9.0620   LearningRate 0.0433   Epoch: 6   Global Step: 283580   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:26,545-Speed 2622.85 samples/sec   Loss 9.0099   LearningRate 0.0433   Epoch: 6   Global Step: 283590   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:30,440-Speed 2629.58 samples/sec   Loss 8.9696   LearningRate 0.0433   Epoch: 6   Global Step: 283600   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:34,333-Speed 2631.18 samples/sec   Loss 8.7813   LearningRate 0.0433   Epoch: 6   Global Step: 283610   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:38,231-Speed 2627.49 samples/sec   Loss 8.7517   LearningRate 0.0433   Epoch: 6   Global Step: 283620   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:42,129-Speed 2627.55 samples/sec   Loss 8.9927   LearningRate 0.0433   Epoch: 6   Global Step: 283630   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:46,024-Speed 2629.87 samples/sec   Loss 8.8960   LearningRate 0.0433   Epoch: 6   Global Step: 283640   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:49,916-Speed 2631.36 samples/sec   Loss 8.9649   LearningRate 0.0433   Epoch: 6   Global Step: 283650   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:35:53,810-Speed 2630.77 samples/sec   Loss 8.9273   LearningRate 0.0433   Epoch: 6   Global Step: 283660   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:35:57,712-Speed 2624.82 samples/sec   Loss 8.8360   LearningRate 0.0433   Epoch: 6   Global Step: 283670   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:36:01,582-Speed 2647.02 samples/sec   Loss 8.9061   LearningRate 0.0433   Epoch: 6   Global Step: 283680   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:05,479-Speed 2627.80 samples/sec   Loss 8.9706   LearningRate 0.0433   Epoch: 6   Global Step: 283690   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:09,371-Speed 2631.42 samples/sec   Loss 8.8992   LearningRate 0.0433   Epoch: 6   Global Step: 283700   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:13,267-Speed 2628.93 samples/sec   Loss 8.9134   LearningRate 0.0433   Epoch: 6   Global Step: 283710   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:17,164-Speed 2628.55 samples/sec   Loss 8.9191   LearningRate 0.0433   Epoch: 6   Global Step: 283720   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:21,059-Speed 2629.76 samples/sec   Loss 8.8853   LearningRate 0.0433   Epoch: 6   Global Step: 283730   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:24,972-Speed 2617.50 samples/sec   Loss 8.8802   LearningRate 0.0433   Epoch: 6   Global Step: 283740   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:28,874-Speed 2625.43 samples/sec   Loss 8.7946   LearningRate 0.0433   Epoch: 6   Global Step: 283750   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:32,767-Speed 2630.66 samples/sec   Loss 9.0343   LearningRate 0.0433   Epoch: 6   Global Step: 283760   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:36,668-Speed 2625.52 samples/sec   Loss 8.9650   LearningRate 0.0433   Epoch: 6   Global Step: 283770   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:36:40,564-Speed 2628.99 samples/sec   Loss 8.8570   LearningRate 0.0433   Epoch: 6   Global Step: 283780   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:36:44,459-Speed 2629.55 samples/sec   Loss 8.9908   LearningRate 0.0433   Epoch: 6   Global Step: 283790   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:36:48,362-Speed 2624.57 samples/sec   Loss 8.8424   LearningRate 0.0433   Epoch: 6   Global Step: 283800   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:36:52,254-Speed 2631.35 samples/sec   Loss 8.8871   LearningRate 0.0433   Epoch: 6   Global Step: 283810   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:36:56,145-Speed 2632.64 samples/sec   Loss 8.8931   LearningRate 0.0433   Epoch: 6   Global Step: 283820   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:37:00,037-Speed 2631.76 samples/sec   Loss 8.9809   LearningRate 0.0433   Epoch: 6   Global Step: 283830   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:37:03,845-Speed 2689.20 samples/sec   Loss 9.6805   LearningRate 0.0433   Epoch: 6   Global Step: 283840   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:07,742-Speed 2628.93 samples/sec   Loss 9.4544   LearningRate 0.0433   Epoch: 6   Global Step: 283850   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:11,629-Speed 2634.81 samples/sec   Loss 10.2517   LearningRate 0.0433   Epoch: 6   Global Step: 283860   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:15,522-Speed 2630.90 samples/sec   Loss 9.6497   LearningRate 0.0433   Epoch: 6   Global Step: 283870   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:19,412-Speed 2632.95 samples/sec   Loss 9.3343   LearningRate 0.0433   Epoch: 6   Global Step: 283880   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:23,309-Speed 2628.62 samples/sec   Loss 9.0732   LearningRate 0.0433   Epoch: 6   Global Step: 283890   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:27,212-Speed 2624.00 samples/sec   Loss 9.0652   LearningRate 0.0433   Epoch: 6   Global Step: 283900   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:31,095-Speed 2637.94 samples/sec   Loss 9.0428   LearningRate 0.0433   Epoch: 6   Global Step: 283910   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:34,986-Speed 2632.70 samples/sec   Loss 8.9241   LearningRate 0.0433   Epoch: 6   Global Step: 283920   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:38,884-Speed 2627.28 samples/sec   Loss 8.9324   LearningRate 0.0433   Epoch: 6   Global Step: 283930   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:37:42,779-Speed 2630.20 samples/sec   Loss 9.0772   LearningRate 0.0433   Epoch: 6   Global Step: 283940   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:37:46,678-Speed 2627.09 samples/sec   Loss 8.9428   LearningRate 0.0433   Epoch: 6   Global Step: 283950   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:37:50,578-Speed 2625.82 samples/sec   Loss 8.9241   LearningRate 0.0433   Epoch: 6   Global Step: 283960   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:37:54,473-Speed 2630.30 samples/sec   Loss 8.9176   LearningRate 0.0433   Epoch: 6   Global Step: 283970   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:37:58,363-Speed 2633.07 samples/sec   Loss 8.9799   LearningRate 0.0433   Epoch: 6   Global Step: 283980   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:38:02,263-Speed 2626.39 samples/sec   Loss 8.9135   LearningRate 0.0433   Epoch: 6   Global Step: 283990   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:38:06,183-Speed 2613.22 samples/sec   Loss 9.0558   LearningRate 0.0433   Epoch: 6   Global Step: 284000   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:38:10,072-Speed 2633.14 samples/sec   Loss 9.1041   LearningRate 0.0432   Epoch: 6   Global Step: 284010   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:38:13,963-Speed 2632.78 samples/sec   Loss 8.9242   LearningRate 0.0432   Epoch: 6   Global Step: 284020   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:38:17,855-Speed 2631.20 samples/sec   Loss 8.9266   LearningRate 0.0432   Epoch: 6   Global Step: 284030   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:38:21,746-Speed 2632.69 samples/sec   Loss 8.9780   LearningRate 0.0432   Epoch: 6   Global Step: 284040   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:25,636-Speed 2632.56 samples/sec   Loss 8.9575   LearningRate 0.0432   Epoch: 6   Global Step: 284050   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:29,531-Speed 2629.60 samples/sec   Loss 9.0099   LearningRate 0.0432   Epoch: 6   Global Step: 284060   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:33,425-Speed 2630.74 samples/sec   Loss 8.7765   LearningRate 0.0432   Epoch: 6   Global Step: 284070   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:37,318-Speed 2631.47 samples/sec   Loss 8.8995   LearningRate 0.0432   Epoch: 6   Global Step: 284080   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:41,220-Speed 2624.87 samples/sec   Loss 8.9357   LearningRate 0.0432   Epoch: 6   Global Step: 284090   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:45,113-Speed 2630.78 samples/sec   Loss 8.7364   LearningRate 0.0432   Epoch: 6   Global Step: 284100   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:49,006-Speed 2630.75 samples/sec   Loss 8.8756   LearningRate 0.0432   Epoch: 6   Global Step: 284110   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:52,904-Speed 2627.93 samples/sec   Loss 8.9952   LearningRate 0.0432   Epoch: 6   Global Step: 284120   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:38:56,806-Speed 2624.49 samples/sec   Loss 8.9586   LearningRate 0.0432   Epoch: 6   Global Step: 284130   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:39:00,708-Speed 2624.99 samples/sec   Loss 8.7528   LearningRate 0.0432   Epoch: 6   Global Step: 284140   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:04,609-Speed 2625.50 samples/sec   Loss 8.8272   LearningRate 0.0432   Epoch: 6   Global Step: 284150   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:08,516-Speed 2622.37 samples/sec   Loss 8.8586   LearningRate 0.0432   Epoch: 6   Global Step: 284160   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:12,415-Speed 2626.55 samples/sec   Loss 8.8151   LearningRate 0.0432   Epoch: 6   Global Step: 284170   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:16,309-Speed 2630.02 samples/sec   Loss 8.8711   LearningRate 0.0432   Epoch: 6   Global Step: 284180   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:20,206-Speed 2637.15 samples/sec   Loss 8.9263   LearningRate 0.0432   Epoch: 6   Global Step: 284190   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:24,097-Speed 2632.58 samples/sec   Loss 8.9584   LearningRate 0.0432   Epoch: 6   Global Step: 284200   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:27,987-Speed 2632.74 samples/sec   Loss 8.9674   LearningRate 0.0432   Epoch: 6   Global Step: 284210   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:31,880-Speed 2631.11 samples/sec   Loss 8.8194   LearningRate 0.0432   Epoch: 6   Global Step: 284220   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:35,774-Speed 2630.11 samples/sec   Loss 8.8574   LearningRate 0.0432   Epoch: 6   Global Step: 284230   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:39,668-Speed 2630.41 samples/sec   Loss 8.8515   LearningRate 0.0432   Epoch: 6   Global Step: 284240   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:39:43,567-Speed 2626.94 samples/sec   Loss 8.9459   LearningRate 0.0432   Epoch: 6   Global Step: 284250   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:39:47,449-Speed 2639.18 samples/sec   Loss 9.0747   LearningRate 0.0432   Epoch: 6   Global Step: 284260   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:39:51,346-Speed 2628.14 samples/sec   Loss 8.8551   LearningRate 0.0432   Epoch: 6   Global Step: 284270   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:39:55,241-Speed 2629.32 samples/sec   Loss 8.8200   LearningRate 0.0432   Epoch: 6   Global Step: 284280   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:39:59,136-Speed 2629.37 samples/sec   Loss 8.8766   LearningRate 0.0432   Epoch: 6   Global Step: 284290   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:40:03,040-Speed 2623.45 samples/sec   Loss 8.8757   LearningRate 0.0432   Epoch: 6   Global Step: 284300   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:40:06,940-Speed 2625.97 samples/sec   Loss 8.7575   LearningRate 0.0432   Epoch: 6   Global Step: 284310   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:40:10,900-Speed 2586.76 samples/sec   Loss 8.8676   LearningRate 0.0432   Epoch: 6   Global Step: 284320   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:40:14,827-Speed 2608.55 samples/sec   Loss 8.8954   LearningRate 0.0432   Epoch: 6   Global Step: 284330   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:40:18,726-Speed 2627.37 samples/sec   Loss 8.8152   LearningRate 0.0432   Epoch: 6   Global Step: 284340   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:40:22,615-Speed 2633.52 samples/sec   Loss 8.8055   LearningRate 0.0432   Epoch: 6   Global Step: 284350   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:40:26,516-Speed 2625.52 samples/sec   Loss 8.9406   LearningRate 0.0432   Epoch: 6   Global Step: 284360   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:30,423-Speed 2621.35 samples/sec   Loss 8.9736   LearningRate 0.0432   Epoch: 6   Global Step: 284370   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:34,316-Speed 2630.97 samples/sec   Loss 8.9150   LearningRate 0.0432   Epoch: 6   Global Step: 284380   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:38,208-Speed 2631.37 samples/sec   Loss 8.9411   LearningRate 0.0432   Epoch: 6   Global Step: 284390   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:42,100-Speed 2632.09 samples/sec   Loss 8.9420   LearningRate 0.0432   Epoch: 6   Global Step: 284400   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:45,991-Speed 2632.13 samples/sec   Loss 8.9817   LearningRate 0.0432   Epoch: 6   Global Step: 284410   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:49,945-Speed 2590.58 samples/sec   Loss 8.7446   LearningRate 0.0432   Epoch: 6   Global Step: 284420   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:53,846-Speed 2625.67 samples/sec   Loss 8.9735   LearningRate 0.0432   Epoch: 6   Global Step: 284430   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:40:57,743-Speed 2628.08 samples/sec   Loss 9.0133   LearningRate 0.0432   Epoch: 6   Global Step: 284440   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:41:01,669-Speed 2608.66 samples/sec   Loss 9.2844   LearningRate 0.0432   Epoch: 6   Global Step: 284450   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:41:05,585-Speed 2615.60 samples/sec   Loss 9.1676   LearningRate 0.0432   Epoch: 6   Global Step: 284460   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:41:09,479-Speed 2630.32 samples/sec   Loss 8.9153   LearningRate 0.0432   Epoch: 6   Global Step: 284470   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:41:13,384-Speed 2622.93 samples/sec   Loss 8.9011   LearningRate 0.0432   Epoch: 6   Global Step: 284480   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:41:17,280-Speed 2629.75 samples/sec   Loss 8.8281   LearningRate 0.0432   Epoch: 6   Global Step: 284490   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:41:21,174-Speed 2630.16 samples/sec   Loss 8.9251   LearningRate 0.0432   Epoch: 6   Global Step: 284500   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:41:25,084-Speed 2619.34 samples/sec   Loss 8.8534   LearningRate 0.0432   Epoch: 6   Global Step: 284510   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:41:28,945-Speed 2652.91 samples/sec   Loss 8.9321   LearningRate 0.0432   Epoch: 6   Global Step: 284520   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:41:32,839-Speed 2630.26 samples/sec   Loss 8.8975   LearningRate 0.0432   Epoch: 6   Global Step: 284530   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:41:36,731-Speed 2631.68 samples/sec   Loss 8.9555   LearningRate 0.0432   Epoch: 6   Global Step: 284540   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:41:40,622-Speed 2632.64 samples/sec   Loss 9.0184   LearningRate 0.0432   Epoch: 6   Global Step: 284550   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:41:44,528-Speed 2622.33 samples/sec   Loss 8.9165   LearningRate 0.0432   Epoch: 6   Global Step: 284560   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:41:48,419-Speed 2632.37 samples/sec   Loss 8.9417   LearningRate 0.0432   Epoch: 6   Global Step: 284570   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:41:52,317-Speed 2628.09 samples/sec   Loss 8.8982   LearningRate 0.0432   Epoch: 6   Global Step: 284580   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:41:56,239-Speed 2611.44 samples/sec   Loss 8.9229   LearningRate 0.0432   Epoch: 6   Global Step: 284590   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:42:00,139-Speed 2626.59 samples/sec   Loss 8.8080   LearningRate 0.0432   Epoch: 6   Global Step: 284600   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:42:04,030-Speed 2631.82 samples/sec   Loss 8.9316   LearningRate 0.0432   Epoch: 6   Global Step: 284610   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:42:07,923-Speed 2631.13 samples/sec   Loss 8.8812   LearningRate 0.0432   Epoch: 6   Global Step: 284620   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:11,819-Speed 2629.17 samples/sec   Loss 8.9072   LearningRate 0.0432   Epoch: 6   Global Step: 284630   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:15,715-Speed 2629.35 samples/sec   Loss 8.8138   LearningRate 0.0432   Epoch: 6   Global Step: 284640   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:19,613-Speed 2627.25 samples/sec   Loss 8.7925   LearningRate 0.0431   Epoch: 6   Global Step: 284650   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:23,512-Speed 2627.08 samples/sec   Loss 9.2147   LearningRate 0.0431   Epoch: 6   Global Step: 284660   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:27,420-Speed 2620.99 samples/sec   Loss 9.4688   LearningRate 0.0431   Epoch: 6   Global Step: 284670   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:31,319-Speed 2627.32 samples/sec   Loss 9.1214   LearningRate 0.0431   Epoch: 6   Global Step: 284680   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:35,219-Speed 2626.48 samples/sec   Loss 8.8818   LearningRate 0.0431   Epoch: 6   Global Step: 284690   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:39,141-Speed 2611.29 samples/sec   Loss 8.8672   LearningRate 0.0431   Epoch: 6   Global Step: 284700   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:43,043-Speed 2624.92 samples/sec   Loss 8.9459   LearningRate 0.0431   Epoch: 6   Global Step: 284710   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:42:46,949-Speed 2622.14 samples/sec   Loss 8.9172   LearningRate 0.0431   Epoch: 6   Global Step: 284720   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:42:50,842-Speed 2631.08 samples/sec   Loss 8.9771   LearningRate 0.0431   Epoch: 6   Global Step: 284730   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:42:54,734-Speed 2631.40 samples/sec   Loss 9.0622   LearningRate 0.0431   Epoch: 6   Global Step: 284740   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:42:58,629-Speed 2630.13 samples/sec   Loss 8.8644   LearningRate 0.0431   Epoch: 6   Global Step: 284750   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:43:02,524-Speed 2629.93 samples/sec   Loss 8.8639   LearningRate 0.0431   Epoch: 6   Global Step: 284760   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:43:06,416-Speed 2631.77 samples/sec   Loss 8.9525   LearningRate 0.0431   Epoch: 6   Global Step: 284770   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:43:10,306-Speed 2632.73 samples/sec   Loss 8.9583   LearningRate 0.0431   Epoch: 6   Global Step: 284780   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:43:14,203-Speed 2627.99 samples/sec   Loss 8.9803   LearningRate 0.0431   Epoch: 6   Global Step: 284790   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:18,098-Speed 2629.90 samples/sec   Loss 8.8963   LearningRate 0.0431   Epoch: 6   Global Step: 284800   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:21,990-Speed 2631.75 samples/sec   Loss 8.8503   LearningRate 0.0431   Epoch: 6   Global Step: 284810   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:25,893-Speed 2623.78 samples/sec   Loss 8.9528   LearningRate 0.0431   Epoch: 6   Global Step: 284820   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:29,786-Speed 2631.42 samples/sec   Loss 8.9669   LearningRate 0.0431   Epoch: 6   Global Step: 284830   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:33,693-Speed 2621.48 samples/sec   Loss 8.8878   LearningRate 0.0431   Epoch: 6   Global Step: 284840   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:37,585-Speed 2632.13 samples/sec   Loss 8.8753   LearningRate 0.0431   Epoch: 6   Global Step: 284850   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:41,472-Speed 2634.80 samples/sec   Loss 8.9990   LearningRate 0.0431   Epoch: 6   Global Step: 284860   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:45,364-Speed 2631.68 samples/sec   Loss 8.9565   LearningRate 0.0431   Epoch: 6   Global Step: 284870   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:49,255-Speed 2631.90 samples/sec   Loss 8.8862   LearningRate 0.0431   Epoch: 6   Global Step: 284880   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:43:53,148-Speed 2631.34 samples/sec   Loss 8.8896   LearningRate 0.0431   Epoch: 6   Global Step: 284890   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:43:57,040-Speed 2631.59 samples/sec   Loss 8.7729   LearningRate 0.0431   Epoch: 6   Global Step: 284900   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:44:00,933-Speed 2630.84 samples/sec   Loss 8.9622   LearningRate 0.0431   Epoch: 6   Global Step: 284910   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:44:04,838-Speed 2622.85 samples/sec   Loss 8.8108   LearningRate 0.0431   Epoch: 6   Global Step: 284920   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:44:08,699-Speed 2653.56 samples/sec   Loss 9.1165   LearningRate 0.0431   Epoch: 6   Global Step: 284930   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:12,602-Speed 2623.47 samples/sec   Loss 9.7217   LearningRate 0.0431   Epoch: 6   Global Step: 284940   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:16,504-Speed 2625.03 samples/sec   Loss 9.3048   LearningRate 0.0431   Epoch: 6   Global Step: 284950   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:20,408-Speed 2623.74 samples/sec   Loss 9.1264   LearningRate 0.0431   Epoch: 6   Global Step: 284960   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:24,301-Speed 2631.07 samples/sec   Loss 8.9491   LearningRate 0.0431   Epoch: 6   Global Step: 284970   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:28,196-Speed 2629.96 samples/sec   Loss 9.1535   LearningRate 0.0431   Epoch: 6   Global Step: 284980   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:32,088-Speed 2631.65 samples/sec   Loss 8.9666   LearningRate 0.0431   Epoch: 6   Global Step: 284990   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:35,977-Speed 2633.66 samples/sec   Loss 8.9818   LearningRate 0.0431   Epoch: 6   Global Step: 285000   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:39,869-Speed 2631.74 samples/sec   Loss 8.9760   LearningRate 0.0431   Epoch: 6   Global Step: 285010   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:43,772-Speed 2624.58 samples/sec   Loss 8.8992   LearningRate 0.0431   Epoch: 6   Global Step: 285020   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:44:47,666-Speed 2630.07 samples/sec   Loss 8.8725   LearningRate 0.0431   Epoch: 6   Global Step: 285030   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:44:51,557-Speed 2632.49 samples/sec   Loss 8.9109   LearningRate 0.0431   Epoch: 6   Global Step: 285040   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:44:55,448-Speed 2632.14 samples/sec   Loss 8.9703   LearningRate 0.0431   Epoch: 6   Global Step: 285050   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:44:59,342-Speed 2630.12 samples/sec   Loss 9.0213   LearningRate 0.0431   Epoch: 6   Global Step: 285060   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:45:03,295-Speed 2590.66 samples/sec   Loss 8.9783   LearningRate 0.0431   Epoch: 6   Global Step: 285070   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:45:07,295-Speed 2560.67 samples/sec   Loss 8.9851   LearningRate 0.0431   Epoch: 6   Global Step: 285080   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:45:11,191-Speed 2629.28 samples/sec   Loss 8.9855   LearningRate 0.0431   Epoch: 6   Global Step: 285090   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:45:15,093-Speed 2624.46 samples/sec   Loss 8.9348   LearningRate 0.0431   Epoch: 6   Global Step: 285100   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:45:18,987-Speed 2630.76 samples/sec   Loss 8.8648   LearningRate 0.0431   Epoch: 6   Global Step: 285110   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:45:22,893-Speed 2622.41 samples/sec   Loss 8.8777   LearningRate 0.0431   Epoch: 6   Global Step: 285120   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:45:26,785-Speed 2631.45 samples/sec   Loss 8.8209   LearningRate 0.0431   Epoch: 6   Global Step: 285130   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:30,679-Speed 2630.48 samples/sec   Loss 8.8930   LearningRate 0.0431   Epoch: 6   Global Step: 285140   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:34,583-Speed 2623.39 samples/sec   Loss 8.8300   LearningRate 0.0431   Epoch: 6   Global Step: 285150   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:38,479-Speed 2628.92 samples/sec   Loss 8.9131   LearningRate 0.0431   Epoch: 6   Global Step: 285160   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:42,372-Speed 2631.26 samples/sec   Loss 8.8374   LearningRate 0.0431   Epoch: 6   Global Step: 285170   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:46,267-Speed 2628.99 samples/sec   Loss 8.9038   LearningRate 0.0431   Epoch: 6   Global Step: 285180   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:50,160-Speed 2631.35 samples/sec   Loss 8.8810   LearningRate 0.0431   Epoch: 6   Global Step: 285190   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:54,053-Speed 2631.41 samples/sec   Loss 8.9144   LearningRate 0.0431   Epoch: 6   Global Step: 285200   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:45:57,945-Speed 2631.41 samples/sec   Loss 8.9599   LearningRate 0.0431   Epoch: 6   Global Step: 285210   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:46:01,841-Speed 2629.25 samples/sec   Loss 8.9223   LearningRate 0.0431   Epoch: 6   Global Step: 285220   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:46:05,716-Speed 2643.12 samples/sec   Loss 8.9774   LearningRate 0.0431   Epoch: 6   Global Step: 285230   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:46:09,607-Speed 2632.08 samples/sec   Loss 8.8502   LearningRate 0.0431   Epoch: 6   Global Step: 285240   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:46:13,480-Speed 2644.55 samples/sec   Loss 8.8896   LearningRate 0.0431   Epoch: 6   Global Step: 285250   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:17,395-Speed 2615.83 samples/sec   Loss 8.7804   LearningRate 0.0431   Epoch: 6   Global Step: 285260   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:21,286-Speed 2632.68 samples/sec   Loss 8.7975   LearningRate 0.0431   Epoch: 6   Global Step: 285270   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:25,177-Speed 2632.47 samples/sec   Loss 9.0551   LearningRate 0.0430   Epoch: 6   Global Step: 285280   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:29,073-Speed 2629.20 samples/sec   Loss 8.9144   LearningRate 0.0430   Epoch: 6   Global Step: 285290   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:32,969-Speed 2628.72 samples/sec   Loss 8.8985   LearningRate 0.0430   Epoch: 6   Global Step: 285300   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:36,877-Speed 2621.02 samples/sec   Loss 8.8625   LearningRate 0.0430   Epoch: 6   Global Step: 285310   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:40,779-Speed 2624.97 samples/sec   Loss 8.8138   LearningRate 0.0430   Epoch: 6   Global Step: 285320   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:44,707-Speed 2607.37 samples/sec   Loss 8.8824   LearningRate 0.0430   Epoch: 6   Global Step: 285330   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:48,614-Speed 2621.86 samples/sec   Loss 8.7687   LearningRate 0.0430   Epoch: 6   Global Step: 285340   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:46:52,510-Speed 2629.03 samples/sec   Loss 8.7439   LearningRate 0.0430   Epoch: 6   Global Step: 285350   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:46:56,412-Speed 2625.70 samples/sec   Loss 8.9328   LearningRate 0.0430   Epoch: 6   Global Step: 285360   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:00,304-Speed 2631.14 samples/sec   Loss 8.8748   LearningRate 0.0430   Epoch: 6   Global Step: 285370   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:04,222-Speed 2614.48 samples/sec   Loss 8.9924   LearningRate 0.0430   Epoch: 6   Global Step: 285380   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:08,116-Speed 2630.34 samples/sec   Loss 8.8701   LearningRate 0.0430   Epoch: 6   Global Step: 285390   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:12,022-Speed 2622.08 samples/sec   Loss 8.8680   LearningRate 0.0430   Epoch: 6   Global Step: 285400   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:15,919-Speed 2628.57 samples/sec   Loss 8.9465   LearningRate 0.0430   Epoch: 6   Global Step: 285410   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:19,815-Speed 2628.99 samples/sec   Loss 8.9274   LearningRate 0.0430   Epoch: 6   Global Step: 285420   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:23,798-Speed 2571.17 samples/sec   Loss 9.0353   LearningRate 0.0430   Epoch: 6   Global Step: 285430   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:27,693-Speed 2629.78 samples/sec   Loss 8.8179   LearningRate 0.0430   Epoch: 6   Global Step: 285440   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:31,587-Speed 2630.39 samples/sec   Loss 8.9270   LearningRate 0.0430   Epoch: 6   Global Step: 285450   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:47:35,467-Speed 2639.97 samples/sec   Loss 8.9519   LearningRate 0.0430   Epoch: 6   Global Step: 285460   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:39,361-Speed 2629.62 samples/sec   Loss 9.0484   LearningRate 0.0430   Epoch: 6   Global Step: 285470   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:43,264-Speed 2626.20 samples/sec   Loss 8.8603   LearningRate 0.0430   Epoch: 6   Global Step: 285480   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:47,161-Speed 2628.37 samples/sec   Loss 8.8813   LearningRate 0.0430   Epoch: 6   Global Step: 285490   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:51,058-Speed 2628.39 samples/sec   Loss 9.0342   LearningRate 0.0430   Epoch: 6   Global Step: 285500   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:54,953-Speed 2629.25 samples/sec   Loss 8.8599   LearningRate 0.0430   Epoch: 6   Global Step: 285510   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:47:58,851-Speed 2628.48 samples/sec   Loss 8.9248   LearningRate 0.0430   Epoch: 6   Global Step: 285520   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:02,753-Speed 2624.97 samples/sec   Loss 8.8273   LearningRate 0.0430   Epoch: 6   Global Step: 285530   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:06,647-Speed 2629.95 samples/sec   Loss 8.8884   LearningRate 0.0430   Epoch: 6   Global Step: 285540   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:10,541-Speed 2629.75 samples/sec   Loss 8.8839   LearningRate 0.0430   Epoch: 6   Global Step: 285550   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:14,435-Speed 2631.04 samples/sec   Loss 8.8652   LearningRate 0.0430   Epoch: 6   Global Step: 285560   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:48:18,325-Speed 2632.99 samples/sec   Loss 8.8423   LearningRate 0.0430   Epoch: 6   Global Step: 285570   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:22,221-Speed 2629.09 samples/sec   Loss 8.9437   LearningRate 0.0430   Epoch: 6   Global Step: 285580   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:26,116-Speed 2629.03 samples/sec   Loss 8.9040   LearningRate 0.0430   Epoch: 6   Global Step: 285590   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:30,008-Speed 2631.56 samples/sec   Loss 8.9193   LearningRate 0.0430   Epoch: 6   Global Step: 285600   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:33,911-Speed 2624.88 samples/sec   Loss 8.8932   LearningRate 0.0430   Epoch: 6   Global Step: 285610   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:37,805-Speed 2629.88 samples/sec   Loss 8.9576   LearningRate 0.0430   Epoch: 6   Global Step: 285620   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:48:41,651-Speed 2663.40 samples/sec   Loss 9.5452   LearningRate 0.0430   Epoch: 6   Global Step: 285630   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:48:45,533-Speed 2638.85 samples/sec   Loss 9.2923   LearningRate 0.0430   Epoch: 6   Global Step: 285640   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:48:49,423-Speed 2633.06 samples/sec   Loss 8.8949   LearningRate 0.0430   Epoch: 6   Global Step: 285650   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:48:53,313-Speed 2632.87 samples/sec   Loss 8.9823   LearningRate 0.0430   Epoch: 6   Global Step: 285660   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:48:57,240-Speed 2607.90 samples/sec   Loss 8.9827   LearningRate 0.0430   Epoch: 6   Global Step: 285670   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:49:01,182-Speed 2598.32 samples/sec   Loss 8.9465   LearningRate 0.0430   Epoch: 6   Global Step: 285680   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:49:05,075-Speed 2631.26 samples/sec   Loss 8.8702   LearningRate 0.0430   Epoch: 6   Global Step: 285690   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:49:08,967-Speed 2631.74 samples/sec   Loss 8.8931   LearningRate 0.0430   Epoch: 6   Global Step: 285700   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:49:12,858-Speed 2632.76 samples/sec   Loss 8.9291   LearningRate 0.0430   Epoch: 6   Global Step: 285710   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:49:16,752-Speed 2630.17 samples/sec   Loss 8.9855   LearningRate 0.0430   Epoch: 6   Global Step: 285720   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:49:20,645-Speed 2630.33 samples/sec   Loss 8.9469   LearningRate 0.0430   Epoch: 6   Global Step: 285730   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:24,546-Speed 2625.73 samples/sec   Loss 8.8598   LearningRate 0.0430   Epoch: 6   Global Step: 285740   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:28,444-Speed 2627.64 samples/sec   Loss 8.9843   LearningRate 0.0430   Epoch: 6   Global Step: 285750   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:32,338-Speed 2630.39 samples/sec   Loss 8.9529   LearningRate 0.0430   Epoch: 6   Global Step: 285760   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:36,228-Speed 2633.01 samples/sec   Loss 8.8560   LearningRate 0.0430   Epoch: 6   Global Step: 285770   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:40,134-Speed 2622.06 samples/sec   Loss 8.9067   LearningRate 0.0430   Epoch: 6   Global Step: 285780   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:44,038-Speed 2624.20 samples/sec   Loss 8.8978   LearningRate 0.0430   Epoch: 6   Global Step: 285790   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:47,932-Speed 2629.87 samples/sec   Loss 8.9465   LearningRate 0.0430   Epoch: 6   Global Step: 285800   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:51,822-Speed 2633.23 samples/sec   Loss 8.8681   LearningRate 0.0430   Epoch: 6   Global Step: 285810   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:55,714-Speed 2631.19 samples/sec   Loss 9.0106   LearningRate 0.0430   Epoch: 6   Global Step: 285820   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:49:59,628-Speed 2617.41 samples/sec   Loss 8.7821   LearningRate 0.0430   Epoch: 6   Global Step: 285830   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:50:03,560-Speed 2604.98 samples/sec   Loss 8.9618   LearningRate 0.0430   Epoch: 6   Global Step: 285840   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:50:07,445-Speed 2636.31 samples/sec   Loss 9.0039   LearningRate 0.0430   Epoch: 6   Global Step: 285850   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:11,329-Speed 2637.54 samples/sec   Loss 9.1228   LearningRate 0.0430   Epoch: 6   Global Step: 285860   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:15,228-Speed 2626.98 samples/sec   Loss 8.8524   LearningRate 0.0430   Epoch: 6   Global Step: 285870   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:19,118-Speed 2632.78 samples/sec   Loss 8.9462   LearningRate 0.0430   Epoch: 6   Global Step: 285880   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:23,009-Speed 2632.46 samples/sec   Loss 8.9877   LearningRate 0.0430   Epoch: 6   Global Step: 285890   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:26,905-Speed 2628.94 samples/sec   Loss 8.9179   LearningRate 0.0430   Epoch: 6   Global Step: 285900   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:30,799-Speed 2630.36 samples/sec   Loss 8.8983   LearningRate 0.0429   Epoch: 6   Global Step: 285910   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:34,690-Speed 2632.62 samples/sec   Loss 8.8748   LearningRate 0.0429   Epoch: 6   Global Step: 285920   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:38,589-Speed 2627.29 samples/sec   Loss 8.7838   LearningRate 0.0429   Epoch: 6   Global Step: 285930   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:42,481-Speed 2631.62 samples/sec   Loss 8.9174   LearningRate 0.0429   Epoch: 6   Global Step: 285940   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:50:46,370-Speed 2634.06 samples/sec   Loss 8.8817   LearningRate 0.0429   Epoch: 6   Global Step: 285950   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:50:50,263-Speed 2630.25 samples/sec   Loss 9.0496   LearningRate 0.0429   Epoch: 6   Global Step: 285960   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:50:54,195-Speed 2605.13 samples/sec   Loss 8.8577   LearningRate 0.0429   Epoch: 6   Global Step: 285970   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:50:58,088-Speed 2631.12 samples/sec   Loss 8.9609   LearningRate 0.0429   Epoch: 6   Global Step: 285980   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:51:02,015-Speed 2608.48 samples/sec   Loss 8.8734   LearningRate 0.0429   Epoch: 6   Global Step: 285990   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:51:05,911-Speed 2629.03 samples/sec   Loss 8.8471   LearningRate 0.0429   Epoch: 6   Global Step: 286000   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:51:09,834-Speed 2611.10 samples/sec   Loss 8.8320   LearningRate 0.0429   Epoch: 6   Global Step: 286010   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:51:13,723-Speed 2633.63 samples/sec   Loss 8.9255   LearningRate 0.0429   Epoch: 6   Global Step: 286020   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:51:17,631-Speed 2620.95 samples/sec   Loss 8.9276   LearningRate 0.0429   Epoch: 6   Global Step: 286030   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:51:21,520-Speed 2633.90 samples/sec   Loss 8.8883   LearningRate 0.0429   Epoch: 6   Global Step: 286040   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:51:25,410-Speed 2633.17 samples/sec   Loss 8.8449   LearningRate 0.0429   Epoch: 6   Global Step: 286050   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:29,301-Speed 2632.34 samples/sec   Loss 8.8542   LearningRate 0.0429   Epoch: 6   Global Step: 286060   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:33,197-Speed 2629.04 samples/sec   Loss 8.9034   LearningRate 0.0429   Epoch: 6   Global Step: 286070   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:37,089-Speed 2631.42 samples/sec   Loss 8.8308   LearningRate 0.0429   Epoch: 6   Global Step: 286080   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:40,989-Speed 2626.21 samples/sec   Loss 8.8789   LearningRate 0.0429   Epoch: 6   Global Step: 286090   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:44,885-Speed 2629.35 samples/sec   Loss 8.7973   LearningRate 0.0429   Epoch: 6   Global Step: 286100   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:48,781-Speed 2629.41 samples/sec   Loss 8.7821   LearningRate 0.0429   Epoch: 6   Global Step: 286110   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:52,676-Speed 2629.77 samples/sec   Loss 8.8225   LearningRate 0.0429   Epoch: 6   Global Step: 286120   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:51:56,598-Speed 2611.11 samples/sec   Loss 8.8779   LearningRate 0.0429   Epoch: 6   Global Step: 286130   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:52:00,492-Speed 2630.86 samples/sec   Loss 8.8507   LearningRate 0.0429   Epoch: 6   Global Step: 286140   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:52:04,385-Speed 2631.06 samples/sec   Loss 8.9463   LearningRate 0.0429   Epoch: 6   Global Step: 286150   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:08,279-Speed 2630.20 samples/sec   Loss 8.9571   LearningRate 0.0429   Epoch: 6   Global Step: 286160   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:12,207-Speed 2607.70 samples/sec   Loss 8.9851   LearningRate 0.0429   Epoch: 6   Global Step: 286170   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:16,113-Speed 2621.65 samples/sec   Loss 8.7836   LearningRate 0.0429   Epoch: 6   Global Step: 286180   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:20,031-Speed 2614.92 samples/sec   Loss 8.9396   LearningRate 0.0429   Epoch: 6   Global Step: 286190   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:23,928-Speed 2628.73 samples/sec   Loss 8.7012   LearningRate 0.0429   Epoch: 6   Global Step: 286200   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:27,841-Speed 2617.65 samples/sec   Loss 8.8809   LearningRate 0.0429   Epoch: 6   Global Step: 286210   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:31,735-Speed 2630.14 samples/sec   Loss 8.7933   LearningRate 0.0429   Epoch: 6   Global Step: 286220   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:35,641-Speed 2622.42 samples/sec   Loss 9.0005   LearningRate 0.0429   Epoch: 6   Global Step: 286230   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:39,545-Speed 2623.10 samples/sec   Loss 8.8451   LearningRate 0.0429   Epoch: 6   Global Step: 286240   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:43,460-Speed 2616.58 samples/sec   Loss 8.8605   LearningRate 0.0429   Epoch: 6   Global Step: 286250   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:52:47,346-Speed 2635.69 samples/sec   Loss 8.8547   LearningRate 0.0429   Epoch: 6   Global Step: 286260   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:51,245-Speed 2627.42 samples/sec   Loss 8.9040   LearningRate 0.0429   Epoch: 6   Global Step: 286270   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:55,137-Speed 2631.39 samples/sec   Loss 8.8688   LearningRate 0.0429   Epoch: 6   Global Step: 286280   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:52:59,037-Speed 2626.53 samples/sec   Loss 8.9741   LearningRate 0.0429   Epoch: 6   Global Step: 286290   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:53:02,950-Speed 2617.57 samples/sec   Loss 8.8313   LearningRate 0.0429   Epoch: 6   Global Step: 286300   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:53:06,842-Speed 2631.42 samples/sec   Loss 8.7843   LearningRate 0.0429   Epoch: 6   Global Step: 286310   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:53:10,717-Speed 2642.84 samples/sec   Loss 8.9032   LearningRate 0.0429   Epoch: 6   Global Step: 286320   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:14,609-Speed 2632.00 samples/sec   Loss 8.7930   LearningRate 0.0429   Epoch: 6   Global Step: 286330   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:18,505-Speed 2629.08 samples/sec   Loss 8.9555   LearningRate 0.0429   Epoch: 6   Global Step: 286340   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:22,401-Speed 2629.06 samples/sec   Loss 8.9625   LearningRate 0.0429   Epoch: 6   Global Step: 286350   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:26,293-Speed 2631.59 samples/sec   Loss 8.9354   LearningRate 0.0429   Epoch: 6   Global Step: 286360   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:30,197-Speed 2623.56 samples/sec   Loss 8.7708   LearningRate 0.0429   Epoch: 6   Global Step: 286370   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:34,092-Speed 2629.61 samples/sec   Loss 9.0417   LearningRate 0.0429   Epoch: 6   Global Step: 286380   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:38,025-Speed 2604.20 samples/sec   Loss 8.8520   LearningRate 0.0429   Epoch: 6   Global Step: 286390   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:41,931-Speed 2622.81 samples/sec   Loss 8.9354   LearningRate 0.0429   Epoch: 6   Global Step: 286400   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:45,832-Speed 2625.55 samples/sec   Loss 8.9083   LearningRate 0.0429   Epoch: 6   Global Step: 286410   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:53:49,749-Speed 2614.84 samples/sec   Loss 8.9260   LearningRate 0.0429   Epoch: 6   Global Step: 286420   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:53:53,667-Speed 2614.16 samples/sec   Loss 8.8826   LearningRate 0.0429   Epoch: 6   Global Step: 286430   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:53:57,563-Speed 2628.91 samples/sec   Loss 8.7748   LearningRate 0.0429   Epoch: 6   Global Step: 286440   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:54:01,467-Speed 2623.66 samples/sec   Loss 8.9304   LearningRate 0.0429   Epoch: 6   Global Step: 286450   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:54:05,360-Speed 2631.22 samples/sec   Loss 8.8718   LearningRate 0.0429   Epoch: 6   Global Step: 286460   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:54:09,425-Speed 2519.20 samples/sec   Loss 8.7937   LearningRate 0.0429   Epoch: 6   Global Step: 286470   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:54:13,322-Speed 2628.75 samples/sec   Loss 8.9734   LearningRate 0.0429   Epoch: 6   Global Step: 286480   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:54:17,226-Speed 2623.30 samples/sec   Loss 8.8066   LearningRate 0.0429   Epoch: 6   Global Step: 286490   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:54:21,109-Speed 2638.37 samples/sec   Loss 8.7259   LearningRate 0.0429   Epoch: 6   Global Step: 286500   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:25,020-Speed 2618.48 samples/sec   Loss 8.9206   LearningRate 0.0429   Epoch: 6   Global Step: 286510   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:28,921-Speed 2625.77 samples/sec   Loss 8.8667   LearningRate 0.0429   Epoch: 6   Global Step: 286520   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:32,817-Speed 2629.27 samples/sec   Loss 8.9286   LearningRate 0.0429   Epoch: 6   Global Step: 286530   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:36,715-Speed 2627.94 samples/sec   Loss 8.8362   LearningRate 0.0428   Epoch: 6   Global Step: 286540   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:40,610-Speed 2629.64 samples/sec   Loss 8.8034   LearningRate 0.0428   Epoch: 6   Global Step: 286550   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:44,504-Speed 2630.26 samples/sec   Loss 8.8559   LearningRate 0.0428   Epoch: 6   Global Step: 286560   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:48,400-Speed 2628.99 samples/sec   Loss 8.9308   LearningRate 0.0428   Epoch: 6   Global Step: 286570   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:52,299-Speed 2626.62 samples/sec   Loss 8.8630   LearningRate 0.0428   Epoch: 6   Global Step: 286580   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:54:56,189-Speed 2633.08 samples/sec   Loss 8.8022   LearningRate 0.0428   Epoch: 6   Global Step: 286590   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 03:55:00,080-Speed 2632.41 samples/sec   Loss 8.7190   LearningRate 0.0428   Epoch: 6   Global Step: 286600   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:03,973-Speed 2631.66 samples/sec   Loss 8.9018   LearningRate 0.0428   Epoch: 6   Global Step: 286610   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:07,870-Speed 2627.69 samples/sec   Loss 8.8504   LearningRate 0.0428   Epoch: 6   Global Step: 286620   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:11,764-Speed 2630.80 samples/sec   Loss 8.7191   LearningRate 0.0428   Epoch: 6   Global Step: 286630   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:15,669-Speed 2622.80 samples/sec   Loss 8.9175   LearningRate 0.0428   Epoch: 6   Global Step: 286640   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:19,560-Speed 2632.35 samples/sec   Loss 8.8566   LearningRate 0.0428   Epoch: 6   Global Step: 286650   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:23,451-Speed 2631.92 samples/sec   Loss 8.7625   LearningRate 0.0428   Epoch: 6   Global Step: 286660   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:27,345-Speed 2630.64 samples/sec   Loss 8.8737   LearningRate 0.0428   Epoch: 6   Global Step: 286670   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:31,240-Speed 2629.61 samples/sec   Loss 8.9980   LearningRate 0.0428   Epoch: 6   Global Step: 286680   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:35,135-Speed 2629.77 samples/sec   Loss 8.7775   LearningRate 0.0428   Epoch: 6   Global Step: 286690   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:39,015-Speed 2639.63 samples/sec   Loss 8.7025   LearningRate 0.0428   Epoch: 6   Global Step: 286700   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:42,912-Speed 2628.34 samples/sec   Loss 8.7550   LearningRate 0.0428   Epoch: 6   Global Step: 286710   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:46,805-Speed 2630.97 samples/sec   Loss 8.8933   LearningRate 0.0428   Epoch: 6   Global Step: 286720   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:50,702-Speed 2628.55 samples/sec   Loss 8.9661   LearningRate 0.0428   Epoch: 6   Global Step: 286730   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:54,596-Speed 2629.59 samples/sec   Loss 8.7855   LearningRate 0.0428   Epoch: 6   Global Step: 286740   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:55:58,495-Speed 2627.67 samples/sec   Loss 8.8989   LearningRate 0.0428   Epoch: 6   Global Step: 286750   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:02,387-Speed 2631.98 samples/sec   Loss 8.9575   LearningRate 0.0428   Epoch: 6   Global Step: 286760   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:06,281-Speed 2630.47 samples/sec   Loss 8.8681   LearningRate 0.0428   Epoch: 6   Global Step: 286770   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:10,171-Speed 2632.36 samples/sec   Loss 8.8652   LearningRate 0.0428   Epoch: 6   Global Step: 286780   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:14,070-Speed 2627.26 samples/sec   Loss 8.9659   LearningRate 0.0428   Epoch: 6   Global Step: 286790   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:17,967-Speed 2628.11 samples/sec   Loss 8.7647   LearningRate 0.0428   Epoch: 6   Global Step: 286800   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:56:21,860-Speed 2630.79 samples/sec   Loss 8.9200   LearningRate 0.0428   Epoch: 6   Global Step: 286810   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:56:25,752-Speed 2631.50 samples/sec   Loss 8.9286   LearningRate 0.0428   Epoch: 6   Global Step: 286820   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:56:29,661-Speed 2620.45 samples/sec   Loss 8.9507   LearningRate 0.0428   Epoch: 6   Global Step: 286830   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 03:56:33,544-Speed 2637.77 samples/sec   Loss 8.8557   LearningRate 0.0428   Epoch: 6   Global Step: 286840   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:37,444-Speed 2626.22 samples/sec   Loss 8.8693   LearningRate 0.0428   Epoch: 6   Global Step: 286850   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:41,346-Speed 2624.66 samples/sec   Loss 8.7317   LearningRate 0.0428   Epoch: 6   Global Step: 286860   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:45,254-Speed 2621.16 samples/sec   Loss 8.7406   LearningRate 0.0428   Epoch: 6   Global Step: 286870   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 03:56:49,092-Speed 2668.74 samples/sec   Loss 9.2719   LearningRate 0.0428   Epoch: 6   Global Step: 286880   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:56:53,027-Speed 2603.37 samples/sec   Loss 9.2186   LearningRate 0.0428   Epoch: 6   Global Step: 286890   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:56:56,938-Speed 2618.69 samples/sec   Loss 9.0050   LearningRate 0.0428   Epoch: 6   Global Step: 286900   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:57:00,838-Speed 2626.17 samples/sec   Loss 9.0380   LearningRate 0.0428   Epoch: 6   Global Step: 286910   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:04,729-Speed 2632.31 samples/sec   Loss 9.8714   LearningRate 0.0428   Epoch: 6   Global Step: 286920   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:08,712-Speed 2571.85 samples/sec   Loss 9.0143   LearningRate 0.0428   Epoch: 6   Global Step: 286930   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:12,620-Speed 2621.24 samples/sec   Loss 8.8739   LearningRate 0.0428   Epoch: 6   Global Step: 286940   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:16,513-Speed 2630.69 samples/sec   Loss 8.9658   LearningRate 0.0428   Epoch: 6   Global Step: 286950   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:20,409-Speed 2629.65 samples/sec   Loss 8.8880   LearningRate 0.0428   Epoch: 6   Global Step: 286960   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:24,305-Speed 2628.82 samples/sec   Loss 8.6798   LearningRate 0.0428   Epoch: 6   Global Step: 286970   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:28,230-Speed 2609.72 samples/sec   Loss 8.7869   LearningRate 0.0428   Epoch: 6   Global Step: 286980   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:32,127-Speed 2628.31 samples/sec   Loss 8.9023   LearningRate 0.0428   Epoch: 6   Global Step: 286990   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:36,083-Speed 2589.22 samples/sec   Loss 8.8405   LearningRate 0.0428   Epoch: 6   Global Step: 287000   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 03:57:40,040-Speed 2588.43 samples/sec   Loss 8.7638   LearningRate 0.0428   Epoch: 6   Global Step: 287010   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:57:43,941-Speed 2626.29 samples/sec   Loss 9.0077   LearningRate 0.0428   Epoch: 6   Global Step: 287020   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:57:47,834-Speed 2630.92 samples/sec   Loss 8.9418   LearningRate 0.0428   Epoch: 6   Global Step: 287030   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:57:51,725-Speed 2633.00 samples/sec   Loss 8.7865   LearningRate 0.0428   Epoch: 6   Global Step: 287040   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:57:55,622-Speed 2627.90 samples/sec   Loss 8.7501   LearningRate 0.0428   Epoch: 6   Global Step: 287050   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:57:59,512-Speed 2633.61 samples/sec   Loss 8.7439   LearningRate 0.0428   Epoch: 6   Global Step: 287060   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:58:03,421-Speed 2619.98 samples/sec   Loss 9.0548   LearningRate 0.0428   Epoch: 6   Global Step: 287070   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:58:07,314-Speed 2631.32 samples/sec   Loss 8.9224   LearningRate 0.0428   Epoch: 6   Global Step: 287080   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:58:11,208-Speed 2630.42 samples/sec   Loss 9.0056   LearningRate 0.0428   Epoch: 6   Global Step: 287090   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:58:15,105-Speed 2628.30 samples/sec   Loss 8.8965   LearningRate 0.0428   Epoch: 6   Global Step: 287100   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 03:58:19,000-Speed 2629.77 samples/sec   Loss 8.7452   LearningRate 0.0428   Epoch: 6   Global Step: 287110   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:22,893-Speed 2631.07 samples/sec   Loss 8.7861   LearningRate 0.0428   Epoch: 6   Global Step: 287120   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:26,788-Speed 2630.21 samples/sec   Loss 8.7474   LearningRate 0.0428   Epoch: 6   Global Step: 287130   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:30,684-Speed 2628.90 samples/sec   Loss 8.9313   LearningRate 0.0428   Epoch: 6   Global Step: 287140   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:34,620-Speed 2601.96 samples/sec   Loss 8.9003   LearningRate 0.0428   Epoch: 6   Global Step: 287150   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:38,516-Speed 2628.72 samples/sec   Loss 8.8665   LearningRate 0.0428   Epoch: 6   Global Step: 287160   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:42,421-Speed 2623.32 samples/sec   Loss 8.8242   LearningRate 0.0428   Epoch: 6   Global Step: 287170   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:46,311-Speed 2632.77 samples/sec   Loss 8.9077   LearningRate 0.0427   Epoch: 6   Global Step: 287180   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:50,219-Speed 2621.56 samples/sec   Loss 8.7656   LearningRate 0.0427   Epoch: 6   Global Step: 287190   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:54,109-Speed 2632.95 samples/sec   Loss 8.7641   LearningRate 0.0427   Epoch: 6   Global Step: 287200   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 03:58:58,003-Speed 2630.48 samples/sec   Loss 8.7608   LearningRate 0.0427   Epoch: 6   Global Step: 287210   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:01,903-Speed 2626.50 samples/sec   Loss 8.9416   LearningRate 0.0427   Epoch: 6   Global Step: 287220   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:05,804-Speed 2625.41 samples/sec   Loss 8.8090   LearningRate 0.0427   Epoch: 6   Global Step: 287230   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:09,704-Speed 2626.05 samples/sec   Loss 8.8045   LearningRate 0.0427   Epoch: 6   Global Step: 287240   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:13,619-Speed 2616.45 samples/sec   Loss 8.8548   LearningRate 0.0427   Epoch: 6   Global Step: 287250   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:17,511-Speed 2631.72 samples/sec   Loss 8.8767   LearningRate 0.0427   Epoch: 6   Global Step: 287260   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:21,417-Speed 2622.62 samples/sec   Loss 8.9487   LearningRate 0.0427   Epoch: 6   Global Step: 287270   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:25,342-Speed 2609.92 samples/sec   Loss 8.7965   LearningRate 0.0427   Epoch: 6   Global Step: 287280   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:29,235-Speed 2631.24 samples/sec   Loss 8.9410   LearningRate 0.0427   Epoch: 6   Global Step: 287290   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:33,152-Speed 2614.85 samples/sec   Loss 8.9101   LearningRate 0.0427   Epoch: 6   Global Step: 287300   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 03:59:37,047-Speed 2629.91 samples/sec   Loss 8.9007   LearningRate 0.0427   Epoch: 6   Global Step: 287310   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:59:40,938-Speed 2632.30 samples/sec   Loss 8.8432   LearningRate 0.0427   Epoch: 6   Global Step: 287320   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:59:44,828-Speed 2632.64 samples/sec   Loss 9.0032   LearningRate 0.0427   Epoch: 6   Global Step: 287330   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:59:48,719-Speed 2632.62 samples/sec   Loss 8.8828   LearningRate 0.0427   Epoch: 6   Global Step: 287340   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:59:52,611-Speed 2631.59 samples/sec   Loss 8.8099   LearningRate 0.0427   Epoch: 6   Global Step: 287350   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 03:59:56,519-Speed 2621.19 samples/sec   Loss 8.8690   LearningRate 0.0427   Epoch: 6   Global Step: 287360   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:00:00,430-Speed 2619.05 samples/sec   Loss 8.8603   LearningRate 0.0427   Epoch: 6   Global Step: 287370   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:00:04,324-Speed 2629.91 samples/sec   Loss 9.0728   LearningRate 0.0427   Epoch: 6   Global Step: 287380   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:00:08,225-Speed 2625.66 samples/sec   Loss 8.7380   LearningRate 0.0427   Epoch: 6   Global Step: 287390   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:00:12,123-Speed 2627.48 samples/sec   Loss 8.7835   LearningRate 0.0427   Epoch: 6   Global Step: 287400   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:00:16,017-Speed 2631.09 samples/sec   Loss 8.7907   LearningRate 0.0427   Epoch: 6   Global Step: 287410   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:19,908-Speed 2632.39 samples/sec   Loss 8.8328   LearningRate 0.0427   Epoch: 6   Global Step: 287420   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:23,803-Speed 2629.56 samples/sec   Loss 8.9030   LearningRate 0.0427   Epoch: 6   Global Step: 287430   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:27,694-Speed 2631.97 samples/sec   Loss 8.8473   LearningRate 0.0427   Epoch: 6   Global Step: 287440   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:31,585-Speed 2632.30 samples/sec   Loss 8.8169   LearningRate 0.0427   Epoch: 6   Global Step: 287450   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:35,481-Speed 2629.74 samples/sec   Loss 8.9102   LearningRate 0.0427   Epoch: 6   Global Step: 287460   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:39,372-Speed 2632.77 samples/sec   Loss 8.8460   LearningRate 0.0427   Epoch: 6   Global Step: 287470   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:43,265-Speed 2630.76 samples/sec   Loss 8.8026   LearningRate 0.0427   Epoch: 6   Global Step: 287480   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:47,155-Speed 2633.04 samples/sec   Loss 8.8992   LearningRate 0.0427   Epoch: 6   Global Step: 287490   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:51,049-Speed 2630.02 samples/sec   Loss 8.8784   LearningRate 0.0427   Epoch: 6   Global Step: 287500   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:00:54,941-Speed 2631.75 samples/sec   Loss 8.9321   LearningRate 0.0427   Epoch: 6   Global Step: 287510   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:00:58,836-Speed 2630.07 samples/sec   Loss 8.8583   LearningRate 0.0427   Epoch: 6   Global Step: 287520   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:02,727-Speed 2632.19 samples/sec   Loss 8.9179   LearningRate 0.0427   Epoch: 6   Global Step: 287530   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:06,620-Speed 2630.37 samples/sec   Loss 8.8815   LearningRate 0.0427   Epoch: 6   Global Step: 287540   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:10,533-Speed 2617.92 samples/sec   Loss 8.8478   LearningRate 0.0427   Epoch: 6   Global Step: 287550   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:14,448-Speed 2616.72 samples/sec   Loss 8.6600   LearningRate 0.0427   Epoch: 6   Global Step: 287560   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:18,338-Speed 2632.70 samples/sec   Loss 8.8550   LearningRate 0.0427   Epoch: 6   Global Step: 287570   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:22,251-Speed 2618.00 samples/sec   Loss 8.9142   LearningRate 0.0427   Epoch: 6   Global Step: 287580   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:26,144-Speed 2631.29 samples/sec   Loss 8.7996   LearningRate 0.0427   Epoch: 6   Global Step: 287590   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:30,037-Speed 2630.96 samples/sec   Loss 8.7543   LearningRate 0.0427   Epoch: 6   Global Step: 287600   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:33,930-Speed 2630.92 samples/sec   Loss 8.7624   LearningRate 0.0427   Epoch: 6   Global Step: 287610   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:01:37,820-Speed 2633.67 samples/sec   Loss 8.8421   LearningRate 0.0427   Epoch: 6   Global Step: 287620   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:01:41,712-Speed 2631.10 samples/sec   Loss 8.9717   LearningRate 0.0427   Epoch: 6   Global Step: 287630   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:01:45,627-Speed 2617.37 samples/sec   Loss 8.7752   LearningRate 0.0427   Epoch: 6   Global Step: 287640   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:01:49,500-Speed 2644.10 samples/sec   Loss 8.8473   LearningRate 0.0427   Epoch: 6   Global Step: 287650   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:53,393-Speed 2631.88 samples/sec   Loss 8.9374   LearningRate 0.0427   Epoch: 6   Global Step: 287660   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:01:57,302-Speed 2620.01 samples/sec   Loss 8.7892   LearningRate 0.0427   Epoch: 6   Global Step: 287670   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:01,206-Speed 2623.11 samples/sec   Loss 8.8526   LearningRate 0.0427   Epoch: 6   Global Step: 287680   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:05,127-Speed 2612.41 samples/sec   Loss 8.9547   LearningRate 0.0427   Epoch: 6   Global Step: 287690   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:09,020-Speed 2631.49 samples/sec   Loss 8.8329   LearningRate 0.0427   Epoch: 6   Global Step: 287700   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:12,918-Speed 2627.49 samples/sec   Loss 8.8719   LearningRate 0.0427   Epoch: 6   Global Step: 287710   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:16,859-Speed 2598.54 samples/sec   Loss 8.8581   LearningRate 0.0427   Epoch: 6   Global Step: 287720   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:20,758-Speed 2627.12 samples/sec   Loss 8.9932   LearningRate 0.0427   Epoch: 6   Global Step: 287730   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:24,650-Speed 2633.49 samples/sec   Loss 8.7630   LearningRate 0.0427   Epoch: 6   Global Step: 287740   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:28,546-Speed 2628.71 samples/sec   Loss 8.7477   LearningRate 0.0427   Epoch: 6   Global Step: 287750   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:02:32,440-Speed 2630.44 samples/sec   Loss 8.8856   LearningRate 0.0427   Epoch: 6   Global Step: 287760   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:02:36,333-Speed 2630.99 samples/sec   Loss 8.8012   LearningRate 0.0427   Epoch: 6   Global Step: 287770   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:02:40,230-Speed 2628.36 samples/sec   Loss 8.8557   LearningRate 0.0427   Epoch: 6   Global Step: 287780   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:02:44,133-Speed 2624.12 samples/sec   Loss 8.8310   LearningRate 0.0427   Epoch: 6   Global Step: 287790   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:48,035-Speed 2625.06 samples/sec   Loss 8.8490   LearningRate 0.0427   Epoch: 6   Global Step: 287800   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:51,939-Speed 2623.73 samples/sec   Loss 8.7370   LearningRate 0.0426   Epoch: 6   Global Step: 287810   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:55,842-Speed 2625.10 samples/sec   Loss 8.7276   LearningRate 0.0426   Epoch: 6   Global Step: 287820   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:02:59,735-Speed 2630.38 samples/sec   Loss 8.8903   LearningRate 0.0426   Epoch: 6   Global Step: 287830   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:03:03,634-Speed 2626.90 samples/sec   Loss 8.8913   LearningRate 0.0426   Epoch: 6   Global Step: 287840   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:03:07,526-Speed 2631.37 samples/sec   Loss 8.7094   LearningRate 0.0426   Epoch: 6   Global Step: 287850   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:03:11,406-Speed 2640.02 samples/sec   Loss 8.7339   LearningRate 0.0426   Epoch: 6   Global Step: 287860   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:15,318-Speed 2618.88 samples/sec   Loss 8.8680   LearningRate 0.0426   Epoch: 6   Global Step: 287870   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:19,213-Speed 2629.23 samples/sec   Loss 8.6976   LearningRate 0.0426   Epoch: 6   Global Step: 287880   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:23,107-Speed 2630.67 samples/sec   Loss 8.8111   LearningRate 0.0426   Epoch: 6   Global Step: 287890   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:27,010-Speed 2624.39 samples/sec   Loss 8.8564   LearningRate 0.0426   Epoch: 6   Global Step: 287900   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:30,911-Speed 2625.49 samples/sec   Loss 9.0071   LearningRate 0.0426   Epoch: 6   Global Step: 287910   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:34,817-Speed 2622.26 samples/sec   Loss 8.8268   LearningRate 0.0426   Epoch: 6   Global Step: 287920   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:38,713-Speed 2629.55 samples/sec   Loss 8.7943   LearningRate 0.0426   Epoch: 6   Global Step: 287930   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:42,609-Speed 2628.34 samples/sec   Loss 8.8428   LearningRate 0.0426   Epoch: 6   Global Step: 287940   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:46,506-Speed 2628.50 samples/sec   Loss 8.8218   LearningRate 0.0426   Epoch: 6   Global Step: 287950   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:03:50,399-Speed 2630.90 samples/sec   Loss 8.8046   LearningRate 0.0426   Epoch: 6   Global Step: 287960   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:03:54,296-Speed 2628.55 samples/sec   Loss 8.7818   LearningRate 0.0426   Epoch: 6   Global Step: 287970   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:03:58,190-Speed 2630.25 samples/sec   Loss 8.8843   LearningRate 0.0426   Epoch: 6   Global Step: 287980   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:04:02,086-Speed 2628.99 samples/sec   Loss 8.7882   LearningRate 0.0426   Epoch: 6   Global Step: 287990   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:04:05,984-Speed 2627.41 samples/sec   Loss 8.8555   LearningRate 0.0426   Epoch: 6   Global Step: 288000   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:04:09,895-Speed 2618.84 samples/sec   Loss 8.9000   LearningRate 0.0426   Epoch: 6   Global Step: 288010   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:04:13,791-Speed 2629.32 samples/sec   Loss 8.9746   LearningRate 0.0426   Epoch: 6   Global Step: 288020   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:04:17,720-Speed 2607.24 samples/sec   Loss 8.9422   LearningRate 0.0426   Epoch: 6   Global Step: 288030   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:04:21,615-Speed 2630.54 samples/sec   Loss 8.7369   LearningRate 0.0426   Epoch: 6   Global Step: 288040   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:04:25,493-Speed 2640.89 samples/sec   Loss 8.8891   LearningRate 0.0426   Epoch: 6   Global Step: 288050   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:29,400-Speed 2621.81 samples/sec   Loss 8.7529   LearningRate 0.0426   Epoch: 6   Global Step: 288060   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:33,302-Speed 2624.96 samples/sec   Loss 8.8767   LearningRate 0.0426   Epoch: 6   Global Step: 288070   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:37,197-Speed 2629.84 samples/sec   Loss 8.7865   LearningRate 0.0426   Epoch: 6   Global Step: 288080   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:41,092-Speed 2629.27 samples/sec   Loss 8.9017   LearningRate 0.0426   Epoch: 6   Global Step: 288090   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:44,989-Speed 2628.64 samples/sec   Loss 8.7983   LearningRate 0.0426   Epoch: 6   Global Step: 288100   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:48,885-Speed 2628.98 samples/sec   Loss 8.6845   LearningRate 0.0426   Epoch: 6   Global Step: 288110   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:52,814-Speed 2607.57 samples/sec   Loss 8.8939   LearningRate 0.0426   Epoch: 6   Global Step: 288120   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:04:56,746-Speed 2604.92 samples/sec   Loss 8.8225   LearningRate 0.0426   Epoch: 6   Global Step: 288130   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:05:00,668-Speed 2611.52 samples/sec   Loss 8.8206   LearningRate 0.0426   Epoch: 6   Global Step: 288140   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:05:04,574-Speed 2622.46 samples/sec   Loss 8.7923   LearningRate 0.0426   Epoch: 6   Global Step: 288150   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:08,469-Speed 2629.23 samples/sec   Loss 8.8896   LearningRate 0.0426   Epoch: 6   Global Step: 288160   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:12,361-Speed 2631.69 samples/sec   Loss 8.8625   LearningRate 0.0426   Epoch: 6   Global Step: 288170   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:16,274-Speed 2617.59 samples/sec   Loss 8.8153   LearningRate 0.0426   Epoch: 6   Global Step: 288180   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:20,171-Speed 2628.76 samples/sec   Loss 8.7170   LearningRate 0.0426   Epoch: 6   Global Step: 288190   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:24,062-Speed 2632.27 samples/sec   Loss 8.8606   LearningRate 0.0426   Epoch: 6   Global Step: 288200   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:27,958-Speed 2629.17 samples/sec   Loss 8.9279   LearningRate 0.0426   Epoch: 6   Global Step: 288210   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:31,854-Speed 2629.14 samples/sec   Loss 8.8217   LearningRate 0.0426   Epoch: 6   Global Step: 288220   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:35,749-Speed 2629.90 samples/sec   Loss 8.8973   LearningRate 0.0426   Epoch: 6   Global Step: 288230   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:39,656-Speed 2621.36 samples/sec   Loss 8.8210   LearningRate 0.0426   Epoch: 6   Global Step: 288240   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:05:43,533-Speed 2642.43 samples/sec   Loss 8.8048   LearningRate 0.0426   Epoch: 6   Global Step: 288250   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:05:47,432-Speed 2626.47 samples/sec   Loss 8.8530   LearningRate 0.0426   Epoch: 6   Global Step: 288260   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:05:51,326-Speed 2630.56 samples/sec   Loss 8.7668   LearningRate 0.0426   Epoch: 6   Global Step: 288270   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:05:55,221-Speed 2629.89 samples/sec   Loss 8.6517   LearningRate 0.0426   Epoch: 6   Global Step: 288280   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:05:59,113-Speed 2631.71 samples/sec   Loss 8.8953   LearningRate 0.0426   Epoch: 6   Global Step: 288290   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:06:03,008-Speed 2629.76 samples/sec   Loss 8.8058   LearningRate 0.0426   Epoch: 6   Global Step: 288300   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:06:06,908-Speed 2626.40 samples/sec   Loss 9.0845   LearningRate 0.0426   Epoch: 6   Global Step: 288310   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:06:10,800-Speed 2631.49 samples/sec   Loss 8.9324   LearningRate 0.0426   Epoch: 6   Global Step: 288320   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:06:14,725-Speed 2610.07 samples/sec   Loss 8.9075   LearningRate 0.0426   Epoch: 6   Global Step: 288330   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:06:18,654-Speed 2606.28 samples/sec   Loss 8.7097   LearningRate 0.0426   Epoch: 6   Global Step: 288340   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:06:22,565-Speed 2619.28 samples/sec   Loss 8.8265   LearningRate 0.0426   Epoch: 6   Global Step: 288350   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:26,481-Speed 2629.82 samples/sec   Loss 8.6841   LearningRate 0.0426   Epoch: 6   Global Step: 288360   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:30,407-Speed 2633.72 samples/sec   Loss 8.7554   LearningRate 0.0426   Epoch: 6   Global Step: 288370   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:34,331-Speed 2610.20 samples/sec   Loss 8.8536   LearningRate 0.0426   Epoch: 6   Global Step: 288380   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:38,224-Speed 2631.11 samples/sec   Loss 8.9111   LearningRate 0.0426   Epoch: 6   Global Step: 288390   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:42,126-Speed 2625.10 samples/sec   Loss 8.8460   LearningRate 0.0426   Epoch: 6   Global Step: 288400   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:50,214-Speed 2616.32 samples/sec   Loss 8.7493   LearningRate 0.0426   Epoch: 6   Global Step: 288410   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:54,282-Speed 2641.40 samples/sec   Loss 8.7888   LearningRate 0.0426   Epoch: 6   Global Step: 288420   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:06:58,190-Speed 2620.66 samples/sec   Loss 8.7511   LearningRate 0.0426   Epoch: 6   Global Step: 288430   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:07:02,258-Speed 2640.38 samples/sec   Loss 8.9033   LearningRate 0.0426   Epoch: 6   Global Step: 288440   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:07:06,153-Speed 2629.11 samples/sec   Loss 8.8011   LearningRate 0.0425   Epoch: 6   Global Step: 288450   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:07:10,041-Speed 2634.39 samples/sec   Loss 8.8530   LearningRate 0.0425   Epoch: 6   Global Step: 288460   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:07:13,944-Speed 2624.44 samples/sec   Loss 8.8107   LearningRate 0.0425   Epoch: 6   Global Step: 288470   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:07:18,065-Speed 2629.68 samples/sec   Loss 8.9325   LearningRate 0.0425   Epoch: 6   Global Step: 288480   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:07:21,961-Speed 2629.14 samples/sec   Loss 8.6886   LearningRate 0.0425   Epoch: 6   Global Step: 288490   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:07:25,846-Speed 2636.57 samples/sec   Loss 8.8505   LearningRate 0.0425   Epoch: 6   Global Step: 288500   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:07:29,733-Speed 2635.20 samples/sec   Loss 8.8464   LearningRate 0.0425   Epoch: 6   Global Step: 288510   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:07:33,621-Speed 2634.16 samples/sec   Loss 8.8143   LearningRate 0.0425   Epoch: 6   Global Step: 288520   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:07:37,515-Speed 2629.96 samples/sec   Loss 8.6437   LearningRate 0.0425   Epoch: 6   Global Step: 288530   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:07:41,407-Speed 2632.48 samples/sec   Loss 8.9358   LearningRate 0.0425   Epoch: 6   Global Step: 288540   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:07:45,297-Speed 2632.88 samples/sec   Loss 8.8143   LearningRate 0.0425   Epoch: 6   Global Step: 288550   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:07:49,090-Speed 2700.70 samples/sec   Loss 9.9233   LearningRate 0.0425   Epoch: 6   Global Step: 288560   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:07:52,989-Speed 2627.04 samples/sec   Loss 9.7595   LearningRate 0.0425   Epoch: 6   Global Step: 288570   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:07:56,964-Speed 2576.66 samples/sec   Loss 9.7823   LearningRate 0.0425   Epoch: 6   Global Step: 288580   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:00,845-Speed 2638.70 samples/sec   Loss 9.3922   LearningRate 0.0425   Epoch: 6   Global Step: 288590   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:04,737-Speed 2632.15 samples/sec   Loss 9.2172   LearningRate 0.0425   Epoch: 6   Global Step: 288600   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:08,642-Speed 2623.10 samples/sec   Loss 9.0778   LearningRate 0.0425   Epoch: 6   Global Step: 288610   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:12,559-Speed 2614.62 samples/sec   Loss 8.8250   LearningRate 0.0425   Epoch: 6   Global Step: 288620   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:16,458-Speed 2627.74 samples/sec   Loss 8.9765   LearningRate 0.0425   Epoch: 6   Global Step: 288630   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:20,354-Speed 2628.40 samples/sec   Loss 8.8867   LearningRate 0.0425   Epoch: 6   Global Step: 288640   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:24,241-Speed 2635.19 samples/sec   Loss 8.7609   LearningRate 0.0425   Epoch: 6   Global Step: 288650   Fp16 Grad Scale: 2048   Required: 61 hours
Training: 2022-04-14 04:08:28,131-Speed 2633.30 samples/sec   Loss 8.8971   LearningRate 0.0425   Epoch: 6   Global Step: 288660   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:32,022-Speed 2632.48 samples/sec   Loss 8.9420   LearningRate 0.0425   Epoch: 6   Global Step: 288670   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:35,912-Speed 2632.98 samples/sec   Loss 8.8059   LearningRate 0.0425   Epoch: 6   Global Step: 288680   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:39,802-Speed 2632.70 samples/sec   Loss 8.8804   LearningRate 0.0425   Epoch: 6   Global Step: 288690   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:43,692-Speed 2632.82 samples/sec   Loss 8.8257   LearningRate 0.0425   Epoch: 6   Global Step: 288700   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:47,583-Speed 2632.90 samples/sec   Loss 8.7469   LearningRate 0.0425   Epoch: 6   Global Step: 288710   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:51,487-Speed 2623.54 samples/sec   Loss 8.8783   LearningRate 0.0425   Epoch: 6   Global Step: 288720   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:55,397-Speed 2619.70 samples/sec   Loss 8.8933   LearningRate 0.0425   Epoch: 6   Global Step: 288730   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:08:59,327-Speed 2606.41 samples/sec   Loss 8.7898   LearningRate 0.0425   Epoch: 6   Global Step: 288740   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:09:03,222-Speed 2629.34 samples/sec   Loss 8.9776   LearningRate 0.0425   Epoch: 6   Global Step: 288750   Fp16 Grad Scale: 4096   Required: 61 hours
Training: 2022-04-14 04:09:07,119-Speed 2628.25 samples/sec   Loss 8.7799   LearningRate 0.0425   Epoch: 6   Global Step: 288760   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:11,010-Speed 2632.97 samples/sec   Loss 8.9022   LearningRate 0.0425   Epoch: 6   Global Step: 288770   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:14,901-Speed 2631.84 samples/sec   Loss 8.9764   LearningRate 0.0425   Epoch: 6   Global Step: 288780   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:18,801-Speed 2626.01 samples/sec   Loss 8.9349   LearningRate 0.0425   Epoch: 6   Global Step: 288790   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:22,702-Speed 2626.51 samples/sec   Loss 8.6480   LearningRate 0.0425   Epoch: 6   Global Step: 288800   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:26,607-Speed 2623.08 samples/sec   Loss 8.7625   LearningRate 0.0425   Epoch: 6   Global Step: 288810   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:30,519-Speed 2618.48 samples/sec   Loss 8.8194   LearningRate 0.0425   Epoch: 6   Global Step: 288820   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:34,421-Speed 2625.19 samples/sec   Loss 8.8932   LearningRate 0.0425   Epoch: 6   Global Step: 288830   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:38,328-Speed 2621.56 samples/sec   Loss 8.8216   LearningRate 0.0425   Epoch: 6   Global Step: 288840   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:42,228-Speed 2626.04 samples/sec   Loss 8.6688   LearningRate 0.0425   Epoch: 6   Global Step: 288850   Fp16 Grad Scale: 8192   Required: 61 hours
Training: 2022-04-14 04:09:46,136-Speed 2621.39 samples/sec   Loss 8.8345   LearningRate 0.0425   Epoch: 6   Global Step: 288860   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:09:50,025-Speed 2633.58 samples/sec   Loss 8.8273   LearningRate 0.0425   Epoch: 6   Global Step: 288870   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:09:53,918-Speed 2631.43 samples/sec   Loss 8.8943   LearningRate 0.0425   Epoch: 6   Global Step: 288880   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:09:57,801-Speed 2637.02 samples/sec   Loss 8.8060   LearningRate 0.0425   Epoch: 6   Global Step: 288890   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:10:01,704-Speed 2624.76 samples/sec   Loss 8.9558   LearningRate 0.0425   Epoch: 6   Global Step: 288900   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:10:05,599-Speed 2629.77 samples/sec   Loss 8.8570   LearningRate 0.0425   Epoch: 6   Global Step: 288910   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:10:09,543-Speed 2596.83 samples/sec   Loss 8.6972   LearningRate 0.0425   Epoch: 6   Global Step: 288920   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:10:13,444-Speed 2625.79 samples/sec   Loss 8.9259   LearningRate 0.0425   Epoch: 6   Global Step: 288930   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:10:17,385-Speed 2598.42 samples/sec   Loss 8.8055   LearningRate 0.0425   Epoch: 6   Global Step: 288940   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:10:21,272-Speed 2635.27 samples/sec   Loss 8.5954   LearningRate 0.0425   Epoch: 6   Global Step: 288950   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:10:25,163-Speed 2632.63 samples/sec   Loss 8.9194   LearningRate 0.0425   Epoch: 6   Global Step: 288960   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:29,051-Speed 2634.70 samples/sec   Loss 8.6978   LearningRate 0.0425   Epoch: 6   Global Step: 288970   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:32,940-Speed 2633.40 samples/sec   Loss 8.7914   LearningRate 0.0425   Epoch: 6   Global Step: 288980   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:36,952-Speed 2552.59 samples/sec   Loss 8.9471   LearningRate 0.0425   Epoch: 6   Global Step: 288990   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:40,848-Speed 2628.72 samples/sec   Loss 8.8134   LearningRate 0.0425   Epoch: 6   Global Step: 289000   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:44,744-Speed 2629.55 samples/sec   Loss 8.9043   LearningRate 0.0425   Epoch: 6   Global Step: 289010   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:48,644-Speed 2626.14 samples/sec   Loss 8.7262   LearningRate 0.0425   Epoch: 6   Global Step: 289020   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:52,534-Speed 2633.22 samples/sec   Loss 8.8118   LearningRate 0.0425   Epoch: 6   Global Step: 289030   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:10:56,425-Speed 2632.49 samples/sec   Loss 8.7297   LearningRate 0.0425   Epoch: 6   Global Step: 289040   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:11:00,393-Speed 2580.95 samples/sec   Loss 8.8854   LearningRate 0.0425   Epoch: 6   Global Step: 289050   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:11:04,292-Speed 2627.07 samples/sec   Loss 8.8463   LearningRate 0.0425   Epoch: 6   Global Step: 289060   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:08,199-Speed 2621.99 samples/sec   Loss 8.9806   LearningRate 0.0425   Epoch: 6   Global Step: 289070   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:12,099-Speed 2625.79 samples/sec   Loss 8.8818   LearningRate 0.0424   Epoch: 6   Global Step: 289080   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:16,017-Speed 2614.84 samples/sec   Loss 8.8311   LearningRate 0.0424   Epoch: 6   Global Step: 289090   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:19,910-Speed 2631.14 samples/sec   Loss 8.8965   LearningRate 0.0424   Epoch: 6   Global Step: 289100   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:23,797-Speed 2635.13 samples/sec   Loss 8.8498   LearningRate 0.0424   Epoch: 6   Global Step: 289110   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:27,695-Speed 2627.18 samples/sec   Loss 8.8187   LearningRate 0.0424   Epoch: 6   Global Step: 289120   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:31,586-Speed 2633.11 samples/sec   Loss 8.9151   LearningRate 0.0424   Epoch: 6   Global Step: 289130   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:35,476-Speed 2633.03 samples/sec   Loss 8.8820   LearningRate 0.0424   Epoch: 6   Global Step: 289140   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:39,369-Speed 2630.94 samples/sec   Loss 8.6749   LearningRate 0.0424   Epoch: 6   Global Step: 289150   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:43,251-Speed 2637.89 samples/sec   Loss 8.8423   LearningRate 0.0424   Epoch: 6   Global Step: 289160   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:47,209-Speed 2588.40 samples/sec   Loss 8.8006   LearningRate 0.0424   Epoch: 6   Global Step: 289170   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:51,108-Speed 2626.48 samples/sec   Loss 8.7925   LearningRate 0.0424   Epoch: 6   Global Step: 289180   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:55,001-Speed 2631.13 samples/sec   Loss 8.7969   LearningRate 0.0424   Epoch: 6   Global Step: 289190   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:11:58,889-Speed 2634.67 samples/sec   Loss 8.7191   LearningRate 0.0424   Epoch: 6   Global Step: 289200   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:12:02,788-Speed 2627.30 samples/sec   Loss 8.7777   LearningRate 0.0424   Epoch: 6   Global Step: 289210   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:12:06,696-Speed 2620.83 samples/sec   Loss 8.7717   LearningRate 0.0424   Epoch: 6   Global Step: 289220   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:12:10,591-Speed 2629.16 samples/sec   Loss 8.7781   LearningRate 0.0424   Epoch: 6   Global Step: 289230   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:12:14,502-Speed 2619.22 samples/sec   Loss 8.8968   LearningRate 0.0424   Epoch: 6   Global Step: 289240   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:12:18,392-Speed 2632.66 samples/sec   Loss 8.7545   LearningRate 0.0424   Epoch: 6   Global Step: 289250   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:12:22,287-Speed 2630.16 samples/sec   Loss 8.8849   LearningRate 0.0424   Epoch: 6   Global Step: 289260   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:26,181-Speed 2630.18 samples/sec   Loss 8.7773   LearningRate 0.0424   Epoch: 6   Global Step: 289270   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:30,085-Speed 2623.93 samples/sec   Loss 8.9770   LearningRate 0.0424   Epoch: 6   Global Step: 289280   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:33,980-Speed 2629.55 samples/sec   Loss 8.8916   LearningRate 0.0424   Epoch: 6   Global Step: 289290   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:37,871-Speed 2632.27 samples/sec   Loss 8.7966   LearningRate 0.0424   Epoch: 6   Global Step: 289300   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:41,761-Speed 2633.02 samples/sec   Loss 8.9332   LearningRate 0.0424   Epoch: 6   Global Step: 289310   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:45,665-Speed 2623.68 samples/sec   Loss 8.8474   LearningRate 0.0424   Epoch: 6   Global Step: 289320   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:49,567-Speed 2625.13 samples/sec   Loss 8.7935   LearningRate 0.0424   Epoch: 6   Global Step: 289330   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:53,469-Speed 2625.35 samples/sec   Loss 8.8367   LearningRate 0.0424   Epoch: 6   Global Step: 289340   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:12:57,360-Speed 2632.29 samples/sec   Loss 8.8713   LearningRate 0.0424   Epoch: 6   Global Step: 289350   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:01,256-Speed 2628.88 samples/sec   Loss 8.9480   LearningRate 0.0424   Epoch: 6   Global Step: 289360   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:13:05,147-Speed 2632.44 samples/sec   Loss 8.8148   LearningRate 0.0424   Epoch: 6   Global Step: 289370   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:09,043-Speed 2629.46 samples/sec   Loss 8.8040   LearningRate 0.0424   Epoch: 6   Global Step: 289380   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:12,935-Speed 2631.53 samples/sec   Loss 8.8113   LearningRate 0.0424   Epoch: 6   Global Step: 289390   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:16,833-Speed 2628.10 samples/sec   Loss 8.6761   LearningRate 0.0424   Epoch: 6   Global Step: 289400   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:20,727-Speed 2630.51 samples/sec   Loss 8.7031   LearningRate 0.0424   Epoch: 6   Global Step: 289410   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:24,623-Speed 2628.67 samples/sec   Loss 8.8505   LearningRate 0.0424   Epoch: 6   Global Step: 289420   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:28,519-Speed 2629.51 samples/sec   Loss 8.7437   LearningRate 0.0424   Epoch: 6   Global Step: 289430   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:32,419-Speed 2626.43 samples/sec   Loss 8.9131   LearningRate 0.0424   Epoch: 6   Global Step: 289440   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:36,338-Speed 2612.96 samples/sec   Loss 8.7717   LearningRate 0.0424   Epoch: 6   Global Step: 289450   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:40,231-Speed 2630.95 samples/sec   Loss 8.7603   LearningRate 0.0424   Epoch: 6   Global Step: 289460   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:44,124-Speed 2631.29 samples/sec   Loss 8.7356   LearningRate 0.0424   Epoch: 6   Global Step: 289470   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:13:48,005-Speed 2638.98 samples/sec   Loss 8.7980   LearningRate 0.0424   Epoch: 6   Global Step: 289480   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:51,898-Speed 2631.30 samples/sec   Loss 8.8773   LearningRate 0.0424   Epoch: 6   Global Step: 289490   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:55,797-Speed 2626.65 samples/sec   Loss 8.9322   LearningRate 0.0424   Epoch: 6   Global Step: 289500   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:13:59,724-Speed 2608.79 samples/sec   Loss 8.9133   LearningRate 0.0424   Epoch: 6   Global Step: 289510   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:03,623-Speed 2627.24 samples/sec   Loss 8.8405   LearningRate 0.0424   Epoch: 6   Global Step: 289520   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:07,519-Speed 2628.79 samples/sec   Loss 8.8212   LearningRate 0.0424   Epoch: 6   Global Step: 289530   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:11,419-Speed 2625.90 samples/sec   Loss 8.9146   LearningRate 0.0424   Epoch: 6   Global Step: 289540   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:15,315-Speed 2629.68 samples/sec   Loss 8.9040   LearningRate 0.0424   Epoch: 6   Global Step: 289550   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:19,219-Speed 2623.42 samples/sec   Loss 8.8090   LearningRate 0.0424   Epoch: 6   Global Step: 289560   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:23,119-Speed 2626.38 samples/sec   Loss 8.8062   LearningRate 0.0424   Epoch: 6   Global Step: 289570   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:26,993-Speed 2643.41 samples/sec   Loss 8.7264   LearningRate 0.0424   Epoch: 6   Global Step: 289580   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:30,891-Speed 2627.98 samples/sec   Loss 8.7980   LearningRate 0.0424   Epoch: 6   Global Step: 289590   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:34,790-Speed 2626.92 samples/sec   Loss 8.7977   LearningRate 0.0424   Epoch: 6   Global Step: 289600   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:14:38,676-Speed 2635.51 samples/sec   Loss 8.8956   LearningRate 0.0424   Epoch: 6   Global Step: 289610   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:14:42,578-Speed 2625.38 samples/sec   Loss 8.7872   LearningRate 0.0424   Epoch: 6   Global Step: 289620   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:14:46,495-Speed 2614.87 samples/sec   Loss 8.8678   LearningRate 0.0424   Epoch: 6   Global Step: 289630   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:14:50,403-Speed 2620.70 samples/sec   Loss 8.6948   LearningRate 0.0424   Epoch: 6   Global Step: 289640   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:14:54,290-Speed 2635.33 samples/sec   Loss 8.7375   LearningRate 0.0424   Epoch: 6   Global Step: 289650   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:14:58,180-Speed 2633.05 samples/sec   Loss 8.7370   LearningRate 0.0424   Epoch: 6   Global Step: 289660   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:15:02,094-Speed 2617.33 samples/sec   Loss 8.7897   LearningRate 0.0424   Epoch: 6   Global Step: 289670   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:15:06,000-Speed 2622.23 samples/sec   Loss 8.7292   LearningRate 0.0424   Epoch: 6   Global Step: 289680   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:15:09,902-Speed 2624.41 samples/sec   Loss 8.8666   LearningRate 0.0424   Epoch: 6   Global Step: 289690   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:15:13,800-Speed 2627.59 samples/sec   Loss 8.9690   LearningRate 0.0424   Epoch: 6   Global Step: 289700   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:15:17,700-Speed 2626.00 samples/sec   Loss 8.8151   LearningRate 0.0424   Epoch: 6   Global Step: 289710   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:21,598-Speed 2628.05 samples/sec   Loss 8.7990   LearningRate 0.0423   Epoch: 6   Global Step: 289720   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:25,493-Speed 2629.53 samples/sec   Loss 8.8809   LearningRate 0.0423   Epoch: 6   Global Step: 289730   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:29,399-Speed 2631.20 samples/sec   Loss 8.6339   LearningRate 0.0423   Epoch: 6   Global Step: 289740   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:33,293-Speed 2630.24 samples/sec   Loss 8.7933   LearningRate 0.0423   Epoch: 6   Global Step: 289750   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:37,189-Speed 2628.62 samples/sec   Loss 8.7944   LearningRate 0.0423   Epoch: 6   Global Step: 289760   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:41,085-Speed 2628.65 samples/sec   Loss 8.7399   LearningRate 0.0423   Epoch: 6   Global Step: 289770   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:44,985-Speed 2626.83 samples/sec   Loss 8.8266   LearningRate 0.0423   Epoch: 6   Global Step: 289780   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:48,880-Speed 2629.34 samples/sec   Loss 8.8990   LearningRate 0.0423   Epoch: 6   Global Step: 289790   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:52,776-Speed 2629.51 samples/sec   Loss 8.8402   LearningRate 0.0423   Epoch: 6   Global Step: 289800   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:15:56,668-Speed 2631.22 samples/sec   Loss 8.8825   LearningRate 0.0423   Epoch: 6   Global Step: 289810   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:16:00,561-Speed 2631.40 samples/sec   Loss 8.6754   LearningRate 0.0423   Epoch: 6   Global Step: 289820   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:16:04,458-Speed 2628.46 samples/sec   Loss 8.8711   LearningRate 0.0423   Epoch: 6   Global Step: 289830   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:16:08,347-Speed 2633.28 samples/sec   Loss 8.6748   LearningRate 0.0423   Epoch: 6   Global Step: 289840   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:16:12,221-Speed 2643.77 samples/sec   Loss 8.7400   LearningRate 0.0423   Epoch: 6   Global Step: 289850   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:16:16,105-Speed 2637.22 samples/sec   Loss 8.9369   LearningRate 0.0423   Epoch: 6   Global Step: 289860   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:20,018-Speed 2617.06 samples/sec   Loss 8.8927   LearningRate 0.0423   Epoch: 6   Global Step: 289870   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:23,918-Speed 2627.27 samples/sec   Loss 8.7779   LearningRate 0.0423   Epoch: 6   Global Step: 289880   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:27,806-Speed 2634.19 samples/sec   Loss 8.8740   LearningRate 0.0423   Epoch: 6   Global Step: 289890   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:31,739-Speed 2603.98 samples/sec   Loss 8.6785   LearningRate 0.0423   Epoch: 6   Global Step: 289900   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:35,790-Speed 2528.54 samples/sec   Loss 8.9122   LearningRate 0.0423   Epoch: 6   Global Step: 289910   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:39,694-Speed 2623.33 samples/sec   Loss 8.7766   LearningRate 0.0423   Epoch: 6   Global Step: 289920   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:43,593-Speed 2627.15 samples/sec   Loss 8.8468   LearningRate 0.0423   Epoch: 6   Global Step: 289930   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:47,485-Speed 2631.69 samples/sec   Loss 8.7730   LearningRate 0.0423   Epoch: 6   Global Step: 289940   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:51,377-Speed 2632.12 samples/sec   Loss 8.8498   LearningRate 0.0423   Epoch: 6   Global Step: 289950   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:16:55,269-Speed 2631.48 samples/sec   Loss 8.8886   LearningRate 0.0423   Epoch: 6   Global Step: 289960   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:16:59,158-Speed 2633.97 samples/sec   Loss 8.8051   LearningRate 0.0423   Epoch: 6   Global Step: 289970   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:17:03,032-Speed 2643.70 samples/sec   Loss 8.8787   LearningRate 0.0423   Epoch: 6   Global Step: 289980   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:17:06,932-Speed 2626.01 samples/sec   Loss 8.8915   LearningRate 0.0423   Epoch: 6   Global Step: 289990   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:17:10,826-Speed 2630.27 samples/sec   Loss 8.6995   LearningRate 0.0423   Epoch: 6   Global Step: 290000   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:17:54,451-[lfw][290000]XNorm: 22.874048
Training: 2022-04-14 04:17:54,452-[lfw][290000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-14 04:17:54,453-[lfw][290000]Accuracy-Highest: 0.99783
Training: 2022-04-14 04:18:45,175-[cfp_fp][290000]XNorm: 20.931474
Training: 2022-04-14 04:18:45,176-[cfp_fp][290000]Accuracy-Flip: 0.98443+-0.00587
Training: 2022-04-14 04:18:45,177-[cfp_fp][290000]Accuracy-Highest: 0.98643
Training: 2022-04-14 04:19:28,384-[agedb_30][290000]XNorm: 22.810445
Training: 2022-04-14 04:19:28,385-[agedb_30][290000]Accuracy-Flip: 0.97483+-0.00603
Training: 2022-04-14 04:19:28,386-[agedb_30][290000]Accuracy-Highest: 0.97567
Training: 2022-04-14 04:19:32,247-Speed 72.41 samples/sec   Loss 8.8164   LearningRate 0.0423   Epoch: 6   Global Step: 290010   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:19:36,108-Speed 2652.77 samples/sec   Loss 8.7170   LearningRate 0.0423   Epoch: 6   Global Step: 290020   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:19:39,990-Speed 2639.06 samples/sec   Loss 8.8588   LearningRate 0.0423   Epoch: 6   Global Step: 290030   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:19:44,042-Speed 2527.39 samples/sec   Loss 8.9399   LearningRate 0.0423   Epoch: 6   Global Step: 290040   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:19:47,966-Speed 2611.04 samples/sec   Loss 8.9161   LearningRate 0.0423   Epoch: 6   Global Step: 290050   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:19:51,831-Speed 2650.11 samples/sec   Loss 8.7573   LearningRate 0.0423   Epoch: 6   Global Step: 290060   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:19:55,700-Speed 2647.20 samples/sec   Loss 8.8720   LearningRate 0.0423   Epoch: 6   Global Step: 290070   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:19:59,599-Speed 2627.69 samples/sec   Loss 8.7707   LearningRate 0.0423   Epoch: 6   Global Step: 290080   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:03,471-Speed 2644.63 samples/sec   Loss 8.8243   LearningRate 0.0423   Epoch: 6   Global Step: 290090   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:07,355-Speed 2637.40 samples/sec   Loss 8.9221   LearningRate 0.0423   Epoch: 6   Global Step: 290100   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:11,234-Speed 2640.46 samples/sec   Loss 8.7086   LearningRate 0.0423   Epoch: 6   Global Step: 290110   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:15,115-Speed 2639.05 samples/sec   Loss 8.7687   LearningRate 0.0423   Epoch: 6   Global Step: 290120   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:18,997-Speed 2638.32 samples/sec   Loss 8.9350   LearningRate 0.0423   Epoch: 6   Global Step: 290130   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:22,881-Speed 2637.35 samples/sec   Loss 9.0165   LearningRate 0.0423   Epoch: 6   Global Step: 290140   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:26,769-Speed 2634.45 samples/sec   Loss 8.8082   LearningRate 0.0423   Epoch: 6   Global Step: 290150   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:30,655-Speed 2636.14 samples/sec   Loss 8.7674   LearningRate 0.0423   Epoch: 6   Global Step: 290160   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:34,543-Speed 2634.16 samples/sec   Loss 8.8797   LearningRate 0.0423   Epoch: 6   Global Step: 290170   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:38,429-Speed 2635.94 samples/sec   Loss 8.8336   LearningRate 0.0423   Epoch: 6   Global Step: 290180   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:20:42,318-Speed 2633.49 samples/sec   Loss 8.8589   LearningRate 0.0423   Epoch: 6   Global Step: 290190   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:20:46,312-Speed 2564.75 samples/sec   Loss 8.7375   LearningRate 0.0423   Epoch: 6   Global Step: 290200   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:20:50,214-Speed 2624.10 samples/sec   Loss 8.8825   LearningRate 0.0423   Epoch: 6   Global Step: 290210   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:20:54,078-Speed 2651.37 samples/sec   Loss 8.8864   LearningRate 0.0423   Epoch: 6   Global Step: 290220   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:20:57,969-Speed 2632.01 samples/sec   Loss 8.9049   LearningRate 0.0423   Epoch: 6   Global Step: 290230   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:01,859-Speed 2633.36 samples/sec   Loss 8.8268   LearningRate 0.0423   Epoch: 6   Global Step: 290240   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:05,762-Speed 2624.29 samples/sec   Loss 8.8174   LearningRate 0.0423   Epoch: 6   Global Step: 290250   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:09,655-Speed 2630.83 samples/sec   Loss 8.8647   LearningRate 0.0423   Epoch: 6   Global Step: 290260   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:13,550-Speed 2629.69 samples/sec   Loss 8.7727   LearningRate 0.0423   Epoch: 6   Global Step: 290270   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:17,445-Speed 2629.51 samples/sec   Loss 8.7293   LearningRate 0.0423   Epoch: 6   Global Step: 290280   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:21,343-Speed 2628.46 samples/sec   Loss 8.6967   LearningRate 0.0423   Epoch: 6   Global Step: 290290   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:25,231-Speed 2634.18 samples/sec   Loss 8.9588   LearningRate 0.0423   Epoch: 6   Global Step: 290300   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:29,125-Speed 2630.28 samples/sec   Loss 8.7845   LearningRate 0.0423   Epoch: 6   Global Step: 290310   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:33,026-Speed 2625.27 samples/sec   Loss 8.8547   LearningRate 0.0423   Epoch: 6   Global Step: 290320   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:36,922-Speed 2629.46 samples/sec   Loss 8.8374   LearningRate 0.0423   Epoch: 6   Global Step: 290330   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:40,810-Speed 2634.13 samples/sec   Loss 8.8431   LearningRate 0.0423   Epoch: 6   Global Step: 290340   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:21:44,711-Speed 2625.75 samples/sec   Loss 8.9314   LearningRate 0.0423   Epoch: 6   Global Step: 290350   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:22:05,650-Speed 489.08 samples/sec   Loss 8.8387   LearningRate 0.0422   Epoch: 7   Global Step: 290360   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:22:09,529-Speed 2640.49 samples/sec   Loss 8.7885   LearningRate 0.0422   Epoch: 7   Global Step: 290370   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:22:13,408-Speed 2641.06 samples/sec   Loss 8.7895   LearningRate 0.0422   Epoch: 7   Global Step: 290380   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:17,306-Speed 2627.78 samples/sec   Loss 8.8993   LearningRate 0.0422   Epoch: 7   Global Step: 290390   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:21,209-Speed 2624.16 samples/sec   Loss 8.8399   LearningRate 0.0422   Epoch: 7   Global Step: 290400   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:25,170-Speed 2586.26 samples/sec   Loss 8.8313   LearningRate 0.0422   Epoch: 7   Global Step: 290410   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:29,078-Speed 2620.80 samples/sec   Loss 8.7965   LearningRate 0.0422   Epoch: 7   Global Step: 290420   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:32,967-Speed 2634.00 samples/sec   Loss 8.7864   LearningRate 0.0422   Epoch: 7   Global Step: 290430   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:36,857-Speed 2632.41 samples/sec   Loss 8.8116   LearningRate 0.0422   Epoch: 7   Global Step: 290440   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:40,752-Speed 2629.79 samples/sec   Loss 8.7601   LearningRate 0.0422   Epoch: 7   Global Step: 290450   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:44,644-Speed 2631.34 samples/sec   Loss 8.6554   LearningRate 0.0422   Epoch: 7   Global Step: 290460   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:48,543-Speed 2626.92 samples/sec   Loss 8.9084   LearningRate 0.0422   Epoch: 7   Global Step: 290470   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:22:52,438-Speed 2629.78 samples/sec   Loss 8.7083   LearningRate 0.0422   Epoch: 7   Global Step: 290480   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:22:56,314-Speed 2642.35 samples/sec   Loss 8.7206   LearningRate 0.0422   Epoch: 7   Global Step: 290490   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:23:00,210-Speed 2629.06 samples/sec   Loss 8.8826   LearningRate 0.0422   Epoch: 7   Global Step: 290500   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:23:04,085-Speed 2643.06 samples/sec   Loss 9.6456   LearningRate 0.0422   Epoch: 7   Global Step: 290510   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:07,995-Speed 2619.60 samples/sec   Loss 9.1266   LearningRate 0.0422   Epoch: 7   Global Step: 290520   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:11,907-Speed 2617.80 samples/sec   Loss 10.0776   LearningRate 0.0422   Epoch: 7   Global Step: 290530   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:15,810-Speed 2624.77 samples/sec   Loss 9.0079   LearningRate 0.0422   Epoch: 7   Global Step: 290540   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:19,707-Speed 2628.31 samples/sec   Loss 9.0149   LearningRate 0.0422   Epoch: 7   Global Step: 290550   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:23,607-Speed 2626.94 samples/sec   Loss 8.7510   LearningRate 0.0422   Epoch: 7   Global Step: 290560   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:27,506-Speed 2626.65 samples/sec   Loss 8.8153   LearningRate 0.0422   Epoch: 7   Global Step: 290570   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:31,405-Speed 2627.26 samples/sec   Loss 8.7424   LearningRate 0.0422   Epoch: 7   Global Step: 290580   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:35,316-Speed 2618.90 samples/sec   Loss 8.7181   LearningRate 0.0422   Epoch: 7   Global Step: 290590   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:39,219-Speed 2624.05 samples/sec   Loss 8.7930   LearningRate 0.0422   Epoch: 7   Global Step: 290600   Fp16 Grad Scale: 16384   Required: 61 hours
Training: 2022-04-14 04:23:43,113-Speed 2630.26 samples/sec   Loss 8.7527   LearningRate 0.0422   Epoch: 7   Global Step: 290610   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:23:47,013-Speed 2626.93 samples/sec   Loss 8.8855   LearningRate 0.0422   Epoch: 7   Global Step: 290620   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:23:50,912-Speed 2626.28 samples/sec   Loss 8.6941   LearningRate 0.0422   Epoch: 7   Global Step: 290630   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:23:54,851-Speed 2600.62 samples/sec   Loss 8.8424   LearningRate 0.0422   Epoch: 7   Global Step: 290640   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:23:58,756-Speed 2623.05 samples/sec   Loss 8.6961   LearningRate 0.0422   Epoch: 7   Global Step: 290650   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:24:02,661-Speed 2623.00 samples/sec   Loss 8.7247   LearningRate 0.0422   Epoch: 7   Global Step: 290660   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:24:06,570-Speed 2620.31 samples/sec   Loss 8.8417   LearningRate 0.0422   Epoch: 7   Global Step: 290670   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:24:10,507-Speed 2601.81 samples/sec   Loss 8.7811   LearningRate 0.0422   Epoch: 7   Global Step: 290680   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:24:14,414-Speed 2621.36 samples/sec   Loss 8.6956   LearningRate 0.0422   Epoch: 7   Global Step: 290690   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:24:18,316-Speed 2625.08 samples/sec   Loss 8.9194   LearningRate 0.0422   Epoch: 7   Global Step: 290700   Fp16 Grad Scale: 32768   Required: 61 hours
Training: 2022-04-14 04:24:22,224-Speed 2621.33 samples/sec   Loss 8.8088   LearningRate 0.0422   Epoch: 7   Global Step: 290710   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:26,133-Speed 2619.99 samples/sec   Loss 8.8212   LearningRate 0.0422   Epoch: 7   Global Step: 290720   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:30,035-Speed 2625.16 samples/sec   Loss 8.8487   LearningRate 0.0422   Epoch: 7   Global Step: 290730   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:33,946-Speed 2618.99 samples/sec   Loss 8.7830   LearningRate 0.0422   Epoch: 7   Global Step: 290740   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:37,871-Speed 2609.40 samples/sec   Loss 8.8074   LearningRate 0.0422   Epoch: 7   Global Step: 290750   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:41,796-Speed 2609.76 samples/sec   Loss 8.7769   LearningRate 0.0422   Epoch: 7   Global Step: 290760   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:45,701-Speed 2623.19 samples/sec   Loss 8.6644   LearningRate 0.0422   Epoch: 7   Global Step: 290770   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:49,602-Speed 2626.08 samples/sec   Loss 8.6984   LearningRate 0.0422   Epoch: 7   Global Step: 290780   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:53,505-Speed 2624.18 samples/sec   Loss 8.9119   LearningRate 0.0422   Epoch: 7   Global Step: 290790   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:24:57,408-Speed 2624.24 samples/sec   Loss 8.6952   LearningRate 0.0422   Epoch: 7   Global Step: 290800   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:01,313-Speed 2623.37 samples/sec   Loss 8.8433   LearningRate 0.0422   Epoch: 7   Global Step: 290810   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:25:05,218-Speed 2622.36 samples/sec   Loss 8.7789   LearningRate 0.0422   Epoch: 7   Global Step: 290820   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:25:09,122-Speed 2623.52 samples/sec   Loss 8.6967   LearningRate 0.0422   Epoch: 7   Global Step: 290830   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:25:13,044-Speed 2611.49 samples/sec   Loss 8.6908   LearningRate 0.0422   Epoch: 7   Global Step: 290840   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:25:16,953-Speed 2620.89 samples/sec   Loss 8.6945   LearningRate 0.0422   Epoch: 7   Global Step: 290850   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:25:20,855-Speed 2624.64 samples/sec   Loss 8.8852   LearningRate 0.0422   Epoch: 7   Global Step: 290860   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:25:24,745-Speed 2633.51 samples/sec   Loss 8.7699   LearningRate 0.0422   Epoch: 7   Global Step: 290870   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:28,645-Speed 2626.07 samples/sec   Loss 8.6910   LearningRate 0.0422   Epoch: 7   Global Step: 290880   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:32,547-Speed 2624.61 samples/sec   Loss 8.5520   LearningRate 0.0422   Epoch: 7   Global Step: 290890   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:36,456-Speed 2620.84 samples/sec   Loss 8.8328   LearningRate 0.0422   Epoch: 7   Global Step: 290900   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:40,359-Speed 2624.19 samples/sec   Loss 8.7666   LearningRate 0.0422   Epoch: 7   Global Step: 290910   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:44,261-Speed 2624.51 samples/sec   Loss 8.8398   LearningRate 0.0422   Epoch: 7   Global Step: 290920   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:48,170-Speed 2620.25 samples/sec   Loss 8.8658   LearningRate 0.0422   Epoch: 7   Global Step: 290930   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:52,095-Speed 2609.72 samples/sec   Loss 8.7627   LearningRate 0.0422   Epoch: 7   Global Step: 290940   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:56,004-Speed 2620.27 samples/sec   Loss 8.7521   LearningRate 0.0422   Epoch: 7   Global Step: 290950   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:25:59,907-Speed 2624.27 samples/sec   Loss 8.7851   LearningRate 0.0422   Epoch: 7   Global Step: 290960   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:26:03,822-Speed 2616.18 samples/sec   Loss 8.7684   LearningRate 0.0422   Epoch: 7   Global Step: 290970   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:07,731-Speed 2620.09 samples/sec   Loss 8.7246   LearningRate 0.0422   Epoch: 7   Global Step: 290980   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:11,636-Speed 2623.32 samples/sec   Loss 8.7573   LearningRate 0.0422   Epoch: 7   Global Step: 290990   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:15,540-Speed 2623.45 samples/sec   Loss 8.8260   LearningRate 0.0421   Epoch: 7   Global Step: 291000   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:19,444-Speed 2623.16 samples/sec   Loss 8.7945   LearningRate 0.0421   Epoch: 7   Global Step: 291010   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:23,348-Speed 2623.92 samples/sec   Loss 8.6822   LearningRate 0.0421   Epoch: 7   Global Step: 291020   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:27,264-Speed 2615.39 samples/sec   Loss 8.7892   LearningRate 0.0421   Epoch: 7   Global Step: 291030   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:31,173-Speed 2620.58 samples/sec   Loss 8.8364   LearningRate 0.0421   Epoch: 7   Global Step: 291040   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:35,078-Speed 2622.36 samples/sec   Loss 8.7326   LearningRate 0.0421   Epoch: 7   Global Step: 291050   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:38,989-Speed 2618.92 samples/sec   Loss 8.5724   LearningRate 0.0421   Epoch: 7   Global Step: 291060   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:42,893-Speed 2623.30 samples/sec   Loss 8.8028   LearningRate 0.0421   Epoch: 7   Global Step: 291070   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:26:46,798-Speed 2623.66 samples/sec   Loss 8.8966   LearningRate 0.0421   Epoch: 7   Global Step: 291080   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:26:50,704-Speed 2621.79 samples/sec   Loss 8.7306   LearningRate 0.0421   Epoch: 7   Global Step: 291090   Fp16 Grad Scale: 262144   Required: 61 hours
Training: 2022-04-14 04:26:54,610-Speed 2622.89 samples/sec   Loss 8.8487   LearningRate 0.0421   Epoch: 7   Global Step: 291100   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:26:58,511-Speed 2625.38 samples/sec   Loss 8.7482   LearningRate 0.0421   Epoch: 7   Global Step: 291110   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:27:02,422-Speed 2618.76 samples/sec   Loss 8.6437   LearningRate 0.0421   Epoch: 7   Global Step: 291120   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:27:06,329-Speed 2621.41 samples/sec   Loss 8.9087   LearningRate 0.0421   Epoch: 7   Global Step: 291130   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:27:10,234-Speed 2622.46 samples/sec   Loss 8.8100   LearningRate 0.0421   Epoch: 7   Global Step: 291140   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:27:14,136-Speed 2624.75 samples/sec   Loss 8.7911   LearningRate 0.0421   Epoch: 7   Global Step: 291150   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:27:18,048-Speed 2618.91 samples/sec   Loss 8.7321   LearningRate 0.0421   Epoch: 7   Global Step: 291160   Fp16 Grad Scale: 131072   Required: 61 hours
Training: 2022-04-14 04:27:21,927-Speed 2640.94 samples/sec   Loss 8.8615   LearningRate 0.0421   Epoch: 7   Global Step: 291170   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:25,833-Speed 2622.03 samples/sec   Loss 8.9750   LearningRate 0.0421   Epoch: 7   Global Step: 291180   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:29,739-Speed 2622.39 samples/sec   Loss 8.6822   LearningRate 0.0421   Epoch: 7   Global Step: 291190   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:33,657-Speed 2613.98 samples/sec   Loss 8.7759   LearningRate 0.0421   Epoch: 7   Global Step: 291200   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:37,565-Speed 2621.13 samples/sec   Loss 8.8528   LearningRate 0.0421   Epoch: 7   Global Step: 291210   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:41,515-Speed 2593.00 samples/sec   Loss 8.6457   LearningRate 0.0421   Epoch: 7   Global Step: 291220   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:45,426-Speed 2619.15 samples/sec   Loss 8.7139   LearningRate 0.0421   Epoch: 7   Global Step: 291230   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:49,333-Speed 2621.68 samples/sec   Loss 8.8526   LearningRate 0.0421   Epoch: 7   Global Step: 291240   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:53,241-Speed 2621.18 samples/sec   Loss 8.7571   LearningRate 0.0421   Epoch: 7   Global Step: 291250   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:27:57,148-Speed 2621.06 samples/sec   Loss 8.7422   LearningRate 0.0421   Epoch: 7   Global Step: 291260   Fp16 Grad Scale: 65536   Required: 61 hours
Training: 2022-04-14 04:28:01,055-Speed 2621.64 samples/sec   Loss 8.8106   LearningRate 0.0421   Epoch: 7   Global Step: 291270   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:04,968-Speed 2618.06 samples/sec   Loss 8.6496   LearningRate 0.0421   Epoch: 7   Global Step: 291280   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:08,869-Speed 2625.22 samples/sec   Loss 8.7925   LearningRate 0.0421   Epoch: 7   Global Step: 291290   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:12,772-Speed 2624.11 samples/sec   Loss 8.7306   LearningRate 0.0421   Epoch: 7   Global Step: 291300   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:16,677-Speed 2623.17 samples/sec   Loss 8.6794   LearningRate 0.0421   Epoch: 7   Global Step: 291310   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:20,581-Speed 2623.55 samples/sec   Loss 8.8286   LearningRate 0.0421   Epoch: 7   Global Step: 291320   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:24,489-Speed 2620.92 samples/sec   Loss 8.8067   LearningRate 0.0421   Epoch: 7   Global Step: 291330   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:28,391-Speed 2625.26 samples/sec   Loss 8.6626   LearningRate 0.0421   Epoch: 7   Global Step: 291340   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:32,325-Speed 2603.73 samples/sec   Loss 8.7238   LearningRate 0.0421   Epoch: 7   Global Step: 291350   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:36,229-Speed 2623.46 samples/sec   Loss 8.6466   LearningRate 0.0421   Epoch: 7   Global Step: 291360   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:40,140-Speed 2619.14 samples/sec   Loss 8.6735   LearningRate 0.0421   Epoch: 7   Global Step: 291370   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:28:44,045-Speed 2622.68 samples/sec   Loss 8.7162   LearningRate 0.0421   Epoch: 7   Global Step: 291380   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:28:47,939-Speed 2630.54 samples/sec   Loss 8.7799   LearningRate 0.0421   Epoch: 7   Global Step: 291390   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:51,845-Speed 2622.85 samples/sec   Loss 8.7210   LearningRate 0.0421   Epoch: 7   Global Step: 291400   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:55,757-Speed 2618.06 samples/sec   Loss 8.8538   LearningRate 0.0421   Epoch: 7   Global Step: 291410   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:28:59,662-Speed 2622.36 samples/sec   Loss 9.0102   LearningRate 0.0421   Epoch: 7   Global Step: 291420   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:29:03,573-Speed 2619.13 samples/sec   Loss 8.7515   LearningRate 0.0421   Epoch: 7   Global Step: 291430   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:29:07,491-Speed 2614.23 samples/sec   Loss 8.6658   LearningRate 0.0421   Epoch: 7   Global Step: 291440   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:29:11,403-Speed 2618.24 samples/sec   Loss 8.7093   LearningRate 0.0421   Epoch: 7   Global Step: 291450   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:29:15,293-Speed 2633.06 samples/sec   Loss 8.7196   LearningRate 0.0421   Epoch: 7   Global Step: 291460   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:19,199-Speed 2622.29 samples/sec   Loss 8.7519   LearningRate 0.0421   Epoch: 7   Global Step: 291470   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:23,103-Speed 2623.57 samples/sec   Loss 8.8411   LearningRate 0.0421   Epoch: 7   Global Step: 291480   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:27,007-Speed 2623.54 samples/sec   Loss 8.7974   LearningRate 0.0421   Epoch: 7   Global Step: 291490   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:30,911-Speed 2623.55 samples/sec   Loss 8.7441   LearningRate 0.0421   Epoch: 7   Global Step: 291500   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:34,816-Speed 2623.30 samples/sec   Loss 8.7863   LearningRate 0.0421   Epoch: 7   Global Step: 291510   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:38,720-Speed 2623.07 samples/sec   Loss 8.7152   LearningRate 0.0421   Epoch: 7   Global Step: 291520   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:42,642-Speed 2611.77 samples/sec   Loss 8.7474   LearningRate 0.0421   Epoch: 7   Global Step: 291530   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:46,546-Speed 2623.41 samples/sec   Loss 8.8766   LearningRate 0.0421   Epoch: 7   Global Step: 291540   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:50,449-Speed 2624.83 samples/sec   Loss 8.7270   LearningRate 0.0421   Epoch: 7   Global Step: 291550   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:29:54,349-Speed 2626.10 samples/sec   Loss 8.7656   LearningRate 0.0421   Epoch: 7   Global Step: 291560   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:29:58,254-Speed 2623.29 samples/sec   Loss 8.8047   LearningRate 0.0421   Epoch: 7   Global Step: 291570   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:02,157-Speed 2624.26 samples/sec   Loss 8.6371   LearningRate 0.0421   Epoch: 7   Global Step: 291580   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:06,060-Speed 2623.89 samples/sec   Loss 8.7659   LearningRate 0.0421   Epoch: 7   Global Step: 291590   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:10,003-Speed 2597.68 samples/sec   Loss 8.7534   LearningRate 0.0421   Epoch: 7   Global Step: 291600   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:13,909-Speed 2621.94 samples/sec   Loss 8.7460   LearningRate 0.0421   Epoch: 7   Global Step: 291610   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:17,851-Speed 2599.18 samples/sec   Loss 8.7162   LearningRate 0.0421   Epoch: 7   Global Step: 291620   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:21,760-Speed 2619.85 samples/sec   Loss 8.5825   LearningRate 0.0421   Epoch: 7   Global Step: 291630   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:25,663-Speed 2624.95 samples/sec   Loss 8.9060   LearningRate 0.0420   Epoch: 7   Global Step: 291640   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:29,564-Speed 2625.75 samples/sec   Loss 8.7560   LearningRate 0.0420   Epoch: 7   Global Step: 291650   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:33,495-Speed 2605.03 samples/sec   Loss 8.7122   LearningRate 0.0420   Epoch: 7   Global Step: 291660   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:30:37,412-Speed 2615.13 samples/sec   Loss 8.8131   LearningRate 0.0420   Epoch: 7   Global Step: 291670   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:30:41,346-Speed 2603.52 samples/sec   Loss 8.7482   LearningRate 0.0420   Epoch: 7   Global Step: 291680   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:30:45,257-Speed 2619.14 samples/sec   Loss 8.7778   LearningRate 0.0420   Epoch: 7   Global Step: 291690   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:50,279-Speed 2039.61 samples/sec   Loss 8.7464   LearningRate 0.0420   Epoch: 7   Global Step: 291700   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:54,192-Speed 2617.74 samples/sec   Loss 8.7163   LearningRate 0.0420   Epoch: 7   Global Step: 291710   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:30:58,107-Speed 2617.56 samples/sec   Loss 8.7921   LearningRate 0.0420   Epoch: 7   Global Step: 291720   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:02,017-Speed 2619.22 samples/sec   Loss 8.8190   LearningRate 0.0420   Epoch: 7   Global Step: 291730   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:05,929-Speed 2617.77 samples/sec   Loss 8.8449   LearningRate 0.0420   Epoch: 7   Global Step: 291740   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:09,847-Speed 2614.06 samples/sec   Loss 8.8964   LearningRate 0.0420   Epoch: 7   Global Step: 291750   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:13,765-Speed 2614.88 samples/sec   Loss 8.7647   LearningRate 0.0420   Epoch: 7   Global Step: 291760   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:17,677-Speed 2618.83 samples/sec   Loss 8.9597   LearningRate 0.0420   Epoch: 7   Global Step: 291770   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:21,588-Speed 2618.26 samples/sec   Loss 8.8159   LearningRate 0.0420   Epoch: 7   Global Step: 291780   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:25,515-Speed 2609.25 samples/sec   Loss 8.7683   LearningRate 0.0420   Epoch: 7   Global Step: 291790   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:31:29,404-Speed 2633.13 samples/sec   Loss 8.8441   LearningRate 0.0420   Epoch: 7   Global Step: 291800   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:33,352-Speed 2594.26 samples/sec   Loss 8.8043   LearningRate 0.0420   Epoch: 7   Global Step: 291810   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:37,271-Speed 2614.00 samples/sec   Loss 8.7027   LearningRate 0.0420   Epoch: 7   Global Step: 291820   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:41,176-Speed 2623.23 samples/sec   Loss 8.8064   LearningRate 0.0420   Epoch: 7   Global Step: 291830   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:45,078-Speed 2624.33 samples/sec   Loss 8.7223   LearningRate 0.0420   Epoch: 7   Global Step: 291840   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:48,989-Speed 2619.26 samples/sec   Loss 8.8033   LearningRate 0.0420   Epoch: 7   Global Step: 291850   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:52,910-Speed 2611.62 samples/sec   Loss 8.7067   LearningRate 0.0420   Epoch: 7   Global Step: 291860   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:31:56,802-Speed 2632.11 samples/sec   Loss 8.9241   LearningRate 0.0420   Epoch: 7   Global Step: 291870   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:00,707-Speed 2622.90 samples/sec   Loss 8.8993   LearningRate 0.0420   Epoch: 7   Global Step: 291880   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:04,622-Speed 2616.39 samples/sec   Loss 8.5751   LearningRate 0.0420   Epoch: 7   Global Step: 291890   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:08,524-Speed 2624.99 samples/sec   Loss 8.9187   LearningRate 0.0420   Epoch: 7   Global Step: 291900   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:12,429-Speed 2622.75 samples/sec   Loss 8.8472   LearningRate 0.0420   Epoch: 7   Global Step: 291910   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:16,332-Speed 2624.39 samples/sec   Loss 8.7701   LearningRate 0.0420   Epoch: 7   Global Step: 291920   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:20,350-Speed 2549.02 samples/sec   Loss 8.7971   LearningRate 0.0420   Epoch: 7   Global Step: 291930   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:24,302-Speed 2591.56 samples/sec   Loss 8.8163   LearningRate 0.0420   Epoch: 7   Global Step: 291940   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:28,206-Speed 2623.82 samples/sec   Loss 8.8556   LearningRate 0.0420   Epoch: 7   Global Step: 291950   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:32,113-Speed 2621.88 samples/sec   Loss 8.6865   LearningRate 0.0420   Epoch: 7   Global Step: 291960   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:32:36,020-Speed 2621.28 samples/sec   Loss 8.9367   LearningRate 0.0420   Epoch: 7   Global Step: 291970   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:32:39,927-Speed 2621.51 samples/sec   Loss 8.6961   LearningRate 0.0420   Epoch: 7   Global Step: 291980   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:32:43,833-Speed 2622.30 samples/sec   Loss 8.6897   LearningRate 0.0420   Epoch: 7   Global Step: 291990   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:32:47,748-Speed 2616.44 samples/sec   Loss 8.9166   LearningRate 0.0420   Epoch: 7   Global Step: 292000   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:32:51,661-Speed 2617.27 samples/sec   Loss 8.8403   LearningRate 0.0420   Epoch: 7   Global Step: 292010   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:32:55,576-Speed 2616.20 samples/sec   Loss 8.7333   LearningRate 0.0420   Epoch: 7   Global Step: 292020   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:32:59,478-Speed 2624.99 samples/sec   Loss 8.9394   LearningRate 0.0420   Epoch: 7   Global Step: 292030   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:03,382-Speed 2623.84 samples/sec   Loss 8.9444   LearningRate 0.0420   Epoch: 7   Global Step: 292040   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:07,291-Speed 2620.48 samples/sec   Loss 8.8774   LearningRate 0.0420   Epoch: 7   Global Step: 292050   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:11,303-Speed 2552.36 samples/sec   Loss 8.7786   LearningRate 0.0420   Epoch: 7   Global Step: 292060   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:15,206-Speed 2624.59 samples/sec   Loss 8.7415   LearningRate 0.0420   Epoch: 7   Global Step: 292070   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:33:19,110-Speed 2623.51 samples/sec   Loss 8.7790   LearningRate 0.0420   Epoch: 7   Global Step: 292080   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:33:22,998-Speed 2634.47 samples/sec   Loss 8.8399   LearningRate 0.0420   Epoch: 7   Global Step: 292090   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:26,912-Speed 2616.68 samples/sec   Loss 8.7873   LearningRate 0.0420   Epoch: 7   Global Step: 292100   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:30,914-Speed 2560.27 samples/sec   Loss 8.8275   LearningRate 0.0420   Epoch: 7   Global Step: 292110   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:34,815-Speed 2625.15 samples/sec   Loss 8.8031   LearningRate 0.0420   Epoch: 7   Global Step: 292120   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:38,719-Speed 2623.14 samples/sec   Loss 8.7093   LearningRate 0.0420   Epoch: 7   Global Step: 292130   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:42,622-Speed 2624.86 samples/sec   Loss 8.6192   LearningRate 0.0420   Epoch: 7   Global Step: 292140   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:46,523-Speed 2625.88 samples/sec   Loss 8.7195   LearningRate 0.0420   Epoch: 7   Global Step: 292150   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:50,428-Speed 2622.53 samples/sec   Loss 8.7747   LearningRate 0.0420   Epoch: 7   Global Step: 292160   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:54,329-Speed 2625.66 samples/sec   Loss 8.7360   LearningRate 0.0420   Epoch: 7   Global Step: 292170   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:33:58,227-Speed 2627.53 samples/sec   Loss 8.8705   LearningRate 0.0420   Epoch: 7   Global Step: 292180   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:34:02,112-Speed 2637.36 samples/sec   Loss 8.8390   LearningRate 0.0420   Epoch: 7   Global Step: 292190   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:34:05,995-Speed 2637.37 samples/sec   Loss 8.7839   LearningRate 0.0420   Epoch: 7   Global Step: 292200   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:09,897-Speed 2624.44 samples/sec   Loss 8.7069   LearningRate 0.0420   Epoch: 7   Global Step: 292210   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:13,801-Speed 2623.68 samples/sec   Loss 8.6793   LearningRate 0.0420   Epoch: 7   Global Step: 292220   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:17,703-Speed 2625.43 samples/sec   Loss 8.7945   LearningRate 0.0420   Epoch: 7   Global Step: 292230   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:21,606-Speed 2623.57 samples/sec   Loss 8.7945   LearningRate 0.0420   Epoch: 7   Global Step: 292240   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:25,510-Speed 2623.76 samples/sec   Loss 8.7382   LearningRate 0.0420   Epoch: 7   Global Step: 292250   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:29,411-Speed 2625.76 samples/sec   Loss 9.0063   LearningRate 0.0420   Epoch: 7   Global Step: 292260   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:33,314-Speed 2624.30 samples/sec   Loss 8.8589   LearningRate 0.0420   Epoch: 7   Global Step: 292270   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:37,215-Speed 2625.78 samples/sec   Loss 8.6967   LearningRate 0.0419   Epoch: 7   Global Step: 292280   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:41,114-Speed 2626.54 samples/sec   Loss 8.8070   LearningRate 0.0419   Epoch: 7   Global Step: 292290   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:34:45,019-Speed 2623.74 samples/sec   Loss 8.7595   LearningRate 0.0419   Epoch: 7   Global Step: 292300   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:34:48,923-Speed 2623.62 samples/sec   Loss 8.6963   LearningRate 0.0419   Epoch: 7   Global Step: 292310   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:34:52,860-Speed 2601.86 samples/sec   Loss 8.7918   LearningRate 0.0419   Epoch: 7   Global Step: 292320   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:34:56,774-Speed 2616.78 samples/sec   Loss 8.8219   LearningRate 0.0419   Epoch: 7   Global Step: 292330   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:00,676-Speed 2624.82 samples/sec   Loss 8.7501   LearningRate 0.0419   Epoch: 7   Global Step: 292340   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:04,578-Speed 2624.63 samples/sec   Loss 8.8259   LearningRate 0.0419   Epoch: 7   Global Step: 292350   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:08,477-Speed 2627.74 samples/sec   Loss 8.6324   LearningRate 0.0419   Epoch: 7   Global Step: 292360   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:12,376-Speed 2626.69 samples/sec   Loss 8.7403   LearningRate 0.0419   Epoch: 7   Global Step: 292370   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:16,280-Speed 2623.52 samples/sec   Loss 8.8079   LearningRate 0.0419   Epoch: 7   Global Step: 292380   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:20,253-Speed 2578.92 samples/sec   Loss 8.7048   LearningRate 0.0419   Epoch: 7   Global Step: 292390   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:24,152-Speed 2626.45 samples/sec   Loss 8.6606   LearningRate 0.0419   Epoch: 7   Global Step: 292400   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:28,050-Speed 2627.73 samples/sec   Loss 8.6633   LearningRate 0.0419   Epoch: 7   Global Step: 292410   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:31,950-Speed 2626.04 samples/sec   Loss 8.8200   LearningRate 0.0419   Epoch: 7   Global Step: 292420   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:35,849-Speed 2627.08 samples/sec   Loss 8.7832   LearningRate 0.0419   Epoch: 7   Global Step: 292430   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:39,751-Speed 2624.50 samples/sec   Loss 8.7788   LearningRate 0.0419   Epoch: 7   Global Step: 292440   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:43,659-Speed 2621.37 samples/sec   Loss 8.8007   LearningRate 0.0419   Epoch: 7   Global Step: 292450   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:47,574-Speed 2615.83 samples/sec   Loss 8.7769   LearningRate 0.0419   Epoch: 7   Global Step: 292460   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:51,487-Speed 2617.93 samples/sec   Loss 8.7666   LearningRate 0.0419   Epoch: 7   Global Step: 292470   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:55,400-Speed 2617.42 samples/sec   Loss 8.6618   LearningRate 0.0419   Epoch: 7   Global Step: 292480   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:35:59,307-Speed 2621.26 samples/sec   Loss 8.8372   LearningRate 0.0419   Epoch: 7   Global Step: 292490   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:36:03,220-Speed 2617.54 samples/sec   Loss 8.7619   LearningRate 0.0419   Epoch: 7   Global Step: 292500   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:36:07,112-Speed 2631.89 samples/sec   Loss 8.7046   LearningRate 0.0419   Epoch: 7   Global Step: 292510   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:36:10,974-Speed 2651.56 samples/sec   Loss 8.7303   LearningRate 0.0419   Epoch: 7   Global Step: 292520   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:14,883-Speed 2620.01 samples/sec   Loss 8.7242   LearningRate 0.0419   Epoch: 7   Global Step: 292530   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:18,785-Speed 2625.08 samples/sec   Loss 8.8278   LearningRate 0.0419   Epoch: 7   Global Step: 292540   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:22,688-Speed 2623.77 samples/sec   Loss 8.8623   LearningRate 0.0419   Epoch: 7   Global Step: 292550   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:26,587-Speed 2627.22 samples/sec   Loss 8.8410   LearningRate 0.0419   Epoch: 7   Global Step: 292560   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:30,492-Speed 2623.28 samples/sec   Loss 8.8693   LearningRate 0.0419   Epoch: 7   Global Step: 292570   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:34,396-Speed 2623.48 samples/sec   Loss 8.7892   LearningRate 0.0419   Epoch: 7   Global Step: 292580   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:38,293-Speed 2628.20 samples/sec   Loss 8.7430   LearningRate 0.0419   Epoch: 7   Global Step: 292590   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:42,262-Speed 2580.75 samples/sec   Loss 8.8558   LearningRate 0.0419   Epoch: 7   Global Step: 292600   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:46,163-Speed 2625.30 samples/sec   Loss 8.7325   LearningRate 0.0419   Epoch: 7   Global Step: 292610   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:36:50,064-Speed 2625.24 samples/sec   Loss 8.7593   LearningRate 0.0419   Epoch: 7   Global Step: 292620   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:36:53,973-Speed 2620.29 samples/sec   Loss 8.7877   LearningRate 0.0419   Epoch: 7   Global Step: 292630   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:36:57,881-Speed 2621.45 samples/sec   Loss 8.8322   LearningRate 0.0419   Epoch: 7   Global Step: 292640   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:01,789-Speed 2620.75 samples/sec   Loss 8.6892   LearningRate 0.0419   Epoch: 7   Global Step: 292650   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:05,689-Speed 2626.49 samples/sec   Loss 8.6122   LearningRate 0.0419   Epoch: 7   Global Step: 292660   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:09,588-Speed 2626.84 samples/sec   Loss 8.7900   LearningRate 0.0419   Epoch: 7   Global Step: 292670   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:13,489-Speed 2625.60 samples/sec   Loss 8.7391   LearningRate 0.0419   Epoch: 7   Global Step: 292680   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:17,389-Speed 2626.45 samples/sec   Loss 8.8374   LearningRate 0.0419   Epoch: 7   Global Step: 292690   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:21,290-Speed 2625.77 samples/sec   Loss 8.6994   LearningRate 0.0419   Epoch: 7   Global Step: 292700   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:25,188-Speed 2627.48 samples/sec   Loss 8.7866   LearningRate 0.0419   Epoch: 7   Global Step: 292710   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:37:29,087-Speed 2626.63 samples/sec   Loss 8.7670   LearningRate 0.0419   Epoch: 7   Global Step: 292720   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:37:32,989-Speed 2624.68 samples/sec   Loss 8.6591   LearningRate 0.0419   Epoch: 7   Global Step: 292730   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:37:37,259-Speed 2398.72 samples/sec   Loss 8.6771   LearningRate 0.0419   Epoch: 7   Global Step: 292740   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:37:41,167-Speed 2620.91 samples/sec   Loss 8.6213   LearningRate 0.0419   Epoch: 7   Global Step: 292750   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:37:45,067-Speed 2626.57 samples/sec   Loss 8.6862   LearningRate 0.0419   Epoch: 7   Global Step: 292760   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:37:48,964-Speed 2628.03 samples/sec   Loss 8.6957   LearningRate 0.0419   Epoch: 7   Global Step: 292770   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:37:52,866-Speed 2625.11 samples/sec   Loss 8.6844   LearningRate 0.0419   Epoch: 7   Global Step: 292780   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:37:56,763-Speed 2628.14 samples/sec   Loss 8.7776   LearningRate 0.0419   Epoch: 7   Global Step: 292790   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:00,663-Speed 2626.52 samples/sec   Loss 8.8204   LearningRate 0.0419   Epoch: 7   Global Step: 292800   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:04,564-Speed 2625.01 samples/sec   Loss 8.7251   LearningRate 0.0419   Epoch: 7   Global Step: 292810   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:08,469-Speed 2622.78 samples/sec   Loss 8.7416   LearningRate 0.0419   Epoch: 7   Global Step: 292820   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:38:12,369-Speed 2626.20 samples/sec   Loss 8.7414   LearningRate 0.0419   Epoch: 7   Global Step: 292830   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:38:16,251-Speed 2638.71 samples/sec   Loss 8.7858   LearningRate 0.0419   Epoch: 7   Global Step: 292840   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:20,150-Speed 2626.77 samples/sec   Loss 8.7377   LearningRate 0.0419   Epoch: 7   Global Step: 292850   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:24,046-Speed 2629.21 samples/sec   Loss 8.8350   LearningRate 0.0419   Epoch: 7   Global Step: 292860   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:27,946-Speed 2626.34 samples/sec   Loss 8.8064   LearningRate 0.0419   Epoch: 7   Global Step: 292870   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:31,851-Speed 2622.47 samples/sec   Loss 8.8011   LearningRate 0.0419   Epoch: 7   Global Step: 292880   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:35,749-Speed 2627.56 samples/sec   Loss 8.7361   LearningRate 0.0419   Epoch: 7   Global Step: 292890   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:39,646-Speed 2628.20 samples/sec   Loss 8.5805   LearningRate 0.0419   Epoch: 7   Global Step: 292900   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:43,543-Speed 2628.33 samples/sec   Loss 8.7746   LearningRate 0.0419   Epoch: 7   Global Step: 292910   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:47,442-Speed 2626.50 samples/sec   Loss 8.7687   LearningRate 0.0418   Epoch: 7   Global Step: 292920   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:51,344-Speed 2625.08 samples/sec   Loss 8.9179   LearningRate 0.0418   Epoch: 7   Global Step: 292930   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:38:55,256-Speed 2618.39 samples/sec   Loss 8.6856   LearningRate 0.0418   Epoch: 7   Global Step: 292940   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:38:59,168-Speed 2617.74 samples/sec   Loss 8.7664   LearningRate 0.0418   Epoch: 7   Global Step: 292950   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:39:03,042-Speed 2644.82 samples/sec   Loss 8.7556   LearningRate 0.0418   Epoch: 7   Global Step: 292960   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:06,939-Speed 2628.08 samples/sec   Loss 8.7595   LearningRate 0.0418   Epoch: 7   Global Step: 292970   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:10,841-Speed 2624.99 samples/sec   Loss 8.6338   LearningRate 0.0418   Epoch: 7   Global Step: 292980   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:14,743-Speed 2624.63 samples/sec   Loss 8.8289   LearningRate 0.0418   Epoch: 7   Global Step: 292990   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:18,642-Speed 2626.85 samples/sec   Loss 8.5993   LearningRate 0.0418   Epoch: 7   Global Step: 293000   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:22,542-Speed 2625.85 samples/sec   Loss 8.8501   LearningRate 0.0418   Epoch: 7   Global Step: 293010   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:26,442-Speed 2626.62 samples/sec   Loss 8.6968   LearningRate 0.0418   Epoch: 7   Global Step: 293020   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:30,341-Speed 2626.64 samples/sec   Loss 8.7869   LearningRate 0.0418   Epoch: 7   Global Step: 293030   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:34,259-Speed 2614.63 samples/sec   Loss 8.8365   LearningRate 0.0418   Epoch: 7   Global Step: 293040   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:38,153-Speed 2629.95 samples/sec   Loss 8.6822   LearningRate 0.0418   Epoch: 7   Global Step: 293050   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:42,052-Speed 2627.12 samples/sec   Loss 8.8005   LearningRate 0.0418   Epoch: 7   Global Step: 293060   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:39:45,947-Speed 2629.59 samples/sec   Loss 8.8405   LearningRate 0.0418   Epoch: 7   Global Step: 293070   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:39:49,824-Speed 2641.69 samples/sec   Loss 8.6516   LearningRate 0.0418   Epoch: 7   Global Step: 293080   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:53,720-Speed 2629.08 samples/sec   Loss 8.6694   LearningRate 0.0418   Epoch: 7   Global Step: 293090   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:39:57,616-Speed 2628.67 samples/sec   Loss 8.7429   LearningRate 0.0418   Epoch: 7   Global Step: 293100   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:01,511-Speed 2629.90 samples/sec   Loss 8.7842   LearningRate 0.0418   Epoch: 7   Global Step: 293110   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:05,405-Speed 2630.05 samples/sec   Loss 8.7618   LearningRate 0.0418   Epoch: 7   Global Step: 293120   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:09,305-Speed 2626.32 samples/sec   Loss 8.7699   LearningRate 0.0418   Epoch: 7   Global Step: 293130   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:13,199-Speed 2629.81 samples/sec   Loss 8.7929   LearningRate 0.0418   Epoch: 7   Global Step: 293140   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:17,099-Speed 2627.04 samples/sec   Loss 8.9325   LearningRate 0.0418   Epoch: 7   Global Step: 293150   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:20,997-Speed 2627.75 samples/sec   Loss 8.6316   LearningRate 0.0418   Epoch: 7   Global Step: 293160   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:24,894-Speed 2628.04 samples/sec   Loss 8.6855   LearningRate 0.0418   Epoch: 7   Global Step: 293170   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:28,795-Speed 2625.86 samples/sec   Loss 8.8138   LearningRate 0.0418   Epoch: 7   Global Step: 293180   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:40:32,692-Speed 2628.09 samples/sec   Loss 8.6398   LearningRate 0.0418   Epoch: 7   Global Step: 293190   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:40:36,587-Speed 2629.46 samples/sec   Loss 8.8381   LearningRate 0.0418   Epoch: 7   Global Step: 293200   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:40:40,528-Speed 2598.83 samples/sec   Loss 8.7146   LearningRate 0.0418   Epoch: 7   Global Step: 293210   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:40:44,420-Speed 2631.69 samples/sec   Loss 8.8164   LearningRate 0.0418   Epoch: 7   Global Step: 293220   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:48,316-Speed 2628.49 samples/sec   Loss 8.7560   LearningRate 0.0418   Epoch: 7   Global Step: 293230   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:52,228-Speed 2618.99 samples/sec   Loss 8.7227   LearningRate 0.0418   Epoch: 7   Global Step: 293240   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:40:56,121-Speed 2630.50 samples/sec   Loss 8.7887   LearningRate 0.0418   Epoch: 7   Global Step: 293250   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:41:00,018-Speed 2628.38 samples/sec   Loss 8.6370   LearningRate 0.0418   Epoch: 7   Global Step: 293260   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:41:03,897-Speed 2640.59 samples/sec   Loss 8.7375   LearningRate 0.0418   Epoch: 7   Global Step: 293270   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:07,804-Speed 2621.53 samples/sec   Loss 8.6615   LearningRate 0.0418   Epoch: 7   Global Step: 293280   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:11,708-Speed 2623.02 samples/sec   Loss 8.6697   LearningRate 0.0418   Epoch: 7   Global Step: 293290   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:15,638-Speed 2606.37 samples/sec   Loss 8.8314   LearningRate 0.0418   Epoch: 7   Global Step: 293300   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:19,545-Speed 2621.90 samples/sec   Loss 8.8119   LearningRate 0.0418   Epoch: 7   Global Step: 293310   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:23,438-Speed 2631.00 samples/sec   Loss 8.8069   LearningRate 0.0418   Epoch: 7   Global Step: 293320   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:27,344-Speed 2622.28 samples/sec   Loss 8.7726   LearningRate 0.0418   Epoch: 7   Global Step: 293330   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:31,245-Speed 2625.70 samples/sec   Loss 8.6816   LearningRate 0.0418   Epoch: 7   Global Step: 293340   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:35,140-Speed 2629.42 samples/sec   Loss 8.6903   LearningRate 0.0418   Epoch: 7   Global Step: 293350   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:39,034-Speed 2630.27 samples/sec   Loss 8.7214   LearningRate 0.0418   Epoch: 7   Global Step: 293360   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:41:42,928-Speed 2630.92 samples/sec   Loss 8.6950   LearningRate 0.0418   Epoch: 7   Global Step: 293370   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:41:46,821-Speed 2630.88 samples/sec   Loss 8.6635   LearningRate 0.0418   Epoch: 7   Global Step: 293380   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:41:50,717-Speed 2629.13 samples/sec   Loss 8.8100   LearningRate 0.0418   Epoch: 7   Global Step: 293390   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:41:54,652-Speed 2602.79 samples/sec   Loss 8.8197   LearningRate 0.0418   Epoch: 7   Global Step: 293400   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:41:58,551-Speed 2627.39 samples/sec   Loss 8.7551   LearningRate 0.0418   Epoch: 7   Global Step: 293410   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:02,455-Speed 2623.81 samples/sec   Loss 8.6565   LearningRate 0.0418   Epoch: 7   Global Step: 293420   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:06,352-Speed 2628.10 samples/sec   Loss 8.7155   LearningRate 0.0418   Epoch: 7   Global Step: 293430   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:10,246-Speed 2629.57 samples/sec   Loss 8.8379   LearningRate 0.0418   Epoch: 7   Global Step: 293440   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:14,155-Speed 2620.71 samples/sec   Loss 8.6190   LearningRate 0.0418   Epoch: 7   Global Step: 293450   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:18,060-Speed 2622.97 samples/sec   Loss 8.7832   LearningRate 0.0418   Epoch: 7   Global Step: 293460   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:21,960-Speed 2626.46 samples/sec   Loss 8.6596   LearningRate 0.0418   Epoch: 7   Global Step: 293470   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:42:25,886-Speed 2608.80 samples/sec   Loss 8.7730   LearningRate 0.0418   Epoch: 7   Global Step: 293480   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:29,794-Speed 2621.63 samples/sec   Loss 8.8352   LearningRate 0.0418   Epoch: 7   Global Step: 293490   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:42:33,679-Speed 2636.36 samples/sec   Loss 8.7043   LearningRate 0.0418   Epoch: 7   Global Step: 293500   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:42:37,585-Speed 2622.20 samples/sec   Loss 8.8199   LearningRate 0.0418   Epoch: 7   Global Step: 293510   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:42:41,484-Speed 2626.53 samples/sec   Loss 8.7841   LearningRate 0.0418   Epoch: 7   Global Step: 293520   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:42:45,379-Speed 2629.09 samples/sec   Loss 8.7525   LearningRate 0.0418   Epoch: 7   Global Step: 293530   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:42:49,279-Speed 2626.54 samples/sec   Loss 8.8122   LearningRate 0.0418   Epoch: 7   Global Step: 293540   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:42:53,197-Speed 2614.74 samples/sec   Loss 8.7009   LearningRate 0.0418   Epoch: 7   Global Step: 293550   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:42:57,120-Speed 2611.36 samples/sec   Loss 9.3674   LearningRate 0.0417   Epoch: 7   Global Step: 293560   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:01,085-Speed 2582.79 samples/sec   Loss 9.5308   LearningRate 0.0417   Epoch: 7   Global Step: 293570   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:04,994-Speed 2619.92 samples/sec   Loss 8.9210   LearningRate 0.0417   Epoch: 7   Global Step: 293580   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:08,901-Speed 2621.84 samples/sec   Loss 8.8632   LearningRate 0.0417   Epoch: 7   Global Step: 293590   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:12,847-Speed 2595.80 samples/sec   Loss 8.6328   LearningRate 0.0417   Epoch: 7   Global Step: 293600   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:16,753-Speed 2621.83 samples/sec   Loss 8.9378   LearningRate 0.0417   Epoch: 7   Global Step: 293610   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:20,751-Speed 2562.41 samples/sec   Loss 8.7668   LearningRate 0.0417   Epoch: 7   Global Step: 293620   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:24,683-Speed 2604.58 samples/sec   Loss 9.4611   LearningRate 0.0417   Epoch: 7   Global Step: 293630   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:28,631-Speed 2594.98 samples/sec   Loss 9.2657   LearningRate 0.0417   Epoch: 7   Global Step: 293640   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:32,530-Speed 2626.50 samples/sec   Loss 8.9565   LearningRate 0.0417   Epoch: 7   Global Step: 293650   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:43:36,426-Speed 2628.82 samples/sec   Loss 8.7954   LearningRate 0.0417   Epoch: 7   Global Step: 293660   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:43:40,329-Speed 2623.92 samples/sec   Loss 8.8260   LearningRate 0.0417   Epoch: 7   Global Step: 293670   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:43:44,236-Speed 2621.69 samples/sec   Loss 8.8460   LearningRate 0.0417   Epoch: 7   Global Step: 293680   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:43:48,137-Speed 2625.63 samples/sec   Loss 8.8512   LearningRate 0.0417   Epoch: 7   Global Step: 293690   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:43:52,032-Speed 2630.23 samples/sec   Loss 8.7707   LearningRate 0.0417   Epoch: 7   Global Step: 293700   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:43:55,930-Speed 2627.76 samples/sec   Loss 8.8947   LearningRate 0.0417   Epoch: 7   Global Step: 293710   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:43:59,830-Speed 2626.42 samples/sec   Loss 8.8077   LearningRate 0.0417   Epoch: 7   Global Step: 293720   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:03,730-Speed 2625.90 samples/sec   Loss 8.9748   LearningRate 0.0417   Epoch: 7   Global Step: 293730   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:07,629-Speed 2627.06 samples/sec   Loss 8.8469   LearningRate 0.0417   Epoch: 7   Global Step: 293740   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:11,525-Speed 2629.08 samples/sec   Loss 8.9237   LearningRate 0.0417   Epoch: 7   Global Step: 293750   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:15,434-Speed 2619.51 samples/sec   Loss 8.8734   LearningRate 0.0417   Epoch: 7   Global Step: 293760   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:44:19,338-Speed 2624.19 samples/sec   Loss 8.8400   LearningRate 0.0417   Epoch: 7   Global Step: 293770   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:44:23,227-Speed 2633.54 samples/sec   Loss 8.7822   LearningRate 0.0417   Epoch: 7   Global Step: 293780   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:27,136-Speed 2620.86 samples/sec   Loss 8.8896   LearningRate 0.0417   Epoch: 7   Global Step: 293790   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:31,028-Speed 2631.55 samples/sec   Loss 8.8413   LearningRate 0.0417   Epoch: 7   Global Step: 293800   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:34,929-Speed 2625.84 samples/sec   Loss 8.8373   LearningRate 0.0417   Epoch: 7   Global Step: 293810   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:38,832-Speed 2623.95 samples/sec   Loss 8.7889   LearningRate 0.0417   Epoch: 7   Global Step: 293820   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:42,733-Speed 2626.02 samples/sec   Loss 8.7525   LearningRate 0.0417   Epoch: 7   Global Step: 293830   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:46,799-Speed 2518.86 samples/sec   Loss 8.7748   LearningRate 0.0417   Epoch: 7   Global Step: 293840   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:50,713-Speed 2616.81 samples/sec   Loss 8.8581   LearningRate 0.0417   Epoch: 7   Global Step: 293850   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:54,640-Speed 2608.34 samples/sec   Loss 8.6766   LearningRate 0.0417   Epoch: 7   Global Step: 293860   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:44:58,536-Speed 2628.80 samples/sec   Loss 8.7198   LearningRate 0.0417   Epoch: 7   Global Step: 293870   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:45:02,522-Speed 2570.31 samples/sec   Loss 8.8588   LearningRate 0.0417   Epoch: 7   Global Step: 293880   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:06,420-Speed 2627.47 samples/sec   Loss 8.8323   LearningRate 0.0417   Epoch: 7   Global Step: 293890   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:10,337-Speed 2614.82 samples/sec   Loss 8.7371   LearningRate 0.0417   Epoch: 7   Global Step: 293900   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:14,249-Speed 2618.09 samples/sec   Loss 8.8152   LearningRate 0.0417   Epoch: 7   Global Step: 293910   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:18,165-Speed 2615.63 samples/sec   Loss 8.7105   LearningRate 0.0417   Epoch: 7   Global Step: 293920   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:22,073-Speed 2621.23 samples/sec   Loss 8.7342   LearningRate 0.0417   Epoch: 7   Global Step: 293930   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:25,966-Speed 2631.13 samples/sec   Loss 8.6956   LearningRate 0.0417   Epoch: 7   Global Step: 293940   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:29,861-Speed 2629.77 samples/sec   Loss 8.7261   LearningRate 0.0417   Epoch: 7   Global Step: 293950   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:33,754-Speed 2630.97 samples/sec   Loss 8.8336   LearningRate 0.0417   Epoch: 7   Global Step: 293960   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:37,648-Speed 2630.71 samples/sec   Loss 8.8861   LearningRate 0.0417   Epoch: 7   Global Step: 293970   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:41,539-Speed 2632.20 samples/sec   Loss 8.7111   LearningRate 0.0417   Epoch: 7   Global Step: 293980   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:45,432-Speed 2630.97 samples/sec   Loss 8.7516   LearningRate 0.0417   Epoch: 7   Global Step: 293990   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:49,338-Speed 2622.50 samples/sec   Loss 8.8241   LearningRate 0.0417   Epoch: 7   Global Step: 294000   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:53,246-Speed 2621.19 samples/sec   Loss 8.8791   LearningRate 0.0417   Epoch: 7   Global Step: 294010   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:45:57,143-Speed 2628.36 samples/sec   Loss 8.7999   LearningRate 0.0417   Epoch: 7   Global Step: 294020   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:01,046-Speed 2623.78 samples/sec   Loss 8.7519   LearningRate 0.0417   Epoch: 7   Global Step: 294030   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:04,943-Speed 2628.68 samples/sec   Loss 8.7849   LearningRate 0.0417   Epoch: 7   Global Step: 294040   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:08,860-Speed 2615.12 samples/sec   Loss 8.8512   LearningRate 0.0417   Epoch: 7   Global Step: 294050   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:12,788-Speed 2607.31 samples/sec   Loss 8.7843   LearningRate 0.0417   Epoch: 7   Global Step: 294060   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:16,682-Speed 2630.53 samples/sec   Loss 8.6921   LearningRate 0.0417   Epoch: 7   Global Step: 294070   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:20,579-Speed 2628.47 samples/sec   Loss 8.6886   LearningRate 0.0417   Epoch: 7   Global Step: 294080   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:46:24,456-Speed 2642.14 samples/sec   Loss 8.7521   LearningRate 0.0417   Epoch: 7   Global Step: 294090   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:28,347-Speed 2631.94 samples/sec   Loss 8.6902   LearningRate 0.0417   Epoch: 7   Global Step: 294100   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:32,240-Speed 2630.85 samples/sec   Loss 8.7986   LearningRate 0.0417   Epoch: 7   Global Step: 294110   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:36,133-Speed 2631.12 samples/sec   Loss 8.7286   LearningRate 0.0417   Epoch: 7   Global Step: 294120   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:40,047-Speed 2616.98 samples/sec   Loss 8.8391   LearningRate 0.0417   Epoch: 7   Global Step: 294130   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:43,955-Speed 2621.65 samples/sec   Loss 8.6807   LearningRate 0.0417   Epoch: 7   Global Step: 294140   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:47,860-Speed 2622.46 samples/sec   Loss 8.7728   LearningRate 0.0417   Epoch: 7   Global Step: 294150   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:51,776-Speed 2616.43 samples/sec   Loss 8.7919   LearningRate 0.0417   Epoch: 7   Global Step: 294160   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:55,689-Speed 2617.12 samples/sec   Loss 8.7245   LearningRate 0.0417   Epoch: 7   Global Step: 294170   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:46:59,630-Speed 2599.06 samples/sec   Loss 8.7146   LearningRate 0.0417   Epoch: 7   Global Step: 294180   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:47:03,525-Speed 2629.30 samples/sec   Loss 8.9039   LearningRate 0.0417   Epoch: 7   Global Step: 294190   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:47:07,419-Speed 2631.06 samples/sec   Loss 9.1688   LearningRate 0.0416   Epoch: 7   Global Step: 294200   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:11,311-Speed 2631.36 samples/sec   Loss 8.6836   LearningRate 0.0416   Epoch: 7   Global Step: 294210   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:15,227-Speed 2615.46 samples/sec   Loss 8.7322   LearningRate 0.0416   Epoch: 7   Global Step: 294220   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:19,274-Speed 2531.31 samples/sec   Loss 8.7779   LearningRate 0.0416   Epoch: 7   Global Step: 294230   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:23,168-Speed 2629.82 samples/sec   Loss 8.9031   LearningRate 0.0416   Epoch: 7   Global Step: 294240   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:27,062-Speed 2630.89 samples/sec   Loss 8.7394   LearningRate 0.0416   Epoch: 7   Global Step: 294250   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:30,954-Speed 2631.66 samples/sec   Loss 8.7134   LearningRate 0.0416   Epoch: 7   Global Step: 294260   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:34,853-Speed 2627.02 samples/sec   Loss 8.6432   LearningRate 0.0416   Epoch: 7   Global Step: 294270   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:38,747-Speed 2630.17 samples/sec   Loss 8.7634   LearningRate 0.0416   Epoch: 7   Global Step: 294280   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:42,642-Speed 2629.60 samples/sec   Loss 8.5009   LearningRate 0.0416   Epoch: 7   Global Step: 294290   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:47:46,539-Speed 2628.30 samples/sec   Loss 8.8659   LearningRate 0.0416   Epoch: 7   Global Step: 294300   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:47:50,451-Speed 2618.54 samples/sec   Loss 8.7234   LearningRate 0.0416   Epoch: 7   Global Step: 294310   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:47:54,358-Speed 2621.17 samples/sec   Loss 9.0650   LearningRate 0.0416   Epoch: 7   Global Step: 294320   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:47:58,286-Speed 2608.30 samples/sec   Loss 9.2601   LearningRate 0.0416   Epoch: 7   Global Step: 294330   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:48:02,191-Speed 2623.06 samples/sec   Loss 8.6971   LearningRate 0.0416   Epoch: 7   Global Step: 294340   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:48:06,093-Speed 2624.97 samples/sec   Loss 8.6390   LearningRate 0.0416   Epoch: 7   Global Step: 294350   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:48:10,001-Speed 2621.01 samples/sec   Loss 8.6849   LearningRate 0.0416   Epoch: 7   Global Step: 294360   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:48:13,903-Speed 2624.68 samples/sec   Loss 8.7182   LearningRate 0.0416   Epoch: 7   Global Step: 294370   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:48:17,812-Speed 2620.49 samples/sec   Loss 8.7751   LearningRate 0.0416   Epoch: 7   Global Step: 294380   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:48:21,717-Speed 2623.58 samples/sec   Loss 8.8332   LearningRate 0.0416   Epoch: 7   Global Step: 294390   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:48:25,619-Speed 2624.26 samples/sec   Loss 8.8013   LearningRate 0.0416   Epoch: 7   Global Step: 294400   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:29,526-Speed 2622.18 samples/sec   Loss 8.7941   LearningRate 0.0416   Epoch: 7   Global Step: 294410   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:33,421-Speed 2629.65 samples/sec   Loss 8.5362   LearningRate 0.0416   Epoch: 7   Global Step: 294420   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:37,317-Speed 2628.58 samples/sec   Loss 8.8894   LearningRate 0.0416   Epoch: 7   Global Step: 294430   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:41,213-Speed 2629.11 samples/sec   Loss 8.8172   LearningRate 0.0416   Epoch: 7   Global Step: 294440   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:45,107-Speed 2630.40 samples/sec   Loss 8.7417   LearningRate 0.0416   Epoch: 7   Global Step: 294450   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:49,014-Speed 2622.00 samples/sec   Loss 8.6788   LearningRate 0.0416   Epoch: 7   Global Step: 294460   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:52,909-Speed 2629.12 samples/sec   Loss 8.7313   LearningRate 0.0416   Epoch: 7   Global Step: 294470   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:48:56,821-Speed 2618.74 samples/sec   Loss 8.7369   LearningRate 0.0416   Epoch: 7   Global Step: 294480   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:49:00,722-Speed 2625.38 samples/sec   Loss 8.6453   LearningRate 0.0416   Epoch: 7   Global Step: 294490   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:49:04,621-Speed 2626.87 samples/sec   Loss 8.7640   LearningRate 0.0416   Epoch: 7   Global Step: 294500   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:49:08,520-Speed 2626.70 samples/sec   Loss 8.7678   LearningRate 0.0416   Epoch: 7   Global Step: 294510   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:49:12,417-Speed 2628.84 samples/sec   Loss 8.6859   LearningRate 0.0416   Epoch: 7   Global Step: 294520   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:49:16,318-Speed 2625.33 samples/sec   Loss 8.6972   LearningRate 0.0416   Epoch: 7   Global Step: 294530   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:49:20,225-Speed 2621.81 samples/sec   Loss 8.8166   LearningRate 0.0416   Epoch: 7   Global Step: 294540   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:49:24,116-Speed 2632.10 samples/sec   Loss 8.7408   LearningRate 0.0416   Epoch: 7   Global Step: 294550   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:49:28,012-Speed 2629.10 samples/sec   Loss 8.7748   LearningRate 0.0416   Epoch: 7   Global Step: 294560   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:49:31,909-Speed 2628.76 samples/sec   Loss 8.7638   LearningRate 0.0416   Epoch: 7   Global Step: 294570   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:49:35,775-Speed 2648.93 samples/sec   Loss 9.6169   LearningRate 0.0416   Epoch: 7   Global Step: 294580   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:49:39,667-Speed 2631.53 samples/sec   Loss 9.1071   LearningRate 0.0416   Epoch: 7   Global Step: 294590   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:49:43,561-Speed 2634.75 samples/sec   Loss 8.8230   LearningRate 0.0416   Epoch: 7   Global Step: 294600   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:49:47,458-Speed 2628.58 samples/sec   Loss 8.7909   LearningRate 0.0416   Epoch: 7   Global Step: 294610   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:49:51,350-Speed 2631.79 samples/sec   Loss 8.7722   LearningRate 0.0416   Epoch: 7   Global Step: 294620   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:49:55,242-Speed 2631.49 samples/sec   Loss 8.7162   LearningRate 0.0416   Epoch: 7   Global Step: 294630   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:49:59,135-Speed 2630.98 samples/sec   Loss 8.7688   LearningRate 0.0416   Epoch: 7   Global Step: 294640   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:50:03,031-Speed 2628.93 samples/sec   Loss 8.8540   LearningRate 0.0416   Epoch: 7   Global Step: 294650   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:50:06,923-Speed 2631.71 samples/sec   Loss 8.7050   LearningRate 0.0416   Epoch: 7   Global Step: 294660   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:50:10,821-Speed 2627.61 samples/sec   Loss 8.8674   LearningRate 0.0416   Epoch: 7   Global Step: 294670   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:50:14,716-Speed 2629.68 samples/sec   Loss 8.8424   LearningRate 0.0416   Epoch: 7   Global Step: 294680   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:18,614-Speed 2627.45 samples/sec   Loss 8.7645   LearningRate 0.0416   Epoch: 7   Global Step: 294690   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:22,532-Speed 2613.86 samples/sec   Loss 8.8857   LearningRate 0.0416   Epoch: 7   Global Step: 294700   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:26,432-Speed 2626.54 samples/sec   Loss 8.7948   LearningRate 0.0416   Epoch: 7   Global Step: 294710   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:30,329-Speed 2628.17 samples/sec   Loss 8.8063   LearningRate 0.0416   Epoch: 7   Global Step: 294720   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:34,230-Speed 2625.81 samples/sec   Loss 8.6679   LearningRate 0.0416   Epoch: 7   Global Step: 294730   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:38,125-Speed 2630.09 samples/sec   Loss 8.6284   LearningRate 0.0416   Epoch: 7   Global Step: 294740   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:42,022-Speed 2627.85 samples/sec   Loss 8.6810   LearningRate 0.0416   Epoch: 7   Global Step: 294750   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:45,917-Speed 2629.28 samples/sec   Loss 8.8773   LearningRate 0.0416   Epoch: 7   Global Step: 294760   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:49,816-Speed 2627.48 samples/sec   Loss 8.7453   LearningRate 0.0416   Epoch: 7   Global Step: 294770   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:50:53,721-Speed 2622.65 samples/sec   Loss 8.7031   LearningRate 0.0416   Epoch: 7   Global Step: 294780   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:50:57,649-Speed 2607.97 samples/sec   Loss 8.7584   LearningRate 0.0416   Epoch: 7   Global Step: 294790   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:01,545-Speed 2629.18 samples/sec   Loss 8.6941   LearningRate 0.0416   Epoch: 7   Global Step: 294800   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:05,453-Speed 2620.53 samples/sec   Loss 8.8336   LearningRate 0.0416   Epoch: 7   Global Step: 294810   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:09,342-Speed 2633.68 samples/sec   Loss 8.7132   LearningRate 0.0416   Epoch: 7   Global Step: 294820   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:13,253-Speed 2619.41 samples/sec   Loss 8.6420   LearningRate 0.0416   Epoch: 7   Global Step: 294830   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:17,185-Speed 2604.80 samples/sec   Loss 8.7512   LearningRate 0.0415   Epoch: 7   Global Step: 294840   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:21,116-Speed 2605.84 samples/sec   Loss 8.8586   LearningRate 0.0415   Epoch: 7   Global Step: 294850   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:25,021-Speed 2623.00 samples/sec   Loss 8.8046   LearningRate 0.0415   Epoch: 7   Global Step: 294860   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:28,957-Speed 2602.06 samples/sec   Loss 8.8028   LearningRate 0.0415   Epoch: 7   Global Step: 294870   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:32,876-Speed 2613.74 samples/sec   Loss 8.7540   LearningRate 0.0415   Epoch: 7   Global Step: 294880   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:51:36,784-Speed 2621.35 samples/sec   Loss 8.8894   LearningRate 0.0415   Epoch: 7   Global Step: 294890   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:51:40,660-Speed 2642.31 samples/sec   Loss 8.7653   LearningRate 0.0415   Epoch: 7   Global Step: 294900   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:44,562-Speed 2625.09 samples/sec   Loss 8.8143   LearningRate 0.0415   Epoch: 7   Global Step: 294910   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:48,460-Speed 2627.46 samples/sec   Loss 8.6922   LearningRate 0.0415   Epoch: 7   Global Step: 294920   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:52,364-Speed 2623.14 samples/sec   Loss 8.6293   LearningRate 0.0415   Epoch: 7   Global Step: 294930   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:51:56,358-Speed 2565.12 samples/sec   Loss 8.8658   LearningRate 0.0415   Epoch: 7   Global Step: 294940   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:52:00,251-Speed 2630.62 samples/sec   Loss 8.7273   LearningRate 0.0415   Epoch: 7   Global Step: 294950   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:52:04,145-Speed 2630.64 samples/sec   Loss 8.8544   LearningRate 0.0415   Epoch: 7   Global Step: 294960   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:52:08,084-Speed 2600.17 samples/sec   Loss 8.6800   LearningRate 0.0415   Epoch: 7   Global Step: 294970   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:52:12,063-Speed 2573.78 samples/sec   Loss 8.9891   LearningRate 0.0415   Epoch: 7   Global Step: 294980   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:52:15,962-Speed 2627.14 samples/sec   Loss 8.8475   LearningRate 0.0415   Epoch: 7   Global Step: 294990   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:52:19,859-Speed 2628.11 samples/sec   Loss 8.7313   LearningRate 0.0415   Epoch: 7   Global Step: 295000   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:23,756-Speed 2628.34 samples/sec   Loss 8.8441   LearningRate 0.0415   Epoch: 7   Global Step: 295010   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:27,659-Speed 2624.42 samples/sec   Loss 8.8184   LearningRate 0.0415   Epoch: 7   Global Step: 295020   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:31,555-Speed 2629.19 samples/sec   Loss 8.8022   LearningRate 0.0415   Epoch: 7   Global Step: 295030   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:35,456-Speed 2625.40 samples/sec   Loss 8.6722   LearningRate 0.0415   Epoch: 7   Global Step: 295040   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:39,358-Speed 2625.09 samples/sec   Loss 8.5694   LearningRate 0.0415   Epoch: 7   Global Step: 295050   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:43,439-Speed 2509.85 samples/sec   Loss 8.7389   LearningRate 0.0415   Epoch: 7   Global Step: 295060   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:47,337-Speed 2628.27 samples/sec   Loss 8.7558   LearningRate 0.0415   Epoch: 7   Global Step: 295070   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:51,233-Speed 2628.76 samples/sec   Loss 8.7563   LearningRate 0.0415   Epoch: 7   Global Step: 295080   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:55,129-Speed 2629.69 samples/sec   Loss 8.7258   LearningRate 0.0415   Epoch: 7   Global Step: 295090   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:52:59,025-Speed 2628.65 samples/sec   Loss 8.6634   LearningRate 0.0415   Epoch: 7   Global Step: 295100   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:53:02,923-Speed 2627.66 samples/sec   Loss 8.7014   LearningRate 0.0415   Epoch: 7   Global Step: 295110   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:53:06,802-Speed 2640.62 samples/sec   Loss 8.7412   LearningRate 0.0415   Epoch: 7   Global Step: 295120   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:10,749-Speed 2594.92 samples/sec   Loss 8.7460   LearningRate 0.0415   Epoch: 7   Global Step: 295130   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:14,664-Speed 2616.95 samples/sec   Loss 8.8211   LearningRate 0.0415   Epoch: 7   Global Step: 295140   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:18,580-Speed 2615.77 samples/sec   Loss 8.6728   LearningRate 0.0415   Epoch: 7   Global Step: 295150   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:22,522-Speed 2597.93 samples/sec   Loss 8.7093   LearningRate 0.0415   Epoch: 7   Global Step: 295160   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:26,437-Speed 2616.62 samples/sec   Loss 8.7562   LearningRate 0.0415   Epoch: 7   Global Step: 295170   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:30,350-Speed 2617.82 samples/sec   Loss 8.6242   LearningRate 0.0415   Epoch: 7   Global Step: 295180   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:34,263-Speed 2617.06 samples/sec   Loss 8.7194   LearningRate 0.0415   Epoch: 7   Global Step: 295190   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:38,173-Speed 2619.73 samples/sec   Loss 8.6563   LearningRate 0.0415   Epoch: 7   Global Step: 295200   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:53:42,090-Speed 2614.30 samples/sec   Loss 8.7036   LearningRate 0.0415   Epoch: 7   Global Step: 295210   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:53:46,008-Speed 2614.88 samples/sec   Loss 8.6472   LearningRate 0.0415   Epoch: 7   Global Step: 295220   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:53:49,911-Speed 2624.88 samples/sec   Loss 8.7779   LearningRate 0.0415   Epoch: 7   Global Step: 295230   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:53:53,828-Speed 2614.74 samples/sec   Loss 8.6634   LearningRate 0.0415   Epoch: 7   Global Step: 295240   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:53:57,727-Speed 2627.41 samples/sec   Loss 8.7565   LearningRate 0.0415   Epoch: 7   Global Step: 295250   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:01,621-Speed 2630.29 samples/sec   Loss 8.7069   LearningRate 0.0415   Epoch: 7   Global Step: 295260   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:05,522-Speed 2624.99 samples/sec   Loss 8.7182   LearningRate 0.0415   Epoch: 7   Global Step: 295270   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:09,417-Speed 2629.83 samples/sec   Loss 8.7246   LearningRate 0.0415   Epoch: 7   Global Step: 295280   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:13,310-Speed 2631.65 samples/sec   Loss 8.7254   LearningRate 0.0415   Epoch: 7   Global Step: 295290   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:17,206-Speed 2628.52 samples/sec   Loss 8.6835   LearningRate 0.0415   Epoch: 7   Global Step: 295300   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:21,144-Speed 2600.90 samples/sec   Loss 8.7974   LearningRate 0.0415   Epoch: 7   Global Step: 295310   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:54:25,040-Speed 2628.98 samples/sec   Loss 8.6359   LearningRate 0.0415   Epoch: 7   Global Step: 295320   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:54:28,937-Speed 2629.07 samples/sec   Loss 8.6681   LearningRate 0.0415   Epoch: 7   Global Step: 295330   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:54:32,860-Speed 2610.31 samples/sec   Loss 8.7082   LearningRate 0.0415   Epoch: 7   Global Step: 295340   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:36,757-Speed 2628.41 samples/sec   Loss 8.6576   LearningRate 0.0415   Epoch: 7   Global Step: 295350   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:40,672-Speed 2616.33 samples/sec   Loss 8.7503   LearningRate 0.0415   Epoch: 7   Global Step: 295360   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:44,582-Speed 2619.79 samples/sec   Loss 8.8370   LearningRate 0.0415   Epoch: 7   Global Step: 295370   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:48,488-Speed 2622.29 samples/sec   Loss 8.7437   LearningRate 0.0415   Epoch: 7   Global Step: 295380   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:52,399-Speed 2618.49 samples/sec   Loss 8.7317   LearningRate 0.0415   Epoch: 7   Global Step: 295390   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:54:56,298-Speed 2627.46 samples/sec   Loss 8.7662   LearningRate 0.0415   Epoch: 7   Global Step: 295400   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:55:00,188-Speed 2633.48 samples/sec   Loss 8.7187   LearningRate 0.0415   Epoch: 7   Global Step: 295410   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:55:04,119-Speed 2605.08 samples/sec   Loss 8.7507   LearningRate 0.0415   Epoch: 7   Global Step: 295420   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:55:08,011-Speed 2631.75 samples/sec   Loss 8.6980   LearningRate 0.0415   Epoch: 7   Global Step: 295430   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:55:11,905-Speed 2629.67 samples/sec   Loss 8.7405   LearningRate 0.0415   Epoch: 7   Global Step: 295440   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:55:15,801-Speed 2629.60 samples/sec   Loss 8.8168   LearningRate 0.0415   Epoch: 7   Global Step: 295450   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:55:19,713-Speed 2618.66 samples/sec   Loss 8.7811   LearningRate 0.0415   Epoch: 7   Global Step: 295460   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:55:23,607-Speed 2630.27 samples/sec   Loss 8.7370   LearningRate 0.0415   Epoch: 7   Global Step: 295470   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:55:27,447-Speed 2667.37 samples/sec   Loss 9.0950   LearningRate 0.0415   Epoch: 7   Global Step: 295480   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:31,349-Speed 2624.93 samples/sec   Loss 8.8500   LearningRate 0.0414   Epoch: 7   Global Step: 295490   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:35,242-Speed 2630.99 samples/sec   Loss 8.7658   LearningRate 0.0414   Epoch: 7   Global Step: 295500   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:39,132-Speed 2632.79 samples/sec   Loss 8.6482   LearningRate 0.0414   Epoch: 7   Global Step: 295510   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:43,030-Speed 2627.82 samples/sec   Loss 8.7964   LearningRate 0.0414   Epoch: 7   Global Step: 295520   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:46,922-Speed 2631.66 samples/sec   Loss 8.7322   LearningRate 0.0414   Epoch: 7   Global Step: 295530   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:50,814-Speed 2631.67 samples/sec   Loss 8.7511   LearningRate 0.0414   Epoch: 7   Global Step: 295540   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:54,713-Speed 2627.01 samples/sec   Loss 8.7640   LearningRate 0.0414   Epoch: 7   Global Step: 295550   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:55:58,622-Speed 2620.33 samples/sec   Loss 8.6434   LearningRate 0.0414   Epoch: 7   Global Step: 295560   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:56:02,521-Speed 2627.25 samples/sec   Loss 8.7873   LearningRate 0.0414   Epoch: 7   Global Step: 295570   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 04:56:06,523-Speed 2558.79 samples/sec   Loss 8.6608   LearningRate 0.0414   Epoch: 7   Global Step: 295580   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:10,459-Speed 2602.15 samples/sec   Loss 8.6680   LearningRate 0.0414   Epoch: 7   Global Step: 295590   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:14,403-Speed 2597.46 samples/sec   Loss 8.7566   LearningRate 0.0414   Epoch: 7   Global Step: 295600   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:18,395-Speed 2565.51 samples/sec   Loss 8.8387   LearningRate 0.0414   Epoch: 7   Global Step: 295610   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:22,365-Speed 2580.24 samples/sec   Loss 8.6428   LearningRate 0.0414   Epoch: 7   Global Step: 295620   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:26,259-Speed 2630.47 samples/sec   Loss 8.7572   LearningRate 0.0414   Epoch: 7   Global Step: 295630   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:30,153-Speed 2630.36 samples/sec   Loss 8.8638   LearningRate 0.0414   Epoch: 7   Global Step: 295640   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:34,044-Speed 2631.98 samples/sec   Loss 8.8450   LearningRate 0.0414   Epoch: 7   Global Step: 295650   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:37,946-Speed 2625.23 samples/sec   Loss 8.8729   LearningRate 0.0414   Epoch: 7   Global Step: 295660   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:41,853-Speed 2620.97 samples/sec   Loss 8.7834   LearningRate 0.0414   Epoch: 7   Global Step: 295670   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 04:56:45,766-Speed 2617.71 samples/sec   Loss 8.7186   LearningRate 0.0414   Epoch: 7   Global Step: 295680   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:56:49,668-Speed 2624.83 samples/sec   Loss 8.7128   LearningRate 0.0414   Epoch: 7   Global Step: 295690   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:56:53,588-Speed 2612.58 samples/sec   Loss 8.7111   LearningRate 0.0414   Epoch: 7   Global Step: 295700   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:56:57,487-Speed 2627.64 samples/sec   Loss 8.8224   LearningRate 0.0414   Epoch: 7   Global Step: 295710   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:01,382-Speed 2629.43 samples/sec   Loss 8.7858   LearningRate 0.0414   Epoch: 7   Global Step: 295720   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:05,280-Speed 2627.98 samples/sec   Loss 8.8385   LearningRate 0.0414   Epoch: 7   Global Step: 295730   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:09,203-Speed 2610.33 samples/sec   Loss 8.8047   LearningRate 0.0414   Epoch: 7   Global Step: 295740   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:13,104-Speed 2625.81 samples/sec   Loss 8.7842   LearningRate 0.0414   Epoch: 7   Global Step: 295750   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:17,004-Speed 2626.22 samples/sec   Loss 8.8455   LearningRate 0.0414   Epoch: 7   Global Step: 295760   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:20,904-Speed 2626.37 samples/sec   Loss 8.7067   LearningRate 0.0414   Epoch: 7   Global Step: 295770   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:24,817-Speed 2617.75 samples/sec   Loss 8.7691   LearningRate 0.0414   Epoch: 7   Global Step: 295780   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:57:28,717-Speed 2625.70 samples/sec   Loss 8.7381   LearningRate 0.0414   Epoch: 7   Global Step: 295790   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:57:32,612-Speed 2630.55 samples/sec   Loss 8.7323   LearningRate 0.0414   Epoch: 7   Global Step: 295800   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:57:36,502-Speed 2632.98 samples/sec   Loss 8.6742   LearningRate 0.0414   Epoch: 7   Global Step: 295810   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:57:40,377-Speed 2642.96 samples/sec   Loss 8.7261   LearningRate 0.0414   Epoch: 7   Global Step: 295820   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:44,271-Speed 2630.37 samples/sec   Loss 8.7105   LearningRate 0.0414   Epoch: 7   Global Step: 295830   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:48,181-Speed 2619.47 samples/sec   Loss 8.8610   LearningRate 0.0414   Epoch: 7   Global Step: 295840   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:52,080-Speed 2627.09 samples/sec   Loss 8.7596   LearningRate 0.0414   Epoch: 7   Global Step: 295850   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:55,971-Speed 2632.75 samples/sec   Loss 8.8516   LearningRate 0.0414   Epoch: 7   Global Step: 295860   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:57:59,864-Speed 2630.84 samples/sec   Loss 8.9096   LearningRate 0.0414   Epoch: 7   Global Step: 295870   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:58:03,758-Speed 2630.37 samples/sec   Loss 8.5406   LearningRate 0.0414   Epoch: 7   Global Step: 295880   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:58:07,650-Speed 2632.22 samples/sec   Loss 8.6950   LearningRate 0.0414   Epoch: 7   Global Step: 295890   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:58:11,541-Speed 2631.62 samples/sec   Loss 8.7615   LearningRate 0.0414   Epoch: 7   Global Step: 295900   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:58:15,436-Speed 2629.57 samples/sec   Loss 8.7855   LearningRate 0.0414   Epoch: 7   Global Step: 295910   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 04:58:19,343-Speed 2622.08 samples/sec   Loss 8.8171   LearningRate 0.0414   Epoch: 7   Global Step: 295920   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:23,276-Speed 2604.18 samples/sec   Loss 8.8148   LearningRate 0.0414   Epoch: 7   Global Step: 295930   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:27,182-Speed 2621.80 samples/sec   Loss 8.7104   LearningRate 0.0414   Epoch: 7   Global Step: 295940   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:31,082-Speed 2627.12 samples/sec   Loss 8.8743   LearningRate 0.0414   Epoch: 7   Global Step: 295950   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:34,978-Speed 2628.88 samples/sec   Loss 8.6682   LearningRate 0.0414   Epoch: 7   Global Step: 295960   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:38,880-Speed 2624.94 samples/sec   Loss 8.7541   LearningRate 0.0414   Epoch: 7   Global Step: 295970   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:42,799-Speed 2613.21 samples/sec   Loss 8.7517   LearningRate 0.0414   Epoch: 7   Global Step: 295980   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:46,709-Speed 2619.90 samples/sec   Loss 8.5924   LearningRate 0.0414   Epoch: 7   Global Step: 295990   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:50,615-Speed 2622.32 samples/sec   Loss 8.6849   LearningRate 0.0414   Epoch: 7   Global Step: 296000   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:54,522-Speed 2621.87 samples/sec   Loss 8.7039   LearningRate 0.0414   Epoch: 7   Global Step: 296010   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 04:58:58,420-Speed 2627.50 samples/sec   Loss 8.7917   LearningRate 0.0414   Epoch: 7   Global Step: 296020   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:02,316-Speed 2629.13 samples/sec   Loss 8.8864   LearningRate 0.0414   Epoch: 7   Global Step: 296030   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:06,221-Speed 2622.36 samples/sec   Loss 8.5722   LearningRate 0.0414   Epoch: 7   Global Step: 296040   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:10,992-Speed 2147.04 samples/sec   Loss 8.7141   LearningRate 0.0414   Epoch: 7   Global Step: 296050   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:14,881-Speed 2633.52 samples/sec   Loss 8.6772   LearningRate 0.0414   Epoch: 7   Global Step: 296060   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:18,774-Speed 2630.91 samples/sec   Loss 8.7612   LearningRate 0.0414   Epoch: 7   Global Step: 296070   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:22,671-Speed 2629.05 samples/sec   Loss 8.8424   LearningRate 0.0414   Epoch: 7   Global Step: 296080   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:26,563-Speed 2631.71 samples/sec   Loss 8.7760   LearningRate 0.0414   Epoch: 7   Global Step: 296090   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:30,455-Speed 2631.45 samples/sec   Loss 8.8602   LearningRate 0.0414   Epoch: 7   Global Step: 296100   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:34,349-Speed 2630.39 samples/sec   Loss 8.8089   LearningRate 0.0414   Epoch: 7   Global Step: 296110   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:38,244-Speed 2629.47 samples/sec   Loss 8.7100   LearningRate 0.0414   Epoch: 7   Global Step: 296120   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:59:42,135-Speed 2631.97 samples/sec   Loss 8.6249   LearningRate 0.0413   Epoch: 7   Global Step: 296130   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:59:46,040-Speed 2623.40 samples/sec   Loss 8.6295   LearningRate 0.0413   Epoch: 7   Global Step: 296140   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 04:59:49,913-Speed 2644.93 samples/sec   Loss 8.7503   LearningRate 0.0413   Epoch: 7   Global Step: 296150   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:53,805-Speed 2631.44 samples/sec   Loss 8.7408   LearningRate 0.0413   Epoch: 7   Global Step: 296160   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 04:59:57,713-Speed 2621.51 samples/sec   Loss 8.6590   LearningRate 0.0413   Epoch: 7   Global Step: 296170   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:01,605-Speed 2631.74 samples/sec   Loss 8.7676   LearningRate 0.0413   Epoch: 7   Global Step: 296180   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:05,523-Speed 2613.77 samples/sec   Loss 8.7279   LearningRate 0.0413   Epoch: 7   Global Step: 296190   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:09,430-Speed 2621.62 samples/sec   Loss 8.7892   LearningRate 0.0413   Epoch: 7   Global Step: 296200   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:13,327-Speed 2628.88 samples/sec   Loss 8.7405   LearningRate 0.0413   Epoch: 7   Global Step: 296210   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:17,250-Speed 2612.12 samples/sec   Loss 8.6858   LearningRate 0.0413   Epoch: 7   Global Step: 296220   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:21,142-Speed 2631.53 samples/sec   Loss 8.7197   LearningRate 0.0413   Epoch: 7   Global Step: 296230   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:25,058-Speed 2615.53 samples/sec   Loss 8.7615   LearningRate 0.0413   Epoch: 7   Global Step: 296240   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:29,037-Speed 2574.41 samples/sec   Loss 8.8666   LearningRate 0.0413   Epoch: 7   Global Step: 296250   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:00:32,919-Speed 2638.34 samples/sec   Loss 8.7076   LearningRate 0.0413   Epoch: 7   Global Step: 296260   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:36,822-Speed 2624.51 samples/sec   Loss 8.7959   LearningRate 0.0413   Epoch: 7   Global Step: 296270   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:40,727-Speed 2622.79 samples/sec   Loss 8.7235   LearningRate 0.0413   Epoch: 7   Global Step: 296280   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:44,630-Speed 2624.75 samples/sec   Loss 8.7333   LearningRate 0.0413   Epoch: 7   Global Step: 296290   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:48,532-Speed 2624.90 samples/sec   Loss 8.7447   LearningRate 0.0413   Epoch: 7   Global Step: 296300   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:52,426-Speed 2630.62 samples/sec   Loss 8.5678   LearningRate 0.0413   Epoch: 7   Global Step: 296310   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:00:56,371-Speed 2595.82 samples/sec   Loss 8.6196   LearningRate 0.0413   Epoch: 7   Global Step: 296320   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:00,299-Speed 2608.38 samples/sec   Loss 8.8279   LearningRate 0.0413   Epoch: 7   Global Step: 296330   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:04,209-Speed 2619.83 samples/sec   Loss 8.7722   LearningRate 0.0413   Epoch: 7   Global Step: 296340   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:08,110-Speed 2625.44 samples/sec   Loss 8.6233   LearningRate 0.0413   Epoch: 7   Global Step: 296350   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:12,007-Speed 2628.02 samples/sec   Loss 8.6491   LearningRate 0.0413   Epoch: 7   Global Step: 296360   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:15,925-Speed 2613.95 samples/sec   Loss 8.7661   LearningRate 0.0413   Epoch: 7   Global Step: 296370   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:19,835-Speed 2619.84 samples/sec   Loss 8.7274   LearningRate 0.0413   Epoch: 7   Global Step: 296380   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:23,750-Speed 2616.41 samples/sec   Loss 8.7630   LearningRate 0.0413   Epoch: 7   Global Step: 296390   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:27,675-Speed 2609.28 samples/sec   Loss 8.8526   LearningRate 0.0413   Epoch: 7   Global Step: 296400   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:01:31,560-Speed 2637.13 samples/sec   Loss 8.6730   LearningRate 0.0413   Epoch: 7   Global Step: 296410   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:01:35,462-Speed 2624.95 samples/sec   Loss 8.6557   LearningRate 0.0413   Epoch: 7   Global Step: 296420   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:01:39,365-Speed 2623.84 samples/sec   Loss 8.7498   LearningRate 0.0413   Epoch: 7   Global Step: 296430   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:01:43,276-Speed 2619.09 samples/sec   Loss 8.6804   LearningRate 0.0413   Epoch: 7   Global Step: 296440   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:01:47,187-Speed 2618.50 samples/sec   Loss 8.8154   LearningRate 0.0413   Epoch: 7   Global Step: 296450   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:01:51,072-Speed 2636.78 samples/sec   Loss 9.5079   LearningRate 0.0413   Epoch: 7   Global Step: 296460   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:01:54,984-Speed 2618.07 samples/sec   Loss 9.4364   LearningRate 0.0413   Epoch: 7   Global Step: 296470   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:01:58,897-Speed 2617.50 samples/sec   Loss 8.9499   LearningRate 0.0413   Epoch: 7   Global Step: 296480   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:02:02,780-Speed 2637.63 samples/sec   Loss 9.1301   LearningRate 0.0413   Epoch: 7   Global Step: 296490   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:06,675-Speed 2629.69 samples/sec   Loss 8.8027   LearningRate 0.0413   Epoch: 7   Global Step: 296500   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:10,582-Speed 2621.56 samples/sec   Loss 8.8797   LearningRate 0.0413   Epoch: 7   Global Step: 296510   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:14,495-Speed 2618.20 samples/sec   Loss 8.7494   LearningRate 0.0413   Epoch: 7   Global Step: 296520   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:18,400-Speed 2622.87 samples/sec   Loss 8.9801   LearningRate 0.0413   Epoch: 7   Global Step: 296530   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:22,333-Speed 2603.76 samples/sec   Loss 8.7040   LearningRate 0.0413   Epoch: 7   Global Step: 296540   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:26,370-Speed 2537.74 samples/sec   Loss 8.9457   LearningRate 0.0413   Epoch: 7   Global Step: 296550   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:30,296-Speed 2609.05 samples/sec   Loss 8.8109   LearningRate 0.0413   Epoch: 7   Global Step: 296560   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:34,240-Speed 2597.02 samples/sec   Loss 8.8017   LearningRate 0.0413   Epoch: 7   Global Step: 296570   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:38,140-Speed 2625.74 samples/sec   Loss 8.8670   LearningRate 0.0413   Epoch: 7   Global Step: 296580   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:02:42,044-Speed 2623.84 samples/sec   Loss 8.6444   LearningRate 0.0413   Epoch: 7   Global Step: 296590   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:02:45,969-Speed 2609.59 samples/sec   Loss 8.6779   LearningRate 0.0413   Epoch: 7   Global Step: 296600   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:02:49,915-Speed 2595.74 samples/sec   Loss 8.6197   LearningRate 0.0413   Epoch: 7   Global Step: 296610   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:02:53,809-Speed 2629.94 samples/sec   Loss 8.8237   LearningRate 0.0413   Epoch: 7   Global Step: 296620   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:02:57,703-Speed 2630.86 samples/sec   Loss 8.9255   LearningRate 0.0413   Epoch: 7   Global Step: 296630   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:03:01,639-Speed 2601.95 samples/sec   Loss 8.8743   LearningRate 0.0413   Epoch: 7   Global Step: 296640   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:03:05,531-Speed 2632.00 samples/sec   Loss 8.8341   LearningRate 0.0413   Epoch: 7   Global Step: 296650   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:03:09,423-Speed 2631.19 samples/sec   Loss 8.8796   LearningRate 0.0413   Epoch: 7   Global Step: 296660   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:03:13,314-Speed 2632.10 samples/sec   Loss 8.8677   LearningRate 0.0413   Epoch: 7   Global Step: 296670   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:03:17,207-Speed 2631.16 samples/sec   Loss 8.7732   LearningRate 0.0413   Epoch: 7   Global Step: 296680   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:03:21,098-Speed 2633.21 samples/sec   Loss 8.7431   LearningRate 0.0413   Epoch: 7   Global Step: 296690   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:24,990-Speed 2631.42 samples/sec   Loss 8.6340   LearningRate 0.0413   Epoch: 7   Global Step: 296700   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:28,881-Speed 2632.32 samples/sec   Loss 8.8378   LearningRate 0.0413   Epoch: 7   Global Step: 296710   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:32,772-Speed 2632.28 samples/sec   Loss 8.7200   LearningRate 0.0413   Epoch: 7   Global Step: 296720   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:36,664-Speed 2631.63 samples/sec   Loss 8.7064   LearningRate 0.0413   Epoch: 7   Global Step: 296730   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:40,560-Speed 2629.24 samples/sec   Loss 8.7237   LearningRate 0.0413   Epoch: 7   Global Step: 296740   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:44,452-Speed 2631.17 samples/sec   Loss 8.8157   LearningRate 0.0413   Epoch: 7   Global Step: 296750   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:48,351-Speed 2626.87 samples/sec   Loss 8.7266   LearningRate 0.0413   Epoch: 7   Global Step: 296760   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:52,249-Speed 2627.78 samples/sec   Loss 8.7055   LearningRate 0.0413   Epoch: 7   Global Step: 296770   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:03:56,152-Speed 2624.29 samples/sec   Loss 8.6098   LearningRate 0.0412   Epoch: 7   Global Step: 296780   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:00,052-Speed 2626.73 samples/sec   Loss 8.6474   LearningRate 0.0412   Epoch: 7   Global Step: 296790   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:04:03,944-Speed 2632.55 samples/sec   Loss 8.6915   LearningRate 0.0412   Epoch: 7   Global Step: 296800   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:04:07,817-Speed 2644.24 samples/sec   Loss 8.8474   LearningRate 0.0412   Epoch: 7   Global Step: 296810   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:11,706-Speed 2633.32 samples/sec   Loss 8.6236   LearningRate 0.0412   Epoch: 7   Global Step: 296820   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:15,613-Speed 2621.91 samples/sec   Loss 8.6695   LearningRate 0.0412   Epoch: 7   Global Step: 296830   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:19,503-Speed 2633.22 samples/sec   Loss 8.6896   LearningRate 0.0412   Epoch: 7   Global Step: 296840   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:23,396-Speed 2630.65 samples/sec   Loss 8.7826   LearningRate 0.0412   Epoch: 7   Global Step: 296850   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:27,445-Speed 2530.42 samples/sec   Loss 8.5391   LearningRate 0.0412   Epoch: 7   Global Step: 296860   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:31,327-Speed 2638.35 samples/sec   Loss 8.5693   LearningRate 0.0412   Epoch: 7   Global Step: 296870   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:35,219-Speed 2632.02 samples/sec   Loss 8.5983   LearningRate 0.0412   Epoch: 7   Global Step: 296880   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:39,119-Speed 2625.82 samples/sec   Loss 8.5391   LearningRate 0.0412   Epoch: 7   Global Step: 296890   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:43,001-Speed 2638.56 samples/sec   Loss 8.5836   LearningRate 0.0412   Epoch: 7   Global Step: 296900   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:04:46,895-Speed 2630.70 samples/sec   Loss 8.6271   LearningRate 0.0412   Epoch: 7   Global Step: 296910   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:04:50,787-Speed 2631.95 samples/sec   Loss 8.7976   LearningRate 0.0412   Epoch: 7   Global Step: 296920   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:04:54,686-Speed 2626.83 samples/sec   Loss 8.5946   LearningRate 0.0412   Epoch: 7   Global Step: 296930   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:04:58,597-Speed 2618.64 samples/sec   Loss 8.5478   LearningRate 0.0412   Epoch: 7   Global Step: 296940   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:02,488-Speed 2632.34 samples/sec   Loss 8.8146   LearningRate 0.0412   Epoch: 7   Global Step: 296950   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:06,378-Speed 2633.16 samples/sec   Loss 8.7793   LearningRate 0.0412   Epoch: 7   Global Step: 296960   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:10,267-Speed 2634.24 samples/sec   Loss 8.6596   LearningRate 0.0412   Epoch: 7   Global Step: 296970   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:14,156-Speed 2633.26 samples/sec   Loss 8.6492   LearningRate 0.0412   Epoch: 7   Global Step: 296980   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:18,054-Speed 2627.11 samples/sec   Loss 8.5724   LearningRate 0.0412   Epoch: 7   Global Step: 296990   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:21,946-Speed 2632.19 samples/sec   Loss 8.7432   LearningRate 0.0412   Epoch: 7   Global Step: 297000   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:25,840-Speed 2630.93 samples/sec   Loss 8.7314   LearningRate 0.0412   Epoch: 7   Global Step: 297010   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:05:29,725-Speed 2636.15 samples/sec   Loss 8.5690   LearningRate 0.0412   Epoch: 7   Global Step: 297020   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:05:33,599-Speed 2644.14 samples/sec   Loss 8.6563   LearningRate 0.0412   Epoch: 7   Global Step: 297030   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:05:37,491-Speed 2631.48 samples/sec   Loss 8.6714   LearningRate 0.0412   Epoch: 7   Global Step: 297040   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:05:41,383-Speed 2631.77 samples/sec   Loss 8.6710   LearningRate 0.0412   Epoch: 7   Global Step: 297050   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:05:45,277-Speed 2630.70 samples/sec   Loss 8.6953   LearningRate 0.0412   Epoch: 7   Global Step: 297060   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:05:49,164-Speed 2634.55 samples/sec   Loss 8.9223   LearningRate 0.0412   Epoch: 7   Global Step: 297070   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:05:53,056-Speed 2632.54 samples/sec   Loss 8.7365   LearningRate 0.0412   Epoch: 7   Global Step: 297080   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:05:56,944-Speed 2634.13 samples/sec   Loss 8.8094   LearningRate 0.0412   Epoch: 7   Global Step: 297090   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:06:00,834-Speed 2632.91 samples/sec   Loss 8.6971   LearningRate 0.0412   Epoch: 7   Global Step: 297100   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:06:04,730-Speed 2628.68 samples/sec   Loss 8.8590   LearningRate 0.0412   Epoch: 7   Global Step: 297110   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:06:08,633-Speed 2624.65 samples/sec   Loss 8.7375   LearningRate 0.0412   Epoch: 7   Global Step: 297120   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:06:12,526-Speed 2630.42 samples/sec   Loss 8.6669   LearningRate 0.0412   Epoch: 7   Global Step: 297130   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:16,426-Speed 2626.74 samples/sec   Loss 8.6073   LearningRate 0.0412   Epoch: 7   Global Step: 297140   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:20,315-Speed 2633.26 samples/sec   Loss 8.6257   LearningRate 0.0412   Epoch: 7   Global Step: 297150   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:24,206-Speed 2632.91 samples/sec   Loss 8.5388   LearningRate 0.0412   Epoch: 7   Global Step: 297160   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:28,108-Speed 2625.19 samples/sec   Loss 8.6973   LearningRate 0.0412   Epoch: 7   Global Step: 297170   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:31,997-Speed 2633.28 samples/sec   Loss 8.7175   LearningRate 0.0412   Epoch: 7   Global Step: 297180   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:35,904-Speed 2621.44 samples/sec   Loss 8.7733   LearningRate 0.0412   Epoch: 7   Global Step: 297190   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:39,798-Speed 2630.41 samples/sec   Loss 8.5766   LearningRate 0.0412   Epoch: 7   Global Step: 297200   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:43,691-Speed 2630.70 samples/sec   Loss 8.6508   LearningRate 0.0412   Epoch: 7   Global Step: 297210   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:47,582-Speed 2632.41 samples/sec   Loss 8.7525   LearningRate 0.0412   Epoch: 7   Global Step: 297220   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:06:51,479-Speed 2628.60 samples/sec   Loss 8.6406   LearningRate 0.0412   Epoch: 7   Global Step: 297230   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:06:55,382-Speed 2623.68 samples/sec   Loss 8.6799   LearningRate 0.0412   Epoch: 7   Global Step: 297240   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:06:59,261-Speed 2640.92 samples/sec   Loss 8.6497   LearningRate 0.0412   Epoch: 7   Global Step: 297250   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:03,185-Speed 2610.09 samples/sec   Loss 8.7255   LearningRate 0.0412   Epoch: 7   Global Step: 297260   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:07,085-Speed 2626.09 samples/sec   Loss 8.6291   LearningRate 0.0412   Epoch: 7   Global Step: 297270   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:11,006-Speed 2612.50 samples/sec   Loss 8.7373   LearningRate 0.0412   Epoch: 7   Global Step: 297280   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:14,908-Speed 2625.07 samples/sec   Loss 8.7764   LearningRate 0.0412   Epoch: 7   Global Step: 297290   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:18,805-Speed 2628.47 samples/sec   Loss 8.5489   LearningRate 0.0412   Epoch: 7   Global Step: 297300   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:22,717-Speed 2618.25 samples/sec   Loss 8.6191   LearningRate 0.0412   Epoch: 7   Global Step: 297310   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:26,619-Speed 2625.39 samples/sec   Loss 8.4798   LearningRate 0.0412   Epoch: 7   Global Step: 297320   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:30,530-Speed 2618.65 samples/sec   Loss 8.7061   LearningRate 0.0412   Epoch: 7   Global Step: 297330   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:34,422-Speed 2631.80 samples/sec   Loss 8.6175   LearningRate 0.0412   Epoch: 7   Global Step: 297340   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:38,298-Speed 2642.25 samples/sec   Loss 8.6624   LearningRate 0.0412   Epoch: 7   Global Step: 297350   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:07:42,171-Speed 2644.68 samples/sec   Loss 8.7388   LearningRate 0.0412   Epoch: 7   Global Step: 297360   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:07:46,065-Speed 2630.75 samples/sec   Loss 8.6762   LearningRate 0.0412   Epoch: 7   Global Step: 297370   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:07:49,957-Speed 2631.28 samples/sec   Loss 8.4520   LearningRate 0.0412   Epoch: 7   Global Step: 297380   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:07:53,855-Speed 2628.17 samples/sec   Loss 8.7897   LearningRate 0.0412   Epoch: 7   Global Step: 297390   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:07:57,748-Speed 2630.78 samples/sec   Loss 8.7002   LearningRate 0.0412   Epoch: 7   Global Step: 297400   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:08:01,653-Speed 2623.26 samples/sec   Loss 8.7199   LearningRate 0.0412   Epoch: 7   Global Step: 297410   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:08:05,543-Speed 2632.99 samples/sec   Loss 8.5757   LearningRate 0.0411   Epoch: 7   Global Step: 297420   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:08:09,447-Speed 2623.53 samples/sec   Loss 8.7661   LearningRate 0.0411   Epoch: 7   Global Step: 297430   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:08:13,339-Speed 2631.20 samples/sec   Loss 8.7487   LearningRate 0.0411   Epoch: 7   Global Step: 297440   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:08:17,250-Speed 2619.17 samples/sec   Loss 8.8176   LearningRate 0.0411   Epoch: 7   Global Step: 297450   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:08:21,165-Speed 2616.75 samples/sec   Loss 8.7362   LearningRate 0.0411   Epoch: 7   Global Step: 297460   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:08:25,096-Speed 2605.62 samples/sec   Loss 8.7285   LearningRate 0.0411   Epoch: 7   Global Step: 297470   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:08:28,996-Speed 2626.82 samples/sec   Loss 8.7165   LearningRate 0.0411   Epoch: 7   Global Step: 297480   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:08:32,896-Speed 2625.74 samples/sec   Loss 8.6144   LearningRate 0.0411   Epoch: 7   Global Step: 297490   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:08:36,758-Speed 2652.58 samples/sec   Loss 8.6759   LearningRate 0.0411   Epoch: 7   Global Step: 297500   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:08:40,653-Speed 2629.91 samples/sec   Loss 9.9806   LearningRate 0.0411   Epoch: 7   Global Step: 297510   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:08:44,554-Speed 2625.76 samples/sec   Loss 9.2055   LearningRate 0.0411   Epoch: 7   Global Step: 297520   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:08:48,464-Speed 2618.80 samples/sec   Loss 8.9084   LearningRate 0.0411   Epoch: 7   Global Step: 297530   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:08:52,364-Speed 2626.74 samples/sec   Loss 8.7804   LearningRate 0.0411   Epoch: 7   Global Step: 297540   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:08:56,267-Speed 2624.62 samples/sec   Loss 8.8066   LearningRate 0.0411   Epoch: 7   Global Step: 297550   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:09:00,163-Speed 2629.23 samples/sec   Loss 8.7772   LearningRate 0.0411   Epoch: 7   Global Step: 297560   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:09:04,094-Speed 2606.10 samples/sec   Loss 8.8103   LearningRate 0.0411   Epoch: 7   Global Step: 297570   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:09:08,026-Speed 2604.65 samples/sec   Loss 8.6820   LearningRate 0.0411   Epoch: 7   Global Step: 297580   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:09:12,012-Speed 2569.90 samples/sec   Loss 8.6297   LearningRate 0.0411   Epoch: 7   Global Step: 297590   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:09:15,911-Speed 2626.51 samples/sec   Loss 8.7040   LearningRate 0.0411   Epoch: 7   Global Step: 297600   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:19,822-Speed 2619.41 samples/sec   Loss 8.6943   LearningRate 0.0411   Epoch: 7   Global Step: 297610   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:23,726-Speed 2623.43 samples/sec   Loss 8.7883   LearningRate 0.0411   Epoch: 7   Global Step: 297620   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:27,638-Speed 2617.95 samples/sec   Loss 8.6729   LearningRate 0.0411   Epoch: 7   Global Step: 297630   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:31,528-Speed 2633.20 samples/sec   Loss 8.7042   LearningRate 0.0411   Epoch: 7   Global Step: 297640   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:35,416-Speed 2634.77 samples/sec   Loss 8.7404   LearningRate 0.0411   Epoch: 7   Global Step: 297650   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:39,304-Speed 2633.96 samples/sec   Loss 8.6709   LearningRate 0.0411   Epoch: 7   Global Step: 297660   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:43,193-Speed 2633.48 samples/sec   Loss 8.7497   LearningRate 0.0411   Epoch: 7   Global Step: 297670   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:47,085-Speed 2631.55 samples/sec   Loss 8.7009   LearningRate 0.0411   Epoch: 7   Global Step: 297680   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:51,062-Speed 2575.71 samples/sec   Loss 8.5600   LearningRate 0.0411   Epoch: 7   Global Step: 297690   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:09:54,959-Speed 2628.64 samples/sec   Loss 8.7837   LearningRate 0.0411   Epoch: 7   Global Step: 297700   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:09:58,857-Speed 2627.76 samples/sec   Loss 8.6869   LearningRate 0.0411   Epoch: 7   Global Step: 297710   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:02,764-Speed 2621.81 samples/sec   Loss 8.4883   LearningRate 0.0411   Epoch: 7   Global Step: 297720   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:06,814-Speed 2529.10 samples/sec   Loss 8.7326   LearningRate 0.0411   Epoch: 7   Global Step: 297730   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:10,712-Speed 2627.37 samples/sec   Loss 8.7573   LearningRate 0.0411   Epoch: 7   Global Step: 297740   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:14,611-Speed 2626.50 samples/sec   Loss 8.8483   LearningRate 0.0411   Epoch: 7   Global Step: 297750   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:18,503-Speed 2631.39 samples/sec   Loss 8.7900   LearningRate 0.0411   Epoch: 7   Global Step: 297760   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:22,397-Speed 2630.92 samples/sec   Loss 8.6780   LearningRate 0.0411   Epoch: 7   Global Step: 297770   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:26,288-Speed 2632.95 samples/sec   Loss 8.7562   LearningRate 0.0411   Epoch: 7   Global Step: 297780   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:30,196-Speed 2620.73 samples/sec   Loss 8.7820   LearningRate 0.0411   Epoch: 7   Global Step: 297790   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:10:34,122-Speed 2608.60 samples/sec   Loss 8.7014   LearningRate 0.0411   Epoch: 7   Global Step: 297800   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:10:38,023-Speed 2625.69 samples/sec   Loss 8.6186   LearningRate 0.0411   Epoch: 7   Global Step: 297810   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:10:41,911-Speed 2634.54 samples/sec   Loss 8.7382   LearningRate 0.0411   Epoch: 7   Global Step: 297820   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:10:45,806-Speed 2629.56 samples/sec   Loss 8.5980   LearningRate 0.0411   Epoch: 7   Global Step: 297830   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:10:49,697-Speed 2632.46 samples/sec   Loss 8.6866   LearningRate 0.0411   Epoch: 7   Global Step: 297840   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:10:53,599-Speed 2625.18 samples/sec   Loss 8.7261   LearningRate 0.0411   Epoch: 7   Global Step: 297850   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:10:57,500-Speed 2625.50 samples/sec   Loss 8.7173   LearningRate 0.0411   Epoch: 7   Global Step: 297860   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:11:01,390-Speed 2633.31 samples/sec   Loss 8.7743   LearningRate 0.0411   Epoch: 7   Global Step: 297870   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:11:05,280-Speed 2632.46 samples/sec   Loss 8.6952   LearningRate 0.0411   Epoch: 7   Global Step: 297880   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:11:09,174-Speed 2630.37 samples/sec   Loss 8.7054   LearningRate 0.0411   Epoch: 7   Global Step: 297890   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:11:13,065-Speed 2632.44 samples/sec   Loss 8.5070   LearningRate 0.0411   Epoch: 7   Global Step: 297900   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:11:16,952-Speed 2634.84 samples/sec   Loss 8.7835   LearningRate 0.0411   Epoch: 7   Global Step: 297910   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:11:20,841-Speed 2633.48 samples/sec   Loss 8.6772   LearningRate 0.0411   Epoch: 7   Global Step: 297920   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:11:24,734-Speed 2631.14 samples/sec   Loss 8.6483   LearningRate 0.0411   Epoch: 7   Global Step: 297930   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:11:28,621-Speed 2635.08 samples/sec   Loss 8.7195   LearningRate 0.0411   Epoch: 7   Global Step: 297940   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:11:32,474-Speed 2658.49 samples/sec   Loss 8.6679   LearningRate 0.0411   Epoch: 7   Global Step: 297950   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:11:36,345-Speed 2645.64 samples/sec   Loss 8.5772   LearningRate 0.0411   Epoch: 7   Global Step: 297960   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:11:40,235-Speed 2633.22 samples/sec   Loss 8.6861   LearningRate 0.0411   Epoch: 7   Global Step: 297970   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:11:44,127-Speed 2631.33 samples/sec   Loss 8.5286   LearningRate 0.0411   Epoch: 7   Global Step: 297980   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:11:48,020-Speed 2631.19 samples/sec   Loss 8.5680   LearningRate 0.0411   Epoch: 7   Global Step: 297990   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:11:51,926-Speed 2622.60 samples/sec   Loss 8.5532   LearningRate 0.0411   Epoch: 7   Global Step: 298000   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:11:55,822-Speed 2629.11 samples/sec   Loss 8.7101   LearningRate 0.0411   Epoch: 7   Global Step: 298010   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:11:59,728-Speed 2622.58 samples/sec   Loss 8.7728   LearningRate 0.0411   Epoch: 7   Global Step: 298020   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:12:03,645-Speed 2615.04 samples/sec   Loss 8.6848   LearningRate 0.0411   Epoch: 7   Global Step: 298030   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:12:07,539-Speed 2630.06 samples/sec   Loss 8.6108   LearningRate 0.0411   Epoch: 7   Global Step: 298040   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:12:11,437-Speed 2627.32 samples/sec   Loss 8.5991   LearningRate 0.0411   Epoch: 7   Global Step: 298050   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:12:15,405-Speed 2581.77 samples/sec   Loss 8.6390   LearningRate 0.0411   Epoch: 7   Global Step: 298060   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:12:19,319-Speed 2617.01 samples/sec   Loss 8.5958   LearningRate 0.0410   Epoch: 7   Global Step: 298070   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:12:23,203-Speed 2637.45 samples/sec   Loss 9.4731   LearningRate 0.0410   Epoch: 7   Global Step: 298080   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:12:27,087-Speed 2636.91 samples/sec   Loss 9.3013   LearningRate 0.0410   Epoch: 7   Global Step: 298090   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:30,979-Speed 2632.10 samples/sec   Loss 8.6679   LearningRate 0.0410   Epoch: 7   Global Step: 298100   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:34,871-Speed 2631.33 samples/sec   Loss 8.7046   LearningRate 0.0410   Epoch: 7   Global Step: 298110   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:38,770-Speed 2627.00 samples/sec   Loss 8.7489   LearningRate 0.0410   Epoch: 7   Global Step: 298120   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:42,661-Speed 2632.21 samples/sec   Loss 8.7588   LearningRate 0.0410   Epoch: 7   Global Step: 298130   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:46,563-Speed 2625.36 samples/sec   Loss 8.6167   LearningRate 0.0410   Epoch: 7   Global Step: 298140   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:50,458-Speed 2629.92 samples/sec   Loss 8.6424   LearningRate 0.0410   Epoch: 7   Global Step: 298150   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:54,357-Speed 2627.17 samples/sec   Loss 8.5511   LearningRate 0.0410   Epoch: 7   Global Step: 298160   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:12:58,244-Speed 2634.88 samples/sec   Loss 8.6360   LearningRate 0.0410   Epoch: 7   Global Step: 298170   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:13:02,133-Speed 2634.41 samples/sec   Loss 8.7446   LearningRate 0.0410   Epoch: 7   Global Step: 298180   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:13:06,024-Speed 2632.13 samples/sec   Loss 8.7145   LearningRate 0.0410   Epoch: 7   Global Step: 298190   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:09,914-Speed 2632.61 samples/sec   Loss 8.5651   LearningRate 0.0410   Epoch: 7   Global Step: 298200   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:13,804-Speed 2633.30 samples/sec   Loss 8.7329   LearningRate 0.0410   Epoch: 7   Global Step: 298210   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:17,695-Speed 2632.87 samples/sec   Loss 8.7479   LearningRate 0.0410   Epoch: 7   Global Step: 298220   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:21,584-Speed 2633.35 samples/sec   Loss 8.5429   LearningRate 0.0410   Epoch: 7   Global Step: 298230   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:25,481-Speed 2628.06 samples/sec   Loss 8.5732   LearningRate 0.0410   Epoch: 7   Global Step: 298240   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:29,373-Speed 2632.37 samples/sec   Loss 8.6122   LearningRate 0.0410   Epoch: 7   Global Step: 298250   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:33,261-Speed 2634.69 samples/sec   Loss 8.6247   LearningRate 0.0410   Epoch: 7   Global Step: 298260   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:37,152-Speed 2632.14 samples/sec   Loss 8.5535   LearningRate 0.0410   Epoch: 7   Global Step: 298270   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:41,043-Speed 2632.00 samples/sec   Loss 8.6000   LearningRate 0.0410   Epoch: 7   Global Step: 298280   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:13:44,932-Speed 2633.77 samples/sec   Loss 8.6018   LearningRate 0.0410   Epoch: 7   Global Step: 298290   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:13:48,838-Speed 2622.24 samples/sec   Loss 8.7116   LearningRate 0.0410   Epoch: 7   Global Step: 298300   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:13:52,731-Speed 2631.28 samples/sec   Loss 8.6498   LearningRate 0.0410   Epoch: 7   Global Step: 298310   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:13:56,640-Speed 2620.00 samples/sec   Loss 8.5443   LearningRate 0.0410   Epoch: 7   Global Step: 298320   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:14:00,551-Speed 2619.20 samples/sec   Loss 8.6758   LearningRate 0.0410   Epoch: 7   Global Step: 298330   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:14:04,439-Speed 2634.18 samples/sec   Loss 8.7803   LearningRate 0.0410   Epoch: 7   Global Step: 298340   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:14:08,328-Speed 2633.45 samples/sec   Loss 8.7194   LearningRate 0.0410   Epoch: 7   Global Step: 298350   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:14:12,216-Speed 2633.96 samples/sec   Loss 8.6261   LearningRate 0.0410   Epoch: 7   Global Step: 298360   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:14:16,113-Speed 2628.36 samples/sec   Loss 8.7203   LearningRate 0.0410   Epoch: 7   Global Step: 298370   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:14:20,003-Speed 2633.36 samples/sec   Loss 8.6771   LearningRate 0.0410   Epoch: 7   Global Step: 298380   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:14:23,897-Speed 2630.09 samples/sec   Loss 8.6921   LearningRate 0.0410   Epoch: 7   Global Step: 298390   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:27,788-Speed 2632.58 samples/sec   Loss 8.7522   LearningRate 0.0410   Epoch: 7   Global Step: 298400   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:31,680-Speed 2633.37 samples/sec   Loss 8.7518   LearningRate 0.0410   Epoch: 7   Global Step: 298410   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:35,576-Speed 2628.84 samples/sec   Loss 8.8198   LearningRate 0.0410   Epoch: 7   Global Step: 298420   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:39,466-Speed 2633.29 samples/sec   Loss 8.7418   LearningRate 0.0410   Epoch: 7   Global Step: 298430   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:43,366-Speed 2626.26 samples/sec   Loss 8.7169   LearningRate 0.0410   Epoch: 7   Global Step: 298440   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:47,272-Speed 2622.50 samples/sec   Loss 8.5899   LearningRate 0.0410   Epoch: 7   Global Step: 298450   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:51,163-Speed 2632.29 samples/sec   Loss 8.5347   LearningRate 0.0410   Epoch: 7   Global Step: 298460   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:55,054-Speed 2632.28 samples/sec   Loss 8.5415   LearningRate 0.0410   Epoch: 7   Global Step: 298470   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:14:58,947-Speed 2630.74 samples/sec   Loss 8.7200   LearningRate 0.0410   Epoch: 7   Global Step: 298480   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:02,827-Speed 2639.92 samples/sec   Loss 8.6947   LearningRate 0.0410   Epoch: 7   Global Step: 298490   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:06,718-Speed 2632.56 samples/sec   Loss 8.6624   LearningRate 0.0410   Epoch: 7   Global Step: 298500   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:10,620-Speed 2624.84 samples/sec   Loss 8.5859   LearningRate 0.0410   Epoch: 7   Global Step: 298510   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:14,511-Speed 2633.33 samples/sec   Loss 8.6642   LearningRate 0.0410   Epoch: 7   Global Step: 298520   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:18,401-Speed 2632.50 samples/sec   Loss 8.6338   LearningRate 0.0410   Epoch: 7   Global Step: 298530   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:22,414-Speed 2552.30 samples/sec   Loss 8.6372   LearningRate 0.0410   Epoch: 7   Global Step: 298540   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:26,311-Speed 2628.54 samples/sec   Loss 8.7166   LearningRate 0.0410   Epoch: 7   Global Step: 298550   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:30,246-Speed 2603.38 samples/sec   Loss 8.7734   LearningRate 0.0410   Epoch: 7   Global Step: 298560   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:34,137-Speed 2631.94 samples/sec   Loss 8.6406   LearningRate 0.0410   Epoch: 7   Global Step: 298570   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:38,026-Speed 2633.76 samples/sec   Loss 8.7823   LearningRate 0.0410   Epoch: 7   Global Step: 298580   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:15:41,919-Speed 2631.20 samples/sec   Loss 8.6141   LearningRate 0.0410   Epoch: 7   Global Step: 298590   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:15:45,808-Speed 2634.05 samples/sec   Loss 8.7506   LearningRate 0.0410   Epoch: 7   Global Step: 298600   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:15:49,701-Speed 2630.71 samples/sec   Loss 8.6767   LearningRate 0.0410   Epoch: 7   Global Step: 298610   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:15:53,591-Speed 2633.08 samples/sec   Loss 8.6970   LearningRate 0.0410   Epoch: 7   Global Step: 298620   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:15:57,484-Speed 2630.99 samples/sec   Loss 8.7327   LearningRate 0.0410   Epoch: 7   Global Step: 298630   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:16:01,375-Speed 2632.26 samples/sec   Loss 8.8499   LearningRate 0.0410   Epoch: 7   Global Step: 298640   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:16:05,281-Speed 2622.14 samples/sec   Loss 8.5869   LearningRate 0.0410   Epoch: 7   Global Step: 298650   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:16:09,188-Speed 2621.59 samples/sec   Loss 8.7342   LearningRate 0.0410   Epoch: 7   Global Step: 298660   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:16:13,082-Speed 2630.48 samples/sec   Loss 8.5493   LearningRate 0.0410   Epoch: 7   Global Step: 298670   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:16:16,961-Speed 2640.92 samples/sec   Loss 8.5649   LearningRate 0.0410   Epoch: 7   Global Step: 298680   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:20,856-Speed 2629.34 samples/sec   Loss 8.6222   LearningRate 0.0410   Epoch: 7   Global Step: 298690   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:24,769-Speed 2617.43 samples/sec   Loss 8.6833   LearningRate 0.0410   Epoch: 7   Global Step: 298700   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:28,659-Speed 2633.05 samples/sec   Loss 8.6354   LearningRate 0.0410   Epoch: 7   Global Step: 298710   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:32,552-Speed 2631.01 samples/sec   Loss 8.4323   LearningRate 0.0409   Epoch: 7   Global Step: 298720   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:36,444-Speed 2631.52 samples/sec   Loss 8.6261   LearningRate 0.0409   Epoch: 7   Global Step: 298730   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:40,341-Speed 2628.10 samples/sec   Loss 8.6571   LearningRate 0.0409   Epoch: 7   Global Step: 298740   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:44,247-Speed 2622.21 samples/sec   Loss 8.6609   LearningRate 0.0409   Epoch: 7   Global Step: 298750   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:48,142-Speed 2630.02 samples/sec   Loss 8.6412   LearningRate 0.0409   Epoch: 7   Global Step: 298760   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:52,035-Speed 2631.30 samples/sec   Loss 8.5932   LearningRate 0.0409   Epoch: 7   Global Step: 298770   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:16:55,926-Speed 2632.53 samples/sec   Loss 8.5933   LearningRate 0.0409   Epoch: 7   Global Step: 298780   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:16:59,804-Speed 2640.65 samples/sec   Loss 8.5606   LearningRate 0.0409   Epoch: 7   Global Step: 298790   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:17:03,698-Speed 2630.46 samples/sec   Loss 8.6056   LearningRate 0.0409   Epoch: 7   Global Step: 298800   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:17:07,541-Speed 2665.37 samples/sec   Loss 8.7659   LearningRate 0.0409   Epoch: 7   Global Step: 298810   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:11,426-Speed 2636.40 samples/sec   Loss 9.1316   LearningRate 0.0409   Epoch: 7   Global Step: 298820   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:15,323-Speed 2628.50 samples/sec   Loss 9.1463   LearningRate 0.0409   Epoch: 7   Global Step: 298830   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:19,219-Speed 2628.71 samples/sec   Loss 8.7152   LearningRate 0.0409   Epoch: 7   Global Step: 298840   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:23,132-Speed 2618.10 samples/sec   Loss 8.8216   LearningRate 0.0409   Epoch: 7   Global Step: 298850   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:27,036-Speed 2624.11 samples/sec   Loss 8.7848   LearningRate 0.0409   Epoch: 7   Global Step: 298860   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:30,932-Speed 2629.15 samples/sec   Loss 8.5544   LearningRate 0.0409   Epoch: 7   Global Step: 298870   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:34,822-Speed 2633.28 samples/sec   Loss 8.6403   LearningRate 0.0409   Epoch: 7   Global Step: 298880   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:38,709-Speed 2634.75 samples/sec   Loss 8.6325   LearningRate 0.0409   Epoch: 7   Global Step: 298890   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:42,598-Speed 2633.62 samples/sec   Loss 8.5850   LearningRate 0.0409   Epoch: 7   Global Step: 298900   Fp16 Grad Scale: 8192   Required: 60 hours
Training: 2022-04-14 05:17:46,490-Speed 2632.40 samples/sec   Loss 8.5903   LearningRate 0.0409   Epoch: 7   Global Step: 298910   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:17:50,381-Speed 2631.84 samples/sec   Loss 8.6723   LearningRate 0.0409   Epoch: 7   Global Step: 298920   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:17:54,272-Speed 2632.96 samples/sec   Loss 8.6360   LearningRate 0.0409   Epoch: 7   Global Step: 298930   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:17:58,162-Speed 2632.78 samples/sec   Loss 8.6457   LearningRate 0.0409   Epoch: 7   Global Step: 298940   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:18:02,062-Speed 2626.71 samples/sec   Loss 8.6152   LearningRate 0.0409   Epoch: 7   Global Step: 298950   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:18:05,950-Speed 2634.32 samples/sec   Loss 8.9333   LearningRate 0.0409   Epoch: 7   Global Step: 298960   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:18:09,859-Speed 2619.90 samples/sec   Loss 9.3442   LearningRate 0.0409   Epoch: 7   Global Step: 298970   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:18:13,750-Speed 2632.25 samples/sec   Loss 8.9096   LearningRate 0.0409   Epoch: 7   Global Step: 298980   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:18:17,647-Speed 2628.56 samples/sec   Loss 8.8624   LearningRate 0.0409   Epoch: 7   Global Step: 298990   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:18:21,557-Speed 2619.66 samples/sec   Loss 8.6450   LearningRate 0.0409   Epoch: 7   Global Step: 299000   Fp16 Grad Scale: 16384   Required: 60 hours
Training: 2022-04-14 05:18:25,448-Speed 2632.36 samples/sec   Loss 8.8364   LearningRate 0.0409   Epoch: 7   Global Step: 299010   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:29,364-Speed 2615.82 samples/sec   Loss 8.7022   LearningRate 0.0409   Epoch: 7   Global Step: 299020   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:33,253-Speed 2633.80 samples/sec   Loss 8.6572   LearningRate 0.0409   Epoch: 7   Global Step: 299030   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:37,144-Speed 2632.64 samples/sec   Loss 8.7033   LearningRate 0.0409   Epoch: 7   Global Step: 299040   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:41,036-Speed 2631.40 samples/sec   Loss 8.6895   LearningRate 0.0409   Epoch: 7   Global Step: 299050   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:44,924-Speed 2634.44 samples/sec   Loss 8.6062   LearningRate 0.0409   Epoch: 7   Global Step: 299060   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:48,817-Speed 2631.08 samples/sec   Loss 8.6283   LearningRate 0.0409   Epoch: 7   Global Step: 299070   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:52,701-Speed 2636.75 samples/sec   Loss 9.0010   LearningRate 0.0409   Epoch: 7   Global Step: 299080   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:18:56,613-Speed 2617.99 samples/sec   Loss 8.7356   LearningRate 0.0409   Epoch: 7   Global Step: 299090   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:19:00,520-Speed 2622.46 samples/sec   Loss 8.7372   LearningRate 0.0409   Epoch: 7   Global Step: 299100   Fp16 Grad Scale: 32768   Required: 60 hours
Training: 2022-04-14 05:19:04,410-Speed 2633.01 samples/sec   Loss 8.7280   LearningRate 0.0409   Epoch: 7   Global Step: 299110   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:08,299-Speed 2633.56 samples/sec   Loss 8.6951   LearningRate 0.0409   Epoch: 7   Global Step: 299120   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:12,200-Speed 2625.27 samples/sec   Loss 8.6638   LearningRate 0.0409   Epoch: 7   Global Step: 299130   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:16,093-Speed 2631.56 samples/sec   Loss 8.6906   LearningRate 0.0409   Epoch: 7   Global Step: 299140   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:19,982-Speed 2633.89 samples/sec   Loss 8.5102   LearningRate 0.0409   Epoch: 7   Global Step: 299150   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:23,901-Speed 2613.11 samples/sec   Loss 8.5199   LearningRate 0.0409   Epoch: 7   Global Step: 299160   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:27,798-Speed 2628.85 samples/sec   Loss 8.6649   LearningRate 0.0409   Epoch: 7   Global Step: 299170   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:31,690-Speed 2631.63 samples/sec   Loss 8.7197   LearningRate 0.0409   Epoch: 7   Global Step: 299180   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:35,583-Speed 2631.11 samples/sec   Loss 8.6354   LearningRate 0.0409   Epoch: 7   Global Step: 299190   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:39,473-Speed 2633.25 samples/sec   Loss 8.7473   LearningRate 0.0409   Epoch: 7   Global Step: 299200   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:19:43,379-Speed 2622.40 samples/sec   Loss 8.6758   LearningRate 0.0409   Epoch: 7   Global Step: 299210   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:19:47,290-Speed 2619.03 samples/sec   Loss 8.6448   LearningRate 0.0409   Epoch: 7   Global Step: 299220   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:19:51,204-Speed 2616.90 samples/sec   Loss 8.6861   LearningRate 0.0409   Epoch: 7   Global Step: 299230   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:19:55,092-Speed 2634.34 samples/sec   Loss 8.7831   LearningRate 0.0409   Epoch: 7   Global Step: 299240   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:19:58,987-Speed 2629.56 samples/sec   Loss 8.6388   LearningRate 0.0409   Epoch: 7   Global Step: 299250   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:02,892-Speed 2623.11 samples/sec   Loss 8.5884   LearningRate 0.0409   Epoch: 7   Global Step: 299260   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:06,779-Speed 2635.00 samples/sec   Loss 8.7068   LearningRate 0.0409   Epoch: 7   Global Step: 299270   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:10,680-Speed 2625.70 samples/sec   Loss 8.6892   LearningRate 0.0409   Epoch: 7   Global Step: 299280   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:14,574-Speed 2629.99 samples/sec   Loss 8.8166   LearningRate 0.0409   Epoch: 7   Global Step: 299290   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:18,466-Speed 2632.35 samples/sec   Loss 8.6075   LearningRate 0.0409   Epoch: 7   Global Step: 299300   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:22,386-Speed 2612.63 samples/sec   Loss 8.6343   LearningRate 0.0409   Epoch: 7   Global Step: 299310   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:20:26,259-Speed 2644.55 samples/sec   Loss 8.6401   LearningRate 0.0409   Epoch: 7   Global Step: 299320   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:30,152-Speed 2631.13 samples/sec   Loss 8.6640   LearningRate 0.0409   Epoch: 7   Global Step: 299330   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:34,045-Speed 2630.73 samples/sec   Loss 8.7306   LearningRate 0.0409   Epoch: 7   Global Step: 299340   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:37,945-Speed 2626.85 samples/sec   Loss 8.7749   LearningRate 0.0409   Epoch: 7   Global Step: 299350   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:41,837-Speed 2631.20 samples/sec   Loss 8.6802   LearningRate 0.0409   Epoch: 7   Global Step: 299360   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:45,733-Speed 2630.53 samples/sec   Loss 8.7716   LearningRate 0.0408   Epoch: 7   Global Step: 299370   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:49,622-Speed 2633.33 samples/sec   Loss 8.6386   LearningRate 0.0408   Epoch: 7   Global Step: 299380   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:53,524-Speed 2625.18 samples/sec   Loss 8.7566   LearningRate 0.0408   Epoch: 7   Global Step: 299390   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:20:57,428-Speed 2623.41 samples/sec   Loss 8.6789   LearningRate 0.0408   Epoch: 7   Global Step: 299400   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:01,323-Speed 2630.12 samples/sec   Loss 8.5916   LearningRate 0.0408   Epoch: 7   Global Step: 299410   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:05,292-Speed 2580.40 samples/sec   Loss 8.6343   LearningRate 0.0408   Epoch: 7   Global Step: 299420   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:21:09,180-Speed 2634.15 samples/sec   Loss 8.5514   LearningRate 0.0408   Epoch: 7   Global Step: 299430   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:21:13,098-Speed 2614.35 samples/sec   Loss 8.6232   LearningRate 0.0408   Epoch: 7   Global Step: 299440   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:21:16,996-Speed 2628.82 samples/sec   Loss 8.6151   LearningRate 0.0408   Epoch: 7   Global Step: 299450   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:21:20,892-Speed 2628.83 samples/sec   Loss 8.7250   LearningRate 0.0408   Epoch: 7   Global Step: 299460   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:24,794-Speed 2624.72 samples/sec   Loss 8.6852   LearningRate 0.0408   Epoch: 7   Global Step: 299470   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:28,689-Speed 2629.96 samples/sec   Loss 8.7014   LearningRate 0.0408   Epoch: 7   Global Step: 299480   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:32,579-Speed 2632.94 samples/sec   Loss 8.6613   LearningRate 0.0408   Epoch: 7   Global Step: 299490   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:36,479-Speed 2626.02 samples/sec   Loss 8.6496   LearningRate 0.0408   Epoch: 7   Global Step: 299500   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:40,363-Speed 2637.18 samples/sec   Loss 8.6699   LearningRate 0.0408   Epoch: 7   Global Step: 299510   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:44,264-Speed 2625.26 samples/sec   Loss 8.7048   LearningRate 0.0408   Epoch: 7   Global Step: 299520   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:48,161-Speed 2629.34 samples/sec   Loss 8.6622   LearningRate 0.0408   Epoch: 7   Global Step: 299530   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:52,049-Speed 2633.74 samples/sec   Loss 8.6879   LearningRate 0.0408   Epoch: 7   Global Step: 299540   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:55,941-Speed 2632.08 samples/sec   Loss 8.6391   LearningRate 0.0408   Epoch: 7   Global Step: 299550   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:21:59,898-Speed 2587.94 samples/sec   Loss 8.7260   LearningRate 0.0408   Epoch: 7   Global Step: 299560   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:22:03,790-Speed 2631.85 samples/sec   Loss 8.6324   LearningRate 0.0408   Epoch: 7   Global Step: 299570   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:22:07,684-Speed 2630.51 samples/sec   Loss 8.5227   LearningRate 0.0408   Epoch: 7   Global Step: 299580   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:22:11,575-Speed 2632.86 samples/sec   Loss 8.6787   LearningRate 0.0408   Epoch: 7   Global Step: 299590   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:22:15,464-Speed 2633.58 samples/sec   Loss 8.5762   LearningRate 0.0408   Epoch: 7   Global Step: 299600   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:22:19,358-Speed 2630.03 samples/sec   Loss 8.7866   LearningRate 0.0408   Epoch: 7   Global Step: 299610   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:22:23,264-Speed 2622.62 samples/sec   Loss 8.7278   LearningRate 0.0408   Epoch: 7   Global Step: 299620   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:22:27,157-Speed 2630.76 samples/sec   Loss 8.6216   LearningRate 0.0408   Epoch: 7   Global Step: 299630   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:22:31,045-Speed 2634.59 samples/sec   Loss 8.5517   LearningRate 0.0408   Epoch: 7   Global Step: 299640   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:22:35,669-Speed 2215.17 samples/sec   Loss 8.5131   LearningRate 0.0408   Epoch: 7   Global Step: 299650   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:22:39,556-Speed 2634.93 samples/sec   Loss 8.5635   LearningRate 0.0408   Epoch: 7   Global Step: 299660   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:22:43,445-Speed 2634.13 samples/sec   Loss 8.6404   LearningRate 0.0408   Epoch: 7   Global Step: 299670   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:22:47,336-Speed 2632.20 samples/sec   Loss 8.6074   LearningRate 0.0408   Epoch: 7   Global Step: 299680   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:22:51,234-Speed 2628.05 samples/sec   Loss 8.6236   LearningRate 0.0408   Epoch: 7   Global Step: 299690   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:22:55,129-Speed 2629.69 samples/sec   Loss 8.6496   LearningRate 0.0408   Epoch: 7   Global Step: 299700   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:22:59,106-Speed 2574.91 samples/sec   Loss 8.7506   LearningRate 0.0408   Epoch: 7   Global Step: 299710   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:23:03,005-Speed 2627.10 samples/sec   Loss 8.6555   LearningRate 0.0408   Epoch: 7   Global Step: 299720   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:23:06,894-Speed 2634.29 samples/sec   Loss 8.5970   LearningRate 0.0408   Epoch: 7   Global Step: 299730   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:23:10,769-Speed 2642.79 samples/sec   Loss 8.6981   LearningRate 0.0408   Epoch: 7   Global Step: 299740   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:23:14,659-Speed 2633.47 samples/sec   Loss 8.6193   LearningRate 0.0408   Epoch: 7   Global Step: 299750   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:23:18,523-Speed 2650.81 samples/sec   Loss 8.6360   LearningRate 0.0408   Epoch: 7   Global Step: 299760   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:22,421-Speed 2628.11 samples/sec   Loss 8.6622   LearningRate 0.0408   Epoch: 7   Global Step: 299770   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:26,328-Speed 2621.97 samples/sec   Loss 8.5768   LearningRate 0.0408   Epoch: 7   Global Step: 299780   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:30,230-Speed 2624.55 samples/sec   Loss 8.6343   LearningRate 0.0408   Epoch: 7   Global Step: 299790   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:34,147-Speed 2614.88 samples/sec   Loss 8.6228   LearningRate 0.0408   Epoch: 7   Global Step: 299800   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:38,045-Speed 2627.61 samples/sec   Loss 8.7490   LearningRate 0.0408   Epoch: 7   Global Step: 299810   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:41,936-Speed 2632.72 samples/sec   Loss 8.6621   LearningRate 0.0408   Epoch: 7   Global Step: 299820   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:45,827-Speed 2632.33 samples/sec   Loss 8.6681   LearningRate 0.0408   Epoch: 7   Global Step: 299830   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:49,718-Speed 2632.32 samples/sec   Loss 8.5943   LearningRate 0.0408   Epoch: 7   Global Step: 299840   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:53,612-Speed 2630.60 samples/sec   Loss 8.6973   LearningRate 0.0408   Epoch: 7   Global Step: 299850   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:23:57,508-Speed 2629.08 samples/sec   Loss 8.5324   LearningRate 0.0408   Epoch: 7   Global Step: 299860   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:01,432-Speed 2610.49 samples/sec   Loss 8.6043   LearningRate 0.0408   Epoch: 7   Global Step: 299870   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:05,331-Speed 2626.77 samples/sec   Loss 8.6051   LearningRate 0.0408   Epoch: 7   Global Step: 299880   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:09,224-Speed 2631.41 samples/sec   Loss 8.6200   LearningRate 0.0408   Epoch: 7   Global Step: 299890   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:13,153-Speed 2606.58 samples/sec   Loss 8.6816   LearningRate 0.0408   Epoch: 7   Global Step: 299900   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:17,049-Speed 2628.65 samples/sec   Loss 8.6934   LearningRate 0.0408   Epoch: 7   Global Step: 299910   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:20,940-Speed 2632.54 samples/sec   Loss 8.6635   LearningRate 0.0408   Epoch: 7   Global Step: 299920   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:24,834-Speed 2630.22 samples/sec   Loss 8.5108   LearningRate 0.0408   Epoch: 7   Global Step: 299930   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:28,728-Speed 2630.97 samples/sec   Loss 8.6060   LearningRate 0.0408   Epoch: 7   Global Step: 299940   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:32,624-Speed 2628.70 samples/sec   Loss 8.6793   LearningRate 0.0408   Epoch: 7   Global Step: 299950   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:24:36,514-Speed 2633.29 samples/sec   Loss 8.6662   LearningRate 0.0408   Epoch: 7   Global Step: 299960   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:24:40,407-Speed 2630.73 samples/sec   Loss 8.6111   LearningRate 0.0408   Epoch: 7   Global Step: 299970   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:24:44,297-Speed 2632.82 samples/sec   Loss 8.6515   LearningRate 0.0408   Epoch: 7   Global Step: 299980   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:24:48,201-Speed 2623.97 samples/sec   Loss 8.5599   LearningRate 0.0408   Epoch: 7   Global Step: 299990   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:24:52,093-Speed 2631.12 samples/sec   Loss 8.5346   LearningRate 0.0408   Epoch: 7   Global Step: 300000   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:25:35,138-[lfw][300000]XNorm: 23.715297
Training: 2022-04-14 05:25:35,139-[lfw][300000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 05:25:35,140-[lfw][300000]Accuracy-Highest: 0.99783
Training: 2022-04-14 05:26:25,234-[cfp_fp][300000]XNorm: 21.356608
Training: 2022-04-14 05:26:25,235-[cfp_fp][300000]Accuracy-Flip: 0.98343+-0.00703
Training: 2022-04-14 05:26:25,236-[cfp_fp][300000]Accuracy-Highest: 0.98643
Training: 2022-04-14 05:27:08,313-[agedb_30][300000]XNorm: 23.332386
Training: 2022-04-14 05:27:08,314-[agedb_30][300000]Accuracy-Flip: 0.97233+-0.00782
Training: 2022-04-14 05:27:08,314-[agedb_30][300000]Accuracy-Highest: 0.97567
Training: 2022-04-14 05:27:12,200-Speed 73.09 samples/sec   Loss 8.6982   LearningRate 0.0408   Epoch: 7   Global Step: 300010   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:27:16,080-Speed 2639.92 samples/sec   Loss 8.6507   LearningRate 0.0407   Epoch: 7   Global Step: 300020   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:27:19,978-Speed 2627.67 samples/sec   Loss 8.5857   LearningRate 0.0407   Epoch: 7   Global Step: 300030   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:27:23,864-Speed 2635.31 samples/sec   Loss 8.6176   LearningRate 0.0407   Epoch: 7   Global Step: 300040   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:27:27,756-Speed 2632.16 samples/sec   Loss 8.4628   LearningRate 0.0407   Epoch: 7   Global Step: 300050   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:27:31,632-Speed 2642.00 samples/sec   Loss 8.6328   LearningRate 0.0407   Epoch: 7   Global Step: 300060   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:27:35,519-Speed 2635.92 samples/sec   Loss 8.6108   LearningRate 0.0407   Epoch: 7   Global Step: 300070   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:27:39,401-Speed 2638.23 samples/sec   Loss 8.5989   LearningRate 0.0407   Epoch: 7   Global Step: 300080   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:27:43,291-Speed 2633.31 samples/sec   Loss 8.6361   LearningRate 0.0407   Epoch: 7   Global Step: 300090   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:27:47,169-Speed 2641.04 samples/sec   Loss 8.6133   LearningRate 0.0407   Epoch: 7   Global Step: 300100   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:27:51,054-Speed 2636.59 samples/sec   Loss 8.7090   LearningRate 0.0407   Epoch: 7   Global Step: 300110   Fp16 Grad Scale: 262144   Required: 60 hours
Training: 2022-04-14 05:27:54,931-Speed 2641.59 samples/sec   Loss 8.7565   LearningRate 0.0407   Epoch: 7   Global Step: 300120   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:27:58,818-Speed 2634.67 samples/sec   Loss 8.5640   LearningRate 0.0407   Epoch: 7   Global Step: 300130   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:02,712-Speed 2630.58 samples/sec   Loss 8.5801   LearningRate 0.0407   Epoch: 7   Global Step: 300140   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:06,599-Speed 2635.41 samples/sec   Loss 8.7097   LearningRate 0.0407   Epoch: 7   Global Step: 300150   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:10,488-Speed 2634.22 samples/sec   Loss 8.7816   LearningRate 0.0407   Epoch: 7   Global Step: 300160   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:14,377-Speed 2633.67 samples/sec   Loss 8.5912   LearningRate 0.0407   Epoch: 7   Global Step: 300170   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:18,266-Speed 2633.71 samples/sec   Loss 8.5642   LearningRate 0.0407   Epoch: 7   Global Step: 300180   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:22,159-Speed 2631.39 samples/sec   Loss 8.7474   LearningRate 0.0407   Epoch: 7   Global Step: 300190   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:26,053-Speed 2630.28 samples/sec   Loss 8.6771   LearningRate 0.0407   Epoch: 7   Global Step: 300200   Fp16 Grad Scale: 131072   Required: 60 hours
Training: 2022-04-14 05:28:29,943-Speed 2632.93 samples/sec   Loss 8.5695   LearningRate 0.0407   Epoch: 7   Global Step: 300210   Fp16 Grad Scale: 65536   Required: 60 hours
Training: 2022-04-14 05:28:33,822-Speed 2641.05 samples/sec   Loss 8.6666   LearningRate 0.0407   Epoch: 7   Global Step: 300220   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:28:37,700-Speed 2640.79 samples/sec   Loss 9.1170   LearningRate 0.0407   Epoch: 7   Global Step: 300230   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:28:41,592-Speed 2632.54 samples/sec   Loss 9.0319   LearningRate 0.0407   Epoch: 7   Global Step: 300240   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:28:45,484-Speed 2631.18 samples/sec   Loss 9.0011   LearningRate 0.0407   Epoch: 7   Global Step: 300250   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:28:49,377-Speed 2630.87 samples/sec   Loss 8.5888   LearningRate 0.0407   Epoch: 7   Global Step: 300260   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:28:53,283-Speed 2622.29 samples/sec   Loss 8.6442   LearningRate 0.0407   Epoch: 7   Global Step: 300270   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:28:57,179-Speed 2628.69 samples/sec   Loss 8.6917   LearningRate 0.0407   Epoch: 7   Global Step: 300280   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:29:01,079-Speed 2626.48 samples/sec   Loss 8.5547   LearningRate 0.0407   Epoch: 7   Global Step: 300290   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:29:04,979-Speed 2626.21 samples/sec   Loss 8.5868   LearningRate 0.0407   Epoch: 7   Global Step: 300300   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:29:08,888-Speed 2619.80 samples/sec   Loss 8.5809   LearningRate 0.0407   Epoch: 7   Global Step: 300310   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:29:12,811-Speed 2611.34 samples/sec   Loss 8.5393   LearningRate 0.0407   Epoch: 7   Global Step: 300320   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:29:16,707-Speed 2629.21 samples/sec   Loss 8.6306   LearningRate 0.0407   Epoch: 7   Global Step: 300330   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:20,604-Speed 2627.81 samples/sec   Loss 8.5821   LearningRate 0.0407   Epoch: 7   Global Step: 300340   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:24,496-Speed 2631.48 samples/sec   Loss 8.6266   LearningRate 0.0407   Epoch: 7   Global Step: 300350   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:28,388-Speed 2632.14 samples/sec   Loss 8.6503   LearningRate 0.0407   Epoch: 7   Global Step: 300360   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:32,285-Speed 2627.98 samples/sec   Loss 8.6106   LearningRate 0.0407   Epoch: 7   Global Step: 300370   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:36,193-Speed 2620.97 samples/sec   Loss 8.5959   LearningRate 0.0407   Epoch: 7   Global Step: 300380   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:40,081-Speed 2634.26 samples/sec   Loss 8.5557   LearningRate 0.0407   Epoch: 7   Global Step: 300390   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:43,978-Speed 2628.84 samples/sec   Loss 8.7449   LearningRate 0.0407   Epoch: 7   Global Step: 300400   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:47,879-Speed 2625.26 samples/sec   Loss 8.6684   LearningRate 0.0407   Epoch: 7   Global Step: 300410   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:51,881-Speed 2559.49 samples/sec   Loss 8.5411   LearningRate 0.0407   Epoch: 7   Global Step: 300420   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:29:55,779-Speed 2627.69 samples/sec   Loss 8.6853   LearningRate 0.0407   Epoch: 7   Global Step: 300430   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:29:59,674-Speed 2629.41 samples/sec   Loss 8.5811   LearningRate 0.0407   Epoch: 7   Global Step: 300440   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:03,572-Speed 2627.96 samples/sec   Loss 8.7870   LearningRate 0.0407   Epoch: 7   Global Step: 300450   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:07,469-Speed 2627.94 samples/sec   Loss 8.6899   LearningRate 0.0407   Epoch: 7   Global Step: 300460   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:11,361-Speed 2631.96 samples/sec   Loss 8.5119   LearningRate 0.0407   Epoch: 7   Global Step: 300470   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:15,291-Speed 2606.05 samples/sec   Loss 8.6046   LearningRate 0.0407   Epoch: 7   Global Step: 300480   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:19,193-Speed 2624.93 samples/sec   Loss 8.6358   LearningRate 0.0407   Epoch: 7   Global Step: 300490   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:23,092-Speed 2627.24 samples/sec   Loss 8.5388   LearningRate 0.0407   Epoch: 7   Global Step: 300500   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:26,988-Speed 2629.52 samples/sec   Loss 8.6257   LearningRate 0.0407   Epoch: 7   Global Step: 300510   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:30,883-Speed 2629.20 samples/sec   Loss 8.7022   LearningRate 0.0407   Epoch: 7   Global Step: 300520   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:30:34,780-Speed 2628.38 samples/sec   Loss 8.6100   LearningRate 0.0407   Epoch: 7   Global Step: 300530   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:30:38,692-Speed 2618.21 samples/sec   Loss 8.5855   LearningRate 0.0407   Epoch: 7   Global Step: 300540   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:30:42,605-Speed 2617.45 samples/sec   Loss 8.5852   LearningRate 0.0407   Epoch: 7   Global Step: 300550   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:30:46,513-Speed 2621.43 samples/sec   Loss 8.7911   LearningRate 0.0407   Epoch: 7   Global Step: 300560   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:30:50,415-Speed 2624.56 samples/sec   Loss 8.7191   LearningRate 0.0407   Epoch: 7   Global Step: 300570   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:30:54,325-Speed 2619.55 samples/sec   Loss 8.6048   LearningRate 0.0407   Epoch: 7   Global Step: 300580   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:30:58,227-Speed 2624.43 samples/sec   Loss 8.6415   LearningRate 0.0407   Epoch: 7   Global Step: 300590   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:31:02,159-Speed 2605.12 samples/sec   Loss 8.7449   LearningRate 0.0407   Epoch: 7   Global Step: 300600   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:31:06,052-Speed 2630.86 samples/sec   Loss 8.6928   LearningRate 0.0407   Epoch: 7   Global Step: 300610   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:31:09,957-Speed 2623.80 samples/sec   Loss 8.5858   LearningRate 0.0407   Epoch: 7   Global Step: 300620   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:31:13,862-Speed 2622.79 samples/sec   Loss 8.8001   LearningRate 0.0407   Epoch: 7   Global Step: 300630   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:31:17,780-Speed 2614.43 samples/sec   Loss 8.7148   LearningRate 0.0407   Epoch: 7   Global Step: 300640   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:31:21,689-Speed 2619.83 samples/sec   Loss 8.6645   LearningRate 0.0407   Epoch: 7   Global Step: 300650   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:31:25,577-Speed 2634.80 samples/sec   Loss 8.6478   LearningRate 0.0407   Epoch: 7   Global Step: 300660   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:29,477-Speed 2626.23 samples/sec   Loss 8.6879   LearningRate 0.0406   Epoch: 7   Global Step: 300670   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:33,375-Speed 2627.29 samples/sec   Loss 8.7101   LearningRate 0.0406   Epoch: 7   Global Step: 300680   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:37,289-Speed 2616.87 samples/sec   Loss 8.5811   LearningRate 0.0406   Epoch: 7   Global Step: 300690   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:41,191-Speed 2625.25 samples/sec   Loss 8.7123   LearningRate 0.0406   Epoch: 7   Global Step: 300700   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:45,105-Speed 2616.98 samples/sec   Loss 8.6816   LearningRate 0.0406   Epoch: 7   Global Step: 300710   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:49,011-Speed 2622.73 samples/sec   Loss 8.6026   LearningRate 0.0406   Epoch: 7   Global Step: 300720   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:52,910-Speed 2626.72 samples/sec   Loss 8.5960   LearningRate 0.0406   Epoch: 7   Global Step: 300730   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:31:56,812-Speed 2624.63 samples/sec   Loss 8.6204   LearningRate 0.0406   Epoch: 7   Global Step: 300740   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:32:00,718-Speed 2621.95 samples/sec   Loss 8.6502   LearningRate 0.0406   Epoch: 7   Global Step: 300750   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:32:04,619-Speed 2625.66 samples/sec   Loss 8.5313   LearningRate 0.0406   Epoch: 7   Global Step: 300760   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:08,543-Speed 2610.41 samples/sec   Loss 8.5643   LearningRate 0.0406   Epoch: 7   Global Step: 300770   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:12,445-Speed 2625.05 samples/sec   Loss 8.6893   LearningRate 0.0406   Epoch: 7   Global Step: 300780   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:16,347-Speed 2625.58 samples/sec   Loss 8.4074   LearningRate 0.0406   Epoch: 7   Global Step: 300790   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:20,313-Speed 2582.19 samples/sec   Loss 8.6638   LearningRate 0.0406   Epoch: 7   Global Step: 300800   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:24,231-Speed 2613.82 samples/sec   Loss 8.6809   LearningRate 0.0406   Epoch: 7   Global Step: 300810   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:28,139-Speed 2621.53 samples/sec   Loss 8.5037   LearningRate 0.0406   Epoch: 7   Global Step: 300820   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:32,046-Speed 2621.40 samples/sec   Loss 8.7189   LearningRate 0.0406   Epoch: 7   Global Step: 300830   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:35,957-Speed 2619.07 samples/sec   Loss 8.7217   LearningRate 0.0406   Epoch: 7   Global Step: 300840   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:39,872-Speed 2615.92 samples/sec   Loss 8.7548   LearningRate 0.0406   Epoch: 7   Global Step: 300850   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:32:43,794-Speed 2611.99 samples/sec   Loss 8.6716   LearningRate 0.0406   Epoch: 7   Global Step: 300860   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:32:47,705-Speed 2618.70 samples/sec   Loss 8.6933   LearningRate 0.0406   Epoch: 7   Global Step: 300870   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:32:51,608-Speed 2624.79 samples/sec   Loss 8.6151   LearningRate 0.0406   Epoch: 7   Global Step: 300880   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:32:55,515-Speed 2621.38 samples/sec   Loss 8.5755   LearningRate 0.0406   Epoch: 7   Global Step: 300890   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:32:59,416-Speed 2625.40 samples/sec   Loss 8.6322   LearningRate 0.0406   Epoch: 7   Global Step: 300900   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:33:03,319-Speed 2624.15 samples/sec   Loss 8.4404   LearningRate 0.0406   Epoch: 7   Global Step: 300910   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:33:07,213-Speed 2630.77 samples/sec   Loss 8.6108   LearningRate 0.0406   Epoch: 7   Global Step: 300920   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:33:11,112-Speed 2626.15 samples/sec   Loss 8.5980   LearningRate 0.0406   Epoch: 7   Global Step: 300930   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:33:15,007-Speed 2629.74 samples/sec   Loss 8.5669   LearningRate 0.0406   Epoch: 7   Global Step: 300940   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:33:18,904-Speed 2628.55 samples/sec   Loss 8.6926   LearningRate 0.0406   Epoch: 7   Global Step: 300950   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:33:22,804-Speed 2626.36 samples/sec   Loss 8.4904   LearningRate 0.0406   Epoch: 7   Global Step: 300960   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:33:26,703-Speed 2627.36 samples/sec   Loss 8.6461   LearningRate 0.0406   Epoch: 7   Global Step: 300970   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:33:30,568-Speed 2649.48 samples/sec   Loss 8.6319   LearningRate 0.0406   Epoch: 7   Global Step: 300980   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:33:34,465-Speed 2628.24 samples/sec   Loss 8.6378   LearningRate 0.0406   Epoch: 7   Global Step: 300990   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:33:38,362-Speed 2628.09 samples/sec   Loss 8.6147   LearningRate 0.0406   Epoch: 7   Global Step: 301000   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:33:42,279-Speed 2615.70 samples/sec   Loss 8.6909   LearningRate 0.0406   Epoch: 7   Global Step: 301010   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:33:46,190-Speed 2618.66 samples/sec   Loss 8.6164   LearningRate 0.0406   Epoch: 7   Global Step: 301020   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:33:50,095-Speed 2623.15 samples/sec   Loss 8.6779   LearningRate 0.0406   Epoch: 7   Global Step: 301030   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:33:54,030-Speed 2602.97 samples/sec   Loss 8.8170   LearningRate 0.0406   Epoch: 7   Global Step: 301040   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:33:57,937-Speed 2621.89 samples/sec   Loss 8.7029   LearningRate 0.0406   Epoch: 7   Global Step: 301050   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:34:01,845-Speed 2620.52 samples/sec   Loss 8.6733   LearningRate 0.0406   Epoch: 7   Global Step: 301060   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:34:05,751-Speed 2622.52 samples/sec   Loss 8.5225   LearningRate 0.0406   Epoch: 7   Global Step: 301070   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:34:09,650-Speed 2626.10 samples/sec   Loss 8.7103   LearningRate 0.0406   Epoch: 7   Global Step: 301080   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:13,547-Speed 2629.25 samples/sec   Loss 8.6538   LearningRate 0.0406   Epoch: 7   Global Step: 301090   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:17,443-Speed 2628.52 samples/sec   Loss 8.5707   LearningRate 0.0406   Epoch: 7   Global Step: 301100   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:21,337-Speed 2630.53 samples/sec   Loss 8.5898   LearningRate 0.0406   Epoch: 7   Global Step: 301110   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:25,232-Speed 2629.47 samples/sec   Loss 8.4709   LearningRate 0.0406   Epoch: 7   Global Step: 301120   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:29,132-Speed 2627.27 samples/sec   Loss 8.6096   LearningRate 0.0406   Epoch: 7   Global Step: 301130   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:33,025-Speed 2630.72 samples/sec   Loss 8.6752   LearningRate 0.0406   Epoch: 7   Global Step: 301140   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:36,920-Speed 2629.55 samples/sec   Loss 8.6499   LearningRate 0.0406   Epoch: 7   Global Step: 301150   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:40,826-Speed 2622.47 samples/sec   Loss 8.6730   LearningRate 0.0406   Epoch: 7   Global Step: 301160   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:44,721-Speed 2629.72 samples/sec   Loss 8.5951   LearningRate 0.0406   Epoch: 7   Global Step: 301170   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:48,620-Speed 2626.80 samples/sec   Loss 8.4610   LearningRate 0.0406   Epoch: 7   Global Step: 301180   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:34:52,515-Speed 2629.46 samples/sec   Loss 8.6043   LearningRate 0.0406   Epoch: 7   Global Step: 301190   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:34:56,412-Speed 2628.41 samples/sec   Loss 8.5519   LearningRate 0.0406   Epoch: 7   Global Step: 301200   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:00,310-Speed 2627.71 samples/sec   Loss 8.6702   LearningRate 0.0406   Epoch: 7   Global Step: 301210   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:04,207-Speed 2628.84 samples/sec   Loss 8.7808   LearningRate 0.0406   Epoch: 7   Global Step: 301220   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:08,106-Speed 2626.71 samples/sec   Loss 8.6169   LearningRate 0.0406   Epoch: 7   Global Step: 301230   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:12,017-Speed 2618.85 samples/sec   Loss 8.5955   LearningRate 0.0406   Epoch: 7   Global Step: 301240   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:15,933-Speed 2615.54 samples/sec   Loss 8.6130   LearningRate 0.0406   Epoch: 7   Global Step: 301250   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:19,841-Speed 2620.54 samples/sec   Loss 8.5306   LearningRate 0.0406   Epoch: 7   Global Step: 301260   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:23,734-Speed 2631.09 samples/sec   Loss 8.5318   LearningRate 0.0406   Epoch: 7   Global Step: 301270   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:27,633-Speed 2627.51 samples/sec   Loss 8.5773   LearningRate 0.0406   Epoch: 7   Global Step: 301280   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:31,526-Speed 2630.49 samples/sec   Loss 8.6454   LearningRate 0.0406   Epoch: 7   Global Step: 301290   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:35:35,422-Speed 2629.07 samples/sec   Loss 8.6020   LearningRate 0.0406   Epoch: 7   Global Step: 301300   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:35:39,299-Speed 2641.97 samples/sec   Loss 8.7366   LearningRate 0.0406   Epoch: 7   Global Step: 301310   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:43,233-Speed 2603.64 samples/sec   Loss 8.5993   LearningRate 0.0405   Epoch: 7   Global Step: 301320   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:47,129-Speed 2628.77 samples/sec   Loss 8.6778   LearningRate 0.0405   Epoch: 7   Global Step: 301330   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:35:51,010-Speed 2639.83 samples/sec   Loss 8.5886   LearningRate 0.0405   Epoch: 7   Global Step: 301340   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:35:54,910-Speed 2626.06 samples/sec   Loss 8.6152   LearningRate 0.0405   Epoch: 7   Global Step: 301350   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:35:58,814-Speed 2631.58 samples/sec   Loss 8.4756   LearningRate 0.0405   Epoch: 7   Global Step: 301360   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:02,707-Speed 2630.25 samples/sec   Loss 8.5966   LearningRate 0.0405   Epoch: 7   Global Step: 301370   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:06,599-Speed 2632.10 samples/sec   Loss 8.6341   LearningRate 0.0405   Epoch: 7   Global Step: 301380   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:10,498-Speed 2626.26 samples/sec   Loss 8.6950   LearningRate 0.0405   Epoch: 7   Global Step: 301390   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:14,410-Speed 2618.64 samples/sec   Loss 8.5425   LearningRate 0.0405   Epoch: 7   Global Step: 301400   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:18,316-Speed 2622.35 samples/sec   Loss 8.6609   LearningRate 0.0405   Epoch: 7   Global Step: 301410   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:22,212-Speed 2629.32 samples/sec   Loss 8.4582   LearningRate 0.0405   Epoch: 7   Global Step: 301420   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:26,109-Speed 2628.45 samples/sec   Loss 8.5244   LearningRate 0.0405   Epoch: 7   Global Step: 301430   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:36:30,005-Speed 2628.84 samples/sec   Loss 8.5288   LearningRate 0.0405   Epoch: 7   Global Step: 301440   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:36:33,901-Speed 2628.84 samples/sec   Loss 8.5914   LearningRate 0.0405   Epoch: 7   Global Step: 301450   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:36:37,804-Speed 2624.58 samples/sec   Loss 8.5043   LearningRate 0.0405   Epoch: 7   Global Step: 301460   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:36:41,697-Speed 2630.79 samples/sec   Loss 8.5729   LearningRate 0.0405   Epoch: 7   Global Step: 301470   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:36:45,596-Speed 2626.78 samples/sec   Loss 8.5369   LearningRate 0.0405   Epoch: 7   Global Step: 301480   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:36:49,494-Speed 2628.10 samples/sec   Loss 8.5318   LearningRate 0.0405   Epoch: 7   Global Step: 301490   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:36:53,394-Speed 2626.18 samples/sec   Loss 8.5828   LearningRate 0.0405   Epoch: 7   Global Step: 301500   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:36:57,294-Speed 2626.43 samples/sec   Loss 8.5720   LearningRate 0.0405   Epoch: 7   Global Step: 301510   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:37:01,176-Speed 2638.80 samples/sec   Loss 8.5723   LearningRate 0.0405   Epoch: 7   Global Step: 301520   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:05,072-Speed 2629.09 samples/sec   Loss 8.5837   LearningRate 0.0405   Epoch: 7   Global Step: 301530   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:08,986-Speed 2617.00 samples/sec   Loss 8.6308   LearningRate 0.0405   Epoch: 7   Global Step: 301540   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:12,911-Speed 2609.32 samples/sec   Loss 8.6942   LearningRate 0.0405   Epoch: 7   Global Step: 301550   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:16,828-Speed 2615.33 samples/sec   Loss 8.5566   LearningRate 0.0405   Epoch: 7   Global Step: 301560   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:20,747-Speed 2613.56 samples/sec   Loss 8.6740   LearningRate 0.0405   Epoch: 7   Global Step: 301570   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:24,659-Speed 2617.91 samples/sec   Loss 8.6463   LearningRate 0.0405   Epoch: 7   Global Step: 301580   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:28,578-Speed 2613.65 samples/sec   Loss 8.5403   LearningRate 0.0405   Epoch: 7   Global Step: 301590   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:32,474-Speed 2629.65 samples/sec   Loss 8.5765   LearningRate 0.0405   Epoch: 7   Global Step: 301600   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:36,364-Speed 2632.86 samples/sec   Loss 8.5631   LearningRate 0.0405   Epoch: 7   Global Step: 301610   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:37:40,211-Speed 2662.34 samples/sec   Loss 9.1707   LearningRate 0.0405   Epoch: 7   Global Step: 301620   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:37:44,101-Speed 2632.63 samples/sec   Loss 9.8343   LearningRate 0.0405   Epoch: 7   Global Step: 301630   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:37:48,000-Speed 2627.45 samples/sec   Loss 8.8314   LearningRate 0.0405   Epoch: 7   Global Step: 301640   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:37:51,890-Speed 2632.93 samples/sec   Loss 8.6405   LearningRate 0.0405   Epoch: 7   Global Step: 301650   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:37:55,820-Speed 2606.47 samples/sec   Loss 8.6751   LearningRate 0.0405   Epoch: 7   Global Step: 301660   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:37:59,716-Speed 2628.54 samples/sec   Loss 8.6192   LearningRate 0.0405   Epoch: 7   Global Step: 301670   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:38:03,608-Speed 2631.92 samples/sec   Loss 8.6075   LearningRate 0.0405   Epoch: 7   Global Step: 301680   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:38:07,510-Speed 2625.04 samples/sec   Loss 8.6569   LearningRate 0.0405   Epoch: 7   Global Step: 301690   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:38:11,406-Speed 2628.72 samples/sec   Loss 8.6255   LearningRate 0.0405   Epoch: 7   Global Step: 301700   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:38:15,313-Speed 2621.75 samples/sec   Loss 8.6594   LearningRate 0.0405   Epoch: 7   Global Step: 301710   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:38:19,273-Speed 2586.46 samples/sec   Loss 8.5642   LearningRate 0.0405   Epoch: 7   Global Step: 301720   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:38:23,184-Speed 2619.26 samples/sec   Loss 8.5914   LearningRate 0.0405   Epoch: 7   Global Step: 301730   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:27,087-Speed 2624.33 samples/sec   Loss 8.5669   LearningRate 0.0405   Epoch: 7   Global Step: 301740   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:30,984-Speed 2628.01 samples/sec   Loss 8.6176   LearningRate 0.0405   Epoch: 7   Global Step: 301750   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:34,890-Speed 2621.98 samples/sec   Loss 8.6876   LearningRate 0.0405   Epoch: 7   Global Step: 301760   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:39,011-Speed 2485.90 samples/sec   Loss 8.5783   LearningRate 0.0405   Epoch: 7   Global Step: 301770   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:42,936-Speed 2609.97 samples/sec   Loss 8.6240   LearningRate 0.0405   Epoch: 7   Global Step: 301780   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:46,838-Speed 2624.63 samples/sec   Loss 8.8659   LearningRate 0.0405   Epoch: 7   Global Step: 301790   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:50,729-Speed 2632.44 samples/sec   Loss 8.6648   LearningRate 0.0405   Epoch: 7   Global Step: 301800   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:54,618-Speed 2634.18 samples/sec   Loss 8.6098   LearningRate 0.0405   Epoch: 7   Global Step: 301810   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:38:58,511-Speed 2631.08 samples/sec   Loss 8.6333   LearningRate 0.0405   Epoch: 7   Global Step: 301820   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:39:02,409-Speed 2627.16 samples/sec   Loss 8.4904   LearningRate 0.0405   Epoch: 7   Global Step: 301830   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:06,298-Speed 2633.45 samples/sec   Loss 8.7026   LearningRate 0.0405   Epoch: 7   Global Step: 301840   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:10,191-Speed 2631.22 samples/sec   Loss 8.6363   LearningRate 0.0405   Epoch: 7   Global Step: 301850   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:14,084-Speed 2631.36 samples/sec   Loss 8.7039   LearningRate 0.0405   Epoch: 7   Global Step: 301860   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:17,974-Speed 2633.18 samples/sec   Loss 8.6542   LearningRate 0.0405   Epoch: 7   Global Step: 301870   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:21,872-Speed 2627.86 samples/sec   Loss 8.7114   LearningRate 0.0405   Epoch: 7   Global Step: 301880   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:25,762-Speed 2632.31 samples/sec   Loss 8.5186   LearningRate 0.0405   Epoch: 7   Global Step: 301890   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:29,675-Speed 2617.81 samples/sec   Loss 8.6536   LearningRate 0.0405   Epoch: 7   Global Step: 301900   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:33,570-Speed 2629.91 samples/sec   Loss 8.5891   LearningRate 0.0405   Epoch: 7   Global Step: 301910   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:37,467-Speed 2628.32 samples/sec   Loss 8.6733   LearningRate 0.0405   Epoch: 7   Global Step: 301920   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:39:41,361-Speed 2629.91 samples/sec   Loss 8.5154   LearningRate 0.0405   Epoch: 7   Global Step: 301930   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:39:45,255-Speed 2630.89 samples/sec   Loss 8.6200   LearningRate 0.0405   Epoch: 7   Global Step: 301940   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:39:49,146-Speed 2632.61 samples/sec   Loss 8.5817   LearningRate 0.0405   Epoch: 7   Global Step: 301950   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:39:53,042-Speed 2629.42 samples/sec   Loss 8.5521   LearningRate 0.0405   Epoch: 7   Global Step: 301960   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:39:56,933-Speed 2632.70 samples/sec   Loss 8.5069   LearningRate 0.0404   Epoch: 7   Global Step: 301970   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:40:00,881-Speed 2594.14 samples/sec   Loss 8.4494   LearningRate 0.0404   Epoch: 7   Global Step: 301980   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:40:04,956-Speed 2513.08 samples/sec   Loss 8.6200   LearningRate 0.0404   Epoch: 7   Global Step: 301990   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:40:09,029-Speed 2514.96 samples/sec   Loss 8.6581   LearningRate 0.0404   Epoch: 7   Global Step: 302000   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:40:13,014-Speed 2570.62 samples/sec   Loss 8.5082   LearningRate 0.0404   Epoch: 7   Global Step: 302010   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:40:16,908-Speed 2629.97 samples/sec   Loss 8.6256   LearningRate 0.0404   Epoch: 7   Global Step: 302020   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:40:20,803-Speed 2629.60 samples/sec   Loss 8.5461   LearningRate 0.0404   Epoch: 7   Global Step: 302030   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:24,700-Speed 2628.89 samples/sec   Loss 8.6890   LearningRate 0.0404   Epoch: 7   Global Step: 302040   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:28,593-Speed 2631.18 samples/sec   Loss 8.6296   LearningRate 0.0404   Epoch: 7   Global Step: 302050   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:32,544-Speed 2591.68 samples/sec   Loss 8.6041   LearningRate 0.0404   Epoch: 7   Global Step: 302060   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:36,438-Speed 2630.87 samples/sec   Loss 8.6134   LearningRate 0.0404   Epoch: 7   Global Step: 302070   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:40,333-Speed 2629.14 samples/sec   Loss 8.7129   LearningRate 0.0404   Epoch: 7   Global Step: 302080   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:44,234-Speed 2626.27 samples/sec   Loss 8.4771   LearningRate 0.0404   Epoch: 7   Global Step: 302090   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:48,128-Speed 2629.74 samples/sec   Loss 8.6349   LearningRate 0.0404   Epoch: 7   Global Step: 302100   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:52,026-Speed 2627.95 samples/sec   Loss 8.5490   LearningRate 0.0404   Epoch: 7   Global Step: 302110   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:55,927-Speed 2625.69 samples/sec   Loss 8.5906   LearningRate 0.0404   Epoch: 7   Global Step: 302120   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:40:59,825-Speed 2627.88 samples/sec   Loss 8.5896   LearningRate 0.0404   Epoch: 7   Global Step: 302130   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:03,720-Speed 2629.36 samples/sec   Loss 8.6486   LearningRate 0.0404   Epoch: 7   Global Step: 302140   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:07,614-Speed 2630.54 samples/sec   Loss 8.6640   LearningRate 0.0404   Epoch: 7   Global Step: 302150   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:11,505-Speed 2632.25 samples/sec   Loss 8.6988   LearningRate 0.0404   Epoch: 7   Global Step: 302160   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:15,396-Speed 2632.11 samples/sec   Loss 8.6473   LearningRate 0.0404   Epoch: 7   Global Step: 302170   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:19,296-Speed 2626.15 samples/sec   Loss 8.6794   LearningRate 0.0404   Epoch: 7   Global Step: 302180   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:23,197-Speed 2625.85 samples/sec   Loss 8.5578   LearningRate 0.0404   Epoch: 7   Global Step: 302190   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:27,091-Speed 2630.37 samples/sec   Loss 8.5811   LearningRate 0.0404   Epoch: 7   Global Step: 302200   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:30,985-Speed 2629.85 samples/sec   Loss 8.7337   LearningRate 0.0404   Epoch: 7   Global Step: 302210   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:34,879-Speed 2631.05 samples/sec   Loss 8.5822   LearningRate 0.0404   Epoch: 7   Global Step: 302220   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:38,757-Speed 2640.75 samples/sec   Loss 8.5446   LearningRate 0.0404   Epoch: 7   Global Step: 302230   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:42,650-Speed 2631.41 samples/sec   Loss 8.6835   LearningRate 0.0404   Epoch: 7   Global Step: 302240   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:46,544-Speed 2630.03 samples/sec   Loss 8.4898   LearningRate 0.0404   Epoch: 7   Global Step: 302250   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:50,439-Speed 2629.86 samples/sec   Loss 8.6335   LearningRate 0.0404   Epoch: 7   Global Step: 302260   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:41:54,322-Speed 2637.49 samples/sec   Loss 8.4650   LearningRate 0.0404   Epoch: 7   Global Step: 302270   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:41:58,239-Speed 2615.74 samples/sec   Loss 8.6236   LearningRate 0.0404   Epoch: 7   Global Step: 302280   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:02,132-Speed 2630.42 samples/sec   Loss 8.6800   LearningRate 0.0404   Epoch: 7   Global Step: 302290   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:06,040-Speed 2620.58 samples/sec   Loss 8.6254   LearningRate 0.0404   Epoch: 7   Global Step: 302300   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:09,937-Speed 2628.42 samples/sec   Loss 8.5945   LearningRate 0.0404   Epoch: 7   Global Step: 302310   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:13,839-Speed 2625.78 samples/sec   Loss 8.4579   LearningRate 0.0404   Epoch: 7   Global Step: 302320   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:17,735-Speed 2628.82 samples/sec   Loss 8.6709   LearningRate 0.0404   Epoch: 7   Global Step: 302330   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:21,633-Speed 2627.07 samples/sec   Loss 8.5214   LearningRate 0.0404   Epoch: 7   Global Step: 302340   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:25,539-Speed 2622.59 samples/sec   Loss 8.6490   LearningRate 0.0404   Epoch: 7   Global Step: 302350   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:29,444-Speed 2623.08 samples/sec   Loss 8.5634   LearningRate 0.0404   Epoch: 7   Global Step: 302360   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:42:33,340-Speed 2629.01 samples/sec   Loss 8.6290   LearningRate 0.0404   Epoch: 7   Global Step: 302370   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:42:37,260-Speed 2612.69 samples/sec   Loss 8.6433   LearningRate 0.0404   Epoch: 7   Global Step: 302380   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:42:41,157-Speed 2628.54 samples/sec   Loss 8.6529   LearningRate 0.0404   Epoch: 7   Global Step: 302390   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:42:45,051-Speed 2630.43 samples/sec   Loss 8.5648   LearningRate 0.0404   Epoch: 7   Global Step: 302400   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:42:48,960-Speed 2620.62 samples/sec   Loss 8.6408   LearningRate 0.0404   Epoch: 7   Global Step: 302410   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:42:52,896-Speed 2602.00 samples/sec   Loss 8.5904   LearningRate 0.0404   Epoch: 7   Global Step: 302420   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:42:56,796-Speed 2626.51 samples/sec   Loss 8.5337   LearningRate 0.0404   Epoch: 7   Global Step: 302430   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:00,700-Speed 2623.74 samples/sec   Loss 8.5443   LearningRate 0.0404   Epoch: 7   Global Step: 302440   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:04,598-Speed 2627.96 samples/sec   Loss 8.6204   LearningRate 0.0404   Epoch: 7   Global Step: 302450   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:08,488-Speed 2632.77 samples/sec   Loss 8.7381   LearningRate 0.0404   Epoch: 7   Global Step: 302460   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:12,380-Speed 2631.44 samples/sec   Loss 8.4755   LearningRate 0.0404   Epoch: 7   Global Step: 302470   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:43:16,289-Speed 2620.57 samples/sec   Loss 8.5345   LearningRate 0.0404   Epoch: 7   Global Step: 302480   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:20,181-Speed 2631.48 samples/sec   Loss 8.6634   LearningRate 0.0404   Epoch: 7   Global Step: 302490   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:24,079-Speed 2628.23 samples/sec   Loss 8.5959   LearningRate 0.0404   Epoch: 7   Global Step: 302500   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:28,015-Speed 2602.56 samples/sec   Loss 8.6728   LearningRate 0.0404   Epoch: 7   Global Step: 302510   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:31,911-Speed 2629.07 samples/sec   Loss 8.6042   LearningRate 0.0404   Epoch: 7   Global Step: 302520   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:35,807-Speed 2629.36 samples/sec   Loss 8.4388   LearningRate 0.0404   Epoch: 7   Global Step: 302530   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:39,703-Speed 2628.82 samples/sec   Loss 8.5678   LearningRate 0.0404   Epoch: 7   Global Step: 302540   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:43,631-Speed 2607.09 samples/sec   Loss 8.6417   LearningRate 0.0404   Epoch: 7   Global Step: 302550   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:47,530-Speed 2627.20 samples/sec   Loss 8.6519   LearningRate 0.0404   Epoch: 7   Global Step: 302560   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:51,438-Speed 2621.45 samples/sec   Loss 8.6162   LearningRate 0.0404   Epoch: 7   Global Step: 302570   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:43:55,339-Speed 2625.82 samples/sec   Loss 8.5397   LearningRate 0.0404   Epoch: 7   Global Step: 302580   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:43:59,242-Speed 2624.07 samples/sec   Loss 8.5832   LearningRate 0.0404   Epoch: 7   Global Step: 302590   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:44:03,124-Speed 2638.26 samples/sec   Loss 8.6466   LearningRate 0.0404   Epoch: 7   Global Step: 302600   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:07,042-Speed 2614.33 samples/sec   Loss 8.6765   LearningRate 0.0404   Epoch: 7   Global Step: 302610   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:10,945-Speed 2624.47 samples/sec   Loss 8.6216   LearningRate 0.0403   Epoch: 7   Global Step: 302620   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:14,845-Speed 2626.26 samples/sec   Loss 8.5772   LearningRate 0.0403   Epoch: 7   Global Step: 302630   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:18,746-Speed 2625.33 samples/sec   Loss 8.4765   LearningRate 0.0403   Epoch: 7   Global Step: 302640   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:22,640-Speed 2630.12 samples/sec   Loss 8.5564   LearningRate 0.0403   Epoch: 7   Global Step: 302650   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:26,534-Speed 2630.13 samples/sec   Loss 8.5635   LearningRate 0.0403   Epoch: 7   Global Step: 302660   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:30,432-Speed 2627.60 samples/sec   Loss 8.5865   LearningRate 0.0403   Epoch: 7   Global Step: 302670   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:34,384-Speed 2592.47 samples/sec   Loss 8.4992   LearningRate 0.0403   Epoch: 7   Global Step: 302680   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:38,271-Speed 2635.20 samples/sec   Loss 8.5881   LearningRate 0.0403   Epoch: 7   Global Step: 302690   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:42,167-Speed 2628.81 samples/sec   Loss 8.5098   LearningRate 0.0403   Epoch: 7   Global Step: 302700   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:44:46,042-Speed 2643.45 samples/sec   Loss 8.6913   LearningRate 0.0403   Epoch: 7   Global Step: 302710   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:49,948-Speed 2622.50 samples/sec   Loss 8.6545   LearningRate 0.0403   Epoch: 7   Global Step: 302720   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:53,850-Speed 2625.05 samples/sec   Loss 8.6081   LearningRate 0.0403   Epoch: 7   Global Step: 302730   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:44:57,763-Speed 2617.03 samples/sec   Loss 8.6560   LearningRate 0.0403   Epoch: 7   Global Step: 302740   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:01,675-Speed 2618.05 samples/sec   Loss 8.6816   LearningRate 0.0403   Epoch: 7   Global Step: 302750   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:05,576-Speed 2625.49 samples/sec   Loss 8.6471   LearningRate 0.0403   Epoch: 7   Global Step: 302760   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:09,470-Speed 2631.08 samples/sec   Loss 8.5144   LearningRate 0.0403   Epoch: 7   Global Step: 302770   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:13,369-Speed 2626.71 samples/sec   Loss 8.6442   LearningRate 0.0403   Epoch: 7   Global Step: 302780   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:17,273-Speed 2623.38 samples/sec   Loss 8.5806   LearningRate 0.0403   Epoch: 7   Global Step: 302790   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:21,170-Speed 2628.69 samples/sec   Loss 8.6734   LearningRate 0.0403   Epoch: 7   Global Step: 302800   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:25,042-Speed 2645.07 samples/sec   Loss 8.5627   LearningRate 0.0403   Epoch: 7   Global Step: 302810   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:45:28,912-Speed 2646.89 samples/sec   Loss 8.6022   LearningRate 0.0403   Epoch: 7   Global Step: 302820   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:45:32,762-Speed 2660.11 samples/sec   Loss 9.5869   LearningRate 0.0403   Epoch: 7   Global Step: 302830   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:45:36,634-Speed 2645.46 samples/sec   Loss 10.0393   LearningRate 0.0403   Epoch: 7   Global Step: 302840   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:45:40,530-Speed 2629.13 samples/sec   Loss 9.1763   LearningRate 0.0403   Epoch: 7   Global Step: 302850   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:45:44,428-Speed 2627.38 samples/sec   Loss 8.7632   LearningRate 0.0403   Epoch: 7   Global Step: 302860   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:45:48,321-Speed 2631.01 samples/sec   Loss 8.5994   LearningRate 0.0403   Epoch: 7   Global Step: 302870   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:45:52,216-Speed 2630.47 samples/sec   Loss 8.6019   LearningRate 0.0403   Epoch: 7   Global Step: 302880   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:45:56,106-Speed 2633.02 samples/sec   Loss 8.5697   LearningRate 0.0403   Epoch: 7   Global Step: 302890   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:46:00,037-Speed 2605.77 samples/sec   Loss 8.6088   LearningRate 0.0403   Epoch: 7   Global Step: 302900   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:46:03,930-Speed 2631.12 samples/sec   Loss 8.6115   LearningRate 0.0403   Epoch: 7   Global Step: 302910   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:46:07,818-Speed 2634.30 samples/sec   Loss 8.4923   LearningRate 0.0403   Epoch: 7   Global Step: 302920   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:46:11,706-Speed 2634.18 samples/sec   Loss 8.5929   LearningRate 0.0403   Epoch: 7   Global Step: 302930   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:46:15,602-Speed 2628.74 samples/sec   Loss 8.6266   LearningRate 0.0403   Epoch: 7   Global Step: 302940   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:19,539-Speed 2602.55 samples/sec   Loss 8.3892   LearningRate 0.0403   Epoch: 7   Global Step: 302950   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:23,436-Speed 2628.33 samples/sec   Loss 8.7562   LearningRate 0.0403   Epoch: 7   Global Step: 302960   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:27,340-Speed 2623.79 samples/sec   Loss 8.6779   LearningRate 0.0403   Epoch: 7   Global Step: 302970   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:31,240-Speed 2626.35 samples/sec   Loss 8.5622   LearningRate 0.0403   Epoch: 7   Global Step: 302980   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:35,133-Speed 2631.23 samples/sec   Loss 8.6117   LearningRate 0.0403   Epoch: 7   Global Step: 302990   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:39,024-Speed 2632.37 samples/sec   Loss 8.5719   LearningRate 0.0403   Epoch: 7   Global Step: 303000   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:42,918-Speed 2630.39 samples/sec   Loss 8.3568   LearningRate 0.0403   Epoch: 7   Global Step: 303010   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:46,812-Speed 2629.71 samples/sec   Loss 8.7089   LearningRate 0.0403   Epoch: 7   Global Step: 303020   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:50,706-Speed 2631.01 samples/sec   Loss 8.5514   LearningRate 0.0403   Epoch: 7   Global Step: 303030   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:46:54,604-Speed 2627.83 samples/sec   Loss 8.7036   LearningRate 0.0403   Epoch: 7   Global Step: 303040   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:46:58,503-Speed 2626.90 samples/sec   Loss 8.7243   LearningRate 0.0403   Epoch: 7   Global Step: 303050   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:47:02,386-Speed 2637.32 samples/sec   Loss 8.7464   LearningRate 0.0403   Epoch: 7   Global Step: 303060   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:06,275-Speed 2633.76 samples/sec   Loss 8.8313   LearningRate 0.0403   Epoch: 7   Global Step: 303070   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:10,168-Speed 2630.63 samples/sec   Loss 8.6678   LearningRate 0.0403   Epoch: 7   Global Step: 303080   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:14,063-Speed 2629.92 samples/sec   Loss 8.6554   LearningRate 0.0403   Epoch: 7   Global Step: 303090   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:17,957-Speed 2630.49 samples/sec   Loss 8.5399   LearningRate 0.0403   Epoch: 7   Global Step: 303100   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:21,856-Speed 2627.16 samples/sec   Loss 8.6151   LearningRate 0.0403   Epoch: 7   Global Step: 303110   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:25,754-Speed 2628.04 samples/sec   Loss 8.6132   LearningRate 0.0403   Epoch: 7   Global Step: 303120   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:29,647-Speed 2631.17 samples/sec   Loss 8.5972   LearningRate 0.0403   Epoch: 7   Global Step: 303130   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:33,538-Speed 2632.23 samples/sec   Loss 8.5206   LearningRate 0.0403   Epoch: 7   Global Step: 303140   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:37,435-Speed 2627.91 samples/sec   Loss 8.6129   LearningRate 0.0403   Epoch: 7   Global Step: 303150   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:47:41,335-Speed 2625.98 samples/sec   Loss 8.6220   LearningRate 0.0403   Epoch: 7   Global Step: 303160   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:47:45,229-Speed 2630.35 samples/sec   Loss 8.5449   LearningRate 0.0403   Epoch: 7   Global Step: 303170   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:47:49,124-Speed 2629.91 samples/sec   Loss 8.5931   LearningRate 0.0403   Epoch: 7   Global Step: 303180   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:47:53,019-Speed 2629.92 samples/sec   Loss 8.5726   LearningRate 0.0403   Epoch: 7   Global Step: 303190   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:47:56,914-Speed 2629.25 samples/sec   Loss 8.6641   LearningRate 0.0403   Epoch: 7   Global Step: 303200   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:48:00,808-Speed 2630.67 samples/sec   Loss 8.5205   LearningRate 0.0403   Epoch: 7   Global Step: 303210   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:48:04,700-Speed 2631.64 samples/sec   Loss 8.5833   LearningRate 0.0403   Epoch: 7   Global Step: 303220   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:48:08,594-Speed 2630.12 samples/sec   Loss 8.6219   LearningRate 0.0403   Epoch: 7   Global Step: 303230   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:48:12,497-Speed 2623.87 samples/sec   Loss 9.2859   LearningRate 0.0403   Epoch: 7   Global Step: 303240   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:48:16,392-Speed 2629.92 samples/sec   Loss 9.0265   LearningRate 0.0403   Epoch: 7   Global Step: 303250   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:48:20,287-Speed 2629.78 samples/sec   Loss 8.6759   LearningRate 0.0403   Epoch: 7   Global Step: 303260   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:24,180-Speed 2630.99 samples/sec   Loss 8.6433   LearningRate 0.0403   Epoch: 7   Global Step: 303270   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:28,073-Speed 2630.97 samples/sec   Loss 8.6435   LearningRate 0.0402   Epoch: 7   Global Step: 303280   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:31,971-Speed 2627.52 samples/sec   Loss 8.6000   LearningRate 0.0402   Epoch: 7   Global Step: 303290   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:35,870-Speed 2627.06 samples/sec   Loss 8.6982   LearningRate 0.0402   Epoch: 7   Global Step: 303300   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:39,768-Speed 2628.28 samples/sec   Loss 8.5412   LearningRate 0.0402   Epoch: 7   Global Step: 303310   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:43,664-Speed 2628.95 samples/sec   Loss 8.4961   LearningRate 0.0402   Epoch: 7   Global Step: 303320   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:47,560-Speed 2628.56 samples/sec   Loss 8.7140   LearningRate 0.0402   Epoch: 7   Global Step: 303330   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:51,456-Speed 2628.92 samples/sec   Loss 8.6829   LearningRate 0.0402   Epoch: 7   Global Step: 303340   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:55,347-Speed 2632.54 samples/sec   Loss 8.6759   LearningRate 0.0402   Epoch: 7   Global Step: 303350   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:48:59,245-Speed 2627.50 samples/sec   Loss 8.5308   LearningRate 0.0402   Epoch: 7   Global Step: 303360   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:03,141-Speed 2629.55 samples/sec   Loss 8.6502   LearningRate 0.0402   Epoch: 7   Global Step: 303370   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:07,213-Speed 2514.84 samples/sec   Loss 8.5963   LearningRate 0.0402   Epoch: 7   Global Step: 303380   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:11,279-Speed 2519.60 samples/sec   Loss 8.4633   LearningRate 0.0402   Epoch: 7   Global Step: 303390   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:15,277-Speed 2561.99 samples/sec   Loss 8.5225   LearningRate 0.0402   Epoch: 7   Global Step: 303400   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:19,183-Speed 2621.92 samples/sec   Loss 8.5967   LearningRate 0.0402   Epoch: 7   Global Step: 303410   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:23,076-Speed 2630.92 samples/sec   Loss 8.5180   LearningRate 0.0402   Epoch: 7   Global Step: 303420   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:26,969-Speed 2631.02 samples/sec   Loss 8.5892   LearningRate 0.0402   Epoch: 7   Global Step: 303430   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:30,874-Speed 2631.84 samples/sec   Loss 8.4495   LearningRate 0.0402   Epoch: 7   Global Step: 303440   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:34,779-Speed 2622.96 samples/sec   Loss 8.5674   LearningRate 0.0402   Epoch: 7   Global Step: 303450   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:49:38,674-Speed 2630.45 samples/sec   Loss 8.6830   LearningRate 0.0402   Epoch: 7   Global Step: 303460   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:49:42,575-Speed 2625.71 samples/sec   Loss 8.5565   LearningRate 0.0402   Epoch: 7   Global Step: 303470   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:49:46,472-Speed 2627.60 samples/sec   Loss 8.6901   LearningRate 0.0402   Epoch: 7   Global Step: 303480   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:49:50,368-Speed 2629.08 samples/sec   Loss 8.6641   LearningRate 0.0402   Epoch: 7   Global Step: 303490   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:49:54,274-Speed 2622.20 samples/sec   Loss 8.6410   LearningRate 0.0402   Epoch: 7   Global Step: 303500   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:49:58,169-Speed 2629.61 samples/sec   Loss 8.5229   LearningRate 0.0402   Epoch: 7   Global Step: 303510   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:02,071-Speed 2625.39 samples/sec   Loss 8.7474   LearningRate 0.0402   Epoch: 7   Global Step: 303520   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:05,997-Speed 2609.07 samples/sec   Loss 8.7021   LearningRate 0.0402   Epoch: 7   Global Step: 303530   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:09,891-Speed 2630.63 samples/sec   Loss 8.5468   LearningRate 0.0402   Epoch: 7   Global Step: 303540   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:13,805-Speed 2616.68 samples/sec   Loss 8.6317   LearningRate 0.0402   Epoch: 7   Global Step: 303550   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:17,697-Speed 2631.63 samples/sec   Loss 8.7164   LearningRate 0.0402   Epoch: 7   Global Step: 303560   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:50:21,590-Speed 2630.95 samples/sec   Loss 8.5792   LearningRate 0.0402   Epoch: 7   Global Step: 303570   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:25,495-Speed 2623.00 samples/sec   Loss 8.5302   LearningRate 0.0402   Epoch: 7   Global Step: 303580   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:29,387-Speed 2631.57 samples/sec   Loss 8.6014   LearningRate 0.0402   Epoch: 7   Global Step: 303590   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:33,291-Speed 2623.57 samples/sec   Loss 8.5258   LearningRate 0.0402   Epoch: 7   Global Step: 303600   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:37,200-Speed 2620.42 samples/sec   Loss 8.5586   LearningRate 0.0402   Epoch: 7   Global Step: 303610   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:41,204-Speed 2558.28 samples/sec   Loss 8.5798   LearningRate 0.0402   Epoch: 7   Global Step: 303620   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:45,100-Speed 2629.35 samples/sec   Loss 8.6245   LearningRate 0.0402   Epoch: 7   Global Step: 303630   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:48,997-Speed 2628.21 samples/sec   Loss 8.6226   LearningRate 0.0402   Epoch: 7   Global Step: 303640   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:50:52,873-Speed 2642.33 samples/sec   Loss 8.4849   LearningRate 0.0402   Epoch: 7   Global Step: 303650   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:50:56,773-Speed 2626.49 samples/sec   Loss 8.4408   LearningRate 0.0402   Epoch: 7   Global Step: 303660   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:00,666-Speed 2631.23 samples/sec   Loss 8.5225   LearningRate 0.0402   Epoch: 7   Global Step: 303670   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:04,564-Speed 2627.01 samples/sec   Loss 8.6282   LearningRate 0.0402   Epoch: 7   Global Step: 303680   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:08,459-Speed 2629.87 samples/sec   Loss 8.5871   LearningRate 0.0402   Epoch: 7   Global Step: 303690   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:12,356-Speed 2628.52 samples/sec   Loss 8.4940   LearningRate 0.0402   Epoch: 7   Global Step: 303700   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:16,249-Speed 2631.14 samples/sec   Loss 8.7352   LearningRate 0.0402   Epoch: 7   Global Step: 303710   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:20,141-Speed 2631.96 samples/sec   Loss 8.6403   LearningRate 0.0402   Epoch: 7   Global Step: 303720   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:24,040-Speed 2627.03 samples/sec   Loss 8.6264   LearningRate 0.0402   Epoch: 7   Global Step: 303730   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:27,933-Speed 2630.81 samples/sec   Loss 8.4726   LearningRate 0.0402   Epoch: 7   Global Step: 303740   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:51:31,828-Speed 2629.42 samples/sec   Loss 8.4729   LearningRate 0.0402   Epoch: 7   Global Step: 303750   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:51:35,722-Speed 2630.56 samples/sec   Loss 8.5622   LearningRate 0.0402   Epoch: 7   Global Step: 303760   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:51:39,626-Speed 2623.21 samples/sec   Loss 8.6490   LearningRate 0.0402   Epoch: 7   Global Step: 303770   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:51:43,519-Speed 2631.31 samples/sec   Loss 8.8295   LearningRate 0.0402   Epoch: 7   Global Step: 303780   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:51:47,413-Speed 2630.13 samples/sec   Loss 8.4584   LearningRate 0.0402   Epoch: 7   Global Step: 303790   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:51:51,307-Speed 2631.00 samples/sec   Loss 8.6541   LearningRate 0.0402   Epoch: 7   Global Step: 303800   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:51:55,207-Speed 2626.36 samples/sec   Loss 8.5962   LearningRate 0.0402   Epoch: 7   Global Step: 303810   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:51:59,103-Speed 2628.91 samples/sec   Loss 8.5817   LearningRate 0.0402   Epoch: 7   Global Step: 303820   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:02,998-Speed 2629.68 samples/sec   Loss 8.5626   LearningRate 0.0402   Epoch: 7   Global Step: 303830   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:06,894-Speed 2628.78 samples/sec   Loss 8.6514   LearningRate 0.0402   Epoch: 7   Global Step: 303840   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:10,791-Speed 2627.89 samples/sec   Loss 8.4795   LearningRate 0.0402   Epoch: 7   Global Step: 303850   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:52:14,687-Speed 2629.18 samples/sec   Loss 8.6875   LearningRate 0.0402   Epoch: 7   Global Step: 303860   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:52:18,586-Speed 2627.35 samples/sec   Loss 8.6502   LearningRate 0.0402   Epoch: 7   Global Step: 303870   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:52:22,479-Speed 2631.30 samples/sec   Loss 8.6692   LearningRate 0.0402   Epoch: 7   Global Step: 303880   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:52:26,363-Speed 2637.18 samples/sec   Loss 8.5692   LearningRate 0.0402   Epoch: 7   Global Step: 303890   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:30,257-Speed 2630.44 samples/sec   Loss 8.4990   LearningRate 0.0402   Epoch: 7   Global Step: 303900   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:34,151-Speed 2630.17 samples/sec   Loss 8.6063   LearningRate 0.0402   Epoch: 7   Global Step: 303910   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:38,053-Speed 2624.52 samples/sec   Loss 8.5409   LearningRate 0.0402   Epoch: 7   Global Step: 303920   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:41,952-Speed 2626.81 samples/sec   Loss 8.5797   LearningRate 0.0401   Epoch: 7   Global Step: 303930   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:45,851-Speed 2627.64 samples/sec   Loss 8.4928   LearningRate 0.0401   Epoch: 7   Global Step: 303940   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:52:49,727-Speed 2642.82 samples/sec   Loss 8.5471   LearningRate 0.0401   Epoch: 7   Global Step: 303950   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:52:53,626-Speed 2626.14 samples/sec   Loss 8.4690   LearningRate 0.0401   Epoch: 7   Global Step: 303960   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:52:57,520-Speed 2631.34 samples/sec   Loss 8.5969   LearningRate 0.0401   Epoch: 7   Global Step: 303970   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:53:01,419-Speed 2626.95 samples/sec   Loss 8.5844   LearningRate 0.0401   Epoch: 7   Global Step: 303980   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:53:05,317-Speed 2627.12 samples/sec   Loss 8.4908   LearningRate 0.0401   Epoch: 7   Global Step: 303990   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:53:09,212-Speed 2629.49 samples/sec   Loss 8.5825   LearningRate 0.0401   Epoch: 7   Global Step: 304000   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:53:13,025-Speed 2686.50 samples/sec   Loss 9.2987   LearningRate 0.0401   Epoch: 7   Global Step: 304010   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:16,924-Speed 2627.26 samples/sec   Loss 8.4796   LearningRate 0.0401   Epoch: 7   Global Step: 304020   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:20,825-Speed 2625.98 samples/sec   Loss 8.6199   LearningRate 0.0401   Epoch: 7   Global Step: 304030   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:24,720-Speed 2629.33 samples/sec   Loss 8.7720   LearningRate 0.0401   Epoch: 7   Global Step: 304040   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:28,617-Speed 2628.27 samples/sec   Loss 8.7637   LearningRate 0.0401   Epoch: 7   Global Step: 304050   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:32,516-Speed 2626.91 samples/sec   Loss 8.5875   LearningRate 0.0401   Epoch: 7   Global Step: 304060   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:36,412-Speed 2628.55 samples/sec   Loss 8.6728   LearningRate 0.0401   Epoch: 7   Global Step: 304070   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:40,312-Speed 2625.90 samples/sec   Loss 8.5514   LearningRate 0.0401   Epoch: 7   Global Step: 304080   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:44,209-Speed 2628.78 samples/sec   Loss 8.6294   LearningRate 0.0401   Epoch: 7   Global Step: 304090   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:48,102-Speed 2631.14 samples/sec   Loss 8.5215   LearningRate 0.0401   Epoch: 7   Global Step: 304100   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 05:53:52,000-Speed 2627.62 samples/sec   Loss 8.4511   LearningRate 0.0401   Epoch: 7   Global Step: 304110   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:53:55,892-Speed 2631.74 samples/sec   Loss 8.6081   LearningRate 0.0401   Epoch: 7   Global Step: 304120   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:53:59,787-Speed 2629.81 samples/sec   Loss 8.6296   LearningRate 0.0401   Epoch: 7   Global Step: 304130   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:03,682-Speed 2629.55 samples/sec   Loss 8.4919   LearningRate 0.0401   Epoch: 7   Global Step: 304140   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:07,582-Speed 2626.27 samples/sec   Loss 8.5160   LearningRate 0.0401   Epoch: 7   Global Step: 304150   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:11,487-Speed 2622.44 samples/sec   Loss 8.5753   LearningRate 0.0401   Epoch: 7   Global Step: 304160   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:15,385-Speed 2628.21 samples/sec   Loss 8.5074   LearningRate 0.0401   Epoch: 7   Global Step: 304170   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:19,282-Speed 2627.89 samples/sec   Loss 8.6776   LearningRate 0.0401   Epoch: 7   Global Step: 304180   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:23,284-Speed 2559.68 samples/sec   Loss 8.4884   LearningRate 0.0401   Epoch: 7   Global Step: 304190   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:27,174-Speed 2633.07 samples/sec   Loss 8.4716   LearningRate 0.0401   Epoch: 7   Global Step: 304200   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 05:54:31,063-Speed 2633.71 samples/sec   Loss 8.6322   LearningRate 0.0401   Epoch: 7   Global Step: 304210   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:54:34,957-Speed 2630.88 samples/sec   Loss 8.5658   LearningRate 0.0401   Epoch: 7   Global Step: 304220   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:54:38,848-Speed 2631.90 samples/sec   Loss 8.4835   LearningRate 0.0401   Epoch: 7   Global Step: 304230   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:54:42,745-Speed 2628.63 samples/sec   Loss 8.5354   LearningRate 0.0401   Epoch: 7   Global Step: 304240   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:54:46,646-Speed 2625.42 samples/sec   Loss 8.5428   LearningRate 0.0401   Epoch: 7   Global Step: 304250   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:54:52,108-Speed 1875.21 samples/sec   Loss 8.6405   LearningRate 0.0401   Epoch: 7   Global Step: 304260   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:54:55,993-Speed 2636.59 samples/sec   Loss 8.5469   LearningRate 0.0401   Epoch: 7   Global Step: 304270   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:54:59,890-Speed 2627.97 samples/sec   Loss 8.5972   LearningRate 0.0401   Epoch: 7   Global Step: 304280   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:55:03,789-Speed 2627.21 samples/sec   Loss 8.7873   LearningRate 0.0401   Epoch: 7   Global Step: 304290   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:55:07,735-Speed 2595.18 samples/sec   Loss 8.6374   LearningRate 0.0401   Epoch: 7   Global Step: 304300   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 05:55:11,634-Speed 2627.07 samples/sec   Loss 8.4911   LearningRate 0.0401   Epoch: 7   Global Step: 304310   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:15,538-Speed 2623.83 samples/sec   Loss 8.6899   LearningRate 0.0401   Epoch: 7   Global Step: 304320   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:19,442-Speed 2623.92 samples/sec   Loss 8.4429   LearningRate 0.0401   Epoch: 7   Global Step: 304330   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:23,337-Speed 2629.00 samples/sec   Loss 8.6087   LearningRate 0.0401   Epoch: 7   Global Step: 304340   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:27,226-Speed 2634.02 samples/sec   Loss 8.5974   LearningRate 0.0401   Epoch: 7   Global Step: 304350   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:31,118-Speed 2631.89 samples/sec   Loss 8.4825   LearningRate 0.0401   Epoch: 7   Global Step: 304360   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:35,057-Speed 2600.19 samples/sec   Loss 8.6743   LearningRate 0.0401   Epoch: 7   Global Step: 304370   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:38,950-Speed 2631.51 samples/sec   Loss 8.6165   LearningRate 0.0401   Epoch: 7   Global Step: 304380   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:42,855-Speed 2622.96 samples/sec   Loss 8.4773   LearningRate 0.0401   Epoch: 7   Global Step: 304390   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:46,745-Speed 2633.00 samples/sec   Loss 8.5520   LearningRate 0.0401   Epoch: 7   Global Step: 304400   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 05:55:50,633-Speed 2634.15 samples/sec   Loss 8.5794   LearningRate 0.0401   Epoch: 7   Global Step: 304410   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:55:54,521-Speed 2633.99 samples/sec   Loss 8.6732   LearningRate 0.0401   Epoch: 7   Global Step: 304420   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:55:58,416-Speed 2630.46 samples/sec   Loss 8.5286   LearningRate 0.0401   Epoch: 7   Global Step: 304430   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:02,310-Speed 2630.08 samples/sec   Loss 8.6001   LearningRate 0.0401   Epoch: 7   Global Step: 304440   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:06,203-Speed 2631.58 samples/sec   Loss 8.4908   LearningRate 0.0401   Epoch: 7   Global Step: 304450   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:10,107-Speed 2623.26 samples/sec   Loss 8.4991   LearningRate 0.0401   Epoch: 7   Global Step: 304460   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:14,003-Speed 2629.38 samples/sec   Loss 8.5999   LearningRate 0.0401   Epoch: 7   Global Step: 304470   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:17,991-Speed 2567.76 samples/sec   Loss 8.5342   LearningRate 0.0401   Epoch: 7   Global Step: 304480   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:21,870-Speed 2640.82 samples/sec   Loss 8.6815   LearningRate 0.0401   Epoch: 7   Global Step: 304490   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:25,772-Speed 2624.66 samples/sec   Loss 8.7450   LearningRate 0.0401   Epoch: 7   Global Step: 304500   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 05:56:29,661-Speed 2633.85 samples/sec   Loss 8.6513   LearningRate 0.0401   Epoch: 7   Global Step: 304510   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:56:33,551-Speed 2632.36 samples/sec   Loss 8.5443   LearningRate 0.0401   Epoch: 7   Global Step: 304520   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:56:37,447-Speed 2629.74 samples/sec   Loss 8.6006   LearningRate 0.0401   Epoch: 7   Global Step: 304530   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:56:41,339-Speed 2631.63 samples/sec   Loss 8.4508   LearningRate 0.0401   Epoch: 7   Global Step: 304540   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:56:45,265-Speed 2609.19 samples/sec   Loss 8.5842   LearningRate 0.0401   Epoch: 7   Global Step: 304550   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:56:49,159-Speed 2630.36 samples/sec   Loss 8.4830   LearningRate 0.0401   Epoch: 7   Global Step: 304560   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:56:53,051-Speed 2631.57 samples/sec   Loss 8.5183   LearningRate 0.0401   Epoch: 7   Global Step: 304570   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:56:56,939-Speed 2634.27 samples/sec   Loss 8.5437   LearningRate 0.0400   Epoch: 7   Global Step: 304580   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:00,826-Speed 2634.93 samples/sec   Loss 8.4319   LearningRate 0.0400   Epoch: 7   Global Step: 304590   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:04,716-Speed 2632.78 samples/sec   Loss 8.7338   LearningRate 0.0400   Epoch: 7   Global Step: 304600   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:08,623-Speed 2621.74 samples/sec   Loss 8.5108   LearningRate 0.0400   Epoch: 7   Global Step: 304610   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:57:12,517-Speed 2630.47 samples/sec   Loss 8.5684   LearningRate 0.0400   Epoch: 7   Global Step: 304620   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:57:16,426-Speed 2620.32 samples/sec   Loss 8.5007   LearningRate 0.0400   Epoch: 7   Global Step: 304630   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:57:20,326-Speed 2626.31 samples/sec   Loss 8.5713   LearningRate 0.0400   Epoch: 7   Global Step: 304640   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:24,218-Speed 2631.89 samples/sec   Loss 8.6032   LearningRate 0.0400   Epoch: 7   Global Step: 304650   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:28,109-Speed 2632.03 samples/sec   Loss 8.5719   LearningRate 0.0400   Epoch: 7   Global Step: 304660   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:32,007-Speed 2627.53 samples/sec   Loss 8.5706   LearningRate 0.0400   Epoch: 7   Global Step: 304670   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:35,904-Speed 2627.95 samples/sec   Loss 8.5493   LearningRate 0.0400   Epoch: 7   Global Step: 304680   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:39,794-Speed 2633.30 samples/sec   Loss 8.6357   LearningRate 0.0400   Epoch: 7   Global Step: 304690   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:43,685-Speed 2632.42 samples/sec   Loss 8.5500   LearningRate 0.0400   Epoch: 7   Global Step: 304700   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:47,579-Speed 2630.20 samples/sec   Loss 8.5527   LearningRate 0.0400   Epoch: 7   Global Step: 304710   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:51,473-Speed 2630.92 samples/sec   Loss 8.5974   LearningRate 0.0400   Epoch: 7   Global Step: 304720   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:55,370-Speed 2628.07 samples/sec   Loss 8.6293   LearningRate 0.0400   Epoch: 7   Global Step: 304730   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:57:59,267-Speed 2628.83 samples/sec   Loss 8.5306   LearningRate 0.0400   Epoch: 7   Global Step: 304740   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:03,166-Speed 2626.74 samples/sec   Loss 8.6730   LearningRate 0.0400   Epoch: 7   Global Step: 304750   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:07,064-Speed 2627.66 samples/sec   Loss 8.5227   LearningRate 0.0400   Epoch: 7   Global Step: 304760   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:10,963-Speed 2626.42 samples/sec   Loss 8.5748   LearningRate 0.0400   Epoch: 7   Global Step: 304770   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:14,856-Speed 2631.71 samples/sec   Loss 8.5343   LearningRate 0.0400   Epoch: 7   Global Step: 304780   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:18,745-Speed 2633.39 samples/sec   Loss 8.5296   LearningRate 0.0400   Epoch: 7   Global Step: 304790   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:22,641-Speed 2628.92 samples/sec   Loss 8.5801   LearningRate 0.0400   Epoch: 7   Global Step: 304800   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:26,536-Speed 2630.08 samples/sec   Loss 8.6237   LearningRate 0.0400   Epoch: 7   Global Step: 304810   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:30,432-Speed 2628.98 samples/sec   Loss 8.5912   LearningRate 0.0400   Epoch: 7   Global Step: 304820   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:34,334-Speed 2624.70 samples/sec   Loss 8.5501   LearningRate 0.0400   Epoch: 7   Global Step: 304830   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:38,234-Speed 2626.15 samples/sec   Loss 8.5468   LearningRate 0.0400   Epoch: 7   Global Step: 304840   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:58:42,136-Speed 2624.68 samples/sec   Loss 8.5936   LearningRate 0.0400   Epoch: 7   Global Step: 304850   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:58:46,020-Speed 2637.17 samples/sec   Loss 8.5516   LearningRate 0.0400   Epoch: 7   Global Step: 304860   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:49,937-Speed 2614.72 samples/sec   Loss 8.5850   LearningRate 0.0400   Epoch: 7   Global Step: 304870   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:53,842-Speed 2623.49 samples/sec   Loss 8.4428   LearningRate 0.0400   Epoch: 7   Global Step: 304880   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:58:57,744-Speed 2624.67 samples/sec   Loss 8.6072   LearningRate 0.0400   Epoch: 7   Global Step: 304890   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:01,688-Speed 2597.41 samples/sec   Loss 8.6228   LearningRate 0.0400   Epoch: 7   Global Step: 304900   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:05,685-Speed 2562.46 samples/sec   Loss 8.6526   LearningRate 0.0400   Epoch: 7   Global Step: 304910   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:09,582-Speed 2628.41 samples/sec   Loss 8.5440   LearningRate 0.0400   Epoch: 7   Global Step: 304920   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:13,480-Speed 2627.30 samples/sec   Loss 8.3943   LearningRate 0.0400   Epoch: 7   Global Step: 304930   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:17,381-Speed 2625.32 samples/sec   Loss 8.6671   LearningRate 0.0400   Epoch: 7   Global Step: 304940   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:21,281-Speed 2626.18 samples/sec   Loss 8.6681   LearningRate 0.0400   Epoch: 7   Global Step: 304950   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:25,177-Speed 2629.07 samples/sec   Loss 8.5241   LearningRate 0.0400   Epoch: 7   Global Step: 304960   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:59:29,089-Speed 2618.29 samples/sec   Loss 8.4978   LearningRate 0.0400   Epoch: 7   Global Step: 304970   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:59:32,996-Speed 2621.76 samples/sec   Loss 8.6836   LearningRate 0.0400   Epoch: 7   Global Step: 304980   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:59:36,889-Speed 2630.77 samples/sec   Loss 8.5408   LearningRate 0.0400   Epoch: 7   Global Step: 304990   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:59:40,787-Speed 2627.63 samples/sec   Loss 8.4563   LearningRate 0.0400   Epoch: 7   Global Step: 305000   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 05:59:44,676-Speed 2633.99 samples/sec   Loss 8.5791   LearningRate 0.0400   Epoch: 7   Global Step: 305010   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:48,585-Speed 2620.18 samples/sec   Loss 8.6122   LearningRate 0.0400   Epoch: 7   Global Step: 305020   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 05:59:52,471-Speed 2636.31 samples/sec   Loss 8.4478   LearningRate 0.0400   Epoch: 7   Global Step: 305030   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 05:59:56,368-Speed 2628.10 samples/sec   Loss 8.6102   LearningRate 0.0400   Epoch: 7   Global Step: 305040   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:00,269-Speed 2625.82 samples/sec   Loss 8.6850   LearningRate 0.0400   Epoch: 7   Global Step: 305050   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:04,165-Speed 2628.59 samples/sec   Loss 8.6982   LearningRate 0.0400   Epoch: 7   Global Step: 305060   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:08,056-Speed 2632.19 samples/sec   Loss 8.6112   LearningRate 0.0400   Epoch: 7   Global Step: 305070   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:11,952-Speed 2628.93 samples/sec   Loss 8.4594   LearningRate 0.0400   Epoch: 7   Global Step: 305080   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:15,876-Speed 2610.86 samples/sec   Loss 8.3985   LearningRate 0.0400   Epoch: 7   Global Step: 305090   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:19,766-Speed 2632.93 samples/sec   Loss 8.6440   LearningRate 0.0400   Epoch: 7   Global Step: 305100   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:23,658-Speed 2631.52 samples/sec   Loss 8.6814   LearningRate 0.0400   Epoch: 7   Global Step: 305110   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:27,555-Speed 2627.97 samples/sec   Loss 8.5238   LearningRate 0.0400   Epoch: 7   Global Step: 305120   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:00:31,455-Speed 2627.06 samples/sec   Loss 8.5401   LearningRate 0.0400   Epoch: 7   Global Step: 305130   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:00:35,350-Speed 2629.24 samples/sec   Loss 8.3756   LearningRate 0.0400   Epoch: 7   Global Step: 305140   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:00:39,256-Speed 2622.05 samples/sec   Loss 8.5011   LearningRate 0.0400   Epoch: 7   Global Step: 305150   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:00:43,155-Speed 2627.44 samples/sec   Loss 8.5539   LearningRate 0.0400   Epoch: 7   Global Step: 305160   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:00:47,053-Speed 2627.09 samples/sec   Loss 8.5468   LearningRate 0.0400   Epoch: 7   Global Step: 305170   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:00:50,945-Speed 2632.11 samples/sec   Loss 8.5157   LearningRate 0.0400   Epoch: 7   Global Step: 305180   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:00:54,731-Speed 2705.79 samples/sec   Loss 9.4146   LearningRate 0.0400   Epoch: 7   Global Step: 305190   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:00:58,626-Speed 2629.01 samples/sec   Loss 9.8982   LearningRate 0.0400   Epoch: 7   Global Step: 305200   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:02,515-Speed 2633.94 samples/sec   Loss 8.9029   LearningRate 0.0400   Epoch: 7   Global Step: 305210   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:06,403-Speed 2634.94 samples/sec   Loss 8.6005   LearningRate 0.0400   Epoch: 7   Global Step: 305220   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:10,291-Speed 2634.30 samples/sec   Loss 8.6981   LearningRate 0.0400   Epoch: 7   Global Step: 305230   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:14,201-Speed 2619.70 samples/sec   Loss 8.5706   LearningRate 0.0399   Epoch: 7   Global Step: 305240   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:18,197-Speed 2562.95 samples/sec   Loss 8.6055   LearningRate 0.0399   Epoch: 7   Global Step: 305250   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:22,090-Speed 2630.96 samples/sec   Loss 8.6630   LearningRate 0.0399   Epoch: 7   Global Step: 305260   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:25,987-Speed 2628.32 samples/sec   Loss 8.6119   LearningRate 0.0399   Epoch: 7   Global Step: 305270   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:29,891-Speed 2623.59 samples/sec   Loss 8.5976   LearningRate 0.0399   Epoch: 7   Global Step: 305280   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:01:33,829-Speed 2600.66 samples/sec   Loss 8.6122   LearningRate 0.0399   Epoch: 7   Global Step: 305290   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:01:37,829-Speed 2560.17 samples/sec   Loss 8.5278   LearningRate 0.0399   Epoch: 7   Global Step: 305300   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:01:41,751-Speed 2612.16 samples/sec   Loss 8.5347   LearningRate 0.0399   Epoch: 7   Global Step: 305310   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:01:45,644-Speed 2631.00 samples/sec   Loss 8.4925   LearningRate 0.0399   Epoch: 7   Global Step: 305320   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:01:49,534-Speed 2633.33 samples/sec   Loss 8.4748   LearningRate 0.0399   Epoch: 7   Global Step: 305330   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:01:53,428-Speed 2630.17 samples/sec   Loss 8.5090   LearningRate 0.0399   Epoch: 7   Global Step: 305340   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:01:57,334-Speed 2622.36 samples/sec   Loss 8.4794   LearningRate 0.0399   Epoch: 7   Global Step: 305350   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:02:01,237-Speed 2624.31 samples/sec   Loss 8.5918   LearningRate 0.0399   Epoch: 7   Global Step: 305360   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:02:05,133-Speed 2629.30 samples/sec   Loss 8.7403   LearningRate 0.0399   Epoch: 7   Global Step: 305370   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:02:09,046-Speed 2617.30 samples/sec   Loss 8.5713   LearningRate 0.0399   Epoch: 7   Global Step: 305380   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:02:12,935-Speed 2636.03 samples/sec   Loss 8.4303   LearningRate 0.0399   Epoch: 7   Global Step: 305390   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:16,826-Speed 2632.38 samples/sec   Loss 8.6535   LearningRate 0.0399   Epoch: 7   Global Step: 305400   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:20,727-Speed 2626.03 samples/sec   Loss 8.5731   LearningRate 0.0399   Epoch: 7   Global Step: 305410   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:24,622-Speed 2629.54 samples/sec   Loss 8.4004   LearningRate 0.0399   Epoch: 7   Global Step: 305420   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:28,544-Speed 2611.28 samples/sec   Loss 8.5080   LearningRate 0.0399   Epoch: 7   Global Step: 305430   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:32,449-Speed 2623.20 samples/sec   Loss 8.4514   LearningRate 0.0399   Epoch: 7   Global Step: 305440   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:36,356-Speed 2621.85 samples/sec   Loss 8.4432   LearningRate 0.0399   Epoch: 7   Global Step: 305450   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:40,257-Speed 2625.36 samples/sec   Loss 8.4795   LearningRate 0.0399   Epoch: 7   Global Step: 305460   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:44,170-Speed 2617.30 samples/sec   Loss 8.6018   LearningRate 0.0399   Epoch: 7   Global Step: 305470   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:48,076-Speed 2622.21 samples/sec   Loss 8.4387   LearningRate 0.0399   Epoch: 7   Global Step: 305480   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:02:51,986-Speed 2619.73 samples/sec   Loss 8.5378   LearningRate 0.0399   Epoch: 7   Global Step: 305490   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:02:55,909-Speed 2611.27 samples/sec   Loss 8.4738   LearningRate 0.0399   Epoch: 7   Global Step: 305500   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:02:59,805-Speed 2628.71 samples/sec   Loss 8.4426   LearningRate 0.0399   Epoch: 7   Global Step: 305510   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:03,698-Speed 2630.91 samples/sec   Loss 8.4181   LearningRate 0.0399   Epoch: 7   Global Step: 305520   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:07,592-Speed 2630.10 samples/sec   Loss 8.5765   LearningRate 0.0399   Epoch: 7   Global Step: 305530   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:11,494-Speed 2625.74 samples/sec   Loss 8.6355   LearningRate 0.0399   Epoch: 7   Global Step: 305540   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:15,403-Speed 2620.12 samples/sec   Loss 8.5735   LearningRate 0.0399   Epoch: 7   Global Step: 305550   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:19,293-Speed 2632.69 samples/sec   Loss 8.5515   LearningRate 0.0399   Epoch: 7   Global Step: 305560   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:23,190-Speed 2628.07 samples/sec   Loss 8.5326   LearningRate 0.0399   Epoch: 7   Global Step: 305570   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:27,103-Speed 2617.58 samples/sec   Loss 8.6175   LearningRate 0.0399   Epoch: 7   Global Step: 305580   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:03:31,177-Speed 2514.51 samples/sec   Loss 8.4354   LearningRate 0.0399   Epoch: 7   Global Step: 305590   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:03:35,228-Speed 2528.45 samples/sec   Loss 8.5142   LearningRate 0.0399   Epoch: 7   Global Step: 305600   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:03:39,133-Speed 2622.76 samples/sec   Loss 8.5604   LearningRate 0.0399   Epoch: 7   Global Step: 305610   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:03:43,067-Speed 2603.32 samples/sec   Loss 8.4914   LearningRate 0.0399   Epoch: 7   Global Step: 305620   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:03:47,056-Speed 2567.79 samples/sec   Loss 8.5680   LearningRate 0.0399   Epoch: 7   Global Step: 305630   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:03:50,950-Speed 2630.88 samples/sec   Loss 8.6136   LearningRate 0.0399   Epoch: 7   Global Step: 305640   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:03:54,870-Speed 2612.55 samples/sec   Loss 8.5115   LearningRate 0.0399   Epoch: 7   Global Step: 305650   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:03:58,774-Speed 2624.20 samples/sec   Loss 8.4392   LearningRate 0.0399   Epoch: 7   Global Step: 305660   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:04:02,691-Speed 2614.46 samples/sec   Loss 8.5593   LearningRate 0.0399   Epoch: 7   Global Step: 305670   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:04:06,676-Speed 2570.56 samples/sec   Loss 8.5004   LearningRate 0.0399   Epoch: 7   Global Step: 305680   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:04:10,567-Speed 2631.97 samples/sec   Loss 8.5839   LearningRate 0.0399   Epoch: 7   Global Step: 305690   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:14,463-Speed 2628.74 samples/sec   Loss 8.4492   LearningRate 0.0399   Epoch: 7   Global Step: 305700   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:18,357-Speed 2630.69 samples/sec   Loss 8.4585   LearningRate 0.0399   Epoch: 7   Global Step: 305710   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:22,247-Speed 2633.00 samples/sec   Loss 8.4294   LearningRate 0.0399   Epoch: 7   Global Step: 305720   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:26,141-Speed 2630.42 samples/sec   Loss 8.5161   LearningRate 0.0399   Epoch: 7   Global Step: 305730   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:30,032-Speed 2632.25 samples/sec   Loss 8.3683   LearningRate 0.0399   Epoch: 7   Global Step: 305740   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:33,921-Speed 2633.75 samples/sec   Loss 8.5120   LearningRate 0.0399   Epoch: 7   Global Step: 305750   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:37,831-Speed 2619.48 samples/sec   Loss 8.5376   LearningRate 0.0399   Epoch: 7   Global Step: 305760   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:41,726-Speed 2629.47 samples/sec   Loss 8.4286   LearningRate 0.0399   Epoch: 7   Global Step: 305770   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:45,616-Speed 2633.19 samples/sec   Loss 8.4957   LearningRate 0.0399   Epoch: 7   Global Step: 305780   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:04:49,507-Speed 2632.56 samples/sec   Loss 8.5162   LearningRate 0.0399   Epoch: 7   Global Step: 305790   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:04:53,399-Speed 2631.96 samples/sec   Loss 8.4525   LearningRate 0.0399   Epoch: 7   Global Step: 305800   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:04:57,294-Speed 2629.27 samples/sec   Loss 8.6355   LearningRate 0.0399   Epoch: 7   Global Step: 305810   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:05:01,190-Speed 2629.50 samples/sec   Loss 8.4964   LearningRate 0.0399   Epoch: 7   Global Step: 305820   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:05:05,068-Speed 2640.61 samples/sec   Loss 8.5204   LearningRate 0.0399   Epoch: 7   Global Step: 305830   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:05:08,973-Speed 2623.08 samples/sec   Loss 8.4920   LearningRate 0.0399   Epoch: 7   Global Step: 305840   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:05:12,850-Speed 2641.32 samples/sec   Loss 8.5726   LearningRate 0.0399   Epoch: 7   Global Step: 305850   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:16,770-Speed 2613.78 samples/sec   Loss 8.4063   LearningRate 0.0399   Epoch: 7   Global Step: 305860   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:20,669-Speed 2627.27 samples/sec   Loss 8.4011   LearningRate 0.0399   Epoch: 7   Global Step: 305870   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:24,564-Speed 2629.75 samples/sec   Loss 8.5305   LearningRate 0.0399   Epoch: 7   Global Step: 305880   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:28,459-Speed 2629.69 samples/sec   Loss 8.5265   LearningRate 0.0399   Epoch: 7   Global Step: 305890   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:32,351-Speed 2632.55 samples/sec   Loss 8.6343   LearningRate 0.0398   Epoch: 7   Global Step: 305900   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:36,242-Speed 2632.47 samples/sec   Loss 8.4502   LearningRate 0.0398   Epoch: 7   Global Step: 305910   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:40,133-Speed 2631.96 samples/sec   Loss 8.5370   LearningRate 0.0398   Epoch: 7   Global Step: 305920   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:44,028-Speed 2629.72 samples/sec   Loss 8.4392   LearningRate 0.0398   Epoch: 7   Global Step: 305930   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:47,922-Speed 2629.84 samples/sec   Loss 8.4694   LearningRate 0.0398   Epoch: 7   Global Step: 305940   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:05:51,827-Speed 2623.63 samples/sec   Loss 8.6031   LearningRate 0.0398   Epoch: 7   Global Step: 305950   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:05:55,726-Speed 2627.00 samples/sec   Loss 8.4668   LearningRate 0.0398   Epoch: 7   Global Step: 305960   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:05:59,691-Speed 2583.35 samples/sec   Loss 8.4533   LearningRate 0.0398   Epoch: 7   Global Step: 305970   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:06:03,653-Speed 2584.86 samples/sec   Loss 8.5926   LearningRate 0.0398   Epoch: 7   Global Step: 305980   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:06:07,546-Speed 2631.16 samples/sec   Loss 8.6704   LearningRate 0.0398   Epoch: 7   Global Step: 305990   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:06:11,438-Speed 2631.40 samples/sec   Loss 8.5273   LearningRate 0.0398   Epoch: 7   Global Step: 306000   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:06:15,349-Speed 2619.22 samples/sec   Loss 8.5875   LearningRate 0.0398   Epoch: 7   Global Step: 306010   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:06:19,244-Speed 2629.44 samples/sec   Loss 8.6387   LearningRate 0.0398   Epoch: 7   Global Step: 306020   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:06:23,152-Speed 2621.13 samples/sec   Loss 8.4123   LearningRate 0.0398   Epoch: 7   Global Step: 306030   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:06:26,971-Speed 2682.16 samples/sec   Loss 9.0834   LearningRate 0.0398   Epoch: 7   Global Step: 306040   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:30,856-Speed 2635.99 samples/sec   Loss 8.6332   LearningRate 0.0398   Epoch: 7   Global Step: 306050   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:34,748-Speed 2631.67 samples/sec   Loss 8.6777   LearningRate 0.0398   Epoch: 7   Global Step: 306060   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:38,639-Speed 2632.15 samples/sec   Loss 8.6228   LearningRate 0.0398   Epoch: 7   Global Step: 306070   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:42,530-Speed 2632.40 samples/sec   Loss 8.5618   LearningRate 0.0398   Epoch: 7   Global Step: 306080   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:46,417-Speed 2635.97 samples/sec   Loss 8.5394   LearningRate 0.0398   Epoch: 7   Global Step: 306090   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:50,303-Speed 2635.55 samples/sec   Loss 8.5831   LearningRate 0.0398   Epoch: 7   Global Step: 306100   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:54,190-Speed 2634.92 samples/sec   Loss 8.4807   LearningRate 0.0398   Epoch: 7   Global Step: 306110   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:06:58,079-Speed 2633.73 samples/sec   Loss 8.4766   LearningRate 0.0398   Epoch: 7   Global Step: 306120   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:07:01,981-Speed 2624.84 samples/sec   Loss 8.3966   LearningRate 0.0398   Epoch: 7   Global Step: 306130   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:07:05,881-Speed 2625.74 samples/sec   Loss 8.5247   LearningRate 0.0398   Epoch: 7   Global Step: 306140   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:09,771-Speed 2634.02 samples/sec   Loss 8.5659   LearningRate 0.0398   Epoch: 7   Global Step: 306150   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:13,660-Speed 2633.11 samples/sec   Loss 8.5497   LearningRate 0.0398   Epoch: 7   Global Step: 306160   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:17,549-Speed 2634.11 samples/sec   Loss 8.5692   LearningRate 0.0398   Epoch: 7   Global Step: 306170   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:21,441-Speed 2631.59 samples/sec   Loss 8.5985   LearningRate 0.0398   Epoch: 7   Global Step: 306180   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:25,334-Speed 2631.65 samples/sec   Loss 8.5638   LearningRate 0.0398   Epoch: 7   Global Step: 306190   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:29,238-Speed 2623.25 samples/sec   Loss 8.5046   LearningRate 0.0398   Epoch: 7   Global Step: 306200   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:33,128-Speed 2632.98 samples/sec   Loss 8.6065   LearningRate 0.0398   Epoch: 7   Global Step: 306210   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:37,021-Speed 2630.53 samples/sec   Loss 8.5638   LearningRate 0.0398   Epoch: 7   Global Step: 306220   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:40,909-Speed 2634.69 samples/sec   Loss 8.5878   LearningRate 0.0398   Epoch: 7   Global Step: 306230   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:07:44,837-Speed 2607.83 samples/sec   Loss 8.5466   LearningRate 0.0398   Epoch: 7   Global Step: 306240   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:07:48,730-Speed 2631.35 samples/sec   Loss 8.5451   LearningRate 0.0398   Epoch: 7   Global Step: 306250   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:07:52,633-Speed 2623.77 samples/sec   Loss 8.6375   LearningRate 0.0398   Epoch: 7   Global Step: 306260   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:07:56,533-Speed 2626.93 samples/sec   Loss 8.5221   LearningRate 0.0398   Epoch: 7   Global Step: 306270   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:08:00,435-Speed 2625.15 samples/sec   Loss 8.5941   LearningRate 0.0398   Epoch: 7   Global Step: 306280   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:08:04,332-Speed 2628.09 samples/sec   Loss 8.3997   LearningRate 0.0398   Epoch: 7   Global Step: 306290   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:08:08,221-Speed 2633.40 samples/sec   Loss 8.5476   LearningRate 0.0398   Epoch: 7   Global Step: 306300   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:08:12,109-Speed 2634.18 samples/sec   Loss 8.5686   LearningRate 0.0398   Epoch: 7   Global Step: 306310   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:08:15,998-Speed 2633.81 samples/sec   Loss 8.3976   LearningRate 0.0398   Epoch: 7   Global Step: 306320   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:08:19,886-Speed 2635.17 samples/sec   Loss 8.5447   LearningRate 0.0398   Epoch: 7   Global Step: 306330   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:08:23,779-Speed 2630.76 samples/sec   Loss 8.5990   LearningRate 0.0398   Epoch: 7   Global Step: 306340   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:27,676-Speed 2628.61 samples/sec   Loss 8.4592   LearningRate 0.0398   Epoch: 7   Global Step: 306350   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:31,566-Speed 2632.28 samples/sec   Loss 8.3781   LearningRate 0.0398   Epoch: 7   Global Step: 306360   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:35,476-Speed 2619.32 samples/sec   Loss 8.6451   LearningRate 0.0398   Epoch: 7   Global Step: 306370   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:39,362-Speed 2636.16 samples/sec   Loss 8.5282   LearningRate 0.0398   Epoch: 7   Global Step: 306380   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:43,261-Speed 2626.57 samples/sec   Loss 8.6735   LearningRate 0.0398   Epoch: 7   Global Step: 306390   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:47,159-Speed 2627.97 samples/sec   Loss 8.4448   LearningRate 0.0398   Epoch: 7   Global Step: 306400   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:51,051-Speed 2631.75 samples/sec   Loss 8.4476   LearningRate 0.0398   Epoch: 7   Global Step: 306410   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:54,944-Speed 2630.97 samples/sec   Loss 8.4803   LearningRate 0.0398   Epoch: 7   Global Step: 306420   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:08:58,836-Speed 2631.73 samples/sec   Loss 8.5601   LearningRate 0.0398   Epoch: 7   Global Step: 306430   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:09:02,728-Speed 2630.92 samples/sec   Loss 8.5499   LearningRate 0.0398   Epoch: 7   Global Step: 306440   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:09:06,620-Speed 2631.94 samples/sec   Loss 8.5537   LearningRate 0.0398   Epoch: 7   Global Step: 306450   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:09:10,513-Speed 2630.86 samples/sec   Loss 8.4526   LearningRate 0.0398   Epoch: 7   Global Step: 306460   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:09:14,420-Speed 2621.76 samples/sec   Loss 8.6343   LearningRate 0.0398   Epoch: 7   Global Step: 306470   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:09:18,313-Speed 2630.82 samples/sec   Loss 8.4936   LearningRate 0.0398   Epoch: 7   Global Step: 306480   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:09:22,204-Speed 2632.32 samples/sec   Loss 8.5257   LearningRate 0.0398   Epoch: 7   Global Step: 306490   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:09:26,053-Speed 2661.02 samples/sec   Loss 8.5776   LearningRate 0.0398   Epoch: 7   Global Step: 306500   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:29,951-Speed 2628.22 samples/sec   Loss 8.4915   LearningRate 0.0398   Epoch: 7   Global Step: 306510   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:33,843-Speed 2631.23 samples/sec   Loss 8.5474   LearningRate 0.0398   Epoch: 7   Global Step: 306520   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:37,748-Speed 2622.89 samples/sec   Loss 8.4136   LearningRate 0.0398   Epoch: 7   Global Step: 306530   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:41,691-Speed 2597.73 samples/sec   Loss 8.4598   LearningRate 0.0398   Epoch: 7   Global Step: 306540   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:45,654-Speed 2584.75 samples/sec   Loss 8.5344   LearningRate 0.0397   Epoch: 7   Global Step: 306550   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:49,558-Speed 2622.83 samples/sec   Loss 8.5641   LearningRate 0.0397   Epoch: 7   Global Step: 306560   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:53,465-Speed 2621.71 samples/sec   Loss 8.5657   LearningRate 0.0397   Epoch: 7   Global Step: 306570   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:09:57,343-Speed 2640.97 samples/sec   Loss 9.7156   LearningRate 0.0397   Epoch: 7   Global Step: 306580   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:01,237-Speed 2630.47 samples/sec   Loss 9.2574   LearningRate 0.0397   Epoch: 7   Global Step: 306590   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:05,135-Speed 2627.66 samples/sec   Loss 8.8070   LearningRate 0.0397   Epoch: 7   Global Step: 306600   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:09,232-Speed 2500.25 samples/sec   Loss 8.7514   LearningRate 0.0397   Epoch: 7   Global Step: 306610   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:13,123-Speed 2632.11 samples/sec   Loss 9.1219   LearningRate 0.0397   Epoch: 7   Global Step: 306620   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:17,032-Speed 2620.43 samples/sec   Loss 8.7113   LearningRate 0.0397   Epoch: 7   Global Step: 306630   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:20,939-Speed 2620.97 samples/sec   Loss 8.5985   LearningRate 0.0397   Epoch: 7   Global Step: 306640   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:24,850-Speed 2619.46 samples/sec   Loss 8.5483   LearningRate 0.0397   Epoch: 7   Global Step: 306650   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:28,748-Speed 2627.41 samples/sec   Loss 8.5958   LearningRate 0.0397   Epoch: 7   Global Step: 306660   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:32,643-Speed 2629.44 samples/sec   Loss 8.6119   LearningRate 0.0397   Epoch: 7   Global Step: 306670   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:10:36,541-Speed 2627.63 samples/sec   Loss 8.7665   LearningRate 0.0397   Epoch: 7   Global Step: 306680   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:10:40,439-Speed 2627.47 samples/sec   Loss 8.5087   LearningRate 0.0397   Epoch: 7   Global Step: 306690   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:10:44,334-Speed 2629.89 samples/sec   Loss 8.5878   LearningRate 0.0397   Epoch: 7   Global Step: 306700   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:10:48,226-Speed 2631.66 samples/sec   Loss 8.4358   LearningRate 0.0397   Epoch: 7   Global Step: 306710   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:10:52,115-Speed 2633.87 samples/sec   Loss 8.4224   LearningRate 0.0397   Epoch: 7   Global Step: 306720   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:10:56,014-Speed 2626.44 samples/sec   Loss 8.6340   LearningRate 0.0397   Epoch: 7   Global Step: 306730   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:10:59,913-Speed 2626.87 samples/sec   Loss 8.5921   LearningRate 0.0397   Epoch: 7   Global Step: 306740   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:11:03,817-Speed 2623.34 samples/sec   Loss 8.5835   LearningRate 0.0397   Epoch: 7   Global Step: 306750   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:11:07,721-Speed 2624.03 samples/sec   Loss 8.5784   LearningRate 0.0397   Epoch: 7   Global Step: 306760   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:11:11,609-Speed 2633.59 samples/sec   Loss 8.5789   LearningRate 0.0397   Epoch: 7   Global Step: 306770   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:11:15,502-Speed 2631.17 samples/sec   Loss 8.5298   LearningRate 0.0397   Epoch: 7   Global Step: 306780   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:19,392-Speed 2633.50 samples/sec   Loss 8.4788   LearningRate 0.0397   Epoch: 7   Global Step: 306790   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:23,282-Speed 2632.95 samples/sec   Loss 8.6456   LearningRate 0.0397   Epoch: 7   Global Step: 306800   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:27,172-Speed 2632.78 samples/sec   Loss 8.5784   LearningRate 0.0397   Epoch: 7   Global Step: 306810   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:31,069-Speed 2628.76 samples/sec   Loss 8.3486   LearningRate 0.0397   Epoch: 7   Global Step: 306820   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:34,959-Speed 2632.58 samples/sec   Loss 8.6489   LearningRate 0.0397   Epoch: 7   Global Step: 306830   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:38,850-Speed 2632.24 samples/sec   Loss 8.6731   LearningRate 0.0397   Epoch: 7   Global Step: 306840   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:42,741-Speed 2632.35 samples/sec   Loss 8.6948   LearningRate 0.0397   Epoch: 7   Global Step: 306850   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:46,634-Speed 2630.94 samples/sec   Loss 8.4830   LearningRate 0.0397   Epoch: 7   Global Step: 306860   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:50,536-Speed 2624.64 samples/sec   Loss 9.1530   LearningRate 0.0397   Epoch: 7   Global Step: 306870   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:11:54,408-Speed 2645.22 samples/sec   Loss 9.8304   LearningRate 0.0397   Epoch: 7   Global Step: 306880   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:11:58,312-Speed 2624.09 samples/sec   Loss 8.7436   LearningRate 0.0397   Epoch: 7   Global Step: 306890   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:02,207-Speed 2629.25 samples/sec   Loss 8.5982   LearningRate 0.0397   Epoch: 7   Global Step: 306900   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:06,107-Speed 2626.29 samples/sec   Loss 8.4542   LearningRate 0.0397   Epoch: 7   Global Step: 306910   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:10,000-Speed 2631.11 samples/sec   Loss 8.5401   LearningRate 0.0397   Epoch: 7   Global Step: 306920   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:13,891-Speed 2632.15 samples/sec   Loss 8.6119   LearningRate 0.0397   Epoch: 7   Global Step: 306930   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:17,779-Speed 2634.14 samples/sec   Loss 8.6828   LearningRate 0.0397   Epoch: 7   Global Step: 306940   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:21,667-Speed 2634.74 samples/sec   Loss 8.6490   LearningRate 0.0397   Epoch: 7   Global Step: 306950   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:25,562-Speed 2629.42 samples/sec   Loss 8.7066   LearningRate 0.0397   Epoch: 7   Global Step: 306960   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:29,457-Speed 2629.67 samples/sec   Loss 8.7376   LearningRate 0.0397   Epoch: 7   Global Step: 306970   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:12:33,349-Speed 2631.77 samples/sec   Loss 8.6126   LearningRate 0.0397   Epoch: 7   Global Step: 306980   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:12:37,242-Speed 2631.33 samples/sec   Loss 8.6520   LearningRate 0.0397   Epoch: 7   Global Step: 306990   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:12:41,135-Speed 2630.89 samples/sec   Loss 8.5282   LearningRate 0.0397   Epoch: 7   Global Step: 307000   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:12:45,028-Speed 2630.84 samples/sec   Loss 8.5342   LearningRate 0.0397   Epoch: 7   Global Step: 307010   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:12:48,919-Speed 2632.17 samples/sec   Loss 8.4754   LearningRate 0.0397   Epoch: 7   Global Step: 307020   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:12:52,808-Speed 2633.84 samples/sec   Loss 8.4402   LearningRate 0.0397   Epoch: 7   Global Step: 307030   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:12:56,698-Speed 2632.91 samples/sec   Loss 8.5790   LearningRate 0.0397   Epoch: 7   Global Step: 307040   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:13:00,594-Speed 2628.44 samples/sec   Loss 8.6240   LearningRate 0.0397   Epoch: 7   Global Step: 307050   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:13:04,484-Speed 2633.20 samples/sec   Loss 8.4448   LearningRate 0.0397   Epoch: 7   Global Step: 307060   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:13:08,375-Speed 2632.35 samples/sec   Loss 8.6467   LearningRate 0.0397   Epoch: 7   Global Step: 307070   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:13:12,274-Speed 2626.81 samples/sec   Loss 8.4554   LearningRate 0.0397   Epoch: 7   Global Step: 307080   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:16,166-Speed 2631.98 samples/sec   Loss 8.5313   LearningRate 0.0397   Epoch: 7   Global Step: 307090   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:20,058-Speed 2631.78 samples/sec   Loss 8.5625   LearningRate 0.0397   Epoch: 7   Global Step: 307100   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:23,951-Speed 2630.75 samples/sec   Loss 8.5934   LearningRate 0.0397   Epoch: 7   Global Step: 307110   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:27,855-Speed 2623.80 samples/sec   Loss 8.5541   LearningRate 0.0397   Epoch: 7   Global Step: 307120   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:31,758-Speed 2623.59 samples/sec   Loss 8.4677   LearningRate 0.0397   Epoch: 7   Global Step: 307130   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:35,650-Speed 2631.56 samples/sec   Loss 8.4157   LearningRate 0.0397   Epoch: 7   Global Step: 307140   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:39,550-Speed 2626.22 samples/sec   Loss 8.5586   LearningRate 0.0397   Epoch: 7   Global Step: 307150   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:43,445-Speed 2629.49 samples/sec   Loss 8.3655   LearningRate 0.0397   Epoch: 7   Global Step: 307160   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:47,346-Speed 2626.21 samples/sec   Loss 8.4945   LearningRate 0.0397   Epoch: 7   Global Step: 307170   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:13:51,235-Speed 2633.62 samples/sec   Loss 8.5064   LearningRate 0.0397   Epoch: 7   Global Step: 307180   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:13:55,125-Speed 2633.43 samples/sec   Loss 8.4117   LearningRate 0.0397   Epoch: 7   Global Step: 307190   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:13:59,024-Speed 2626.57 samples/sec   Loss 8.6368   LearningRate 0.0397   Epoch: 7   Global Step: 307200   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:02,923-Speed 2627.01 samples/sec   Loss 8.5043   LearningRate 0.0396   Epoch: 7   Global Step: 307210   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:06,816-Speed 2630.70 samples/sec   Loss 8.3859   LearningRate 0.0396   Epoch: 7   Global Step: 307220   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:10,707-Speed 2632.07 samples/sec   Loss 8.4620   LearningRate 0.0396   Epoch: 7   Global Step: 307230   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:14,603-Speed 2629.12 samples/sec   Loss 8.3385   LearningRate 0.0396   Epoch: 7   Global Step: 307240   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:18,500-Speed 2628.05 samples/sec   Loss 8.5942   LearningRate 0.0396   Epoch: 7   Global Step: 307250   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:22,402-Speed 2624.84 samples/sec   Loss 8.5698   LearningRate 0.0396   Epoch: 7   Global Step: 307260   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:26,301-Speed 2627.09 samples/sec   Loss 8.4955   LearningRate 0.0396   Epoch: 7   Global Step: 307270   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:14:30,193-Speed 2631.90 samples/sec   Loss 8.4619   LearningRate 0.0396   Epoch: 7   Global Step: 307280   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:14:34,111-Speed 2614.23 samples/sec   Loss 8.3843   LearningRate 0.0396   Epoch: 7   Global Step: 307290   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:14:38,016-Speed 2622.98 samples/sec   Loss 8.5922   LearningRate 0.0396   Epoch: 7   Global Step: 307300   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:14:41,908-Speed 2631.61 samples/sec   Loss 8.4839   LearningRate 0.0396   Epoch: 7   Global Step: 307310   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:14:45,801-Speed 2630.35 samples/sec   Loss 8.4409   LearningRate 0.0396   Epoch: 7   Global Step: 307320   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:14:49,693-Speed 2631.84 samples/sec   Loss 8.6235   LearningRate 0.0396   Epoch: 7   Global Step: 307330   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:14:53,612-Speed 2613.32 samples/sec   Loss 8.5956   LearningRate 0.0396   Epoch: 7   Global Step: 307340   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:14:57,515-Speed 2624.49 samples/sec   Loss 8.4972   LearningRate 0.0396   Epoch: 7   Global Step: 307350   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:15:01,406-Speed 2632.19 samples/sec   Loss 8.6210   LearningRate 0.0396   Epoch: 7   Global Step: 307360   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:15:05,297-Speed 2632.08 samples/sec   Loss 8.3880   LearningRate 0.0396   Epoch: 7   Global Step: 307370   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:15:09,191-Speed 2630.91 samples/sec   Loss 8.4912   LearningRate 0.0396   Epoch: 7   Global Step: 307380   Fp16 Grad Scale: 262144   Required: 59 hours
Training: 2022-04-14 06:15:13,065-Speed 2644.17 samples/sec   Loss 8.6472   LearningRate 0.0396   Epoch: 7   Global Step: 307390   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:15:16,939-Speed 2643.99 samples/sec   Loss 8.5155   LearningRate 0.0396   Epoch: 7   Global Step: 307400   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:20,837-Speed 2627.48 samples/sec   Loss 8.4387   LearningRate 0.0396   Epoch: 7   Global Step: 307410   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:24,752-Speed 2616.19 samples/sec   Loss 8.6198   LearningRate 0.0396   Epoch: 7   Global Step: 307420   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:28,661-Speed 2619.89 samples/sec   Loss 8.5854   LearningRate 0.0396   Epoch: 7   Global Step: 307430   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:32,590-Speed 2606.54 samples/sec   Loss 8.6455   LearningRate 0.0396   Epoch: 7   Global Step: 307440   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:36,494-Speed 2623.56 samples/sec   Loss 8.4526   LearningRate 0.0396   Epoch: 7   Global Step: 307450   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:40,404-Speed 2620.39 samples/sec   Loss 8.5534   LearningRate 0.0396   Epoch: 7   Global Step: 307460   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:44,313-Speed 2620.27 samples/sec   Loss 8.5634   LearningRate 0.0396   Epoch: 7   Global Step: 307470   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:48,289-Speed 2577.06 samples/sec   Loss 8.5424   LearningRate 0.0396   Epoch: 7   Global Step: 307480   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:52,220-Speed 2605.33 samples/sec   Loss 8.6292   LearningRate 0.0396   Epoch: 7   Global Step: 307490   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:15:56,156-Speed 2602.65 samples/sec   Loss 8.4712   LearningRate 0.0396   Epoch: 7   Global Step: 307500   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:16:00,083-Speed 2607.99 samples/sec   Loss 8.3610   LearningRate 0.0396   Epoch: 7   Global Step: 307510   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:16:03,969-Speed 2635.67 samples/sec   Loss 8.4428   LearningRate 0.0396   Epoch: 7   Global Step: 307520   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:07,868-Speed 2626.68 samples/sec   Loss 8.5538   LearningRate 0.0396   Epoch: 7   Global Step: 307530   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:11,772-Speed 2624.13 samples/sec   Loss 8.4669   LearningRate 0.0396   Epoch: 7   Global Step: 307540   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:15,670-Speed 2627.63 samples/sec   Loss 8.4651   LearningRate 0.0396   Epoch: 7   Global Step: 307550   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:19,587-Speed 2614.33 samples/sec   Loss 8.4724   LearningRate 0.0396   Epoch: 7   Global Step: 307560   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:23,501-Speed 2617.39 samples/sec   Loss 8.4370   LearningRate 0.0396   Epoch: 7   Global Step: 307570   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:27,417-Speed 2615.37 samples/sec   Loss 8.4345   LearningRate 0.0396   Epoch: 7   Global Step: 307580   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:31,338-Speed 2612.41 samples/sec   Loss 8.3647   LearningRate 0.0396   Epoch: 7   Global Step: 307590   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:35,253-Speed 2616.32 samples/sec   Loss 8.4921   LearningRate 0.0396   Epoch: 7   Global Step: 307600   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:39,161-Speed 2620.27 samples/sec   Loss 8.4610   LearningRate 0.0396   Epoch: 7   Global Step: 307610   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:16:43,060-Speed 2626.92 samples/sec   Loss 8.4593   LearningRate 0.0396   Epoch: 7   Global Step: 307620   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:16:46,949-Speed 2633.88 samples/sec   Loss 8.5064   LearningRate 0.0396   Epoch: 7   Global Step: 307630   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:16:50,844-Speed 2629.96 samples/sec   Loss 8.5415   LearningRate 0.0396   Epoch: 7   Global Step: 307640   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:16:54,748-Speed 2623.60 samples/sec   Loss 8.5716   LearningRate 0.0396   Epoch: 7   Global Step: 307650   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:16:58,639-Speed 2632.71 samples/sec   Loss 8.4230   LearningRate 0.0396   Epoch: 7   Global Step: 307660   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:17:02,536-Speed 2627.99 samples/sec   Loss 8.4567   LearningRate 0.0396   Epoch: 7   Global Step: 307670   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:17:06,428-Speed 2631.81 samples/sec   Loss 8.6068   LearningRate 0.0396   Epoch: 7   Global Step: 307680   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:17:10,328-Speed 2625.68 samples/sec   Loss 8.4986   LearningRate 0.0396   Epoch: 7   Global Step: 307690   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:17:14,225-Speed 2628.47 samples/sec   Loss 8.4527   LearningRate 0.0396   Epoch: 7   Global Step: 307700   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:17:18,120-Speed 2629.99 samples/sec   Loss 8.4345   LearningRate 0.0396   Epoch: 7   Global Step: 307710   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:17:21,971-Speed 2660.89 samples/sec   Loss 8.6708   LearningRate 0.0396   Epoch: 7   Global Step: 307720   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:17:25,809-Speed 2668.62 samples/sec   Loss 8.7105   LearningRate 0.0396   Epoch: 7   Global Step: 307730   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:29,702-Speed 2631.65 samples/sec   Loss 8.6422   LearningRate 0.0396   Epoch: 7   Global Step: 307740   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:33,614-Speed 2618.16 samples/sec   Loss 8.8436   LearningRate 0.0396   Epoch: 7   Global Step: 307750   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:37,508-Speed 2630.36 samples/sec   Loss 8.6113   LearningRate 0.0396   Epoch: 7   Global Step: 307760   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:41,399-Speed 2631.73 samples/sec   Loss 8.5940   LearningRate 0.0396   Epoch: 7   Global Step: 307770   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:45,289-Speed 2633.65 samples/sec   Loss 8.4961   LearningRate 0.0396   Epoch: 7   Global Step: 307780   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:49,181-Speed 2631.41 samples/sec   Loss 8.3510   LearningRate 0.0396   Epoch: 7   Global Step: 307790   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:53,076-Speed 2630.43 samples/sec   Loss 8.5675   LearningRate 0.0396   Epoch: 7   Global Step: 307800   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:17:56,981-Speed 2622.58 samples/sec   Loss 8.5359   LearningRate 0.0396   Epoch: 7   Global Step: 307810   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:18:00,872-Speed 2632.90 samples/sec   Loss 8.5792   LearningRate 0.0396   Epoch: 7   Global Step: 307820   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:18:04,762-Speed 2632.35 samples/sec   Loss 8.5047   LearningRate 0.0396   Epoch: 7   Global Step: 307830   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:08,661-Speed 2626.96 samples/sec   Loss 8.4304   LearningRate 0.0396   Epoch: 7   Global Step: 307840   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:12,562-Speed 2625.35 samples/sec   Loss 8.4281   LearningRate 0.0396   Epoch: 7   Global Step: 307850   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:16,445-Speed 2638.08 samples/sec   Loss 8.4438   LearningRate 0.0396   Epoch: 7   Global Step: 307860   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:20,337-Speed 2631.88 samples/sec   Loss 8.5404   LearningRate 0.0395   Epoch: 7   Global Step: 307870   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:24,235-Speed 2627.82 samples/sec   Loss 8.4261   LearningRate 0.0395   Epoch: 7   Global Step: 307880   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:28,123-Speed 2634.57 samples/sec   Loss 8.4069   LearningRate 0.0395   Epoch: 7   Global Step: 307890   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:32,013-Speed 2632.96 samples/sec   Loss 8.6262   LearningRate 0.0395   Epoch: 7   Global Step: 307900   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:35,904-Speed 2631.65 samples/sec   Loss 8.4875   LearningRate 0.0395   Epoch: 7   Global Step: 307910   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:39,792-Speed 2634.17 samples/sec   Loss 8.4669   LearningRate 0.0395   Epoch: 7   Global Step: 307920   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:18:43,686-Speed 2631.22 samples/sec   Loss 8.5024   LearningRate 0.0395   Epoch: 7   Global Step: 307930   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:18:47,581-Speed 2629.79 samples/sec   Loss 8.5814   LearningRate 0.0395   Epoch: 7   Global Step: 307940   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:18:51,471-Speed 2632.66 samples/sec   Loss 8.5327   LearningRate 0.0395   Epoch: 7   Global Step: 307950   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:18:55,361-Speed 2632.96 samples/sec   Loss 8.4293   LearningRate 0.0395   Epoch: 7   Global Step: 307960   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:18:59,254-Speed 2631.04 samples/sec   Loss 8.5021   LearningRate 0.0395   Epoch: 7   Global Step: 307970   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:19:03,146-Speed 2632.15 samples/sec   Loss 8.4035   LearningRate 0.0395   Epoch: 7   Global Step: 307980   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:19:07,049-Speed 2624.02 samples/sec   Loss 8.4024   LearningRate 0.0395   Epoch: 7   Global Step: 307990   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:19:10,954-Speed 2622.83 samples/sec   Loss 8.4141   LearningRate 0.0395   Epoch: 7   Global Step: 308000   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:19:14,857-Speed 2624.03 samples/sec   Loss 8.5504   LearningRate 0.0395   Epoch: 7   Global Step: 308010   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:19:18,861-Speed 2558.26 samples/sec   Loss 8.4524   LearningRate 0.0395   Epoch: 7   Global Step: 308020   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:19:22,766-Speed 2623.00 samples/sec   Loss 8.5845   LearningRate 0.0395   Epoch: 7   Global Step: 308030   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:26,670-Speed 2623.86 samples/sec   Loss 8.5667   LearningRate 0.0395   Epoch: 7   Global Step: 308040   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:30,571-Speed 2625.32 samples/sec   Loss 8.4266   LearningRate 0.0395   Epoch: 7   Global Step: 308050   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:34,477-Speed 2621.96 samples/sec   Loss 8.5423   LearningRate 0.0395   Epoch: 7   Global Step: 308060   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:38,379-Speed 2625.28 samples/sec   Loss 8.4437   LearningRate 0.0395   Epoch: 7   Global Step: 308070   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:42,291-Speed 2618.07 samples/sec   Loss 8.5925   LearningRate 0.0395   Epoch: 7   Global Step: 308080   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:46,189-Speed 2627.43 samples/sec   Loss 8.4922   LearningRate 0.0395   Epoch: 7   Global Step: 308090   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:50,093-Speed 2624.16 samples/sec   Loss 8.7308   LearningRate 0.0395   Epoch: 7   Global Step: 308100   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:53,997-Speed 2623.65 samples/sec   Loss 8.5134   LearningRate 0.0395   Epoch: 7   Global Step: 308110   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:19:57,900-Speed 2624.49 samples/sec   Loss 8.4781   LearningRate 0.0395   Epoch: 7   Global Step: 308120   Fp16 Grad Scale: 65536   Required: 59 hours
Training: 2022-04-14 06:20:01,799-Speed 2627.00 samples/sec   Loss 8.5302   LearningRate 0.0395   Epoch: 7   Global Step: 308130   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:20:05,692-Speed 2630.62 samples/sec   Loss 8.5971   LearningRate 0.0395   Epoch: 7   Global Step: 308140   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:20:09,607-Speed 2616.14 samples/sec   Loss 8.5727   LearningRate 0.0395   Epoch: 7   Global Step: 308150   Fp16 Grad Scale: 131072   Required: 59 hours
Training: 2022-04-14 06:20:13,464-Speed 2655.77 samples/sec   Loss 8.6178   LearningRate 0.0395   Epoch: 7   Global Step: 308160   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:20:17,363-Speed 2627.34 samples/sec   Loss 8.5798   LearningRate 0.0395   Epoch: 7   Global Step: 308170   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:20:21,258-Speed 2629.26 samples/sec   Loss 8.4101   LearningRate 0.0395   Epoch: 7   Global Step: 308180   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:20:25,152-Speed 2630.97 samples/sec   Loss 8.4745   LearningRate 0.0395   Epoch: 7   Global Step: 308190   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:20:29,052-Speed 2626.12 samples/sec   Loss 8.5937   LearningRate 0.0395   Epoch: 7   Global Step: 308200   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:20:32,947-Speed 2629.88 samples/sec   Loss 8.4862   LearningRate 0.0395   Epoch: 7   Global Step: 308210   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:20:36,837-Speed 2632.35 samples/sec   Loss 8.5687   LearningRate 0.0395   Epoch: 7   Global Step: 308220   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-04-14 06:20:40,689-Speed 2659.73 samples/sec   Loss 8.7167   LearningRate 0.0395   Epoch: 7   Global Step: 308230   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:20:44,543-Speed 2657.99 samples/sec   Loss 8.8230   LearningRate 0.0395   Epoch: 7   Global Step: 308240   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:20:48,435-Speed 2631.65 samples/sec   Loss 8.6552   LearningRate 0.0395   Epoch: 7   Global Step: 308250   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:20:52,325-Speed 2633.07 samples/sec   Loss 9.7335   LearningRate 0.0395   Epoch: 7   Global Step: 308260   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:20:56,206-Speed 2638.65 samples/sec   Loss 8.9205   LearningRate 0.0395   Epoch: 7   Global Step: 308270   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:21:00,097-Speed 2632.46 samples/sec   Loss 8.6870   LearningRate 0.0395   Epoch: 7   Global Step: 308280   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:21:04,007-Speed 2619.43 samples/sec   Loss 8.5886   LearningRate 0.0395   Epoch: 7   Global Step: 308290   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:21:07,905-Speed 2628.02 samples/sec   Loss 8.4214   LearningRate 0.0395   Epoch: 7   Global Step: 308300   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:21:11,790-Speed 2636.18 samples/sec   Loss 8.3900   LearningRate 0.0395   Epoch: 7   Global Step: 308310   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:21:15,732-Speed 2598.40 samples/sec   Loss 8.4451   LearningRate 0.0395   Epoch: 7   Global Step: 308320   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:21:19,674-Speed 2598.47 samples/sec   Loss 8.7108   LearningRate 0.0395   Epoch: 7   Global Step: 308330   Fp16 Grad Scale: 2048   Required: 59 hours
Training: 2022-04-14 06:21:23,737-Speed 2520.64 samples/sec   Loss 8.4454   LearningRate 0.0395   Epoch: 7   Global Step: 308340   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:27,623-Speed 2635.90 samples/sec   Loss 8.4966   LearningRate 0.0395   Epoch: 7   Global Step: 308350   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:31,522-Speed 2627.31 samples/sec   Loss 8.4999   LearningRate 0.0395   Epoch: 7   Global Step: 308360   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:35,420-Speed 2627.77 samples/sec   Loss 8.4976   LearningRate 0.0395   Epoch: 7   Global Step: 308370   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:39,317-Speed 2628.02 samples/sec   Loss 8.4305   LearningRate 0.0395   Epoch: 7   Global Step: 308380   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:43,214-Speed 2628.38 samples/sec   Loss 8.5757   LearningRate 0.0395   Epoch: 7   Global Step: 308390   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:47,115-Speed 2625.42 samples/sec   Loss 8.8415   LearningRate 0.0395   Epoch: 7   Global Step: 308400   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:51,009-Speed 2630.73 samples/sec   Loss 8.6073   LearningRate 0.0395   Epoch: 7   Global Step: 308410   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:54,903-Speed 2630.16 samples/sec   Loss 8.4787   LearningRate 0.0395   Epoch: 7   Global Step: 308420   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:21:58,806-Speed 2624.74 samples/sec   Loss 8.3238   LearningRate 0.0395   Epoch: 7   Global Step: 308430   Fp16 Grad Scale: 4096   Required: 59 hours
Training: 2022-04-14 06:22:02,709-Speed 2623.81 samples/sec   Loss 8.3841   LearningRate 0.0395   Epoch: 7   Global Step: 308440   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:06,609-Speed 2626.27 samples/sec   Loss 8.5576   LearningRate 0.0395   Epoch: 7   Global Step: 308450   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:10,507-Speed 2627.04 samples/sec   Loss 8.5972   LearningRate 0.0395   Epoch: 7   Global Step: 308460   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:14,397-Speed 2633.60 samples/sec   Loss 8.5586   LearningRate 0.0395   Epoch: 7   Global Step: 308470   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:18,285-Speed 2635.00 samples/sec   Loss 8.5333   LearningRate 0.0395   Epoch: 7   Global Step: 308480   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:22,172-Speed 2634.49 samples/sec   Loss 8.5699   LearningRate 0.0395   Epoch: 7   Global Step: 308490   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:26,065-Speed 2631.65 samples/sec   Loss 8.5284   LearningRate 0.0395   Epoch: 7   Global Step: 308500   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:29,964-Speed 2626.51 samples/sec   Loss 8.5197   LearningRate 0.0395   Epoch: 7   Global Step: 308510   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:33,861-Speed 2629.10 samples/sec   Loss 8.4175   LearningRate 0.0395   Epoch: 7   Global Step: 308520   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:37,748-Speed 2634.81 samples/sec   Loss 8.4657   LearningRate 0.0394   Epoch: 7   Global Step: 308530   Fp16 Grad Scale: 8192   Required: 59 hours
Training: 2022-04-14 06:22:41,642-Speed 2630.42 samples/sec   Loss 8.4253   LearningRate 0.0394   Epoch: 7   Global Step: 308540   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:22:45,528-Speed 2635.19 samples/sec   Loss 8.3868   LearningRate 0.0394   Epoch: 7   Global Step: 308550   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:22:49,423-Speed 2629.95 samples/sec   Loss 8.4683   LearningRate 0.0394   Epoch: 7   Global Step: 308560   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:22:53,354-Speed 2605.94 samples/sec   Loss 8.4060   LearningRate 0.0394   Epoch: 7   Global Step: 308570   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:22:57,244-Speed 2633.02 samples/sec   Loss 8.3843   LearningRate 0.0394   Epoch: 7   Global Step: 308580   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:23:01,145-Speed 2625.60 samples/sec   Loss 8.4389   LearningRate 0.0394   Epoch: 7   Global Step: 308590   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:23:05,045-Speed 2625.98 samples/sec   Loss 8.5073   LearningRate 0.0394   Epoch: 7   Global Step: 308600   Fp16 Grad Scale: 16384   Required: 59 hours
Training: 2022-04-14 06:23:08,942-Speed 2628.52 samples/sec   Loss 8.4851   LearningRate 0.0394   Epoch: 7   Global Step: 308610   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:23:12,837-Speed 2629.81 samples/sec   Loss 8.4878   LearningRate 0.0394   Epoch: 7   Global Step: 308620   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:23:16,731-Speed 2630.41 samples/sec   Loss 8.5755   LearningRate 0.0394   Epoch: 7   Global Step: 308630   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:23:20,622-Speed 2632.53 samples/sec   Loss 8.3862   LearningRate 0.0394   Epoch: 7   Global Step: 308640   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:24,510-Speed 2634.45 samples/sec   Loss 8.5592   LearningRate 0.0394   Epoch: 7   Global Step: 308650   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:28,408-Speed 2627.16 samples/sec   Loss 8.5079   LearningRate 0.0394   Epoch: 7   Global Step: 308660   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:32,308-Speed 2626.72 samples/sec   Loss 8.5640   LearningRate 0.0394   Epoch: 7   Global Step: 308670   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:36,196-Speed 2634.31 samples/sec   Loss 8.5854   LearningRate 0.0394   Epoch: 7   Global Step: 308680   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:40,087-Speed 2632.49 samples/sec   Loss 8.4391   LearningRate 0.0394   Epoch: 7   Global Step: 308690   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:44,164-Speed 2511.83 samples/sec   Loss 8.5312   LearningRate 0.0394   Epoch: 7   Global Step: 308700   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:48,258-Speed 2502.44 samples/sec   Loss 8.4583   LearningRate 0.0394   Epoch: 7   Global Step: 308710   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:52,148-Speed 2633.00 samples/sec   Loss 8.5228   LearningRate 0.0394   Epoch: 7   Global Step: 308720   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:56,035-Speed 2637.76 samples/sec   Loss 8.3965   LearningRate 0.0394   Epoch: 7   Global Step: 308730   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:23:59,926-Speed 2632.57 samples/sec   Loss 8.4136   LearningRate 0.0394   Epoch: 7   Global Step: 308740   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:03,819-Speed 2630.85 samples/sec   Loss 8.4296   LearningRate 0.0394   Epoch: 7   Global Step: 308750   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:07,709-Speed 2633.23 samples/sec   Loss 8.4548   LearningRate 0.0394   Epoch: 7   Global Step: 308760   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:11,606-Speed 2627.93 samples/sec   Loss 8.6017   LearningRate 0.0394   Epoch: 7   Global Step: 308770   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:15,497-Speed 2632.31 samples/sec   Loss 8.4613   LearningRate 0.0394   Epoch: 7   Global Step: 308780   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:19,389-Speed 2631.82 samples/sec   Loss 8.4584   LearningRate 0.0394   Epoch: 7   Global Step: 308790   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:23,289-Speed 2626.92 samples/sec   Loss 8.5562   LearningRate 0.0394   Epoch: 7   Global Step: 308800   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:27,179-Speed 2632.39 samples/sec   Loss 8.4878   LearningRate 0.0394   Epoch: 7   Global Step: 308810   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:31,070-Speed 2632.56 samples/sec   Loss 8.5162   LearningRate 0.0394   Epoch: 7   Global Step: 308820   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:34,972-Speed 2624.36 samples/sec   Loss 8.4275   LearningRate 0.0394   Epoch: 7   Global Step: 308830   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:24:38,869-Speed 2628.91 samples/sec   Loss 8.4009   LearningRate 0.0394   Epoch: 7   Global Step: 308840   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:24:42,764-Speed 2630.22 samples/sec   Loss 8.4804   LearningRate 0.0394   Epoch: 7   Global Step: 308850   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:24:46,653-Speed 2633.28 samples/sec   Loss 8.5580   LearningRate 0.0394   Epoch: 7   Global Step: 308860   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:24:50,551-Speed 2627.71 samples/sec   Loss 8.5187   LearningRate 0.0394   Epoch: 7   Global Step: 308870   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:24:54,449-Speed 2627.97 samples/sec   Loss 8.4698   LearningRate 0.0394   Epoch: 7   Global Step: 308880   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:24:58,341-Speed 2631.29 samples/sec   Loss 8.5116   LearningRate 0.0394   Epoch: 7   Global Step: 308890   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:02,253-Speed 2617.91 samples/sec   Loss 8.4344   LearningRate 0.0394   Epoch: 7   Global Step: 308900   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:06,155-Speed 2625.37 samples/sec   Loss 8.4927   LearningRate 0.0394   Epoch: 7   Global Step: 308910   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:10,057-Speed 2624.68 samples/sec   Loss 8.4245   LearningRate 0.0394   Epoch: 7   Global Step: 308920   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:13,972-Speed 2616.07 samples/sec   Loss 8.5987   LearningRate 0.0394   Epoch: 7   Global Step: 308930   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:17,886-Speed 2617.27 samples/sec   Loss 8.4517   LearningRate 0.0394   Epoch: 7   Global Step: 308940   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:21,898-Speed 2552.88 samples/sec   Loss 8.5927   LearningRate 0.0394   Epoch: 7   Global Step: 308950   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:25,789-Speed 2632.81 samples/sec   Loss 8.4875   LearningRate 0.0394   Epoch: 7   Global Step: 308960   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:25:29,666-Speed 2641.66 samples/sec   Loss 8.5480   LearningRate 0.0394   Epoch: 7   Global Step: 308970   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:25:33,564-Speed 2627.61 samples/sec   Loss 8.4443   LearningRate 0.0394   Epoch: 7   Global Step: 308980   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:25:37,457-Speed 2630.80 samples/sec   Loss 8.5753   LearningRate 0.0394   Epoch: 7   Global Step: 308990   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:25:41,352-Speed 2629.94 samples/sec   Loss 8.5165   LearningRate 0.0394   Epoch: 7   Global Step: 309000   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:25:45,243-Speed 2631.90 samples/sec   Loss 8.4937   LearningRate 0.0394   Epoch: 7   Global Step: 309010   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:25:49,148-Speed 2623.27 samples/sec   Loss 8.4928   LearningRate 0.0394   Epoch: 7   Global Step: 309020   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:25:53,040-Speed 2631.22 samples/sec   Loss 8.4642   LearningRate 0.0394   Epoch: 7   Global Step: 309030   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:25:56,930-Speed 2633.83 samples/sec   Loss 8.5321   LearningRate 0.0394   Epoch: 7   Global Step: 309040   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:26:00,822-Speed 2631.33 samples/sec   Loss 8.4940   LearningRate 0.0394   Epoch: 7   Global Step: 309050   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:26:04,723-Speed 2625.89 samples/sec   Loss 8.4421   LearningRate 0.0394   Epoch: 7   Global Step: 309060   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:26:08,672-Speed 2593.28 samples/sec   Loss 8.4986   LearningRate 0.0394   Epoch: 7   Global Step: 309070   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:12,563-Speed 2632.58 samples/sec   Loss 8.4248   LearningRate 0.0394   Epoch: 7   Global Step: 309080   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:16,463-Speed 2626.32 samples/sec   Loss 8.4871   LearningRate 0.0394   Epoch: 7   Global Step: 309090   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:20,352-Speed 2634.11 samples/sec   Loss 8.4150   LearningRate 0.0394   Epoch: 7   Global Step: 309100   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:24,248-Speed 2629.28 samples/sec   Loss 8.4823   LearningRate 0.0394   Epoch: 7   Global Step: 309110   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:28,148-Speed 2626.39 samples/sec   Loss 8.5307   LearningRate 0.0394   Epoch: 7   Global Step: 309120   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:32,066-Speed 2613.80 samples/sec   Loss 8.4975   LearningRate 0.0394   Epoch: 7   Global Step: 309130   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:35,972-Speed 2622.49 samples/sec   Loss 8.4433   LearningRate 0.0394   Epoch: 7   Global Step: 309140   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:39,868-Speed 2628.52 samples/sec   Loss 8.4213   LearningRate 0.0394   Epoch: 7   Global Step: 309150   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:43,761-Speed 2631.66 samples/sec   Loss 8.4295   LearningRate 0.0394   Epoch: 7   Global Step: 309160   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:47,653-Speed 2631.62 samples/sec   Loss 8.3982   LearningRate 0.0394   Epoch: 7   Global Step: 309170   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:26:51,549-Speed 2629.28 samples/sec   Loss 8.3217   LearningRate 0.0394   Epoch: 7   Global Step: 309180   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:26:55,420-Speed 2645.61 samples/sec   Loss 8.4426   LearningRate 0.0393   Epoch: 7   Global Step: 309190   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:26:59,312-Speed 2631.83 samples/sec   Loss 8.4536   LearningRate 0.0393   Epoch: 7   Global Step: 309200   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:03,205-Speed 2631.01 samples/sec   Loss 8.6384   LearningRate 0.0393   Epoch: 7   Global Step: 309210   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:07,098-Speed 2631.41 samples/sec   Loss 8.4910   LearningRate 0.0393   Epoch: 7   Global Step: 309220   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:11,006-Speed 2620.53 samples/sec   Loss 8.3839   LearningRate 0.0393   Epoch: 7   Global Step: 309230   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:14,905-Speed 2627.20 samples/sec   Loss 8.4581   LearningRate 0.0393   Epoch: 7   Global Step: 309240   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:18,829-Speed 2610.65 samples/sec   Loss 8.6374   LearningRate 0.0393   Epoch: 7   Global Step: 309250   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:22,807-Speed 2574.27 samples/sec   Loss 8.4646   LearningRate 0.0393   Epoch: 7   Global Step: 309260   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:26,698-Speed 2633.16 samples/sec   Loss 8.4138   LearningRate 0.0393   Epoch: 7   Global Step: 309270   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:30,714-Speed 2550.18 samples/sec   Loss 8.3663   LearningRate 0.0393   Epoch: 7   Global Step: 309280   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:27:34,612-Speed 2627.77 samples/sec   Loss 8.5359   LearningRate 0.0393   Epoch: 7   Global Step: 309290   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:27:38,502-Speed 2632.64 samples/sec   Loss 8.4124   LearningRate 0.0393   Epoch: 7   Global Step: 309300   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:27:42,405-Speed 2624.50 samples/sec   Loss 8.3847   LearningRate 0.0393   Epoch: 7   Global Step: 309310   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:27:46,305-Speed 2626.31 samples/sec   Loss 8.4783   LearningRate 0.0393   Epoch: 7   Global Step: 309320   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:27:50,199-Speed 2630.22 samples/sec   Loss 8.5011   LearningRate 0.0393   Epoch: 7   Global Step: 309330   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:27:54,091-Speed 2631.86 samples/sec   Loss 8.5466   LearningRate 0.0393   Epoch: 7   Global Step: 309340   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:27:57,982-Speed 2633.01 samples/sec   Loss 8.4649   LearningRate 0.0393   Epoch: 7   Global Step: 309350   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:01,877-Speed 2629.09 samples/sec   Loss 8.5918   LearningRate 0.0393   Epoch: 7   Global Step: 309360   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:05,782-Speed 2623.18 samples/sec   Loss 8.4195   LearningRate 0.0393   Epoch: 7   Global Step: 309370   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:09,710-Speed 2607.56 samples/sec   Loss 8.4345   LearningRate 0.0393   Epoch: 7   Global Step: 309380   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:13,607-Speed 2628.36 samples/sec   Loss 8.4247   LearningRate 0.0393   Epoch: 7   Global Step: 309390   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:28:17,487-Speed 2639.58 samples/sec   Loss 8.5292   LearningRate 0.0393   Epoch: 7   Global Step: 309400   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:21,382-Speed 2629.52 samples/sec   Loss 8.3265   LearningRate 0.0393   Epoch: 7   Global Step: 309410   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:25,293-Speed 2619.54 samples/sec   Loss 8.5024   LearningRate 0.0393   Epoch: 7   Global Step: 309420   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:29,172-Speed 2639.99 samples/sec   Loss 8.3635   LearningRate 0.0393   Epoch: 7   Global Step: 309430   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:33,062-Speed 2633.34 samples/sec   Loss 8.4103   LearningRate 0.0393   Epoch: 7   Global Step: 309440   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:36,956-Speed 2630.47 samples/sec   Loss 8.4197   LearningRate 0.0393   Epoch: 7   Global Step: 309450   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:40,856-Speed 2626.38 samples/sec   Loss 8.5054   LearningRate 0.0393   Epoch: 7   Global Step: 309460   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:44,747-Speed 2631.88 samples/sec   Loss 8.4290   LearningRate 0.0393   Epoch: 7   Global Step: 309470   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:48,655-Speed 2621.01 samples/sec   Loss 8.5689   LearningRate 0.0393   Epoch: 7   Global Step: 309480   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:52,548-Speed 2631.08 samples/sec   Loss 8.4280   LearningRate 0.0393   Epoch: 7   Global Step: 309490   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:28:56,434-Speed 2635.73 samples/sec   Loss 8.5492   LearningRate 0.0393   Epoch: 7   Global Step: 309500   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:00,324-Speed 2632.97 samples/sec   Loss 8.5456   LearningRate 0.0393   Epoch: 7   Global Step: 309510   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:04,217-Speed 2631.12 samples/sec   Loss 8.4889   LearningRate 0.0393   Epoch: 7   Global Step: 309520   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:08,121-Speed 2623.75 samples/sec   Loss 8.4631   LearningRate 0.0393   Epoch: 7   Global Step: 309530   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:12,014-Speed 2630.83 samples/sec   Loss 8.5471   LearningRate 0.0393   Epoch: 7   Global Step: 309540   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:15,925-Speed 2619.34 samples/sec   Loss 8.5183   LearningRate 0.0393   Epoch: 7   Global Step: 309550   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:19,817-Speed 2631.81 samples/sec   Loss 8.6114   LearningRate 0.0393   Epoch: 7   Global Step: 309560   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:23,708-Speed 2632.31 samples/sec   Loss 8.5222   LearningRate 0.0393   Epoch: 7   Global Step: 309570   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:27,602-Speed 2630.68 samples/sec   Loss 8.4574   LearningRate 0.0393   Epoch: 7   Global Step: 309580   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:31,493-Speed 2632.15 samples/sec   Loss 8.4022   LearningRate 0.0393   Epoch: 7   Global Step: 309590   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:35,384-Speed 2632.11 samples/sec   Loss 8.5125   LearningRate 0.0393   Epoch: 7   Global Step: 309600   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:29:39,259-Speed 2643.20 samples/sec   Loss 8.5852   LearningRate 0.0393   Epoch: 7   Global Step: 309610   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:43,154-Speed 2629.86 samples/sec   Loss 8.5481   LearningRate 0.0393   Epoch: 7   Global Step: 309620   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:29:46,998-Speed 2665.23 samples/sec   Loss 8.8497   LearningRate 0.0393   Epoch: 7   Global Step: 309630   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:29:50,809-Speed 2686.99 samples/sec   Loss 8.9631   LearningRate 0.0393   Epoch: 7   Global Step: 309640   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:29:54,750-Speed 2599.62 samples/sec   Loss 9.9050   LearningRate 0.0393   Epoch: 7   Global Step: 309650   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:29:58,652-Speed 2624.64 samples/sec   Loss 9.2481   LearningRate 0.0393   Epoch: 7   Global Step: 309660   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:02,547-Speed 2630.11 samples/sec   Loss 8.6500   LearningRate 0.0393   Epoch: 7   Global Step: 309670   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:06,426-Speed 2640.55 samples/sec   Loss 8.6328   LearningRate 0.0393   Epoch: 7   Global Step: 309680   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:10,315-Speed 2633.70 samples/sec   Loss 8.5906   LearningRate 0.0393   Epoch: 7   Global Step: 309690   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:14,199-Speed 2637.15 samples/sec   Loss 8.3866   LearningRate 0.0393   Epoch: 7   Global Step: 309700   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:18,090-Speed 2632.32 samples/sec   Loss 8.5847   LearningRate 0.0393   Epoch: 7   Global Step: 309710   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:21,973-Speed 2637.40 samples/sec   Loss 8.5630   LearningRate 0.0393   Epoch: 7   Global Step: 309720   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:25,858-Speed 2636.44 samples/sec   Loss 8.4527   LearningRate 0.0393   Epoch: 7   Global Step: 309730   Fp16 Grad Scale: 1024   Required: 58 hours
Training: 2022-04-14 06:30:29,745-Speed 2635.72 samples/sec   Loss 8.6084   LearningRate 0.0393   Epoch: 7   Global Step: 309740   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:30:33,632-Speed 2636.31 samples/sec   Loss 8.5018   LearningRate 0.0393   Epoch: 7   Global Step: 309750   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:30:37,520-Speed 2633.70 samples/sec   Loss 8.5537   LearningRate 0.0393   Epoch: 7   Global Step: 309760   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:30:41,423-Speed 2624.41 samples/sec   Loss 8.5187   LearningRate 0.0393   Epoch: 7   Global Step: 309770   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:30:45,317-Speed 2630.65 samples/sec   Loss 8.4933   LearningRate 0.0393   Epoch: 7   Global Step: 309780   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:30:49,203-Speed 2635.93 samples/sec   Loss 8.4284   LearningRate 0.0393   Epoch: 7   Global Step: 309790   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:30:53,093-Speed 2633.08 samples/sec   Loss 8.4809   LearningRate 0.0393   Epoch: 7   Global Step: 309800   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:30:56,981-Speed 2634.24 samples/sec   Loss 8.4584   LearningRate 0.0393   Epoch: 7   Global Step: 309810   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:31:00,868-Speed 2634.98 samples/sec   Loss 8.4936   LearningRate 0.0393   Epoch: 7   Global Step: 309820   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:31:04,759-Speed 2632.50 samples/sec   Loss 8.5804   LearningRate 0.0393   Epoch: 7   Global Step: 309830   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:31:08,656-Speed 2628.43 samples/sec   Loss 8.5066   LearningRate 0.0393   Epoch: 7   Global Step: 309840   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:12,547-Speed 2632.26 samples/sec   Loss 8.4511   LearningRate 0.0392   Epoch: 7   Global Step: 309850   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:16,440-Speed 2631.48 samples/sec   Loss 8.5333   LearningRate 0.0392   Epoch: 7   Global Step: 309860   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:20,336-Speed 2628.91 samples/sec   Loss 8.5429   LearningRate 0.0392   Epoch: 7   Global Step: 309870   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:24,220-Speed 2636.96 samples/sec   Loss 8.4158   LearningRate 0.0392   Epoch: 7   Global Step: 309880   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:28,108-Speed 2635.20 samples/sec   Loss 8.5322   LearningRate 0.0392   Epoch: 7   Global Step: 309890   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:31,994-Speed 2635.58 samples/sec   Loss 8.4952   LearningRate 0.0392   Epoch: 7   Global Step: 309900   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:35,883-Speed 2633.77 samples/sec   Loss 8.5263   LearningRate 0.0392   Epoch: 7   Global Step: 309910   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:39,769-Speed 2635.86 samples/sec   Loss 8.4671   LearningRate 0.0392   Epoch: 7   Global Step: 309920   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:43,656-Speed 2635.37 samples/sec   Loss 8.5088   LearningRate 0.0392   Epoch: 7   Global Step: 309930   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:31:47,542-Speed 2635.45 samples/sec   Loss 8.5801   LearningRate 0.0392   Epoch: 7   Global Step: 309940   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:31:51,433-Speed 2633.54 samples/sec   Loss 8.4806   LearningRate 0.0392   Epoch: 7   Global Step: 309950   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:31:55,321-Speed 2633.95 samples/sec   Loss 8.5577   LearningRate 0.0392   Epoch: 7   Global Step: 309960   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:31:59,211-Speed 2633.66 samples/sec   Loss 8.5020   LearningRate 0.0392   Epoch: 7   Global Step: 309970   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:32:03,102-Speed 2632.13 samples/sec   Loss 8.4447   LearningRate 0.0392   Epoch: 7   Global Step: 309980   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:32:06,989-Speed 2634.90 samples/sec   Loss 8.5255   LearningRate 0.0392   Epoch: 7   Global Step: 309990   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:32:10,884-Speed 2629.29 samples/sec   Loss 8.4560   LearningRate 0.0392   Epoch: 7   Global Step: 310000   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:32:53,903-[lfw][310000]XNorm: 24.335987
Training: 2022-04-14 06:32:53,904-[lfw][310000]Accuracy-Flip: 0.99717+-0.00224
Training: 2022-04-14 06:32:53,905-[lfw][310000]Accuracy-Highest: 0.99783
Training: 2022-04-14 06:33:43,984-[cfp_fp][310000]XNorm: 22.351407
Training: 2022-04-14 06:33:43,985-[cfp_fp][310000]Accuracy-Flip: 0.98671+-0.00723
Training: 2022-04-14 06:33:43,986-[cfp_fp][310000]Accuracy-Highest: 0.98671
Training: 2022-04-14 06:34:27,092-[agedb_30][310000]XNorm: 24.160127
Training: 2022-04-14 06:34:27,093-[agedb_30][310000]Accuracy-Flip: 0.97483+-0.00736
Training: 2022-04-14 06:34:27,094-[agedb_30][310000]Accuracy-Highest: 0.97567
Training: 2022-04-14 06:34:30,951-Speed 73.11 samples/sec   Loss 8.4833   LearningRate 0.0392   Epoch: 7   Global Step: 310010   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:34:34,806-Speed 2656.60 samples/sec   Loss 8.4297   LearningRate 0.0392   Epoch: 7   Global Step: 310020   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:34:38,661-Speed 2656.92 samples/sec   Loss 8.4514   LearningRate 0.0392   Epoch: 7   Global Step: 310030   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:34:42,530-Speed 2647.73 samples/sec   Loss 8.3931   LearningRate 0.0392   Epoch: 7   Global Step: 310040   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:34:46,392-Speed 2651.85 samples/sec   Loss 8.4707   LearningRate 0.0392   Epoch: 7   Global Step: 310050   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:34:50,266-Speed 2644.73 samples/sec   Loss 8.4335   LearningRate 0.0392   Epoch: 7   Global Step: 310060   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:34:54,130-Speed 2651.87 samples/sec   Loss 8.5364   LearningRate 0.0392   Epoch: 7   Global Step: 310070   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:34:57,999-Speed 2647.05 samples/sec   Loss 8.3961   LearningRate 0.0392   Epoch: 7   Global Step: 310080   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:35:01,870-Speed 2646.24 samples/sec   Loss 8.4945   LearningRate 0.0392   Epoch: 7   Global Step: 310090   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:35:05,741-Speed 2645.42 samples/sec   Loss 8.4302   LearningRate 0.0392   Epoch: 7   Global Step: 310100   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:35:09,630-Speed 2633.88 samples/sec   Loss 8.3383   LearningRate 0.0392   Epoch: 7   Global Step: 310110   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:35:13,502-Speed 2645.21 samples/sec   Loss 8.5111   LearningRate 0.0392   Epoch: 7   Global Step: 310120   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:35:17,378-Speed 2642.66 samples/sec   Loss 8.5368   LearningRate 0.0392   Epoch: 7   Global Step: 310130   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:35:21,255-Speed 2642.28 samples/sec   Loss 8.4182   LearningRate 0.0392   Epoch: 7   Global Step: 310140   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:25,137-Speed 2638.13 samples/sec   Loss 8.4755   LearningRate 0.0392   Epoch: 7   Global Step: 310150   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:29,061-Speed 2610.54 samples/sec   Loss 8.5016   LearningRate 0.0392   Epoch: 7   Global Step: 310160   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:32,952-Speed 2632.84 samples/sec   Loss 8.5592   LearningRate 0.0392   Epoch: 7   Global Step: 310170   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:36,844-Speed 2631.26 samples/sec   Loss 8.5650   LearningRate 0.0392   Epoch: 7   Global Step: 310180   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:40,735-Speed 2632.18 samples/sec   Loss 8.4760   LearningRate 0.0392   Epoch: 7   Global Step: 310190   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:44,624-Speed 2634.45 samples/sec   Loss 8.4790   LearningRate 0.0392   Epoch: 7   Global Step: 310200   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:48,520-Speed 2628.88 samples/sec   Loss 8.3787   LearningRate 0.0392   Epoch: 7   Global Step: 310210   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:35:52,387-Speed 2648.45 samples/sec   Loss 9.4608   LearningRate 0.0392   Epoch: 7   Global Step: 310220   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:35:56,274-Speed 2634.98 samples/sec   Loss 9.1655   LearningRate 0.0392   Epoch: 7   Global Step: 310230   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:00,165-Speed 2633.05 samples/sec   Loss 8.7293   LearningRate 0.0392   Epoch: 7   Global Step: 310240   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:04,051-Speed 2635.47 samples/sec   Loss 8.5570   LearningRate 0.0392   Epoch: 7   Global Step: 310250   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:07,941-Speed 2632.87 samples/sec   Loss 8.5848   LearningRate 0.0392   Epoch: 7   Global Step: 310260   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:11,828-Speed 2635.52 samples/sec   Loss 8.4708   LearningRate 0.0392   Epoch: 7   Global Step: 310270   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:15,711-Speed 2637.59 samples/sec   Loss 8.5518   LearningRate 0.0392   Epoch: 7   Global Step: 310280   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:19,603-Speed 2632.09 samples/sec   Loss 8.5161   LearningRate 0.0392   Epoch: 7   Global Step: 310290   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:23,492-Speed 2633.54 samples/sec   Loss 8.4514   LearningRate 0.0392   Epoch: 7   Global Step: 310300   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:27,379-Speed 2635.63 samples/sec   Loss 8.4196   LearningRate 0.0392   Epoch: 7   Global Step: 310310   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:36:31,267-Speed 2634.19 samples/sec   Loss 8.4066   LearningRate 0.0392   Epoch: 7   Global Step: 310320   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:36:35,154-Speed 2635.19 samples/sec   Loss 8.4794   LearningRate 0.0392   Epoch: 7   Global Step: 310330   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:36:39,045-Speed 2632.21 samples/sec   Loss 8.4514   LearningRate 0.0392   Epoch: 7   Global Step: 310340   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:36:42,933-Speed 2634.20 samples/sec   Loss 8.4282   LearningRate 0.0392   Epoch: 7   Global Step: 310350   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:36:46,820-Speed 2635.27 samples/sec   Loss 8.3427   LearningRate 0.0392   Epoch: 7   Global Step: 310360   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:36:50,721-Speed 2625.85 samples/sec   Loss 8.6256   LearningRate 0.0392   Epoch: 7   Global Step: 310370   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:36:54,607-Speed 2636.22 samples/sec   Loss 8.4908   LearningRate 0.0392   Epoch: 7   Global Step: 310380   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:36:58,493-Speed 2635.66 samples/sec   Loss 8.4220   LearningRate 0.0392   Epoch: 7   Global Step: 310390   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:37:02,401-Speed 2621.20 samples/sec   Loss 8.3828   LearningRate 0.0392   Epoch: 7   Global Step: 310400   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:37:06,293-Speed 2631.27 samples/sec   Loss 8.3579   LearningRate 0.0392   Epoch: 7   Global Step: 310410   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:37:10,182-Speed 2633.94 samples/sec   Loss 8.4408   LearningRate 0.0392   Epoch: 7   Global Step: 310420   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:14,074-Speed 2631.18 samples/sec   Loss 8.4449   LearningRate 0.0392   Epoch: 7   Global Step: 310430   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:17,963-Speed 2633.95 samples/sec   Loss 8.4036   LearningRate 0.0392   Epoch: 7   Global Step: 310440   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:21,855-Speed 2631.36 samples/sec   Loss 8.5152   LearningRate 0.0392   Epoch: 7   Global Step: 310450   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:25,750-Speed 2630.15 samples/sec   Loss 8.5218   LearningRate 0.0392   Epoch: 7   Global Step: 310460   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:29,645-Speed 2629.31 samples/sec   Loss 8.4608   LearningRate 0.0392   Epoch: 7   Global Step: 310470   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:33,543-Speed 2627.64 samples/sec   Loss 8.4746   LearningRate 0.0392   Epoch: 7   Global Step: 310480   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:37,454-Speed 2619.12 samples/sec   Loss 8.5441   LearningRate 0.0392   Epoch: 7   Global Step: 310490   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:41,371-Speed 2614.84 samples/sec   Loss 8.5683   LearningRate 0.0392   Epoch: 7   Global Step: 310500   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:45,269-Speed 2627.93 samples/sec   Loss 8.4970   LearningRate 0.0392   Epoch: 7   Global Step: 310510   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:37:49,160-Speed 2631.99 samples/sec   Loss 8.5016   LearningRate 0.0391   Epoch: 7   Global Step: 310520   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:37:53,055-Speed 2629.45 samples/sec   Loss 8.3966   LearningRate 0.0391   Epoch: 7   Global Step: 310530   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:37:56,952-Speed 2628.55 samples/sec   Loss 8.3233   LearningRate 0.0391   Epoch: 7   Global Step: 310540   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:00,844-Speed 2631.51 samples/sec   Loss 8.5658   LearningRate 0.0391   Epoch: 7   Global Step: 310550   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:04,749-Speed 2622.93 samples/sec   Loss 8.4176   LearningRate 0.0391   Epoch: 7   Global Step: 310560   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:08,645-Speed 2629.18 samples/sec   Loss 8.4756   LearningRate 0.0391   Epoch: 7   Global Step: 310570   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:12,580-Speed 2602.77 samples/sec   Loss 8.4438   LearningRate 0.0391   Epoch: 7   Global Step: 310580   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:16,492-Speed 2618.23 samples/sec   Loss 8.4199   LearningRate 0.0391   Epoch: 7   Global Step: 310590   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:20,391-Speed 2627.40 samples/sec   Loss 8.5280   LearningRate 0.0391   Epoch: 7   Global Step: 310600   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:24,292-Speed 2625.64 samples/sec   Loss 8.4418   LearningRate 0.0391   Epoch: 7   Global Step: 310610   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:38:28,204-Speed 2617.82 samples/sec   Loss 8.4298   LearningRate 0.0391   Epoch: 7   Global Step: 310620   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:32,245-Speed 2534.45 samples/sec   Loss 8.4175   LearningRate 0.0391   Epoch: 7   Global Step: 310630   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:36,143-Speed 2627.53 samples/sec   Loss 8.5185   LearningRate 0.0391   Epoch: 7   Global Step: 310640   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:40,056-Speed 2618.37 samples/sec   Loss 8.4457   LearningRate 0.0391   Epoch: 7   Global Step: 310650   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:43,981-Speed 2609.07 samples/sec   Loss 8.4389   LearningRate 0.0391   Epoch: 7   Global Step: 310660   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:47,897-Speed 2616.27 samples/sec   Loss 8.4078   LearningRate 0.0391   Epoch: 7   Global Step: 310670   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:51,875-Speed 2574.62 samples/sec   Loss 8.4601   LearningRate 0.0391   Epoch: 7   Global Step: 310680   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:55,799-Speed 2610.77 samples/sec   Loss 8.4157   LearningRate 0.0391   Epoch: 7   Global Step: 310690   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:38:59,694-Speed 2629.43 samples/sec   Loss 8.4592   LearningRate 0.0391   Epoch: 7   Global Step: 310700   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:03,591-Speed 2628.03 samples/sec   Loss 8.3541   LearningRate 0.0391   Epoch: 7   Global Step: 310710   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:07,494-Speed 2623.60 samples/sec   Loss 8.4443   LearningRate 0.0391   Epoch: 7   Global Step: 310720   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:39:11,392-Speed 2628.05 samples/sec   Loss 8.4224   LearningRate 0.0391   Epoch: 7   Global Step: 310730   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:39:15,291-Speed 2627.25 samples/sec   Loss 8.3854   LearningRate 0.0391   Epoch: 7   Global Step: 310740   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:39:19,196-Speed 2623.42 samples/sec   Loss 8.4832   LearningRate 0.0391   Epoch: 7   Global Step: 310750   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:39:23,113-Speed 2614.22 samples/sec   Loss 8.4698   LearningRate 0.0391   Epoch: 7   Global Step: 310760   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:39:27,036-Speed 2611.28 samples/sec   Loss 8.4268   LearningRate 0.0391   Epoch: 7   Global Step: 310770   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:39:30,935-Speed 2627.13 samples/sec   Loss 8.4991   LearningRate 0.0391   Epoch: 7   Global Step: 310780   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:34,869-Speed 2603.56 samples/sec   Loss 8.4699   LearningRate 0.0391   Epoch: 7   Global Step: 310790   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:38,778-Speed 2620.20 samples/sec   Loss 8.3524   LearningRate 0.0391   Epoch: 7   Global Step: 310800   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:42,679-Speed 2625.91 samples/sec   Loss 8.4687   LearningRate 0.0391   Epoch: 7   Global Step: 310810   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:46,586-Speed 2621.66 samples/sec   Loss 8.3580   LearningRate 0.0391   Epoch: 7   Global Step: 310820   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:50,485-Speed 2626.71 samples/sec   Loss 8.5220   LearningRate 0.0391   Epoch: 7   Global Step: 310830   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:54,384-Speed 2626.66 samples/sec   Loss 8.5933   LearningRate 0.0391   Epoch: 7   Global Step: 310840   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:39:58,288-Speed 2624.22 samples/sec   Loss 8.4849   LearningRate 0.0391   Epoch: 7   Global Step: 310850   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:40:02,204-Speed 2615.44 samples/sec   Loss 8.4485   LearningRate 0.0391   Epoch: 7   Global Step: 310860   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:40:06,106-Speed 2624.85 samples/sec   Loss 8.5360   LearningRate 0.0391   Epoch: 7   Global Step: 310870   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:40:10,004-Speed 2627.69 samples/sec   Loss 8.4456   LearningRate 0.0391   Epoch: 7   Global Step: 310880   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:40:13,917-Speed 2617.77 samples/sec   Loss 8.3758   LearningRate 0.0391   Epoch: 7   Global Step: 310890   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:40:17,812-Speed 2629.64 samples/sec   Loss 8.4242   LearningRate 0.0391   Epoch: 7   Global Step: 310900   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:40:21,721-Speed 2620.16 samples/sec   Loss 8.4243   LearningRate 0.0391   Epoch: 7   Global Step: 310910   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:40:25,660-Speed 2600.27 samples/sec   Loss 8.4565   LearningRate 0.0391   Epoch: 7   Global Step: 310920   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:40:29,553-Speed 2631.25 samples/sec   Loss 8.4456   LearningRate 0.0391   Epoch: 7   Global Step: 310930   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:40:33,451-Speed 2627.89 samples/sec   Loss 8.4747   LearningRate 0.0391   Epoch: 7   Global Step: 310940   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:40:37,349-Speed 2627.41 samples/sec   Loss 8.3616   LearningRate 0.0391   Epoch: 7   Global Step: 310950   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:40:41,250-Speed 2625.21 samples/sec   Loss 8.4452   LearningRate 0.0391   Epoch: 7   Global Step: 310960   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:40:45,155-Speed 2623.70 samples/sec   Loss 8.4052   LearningRate 0.0391   Epoch: 7   Global Step: 310970   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:40:49,055-Speed 2625.96 samples/sec   Loss 8.4364   LearningRate 0.0391   Epoch: 7   Global Step: 310980   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:40:52,954-Speed 2626.93 samples/sec   Loss 8.2832   LearningRate 0.0391   Epoch: 7   Global Step: 310990   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:40:56,855-Speed 2625.47 samples/sec   Loss 8.4401   LearningRate 0.0391   Epoch: 7   Global Step: 311000   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:00,773-Speed 2615.16 samples/sec   Loss 8.5486   LearningRate 0.0391   Epoch: 7   Global Step: 311010   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:04,677-Speed 2623.27 samples/sec   Loss 8.4415   LearningRate 0.0391   Epoch: 7   Global Step: 311020   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:08,580-Speed 2624.11 samples/sec   Loss 8.2420   LearningRate 0.0391   Epoch: 7   Global Step: 311030   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:41:12,480-Speed 2626.50 samples/sec   Loss 8.5357   LearningRate 0.0391   Epoch: 7   Global Step: 311040   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:41:16,359-Speed 2640.07 samples/sec   Loss 8.4349   LearningRate 0.0391   Epoch: 7   Global Step: 311050   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:20,268-Speed 2621.22 samples/sec   Loss 8.3645   LearningRate 0.0391   Epoch: 7   Global Step: 311060   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:24,168-Speed 2626.30 samples/sec   Loss 8.3454   LearningRate 0.0391   Epoch: 7   Global Step: 311070   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:28,064-Speed 2629.24 samples/sec   Loss 8.3852   LearningRate 0.0391   Epoch: 7   Global Step: 311080   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:31,962-Speed 2627.53 samples/sec   Loss 8.5385   LearningRate 0.0391   Epoch: 7   Global Step: 311090   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:35,882-Speed 2612.80 samples/sec   Loss 8.5042   LearningRate 0.0391   Epoch: 7   Global Step: 311100   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:39,778-Speed 2629.17 samples/sec   Loss 8.4377   LearningRate 0.0391   Epoch: 7   Global Step: 311110   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:43,693-Speed 2616.23 samples/sec   Loss 8.3575   LearningRate 0.0391   Epoch: 7   Global Step: 311120   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:47,587-Speed 2630.31 samples/sec   Loss 8.3399   LearningRate 0.0391   Epoch: 7   Global Step: 311130   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:51,492-Speed 2622.86 samples/sec   Loss 8.2684   LearningRate 0.0391   Epoch: 7   Global Step: 311140   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:41:55,397-Speed 2622.49 samples/sec   Loss 8.5362   LearningRate 0.0391   Epoch: 7   Global Step: 311150   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:41:59,306-Speed 2620.74 samples/sec   Loss 8.6516   LearningRate 0.0391   Epoch: 7   Global Step: 311160   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:03,211-Speed 2622.77 samples/sec   Loss 8.3665   LearningRate 0.0391   Epoch: 7   Global Step: 311170   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:07,115-Speed 2623.89 samples/sec   Loss 8.3676   LearningRate 0.0390   Epoch: 7   Global Step: 311180   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:11,019-Speed 2623.19 samples/sec   Loss 8.3812   LearningRate 0.0390   Epoch: 7   Global Step: 311190   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:14,930-Speed 2618.70 samples/sec   Loss 8.3970   LearningRate 0.0390   Epoch: 7   Global Step: 311200   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:18,843-Speed 2617.52 samples/sec   Loss 8.4444   LearningRate 0.0390   Epoch: 7   Global Step: 311210   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:22,752-Speed 2619.98 samples/sec   Loss 8.4633   LearningRate 0.0390   Epoch: 7   Global Step: 311220   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:26,666-Speed 2617.83 samples/sec   Loss 8.3880   LearningRate 0.0390   Epoch: 7   Global Step: 311230   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:30,580-Speed 2616.78 samples/sec   Loss 8.4947   LearningRate 0.0390   Epoch: 7   Global Step: 311240   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:34,478-Speed 2627.71 samples/sec   Loss 8.5214   LearningRate 0.0390   Epoch: 7   Global Step: 311250   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:42:38,357-Speed 2640.74 samples/sec   Loss 8.2512   LearningRate 0.0390   Epoch: 7   Global Step: 311260   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:42,254-Speed 2627.66 samples/sec   Loss 8.4475   LearningRate 0.0390   Epoch: 7   Global Step: 311270   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:46,155-Speed 2625.80 samples/sec   Loss 8.4358   LearningRate 0.0390   Epoch: 7   Global Step: 311280   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:50,068-Speed 2617.90 samples/sec   Loss 8.3676   LearningRate 0.0390   Epoch: 7   Global Step: 311290   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:53,985-Speed 2614.47 samples/sec   Loss 8.4807   LearningRate 0.0390   Epoch: 7   Global Step: 311300   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:42:57,907-Speed 2611.87 samples/sec   Loss 8.4474   LearningRate 0.0390   Epoch: 7   Global Step: 311310   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:43:01,858-Speed 2592.25 samples/sec   Loss 8.3757   LearningRate 0.0390   Epoch: 7   Global Step: 311320   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:43:05,894-Speed 2538.11 samples/sec   Loss 8.4751   LearningRate 0.0390   Epoch: 7   Global Step: 311330   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:09,815-Speed 2612.31 samples/sec   Loss 8.3552   LearningRate 0.0390   Epoch: 7   Global Step: 311340   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:13,717-Speed 2625.13 samples/sec   Loss 8.4421   LearningRate 0.0390   Epoch: 7   Global Step: 311350   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:17,619-Speed 2624.37 samples/sec   Loss 8.2570   LearningRate 0.0390   Epoch: 7   Global Step: 311360   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:21,530-Speed 2618.94 samples/sec   Loss 8.5122   LearningRate 0.0390   Epoch: 7   Global Step: 311370   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:25,436-Speed 2622.29 samples/sec   Loss 8.4127   LearningRate 0.0390   Epoch: 7   Global Step: 311380   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:29,341-Speed 2622.80 samples/sec   Loss 8.5092   LearningRate 0.0390   Epoch: 7   Global Step: 311390   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:33,240-Speed 2627.58 samples/sec   Loss 8.5100   LearningRate 0.0390   Epoch: 7   Global Step: 311400   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:37,138-Speed 2627.54 samples/sec   Loss 8.4577   LearningRate 0.0390   Epoch: 7   Global Step: 311410   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:41,038-Speed 2626.53 samples/sec   Loss 8.4460   LearningRate 0.0390   Epoch: 7   Global Step: 311420   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:43:44,938-Speed 2626.34 samples/sec   Loss 8.4963   LearningRate 0.0390   Epoch: 7   Global Step: 311430   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:43:48,837-Speed 2627.10 samples/sec   Loss 8.3741   LearningRate 0.0390   Epoch: 7   Global Step: 311440   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:43:52,736-Speed 2626.44 samples/sec   Loss 8.4036   LearningRate 0.0390   Epoch: 7   Global Step: 311450   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:43:56,664-Speed 2608.13 samples/sec   Loss 8.5723   LearningRate 0.0390   Epoch: 7   Global Step: 311460   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:44:00,586-Speed 2611.67 samples/sec   Loss 8.4579   LearningRate 0.0390   Epoch: 7   Global Step: 311470   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:44:04,491-Speed 2623.18 samples/sec   Loss 8.4031   LearningRate 0.0390   Epoch: 7   Global Step: 311480   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:44:08,423-Speed 2604.63 samples/sec   Loss 8.3330   LearningRate 0.0390   Epoch: 7   Global Step: 311490   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:44:12,337-Speed 2616.69 samples/sec   Loss 8.5702   LearningRate 0.0390   Epoch: 7   Global Step: 311500   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:44:16,250-Speed 2617.70 samples/sec   Loss 8.3518   LearningRate 0.0390   Epoch: 7   Global Step: 311510   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:44:20,160-Speed 2622.24 samples/sec   Loss 8.5166   LearningRate 0.0390   Epoch: 7   Global Step: 311520   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:44:24,079-Speed 2613.34 samples/sec   Loss 8.4391   LearningRate 0.0390   Epoch: 7   Global Step: 311530   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:44:27,995-Speed 2615.62 samples/sec   Loss 8.3858   LearningRate 0.0390   Epoch: 7   Global Step: 311540   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:44:31,898-Speed 2623.92 samples/sec   Loss 8.3988   LearningRate 0.0390   Epoch: 7   Global Step: 311550   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:44:35,800-Speed 2625.50 samples/sec   Loss 8.6242   LearningRate 0.0390   Epoch: 7   Global Step: 311560   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:44:39,701-Speed 2626.17 samples/sec   Loss 8.4315   LearningRate 0.0390   Epoch: 7   Global Step: 311570   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:44:43,602-Speed 2625.23 samples/sec   Loss 8.3229   LearningRate 0.0390   Epoch: 7   Global Step: 311580   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:44:47,468-Speed 2649.60 samples/sec   Loss 8.4096   LearningRate 0.0390   Epoch: 7   Global Step: 311590   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:44:51,265-Speed 2697.38 samples/sec   Loss 9.7189   LearningRate 0.0390   Epoch: 7   Global Step: 311600   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:44:55,170-Speed 2623.43 samples/sec   Loss 10.1570   LearningRate 0.0390   Epoch: 7   Global Step: 311610   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:44:59,066-Speed 2629.18 samples/sec   Loss 8.9240   LearningRate 0.0390   Epoch: 7   Global Step: 311620   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:03,054-Speed 2567.61 samples/sec   Loss 8.6603   LearningRate 0.0390   Epoch: 7   Global Step: 311630   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:06,954-Speed 2626.56 samples/sec   Loss 8.5170   LearningRate 0.0390   Epoch: 7   Global Step: 311640   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:10,851-Speed 2628.45 samples/sec   Loss 8.5118   LearningRate 0.0390   Epoch: 7   Global Step: 311650   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:14,752-Speed 2626.90 samples/sec   Loss 8.4128   LearningRate 0.0390   Epoch: 7   Global Step: 311660   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:18,714-Speed 2584.79 samples/sec   Loss 8.4458   LearningRate 0.0390   Epoch: 7   Global Step: 311670   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:22,610-Speed 2629.13 samples/sec   Loss 8.4683   LearningRate 0.0390   Epoch: 7   Global Step: 311680   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:26,517-Speed 2621.08 samples/sec   Loss 8.4163   LearningRate 0.0390   Epoch: 7   Global Step: 311690   Fp16 Grad Scale: 2048   Required: 58 hours
Training: 2022-04-14 06:45:30,415-Speed 2628.29 samples/sec   Loss 8.4282   LearningRate 0.0390   Epoch: 7   Global Step: 311700   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:45:34,318-Speed 2623.87 samples/sec   Loss 8.4193   LearningRate 0.0390   Epoch: 7   Global Step: 311710   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:45:38,217-Speed 2626.90 samples/sec   Loss 8.5137   LearningRate 0.0390   Epoch: 7   Global Step: 311720   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:45:42,121-Speed 2623.28 samples/sec   Loss 8.3824   LearningRate 0.0390   Epoch: 7   Global Step: 311730   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:45:46,010-Speed 2634.56 samples/sec   Loss 8.6685   LearningRate 0.0390   Epoch: 7   Global Step: 311740   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:45:49,909-Speed 2627.16 samples/sec   Loss 8.4228   LearningRate 0.0390   Epoch: 7   Global Step: 311750   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:45:53,808-Speed 2626.82 samples/sec   Loss 8.5446   LearningRate 0.0390   Epoch: 7   Global Step: 311760   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:45:57,707-Speed 2627.66 samples/sec   Loss 8.3771   LearningRate 0.0390   Epoch: 7   Global Step: 311770   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:46:01,604-Speed 2628.16 samples/sec   Loss 8.3709   LearningRate 0.0390   Epoch: 7   Global Step: 311780   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:46:05,505-Speed 2625.23 samples/sec   Loss 8.3808   LearningRate 0.0390   Epoch: 7   Global Step: 311790   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 06:46:09,425-Speed 2612.83 samples/sec   Loss 8.4147   LearningRate 0.0390   Epoch: 7   Global Step: 311800   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:13,554-Speed 2480.69 samples/sec   Loss 8.3914   LearningRate 0.0390   Epoch: 7   Global Step: 311810   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:17,688-Speed 2477.33 samples/sec   Loss 8.5301   LearningRate 0.0390   Epoch: 7   Global Step: 311820   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:21,675-Speed 2569.53 samples/sec   Loss 8.3412   LearningRate 0.0390   Epoch: 7   Global Step: 311830   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:25,750-Speed 2512.96 samples/sec   Loss 8.5161   LearningRate 0.0389   Epoch: 7   Global Step: 311840   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:29,694-Speed 2597.65 samples/sec   Loss 8.5450   LearningRate 0.0389   Epoch: 7   Global Step: 311850   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:33,590-Speed 2628.63 samples/sec   Loss 8.5380   LearningRate 0.0389   Epoch: 7   Global Step: 311860   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:37,491-Speed 2625.54 samples/sec   Loss 8.4247   LearningRate 0.0389   Epoch: 7   Global Step: 311870   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:41,397-Speed 2622.01 samples/sec   Loss 8.3437   LearningRate 0.0389   Epoch: 7   Global Step: 311880   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:45,299-Speed 2625.57 samples/sec   Loss 8.4183   LearningRate 0.0389   Epoch: 7   Global Step: 311890   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:46:49,202-Speed 2624.14 samples/sec   Loss 8.3847   LearningRate 0.0389   Epoch: 7   Global Step: 311900   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:46:53,110-Speed 2621.58 samples/sec   Loss 8.6306   LearningRate 0.0389   Epoch: 7   Global Step: 311910   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:46:57,013-Speed 2624.30 samples/sec   Loss 8.4912   LearningRate 0.0389   Epoch: 7   Global Step: 311920   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:00,930-Speed 2614.82 samples/sec   Loss 8.3079   LearningRate 0.0389   Epoch: 7   Global Step: 311930   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:04,831-Speed 2625.45 samples/sec   Loss 8.4149   LearningRate 0.0389   Epoch: 7   Global Step: 311940   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:08,732-Speed 2625.67 samples/sec   Loss 8.4739   LearningRate 0.0389   Epoch: 7   Global Step: 311950   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:12,638-Speed 2622.70 samples/sec   Loss 8.4635   LearningRate 0.0389   Epoch: 7   Global Step: 311960   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:16,541-Speed 2624.01 samples/sec   Loss 8.4842   LearningRate 0.0389   Epoch: 7   Global Step: 311970   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:20,448-Speed 2622.05 samples/sec   Loss 8.3876   LearningRate 0.0389   Epoch: 7   Global Step: 311980   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:24,348-Speed 2626.06 samples/sec   Loss 8.5130   LearningRate 0.0389   Epoch: 7   Global Step: 311990   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:47:28,255-Speed 2621.80 samples/sec   Loss 8.3572   LearningRate 0.0389   Epoch: 7   Global Step: 312000   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:32,162-Speed 2621.89 samples/sec   Loss 8.4163   LearningRate 0.0389   Epoch: 7   Global Step: 312010   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:36,071-Speed 2619.55 samples/sec   Loss 8.4925   LearningRate 0.0389   Epoch: 7   Global Step: 312020   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:39,986-Speed 2616.15 samples/sec   Loss 8.4854   LearningRate 0.0389   Epoch: 7   Global Step: 312030   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:43,903-Speed 2615.42 samples/sec   Loss 8.4175   LearningRate 0.0389   Epoch: 7   Global Step: 312040   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:47,815-Speed 2617.95 samples/sec   Loss 8.4683   LearningRate 0.0389   Epoch: 7   Global Step: 312050   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:51,888-Speed 2514.90 samples/sec   Loss 8.4024   LearningRate 0.0389   Epoch: 7   Global Step: 312060   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:55,970-Speed 2509.06 samples/sec   Loss 8.2676   LearningRate 0.0389   Epoch: 7   Global Step: 312070   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:47:59,899-Speed 2607.22 samples/sec   Loss 8.3406   LearningRate 0.0389   Epoch: 7   Global Step: 312080   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:48:03,809-Speed 2619.16 samples/sec   Loss 8.3854   LearningRate 0.0389   Epoch: 7   Global Step: 312090   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:48:07,726-Speed 2615.16 samples/sec   Loss 8.3958   LearningRate 0.0389   Epoch: 7   Global Step: 312100   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:11,639-Speed 2617.19 samples/sec   Loss 8.4724   LearningRate 0.0389   Epoch: 7   Global Step: 312110   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:15,549-Speed 2620.37 samples/sec   Loss 8.4581   LearningRate 0.0389   Epoch: 7   Global Step: 312120   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:19,460-Speed 2618.35 samples/sec   Loss 8.4161   LearningRate 0.0389   Epoch: 7   Global Step: 312130   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:23,369-Speed 2620.20 samples/sec   Loss 8.3917   LearningRate 0.0389   Epoch: 7   Global Step: 312140   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:27,275-Speed 2622.79 samples/sec   Loss 8.3975   LearningRate 0.0389   Epoch: 7   Global Step: 312150   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:31,175-Speed 2626.27 samples/sec   Loss 8.4877   LearningRate 0.0389   Epoch: 7   Global Step: 312160   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:35,079-Speed 2623.66 samples/sec   Loss 8.3622   LearningRate 0.0389   Epoch: 7   Global Step: 312170   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:39,014-Speed 2603.13 samples/sec   Loss 8.5725   LearningRate 0.0389   Epoch: 7   Global Step: 312180   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:42,925-Speed 2618.47 samples/sec   Loss 8.5494   LearningRate 0.0389   Epoch: 7   Global Step: 312190   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:48:46,836-Speed 2618.76 samples/sec   Loss 8.3946   LearningRate 0.0389   Epoch: 7   Global Step: 312200   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:48:50,762-Speed 2609.37 samples/sec   Loss 8.4427   LearningRate 0.0389   Epoch: 7   Global Step: 312210   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:48:54,665-Speed 2624.41 samples/sec   Loss 8.4159   LearningRate 0.0389   Epoch: 7   Global Step: 312220   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:48:58,596-Speed 2606.01 samples/sec   Loss 8.4530   LearningRate 0.0389   Epoch: 7   Global Step: 312230   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:02,528-Speed 2604.92 samples/sec   Loss 8.3843   LearningRate 0.0389   Epoch: 7   Global Step: 312240   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:06,428-Speed 2626.55 samples/sec   Loss 8.4410   LearningRate 0.0389   Epoch: 7   Global Step: 312250   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:10,336-Speed 2620.83 samples/sec   Loss 8.4254   LearningRate 0.0389   Epoch: 7   Global Step: 312260   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:14,238-Speed 2624.78 samples/sec   Loss 8.4932   LearningRate 0.0389   Epoch: 7   Global Step: 312270   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:18,140-Speed 2625.09 samples/sec   Loss 8.4168   LearningRate 0.0389   Epoch: 7   Global Step: 312280   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:22,056-Speed 2615.53 samples/sec   Loss 8.4995   LearningRate 0.0389   Epoch: 7   Global Step: 312290   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:25,966-Speed 2619.91 samples/sec   Loss 8.4804   LearningRate 0.0389   Epoch: 7   Global Step: 312300   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:49:29,852-Speed 2635.70 samples/sec   Loss 8.5981   LearningRate 0.0389   Epoch: 7   Global Step: 312310   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:33,765-Speed 2617.68 samples/sec   Loss 8.4015   LearningRate 0.0389   Epoch: 7   Global Step: 312320   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:37,661-Speed 2629.19 samples/sec   Loss 8.4235   LearningRate 0.0389   Epoch: 7   Global Step: 312330   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:41,564-Speed 2624.53 samples/sec   Loss 8.4754   LearningRate 0.0389   Epoch: 7   Global Step: 312340   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:45,474-Speed 2619.26 samples/sec   Loss 8.5315   LearningRate 0.0389   Epoch: 7   Global Step: 312350   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:49,388-Speed 2616.58 samples/sec   Loss 8.2924   LearningRate 0.0389   Epoch: 7   Global Step: 312360   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:53,294-Speed 2622.66 samples/sec   Loss 8.3392   LearningRate 0.0389   Epoch: 7   Global Step: 312370   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:49:57,200-Speed 2622.58 samples/sec   Loss 8.4595   LearningRate 0.0389   Epoch: 7   Global Step: 312380   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:01,106-Speed 2622.35 samples/sec   Loss 8.5479   LearningRate 0.0389   Epoch: 7   Global Step: 312390   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:05,019-Speed 2617.66 samples/sec   Loss 8.5559   LearningRate 0.0389   Epoch: 7   Global Step: 312400   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:08,927-Speed 2621.45 samples/sec   Loss 8.4579   LearningRate 0.0389   Epoch: 7   Global Step: 312410   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:12,829-Speed 2624.44 samples/sec   Loss 8.2464   LearningRate 0.0389   Epoch: 7   Global Step: 312420   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:16,734-Speed 2622.79 samples/sec   Loss 8.3115   LearningRate 0.0389   Epoch: 7   Global Step: 312430   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:20,633-Speed 2627.39 samples/sec   Loss 8.4618   LearningRate 0.0389   Epoch: 7   Global Step: 312440   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:24,534-Speed 2625.92 samples/sec   Loss 8.5139   LearningRate 0.0389   Epoch: 7   Global Step: 312450   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:28,432-Speed 2627.22 samples/sec   Loss 8.4921   LearningRate 0.0389   Epoch: 7   Global Step: 312460   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:32,335-Speed 2624.53 samples/sec   Loss 8.4759   LearningRate 0.0389   Epoch: 7   Global Step: 312470   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:36,244-Speed 2619.96 samples/sec   Loss 8.4726   LearningRate 0.0389   Epoch: 7   Global Step: 312480   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:40,154-Speed 2619.60 samples/sec   Loss 8.4313   LearningRate 0.0389   Epoch: 7   Global Step: 312490   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:44,061-Speed 2621.87 samples/sec   Loss 8.3497   LearningRate 0.0389   Epoch: 7   Global Step: 312500   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:47,953-Speed 2631.89 samples/sec   Loss 8.4629   LearningRate 0.0388   Epoch: 7   Global Step: 312510   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:51,848-Speed 2629.06 samples/sec   Loss 8.4168   LearningRate 0.0388   Epoch: 7   Global Step: 312520   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:55,761-Speed 2617.98 samples/sec   Loss 8.4600   LearningRate 0.0388   Epoch: 7   Global Step: 312530   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:50:59,664-Speed 2623.98 samples/sec   Loss 8.3542   LearningRate 0.0388   Epoch: 7   Global Step: 312540   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:03,564-Speed 2626.41 samples/sec   Loss 8.4968   LearningRate 0.0388   Epoch: 7   Global Step: 312550   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:07,463-Speed 2626.26 samples/sec   Loss 8.2081   LearningRate 0.0388   Epoch: 7   Global Step: 312560   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:11,368-Speed 2623.71 samples/sec   Loss 8.4110   LearningRate 0.0388   Epoch: 7   Global Step: 312570   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:15,279-Speed 2619.00 samples/sec   Loss 8.3726   LearningRate 0.0388   Epoch: 7   Global Step: 312580   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:19,179-Speed 2625.93 samples/sec   Loss 8.5513   LearningRate 0.0388   Epoch: 7   Global Step: 312590   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:23,091-Speed 2618.86 samples/sec   Loss 8.4975   LearningRate 0.0388   Epoch: 7   Global Step: 312600   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:27,001-Speed 2619.42 samples/sec   Loss 8.5310   LearningRate 0.0388   Epoch: 7   Global Step: 312610   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:51:30,912-Speed 2618.83 samples/sec   Loss 8.2921   LearningRate 0.0388   Epoch: 7   Global Step: 312620   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:51:34,819-Speed 2621.84 samples/sec   Loss 8.4409   LearningRate 0.0388   Epoch: 7   Global Step: 312630   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:51:38,746-Speed 2608.09 samples/sec   Loss 8.4652   LearningRate 0.0388   Epoch: 7   Global Step: 312640   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:51:42,653-Speed 2621.52 samples/sec   Loss 8.5112   LearningRate 0.0388   Epoch: 7   Global Step: 312650   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:46,561-Speed 2621.02 samples/sec   Loss 8.4297   LearningRate 0.0388   Epoch: 7   Global Step: 312660   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:50,470-Speed 2620.74 samples/sec   Loss 8.2682   LearningRate 0.0388   Epoch: 7   Global Step: 312670   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:54,406-Speed 2602.48 samples/sec   Loss 8.2828   LearningRate 0.0388   Epoch: 7   Global Step: 312680   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:51:58,319-Speed 2617.66 samples/sec   Loss 8.4542   LearningRate 0.0388   Epoch: 7   Global Step: 312690   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:52:02,228-Speed 2619.83 samples/sec   Loss 8.4604   LearningRate 0.0388   Epoch: 7   Global Step: 312700   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:52:06,119-Speed 2632.70 samples/sec   Loss 8.3983   LearningRate 0.0388   Epoch: 7   Global Step: 312710   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:10,031-Speed 2618.05 samples/sec   Loss 8.5225   LearningRate 0.0388   Epoch: 7   Global Step: 312720   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:14,014-Speed 2572.07 samples/sec   Loss 8.5229   LearningRate 0.0388   Epoch: 7   Global Step: 312730   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:17,921-Speed 2621.52 samples/sec   Loss 8.4133   LearningRate 0.0388   Epoch: 7   Global Step: 312740   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:21,850-Speed 2607.03 samples/sec   Loss 8.3631   LearningRate 0.0388   Epoch: 7   Global Step: 312750   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:25,925-Speed 2513.35 samples/sec   Loss 8.3578   LearningRate 0.0388   Epoch: 7   Global Step: 312760   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:29,939-Speed 2551.92 samples/sec   Loss 8.3996   LearningRate 0.0388   Epoch: 7   Global Step: 312770   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:33,842-Speed 2624.06 samples/sec   Loss 8.4446   LearningRate 0.0388   Epoch: 7   Global Step: 312780   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:37,748-Speed 2622.46 samples/sec   Loss 8.4536   LearningRate 0.0388   Epoch: 7   Global Step: 312790   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:41,659-Speed 2618.94 samples/sec   Loss 8.3795   LearningRate 0.0388   Epoch: 7   Global Step: 312800   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:45,586-Speed 2608.22 samples/sec   Loss 8.3635   LearningRate 0.0388   Epoch: 7   Global Step: 312810   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:52:49,468-Speed 2638.60 samples/sec   Loss 8.3704   LearningRate 0.0388   Epoch: 7   Global Step: 312820   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:53,368-Speed 2626.44 samples/sec   Loss 8.3452   LearningRate 0.0388   Epoch: 7   Global Step: 312830   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:52:57,269-Speed 2625.43 samples/sec   Loss 8.5296   LearningRate 0.0388   Epoch: 7   Global Step: 312840   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:01,168-Speed 2627.83 samples/sec   Loss 8.4627   LearningRate 0.0388   Epoch: 7   Global Step: 312850   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:05,074-Speed 2621.91 samples/sec   Loss 8.3356   LearningRate 0.0388   Epoch: 7   Global Step: 312860   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:08,976-Speed 2625.16 samples/sec   Loss 8.4248   LearningRate 0.0388   Epoch: 7   Global Step: 312870   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:12,897-Speed 2611.90 samples/sec   Loss 8.3113   LearningRate 0.0388   Epoch: 7   Global Step: 312880   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:16,980-Speed 2508.48 samples/sec   Loss 8.3428   LearningRate 0.0388   Epoch: 7   Global Step: 312890   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:20,983-Speed 2559.20 samples/sec   Loss 8.5260   LearningRate 0.0388   Epoch: 7   Global Step: 312900   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:24,887-Speed 2623.24 samples/sec   Loss 8.5183   LearningRate 0.0388   Epoch: 7   Global Step: 312910   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:53:28,793-Speed 2622.72 samples/sec   Loss 8.3623   LearningRate 0.0388   Epoch: 7   Global Step: 312920   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:53:32,697-Speed 2623.54 samples/sec   Loss 8.3593   LearningRate 0.0388   Epoch: 7   Global Step: 312930   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:53:36,636-Speed 2600.62 samples/sec   Loss 8.4198   LearningRate 0.0388   Epoch: 7   Global Step: 312940   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:53:40,483-Speed 2662.57 samples/sec   Loss 8.9214   LearningRate 0.0388   Epoch: 7   Global Step: 312950   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:53:44,387-Speed 2623.78 samples/sec   Loss 9.0122   LearningRate 0.0388   Epoch: 7   Global Step: 312960   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:53:48,290-Speed 2624.09 samples/sec   Loss 8.7122   LearningRate 0.0388   Epoch: 7   Global Step: 312970   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:53:52,192-Speed 2628.70 samples/sec   Loss 8.5815   LearningRate 0.0388   Epoch: 7   Global Step: 312980   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:53:56,089-Speed 2628.30 samples/sec   Loss 8.4867   LearningRate 0.0388   Epoch: 7   Global Step: 312990   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:53:59,985-Speed 2629.25 samples/sec   Loss 8.5565   LearningRate 0.0388   Epoch: 7   Global Step: 313000   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:54:03,882-Speed 2628.43 samples/sec   Loss 8.4591   LearningRate 0.0388   Epoch: 7   Global Step: 313010   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:54:07,779-Speed 2628.40 samples/sec   Loss 8.3508   LearningRate 0.0388   Epoch: 7   Global Step: 313020   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:54:11,678-Speed 2626.32 samples/sec   Loss 8.3312   LearningRate 0.0388   Epoch: 7   Global Step: 313030   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:54:15,574-Speed 2629.39 samples/sec   Loss 8.4270   LearningRate 0.0388   Epoch: 7   Global Step: 313040   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 06:54:19,472-Speed 2627.49 samples/sec   Loss 8.4652   LearningRate 0.0388   Epoch: 7   Global Step: 313050   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:23,370-Speed 2628.03 samples/sec   Loss 8.4812   LearningRate 0.0388   Epoch: 7   Global Step: 313060   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:27,297-Speed 2608.14 samples/sec   Loss 8.3695   LearningRate 0.0388   Epoch: 7   Global Step: 313070   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:31,194-Speed 2628.99 samples/sec   Loss 8.5941   LearningRate 0.0388   Epoch: 7   Global Step: 313080   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:35,093-Speed 2627.40 samples/sec   Loss 8.3987   LearningRate 0.0388   Epoch: 7   Global Step: 313090   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:38,995-Speed 2625.39 samples/sec   Loss 8.3289   LearningRate 0.0388   Epoch: 7   Global Step: 313100   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:42,890-Speed 2629.01 samples/sec   Loss 8.4923   LearningRate 0.0388   Epoch: 7   Global Step: 313110   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:46,787-Speed 2628.35 samples/sec   Loss 8.3501   LearningRate 0.0388   Epoch: 7   Global Step: 313120   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:50,685-Speed 2627.95 samples/sec   Loss 8.5258   LearningRate 0.0388   Epoch: 7   Global Step: 313130   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:54,581-Speed 2628.90 samples/sec   Loss 8.3088   LearningRate 0.0388   Epoch: 7   Global Step: 313140   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 06:54:58,476-Speed 2629.35 samples/sec   Loss 8.4711   LearningRate 0.0388   Epoch: 7   Global Step: 313150   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:02,373-Speed 2628.33 samples/sec   Loss 8.3861   LearningRate 0.0388   Epoch: 7   Global Step: 313160   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:06,269-Speed 2629.17 samples/sec   Loss 8.5660   LearningRate 0.0388   Epoch: 7   Global Step: 313170   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:10,181-Speed 2618.28 samples/sec   Loss 8.2951   LearningRate 0.0387   Epoch: 7   Global Step: 313180   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:14,076-Speed 2629.18 samples/sec   Loss 8.3155   LearningRate 0.0387   Epoch: 7   Global Step: 313190   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:17,977-Speed 2625.55 samples/sec   Loss 8.2870   LearningRate 0.0387   Epoch: 7   Global Step: 313200   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:21,877-Speed 2626.42 samples/sec   Loss 8.5498   LearningRate 0.0387   Epoch: 7   Global Step: 313210   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:25,788-Speed 2619.32 samples/sec   Loss 8.5426   LearningRate 0.0387   Epoch: 7   Global Step: 313220   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:29,697-Speed 2620.12 samples/sec   Loss 8.4436   LearningRate 0.0387   Epoch: 7   Global Step: 313230   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:33,611-Speed 2616.89 samples/sec   Loss 8.4365   LearningRate 0.0387   Epoch: 7   Global Step: 313240   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 06:55:37,531-Speed 2612.64 samples/sec   Loss 8.2447   LearningRate 0.0387   Epoch: 7   Global Step: 313250   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:55:41,444-Speed 2617.78 samples/sec   Loss 8.4537   LearningRate 0.0387   Epoch: 7   Global Step: 313260   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:55:45,345-Speed 2625.49 samples/sec   Loss 8.4907   LearningRate 0.0387   Epoch: 7   Global Step: 313270   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:55:49,251-Speed 2622.49 samples/sec   Loss 8.3773   LearningRate 0.0387   Epoch: 7   Global Step: 313280   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:55:53,151-Speed 2626.33 samples/sec   Loss 8.4008   LearningRate 0.0387   Epoch: 7   Global Step: 313290   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:55:57,050-Speed 2627.20 samples/sec   Loss 8.2501   LearningRate 0.0387   Epoch: 7   Global Step: 313300   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:56:00,963-Speed 2617.85 samples/sec   Loss 8.3963   LearningRate 0.0387   Epoch: 7   Global Step: 313310   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:56:04,896-Speed 2604.03 samples/sec   Loss 8.5668   LearningRate 0.0387   Epoch: 7   Global Step: 313320   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:56:08,793-Speed 2628.66 samples/sec   Loss 8.2914   LearningRate 0.0387   Epoch: 7   Global Step: 313330   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:56:12,695-Speed 2625.08 samples/sec   Loss 8.4025   LearningRate 0.0387   Epoch: 7   Global Step: 313340   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 06:56:16,594-Speed 2626.58 samples/sec   Loss 8.3324   LearningRate 0.0387   Epoch: 7   Global Step: 313350   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:20,494-Speed 2626.15 samples/sec   Loss 8.3138   LearningRate 0.0387   Epoch: 7   Global Step: 313360   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:24,398-Speed 2624.20 samples/sec   Loss 8.4370   LearningRate 0.0387   Epoch: 7   Global Step: 313370   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:28,301-Speed 2623.63 samples/sec   Loss 8.4275   LearningRate 0.0387   Epoch: 7   Global Step: 313380   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:32,216-Speed 2621.21 samples/sec   Loss 8.4522   LearningRate 0.0387   Epoch: 7   Global Step: 313390   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:36,133-Speed 2615.28 samples/sec   Loss 8.5992   LearningRate 0.0387   Epoch: 7   Global Step: 313400   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:40,040-Speed 2621.51 samples/sec   Loss 8.3160   LearningRate 0.0387   Epoch: 7   Global Step: 313410   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:43,946-Speed 2622.26 samples/sec   Loss 8.4191   LearningRate 0.0387   Epoch: 7   Global Step: 313420   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:47,855-Speed 2620.82 samples/sec   Loss 8.4298   LearningRate 0.0387   Epoch: 7   Global Step: 313430   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:51,756-Speed 2625.61 samples/sec   Loss 8.3636   LearningRate 0.0387   Epoch: 7   Global Step: 313440   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:55,638-Speed 2638.21 samples/sec   Loss 8.4477   LearningRate 0.0387   Epoch: 7   Global Step: 313450   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:56:59,691-Speed 2527.02 samples/sec   Loss 8.3152   LearningRate 0.0387   Epoch: 7   Global Step: 313460   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:03,779-Speed 2506.06 samples/sec   Loss 8.4328   LearningRate 0.0387   Epoch: 7   Global Step: 313470   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:07,680-Speed 2626.43 samples/sec   Loss 8.3860   LearningRate 0.0387   Epoch: 7   Global Step: 313480   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:11,575-Speed 2628.93 samples/sec   Loss 8.3482   LearningRate 0.0387   Epoch: 7   Global Step: 313490   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:15,479-Speed 2624.28 samples/sec   Loss 8.3845   LearningRate 0.0387   Epoch: 7   Global Step: 313500   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:19,373-Speed 2629.75 samples/sec   Loss 8.2739   LearningRate 0.0387   Epoch: 7   Global Step: 313510   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:23,366-Speed 2565.29 samples/sec   Loss 8.3508   LearningRate 0.0387   Epoch: 7   Global Step: 313520   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:27,361-Speed 2563.78 samples/sec   Loss 8.4475   LearningRate 0.0387   Epoch: 7   Global Step: 313530   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:31,260-Speed 2627.43 samples/sec   Loss 8.4503   LearningRate 0.0387   Epoch: 7   Global Step: 313540   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:35,156-Speed 2628.44 samples/sec   Loss 8.2536   LearningRate 0.0387   Epoch: 7   Global Step: 313550   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:57:39,053-Speed 2628.01 samples/sec   Loss 8.4307   LearningRate 0.0387   Epoch: 7   Global Step: 313560   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:57:42,967-Speed 2617.40 samples/sec   Loss 8.5069   LearningRate 0.0387   Epoch: 7   Global Step: 313570   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:57:46,856-Speed 2633.94 samples/sec   Loss 8.4516   LearningRate 0.0387   Epoch: 7   Global Step: 313580   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:50,752-Speed 2628.56 samples/sec   Loss 8.4846   LearningRate 0.0387   Epoch: 7   Global Step: 313590   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:54,665-Speed 2617.68 samples/sec   Loss 8.4165   LearningRate 0.0387   Epoch: 7   Global Step: 313600   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:57:58,564-Speed 2627.49 samples/sec   Loss 8.3312   LearningRate 0.0387   Epoch: 7   Global Step: 313610   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:02,466-Speed 2625.22 samples/sec   Loss 8.4209   LearningRate 0.0387   Epoch: 7   Global Step: 313620   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:06,366-Speed 2626.06 samples/sec   Loss 8.2358   LearningRate 0.0387   Epoch: 7   Global Step: 313630   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:10,262-Speed 2628.67 samples/sec   Loss 8.4108   LearningRate 0.0387   Epoch: 7   Global Step: 313640   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:14,160-Speed 2628.07 samples/sec   Loss 8.5222   LearningRate 0.0387   Epoch: 7   Global Step: 313650   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:18,056-Speed 2628.63 samples/sec   Loss 8.4529   LearningRate 0.0387   Epoch: 7   Global Step: 313660   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:21,953-Speed 2628.26 samples/sec   Loss 8.3541   LearningRate 0.0387   Epoch: 7   Global Step: 313670   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:25,848-Speed 2629.64 samples/sec   Loss 8.3057   LearningRate 0.0387   Epoch: 7   Global Step: 313680   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:58:29,726-Speed 2641.32 samples/sec   Loss 8.3187   LearningRate 0.0387   Epoch: 7   Global Step: 313690   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:33,626-Speed 2626.70 samples/sec   Loss 8.5005   LearningRate 0.0387   Epoch: 7   Global Step: 313700   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:37,525-Speed 2626.83 samples/sec   Loss 8.3177   LearningRate 0.0387   Epoch: 7   Global Step: 313710   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:41,420-Speed 2629.22 samples/sec   Loss 8.3100   LearningRate 0.0387   Epoch: 7   Global Step: 313720   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:45,315-Speed 2629.66 samples/sec   Loss 8.4117   LearningRate 0.0387   Epoch: 7   Global Step: 313730   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:49,211-Speed 2628.67 samples/sec   Loss 8.2567   LearningRate 0.0387   Epoch: 7   Global Step: 313740   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:53,113-Speed 2625.54 samples/sec   Loss 8.4380   LearningRate 0.0387   Epoch: 7   Global Step: 313750   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:58:57,008-Speed 2628.98 samples/sec   Loss 8.4623   LearningRate 0.0387   Epoch: 7   Global Step: 313760   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:00,910-Speed 2625.72 samples/sec   Loss 8.3529   LearningRate 0.0387   Epoch: 7   Global Step: 313770   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:04,810-Speed 2626.18 samples/sec   Loss 8.2957   LearningRate 0.0387   Epoch: 7   Global Step: 313780   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:08,706-Speed 2628.94 samples/sec   Loss 8.3447   LearningRate 0.0387   Epoch: 7   Global Step: 313790   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:59:12,589-Speed 2637.51 samples/sec   Loss 8.4049   LearningRate 0.0387   Epoch: 7   Global Step: 313800   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:16,518-Speed 2606.96 samples/sec   Loss 8.3787   LearningRate 0.0387   Epoch: 7   Global Step: 313810   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:20,442-Speed 2610.38 samples/sec   Loss 8.4359   LearningRate 0.0387   Epoch: 7   Global Step: 313820   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:24,409-Speed 2582.03 samples/sec   Loss 8.4365   LearningRate 0.0387   Epoch: 7   Global Step: 313830   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:28,301-Speed 2631.46 samples/sec   Loss 8.4283   LearningRate 0.0386   Epoch: 7   Global Step: 313840   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:32,196-Speed 2630.38 samples/sec   Loss 8.2462   LearningRate 0.0386   Epoch: 7   Global Step: 313850   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:36,102-Speed 2622.44 samples/sec   Loss 8.4215   LearningRate 0.0386   Epoch: 7   Global Step: 313860   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:40,021-Speed 2613.18 samples/sec   Loss 8.3696   LearningRate 0.0386   Epoch: 7   Global Step: 313870   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:43,920-Speed 2627.05 samples/sec   Loss 8.2472   LearningRate 0.0386   Epoch: 7   Global Step: 313880   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:47,813-Speed 2631.38 samples/sec   Loss 8.2705   LearningRate 0.0386   Epoch: 7   Global Step: 313890   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:51,707-Speed 2630.33 samples/sec   Loss 8.2812   LearningRate 0.0386   Epoch: 7   Global Step: 313900   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 06:59:55,584-Speed 2641.45 samples/sec   Loss 8.5181   LearningRate 0.0386   Epoch: 7   Global Step: 313910   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 06:59:59,454-Speed 2647.40 samples/sec   Loss 8.3667   LearningRate 0.0386   Epoch: 7   Global Step: 313920   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:00:03,310-Speed 2655.77 samples/sec   Loss 9.1675   LearningRate 0.0386   Epoch: 7   Global Step: 313930   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:07,202-Speed 2631.56 samples/sec   Loss 8.4503   LearningRate 0.0386   Epoch: 7   Global Step: 313940   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:11,099-Speed 2628.23 samples/sec   Loss 8.2998   LearningRate 0.0386   Epoch: 7   Global Step: 313950   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:14,998-Speed 2627.08 samples/sec   Loss 8.4337   LearningRate 0.0386   Epoch: 7   Global Step: 313960   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:18,902-Speed 2624.27 samples/sec   Loss 8.4363   LearningRate 0.0386   Epoch: 7   Global Step: 313970   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:22,802-Speed 2626.45 samples/sec   Loss 8.3871   LearningRate 0.0386   Epoch: 7   Global Step: 313980   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:26,725-Speed 2610.81 samples/sec   Loss 8.3781   LearningRate 0.0386   Epoch: 7   Global Step: 313990   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:30,634-Speed 2620.04 samples/sec   Loss 8.3533   LearningRate 0.0386   Epoch: 7   Global Step: 314000   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:34,535-Speed 2626.19 samples/sec   Loss 8.4515   LearningRate 0.0386   Epoch: 7   Global Step: 314010   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:38,431-Speed 2629.08 samples/sec   Loss 8.3631   LearningRate 0.0386   Epoch: 7   Global Step: 314020   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:00:42,438-Speed 2555.96 samples/sec   Loss 8.4662   LearningRate 0.0386   Epoch: 7   Global Step: 314030   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:00:46,346-Speed 2621.27 samples/sec   Loss 8.4242   LearningRate 0.0386   Epoch: 7   Global Step: 314040   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:00:50,238-Speed 2631.57 samples/sec   Loss 8.2071   LearningRate 0.0386   Epoch: 7   Global Step: 314050   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:00:54,139-Speed 2625.21 samples/sec   Loss 8.4033   LearningRate 0.0386   Epoch: 7   Global Step: 314060   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:00:58,045-Speed 2622.95 samples/sec   Loss 8.3031   LearningRate 0.0386   Epoch: 7   Global Step: 314070   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:01:01,980-Speed 2603.42 samples/sec   Loss 8.3817   LearningRate 0.0386   Epoch: 7   Global Step: 314080   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:01:05,885-Speed 2622.48 samples/sec   Loss 8.4643   LearningRate 0.0386   Epoch: 7   Global Step: 314090   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:01:09,801-Speed 2615.49 samples/sec   Loss 8.3311   LearningRate 0.0386   Epoch: 7   Global Step: 314100   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:01:13,710-Speed 2620.47 samples/sec   Loss 8.4259   LearningRate 0.0386   Epoch: 7   Global Step: 314110   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:01:17,609-Speed 2627.23 samples/sec   Loss 8.3462   LearningRate 0.0386   Epoch: 7   Global Step: 314120   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:01:21,509-Speed 2625.99 samples/sec   Loss 8.3164   LearningRate 0.0386   Epoch: 7   Global Step: 314130   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:25,406-Speed 2628.72 samples/sec   Loss 8.3957   LearningRate 0.0386   Epoch: 7   Global Step: 314140   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:29,299-Speed 2630.68 samples/sec   Loss 8.4108   LearningRate 0.0386   Epoch: 7   Global Step: 314150   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:33,204-Speed 2622.90 samples/sec   Loss 8.1881   LearningRate 0.0386   Epoch: 7   Global Step: 314160   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:37,108-Speed 2623.53 samples/sec   Loss 8.3484   LearningRate 0.0386   Epoch: 7   Global Step: 314170   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:41,003-Speed 2630.26 samples/sec   Loss 8.3335   LearningRate 0.0386   Epoch: 7   Global Step: 314180   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:44,898-Speed 2629.75 samples/sec   Loss 8.4306   LearningRate 0.0386   Epoch: 7   Global Step: 314190   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:48,804-Speed 2622.52 samples/sec   Loss 8.3914   LearningRate 0.0386   Epoch: 7   Global Step: 314200   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:52,708-Speed 2623.21 samples/sec   Loss 8.3754   LearningRate 0.0386   Epoch: 7   Global Step: 314210   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:01:56,615-Speed 2621.82 samples/sec   Loss 8.4052   LearningRate 0.0386   Epoch: 7   Global Step: 314220   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:02:00,523-Speed 2620.62 samples/sec   Loss 8.2964   LearningRate 0.0386   Epoch: 7   Global Step: 314230   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:04,426-Speed 2625.10 samples/sec   Loss 8.2416   LearningRate 0.0386   Epoch: 7   Global Step: 314240   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:08,327-Speed 2625.43 samples/sec   Loss 8.3019   LearningRate 0.0386   Epoch: 7   Global Step: 314250   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:12,232-Speed 2627.94 samples/sec   Loss 8.2565   LearningRate 0.0386   Epoch: 7   Global Step: 314260   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:16,134-Speed 2624.91 samples/sec   Loss 8.4917   LearningRate 0.0386   Epoch: 7   Global Step: 314270   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:20,041-Speed 2621.39 samples/sec   Loss 8.4745   LearningRate 0.0386   Epoch: 7   Global Step: 314280   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:23,954-Speed 2617.23 samples/sec   Loss 8.3453   LearningRate 0.0386   Epoch: 7   Global Step: 314290   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:27,858-Speed 2624.64 samples/sec   Loss 8.3809   LearningRate 0.0386   Epoch: 7   Global Step: 314300   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:31,762-Speed 2623.63 samples/sec   Loss 8.3749   LearningRate 0.0386   Epoch: 7   Global Step: 314310   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:35,712-Speed 2593.10 samples/sec   Loss 8.4046   LearningRate 0.0386   Epoch: 7   Global Step: 314320   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:02:39,612-Speed 2626.37 samples/sec   Loss 8.4971   LearningRate 0.0386   Epoch: 7   Global Step: 314330   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:02:43,511-Speed 2627.03 samples/sec   Loss 8.3650   LearningRate 0.0386   Epoch: 7   Global Step: 314340   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:02:47,412-Speed 2625.64 samples/sec   Loss 8.2364   LearningRate 0.0386   Epoch: 7   Global Step: 314350   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:02:51,317-Speed 2622.78 samples/sec   Loss 8.4136   LearningRate 0.0386   Epoch: 7   Global Step: 314360   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:02:55,219-Speed 2625.46 samples/sec   Loss 8.4568   LearningRate 0.0386   Epoch: 7   Global Step: 314370   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:02:59,143-Speed 2609.89 samples/sec   Loss 8.3637   LearningRate 0.0386   Epoch: 7   Global Step: 314380   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:03:03,052-Speed 2620.82 samples/sec   Loss 8.3933   LearningRate 0.0386   Epoch: 7   Global Step: 314390   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:03:06,938-Speed 2635.63 samples/sec   Loss 8.4170   LearningRate 0.0386   Epoch: 7   Global Step: 314400   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:10,837-Speed 2626.53 samples/sec   Loss 8.5058   LearningRate 0.0386   Epoch: 7   Global Step: 314410   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:14,739-Speed 2625.40 samples/sec   Loss 8.3713   LearningRate 0.0386   Epoch: 7   Global Step: 314420   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:18,646-Speed 2621.29 samples/sec   Loss 8.3983   LearningRate 0.0386   Epoch: 7   Global Step: 314430   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:22,673-Speed 2543.57 samples/sec   Loss 8.3091   LearningRate 0.0386   Epoch: 7   Global Step: 314440   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:26,579-Speed 2622.21 samples/sec   Loss 8.3159   LearningRate 0.0386   Epoch: 7   Global Step: 314450   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:30,485-Speed 2622.58 samples/sec   Loss 8.4077   LearningRate 0.0386   Epoch: 7   Global Step: 314460   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:34,385-Speed 2626.39 samples/sec   Loss 8.4487   LearningRate 0.0386   Epoch: 7   Global Step: 314470   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:38,282-Speed 2628.19 samples/sec   Loss 8.3613   LearningRate 0.0386   Epoch: 7   Global Step: 314480   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:42,178-Speed 2629.06 samples/sec   Loss 8.4535   LearningRate 0.0386   Epoch: 7   Global Step: 314490   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:03:46,082-Speed 2623.44 samples/sec   Loss 8.3490   LearningRate 0.0386   Epoch: 7   Global Step: 314500   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:03:49,973-Speed 2632.50 samples/sec   Loss 8.3504   LearningRate 0.0385   Epoch: 7   Global Step: 314510   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:03:53,905-Speed 2606.35 samples/sec   Loss 8.3939   LearningRate 0.0385   Epoch: 7   Global Step: 314520   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:03:57,801-Speed 2628.98 samples/sec   Loss 8.2606   LearningRate 0.0385   Epoch: 7   Global Step: 314530   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:04:01,698-Speed 2628.76 samples/sec   Loss 8.4327   LearningRate 0.0385   Epoch: 7   Global Step: 314540   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:04:05,581-Speed 2637.39 samples/sec   Loss 8.4017   LearningRate 0.0385   Epoch: 7   Global Step: 314550   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:04:09,407-Speed 2676.85 samples/sec   Loss 10.9358   LearningRate 0.0385   Epoch: 7   Global Step: 314560   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:13,298-Speed 2632.76 samples/sec   Loss 10.0894   LearningRate 0.0385   Epoch: 7   Global Step: 314570   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:17,204-Speed 2622.42 samples/sec   Loss 8.9212   LearningRate 0.0385   Epoch: 7   Global Step: 314580   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:21,101-Speed 2628.07 samples/sec   Loss 8.5450   LearningRate 0.0385   Epoch: 7   Global Step: 314590   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:25,008-Speed 2621.59 samples/sec   Loss 8.5014   LearningRate 0.0385   Epoch: 7   Global Step: 314600   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:28,900-Speed 2631.87 samples/sec   Loss 8.5322   LearningRate 0.0385   Epoch: 7   Global Step: 314610   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:32,788-Speed 2634.66 samples/sec   Loss 8.4170   LearningRate 0.0385   Epoch: 7   Global Step: 314620   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:36,688-Speed 2626.48 samples/sec   Loss 8.3995   LearningRate 0.0385   Epoch: 7   Global Step: 314630   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:40,585-Speed 2627.85 samples/sec   Loss 8.2608   LearningRate 0.0385   Epoch: 7   Global Step: 314640   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:44,477-Speed 2631.74 samples/sec   Loss 8.4068   LearningRate 0.0385   Epoch: 7   Global Step: 314650   Fp16 Grad Scale: 4096   Required: 58 hours
Training: 2022-04-14 07:04:48,368-Speed 2632.04 samples/sec   Loss 8.4015   LearningRate 0.0385   Epoch: 7   Global Step: 314660   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:04:52,265-Speed 2629.30 samples/sec   Loss 8.2650   LearningRate 0.0385   Epoch: 7   Global Step: 314670   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:04:56,159-Speed 2629.73 samples/sec   Loss 8.4644   LearningRate 0.0385   Epoch: 7   Global Step: 314680   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:00,073-Speed 2617.52 samples/sec   Loss 8.3841   LearningRate 0.0385   Epoch: 7   Global Step: 314690   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:03,985-Speed 2618.16 samples/sec   Loss 8.3682   LearningRate 0.0385   Epoch: 7   Global Step: 314700   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:07,886-Speed 2625.78 samples/sec   Loss 8.3898   LearningRate 0.0385   Epoch: 7   Global Step: 314710   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:11,782-Speed 2628.91 samples/sec   Loss 8.3686   LearningRate 0.0385   Epoch: 7   Global Step: 314720   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:15,715-Speed 2604.20 samples/sec   Loss 8.2676   LearningRate 0.0385   Epoch: 7   Global Step: 314730   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:19,614-Speed 2626.97 samples/sec   Loss 8.2747   LearningRate 0.0385   Epoch: 7   Global Step: 314740   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:23,509-Speed 2630.30 samples/sec   Loss 8.3930   LearningRate 0.0385   Epoch: 7   Global Step: 314750   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:05:27,401-Speed 2630.95 samples/sec   Loss 8.3653   LearningRate 0.0385   Epoch: 7   Global Step: 314760   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:31,307-Speed 2622.61 samples/sec   Loss 8.3738   LearningRate 0.0385   Epoch: 7   Global Step: 314770   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:35,199-Speed 2631.78 samples/sec   Loss 8.4193   LearningRate 0.0385   Epoch: 7   Global Step: 314780   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:39,162-Speed 2584.45 samples/sec   Loss 8.2151   LearningRate 0.0385   Epoch: 7   Global Step: 314790   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:43,074-Speed 2618.18 samples/sec   Loss 8.3393   LearningRate 0.0385   Epoch: 7   Global Step: 314800   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:46,986-Speed 2619.04 samples/sec   Loss 8.3031   LearningRate 0.0385   Epoch: 7   Global Step: 314810   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:50,891-Speed 2623.12 samples/sec   Loss 8.4423   LearningRate 0.0385   Epoch: 7   Global Step: 314820   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:54,802-Speed 2619.08 samples/sec   Loss 8.3461   LearningRate 0.0385   Epoch: 7   Global Step: 314830   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:05:58,713-Speed 2618.95 samples/sec   Loss 8.3117   LearningRate 0.0385   Epoch: 7   Global Step: 314840   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:06:02,606-Speed 2631.26 samples/sec   Loss 8.4429   LearningRate 0.0385   Epoch: 7   Global Step: 314850   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:06:06,514-Speed 2620.94 samples/sec   Loss 8.3235   LearningRate 0.0385   Epoch: 7   Global Step: 314860   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:10,410-Speed 2628.77 samples/sec   Loss 8.4373   LearningRate 0.0385   Epoch: 7   Global Step: 314870   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:14,300-Speed 2633.26 samples/sec   Loss 8.3953   LearningRate 0.0385   Epoch: 7   Global Step: 314880   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:18,194-Speed 2629.99 samples/sec   Loss 8.6395   LearningRate 0.0385   Epoch: 7   Global Step: 314890   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:22,089-Speed 2630.08 samples/sec   Loss 8.3625   LearningRate 0.0385   Epoch: 7   Global Step: 314900   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:25,983-Speed 2629.92 samples/sec   Loss 8.4191   LearningRate 0.0385   Epoch: 7   Global Step: 314910   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:29,879-Speed 2629.64 samples/sec   Loss 8.3623   LearningRate 0.0385   Epoch: 7   Global Step: 314920   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:33,775-Speed 2628.39 samples/sec   Loss 8.3656   LearningRate 0.0385   Epoch: 7   Global Step: 314930   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:37,666-Speed 2633.09 samples/sec   Loss 8.3665   LearningRate 0.0385   Epoch: 7   Global Step: 314940   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:41,558-Speed 2631.39 samples/sec   Loss 8.4483   LearningRate 0.0385   Epoch: 7   Global Step: 314950   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:06:45,453-Speed 2629.83 samples/sec   Loss 8.2097   LearningRate 0.0385   Epoch: 7   Global Step: 314960   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:06:49,349-Speed 2629.10 samples/sec   Loss 8.3446   LearningRate 0.0385   Epoch: 7   Global Step: 314970   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:06:53,255-Speed 2628.79 samples/sec   Loss 8.1831   LearningRate 0.0385   Epoch: 7   Global Step: 314980   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:06:57,154-Speed 2626.46 samples/sec   Loss 8.3842   LearningRate 0.0385   Epoch: 7   Global Step: 314990   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:07:01,051-Speed 2628.15 samples/sec   Loss 8.3206   LearningRate 0.0385   Epoch: 7   Global Step: 315000   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:07:04,949-Speed 2627.44 samples/sec   Loss 8.3986   LearningRate 0.0385   Epoch: 7   Global Step: 315010   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:07:08,844-Speed 2629.65 samples/sec   Loss 8.4205   LearningRate 0.0385   Epoch: 7   Global Step: 315020   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:07:12,745-Speed 2626.39 samples/sec   Loss 8.2620   LearningRate 0.0385   Epoch: 7   Global Step: 315030   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:07:16,656-Speed 2618.85 samples/sec   Loss 8.3603   LearningRate 0.0385   Epoch: 7   Global Step: 315040   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:07:20,562-Speed 2622.55 samples/sec   Loss 8.3709   LearningRate 0.0385   Epoch: 7   Global Step: 315050   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:07:24,459-Speed 2627.82 samples/sec   Loss 8.3655   LearningRate 0.0385   Epoch: 7   Global Step: 315060   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:28,358-Speed 2626.51 samples/sec   Loss 8.2804   LearningRate 0.0385   Epoch: 7   Global Step: 315070   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:32,257-Speed 2627.25 samples/sec   Loss 8.3233   LearningRate 0.0385   Epoch: 7   Global Step: 315080   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:36,166-Speed 2620.01 samples/sec   Loss 8.4489   LearningRate 0.0385   Epoch: 7   Global Step: 315090   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:40,063-Speed 2628.30 samples/sec   Loss 8.3803   LearningRate 0.0385   Epoch: 7   Global Step: 315100   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:43,968-Speed 2623.34 samples/sec   Loss 8.2017   LearningRate 0.0385   Epoch: 7   Global Step: 315110   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:47,868-Speed 2625.97 samples/sec   Loss 8.3602   LearningRate 0.0385   Epoch: 7   Global Step: 315120   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:51,792-Speed 2610.87 samples/sec   Loss 8.2454   LearningRate 0.0385   Epoch: 7   Global Step: 315130   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:55,686-Speed 2630.11 samples/sec   Loss 8.3337   LearningRate 0.0385   Epoch: 7   Global Step: 315140   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:07:59,570-Speed 2637.53 samples/sec   Loss 8.3603   LearningRate 0.0385   Epoch: 7   Global Step: 315150   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:03,472-Speed 2624.70 samples/sec   Loss 8.3915   LearningRate 0.0385   Epoch: 7   Global Step: 315160   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:07,368-Speed 2628.82 samples/sec   Loss 8.4410   LearningRate 0.0385   Epoch: 7   Global Step: 315170   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:11,283-Speed 2616.38 samples/sec   Loss 8.2686   LearningRate 0.0384   Epoch: 7   Global Step: 315180   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:15,184-Speed 2625.61 samples/sec   Loss 8.3024   LearningRate 0.0384   Epoch: 7   Global Step: 315190   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:19,090-Speed 2622.60 samples/sec   Loss 8.3673   LearningRate 0.0384   Epoch: 7   Global Step: 315200   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:23,024-Speed 2603.56 samples/sec   Loss 8.3669   LearningRate 0.0384   Epoch: 7   Global Step: 315210   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:26,926-Speed 2624.95 samples/sec   Loss 8.5385   LearningRate 0.0384   Epoch: 7   Global Step: 315220   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:30,845-Speed 2613.70 samples/sec   Loss 8.4053   LearningRate 0.0384   Epoch: 7   Global Step: 315230   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:34,790-Speed 2596.20 samples/sec   Loss 8.4083   LearningRate 0.0384   Epoch: 7   Global Step: 315240   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:08:38,887-Speed 2500.03 samples/sec   Loss 8.2749   LearningRate 0.0384   Epoch: 7   Global Step: 315250   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:08:42,853-Speed 2583.06 samples/sec   Loss 8.2867   LearningRate 0.0384   Epoch: 7   Global Step: 315260   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:08:46,760-Speed 2621.04 samples/sec   Loss 8.3032   LearningRate 0.0384   Epoch: 7   Global Step: 315270   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:08:50,661-Speed 2626.10 samples/sec   Loss 8.2271   LearningRate 0.0384   Epoch: 7   Global Step: 315280   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:08:54,559-Speed 2627.39 samples/sec   Loss 8.2975   LearningRate 0.0384   Epoch: 7   Global Step: 315290   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:08:58,457-Speed 2628.02 samples/sec   Loss 8.3992   LearningRate 0.0384   Epoch: 7   Global Step: 315300   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:09:02,360-Speed 2623.82 samples/sec   Loss 8.3974   LearningRate 0.0384   Epoch: 7   Global Step: 315310   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:09:06,274-Speed 2622.59 samples/sec   Loss 8.3009   LearningRate 0.0384   Epoch: 7   Global Step: 315320   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:09:10,177-Speed 2623.97 samples/sec   Loss 8.1871   LearningRate 0.0384   Epoch: 7   Global Step: 315330   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:09:14,074-Speed 2628.05 samples/sec   Loss 8.2599   LearningRate 0.0384   Epoch: 7   Global Step: 315340   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:09:17,955-Speed 2639.22 samples/sec   Loss 8.3150   LearningRate 0.0384   Epoch: 7   Global Step: 315350   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:09:21,832-Speed 2642.06 samples/sec   Loss 8.3305   LearningRate 0.0384   Epoch: 7   Global Step: 315360   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:25,724-Speed 2632.17 samples/sec   Loss 8.5007   LearningRate 0.0384   Epoch: 7   Global Step: 315370   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:29,614-Speed 2632.85 samples/sec   Loss 8.3357   LearningRate 0.0384   Epoch: 7   Global Step: 315380   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:33,544-Speed 2605.88 samples/sec   Loss 8.3776   LearningRate 0.0384   Epoch: 7   Global Step: 315390   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:37,444-Speed 2626.78 samples/sec   Loss 8.3510   LearningRate 0.0384   Epoch: 7   Global Step: 315400   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:41,344-Speed 2626.26 samples/sec   Loss 8.3112   LearningRate 0.0384   Epoch: 7   Global Step: 315410   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:45,238-Speed 2630.58 samples/sec   Loss 8.2726   LearningRate 0.0384   Epoch: 7   Global Step: 315420   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:49,172-Speed 2603.52 samples/sec   Loss 8.3312   LearningRate 0.0384   Epoch: 7   Global Step: 315430   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:53,066-Speed 2630.42 samples/sec   Loss 8.3031   LearningRate 0.0384   Epoch: 7   Global Step: 315440   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:09:56,957-Speed 2632.40 samples/sec   Loss 8.3532   LearningRate 0.0384   Epoch: 7   Global Step: 315450   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:10:00,852-Speed 2629.68 samples/sec   Loss 8.2955   LearningRate 0.0384   Epoch: 7   Global Step: 315460   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:04,745-Speed 2631.12 samples/sec   Loss 8.3887   LearningRate 0.0384   Epoch: 7   Global Step: 315470   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:08,639-Speed 2630.36 samples/sec   Loss 8.3419   LearningRate 0.0384   Epoch: 7   Global Step: 315480   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:12,532-Speed 2631.03 samples/sec   Loss 8.3016   LearningRate 0.0384   Epoch: 7   Global Step: 315490   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:16,426-Speed 2630.30 samples/sec   Loss 8.2490   LearningRate 0.0384   Epoch: 7   Global Step: 315500   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:20,353-Speed 2608.61 samples/sec   Loss 8.4333   LearningRate 0.0384   Epoch: 7   Global Step: 315510   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:24,245-Speed 2631.50 samples/sec   Loss 8.3962   LearningRate 0.0384   Epoch: 7   Global Step: 315520   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:28,142-Speed 2628.26 samples/sec   Loss 8.4777   LearningRate 0.0384   Epoch: 7   Global Step: 315530   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:32,030-Speed 2634.15 samples/sec   Loss 8.3865   LearningRate 0.0384   Epoch: 7   Global Step: 315540   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:35,924-Speed 2630.62 samples/sec   Loss 8.4350   LearningRate 0.0384   Epoch: 7   Global Step: 315550   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:39,831-Speed 2621.27 samples/sec   Loss 8.2778   LearningRate 0.0384   Epoch: 7   Global Step: 315560   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:10:43,704-Speed 2644.72 samples/sec   Loss 8.3109   LearningRate 0.0384   Epoch: 7   Global Step: 315570   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:47,601-Speed 2628.49 samples/sec   Loss 8.3337   LearningRate 0.0384   Epoch: 7   Global Step: 315580   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:51,493-Speed 2631.65 samples/sec   Loss 8.4512   LearningRate 0.0384   Epoch: 7   Global Step: 315590   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:55,408-Speed 2616.79 samples/sec   Loss 8.3714   LearningRate 0.0384   Epoch: 7   Global Step: 315600   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:10:59,322-Speed 2616.77 samples/sec   Loss 8.3660   LearningRate 0.0384   Epoch: 7   Global Step: 315610   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:03,214-Speed 2631.33 samples/sec   Loss 8.4619   LearningRate 0.0384   Epoch: 7   Global Step: 315620   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:07,123-Speed 2620.50 samples/sec   Loss 8.3971   LearningRate 0.0384   Epoch: 7   Global Step: 315630   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:11,027-Speed 2623.66 samples/sec   Loss 8.3782   LearningRate 0.0384   Epoch: 7   Global Step: 315640   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:14,932-Speed 2623.01 samples/sec   Loss 8.4921   LearningRate 0.0384   Epoch: 7   Global Step: 315650   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:18,830-Speed 2627.64 samples/sec   Loss 8.3439   LearningRate 0.0384   Epoch: 7   Global Step: 315660   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:22,734-Speed 2623.64 samples/sec   Loss 8.2835   LearningRate 0.0384   Epoch: 7   Global Step: 315670   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:11:26,622-Speed 2634.63 samples/sec   Loss 8.4404   LearningRate 0.0384   Epoch: 7   Global Step: 315680   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:30,519-Speed 2628.49 samples/sec   Loss 8.4559   LearningRate 0.0384   Epoch: 7   Global Step: 315690   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:34,413-Speed 2629.89 samples/sec   Loss 8.5016   LearningRate 0.0384   Epoch: 7   Global Step: 315700   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:38,313-Speed 2626.16 samples/sec   Loss 8.2981   LearningRate 0.0384   Epoch: 7   Global Step: 315710   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:42,242-Speed 2607.54 samples/sec   Loss 8.4897   LearningRate 0.0384   Epoch: 7   Global Step: 315720   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:46,141-Speed 2627.01 samples/sec   Loss 8.3149   LearningRate 0.0384   Epoch: 7   Global Step: 315730   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:50,040-Speed 2627.29 samples/sec   Loss 8.4341   LearningRate 0.0384   Epoch: 7   Global Step: 315740   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:53,935-Speed 2629.10 samples/sec   Loss 8.4287   LearningRate 0.0384   Epoch: 7   Global Step: 315750   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:11:57,834-Speed 2627.34 samples/sec   Loss 8.2658   LearningRate 0.0384   Epoch: 7   Global Step: 315760   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:01,729-Speed 2630.12 samples/sec   Loss 8.3934   LearningRate 0.0384   Epoch: 7   Global Step: 315770   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:05,608-Speed 2640.20 samples/sec   Loss 8.2978   LearningRate 0.0384   Epoch: 7   Global Step: 315780   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:09,507-Speed 2626.28 samples/sec   Loss 8.4732   LearningRate 0.0384   Epoch: 7   Global Step: 315790   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:13,401-Speed 2630.90 samples/sec   Loss 8.3351   LearningRate 0.0384   Epoch: 7   Global Step: 315800   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:17,298-Speed 2628.47 samples/sec   Loss 8.4606   LearningRate 0.0384   Epoch: 7   Global Step: 315810   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:21,200-Speed 2624.90 samples/sec   Loss 8.2258   LearningRate 0.0384   Epoch: 7   Global Step: 315820   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:25,100-Speed 2626.45 samples/sec   Loss 8.4059   LearningRate 0.0384   Epoch: 7   Global Step: 315830   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:29,005-Speed 2622.69 samples/sec   Loss 8.2423   LearningRate 0.0384   Epoch: 7   Global Step: 315840   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:32,925-Speed 2613.04 samples/sec   Loss 8.4077   LearningRate 0.0383   Epoch: 7   Global Step: 315850   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:36,823-Speed 2627.91 samples/sec   Loss 8.3731   LearningRate 0.0383   Epoch: 7   Global Step: 315860   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:40,719-Speed 2628.90 samples/sec   Loss 8.3717   LearningRate 0.0383   Epoch: 7   Global Step: 315870   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:44,609-Speed 2633.15 samples/sec   Loss 8.4484   LearningRate 0.0383   Epoch: 7   Global Step: 315880   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:48,499-Speed 2633.40 samples/sec   Loss 8.3794   LearningRate 0.0383   Epoch: 7   Global Step: 315890   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:52,399-Speed 2626.41 samples/sec   Loss 8.3619   LearningRate 0.0383   Epoch: 7   Global Step: 315900   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:12:56,298-Speed 2626.80 samples/sec   Loss 8.4211   LearningRate 0.0383   Epoch: 7   Global Step: 315910   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:00,198-Speed 2626.33 samples/sec   Loss 8.2379   LearningRate 0.0383   Epoch: 7   Global Step: 315920   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:04,103-Speed 2622.82 samples/sec   Loss 8.3679   LearningRate 0.0383   Epoch: 7   Global Step: 315930   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:08,002-Speed 2627.42 samples/sec   Loss 8.3061   LearningRate 0.0383   Epoch: 7   Global Step: 315940   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:11,899-Speed 2628.10 samples/sec   Loss 8.2791   LearningRate 0.0383   Epoch: 7   Global Step: 315950   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:15,792-Speed 2631.08 samples/sec   Loss 8.4487   LearningRate 0.0383   Epoch: 7   Global Step: 315960   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:19,687-Speed 2629.57 samples/sec   Loss 8.3806   LearningRate 0.0383   Epoch: 7   Global Step: 315970   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:23,584-Speed 2628.45 samples/sec   Loss 8.3376   LearningRate 0.0383   Epoch: 7   Global Step: 315980   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:27,478-Speed 2630.74 samples/sec   Loss 8.4977   LearningRate 0.0383   Epoch: 7   Global Step: 315990   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:31,374-Speed 2628.82 samples/sec   Loss 8.3822   LearningRate 0.0383   Epoch: 7   Global Step: 316000   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:35,273-Speed 2626.60 samples/sec   Loss 8.2379   LearningRate 0.0383   Epoch: 7   Global Step: 316010   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:39,176-Speed 2624.84 samples/sec   Loss 8.3776   LearningRate 0.0383   Epoch: 7   Global Step: 316020   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:43,071-Speed 2629.96 samples/sec   Loss 8.2459   LearningRate 0.0383   Epoch: 7   Global Step: 316030   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:46,971-Speed 2626.01 samples/sec   Loss 8.2738   LearningRate 0.0383   Epoch: 7   Global Step: 316040   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:50,875-Speed 2624.21 samples/sec   Loss 8.2280   LearningRate 0.0383   Epoch: 7   Global Step: 316050   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:13:54,760-Speed 2636.52 samples/sec   Loss 8.4276   LearningRate 0.0383   Epoch: 7   Global Step: 316060   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:13:58,652-Speed 2631.30 samples/sec   Loss 8.3382   LearningRate 0.0383   Epoch: 7   Global Step: 316070   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:14:02,524-Speed 2645.35 samples/sec   Loss 8.4526   LearningRate 0.0383   Epoch: 7   Global Step: 316080   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:06,419-Speed 2629.37 samples/sec   Loss 8.6118   LearningRate 0.0383   Epoch: 7   Global Step: 316090   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:10,326-Speed 2622.31 samples/sec   Loss 8.3476   LearningRate 0.0383   Epoch: 7   Global Step: 316100   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:14,319-Speed 2564.90 samples/sec   Loss 8.3417   LearningRate 0.0383   Epoch: 7   Global Step: 316110   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:18,419-Speed 2498.40 samples/sec   Loss 8.4076   LearningRate 0.0383   Epoch: 7   Global Step: 316120   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:22,514-Speed 2501.19 samples/sec   Loss 8.2674   LearningRate 0.0383   Epoch: 7   Global Step: 316130   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:26,612-Speed 2499.35 samples/sec   Loss 8.2755   LearningRate 0.0383   Epoch: 7   Global Step: 316140   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:30,563-Speed 2592.39 samples/sec   Loss 8.5051   LearningRate 0.0383   Epoch: 7   Global Step: 316150   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:34,471-Speed 2621.09 samples/sec   Loss 8.2054   LearningRate 0.0383   Epoch: 7   Global Step: 316160   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:38,369-Speed 2627.68 samples/sec   Loss 8.4073   LearningRate 0.0383   Epoch: 7   Global Step: 316170   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:14:42,274-Speed 2622.76 samples/sec   Loss 8.4042   LearningRate 0.0383   Epoch: 7   Global Step: 316180   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:14:46,197-Speed 2610.48 samples/sec   Loss 8.3752   LearningRate 0.0383   Epoch: 7   Global Step: 316190   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:14:50,117-Speed 2613.09 samples/sec   Loss 8.3694   LearningRate 0.0383   Epoch: 7   Global Step: 316200   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:14:54,025-Speed 2621.43 samples/sec   Loss 8.4069   LearningRate 0.0383   Epoch: 7   Global Step: 316210   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:14:57,928-Speed 2624.12 samples/sec   Loss 8.3380   LearningRate 0.0383   Epoch: 7   Global Step: 316220   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:15:01,827-Speed 2626.90 samples/sec   Loss 8.2653   LearningRate 0.0383   Epoch: 7   Global Step: 316230   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:15:05,740-Speed 2616.79 samples/sec   Loss 8.3647   LearningRate 0.0383   Epoch: 7   Global Step: 316240   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:15:09,658-Speed 2614.14 samples/sec   Loss 8.3239   LearningRate 0.0383   Epoch: 7   Global Step: 316250   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:15:13,538-Speed 2639.95 samples/sec   Loss 8.3292   LearningRate 0.0383   Epoch: 7   Global Step: 316260   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:15:17,403-Speed 2649.92 samples/sec   Loss 8.5933   LearningRate 0.0383   Epoch: 7   Global Step: 316270   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:21,315-Speed 2620.71 samples/sec   Loss 8.3474   LearningRate 0.0383   Epoch: 7   Global Step: 316280   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:25,216-Speed 2625.77 samples/sec   Loss 8.5153   LearningRate 0.0383   Epoch: 7   Global Step: 316290   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:29,123-Speed 2621.66 samples/sec   Loss 8.2504   LearningRate 0.0383   Epoch: 7   Global Step: 316300   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:33,039-Speed 2615.31 samples/sec   Loss 8.3669   LearningRate 0.0383   Epoch: 7   Global Step: 316310   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:36,978-Speed 2600.16 samples/sec   Loss 8.4641   LearningRate 0.0383   Epoch: 7   Global Step: 316320   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:40,878-Speed 2626.11 samples/sec   Loss 8.4283   LearningRate 0.0383   Epoch: 7   Global Step: 316330   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:44,788-Speed 2619.62 samples/sec   Loss 8.2604   LearningRate 0.0383   Epoch: 7   Global Step: 316340   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:48,695-Speed 2621.90 samples/sec   Loss 8.2818   LearningRate 0.0383   Epoch: 7   Global Step: 316350   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:52,601-Speed 2622.41 samples/sec   Loss 8.4958   LearningRate 0.0383   Epoch: 7   Global Step: 316360   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-14 07:15:56,515-Speed 2616.79 samples/sec   Loss 8.2361   LearningRate 0.0383   Epoch: 7   Global Step: 316370   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:00,427-Speed 2618.79 samples/sec   Loss 8.4765   LearningRate 0.0383   Epoch: 7   Global Step: 316380   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:04,332-Speed 2622.67 samples/sec   Loss 8.2564   LearningRate 0.0383   Epoch: 7   Global Step: 316390   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:08,250-Speed 2613.73 samples/sec   Loss 8.4094   LearningRate 0.0383   Epoch: 7   Global Step: 316400   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:12,176-Speed 2609.33 samples/sec   Loss 8.3332   LearningRate 0.0383   Epoch: 7   Global Step: 316410   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:16,077-Speed 2625.42 samples/sec   Loss 8.4835   LearningRate 0.0383   Epoch: 7   Global Step: 316420   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:19,986-Speed 2621.06 samples/sec   Loss 8.3861   LearningRate 0.0383   Epoch: 7   Global Step: 316430   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:23,903-Speed 2614.78 samples/sec   Loss 8.2589   LearningRate 0.0383   Epoch: 7   Global Step: 316440   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:27,818-Speed 2616.71 samples/sec   Loss 8.3335   LearningRate 0.0383   Epoch: 7   Global Step: 316450   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:31,722-Speed 2623.13 samples/sec   Loss 8.2649   LearningRate 0.0383   Epoch: 7   Global Step: 316460   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:16:35,625-Speed 2624.23 samples/sec   Loss 8.4021   LearningRate 0.0383   Epoch: 7   Global Step: 316470   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:16:39,528-Speed 2623.98 samples/sec   Loss 8.3896   LearningRate 0.0383   Epoch: 7   Global Step: 316480   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:16:43,429-Speed 2625.74 samples/sec   Loss 8.2875   LearningRate 0.0383   Epoch: 7   Global Step: 316490   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:16:47,335-Speed 2622.16 samples/sec   Loss 8.3795   LearningRate 0.0383   Epoch: 7   Global Step: 316500   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:16:51,231-Speed 2629.69 samples/sec   Loss 8.3749   LearningRate 0.0383   Epoch: 7   Global Step: 316510   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:16:55,172-Speed 2598.56 samples/sec   Loss 8.2715   LearningRate 0.0382   Epoch: 7   Global Step: 316520   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:16:59,082-Speed 2619.86 samples/sec   Loss 8.3240   LearningRate 0.0382   Epoch: 7   Global Step: 316530   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:17:02,990-Speed 2621.02 samples/sec   Loss 8.3757   LearningRate 0.0382   Epoch: 7   Global Step: 316540   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:17:06,889-Speed 2626.74 samples/sec   Loss 8.3690   LearningRate 0.0382   Epoch: 7   Global Step: 316550   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:17:10,791-Speed 2624.72 samples/sec   Loss 8.4462   LearningRate 0.0382   Epoch: 7   Global Step: 316560   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:17:14,689-Speed 2628.34 samples/sec   Loss 8.3005   LearningRate 0.0382   Epoch: 7   Global Step: 316570   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:18,602-Speed 2618.27 samples/sec   Loss 8.4581   LearningRate 0.0382   Epoch: 7   Global Step: 316580   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:22,626-Speed 2545.10 samples/sec   Loss 8.2945   LearningRate 0.0382   Epoch: 7   Global Step: 316590   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:26,544-Speed 2614.92 samples/sec   Loss 8.3021   LearningRate 0.0382   Epoch: 7   Global Step: 316600   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:30,512-Speed 2580.97 samples/sec   Loss 8.5026   LearningRate 0.0382   Epoch: 7   Global Step: 316610   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:34,462-Speed 2593.36 samples/sec   Loss 8.3314   LearningRate 0.0382   Epoch: 7   Global Step: 316620   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:38,365-Speed 2624.38 samples/sec   Loss 8.1932   LearningRate 0.0382   Epoch: 7   Global Step: 316630   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:42,262-Speed 2628.04 samples/sec   Loss 8.4220   LearningRate 0.0382   Epoch: 7   Global Step: 316640   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:46,156-Speed 2630.31 samples/sec   Loss 8.3645   LearningRate 0.0382   Epoch: 7   Global Step: 316650   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:50,056-Speed 2626.42 samples/sec   Loss 8.2322   LearningRate 0.0382   Epoch: 7   Global Step: 316660   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:17:53,950-Speed 2629.84 samples/sec   Loss 8.3090   LearningRate 0.0382   Epoch: 7   Global Step: 316670   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:17:57,857-Speed 2622.53 samples/sec   Loss 8.3968   LearningRate 0.0382   Epoch: 7   Global Step: 316680   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:01,755-Speed 2627.44 samples/sec   Loss 8.4082   LearningRate 0.0382   Epoch: 7   Global Step: 316690   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:05,661-Speed 2622.31 samples/sec   Loss 8.2836   LearningRate 0.0382   Epoch: 7   Global Step: 316700   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:09,559-Speed 2627.42 samples/sec   Loss 8.3800   LearningRate 0.0382   Epoch: 7   Global Step: 316710   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:13,456-Speed 2628.40 samples/sec   Loss 8.3999   LearningRate 0.0382   Epoch: 7   Global Step: 316720   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:17,348-Speed 2631.37 samples/sec   Loss 8.2995   LearningRate 0.0382   Epoch: 7   Global Step: 316730   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:21,242-Speed 2630.25 samples/sec   Loss 8.3235   LearningRate 0.0382   Epoch: 7   Global Step: 316740   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:25,133-Speed 2631.68 samples/sec   Loss 8.2379   LearningRate 0.0382   Epoch: 7   Global Step: 316750   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:29,033-Speed 2626.98 samples/sec   Loss 8.2681   LearningRate 0.0382   Epoch: 7   Global Step: 316760   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:32,926-Speed 2631.05 samples/sec   Loss 8.3652   LearningRate 0.0382   Epoch: 7   Global Step: 316770   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:18:36,819-Speed 2631.20 samples/sec   Loss 8.2482   LearningRate 0.0382   Epoch: 7   Global Step: 316780   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:18:40,712-Speed 2630.97 samples/sec   Loss 8.3293   LearningRate 0.0382   Epoch: 7   Global Step: 316790   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:18:44,638-Speed 2608.96 samples/sec   Loss 8.2703   LearningRate 0.0382   Epoch: 7   Global Step: 316800   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:18:48,530-Speed 2631.69 samples/sec   Loss 8.2896   LearningRate 0.0382   Epoch: 7   Global Step: 316810   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:18:52,424-Speed 2630.60 samples/sec   Loss 8.3874   LearningRate 0.0382   Epoch: 7   Global Step: 316820   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:18:56,323-Speed 2627.03 samples/sec   Loss 8.3049   LearningRate 0.0382   Epoch: 7   Global Step: 316830   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:00,221-Speed 2627.54 samples/sec   Loss 8.3062   LearningRate 0.0382   Epoch: 7   Global Step: 316840   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:04,117-Speed 2628.92 samples/sec   Loss 8.2319   LearningRate 0.0382   Epoch: 7   Global Step: 316850   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:08,014-Speed 2628.72 samples/sec   Loss 8.2918   LearningRate 0.0382   Epoch: 7   Global Step: 316860   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:11,917-Speed 2624.02 samples/sec   Loss 8.3587   LearningRate 0.0382   Epoch: 7   Global Step: 316870   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:15,810-Speed 2630.79 samples/sec   Loss 8.3640   LearningRate 0.0382   Epoch: 7   Global Step: 316880   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:19,704-Speed 2630.38 samples/sec   Loss 8.4525   LearningRate 0.0382   Epoch: 7   Global Step: 316890   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:23,605-Speed 2626.15 samples/sec   Loss 8.3394   LearningRate 0.0382   Epoch: 7   Global Step: 316900   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:27,500-Speed 2629.29 samples/sec   Loss 8.3917   LearningRate 0.0382   Epoch: 7   Global Step: 316910   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:31,395-Speed 2629.51 samples/sec   Loss 8.2647   LearningRate 0.0382   Epoch: 7   Global Step: 316920   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:19:35,275-Speed 2639.73 samples/sec   Loss 8.2563   LearningRate 0.0382   Epoch: 7   Global Step: 316930   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:39,200-Speed 2610.20 samples/sec   Loss 8.3150   LearningRate 0.0382   Epoch: 7   Global Step: 316940   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:43,092-Speed 2632.02 samples/sec   Loss 8.3565   LearningRate 0.0382   Epoch: 7   Global Step: 316950   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:46,986-Speed 2629.96 samples/sec   Loss 8.3834   LearningRate 0.0382   Epoch: 7   Global Step: 316960   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:50,879-Speed 2631.13 samples/sec   Loss 8.3781   LearningRate 0.0382   Epoch: 7   Global Step: 316970   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:54,775-Speed 2629.08 samples/sec   Loss 8.4133   LearningRate 0.0382   Epoch: 7   Global Step: 316980   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:19:58,669-Speed 2630.18 samples/sec   Loss 8.4296   LearningRate 0.0382   Epoch: 7   Global Step: 316990   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:20:02,568-Speed 2627.30 samples/sec   Loss 8.3389   LearningRate 0.0382   Epoch: 7   Global Step: 317000   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:20:06,472-Speed 2623.85 samples/sec   Loss 8.3562   LearningRate 0.0382   Epoch: 7   Global Step: 317010   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:20:10,376-Speed 2623.54 samples/sec   Loss 8.2498   LearningRate 0.0382   Epoch: 7   Global Step: 317020   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:20:14,280-Speed 2624.23 samples/sec   Loss 8.3319   LearningRate 0.0382   Epoch: 7   Global Step: 317030   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:20:18,201-Speed 2612.23 samples/sec   Loss 8.2558   LearningRate 0.0382   Epoch: 7   Global Step: 317040   Fp16 Grad Scale: 262144   Required: 58 hours
Training: 2022-04-14 07:20:22,072-Speed 2646.19 samples/sec   Loss 8.3133   LearningRate 0.0382   Epoch: 7   Global Step: 317050   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:20:25,972-Speed 2626.42 samples/sec   Loss 8.3187   LearningRate 0.0382   Epoch: 7   Global Step: 317060   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:20:29,869-Speed 2628.22 samples/sec   Loss 8.3350   LearningRate 0.0382   Epoch: 7   Global Step: 317070   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:20:33,793-Speed 2610.03 samples/sec   Loss 8.4008   LearningRate 0.0382   Epoch: 7   Global Step: 317080   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:20:37,685-Speed 2632.30 samples/sec   Loss 8.2194   LearningRate 0.0382   Epoch: 7   Global Step: 317090   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:20:41,590-Speed 2623.11 samples/sec   Loss 8.2149   LearningRate 0.0382   Epoch: 7   Global Step: 317100   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:20:45,484-Speed 2630.32 samples/sec   Loss 8.2595   LearningRate 0.0382   Epoch: 7   Global Step: 317110   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:20:49,348-Speed 2650.60 samples/sec   Loss 8.7182   LearningRate 0.0382   Epoch: 7   Global Step: 317120   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:20:53,252-Speed 2623.39 samples/sec   Loss 8.7446   LearningRate 0.0382   Epoch: 7   Global Step: 317130   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:20:57,155-Speed 2624.37 samples/sec   Loss 8.2988   LearningRate 0.0382   Epoch: 7   Global Step: 317140   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:01,046-Speed 2632.52 samples/sec   Loss 8.3048   LearningRate 0.0382   Epoch: 7   Global Step: 317150   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:04,945-Speed 2626.91 samples/sec   Loss 8.3540   LearningRate 0.0382   Epoch: 7   Global Step: 317160   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:08,849-Speed 2624.74 samples/sec   Loss 8.1649   LearningRate 0.0382   Epoch: 7   Global Step: 317170   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:12,755-Speed 2622.00 samples/sec   Loss 8.3914   LearningRate 0.0382   Epoch: 7   Global Step: 317180   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:16,652-Speed 2628.41 samples/sec   Loss 8.3376   LearningRate 0.0381   Epoch: 7   Global Step: 317190   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:20,557-Speed 2623.38 samples/sec   Loss 8.3392   LearningRate 0.0381   Epoch: 7   Global Step: 317200   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:24,456-Speed 2626.77 samples/sec   Loss 8.3762   LearningRate 0.0381   Epoch: 7   Global Step: 317210   Fp16 Grad Scale: 16384   Required: 58 hours
Training: 2022-04-14 07:21:28,446-Speed 2567.56 samples/sec   Loss 8.3406   LearningRate 0.0381   Epoch: 7   Global Step: 317220   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:32,355-Speed 2620.20 samples/sec   Loss 8.3884   LearningRate 0.0381   Epoch: 7   Global Step: 317230   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:36,254-Speed 2627.13 samples/sec   Loss 8.3121   LearningRate 0.0381   Epoch: 7   Global Step: 317240   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:40,158-Speed 2623.32 samples/sec   Loss 8.4365   LearningRate 0.0381   Epoch: 7   Global Step: 317250   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:44,059-Speed 2625.27 samples/sec   Loss 8.2480   LearningRate 0.0381   Epoch: 7   Global Step: 317260   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:47,998-Speed 2600.81 samples/sec   Loss 8.3484   LearningRate 0.0381   Epoch: 7   Global Step: 317270   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:51,927-Speed 2606.84 samples/sec   Loss 8.2808   LearningRate 0.0381   Epoch: 7   Global Step: 317280   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:55,855-Speed 2607.59 samples/sec   Loss 8.3084   LearningRate 0.0381   Epoch: 7   Global Step: 317290   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:21:59,794-Speed 2600.84 samples/sec   Loss 8.2626   LearningRate 0.0381   Epoch: 7   Global Step: 317300   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:22:03,761-Speed 2581.59 samples/sec   Loss 8.2895   LearningRate 0.0381   Epoch: 7   Global Step: 317310   Fp16 Grad Scale: 32768   Required: 58 hours
Training: 2022-04-14 07:22:07,667-Speed 2622.42 samples/sec   Loss 8.2610   LearningRate 0.0381   Epoch: 7   Global Step: 317320   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:11,580-Speed 2617.24 samples/sec   Loss 8.3216   LearningRate 0.0381   Epoch: 7   Global Step: 317330   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:15,487-Speed 2621.85 samples/sec   Loss 8.4064   LearningRate 0.0381   Epoch: 7   Global Step: 317340   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:19,400-Speed 2617.64 samples/sec   Loss 8.2590   LearningRate 0.0381   Epoch: 7   Global Step: 317350   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:23,295-Speed 2629.80 samples/sec   Loss 8.4323   LearningRate 0.0381   Epoch: 7   Global Step: 317360   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:27,209-Speed 2617.45 samples/sec   Loss 8.3623   LearningRate 0.0381   Epoch: 7   Global Step: 317370   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:31,201-Speed 2565.48 samples/sec   Loss 8.2801   LearningRate 0.0381   Epoch: 7   Global Step: 317380   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:35,097-Speed 2629.06 samples/sec   Loss 8.2398   LearningRate 0.0381   Epoch: 7   Global Step: 317390   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:38,989-Speed 2631.86 samples/sec   Loss 8.3546   LearningRate 0.0381   Epoch: 7   Global Step: 317400   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:42,881-Speed 2632.00 samples/sec   Loss 8.3725   LearningRate 0.0381   Epoch: 7   Global Step: 317410   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:22:46,784-Speed 2623.49 samples/sec   Loss 8.3188   LearningRate 0.0381   Epoch: 7   Global Step: 317420   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:22:50,677-Speed 2631.31 samples/sec   Loss 8.2928   LearningRate 0.0381   Epoch: 7   Global Step: 317430   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:22:54,570-Speed 2631.13 samples/sec   Loss 8.3401   LearningRate 0.0381   Epoch: 7   Global Step: 317440   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:22:58,463-Speed 2631.21 samples/sec   Loss 8.4106   LearningRate 0.0381   Epoch: 7   Global Step: 317450   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:23:02,341-Speed 2641.23 samples/sec   Loss 8.3293   LearningRate 0.0381   Epoch: 7   Global Step: 317460   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:06,232-Speed 2631.92 samples/sec   Loss 8.3590   LearningRate 0.0381   Epoch: 7   Global Step: 317470   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:10,126-Speed 2630.24 samples/sec   Loss 8.2964   LearningRate 0.0381   Epoch: 7   Global Step: 317480   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:14,074-Speed 2595.05 samples/sec   Loss 8.4283   LearningRate 0.0381   Epoch: 7   Global Step: 317490   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:18,006-Speed 2604.90 samples/sec   Loss 8.3066   LearningRate 0.0381   Epoch: 7   Global Step: 317500   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:21,897-Speed 2632.26 samples/sec   Loss 8.2904   LearningRate 0.0381   Epoch: 7   Global Step: 317510   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:25,819-Speed 2611.72 samples/sec   Loss 8.5070   LearningRate 0.0381   Epoch: 7   Global Step: 317520   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:29,715-Speed 2629.40 samples/sec   Loss 8.3631   LearningRate 0.0381   Epoch: 7   Global Step: 317530   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:33,632-Speed 2615.00 samples/sec   Loss 8.2246   LearningRate 0.0381   Epoch: 7   Global Step: 317540   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:37,533-Speed 2625.32 samples/sec   Loss 8.4549   LearningRate 0.0381   Epoch: 7   Global Step: 317550   Fp16 Grad Scale: 65536   Required: 58 hours
Training: 2022-04-14 07:23:41,442-Speed 2620.53 samples/sec   Loss 8.4589   LearningRate 0.0381   Epoch: 7   Global Step: 317560   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:23:45,357-Speed 2616.51 samples/sec   Loss 8.4767   LearningRate 0.0381   Epoch: 7   Global Step: 317570   Fp16 Grad Scale: 131072   Required: 58 hours
Training: 2022-04-14 07:23:49,251-Speed 2630.28 samples/sec   Loss 8.1746   LearningRate 0.0381   Epoch: 7   Global Step: 317580   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:23:53,148-Speed 2628.01 samples/sec   Loss 8.3344   LearningRate 0.0381   Epoch: 7   Global Step: 317590   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:23:57,058-Speed 2619.82 samples/sec   Loss 8.2970   LearningRate 0.0381   Epoch: 7   Global Step: 317600   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:24:00,943-Speed 2636.63 samples/sec   Loss 8.2874   LearningRate 0.0381   Epoch: 7   Global Step: 317610   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:04,830-Speed 2635.24 samples/sec   Loss 8.3713   LearningRate 0.0381   Epoch: 7   Global Step: 317620   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:08,719-Speed 2633.23 samples/sec   Loss 8.3563   LearningRate 0.0381   Epoch: 7   Global Step: 317630   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:12,611-Speed 2631.78 samples/sec   Loss 8.3975   LearningRate 0.0381   Epoch: 7   Global Step: 317640   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:16,507-Speed 2629.26 samples/sec   Loss 8.4268   LearningRate 0.0381   Epoch: 7   Global Step: 317650   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:20,498-Speed 2566.86 samples/sec   Loss 8.3593   LearningRate 0.0381   Epoch: 7   Global Step: 317660   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:24,397-Speed 2627.02 samples/sec   Loss 8.1139   LearningRate 0.0381   Epoch: 7   Global Step: 317670   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:28,292-Speed 2629.25 samples/sec   Loss 8.4648   LearningRate 0.0381   Epoch: 7   Global Step: 317680   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:32,186-Speed 2630.36 samples/sec   Loss 8.4114   LearningRate 0.0381   Epoch: 7   Global Step: 317690   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:36,082-Speed 2628.72 samples/sec   Loss 8.3482   LearningRate 0.0381   Epoch: 7   Global Step: 317700   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:24:39,977-Speed 2630.17 samples/sec   Loss 8.1791   LearningRate 0.0381   Epoch: 7   Global Step: 317710   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:24:43,876-Speed 2627.75 samples/sec   Loss 8.3384   LearningRate 0.0381   Epoch: 7   Global Step: 317720   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:24:47,787-Speed 2618.58 samples/sec   Loss 8.3765   LearningRate 0.0381   Epoch: 7   Global Step: 317730   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:24:51,682-Speed 2629.84 samples/sec   Loss 8.3278   LearningRate 0.0381   Epoch: 7   Global Step: 317740   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:24:55,577-Speed 2629.61 samples/sec   Loss 8.3426   LearningRate 0.0381   Epoch: 7   Global Step: 317750   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:24:59,479-Speed 2624.60 samples/sec   Loss 8.2069   LearningRate 0.0381   Epoch: 7   Global Step: 317760   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:03,395-Speed 2615.73 samples/sec   Loss 8.1880   LearningRate 0.0381   Epoch: 7   Global Step: 317770   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:07,302-Speed 2622.12 samples/sec   Loss 8.3356   LearningRate 0.0381   Epoch: 7   Global Step: 317780   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:11,213-Speed 2619.26 samples/sec   Loss 8.2776   LearningRate 0.0381   Epoch: 7   Global Step: 317790   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:15,121-Speed 2620.95 samples/sec   Loss 8.3743   LearningRate 0.0381   Epoch: 7   Global Step: 317800   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:19,040-Speed 2613.76 samples/sec   Loss 8.3377   LearningRate 0.0381   Epoch: 7   Global Step: 317810   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:25:22,937-Speed 2628.65 samples/sec   Loss 8.2363   LearningRate 0.0381   Epoch: 7   Global Step: 317820   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:25:26,834-Speed 2628.64 samples/sec   Loss 8.3245   LearningRate 0.0381   Epoch: 7   Global Step: 317830   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:25:30,750-Speed 2615.03 samples/sec   Loss 8.3206   LearningRate 0.0381   Epoch: 7   Global Step: 317840   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:25:34,647-Speed 2628.66 samples/sec   Loss 8.2352   LearningRate 0.0381   Epoch: 7   Global Step: 317850   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:25:38,539-Speed 2631.91 samples/sec   Loss 8.2326   LearningRate 0.0380   Epoch: 7   Global Step: 317860   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:42,465-Speed 2608.81 samples/sec   Loss 8.3391   LearningRate 0.0380   Epoch: 7   Global Step: 317870   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:46,364-Speed 2626.44 samples/sec   Loss 8.2320   LearningRate 0.0380   Epoch: 7   Global Step: 317880   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:50,262-Speed 2628.43 samples/sec   Loss 8.4431   LearningRate 0.0380   Epoch: 7   Global Step: 317890   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:54,155-Speed 2631.24 samples/sec   Loss 8.2717   LearningRate 0.0380   Epoch: 7   Global Step: 317900   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:25:58,053-Speed 2627.49 samples/sec   Loss 8.3320   LearningRate 0.0380   Epoch: 7   Global Step: 317910   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:26:01,950-Speed 2628.41 samples/sec   Loss 8.4076   LearningRate 0.0380   Epoch: 7   Global Step: 317920   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:26:05,847-Speed 2628.30 samples/sec   Loss 8.3434   LearningRate 0.0380   Epoch: 7   Global Step: 317930   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:26:09,754-Speed 2621.87 samples/sec   Loss 8.1882   LearningRate 0.0380   Epoch: 7   Global Step: 317940   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:26:13,682-Speed 2607.62 samples/sec   Loss 8.3798   LearningRate 0.0380   Epoch: 7   Global Step: 317950   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:26:17,576-Speed 2630.29 samples/sec   Loss 8.3475   LearningRate 0.0380   Epoch: 7   Global Step: 317960   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:26:21,455-Speed 2640.55 samples/sec   Loss 8.2919   LearningRate 0.0380   Epoch: 7   Global Step: 317970   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:26:25,333-Speed 2641.07 samples/sec   Loss 8.3476   LearningRate 0.0380   Epoch: 7   Global Step: 317980   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:29,236-Speed 2624.39 samples/sec   Loss 8.2416   LearningRate 0.0380   Epoch: 7   Global Step: 317990   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:33,132-Speed 2629.09 samples/sec   Loss 8.4061   LearningRate 0.0380   Epoch: 7   Global Step: 318000   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:37,035-Speed 2624.33 samples/sec   Loss 8.2773   LearningRate 0.0380   Epoch: 7   Global Step: 318010   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:40,934-Speed 2626.83 samples/sec   Loss 8.3205   LearningRate 0.0380   Epoch: 7   Global Step: 318020   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:44,838-Speed 2624.14 samples/sec   Loss 8.3328   LearningRate 0.0380   Epoch: 7   Global Step: 318030   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:48,731-Speed 2631.73 samples/sec   Loss 8.3357   LearningRate 0.0380   Epoch: 7   Global Step: 318040   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:52,630-Speed 2626.89 samples/sec   Loss 8.3949   LearningRate 0.0380   Epoch: 7   Global Step: 318050   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:26:56,555-Speed 2609.37 samples/sec   Loss 8.4148   LearningRate 0.0380   Epoch: 7   Global Step: 318060   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:00,473-Speed 2614.82 samples/sec   Loss 8.3398   LearningRate 0.0380   Epoch: 7   Global Step: 318070   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:04,352-Speed 2640.19 samples/sec   Loss 8.2788   LearningRate 0.0380   Epoch: 7   Global Step: 318080   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:08,260-Speed 2620.93 samples/sec   Loss 8.4207   LearningRate 0.0380   Epoch: 7   Global Step: 318090   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:12,157-Speed 2628.50 samples/sec   Loss 8.3794   LearningRate 0.0380   Epoch: 7   Global Step: 318100   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:16,061-Speed 2623.78 samples/sec   Loss 8.4353   LearningRate 0.0380   Epoch: 7   Global Step: 318110   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:19,953-Speed 2631.48 samples/sec   Loss 8.1908   LearningRate 0.0380   Epoch: 7   Global Step: 318120   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:23,846-Speed 2631.58 samples/sec   Loss 8.2512   LearningRate 0.0380   Epoch: 7   Global Step: 318130   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:27,737-Speed 2632.67 samples/sec   Loss 8.3191   LearningRate 0.0380   Epoch: 7   Global Step: 318140   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:31,649-Speed 2617.87 samples/sec   Loss 8.2028   LearningRate 0.0380   Epoch: 7   Global Step: 318150   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:35,548-Speed 2627.02 samples/sec   Loss 8.3369   LearningRate 0.0380   Epoch: 7   Global Step: 318160   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:39,441-Speed 2630.70 samples/sec   Loss 8.1997   LearningRate 0.0380   Epoch: 7   Global Step: 318170   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:27:43,335-Speed 2630.89 samples/sec   Loss 8.1718   LearningRate 0.0380   Epoch: 7   Global Step: 318180   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:27:47,227-Speed 2631.10 samples/sec   Loss 8.2933   LearningRate 0.0380   Epoch: 7   Global Step: 318190   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:27:51,123-Speed 2629.53 samples/sec   Loss 8.2264   LearningRate 0.0380   Epoch: 7   Global Step: 318200   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:27:55,023-Speed 2625.62 samples/sec   Loss 8.3820   LearningRate 0.0380   Epoch: 7   Global Step: 318210   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:27:58,916-Speed 2632.02 samples/sec   Loss 8.3120   LearningRate 0.0380   Epoch: 7   Global Step: 318220   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:28:02,818-Speed 2624.61 samples/sec   Loss 8.2664   LearningRate 0.0380   Epoch: 7   Global Step: 318230   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:28:06,715-Speed 2627.86 samples/sec   Loss 8.4670   LearningRate 0.0380   Epoch: 7   Global Step: 318240   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:28:10,621-Speed 2622.21 samples/sec   Loss 8.3445   LearningRate 0.0380   Epoch: 7   Global Step: 318250   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:28:14,524-Speed 2624.34 samples/sec   Loss 8.3859   LearningRate 0.0380   Epoch: 7   Global Step: 318260   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:28:18,426-Speed 2625.12 samples/sec   Loss 8.2987   LearningRate 0.0380   Epoch: 7   Global Step: 318270   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:28:22,333-Speed 2621.15 samples/sec   Loss 8.3711   LearningRate 0.0380   Epoch: 7   Global Step: 318280   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:28:26,231-Speed 2628.39 samples/sec   Loss 8.2078   LearningRate 0.0380   Epoch: 7   Global Step: 318290   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:28:30,106-Speed 2642.86 samples/sec   Loss 8.3456   LearningRate 0.0380   Epoch: 7   Global Step: 318300   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:28:34,019-Speed 2617.75 samples/sec   Loss 8.3201   LearningRate 0.0380   Epoch: 7   Global Step: 318310   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:28:37,918-Speed 2626.50 samples/sec   Loss 8.3950   LearningRate 0.0380   Epoch: 7   Global Step: 318320   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:28:41,814-Speed 2629.00 samples/sec   Loss 8.2582   LearningRate 0.0380   Epoch: 7   Global Step: 318330   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:28:45,720-Speed 2622.31 samples/sec   Loss 8.2810   LearningRate 0.0380   Epoch: 7   Global Step: 318340   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:28:49,594-Speed 2643.79 samples/sec   Loss 8.4146   LearningRate 0.0380   Epoch: 7   Global Step: 318350   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:28:53,488-Speed 2630.10 samples/sec   Loss 8.3526   LearningRate 0.0380   Epoch: 7   Global Step: 318360   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:28:57,378-Speed 2634.02 samples/sec   Loss 8.3283   LearningRate 0.0380   Epoch: 7   Global Step: 318370   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:01,269-Speed 2631.66 samples/sec   Loss 8.2501   LearningRate 0.0380   Epoch: 7   Global Step: 318380   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:05,212-Speed 2598.47 samples/sec   Loss 8.1956   LearningRate 0.0380   Epoch: 7   Global Step: 318390   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:09,108-Speed 2628.69 samples/sec   Loss 8.2995   LearningRate 0.0380   Epoch: 7   Global Step: 318400   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:13,047-Speed 2600.26 samples/sec   Loss 8.3785   LearningRate 0.0380   Epoch: 7   Global Step: 318410   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:16,947-Speed 2626.42 samples/sec   Loss 8.3167   LearningRate 0.0380   Epoch: 7   Global Step: 318420   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:20,842-Speed 2629.71 samples/sec   Loss 8.2573   LearningRate 0.0380   Epoch: 7   Global Step: 318430   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:24,737-Speed 2630.62 samples/sec   Loss 8.3403   LearningRate 0.0380   Epoch: 7   Global Step: 318440   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:29:28,627-Speed 2632.65 samples/sec   Loss 8.2710   LearningRate 0.0380   Epoch: 7   Global Step: 318450   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:32,519-Speed 2631.97 samples/sec   Loss 8.3257   LearningRate 0.0380   Epoch: 7   Global Step: 318460   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:36,412-Speed 2630.96 samples/sec   Loss 8.4147   LearningRate 0.0380   Epoch: 7   Global Step: 318470   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:40,314-Speed 2625.06 samples/sec   Loss 8.3619   LearningRate 0.0380   Epoch: 7   Global Step: 318480   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:44,207-Speed 2630.54 samples/sec   Loss 8.3504   LearningRate 0.0380   Epoch: 7   Global Step: 318490   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:48,135-Speed 2608.32 samples/sec   Loss 8.2606   LearningRate 0.0380   Epoch: 7   Global Step: 318500   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:52,030-Speed 2629.30 samples/sec   Loss 8.2435   LearningRate 0.0380   Epoch: 7   Global Step: 318510   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:55,954-Speed 2610.91 samples/sec   Loss 8.3353   LearningRate 0.0380   Epoch: 7   Global Step: 318520   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:29:59,840-Speed 2635.38 samples/sec   Loss 8.3432   LearningRate 0.0379   Epoch: 7   Global Step: 318530   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:30:03,734-Speed 2630.55 samples/sec   Loss 8.2462   LearningRate 0.0379   Epoch: 7   Global Step: 318540   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:30:07,629-Speed 2629.59 samples/sec   Loss 8.2082   LearningRate 0.0379   Epoch: 7   Global Step: 318550   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:11,523-Speed 2630.47 samples/sec   Loss 8.3148   LearningRate 0.0379   Epoch: 7   Global Step: 318560   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:15,418-Speed 2629.16 samples/sec   Loss 8.3576   LearningRate 0.0379   Epoch: 7   Global Step: 318570   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:19,319-Speed 2626.20 samples/sec   Loss 8.3091   LearningRate 0.0379   Epoch: 7   Global Step: 318580   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:23,220-Speed 2625.40 samples/sec   Loss 8.3032   LearningRate 0.0379   Epoch: 7   Global Step: 318590   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:27,125-Speed 2623.17 samples/sec   Loss 8.2695   LearningRate 0.0379   Epoch: 7   Global Step: 318600   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:31,015-Speed 2632.83 samples/sec   Loss 8.2230   LearningRate 0.0379   Epoch: 7   Global Step: 318610   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:34,906-Speed 2632.50 samples/sec   Loss 8.3716   LearningRate 0.0379   Epoch: 7   Global Step: 318620   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:38,801-Speed 2629.41 samples/sec   Loss 8.3276   LearningRate 0.0379   Epoch: 7   Global Step: 318630   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:42,704-Speed 2624.18 samples/sec   Loss 8.4249   LearningRate 0.0379   Epoch: 7   Global Step: 318640   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:46,582-Speed 2640.82 samples/sec   Loss 8.4329   LearningRate 0.0379   Epoch: 7   Global Step: 318650   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:50,474-Speed 2631.94 samples/sec   Loss 8.2633   LearningRate 0.0379   Epoch: 7   Global Step: 318660   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:54,407-Speed 2604.26 samples/sec   Loss 8.3915   LearningRate 0.0379   Epoch: 7   Global Step: 318670   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:30:58,203-Speed 2698.26 samples/sec   Loss 8.7561   LearningRate 0.0379   Epoch: 7   Global Step: 318680   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:02,097-Speed 2630.67 samples/sec   Loss 9.6438   LearningRate 0.0379   Epoch: 7   Global Step: 318690   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:05,990-Speed 2630.79 samples/sec   Loss 9.0493   LearningRate 0.0379   Epoch: 7   Global Step: 318700   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:09,878-Speed 2634.19 samples/sec   Loss 8.6103   LearningRate 0.0379   Epoch: 7   Global Step: 318710   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:13,767-Speed 2633.43 samples/sec   Loss 8.3840   LearningRate 0.0379   Epoch: 7   Global Step: 318720   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:17,656-Speed 2633.57 samples/sec   Loss 8.2082   LearningRate 0.0379   Epoch: 7   Global Step: 318730   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:21,548-Speed 2631.64 samples/sec   Loss 8.2505   LearningRate 0.0379   Epoch: 7   Global Step: 318740   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:25,443-Speed 2629.38 samples/sec   Loss 8.3549   LearningRate 0.0379   Epoch: 7   Global Step: 318750   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:29,368-Speed 2609.51 samples/sec   Loss 8.3969   LearningRate 0.0379   Epoch: 7   Global Step: 318760   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:33,261-Speed 2631.45 samples/sec   Loss 8.2817   LearningRate 0.0379   Epoch: 7   Global Step: 318770   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:31:37,158-Speed 2627.96 samples/sec   Loss 8.2085   LearningRate 0.0379   Epoch: 7   Global Step: 318780   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:31:41,049-Speed 2632.19 samples/sec   Loss 8.3142   LearningRate 0.0379   Epoch: 7   Global Step: 318790   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:31:45,037-Speed 2568.30 samples/sec   Loss 8.2587   LearningRate 0.0379   Epoch: 7   Global Step: 318800   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:31:48,934-Speed 2628.32 samples/sec   Loss 8.2675   LearningRate 0.0379   Epoch: 7   Global Step: 318810   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:31:52,829-Speed 2630.12 samples/sec   Loss 8.1560   LearningRate 0.0379   Epoch: 7   Global Step: 318820   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:31:56,728-Speed 2626.30 samples/sec   Loss 8.3424   LearningRate 0.0379   Epoch: 7   Global Step: 318830   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:32:00,620-Speed 2632.12 samples/sec   Loss 8.3857   LearningRate 0.0379   Epoch: 7   Global Step: 318840   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:32:04,530-Speed 2619.18 samples/sec   Loss 8.1504   LearningRate 0.0379   Epoch: 7   Global Step: 318850   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:32:08,423-Speed 2630.92 samples/sec   Loss 8.2625   LearningRate 0.0379   Epoch: 7   Global Step: 318860   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:32:12,313-Speed 2633.32 samples/sec   Loss 8.2137   LearningRate 0.0379   Epoch: 7   Global Step: 318870   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:32:16,202-Speed 2633.47 samples/sec   Loss 8.3323   LearningRate 0.0379   Epoch: 7   Global Step: 318880   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:20,094-Speed 2632.16 samples/sec   Loss 8.3291   LearningRate 0.0379   Epoch: 7   Global Step: 318890   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:23,986-Speed 2631.17 samples/sec   Loss 8.3216   LearningRate 0.0379   Epoch: 7   Global Step: 318900   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:27,880-Speed 2630.78 samples/sec   Loss 8.2693   LearningRate 0.0379   Epoch: 7   Global Step: 318910   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:31,819-Speed 2599.81 samples/sec   Loss 8.3509   LearningRate 0.0379   Epoch: 7   Global Step: 318920   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:35,721-Speed 2624.66 samples/sec   Loss 8.2552   LearningRate 0.0379   Epoch: 7   Global Step: 318930   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:39,617-Speed 2628.68 samples/sec   Loss 8.2491   LearningRate 0.0379   Epoch: 7   Global Step: 318940   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:43,517-Speed 2626.54 samples/sec   Loss 8.2295   LearningRate 0.0379   Epoch: 7   Global Step: 318950   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:47,419-Speed 2625.03 samples/sec   Loss 8.2863   LearningRate 0.0379   Epoch: 7   Global Step: 318960   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:51,317-Speed 2627.66 samples/sec   Loss 8.2829   LearningRate 0.0379   Epoch: 7   Global Step: 318970   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:32:55,216-Speed 2627.02 samples/sec   Loss 8.2950   LearningRate 0.0379   Epoch: 7   Global Step: 318980   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:32:59,114-Speed 2627.68 samples/sec   Loss 8.2057   LearningRate 0.0379   Epoch: 7   Global Step: 318990   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:03,009-Speed 2629.19 samples/sec   Loss 8.2335   LearningRate 0.0379   Epoch: 7   Global Step: 319000   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:06,901-Speed 2631.50 samples/sec   Loss 8.1410   LearningRate 0.0379   Epoch: 7   Global Step: 319010   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:10,791-Speed 2633.08 samples/sec   Loss 8.2797   LearningRate 0.0379   Epoch: 7   Global Step: 319020   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:14,686-Speed 2629.39 samples/sec   Loss 8.3621   LearningRate 0.0379   Epoch: 7   Global Step: 319030   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:18,579-Speed 2631.41 samples/sec   Loss 8.3365   LearningRate 0.0379   Epoch: 7   Global Step: 319040   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:22,468-Speed 2633.86 samples/sec   Loss 8.3144   LearningRate 0.0379   Epoch: 7   Global Step: 319050   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:26,360-Speed 2631.20 samples/sec   Loss 8.1555   LearningRate 0.0379   Epoch: 7   Global Step: 319060   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:30,252-Speed 2632.30 samples/sec   Loss 8.4053   LearningRate 0.0379   Epoch: 7   Global Step: 319070   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:33:34,144-Speed 2631.47 samples/sec   Loss 8.3965   LearningRate 0.0379   Epoch: 7   Global Step: 319080   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:33:38,038-Speed 2629.81 samples/sec   Loss 8.2327   LearningRate 0.0379   Epoch: 7   Global Step: 319090   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:33:41,933-Speed 2629.38 samples/sec   Loss 8.2876   LearningRate 0.0379   Epoch: 7   Global Step: 319100   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:33:45,830-Speed 2628.62 samples/sec   Loss 8.2711   LearningRate 0.0379   Epoch: 7   Global Step: 319110   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:33:49,733-Speed 2624.56 samples/sec   Loss 8.3325   LearningRate 0.0379   Epoch: 7   Global Step: 319120   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:33:53,628-Speed 2629.14 samples/sec   Loss 8.3168   LearningRate 0.0379   Epoch: 7   Global Step: 319130   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:33:57,527-Speed 2627.20 samples/sec   Loss 8.2303   LearningRate 0.0379   Epoch: 7   Global Step: 319140   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:01,416-Speed 2633.07 samples/sec   Loss 8.1086   LearningRate 0.0379   Epoch: 7   Global Step: 319150   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:05,308-Speed 2632.19 samples/sec   Loss 8.3268   LearningRate 0.0379   Epoch: 7   Global Step: 319160   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:09,203-Speed 2629.44 samples/sec   Loss 8.3504   LearningRate 0.0379   Epoch: 7   Global Step: 319170   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:13,103-Speed 2626.34 samples/sec   Loss 8.3301   LearningRate 0.0379   Epoch: 7   Global Step: 319180   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:34:16,997-Speed 2630.16 samples/sec   Loss 8.2675   LearningRate 0.0379   Epoch: 7   Global Step: 319190   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:34:20,892-Speed 2630.07 samples/sec   Loss 8.3351   LearningRate 0.0379   Epoch: 7   Global Step: 319200   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:34:24,783-Speed 2632.10 samples/sec   Loss 8.2187   LearningRate 0.0378   Epoch: 7   Global Step: 319210   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:28,674-Speed 2632.31 samples/sec   Loss 8.3020   LearningRate 0.0378   Epoch: 7   Global Step: 319220   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:32,568-Speed 2630.46 samples/sec   Loss 8.2393   LearningRate 0.0378   Epoch: 7   Global Step: 319230   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:36,464-Speed 2628.59 samples/sec   Loss 8.2958   LearningRate 0.0378   Epoch: 7   Global Step: 319240   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:40,356-Speed 2631.50 samples/sec   Loss 8.2756   LearningRate 0.0378   Epoch: 7   Global Step: 319250   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:44,249-Speed 2631.27 samples/sec   Loss 8.3031   LearningRate 0.0378   Epoch: 7   Global Step: 319260   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:48,162-Speed 2617.56 samples/sec   Loss 8.2886   LearningRate 0.0378   Epoch: 7   Global Step: 319270   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:52,067-Speed 2622.85 samples/sec   Loss 8.3519   LearningRate 0.0378   Epoch: 7   Global Step: 319280   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:56,088-Speed 2547.26 samples/sec   Loss 8.3268   LearningRate 0.0378   Epoch: 7   Global Step: 319290   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:34:59,984-Speed 2628.54 samples/sec   Loss 8.2736   LearningRate 0.0378   Epoch: 7   Global Step: 319300   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:35:03,877-Speed 2631.28 samples/sec   Loss 8.3498   LearningRate 0.0378   Epoch: 7   Global Step: 319310   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:07,771-Speed 2630.16 samples/sec   Loss 8.3004   LearningRate 0.0378   Epoch: 7   Global Step: 319320   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:11,667-Speed 2628.91 samples/sec   Loss 8.2817   LearningRate 0.0378   Epoch: 7   Global Step: 319330   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:15,580-Speed 2617.25 samples/sec   Loss 8.2975   LearningRate 0.0378   Epoch: 7   Global Step: 319340   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:19,475-Speed 2630.16 samples/sec   Loss 8.2945   LearningRate 0.0378   Epoch: 7   Global Step: 319350   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:23,368-Speed 2630.58 samples/sec   Loss 8.4310   LearningRate 0.0378   Epoch: 7   Global Step: 319360   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:27,295-Speed 2608.77 samples/sec   Loss 8.2188   LearningRate 0.0378   Epoch: 7   Global Step: 319370   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:31,185-Speed 2632.60 samples/sec   Loss 8.2707   LearningRate 0.0378   Epoch: 7   Global Step: 319380   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:35,076-Speed 2632.98 samples/sec   Loss 8.2516   LearningRate 0.0378   Epoch: 7   Global Step: 319390   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:38,966-Speed 2632.57 samples/sec   Loss 8.3972   LearningRate 0.0378   Epoch: 7   Global Step: 319400   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:35:42,858-Speed 2631.53 samples/sec   Loss 8.2336   LearningRate 0.0378   Epoch: 7   Global Step: 319410   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:35:46,750-Speed 2631.95 samples/sec   Loss 8.2421   LearningRate 0.0378   Epoch: 7   Global Step: 319420   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:35:50,654-Speed 2623.85 samples/sec   Loss 8.3302   LearningRate 0.0378   Epoch: 7   Global Step: 319430   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:35:54,589-Speed 2603.13 samples/sec   Loss 8.3223   LearningRate 0.0378   Epoch: 7   Global Step: 319440   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:35:58,490-Speed 2625.51 samples/sec   Loss 8.2494   LearningRate 0.0378   Epoch: 7   Global Step: 319450   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:02,383-Speed 2631.83 samples/sec   Loss 8.2340   LearningRate 0.0378   Epoch: 7   Global Step: 319460   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:06,274-Speed 2631.53 samples/sec   Loss 8.2124   LearningRate 0.0378   Epoch: 7   Global Step: 319470   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:10,166-Speed 2632.05 samples/sec   Loss 8.1704   LearningRate 0.0378   Epoch: 7   Global Step: 319480   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:14,071-Speed 2622.80 samples/sec   Loss 8.1196   LearningRate 0.0378   Epoch: 7   Global Step: 319490   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:17,976-Speed 2622.66 samples/sec   Loss 8.2366   LearningRate 0.0378   Epoch: 7   Global Step: 319500   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:21,868-Speed 2631.59 samples/sec   Loss 8.2898   LearningRate 0.0378   Epoch: 7   Global Step: 319510   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:36:25,757-Speed 2633.62 samples/sec   Loss 8.1793   LearningRate 0.0378   Epoch: 7   Global Step: 319520   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:29,649-Speed 2631.86 samples/sec   Loss 8.1830   LearningRate 0.0378   Epoch: 7   Global Step: 319530   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:33,540-Speed 2632.18 samples/sec   Loss 8.2049   LearningRate 0.0378   Epoch: 7   Global Step: 319540   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:36:37,448-Speed 2620.91 samples/sec   Loss 8.2111   LearningRate 0.0378   Epoch: 7   Global Step: 319550   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:36:41,333-Speed 2636.88 samples/sec   Loss 8.2671   LearningRate 0.0378   Epoch: 7   Global Step: 319560   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:36:45,224-Speed 2631.98 samples/sec   Loss 8.2265   LearningRate 0.0378   Epoch: 7   Global Step: 319570   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:36:49,117-Speed 2631.11 samples/sec   Loss 8.2437   LearningRate 0.0378   Epoch: 7   Global Step: 319580   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:36:53,009-Speed 2632.09 samples/sec   Loss 8.2410   LearningRate 0.0378   Epoch: 7   Global Step: 319590   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:36:56,904-Speed 2629.39 samples/sec   Loss 8.2983   LearningRate 0.0378   Epoch: 7   Global Step: 319600   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:00,796-Speed 2631.42 samples/sec   Loss 8.2854   LearningRate 0.0378   Epoch: 7   Global Step: 319610   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:04,697-Speed 2625.45 samples/sec   Loss 8.2704   LearningRate 0.0378   Epoch: 7   Global Step: 319620   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:08,593-Speed 2628.73 samples/sec   Loss 8.2259   LearningRate 0.0378   Epoch: 7   Global Step: 319630   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:12,485-Speed 2631.77 samples/sec   Loss 8.1561   LearningRate 0.0378   Epoch: 7   Global Step: 319640   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:16,378-Speed 2631.47 samples/sec   Loss 8.2714   LearningRate 0.0378   Epoch: 7   Global Step: 319650   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:37:20,274-Speed 2628.61 samples/sec   Loss 8.2681   LearningRate 0.0378   Epoch: 7   Global Step: 319660   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:37:24,171-Speed 2628.71 samples/sec   Loss 8.2244   LearningRate 0.0378   Epoch: 7   Global Step: 319670   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:37:28,068-Speed 2627.85 samples/sec   Loss 8.2056   LearningRate 0.0378   Epoch: 7   Global Step: 319680   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:37:31,962-Speed 2630.14 samples/sec   Loss 8.2484   LearningRate 0.0378   Epoch: 7   Global Step: 319690   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:37:35,854-Speed 2632.00 samples/sec   Loss 8.2332   LearningRate 0.0378   Epoch: 7   Global Step: 319700   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:37:39,762-Speed 2620.52 samples/sec   Loss 8.2482   LearningRate 0.0378   Epoch: 7   Global Step: 319710   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:37:43,660-Speed 2627.47 samples/sec   Loss 8.3118   LearningRate 0.0378   Epoch: 7   Global Step: 319720   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:47,550-Speed 2632.89 samples/sec   Loss 8.3125   LearningRate 0.0378   Epoch: 7   Global Step: 319730   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:51,444-Speed 2630.53 samples/sec   Loss 8.2983   LearningRate 0.0378   Epoch: 7   Global Step: 319740   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:55,336-Speed 2631.82 samples/sec   Loss 8.2334   LearningRate 0.0378   Epoch: 7   Global Step: 319750   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:37:59,226-Speed 2633.03 samples/sec   Loss 8.1663   LearningRate 0.0378   Epoch: 7   Global Step: 319760   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:03,117-Speed 2632.05 samples/sec   Loss 8.2955   LearningRate 0.0378   Epoch: 7   Global Step: 319770   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:07,013-Speed 2629.13 samples/sec   Loss 8.4008   LearningRate 0.0378   Epoch: 7   Global Step: 319780   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:10,911-Speed 2627.57 samples/sec   Loss 8.2766   LearningRate 0.0378   Epoch: 7   Global Step: 319790   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:14,810-Speed 2627.03 samples/sec   Loss 8.3339   LearningRate 0.0378   Epoch: 7   Global Step: 319800   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:18,702-Speed 2631.13 samples/sec   Loss 8.2455   LearningRate 0.0378   Epoch: 7   Global Step: 319810   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:22,598-Speed 2629.24 samples/sec   Loss 8.3829   LearningRate 0.0378   Epoch: 7   Global Step: 319820   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:38:26,492-Speed 2629.90 samples/sec   Loss 8.3126   LearningRate 0.0378   Epoch: 7   Global Step: 319830   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:38:30,385-Speed 2631.29 samples/sec   Loss 8.3227   LearningRate 0.0378   Epoch: 7   Global Step: 319840   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:38:34,293-Speed 2621.33 samples/sec   Loss 8.2995   LearningRate 0.0378   Epoch: 7   Global Step: 319850   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:38:38,164-Speed 2645.63 samples/sec   Loss 8.1712   LearningRate 0.0378   Epoch: 7   Global Step: 319860   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:42,056-Speed 2631.49 samples/sec   Loss 8.3084   LearningRate 0.0378   Epoch: 7   Global Step: 319870   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:45,948-Speed 2632.15 samples/sec   Loss 8.2234   LearningRate 0.0377   Epoch: 7   Global Step: 319880   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:49,837-Speed 2633.76 samples/sec   Loss 8.1874   LearningRate 0.0377   Epoch: 7   Global Step: 319890   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:53,732-Speed 2629.45 samples/sec   Loss 8.2146   LearningRate 0.0377   Epoch: 7   Global Step: 319900   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:38:57,624-Speed 2631.85 samples/sec   Loss 8.2127   LearningRate 0.0377   Epoch: 7   Global Step: 319910   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:39:01,522-Speed 2627.38 samples/sec   Loss 8.3027   LearningRate 0.0377   Epoch: 7   Global Step: 319920   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:39:05,415-Speed 2631.17 samples/sec   Loss 8.3110   LearningRate 0.0377   Epoch: 7   Global Step: 319930   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:39:09,312-Speed 2628.10 samples/sec   Loss 8.2290   LearningRate 0.0377   Epoch: 7   Global Step: 319940   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:39:13,207-Speed 2629.73 samples/sec   Loss 8.2414   LearningRate 0.0377   Epoch: 7   Global Step: 319950   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:39:17,104-Speed 2628.11 samples/sec   Loss 8.2287   LearningRate 0.0377   Epoch: 7   Global Step: 319960   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:39:20,999-Speed 2630.34 samples/sec   Loss 8.2893   LearningRate 0.0377   Epoch: 7   Global Step: 319970   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:39:24,896-Speed 2627.82 samples/sec   Loss 8.3418   LearningRate 0.0377   Epoch: 7   Global Step: 319980   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:39:28,793-Speed 2628.56 samples/sec   Loss 8.3549   LearningRate 0.0377   Epoch: 7   Global Step: 319990   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:39:32,693-Speed 2626.13 samples/sec   Loss 8.1529   LearningRate 0.0377   Epoch: 7   Global Step: 320000   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:40:15,731-[lfw][320000]XNorm: 23.231626
Training: 2022-04-14 07:40:15,732-[lfw][320000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 07:40:15,732-[lfw][320000]Accuracy-Highest: 0.99783
Training: 2022-04-14 07:41:05,862-[cfp_fp][320000]XNorm: 21.376191
Training: 2022-04-14 07:41:05,863-[cfp_fp][320000]Accuracy-Flip: 0.98443+-0.00559
Training: 2022-04-14 07:41:05,863-[cfp_fp][320000]Accuracy-Highest: 0.98671
Training: 2022-04-14 07:41:48,754-[agedb_30][320000]XNorm: 23.276776
Training: 2022-04-14 07:41:48,755-[agedb_30][320000]Accuracy-Flip: 0.97367+-0.00741
Training: 2022-04-14 07:41:48,755-[agedb_30][320000]Accuracy-Highest: 0.97567
Training: 2022-04-14 07:41:52,619-Speed 73.18 samples/sec   Loss 8.2415   LearningRate 0.0377   Epoch: 7   Global Step: 320010   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:41:56,612-Speed 2564.72 samples/sec   Loss 8.2733   LearningRate 0.0377   Epoch: 7   Global Step: 320020   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:00,486-Speed 2644.42 samples/sec   Loss 8.2650   LearningRate 0.0377   Epoch: 7   Global Step: 320030   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:04,357-Speed 2645.42 samples/sec   Loss 8.2616   LearningRate 0.0377   Epoch: 7   Global Step: 320040   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:08,257-Speed 2626.13 samples/sec   Loss 8.2859   LearningRate 0.0377   Epoch: 7   Global Step: 320050   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:12,130-Speed 2644.57 samples/sec   Loss 8.3468   LearningRate 0.0377   Epoch: 7   Global Step: 320060   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:16,004-Speed 2644.46 samples/sec   Loss 8.2589   LearningRate 0.0377   Epoch: 7   Global Step: 320070   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:19,882-Speed 2641.17 samples/sec   Loss 8.3195   LearningRate 0.0377   Epoch: 7   Global Step: 320080   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:23,763-Speed 2639.11 samples/sec   Loss 8.1749   LearningRate 0.0377   Epoch: 7   Global Step: 320090   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:27,650-Speed 2634.75 samples/sec   Loss 8.2900   LearningRate 0.0377   Epoch: 7   Global Step: 320100   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:31,536-Speed 2636.22 samples/sec   Loss 8.3292   LearningRate 0.0377   Epoch: 7   Global Step: 320110   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:35,441-Speed 2622.51 samples/sec   Loss 8.4126   LearningRate 0.0377   Epoch: 7   Global Step: 320120   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:39,335-Speed 2630.45 samples/sec   Loss 8.3268   LearningRate 0.0377   Epoch: 7   Global Step: 320130   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:43,223-Speed 2633.78 samples/sec   Loss 8.3719   LearningRate 0.0377   Epoch: 7   Global Step: 320140   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:47,112-Speed 2634.44 samples/sec   Loss 8.3383   LearningRate 0.0377   Epoch: 7   Global Step: 320150   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:50,982-Speed 2646.20 samples/sec   Loss 8.1859   LearningRate 0.0377   Epoch: 7   Global Step: 320160   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:54,872-Speed 2633.60 samples/sec   Loss 8.2915   LearningRate 0.0377   Epoch: 7   Global Step: 320170   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:42:58,765-Speed 2630.82 samples/sec   Loss 8.1501   LearningRate 0.0377   Epoch: 7   Global Step: 320180   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:43:02,653-Speed 2634.39 samples/sec   Loss 8.3285   LearningRate 0.0377   Epoch: 7   Global Step: 320190   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:43:06,555-Speed 2624.41 samples/sec   Loss 8.2682   LearningRate 0.0377   Epoch: 7   Global Step: 320200   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:43:10,453-Speed 2627.65 samples/sec   Loss 8.2238   LearningRate 0.0377   Epoch: 7   Global Step: 320210   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:43:14,343-Speed 2632.94 samples/sec   Loss 8.2562   LearningRate 0.0377   Epoch: 7   Global Step: 320220   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:43:18,221-Speed 2641.28 samples/sec   Loss 8.2568   LearningRate 0.0377   Epoch: 7   Global Step: 320230   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:22,110-Speed 2633.69 samples/sec   Loss 8.3588   LearningRate 0.0377   Epoch: 7   Global Step: 320240   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:26,002-Speed 2631.98 samples/sec   Loss 8.2503   LearningRate 0.0377   Epoch: 7   Global Step: 320250   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:29,899-Speed 2628.60 samples/sec   Loss 8.3487   LearningRate 0.0377   Epoch: 7   Global Step: 320260   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:33,793-Speed 2630.19 samples/sec   Loss 8.2552   LearningRate 0.0377   Epoch: 7   Global Step: 320270   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:37,693-Speed 2625.82 samples/sec   Loss 8.2438   LearningRate 0.0377   Epoch: 7   Global Step: 320280   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:41,599-Speed 2622.37 samples/sec   Loss 8.1329   LearningRate 0.0377   Epoch: 7   Global Step: 320290   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:45,496-Speed 2627.91 samples/sec   Loss 8.1990   LearningRate 0.0377   Epoch: 7   Global Step: 320300   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:49,392-Speed 2629.99 samples/sec   Loss 8.4165   LearningRate 0.0377   Epoch: 7   Global Step: 320310   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:53,285-Speed 2631.34 samples/sec   Loss 8.2202   LearningRate 0.0377   Epoch: 7   Global Step: 320320   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:43:57,175-Speed 2633.06 samples/sec   Loss 8.2801   LearningRate 0.0377   Epoch: 7   Global Step: 320330   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:44:01,065-Speed 2633.27 samples/sec   Loss 8.2131   LearningRate 0.0377   Epoch: 7   Global Step: 320340   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:44:04,956-Speed 2632.05 samples/sec   Loss 8.2217   LearningRate 0.0377   Epoch: 7   Global Step: 320350   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:44:08,848-Speed 2631.74 samples/sec   Loss 8.2645   LearningRate 0.0377   Epoch: 7   Global Step: 320360   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:44:12,738-Speed 2632.65 samples/sec   Loss 8.2530   LearningRate 0.0377   Epoch: 7   Global Step: 320370   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:44:16,548-Speed 2688.41 samples/sec   Loss 8.7854   LearningRate 0.0377   Epoch: 7   Global Step: 320380   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:20,442-Speed 2630.50 samples/sec   Loss 10.4934   LearningRate 0.0377   Epoch: 7   Global Step: 320390   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:24,345-Speed 2624.26 samples/sec   Loss 8.9209   LearningRate 0.0377   Epoch: 7   Global Step: 320400   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:28,232-Speed 2635.64 samples/sec   Loss 8.4355   LearningRate 0.0377   Epoch: 7   Global Step: 320410   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:32,120-Speed 2634.68 samples/sec   Loss 8.4534   LearningRate 0.0377   Epoch: 7   Global Step: 320420   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:36,005-Speed 2636.39 samples/sec   Loss 8.2574   LearningRate 0.0377   Epoch: 7   Global Step: 320430   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:39,906-Speed 2625.44 samples/sec   Loss 8.2184   LearningRate 0.0377   Epoch: 7   Global Step: 320440   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:43,807-Speed 2625.51 samples/sec   Loss 8.1456   LearningRate 0.0377   Epoch: 7   Global Step: 320450   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:47,715-Speed 2621.10 samples/sec   Loss 8.3298   LearningRate 0.0377   Epoch: 7   Global Step: 320460   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:51,682-Speed 2581.71 samples/sec   Loss 8.3292   LearningRate 0.0377   Epoch: 7   Global Step: 320470   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:44:55,675-Speed 2565.01 samples/sec   Loss 8.3170   LearningRate 0.0377   Epoch: 7   Global Step: 320480   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:44:59,574-Speed 2627.63 samples/sec   Loss 8.2369   LearningRate 0.0377   Epoch: 7   Global Step: 320490   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:03,472-Speed 2627.51 samples/sec   Loss 8.1798   LearningRate 0.0377   Epoch: 7   Global Step: 320500   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:07,375-Speed 2624.29 samples/sec   Loss 8.3522   LearningRate 0.0377   Epoch: 7   Global Step: 320510   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:11,283-Speed 2620.70 samples/sec   Loss 8.3097   LearningRate 0.0377   Epoch: 7   Global Step: 320520   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:15,177-Speed 2629.91 samples/sec   Loss 8.2277   LearningRate 0.0377   Epoch: 7   Global Step: 320530   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:19,080-Speed 2624.53 samples/sec   Loss 8.1828   LearningRate 0.0377   Epoch: 7   Global Step: 320540   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:22,967-Speed 2635.15 samples/sec   Loss 8.4459   LearningRate 0.0377   Epoch: 7   Global Step: 320550   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:26,927-Speed 2586.61 samples/sec   Loss 8.3319   LearningRate 0.0376   Epoch: 7   Global Step: 320560   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:30,817-Speed 2632.74 samples/sec   Loss 8.3353   LearningRate 0.0376   Epoch: 7   Global Step: 320570   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:45:34,707-Speed 2633.68 samples/sec   Loss 8.4024   LearningRate 0.0376   Epoch: 7   Global Step: 320580   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:45:38,610-Speed 2623.92 samples/sec   Loss 8.3342   LearningRate 0.0376   Epoch: 7   Global Step: 320590   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:45:42,501-Speed 2632.47 samples/sec   Loss 8.3075   LearningRate 0.0376   Epoch: 7   Global Step: 320600   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:45:46,389-Speed 2634.04 samples/sec   Loss 8.2512   LearningRate 0.0376   Epoch: 7   Global Step: 320610   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:45:50,276-Speed 2634.59 samples/sec   Loss 8.2460   LearningRate 0.0376   Epoch: 7   Global Step: 320620   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:45:54,174-Speed 2628.43 samples/sec   Loss 8.1746   LearningRate 0.0376   Epoch: 7   Global Step: 320630   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:45:58,062-Speed 2634.16 samples/sec   Loss 8.2369   LearningRate 0.0376   Epoch: 7   Global Step: 320640   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:46:01,964-Speed 2625.41 samples/sec   Loss 8.2817   LearningRate 0.0376   Epoch: 7   Global Step: 320650   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:46:05,845-Speed 2638.69 samples/sec   Loss 8.3533   LearningRate 0.0376   Epoch: 7   Global Step: 320660   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:46:09,739-Speed 2630.61 samples/sec   Loss 8.1770   LearningRate 0.0376   Epoch: 7   Global Step: 320670   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:46:13,635-Speed 2628.61 samples/sec   Loss 8.2218   LearningRate 0.0376   Epoch: 7   Global Step: 320680   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:17,525-Speed 2633.22 samples/sec   Loss 8.2374   LearningRate 0.0376   Epoch: 7   Global Step: 320690   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:21,414-Speed 2633.51 samples/sec   Loss 8.3840   LearningRate 0.0376   Epoch: 7   Global Step: 320700   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:25,303-Speed 2633.62 samples/sec   Loss 8.3315   LearningRate 0.0376   Epoch: 7   Global Step: 320710   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:29,192-Speed 2634.41 samples/sec   Loss 8.3050   LearningRate 0.0376   Epoch: 7   Global Step: 320720   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:33,078-Speed 2635.67 samples/sec   Loss 8.2594   LearningRate 0.0376   Epoch: 7   Global Step: 320730   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:36,965-Speed 2634.38 samples/sec   Loss 8.2622   LearningRate 0.0376   Epoch: 7   Global Step: 320740   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:40,883-Speed 2614.81 samples/sec   Loss 8.2620   LearningRate 0.0376   Epoch: 7   Global Step: 320750   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:44,773-Speed 2633.06 samples/sec   Loss 8.2762   LearningRate 0.0376   Epoch: 7   Global Step: 320760   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:48,716-Speed 2597.53 samples/sec   Loss 8.1460   LearningRate 0.0376   Epoch: 7   Global Step: 320770   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:46:52,603-Speed 2635.13 samples/sec   Loss 8.2766   LearningRate 0.0376   Epoch: 7   Global Step: 320780   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:46:56,492-Speed 2633.77 samples/sec   Loss 8.1972   LearningRate 0.0376   Epoch: 7   Global Step: 320790   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:00,382-Speed 2632.97 samples/sec   Loss 8.3506   LearningRate 0.0376   Epoch: 7   Global Step: 320800   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:04,288-Speed 2622.42 samples/sec   Loss 8.2703   LearningRate 0.0376   Epoch: 7   Global Step: 320810   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:08,192-Speed 2623.54 samples/sec   Loss 8.3327   LearningRate 0.0376   Epoch: 7   Global Step: 320820   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:12,122-Speed 2606.36 samples/sec   Loss 8.2406   LearningRate 0.0376   Epoch: 7   Global Step: 320830   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:16,021-Speed 2627.39 samples/sec   Loss 8.0995   LearningRate 0.0376   Epoch: 7   Global Step: 320840   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:19,913-Speed 2631.68 samples/sec   Loss 8.2216   LearningRate 0.0376   Epoch: 7   Global Step: 320850   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:23,812-Speed 2626.95 samples/sec   Loss 8.2562   LearningRate 0.0376   Epoch: 7   Global Step: 320860   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:27,706-Speed 2629.97 samples/sec   Loss 8.2314   LearningRate 0.0376   Epoch: 7   Global Step: 320870   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:47:31,594-Speed 2634.58 samples/sec   Loss 8.3607   LearningRate 0.0376   Epoch: 7   Global Step: 320880   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:47:35,482-Speed 2634.00 samples/sec   Loss 8.1791   LearningRate 0.0376   Epoch: 7   Global Step: 320890   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:47:39,375-Speed 2631.56 samples/sec   Loss 8.3548   LearningRate 0.0376   Epoch: 7   Global Step: 320900   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:47:43,266-Speed 2632.14 samples/sec   Loss 8.2724   LearningRate 0.0376   Epoch: 7   Global Step: 320910   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:47:47,154-Speed 2634.50 samples/sec   Loss 8.2089   LearningRate 0.0376   Epoch: 7   Global Step: 320920   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:47:51,043-Speed 2633.74 samples/sec   Loss 8.3063   LearningRate 0.0376   Epoch: 7   Global Step: 320930   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:47:54,931-Speed 2634.23 samples/sec   Loss 8.2665   LearningRate 0.0376   Epoch: 7   Global Step: 320940   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:47:58,797-Speed 2649.53 samples/sec   Loss 8.5661   LearningRate 0.0376   Epoch: 7   Global Step: 320950   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:48:02,725-Speed 2607.30 samples/sec   Loss 8.7098   LearningRate 0.0376   Epoch: 7   Global Step: 320960   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:48:06,586-Speed 2652.87 samples/sec   Loss 8.9958   LearningRate 0.0376   Epoch: 7   Global Step: 320970   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:10,473-Speed 2635.25 samples/sec   Loss 8.4089   LearningRate 0.0376   Epoch: 7   Global Step: 320980   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:14,365-Speed 2631.44 samples/sec   Loss 8.1900   LearningRate 0.0376   Epoch: 7   Global Step: 320990   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:18,257-Speed 2631.98 samples/sec   Loss 8.3110   LearningRate 0.0376   Epoch: 7   Global Step: 321000   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:22,151-Speed 2630.40 samples/sec   Loss 8.2273   LearningRate 0.0376   Epoch: 7   Global Step: 321010   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:26,039-Speed 2634.50 samples/sec   Loss 8.2188   LearningRate 0.0376   Epoch: 7   Global Step: 321020   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:29,928-Speed 2633.09 samples/sec   Loss 8.3557   LearningRate 0.0376   Epoch: 7   Global Step: 321030   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:33,827-Speed 2626.97 samples/sec   Loss 8.2471   LearningRate 0.0376   Epoch: 7   Global Step: 321040   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:37,717-Speed 2633.10 samples/sec   Loss 8.3447   LearningRate 0.0376   Epoch: 7   Global Step: 321050   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:41,606-Speed 2633.41 samples/sec   Loss 8.1803   LearningRate 0.0376   Epoch: 7   Global Step: 321060   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:48:45,496-Speed 2632.95 samples/sec   Loss 8.2697   LearningRate 0.0376   Epoch: 7   Global Step: 321070   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:48:49,388-Speed 2631.48 samples/sec   Loss 8.1169   LearningRate 0.0376   Epoch: 7   Global Step: 321080   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:48:53,279-Speed 2633.32 samples/sec   Loss 8.2244   LearningRate 0.0376   Epoch: 7   Global Step: 321090   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:48:57,172-Speed 2630.96 samples/sec   Loss 8.3264   LearningRate 0.0376   Epoch: 7   Global Step: 321100   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:49:01,061-Speed 2633.56 samples/sec   Loss 8.2384   LearningRate 0.0376   Epoch: 7   Global Step: 321110   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:49:05,070-Speed 2554.35 samples/sec   Loss 8.2940   LearningRate 0.0376   Epoch: 7   Global Step: 321120   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:49:09,061-Speed 2567.17 samples/sec   Loss 8.2516   LearningRate 0.0376   Epoch: 7   Global Step: 321130   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:49:12,963-Speed 2624.91 samples/sec   Loss 8.2151   LearningRate 0.0376   Epoch: 7   Global Step: 321140   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:49:16,861-Speed 2627.21 samples/sec   Loss 8.3660   LearningRate 0.0376   Epoch: 7   Global Step: 321150   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:49:20,753-Speed 2631.88 samples/sec   Loss 8.3114   LearningRate 0.0376   Epoch: 7   Global Step: 321160   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:49:24,647-Speed 2630.70 samples/sec   Loss 8.3980   LearningRate 0.0376   Epoch: 7   Global Step: 321170   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:28,547-Speed 2626.67 samples/sec   Loss 8.1749   LearningRate 0.0376   Epoch: 7   Global Step: 321180   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:32,443-Speed 2628.95 samples/sec   Loss 8.2504   LearningRate 0.0376   Epoch: 7   Global Step: 321190   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:36,341-Speed 2627.22 samples/sec   Loss 8.3607   LearningRate 0.0376   Epoch: 7   Global Step: 321200   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:40,242-Speed 2625.67 samples/sec   Loss 8.2534   LearningRate 0.0376   Epoch: 7   Global Step: 321210   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:44,132-Speed 2633.31 samples/sec   Loss 8.3576   LearningRate 0.0376   Epoch: 7   Global Step: 321220   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:48,026-Speed 2630.75 samples/sec   Loss 8.2457   LearningRate 0.0375   Epoch: 7   Global Step: 321230   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:51,919-Speed 2631.37 samples/sec   Loss 8.2804   LearningRate 0.0375   Epoch: 7   Global Step: 321240   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:55,823-Speed 2623.22 samples/sec   Loss 8.1273   LearningRate 0.0375   Epoch: 7   Global Step: 321250   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:49:59,703-Speed 2640.15 samples/sec   Loss 8.3278   LearningRate 0.0375   Epoch: 7   Global Step: 321260   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:50:03,594-Speed 2632.37 samples/sec   Loss 8.2663   LearningRate 0.0375   Epoch: 7   Global Step: 321270   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:07,552-Speed 2588.03 samples/sec   Loss 8.2970   LearningRate 0.0375   Epoch: 7   Global Step: 321280   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:11,461-Speed 2619.77 samples/sec   Loss 8.2703   LearningRate 0.0375   Epoch: 7   Global Step: 321290   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:15,351-Speed 2633.11 samples/sec   Loss 8.3071   LearningRate 0.0375   Epoch: 7   Global Step: 321300   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:19,244-Speed 2631.60 samples/sec   Loss 8.2781   LearningRate 0.0375   Epoch: 7   Global Step: 321310   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:23,146-Speed 2624.69 samples/sec   Loss 8.3412   LearningRate 0.0375   Epoch: 7   Global Step: 321320   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:27,065-Speed 2613.74 samples/sec   Loss 8.1100   LearningRate 0.0375   Epoch: 7   Global Step: 321330   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:30,978-Speed 2617.63 samples/sec   Loss 8.2230   LearningRate 0.0375   Epoch: 7   Global Step: 321340   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:34,894-Speed 2615.43 samples/sec   Loss 8.2875   LearningRate 0.0375   Epoch: 7   Global Step: 321350   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:38,800-Speed 2622.63 samples/sec   Loss 8.2225   LearningRate 0.0375   Epoch: 7   Global Step: 321360   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:50:42,729-Speed 2606.98 samples/sec   Loss 8.2652   LearningRate 0.0375   Epoch: 7   Global Step: 321370   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:50:46,640-Speed 2618.72 samples/sec   Loss 8.2068   LearningRate 0.0375   Epoch: 7   Global Step: 321380   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:50:50,555-Speed 2616.71 samples/sec   Loss 8.0741   LearningRate 0.0375   Epoch: 7   Global Step: 321390   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:50:54,459-Speed 2623.28 samples/sec   Loss 8.2275   LearningRate 0.0375   Epoch: 7   Global Step: 321400   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:50:58,347-Speed 2634.61 samples/sec   Loss 8.3012   LearningRate 0.0375   Epoch: 7   Global Step: 321410   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:51:02,250-Speed 2624.13 samples/sec   Loss 8.1889   LearningRate 0.0375   Epoch: 7   Global Step: 321420   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:51:06,154-Speed 2623.19 samples/sec   Loss 8.2061   LearningRate 0.0375   Epoch: 7   Global Step: 321430   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:51:10,050-Speed 2629.08 samples/sec   Loss 8.1211   LearningRate 0.0375   Epoch: 7   Global Step: 321440   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:51:13,958-Speed 2620.82 samples/sec   Loss 8.0751   LearningRate 0.0375   Epoch: 7   Global Step: 321450   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:51:17,864-Speed 2622.26 samples/sec   Loss 8.2323   LearningRate 0.0375   Epoch: 7   Global Step: 321460   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:51:21,778-Speed 2617.46 samples/sec   Loss 8.2167   LearningRate 0.0375   Epoch: 7   Global Step: 321470   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:25,678-Speed 2626.07 samples/sec   Loss 8.3174   LearningRate 0.0375   Epoch: 7   Global Step: 321480   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:29,570-Speed 2632.26 samples/sec   Loss 8.2868   LearningRate 0.0375   Epoch: 7   Global Step: 321490   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:33,462-Speed 2630.94 samples/sec   Loss 8.2812   LearningRate 0.0375   Epoch: 7   Global Step: 321500   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:37,354-Speed 2631.74 samples/sec   Loss 8.2543   LearningRate 0.0375   Epoch: 7   Global Step: 321510   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:41,249-Speed 2630.12 samples/sec   Loss 8.1901   LearningRate 0.0375   Epoch: 7   Global Step: 321520   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:45,151-Speed 2624.71 samples/sec   Loss 8.4679   LearningRate 0.0375   Epoch: 7   Global Step: 321530   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:49,056-Speed 2623.69 samples/sec   Loss 8.4743   LearningRate 0.0375   Epoch: 7   Global Step: 321540   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:52,947-Speed 2632.47 samples/sec   Loss 8.3268   LearningRate 0.0375   Epoch: 7   Global Step: 321550   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:51:56,837-Speed 2632.86 samples/sec   Loss 8.1322   LearningRate 0.0375   Epoch: 7   Global Step: 321560   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:52:00,730-Speed 2631.21 samples/sec   Loss 8.2183   LearningRate 0.0375   Epoch: 7   Global Step: 321570   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:04,624-Speed 2630.08 samples/sec   Loss 8.1180   LearningRate 0.0375   Epoch: 7   Global Step: 321580   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:08,517-Speed 2630.99 samples/sec   Loss 8.1764   LearningRate 0.0375   Epoch: 7   Global Step: 321590   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:12,412-Speed 2629.67 samples/sec   Loss 8.1412   LearningRate 0.0375   Epoch: 7   Global Step: 321600   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:16,311-Speed 2626.82 samples/sec   Loss 8.1172   LearningRate 0.0375   Epoch: 7   Global Step: 321610   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:20,202-Speed 2632.72 samples/sec   Loss 8.2325   LearningRate 0.0375   Epoch: 7   Global Step: 321620   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:24,093-Speed 2633.08 samples/sec   Loss 8.3090   LearningRate 0.0375   Epoch: 7   Global Step: 321630   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:27,986-Speed 2631.02 samples/sec   Loss 8.2025   LearningRate 0.0375   Epoch: 7   Global Step: 321640   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:31,880-Speed 2630.19 samples/sec   Loss 8.3481   LearningRate 0.0375   Epoch: 7   Global Step: 321650   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:35,774-Speed 2630.12 samples/sec   Loss 8.3418   LearningRate 0.0375   Epoch: 7   Global Step: 321660   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:39,669-Speed 2630.07 samples/sec   Loss 8.2951   LearningRate 0.0375   Epoch: 7   Global Step: 321670   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 07:52:43,557-Speed 2633.73 samples/sec   Loss 8.2306   LearningRate 0.0375   Epoch: 7   Global Step: 321680   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:47,463-Speed 2622.35 samples/sec   Loss 8.3020   LearningRate 0.0375   Epoch: 7   Global Step: 321690   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:51,360-Speed 2628.27 samples/sec   Loss 8.2173   LearningRate 0.0375   Epoch: 7   Global Step: 321700   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:55,267-Speed 2622.26 samples/sec   Loss 8.1783   LearningRate 0.0375   Epoch: 7   Global Step: 321710   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:52:59,161-Speed 2630.25 samples/sec   Loss 8.3220   LearningRate 0.0375   Epoch: 7   Global Step: 321720   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:53:03,055-Speed 2630.38 samples/sec   Loss 8.2598   LearningRate 0.0375   Epoch: 7   Global Step: 321730   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:53:06,965-Speed 2619.11 samples/sec   Loss 8.2265   LearningRate 0.0375   Epoch: 7   Global Step: 321740   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:10,858-Speed 2630.98 samples/sec   Loss 8.3990   LearningRate 0.0375   Epoch: 7   Global Step: 321750   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:14,755-Speed 2628.63 samples/sec   Loss 8.2998   LearningRate 0.0375   Epoch: 7   Global Step: 321760   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:18,650-Speed 2629.79 samples/sec   Loss 8.2148   LearningRate 0.0375   Epoch: 7   Global Step: 321770   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:22,546-Speed 2628.87 samples/sec   Loss 8.2455   LearningRate 0.0375   Epoch: 7   Global Step: 321780   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:26,471-Speed 2609.24 samples/sec   Loss 8.2474   LearningRate 0.0375   Epoch: 7   Global Step: 321790   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:30,370-Speed 2627.11 samples/sec   Loss 8.2499   LearningRate 0.0375   Epoch: 7   Global Step: 321800   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:34,281-Speed 2619.22 samples/sec   Loss 8.2884   LearningRate 0.0375   Epoch: 7   Global Step: 321810   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:38,175-Speed 2630.28 samples/sec   Loss 8.2073   LearningRate 0.0375   Epoch: 7   Global Step: 321820   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:42,069-Speed 2630.49 samples/sec   Loss 8.1349   LearningRate 0.0375   Epoch: 7   Global Step: 321830   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:53:45,958-Speed 2633.26 samples/sec   Loss 8.2780   LearningRate 0.0375   Epoch: 7   Global Step: 321840   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:53:49,863-Speed 2622.88 samples/sec   Loss 8.2273   LearningRate 0.0375   Epoch: 7   Global Step: 321850   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:53:53,756-Speed 2630.88 samples/sec   Loss 8.3127   LearningRate 0.0375   Epoch: 7   Global Step: 321860   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:53:57,648-Speed 2631.67 samples/sec   Loss 8.1246   LearningRate 0.0375   Epoch: 7   Global Step: 321870   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:54:01,539-Speed 2632.67 samples/sec   Loss 8.3288   LearningRate 0.0375   Epoch: 7   Global Step: 321880   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:54:05,435-Speed 2628.67 samples/sec   Loss 8.1800   LearningRate 0.0375   Epoch: 7   Global Step: 321890   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:54:09,330-Speed 2630.14 samples/sec   Loss 8.2192   LearningRate 0.0375   Epoch: 7   Global Step: 321900   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:54:13,221-Speed 2632.18 samples/sec   Loss 8.1754   LearningRate 0.0374   Epoch: 7   Global Step: 321910   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:54:17,085-Speed 2650.99 samples/sec   Loss 8.2980   LearningRate 0.0374   Epoch: 7   Global Step: 321920   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:20,978-Speed 2631.20 samples/sec   Loss 8.2863   LearningRate 0.0374   Epoch: 7   Global Step: 321930   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:24,899-Speed 2612.24 samples/sec   Loss 8.2258   LearningRate 0.0374   Epoch: 7   Global Step: 321940   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:28,790-Speed 2632.42 samples/sec   Loss 8.3574   LearningRate 0.0374   Epoch: 7   Global Step: 321950   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:32,683-Speed 2631.21 samples/sec   Loss 8.1619   LearningRate 0.0374   Epoch: 7   Global Step: 321960   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:36,586-Speed 2624.17 samples/sec   Loss 8.2622   LearningRate 0.0374   Epoch: 7   Global Step: 321970   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:40,477-Speed 2632.66 samples/sec   Loss 8.3165   LearningRate 0.0374   Epoch: 7   Global Step: 321980   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:44,367-Speed 2633.36 samples/sec   Loss 8.3320   LearningRate 0.0374   Epoch: 7   Global Step: 321990   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:48,259-Speed 2631.17 samples/sec   Loss 8.2515   LearningRate 0.0374   Epoch: 7   Global Step: 322000   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:52,156-Speed 2629.27 samples/sec   Loss 8.3110   LearningRate 0.0374   Epoch: 7   Global Step: 322010   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 07:54:56,064-Speed 2620.81 samples/sec   Loss 8.2667   LearningRate 0.0374   Epoch: 7   Global Step: 322020   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:54:59,974-Speed 2618.90 samples/sec   Loss 8.3264   LearningRate 0.0374   Epoch: 7   Global Step: 322030   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:03,887-Speed 2617.89 samples/sec   Loss 8.1444   LearningRate 0.0374   Epoch: 7   Global Step: 322040   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:07,779-Speed 2632.30 samples/sec   Loss 8.2264   LearningRate 0.0374   Epoch: 7   Global Step: 322050   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:11,677-Speed 2627.05 samples/sec   Loss 8.1238   LearningRate 0.0374   Epoch: 7   Global Step: 322060   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:15,567-Speed 2633.48 samples/sec   Loss 8.2566   LearningRate 0.0374   Epoch: 7   Global Step: 322070   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:19,553-Speed 2569.88 samples/sec   Loss 8.2034   LearningRate 0.0374   Epoch: 7   Global Step: 322080   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:23,631-Speed 2511.78 samples/sec   Loss 8.2602   LearningRate 0.0374   Epoch: 7   Global Step: 322090   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:27,615-Speed 2571.10 samples/sec   Loss 8.2424   LearningRate 0.0374   Epoch: 7   Global Step: 322100   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:31,507-Speed 2631.61 samples/sec   Loss 8.2013   LearningRate 0.0374   Epoch: 7   Global Step: 322110   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:55:35,398-Speed 2631.88 samples/sec   Loss 8.2673   LearningRate 0.0374   Epoch: 7   Global Step: 322120   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:55:39,293-Speed 2629.76 samples/sec   Loss 8.2427   LearningRate 0.0374   Epoch: 7   Global Step: 322130   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:55:43,192-Speed 2626.71 samples/sec   Loss 8.2063   LearningRate 0.0374   Epoch: 7   Global Step: 322140   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:55:47,089-Speed 2628.48 samples/sec   Loss 8.3063   LearningRate 0.0374   Epoch: 7   Global Step: 322150   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:55:50,984-Speed 2629.86 samples/sec   Loss 8.2415   LearningRate 0.0374   Epoch: 7   Global Step: 322160   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:55:54,877-Speed 2630.93 samples/sec   Loss 8.0396   LearningRate 0.0374   Epoch: 7   Global Step: 322170   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:55:58,766-Speed 2633.79 samples/sec   Loss 8.2154   LearningRate 0.0374   Epoch: 7   Global Step: 322180   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:56:02,648-Speed 2638.69 samples/sec   Loss 8.3045   LearningRate 0.0374   Epoch: 7   Global Step: 322190   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:06,574-Speed 2609.02 samples/sec   Loss 8.2242   LearningRate 0.0374   Epoch: 7   Global Step: 322200   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:10,488-Speed 2616.82 samples/sec   Loss 8.2505   LearningRate 0.0374   Epoch: 7   Global Step: 322210   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:14,380-Speed 2632.37 samples/sec   Loss 8.0504   LearningRate 0.0374   Epoch: 7   Global Step: 322220   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:18,273-Speed 2630.73 samples/sec   Loss 8.1271   LearningRate 0.0374   Epoch: 7   Global Step: 322230   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:22,166-Speed 2631.01 samples/sec   Loss 8.2596   LearningRate 0.0374   Epoch: 7   Global Step: 322240   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:26,063-Speed 2628.49 samples/sec   Loss 8.2252   LearningRate 0.0374   Epoch: 7   Global Step: 322250   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:29,956-Speed 2631.67 samples/sec   Loss 8.2769   LearningRate 0.0374   Epoch: 7   Global Step: 322260   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:33,848-Speed 2631.34 samples/sec   Loss 8.1872   LearningRate 0.0374   Epoch: 7   Global Step: 322270   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:37,743-Speed 2629.78 samples/sec   Loss 8.2020   LearningRate 0.0374   Epoch: 7   Global Step: 322280   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 07:56:41,640-Speed 2628.36 samples/sec   Loss 8.1919   LearningRate 0.0374   Epoch: 7   Global Step: 322290   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:56:45,533-Speed 2630.69 samples/sec   Loss 8.2577   LearningRate 0.0374   Epoch: 7   Global Step: 322300   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:56:49,429-Speed 2629.46 samples/sec   Loss 8.2490   LearningRate 0.0374   Epoch: 7   Global Step: 322310   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:56:53,323-Speed 2630.57 samples/sec   Loss 8.2403   LearningRate 0.0374   Epoch: 7   Global Step: 322320   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:56:57,220-Speed 2628.43 samples/sec   Loss 8.0996   LearningRate 0.0374   Epoch: 7   Global Step: 322330   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:57:01,112-Speed 2631.02 samples/sec   Loss 8.2977   LearningRate 0.0374   Epoch: 7   Global Step: 322340   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:57:05,010-Speed 2627.90 samples/sec   Loss 8.3270   LearningRate 0.0374   Epoch: 7   Global Step: 322350   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:57:08,904-Speed 2629.87 samples/sec   Loss 8.3075   LearningRate 0.0374   Epoch: 7   Global Step: 322360   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:57:12,801-Speed 2628.34 samples/sec   Loss 8.1137   LearningRate 0.0374   Epoch: 7   Global Step: 322370   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 07:57:16,631-Speed 2674.63 samples/sec   Loss 8.4211   LearningRate 0.0374   Epoch: 7   Global Step: 322380   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:57:20,507-Speed 2642.23 samples/sec   Loss 8.5375   LearningRate 0.0374   Epoch: 7   Global Step: 322390   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:24,397-Speed 2632.81 samples/sec   Loss 8.4116   LearningRate 0.0374   Epoch: 7   Global Step: 322400   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:28,288-Speed 2632.85 samples/sec   Loss 8.3062   LearningRate 0.0374   Epoch: 7   Global Step: 322410   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:32,178-Speed 2632.92 samples/sec   Loss 8.3057   LearningRate 0.0374   Epoch: 7   Global Step: 322420   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:36,073-Speed 2629.99 samples/sec   Loss 8.3136   LearningRate 0.0374   Epoch: 7   Global Step: 322430   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:39,973-Speed 2625.75 samples/sec   Loss 8.3057   LearningRate 0.0374   Epoch: 7   Global Step: 322440   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:43,862-Speed 2633.50 samples/sec   Loss 8.2801   LearningRate 0.0374   Epoch: 7   Global Step: 322450   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:47,750-Speed 2634.54 samples/sec   Loss 8.5245   LearningRate 0.0374   Epoch: 7   Global Step: 322460   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:51,641-Speed 2632.66 samples/sec   Loss 8.1412   LearningRate 0.0374   Epoch: 7   Global Step: 322470   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:55,551-Speed 2619.28 samples/sec   Loss 8.3392   LearningRate 0.0374   Epoch: 7   Global Step: 322480   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 07:57:59,440-Speed 2633.76 samples/sec   Loss 8.2428   LearningRate 0.0374   Epoch: 7   Global Step: 322490   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:03,332-Speed 2631.56 samples/sec   Loss 8.0859   LearningRate 0.0374   Epoch: 7   Global Step: 322500   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:07,225-Speed 2631.21 samples/sec   Loss 8.1145   LearningRate 0.0374   Epoch: 7   Global Step: 322510   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:11,116-Speed 2632.12 samples/sec   Loss 8.2212   LearningRate 0.0374   Epoch: 7   Global Step: 322520   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:15,011-Speed 2629.38 samples/sec   Loss 8.1932   LearningRate 0.0374   Epoch: 7   Global Step: 322530   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:18,908-Speed 2628.08 samples/sec   Loss 8.2330   LearningRate 0.0374   Epoch: 7   Global Step: 322540   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:22,800-Speed 2631.84 samples/sec   Loss 8.2540   LearningRate 0.0374   Epoch: 7   Global Step: 322550   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:26,691-Speed 2631.84 samples/sec   Loss 8.3254   LearningRate 0.0374   Epoch: 7   Global Step: 322560   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:30,587-Speed 2629.24 samples/sec   Loss 8.2750   LearningRate 0.0374   Epoch: 7   Global Step: 322570   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:34,482-Speed 2629.94 samples/sec   Loss 8.4396   LearningRate 0.0374   Epoch: 7   Global Step: 322580   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 07:58:38,383-Speed 2625.41 samples/sec   Loss 8.3487   LearningRate 0.0373   Epoch: 7   Global Step: 322590   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:58:42,280-Speed 2628.06 samples/sec   Loss 8.2220   LearningRate 0.0373   Epoch: 7   Global Step: 322600   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:58:46,174-Speed 2630.87 samples/sec   Loss 8.2577   LearningRate 0.0373   Epoch: 7   Global Step: 322610   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:58:50,073-Speed 2626.68 samples/sec   Loss 8.3470   LearningRate 0.0373   Epoch: 7   Global Step: 322620   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:58:53,964-Speed 2632.44 samples/sec   Loss 8.2660   LearningRate 0.0373   Epoch: 7   Global Step: 322630   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:58:57,872-Speed 2620.65 samples/sec   Loss 8.2279   LearningRate 0.0373   Epoch: 7   Global Step: 322640   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:59:01,768-Speed 2628.67 samples/sec   Loss 8.2572   LearningRate 0.0373   Epoch: 7   Global Step: 322650   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:59:05,658-Speed 2632.71 samples/sec   Loss 8.1925   LearningRate 0.0373   Epoch: 7   Global Step: 322660   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:59:09,550-Speed 2631.82 samples/sec   Loss 8.2294   LearningRate 0.0373   Epoch: 7   Global Step: 322670   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:59:13,449-Speed 2627.08 samples/sec   Loss 8.0773   LearningRate 0.0373   Epoch: 7   Global Step: 322680   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 07:59:17,352-Speed 2624.21 samples/sec   Loss 8.1560   LearningRate 0.0373   Epoch: 7   Global Step: 322690   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:21,255-Speed 2624.14 samples/sec   Loss 8.2565   LearningRate 0.0373   Epoch: 7   Global Step: 322700   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:25,152-Speed 2628.72 samples/sec   Loss 8.1938   LearningRate 0.0373   Epoch: 7   Global Step: 322710   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:29,046-Speed 2630.48 samples/sec   Loss 8.2482   LearningRate 0.0373   Epoch: 7   Global Step: 322720   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:32,937-Speed 2632.39 samples/sec   Loss 8.2163   LearningRate 0.0373   Epoch: 7   Global Step: 322730   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:36,827-Speed 2632.57 samples/sec   Loss 8.3267   LearningRate 0.0373   Epoch: 7   Global Step: 322740   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:40,720-Speed 2631.35 samples/sec   Loss 8.1422   LearningRate 0.0373   Epoch: 7   Global Step: 322750   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:44,608-Speed 2634.57 samples/sec   Loss 8.1597   LearningRate 0.0373   Epoch: 7   Global Step: 322760   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:48,502-Speed 2630.29 samples/sec   Loss 8.3599   LearningRate 0.0373   Epoch: 7   Global Step: 322770   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:52,400-Speed 2627.51 samples/sec   Loss 8.2294   LearningRate 0.0373   Epoch: 7   Global Step: 322780   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 07:59:56,295-Speed 2629.65 samples/sec   Loss 8.1935   LearningRate 0.0373   Epoch: 7   Global Step: 322790   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:00,191-Speed 2629.10 samples/sec   Loss 8.1916   LearningRate 0.0373   Epoch: 7   Global Step: 322800   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:04,090-Speed 2626.71 samples/sec   Loss 8.2611   LearningRate 0.0373   Epoch: 7   Global Step: 322810   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:07,985-Speed 2629.58 samples/sec   Loss 8.1089   LearningRate 0.0373   Epoch: 7   Global Step: 322820   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:11,876-Speed 2632.15 samples/sec   Loss 8.2498   LearningRate 0.0373   Epoch: 7   Global Step: 322830   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:15,763-Speed 2635.41 samples/sec   Loss 8.1359   LearningRate 0.0373   Epoch: 7   Global Step: 322840   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:19,652-Speed 2633.47 samples/sec   Loss 8.3276   LearningRate 0.0373   Epoch: 7   Global Step: 322850   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:23,555-Speed 2624.43 samples/sec   Loss 8.2619   LearningRate 0.0373   Epoch: 7   Global Step: 322860   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:27,444-Speed 2633.36 samples/sec   Loss 8.3124   LearningRate 0.0373   Epoch: 7   Global Step: 322870   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:31,337-Speed 2631.13 samples/sec   Loss 8.1369   LearningRate 0.0373   Epoch: 7   Global Step: 322880   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:00:35,228-Speed 2632.16 samples/sec   Loss 8.2838   LearningRate 0.0373   Epoch: 7   Global Step: 322890   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:00:39,119-Speed 2632.49 samples/sec   Loss 8.2448   LearningRate 0.0373   Epoch: 7   Global Step: 322900   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:00:43,010-Speed 2632.55 samples/sec   Loss 8.3040   LearningRate 0.0373   Epoch: 7   Global Step: 322910   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:00:46,905-Speed 2629.73 samples/sec   Loss 8.2531   LearningRate 0.0373   Epoch: 7   Global Step: 322920   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:00:50,805-Speed 2625.75 samples/sec   Loss 8.1787   LearningRate 0.0373   Epoch: 7   Global Step: 322930   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:00:54,709-Speed 2623.94 samples/sec   Loss 8.3644   LearningRate 0.0373   Epoch: 7   Global Step: 322940   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:00:58,618-Speed 2619.79 samples/sec   Loss 8.1807   LearningRate 0.0373   Epoch: 7   Global Step: 322950   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:01:02,520-Speed 2625.10 samples/sec   Loss 8.2888   LearningRate 0.0373   Epoch: 7   Global Step: 322960   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:01:06,423-Speed 2624.02 samples/sec   Loss 8.1933   LearningRate 0.0373   Epoch: 7   Global Step: 322970   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:01:10,324-Speed 2625.90 samples/sec   Loss 8.1847   LearningRate 0.0373   Epoch: 7   Global Step: 322980   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:01:14,221-Speed 2627.75 samples/sec   Loss 8.2751   LearningRate 0.0373   Epoch: 7   Global Step: 322990   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:18,119-Speed 2627.56 samples/sec   Loss 8.3229   LearningRate 0.0373   Epoch: 7   Global Step: 323000   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:22,030-Speed 2618.94 samples/sec   Loss 8.1761   LearningRate 0.0373   Epoch: 7   Global Step: 323010   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:25,925-Speed 2629.71 samples/sec   Loss 8.2443   LearningRate 0.0373   Epoch: 7   Global Step: 323020   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:29,815-Speed 2633.02 samples/sec   Loss 8.2466   LearningRate 0.0373   Epoch: 7   Global Step: 323030   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:33,720-Speed 2622.79 samples/sec   Loss 8.1614   LearningRate 0.0373   Epoch: 7   Global Step: 323040   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:37,624-Speed 2623.35 samples/sec   Loss 8.2585   LearningRate 0.0373   Epoch: 7   Global Step: 323050   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:41,524-Speed 2625.97 samples/sec   Loss 8.1760   LearningRate 0.0373   Epoch: 7   Global Step: 323060   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:45,415-Speed 2632.58 samples/sec   Loss 8.1035   LearningRate 0.0373   Epoch: 7   Global Step: 323070   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:49,312-Speed 2628.43 samples/sec   Loss 8.3743   LearningRate 0.0373   Epoch: 7   Global Step: 323080   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:53,181-Speed 2647.73 samples/sec   Loss 8.1303   LearningRate 0.0373   Epoch: 7   Global Step: 323090   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:01:57,074-Speed 2630.97 samples/sec   Loss 8.3789   LearningRate 0.0373   Epoch: 7   Global Step: 323100   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:00,967-Speed 2630.82 samples/sec   Loss 8.2241   LearningRate 0.0373   Epoch: 7   Global Step: 323110   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:04,860-Speed 2630.99 samples/sec   Loss 8.1243   LearningRate 0.0373   Epoch: 7   Global Step: 323120   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:08,758-Speed 2627.29 samples/sec   Loss 8.3115   LearningRate 0.0373   Epoch: 7   Global Step: 323130   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:12,658-Speed 2626.05 samples/sec   Loss 8.2603   LearningRate 0.0373   Epoch: 7   Global Step: 323140   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:16,550-Speed 2631.60 samples/sec   Loss 8.2245   LearningRate 0.0373   Epoch: 7   Global Step: 323150   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:20,445-Speed 2630.23 samples/sec   Loss 8.2658   LearningRate 0.0373   Epoch: 7   Global Step: 323160   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:24,339-Speed 2630.37 samples/sec   Loss 8.1335   LearningRate 0.0373   Epoch: 7   Global Step: 323170   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:28,235-Speed 2628.72 samples/sec   Loss 8.2884   LearningRate 0.0373   Epoch: 7   Global Step: 323180   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:32,120-Speed 2636.80 samples/sec   Loss 8.1889   LearningRate 0.0373   Epoch: 7   Global Step: 323190   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:36,010-Speed 2632.59 samples/sec   Loss 8.2568   LearningRate 0.0373   Epoch: 7   Global Step: 323200   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:39,903-Speed 2630.79 samples/sec   Loss 8.0151   LearningRate 0.0373   Epoch: 7   Global Step: 323210   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:02:43,795-Speed 2631.87 samples/sec   Loss 8.2043   LearningRate 0.0373   Epoch: 7   Global Step: 323220   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:02:47,692-Speed 2627.68 samples/sec   Loss 8.3020   LearningRate 0.0373   Epoch: 7   Global Step: 323230   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:02:51,588-Speed 2629.09 samples/sec   Loss 8.2595   LearningRate 0.0373   Epoch: 7   Global Step: 323240   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:02:55,483-Speed 2629.97 samples/sec   Loss 8.1965   LearningRate 0.0373   Epoch: 7   Global Step: 323250   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:02:59,386-Speed 2623.94 samples/sec   Loss 8.2719   LearningRate 0.0373   Epoch: 7   Global Step: 323260   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:03,281-Speed 2629.63 samples/sec   Loss 8.2135   LearningRate 0.0372   Epoch: 7   Global Step: 323270   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:07,178-Speed 2628.13 samples/sec   Loss 8.3132   LearningRate 0.0372   Epoch: 7   Global Step: 323280   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:11,073-Speed 2629.95 samples/sec   Loss 8.2782   LearningRate 0.0372   Epoch: 7   Global Step: 323290   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:14,972-Speed 2626.52 samples/sec   Loss 8.3013   LearningRate 0.0372   Epoch: 7   Global Step: 323300   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:18,883-Speed 2619.08 samples/sec   Loss 8.3226   LearningRate 0.0372   Epoch: 7   Global Step: 323310   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:22,793-Speed 2619.44 samples/sec   Loss 8.1770   LearningRate 0.0372   Epoch: 7   Global Step: 323320   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:03:26,723-Speed 2605.82 samples/sec   Loss 8.2176   LearningRate 0.0372   Epoch: 7   Global Step: 323330   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:03:30,616-Speed 2631.43 samples/sec   Loss 8.1023   LearningRate 0.0372   Epoch: 7   Global Step: 323340   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:34,519-Speed 2623.94 samples/sec   Loss 8.2271   LearningRate 0.0372   Epoch: 7   Global Step: 323350   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:38,430-Speed 2618.92 samples/sec   Loss 8.1013   LearningRate 0.0372   Epoch: 7   Global Step: 323360   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:42,339-Speed 2619.96 samples/sec   Loss 8.3039   LearningRate 0.0372   Epoch: 7   Global Step: 323370   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:46,237-Speed 2627.97 samples/sec   Loss 8.1926   LearningRate 0.0372   Epoch: 7   Global Step: 323380   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:50,140-Speed 2624.30 samples/sec   Loss 8.2885   LearningRate 0.0372   Epoch: 7   Global Step: 323390   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:54,043-Speed 2624.09 samples/sec   Loss 8.0593   LearningRate 0.0372   Epoch: 7   Global Step: 323400   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:03:57,945-Speed 2624.91 samples/sec   Loss 8.2487   LearningRate 0.0372   Epoch: 7   Global Step: 323410   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:01,848-Speed 2624.44 samples/sec   Loss 8.3200   LearningRate 0.0372   Epoch: 7   Global Step: 323420   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:05,752-Speed 2623.46 samples/sec   Loss 8.1806   LearningRate 0.0372   Epoch: 7   Global Step: 323430   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:09,643-Speed 2632.23 samples/sec   Loss 8.2882   LearningRate 0.0372   Epoch: 7   Global Step: 323440   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:13,536-Speed 2630.87 samples/sec   Loss 8.2100   LearningRate 0.0372   Epoch: 7   Global Step: 323450   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:17,432-Speed 2629.02 samples/sec   Loss 8.3376   LearningRate 0.0372   Epoch: 7   Global Step: 323460   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:21,320-Speed 2634.54 samples/sec   Loss 8.1986   LearningRate 0.0372   Epoch: 7   Global Step: 323470   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:25,214-Speed 2630.61 samples/sec   Loss 8.1757   LearningRate 0.0372   Epoch: 7   Global Step: 323480   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:29,107-Speed 2630.69 samples/sec   Loss 8.1473   LearningRate 0.0372   Epoch: 7   Global Step: 323490   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:33,001-Speed 2630.15 samples/sec   Loss 8.2106   LearningRate 0.0372   Epoch: 7   Global Step: 323500   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:36,896-Speed 2630.06 samples/sec   Loss 8.1036   LearningRate 0.0372   Epoch: 7   Global Step: 323510   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:40,798-Speed 2624.70 samples/sec   Loss 8.2113   LearningRate 0.0372   Epoch: 7   Global Step: 323520   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:44,698-Speed 2626.12 samples/sec   Loss 8.2024   LearningRate 0.0372   Epoch: 7   Global Step: 323530   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:04:48,595-Speed 2628.33 samples/sec   Loss 8.1758   LearningRate 0.0372   Epoch: 7   Global Step: 323540   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:04:52,508-Speed 2617.41 samples/sec   Loss 8.3267   LearningRate 0.0372   Epoch: 7   Global Step: 323550   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:04:56,436-Speed 2607.72 samples/sec   Loss 8.1800   LearningRate 0.0372   Epoch: 7   Global Step: 323560   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:00,366-Speed 2606.60 samples/sec   Loss 8.3108   LearningRate 0.0372   Epoch: 7   Global Step: 323570   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:04,373-Speed 2556.16 samples/sec   Loss 8.2897   LearningRate 0.0372   Epoch: 7   Global Step: 323580   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:08,270-Speed 2627.52 samples/sec   Loss 8.0463   LearningRate 0.0372   Epoch: 7   Global Step: 323590   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:12,166-Speed 2628.91 samples/sec   Loss 8.2040   LearningRate 0.0372   Epoch: 7   Global Step: 323600   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:16,072-Speed 2622.53 samples/sec   Loss 8.1305   LearningRate 0.0372   Epoch: 7   Global Step: 323610   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:19,968-Speed 2628.93 samples/sec   Loss 8.1378   LearningRate 0.0372   Epoch: 7   Global Step: 323620   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:23,863-Speed 2629.80 samples/sec   Loss 8.2499   LearningRate 0.0372   Epoch: 7   Global Step: 323630   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:27,759-Speed 2628.77 samples/sec   Loss 8.3127   LearningRate 0.0372   Epoch: 7   Global Step: 323640   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:05:31,654-Speed 2629.67 samples/sec   Loss 8.1701   LearningRate 0.0372   Epoch: 7   Global Step: 323650   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:05:35,530-Speed 2642.71 samples/sec   Loss 8.1140   LearningRate 0.0372   Epoch: 7   Global Step: 323660   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:39,426-Speed 2628.97 samples/sec   Loss 8.2762   LearningRate 0.0372   Epoch: 7   Global Step: 323670   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:43,341-Speed 2615.86 samples/sec   Loss 8.3343   LearningRate 0.0372   Epoch: 7   Global Step: 323680   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:47,237-Speed 2628.80 samples/sec   Loss 8.1312   LearningRate 0.0372   Epoch: 7   Global Step: 323690   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:51,136-Speed 2627.22 samples/sec   Loss 8.2646   LearningRate 0.0372   Epoch: 7   Global Step: 323700   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:55,026-Speed 2632.49 samples/sec   Loss 8.1718   LearningRate 0.0372   Epoch: 7   Global Step: 323710   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:05:58,921-Speed 2630.16 samples/sec   Loss 8.1119   LearningRate 0.0372   Epoch: 7   Global Step: 323720   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:02,825-Speed 2623.42 samples/sec   Loss 8.1840   LearningRate 0.0372   Epoch: 7   Global Step: 323730   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:06,725-Speed 2625.75 samples/sec   Loss 8.1966   LearningRate 0.0372   Epoch: 7   Global Step: 323740   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:10,620-Speed 2629.79 samples/sec   Loss 8.2561   LearningRate 0.0372   Epoch: 7   Global Step: 323750   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:14,521-Speed 2626.23 samples/sec   Loss 8.3880   LearningRate 0.0372   Epoch: 7   Global Step: 323760   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:06:18,412-Speed 2631.95 samples/sec   Loss 8.3049   LearningRate 0.0372   Epoch: 7   Global Step: 323770   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:22,329-Speed 2614.75 samples/sec   Loss 8.2251   LearningRate 0.0372   Epoch: 7   Global Step: 323780   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:26,227-Speed 2627.87 samples/sec   Loss 8.2904   LearningRate 0.0372   Epoch: 7   Global Step: 323790   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:30,119-Speed 2631.49 samples/sec   Loss 8.1073   LearningRate 0.0372   Epoch: 7   Global Step: 323800   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:34,024-Speed 2622.59 samples/sec   Loss 8.1454   LearningRate 0.0372   Epoch: 7   Global Step: 323810   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:37,932-Speed 2620.91 samples/sec   Loss 8.1054   LearningRate 0.0372   Epoch: 7   Global Step: 323820   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:41,881-Speed 2593.81 samples/sec   Loss 8.2495   LearningRate 0.0372   Epoch: 7   Global Step: 323830   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:45,821-Speed 2599.60 samples/sec   Loss 8.0734   LearningRate 0.0372   Epoch: 7   Global Step: 323840   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:49,722-Speed 2625.04 samples/sec   Loss 8.3051   LearningRate 0.0372   Epoch: 7   Global Step: 323850   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:53,634-Speed 2618.63 samples/sec   Loss 8.2604   LearningRate 0.0372   Epoch: 7   Global Step: 323860   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:06:57,508-Speed 2643.82 samples/sec   Loss 8.3323   LearningRate 0.0372   Epoch: 7   Global Step: 323870   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:01,401-Speed 2631.68 samples/sec   Loss 8.2517   LearningRate 0.0372   Epoch: 7   Global Step: 323880   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:05,293-Speed 2631.39 samples/sec   Loss 8.0254   LearningRate 0.0372   Epoch: 7   Global Step: 323890   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:09,191-Speed 2627.37 samples/sec   Loss 8.3701   LearningRate 0.0372   Epoch: 7   Global Step: 323900   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:13,096-Speed 2622.27 samples/sec   Loss 8.1439   LearningRate 0.0372   Epoch: 7   Global Step: 323910   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:16,998-Speed 2625.20 samples/sec   Loss 8.2107   LearningRate 0.0372   Epoch: 7   Global Step: 323920   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:20,891-Speed 2631.33 samples/sec   Loss 8.2545   LearningRate 0.0372   Epoch: 7   Global Step: 323930   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:24,787-Speed 2628.55 samples/sec   Loss 8.3875   LearningRate 0.0372   Epoch: 7   Global Step: 323940   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:28,681-Speed 2631.05 samples/sec   Loss 8.1710   LearningRate 0.0371   Epoch: 7   Global Step: 323950   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:32,583-Speed 2624.93 samples/sec   Loss 8.1882   LearningRate 0.0371   Epoch: 7   Global Step: 323960   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:36,475-Speed 2631.46 samples/sec   Loss 8.1856   LearningRate 0.0371   Epoch: 7   Global Step: 323970   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:07:40,352-Speed 2641.58 samples/sec   Loss 8.2776   LearningRate 0.0371   Epoch: 7   Global Step: 323980   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:44,244-Speed 2631.78 samples/sec   Loss 8.0367   LearningRate 0.0371   Epoch: 7   Global Step: 323990   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:48,138-Speed 2629.56 samples/sec   Loss 8.2100   LearningRate 0.0371   Epoch: 7   Global Step: 324000   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:07:52,015-Speed 2642.45 samples/sec   Loss 8.0615   LearningRate 0.0371   Epoch: 7   Global Step: 324010   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:07:55,905-Speed 2632.99 samples/sec   Loss 8.1882   LearningRate 0.0371   Epoch: 7   Global Step: 324020   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:07:59,805-Speed 2626.12 samples/sec   Loss 8.2857   LearningRate 0.0371   Epoch: 7   Global Step: 324030   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:03,700-Speed 2629.67 samples/sec   Loss 8.1728   LearningRate 0.0371   Epoch: 7   Global Step: 324040   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:07,595-Speed 2629.88 samples/sec   Loss 8.2033   LearningRate 0.0371   Epoch: 7   Global Step: 324050   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:11,487-Speed 2630.98 samples/sec   Loss 8.1857   LearningRate 0.0371   Epoch: 7   Global Step: 324060   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:15,379-Speed 2631.98 samples/sec   Loss 8.1555   LearningRate 0.0371   Epoch: 7   Global Step: 324070   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:19,270-Speed 2632.05 samples/sec   Loss 8.1471   LearningRate 0.0371   Epoch: 7   Global Step: 324080   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:23,175-Speed 2622.95 samples/sec   Loss 8.1432   LearningRate 0.0371   Epoch: 7   Global Step: 324090   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:27,075-Speed 2626.37 samples/sec   Loss 8.2290   LearningRate 0.0371   Epoch: 7   Global Step: 324100   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:08:30,970-Speed 2629.82 samples/sec   Loss 8.1153   LearningRate 0.0371   Epoch: 7   Global Step: 324110   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:08:34,875-Speed 2622.57 samples/sec   Loss 8.2192   LearningRate 0.0371   Epoch: 7   Global Step: 324120   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:08:38,778-Speed 2624.60 samples/sec   Loss 8.2261   LearningRate 0.0371   Epoch: 7   Global Step: 324130   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:08:42,671-Speed 2631.13 samples/sec   Loss 8.2270   LearningRate 0.0371   Epoch: 7   Global Step: 324140   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:08:46,567-Speed 2629.25 samples/sec   Loss 8.1644   LearningRate 0.0371   Epoch: 7   Global Step: 324150   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:08:50,460-Speed 2631.05 samples/sec   Loss 8.2495   LearningRate 0.0371   Epoch: 7   Global Step: 324160   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:08:54,354-Speed 2629.91 samples/sec   Loss 8.2469   LearningRate 0.0371   Epoch: 7   Global Step: 324170   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:08:58,247-Speed 2631.24 samples/sec   Loss 8.2318   LearningRate 0.0371   Epoch: 7   Global Step: 324180   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:02,154-Speed 2621.28 samples/sec   Loss 8.1349   LearningRate 0.0371   Epoch: 7   Global Step: 324190   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:06,047-Speed 2631.17 samples/sec   Loss 8.2776   LearningRate 0.0371   Epoch: 7   Global Step: 324200   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:09,927-Speed 2639.43 samples/sec   Loss 8.1813   LearningRate 0.0371   Epoch: 7   Global Step: 324210   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:13,820-Speed 2630.91 samples/sec   Loss 8.2136   LearningRate 0.0371   Epoch: 7   Global Step: 324220   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:17,714-Speed 2630.52 samples/sec   Loss 8.1511   LearningRate 0.0371   Epoch: 7   Global Step: 324230   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:21,605-Speed 2632.31 samples/sec   Loss 8.2374   LearningRate 0.0371   Epoch: 7   Global Step: 324240   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:25,497-Speed 2631.89 samples/sec   Loss 8.1928   LearningRate 0.0371   Epoch: 7   Global Step: 324250   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:29,406-Speed 2620.25 samples/sec   Loss 8.1896   LearningRate 0.0371   Epoch: 7   Global Step: 324260   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:33,297-Speed 2632.27 samples/sec   Loss 8.2167   LearningRate 0.0371   Epoch: 7   Global Step: 324270   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:37,204-Speed 2621.12 samples/sec   Loss 8.1037   LearningRate 0.0371   Epoch: 7   Global Step: 324280   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:41,094-Speed 2632.75 samples/sec   Loss 8.2433   LearningRate 0.0371   Epoch: 7   Global Step: 324290   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:44,999-Speed 2623.22 samples/sec   Loss 8.1867   LearningRate 0.0371   Epoch: 7   Global Step: 324300   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:09:48,889-Speed 2632.50 samples/sec   Loss 8.2536   LearningRate 0.0371   Epoch: 7   Global Step: 324310   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:09:52,786-Speed 2629.05 samples/sec   Loss 8.1671   LearningRate 0.0371   Epoch: 7   Global Step: 324320   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:09:56,683-Speed 2627.81 samples/sec   Loss 8.2178   LearningRate 0.0371   Epoch: 7   Global Step: 324330   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:10:00,573-Speed 2635.54 samples/sec   Loss 8.2591   LearningRate 0.0371   Epoch: 7   Global Step: 324340   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:10:04,451-Speed 2641.02 samples/sec   Loss 8.0822   LearningRate 0.0371   Epoch: 7   Global Step: 324350   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:10:08,327-Speed 2642.72 samples/sec   Loss 8.1992   LearningRate 0.0371   Epoch: 7   Global Step: 324360   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:12,219-Speed 2631.65 samples/sec   Loss 8.1305   LearningRate 0.0371   Epoch: 7   Global Step: 324370   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:16,114-Speed 2629.57 samples/sec   Loss 8.1562   LearningRate 0.0371   Epoch: 7   Global Step: 324380   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:20,005-Speed 2632.24 samples/sec   Loss 8.1906   LearningRate 0.0371   Epoch: 7   Global Step: 324390   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:23,896-Speed 2632.24 samples/sec   Loss 8.2135   LearningRate 0.0371   Epoch: 7   Global Step: 324400   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:27,787-Speed 2631.92 samples/sec   Loss 8.0871   LearningRate 0.0371   Epoch: 7   Global Step: 324410   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:31,676-Speed 2634.30 samples/sec   Loss 8.2739   LearningRate 0.0371   Epoch: 7   Global Step: 324420   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:35,570-Speed 2630.10 samples/sec   Loss 8.1621   LearningRate 0.0371   Epoch: 7   Global Step: 324430   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:39,474-Speed 2623.49 samples/sec   Loss 8.0331   LearningRate 0.0371   Epoch: 7   Global Step: 324440   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:43,402-Speed 2607.76 samples/sec   Loss 8.2574   LearningRate 0.0371   Epoch: 7   Global Step: 324450   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:47,295-Speed 2630.89 samples/sec   Loss 8.1786   LearningRate 0.0371   Epoch: 7   Global Step: 324460   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:10:51,172-Speed 2641.71 samples/sec   Loss 8.1159   LearningRate 0.0371   Epoch: 7   Global Step: 324470   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:55,072-Speed 2626.98 samples/sec   Loss 8.2105   LearningRate 0.0371   Epoch: 7   Global Step: 324480   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:10:58,964-Speed 2631.69 samples/sec   Loss 8.1302   LearningRate 0.0371   Epoch: 7   Global Step: 324490   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:02,860-Speed 2628.47 samples/sec   Loss 8.2334   LearningRate 0.0371   Epoch: 7   Global Step: 324500   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:06,764-Speed 2623.93 samples/sec   Loss 8.1632   LearningRate 0.0371   Epoch: 7   Global Step: 324510   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:10,662-Speed 2627.71 samples/sec   Loss 8.1607   LearningRate 0.0371   Epoch: 7   Global Step: 324520   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:14,558-Speed 2628.67 samples/sec   Loss 8.0956   LearningRate 0.0371   Epoch: 7   Global Step: 324530   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:18,458-Speed 2626.05 samples/sec   Loss 8.2098   LearningRate 0.0371   Epoch: 7   Global Step: 324540   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:22,359-Speed 2625.60 samples/sec   Loss 8.1457   LearningRate 0.0371   Epoch: 7   Global Step: 324550   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:26,259-Speed 2626.27 samples/sec   Loss 8.2800   LearningRate 0.0371   Epoch: 7   Global Step: 324560   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:11:30,155-Speed 2629.17 samples/sec   Loss 8.1249   LearningRate 0.0371   Epoch: 7   Global Step: 324570   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:11:34,050-Speed 2629.66 samples/sec   Loss 8.1857   LearningRate 0.0371   Epoch: 7   Global Step: 324580   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:11:37,955-Speed 2622.40 samples/sec   Loss 8.2117   LearningRate 0.0371   Epoch: 7   Global Step: 324590   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:11:41,853-Speed 2627.58 samples/sec   Loss 8.1500   LearningRate 0.0371   Epoch: 7   Global Step: 324600   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:11:45,752-Speed 2627.46 samples/sec   Loss 8.1789   LearningRate 0.0371   Epoch: 7   Global Step: 324610   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:11:49,649-Speed 2627.89 samples/sec   Loss 8.0609   LearningRate 0.0371   Epoch: 7   Global Step: 324620   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:11:53,547-Speed 2627.96 samples/sec   Loss 8.2062   LearningRate 0.0370   Epoch: 7   Global Step: 324630   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:11:57,452-Speed 2623.13 samples/sec   Loss 8.3936   LearningRate 0.0370   Epoch: 7   Global Step: 324640   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:12:01,346-Speed 2629.93 samples/sec   Loss 8.1562   LearningRate 0.0370   Epoch: 7   Global Step: 324650   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:12:05,236-Speed 2632.88 samples/sec   Loss 8.2340   LearningRate 0.0370   Epoch: 7   Global Step: 324660   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:12:09,127-Speed 2632.14 samples/sec   Loss 8.2560   LearningRate 0.0370   Epoch: 7   Global Step: 324670   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:12:13,008-Speed 2639.26 samples/sec   Loss 8.2580   LearningRate 0.0370   Epoch: 7   Global Step: 324680   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:12:16,905-Speed 2628.37 samples/sec   Loss 8.3107   LearningRate 0.0370   Epoch: 7   Global Step: 324690   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:12:20,806-Speed 2625.56 samples/sec   Loss 8.0904   LearningRate 0.0370   Epoch: 7   Global Step: 324700   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:12:24,700-Speed 2630.11 samples/sec   Loss 8.2057   LearningRate 0.0370   Epoch: 7   Global Step: 324710   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:12:28,495-Speed 2699.44 samples/sec   Loss 8.4634   LearningRate 0.0370   Epoch: 7   Global Step: 324720   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:32,380-Speed 2636.59 samples/sec   Loss 8.4139   LearningRate 0.0370   Epoch: 7   Global Step: 324730   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:36,331-Speed 2592.03 samples/sec   Loss 8.2323   LearningRate 0.0370   Epoch: 7   Global Step: 324740   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:40,232-Speed 2625.42 samples/sec   Loss 8.1621   LearningRate 0.0370   Epoch: 7   Global Step: 324750   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:44,120-Speed 2634.29 samples/sec   Loss 8.3162   LearningRate 0.0370   Epoch: 7   Global Step: 324760   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:48,013-Speed 2630.86 samples/sec   Loss 8.2872   LearningRate 0.0370   Epoch: 7   Global Step: 324770   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:51,911-Speed 2627.99 samples/sec   Loss 8.1777   LearningRate 0.0370   Epoch: 7   Global Step: 324780   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:55,804-Speed 2630.45 samples/sec   Loss 8.1920   LearningRate 0.0370   Epoch: 7   Global Step: 324790   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:12:59,701-Speed 2628.52 samples/sec   Loss 8.2690   LearningRate 0.0370   Epoch: 7   Global Step: 324800   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:13:03,597-Speed 2628.97 samples/sec   Loss 8.1036   LearningRate 0.0370   Epoch: 7   Global Step: 324810   Fp16 Grad Scale: 1024   Required: 57 hours
Training: 2022-04-14 08:13:07,491-Speed 2630.09 samples/sec   Loss 8.1956   LearningRate 0.0370   Epoch: 7   Global Step: 324820   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:11,397-Speed 2622.61 samples/sec   Loss 8.1517   LearningRate 0.0370   Epoch: 7   Global Step: 324830   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:15,291-Speed 2630.29 samples/sec   Loss 8.1446   LearningRate 0.0370   Epoch: 7   Global Step: 324840   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:19,179-Speed 2633.89 samples/sec   Loss 8.1770   LearningRate 0.0370   Epoch: 7   Global Step: 324850   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:23,067-Speed 2634.48 samples/sec   Loss 8.2195   LearningRate 0.0370   Epoch: 7   Global Step: 324860   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:26,956-Speed 2633.49 samples/sec   Loss 8.2281   LearningRate 0.0370   Epoch: 7   Global Step: 324870   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:30,842-Speed 2635.78 samples/sec   Loss 8.1292   LearningRate 0.0370   Epoch: 7   Global Step: 324880   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:34,731-Speed 2633.35 samples/sec   Loss 8.3120   LearningRate 0.0370   Epoch: 7   Global Step: 324890   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:38,619-Speed 2634.71 samples/sec   Loss 8.2246   LearningRate 0.0370   Epoch: 7   Global Step: 324900   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:42,512-Speed 2631.14 samples/sec   Loss 8.1428   LearningRate 0.0370   Epoch: 7   Global Step: 324910   Fp16 Grad Scale: 2048   Required: 57 hours
Training: 2022-04-14 08:13:46,405-Speed 2630.76 samples/sec   Loss 8.1922   LearningRate 0.0370   Epoch: 7   Global Step: 324920   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:13:50,302-Speed 2628.69 samples/sec   Loss 8.0814   LearningRate 0.0370   Epoch: 7   Global Step: 324930   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:13:54,201-Speed 2626.70 samples/sec   Loss 8.2327   LearningRate 0.0370   Epoch: 7   Global Step: 324940   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:13:58,102-Speed 2625.20 samples/sec   Loss 8.2220   LearningRate 0.0370   Epoch: 7   Global Step: 324950   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:14:02,043-Speed 2599.32 samples/sec   Loss 8.1802   LearningRate 0.0370   Epoch: 7   Global Step: 324960   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:14:05,941-Speed 2627.63 samples/sec   Loss 8.4914   LearningRate 0.0370   Epoch: 7   Global Step: 324970   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:14:09,839-Speed 2627.45 samples/sec   Loss 9.7753   LearningRate 0.0370   Epoch: 7   Global Step: 324980   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:14:13,730-Speed 2631.86 samples/sec   Loss 8.7474   LearningRate 0.0370   Epoch: 7   Global Step: 324990   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:14:17,621-Speed 2633.03 samples/sec   Loss 8.5662   LearningRate 0.0370   Epoch: 7   Global Step: 325000   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:14:21,511-Speed 2633.06 samples/sec   Loss 8.6005   LearningRate 0.0370   Epoch: 7   Global Step: 325010   Fp16 Grad Scale: 4096   Required: 57 hours
Training: 2022-04-14 08:14:25,402-Speed 2632.07 samples/sec   Loss 8.3873   LearningRate 0.0370   Epoch: 7   Global Step: 325020   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:29,293-Speed 2632.91 samples/sec   Loss 8.3539   LearningRate 0.0370   Epoch: 7   Global Step: 325030   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:33,182-Speed 2633.10 samples/sec   Loss 8.2783   LearningRate 0.0370   Epoch: 7   Global Step: 325040   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:37,071-Speed 2634.27 samples/sec   Loss 8.3541   LearningRate 0.0370   Epoch: 7   Global Step: 325050   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:40,961-Speed 2633.09 samples/sec   Loss 8.2536   LearningRate 0.0370   Epoch: 7   Global Step: 325060   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:44,856-Speed 2629.43 samples/sec   Loss 8.3219   LearningRate 0.0370   Epoch: 7   Global Step: 325070   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:48,750-Speed 2630.29 samples/sec   Loss 8.1877   LearningRate 0.0370   Epoch: 7   Global Step: 325080   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:52,643-Speed 2631.32 samples/sec   Loss 8.2336   LearningRate 0.0370   Epoch: 7   Global Step: 325090   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:14:56,539-Speed 2629.80 samples/sec   Loss 8.3311   LearningRate 0.0370   Epoch: 7   Global Step: 325100   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:15:00,447-Speed 2620.71 samples/sec   Loss 8.2135   LearningRate 0.0370   Epoch: 7   Global Step: 325110   Fp16 Grad Scale: 8192   Required: 57 hours
Training: 2022-04-14 08:15:04,344-Speed 2627.92 samples/sec   Loss 8.2210   LearningRate 0.0370   Epoch: 7   Global Step: 325120   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:08,240-Speed 2629.23 samples/sec   Loss 8.1852   LearningRate 0.0370   Epoch: 7   Global Step: 325130   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:12,145-Speed 2622.86 samples/sec   Loss 8.1394   LearningRate 0.0370   Epoch: 7   Global Step: 325140   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:16,076-Speed 2605.28 samples/sec   Loss 8.1339   LearningRate 0.0370   Epoch: 7   Global Step: 325150   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:19,964-Speed 2634.92 samples/sec   Loss 8.0512   LearningRate 0.0370   Epoch: 7   Global Step: 325160   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:23,856-Speed 2632.03 samples/sec   Loss 7.9733   LearningRate 0.0370   Epoch: 7   Global Step: 325170   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:27,765-Speed 2620.52 samples/sec   Loss 8.2161   LearningRate 0.0370   Epoch: 7   Global Step: 325180   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:31,662-Speed 2627.70 samples/sec   Loss 8.3257   LearningRate 0.0370   Epoch: 7   Global Step: 325190   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:35,562-Speed 2626.65 samples/sec   Loss 8.1498   LearningRate 0.0370   Epoch: 7   Global Step: 325200   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:39,458-Speed 2629.01 samples/sec   Loss 8.2505   LearningRate 0.0370   Epoch: 7   Global Step: 325210   Fp16 Grad Scale: 16384   Required: 57 hours
Training: 2022-04-14 08:15:43,358-Speed 2625.95 samples/sec   Loss 8.1995   LearningRate 0.0370   Epoch: 7   Global Step: 325220   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:15:47,249-Speed 2632.33 samples/sec   Loss 8.1666   LearningRate 0.0370   Epoch: 7   Global Step: 325230   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:15:51,142-Speed 2630.93 samples/sec   Loss 8.1047   LearningRate 0.0370   Epoch: 7   Global Step: 325240   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:15:55,035-Speed 2631.27 samples/sec   Loss 8.0400   LearningRate 0.0370   Epoch: 7   Global Step: 325250   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:15:58,928-Speed 2630.87 samples/sec   Loss 8.2464   LearningRate 0.0370   Epoch: 7   Global Step: 325260   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:16:02,818-Speed 2633.06 samples/sec   Loss 8.1512   LearningRate 0.0370   Epoch: 7   Global Step: 325270   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:16:06,716-Speed 2628.19 samples/sec   Loss 8.1529   LearningRate 0.0370   Epoch: 7   Global Step: 325280   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:16:10,600-Speed 2637.20 samples/sec   Loss 8.2101   LearningRate 0.0370   Epoch: 7   Global Step: 325290   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:16:14,495-Speed 2628.96 samples/sec   Loss 8.2998   LearningRate 0.0370   Epoch: 7   Global Step: 325300   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:16:18,394-Speed 2626.88 samples/sec   Loss 8.1014   LearningRate 0.0369   Epoch: 7   Global Step: 325310   Fp16 Grad Scale: 32768   Required: 57 hours
Training: 2022-04-14 08:16:22,294-Speed 2626.35 samples/sec   Loss 8.2315   LearningRate 0.0369   Epoch: 7   Global Step: 325320   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:26,200-Speed 2622.39 samples/sec   Loss 8.3914   LearningRate 0.0369   Epoch: 7   Global Step: 325330   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:30,095-Speed 2629.61 samples/sec   Loss 8.1339   LearningRate 0.0369   Epoch: 7   Global Step: 325340   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:33,988-Speed 2631.56 samples/sec   Loss 8.1389   LearningRate 0.0369   Epoch: 7   Global Step: 325350   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:37,876-Speed 2634.18 samples/sec   Loss 8.1974   LearningRate 0.0369   Epoch: 7   Global Step: 325360   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:41,766-Speed 2633.67 samples/sec   Loss 8.1706   LearningRate 0.0369   Epoch: 7   Global Step: 325370   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:45,657-Speed 2631.63 samples/sec   Loss 8.2502   LearningRate 0.0369   Epoch: 7   Global Step: 325380   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:49,546-Speed 2633.86 samples/sec   Loss 8.2727   LearningRate 0.0369   Epoch: 7   Global Step: 325390   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:53,441-Speed 2629.55 samples/sec   Loss 8.1787   LearningRate 0.0369   Epoch: 7   Global Step: 325400   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:16:57,370-Speed 2607.33 samples/sec   Loss 8.2321   LearningRate 0.0369   Epoch: 7   Global Step: 325410   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:01,270-Speed 2625.74 samples/sec   Loss 8.2805   LearningRate 0.0369   Epoch: 7   Global Step: 325420   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:17:05,170-Speed 2626.50 samples/sec   Loss 8.0914   LearningRate 0.0369   Epoch: 7   Global Step: 325430   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:17:09,060-Speed 2632.80 samples/sec   Loss 8.1180   LearningRate 0.0369   Epoch: 7   Global Step: 325440   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:17:12,949-Speed 2634.21 samples/sec   Loss 8.0060   LearningRate 0.0369   Epoch: 7   Global Step: 325450   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:17:16,841-Speed 2631.99 samples/sec   Loss 8.2054   LearningRate 0.0369   Epoch: 7   Global Step: 325460   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:17:20,798-Speed 2588.54 samples/sec   Loss 8.2214   LearningRate 0.0369   Epoch: 7   Global Step: 325470   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:17:24,864-Speed 2519.02 samples/sec   Loss 8.2392   LearningRate 0.0369   Epoch: 7   Global Step: 325480   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:28,795-Speed 2605.40 samples/sec   Loss 8.1915   LearningRate 0.0369   Epoch: 7   Global Step: 325490   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:32,699-Speed 2624.03 samples/sec   Loss 8.1593   LearningRate 0.0369   Epoch: 7   Global Step: 325500   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:36,598-Speed 2626.70 samples/sec   Loss 8.1851   LearningRate 0.0369   Epoch: 7   Global Step: 325510   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:40,493-Speed 2630.11 samples/sec   Loss 8.1199   LearningRate 0.0369   Epoch: 7   Global Step: 325520   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:44,419-Speed 2608.45 samples/sec   Loss 8.1724   LearningRate 0.0369   Epoch: 7   Global Step: 325530   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:48,424-Speed 2557.85 samples/sec   Loss 8.2100   LearningRate 0.0369   Epoch: 7   Global Step: 325540   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:52,313-Speed 2634.34 samples/sec   Loss 8.1241   LearningRate 0.0369   Epoch: 7   Global Step: 325550   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:17:56,269-Speed 2588.37 samples/sec   Loss 8.2377   LearningRate 0.0369   Epoch: 7   Global Step: 325560   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:00,354-Speed 2507.62 samples/sec   Loss 8.1252   LearningRate 0.0369   Epoch: 7   Global Step: 325570   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:04,246-Speed 2632.12 samples/sec   Loss 8.1792   LearningRate 0.0369   Epoch: 7   Global Step: 325580   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:08,137-Speed 2632.03 samples/sec   Loss 8.1602   LearningRate 0.0369   Epoch: 7   Global Step: 325590   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:12,043-Speed 2621.79 samples/sec   Loss 8.2393   LearningRate 0.0369   Epoch: 7   Global Step: 325600   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:15,938-Speed 2630.92 samples/sec   Loss 8.2439   LearningRate 0.0369   Epoch: 7   Global Step: 325610   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:19,843-Speed 2623.30 samples/sec   Loss 8.0668   LearningRate 0.0369   Epoch: 7   Global Step: 325620   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:23,736-Speed 2631.00 samples/sec   Loss 8.2349   LearningRate 0.0369   Epoch: 7   Global Step: 325630   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:27,629-Speed 2631.13 samples/sec   Loss 7.9739   LearningRate 0.0369   Epoch: 7   Global Step: 325640   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:31,643-Speed 2551.70 samples/sec   Loss 8.2249   LearningRate 0.0369   Epoch: 7   Global Step: 325650   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:18:35,527-Speed 2637.30 samples/sec   Loss 8.1013   LearningRate 0.0369   Epoch: 7   Global Step: 325660   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:39,416-Speed 2633.36 samples/sec   Loss 8.2822   LearningRate 0.0369   Epoch: 7   Global Step: 325670   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:43,306-Speed 2632.54 samples/sec   Loss 8.1805   LearningRate 0.0369   Epoch: 7   Global Step: 325680   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:47,197-Speed 2632.55 samples/sec   Loss 8.1327   LearningRate 0.0369   Epoch: 7   Global Step: 325690   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:51,089-Speed 2632.15 samples/sec   Loss 8.0916   LearningRate 0.0369   Epoch: 7   Global Step: 325700   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:54,979-Speed 2633.48 samples/sec   Loss 8.1488   LearningRate 0.0369   Epoch: 7   Global Step: 325710   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:18:58,875-Speed 2628.24 samples/sec   Loss 8.1577   LearningRate 0.0369   Epoch: 7   Global Step: 325720   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:19:02,767-Speed 2632.09 samples/sec   Loss 8.2222   LearningRate 0.0369   Epoch: 7   Global Step: 325730   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:19:06,671-Speed 2623.57 samples/sec   Loss 8.2740   LearningRate 0.0369   Epoch: 7   Global Step: 325740   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:19:10,572-Speed 2625.54 samples/sec   Loss 8.1048   LearningRate 0.0369   Epoch: 7   Global Step: 325750   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:19:14,463-Speed 2631.78 samples/sec   Loss 8.2022   LearningRate 0.0369   Epoch: 7   Global Step: 325760   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:18,404-Speed 2599.18 samples/sec   Loss 8.0972   LearningRate 0.0369   Epoch: 7   Global Step: 325770   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:22,294-Speed 2633.25 samples/sec   Loss 8.2073   LearningRate 0.0369   Epoch: 7   Global Step: 325780   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:26,188-Speed 2630.78 samples/sec   Loss 8.2338   LearningRate 0.0369   Epoch: 7   Global Step: 325790   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:30,082-Speed 2630.08 samples/sec   Loss 8.1638   LearningRate 0.0369   Epoch: 7   Global Step: 325800   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:33,979-Speed 2628.03 samples/sec   Loss 8.2014   LearningRate 0.0369   Epoch: 7   Global Step: 325810   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:37,888-Speed 2620.34 samples/sec   Loss 8.2862   LearningRate 0.0369   Epoch: 7   Global Step: 325820   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:41,789-Speed 2625.60 samples/sec   Loss 8.2135   LearningRate 0.0369   Epoch: 7   Global Step: 325830   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:45,692-Speed 2624.52 samples/sec   Loss 8.1700   LearningRate 0.0369   Epoch: 7   Global Step: 325840   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:49,607-Speed 2616.08 samples/sec   Loss 8.1419   LearningRate 0.0369   Epoch: 7   Global Step: 325850   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:19:53,517-Speed 2619.92 samples/sec   Loss 8.2179   LearningRate 0.0369   Epoch: 7   Global Step: 325860   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:19:57,422-Speed 2622.63 samples/sec   Loss 8.2351   LearningRate 0.0369   Epoch: 7   Global Step: 325870   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:20:01,333-Speed 2618.84 samples/sec   Loss 8.2260   LearningRate 0.0369   Epoch: 7   Global Step: 325880   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:20:05,251-Speed 2614.21 samples/sec   Loss 8.1209   LearningRate 0.0369   Epoch: 7   Global Step: 325890   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:20:09,150-Speed 2626.94 samples/sec   Loss 8.2907   LearningRate 0.0369   Epoch: 7   Global Step: 325900   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:20:13,041-Speed 2632.14 samples/sec   Loss 8.0296   LearningRate 0.0369   Epoch: 7   Global Step: 325910   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:20:16,933-Speed 2632.26 samples/sec   Loss 8.1545   LearningRate 0.0369   Epoch: 7   Global Step: 325920   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:20:20,804-Speed 2645.27 samples/sec   Loss 8.0361   LearningRate 0.0369   Epoch: 7   Global Step: 325930   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:24,695-Speed 2632.50 samples/sec   Loss 8.1655   LearningRate 0.0369   Epoch: 7   Global Step: 325940   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:28,584-Speed 2633.72 samples/sec   Loss 8.1514   LearningRate 0.0369   Epoch: 7   Global Step: 325950   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:32,477-Speed 2631.05 samples/sec   Loss 8.1053   LearningRate 0.0369   Epoch: 7   Global Step: 325960   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:36,368-Speed 2632.11 samples/sec   Loss 8.1733   LearningRate 0.0369   Epoch: 7   Global Step: 325970   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:40,264-Speed 2629.02 samples/sec   Loss 8.0660   LearningRate 0.0369   Epoch: 7   Global Step: 325980   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:44,156-Speed 2631.79 samples/sec   Loss 8.1463   LearningRate 0.0369   Epoch: 7   Global Step: 325990   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:48,050-Speed 2629.99 samples/sec   Loss 8.0442   LearningRate 0.0368   Epoch: 7   Global Step: 326000   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:51,947-Speed 2628.95 samples/sec   Loss 8.0426   LearningRate 0.0368   Epoch: 7   Global Step: 326010   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:55,838-Speed 2632.01 samples/sec   Loss 8.2487   LearningRate 0.0368   Epoch: 7   Global Step: 326020   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:20:59,729-Speed 2632.12 samples/sec   Loss 8.0900   LearningRate 0.0368   Epoch: 7   Global Step: 326030   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:21:03,638-Speed 2620.31 samples/sec   Loss 8.1789   LearningRate 0.0368   Epoch: 7   Global Step: 326040   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:21:07,541-Speed 2625.00 samples/sec   Loss 8.1231   LearningRate 0.0368   Epoch: 7   Global Step: 326050   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:21:11,440-Speed 2626.67 samples/sec   Loss 8.2435   LearningRate 0.0368   Epoch: 7   Global Step: 326060   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:21:15,368-Speed 2607.18 samples/sec   Loss 8.2907   LearningRate 0.0368   Epoch: 7   Global Step: 326070   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:19,258-Speed 2633.56 samples/sec   Loss 7.9792   LearningRate 0.0368   Epoch: 7   Global Step: 326080   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:23,150-Speed 2631.75 samples/sec   Loss 8.0802   LearningRate 0.0368   Epoch: 7   Global Step: 326090   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:27,043-Speed 2631.47 samples/sec   Loss 8.1097   LearningRate 0.0368   Epoch: 7   Global Step: 326100   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:30,950-Speed 2621.68 samples/sec   Loss 8.1579   LearningRate 0.0368   Epoch: 7   Global Step: 326110   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:34,844-Speed 2630.40 samples/sec   Loss 8.1637   LearningRate 0.0368   Epoch: 7   Global Step: 326120   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:38,753-Speed 2620.23 samples/sec   Loss 8.0694   LearningRate 0.0368   Epoch: 7   Global Step: 326130   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:42,654-Speed 2625.23 samples/sec   Loss 8.1450   LearningRate 0.0368   Epoch: 7   Global Step: 326140   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:46,552-Speed 2628.15 samples/sec   Loss 8.1532   LearningRate 0.0368   Epoch: 7   Global Step: 326150   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:50,448-Speed 2629.04 samples/sec   Loss 8.2410   LearningRate 0.0368   Epoch: 7   Global Step: 326160   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:21:54,341-Speed 2630.96 samples/sec   Loss 8.2696   LearningRate 0.0368   Epoch: 7   Global Step: 326170   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:21:58,232-Speed 2632.69 samples/sec   Loss 8.2285   LearningRate 0.0368   Epoch: 7   Global Step: 326180   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:02,127-Speed 2629.48 samples/sec   Loss 8.0963   LearningRate 0.0368   Epoch: 7   Global Step: 326190   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:06,019-Speed 2631.88 samples/sec   Loss 8.0706   LearningRate 0.0368   Epoch: 7   Global Step: 326200   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:09,919-Speed 2626.07 samples/sec   Loss 8.1965   LearningRate 0.0368   Epoch: 7   Global Step: 326210   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:13,827-Speed 2621.04 samples/sec   Loss 8.0451   LearningRate 0.0368   Epoch: 7   Global Step: 326220   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:17,838-Speed 2553.77 samples/sec   Loss 8.2903   LearningRate 0.0368   Epoch: 7   Global Step: 326230   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:21,814-Speed 2576.94 samples/sec   Loss 8.1542   LearningRate 0.0368   Epoch: 7   Global Step: 326240   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:25,707-Speed 2630.78 samples/sec   Loss 8.1264   LearningRate 0.0368   Epoch: 7   Global Step: 326250   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:29,622-Speed 2615.79 samples/sec   Loss 8.1908   LearningRate 0.0368   Epoch: 7   Global Step: 326260   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:33,529-Speed 2622.02 samples/sec   Loss 8.1861   LearningRate 0.0368   Epoch: 7   Global Step: 326270   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:22:37,422-Speed 2630.94 samples/sec   Loss 8.1376   LearningRate 0.0368   Epoch: 7   Global Step: 326280   Fp16 Grad Scale: 262144   Required: 57 hours
Training: 2022-04-14 08:22:41,295-Speed 2644.87 samples/sec   Loss 8.2118   LearningRate 0.0368   Epoch: 7   Global Step: 326290   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:45,195-Speed 2626.21 samples/sec   Loss 8.1495   LearningRate 0.0368   Epoch: 7   Global Step: 326300   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:49,088-Speed 2630.62 samples/sec   Loss 8.0880   LearningRate 0.0368   Epoch: 7   Global Step: 326310   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:52,984-Speed 2628.89 samples/sec   Loss 8.0397   LearningRate 0.0368   Epoch: 7   Global Step: 326320   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:22:56,870-Speed 2635.77 samples/sec   Loss 8.1632   LearningRate 0.0368   Epoch: 7   Global Step: 326330   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:00,758-Speed 2634.63 samples/sec   Loss 8.1328   LearningRate 0.0368   Epoch: 7   Global Step: 326340   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:04,656-Speed 2627.29 samples/sec   Loss 8.1810   LearningRate 0.0368   Epoch: 7   Global Step: 326350   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:08,554-Speed 2627.92 samples/sec   Loss 8.0872   LearningRate 0.0368   Epoch: 7   Global Step: 326360   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:12,458-Speed 2623.86 samples/sec   Loss 8.2099   LearningRate 0.0368   Epoch: 7   Global Step: 326370   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:16,356-Speed 2627.35 samples/sec   Loss 8.1177   LearningRate 0.0368   Epoch: 7   Global Step: 326380   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:20,260-Speed 2623.50 samples/sec   Loss 8.1818   LearningRate 0.0368   Epoch: 7   Global Step: 326390   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:24,158-Speed 2627.85 samples/sec   Loss 8.1492   LearningRate 0.0368   Epoch: 7   Global Step: 326400   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:28,206-Speed 2529.91 samples/sec   Loss 8.2405   LearningRate 0.0368   Epoch: 7   Global Step: 326410   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:32,109-Speed 2624.28 samples/sec   Loss 8.1460   LearningRate 0.0368   Epoch: 7   Global Step: 326420   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:36,005-Speed 2629.10 samples/sec   Loss 8.1566   LearningRate 0.0368   Epoch: 7   Global Step: 326430   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:23:39,921-Speed 2615.27 samples/sec   Loss 8.2216   LearningRate 0.0368   Epoch: 7   Global Step: 326440   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:23:43,928-Speed 2556.20 samples/sec   Loss 8.2039   LearningRate 0.0368   Epoch: 7   Global Step: 326450   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:23:47,821-Speed 2631.02 samples/sec   Loss 8.0517   LearningRate 0.0368   Epoch: 7   Global Step: 326460   Fp16 Grad Scale: 131072   Required: 57 hours
Training: 2022-04-14 08:23:51,704-Speed 2637.59 samples/sec   Loss 8.2661   LearningRate 0.0368   Epoch: 7   Global Step: 326470   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:55,602-Speed 2627.79 samples/sec   Loss 8.0343   LearningRate 0.0368   Epoch: 7   Global Step: 326480   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:23:59,506-Speed 2623.77 samples/sec   Loss 8.0780   LearningRate 0.0368   Epoch: 7   Global Step: 326490   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:24:03,420-Speed 2616.62 samples/sec   Loss 8.1274   LearningRate 0.0368   Epoch: 7   Global Step: 326500   Fp16 Grad Scale: 65536   Required: 57 hours
Training: 2022-04-14 08:24:07,324-Speed 2623.85 samples/sec   Loss 8.2094   LearningRate 0.0368   Epoch: 7   Global Step: 326510   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:11,229-Speed 2622.90 samples/sec   Loss 8.1154   LearningRate 0.0368   Epoch: 7   Global Step: 326520   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:15,130-Speed 2625.42 samples/sec   Loss 8.2928   LearningRate 0.0368   Epoch: 7   Global Step: 326530   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:19,043-Speed 2617.42 samples/sec   Loss 8.0782   LearningRate 0.0368   Epoch: 7   Global Step: 326540   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:22,939-Speed 2629.17 samples/sec   Loss 8.1641   LearningRate 0.0368   Epoch: 7   Global Step: 326550   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:26,836-Speed 2628.21 samples/sec   Loss 8.1989   LearningRate 0.0368   Epoch: 7   Global Step: 326560   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:30,732-Speed 2629.30 samples/sec   Loss 8.1182   LearningRate 0.0368   Epoch: 7   Global Step: 326570   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:24:34,627-Speed 2629.31 samples/sec   Loss 8.1735   LearningRate 0.0368   Epoch: 7   Global Step: 326580   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:24:38,520-Speed 2630.83 samples/sec   Loss 8.2256   LearningRate 0.0368   Epoch: 7   Global Step: 326590   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:42,518-Speed 2561.69 samples/sec   Loss 8.1668   LearningRate 0.0368   Epoch: 7   Global Step: 326600   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:46,437-Speed 2614.31 samples/sec   Loss 8.1214   LearningRate 0.0368   Epoch: 7   Global Step: 326610   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:50,341-Speed 2623.31 samples/sec   Loss 8.2659   LearningRate 0.0368   Epoch: 7   Global Step: 326620   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:54,278-Speed 2602.16 samples/sec   Loss 8.2757   LearningRate 0.0368   Epoch: 7   Global Step: 326630   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:24:58,177-Speed 2626.45 samples/sec   Loss 8.0873   LearningRate 0.0368   Epoch: 7   Global Step: 326640   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:25:02,078-Speed 2626.59 samples/sec   Loss 8.2161   LearningRate 0.0368   Epoch: 7   Global Step: 326650   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:25:05,976-Speed 2627.13 samples/sec   Loss 8.1336   LearningRate 0.0368   Epoch: 7   Global Step: 326660   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:25:09,893-Speed 2614.74 samples/sec   Loss 8.1433   LearningRate 0.0368   Epoch: 7   Global Step: 326670   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:25:13,816-Speed 2610.70 samples/sec   Loss 7.9984   LearningRate 0.0367   Epoch: 7   Global Step: 326680   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:25:17,721-Speed 2623.57 samples/sec   Loss 8.1502   LearningRate 0.0367   Epoch: 7   Global Step: 326690   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:21,628-Speed 2621.67 samples/sec   Loss 8.2783   LearningRate 0.0367   Epoch: 7   Global Step: 326700   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:25,533-Speed 2623.20 samples/sec   Loss 8.1706   LearningRate 0.0367   Epoch: 7   Global Step: 326710   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:29,437-Speed 2623.60 samples/sec   Loss 8.0965   LearningRate 0.0367   Epoch: 7   Global Step: 326720   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:33,340-Speed 2624.03 samples/sec   Loss 8.0767   LearningRate 0.0367   Epoch: 7   Global Step: 326730   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:37,253-Speed 2617.86 samples/sec   Loss 8.2058   LearningRate 0.0367   Epoch: 7   Global Step: 326740   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:41,160-Speed 2621.73 samples/sec   Loss 8.2440   LearningRate 0.0367   Epoch: 7   Global Step: 326750   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:45,065-Speed 2622.24 samples/sec   Loss 8.0394   LearningRate 0.0367   Epoch: 7   Global Step: 326760   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:48,964-Speed 2627.55 samples/sec   Loss 8.1224   LearningRate 0.0367   Epoch: 7   Global Step: 326770   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:52,871-Speed 2622.03 samples/sec   Loss 8.1282   LearningRate 0.0367   Epoch: 7   Global Step: 326780   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:25:56,747-Speed 2642.48 samples/sec   Loss 8.2175   LearningRate 0.0367   Epoch: 7   Global Step: 326790   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:00,673-Speed 2609.05 samples/sec   Loss 8.2444   LearningRate 0.0367   Epoch: 7   Global Step: 326800   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:04,569-Speed 2628.87 samples/sec   Loss 8.0896   LearningRate 0.0367   Epoch: 7   Global Step: 326810   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:08,500-Speed 2605.59 samples/sec   Loss 8.1243   LearningRate 0.0367   Epoch: 7   Global Step: 326820   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:12,399-Speed 2627.18 samples/sec   Loss 8.1964   LearningRate 0.0367   Epoch: 7   Global Step: 326830   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:16,299-Speed 2626.58 samples/sec   Loss 8.2458   LearningRate 0.0367   Epoch: 7   Global Step: 326840   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:20,209-Speed 2619.69 samples/sec   Loss 8.0723   LearningRate 0.0367   Epoch: 7   Global Step: 326850   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:24,104-Speed 2629.08 samples/sec   Loss 8.1573   LearningRate 0.0367   Epoch: 7   Global Step: 326860   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:28,001-Speed 2628.71 samples/sec   Loss 8.0980   LearningRate 0.0367   Epoch: 7   Global Step: 326870   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:31,894-Speed 2630.82 samples/sec   Loss 8.1555   LearningRate 0.0367   Epoch: 7   Global Step: 326880   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:35,812-Speed 2614.35 samples/sec   Loss 8.1039   LearningRate 0.0367   Epoch: 7   Global Step: 326890   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:26:39,720-Speed 2621.34 samples/sec   Loss 8.0641   LearningRate 0.0367   Epoch: 7   Global Step: 326900   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:26:43,594-Speed 2643.53 samples/sec   Loss 8.1763   LearningRate 0.0367   Epoch: 7   Global Step: 326910   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:47,495-Speed 2625.90 samples/sec   Loss 8.1100   LearningRate 0.0367   Epoch: 7   Global Step: 326920   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:51,396-Speed 2625.86 samples/sec   Loss 8.1091   LearningRate 0.0367   Epoch: 7   Global Step: 326930   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:55,574-Speed 2451.51 samples/sec   Loss 8.0540   LearningRate 0.0367   Epoch: 7   Global Step: 326940   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:26:59,463-Speed 2634.08 samples/sec   Loss 8.0526   LearningRate 0.0367   Epoch: 7   Global Step: 326950   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:27:03,354-Speed 2631.95 samples/sec   Loss 8.2035   LearningRate 0.0367   Epoch: 7   Global Step: 326960   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:27:07,249-Speed 2629.88 samples/sec   Loss 8.2878   LearningRate 0.0367   Epoch: 7   Global Step: 326970   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:27:11,142-Speed 2630.93 samples/sec   Loss 8.1960   LearningRate 0.0367   Epoch: 7   Global Step: 326980   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:27:15,054-Speed 2618.51 samples/sec   Loss 8.1780   LearningRate 0.0367   Epoch: 7   Global Step: 326990   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:27:18,958-Speed 2623.78 samples/sec   Loss 8.2624   LearningRate 0.0367   Epoch: 7   Global Step: 327000   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:27:22,864-Speed 2622.27 samples/sec   Loss 8.0982   LearningRate 0.0367   Epoch: 7   Global Step: 327010   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:26,791-Speed 2608.64 samples/sec   Loss 8.0944   LearningRate 0.0367   Epoch: 7   Global Step: 327020   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:30,698-Speed 2622.16 samples/sec   Loss 8.1071   LearningRate 0.0367   Epoch: 7   Global Step: 327030   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:34,635-Speed 2601.76 samples/sec   Loss 8.1806   LearningRate 0.0367   Epoch: 7   Global Step: 327040   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:38,528-Speed 2631.24 samples/sec   Loss 8.2596   LearningRate 0.0367   Epoch: 7   Global Step: 327050   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:42,417-Speed 2633.35 samples/sec   Loss 8.2442   LearningRate 0.0367   Epoch: 7   Global Step: 327060   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:46,404-Speed 2569.30 samples/sec   Loss 8.2423   LearningRate 0.0367   Epoch: 7   Global Step: 327070   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:50,299-Speed 2629.29 samples/sec   Loss 8.1806   LearningRate 0.0367   Epoch: 7   Global Step: 327080   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:54,191-Speed 2631.64 samples/sec   Loss 8.1455   LearningRate 0.0367   Epoch: 7   Global Step: 327090   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:27:58,085-Speed 2630.94 samples/sec   Loss 8.1764   LearningRate 0.0367   Epoch: 7   Global Step: 327100   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:01,977-Speed 2631.81 samples/sec   Loss 8.2413   LearningRate 0.0367   Epoch: 7   Global Step: 327110   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:05,864-Speed 2635.26 samples/sec   Loss 8.0954   LearningRate 0.0367   Epoch: 7   Global Step: 327120   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:09,760-Speed 2628.98 samples/sec   Loss 8.0977   LearningRate 0.0367   Epoch: 7   Global Step: 327130   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:13,661-Speed 2625.42 samples/sec   Loss 8.1186   LearningRate 0.0367   Epoch: 7   Global Step: 327140   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:17,563-Speed 2625.31 samples/sec   Loss 8.1151   LearningRate 0.0367   Epoch: 7   Global Step: 327150   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:21,498-Speed 2602.85 samples/sec   Loss 8.0799   LearningRate 0.0367   Epoch: 7   Global Step: 327160   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:25,408-Speed 2619.83 samples/sec   Loss 8.1363   LearningRate 0.0367   Epoch: 7   Global Step: 327170   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:28:29,294-Speed 2635.37 samples/sec   Loss 8.0896   LearningRate 0.0367   Epoch: 7   Global Step: 327180   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:28:33,238-Speed 2597.46 samples/sec   Loss 8.0244   LearningRate 0.0367   Epoch: 7   Global Step: 327190   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:28:37,141-Speed 2624.12 samples/sec   Loss 8.0983   LearningRate 0.0367   Epoch: 7   Global Step: 327200   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:28:41,042-Speed 2625.51 samples/sec   Loss 8.1584   LearningRate 0.0367   Epoch: 7   Global Step: 327210   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:28:44,935-Speed 2631.23 samples/sec   Loss 8.1186   LearningRate 0.0367   Epoch: 7   Global Step: 327220   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:28:49,000-Speed 2520.01 samples/sec   Loss 8.1522   LearningRate 0.0367   Epoch: 7   Global Step: 327230   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:28:53,065-Speed 2519.91 samples/sec   Loss 8.1547   LearningRate 0.0367   Epoch: 7   Global Step: 327240   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:28:57,095-Speed 2541.36 samples/sec   Loss 8.1952   LearningRate 0.0367   Epoch: 7   Global Step: 327250   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:29:00,948-Speed 2658.00 samples/sec   Loss 8.1660   LearningRate 0.0367   Epoch: 7   Global Step: 327260   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:04,864-Speed 2615.66 samples/sec   Loss 8.2206   LearningRate 0.0367   Epoch: 7   Global Step: 327270   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:08,760-Speed 2629.10 samples/sec   Loss 8.2367   LearningRate 0.0367   Epoch: 7   Global Step: 327280   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:12,651-Speed 2632.71 samples/sec   Loss 8.1580   LearningRate 0.0367   Epoch: 7   Global Step: 327290   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:16,538-Speed 2634.73 samples/sec   Loss 8.1345   LearningRate 0.0367   Epoch: 7   Global Step: 327300   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:20,426-Speed 2634.93 samples/sec   Loss 8.1865   LearningRate 0.0367   Epoch: 7   Global Step: 327310   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:24,317-Speed 2632.00 samples/sec   Loss 8.1996   LearningRate 0.0367   Epoch: 7   Global Step: 327320   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:28,218-Speed 2626.15 samples/sec   Loss 7.9910   LearningRate 0.0367   Epoch: 7   Global Step: 327330   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:32,121-Speed 2624.26 samples/sec   Loss 8.1013   LearningRate 0.0367   Epoch: 7   Global Step: 327340   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:36,011-Speed 2632.71 samples/sec   Loss 8.1580   LearningRate 0.0367   Epoch: 7   Global Step: 327350   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:29:39,903-Speed 2631.71 samples/sec   Loss 8.0383   LearningRate 0.0366   Epoch: 7   Global Step: 327360   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:29:43,807-Speed 2623.65 samples/sec   Loss 8.1110   LearningRate 0.0366   Epoch: 7   Global Step: 327370   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:29:47,700-Speed 2631.66 samples/sec   Loss 8.0130   LearningRate 0.0366   Epoch: 7   Global Step: 327380   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:29:51,587-Speed 2634.98 samples/sec   Loss 8.1668   LearningRate 0.0366   Epoch: 7   Global Step: 327390   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:29:55,481-Speed 2629.90 samples/sec   Loss 8.0786   LearningRate 0.0366   Epoch: 7   Global Step: 327400   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:29:59,379-Speed 2627.91 samples/sec   Loss 8.0651   LearningRate 0.0366   Epoch: 7   Global Step: 327410   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:30:03,279-Speed 2625.58 samples/sec   Loss 8.2173   LearningRate 0.0366   Epoch: 7   Global Step: 327420   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:30:07,167-Speed 2634.21 samples/sec   Loss 8.2218   LearningRate 0.0366   Epoch: 7   Global Step: 327430   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:30:11,058-Speed 2632.38 samples/sec   Loss 8.0354   LearningRate 0.0366   Epoch: 7   Global Step: 327440   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:30:14,947-Speed 2633.69 samples/sec   Loss 8.1350   LearningRate 0.0366   Epoch: 7   Global Step: 327450   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:30:18,837-Speed 2633.02 samples/sec   Loss 8.0377   LearningRate 0.0366   Epoch: 7   Global Step: 327460   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:22,727-Speed 2632.72 samples/sec   Loss 8.1731   LearningRate 0.0366   Epoch: 7   Global Step: 327470   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:26,619-Speed 2632.65 samples/sec   Loss 8.1795   LearningRate 0.0366   Epoch: 7   Global Step: 327480   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:30,506-Speed 2634.49 samples/sec   Loss 8.0886   LearningRate 0.0366   Epoch: 7   Global Step: 327490   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:34,397-Speed 2632.37 samples/sec   Loss 8.1323   LearningRate 0.0366   Epoch: 7   Global Step: 327500   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:38,288-Speed 2632.08 samples/sec   Loss 7.9924   LearningRate 0.0366   Epoch: 7   Global Step: 327510   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:42,189-Speed 2626.11 samples/sec   Loss 8.1929   LearningRate 0.0366   Epoch: 7   Global Step: 327520   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:46,080-Speed 2632.14 samples/sec   Loss 8.1847   LearningRate 0.0366   Epoch: 7   Global Step: 327530   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:49,987-Speed 2621.17 samples/sec   Loss 8.1544   LearningRate 0.0366   Epoch: 7   Global Step: 327540   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:53,891-Speed 2623.28 samples/sec   Loss 8.0707   LearningRate 0.0366   Epoch: 7   Global Step: 327550   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:30:57,773-Speed 2639.34 samples/sec   Loss 8.1005   LearningRate 0.0366   Epoch: 7   Global Step: 327560   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:01,669-Speed 2628.84 samples/sec   Loss 8.1602   LearningRate 0.0366   Epoch: 7   Global Step: 327570   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:05,562-Speed 2630.73 samples/sec   Loss 8.1234   LearningRate 0.0366   Epoch: 7   Global Step: 327580   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:09,455-Speed 2630.66 samples/sec   Loss 7.9662   LearningRate 0.0366   Epoch: 7   Global Step: 327590   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:13,355-Speed 2626.76 samples/sec   Loss 8.1813   LearningRate 0.0366   Epoch: 7   Global Step: 327600   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:17,244-Speed 2633.49 samples/sec   Loss 8.1562   LearningRate 0.0366   Epoch: 7   Global Step: 327610   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:21,137-Speed 2630.97 samples/sec   Loss 8.0458   LearningRate 0.0366   Epoch: 7   Global Step: 327620   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:25,027-Speed 2632.91 samples/sec   Loss 8.2252   LearningRate 0.0366   Epoch: 7   Global Step: 327630   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:28,932-Speed 2622.84 samples/sec   Loss 8.1585   LearningRate 0.0366   Epoch: 7   Global Step: 327640   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:32,835-Speed 2624.67 samples/sec   Loss 8.2341   LearningRate 0.0366   Epoch: 7   Global Step: 327650   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:31:36,727-Speed 2631.61 samples/sec   Loss 8.1469   LearningRate 0.0366   Epoch: 7   Global Step: 327660   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:31:40,627-Speed 2625.89 samples/sec   Loss 8.1814   LearningRate 0.0366   Epoch: 7   Global Step: 327670   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:31:44,524-Speed 2628.64 samples/sec   Loss 8.1838   LearningRate 0.0366   Epoch: 7   Global Step: 327680   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:31:48,417-Speed 2630.75 samples/sec   Loss 8.1291   LearningRate 0.0366   Epoch: 7   Global Step: 327690   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:31:52,313-Speed 2628.83 samples/sec   Loss 8.1140   LearningRate 0.0366   Epoch: 7   Global Step: 327700   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:31:56,203-Speed 2632.84 samples/sec   Loss 8.1558   LearningRate 0.0366   Epoch: 7   Global Step: 327710   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:00,101-Speed 2627.91 samples/sec   Loss 8.1348   LearningRate 0.0366   Epoch: 7   Global Step: 327720   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:03,995-Speed 2630.15 samples/sec   Loss 8.0966   LearningRate 0.0366   Epoch: 7   Global Step: 327730   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:07,888-Speed 2630.32 samples/sec   Loss 8.1870   LearningRate 0.0366   Epoch: 7   Global Step: 327740   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:11,798-Speed 2620.41 samples/sec   Loss 8.1504   LearningRate 0.0366   Epoch: 7   Global Step: 327750   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:15,694-Speed 2628.53 samples/sec   Loss 8.1952   LearningRate 0.0366   Epoch: 7   Global Step: 327760   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 08:32:19,570-Speed 2642.74 samples/sec   Loss 8.2051   LearningRate 0.0366   Epoch: 7   Global Step: 327770   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:23,466-Speed 2628.78 samples/sec   Loss 8.1512   LearningRate 0.0366   Epoch: 7   Global Step: 327780   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:27,361-Speed 2629.73 samples/sec   Loss 8.0387   LearningRate 0.0366   Epoch: 7   Global Step: 327790   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:31,256-Speed 2630.03 samples/sec   Loss 8.1163   LearningRate 0.0366   Epoch: 7   Global Step: 327800   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:35,149-Speed 2630.25 samples/sec   Loss 8.0387   LearningRate 0.0366   Epoch: 7   Global Step: 327810   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:39,042-Speed 2631.06 samples/sec   Loss 8.1019   LearningRate 0.0366   Epoch: 7   Global Step: 327820   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:32:42,914-Speed 2644.86 samples/sec   Loss 8.0904   LearningRate 0.0366   Epoch: 7   Global Step: 327830   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:32:46,807-Speed 2631.99 samples/sec   Loss 8.1277   LearningRate 0.0366   Epoch: 7   Global Step: 327840   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:32:50,699-Speed 2631.19 samples/sec   Loss 8.1223   LearningRate 0.0366   Epoch: 7   Global Step: 327850   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:32:54,595-Speed 2629.44 samples/sec   Loss 8.2161   LearningRate 0.0366   Epoch: 7   Global Step: 327860   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:32:58,500-Speed 2622.83 samples/sec   Loss 8.2207   LearningRate 0.0366   Epoch: 7   Global Step: 327870   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:33:02,395-Speed 2630.16 samples/sec   Loss 8.0936   LearningRate 0.0366   Epoch: 7   Global Step: 327880   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:33:06,286-Speed 2632.19 samples/sec   Loss 8.1114   LearningRate 0.0366   Epoch: 7   Global Step: 327890   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:33:10,187-Speed 2625.92 samples/sec   Loss 8.0377   LearningRate 0.0366   Epoch: 7   Global Step: 327900   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:33:14,082-Speed 2629.37 samples/sec   Loss 8.1164   LearningRate 0.0366   Epoch: 7   Global Step: 327910   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:33:17,976-Speed 2630.03 samples/sec   Loss 8.0370   LearningRate 0.0366   Epoch: 7   Global Step: 327920   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:33:21,871-Speed 2629.83 samples/sec   Loss 8.3547   LearningRate 0.0366   Epoch: 7   Global Step: 327930   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:25,769-Speed 2627.54 samples/sec   Loss 8.2295   LearningRate 0.0366   Epoch: 7   Global Step: 327940   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:29,665-Speed 2629.21 samples/sec   Loss 8.1272   LearningRate 0.0366   Epoch: 7   Global Step: 327950   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:33,556-Speed 2632.88 samples/sec   Loss 8.1098   LearningRate 0.0366   Epoch: 7   Global Step: 327960   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:37,456-Speed 2625.83 samples/sec   Loss 8.1288   LearningRate 0.0366   Epoch: 7   Global Step: 327970   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:41,357-Speed 2625.50 samples/sec   Loss 8.1741   LearningRate 0.0366   Epoch: 7   Global Step: 327980   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:45,269-Speed 2617.73 samples/sec   Loss 8.1011   LearningRate 0.0366   Epoch: 7   Global Step: 327990   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:49,162-Speed 2631.49 samples/sec   Loss 8.1783   LearningRate 0.0366   Epoch: 7   Global Step: 328000   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:53,062-Speed 2625.86 samples/sec   Loss 8.1080   LearningRate 0.0366   Epoch: 7   Global Step: 328010   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:33:56,954-Speed 2631.68 samples/sec   Loss 8.0725   LearningRate 0.0366   Epoch: 7   Global Step: 328020   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:00,829-Speed 2643.68 samples/sec   Loss 8.0081   LearningRate 0.0366   Epoch: 7   Global Step: 328030   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:04,725-Speed 2628.99 samples/sec   Loss 7.9479   LearningRate 0.0366   Epoch: 7   Global Step: 328040   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:08,620-Speed 2629.20 samples/sec   Loss 8.1045   LearningRate 0.0365   Epoch: 7   Global Step: 328050   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:12,514-Speed 2630.32 samples/sec   Loss 8.1597   LearningRate 0.0365   Epoch: 7   Global Step: 328060   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:16,397-Speed 2638.03 samples/sec   Loss 8.1038   LearningRate 0.0365   Epoch: 7   Global Step: 328070   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:20,305-Speed 2620.74 samples/sec   Loss 8.2151   LearningRate 0.0365   Epoch: 7   Global Step: 328080   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:24,196-Speed 2632.29 samples/sec   Loss 8.0777   LearningRate 0.0365   Epoch: 7   Global Step: 328090   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:28,107-Speed 2619.10 samples/sec   Loss 8.0893   LearningRate 0.0365   Epoch: 7   Global Step: 328100   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:32,007-Speed 2626.21 samples/sec   Loss 8.0992   LearningRate 0.0365   Epoch: 7   Global Step: 328110   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:35,913-Speed 2622.61 samples/sec   Loss 8.2456   LearningRate 0.0365   Epoch: 7   Global Step: 328120   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:39,778-Speed 2650.01 samples/sec   Loss 8.1067   LearningRate 0.0365   Epoch: 7   Global Step: 328130   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:43,672-Speed 2630.98 samples/sec   Loss 8.0582   LearningRate 0.0365   Epoch: 7   Global Step: 328140   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:47,565-Speed 2630.89 samples/sec   Loss 8.1136   LearningRate 0.0365   Epoch: 7   Global Step: 328150   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:34:51,448-Speed 2637.63 samples/sec   Loss 8.2020   LearningRate 0.0365   Epoch: 7   Global Step: 328160   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:34:55,348-Speed 2625.96 samples/sec   Loss 8.0654   LearningRate 0.0365   Epoch: 7   Global Step: 328170   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:34:59,240-Speed 2631.47 samples/sec   Loss 8.1405   LearningRate 0.0365   Epoch: 7   Global Step: 328180   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:35:03,143-Speed 2624.70 samples/sec   Loss 8.1505   LearningRate 0.0365   Epoch: 7   Global Step: 328190   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:35:07,035-Speed 2631.89 samples/sec   Loss 8.2061   LearningRate 0.0365   Epoch: 7   Global Step: 328200   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:35:10,930-Speed 2628.98 samples/sec   Loss 8.1476   LearningRate 0.0365   Epoch: 7   Global Step: 328210   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:35:14,839-Speed 2620.82 samples/sec   Loss 8.0518   LearningRate 0.0365   Epoch: 7   Global Step: 328220   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:35:18,732-Speed 2630.79 samples/sec   Loss 8.0464   LearningRate 0.0365   Epoch: 7   Global Step: 328230   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:35:22,628-Speed 2629.01 samples/sec   Loss 8.1577   LearningRate 0.0365   Epoch: 7   Global Step: 328240   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:35:26,502-Speed 2643.81 samples/sec   Loss 8.0592   LearningRate 0.0365   Epoch: 7   Global Step: 328250   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:30,397-Speed 2629.36 samples/sec   Loss 8.1471   LearningRate 0.0365   Epoch: 7   Global Step: 328260   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:34,293-Speed 2629.45 samples/sec   Loss 8.0707   LearningRate 0.0365   Epoch: 7   Global Step: 328270   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:38,187-Speed 2630.32 samples/sec   Loss 8.0863   LearningRate 0.0365   Epoch: 7   Global Step: 328280   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:42,079-Speed 2631.83 samples/sec   Loss 8.1765   LearningRate 0.0365   Epoch: 7   Global Step: 328290   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:45,998-Speed 2613.34 samples/sec   Loss 8.0213   LearningRate 0.0365   Epoch: 7   Global Step: 328300   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:49,888-Speed 2633.09 samples/sec   Loss 8.0944   LearningRate 0.0365   Epoch: 7   Global Step: 328310   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:53,779-Speed 2632.27 samples/sec   Loss 8.1714   LearningRate 0.0365   Epoch: 7   Global Step: 328320   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:35:57,672-Speed 2631.42 samples/sec   Loss 8.1119   LearningRate 0.0365   Epoch: 7   Global Step: 328330   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:36:01,566-Speed 2629.84 samples/sec   Loss 8.1380   LearningRate 0.0365   Epoch: 7   Global Step: 328340   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:36:05,554-Speed 2568.44 samples/sec   Loss 8.1707   LearningRate 0.0365   Epoch: 7   Global Step: 328350   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:09,459-Speed 2622.93 samples/sec   Loss 8.1987   LearningRate 0.0365   Epoch: 7   Global Step: 328360   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:13,355-Speed 2628.79 samples/sec   Loss 8.0001   LearningRate 0.0365   Epoch: 7   Global Step: 328370   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:17,250-Speed 2629.14 samples/sec   Loss 8.0694   LearningRate 0.0365   Epoch: 7   Global Step: 328380   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:21,203-Speed 2592.07 samples/sec   Loss 8.1255   LearningRate 0.0365   Epoch: 7   Global Step: 328390   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:25,095-Speed 2631.59 samples/sec   Loss 8.1810   LearningRate 0.0365   Epoch: 7   Global Step: 328400   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:29,001-Speed 2621.95 samples/sec   Loss 8.2147   LearningRate 0.0365   Epoch: 7   Global Step: 328410   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:32,905-Speed 2623.91 samples/sec   Loss 8.1779   LearningRate 0.0365   Epoch: 7   Global Step: 328420   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:36,809-Speed 2623.67 samples/sec   Loss 8.1010   LearningRate 0.0365   Epoch: 7   Global Step: 328430   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:40,711-Speed 2624.47 samples/sec   Loss 8.2304   LearningRate 0.0365   Epoch: 7   Global Step: 328440   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:36:44,624-Speed 2618.21 samples/sec   Loss 8.1569   LearningRate 0.0365   Epoch: 7   Global Step: 328450   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:36:48,549-Speed 2609.73 samples/sec   Loss 7.9764   LearningRate 0.0365   Epoch: 7   Global Step: 328460   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:36:52,452-Speed 2624.23 samples/sec   Loss 8.0040   LearningRate 0.0365   Epoch: 7   Global Step: 328470   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:36:56,359-Speed 2621.76 samples/sec   Loss 8.0856   LearningRate 0.0365   Epoch: 7   Global Step: 328480   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:37:00,240-Speed 2639.23 samples/sec   Loss 8.1226   LearningRate 0.0365   Epoch: 7   Global Step: 328490   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:04,136-Speed 2628.84 samples/sec   Loss 8.0944   LearningRate 0.0365   Epoch: 7   Global Step: 328500   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:08,048-Speed 2617.90 samples/sec   Loss 8.0844   LearningRate 0.0365   Epoch: 7   Global Step: 328510   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:11,953-Speed 2623.10 samples/sec   Loss 8.0426   LearningRate 0.0365   Epoch: 7   Global Step: 328520   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:15,855-Speed 2625.38 samples/sec   Loss 8.0676   LearningRate 0.0365   Epoch: 7   Global Step: 328530   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:19,751-Speed 2629.20 samples/sec   Loss 7.9865   LearningRate 0.0365   Epoch: 7   Global Step: 328540   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:23,664-Speed 2617.28 samples/sec   Loss 8.3084   LearningRate 0.0365   Epoch: 7   Global Step: 328550   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:27,573-Speed 2621.19 samples/sec   Loss 8.1218   LearningRate 0.0365   Epoch: 7   Global Step: 328560   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:31,477-Speed 2623.20 samples/sec   Loss 8.0390   LearningRate 0.0365   Epoch: 7   Global Step: 328570   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:35,369-Speed 2631.30 samples/sec   Loss 8.0364   LearningRate 0.0365   Epoch: 7   Global Step: 328580   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:37:39,260-Speed 2632.46 samples/sec   Loss 8.1470   LearningRate 0.0365   Epoch: 7   Global Step: 328590   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:37:43,152-Speed 2631.41 samples/sec   Loss 8.1364   LearningRate 0.0365   Epoch: 7   Global Step: 328600   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:37:47,045-Speed 2631.40 samples/sec   Loss 8.1358   LearningRate 0.0365   Epoch: 7   Global Step: 328610   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:37:50,938-Speed 2631.17 samples/sec   Loss 7.9764   LearningRate 0.0365   Epoch: 7   Global Step: 328620   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:37:54,842-Speed 2623.70 samples/sec   Loss 8.1183   LearningRate 0.0365   Epoch: 7   Global Step: 328630   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:37:58,734-Speed 2631.94 samples/sec   Loss 8.0677   LearningRate 0.0365   Epoch: 7   Global Step: 328640   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:02,645-Speed 2619.33 samples/sec   Loss 8.0779   LearningRate 0.0365   Epoch: 7   Global Step: 328650   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:06,538-Speed 2630.95 samples/sec   Loss 8.2151   LearningRate 0.0365   Epoch: 7   Global Step: 328660   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:10,467-Speed 2606.84 samples/sec   Loss 8.2015   LearningRate 0.0365   Epoch: 7   Global Step: 328670   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:14,367-Speed 2626.57 samples/sec   Loss 8.2321   LearningRate 0.0365   Epoch: 7   Global Step: 328680   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:18,265-Speed 2627.85 samples/sec   Loss 7.9971   LearningRate 0.0365   Epoch: 7   Global Step: 328690   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 08:38:22,200-Speed 2602.28 samples/sec   Loss 8.0172   LearningRate 0.0365   Epoch: 7   Global Step: 328700   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:26,110-Speed 2620.64 samples/sec   Loss 8.1359   LearningRate 0.0365   Epoch: 7   Global Step: 328710   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:30,009-Speed 2626.57 samples/sec   Loss 8.2565   LearningRate 0.0365   Epoch: 7   Global Step: 328720   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:33,927-Speed 2614.57 samples/sec   Loss 8.0301   LearningRate 0.0365   Epoch: 7   Global Step: 328730   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:37,823-Speed 2629.31 samples/sec   Loss 8.2939   LearningRate 0.0364   Epoch: 7   Global Step: 328740   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:41,714-Speed 2632.44 samples/sec   Loss 8.1770   LearningRate 0.0364   Epoch: 7   Global Step: 328750   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:45,606-Speed 2631.64 samples/sec   Loss 8.0877   LearningRate 0.0364   Epoch: 7   Global Step: 328760   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:49,498-Speed 2631.52 samples/sec   Loss 8.0049   LearningRate 0.0364   Epoch: 7   Global Step: 328770   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:53,394-Speed 2628.88 samples/sec   Loss 8.0422   LearningRate 0.0364   Epoch: 7   Global Step: 328780   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:38:57,286-Speed 2631.96 samples/sec   Loss 8.1137   LearningRate 0.0364   Epoch: 7   Global Step: 328790   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:01,182-Speed 2629.07 samples/sec   Loss 8.1543   LearningRate 0.0364   Epoch: 7   Global Step: 328800   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 08:39:05,061-Speed 2640.86 samples/sec   Loss 8.1259   LearningRate 0.0364   Epoch: 7   Global Step: 328810   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:08,951-Speed 2632.71 samples/sec   Loss 8.1790   LearningRate 0.0364   Epoch: 7   Global Step: 328820   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:12,847-Speed 2628.55 samples/sec   Loss 8.0353   LearningRate 0.0364   Epoch: 7   Global Step: 328830   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:16,742-Speed 2630.30 samples/sec   Loss 8.2955   LearningRate 0.0364   Epoch: 7   Global Step: 328840   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:20,641-Speed 2626.26 samples/sec   Loss 8.0843   LearningRate 0.0364   Epoch: 7   Global Step: 328850   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:24,532-Speed 2633.06 samples/sec   Loss 8.2083   LearningRate 0.0364   Epoch: 7   Global Step: 328860   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:28,424-Speed 2631.32 samples/sec   Loss 8.1427   LearningRate 0.0364   Epoch: 7   Global Step: 328870   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:32,320-Speed 2629.38 samples/sec   Loss 8.1286   LearningRate 0.0364   Epoch: 7   Global Step: 328880   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:36,238-Speed 2614.19 samples/sec   Loss 8.0770   LearningRate 0.0364   Epoch: 7   Global Step: 328890   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:40,138-Speed 2626.27 samples/sec   Loss 8.1877   LearningRate 0.0364   Epoch: 7   Global Step: 328900   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:44,025-Speed 2635.00 samples/sec   Loss 7.9978   LearningRate 0.0364   Epoch: 7   Global Step: 328910   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:47,909-Speed 2637.70 samples/sec   Loss 7.9878   LearningRate 0.0364   Epoch: 7   Global Step: 328920   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:51,804-Speed 2629.65 samples/sec   Loss 8.1977   LearningRate 0.0364   Epoch: 7   Global Step: 328930   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:55,701-Speed 2628.51 samples/sec   Loss 8.1133   LearningRate 0.0364   Epoch: 7   Global Step: 328940   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:39:59,757-Speed 2525.48 samples/sec   Loss 8.1080   LearningRate 0.0364   Epoch: 7   Global Step: 328950   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:40:03,850-Speed 2502.52 samples/sec   Loss 8.1555   LearningRate 0.0364   Epoch: 7   Global Step: 328960   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:40:07,797-Speed 2594.50 samples/sec   Loss 8.1108   LearningRate 0.0364   Epoch: 7   Global Step: 328970   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:40:11,687-Speed 2633.04 samples/sec   Loss 8.1426   LearningRate 0.0364   Epoch: 7   Global Step: 328980   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:40:15,592-Speed 2623.08 samples/sec   Loss 8.0940   LearningRate 0.0364   Epoch: 7   Global Step: 328990   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:40:19,469-Speed 2642.17 samples/sec   Loss 8.1087   LearningRate 0.0364   Epoch: 7   Global Step: 329000   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:40:23,308-Speed 2667.34 samples/sec   Loss 10.8725   LearningRate 0.0364   Epoch: 7   Global Step: 329010   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:27,203-Speed 2629.70 samples/sec   Loss 9.6417   LearningRate 0.0364   Epoch: 7   Global Step: 329020   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:31,093-Speed 2633.14 samples/sec   Loss 8.7480   LearningRate 0.0364   Epoch: 7   Global Step: 329030   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:34,981-Speed 2634.59 samples/sec   Loss 8.5940   LearningRate 0.0364   Epoch: 7   Global Step: 329040   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:38,868-Speed 2635.26 samples/sec   Loss 8.5332   LearningRate 0.0364   Epoch: 7   Global Step: 329050   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:42,770-Speed 2625.03 samples/sec   Loss 8.3560   LearningRate 0.0364   Epoch: 7   Global Step: 329060   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:46,663-Speed 2631.07 samples/sec   Loss 8.3910   LearningRate 0.0364   Epoch: 7   Global Step: 329070   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:50,558-Speed 2629.49 samples/sec   Loss 8.1806   LearningRate 0.0364   Epoch: 7   Global Step: 329080   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:54,454-Speed 2629.93 samples/sec   Loss 8.2291   LearningRate 0.0364   Epoch: 7   Global Step: 329090   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:40:58,362-Speed 2620.74 samples/sec   Loss 8.1816   LearningRate 0.0364   Epoch: 7   Global Step: 329100   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:41:02,270-Speed 2620.70 samples/sec   Loss 8.1761   LearningRate 0.0364   Epoch: 7   Global Step: 329110   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:06,159-Speed 2633.80 samples/sec   Loss 8.1736   LearningRate 0.0364   Epoch: 7   Global Step: 329120   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:10,050-Speed 2633.17 samples/sec   Loss 8.1065   LearningRate 0.0364   Epoch: 7   Global Step: 329130   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:13,942-Speed 2631.53 samples/sec   Loss 8.1764   LearningRate 0.0364   Epoch: 7   Global Step: 329140   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:17,847-Speed 2622.74 samples/sec   Loss 8.3217   LearningRate 0.0364   Epoch: 7   Global Step: 329150   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:21,737-Speed 2633.38 samples/sec   Loss 8.0647   LearningRate 0.0364   Epoch: 7   Global Step: 329160   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:25,628-Speed 2632.51 samples/sec   Loss 8.1968   LearningRate 0.0364   Epoch: 7   Global Step: 329170   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:29,517-Speed 2633.91 samples/sec   Loss 8.1257   LearningRate 0.0364   Epoch: 7   Global Step: 329180   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:33,406-Speed 2633.60 samples/sec   Loss 8.2120   LearningRate 0.0364   Epoch: 7   Global Step: 329190   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:37,296-Speed 2632.88 samples/sec   Loss 8.1391   LearningRate 0.0364   Epoch: 7   Global Step: 329200   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:41:41,211-Speed 2616.14 samples/sec   Loss 8.1714   LearningRate 0.0364   Epoch: 7   Global Step: 329210   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:41:45,113-Speed 2625.06 samples/sec   Loss 8.1300   LearningRate 0.0364   Epoch: 7   Global Step: 329220   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:41:49,014-Speed 2625.79 samples/sec   Loss 8.5430   LearningRate 0.0364   Epoch: 7   Global Step: 329230   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:41:52,900-Speed 2635.67 samples/sec   Loss 8.3279   LearningRate 0.0364   Epoch: 7   Global Step: 329240   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:41:56,799-Speed 2627.12 samples/sec   Loss 8.3015   LearningRate 0.0364   Epoch: 7   Global Step: 329250   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:42:00,700-Speed 2625.84 samples/sec   Loss 8.1661   LearningRate 0.0364   Epoch: 7   Global Step: 329260   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:42:04,620-Speed 2612.94 samples/sec   Loss 8.2971   LearningRate 0.0364   Epoch: 7   Global Step: 329270   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:42:08,536-Speed 2615.06 samples/sec   Loss 8.0996   LearningRate 0.0364   Epoch: 7   Global Step: 329280   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:42:12,428-Speed 2632.51 samples/sec   Loss 8.1152   LearningRate 0.0364   Epoch: 7   Global Step: 329290   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:42:16,316-Speed 2633.99 samples/sec   Loss 8.2030   LearningRate 0.0364   Epoch: 7   Global Step: 329300   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:42:20,241-Speed 2609.82 samples/sec   Loss 8.1352   LearningRate 0.0364   Epoch: 7   Global Step: 329310   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:24,130-Speed 2634.07 samples/sec   Loss 8.1144   LearningRate 0.0364   Epoch: 7   Global Step: 329320   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:28,022-Speed 2631.87 samples/sec   Loss 8.2009   LearningRate 0.0364   Epoch: 7   Global Step: 329330   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:31,924-Speed 2624.77 samples/sec   Loss 8.1490   LearningRate 0.0364   Epoch: 7   Global Step: 329340   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:35,823-Speed 2627.23 samples/sec   Loss 8.0751   LearningRate 0.0364   Epoch: 7   Global Step: 329350   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:39,711-Speed 2634.25 samples/sec   Loss 8.0779   LearningRate 0.0364   Epoch: 7   Global Step: 329360   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:43,601-Speed 2633.21 samples/sec   Loss 8.1529   LearningRate 0.0364   Epoch: 7   Global Step: 329370   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:47,491-Speed 2632.71 samples/sec   Loss 8.1250   LearningRate 0.0364   Epoch: 7   Global Step: 329380   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:51,384-Speed 2631.61 samples/sec   Loss 8.0555   LearningRate 0.0364   Epoch: 7   Global Step: 329390   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:55,273-Speed 2633.03 samples/sec   Loss 8.1229   LearningRate 0.0364   Epoch: 7   Global Step: 329400   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:42:59,165-Speed 2632.71 samples/sec   Loss 8.1422   LearningRate 0.0364   Epoch: 7   Global Step: 329410   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:43:03,061-Speed 2628.34 samples/sec   Loss 8.1322   LearningRate 0.0363   Epoch: 7   Global Step: 329420   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:43:06,954-Speed 2631.03 samples/sec   Loss 8.1082   LearningRate 0.0363   Epoch: 7   Global Step: 329430   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:43:10,856-Speed 2624.80 samples/sec   Loss 8.0912   LearningRate 0.0363   Epoch: 7   Global Step: 329440   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:43:14,807-Speed 2593.47 samples/sec   Loss 7.9926   LearningRate 0.0363   Epoch: 7   Global Step: 329450   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:43:18,878-Speed 2515.47 samples/sec   Loss 8.1861   LearningRate 0.0363   Epoch: 7   Global Step: 329460   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:43:22,826-Speed 2594.28 samples/sec   Loss 8.1270   LearningRate 0.0363   Epoch: 7   Global Step: 329470   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:43:26,719-Speed 2631.35 samples/sec   Loss 8.2033   LearningRate 0.0363   Epoch: 7   Global Step: 329480   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:43:30,594-Speed 2643.19 samples/sec   Loss 8.1985   LearningRate 0.0363   Epoch: 7   Global Step: 329490   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:43:34,493-Speed 2627.32 samples/sec   Loss 8.0285   LearningRate 0.0363   Epoch: 7   Global Step: 329500   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:43:38,386-Speed 2630.29 samples/sec   Loss 8.2013   LearningRate 0.0363   Epoch: 7   Global Step: 329510   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:43:42,289-Speed 2624.33 samples/sec   Loss 8.2040   LearningRate 0.0363   Epoch: 7   Global Step: 329520   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:43:46,181-Speed 2631.79 samples/sec   Loss 8.1049   LearningRate 0.0363   Epoch: 7   Global Step: 329530   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:43:50,073-Speed 2631.83 samples/sec   Loss 8.1323   LearningRate 0.0363   Epoch: 7   Global Step: 329540   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:43:53,964-Speed 2631.83 samples/sec   Loss 8.1710   LearningRate 0.0363   Epoch: 7   Global Step: 329550   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:43:57,861-Speed 2628.56 samples/sec   Loss 8.1913   LearningRate 0.0363   Epoch: 7   Global Step: 329560   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:44:01,752-Speed 2632.61 samples/sec   Loss 8.2087   LearningRate 0.0363   Epoch: 7   Global Step: 329570   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:44:05,660-Speed 2620.35 samples/sec   Loss 8.1881   LearningRate 0.0363   Epoch: 7   Global Step: 329580   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:44:09,555-Speed 2630.05 samples/sec   Loss 8.0790   LearningRate 0.0363   Epoch: 7   Global Step: 329590   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:13,446-Speed 2632.62 samples/sec   Loss 8.0691   LearningRate 0.0363   Epoch: 7   Global Step: 329600   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:17,361-Speed 2615.73 samples/sec   Loss 8.0776   LearningRate 0.0363   Epoch: 7   Global Step: 329610   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:21,255-Speed 2630.08 samples/sec   Loss 8.1396   LearningRate 0.0363   Epoch: 7   Global Step: 329620   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:25,146-Speed 2632.58 samples/sec   Loss 8.0423   LearningRate 0.0363   Epoch: 7   Global Step: 329630   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:29,036-Speed 2633.23 samples/sec   Loss 8.2633   LearningRate 0.0363   Epoch: 7   Global Step: 329640   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:32,925-Speed 2633.81 samples/sec   Loss 7.9302   LearningRate 0.0363   Epoch: 7   Global Step: 329650   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:36,816-Speed 2632.06 samples/sec   Loss 8.1422   LearningRate 0.0363   Epoch: 7   Global Step: 329660   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:40,704-Speed 2633.85 samples/sec   Loss 8.1406   LearningRate 0.0363   Epoch: 7   Global Step: 329670   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:44,593-Speed 2633.74 samples/sec   Loss 8.0434   LearningRate 0.0363   Epoch: 7   Global Step: 329680   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:44:48,483-Speed 2633.44 samples/sec   Loss 8.1317   LearningRate 0.0363   Epoch: 7   Global Step: 329690   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:44:52,384-Speed 2625.22 samples/sec   Loss 8.0829   LearningRate 0.0363   Epoch: 7   Global Step: 329700   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:44:56,275-Speed 2632.51 samples/sec   Loss 8.1063   LearningRate 0.0363   Epoch: 7   Global Step: 329710   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:00,191-Speed 2615.52 samples/sec   Loss 8.2397   LearningRate 0.0363   Epoch: 7   Global Step: 329720   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:04,084-Speed 2631.41 samples/sec   Loss 8.1862   LearningRate 0.0363   Epoch: 7   Global Step: 329730   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:07,978-Speed 2629.79 samples/sec   Loss 8.0775   LearningRate 0.0363   Epoch: 7   Global Step: 329740   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:11,871-Speed 2630.87 samples/sec   Loss 8.1837   LearningRate 0.0363   Epoch: 7   Global Step: 329750   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:15,772-Speed 2625.39 samples/sec   Loss 8.0501   LearningRate 0.0363   Epoch: 7   Global Step: 329760   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:19,669-Speed 2628.85 samples/sec   Loss 8.1020   LearningRate 0.0363   Epoch: 7   Global Step: 329770   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:23,556-Speed 2634.33 samples/sec   Loss 8.1519   LearningRate 0.0363   Epoch: 7   Global Step: 329780   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:45:27,450-Speed 2630.78 samples/sec   Loss 8.1505   LearningRate 0.0363   Epoch: 7   Global Step: 329790   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:31,369-Speed 2613.37 samples/sec   Loss 8.1504   LearningRate 0.0363   Epoch: 7   Global Step: 329800   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:35,261-Speed 2632.19 samples/sec   Loss 8.0446   LearningRate 0.0363   Epoch: 7   Global Step: 329810   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:39,155-Speed 2629.74 samples/sec   Loss 8.1318   LearningRate 0.0363   Epoch: 7   Global Step: 329820   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:43,052-Speed 2628.59 samples/sec   Loss 8.1994   LearningRate 0.0363   Epoch: 7   Global Step: 329830   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:46,967-Speed 2615.97 samples/sec   Loss 8.1027   LearningRate 0.0363   Epoch: 7   Global Step: 329840   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:50,860-Speed 2630.56 samples/sec   Loss 8.1851   LearningRate 0.0363   Epoch: 7   Global Step: 329850   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:54,757-Speed 2628.76 samples/sec   Loss 8.1204   LearningRate 0.0363   Epoch: 7   Global Step: 329860   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:45:58,650-Speed 2630.70 samples/sec   Loss 8.1545   LearningRate 0.0363   Epoch: 7   Global Step: 329870   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:46:02,542-Speed 2632.03 samples/sec   Loss 8.1141   LearningRate 0.0363   Epoch: 7   Global Step: 329880   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:46:06,432-Speed 2632.75 samples/sec   Loss 8.1234   LearningRate 0.0363   Epoch: 7   Global Step: 329890   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 08:46:10,322-Speed 2632.93 samples/sec   Loss 8.1777   LearningRate 0.0363   Epoch: 7   Global Step: 329900   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 08:46:14,195-Speed 2644.72 samples/sec   Loss 8.1891   LearningRate 0.0363   Epoch: 7   Global Step: 329910   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:46:18,098-Speed 2624.07 samples/sec   Loss 8.0486   LearningRate 0.0363   Epoch: 7   Global Step: 329920   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:46:21,990-Speed 2631.37 samples/sec   Loss 8.1874   LearningRate 0.0363   Epoch: 7   Global Step: 329930   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:46:25,884-Speed 2630.71 samples/sec   Loss 8.0364   LearningRate 0.0363   Epoch: 7   Global Step: 329940   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:46:29,762-Speed 2640.93 samples/sec   Loss 8.2020   LearningRate 0.0363   Epoch: 7   Global Step: 329950   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:46:33,651-Speed 2633.92 samples/sec   Loss 8.1450   LearningRate 0.0363   Epoch: 7   Global Step: 329960   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:46:37,541-Speed 2632.41 samples/sec   Loss 8.1426   LearningRate 0.0363   Epoch: 7   Global Step: 329970   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:46:41,436-Speed 2629.88 samples/sec   Loss 8.1128   LearningRate 0.0363   Epoch: 7   Global Step: 329980   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:46:45,327-Speed 2633.07 samples/sec   Loss 8.0848   LearningRate 0.0363   Epoch: 7   Global Step: 329990   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:46:49,215-Speed 2633.95 samples/sec   Loss 8.0960   LearningRate 0.0363   Epoch: 7   Global Step: 330000   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:47:32,167-[lfw][330000]XNorm: 23.292403
Training: 2022-04-14 08:47:32,168-[lfw][330000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-14 08:47:32,168-[lfw][330000]Accuracy-Highest: 0.99783
Training: 2022-04-14 08:48:22,119-[cfp_fp][330000]XNorm: 21.462890
Training: 2022-04-14 08:48:22,120-[cfp_fp][330000]Accuracy-Flip: 0.98529+-0.00703
Training: 2022-04-14 08:48:22,120-[cfp_fp][330000]Accuracy-Highest: 0.98671
Training: 2022-04-14 08:49:05,100-[agedb_30][330000]XNorm: 22.999810
Training: 2022-04-14 08:49:05,101-[agedb_30][330000]Accuracy-Flip: 0.97350+-0.00732
Training: 2022-04-14 08:49:05,101-[agedb_30][330000]Accuracy-Highest: 0.97567
Training: 2022-04-14 08:49:08,965-Speed 73.27 samples/sec   Loss 8.1676   LearningRate 0.0363   Epoch: 7   Global Step: 330010   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:49:12,833-Speed 2647.85 samples/sec   Loss 7.9833   LearningRate 0.0363   Epoch: 7   Global Step: 330020   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:49:16,705-Speed 2645.47 samples/sec   Loss 8.0206   LearningRate 0.0363   Epoch: 7   Global Step: 330030   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:49:20,578-Speed 2644.78 samples/sec   Loss 8.1956   LearningRate 0.0363   Epoch: 7   Global Step: 330040   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:49:24,451-Speed 2643.87 samples/sec   Loss 8.1524   LearningRate 0.0363   Epoch: 7   Global Step: 330050   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:49:28,312-Speed 2652.93 samples/sec   Loss 8.2106   LearningRate 0.0363   Epoch: 7   Global Step: 330060   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:49:32,133-Speed 2680.72 samples/sec   Loss 8.9486   LearningRate 0.0363   Epoch: 7   Global Step: 330070   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:49:36,007-Speed 2643.48 samples/sec   Loss 8.7149   LearningRate 0.0363   Epoch: 7   Global Step: 330080   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:49:39,891-Speed 2637.70 samples/sec   Loss 8.2281   LearningRate 0.0363   Epoch: 7   Global Step: 330090   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:49:43,772-Speed 2639.03 samples/sec   Loss 8.1858   LearningRate 0.0363   Epoch: 7   Global Step: 330100   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:49:47,658-Speed 2635.51 samples/sec   Loss 8.3165   LearningRate 0.0362   Epoch: 7   Global Step: 330110   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:49:51,549-Speed 2632.39 samples/sec   Loss 8.1772   LearningRate 0.0362   Epoch: 7   Global Step: 330120   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:49:55,454-Speed 2623.18 samples/sec   Loss 8.1508   LearningRate 0.0362   Epoch: 7   Global Step: 330130   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:49:59,335-Speed 2639.30 samples/sec   Loss 8.2061   LearningRate 0.0362   Epoch: 7   Global Step: 330140   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:50:03,219-Speed 2636.59 samples/sec   Loss 7.9901   LearningRate 0.0362   Epoch: 7   Global Step: 330150   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:50:07,108-Speed 2633.99 samples/sec   Loss 8.1764   LearningRate 0.0362   Epoch: 7   Global Step: 330160   Fp16 Grad Scale: 1024   Required: 56 hours
Training: 2022-04-14 08:50:10,994-Speed 2635.05 samples/sec   Loss 8.1125   LearningRate 0.0362   Epoch: 7   Global Step: 330170   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:14,883-Speed 2633.90 samples/sec   Loss 8.0601   LearningRate 0.0362   Epoch: 7   Global Step: 330180   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:18,773-Speed 2633.01 samples/sec   Loss 8.1133   LearningRate 0.0362   Epoch: 7   Global Step: 330190   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:22,676-Speed 2624.59 samples/sec   Loss 8.1122   LearningRate 0.0362   Epoch: 7   Global Step: 330200   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:26,564-Speed 2634.25 samples/sec   Loss 8.1955   LearningRate 0.0362   Epoch: 7   Global Step: 330210   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:30,452-Speed 2634.52 samples/sec   Loss 8.1748   LearningRate 0.0362   Epoch: 7   Global Step: 330220   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:34,353-Speed 2625.21 samples/sec   Loss 8.1836   LearningRate 0.0362   Epoch: 7   Global Step: 330230   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:38,248-Speed 2629.86 samples/sec   Loss 8.1812   LearningRate 0.0362   Epoch: 7   Global Step: 330240   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:42,150-Speed 2624.97 samples/sec   Loss 8.1763   LearningRate 0.0362   Epoch: 7   Global Step: 330250   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:46,059-Speed 2619.68 samples/sec   Loss 8.4262   LearningRate 0.0362   Epoch: 7   Global Step: 330260   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:50:49,953-Speed 2630.56 samples/sec   Loss 8.1134   LearningRate 0.0362   Epoch: 7   Global Step: 330270   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:50:53,847-Speed 2630.50 samples/sec   Loss 8.1188   LearningRate 0.0362   Epoch: 7   Global Step: 330280   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:50:57,739-Speed 2631.65 samples/sec   Loss 8.0827   LearningRate 0.0362   Epoch: 7   Global Step: 330290   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:01,630-Speed 2631.83 samples/sec   Loss 8.0315   LearningRate 0.0362   Epoch: 7   Global Step: 330300   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:05,519-Speed 2633.82 samples/sec   Loss 8.0526   LearningRate 0.0362   Epoch: 7   Global Step: 330310   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:09,409-Speed 2633.43 samples/sec   Loss 7.9905   LearningRate 0.0362   Epoch: 7   Global Step: 330320   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:13,298-Speed 2633.60 samples/sec   Loss 8.1033   LearningRate 0.0362   Epoch: 7   Global Step: 330330   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:17,192-Speed 2630.13 samples/sec   Loss 8.1345   LearningRate 0.0362   Epoch: 7   Global Step: 330340   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:21,088-Speed 2629.56 samples/sec   Loss 8.1873   LearningRate 0.0362   Epoch: 7   Global Step: 330350   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:24,980-Speed 2631.34 samples/sec   Loss 8.0582   LearningRate 0.0362   Epoch: 7   Global Step: 330360   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:51:28,870-Speed 2632.87 samples/sec   Loss 8.0762   LearningRate 0.0362   Epoch: 7   Global Step: 330370   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:51:32,770-Speed 2626.47 samples/sec   Loss 8.0095   LearningRate 0.0362   Epoch: 7   Global Step: 330380   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:51:36,663-Speed 2630.72 samples/sec   Loss 8.1338   LearningRate 0.0362   Epoch: 7   Global Step: 330390   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:51:40,557-Speed 2630.54 samples/sec   Loss 7.9657   LearningRate 0.0362   Epoch: 7   Global Step: 330400   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:51:44,529-Speed 2578.65 samples/sec   Loss 7.8959   LearningRate 0.0362   Epoch: 7   Global Step: 330410   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:51:48,425-Speed 2629.03 samples/sec   Loss 8.0477   LearningRate 0.0362   Epoch: 7   Global Step: 330420   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:51:52,321-Speed 2629.55 samples/sec   Loss 8.1013   LearningRate 0.0362   Epoch: 7   Global Step: 330430   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:51:56,220-Speed 2626.29 samples/sec   Loss 8.0865   LearningRate 0.0362   Epoch: 7   Global Step: 330440   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:52:00,130-Speed 2620.29 samples/sec   Loss 8.1931   LearningRate 0.0362   Epoch: 7   Global Step: 330450   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:52:04,056-Speed 2609.06 samples/sec   Loss 7.9621   LearningRate 0.0362   Epoch: 7   Global Step: 330460   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:52:07,957-Speed 2625.07 samples/sec   Loss 8.0899   LearningRate 0.0362   Epoch: 7   Global Step: 330470   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:11,850-Speed 2630.62 samples/sec   Loss 8.2089   LearningRate 0.0362   Epoch: 7   Global Step: 330480   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:15,744-Speed 2630.51 samples/sec   Loss 8.0687   LearningRate 0.0362   Epoch: 7   Global Step: 330490   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:19,652-Speed 2621.49 samples/sec   Loss 8.0107   LearningRate 0.0362   Epoch: 7   Global Step: 330500   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:23,549-Speed 2628.62 samples/sec   Loss 8.0096   LearningRate 0.0362   Epoch: 7   Global Step: 330510   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:27,641-Speed 2503.38 samples/sec   Loss 8.1173   LearningRate 0.0362   Epoch: 7   Global Step: 330520   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:31,678-Speed 2537.03 samples/sec   Loss 8.1890   LearningRate 0.0362   Epoch: 7   Global Step: 330530   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:35,569-Speed 2632.12 samples/sec   Loss 8.0871   LearningRate 0.0362   Epoch: 7   Global Step: 330540   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:39,463-Speed 2630.64 samples/sec   Loss 8.1733   LearningRate 0.0362   Epoch: 7   Global Step: 330550   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:43,356-Speed 2630.76 samples/sec   Loss 8.1522   LearningRate 0.0362   Epoch: 7   Global Step: 330560   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:52:47,252-Speed 2629.06 samples/sec   Loss 8.1062   LearningRate 0.0362   Epoch: 7   Global Step: 330570   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:52:51,152-Speed 2626.09 samples/sec   Loss 8.1481   LearningRate 0.0362   Epoch: 7   Global Step: 330580   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:52:55,049-Speed 2628.72 samples/sec   Loss 8.1906   LearningRate 0.0362   Epoch: 7   Global Step: 330590   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:52:58,951-Speed 2624.80 samples/sec   Loss 8.1648   LearningRate 0.0362   Epoch: 7   Global Step: 330600   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:53:02,847-Speed 2629.08 samples/sec   Loss 8.1323   LearningRate 0.0362   Epoch: 7   Global Step: 330610   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:53:06,744-Speed 2628.14 samples/sec   Loss 8.0844   LearningRate 0.0362   Epoch: 7   Global Step: 330620   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:53:10,641-Speed 2627.89 samples/sec   Loss 7.9767   LearningRate 0.0362   Epoch: 7   Global Step: 330630   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:53:14,536-Speed 2630.06 samples/sec   Loss 8.1886   LearningRate 0.0362   Epoch: 7   Global Step: 330640   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:53:18,436-Speed 2626.28 samples/sec   Loss 8.1683   LearningRate 0.0362   Epoch: 7   Global Step: 330650   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:53:22,330-Speed 2630.51 samples/sec   Loss 8.2013   LearningRate 0.0362   Epoch: 7   Global Step: 330660   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:53:26,226-Speed 2629.12 samples/sec   Loss 8.1985   LearningRate 0.0362   Epoch: 7   Global Step: 330670   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:30,120-Speed 2630.03 samples/sec   Loss 8.1756   LearningRate 0.0362   Epoch: 7   Global Step: 330680   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:34,018-Speed 2628.02 samples/sec   Loss 8.0954   LearningRate 0.0362   Epoch: 7   Global Step: 330690   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:37,916-Speed 2627.42 samples/sec   Loss 8.1198   LearningRate 0.0362   Epoch: 7   Global Step: 330700   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:41,825-Speed 2619.67 samples/sec   Loss 8.1433   LearningRate 0.0362   Epoch: 7   Global Step: 330710   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:45,733-Speed 2621.29 samples/sec   Loss 8.0453   LearningRate 0.0362   Epoch: 7   Global Step: 330720   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:49,642-Speed 2620.24 samples/sec   Loss 8.0526   LearningRate 0.0362   Epoch: 7   Global Step: 330730   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:53,542-Speed 2626.47 samples/sec   Loss 8.1178   LearningRate 0.0362   Epoch: 7   Global Step: 330740   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:53:57,443-Speed 2625.79 samples/sec   Loss 8.0351   LearningRate 0.0362   Epoch: 7   Global Step: 330750   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:54:01,338-Speed 2628.91 samples/sec   Loss 8.0716   LearningRate 0.0362   Epoch: 7   Global Step: 330760   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:54:05,234-Speed 2629.63 samples/sec   Loss 8.0972   LearningRate 0.0362   Epoch: 7   Global Step: 330770   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:09,131-Speed 2627.81 samples/sec   Loss 8.0501   LearningRate 0.0362   Epoch: 7   Global Step: 330780   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:13,032-Speed 2625.88 samples/sec   Loss 8.1485   LearningRate 0.0362   Epoch: 7   Global Step: 330790   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:16,928-Speed 2628.68 samples/sec   Loss 8.0203   LearningRate 0.0361   Epoch: 7   Global Step: 330800   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:20,883-Speed 2590.05 samples/sec   Loss 8.0866   LearningRate 0.0361   Epoch: 7   Global Step: 330810   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:24,778-Speed 2630.47 samples/sec   Loss 8.1810   LearningRate 0.0361   Epoch: 7   Global Step: 330820   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:28,676-Speed 2627.23 samples/sec   Loss 8.0424   LearningRate 0.0361   Epoch: 7   Global Step: 330830   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:32,623-Speed 2595.52 samples/sec   Loss 8.2643   LearningRate 0.0361   Epoch: 7   Global Step: 330840   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:36,533-Speed 2619.23 samples/sec   Loss 8.1416   LearningRate 0.0361   Epoch: 7   Global Step: 330850   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:40,431-Speed 2627.85 samples/sec   Loss 7.9999   LearningRate 0.0361   Epoch: 7   Global Step: 330860   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:44,324-Speed 2631.09 samples/sec   Loss 8.1633   LearningRate 0.0361   Epoch: 7   Global Step: 330870   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 08:54:48,204-Speed 2640.01 samples/sec   Loss 8.0789   LearningRate 0.0361   Epoch: 7   Global Step: 330880   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:52,104-Speed 2625.71 samples/sec   Loss 8.1589   LearningRate 0.0361   Epoch: 7   Global Step: 330890   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:55,999-Speed 2629.95 samples/sec   Loss 8.0979   LearningRate 0.0361   Epoch: 7   Global Step: 330900   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:54:59,895-Speed 2629.33 samples/sec   Loss 7.9973   LearningRate 0.0361   Epoch: 7   Global Step: 330910   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:03,792-Speed 2629.45 samples/sec   Loss 8.2101   LearningRate 0.0361   Epoch: 7   Global Step: 330920   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:07,689-Speed 2627.95 samples/sec   Loss 7.9794   LearningRate 0.0361   Epoch: 7   Global Step: 330930   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:11,582-Speed 2631.04 samples/sec   Loss 8.1453   LearningRate 0.0361   Epoch: 7   Global Step: 330940   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:15,488-Speed 2622.65 samples/sec   Loss 8.1556   LearningRate 0.0361   Epoch: 7   Global Step: 330950   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:19,382-Speed 2630.38 samples/sec   Loss 8.2977   LearningRate 0.0361   Epoch: 7   Global Step: 330960   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:23,281-Speed 2626.54 samples/sec   Loss 8.0516   LearningRate 0.0361   Epoch: 7   Global Step: 330970   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:27,181-Speed 2626.89 samples/sec   Loss 7.9823   LearningRate 0.0361   Epoch: 7   Global Step: 330980   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 08:55:31,055-Speed 2643.31 samples/sec   Loss 8.0335   LearningRate 0.0361   Epoch: 7   Global Step: 330990   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 08:55:34,924-Speed 2647.57 samples/sec   Loss 8.1116   LearningRate 0.0361   Epoch: 7   Global Step: 331000   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:55:38,817-Speed 2631.39 samples/sec   Loss 8.1938   LearningRate 0.0361   Epoch: 7   Global Step: 331010   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:55:42,727-Speed 2619.34 samples/sec   Loss 8.1826   LearningRate 0.0361   Epoch: 7   Global Step: 331020   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:55:46,620-Speed 2631.13 samples/sec   Loss 8.0827   LearningRate 0.0361   Epoch: 7   Global Step: 331030   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:55:50,516-Speed 2629.06 samples/sec   Loss 8.0451   LearningRate 0.0361   Epoch: 7   Global Step: 331040   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:55:54,407-Speed 2632.07 samples/sec   Loss 8.1705   LearningRate 0.0361   Epoch: 7   Global Step: 331050   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:55:58,311-Speed 2623.76 samples/sec   Loss 8.0927   LearningRate 0.0361   Epoch: 7   Global Step: 331060   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:56:02,210-Speed 2626.69 samples/sec   Loss 8.0800   LearningRate 0.0361   Epoch: 7   Global Step: 331070   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:56:06,106-Speed 2629.64 samples/sec   Loss 8.2255   LearningRate 0.0361   Epoch: 7   Global Step: 331080   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:56:10,004-Speed 2627.22 samples/sec   Loss 8.1545   LearningRate 0.0361   Epoch: 7   Global Step: 331090   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:56:13,900-Speed 2629.15 samples/sec   Loss 8.0935   LearningRate 0.0361   Epoch: 7   Global Step: 331100   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:56:17,794-Speed 2629.96 samples/sec   Loss 8.0919   LearningRate 0.0361   Epoch: 7   Global Step: 331110   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 08:56:21,675-Speed 2639.18 samples/sec   Loss 8.0859   LearningRate 0.0361   Epoch: 7   Global Step: 331120   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:25,569-Speed 2630.92 samples/sec   Loss 8.1039   LearningRate 0.0361   Epoch: 7   Global Step: 331130   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:29,469-Speed 2626.29 samples/sec   Loss 8.0268   LearningRate 0.0361   Epoch: 7   Global Step: 331140   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:33,360-Speed 2631.83 samples/sec   Loss 8.1113   LearningRate 0.0361   Epoch: 7   Global Step: 331150   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:37,265-Speed 2623.53 samples/sec   Loss 8.0356   LearningRate 0.0361   Epoch: 7   Global Step: 331160   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:41,157-Speed 2631.31 samples/sec   Loss 8.1111   LearningRate 0.0361   Epoch: 7   Global Step: 331170   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:45,061-Speed 2624.29 samples/sec   Loss 7.9105   LearningRate 0.0361   Epoch: 7   Global Step: 331180   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:48,952-Speed 2632.29 samples/sec   Loss 8.1415   LearningRate 0.0361   Epoch: 7   Global Step: 331190   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:52,851-Speed 2627.06 samples/sec   Loss 8.1897   LearningRate 0.0361   Epoch: 7   Global Step: 331200   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:56:56,742-Speed 2632.27 samples/sec   Loss 8.1025   LearningRate 0.0361   Epoch: 7   Global Step: 331210   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:57:00,633-Speed 2632.17 samples/sec   Loss 8.1088   LearningRate 0.0361   Epoch: 7   Global Step: 331220   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:57:04,533-Speed 2626.42 samples/sec   Loss 7.9081   LearningRate 0.0361   Epoch: 7   Global Step: 331230   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:57:08,423-Speed 2632.54 samples/sec   Loss 8.0862   LearningRate 0.0361   Epoch: 7   Global Step: 331240   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:57:12,317-Speed 2630.45 samples/sec   Loss 8.0739   LearningRate 0.0361   Epoch: 7   Global Step: 331250   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:57:16,209-Speed 2631.74 samples/sec   Loss 8.1016   LearningRate 0.0361   Epoch: 7   Global Step: 331260   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 08:57:20,078-Speed 2647.95 samples/sec   Loss 8.4140   LearningRate 0.0361   Epoch: 7   Global Step: 331270   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:57:23,929-Speed 2660.01 samples/sec   Loss 9.5979   LearningRate 0.0361   Epoch: 7   Global Step: 331280   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:27,837-Speed 2620.71 samples/sec   Loss 9.0119   LearningRate 0.0361   Epoch: 7   Global Step: 331290   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:31,726-Speed 2633.45 samples/sec   Loss 8.4616   LearningRate 0.0361   Epoch: 7   Global Step: 331300   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:35,623-Speed 2628.55 samples/sec   Loss 8.1836   LearningRate 0.0361   Epoch: 7   Global Step: 331310   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:39,514-Speed 2632.44 samples/sec   Loss 8.2211   LearningRate 0.0361   Epoch: 7   Global Step: 331320   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:43,403-Speed 2633.60 samples/sec   Loss 7.9472   LearningRate 0.0361   Epoch: 7   Global Step: 331330   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:47,299-Speed 2629.10 samples/sec   Loss 8.0013   LearningRate 0.0361   Epoch: 7   Global Step: 331340   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:51,213-Speed 2617.06 samples/sec   Loss 8.1469   LearningRate 0.0361   Epoch: 7   Global Step: 331350   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:55,103-Speed 2632.28 samples/sec   Loss 8.1538   LearningRate 0.0361   Epoch: 7   Global Step: 331360   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:57:58,994-Speed 2633.09 samples/sec   Loss 8.1315   LearningRate 0.0361   Epoch: 7   Global Step: 331370   Fp16 Grad Scale: 2048   Required: 56 hours
Training: 2022-04-14 08:58:02,900-Speed 2622.18 samples/sec   Loss 8.0082   LearningRate 0.0361   Epoch: 7   Global Step: 331380   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:06,809-Speed 2620.14 samples/sec   Loss 8.0951   LearningRate 0.0361   Epoch: 7   Global Step: 331390   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:10,714-Speed 2622.84 samples/sec   Loss 8.1328   LearningRate 0.0361   Epoch: 7   Global Step: 331400   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:14,626-Speed 2618.46 samples/sec   Loss 8.0562   LearningRate 0.0361   Epoch: 7   Global Step: 331410   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:18,521-Speed 2629.49 samples/sec   Loss 8.1339   LearningRate 0.0361   Epoch: 7   Global Step: 331420   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:22,414-Speed 2631.20 samples/sec   Loss 8.0964   LearningRate 0.0361   Epoch: 7   Global Step: 331430   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:26,310-Speed 2629.42 samples/sec   Loss 8.0811   LearningRate 0.0361   Epoch: 7   Global Step: 331440   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:30,291-Speed 2572.83 samples/sec   Loss 8.1476   LearningRate 0.0361   Epoch: 7   Global Step: 331450   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:34,204-Speed 2617.56 samples/sec   Loss 8.2405   LearningRate 0.0361   Epoch: 7   Global Step: 331460   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:38,095-Speed 2632.52 samples/sec   Loss 8.1784   LearningRate 0.0361   Epoch: 7   Global Step: 331470   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 08:58:41,990-Speed 2629.26 samples/sec   Loss 8.1231   LearningRate 0.0361   Epoch: 7   Global Step: 331480   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:58:45,886-Speed 2629.85 samples/sec   Loss 8.0009   LearningRate 0.0360   Epoch: 7   Global Step: 331490   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:58:49,784-Speed 2627.29 samples/sec   Loss 8.2593   LearningRate 0.0360   Epoch: 7   Global Step: 331500   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:58:53,677-Speed 2630.84 samples/sec   Loss 8.2235   LearningRate 0.0360   Epoch: 7   Global Step: 331510   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:58:57,575-Speed 2627.37 samples/sec   Loss 8.0172   LearningRate 0.0360   Epoch: 7   Global Step: 331520   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:59:01,472-Speed 2629.05 samples/sec   Loss 8.0116   LearningRate 0.0360   Epoch: 7   Global Step: 331530   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:59:05,375-Speed 2623.93 samples/sec   Loss 8.0141   LearningRate 0.0360   Epoch: 7   Global Step: 331540   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:59:09,270-Speed 2630.03 samples/sec   Loss 8.0661   LearningRate 0.0360   Epoch: 7   Global Step: 331550   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:59:13,171-Speed 2625.22 samples/sec   Loss 8.6936   LearningRate 0.0360   Epoch: 7   Global Step: 331560   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:59:17,069-Speed 2627.47 samples/sec   Loss 8.6215   LearningRate 0.0360   Epoch: 7   Global Step: 331570   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 08:59:20,968-Speed 2627.38 samples/sec   Loss 8.2048   LearningRate 0.0360   Epoch: 7   Global Step: 331580   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:24,863-Speed 2629.32 samples/sec   Loss 8.1931   LearningRate 0.0360   Epoch: 7   Global Step: 331590   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:28,756-Speed 2631.52 samples/sec   Loss 8.1277   LearningRate 0.0360   Epoch: 7   Global Step: 331600   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:32,651-Speed 2629.22 samples/sec   Loss 8.0890   LearningRate 0.0360   Epoch: 7   Global Step: 331610   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:36,544-Speed 2630.89 samples/sec   Loss 8.0975   LearningRate 0.0360   Epoch: 7   Global Step: 331620   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:40,449-Speed 2623.45 samples/sec   Loss 8.1331   LearningRate 0.0360   Epoch: 7   Global Step: 331630   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:44,355-Speed 2621.92 samples/sec   Loss 8.1233   LearningRate 0.0360   Epoch: 7   Global Step: 331640   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:48,279-Speed 2610.33 samples/sec   Loss 8.1467   LearningRate 0.0360   Epoch: 7   Global Step: 331650   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:52,179-Speed 2626.46 samples/sec   Loss 7.9623   LearningRate 0.0360   Epoch: 7   Global Step: 331660   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 08:59:56,084-Speed 2622.46 samples/sec   Loss 8.1169   LearningRate 0.0360   Epoch: 7   Global Step: 331670   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:00,012-Speed 2608.32 samples/sec   Loss 8.1322   LearningRate 0.0360   Epoch: 7   Global Step: 331680   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:00:03,912-Speed 2626.37 samples/sec   Loss 8.1812   LearningRate 0.0360   Epoch: 7   Global Step: 331690   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:00:07,805-Speed 2630.41 samples/sec   Loss 8.1219   LearningRate 0.0360   Epoch: 7   Global Step: 331700   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:00:11,699-Speed 2630.16 samples/sec   Loss 8.1624   LearningRate 0.0360   Epoch: 7   Global Step: 331710   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:00:15,592-Speed 2631.36 samples/sec   Loss 8.1138   LearningRate 0.0360   Epoch: 7   Global Step: 331720   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:00:19,494-Speed 2624.86 samples/sec   Loss 8.1536   LearningRate 0.0360   Epoch: 7   Global Step: 331730   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:00:23,380-Speed 2635.60 samples/sec   Loss 8.4053   LearningRate 0.0360   Epoch: 7   Global Step: 331740   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:27,279-Speed 2627.45 samples/sec   Loss 8.2858   LearningRate 0.0360   Epoch: 7   Global Step: 331750   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:31,168-Speed 2633.67 samples/sec   Loss 8.1461   LearningRate 0.0360   Epoch: 7   Global Step: 331760   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:35,063-Speed 2630.09 samples/sec   Loss 8.1461   LearningRate 0.0360   Epoch: 7   Global Step: 331770   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:38,966-Speed 2623.84 samples/sec   Loss 8.1362   LearningRate 0.0360   Epoch: 7   Global Step: 331780   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:42,868-Speed 2624.92 samples/sec   Loss 8.1501   LearningRate 0.0360   Epoch: 7   Global Step: 331790   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:46,757-Speed 2633.34 samples/sec   Loss 8.0914   LearningRate 0.0360   Epoch: 7   Global Step: 331800   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:50,661-Speed 2624.57 samples/sec   Loss 8.0940   LearningRate 0.0360   Epoch: 7   Global Step: 331810   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:54,563-Speed 2624.97 samples/sec   Loss 8.1197   LearningRate 0.0360   Epoch: 7   Global Step: 331820   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:00:58,462-Speed 2627.50 samples/sec   Loss 8.1127   LearningRate 0.0360   Epoch: 7   Global Step: 331830   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:01:19,502-Speed 486.72 samples/sec   Loss 8.0724   LearningRate 0.0360   Epoch: 8   Global Step: 331840   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:23,400-Speed 2627.68 samples/sec   Loss 8.0930   LearningRate 0.0360   Epoch: 8   Global Step: 331850   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:27,297-Speed 2628.78 samples/sec   Loss 8.0882   LearningRate 0.0360   Epoch: 8   Global Step: 331860   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:31,196-Speed 2626.83 samples/sec   Loss 8.1416   LearningRate 0.0360   Epoch: 8   Global Step: 331870   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:35,085-Speed 2634.27 samples/sec   Loss 8.1663   LearningRate 0.0360   Epoch: 8   Global Step: 331880   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:38,973-Speed 2633.76 samples/sec   Loss 8.0787   LearningRate 0.0360   Epoch: 8   Global Step: 331890   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:42,865-Speed 2631.88 samples/sec   Loss 8.0898   LearningRate 0.0360   Epoch: 8   Global Step: 331900   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:46,762-Speed 2628.35 samples/sec   Loss 8.0664   LearningRate 0.0360   Epoch: 8   Global Step: 331910   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:50,676-Speed 2616.77 samples/sec   Loss 8.0642   LearningRate 0.0360   Epoch: 8   Global Step: 331920   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:54,608-Speed 2604.70 samples/sec   Loss 8.0583   LearningRate 0.0360   Epoch: 8   Global Step: 331930   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:01:58,505-Speed 2628.62 samples/sec   Loss 8.0791   LearningRate 0.0360   Epoch: 8   Global Step: 331940   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:02,406-Speed 2625.98 samples/sec   Loss 8.2003   LearningRate 0.0360   Epoch: 8   Global Step: 331950   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:06,327-Speed 2612.40 samples/sec   Loss 8.0981   LearningRate 0.0360   Epoch: 8   Global Step: 331960   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:10,238-Speed 2618.68 samples/sec   Loss 8.2651   LearningRate 0.0360   Epoch: 8   Global Step: 331970   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:14,138-Speed 2626.59 samples/sec   Loss 8.1200   LearningRate 0.0360   Epoch: 8   Global Step: 331980   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:18,042-Speed 2623.09 samples/sec   Loss 8.1717   LearningRate 0.0360   Epoch: 8   Global Step: 331990   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:21,968-Speed 2608.90 samples/sec   Loss 8.1598   LearningRate 0.0360   Epoch: 8   Global Step: 332000   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:25,890-Speed 2611.98 samples/sec   Loss 8.1122   LearningRate 0.0360   Epoch: 8   Global Step: 332010   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:29,803-Speed 2617.49 samples/sec   Loss 8.0745   LearningRate 0.0360   Epoch: 8   Global Step: 332020   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:33,700-Speed 2628.39 samples/sec   Loss 8.1734   LearningRate 0.0360   Epoch: 8   Global Step: 332030   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:02:37,600-Speed 2626.45 samples/sec   Loss 8.0309   LearningRate 0.0360   Epoch: 8   Global Step: 332040   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:02:41,572-Speed 2578.97 samples/sec   Loss 8.0865   LearningRate 0.0360   Epoch: 8   Global Step: 332050   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:02:45,652-Speed 2510.64 samples/sec   Loss 7.8931   LearningRate 0.0360   Epoch: 8   Global Step: 332060   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:02:49,732-Speed 2509.69 samples/sec   Loss 8.1778   LearningRate 0.0360   Epoch: 8   Global Step: 332070   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:02:53,647-Speed 2616.44 samples/sec   Loss 8.1967   LearningRate 0.0360   Epoch: 8   Global Step: 332080   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:02:57,545-Speed 2628.33 samples/sec   Loss 8.0222   LearningRate 0.0360   Epoch: 8   Global Step: 332090   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:01,443-Speed 2627.36 samples/sec   Loss 8.0256   LearningRate 0.0360   Epoch: 8   Global Step: 332100   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:05,346-Speed 2623.86 samples/sec   Loss 7.9926   LearningRate 0.0360   Epoch: 8   Global Step: 332110   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:09,250-Speed 2623.92 samples/sec   Loss 8.1616   LearningRate 0.0360   Epoch: 8   Global Step: 332120   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:13,154-Speed 2623.87 samples/sec   Loss 8.1122   LearningRate 0.0360   Epoch: 8   Global Step: 332130   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:17,193-Speed 2536.16 samples/sec   Loss 8.0400   LearningRate 0.0360   Epoch: 8   Global Step: 332140   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:03:21,174-Speed 2573.13 samples/sec   Loss 7.9568   LearningRate 0.0360   Epoch: 8   Global Step: 332150   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:25,066-Speed 2631.37 samples/sec   Loss 7.9864   LearningRate 0.0360   Epoch: 8   Global Step: 332160   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:28,971-Speed 2623.59 samples/sec   Loss 8.1531   LearningRate 0.0360   Epoch: 8   Global Step: 332170   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:32,869-Speed 2627.85 samples/sec   Loss 8.0557   LearningRate 0.0359   Epoch: 8   Global Step: 332180   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:36,767-Speed 2626.99 samples/sec   Loss 8.0888   LearningRate 0.0359   Epoch: 8   Global Step: 332190   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:03:40,637-Speed 2647.02 samples/sec   Loss 7.9639   LearningRate 0.0359   Epoch: 8   Global Step: 332200   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:03:44,599-Speed 2585.62 samples/sec   Loss 8.1019   LearningRate 0.0359   Epoch: 8   Global Step: 332210   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:03:48,492-Speed 2630.81 samples/sec   Loss 8.1202   LearningRate 0.0359   Epoch: 8   Global Step: 332220   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:03:52,391-Speed 2626.94 samples/sec   Loss 8.0869   LearningRate 0.0359   Epoch: 8   Global Step: 332230   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:03:56,284-Speed 2630.70 samples/sec   Loss 7.9768   LearningRate 0.0359   Epoch: 8   Global Step: 332240   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:04:00,183-Speed 2627.57 samples/sec   Loss 7.8780   LearningRate 0.0359   Epoch: 8   Global Step: 332250   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:04:04,091-Speed 2620.50 samples/sec   Loss 8.1383   LearningRate 0.0359   Epoch: 8   Global Step: 332260   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:04:07,990-Speed 2627.30 samples/sec   Loss 8.0668   LearningRate 0.0359   Epoch: 8   Global Step: 332270   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:04:11,887-Speed 2628.02 samples/sec   Loss 8.0280   LearningRate 0.0359   Epoch: 8   Global Step: 332280   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:04:15,785-Speed 2628.01 samples/sec   Loss 8.0694   LearningRate 0.0359   Epoch: 8   Global Step: 332290   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:04:19,684-Speed 2626.73 samples/sec   Loss 8.1257   LearningRate 0.0359   Epoch: 8   Global Step: 332300   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:04:23,586-Speed 2625.33 samples/sec   Loss 8.0586   LearningRate 0.0359   Epoch: 8   Global Step: 332310   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:04:27,495-Speed 2620.34 samples/sec   Loss 7.9935   LearningRate 0.0359   Epoch: 8   Global Step: 332320   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:04:31,423-Speed 2608.03 samples/sec   Loss 8.1010   LearningRate 0.0359   Epoch: 8   Global Step: 332330   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:04:35,337-Speed 2616.37 samples/sec   Loss 8.0936   LearningRate 0.0359   Epoch: 8   Global Step: 332340   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:04:39,193-Speed 2656.25 samples/sec   Loss 8.1958   LearningRate 0.0359   Epoch: 8   Global Step: 332350   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:04:43,088-Speed 2629.51 samples/sec   Loss 8.1956   LearningRate 0.0359   Epoch: 8   Global Step: 332360   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:04:46,994-Speed 2622.79 samples/sec   Loss 8.0880   LearningRate 0.0359   Epoch: 8   Global Step: 332370   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:04:50,890-Speed 2629.58 samples/sec   Loss 8.1545   LearningRate 0.0359   Epoch: 8   Global Step: 332380   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:04:54,783-Speed 2630.54 samples/sec   Loss 8.0955   LearningRate 0.0359   Epoch: 8   Global Step: 332390   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:04:58,678-Speed 2629.94 samples/sec   Loss 8.0489   LearningRate 0.0359   Epoch: 8   Global Step: 332400   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:05:02,571-Speed 2630.93 samples/sec   Loss 8.0866   LearningRate 0.0359   Epoch: 8   Global Step: 332410   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:05:06,482-Speed 2618.94 samples/sec   Loss 8.0286   LearningRate 0.0359   Epoch: 8   Global Step: 332420   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:05:10,378-Speed 2629.06 samples/sec   Loss 7.9878   LearningRate 0.0359   Epoch: 8   Global Step: 332430   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:05:14,269-Speed 2632.88 samples/sec   Loss 7.9828   LearningRate 0.0359   Epoch: 8   Global Step: 332440   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:05:18,170-Speed 2625.18 samples/sec   Loss 8.1495   LearningRate 0.0359   Epoch: 8   Global Step: 332450   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:22,067-Speed 2628.65 samples/sec   Loss 8.0742   LearningRate 0.0359   Epoch: 8   Global Step: 332460   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:25,963-Speed 2628.55 samples/sec   Loss 8.0345   LearningRate 0.0359   Epoch: 8   Global Step: 332470   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:29,858-Speed 2630.48 samples/sec   Loss 8.0629   LearningRate 0.0359   Epoch: 8   Global Step: 332480   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:33,753-Speed 2629.52 samples/sec   Loss 8.0729   LearningRate 0.0359   Epoch: 8   Global Step: 332490   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:37,663-Speed 2620.07 samples/sec   Loss 8.1325   LearningRate 0.0359   Epoch: 8   Global Step: 332500   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:41,560-Speed 2627.92 samples/sec   Loss 7.8843   LearningRate 0.0359   Epoch: 8   Global Step: 332510   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:45,458-Speed 2628.05 samples/sec   Loss 8.0671   LearningRate 0.0359   Epoch: 8   Global Step: 332520   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:49,354-Speed 2628.95 samples/sec   Loss 8.0667   LearningRate 0.0359   Epoch: 8   Global Step: 332530   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:53,250-Speed 2629.05 samples/sec   Loss 8.0703   LearningRate 0.0359   Epoch: 8   Global Step: 332540   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:05:57,144-Speed 2630.27 samples/sec   Loss 8.0832   LearningRate 0.0359   Epoch: 8   Global Step: 332550   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:06:01,017-Speed 2644.82 samples/sec   Loss 8.3471   LearningRate 0.0359   Epoch: 8   Global Step: 332560   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:06:04,889-Speed 2645.33 samples/sec   Loss 8.6420   LearningRate 0.0359   Epoch: 8   Global Step: 332570   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:08,779-Speed 2632.78 samples/sec   Loss 8.1058   LearningRate 0.0359   Epoch: 8   Global Step: 332580   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:12,673-Speed 2630.58 samples/sec   Loss 8.0368   LearningRate 0.0359   Epoch: 8   Global Step: 332590   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:16,571-Speed 2627.42 samples/sec   Loss 8.1550   LearningRate 0.0359   Epoch: 8   Global Step: 332600   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:20,469-Speed 2627.65 samples/sec   Loss 8.0765   LearningRate 0.0359   Epoch: 8   Global Step: 332610   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:24,360-Speed 2632.45 samples/sec   Loss 8.0532   LearningRate 0.0359   Epoch: 8   Global Step: 332620   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:28,252-Speed 2631.50 samples/sec   Loss 8.1302   LearningRate 0.0359   Epoch: 8   Global Step: 332630   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:32,163-Speed 2618.95 samples/sec   Loss 7.9484   LearningRate 0.0359   Epoch: 8   Global Step: 332640   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:36,054-Speed 2632.48 samples/sec   Loss 8.1387   LearningRate 0.0359   Epoch: 8   Global Step: 332650   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:39,949-Speed 2630.35 samples/sec   Loss 8.0824   LearningRate 0.0359   Epoch: 8   Global Step: 332660   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:06:43,841-Speed 2631.73 samples/sec   Loss 8.0997   LearningRate 0.0359   Epoch: 8   Global Step: 332670   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:06:47,927-Speed 2506.53 samples/sec   Loss 8.1206   LearningRate 0.0359   Epoch: 8   Global Step: 332680   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:06:52,009-Speed 2509.00 samples/sec   Loss 8.0730   LearningRate 0.0359   Epoch: 8   Global Step: 332690   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:06:56,050-Speed 2534.63 samples/sec   Loss 8.0453   LearningRate 0.0359   Epoch: 8   Global Step: 332700   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:06:59,988-Speed 2601.41 samples/sec   Loss 8.0840   LearningRate 0.0359   Epoch: 8   Global Step: 332710   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:07:03,883-Speed 2630.35 samples/sec   Loss 8.0519   LearningRate 0.0359   Epoch: 8   Global Step: 332720   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:07:07,773-Speed 2632.46 samples/sec   Loss 8.2101   LearningRate 0.0359   Epoch: 8   Global Step: 332730   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:07:11,775-Speed 2559.76 samples/sec   Loss 7.9456   LearningRate 0.0359   Epoch: 8   Global Step: 332740   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:07:15,666-Speed 2632.17 samples/sec   Loss 8.0233   LearningRate 0.0359   Epoch: 8   Global Step: 332750   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:07:19,582-Speed 2615.44 samples/sec   Loss 8.1102   LearningRate 0.0359   Epoch: 8   Global Step: 332760   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:07:23,486-Speed 2623.89 samples/sec   Loss 8.1020   LearningRate 0.0359   Epoch: 8   Global Step: 332770   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:27,379-Speed 2631.25 samples/sec   Loss 8.0722   LearningRate 0.0359   Epoch: 8   Global Step: 332780   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:31,273-Speed 2630.30 samples/sec   Loss 8.1105   LearningRate 0.0359   Epoch: 8   Global Step: 332790   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:35,170-Speed 2628.35 samples/sec   Loss 7.9900   LearningRate 0.0359   Epoch: 8   Global Step: 332800   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:39,068-Speed 2627.88 samples/sec   Loss 8.2125   LearningRate 0.0359   Epoch: 8   Global Step: 332810   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:42,961-Speed 2631.36 samples/sec   Loss 8.0646   LearningRate 0.0359   Epoch: 8   Global Step: 332820   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:47,012-Speed 2528.04 samples/sec   Loss 8.0637   LearningRate 0.0359   Epoch: 8   Global Step: 332830   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:51,080-Speed 2517.81 samples/sec   Loss 8.1037   LearningRate 0.0359   Epoch: 8   Global Step: 332840   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:54,977-Speed 2628.51 samples/sec   Loss 8.0799   LearningRate 0.0359   Epoch: 8   Global Step: 332850   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:07:58,878-Speed 2625.15 samples/sec   Loss 8.0609   LearningRate 0.0359   Epoch: 8   Global Step: 332860   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:08:02,787-Speed 2620.72 samples/sec   Loss 7.9097   LearningRate 0.0359   Epoch: 8   Global Step: 332870   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:06,703-Speed 2615.20 samples/sec   Loss 8.0930   LearningRate 0.0358   Epoch: 8   Global Step: 332880   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:10,623-Speed 2612.74 samples/sec   Loss 7.9687   LearningRate 0.0358   Epoch: 8   Global Step: 332890   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:14,521-Speed 2628.05 samples/sec   Loss 8.0454   LearningRate 0.0358   Epoch: 8   Global Step: 332900   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:18,414-Speed 2630.93 samples/sec   Loss 8.1104   LearningRate 0.0358   Epoch: 8   Global Step: 332910   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:22,310-Speed 2628.83 samples/sec   Loss 8.1454   LearningRate 0.0358   Epoch: 8   Global Step: 332920   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:26,204-Speed 2630.77 samples/sec   Loss 8.1043   LearningRate 0.0358   Epoch: 8   Global Step: 332930   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:30,097-Speed 2630.78 samples/sec   Loss 8.1660   LearningRate 0.0358   Epoch: 8   Global Step: 332940   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:33,998-Speed 2625.74 samples/sec   Loss 8.0346   LearningRate 0.0358   Epoch: 8   Global Step: 332950   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:37,921-Speed 2611.16 samples/sec   Loss 8.0763   LearningRate 0.0358   Epoch: 8   Global Step: 332960   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:08:41,821-Speed 2626.77 samples/sec   Loss 8.0541   LearningRate 0.0358   Epoch: 8   Global Step: 332970   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:08:45,726-Speed 2622.82 samples/sec   Loss 8.0873   LearningRate 0.0358   Epoch: 8   Global Step: 332980   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:08:49,630-Speed 2623.25 samples/sec   Loss 8.1418   LearningRate 0.0358   Epoch: 8   Global Step: 332990   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:08:53,531-Speed 2625.92 samples/sec   Loss 8.2173   LearningRate 0.0358   Epoch: 8   Global Step: 333000   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:08:57,433-Speed 2624.99 samples/sec   Loss 8.0241   LearningRate 0.0358   Epoch: 8   Global Step: 333010   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:09:01,426-Speed 2565.37 samples/sec   Loss 8.0322   LearningRate 0.0358   Epoch: 8   Global Step: 333020   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:09:05,320-Speed 2629.58 samples/sec   Loss 8.0987   LearningRate 0.0358   Epoch: 8   Global Step: 333030   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:09:09,214-Speed 2630.44 samples/sec   Loss 7.9428   LearningRate 0.0358   Epoch: 8   Global Step: 333040   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:09:13,107-Speed 2630.96 samples/sec   Loss 8.0302   LearningRate 0.0358   Epoch: 8   Global Step: 333050   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:09:17,003-Speed 2629.44 samples/sec   Loss 8.1008   LearningRate 0.0358   Epoch: 8   Global Step: 333060   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:09:20,910-Speed 2621.31 samples/sec   Loss 7.9886   LearningRate 0.0358   Epoch: 8   Global Step: 333070   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:24,815-Speed 2622.91 samples/sec   Loss 8.0530   LearningRate 0.0358   Epoch: 8   Global Step: 333080   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:28,710-Speed 2629.31 samples/sec   Loss 8.0570   LearningRate 0.0358   Epoch: 8   Global Step: 333090   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:32,605-Speed 2629.77 samples/sec   Loss 8.0753   LearningRate 0.0358   Epoch: 8   Global Step: 333100   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:36,503-Speed 2627.66 samples/sec   Loss 7.9732   LearningRate 0.0358   Epoch: 8   Global Step: 333110   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:40,400-Speed 2628.31 samples/sec   Loss 7.8666   LearningRate 0.0358   Epoch: 8   Global Step: 333120   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:44,294-Speed 2630.15 samples/sec   Loss 8.0497   LearningRate 0.0358   Epoch: 8   Global Step: 333130   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:48,192-Speed 2627.88 samples/sec   Loss 8.0661   LearningRate 0.0358   Epoch: 8   Global Step: 333140   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:52,106-Speed 2617.41 samples/sec   Loss 8.1224   LearningRate 0.0358   Epoch: 8   Global Step: 333150   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:56,006-Speed 2625.75 samples/sec   Loss 8.1106   LearningRate 0.0358   Epoch: 8   Global Step: 333160   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:09:59,940-Speed 2604.20 samples/sec   Loss 8.1165   LearningRate 0.0358   Epoch: 8   Global Step: 333170   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:10:03,848-Speed 2621.11 samples/sec   Loss 8.0717   LearningRate 0.0358   Epoch: 8   Global Step: 333180   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:10:07,727-Speed 2640.02 samples/sec   Loss 7.9876   LearningRate 0.0358   Epoch: 8   Global Step: 333190   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:11,620-Speed 2631.08 samples/sec   Loss 8.0861   LearningRate 0.0358   Epoch: 8   Global Step: 333200   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:15,524-Speed 2623.99 samples/sec   Loss 8.0739   LearningRate 0.0358   Epoch: 8   Global Step: 333210   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:19,424-Speed 2626.06 samples/sec   Loss 8.1661   LearningRate 0.0358   Epoch: 8   Global Step: 333220   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:23,322-Speed 2628.37 samples/sec   Loss 8.1139   LearningRate 0.0358   Epoch: 8   Global Step: 333230   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:27,233-Speed 2618.77 samples/sec   Loss 8.0243   LearningRate 0.0358   Epoch: 8   Global Step: 333240   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:31,124-Speed 2632.68 samples/sec   Loss 8.1090   LearningRate 0.0358   Epoch: 8   Global Step: 333250   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:35,021-Speed 2628.00 samples/sec   Loss 8.0065   LearningRate 0.0358   Epoch: 8   Global Step: 333260   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:38,917-Speed 2629.25 samples/sec   Loss 8.1280   LearningRate 0.0358   Epoch: 8   Global Step: 333270   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:42,810-Speed 2630.70 samples/sec   Loss 8.1283   LearningRate 0.0358   Epoch: 8   Global Step: 333280   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:46,705-Speed 2629.94 samples/sec   Loss 8.0781   LearningRate 0.0358   Epoch: 8   Global Step: 333290   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:50,596-Speed 2632.41 samples/sec   Loss 8.0848   LearningRate 0.0358   Epoch: 8   Global Step: 333300   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:54,496-Speed 2626.34 samples/sec   Loss 8.1469   LearningRate 0.0358   Epoch: 8   Global Step: 333310   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:10:58,392-Speed 2628.68 samples/sec   Loss 8.1304   LearningRate 0.0358   Epoch: 8   Global Step: 333320   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:11:02,308-Speed 2616.22 samples/sec   Loss 8.0745   LearningRate 0.0358   Epoch: 8   Global Step: 333330   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:11:06,201-Speed 2630.76 samples/sec   Loss 7.9707   LearningRate 0.0358   Epoch: 8   Global Step: 333340   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:11:10,098-Speed 2628.30 samples/sec   Loss 7.9917   LearningRate 0.0358   Epoch: 8   Global Step: 333350   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:11:13,977-Speed 2640.70 samples/sec   Loss 7.9272   LearningRate 0.0358   Epoch: 8   Global Step: 333360   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:17,874-Speed 2627.89 samples/sec   Loss 8.0231   LearningRate 0.0358   Epoch: 8   Global Step: 333370   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:21,775-Speed 2626.30 samples/sec   Loss 7.9398   LearningRate 0.0358   Epoch: 8   Global Step: 333380   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:25,679-Speed 2623.40 samples/sec   Loss 8.1658   LearningRate 0.0358   Epoch: 8   Global Step: 333390   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:29,586-Speed 2622.23 samples/sec   Loss 8.0233   LearningRate 0.0358   Epoch: 8   Global Step: 333400   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:33,486-Speed 2626.04 samples/sec   Loss 8.0400   LearningRate 0.0358   Epoch: 8   Global Step: 333410   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:37,380-Speed 2630.16 samples/sec   Loss 7.9075   LearningRate 0.0358   Epoch: 8   Global Step: 333420   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:41,284-Speed 2622.98 samples/sec   Loss 8.0718   LearningRate 0.0358   Epoch: 8   Global Step: 333430   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:45,197-Speed 2618.21 samples/sec   Loss 8.0394   LearningRate 0.0358   Epoch: 8   Global Step: 333440   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:49,101-Speed 2623.54 samples/sec   Loss 7.9447   LearningRate 0.0358   Epoch: 8   Global Step: 333450   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:11:53,001-Speed 2626.21 samples/sec   Loss 8.0330   LearningRate 0.0358   Epoch: 8   Global Step: 333460   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:11:56,908-Speed 2622.10 samples/sec   Loss 8.0817   LearningRate 0.0358   Epoch: 8   Global Step: 333470   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:00,817-Speed 2620.05 samples/sec   Loss 7.9722   LearningRate 0.0358   Epoch: 8   Global Step: 333480   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:04,719-Speed 2625.65 samples/sec   Loss 8.0094   LearningRate 0.0358   Epoch: 8   Global Step: 333490   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:08,629-Speed 2619.03 samples/sec   Loss 8.0012   LearningRate 0.0358   Epoch: 8   Global Step: 333500   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:12,524-Speed 2629.90 samples/sec   Loss 8.0354   LearningRate 0.0358   Epoch: 8   Global Step: 333510   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:16,455-Speed 2605.68 samples/sec   Loss 8.0766   LearningRate 0.0358   Epoch: 8   Global Step: 333520   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:20,348-Speed 2631.05 samples/sec   Loss 8.0330   LearningRate 0.0358   Epoch: 8   Global Step: 333530   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:24,242-Speed 2630.44 samples/sec   Loss 7.9942   LearningRate 0.0358   Epoch: 8   Global Step: 333540   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:28,150-Speed 2620.76 samples/sec   Loss 8.0467   LearningRate 0.0358   Epoch: 8   Global Step: 333550   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:32,057-Speed 2621.76 samples/sec   Loss 8.0287   LearningRate 0.0358   Epoch: 8   Global Step: 333560   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:35,965-Speed 2620.99 samples/sec   Loss 8.0512   LearningRate 0.0357   Epoch: 8   Global Step: 333570   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:39,858-Speed 2631.10 samples/sec   Loss 8.0338   LearningRate 0.0357   Epoch: 8   Global Step: 333580   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:43,756-Speed 2627.67 samples/sec   Loss 8.0226   LearningRate 0.0357   Epoch: 8   Global Step: 333590   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:47,663-Speed 2621.73 samples/sec   Loss 8.1382   LearningRate 0.0357   Epoch: 8   Global Step: 333600   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:51,558-Speed 2629.39 samples/sec   Loss 8.0635   LearningRate 0.0357   Epoch: 8   Global Step: 333610   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:55,452-Speed 2630.22 samples/sec   Loss 8.1867   LearningRate 0.0357   Epoch: 8   Global Step: 333620   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:12:59,344-Speed 2632.03 samples/sec   Loss 8.1224   LearningRate 0.0357   Epoch: 8   Global Step: 333630   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:13:03,241-Speed 2628.15 samples/sec   Loss 8.0921   LearningRate 0.0357   Epoch: 8   Global Step: 333640   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:13:07,146-Speed 2623.40 samples/sec   Loss 8.0493   LearningRate 0.0357   Epoch: 8   Global Step: 333650   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:13:11,029-Speed 2637.73 samples/sec   Loss 8.2181   LearningRate 0.0357   Epoch: 8   Global Step: 333660   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:13:14,928-Speed 2627.14 samples/sec   Loss 7.9532   LearningRate 0.0357   Epoch: 8   Global Step: 333670   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:13:18,868-Speed 2599.48 samples/sec   Loss 8.0531   LearningRate 0.0357   Epoch: 8   Global Step: 333680   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:13:22,749-Speed 2639.00 samples/sec   Loss 8.1026   LearningRate 0.0357   Epoch: 8   Global Step: 333690   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:26,635-Speed 2636.02 samples/sec   Loss 7.9632   LearningRate 0.0357   Epoch: 8   Global Step: 333700   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:30,526-Speed 2632.60 samples/sec   Loss 8.0493   LearningRate 0.0357   Epoch: 8   Global Step: 333710   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:34,421-Speed 2629.64 samples/sec   Loss 8.0637   LearningRate 0.0357   Epoch: 8   Global Step: 333720   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:38,313-Speed 2632.22 samples/sec   Loss 7.9183   LearningRate 0.0357   Epoch: 8   Global Step: 333730   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:42,205-Speed 2630.95 samples/sec   Loss 8.0091   LearningRate 0.0357   Epoch: 8   Global Step: 333740   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:46,108-Speed 2624.91 samples/sec   Loss 8.1248   LearningRate 0.0357   Epoch: 8   Global Step: 333750   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:50,019-Speed 2619.28 samples/sec   Loss 8.1167   LearningRate 0.0357   Epoch: 8   Global Step: 333760   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:53,911-Speed 2631.13 samples/sec   Loss 8.1648   LearningRate 0.0357   Epoch: 8   Global Step: 333770   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:13:57,804-Speed 2630.71 samples/sec   Loss 8.0331   LearningRate 0.0357   Epoch: 8   Global Step: 333780   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:14:01,696-Speed 2632.29 samples/sec   Loss 8.0924   LearningRate 0.0357   Epoch: 8   Global Step: 333790   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:05,595-Speed 2626.78 samples/sec   Loss 8.0477   LearningRate 0.0357   Epoch: 8   Global Step: 333800   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:09,490-Speed 2629.98 samples/sec   Loss 8.1474   LearningRate 0.0357   Epoch: 8   Global Step: 333810   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:13,381-Speed 2631.99 samples/sec   Loss 7.9747   LearningRate 0.0357   Epoch: 8   Global Step: 333820   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:17,274-Speed 2631.61 samples/sec   Loss 8.0949   LearningRate 0.0357   Epoch: 8   Global Step: 333830   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:21,171-Speed 2628.09 samples/sec   Loss 7.9709   LearningRate 0.0357   Epoch: 8   Global Step: 333840   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:25,072-Speed 2625.44 samples/sec   Loss 7.9283   LearningRate 0.0357   Epoch: 8   Global Step: 333850   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:28,964-Speed 2631.01 samples/sec   Loss 8.2173   LearningRate 0.0357   Epoch: 8   Global Step: 333860   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:32,866-Speed 2625.65 samples/sec   Loss 8.0289   LearningRate 0.0357   Epoch: 8   Global Step: 333870   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:36,757-Speed 2632.80 samples/sec   Loss 7.9851   LearningRate 0.0357   Epoch: 8   Global Step: 333880   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:14:40,651-Speed 2629.79 samples/sec   Loss 8.0797   LearningRate 0.0357   Epoch: 8   Global Step: 333890   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:14:44,543-Speed 2631.66 samples/sec   Loss 8.0128   LearningRate 0.0357   Epoch: 8   Global Step: 333900   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:14:48,470-Speed 2608.92 samples/sec   Loss 8.1130   LearningRate 0.0357   Epoch: 8   Global Step: 333910   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:14:52,371-Speed 2625.19 samples/sec   Loss 8.0733   LearningRate 0.0357   Epoch: 8   Global Step: 333920   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:14:56,240-Speed 2647.47 samples/sec   Loss 8.0124   LearningRate 0.0357   Epoch: 8   Global Step: 333930   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:00,128-Speed 2634.92 samples/sec   Loss 7.9999   LearningRate 0.0357   Epoch: 8   Global Step: 333940   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:04,027-Speed 2626.95 samples/sec   Loss 8.0684   LearningRate 0.0357   Epoch: 8   Global Step: 333950   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:07,919-Speed 2631.37 samples/sec   Loss 8.0414   LearningRate 0.0357   Epoch: 8   Global Step: 333960   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:11,812-Speed 2631.27 samples/sec   Loss 8.0710   LearningRate 0.0357   Epoch: 8   Global Step: 333970   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:15,711-Speed 2627.44 samples/sec   Loss 7.9338   LearningRate 0.0357   Epoch: 8   Global Step: 333980   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:19,611-Speed 2626.06 samples/sec   Loss 8.1180   LearningRate 0.0357   Epoch: 8   Global Step: 333990   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:23,509-Speed 2628.01 samples/sec   Loss 7.9387   LearningRate 0.0357   Epoch: 8   Global Step: 334000   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:27,411-Speed 2625.25 samples/sec   Loss 8.0927   LearningRate 0.0357   Epoch: 8   Global Step: 334010   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:31,320-Speed 2620.28 samples/sec   Loss 7.9982   LearningRate 0.0357   Epoch: 8   Global Step: 334020   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:15:35,217-Speed 2628.32 samples/sec   Loss 7.9400   LearningRate 0.0357   Epoch: 8   Global Step: 334030   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:15:39,123-Speed 2622.19 samples/sec   Loss 8.1059   LearningRate 0.0357   Epoch: 8   Global Step: 334040   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:15:43,062-Speed 2600.27 samples/sec   Loss 8.1633   LearningRate 0.0357   Epoch: 8   Global Step: 334050   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:15:46,962-Speed 2626.47 samples/sec   Loss 8.0581   LearningRate 0.0357   Epoch: 8   Global Step: 334060   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:15:50,856-Speed 2630.51 samples/sec   Loss 8.0625   LearningRate 0.0357   Epoch: 8   Global Step: 334070   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:15:54,750-Speed 2630.43 samples/sec   Loss 7.8791   LearningRate 0.0357   Epoch: 8   Global Step: 334080   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:15:58,641-Speed 2631.81 samples/sec   Loss 8.0558   LearningRate 0.0357   Epoch: 8   Global Step: 334090   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:16:02,550-Speed 2620.53 samples/sec   Loss 8.2167   LearningRate 0.0357   Epoch: 8   Global Step: 334100   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:16:06,453-Speed 2624.54 samples/sec   Loss 8.0438   LearningRate 0.0357   Epoch: 8   Global Step: 334110   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:16:10,347-Speed 2630.02 samples/sec   Loss 8.0723   LearningRate 0.0357   Epoch: 8   Global Step: 334120   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:16:14,276-Speed 2607.11 samples/sec   Loss 8.0993   LearningRate 0.0357   Epoch: 8   Global Step: 334130   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:16:18,165-Speed 2633.86 samples/sec   Loss 8.0316   LearningRate 0.0357   Epoch: 8   Global Step: 334140   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:16:22,043-Speed 2642.06 samples/sec   Loss 8.0657   LearningRate 0.0357   Epoch: 8   Global Step: 334150   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:16:25,935-Speed 2631.51 samples/sec   Loss 8.0733   LearningRate 0.0357   Epoch: 8   Global Step: 334160   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:16:29,815-Speed 2639.53 samples/sec   Loss 8.0651   LearningRate 0.0357   Epoch: 8   Global Step: 334170   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:16:33,717-Speed 2624.94 samples/sec   Loss 8.0315   LearningRate 0.0357   Epoch: 8   Global Step: 334180   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:16:37,573-Speed 2655.93 samples/sec   Loss 8.4830   LearningRate 0.0357   Epoch: 8   Global Step: 334190   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:16:41,474-Speed 2626.13 samples/sec   Loss 9.7285   LearningRate 0.0357   Epoch: 8   Global Step: 334200   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:16:45,368-Speed 2630.28 samples/sec   Loss 8.6545   LearningRate 0.0357   Epoch: 8   Global Step: 334210   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:16:49,261-Speed 2631.06 samples/sec   Loss 8.5442   LearningRate 0.0357   Epoch: 8   Global Step: 334220   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:16:53,158-Speed 2628.63 samples/sec   Loss 8.2880   LearningRate 0.0357   Epoch: 8   Global Step: 334230   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:16:57,054-Speed 2628.60 samples/sec   Loss 8.3946   LearningRate 0.0357   Epoch: 8   Global Step: 334240   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:17:00,954-Speed 2626.35 samples/sec   Loss 8.1937   LearningRate 0.0357   Epoch: 8   Global Step: 334250   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:17:04,854-Speed 2625.80 samples/sec   Loss 8.2796   LearningRate 0.0356   Epoch: 8   Global Step: 334260   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:17:08,747-Speed 2631.10 samples/sec   Loss 8.1742   LearningRate 0.0356   Epoch: 8   Global Step: 334270   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:17:12,654-Speed 2621.77 samples/sec   Loss 8.2107   LearningRate 0.0356   Epoch: 8   Global Step: 334280   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:17:16,548-Speed 2631.33 samples/sec   Loss 8.2308   LearningRate 0.0356   Epoch: 8   Global Step: 334290   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:20,445-Speed 2628.15 samples/sec   Loss 8.1824   LearningRate 0.0356   Epoch: 8   Global Step: 334300   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:24,337-Speed 2631.26 samples/sec   Loss 8.1221   LearningRate 0.0356   Epoch: 8   Global Step: 334310   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:28,230-Speed 2631.24 samples/sec   Loss 8.0363   LearningRate 0.0356   Epoch: 8   Global Step: 334320   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:32,124-Speed 2630.08 samples/sec   Loss 8.0958   LearningRate 0.0356   Epoch: 8   Global Step: 334330   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:36,030-Speed 2622.35 samples/sec   Loss 8.1907   LearningRate 0.0356   Epoch: 8   Global Step: 334340   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:39,935-Speed 2622.44 samples/sec   Loss 8.0704   LearningRate 0.0356   Epoch: 8   Global Step: 334350   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:43,833-Speed 2627.75 samples/sec   Loss 7.9724   LearningRate 0.0356   Epoch: 8   Global Step: 334360   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:47,726-Speed 2630.81 samples/sec   Loss 8.1434   LearningRate 0.0356   Epoch: 8   Global Step: 334370   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:51,618-Speed 2631.89 samples/sec   Loss 8.0575   LearningRate 0.0356   Epoch: 8   Global Step: 334380   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:17:55,510-Speed 2632.38 samples/sec   Loss 8.0326   LearningRate 0.0356   Epoch: 8   Global Step: 334390   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:17:59,416-Speed 2622.01 samples/sec   Loss 8.1342   LearningRate 0.0356   Epoch: 8   Global Step: 334400   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:03,339-Speed 2610.89 samples/sec   Loss 7.9441   LearningRate 0.0356   Epoch: 8   Global Step: 334410   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:07,231-Speed 2631.67 samples/sec   Loss 8.0842   LearningRate 0.0356   Epoch: 8   Global Step: 334420   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:11,129-Speed 2627.31 samples/sec   Loss 8.1988   LearningRate 0.0356   Epoch: 8   Global Step: 334430   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:15,017-Speed 2634.44 samples/sec   Loss 7.9733   LearningRate 0.0356   Epoch: 8   Global Step: 334440   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:18,909-Speed 2631.94 samples/sec   Loss 8.0313   LearningRate 0.0356   Epoch: 8   Global Step: 334450   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:22,801-Speed 2631.69 samples/sec   Loss 8.0987   LearningRate 0.0356   Epoch: 8   Global Step: 334460   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:26,691-Speed 2632.87 samples/sec   Loss 8.0949   LearningRate 0.0356   Epoch: 8   Global Step: 334470   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:30,583-Speed 2631.62 samples/sec   Loss 8.0157   LearningRate 0.0356   Epoch: 8   Global Step: 334480   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:18:34,472-Speed 2633.46 samples/sec   Loss 8.0237   LearningRate 0.0356   Epoch: 8   Global Step: 334490   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:18:38,363-Speed 2632.62 samples/sec   Loss 8.1195   LearningRate 0.0356   Epoch: 8   Global Step: 334500   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:18:42,252-Speed 2633.83 samples/sec   Loss 7.9493   LearningRate 0.0356   Epoch: 8   Global Step: 334510   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:18:46,164-Speed 2618.52 samples/sec   Loss 8.0683   LearningRate 0.0356   Epoch: 8   Global Step: 334520   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:18:50,063-Speed 2626.43 samples/sec   Loss 8.0767   LearningRate 0.0356   Epoch: 8   Global Step: 334530   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:18:53,964-Speed 2625.98 samples/sec   Loss 8.0652   LearningRate 0.0356   Epoch: 8   Global Step: 334540   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:18:57,861-Speed 2627.76 samples/sec   Loss 8.0097   LearningRate 0.0356   Epoch: 8   Global Step: 334550   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:19:01,760-Speed 2627.14 samples/sec   Loss 8.0974   LearningRate 0.0356   Epoch: 8   Global Step: 334560   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:19:05,658-Speed 2627.52 samples/sec   Loss 7.9060   LearningRate 0.0356   Epoch: 8   Global Step: 334570   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:19:09,553-Speed 2629.33 samples/sec   Loss 8.0722   LearningRate 0.0356   Epoch: 8   Global Step: 334580   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:19:13,452-Speed 2627.58 samples/sec   Loss 8.1737   LearningRate 0.0356   Epoch: 8   Global Step: 334590   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:17,352-Speed 2626.35 samples/sec   Loss 8.0482   LearningRate 0.0356   Epoch: 8   Global Step: 334600   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:21,258-Speed 2622.17 samples/sec   Loss 8.1149   LearningRate 0.0356   Epoch: 8   Global Step: 334610   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:25,156-Speed 2627.30 samples/sec   Loss 7.9826   LearningRate 0.0356   Epoch: 8   Global Step: 334620   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:29,068-Speed 2618.36 samples/sec   Loss 8.1030   LearningRate 0.0356   Epoch: 8   Global Step: 334630   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:32,964-Speed 2628.70 samples/sec   Loss 7.9665   LearningRate 0.0356   Epoch: 8   Global Step: 334640   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:36,871-Speed 2621.55 samples/sec   Loss 8.0085   LearningRate 0.0356   Epoch: 8   Global Step: 334650   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:40,776-Speed 2622.55 samples/sec   Loss 8.0740   LearningRate 0.0356   Epoch: 8   Global Step: 334660   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:44,676-Speed 2626.62 samples/sec   Loss 8.0358   LearningRate 0.0356   Epoch: 8   Global Step: 334670   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:48,570-Speed 2629.76 samples/sec   Loss 7.9615   LearningRate 0.0356   Epoch: 8   Global Step: 334680   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:19:52,465-Speed 2630.21 samples/sec   Loss 8.0334   LearningRate 0.0356   Epoch: 8   Global Step: 334690   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:19:56,368-Speed 2623.95 samples/sec   Loss 7.9116   LearningRate 0.0356   Epoch: 8   Global Step: 334700   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:20:00,287-Speed 2613.63 samples/sec   Loss 8.1159   LearningRate 0.0356   Epoch: 8   Global Step: 334710   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:20:04,216-Speed 2606.90 samples/sec   Loss 8.0513   LearningRate 0.0356   Epoch: 8   Global Step: 334720   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:08,110-Speed 2630.47 samples/sec   Loss 8.0598   LearningRate 0.0356   Epoch: 8   Global Step: 334730   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:12,007-Speed 2627.86 samples/sec   Loss 8.0012   LearningRate 0.0356   Epoch: 8   Global Step: 334740   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:15,906-Speed 2626.98 samples/sec   Loss 8.0792   LearningRate 0.0356   Epoch: 8   Global Step: 334750   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:19,814-Speed 2620.78 samples/sec   Loss 8.0090   LearningRate 0.0356   Epoch: 8   Global Step: 334760   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:23,715-Speed 2625.79 samples/sec   Loss 8.1064   LearningRate 0.0356   Epoch: 8   Global Step: 334770   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:27,612-Speed 2627.65 samples/sec   Loss 8.0194   LearningRate 0.0356   Epoch: 8   Global Step: 334780   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:31,514-Speed 2624.99 samples/sec   Loss 8.0149   LearningRate 0.0356   Epoch: 8   Global Step: 334790   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:35,418-Speed 2624.13 samples/sec   Loss 7.9058   LearningRate 0.0356   Epoch: 8   Global Step: 334800   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:39,313-Speed 2629.03 samples/sec   Loss 7.9774   LearningRate 0.0356   Epoch: 8   Global Step: 334810   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:43,219-Speed 2622.66 samples/sec   Loss 8.0421   LearningRate 0.0356   Epoch: 8   Global Step: 334820   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:20:47,096-Speed 2641.39 samples/sec   Loss 8.1303   LearningRate 0.0356   Epoch: 8   Global Step: 334830   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:50,996-Speed 2626.34 samples/sec   Loss 7.9717   LearningRate 0.0356   Epoch: 8   Global Step: 334840   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:54,894-Speed 2627.31 samples/sec   Loss 7.9770   LearningRate 0.0356   Epoch: 8   Global Step: 334850   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:20:58,791-Speed 2628.70 samples/sec   Loss 7.9962   LearningRate 0.0356   Epoch: 8   Global Step: 334860   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:02,695-Speed 2623.38 samples/sec   Loss 7.8968   LearningRate 0.0356   Epoch: 8   Global Step: 334870   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:06,592-Speed 2628.28 samples/sec   Loss 8.0249   LearningRate 0.0356   Epoch: 8   Global Step: 334880   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:10,488-Speed 2628.62 samples/sec   Loss 7.9010   LearningRate 0.0356   Epoch: 8   Global Step: 334890   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:14,384-Speed 2629.11 samples/sec   Loss 8.0430   LearningRate 0.0356   Epoch: 8   Global Step: 334900   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:18,283-Speed 2627.55 samples/sec   Loss 8.2498   LearningRate 0.0356   Epoch: 8   Global Step: 334910   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:22,181-Speed 2627.28 samples/sec   Loss 8.1058   LearningRate 0.0356   Epoch: 8   Global Step: 334920   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:26,074-Speed 2630.78 samples/sec   Loss 8.0953   LearningRate 0.0356   Epoch: 8   Global Step: 334930   Fp16 Grad Scale: 262144   Required: 56 hours
Training: 2022-04-14 09:21:29,957-Speed 2637.88 samples/sec   Loss 8.0272   LearningRate 0.0356   Epoch: 8   Global Step: 334940   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:33,863-Speed 2621.80 samples/sec   Loss 7.9921   LearningRate 0.0356   Epoch: 8   Global Step: 334950   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:21:37,709-Speed 2662.85 samples/sec   Loss 8.0177   LearningRate 0.0355   Epoch: 8   Global Step: 334960   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:21:41,582-Speed 2645.10 samples/sec   Loss 8.4671   LearningRate 0.0355   Epoch: 8   Global Step: 334970   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:21:45,472-Speed 2632.65 samples/sec   Loss 8.1287   LearningRate 0.0355   Epoch: 8   Global Step: 334980   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:21:49,372-Speed 2626.34 samples/sec   Loss 7.9874   LearningRate 0.0355   Epoch: 8   Global Step: 334990   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:21:53,259-Speed 2635.24 samples/sec   Loss 8.1363   LearningRate 0.0355   Epoch: 8   Global Step: 335000   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:21:57,147-Speed 2634.44 samples/sec   Loss 7.9930   LearningRate 0.0355   Epoch: 8   Global Step: 335010   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:22:01,037-Speed 2633.00 samples/sec   Loss 7.9800   LearningRate 0.0355   Epoch: 8   Global Step: 335020   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:22:04,929-Speed 2631.09 samples/sec   Loss 8.0743   LearningRate 0.0355   Epoch: 8   Global Step: 335030   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:22:08,820-Speed 2632.30 samples/sec   Loss 8.0762   LearningRate 0.0355   Epoch: 8   Global Step: 335040   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:22:12,707-Speed 2635.28 samples/sec   Loss 7.9823   LearningRate 0.0355   Epoch: 8   Global Step: 335050   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:22:16,596-Speed 2633.53 samples/sec   Loss 8.0050   LearningRate 0.0355   Epoch: 8   Global Step: 335060   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-14 09:22:20,494-Speed 2627.58 samples/sec   Loss 8.0219   LearningRate 0.0355   Epoch: 8   Global Step: 335070   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:24,382-Speed 2634.59 samples/sec   Loss 8.0500   LearningRate 0.0355   Epoch: 8   Global Step: 335080   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:28,275-Speed 2631.11 samples/sec   Loss 8.0313   LearningRate 0.0355   Epoch: 8   Global Step: 335090   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:32,171-Speed 2628.91 samples/sec   Loss 8.0486   LearningRate 0.0355   Epoch: 8   Global Step: 335100   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:36,064-Speed 2630.84 samples/sec   Loss 8.0530   LearningRate 0.0355   Epoch: 8   Global Step: 335110   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:39,953-Speed 2633.52 samples/sec   Loss 8.0839   LearningRate 0.0355   Epoch: 8   Global Step: 335120   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:43,933-Speed 2573.73 samples/sec   Loss 7.9958   LearningRate 0.0355   Epoch: 8   Global Step: 335130   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:47,839-Speed 2622.01 samples/sec   Loss 8.0360   LearningRate 0.0355   Epoch: 8   Global Step: 335140   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:51,737-Speed 2627.84 samples/sec   Loss 8.0032   LearningRate 0.0355   Epoch: 8   Global Step: 335150   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:55,637-Speed 2625.65 samples/sec   Loss 8.1026   LearningRate 0.0355   Epoch: 8   Global Step: 335160   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-14 09:22:59,528-Speed 2632.75 samples/sec   Loss 7.9349   LearningRate 0.0355   Epoch: 8   Global Step: 335170   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:03,424-Speed 2628.89 samples/sec   Loss 8.0349   LearningRate 0.0355   Epoch: 8   Global Step: 335180   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:07,317-Speed 2630.73 samples/sec   Loss 7.9283   LearningRate 0.0355   Epoch: 8   Global Step: 335190   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:11,209-Speed 2632.23 samples/sec   Loss 7.9055   LearningRate 0.0355   Epoch: 8   Global Step: 335200   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:15,118-Speed 2620.06 samples/sec   Loss 7.9836   LearningRate 0.0355   Epoch: 8   Global Step: 335210   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:19,018-Speed 2625.96 samples/sec   Loss 8.0254   LearningRate 0.0355   Epoch: 8   Global Step: 335220   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:22,907-Speed 2633.50 samples/sec   Loss 8.0392   LearningRate 0.0355   Epoch: 8   Global Step: 335230   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:26,801-Speed 2630.73 samples/sec   Loss 8.0136   LearningRate 0.0355   Epoch: 8   Global Step: 335240   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:30,701-Speed 2625.73 samples/sec   Loss 7.9715   LearningRate 0.0355   Epoch: 8   Global Step: 335250   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:34,606-Speed 2622.93 samples/sec   Loss 8.0281   LearningRate 0.0355   Epoch: 8   Global Step: 335260   Fp16 Grad Scale: 16384   Required: 56 hours
Training: 2022-04-14 09:23:38,505-Speed 2626.79 samples/sec   Loss 8.0632   LearningRate 0.0355   Epoch: 8   Global Step: 335270   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:23:42,395-Speed 2632.94 samples/sec   Loss 7.9342   LearningRate 0.0355   Epoch: 8   Global Step: 335280   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:23:46,287-Speed 2631.80 samples/sec   Loss 8.0293   LearningRate 0.0355   Epoch: 8   Global Step: 335290   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:23:50,180-Speed 2631.79 samples/sec   Loss 7.9624   LearningRate 0.0355   Epoch: 8   Global Step: 335300   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:23:54,075-Speed 2628.95 samples/sec   Loss 7.8603   LearningRate 0.0355   Epoch: 8   Global Step: 335310   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:23:57,969-Speed 2630.72 samples/sec   Loss 8.1627   LearningRate 0.0355   Epoch: 8   Global Step: 335320   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:24:01,863-Speed 2630.37 samples/sec   Loss 8.1809   LearningRate 0.0355   Epoch: 8   Global Step: 335330   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:24:05,757-Speed 2629.76 samples/sec   Loss 7.9393   LearningRate 0.0355   Epoch: 8   Global Step: 335340   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:24:09,651-Speed 2630.07 samples/sec   Loss 7.9522   LearningRate 0.0355   Epoch: 8   Global Step: 335350   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:24:13,547-Speed 2629.53 samples/sec   Loss 8.0189   LearningRate 0.0355   Epoch: 8   Global Step: 335360   Fp16 Grad Scale: 32768   Required: 56 hours
Training: 2022-04-14 09:24:17,447-Speed 2626.14 samples/sec   Loss 8.1489   LearningRate 0.0355   Epoch: 8   Global Step: 335370   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:21,355-Speed 2620.33 samples/sec   Loss 8.1272   LearningRate 0.0355   Epoch: 8   Global Step: 335380   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:25,265-Speed 2620.06 samples/sec   Loss 8.0075   LearningRate 0.0355   Epoch: 8   Global Step: 335390   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:29,157-Speed 2631.33 samples/sec   Loss 8.1175   LearningRate 0.0355   Epoch: 8   Global Step: 335400   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:33,060-Speed 2625.11 samples/sec   Loss 8.1167   LearningRate 0.0355   Epoch: 8   Global Step: 335410   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:36,963-Speed 2623.99 samples/sec   Loss 8.0120   LearningRate 0.0355   Epoch: 8   Global Step: 335420   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:40,865-Speed 2624.56 samples/sec   Loss 7.9457   LearningRate 0.0355   Epoch: 8   Global Step: 335430   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:44,764-Speed 2627.14 samples/sec   Loss 7.9917   LearningRate 0.0355   Epoch: 8   Global Step: 335440   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:48,663-Speed 2626.74 samples/sec   Loss 8.0372   LearningRate 0.0355   Epoch: 8   Global Step: 335450   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:52,572-Speed 2620.45 samples/sec   Loss 8.0429   LearningRate 0.0355   Epoch: 8   Global Step: 335460   Fp16 Grad Scale: 65536   Required: 56 hours
Training: 2022-04-14 09:24:56,479-Speed 2621.41 samples/sec   Loss 7.9779   LearningRate 0.0355   Epoch: 8   Global Step: 335470   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:25:00,383-Speed 2623.95 samples/sec   Loss 8.1447   LearningRate 0.0355   Epoch: 8   Global Step: 335480   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:25:04,278-Speed 2629.65 samples/sec   Loss 8.0687   LearningRate 0.0355   Epoch: 8   Global Step: 335490   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:25:08,172-Speed 2629.93 samples/sec   Loss 7.8880   LearningRate 0.0355   Epoch: 8   Global Step: 335500   Fp16 Grad Scale: 131072   Required: 56 hours
Training: 2022-04-14 09:25:12,065-Speed 2631.16 samples/sec   Loss 7.8851   LearningRate 0.0355   Epoch: 8   Global Step: 335510   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:15,955-Speed 2633.14 samples/sec   Loss 8.1351   LearningRate 0.0355   Epoch: 8   Global Step: 335520   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:19,851-Speed 2628.53 samples/sec   Loss 8.0394   LearningRate 0.0355   Epoch: 8   Global Step: 335530   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:23,745-Speed 2630.55 samples/sec   Loss 8.0980   LearningRate 0.0355   Epoch: 8   Global Step: 335540   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:27,635-Speed 2632.90 samples/sec   Loss 8.1482   LearningRate 0.0355   Epoch: 8   Global Step: 335550   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:31,531-Speed 2628.30 samples/sec   Loss 8.0788   LearningRate 0.0355   Epoch: 8   Global Step: 335560   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:35,422-Speed 2632.43 samples/sec   Loss 7.9495   LearningRate 0.0355   Epoch: 8   Global Step: 335570   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 09:25:39,315-Speed 2631.63 samples/sec   Loss 7.9988   LearningRate 0.0355   Epoch: 8   Global Step: 335580   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 09:25:43,194-Speed 2640.25 samples/sec   Loss 8.0488   LearningRate 0.0355   Epoch: 8   Global Step: 335590   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:47,088-Speed 2630.61 samples/sec   Loss 7.8176   LearningRate 0.0355   Epoch: 8   Global Step: 335600   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:50,981-Speed 2630.89 samples/sec   Loss 8.0471   LearningRate 0.0355   Epoch: 8   Global Step: 335610   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:54,874-Speed 2630.71 samples/sec   Loss 7.8607   LearningRate 0.0355   Epoch: 8   Global Step: 335620   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:25:58,770-Speed 2628.71 samples/sec   Loss 8.0093   LearningRate 0.0355   Epoch: 8   Global Step: 335630   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:26:02,664-Speed 2629.93 samples/sec   Loss 8.0792   LearningRate 0.0355   Epoch: 8   Global Step: 335640   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:26:06,559-Speed 2629.76 samples/sec   Loss 7.9881   LearningRate 0.0354   Epoch: 8   Global Step: 335650   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:26:10,453-Speed 2630.23 samples/sec   Loss 7.9375   LearningRate 0.0354   Epoch: 8   Global Step: 335660   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:14,361-Speed 2621.16 samples/sec   Loss 7.9280   LearningRate 0.0354   Epoch: 8   Global Step: 335670   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:18,272-Speed 2619.14 samples/sec   Loss 8.0308   LearningRate 0.0354   Epoch: 8   Global Step: 335680   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:22,181-Speed 2620.04 samples/sec   Loss 8.1375   LearningRate 0.0354   Epoch: 8   Global Step: 335690   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:26,085-Speed 2624.06 samples/sec   Loss 8.0259   LearningRate 0.0354   Epoch: 8   Global Step: 335700   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:29,992-Speed 2621.36 samples/sec   Loss 8.0576   LearningRate 0.0354   Epoch: 8   Global Step: 335710   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:33,885-Speed 2630.75 samples/sec   Loss 8.0322   LearningRate 0.0354   Epoch: 8   Global Step: 335720   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:37,793-Speed 2620.82 samples/sec   Loss 7.9785   LearningRate 0.0354   Epoch: 8   Global Step: 335730   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:41,683-Speed 2632.70 samples/sec   Loss 8.0153   LearningRate 0.0354   Epoch: 8   Global Step: 335740   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:45,578-Speed 2629.87 samples/sec   Loss 8.0276   LearningRate 0.0354   Epoch: 8   Global Step: 335750   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:26:49,482-Speed 2623.17 samples/sec   Loss 7.9987   LearningRate 0.0354   Epoch: 8   Global Step: 335760   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:26:53,376-Speed 2630.42 samples/sec   Loss 8.0419   LearningRate 0.0354   Epoch: 8   Global Step: 335770   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:26:57,272-Speed 2629.24 samples/sec   Loss 7.9567   LearningRate 0.0354   Epoch: 8   Global Step: 335780   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:01,189-Speed 2614.68 samples/sec   Loss 8.0865   LearningRate 0.0354   Epoch: 8   Global Step: 335790   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:05,084-Speed 2629.60 samples/sec   Loss 8.0006   LearningRate 0.0354   Epoch: 8   Global Step: 335800   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:08,975-Speed 2632.41 samples/sec   Loss 8.0160   LearningRate 0.0354   Epoch: 8   Global Step: 335810   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:12,881-Speed 2622.25 samples/sec   Loss 8.0751   LearningRate 0.0354   Epoch: 8   Global Step: 335820   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:16,778-Speed 2628.19 samples/sec   Loss 8.0595   LearningRate 0.0354   Epoch: 8   Global Step: 335830   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:20,671-Speed 2631.09 samples/sec   Loss 7.9819   LearningRate 0.0354   Epoch: 8   Global Step: 335840   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:24,564-Speed 2630.61 samples/sec   Loss 8.0534   LearningRate 0.0354   Epoch: 8   Global Step: 335850   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:28,440-Speed 2642.54 samples/sec   Loss 7.9034   LearningRate 0.0354   Epoch: 8   Global Step: 335860   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:32,335-Speed 2629.79 samples/sec   Loss 7.8690   LearningRate 0.0354   Epoch: 8   Global Step: 335870   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:36,229-Speed 2630.29 samples/sec   Loss 8.1321   LearningRate 0.0354   Epoch: 8   Global Step: 335880   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:40,120-Speed 2632.57 samples/sec   Loss 8.0090   LearningRate 0.0354   Epoch: 8   Global Step: 335890   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:44,019-Speed 2627.25 samples/sec   Loss 7.9536   LearningRate 0.0354   Epoch: 8   Global Step: 335900   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:47,914-Speed 2629.29 samples/sec   Loss 8.0416   LearningRate 0.0354   Epoch: 8   Global Step: 335910   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:51,818-Speed 2623.45 samples/sec   Loss 8.0155   LearningRate 0.0354   Epoch: 8   Global Step: 335920   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:55,712-Speed 2630.57 samples/sec   Loss 7.9525   LearningRate 0.0354   Epoch: 8   Global Step: 335930   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:27:59,605-Speed 2631.03 samples/sec   Loss 7.9480   LearningRate 0.0354   Epoch: 8   Global Step: 335940   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:03,504-Speed 2626.54 samples/sec   Loss 7.9338   LearningRate 0.0354   Epoch: 8   Global Step: 335950   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:07,387-Speed 2637.53 samples/sec   Loss 7.9257   LearningRate 0.0354   Epoch: 8   Global Step: 335960   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:11,277-Speed 2634.08 samples/sec   Loss 7.9190   LearningRate 0.0354   Epoch: 8   Global Step: 335970   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:15,178-Speed 2625.72 samples/sec   Loss 8.0475   LearningRate 0.0354   Epoch: 8   Global Step: 335980   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:19,072-Speed 2630.10 samples/sec   Loss 7.9882   LearningRate 0.0354   Epoch: 8   Global Step: 335990   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:22,967-Speed 2629.60 samples/sec   Loss 8.0164   LearningRate 0.0354   Epoch: 8   Global Step: 336000   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:26,863-Speed 2629.28 samples/sec   Loss 7.9467   LearningRate 0.0354   Epoch: 8   Global Step: 336010   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:30,756-Speed 2630.50 samples/sec   Loss 7.9321   LearningRate 0.0354   Epoch: 8   Global Step: 336020   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:34,651-Speed 2629.67 samples/sec   Loss 7.9072   LearningRate 0.0354   Epoch: 8   Global Step: 336030   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:38,545-Speed 2630.35 samples/sec   Loss 8.0132   LearningRate 0.0354   Epoch: 8   Global Step: 336040   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:42,500-Speed 2589.52 samples/sec   Loss 7.9319   LearningRate 0.0354   Epoch: 8   Global Step: 336050   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:46,600-Speed 2498.19 samples/sec   Loss 7.8498   LearningRate 0.0354   Epoch: 8   Global Step: 336060   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 09:28:50,670-Speed 2516.38 samples/sec   Loss 8.0606   LearningRate 0.0354   Epoch: 8   Global Step: 336070   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 09:28:54,554-Speed 2637.18 samples/sec   Loss 8.1456   LearningRate 0.0354   Epoch: 8   Global Step: 336080   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:28:58,449-Speed 2629.91 samples/sec   Loss 8.1207   LearningRate 0.0354   Epoch: 8   Global Step: 336090   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:02,350-Speed 2625.34 samples/sec   Loss 8.0580   LearningRate 0.0354   Epoch: 8   Global Step: 336100   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:06,245-Speed 2629.45 samples/sec   Loss 7.9348   LearningRate 0.0354   Epoch: 8   Global Step: 336110   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:10,154-Speed 2620.13 samples/sec   Loss 7.9213   LearningRate 0.0354   Epoch: 8   Global Step: 336120   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:14,047-Speed 2631.45 samples/sec   Loss 7.9146   LearningRate 0.0354   Epoch: 8   Global Step: 336130   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:17,950-Speed 2623.86 samples/sec   Loss 8.0964   LearningRate 0.0354   Epoch: 8   Global Step: 336140   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:21,849-Speed 2626.68 samples/sec   Loss 8.0148   LearningRate 0.0354   Epoch: 8   Global Step: 336150   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:25,751-Speed 2625.06 samples/sec   Loss 8.0580   LearningRate 0.0354   Epoch: 8   Global Step: 336160   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:29,650-Speed 2627.11 samples/sec   Loss 8.0233   LearningRate 0.0354   Epoch: 8   Global Step: 336170   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:33,552-Speed 2625.32 samples/sec   Loss 7.9100   LearningRate 0.0354   Epoch: 8   Global Step: 336180   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 09:29:37,427-Speed 2642.90 samples/sec   Loss 8.0841   LearningRate 0.0354   Epoch: 8   Global Step: 336190   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:41,321-Speed 2630.47 samples/sec   Loss 7.8968   LearningRate 0.0354   Epoch: 8   Global Step: 336200   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:45,216-Speed 2629.20 samples/sec   Loss 7.9629   LearningRate 0.0354   Epoch: 8   Global Step: 336210   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:49,109-Speed 2630.84 samples/sec   Loss 7.9466   LearningRate 0.0354   Epoch: 8   Global Step: 336220   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:53,001-Speed 2632.02 samples/sec   Loss 7.9270   LearningRate 0.0354   Epoch: 8   Global Step: 336230   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:29:56,936-Speed 2602.80 samples/sec   Loss 7.9442   LearningRate 0.0354   Epoch: 8   Global Step: 336240   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:30:00,850-Speed 2616.30 samples/sec   Loss 7.8910   LearningRate 0.0354   Epoch: 8   Global Step: 336250   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:30:04,762-Speed 2618.24 samples/sec   Loss 8.1024   LearningRate 0.0354   Epoch: 8   Global Step: 336260   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:30:08,652-Speed 2632.90 samples/sec   Loss 8.0423   LearningRate 0.0354   Epoch: 8   Global Step: 336270   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:30:12,545-Speed 2631.14 samples/sec   Loss 8.0058   LearningRate 0.0354   Epoch: 8   Global Step: 336280   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:30:16,422-Speed 2642.34 samples/sec   Loss 7.9741   LearningRate 0.0354   Epoch: 8   Global Step: 336290   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:30:20,317-Speed 2629.38 samples/sec   Loss 7.8021   LearningRate 0.0354   Epoch: 8   Global Step: 336300   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:30:24,197-Speed 2639.46 samples/sec   Loss 7.9267   LearningRate 0.0354   Epoch: 8   Global Step: 336310   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:28,090-Speed 2631.20 samples/sec   Loss 7.9353   LearningRate 0.0354   Epoch: 8   Global Step: 336320   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:31,984-Speed 2630.57 samples/sec   Loss 7.9928   LearningRate 0.0354   Epoch: 8   Global Step: 336330   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:35,875-Speed 2631.83 samples/sec   Loss 7.9033   LearningRate 0.0354   Epoch: 8   Global Step: 336340   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:39,768-Speed 2631.28 samples/sec   Loss 7.9172   LearningRate 0.0353   Epoch: 8   Global Step: 336350   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:43,664-Speed 2628.62 samples/sec   Loss 7.9095   LearningRate 0.0353   Epoch: 8   Global Step: 336360   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:47,563-Speed 2626.95 samples/sec   Loss 7.9392   LearningRate 0.0353   Epoch: 8   Global Step: 336370   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:51,460-Speed 2628.81 samples/sec   Loss 8.0450   LearningRate 0.0353   Epoch: 8   Global Step: 336380   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:55,353-Speed 2631.20 samples/sec   Loss 7.9724   LearningRate 0.0353   Epoch: 8   Global Step: 336390   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:30:59,245-Speed 2631.17 samples/sec   Loss 7.9859   LearningRate 0.0353   Epoch: 8   Global Step: 336400   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:31:03,155-Speed 2619.53 samples/sec   Loss 8.1257   LearningRate 0.0353   Epoch: 8   Global Step: 336410   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:07,060-Speed 2623.18 samples/sec   Loss 7.8633   LearningRate 0.0353   Epoch: 8   Global Step: 336420   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:10,975-Speed 2615.74 samples/sec   Loss 7.8669   LearningRate 0.0353   Epoch: 8   Global Step: 336430   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:14,897-Speed 2611.83 samples/sec   Loss 8.1444   LearningRate 0.0353   Epoch: 8   Global Step: 336440   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:18,815-Speed 2613.71 samples/sec   Loss 7.9417   LearningRate 0.0353   Epoch: 8   Global Step: 336450   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:22,729-Speed 2617.20 samples/sec   Loss 8.0654   LearningRate 0.0353   Epoch: 8   Global Step: 336460   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:26,629-Speed 2625.71 samples/sec   Loss 8.0625   LearningRate 0.0353   Epoch: 8   Global Step: 336470   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:30,520-Speed 2632.55 samples/sec   Loss 8.0186   LearningRate 0.0353   Epoch: 8   Global Step: 336480   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:34,425-Speed 2623.11 samples/sec   Loss 7.9785   LearningRate 0.0353   Epoch: 8   Global Step: 336490   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:31:38,260-Speed 2670.78 samples/sec   Loss 8.2996   LearningRate 0.0353   Epoch: 8   Global Step: 336500   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:31:42,115-Speed 2656.61 samples/sec   Loss 9.6476   LearningRate 0.0353   Epoch: 8   Global Step: 336510   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:31:46,002-Speed 2635.04 samples/sec   Loss 9.2868   LearningRate 0.0353   Epoch: 8   Global Step: 336520   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:31:49,892-Speed 2633.04 samples/sec   Loss 8.5001   LearningRate 0.0353   Epoch: 8   Global Step: 336530   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:31:53,782-Speed 2633.19 samples/sec   Loss 8.3730   LearningRate 0.0353   Epoch: 8   Global Step: 336540   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:31:57,670-Speed 2634.40 samples/sec   Loss 8.1863   LearningRate 0.0353   Epoch: 8   Global Step: 336550   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:32:01,559-Speed 2633.16 samples/sec   Loss 8.1522   LearningRate 0.0353   Epoch: 8   Global Step: 336560   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:32:05,450-Speed 2632.53 samples/sec   Loss 8.1246   LearningRate 0.0353   Epoch: 8   Global Step: 336570   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:32:09,337-Speed 2635.19 samples/sec   Loss 8.1362   LearningRate 0.0353   Epoch: 8   Global Step: 336580   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:32:13,237-Speed 2626.68 samples/sec   Loss 8.1412   LearningRate 0.0353   Epoch: 8   Global Step: 336590   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:32:17,132-Speed 2629.53 samples/sec   Loss 7.9520   LearningRate 0.0353   Epoch: 8   Global Step: 336600   Fp16 Grad Scale: 1024   Required: 55 hours
Training: 2022-04-14 09:32:21,031-Speed 2626.73 samples/sec   Loss 8.3721   LearningRate 0.0353   Epoch: 8   Global Step: 336610   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:24,924-Speed 2631.21 samples/sec   Loss 8.1390   LearningRate 0.0353   Epoch: 8   Global Step: 336620   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:28,816-Speed 2631.03 samples/sec   Loss 8.0495   LearningRate 0.0353   Epoch: 8   Global Step: 336630   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:32,705-Speed 2633.89 samples/sec   Loss 7.9751   LearningRate 0.0353   Epoch: 8   Global Step: 336640   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:36,600-Speed 2629.24 samples/sec   Loss 7.9379   LearningRate 0.0353   Epoch: 8   Global Step: 336650   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:40,491-Speed 2632.54 samples/sec   Loss 7.9204   LearningRate 0.0353   Epoch: 8   Global Step: 336660   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:44,382-Speed 2632.52 samples/sec   Loss 8.0854   LearningRate 0.0353   Epoch: 8   Global Step: 336670   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:48,275-Speed 2630.34 samples/sec   Loss 8.0291   LearningRate 0.0353   Epoch: 8   Global Step: 336680   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:52,181-Speed 2623.10 samples/sec   Loss 8.0704   LearningRate 0.0353   Epoch: 8   Global Step: 336690   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:56,071-Speed 2632.48 samples/sec   Loss 8.1585   LearningRate 0.0353   Epoch: 8   Global Step: 336700   Fp16 Grad Scale: 2048   Required: 55 hours
Training: 2022-04-14 09:32:59,971-Speed 2626.41 samples/sec   Loss 8.0747   LearningRate 0.0353   Epoch: 8   Global Step: 336710   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:03,874-Speed 2623.92 samples/sec   Loss 8.0608   LearningRate 0.0353   Epoch: 8   Global Step: 336720   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:07,771-Speed 2628.54 samples/sec   Loss 7.9898   LearningRate 0.0353   Epoch: 8   Global Step: 336730   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:11,675-Speed 2623.12 samples/sec   Loss 8.0042   LearningRate 0.0353   Epoch: 8   Global Step: 336740   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:15,573-Speed 2627.82 samples/sec   Loss 7.8826   LearningRate 0.0353   Epoch: 8   Global Step: 336750   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:19,478-Speed 2622.59 samples/sec   Loss 7.9699   LearningRate 0.0353   Epoch: 8   Global Step: 336760   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:23,372-Speed 2630.35 samples/sec   Loss 7.8981   LearningRate 0.0353   Epoch: 8   Global Step: 336770   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:27,259-Speed 2635.08 samples/sec   Loss 8.0476   LearningRate 0.0353   Epoch: 8   Global Step: 336780   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:31,152-Speed 2631.26 samples/sec   Loss 7.9586   LearningRate 0.0353   Epoch: 8   Global Step: 336790   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:35,041-Speed 2633.88 samples/sec   Loss 7.9436   LearningRate 0.0353   Epoch: 8   Global Step: 336800   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:33:38,929-Speed 2634.03 samples/sec   Loss 8.0179   LearningRate 0.0353   Epoch: 8   Global Step: 336810   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:33:42,819-Speed 2633.36 samples/sec   Loss 8.0359   LearningRate 0.0353   Epoch: 8   Global Step: 336820   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:33:46,708-Speed 2633.04 samples/sec   Loss 7.9777   LearningRate 0.0353   Epoch: 8   Global Step: 336830   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:33:50,600-Speed 2631.94 samples/sec   Loss 7.8492   LearningRate 0.0353   Epoch: 8   Global Step: 336840   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:33:54,491-Speed 2632.16 samples/sec   Loss 7.9421   LearningRate 0.0353   Epoch: 8   Global Step: 336850   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:33:58,392-Speed 2625.69 samples/sec   Loss 7.9635   LearningRate 0.0353   Epoch: 8   Global Step: 336860   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:34:02,286-Speed 2629.83 samples/sec   Loss 8.1616   LearningRate 0.0353   Epoch: 8   Global Step: 336870   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:34:06,181-Speed 2629.58 samples/sec   Loss 8.1640   LearningRate 0.0353   Epoch: 8   Global Step: 336880   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:34:10,071-Speed 2633.55 samples/sec   Loss 8.0713   LearningRate 0.0353   Epoch: 8   Global Step: 336890   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:34:13,965-Speed 2630.03 samples/sec   Loss 8.1333   LearningRate 0.0353   Epoch: 8   Global Step: 336900   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:34:17,860-Speed 2629.84 samples/sec   Loss 7.9144   LearningRate 0.0353   Epoch: 8   Global Step: 336910   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:21,760-Speed 2626.11 samples/sec   Loss 7.9661   LearningRate 0.0353   Epoch: 8   Global Step: 336920   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:25,669-Speed 2619.95 samples/sec   Loss 7.9862   LearningRate 0.0353   Epoch: 8   Global Step: 336930   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:29,574-Speed 2623.03 samples/sec   Loss 7.9816   LearningRate 0.0353   Epoch: 8   Global Step: 336940   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:33,481-Speed 2621.54 samples/sec   Loss 7.9184   LearningRate 0.0353   Epoch: 8   Global Step: 336950   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:37,376-Speed 2629.56 samples/sec   Loss 8.0473   LearningRate 0.0353   Epoch: 8   Global Step: 336960   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:41,272-Speed 2628.86 samples/sec   Loss 8.0612   LearningRate 0.0353   Epoch: 8   Global Step: 336970   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:45,175-Speed 2624.51 samples/sec   Loss 7.9732   LearningRate 0.0353   Epoch: 8   Global Step: 336980   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:49,247-Speed 2515.49 samples/sec   Loss 7.9811   LearningRate 0.0353   Epoch: 8   Global Step: 336990   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:53,171-Speed 2610.63 samples/sec   Loss 7.9859   LearningRate 0.0353   Epoch: 8   Global Step: 337000   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:34:57,089-Speed 2614.69 samples/sec   Loss 7.9548   LearningRate 0.0353   Epoch: 8   Global Step: 337010   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:00,994-Speed 2622.57 samples/sec   Loss 7.9606   LearningRate 0.0353   Epoch: 8   Global Step: 337020   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:04,894-Speed 2626.11 samples/sec   Loss 7.9180   LearningRate 0.0353   Epoch: 8   Global Step: 337030   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:08,804-Speed 2619.84 samples/sec   Loss 8.0358   LearningRate 0.0353   Epoch: 8   Global Step: 337040   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:12,707-Speed 2624.56 samples/sec   Loss 7.9054   LearningRate 0.0352   Epoch: 8   Global Step: 337050   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:16,670-Speed 2584.27 samples/sec   Loss 8.0399   LearningRate 0.0352   Epoch: 8   Global Step: 337060   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:20,576-Speed 2622.38 samples/sec   Loss 7.9303   LearningRate 0.0352   Epoch: 8   Global Step: 337070   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:24,472-Speed 2628.45 samples/sec   Loss 7.9484   LearningRate 0.0352   Epoch: 8   Global Step: 337080   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:28,369-Speed 2628.61 samples/sec   Loss 8.0752   LearningRate 0.0352   Epoch: 8   Global Step: 337090   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:32,269-Speed 2626.32 samples/sec   Loss 8.0516   LearningRate 0.0352   Epoch: 8   Global Step: 337100   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:35:36,206-Speed 2601.82 samples/sec   Loss 8.0061   LearningRate 0.0352   Epoch: 8   Global Step: 337110   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:35:40,129-Speed 2610.77 samples/sec   Loss 7.9998   LearningRate 0.0352   Epoch: 8   Global Step: 337120   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:35:44,028-Speed 2627.06 samples/sec   Loss 8.0091   LearningRate 0.0352   Epoch: 8   Global Step: 337130   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:35:47,931-Speed 2624.47 samples/sec   Loss 7.9982   LearningRate 0.0352   Epoch: 8   Global Step: 337140   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:35:51,824-Speed 2630.87 samples/sec   Loss 8.0101   LearningRate 0.0352   Epoch: 8   Global Step: 337150   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:35:55,727-Speed 2624.55 samples/sec   Loss 8.0102   LearningRate 0.0352   Epoch: 8   Global Step: 337160   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:35:59,619-Speed 2631.80 samples/sec   Loss 7.9769   LearningRate 0.0352   Epoch: 8   Global Step: 337170   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:03,516-Speed 2628.62 samples/sec   Loss 7.9233   LearningRate 0.0352   Epoch: 8   Global Step: 337180   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:07,407-Speed 2632.43 samples/sec   Loss 7.9450   LearningRate 0.0352   Epoch: 8   Global Step: 337190   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:11,303-Speed 2628.55 samples/sec   Loss 7.9599   LearningRate 0.0352   Epoch: 8   Global Step: 337200   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:15,197-Speed 2630.54 samples/sec   Loss 7.9538   LearningRate 0.0352   Epoch: 8   Global Step: 337210   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:36:19,094-Speed 2628.41 samples/sec   Loss 7.9975   LearningRate 0.0352   Epoch: 8   Global Step: 337220   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:36:22,988-Speed 2630.41 samples/sec   Loss 7.9725   LearningRate 0.0352   Epoch: 8   Global Step: 337230   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:36:26,882-Speed 2630.23 samples/sec   Loss 8.0153   LearningRate 0.0352   Epoch: 8   Global Step: 337240   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:36:30,775-Speed 2630.68 samples/sec   Loss 7.9725   LearningRate 0.0352   Epoch: 8   Global Step: 337250   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:36:34,668-Speed 2631.81 samples/sec   Loss 7.9564   LearningRate 0.0352   Epoch: 8   Global Step: 337260   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:36:38,557-Speed 2633.57 samples/sec   Loss 7.8330   LearningRate 0.0352   Epoch: 8   Global Step: 337270   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:42,521-Speed 2583.49 samples/sec   Loss 8.0324   LearningRate 0.0352   Epoch: 8   Global Step: 337280   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:46,414-Speed 2631.13 samples/sec   Loss 7.9748   LearningRate 0.0352   Epoch: 8   Global Step: 337290   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:50,319-Speed 2622.51 samples/sec   Loss 7.9492   LearningRate 0.0352   Epoch: 8   Global Step: 337300   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:54,214-Speed 2630.02 samples/sec   Loss 7.9788   LearningRate 0.0352   Epoch: 8   Global Step: 337310   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:36:58,104-Speed 2633.01 samples/sec   Loss 7.9299   LearningRate 0.0352   Epoch: 8   Global Step: 337320   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:37:02,071-Speed 2581.54 samples/sec   Loss 8.0857   LearningRate 0.0352   Epoch: 8   Global Step: 337330   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:37:05,964-Speed 2631.18 samples/sec   Loss 7.9149   LearningRate 0.0352   Epoch: 8   Global Step: 337340   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:37:09,878-Speed 2617.52 samples/sec   Loss 7.8947   LearningRate 0.0352   Epoch: 8   Global Step: 337350   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:37:13,777-Speed 2626.84 samples/sec   Loss 7.9759   LearningRate 0.0352   Epoch: 8   Global Step: 337360   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:37:17,669-Speed 2631.60 samples/sec   Loss 7.8689   LearningRate 0.0352   Epoch: 8   Global Step: 337370   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:37:21,563-Speed 2630.34 samples/sec   Loss 7.9855   LearningRate 0.0352   Epoch: 8   Global Step: 337380   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:37:25,458-Speed 2629.81 samples/sec   Loss 7.9851   LearningRate 0.0352   Epoch: 8   Global Step: 337390   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:37:29,338-Speed 2639.86 samples/sec   Loss 7.9962   LearningRate 0.0352   Epoch: 8   Global Step: 337400   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:37:33,217-Speed 2640.87 samples/sec   Loss 7.8919   LearningRate 0.0352   Epoch: 8   Global Step: 337410   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:37:37,120-Speed 2623.84 samples/sec   Loss 8.0419   LearningRate 0.0352   Epoch: 8   Global Step: 337420   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:37:41,031-Speed 2619.18 samples/sec   Loss 8.0328   LearningRate 0.0352   Epoch: 8   Global Step: 337430   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:37:44,925-Speed 2630.47 samples/sec   Loss 8.0438   LearningRate 0.0352   Epoch: 8   Global Step: 337440   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:37:48,822-Speed 2628.54 samples/sec   Loss 8.0731   LearningRate 0.0352   Epoch: 8   Global Step: 337450   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:37:52,717-Speed 2629.78 samples/sec   Loss 8.0769   LearningRate 0.0352   Epoch: 8   Global Step: 337460   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:37:56,617-Speed 2626.03 samples/sec   Loss 8.0494   LearningRate 0.0352   Epoch: 8   Global Step: 337470   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:00,510-Speed 2630.96 samples/sec   Loss 8.0452   LearningRate 0.0352   Epoch: 8   Global Step: 337480   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:04,418-Speed 2621.01 samples/sec   Loss 8.0485   LearningRate 0.0352   Epoch: 8   Global Step: 337490   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:08,324-Speed 2622.42 samples/sec   Loss 8.1787   LearningRate 0.0352   Epoch: 8   Global Step: 337500   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:12,220-Speed 2629.33 samples/sec   Loss 7.9836   LearningRate 0.0352   Epoch: 8   Global Step: 337510   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:16,124-Speed 2623.57 samples/sec   Loss 8.0842   LearningRate 0.0352   Epoch: 8   Global Step: 337520   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:20,014-Speed 2635.09 samples/sec   Loss 7.9624   LearningRate 0.0352   Epoch: 8   Global Step: 337530   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:23,905-Speed 2632.08 samples/sec   Loss 7.9183   LearningRate 0.0352   Epoch: 8   Global Step: 337540   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:27,811-Speed 2622.36 samples/sec   Loss 8.0051   LearningRate 0.0352   Epoch: 8   Global Step: 337550   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:31,710-Speed 2627.09 samples/sec   Loss 8.0248   LearningRate 0.0352   Epoch: 8   Global Step: 337560   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:35,609-Speed 2626.94 samples/sec   Loss 8.0141   LearningRate 0.0352   Epoch: 8   Global Step: 337570   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:39,500-Speed 2632.43 samples/sec   Loss 7.8904   LearningRate 0.0352   Epoch: 8   Global Step: 337580   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:38:43,375-Speed 2643.38 samples/sec   Loss 8.0529   LearningRate 0.0352   Epoch: 8   Global Step: 337590   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:47,277-Speed 2624.38 samples/sec   Loss 7.8664   LearningRate 0.0352   Epoch: 8   Global Step: 337600   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:51,178-Speed 2625.88 samples/sec   Loss 8.0271   LearningRate 0.0352   Epoch: 8   Global Step: 337610   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:55,072-Speed 2630.43 samples/sec   Loss 7.9790   LearningRate 0.0352   Epoch: 8   Global Step: 337620   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:38:58,966-Speed 2630.63 samples/sec   Loss 7.9603   LearningRate 0.0352   Epoch: 8   Global Step: 337630   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:39:02,870-Speed 2623.99 samples/sec   Loss 7.8262   LearningRate 0.0352   Epoch: 8   Global Step: 337640   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:39:06,766-Speed 2628.67 samples/sec   Loss 7.8898   LearningRate 0.0352   Epoch: 8   Global Step: 337650   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:39:10,670-Speed 2623.34 samples/sec   Loss 8.0611   LearningRate 0.0352   Epoch: 8   Global Step: 337660   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:39:15,100-Speed 2312.31 samples/sec   Loss 7.9946   LearningRate 0.0352   Epoch: 8   Global Step: 337670   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:39:18,997-Speed 2628.90 samples/sec   Loss 7.9938   LearningRate 0.0352   Epoch: 8   Global Step: 337680   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:39:22,887-Speed 2632.87 samples/sec   Loss 7.9372   LearningRate 0.0352   Epoch: 8   Global Step: 337690   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:26,780-Speed 2631.11 samples/sec   Loss 7.9112   LearningRate 0.0352   Epoch: 8   Global Step: 337700   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:30,682-Speed 2624.64 samples/sec   Loss 7.9351   LearningRate 0.0352   Epoch: 8   Global Step: 337710   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:34,579-Speed 2628.11 samples/sec   Loss 7.9374   LearningRate 0.0352   Epoch: 8   Global Step: 337720   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:38,473-Speed 2630.37 samples/sec   Loss 8.1149   LearningRate 0.0352   Epoch: 8   Global Step: 337730   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:42,360-Speed 2635.31 samples/sec   Loss 7.8833   LearningRate 0.0352   Epoch: 8   Global Step: 337740   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:46,254-Speed 2630.23 samples/sec   Loss 7.8826   LearningRate 0.0351   Epoch: 8   Global Step: 337750   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:50,151-Speed 2628.17 samples/sec   Loss 8.0132   LearningRate 0.0351   Epoch: 8   Global Step: 337760   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:54,050-Speed 2627.10 samples/sec   Loss 8.0316   LearningRate 0.0351   Epoch: 8   Global Step: 337770   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:39:57,940-Speed 2633.14 samples/sec   Loss 8.0454   LearningRate 0.0351   Epoch: 8   Global Step: 337780   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:01,837-Speed 2628.18 samples/sec   Loss 7.9559   LearningRate 0.0351   Epoch: 8   Global Step: 337790   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:40:05,833-Speed 2563.37 samples/sec   Loss 7.9335   LearningRate 0.0351   Epoch: 8   Global Step: 337800   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:40:09,737-Speed 2623.39 samples/sec   Loss 7.7894   LearningRate 0.0351   Epoch: 8   Global Step: 337810   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:40:13,631-Speed 2630.65 samples/sec   Loss 7.9599   LearningRate 0.0351   Epoch: 8   Global Step: 337820   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:40:17,531-Speed 2626.56 samples/sec   Loss 7.9711   LearningRate 0.0351   Epoch: 8   Global Step: 337830   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:40:21,405-Speed 2643.19 samples/sec   Loss 8.0088   LearningRate 0.0351   Epoch: 8   Global Step: 337840   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:25,302-Speed 2628.99 samples/sec   Loss 7.8439   LearningRate 0.0351   Epoch: 8   Global Step: 337850   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:29,190-Speed 2633.78 samples/sec   Loss 8.0350   LearningRate 0.0351   Epoch: 8   Global Step: 337860   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:33,086-Speed 2629.23 samples/sec   Loss 7.9946   LearningRate 0.0351   Epoch: 8   Global Step: 337870   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:36,986-Speed 2626.46 samples/sec   Loss 7.9543   LearningRate 0.0351   Epoch: 8   Global Step: 337880   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:40,891-Speed 2623.05 samples/sec   Loss 8.0857   LearningRate 0.0351   Epoch: 8   Global Step: 337890   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:44,801-Speed 2618.97 samples/sec   Loss 7.8618   LearningRate 0.0351   Epoch: 8   Global Step: 337900   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:48,697-Speed 2629.21 samples/sec   Loss 7.9443   LearningRate 0.0351   Epoch: 8   Global Step: 337910   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:52,603-Speed 2622.35 samples/sec   Loss 8.0386   LearningRate 0.0351   Epoch: 8   Global Step: 337920   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:40:56,491-Speed 2634.57 samples/sec   Loss 7.8367   LearningRate 0.0351   Epoch: 8   Global Step: 337930   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:41:00,381-Speed 2632.86 samples/sec   Loss 8.0727   LearningRate 0.0351   Epoch: 8   Global Step: 337940   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:04,275-Speed 2630.63 samples/sec   Loss 7.9707   LearningRate 0.0351   Epoch: 8   Global Step: 337950   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:08,175-Speed 2626.22 samples/sec   Loss 7.8645   LearningRate 0.0351   Epoch: 8   Global Step: 337960   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:12,083-Speed 2621.04 samples/sec   Loss 8.0009   LearningRate 0.0351   Epoch: 8   Global Step: 337970   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:15,982-Speed 2626.40 samples/sec   Loss 8.0018   LearningRate 0.0351   Epoch: 8   Global Step: 337980   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:19,887-Speed 2623.28 samples/sec   Loss 8.0029   LearningRate 0.0351   Epoch: 8   Global Step: 337990   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:23,778-Speed 2632.26 samples/sec   Loss 7.8915   LearningRate 0.0351   Epoch: 8   Global Step: 338000   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:27,667-Speed 2633.41 samples/sec   Loss 8.0747   LearningRate 0.0351   Epoch: 8   Global Step: 338010   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:31,559-Speed 2631.96 samples/sec   Loss 8.0921   LearningRate 0.0351   Epoch: 8   Global Step: 338020   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:35,453-Speed 2630.44 samples/sec   Loss 8.0423   LearningRate 0.0351   Epoch: 8   Global Step: 338030   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:39,343-Speed 2632.86 samples/sec   Loss 7.9880   LearningRate 0.0351   Epoch: 8   Global Step: 338040   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 09:41:43,215-Speed 2645.08 samples/sec   Loss 8.0901   LearningRate 0.0351   Epoch: 8   Global Step: 338050   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:47,104-Speed 2633.58 samples/sec   Loss 7.9339   LearningRate 0.0351   Epoch: 8   Global Step: 338060   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:51,019-Speed 2617.05 samples/sec   Loss 7.9587   LearningRate 0.0351   Epoch: 8   Global Step: 338070   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:54,908-Speed 2633.58 samples/sec   Loss 8.0033   LearningRate 0.0351   Epoch: 8   Global Step: 338080   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:41:58,811-Speed 2624.48 samples/sec   Loss 7.9809   LearningRate 0.0351   Epoch: 8   Global Step: 338090   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:42:02,693-Speed 2638.94 samples/sec   Loss 8.1167   LearningRate 0.0351   Epoch: 8   Global Step: 338100   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:06,583-Speed 2633.08 samples/sec   Loss 8.1010   LearningRate 0.0351   Epoch: 8   Global Step: 338110   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:10,473-Speed 2633.05 samples/sec   Loss 8.0137   LearningRate 0.0351   Epoch: 8   Global Step: 338120   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:14,369-Speed 2628.94 samples/sec   Loss 8.0095   LearningRate 0.0351   Epoch: 8   Global Step: 338130   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:18,298-Speed 2606.82 samples/sec   Loss 7.8961   LearningRate 0.0351   Epoch: 8   Global Step: 338140   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:22,197-Speed 2627.64 samples/sec   Loss 8.0655   LearningRate 0.0351   Epoch: 8   Global Step: 338150   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:26,092-Speed 2629.66 samples/sec   Loss 7.9946   LearningRate 0.0351   Epoch: 8   Global Step: 338160   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:29,997-Speed 2623.38 samples/sec   Loss 7.9334   LearningRate 0.0351   Epoch: 8   Global Step: 338170   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:33,896-Speed 2626.92 samples/sec   Loss 8.0891   LearningRate 0.0351   Epoch: 8   Global Step: 338180   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:37,797-Speed 2625.47 samples/sec   Loss 7.9621   LearningRate 0.0351   Epoch: 8   Global Step: 338190   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:42:41,689-Speed 2631.75 samples/sec   Loss 7.9376   LearningRate 0.0351   Epoch: 8   Global Step: 338200   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:42:45,583-Speed 2629.95 samples/sec   Loss 7.9314   LearningRate 0.0351   Epoch: 8   Global Step: 338210   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:42:49,491-Speed 2621.08 samples/sec   Loss 7.9817   LearningRate 0.0351   Epoch: 8   Global Step: 338220   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:42:53,393-Speed 2625.04 samples/sec   Loss 7.9082   LearningRate 0.0351   Epoch: 8   Global Step: 338230   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:42:57,261-Speed 2648.55 samples/sec   Loss 7.9909   LearningRate 0.0351   Epoch: 8   Global Step: 338240   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:01,154-Speed 2630.52 samples/sec   Loss 8.0251   LearningRate 0.0351   Epoch: 8   Global Step: 338250   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:05,049-Speed 2629.59 samples/sec   Loss 7.9131   LearningRate 0.0351   Epoch: 8   Global Step: 338260   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:08,941-Speed 2631.98 samples/sec   Loss 8.0629   LearningRate 0.0351   Epoch: 8   Global Step: 338270   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:12,835-Speed 2630.36 samples/sec   Loss 7.8694   LearningRate 0.0351   Epoch: 8   Global Step: 338280   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:16,724-Speed 2633.26 samples/sec   Loss 8.0289   LearningRate 0.0351   Epoch: 8   Global Step: 338290   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:20,617-Speed 2631.42 samples/sec   Loss 8.0423   LearningRate 0.0351   Epoch: 8   Global Step: 338300   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:24,515-Speed 2627.35 samples/sec   Loss 7.9896   LearningRate 0.0351   Epoch: 8   Global Step: 338310   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:28,411-Speed 2628.92 samples/sec   Loss 7.9212   LearningRate 0.0351   Epoch: 8   Global Step: 338320   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:32,326-Speed 2616.15 samples/sec   Loss 7.9284   LearningRate 0.0351   Epoch: 8   Global Step: 338330   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:36,223-Speed 2628.46 samples/sec   Loss 7.8352   LearningRate 0.0351   Epoch: 8   Global Step: 338340   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:43:40,100-Speed 2641.99 samples/sec   Loss 7.9056   LearningRate 0.0351   Epoch: 8   Global Step: 338350   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:43,995-Speed 2629.39 samples/sec   Loss 7.7857   LearningRate 0.0351   Epoch: 8   Global Step: 338360   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:47,898-Speed 2624.12 samples/sec   Loss 8.0167   LearningRate 0.0351   Epoch: 8   Global Step: 338370   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:51,802-Speed 2623.70 samples/sec   Loss 7.9391   LearningRate 0.0351   Epoch: 8   Global Step: 338380   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:55,692-Speed 2633.03 samples/sec   Loss 7.8911   LearningRate 0.0351   Epoch: 8   Global Step: 338390   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:43:59,583-Speed 2632.77 samples/sec   Loss 7.8312   LearningRate 0.0351   Epoch: 8   Global Step: 338400   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:03,477-Speed 2630.48 samples/sec   Loss 8.0847   LearningRate 0.0351   Epoch: 8   Global Step: 338410   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:07,369-Speed 2631.60 samples/sec   Loss 7.8350   LearningRate 0.0351   Epoch: 8   Global Step: 338420   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:11,605-Speed 2417.64 samples/sec   Loss 8.0544   LearningRate 0.0351   Epoch: 8   Global Step: 338430   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:15,496-Speed 2633.07 samples/sec   Loss 7.8910   LearningRate 0.0351   Epoch: 8   Global Step: 338440   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:19,387-Speed 2632.06 samples/sec   Loss 7.9383   LearningRate 0.0350   Epoch: 8   Global Step: 338450   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:44:23,281-Speed 2630.66 samples/sec   Loss 7.9400   LearningRate 0.0350   Epoch: 8   Global Step: 338460   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:44:27,176-Speed 2629.60 samples/sec   Loss 8.0561   LearningRate 0.0350   Epoch: 8   Global Step: 338470   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:44:31,052-Speed 2642.50 samples/sec   Loss 8.0196   LearningRate 0.0350   Epoch: 8   Global Step: 338480   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:34,947-Speed 2629.93 samples/sec   Loss 7.9174   LearningRate 0.0350   Epoch: 8   Global Step: 338490   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:38,838-Speed 2632.29 samples/sec   Loss 7.8772   LearningRate 0.0350   Epoch: 8   Global Step: 338500   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:42,732-Speed 2630.37 samples/sec   Loss 7.9026   LearningRate 0.0350   Epoch: 8   Global Step: 338510   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:46,626-Speed 2630.02 samples/sec   Loss 7.8589   LearningRate 0.0350   Epoch: 8   Global Step: 338520   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:50,515-Speed 2634.31 samples/sec   Loss 7.9386   LearningRate 0.0350   Epoch: 8   Global Step: 338530   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:44:54,356-Speed 2666.42 samples/sec   Loss 7.9283   LearningRate 0.0350   Epoch: 8   Global Step: 338540   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:44:58,226-Speed 2646.78 samples/sec   Loss 9.5497   LearningRate 0.0350   Epoch: 8   Global Step: 338550   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:02,316-Speed 2504.35 samples/sec   Loss 8.8636   LearningRate 0.0350   Epoch: 8   Global Step: 338560   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:06,332-Speed 2549.90 samples/sec   Loss 8.4360   LearningRate 0.0350   Epoch: 8   Global Step: 338570   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:10,227-Speed 2629.61 samples/sec   Loss 8.2983   LearningRate 0.0350   Epoch: 8   Global Step: 338580   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:14,119-Speed 2632.14 samples/sec   Loss 8.1735   LearningRate 0.0350   Epoch: 8   Global Step: 338590   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:18,012-Speed 2631.08 samples/sec   Loss 8.1899   LearningRate 0.0350   Epoch: 8   Global Step: 338600   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:21,900-Speed 2634.74 samples/sec   Loss 7.9739   LearningRate 0.0350   Epoch: 8   Global Step: 338610   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:25,793-Speed 2630.22 samples/sec   Loss 8.1017   LearningRate 0.0350   Epoch: 8   Global Step: 338620   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:29,683-Speed 2633.70 samples/sec   Loss 7.8620   LearningRate 0.0350   Epoch: 8   Global Step: 338630   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:33,570-Speed 2635.39 samples/sec   Loss 8.1524   LearningRate 0.0350   Epoch: 8   Global Step: 338640   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 09:45:37,465-Speed 2629.27 samples/sec   Loss 8.0067   LearningRate 0.0350   Epoch: 8   Global Step: 338650   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:45:41,364-Speed 2626.50 samples/sec   Loss 8.1343   LearningRate 0.0350   Epoch: 8   Global Step: 338660   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:45:45,256-Speed 2631.69 samples/sec   Loss 8.1179   LearningRate 0.0350   Epoch: 8   Global Step: 338670   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:45:49,155-Speed 2626.76 samples/sec   Loss 7.9194   LearningRate 0.0350   Epoch: 8   Global Step: 338680   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:45:53,046-Speed 2632.46 samples/sec   Loss 8.2150   LearningRate 0.0350   Epoch: 8   Global Step: 338690   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:45:56,937-Speed 2632.49 samples/sec   Loss 8.0253   LearningRate 0.0350   Epoch: 8   Global Step: 338700   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:46:00,827-Speed 2633.25 samples/sec   Loss 8.1474   LearningRate 0.0350   Epoch: 8   Global Step: 338710   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:46:04,720-Speed 2631.15 samples/sec   Loss 7.9717   LearningRate 0.0350   Epoch: 8   Global Step: 338720   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:46:08,623-Speed 2624.25 samples/sec   Loss 8.0563   LearningRate 0.0350   Epoch: 8   Global Step: 338730   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:46:12,528-Speed 2622.82 samples/sec   Loss 7.9754   LearningRate 0.0350   Epoch: 8   Global Step: 338740   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:46:16,423-Speed 2629.37 samples/sec   Loss 8.0336   LearningRate 0.0350   Epoch: 8   Global Step: 338750   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:20,316-Speed 2631.97 samples/sec   Loss 8.0466   LearningRate 0.0350   Epoch: 8   Global Step: 338760   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:24,212-Speed 2628.35 samples/sec   Loss 8.0276   LearningRate 0.0350   Epoch: 8   Global Step: 338770   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:28,107-Speed 2629.63 samples/sec   Loss 8.0643   LearningRate 0.0350   Epoch: 8   Global Step: 338780   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:32,005-Speed 2628.08 samples/sec   Loss 8.0411   LearningRate 0.0350   Epoch: 8   Global Step: 338790   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:35,894-Speed 2633.67 samples/sec   Loss 7.9511   LearningRate 0.0350   Epoch: 8   Global Step: 338800   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:39,789-Speed 2629.82 samples/sec   Loss 7.9391   LearningRate 0.0350   Epoch: 8   Global Step: 338810   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:43,687-Speed 2627.33 samples/sec   Loss 7.8976   LearningRate 0.0350   Epoch: 8   Global Step: 338820   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:47,599-Speed 2617.95 samples/sec   Loss 7.9477   LearningRate 0.0350   Epoch: 8   Global Step: 338830   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:51,500-Speed 2626.16 samples/sec   Loss 8.0167   LearningRate 0.0350   Epoch: 8   Global Step: 338840   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:46:55,402-Speed 2625.38 samples/sec   Loss 7.9363   LearningRate 0.0350   Epoch: 8   Global Step: 338850   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:46:59,298-Speed 2628.73 samples/sec   Loss 7.9686   LearningRate 0.0350   Epoch: 8   Global Step: 338860   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:03,211-Speed 2617.52 samples/sec   Loss 8.0678   LearningRate 0.0350   Epoch: 8   Global Step: 338870   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:07,105-Speed 2630.54 samples/sec   Loss 8.0457   LearningRate 0.0350   Epoch: 8   Global Step: 338880   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:11,016-Speed 2618.62 samples/sec   Loss 7.8693   LearningRate 0.0350   Epoch: 8   Global Step: 338890   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:14,910-Speed 2630.61 samples/sec   Loss 7.8882   LearningRate 0.0350   Epoch: 8   Global Step: 338900   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:18,824-Speed 2617.11 samples/sec   Loss 8.0679   LearningRate 0.0350   Epoch: 8   Global Step: 338910   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:22,712-Speed 2634.54 samples/sec   Loss 8.0666   LearningRate 0.0350   Epoch: 8   Global Step: 338920   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:26,623-Speed 2618.63 samples/sec   Loss 8.0276   LearningRate 0.0350   Epoch: 8   Global Step: 338930   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:30,543-Speed 2612.77 samples/sec   Loss 8.1153   LearningRate 0.0350   Epoch: 8   Global Step: 338940   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:34,437-Speed 2631.06 samples/sec   Loss 8.0473   LearningRate 0.0350   Epoch: 8   Global Step: 338950   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:47:38,332-Speed 2629.67 samples/sec   Loss 8.0351   LearningRate 0.0350   Epoch: 8   Global Step: 338960   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:47:42,225-Speed 2630.72 samples/sec   Loss 7.9532   LearningRate 0.0350   Epoch: 8   Global Step: 338970   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:47:46,118-Speed 2630.72 samples/sec   Loss 7.9283   LearningRate 0.0350   Epoch: 8   Global Step: 338980   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:47:50,010-Speed 2631.85 samples/sec   Loss 8.1120   LearningRate 0.0350   Epoch: 8   Global Step: 338990   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:47:53,889-Speed 2640.12 samples/sec   Loss 7.9984   LearningRate 0.0350   Epoch: 8   Global Step: 339000   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:47:57,785-Speed 2629.30 samples/sec   Loss 7.9890   LearningRate 0.0350   Epoch: 8   Global Step: 339010   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:01,675-Speed 2633.22 samples/sec   Loss 7.8484   LearningRate 0.0350   Epoch: 8   Global Step: 339020   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:05,567-Speed 2631.52 samples/sec   Loss 7.9943   LearningRate 0.0350   Epoch: 8   Global Step: 339030   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:09,460-Speed 2630.79 samples/sec   Loss 8.0453   LearningRate 0.0350   Epoch: 8   Global Step: 339040   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:13,355-Speed 2629.71 samples/sec   Loss 8.0161   LearningRate 0.0350   Epoch: 8   Global Step: 339050   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:17,248-Speed 2630.82 samples/sec   Loss 7.8990   LearningRate 0.0350   Epoch: 8   Global Step: 339060   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:21,152-Speed 2623.64 samples/sec   Loss 8.0268   LearningRate 0.0350   Epoch: 8   Global Step: 339070   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:25,043-Speed 2632.10 samples/sec   Loss 7.8441   LearningRate 0.0350   Epoch: 8   Global Step: 339080   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:28,938-Speed 2630.25 samples/sec   Loss 7.9639   LearningRate 0.0350   Epoch: 8   Global Step: 339090   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:48:32,841-Speed 2624.01 samples/sec   Loss 7.8993   LearningRate 0.0350   Epoch: 8   Global Step: 339100   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:48:36,736-Speed 2629.89 samples/sec   Loss 8.1350   LearningRate 0.0350   Epoch: 8   Global Step: 339110   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:48:40,628-Speed 2631.49 samples/sec   Loss 8.1458   LearningRate 0.0350   Epoch: 8   Global Step: 339120   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:48:44,520-Speed 2631.51 samples/sec   Loss 7.9502   LearningRate 0.0350   Epoch: 8   Global Step: 339130   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:48:48,413-Speed 2630.87 samples/sec   Loss 7.9121   LearningRate 0.0350   Epoch: 8   Global Step: 339140   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:48:52,310-Speed 2628.28 samples/sec   Loss 8.0395   LearningRate 0.0349   Epoch: 8   Global Step: 339150   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:48:56,214-Speed 2624.14 samples/sec   Loss 8.0078   LearningRate 0.0349   Epoch: 8   Global Step: 339160   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:49:00,108-Speed 2630.03 samples/sec   Loss 7.9907   LearningRate 0.0349   Epoch: 8   Global Step: 339170   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:49:04,007-Speed 2626.75 samples/sec   Loss 8.0781   LearningRate 0.0349   Epoch: 8   Global Step: 339180   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:49:07,901-Speed 2630.79 samples/sec   Loss 7.8831   LearningRate 0.0349   Epoch: 8   Global Step: 339190   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:49:11,792-Speed 2632.20 samples/sec   Loss 7.9878   LearningRate 0.0349   Epoch: 8   Global Step: 339200   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:15,698-Speed 2622.74 samples/sec   Loss 7.9832   LearningRate 0.0349   Epoch: 8   Global Step: 339210   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:19,592-Speed 2630.66 samples/sec   Loss 8.0507   LearningRate 0.0349   Epoch: 8   Global Step: 339220   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:23,486-Speed 2630.20 samples/sec   Loss 8.0694   LearningRate 0.0349   Epoch: 8   Global Step: 339230   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:27,381-Speed 2629.77 samples/sec   Loss 7.8998   LearningRate 0.0349   Epoch: 8   Global Step: 339240   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:31,300-Speed 2614.16 samples/sec   Loss 7.9569   LearningRate 0.0349   Epoch: 8   Global Step: 339250   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:35,195-Speed 2629.50 samples/sec   Loss 7.9741   LearningRate 0.0349   Epoch: 8   Global Step: 339260   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:39,088-Speed 2630.75 samples/sec   Loss 7.9594   LearningRate 0.0349   Epoch: 8   Global Step: 339270   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:42,983-Speed 2629.75 samples/sec   Loss 7.9949   LearningRate 0.0349   Epoch: 8   Global Step: 339280   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:49:46,859-Speed 2642.59 samples/sec   Loss 7.9878   LearningRate 0.0349   Epoch: 8   Global Step: 339290   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:49:50,753-Speed 2631.14 samples/sec   Loss 8.0969   LearningRate 0.0349   Epoch: 8   Global Step: 339300   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:49:54,645-Speed 2631.43 samples/sec   Loss 8.0157   LearningRate 0.0349   Epoch: 8   Global Step: 339310   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:49:58,537-Speed 2631.81 samples/sec   Loss 7.9732   LearningRate 0.0349   Epoch: 8   Global Step: 339320   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:50:02,432-Speed 2629.28 samples/sec   Loss 7.8061   LearningRate 0.0349   Epoch: 8   Global Step: 339330   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:50:06,327-Speed 2629.76 samples/sec   Loss 8.0038   LearningRate 0.0349   Epoch: 8   Global Step: 339340   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:50:10,229-Speed 2624.57 samples/sec   Loss 8.0598   LearningRate 0.0349   Epoch: 8   Global Step: 339350   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:50:14,075-Speed 2663.91 samples/sec   Loss 8.2910   LearningRate 0.0349   Epoch: 8   Global Step: 339360   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:17,969-Speed 2630.06 samples/sec   Loss 7.9730   LearningRate 0.0349   Epoch: 8   Global Step: 339370   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:21,902-Speed 2604.72 samples/sec   Loss 8.0503   LearningRate 0.0349   Epoch: 8   Global Step: 339380   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:25,798-Speed 2629.08 samples/sec   Loss 7.9168   LearningRate 0.0349   Epoch: 8   Global Step: 339390   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:29,688-Speed 2633.01 samples/sec   Loss 7.8734   LearningRate 0.0349   Epoch: 8   Global Step: 339400   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:33,580-Speed 2631.60 samples/sec   Loss 7.9402   LearningRate 0.0349   Epoch: 8   Global Step: 339410   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:37,471-Speed 2632.21 samples/sec   Loss 7.8947   LearningRate 0.0349   Epoch: 8   Global Step: 339420   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:41,362-Speed 2632.19 samples/sec   Loss 7.8516   LearningRate 0.0349   Epoch: 8   Global Step: 339430   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:45,256-Speed 2630.44 samples/sec   Loss 7.9952   LearningRate 0.0349   Epoch: 8   Global Step: 339440   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:49,148-Speed 2631.68 samples/sec   Loss 7.7993   LearningRate 0.0349   Epoch: 8   Global Step: 339450   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 09:50:53,040-Speed 2632.01 samples/sec   Loss 7.8172   LearningRate 0.0349   Epoch: 8   Global Step: 339460   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:50:56,929-Speed 2633.25 samples/sec   Loss 8.1144   LearningRate 0.0349   Epoch: 8   Global Step: 339470   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:00,827-Speed 2627.71 samples/sec   Loss 7.9956   LearningRate 0.0349   Epoch: 8   Global Step: 339480   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:04,764-Speed 2601.75 samples/sec   Loss 8.0030   LearningRate 0.0349   Epoch: 8   Global Step: 339490   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:08,648-Speed 2637.36 samples/sec   Loss 7.9646   LearningRate 0.0349   Epoch: 8   Global Step: 339500   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:12,542-Speed 2630.03 samples/sec   Loss 7.9496   LearningRate 0.0349   Epoch: 8   Global Step: 339510   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:16,445-Speed 2624.52 samples/sec   Loss 7.9647   LearningRate 0.0349   Epoch: 8   Global Step: 339520   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:20,362-Speed 2615.14 samples/sec   Loss 7.8483   LearningRate 0.0349   Epoch: 8   Global Step: 339530   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:24,253-Speed 2632.49 samples/sec   Loss 8.0874   LearningRate 0.0349   Epoch: 8   Global Step: 339540   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:28,143-Speed 2632.97 samples/sec   Loss 8.0018   LearningRate 0.0349   Epoch: 8   Global Step: 339550   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 09:51:32,038-Speed 2629.85 samples/sec   Loss 7.9948   LearningRate 0.0349   Epoch: 8   Global Step: 339560   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:51:35,930-Speed 2631.56 samples/sec   Loss 7.9766   LearningRate 0.0349   Epoch: 8   Global Step: 339570   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:51:39,825-Speed 2629.79 samples/sec   Loss 7.9354   LearningRate 0.0349   Epoch: 8   Global Step: 339580   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:51:43,717-Speed 2631.83 samples/sec   Loss 7.8204   LearningRate 0.0349   Epoch: 8   Global Step: 339590   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:51:47,615-Speed 2627.50 samples/sec   Loss 7.8979   LearningRate 0.0349   Epoch: 8   Global Step: 339600   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:51:51,508-Speed 2631.14 samples/sec   Loss 7.9025   LearningRate 0.0349   Epoch: 8   Global Step: 339610   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:51:55,518-Speed 2554.32 samples/sec   Loss 7.7623   LearningRate 0.0349   Epoch: 8   Global Step: 339620   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:51:59,535-Speed 2550.10 samples/sec   Loss 7.9556   LearningRate 0.0349   Epoch: 8   Global Step: 339630   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:52:03,448-Speed 2617.34 samples/sec   Loss 7.8470   LearningRate 0.0349   Epoch: 8   Global Step: 339640   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:52:07,344-Speed 2629.19 samples/sec   Loss 7.9783   LearningRate 0.0349   Epoch: 8   Global Step: 339650   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:52:11,241-Speed 2627.86 samples/sec   Loss 7.8676   LearningRate 0.0349   Epoch: 8   Global Step: 339660   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:15,135-Speed 2631.16 samples/sec   Loss 7.9799   LearningRate 0.0349   Epoch: 8   Global Step: 339670   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:19,032-Speed 2627.45 samples/sec   Loss 7.9575   LearningRate 0.0349   Epoch: 8   Global Step: 339680   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:22,936-Speed 2623.78 samples/sec   Loss 7.7965   LearningRate 0.0349   Epoch: 8   Global Step: 339690   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:26,837-Speed 2626.10 samples/sec   Loss 7.9749   LearningRate 0.0349   Epoch: 8   Global Step: 339700   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:30,731-Speed 2630.26 samples/sec   Loss 7.9899   LearningRate 0.0349   Epoch: 8   Global Step: 339710   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:34,625-Speed 2630.52 samples/sec   Loss 7.9731   LearningRate 0.0349   Epoch: 8   Global Step: 339720   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:38,522-Speed 2627.99 samples/sec   Loss 7.9102   LearningRate 0.0349   Epoch: 8   Global Step: 339730   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:42,424-Speed 2624.45 samples/sec   Loss 8.0756   LearningRate 0.0349   Epoch: 8   Global Step: 339740   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:46,318-Speed 2630.52 samples/sec   Loss 7.8615   LearningRate 0.0349   Epoch: 8   Global Step: 339750   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:52:50,219-Speed 2625.56 samples/sec   Loss 7.9894   LearningRate 0.0349   Epoch: 8   Global Step: 339760   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:52:54,112-Speed 2631.25 samples/sec   Loss 7.9055   LearningRate 0.0349   Epoch: 8   Global Step: 339770   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:52:58,007-Speed 2629.99 samples/sec   Loss 8.0346   LearningRate 0.0349   Epoch: 8   Global Step: 339780   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:53:01,889-Speed 2638.29 samples/sec   Loss 7.9438   LearningRate 0.0349   Epoch: 8   Global Step: 339790   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:05,781-Speed 2631.69 samples/sec   Loss 7.9971   LearningRate 0.0349   Epoch: 8   Global Step: 339800   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:09,672-Speed 2632.37 samples/sec   Loss 7.8766   LearningRate 0.0349   Epoch: 8   Global Step: 339810   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:13,567-Speed 2629.88 samples/sec   Loss 7.9329   LearningRate 0.0349   Epoch: 8   Global Step: 339820   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:17,481-Speed 2616.58 samples/sec   Loss 8.0590   LearningRate 0.0349   Epoch: 8   Global Step: 339830   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:21,379-Speed 2628.16 samples/sec   Loss 7.9473   LearningRate 0.0349   Epoch: 8   Global Step: 339840   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:25,277-Speed 2627.40 samples/sec   Loss 7.9461   LearningRate 0.0348   Epoch: 8   Global Step: 339850   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:29,173-Speed 2629.29 samples/sec   Loss 7.9514   LearningRate 0.0348   Epoch: 8   Global Step: 339860   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:33,066-Speed 2631.35 samples/sec   Loss 7.9888   LearningRate 0.0348   Epoch: 8   Global Step: 339870   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:37,036-Speed 2579.52 samples/sec   Loss 7.8185   LearningRate 0.0348   Epoch: 8   Global Step: 339880   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:53:40,942-Speed 2622.12 samples/sec   Loss 8.1055   LearningRate 0.0348   Epoch: 8   Global Step: 339890   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:53:44,841-Speed 2626.73 samples/sec   Loss 7.8751   LearningRate 0.0348   Epoch: 8   Global Step: 339900   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:53:48,733-Speed 2632.25 samples/sec   Loss 7.9305   LearningRate 0.0348   Epoch: 8   Global Step: 339910   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:53:52,641-Speed 2620.90 samples/sec   Loss 7.9404   LearningRate 0.0348   Epoch: 8   Global Step: 339920   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:53:56,572-Speed 2605.90 samples/sec   Loss 7.8447   LearningRate 0.0348   Epoch: 8   Global Step: 339930   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:54:00,468-Speed 2629.17 samples/sec   Loss 7.8512   LearningRate 0.0348   Epoch: 8   Global Step: 339940   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:54:04,360-Speed 2631.92 samples/sec   Loss 7.9657   LearningRate 0.0348   Epoch: 8   Global Step: 339950   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:54:08,252-Speed 2631.33 samples/sec   Loss 8.0410   LearningRate 0.0348   Epoch: 8   Global Step: 339960   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:54:12,152-Speed 2625.98 samples/sec   Loss 7.9330   LearningRate 0.0348   Epoch: 8   Global Step: 339970   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:54:16,053-Speed 2625.51 samples/sec   Loss 8.0138   LearningRate 0.0348   Epoch: 8   Global Step: 339980   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:54:19,949-Speed 2629.11 samples/sec   Loss 7.9488   LearningRate 0.0348   Epoch: 8   Global Step: 339990   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 09:54:23,826-Speed 2642.07 samples/sec   Loss 7.9458   LearningRate 0.0348   Epoch: 8   Global Step: 340000   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:55:06,916-[lfw][340000]XNorm: 23.821781
Training: 2022-04-14 09:55:06,917-[lfw][340000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-04-14 09:55:06,917-[lfw][340000]Accuracy-Highest: 0.99783
Training: 2022-04-14 09:55:57,090-[cfp_fp][340000]XNorm: 22.373600
Training: 2022-04-14 09:55:57,091-[cfp_fp][340000]Accuracy-Flip: 0.98471+-0.00568
Training: 2022-04-14 09:55:57,092-[cfp_fp][340000]Accuracy-Highest: 0.98671
Training: 2022-04-14 09:56:39,927-[agedb_30][340000]XNorm: 23.840092
Training: 2022-04-14 09:56:39,928-[agedb_30][340000]Accuracy-Flip: 0.97550+-0.00624
Training: 2022-04-14 09:56:39,929-[agedb_30][340000]Accuracy-Highest: 0.97567
Training: 2022-04-14 09:56:43,776-Speed 73.17 samples/sec   Loss 7.8511   LearningRate 0.0348   Epoch: 8   Global Step: 340010   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:56:47,707-Speed 2605.66 samples/sec   Loss 7.9034   LearningRate 0.0348   Epoch: 8   Global Step: 340020   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:56:51,767-Speed 2522.87 samples/sec   Loss 8.0212   LearningRate 0.0348   Epoch: 8   Global Step: 340030   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:56:55,662-Speed 2629.72 samples/sec   Loss 8.0207   LearningRate 0.0348   Epoch: 8   Global Step: 340040   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:56:59,536-Speed 2643.87 samples/sec   Loss 8.0078   LearningRate 0.0348   Epoch: 8   Global Step: 340050   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:57:03,419-Speed 2637.03 samples/sec   Loss 7.8868   LearningRate 0.0348   Epoch: 8   Global Step: 340060   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:57:07,299-Speed 2640.21 samples/sec   Loss 7.9230   LearningRate 0.0348   Epoch: 8   Global Step: 340070   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:57:11,189-Speed 2632.70 samples/sec   Loss 8.0346   LearningRate 0.0348   Epoch: 8   Global Step: 340080   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:57:15,081-Speed 2631.87 samples/sec   Loss 8.0475   LearningRate 0.0348   Epoch: 8   Global Step: 340090   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:57:18,972-Speed 2632.61 samples/sec   Loss 8.0318   LearningRate 0.0348   Epoch: 8   Global Step: 340100   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:57:22,858-Speed 2636.18 samples/sec   Loss 7.7697   LearningRate 0.0348   Epoch: 8   Global Step: 340110   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:26,742-Speed 2636.85 samples/sec   Loss 7.8530   LearningRate 0.0348   Epoch: 8   Global Step: 340120   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:30,628-Speed 2635.79 samples/sec   Loss 7.8862   LearningRate 0.0348   Epoch: 8   Global Step: 340130   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:34,513-Speed 2636.20 samples/sec   Loss 7.8605   LearningRate 0.0348   Epoch: 8   Global Step: 340140   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:38,408-Speed 2629.56 samples/sec   Loss 7.9551   LearningRate 0.0348   Epoch: 8   Global Step: 340150   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:42,310-Speed 2625.29 samples/sec   Loss 7.9196   LearningRate 0.0348   Epoch: 8   Global Step: 340160   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:46,209-Speed 2626.95 samples/sec   Loss 7.8640   LearningRate 0.0348   Epoch: 8   Global Step: 340170   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:50,118-Speed 2620.50 samples/sec   Loss 7.8891   LearningRate 0.0348   Epoch: 8   Global Step: 340180   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:54,014-Speed 2628.70 samples/sec   Loss 7.8499   LearningRate 0.0348   Epoch: 8   Global Step: 340190   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:57:58,417-Speed 2326.66 samples/sec   Loss 8.0422   LearningRate 0.0348   Epoch: 8   Global Step: 340200   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:58:02,308-Speed 2632.74 samples/sec   Loss 7.8702   LearningRate 0.0348   Epoch: 8   Global Step: 340210   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:58:06,180-Speed 2645.15 samples/sec   Loss 8.0474   LearningRate 0.0348   Epoch: 8   Global Step: 340220   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:58:10,074-Speed 2630.75 samples/sec   Loss 7.9089   LearningRate 0.0348   Epoch: 8   Global Step: 340230   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:58:13,964-Speed 2632.81 samples/sec   Loss 8.0307   LearningRate 0.0348   Epoch: 8   Global Step: 340240   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:58:17,855-Speed 2632.77 samples/sec   Loss 8.0519   LearningRate 0.0348   Epoch: 8   Global Step: 340250   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:58:21,745-Speed 2632.68 samples/sec   Loss 7.8379   LearningRate 0.0348   Epoch: 8   Global Step: 340260   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:58:25,654-Speed 2620.38 samples/sec   Loss 8.0399   LearningRate 0.0348   Epoch: 8   Global Step: 340270   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:58:29,545-Speed 2632.24 samples/sec   Loss 8.7269   LearningRate 0.0348   Epoch: 8   Global Step: 340280   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:58:33,462-Speed 2615.32 samples/sec   Loss 9.0381   LearningRate 0.0348   Epoch: 8   Global Step: 340290   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:58:37,353-Speed 2633.04 samples/sec   Loss 8.5724   LearningRate 0.0348   Epoch: 8   Global Step: 340300   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:58:41,242-Speed 2633.64 samples/sec   Loss 8.1886   LearningRate 0.0348   Epoch: 8   Global Step: 340310   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:58:45,132-Speed 2632.77 samples/sec   Loss 8.2123   LearningRate 0.0348   Epoch: 8   Global Step: 340320   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:58:49,026-Speed 2630.43 samples/sec   Loss 8.1547   LearningRate 0.0348   Epoch: 8   Global Step: 340330   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:58:52,915-Speed 2633.97 samples/sec   Loss 8.1452   LearningRate 0.0348   Epoch: 8   Global Step: 340340   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:58:56,811-Speed 2629.12 samples/sec   Loss 7.9938   LearningRate 0.0348   Epoch: 8   Global Step: 340350   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:59:00,702-Speed 2632.26 samples/sec   Loss 7.9891   LearningRate 0.0348   Epoch: 8   Global Step: 340360   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:59:04,603-Speed 2625.60 samples/sec   Loss 8.1016   LearningRate 0.0348   Epoch: 8   Global Step: 340370   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 09:59:08,501-Speed 2628.08 samples/sec   Loss 7.9774   LearningRate 0.0348   Epoch: 8   Global Step: 340380   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:12,392-Speed 2632.44 samples/sec   Loss 8.0738   LearningRate 0.0348   Epoch: 8   Global Step: 340390   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:16,285-Speed 2630.91 samples/sec   Loss 8.0044   LearningRate 0.0348   Epoch: 8   Global Step: 340400   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:20,180-Speed 2629.89 samples/sec   Loss 8.1242   LearningRate 0.0348   Epoch: 8   Global Step: 340410   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:24,081-Speed 2625.36 samples/sec   Loss 7.9443   LearningRate 0.0348   Epoch: 8   Global Step: 340420   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:27,980-Speed 2626.42 samples/sec   Loss 8.0022   LearningRate 0.0348   Epoch: 8   Global Step: 340430   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:31,875-Speed 2629.81 samples/sec   Loss 7.8025   LearningRate 0.0348   Epoch: 8   Global Step: 340440   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:35,817-Speed 2598.25 samples/sec   Loss 7.7468   LearningRate 0.0348   Epoch: 8   Global Step: 340450   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:39,744-Speed 2608.46 samples/sec   Loss 7.9952   LearningRate 0.0348   Epoch: 8   Global Step: 340460   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:43,640-Speed 2628.96 samples/sec   Loss 7.9654   LearningRate 0.0348   Epoch: 8   Global Step: 340470   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 09:59:47,532-Speed 2631.67 samples/sec   Loss 7.9899   LearningRate 0.0348   Epoch: 8   Global Step: 340480   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:59:51,422-Speed 2633.32 samples/sec   Loss 8.0450   LearningRate 0.0348   Epoch: 8   Global Step: 340490   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:59:55,318-Speed 2628.90 samples/sec   Loss 7.8309   LearningRate 0.0348   Epoch: 8   Global Step: 340500   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 09:59:59,218-Speed 2625.73 samples/sec   Loss 7.9258   LearningRate 0.0348   Epoch: 8   Global Step: 340510   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:00:03,113-Speed 2629.96 samples/sec   Loss 7.8648   LearningRate 0.0348   Epoch: 8   Global Step: 340520   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:00:07,018-Speed 2622.77 samples/sec   Loss 8.0062   LearningRate 0.0348   Epoch: 8   Global Step: 340530   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:00:10,919-Speed 2625.69 samples/sec   Loss 7.8725   LearningRate 0.0348   Epoch: 8   Global Step: 340540   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:00:14,797-Speed 2641.51 samples/sec   Loss 7.9454   LearningRate 0.0347   Epoch: 8   Global Step: 340550   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:18,692-Speed 2629.45 samples/sec   Loss 7.9662   LearningRate 0.0347   Epoch: 8   Global Step: 340560   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:22,595-Speed 2624.72 samples/sec   Loss 7.8498   LearningRate 0.0347   Epoch: 8   Global Step: 340570   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:26,497-Speed 2624.52 samples/sec   Loss 7.8733   LearningRate 0.0347   Epoch: 8   Global Step: 340580   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:30,401-Speed 2623.22 samples/sec   Loss 8.0905   LearningRate 0.0347   Epoch: 8   Global Step: 340590   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:34,301-Speed 2626.45 samples/sec   Loss 7.9081   LearningRate 0.0347   Epoch: 8   Global Step: 340600   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:38,210-Speed 2620.83 samples/sec   Loss 7.9177   LearningRate 0.0347   Epoch: 8   Global Step: 340610   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:42,116-Speed 2621.93 samples/sec   Loss 7.9005   LearningRate 0.0347   Epoch: 8   Global Step: 340620   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:46,013-Speed 2628.67 samples/sec   Loss 8.0401   LearningRate 0.0347   Epoch: 8   Global Step: 340630   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:49,940-Speed 2607.93 samples/sec   Loss 7.8778   LearningRate 0.0347   Epoch: 8   Global Step: 340640   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:00:53,839-Speed 2627.31 samples/sec   Loss 7.9331   LearningRate 0.0347   Epoch: 8   Global Step: 340650   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:00:57,738-Speed 2627.08 samples/sec   Loss 8.0384   LearningRate 0.0347   Epoch: 8   Global Step: 340660   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:01,634-Speed 2628.85 samples/sec   Loss 8.0554   LearningRate 0.0347   Epoch: 8   Global Step: 340670   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:05,532-Speed 2627.08 samples/sec   Loss 8.0004   LearningRate 0.0347   Epoch: 8   Global Step: 340680   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:09,441-Speed 2620.14 samples/sec   Loss 7.9008   LearningRate 0.0347   Epoch: 8   Global Step: 340690   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:13,336-Speed 2630.09 samples/sec   Loss 7.9485   LearningRate 0.0347   Epoch: 8   Global Step: 340700   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:17,232-Speed 2628.99 samples/sec   Loss 7.9989   LearningRate 0.0347   Epoch: 8   Global Step: 340710   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:21,130-Speed 2633.43 samples/sec   Loss 7.8034   LearningRate 0.0347   Epoch: 8   Global Step: 340720   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:25,023-Speed 2630.89 samples/sec   Loss 7.8701   LearningRate 0.0347   Epoch: 8   Global Step: 340730   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:28,950-Speed 2608.35 samples/sec   Loss 8.0094   LearningRate 0.0347   Epoch: 8   Global Step: 340740   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:32,845-Speed 2629.80 samples/sec   Loss 7.8054   LearningRate 0.0347   Epoch: 8   Global Step: 340750   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:01:36,727-Speed 2637.86 samples/sec   Loss 7.9206   LearningRate 0.0347   Epoch: 8   Global Step: 340760   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:40,655-Speed 2607.57 samples/sec   Loss 7.8546   LearningRate 0.0347   Epoch: 8   Global Step: 340770   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:44,570-Speed 2616.17 samples/sec   Loss 7.9081   LearningRate 0.0347   Epoch: 8   Global Step: 340780   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:48,471-Speed 2626.38 samples/sec   Loss 7.9521   LearningRate 0.0347   Epoch: 8   Global Step: 340790   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:52,364-Speed 2630.39 samples/sec   Loss 8.0036   LearningRate 0.0347   Epoch: 8   Global Step: 340800   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:01:56,277-Speed 2617.88 samples/sec   Loss 7.9155   LearningRate 0.0347   Epoch: 8   Global Step: 340810   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:00,174-Speed 2628.65 samples/sec   Loss 7.9237   LearningRate 0.0347   Epoch: 8   Global Step: 340820   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:04,070-Speed 2628.85 samples/sec   Loss 7.9557   LearningRate 0.0347   Epoch: 8   Global Step: 340830   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:08,000-Speed 2605.73 samples/sec   Loss 7.9668   LearningRate 0.0347   Epoch: 8   Global Step: 340840   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:11,900-Speed 2627.36 samples/sec   Loss 8.0415   LearningRate 0.0347   Epoch: 8   Global Step: 340850   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:15,805-Speed 2622.30 samples/sec   Loss 7.9707   LearningRate 0.0347   Epoch: 8   Global Step: 340860   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:02:19,726-Speed 2612.59 samples/sec   Loss 7.9143   LearningRate 0.0347   Epoch: 8   Global Step: 340870   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:02:23,609-Speed 2637.77 samples/sec   Loss 7.9574   LearningRate 0.0347   Epoch: 8   Global Step: 340880   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:27,510-Speed 2625.98 samples/sec   Loss 7.8478   LearningRate 0.0347   Epoch: 8   Global Step: 340890   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:31,411-Speed 2625.66 samples/sec   Loss 7.9028   LearningRate 0.0347   Epoch: 8   Global Step: 340900   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:35,309-Speed 2627.10 samples/sec   Loss 7.9920   LearningRate 0.0347   Epoch: 8   Global Step: 340910   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:39,206-Speed 2628.30 samples/sec   Loss 7.8776   LearningRate 0.0347   Epoch: 8   Global Step: 340920   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:43,104-Speed 2628.09 samples/sec   Loss 7.8448   LearningRate 0.0347   Epoch: 8   Global Step: 340930   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:47,008-Speed 2623.24 samples/sec   Loss 7.8146   LearningRate 0.0347   Epoch: 8   Global Step: 340940   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:50,917-Speed 2620.87 samples/sec   Loss 8.0020   LearningRate 0.0347   Epoch: 8   Global Step: 340950   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:54,818-Speed 2625.26 samples/sec   Loss 7.9560   LearningRate 0.0347   Epoch: 8   Global Step: 340960   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:02:58,714-Speed 2629.47 samples/sec   Loss 7.9864   LearningRate 0.0347   Epoch: 8   Global Step: 340970   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:02,611-Speed 2627.98 samples/sec   Loss 7.9373   LearningRate 0.0347   Epoch: 8   Global Step: 340980   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:03:06,523-Speed 2618.26 samples/sec   Loss 8.0075   LearningRate 0.0347   Epoch: 8   Global Step: 340990   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:03:10,421-Speed 2627.36 samples/sec   Loss 7.9683   LearningRate 0.0347   Epoch: 8   Global Step: 341000   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:03:14,299-Speed 2641.79 samples/sec   Loss 7.8587   LearningRate 0.0347   Epoch: 8   Global Step: 341010   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:18,196-Speed 2628.70 samples/sec   Loss 7.8989   LearningRate 0.0347   Epoch: 8   Global Step: 341020   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:22,095-Speed 2627.00 samples/sec   Loss 7.9263   LearningRate 0.0347   Epoch: 8   Global Step: 341030   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:25,993-Speed 2627.94 samples/sec   Loss 7.9307   LearningRate 0.0347   Epoch: 8   Global Step: 341040   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:29,892-Speed 2626.90 samples/sec   Loss 7.8846   LearningRate 0.0347   Epoch: 8   Global Step: 341050   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:33,794-Speed 2624.74 samples/sec   Loss 7.7976   LearningRate 0.0347   Epoch: 8   Global Step: 341060   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:37,688-Speed 2629.99 samples/sec   Loss 7.9674   LearningRate 0.0347   Epoch: 8   Global Step: 341070   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:41,617-Speed 2607.37 samples/sec   Loss 8.0206   LearningRate 0.0347   Epoch: 8   Global Step: 341080   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:45,525-Speed 2620.58 samples/sec   Loss 7.8754   LearningRate 0.0347   Epoch: 8   Global Step: 341090   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:49,431-Speed 2622.36 samples/sec   Loss 7.8484   LearningRate 0.0347   Epoch: 8   Global Step: 341100   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:53,343-Speed 2618.23 samples/sec   Loss 8.0606   LearningRate 0.0347   Epoch: 8   Global Step: 341110   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:03:57,235-Speed 2631.74 samples/sec   Loss 7.8745   LearningRate 0.0347   Epoch: 8   Global Step: 341120   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:01,133-Speed 2628.20 samples/sec   Loss 7.9030   LearningRate 0.0347   Epoch: 8   Global Step: 341130   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:05,031-Speed 2627.07 samples/sec   Loss 7.9575   LearningRate 0.0347   Epoch: 8   Global Step: 341140   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:08,927-Speed 2629.05 samples/sec   Loss 7.8086   LearningRate 0.0347   Epoch: 8   Global Step: 341150   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:12,859-Speed 2604.97 samples/sec   Loss 7.7418   LearningRate 0.0347   Epoch: 8   Global Step: 341160   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:16,753-Speed 2630.39 samples/sec   Loss 7.7818   LearningRate 0.0347   Epoch: 8   Global Step: 341170   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:20,655-Speed 2625.18 samples/sec   Loss 7.9450   LearningRate 0.0347   Epoch: 8   Global Step: 341180   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:24,553-Speed 2627.95 samples/sec   Loss 7.9672   LearningRate 0.0347   Epoch: 8   Global Step: 341190   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:28,450-Speed 2628.73 samples/sec   Loss 8.0333   LearningRate 0.0347   Epoch: 8   Global Step: 341200   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:32,331-Speed 2638.47 samples/sec   Loss 7.9409   LearningRate 0.0347   Epoch: 8   Global Step: 341210   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:04:36,212-Speed 2638.95 samples/sec   Loss 7.7935   LearningRate 0.0347   Epoch: 8   Global Step: 341220   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:04:40,115-Speed 2624.32 samples/sec   Loss 7.8945   LearningRate 0.0347   Epoch: 8   Global Step: 341230   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:04:44,014-Speed 2627.38 samples/sec   Loss 7.9664   LearningRate 0.0347   Epoch: 8   Global Step: 341240   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:04:47,923-Speed 2620.34 samples/sec   Loss 8.0478   LearningRate 0.0347   Epoch: 8   Global Step: 341250   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:04:51,816-Speed 2631.27 samples/sec   Loss 7.9736   LearningRate 0.0346   Epoch: 8   Global Step: 341260   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:04:55,714-Speed 2627.60 samples/sec   Loss 7.8703   LearningRate 0.0346   Epoch: 8   Global Step: 341270   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:04:59,622-Speed 2621.24 samples/sec   Loss 7.8077   LearningRate 0.0346   Epoch: 8   Global Step: 341280   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:05:03,530-Speed 2620.44 samples/sec   Loss 7.9746   LearningRate 0.0346   Epoch: 8   Global Step: 341290   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:05:07,426-Speed 2629.19 samples/sec   Loss 8.0740   LearningRate 0.0346   Epoch: 8   Global Step: 341300   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:05:11,321-Speed 2629.29 samples/sec   Loss 7.8197   LearningRate 0.0346   Epoch: 8   Global Step: 341310   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:05:15,217-Speed 2629.43 samples/sec   Loss 7.7600   LearningRate 0.0346   Epoch: 8   Global Step: 341320   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:19,121-Speed 2623.85 samples/sec   Loss 7.9964   LearningRate 0.0346   Epoch: 8   Global Step: 341330   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:23,021-Speed 2626.44 samples/sec   Loss 7.8693   LearningRate 0.0346   Epoch: 8   Global Step: 341340   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:26,926-Speed 2622.49 samples/sec   Loss 7.9845   LearningRate 0.0346   Epoch: 8   Global Step: 341350   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:30,831-Speed 2622.64 samples/sec   Loss 8.0050   LearningRate 0.0346   Epoch: 8   Global Step: 341360   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:34,740-Speed 2620.21 samples/sec   Loss 7.7824   LearningRate 0.0346   Epoch: 8   Global Step: 341370   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:38,643-Speed 2624.13 samples/sec   Loss 7.8336   LearningRate 0.0346   Epoch: 8   Global Step: 341380   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:42,542-Speed 2627.10 samples/sec   Loss 7.9266   LearningRate 0.0346   Epoch: 8   Global Step: 341390   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:46,450-Speed 2620.78 samples/sec   Loss 8.0142   LearningRate 0.0346   Epoch: 8   Global Step: 341400   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:50,362-Speed 2618.81 samples/sec   Loss 7.9090   LearningRate 0.0346   Epoch: 8   Global Step: 341410   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:05:54,246-Speed 2636.82 samples/sec   Loss 8.0771   LearningRate 0.0346   Epoch: 8   Global Step: 341420   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:05:58,097-Speed 2659.78 samples/sec   Loss 8.6716   LearningRate 0.0346   Epoch: 8   Global Step: 341430   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:01,995-Speed 2627.80 samples/sec   Loss 8.2575   LearningRate 0.0346   Epoch: 8   Global Step: 341440   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:05,903-Speed 2620.48 samples/sec   Loss 7.9662   LearningRate 0.0346   Epoch: 8   Global Step: 341450   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:09,826-Speed 2611.28 samples/sec   Loss 7.9188   LearningRate 0.0346   Epoch: 8   Global Step: 341460   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:13,739-Speed 2618.10 samples/sec   Loss 7.8936   LearningRate 0.0346   Epoch: 8   Global Step: 341470   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:17,635-Speed 2628.33 samples/sec   Loss 7.7918   LearningRate 0.0346   Epoch: 8   Global Step: 341480   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:21,548-Speed 2618.45 samples/sec   Loss 7.9093   LearningRate 0.0346   Epoch: 8   Global Step: 341490   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:25,452-Speed 2623.55 samples/sec   Loss 7.9551   LearningRate 0.0346   Epoch: 8   Global Step: 341500   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:29,355-Speed 2623.77 samples/sec   Loss 7.9204   LearningRate 0.0346   Epoch: 8   Global Step: 341510   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:33,264-Speed 2620.84 samples/sec   Loss 7.9169   LearningRate 0.0346   Epoch: 8   Global Step: 341520   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:37,526-Speed 2403.16 samples/sec   Loss 7.8830   LearningRate 0.0346   Epoch: 8   Global Step: 341530   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:06:41,410-Speed 2636.64 samples/sec   Loss 8.4310   LearningRate 0.0346   Epoch: 8   Global Step: 341540   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:45,310-Speed 2626.65 samples/sec   Loss 8.0837   LearningRate 0.0346   Epoch: 8   Global Step: 341550   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:49,208-Speed 2627.22 samples/sec   Loss 7.8581   LearningRate 0.0346   Epoch: 8   Global Step: 341560   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:53,114-Speed 2623.34 samples/sec   Loss 7.9432   LearningRate 0.0346   Epoch: 8   Global Step: 341570   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:06:57,009-Speed 2629.04 samples/sec   Loss 8.1530   LearningRate 0.0346   Epoch: 8   Global Step: 341580   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:07:00,905-Speed 2628.80 samples/sec   Loss 8.0352   LearningRate 0.0346   Epoch: 8   Global Step: 341590   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:07:04,817-Speed 2618.19 samples/sec   Loss 8.0449   LearningRate 0.0346   Epoch: 8   Global Step: 341600   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:07:08,708-Speed 2633.13 samples/sec   Loss 7.9288   LearningRate 0.0346   Epoch: 8   Global Step: 341610   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:07:12,602-Speed 2629.89 samples/sec   Loss 7.9582   LearningRate 0.0346   Epoch: 8   Global Step: 341620   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:07:16,496-Speed 2630.28 samples/sec   Loss 8.0268   LearningRate 0.0346   Epoch: 8   Global Step: 341630   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:07:20,446-Speed 2593.20 samples/sec   Loss 7.9262   LearningRate 0.0346   Epoch: 8   Global Step: 341640   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:24,351-Speed 2623.04 samples/sec   Loss 7.8959   LearningRate 0.0346   Epoch: 8   Global Step: 341650   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:28,247-Speed 2629.70 samples/sec   Loss 7.9458   LearningRate 0.0346   Epoch: 8   Global Step: 341660   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:32,157-Speed 2618.82 samples/sec   Loss 8.0379   LearningRate 0.0346   Epoch: 8   Global Step: 341670   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:36,061-Speed 2623.58 samples/sec   Loss 8.0735   LearningRate 0.0346   Epoch: 8   Global Step: 341680   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:39,968-Speed 2621.71 samples/sec   Loss 7.8118   LearningRate 0.0346   Epoch: 8   Global Step: 341690   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:43,886-Speed 2614.92 samples/sec   Loss 7.8323   LearningRate 0.0346   Epoch: 8   Global Step: 341700   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:47,802-Speed 2615.02 samples/sec   Loss 8.0126   LearningRate 0.0346   Epoch: 8   Global Step: 341710   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:51,732-Speed 2606.47 samples/sec   Loss 7.8688   LearningRate 0.0346   Epoch: 8   Global Step: 341720   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:55,639-Speed 2621.70 samples/sec   Loss 7.9436   LearningRate 0.0346   Epoch: 8   Global Step: 341730   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:07:59,556-Speed 2615.17 samples/sec   Loss 7.7712   LearningRate 0.0346   Epoch: 8   Global Step: 341740   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:03,478-Speed 2611.76 samples/sec   Loss 7.8621   LearningRate 0.0346   Epoch: 8   Global Step: 341750   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:07,389-Speed 2618.82 samples/sec   Loss 7.9714   LearningRate 0.0346   Epoch: 8   Global Step: 341760   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:11,292-Speed 2624.12 samples/sec   Loss 7.8966   LearningRate 0.0346   Epoch: 8   Global Step: 341770   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:15,188-Speed 2628.48 samples/sec   Loss 7.8837   LearningRate 0.0346   Epoch: 8   Global Step: 341780   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:19,085-Speed 2628.85 samples/sec   Loss 7.9603   LearningRate 0.0346   Epoch: 8   Global Step: 341790   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:22,981-Speed 2628.72 samples/sec   Loss 7.9803   LearningRate 0.0346   Epoch: 8   Global Step: 341800   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:26,876-Speed 2629.63 samples/sec   Loss 7.9907   LearningRate 0.0346   Epoch: 8   Global Step: 341810   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:30,790-Speed 2616.88 samples/sec   Loss 7.9503   LearningRate 0.0346   Epoch: 8   Global Step: 341820   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:34,686-Speed 2629.45 samples/sec   Loss 7.8332   LearningRate 0.0346   Epoch: 8   Global Step: 341830   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:08:38,580-Speed 2629.65 samples/sec   Loss 7.8954   LearningRate 0.0346   Epoch: 8   Global Step: 341840   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:08:42,477-Speed 2628.54 samples/sec   Loss 8.0631   LearningRate 0.0346   Epoch: 8   Global Step: 341850   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:08:46,376-Speed 2627.12 samples/sec   Loss 8.0510   LearningRate 0.0346   Epoch: 8   Global Step: 341860   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:08:50,273-Speed 2628.38 samples/sec   Loss 8.0112   LearningRate 0.0346   Epoch: 8   Global Step: 341870   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:08:54,192-Speed 2613.36 samples/sec   Loss 8.0189   LearningRate 0.0346   Epoch: 8   Global Step: 341880   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:08:58,093-Speed 2626.59 samples/sec   Loss 7.9102   LearningRate 0.0346   Epoch: 8   Global Step: 341890   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:09:01,989-Speed 2628.97 samples/sec   Loss 7.8732   LearningRate 0.0346   Epoch: 8   Global Step: 341900   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:09:05,889-Speed 2625.65 samples/sec   Loss 7.8541   LearningRate 0.0346   Epoch: 8   Global Step: 341910   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:09:09,807-Speed 2614.40 samples/sec   Loss 7.9649   LearningRate 0.0346   Epoch: 8   Global Step: 341920   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:09:13,706-Speed 2627.05 samples/sec   Loss 7.8767   LearningRate 0.0346   Epoch: 8   Global Step: 341930   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:09:17,604-Speed 2628.06 samples/sec   Loss 7.8136   LearningRate 0.0346   Epoch: 8   Global Step: 341940   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:21,502-Speed 2627.10 samples/sec   Loss 7.9221   LearningRate 0.0346   Epoch: 8   Global Step: 341950   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:25,407-Speed 2623.11 samples/sec   Loss 7.9133   LearningRate 0.0345   Epoch: 8   Global Step: 341960   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:29,304-Speed 2628.86 samples/sec   Loss 7.9048   LearningRate 0.0345   Epoch: 8   Global Step: 341970   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:33,205-Speed 2625.15 samples/sec   Loss 7.9027   LearningRate 0.0345   Epoch: 8   Global Step: 341980   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:37,105-Speed 2626.45 samples/sec   Loss 8.0492   LearningRate 0.0345   Epoch: 8   Global Step: 341990   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:41,008-Speed 2624.37 samples/sec   Loss 7.9389   LearningRate 0.0345   Epoch: 8   Global Step: 342000   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:44,918-Speed 2619.85 samples/sec   Loss 7.7952   LearningRate 0.0345   Epoch: 8   Global Step: 342010   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:09:48,804-Speed 2636.13 samples/sec   Loss 8.0121   LearningRate 0.0345   Epoch: 8   Global Step: 342020   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:09:52,680-Speed 2642.08 samples/sec   Loss 8.2410   LearningRate 0.0345   Epoch: 8   Global Step: 342030   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:09:56,581-Speed 2625.95 samples/sec   Loss 7.9169   LearningRate 0.0345   Epoch: 8   Global Step: 342040   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:00,504-Speed 2610.38 samples/sec   Loss 7.8646   LearningRate 0.0345   Epoch: 8   Global Step: 342050   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:04,399-Speed 2630.00 samples/sec   Loss 7.9424   LearningRate 0.0345   Epoch: 8   Global Step: 342060   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:08,295-Speed 2628.97 samples/sec   Loss 7.8945   LearningRate 0.0345   Epoch: 8   Global Step: 342070   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:12,247-Speed 2591.75 samples/sec   Loss 7.9067   LearningRate 0.0345   Epoch: 8   Global Step: 342080   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:16,152-Speed 2622.79 samples/sec   Loss 7.8917   LearningRate 0.0345   Epoch: 8   Global Step: 342090   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:20,051-Speed 2627.28 samples/sec   Loss 7.8844   LearningRate 0.0345   Epoch: 8   Global Step: 342100   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:23,949-Speed 2628.10 samples/sec   Loss 7.9944   LearningRate 0.0345   Epoch: 8   Global Step: 342110   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:27,864-Speed 2616.41 samples/sec   Loss 8.1113   LearningRate 0.0345   Epoch: 8   Global Step: 342120   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:10:31,751-Speed 2634.99 samples/sec   Loss 7.9404   LearningRate 0.0345   Epoch: 8   Global Step: 342130   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:10:35,658-Speed 2621.38 samples/sec   Loss 7.8962   LearningRate 0.0345   Epoch: 8   Global Step: 342140   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:10:39,562-Speed 2623.81 samples/sec   Loss 7.8804   LearningRate 0.0345   Epoch: 8   Global Step: 342150   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:10:43,471-Speed 2620.36 samples/sec   Loss 8.0955   LearningRate 0.0345   Epoch: 8   Global Step: 342160   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:10:47,369-Speed 2627.18 samples/sec   Loss 7.9205   LearningRate 0.0345   Epoch: 8   Global Step: 342170   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:10:51,266-Speed 2628.96 samples/sec   Loss 7.8481   LearningRate 0.0345   Epoch: 8   Global Step: 342180   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:10:55,184-Speed 2614.23 samples/sec   Loss 7.8845   LearningRate 0.0345   Epoch: 8   Global Step: 342190   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:10:59,117-Speed 2604.26 samples/sec   Loss 8.0072   LearningRate 0.0345   Epoch: 8   Global Step: 342200   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:11:03,017-Speed 2626.17 samples/sec   Loss 7.8686   LearningRate 0.0345   Epoch: 8   Global Step: 342210   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:11:06,912-Speed 2629.81 samples/sec   Loss 7.8992   LearningRate 0.0345   Epoch: 8   Global Step: 342220   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:11:10,810-Speed 2627.93 samples/sec   Loss 8.0933   LearningRate 0.0345   Epoch: 8   Global Step: 342230   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:14,712-Speed 2624.47 samples/sec   Loss 7.9106   LearningRate 0.0345   Epoch: 8   Global Step: 342240   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:18,626-Speed 2617.84 samples/sec   Loss 7.8118   LearningRate 0.0345   Epoch: 8   Global Step: 342250   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:22,530-Speed 2623.59 samples/sec   Loss 7.9546   LearningRate 0.0345   Epoch: 8   Global Step: 342260   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:26,426-Speed 2629.15 samples/sec   Loss 7.9431   LearningRate 0.0345   Epoch: 8   Global Step: 342270   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:30,336-Speed 2618.90 samples/sec   Loss 7.8000   LearningRate 0.0345   Epoch: 8   Global Step: 342280   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:34,237-Speed 2626.21 samples/sec   Loss 7.9052   LearningRate 0.0345   Epoch: 8   Global Step: 342290   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:38,135-Speed 2627.18 samples/sec   Loss 7.8945   LearningRate 0.0345   Epoch: 8   Global Step: 342300   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:42,109-Speed 2577.75 samples/sec   Loss 7.9867   LearningRate 0.0345   Epoch: 8   Global Step: 342310   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:46,007-Speed 2627.18 samples/sec   Loss 7.9127   LearningRate 0.0345   Epoch: 8   Global Step: 342320   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:11:49,922-Speed 2616.81 samples/sec   Loss 7.7460   LearningRate 0.0345   Epoch: 8   Global Step: 342330   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:11:53,819-Speed 2628.52 samples/sec   Loss 7.9858   LearningRate 0.0345   Epoch: 8   Global Step: 342340   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:11:57,692-Speed 2644.40 samples/sec   Loss 7.8953   LearningRate 0.0345   Epoch: 8   Global Step: 342350   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:12:01,543-Speed 2659.67 samples/sec   Loss 8.7897   LearningRate 0.0345   Epoch: 8   Global Step: 342360   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:05,456-Speed 2617.96 samples/sec   Loss 7.8878   LearningRate 0.0345   Epoch: 8   Global Step: 342370   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:09,350-Speed 2629.88 samples/sec   Loss 7.8771   LearningRate 0.0345   Epoch: 8   Global Step: 342380   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:13,261-Speed 2619.42 samples/sec   Loss 8.0207   LearningRate 0.0345   Epoch: 8   Global Step: 342390   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:17,182-Speed 2612.08 samples/sec   Loss 7.9690   LearningRate 0.0345   Epoch: 8   Global Step: 342400   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:21,279-Speed 2499.68 samples/sec   Loss 7.8673   LearningRate 0.0345   Epoch: 8   Global Step: 342410   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:25,377-Speed 2499.78 samples/sec   Loss 7.8216   LearningRate 0.0345   Epoch: 8   Global Step: 342420   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:29,315-Speed 2601.33 samples/sec   Loss 7.9205   LearningRate 0.0345   Epoch: 8   Global Step: 342430   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:33,215-Speed 2626.10 samples/sec   Loss 7.7641   LearningRate 0.0345   Epoch: 8   Global Step: 342440   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:37,109-Speed 2629.71 samples/sec   Loss 7.9393   LearningRate 0.0345   Epoch: 8   Global Step: 342450   Fp16 Grad Scale: 4096   Required: 55 hours
Training: 2022-04-14 10:12:41,006-Speed 2628.53 samples/sec   Loss 8.0094   LearningRate 0.0345   Epoch: 8   Global Step: 342460   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:12:44,908-Speed 2625.18 samples/sec   Loss 8.0010   LearningRate 0.0345   Epoch: 8   Global Step: 342470   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:12:48,807-Speed 2626.97 samples/sec   Loss 7.8715   LearningRate 0.0345   Epoch: 8   Global Step: 342480   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:12:52,707-Speed 2625.80 samples/sec   Loss 7.9087   LearningRate 0.0345   Epoch: 8   Global Step: 342490   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:12:56,619-Speed 2618.62 samples/sec   Loss 7.8827   LearningRate 0.0345   Epoch: 8   Global Step: 342500   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:13:00,524-Speed 2623.26 samples/sec   Loss 7.8095   LearningRate 0.0345   Epoch: 8   Global Step: 342510   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:13:04,425-Speed 2625.19 samples/sec   Loss 7.9034   LearningRate 0.0345   Epoch: 8   Global Step: 342520   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:13:08,333-Speed 2620.67 samples/sec   Loss 7.9145   LearningRate 0.0345   Epoch: 8   Global Step: 342530   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:13:12,232-Speed 2627.25 samples/sec   Loss 8.0495   LearningRate 0.0345   Epoch: 8   Global Step: 342540   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:13:16,127-Speed 2629.34 samples/sec   Loss 7.8400   LearningRate 0.0345   Epoch: 8   Global Step: 342550   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:13:20,030-Speed 2624.83 samples/sec   Loss 7.8567   LearningRate 0.0345   Epoch: 8   Global Step: 342560   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:24,030-Speed 2560.44 samples/sec   Loss 7.9241   LearningRate 0.0345   Epoch: 8   Global Step: 342570   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:27,944-Speed 2616.79 samples/sec   Loss 7.9777   LearningRate 0.0345   Epoch: 8   Global Step: 342580   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:31,842-Speed 2627.56 samples/sec   Loss 7.8761   LearningRate 0.0345   Epoch: 8   Global Step: 342590   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:35,744-Speed 2624.83 samples/sec   Loss 7.9197   LearningRate 0.0345   Epoch: 8   Global Step: 342600   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:39,644-Speed 2626.12 samples/sec   Loss 7.8438   LearningRate 0.0345   Epoch: 8   Global Step: 342610   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:43,540-Speed 2629.41 samples/sec   Loss 7.9158   LearningRate 0.0345   Epoch: 8   Global Step: 342620   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:47,438-Speed 2627.16 samples/sec   Loss 7.9066   LearningRate 0.0345   Epoch: 8   Global Step: 342630   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:51,351-Speed 2618.10 samples/sec   Loss 7.9297   LearningRate 0.0345   Epoch: 8   Global Step: 342640   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:55,248-Speed 2628.41 samples/sec   Loss 7.8621   LearningRate 0.0345   Epoch: 8   Global Step: 342650   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:13:59,152-Speed 2623.67 samples/sec   Loss 7.9786   LearningRate 0.0345   Epoch: 8   Global Step: 342660   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:03,086-Speed 2603.23 samples/sec   Loss 7.8980   LearningRate 0.0344   Epoch: 8   Global Step: 342670   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:06,991-Speed 2623.23 samples/sec   Loss 7.9100   LearningRate 0.0344   Epoch: 8   Global Step: 342680   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:10,891-Speed 2626.15 samples/sec   Loss 7.8471   LearningRate 0.0344   Epoch: 8   Global Step: 342690   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:14,794-Speed 2624.30 samples/sec   Loss 7.9825   LearningRate 0.0344   Epoch: 8   Global Step: 342700   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:18,698-Speed 2624.10 samples/sec   Loss 7.9368   LearningRate 0.0344   Epoch: 8   Global Step: 342710   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:22,602-Speed 2623.29 samples/sec   Loss 7.8433   LearningRate 0.0344   Epoch: 8   Global Step: 342720   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:26,508-Speed 2623.01 samples/sec   Loss 7.9471   LearningRate 0.0344   Epoch: 8   Global Step: 342730   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:30,417-Speed 2619.83 samples/sec   Loss 7.9235   LearningRate 0.0344   Epoch: 8   Global Step: 342740   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:34,325-Speed 2621.45 samples/sec   Loss 7.8550   LearningRate 0.0344   Epoch: 8   Global Step: 342750   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:14:38,196-Speed 2646.03 samples/sec   Loss 8.2141   LearningRate 0.0344   Epoch: 8   Global Step: 342760   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:14:42,108-Speed 2617.50 samples/sec   Loss 7.9374   LearningRate 0.0344   Epoch: 8   Global Step: 342770   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:14:46,017-Speed 2620.22 samples/sec   Loss 7.8251   LearningRate 0.0344   Epoch: 8   Global Step: 342780   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:14:49,928-Speed 2619.40 samples/sec   Loss 7.8581   LearningRate 0.0344   Epoch: 8   Global Step: 342790   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:14:53,828-Speed 2626.42 samples/sec   Loss 8.0606   LearningRate 0.0344   Epoch: 8   Global Step: 342800   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:14:57,728-Speed 2625.88 samples/sec   Loss 7.9558   LearningRate 0.0344   Epoch: 8   Global Step: 342810   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:15:01,627-Speed 2627.71 samples/sec   Loss 7.8890   LearningRate 0.0344   Epoch: 8   Global Step: 342820   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:15:05,529-Speed 2624.34 samples/sec   Loss 7.8654   LearningRate 0.0344   Epoch: 8   Global Step: 342830   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:15:09,430-Speed 2625.66 samples/sec   Loss 7.9421   LearningRate 0.0344   Epoch: 8   Global Step: 342840   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:15:13,330-Speed 2625.88 samples/sec   Loss 7.9264   LearningRate 0.0344   Epoch: 8   Global Step: 342850   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-14 10:15:17,231-Speed 2626.02 samples/sec   Loss 7.8864   LearningRate 0.0344   Epoch: 8   Global Step: 342860   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:21,182-Speed 2592.78 samples/sec   Loss 7.8516   LearningRate 0.0344   Epoch: 8   Global Step: 342870   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:25,093-Speed 2618.51 samples/sec   Loss 7.8951   LearningRate 0.0344   Epoch: 8   Global Step: 342880   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:29,057-Speed 2584.39 samples/sec   Loss 8.0302   LearningRate 0.0344   Epoch: 8   Global Step: 342890   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:32,999-Speed 2598.59 samples/sec   Loss 7.9692   LearningRate 0.0344   Epoch: 8   Global Step: 342900   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:36,903-Speed 2623.26 samples/sec   Loss 7.9599   LearningRate 0.0344   Epoch: 8   Global Step: 342910   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:40,823-Speed 2613.01 samples/sec   Loss 7.9227   LearningRate 0.0344   Epoch: 8   Global Step: 342920   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:44,736-Speed 2617.19 samples/sec   Loss 8.5335   LearningRate 0.0344   Epoch: 8   Global Step: 342930   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:48,629-Speed 2631.24 samples/sec   Loss 8.3345   LearningRate 0.0344   Epoch: 8   Global Step: 342940   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:52,531-Speed 2625.05 samples/sec   Loss 8.0201   LearningRate 0.0344   Epoch: 8   Global Step: 342950   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:15:56,430-Speed 2627.32 samples/sec   Loss 7.9588   LearningRate 0.0344   Epoch: 8   Global Step: 342960   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:00,332-Speed 2624.64 samples/sec   Loss 7.9812   LearningRate 0.0344   Epoch: 8   Global Step: 342970   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:04,232-Speed 2626.14 samples/sec   Loss 7.9498   LearningRate 0.0344   Epoch: 8   Global Step: 342980   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:08,144-Speed 2618.09 samples/sec   Loss 7.9961   LearningRate 0.0344   Epoch: 8   Global Step: 342990   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:12,040-Speed 2628.67 samples/sec   Loss 7.9199   LearningRate 0.0344   Epoch: 8   Global Step: 343000   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:15,944-Speed 2623.90 samples/sec   Loss 7.9879   LearningRate 0.0344   Epoch: 8   Global Step: 343010   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:19,844-Speed 2626.15 samples/sec   Loss 7.9076   LearningRate 0.0344   Epoch: 8   Global Step: 343020   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:23,746-Speed 2625.42 samples/sec   Loss 7.8109   LearningRate 0.0344   Epoch: 8   Global Step: 343030   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:27,651-Speed 2622.30 samples/sec   Loss 7.8617   LearningRate 0.0344   Epoch: 8   Global Step: 343040   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:31,554-Speed 2623.96 samples/sec   Loss 8.0200   LearningRate 0.0344   Epoch: 8   Global Step: 343050   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:16:35,462-Speed 2621.21 samples/sec   Loss 7.9847   LearningRate 0.0344   Epoch: 8   Global Step: 343060   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:16:39,366-Speed 2623.93 samples/sec   Loss 7.9407   LearningRate 0.0344   Epoch: 8   Global Step: 343070   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:16:43,270-Speed 2623.60 samples/sec   Loss 7.9319   LearningRate 0.0344   Epoch: 8   Global Step: 343080   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:16:47,174-Speed 2623.81 samples/sec   Loss 7.8767   LearningRate 0.0344   Epoch: 8   Global Step: 343090   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:16:51,079-Speed 2622.68 samples/sec   Loss 8.0037   LearningRate 0.0344   Epoch: 8   Global Step: 343100   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:16:54,978-Speed 2627.08 samples/sec   Loss 8.0162   LearningRate 0.0344   Epoch: 8   Global Step: 343110   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:16:58,968-Speed 2566.43 samples/sec   Loss 7.9013   LearningRate 0.0344   Epoch: 8   Global Step: 343120   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:17:02,860-Speed 2632.08 samples/sec   Loss 7.9745   LearningRate 0.0344   Epoch: 8   Global Step: 343130   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:17:06,762-Speed 2625.23 samples/sec   Loss 7.9045   LearningRate 0.0344   Epoch: 8   Global Step: 343140   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:17:10,871-Speed 2492.80 samples/sec   Loss 8.4964   LearningRate 0.0344   Epoch: 8   Global Step: 343150   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:14,774-Speed 2624.41 samples/sec   Loss 8.0938   LearningRate 0.0344   Epoch: 8   Global Step: 343160   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:18,678-Speed 2623.40 samples/sec   Loss 8.1156   LearningRate 0.0344   Epoch: 8   Global Step: 343170   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:22,577-Speed 2626.55 samples/sec   Loss 7.9588   LearningRate 0.0344   Epoch: 8   Global Step: 343180   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:26,484-Speed 2621.82 samples/sec   Loss 7.8948   LearningRate 0.0344   Epoch: 8   Global Step: 343190   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:30,384-Speed 2626.35 samples/sec   Loss 7.9987   LearningRate 0.0344   Epoch: 8   Global Step: 343200   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:34,281-Speed 2628.15 samples/sec   Loss 7.9803   LearningRate 0.0344   Epoch: 8   Global Step: 343210   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:38,181-Speed 2625.72 samples/sec   Loss 7.8838   LearningRate 0.0344   Epoch: 8   Global Step: 343220   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:42,094-Speed 2618.00 samples/sec   Loss 7.8814   LearningRate 0.0344   Epoch: 8   Global Step: 343230   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:45,988-Speed 2630.37 samples/sec   Loss 7.8695   LearningRate 0.0344   Epoch: 8   Global Step: 343240   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:17:49,904-Speed 2615.43 samples/sec   Loss 7.9163   LearningRate 0.0344   Epoch: 8   Global Step: 343250   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:17:53,846-Speed 2598.10 samples/sec   Loss 7.8913   LearningRate 0.0344   Epoch: 8   Global Step: 343260   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:17:57,900-Speed 2526.87 samples/sec   Loss 7.9469   LearningRate 0.0344   Epoch: 8   Global Step: 343270   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:01,799-Speed 2626.54 samples/sec   Loss 7.9997   LearningRate 0.0344   Epoch: 8   Global Step: 343280   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:05,724-Speed 2609.20 samples/sec   Loss 7.9336   LearningRate 0.0344   Epoch: 8   Global Step: 343290   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:09,625-Speed 2625.93 samples/sec   Loss 7.8669   LearningRate 0.0344   Epoch: 8   Global Step: 343300   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:13,526-Speed 2625.78 samples/sec   Loss 7.8972   LearningRate 0.0344   Epoch: 8   Global Step: 343310   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:17,427-Speed 2625.65 samples/sec   Loss 7.9307   LearningRate 0.0344   Epoch: 8   Global Step: 343320   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:21,343-Speed 2615.41 samples/sec   Loss 7.9119   LearningRate 0.0344   Epoch: 8   Global Step: 343330   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:25,764-Speed 2317.06 samples/sec   Loss 7.8637   LearningRate 0.0344   Epoch: 8   Global Step: 343340   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:18:29,659-Speed 2629.54 samples/sec   Loss 7.8165   LearningRate 0.0344   Epoch: 8   Global Step: 343350   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:18:33,580-Speed 2612.58 samples/sec   Loss 7.9034   LearningRate 0.0344   Epoch: 8   Global Step: 343360   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:18:37,480-Speed 2625.81 samples/sec   Loss 7.9102   LearningRate 0.0344   Epoch: 8   Global Step: 343370   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:18:41,382-Speed 2625.10 samples/sec   Loss 8.0062   LearningRate 0.0343   Epoch: 8   Global Step: 343380   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:18:45,283-Speed 2625.69 samples/sec   Loss 7.9011   LearningRate 0.0343   Epoch: 8   Global Step: 343390   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:18:49,185-Speed 2625.33 samples/sec   Loss 7.8710   LearningRate 0.0343   Epoch: 8   Global Step: 343400   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:18:53,093-Speed 2620.52 samples/sec   Loss 7.9260   LearningRate 0.0343   Epoch: 8   Global Step: 343410   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:18:56,990-Speed 2628.50 samples/sec   Loss 7.7146   LearningRate 0.0343   Epoch: 8   Global Step: 343420   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:19:00,892-Speed 2625.42 samples/sec   Loss 8.0502   LearningRate 0.0343   Epoch: 8   Global Step: 343430   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:19:04,793-Speed 2625.53 samples/sec   Loss 7.9394   LearningRate 0.0343   Epoch: 8   Global Step: 343440   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:19:08,698-Speed 2622.74 samples/sec   Loss 7.8623   LearningRate 0.0343   Epoch: 8   Global Step: 343450   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:12,615-Speed 2615.17 samples/sec   Loss 7.9480   LearningRate 0.0343   Epoch: 8   Global Step: 343460   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:16,534-Speed 2613.25 samples/sec   Loss 7.7824   LearningRate 0.0343   Epoch: 8   Global Step: 343470   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:20,443-Speed 2620.79 samples/sec   Loss 7.9526   LearningRate 0.0343   Epoch: 8   Global Step: 343480   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:24,373-Speed 2606.07 samples/sec   Loss 8.0242   LearningRate 0.0343   Epoch: 8   Global Step: 343490   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:28,276-Speed 2624.44 samples/sec   Loss 7.9020   LearningRate 0.0343   Epoch: 8   Global Step: 343500   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:32,187-Speed 2618.78 samples/sec   Loss 7.9917   LearningRate 0.0343   Epoch: 8   Global Step: 343510   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:36,123-Speed 2602.22 samples/sec   Loss 7.9417   LearningRate 0.0343   Epoch: 8   Global Step: 343520   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:40,025-Speed 2624.87 samples/sec   Loss 7.8452   LearningRate 0.0343   Epoch: 8   Global Step: 343530   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:43,926-Speed 2625.98 samples/sec   Loss 7.7171   LearningRate 0.0343   Epoch: 8   Global Step: 343540   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:47,808-Speed 2638.12 samples/sec   Loss 7.9090   LearningRate 0.0343   Epoch: 8   Global Step: 343550   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:51,720-Speed 2618.69 samples/sec   Loss 7.8575   LearningRate 0.0343   Epoch: 8   Global Step: 343560   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:55,612-Speed 2631.44 samples/sec   Loss 7.8959   LearningRate 0.0343   Epoch: 8   Global Step: 343570   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:19:59,513-Speed 2625.89 samples/sec   Loss 7.9508   LearningRate 0.0343   Epoch: 8   Global Step: 343580   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:20:03,417-Speed 2624.13 samples/sec   Loss 7.9215   LearningRate 0.0343   Epoch: 8   Global Step: 343590   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:20:07,318-Speed 2624.80 samples/sec   Loss 8.0183   LearningRate 0.0343   Epoch: 8   Global Step: 343600   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:20:11,223-Speed 2623.63 samples/sec   Loss 7.8771   LearningRate 0.0343   Epoch: 8   Global Step: 343610   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:20:15,124-Speed 2625.83 samples/sec   Loss 7.8974   LearningRate 0.0343   Epoch: 8   Global Step: 343620   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:20:19,055-Speed 2605.39 samples/sec   Loss 7.9814   LearningRate 0.0343   Epoch: 8   Global Step: 343630   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:20:22,955-Speed 2626.03 samples/sec   Loss 7.8426   LearningRate 0.0343   Epoch: 8   Global Step: 343640   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:20:26,815-Speed 2653.92 samples/sec   Loss 8.3507   LearningRate 0.0343   Epoch: 8   Global Step: 343650   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:30,714-Speed 2626.69 samples/sec   Loss 8.2890   LearningRate 0.0343   Epoch: 8   Global Step: 343660   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:34,619-Speed 2623.79 samples/sec   Loss 7.9839   LearningRate 0.0343   Epoch: 8   Global Step: 343670   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:38,524-Speed 2622.63 samples/sec   Loss 7.9309   LearningRate 0.0343   Epoch: 8   Global Step: 343680   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:42,433-Speed 2620.38 samples/sec   Loss 7.9292   LearningRate 0.0343   Epoch: 8   Global Step: 343690   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:46,340-Speed 2621.26 samples/sec   Loss 7.9746   LearningRate 0.0343   Epoch: 8   Global Step: 343700   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:50,244-Speed 2624.00 samples/sec   Loss 8.0519   LearningRate 0.0343   Epoch: 8   Global Step: 343710   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:54,184-Speed 2599.93 samples/sec   Loss 7.8018   LearningRate 0.0343   Epoch: 8   Global Step: 343720   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:20:58,084-Speed 2626.65 samples/sec   Loss 7.8848   LearningRate 0.0343   Epoch: 8   Global Step: 343730   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:01,985-Speed 2625.57 samples/sec   Loss 7.7898   LearningRate 0.0343   Epoch: 8   Global Step: 343740   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:05,897-Speed 2618.33 samples/sec   Loss 8.1016   LearningRate 0.0343   Epoch: 8   Global Step: 343750   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:21:09,807-Speed 2619.48 samples/sec   Loss 7.9748   LearningRate 0.0343   Epoch: 8   Global Step: 343760   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:21:13,708-Speed 2625.98 samples/sec   Loss 8.0036   LearningRate 0.0343   Epoch: 8   Global Step: 343770   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:21:17,612-Speed 2623.49 samples/sec   Loss 7.9259   LearningRate 0.0343   Epoch: 8   Global Step: 343780   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:21:21,611-Speed 2560.65 samples/sec   Loss 7.9538   LearningRate 0.0343   Epoch: 8   Global Step: 343790   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:21:25,706-Speed 2501.86 samples/sec   Loss 7.8550   LearningRate 0.0343   Epoch: 8   Global Step: 343800   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:21:29,806-Speed 2497.89 samples/sec   Loss 7.8359   LearningRate 0.0343   Epoch: 8   Global Step: 343810   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:21:33,744-Speed 2601.09 samples/sec   Loss 8.7317   LearningRate 0.0343   Epoch: 8   Global Step: 343820   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:37,655-Speed 2618.69 samples/sec   Loss 8.4939   LearningRate 0.0343   Epoch: 8   Global Step: 343830   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:41,562-Speed 2622.11 samples/sec   Loss 8.0353   LearningRate 0.0343   Epoch: 8   Global Step: 343840   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:45,466-Speed 2623.23 samples/sec   Loss 7.9259   LearningRate 0.0343   Epoch: 8   Global Step: 343850   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:49,400-Speed 2603.91 samples/sec   Loss 7.9249   LearningRate 0.0343   Epoch: 8   Global Step: 343860   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:53,356-Speed 2589.36 samples/sec   Loss 7.9028   LearningRate 0.0343   Epoch: 8   Global Step: 343870   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:21:57,254-Speed 2627.34 samples/sec   Loss 7.9552   LearningRate 0.0343   Epoch: 8   Global Step: 343880   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:22:01,177-Speed 2610.83 samples/sec   Loss 7.9651   LearningRate 0.0343   Epoch: 8   Global Step: 343890   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:22:05,076-Speed 2627.52 samples/sec   Loss 8.0174   LearningRate 0.0343   Epoch: 8   Global Step: 343900   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:22:08,976-Speed 2626.22 samples/sec   Loss 7.9624   LearningRate 0.0343   Epoch: 8   Global Step: 343910   Fp16 Grad Scale: 16384   Required: 55 hours
Training: 2022-04-14 10:22:12,878-Speed 2624.92 samples/sec   Loss 7.8906   LearningRate 0.0343   Epoch: 8   Global Step: 343920   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:16,781-Speed 2624.15 samples/sec   Loss 7.8734   LearningRate 0.0343   Epoch: 8   Global Step: 343930   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:20,696-Speed 2616.24 samples/sec   Loss 7.9260   LearningRate 0.0343   Epoch: 8   Global Step: 343940   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:24,600-Speed 2623.92 samples/sec   Loss 7.9233   LearningRate 0.0343   Epoch: 8   Global Step: 343950   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:28,500-Speed 2625.88 samples/sec   Loss 7.9950   LearningRate 0.0343   Epoch: 8   Global Step: 343960   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:32,409-Speed 2620.75 samples/sec   Loss 7.9261   LearningRate 0.0343   Epoch: 8   Global Step: 343970   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:36,312-Speed 2624.46 samples/sec   Loss 7.9126   LearningRate 0.0343   Epoch: 8   Global Step: 343980   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:40,210-Speed 2627.62 samples/sec   Loss 7.9646   LearningRate 0.0343   Epoch: 8   Global Step: 343990   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:44,109-Speed 2626.97 samples/sec   Loss 7.7875   LearningRate 0.0343   Epoch: 8   Global Step: 344000   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:48,012-Speed 2624.36 samples/sec   Loss 7.9603   LearningRate 0.0343   Epoch: 8   Global Step: 344010   Fp16 Grad Scale: 32768   Required: 55 hours
Training: 2022-04-14 10:22:51,909-Speed 2628.14 samples/sec   Loss 7.7901   LearningRate 0.0343   Epoch: 8   Global Step: 344020   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:22:55,810-Speed 2625.64 samples/sec   Loss 7.8518   LearningRate 0.0343   Epoch: 8   Global Step: 344030   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:22:59,710-Speed 2626.77 samples/sec   Loss 7.8894   LearningRate 0.0343   Epoch: 8   Global Step: 344040   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:03,610-Speed 2626.13 samples/sec   Loss 7.8350   LearningRate 0.0343   Epoch: 8   Global Step: 344050   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:07,511-Speed 2625.68 samples/sec   Loss 7.9356   LearningRate 0.0343   Epoch: 8   Global Step: 344060   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:11,487-Speed 2633.62 samples/sec   Loss 7.9871   LearningRate 0.0343   Epoch: 8   Global Step: 344070   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:17,636-Speed 2635.25 samples/sec   Loss 7.9165   LearningRate 0.0343   Epoch: 8   Global Step: 344080   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:21,569-Speed 2634.75 samples/sec   Loss 7.8219   LearningRate 0.0342   Epoch: 8   Global Step: 344090   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:25,844-Speed 2633.53 samples/sec   Loss 7.8738   LearningRate 0.0342   Epoch: 8   Global Step: 344100   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:29,742-Speed 2627.56 samples/sec   Loss 7.8206   LearningRate 0.0342   Epoch: 8   Global Step: 344110   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:23:34,348-Speed 2558.60 samples/sec   Loss 7.9399   LearningRate 0.0342   Epoch: 8   Global Step: 344120   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:23:38,247-Speed 2627.00 samples/sec   Loss 8.0391   LearningRate 0.0342   Epoch: 8   Global Step: 344130   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:23:42,150-Speed 2623.89 samples/sec   Loss 8.0710   LearningRate 0.0342   Epoch: 8   Global Step: 344140   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:23:46,166-Speed 2629.86 samples/sec   Loss 7.9274   LearningRate 0.0342   Epoch: 8   Global Step: 344150   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:23:50,072-Speed 2622.52 samples/sec   Loss 8.0546   LearningRate 0.0342   Epoch: 8   Global Step: 344160   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:23:53,975-Speed 2624.16 samples/sec   Loss 7.9103   LearningRate 0.0342   Epoch: 8   Global Step: 344170   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:23:57,875-Speed 2630.69 samples/sec   Loss 7.8566   LearningRate 0.0342   Epoch: 8   Global Step: 344180   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:01,779-Speed 2623.86 samples/sec   Loss 7.7637   LearningRate 0.0342   Epoch: 8   Global Step: 344190   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:05,675-Speed 2628.97 samples/sec   Loss 7.8268   LearningRate 0.0342   Epoch: 8   Global Step: 344200   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:09,573-Speed 2627.69 samples/sec   Loss 7.8255   LearningRate 0.0342   Epoch: 8   Global Step: 344210   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:13,468-Speed 2629.21 samples/sec   Loss 7.8713   LearningRate 0.0342   Epoch: 8   Global Step: 344220   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:17,368-Speed 2626.40 samples/sec   Loss 7.9797   LearningRate 0.0342   Epoch: 8   Global Step: 344230   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:21,282-Speed 2617.44 samples/sec   Loss 7.9842   LearningRate 0.0342   Epoch: 8   Global Step: 344240   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:25,187-Speed 2622.88 samples/sec   Loss 7.8225   LearningRate 0.0342   Epoch: 8   Global Step: 344250   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:29,092-Speed 2622.90 samples/sec   Loss 7.8924   LearningRate 0.0342   Epoch: 8   Global Step: 344260   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:32,996-Speed 2624.20 samples/sec   Loss 7.8847   LearningRate 0.0342   Epoch: 8   Global Step: 344270   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:36,896-Speed 2626.32 samples/sec   Loss 7.8366   LearningRate 0.0342   Epoch: 8   Global Step: 344280   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:40,875-Speed 2574.00 samples/sec   Loss 7.9456   LearningRate 0.0342   Epoch: 8   Global Step: 344290   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:44,778-Speed 2624.26 samples/sec   Loss 7.8950   LearningRate 0.0342   Epoch: 8   Global Step: 344300   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:48,692-Speed 2616.88 samples/sec   Loss 7.8650   LearningRate 0.0342   Epoch: 8   Global Step: 344310   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:24:52,621-Speed 2606.71 samples/sec   Loss 7.8948   LearningRate 0.0342   Epoch: 8   Global Step: 344320   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:24:56,571-Speed 2593.39 samples/sec   Loss 7.8113   LearningRate 0.0342   Epoch: 8   Global Step: 344330   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:25:00,484-Speed 2617.59 samples/sec   Loss 8.0525   LearningRate 0.0342   Epoch: 8   Global Step: 344340   Fp16 Grad Scale: 262144   Required: 55 hours
Training: 2022-04-14 10:25:04,372-Speed 2634.63 samples/sec   Loss 7.9327   LearningRate 0.0342   Epoch: 8   Global Step: 344350   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:25:08,278-Speed 2622.42 samples/sec   Loss 7.8092   LearningRate 0.0342   Epoch: 8   Global Step: 344360   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:25:12,184-Speed 2621.92 samples/sec   Loss 7.8199   LearningRate 0.0342   Epoch: 8   Global Step: 344370   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:25:16,083-Speed 2627.13 samples/sec   Loss 7.8280   LearningRate 0.0342   Epoch: 8   Global Step: 344380   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:25:19,989-Speed 2622.58 samples/sec   Loss 7.8747   LearningRate 0.0342   Epoch: 8   Global Step: 344390   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:25:23,901-Speed 2618.29 samples/sec   Loss 7.8363   LearningRate 0.0342   Epoch: 8   Global Step: 344400   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:25:27,800-Speed 2627.06 samples/sec   Loss 7.8189   LearningRate 0.0342   Epoch: 8   Global Step: 344410   Fp16 Grad Scale: 131072   Required: 55 hours
Training: 2022-04-14 10:25:31,683-Speed 2638.04 samples/sec   Loss 7.6890   LearningRate 0.0342   Epoch: 8   Global Step: 344420   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:25:35,584-Speed 2626.33 samples/sec   Loss 7.9643   LearningRate 0.0342   Epoch: 8   Global Step: 344430   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:25:39,489-Speed 2623.06 samples/sec   Loss 7.8943   LearningRate 0.0342   Epoch: 8   Global Step: 344440   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:25:43,390-Speed 2624.84 samples/sec   Loss 7.8186   LearningRate 0.0342   Epoch: 8   Global Step: 344450   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:25:47,288-Speed 2627.91 samples/sec   Loss 7.7636   LearningRate 0.0342   Epoch: 8   Global Step: 344460   Fp16 Grad Scale: 65536   Required: 55 hours
Training: 2022-04-14 10:25:51,186-Speed 2627.38 samples/sec   Loss 7.8479   LearningRate 0.0342   Epoch: 8   Global Step: 344470   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:25:55,089-Speed 2624.59 samples/sec   Loss 7.9047   LearningRate 0.0342   Epoch: 8   Global Step: 344480   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:25:58,989-Speed 2627.41 samples/sec   Loss 7.9527   LearningRate 0.0342   Epoch: 8   Global Step: 344490   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:26:02,891-Speed 2624.72 samples/sec   Loss 7.8730   LearningRate 0.0342   Epoch: 8   Global Step: 344500   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:26:06,791-Speed 2626.33 samples/sec   Loss 8.0033   LearningRate 0.0342   Epoch: 8   Global Step: 344510   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:26:10,695-Speed 2623.60 samples/sec   Loss 8.0000   LearningRate 0.0342   Epoch: 8   Global Step: 344520   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:26:14,596-Speed 2625.25 samples/sec   Loss 7.8658   LearningRate 0.0342   Epoch: 8   Global Step: 344530   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:26:18,495-Speed 2627.30 samples/sec   Loss 7.8347   LearningRate 0.0342   Epoch: 8   Global Step: 344540   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:26:22,394-Speed 2626.85 samples/sec   Loss 7.9306   LearningRate 0.0342   Epoch: 8   Global Step: 344550   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:26:26,274-Speed 2640.71 samples/sec   Loss 7.9302   LearningRate 0.0342   Epoch: 8   Global Step: 344560   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:30,182-Speed 2620.82 samples/sec   Loss 7.8399   LearningRate 0.0342   Epoch: 8   Global Step: 344570   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:34,096-Speed 2616.52 samples/sec   Loss 8.0043   LearningRate 0.0342   Epoch: 8   Global Step: 344580   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:38,015-Speed 2613.55 samples/sec   Loss 8.0199   LearningRate 0.0342   Epoch: 8   Global Step: 344590   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:41,913-Speed 2628.28 samples/sec   Loss 7.8215   LearningRate 0.0342   Epoch: 8   Global Step: 344600   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:45,814-Speed 2625.84 samples/sec   Loss 7.9068   LearningRate 0.0342   Epoch: 8   Global Step: 344610   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:49,707-Speed 2630.20 samples/sec   Loss 7.9176   LearningRate 0.0342   Epoch: 8   Global Step: 344620   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:53,610-Speed 2624.25 samples/sec   Loss 7.7155   LearningRate 0.0342   Epoch: 8   Global Step: 344630   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:26:57,506-Speed 2629.47 samples/sec   Loss 7.8267   LearningRate 0.0342   Epoch: 8   Global Step: 344640   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:01,413-Speed 2622.00 samples/sec   Loss 7.7969   LearningRate 0.0342   Epoch: 8   Global Step: 344650   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:05,309-Speed 2628.85 samples/sec   Loss 7.8175   LearningRate 0.0342   Epoch: 8   Global Step: 344660   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:27:09,219-Speed 2619.89 samples/sec   Loss 7.8889   LearningRate 0.0342   Epoch: 8   Global Step: 344670   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:27:13,117-Speed 2627.28 samples/sec   Loss 7.8732   LearningRate 0.0342   Epoch: 8   Global Step: 344680   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:27:17,016-Speed 2626.96 samples/sec   Loss 7.7636   LearningRate 0.0342   Epoch: 8   Global Step: 344690   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:27:20,912-Speed 2628.95 samples/sec   Loss 7.8892   LearningRate 0.0342   Epoch: 8   Global Step: 344700   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:27:24,810-Speed 2628.06 samples/sec   Loss 7.7887   LearningRate 0.0342   Epoch: 8   Global Step: 344710   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:27:28,693-Speed 2638.06 samples/sec   Loss 7.8919   LearningRate 0.0342   Epoch: 8   Global Step: 344720   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:32,591-Speed 2627.62 samples/sec   Loss 7.9255   LearningRate 0.0342   Epoch: 8   Global Step: 344730   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:36,489-Speed 2627.39 samples/sec   Loss 7.8386   LearningRate 0.0342   Epoch: 8   Global Step: 344740   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:40,411-Speed 2611.90 samples/sec   Loss 7.8712   LearningRate 0.0342   Epoch: 8   Global Step: 344750   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:44,311-Speed 2625.97 samples/sec   Loss 7.7699   LearningRate 0.0342   Epoch: 8   Global Step: 344760   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:48,210-Speed 2626.80 samples/sec   Loss 7.8505   LearningRate 0.0342   Epoch: 8   Global Step: 344770   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:52,113-Speed 2624.36 samples/sec   Loss 7.8846   LearningRate 0.0342   Epoch: 8   Global Step: 344780   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:56,005-Speed 2631.42 samples/sec   Loss 7.9297   LearningRate 0.0342   Epoch: 8   Global Step: 344790   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:27:59,905-Speed 2626.29 samples/sec   Loss 7.8681   LearningRate 0.0341   Epoch: 8   Global Step: 344800   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:28:03,811-Speed 2623.09 samples/sec   Loss 7.9260   LearningRate 0.0341   Epoch: 8   Global Step: 344810   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:28:07,730-Speed 2613.30 samples/sec   Loss 7.9141   LearningRate 0.0341   Epoch: 8   Global Step: 344820   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:11,639-Speed 2620.58 samples/sec   Loss 7.9327   LearningRate 0.0341   Epoch: 8   Global Step: 344830   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:15,542-Speed 2624.27 samples/sec   Loss 7.8110   LearningRate 0.0341   Epoch: 8   Global Step: 344840   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:19,443-Speed 2625.50 samples/sec   Loss 7.8188   LearningRate 0.0341   Epoch: 8   Global Step: 344850   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:23,342-Speed 2627.23 samples/sec   Loss 7.7880   LearningRate 0.0341   Epoch: 8   Global Step: 344860   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:27,237-Speed 2629.79 samples/sec   Loss 7.9714   LearningRate 0.0341   Epoch: 8   Global Step: 344870   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:31,132-Speed 2629.25 samples/sec   Loss 7.9231   LearningRate 0.0341   Epoch: 8   Global Step: 344880   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:35,031-Speed 2626.98 samples/sec   Loss 7.8557   LearningRate 0.0341   Epoch: 8   Global Step: 344890   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:38,927-Speed 2628.54 samples/sec   Loss 7.9344   LearningRate 0.0341   Epoch: 8   Global Step: 344900   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:42,826-Speed 2627.99 samples/sec   Loss 7.8748   LearningRate 0.0341   Epoch: 8   Global Step: 344910   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:28:46,720-Speed 2630.82 samples/sec   Loss 7.9061   LearningRate 0.0341   Epoch: 8   Global Step: 344920   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:28:50,546-Speed 2677.13 samples/sec   Loss 8.1705   LearningRate 0.0341   Epoch: 8   Global Step: 344930   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:28:54,447-Speed 2625.51 samples/sec   Loss 8.5245   LearningRate 0.0341   Epoch: 8   Global Step: 344940   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:28:58,338-Speed 2632.81 samples/sec   Loss 8.1137   LearningRate 0.0341   Epoch: 8   Global Step: 344950   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:02,232-Speed 2629.91 samples/sec   Loss 8.0001   LearningRate 0.0341   Epoch: 8   Global Step: 344960   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:06,128-Speed 2629.27 samples/sec   Loss 7.9297   LearningRate 0.0341   Epoch: 8   Global Step: 344970   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:10,031-Speed 2624.23 samples/sec   Loss 7.8084   LearningRate 0.0341   Epoch: 8   Global Step: 344980   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:13,929-Speed 2627.51 samples/sec   Loss 7.9958   LearningRate 0.0341   Epoch: 8   Global Step: 344990   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:17,926-Speed 2562.98 samples/sec   Loss 7.8248   LearningRate 0.0341   Epoch: 8   Global Step: 345000   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:21,886-Speed 2586.48 samples/sec   Loss 7.8522   LearningRate 0.0341   Epoch: 8   Global Step: 345010   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:25,784-Speed 2627.37 samples/sec   Loss 7.7826   LearningRate 0.0341   Epoch: 8   Global Step: 345020   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:29:29,684-Speed 2626.76 samples/sec   Loss 7.8064   LearningRate 0.0341   Epoch: 8   Global Step: 345030   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:29:33,600-Speed 2615.27 samples/sec   Loss 7.8404   LearningRate 0.0341   Epoch: 8   Global Step: 345040   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:29:37,496-Speed 2629.05 samples/sec   Loss 7.8901   LearningRate 0.0341   Epoch: 8   Global Step: 345050   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:29:41,408-Speed 2617.54 samples/sec   Loss 7.8451   LearningRate 0.0341   Epoch: 8   Global Step: 345060   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:29:45,310-Speed 2625.42 samples/sec   Loss 8.0110   LearningRate 0.0341   Epoch: 8   Global Step: 345070   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:29:49,241-Speed 2605.46 samples/sec   Loss 7.7627   LearningRate 0.0341   Epoch: 8   Global Step: 345080   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:29:53,139-Speed 2628.09 samples/sec   Loss 7.8378   LearningRate 0.0341   Epoch: 8   Global Step: 345090   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:29:57,036-Speed 2627.73 samples/sec   Loss 7.9626   LearningRate 0.0341   Epoch: 8   Global Step: 345100   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:30:00,941-Speed 2624.32 samples/sec   Loss 7.7687   LearningRate 0.0341   Epoch: 8   Global Step: 345110   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:30:04,848-Speed 2621.51 samples/sec   Loss 7.9925   LearningRate 0.0341   Epoch: 8   Global Step: 345120   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:30:08,755-Speed 2621.32 samples/sec   Loss 7.9093   LearningRate 0.0341   Epoch: 8   Global Step: 345130   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:12,719-Speed 2583.69 samples/sec   Loss 7.6971   LearningRate 0.0341   Epoch: 8   Global Step: 345140   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:16,661-Speed 2598.35 samples/sec   Loss 7.8716   LearningRate 0.0341   Epoch: 8   Global Step: 345150   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:20,560-Speed 2627.04 samples/sec   Loss 7.9349   LearningRate 0.0341   Epoch: 8   Global Step: 345160   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:24,507-Speed 2595.25 samples/sec   Loss 7.9160   LearningRate 0.0341   Epoch: 8   Global Step: 345170   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:28,417-Speed 2619.25 samples/sec   Loss 7.8188   LearningRate 0.0341   Epoch: 8   Global Step: 345180   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:32,334-Speed 2615.12 samples/sec   Loss 7.9551   LearningRate 0.0341   Epoch: 8   Global Step: 345190   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:36,246-Speed 2618.44 samples/sec   Loss 7.8733   LearningRate 0.0341   Epoch: 8   Global Step: 345200   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:40,172-Speed 2608.68 samples/sec   Loss 7.8140   LearningRate 0.0341   Epoch: 8   Global Step: 345210   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:44,069-Speed 2628.29 samples/sec   Loss 7.7297   LearningRate 0.0341   Epoch: 8   Global Step: 345220   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:30:47,971-Speed 2624.99 samples/sec   Loss 7.9628   LearningRate 0.0341   Epoch: 8   Global Step: 345230   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:30:51,875-Speed 2623.71 samples/sec   Loss 7.6836   LearningRate 0.0341   Epoch: 8   Global Step: 345240   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:30:55,777-Speed 2625.28 samples/sec   Loss 7.8826   LearningRate 0.0341   Epoch: 8   Global Step: 345250   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:30:59,673-Speed 2629.03 samples/sec   Loss 7.8541   LearningRate 0.0341   Epoch: 8   Global Step: 345260   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:31:03,541-Speed 2647.63 samples/sec   Loss 8.3049   LearningRate 0.0341   Epoch: 8   Global Step: 345270   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:07,448-Speed 2622.12 samples/sec   Loss 8.9489   LearningRate 0.0341   Epoch: 8   Global Step: 345280   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:11,339-Speed 2631.74 samples/sec   Loss 8.0012   LearningRate 0.0341   Epoch: 8   Global Step: 345290   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:15,236-Speed 2628.97 samples/sec   Loss 7.9733   LearningRate 0.0341   Epoch: 8   Global Step: 345300   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:19,132-Speed 2629.56 samples/sec   Loss 7.8927   LearningRate 0.0341   Epoch: 8   Global Step: 345310   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:23,030-Speed 2627.79 samples/sec   Loss 7.8236   LearningRate 0.0341   Epoch: 8   Global Step: 345320   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:26,960-Speed 2605.95 samples/sec   Loss 7.9125   LearningRate 0.0341   Epoch: 8   Global Step: 345330   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:30,868-Speed 2621.54 samples/sec   Loss 7.9849   LearningRate 0.0341   Epoch: 8   Global Step: 345340   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:34,761-Speed 2630.88 samples/sec   Loss 7.7005   LearningRate 0.0341   Epoch: 8   Global Step: 345350   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:38,660-Speed 2627.11 samples/sec   Loss 7.8426   LearningRate 0.0341   Epoch: 8   Global Step: 345360   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:31:42,556-Speed 2628.21 samples/sec   Loss 7.8181   LearningRate 0.0341   Epoch: 8   Global Step: 345370   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:31:46,452-Speed 2629.46 samples/sec   Loss 7.8950   LearningRate 0.0341   Epoch: 8   Global Step: 345380   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:31:50,363-Speed 2618.93 samples/sec   Loss 7.9274   LearningRate 0.0341   Epoch: 8   Global Step: 345390   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:31:54,261-Speed 2627.63 samples/sec   Loss 7.8563   LearningRate 0.0341   Epoch: 8   Global Step: 345400   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:31:58,173-Speed 2618.04 samples/sec   Loss 7.7982   LearningRate 0.0341   Epoch: 8   Global Step: 345410   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:32:02,090-Speed 2615.29 samples/sec   Loss 7.8781   LearningRate 0.0341   Epoch: 8   Global Step: 345420   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:32:06,002-Speed 2617.74 samples/sec   Loss 7.8484   LearningRate 0.0341   Epoch: 8   Global Step: 345430   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:32:09,904-Speed 2625.66 samples/sec   Loss 7.8630   LearningRate 0.0341   Epoch: 8   Global Step: 345440   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:32:13,798-Speed 2630.07 samples/sec   Loss 8.1017   LearningRate 0.0341   Epoch: 8   Global Step: 345450   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:32:17,700-Speed 2624.99 samples/sec   Loss 8.0147   LearningRate 0.0341   Epoch: 8   Global Step: 345460   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:32:21,601-Speed 2625.39 samples/sec   Loss 7.9137   LearningRate 0.0341   Epoch: 8   Global Step: 345470   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:25,499-Speed 2627.71 samples/sec   Loss 7.9250   LearningRate 0.0341   Epoch: 8   Global Step: 345480   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:29,397-Speed 2628.03 samples/sec   Loss 7.8693   LearningRate 0.0341   Epoch: 8   Global Step: 345490   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:33,293-Speed 2629.17 samples/sec   Loss 7.8976   LearningRate 0.0341   Epoch: 8   Global Step: 345500   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:37,201-Speed 2620.93 samples/sec   Loss 7.8967   LearningRate 0.0340   Epoch: 8   Global Step: 345510   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:41,099-Speed 2627.45 samples/sec   Loss 7.7663   LearningRate 0.0340   Epoch: 8   Global Step: 345520   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:44,995-Speed 2628.64 samples/sec   Loss 7.8249   LearningRate 0.0340   Epoch: 8   Global Step: 345530   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:48,968-Speed 2577.79 samples/sec   Loss 7.8093   LearningRate 0.0340   Epoch: 8   Global Step: 345540   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:52,867-Speed 2627.52 samples/sec   Loss 7.8893   LearningRate 0.0340   Epoch: 8   Global Step: 345550   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:32:56,779-Speed 2618.17 samples/sec   Loss 7.9519   LearningRate 0.0340   Epoch: 8   Global Step: 345560   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:00,680-Speed 2625.63 samples/sec   Loss 7.8887   LearningRate 0.0340   Epoch: 8   Global Step: 345570   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:33:04,717-Speed 2537.03 samples/sec   Loss 7.8996   LearningRate 0.0340   Epoch: 8   Global Step: 345580   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:33:08,707-Speed 2567.66 samples/sec   Loss 7.9283   LearningRate 0.0340   Epoch: 8   Global Step: 345590   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:33:12,601-Speed 2630.54 samples/sec   Loss 7.9674   LearningRate 0.0340   Epoch: 8   Global Step: 345600   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:33:16,494-Speed 2630.35 samples/sec   Loss 7.8233   LearningRate 0.0340   Epoch: 8   Global Step: 345610   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:33:20,397-Speed 2624.15 samples/sec   Loss 7.7639   LearningRate 0.0340   Epoch: 8   Global Step: 345620   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:33:24,280-Speed 2637.45 samples/sec   Loss 7.9116   LearningRate 0.0340   Epoch: 8   Global Step: 345630   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:28,181-Speed 2626.37 samples/sec   Loss 7.8903   LearningRate 0.0340   Epoch: 8   Global Step: 345640   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:32,083-Speed 2624.96 samples/sec   Loss 7.8472   LearningRate 0.0340   Epoch: 8   Global Step: 345650   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:35,988-Speed 2622.96 samples/sec   Loss 8.0819   LearningRate 0.0340   Epoch: 8   Global Step: 345660   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:39,889-Speed 2625.62 samples/sec   Loss 7.8474   LearningRate 0.0340   Epoch: 8   Global Step: 345670   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:43,788-Speed 2626.91 samples/sec   Loss 7.8315   LearningRate 0.0340   Epoch: 8   Global Step: 345680   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:47,687-Speed 2626.67 samples/sec   Loss 7.8676   LearningRate 0.0340   Epoch: 8   Global Step: 345690   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:51,579-Speed 2631.62 samples/sec   Loss 7.9161   LearningRate 0.0340   Epoch: 8   Global Step: 345700   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:55,473-Speed 2630.70 samples/sec   Loss 7.8074   LearningRate 0.0340   Epoch: 8   Global Step: 345710   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:33:59,381-Speed 2620.85 samples/sec   Loss 7.8625   LearningRate 0.0340   Epoch: 8   Global Step: 345720   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:34:03,271-Speed 2633.50 samples/sec   Loss 7.8755   LearningRate 0.0340   Epoch: 8   Global Step: 345730   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:07,165-Speed 2630.12 samples/sec   Loss 7.8098   LearningRate 0.0340   Epoch: 8   Global Step: 345740   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:11,066-Speed 2626.07 samples/sec   Loss 7.9368   LearningRate 0.0340   Epoch: 8   Global Step: 345750   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:14,958-Speed 2631.28 samples/sec   Loss 7.8570   LearningRate 0.0340   Epoch: 8   Global Step: 345760   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:18,852-Speed 2630.48 samples/sec   Loss 7.7437   LearningRate 0.0340   Epoch: 8   Global Step: 345770   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:22,765-Speed 2617.94 samples/sec   Loss 7.9110   LearningRate 0.0340   Epoch: 8   Global Step: 345780   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:26,660-Speed 2629.41 samples/sec   Loss 7.8578   LearningRate 0.0340   Epoch: 8   Global Step: 345790   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:30,557-Speed 2627.94 samples/sec   Loss 7.8340   LearningRate 0.0340   Epoch: 8   Global Step: 345800   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:34,451-Speed 2630.64 samples/sec   Loss 7.9311   LearningRate 0.0340   Epoch: 8   Global Step: 345810   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:38,348-Speed 2628.18 samples/sec   Loss 7.9599   LearningRate 0.0340   Epoch: 8   Global Step: 345820   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:34:42,230-Speed 2639.26 samples/sec   Loss 7.8214   LearningRate 0.0340   Epoch: 8   Global Step: 345830   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:34:46,128-Speed 2627.21 samples/sec   Loss 7.8989   LearningRate 0.0340   Epoch: 8   Global Step: 345840   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:34:50,037-Speed 2620.51 samples/sec   Loss 7.8586   LearningRate 0.0340   Epoch: 8   Global Step: 345850   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:34:53,955-Speed 2614.33 samples/sec   Loss 7.9845   LearningRate 0.0340   Epoch: 8   Global Step: 345860   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:34:57,853-Speed 2627.56 samples/sec   Loss 7.8309   LearningRate 0.0340   Epoch: 8   Global Step: 345870   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:35:01,747-Speed 2630.42 samples/sec   Loss 7.7940   LearningRate 0.0340   Epoch: 8   Global Step: 345880   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:35:05,641-Speed 2630.26 samples/sec   Loss 7.9098   LearningRate 0.0340   Epoch: 8   Global Step: 345890   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:35:09,535-Speed 2630.54 samples/sec   Loss 7.7786   LearningRate 0.0340   Epoch: 8   Global Step: 345900   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:35:13,429-Speed 2630.56 samples/sec   Loss 7.7903   LearningRate 0.0340   Epoch: 8   Global Step: 345910   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:35:17,328-Speed 2626.97 samples/sec   Loss 7.8335   LearningRate 0.0340   Epoch: 8   Global Step: 345920   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:35:21,224-Speed 2629.19 samples/sec   Loss 7.9155   LearningRate 0.0340   Epoch: 8   Global Step: 345930   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:25,123-Speed 2626.99 samples/sec   Loss 7.7577   LearningRate 0.0340   Epoch: 8   Global Step: 345940   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:29,036-Speed 2617.68 samples/sec   Loss 7.7836   LearningRate 0.0340   Epoch: 8   Global Step: 345950   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:32,941-Speed 2622.53 samples/sec   Loss 7.8626   LearningRate 0.0340   Epoch: 8   Global Step: 345960   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:36,844-Speed 2624.46 samples/sec   Loss 7.7589   LearningRate 0.0340   Epoch: 8   Global Step: 345970   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:40,743-Speed 2627.12 samples/sec   Loss 7.8608   LearningRate 0.0340   Epoch: 8   Global Step: 345980   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:44,639-Speed 2629.00 samples/sec   Loss 7.8895   LearningRate 0.0340   Epoch: 8   Global Step: 345990   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:48,533-Speed 2630.61 samples/sec   Loss 7.7728   LearningRate 0.0340   Epoch: 8   Global Step: 346000   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:35:52,415-Speed 2637.92 samples/sec   Loss 7.9282   LearningRate 0.0340   Epoch: 8   Global Step: 346010   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:35:56,340-Speed 2610.26 samples/sec   Loss 7.8753   LearningRate 0.0340   Epoch: 8   Global Step: 346020   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:36:00,244-Speed 2623.91 samples/sec   Loss 7.9229   LearningRate 0.0340   Epoch: 8   Global Step: 346030   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:36:04,141-Speed 2627.52 samples/sec   Loss 7.8425   LearningRate 0.0340   Epoch: 8   Global Step: 346040   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:36:08,061-Speed 2613.32 samples/sec   Loss 7.8254   LearningRate 0.0340   Epoch: 8   Global Step: 346050   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:36:11,853-Speed 2701.20 samples/sec   Loss 8.1620   LearningRate 0.0340   Epoch: 8   Global Step: 346060   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:15,743-Speed 2633.27 samples/sec   Loss 8.3455   LearningRate 0.0340   Epoch: 8   Global Step: 346070   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:19,643-Speed 2625.90 samples/sec   Loss 7.9921   LearningRate 0.0340   Epoch: 8   Global Step: 346080   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:23,534-Speed 2632.10 samples/sec   Loss 7.7797   LearningRate 0.0340   Epoch: 8   Global Step: 346090   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:27,432-Speed 2628.36 samples/sec   Loss 7.8688   LearningRate 0.0340   Epoch: 8   Global Step: 346100   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:31,338-Speed 2622.57 samples/sec   Loss 7.7901   LearningRate 0.0340   Epoch: 8   Global Step: 346110   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:35,228-Speed 2632.66 samples/sec   Loss 7.7878   LearningRate 0.0340   Epoch: 8   Global Step: 346120   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:39,119-Speed 2632.04 samples/sec   Loss 7.8663   LearningRate 0.0340   Epoch: 8   Global Step: 346130   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:43,022-Speed 2624.24 samples/sec   Loss 7.9035   LearningRate 0.0340   Epoch: 8   Global Step: 346140   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:46,913-Speed 2632.18 samples/sec   Loss 7.8771   LearningRate 0.0340   Epoch: 8   Global Step: 346150   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:36:50,805-Speed 2631.99 samples/sec   Loss 7.8133   LearningRate 0.0340   Epoch: 8   Global Step: 346160   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:36:54,704-Speed 2626.44 samples/sec   Loss 7.9799   LearningRate 0.0340   Epoch: 8   Global Step: 346170   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:36:58,587-Speed 2638.52 samples/sec   Loss 8.5993   LearningRate 0.0340   Epoch: 8   Global Step: 346180   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:02,482-Speed 2629.26 samples/sec   Loss 8.5679   LearningRate 0.0340   Epoch: 8   Global Step: 346190   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:06,408-Speed 2609.35 samples/sec   Loss 8.1419   LearningRate 0.0340   Epoch: 8   Global Step: 346200   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:10,300-Speed 2631.05 samples/sec   Loss 7.7714   LearningRate 0.0340   Epoch: 8   Global Step: 346210   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:14,192-Speed 2631.59 samples/sec   Loss 7.8671   LearningRate 0.0339   Epoch: 8   Global Step: 346220   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:18,084-Speed 2631.97 samples/sec   Loss 7.9723   LearningRate 0.0339   Epoch: 8   Global Step: 346230   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:21,982-Speed 2627.72 samples/sec   Loss 8.0104   LearningRate 0.0339   Epoch: 8   Global Step: 346240   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:25,872-Speed 2633.20 samples/sec   Loss 7.8729   LearningRate 0.0339   Epoch: 8   Global Step: 346250   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:29,765-Speed 2630.63 samples/sec   Loss 7.7986   LearningRate 0.0339   Epoch: 8   Global Step: 346260   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:33,664-Speed 2627.15 samples/sec   Loss 7.9544   LearningRate 0.0339   Epoch: 8   Global Step: 346270   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:37:37,556-Speed 2630.97 samples/sec   Loss 7.9090   LearningRate 0.0339   Epoch: 8   Global Step: 346280   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:37:41,457-Speed 2626.29 samples/sec   Loss 7.8002   LearningRate 0.0339   Epoch: 8   Global Step: 346290   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:37:45,348-Speed 2632.59 samples/sec   Loss 7.8729   LearningRate 0.0339   Epoch: 8   Global Step: 346300   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:37:49,239-Speed 2632.16 samples/sec   Loss 7.9037   LearningRate 0.0339   Epoch: 8   Global Step: 346310   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:37:53,132-Speed 2630.86 samples/sec   Loss 7.7975   LearningRate 0.0339   Epoch: 8   Global Step: 346320   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:37:57,024-Speed 2631.66 samples/sec   Loss 7.9201   LearningRate 0.0339   Epoch: 8   Global Step: 346330   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:38:00,918-Speed 2630.12 samples/sec   Loss 7.9714   LearningRate 0.0339   Epoch: 8   Global Step: 346340   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:38:04,812-Speed 2630.29 samples/sec   Loss 7.8520   LearningRate 0.0339   Epoch: 8   Global Step: 346350   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:38:08,706-Speed 2630.06 samples/sec   Loss 7.9134   LearningRate 0.0339   Epoch: 8   Global Step: 346360   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:38:12,605-Speed 2627.09 samples/sec   Loss 7.8401   LearningRate 0.0339   Epoch: 8   Global Step: 346370   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:38:16,549-Speed 2597.29 samples/sec   Loss 7.8340   LearningRate 0.0339   Epoch: 8   Global Step: 346380   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:20,597-Speed 2529.82 samples/sec   Loss 7.8961   LearningRate 0.0339   Epoch: 8   Global Step: 346390   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:24,506-Speed 2620.64 samples/sec   Loss 7.8569   LearningRate 0.0339   Epoch: 8   Global Step: 346400   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:28,405-Speed 2626.89 samples/sec   Loss 7.9219   LearningRate 0.0339   Epoch: 8   Global Step: 346410   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:32,301-Speed 2631.96 samples/sec   Loss 7.8208   LearningRate 0.0339   Epoch: 8   Global Step: 346420   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:36,208-Speed 2621.54 samples/sec   Loss 7.7893   LearningRate 0.0339   Epoch: 8   Global Step: 346430   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:40,108-Speed 2625.84 samples/sec   Loss 7.8158   LearningRate 0.0339   Epoch: 8   Global Step: 346440   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:44,003-Speed 2629.77 samples/sec   Loss 7.7162   LearningRate 0.0339   Epoch: 8   Global Step: 346450   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:47,909-Speed 2622.72 samples/sec   Loss 7.9148   LearningRate 0.0339   Epoch: 8   Global Step: 346460   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:51,798-Speed 2633.39 samples/sec   Loss 7.7047   LearningRate 0.0339   Epoch: 8   Global Step: 346470   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:38:55,690-Speed 2631.80 samples/sec   Loss 7.8125   LearningRate 0.0339   Epoch: 8   Global Step: 346480   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:38:59,585-Speed 2629.36 samples/sec   Loss 7.8177   LearningRate 0.0339   Epoch: 8   Global Step: 346490   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:03,479-Speed 2630.81 samples/sec   Loss 7.9200   LearningRate 0.0339   Epoch: 8   Global Step: 346500   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:07,386-Speed 2621.34 samples/sec   Loss 7.8667   LearningRate 0.0339   Epoch: 8   Global Step: 346510   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:11,305-Speed 2613.34 samples/sec   Loss 7.8611   LearningRate 0.0339   Epoch: 8   Global Step: 346520   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:15,204-Speed 2626.69 samples/sec   Loss 7.8773   LearningRate 0.0339   Epoch: 8   Global Step: 346530   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:19,128-Speed 2610.26 samples/sec   Loss 7.7972   LearningRate 0.0339   Epoch: 8   Global Step: 346540   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:23,036-Speed 2621.49 samples/sec   Loss 7.8923   LearningRate 0.0339   Epoch: 8   Global Step: 346550   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:26,953-Speed 2614.94 samples/sec   Loss 7.8099   LearningRate 0.0339   Epoch: 8   Global Step: 346560   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:30,860-Speed 2621.84 samples/sec   Loss 7.8908   LearningRate 0.0339   Epoch: 8   Global Step: 346570   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:39:34,779-Speed 2613.47 samples/sec   Loss 7.8928   LearningRate 0.0339   Epoch: 8   Global Step: 346580   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:39:38,683-Speed 2623.32 samples/sec   Loss 7.8398   LearningRate 0.0339   Epoch: 8   Global Step: 346590   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:39:42,632-Speed 2593.27 samples/sec   Loss 7.9393   LearningRate 0.0339   Epoch: 8   Global Step: 346600   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:39:46,579-Speed 2595.55 samples/sec   Loss 8.7043   LearningRate 0.0339   Epoch: 8   Global Step: 346610   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:39:50,478-Speed 2626.99 samples/sec   Loss 8.0856   LearningRate 0.0339   Epoch: 8   Global Step: 346620   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:39:54,390-Speed 2618.08 samples/sec   Loss 7.8243   LearningRate 0.0339   Epoch: 8   Global Step: 346630   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:39:58,302-Speed 2618.44 samples/sec   Loss 8.0713   LearningRate 0.0339   Epoch: 8   Global Step: 346640   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:40:02,202-Speed 2626.67 samples/sec   Loss 7.9989   LearningRate 0.0339   Epoch: 8   Global Step: 346650   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:40:06,099-Speed 2628.12 samples/sec   Loss 7.9822   LearningRate 0.0339   Epoch: 8   Global Step: 346660   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:40:10,001-Speed 2624.87 samples/sec   Loss 7.9725   LearningRate 0.0339   Epoch: 8   Global Step: 346670   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:40:13,905-Speed 2623.45 samples/sec   Loss 7.9705   LearningRate 0.0339   Epoch: 8   Global Step: 346680   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:40:17,806-Speed 2625.73 samples/sec   Loss 8.0381   LearningRate 0.0339   Epoch: 8   Global Step: 346690   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:40:21,704-Speed 2627.50 samples/sec   Loss 7.7383   LearningRate 0.0339   Epoch: 8   Global Step: 346700   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:40:25,599-Speed 2629.93 samples/sec   Loss 7.8012   LearningRate 0.0339   Epoch: 8   Global Step: 346710   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:29,496-Speed 2628.48 samples/sec   Loss 7.8468   LearningRate 0.0339   Epoch: 8   Global Step: 346720   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:33,419-Speed 2610.71 samples/sec   Loss 7.8294   LearningRate 0.0339   Epoch: 8   Global Step: 346730   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:37,318-Speed 2626.88 samples/sec   Loss 7.9634   LearningRate 0.0339   Epoch: 8   Global Step: 346740   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:41,223-Speed 2623.08 samples/sec   Loss 7.8739   LearningRate 0.0339   Epoch: 8   Global Step: 346750   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:45,123-Speed 2626.57 samples/sec   Loss 7.9310   LearningRate 0.0339   Epoch: 8   Global Step: 346760   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:49,017-Speed 2630.39 samples/sec   Loss 7.8538   LearningRate 0.0339   Epoch: 8   Global Step: 346770   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:52,907-Speed 2633.45 samples/sec   Loss 7.8215   LearningRate 0.0339   Epoch: 8   Global Step: 346780   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:40:56,801-Speed 2629.61 samples/sec   Loss 7.7833   LearningRate 0.0339   Epoch: 8   Global Step: 346790   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:41:00,697-Speed 2629.06 samples/sec   Loss 7.9218   LearningRate 0.0339   Epoch: 8   Global Step: 346800   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:41:04,590-Speed 2630.60 samples/sec   Loss 7.8021   LearningRate 0.0339   Epoch: 8   Global Step: 346810   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:41:08,483-Speed 2632.01 samples/sec   Loss 7.8449   LearningRate 0.0339   Epoch: 8   Global Step: 346820   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:41:12,374-Speed 2632.25 samples/sec   Loss 7.7623   LearningRate 0.0339   Epoch: 8   Global Step: 346830   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:41:16,265-Speed 2632.18 samples/sec   Loss 7.7625   LearningRate 0.0339   Epoch: 8   Global Step: 346840   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:41:20,152-Speed 2634.61 samples/sec   Loss 7.9564   LearningRate 0.0339   Epoch: 8   Global Step: 346850   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:41:24,007-Speed 2657.21 samples/sec   Loss 8.7556   LearningRate 0.0339   Epoch: 8   Global Step: 346860   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:27,902-Speed 2629.67 samples/sec   Loss 8.2419   LearningRate 0.0339   Epoch: 8   Global Step: 346870   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:31,791-Speed 2633.53 samples/sec   Loss 7.8172   LearningRate 0.0339   Epoch: 8   Global Step: 346880   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:35,687-Speed 2628.47 samples/sec   Loss 7.7757   LearningRate 0.0339   Epoch: 8   Global Step: 346890   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:39,583-Speed 2629.31 samples/sec   Loss 7.8648   LearningRate 0.0339   Epoch: 8   Global Step: 346900   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:43,476-Speed 2631.56 samples/sec   Loss 7.9778   LearningRate 0.0339   Epoch: 8   Global Step: 346910   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:47,382-Speed 2622.18 samples/sec   Loss 7.9691   LearningRate 0.0339   Epoch: 8   Global Step: 346920   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:51,275-Speed 2631.11 samples/sec   Loss 7.8733   LearningRate 0.0338   Epoch: 8   Global Step: 346930   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:55,163-Speed 2634.35 samples/sec   Loss 7.8944   LearningRate 0.0338   Epoch: 8   Global Step: 346940   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:41:59,077-Speed 2617.66 samples/sec   Loss 7.8950   LearningRate 0.0338   Epoch: 8   Global Step: 346950   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:42:02,970-Speed 2630.67 samples/sec   Loss 7.8938   LearningRate 0.0338   Epoch: 8   Global Step: 346960   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:06,862-Speed 2631.86 samples/sec   Loss 7.9625   LearningRate 0.0338   Epoch: 8   Global Step: 346970   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:10,755-Speed 2630.58 samples/sec   Loss 7.8155   LearningRate 0.0338   Epoch: 8   Global Step: 346980   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:14,652-Speed 2628.89 samples/sec   Loss 7.8434   LearningRate 0.0338   Epoch: 8   Global Step: 346990   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:18,581-Speed 2607.57 samples/sec   Loss 7.8575   LearningRate 0.0338   Epoch: 8   Global Step: 347000   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:22,471-Speed 2632.55 samples/sec   Loss 7.7798   LearningRate 0.0338   Epoch: 8   Global Step: 347010   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:26,361-Speed 2633.82 samples/sec   Loss 7.9261   LearningRate 0.0338   Epoch: 8   Global Step: 347020   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:30,251-Speed 2632.88 samples/sec   Loss 7.8253   LearningRate 0.0338   Epoch: 8   Global Step: 347030   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:34,141-Speed 2632.84 samples/sec   Loss 7.7397   LearningRate 0.0338   Epoch: 8   Global Step: 347040   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:38,031-Speed 2632.62 samples/sec   Loss 7.8622   LearningRate 0.0338   Epoch: 8   Global Step: 347050   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:42:41,927-Speed 2629.64 samples/sec   Loss 7.7983   LearningRate 0.0338   Epoch: 8   Global Step: 347060   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:42:45,818-Speed 2632.45 samples/sec   Loss 7.8627   LearningRate 0.0338   Epoch: 8   Global Step: 347070   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:42:49,724-Speed 2622.39 samples/sec   Loss 7.8066   LearningRate 0.0338   Epoch: 8   Global Step: 347080   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:42:53,626-Speed 2624.88 samples/sec   Loss 7.8649   LearningRate 0.0338   Epoch: 8   Global Step: 347090   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:42:57,521-Speed 2629.91 samples/sec   Loss 7.8863   LearningRate 0.0338   Epoch: 8   Global Step: 347100   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:43:01,412-Speed 2631.98 samples/sec   Loss 7.7902   LearningRate 0.0338   Epoch: 8   Global Step: 347110   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:43:05,318-Speed 2622.46 samples/sec   Loss 7.7626   LearningRate 0.0338   Epoch: 8   Global Step: 347120   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:43:09,218-Speed 2626.00 samples/sec   Loss 7.8572   LearningRate 0.0338   Epoch: 8   Global Step: 347130   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:43:13,116-Speed 2627.90 samples/sec   Loss 7.8021   LearningRate 0.0338   Epoch: 8   Global Step: 347140   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:43:17,015-Speed 2626.87 samples/sec   Loss 7.8552   LearningRate 0.0338   Epoch: 8   Global Step: 347150   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:43:20,915-Speed 2626.72 samples/sec   Loss 7.8629   LearningRate 0.0338   Epoch: 8   Global Step: 347160   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:24,809-Speed 2629.97 samples/sec   Loss 7.9342   LearningRate 0.0338   Epoch: 8   Global Step: 347170   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:28,709-Speed 2626.02 samples/sec   Loss 7.7386   LearningRate 0.0338   Epoch: 8   Global Step: 347180   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:32,603-Speed 2630.39 samples/sec   Loss 7.8637   LearningRate 0.0338   Epoch: 8   Global Step: 347190   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:36,503-Speed 2625.85 samples/sec   Loss 7.7230   LearningRate 0.0338   Epoch: 8   Global Step: 347200   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:40,401-Speed 2627.81 samples/sec   Loss 7.8082   LearningRate 0.0338   Epoch: 8   Global Step: 347210   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:44,312-Speed 2619.11 samples/sec   Loss 7.9271   LearningRate 0.0338   Epoch: 8   Global Step: 347220   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:48,208-Speed 2629.78 samples/sec   Loss 7.8107   LearningRate 0.0338   Epoch: 8   Global Step: 347230   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:52,117-Speed 2619.88 samples/sec   Loss 7.9748   LearningRate 0.0338   Epoch: 8   Global Step: 347240   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:56,021-Speed 2623.79 samples/sec   Loss 7.8444   LearningRate 0.0338   Epoch: 8   Global Step: 347250   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:43:59,929-Speed 2620.73 samples/sec   Loss 7.7396   LearningRate 0.0338   Epoch: 8   Global Step: 347260   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:03,830-Speed 2626.00 samples/sec   Loss 7.8727   LearningRate 0.0338   Epoch: 8   Global Step: 347270   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:07,728-Speed 2627.01 samples/sec   Loss 7.7569   LearningRate 0.0338   Epoch: 8   Global Step: 347280   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:11,629-Speed 2625.67 samples/sec   Loss 7.7576   LearningRate 0.0338   Epoch: 8   Global Step: 347290   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:15,525-Speed 2628.97 samples/sec   Loss 7.7747   LearningRate 0.0338   Epoch: 8   Global Step: 347300   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:19,427-Speed 2626.01 samples/sec   Loss 7.8096   LearningRate 0.0338   Epoch: 8   Global Step: 347310   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:23,336-Speed 2620.35 samples/sec   Loss 7.8516   LearningRate 0.0338   Epoch: 8   Global Step: 347320   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:27,350-Speed 2551.80 samples/sec   Loss 7.9253   LearningRate 0.0338   Epoch: 8   Global Step: 347330   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:31,366-Speed 2551.18 samples/sec   Loss 7.8507   LearningRate 0.0338   Epoch: 8   Global Step: 347340   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:35,266-Speed 2625.93 samples/sec   Loss 7.8556   LearningRate 0.0338   Epoch: 8   Global Step: 347350   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:44:39,170-Speed 2623.45 samples/sec   Loss 7.8526   LearningRate 0.0338   Epoch: 8   Global Step: 347360   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:44:43,074-Speed 2623.90 samples/sec   Loss 7.7584   LearningRate 0.0338   Epoch: 8   Global Step: 347370   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:44:46,969-Speed 2629.48 samples/sec   Loss 7.7081   LearningRate 0.0338   Epoch: 8   Global Step: 347380   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:44:50,870-Speed 2625.77 samples/sec   Loss 7.7701   LearningRate 0.0338   Epoch: 8   Global Step: 347390   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:44:54,779-Speed 2620.47 samples/sec   Loss 7.8911   LearningRate 0.0338   Epoch: 8   Global Step: 347400   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:44:58,700-Speed 2613.23 samples/sec   Loss 7.8189   LearningRate 0.0338   Epoch: 8   Global Step: 347410   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:45:02,586-Speed 2635.60 samples/sec   Loss 7.8240   LearningRate 0.0338   Epoch: 8   Global Step: 347420   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:06,517-Speed 2605.88 samples/sec   Loss 7.9164   LearningRate 0.0338   Epoch: 8   Global Step: 347430   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:10,421-Speed 2623.60 samples/sec   Loss 7.8841   LearningRate 0.0338   Epoch: 8   Global Step: 347440   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:14,337-Speed 2615.12 samples/sec   Loss 7.8389   LearningRate 0.0338   Epoch: 8   Global Step: 347450   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:18,236-Speed 2627.14 samples/sec   Loss 7.8348   LearningRate 0.0338   Epoch: 8   Global Step: 347460   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:22,135-Speed 2626.92 samples/sec   Loss 7.8822   LearningRate 0.0338   Epoch: 8   Global Step: 347470   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:26,055-Speed 2612.92 samples/sec   Loss 7.8878   LearningRate 0.0338   Epoch: 8   Global Step: 347480   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:29,954-Speed 2627.29 samples/sec   Loss 7.8028   LearningRate 0.0338   Epoch: 8   Global Step: 347490   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:33,848-Speed 2630.51 samples/sec   Loss 7.8226   LearningRate 0.0338   Epoch: 8   Global Step: 347500   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:37,742-Speed 2630.52 samples/sec   Loss 7.8989   LearningRate 0.0338   Epoch: 8   Global Step: 347510   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:45:41,638-Speed 2628.76 samples/sec   Loss 7.8563   LearningRate 0.0338   Epoch: 8   Global Step: 347520   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:45:45,552-Speed 2616.64 samples/sec   Loss 7.8625   LearningRate 0.0338   Epoch: 8   Global Step: 347530   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:45:49,448-Speed 2628.42 samples/sec   Loss 7.7992   LearningRate 0.0338   Epoch: 8   Global Step: 347540   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:45:53,320-Speed 2647.64 samples/sec   Loss 7.8554   LearningRate 0.0338   Epoch: 8   Global Step: 347550   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:45:57,200-Speed 2639.59 samples/sec   Loss 8.1120   LearningRate 0.0338   Epoch: 8   Global Step: 347560   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:46:01,066-Speed 2650.23 samples/sec   Loss 8.7011   LearningRate 0.0338   Epoch: 8   Global Step: 347570   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:04,966-Speed 2626.02 samples/sec   Loss 7.9231   LearningRate 0.0338   Epoch: 8   Global Step: 347580   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:08,856-Speed 2633.05 samples/sec   Loss 7.9044   LearningRate 0.0338   Epoch: 8   Global Step: 347590   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:12,751-Speed 2629.54 samples/sec   Loss 7.7986   LearningRate 0.0338   Epoch: 8   Global Step: 347600   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:16,647-Speed 2629.50 samples/sec   Loss 7.8794   LearningRate 0.0338   Epoch: 8   Global Step: 347610   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:20,541-Speed 2629.87 samples/sec   Loss 7.9523   LearningRate 0.0338   Epoch: 8   Global Step: 347620   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:24,436-Speed 2629.89 samples/sec   Loss 7.8787   LearningRate 0.0338   Epoch: 8   Global Step: 347630   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:28,327-Speed 2632.23 samples/sec   Loss 7.9582   LearningRate 0.0337   Epoch: 8   Global Step: 347640   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:32,229-Speed 2625.19 samples/sec   Loss 7.8559   LearningRate 0.0337   Epoch: 8   Global Step: 347650   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:36,119-Speed 2632.90 samples/sec   Loss 7.8463   LearningRate 0.0337   Epoch: 8   Global Step: 347660   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:46:40,011-Speed 2632.11 samples/sec   Loss 7.7859   LearningRate 0.0337   Epoch: 8   Global Step: 347670   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:46:43,901-Speed 2633.14 samples/sec   Loss 7.7736   LearningRate 0.0337   Epoch: 8   Global Step: 347680   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:46:47,795-Speed 2630.23 samples/sec   Loss 7.7084   LearningRate 0.0337   Epoch: 8   Global Step: 347690   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:46:51,684-Speed 2634.63 samples/sec   Loss 7.7953   LearningRate 0.0337   Epoch: 8   Global Step: 347700   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:46:55,584-Speed 2625.57 samples/sec   Loss 7.7911   LearningRate 0.0337   Epoch: 8   Global Step: 347710   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:46:59,474-Speed 2633.10 samples/sec   Loss 7.8219   LearningRate 0.0337   Epoch: 8   Global Step: 347720   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:47:03,362-Speed 2634.10 samples/sec   Loss 7.8346   LearningRate 0.0337   Epoch: 8   Global Step: 347730   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:47:07,256-Speed 2630.80 samples/sec   Loss 7.8395   LearningRate 0.0337   Epoch: 8   Global Step: 347740   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:47:11,157-Speed 2625.38 samples/sec   Loss 7.9098   LearningRate 0.0337   Epoch: 8   Global Step: 347750   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:47:15,049-Speed 2632.18 samples/sec   Loss 7.7062   LearningRate 0.0337   Epoch: 8   Global Step: 347760   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:47:18,943-Speed 2630.47 samples/sec   Loss 7.9415   LearningRate 0.0337   Epoch: 8   Global Step: 347770   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:22,840-Speed 2628.12 samples/sec   Loss 7.9364   LearningRate 0.0337   Epoch: 8   Global Step: 347780   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:26,736-Speed 2629.06 samples/sec   Loss 7.8348   LearningRate 0.0337   Epoch: 8   Global Step: 347790   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:30,626-Speed 2632.92 samples/sec   Loss 7.6730   LearningRate 0.0337   Epoch: 8   Global Step: 347800   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:34,528-Speed 2624.91 samples/sec   Loss 7.9373   LearningRate 0.0337   Epoch: 8   Global Step: 347810   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:38,437-Speed 2620.23 samples/sec   Loss 7.7944   LearningRate 0.0337   Epoch: 8   Global Step: 347820   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:42,331-Speed 2630.53 samples/sec   Loss 7.8208   LearningRate 0.0337   Epoch: 8   Global Step: 347830   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:46,222-Speed 2632.21 samples/sec   Loss 7.7820   LearningRate 0.0337   Epoch: 8   Global Step: 347840   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:47:50,103-Speed 2639.47 samples/sec   Loss 7.9396   LearningRate 0.0337   Epoch: 8   Global Step: 347850   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:47:54,017-Speed 2616.66 samples/sec   Loss 8.9970   LearningRate 0.0337   Epoch: 8   Global Step: 347860   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:47:57,907-Speed 2633.33 samples/sec   Loss 8.2106   LearningRate 0.0337   Epoch: 8   Global Step: 347870   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:01,798-Speed 2631.82 samples/sec   Loss 7.9116   LearningRate 0.0337   Epoch: 8   Global Step: 347880   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:05,690-Speed 2631.77 samples/sec   Loss 7.8741   LearningRate 0.0337   Epoch: 8   Global Step: 347890   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:09,584-Speed 2629.56 samples/sec   Loss 7.7998   LearningRate 0.0337   Epoch: 8   Global Step: 347900   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:13,477-Speed 2632.09 samples/sec   Loss 7.8627   LearningRate 0.0337   Epoch: 8   Global Step: 347910   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:17,374-Speed 2628.79 samples/sec   Loss 7.9680   LearningRate 0.0337   Epoch: 8   Global Step: 347920   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:21,265-Speed 2632.25 samples/sec   Loss 7.8506   LearningRate 0.0337   Epoch: 8   Global Step: 347930   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:25,156-Speed 2631.93 samples/sec   Loss 7.8740   LearningRate 0.0337   Epoch: 8   Global Step: 347940   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:48:29,050-Speed 2630.59 samples/sec   Loss 7.9113   LearningRate 0.0337   Epoch: 8   Global Step: 347950   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:48:32,945-Speed 2629.68 samples/sec   Loss 7.7791   LearningRate 0.0337   Epoch: 8   Global Step: 347960   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:48:36,845-Speed 2626.06 samples/sec   Loss 7.9154   LearningRate 0.0337   Epoch: 8   Global Step: 347970   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:48:40,769-Speed 2609.94 samples/sec   Loss 7.8890   LearningRate 0.0337   Epoch: 8   Global Step: 347980   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:48:44,709-Speed 2599.79 samples/sec   Loss 7.8005   LearningRate 0.0337   Epoch: 8   Global Step: 347990   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:48:48,600-Speed 2632.56 samples/sec   Loss 7.7203   LearningRate 0.0337   Epoch: 8   Global Step: 348000   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:48:52,495-Speed 2629.97 samples/sec   Loss 7.7859   LearningRate 0.0337   Epoch: 8   Global Step: 348010   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:48:56,385-Speed 2632.91 samples/sec   Loss 7.8457   LearningRate 0.0337   Epoch: 8   Global Step: 348020   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:49:00,283-Speed 2627.18 samples/sec   Loss 7.8042   LearningRate 0.0337   Epoch: 8   Global Step: 348030   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:49:04,203-Speed 2613.05 samples/sec   Loss 7.9527   LearningRate 0.0337   Epoch: 8   Global Step: 348040   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:49:08,114-Speed 2618.34 samples/sec   Loss 7.8678   LearningRate 0.0337   Epoch: 8   Global Step: 348050   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:49:12,007-Speed 2631.65 samples/sec   Loss 7.8690   LearningRate 0.0337   Epoch: 8   Global Step: 348060   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:49:15,831-Speed 2677.73 samples/sec   Loss 8.1995   LearningRate 0.0337   Epoch: 8   Global Step: 348070   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:19,720-Speed 2634.01 samples/sec   Loss 8.3817   LearningRate 0.0337   Epoch: 8   Global Step: 348080   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:23,602-Speed 2638.02 samples/sec   Loss 8.1735   LearningRate 0.0337   Epoch: 8   Global Step: 348090   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:27,491-Speed 2635.15 samples/sec   Loss 7.9619   LearningRate 0.0337   Epoch: 8   Global Step: 348100   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:31,380-Speed 2633.30 samples/sec   Loss 7.8810   LearningRate 0.0337   Epoch: 8   Global Step: 348110   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:35,266-Speed 2636.23 samples/sec   Loss 7.9036   LearningRate 0.0337   Epoch: 8   Global Step: 348120   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:39,159-Speed 2630.56 samples/sec   Loss 7.8887   LearningRate 0.0337   Epoch: 8   Global Step: 348130   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:43,045-Speed 2635.50 samples/sec   Loss 7.8671   LearningRate 0.0337   Epoch: 8   Global Step: 348140   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:46,940-Speed 2629.69 samples/sec   Loss 7.8971   LearningRate 0.0337   Epoch: 8   Global Step: 348150   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:50,830-Speed 2633.01 samples/sec   Loss 7.7982   LearningRate 0.0337   Epoch: 8   Global Step: 348160   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 10:49:54,719-Speed 2633.79 samples/sec   Loss 7.9520   LearningRate 0.0337   Epoch: 8   Global Step: 348170   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:49:58,615-Speed 2628.79 samples/sec   Loss 7.8102   LearningRate 0.0337   Epoch: 8   Global Step: 348180   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:02,514-Speed 2627.39 samples/sec   Loss 7.7770   LearningRate 0.0337   Epoch: 8   Global Step: 348190   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:06,403-Speed 2633.39 samples/sec   Loss 7.8721   LearningRate 0.0337   Epoch: 8   Global Step: 348200   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:10,293-Speed 2633.07 samples/sec   Loss 8.0434   LearningRate 0.0337   Epoch: 8   Global Step: 348210   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:14,188-Speed 2629.94 samples/sec   Loss 8.1558   LearningRate 0.0337   Epoch: 8   Global Step: 348220   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:18,073-Speed 2635.98 samples/sec   Loss 8.5843   LearningRate 0.0337   Epoch: 8   Global Step: 348230   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:21,964-Speed 2632.29 samples/sec   Loss 7.8504   LearningRate 0.0337   Epoch: 8   Global Step: 348240   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:25,854-Speed 2632.62 samples/sec   Loss 7.7592   LearningRate 0.0337   Epoch: 8   Global Step: 348250   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:29,757-Speed 2625.05 samples/sec   Loss 7.7970   LearningRate 0.0337   Epoch: 8   Global Step: 348260   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 10:50:33,690-Speed 2604.51 samples/sec   Loss 7.8570   LearningRate 0.0337   Epoch: 8   Global Step: 348270   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:50:37,604-Speed 2616.45 samples/sec   Loss 7.8382   LearningRate 0.0337   Epoch: 8   Global Step: 348280   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:50:41,497-Speed 2631.76 samples/sec   Loss 7.7917   LearningRate 0.0337   Epoch: 8   Global Step: 348290   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:50:45,393-Speed 2628.87 samples/sec   Loss 7.8920   LearningRate 0.0337   Epoch: 8   Global Step: 348300   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:50:49,285-Speed 2632.05 samples/sec   Loss 7.7839   LearningRate 0.0337   Epoch: 8   Global Step: 348310   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:50:53,174-Speed 2633.64 samples/sec   Loss 7.8419   LearningRate 0.0337   Epoch: 8   Global Step: 348320   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:50:57,066-Speed 2631.53 samples/sec   Loss 7.8666   LearningRate 0.0337   Epoch: 8   Global Step: 348330   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:51:00,957-Speed 2632.10 samples/sec   Loss 7.8926   LearningRate 0.0337   Epoch: 8   Global Step: 348340   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:51:04,867-Speed 2620.11 samples/sec   Loss 7.8827   LearningRate 0.0337   Epoch: 8   Global Step: 348350   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:51:08,778-Speed 2619.38 samples/sec   Loss 7.7768   LearningRate 0.0336   Epoch: 8   Global Step: 348360   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 10:51:12,674-Speed 2628.69 samples/sec   Loss 7.9130   LearningRate 0.0336   Epoch: 8   Global Step: 348370   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:16,563-Speed 2634.03 samples/sec   Loss 7.9214   LearningRate 0.0336   Epoch: 8   Global Step: 348380   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:20,462-Speed 2626.83 samples/sec   Loss 7.8593   LearningRate 0.0336   Epoch: 8   Global Step: 348390   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:24,354-Speed 2631.73 samples/sec   Loss 7.8312   LearningRate 0.0336   Epoch: 8   Global Step: 348400   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:28,264-Speed 2619.62 samples/sec   Loss 7.7925   LearningRate 0.0336   Epoch: 8   Global Step: 348410   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:32,172-Speed 2621.16 samples/sec   Loss 7.7330   LearningRate 0.0336   Epoch: 8   Global Step: 348420   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:36,070-Speed 2627.19 samples/sec   Loss 7.7443   LearningRate 0.0336   Epoch: 8   Global Step: 348430   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:39,967-Speed 2628.43 samples/sec   Loss 7.7283   LearningRate 0.0336   Epoch: 8   Global Step: 348440   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:43,860-Speed 2630.88 samples/sec   Loss 7.8823   LearningRate 0.0336   Epoch: 8   Global Step: 348450   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:47,754-Speed 2631.28 samples/sec   Loss 7.9689   LearningRate 0.0336   Epoch: 8   Global Step: 348460   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:51:51,643-Speed 2632.93 samples/sec   Loss 7.8954   LearningRate 0.0336   Epoch: 8   Global Step: 348470   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:51:55,534-Speed 2632.41 samples/sec   Loss 7.8529   LearningRate 0.0336   Epoch: 8   Global Step: 348480   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:51:59,432-Speed 2627.49 samples/sec   Loss 7.9056   LearningRate 0.0336   Epoch: 8   Global Step: 348490   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:03,320-Speed 2634.38 samples/sec   Loss 7.8650   LearningRate 0.0336   Epoch: 8   Global Step: 348500   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:07,212-Speed 2631.33 samples/sec   Loss 7.7556   LearningRate 0.0336   Epoch: 8   Global Step: 348510   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:11,105-Speed 2631.49 samples/sec   Loss 7.7210   LearningRate 0.0336   Epoch: 8   Global Step: 348520   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:15,016-Speed 2618.68 samples/sec   Loss 7.8390   LearningRate 0.0336   Epoch: 8   Global Step: 348530   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:19,061-Speed 2533.07 samples/sec   Loss 7.9422   LearningRate 0.0336   Epoch: 8   Global Step: 348540   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:22,956-Speed 2629.39 samples/sec   Loss 7.7649   LearningRate 0.0336   Epoch: 8   Global Step: 348550   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:26,851-Speed 2629.97 samples/sec   Loss 7.8407   LearningRate 0.0336   Epoch: 8   Global Step: 348560   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:30,757-Speed 2622.32 samples/sec   Loss 7.8860   LearningRate 0.0336   Epoch: 8   Global Step: 348570   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:52:34,647-Speed 2633.28 samples/sec   Loss 7.8634   LearningRate 0.0336   Epoch: 8   Global Step: 348580   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:38,546-Speed 2627.38 samples/sec   Loss 7.8846   LearningRate 0.0336   Epoch: 8   Global Step: 348590   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:42,459-Speed 2617.35 samples/sec   Loss 7.9010   LearningRate 0.0336   Epoch: 8   Global Step: 348600   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:46,366-Speed 2621.40 samples/sec   Loss 7.8773   LearningRate 0.0336   Epoch: 8   Global Step: 348610   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:50,299-Speed 2604.44 samples/sec   Loss 7.9248   LearningRate 0.0336   Epoch: 8   Global Step: 348620   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:54,214-Speed 2615.97 samples/sec   Loss 7.7236   LearningRate 0.0336   Epoch: 8   Global Step: 348630   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:52:58,136-Speed 2611.97 samples/sec   Loss 7.9005   LearningRate 0.0336   Epoch: 8   Global Step: 348640   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:53:02,038-Speed 2624.68 samples/sec   Loss 7.8008   LearningRate 0.0336   Epoch: 8   Global Step: 348650   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:53:05,940-Speed 2625.12 samples/sec   Loss 7.7773   LearningRate 0.0336   Epoch: 8   Global Step: 348660   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:53:09,947-Speed 2556.13 samples/sec   Loss 8.1315   LearningRate 0.0336   Epoch: 8   Global Step: 348670   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:13,839-Speed 2632.40 samples/sec   Loss 7.8288   LearningRate 0.0336   Epoch: 8   Global Step: 348680   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:17,731-Speed 2631.26 samples/sec   Loss 7.7842   LearningRate 0.0336   Epoch: 8   Global Step: 348690   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:21,653-Speed 2611.57 samples/sec   Loss 7.7297   LearningRate 0.0336   Epoch: 8   Global Step: 348700   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:25,555-Speed 2625.25 samples/sec   Loss 7.7979   LearningRate 0.0336   Epoch: 8   Global Step: 348710   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:29,455-Speed 2626.61 samples/sec   Loss 7.7795   LearningRate 0.0336   Epoch: 8   Global Step: 348720   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:33,345-Speed 2633.09 samples/sec   Loss 7.7121   LearningRate 0.0336   Epoch: 8   Global Step: 348730   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:37,238-Speed 2630.31 samples/sec   Loss 7.7207   LearningRate 0.0336   Epoch: 8   Global Step: 348740   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:41,140-Speed 2625.05 samples/sec   Loss 7.7806   LearningRate 0.0336   Epoch: 8   Global Step: 348750   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:45,040-Speed 2626.57 samples/sec   Loss 7.8057   LearningRate 0.0336   Epoch: 8   Global Step: 348760   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:53:48,933-Speed 2631.63 samples/sec   Loss 7.7666   LearningRate 0.0336   Epoch: 8   Global Step: 348770   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:53:52,827-Speed 2629.74 samples/sec   Loss 7.8212   LearningRate 0.0336   Epoch: 8   Global Step: 348780   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:53:56,723-Speed 2629.15 samples/sec   Loss 7.9380   LearningRate 0.0336   Epoch: 8   Global Step: 348790   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:00,619-Speed 2628.77 samples/sec   Loss 7.8542   LearningRate 0.0336   Epoch: 8   Global Step: 348800   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:04,514-Speed 2629.92 samples/sec   Loss 7.7355   LearningRate 0.0336   Epoch: 8   Global Step: 348810   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:08,408-Speed 2630.11 samples/sec   Loss 7.7644   LearningRate 0.0336   Epoch: 8   Global Step: 348820   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:12,308-Speed 2626.45 samples/sec   Loss 7.8855   LearningRate 0.0336   Epoch: 8   Global Step: 348830   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:16,199-Speed 2631.91 samples/sec   Loss 7.9265   LearningRate 0.0336   Epoch: 8   Global Step: 348840   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:20,092-Speed 2631.64 samples/sec   Loss 7.8142   LearningRate 0.0336   Epoch: 8   Global Step: 348850   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:23,984-Speed 2631.76 samples/sec   Loss 7.9643   LearningRate 0.0336   Epoch: 8   Global Step: 348860   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:27,877-Speed 2631.24 samples/sec   Loss 7.8819   LearningRate 0.0336   Epoch: 8   Global Step: 348870   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:54:31,771-Speed 2629.85 samples/sec   Loss 7.8261   LearningRate 0.0336   Epoch: 8   Global Step: 348880   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:54:35,664-Speed 2630.63 samples/sec   Loss 7.6794   LearningRate 0.0336   Epoch: 8   Global Step: 348890   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:54:39,558-Speed 2630.38 samples/sec   Loss 7.8070   LearningRate 0.0336   Epoch: 8   Global Step: 348900   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:54:43,452-Speed 2630.27 samples/sec   Loss 7.8035   LearningRate 0.0336   Epoch: 8   Global Step: 348910   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:54:47,348-Speed 2628.53 samples/sec   Loss 7.9214   LearningRate 0.0336   Epoch: 8   Global Step: 348920   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:54:51,225-Speed 2642.63 samples/sec   Loss 7.7224   LearningRate 0.0336   Epoch: 8   Global Step: 348930   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:55,118-Speed 2631.09 samples/sec   Loss 7.8400   LearningRate 0.0336   Epoch: 8   Global Step: 348940   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:54:59,022-Speed 2623.65 samples/sec   Loss 7.9165   LearningRate 0.0336   Epoch: 8   Global Step: 348950   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:02,918-Speed 2628.96 samples/sec   Loss 7.8681   LearningRate 0.0336   Epoch: 8   Global Step: 348960   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:06,806-Speed 2633.91 samples/sec   Loss 7.7905   LearningRate 0.0336   Epoch: 8   Global Step: 348970   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:10,698-Speed 2631.44 samples/sec   Loss 7.9189   LearningRate 0.0336   Epoch: 8   Global Step: 348980   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:14,592-Speed 2630.41 samples/sec   Loss 7.8248   LearningRate 0.0336   Epoch: 8   Global Step: 348990   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:18,485-Speed 2631.10 samples/sec   Loss 7.7925   LearningRate 0.0336   Epoch: 8   Global Step: 349000   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:22,386-Speed 2634.08 samples/sec   Loss 7.7649   LearningRate 0.0336   Epoch: 8   Global Step: 349010   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:26,277-Speed 2631.93 samples/sec   Loss 7.8179   LearningRate 0.0336   Epoch: 8   Global Step: 349020   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:55:30,171-Speed 2630.99 samples/sec   Loss 7.7357   LearningRate 0.0336   Epoch: 8   Global Step: 349030   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:55:34,063-Speed 2631.03 samples/sec   Loss 7.7789   LearningRate 0.0336   Epoch: 8   Global Step: 349040   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:55:37,957-Speed 2631.01 samples/sec   Loss 7.7905   LearningRate 0.0336   Epoch: 8   Global Step: 349050   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:55:41,850-Speed 2630.74 samples/sec   Loss 7.7833   LearningRate 0.0336   Epoch: 8   Global Step: 349060   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:55:45,744-Speed 2630.31 samples/sec   Loss 7.8773   LearningRate 0.0335   Epoch: 8   Global Step: 349070   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:55:49,648-Speed 2623.56 samples/sec   Loss 7.8653   LearningRate 0.0335   Epoch: 8   Global Step: 349080   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:55:53,530-Speed 2638.39 samples/sec   Loss 7.8518   LearningRate 0.0335   Epoch: 8   Global Step: 349090   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:55:57,426-Speed 2629.43 samples/sec   Loss 7.7154   LearningRate 0.0335   Epoch: 8   Global Step: 349100   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:56:01,324-Speed 2627.36 samples/sec   Loss 7.8857   LearningRate 0.0335   Epoch: 8   Global Step: 349110   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:56:05,222-Speed 2627.53 samples/sec   Loss 7.8398   LearningRate 0.0335   Epoch: 8   Global Step: 349120   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:56:09,088-Speed 2649.30 samples/sec   Loss 7.7937   LearningRate 0.0335   Epoch: 8   Global Step: 349130   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:12,983-Speed 2630.16 samples/sec   Loss 7.8720   LearningRate 0.0335   Epoch: 8   Global Step: 349140   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:16,876-Speed 2630.40 samples/sec   Loss 7.8426   LearningRate 0.0335   Epoch: 8   Global Step: 349150   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:20,780-Speed 2623.89 samples/sec   Loss 7.8176   LearningRate 0.0335   Epoch: 8   Global Step: 349160   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:24,672-Speed 2632.47 samples/sec   Loss 7.8551   LearningRate 0.0335   Epoch: 8   Global Step: 349170   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:28,568-Speed 2628.42 samples/sec   Loss 7.8249   LearningRate 0.0335   Epoch: 8   Global Step: 349180   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:32,461-Speed 2631.07 samples/sec   Loss 7.6645   LearningRate 0.0335   Epoch: 8   Global Step: 349190   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:36,359-Speed 2627.82 samples/sec   Loss 7.7793   LearningRate 0.0335   Epoch: 8   Global Step: 349200   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:40,254-Speed 2629.40 samples/sec   Loss 7.7129   LearningRate 0.0335   Epoch: 8   Global Step: 349210   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:44,148-Speed 2630.27 samples/sec   Loss 7.9960   LearningRate 0.0335   Epoch: 8   Global Step: 349220   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:56:48,117-Speed 2581.08 samples/sec   Loss 7.7435   LearningRate 0.0335   Epoch: 8   Global Step: 349230   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:56:52,012-Speed 2629.65 samples/sec   Loss 7.7663   LearningRate 0.0335   Epoch: 8   Global Step: 349240   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:56:55,900-Speed 2634.05 samples/sec   Loss 7.7660   LearningRate 0.0335   Epoch: 8   Global Step: 349250   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:56:59,809-Speed 2620.03 samples/sec   Loss 7.7462   LearningRate 0.0335   Epoch: 8   Global Step: 349260   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:03,709-Speed 2627.18 samples/sec   Loss 7.9420   LearningRate 0.0335   Epoch: 8   Global Step: 349270   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:07,602-Speed 2630.59 samples/sec   Loss 7.8308   LearningRate 0.0335   Epoch: 8   Global Step: 349280   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:11,502-Speed 2626.58 samples/sec   Loss 7.6911   LearningRate 0.0335   Epoch: 8   Global Step: 349290   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:15,403-Speed 2625.98 samples/sec   Loss 7.9529   LearningRate 0.0335   Epoch: 8   Global Step: 349300   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:19,307-Speed 2623.30 samples/sec   Loss 7.7542   LearningRate 0.0335   Epoch: 8   Global Step: 349310   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:23,199-Speed 2631.51 samples/sec   Loss 7.7487   LearningRate 0.0335   Epoch: 8   Global Step: 349320   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:27,096-Speed 2628.60 samples/sec   Loss 7.7098   LearningRate 0.0335   Epoch: 8   Global Step: 349330   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:57:30,987-Speed 2632.27 samples/sec   Loss 7.7488   LearningRate 0.0335   Epoch: 8   Global Step: 349340   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:57:34,973-Speed 2570.03 samples/sec   Loss 8.0394   LearningRate 0.0335   Epoch: 8   Global Step: 349350   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:57:38,862-Speed 2633.35 samples/sec   Loss 7.9304   LearningRate 0.0335   Epoch: 8   Global Step: 349360   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:57:42,759-Speed 2628.71 samples/sec   Loss 7.7294   LearningRate 0.0335   Epoch: 8   Global Step: 349370   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:57:46,650-Speed 2632.13 samples/sec   Loss 7.9278   LearningRate 0.0335   Epoch: 8   Global Step: 349380   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:57:50,553-Speed 2624.59 samples/sec   Loss 7.8056   LearningRate 0.0335   Epoch: 8   Global Step: 349390   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:57:54,432-Speed 2640.30 samples/sec   Loss 7.8635   LearningRate 0.0335   Epoch: 8   Global Step: 349400   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:57:58,329-Speed 2628.85 samples/sec   Loss 7.8809   LearningRate 0.0335   Epoch: 8   Global Step: 349410   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:02,226-Speed 2627.90 samples/sec   Loss 7.6143   LearningRate 0.0335   Epoch: 8   Global Step: 349420   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:06,118-Speed 2631.97 samples/sec   Loss 7.9202   LearningRate 0.0335   Epoch: 8   Global Step: 349430   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:10,009-Speed 2632.32 samples/sec   Loss 7.8632   LearningRate 0.0335   Epoch: 8   Global Step: 349440   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:13,900-Speed 2632.69 samples/sec   Loss 7.7412   LearningRate 0.0335   Epoch: 8   Global Step: 349450   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:17,795-Speed 2630.16 samples/sec   Loss 7.6810   LearningRate 0.0335   Epoch: 8   Global Step: 349460   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:21,692-Speed 2628.19 samples/sec   Loss 7.8639   LearningRate 0.0335   Epoch: 8   Global Step: 349470   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:25,591-Speed 2627.08 samples/sec   Loss 7.8810   LearningRate 0.0335   Epoch: 8   Global Step: 349480   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:29,482-Speed 2632.24 samples/sec   Loss 7.8094   LearningRate 0.0335   Epoch: 8   Global Step: 349490   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:33,562-Speed 2510.44 samples/sec   Loss 7.7452   LearningRate 0.0335   Epoch: 8   Global Step: 349500   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 10:58:37,626-Speed 2520.30 samples/sec   Loss 7.6826   LearningRate 0.0335   Epoch: 8   Global Step: 349510   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:41,619-Speed 2565.16 samples/sec   Loss 7.8241   LearningRate 0.0335   Epoch: 8   Global Step: 349520   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:45,513-Speed 2630.41 samples/sec   Loss 7.7944   LearningRate 0.0335   Epoch: 8   Global Step: 349530   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:49,409-Speed 2629.10 samples/sec   Loss 7.8235   LearningRate 0.0335   Epoch: 8   Global Step: 349540   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 10:58:53,268-Speed 2654.42 samples/sec   Loss 7.8405   LearningRate 0.0335   Epoch: 8   Global Step: 349550   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:58:57,163-Speed 2629.60 samples/sec   Loss 7.8313   LearningRate 0.0335   Epoch: 8   Global Step: 349560   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:01,056-Speed 2630.71 samples/sec   Loss 7.7952   LearningRate 0.0335   Epoch: 8   Global Step: 349570   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:04,949-Speed 2630.91 samples/sec   Loss 7.8403   LearningRate 0.0335   Epoch: 8   Global Step: 349580   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:08,839-Speed 2633.38 samples/sec   Loss 7.7487   LearningRate 0.0335   Epoch: 8   Global Step: 349590   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:12,732-Speed 2630.84 samples/sec   Loss 7.6833   LearningRate 0.0335   Epoch: 8   Global Step: 349600   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:16,624-Speed 2631.87 samples/sec   Loss 7.7935   LearningRate 0.0335   Epoch: 8   Global Step: 349610   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:20,518-Speed 2630.60 samples/sec   Loss 7.7886   LearningRate 0.0335   Epoch: 8   Global Step: 349620   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:24,415-Speed 2627.94 samples/sec   Loss 7.7561   LearningRate 0.0335   Epoch: 8   Global Step: 349630   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:28,323-Speed 2621.08 samples/sec   Loss 7.9366   LearningRate 0.0335   Epoch: 8   Global Step: 349640   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 10:59:32,223-Speed 2625.77 samples/sec   Loss 7.7891   LearningRate 0.0335   Epoch: 8   Global Step: 349650   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:59:36,132-Speed 2619.96 samples/sec   Loss 7.7346   LearningRate 0.0335   Epoch: 8   Global Step: 349660   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:59:40,047-Speed 2616.32 samples/sec   Loss 7.7446   LearningRate 0.0335   Epoch: 8   Global Step: 349670   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:59:43,945-Speed 2628.66 samples/sec   Loss 7.8603   LearningRate 0.0335   Epoch: 8   Global Step: 349680   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:59:47,842-Speed 2628.04 samples/sec   Loss 7.8624   LearningRate 0.0335   Epoch: 8   Global Step: 349690   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:59:51,738-Speed 2629.19 samples/sec   Loss 7.7917   LearningRate 0.0335   Epoch: 8   Global Step: 349700   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:59:55,631-Speed 2630.65 samples/sec   Loss 7.6744   LearningRate 0.0335   Epoch: 8   Global Step: 349710   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 10:59:59,569-Speed 2601.62 samples/sec   Loss 7.8094   LearningRate 0.0335   Epoch: 8   Global Step: 349720   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:00:03,460-Speed 2631.97 samples/sec   Loss 7.6741   LearningRate 0.0335   Epoch: 8   Global Step: 349730   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:00:07,351-Speed 2632.25 samples/sec   Loss 7.8699   LearningRate 0.0335   Epoch: 8   Global Step: 349740   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:00:11,245-Speed 2630.57 samples/sec   Loss 7.7134   LearningRate 0.0335   Epoch: 8   Global Step: 349750   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:15,208-Speed 2584.57 samples/sec   Loss 7.8161   LearningRate 0.0335   Epoch: 8   Global Step: 349760   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:19,122-Speed 2616.90 samples/sec   Loss 7.7861   LearningRate 0.0335   Epoch: 8   Global Step: 349770   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:23,015-Speed 2631.18 samples/sec   Loss 7.7958   LearningRate 0.0335   Epoch: 8   Global Step: 349780   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:26,907-Speed 2631.65 samples/sec   Loss 7.7846   LearningRate 0.0334   Epoch: 8   Global Step: 349790   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:30,801-Speed 2630.62 samples/sec   Loss 7.7662   LearningRate 0.0334   Epoch: 8   Global Step: 349800   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:34,701-Speed 2626.16 samples/sec   Loss 7.8708   LearningRate 0.0334   Epoch: 8   Global Step: 349810   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:38,597-Speed 2628.96 samples/sec   Loss 7.7171   LearningRate 0.0334   Epoch: 8   Global Step: 349820   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:42,491-Speed 2630.05 samples/sec   Loss 7.7820   LearningRate 0.0334   Epoch: 8   Global Step: 349830   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:46,395-Speed 2623.93 samples/sec   Loss 7.7171   LearningRate 0.0334   Epoch: 8   Global Step: 349840   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:00:50,289-Speed 2630.32 samples/sec   Loss 7.8522   LearningRate 0.0334   Epoch: 8   Global Step: 349850   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:00:54,189-Speed 2625.84 samples/sec   Loss 7.8608   LearningRate 0.0334   Epoch: 8   Global Step: 349860   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:00:58,088-Speed 2627.02 samples/sec   Loss 7.8234   LearningRate 0.0334   Epoch: 8   Global Step: 349870   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:01,982-Speed 2630.32 samples/sec   Loss 7.8138   LearningRate 0.0334   Epoch: 8   Global Step: 349880   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:05,882-Speed 2626.69 samples/sec   Loss 7.7620   LearningRate 0.0334   Epoch: 8   Global Step: 349890   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:09,774-Speed 2631.45 samples/sec   Loss 7.7436   LearningRate 0.0334   Epoch: 8   Global Step: 349900   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:13,671-Speed 2628.25 samples/sec   Loss 7.9242   LearningRate 0.0334   Epoch: 8   Global Step: 349910   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:17,568-Speed 2628.16 samples/sec   Loss 7.8426   LearningRate 0.0334   Epoch: 8   Global Step: 349920   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:21,460-Speed 2632.04 samples/sec   Loss 7.8222   LearningRate 0.0334   Epoch: 8   Global Step: 349930   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:25,368-Speed 2620.53 samples/sec   Loss 7.7712   LearningRate 0.0334   Epoch: 8   Global Step: 349940   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:29,271-Speed 2625.19 samples/sec   Loss 7.7341   LearningRate 0.0334   Epoch: 8   Global Step: 349950   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:01:33,169-Speed 2627.34 samples/sec   Loss 7.8290   LearningRate 0.0334   Epoch: 8   Global Step: 349960   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:01:37,059-Speed 2632.74 samples/sec   Loss 7.9281   LearningRate 0.0334   Epoch: 8   Global Step: 349970   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:40,947-Speed 2634.83 samples/sec   Loss 7.7826   LearningRate 0.0334   Epoch: 8   Global Step: 349980   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:44,866-Speed 2613.50 samples/sec   Loss 7.8003   LearningRate 0.0334   Epoch: 8   Global Step: 349990   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:01:48,760-Speed 2630.21 samples/sec   Loss 7.7117   LearningRate 0.0334   Epoch: 8   Global Step: 350000   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:02:31,897-[lfw][350000]XNorm: 23.307740
Training: 2022-04-14 11:02:31,898-[lfw][350000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 11:02:31,898-[lfw][350000]Accuracy-Highest: 0.99783
Training: 2022-04-14 11:03:22,011-[cfp_fp][350000]XNorm: 21.773272
Training: 2022-04-14 11:03:22,012-[cfp_fp][350000]Accuracy-Flip: 0.98300+-0.00689
Training: 2022-04-14 11:03:22,013-[cfp_fp][350000]Accuracy-Highest: 0.98671
Training: 2022-04-14 11:04:05,224-[agedb_30][350000]XNorm: 23.298955
Training: 2022-04-14 11:04:05,225-[agedb_30][350000]Accuracy-Flip: 0.97383+-0.00711
Training: 2022-04-14 11:04:05,226-[agedb_30][350000]Accuracy-Highest: 0.97567
Training: 2022-04-14 11:04:09,078-Speed 72.98 samples/sec   Loss 7.8284   LearningRate 0.0334   Epoch: 8   Global Step: 350010   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:12,963-Speed 2636.60 samples/sec   Loss 7.6572   LearningRate 0.0334   Epoch: 8   Global Step: 350020   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:16,846-Speed 2638.19 samples/sec   Loss 7.8301   LearningRate 0.0334   Epoch: 8   Global Step: 350030   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:20,728-Speed 2638.03 samples/sec   Loss 7.8756   LearningRate 0.0334   Epoch: 8   Global Step: 350040   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:24,613-Speed 2636.29 samples/sec   Loss 7.7250   LearningRate 0.0334   Epoch: 8   Global Step: 350050   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:28,503-Speed 2632.66 samples/sec   Loss 7.7519   LearningRate 0.0334   Epoch: 8   Global Step: 350060   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:32,392-Speed 2634.53 samples/sec   Loss 7.8223   LearningRate 0.0334   Epoch: 8   Global Step: 350070   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:36,284-Speed 2631.45 samples/sec   Loss 7.6980   LearningRate 0.0334   Epoch: 8   Global Step: 350080   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:40,179-Speed 2629.83 samples/sec   Loss 7.7555   LearningRate 0.0334   Epoch: 8   Global Step: 350090   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:44,063-Speed 2637.69 samples/sec   Loss 7.7704   LearningRate 0.0334   Epoch: 8   Global Step: 350100   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:04:47,953-Speed 2632.70 samples/sec   Loss 7.6910   LearningRate 0.0334   Epoch: 8   Global Step: 350110   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:04:51,849-Speed 2628.82 samples/sec   Loss 7.7395   LearningRate 0.0334   Epoch: 8   Global Step: 350120   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:04:55,756-Speed 2622.05 samples/sec   Loss 7.8127   LearningRate 0.0334   Epoch: 8   Global Step: 350130   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:04:59,658-Speed 2624.34 samples/sec   Loss 7.7396   LearningRate 0.0334   Epoch: 8   Global Step: 350140   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:05:03,485-Speed 2676.69 samples/sec   Loss 7.7714   LearningRate 0.0334   Epoch: 8   Global Step: 350150   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:07,378-Speed 2630.75 samples/sec   Loss 8.7090   LearningRate 0.0334   Epoch: 8   Global Step: 350160   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:11,290-Speed 2618.24 samples/sec   Loss 7.9132   LearningRate 0.0334   Epoch: 8   Global Step: 350170   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:15,190-Speed 2626.09 samples/sec   Loss 7.7453   LearningRate 0.0334   Epoch: 8   Global Step: 350180   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:19,109-Speed 2615.10 samples/sec   Loss 7.9204   LearningRate 0.0334   Epoch: 8   Global Step: 350190   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:23,016-Speed 2621.70 samples/sec   Loss 7.8686   LearningRate 0.0334   Epoch: 8   Global Step: 350200   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:26,915-Speed 2626.86 samples/sec   Loss 7.9257   LearningRate 0.0334   Epoch: 8   Global Step: 350210   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:30,819-Speed 2622.83 samples/sec   Loss 7.7419   LearningRate 0.0334   Epoch: 8   Global Step: 350220   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:34,731-Speed 2618.85 samples/sec   Loss 7.8369   LearningRate 0.0334   Epoch: 8   Global Step: 350230   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:38,637-Speed 2621.72 samples/sec   Loss 7.7127   LearningRate 0.0334   Epoch: 8   Global Step: 350240   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:05:42,547-Speed 2620.11 samples/sec   Loss 7.8960   LearningRate 0.0334   Epoch: 8   Global Step: 350250   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:05:46,451-Speed 2623.41 samples/sec   Loss 7.7500   LearningRate 0.0334   Epoch: 8   Global Step: 350260   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:05:50,448-Speed 2562.51 samples/sec   Loss 7.7822   LearningRate 0.0334   Epoch: 8   Global Step: 350270   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:05:54,346-Speed 2627.93 samples/sec   Loss 7.8853   LearningRate 0.0334   Epoch: 8   Global Step: 350280   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:05:58,264-Speed 2613.98 samples/sec   Loss 7.7824   LearningRate 0.0334   Epoch: 8   Global Step: 350290   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:06:02,180-Speed 2615.85 samples/sec   Loss 7.7738   LearningRate 0.0334   Epoch: 8   Global Step: 350300   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:06:06,095-Speed 2616.65 samples/sec   Loss 7.7673   LearningRate 0.0334   Epoch: 8   Global Step: 350310   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:06:10,006-Speed 2618.56 samples/sec   Loss 7.6850   LearningRate 0.0334   Epoch: 8   Global Step: 350320   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:06:13,913-Speed 2621.78 samples/sec   Loss 7.7477   LearningRate 0.0334   Epoch: 8   Global Step: 350330   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:06:17,814-Speed 2625.74 samples/sec   Loss 7.8606   LearningRate 0.0334   Epoch: 8   Global Step: 350340   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:06:21,709-Speed 2629.64 samples/sec   Loss 7.7507   LearningRate 0.0334   Epoch: 8   Global Step: 350350   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:25,618-Speed 2620.16 samples/sec   Loss 7.7774   LearningRate 0.0334   Epoch: 8   Global Step: 350360   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:29,517-Speed 2626.54 samples/sec   Loss 7.8123   LearningRate 0.0334   Epoch: 8   Global Step: 350370   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:33,416-Speed 2627.25 samples/sec   Loss 7.7165   LearningRate 0.0334   Epoch: 8   Global Step: 350380   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:37,312-Speed 2628.65 samples/sec   Loss 7.8761   LearningRate 0.0334   Epoch: 8   Global Step: 350390   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:41,208-Speed 2629.47 samples/sec   Loss 7.8491   LearningRate 0.0334   Epoch: 8   Global Step: 350400   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:45,104-Speed 2628.96 samples/sec   Loss 7.8132   LearningRate 0.0334   Epoch: 8   Global Step: 350410   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:49,018-Speed 2616.86 samples/sec   Loss 7.8692   LearningRate 0.0334   Epoch: 8   Global Step: 350420   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:52,914-Speed 2628.90 samples/sec   Loss 7.6786   LearningRate 0.0334   Epoch: 8   Global Step: 350430   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:06:56,823-Speed 2620.50 samples/sec   Loss 7.8376   LearningRate 0.0334   Epoch: 8   Global Step: 350440   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:07:00,729-Speed 2621.77 samples/sec   Loss 7.7831   LearningRate 0.0334   Epoch: 8   Global Step: 350450   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:04,639-Speed 2619.43 samples/sec   Loss 7.6502   LearningRate 0.0334   Epoch: 8   Global Step: 350460   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:08,547-Speed 2620.88 samples/sec   Loss 7.7500   LearningRate 0.0334   Epoch: 8   Global Step: 350470   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:12,454-Speed 2622.05 samples/sec   Loss 7.9318   LearningRate 0.0334   Epoch: 8   Global Step: 350480   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:16,355-Speed 2625.36 samples/sec   Loss 7.8201   LearningRate 0.0334   Epoch: 8   Global Step: 350490   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:20,266-Speed 2618.63 samples/sec   Loss 7.5957   LearningRate 0.0334   Epoch: 8   Global Step: 350500   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:24,168-Speed 2624.90 samples/sec   Loss 7.9119   LearningRate 0.0333   Epoch: 8   Global Step: 350510   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:28,070-Speed 2624.61 samples/sec   Loss 7.7485   LearningRate 0.0333   Epoch: 8   Global Step: 350520   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:31,969-Speed 2627.44 samples/sec   Loss 7.7725   LearningRate 0.0333   Epoch: 8   Global Step: 350530   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:35,865-Speed 2628.83 samples/sec   Loss 7.7826   LearningRate 0.0333   Epoch: 8   Global Step: 350540   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:39,763-Speed 2627.85 samples/sec   Loss 7.8088   LearningRate 0.0333   Epoch: 8   Global Step: 350550   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:07:43,661-Speed 2627.75 samples/sec   Loss 7.7132   LearningRate 0.0333   Epoch: 8   Global Step: 350560   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:07:47,537-Speed 2642.64 samples/sec   Loss 7.7482   LearningRate 0.0333   Epoch: 8   Global Step: 350570   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:51,442-Speed 2622.65 samples/sec   Loss 7.8390   LearningRate 0.0333   Epoch: 8   Global Step: 350580   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:55,348-Speed 2622.32 samples/sec   Loss 7.7882   LearningRate 0.0333   Epoch: 8   Global Step: 350590   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:07:59,242-Speed 2629.99 samples/sec   Loss 7.7146   LearningRate 0.0333   Epoch: 8   Global Step: 350600   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:08:03,140-Speed 2627.94 samples/sec   Loss 7.7543   LearningRate 0.0333   Epoch: 8   Global Step: 350610   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:08:07,048-Speed 2621.01 samples/sec   Loss 7.9110   LearningRate 0.0333   Epoch: 8   Global Step: 350620   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:08:10,943-Speed 2630.06 samples/sec   Loss 7.8266   LearningRate 0.0333   Epoch: 8   Global Step: 350630   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:08:14,840-Speed 2627.82 samples/sec   Loss 7.7316   LearningRate 0.0333   Epoch: 8   Global Step: 350640   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:08:18,740-Speed 2626.80 samples/sec   Loss 7.7160   LearningRate 0.0333   Epoch: 8   Global Step: 350650   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:08:22,640-Speed 2625.97 samples/sec   Loss 7.6971   LearningRate 0.0333   Epoch: 8   Global Step: 350660   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:08:26,561-Speed 2612.07 samples/sec   Loss 7.7239   LearningRate 0.0333   Epoch: 8   Global Step: 350670   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:30,457-Speed 2629.23 samples/sec   Loss 7.5918   LearningRate 0.0333   Epoch: 8   Global Step: 350680   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:34,353-Speed 2629.15 samples/sec   Loss 7.8339   LearningRate 0.0333   Epoch: 8   Global Step: 350690   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:38,284-Speed 2605.53 samples/sec   Loss 7.7127   LearningRate 0.0333   Epoch: 8   Global Step: 350700   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:42,178-Speed 2630.09 samples/sec   Loss 7.8334   LearningRate 0.0333   Epoch: 8   Global Step: 350710   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:46,091-Speed 2618.17 samples/sec   Loss 7.7336   LearningRate 0.0333   Epoch: 8   Global Step: 350720   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:49,995-Speed 2623.23 samples/sec   Loss 7.6790   LearningRate 0.0333   Epoch: 8   Global Step: 350730   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:53,926-Speed 2605.93 samples/sec   Loss 7.6994   LearningRate 0.0333   Epoch: 8   Global Step: 350740   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:08:57,867-Speed 2599.27 samples/sec   Loss 7.8676   LearningRate 0.0333   Epoch: 8   Global Step: 350750   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:01,768-Speed 2625.33 samples/sec   Loss 7.8734   LearningRate 0.0333   Epoch: 8   Global Step: 350760   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:05,660-Speed 2631.57 samples/sec   Loss 7.8518   LearningRate 0.0333   Epoch: 8   Global Step: 350770   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:09:09,545-Speed 2637.11 samples/sec   Loss 7.7964   LearningRate 0.0333   Epoch: 8   Global Step: 350780   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:13,445-Speed 2625.71 samples/sec   Loss 7.8038   LearningRate 0.0333   Epoch: 8   Global Step: 350790   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:17,340-Speed 2629.71 samples/sec   Loss 7.8734   LearningRate 0.0333   Epoch: 8   Global Step: 350800   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:21,242-Speed 2625.66 samples/sec   Loss 7.7735   LearningRate 0.0333   Epoch: 8   Global Step: 350810   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:25,157-Speed 2615.87 samples/sec   Loss 7.7940   LearningRate 0.0333   Epoch: 8   Global Step: 350820   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:29,072-Speed 2616.56 samples/sec   Loss 7.9704   LearningRate 0.0333   Epoch: 8   Global Step: 350830   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:32,972-Speed 2625.98 samples/sec   Loss 7.9626   LearningRate 0.0333   Epoch: 8   Global Step: 350840   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:09:36,854-Speed 2638.34 samples/sec   Loss 7.8278   LearningRate 0.0333   Epoch: 8   Global Step: 350850   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:09:40,749-Speed 2629.80 samples/sec   Loss 7.7911   LearningRate 0.0333   Epoch: 8   Global Step: 350860   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:09:44,669-Speed 2613.22 samples/sec   Loss 7.8576   LearningRate 0.0333   Epoch: 8   Global Step: 350870   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:09:48,566-Speed 2627.98 samples/sec   Loss 7.7523   LearningRate 0.0333   Epoch: 8   Global Step: 350880   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:09:52,466-Speed 2630.46 samples/sec   Loss 7.8274   LearningRate 0.0333   Epoch: 8   Global Step: 350890   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:09:56,368-Speed 2625.47 samples/sec   Loss 7.8681   LearningRate 0.0333   Epoch: 8   Global Step: 350900   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:10:00,270-Speed 2624.68 samples/sec   Loss 7.6925   LearningRate 0.0333   Epoch: 8   Global Step: 350910   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:10:04,165-Speed 2630.14 samples/sec   Loss 7.7920   LearningRate 0.0333   Epoch: 8   Global Step: 350920   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:10:08,061-Speed 2628.79 samples/sec   Loss 7.7769   LearningRate 0.0333   Epoch: 8   Global Step: 350930   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:10:11,972-Speed 2618.49 samples/sec   Loss 7.7720   LearningRate 0.0333   Epoch: 8   Global Step: 350940   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:10:15,873-Speed 2625.98 samples/sec   Loss 7.7756   LearningRate 0.0333   Epoch: 8   Global Step: 350950   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:10:19,775-Speed 2625.21 samples/sec   Loss 7.6414   LearningRate 0.0333   Epoch: 8   Global Step: 350960   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:10:23,680-Speed 2622.49 samples/sec   Loss 7.8539   LearningRate 0.0333   Epoch: 8   Global Step: 350970   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:10:27,577-Speed 2628.87 samples/sec   Loss 7.7015   LearningRate 0.0333   Epoch: 8   Global Step: 350980   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:10:31,487-Speed 2619.21 samples/sec   Loss 7.8002   LearningRate 0.0333   Epoch: 8   Global Step: 350990   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:10:35,392-Speed 2623.11 samples/sec   Loss 7.7378   LearningRate 0.0333   Epoch: 8   Global Step: 351000   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:10:39,292-Speed 2625.54 samples/sec   Loss 7.8077   LearningRate 0.0333   Epoch: 8   Global Step: 351010   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:10:43,198-Speed 2622.82 samples/sec   Loss 7.7743   LearningRate 0.0333   Epoch: 8   Global Step: 351020   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:10:47,091-Speed 2630.64 samples/sec   Loss 7.8082   LearningRate 0.0333   Epoch: 8   Global Step: 351030   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:10:50,908-Speed 2684.10 samples/sec   Loss 8.1792   LearningRate 0.0333   Epoch: 8   Global Step: 351040   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:10:54,806-Speed 2627.51 samples/sec   Loss 9.3272   LearningRate 0.0333   Epoch: 8   Global Step: 351050   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:10:58,693-Speed 2635.19 samples/sec   Loss 8.1442   LearningRate 0.0333   Epoch: 8   Global Step: 351060   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:02,608-Speed 2616.24 samples/sec   Loss 7.9191   LearningRate 0.0333   Epoch: 8   Global Step: 351070   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:06,498-Speed 2632.95 samples/sec   Loss 7.8261   LearningRate 0.0333   Epoch: 8   Global Step: 351080   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:10,396-Speed 2626.76 samples/sec   Loss 7.9354   LearningRate 0.0333   Epoch: 8   Global Step: 351090   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:14,345-Speed 2593.88 samples/sec   Loss 7.7428   LearningRate 0.0333   Epoch: 8   Global Step: 351100   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:18,236-Speed 2632.25 samples/sec   Loss 7.8871   LearningRate 0.0333   Epoch: 8   Global Step: 351110   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:22,127-Speed 2632.46 samples/sec   Loss 7.8608   LearningRate 0.0333   Epoch: 8   Global Step: 351120   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:26,030-Speed 2624.81 samples/sec   Loss 7.8095   LearningRate 0.0333   Epoch: 8   Global Step: 351130   Fp16 Grad Scale: 2048   Required: 54 hours
Training: 2022-04-14 11:11:29,929-Speed 2626.77 samples/sec   Loss 7.8988   LearningRate 0.0333   Epoch: 8   Global Step: 351140   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:11:33,842-Speed 2617.52 samples/sec   Loss 7.7470   LearningRate 0.0333   Epoch: 8   Global Step: 351150   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:11:37,744-Speed 2624.81 samples/sec   Loss 7.8384   LearningRate 0.0333   Epoch: 8   Global Step: 351160   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:11:41,637-Speed 2630.48 samples/sec   Loss 7.8907   LearningRate 0.0333   Epoch: 8   Global Step: 351170   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:11:45,534-Speed 2628.63 samples/sec   Loss 7.7596   LearningRate 0.0333   Epoch: 8   Global Step: 351180   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:11:49,432-Speed 2627.31 samples/sec   Loss 7.7814   LearningRate 0.0333   Epoch: 8   Global Step: 351190   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:11:53,339-Speed 2622.57 samples/sec   Loss 7.8551   LearningRate 0.0333   Epoch: 8   Global Step: 351200   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:11:57,258-Speed 2613.28 samples/sec   Loss 7.7208   LearningRate 0.0333   Epoch: 8   Global Step: 351210   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:12:01,155-Speed 2628.75 samples/sec   Loss 7.8352   LearningRate 0.0333   Epoch: 8   Global Step: 351220   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:12:05,053-Speed 2627.08 samples/sec   Loss 7.7922   LearningRate 0.0332   Epoch: 8   Global Step: 351230   Fp16 Grad Scale: 4096   Required: 54 hours
Training: 2022-04-14 11:12:08,949-Speed 2629.16 samples/sec   Loss 7.7632   LearningRate 0.0332   Epoch: 8   Global Step: 351240   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:12,856-Speed 2621.86 samples/sec   Loss 7.7654   LearningRate 0.0332   Epoch: 8   Global Step: 351250   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:16,749-Speed 2630.50 samples/sec   Loss 7.8154   LearningRate 0.0332   Epoch: 8   Global Step: 351260   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:20,644-Speed 2630.16 samples/sec   Loss 7.7952   LearningRate 0.0332   Epoch: 8   Global Step: 351270   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:24,534-Speed 2632.34 samples/sec   Loss 7.6619   LearningRate 0.0332   Epoch: 8   Global Step: 351280   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:28,432-Speed 2628.11 samples/sec   Loss 7.8478   LearningRate 0.0332   Epoch: 8   Global Step: 351290   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:32,332-Speed 2626.13 samples/sec   Loss 7.8643   LearningRate 0.0332   Epoch: 8   Global Step: 351300   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:36,235-Speed 2624.96 samples/sec   Loss 7.8296   LearningRate 0.0332   Epoch: 8   Global Step: 351310   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:40,141-Speed 2622.30 samples/sec   Loss 7.8519   LearningRate 0.0332   Epoch: 8   Global Step: 351320   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:44,054-Speed 2617.41 samples/sec   Loss 7.9729   LearningRate 0.0332   Epoch: 8   Global Step: 351330   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:12:47,957-Speed 2624.13 samples/sec   Loss 7.7841   LearningRate 0.0332   Epoch: 8   Global Step: 351340   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:12:51,855-Speed 2627.87 samples/sec   Loss 7.7076   LearningRate 0.0332   Epoch: 8   Global Step: 351350   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:12:55,776-Speed 2612.06 samples/sec   Loss 7.7038   LearningRate 0.0332   Epoch: 8   Global Step: 351360   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:12:59,671-Speed 2629.72 samples/sec   Loss 7.7503   LearningRate 0.0332   Epoch: 8   Global Step: 351370   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:13:03,574-Speed 2624.20 samples/sec   Loss 7.7777   LearningRate 0.0332   Epoch: 8   Global Step: 351380   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:13:07,468-Speed 2630.26 samples/sec   Loss 7.8727   LearningRate 0.0332   Epoch: 8   Global Step: 351390   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:13:11,366-Speed 2627.47 samples/sec   Loss 7.8006   LearningRate 0.0332   Epoch: 8   Global Step: 351400   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:13:15,275-Speed 2620.22 samples/sec   Loss 7.7961   LearningRate 0.0332   Epoch: 8   Global Step: 351410   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:13:19,169-Speed 2630.75 samples/sec   Loss 7.6870   LearningRate 0.0332   Epoch: 8   Global Step: 351420   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:13:23,061-Speed 2631.16 samples/sec   Loss 7.7741   LearningRate 0.0332   Epoch: 8   Global Step: 351430   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:13:26,965-Speed 2624.69 samples/sec   Loss 7.7511   LearningRate 0.0332   Epoch: 8   Global Step: 351440   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:30,861-Speed 2628.33 samples/sec   Loss 7.7481   LearningRate 0.0332   Epoch: 8   Global Step: 351450   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:34,764-Speed 2624.21 samples/sec   Loss 7.8239   LearningRate 0.0332   Epoch: 8   Global Step: 351460   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:38,684-Speed 2613.18 samples/sec   Loss 7.7865   LearningRate 0.0332   Epoch: 8   Global Step: 351470   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:42,588-Speed 2623.11 samples/sec   Loss 7.7278   LearningRate 0.0332   Epoch: 8   Global Step: 351480   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:46,510-Speed 2611.62 samples/sec   Loss 7.6071   LearningRate 0.0332   Epoch: 8   Global Step: 351490   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:50,410-Speed 2626.51 samples/sec   Loss 7.8003   LearningRate 0.0332   Epoch: 8   Global Step: 351500   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:54,305-Speed 2629.39 samples/sec   Loss 7.8374   LearningRate 0.0332   Epoch: 8   Global Step: 351510   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:13:58,313-Speed 2555.71 samples/sec   Loss 7.7770   LearningRate 0.0332   Epoch: 8   Global Step: 351520   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:02,245-Speed 2604.57 samples/sec   Loss 7.7797   LearningRate 0.0332   Epoch: 8   Global Step: 351530   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:06,135-Speed 2633.28 samples/sec   Loss 7.8213   LearningRate 0.0332   Epoch: 8   Global Step: 351540   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:14:10,012-Speed 2641.94 samples/sec   Loss 7.7219   LearningRate 0.0332   Epoch: 8   Global Step: 351550   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:13,903-Speed 2631.81 samples/sec   Loss 7.7892   LearningRate 0.0332   Epoch: 8   Global Step: 351560   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:17,797-Speed 2630.11 samples/sec   Loss 7.8427   LearningRate 0.0332   Epoch: 8   Global Step: 351570   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:21,689-Speed 2632.19 samples/sec   Loss 7.7352   LearningRate 0.0332   Epoch: 8   Global Step: 351580   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:25,583-Speed 2630.39 samples/sec   Loss 7.7759   LearningRate 0.0332   Epoch: 8   Global Step: 351590   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:29,473-Speed 2632.59 samples/sec   Loss 7.8469   LearningRate 0.0332   Epoch: 8   Global Step: 351600   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:33,365-Speed 2631.56 samples/sec   Loss 7.8256   LearningRate 0.0332   Epoch: 8   Global Step: 351610   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:37,260-Speed 2629.47 samples/sec   Loss 7.7491   LearningRate 0.0332   Epoch: 8   Global Step: 351620   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:41,151-Speed 2632.59 samples/sec   Loss 7.6474   LearningRate 0.0332   Epoch: 8   Global Step: 351630   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:45,042-Speed 2632.50 samples/sec   Loss 7.7477   LearningRate 0.0332   Epoch: 8   Global Step: 351640   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:14:48,938-Speed 2628.80 samples/sec   Loss 7.9022   LearningRate 0.0332   Epoch: 8   Global Step: 351650   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:14:52,831-Speed 2630.78 samples/sec   Loss 7.7320   LearningRate 0.0332   Epoch: 8   Global Step: 351660   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:14:56,726-Speed 2629.63 samples/sec   Loss 7.8005   LearningRate 0.0332   Epoch: 8   Global Step: 351670   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:00,617-Speed 2632.31 samples/sec   Loss 7.8496   LearningRate 0.0332   Epoch: 8   Global Step: 351680   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:04,511-Speed 2629.85 samples/sec   Loss 7.7714   LearningRate 0.0332   Epoch: 8   Global Step: 351690   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:08,414-Speed 2624.51 samples/sec   Loss 7.8272   LearningRate 0.0332   Epoch: 8   Global Step: 351700   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:12,310-Speed 2629.26 samples/sec   Loss 7.7797   LearningRate 0.0332   Epoch: 8   Global Step: 351710   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:16,208-Speed 2627.77 samples/sec   Loss 7.7957   LearningRate 0.0332   Epoch: 8   Global Step: 351720   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:20,104-Speed 2628.81 samples/sec   Loss 7.6623   LearningRate 0.0332   Epoch: 8   Global Step: 351730   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:23,991-Speed 2634.99 samples/sec   Loss 7.7437   LearningRate 0.0332   Epoch: 8   Global Step: 351740   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:27,886-Speed 2629.89 samples/sec   Loss 7.8103   LearningRate 0.0332   Epoch: 8   Global Step: 351750   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:15:31,772-Speed 2635.74 samples/sec   Loss 7.7353   LearningRate 0.0332   Epoch: 8   Global Step: 351760   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:35,690-Speed 2613.94 samples/sec   Loss 7.7493   LearningRate 0.0332   Epoch: 8   Global Step: 351770   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:39,585-Speed 2629.58 samples/sec   Loss 7.7535   LearningRate 0.0332   Epoch: 8   Global Step: 351780   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:43,478-Speed 2631.48 samples/sec   Loss 7.6010   LearningRate 0.0332   Epoch: 8   Global Step: 351790   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:47,373-Speed 2629.25 samples/sec   Loss 7.7496   LearningRate 0.0332   Epoch: 8   Global Step: 351800   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:51,265-Speed 2632.05 samples/sec   Loss 7.6999   LearningRate 0.0332   Epoch: 8   Global Step: 351810   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:55,161-Speed 2628.70 samples/sec   Loss 7.9025   LearningRate 0.0332   Epoch: 8   Global Step: 351820   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:15:59,055-Speed 2630.81 samples/sec   Loss 7.8101   LearningRate 0.0332   Epoch: 8   Global Step: 351830   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:16:02,955-Speed 2625.95 samples/sec   Loss 7.7144   LearningRate 0.0332   Epoch: 8   Global Step: 351840   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:16:06,844-Speed 2633.49 samples/sec   Loss 7.8209   LearningRate 0.0332   Epoch: 8   Global Step: 351850   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:16:10,745-Speed 2625.13 samples/sec   Loss 7.7749   LearningRate 0.0332   Epoch: 8   Global Step: 351860   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:14,645-Speed 2626.90 samples/sec   Loss 7.6993   LearningRate 0.0332   Epoch: 8   Global Step: 351870   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:18,542-Speed 2627.81 samples/sec   Loss 7.8736   LearningRate 0.0332   Epoch: 8   Global Step: 351880   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:22,450-Speed 2621.32 samples/sec   Loss 7.6983   LearningRate 0.0332   Epoch: 8   Global Step: 351890   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:26,348-Speed 2627.07 samples/sec   Loss 7.6099   LearningRate 0.0332   Epoch: 8   Global Step: 351900   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:30,266-Speed 2614.89 samples/sec   Loss 7.6132   LearningRate 0.0332   Epoch: 8   Global Step: 351910   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:34,168-Speed 2624.54 samples/sec   Loss 7.8816   LearningRate 0.0332   Epoch: 8   Global Step: 351920   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:38,065-Speed 2628.45 samples/sec   Loss 7.7181   LearningRate 0.0332   Epoch: 8   Global Step: 351930   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:41,958-Speed 2630.85 samples/sec   Loss 7.8355   LearningRate 0.0332   Epoch: 8   Global Step: 351940   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:45,853-Speed 2629.46 samples/sec   Loss 7.8648   LearningRate 0.0331   Epoch: 8   Global Step: 351950   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:16:49,756-Speed 2623.72 samples/sec   Loss 7.7304   LearningRate 0.0331   Epoch: 8   Global Step: 351960   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:16:53,667-Speed 2619.39 samples/sec   Loss 7.7022   LearningRate 0.0331   Epoch: 8   Global Step: 351970   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:16:57,548-Speed 2639.09 samples/sec   Loss 7.8777   LearningRate 0.0331   Epoch: 8   Global Step: 351980   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:01,441-Speed 2630.70 samples/sec   Loss 7.8208   LearningRate 0.0331   Epoch: 8   Global Step: 351990   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:05,338-Speed 2628.28 samples/sec   Loss 7.6680   LearningRate 0.0331   Epoch: 8   Global Step: 352000   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:09,237-Speed 2627.28 samples/sec   Loss 7.6868   LearningRate 0.0331   Epoch: 8   Global Step: 352010   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:13,134-Speed 2628.28 samples/sec   Loss 7.7793   LearningRate 0.0331   Epoch: 8   Global Step: 352020   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:17,030-Speed 2628.59 samples/sec   Loss 7.9275   LearningRate 0.0331   Epoch: 8   Global Step: 352030   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:20,933-Speed 2624.74 samples/sec   Loss 7.8273   LearningRate 0.0331   Epoch: 8   Global Step: 352040   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:24,834-Speed 2625.11 samples/sec   Loss 7.6697   LearningRate 0.0331   Epoch: 8   Global Step: 352050   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:28,734-Speed 2626.74 samples/sec   Loss 7.9582   LearningRate 0.0331   Epoch: 8   Global Step: 352060   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:32,630-Speed 2628.50 samples/sec   Loss 7.6386   LearningRate 0.0331   Epoch: 8   Global Step: 352070   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:17:36,530-Speed 2625.89 samples/sec   Loss 7.8181   LearningRate 0.0331   Epoch: 8   Global Step: 352080   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:17:40,439-Speed 2620.45 samples/sec   Loss 7.8020   LearningRate 0.0331   Epoch: 8   Global Step: 352090   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:17:44,338-Speed 2627.12 samples/sec   Loss 7.7761   LearningRate 0.0331   Epoch: 8   Global Step: 352100   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:17:48,338-Speed 2560.31 samples/sec   Loss 7.7419   LearningRate 0.0331   Epoch: 8   Global Step: 352110   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:17:52,227-Speed 2634.07 samples/sec   Loss 7.8176   LearningRate 0.0331   Epoch: 8   Global Step: 352120   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:17:56,127-Speed 2626.30 samples/sec   Loss 7.7232   LearningRate 0.0331   Epoch: 8   Global Step: 352130   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:00,021-Speed 2630.24 samples/sec   Loss 7.6753   LearningRate 0.0331   Epoch: 8   Global Step: 352140   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:03,916-Speed 2629.50 samples/sec   Loss 7.7315   LearningRate 0.0331   Epoch: 8   Global Step: 352150   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:07,818-Speed 2624.41 samples/sec   Loss 7.8756   LearningRate 0.0331   Epoch: 8   Global Step: 352160   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:11,715-Speed 2628.43 samples/sec   Loss 7.7090   LearningRate 0.0331   Epoch: 8   Global Step: 352170   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:15,602-Speed 2635.29 samples/sec   Loss 7.7268   LearningRate 0.0331   Epoch: 8   Global Step: 352180   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:19,503-Speed 2625.73 samples/sec   Loss 7.7500   LearningRate 0.0331   Epoch: 8   Global Step: 352190   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:23,394-Speed 2632.16 samples/sec   Loss 7.6585   LearningRate 0.0331   Epoch: 8   Global Step: 352200   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:18:27,265-Speed 2646.46 samples/sec   Loss 7.9038   LearningRate 0.0331   Epoch: 8   Global Step: 352210   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:31,159-Speed 2630.04 samples/sec   Loss 7.7393   LearningRate 0.0331   Epoch: 8   Global Step: 352220   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:35,051-Speed 2631.59 samples/sec   Loss 7.6861   LearningRate 0.0331   Epoch: 8   Global Step: 352230   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:38,947-Speed 2629.20 samples/sec   Loss 7.7712   LearningRate 0.0331   Epoch: 8   Global Step: 352240   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:42,840-Speed 2630.84 samples/sec   Loss 7.7484   LearningRate 0.0331   Epoch: 8   Global Step: 352250   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:46,734-Speed 2629.73 samples/sec   Loss 7.8081   LearningRate 0.0331   Epoch: 8   Global Step: 352260   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:50,629-Speed 2631.00 samples/sec   Loss 7.8091   LearningRate 0.0331   Epoch: 8   Global Step: 352270   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:54,520-Speed 2632.71 samples/sec   Loss 7.6651   LearningRate 0.0331   Epoch: 8   Global Step: 352280   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:18:58,413-Speed 2630.93 samples/sec   Loss 7.7024   LearningRate 0.0331   Epoch: 8   Global Step: 352290   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:19:02,307-Speed 2630.11 samples/sec   Loss 7.7941   LearningRate 0.0331   Epoch: 8   Global Step: 352300   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:19:06,206-Speed 2627.33 samples/sec   Loss 7.6835   LearningRate 0.0331   Epoch: 8   Global Step: 352310   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:19:10,088-Speed 2638.25 samples/sec   Loss 7.7797   LearningRate 0.0331   Epoch: 8   Global Step: 352320   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:19:14,008-Speed 2612.69 samples/sec   Loss 7.7088   LearningRate 0.0331   Epoch: 8   Global Step: 352330   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:19:17,902-Speed 2630.44 samples/sec   Loss 7.7553   LearningRate 0.0331   Epoch: 8   Global Step: 352340   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:19:21,797-Speed 2630.12 samples/sec   Loss 7.7814   LearningRate 0.0331   Epoch: 8   Global Step: 352350   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:19:25,682-Speed 2636.99 samples/sec   Loss 7.7713   LearningRate 0.0331   Epoch: 8   Global Step: 352360   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:19:29,568-Speed 2635.72 samples/sec   Loss 8.8057   LearningRate 0.0331   Epoch: 8   Global Step: 352370   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:19:33,456-Speed 2634.92 samples/sec   Loss 7.9395   LearningRate 0.0331   Epoch: 8   Global Step: 352380   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:19:37,359-Speed 2624.18 samples/sec   Loss 7.8710   LearningRate 0.0331   Epoch: 8   Global Step: 352390   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:19:41,248-Speed 2633.72 samples/sec   Loss 7.9290   LearningRate 0.0331   Epoch: 8   Global Step: 352400   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:19:45,138-Speed 2633.13 samples/sec   Loss 7.8138   LearningRate 0.0331   Epoch: 8   Global Step: 352410   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:19:49,039-Speed 2625.45 samples/sec   Loss 7.8193   LearningRate 0.0331   Epoch: 8   Global Step: 352420   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:19:52,931-Speed 2631.84 samples/sec   Loss 7.8404   LearningRate 0.0331   Epoch: 8   Global Step: 352430   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:19:56,825-Speed 2630.23 samples/sec   Loss 7.7288   LearningRate 0.0331   Epoch: 8   Global Step: 352440   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:20:00,718-Speed 2630.79 samples/sec   Loss 7.8433   LearningRate 0.0331   Epoch: 8   Global Step: 352450   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:20:04,609-Speed 2632.72 samples/sec   Loss 7.7060   LearningRate 0.0331   Epoch: 8   Global Step: 352460   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:20:08,507-Speed 2628.10 samples/sec   Loss 7.9089   LearningRate 0.0331   Epoch: 8   Global Step: 352470   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:12,404-Speed 2627.78 samples/sec   Loss 7.8366   LearningRate 0.0331   Epoch: 8   Global Step: 352480   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:16,306-Speed 2624.90 samples/sec   Loss 7.7739   LearningRate 0.0331   Epoch: 8   Global Step: 352490   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:20,202-Speed 2629.42 samples/sec   Loss 7.8664   LearningRate 0.0331   Epoch: 8   Global Step: 352500   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:24,094-Speed 2630.85 samples/sec   Loss 7.7047   LearningRate 0.0331   Epoch: 8   Global Step: 352510   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:27,998-Speed 2623.56 samples/sec   Loss 7.7102   LearningRate 0.0331   Epoch: 8   Global Step: 352520   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:31,892-Speed 2631.02 samples/sec   Loss 7.8807   LearningRate 0.0331   Epoch: 8   Global Step: 352530   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:35,784-Speed 2632.18 samples/sec   Loss 7.6440   LearningRate 0.0331   Epoch: 8   Global Step: 352540   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:39,676-Speed 2631.35 samples/sec   Loss 7.7189   LearningRate 0.0331   Epoch: 8   Global Step: 352550   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:43,574-Speed 2627.90 samples/sec   Loss 7.6957   LearningRate 0.0331   Epoch: 8   Global Step: 352560   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:20:47,467-Speed 2631.01 samples/sec   Loss 7.7733   LearningRate 0.0331   Epoch: 8   Global Step: 352570   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:20:51,359-Speed 2631.91 samples/sec   Loss 7.6328   LearningRate 0.0331   Epoch: 8   Global Step: 352580   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:20:55,255-Speed 2629.22 samples/sec   Loss 7.7147   LearningRate 0.0331   Epoch: 8   Global Step: 352590   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:20:59,150-Speed 2629.72 samples/sec   Loss 7.7002   LearningRate 0.0331   Epoch: 8   Global Step: 352600   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:21:03,094-Speed 2597.32 samples/sec   Loss 7.6745   LearningRate 0.0331   Epoch: 8   Global Step: 352610   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:21:06,985-Speed 2632.19 samples/sec   Loss 7.7310   LearningRate 0.0331   Epoch: 8   Global Step: 352620   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:21:10,881-Speed 2628.79 samples/sec   Loss 7.8366   LearningRate 0.0331   Epoch: 8   Global Step: 352630   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:21:14,786-Speed 2623.20 samples/sec   Loss 7.8245   LearningRate 0.0331   Epoch: 8   Global Step: 352640   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:21:18,677-Speed 2631.94 samples/sec   Loss 7.9079   LearningRate 0.0331   Epoch: 8   Global Step: 352650   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:21:22,567-Speed 2632.83 samples/sec   Loss 7.6749   LearningRate 0.0331   Epoch: 8   Global Step: 352660   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:21:26,459-Speed 2632.34 samples/sec   Loss 7.6411   LearningRate 0.0330   Epoch: 8   Global Step: 352670   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:30,357-Speed 2627.39 samples/sec   Loss 7.7703   LearningRate 0.0330   Epoch: 8   Global Step: 352680   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:34,255-Speed 2627.41 samples/sec   Loss 7.7941   LearningRate 0.0330   Epoch: 8   Global Step: 352690   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:38,153-Speed 2627.92 samples/sec   Loss 7.6634   LearningRate 0.0330   Epoch: 8   Global Step: 352700   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:42,063-Speed 2619.31 samples/sec   Loss 7.6120   LearningRate 0.0330   Epoch: 8   Global Step: 352710   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:45,973-Speed 2619.28 samples/sec   Loss 7.8337   LearningRate 0.0330   Epoch: 8   Global Step: 352720   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:49,875-Speed 2625.03 samples/sec   Loss 7.7197   LearningRate 0.0330   Epoch: 8   Global Step: 352730   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:53,776-Speed 2625.85 samples/sec   Loss 7.6795   LearningRate 0.0330   Epoch: 8   Global Step: 352740   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:21:57,683-Speed 2621.30 samples/sec   Loss 7.6591   LearningRate 0.0330   Epoch: 8   Global Step: 352750   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:22:01,587-Speed 2624.40 samples/sec   Loss 7.8060   LearningRate 0.0330   Epoch: 8   Global Step: 352760   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:22:05,494-Speed 2621.05 samples/sec   Loss 7.7096   LearningRate 0.0330   Epoch: 8   Global Step: 352770   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:22:09,397-Speed 2624.04 samples/sec   Loss 7.6915   LearningRate 0.0330   Epoch: 8   Global Step: 352780   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:22:13,266-Speed 2647.41 samples/sec   Loss 7.7622   LearningRate 0.0330   Epoch: 8   Global Step: 352790   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:17,181-Speed 2615.96 samples/sec   Loss 7.7834   LearningRate 0.0330   Epoch: 8   Global Step: 352800   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:21,080-Speed 2627.59 samples/sec   Loss 7.6632   LearningRate 0.0330   Epoch: 8   Global Step: 352810   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:24,978-Speed 2627.43 samples/sec   Loss 7.6464   LearningRate 0.0330   Epoch: 8   Global Step: 352820   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:28,887-Speed 2620.72 samples/sec   Loss 7.8285   LearningRate 0.0330   Epoch: 8   Global Step: 352830   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:32,824-Speed 2601.27 samples/sec   Loss 7.6549   LearningRate 0.0330   Epoch: 8   Global Step: 352840   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:36,716-Speed 2631.76 samples/sec   Loss 7.6766   LearningRate 0.0330   Epoch: 8   Global Step: 352850   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:40,615-Speed 2626.88 samples/sec   Loss 7.7217   LearningRate 0.0330   Epoch: 8   Global Step: 352860   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:44,512-Speed 2628.78 samples/sec   Loss 7.7299   LearningRate 0.0330   Epoch: 8   Global Step: 352870   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:48,442-Speed 2606.46 samples/sec   Loss 7.7680   LearningRate 0.0330   Epoch: 8   Global Step: 352880   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:22:52,334-Speed 2631.66 samples/sec   Loss 7.6769   LearningRate 0.0330   Epoch: 8   Global Step: 352890   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:22:56,230-Speed 2628.93 samples/sec   Loss 7.7476   LearningRate 0.0330   Epoch: 8   Global Step: 352900   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:00,132-Speed 2625.26 samples/sec   Loss 7.7090   LearningRate 0.0330   Epoch: 8   Global Step: 352910   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:04,030-Speed 2627.79 samples/sec   Loss 7.7314   LearningRate 0.0330   Epoch: 8   Global Step: 352920   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:07,938-Speed 2621.10 samples/sec   Loss 7.8016   LearningRate 0.0330   Epoch: 8   Global Step: 352930   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:11,829-Speed 2631.98 samples/sec   Loss 7.6822   LearningRate 0.0330   Epoch: 8   Global Step: 352940   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:15,721-Speed 2632.10 samples/sec   Loss 7.7345   LearningRate 0.0330   Epoch: 8   Global Step: 352950   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:19,614-Speed 2630.84 samples/sec   Loss 7.7682   LearningRate 0.0330   Epoch: 8   Global Step: 352960   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:23,513-Speed 2627.27 samples/sec   Loss 7.7469   LearningRate 0.0330   Epoch: 8   Global Step: 352970   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:27,435-Speed 2611.37 samples/sec   Loss 7.6839   LearningRate 0.0330   Epoch: 8   Global Step: 352980   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:31,334-Speed 2627.35 samples/sec   Loss 7.6865   LearningRate 0.0330   Epoch: 8   Global Step: 352990   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:23:35,255-Speed 2612.25 samples/sec   Loss 7.7226   LearningRate 0.0330   Epoch: 8   Global Step: 353000   Fp16 Grad Scale: 262144   Required: 54 hours
Training: 2022-04-14 11:23:39,126-Speed 2645.77 samples/sec   Loss 7.8635   LearningRate 0.0330   Epoch: 8   Global Step: 353010   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:23:43,009-Speed 2638.11 samples/sec   Loss 7.7344   LearningRate 0.0330   Epoch: 8   Global Step: 353020   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:23:46,906-Speed 2628.33 samples/sec   Loss 7.6597   LearningRate 0.0330   Epoch: 8   Global Step: 353030   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:23:50,801-Speed 2629.67 samples/sec   Loss 7.7640   LearningRate 0.0330   Epoch: 8   Global Step: 353040   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:23:54,690-Speed 2633.86 samples/sec   Loss 7.6786   LearningRate 0.0330   Epoch: 8   Global Step: 353050   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:23:58,583-Speed 2630.66 samples/sec   Loss 7.6358   LearningRate 0.0330   Epoch: 8   Global Step: 353060   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:24:02,479-Speed 2628.99 samples/sec   Loss 7.7535   LearningRate 0.0330   Epoch: 8   Global Step: 353070   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:24:06,372-Speed 2631.37 samples/sec   Loss 7.8139   LearningRate 0.0330   Epoch: 8   Global Step: 353080   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:24:10,264-Speed 2631.70 samples/sec   Loss 7.7558   LearningRate 0.0330   Epoch: 8   Global Step: 353090   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:24:14,192-Speed 2607.19 samples/sec   Loss 7.9047   LearningRate 0.0330   Epoch: 8   Global Step: 353100   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:24:18,170-Speed 2574.49 samples/sec   Loss 7.6999   LearningRate 0.0330   Epoch: 8   Global Step: 353110   Fp16 Grad Scale: 65536   Required: 54 hours
Training: 2022-04-14 11:24:22,068-Speed 2627.88 samples/sec   Loss 7.6849   LearningRate 0.0330   Epoch: 8   Global Step: 353120   Fp16 Grad Scale: 131072   Required: 54 hours
Training: 2022-04-14 11:24:25,909-Speed 2666.61 samples/sec   Loss 7.8685   LearningRate 0.0330   Epoch: 8   Global Step: 353130   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:29,821-Speed 2618.36 samples/sec   Loss 9.8280   LearningRate 0.0330   Epoch: 8   Global Step: 353140   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:33,718-Speed 2628.31 samples/sec   Loss 8.6324   LearningRate 0.0330   Epoch: 8   Global Step: 353150   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:37,620-Speed 2625.36 samples/sec   Loss 7.9921   LearningRate 0.0330   Epoch: 8   Global Step: 353160   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:41,514-Speed 2630.31 samples/sec   Loss 7.9325   LearningRate 0.0330   Epoch: 8   Global Step: 353170   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:45,405-Speed 2631.71 samples/sec   Loss 7.8013   LearningRate 0.0330   Epoch: 8   Global Step: 353180   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:49,293-Speed 2634.38 samples/sec   Loss 7.7832   LearningRate 0.0330   Epoch: 8   Global Step: 353190   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:53,182-Speed 2633.33 samples/sec   Loss 7.9124   LearningRate 0.0330   Epoch: 8   Global Step: 353200   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:24:57,072-Speed 2633.70 samples/sec   Loss 7.8042   LearningRate 0.0330   Epoch: 8   Global Step: 353210   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:25:00,959-Speed 2634.63 samples/sec   Loss 7.7935   LearningRate 0.0330   Epoch: 8   Global Step: 353220   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-14 11:25:04,851-Speed 2632.02 samples/sec   Loss 7.7468   LearningRate 0.0330   Epoch: 8   Global Step: 353230   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:08,741-Speed 2633.25 samples/sec   Loss 7.8733   LearningRate 0.0330   Epoch: 8   Global Step: 353240   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:12,641-Speed 2625.82 samples/sec   Loss 7.6770   LearningRate 0.0330   Epoch: 8   Global Step: 353250   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:16,542-Speed 2626.10 samples/sec   Loss 7.7854   LearningRate 0.0330   Epoch: 8   Global Step: 353260   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:20,431-Speed 2632.94 samples/sec   Loss 7.8589   LearningRate 0.0330   Epoch: 8   Global Step: 353270   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:24,329-Speed 2628.24 samples/sec   Loss 7.7373   LearningRate 0.0330   Epoch: 8   Global Step: 353280   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:28,226-Speed 2628.02 samples/sec   Loss 7.7127   LearningRate 0.0330   Epoch: 8   Global Step: 353290   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:32,124-Speed 2627.46 samples/sec   Loss 7.7657   LearningRate 0.0330   Epoch: 8   Global Step: 353300   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:36,019-Speed 2629.92 samples/sec   Loss 7.8977   LearningRate 0.0330   Epoch: 8   Global Step: 353310   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:39,910-Speed 2632.77 samples/sec   Loss 7.8147   LearningRate 0.0330   Epoch: 8   Global Step: 353320   Fp16 Grad Scale: 16384   Required: 54 hours
Training: 2022-04-14 11:25:43,804-Speed 2630.19 samples/sec   Loss 7.7940   LearningRate 0.0330   Epoch: 8   Global Step: 353330   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:25:47,710-Speed 2622.57 samples/sec   Loss 7.6067   LearningRate 0.0330   Epoch: 8   Global Step: 353340   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:25:51,599-Speed 2633.44 samples/sec   Loss 7.7015   LearningRate 0.0330   Epoch: 8   Global Step: 353350   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:25:55,491-Speed 2631.88 samples/sec   Loss 7.7925   LearningRate 0.0330   Epoch: 8   Global Step: 353360   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:25:59,381-Speed 2632.62 samples/sec   Loss 7.6135   LearningRate 0.0330   Epoch: 8   Global Step: 353370   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:26:03,277-Speed 2629.01 samples/sec   Loss 7.6437   LearningRate 0.0330   Epoch: 8   Global Step: 353380   Fp16 Grad Scale: 32768   Required: 54 hours
Training: 2022-04-14 11:26:07,171-Speed 2630.33 samples/sec   Loss 7.6738   LearningRate 0.0329   Epoch: 8   Global Step: 353390   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:26:11,058-Speed 2635.84 samples/sec   Loss 7.7603   LearningRate 0.0329   Epoch: 8   Global Step: 353400   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:26:14,959-Speed 2625.11 samples/sec   Loss 7.7742   LearningRate 0.0329   Epoch: 8   Global Step: 353410   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:26:18,865-Speed 2622.16 samples/sec   Loss 7.7687   LearningRate 0.0329   Epoch: 8   Global Step: 353420   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:26:22,768-Speed 2623.98 samples/sec   Loss 7.8019   LearningRate 0.0329   Epoch: 8   Global Step: 353430   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:26,672-Speed 2624.22 samples/sec   Loss 7.7580   LearningRate 0.0329   Epoch: 8   Global Step: 353440   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:30,590-Speed 2613.56 samples/sec   Loss 7.8088   LearningRate 0.0329   Epoch: 8   Global Step: 353450   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:34,508-Speed 2614.41 samples/sec   Loss 7.6909   LearningRate 0.0329   Epoch: 8   Global Step: 353460   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:38,415-Speed 2621.34 samples/sec   Loss 7.7840   LearningRate 0.0329   Epoch: 8   Global Step: 353470   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:42,311-Speed 2629.72 samples/sec   Loss 7.8019   LearningRate 0.0329   Epoch: 8   Global Step: 353480   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:46,203-Speed 2631.71 samples/sec   Loss 7.7623   LearningRate 0.0329   Epoch: 8   Global Step: 353490   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:50,097-Speed 2630.30 samples/sec   Loss 7.7458   LearningRate 0.0329   Epoch: 8   Global Step: 353500   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:53,995-Speed 2628.08 samples/sec   Loss 7.8251   LearningRate 0.0329   Epoch: 8   Global Step: 353510   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:26:57,903-Speed 2620.46 samples/sec   Loss 7.7511   LearningRate 0.0329   Epoch: 8   Global Step: 353520   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:27:01,809-Speed 2622.43 samples/sec   Loss 7.7435   LearningRate 0.0329   Epoch: 8   Global Step: 353530   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:27:05,713-Speed 2623.45 samples/sec   Loss 7.8046   LearningRate 0.0329   Epoch: 8   Global Step: 353540   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:27:09,605-Speed 2632.13 samples/sec   Loss 7.7732   LearningRate 0.0329   Epoch: 8   Global Step: 353550   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:27:13,495-Speed 2633.19 samples/sec   Loss 7.8197   LearningRate 0.0329   Epoch: 8   Global Step: 353560   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:27:17,428-Speed 2604.05 samples/sec   Loss 7.7460   LearningRate 0.0329   Epoch: 8   Global Step: 353570   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:27:21,307-Speed 2640.79 samples/sec   Loss 7.8878   LearningRate 0.0329   Epoch: 8   Global Step: 353580   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:27:25,123-Speed 2684.30 samples/sec   Loss 8.3791   LearningRate 0.0329   Epoch: 8   Global Step: 353590   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:29,028-Speed 2623.42 samples/sec   Loss 7.9505   LearningRate 0.0329   Epoch: 8   Global Step: 353600   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:32,930-Speed 2624.50 samples/sec   Loss 7.6783   LearningRate 0.0329   Epoch: 8   Global Step: 353610   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:36,819-Speed 2633.90 samples/sec   Loss 7.6657   LearningRate 0.0329   Epoch: 8   Global Step: 353620   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:40,729-Speed 2619.55 samples/sec   Loss 7.7266   LearningRate 0.0329   Epoch: 8   Global Step: 353630   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:44,630-Speed 2626.01 samples/sec   Loss 7.8480   LearningRate 0.0329   Epoch: 8   Global Step: 353640   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:48,564-Speed 2603.46 samples/sec   Loss 7.7626   LearningRate 0.0329   Epoch: 8   Global Step: 353650   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:52,634-Speed 2516.86 samples/sec   Loss 7.7324   LearningRate 0.0329   Epoch: 8   Global Step: 353660   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:27:56,523-Speed 2633.34 samples/sec   Loss 7.7669   LearningRate 0.0329   Epoch: 8   Global Step: 353670   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:28:00,464-Speed 2599.07 samples/sec   Loss 7.6657   LearningRate 0.0329   Epoch: 8   Global Step: 353680   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:28:04,440-Speed 2576.22 samples/sec   Loss 7.7937   LearningRate 0.0329   Epoch: 8   Global Step: 353690   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:08,340-Speed 2626.58 samples/sec   Loss 7.7732   LearningRate 0.0329   Epoch: 8   Global Step: 353700   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:12,233-Speed 2630.77 samples/sec   Loss 7.7560   LearningRate 0.0329   Epoch: 8   Global Step: 353710   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:16,129-Speed 2629.06 samples/sec   Loss 7.8810   LearningRate 0.0329   Epoch: 8   Global Step: 353720   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:20,019-Speed 2633.39 samples/sec   Loss 7.6515   LearningRate 0.0329   Epoch: 8   Global Step: 353730   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:23,907-Speed 2634.60 samples/sec   Loss 7.7915   LearningRate 0.0329   Epoch: 8   Global Step: 353740   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:27,796-Speed 2633.01 samples/sec   Loss 7.8445   LearningRate 0.0329   Epoch: 8   Global Step: 353750   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:31,688-Speed 2632.36 samples/sec   Loss 7.7517   LearningRate 0.0329   Epoch: 8   Global Step: 353760   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:35,579-Speed 2632.28 samples/sec   Loss 7.7213   LearningRate 0.0329   Epoch: 8   Global Step: 353770   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:39,474-Speed 2629.83 samples/sec   Loss 7.7575   LearningRate 0.0329   Epoch: 8   Global Step: 353780   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:28:43,363-Speed 2633.37 samples/sec   Loss 7.7031   LearningRate 0.0329   Epoch: 8   Global Step: 353790   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:28:47,277-Speed 2617.38 samples/sec   Loss 7.6645   LearningRate 0.0329   Epoch: 8   Global Step: 353800   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:28:51,182-Speed 2622.74 samples/sec   Loss 7.7583   LearningRate 0.0329   Epoch: 8   Global Step: 353810   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:28:55,092-Speed 2619.42 samples/sec   Loss 7.5995   LearningRate 0.0329   Epoch: 8   Global Step: 353820   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:28:58,993-Speed 2625.70 samples/sec   Loss 7.9105   LearningRate 0.0329   Epoch: 8   Global Step: 353830   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:29:02,901-Speed 2621.70 samples/sec   Loss 7.7915   LearningRate 0.0329   Epoch: 8   Global Step: 353840   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:29:06,804-Speed 2623.58 samples/sec   Loss 7.7261   LearningRate 0.0329   Epoch: 8   Global Step: 353850   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:29:10,712-Speed 2620.97 samples/sec   Loss 7.7490   LearningRate 0.0329   Epoch: 8   Global Step: 353860   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:29:14,612-Speed 2625.90 samples/sec   Loss 7.7530   LearningRate 0.0329   Epoch: 8   Global Step: 353870   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:29:18,511-Speed 2627.13 samples/sec   Loss 7.7428   LearningRate 0.0329   Epoch: 8   Global Step: 353880   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:29:22,411-Speed 2627.64 samples/sec   Loss 7.8082   LearningRate 0.0329   Epoch: 8   Global Step: 353890   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:26,307-Speed 2628.95 samples/sec   Loss 7.7654   LearningRate 0.0329   Epoch: 8   Global Step: 353900   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:30,200-Speed 2631.11 samples/sec   Loss 7.7262   LearningRate 0.0329   Epoch: 8   Global Step: 353910   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:34,103-Speed 2624.60 samples/sec   Loss 7.6751   LearningRate 0.0329   Epoch: 8   Global Step: 353920   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:37,996-Speed 2630.95 samples/sec   Loss 7.7099   LearningRate 0.0329   Epoch: 8   Global Step: 353930   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:41,893-Speed 2628.55 samples/sec   Loss 7.7750   LearningRate 0.0329   Epoch: 8   Global Step: 353940   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:45,790-Speed 2627.88 samples/sec   Loss 7.6540   LearningRate 0.0329   Epoch: 8   Global Step: 353950   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:49,690-Speed 2626.87 samples/sec   Loss 7.7065   LearningRate 0.0329   Epoch: 8   Global Step: 353960   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:53,592-Speed 2625.26 samples/sec   Loss 7.8182   LearningRate 0.0329   Epoch: 8   Global Step: 353970   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:29:57,486-Speed 2630.17 samples/sec   Loss 7.8049   LearningRate 0.0329   Epoch: 8   Global Step: 353980   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:30:01,375-Speed 2633.46 samples/sec   Loss 7.9028   LearningRate 0.0329   Epoch: 8   Global Step: 353990   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:05,270-Speed 2629.53 samples/sec   Loss 7.6805   LearningRate 0.0329   Epoch: 8   Global Step: 354000   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:09,165-Speed 2629.71 samples/sec   Loss 7.7037   LearningRate 0.0329   Epoch: 8   Global Step: 354010   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:13,074-Speed 2620.66 samples/sec   Loss 7.7167   LearningRate 0.0329   Epoch: 8   Global Step: 354020   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:16,990-Speed 2615.91 samples/sec   Loss 7.6166   LearningRate 0.0329   Epoch: 8   Global Step: 354030   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:20,891-Speed 2625.18 samples/sec   Loss 7.7741   LearningRate 0.0329   Epoch: 8   Global Step: 354040   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:24,783-Speed 2631.56 samples/sec   Loss 7.7764   LearningRate 0.0329   Epoch: 8   Global Step: 354050   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:28,680-Speed 2627.70 samples/sec   Loss 7.6301   LearningRate 0.0329   Epoch: 8   Global Step: 354060   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:32,579-Speed 2627.59 samples/sec   Loss 7.6690   LearningRate 0.0329   Epoch: 8   Global Step: 354070   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:36,474-Speed 2629.48 samples/sec   Loss 7.6643   LearningRate 0.0329   Epoch: 8   Global Step: 354080   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:30:40,426-Speed 2591.21 samples/sec   Loss 7.7658   LearningRate 0.0329   Epoch: 8   Global Step: 354090   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:30:44,323-Speed 2628.67 samples/sec   Loss 7.8065   LearningRate 0.0329   Epoch: 8   Global Step: 354100   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:30:48,229-Speed 2622.29 samples/sec   Loss 7.7093   LearningRate 0.0328   Epoch: 8   Global Step: 354110   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:30:52,122-Speed 2631.19 samples/sec   Loss 7.6683   LearningRate 0.0328   Epoch: 8   Global Step: 354120   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:30:56,020-Speed 2628.04 samples/sec   Loss 7.4942   LearningRate 0.0328   Epoch: 8   Global Step: 354130   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:30:59,913-Speed 2630.62 samples/sec   Loss 7.8289   LearningRate 0.0328   Epoch: 8   Global Step: 354140   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:03,812-Speed 2626.94 samples/sec   Loss 7.7318   LearningRate 0.0328   Epoch: 8   Global Step: 354150   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:07,721-Speed 2620.33 samples/sec   Loss 7.7792   LearningRate 0.0328   Epoch: 8   Global Step: 354160   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:11,614-Speed 2631.45 samples/sec   Loss 7.7283   LearningRate 0.0328   Epoch: 8   Global Step: 354170   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:15,519-Speed 2622.51 samples/sec   Loss 7.6859   LearningRate 0.0328   Epoch: 8   Global Step: 354180   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:19,412-Speed 2631.60 samples/sec   Loss 7.8472   LearningRate 0.0328   Epoch: 8   Global Step: 354190   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:31:23,309-Speed 2628.22 samples/sec   Loss 7.7270   LearningRate 0.0328   Epoch: 8   Global Step: 354200   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:31:27,200-Speed 2632.71 samples/sec   Loss 7.7675   LearningRate 0.0328   Epoch: 8   Global Step: 354210   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:31:31,125-Speed 2608.91 samples/sec   Loss 7.9092   LearningRate 0.0328   Epoch: 8   Global Step: 354220   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:31:35,001-Speed 2642.93 samples/sec   Loss 7.5589   LearningRate 0.0328   Epoch: 8   Global Step: 354230   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:38,908-Speed 2621.23 samples/sec   Loss 7.7480   LearningRate 0.0328   Epoch: 8   Global Step: 354240   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:42,804-Speed 2629.86 samples/sec   Loss 7.6476   LearningRate 0.0328   Epoch: 8   Global Step: 354250   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:46,719-Speed 2615.96 samples/sec   Loss 7.6603   LearningRate 0.0328   Epoch: 8   Global Step: 354260   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:50,636-Speed 2615.41 samples/sec   Loss 7.7794   LearningRate 0.0328   Epoch: 8   Global Step: 354270   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:31:54,529-Speed 2630.20 samples/sec   Loss 8.0824   LearningRate 0.0328   Epoch: 8   Global Step: 354280   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:31:58,386-Speed 2656.18 samples/sec   Loss 8.9332   LearningRate 0.0328   Epoch: 8   Global Step: 354290   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:32:02,273-Speed 2634.72 samples/sec   Loss 7.9376   LearningRate 0.0328   Epoch: 8   Global Step: 354300   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:32:06,151-Speed 2641.74 samples/sec   Loss 8.0626   LearningRate 0.0328   Epoch: 8   Global Step: 354310   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:10,046-Speed 2629.72 samples/sec   Loss 8.5480   LearningRate 0.0328   Epoch: 8   Global Step: 354320   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:13,948-Speed 2625.09 samples/sec   Loss 7.6853   LearningRate 0.0328   Epoch: 8   Global Step: 354330   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:17,856-Speed 2621.28 samples/sec   Loss 7.6114   LearningRate 0.0328   Epoch: 8   Global Step: 354340   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:21,754-Speed 2627.63 samples/sec   Loss 7.6564   LearningRate 0.0328   Epoch: 8   Global Step: 354350   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:25,657-Speed 2624.00 samples/sec   Loss 7.7724   LearningRate 0.0328   Epoch: 8   Global Step: 354360   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:29,545-Speed 2634.15 samples/sec   Loss 7.8676   LearningRate 0.0328   Epoch: 8   Global Step: 354370   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:33,438-Speed 2630.92 samples/sec   Loss 7.6571   LearningRate 0.0328   Epoch: 8   Global Step: 354380   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:37,328-Speed 2633.40 samples/sec   Loss 7.7211   LearningRate 0.0328   Epoch: 8   Global Step: 354390   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:41,222-Speed 2630.35 samples/sec   Loss 7.8398   LearningRate 0.0328   Epoch: 8   Global Step: 354400   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:32:45,117-Speed 2629.42 samples/sec   Loss 7.7153   LearningRate 0.0328   Epoch: 8   Global Step: 354410   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:32:49,009-Speed 2632.17 samples/sec   Loss 7.8376   LearningRate 0.0328   Epoch: 8   Global Step: 354420   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:32:52,923-Speed 2616.17 samples/sec   Loss 7.7752   LearningRate 0.0328   Epoch: 8   Global Step: 354430   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:32:56,815-Speed 2632.78 samples/sec   Loss 7.7375   LearningRate 0.0328   Epoch: 8   Global Step: 354440   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:33:00,702-Speed 2634.83 samples/sec   Loss 7.6159   LearningRate 0.0328   Epoch: 8   Global Step: 354450   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:33:04,595-Speed 2630.44 samples/sec   Loss 7.6407   LearningRate 0.0328   Epoch: 8   Global Step: 354460   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:33:08,482-Speed 2634.89 samples/sec   Loss 7.7610   LearningRate 0.0328   Epoch: 8   Global Step: 354470   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:33:12,370-Speed 2634.80 samples/sec   Loss 7.6206   LearningRate 0.0328   Epoch: 8   Global Step: 354480   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:33:16,266-Speed 2629.06 samples/sec   Loss 7.6805   LearningRate 0.0328   Epoch: 8   Global Step: 354490   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:33:20,158-Speed 2631.69 samples/sec   Loss 7.8505   LearningRate 0.0328   Epoch: 8   Global Step: 354500   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:33:24,052-Speed 2630.04 samples/sec   Loss 7.6191   LearningRate 0.0328   Epoch: 8   Global Step: 354510   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:27,942-Speed 2632.85 samples/sec   Loss 7.7241   LearningRate 0.0328   Epoch: 8   Global Step: 354520   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:31,840-Speed 2628.09 samples/sec   Loss 7.8462   LearningRate 0.0328   Epoch: 8   Global Step: 354530   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:35,728-Speed 2634.02 samples/sec   Loss 7.7922   LearningRate 0.0328   Epoch: 8   Global Step: 354540   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:39,617-Speed 2633.47 samples/sec   Loss 7.7391   LearningRate 0.0328   Epoch: 8   Global Step: 354550   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:43,509-Speed 2631.64 samples/sec   Loss 7.7020   LearningRate 0.0328   Epoch: 8   Global Step: 354560   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:47,407-Speed 2628.15 samples/sec   Loss 7.8112   LearningRate 0.0328   Epoch: 8   Global Step: 354570   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:51,300-Speed 2630.79 samples/sec   Loss 7.7775   LearningRate 0.0328   Epoch: 8   Global Step: 354580   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:55,224-Speed 2610.53 samples/sec   Loss 7.7025   LearningRate 0.0328   Epoch: 8   Global Step: 354590   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:33:59,115-Speed 2632.10 samples/sec   Loss 7.6921   LearningRate 0.0328   Epoch: 8   Global Step: 354600   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:34:03,008-Speed 2631.09 samples/sec   Loss 7.7798   LearningRate 0.0328   Epoch: 8   Global Step: 354610   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:06,902-Speed 2630.49 samples/sec   Loss 7.6000   LearningRate 0.0328   Epoch: 8   Global Step: 354620   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:10,792-Speed 2632.70 samples/sec   Loss 7.6890   LearningRate 0.0328   Epoch: 8   Global Step: 354630   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:14,682-Speed 2633.10 samples/sec   Loss 7.6665   LearningRate 0.0328   Epoch: 8   Global Step: 354640   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:18,576-Speed 2630.52 samples/sec   Loss 7.7535   LearningRate 0.0328   Epoch: 8   Global Step: 354650   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:22,467-Speed 2632.70 samples/sec   Loss 7.6799   LearningRate 0.0328   Epoch: 8   Global Step: 354660   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:26,359-Speed 2631.35 samples/sec   Loss 7.7479   LearningRate 0.0328   Epoch: 8   Global Step: 354670   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:30,263-Speed 2624.12 samples/sec   Loss 7.6057   LearningRate 0.0328   Epoch: 8   Global Step: 354680   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:34,159-Speed 2629.09 samples/sec   Loss 7.7938   LearningRate 0.0328   Epoch: 8   Global Step: 354690   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:38,062-Speed 2624.39 samples/sec   Loss 7.7384   LearningRate 0.0328   Epoch: 8   Global Step: 354700   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:34:41,960-Speed 2627.27 samples/sec   Loss 7.8235   LearningRate 0.0328   Epoch: 8   Global Step: 354710   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:34:45,857-Speed 2628.38 samples/sec   Loss 7.6599   LearningRate 0.0328   Epoch: 8   Global Step: 354720   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:34:49,750-Speed 2631.05 samples/sec   Loss 7.8379   LearningRate 0.0328   Epoch: 8   Global Step: 354730   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:34:53,646-Speed 2629.07 samples/sec   Loss 7.6803   LearningRate 0.0328   Epoch: 8   Global Step: 354740   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:34:57,542-Speed 2629.30 samples/sec   Loss 7.6863   LearningRate 0.0328   Epoch: 8   Global Step: 354750   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:35:01,448-Speed 2621.94 samples/sec   Loss 7.6908   LearningRate 0.0328   Epoch: 8   Global Step: 354760   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:35:05,342-Speed 2630.21 samples/sec   Loss 7.6837   LearningRate 0.0328   Epoch: 8   Global Step: 354770   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:35:09,236-Speed 2630.36 samples/sec   Loss 7.7333   LearningRate 0.0328   Epoch: 8   Global Step: 354780   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:35:13,131-Speed 2629.71 samples/sec   Loss 7.7955   LearningRate 0.0328   Epoch: 8   Global Step: 354790   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:35:17,048-Speed 2615.25 samples/sec   Loss 7.6121   LearningRate 0.0328   Epoch: 8   Global Step: 354800   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:35:20,941-Speed 2630.92 samples/sec   Loss 7.7973   LearningRate 0.0328   Epoch: 8   Global Step: 354810   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:24,831-Speed 2632.95 samples/sec   Loss 7.7231   LearningRate 0.0328   Epoch: 8   Global Step: 354820   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:28,731-Speed 2626.19 samples/sec   Loss 7.7762   LearningRate 0.0328   Epoch: 8   Global Step: 354830   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:32,631-Speed 2627.41 samples/sec   Loss 7.6828   LearningRate 0.0327   Epoch: 8   Global Step: 354840   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:36,533-Speed 2624.79 samples/sec   Loss 7.8244   LearningRate 0.0327   Epoch: 8   Global Step: 354850   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:40,448-Speed 2615.95 samples/sec   Loss 7.6803   LearningRate 0.0327   Epoch: 8   Global Step: 354860   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:44,352-Speed 2623.74 samples/sec   Loss 7.7440   LearningRate 0.0327   Epoch: 8   Global Step: 354870   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:48,246-Speed 2631.42 samples/sec   Loss 7.6486   LearningRate 0.0327   Epoch: 8   Global Step: 354880   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:52,149-Speed 2624.16 samples/sec   Loss 7.5577   LearningRate 0.0327   Epoch: 8   Global Step: 354890   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:56,047-Speed 2627.63 samples/sec   Loss 7.6954   LearningRate 0.0327   Epoch: 8   Global Step: 354900   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:35:59,943-Speed 2629.17 samples/sec   Loss 7.7071   LearningRate 0.0327   Epoch: 8   Global Step: 354910   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:36:03,839-Speed 2629.12 samples/sec   Loss 7.7537   LearningRate 0.0327   Epoch: 8   Global Step: 354920   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:07,742-Speed 2624.46 samples/sec   Loss 7.7806   LearningRate 0.0327   Epoch: 8   Global Step: 354930   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:11,666-Speed 2610.08 samples/sec   Loss 7.6816   LearningRate 0.0327   Epoch: 8   Global Step: 354940   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:15,565-Speed 2627.05 samples/sec   Loss 7.6824   LearningRate 0.0327   Epoch: 8   Global Step: 354950   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:19,468-Speed 2623.97 samples/sec   Loss 7.6456   LearningRate 0.0327   Epoch: 8   Global Step: 354960   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:23,379-Speed 2619.23 samples/sec   Loss 7.7913   LearningRate 0.0327   Epoch: 8   Global Step: 354970   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:27,287-Speed 2621.10 samples/sec   Loss 7.6512   LearningRate 0.0327   Epoch: 8   Global Step: 354980   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:31,183-Speed 2629.36 samples/sec   Loss 7.7576   LearningRate 0.0327   Epoch: 8   Global Step: 354990   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:35,074-Speed 2631.66 samples/sec   Loss 7.6935   LearningRate 0.0327   Epoch: 8   Global Step: 355000   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:39,015-Speed 2599.50 samples/sec   Loss 7.7970   LearningRate 0.0327   Epoch: 8   Global Step: 355010   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:36:42,911-Speed 2629.01 samples/sec   Loss 7.7540   LearningRate 0.0327   Epoch: 8   Global Step: 355020   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:36:46,802-Speed 2632.32 samples/sec   Loss 7.8591   LearningRate 0.0327   Epoch: 8   Global Step: 355030   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:36:50,693-Speed 2632.66 samples/sec   Loss 7.7303   LearningRate 0.0327   Epoch: 8   Global Step: 355040   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:36:54,604-Speed 2618.49 samples/sec   Loss 7.7209   LearningRate 0.0327   Epoch: 8   Global Step: 355050   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:36:58,494-Speed 2633.46 samples/sec   Loss 7.6565   LearningRate 0.0327   Epoch: 8   Global Step: 355060   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:37:02,396-Speed 2624.59 samples/sec   Loss 7.7311   LearningRate 0.0327   Epoch: 8   Global Step: 355070   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:37:06,269-Speed 2645.10 samples/sec   Loss 7.7101   LearningRate 0.0327   Epoch: 8   Global Step: 355080   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:10,190-Speed 2611.84 samples/sec   Loss 7.6967   LearningRate 0.0327   Epoch: 8   Global Step: 355090   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:14,104-Speed 2617.41 samples/sec   Loss 7.7781   LearningRate 0.0327   Epoch: 8   Global Step: 355100   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:17,991-Speed 2634.96 samples/sec   Loss 7.6775   LearningRate 0.0327   Epoch: 8   Global Step: 355110   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:21,891-Speed 2626.60 samples/sec   Loss 7.7598   LearningRate 0.0327   Epoch: 8   Global Step: 355120   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:25,784-Speed 2630.46 samples/sec   Loss 7.7019   LearningRate 0.0327   Epoch: 8   Global Step: 355130   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:29,686-Speed 2625.41 samples/sec   Loss 7.6955   LearningRate 0.0327   Epoch: 8   Global Step: 355140   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:33,575-Speed 2633.30 samples/sec   Loss 7.7059   LearningRate 0.0327   Epoch: 8   Global Step: 355150   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:37,511-Speed 2602.41 samples/sec   Loss 7.6672   LearningRate 0.0327   Epoch: 8   Global Step: 355160   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:41,400-Speed 2633.83 samples/sec   Loss 7.6087   LearningRate 0.0327   Epoch: 8   Global Step: 355170   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:37:45,291-Speed 2632.71 samples/sec   Loss 7.6245   LearningRate 0.0327   Epoch: 8   Global Step: 355180   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:37:49,184-Speed 2630.83 samples/sec   Loss 7.6346   LearningRate 0.0327   Epoch: 8   Global Step: 355190   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:37:53,077-Speed 2631.78 samples/sec   Loss 7.7305   LearningRate 0.0327   Epoch: 8   Global Step: 355200   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:37:56,964-Speed 2634.36 samples/sec   Loss 7.7388   LearningRate 0.0327   Epoch: 8   Global Step: 355210   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:00,856-Speed 2631.61 samples/sec   Loss 7.8537   LearningRate 0.0327   Epoch: 8   Global Step: 355220   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:04,763-Speed 2621.79 samples/sec   Loss 7.6557   LearningRate 0.0327   Epoch: 8   Global Step: 355230   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:08,659-Speed 2629.19 samples/sec   Loss 7.6075   LearningRate 0.0327   Epoch: 8   Global Step: 355240   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:12,560-Speed 2626.05 samples/sec   Loss 7.7635   LearningRate 0.0327   Epoch: 8   Global Step: 355250   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:16,455-Speed 2629.60 samples/sec   Loss 7.8774   LearningRate 0.0327   Epoch: 8   Global Step: 355260   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:20,346-Speed 2632.54 samples/sec   Loss 7.7731   LearningRate 0.0327   Epoch: 8   Global Step: 355270   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:24,243-Speed 2628.50 samples/sec   Loss 7.6849   LearningRate 0.0327   Epoch: 8   Global Step: 355280   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:38:28,140-Speed 2627.65 samples/sec   Loss 7.6357   LearningRate 0.0327   Epoch: 8   Global Step: 355290   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:38:32,011-Speed 2646.03 samples/sec   Loss 7.8063   LearningRate 0.0327   Epoch: 8   Global Step: 355300   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:35,908-Speed 2628.14 samples/sec   Loss 7.6530   LearningRate 0.0327   Epoch: 8   Global Step: 355310   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:39,800-Speed 2631.92 samples/sec   Loss 7.7135   LearningRate 0.0327   Epoch: 8   Global Step: 355320   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:43,696-Speed 2629.43 samples/sec   Loss 7.6680   LearningRate 0.0327   Epoch: 8   Global Step: 355330   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:47,588-Speed 2631.66 samples/sec   Loss 7.7702   LearningRate 0.0327   Epoch: 8   Global Step: 355340   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:51,478-Speed 2633.27 samples/sec   Loss 7.6612   LearningRate 0.0327   Epoch: 8   Global Step: 355350   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:55,388-Speed 2619.32 samples/sec   Loss 7.7439   LearningRate 0.0327   Epoch: 8   Global Step: 355360   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:38:59,285-Speed 2628.60 samples/sec   Loss 7.6554   LearningRate 0.0327   Epoch: 8   Global Step: 355370   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:39:03,107-Speed 2679.67 samples/sec   Loss 8.2837   LearningRate 0.0327   Epoch: 8   Global Step: 355380   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:06,999-Speed 2632.11 samples/sec   Loss 8.7650   LearningRate 0.0327   Epoch: 8   Global Step: 355390   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:10,889-Speed 2633.09 samples/sec   Loss 8.3403   LearningRate 0.0327   Epoch: 8   Global Step: 355400   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:14,775-Speed 2635.17 samples/sec   Loss 7.9134   LearningRate 0.0327   Epoch: 8   Global Step: 355410   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:18,667-Speed 2632.02 samples/sec   Loss 7.8366   LearningRate 0.0327   Epoch: 8   Global Step: 355420   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:22,562-Speed 2630.01 samples/sec   Loss 7.8695   LearningRate 0.0327   Epoch: 8   Global Step: 355430   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:26,451-Speed 2634.00 samples/sec   Loss 7.6236   LearningRate 0.0327   Epoch: 8   Global Step: 355440   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:30,347-Speed 2628.77 samples/sec   Loss 7.5666   LearningRate 0.0327   Epoch: 8   Global Step: 355450   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:34,229-Speed 2637.71 samples/sec   Loss 7.7910   LearningRate 0.0327   Epoch: 8   Global Step: 355460   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:38,125-Speed 2629.21 samples/sec   Loss 7.6151   LearningRate 0.0327   Epoch: 8   Global Step: 355470   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:39:42,017-Speed 2631.77 samples/sec   Loss 7.7587   LearningRate 0.0327   Epoch: 8   Global Step: 355480   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:39:45,915-Speed 2628.10 samples/sec   Loss 7.6524   LearningRate 0.0327   Epoch: 8   Global Step: 355490   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:39:49,816-Speed 2625.23 samples/sec   Loss 7.7620   LearningRate 0.0327   Epoch: 8   Global Step: 355500   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:39:53,713-Speed 2628.54 samples/sec   Loss 7.8042   LearningRate 0.0327   Epoch: 8   Global Step: 355510   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:39:57,604-Speed 2634.62 samples/sec   Loss 7.6585   LearningRate 0.0327   Epoch: 8   Global Step: 355520   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:40:01,509-Speed 2622.78 samples/sec   Loss 7.6066   LearningRate 0.0327   Epoch: 8   Global Step: 355530   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:40:05,400-Speed 2631.81 samples/sec   Loss 7.6866   LearningRate 0.0327   Epoch: 8   Global Step: 355540   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:40:09,294-Speed 2630.15 samples/sec   Loss 7.6839   LearningRate 0.0327   Epoch: 8   Global Step: 355550   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:40:13,194-Speed 2627.05 samples/sec   Loss 7.8628   LearningRate 0.0326   Epoch: 8   Global Step: 355560   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:40:17,085-Speed 2631.91 samples/sec   Loss 7.8250   LearningRate 0.0326   Epoch: 8   Global Step: 355570   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:40:20,990-Speed 2623.48 samples/sec   Loss 7.6529   LearningRate 0.0326   Epoch: 8   Global Step: 355580   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:24,877-Speed 2634.94 samples/sec   Loss 7.7250   LearningRate 0.0326   Epoch: 8   Global Step: 355590   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:28,805-Speed 2607.72 samples/sec   Loss 7.7868   LearningRate 0.0326   Epoch: 8   Global Step: 355600   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:32,703-Speed 2627.60 samples/sec   Loss 7.6186   LearningRate 0.0326   Epoch: 8   Global Step: 355610   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:36,597-Speed 2630.59 samples/sec   Loss 7.7022   LearningRate 0.0326   Epoch: 8   Global Step: 355620   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:40,497-Speed 2626.15 samples/sec   Loss 7.7959   LearningRate 0.0326   Epoch: 8   Global Step: 355630   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:44,419-Speed 2611.48 samples/sec   Loss 7.7304   LearningRate 0.0326   Epoch: 8   Global Step: 355640   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:48,338-Speed 2614.33 samples/sec   Loss 7.7160   LearningRate 0.0326   Epoch: 8   Global Step: 355650   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:52,235-Speed 2627.83 samples/sec   Loss 7.7442   LearningRate 0.0326   Epoch: 8   Global Step: 355660   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:40:56,126-Speed 2632.56 samples/sec   Loss 7.8065   LearningRate 0.0326   Epoch: 8   Global Step: 355670   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:41:00,024-Speed 2627.77 samples/sec   Loss 7.5671   LearningRate 0.0326   Epoch: 8   Global Step: 355680   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:03,919-Speed 2629.13 samples/sec   Loss 7.7023   LearningRate 0.0326   Epoch: 8   Global Step: 355690   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:07,819-Speed 2626.31 samples/sec   Loss 7.7532   LearningRate 0.0326   Epoch: 8   Global Step: 355700   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:11,768-Speed 2593.98 samples/sec   Loss 7.6806   LearningRate 0.0326   Epoch: 8   Global Step: 355710   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:15,842-Speed 2514.02 samples/sec   Loss 7.7160   LearningRate 0.0326   Epoch: 8   Global Step: 355720   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:19,779-Speed 2601.58 samples/sec   Loss 7.6739   LearningRate 0.0326   Epoch: 8   Global Step: 355730   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:23,683-Speed 2624.26 samples/sec   Loss 7.6638   LearningRate 0.0326   Epoch: 8   Global Step: 355740   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:27,574-Speed 2632.30 samples/sec   Loss 7.5338   LearningRate 0.0326   Epoch: 8   Global Step: 355750   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:31,469-Speed 2629.85 samples/sec   Loss 7.7141   LearningRate 0.0326   Epoch: 8   Global Step: 355760   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:35,376-Speed 2621.48 samples/sec   Loss 7.6045   LearningRate 0.0326   Epoch: 8   Global Step: 355770   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:41:39,275-Speed 2627.26 samples/sec   Loss 7.6628   LearningRate 0.0326   Epoch: 8   Global Step: 355780   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:41:43,170-Speed 2629.78 samples/sec   Loss 7.7082   LearningRate 0.0326   Epoch: 8   Global Step: 355790   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:41:47,065-Speed 2628.99 samples/sec   Loss 7.6840   LearningRate 0.0326   Epoch: 8   Global Step: 355800   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:41:50,964-Speed 2627.33 samples/sec   Loss 7.6820   LearningRate 0.0326   Epoch: 8   Global Step: 355810   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:41:54,862-Speed 2627.62 samples/sec   Loss 7.7782   LearningRate 0.0326   Epoch: 8   Global Step: 355820   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:41:58,786-Speed 2610.40 samples/sec   Loss 7.6278   LearningRate 0.0326   Epoch: 8   Global Step: 355830   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:42:02,703-Speed 2615.27 samples/sec   Loss 7.5896   LearningRate 0.0326   Epoch: 8   Global Step: 355840   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:42:06,616-Speed 2617.03 samples/sec   Loss 7.6346   LearningRate 0.0326   Epoch: 8   Global Step: 355850   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:42:10,529-Speed 2617.39 samples/sec   Loss 7.7341   LearningRate 0.0326   Epoch: 8   Global Step: 355860   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:42:14,445-Speed 2616.04 samples/sec   Loss 7.7878   LearningRate 0.0326   Epoch: 8   Global Step: 355870   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:42:18,361-Speed 2615.98 samples/sec   Loss 7.7969   LearningRate 0.0326   Epoch: 8   Global Step: 355880   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:22,270-Speed 2620.09 samples/sec   Loss 7.7508   LearningRate 0.0326   Epoch: 8   Global Step: 355890   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:26,177-Speed 2621.54 samples/sec   Loss 7.7563   LearningRate 0.0326   Epoch: 8   Global Step: 355900   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:30,092-Speed 2616.01 samples/sec   Loss 7.7013   LearningRate 0.0326   Epoch: 8   Global Step: 355910   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:33,996-Speed 2624.39 samples/sec   Loss 7.7377   LearningRate 0.0326   Epoch: 8   Global Step: 355920   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:37,896-Speed 2626.09 samples/sec   Loss 7.7082   LearningRate 0.0326   Epoch: 8   Global Step: 355930   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:41,801-Speed 2622.76 samples/sec   Loss 7.5570   LearningRate 0.0326   Epoch: 8   Global Step: 355940   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:45,693-Speed 2631.93 samples/sec   Loss 7.7011   LearningRate 0.0326   Epoch: 8   Global Step: 355950   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:49,585-Speed 2631.38 samples/sec   Loss 7.6783   LearningRate 0.0326   Epoch: 8   Global Step: 355960   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:53,479-Speed 2631.06 samples/sec   Loss 7.5363   LearningRate 0.0326   Epoch: 8   Global Step: 355970   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:42:57,449-Speed 2579.44 samples/sec   Loss 7.6423   LearningRate 0.0326   Epoch: 8   Global Step: 355980   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:43:01,348-Speed 2627.23 samples/sec   Loss 7.7565   LearningRate 0.0326   Epoch: 8   Global Step: 355990   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:43:05,243-Speed 2629.11 samples/sec   Loss 7.7676   LearningRate 0.0326   Epoch: 8   Global Step: 356000   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:43:09,139-Speed 2629.59 samples/sec   Loss 7.6751   LearningRate 0.0326   Epoch: 8   Global Step: 356010   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:43:13,199-Speed 2523.24 samples/sec   Loss 7.8128   LearningRate 0.0326   Epoch: 8   Global Step: 356020   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:43:17,159-Speed 2586.16 samples/sec   Loss 7.6596   LearningRate 0.0326   Epoch: 8   Global Step: 356030   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:43:21,081-Speed 2612.04 samples/sec   Loss 7.7541   LearningRate 0.0326   Epoch: 8   Global Step: 356040   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:43:24,981-Speed 2626.27 samples/sec   Loss 7.7722   LearningRate 0.0326   Epoch: 8   Global Step: 356050   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:43:28,883-Speed 2624.77 samples/sec   Loss 7.6084   LearningRate 0.0326   Epoch: 8   Global Step: 356060   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:43:32,756-Speed 2644.16 samples/sec   Loss 7.8864   LearningRate 0.0326   Epoch: 8   Global Step: 356070   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:43:36,650-Speed 2630.81 samples/sec   Loss 7.6738   LearningRate 0.0326   Epoch: 8   Global Step: 356080   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:43:40,542-Speed 2631.19 samples/sec   Loss 7.6449   LearningRate 0.0326   Epoch: 8   Global Step: 356090   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:43:44,434-Speed 2632.21 samples/sec   Loss 7.8156   LearningRate 0.0326   Epoch: 8   Global Step: 356100   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:43:48,329-Speed 2629.65 samples/sec   Loss 7.7003   LearningRate 0.0326   Epoch: 8   Global Step: 356110   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:43:52,222-Speed 2631.59 samples/sec   Loss 7.5806   LearningRate 0.0326   Epoch: 8   Global Step: 356120   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:43:56,116-Speed 2630.51 samples/sec   Loss 7.7509   LearningRate 0.0326   Epoch: 8   Global Step: 356130   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:44:00,015-Speed 2626.54 samples/sec   Loss 7.6865   LearningRate 0.0326   Epoch: 8   Global Step: 356140   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:44:03,893-Speed 2640.79 samples/sec   Loss 9.2135   LearningRate 0.0326   Epoch: 8   Global Step: 356150   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:07,787-Speed 2631.33 samples/sec   Loss 8.4317   LearningRate 0.0326   Epoch: 8   Global Step: 356160   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:11,675-Speed 2633.95 samples/sec   Loss 7.7725   LearningRate 0.0326   Epoch: 8   Global Step: 356170   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:15,596-Speed 2612.59 samples/sec   Loss 7.8412   LearningRate 0.0326   Epoch: 8   Global Step: 356180   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:19,488-Speed 2631.75 samples/sec   Loss 7.7790   LearningRate 0.0326   Epoch: 8   Global Step: 356190   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:23,411-Speed 2611.00 samples/sec   Loss 7.8121   LearningRate 0.0326   Epoch: 8   Global Step: 356200   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:27,305-Speed 2630.31 samples/sec   Loss 7.7417   LearningRate 0.0326   Epoch: 8   Global Step: 356210   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:31,196-Speed 2632.73 samples/sec   Loss 7.6680   LearningRate 0.0326   Epoch: 8   Global Step: 356220   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:35,088-Speed 2631.26 samples/sec   Loss 7.6373   LearningRate 0.0326   Epoch: 8   Global Step: 356230   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:38,979-Speed 2632.85 samples/sec   Loss 7.6782   LearningRate 0.0326   Epoch: 8   Global Step: 356240   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:44:42,872-Speed 2631.06 samples/sec   Loss 7.7989   LearningRate 0.0326   Epoch: 8   Global Step: 356250   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:44:46,764-Speed 2631.49 samples/sec   Loss 7.7374   LearningRate 0.0326   Epoch: 8   Global Step: 356260   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:44:50,675-Speed 2618.96 samples/sec   Loss 7.7430   LearningRate 0.0326   Epoch: 8   Global Step: 356270   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:44:54,591-Speed 2615.93 samples/sec   Loss 7.7115   LearningRate 0.0326   Epoch: 8   Global Step: 356280   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:44:58,487-Speed 2628.77 samples/sec   Loss 7.6895   LearningRate 0.0325   Epoch: 8   Global Step: 356290   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:02,377-Speed 2633.15 samples/sec   Loss 7.7384   LearningRate 0.0325   Epoch: 8   Global Step: 356300   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:06,277-Speed 2626.06 samples/sec   Loss 7.6071   LearningRate 0.0325   Epoch: 8   Global Step: 356310   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:10,187-Speed 2619.82 samples/sec   Loss 7.5413   LearningRate 0.0325   Epoch: 8   Global Step: 356320   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:14,083-Speed 2629.17 samples/sec   Loss 7.6724   LearningRate 0.0325   Epoch: 8   Global Step: 356330   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:17,983-Speed 2626.39 samples/sec   Loss 7.7458   LearningRate 0.0325   Epoch: 8   Global Step: 356340   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:21,883-Speed 2625.65 samples/sec   Loss 7.7035   LearningRate 0.0325   Epoch: 8   Global Step: 356350   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:45:25,776-Speed 2631.39 samples/sec   Loss 7.6625   LearningRate 0.0325   Epoch: 8   Global Step: 356360   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:45:29,666-Speed 2633.26 samples/sec   Loss 7.7173   LearningRate 0.0325   Epoch: 8   Global Step: 356370   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:45:33,561-Speed 2629.62 samples/sec   Loss 7.5518   LearningRate 0.0325   Epoch: 8   Global Step: 356380   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:45:37,473-Speed 2617.77 samples/sec   Loss 7.7002   LearningRate 0.0325   Epoch: 8   Global Step: 356390   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:45:41,366-Speed 2631.93 samples/sec   Loss 7.6817   LearningRate 0.0325   Epoch: 8   Global Step: 356400   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:45:45,241-Speed 2642.98 samples/sec   Loss 7.7068   LearningRate 0.0325   Epoch: 8   Global Step: 356410   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:49,131-Speed 2632.97 samples/sec   Loss 7.6345   LearningRate 0.0325   Epoch: 8   Global Step: 356420   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:53,038-Speed 2621.81 samples/sec   Loss 7.6751   LearningRate 0.0325   Epoch: 8   Global Step: 356430   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:45:56,935-Speed 2627.95 samples/sec   Loss 7.6969   LearningRate 0.0325   Epoch: 8   Global Step: 356440   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:46:00,828-Speed 2631.02 samples/sec   Loss 7.6407   LearningRate 0.0325   Epoch: 8   Global Step: 356450   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:46:04,722-Speed 2630.87 samples/sec   Loss 7.6235   LearningRate 0.0325   Epoch: 8   Global Step: 356460   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:46:08,613-Speed 2632.23 samples/sec   Loss 7.7432   LearningRate 0.0325   Epoch: 8   Global Step: 356470   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:46:12,520-Speed 2621.34 samples/sec   Loss 7.7405   LearningRate 0.0325   Epoch: 8   Global Step: 356480   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:46:16,414-Speed 2630.54 samples/sec   Loss 7.8584   LearningRate 0.0325   Epoch: 8   Global Step: 356490   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:46:20,307-Speed 2631.08 samples/sec   Loss 7.7918   LearningRate 0.0325   Epoch: 8   Global Step: 356500   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:46:24,202-Speed 2628.94 samples/sec   Loss 7.5467   LearningRate 0.0325   Epoch: 8   Global Step: 356510   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:28,102-Speed 2626.45 samples/sec   Loss 7.6550   LearningRate 0.0325   Epoch: 8   Global Step: 356520   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:31,997-Speed 2629.58 samples/sec   Loss 7.5610   LearningRate 0.0325   Epoch: 8   Global Step: 356530   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:35,894-Speed 2628.50 samples/sec   Loss 7.8231   LearningRate 0.0325   Epoch: 8   Global Step: 356540   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:39,797-Speed 2624.03 samples/sec   Loss 7.7833   LearningRate 0.0325   Epoch: 8   Global Step: 356550   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:43,706-Speed 2620.56 samples/sec   Loss 7.6605   LearningRate 0.0325   Epoch: 8   Global Step: 356560   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:47,609-Speed 2623.96 samples/sec   Loss 7.6181   LearningRate 0.0325   Epoch: 8   Global Step: 356570   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:51,505-Speed 2629.37 samples/sec   Loss 7.5971   LearningRate 0.0325   Epoch: 8   Global Step: 356580   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:55,399-Speed 2630.25 samples/sec   Loss 7.6609   LearningRate 0.0325   Epoch: 8   Global Step: 356590   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:46:59,289-Speed 2634.28 samples/sec   Loss 7.6658   LearningRate 0.0325   Epoch: 8   Global Step: 356600   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:03,186-Speed 2627.77 samples/sec   Loss 7.6585   LearningRate 0.0325   Epoch: 8   Global Step: 356610   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:47:07,090-Speed 2623.29 samples/sec   Loss 7.7348   LearningRate 0.0325   Epoch: 8   Global Step: 356620   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:47:10,982-Speed 2631.84 samples/sec   Loss 7.7905   LearningRate 0.0325   Epoch: 8   Global Step: 356630   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:47:14,881-Speed 2627.24 samples/sec   Loss 7.6836   LearningRate 0.0325   Epoch: 8   Global Step: 356640   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:47:18,759-Speed 2641.74 samples/sec   Loss 7.7589   LearningRate 0.0325   Epoch: 8   Global Step: 356650   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:22,653-Speed 2630.26 samples/sec   Loss 7.6117   LearningRate 0.0325   Epoch: 8   Global Step: 356660   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:26,545-Speed 2631.52 samples/sec   Loss 7.7562   LearningRate 0.0325   Epoch: 8   Global Step: 356670   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:30,445-Speed 2626.04 samples/sec   Loss 7.7302   LearningRate 0.0325   Epoch: 8   Global Step: 356680   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:34,351-Speed 2622.35 samples/sec   Loss 7.6678   LearningRate 0.0325   Epoch: 8   Global Step: 356690   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:38,244-Speed 2630.47 samples/sec   Loss 7.7905   LearningRate 0.0325   Epoch: 8   Global Step: 356700   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:42,139-Speed 2630.32 samples/sec   Loss 7.5942   LearningRate 0.0325   Epoch: 8   Global Step: 356710   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:46,028-Speed 2633.41 samples/sec   Loss 7.6732   LearningRate 0.0325   Epoch: 8   Global Step: 356720   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:49,922-Speed 2630.63 samples/sec   Loss 7.5726   LearningRate 0.0325   Epoch: 8   Global Step: 356730   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:53,813-Speed 2632.08 samples/sec   Loss 7.5734   LearningRate 0.0325   Epoch: 8   Global Step: 356740   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:47:57,737-Speed 2610.93 samples/sec   Loss 7.6065   LearningRate 0.0325   Epoch: 8   Global Step: 356750   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:48:01,630-Speed 2630.94 samples/sec   Loss 7.6829   LearningRate 0.0325   Epoch: 8   Global Step: 356760   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:48:05,542-Speed 2617.97 samples/sec   Loss 7.6552   LearningRate 0.0325   Epoch: 8   Global Step: 356770   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:48:09,470-Speed 2607.37 samples/sec   Loss 7.6230   LearningRate 0.0325   Epoch: 8   Global Step: 356780   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:48:13,380-Speed 2620.29 samples/sec   Loss 7.5721   LearningRate 0.0325   Epoch: 8   Global Step: 356790   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:48:17,309-Speed 2606.43 samples/sec   Loss 7.7514   LearningRate 0.0325   Epoch: 8   Global Step: 356800   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:21,209-Speed 2626.25 samples/sec   Loss 7.5971   LearningRate 0.0325   Epoch: 8   Global Step: 356810   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:25,109-Speed 2626.51 samples/sec   Loss 7.6305   LearningRate 0.0325   Epoch: 8   Global Step: 356820   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:29,009-Speed 2627.16 samples/sec   Loss 7.7463   LearningRate 0.0325   Epoch: 8   Global Step: 356830   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:32,906-Speed 2627.60 samples/sec   Loss 7.5983   LearningRate 0.0325   Epoch: 8   Global Step: 356840   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:36,819-Speed 2617.73 samples/sec   Loss 7.6770   LearningRate 0.0325   Epoch: 8   Global Step: 356850   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:40,717-Speed 2627.52 samples/sec   Loss 7.8191   LearningRate 0.0325   Epoch: 8   Global Step: 356860   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:44,608-Speed 2632.82 samples/sec   Loss 7.6768   LearningRate 0.0325   Epoch: 8   Global Step: 356870   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:48,511-Speed 2623.96 samples/sec   Loss 7.7077   LearningRate 0.0325   Epoch: 8   Global Step: 356880   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:52,498-Speed 2568.92 samples/sec   Loss 7.6959   LearningRate 0.0325   Epoch: 8   Global Step: 356890   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:48:56,421-Speed 2611.43 samples/sec   Loss 7.6331   LearningRate 0.0325   Epoch: 8   Global Step: 356900   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:49:00,320-Speed 2626.64 samples/sec   Loss 7.5755   LearningRate 0.0325   Epoch: 8   Global Step: 356910   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:49:04,196-Speed 2642.67 samples/sec   Loss 7.7904   LearningRate 0.0325   Epoch: 8   Global Step: 356920   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:08,094-Speed 2627.74 samples/sec   Loss 7.7270   LearningRate 0.0325   Epoch: 8   Global Step: 356930   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:11,993-Speed 2627.06 samples/sec   Loss 7.6865   LearningRate 0.0325   Epoch: 8   Global Step: 356940   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:15,883-Speed 2632.60 samples/sec   Loss 7.6958   LearningRate 0.0325   Epoch: 8   Global Step: 356950   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:19,790-Speed 2621.83 samples/sec   Loss 7.6914   LearningRate 0.0325   Epoch: 8   Global Step: 356960   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:23,694-Speed 2624.06 samples/sec   Loss 7.6141   LearningRate 0.0325   Epoch: 8   Global Step: 356970   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:27,590-Speed 2629.57 samples/sec   Loss 7.6795   LearningRate 0.0325   Epoch: 8   Global Step: 356980   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:31,498-Speed 2620.21 samples/sec   Loss 7.6386   LearningRate 0.0325   Epoch: 8   Global Step: 356990   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:35,413-Speed 2616.24 samples/sec   Loss 7.6432   LearningRate 0.0325   Epoch: 8   Global Step: 357000   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:39,312-Speed 2627.00 samples/sec   Loss 7.7518   LearningRate 0.0325   Epoch: 8   Global Step: 357010   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:49:43,204-Speed 2631.93 samples/sec   Loss 7.7398   LearningRate 0.0324   Epoch: 8   Global Step: 357020   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:49:47,103-Speed 2627.00 samples/sec   Loss 7.6507   LearningRate 0.0324   Epoch: 8   Global Step: 357030   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:49:51,002-Speed 2627.19 samples/sec   Loss 7.6526   LearningRate 0.0324   Epoch: 8   Global Step: 357040   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:49:54,890-Speed 2633.72 samples/sec   Loss 7.6365   LearningRate 0.0324   Epoch: 8   Global Step: 357050   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:49:58,745-Speed 2657.49 samples/sec   Loss 8.4114   LearningRate 0.0324   Epoch: 8   Global Step: 357060   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:02,655-Speed 2619.14 samples/sec   Loss 9.4159   LearningRate 0.0324   Epoch: 8   Global Step: 357070   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:06,547-Speed 2631.69 samples/sec   Loss 8.4404   LearningRate 0.0324   Epoch: 8   Global Step: 357080   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:10,431-Speed 2636.48 samples/sec   Loss 7.9256   LearningRate 0.0324   Epoch: 8   Global Step: 357090   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:14,335-Speed 2624.28 samples/sec   Loss 7.8087   LearningRate 0.0324   Epoch: 8   Global Step: 357100   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:18,224-Speed 2633.76 samples/sec   Loss 7.6863   LearningRate 0.0324   Epoch: 8   Global Step: 357110   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:22,114-Speed 2632.84 samples/sec   Loss 7.7387   LearningRate 0.0324   Epoch: 8   Global Step: 357120   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:26,007-Speed 2630.91 samples/sec   Loss 7.6710   LearningRate 0.0324   Epoch: 8   Global Step: 357130   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:29,892-Speed 2636.73 samples/sec   Loss 7.6754   LearningRate 0.0324   Epoch: 8   Global Step: 357140   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:33,785-Speed 2631.14 samples/sec   Loss 7.7168   LearningRate 0.0324   Epoch: 8   Global Step: 357150   Fp16 Grad Scale: 1024   Required: 53 hours
Training: 2022-04-14 11:50:37,677-Speed 2631.45 samples/sec   Loss 7.6490   LearningRate 0.0324   Epoch: 8   Global Step: 357160   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:50:41,562-Speed 2636.11 samples/sec   Loss 7.7720   LearningRate 0.0324   Epoch: 8   Global Step: 357170   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:50:45,455-Speed 2630.88 samples/sec   Loss 7.7537   LearningRate 0.0324   Epoch: 8   Global Step: 357180   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:50:49,342-Speed 2635.12 samples/sec   Loss 7.6456   LearningRate 0.0324   Epoch: 8   Global Step: 357190   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:50:53,236-Speed 2630.14 samples/sec   Loss 7.5654   LearningRate 0.0324   Epoch: 8   Global Step: 357200   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:50:57,134-Speed 2628.09 samples/sec   Loss 7.6231   LearningRate 0.0324   Epoch: 8   Global Step: 357210   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:51:01,030-Speed 2628.98 samples/sec   Loss 7.6185   LearningRate 0.0324   Epoch: 8   Global Step: 357220   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:51:04,927-Speed 2628.56 samples/sec   Loss 7.7181   LearningRate 0.0324   Epoch: 8   Global Step: 357230   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:51:08,823-Speed 2628.62 samples/sec   Loss 7.7302   LearningRate 0.0324   Epoch: 8   Global Step: 357240   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:51:12,721-Speed 2628.22 samples/sec   Loss 7.6188   LearningRate 0.0324   Epoch: 8   Global Step: 357250   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 11:51:16,610-Speed 2633.73 samples/sec   Loss 7.7866   LearningRate 0.0324   Epoch: 8   Global Step: 357260   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:20,525-Speed 2616.10 samples/sec   Loss 7.6598   LearningRate 0.0324   Epoch: 8   Global Step: 357270   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:24,415-Speed 2633.21 samples/sec   Loss 7.6241   LearningRate 0.0324   Epoch: 8   Global Step: 357280   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:28,304-Speed 2634.29 samples/sec   Loss 7.6877   LearningRate 0.0324   Epoch: 8   Global Step: 357290   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:32,198-Speed 2629.74 samples/sec   Loss 7.6424   LearningRate 0.0324   Epoch: 8   Global Step: 357300   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:36,089-Speed 2632.25 samples/sec   Loss 7.7009   LearningRate 0.0324   Epoch: 8   Global Step: 357310   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:39,977-Speed 2634.51 samples/sec   Loss 7.6812   LearningRate 0.0324   Epoch: 8   Global Step: 357320   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:43,878-Speed 2626.04 samples/sec   Loss 7.8029   LearningRate 0.0324   Epoch: 8   Global Step: 357330   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:47,764-Speed 2635.83 samples/sec   Loss 7.6532   LearningRate 0.0324   Epoch: 8   Global Step: 357340   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:51,654-Speed 2632.84 samples/sec   Loss 7.8506   LearningRate 0.0324   Epoch: 8   Global Step: 357350   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 11:51:55,542-Speed 2634.04 samples/sec   Loss 7.6439   LearningRate 0.0324   Epoch: 8   Global Step: 357360   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:51:59,464-Speed 2612.07 samples/sec   Loss 7.7753   LearningRate 0.0324   Epoch: 8   Global Step: 357370   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:03,388-Speed 2610.34 samples/sec   Loss 7.7084   LearningRate 0.0324   Epoch: 8   Global Step: 357380   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:07,465-Speed 2512.12 samples/sec   Loss 7.6427   LearningRate 0.0324   Epoch: 8   Global Step: 357390   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:11,529-Speed 2520.23 samples/sec   Loss 7.6422   LearningRate 0.0324   Epoch: 8   Global Step: 357400   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:15,593-Speed 2520.89 samples/sec   Loss 7.6046   LearningRate 0.0324   Epoch: 8   Global Step: 357410   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:19,595-Speed 2559.22 samples/sec   Loss 7.7895   LearningRate 0.0324   Epoch: 8   Global Step: 357420   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:23,582-Speed 2569.13 samples/sec   Loss 7.7364   LearningRate 0.0324   Epoch: 8   Global Step: 357430   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:27,473-Speed 2632.51 samples/sec   Loss 7.6526   LearningRate 0.0324   Epoch: 8   Global Step: 357440   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:31,365-Speed 2631.75 samples/sec   Loss 7.8005   LearningRate 0.0324   Epoch: 8   Global Step: 357450   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 11:52:35,253-Speed 2634.16 samples/sec   Loss 7.5113   LearningRate 0.0324   Epoch: 8   Global Step: 357460   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:52:39,164-Speed 2618.97 samples/sec   Loss 7.7414   LearningRate 0.0324   Epoch: 8   Global Step: 357470   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:52:43,057-Speed 2631.03 samples/sec   Loss 7.6165   LearningRate 0.0324   Epoch: 8   Global Step: 357480   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:52:46,948-Speed 2632.53 samples/sec   Loss 7.6351   LearningRate 0.0324   Epoch: 8   Global Step: 357490   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:52:50,841-Speed 2631.67 samples/sec   Loss 7.8054   LearningRate 0.0324   Epoch: 8   Global Step: 357500   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:52:54,732-Speed 2631.70 samples/sec   Loss 7.7448   LearningRate 0.0324   Epoch: 8   Global Step: 357510   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:52:58,627-Speed 2629.87 samples/sec   Loss 7.7769   LearningRate 0.0324   Epoch: 8   Global Step: 357520   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:53:02,524-Speed 2628.80 samples/sec   Loss 7.6471   LearningRate 0.0324   Epoch: 8   Global Step: 357530   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:53:06,426-Speed 2624.66 samples/sec   Loss 7.7518   LearningRate 0.0324   Epoch: 8   Global Step: 357540   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:53:10,332-Speed 2622.01 samples/sec   Loss 7.6907   LearningRate 0.0324   Epoch: 8   Global Step: 357550   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:53:14,226-Speed 2631.14 samples/sec   Loss 7.7151   LearningRate 0.0324   Epoch: 8   Global Step: 357560   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:18,119-Speed 2630.82 samples/sec   Loss 7.7288   LearningRate 0.0324   Epoch: 8   Global Step: 357570   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:22,034-Speed 2616.18 samples/sec   Loss 7.6792   LearningRate 0.0324   Epoch: 8   Global Step: 357580   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:26,107-Speed 2514.75 samples/sec   Loss 7.6511   LearningRate 0.0324   Epoch: 8   Global Step: 357590   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:30,007-Speed 2626.56 samples/sec   Loss 7.7355   LearningRate 0.0324   Epoch: 8   Global Step: 357600   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:33,904-Speed 2628.61 samples/sec   Loss 7.7932   LearningRate 0.0324   Epoch: 8   Global Step: 357610   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:37,809-Speed 2622.65 samples/sec   Loss 7.6358   LearningRate 0.0324   Epoch: 8   Global Step: 357620   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:41,714-Speed 2622.79 samples/sec   Loss 7.6470   LearningRate 0.0324   Epoch: 8   Global Step: 357630   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:45,792-Speed 2511.77 samples/sec   Loss 7.7434   LearningRate 0.0324   Epoch: 8   Global Step: 357640   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:49,806-Speed 2551.66 samples/sec   Loss 7.6708   LearningRate 0.0324   Epoch: 8   Global Step: 357650   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:53:53,707-Speed 2625.51 samples/sec   Loss 7.7055   LearningRate 0.0324   Epoch: 8   Global Step: 357660   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:53:57,610-Speed 2624.57 samples/sec   Loss 7.6078   LearningRate 0.0324   Epoch: 8   Global Step: 357670   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:54:01,494-Speed 2637.17 samples/sec   Loss 7.7067   LearningRate 0.0324   Epoch: 8   Global Step: 357680   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:54:05,365-Speed 2646.00 samples/sec   Loss 8.1297   LearningRate 0.0324   Epoch: 8   Global Step: 357690   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:09,279-Speed 2616.90 samples/sec   Loss 8.2875   LearningRate 0.0324   Epoch: 8   Global Step: 357700   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:13,188-Speed 2620.54 samples/sec   Loss 7.8820   LearningRate 0.0324   Epoch: 8   Global Step: 357710   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:17,100-Speed 2618.51 samples/sec   Loss 7.7424   LearningRate 0.0324   Epoch: 8   Global Step: 357720   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:20,989-Speed 2632.91 samples/sec   Loss 7.6995   LearningRate 0.0324   Epoch: 8   Global Step: 357730   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:24,882-Speed 2631.64 samples/sec   Loss 7.7010   LearningRate 0.0323   Epoch: 8   Global Step: 357740   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:28,770-Speed 2634.35 samples/sec   Loss 7.6387   LearningRate 0.0323   Epoch: 8   Global Step: 357750   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:32,659-Speed 2634.21 samples/sec   Loss 7.7460   LearningRate 0.0323   Epoch: 8   Global Step: 357760   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:36,559-Speed 2625.85 samples/sec   Loss 7.8311   LearningRate 0.0323   Epoch: 8   Global Step: 357770   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:40,450-Speed 2632.80 samples/sec   Loss 7.5836   LearningRate 0.0323   Epoch: 8   Global Step: 357780   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 11:54:44,345-Speed 2629.72 samples/sec   Loss 7.7265   LearningRate 0.0323   Epoch: 8   Global Step: 357790   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:54:48,242-Speed 2628.27 samples/sec   Loss 7.7511   LearningRate 0.0323   Epoch: 8   Global Step: 357800   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:54:52,136-Speed 2631.21 samples/sec   Loss 7.7803   LearningRate 0.0323   Epoch: 8   Global Step: 357810   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:54:56,026-Speed 2632.45 samples/sec   Loss 7.5793   LearningRate 0.0323   Epoch: 8   Global Step: 357820   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:54:59,922-Speed 2629.13 samples/sec   Loss 7.7185   LearningRate 0.0323   Epoch: 8   Global Step: 357830   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:55:03,810-Speed 2633.85 samples/sec   Loss 7.6213   LearningRate 0.0323   Epoch: 8   Global Step: 357840   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:55:07,707-Speed 2629.25 samples/sec   Loss 7.7613   LearningRate 0.0323   Epoch: 8   Global Step: 357850   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:55:11,609-Speed 2624.92 samples/sec   Loss 7.5637   LearningRate 0.0323   Epoch: 8   Global Step: 357860   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:55:15,527-Speed 2614.00 samples/sec   Loss 7.7085   LearningRate 0.0323   Epoch: 8   Global Step: 357870   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:55:19,446-Speed 2613.80 samples/sec   Loss 7.5694   LearningRate 0.0323   Epoch: 8   Global Step: 357880   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 11:55:23,343-Speed 2628.31 samples/sec   Loss 7.6870   LearningRate 0.0323   Epoch: 8   Global Step: 357890   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:27,263-Speed 2613.63 samples/sec   Loss 7.7097   LearningRate 0.0323   Epoch: 8   Global Step: 357900   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:31,156-Speed 2630.83 samples/sec   Loss 7.6351   LearningRate 0.0323   Epoch: 8   Global Step: 357910   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:35,052-Speed 2628.27 samples/sec   Loss 7.6474   LearningRate 0.0323   Epoch: 8   Global Step: 357920   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:38,951-Speed 2627.03 samples/sec   Loss 7.6575   LearningRate 0.0323   Epoch: 8   Global Step: 357930   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:42,841-Speed 2633.31 samples/sec   Loss 7.6052   LearningRate 0.0323   Epoch: 8   Global Step: 357940   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:46,738-Speed 2628.60 samples/sec   Loss 7.5448   LearningRate 0.0323   Epoch: 8   Global Step: 357950   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:50,641-Speed 2624.23 samples/sec   Loss 7.7304   LearningRate 0.0323   Epoch: 8   Global Step: 357960   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:54,536-Speed 2629.68 samples/sec   Loss 7.6288   LearningRate 0.0323   Epoch: 8   Global Step: 357970   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:55:58,438-Speed 2624.85 samples/sec   Loss 7.7273   LearningRate 0.0323   Epoch: 8   Global Step: 357980   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 11:56:02,337-Speed 2627.24 samples/sec   Loss 7.6813   LearningRate 0.0323   Epoch: 8   Global Step: 357990   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:06,227-Speed 2633.16 samples/sec   Loss 7.7140   LearningRate 0.0323   Epoch: 8   Global Step: 358000   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:10,150-Speed 2611.08 samples/sec   Loss 7.5673   LearningRate 0.0323   Epoch: 8   Global Step: 358010   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:14,041-Speed 2632.50 samples/sec   Loss 7.6750   LearningRate 0.0323   Epoch: 8   Global Step: 358020   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:17,941-Speed 2626.10 samples/sec   Loss 7.6162   LearningRate 0.0323   Epoch: 8   Global Step: 358030   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:21,838-Speed 2628.91 samples/sec   Loss 7.6389   LearningRate 0.0323   Epoch: 8   Global Step: 358040   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:25,725-Speed 2634.89 samples/sec   Loss 7.6378   LearningRate 0.0323   Epoch: 8   Global Step: 358050   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:29,622-Speed 2628.25 samples/sec   Loss 7.7943   LearningRate 0.0323   Epoch: 8   Global Step: 358060   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:33,529-Speed 2622.01 samples/sec   Loss 7.7085   LearningRate 0.0323   Epoch: 8   Global Step: 358070   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:37,423-Speed 2629.82 samples/sec   Loss 7.4929   LearningRate 0.0323   Epoch: 8   Global Step: 358080   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:56:41,318-Speed 2629.37 samples/sec   Loss 7.7002   LearningRate 0.0323   Epoch: 8   Global Step: 358090   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:56:45,209-Speed 2632.46 samples/sec   Loss 7.6246   LearningRate 0.0323   Epoch: 8   Global Step: 358100   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:56:49,107-Speed 2628.21 samples/sec   Loss 7.6472   LearningRate 0.0323   Epoch: 8   Global Step: 358110   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:56:53,012-Speed 2622.85 samples/sec   Loss 7.6397   LearningRate 0.0323   Epoch: 8   Global Step: 358120   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:56:56,910-Speed 2627.98 samples/sec   Loss 7.7786   LearningRate 0.0323   Epoch: 8   Global Step: 358130   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:57:00,801-Speed 2632.47 samples/sec   Loss 7.7458   LearningRate 0.0323   Epoch: 8   Global Step: 358140   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:57:04,691-Speed 2632.30 samples/sec   Loss 7.7370   LearningRate 0.0323   Epoch: 8   Global Step: 358150   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:57:08,582-Speed 2632.23 samples/sec   Loss 7.8132   LearningRate 0.0323   Epoch: 8   Global Step: 358160   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:57:12,480-Speed 2627.89 samples/sec   Loss 7.7361   LearningRate 0.0323   Epoch: 8   Global Step: 358170   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:57:16,374-Speed 2630.15 samples/sec   Loss 7.7315   LearningRate 0.0323   Epoch: 8   Global Step: 358180   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:57:20,293-Speed 2613.71 samples/sec   Loss 7.6897   LearningRate 0.0323   Epoch: 8   Global Step: 358190   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:24,198-Speed 2622.50 samples/sec   Loss 7.6469   LearningRate 0.0323   Epoch: 8   Global Step: 358200   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:28,098-Speed 2626.63 samples/sec   Loss 7.6419   LearningRate 0.0323   Epoch: 8   Global Step: 358210   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:31,985-Speed 2635.18 samples/sec   Loss 7.6246   LearningRate 0.0323   Epoch: 8   Global Step: 358220   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:35,889-Speed 2623.56 samples/sec   Loss 7.6502   LearningRate 0.0323   Epoch: 8   Global Step: 358230   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:39,851-Speed 2585.01 samples/sec   Loss 7.5121   LearningRate 0.0323   Epoch: 8   Global Step: 358240   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:43,755-Speed 2624.14 samples/sec   Loss 7.7334   LearningRate 0.0323   Epoch: 8   Global Step: 358250   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:47,654-Speed 2626.66 samples/sec   Loss 7.6547   LearningRate 0.0323   Epoch: 8   Global Step: 358260   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:51,559-Speed 2622.96 samples/sec   Loss 7.7590   LearningRate 0.0323   Epoch: 8   Global Step: 358270   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:55,461-Speed 2625.25 samples/sec   Loss 7.6140   LearningRate 0.0323   Epoch: 8   Global Step: 358280   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:57:59,327-Speed 2649.43 samples/sec   Loss 7.7583   LearningRate 0.0323   Epoch: 8   Global Step: 358290   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:03,220-Speed 2631.17 samples/sec   Loss 7.6870   LearningRate 0.0323   Epoch: 8   Global Step: 358300   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:07,135-Speed 2615.74 samples/sec   Loss 7.7001   LearningRate 0.0323   Epoch: 8   Global Step: 358310   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:11,031-Speed 2629.11 samples/sec   Loss 7.6318   LearningRate 0.0323   Epoch: 8   Global Step: 358320   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:14,928-Speed 2628.48 samples/sec   Loss 7.4964   LearningRate 0.0323   Epoch: 8   Global Step: 358330   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:18,822-Speed 2630.21 samples/sec   Loss 7.7232   LearningRate 0.0323   Epoch: 8   Global Step: 358340   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:22,716-Speed 2630.48 samples/sec   Loss 7.6334   LearningRate 0.0323   Epoch: 8   Global Step: 358350   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:26,610-Speed 2630.36 samples/sec   Loss 7.6281   LearningRate 0.0323   Epoch: 8   Global Step: 358360   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:30,502-Speed 2631.56 samples/sec   Loss 7.7462   LearningRate 0.0323   Epoch: 8   Global Step: 358370   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:34,394-Speed 2632.09 samples/sec   Loss 7.7809   LearningRate 0.0323   Epoch: 8   Global Step: 358380   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:38,287-Speed 2631.02 samples/sec   Loss 7.8529   LearningRate 0.0323   Epoch: 8   Global Step: 358390   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:58:42,177-Speed 2632.53 samples/sec   Loss 7.6490   LearningRate 0.0323   Epoch: 8   Global Step: 358400   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:58:46,063-Speed 2635.79 samples/sec   Loss 7.6549   LearningRate 0.0323   Epoch: 8   Global Step: 358410   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:49,946-Speed 2638.09 samples/sec   Loss 7.6823   LearningRate 0.0323   Epoch: 8   Global Step: 358420   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:53,838-Speed 2631.46 samples/sec   Loss 7.5539   LearningRate 0.0323   Epoch: 8   Global Step: 358430   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:58:57,734-Speed 2629.64 samples/sec   Loss 7.7107   LearningRate 0.0323   Epoch: 8   Global Step: 358440   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:01,625-Speed 2631.82 samples/sec   Loss 7.6626   LearningRate 0.0323   Epoch: 8   Global Step: 358450   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:05,531-Speed 2623.05 samples/sec   Loss 7.5982   LearningRate 0.0323   Epoch: 8   Global Step: 358460   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:09,428-Speed 2627.99 samples/sec   Loss 7.7283   LearningRate 0.0322   Epoch: 8   Global Step: 358470   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:13,331-Speed 2632.15 samples/sec   Loss 7.7189   LearningRate 0.0322   Epoch: 8   Global Step: 358480   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:17,217-Speed 2636.04 samples/sec   Loss 7.7300   LearningRate 0.0322   Epoch: 8   Global Step: 358490   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:21,108-Speed 2632.39 samples/sec   Loss 7.7197   LearningRate 0.0322   Epoch: 8   Global Step: 358500   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:25,002-Speed 2630.42 samples/sec   Loss 7.7148   LearningRate 0.0322   Epoch: 8   Global Step: 358510   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:59:28,895-Speed 2631.00 samples/sec   Loss 7.6302   LearningRate 0.0322   Epoch: 8   Global Step: 358520   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:59:32,789-Speed 2630.81 samples/sec   Loss 7.8094   LearningRate 0.0322   Epoch: 8   Global Step: 358530   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:59:36,683-Speed 2630.24 samples/sec   Loss 7.6966   LearningRate 0.0322   Epoch: 8   Global Step: 358540   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 11:59:40,563-Speed 2639.43 samples/sec   Loss 7.6669   LearningRate 0.0322   Epoch: 8   Global Step: 358550   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:44,465-Speed 2625.26 samples/sec   Loss 7.6526   LearningRate 0.0322   Epoch: 8   Global Step: 358560   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:48,363-Speed 2627.81 samples/sec   Loss 7.6844   LearningRate 0.0322   Epoch: 8   Global Step: 358570   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:52,269-Speed 2622.01 samples/sec   Loss 7.6755   LearningRate 0.0322   Epoch: 8   Global Step: 358580   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 11:59:56,187-Speed 2615.06 samples/sec   Loss 7.6173   LearningRate 0.0322   Epoch: 8   Global Step: 358590   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:00,082-Speed 2629.29 samples/sec   Loss 7.6057   LearningRate 0.0322   Epoch: 8   Global Step: 358600   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:04,009-Speed 2608.94 samples/sec   Loss 7.6444   LearningRate 0.0322   Epoch: 8   Global Step: 358610   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:07,905-Speed 2628.77 samples/sec   Loss 7.6269   LearningRate 0.0322   Epoch: 8   Global Step: 358620   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:11,824-Speed 2613.40 samples/sec   Loss 7.6922   LearningRate 0.0322   Epoch: 8   Global Step: 358630   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:15,740-Speed 2616.03 samples/sec   Loss 7.7236   LearningRate 0.0322   Epoch: 8   Global Step: 358640   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:19,634-Speed 2630.40 samples/sec   Loss 7.5719   LearningRate 0.0322   Epoch: 8   Global Step: 358650   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:00:23,534-Speed 2626.36 samples/sec   Loss 7.6372   LearningRate 0.0322   Epoch: 8   Global Step: 358660   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:27,431-Speed 2627.98 samples/sec   Loss 7.6934   LearningRate 0.0322   Epoch: 8   Global Step: 358670   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:31,323-Speed 2632.58 samples/sec   Loss 7.7007   LearningRate 0.0322   Epoch: 8   Global Step: 358680   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:35,218-Speed 2629.44 samples/sec   Loss 7.6976   LearningRate 0.0322   Epoch: 8   Global Step: 358690   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:39,111-Speed 2631.45 samples/sec   Loss 7.6416   LearningRate 0.0322   Epoch: 8   Global Step: 358700   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:43,002-Speed 2632.15 samples/sec   Loss 7.6033   LearningRate 0.0322   Epoch: 8   Global Step: 358710   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:46,956-Speed 2590.37 samples/sec   Loss 7.6751   LearningRate 0.0322   Epoch: 8   Global Step: 358720   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:50,862-Speed 2622.31 samples/sec   Loss 7.6638   LearningRate 0.0322   Epoch: 8   Global Step: 358730   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:54,798-Speed 2602.87 samples/sec   Loss 7.6820   LearningRate 0.0322   Epoch: 8   Global Step: 358740   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:00:58,693-Speed 2629.55 samples/sec   Loss 7.5989   LearningRate 0.0322   Epoch: 8   Global Step: 358750   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:02,588-Speed 2629.74 samples/sec   Loss 7.6651   LearningRate 0.0322   Epoch: 8   Global Step: 358760   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:01:06,482-Speed 2630.34 samples/sec   Loss 7.7206   LearningRate 0.0322   Epoch: 8   Global Step: 358770   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:01:10,381-Speed 2627.45 samples/sec   Loss 7.6387   LearningRate 0.0322   Epoch: 8   Global Step: 358780   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:01:14,259-Speed 2640.72 samples/sec   Loss 7.5365   LearningRate 0.0322   Epoch: 8   Global Step: 358790   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:18,154-Speed 2629.50 samples/sec   Loss 7.6699   LearningRate 0.0322   Epoch: 8   Global Step: 358800   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:22,071-Speed 2615.20 samples/sec   Loss 7.6847   LearningRate 0.0322   Epoch: 8   Global Step: 358810   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:25,970-Speed 2627.21 samples/sec   Loss 7.6130   LearningRate 0.0322   Epoch: 8   Global Step: 358820   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:29,867-Speed 2628.62 samples/sec   Loss 7.5845   LearningRate 0.0322   Epoch: 8   Global Step: 358830   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:33,762-Speed 2629.24 samples/sec   Loss 7.5816   LearningRate 0.0322   Epoch: 8   Global Step: 358840   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:37,655-Speed 2631.24 samples/sec   Loss 7.7316   LearningRate 0.0322   Epoch: 8   Global Step: 358850   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:41,549-Speed 2630.51 samples/sec   Loss 7.7136   LearningRate 0.0322   Epoch: 8   Global Step: 358860   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:45,477-Speed 2607.22 samples/sec   Loss 7.7097   LearningRate 0.0322   Epoch: 8   Global Step: 358870   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:49,373-Speed 2629.43 samples/sec   Loss 7.6304   LearningRate 0.0322   Epoch: 8   Global Step: 358880   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:01:53,313-Speed 2599.59 samples/sec   Loss 7.4158   LearningRate 0.0322   Epoch: 8   Global Step: 358890   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:01:57,207-Speed 2631.03 samples/sec   Loss 7.5738   LearningRate 0.0322   Epoch: 8   Global Step: 358900   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:02:01,093-Speed 2636.04 samples/sec   Loss 7.6343   LearningRate 0.0322   Epoch: 8   Global Step: 358910   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:02:05,109-Speed 2550.17 samples/sec   Loss 7.6083   LearningRate 0.0322   Epoch: 8   Global Step: 358920   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:02:09,079-Speed 2579.99 samples/sec   Loss 7.7876   LearningRate 0.0322   Epoch: 8   Global Step: 358930   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:12,987-Speed 2620.70 samples/sec   Loss 7.6071   LearningRate 0.0322   Epoch: 8   Global Step: 358940   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:16,898-Speed 2618.90 samples/sec   Loss 7.5729   LearningRate 0.0322   Epoch: 8   Global Step: 358950   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:20,832-Speed 2603.80 samples/sec   Loss 7.6045   LearningRate 0.0322   Epoch: 8   Global Step: 358960   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:24,723-Speed 2632.67 samples/sec   Loss 7.6887   LearningRate 0.0322   Epoch: 8   Global Step: 358970   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:28,620-Speed 2628.16 samples/sec   Loss 7.6891   LearningRate 0.0322   Epoch: 8   Global Step: 358980   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:32,517-Speed 2628.71 samples/sec   Loss 7.6827   LearningRate 0.0322   Epoch: 8   Global Step: 358990   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:36,417-Speed 2625.95 samples/sec   Loss 7.7087   LearningRate 0.0322   Epoch: 8   Global Step: 359000   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:40,308-Speed 2632.43 samples/sec   Loss 7.6425   LearningRate 0.0322   Epoch: 8   Global Step: 359010   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:44,202-Speed 2630.20 samples/sec   Loss 7.6174   LearningRate 0.0322   Epoch: 8   Global Step: 359020   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:02:48,126-Speed 2610.91 samples/sec   Loss 7.7366   LearningRate 0.0322   Epoch: 8   Global Step: 359030   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:02:52,018-Speed 2631.81 samples/sec   Loss 7.5714   LearningRate 0.0322   Epoch: 8   Global Step: 359040   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:02:55,932-Speed 2616.97 samples/sec   Loss 7.4905   LearningRate 0.0322   Epoch: 8   Global Step: 359050   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:02:59,824-Speed 2631.89 samples/sec   Loss 7.6085   LearningRate 0.0322   Epoch: 8   Global Step: 359060   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:03:03,740-Speed 2615.32 samples/sec   Loss 7.6428   LearningRate 0.0322   Epoch: 8   Global Step: 359070   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:03:07,635-Speed 2629.76 samples/sec   Loss 7.4498   LearningRate 0.0322   Epoch: 8   Global Step: 359080   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:03:11,548-Speed 2617.47 samples/sec   Loss 7.7818   LearningRate 0.0322   Epoch: 8   Global Step: 359090   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:03:15,441-Speed 2631.10 samples/sec   Loss 7.6754   LearningRate 0.0322   Epoch: 8   Global Step: 359100   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:03:19,336-Speed 2630.03 samples/sec   Loss 7.7788   LearningRate 0.0322   Epoch: 8   Global Step: 359110   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:03:23,227-Speed 2632.62 samples/sec   Loss 7.6039   LearningRate 0.0322   Epoch: 8   Global Step: 359120   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:03:27,121-Speed 2630.25 samples/sec   Loss 7.6132   LearningRate 0.0322   Epoch: 8   Global Step: 359130   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:03:30,982-Speed 2653.42 samples/sec   Loss 7.6543   LearningRate 0.0322   Epoch: 8   Global Step: 359140   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:03:34,874-Speed 2631.62 samples/sec   Loss 7.6654   LearningRate 0.0322   Epoch: 8   Global Step: 359150   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:03:38,781-Speed 2621.81 samples/sec   Loss 7.6454   LearningRate 0.0322   Epoch: 8   Global Step: 359160   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:03:42,679-Speed 2627.11 samples/sec   Loss 7.6164   LearningRate 0.0322   Epoch: 8   Global Step: 359170   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:03:46,574-Speed 2630.17 samples/sec   Loss 7.6616   LearningRate 0.0322   Epoch: 8   Global Step: 359180   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:03:50,463-Speed 2634.05 samples/sec   Loss 7.6592   LearningRate 0.0322   Epoch: 8   Global Step: 359190   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:03:54,365-Speed 2625.13 samples/sec   Loss 7.5159   LearningRate 0.0322   Epoch: 8   Global Step: 359200   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:03:58,254-Speed 2633.57 samples/sec   Loss 7.6098   LearningRate 0.0321   Epoch: 8   Global Step: 359210   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:04:02,145-Speed 2632.37 samples/sec   Loss 7.7034   LearningRate 0.0321   Epoch: 8   Global Step: 359220   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:04:06,040-Speed 2629.65 samples/sec   Loss 7.6718   LearningRate 0.0321   Epoch: 8   Global Step: 359230   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:04:09,932-Speed 2631.42 samples/sec   Loss 7.5931   LearningRate 0.0321   Epoch: 8   Global Step: 359240   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:04:13,828-Speed 2629.20 samples/sec   Loss 7.5933   LearningRate 0.0321   Epoch: 8   Global Step: 359250   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:04:17,735-Speed 2621.74 samples/sec   Loss 7.6226   LearningRate 0.0321   Epoch: 8   Global Step: 359260   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:04:21,633-Speed 2627.77 samples/sec   Loss 7.6970   LearningRate 0.0321   Epoch: 8   Global Step: 359270   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:04:25,527-Speed 2630.75 samples/sec   Loss 7.6103   LearningRate 0.0321   Epoch: 8   Global Step: 359280   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:04:29,447-Speed 2613.06 samples/sec   Loss 7.5093   LearningRate 0.0321   Epoch: 8   Global Step: 359290   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:04:33,341-Speed 2630.24 samples/sec   Loss 7.6104   LearningRate 0.0321   Epoch: 8   Global Step: 359300   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:04:37,203-Speed 2652.33 samples/sec   Loss 7.7237   LearningRate 0.0321   Epoch: 8   Global Step: 359310   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:04:41,106-Speed 2624.31 samples/sec   Loss 7.7040   LearningRate 0.0321   Epoch: 8   Global Step: 359320   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:04:45,005-Speed 2627.06 samples/sec   Loss 7.7471   LearningRate 0.0321   Epoch: 8   Global Step: 359330   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:04:48,907-Speed 2625.31 samples/sec   Loss 7.4903   LearningRate 0.0321   Epoch: 8   Global Step: 359340   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:04:52,831-Speed 2610.42 samples/sec   Loss 7.5231   LearningRate 0.0321   Epoch: 8   Global Step: 359350   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:04:56,720-Speed 2638.30 samples/sec   Loss 7.6345   LearningRate 0.0321   Epoch: 8   Global Step: 359360   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:05:00,619-Speed 2627.39 samples/sec   Loss 7.5150   LearningRate 0.0321   Epoch: 8   Global Step: 359370   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:05:04,611-Speed 2565.80 samples/sec   Loss 7.6358   LearningRate 0.0321   Epoch: 8   Global Step: 359380   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:05:08,502-Speed 2632.34 samples/sec   Loss 7.5823   LearningRate 0.0321   Epoch: 8   Global Step: 359390   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:05:12,407-Speed 2623.26 samples/sec   Loss 7.6099   LearningRate 0.0321   Epoch: 8   Global Step: 359400   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:05:16,298-Speed 2632.26 samples/sec   Loss 7.7219   LearningRate 0.0321   Epoch: 8   Global Step: 359410   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:20,210-Speed 2618.26 samples/sec   Loss 7.5968   LearningRate 0.0321   Epoch: 8   Global Step: 359420   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:24,101-Speed 2632.13 samples/sec   Loss 7.5964   LearningRate 0.0321   Epoch: 8   Global Step: 359430   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:27,999-Speed 2628.42 samples/sec   Loss 7.6786   LearningRate 0.0321   Epoch: 8   Global Step: 359440   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:31,892-Speed 2630.60 samples/sec   Loss 7.7053   LearningRate 0.0321   Epoch: 8   Global Step: 359450   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:35,783-Speed 2632.64 samples/sec   Loss 7.5908   LearningRate 0.0321   Epoch: 8   Global Step: 359460   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:39,730-Speed 2595.13 samples/sec   Loss 7.5919   LearningRate 0.0321   Epoch: 8   Global Step: 359470   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:43,622-Speed 2631.60 samples/sec   Loss 7.6476   LearningRate 0.0321   Epoch: 8   Global Step: 359480   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:47,514-Speed 2631.84 samples/sec   Loss 7.6347   LearningRate 0.0321   Epoch: 8   Global Step: 359490   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:51,604-Speed 2503.80 samples/sec   Loss 7.6154   LearningRate 0.0321   Epoch: 8   Global Step: 359500   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:05:55,695-Speed 2504.52 samples/sec   Loss 7.6848   LearningRate 0.0321   Epoch: 8   Global Step: 359510   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:05:59,793-Speed 2499.02 samples/sec   Loss 7.5410   LearningRate 0.0321   Epoch: 8   Global Step: 359520   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:03,839-Speed 2531.76 samples/sec   Loss 7.5931   LearningRate 0.0321   Epoch: 8   Global Step: 359530   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:07,734-Speed 2629.52 samples/sec   Loss 7.6508   LearningRate 0.0321   Epoch: 8   Global Step: 359540   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:11,630-Speed 2629.06 samples/sec   Loss 7.6988   LearningRate 0.0321   Epoch: 8   Global Step: 359550   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:15,521-Speed 2632.12 samples/sec   Loss 7.6539   LearningRate 0.0321   Epoch: 8   Global Step: 359560   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:19,414-Speed 2631.41 samples/sec   Loss 7.5458   LearningRate 0.0321   Epoch: 8   Global Step: 359570   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:23,307-Speed 2630.76 samples/sec   Loss 7.5695   LearningRate 0.0321   Epoch: 8   Global Step: 359580   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:27,200-Speed 2631.35 samples/sec   Loss 7.6595   LearningRate 0.0321   Epoch: 8   Global Step: 359590   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:31,093-Speed 2630.87 samples/sec   Loss 7.5764   LearningRate 0.0321   Epoch: 8   Global Step: 359600   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:34,991-Speed 2628.36 samples/sec   Loss 7.6588   LearningRate 0.0321   Epoch: 8   Global Step: 359610   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:06:38,867-Speed 2642.42 samples/sec   Loss 7.7328   LearningRate 0.0321   Epoch: 8   Global Step: 359620   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:42,759-Speed 2631.34 samples/sec   Loss 7.5577   LearningRate 0.0321   Epoch: 8   Global Step: 359630   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:46,652-Speed 2630.98 samples/sec   Loss 7.4842   LearningRate 0.0321   Epoch: 8   Global Step: 359640   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:50,551-Speed 2626.88 samples/sec   Loss 7.6778   LearningRate 0.0321   Epoch: 8   Global Step: 359650   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:54,459-Speed 2621.65 samples/sec   Loss 7.7164   LearningRate 0.0321   Epoch: 8   Global Step: 359660   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:06:58,350-Speed 2632.35 samples/sec   Loss 7.6226   LearningRate 0.0321   Epoch: 8   Global Step: 359670   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:02,255-Speed 2622.50 samples/sec   Loss 7.7629   LearningRate 0.0321   Epoch: 8   Global Step: 359680   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:06,175-Speed 2613.17 samples/sec   Loss 7.7089   LearningRate 0.0321   Epoch: 8   Global Step: 359690   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:10,088-Speed 2617.74 samples/sec   Loss 7.6868   LearningRate 0.0321   Epoch: 8   Global Step: 359700   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:14,016-Speed 2607.79 samples/sec   Loss 7.5975   LearningRate 0.0321   Epoch: 8   Global Step: 359710   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:17,914-Speed 2627.70 samples/sec   Loss 7.4885   LearningRate 0.0321   Epoch: 8   Global Step: 359720   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:07:21,835-Speed 2612.30 samples/sec   Loss 7.6731   LearningRate 0.0321   Epoch: 8   Global Step: 359730   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:25,735-Speed 2626.48 samples/sec   Loss 7.6887   LearningRate 0.0321   Epoch: 8   Global Step: 359740   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:29,640-Speed 2623.20 samples/sec   Loss 7.6309   LearningRate 0.0321   Epoch: 8   Global Step: 359750   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:33,545-Speed 2622.93 samples/sec   Loss 7.5267   LearningRate 0.0321   Epoch: 8   Global Step: 359760   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:07:37,430-Speed 2635.98 samples/sec   Loss 7.5527   LearningRate 0.0321   Epoch: 8   Global Step: 359770   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:07:41,332-Speed 2624.87 samples/sec   Loss 7.6194   LearningRate 0.0321   Epoch: 8   Global Step: 359780   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:07:45,225-Speed 2631.31 samples/sec   Loss 7.6366   LearningRate 0.0321   Epoch: 8   Global Step: 359790   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:07:49,118-Speed 2631.02 samples/sec   Loss 7.6284   LearningRate 0.0321   Epoch: 8   Global Step: 359800   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:07:53,015-Speed 2628.38 samples/sec   Loss 7.6364   LearningRate 0.0321   Epoch: 8   Global Step: 359810   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:07:56,893-Speed 2641.08 samples/sec   Loss 7.7760   LearningRate 0.0321   Epoch: 8   Global Step: 359820   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:08:00,785-Speed 2632.30 samples/sec   Loss 7.5546   LearningRate 0.0321   Epoch: 8   Global Step: 359830   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:08:04,686-Speed 2625.47 samples/sec   Loss 7.6737   LearningRate 0.0321   Epoch: 8   Global Step: 359840   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:08:08,609-Speed 2610.92 samples/sec   Loss 7.5816   LearningRate 0.0321   Epoch: 8   Global Step: 359850   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:08:12,485-Speed 2641.92 samples/sec   Loss 7.5800   LearningRate 0.0321   Epoch: 8   Global Step: 359860   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:16,378-Speed 2631.24 samples/sec   Loss 7.5720   LearningRate 0.0321   Epoch: 8   Global Step: 359870   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:20,272-Speed 2631.11 samples/sec   Loss 7.5051   LearningRate 0.0321   Epoch: 8   Global Step: 359880   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:24,184-Speed 2617.48 samples/sec   Loss 7.6259   LearningRate 0.0321   Epoch: 8   Global Step: 359890   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:28,122-Speed 2601.70 samples/sec   Loss 7.6642   LearningRate 0.0321   Epoch: 8   Global Step: 359900   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:32,006-Speed 2636.82 samples/sec   Loss 7.5829   LearningRate 0.0321   Epoch: 8   Global Step: 359910   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:35,910-Speed 2623.50 samples/sec   Loss 7.6005   LearningRate 0.0321   Epoch: 8   Global Step: 359920   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:39,802-Speed 2631.80 samples/sec   Loss 7.6776   LearningRate 0.0321   Epoch: 8   Global Step: 359930   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:43,725-Speed 2610.73 samples/sec   Loss 7.7001   LearningRate 0.0320   Epoch: 8   Global Step: 359940   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:47,621-Speed 2629.08 samples/sec   Loss 7.5850   LearningRate 0.0320   Epoch: 8   Global Step: 359950   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:08:51,520-Speed 2627.34 samples/sec   Loss 7.7052   LearningRate 0.0320   Epoch: 8   Global Step: 359960   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:08:55,411-Speed 2632.53 samples/sec   Loss 7.6059   LearningRate 0.0320   Epoch: 8   Global Step: 359970   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:08:59,307-Speed 2629.53 samples/sec   Loss 7.7299   LearningRate 0.0320   Epoch: 8   Global Step: 359980   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:09:03,204-Speed 2628.21 samples/sec   Loss 7.6315   LearningRate 0.0320   Epoch: 8   Global Step: 359990   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:09:07,099-Speed 2629.51 samples/sec   Loss 7.6191   LearningRate 0.0320   Epoch: 8   Global Step: 360000   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:09:49,998-[lfw][360000]XNorm: 23.186249
Training: 2022-04-14 12:09:49,999-[lfw][360000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-14 12:09:49,999-[lfw][360000]Accuracy-Highest: 0.99783
Training: 2022-04-14 12:10:39,924-[cfp_fp][360000]XNorm: 21.406011
Training: 2022-04-14 12:10:39,925-[cfp_fp][360000]Accuracy-Flip: 0.98600+-0.00538
Training: 2022-04-14 12:10:39,926-[cfp_fp][360000]Accuracy-Highest: 0.98671
Training: 2022-04-14 12:11:23,081-[agedb_30][360000]XNorm: 23.462880
Training: 2022-04-14 12:11:23,082-[agedb_30][360000]Accuracy-Flip: 0.97700+-0.00698
Training: 2022-04-14 12:11:23,083-[agedb_30][360000]Accuracy-Highest: 0.97700
Training: 2022-04-14 12:11:26,950-Speed 73.22 samples/sec   Loss 7.7173   LearningRate 0.0320   Epoch: 8   Global Step: 360010   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:11:30,822-Speed 2645.23 samples/sec   Loss 7.6198   LearningRate 0.0320   Epoch: 8   Global Step: 360020   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:11:34,699-Speed 2642.03 samples/sec   Loss 7.6861   LearningRate 0.0320   Epoch: 8   Global Step: 360030   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:11:38,575-Speed 2642.38 samples/sec   Loss 7.5721   LearningRate 0.0320   Epoch: 8   Global Step: 360040   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:11:42,464-Speed 2633.98 samples/sec   Loss 7.5390   LearningRate 0.0320   Epoch: 8   Global Step: 360050   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:11:46,347-Speed 2637.88 samples/sec   Loss 7.5161   LearningRate 0.0320   Epoch: 8   Global Step: 360060   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:11:50,221-Speed 2644.10 samples/sec   Loss 7.7719   LearningRate 0.0320   Epoch: 8   Global Step: 360070   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:11:54,101-Speed 2639.92 samples/sec   Loss 7.7210   LearningRate 0.0320   Epoch: 8   Global Step: 360080   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:11:57,984-Speed 2637.54 samples/sec   Loss 7.6872   LearningRate 0.0320   Epoch: 8   Global Step: 360090   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:01,867-Speed 2637.94 samples/sec   Loss 7.6933   LearningRate 0.0320   Epoch: 8   Global Step: 360100   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:05,781-Speed 2616.75 samples/sec   Loss 7.6424   LearningRate 0.0320   Epoch: 8   Global Step: 360110   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:09,674-Speed 2631.49 samples/sec   Loss 7.7962   LearningRate 0.0320   Epoch: 8   Global Step: 360120   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:13,567-Speed 2630.56 samples/sec   Loss 7.6215   LearningRate 0.0320   Epoch: 8   Global Step: 360130   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:17,536-Speed 2581.05 samples/sec   Loss 7.5789   LearningRate 0.0320   Epoch: 8   Global Step: 360140   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:21,423-Speed 2635.27 samples/sec   Loss 7.6918   LearningRate 0.0320   Epoch: 8   Global Step: 360150   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:25,314-Speed 2632.71 samples/sec   Loss 7.6304   LearningRate 0.0320   Epoch: 8   Global Step: 360160   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:12:29,215-Speed 2625.51 samples/sec   Loss 7.6599   LearningRate 0.0320   Epoch: 8   Global Step: 360170   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:12:33,105-Speed 2633.51 samples/sec   Loss 7.7558   LearningRate 0.0320   Epoch: 8   Global Step: 360180   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:12:36,997-Speed 2631.28 samples/sec   Loss 7.6271   LearningRate 0.0320   Epoch: 8   Global Step: 360190   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:12:40,888-Speed 2632.23 samples/sec   Loss 7.5690   LearningRate 0.0320   Epoch: 8   Global Step: 360200   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:12:44,780-Speed 2632.35 samples/sec   Loss 7.6474   LearningRate 0.0320   Epoch: 8   Global Step: 360210   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:12:48,662-Speed 2638.24 samples/sec   Loss 7.5521   LearningRate 0.0320   Epoch: 8   Global Step: 360220   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:12:52,459-Speed 2697.92 samples/sec   Loss 8.2656   LearningRate 0.0320   Epoch: 8   Global Step: 360230   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:12:56,358-Speed 2626.57 samples/sec   Loss 9.4664   LearningRate 0.0320   Epoch: 8   Global Step: 360240   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:00,253-Speed 2629.69 samples/sec   Loss 8.0911   LearningRate 0.0320   Epoch: 8   Global Step: 360250   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:04,139-Speed 2635.95 samples/sec   Loss 7.9676   LearningRate 0.0320   Epoch: 8   Global Step: 360260   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:08,056-Speed 2614.67 samples/sec   Loss 7.5958   LearningRate 0.0320   Epoch: 8   Global Step: 360270   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:11,945-Speed 2633.61 samples/sec   Loss 7.6948   LearningRate 0.0320   Epoch: 8   Global Step: 360280   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:15,857-Speed 2618.46 samples/sec   Loss 7.8558   LearningRate 0.0320   Epoch: 8   Global Step: 360290   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:19,747-Speed 2633.06 samples/sec   Loss 7.5955   LearningRate 0.0320   Epoch: 8   Global Step: 360300   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:23,636-Speed 2634.13 samples/sec   Loss 7.6708   LearningRate 0.0320   Epoch: 8   Global Step: 360310   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:27,522-Speed 2635.71 samples/sec   Loss 7.6577   LearningRate 0.0320   Epoch: 8   Global Step: 360320   Fp16 Grad Scale: 2048   Required: 53 hours
Training: 2022-04-14 12:13:31,410-Speed 2634.74 samples/sec   Loss 7.6951   LearningRate 0.0320   Epoch: 8   Global Step: 360330   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:13:35,304-Speed 2630.70 samples/sec   Loss 7.6706   LearningRate 0.0320   Epoch: 8   Global Step: 360340   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:13:39,191-Speed 2634.41 samples/sec   Loss 7.6972   LearningRate 0.0320   Epoch: 8   Global Step: 360350   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:13:43,087-Speed 2629.03 samples/sec   Loss 7.5343   LearningRate 0.0320   Epoch: 8   Global Step: 360360   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:13:46,973-Speed 2635.47 samples/sec   Loss 7.7674   LearningRate 0.0320   Epoch: 8   Global Step: 360370   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:13:50,870-Speed 2628.77 samples/sec   Loss 7.6071   LearningRate 0.0320   Epoch: 8   Global Step: 360380   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:13:54,760-Speed 2633.38 samples/sec   Loss 7.6036   LearningRate 0.0320   Epoch: 8   Global Step: 360390   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:13:58,722-Speed 2585.29 samples/sec   Loss 7.6981   LearningRate 0.0320   Epoch: 8   Global Step: 360400   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:14:02,613-Speed 2632.31 samples/sec   Loss 7.6596   LearningRate 0.0320   Epoch: 8   Global Step: 360410   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:14:06,504-Speed 2631.58 samples/sec   Loss 7.8083   LearningRate 0.0320   Epoch: 8   Global Step: 360420   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:14:10,395-Speed 2632.47 samples/sec   Loss 7.6519   LearningRate 0.0320   Epoch: 8   Global Step: 360430   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:14,282-Speed 2635.25 samples/sec   Loss 7.6988   LearningRate 0.0320   Epoch: 8   Global Step: 360440   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:18,177-Speed 2630.08 samples/sec   Loss 7.6190   LearningRate 0.0320   Epoch: 8   Global Step: 360450   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:22,104-Speed 2608.04 samples/sec   Loss 7.6952   LearningRate 0.0320   Epoch: 8   Global Step: 360460   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:25,997-Speed 2631.17 samples/sec   Loss 7.6975   LearningRate 0.0320   Epoch: 8   Global Step: 360470   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:29,887-Speed 2633.59 samples/sec   Loss 7.4706   LearningRate 0.0320   Epoch: 8   Global Step: 360480   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:33,799-Speed 2618.23 samples/sec   Loss 7.7487   LearningRate 0.0320   Epoch: 8   Global Step: 360490   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:37,684-Speed 2636.03 samples/sec   Loss 7.6528   LearningRate 0.0320   Epoch: 8   Global Step: 360500   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:41,593-Speed 2620.86 samples/sec   Loss 7.5133   LearningRate 0.0320   Epoch: 8   Global Step: 360510   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:45,483-Speed 2632.81 samples/sec   Loss 7.6963   LearningRate 0.0320   Epoch: 8   Global Step: 360520   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:14:49,372-Speed 2634.06 samples/sec   Loss 7.6039   LearningRate 0.0320   Epoch: 8   Global Step: 360530   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:14:53,262-Speed 2633.28 samples/sec   Loss 7.7185   LearningRate 0.0320   Epoch: 8   Global Step: 360540   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:14:57,157-Speed 2629.48 samples/sec   Loss 7.4770   LearningRate 0.0320   Epoch: 8   Global Step: 360550   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:01,053-Speed 2629.27 samples/sec   Loss 7.6348   LearningRate 0.0320   Epoch: 8   Global Step: 360560   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:04,949-Speed 2628.77 samples/sec   Loss 7.5302   LearningRate 0.0320   Epoch: 8   Global Step: 360570   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:08,840-Speed 2632.26 samples/sec   Loss 7.4513   LearningRate 0.0320   Epoch: 8   Global Step: 360580   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:12,729-Speed 2633.50 samples/sec   Loss 7.5870   LearningRate 0.0320   Epoch: 8   Global Step: 360590   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:16,623-Speed 2630.45 samples/sec   Loss 7.5243   LearningRate 0.0320   Epoch: 8   Global Step: 360600   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:20,517-Speed 2630.36 samples/sec   Loss 7.7309   LearningRate 0.0320   Epoch: 8   Global Step: 360610   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:24,408-Speed 2633.11 samples/sec   Loss 7.6213   LearningRate 0.0320   Epoch: 8   Global Step: 360620   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:15:28,304-Speed 2628.46 samples/sec   Loss 7.6240   LearningRate 0.0320   Epoch: 8   Global Step: 360630   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:32,202-Speed 2627.88 samples/sec   Loss 7.6015   LearningRate 0.0320   Epoch: 8   Global Step: 360640   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:36,102-Speed 2626.02 samples/sec   Loss 7.6004   LearningRate 0.0320   Epoch: 8   Global Step: 360650   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:39,999-Speed 2628.44 samples/sec   Loss 7.6114   LearningRate 0.0320   Epoch: 8   Global Step: 360660   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:43,889-Speed 2632.25 samples/sec   Loss 7.6854   LearningRate 0.0319   Epoch: 8   Global Step: 360670   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:47,783-Speed 2631.49 samples/sec   Loss 7.5167   LearningRate 0.0319   Epoch: 8   Global Step: 360680   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:51,672-Speed 2634.45 samples/sec   Loss 7.6341   LearningRate 0.0319   Epoch: 8   Global Step: 360690   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:55,583-Speed 2619.10 samples/sec   Loss 7.6713   LearningRate 0.0319   Epoch: 8   Global Step: 360700   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:15:59,486-Speed 2624.41 samples/sec   Loss 7.6199   LearningRate 0.0319   Epoch: 8   Global Step: 360710   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:16:03,412-Speed 2608.23 samples/sec   Loss 7.7022   LearningRate 0.0319   Epoch: 8   Global Step: 360720   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:16:07,308-Speed 2629.17 samples/sec   Loss 7.6179   LearningRate 0.0319   Epoch: 8   Global Step: 360730   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:11,199-Speed 2631.74 samples/sec   Loss 7.4739   LearningRate 0.0319   Epoch: 8   Global Step: 360740   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:15,100-Speed 2626.59 samples/sec   Loss 7.5869   LearningRate 0.0319   Epoch: 8   Global Step: 360750   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:18,993-Speed 2630.73 samples/sec   Loss 7.4563   LearningRate 0.0319   Epoch: 8   Global Step: 360760   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:22,902-Speed 2620.05 samples/sec   Loss 7.5693   LearningRate 0.0319   Epoch: 8   Global Step: 360770   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:26,790-Speed 2634.54 samples/sec   Loss 7.5508   LearningRate 0.0319   Epoch: 8   Global Step: 360780   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:30,684-Speed 2630.55 samples/sec   Loss 7.5436   LearningRate 0.0319   Epoch: 8   Global Step: 360790   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:34,588-Speed 2623.69 samples/sec   Loss 7.6368   LearningRate 0.0319   Epoch: 8   Global Step: 360800   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:38,488-Speed 2625.56 samples/sec   Loss 7.6152   LearningRate 0.0319   Epoch: 8   Global Step: 360810   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:42,383-Speed 2630.05 samples/sec   Loss 7.6720   LearningRate 0.0319   Epoch: 8   Global Step: 360820   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:16:46,280-Speed 2629.31 samples/sec   Loss 7.6072   LearningRate 0.0319   Epoch: 8   Global Step: 360830   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:16:50,185-Speed 2622.99 samples/sec   Loss 7.6587   LearningRate 0.0319   Epoch: 8   Global Step: 360840   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:16:54,100-Speed 2616.48 samples/sec   Loss 7.5131   LearningRate 0.0319   Epoch: 8   Global Step: 360850   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:16:57,996-Speed 2629.28 samples/sec   Loss 7.6958   LearningRate 0.0319   Epoch: 8   Global Step: 360860   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:01,899-Speed 2624.00 samples/sec   Loss 7.5475   LearningRate 0.0319   Epoch: 8   Global Step: 360870   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:05,797-Speed 2627.75 samples/sec   Loss 7.5489   LearningRate 0.0319   Epoch: 8   Global Step: 360880   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:09,697-Speed 2626.40 samples/sec   Loss 7.6478   LearningRate 0.0319   Epoch: 8   Global Step: 360890   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:13,643-Speed 2595.32 samples/sec   Loss 7.6170   LearningRate 0.0319   Epoch: 8   Global Step: 360900   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:17,550-Speed 2622.10 samples/sec   Loss 7.5051   LearningRate 0.0319   Epoch: 8   Global Step: 360910   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:21,443-Speed 2631.43 samples/sec   Loss 7.5852   LearningRate 0.0319   Epoch: 8   Global Step: 360920   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:25,346-Speed 2624.06 samples/sec   Loss 7.3717   LearningRate 0.0319   Epoch: 8   Global Step: 360930   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:17:29,265-Speed 2613.29 samples/sec   Loss 7.5893   LearningRate 0.0319   Epoch: 8   Global Step: 360940   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:17:33,177-Speed 2618.37 samples/sec   Loss 7.6367   LearningRate 0.0319   Epoch: 8   Global Step: 360950   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:17:37,075-Speed 2627.91 samples/sec   Loss 7.5545   LearningRate 0.0319   Epoch: 8   Global Step: 360960   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:17:40,975-Speed 2626.35 samples/sec   Loss 7.4993   LearningRate 0.0319   Epoch: 8   Global Step: 360970   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:17:44,880-Speed 2622.55 samples/sec   Loss 7.5462   LearningRate 0.0319   Epoch: 8   Global Step: 360980   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:17:48,782-Speed 2625.81 samples/sec   Loss 7.6389   LearningRate 0.0319   Epoch: 8   Global Step: 360990   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:52,697-Speed 2616.07 samples/sec   Loss 7.6670   LearningRate 0.0319   Epoch: 8   Global Step: 361000   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:17:56,597-Speed 2626.47 samples/sec   Loss 7.5681   LearningRate 0.0319   Epoch: 8   Global Step: 361010   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:18:00,447-Speed 2660.36 samples/sec   Loss 8.2528   LearningRate 0.0319   Epoch: 8   Global Step: 361020   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:04,344-Speed 2628.05 samples/sec   Loss 8.2373   LearningRate 0.0319   Epoch: 8   Global Step: 361030   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:08,235-Speed 2632.06 samples/sec   Loss 7.6857   LearningRate 0.0319   Epoch: 8   Global Step: 361040   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:12,145-Speed 2619.91 samples/sec   Loss 7.6400   LearningRate 0.0319   Epoch: 8   Global Step: 361050   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:16,043-Speed 2627.91 samples/sec   Loss 7.6704   LearningRate 0.0319   Epoch: 8   Global Step: 361060   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:19,937-Speed 2630.60 samples/sec   Loss 7.6312   LearningRate 0.0319   Epoch: 8   Global Step: 361070   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:23,830-Speed 2630.77 samples/sec   Loss 7.5979   LearningRate 0.0319   Epoch: 8   Global Step: 361080   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:27,751-Speed 2612.38 samples/sec   Loss 7.5909   LearningRate 0.0319   Epoch: 8   Global Step: 361090   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:31,665-Speed 2617.30 samples/sec   Loss 7.5654   LearningRate 0.0319   Epoch: 8   Global Step: 361100   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:35,565-Speed 2625.99 samples/sec   Loss 7.5680   LearningRate 0.0319   Epoch: 8   Global Step: 361110   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:18:39,457-Speed 2631.75 samples/sec   Loss 7.5924   LearningRate 0.0319   Epoch: 8   Global Step: 361120   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:18:43,349-Speed 2631.07 samples/sec   Loss 7.6319   LearningRate 0.0319   Epoch: 8   Global Step: 361130   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:18:47,254-Speed 2623.35 samples/sec   Loss 7.7140   LearningRate 0.0319   Epoch: 8   Global Step: 361140   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:18:51,147-Speed 2631.17 samples/sec   Loss 7.5710   LearningRate 0.0319   Epoch: 8   Global Step: 361150   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:18:55,079-Speed 2605.15 samples/sec   Loss 7.5625   LearningRate 0.0319   Epoch: 8   Global Step: 361160   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:18:58,977-Speed 2627.85 samples/sec   Loss 7.7170   LearningRate 0.0319   Epoch: 8   Global Step: 361170   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:19:02,872-Speed 2629.54 samples/sec   Loss 7.5435   LearningRate 0.0319   Epoch: 8   Global Step: 361180   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:19:06,769-Speed 2627.71 samples/sec   Loss 7.6307   LearningRate 0.0319   Epoch: 8   Global Step: 361190   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:19:10,666-Speed 2628.53 samples/sec   Loss 7.7269   LearningRate 0.0319   Epoch: 8   Global Step: 361200   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:19:14,556-Speed 2632.75 samples/sec   Loss 7.6746   LearningRate 0.0319   Epoch: 8   Global Step: 361210   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:19:18,455-Speed 2627.12 samples/sec   Loss 7.6481   LearningRate 0.0319   Epoch: 8   Global Step: 361220   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:22,346-Speed 2632.97 samples/sec   Loss 7.5985   LearningRate 0.0319   Epoch: 8   Global Step: 361230   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:26,241-Speed 2629.66 samples/sec   Loss 7.5983   LearningRate 0.0319   Epoch: 8   Global Step: 361240   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:30,133-Speed 2631.86 samples/sec   Loss 7.5503   LearningRate 0.0319   Epoch: 8   Global Step: 361250   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:34,036-Speed 2624.36 samples/sec   Loss 7.6271   LearningRate 0.0319   Epoch: 8   Global Step: 361260   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:37,927-Speed 2632.13 samples/sec   Loss 7.5797   LearningRate 0.0319   Epoch: 8   Global Step: 361270   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:41,840-Speed 2617.76 samples/sec   Loss 7.5480   LearningRate 0.0319   Epoch: 8   Global Step: 361280   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:45,734-Speed 2630.20 samples/sec   Loss 7.6108   LearningRate 0.0319   Epoch: 8   Global Step: 361290   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:49,645-Speed 2619.31 samples/sec   Loss 7.5362   LearningRate 0.0319   Epoch: 8   Global Step: 361300   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:53,540-Speed 2629.45 samples/sec   Loss 7.7050   LearningRate 0.0319   Epoch: 8   Global Step: 361310   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:19:57,438-Speed 2627.78 samples/sec   Loss 7.4865   LearningRate 0.0319   Epoch: 8   Global Step: 361320   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:20:01,338-Speed 2626.53 samples/sec   Loss 7.5937   LearningRate 0.0319   Epoch: 8   Global Step: 361330   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:20:05,247-Speed 2620.44 samples/sec   Loss 7.5615   LearningRate 0.0319   Epoch: 8   Global Step: 361340   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:20:09,140-Speed 2630.85 samples/sec   Loss 7.6068   LearningRate 0.0319   Epoch: 8   Global Step: 361350   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:20:13,044-Speed 2622.90 samples/sec   Loss 7.4928   LearningRate 0.0319   Epoch: 8   Global Step: 361360   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:20:16,922-Speed 2641.67 samples/sec   Loss 7.6978   LearningRate 0.0319   Epoch: 8   Global Step: 361370   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:20,813-Speed 2632.74 samples/sec   Loss 7.6582   LearningRate 0.0319   Epoch: 8   Global Step: 361380   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:24,707-Speed 2630.73 samples/sec   Loss 7.5909   LearningRate 0.0319   Epoch: 8   Global Step: 361390   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:28,616-Speed 2620.33 samples/sec   Loss 7.6563   LearningRate 0.0318   Epoch: 8   Global Step: 361400   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:32,509-Speed 2631.02 samples/sec   Loss 7.5724   LearningRate 0.0318   Epoch: 8   Global Step: 361410   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:36,418-Speed 2620.09 samples/sec   Loss 7.6032   LearningRate 0.0318   Epoch: 8   Global Step: 361420   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:40,313-Speed 2629.39 samples/sec   Loss 7.5554   LearningRate 0.0318   Epoch: 8   Global Step: 361430   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:44,206-Speed 2631.44 samples/sec   Loss 7.6963   LearningRate 0.0318   Epoch: 8   Global Step: 361440   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:48,097-Speed 2632.00 samples/sec   Loss 7.4981   LearningRate 0.0318   Epoch: 8   Global Step: 361450   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:52,000-Speed 2624.62 samples/sec   Loss 7.7255   LearningRate 0.0318   Epoch: 8   Global Step: 361460   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:20:55,900-Speed 2626.28 samples/sec   Loss 7.7412   LearningRate 0.0318   Epoch: 8   Global Step: 361470   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:20:59,815-Speed 2616.75 samples/sec   Loss 7.6979   LearningRate 0.0318   Epoch: 8   Global Step: 361480   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:21:03,711-Speed 2628.61 samples/sec   Loss 7.6704   LearningRate 0.0318   Epoch: 8   Global Step: 361490   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:21:07,640-Speed 2607.36 samples/sec   Loss 7.7059   LearningRate 0.0318   Epoch: 8   Global Step: 361500   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:21:11,505-Speed 2650.06 samples/sec   Loss 7.5106   LearningRate 0.0318   Epoch: 8   Global Step: 361510   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:15,414-Speed 2620.05 samples/sec   Loss 7.5384   LearningRate 0.0318   Epoch: 8   Global Step: 361520   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:19,313-Speed 2626.60 samples/sec   Loss 7.6110   LearningRate 0.0318   Epoch: 8   Global Step: 361530   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:23,208-Speed 2630.24 samples/sec   Loss 7.5747   LearningRate 0.0318   Epoch: 8   Global Step: 361540   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:27,105-Speed 2628.46 samples/sec   Loss 7.5740   LearningRate 0.0318   Epoch: 8   Global Step: 361550   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:31,012-Speed 2621.62 samples/sec   Loss 7.6978   LearningRate 0.0318   Epoch: 8   Global Step: 361560   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:34,908-Speed 2629.20 samples/sec   Loss 7.5454   LearningRate 0.0318   Epoch: 8   Global Step: 361570   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:38,798-Speed 2632.67 samples/sec   Loss 7.6154   LearningRate 0.0318   Epoch: 8   Global Step: 361580   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:42,692-Speed 2630.78 samples/sec   Loss 7.5579   LearningRate 0.0318   Epoch: 8   Global Step: 361590   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:46,586-Speed 2630.47 samples/sec   Loss 7.5841   LearningRate 0.0318   Epoch: 8   Global Step: 361600   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:21:50,484-Speed 2627.92 samples/sec   Loss 7.6422   LearningRate 0.0318   Epoch: 8   Global Step: 361610   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:21:54,440-Speed 2589.15 samples/sec   Loss 7.5956   LearningRate 0.0318   Epoch: 8   Global Step: 361620   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:21:58,267-Speed 2676.33 samples/sec   Loss 8.1411   LearningRate 0.0318   Epoch: 8   Global Step: 361630   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:02,151-Speed 2637.93 samples/sec   Loss 8.0207   LearningRate 0.0318   Epoch: 8   Global Step: 361640   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:06,053-Speed 2625.47 samples/sec   Loss 7.5799   LearningRate 0.0318   Epoch: 8   Global Step: 361650   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:10,037-Speed 2570.57 samples/sec   Loss 7.7418   LearningRate 0.0318   Epoch: 8   Global Step: 361660   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:13,931-Speed 2630.04 samples/sec   Loss 7.5850   LearningRate 0.0318   Epoch: 8   Global Step: 361670   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:17,876-Speed 2596.87 samples/sec   Loss 8.7690   LearningRate 0.0318   Epoch: 8   Global Step: 361680   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:21,765-Speed 2634.43 samples/sec   Loss 8.2293   LearningRate 0.0318   Epoch: 8   Global Step: 361690   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:25,691-Speed 2609.04 samples/sec   Loss 8.0665   LearningRate 0.0318   Epoch: 8   Global Step: 361700   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:29,590-Speed 2627.30 samples/sec   Loss 7.8234   LearningRate 0.0318   Epoch: 8   Global Step: 361710   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:33,504-Speed 2616.81 samples/sec   Loss 7.7638   LearningRate 0.0318   Epoch: 8   Global Step: 361720   Fp16 Grad Scale: 4096   Required: 53 hours
Training: 2022-04-14 12:22:37,409-Speed 2622.77 samples/sec   Loss 7.6242   LearningRate 0.0318   Epoch: 8   Global Step: 361730   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:22:41,314-Speed 2623.40 samples/sec   Loss 7.7413   LearningRate 0.0318   Epoch: 8   Global Step: 361740   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:22:45,217-Speed 2623.55 samples/sec   Loss 7.5829   LearningRate 0.0318   Epoch: 8   Global Step: 361750   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:22:49,135-Speed 2615.22 samples/sec   Loss 7.6139   LearningRate 0.0318   Epoch: 8   Global Step: 361760   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:22:53,051-Speed 2615.52 samples/sec   Loss 7.7238   LearningRate 0.0318   Epoch: 8   Global Step: 361770   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:22:56,970-Speed 2613.18 samples/sec   Loss 7.6364   LearningRate 0.0318   Epoch: 8   Global Step: 361780   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:23:00,865-Speed 2630.04 samples/sec   Loss 7.5514   LearningRate 0.0318   Epoch: 8   Global Step: 361790   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:23:04,768-Speed 2623.73 samples/sec   Loss 7.6114   LearningRate 0.0318   Epoch: 8   Global Step: 361800   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:23:08,666-Speed 2627.97 samples/sec   Loss 7.7079   LearningRate 0.0318   Epoch: 8   Global Step: 361810   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:23:12,572-Speed 2622.20 samples/sec   Loss 7.6033   LearningRate 0.0318   Epoch: 8   Global Step: 361820   Fp16 Grad Scale: 8192   Required: 53 hours
Training: 2022-04-14 12:23:16,475-Speed 2624.22 samples/sec   Loss 7.7175   LearningRate 0.0318   Epoch: 8   Global Step: 361830   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:20,387-Speed 2618.17 samples/sec   Loss 7.5791   LearningRate 0.0318   Epoch: 8   Global Step: 361840   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:24,314-Speed 2608.50 samples/sec   Loss 7.6638   LearningRate 0.0318   Epoch: 8   Global Step: 361850   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:28,214-Speed 2625.85 samples/sec   Loss 7.7976   LearningRate 0.0318   Epoch: 8   Global Step: 361860   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:32,110-Speed 2628.96 samples/sec   Loss 7.5975   LearningRate 0.0318   Epoch: 8   Global Step: 361870   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:36,018-Speed 2621.26 samples/sec   Loss 7.6091   LearningRate 0.0318   Epoch: 8   Global Step: 361880   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:40,031-Speed 2551.87 samples/sec   Loss 7.7123   LearningRate 0.0318   Epoch: 8   Global Step: 361890   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:43,927-Speed 2629.30 samples/sec   Loss 7.7523   LearningRate 0.0318   Epoch: 8   Global Step: 361900   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:47,857-Speed 2606.22 samples/sec   Loss 7.5771   LearningRate 0.0318   Epoch: 8   Global Step: 361910   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:51,758-Speed 2625.80 samples/sec   Loss 7.5768   LearningRate 0.0318   Epoch: 8   Global Step: 361920   Fp16 Grad Scale: 16384   Required: 53 hours
Training: 2022-04-14 12:23:55,660-Speed 2625.06 samples/sec   Loss 7.5707   LearningRate 0.0318   Epoch: 8   Global Step: 361930   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:23:59,567-Speed 2621.20 samples/sec   Loss 7.6170   LearningRate 0.0318   Epoch: 8   Global Step: 361940   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:03,484-Speed 2615.15 samples/sec   Loss 7.6676   LearningRate 0.0318   Epoch: 8   Global Step: 361950   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:07,388-Speed 2623.76 samples/sec   Loss 7.7035   LearningRate 0.0318   Epoch: 8   Global Step: 361960   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:11,281-Speed 2630.89 samples/sec   Loss 7.5509   LearningRate 0.0318   Epoch: 8   Global Step: 361970   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:15,181-Speed 2626.34 samples/sec   Loss 7.5062   LearningRate 0.0318   Epoch: 8   Global Step: 361980   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:19,081-Speed 2626.42 samples/sec   Loss 7.6106   LearningRate 0.0318   Epoch: 8   Global Step: 361990   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:22,976-Speed 2629.50 samples/sec   Loss 7.5464   LearningRate 0.0318   Epoch: 8   Global Step: 362000   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:26,875-Speed 2627.39 samples/sec   Loss 7.6154   LearningRate 0.0318   Epoch: 8   Global Step: 362010   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:30,769-Speed 2630.32 samples/sec   Loss 7.6386   LearningRate 0.0318   Epoch: 8   Global Step: 362020   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-04-14 12:24:34,677-Speed 2620.47 samples/sec   Loss 7.5678   LearningRate 0.0318   Epoch: 8   Global Step: 362030   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:24:38,576-Speed 2627.34 samples/sec   Loss 7.6209   LearningRate 0.0318   Epoch: 8   Global Step: 362040   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:24:42,477-Speed 2625.86 samples/sec   Loss 7.7724   LearningRate 0.0318   Epoch: 8   Global Step: 362050   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:24:46,387-Speed 2619.42 samples/sec   Loss 7.6197   LearningRate 0.0318   Epoch: 8   Global Step: 362060   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:24:50,282-Speed 2629.68 samples/sec   Loss 7.4695   LearningRate 0.0318   Epoch: 8   Global Step: 362070   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:24:54,186-Speed 2624.06 samples/sec   Loss 7.5402   LearningRate 0.0318   Epoch: 8   Global Step: 362080   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:24:58,090-Speed 2623.56 samples/sec   Loss 7.6435   LearningRate 0.0318   Epoch: 8   Global Step: 362090   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:25:01,981-Speed 2632.04 samples/sec   Loss 7.5512   LearningRate 0.0318   Epoch: 8   Global Step: 362100   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:25:05,901-Speed 2613.41 samples/sec   Loss 7.7217   LearningRate 0.0318   Epoch: 8   Global Step: 362110   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:25:09,795-Speed 2629.79 samples/sec   Loss 7.5748   LearningRate 0.0318   Epoch: 8   Global Step: 362120   Fp16 Grad Scale: 65536   Required: 53 hours
Training: 2022-04-14 12:25:13,689-Speed 2630.50 samples/sec   Loss 7.6101   LearningRate 0.0318   Epoch: 8   Global Step: 362130   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:17,587-Speed 2627.72 samples/sec   Loss 7.6678   LearningRate 0.0317   Epoch: 8   Global Step: 362140   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:21,544-Speed 2588.72 samples/sec   Loss 7.4806   LearningRate 0.0317   Epoch: 8   Global Step: 362150   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:25,428-Speed 2637.13 samples/sec   Loss 7.5636   LearningRate 0.0317   Epoch: 8   Global Step: 362160   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:29,327-Speed 2627.19 samples/sec   Loss 7.6954   LearningRate 0.0317   Epoch: 8   Global Step: 362170   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:33,223-Speed 2629.15 samples/sec   Loss 7.5092   LearningRate 0.0317   Epoch: 8   Global Step: 362180   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:37,134-Speed 2618.75 samples/sec   Loss 7.6903   LearningRate 0.0317   Epoch: 8   Global Step: 362190   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:41,037-Speed 2624.11 samples/sec   Loss 7.5484   LearningRate 0.0317   Epoch: 8   Global Step: 362200   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:44,940-Speed 2624.28 samples/sec   Loss 7.6196   LearningRate 0.0317   Epoch: 8   Global Step: 362210   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:48,822-Speed 2637.94 samples/sec   Loss 7.6806   LearningRate 0.0317   Epoch: 8   Global Step: 362220   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:25:52,714-Speed 2632.48 samples/sec   Loss 7.6777   LearningRate 0.0317   Epoch: 8   Global Step: 362230   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:25:56,770-Speed 2525.06 samples/sec   Loss 7.6006   LearningRate 0.0317   Epoch: 8   Global Step: 362240   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:26:00,717-Speed 2595.34 samples/sec   Loss 7.5713   LearningRate 0.0317   Epoch: 8   Global Step: 362250   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:26:04,615-Speed 2627.12 samples/sec   Loss 7.5597   LearningRate 0.0317   Epoch: 8   Global Step: 362260   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:26:08,505-Speed 2632.62 samples/sec   Loss 7.6852   LearningRate 0.0317   Epoch: 8   Global Step: 362270   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:26:12,472-Speed 2582.07 samples/sec   Loss 7.7681   LearningRate 0.0317   Epoch: 8   Global Step: 362280   Fp16 Grad Scale: 262144   Required: 53 hours
Training: 2022-04-14 12:26:16,523-Speed 2528.16 samples/sec   Loss 7.7481   LearningRate 0.0317   Epoch: 8   Global Step: 362290   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:26:20,590-Speed 2518.16 samples/sec   Loss 7.4858   LearningRate 0.0317   Epoch: 8   Global Step: 362300   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:26:24,661-Speed 2516.74 samples/sec   Loss 7.4750   LearningRate 0.0317   Epoch: 8   Global Step: 362310   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:26:28,648-Speed 2569.20 samples/sec   Loss 7.6046   LearningRate 0.0317   Epoch: 8   Global Step: 362320   Fp16 Grad Scale: 131072   Required: 53 hours
Training: 2022-04-14 12:26:32,564-Speed 2615.27 samples/sec   Loss 7.6556   LearningRate 0.0317   Epoch: 8   Global Step: 362330   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:26:36,456-Speed 2631.61 samples/sec   Loss 7.5445   LearningRate 0.0317   Epoch: 8   Global Step: 362340   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:26:40,360-Speed 2623.94 samples/sec   Loss 7.6460   LearningRate 0.0317   Epoch: 8   Global Step: 362350   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:26:44,260-Speed 2626.33 samples/sec   Loss 7.5368   LearningRate 0.0317   Epoch: 8   Global Step: 362360   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:26:48,153-Speed 2630.85 samples/sec   Loss 7.7499   LearningRate 0.0317   Epoch: 8   Global Step: 362370   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:26:52,048-Speed 2629.79 samples/sec   Loss 7.4846   LearningRate 0.0317   Epoch: 8   Global Step: 362380   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:26:55,952-Speed 2623.69 samples/sec   Loss 7.6938   LearningRate 0.0317   Epoch: 8   Global Step: 362390   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:26:59,849-Speed 2628.85 samples/sec   Loss 7.7377   LearningRate 0.0317   Epoch: 8   Global Step: 362400   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:27:03,736-Speed 2634.73 samples/sec   Loss 7.7187   LearningRate 0.0317   Epoch: 8   Global Step: 362410   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:07,639-Speed 2624.30 samples/sec   Loss 7.5457   LearningRate 0.0317   Epoch: 8   Global Step: 362420   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:11,530-Speed 2632.74 samples/sec   Loss 7.7388   LearningRate 0.0317   Epoch: 8   Global Step: 362430   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:15,437-Speed 2621.76 samples/sec   Loss 7.4396   LearningRate 0.0317   Epoch: 8   Global Step: 362440   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:19,340-Speed 2623.80 samples/sec   Loss 7.7168   LearningRate 0.0317   Epoch: 8   Global Step: 362450   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:23,251-Speed 2619.16 samples/sec   Loss 7.6409   LearningRate 0.0317   Epoch: 8   Global Step: 362460   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:27,160-Speed 2620.18 samples/sec   Loss 7.5485   LearningRate 0.0317   Epoch: 8   Global Step: 362470   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:31,060-Speed 2626.72 samples/sec   Loss 7.5972   LearningRate 0.0317   Epoch: 8   Global Step: 362480   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:34,962-Speed 2625.19 samples/sec   Loss 7.6057   LearningRate 0.0317   Epoch: 8   Global Step: 362490   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:38,870-Speed 2620.56 samples/sec   Loss 7.5215   LearningRate 0.0317   Epoch: 8   Global Step: 362500   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:42,766-Speed 2628.78 samples/sec   Loss 7.5938   LearningRate 0.0317   Epoch: 8   Global Step: 362510   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:27:46,663-Speed 2628.68 samples/sec   Loss 7.5679   LearningRate 0.0317   Epoch: 8   Global Step: 362520   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:27:50,540-Speed 2641.53 samples/sec   Loss 7.5690   LearningRate 0.0317   Epoch: 8   Global Step: 362530   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:54,437-Speed 2629.23 samples/sec   Loss 7.4749   LearningRate 0.0317   Epoch: 8   Global Step: 362540   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:27:58,337-Speed 2625.90 samples/sec   Loss 7.4766   LearningRate 0.0317   Epoch: 8   Global Step: 362550   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:02,255-Speed 2614.36 samples/sec   Loss 7.6118   LearningRate 0.0317   Epoch: 8   Global Step: 362560   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:06,155-Speed 2626.57 samples/sec   Loss 7.6079   LearningRate 0.0317   Epoch: 8   Global Step: 362570   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:10,053-Speed 2627.46 samples/sec   Loss 7.5795   LearningRate 0.0317   Epoch: 8   Global Step: 362580   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:13,954-Speed 2625.58 samples/sec   Loss 7.4668   LearningRate 0.0317   Epoch: 8   Global Step: 362590   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:17,854-Speed 2628.92 samples/sec   Loss 7.4602   LearningRate 0.0317   Epoch: 8   Global Step: 362600   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:21,749-Speed 2629.44 samples/sec   Loss 7.6033   LearningRate 0.0317   Epoch: 8   Global Step: 362610   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:25,644-Speed 2629.35 samples/sec   Loss 7.5687   LearningRate 0.0317   Epoch: 8   Global Step: 362620   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:29,538-Speed 2630.74 samples/sec   Loss 7.7696   LearningRate 0.0317   Epoch: 8   Global Step: 362630   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:28:33,420-Speed 2638.05 samples/sec   Loss 7.5610   LearningRate 0.0317   Epoch: 8   Global Step: 362640   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:37,312-Speed 2632.04 samples/sec   Loss 7.6043   LearningRate 0.0317   Epoch: 8   Global Step: 362650   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:41,213-Speed 2625.85 samples/sec   Loss 7.5873   LearningRate 0.0317   Epoch: 8   Global Step: 362660   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:45,113-Speed 2626.33 samples/sec   Loss 7.6517   LearningRate 0.0317   Epoch: 8   Global Step: 362670   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:49,008-Speed 2629.88 samples/sec   Loss 7.5060   LearningRate 0.0317   Epoch: 8   Global Step: 362680   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:52,930-Speed 2612.32 samples/sec   Loss 7.6717   LearningRate 0.0317   Epoch: 8   Global Step: 362690   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:28:56,833-Speed 2624.14 samples/sec   Loss 7.5805   LearningRate 0.0317   Epoch: 8   Global Step: 362700   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:29:00,727-Speed 2630.21 samples/sec   Loss 7.5732   LearningRate 0.0317   Epoch: 8   Global Step: 362710   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:29:04,623-Speed 2629.27 samples/sec   Loss 7.6046   LearningRate 0.0317   Epoch: 8   Global Step: 362720   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:29:08,518-Speed 2629.62 samples/sec   Loss 7.5464   LearningRate 0.0317   Epoch: 8   Global Step: 362730   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:29:12,426-Speed 2621.05 samples/sec   Loss 7.6009   LearningRate 0.0317   Epoch: 8   Global Step: 362740   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:29:16,304-Speed 2641.40 samples/sec   Loss 7.6571   LearningRate 0.0317   Epoch: 8   Global Step: 362750   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:29:20,109-Speed 2692.36 samples/sec   Loss 8.5421   LearningRate 0.0317   Epoch: 8   Global Step: 362760   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:24,005-Speed 2628.50 samples/sec   Loss 8.3322   LearningRate 0.0317   Epoch: 8   Global Step: 362770   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:27,892-Speed 2635.18 samples/sec   Loss 7.6839   LearningRate 0.0317   Epoch: 8   Global Step: 362780   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:31,831-Speed 2600.60 samples/sec   Loss 7.7051   LearningRate 0.0317   Epoch: 8   Global Step: 362790   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:35,787-Speed 2589.55 samples/sec   Loss 7.5280   LearningRate 0.0317   Epoch: 8   Global Step: 362800   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:39,685-Speed 2627.29 samples/sec   Loss 7.5408   LearningRate 0.0317   Epoch: 8   Global Step: 362810   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:43,584-Speed 2627.32 samples/sec   Loss 7.4308   LearningRate 0.0317   Epoch: 8   Global Step: 362820   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:47,481-Speed 2628.43 samples/sec   Loss 7.6738   LearningRate 0.0317   Epoch: 8   Global Step: 362830   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:51,370-Speed 2633.95 samples/sec   Loss 7.6577   LearningRate 0.0317   Epoch: 8   Global Step: 362840   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:55,261-Speed 2632.14 samples/sec   Loss 7.5468   LearningRate 0.0317   Epoch: 8   Global Step: 362850   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:29:59,159-Speed 2627.62 samples/sec   Loss 7.7167   LearningRate 0.0317   Epoch: 8   Global Step: 362860   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:03,062-Speed 2624.67 samples/sec   Loss 7.4960   LearningRate 0.0317   Epoch: 8   Global Step: 362870   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:06,957-Speed 2629.54 samples/sec   Loss 7.5773   LearningRate 0.0316   Epoch: 8   Global Step: 362880   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:10,856-Speed 2633.93 samples/sec   Loss 7.6887   LearningRate 0.0316   Epoch: 8   Global Step: 362890   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:14,749-Speed 2631.30 samples/sec   Loss 7.5102   LearningRate 0.0316   Epoch: 8   Global Step: 362900   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:18,640-Speed 2632.63 samples/sec   Loss 7.6063   LearningRate 0.0316   Epoch: 8   Global Step: 362910   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:22,538-Speed 2627.46 samples/sec   Loss 7.5013   LearningRate 0.0316   Epoch: 8   Global Step: 362920   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:26,426-Speed 2634.78 samples/sec   Loss 7.7403   LearningRate 0.0316   Epoch: 8   Global Step: 362930   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:30,340-Speed 2616.35 samples/sec   Loss 7.6598   LearningRate 0.0316   Epoch: 8   Global Step: 362940   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:34,234-Speed 2630.40 samples/sec   Loss 7.5326   LearningRate 0.0316   Epoch: 8   Global Step: 362950   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:30:38,125-Speed 2632.66 samples/sec   Loss 7.5285   LearningRate 0.0316   Epoch: 8   Global Step: 362960   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:30:42,024-Speed 2627.09 samples/sec   Loss 7.5949   LearningRate 0.0316   Epoch: 8   Global Step: 362970   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:30:45,916-Speed 2631.21 samples/sec   Loss 7.5566   LearningRate 0.0316   Epoch: 8   Global Step: 362980   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:30:49,810-Speed 2630.75 samples/sec   Loss 7.6611   LearningRate 0.0316   Epoch: 8   Global Step: 362990   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:30:53,705-Speed 2629.70 samples/sec   Loss 7.6002   LearningRate 0.0316   Epoch: 8   Global Step: 363000   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:30:57,606-Speed 2625.68 samples/sec   Loss 7.6883   LearningRate 0.0316   Epoch: 8   Global Step: 363010   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:31:01,498-Speed 2631.50 samples/sec   Loss 7.5466   LearningRate 0.0316   Epoch: 8   Global Step: 363020   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:31:05,392-Speed 2630.35 samples/sec   Loss 7.5466   LearningRate 0.0316   Epoch: 8   Global Step: 363030   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:31:09,292-Speed 2626.24 samples/sec   Loss 7.6790   LearningRate 0.0316   Epoch: 8   Global Step: 363040   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:31:13,236-Speed 2597.30 samples/sec   Loss 7.6006   LearningRate 0.0316   Epoch: 8   Global Step: 363050   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:31:17,143-Speed 2621.35 samples/sec   Loss 7.5377   LearningRate 0.0316   Epoch: 8   Global Step: 363060   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:31:21,037-Speed 2630.42 samples/sec   Loss 7.6914   LearningRate 0.0316   Epoch: 8   Global Step: 363070   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:31:24,930-Speed 2630.98 samples/sec   Loss 7.6088   LearningRate 0.0316   Epoch: 8   Global Step: 363080   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:31:28,826-Speed 2629.25 samples/sec   Loss 7.4560   LearningRate 0.0316   Epoch: 8   Global Step: 363090   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:31:32,694-Speed 2647.97 samples/sec   Loss 8.7917   LearningRate 0.0316   Epoch: 8   Global Step: 363100   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:31:36,586-Speed 2632.05 samples/sec   Loss 8.1441   LearningRate 0.0316   Epoch: 8   Global Step: 363110   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:31:40,488-Speed 2624.31 samples/sec   Loss 8.1125   LearningRate 0.0316   Epoch: 8   Global Step: 363120   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:31:44,376-Speed 2634.35 samples/sec   Loss 7.7996   LearningRate 0.0316   Epoch: 8   Global Step: 363130   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:31:48,272-Speed 2629.05 samples/sec   Loss 7.6416   LearningRate 0.0316   Epoch: 8   Global Step: 363140   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:31:52,171-Speed 2626.95 samples/sec   Loss 7.6245   LearningRate 0.0316   Epoch: 8   Global Step: 363150   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:31:56,073-Speed 2625.72 samples/sec   Loss 7.6626   LearningRate 0.0316   Epoch: 8   Global Step: 363160   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:31:59,966-Speed 2630.40 samples/sec   Loss 7.5179   LearningRate 0.0316   Epoch: 8   Global Step: 363170   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:32:03,859-Speed 2630.75 samples/sec   Loss 7.5265   LearningRate 0.0316   Epoch: 8   Global Step: 363180   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:32:07,755-Speed 2629.19 samples/sec   Loss 7.4938   LearningRate 0.0316   Epoch: 8   Global Step: 363190   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:32:11,650-Speed 2629.77 samples/sec   Loss 7.6847   LearningRate 0.0316   Epoch: 8   Global Step: 363200   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:15,563-Speed 2617.69 samples/sec   Loss 7.4749   LearningRate 0.0316   Epoch: 8   Global Step: 363210   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:19,489-Speed 2608.77 samples/sec   Loss 7.5664   LearningRate 0.0316   Epoch: 8   Global Step: 363220   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:23,384-Speed 2629.81 samples/sec   Loss 7.7187   LearningRate 0.0316   Epoch: 8   Global Step: 363230   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:27,281-Speed 2628.03 samples/sec   Loss 7.5069   LearningRate 0.0316   Epoch: 8   Global Step: 363240   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:31,177-Speed 2629.50 samples/sec   Loss 8.4893   LearningRate 0.0316   Epoch: 8   Global Step: 363250   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:35,068-Speed 2632.62 samples/sec   Loss 7.8970   LearningRate 0.0316   Epoch: 8   Global Step: 363260   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:38,961-Speed 2630.34 samples/sec   Loss 7.5665   LearningRate 0.0316   Epoch: 8   Global Step: 363270   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:42,858-Speed 2628.48 samples/sec   Loss 7.6403   LearningRate 0.0316   Epoch: 8   Global Step: 363280   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:46,767-Speed 2620.12 samples/sec   Loss 7.5128   LearningRate 0.0316   Epoch: 8   Global Step: 363290   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:32:50,661-Speed 2630.86 samples/sec   Loss 7.5216   LearningRate 0.0316   Epoch: 8   Global Step: 363300   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:32:54,565-Speed 2623.96 samples/sec   Loss 7.5121   LearningRate 0.0316   Epoch: 8   Global Step: 363310   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:32:58,450-Speed 2636.16 samples/sec   Loss 7.5214   LearningRate 0.0316   Epoch: 8   Global Step: 363320   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:02,373-Speed 2610.62 samples/sec   Loss 7.5101   LearningRate 0.0316   Epoch: 8   Global Step: 363330   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:06,258-Speed 2636.47 samples/sec   Loss 7.6122   LearningRate 0.0316   Epoch: 8   Global Step: 363340   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:10,150-Speed 2631.66 samples/sec   Loss 7.5600   LearningRate 0.0316   Epoch: 8   Global Step: 363350   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:14,058-Speed 2620.70 samples/sec   Loss 7.5226   LearningRate 0.0316   Epoch: 8   Global Step: 363360   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:17,943-Speed 2636.94 samples/sec   Loss 7.5624   LearningRate 0.0316   Epoch: 8   Global Step: 363370   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:21,838-Speed 2630.03 samples/sec   Loss 7.6197   LearningRate 0.0316   Epoch: 8   Global Step: 363380   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:25,739-Speed 2625.36 samples/sec   Loss 7.6009   LearningRate 0.0316   Epoch: 8   Global Step: 363390   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:33:29,627-Speed 2634.17 samples/sec   Loss 7.5549   LearningRate 0.0316   Epoch: 8   Global Step: 363400   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:33:33,532-Speed 2623.47 samples/sec   Loss 7.5400   LearningRate 0.0316   Epoch: 8   Global Step: 363410   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:33:37,421-Speed 2633.62 samples/sec   Loss 7.6495   LearningRate 0.0316   Epoch: 8   Global Step: 363420   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:33:41,316-Speed 2629.53 samples/sec   Loss 7.5901   LearningRate 0.0316   Epoch: 8   Global Step: 363430   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:33:45,228-Speed 2618.50 samples/sec   Loss 7.5642   LearningRate 0.0316   Epoch: 8   Global Step: 363440   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:33:49,140-Speed 2618.26 samples/sec   Loss 7.5586   LearningRate 0.0316   Epoch: 8   Global Step: 363450   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:33:53,037-Speed 2628.86 samples/sec   Loss 7.6006   LearningRate 0.0316   Epoch: 8   Global Step: 363460   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:33:56,929-Speed 2631.49 samples/sec   Loss 7.5581   LearningRate 0.0316   Epoch: 8   Global Step: 363470   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:34:00,834-Speed 2622.78 samples/sec   Loss 7.6900   LearningRate 0.0316   Epoch: 8   Global Step: 363480   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:34:04,719-Speed 2636.52 samples/sec   Loss 7.5239   LearningRate 0.0316   Epoch: 8   Global Step: 363490   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:34:08,621-Speed 2625.41 samples/sec   Loss 7.4326   LearningRate 0.0316   Epoch: 8   Global Step: 363500   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:34:12,516-Speed 2629.20 samples/sec   Loss 7.6287   LearningRate 0.0316   Epoch: 8   Global Step: 363510   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:34:16,399-Speed 2637.71 samples/sec   Loss 7.7497   LearningRate 0.0316   Epoch: 8   Global Step: 363520   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:34:20,241-Speed 2665.83 samples/sec   Loss 8.0541   LearningRate 0.0316   Epoch: 8   Global Step: 363530   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:24,136-Speed 2630.10 samples/sec   Loss 7.7940   LearningRate 0.0316   Epoch: 8   Global Step: 363540   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:28,027-Speed 2633.36 samples/sec   Loss 7.6522   LearningRate 0.0316   Epoch: 8   Global Step: 363550   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:31,918-Speed 2632.31 samples/sec   Loss 7.5996   LearningRate 0.0316   Epoch: 8   Global Step: 363560   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:35,811-Speed 2631.17 samples/sec   Loss 7.5314   LearningRate 0.0316   Epoch: 8   Global Step: 363570   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:39,705-Speed 2630.60 samples/sec   Loss 7.5482   LearningRate 0.0316   Epoch: 8   Global Step: 363580   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:43,594-Speed 2633.50 samples/sec   Loss 7.6091   LearningRate 0.0316   Epoch: 8   Global Step: 363590   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:47,484-Speed 2632.86 samples/sec   Loss 7.5019   LearningRate 0.0316   Epoch: 8   Global Step: 363600   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:51,377-Speed 2631.54 samples/sec   Loss 7.6446   LearningRate 0.0316   Epoch: 8   Global Step: 363610   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:55,265-Speed 2634.34 samples/sec   Loss 7.4672   LearningRate 0.0315   Epoch: 8   Global Step: 363620   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:34:59,158-Speed 2630.96 samples/sec   Loss 7.6636   LearningRate 0.0315   Epoch: 8   Global Step: 363630   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:03,046-Speed 2634.00 samples/sec   Loss 7.5351   LearningRate 0.0315   Epoch: 8   Global Step: 363640   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:06,946-Speed 2626.22 samples/sec   Loss 7.6799   LearningRate 0.0315   Epoch: 8   Global Step: 363650   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:10,847-Speed 2625.62 samples/sec   Loss 7.5884   LearningRate 0.0315   Epoch: 8   Global Step: 363660   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:14,751-Speed 2623.76 samples/sec   Loss 7.7447   LearningRate 0.0315   Epoch: 8   Global Step: 363670   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:18,643-Speed 2631.65 samples/sec   Loss 7.6893   LearningRate 0.0315   Epoch: 8   Global Step: 363680   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:22,532-Speed 2633.18 samples/sec   Loss 7.6179   LearningRate 0.0315   Epoch: 8   Global Step: 363690   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:26,428-Speed 2629.75 samples/sec   Loss 7.5853   LearningRate 0.0315   Epoch: 8   Global Step: 363700   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:30,321-Speed 2631.21 samples/sec   Loss 7.5761   LearningRate 0.0315   Epoch: 8   Global Step: 363710   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:34,225-Speed 2623.47 samples/sec   Loss 7.6013   LearningRate 0.0315   Epoch: 8   Global Step: 363720   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:35:38,123-Speed 2627.36 samples/sec   Loss 7.6044   LearningRate 0.0315   Epoch: 8   Global Step: 363730   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:35:42,023-Speed 2626.29 samples/sec   Loss 7.5518   LearningRate 0.0315   Epoch: 8   Global Step: 363740   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:35:45,917-Speed 2630.68 samples/sec   Loss 7.5421   LearningRate 0.0315   Epoch: 8   Global Step: 363750   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:35:49,810-Speed 2631.10 samples/sec   Loss 7.5815   LearningRate 0.0315   Epoch: 8   Global Step: 363760   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:35:53,708-Speed 2627.71 samples/sec   Loss 7.6844   LearningRate 0.0315   Epoch: 8   Global Step: 363770   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:35:57,602-Speed 2630.67 samples/sec   Loss 7.6194   LearningRate 0.0315   Epoch: 8   Global Step: 363780   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:36:01,509-Speed 2621.36 samples/sec   Loss 7.4828   LearningRate 0.0315   Epoch: 8   Global Step: 363790   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:36:05,408-Speed 2626.64 samples/sec   Loss 7.6271   LearningRate 0.0315   Epoch: 8   Global Step: 363800   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:36:09,307-Speed 2626.77 samples/sec   Loss 7.6302   LearningRate 0.0315   Epoch: 8   Global Step: 363810   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:36:13,194-Speed 2634.98 samples/sec   Loss 7.6527   LearningRate 0.0315   Epoch: 8   Global Step: 363820   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:36:17,087-Speed 2631.57 samples/sec   Loss 7.5600   LearningRate 0.0315   Epoch: 8   Global Step: 363830   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:20,986-Speed 2626.84 samples/sec   Loss 7.5236   LearningRate 0.0315   Epoch: 8   Global Step: 363840   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:24,882-Speed 2629.11 samples/sec   Loss 7.6016   LearningRate 0.0315   Epoch: 8   Global Step: 363850   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:28,774-Speed 2632.14 samples/sec   Loss 7.6132   LearningRate 0.0315   Epoch: 8   Global Step: 363860   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:32,671-Speed 2627.94 samples/sec   Loss 7.5966   LearningRate 0.0315   Epoch: 8   Global Step: 363870   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:36,565-Speed 2630.05 samples/sec   Loss 7.5688   LearningRate 0.0315   Epoch: 8   Global Step: 363880   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:40,470-Speed 2622.98 samples/sec   Loss 7.5607   LearningRate 0.0315   Epoch: 8   Global Step: 363890   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:44,361-Speed 2632.29 samples/sec   Loss 7.5644   LearningRate 0.0315   Epoch: 8   Global Step: 363900   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:48,257-Speed 2629.03 samples/sec   Loss 7.5363   LearningRate 0.0315   Epoch: 8   Global Step: 363910   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:52,149-Speed 2631.65 samples/sec   Loss 7.7447   LearningRate 0.0315   Epoch: 8   Global Step: 363920   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:56,036-Speed 2635.05 samples/sec   Loss 7.5651   LearningRate 0.0315   Epoch: 8   Global Step: 363930   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:36:59,905-Speed 2647.46 samples/sec   Loss 7.9156   LearningRate 0.0315   Epoch: 8   Global Step: 363940   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:03,795-Speed 2632.63 samples/sec   Loss 8.3529   LearningRate 0.0315   Epoch: 8   Global Step: 363950   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:07,685-Speed 2633.33 samples/sec   Loss 7.7071   LearningRate 0.0315   Epoch: 8   Global Step: 363960   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:11,575-Speed 2632.93 samples/sec   Loss 7.6699   LearningRate 0.0315   Epoch: 8   Global Step: 363970   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:15,472-Speed 2628.33 samples/sec   Loss 7.4623   LearningRate 0.0315   Epoch: 8   Global Step: 363980   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:19,356-Speed 2637.04 samples/sec   Loss 7.5443   LearningRate 0.0315   Epoch: 8   Global Step: 363990   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:23,251-Speed 2629.38 samples/sec   Loss 7.5935   LearningRate 0.0315   Epoch: 8   Global Step: 364000   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:27,141-Speed 2633.53 samples/sec   Loss 7.6407   LearningRate 0.0315   Epoch: 8   Global Step: 364010   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:31,031-Speed 2633.06 samples/sec   Loss 7.5691   LearningRate 0.0315   Epoch: 8   Global Step: 364020   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:34,928-Speed 2628.10 samples/sec   Loss 7.6766   LearningRate 0.0315   Epoch: 8   Global Step: 364030   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:37:38,820-Speed 2631.05 samples/sec   Loss 7.5956   LearningRate 0.0315   Epoch: 8   Global Step: 364040   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:37:42,714-Speed 2630.71 samples/sec   Loss 7.5471   LearningRate 0.0315   Epoch: 8   Global Step: 364050   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:37:46,616-Speed 2625.01 samples/sec   Loss 7.6707   LearningRate 0.0315   Epoch: 8   Global Step: 364060   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:37:50,513-Speed 2628.44 samples/sec   Loss 7.5378   LearningRate 0.0315   Epoch: 8   Global Step: 364070   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:37:54,427-Speed 2616.74 samples/sec   Loss 7.5641   LearningRate 0.0315   Epoch: 8   Global Step: 364080   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:37:58,330-Speed 2623.76 samples/sec   Loss 7.6798   LearningRate 0.0315   Epoch: 8   Global Step: 364090   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:38:02,387-Speed 2524.40 samples/sec   Loss 7.6542   LearningRate 0.0315   Epoch: 8   Global Step: 364100   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:38:06,282-Speed 2630.38 samples/sec   Loss 7.5917   LearningRate 0.0315   Epoch: 8   Global Step: 364110   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:38:10,176-Speed 2629.87 samples/sec   Loss 7.7024   LearningRate 0.0315   Epoch: 8   Global Step: 364120   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:38:14,071-Speed 2629.68 samples/sec   Loss 7.4664   LearningRate 0.0315   Epoch: 8   Global Step: 364130   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:38:17,969-Speed 2627.60 samples/sec   Loss 7.4888   LearningRate 0.0315   Epoch: 8   Global Step: 364140   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:21,866-Speed 2628.38 samples/sec   Loss 7.6173   LearningRate 0.0315   Epoch: 8   Global Step: 364150   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:25,776-Speed 2619.92 samples/sec   Loss 7.6693   LearningRate 0.0315   Epoch: 8   Global Step: 364160   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:29,668-Speed 2631.32 samples/sec   Loss 7.6058   LearningRate 0.0315   Epoch: 8   Global Step: 364170   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:33,562-Speed 2630.12 samples/sec   Loss 7.5995   LearningRate 0.0315   Epoch: 8   Global Step: 364180   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:37,456-Speed 2630.08 samples/sec   Loss 7.5211   LearningRate 0.0315   Epoch: 8   Global Step: 364190   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:41,352-Speed 2629.29 samples/sec   Loss 7.4725   LearningRate 0.0315   Epoch: 8   Global Step: 364200   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:45,245-Speed 2630.81 samples/sec   Loss 7.4886   LearningRate 0.0315   Epoch: 8   Global Step: 364210   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:49,140-Speed 2630.02 samples/sec   Loss 7.5928   LearningRate 0.0315   Epoch: 8   Global Step: 364220   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:53,033-Speed 2630.69 samples/sec   Loss 7.6091   LearningRate 0.0315   Epoch: 8   Global Step: 364230   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:38:56,929-Speed 2628.68 samples/sec   Loss 7.5403   LearningRate 0.0315   Epoch: 8   Global Step: 364240   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:00,837-Speed 2621.38 samples/sec   Loss 7.6329   LearningRate 0.0315   Epoch: 8   Global Step: 364250   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:04,726-Speed 2633.24 samples/sec   Loss 7.6006   LearningRate 0.0315   Epoch: 8   Global Step: 364260   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:08,625-Speed 2627.15 samples/sec   Loss 7.5019   LearningRate 0.0315   Epoch: 8   Global Step: 364270   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:12,523-Speed 2626.97 samples/sec   Loss 7.3937   LearningRate 0.0315   Epoch: 8   Global Step: 364280   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:16,426-Speed 2624.52 samples/sec   Loss 7.7210   LearningRate 0.0315   Epoch: 8   Global Step: 364290   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:20,328-Speed 2625.16 samples/sec   Loss 7.5852   LearningRate 0.0315   Epoch: 8   Global Step: 364300   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:24,220-Speed 2631.51 samples/sec   Loss 7.6095   LearningRate 0.0315   Epoch: 8   Global Step: 364310   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:28,113-Speed 2631.40 samples/sec   Loss 7.6353   LearningRate 0.0315   Epoch: 8   Global Step: 364320   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:32,009-Speed 2628.54 samples/sec   Loss 7.6265   LearningRate 0.0315   Epoch: 8   Global Step: 364330   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:35,887-Speed 2641.54 samples/sec   Loss 7.5301   LearningRate 0.0315   Epoch: 8   Global Step: 364340   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:39,784-Speed 2627.72 samples/sec   Loss 7.5230   LearningRate 0.0314   Epoch: 8   Global Step: 364350   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:43,688-Speed 2623.24 samples/sec   Loss 7.6471   LearningRate 0.0314   Epoch: 8   Global Step: 364360   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:47,589-Speed 2625.96 samples/sec   Loss 7.5649   LearningRate 0.0314   Epoch: 8   Global Step: 364370   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:51,487-Speed 2627.59 samples/sec   Loss 7.5426   LearningRate 0.0314   Epoch: 8   Global Step: 364380   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:55,409-Speed 2611.87 samples/sec   Loss 7.4566   LearningRate 0.0314   Epoch: 8   Global Step: 364390   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:39:59,309-Speed 2626.05 samples/sec   Loss 7.6708   LearningRate 0.0314   Epoch: 8   Global Step: 364400   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:03,216-Speed 2621.80 samples/sec   Loss 7.4747   LearningRate 0.0314   Epoch: 8   Global Step: 364410   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:07,125-Speed 2620.15 samples/sec   Loss 7.5527   LearningRate 0.0314   Epoch: 8   Global Step: 364420   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:11,026-Speed 2625.36 samples/sec   Loss 7.5494   LearningRate 0.0314   Epoch: 8   Global Step: 364430   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:14,919-Speed 2630.57 samples/sec   Loss 7.4655   LearningRate 0.0314   Epoch: 8   Global Step: 364440   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:40:18,812-Speed 2631.23 samples/sec   Loss 7.5466   LearningRate 0.0314   Epoch: 8   Global Step: 364450   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:40:22,706-Speed 2629.95 samples/sec   Loss 7.6239   LearningRate 0.0314   Epoch: 8   Global Step: 364460   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:40:26,581-Speed 2642.97 samples/sec   Loss 7.7455   LearningRate 0.0314   Epoch: 8   Global Step: 364470   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:30,477-Speed 2628.94 samples/sec   Loss 7.5832   LearningRate 0.0314   Epoch: 8   Global Step: 364480   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:34,371-Speed 2630.63 samples/sec   Loss 7.4699   LearningRate 0.0314   Epoch: 8   Global Step: 364490   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:38,263-Speed 2631.91 samples/sec   Loss 7.5545   LearningRate 0.0314   Epoch: 8   Global Step: 364500   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:42,164-Speed 2625.73 samples/sec   Loss 7.6300   LearningRate 0.0314   Epoch: 8   Global Step: 364510   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:46,063-Speed 2626.96 samples/sec   Loss 7.6323   LearningRate 0.0314   Epoch: 8   Global Step: 364520   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:49,958-Speed 2629.02 samples/sec   Loss 7.5544   LearningRate 0.0314   Epoch: 8   Global Step: 364530   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:53,854-Speed 2629.21 samples/sec   Loss 7.5823   LearningRate 0.0314   Epoch: 8   Global Step: 364540   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:40:57,752-Speed 2627.80 samples/sec   Loss 7.6089   LearningRate 0.0314   Epoch: 8   Global Step: 364550   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:01,653-Speed 2625.26 samples/sec   Loss 7.5297   LearningRate 0.0314   Epoch: 8   Global Step: 364560   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:05,558-Speed 2622.85 samples/sec   Loss 7.5416   LearningRate 0.0314   Epoch: 8   Global Step: 364570   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:41:09,453-Speed 2629.36 samples/sec   Loss 7.7350   LearningRate 0.0314   Epoch: 8   Global Step: 364580   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:41:13,346-Speed 2631.57 samples/sec   Loss 7.7096   LearningRate 0.0314   Epoch: 8   Global Step: 364590   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:17,260-Speed 2616.68 samples/sec   Loss 7.4256   LearningRate 0.0314   Epoch: 8   Global Step: 364600   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:21,163-Speed 2624.68 samples/sec   Loss 7.5749   LearningRate 0.0314   Epoch: 8   Global Step: 364610   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:25,067-Speed 2623.33 samples/sec   Loss 7.6492   LearningRate 0.0314   Epoch: 8   Global Step: 364620   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:28,972-Speed 2623.36 samples/sec   Loss 7.5326   LearningRate 0.0314   Epoch: 8   Global Step: 364630   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:32,877-Speed 2622.24 samples/sec   Loss 7.6460   LearningRate 0.0314   Epoch: 8   Global Step: 364640   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:36,775-Speed 2627.39 samples/sec   Loss 7.6482   LearningRate 0.0314   Epoch: 8   Global Step: 364650   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:40,676-Speed 2625.82 samples/sec   Loss 7.5697   LearningRate 0.0314   Epoch: 8   Global Step: 364660   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:44,585-Speed 2620.56 samples/sec   Loss 7.5493   LearningRate 0.0314   Epoch: 8   Global Step: 364670   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:48,478-Speed 2630.66 samples/sec   Loss 7.4491   LearningRate 0.0314   Epoch: 8   Global Step: 364680   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:41:52,382-Speed 2624.00 samples/sec   Loss 7.5186   LearningRate 0.0314   Epoch: 8   Global Step: 364690   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:41:56,285-Speed 2624.15 samples/sec   Loss 7.5120   LearningRate 0.0314   Epoch: 8   Global Step: 364700   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:42:00,135-Speed 2659.99 samples/sec   Loss 7.5735   LearningRate 0.0314   Epoch: 8   Global Step: 364710   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:42:04,035-Speed 2626.07 samples/sec   Loss 7.6642   LearningRate 0.0314   Epoch: 8   Global Step: 364720   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:42:07,934-Speed 2627.32 samples/sec   Loss 7.6045   LearningRate 0.0314   Epoch: 8   Global Step: 364730   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:42:11,823-Speed 2633.64 samples/sec   Loss 7.6052   LearningRate 0.0314   Epoch: 8   Global Step: 364740   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:15,734-Speed 2618.42 samples/sec   Loss 8.5925   LearningRate 0.0314   Epoch: 8   Global Step: 364750   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:19,649-Speed 2616.47 samples/sec   Loss 8.0332   LearningRate 0.0314   Epoch: 8   Global Step: 364760   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:23,554-Speed 2622.36 samples/sec   Loss 7.8472   LearningRate 0.0314   Epoch: 8   Global Step: 364770   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:27,451-Speed 2628.62 samples/sec   Loss 7.9295   LearningRate 0.0314   Epoch: 8   Global Step: 364780   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:31,342-Speed 2632.44 samples/sec   Loss 7.5687   LearningRate 0.0314   Epoch: 8   Global Step: 364790   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:35,253-Speed 2619.24 samples/sec   Loss 7.6629   LearningRate 0.0314   Epoch: 8   Global Step: 364800   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:39,141-Speed 2634.09 samples/sec   Loss 7.5570   LearningRate 0.0314   Epoch: 8   Global Step: 364810   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:43,039-Speed 2627.39 samples/sec   Loss 7.6401   LearningRate 0.0314   Epoch: 8   Global Step: 364820   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:46,929-Speed 2632.94 samples/sec   Loss 7.6121   LearningRate 0.0314   Epoch: 8   Global Step: 364830   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:42:50,822-Speed 2630.88 samples/sec   Loss 7.6820   LearningRate 0.0314   Epoch: 8   Global Step: 364840   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:42:54,713-Speed 2632.52 samples/sec   Loss 7.5710   LearningRate 0.0314   Epoch: 8   Global Step: 364850   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:42:58,654-Speed 2598.63 samples/sec   Loss 7.6142   LearningRate 0.0314   Epoch: 8   Global Step: 364860   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:02,565-Speed 2619.00 samples/sec   Loss 7.6604   LearningRate 0.0314   Epoch: 8   Global Step: 364870   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:06,465-Speed 2625.95 samples/sec   Loss 7.5793   LearningRate 0.0314   Epoch: 8   Global Step: 364880   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:10,359-Speed 2630.42 samples/sec   Loss 7.5681   LearningRate 0.0314   Epoch: 8   Global Step: 364890   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:14,250-Speed 2632.51 samples/sec   Loss 7.5149   LearningRate 0.0314   Epoch: 8   Global Step: 364900   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:18,146-Speed 2629.25 samples/sec   Loss 7.5525   LearningRate 0.0314   Epoch: 8   Global Step: 364910   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:22,041-Speed 2629.48 samples/sec   Loss 7.5154   LearningRate 0.0314   Epoch: 8   Global Step: 364920   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:25,957-Speed 2615.41 samples/sec   Loss 7.5880   LearningRate 0.0314   Epoch: 8   Global Step: 364930   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:29,851-Speed 2630.32 samples/sec   Loss 7.5300   LearningRate 0.0314   Epoch: 8   Global Step: 364940   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:33,744-Speed 2630.62 samples/sec   Loss 7.5416   LearningRate 0.0314   Epoch: 8   Global Step: 364950   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:43:37,641-Speed 2628.45 samples/sec   Loss 7.5581   LearningRate 0.0314   Epoch: 8   Global Step: 364960   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:43:41,535-Speed 2629.75 samples/sec   Loss 7.6480   LearningRate 0.0314   Epoch: 8   Global Step: 364970   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:43:45,426-Speed 2632.42 samples/sec   Loss 7.4815   LearningRate 0.0314   Epoch: 8   Global Step: 364980   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:43:49,335-Speed 2620.64 samples/sec   Loss 7.4437   LearningRate 0.0314   Epoch: 8   Global Step: 364990   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:43:53,227-Speed 2632.24 samples/sec   Loss 7.5324   LearningRate 0.0314   Epoch: 8   Global Step: 365000   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:43:57,098-Speed 2645.69 samples/sec   Loss 7.9834   LearningRate 0.0314   Epoch: 8   Global Step: 365010   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:00,992-Speed 2629.94 samples/sec   Loss 7.9409   LearningRate 0.0314   Epoch: 8   Global Step: 365020   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:04,879-Speed 2635.13 samples/sec   Loss 7.6072   LearningRate 0.0314   Epoch: 8   Global Step: 365030   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:08,771-Speed 2631.24 samples/sec   Loss 7.4733   LearningRate 0.0314   Epoch: 8   Global Step: 365040   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:12,665-Speed 2630.51 samples/sec   Loss 7.6131   LearningRate 0.0314   Epoch: 8   Global Step: 365050   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:16,561-Speed 2628.80 samples/sec   Loss 7.8514   LearningRate 0.0314   Epoch: 8   Global Step: 365060   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:20,451-Speed 2633.30 samples/sec   Loss 7.5977   LearningRate 0.0314   Epoch: 8   Global Step: 365070   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:24,349-Speed 2627.68 samples/sec   Loss 7.4287   LearningRate 0.0314   Epoch: 8   Global Step: 365080   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:28,240-Speed 2632.57 samples/sec   Loss 7.4832   LearningRate 0.0313   Epoch: 8   Global Step: 365090   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:32,134-Speed 2630.32 samples/sec   Loss 7.4968   LearningRate 0.0313   Epoch: 8   Global Step: 365100   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:44:36,033-Speed 2627.23 samples/sec   Loss 7.5064   LearningRate 0.0313   Epoch: 8   Global Step: 365110   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:44:39,926-Speed 2630.74 samples/sec   Loss 7.5684   LearningRate 0.0313   Epoch: 8   Global Step: 365120   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:44:43,819-Speed 2630.56 samples/sec   Loss 7.6254   LearningRate 0.0313   Epoch: 8   Global Step: 365130   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:44:47,710-Speed 2632.23 samples/sec   Loss 7.6328   LearningRate 0.0313   Epoch: 8   Global Step: 365140   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:44:51,601-Speed 2632.30 samples/sec   Loss 7.6107   LearningRate 0.0313   Epoch: 8   Global Step: 365150   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:44:55,499-Speed 2627.75 samples/sec   Loss 7.5972   LearningRate 0.0313   Epoch: 8   Global Step: 365160   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:44:59,400-Speed 2625.33 samples/sec   Loss 7.5754   LearningRate 0.0313   Epoch: 8   Global Step: 365170   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:45:03,291-Speed 2632.63 samples/sec   Loss 7.6525   LearningRate 0.0313   Epoch: 8   Global Step: 365180   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:45:07,187-Speed 2629.10 samples/sec   Loss 7.5700   LearningRate 0.0313   Epoch: 8   Global Step: 365190   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:45:11,079-Speed 2631.22 samples/sec   Loss 7.5349   LearningRate 0.0313   Epoch: 8   Global Step: 365200   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:45:14,983-Speed 2623.51 samples/sec   Loss 7.5774   LearningRate 0.0313   Epoch: 8   Global Step: 365210   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:18,881-Speed 2627.55 samples/sec   Loss 7.6623   LearningRate 0.0313   Epoch: 8   Global Step: 365220   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:22,785-Speed 2623.47 samples/sec   Loss 7.5630   LearningRate 0.0313   Epoch: 8   Global Step: 365230   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:26,679-Speed 2630.47 samples/sec   Loss 7.5300   LearningRate 0.0313   Epoch: 8   Global Step: 365240   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:30,570-Speed 2632.21 samples/sec   Loss 7.6114   LearningRate 0.0313   Epoch: 8   Global Step: 365250   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:34,461-Speed 2632.20 samples/sec   Loss 7.4788   LearningRate 0.0313   Epoch: 8   Global Step: 365260   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:38,369-Speed 2621.24 samples/sec   Loss 7.5550   LearningRate 0.0313   Epoch: 8   Global Step: 365270   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:42,267-Speed 2627.85 samples/sec   Loss 7.5693   LearningRate 0.0313   Epoch: 8   Global Step: 365280   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:46,161-Speed 2629.86 samples/sec   Loss 7.5864   LearningRate 0.0313   Epoch: 8   Global Step: 365290   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:50,053-Speed 2632.22 samples/sec   Loss 7.4533   LearningRate 0.0313   Epoch: 8   Global Step: 365300   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:45:53,946-Speed 2630.64 samples/sec   Loss 7.5567   LearningRate 0.0313   Epoch: 8   Global Step: 365310   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:45:57,841-Speed 2629.39 samples/sec   Loss 7.5728   LearningRate 0.0313   Epoch: 8   Global Step: 365320   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:01,738-Speed 2628.70 samples/sec   Loss 7.5574   LearningRate 0.0313   Epoch: 8   Global Step: 365330   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:05,641-Speed 2623.73 samples/sec   Loss 7.5988   LearningRate 0.0313   Epoch: 8   Global Step: 365340   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:09,535-Speed 2630.20 samples/sec   Loss 7.6604   LearningRate 0.0313   Epoch: 8   Global Step: 365350   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:13,430-Speed 2629.57 samples/sec   Loss 7.5531   LearningRate 0.0313   Epoch: 8   Global Step: 365360   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:17,324-Speed 2630.57 samples/sec   Loss 7.4258   LearningRate 0.0313   Epoch: 8   Global Step: 365370   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:21,226-Speed 2625.01 samples/sec   Loss 7.5228   LearningRate 0.0313   Epoch: 8   Global Step: 365380   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:25,117-Speed 2632.37 samples/sec   Loss 7.5423   LearningRate 0.0313   Epoch: 8   Global Step: 365390   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:29,013-Speed 2629.35 samples/sec   Loss 7.6250   LearningRate 0.0313   Epoch: 8   Global Step: 365400   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:46:32,905-Speed 2631.61 samples/sec   Loss 7.4882   LearningRate 0.0313   Epoch: 8   Global Step: 365410   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:46:36,797-Speed 2631.23 samples/sec   Loss 7.4423   LearningRate 0.0313   Epoch: 8   Global Step: 365420   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:46:40,697-Speed 2626.32 samples/sec   Loss 7.5314   LearningRate 0.0313   Epoch: 8   Global Step: 365430   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:46:44,614-Speed 2615.41 samples/sec   Loss 7.5946   LearningRate 0.0313   Epoch: 8   Global Step: 365440   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:46:48,507-Speed 2630.29 samples/sec   Loss 7.6296   LearningRate 0.0313   Epoch: 8   Global Step: 365450   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:46:52,405-Speed 2627.71 samples/sec   Loss 7.5172   LearningRate 0.0313   Epoch: 8   Global Step: 365460   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:46:56,282-Speed 2641.64 samples/sec   Loss 7.4477   LearningRate 0.0313   Epoch: 8   Global Step: 365470   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:00,198-Speed 2615.99 samples/sec   Loss 7.5830   LearningRate 0.0313   Epoch: 8   Global Step: 365480   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:04,092-Speed 2630.06 samples/sec   Loss 7.6843   LearningRate 0.0313   Epoch: 8   Global Step: 365490   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:07,986-Speed 2630.71 samples/sec   Loss 7.5952   LearningRate 0.0313   Epoch: 8   Global Step: 365500   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:11,880-Speed 2630.16 samples/sec   Loss 7.6831   LearningRate 0.0313   Epoch: 8   Global Step: 365510   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:15,774-Speed 2630.39 samples/sec   Loss 7.5465   LearningRate 0.0313   Epoch: 8   Global Step: 365520   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:19,699-Speed 2610.10 samples/sec   Loss 7.4826   LearningRate 0.0313   Epoch: 8   Global Step: 365530   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:23,601-Speed 2624.65 samples/sec   Loss 7.5583   LearningRate 0.0313   Epoch: 8   Global Step: 365540   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:27,492-Speed 2632.26 samples/sec   Loss 7.4935   LearningRate 0.0313   Epoch: 8   Global Step: 365550   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:31,384-Speed 2632.00 samples/sec   Loss 7.6590   LearningRate 0.0313   Epoch: 8   Global Step: 365560   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:47:35,274-Speed 2632.61 samples/sec   Loss 7.5304   LearningRate 0.0313   Epoch: 8   Global Step: 365570   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:47:39,208-Speed 2603.71 samples/sec   Loss 7.5632   LearningRate 0.0313   Epoch: 8   Global Step: 365580   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:47:43,109-Speed 2625.79 samples/sec   Loss 7.6976   LearningRate 0.0313   Epoch: 8   Global Step: 365590   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:47:46,998-Speed 2633.33 samples/sec   Loss 7.3817   LearningRate 0.0313   Epoch: 8   Global Step: 365600   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:47:50,890-Speed 2631.82 samples/sec   Loss 7.5879   LearningRate 0.0313   Epoch: 8   Global Step: 365610   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:47:54,782-Speed 2631.67 samples/sec   Loss 7.6014   LearningRate 0.0313   Epoch: 8   Global Step: 365620   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:47:58,673-Speed 2632.02 samples/sec   Loss 7.5598   LearningRate 0.0313   Epoch: 8   Global Step: 365630   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:02,578-Speed 2622.97 samples/sec   Loss 7.5803   LearningRate 0.0313   Epoch: 8   Global Step: 365640   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:06,471-Speed 2631.01 samples/sec   Loss 7.4091   LearningRate 0.0313   Epoch: 8   Global Step: 365650   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:10,372-Speed 2625.24 samples/sec   Loss 7.5735   LearningRate 0.0313   Epoch: 8   Global Step: 365660   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:14,252-Speed 2640.24 samples/sec   Loss 7.6037   LearningRate 0.0313   Epoch: 8   Global Step: 365670   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:18,139-Speed 2635.19 samples/sec   Loss 7.5424   LearningRate 0.0313   Epoch: 8   Global Step: 365680   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:22,043-Speed 2623.70 samples/sec   Loss 7.5257   LearningRate 0.0313   Epoch: 8   Global Step: 365690   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:25,929-Speed 2635.50 samples/sec   Loss 7.7507   LearningRate 0.0313   Epoch: 8   Global Step: 365700   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:29,837-Speed 2620.70 samples/sec   Loss 7.4545   LearningRate 0.0313   Epoch: 8   Global Step: 365710   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:33,741-Speed 2623.84 samples/sec   Loss 7.4576   LearningRate 0.0313   Epoch: 8   Global Step: 365720   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:37,636-Speed 2628.98 samples/sec   Loss 7.5782   LearningRate 0.0313   Epoch: 8   Global Step: 365730   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:41,532-Speed 2629.16 samples/sec   Loss 7.4344   LearningRate 0.0313   Epoch: 8   Global Step: 365740   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:45,438-Speed 2622.07 samples/sec   Loss 7.4808   LearningRate 0.0313   Epoch: 8   Global Step: 365750   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:49,322-Speed 2636.98 samples/sec   Loss 7.5998   LearningRate 0.0313   Epoch: 8   Global Step: 365760   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:48:53,213-Speed 2632.54 samples/sec   Loss 7.5756   LearningRate 0.0313   Epoch: 8   Global Step: 365770   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:48:57,108-Speed 2630.57 samples/sec   Loss 7.6375   LearningRate 0.0313   Epoch: 8   Global Step: 365780   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:01,006-Speed 2627.56 samples/sec   Loss 7.5259   LearningRate 0.0313   Epoch: 8   Global Step: 365790   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:04,905-Speed 2626.84 samples/sec   Loss 7.5002   LearningRate 0.0313   Epoch: 8   Global Step: 365800   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:08,795-Speed 2632.79 samples/sec   Loss 7.4911   LearningRate 0.0313   Epoch: 8   Global Step: 365810   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:12,690-Speed 2629.87 samples/sec   Loss 7.6668   LearningRate 0.0313   Epoch: 8   Global Step: 365820   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:16,588-Speed 2627.16 samples/sec   Loss 7.5413   LearningRate 0.0313   Epoch: 8   Global Step: 365830   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:20,504-Speed 2616.06 samples/sec   Loss 7.5296   LearningRate 0.0312   Epoch: 8   Global Step: 365840   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:24,404-Speed 2626.43 samples/sec   Loss 7.6808   LearningRate 0.0312   Epoch: 8   Global Step: 365850   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:28,311-Speed 2621.69 samples/sec   Loss 7.4991   LearningRate 0.0312   Epoch: 8   Global Step: 365860   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:49:32,219-Speed 2620.67 samples/sec   Loss 7.4928   LearningRate 0.0312   Epoch: 8   Global Step: 365870   Fp16 Grad Scale: 524288   Required: 52 hours
Training: 2022-04-14 12:49:36,083-Speed 2650.76 samples/sec   Loss 7.5219   LearningRate 0.0312   Epoch: 8   Global Step: 365880   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:49:39,987-Speed 2623.30 samples/sec   Loss 7.6098   LearningRate 0.0312   Epoch: 8   Global Step: 365890   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:49:43,885-Speed 2628.62 samples/sec   Loss 7.5215   LearningRate 0.0312   Epoch: 8   Global Step: 365900   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:49:47,779-Speed 2630.09 samples/sec   Loss 7.4970   LearningRate 0.0312   Epoch: 8   Global Step: 365910   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:49:51,673-Speed 2630.15 samples/sec   Loss 7.5158   LearningRate 0.0312   Epoch: 8   Global Step: 365920   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:49:55,568-Speed 2629.60 samples/sec   Loss 7.4480   LearningRate 0.0312   Epoch: 8   Global Step: 365930   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:49:59,470-Speed 2625.28 samples/sec   Loss 7.5751   LearningRate 0.0312   Epoch: 8   Global Step: 365940   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:03,375-Speed 2623.02 samples/sec   Loss 7.4616   LearningRate 0.0312   Epoch: 8   Global Step: 365950   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:07,284-Speed 2620.00 samples/sec   Loss 7.4345   LearningRate 0.0312   Epoch: 8   Global Step: 365960   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:11,182-Speed 2627.66 samples/sec   Loss 7.4378   LearningRate 0.0312   Epoch: 8   Global Step: 365970   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:15,082-Speed 2626.94 samples/sec   Loss 7.4917   LearningRate 0.0312   Epoch: 8   Global Step: 365980   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:50:18,988-Speed 2621.75 samples/sec   Loss 7.5387   LearningRate 0.0312   Epoch: 8   Global Step: 365990   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:50:22,884-Speed 2628.53 samples/sec   Loss 7.6305   LearningRate 0.0312   Epoch: 8   Global Step: 366000   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:50:26,802-Speed 2614.82 samples/sec   Loss 7.4802   LearningRate 0.0312   Epoch: 8   Global Step: 366010   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:50:30,728-Speed 2609.13 samples/sec   Loss 7.5360   LearningRate 0.0312   Epoch: 8   Global Step: 366020   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:50:34,625-Speed 2628.23 samples/sec   Loss 7.5221   LearningRate 0.0312   Epoch: 8   Global Step: 366030   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:50:38,508-Speed 2638.31 samples/sec   Loss 7.4645   LearningRate 0.0312   Epoch: 8   Global Step: 366040   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:42,405-Speed 2628.05 samples/sec   Loss 7.5204   LearningRate 0.0312   Epoch: 8   Global Step: 366050   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:46,337-Speed 2604.57 samples/sec   Loss 7.6725   LearningRate 0.0312   Epoch: 8   Global Step: 366060   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:50,302-Speed 2584.14 samples/sec   Loss 7.5065   LearningRate 0.0312   Epoch: 8   Global Step: 366070   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:54,200-Speed 2627.30 samples/sec   Loss 7.5083   LearningRate 0.0312   Epoch: 8   Global Step: 366080   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:50:58,097-Speed 2628.95 samples/sec   Loss 7.5213   LearningRate 0.0312   Epoch: 8   Global Step: 366090   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:01,992-Speed 2629.56 samples/sec   Loss 7.5156   LearningRate 0.0312   Epoch: 8   Global Step: 366100   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:05,894-Speed 2624.67 samples/sec   Loss 7.6418   LearningRate 0.0312   Epoch: 8   Global Step: 366110   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:09,803-Speed 2620.59 samples/sec   Loss 7.4161   LearningRate 0.0312   Epoch: 8   Global Step: 366120   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:13,700-Speed 2628.34 samples/sec   Loss 7.5939   LearningRate 0.0312   Epoch: 8   Global Step: 366130   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:17,644-Speed 2597.54 samples/sec   Loss 7.4234   LearningRate 0.0312   Epoch: 8   Global Step: 366140   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:51:21,545-Speed 2625.32 samples/sec   Loss 7.5455   LearningRate 0.0312   Epoch: 8   Global Step: 366150   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:51:25,441-Speed 2629.24 samples/sec   Loss 7.5666   LearningRate 0.0312   Epoch: 8   Global Step: 366160   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:51:29,340-Speed 2626.81 samples/sec   Loss 7.4489   LearningRate 0.0312   Epoch: 8   Global Step: 366170   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:33,242-Speed 2625.53 samples/sec   Loss 7.5175   LearningRate 0.0312   Epoch: 8   Global Step: 366180   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:37,178-Speed 2602.27 samples/sec   Loss 7.5630   LearningRate 0.0312   Epoch: 8   Global Step: 366190   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:41,065-Speed 2635.15 samples/sec   Loss 7.5909   LearningRate 0.0312   Epoch: 8   Global Step: 366200   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:44,961-Speed 2628.85 samples/sec   Loss 7.5627   LearningRate 0.0312   Epoch: 8   Global Step: 366210   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:48,883-Speed 2612.05 samples/sec   Loss 7.6283   LearningRate 0.0312   Epoch: 8   Global Step: 366220   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:52,778-Speed 2629.23 samples/sec   Loss 7.5676   LearningRate 0.0312   Epoch: 8   Global Step: 366230   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:51:56,700-Speed 2611.76 samples/sec   Loss 7.5643   LearningRate 0.0312   Epoch: 8   Global Step: 366240   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:52:00,593-Speed 2631.21 samples/sec   Loss 7.5636   LearningRate 0.0312   Epoch: 8   Global Step: 366250   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:52:04,490-Speed 2628.94 samples/sec   Loss 7.5902   LearningRate 0.0312   Epoch: 8   Global Step: 366260   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:52:08,401-Speed 2618.35 samples/sec   Loss 7.4726   LearningRate 0.0312   Epoch: 8   Global Step: 366270   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:52:12,309-Speed 2620.98 samples/sec   Loss 7.4687   LearningRate 0.0312   Epoch: 8   Global Step: 366280   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:52:16,211-Speed 2624.87 samples/sec   Loss 7.4069   LearningRate 0.0312   Epoch: 8   Global Step: 366290   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 12:52:20,086-Speed 2643.56 samples/sec   Loss 7.3602   LearningRate 0.0312   Epoch: 8   Global Step: 366300   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:52:23,989-Speed 2624.73 samples/sec   Loss 7.4703   LearningRate 0.0312   Epoch: 8   Global Step: 366310   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:52:27,887-Speed 2627.46 samples/sec   Loss 7.5137   LearningRate 0.0312   Epoch: 8   Global Step: 366320   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:52:31,776-Speed 2633.37 samples/sec   Loss 7.6301   LearningRate 0.0312   Epoch: 8   Global Step: 366330   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:52:35,649-Speed 2644.51 samples/sec   Loss 7.4416   LearningRate 0.0312   Epoch: 8   Global Step: 366340   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:52:39,549-Speed 2626.96 samples/sec   Loss 7.5543   LearningRate 0.0312   Epoch: 8   Global Step: 366350   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:52:43,447-Speed 2627.73 samples/sec   Loss 7.5310   LearningRate 0.0312   Epoch: 8   Global Step: 366360   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:52:47,344-Speed 2628.69 samples/sec   Loss 7.6259   LearningRate 0.0312   Epoch: 8   Global Step: 366370   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:52:51,245-Speed 2624.83 samples/sec   Loss 7.5861   LearningRate 0.0312   Epoch: 8   Global Step: 366380   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:52:55,146-Speed 2626.30 samples/sec   Loss 7.5325   LearningRate 0.0312   Epoch: 8   Global Step: 366390   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:52:59,057-Speed 2618.65 samples/sec   Loss 7.6032   LearningRate 0.0312   Epoch: 8   Global Step: 366400   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:53:02,971-Speed 2616.91 samples/sec   Loss 7.5907   LearningRate 0.0312   Epoch: 8   Global Step: 366410   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:53:06,871-Speed 2626.07 samples/sec   Loss 7.4614   LearningRate 0.0312   Epoch: 8   Global Step: 366420   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:53:10,850-Speed 2574.78 samples/sec   Loss 7.6028   LearningRate 0.0312   Epoch: 8   Global Step: 366430   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:53:14,761-Speed 2618.87 samples/sec   Loss 7.6634   LearningRate 0.0312   Epoch: 8   Global Step: 366440   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:53:18,696-Speed 2602.75 samples/sec   Loss 7.5267   LearningRate 0.0312   Epoch: 8   Global Step: 366450   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:53:22,624-Speed 2607.86 samples/sec   Loss 7.5751   LearningRate 0.0312   Epoch: 8   Global Step: 366460   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:53:26,519-Speed 2629.30 samples/sec   Loss 7.5310   LearningRate 0.0312   Epoch: 8   Global Step: 366470   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:53:30,415-Speed 2629.04 samples/sec   Loss 7.4937   LearningRate 0.0312   Epoch: 8   Global Step: 366480   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 12:53:34,296-Speed 2639.32 samples/sec   Loss 7.4164   LearningRate 0.0312   Epoch: 8   Global Step: 366490   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:53:38,209-Speed 2617.04 samples/sec   Loss 7.5250   LearningRate 0.0312   Epoch: 8   Global Step: 366500   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 12:53:42,090-Speed 2639.07 samples/sec   Loss 7.4877   LearningRate 0.0312   Epoch: 8   Global Step: 366510   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:53:45,993-Speed 2624.90 samples/sec   Loss 7.4077   LearningRate 0.0312   Epoch: 8   Global Step: 366520   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:53:49,896-Speed 2624.42 samples/sec   Loss 7.6274   LearningRate 0.0312   Epoch: 8   Global Step: 366530   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:53:53,805-Speed 2620.21 samples/sec   Loss 7.5982   LearningRate 0.0312   Epoch: 8   Global Step: 366540   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:53:57,705-Speed 2626.07 samples/sec   Loss 7.5370   LearningRate 0.0312   Epoch: 8   Global Step: 366550   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 12:54:01,543-Speed 2668.79 samples/sec   Loss 8.3447   LearningRate 0.0312   Epoch: 8   Global Step: 366560   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:54:05,418-Speed 2642.82 samples/sec   Loss 8.9709   LearningRate 0.0312   Epoch: 8   Global Step: 366570   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:09,331-Speed 2617.83 samples/sec   Loss 7.9889   LearningRate 0.0311   Epoch: 8   Global Step: 366580   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:13,226-Speed 2629.12 samples/sec   Loss 7.6809   LearningRate 0.0311   Epoch: 8   Global Step: 366590   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:17,129-Speed 2624.88 samples/sec   Loss 7.5438   LearningRate 0.0311   Epoch: 8   Global Step: 366600   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:21,025-Speed 2629.15 samples/sec   Loss 7.5564   LearningRate 0.0311   Epoch: 8   Global Step: 366610   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:24,920-Speed 2629.60 samples/sec   Loss 7.4387   LearningRate 0.0311   Epoch: 8   Global Step: 366620   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:28,808-Speed 2634.34 samples/sec   Loss 7.4352   LearningRate 0.0311   Epoch: 8   Global Step: 366630   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:32,700-Speed 2631.61 samples/sec   Loss 7.4509   LearningRate 0.0311   Epoch: 8   Global Step: 366640   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:36,591-Speed 2632.12 samples/sec   Loss 7.4819   LearningRate 0.0311   Epoch: 8   Global Step: 366650   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:40,493-Speed 2624.60 samples/sec   Loss 7.4940   LearningRate 0.0311   Epoch: 8   Global Step: 366660   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:54:44,408-Speed 2616.44 samples/sec   Loss 7.5613   LearningRate 0.0311   Epoch: 8   Global Step: 366670   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:54:48,311-Speed 2624.66 samples/sec   Loss 7.5114   LearningRate 0.0311   Epoch: 8   Global Step: 366680   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:54:52,208-Speed 2628.82 samples/sec   Loss 7.5424   LearningRate 0.0311   Epoch: 8   Global Step: 366690   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:54:56,120-Speed 2618.06 samples/sec   Loss 7.5285   LearningRate 0.0311   Epoch: 8   Global Step: 366700   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:55:00,011-Speed 2632.80 samples/sec   Loss 7.5099   LearningRate 0.0311   Epoch: 8   Global Step: 366710   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:55:03,901-Speed 2632.82 samples/sec   Loss 7.5616   LearningRate 0.0311   Epoch: 8   Global Step: 366720   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:55:07,794-Speed 2631.48 samples/sec   Loss 7.4995   LearningRate 0.0311   Epoch: 8   Global Step: 366730   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:55:11,689-Speed 2629.69 samples/sec   Loss 7.3928   LearningRate 0.0311   Epoch: 8   Global Step: 366740   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:55:15,578-Speed 2633.64 samples/sec   Loss 7.4926   LearningRate 0.0311   Epoch: 8   Global Step: 366750   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:55:19,470-Speed 2631.52 samples/sec   Loss 7.4627   LearningRate 0.0311   Epoch: 8   Global Step: 366760   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:55:23,389-Speed 2614.15 samples/sec   Loss 7.5328   LearningRate 0.0311   Epoch: 8   Global Step: 366770   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:27,279-Speed 2633.05 samples/sec   Loss 7.5276   LearningRate 0.0311   Epoch: 8   Global Step: 366780   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:31,171-Speed 2631.85 samples/sec   Loss 7.5427   LearningRate 0.0311   Epoch: 8   Global Step: 366790   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:35,067-Speed 2628.39 samples/sec   Loss 7.4517   LearningRate 0.0311   Epoch: 8   Global Step: 366800   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:38,964-Speed 2628.58 samples/sec   Loss 7.5654   LearningRate 0.0311   Epoch: 8   Global Step: 366810   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:42,868-Speed 2623.99 samples/sec   Loss 7.5028   LearningRate 0.0311   Epoch: 8   Global Step: 366820   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:46,768-Speed 2625.56 samples/sec   Loss 7.4903   LearningRate 0.0311   Epoch: 8   Global Step: 366830   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:50,658-Speed 2633.28 samples/sec   Loss 7.5534   LearningRate 0.0311   Epoch: 8   Global Step: 366840   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:54,549-Speed 2632.05 samples/sec   Loss 7.5862   LearningRate 0.0311   Epoch: 8   Global Step: 366850   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:55:58,443-Speed 2630.90 samples/sec   Loss 7.4745   LearningRate 0.0311   Epoch: 8   Global Step: 366860   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:56:02,339-Speed 2628.91 samples/sec   Loss 7.4769   LearningRate 0.0311   Epoch: 8   Global Step: 366870   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:06,237-Speed 2627.87 samples/sec   Loss 7.5669   LearningRate 0.0311   Epoch: 8   Global Step: 366880   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:10,132-Speed 2629.11 samples/sec   Loss 7.5356   LearningRate 0.0311   Epoch: 8   Global Step: 366890   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:14,027-Speed 2630.20 samples/sec   Loss 7.5211   LearningRate 0.0311   Epoch: 8   Global Step: 366900   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:17,923-Speed 2628.85 samples/sec   Loss 7.4852   LearningRate 0.0311   Epoch: 8   Global Step: 366910   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:21,817-Speed 2629.95 samples/sec   Loss 7.6408   LearningRate 0.0311   Epoch: 8   Global Step: 366920   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:25,710-Speed 2631.18 samples/sec   Loss 7.4446   LearningRate 0.0311   Epoch: 8   Global Step: 366930   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:29,602-Speed 2632.18 samples/sec   Loss 7.5699   LearningRate 0.0311   Epoch: 8   Global Step: 366940   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:33,495-Speed 2631.09 samples/sec   Loss 7.5138   LearningRate 0.0311   Epoch: 8   Global Step: 366950   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:37,388-Speed 2630.53 samples/sec   Loss 7.6881   LearningRate 0.0311   Epoch: 8   Global Step: 366960   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:56:41,283-Speed 2629.58 samples/sec   Loss 7.4535   LearningRate 0.0311   Epoch: 8   Global Step: 366970   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:56:45,176-Speed 2630.68 samples/sec   Loss 7.4890   LearningRate 0.0311   Epoch: 8   Global Step: 366980   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:56:49,084-Speed 2621.61 samples/sec   Loss 7.4983   LearningRate 0.0311   Epoch: 8   Global Step: 366990   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:56:52,984-Speed 2626.17 samples/sec   Loss 7.5535   LearningRate 0.0311   Epoch: 8   Global Step: 367000   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:56:56,927-Speed 2597.82 samples/sec   Loss 7.4768   LearningRate 0.0311   Epoch: 8   Global Step: 367010   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:57:00,819-Speed 2631.67 samples/sec   Loss 7.4226   LearningRate 0.0311   Epoch: 8   Global Step: 367020   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:57:04,710-Speed 2632.66 samples/sec   Loss 7.4912   LearningRate 0.0311   Epoch: 8   Global Step: 367030   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:57:08,617-Speed 2621.33 samples/sec   Loss 7.4567   LearningRate 0.0311   Epoch: 8   Global Step: 367040   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:57:12,522-Speed 2623.10 samples/sec   Loss 7.4972   LearningRate 0.0311   Epoch: 8   Global Step: 367050   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:57:16,362-Speed 2667.50 samples/sec   Loss 8.2466   LearningRate 0.0311   Epoch: 8   Global Step: 367060   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:20,259-Speed 2628.29 samples/sec   Loss 8.3295   LearningRate 0.0311   Epoch: 8   Global Step: 367070   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:24,166-Speed 2621.15 samples/sec   Loss 8.0505   LearningRate 0.0311   Epoch: 8   Global Step: 367080   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:28,074-Speed 2621.73 samples/sec   Loss 7.7718   LearningRate 0.0311   Epoch: 8   Global Step: 367090   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:31,970-Speed 2628.43 samples/sec   Loss 7.8621   LearningRate 0.0311   Epoch: 8   Global Step: 367100   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:35,896-Speed 2609.52 samples/sec   Loss 7.7887   LearningRate 0.0311   Epoch: 8   Global Step: 367110   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:39,792-Speed 2629.04 samples/sec   Loss 7.6869   LearningRate 0.0311   Epoch: 8   Global Step: 367120   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:43,683-Speed 2632.03 samples/sec   Loss 7.6704   LearningRate 0.0311   Epoch: 8   Global Step: 367130   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:47,584-Speed 2625.93 samples/sec   Loss 7.6511   LearningRate 0.0311   Epoch: 8   Global Step: 367140   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:51,481-Speed 2628.55 samples/sec   Loss 7.7229   LearningRate 0.0311   Epoch: 8   Global Step: 367150   Fp16 Grad Scale: 1024   Required: 52 hours
Training: 2022-04-14 12:57:55,377-Speed 2629.17 samples/sec   Loss 7.6127   LearningRate 0.0311   Epoch: 8   Global Step: 367160   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:57:59,265-Speed 2633.73 samples/sec   Loss 7.4656   LearningRate 0.0311   Epoch: 8   Global Step: 367170   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:03,166-Speed 2626.59 samples/sec   Loss 7.6100   LearningRate 0.0311   Epoch: 8   Global Step: 367180   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:07,064-Speed 2627.25 samples/sec   Loss 7.7853   LearningRate 0.0311   Epoch: 8   Global Step: 367190   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:10,953-Speed 2633.52 samples/sec   Loss 7.5293   LearningRate 0.0311   Epoch: 8   Global Step: 367200   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:14,849-Speed 2628.69 samples/sec   Loss 7.6408   LearningRate 0.0311   Epoch: 8   Global Step: 367210   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:18,736-Speed 2635.62 samples/sec   Loss 7.5326   LearningRate 0.0311   Epoch: 8   Global Step: 367220   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:22,629-Speed 2631.28 samples/sec   Loss 7.4627   LearningRate 0.0311   Epoch: 8   Global Step: 367230   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:26,516-Speed 2634.80 samples/sec   Loss 7.6567   LearningRate 0.0311   Epoch: 8   Global Step: 367240   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:30,398-Speed 2639.07 samples/sec   Loss 7.4865   LearningRate 0.0311   Epoch: 8   Global Step: 367250   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 12:58:34,289-Speed 2632.25 samples/sec   Loss 7.5767   LearningRate 0.0311   Epoch: 8   Global Step: 367260   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:58:38,178-Speed 2633.55 samples/sec   Loss 7.5907   LearningRate 0.0311   Epoch: 8   Global Step: 367270   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:58:42,072-Speed 2630.82 samples/sec   Loss 7.4787   LearningRate 0.0311   Epoch: 8   Global Step: 367280   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:58:45,960-Speed 2634.28 samples/sec   Loss 7.4058   LearningRate 0.0311   Epoch: 8   Global Step: 367290   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:58:49,858-Speed 2627.19 samples/sec   Loss 7.6020   LearningRate 0.0311   Epoch: 8   Global Step: 367300   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:58:53,749-Speed 2632.99 samples/sec   Loss 7.5255   LearningRate 0.0311   Epoch: 8   Global Step: 367310   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:58:57,648-Speed 2626.42 samples/sec   Loss 7.4865   LearningRate 0.0310   Epoch: 8   Global Step: 367320   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:59:01,544-Speed 2629.18 samples/sec   Loss 7.5938   LearningRate 0.0310   Epoch: 8   Global Step: 367330   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:59:05,438-Speed 2630.14 samples/sec   Loss 7.5673   LearningRate 0.0310   Epoch: 8   Global Step: 367340   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:59:09,335-Speed 2628.53 samples/sec   Loss 7.5338   LearningRate 0.0310   Epoch: 8   Global Step: 367350   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 12:59:13,232-Speed 2628.45 samples/sec   Loss 7.7589   LearningRate 0.0310   Epoch: 8   Global Step: 367360   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:17,124-Speed 2632.02 samples/sec   Loss 7.5568   LearningRate 0.0310   Epoch: 8   Global Step: 367370   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:21,043-Speed 2614.04 samples/sec   Loss 7.6941   LearningRate 0.0310   Epoch: 8   Global Step: 367380   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:25,016-Speed 2577.68 samples/sec   Loss 7.7162   LearningRate 0.0310   Epoch: 8   Global Step: 367390   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:28,923-Speed 2621.42 samples/sec   Loss 7.4987   LearningRate 0.0310   Epoch: 8   Global Step: 367400   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:32,819-Speed 2629.33 samples/sec   Loss 7.4960   LearningRate 0.0310   Epoch: 8   Global Step: 367410   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:36,709-Speed 2633.32 samples/sec   Loss 7.4596   LearningRate 0.0310   Epoch: 8   Global Step: 367420   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:40,615-Speed 2622.09 samples/sec   Loss 7.4063   LearningRate 0.0310   Epoch: 8   Global Step: 367430   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:44,525-Speed 2619.49 samples/sec   Loss 7.4845   LearningRate 0.0310   Epoch: 8   Global Step: 367440   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:48,417-Speed 2631.47 samples/sec   Loss 7.4082   LearningRate 0.0310   Epoch: 8   Global Step: 367450   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 12:59:52,309-Speed 2632.15 samples/sec   Loss 7.6149   LearningRate 0.0310   Epoch: 8   Global Step: 367460   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 12:59:56,207-Speed 2627.67 samples/sec   Loss 7.4980   LearningRate 0.0310   Epoch: 8   Global Step: 367470   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:00,105-Speed 2627.72 samples/sec   Loss 7.4439   LearningRate 0.0310   Epoch: 8   Global Step: 367480   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:04,006-Speed 2625.61 samples/sec   Loss 7.4747   LearningRate 0.0310   Epoch: 8   Global Step: 367490   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:07,904-Speed 2627.62 samples/sec   Loss 7.5431   LearningRate 0.0310   Epoch: 8   Global Step: 367500   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:11,882-Speed 2574.38 samples/sec   Loss 7.5560   LearningRate 0.0310   Epoch: 8   Global Step: 367510   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:15,792-Speed 2620.40 samples/sec   Loss 7.3591   LearningRate 0.0310   Epoch: 8   Global Step: 367520   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:19,689-Speed 2628.22 samples/sec   Loss 7.4989   LearningRate 0.0310   Epoch: 8   Global Step: 367530   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:23,582-Speed 2631.49 samples/sec   Loss 7.4258   LearningRate 0.0310   Epoch: 8   Global Step: 367540   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:27,502-Speed 2612.55 samples/sec   Loss 7.4614   LearningRate 0.0310   Epoch: 8   Global Step: 367550   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:00:31,389-Speed 2635.39 samples/sec   Loss 7.5419   LearningRate 0.0310   Epoch: 8   Global Step: 367560   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:00:35,290-Speed 2625.24 samples/sec   Loss 7.5342   LearningRate 0.0310   Epoch: 8   Global Step: 367570   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:00:39,188-Speed 2628.22 samples/sec   Loss 7.5647   LearningRate 0.0310   Epoch: 8   Global Step: 367580   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:00:43,082-Speed 2630.27 samples/sec   Loss 7.4449   LearningRate 0.0310   Epoch: 8   Global Step: 367590   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:00:46,996-Speed 2616.87 samples/sec   Loss 7.5841   LearningRate 0.0310   Epoch: 8   Global Step: 367600   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:00:50,889-Speed 2631.88 samples/sec   Loss 7.5635   LearningRate 0.0310   Epoch: 8   Global Step: 367610   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:00:54,781-Speed 2631.21 samples/sec   Loss 7.5259   LearningRate 0.0310   Epoch: 8   Global Step: 367620   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:00:58,674-Speed 2631.37 samples/sec   Loss 7.5324   LearningRate 0.0310   Epoch: 8   Global Step: 367630   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:01:02,568-Speed 2630.08 samples/sec   Loss 7.5306   LearningRate 0.0310   Epoch: 8   Global Step: 367640   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:01:06,473-Speed 2623.33 samples/sec   Loss 7.6484   LearningRate 0.0310   Epoch: 8   Global Step: 367650   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:01:10,358-Speed 2635.80 samples/sec   Loss 7.9285   LearningRate 0.0310   Epoch: 8   Global Step: 367660   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:14,258-Speed 2626.62 samples/sec   Loss 7.5972   LearningRate 0.0310   Epoch: 8   Global Step: 367670   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:18,169-Speed 2619.06 samples/sec   Loss 7.5773   LearningRate 0.0310   Epoch: 8   Global Step: 367680   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:22,107-Speed 2601.37 samples/sec   Loss 7.4809   LearningRate 0.0310   Epoch: 8   Global Step: 367690   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:26,003-Speed 2628.61 samples/sec   Loss 7.4608   LearningRate 0.0310   Epoch: 8   Global Step: 367700   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:29,904-Speed 2626.36 samples/sec   Loss 7.4396   LearningRate 0.0310   Epoch: 8   Global Step: 367710   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:33,822-Speed 2613.79 samples/sec   Loss 7.4983   LearningRate 0.0310   Epoch: 8   Global Step: 367720   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:37,711-Speed 2633.81 samples/sec   Loss 7.5616   LearningRate 0.0310   Epoch: 8   Global Step: 367730   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:41,601-Speed 2633.28 samples/sec   Loss 7.5731   LearningRate 0.0310   Epoch: 8   Global Step: 367740   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:45,500-Speed 2626.77 samples/sec   Loss 7.5330   LearningRate 0.0310   Epoch: 8   Global Step: 367750   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:01:49,368-Speed 2648.09 samples/sec   Loss 7.6023   LearningRate 0.0310   Epoch: 8   Global Step: 367760   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:01:53,245-Speed 2642.24 samples/sec   Loss 8.5982   LearningRate 0.0310   Epoch: 8   Global Step: 367770   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:01:57,149-Speed 2624.15 samples/sec   Loss 8.1095   LearningRate 0.0310   Epoch: 8   Global Step: 367780   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:01,042-Speed 2631.28 samples/sec   Loss 7.6581   LearningRate 0.0310   Epoch: 8   Global Step: 367790   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:04,947-Speed 2622.41 samples/sec   Loss 7.5043   LearningRate 0.0310   Epoch: 8   Global Step: 367800   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:08,838-Speed 2632.30 samples/sec   Loss 7.4792   LearningRate 0.0310   Epoch: 8   Global Step: 367810   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:12,729-Speed 2632.79 samples/sec   Loss 7.4754   LearningRate 0.0310   Epoch: 8   Global Step: 367820   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:16,618-Speed 2633.59 samples/sec   Loss 7.6033   LearningRate 0.0310   Epoch: 8   Global Step: 367830   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:20,518-Speed 2626.74 samples/sec   Loss 7.7118   LearningRate 0.0310   Epoch: 8   Global Step: 367840   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:24,409-Speed 2631.79 samples/sec   Loss 7.6120   LearningRate 0.0310   Epoch: 8   Global Step: 367850   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:28,307-Speed 2627.61 samples/sec   Loss 7.6279   LearningRate 0.0310   Epoch: 8   Global Step: 367860   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:02:32,195-Speed 2635.12 samples/sec   Loss 7.5923   LearningRate 0.0310   Epoch: 8   Global Step: 367870   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:02:36,089-Speed 2630.59 samples/sec   Loss 7.5029   LearningRate 0.0310   Epoch: 8   Global Step: 367880   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:02:39,994-Speed 2622.85 samples/sec   Loss 7.7170   LearningRate 0.0310   Epoch: 8   Global Step: 367890   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:02:43,887-Speed 2630.39 samples/sec   Loss 7.4923   LearningRate 0.0310   Epoch: 8   Global Step: 367900   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:02:47,782-Speed 2629.80 samples/sec   Loss 7.6296   LearningRate 0.0310   Epoch: 8   Global Step: 367910   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:02:51,677-Speed 2629.72 samples/sec   Loss 7.4989   LearningRate 0.0310   Epoch: 8   Global Step: 367920   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:02:55,579-Speed 2625.36 samples/sec   Loss 7.6107   LearningRate 0.0310   Epoch: 8   Global Step: 367930   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:02:59,476-Speed 2628.43 samples/sec   Loss 7.4838   LearningRate 0.0310   Epoch: 8   Global Step: 367940   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:03:03,391-Speed 2616.17 samples/sec   Loss 7.5324   LearningRate 0.0310   Epoch: 8   Global Step: 367950   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:03:07,286-Speed 2629.90 samples/sec   Loss 7.5070   LearningRate 0.0310   Epoch: 8   Global Step: 367960   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:03:11,180-Speed 2630.08 samples/sec   Loss 7.5061   LearningRate 0.0310   Epoch: 8   Global Step: 367970   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:15,076-Speed 2628.84 samples/sec   Loss 7.6798   LearningRate 0.0310   Epoch: 8   Global Step: 367980   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:18,972-Speed 2629.56 samples/sec   Loss 7.5514   LearningRate 0.0310   Epoch: 8   Global Step: 367990   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:22,870-Speed 2627.35 samples/sec   Loss 7.4959   LearningRate 0.0310   Epoch: 8   Global Step: 368000   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:26,765-Speed 2630.32 samples/sec   Loss 7.6618   LearningRate 0.0310   Epoch: 8   Global Step: 368010   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:30,657-Speed 2631.44 samples/sec   Loss 7.4375   LearningRate 0.0310   Epoch: 8   Global Step: 368020   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:34,553-Speed 2629.61 samples/sec   Loss 7.5667   LearningRate 0.0310   Epoch: 8   Global Step: 368030   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:38,452-Speed 2627.19 samples/sec   Loss 7.4874   LearningRate 0.0310   Epoch: 8   Global Step: 368040   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:42,343-Speed 2632.05 samples/sec   Loss 7.5159   LearningRate 0.0310   Epoch: 8   Global Step: 368050   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:46,242-Speed 2626.75 samples/sec   Loss 7.4115   LearningRate 0.0310   Epoch: 8   Global Step: 368060   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:03:50,131-Speed 2633.95 samples/sec   Loss 7.5145   LearningRate 0.0309   Epoch: 8   Global Step: 368070   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:03:54,024-Speed 2630.47 samples/sec   Loss 7.7014   LearningRate 0.0309   Epoch: 8   Global Step: 368080   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:03:57,919-Speed 2630.38 samples/sec   Loss 7.3939   LearningRate 0.0309   Epoch: 8   Global Step: 368090   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:01,818-Speed 2626.32 samples/sec   Loss 7.4720   LearningRate 0.0309   Epoch: 8   Global Step: 368100   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:05,711-Speed 2631.54 samples/sec   Loss 7.4895   LearningRate 0.0309   Epoch: 8   Global Step: 368110   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:09,601-Speed 2632.75 samples/sec   Loss 7.5428   LearningRate 0.0309   Epoch: 8   Global Step: 368120   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:13,488-Speed 2634.68 samples/sec   Loss 7.5621   LearningRate 0.0309   Epoch: 8   Global Step: 368130   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:17,380-Speed 2631.46 samples/sec   Loss 7.6093   LearningRate 0.0309   Epoch: 8   Global Step: 368140   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:21,278-Speed 2628.35 samples/sec   Loss 7.4868   LearningRate 0.0309   Epoch: 8   Global Step: 368150   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:25,171-Speed 2630.44 samples/sec   Loss 7.4799   LearningRate 0.0309   Epoch: 8   Global Step: 368160   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:04:29,066-Speed 2629.74 samples/sec   Loss 7.4501   LearningRate 0.0309   Epoch: 8   Global Step: 368170   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:04:32,963-Speed 2628.22 samples/sec   Loss 7.4921   LearningRate 0.0309   Epoch: 8   Global Step: 368180   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:04:36,857-Speed 2630.98 samples/sec   Loss 7.4569   LearningRate 0.0309   Epoch: 8   Global Step: 368190   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:04:40,750-Speed 2630.99 samples/sec   Loss 7.5837   LearningRate 0.0309   Epoch: 8   Global Step: 368200   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:04:44,647-Speed 2628.31 samples/sec   Loss 7.4709   LearningRate 0.0309   Epoch: 8   Global Step: 368210   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:04:48,560-Speed 2617.30 samples/sec   Loss 7.5578   LearningRate 0.0309   Epoch: 8   Global Step: 368220   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:04:52,454-Speed 2630.31 samples/sec   Loss 7.5557   LearningRate 0.0309   Epoch: 8   Global Step: 368230   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:04:56,346-Speed 2632.56 samples/sec   Loss 7.5333   LearningRate 0.0309   Epoch: 8   Global Step: 368240   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:05:00,253-Speed 2620.99 samples/sec   Loss 7.4028   LearningRate 0.0309   Epoch: 8   Global Step: 368250   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:05:04,154-Speed 2626.00 samples/sec   Loss 7.4711   LearningRate 0.0309   Epoch: 8   Global Step: 368260   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:05:08,047-Speed 2630.52 samples/sec   Loss 7.5606   LearningRate 0.0309   Epoch: 8   Global Step: 368270   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:11,946-Speed 2626.95 samples/sec   Loss 7.5124   LearningRate 0.0309   Epoch: 8   Global Step: 368280   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:15,866-Speed 2613.45 samples/sec   Loss 7.5112   LearningRate 0.0309   Epoch: 8   Global Step: 368290   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:19,766-Speed 2626.56 samples/sec   Loss 7.4575   LearningRate 0.0309   Epoch: 8   Global Step: 368300   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:23,665-Speed 2627.00 samples/sec   Loss 7.4769   LearningRate 0.0309   Epoch: 8   Global Step: 368310   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:27,563-Speed 2627.71 samples/sec   Loss 7.5961   LearningRate 0.0309   Epoch: 8   Global Step: 368320   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:31,464-Speed 2626.62 samples/sec   Loss 7.2831   LearningRate 0.0309   Epoch: 8   Global Step: 368330   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:35,365-Speed 2625.44 samples/sec   Loss 7.5698   LearningRate 0.0309   Epoch: 8   Global Step: 368340   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:39,340-Speed 2576.24 samples/sec   Loss 7.5418   LearningRate 0.0309   Epoch: 8   Global Step: 368350   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:43,248-Speed 2621.33 samples/sec   Loss 7.5690   LearningRate 0.0309   Epoch: 8   Global Step: 368360   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:47,132-Speed 2637.54 samples/sec   Loss 7.5710   LearningRate 0.0309   Epoch: 8   Global Step: 368370   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:51,036-Speed 2626.61 samples/sec   Loss 7.4817   LearningRate 0.0309   Epoch: 8   Global Step: 368380   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:54,933-Speed 2627.75 samples/sec   Loss 7.6603   LearningRate 0.0309   Epoch: 8   Global Step: 368390   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:05:58,840-Speed 2621.96 samples/sec   Loss 7.4644   LearningRate 0.0309   Epoch: 8   Global Step: 368400   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:06:02,737-Speed 2628.74 samples/sec   Loss 7.4250   LearningRate 0.0309   Epoch: 8   Global Step: 368410   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:06:06,628-Speed 2632.03 samples/sec   Loss 7.4924   LearningRate 0.0309   Epoch: 8   Global Step: 368420   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:06:10,545-Speed 2614.62 samples/sec   Loss 7.4366   LearningRate 0.0309   Epoch: 8   Global Step: 368430   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:06:14,481-Speed 2602.97 samples/sec   Loss 7.5623   LearningRate 0.0309   Epoch: 8   Global Step: 368440   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:06:18,383-Speed 2624.65 samples/sec   Loss 7.3994   LearningRate 0.0309   Epoch: 8   Global Step: 368450   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:06:22,303-Speed 2613.24 samples/sec   Loss 7.5393   LearningRate 0.0309   Epoch: 8   Global Step: 368460   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:06:26,184-Speed 2639.47 samples/sec   Loss 7.5289   LearningRate 0.0309   Epoch: 8   Global Step: 368470   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:06:30,070-Speed 2636.01 samples/sec   Loss 7.4058   LearningRate 0.0309   Epoch: 8   Global Step: 368480   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:06:33,962-Speed 2631.33 samples/sec   Loss 7.5210   LearningRate 0.0309   Epoch: 8   Global Step: 368490   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:06:37,856-Speed 2630.80 samples/sec   Loss 7.4682   LearningRate 0.0309   Epoch: 8   Global Step: 368500   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:06:41,749-Speed 2630.28 samples/sec   Loss 7.4427   LearningRate 0.0309   Epoch: 8   Global Step: 368510   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:06:45,647-Speed 2636.14 samples/sec   Loss 7.5173   LearningRate 0.0309   Epoch: 8   Global Step: 368520   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:06:49,511-Speed 2650.75 samples/sec   Loss 7.5293   LearningRate 0.0309   Epoch: 8   Global Step: 368530   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:06:53,430-Speed 2614.14 samples/sec   Loss 7.5477   LearningRate 0.0309   Epoch: 8   Global Step: 368540   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:06:57,326-Speed 2629.05 samples/sec   Loss 7.6042   LearningRate 0.0309   Epoch: 8   Global Step: 368550   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:01,215-Speed 2633.98 samples/sec   Loss 7.5575   LearningRate 0.0309   Epoch: 8   Global Step: 368560   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:05,101-Speed 2635.35 samples/sec   Loss 7.5740   LearningRate 0.0309   Epoch: 8   Global Step: 368570   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:08,996-Speed 2629.97 samples/sec   Loss 7.5462   LearningRate 0.0309   Epoch: 8   Global Step: 368580   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:12,889-Speed 2630.54 samples/sec   Loss 7.5148   LearningRate 0.0309   Epoch: 8   Global Step: 368590   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:16,802-Speed 2618.03 samples/sec   Loss 7.4317   LearningRate 0.0309   Epoch: 8   Global Step: 368600   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:20,702-Speed 2626.31 samples/sec   Loss 7.4710   LearningRate 0.0309   Epoch: 8   Global Step: 368610   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:24,603-Speed 2625.71 samples/sec   Loss 7.5206   LearningRate 0.0309   Epoch: 8   Global Step: 368620   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:07:28,514-Speed 2619.08 samples/sec   Loss 7.5201   LearningRate 0.0309   Epoch: 8   Global Step: 368630   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:07:32,419-Speed 2623.18 samples/sec   Loss 7.5130   LearningRate 0.0309   Epoch: 8   Global Step: 368640   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:07:36,322-Speed 2624.26 samples/sec   Loss 7.4574   LearningRate 0.0309   Epoch: 8   Global Step: 368650   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:07:40,227-Speed 2623.12 samples/sec   Loss 7.5600   LearningRate 0.0309   Epoch: 8   Global Step: 368660   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:07:44,160-Speed 2603.99 samples/sec   Loss 7.5369   LearningRate 0.0309   Epoch: 8   Global Step: 368670   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:07:48,065-Speed 2622.75 samples/sec   Loss 7.3795   LearningRate 0.0309   Epoch: 8   Global Step: 368680   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:07:51,942-Speed 2641.51 samples/sec   Loss 8.0174   LearningRate 0.0309   Epoch: 8   Global Step: 368690   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:07:55,831-Speed 2634.32 samples/sec   Loss 8.1745   LearningRate 0.0309   Epoch: 8   Global Step: 368700   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:07:59,780-Speed 2594.06 samples/sec   Loss 7.8591   LearningRate 0.0309   Epoch: 8   Global Step: 368710   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:03,676-Speed 2628.51 samples/sec   Loss 7.7702   LearningRate 0.0309   Epoch: 8   Global Step: 368720   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:07,589-Speed 2617.04 samples/sec   Loss 7.6073   LearningRate 0.0309   Epoch: 8   Global Step: 368730   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:11,478-Speed 2634.07 samples/sec   Loss 7.4977   LearningRate 0.0309   Epoch: 8   Global Step: 368740   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:15,368-Speed 2632.93 samples/sec   Loss 7.6754   LearningRate 0.0309   Epoch: 8   Global Step: 368750   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:19,272-Speed 2623.36 samples/sec   Loss 7.4313   LearningRate 0.0309   Epoch: 8   Global Step: 368760   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:23,165-Speed 2630.96 samples/sec   Loss 7.4922   LearningRate 0.0309   Epoch: 8   Global Step: 368770   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:27,056-Speed 2632.83 samples/sec   Loss 7.5403   LearningRate 0.0309   Epoch: 8   Global Step: 368780   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:08:30,949-Speed 2631.04 samples/sec   Loss 7.6607   LearningRate 0.0309   Epoch: 8   Global Step: 368790   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:08:34,843-Speed 2630.60 samples/sec   Loss 7.5593   LearningRate 0.0309   Epoch: 8   Global Step: 368800   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:08:38,732-Speed 2633.20 samples/sec   Loss 7.3803   LearningRate 0.0308   Epoch: 8   Global Step: 368810   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:08:42,629-Speed 2628.87 samples/sec   Loss 7.2850   LearningRate 0.0308   Epoch: 8   Global Step: 368820   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:08:46,518-Speed 2632.84 samples/sec   Loss 7.5181   LearningRate 0.0308   Epoch: 8   Global Step: 368830   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:08:50,415-Speed 2628.88 samples/sec   Loss 7.5053   LearningRate 0.0308   Epoch: 8   Global Step: 368840   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:08:54,314-Speed 2627.38 samples/sec   Loss 7.4500   LearningRate 0.0308   Epoch: 8   Global Step: 368850   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:08:58,237-Speed 2611.52 samples/sec   Loss 7.5763   LearningRate 0.0308   Epoch: 8   Global Step: 368860   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:02,128-Speed 2631.77 samples/sec   Loss 7.5542   LearningRate 0.0308   Epoch: 8   Global Step: 368870   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:06,019-Speed 2633.10 samples/sec   Loss 7.4746   LearningRate 0.0308   Epoch: 8   Global Step: 368880   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:09,925-Speed 2622.49 samples/sec   Loss 7.4228   LearningRate 0.0308   Epoch: 8   Global Step: 368890   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:09:13,819-Speed 2630.13 samples/sec   Loss 7.4538   LearningRate 0.0308   Epoch: 8   Global Step: 368900   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:09:18,667-Speed 2112.39 samples/sec   Loss 7.5019   LearningRate 0.0308   Epoch: 8   Global Step: 368910   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:09:22,547-Speed 2640.26 samples/sec   Loss 7.4820   LearningRate 0.0308   Epoch: 8   Global Step: 368920   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:09:26,437-Speed 2633.49 samples/sec   Loss 7.6976   LearningRate 0.0308   Epoch: 8   Global Step: 368930   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:30,341-Speed 2623.36 samples/sec   Loss 7.6497   LearningRate 0.0308   Epoch: 8   Global Step: 368940   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:34,229-Speed 2634.61 samples/sec   Loss 7.4098   LearningRate 0.0308   Epoch: 8   Global Step: 368950   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:38,124-Speed 2629.24 samples/sec   Loss 7.5462   LearningRate 0.0308   Epoch: 8   Global Step: 368960   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:42,017-Speed 2630.95 samples/sec   Loss 7.5560   LearningRate 0.0308   Epoch: 8   Global Step: 368970   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:45,910-Speed 2630.81 samples/sec   Loss 7.3710   LearningRate 0.0308   Epoch: 8   Global Step: 368980   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:49,804-Speed 2631.25 samples/sec   Loss 7.5579   LearningRate 0.0308   Epoch: 8   Global Step: 368990   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:53,698-Speed 2630.22 samples/sec   Loss 7.4275   LearningRate 0.0308   Epoch: 8   Global Step: 369000   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:09:57,611-Speed 2617.69 samples/sec   Loss 7.4872   LearningRate 0.0308   Epoch: 8   Global Step: 369010   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:10:01,507-Speed 2628.91 samples/sec   Loss 7.4855   LearningRate 0.0308   Epoch: 8   Global Step: 369020   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:10:05,399-Speed 2631.86 samples/sec   Loss 7.4754   LearningRate 0.0308   Epoch: 8   Global Step: 369030   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:09,293-Speed 2630.47 samples/sec   Loss 7.4508   LearningRate 0.0308   Epoch: 8   Global Step: 369040   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:13,190-Speed 2628.18 samples/sec   Loss 7.6703   LearningRate 0.0308   Epoch: 8   Global Step: 369050   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:17,093-Speed 2623.73 samples/sec   Loss 7.4763   LearningRate 0.0308   Epoch: 8   Global Step: 369060   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:20,982-Speed 2634.39 samples/sec   Loss 7.4006   LearningRate 0.0308   Epoch: 8   Global Step: 369070   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:24,872-Speed 2633.11 samples/sec   Loss 7.4177   LearningRate 0.0308   Epoch: 8   Global Step: 369080   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:28,765-Speed 2630.83 samples/sec   Loss 7.6446   LearningRate 0.0308   Epoch: 8   Global Step: 369090   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:32,660-Speed 2629.63 samples/sec   Loss 7.4700   LearningRate 0.0308   Epoch: 8   Global Step: 369100   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:36,580-Speed 2613.72 samples/sec   Loss 7.4919   LearningRate 0.0308   Epoch: 8   Global Step: 369110   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:40,476-Speed 2628.20 samples/sec   Loss 7.5110   LearningRate 0.0308   Epoch: 8   Global Step: 369120   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:10:44,383-Speed 2622.24 samples/sec   Loss 7.5045   LearningRate 0.0308   Epoch: 8   Global Step: 369130   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:10:48,304-Speed 2612.36 samples/sec   Loss 7.6227   LearningRate 0.0308   Epoch: 8   Global Step: 369140   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:10:52,202-Speed 2627.91 samples/sec   Loss 7.5487   LearningRate 0.0308   Epoch: 8   Global Step: 369150   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:10:56,090-Speed 2634.82 samples/sec   Loss 7.4636   LearningRate 0.0308   Epoch: 8   Global Step: 369160   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:10:59,970-Speed 2639.74 samples/sec   Loss 8.1882   LearningRate 0.0308   Epoch: 8   Global Step: 369170   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:03,860-Speed 2632.55 samples/sec   Loss 7.7570   LearningRate 0.0308   Epoch: 8   Global Step: 369180   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:07,775-Speed 2616.38 samples/sec   Loss 7.8150   LearningRate 0.0308   Epoch: 8   Global Step: 369190   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:11,747-Speed 2578.75 samples/sec   Loss 7.6765   LearningRate 0.0308   Epoch: 8   Global Step: 369200   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:15,650-Speed 2624.53 samples/sec   Loss 7.7205   LearningRate 0.0308   Epoch: 8   Global Step: 369210   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:19,589-Speed 2600.18 samples/sec   Loss 7.5711   LearningRate 0.0308   Epoch: 8   Global Step: 369220   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:23,483-Speed 2630.52 samples/sec   Loss 7.5279   LearningRate 0.0308   Epoch: 8   Global Step: 369230   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:27,382-Speed 2627.15 samples/sec   Loss 7.5712   LearningRate 0.0308   Epoch: 8   Global Step: 369240   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:31,276-Speed 2630.63 samples/sec   Loss 7.6528   LearningRate 0.0308   Epoch: 8   Global Step: 369250   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:35,184-Speed 2620.65 samples/sec   Loss 7.5730   LearningRate 0.0308   Epoch: 8   Global Step: 369260   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:11:39,078-Speed 2630.38 samples/sec   Loss 7.5532   LearningRate 0.0308   Epoch: 8   Global Step: 369270   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:11:42,972-Speed 2629.91 samples/sec   Loss 7.4544   LearningRate 0.0308   Epoch: 8   Global Step: 369280   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:11:46,865-Speed 2631.50 samples/sec   Loss 7.5696   LearningRate 0.0308   Epoch: 8   Global Step: 369290   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:11:50,763-Speed 2627.43 samples/sec   Loss 7.3837   LearningRate 0.0308   Epoch: 8   Global Step: 369300   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:11:54,650-Speed 2636.02 samples/sec   Loss 7.4825   LearningRate 0.0308   Epoch: 8   Global Step: 369310   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:11:58,560-Speed 2619.45 samples/sec   Loss 7.4829   LearningRate 0.0308   Epoch: 8   Global Step: 369320   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:12:02,459-Speed 2627.13 samples/sec   Loss 7.6059   LearningRate 0.0308   Epoch: 8   Global Step: 369330   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:12:06,373-Speed 2616.91 samples/sec   Loss 7.7304   LearningRate 0.0308   Epoch: 8   Global Step: 369340   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:12:10,260-Speed 2635.19 samples/sec   Loss 7.5978   LearningRate 0.0308   Epoch: 8   Global Step: 369350   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:12:14,162-Speed 2624.58 samples/sec   Loss 7.4186   LearningRate 0.0308   Epoch: 8   Global Step: 369360   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:12:18,058-Speed 2628.98 samples/sec   Loss 7.4647   LearningRate 0.0308   Epoch: 8   Global Step: 369370   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:21,988-Speed 2607.02 samples/sec   Loss 7.4426   LearningRate 0.0308   Epoch: 8   Global Step: 369380   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:25,914-Speed 2608.81 samples/sec   Loss 7.4901   LearningRate 0.0308   Epoch: 8   Global Step: 369390   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:29,805-Speed 2632.85 samples/sec   Loss 7.4779   LearningRate 0.0308   Epoch: 8   Global Step: 369400   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:33,703-Speed 2627.71 samples/sec   Loss 7.4884   LearningRate 0.0308   Epoch: 8   Global Step: 369410   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:37,606-Speed 2623.77 samples/sec   Loss 7.4566   LearningRate 0.0308   Epoch: 8   Global Step: 369420   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:41,503-Speed 2628.30 samples/sec   Loss 7.4370   LearningRate 0.0308   Epoch: 8   Global Step: 369430   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:45,387-Speed 2637.76 samples/sec   Loss 7.4889   LearningRate 0.0308   Epoch: 8   Global Step: 369440   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:49,297-Speed 2619.05 samples/sec   Loss 7.5164   LearningRate 0.0308   Epoch: 8   Global Step: 369450   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:53,182-Speed 2636.76 samples/sec   Loss 7.4627   LearningRate 0.0308   Epoch: 8   Global Step: 369460   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:12:57,077-Speed 2629.53 samples/sec   Loss 7.5692   LearningRate 0.0308   Epoch: 8   Global Step: 369470   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:13:00,984-Speed 2621.98 samples/sec   Loss 7.5856   LearningRate 0.0308   Epoch: 8   Global Step: 369480   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:13:04,877-Speed 2630.61 samples/sec   Loss 7.4717   LearningRate 0.0308   Epoch: 8   Global Step: 369490   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:13:08,775-Speed 2627.72 samples/sec   Loss 7.5355   LearningRate 0.0308   Epoch: 8   Global Step: 369500   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:13:12,652-Speed 2641.50 samples/sec   Loss 7.6887   LearningRate 0.0308   Epoch: 8   Global Step: 369510   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:16,545-Speed 2631.31 samples/sec   Loss 7.5110   LearningRate 0.0308   Epoch: 8   Global Step: 369520   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:20,453-Speed 2620.97 samples/sec   Loss 7.5136   LearningRate 0.0308   Epoch: 8   Global Step: 369530   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:24,362-Speed 2620.11 samples/sec   Loss 7.6060   LearningRate 0.0308   Epoch: 8   Global Step: 369540   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:28,255-Speed 2630.77 samples/sec   Loss 7.4801   LearningRate 0.0308   Epoch: 8   Global Step: 369550   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:32,154-Speed 2627.79 samples/sec   Loss 7.5810   LearningRate 0.0307   Epoch: 8   Global Step: 369560   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:36,048-Speed 2629.88 samples/sec   Loss 7.5105   LearningRate 0.0307   Epoch: 8   Global Step: 369570   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:39,948-Speed 2626.34 samples/sec   Loss 7.5135   LearningRate 0.0307   Epoch: 8   Global Step: 369580   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:43,844-Speed 2628.64 samples/sec   Loss 7.5993   LearningRate 0.0307   Epoch: 8   Global Step: 369590   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:47,757-Speed 2618.19 samples/sec   Loss 7.4331   LearningRate 0.0307   Epoch: 8   Global Step: 369600   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:13:51,695-Speed 2601.08 samples/sec   Loss 7.4309   LearningRate 0.0307   Epoch: 8   Global Step: 369610   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:13:55,616-Speed 2612.61 samples/sec   Loss 7.4641   LearningRate 0.0307   Epoch: 8   Global Step: 369620   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:13:59,666-Speed 2529.06 samples/sec   Loss 7.3953   LearningRate 0.0307   Epoch: 8   Global Step: 369630   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:03,611-Speed 2596.35 samples/sec   Loss 7.5670   LearningRate 0.0307   Epoch: 8   Global Step: 369640   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:07,501-Speed 2632.75 samples/sec   Loss 7.4295   LearningRate 0.0307   Epoch: 8   Global Step: 369650   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:11,399-Speed 2628.10 samples/sec   Loss 7.4416   LearningRate 0.0307   Epoch: 8   Global Step: 369660   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:15,294-Speed 2629.94 samples/sec   Loss 7.4409   LearningRate 0.0307   Epoch: 8   Global Step: 369670   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:19,190-Speed 2628.24 samples/sec   Loss 7.4619   LearningRate 0.0307   Epoch: 8   Global Step: 369680   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:23,085-Speed 2630.15 samples/sec   Loss 7.5177   LearningRate 0.0307   Epoch: 8   Global Step: 369690   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:26,979-Speed 2630.76 samples/sec   Loss 7.4735   LearningRate 0.0307   Epoch: 8   Global Step: 369700   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:30,859-Speed 2639.29 samples/sec   Loss 7.5236   LearningRate 0.0307   Epoch: 8   Global Step: 369710   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:34,756-Speed 2629.26 samples/sec   Loss 7.5350   LearningRate 0.0307   Epoch: 8   Global Step: 369720   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:38,655-Speed 2627.04 samples/sec   Loss 7.5244   LearningRate 0.0307   Epoch: 8   Global Step: 369730   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:42,553-Speed 2627.34 samples/sec   Loss 7.5032   LearningRate 0.0307   Epoch: 8   Global Step: 369740   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:46,452-Speed 2627.28 samples/sec   Loss 7.4720   LearningRate 0.0307   Epoch: 8   Global Step: 369750   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:50,354-Speed 2624.88 samples/sec   Loss 7.4262   LearningRate 0.0307   Epoch: 8   Global Step: 369760   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:54,266-Speed 2618.01 samples/sec   Loss 7.3975   LearningRate 0.0307   Epoch: 8   Global Step: 369770   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:14:58,169-Speed 2624.75 samples/sec   Loss 7.4942   LearningRate 0.0307   Epoch: 8   Global Step: 369780   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:15:02,069-Speed 2626.89 samples/sec   Loss 7.5402   LearningRate 0.0307   Epoch: 8   Global Step: 369790   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:15:06,013-Speed 2596.29 samples/sec   Loss 7.4815   LearningRate 0.0307   Epoch: 8   Global Step: 369800   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:15:09,924-Speed 2619.60 samples/sec   Loss 7.5220   LearningRate 0.0307   Epoch: 8   Global Step: 369810   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:13,822-Speed 2627.79 samples/sec   Loss 7.4611   LearningRate 0.0307   Epoch: 8   Global Step: 369820   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:17,718-Speed 2628.70 samples/sec   Loss 7.4649   LearningRate 0.0307   Epoch: 8   Global Step: 369830   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:21,622-Speed 2623.17 samples/sec   Loss 7.5744   LearningRate 0.0307   Epoch: 8   Global Step: 369840   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:25,534-Speed 2618.72 samples/sec   Loss 7.6129   LearningRate 0.0307   Epoch: 8   Global Step: 369850   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:29,432-Speed 2627.85 samples/sec   Loss 7.5953   LearningRate 0.0307   Epoch: 8   Global Step: 369860   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:33,365-Speed 2604.53 samples/sec   Loss 7.3872   LearningRate 0.0307   Epoch: 8   Global Step: 369870   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:37,271-Speed 2622.26 samples/sec   Loss 7.4027   LearningRate 0.0307   Epoch: 8   Global Step: 369880   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:41,167-Speed 2629.37 samples/sec   Loss 7.4674   LearningRate 0.0307   Epoch: 8   Global Step: 369890   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:45,068-Speed 2625.80 samples/sec   Loss 7.5571   LearningRate 0.0307   Epoch: 8   Global Step: 369900   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:15:48,956-Speed 2633.95 samples/sec   Loss 7.6210   LearningRate 0.0307   Epoch: 8   Global Step: 369910   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 13:15:52,854-Speed 2628.00 samples/sec   Loss 7.5315   LearningRate 0.0307   Epoch: 8   Global Step: 369920   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 13:15:56,776-Speed 2611.91 samples/sec   Loss 7.4592   LearningRate 0.0307   Epoch: 8   Global Step: 369930   Fp16 Grad Scale: 262144   Required: 52 hours
Training: 2022-04-14 13:16:00,651-Speed 2643.12 samples/sec   Loss 7.3849   LearningRate 0.0307   Epoch: 8   Global Step: 369940   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:16:04,561-Speed 2619.70 samples/sec   Loss 8.2234   LearningRate 0.0307   Epoch: 8   Global Step: 369950   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:16:08,455-Speed 2630.45 samples/sec   Loss 7.9784   LearningRate 0.0307   Epoch: 8   Global Step: 369960   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:16:12,348-Speed 2630.88 samples/sec   Loss 7.6410   LearningRate 0.0307   Epoch: 8   Global Step: 369970   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:16:16,238-Speed 2633.04 samples/sec   Loss 7.4689   LearningRate 0.0307   Epoch: 8   Global Step: 369980   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:16:20,138-Speed 2626.59 samples/sec   Loss 7.4160   LearningRate 0.0307   Epoch: 8   Global Step: 369990   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:16:24,043-Speed 2622.76 samples/sec   Loss 7.5794   LearningRate 0.0307   Epoch: 8   Global Step: 370000   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:17:06,921-[lfw][370000]XNorm: 23.203387
Training: 2022-04-14 13:17:06,922-[lfw][370000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-14 13:17:06,923-[lfw][370000]Accuracy-Highest: 0.99783
Training: 2022-04-14 13:17:56,979-[cfp_fp][370000]XNorm: 21.421876
Training: 2022-04-14 13:17:56,980-[cfp_fp][370000]Accuracy-Flip: 0.98557+-0.00692
Training: 2022-04-14 13:17:56,981-[cfp_fp][370000]Accuracy-Highest: 0.98671
Training: 2022-04-14 13:18:40,141-[agedb_30][370000]XNorm: 23.100004
Training: 2022-04-14 13:18:40,142-[agedb_30][370000]Accuracy-Flip: 0.97567+-0.00807
Training: 2022-04-14 13:18:40,142-[agedb_30][370000]Accuracy-Highest: 0.97700
Training: 2022-04-14 13:18:44,007-Speed 73.16 samples/sec   Loss 7.4218   LearningRate 0.0307   Epoch: 8   Global Step: 370010   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:18:47,881-Speed 2644.26 samples/sec   Loss 7.5556   LearningRate 0.0307   Epoch: 8   Global Step: 370020   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:18:51,759-Speed 2640.86 samples/sec   Loss 7.5187   LearningRate 0.0307   Epoch: 8   Global Step: 370030   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:18:55,638-Speed 2641.47 samples/sec   Loss 7.4571   LearningRate 0.0307   Epoch: 8   Global Step: 370040   Fp16 Grad Scale: 2048   Required: 52 hours
Training: 2022-04-14 13:18:59,524-Speed 2635.70 samples/sec   Loss 7.4537   LearningRate 0.0307   Epoch: 8   Global Step: 370050   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:03,407-Speed 2637.60 samples/sec   Loss 7.4621   LearningRate 0.0307   Epoch: 8   Global Step: 370060   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:07,284-Speed 2642.27 samples/sec   Loss 7.5877   LearningRate 0.0307   Epoch: 8   Global Step: 370070   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:11,170-Speed 2635.88 samples/sec   Loss 7.4202   LearningRate 0.0307   Epoch: 8   Global Step: 370080   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:15,047-Speed 2641.82 samples/sec   Loss 7.4986   LearningRate 0.0307   Epoch: 8   Global Step: 370090   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:18,941-Speed 2630.43 samples/sec   Loss 7.4485   LearningRate 0.0307   Epoch: 8   Global Step: 370100   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:22,825-Speed 2636.94 samples/sec   Loss 7.4231   LearningRate 0.0307   Epoch: 8   Global Step: 370110   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:26,718-Speed 2631.02 samples/sec   Loss 7.4584   LearningRate 0.0307   Epoch: 8   Global Step: 370120   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:30,601-Speed 2638.11 samples/sec   Loss 7.5596   LearningRate 0.0307   Epoch: 8   Global Step: 370130   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:34,498-Speed 2628.00 samples/sec   Loss 7.5784   LearningRate 0.0307   Epoch: 8   Global Step: 370140   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:19:38,387-Speed 2633.68 samples/sec   Loss 7.4268   LearningRate 0.0307   Epoch: 8   Global Step: 370150   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:19:42,289-Speed 2625.51 samples/sec   Loss 7.5094   LearningRate 0.0307   Epoch: 8   Global Step: 370160   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:19:46,180-Speed 2632.11 samples/sec   Loss 7.6036   LearningRate 0.0307   Epoch: 8   Global Step: 370170   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:19:50,078-Speed 2627.74 samples/sec   Loss 7.5385   LearningRate 0.0307   Epoch: 8   Global Step: 370180   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:19:53,973-Speed 2629.42 samples/sec   Loss 7.5478   LearningRate 0.0307   Epoch: 8   Global Step: 370190   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:19:57,873-Speed 2626.50 samples/sec   Loss 7.4716   LearningRate 0.0307   Epoch: 8   Global Step: 370200   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:20:02,241-Speed 2344.70 samples/sec   Loss 7.5567   LearningRate 0.0307   Epoch: 8   Global Step: 370210   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:20:06,140-Speed 2626.89 samples/sec   Loss 7.5481   LearningRate 0.0307   Epoch: 8   Global Step: 370220   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:20:10,035-Speed 2630.34 samples/sec   Loss 7.7376   LearningRate 0.0307   Epoch: 8   Global Step: 370230   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:20:13,935-Speed 2626.05 samples/sec   Loss 7.4199   LearningRate 0.0307   Epoch: 8   Global Step: 370240   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:20:17,841-Speed 2622.22 samples/sec   Loss 7.5154   LearningRate 0.0307   Epoch: 8   Global Step: 370250   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:21,741-Speed 2626.43 samples/sec   Loss 7.4985   LearningRate 0.0307   Epoch: 8   Global Step: 370260   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:25,646-Speed 2622.77 samples/sec   Loss 7.5836   LearningRate 0.0307   Epoch: 8   Global Step: 370270   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:29,540-Speed 2630.50 samples/sec   Loss 7.4015   LearningRate 0.0307   Epoch: 8   Global Step: 370280   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:33,436-Speed 2629.07 samples/sec   Loss 7.5371   LearningRate 0.0307   Epoch: 8   Global Step: 370290   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:37,333-Speed 2628.51 samples/sec   Loss 7.6175   LearningRate 0.0307   Epoch: 8   Global Step: 370300   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:41,266-Speed 2604.07 samples/sec   Loss 7.4506   LearningRate 0.0306   Epoch: 8   Global Step: 370310   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:45,158-Speed 2632.20 samples/sec   Loss 7.4167   LearningRate 0.0306   Epoch: 8   Global Step: 370320   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:49,053-Speed 2629.79 samples/sec   Loss 7.4637   LearningRate 0.0306   Epoch: 8   Global Step: 370330   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:52,954-Speed 2625.66 samples/sec   Loss 7.4474   LearningRate 0.0306   Epoch: 8   Global Step: 370340   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:20:56,850-Speed 2628.92 samples/sec   Loss 7.4900   LearningRate 0.0306   Epoch: 8   Global Step: 370350   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:00,746-Speed 2628.85 samples/sec   Loss 7.4585   LearningRate 0.0306   Epoch: 8   Global Step: 370360   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:04,642-Speed 2628.72 samples/sec   Loss 7.4569   LearningRate 0.0306   Epoch: 8   Global Step: 370370   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:08,535-Speed 2631.15 samples/sec   Loss 7.4099   LearningRate 0.0306   Epoch: 8   Global Step: 370380   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:12,434-Speed 2627.62 samples/sec   Loss 7.6030   LearningRate 0.0306   Epoch: 8   Global Step: 370390   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:16,331-Speed 2628.02 samples/sec   Loss 7.5625   LearningRate 0.0306   Epoch: 8   Global Step: 370400   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:20,228-Speed 2628.76 samples/sec   Loss 7.4507   LearningRate 0.0306   Epoch: 8   Global Step: 370410   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:24,125-Speed 2627.85 samples/sec   Loss 7.3618   LearningRate 0.0306   Epoch: 8   Global Step: 370420   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:28,025-Speed 2626.58 samples/sec   Loss 7.3830   LearningRate 0.0306   Epoch: 8   Global Step: 370430   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:31,942-Speed 2614.71 samples/sec   Loss 7.4726   LearningRate 0.0306   Epoch: 8   Global Step: 370440   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:35,838-Speed 2629.03 samples/sec   Loss 7.5149   LearningRate 0.0306   Epoch: 8   Global Step: 370450   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:21:39,740-Speed 2625.21 samples/sec   Loss 7.4811   LearningRate 0.0306   Epoch: 8   Global Step: 370460   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:21:43,634-Speed 2630.79 samples/sec   Loss 7.5256   LearningRate 0.0306   Epoch: 8   Global Step: 370470   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:21:47,522-Speed 2634.43 samples/sec   Loss 7.5647   LearningRate 0.0306   Epoch: 8   Global Step: 370480   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:21:51,393-Speed 2645.74 samples/sec   Loss 7.6463   LearningRate 0.0306   Epoch: 8   Global Step: 370490   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:21:55,300-Speed 2621.58 samples/sec   Loss 7.6402   LearningRate 0.0306   Epoch: 8   Global Step: 370500   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:21:59,191-Speed 2632.80 samples/sec   Loss 7.5444   LearningRate 0.0306   Epoch: 8   Global Step: 370510   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:03,098-Speed 2621.17 samples/sec   Loss 7.4839   LearningRate 0.0306   Epoch: 8   Global Step: 370520   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:06,998-Speed 2626.22 samples/sec   Loss 7.5373   LearningRate 0.0306   Epoch: 8   Global Step: 370530   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:10,894-Speed 2629.69 samples/sec   Loss 7.5479   LearningRate 0.0306   Epoch: 8   Global Step: 370540   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:14,788-Speed 2630.17 samples/sec   Loss 7.5295   LearningRate 0.0306   Epoch: 8   Global Step: 370550   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:18,681-Speed 2631.02 samples/sec   Loss 7.5392   LearningRate 0.0306   Epoch: 8   Global Step: 370560   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:22,587-Speed 2622.29 samples/sec   Loss 7.4348   LearningRate 0.0306   Epoch: 8   Global Step: 370570   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:26,484-Speed 2628.41 samples/sec   Loss 7.4805   LearningRate 0.0306   Epoch: 8   Global Step: 370580   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:30,389-Speed 2622.92 samples/sec   Loss 7.4185   LearningRate 0.0306   Epoch: 8   Global Step: 370590   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:22:34,290-Speed 2625.99 samples/sec   Loss 7.3661   LearningRate 0.0306   Epoch: 8   Global Step: 370600   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:22:38,242-Speed 2591.33 samples/sec   Loss 7.5782   LearningRate 0.0306   Epoch: 8   Global Step: 370610   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:22:42,139-Speed 2628.76 samples/sec   Loss 7.4360   LearningRate 0.0306   Epoch: 8   Global Step: 370620   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:22:46,088-Speed 2593.03 samples/sec   Loss 7.3615   LearningRate 0.0306   Epoch: 8   Global Step: 370630   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:22:49,980-Speed 2632.11 samples/sec   Loss 7.5273   LearningRate 0.0306   Epoch: 8   Global Step: 370640   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:22:53,861-Speed 2639.49 samples/sec   Loss 7.8681   LearningRate 0.0306   Epoch: 8   Global Step: 370650   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:22:57,751-Speed 2632.93 samples/sec   Loss 7.5390   LearningRate 0.0306   Epoch: 8   Global Step: 370660   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:01,647-Speed 2628.66 samples/sec   Loss 7.3902   LearningRate 0.0306   Epoch: 8   Global Step: 370670   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:05,546-Speed 2626.62 samples/sec   Loss 7.4660   LearningRate 0.0306   Epoch: 8   Global Step: 370680   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:09,449-Speed 2624.10 samples/sec   Loss 7.5322   LearningRate 0.0306   Epoch: 8   Global Step: 370690   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:13,352-Speed 2624.93 samples/sec   Loss 7.4509   LearningRate 0.0306   Epoch: 8   Global Step: 370700   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:17,252-Speed 2626.10 samples/sec   Loss 7.4358   LearningRate 0.0306   Epoch: 8   Global Step: 370710   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:21,162-Speed 2619.81 samples/sec   Loss 7.5711   LearningRate 0.0306   Epoch: 8   Global Step: 370720   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:25,074-Speed 2618.61 samples/sec   Loss 7.3562   LearningRate 0.0306   Epoch: 8   Global Step: 370730   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:28,970-Speed 2628.97 samples/sec   Loss 7.5454   LearningRate 0.0306   Epoch: 8   Global Step: 370740   Fp16 Grad Scale: 4096   Required: 52 hours
Training: 2022-04-14 13:23:32,879-Speed 2620.54 samples/sec   Loss 7.4600   LearningRate 0.0306   Epoch: 8   Global Step: 370750   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:23:36,778-Speed 2626.69 samples/sec   Loss 7.4424   LearningRate 0.0306   Epoch: 8   Global Step: 370760   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:23:40,669-Speed 2632.34 samples/sec   Loss 7.5278   LearningRate 0.0306   Epoch: 8   Global Step: 370770   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:23:44,587-Speed 2613.74 samples/sec   Loss 7.5581   LearningRate 0.0306   Epoch: 8   Global Step: 370780   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:23:48,492-Speed 2623.68 samples/sec   Loss 7.4031   LearningRate 0.0306   Epoch: 8   Global Step: 370790   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:23:52,399-Speed 2621.69 samples/sec   Loss 7.5309   LearningRate 0.0306   Epoch: 8   Global Step: 370800   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:23:56,380-Speed 2573.26 samples/sec   Loss 7.4374   LearningRate 0.0306   Epoch: 8   Global Step: 370810   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:24:00,321-Speed 2598.84 samples/sec   Loss 7.4228   LearningRate 0.0306   Epoch: 8   Global Step: 370820   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:24:04,246-Speed 2609.55 samples/sec   Loss 7.3362   LearningRate 0.0306   Epoch: 8   Global Step: 370830   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:24:08,142-Speed 2628.87 samples/sec   Loss 7.4085   LearningRate 0.0306   Epoch: 8   Global Step: 370840   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-14 13:24:12,038-Speed 2629.65 samples/sec   Loss 7.3889   LearningRate 0.0306   Epoch: 8   Global Step: 370850   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:15,989-Speed 2592.04 samples/sec   Loss 7.4036   LearningRate 0.0306   Epoch: 8   Global Step: 370860   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:19,888-Speed 2626.54 samples/sec   Loss 7.5420   LearningRate 0.0306   Epoch: 8   Global Step: 370870   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:23,795-Speed 2622.08 samples/sec   Loss 7.5416   LearningRate 0.0306   Epoch: 8   Global Step: 370880   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:27,694-Speed 2627.23 samples/sec   Loss 7.4958   LearningRate 0.0306   Epoch: 8   Global Step: 370890   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:31,662-Speed 2581.44 samples/sec   Loss 7.3846   LearningRate 0.0306   Epoch: 8   Global Step: 370900   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:35,565-Speed 2623.66 samples/sec   Loss 7.4732   LearningRate 0.0306   Epoch: 8   Global Step: 370910   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:39,476-Speed 2618.89 samples/sec   Loss 7.3946   LearningRate 0.0306   Epoch: 8   Global Step: 370920   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:43,399-Speed 2610.83 samples/sec   Loss 7.5565   LearningRate 0.0306   Epoch: 8   Global Step: 370930   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:47,409-Speed 2554.18 samples/sec   Loss 8.2785   LearningRate 0.0306   Epoch: 8   Global Step: 370940   Fp16 Grad Scale: 16384   Required: 52 hours
Training: 2022-04-14 13:24:51,307-Speed 2627.58 samples/sec   Loss 7.9291   LearningRate 0.0306   Epoch: 8   Global Step: 370950   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:24:55,213-Speed 2622.84 samples/sec   Loss 7.8090   LearningRate 0.0306   Epoch: 8   Global Step: 370960   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:24:59,118-Speed 2622.79 samples/sec   Loss 7.7956   LearningRate 0.0306   Epoch: 8   Global Step: 370970   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:03,021-Speed 2624.04 samples/sec   Loss 7.6423   LearningRate 0.0306   Epoch: 8   Global Step: 370980   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:06,919-Speed 2627.26 samples/sec   Loss 7.5346   LearningRate 0.0306   Epoch: 8   Global Step: 370990   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:10,819-Speed 2626.11 samples/sec   Loss 7.5100   LearningRate 0.0306   Epoch: 8   Global Step: 371000   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:14,718-Speed 2626.88 samples/sec   Loss 7.4804   LearningRate 0.0306   Epoch: 8   Global Step: 371010   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:18,616-Speed 2627.63 samples/sec   Loss 7.6717   LearningRate 0.0306   Epoch: 8   Global Step: 371020   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:22,530-Speed 2617.11 samples/sec   Loss 7.5169   LearningRate 0.0306   Epoch: 8   Global Step: 371030   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:26,428-Speed 2628.08 samples/sec   Loss 7.5908   LearningRate 0.0306   Epoch: 8   Global Step: 371040   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:25:30,370-Speed 2597.56 samples/sec   Loss 7.4770   LearningRate 0.0306   Epoch: 8   Global Step: 371050   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:25:34,266-Speed 2629.51 samples/sec   Loss 7.4119   LearningRate 0.0305   Epoch: 8   Global Step: 371060   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:25:38,161-Speed 2629.01 samples/sec   Loss 7.5161   LearningRate 0.0305   Epoch: 8   Global Step: 371070   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:25:42,055-Speed 2630.53 samples/sec   Loss 7.4250   LearningRate 0.0305   Epoch: 8   Global Step: 371080   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:25:45,948-Speed 2631.02 samples/sec   Loss 7.5319   LearningRate 0.0305   Epoch: 8   Global Step: 371090   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:25:49,846-Speed 2627.13 samples/sec   Loss 7.5072   LearningRate 0.0305   Epoch: 8   Global Step: 371100   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:25:53,741-Speed 2629.45 samples/sec   Loss 7.4490   LearningRate 0.0305   Epoch: 8   Global Step: 371110   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:25:57,640-Speed 2626.81 samples/sec   Loss 7.5528   LearningRate 0.0305   Epoch: 8   Global Step: 371120   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:26:01,541-Speed 2626.08 samples/sec   Loss 7.5822   LearningRate 0.0305   Epoch: 8   Global Step: 371130   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:26:05,430-Speed 2633.82 samples/sec   Loss 7.3663   LearningRate 0.0305   Epoch: 8   Global Step: 371140   Fp16 Grad Scale: 65536   Required: 52 hours
Training: 2022-04-14 13:26:09,338-Speed 2620.61 samples/sec   Loss 7.4186   LearningRate 0.0305   Epoch: 8   Global Step: 371150   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:26:13,245-Speed 2621.22 samples/sec   Loss 7.4354   LearningRate 0.0305   Epoch: 8   Global Step: 371160   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:26:17,145-Speed 2626.37 samples/sec   Loss 7.4477   LearningRate 0.0305   Epoch: 8   Global Step: 371170   Fp16 Grad Scale: 131072   Required: 52 hours
Training: 2022-04-14 13:26:21,009-Speed 2650.59 samples/sec   Loss 7.5077   LearningRate 0.0305   Epoch: 8   Global Step: 371180   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:26:24,945-Speed 2602.35 samples/sec   Loss 7.4622   LearningRate 0.0305   Epoch: 8   Global Step: 371190   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:26:28,849-Speed 2623.73 samples/sec   Loss 7.4817   LearningRate 0.0305   Epoch: 8   Global Step: 371200   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:26:32,828-Speed 2574.07 samples/sec   Loss 7.4568   LearningRate 0.0305   Epoch: 8   Global Step: 371210   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:26:36,719-Speed 2632.03 samples/sec   Loss 7.4237   LearningRate 0.0305   Epoch: 8   Global Step: 371220   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:26:40,612-Speed 2631.16 samples/sec   Loss 7.3713   LearningRate 0.0305   Epoch: 8   Global Step: 371230   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:26:44,505-Speed 2630.88 samples/sec   Loss 7.4893   LearningRate 0.0305   Epoch: 8   Global Step: 371240   Fp16 Grad Scale: 32768   Required: 52 hours
Training: 2022-04-14 13:26:48,399-Speed 2630.50 samples/sec   Loss 7.4679   LearningRate 0.0305   Epoch: 8   Global Step: 371250   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:26:52,306-Speed 2621.40 samples/sec   Loss 7.4478   LearningRate 0.0305   Epoch: 8   Global Step: 371260   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:26:56,199-Speed 2630.73 samples/sec   Loss 7.5628   LearningRate 0.0305   Epoch: 8   Global Step: 371270   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:00,096-Speed 2628.22 samples/sec   Loss 7.4889   LearningRate 0.0305   Epoch: 8   Global Step: 371280   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:27:04,001-Speed 2623.12 samples/sec   Loss 7.4099   LearningRate 0.0305   Epoch: 8   Global Step: 371290   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:27:07,904-Speed 2623.84 samples/sec   Loss 7.4285   LearningRate 0.0305   Epoch: 8   Global Step: 371300   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:27:11,810-Speed 2622.59 samples/sec   Loss 7.4120   LearningRate 0.0305   Epoch: 8   Global Step: 371310   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:27:15,701-Speed 2632.13 samples/sec   Loss 7.5398   LearningRate 0.0305   Epoch: 8   Global Step: 371320   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:27:19,595-Speed 2630.45 samples/sec   Loss 7.4601   LearningRate 0.0305   Epoch: 8   Global Step: 371330   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:23,502-Speed 2621.76 samples/sec   Loss 8.0406   LearningRate 0.0305   Epoch: 8   Global Step: 371340   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:27,401-Speed 2626.97 samples/sec   Loss 7.7389   LearningRate 0.0305   Epoch: 8   Global Step: 371350   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:31,306-Speed 2622.57 samples/sec   Loss 7.5940   LearningRate 0.0305   Epoch: 8   Global Step: 371360   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:35,198-Speed 2631.32 samples/sec   Loss 7.5138   LearningRate 0.0305   Epoch: 8   Global Step: 371370   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:39,096-Speed 2627.73 samples/sec   Loss 7.5084   LearningRate 0.0305   Epoch: 8   Global Step: 371380   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:42,985-Speed 2634.02 samples/sec   Loss 7.4373   LearningRate 0.0305   Epoch: 8   Global Step: 371390   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:46,880-Speed 2628.97 samples/sec   Loss 7.5330   LearningRate 0.0305   Epoch: 8   Global Step: 371400   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:50,783-Speed 2624.27 samples/sec   Loss 7.5160   LearningRate 0.0305   Epoch: 8   Global Step: 371410   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:54,673-Speed 2633.43 samples/sec   Loss 7.3530   LearningRate 0.0305   Epoch: 8   Global Step: 371420   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:27:58,577-Speed 2623.58 samples/sec   Loss 7.6978   LearningRate 0.0305   Epoch: 8   Global Step: 371430   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:28:02,499-Speed 2611.85 samples/sec   Loss 7.4885   LearningRate 0.0305   Epoch: 8   Global Step: 371440   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:28:06,388-Speed 2632.95 samples/sec   Loss 7.3741   LearningRate 0.0305   Epoch: 8   Global Step: 371450   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:28:10,315-Speed 2608.30 samples/sec   Loss 7.4704   LearningRate 0.0305   Epoch: 8   Global Step: 371460   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:28:14,217-Speed 2624.92 samples/sec   Loss 7.4784   LearningRate 0.0305   Epoch: 8   Global Step: 371470   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:28:18,111-Speed 2630.32 samples/sec   Loss 7.3890   LearningRate 0.0305   Epoch: 8   Global Step: 371480   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:28:22,003-Speed 2631.48 samples/sec   Loss 7.4333   LearningRate 0.0305   Epoch: 8   Global Step: 371490   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:28:25,884-Speed 2639.20 samples/sec   Loss 7.5584   LearningRate 0.0305   Epoch: 8   Global Step: 371500   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:29,788-Speed 2624.25 samples/sec   Loss 7.7910   LearningRate 0.0305   Epoch: 8   Global Step: 371510   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:33,690-Speed 2624.51 samples/sec   Loss 7.4400   LearningRate 0.0305   Epoch: 8   Global Step: 371520   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:37,584-Speed 2630.42 samples/sec   Loss 7.4757   LearningRate 0.0305   Epoch: 8   Global Step: 371530   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:41,476-Speed 2631.43 samples/sec   Loss 7.5290   LearningRate 0.0305   Epoch: 8   Global Step: 371540   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:45,387-Speed 2618.80 samples/sec   Loss 7.4758   LearningRate 0.0305   Epoch: 8   Global Step: 371550   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:49,290-Speed 2624.46 samples/sec   Loss 7.5217   LearningRate 0.0305   Epoch: 8   Global Step: 371560   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:53,180-Speed 2632.46 samples/sec   Loss 7.4857   LearningRate 0.0305   Epoch: 8   Global Step: 371570   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:28:57,070-Speed 2633.21 samples/sec   Loss 7.4781   LearningRate 0.0305   Epoch: 8   Global Step: 371580   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:29:00,966-Speed 2628.68 samples/sec   Loss 7.3820   LearningRate 0.0305   Epoch: 8   Global Step: 371590   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:29:04,892-Speed 2608.71 samples/sec   Loss 7.5765   LearningRate 0.0305   Epoch: 8   Global Step: 371600   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:29:08,790-Speed 2628.00 samples/sec   Loss 7.4636   LearningRate 0.0305   Epoch: 8   Global Step: 371610   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:29:12,705-Speed 2616.06 samples/sec   Loss 7.4087   LearningRate 0.0305   Epoch: 8   Global Step: 371620   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:29:16,625-Speed 2613.12 samples/sec   Loss 7.5011   LearningRate 0.0305   Epoch: 8   Global Step: 371630   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:29:20,547-Speed 2611.07 samples/sec   Loss 7.4635   LearningRate 0.0305   Epoch: 8   Global Step: 371640   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:29:24,447-Speed 2626.12 samples/sec   Loss 7.5575   LearningRate 0.0305   Epoch: 8   Global Step: 371650   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:29:28,326-Speed 2640.84 samples/sec   Loss 7.8063   LearningRate 0.0305   Epoch: 8   Global Step: 371660   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:32,232-Speed 2622.20 samples/sec   Loss 7.6138   LearningRate 0.0305   Epoch: 8   Global Step: 371670   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:36,130-Speed 2627.13 samples/sec   Loss 7.3879   LearningRate 0.0305   Epoch: 8   Global Step: 371680   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:40,036-Speed 2622.11 samples/sec   Loss 7.4122   LearningRate 0.0305   Epoch: 8   Global Step: 371690   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:43,940-Speed 2624.01 samples/sec   Loss 7.5187   LearningRate 0.0305   Epoch: 8   Global Step: 371700   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:47,835-Speed 2629.86 samples/sec   Loss 7.4765   LearningRate 0.0305   Epoch: 8   Global Step: 371710   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:51,725-Speed 2632.72 samples/sec   Loss 7.4744   LearningRate 0.0305   Epoch: 8   Global Step: 371720   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:55,632-Speed 2621.51 samples/sec   Loss 7.4770   LearningRate 0.0305   Epoch: 8   Global Step: 371730   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:29:59,528-Speed 2628.99 samples/sec   Loss 7.5159   LearningRate 0.0305   Epoch: 8   Global Step: 371740   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:30:03,422-Speed 2630.18 samples/sec   Loss 7.4800   LearningRate 0.0305   Epoch: 8   Global Step: 371750   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:30:07,315-Speed 2630.97 samples/sec   Loss 7.5316   LearningRate 0.0305   Epoch: 8   Global Step: 371760   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:11,222-Speed 2621.02 samples/sec   Loss 7.4775   LearningRate 0.0305   Epoch: 8   Global Step: 371770   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:15,123-Speed 2625.84 samples/sec   Loss 7.3881   LearningRate 0.0305   Epoch: 8   Global Step: 371780   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:19,018-Speed 2629.25 samples/sec   Loss 7.4694   LearningRate 0.0305   Epoch: 8   Global Step: 371790   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:22,937-Speed 2614.18 samples/sec   Loss 7.4371   LearningRate 0.0305   Epoch: 8   Global Step: 371800   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:26,845-Speed 2620.47 samples/sec   Loss 7.3503   LearningRate 0.0304   Epoch: 8   Global Step: 371810   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:30,756-Speed 2618.77 samples/sec   Loss 7.4777   LearningRate 0.0304   Epoch: 8   Global Step: 371820   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:34,659-Speed 2624.48 samples/sec   Loss 7.4335   LearningRate 0.0304   Epoch: 8   Global Step: 371830   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:38,566-Speed 2621.73 samples/sec   Loss 7.4328   LearningRate 0.0304   Epoch: 8   Global Step: 371840   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:42,464-Speed 2627.17 samples/sec   Loss 7.3727   LearningRate 0.0304   Epoch: 8   Global Step: 371850   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:30:46,368-Speed 2623.40 samples/sec   Loss 7.4033   LearningRate 0.0304   Epoch: 8   Global Step: 371860   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:30:50,265-Speed 2628.00 samples/sec   Loss 7.3365   LearningRate 0.0304   Epoch: 8   Global Step: 371870   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:30:54,158-Speed 2631.21 samples/sec   Loss 7.4744   LearningRate 0.0304   Epoch: 8   Global Step: 371880   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:30:58,066-Speed 2620.43 samples/sec   Loss 7.4002   LearningRate 0.0304   Epoch: 8   Global Step: 371890   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:31:01,965-Speed 2627.91 samples/sec   Loss 7.4368   LearningRate 0.0304   Epoch: 8   Global Step: 371900   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:31:05,874-Speed 2619.74 samples/sec   Loss 7.4580   LearningRate 0.0304   Epoch: 8   Global Step: 371910   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:31:09,768-Speed 2630.01 samples/sec   Loss 7.5057   LearningRate 0.0304   Epoch: 8   Global Step: 371920   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:31:13,672-Speed 2623.53 samples/sec   Loss 7.3824   LearningRate 0.0304   Epoch: 8   Global Step: 371930   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:31:17,572-Speed 2626.27 samples/sec   Loss 7.4361   LearningRate 0.0304   Epoch: 8   Global Step: 371940   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:31:21,477-Speed 2622.69 samples/sec   Loss 7.3967   LearningRate 0.0304   Epoch: 8   Global Step: 371950   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:31:25,338-Speed 2653.40 samples/sec   Loss 7.6403   LearningRate 0.0304   Epoch: 8   Global Step: 371960   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:29,236-Speed 2627.37 samples/sec   Loss 8.0404   LearningRate 0.0304   Epoch: 8   Global Step: 371970   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:33,161-Speed 2609.73 samples/sec   Loss 7.5014   LearningRate 0.0304   Epoch: 8   Global Step: 371980   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:37,249-Speed 2505.57 samples/sec   Loss 7.4509   LearningRate 0.0304   Epoch: 8   Global Step: 371990   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:41,211-Speed 2584.63 samples/sec   Loss 7.4915   LearningRate 0.0304   Epoch: 8   Global Step: 372000   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:45,114-Speed 2624.42 samples/sec   Loss 7.5238   LearningRate 0.0304   Epoch: 8   Global Step: 372010   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:49,057-Speed 2597.61 samples/sec   Loss 7.5337   LearningRate 0.0304   Epoch: 8   Global Step: 372020   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:52,960-Speed 2623.94 samples/sec   Loss 7.4634   LearningRate 0.0304   Epoch: 8   Global Step: 372030   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:31:56,865-Speed 2622.79 samples/sec   Loss 7.3927   LearningRate 0.0304   Epoch: 8   Global Step: 372040   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:00,768-Speed 2624.57 samples/sec   Loss 7.3767   LearningRate 0.0304   Epoch: 8   Global Step: 372050   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:04,655-Speed 2634.99 samples/sec   Loss 7.6265   LearningRate 0.0304   Epoch: 8   Global Step: 372060   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:08,554-Speed 2626.83 samples/sec   Loss 7.5339   LearningRate 0.0304   Epoch: 8   Global Step: 372070   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:12,442-Speed 2634.13 samples/sec   Loss 7.4738   LearningRate 0.0304   Epoch: 8   Global Step: 372080   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:16,351-Speed 2620.32 samples/sec   Loss 7.3773   LearningRate 0.0304   Epoch: 8   Global Step: 372090   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:20,246-Speed 2629.58 samples/sec   Loss 7.5844   LearningRate 0.0304   Epoch: 8   Global Step: 372100   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:24,139-Speed 2631.37 samples/sec   Loss 7.5040   LearningRate 0.0304   Epoch: 8   Global Step: 372110   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:28,039-Speed 2625.83 samples/sec   Loss 7.5440   LearningRate 0.0304   Epoch: 8   Global Step: 372120   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:31,937-Speed 2627.60 samples/sec   Loss 7.5230   LearningRate 0.0304   Epoch: 8   Global Step: 372130   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:35,828-Speed 2632.59 samples/sec   Loss 7.4215   LearningRate 0.0304   Epoch: 8   Global Step: 372140   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:39,741-Speed 2617.25 samples/sec   Loss 7.3873   LearningRate 0.0304   Epoch: 8   Global Step: 372150   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:32:43,639-Speed 2627.57 samples/sec   Loss 7.4989   LearningRate 0.0304   Epoch: 8   Global Step: 372160   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:32:47,573-Speed 2603.69 samples/sec   Loss 7.4757   LearningRate 0.0304   Epoch: 8   Global Step: 372170   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:32:51,470-Speed 2628.10 samples/sec   Loss 7.5271   LearningRate 0.0304   Epoch: 8   Global Step: 372180   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:32:55,363-Speed 2631.43 samples/sec   Loss 7.4309   LearningRate 0.0304   Epoch: 8   Global Step: 372190   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:32:59,298-Speed 2602.95 samples/sec   Loss 7.5763   LearningRate 0.0304   Epoch: 8   Global Step: 372200   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:03,185-Speed 2634.79 samples/sec   Loss 7.6085   LearningRate 0.0304   Epoch: 8   Global Step: 372210   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:07,079-Speed 2630.04 samples/sec   Loss 7.3780   LearningRate 0.0304   Epoch: 8   Global Step: 372220   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:10,977-Speed 2627.35 samples/sec   Loss 7.3959   LearningRate 0.0304   Epoch: 8   Global Step: 372230   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:14,866-Speed 2634.23 samples/sec   Loss 7.4182   LearningRate 0.0304   Epoch: 8   Global Step: 372240   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:18,785-Speed 2613.31 samples/sec   Loss 7.5054   LearningRate 0.0304   Epoch: 8   Global Step: 372250   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:22,682-Speed 2628.29 samples/sec   Loss 7.4724   LearningRate 0.0304   Epoch: 8   Global Step: 372260   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:33:26,572-Speed 2632.62 samples/sec   Loss 7.4514   LearningRate 0.0304   Epoch: 8   Global Step: 372270   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:33:30,465-Speed 2631.10 samples/sec   Loss 7.4624   LearningRate 0.0304   Epoch: 8   Global Step: 372280   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:33:34,384-Speed 2613.69 samples/sec   Loss 7.5139   LearningRate 0.0304   Epoch: 8   Global Step: 372290   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:33:38,290-Speed 2622.51 samples/sec   Loss 7.4400   LearningRate 0.0304   Epoch: 8   Global Step: 372300   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:33:42,178-Speed 2634.39 samples/sec   Loss 7.4814   LearningRate 0.0304   Epoch: 8   Global Step: 372310   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:33:46,080-Speed 2624.59 samples/sec   Loss 7.4456   LearningRate 0.0304   Epoch: 8   Global Step: 372320   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:33:49,961-Speed 2638.94 samples/sec   Loss 7.5439   LearningRate 0.0304   Epoch: 8   Global Step: 372330   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:53,853-Speed 2631.86 samples/sec   Loss 7.7377   LearningRate 0.0304   Epoch: 8   Global Step: 372340   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:33:57,742-Speed 2633.47 samples/sec   Loss 7.5904   LearningRate 0.0304   Epoch: 8   Global Step: 372350   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:01,637-Speed 2629.74 samples/sec   Loss 7.5960   LearningRate 0.0304   Epoch: 8   Global Step: 372360   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:05,526-Speed 2633.63 samples/sec   Loss 7.4431   LearningRate 0.0304   Epoch: 8   Global Step: 372370   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:09,420-Speed 2629.95 samples/sec   Loss 7.4841   LearningRate 0.0304   Epoch: 8   Global Step: 372380   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:13,313-Speed 2631.46 samples/sec   Loss 7.5132   LearningRate 0.0304   Epoch: 8   Global Step: 372390   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:17,212-Speed 2626.72 samples/sec   Loss 7.3834   LearningRate 0.0304   Epoch: 8   Global Step: 372400   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:21,117-Speed 2623.10 samples/sec   Loss 7.5136   LearningRate 0.0304   Epoch: 8   Global Step: 372410   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:25,013-Speed 2628.86 samples/sec   Loss 7.4908   LearningRate 0.0304   Epoch: 8   Global Step: 372420   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:28,910-Speed 2627.95 samples/sec   Loss 7.4407   LearningRate 0.0304   Epoch: 8   Global Step: 372430   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:32,802-Speed 2631.76 samples/sec   Loss 7.4390   LearningRate 0.0304   Epoch: 8   Global Step: 372440   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:34:36,704-Speed 2624.38 samples/sec   Loss 7.4867   LearningRate 0.0304   Epoch: 8   Global Step: 372450   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:34:40,604-Speed 2626.49 samples/sec   Loss 7.6030   LearningRate 0.0304   Epoch: 8   Global Step: 372460   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:34:44,540-Speed 2602.45 samples/sec   Loss 7.5575   LearningRate 0.0304   Epoch: 8   Global Step: 372470   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:34:48,430-Speed 2633.17 samples/sec   Loss 7.5026   LearningRate 0.0304   Epoch: 8   Global Step: 372480   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:34:52,330-Speed 2626.07 samples/sec   Loss 7.4491   LearningRate 0.0304   Epoch: 8   Global Step: 372490   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:34:56,220-Speed 2633.22 samples/sec   Loss 7.3626   LearningRate 0.0304   Epoch: 8   Global Step: 372500   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:35:00,113-Speed 2630.82 samples/sec   Loss 7.4455   LearningRate 0.0304   Epoch: 8   Global Step: 372510   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:35:04,009-Speed 2629.03 samples/sec   Loss 7.4953   LearningRate 0.0304   Epoch: 8   Global Step: 372520   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:35:07,923-Speed 2616.22 samples/sec   Loss 7.4185   LearningRate 0.0304   Epoch: 8   Global Step: 372530   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:35:11,810-Speed 2634.84 samples/sec   Loss 7.4540   LearningRate 0.0304   Epoch: 8   Global Step: 372540   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:35:15,713-Speed 2624.56 samples/sec   Loss 7.5172   LearningRate 0.0304   Epoch: 8   Global Step: 372550   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:19,611-Speed 2627.54 samples/sec   Loss 7.4529   LearningRate 0.0303   Epoch: 8   Global Step: 372560   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:23,498-Speed 2635.36 samples/sec   Loss 7.4982   LearningRate 0.0303   Epoch: 8   Global Step: 372570   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:27,416-Speed 2614.13 samples/sec   Loss 7.3438   LearningRate 0.0303   Epoch: 8   Global Step: 372580   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:31,310-Speed 2630.16 samples/sec   Loss 7.5610   LearningRate 0.0303   Epoch: 8   Global Step: 372590   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:35,203-Speed 2630.96 samples/sec   Loss 7.4525   LearningRate 0.0303   Epoch: 8   Global Step: 372600   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:39,097-Speed 2630.40 samples/sec   Loss 7.4560   LearningRate 0.0303   Epoch: 8   Global Step: 372610   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:42,990-Speed 2630.65 samples/sec   Loss 7.3108   LearningRate 0.0303   Epoch: 8   Global Step: 372620   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:46,891-Speed 2625.20 samples/sec   Loss 7.4476   LearningRate 0.0303   Epoch: 8   Global Step: 372630   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:50,788-Speed 2628.99 samples/sec   Loss 7.5975   LearningRate 0.0303   Epoch: 8   Global Step: 372640   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:35:54,686-Speed 2626.83 samples/sec   Loss 7.5601   LearningRate 0.0303   Epoch: 8   Global Step: 372650   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:35:58,583-Speed 2629.08 samples/sec   Loss 7.3586   LearningRate 0.0303   Epoch: 8   Global Step: 372660   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:02,496-Speed 2617.12 samples/sec   Loss 7.6112   LearningRate 0.0303   Epoch: 8   Global Step: 372670   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:06,385-Speed 2633.63 samples/sec   Loss 7.6459   LearningRate 0.0303   Epoch: 8   Global Step: 372680   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:10,287-Speed 2625.07 samples/sec   Loss 7.3563   LearningRate 0.0303   Epoch: 8   Global Step: 372690   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:14,181-Speed 2631.30 samples/sec   Loss 7.5531   LearningRate 0.0303   Epoch: 8   Global Step: 372700   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:18,079-Speed 2627.72 samples/sec   Loss 7.4365   LearningRate 0.0303   Epoch: 8   Global Step: 372710   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:21,971-Speed 2631.11 samples/sec   Loss 7.5500   LearningRate 0.0303   Epoch: 8   Global Step: 372720   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:25,875-Speed 2623.96 samples/sec   Loss 7.5778   LearningRate 0.0303   Epoch: 8   Global Step: 372730   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:29,762-Speed 2634.88 samples/sec   Loss 7.4442   LearningRate 0.0303   Epoch: 8   Global Step: 372740   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:33,655-Speed 2631.41 samples/sec   Loss 7.4030   LearningRate 0.0303   Epoch: 8   Global Step: 372750   Fp16 Grad Scale: 262144   Required: 51 hours
Training: 2022-04-14 13:36:37,531-Speed 2642.69 samples/sec   Loss 7.4245   LearningRate 0.0303   Epoch: 8   Global Step: 372760   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:41,436-Speed 2622.17 samples/sec   Loss 7.4253   LearningRate 0.0303   Epoch: 8   Global Step: 372770   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:45,349-Speed 2617.87 samples/sec   Loss 7.5723   LearningRate 0.0303   Epoch: 8   Global Step: 372780   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:49,238-Speed 2633.68 samples/sec   Loss 7.3939   LearningRate 0.0303   Epoch: 8   Global Step: 372790   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:53,137-Speed 2626.94 samples/sec   Loss 7.4871   LearningRate 0.0303   Epoch: 8   Global Step: 372800   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:36:57,025-Speed 2634.83 samples/sec   Loss 7.5026   LearningRate 0.0303   Epoch: 8   Global Step: 372810   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:37:00,903-Speed 2641.29 samples/sec   Loss 7.7252   LearningRate 0.0303   Epoch: 8   Global Step: 372820   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:04,798-Speed 2629.88 samples/sec   Loss 7.5417   LearningRate 0.0303   Epoch: 8   Global Step: 372830   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:08,693-Speed 2629.37 samples/sec   Loss 7.4269   LearningRate 0.0303   Epoch: 8   Global Step: 372840   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:12,596-Speed 2624.60 samples/sec   Loss 7.4005   LearningRate 0.0303   Epoch: 8   Global Step: 372850   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:16,492-Speed 2628.63 samples/sec   Loss 7.3288   LearningRate 0.0303   Epoch: 8   Global Step: 372860   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:20,387-Speed 2629.51 samples/sec   Loss 7.4657   LearningRate 0.0303   Epoch: 8   Global Step: 372870   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:24,296-Speed 2621.09 samples/sec   Loss 7.4033   LearningRate 0.0303   Epoch: 8   Global Step: 372880   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:28,192-Speed 2628.96 samples/sec   Loss 7.4591   LearningRate 0.0303   Epoch: 8   Global Step: 372890   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:32,180-Speed 2568.15 samples/sec   Loss 7.3791   LearningRate 0.0303   Epoch: 8   Global Step: 372900   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:36,142-Speed 2585.24 samples/sec   Loss 7.3930   LearningRate 0.0303   Epoch: 8   Global Step: 372910   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:37:40,032-Speed 2633.03 samples/sec   Loss 7.4531   LearningRate 0.0303   Epoch: 8   Global Step: 372920   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:37:43,929-Speed 2627.99 samples/sec   Loss 7.5344   LearningRate 0.0303   Epoch: 8   Global Step: 372930   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:37:47,821-Speed 2632.69 samples/sec   Loss 7.5043   LearningRate 0.0303   Epoch: 8   Global Step: 372940   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:37:51,719-Speed 2627.45 samples/sec   Loss 7.4612   LearningRate 0.0303   Epoch: 8   Global Step: 372950   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:37:55,650-Speed 2605.89 samples/sec   Loss 7.4087   LearningRate 0.0303   Epoch: 8   Global Step: 372960   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:37:59,558-Speed 2620.91 samples/sec   Loss 7.4377   LearningRate 0.0303   Epoch: 8   Global Step: 372970   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:38:03,446-Speed 2634.35 samples/sec   Loss 7.5648   LearningRate 0.0303   Epoch: 8   Global Step: 372980   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:38:07,345-Speed 2627.14 samples/sec   Loss 7.5379   LearningRate 0.0303   Epoch: 8   Global Step: 372990   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:38:11,250-Speed 2622.71 samples/sec   Loss 7.5402   LearningRate 0.0303   Epoch: 8   Global Step: 373000   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:38:15,148-Speed 2627.74 samples/sec   Loss 7.4115   LearningRate 0.0303   Epoch: 8   Global Step: 373010   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:38:19,055-Speed 2621.79 samples/sec   Loss 7.4359   LearningRate 0.0303   Epoch: 8   Global Step: 373020   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:38:22,958-Speed 2624.39 samples/sec   Loss 7.4563   LearningRate 0.0303   Epoch: 8   Global Step: 373030   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:38:26,850-Speed 2631.03 samples/sec   Loss 7.5334   LearningRate 0.0303   Epoch: 8   Global Step: 373040   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:38:30,713-Speed 2651.59 samples/sec   Loss 7.3986   LearningRate 0.0303   Epoch: 8   Global Step: 373050   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:38:34,615-Speed 2625.66 samples/sec   Loss 7.4284   LearningRate 0.0303   Epoch: 8   Global Step: 373060   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:38:38,552-Speed 2601.52 samples/sec   Loss 7.5169   LearningRate 0.0303   Epoch: 8   Global Step: 373070   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:38:42,457-Speed 2622.81 samples/sec   Loss 7.6204   LearningRate 0.0303   Epoch: 8   Global Step: 373080   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:38:46,348-Speed 2632.72 samples/sec   Loss 7.4529   LearningRate 0.0303   Epoch: 8   Global Step: 373090   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:38:50,242-Speed 2630.28 samples/sec   Loss 7.4858   LearningRate 0.0303   Epoch: 8   Global Step: 373100   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:38:54,163-Speed 2612.48 samples/sec   Loss 7.4002   LearningRate 0.0303   Epoch: 8   Global Step: 373110   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:38:58,085-Speed 2611.79 samples/sec   Loss 7.4061   LearningRate 0.0303   Epoch: 8   Global Step: 373120   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:01,974-Speed 2633.13 samples/sec   Loss 7.4264   LearningRate 0.0303   Epoch: 8   Global Step: 373130   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:05,865-Speed 2632.07 samples/sec   Loss 7.4362   LearningRate 0.0303   Epoch: 8   Global Step: 373140   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:09,756-Speed 2632.93 samples/sec   Loss 7.4270   LearningRate 0.0303   Epoch: 8   Global Step: 373150   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:39:13,649-Speed 2630.40 samples/sec   Loss 7.3783   LearningRate 0.0303   Epoch: 8   Global Step: 373160   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:39:17,535-Speed 2636.22 samples/sec   Loss 7.5702   LearningRate 0.0303   Epoch: 8   Global Step: 373170   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:21,442-Speed 2621.66 samples/sec   Loss 7.9409   LearningRate 0.0303   Epoch: 8   Global Step: 373180   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:25,355-Speed 2617.74 samples/sec   Loss 7.4198   LearningRate 0.0303   Epoch: 8   Global Step: 373190   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:29,241-Speed 2635.33 samples/sec   Loss 7.3569   LearningRate 0.0303   Epoch: 8   Global Step: 373200   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:33,134-Speed 2631.00 samples/sec   Loss 7.3734   LearningRate 0.0303   Epoch: 8   Global Step: 373210   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:37,033-Speed 2626.91 samples/sec   Loss 7.4176   LearningRate 0.0303   Epoch: 8   Global Step: 373220   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:40,926-Speed 2630.66 samples/sec   Loss 7.4716   LearningRate 0.0303   Epoch: 8   Global Step: 373230   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:44,826-Speed 2626.28 samples/sec   Loss 7.4529   LearningRate 0.0303   Epoch: 8   Global Step: 373240   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:48,715-Speed 2634.09 samples/sec   Loss 7.5218   LearningRate 0.0303   Epoch: 8   Global Step: 373250   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:52,664-Speed 2593.94 samples/sec   Loss 7.5672   LearningRate 0.0303   Epoch: 8   Global Step: 373260   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:39:56,584-Speed 2612.74 samples/sec   Loss 7.5567   LearningRate 0.0303   Epoch: 8   Global Step: 373270   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:00,473-Speed 2634.19 samples/sec   Loss 7.5259   LearningRate 0.0303   Epoch: 8   Global Step: 373280   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:04,375-Speed 2624.96 samples/sec   Loss 7.4809   LearningRate 0.0303   Epoch: 8   Global Step: 373290   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:08,285-Speed 2619.48 samples/sec   Loss 7.6564   LearningRate 0.0303   Epoch: 8   Global Step: 373300   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:12,201-Speed 2615.17 samples/sec   Loss 7.4061   LearningRate 0.0303   Epoch: 8   Global Step: 373310   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:33,756-Speed 475.10 samples/sec   Loss 7.3694   LearningRate 0.0302   Epoch: 9   Global Step: 373320   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:37,639-Speed 2638.28 samples/sec   Loss 7.3945   LearningRate 0.0302   Epoch: 9   Global Step: 373330   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:41,618-Speed 2573.93 samples/sec   Loss 7.4375   LearningRate 0.0302   Epoch: 9   Global Step: 373340   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:45,597-Speed 2574.37 samples/sec   Loss 7.4042   LearningRate 0.0302   Epoch: 9   Global Step: 373350   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:49,485-Speed 2635.11 samples/sec   Loss 7.5298   LearningRate 0.0302   Epoch: 9   Global Step: 373360   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:40:53,373-Speed 2633.90 samples/sec   Loss 7.6136   LearningRate 0.0302   Epoch: 9   Global Step: 373370   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:40:57,278-Speed 2623.26 samples/sec   Loss 7.4064   LearningRate 0.0302   Epoch: 9   Global Step: 373380   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:41:01,241-Speed 2584.83 samples/sec   Loss 7.3080   LearningRate 0.0302   Epoch: 9   Global Step: 373390   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:41:05,135-Speed 2630.09 samples/sec   Loss 7.3934   LearningRate 0.0302   Epoch: 9   Global Step: 373400   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:41:09,132-Speed 2562.44 samples/sec   Loss 7.3376   LearningRate 0.0302   Epoch: 9   Global Step: 373410   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:41:13,041-Speed 2620.60 samples/sec   Loss 7.4863   LearningRate 0.0302   Epoch: 9   Global Step: 373420   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:41:16,899-Speed 2655.11 samples/sec   Loss 7.4832   LearningRate 0.0302   Epoch: 9   Global Step: 373430   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:20,814-Speed 2616.01 samples/sec   Loss 7.5234   LearningRate 0.0302   Epoch: 9   Global Step: 373440   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:24,710-Speed 2629.03 samples/sec   Loss 7.5608   LearningRate 0.0302   Epoch: 9   Global Step: 373450   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:28,611-Speed 2626.02 samples/sec   Loss 7.4573   LearningRate 0.0302   Epoch: 9   Global Step: 373460   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:32,506-Speed 2629.40 samples/sec   Loss 7.4742   LearningRate 0.0302   Epoch: 9   Global Step: 373470   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:36,401-Speed 2629.67 samples/sec   Loss 7.5138   LearningRate 0.0302   Epoch: 9   Global Step: 373480   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:40,307-Speed 2621.94 samples/sec   Loss 7.4580   LearningRate 0.0302   Epoch: 9   Global Step: 373490   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:44,205-Speed 2632.09 samples/sec   Loss 7.5467   LearningRate 0.0302   Epoch: 9   Global Step: 373500   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:48,119-Speed 2616.44 samples/sec   Loss 7.5418   LearningRate 0.0302   Epoch: 9   Global Step: 373510   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:52,039-Speed 2613.80 samples/sec   Loss 7.4280   LearningRate 0.0302   Epoch: 9   Global Step: 373520   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:41:55,935-Speed 2628.40 samples/sec   Loss 7.3102   LearningRate 0.0302   Epoch: 9   Global Step: 373530   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:41:59,860-Speed 2610.35 samples/sec   Loss 7.4031   LearningRate 0.0302   Epoch: 9   Global Step: 373540   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:03,754-Speed 2630.34 samples/sec   Loss 7.4495   LearningRate 0.0302   Epoch: 9   Global Step: 373550   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:07,680-Speed 2608.76 samples/sec   Loss 7.5505   LearningRate 0.0302   Epoch: 9   Global Step: 373560   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:11,590-Speed 2619.00 samples/sec   Loss 7.2894   LearningRate 0.0302   Epoch: 9   Global Step: 373570   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:15,498-Speed 2621.31 samples/sec   Loss 7.4615   LearningRate 0.0302   Epoch: 9   Global Step: 373580   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:19,421-Speed 2611.22 samples/sec   Loss 7.4524   LearningRate 0.0302   Epoch: 9   Global Step: 373590   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:23,312-Speed 2632.40 samples/sec   Loss 7.3752   LearningRate 0.0302   Epoch: 9   Global Step: 373600   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:27,210-Speed 2627.69 samples/sec   Loss 7.4590   LearningRate 0.0302   Epoch: 9   Global Step: 373610   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:31,109-Speed 2627.31 samples/sec   Loss 7.3804   LearningRate 0.0302   Epoch: 9   Global Step: 373620   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:35,006-Speed 2628.25 samples/sec   Loss 7.4317   LearningRate 0.0302   Epoch: 9   Global Step: 373630   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:42:38,913-Speed 2622.00 samples/sec   Loss 7.2822   LearningRate 0.0302   Epoch: 9   Global Step: 373640   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:42:42,802-Speed 2632.96 samples/sec   Loss 7.3589   LearningRate 0.0302   Epoch: 9   Global Step: 373650   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:46,698-Speed 2629.70 samples/sec   Loss 7.4248   LearningRate 0.0302   Epoch: 9   Global Step: 373660   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:50,593-Speed 2629.30 samples/sec   Loss 7.4881   LearningRate 0.0302   Epoch: 9   Global Step: 373670   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:54,487-Speed 2630.56 samples/sec   Loss 7.3089   LearningRate 0.0302   Epoch: 9   Global Step: 373680   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:42:58,382-Speed 2629.65 samples/sec   Loss 7.2710   LearningRate 0.0302   Epoch: 9   Global Step: 373690   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:43:02,277-Speed 2629.33 samples/sec   Loss 7.4362   LearningRate 0.0302   Epoch: 9   Global Step: 373700   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:43:06,177-Speed 2625.93 samples/sec   Loss 7.4176   LearningRate 0.0302   Epoch: 9   Global Step: 373710   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:43:10,082-Speed 2623.08 samples/sec   Loss 7.2875   LearningRate 0.0302   Epoch: 9   Global Step: 373720   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:43:13,987-Speed 2622.92 samples/sec   Loss 7.5301   LearningRate 0.0302   Epoch: 9   Global Step: 373730   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:43:17,887-Speed 2626.62 samples/sec   Loss 7.3908   LearningRate 0.0302   Epoch: 9   Global Step: 373740   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:43:21,784-Speed 2628.49 samples/sec   Loss 7.4588   LearningRate 0.0302   Epoch: 9   Global Step: 373750   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:43:25,762-Speed 2575.21 samples/sec   Loss 7.3721   LearningRate 0.0302   Epoch: 9   Global Step: 373760   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:43:29,654-Speed 2631.47 samples/sec   Loss 7.4106   LearningRate 0.0302   Epoch: 9   Global Step: 373770   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:43:33,509-Speed 2656.82 samples/sec   Loss 7.5137   LearningRate 0.0302   Epoch: 9   Global Step: 373780   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:43:37,412-Speed 2623.62 samples/sec   Loss 7.4801   LearningRate 0.0302   Epoch: 9   Global Step: 373790   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:43:41,311-Speed 2627.39 samples/sec   Loss 7.3299   LearningRate 0.0302   Epoch: 9   Global Step: 373800   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:43:45,209-Speed 2627.52 samples/sec   Loss 7.3562   LearningRate 0.0302   Epoch: 9   Global Step: 373810   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:43:49,105-Speed 2628.87 samples/sec   Loss 7.3772   LearningRate 0.0302   Epoch: 9   Global Step: 373820   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:43:53,006-Speed 2626.01 samples/sec   Loss 7.4895   LearningRate 0.0302   Epoch: 9   Global Step: 373830   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:43:56,910-Speed 2623.93 samples/sec   Loss 7.3557   LearningRate 0.0302   Epoch: 9   Global Step: 373840   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:44:00,810-Speed 2626.18 samples/sec   Loss 7.3088   LearningRate 0.0302   Epoch: 9   Global Step: 373850   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:44:04,704-Speed 2629.53 samples/sec   Loss 7.4116   LearningRate 0.0302   Epoch: 9   Global Step: 373860   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:44:08,599-Speed 2630.24 samples/sec   Loss 7.4271   LearningRate 0.0302   Epoch: 9   Global Step: 373870   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:44:12,493-Speed 2630.09 samples/sec   Loss 7.4085   LearningRate 0.0302   Epoch: 9   Global Step: 373880   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:44:16,401-Speed 2620.98 samples/sec   Loss 7.5625   LearningRate 0.0302   Epoch: 9   Global Step: 373890   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:44:20,303-Speed 2624.69 samples/sec   Loss 7.4500   LearningRate 0.0302   Epoch: 9   Global Step: 373900   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:44:24,194-Speed 2632.79 samples/sec   Loss 7.3656   LearningRate 0.0302   Epoch: 9   Global Step: 373910   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:44:28,105-Speed 2618.91 samples/sec   Loss 7.3412   LearningRate 0.0302   Epoch: 9   Global Step: 373920   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:44:32,059-Speed 2590.05 samples/sec   Loss 7.4831   LearningRate 0.0302   Epoch: 9   Global Step: 373930   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:44:35,918-Speed 2654.52 samples/sec   Loss 7.4263   LearningRate 0.0302   Epoch: 9   Global Step: 373940   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:44:39,810-Speed 2631.98 samples/sec   Loss 8.3790   LearningRate 0.0302   Epoch: 9   Global Step: 373950   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:44:43,698-Speed 2633.75 samples/sec   Loss 7.6278   LearningRate 0.0302   Epoch: 9   Global Step: 373960   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:44:47,582-Speed 2637.53 samples/sec   Loss 7.3650   LearningRate 0.0302   Epoch: 9   Global Step: 373970   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:44:51,472-Speed 2633.41 samples/sec   Loss 7.4469   LearningRate 0.0302   Epoch: 9   Global Step: 373980   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:44:55,363-Speed 2632.32 samples/sec   Loss 7.4616   LearningRate 0.0302   Epoch: 9   Global Step: 373990   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:44:59,258-Speed 2629.40 samples/sec   Loss 7.3239   LearningRate 0.0302   Epoch: 9   Global Step: 374000   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:45:03,155-Speed 2628.20 samples/sec   Loss 7.4125   LearningRate 0.0302   Epoch: 9   Global Step: 374010   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:45:07,042-Speed 2635.52 samples/sec   Loss 7.5239   LearningRate 0.0302   Epoch: 9   Global Step: 374020   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:45:10,933-Speed 2632.48 samples/sec   Loss 7.4036   LearningRate 0.0302   Epoch: 9   Global Step: 374030   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:45:14,837-Speed 2623.84 samples/sec   Loss 7.3928   LearningRate 0.0302   Epoch: 9   Global Step: 374040   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:18,729-Speed 2631.44 samples/sec   Loss 7.3568   LearningRate 0.0302   Epoch: 9   Global Step: 374050   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:22,624-Speed 2630.63 samples/sec   Loss 7.4085   LearningRate 0.0302   Epoch: 9   Global Step: 374060   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:26,514-Speed 2632.66 samples/sec   Loss 7.4283   LearningRate 0.0301   Epoch: 9   Global Step: 374070   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:30,415-Speed 2625.75 samples/sec   Loss 7.4839   LearningRate 0.0301   Epoch: 9   Global Step: 374080   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:34,327-Speed 2618.38 samples/sec   Loss 7.3327   LearningRate 0.0301   Epoch: 9   Global Step: 374090   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:38,251-Speed 2610.30 samples/sec   Loss 7.3562   LearningRate 0.0301   Epoch: 9   Global Step: 374100   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:42,146-Speed 2629.56 samples/sec   Loss 7.4828   LearningRate 0.0301   Epoch: 9   Global Step: 374110   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:46,214-Speed 2517.78 samples/sec   Loss 7.3694   LearningRate 0.0301   Epoch: 9   Global Step: 374120   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:50,125-Speed 2619.85 samples/sec   Loss 7.4593   LearningRate 0.0301   Epoch: 9   Global Step: 374130   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:45:54,019-Speed 2629.75 samples/sec   Loss 7.2693   LearningRate 0.0301   Epoch: 9   Global Step: 374140   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:45:57,928-Speed 2620.56 samples/sec   Loss 7.4375   LearningRate 0.0301   Epoch: 9   Global Step: 374150   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:01,835-Speed 2621.38 samples/sec   Loss 7.3382   LearningRate 0.0301   Epoch: 9   Global Step: 374160   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:05,738-Speed 2624.42 samples/sec   Loss 7.5469   LearningRate 0.0301   Epoch: 9   Global Step: 374170   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:09,637-Speed 2627.05 samples/sec   Loss 7.3090   LearningRate 0.0301   Epoch: 9   Global Step: 374180   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:13,549-Speed 2617.94 samples/sec   Loss 7.2263   LearningRate 0.0301   Epoch: 9   Global Step: 374190   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:17,453-Speed 2624.28 samples/sec   Loss 7.1738   LearningRate 0.0301   Epoch: 9   Global Step: 374200   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:21,348-Speed 2629.67 samples/sec   Loss 7.3949   LearningRate 0.0301   Epoch: 9   Global Step: 374210   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:25,240-Speed 2631.65 samples/sec   Loss 7.4886   LearningRate 0.0301   Epoch: 9   Global Step: 374220   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:29,147-Speed 2621.81 samples/sec   Loss 7.4113   LearningRate 0.0301   Epoch: 9   Global Step: 374230   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:46:33,049-Speed 2624.76 samples/sec   Loss 7.4325   LearningRate 0.0301   Epoch: 9   Global Step: 374240   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:46:36,946-Speed 2628.24 samples/sec   Loss 7.3989   LearningRate 0.0301   Epoch: 9   Global Step: 374250   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:46:40,873-Speed 2608.34 samples/sec   Loss 7.3477   LearningRate 0.0301   Epoch: 9   Global Step: 374260   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:46:44,898-Speed 2544.67 samples/sec   Loss 7.4270   LearningRate 0.0301   Epoch: 9   Global Step: 374270   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:46:48,788-Speed 2632.86 samples/sec   Loss 7.4188   LearningRate 0.0301   Epoch: 9   Global Step: 374280   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:46:52,683-Speed 2630.22 samples/sec   Loss 7.4149   LearningRate 0.0301   Epoch: 9   Global Step: 374290   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:46:56,589-Speed 2621.62 samples/sec   Loss 7.4154   LearningRate 0.0301   Epoch: 9   Global Step: 374300   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:47:00,491-Speed 2625.17 samples/sec   Loss 7.4257   LearningRate 0.0301   Epoch: 9   Global Step: 374310   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:47:04,399-Speed 2620.94 samples/sec   Loss 7.4275   LearningRate 0.0301   Epoch: 9   Global Step: 374320   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:47:08,299-Speed 2626.25 samples/sec   Loss 7.3514   LearningRate 0.0301   Epoch: 9   Global Step: 374330   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:47:12,200-Speed 2625.59 samples/sec   Loss 7.4689   LearningRate 0.0301   Epoch: 9   Global Step: 374340   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:47:16,116-Speed 2615.85 samples/sec   Loss 7.2877   LearningRate 0.0301   Epoch: 9   Global Step: 374350   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:47:20,020-Speed 2624.10 samples/sec   Loss 7.5158   LearningRate 0.0301   Epoch: 9   Global Step: 374360   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:47:23,932-Speed 2617.91 samples/sec   Loss 7.2775   LearningRate 0.0301   Epoch: 9   Global Step: 374370   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:47:27,831-Speed 2627.30 samples/sec   Loss 7.3571   LearningRate 0.0301   Epoch: 9   Global Step: 374380   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:47:31,691-Speed 2653.34 samples/sec   Loss 7.8395   LearningRate 0.0301   Epoch: 9   Global Step: 374390   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:47:35,584-Speed 2630.80 samples/sec   Loss 7.8361   LearningRate 0.0301   Epoch: 9   Global Step: 374400   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:47:39,482-Speed 2627.64 samples/sec   Loss 7.3236   LearningRate 0.0301   Epoch: 9   Global Step: 374410   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:47:43,396-Speed 2616.88 samples/sec   Loss 7.3669   LearningRate 0.0301   Epoch: 9   Global Step: 374420   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:47:47,294-Speed 2627.64 samples/sec   Loss 7.4412   LearningRate 0.0301   Epoch: 9   Global Step: 374430   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:47:51,184-Speed 2633.61 samples/sec   Loss 7.4261   LearningRate 0.0301   Epoch: 9   Global Step: 374440   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:47:55,073-Speed 2633.61 samples/sec   Loss 7.3970   LearningRate 0.0301   Epoch: 9   Global Step: 374450   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:47:58,969-Speed 2628.69 samples/sec   Loss 7.3617   LearningRate 0.0301   Epoch: 9   Global Step: 374460   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:48:02,868-Speed 2627.59 samples/sec   Loss 7.5678   LearningRate 0.0301   Epoch: 9   Global Step: 374470   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:48:06,767-Speed 2626.76 samples/sec   Loss 7.6299   LearningRate 0.0301   Epoch: 9   Global Step: 374480   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:48:10,675-Speed 2620.58 samples/sec   Loss 7.5211   LearningRate 0.0301   Epoch: 9   Global Step: 374490   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:14,561-Speed 2635.99 samples/sec   Loss 7.4017   LearningRate 0.0301   Epoch: 9   Global Step: 374500   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:18,463-Speed 2625.03 samples/sec   Loss 7.3727   LearningRate 0.0301   Epoch: 9   Global Step: 374510   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:22,363-Speed 2626.42 samples/sec   Loss 7.5138   LearningRate 0.0301   Epoch: 9   Global Step: 374520   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:26,254-Speed 2632.13 samples/sec   Loss 7.3012   LearningRate 0.0301   Epoch: 9   Global Step: 374530   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:30,153-Speed 2627.63 samples/sec   Loss 7.3520   LearningRate 0.0301   Epoch: 9   Global Step: 374540   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:34,047-Speed 2629.87 samples/sec   Loss 7.2639   LearningRate 0.0301   Epoch: 9   Global Step: 374550   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:37,945-Speed 2627.92 samples/sec   Loss 7.4496   LearningRate 0.0301   Epoch: 9   Global Step: 374560   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:41,842-Speed 2628.56 samples/sec   Loss 7.4161   LearningRate 0.0301   Epoch: 9   Global Step: 374570   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:45,741-Speed 2627.00 samples/sec   Loss 7.3643   LearningRate 0.0301   Epoch: 9   Global Step: 374580   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:48:49,637-Speed 2629.25 samples/sec   Loss 7.5423   LearningRate 0.0301   Epoch: 9   Global Step: 374590   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:48:53,523-Speed 2635.38 samples/sec   Loss 7.3772   LearningRate 0.0301   Epoch: 9   Global Step: 374600   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:48:57,425-Speed 2624.76 samples/sec   Loss 7.3541   LearningRate 0.0301   Epoch: 9   Global Step: 374610   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:01,324-Speed 2627.03 samples/sec   Loss 7.4441   LearningRate 0.0301   Epoch: 9   Global Step: 374620   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:05,224-Speed 2626.64 samples/sec   Loss 7.4053   LearningRate 0.0301   Epoch: 9   Global Step: 374630   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:09,132-Speed 2620.94 samples/sec   Loss 7.3927   LearningRate 0.0301   Epoch: 9   Global Step: 374640   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:13,049-Speed 2614.70 samples/sec   Loss 7.5037   LearningRate 0.0301   Epoch: 9   Global Step: 374650   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:16,943-Speed 2630.29 samples/sec   Loss 7.4029   LearningRate 0.0301   Epoch: 9   Global Step: 374660   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:20,834-Speed 2632.49 samples/sec   Loss 7.4105   LearningRate 0.0301   Epoch: 9   Global Step: 374670   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:24,735-Speed 2625.25 samples/sec   Loss 7.4889   LearningRate 0.0301   Epoch: 9   Global Step: 374680   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:28,634-Speed 2627.35 samples/sec   Loss 7.5163   LearningRate 0.0301   Epoch: 9   Global Step: 374690   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:49:32,515-Speed 2638.73 samples/sec   Loss 7.5228   LearningRate 0.0301   Epoch: 9   Global Step: 374700   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:36,410-Speed 2629.66 samples/sec   Loss 7.5321   LearningRate 0.0301   Epoch: 9   Global Step: 374710   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:40,306-Speed 2629.25 samples/sec   Loss 7.4021   LearningRate 0.0301   Epoch: 9   Global Step: 374720   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:44,198-Speed 2631.79 samples/sec   Loss 7.4446   LearningRate 0.0301   Epoch: 9   Global Step: 374730   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:48,104-Speed 2621.83 samples/sec   Loss 7.4940   LearningRate 0.0301   Epoch: 9   Global Step: 374740   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:51,994-Speed 2633.31 samples/sec   Loss 7.3304   LearningRate 0.0301   Epoch: 9   Global Step: 374750   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:55,889-Speed 2629.92 samples/sec   Loss 7.4165   LearningRate 0.0301   Epoch: 9   Global Step: 374760   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:49:59,798-Speed 2619.90 samples/sec   Loss 7.3708   LearningRate 0.0301   Epoch: 9   Global Step: 374770   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:50:03,688-Speed 2632.99 samples/sec   Loss 7.2917   LearningRate 0.0301   Epoch: 9   Global Step: 374780   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:50:07,582-Speed 2630.76 samples/sec   Loss 7.4613   LearningRate 0.0301   Epoch: 9   Global Step: 374790   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:50:11,482-Speed 2626.78 samples/sec   Loss 7.4664   LearningRate 0.0301   Epoch: 9   Global Step: 374800   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:50:15,373-Speed 2632.24 samples/sec   Loss 7.4959   LearningRate 0.0301   Epoch: 9   Global Step: 374810   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:50:19,273-Speed 2625.91 samples/sec   Loss 7.3600   LearningRate 0.0301   Epoch: 9   Global Step: 374820   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:50:23,179-Speed 2623.14 samples/sec   Loss 7.4406   LearningRate 0.0300   Epoch: 9   Global Step: 374830   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:50:27,079-Speed 2625.77 samples/sec   Loss 7.4683   LearningRate 0.0300   Epoch: 9   Global Step: 374840   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:50:30,983-Speed 2623.67 samples/sec   Loss 7.4364   LearningRate 0.0300   Epoch: 9   Global Step: 374850   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:50:34,848-Speed 2650.33 samples/sec   Loss 7.7852   LearningRate 0.0300   Epoch: 9   Global Step: 374860   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:50:38,716-Speed 2648.40 samples/sec   Loss 7.8912   LearningRate 0.0300   Epoch: 9   Global Step: 374870   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:50:42,607-Speed 2632.03 samples/sec   Loss 7.5438   LearningRate 0.0300   Epoch: 9   Global Step: 374880   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:50:46,502-Speed 2629.93 samples/sec   Loss 7.2349   LearningRate 0.0300   Epoch: 9   Global Step: 374890   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:50:50,401-Speed 2626.55 samples/sec   Loss 7.5156   LearningRate 0.0300   Epoch: 9   Global Step: 374900   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:50:54,306-Speed 2623.13 samples/sec   Loss 7.3620   LearningRate 0.0300   Epoch: 9   Global Step: 374910   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:50:58,201-Speed 2629.81 samples/sec   Loss 7.3685   LearningRate 0.0300   Epoch: 9   Global Step: 374920   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:51:02,100-Speed 2626.63 samples/sec   Loss 7.4385   LearningRate 0.0300   Epoch: 9   Global Step: 374930   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:51:06,000-Speed 2626.17 samples/sec   Loss 7.2715   LearningRate 0.0300   Epoch: 9   Global Step: 374940   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:51:09,899-Speed 2627.47 samples/sec   Loss 7.3971   LearningRate 0.0300   Epoch: 9   Global Step: 374950   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:51:13,805-Speed 2622.15 samples/sec   Loss 7.3840   LearningRate 0.0300   Epoch: 9   Global Step: 374960   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 13:51:17,717-Speed 2618.01 samples/sec   Loss 7.3769   LearningRate 0.0300   Epoch: 9   Global Step: 374970   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:21,614-Speed 2628.32 samples/sec   Loss 7.5137   LearningRate 0.0300   Epoch: 9   Global Step: 374980   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:25,504-Speed 2632.95 samples/sec   Loss 7.3920   LearningRate 0.0300   Epoch: 9   Global Step: 374990   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:29,399-Speed 2630.02 samples/sec   Loss 7.4898   LearningRate 0.0300   Epoch: 9   Global Step: 375000   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:33,296-Speed 2628.31 samples/sec   Loss 7.5158   LearningRate 0.0300   Epoch: 9   Global Step: 375010   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:37,212-Speed 2614.94 samples/sec   Loss 7.3054   LearningRate 0.0300   Epoch: 9   Global Step: 375020   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:41,104-Speed 2631.42 samples/sec   Loss 7.4192   LearningRate 0.0300   Epoch: 9   Global Step: 375030   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:44,998-Speed 2630.41 samples/sec   Loss 7.5172   LearningRate 0.0300   Epoch: 9   Global Step: 375040   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:48,889-Speed 2632.35 samples/sec   Loss 7.4724   LearningRate 0.0300   Epoch: 9   Global Step: 375050   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:52,781-Speed 2631.96 samples/sec   Loss 7.5040   LearningRate 0.0300   Epoch: 9   Global Step: 375060   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:51:56,677-Speed 2628.59 samples/sec   Loss 7.4361   LearningRate 0.0300   Epoch: 9   Global Step: 375070   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:00,578-Speed 2626.26 samples/sec   Loss 7.2609   LearningRate 0.0300   Epoch: 9   Global Step: 375080   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:04,501-Speed 2610.65 samples/sec   Loss 7.3696   LearningRate 0.0300   Epoch: 9   Global Step: 375090   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:08,401-Speed 2626.11 samples/sec   Loss 7.4396   LearningRate 0.0300   Epoch: 9   Global Step: 375100   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:12,298-Speed 2628.26 samples/sec   Loss 7.4146   LearningRate 0.0300   Epoch: 9   Global Step: 375110   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:16,320-Speed 2546.86 samples/sec   Loss 7.4130   LearningRate 0.0300   Epoch: 9   Global Step: 375120   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:20,258-Speed 2600.83 samples/sec   Loss 7.4198   LearningRate 0.0300   Epoch: 9   Global Step: 375130   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:24,231-Speed 2577.93 samples/sec   Loss 7.4761   LearningRate 0.0300   Epoch: 9   Global Step: 375140   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:28,180-Speed 2594.12 samples/sec   Loss 7.5133   LearningRate 0.0300   Epoch: 9   Global Step: 375150   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:32,081-Speed 2625.54 samples/sec   Loss 7.4076   LearningRate 0.0300   Epoch: 9   Global Step: 375160   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:52:35,983-Speed 2624.80 samples/sec   Loss 7.2928   LearningRate 0.0300   Epoch: 9   Global Step: 375170   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:52:39,877-Speed 2630.27 samples/sec   Loss 7.5476   LearningRate 0.0300   Epoch: 9   Global Step: 375180   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:52:43,832-Speed 2589.84 samples/sec   Loss 7.4577   LearningRate 0.0300   Epoch: 9   Global Step: 375190   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:52:47,743-Speed 2619.06 samples/sec   Loss 7.3979   LearningRate 0.0300   Epoch: 9   Global Step: 375200   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:52:51,783-Speed 2535.75 samples/sec   Loss 7.4485   LearningRate 0.0300   Epoch: 9   Global Step: 375210   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:52:55,688-Speed 2622.67 samples/sec   Loss 7.4899   LearningRate 0.0300   Epoch: 9   Global Step: 375220   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:52:59,583-Speed 2629.66 samples/sec   Loss 7.3261   LearningRate 0.0300   Epoch: 9   Global Step: 375230   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:53:03,471-Speed 2634.43 samples/sec   Loss 7.5307   LearningRate 0.0300   Epoch: 9   Global Step: 375240   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:07,379-Speed 2621.06 samples/sec   Loss 8.2978   LearningRate 0.0300   Epoch: 9   Global Step: 375250   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:11,277-Speed 2627.18 samples/sec   Loss 7.6954   LearningRate 0.0300   Epoch: 9   Global Step: 375260   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:15,181-Speed 2623.79 samples/sec   Loss 7.4547   LearningRate 0.0300   Epoch: 9   Global Step: 375270   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:19,089-Speed 2621.25 samples/sec   Loss 7.5140   LearningRate 0.0300   Epoch: 9   Global Step: 375280   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:23,008-Speed 2614.23 samples/sec   Loss 7.4181   LearningRate 0.0300   Epoch: 9   Global Step: 375290   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:26,907-Speed 2627.25 samples/sec   Loss 7.6964   LearningRate 0.0300   Epoch: 9   Global Step: 375300   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:30,799-Speed 2631.91 samples/sec   Loss 7.4024   LearningRate 0.0300   Epoch: 9   Global Step: 375310   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:34,695-Speed 2628.61 samples/sec   Loss 7.3789   LearningRate 0.0300   Epoch: 9   Global Step: 375320   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:38,701-Speed 2556.87 samples/sec   Loss 7.5098   LearningRate 0.0300   Epoch: 9   Global Step: 375330   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:53:42,600-Speed 2626.90 samples/sec   Loss 7.2400   LearningRate 0.0300   Epoch: 9   Global Step: 375340   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:53:46,565-Speed 2583.38 samples/sec   Loss 7.2552   LearningRate 0.0300   Epoch: 9   Global Step: 375350   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:53:50,462-Speed 2628.32 samples/sec   Loss 7.3591   LearningRate 0.0300   Epoch: 9   Global Step: 375360   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:53:54,364-Speed 2625.56 samples/sec   Loss 7.4625   LearningRate 0.0300   Epoch: 9   Global Step: 375370   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:53:58,320-Speed 2594.25 samples/sec   Loss 7.4608   LearningRate 0.0300   Epoch: 9   Global Step: 375380   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:02,265-Speed 2596.49 samples/sec   Loss 7.5268   LearningRate 0.0300   Epoch: 9   Global Step: 375390   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:06,219-Speed 2590.06 samples/sec   Loss 7.3503   LearningRate 0.0300   Epoch: 9   Global Step: 375400   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:10,108-Speed 2633.59 samples/sec   Loss 7.4337   LearningRate 0.0300   Epoch: 9   Global Step: 375410   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:14,011-Speed 2624.53 samples/sec   Loss 7.4847   LearningRate 0.0300   Epoch: 9   Global Step: 375420   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:17,909-Speed 2627.48 samples/sec   Loss 7.4255   LearningRate 0.0300   Epoch: 9   Global Step: 375430   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:21,810-Speed 2626.12 samples/sec   Loss 7.4620   LearningRate 0.0300   Epoch: 9   Global Step: 375440   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:25,700-Speed 2633.68 samples/sec   Loss 7.5854   LearningRate 0.0300   Epoch: 9   Global Step: 375450   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:29,591-Speed 2631.85 samples/sec   Loss 7.4486   LearningRate 0.0300   Epoch: 9   Global Step: 375460   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:33,484-Speed 2630.86 samples/sec   Loss 7.3912   LearningRate 0.0300   Epoch: 9   Global Step: 375470   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:54:37,384-Speed 2626.73 samples/sec   Loss 7.4811   LearningRate 0.0300   Epoch: 9   Global Step: 375480   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:54:41,294-Speed 2619.83 samples/sec   Loss 7.4051   LearningRate 0.0300   Epoch: 9   Global Step: 375490   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:54:45,181-Speed 2634.71 samples/sec   Loss 7.4440   LearningRate 0.0300   Epoch: 9   Global Step: 375500   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:54:49,072-Speed 2632.54 samples/sec   Loss 7.4349   LearningRate 0.0300   Epoch: 9   Global Step: 375510   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:54:52,970-Speed 2627.71 samples/sec   Loss 7.4442   LearningRate 0.0300   Epoch: 9   Global Step: 375520   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:54:56,864-Speed 2630.39 samples/sec   Loss 7.3803   LearningRate 0.0300   Epoch: 9   Global Step: 375530   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:55:00,763-Speed 2626.93 samples/sec   Loss 7.4193   LearningRate 0.0300   Epoch: 9   Global Step: 375540   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:55:04,661-Speed 2627.87 samples/sec   Loss 7.4465   LearningRate 0.0300   Epoch: 9   Global Step: 375550   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:55:08,548-Speed 2635.23 samples/sec   Loss 7.3614   LearningRate 0.0300   Epoch: 9   Global Step: 375560   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:55:12,443-Speed 2629.83 samples/sec   Loss 7.3313   LearningRate 0.0300   Epoch: 9   Global Step: 375570   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:55:16,339-Speed 2629.19 samples/sec   Loss 7.3636   LearningRate 0.0299   Epoch: 9   Global Step: 375580   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:20,233-Speed 2630.60 samples/sec   Loss 7.3976   LearningRate 0.0299   Epoch: 9   Global Step: 375590   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:24,125-Speed 2632.07 samples/sec   Loss 7.2711   LearningRate 0.0299   Epoch: 9   Global Step: 375600   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:28,029-Speed 2622.78 samples/sec   Loss 7.2722   LearningRate 0.0299   Epoch: 9   Global Step: 375610   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:31,932-Speed 2625.03 samples/sec   Loss 7.4160   LearningRate 0.0299   Epoch: 9   Global Step: 375620   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:35,823-Speed 2632.25 samples/sec   Loss 7.3355   LearningRate 0.0299   Epoch: 9   Global Step: 375630   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:39,794-Speed 2579.19 samples/sec   Loss 7.3957   LearningRate 0.0299   Epoch: 9   Global Step: 375640   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:43,691-Speed 2628.16 samples/sec   Loss 7.3504   LearningRate 0.0299   Epoch: 9   Global Step: 375650   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:47,620-Speed 2607.24 samples/sec   Loss 7.4745   LearningRate 0.0299   Epoch: 9   Global Step: 375660   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:51,522-Speed 2624.79 samples/sec   Loss 7.3925   LearningRate 0.0299   Epoch: 9   Global Step: 375670   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:55:55,414-Speed 2631.77 samples/sec   Loss 7.3391   LearningRate 0.0299   Epoch: 9   Global Step: 375680   Fp16 Grad Scale: 262144   Required: 51 hours
Training: 2022-04-14 13:55:59,290-Speed 2642.43 samples/sec   Loss 7.4044   LearningRate 0.0299   Epoch: 9   Global Step: 375690   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:03,189-Speed 2627.33 samples/sec   Loss 7.2664   LearningRate 0.0299   Epoch: 9   Global Step: 375700   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:07,082-Speed 2630.68 samples/sec   Loss 7.4132   LearningRate 0.0299   Epoch: 9   Global Step: 375710   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:10,977-Speed 2629.63 samples/sec   Loss 7.3239   LearningRate 0.0299   Epoch: 9   Global Step: 375720   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:14,893-Speed 2615.49 samples/sec   Loss 7.3528   LearningRate 0.0299   Epoch: 9   Global Step: 375730   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:18,793-Speed 2626.82 samples/sec   Loss 7.4307   LearningRate 0.0299   Epoch: 9   Global Step: 375740   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:22,783-Speed 2567.05 samples/sec   Loss 7.3271   LearningRate 0.0299   Epoch: 9   Global Step: 375750   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:26,706-Speed 2610.66 samples/sec   Loss 7.3944   LearningRate 0.0299   Epoch: 9   Global Step: 375760   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:30,602-Speed 2629.53 samples/sec   Loss 7.4068   LearningRate 0.0299   Epoch: 9   Global Step: 375770   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:34,533-Speed 2605.41 samples/sec   Loss 7.3392   LearningRate 0.0299   Epoch: 9   Global Step: 375780   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:56:38,355-Speed 2680.26 samples/sec   Loss 7.9681   LearningRate 0.0299   Epoch: 9   Global Step: 375790   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:56:42,251-Speed 2629.03 samples/sec   Loss 7.6799   LearningRate 0.0299   Epoch: 9   Global Step: 375800   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:56:46,226-Speed 2577.26 samples/sec   Loss 7.4617   LearningRate 0.0299   Epoch: 9   Global Step: 375810   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:56:50,122-Speed 2629.00 samples/sec   Loss 7.3001   LearningRate 0.0299   Epoch: 9   Global Step: 375820   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:56:54,009-Speed 2634.91 samples/sec   Loss 7.4034   LearningRate 0.0299   Epoch: 9   Global Step: 375830   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:56:57,905-Speed 2629.04 samples/sec   Loss 7.3275   LearningRate 0.0299   Epoch: 9   Global Step: 375840   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:57:01,792-Speed 2635.23 samples/sec   Loss 7.3654   LearningRate 0.0299   Epoch: 9   Global Step: 375850   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:57:05,694-Speed 2625.08 samples/sec   Loss 7.4654   LearningRate 0.0299   Epoch: 9   Global Step: 375860   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:57:09,589-Speed 2629.28 samples/sec   Loss 7.3762   LearningRate 0.0299   Epoch: 9   Global Step: 375870   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:57:13,578-Speed 2567.83 samples/sec   Loss 7.4568   LearningRate 0.0299   Epoch: 9   Global Step: 375880   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:57:17,474-Speed 2629.09 samples/sec   Loss 7.4342   LearningRate 0.0299   Epoch: 9   Global Step: 375890   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:21,371-Speed 2628.48 samples/sec   Loss 7.3200   LearningRate 0.0299   Epoch: 9   Global Step: 375900   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:25,262-Speed 2631.94 samples/sec   Loss 7.2912   LearningRate 0.0299   Epoch: 9   Global Step: 375910   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:29,162-Speed 2626.78 samples/sec   Loss 7.3902   LearningRate 0.0299   Epoch: 9   Global Step: 375920   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:33,067-Speed 2622.83 samples/sec   Loss 7.3633   LearningRate 0.0299   Epoch: 9   Global Step: 375930   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:36,973-Speed 2621.95 samples/sec   Loss 7.4053   LearningRate 0.0299   Epoch: 9   Global Step: 375940   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:40,897-Speed 2609.87 samples/sec   Loss 7.4586   LearningRate 0.0299   Epoch: 9   Global Step: 375950   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:44,801-Speed 2623.87 samples/sec   Loss 7.3667   LearningRate 0.0299   Epoch: 9   Global Step: 375960   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:48,687-Speed 2635.63 samples/sec   Loss 7.3789   LearningRate 0.0299   Epoch: 9   Global Step: 375970   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:52,576-Speed 2633.94 samples/sec   Loss 7.3592   LearningRate 0.0299   Epoch: 9   Global Step: 375980   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 13:57:56,468-Speed 2631.47 samples/sec   Loss 7.4006   LearningRate 0.0299   Epoch: 9   Global Step: 375990   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:00,363-Speed 2629.75 samples/sec   Loss 7.3969   LearningRate 0.0299   Epoch: 9   Global Step: 376000   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:04,266-Speed 2624.15 samples/sec   Loss 7.3147   LearningRate 0.0299   Epoch: 9   Global Step: 376010   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:08,166-Speed 2626.51 samples/sec   Loss 7.2802   LearningRate 0.0299   Epoch: 9   Global Step: 376020   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:12,065-Speed 2626.77 samples/sec   Loss 7.3714   LearningRate 0.0299   Epoch: 9   Global Step: 376030   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:15,988-Speed 2610.91 samples/sec   Loss 7.4145   LearningRate 0.0299   Epoch: 9   Global Step: 376040   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:19,903-Speed 2615.76 samples/sec   Loss 7.3776   LearningRate 0.0299   Epoch: 9   Global Step: 376050   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:23,798-Speed 2629.96 samples/sec   Loss 7.2837   LearningRate 0.0299   Epoch: 9   Global Step: 376060   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:27,691-Speed 2631.41 samples/sec   Loss 7.4031   LearningRate 0.0299   Epoch: 9   Global Step: 376070   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:31,586-Speed 2629.43 samples/sec   Loss 7.2940   LearningRate 0.0299   Epoch: 9   Global Step: 376080   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:58:35,490-Speed 2623.87 samples/sec   Loss 7.4186   LearningRate 0.0299   Epoch: 9   Global Step: 376090   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:58:39,380-Speed 2632.47 samples/sec   Loss 7.4818   LearningRate 0.0299   Epoch: 9   Global Step: 376100   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:58:43,278-Speed 2627.47 samples/sec   Loss 7.3244   LearningRate 0.0299   Epoch: 9   Global Step: 376110   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:58:47,171-Speed 2630.81 samples/sec   Loss 7.3808   LearningRate 0.0299   Epoch: 9   Global Step: 376120   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:58:51,066-Speed 2630.10 samples/sec   Loss 7.4461   LearningRate 0.0299   Epoch: 9   Global Step: 376130   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:58:54,973-Speed 2621.23 samples/sec   Loss 7.1946   LearningRate 0.0299   Epoch: 9   Global Step: 376140   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 13:58:58,856-Speed 2638.34 samples/sec   Loss 7.3858   LearningRate 0.0299   Epoch: 9   Global Step: 376150   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:02,746-Speed 2632.87 samples/sec   Loss 7.2642   LearningRate 0.0299   Epoch: 9   Global Step: 376160   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:06,682-Speed 2603.15 samples/sec   Loss 7.4184   LearningRate 0.0299   Epoch: 9   Global Step: 376170   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:10,578-Speed 2628.73 samples/sec   Loss 7.3082   LearningRate 0.0299   Epoch: 9   Global Step: 376180   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:14,487-Speed 2619.85 samples/sec   Loss 7.3162   LearningRate 0.0299   Epoch: 9   Global Step: 376190   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:18,388-Speed 2626.12 samples/sec   Loss 7.2678   LearningRate 0.0299   Epoch: 9   Global Step: 376200   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:22,279-Speed 2632.67 samples/sec   Loss 7.3488   LearningRate 0.0299   Epoch: 9   Global Step: 376210   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:26,209-Speed 2606.35 samples/sec   Loss 7.4289   LearningRate 0.0299   Epoch: 9   Global Step: 376220   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:30,114-Speed 2623.31 samples/sec   Loss 7.4149   LearningRate 0.0299   Epoch: 9   Global Step: 376230   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:34,015-Speed 2625.49 samples/sec   Loss 7.4112   LearningRate 0.0299   Epoch: 9   Global Step: 376240   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 13:59:37,877-Speed 2651.89 samples/sec   Loss 7.8352   LearningRate 0.0299   Epoch: 9   Global Step: 376250   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:59:41,774-Speed 2628.47 samples/sec   Loss 8.3635   LearningRate 0.0299   Epoch: 9   Global Step: 376260   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:59:45,669-Speed 2629.57 samples/sec   Loss 7.7955   LearningRate 0.0299   Epoch: 9   Global Step: 376270   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:59:49,562-Speed 2631.27 samples/sec   Loss 7.7840   LearningRate 0.0299   Epoch: 9   Global Step: 376280   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:59:53,455-Speed 2631.08 samples/sec   Loss 7.5972   LearningRate 0.0299   Epoch: 9   Global Step: 376290   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 13:59:57,370-Speed 2616.69 samples/sec   Loss 7.5820   LearningRate 0.0299   Epoch: 9   Global Step: 376300   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:00:01,306-Speed 2601.66 samples/sec   Loss 7.5361   LearningRate 0.0299   Epoch: 9   Global Step: 376310   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:00:05,204-Speed 2628.34 samples/sec   Loss 7.5472   LearningRate 0.0299   Epoch: 9   Global Step: 376320   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:00:09,102-Speed 2627.86 samples/sec   Loss 7.4975   LearningRate 0.0299   Epoch: 9   Global Step: 376330   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:00:12,994-Speed 2631.37 samples/sec   Loss 7.4827   LearningRate 0.0298   Epoch: 9   Global Step: 376340   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:00:16,889-Speed 2629.36 samples/sec   Loss 7.4756   LearningRate 0.0298   Epoch: 9   Global Step: 376350   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:20,781-Speed 2632.27 samples/sec   Loss 7.4312   LearningRate 0.0298   Epoch: 9   Global Step: 376360   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:24,666-Speed 2636.36 samples/sec   Loss 7.4398   LearningRate 0.0298   Epoch: 9   Global Step: 376370   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:28,588-Speed 2611.16 samples/sec   Loss 7.3882   LearningRate 0.0298   Epoch: 9   Global Step: 376380   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:32,492-Speed 2624.39 samples/sec   Loss 7.4173   LearningRate 0.0298   Epoch: 9   Global Step: 376390   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:36,431-Speed 2600.50 samples/sec   Loss 7.4765   LearningRate 0.0298   Epoch: 9   Global Step: 376400   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:40,333-Speed 2624.58 samples/sec   Loss 7.3681   LearningRate 0.0298   Epoch: 9   Global Step: 376410   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:44,222-Speed 2633.81 samples/sec   Loss 7.3915   LearningRate 0.0298   Epoch: 9   Global Step: 376420   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:48,116-Speed 2630.55 samples/sec   Loss 7.2647   LearningRate 0.0298   Epoch: 9   Global Step: 376430   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:52,015-Speed 2626.73 samples/sec   Loss 7.4339   LearningRate 0.0298   Epoch: 9   Global Step: 376440   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:00:55,906-Speed 2632.54 samples/sec   Loss 7.3257   LearningRate 0.0298   Epoch: 9   Global Step: 376450   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:00:59,801-Speed 2629.60 samples/sec   Loss 7.4363   LearningRate 0.0298   Epoch: 9   Global Step: 376460   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:03,692-Speed 2632.08 samples/sec   Loss 7.4733   LearningRate 0.0298   Epoch: 9   Global Step: 376470   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:07,585-Speed 2631.08 samples/sec   Loss 7.3083   LearningRate 0.0298   Epoch: 9   Global Step: 376480   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:11,500-Speed 2616.28 samples/sec   Loss 7.5941   LearningRate 0.0298   Epoch: 9   Global Step: 376490   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:15,404-Speed 2624.01 samples/sec   Loss 7.5477   LearningRate 0.0298   Epoch: 9   Global Step: 376500   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:19,296-Speed 2631.96 samples/sec   Loss 7.4694   LearningRate 0.0298   Epoch: 9   Global Step: 376510   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:23,219-Speed 2611.24 samples/sec   Loss 7.4581   LearningRate 0.0298   Epoch: 9   Global Step: 376520   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:27,114-Speed 2629.64 samples/sec   Loss 7.3115   LearningRate 0.0298   Epoch: 9   Global Step: 376530   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:31,026-Speed 2619.21 samples/sec   Loss 7.3020   LearningRate 0.0298   Epoch: 9   Global Step: 376540   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:01:34,931-Speed 2623.41 samples/sec   Loss 7.4560   LearningRate 0.0298   Epoch: 9   Global Step: 376550   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:01:38,821-Speed 2632.40 samples/sec   Loss 7.3052   LearningRate 0.0298   Epoch: 9   Global Step: 376560   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:01:42,719-Speed 2627.49 samples/sec   Loss 7.4956   LearningRate 0.0298   Epoch: 9   Global Step: 376570   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:01:46,624-Speed 2623.31 samples/sec   Loss 7.4297   LearningRate 0.0298   Epoch: 9   Global Step: 376580   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:01:50,630-Speed 2556.93 samples/sec   Loss 7.3897   LearningRate 0.0298   Epoch: 9   Global Step: 376590   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:01:54,550-Speed 2613.02 samples/sec   Loss 7.3959   LearningRate 0.0298   Epoch: 9   Global Step: 376600   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:01:58,456-Speed 2621.70 samples/sec   Loss 7.5520   LearningRate 0.0298   Epoch: 9   Global Step: 376610   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:02:02,394-Speed 2601.24 samples/sec   Loss 7.4952   LearningRate 0.0298   Epoch: 9   Global Step: 376620   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:06,293-Speed 2627.29 samples/sec   Loss 7.3676   LearningRate 0.0298   Epoch: 9   Global Step: 376630   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:10,180-Speed 2635.18 samples/sec   Loss 7.5975   LearningRate 0.0298   Epoch: 9   Global Step: 376640   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:14,108-Speed 2607.32 samples/sec   Loss 7.3368   LearningRate 0.0298   Epoch: 9   Global Step: 376650   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:18,000-Speed 2631.93 samples/sec   Loss 7.3609   LearningRate 0.0298   Epoch: 9   Global Step: 376660   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:21,918-Speed 2614.50 samples/sec   Loss 7.3481   LearningRate 0.0298   Epoch: 9   Global Step: 376670   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:25,831-Speed 2617.29 samples/sec   Loss 7.3188   LearningRate 0.0298   Epoch: 9   Global Step: 376680   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:29,749-Speed 2614.65 samples/sec   Loss 7.3461   LearningRate 0.0298   Epoch: 9   Global Step: 376690   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:33,668-Speed 2613.94 samples/sec   Loss 7.4050   LearningRate 0.0298   Epoch: 9   Global Step: 376700   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:37,606-Speed 2600.46 samples/sec   Loss 7.4625   LearningRate 0.0298   Epoch: 9   Global Step: 376710   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:02:41,474-Speed 2647.92 samples/sec   Loss 7.6585   LearningRate 0.0298   Epoch: 9   Global Step: 376720   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:02:45,359-Speed 2636.69 samples/sec   Loss 7.3578   LearningRate 0.0298   Epoch: 9   Global Step: 376730   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:02:49,254-Speed 2629.46 samples/sec   Loss 7.3370   LearningRate 0.0298   Epoch: 9   Global Step: 376740   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:02:53,160-Speed 2622.68 samples/sec   Loss 7.4528   LearningRate 0.0298   Epoch: 9   Global Step: 376750   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:02:57,054-Speed 2630.79 samples/sec   Loss 7.3462   LearningRate 0.0298   Epoch: 9   Global Step: 376760   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:03:00,954-Speed 2626.90 samples/sec   Loss 7.3051   LearningRate 0.0298   Epoch: 9   Global Step: 376770   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:03:04,867-Speed 2617.50 samples/sec   Loss 7.3537   LearningRate 0.0298   Epoch: 9   Global Step: 376780   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:03:08,773-Speed 2622.47 samples/sec   Loss 7.3508   LearningRate 0.0298   Epoch: 9   Global Step: 376790   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:03:12,672-Speed 2626.97 samples/sec   Loss 7.3810   LearningRate 0.0298   Epoch: 9   Global Step: 376800   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:03:16,569-Speed 2627.92 samples/sec   Loss 7.2964   LearningRate 0.0298   Epoch: 9   Global Step: 376810   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:03:20,478-Speed 2620.21 samples/sec   Loss 7.3212   LearningRate 0.0298   Epoch: 9   Global Step: 376820   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:03:24,377-Speed 2627.45 samples/sec   Loss 7.4124   LearningRate 0.0298   Epoch: 9   Global Step: 376830   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:03:28,273-Speed 2628.83 samples/sec   Loss 7.3339   LearningRate 0.0298   Epoch: 9   Global Step: 376840   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:03:32,165-Speed 2631.81 samples/sec   Loss 7.2761   LearningRate 0.0298   Epoch: 9   Global Step: 376850   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:03:36,091-Speed 2609.10 samples/sec   Loss 7.3971   LearningRate 0.0298   Epoch: 9   Global Step: 376860   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:03:39,993-Speed 2624.66 samples/sec   Loss 7.3341   LearningRate 0.0298   Epoch: 9   Global Step: 376870   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:03:43,894-Speed 2625.87 samples/sec   Loss 7.4604   LearningRate 0.0298   Epoch: 9   Global Step: 376880   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:03:47,762-Speed 2648.21 samples/sec   Loss 7.3587   LearningRate 0.0298   Epoch: 9   Global Step: 376890   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:03:51,703-Speed 2599.45 samples/sec   Loss 7.8634   LearningRate 0.0298   Epoch: 9   Global Step: 376900   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:03:55,598-Speed 2629.65 samples/sec   Loss 7.3092   LearningRate 0.0298   Epoch: 9   Global Step: 376910   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:03:59,527-Speed 2606.74 samples/sec   Loss 7.2918   LearningRate 0.0298   Epoch: 9   Global Step: 376920   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:03,420-Speed 2631.92 samples/sec   Loss 7.2859   LearningRate 0.0298   Epoch: 9   Global Step: 376930   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:07,329-Speed 2620.13 samples/sec   Loss 7.2254   LearningRate 0.0298   Epoch: 9   Global Step: 376940   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:11,217-Speed 2634.19 samples/sec   Loss 7.3720   LearningRate 0.0298   Epoch: 9   Global Step: 376950   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:15,107-Speed 2633.66 samples/sec   Loss 7.3477   LearningRate 0.0298   Epoch: 9   Global Step: 376960   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:19,022-Speed 2615.79 samples/sec   Loss 7.3497   LearningRate 0.0298   Epoch: 9   Global Step: 376970   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:22,948-Speed 2609.41 samples/sec   Loss 7.3865   LearningRate 0.0298   Epoch: 9   Global Step: 376980   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:26,830-Speed 2638.48 samples/sec   Loss 7.2133   LearningRate 0.0298   Epoch: 9   Global Step: 376990   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:04:30,727-Speed 2628.51 samples/sec   Loss 7.3106   LearningRate 0.0298   Epoch: 9   Global Step: 377000   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:04:34,624-Speed 2628.68 samples/sec   Loss 7.2941   LearningRate 0.0298   Epoch: 9   Global Step: 377010   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:04:38,526-Speed 2625.13 samples/sec   Loss 7.4768   LearningRate 0.0298   Epoch: 9   Global Step: 377020   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:04:42,414-Speed 2634.18 samples/sec   Loss 7.3756   LearningRate 0.0298   Epoch: 9   Global Step: 377030   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:04:46,309-Speed 2629.72 samples/sec   Loss 7.3464   LearningRate 0.0298   Epoch: 9   Global Step: 377040   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:04:50,224-Speed 2616.49 samples/sec   Loss 7.5002   LearningRate 0.0298   Epoch: 9   Global Step: 377050   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:54,118-Speed 2630.56 samples/sec   Loss 7.9728   LearningRate 0.0298   Epoch: 9   Global Step: 377060   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:04:58,041-Speed 2610.75 samples/sec   Loss 7.6400   LearningRate 0.0298   Epoch: 9   Global Step: 377070   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:01,936-Speed 2630.04 samples/sec   Loss 7.3517   LearningRate 0.0298   Epoch: 9   Global Step: 377080   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:05,923-Speed 2569.36 samples/sec   Loss 7.2481   LearningRate 0.0298   Epoch: 9   Global Step: 377090   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:09,818-Speed 2629.65 samples/sec   Loss 7.4278   LearningRate 0.0297   Epoch: 9   Global Step: 377100   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:13,717-Speed 2627.03 samples/sec   Loss 7.6427   LearningRate 0.0297   Epoch: 9   Global Step: 377110   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:17,607-Speed 2632.73 samples/sec   Loss 7.6178   LearningRate 0.0297   Epoch: 9   Global Step: 377120   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:21,495-Speed 2634.78 samples/sec   Loss 7.2052   LearningRate 0.0297   Epoch: 9   Global Step: 377130   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:25,396-Speed 2625.68 samples/sec   Loss 7.2866   LearningRate 0.0297   Epoch: 9   Global Step: 377140   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:05:29,299-Speed 2624.67 samples/sec   Loss 7.2974   LearningRate 0.0297   Epoch: 9   Global Step: 377150   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:05:33,189-Speed 2632.92 samples/sec   Loss 7.4049   LearningRate 0.0297   Epoch: 9   Global Step: 377160   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:05:37,084-Speed 2629.65 samples/sec   Loss 7.3452   LearningRate 0.0297   Epoch: 9   Global Step: 377170   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:05:40,971-Speed 2635.06 samples/sec   Loss 7.4369   LearningRate 0.0297   Epoch: 9   Global Step: 377180   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:05:44,862-Speed 2632.61 samples/sec   Loss 7.2413   LearningRate 0.0297   Epoch: 9   Global Step: 377190   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:05:48,758-Speed 2628.37 samples/sec   Loss 7.2256   LearningRate 0.0297   Epoch: 9   Global Step: 377200   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:05:52,668-Speed 2620.57 samples/sec   Loss 7.3741   LearningRate 0.0297   Epoch: 9   Global Step: 377210   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:05:56,561-Speed 2631.00 samples/sec   Loss 7.3053   LearningRate 0.0297   Epoch: 9   Global Step: 377220   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:06:00,465-Speed 2623.61 samples/sec   Loss 7.3413   LearningRate 0.0297   Epoch: 9   Global Step: 377230   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:06:04,359-Speed 2630.48 samples/sec   Loss 7.3174   LearningRate 0.0297   Epoch: 9   Global Step: 377240   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:06:08,277-Speed 2614.38 samples/sec   Loss 7.4381   LearningRate 0.0297   Epoch: 9   Global Step: 377250   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:12,184-Speed 2621.25 samples/sec   Loss 7.3257   LearningRate 0.0297   Epoch: 9   Global Step: 377260   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:16,077-Speed 2630.80 samples/sec   Loss 7.3738   LearningRate 0.0297   Epoch: 9   Global Step: 377270   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:19,969-Speed 2631.88 samples/sec   Loss 7.3518   LearningRate 0.0297   Epoch: 9   Global Step: 377280   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:23,878-Speed 2620.16 samples/sec   Loss 7.3755   LearningRate 0.0297   Epoch: 9   Global Step: 377290   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:27,779-Speed 2625.84 samples/sec   Loss 7.4162   LearningRate 0.0297   Epoch: 9   Global Step: 377300   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:31,670-Speed 2632.79 samples/sec   Loss 7.3035   LearningRate 0.0297   Epoch: 9   Global Step: 377310   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:35,581-Speed 2618.30 samples/sec   Loss 7.2902   LearningRate 0.0297   Epoch: 9   Global Step: 377320   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:39,477-Speed 2629.18 samples/sec   Loss 7.3405   LearningRate 0.0297   Epoch: 9   Global Step: 377330   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:43,366-Speed 2633.86 samples/sec   Loss 7.4945   LearningRate 0.0297   Epoch: 9   Global Step: 377340   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:06:47,263-Speed 2628.74 samples/sec   Loss 7.3650   LearningRate 0.0297   Epoch: 9   Global Step: 377350   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:06:51,151-Speed 2633.94 samples/sec   Loss 7.4149   LearningRate 0.0297   Epoch: 9   Global Step: 377360   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:06:55,050-Speed 2627.16 samples/sec   Loss 7.4003   LearningRate 0.0297   Epoch: 9   Global Step: 377370   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:06:58,943-Speed 2630.52 samples/sec   Loss 7.4404   LearningRate 0.0297   Epoch: 9   Global Step: 377380   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:07:02,844-Speed 2625.58 samples/sec   Loss 7.3894   LearningRate 0.0297   Epoch: 9   Global Step: 377390   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:07:06,749-Speed 2622.60 samples/sec   Loss 7.2295   LearningRate 0.0297   Epoch: 9   Global Step: 377400   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:07:10,649-Speed 2627.19 samples/sec   Loss 7.2705   LearningRate 0.0297   Epoch: 9   Global Step: 377410   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:07:14,542-Speed 2630.38 samples/sec   Loss 7.4299   LearningRate 0.0297   Epoch: 9   Global Step: 377420   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:07:18,451-Speed 2620.58 samples/sec   Loss 7.2976   LearningRate 0.0297   Epoch: 9   Global Step: 377430   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:07:22,336-Speed 2636.81 samples/sec   Loss 7.3742   LearningRate 0.0297   Epoch: 9   Global Step: 377440   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:07:26,221-Speed 2636.21 samples/sec   Loss 7.4258   LearningRate 0.0297   Epoch: 9   Global Step: 377450   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:07:30,134-Speed 2617.68 samples/sec   Loss 7.4454   LearningRate 0.0297   Epoch: 9   Global Step: 377460   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:07:34,024-Speed 2633.55 samples/sec   Loss 7.4048   LearningRate 0.0297   Epoch: 9   Global Step: 377470   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:07:37,912-Speed 2634.10 samples/sec   Loss 7.2095   LearningRate 0.0297   Epoch: 9   Global Step: 377480   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:07:41,789-Speed 2641.84 samples/sec   Loss 7.4230   LearningRate 0.0297   Epoch: 9   Global Step: 377490   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:07:45,680-Speed 2632.47 samples/sec   Loss 7.5291   LearningRate 0.0297   Epoch: 9   Global Step: 377500   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:07:49,577-Speed 2627.96 samples/sec   Loss 7.6707   LearningRate 0.0297   Epoch: 9   Global Step: 377510   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:07:53,470-Speed 2631.73 samples/sec   Loss 7.7505   LearningRate 0.0297   Epoch: 9   Global Step: 377520   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:07:57,364-Speed 2629.77 samples/sec   Loss 7.5680   LearningRate 0.0297   Epoch: 9   Global Step: 377530   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:08:01,280-Speed 2616.24 samples/sec   Loss 7.2773   LearningRate 0.0297   Epoch: 9   Global Step: 377540   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:08:05,296-Speed 2550.06 samples/sec   Loss 7.2076   LearningRate 0.0297   Epoch: 9   Global Step: 377550   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:08:09,191-Speed 2629.49 samples/sec   Loss 7.3651   LearningRate 0.0297   Epoch: 9   Global Step: 377560   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:08:13,095-Speed 2623.76 samples/sec   Loss 7.3520   LearningRate 0.0297   Epoch: 9   Global Step: 377570   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:08:16,988-Speed 2631.11 samples/sec   Loss 7.2468   LearningRate 0.0297   Epoch: 9   Global Step: 377580   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:08:20,890-Speed 2624.97 samples/sec   Loss 7.2567   LearningRate 0.0297   Epoch: 9   Global Step: 377590   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:08:24,786-Speed 2629.16 samples/sec   Loss 7.4646   LearningRate 0.0297   Epoch: 9   Global Step: 377600   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:28,694-Speed 2621.14 samples/sec   Loss 7.3508   LearningRate 0.0297   Epoch: 9   Global Step: 377610   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:32,613-Speed 2613.56 samples/sec   Loss 7.4170   LearningRate 0.0297   Epoch: 9   Global Step: 377620   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:36,514-Speed 2625.57 samples/sec   Loss 7.3656   LearningRate 0.0297   Epoch: 9   Global Step: 377630   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:40,417-Speed 2624.24 samples/sec   Loss 7.3626   LearningRate 0.0297   Epoch: 9   Global Step: 377640   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:44,309-Speed 2631.51 samples/sec   Loss 7.2248   LearningRate 0.0297   Epoch: 9   Global Step: 377650   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:48,204-Speed 2629.36 samples/sec   Loss 7.2787   LearningRate 0.0297   Epoch: 9   Global Step: 377660   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:52,134-Speed 2606.84 samples/sec   Loss 7.3752   LearningRate 0.0297   Epoch: 9   Global Step: 377670   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:56,095-Speed 2585.70 samples/sec   Loss 7.4495   LearningRate 0.0297   Epoch: 9   Global Step: 377680   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:08:59,995-Speed 2626.93 samples/sec   Loss 7.3285   LearningRate 0.0297   Epoch: 9   Global Step: 377690   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:09:03,882-Speed 2635.30 samples/sec   Loss 7.3271   LearningRate 0.0297   Epoch: 9   Global Step: 377700   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:07,775-Speed 2630.79 samples/sec   Loss 7.3233   LearningRate 0.0297   Epoch: 9   Global Step: 377710   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:11,677-Speed 2625.89 samples/sec   Loss 7.1908   LearningRate 0.0297   Epoch: 9   Global Step: 377720   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:15,567-Speed 2633.56 samples/sec   Loss 7.4386   LearningRate 0.0297   Epoch: 9   Global Step: 377730   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:19,480-Speed 2617.02 samples/sec   Loss 7.3361   LearningRate 0.0297   Epoch: 9   Global Step: 377740   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:23,377-Speed 2629.17 samples/sec   Loss 7.3033   LearningRate 0.0297   Epoch: 9   Global Step: 377750   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:27,270-Speed 2630.78 samples/sec   Loss 7.3314   LearningRate 0.0297   Epoch: 9   Global Step: 377760   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:31,172-Speed 2625.24 samples/sec   Loss 7.5588   LearningRate 0.0297   Epoch: 9   Global Step: 377770   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:35,072-Speed 2625.83 samples/sec   Loss 7.4283   LearningRate 0.0297   Epoch: 9   Global Step: 377780   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:38,960-Speed 2634.48 samples/sec   Loss 7.3657   LearningRate 0.0297   Epoch: 9   Global Step: 377790   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:09:42,853-Speed 2630.77 samples/sec   Loss 7.3020   LearningRate 0.0297   Epoch: 9   Global Step: 377800   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:09:46,771-Speed 2614.79 samples/sec   Loss 7.1737   LearningRate 0.0297   Epoch: 9   Global Step: 377810   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:09:50,676-Speed 2622.09 samples/sec   Loss 7.4428   LearningRate 0.0297   Epoch: 9   Global Step: 377820   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:09:54,597-Speed 2612.82 samples/sec   Loss 7.3589   LearningRate 0.0297   Epoch: 9   Global Step: 377830   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:09:58,487-Speed 2632.71 samples/sec   Loss 7.5153   LearningRate 0.0297   Epoch: 9   Global Step: 377840   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:10:02,393-Speed 2622.68 samples/sec   Loss 7.2557   LearningRate 0.0297   Epoch: 9   Global Step: 377850   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:10:06,287-Speed 2630.22 samples/sec   Loss 7.3333   LearningRate 0.0296   Epoch: 9   Global Step: 377860   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:10:10,183-Speed 2628.63 samples/sec   Loss 7.4009   LearningRate 0.0296   Epoch: 9   Global Step: 377870   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:10:14,098-Speed 2616.79 samples/sec   Loss 7.2085   LearningRate 0.0296   Epoch: 9   Global Step: 377880   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:10:18,000-Speed 2625.16 samples/sec   Loss 7.2674   LearningRate 0.0296   Epoch: 9   Global Step: 377890   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:10:21,906-Speed 2621.90 samples/sec   Loss 7.4413   LearningRate 0.0296   Epoch: 9   Global Step: 377900   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:10:25,802-Speed 2629.02 samples/sec   Loss 7.2928   LearningRate 0.0296   Epoch: 9   Global Step: 377910   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:10:29,744-Speed 2598.39 samples/sec   Loss 7.2974   LearningRate 0.0296   Epoch: 9   Global Step: 377920   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:10:33,631-Speed 2635.37 samples/sec   Loss 7.3949   LearningRate 0.0296   Epoch: 9   Global Step: 377930   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:10:37,533-Speed 2625.37 samples/sec   Loss 7.3384   LearningRate 0.0296   Epoch: 9   Global Step: 377940   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:10:41,426-Speed 2630.78 samples/sec   Loss 7.3729   LearningRate 0.0296   Epoch: 9   Global Step: 377950   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:10:45,321-Speed 2629.20 samples/sec   Loss 7.4048   LearningRate 0.0296   Epoch: 9   Global Step: 377960   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:10:49,205-Speed 2637.56 samples/sec   Loss 7.4001   LearningRate 0.0296   Epoch: 9   Global Step: 377970   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:10:53,045-Speed 2667.50 samples/sec   Loss 7.8667   LearningRate 0.0296   Epoch: 9   Global Step: 377980   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:10:56,945-Speed 2626.52 samples/sec   Loss 7.4698   LearningRate 0.0296   Epoch: 9   Global Step: 377990   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:00,840-Speed 2629.63 samples/sec   Loss 7.4129   LearningRate 0.0296   Epoch: 9   Global Step: 378000   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:04,919-Speed 2510.57 samples/sec   Loss 7.3888   LearningRate 0.0296   Epoch: 9   Global Step: 378010   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:08,823-Speed 2623.81 samples/sec   Loss 7.4232   LearningRate 0.0296   Epoch: 9   Global Step: 378020   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:12,708-Speed 2636.89 samples/sec   Loss 7.2837   LearningRate 0.0296   Epoch: 9   Global Step: 378030   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:16,616-Speed 2620.66 samples/sec   Loss 7.3897   LearningRate 0.0296   Epoch: 9   Global Step: 378040   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:20,501-Speed 2636.33 samples/sec   Loss 7.3698   LearningRate 0.0296   Epoch: 9   Global Step: 378050   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:24,390-Speed 2633.62 samples/sec   Loss 7.3189   LearningRate 0.0296   Epoch: 9   Global Step: 378060   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:28,284-Speed 2630.27 samples/sec   Loss 7.3617   LearningRate 0.0296   Epoch: 9   Global Step: 378070   Fp16 Grad Scale: 4096   Required: 51 hours
Training: 2022-04-14 14:11:32,211-Speed 2608.69 samples/sec   Loss 7.3013   LearningRate 0.0296   Epoch: 9   Global Step: 378080   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:11:36,099-Speed 2634.60 samples/sec   Loss 7.3224   LearningRate 0.0296   Epoch: 9   Global Step: 378090   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:11:39,988-Speed 2633.85 samples/sec   Loss 7.2286   LearningRate 0.0296   Epoch: 9   Global Step: 378100   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:11:43,914-Speed 2608.76 samples/sec   Loss 7.3679   LearningRate 0.0296   Epoch: 9   Global Step: 378110   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:11:47,804-Speed 2632.91 samples/sec   Loss 7.3035   LearningRate 0.0296   Epoch: 9   Global Step: 378120   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:11:51,701-Speed 2628.90 samples/sec   Loss 7.3412   LearningRate 0.0296   Epoch: 9   Global Step: 378130   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:11:55,596-Speed 2629.18 samples/sec   Loss 7.3468   LearningRate 0.0296   Epoch: 9   Global Step: 378140   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:11:59,496-Speed 2626.35 samples/sec   Loss 7.2095   LearningRate 0.0296   Epoch: 9   Global Step: 378150   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:03,399-Speed 2624.00 samples/sec   Loss 7.3317   LearningRate 0.0296   Epoch: 9   Global Step: 378160   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:07,288-Speed 2634.05 samples/sec   Loss 7.2183   LearningRate 0.0296   Epoch: 9   Global Step: 378170   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:11,188-Speed 2626.72 samples/sec   Loss 7.4421   LearningRate 0.0296   Epoch: 9   Global Step: 378180   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:12:15,077-Speed 2633.01 samples/sec   Loss 7.4401   LearningRate 0.0296   Epoch: 9   Global Step: 378190   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:12:18,964-Speed 2635.23 samples/sec   Loss 7.9709   LearningRate 0.0296   Epoch: 9   Global Step: 378200   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:22,853-Speed 2633.75 samples/sec   Loss 7.6717   LearningRate 0.0296   Epoch: 9   Global Step: 378210   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:26,748-Speed 2629.13 samples/sec   Loss 7.4790   LearningRate 0.0296   Epoch: 9   Global Step: 378220   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:30,665-Speed 2614.86 samples/sec   Loss 7.3413   LearningRate 0.0296   Epoch: 9   Global Step: 378230   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:34,553-Speed 2633.85 samples/sec   Loss 7.3633   LearningRate 0.0296   Epoch: 9   Global Step: 378240   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:38,451-Speed 2628.02 samples/sec   Loss 7.3969   LearningRate 0.0296   Epoch: 9   Global Step: 378250   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:42,351-Speed 2628.58 samples/sec   Loss 7.3529   LearningRate 0.0296   Epoch: 9   Global Step: 378260   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:46,240-Speed 2634.14 samples/sec   Loss 7.3826   LearningRate 0.0296   Epoch: 9   Global Step: 378270   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:50,132-Speed 2631.74 samples/sec   Loss 7.3215   LearningRate 0.0296   Epoch: 9   Global Step: 378280   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:54,021-Speed 2633.51 samples/sec   Loss 7.3339   LearningRate 0.0296   Epoch: 9   Global Step: 378290   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:12:57,922-Speed 2626.08 samples/sec   Loss 7.5342   LearningRate 0.0296   Epoch: 9   Global Step: 378300   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:01,818-Speed 2628.53 samples/sec   Loss 7.3599   LearningRate 0.0296   Epoch: 9   Global Step: 378310   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:05,705-Speed 2636.53 samples/sec   Loss 7.3861   LearningRate 0.0296   Epoch: 9   Global Step: 378320   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:09,596-Speed 2631.96 samples/sec   Loss 7.2492   LearningRate 0.0296   Epoch: 9   Global Step: 378330   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:13,500-Speed 2623.57 samples/sec   Loss 7.4284   LearningRate 0.0296   Epoch: 9   Global Step: 378340   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:17,391-Speed 2632.74 samples/sec   Loss 7.3678   LearningRate 0.0296   Epoch: 9   Global Step: 378350   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:21,293-Speed 2624.23 samples/sec   Loss 7.4052   LearningRate 0.0296   Epoch: 9   Global Step: 378360   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:25,182-Speed 2634.23 samples/sec   Loss 7.3345   LearningRate 0.0296   Epoch: 9   Global Step: 378370   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:29,101-Speed 2613.81 samples/sec   Loss 7.3181   LearningRate 0.0296   Epoch: 9   Global Step: 378380   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:33,013-Speed 2617.70 samples/sec   Loss 7.3200   LearningRate 0.0296   Epoch: 9   Global Step: 378390   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:13:36,925-Speed 2618.26 samples/sec   Loss 7.3175   LearningRate 0.0296   Epoch: 9   Global Step: 378400   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:13:40,833-Speed 2620.71 samples/sec   Loss 7.4175   LearningRate 0.0296   Epoch: 9   Global Step: 378410   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:13:44,736-Speed 2624.62 samples/sec   Loss 7.3799   LearningRate 0.0296   Epoch: 9   Global Step: 378420   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:13:48,634-Speed 2627.27 samples/sec   Loss 7.5076   LearningRate 0.0296   Epoch: 9   Global Step: 378430   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:13:52,520-Speed 2636.07 samples/sec   Loss 7.3156   LearningRate 0.0296   Epoch: 9   Global Step: 378440   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:13:56,416-Speed 2628.71 samples/sec   Loss 7.3116   LearningRate 0.0296   Epoch: 9   Global Step: 378450   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:14:00,314-Speed 2627.65 samples/sec   Loss 7.3336   LearningRate 0.0296   Epoch: 9   Global Step: 378460   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:14:04,211-Speed 2628.81 samples/sec   Loss 7.3877   LearningRate 0.0296   Epoch: 9   Global Step: 378470   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:14:08,073-Speed 2651.92 samples/sec   Loss 7.3119   LearningRate 0.0296   Epoch: 9   Global Step: 378480   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:11,964-Speed 2632.49 samples/sec   Loss 7.7223   LearningRate 0.0296   Epoch: 9   Global Step: 378490   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:15,850-Speed 2635.13 samples/sec   Loss 7.3405   LearningRate 0.0296   Epoch: 9   Global Step: 378500   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:19,756-Speed 2622.33 samples/sec   Loss 7.4116   LearningRate 0.0296   Epoch: 9   Global Step: 378510   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:23,660-Speed 2623.86 samples/sec   Loss 7.2997   LearningRate 0.0296   Epoch: 9   Global Step: 378520   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:27,548-Speed 2634.38 samples/sec   Loss 7.3348   LearningRate 0.0296   Epoch: 9   Global Step: 378530   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:31,437-Speed 2633.61 samples/sec   Loss 7.4609   LearningRate 0.0296   Epoch: 9   Global Step: 378540   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:35,323-Speed 2635.72 samples/sec   Loss 7.6258   LearningRate 0.0296   Epoch: 9   Global Step: 378550   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:39,214-Speed 2631.89 samples/sec   Loss 7.1818   LearningRate 0.0296   Epoch: 9   Global Step: 378560   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:43,105-Speed 2633.02 samples/sec   Loss 7.4771   LearningRate 0.0296   Epoch: 9   Global Step: 378570   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:14:46,997-Speed 2631.57 samples/sec   Loss 7.3508   LearningRate 0.0296   Epoch: 9   Global Step: 378580   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:14:50,888-Speed 2632.11 samples/sec   Loss 7.3324   LearningRate 0.0296   Epoch: 9   Global Step: 378590   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:14:54,790-Speed 2625.02 samples/sec   Loss 7.3093   LearningRate 0.0296   Epoch: 9   Global Step: 378600   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:14:58,729-Speed 2600.45 samples/sec   Loss 7.2943   LearningRate 0.0296   Epoch: 9   Global Step: 378610   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:15:02,631-Speed 2625.06 samples/sec   Loss 7.3121   LearningRate 0.0296   Epoch: 9   Global Step: 378620   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:15:06,528-Speed 2627.92 samples/sec   Loss 7.3479   LearningRate 0.0295   Epoch: 9   Global Step: 378630   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:15:10,426-Speed 2627.11 samples/sec   Loss 7.2327   LearningRate 0.0295   Epoch: 9   Global Step: 378640   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:15:14,325-Speed 2627.90 samples/sec   Loss 7.4225   LearningRate 0.0295   Epoch: 9   Global Step: 378650   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:15:18,212-Speed 2635.21 samples/sec   Loss 7.3846   LearningRate 0.0295   Epoch: 9   Global Step: 378660   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:15:22,102-Speed 2632.77 samples/sec   Loss 7.4320   LearningRate 0.0295   Epoch: 9   Global Step: 378670   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:15:26,109-Speed 2556.34 samples/sec   Loss 7.4027   LearningRate 0.0295   Epoch: 9   Global Step: 378680   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:29,999-Speed 2632.53 samples/sec   Loss 7.3797   LearningRate 0.0295   Epoch: 9   Global Step: 378690   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:33,905-Speed 2622.42 samples/sec   Loss 7.3879   LearningRate 0.0295   Epoch: 9   Global Step: 378700   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:37,807-Speed 2624.49 samples/sec   Loss 7.4056   LearningRate 0.0295   Epoch: 9   Global Step: 378710   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:41,700-Speed 2631.21 samples/sec   Loss 7.4070   LearningRate 0.0295   Epoch: 9   Global Step: 378720   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:45,609-Speed 2620.19 samples/sec   Loss 7.3355   LearningRate 0.0295   Epoch: 9   Global Step: 378730   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:49,505-Speed 2628.88 samples/sec   Loss 7.4385   LearningRate 0.0295   Epoch: 9   Global Step: 378740   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:53,392-Speed 2635.47 samples/sec   Loss 7.3948   LearningRate 0.0295   Epoch: 9   Global Step: 378750   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:15:57,286-Speed 2630.61 samples/sec   Loss 7.3047   LearningRate 0.0295   Epoch: 9   Global Step: 378760   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:16:01,174-Speed 2634.10 samples/sec   Loss 7.2092   LearningRate 0.0295   Epoch: 9   Global Step: 378770   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:16:05,057-Speed 2637.87 samples/sec   Loss 7.5552   LearningRate 0.0295   Epoch: 9   Global Step: 378780   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:08,948-Speed 2631.73 samples/sec   Loss 7.7476   LearningRate 0.0295   Epoch: 9   Global Step: 378790   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:12,839-Speed 2632.74 samples/sec   Loss 7.6696   LearningRate 0.0295   Epoch: 9   Global Step: 378800   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:16,740-Speed 2625.53 samples/sec   Loss 7.5463   LearningRate 0.0295   Epoch: 9   Global Step: 378810   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:20,647-Speed 2621.61 samples/sec   Loss 7.6293   LearningRate 0.0295   Epoch: 9   Global Step: 378820   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:24,565-Speed 2613.99 samples/sec   Loss 7.2922   LearningRate 0.0295   Epoch: 9   Global Step: 378830   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:28,450-Speed 2636.31 samples/sec   Loss 7.2360   LearningRate 0.0295   Epoch: 9   Global Step: 378840   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:32,344-Speed 2630.85 samples/sec   Loss 7.4284   LearningRate 0.0295   Epoch: 9   Global Step: 378850   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:36,236-Speed 2631.83 samples/sec   Loss 7.3268   LearningRate 0.0295   Epoch: 9   Global Step: 378860   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:40,126-Speed 2632.50 samples/sec   Loss 7.1463   LearningRate 0.0295   Epoch: 9   Global Step: 378870   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:16:44,031-Speed 2622.65 samples/sec   Loss 7.3460   LearningRate 0.0295   Epoch: 9   Global Step: 378880   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:16:47,926-Speed 2629.94 samples/sec   Loss 7.3296   LearningRate 0.0295   Epoch: 9   Global Step: 378890   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:16:51,830-Speed 2623.68 samples/sec   Loss 7.2379   LearningRate 0.0295   Epoch: 9   Global Step: 378900   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:16:55,735-Speed 2623.05 samples/sec   Loss 7.3364   LearningRate 0.0295   Epoch: 9   Global Step: 378910   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:16:59,692-Speed 2588.38 samples/sec   Loss 7.2631   LearningRate 0.0295   Epoch: 9   Global Step: 378920   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:17:03,617-Speed 2609.76 samples/sec   Loss 7.3312   LearningRate 0.0295   Epoch: 9   Global Step: 378930   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:17:07,510-Speed 2631.37 samples/sec   Loss 7.3898   LearningRate 0.0295   Epoch: 9   Global Step: 378940   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:17:11,408-Speed 2627.31 samples/sec   Loss 7.3329   LearningRate 0.0295   Epoch: 9   Global Step: 378950   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:17:15,304-Speed 2629.02 samples/sec   Loss 7.4041   LearningRate 0.0295   Epoch: 9   Global Step: 378960   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:17:19,209-Speed 2623.13 samples/sec   Loss 7.4061   LearningRate 0.0295   Epoch: 9   Global Step: 378970   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:17:23,109-Speed 2626.38 samples/sec   Loss 7.3334   LearningRate 0.0295   Epoch: 9   Global Step: 378980   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:26,998-Speed 2633.54 samples/sec   Loss 7.2432   LearningRate 0.0295   Epoch: 9   Global Step: 378990   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:30,895-Speed 2628.92 samples/sec   Loss 7.3250   LearningRate 0.0295   Epoch: 9   Global Step: 379000   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:34,783-Speed 2633.64 samples/sec   Loss 7.4265   LearningRate 0.0295   Epoch: 9   Global Step: 379010   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:38,679-Speed 2629.30 samples/sec   Loss 7.3218   LearningRate 0.0295   Epoch: 9   Global Step: 379020   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:42,583-Speed 2623.02 samples/sec   Loss 7.3826   LearningRate 0.0295   Epoch: 9   Global Step: 379030   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:46,486-Speed 2624.99 samples/sec   Loss 7.3485   LearningRate 0.0295   Epoch: 9   Global Step: 379040   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:50,382-Speed 2628.64 samples/sec   Loss 7.4646   LearningRate 0.0295   Epoch: 9   Global Step: 379050   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:54,275-Speed 2631.51 samples/sec   Loss 7.2657   LearningRate 0.0295   Epoch: 9   Global Step: 379060   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:17:58,170-Speed 2629.71 samples/sec   Loss 7.3014   LearningRate 0.0295   Epoch: 9   Global Step: 379070   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:18:02,076-Speed 2622.26 samples/sec   Loss 7.3463   LearningRate 0.0295   Epoch: 9   Global Step: 379080   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:18:06,037-Speed 2585.97 samples/sec   Loss 7.3233   LearningRate 0.0295   Epoch: 9   Global Step: 379090   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:18:09,931-Speed 2629.96 samples/sec   Loss 7.3382   LearningRate 0.0295   Epoch: 9   Global Step: 379100   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:18:13,817-Speed 2635.73 samples/sec   Loss 7.3009   LearningRate 0.0295   Epoch: 9   Global Step: 379110   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:18:17,710-Speed 2631.42 samples/sec   Loss 7.3587   LearningRate 0.0295   Epoch: 9   Global Step: 379120   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:18:21,614-Speed 2623.07 samples/sec   Loss 7.3685   LearningRate 0.0295   Epoch: 9   Global Step: 379130   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:18:25,505-Speed 2632.43 samples/sec   Loss 7.3077   LearningRate 0.0295   Epoch: 9   Global Step: 379140   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:18:29,356-Speed 2659.86 samples/sec   Loss 7.4701   LearningRate 0.0295   Epoch: 9   Global Step: 379150   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:18:33,251-Speed 2629.84 samples/sec   Loss 7.6581   LearningRate 0.0295   Epoch: 9   Global Step: 379160   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:18:37,137-Speed 2635.32 samples/sec   Loss 7.4106   LearningRate 0.0295   Epoch: 9   Global Step: 379170   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:18:41,030-Speed 2631.02 samples/sec   Loss 7.3255   LearningRate 0.0295   Epoch: 9   Global Step: 379180   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:18:44,936-Speed 2622.71 samples/sec   Loss 7.3531   LearningRate 0.0295   Epoch: 9   Global Step: 379190   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:18:48,826-Speed 2632.58 samples/sec   Loss 7.3285   LearningRate 0.0295   Epoch: 9   Global Step: 379200   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:18:52,720-Speed 2630.83 samples/sec   Loss 7.2447   LearningRate 0.0295   Epoch: 9   Global Step: 379210   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:18:56,610-Speed 2633.07 samples/sec   Loss 7.3521   LearningRate 0.0295   Epoch: 9   Global Step: 379220   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:19:00,509-Speed 2626.77 samples/sec   Loss 7.2810   LearningRate 0.0295   Epoch: 9   Global Step: 379230   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:19:04,403-Speed 2629.96 samples/sec   Loss 7.3845   LearningRate 0.0295   Epoch: 9   Global Step: 379240   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:19:08,292-Speed 2634.04 samples/sec   Loss 7.3429   LearningRate 0.0295   Epoch: 9   Global Step: 379250   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:12,194-Speed 2624.58 samples/sec   Loss 7.3710   LearningRate 0.0295   Epoch: 9   Global Step: 379260   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:16,098-Speed 2623.15 samples/sec   Loss 7.3994   LearningRate 0.0295   Epoch: 9   Global Step: 379270   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:19,999-Speed 2625.75 samples/sec   Loss 7.2143   LearningRate 0.0295   Epoch: 9   Global Step: 379280   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:23,900-Speed 2625.60 samples/sec   Loss 7.3918   LearningRate 0.0295   Epoch: 9   Global Step: 379290   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:27,786-Speed 2635.72 samples/sec   Loss 7.3392   LearningRate 0.0295   Epoch: 9   Global Step: 379300   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:31,684-Speed 2627.92 samples/sec   Loss 7.4472   LearningRate 0.0295   Epoch: 9   Global Step: 379310   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:35,569-Speed 2636.71 samples/sec   Loss 7.4058   LearningRate 0.0295   Epoch: 9   Global Step: 379320   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:39,481-Speed 2617.63 samples/sec   Loss 7.2243   LearningRate 0.0295   Epoch: 9   Global Step: 379330   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:43,384-Speed 2624.73 samples/sec   Loss 7.3791   LearningRate 0.0295   Epoch: 9   Global Step: 379340   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:19:47,284-Speed 2625.73 samples/sec   Loss 7.3415   LearningRate 0.0295   Epoch: 9   Global Step: 379350   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:19:51,193-Speed 2620.64 samples/sec   Loss 7.2450   LearningRate 0.0295   Epoch: 9   Global Step: 379360   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:19:55,084-Speed 2632.09 samples/sec   Loss 7.2938   LearningRate 0.0295   Epoch: 9   Global Step: 379370   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:19:59,001-Speed 2615.32 samples/sec   Loss 7.3573   LearningRate 0.0295   Epoch: 9   Global Step: 379380   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:02,903-Speed 2625.06 samples/sec   Loss 7.3262   LearningRate 0.0294   Epoch: 9   Global Step: 379390   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:07,186-Speed 2391.37 samples/sec   Loss 7.2970   LearningRate 0.0294   Epoch: 9   Global Step: 379400   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:11,081-Speed 2629.86 samples/sec   Loss 7.2302   LearningRate 0.0294   Epoch: 9   Global Step: 379410   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:14,972-Speed 2632.48 samples/sec   Loss 7.5318   LearningRate 0.0294   Epoch: 9   Global Step: 379420   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:18,906-Speed 2603.31 samples/sec   Loss 7.3332   LearningRate 0.0294   Epoch: 9   Global Step: 379430   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:22,803-Speed 2628.28 samples/sec   Loss 7.3979   LearningRate 0.0294   Epoch: 9   Global Step: 379440   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:26,710-Speed 2621.92 samples/sec   Loss 7.4538   LearningRate 0.0294   Epoch: 9   Global Step: 379450   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:20:30,603-Speed 2630.96 samples/sec   Loss 7.3731   LearningRate 0.0294   Epoch: 9   Global Step: 379460   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:20:34,505-Speed 2624.97 samples/sec   Loss 7.3758   LearningRate 0.0294   Epoch: 9   Global Step: 379470   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:20:38,381-Speed 2642.10 samples/sec   Loss 7.2983   LearningRate 0.0294   Epoch: 9   Global Step: 379480   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:20:42,238-Speed 2656.15 samples/sec   Loss 7.4074   LearningRate 0.0294   Epoch: 9   Global Step: 379490   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:20:46,135-Speed 2627.88 samples/sec   Loss 7.4347   LearningRate 0.0294   Epoch: 9   Global Step: 379500   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:20:50,030-Speed 2629.78 samples/sec   Loss 7.3500   LearningRate 0.0294   Epoch: 9   Global Step: 379510   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:20:53,928-Speed 2627.46 samples/sec   Loss 7.2618   LearningRate 0.0294   Epoch: 9   Global Step: 379520   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:20:57,815-Speed 2635.39 samples/sec   Loss 7.2773   LearningRate 0.0294   Epoch: 9   Global Step: 379530   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:21:01,708-Speed 2631.07 samples/sec   Loss 7.3744   LearningRate 0.0294   Epoch: 9   Global Step: 379540   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:21:05,863-Speed 2465.03 samples/sec   Loss 7.3961   LearningRate 0.0294   Epoch: 9   Global Step: 379550   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:21:09,756-Speed 2630.41 samples/sec   Loss 7.2856   LearningRate 0.0294   Epoch: 9   Global Step: 379560   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:21:13,657-Speed 2625.80 samples/sec   Loss 7.3672   LearningRate 0.0294   Epoch: 9   Global Step: 379570   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:21:17,551-Speed 2630.20 samples/sec   Loss 7.1307   LearningRate 0.0294   Epoch: 9   Global Step: 379580   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:21:21,451-Speed 2626.94 samples/sec   Loss 7.3924   LearningRate 0.0294   Epoch: 9   Global Step: 379590   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:25,345-Speed 2629.78 samples/sec   Loss 7.3874   LearningRate 0.0294   Epoch: 9   Global Step: 379600   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:29,361-Speed 2550.76 samples/sec   Loss 7.2338   LearningRate 0.0294   Epoch: 9   Global Step: 379610   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:33,301-Speed 2598.99 samples/sec   Loss 7.2398   LearningRate 0.0294   Epoch: 9   Global Step: 379620   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:37,195-Speed 2630.86 samples/sec   Loss 7.2870   LearningRate 0.0294   Epoch: 9   Global Step: 379630   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:41,090-Speed 2629.10 samples/sec   Loss 7.4450   LearningRate 0.0294   Epoch: 9   Global Step: 379640   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:44,984-Speed 2631.22 samples/sec   Loss 7.3175   LearningRate 0.0294   Epoch: 9   Global Step: 379650   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:48,883-Speed 2626.80 samples/sec   Loss 7.3093   LearningRate 0.0294   Epoch: 9   Global Step: 379660   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:55,327-Speed 1589.18 samples/sec   Loss 7.3101   LearningRate 0.0294   Epoch: 9   Global Step: 379670   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:21:59,225-Speed 2628.28 samples/sec   Loss 7.4478   LearningRate 0.0294   Epoch: 9   Global Step: 379680   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:22:03,107-Speed 2637.67 samples/sec   Loss 7.3541   LearningRate 0.0294   Epoch: 9   Global Step: 379690   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:07,003-Speed 2628.92 samples/sec   Loss 7.3906   LearningRate 0.0294   Epoch: 9   Global Step: 379700   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:10,891-Speed 2634.46 samples/sec   Loss 7.3108   LearningRate 0.0294   Epoch: 9   Global Step: 379710   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:14,776-Speed 2637.45 samples/sec   Loss 7.2572   LearningRate 0.0294   Epoch: 9   Global Step: 379720   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:18,668-Speed 2631.67 samples/sec   Loss 7.4243   LearningRate 0.0294   Epoch: 9   Global Step: 379730   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:22,569-Speed 2625.76 samples/sec   Loss 7.2910   LearningRate 0.0294   Epoch: 9   Global Step: 379740   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:26,467-Speed 2627.77 samples/sec   Loss 7.2914   LearningRate 0.0294   Epoch: 9   Global Step: 379750   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:30,357-Speed 2632.43 samples/sec   Loss 7.2333   LearningRate 0.0294   Epoch: 9   Global Step: 379760   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:34,458-Speed 2498.09 samples/sec   Loss 7.2859   LearningRate 0.0294   Epoch: 9   Global Step: 379770   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:38,356-Speed 2627.07 samples/sec   Loss 7.2460   LearningRate 0.0294   Epoch: 9   Global Step: 379780   Fp16 Grad Scale: 65536   Required: 51 hours
Training: 2022-04-14 14:22:42,253-Speed 2628.83 samples/sec   Loss 7.2830   LearningRate 0.0294   Epoch: 9   Global Step: 379790   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:22:46,164-Speed 2618.37 samples/sec   Loss 7.3678   LearningRate 0.0294   Epoch: 9   Global Step: 379800   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:22:50,064-Speed 2626.89 samples/sec   Loss 7.1995   LearningRate 0.0294   Epoch: 9   Global Step: 379810   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:22:53,961-Speed 2628.49 samples/sec   Loss 7.4021   LearningRate 0.0294   Epoch: 9   Global Step: 379820   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:22:57,859-Speed 2627.12 samples/sec   Loss 7.2495   LearningRate 0.0294   Epoch: 9   Global Step: 379830   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:23:01,762-Speed 2624.38 samples/sec   Loss 7.2969   LearningRate 0.0294   Epoch: 9   Global Step: 379840   Fp16 Grad Scale: 131072   Required: 51 hours
Training: 2022-04-14 14:23:05,654-Speed 2631.40 samples/sec   Loss 7.2905   LearningRate 0.0294   Epoch: 9   Global Step: 379850   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:09,558-Speed 2623.80 samples/sec   Loss 7.3761   LearningRate 0.0294   Epoch: 9   Global Step: 379860   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:13,474-Speed 2615.34 samples/sec   Loss 7.3373   LearningRate 0.0294   Epoch: 9   Global Step: 379870   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:17,375-Speed 2625.47 samples/sec   Loss 7.2922   LearningRate 0.0294   Epoch: 9   Global Step: 379880   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:21,273-Speed 2627.39 samples/sec   Loss 7.3274   LearningRate 0.0294   Epoch: 9   Global Step: 379890   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:25,172-Speed 2627.23 samples/sec   Loss 7.2872   LearningRate 0.0294   Epoch: 9   Global Step: 379900   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:29,073-Speed 2625.48 samples/sec   Loss 7.2463   LearningRate 0.0294   Epoch: 9   Global Step: 379910   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:32,974-Speed 2625.93 samples/sec   Loss 7.3291   LearningRate 0.0294   Epoch: 9   Global Step: 379920   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:23:36,857-Speed 2637.63 samples/sec   Loss 7.3868   LearningRate 0.0294   Epoch: 9   Global Step: 379930   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:23:40,748-Speed 2631.99 samples/sec   Loss 7.3962   LearningRate 0.0294   Epoch: 9   Global Step: 379940   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:23:44,649-Speed 2625.97 samples/sec   Loss 7.2720   LearningRate 0.0294   Epoch: 9   Global Step: 379950   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:23:48,550-Speed 2625.76 samples/sec   Loss 7.3804   LearningRate 0.0294   Epoch: 9   Global Step: 379960   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:23:52,408-Speed 2654.18 samples/sec   Loss 7.5996   LearningRate 0.0294   Epoch: 9   Global Step: 379970   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:23:56,310-Speed 2625.43 samples/sec   Loss 7.7685   LearningRate 0.0294   Epoch: 9   Global Step: 379980   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:24:00,203-Speed 2631.19 samples/sec   Loss 7.3201   LearningRate 0.0294   Epoch: 9   Global Step: 379990   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:24:04,090-Speed 2635.53 samples/sec   Loss 7.0410   LearningRate 0.0294   Epoch: 9   Global Step: 380000   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:24:46,722-[lfw][380000]XNorm: 24.351409
Training: 2022-04-14 14:24:46,723-[lfw][380000]Accuracy-Flip: 0.99750+-0.00271
Training: 2022-04-14 14:24:46,724-[lfw][380000]Accuracy-Highest: 0.99783
Training: 2022-04-14 14:25:36,366-[cfp_fp][380000]XNorm: 22.373011
Training: 2022-04-14 14:25:36,367-[cfp_fp][380000]Accuracy-Flip: 0.98757+-0.00495
Training: 2022-04-14 14:25:36,368-[cfp_fp][380000]Accuracy-Highest: 0.98757
Training: 2022-04-14 14:26:19,163-[agedb_30][380000]XNorm: 24.158033
Training: 2022-04-14 14:26:19,164-[agedb_30][380000]Accuracy-Flip: 0.97583+-0.00549
Training: 2022-04-14 14:26:19,165-[agedb_30][380000]Accuracy-Highest: 0.97700
Training: 2022-04-14 14:26:23,044-Speed 73.69 samples/sec   Loss 7.2693   LearningRate 0.0294   Epoch: 9   Global Step: 380010   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:26:26,968-Speed 2609.64 samples/sec   Loss 7.3517   LearningRate 0.0294   Epoch: 9   Global Step: 380020   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:26:30,847-Speed 2641.01 samples/sec   Loss 7.3764   LearningRate 0.0294   Epoch: 9   Global Step: 380030   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:26:34,724-Speed 2642.17 samples/sec   Loss 7.3075   LearningRate 0.0294   Epoch: 9   Global Step: 380040   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:26:38,611-Speed 2635.02 samples/sec   Loss 7.3524   LearningRate 0.0294   Epoch: 9   Global Step: 380050   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:26:42,487-Speed 2642.55 samples/sec   Loss 7.1790   LearningRate 0.0294   Epoch: 9   Global Step: 380060   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-14 14:26:46,383-Speed 2629.81 samples/sec   Loss 7.2716   LearningRate 0.0294   Epoch: 9   Global Step: 380070   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:26:50,263-Speed 2639.84 samples/sec   Loss 7.2565   LearningRate 0.0294   Epoch: 9   Global Step: 380080   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:26:54,147-Speed 2636.73 samples/sec   Loss 7.3138   LearningRate 0.0294   Epoch: 9   Global Step: 380090   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:26:58,035-Speed 2634.96 samples/sec   Loss 7.3253   LearningRate 0.0294   Epoch: 9   Global Step: 380100   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:27:01,930-Speed 2629.76 samples/sec   Loss 7.3168   LearningRate 0.0294   Epoch: 9   Global Step: 380110   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:27:05,817-Speed 2634.93 samples/sec   Loss 7.3079   LearningRate 0.0294   Epoch: 9   Global Step: 380120   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:27:09,714-Speed 2628.36 samples/sec   Loss 7.1239   LearningRate 0.0294   Epoch: 9   Global Step: 380130   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:27:13,608-Speed 2630.48 samples/sec   Loss 7.3457   LearningRate 0.0294   Epoch: 9   Global Step: 380140   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:27:17,497-Speed 2633.19 samples/sec   Loss 7.3100   LearningRate 0.0293   Epoch: 9   Global Step: 380150   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:27:21,389-Speed 2632.47 samples/sec   Loss 7.1503   LearningRate 0.0293   Epoch: 9   Global Step: 380160   Fp16 Grad Scale: 16384   Required: 51 hours
Training: 2022-04-14 14:27:25,290-Speed 2624.99 samples/sec   Loss 7.2215   LearningRate 0.0293   Epoch: 9   Global Step: 380170   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:27:29,182-Speed 2632.23 samples/sec   Loss 7.2819   LearningRate 0.0293   Epoch: 9   Global Step: 380180   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:27:33,084-Speed 2624.65 samples/sec   Loss 7.2677   LearningRate 0.0293   Epoch: 9   Global Step: 380190   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:27:36,988-Speed 2624.16 samples/sec   Loss 7.2279   LearningRate 0.0293   Epoch: 9   Global Step: 380200   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:27:40,894-Speed 2621.97 samples/sec   Loss 7.2779   LearningRate 0.0293   Epoch: 9   Global Step: 380210   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:27:44,790-Speed 2628.86 samples/sec   Loss 7.3434   LearningRate 0.0293   Epoch: 9   Global Step: 380220   Fp16 Grad Scale: 32768   Required: 51 hours
Training: 2022-04-14 14:27:48,686-Speed 2628.84 samples/sec   Loss 7.1491   LearningRate 0.0293   Epoch: 9   Global Step: 380230   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:27:52,599-Speed 2617.88 samples/sec   Loss 7.3214   LearningRate 0.0293   Epoch: 9   Global Step: 380240   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:27:56,498-Speed 2626.76 samples/sec   Loss 7.2319   LearningRate 0.0293   Epoch: 9   Global Step: 380250   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:28:00,400-Speed 2625.30 samples/sec   Loss 7.2668   LearningRate 0.0293   Epoch: 9   Global Step: 380260   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:28:04,291-Speed 2631.99 samples/sec   Loss 7.2834   LearningRate 0.0293   Epoch: 9   Global Step: 380270   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:28:08,202-Speed 2619.52 samples/sec   Loss 7.3483   LearningRate 0.0293   Epoch: 9   Global Step: 380280   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:28:12,106-Speed 2623.54 samples/sec   Loss 7.3931   LearningRate 0.0293   Epoch: 9   Global Step: 380290   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:28:16,003-Speed 2627.97 samples/sec   Loss 7.2274   LearningRate 0.0293   Epoch: 9   Global Step: 380300   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:28:19,919-Speed 2615.65 samples/sec   Loss 7.3890   LearningRate 0.0293   Epoch: 9   Global Step: 380310   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:28:23,799-Speed 2639.59 samples/sec   Loss 7.4742   LearningRate 0.0293   Epoch: 9   Global Step: 380320   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:27,698-Speed 2626.75 samples/sec   Loss 7.4423   LearningRate 0.0293   Epoch: 9   Global Step: 380330   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:31,593-Speed 2629.80 samples/sec   Loss 7.1873   LearningRate 0.0293   Epoch: 9   Global Step: 380340   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:35,494-Speed 2625.48 samples/sec   Loss 7.1356   LearningRate 0.0293   Epoch: 9   Global Step: 380350   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:39,388-Speed 2630.30 samples/sec   Loss 7.2059   LearningRate 0.0293   Epoch: 9   Global Step: 380360   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:43,286-Speed 2627.69 samples/sec   Loss 7.2360   LearningRate 0.0293   Epoch: 9   Global Step: 380370   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:47,178-Speed 2631.59 samples/sec   Loss 7.2753   LearningRate 0.0293   Epoch: 9   Global Step: 380380   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:51,072-Speed 2630.40 samples/sec   Loss 7.3362   LearningRate 0.0293   Epoch: 9   Global Step: 380390   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:54,971-Speed 2626.98 samples/sec   Loss 7.2647   LearningRate 0.0293   Epoch: 9   Global Step: 380400   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:28:58,863-Speed 2631.47 samples/sec   Loss 7.3404   LearningRate 0.0293   Epoch: 9   Global Step: 380410   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:29:02,757-Speed 2629.92 samples/sec   Loss 7.2897   LearningRate 0.0293   Epoch: 9   Global Step: 380420   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:06,656-Speed 2626.86 samples/sec   Loss 7.3568   LearningRate 0.0293   Epoch: 9   Global Step: 380430   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:10,561-Speed 2622.53 samples/sec   Loss 7.3294   LearningRate 0.0293   Epoch: 9   Global Step: 380440   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:14,454-Speed 2630.99 samples/sec   Loss 7.2679   LearningRate 0.0293   Epoch: 9   Global Step: 380450   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:18,356-Speed 2625.25 samples/sec   Loss 7.3223   LearningRate 0.0293   Epoch: 9   Global Step: 380460   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:22,269-Speed 2617.85 samples/sec   Loss 7.3811   LearningRate 0.0293   Epoch: 9   Global Step: 380470   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:26,165-Speed 2628.55 samples/sec   Loss 7.2693   LearningRate 0.0293   Epoch: 9   Global Step: 380480   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:30,065-Speed 2626.63 samples/sec   Loss 7.2592   LearningRate 0.0293   Epoch: 9   Global Step: 380490   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:33,959-Speed 2630.16 samples/sec   Loss 7.3453   LearningRate 0.0293   Epoch: 9   Global Step: 380500   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:37,848-Speed 2633.00 samples/sec   Loss 7.2950   LearningRate 0.0293   Epoch: 9   Global Step: 380510   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:29:41,748-Speed 2626.50 samples/sec   Loss 7.2226   LearningRate 0.0293   Epoch: 9   Global Step: 380520   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:29:45,660-Speed 2618.34 samples/sec   Loss 7.2981   LearningRate 0.0293   Epoch: 9   Global Step: 380530   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:29:49,559-Speed 2626.59 samples/sec   Loss 7.3338   LearningRate 0.0293   Epoch: 9   Global Step: 380540   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:29:53,455-Speed 2628.84 samples/sec   Loss 7.4067   LearningRate 0.0293   Epoch: 9   Global Step: 380550   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:29:57,346-Speed 2632.81 samples/sec   Loss 7.2377   LearningRate 0.0293   Epoch: 9   Global Step: 380560   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:01,262-Speed 2615.35 samples/sec   Loss 7.2325   LearningRate 0.0293   Epoch: 9   Global Step: 380570   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:05,170-Speed 2620.82 samples/sec   Loss 7.2961   LearningRate 0.0293   Epoch: 9   Global Step: 380580   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:09,069-Speed 2626.55 samples/sec   Loss 7.4103   LearningRate 0.0293   Epoch: 9   Global Step: 380590   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:12,973-Speed 2623.88 samples/sec   Loss 7.2808   LearningRate 0.0293   Epoch: 9   Global Step: 380600   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:16,875-Speed 2624.69 samples/sec   Loss 7.2373   LearningRate 0.0293   Epoch: 9   Global Step: 380610   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:20,755-Speed 2639.89 samples/sec   Loss 7.3063   LearningRate 0.0293   Epoch: 9   Global Step: 380620   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:24,664-Speed 2620.46 samples/sec   Loss 7.3339   LearningRate 0.0293   Epoch: 9   Global Step: 380630   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:28,564-Speed 2626.13 samples/sec   Loss 7.3839   LearningRate 0.0293   Epoch: 9   Global Step: 380640   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:32,561-Speed 2562.20 samples/sec   Loss 7.4317   LearningRate 0.0293   Epoch: 9   Global Step: 380650   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:36,462-Speed 2625.92 samples/sec   Loss 7.3450   LearningRate 0.0293   Epoch: 9   Global Step: 380660   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:40,360-Speed 2627.03 samples/sec   Loss 7.1882   LearningRate 0.0293   Epoch: 9   Global Step: 380670   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:44,261-Speed 2626.00 samples/sec   Loss 7.2234   LearningRate 0.0293   Epoch: 9   Global Step: 380680   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:48,165-Speed 2623.13 samples/sec   Loss 7.2114   LearningRate 0.0293   Epoch: 9   Global Step: 380690   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:52,069-Speed 2623.41 samples/sec   Loss 7.3069   LearningRate 0.0293   Epoch: 9   Global Step: 380700   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:55,966-Speed 2628.76 samples/sec   Loss 7.3360   LearningRate 0.0293   Epoch: 9   Global Step: 380710   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:30:59,816-Speed 2659.89 samples/sec   Loss 7.4761   LearningRate 0.0293   Epoch: 9   Global Step: 380720   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:03,727-Speed 2618.95 samples/sec   Loss 7.5657   LearningRate 0.0293   Epoch: 9   Global Step: 380730   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:07,623-Speed 2628.44 samples/sec   Loss 7.2141   LearningRate 0.0293   Epoch: 9   Global Step: 380740   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:11,529-Speed 2623.16 samples/sec   Loss 7.4339   LearningRate 0.0293   Epoch: 9   Global Step: 380750   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:15,420-Speed 2632.12 samples/sec   Loss 7.2759   LearningRate 0.0293   Epoch: 9   Global Step: 380760   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:19,316-Speed 2629.12 samples/sec   Loss 7.3055   LearningRate 0.0293   Epoch: 9   Global Step: 380770   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:23,229-Speed 2616.83 samples/sec   Loss 7.4357   LearningRate 0.0293   Epoch: 9   Global Step: 380780   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:27,131-Speed 2625.60 samples/sec   Loss 7.4269   LearningRate 0.0293   Epoch: 9   Global Step: 380790   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:31,032-Speed 2625.31 samples/sec   Loss 7.3246   LearningRate 0.0293   Epoch: 9   Global Step: 380800   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:34,930-Speed 2627.08 samples/sec   Loss 7.2714   LearningRate 0.0293   Epoch: 9   Global Step: 380810   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:31:38,856-Speed 2609.29 samples/sec   Loss 7.2665   LearningRate 0.0293   Epoch: 9   Global Step: 380820   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:31:42,753-Speed 2628.53 samples/sec   Loss 7.4119   LearningRate 0.0293   Epoch: 9   Global Step: 380830   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:31:46,651-Speed 2627.60 samples/sec   Loss 7.1262   LearningRate 0.0293   Epoch: 9   Global Step: 380840   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:31:50,555-Speed 2623.75 samples/sec   Loss 7.3539   LearningRate 0.0293   Epoch: 9   Global Step: 380850   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:31:54,451-Speed 2629.22 samples/sec   Loss 7.2859   LearningRate 0.0293   Epoch: 9   Global Step: 380860   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:31:58,356-Speed 2622.35 samples/sec   Loss 7.1987   LearningRate 0.0293   Epoch: 9   Global Step: 380870   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:32:02,259-Speed 2624.61 samples/sec   Loss 7.3007   LearningRate 0.0293   Epoch: 9   Global Step: 380880   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:32:06,150-Speed 2631.69 samples/sec   Loss 7.3112   LearningRate 0.0293   Epoch: 9   Global Step: 380890   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:32:10,047-Speed 2628.57 samples/sec   Loss 7.2827   LearningRate 0.0293   Epoch: 9   Global Step: 380900   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:32:13,938-Speed 2632.46 samples/sec   Loss 7.4300   LearningRate 0.0293   Epoch: 9   Global Step: 380910   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:32:17,849-Speed 2618.85 samples/sec   Loss 7.2927   LearningRate 0.0292   Epoch: 9   Global Step: 380920   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:21,746-Speed 2627.84 samples/sec   Loss 7.2857   LearningRate 0.0292   Epoch: 9   Global Step: 380930   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:25,647-Speed 2626.43 samples/sec   Loss 7.3255   LearningRate 0.0292   Epoch: 9   Global Step: 380940   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:29,534-Speed 2634.60 samples/sec   Loss 7.3952   LearningRate 0.0292   Epoch: 9   Global Step: 380950   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:33,481-Speed 2595.25 samples/sec   Loss 7.2867   LearningRate 0.0292   Epoch: 9   Global Step: 380960   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:37,383-Speed 2624.97 samples/sec   Loss 7.2727   LearningRate 0.0292   Epoch: 9   Global Step: 380970   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:41,327-Speed 2596.57 samples/sec   Loss 7.4273   LearningRate 0.0292   Epoch: 9   Global Step: 380980   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:45,224-Speed 2627.88 samples/sec   Loss 7.3503   LearningRate 0.0292   Epoch: 9   Global Step: 380990   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:49,119-Speed 2629.81 samples/sec   Loss 7.3350   LearningRate 0.0292   Epoch: 9   Global Step: 381000   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:53,021-Speed 2625.18 samples/sec   Loss 7.3494   LearningRate 0.0292   Epoch: 9   Global Step: 381010   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:32:56,924-Speed 2624.29 samples/sec   Loss 7.1961   LearningRate 0.0292   Epoch: 9   Global Step: 381020   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:00,816-Speed 2631.87 samples/sec   Loss 7.2674   LearningRate 0.0292   Epoch: 9   Global Step: 381030   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:04,719-Speed 2624.24 samples/sec   Loss 7.2784   LearningRate 0.0292   Epoch: 9   Global Step: 381040   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:08,615-Speed 2628.59 samples/sec   Loss 7.2292   LearningRate 0.0292   Epoch: 9   Global Step: 381050   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:12,511-Speed 2629.07 samples/sec   Loss 7.2072   LearningRate 0.0292   Epoch: 9   Global Step: 381060   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:16,410-Speed 2626.73 samples/sec   Loss 7.2986   LearningRate 0.0292   Epoch: 9   Global Step: 381070   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:20,312-Speed 2625.03 samples/sec   Loss 7.3952   LearningRate 0.0292   Epoch: 9   Global Step: 381080   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:24,204-Speed 2631.61 samples/sec   Loss 7.3166   LearningRate 0.0292   Epoch: 9   Global Step: 381090   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:28,099-Speed 2629.45 samples/sec   Loss 7.2884   LearningRate 0.0292   Epoch: 9   Global Step: 381100   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:32,000-Speed 2626.21 samples/sec   Loss 7.3216   LearningRate 0.0292   Epoch: 9   Global Step: 381110   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:35,887-Speed 2634.51 samples/sec   Loss 7.1959   LearningRate 0.0292   Epoch: 9   Global Step: 381120   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:33:39,766-Speed 2640.55 samples/sec   Loss 7.2531   LearningRate 0.0292   Epoch: 9   Global Step: 381130   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:43,661-Speed 2629.42 samples/sec   Loss 7.2486   LearningRate 0.0292   Epoch: 9   Global Step: 381140   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:47,552-Speed 2632.56 samples/sec   Loss 7.4212   LearningRate 0.0292   Epoch: 9   Global Step: 381150   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:51,451-Speed 2627.03 samples/sec   Loss 7.3905   LearningRate 0.0292   Epoch: 9   Global Step: 381160   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:55,341-Speed 2632.92 samples/sec   Loss 7.3462   LearningRate 0.0292   Epoch: 9   Global Step: 381170   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:33:59,239-Speed 2627.86 samples/sec   Loss 7.1775   LearningRate 0.0292   Epoch: 9   Global Step: 381180   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:34:03,125-Speed 2635.47 samples/sec   Loss 7.3278   LearningRate 0.0292   Epoch: 9   Global Step: 381190   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:34:07,028-Speed 2624.40 samples/sec   Loss 7.2689   LearningRate 0.0292   Epoch: 9   Global Step: 381200   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:34:10,924-Speed 2628.45 samples/sec   Loss 7.1965   LearningRate 0.0292   Epoch: 9   Global Step: 381210   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:34:14,790-Speed 2649.20 samples/sec   Loss 7.2299   LearningRate 0.0292   Epoch: 9   Global Step: 381220   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:34:18,672-Speed 2638.61 samples/sec   Loss 7.4905   LearningRate 0.0292   Epoch: 9   Global Step: 381230   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:22,576-Speed 2623.74 samples/sec   Loss 8.2937   LearningRate 0.0292   Epoch: 9   Global Step: 381240   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:26,467-Speed 2632.21 samples/sec   Loss 7.6895   LearningRate 0.0292   Epoch: 9   Global Step: 381250   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:30,364-Speed 2627.91 samples/sec   Loss 7.4142   LearningRate 0.0292   Epoch: 9   Global Step: 381260   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:34,262-Speed 2627.94 samples/sec   Loss 7.3700   LearningRate 0.0292   Epoch: 9   Global Step: 381270   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:38,158-Speed 2629.09 samples/sec   Loss 7.2916   LearningRate 0.0292   Epoch: 9   Global Step: 381280   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:42,052-Speed 2630.34 samples/sec   Loss 7.3248   LearningRate 0.0292   Epoch: 9   Global Step: 381290   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:45,953-Speed 2625.64 samples/sec   Loss 7.3324   LearningRate 0.0292   Epoch: 9   Global Step: 381300   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:49,858-Speed 2622.51 samples/sec   Loss 7.2649   LearningRate 0.0292   Epoch: 9   Global Step: 381310   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:53,755-Speed 2628.87 samples/sec   Loss 7.2873   LearningRate 0.0292   Epoch: 9   Global Step: 381320   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:34:57,650-Speed 2629.23 samples/sec   Loss 7.3722   LearningRate 0.0292   Epoch: 9   Global Step: 381330   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:01,543-Speed 2630.75 samples/sec   Loss 7.3807   LearningRate 0.0292   Epoch: 9   Global Step: 381340   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:05,438-Speed 2630.03 samples/sec   Loss 7.3452   LearningRate 0.0292   Epoch: 9   Global Step: 381350   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:09,339-Speed 2625.11 samples/sec   Loss 7.2508   LearningRate 0.0292   Epoch: 9   Global Step: 381360   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:13,229-Speed 2633.73 samples/sec   Loss 7.2536   LearningRate 0.0292   Epoch: 9   Global Step: 381370   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:17,137-Speed 2620.60 samples/sec   Loss 7.3158   LearningRate 0.0292   Epoch: 9   Global Step: 381380   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:21,040-Speed 2624.07 samples/sec   Loss 7.3338   LearningRate 0.0292   Epoch: 9   Global Step: 381390   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:24,939-Speed 2626.52 samples/sec   Loss 7.1513   LearningRate 0.0292   Epoch: 9   Global Step: 381400   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:28,843-Speed 2623.77 samples/sec   Loss 7.3231   LearningRate 0.0292   Epoch: 9   Global Step: 381410   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:35:32,728-Speed 2636.48 samples/sec   Loss 7.5485   LearningRate 0.0292   Epoch: 9   Global Step: 381420   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:35:36,622-Speed 2630.64 samples/sec   Loss 7.6080   LearningRate 0.0292   Epoch: 9   Global Step: 381430   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:35:40,537-Speed 2615.76 samples/sec   Loss 7.3854   LearningRate 0.0292   Epoch: 9   Global Step: 381440   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:35:44,439-Speed 2624.82 samples/sec   Loss 7.2962   LearningRate 0.0292   Epoch: 9   Global Step: 381450   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:35:48,337-Speed 2628.10 samples/sec   Loss 7.4302   LearningRate 0.0292   Epoch: 9   Global Step: 381460   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:35:52,236-Speed 2626.44 samples/sec   Loss 7.2710   LearningRate 0.0292   Epoch: 9   Global Step: 381470   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:35:56,133-Speed 2628.72 samples/sec   Loss 7.2049   LearningRate 0.0292   Epoch: 9   Global Step: 381480   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:36:00,051-Speed 2614.04 samples/sec   Loss 7.3654   LearningRate 0.0292   Epoch: 9   Global Step: 381490   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:36:03,944-Speed 2630.55 samples/sec   Loss 7.3982   LearningRate 0.0292   Epoch: 9   Global Step: 381500   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:36:07,845-Speed 2625.60 samples/sec   Loss 7.2216   LearningRate 0.0292   Epoch: 9   Global Step: 381510   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:36:11,755-Speed 2620.07 samples/sec   Loss 7.3482   LearningRate 0.0292   Epoch: 9   Global Step: 381520   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:15,648-Speed 2631.00 samples/sec   Loss 7.2362   LearningRate 0.0292   Epoch: 9   Global Step: 381530   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:19,545-Speed 2628.43 samples/sec   Loss 7.3603   LearningRate 0.0292   Epoch: 9   Global Step: 381540   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:23,437-Speed 2630.89 samples/sec   Loss 7.2804   LearningRate 0.0292   Epoch: 9   Global Step: 381550   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:27,339-Speed 2625.40 samples/sec   Loss 7.2985   LearningRate 0.0292   Epoch: 9   Global Step: 381560   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:31,240-Speed 2625.54 samples/sec   Loss 7.2666   LearningRate 0.0292   Epoch: 9   Global Step: 381570   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:35,130-Speed 2632.74 samples/sec   Loss 7.3585   LearningRate 0.0292   Epoch: 9   Global Step: 381580   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:39,028-Speed 2626.95 samples/sec   Loss 7.3227   LearningRate 0.0292   Epoch: 9   Global Step: 381590   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:42,926-Speed 2628.08 samples/sec   Loss 7.4256   LearningRate 0.0292   Epoch: 9   Global Step: 381600   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:46,832-Speed 2621.78 samples/sec   Loss 7.3203   LearningRate 0.0292   Epoch: 9   Global Step: 381610   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:36:50,730-Speed 2627.91 samples/sec   Loss 7.2659   LearningRate 0.0292   Epoch: 9   Global Step: 381620   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:36:54,617-Speed 2635.37 samples/sec   Loss 7.1392   LearningRate 0.0292   Epoch: 9   Global Step: 381630   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:36:58,510-Speed 2630.60 samples/sec   Loss 7.1996   LearningRate 0.0292   Epoch: 9   Global Step: 381640   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:02,415-Speed 2623.00 samples/sec   Loss 7.2296   LearningRate 0.0292   Epoch: 9   Global Step: 381650   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:06,304-Speed 2633.70 samples/sec   Loss 7.2122   LearningRate 0.0292   Epoch: 9   Global Step: 381660   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:10,201-Speed 2627.65 samples/sec   Loss 7.2579   LearningRate 0.0292   Epoch: 9   Global Step: 381670   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:14,120-Speed 2614.38 samples/sec   Loss 7.2795   LearningRate 0.0292   Epoch: 9   Global Step: 381680   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:18,026-Speed 2622.29 samples/sec   Loss 7.3117   LearningRate 0.0291   Epoch: 9   Global Step: 381690   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:21,924-Speed 2627.07 samples/sec   Loss 7.3151   LearningRate 0.0291   Epoch: 9   Global Step: 381700   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:25,820-Speed 2629.63 samples/sec   Loss 7.3792   LearningRate 0.0291   Epoch: 9   Global Step: 381710   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:37:29,717-Speed 2628.19 samples/sec   Loss 7.2910   LearningRate 0.0291   Epoch: 9   Global Step: 381720   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:37:33,620-Speed 2624.49 samples/sec   Loss 7.2543   LearningRate 0.0291   Epoch: 9   Global Step: 381730   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:37:37,514-Speed 2629.93 samples/sec   Loss 7.3106   LearningRate 0.0291   Epoch: 9   Global Step: 381740   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:37:41,409-Speed 2629.80 samples/sec   Loss 7.3171   LearningRate 0.0291   Epoch: 9   Global Step: 381750   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:37:45,314-Speed 2622.38 samples/sec   Loss 7.2183   LearningRate 0.0291   Epoch: 9   Global Step: 381760   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:37:49,221-Speed 2621.80 samples/sec   Loss 7.2364   LearningRate 0.0291   Epoch: 9   Global Step: 381770   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:37:53,109-Speed 2633.88 samples/sec   Loss 7.2469   LearningRate 0.0291   Epoch: 9   Global Step: 381780   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:37:57,005-Speed 2629.25 samples/sec   Loss 7.3600   LearningRate 0.0291   Epoch: 9   Global Step: 381790   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:00,902-Speed 2628.15 samples/sec   Loss 7.2876   LearningRate 0.0291   Epoch: 9   Global Step: 381800   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:04,796-Speed 2630.64 samples/sec   Loss 7.1491   LearningRate 0.0291   Epoch: 9   Global Step: 381810   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:08,687-Speed 2632.24 samples/sec   Loss 7.2976   LearningRate 0.0291   Epoch: 9   Global Step: 381820   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:38:12,581-Speed 2630.42 samples/sec   Loss 7.3078   LearningRate 0.0291   Epoch: 9   Global Step: 381830   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:38:16,518-Speed 2601.24 samples/sec   Loss 7.2566   LearningRate 0.0291   Epoch: 9   Global Step: 381840   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:38:20,490-Speed 2578.68 samples/sec   Loss 7.2085   LearningRate 0.0291   Epoch: 9   Global Step: 381850   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:38:24,376-Speed 2635.43 samples/sec   Loss 7.2602   LearningRate 0.0291   Epoch: 9   Global Step: 381860   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:28,288-Speed 2618.15 samples/sec   Loss 7.3351   LearningRate 0.0291   Epoch: 9   Global Step: 381870   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:32,193-Speed 2623.11 samples/sec   Loss 7.2311   LearningRate 0.0291   Epoch: 9   Global Step: 381880   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:36,089-Speed 2628.32 samples/sec   Loss 7.2137   LearningRate 0.0291   Epoch: 9   Global Step: 381890   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:40,007-Speed 2614.60 samples/sec   Loss 7.3819   LearningRate 0.0291   Epoch: 9   Global Step: 381900   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:43,909-Speed 2624.61 samples/sec   Loss 7.2314   LearningRate 0.0291   Epoch: 9   Global Step: 381910   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:47,812-Speed 2624.83 samples/sec   Loss 7.2190   LearningRate 0.0291   Epoch: 9   Global Step: 381920   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:51,893-Speed 2509.53 samples/sec   Loss 7.1955   LearningRate 0.0291   Epoch: 9   Global Step: 381930   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:38:55,781-Speed 2634.07 samples/sec   Loss 7.2439   LearningRate 0.0291   Epoch: 9   Global Step: 381940   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:38:59,696-Speed 2616.32 samples/sec   Loss 7.8477   LearningRate 0.0291   Epoch: 9   Global Step: 381950   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:03,614-Speed 2614.11 samples/sec   Loss 7.5094   LearningRate 0.0291   Epoch: 9   Global Step: 381960   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:07,515-Speed 2625.38 samples/sec   Loss 7.2853   LearningRate 0.0291   Epoch: 9   Global Step: 381970   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:11,417-Speed 2625.03 samples/sec   Loss 7.2722   LearningRate 0.0291   Epoch: 9   Global Step: 381980   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:15,308-Speed 2631.71 samples/sec   Loss 7.2407   LearningRate 0.0291   Epoch: 9   Global Step: 381990   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:19,205-Speed 2629.04 samples/sec   Loss 7.2080   LearningRate 0.0291   Epoch: 9   Global Step: 382000   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:23,107-Speed 2624.93 samples/sec   Loss 7.2654   LearningRate 0.0291   Epoch: 9   Global Step: 382010   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:27,003-Speed 2629.10 samples/sec   Loss 7.2430   LearningRate 0.0291   Epoch: 9   Global Step: 382020   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:30,897-Speed 2629.72 samples/sec   Loss 7.2645   LearningRate 0.0291   Epoch: 9   Global Step: 382030   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:39:34,791-Speed 2630.83 samples/sec   Loss 7.1310   LearningRate 0.0291   Epoch: 9   Global Step: 382040   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:39:38,690-Speed 2626.69 samples/sec   Loss 7.2667   LearningRate 0.0291   Epoch: 9   Global Step: 382050   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:39:42,598-Speed 2620.28 samples/sec   Loss 7.2982   LearningRate 0.0291   Epoch: 9   Global Step: 382060   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:39:46,518-Speed 2613.14 samples/sec   Loss 7.2589   LearningRate 0.0291   Epoch: 9   Global Step: 382070   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:39:50,417-Speed 2626.24 samples/sec   Loss 7.1817   LearningRate 0.0291   Epoch: 9   Global Step: 382080   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:39:54,358-Speed 2599.31 samples/sec   Loss 7.2765   LearningRate 0.0291   Epoch: 9   Global Step: 382090   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:39:58,449-Speed 2503.50 samples/sec   Loss 7.2348   LearningRate 0.0291   Epoch: 9   Global Step: 382100   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:02,449-Speed 2560.88 samples/sec   Loss 7.3205   LearningRate 0.0291   Epoch: 9   Global Step: 382110   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:06,394-Speed 2596.31 samples/sec   Loss 7.2671   LearningRate 0.0291   Epoch: 9   Global Step: 382120   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:10,306-Speed 2618.20 samples/sec   Loss 7.3382   LearningRate 0.0291   Epoch: 9   Global Step: 382130   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:14,201-Speed 2629.19 samples/sec   Loss 7.3941   LearningRate 0.0291   Epoch: 9   Global Step: 382140   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:40:18,100-Speed 2626.96 samples/sec   Loss 7.5041   LearningRate 0.0291   Epoch: 9   Global Step: 382150   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:22,001-Speed 2625.98 samples/sec   Loss 7.6680   LearningRate 0.0291   Epoch: 9   Global Step: 382160   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:25,901-Speed 2625.74 samples/sec   Loss 7.4646   LearningRate 0.0291   Epoch: 9   Global Step: 382170   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:29,850-Speed 2594.13 samples/sec   Loss 7.3426   LearningRate 0.0291   Epoch: 9   Global Step: 382180   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:33,756-Speed 2622.54 samples/sec   Loss 7.1874   LearningRate 0.0291   Epoch: 9   Global Step: 382190   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:37,768-Speed 2552.77 samples/sec   Loss 7.2823   LearningRate 0.0291   Epoch: 9   Global Step: 382200   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:41,806-Speed 2536.83 samples/sec   Loss 7.3311   LearningRate 0.0291   Epoch: 9   Global Step: 382210   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:40:45,735-Speed 2606.58 samples/sec   Loss 7.3615   LearningRate 0.0291   Epoch: 9   Global Step: 382220   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:40:49,717-Speed 2572.13 samples/sec   Loss 7.2844   LearningRate 0.0291   Epoch: 9   Global Step: 382230   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:40:53,623-Speed 2622.35 samples/sec   Loss 7.4938   LearningRate 0.0291   Epoch: 9   Global Step: 382240   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:40:57,526-Speed 2624.34 samples/sec   Loss 7.3855   LearningRate 0.0291   Epoch: 9   Global Step: 382250   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:41:01,421-Speed 2629.26 samples/sec   Loss 7.2494   LearningRate 0.0291   Epoch: 9   Global Step: 382260   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:41:05,309-Speed 2634.21 samples/sec   Loss 7.1420   LearningRate 0.0291   Epoch: 9   Global Step: 382270   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:41:09,258-Speed 2593.98 samples/sec   Loss 7.3399   LearningRate 0.0291   Epoch: 9   Global Step: 382280   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:41:13,157-Speed 2626.80 samples/sec   Loss 7.2777   LearningRate 0.0291   Epoch: 9   Global Step: 382290   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:41:17,066-Speed 2620.15 samples/sec   Loss 7.2555   LearningRate 0.0291   Epoch: 9   Global Step: 382300   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:41:20,968-Speed 2625.22 samples/sec   Loss 7.3646   LearningRate 0.0291   Epoch: 9   Global Step: 382310   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:41:25,096-Speed 2481.13 samples/sec   Loss 7.2017   LearningRate 0.0291   Epoch: 9   Global Step: 382320   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:28,994-Speed 2627.77 samples/sec   Loss 7.2761   LearningRate 0.0291   Epoch: 9   Global Step: 382330   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:32,910-Speed 2615.05 samples/sec   Loss 7.2673   LearningRate 0.0291   Epoch: 9   Global Step: 382340   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:36,802-Speed 2631.66 samples/sec   Loss 7.3083   LearningRate 0.0291   Epoch: 9   Global Step: 382350   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:40,707-Speed 2622.43 samples/sec   Loss 7.2714   LearningRate 0.0291   Epoch: 9   Global Step: 382360   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:44,600-Speed 2631.05 samples/sec   Loss 7.1499   LearningRate 0.0291   Epoch: 9   Global Step: 382370   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:48,501-Speed 2626.37 samples/sec   Loss 7.3177   LearningRate 0.0291   Epoch: 9   Global Step: 382380   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:52,390-Speed 2633.67 samples/sec   Loss 7.2488   LearningRate 0.0291   Epoch: 9   Global Step: 382390   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:41:56,278-Speed 2634.47 samples/sec   Loss 7.3252   LearningRate 0.0291   Epoch: 9   Global Step: 382400   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:42:00,176-Speed 2627.42 samples/sec   Loss 7.2118   LearningRate 0.0291   Epoch: 9   Global Step: 382410   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:42:04,081-Speed 2622.35 samples/sec   Loss 7.2154   LearningRate 0.0291   Epoch: 9   Global Step: 382420   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:07,979-Speed 2627.47 samples/sec   Loss 7.2016   LearningRate 0.0291   Epoch: 9   Global Step: 382430   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:11,864-Speed 2636.26 samples/sec   Loss 7.3534   LearningRate 0.0291   Epoch: 9   Global Step: 382440   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:15,767-Speed 2624.61 samples/sec   Loss 7.2646   LearningRate 0.0291   Epoch: 9   Global Step: 382450   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:19,670-Speed 2624.25 samples/sec   Loss 7.3675   LearningRate 0.0290   Epoch: 9   Global Step: 382460   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:23,557-Speed 2634.85 samples/sec   Loss 7.1842   LearningRate 0.0290   Epoch: 9   Global Step: 382470   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:27,453-Speed 2629.56 samples/sec   Loss 7.2553   LearningRate 0.0290   Epoch: 9   Global Step: 382480   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:31,353-Speed 2626.12 samples/sec   Loss 7.1649   LearningRate 0.0290   Epoch: 9   Global Step: 382490   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:35,247-Speed 2629.84 samples/sec   Loss 7.2986   LearningRate 0.0290   Epoch: 9   Global Step: 382500   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:39,139-Speed 2631.67 samples/sec   Loss 7.1637   LearningRate 0.0290   Epoch: 9   Global Step: 382510   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:42:43,037-Speed 2627.60 samples/sec   Loss 7.1309   LearningRate 0.0290   Epoch: 9   Global Step: 382520   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:42:46,945-Speed 2620.59 samples/sec   Loss 7.3324   LearningRate 0.0290   Epoch: 9   Global Step: 382530   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:42:50,841-Speed 2629.53 samples/sec   Loss 7.3476   LearningRate 0.0290   Epoch: 9   Global Step: 382540   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:42:54,743-Speed 2624.29 samples/sec   Loss 7.2153   LearningRate 0.0290   Epoch: 9   Global Step: 382550   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:42:58,641-Speed 2627.98 samples/sec   Loss 7.2190   LearningRate 0.0290   Epoch: 9   Global Step: 382560   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:43:02,523-Speed 2638.40 samples/sec   Loss 7.3183   LearningRate 0.0290   Epoch: 9   Global Step: 382570   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:43:07,438-Speed 2083.76 samples/sec   Loss 7.2252   LearningRate 0.0290   Epoch: 9   Global Step: 382580   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:43:11,334-Speed 2629.36 samples/sec   Loss 7.3161   LearningRate 0.0290   Epoch: 9   Global Step: 382590   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:43:15,252-Speed 2614.31 samples/sec   Loss 7.2200   LearningRate 0.0290   Epoch: 9   Global Step: 382600   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:43:19,158-Speed 2622.57 samples/sec   Loss 7.2004   LearningRate 0.0290   Epoch: 9   Global Step: 382610   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:43:23,053-Speed 2629.43 samples/sec   Loss 7.1423   LearningRate 0.0290   Epoch: 9   Global Step: 382620   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:26,956-Speed 2624.59 samples/sec   Loss 7.3195   LearningRate 0.0290   Epoch: 9   Global Step: 382630   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:30,864-Speed 2620.98 samples/sec   Loss 7.3201   LearningRate 0.0290   Epoch: 9   Global Step: 382640   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:34,793-Speed 2606.70 samples/sec   Loss 7.3078   LearningRate 0.0290   Epoch: 9   Global Step: 382650   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:38,760-Speed 2582.10 samples/sec   Loss 7.1917   LearningRate 0.0290   Epoch: 9   Global Step: 382660   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:42,655-Speed 2629.70 samples/sec   Loss 7.3805   LearningRate 0.0290   Epoch: 9   Global Step: 382670   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:46,561-Speed 2622.31 samples/sec   Loss 7.2444   LearningRate 0.0290   Epoch: 9   Global Step: 382680   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:50,476-Speed 2616.60 samples/sec   Loss 7.1194   LearningRate 0.0290   Epoch: 9   Global Step: 382690   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:54,380-Speed 2623.32 samples/sec   Loss 7.3731   LearningRate 0.0290   Epoch: 9   Global Step: 382700   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:43:58,282-Speed 2624.86 samples/sec   Loss 7.2754   LearningRate 0.0290   Epoch: 9   Global Step: 382710   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:44:02,174-Speed 2632.08 samples/sec   Loss 7.6548   LearningRate 0.0290   Epoch: 9   Global Step: 382720   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:06,074-Speed 2626.17 samples/sec   Loss 7.6230   LearningRate 0.0290   Epoch: 9   Global Step: 382730   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:10,021-Speed 2594.85 samples/sec   Loss 7.2889   LearningRate 0.0290   Epoch: 9   Global Step: 382740   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:13,927-Speed 2622.03 samples/sec   Loss 7.2939   LearningRate 0.0290   Epoch: 9   Global Step: 382750   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:17,825-Speed 2627.62 samples/sec   Loss 7.3349   LearningRate 0.0290   Epoch: 9   Global Step: 382760   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:21,756-Speed 2605.44 samples/sec   Loss 7.2081   LearningRate 0.0290   Epoch: 9   Global Step: 382770   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:25,698-Speed 2598.67 samples/sec   Loss 7.3071   LearningRate 0.0290   Epoch: 9   Global Step: 382780   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:29,597-Speed 2627.16 samples/sec   Loss 7.5249   LearningRate 0.0290   Epoch: 9   Global Step: 382790   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:33,605-Speed 2555.66 samples/sec   Loss 7.3362   LearningRate 0.0290   Epoch: 9   Global Step: 382800   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:37,506-Speed 2624.99 samples/sec   Loss 7.3559   LearningRate 0.0290   Epoch: 9   Global Step: 382810   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:44:41,407-Speed 2626.00 samples/sec   Loss 7.2748   LearningRate 0.0290   Epoch: 9   Global Step: 382820   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:44:45,303-Speed 2628.80 samples/sec   Loss 7.3207   LearningRate 0.0290   Epoch: 9   Global Step: 382830   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:44:49,191-Speed 2634.46 samples/sec   Loss 7.3733   LearningRate 0.0290   Epoch: 9   Global Step: 382840   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:44:53,085-Speed 2630.52 samples/sec   Loss 7.3076   LearningRate 0.0290   Epoch: 9   Global Step: 382850   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:44:56,987-Speed 2624.55 samples/sec   Loss 7.2252   LearningRate 0.0290   Epoch: 9   Global Step: 382860   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:45:00,885-Speed 2628.34 samples/sec   Loss 7.2488   LearningRate 0.0290   Epoch: 9   Global Step: 382870   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:45:04,784-Speed 2626.88 samples/sec   Loss 7.2864   LearningRate 0.0290   Epoch: 9   Global Step: 382880   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:45:08,700-Speed 2615.76 samples/sec   Loss 7.3032   LearningRate 0.0290   Epoch: 9   Global Step: 382890   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:45:12,618-Speed 2613.77 samples/sec   Loss 7.1321   LearningRate 0.0290   Epoch: 9   Global Step: 382900   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:45:16,504-Speed 2635.60 samples/sec   Loss 7.3772   LearningRate 0.0290   Epoch: 9   Global Step: 382910   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:45:20,405-Speed 2625.36 samples/sec   Loss 7.2918   LearningRate 0.0290   Epoch: 9   Global Step: 382920   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:24,305-Speed 2626.61 samples/sec   Loss 7.2155   LearningRate 0.0290   Epoch: 9   Global Step: 382930   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:28,198-Speed 2631.14 samples/sec   Loss 7.3712   LearningRate 0.0290   Epoch: 9   Global Step: 382940   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:32,104-Speed 2622.88 samples/sec   Loss 7.2149   LearningRate 0.0290   Epoch: 9   Global Step: 382950   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:36,024-Speed 2612.65 samples/sec   Loss 7.2466   LearningRate 0.0290   Epoch: 9   Global Step: 382960   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:39,941-Speed 2614.77 samples/sec   Loss 7.2692   LearningRate 0.0290   Epoch: 9   Global Step: 382970   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:43,831-Speed 2633.12 samples/sec   Loss 7.1910   LearningRate 0.0290   Epoch: 9   Global Step: 382980   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:47,729-Speed 2627.42 samples/sec   Loss 7.2548   LearningRate 0.0290   Epoch: 9   Global Step: 382990   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:51,620-Speed 2632.57 samples/sec   Loss 7.2474   LearningRate 0.0290   Epoch: 9   Global Step: 383000   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:55,517-Speed 2628.29 samples/sec   Loss 7.2435   LearningRate 0.0290   Epoch: 9   Global Step: 383010   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:45:59,407-Speed 2633.00 samples/sec   Loss 7.2056   LearningRate 0.0290   Epoch: 9   Global Step: 383020   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:46:03,317-Speed 2619.46 samples/sec   Loss 7.1142   LearningRate 0.0290   Epoch: 9   Global Step: 383030   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:46:07,242-Speed 2609.37 samples/sec   Loss 7.1827   LearningRate 0.0290   Epoch: 9   Global Step: 383040   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:46:11,135-Speed 2630.85 samples/sec   Loss 7.2141   LearningRate 0.0290   Epoch: 9   Global Step: 383050   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:46:15,038-Speed 2624.21 samples/sec   Loss 7.2654   LearningRate 0.0290   Epoch: 9   Global Step: 383060   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:46:18,863-Speed 2678.11 samples/sec   Loss 7.5155   LearningRate 0.0290   Epoch: 9   Global Step: 383070   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:22,786-Speed 2610.74 samples/sec   Loss 7.2722   LearningRate 0.0290   Epoch: 9   Global Step: 383080   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:26,679-Speed 2630.46 samples/sec   Loss 7.2946   LearningRate 0.0290   Epoch: 9   Global Step: 383090   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:30,570-Speed 2633.09 samples/sec   Loss 7.1455   LearningRate 0.0290   Epoch: 9   Global Step: 383100   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:34,471-Speed 2626.07 samples/sec   Loss 7.1706   LearningRate 0.0290   Epoch: 9   Global Step: 383110   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:38,361-Speed 2632.85 samples/sec   Loss 7.2529   LearningRate 0.0290   Epoch: 9   Global Step: 383120   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:42,253-Speed 2631.28 samples/sec   Loss 7.2173   LearningRate 0.0290   Epoch: 9   Global Step: 383130   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:46,142-Speed 2633.54 samples/sec   Loss 7.2944   LearningRate 0.0290   Epoch: 9   Global Step: 383140   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:50,042-Speed 2626.47 samples/sec   Loss 7.2431   LearningRate 0.0290   Epoch: 9   Global Step: 383150   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:53,944-Speed 2625.19 samples/sec   Loss 7.2805   LearningRate 0.0290   Epoch: 9   Global Step: 383160   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 14:46:57,835-Speed 2631.89 samples/sec   Loss 7.3382   LearningRate 0.0290   Epoch: 9   Global Step: 383170   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:01,736-Speed 2625.35 samples/sec   Loss 7.2763   LearningRate 0.0290   Epoch: 9   Global Step: 383180   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:05,624-Speed 2634.30 samples/sec   Loss 7.4516   LearningRate 0.0290   Epoch: 9   Global Step: 383190   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:09,529-Speed 2623.82 samples/sec   Loss 7.2859   LearningRate 0.0290   Epoch: 9   Global Step: 383200   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:13,451-Speed 2611.23 samples/sec   Loss 7.1825   LearningRate 0.0290   Epoch: 9   Global Step: 383210   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:17,377-Speed 2608.30 samples/sec   Loss 7.2442   LearningRate 0.0290   Epoch: 9   Global Step: 383220   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:21,274-Speed 2628.32 samples/sec   Loss 7.1417   LearningRate 0.0289   Epoch: 9   Global Step: 383230   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:25,170-Speed 2629.14 samples/sec   Loss 7.2105   LearningRate 0.0289   Epoch: 9   Global Step: 383240   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:29,069-Speed 2626.97 samples/sec   Loss 7.2447   LearningRate 0.0289   Epoch: 9   Global Step: 383250   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:33,011-Speed 2598.60 samples/sec   Loss 7.2591   LearningRate 0.0289   Epoch: 9   Global Step: 383260   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:47:36,913-Speed 2625.06 samples/sec   Loss 7.2469   LearningRate 0.0289   Epoch: 9   Global Step: 383270   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:47:40,818-Speed 2622.65 samples/sec   Loss 7.2619   LearningRate 0.0289   Epoch: 9   Global Step: 383280   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:47:44,712-Speed 2630.51 samples/sec   Loss 7.1968   LearningRate 0.0289   Epoch: 9   Global Step: 383290   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:47:48,607-Speed 2629.39 samples/sec   Loss 7.3395   LearningRate 0.0289   Epoch: 9   Global Step: 383300   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:47:52,519-Speed 2618.43 samples/sec   Loss 7.1662   LearningRate 0.0289   Epoch: 9   Global Step: 383310   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:47:56,416-Speed 2628.12 samples/sec   Loss 7.2528   LearningRate 0.0289   Epoch: 9   Global Step: 383320   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:00,306-Speed 2633.07 samples/sec   Loss 7.2527   LearningRate 0.0289   Epoch: 9   Global Step: 383330   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:04,203-Speed 2628.58 samples/sec   Loss 7.2206   LearningRate 0.0289   Epoch: 9   Global Step: 383340   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:08,095-Speed 2631.69 samples/sec   Loss 7.3494   LearningRate 0.0289   Epoch: 9   Global Step: 383350   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:11,995-Speed 2625.94 samples/sec   Loss 7.2724   LearningRate 0.0289   Epoch: 9   Global Step: 383360   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:15,875-Speed 2639.78 samples/sec   Loss 7.5958   LearningRate 0.0289   Epoch: 9   Global Step: 383370   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:19,796-Speed 2612.95 samples/sec   Loss 7.3840   LearningRate 0.0289   Epoch: 9   Global Step: 383380   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:23,700-Speed 2622.96 samples/sec   Loss 7.3030   LearningRate 0.0289   Epoch: 9   Global Step: 383390   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:27,599-Speed 2627.22 samples/sec   Loss 7.2379   LearningRate 0.0289   Epoch: 9   Global Step: 383400   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:31,502-Speed 2624.59 samples/sec   Loss 7.2186   LearningRate 0.0289   Epoch: 9   Global Step: 383410   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:35,392-Speed 2632.95 samples/sec   Loss 7.2766   LearningRate 0.0289   Epoch: 9   Global Step: 383420   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:39,287-Speed 2629.44 samples/sec   Loss 7.1863   LearningRate 0.0289   Epoch: 9   Global Step: 383430   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:43,189-Speed 2625.57 samples/sec   Loss 7.1937   LearningRate 0.0289   Epoch: 9   Global Step: 383440   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:47,093-Speed 2623.30 samples/sec   Loss 7.3441   LearningRate 0.0289   Epoch: 9   Global Step: 383450   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:51,016-Speed 2611.16 samples/sec   Loss 7.2801   LearningRate 0.0289   Epoch: 9   Global Step: 383460   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:48:54,922-Speed 2622.06 samples/sec   Loss 7.1539   LearningRate 0.0289   Epoch: 9   Global Step: 383470   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:48:58,849-Speed 2609.03 samples/sec   Loss 7.2993   LearningRate 0.0289   Epoch: 9   Global Step: 383480   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:02,743-Speed 2629.55 samples/sec   Loss 7.1645   LearningRate 0.0289   Epoch: 9   Global Step: 383490   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:06,654-Speed 2619.64 samples/sec   Loss 7.2330   LearningRate 0.0289   Epoch: 9   Global Step: 383500   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:10,553-Speed 2626.43 samples/sec   Loss 7.2721   LearningRate 0.0289   Epoch: 9   Global Step: 383510   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:14,474-Speed 2612.79 samples/sec   Loss 7.1621   LearningRate 0.0289   Epoch: 9   Global Step: 383520   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:18,380-Speed 2622.01 samples/sec   Loss 7.1604   LearningRate 0.0289   Epoch: 9   Global Step: 383530   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:22,287-Speed 2622.02 samples/sec   Loss 7.2195   LearningRate 0.0289   Epoch: 9   Global Step: 383540   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:26,176-Speed 2633.15 samples/sec   Loss 7.3625   LearningRate 0.0289   Epoch: 9   Global Step: 383550   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:30,075-Speed 2627.27 samples/sec   Loss 7.3287   LearningRate 0.0289   Epoch: 9   Global Step: 383560   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:49:33,979-Speed 2623.98 samples/sec   Loss 7.3968   LearningRate 0.0289   Epoch: 9   Global Step: 383570   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:49:37,877-Speed 2627.18 samples/sec   Loss 7.3169   LearningRate 0.0289   Epoch: 9   Global Step: 383580   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:49:41,792-Speed 2616.13 samples/sec   Loss 7.2265   LearningRate 0.0289   Epoch: 9   Global Step: 383590   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:49:45,700-Speed 2622.18 samples/sec   Loss 7.0870   LearningRate 0.0289   Epoch: 9   Global Step: 383600   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:49:49,602-Speed 2624.80 samples/sec   Loss 7.3123   LearningRate 0.0289   Epoch: 9   Global Step: 383610   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:49:53,502-Speed 2626.04 samples/sec   Loss 7.2640   LearningRate 0.0289   Epoch: 9   Global Step: 383620   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:49:57,511-Speed 2555.56 samples/sec   Loss 7.1340   LearningRate 0.0289   Epoch: 9   Global Step: 383630   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:01,512-Speed 2559.50 samples/sec   Loss 7.3376   LearningRate 0.0289   Epoch: 9   Global Step: 383640   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:05,410-Speed 2627.95 samples/sec   Loss 7.2897   LearningRate 0.0289   Epoch: 9   Global Step: 383650   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:09,305-Speed 2629.10 samples/sec   Loss 7.3124   LearningRate 0.0289   Epoch: 9   Global Step: 383660   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:13,187-Speed 2638.93 samples/sec   Loss 7.1544   LearningRate 0.0289   Epoch: 9   Global Step: 383670   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:17,082-Speed 2629.42 samples/sec   Loss 7.3642   LearningRate 0.0289   Epoch: 9   Global Step: 383680   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:20,992-Speed 2620.20 samples/sec   Loss 7.3234   LearningRate 0.0289   Epoch: 9   Global Step: 383690   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:24,896-Speed 2624.01 samples/sec   Loss 7.2930   LearningRate 0.0289   Epoch: 9   Global Step: 383700   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:28,811-Speed 2615.78 samples/sec   Loss 7.3506   LearningRate 0.0289   Epoch: 9   Global Step: 383710   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:32,712-Speed 2625.71 samples/sec   Loss 7.2289   LearningRate 0.0289   Epoch: 9   Global Step: 383720   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:36,614-Speed 2625.45 samples/sec   Loss 7.2180   LearningRate 0.0289   Epoch: 9   Global Step: 383730   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:40,527-Speed 2617.33 samples/sec   Loss 7.2231   LearningRate 0.0289   Epoch: 9   Global Step: 383740   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:44,417-Speed 2633.02 samples/sec   Loss 7.2469   LearningRate 0.0289   Epoch: 9   Global Step: 383750   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:48,308-Speed 2632.74 samples/sec   Loss 7.2404   LearningRate 0.0289   Epoch: 9   Global Step: 383760   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:52,203-Speed 2629.57 samples/sec   Loss 7.2504   LearningRate 0.0289   Epoch: 9   Global Step: 383770   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:50:56,089-Speed 2635.80 samples/sec   Loss 7.2056   LearningRate 0.0289   Epoch: 9   Global Step: 383780   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:50:59,984-Speed 2629.07 samples/sec   Loss 7.2001   LearningRate 0.0289   Epoch: 9   Global Step: 383790   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:51:03,879-Speed 2630.23 samples/sec   Loss 7.2860   LearningRate 0.0289   Epoch: 9   Global Step: 383800   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:51:07,777-Speed 2627.76 samples/sec   Loss 7.3349   LearningRate 0.0289   Epoch: 9   Global Step: 383810   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:51:11,662-Speed 2636.42 samples/sec   Loss 7.4179   LearningRate 0.0289   Epoch: 9   Global Step: 383820   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:51:15,562-Speed 2625.94 samples/sec   Loss 7.3077   LearningRate 0.0289   Epoch: 9   Global Step: 383830   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:51:19,444-Speed 2639.03 samples/sec   Loss 7.6078   LearningRate 0.0289   Epoch: 9   Global Step: 383840   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:51:23,339-Speed 2629.13 samples/sec   Loss 7.3694   LearningRate 0.0289   Epoch: 9   Global Step: 383850   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:51:27,217-Speed 2641.17 samples/sec   Loss 7.4318   LearningRate 0.0289   Epoch: 9   Global Step: 383860   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:31,114-Speed 2628.29 samples/sec   Loss 7.4806   LearningRate 0.0289   Epoch: 9   Global Step: 383870   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:35,014-Speed 2626.45 samples/sec   Loss 7.1989   LearningRate 0.0289   Epoch: 9   Global Step: 383880   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:38,906-Speed 2631.74 samples/sec   Loss 7.2872   LearningRate 0.0289   Epoch: 9   Global Step: 383890   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:42,799-Speed 2631.22 samples/sec   Loss 7.4352   LearningRate 0.0289   Epoch: 9   Global Step: 383900   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:46,693-Speed 2629.76 samples/sec   Loss 7.3393   LearningRate 0.0289   Epoch: 9   Global Step: 383910   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:50,675-Speed 2572.54 samples/sec   Loss 7.0813   LearningRate 0.0289   Epoch: 9   Global Step: 383920   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:54,574-Speed 2627.67 samples/sec   Loss 7.2013   LearningRate 0.0289   Epoch: 9   Global Step: 383930   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:51:58,471-Speed 2628.14 samples/sec   Loss 7.2085   LearningRate 0.0289   Epoch: 9   Global Step: 383940   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:52:02,368-Speed 2628.43 samples/sec   Loss 7.1543   LearningRate 0.0289   Epoch: 9   Global Step: 383950   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 14:52:06,270-Speed 2625.51 samples/sec   Loss 7.1049   LearningRate 0.0289   Epoch: 9   Global Step: 383960   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:10,165-Speed 2629.22 samples/sec   Loss 7.3653   LearningRate 0.0289   Epoch: 9   Global Step: 383970   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:14,063-Speed 2627.69 samples/sec   Loss 7.2962   LearningRate 0.0289   Epoch: 9   Global Step: 383980   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:17,968-Speed 2622.83 samples/sec   Loss 7.2715   LearningRate 0.0289   Epoch: 9   Global Step: 383990   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:21,879-Speed 2619.26 samples/sec   Loss 7.2065   LearningRate 0.0288   Epoch: 9   Global Step: 384000   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:26,026-Speed 2470.19 samples/sec   Loss 7.3075   LearningRate 0.0288   Epoch: 9   Global Step: 384010   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:29,972-Speed 2595.92 samples/sec   Loss 7.1102   LearningRate 0.0288   Epoch: 9   Global Step: 384020   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:33,864-Speed 2631.90 samples/sec   Loss 7.3286   LearningRate 0.0288   Epoch: 9   Global Step: 384030   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:37,772-Speed 2620.77 samples/sec   Loss 7.1894   LearningRate 0.0288   Epoch: 9   Global Step: 384040   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:41,671-Speed 2626.46 samples/sec   Loss 7.2408   LearningRate 0.0288   Epoch: 9   Global Step: 384050   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:52:45,626-Speed 2590.25 samples/sec   Loss 7.3274   LearningRate 0.0288   Epoch: 9   Global Step: 384060   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:52:49,524-Speed 2627.83 samples/sec   Loss 7.2646   LearningRate 0.0288   Epoch: 9   Global Step: 384070   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:52:53,519-Speed 2563.95 samples/sec   Loss 7.3414   LearningRate 0.0288   Epoch: 9   Global Step: 384080   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:52:57,418-Speed 2626.95 samples/sec   Loss 7.1795   LearningRate 0.0288   Epoch: 9   Global Step: 384090   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:53:01,377-Speed 2587.23 samples/sec   Loss 7.2068   LearningRate 0.0288   Epoch: 9   Global Step: 384100   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:53:05,275-Speed 2627.74 samples/sec   Loss 7.1159   LearningRate 0.0288   Epoch: 9   Global Step: 384110   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:53:09,180-Speed 2622.96 samples/sec   Loss 7.2342   LearningRate 0.0288   Epoch: 9   Global Step: 384120   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:53:13,074-Speed 2630.28 samples/sec   Loss 7.1565   LearningRate 0.0288   Epoch: 9   Global Step: 384130   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:53:16,969-Speed 2629.53 samples/sec   Loss 7.3229   LearningRate 0.0288   Epoch: 9   Global Step: 384140   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:53:20,859-Speed 2633.15 samples/sec   Loss 7.3506   LearningRate 0.0288   Epoch: 9   Global Step: 384150   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:24,763-Speed 2623.81 samples/sec   Loss 7.1925   LearningRate 0.0288   Epoch: 9   Global Step: 384160   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:28,665-Speed 2625.35 samples/sec   Loss 7.3010   LearningRate 0.0288   Epoch: 9   Global Step: 384170   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:32,561-Speed 2628.96 samples/sec   Loss 7.1872   LearningRate 0.0288   Epoch: 9   Global Step: 384180   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:36,452-Speed 2631.70 samples/sec   Loss 7.3152   LearningRate 0.0288   Epoch: 9   Global Step: 384190   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:40,346-Speed 2630.56 samples/sec   Loss 7.1093   LearningRate 0.0288   Epoch: 9   Global Step: 384200   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:44,248-Speed 2624.81 samples/sec   Loss 7.2603   LearningRate 0.0288   Epoch: 9   Global Step: 384210   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:48,154-Speed 2622.64 samples/sec   Loss 7.2691   LearningRate 0.0288   Epoch: 9   Global Step: 384220   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:52,056-Speed 2625.02 samples/sec   Loss 7.1615   LearningRate 0.0288   Epoch: 9   Global Step: 384230   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:55,997-Speed 2598.70 samples/sec   Loss 7.1160   LearningRate 0.0288   Epoch: 9   Global Step: 384240   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:53:59,893-Speed 2629.48 samples/sec   Loss 7.3556   LearningRate 0.0288   Epoch: 9   Global Step: 384250   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:03,790-Speed 2628.05 samples/sec   Loss 7.1507   LearningRate 0.0288   Epoch: 9   Global Step: 384260   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:07,689-Speed 2627.24 samples/sec   Loss 7.2083   LearningRate 0.0288   Epoch: 9   Global Step: 384270   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:11,596-Speed 2621.02 samples/sec   Loss 7.1889   LearningRate 0.0288   Epoch: 9   Global Step: 384280   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:15,491-Speed 2629.91 samples/sec   Loss 7.2626   LearningRate 0.0288   Epoch: 9   Global Step: 384290   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:19,414-Speed 2610.46 samples/sec   Loss 7.2529   LearningRate 0.0288   Epoch: 9   Global Step: 384300   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:23,414-Speed 2561.05 samples/sec   Loss 7.2375   LearningRate 0.0288   Epoch: 9   Global Step: 384310   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:27,324-Speed 2619.69 samples/sec   Loss 7.3642   LearningRate 0.0288   Epoch: 9   Global Step: 384320   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:31,215-Speed 2632.38 samples/sec   Loss 7.3003   LearningRate 0.0288   Epoch: 9   Global Step: 384330   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:35,115-Speed 2626.12 samples/sec   Loss 7.1035   LearningRate 0.0288   Epoch: 9   Global Step: 384340   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 14:54:39,010-Speed 2629.34 samples/sec   Loss 7.2814   LearningRate 0.0288   Epoch: 9   Global Step: 384350   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:54:42,907-Speed 2628.25 samples/sec   Loss 7.2108   LearningRate 0.0288   Epoch: 9   Global Step: 384360   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:54:46,803-Speed 2629.08 samples/sec   Loss 7.1188   LearningRate 0.0288   Epoch: 9   Global Step: 384370   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:54:50,697-Speed 2630.38 samples/sec   Loss 7.3364   LearningRate 0.0288   Epoch: 9   Global Step: 384380   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:54:54,594-Speed 2628.36 samples/sec   Loss 7.2628   LearningRate 0.0288   Epoch: 9   Global Step: 384390   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:54:58,485-Speed 2632.56 samples/sec   Loss 7.2243   LearningRate 0.0288   Epoch: 9   Global Step: 384400   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:02,381-Speed 2628.95 samples/sec   Loss 7.2254   LearningRate 0.0288   Epoch: 9   Global Step: 384410   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:06,270-Speed 2633.16 samples/sec   Loss 7.2706   LearningRate 0.0288   Epoch: 9   Global Step: 384420   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:10,164-Speed 2630.55 samples/sec   Loss 7.2864   LearningRate 0.0288   Epoch: 9   Global Step: 384430   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:14,057-Speed 2631.24 samples/sec   Loss 7.2659   LearningRate 0.0288   Epoch: 9   Global Step: 384440   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:17,943-Speed 2635.93 samples/sec   Loss 7.1528   LearningRate 0.0288   Epoch: 9   Global Step: 384450   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:55:21,830-Speed 2635.38 samples/sec   Loss 7.2110   LearningRate 0.0288   Epoch: 9   Global Step: 384460   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:25,727-Speed 2628.28 samples/sec   Loss 7.0954   LearningRate 0.0288   Epoch: 9   Global Step: 384470   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:29,623-Speed 2629.19 samples/sec   Loss 7.2179   LearningRate 0.0288   Epoch: 9   Global Step: 384480   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:33,554-Speed 2605.78 samples/sec   Loss 7.3578   LearningRate 0.0288   Epoch: 9   Global Step: 384490   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:37,451-Speed 2627.97 samples/sec   Loss 7.3620   LearningRate 0.0288   Epoch: 9   Global Step: 384500   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:41,353-Speed 2625.26 samples/sec   Loss 7.2919   LearningRate 0.0288   Epoch: 9   Global Step: 384510   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:45,252-Speed 2627.04 samples/sec   Loss 7.2035   LearningRate 0.0288   Epoch: 9   Global Step: 384520   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:49,147-Speed 2629.53 samples/sec   Loss 7.1337   LearningRate 0.0288   Epoch: 9   Global Step: 384530   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:53,045-Speed 2627.66 samples/sec   Loss 7.3717   LearningRate 0.0288   Epoch: 9   Global Step: 384540   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:55:56,936-Speed 2632.30 samples/sec   Loss 7.2091   LearningRate 0.0288   Epoch: 9   Global Step: 384550   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:00,828-Speed 2631.67 samples/sec   Loss 7.1978   LearningRate 0.0288   Epoch: 9   Global Step: 384560   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:56:04,710-Speed 2638.39 samples/sec   Loss 7.2025   LearningRate 0.0288   Epoch: 9   Global Step: 384570   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:08,611-Speed 2626.03 samples/sec   Loss 7.0935   LearningRate 0.0288   Epoch: 9   Global Step: 384580   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:12,526-Speed 2615.90 samples/sec   Loss 7.1762   LearningRate 0.0288   Epoch: 9   Global Step: 384590   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:16,422-Speed 2629.26 samples/sec   Loss 7.2295   LearningRate 0.0288   Epoch: 9   Global Step: 384600   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:20,329-Speed 2621.53 samples/sec   Loss 7.1522   LearningRate 0.0288   Epoch: 9   Global Step: 384610   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:24,216-Speed 2635.05 samples/sec   Loss 7.2398   LearningRate 0.0288   Epoch: 9   Global Step: 384620   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:28,115-Speed 2627.43 samples/sec   Loss 7.2061   LearningRate 0.0288   Epoch: 9   Global Step: 384630   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:32,021-Speed 2621.72 samples/sec   Loss 7.2410   LearningRate 0.0288   Epoch: 9   Global Step: 384640   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:35,922-Speed 2626.51 samples/sec   Loss 7.2486   LearningRate 0.0288   Epoch: 9   Global Step: 384650   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:39,816-Speed 2630.43 samples/sec   Loss 7.1669   LearningRate 0.0288   Epoch: 9   Global Step: 384660   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:43,705-Speed 2633.19 samples/sec   Loss 7.2542   LearningRate 0.0288   Epoch: 9   Global Step: 384670   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:56:47,608-Speed 2623.97 samples/sec   Loss 7.2040   LearningRate 0.0288   Epoch: 9   Global Step: 384680   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:56:51,514-Speed 2622.39 samples/sec   Loss 7.2102   LearningRate 0.0288   Epoch: 9   Global Step: 384690   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:56:55,416-Speed 2625.58 samples/sec   Loss 7.1937   LearningRate 0.0288   Epoch: 9   Global Step: 384700   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:56:59,317-Speed 2625.37 samples/sec   Loss 7.2500   LearningRate 0.0288   Epoch: 9   Global Step: 384710   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:03,214-Speed 2628.37 samples/sec   Loss 7.2603   LearningRate 0.0288   Epoch: 9   Global Step: 384720   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:07,114-Speed 2626.20 samples/sec   Loss 7.1265   LearningRate 0.0288   Epoch: 9   Global Step: 384730   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:11,005-Speed 2632.55 samples/sec   Loss 7.2091   LearningRate 0.0288   Epoch: 9   Global Step: 384740   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:14,909-Speed 2623.10 samples/sec   Loss 7.0500   LearningRate 0.0288   Epoch: 9   Global Step: 384750   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:18,823-Speed 2617.05 samples/sec   Loss 7.1965   LearningRate 0.0288   Epoch: 9   Global Step: 384760   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:22,720-Speed 2628.17 samples/sec   Loss 7.0831   LearningRate 0.0287   Epoch: 9   Global Step: 384770   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:26,618-Speed 2628.07 samples/sec   Loss 7.1535   LearningRate 0.0287   Epoch: 9   Global Step: 384780   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:30,515-Speed 2628.41 samples/sec   Loss 7.2706   LearningRate 0.0287   Epoch: 9   Global Step: 384790   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:34,418-Speed 2624.08 samples/sec   Loss 7.3214   LearningRate 0.0287   Epoch: 9   Global Step: 384800   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:57:38,307-Speed 2633.55 samples/sec   Loss 7.1613   LearningRate 0.0287   Epoch: 9   Global Step: 384810   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:42,232-Speed 2609.55 samples/sec   Loss 7.2296   LearningRate 0.0287   Epoch: 9   Global Step: 384820   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:46,141-Speed 2620.09 samples/sec   Loss 7.2293   LearningRate 0.0287   Epoch: 9   Global Step: 384830   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:50,073-Speed 2605.58 samples/sec   Loss 7.3045   LearningRate 0.0287   Epoch: 9   Global Step: 384840   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:53,970-Speed 2627.72 samples/sec   Loss 7.2298   LearningRate 0.0287   Epoch: 9   Global Step: 384850   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:57:57,859-Speed 2633.85 samples/sec   Loss 7.1226   LearningRate 0.0287   Epoch: 9   Global Step: 384860   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:01,769-Speed 2620.01 samples/sec   Loss 7.2248   LearningRate 0.0287   Epoch: 9   Global Step: 384870   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:05,674-Speed 2623.07 samples/sec   Loss 7.2732   LearningRate 0.0287   Epoch: 9   Global Step: 384880   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:09,586-Speed 2618.46 samples/sec   Loss 7.1588   LearningRate 0.0287   Epoch: 9   Global Step: 384890   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:13,513-Speed 2608.00 samples/sec   Loss 7.0779   LearningRate 0.0287   Epoch: 9   Global Step: 384900   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:17,428-Speed 2615.91 samples/sec   Loss 7.2434   LearningRate 0.0287   Epoch: 9   Global Step: 384910   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:58:21,325-Speed 2628.70 samples/sec   Loss 7.1069   LearningRate 0.0287   Epoch: 9   Global Step: 384920   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:58:25,218-Speed 2631.23 samples/sec   Loss 7.3064   LearningRate 0.0287   Epoch: 9   Global Step: 384930   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:29,131-Speed 2617.59 samples/sec   Loss 7.2546   LearningRate 0.0287   Epoch: 9   Global Step: 384940   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:33,029-Speed 2627.89 samples/sec   Loss 7.2615   LearningRate 0.0287   Epoch: 9   Global Step: 384950   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:36,923-Speed 2630.32 samples/sec   Loss 7.2293   LearningRate 0.0287   Epoch: 9   Global Step: 384960   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:40,825-Speed 2625.20 samples/sec   Loss 7.2563   LearningRate 0.0287   Epoch: 9   Global Step: 384970   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:44,711-Speed 2635.34 samples/sec   Loss 7.2227   LearningRate 0.0287   Epoch: 9   Global Step: 384980   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:48,621-Speed 2620.25 samples/sec   Loss 7.2160   LearningRate 0.0287   Epoch: 9   Global Step: 384990   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:52,521-Speed 2625.71 samples/sec   Loss 7.2472   LearningRate 0.0287   Epoch: 9   Global Step: 385000   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:58:56,412-Speed 2632.52 samples/sec   Loss 7.2273   LearningRate 0.0287   Epoch: 9   Global Step: 385010   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:00,308-Speed 2629.13 samples/sec   Loss 7.1882   LearningRate 0.0287   Epoch: 9   Global Step: 385020   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:04,202-Speed 2630.47 samples/sec   Loss 7.2040   LearningRate 0.0287   Epoch: 9   Global Step: 385030   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:59:08,094-Speed 2631.21 samples/sec   Loss 7.1132   LearningRate 0.0287   Epoch: 9   Global Step: 385040   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 14:59:11,974-Speed 2640.11 samples/sec   Loss 7.3426   LearningRate 0.0287   Epoch: 9   Global Step: 385050   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:15,871-Speed 2628.35 samples/sec   Loss 7.2144   LearningRate 0.0287   Epoch: 9   Global Step: 385060   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:19,773-Speed 2624.85 samples/sec   Loss 7.1851   LearningRate 0.0287   Epoch: 9   Global Step: 385070   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:23,677-Speed 2624.19 samples/sec   Loss 7.2135   LearningRate 0.0287   Epoch: 9   Global Step: 385080   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:27,572-Speed 2629.65 samples/sec   Loss 7.2602   LearningRate 0.0287   Epoch: 9   Global Step: 385090   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:31,467-Speed 2629.88 samples/sec   Loss 7.3025   LearningRate 0.0287   Epoch: 9   Global Step: 385100   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 14:59:35,333-Speed 2648.92 samples/sec   Loss 7.5903   LearningRate 0.0287   Epoch: 9   Global Step: 385110   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:59:39,238-Speed 2622.90 samples/sec   Loss 7.6459   LearningRate 0.0287   Epoch: 9   Global Step: 385120   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:59:43,159-Speed 2612.00 samples/sec   Loss 7.4114   LearningRate 0.0287   Epoch: 9   Global Step: 385130   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:59:47,054-Speed 2630.47 samples/sec   Loss 7.3449   LearningRate 0.0287   Epoch: 9   Global Step: 385140   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:59:50,946-Speed 2631.65 samples/sec   Loss 7.3138   LearningRate 0.0287   Epoch: 9   Global Step: 385150   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:59:54,859-Speed 2617.57 samples/sec   Loss 7.2395   LearningRate 0.0287   Epoch: 9   Global Step: 385160   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 14:59:58,766-Speed 2621.73 samples/sec   Loss 7.2455   LearningRate 0.0287   Epoch: 9   Global Step: 385170   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:00:02,659-Speed 2630.87 samples/sec   Loss 7.0810   LearningRate 0.0287   Epoch: 9   Global Step: 385180   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:00:06,554-Speed 2629.69 samples/sec   Loss 7.2623   LearningRate 0.0287   Epoch: 9   Global Step: 385190   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:00:10,444-Speed 2632.51 samples/sec   Loss 7.1957   LearningRate 0.0287   Epoch: 9   Global Step: 385200   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:00:14,347-Speed 2624.39 samples/sec   Loss 7.5936   LearningRate 0.0287   Epoch: 9   Global Step: 385210   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:18,255-Speed 2620.63 samples/sec   Loss 7.3102   LearningRate 0.0287   Epoch: 9   Global Step: 385220   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:22,156-Speed 2625.76 samples/sec   Loss 7.2228   LearningRate 0.0287   Epoch: 9   Global Step: 385230   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:26,071-Speed 2616.32 samples/sec   Loss 7.1171   LearningRate 0.0287   Epoch: 9   Global Step: 385240   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:29,981-Speed 2620.07 samples/sec   Loss 7.0837   LearningRate 0.0287   Epoch: 9   Global Step: 385250   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:33,883-Speed 2624.89 samples/sec   Loss 7.1396   LearningRate 0.0287   Epoch: 9   Global Step: 385260   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:38,013-Speed 2479.53 samples/sec   Loss 7.3325   LearningRate 0.0287   Epoch: 9   Global Step: 385270   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:41,913-Speed 2626.43 samples/sec   Loss 7.1923   LearningRate 0.0287   Epoch: 9   Global Step: 385280   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:45,830-Speed 2615.25 samples/sec   Loss 7.1690   LearningRate 0.0287   Epoch: 9   Global Step: 385290   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:49,729-Speed 2626.53 samples/sec   Loss 7.2308   LearningRate 0.0287   Epoch: 9   Global Step: 385300   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:00:53,642-Speed 2618.37 samples/sec   Loss 7.2320   LearningRate 0.0287   Epoch: 9   Global Step: 385310   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:00:57,645-Speed 2558.39 samples/sec   Loss 7.4055   LearningRate 0.0287   Epoch: 9   Global Step: 385320   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:01:01,548-Speed 2624.69 samples/sec   Loss 7.3799   LearningRate 0.0287   Epoch: 9   Global Step: 385330   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:01:05,443-Speed 2629.79 samples/sec   Loss 7.1758   LearningRate 0.0287   Epoch: 9   Global Step: 385340   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:01:09,347-Speed 2623.36 samples/sec   Loss 7.2985   LearningRate 0.0287   Epoch: 9   Global Step: 385350   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:01:13,244-Speed 2628.02 samples/sec   Loss 7.1574   LearningRate 0.0287   Epoch: 9   Global Step: 385360   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:01:17,139-Speed 2630.35 samples/sec   Loss 7.2481   LearningRate 0.0287   Epoch: 9   Global Step: 385370   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:01:21,032-Speed 2631.02 samples/sec   Loss 7.0820   LearningRate 0.0287   Epoch: 9   Global Step: 385380   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:01:24,901-Speed 2647.03 samples/sec   Loss 7.2507   LearningRate 0.0287   Epoch: 9   Global Step: 385390   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:28,795-Speed 2630.42 samples/sec   Loss 7.3261   LearningRate 0.0287   Epoch: 9   Global Step: 385400   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:32,686-Speed 2632.06 samples/sec   Loss 7.2531   LearningRate 0.0287   Epoch: 9   Global Step: 385410   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:36,590-Speed 2623.79 samples/sec   Loss 7.2568   LearningRate 0.0287   Epoch: 9   Global Step: 385420   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:40,585-Speed 2563.65 samples/sec   Loss 7.1089   LearningRate 0.0287   Epoch: 9   Global Step: 385430   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:45,376-Speed 2137.94 samples/sec   Loss 7.1209   LearningRate 0.0287   Epoch: 9   Global Step: 385440   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:49,284-Speed 2620.92 samples/sec   Loss 7.3102   LearningRate 0.0287   Epoch: 9   Global Step: 385450   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:53,191-Speed 2622.90 samples/sec   Loss 7.2228   LearningRate 0.0287   Epoch: 9   Global Step: 385460   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:01:57,085-Speed 2630.86 samples/sec   Loss 7.1358   LearningRate 0.0287   Epoch: 9   Global Step: 385470   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:02:01,019-Speed 2603.85 samples/sec   Loss 7.2611   LearningRate 0.0287   Epoch: 9   Global Step: 385480   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:02:04,906-Speed 2635.11 samples/sec   Loss 7.0823   LearningRate 0.0287   Epoch: 9   Global Step: 385490   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:08,813-Speed 2621.46 samples/sec   Loss 7.2642   LearningRate 0.0287   Epoch: 9   Global Step: 385500   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:12,709-Speed 2628.59 samples/sec   Loss 7.0798   LearningRate 0.0287   Epoch: 9   Global Step: 385510   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:16,607-Speed 2631.62 samples/sec   Loss 7.2764   LearningRate 0.0287   Epoch: 9   Global Step: 385520   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:20,508-Speed 2625.97 samples/sec   Loss 7.2245   LearningRate 0.0287   Epoch: 9   Global Step: 385530   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:24,406-Speed 2627.70 samples/sec   Loss 7.2367   LearningRate 0.0287   Epoch: 9   Global Step: 385540   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:28,323-Speed 2614.85 samples/sec   Loss 7.2479   LearningRate 0.0286   Epoch: 9   Global Step: 385550   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:32,218-Speed 2630.02 samples/sec   Loss 7.1801   LearningRate 0.0286   Epoch: 9   Global Step: 385560   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:36,112-Speed 2630.30 samples/sec   Loss 7.3332   LearningRate 0.0286   Epoch: 9   Global Step: 385570   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:40,014-Speed 2625.19 samples/sec   Loss 7.2557   LearningRate 0.0286   Epoch: 9   Global Step: 385580   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:02:43,902-Speed 2633.74 samples/sec   Loss 7.1605   LearningRate 0.0286   Epoch: 9   Global Step: 385590   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:02:47,802-Speed 2626.51 samples/sec   Loss 7.1884   LearningRate 0.0286   Epoch: 9   Global Step: 385600   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:02:51,695-Speed 2630.84 samples/sec   Loss 7.3000   LearningRate 0.0286   Epoch: 9   Global Step: 385610   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:02:55,597-Speed 2624.97 samples/sec   Loss 7.2195   LearningRate 0.0286   Epoch: 9   Global Step: 385620   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:02:59,490-Speed 2631.09 samples/sec   Loss 7.1897   LearningRate 0.0286   Epoch: 9   Global Step: 385630   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:03,383-Speed 2631.18 samples/sec   Loss 7.1827   LearningRate 0.0286   Epoch: 9   Global Step: 385640   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:07,285-Speed 2625.04 samples/sec   Loss 7.1779   LearningRate 0.0286   Epoch: 9   Global Step: 385650   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:11,190-Speed 2622.39 samples/sec   Loss 7.2670   LearningRate 0.0286   Epoch: 9   Global Step: 385660   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:15,093-Speed 2624.18 samples/sec   Loss 7.2574   LearningRate 0.0286   Epoch: 9   Global Step: 385670   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:19,001-Speed 2621.19 samples/sec   Loss 7.1513   LearningRate 0.0286   Epoch: 9   Global Step: 385680   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:22,915-Speed 2616.71 samples/sec   Loss 7.1504   LearningRate 0.0286   Epoch: 9   Global Step: 385690   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:03:26,830-Speed 2616.35 samples/sec   Loss 7.2109   LearningRate 0.0286   Epoch: 9   Global Step: 385700   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:30,790-Speed 2587.25 samples/sec   Loss 7.2080   LearningRate 0.0286   Epoch: 9   Global Step: 385710   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:34,697-Speed 2621.80 samples/sec   Loss 7.2927   LearningRate 0.0286   Epoch: 9   Global Step: 385720   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:38,624-Speed 2607.85 samples/sec   Loss 7.3530   LearningRate 0.0286   Epoch: 9   Global Step: 385730   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:42,541-Speed 2614.77 samples/sec   Loss 7.1587   LearningRate 0.0286   Epoch: 9   Global Step: 385740   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:46,452-Speed 2619.13 samples/sec   Loss 7.0864   LearningRate 0.0286   Epoch: 9   Global Step: 385750   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:03:50,307-Speed 2656.61 samples/sec   Loss 7.2346   LearningRate 0.0286   Epoch: 9   Global Step: 385760   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:03:54,222-Speed 2616.08 samples/sec   Loss 7.4078   LearningRate 0.0286   Epoch: 9   Global Step: 385770   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:03:58,136-Speed 2616.89 samples/sec   Loss 7.3987   LearningRate 0.0286   Epoch: 9   Global Step: 385780   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:04:02,015-Speed 2640.91 samples/sec   Loss 7.1467   LearningRate 0.0286   Epoch: 9   Global Step: 385790   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:05,907-Speed 2631.76 samples/sec   Loss 7.5962   LearningRate 0.0286   Epoch: 9   Global Step: 385800   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:09,846-Speed 2600.34 samples/sec   Loss 7.5180   LearningRate 0.0286   Epoch: 9   Global Step: 385810   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:13,740-Speed 2630.14 samples/sec   Loss 7.1738   LearningRate 0.0286   Epoch: 9   Global Step: 385820   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:17,640-Speed 2626.62 samples/sec   Loss 7.2267   LearningRate 0.0286   Epoch: 9   Global Step: 385830   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:21,541-Speed 2624.88 samples/sec   Loss 7.2694   LearningRate 0.0286   Epoch: 9   Global Step: 385840   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:25,445-Speed 2624.75 samples/sec   Loss 7.2929   LearningRate 0.0286   Epoch: 9   Global Step: 385850   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:29,329-Speed 2636.57 samples/sec   Loss 7.1273   LearningRate 0.0286   Epoch: 9   Global Step: 385860   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:33,228-Speed 2627.05 samples/sec   Loss 7.2205   LearningRate 0.0286   Epoch: 9   Global Step: 385870   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:37,169-Speed 2599.26 samples/sec   Loss 7.1772   LearningRate 0.0286   Epoch: 9   Global Step: 385880   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:04:41,068-Speed 2627.57 samples/sec   Loss 7.2529   LearningRate 0.0286   Epoch: 9   Global Step: 385890   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:04:45,003-Speed 2602.98 samples/sec   Loss 7.1464   LearningRate 0.0286   Epoch: 9   Global Step: 385900   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:04:48,897-Speed 2630.37 samples/sec   Loss 7.2058   LearningRate 0.0286   Epoch: 9   Global Step: 385910   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:04:52,788-Speed 2632.28 samples/sec   Loss 7.1766   LearningRate 0.0286   Epoch: 9   Global Step: 385920   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:04:56,685-Speed 2628.53 samples/sec   Loss 7.0884   LearningRate 0.0286   Epoch: 9   Global Step: 385930   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:05:00,597-Speed 2618.80 samples/sec   Loss 7.1883   LearningRate 0.0286   Epoch: 9   Global Step: 385940   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:05:04,487-Speed 2632.43 samples/sec   Loss 7.2188   LearningRate 0.0286   Epoch: 9   Global Step: 385950   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:05:08,391-Speed 2623.80 samples/sec   Loss 7.3055   LearningRate 0.0286   Epoch: 9   Global Step: 385960   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:05:12,308-Speed 2614.93 samples/sec   Loss 7.2283   LearningRate 0.0286   Epoch: 9   Global Step: 385970   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:05:16,204-Speed 2629.16 samples/sec   Loss 7.2299   LearningRate 0.0286   Epoch: 9   Global Step: 385980   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:05:20,099-Speed 2629.56 samples/sec   Loss 7.2754   LearningRate 0.0286   Epoch: 9   Global Step: 385990   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:24,073-Speed 2577.81 samples/sec   Loss 7.2524   LearningRate 0.0286   Epoch: 9   Global Step: 386000   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:27,991-Speed 2614.09 samples/sec   Loss 7.2093   LearningRate 0.0286   Epoch: 9   Global Step: 386010   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:31,900-Speed 2620.76 samples/sec   Loss 7.3676   LearningRate 0.0286   Epoch: 9   Global Step: 386020   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:35,789-Speed 2633.38 samples/sec   Loss 7.1707   LearningRate 0.0286   Epoch: 9   Global Step: 386030   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:39,705-Speed 2615.82 samples/sec   Loss 7.1792   LearningRate 0.0286   Epoch: 9   Global Step: 386040   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:43,603-Speed 2626.83 samples/sec   Loss 7.1451   LearningRate 0.0286   Epoch: 9   Global Step: 386050   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:47,507-Speed 2624.21 samples/sec   Loss 7.2008   LearningRate 0.0286   Epoch: 9   Global Step: 386060   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:51,403-Speed 2629.55 samples/sec   Loss 7.1593   LearningRate 0.0286   Epoch: 9   Global Step: 386070   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:55,305-Speed 2624.97 samples/sec   Loss 7.1515   LearningRate 0.0286   Epoch: 9   Global Step: 386080   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:05:59,199-Speed 2629.95 samples/sec   Loss 7.2049   LearningRate 0.0286   Epoch: 9   Global Step: 386090   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:03,102-Speed 2624.26 samples/sec   Loss 7.2491   LearningRate 0.0286   Epoch: 9   Global Step: 386100   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:06,996-Speed 2630.42 samples/sec   Loss 7.2627   LearningRate 0.0286   Epoch: 9   Global Step: 386110   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:10,901-Speed 2622.69 samples/sec   Loss 7.1780   LearningRate 0.0286   Epoch: 9   Global Step: 386120   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:14,799-Speed 2627.50 samples/sec   Loss 7.0863   LearningRate 0.0286   Epoch: 9   Global Step: 386130   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:18,708-Speed 2620.52 samples/sec   Loss 7.2347   LearningRate 0.0286   Epoch: 9   Global Step: 386140   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:22,612-Speed 2623.46 samples/sec   Loss 7.1930   LearningRate 0.0286   Epoch: 9   Global Step: 386150   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:26,630-Speed 2549.44 samples/sec   Loss 7.2910   LearningRate 0.0286   Epoch: 9   Global Step: 386160   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:30,721-Speed 2503.53 samples/sec   Loss 7.1034   LearningRate 0.0286   Epoch: 9   Global Step: 386170   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:06:34,778-Speed 2524.53 samples/sec   Loss 7.2073   LearningRate 0.0286   Epoch: 9   Global Step: 386180   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:06:38,744-Speed 2582.67 samples/sec   Loss 7.0626   LearningRate 0.0286   Epoch: 9   Global Step: 386190   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:06:42,655-Speed 2621.50 samples/sec   Loss 7.2613   LearningRate 0.0286   Epoch: 9   Global Step: 386200   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:06:46,560-Speed 2623.05 samples/sec   Loss 7.3415   LearningRate 0.0286   Epoch: 9   Global Step: 386210   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:06:50,450-Speed 2633.03 samples/sec   Loss 7.1758   LearningRate 0.0286   Epoch: 9   Global Step: 386220   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:06:54,352-Speed 2624.90 samples/sec   Loss 7.1043   LearningRate 0.0286   Epoch: 9   Global Step: 386230   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:06:58,253-Speed 2625.65 samples/sec   Loss 7.0549   LearningRate 0.0286   Epoch: 9   Global Step: 386240   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:07:02,153-Speed 2626.51 samples/sec   Loss 7.1975   LearningRate 0.0286   Epoch: 9   Global Step: 386250   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:07:05,996-Speed 2665.22 samples/sec   Loss 7.2909   LearningRate 0.0286   Epoch: 9   Global Step: 386260   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:09,907-Speed 2618.61 samples/sec   Loss 7.6316   LearningRate 0.0286   Epoch: 9   Global Step: 386270   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:13,812-Speed 2622.89 samples/sec   Loss 7.2771   LearningRate 0.0286   Epoch: 9   Global Step: 386280   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:17,730-Speed 2614.46 samples/sec   Loss 7.1627   LearningRate 0.0286   Epoch: 9   Global Step: 386290   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:21,618-Speed 2635.44 samples/sec   Loss 7.1236   LearningRate 0.0286   Epoch: 9   Global Step: 386300   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:25,553-Speed 2602.46 samples/sec   Loss 7.2287   LearningRate 0.0286   Epoch: 9   Global Step: 386310   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:29,499-Speed 2596.38 samples/sec   Loss 7.2324   LearningRate 0.0285   Epoch: 9   Global Step: 386320   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:33,397-Speed 2627.37 samples/sec   Loss 7.2375   LearningRate 0.0285   Epoch: 9   Global Step: 386330   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:37,312-Speed 2616.27 samples/sec   Loss 7.3328   LearningRate 0.0285   Epoch: 9   Global Step: 386340   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:41,234-Speed 2611.30 samples/sec   Loss 7.1355   LearningRate 0.0285   Epoch: 9   Global Step: 386350   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:07:45,133-Speed 2626.56 samples/sec   Loss 7.1104   LearningRate 0.0285   Epoch: 9   Global Step: 386360   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:07:49,041-Speed 2620.94 samples/sec   Loss 7.1889   LearningRate 0.0285   Epoch: 9   Global Step: 386370   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:07:52,947-Speed 2623.32 samples/sec   Loss 7.0354   LearningRate 0.0285   Epoch: 9   Global Step: 386380   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:07:56,839-Speed 2631.14 samples/sec   Loss 7.2416   LearningRate 0.0285   Epoch: 9   Global Step: 386390   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:08:00,744-Speed 2623.59 samples/sec   Loss 7.1973   LearningRate 0.0285   Epoch: 9   Global Step: 386400   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:08:04,651-Speed 2621.34 samples/sec   Loss 7.2468   LearningRate 0.0285   Epoch: 9   Global Step: 386410   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:08:08,548-Speed 2627.64 samples/sec   Loss 7.2011   LearningRate 0.0285   Epoch: 9   Global Step: 386420   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:08:12,442-Speed 2630.43 samples/sec   Loss 7.1673   LearningRate 0.0285   Epoch: 9   Global Step: 386430   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:08:16,339-Speed 2628.35 samples/sec   Loss 7.1202   LearningRate 0.0285   Epoch: 9   Global Step: 386440   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:08:20,242-Speed 2624.45 samples/sec   Loss 7.1615   LearningRate 0.0285   Epoch: 9   Global Step: 386450   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:08:24,155-Speed 2617.50 samples/sec   Loss 7.2265   LearningRate 0.0285   Epoch: 9   Global Step: 386460   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:28,054-Speed 2626.65 samples/sec   Loss 7.2218   LearningRate 0.0285   Epoch: 9   Global Step: 386470   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:31,947-Speed 2631.21 samples/sec   Loss 7.0469   LearningRate 0.0285   Epoch: 9   Global Step: 386480   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:35,853-Speed 2621.80 samples/sec   Loss 7.1445   LearningRate 0.0285   Epoch: 9   Global Step: 386490   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:39,749-Speed 2629.35 samples/sec   Loss 7.0682   LearningRate 0.0285   Epoch: 9   Global Step: 386500   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:43,646-Speed 2628.43 samples/sec   Loss 7.2245   LearningRate 0.0285   Epoch: 9   Global Step: 386510   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:47,555-Speed 2620.08 samples/sec   Loss 7.1681   LearningRate 0.0285   Epoch: 9   Global Step: 386520   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:51,452-Speed 2627.97 samples/sec   Loss 7.1786   LearningRate 0.0285   Epoch: 9   Global Step: 386530   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:55,356-Speed 2623.76 samples/sec   Loss 7.1642   LearningRate 0.0285   Epoch: 9   Global Step: 386540   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:08:59,246-Speed 2633.30 samples/sec   Loss 7.1217   LearningRate 0.0285   Epoch: 9   Global Step: 386550   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:09:03,141-Speed 2629.18 samples/sec   Loss 7.2750   LearningRate 0.0285   Epoch: 9   Global Step: 386560   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:07,050-Speed 2620.11 samples/sec   Loss 7.1749   LearningRate 0.0285   Epoch: 9   Global Step: 386570   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:10,947-Speed 2628.72 samples/sec   Loss 7.2848   LearningRate 0.0285   Epoch: 9   Global Step: 386580   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:14,849-Speed 2625.35 samples/sec   Loss 7.1599   LearningRate 0.0285   Epoch: 9   Global Step: 386590   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:18,747-Speed 2627.33 samples/sec   Loss 7.1278   LearningRate 0.0285   Epoch: 9   Global Step: 386600   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:22,659-Speed 2618.22 samples/sec   Loss 7.2596   LearningRate 0.0285   Epoch: 9   Global Step: 386610   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:26,574-Speed 2615.90 samples/sec   Loss 7.1309   LearningRate 0.0285   Epoch: 9   Global Step: 386620   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:30,472-Speed 2628.21 samples/sec   Loss 7.2230   LearningRate 0.0285   Epoch: 9   Global Step: 386630   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:34,360-Speed 2634.02 samples/sec   Loss 7.2374   LearningRate 0.0285   Epoch: 9   Global Step: 386640   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:38,254-Speed 2630.65 samples/sec   Loss 7.2513   LearningRate 0.0285   Epoch: 9   Global Step: 386650   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:09:42,159-Speed 2622.54 samples/sec   Loss 7.0669   LearningRate 0.0285   Epoch: 9   Global Step: 386660   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:09:46,067-Speed 2621.59 samples/sec   Loss 7.2011   LearningRate 0.0285   Epoch: 9   Global Step: 386670   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:09:49,958-Speed 2631.84 samples/sec   Loss 7.2620   LearningRate 0.0285   Epoch: 9   Global Step: 386680   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:09:53,850-Speed 2631.73 samples/sec   Loss 7.2550   LearningRate 0.0285   Epoch: 9   Global Step: 386690   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:09:57,742-Speed 2631.60 samples/sec   Loss 7.1330   LearningRate 0.0285   Epoch: 9   Global Step: 386700   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:10:01,673-Speed 2605.56 samples/sec   Loss 7.2908   LearningRate 0.0285   Epoch: 9   Global Step: 386710   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:10:05,574-Speed 2626.03 samples/sec   Loss 7.2435   LearningRate 0.0285   Epoch: 9   Global Step: 386720   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:10:09,467-Speed 2631.11 samples/sec   Loss 7.1214   LearningRate 0.0285   Epoch: 9   Global Step: 386730   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:10:13,417-Speed 2592.84 samples/sec   Loss 7.1604   LearningRate 0.0285   Epoch: 9   Global Step: 386740   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:10:17,345-Speed 2607.87 samples/sec   Loss 7.2128   LearningRate 0.0285   Epoch: 9   Global Step: 386750   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:10:21,240-Speed 2629.72 samples/sec   Loss 7.2060   LearningRate 0.0285   Epoch: 9   Global Step: 386760   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:10:25,090-Speed 2660.23 samples/sec   Loss 7.3458   LearningRate 0.0285   Epoch: 9   Global Step: 386770   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:28,986-Speed 2628.60 samples/sec   Loss 7.1842   LearningRate 0.0285   Epoch: 9   Global Step: 386780   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:32,877-Speed 2633.80 samples/sec   Loss 7.1140   LearningRate 0.0285   Epoch: 9   Global Step: 386790   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:36,778-Speed 2625.67 samples/sec   Loss 7.2317   LearningRate 0.0285   Epoch: 9   Global Step: 386800   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:40,685-Speed 2621.59 samples/sec   Loss 7.1509   LearningRate 0.0285   Epoch: 9   Global Step: 386810   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:44,589-Speed 2623.66 samples/sec   Loss 7.2221   LearningRate 0.0285   Epoch: 9   Global Step: 386820   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:48,499-Speed 2619.78 samples/sec   Loss 7.2309   LearningRate 0.0285   Epoch: 9   Global Step: 386830   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:52,395-Speed 2629.22 samples/sec   Loss 7.0884   LearningRate 0.0285   Epoch: 9   Global Step: 386840   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:10:56,300-Speed 2622.92 samples/sec   Loss 7.1416   LearningRate 0.0285   Epoch: 9   Global Step: 386850   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:00,202-Speed 2624.65 samples/sec   Loss 7.2201   LearningRate 0.0285   Epoch: 9   Global Step: 386860   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:04,101-Speed 2627.00 samples/sec   Loss 7.2717   LearningRate 0.0285   Epoch: 9   Global Step: 386870   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:11:08,009-Speed 2620.80 samples/sec   Loss 7.2370   LearningRate 0.0285   Epoch: 9   Global Step: 386880   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:11:11,883-Speed 2643.95 samples/sec   Loss 7.2694   LearningRate 0.0285   Epoch: 9   Global Step: 386890   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:15,886-Speed 2558.66 samples/sec   Loss 7.3043   LearningRate 0.0285   Epoch: 9   Global Step: 386900   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:19,775-Speed 2633.94 samples/sec   Loss 7.3108   LearningRate 0.0285   Epoch: 9   Global Step: 386910   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:23,698-Speed 2610.55 samples/sec   Loss 7.3338   LearningRate 0.0285   Epoch: 9   Global Step: 386920   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:27,596-Speed 2628.15 samples/sec   Loss 7.0603   LearningRate 0.0285   Epoch: 9   Global Step: 386930   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:31,487-Speed 2632.59 samples/sec   Loss 7.1597   LearningRate 0.0285   Epoch: 9   Global Step: 386940   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:35,389-Speed 2624.85 samples/sec   Loss 7.1634   LearningRate 0.0285   Epoch: 9   Global Step: 386950   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:39,285-Speed 2628.38 samples/sec   Loss 7.2286   LearningRate 0.0285   Epoch: 9   Global Step: 386960   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:43,178-Speed 2631.27 samples/sec   Loss 7.1057   LearningRate 0.0285   Epoch: 9   Global Step: 386970   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:47,063-Speed 2637.00 samples/sec   Loss 7.1588   LearningRate 0.0285   Epoch: 9   Global Step: 386980   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:11:50,961-Speed 2627.16 samples/sec   Loss 7.0711   LearningRate 0.0285   Epoch: 9   Global Step: 386990   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:11:54,947-Speed 2569.90 samples/sec   Loss 7.1509   LearningRate 0.0285   Epoch: 9   Global Step: 387000   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:11:58,841-Speed 2630.27 samples/sec   Loss 7.0610   LearningRate 0.0285   Epoch: 9   Global Step: 387010   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:02,737-Speed 2629.04 samples/sec   Loss 7.2797   LearningRate 0.0285   Epoch: 9   Global Step: 387020   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:06,630-Speed 2631.27 samples/sec   Loss 7.1858   LearningRate 0.0285   Epoch: 9   Global Step: 387030   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:10,530-Speed 2625.56 samples/sec   Loss 7.0854   LearningRate 0.0285   Epoch: 9   Global Step: 387040   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:14,432-Speed 2624.81 samples/sec   Loss 7.1967   LearningRate 0.0285   Epoch: 9   Global Step: 387050   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:18,319-Speed 2635.22 samples/sec   Loss 7.1665   LearningRate 0.0285   Epoch: 9   Global Step: 387060   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:22,210-Speed 2632.38 samples/sec   Loss 7.0920   LearningRate 0.0285   Epoch: 9   Global Step: 387070   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:26,113-Speed 2624.28 samples/sec   Loss 7.1518   LearningRate 0.0285   Epoch: 9   Global Step: 387080   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:12:30,016-Speed 2624.22 samples/sec   Loss 7.1784   LearningRate 0.0285   Epoch: 9   Global Step: 387090   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:12:33,915-Speed 2626.92 samples/sec   Loss 7.1841   LearningRate 0.0284   Epoch: 9   Global Step: 387100   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:12:37,842-Speed 2608.03 samples/sec   Loss 7.2609   LearningRate 0.0284   Epoch: 9   Global Step: 387110   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:12:41,730-Speed 2634.75 samples/sec   Loss 7.2334   LearningRate 0.0284   Epoch: 9   Global Step: 387120   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:12:45,621-Speed 2632.62 samples/sec   Loss 7.0034   LearningRate 0.0284   Epoch: 9   Global Step: 387130   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:12:49,517-Speed 2628.48 samples/sec   Loss 7.1415   LearningRate 0.0284   Epoch: 9   Global Step: 387140   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:12:53,420-Speed 2623.98 samples/sec   Loss 7.2358   LearningRate 0.0284   Epoch: 9   Global Step: 387150   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:12:57,314-Speed 2630.28 samples/sec   Loss 7.1335   LearningRate 0.0284   Epoch: 9   Global Step: 387160   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:01,217-Speed 2624.47 samples/sec   Loss 7.1235   LearningRate 0.0284   Epoch: 9   Global Step: 387170   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:05,131-Speed 2616.48 samples/sec   Loss 7.1109   LearningRate 0.0284   Epoch: 9   Global Step: 387180   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:09,033-Speed 2624.98 samples/sec   Loss 7.2502   LearningRate 0.0284   Epoch: 9   Global Step: 387190   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:13:12,922-Speed 2633.24 samples/sec   Loss 7.2055   LearningRate 0.0284   Epoch: 9   Global Step: 387200   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:13:16,813-Speed 2632.70 samples/sec   Loss 7.1959   LearningRate 0.0284   Epoch: 9   Global Step: 387210   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:20,720-Speed 2621.61 samples/sec   Loss 7.2009   LearningRate 0.0284   Epoch: 9   Global Step: 387220   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:24,617-Speed 2628.30 samples/sec   Loss 7.1009   LearningRate 0.0284   Epoch: 9   Global Step: 387230   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:28,509-Speed 2631.71 samples/sec   Loss 7.1989   LearningRate 0.0284   Epoch: 9   Global Step: 387240   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:32,409-Speed 2626.27 samples/sec   Loss 7.2266   LearningRate 0.0284   Epoch: 9   Global Step: 387250   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:36,301-Speed 2631.91 samples/sec   Loss 7.1152   LearningRate 0.0284   Epoch: 9   Global Step: 387260   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:40,205-Speed 2623.46 samples/sec   Loss 7.1173   LearningRate 0.0284   Epoch: 9   Global Step: 387270   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:44,125-Speed 2612.40 samples/sec   Loss 7.1764   LearningRate 0.0284   Epoch: 9   Global Step: 387280   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:48,027-Speed 2625.18 samples/sec   Loss 7.2523   LearningRate 0.0284   Epoch: 9   Global Step: 387290   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:51,923-Speed 2629.08 samples/sec   Loss 7.1535   LearningRate 0.0284   Epoch: 9   Global Step: 387300   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:13:55,820-Speed 2628.29 samples/sec   Loss 7.1179   LearningRate 0.0284   Epoch: 9   Global Step: 387310   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:13:59,710-Speed 2633.29 samples/sec   Loss 7.0717   LearningRate 0.0284   Epoch: 9   Global Step: 387320   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:14:03,606-Speed 2628.72 samples/sec   Loss 7.1161   LearningRate 0.0284   Epoch: 9   Global Step: 387330   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:07,505-Speed 2627.03 samples/sec   Loss 7.2520   LearningRate 0.0284   Epoch: 9   Global Step: 387340   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:11,414-Speed 2619.66 samples/sec   Loss 7.2645   LearningRate 0.0284   Epoch: 9   Global Step: 387350   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:15,309-Speed 2629.79 samples/sec   Loss 7.1332   LearningRate 0.0284   Epoch: 9   Global Step: 387360   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:19,217-Speed 2620.68 samples/sec   Loss 7.1321   LearningRate 0.0284   Epoch: 9   Global Step: 387370   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:23,112-Speed 2629.83 samples/sec   Loss 7.2331   LearningRate 0.0284   Epoch: 9   Global Step: 387380   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:27,015-Speed 2624.61 samples/sec   Loss 7.1501   LearningRate 0.0284   Epoch: 9   Global Step: 387390   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:30,917-Speed 2624.65 samples/sec   Loss 7.1951   LearningRate 0.0284   Epoch: 9   Global Step: 387400   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:34,813-Speed 2628.68 samples/sec   Loss 7.2008   LearningRate 0.0284   Epoch: 9   Global Step: 387410   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:38,710-Speed 2628.06 samples/sec   Loss 7.1306   LearningRate 0.0284   Epoch: 9   Global Step: 387420   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:42,615-Speed 2623.45 samples/sec   Loss 7.2323   LearningRate 0.0284   Epoch: 9   Global Step: 387430   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:14:46,505-Speed 2633.09 samples/sec   Loss 7.0429   LearningRate 0.0284   Epoch: 9   Global Step: 387440   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:50,409-Speed 2623.62 samples/sec   Loss 7.0919   LearningRate 0.0284   Epoch: 9   Global Step: 387450   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:54,301-Speed 2630.94 samples/sec   Loss 7.2508   LearningRate 0.0284   Epoch: 9   Global Step: 387460   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:14:58,207-Speed 2622.59 samples/sec   Loss 7.0988   LearningRate 0.0284   Epoch: 9   Global Step: 387470   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:15:02,105-Speed 2627.27 samples/sec   Loss 7.1913   LearningRate 0.0284   Epoch: 9   Global Step: 387480   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:15:05,980-Speed 2644.05 samples/sec   Loss 7.1250   LearningRate 0.0284   Epoch: 9   Global Step: 387490   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:09,880-Speed 2626.47 samples/sec   Loss 7.1215   LearningRate 0.0284   Epoch: 9   Global Step: 387500   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:13,838-Speed 2587.97 samples/sec   Loss 7.0751   LearningRate 0.0284   Epoch: 9   Global Step: 387510   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:17,746-Speed 2620.60 samples/sec   Loss 7.1144   LearningRate 0.0284   Epoch: 9   Global Step: 387520   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:21,711-Speed 2583.46 samples/sec   Loss 7.1518   LearningRate 0.0284   Epoch: 9   Global Step: 387530   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:25,612-Speed 2625.32 samples/sec   Loss 7.0433   LearningRate 0.0284   Epoch: 9   Global Step: 387540   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:29,522-Speed 2619.66 samples/sec   Loss 7.1437   LearningRate 0.0284   Epoch: 9   Global Step: 387550   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:35,060-Speed 1849.44 samples/sec   Loss 7.1790   LearningRate 0.0284   Epoch: 9   Global Step: 387560   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:38,974-Speed 2616.67 samples/sec   Loss 7.3011   LearningRate 0.0284   Epoch: 9   Global Step: 387570   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:42,870-Speed 2629.05 samples/sec   Loss 7.1310   LearningRate 0.0284   Epoch: 9   Global Step: 387580   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:46,843-Speed 2578.13 samples/sec   Loss 7.1604   LearningRate 0.0284   Epoch: 9   Global Step: 387590   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:15:50,803-Speed 2587.98 samples/sec   Loss 7.2714   LearningRate 0.0284   Epoch: 9   Global Step: 387600   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:54,721-Speed 2614.11 samples/sec   Loss 7.1040   LearningRate 0.0284   Epoch: 9   Global Step: 387610   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:15:58,564-Speed 2664.65 samples/sec   Loss 7.5062   LearningRate 0.0284   Epoch: 9   Global Step: 387620   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:02,459-Speed 2630.00 samples/sec   Loss 7.7021   LearningRate 0.0284   Epoch: 9   Global Step: 387630   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:06,368-Speed 2620.18 samples/sec   Loss 7.5854   LearningRate 0.0284   Epoch: 9   Global Step: 387640   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:10,292-Speed 2610.02 samples/sec   Loss 7.4614   LearningRate 0.0284   Epoch: 9   Global Step: 387650   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:14,380-Speed 2506.13 samples/sec   Loss 7.3552   LearningRate 0.0284   Epoch: 9   Global Step: 387660   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:18,278-Speed 2627.85 samples/sec   Loss 7.3142   LearningRate 0.0284   Epoch: 9   Global Step: 387670   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:22,174-Speed 2629.03 samples/sec   Loss 7.3016   LearningRate 0.0284   Epoch: 9   Global Step: 387680   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:26,092-Speed 2613.91 samples/sec   Loss 7.1433   LearningRate 0.0284   Epoch: 9   Global Step: 387690   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:30,135-Speed 2533.45 samples/sec   Loss 7.1321   LearningRate 0.0284   Epoch: 9   Global Step: 387700   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:34,204-Speed 2517.43 samples/sec   Loss 7.0387   LearningRate 0.0284   Epoch: 9   Global Step: 387710   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-04-14 15:16:38,277-Speed 2514.87 samples/sec   Loss 7.2792   LearningRate 0.0284   Epoch: 9   Global Step: 387720   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:16:42,375-Speed 2499.42 samples/sec   Loss 7.1584   LearningRate 0.0284   Epoch: 9   Global Step: 387730   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:16:46,391-Speed 2550.49 samples/sec   Loss 7.3004   LearningRate 0.0284   Epoch: 9   Global Step: 387740   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:16:50,319-Speed 2607.70 samples/sec   Loss 7.1156   LearningRate 0.0284   Epoch: 9   Global Step: 387750   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:16:54,239-Speed 2612.75 samples/sec   Loss 7.1463   LearningRate 0.0284   Epoch: 9   Global Step: 387760   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:16:58,221-Speed 2572.50 samples/sec   Loss 7.3924   LearningRate 0.0284   Epoch: 9   Global Step: 387770   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:02,117-Speed 2628.62 samples/sec   Loss 7.0659   LearningRate 0.0284   Epoch: 9   Global Step: 387780   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:06,020-Speed 2623.98 samples/sec   Loss 7.0975   LearningRate 0.0284   Epoch: 9   Global Step: 387790   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:09,928-Speed 2620.60 samples/sec   Loss 7.1601   LearningRate 0.0284   Epoch: 9   Global Step: 387800   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:13,822-Speed 2630.07 samples/sec   Loss 7.0606   LearningRate 0.0284   Epoch: 9   Global Step: 387810   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:17,731-Speed 2620.53 samples/sec   Loss 7.2715   LearningRate 0.0284   Epoch: 9   Global Step: 387820   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:17:21,639-Speed 2621.06 samples/sec   Loss 7.2050   LearningRate 0.0284   Epoch: 9   Global Step: 387830   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:17:25,532-Speed 2631.15 samples/sec   Loss 7.1758   LearningRate 0.0284   Epoch: 9   Global Step: 387840   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:17:29,431-Speed 2626.74 samples/sec   Loss 7.2958   LearningRate 0.0284   Epoch: 9   Global Step: 387850   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:17:33,332-Speed 2625.76 samples/sec   Loss 7.2438   LearningRate 0.0284   Epoch: 9   Global Step: 387860   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:17:37,234-Speed 2624.38 samples/sec   Loss 7.1355   LearningRate 0.0284   Epoch: 9   Global Step: 387870   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:17:41,121-Speed 2635.31 samples/sec   Loss 7.2646   LearningRate 0.0283   Epoch: 9   Global Step: 387880   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:45,020-Speed 2627.11 samples/sec   Loss 7.2121   LearningRate 0.0283   Epoch: 9   Global Step: 387890   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:48,942-Speed 2611.62 samples/sec   Loss 7.1160   LearningRate 0.0283   Epoch: 9   Global Step: 387900   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:52,837-Speed 2630.10 samples/sec   Loss 7.1309   LearningRate 0.0283   Epoch: 9   Global Step: 387910   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:17:56,749-Speed 2618.92 samples/sec   Loss 7.1214   LearningRate 0.0283   Epoch: 9   Global Step: 387920   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:18:00,648-Speed 2627.27 samples/sec   Loss 7.1485   LearningRate 0.0283   Epoch: 9   Global Step: 387930   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:18:04,558-Speed 2619.43 samples/sec   Loss 7.1966   LearningRate 0.0283   Epoch: 9   Global Step: 387940   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:18:08,451-Speed 2630.58 samples/sec   Loss 7.3131   LearningRate 0.0283   Epoch: 9   Global Step: 387950   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:18:12,356-Speed 2623.22 samples/sec   Loss 7.2258   LearningRate 0.0283   Epoch: 9   Global Step: 387960   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:18:16,264-Speed 2621.02 samples/sec   Loss 7.1087   LearningRate 0.0283   Epoch: 9   Global Step: 387970   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:18:20,150-Speed 2635.62 samples/sec   Loss 7.1238   LearningRate 0.0283   Epoch: 9   Global Step: 387980   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:24,055-Speed 2622.69 samples/sec   Loss 7.1126   LearningRate 0.0283   Epoch: 9   Global Step: 387990   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:27,950-Speed 2630.66 samples/sec   Loss 7.1977   LearningRate 0.0283   Epoch: 9   Global Step: 388000   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:31,839-Speed 2633.49 samples/sec   Loss 7.2383   LearningRate 0.0283   Epoch: 9   Global Step: 388010   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:35,741-Speed 2624.48 samples/sec   Loss 7.2748   LearningRate 0.0283   Epoch: 9   Global Step: 388020   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:39,637-Speed 2629.31 samples/sec   Loss 7.2265   LearningRate 0.0283   Epoch: 9   Global Step: 388030   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:43,534-Speed 2628.68 samples/sec   Loss 7.0976   LearningRate 0.0283   Epoch: 9   Global Step: 388040   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:47,437-Speed 2624.79 samples/sec   Loss 7.1924   LearningRate 0.0283   Epoch: 9   Global Step: 388050   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:51,328-Speed 2631.85 samples/sec   Loss 7.1498   LearningRate 0.0283   Epoch: 9   Global Step: 388060   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:55,234-Speed 2622.67 samples/sec   Loss 7.0963   LearningRate 0.0283   Epoch: 9   Global Step: 388070   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:18:59,154-Speed 2612.53 samples/sec   Loss 7.2230   LearningRate 0.0283   Epoch: 9   Global Step: 388080   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:19:03,043-Speed 2633.94 samples/sec   Loss 7.0488   LearningRate 0.0283   Epoch: 9   Global Step: 388090   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:19:06,942-Speed 2626.69 samples/sec   Loss 7.1001   LearningRate 0.0283   Epoch: 9   Global Step: 388100   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:19:10,834-Speed 2632.24 samples/sec   Loss 7.1744   LearningRate 0.0283   Epoch: 9   Global Step: 388110   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:19:14,747-Speed 2617.57 samples/sec   Loss 7.0200   LearningRate 0.0283   Epoch: 9   Global Step: 388120   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:19:18,664-Speed 2614.92 samples/sec   Loss 7.1377   LearningRate 0.0283   Epoch: 9   Global Step: 388130   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:19:22,541-Speed 2641.75 samples/sec   Loss 7.3154   LearningRate 0.0283   Epoch: 9   Global Step: 388140   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:26,438-Speed 2628.50 samples/sec   Loss 7.3175   LearningRate 0.0283   Epoch: 9   Global Step: 388150   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:30,348-Speed 2619.65 samples/sec   Loss 7.1952   LearningRate 0.0283   Epoch: 9   Global Step: 388160   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:34,239-Speed 2631.91 samples/sec   Loss 7.2116   LearningRate 0.0283   Epoch: 9   Global Step: 388170   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:38,139-Speed 2626.09 samples/sec   Loss 7.2289   LearningRate 0.0283   Epoch: 9   Global Step: 388180   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:42,029-Speed 2633.26 samples/sec   Loss 7.1962   LearningRate 0.0283   Epoch: 9   Global Step: 388190   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:45,937-Speed 2620.62 samples/sec   Loss 7.2410   LearningRate 0.0283   Epoch: 9   Global Step: 388200   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:49,834-Speed 2629.11 samples/sec   Loss 6.9827   LearningRate 0.0283   Epoch: 9   Global Step: 388210   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:53,745-Speed 2618.87 samples/sec   Loss 7.1689   LearningRate 0.0283   Epoch: 9   Global Step: 388220   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:19:57,646-Speed 2625.54 samples/sec   Loss 7.0654   LearningRate 0.0283   Epoch: 9   Global Step: 388230   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-14 15:20:01,561-Speed 2616.03 samples/sec   Loss 7.2661   LearningRate 0.0283   Epoch: 9   Global Step: 388240   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:05,462-Speed 2625.69 samples/sec   Loss 7.1517   LearningRate 0.0283   Epoch: 9   Global Step: 388250   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:09,363-Speed 2625.02 samples/sec   Loss 7.0688   LearningRate 0.0283   Epoch: 9   Global Step: 388260   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:13,371-Speed 2555.80 samples/sec   Loss 7.0565   LearningRate 0.0283   Epoch: 9   Global Step: 388270   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:17,466-Speed 2501.88 samples/sec   Loss 7.1544   LearningRate 0.0283   Epoch: 9   Global Step: 388280   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:21,571-Speed 2494.75 samples/sec   Loss 7.2020   LearningRate 0.0283   Epoch: 9   Global Step: 388290   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:25,579-Speed 2555.52 samples/sec   Loss 7.1748   LearningRate 0.0283   Epoch: 9   Global Step: 388300   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:29,469-Speed 2633.06 samples/sec   Loss 7.0838   LearningRate 0.0283   Epoch: 9   Global Step: 388310   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:33,396-Speed 2608.38 samples/sec   Loss 7.2098   LearningRate 0.0283   Epoch: 9   Global Step: 388320   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:37,295-Speed 2627.30 samples/sec   Loss 7.0710   LearningRate 0.0283   Epoch: 9   Global Step: 388330   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:20:41,192-Speed 2628.12 samples/sec   Loss 7.1521   LearningRate 0.0283   Epoch: 9   Global Step: 388340   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:20:45,081-Speed 2633.28 samples/sec   Loss 7.0725   LearningRate 0.0283   Epoch: 9   Global Step: 388350   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:20:48,985-Speed 2623.50 samples/sec   Loss 7.1712   LearningRate 0.0283   Epoch: 9   Global Step: 388360   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:20:52,889-Speed 2623.71 samples/sec   Loss 7.1939   LearningRate 0.0283   Epoch: 9   Global Step: 388370   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:20:56,791-Speed 2624.56 samples/sec   Loss 7.1765   LearningRate 0.0283   Epoch: 9   Global Step: 388380   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:21:00,710-Speed 2614.02 samples/sec   Loss 7.1603   LearningRate 0.0283   Epoch: 9   Global Step: 388390   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:21:04,607-Speed 2628.35 samples/sec   Loss 7.0807   LearningRate 0.0283   Epoch: 9   Global Step: 388400   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:21:08,512-Speed 2622.63 samples/sec   Loss 7.0894   LearningRate 0.0283   Epoch: 9   Global Step: 388410   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:21:12,420-Speed 2621.18 samples/sec   Loss 7.2065   LearningRate 0.0283   Epoch: 9   Global Step: 388420   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:21:16,326-Speed 2622.05 samples/sec   Loss 7.1727   LearningRate 0.0283   Epoch: 9   Global Step: 388430   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:21:20,227-Speed 2625.77 samples/sec   Loss 7.0743   LearningRate 0.0283   Epoch: 9   Global Step: 388440   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:24,125-Speed 2627.01 samples/sec   Loss 7.1137   LearningRate 0.0283   Epoch: 9   Global Step: 388450   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:28,031-Speed 2622.54 samples/sec   Loss 7.2122   LearningRate 0.0283   Epoch: 9   Global Step: 388460   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:31,936-Speed 2622.64 samples/sec   Loss 7.2137   LearningRate 0.0283   Epoch: 9   Global Step: 388470   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:35,831-Speed 2629.69 samples/sec   Loss 7.1181   LearningRate 0.0283   Epoch: 9   Global Step: 388480   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:39,724-Speed 2630.95 samples/sec   Loss 7.2008   LearningRate 0.0283   Epoch: 9   Global Step: 388490   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:43,627-Speed 2624.63 samples/sec   Loss 7.2559   LearningRate 0.0283   Epoch: 9   Global Step: 388500   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:47,519-Speed 2631.62 samples/sec   Loss 7.1924   LearningRate 0.0283   Epoch: 9   Global Step: 388510   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:51,423-Speed 2623.12 samples/sec   Loss 7.2058   LearningRate 0.0283   Epoch: 9   Global Step: 388520   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:55,323-Speed 2626.10 samples/sec   Loss 7.0107   LearningRate 0.0283   Epoch: 9   Global Step: 388530   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:21:59,222-Speed 2627.20 samples/sec   Loss 7.2651   LearningRate 0.0283   Epoch: 9   Global Step: 388540   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:22:03,117-Speed 2629.29 samples/sec   Loss 7.0469   LearningRate 0.0283   Epoch: 9   Global Step: 388550   Fp16 Grad Scale: 262144   Required: 50 hours
Training: 2022-04-14 15:22:07,018-Speed 2625.79 samples/sec   Loss 7.2537   LearningRate 0.0283   Epoch: 9   Global Step: 388560   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-04-14 15:22:10,900-Speed 2638.08 samples/sec   Loss 7.3022   LearningRate 0.0283   Epoch: 9   Global Step: 388570   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:14,802-Speed 2625.07 samples/sec   Loss 7.0930   LearningRate 0.0283   Epoch: 9   Global Step: 388580   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:18,779-Speed 2575.55 samples/sec   Loss 7.0209   LearningRate 0.0283   Epoch: 9   Global Step: 388590   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:22,707-Speed 2607.78 samples/sec   Loss 7.2051   LearningRate 0.0283   Epoch: 9   Global Step: 388600   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:26,603-Speed 2628.80 samples/sec   Loss 7.1557   LearningRate 0.0283   Epoch: 9   Global Step: 388610   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:30,495-Speed 2631.24 samples/sec   Loss 7.1267   LearningRate 0.0283   Epoch: 9   Global Step: 388620   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:34,386-Speed 2632.42 samples/sec   Loss 7.0593   LearningRate 0.0283   Epoch: 9   Global Step: 388630   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:38,288-Speed 2625.03 samples/sec   Loss 7.0606   LearningRate 0.0283   Epoch: 9   Global Step: 388640   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:42,186-Speed 2627.55 samples/sec   Loss 7.0352   LearningRate 0.0283   Epoch: 9   Global Step: 388650   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:22:46,083-Speed 2627.93 samples/sec   Loss 7.3566   LearningRate 0.0282   Epoch: 9   Global Step: 388660   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:22:49,983-Speed 2626.20 samples/sec   Loss 7.3392   LearningRate 0.0282   Epoch: 9   Global Step: 388670   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:22:53,887-Speed 2624.05 samples/sec   Loss 7.2333   LearningRate 0.0282   Epoch: 9   Global Step: 388680   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:22:57,785-Speed 2627.79 samples/sec   Loss 7.2504   LearningRate 0.0282   Epoch: 9   Global Step: 388690   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:23:01,697-Speed 2618.00 samples/sec   Loss 7.1925   LearningRate 0.0282   Epoch: 9   Global Step: 388700   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:23:05,593-Speed 2628.93 samples/sec   Loss 7.1173   LearningRate 0.0282   Epoch: 9   Global Step: 388710   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:23:09,505-Speed 2617.49 samples/sec   Loss 7.1758   LearningRate 0.0282   Epoch: 9   Global Step: 388720   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:23:13,407-Speed 2625.52 samples/sec   Loss 7.2381   LearningRate 0.0282   Epoch: 9   Global Step: 388730   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:23:17,318-Speed 2618.81 samples/sec   Loss 7.2958   LearningRate 0.0282   Epoch: 9   Global Step: 388740   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:23:21,219-Speed 2625.35 samples/sec   Loss 7.2443   LearningRate 0.0282   Epoch: 9   Global Step: 388750   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-04-14 15:23:25,128-Speed 2620.12 samples/sec   Loss 7.1453   LearningRate 0.0282   Epoch: 9   Global Step: 388760   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:23:29,042-Speed 2616.84 samples/sec   Loss 7.1180   LearningRate 0.0282   Epoch: 9   Global Step: 388770   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:23:32,945-Speed 2624.22 samples/sec   Loss 7.2421   LearningRate 0.0282   Epoch: 9   Global Step: 388780   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:23:36,843-Speed 2627.86 samples/sec   Loss 7.0581   LearningRate 0.0282   Epoch: 9   Global Step: 388790   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:23:40,781-Speed 2600.68 samples/sec   Loss 7.2660   LearningRate 0.0282   Epoch: 9   Global Step: 388800   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-04-14 15:23:44,680-Speed 2626.69 samples/sec   Loss 7.1219   LearningRate 0.0282   Epoch: 9   Global Step: 388810   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:23:48,587-Speed 2621.41 samples/sec   Loss 7.2633   LearningRate 0.0282   Epoch: 9   Global Step: 388820   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:23:52,600-Speed 2552.71 samples/sec   Loss 7.1644   LearningRate 0.0282   Epoch: 9   Global Step: 388830   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:23:56,504-Speed 2623.70 samples/sec   Loss 7.1786   LearningRate 0.0282   Epoch: 9   Global Step: 388840   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:00,403-Speed 2626.55 samples/sec   Loss 7.1054   LearningRate 0.0282   Epoch: 9   Global Step: 388850   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:04,322-Speed 2613.33 samples/sec   Loss 7.1677   LearningRate 0.0282   Epoch: 9   Global Step: 388860   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:24:08,211-Speed 2633.67 samples/sec   Loss 7.1695   LearningRate 0.0282   Epoch: 9   Global Step: 388870   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:12,137-Speed 2609.24 samples/sec   Loss 7.2821   LearningRate 0.0282   Epoch: 9   Global Step: 388880   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:16,201-Speed 2520.25 samples/sec   Loss 7.0955   LearningRate 0.0282   Epoch: 9   Global Step: 388890   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:20,149-Speed 2594.25 samples/sec   Loss 7.0524   LearningRate 0.0282   Epoch: 9   Global Step: 388900   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:24,058-Speed 2620.09 samples/sec   Loss 6.9864   LearningRate 0.0282   Epoch: 9   Global Step: 388910   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:27,991-Speed 2604.36 samples/sec   Loss 7.1420   LearningRate 0.0282   Epoch: 9   Global Step: 388920   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:31,984-Speed 2564.74 samples/sec   Loss 7.1793   LearningRate 0.0282   Epoch: 9   Global Step: 388930   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:35,887-Speed 2624.03 samples/sec   Loss 7.1951   LearningRate 0.0282   Epoch: 9   Global Step: 388940   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:39,786-Speed 2626.89 samples/sec   Loss 7.0918   LearningRate 0.0282   Epoch: 9   Global Step: 388950   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:43,690-Speed 2623.66 samples/sec   Loss 7.1309   LearningRate 0.0282   Epoch: 9   Global Step: 388960   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:24:47,593-Speed 2624.56 samples/sec   Loss 7.1309   LearningRate 0.0282   Epoch: 9   Global Step: 388970   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:24:51,495-Speed 2624.75 samples/sec   Loss 7.1562   LearningRate 0.0282   Epoch: 9   Global Step: 388980   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:24:55,393-Speed 2628.34 samples/sec   Loss 7.0466   LearningRate 0.0282   Epoch: 9   Global Step: 388990   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:24:59,284-Speed 2631.58 samples/sec   Loss 7.1362   LearningRate 0.0282   Epoch: 9   Global Step: 389000   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:25:03,187-Speed 2624.09 samples/sec   Loss 7.1178   LearningRate 0.0282   Epoch: 9   Global Step: 389010   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:25:07,087-Speed 2626.60 samples/sec   Loss 7.1820   LearningRate 0.0282   Epoch: 9   Global Step: 389020   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:25:10,944-Speed 2655.34 samples/sec   Loss 7.4257   LearningRate 0.0282   Epoch: 9   Global Step: 389030   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:14,839-Speed 2629.67 samples/sec   Loss 7.0506   LearningRate 0.0282   Epoch: 9   Global Step: 389040   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:18,733-Speed 2630.43 samples/sec   Loss 7.0586   LearningRate 0.0282   Epoch: 9   Global Step: 389050   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:22,648-Speed 2615.68 samples/sec   Loss 7.2384   LearningRate 0.0282   Epoch: 9   Global Step: 389060   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:26,542-Speed 2630.69 samples/sec   Loss 7.0492   LearningRate 0.0282   Epoch: 9   Global Step: 389070   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:30,439-Speed 2628.25 samples/sec   Loss 7.0538   LearningRate 0.0282   Epoch: 9   Global Step: 389080   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:34,345-Speed 2622.19 samples/sec   Loss 7.2579   LearningRate 0.0282   Epoch: 9   Global Step: 389090   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:38,246-Speed 2625.35 samples/sec   Loss 7.1991   LearningRate 0.0282   Epoch: 9   Global Step: 389100   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:42,141-Speed 2629.89 samples/sec   Loss 7.0852   LearningRate 0.0282   Epoch: 9   Global Step: 389110   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:46,044-Speed 2623.78 samples/sec   Loss 7.0859   LearningRate 0.0282   Epoch: 9   Global Step: 389120   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:25:49,946-Speed 2625.00 samples/sec   Loss 6.9790   LearningRate 0.0282   Epoch: 9   Global Step: 389130   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:25:53,834-Speed 2634.46 samples/sec   Loss 7.2119   LearningRate 0.0282   Epoch: 9   Global Step: 389140   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:25:57,734-Speed 2627.95 samples/sec   Loss 7.1399   LearningRate 0.0282   Epoch: 9   Global Step: 389150   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:01,664-Speed 2606.34 samples/sec   Loss 7.1063   LearningRate 0.0282   Epoch: 9   Global Step: 389160   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:05,553-Speed 2633.17 samples/sec   Loss 7.0507   LearningRate 0.0282   Epoch: 9   Global Step: 389170   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:09,468-Speed 2616.48 samples/sec   Loss 7.0323   LearningRate 0.0282   Epoch: 9   Global Step: 389180   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:13,372-Speed 2623.90 samples/sec   Loss 7.1396   LearningRate 0.0282   Epoch: 9   Global Step: 389190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:17,281-Speed 2619.96 samples/sec   Loss 7.0336   LearningRate 0.0282   Epoch: 9   Global Step: 389200   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:21,178-Speed 2628.31 samples/sec   Loss 7.1700   LearningRate 0.0282   Epoch: 9   Global Step: 389210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:25,084-Speed 2621.73 samples/sec   Loss 7.1177   LearningRate 0.0282   Epoch: 9   Global Step: 389220   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:26:28,978-Speed 2630.13 samples/sec   Loss 7.2263   LearningRate 0.0282   Epoch: 9   Global Step: 389230   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:26:32,877-Speed 2627.46 samples/sec   Loss 7.1517   LearningRate 0.0282   Epoch: 9   Global Step: 389240   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:26:36,785-Speed 2620.57 samples/sec   Loss 7.1388   LearningRate 0.0282   Epoch: 9   Global Step: 389250   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:26:40,685-Speed 2626.22 samples/sec   Loss 7.0652   LearningRate 0.0282   Epoch: 9   Global Step: 389260   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:26:44,586-Speed 2625.78 samples/sec   Loss 7.1090   LearningRate 0.0282   Epoch: 9   Global Step: 389270   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:26:48,483-Speed 2627.65 samples/sec   Loss 7.1491   LearningRate 0.0282   Epoch: 9   Global Step: 389280   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:26:52,386-Speed 2624.93 samples/sec   Loss 7.2706   LearningRate 0.0282   Epoch: 9   Global Step: 389290   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:26:56,282-Speed 2628.85 samples/sec   Loss 7.0876   LearningRate 0.0282   Epoch: 9   Global Step: 389300   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:00,181-Speed 2627.07 samples/sec   Loss 7.0658   LearningRate 0.0282   Epoch: 9   Global Step: 389310   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:04,093-Speed 2617.45 samples/sec   Loss 7.0836   LearningRate 0.0282   Epoch: 9   Global Step: 389320   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:07,979-Speed 2636.28 samples/sec   Loss 7.1676   LearningRate 0.0282   Epoch: 9   Global Step: 389330   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:11,876-Speed 2628.26 samples/sec   Loss 7.1343   LearningRate 0.0282   Epoch: 9   Global Step: 389340   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:15,781-Speed 2622.47 samples/sec   Loss 7.0730   LearningRate 0.0282   Epoch: 9   Global Step: 389350   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:19,678-Speed 2628.36 samples/sec   Loss 7.1896   LearningRate 0.0282   Epoch: 9   Global Step: 389360   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:23,567-Speed 2633.30 samples/sec   Loss 7.1337   LearningRate 0.0282   Epoch: 9   Global Step: 389370   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:27,484-Speed 2615.09 samples/sec   Loss 7.1320   LearningRate 0.0282   Epoch: 9   Global Step: 389380   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:31,380-Speed 2628.86 samples/sec   Loss 7.1035   LearningRate 0.0282   Epoch: 9   Global Step: 389390   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:27:35,258-Speed 2641.60 samples/sec   Loss 7.1975   LearningRate 0.0282   Epoch: 9   Global Step: 389400   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:27:39,150-Speed 2631.62 samples/sec   Loss 7.1742   LearningRate 0.0282   Epoch: 9   Global Step: 389410   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:27:43,059-Speed 2620.13 samples/sec   Loss 7.0498   LearningRate 0.0282   Epoch: 9   Global Step: 389420   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:27:46,971-Speed 2617.75 samples/sec   Loss 7.1340   LearningRate 0.0282   Epoch: 9   Global Step: 389430   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:27:50,866-Speed 2630.05 samples/sec   Loss 7.1524   LearningRate 0.0281   Epoch: 9   Global Step: 389440   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:27:54,770-Speed 2623.34 samples/sec   Loss 7.1858   LearningRate 0.0281   Epoch: 9   Global Step: 389450   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:27:58,673-Speed 2624.21 samples/sec   Loss 6.9645   LearningRate 0.0281   Epoch: 9   Global Step: 389460   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:28:02,583-Speed 2619.34 samples/sec   Loss 7.1008   LearningRate 0.0281   Epoch: 9   Global Step: 389470   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:28:06,486-Speed 2624.49 samples/sec   Loss 7.1246   LearningRate 0.0281   Epoch: 9   Global Step: 389480   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:28:10,391-Speed 2622.93 samples/sec   Loss 7.2007   LearningRate 0.0281   Epoch: 9   Global Step: 389490   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:28:14,299-Speed 2620.55 samples/sec   Loss 7.1799   LearningRate 0.0281   Epoch: 9   Global Step: 389500   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:28:18,203-Speed 2624.36 samples/sec   Loss 7.0502   LearningRate 0.0281   Epoch: 9   Global Step: 389510   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:28:22,089-Speed 2635.03 samples/sec   Loss 7.1726   LearningRate 0.0281   Epoch: 9   Global Step: 389520   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:28:25,979-Speed 2633.09 samples/sec   Loss 7.1955   LearningRate 0.0281   Epoch: 9   Global Step: 389530   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:29,905-Speed 2609.13 samples/sec   Loss 7.4683   LearningRate 0.0281   Epoch: 9   Global Step: 389540   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:33,835-Speed 2605.70 samples/sec   Loss 7.2887   LearningRate 0.0281   Epoch: 9   Global Step: 389550   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:37,737-Speed 2625.29 samples/sec   Loss 7.1051   LearningRate 0.0281   Epoch: 9   Global Step: 389560   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:41,626-Speed 2633.84 samples/sec   Loss 7.1488   LearningRate 0.0281   Epoch: 9   Global Step: 389570   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:45,552-Speed 2608.53 samples/sec   Loss 7.2170   LearningRate 0.0281   Epoch: 9   Global Step: 389580   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:49,520-Speed 2581.60 samples/sec   Loss 7.2014   LearningRate 0.0281   Epoch: 9   Global Step: 389590   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:53,415-Speed 2629.69 samples/sec   Loss 7.1498   LearningRate 0.0281   Epoch: 9   Global Step: 389600   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:28:57,309-Speed 2630.53 samples/sec   Loss 7.1357   LearningRate 0.0281   Epoch: 9   Global Step: 389610   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:01,201-Speed 2630.92 samples/sec   Loss 7.2225   LearningRate 0.0281   Epoch: 9   Global Step: 389620   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:05,102-Speed 2625.38 samples/sec   Loss 7.1199   LearningRate 0.0281   Epoch: 9   Global Step: 389630   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:29:08,997-Speed 2629.90 samples/sec   Loss 7.2104   LearningRate 0.0281   Epoch: 9   Global Step: 389640   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:29:12,890-Speed 2630.58 samples/sec   Loss 7.0654   LearningRate 0.0281   Epoch: 9   Global Step: 389650   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:29:16,770-Speed 2639.92 samples/sec   Loss 7.4945   LearningRate 0.0281   Epoch: 9   Global Step: 389660   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:20,666-Speed 2629.47 samples/sec   Loss 7.2012   LearningRate 0.0281   Epoch: 9   Global Step: 389670   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:24,570-Speed 2623.52 samples/sec   Loss 7.0491   LearningRate 0.0281   Epoch: 9   Global Step: 389680   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:28,462-Speed 2631.65 samples/sec   Loss 7.1873   LearningRate 0.0281   Epoch: 9   Global Step: 389690   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:32,366-Speed 2623.31 samples/sec   Loss 7.2201   LearningRate 0.0281   Epoch: 9   Global Step: 389700   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:36,263-Speed 2628.28 samples/sec   Loss 7.1773   LearningRate 0.0281   Epoch: 9   Global Step: 389710   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:40,187-Speed 2609.95 samples/sec   Loss 7.1129   LearningRate 0.0281   Epoch: 9   Global Step: 389720   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:44,102-Speed 2616.22 samples/sec   Loss 7.1899   LearningRate 0.0281   Epoch: 9   Global Step: 389730   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:47,993-Speed 2632.58 samples/sec   Loss 7.1409   LearningRate 0.0281   Epoch: 9   Global Step: 389740   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:51,888-Speed 2629.58 samples/sec   Loss 7.0966   LearningRate 0.0281   Epoch: 9   Global Step: 389750   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:29:55,782-Speed 2630.23 samples/sec   Loss 7.1713   LearningRate 0.0281   Epoch: 9   Global Step: 389760   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:29:59,716-Speed 2604.15 samples/sec   Loss 7.1907   LearningRate 0.0281   Epoch: 9   Global Step: 389770   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:03,612-Speed 2628.58 samples/sec   Loss 7.1129   LearningRate 0.0281   Epoch: 9   Global Step: 389780   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:07,511-Speed 2626.77 samples/sec   Loss 7.2291   LearningRate 0.0281   Epoch: 9   Global Step: 389790   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:11,403-Speed 2631.30 samples/sec   Loss 7.1085   LearningRate 0.0281   Epoch: 9   Global Step: 389800   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:15,297-Speed 2630.37 samples/sec   Loss 7.1554   LearningRate 0.0281   Epoch: 9   Global Step: 389810   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:19,203-Speed 2622.37 samples/sec   Loss 7.1126   LearningRate 0.0281   Epoch: 9   Global Step: 389820   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:23,113-Speed 2619.35 samples/sec   Loss 7.1469   LearningRate 0.0281   Epoch: 9   Global Step: 389830   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:27,013-Speed 2625.97 samples/sec   Loss 7.0892   LearningRate 0.0281   Epoch: 9   Global Step: 389840   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:30,924-Speed 2618.91 samples/sec   Loss 7.0642   LearningRate 0.0281   Epoch: 9   Global Step: 389850   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:34,814-Speed 2633.34 samples/sec   Loss 7.1922   LearningRate 0.0281   Epoch: 9   Global Step: 389860   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:30:38,715-Speed 2625.47 samples/sec   Loss 7.1500   LearningRate 0.0281   Epoch: 9   Global Step: 389870   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:30:42,590-Speed 2643.27 samples/sec   Loss 7.0307   LearningRate 0.0281   Epoch: 9   Global Step: 389880   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:46,492-Speed 2624.86 samples/sec   Loss 7.1406   LearningRate 0.0281   Epoch: 9   Global Step: 389890   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:50,390-Speed 2627.71 samples/sec   Loss 7.1630   LearningRate 0.0281   Epoch: 9   Global Step: 389900   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:54,296-Speed 2621.78 samples/sec   Loss 7.1287   LearningRate 0.0281   Epoch: 9   Global Step: 389910   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:30:58,200-Speed 2623.43 samples/sec   Loss 7.0661   LearningRate 0.0281   Epoch: 9   Global Step: 389920   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:31:02,133-Speed 2604.16 samples/sec   Loss 7.1192   LearningRate 0.0281   Epoch: 9   Global Step: 389930   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:31:06,041-Speed 2621.13 samples/sec   Loss 7.1272   LearningRate 0.0281   Epoch: 9   Global Step: 389940   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:31:09,942-Speed 2625.37 samples/sec   Loss 7.0531   LearningRate 0.0281   Epoch: 9   Global Step: 389950   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:31:13,851-Speed 2620.81 samples/sec   Loss 7.0682   LearningRate 0.0281   Epoch: 9   Global Step: 389960   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:31:17,750-Speed 2626.68 samples/sec   Loss 7.0502   LearningRate 0.0281   Epoch: 9   Global Step: 389970   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:31:21,650-Speed 2625.89 samples/sec   Loss 6.9568   LearningRate 0.0281   Epoch: 9   Global Step: 389980   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:31:25,545-Speed 2629.67 samples/sec   Loss 7.3275   LearningRate 0.0281   Epoch: 9   Global Step: 389990   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:31:29,445-Speed 2626.30 samples/sec   Loss 7.1363   LearningRate 0.0281   Epoch: 9   Global Step: 390000   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:32:12,611-[lfw][390000]XNorm: 23.729612
Training: 2022-04-14 15:32:12,612-[lfw][390000]Accuracy-Flip: 0.99733+-0.00281
Training: 2022-04-14 15:32:12,612-[lfw][390000]Accuracy-Highest: 0.99783
Training: 2022-04-14 15:33:02,708-[cfp_fp][390000]XNorm: 21.771758
Training: 2022-04-14 15:33:02,709-[cfp_fp][390000]Accuracy-Flip: 0.98543+-0.00530
Training: 2022-04-14 15:33:02,710-[cfp_fp][390000]Accuracy-Highest: 0.98757
Training: 2022-04-14 15:33:45,726-[agedb_30][390000]XNorm: 23.669865
Training: 2022-04-14 15:33:45,727-[agedb_30][390000]Accuracy-Flip: 0.97667+-0.00734
Training: 2022-04-14 15:33:45,728-[agedb_30][390000]Accuracy-Highest: 0.97700
Training: 2022-04-14 15:33:49,587-Speed 73.07 samples/sec   Loss 7.0529   LearningRate 0.0281   Epoch: 9   Global Step: 390010   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:33:53,438-Speed 2659.49 samples/sec   Loss 7.1733   LearningRate 0.0281   Epoch: 9   Global Step: 390020   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:33:57,308-Speed 2646.55 samples/sec   Loss 7.1547   LearningRate 0.0281   Epoch: 9   Global Step: 390030   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:01,224-Speed 2615.38 samples/sec   Loss 7.0370   LearningRate 0.0281   Epoch: 9   Global Step: 390040   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:05,107-Speed 2638.38 samples/sec   Loss 7.0219   LearningRate 0.0281   Epoch: 9   Global Step: 390050   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:08,982-Speed 2642.90 samples/sec   Loss 7.1121   LearningRate 0.0281   Epoch: 9   Global Step: 390060   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:12,878-Speed 2629.67 samples/sec   Loss 7.0967   LearningRate 0.0281   Epoch: 9   Global Step: 390070   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:16,762-Speed 2637.39 samples/sec   Loss 6.9348   LearningRate 0.0281   Epoch: 9   Global Step: 390080   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:20,640-Speed 2641.61 samples/sec   Loss 7.2393   LearningRate 0.0281   Epoch: 9   Global Step: 390090   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:24,528-Speed 2634.20 samples/sec   Loss 7.1057   LearningRate 0.0281   Epoch: 9   Global Step: 390100   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:28,416-Speed 2634.65 samples/sec   Loss 7.0638   LearningRate 0.0281   Epoch: 9   Global Step: 390110   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:34:32,313-Speed 2628.09 samples/sec   Loss 7.1203   LearningRate 0.0281   Epoch: 9   Global Step: 390120   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:34:36,210-Speed 2628.39 samples/sec   Loss 7.0933   LearningRate 0.0281   Epoch: 9   Global Step: 390130   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:34:40,144-Speed 2603.47 samples/sec   Loss 7.1156   LearningRate 0.0281   Epoch: 9   Global Step: 390140   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:34:44,223-Speed 2511.35 samples/sec   Loss 7.1754   LearningRate 0.0281   Epoch: 9   Global Step: 390150   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:34:48,123-Speed 2626.55 samples/sec   Loss 7.1518   LearningRate 0.0281   Epoch: 9   Global Step: 390160   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:34:52,019-Speed 2628.95 samples/sec   Loss 7.0702   LearningRate 0.0281   Epoch: 9   Global Step: 390170   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:34:55,924-Speed 2622.71 samples/sec   Loss 7.1342   LearningRate 0.0281   Epoch: 9   Global Step: 390180   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:34:59,829-Speed 2623.47 samples/sec   Loss 7.1767   LearningRate 0.0281   Epoch: 9   Global Step: 390190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:35:03,943-Speed 2489.13 samples/sec   Loss 7.0990   LearningRate 0.0281   Epoch: 9   Global Step: 390200   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:35:07,840-Speed 2628.57 samples/sec   Loss 7.0687   LearningRate 0.0281   Epoch: 9   Global Step: 390210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:35:11,733-Speed 2630.54 samples/sec   Loss 6.9807   LearningRate 0.0280   Epoch: 9   Global Step: 390220   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:35:15,624-Speed 2632.51 samples/sec   Loss 7.1041   LearningRate 0.0280   Epoch: 9   Global Step: 390230   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:35:19,519-Speed 2629.52 samples/sec   Loss 7.1918   LearningRate 0.0280   Epoch: 9   Global Step: 390240   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:35:23,411-Speed 2631.91 samples/sec   Loss 7.0519   LearningRate 0.0280   Epoch: 9   Global Step: 390250   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:35:27,311-Speed 2625.70 samples/sec   Loss 7.0263   LearningRate 0.0280   Epoch: 9   Global Step: 390260   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:35:31,193-Speed 2639.25 samples/sec   Loss 7.0899   LearningRate 0.0280   Epoch: 9   Global Step: 390270   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:35:35,068-Speed 2643.16 samples/sec   Loss 7.1417   LearningRate 0.0280   Epoch: 9   Global Step: 390280   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:35:38,966-Speed 2627.22 samples/sec   Loss 7.2369   LearningRate 0.0280   Epoch: 9   Global Step: 390290   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:35:42,863-Speed 2628.89 samples/sec   Loss 7.1398   LearningRate 0.0280   Epoch: 9   Global Step: 390300   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:35:46,772-Speed 2620.43 samples/sec   Loss 7.1879   LearningRate 0.0280   Epoch: 9   Global Step: 390310   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:35:50,680-Speed 2620.58 samples/sec   Loss 6.9091   LearningRate 0.0280   Epoch: 9   Global Step: 390320   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:35:54,579-Speed 2627.18 samples/sec   Loss 7.0793   LearningRate 0.0280   Epoch: 9   Global Step: 390330   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:35:58,473-Speed 2631.08 samples/sec   Loss 7.0915   LearningRate 0.0280   Epoch: 9   Global Step: 390340   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:36:02,403-Speed 2606.10 samples/sec   Loss 7.0574   LearningRate 0.0280   Epoch: 9   Global Step: 390350   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:36:06,299-Speed 2629.36 samples/sec   Loss 7.2182   LearningRate 0.0280   Epoch: 9   Global Step: 390360   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:36:10,190-Speed 2631.70 samples/sec   Loss 7.1605   LearningRate 0.0280   Epoch: 9   Global Step: 390370   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:36:14,088-Speed 2628.08 samples/sec   Loss 7.1799   LearningRate 0.0280   Epoch: 9   Global Step: 390380   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:17,985-Speed 2628.39 samples/sec   Loss 7.2267   LearningRate 0.0280   Epoch: 9   Global Step: 390390   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:21,904-Speed 2613.84 samples/sec   Loss 7.1954   LearningRate 0.0280   Epoch: 9   Global Step: 390400   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:25,801-Speed 2628.94 samples/sec   Loss 7.0885   LearningRate 0.0280   Epoch: 9   Global Step: 390410   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:29,735-Speed 2603.33 samples/sec   Loss 7.1301   LearningRate 0.0280   Epoch: 9   Global Step: 390420   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:33,630-Speed 2630.31 samples/sec   Loss 7.1024   LearningRate 0.0280   Epoch: 9   Global Step: 390430   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:37,522-Speed 2631.47 samples/sec   Loss 7.0578   LearningRate 0.0280   Epoch: 9   Global Step: 390440   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:41,416-Speed 2630.08 samples/sec   Loss 7.1119   LearningRate 0.0280   Epoch: 9   Global Step: 390450   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:45,320-Speed 2623.44 samples/sec   Loss 7.1081   LearningRate 0.0280   Epoch: 9   Global Step: 390460   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:49,219-Speed 2627.58 samples/sec   Loss 7.1022   LearningRate 0.0280   Epoch: 9   Global Step: 390470   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:36:53,112-Speed 2630.66 samples/sec   Loss 7.1260   LearningRate 0.0280   Epoch: 9   Global Step: 390480   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:36:57,009-Speed 2628.18 samples/sec   Loss 7.0118   LearningRate 0.0280   Epoch: 9   Global Step: 390490   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:37:00,907-Speed 2627.75 samples/sec   Loss 7.0297   LearningRate 0.0280   Epoch: 9   Global Step: 390500   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:37:04,801-Speed 2630.54 samples/sec   Loss 7.1765   LearningRate 0.0280   Epoch: 9   Global Step: 390510   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:37:08,694-Speed 2631.18 samples/sec   Loss 7.0565   LearningRate 0.0280   Epoch: 9   Global Step: 390520   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:37:12,592-Speed 2627.64 samples/sec   Loss 7.1797   LearningRate 0.0280   Epoch: 9   Global Step: 390530   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:37:16,465-Speed 2644.35 samples/sec   Loss 7.1332   LearningRate 0.0280   Epoch: 9   Global Step: 390540   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:37:20,351-Speed 2636.00 samples/sec   Loss 7.0452   LearningRate 0.0280   Epoch: 9   Global Step: 390550   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:24,247-Speed 2629.61 samples/sec   Loss 7.1839   LearningRate 0.0280   Epoch: 9   Global Step: 390560   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:28,150-Speed 2624.28 samples/sec   Loss 7.1193   LearningRate 0.0280   Epoch: 9   Global Step: 390570   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:32,059-Speed 2620.25 samples/sec   Loss 7.0871   LearningRate 0.0280   Epoch: 9   Global Step: 390580   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:35,961-Speed 2624.79 samples/sec   Loss 7.0925   LearningRate 0.0280   Epoch: 9   Global Step: 390590   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:39,864-Speed 2624.32 samples/sec   Loss 7.0549   LearningRate 0.0280   Epoch: 9   Global Step: 390600   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:43,748-Speed 2637.23 samples/sec   Loss 7.1269   LearningRate 0.0280   Epoch: 9   Global Step: 390610   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:47,649-Speed 2625.27 samples/sec   Loss 7.1437   LearningRate 0.0280   Epoch: 9   Global Step: 390620   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:51,550-Speed 2625.27 samples/sec   Loss 7.1751   LearningRate 0.0280   Epoch: 9   Global Step: 390630   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:55,446-Speed 2629.91 samples/sec   Loss 7.1517   LearningRate 0.0280   Epoch: 9   Global Step: 390640   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:37:59,337-Speed 2631.79 samples/sec   Loss 7.1049   LearningRate 0.0280   Epoch: 9   Global Step: 390650   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:38:03,235-Speed 2628.24 samples/sec   Loss 7.0476   LearningRate 0.0280   Epoch: 9   Global Step: 390660   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:38:07,156-Speed 2611.72 samples/sec   Loss 7.1090   LearningRate 0.0280   Epoch: 9   Global Step: 390670   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:38:11,070-Speed 2617.63 samples/sec   Loss 6.9925   LearningRate 0.0280   Epoch: 9   Global Step: 390680   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:38:15,008-Speed 2600.99 samples/sec   Loss 7.1078   LearningRate 0.0280   Epoch: 9   Global Step: 390690   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:38:18,909-Speed 2625.21 samples/sec   Loss 7.1651   LearningRate 0.0280   Epoch: 9   Global Step: 390700   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:22,809-Speed 2626.24 samples/sec   Loss 7.4851   LearningRate 0.0280   Epoch: 9   Global Step: 390710   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:26,708-Speed 2627.53 samples/sec   Loss 7.1065   LearningRate 0.0280   Epoch: 9   Global Step: 390720   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:30,603-Speed 2629.60 samples/sec   Loss 7.0447   LearningRate 0.0280   Epoch: 9   Global Step: 390730   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:34,508-Speed 2622.50 samples/sec   Loss 7.0132   LearningRate 0.0280   Epoch: 9   Global Step: 390740   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:38,406-Speed 2628.29 samples/sec   Loss 7.1457   LearningRate 0.0280   Epoch: 9   Global Step: 390750   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:42,300-Speed 2630.48 samples/sec   Loss 7.1103   LearningRate 0.0280   Epoch: 9   Global Step: 390760   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:46,198-Speed 2627.62 samples/sec   Loss 7.1308   LearningRate 0.0280   Epoch: 9   Global Step: 390770   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:50,092-Speed 2629.68 samples/sec   Loss 7.2723   LearningRate 0.0280   Epoch: 9   Global Step: 390780   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:53,992-Speed 2626.86 samples/sec   Loss 7.2314   LearningRate 0.0280   Epoch: 9   Global Step: 390790   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:38:57,888-Speed 2629.35 samples/sec   Loss 7.1935   LearningRate 0.0280   Epoch: 9   Global Step: 390800   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:01,786-Speed 2627.67 samples/sec   Loss 7.0795   LearningRate 0.0280   Epoch: 9   Global Step: 390810   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:05,695-Speed 2619.96 samples/sec   Loss 7.0766   LearningRate 0.0280   Epoch: 9   Global Step: 390820   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:09,598-Speed 2624.83 samples/sec   Loss 7.2189   LearningRate 0.0280   Epoch: 9   Global Step: 390830   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:13,486-Speed 2633.81 samples/sec   Loss 7.0806   LearningRate 0.0280   Epoch: 9   Global Step: 390840   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:17,408-Speed 2611.24 samples/sec   Loss 7.1240   LearningRate 0.0280   Epoch: 9   Global Step: 390850   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:21,303-Speed 2630.08 samples/sec   Loss 7.0578   LearningRate 0.0280   Epoch: 9   Global Step: 390860   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:25,199-Speed 2629.08 samples/sec   Loss 6.9817   LearningRate 0.0280   Epoch: 9   Global Step: 390870   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:29,101-Speed 2625.17 samples/sec   Loss 7.1844   LearningRate 0.0280   Epoch: 9   Global Step: 390880   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:32,995-Speed 2630.03 samples/sec   Loss 7.0499   LearningRate 0.0280   Epoch: 9   Global Step: 390890   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:39:36,911-Speed 2615.73 samples/sec   Loss 7.1209   LearningRate 0.0280   Epoch: 9   Global Step: 390900   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:39:40,817-Speed 2622.75 samples/sec   Loss 7.0549   LearningRate 0.0280   Epoch: 9   Global Step: 390910   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:39:44,722-Speed 2622.51 samples/sec   Loss 7.0729   LearningRate 0.0280   Epoch: 9   Global Step: 390920   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:39:48,625-Speed 2624.44 samples/sec   Loss 7.1129   LearningRate 0.0280   Epoch: 9   Global Step: 390930   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:39:52,525-Speed 2625.84 samples/sec   Loss 7.1000   LearningRate 0.0280   Epoch: 9   Global Step: 390940   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:39:56,424-Speed 2627.95 samples/sec   Loss 7.0372   LearningRate 0.0280   Epoch: 9   Global Step: 390950   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:40:00,328-Speed 2623.17 samples/sec   Loss 7.2154   LearningRate 0.0280   Epoch: 9   Global Step: 390960   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:40:04,232-Speed 2624.00 samples/sec   Loss 7.0448   LearningRate 0.0280   Epoch: 9   Global Step: 390970   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:40:08,151-Speed 2613.07 samples/sec   Loss 7.0351   LearningRate 0.0280   Epoch: 9   Global Step: 390980   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:40:12,051-Speed 2626.63 samples/sec   Loss 7.1829   LearningRate 0.0280   Epoch: 9   Global Step: 390990   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:40:15,950-Speed 2626.89 samples/sec   Loss 7.2556   LearningRate 0.0279   Epoch: 9   Global Step: 391000   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:19,876-Speed 2608.66 samples/sec   Loss 7.1338   LearningRate 0.0279   Epoch: 9   Global Step: 391010   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:23,775-Speed 2627.74 samples/sec   Loss 7.0163   LearningRate 0.0279   Epoch: 9   Global Step: 391020   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:27,667-Speed 2631.27 samples/sec   Loss 7.1863   LearningRate 0.0279   Epoch: 9   Global Step: 391030   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:31,597-Speed 2607.39 samples/sec   Loss 7.1900   LearningRate 0.0279   Epoch: 9   Global Step: 391040   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:35,507-Speed 2619.78 samples/sec   Loss 7.0485   LearningRate 0.0279   Epoch: 9   Global Step: 391050   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:39,402-Speed 2629.26 samples/sec   Loss 7.0197   LearningRate 0.0279   Epoch: 9   Global Step: 391060   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:43,315-Speed 2617.61 samples/sec   Loss 7.1424   LearningRate 0.0279   Epoch: 9   Global Step: 391070   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:47,232-Speed 2615.03 samples/sec   Loss 7.0994   LearningRate 0.0279   Epoch: 9   Global Step: 391080   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:51,139-Speed 2621.75 samples/sec   Loss 6.9736   LearningRate 0.0279   Epoch: 9   Global Step: 391090   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:40:55,046-Speed 2621.75 samples/sec   Loss 7.1658   LearningRate 0.0279   Epoch: 9   Global Step: 391100   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:40:58,958-Speed 2618.49 samples/sec   Loss 7.0603   LearningRate 0.0279   Epoch: 9   Global Step: 391110   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:41:02,857-Speed 2626.83 samples/sec   Loss 7.1388   LearningRate 0.0279   Epoch: 9   Global Step: 391120   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:41:06,755-Speed 2627.85 samples/sec   Loss 7.1504   LearningRate 0.0279   Epoch: 9   Global Step: 391130   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:41:10,655-Speed 2626.65 samples/sec   Loss 7.1212   LearningRate 0.0279   Epoch: 9   Global Step: 391140   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:41:14,552-Speed 2627.86 samples/sec   Loss 7.1339   LearningRate 0.0279   Epoch: 9   Global Step: 391150   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:41:18,445-Speed 2631.36 samples/sec   Loss 7.0034   LearningRate 0.0279   Epoch: 9   Global Step: 391160   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:22,342-Speed 2628.01 samples/sec   Loss 7.2327   LearningRate 0.0279   Epoch: 9   Global Step: 391170   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:26,237-Speed 2630.09 samples/sec   Loss 7.2288   LearningRate 0.0279   Epoch: 9   Global Step: 391180   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:30,131-Speed 2630.52 samples/sec   Loss 7.1389   LearningRate 0.0279   Epoch: 9   Global Step: 391190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:34,049-Speed 2614.57 samples/sec   Loss 7.1894   LearningRate 0.0279   Epoch: 9   Global Step: 391200   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:37,958-Speed 2620.18 samples/sec   Loss 7.0148   LearningRate 0.0279   Epoch: 9   Global Step: 391210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:41,860-Speed 2624.91 samples/sec   Loss 7.0920   LearningRate 0.0279   Epoch: 9   Global Step: 391220   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:45,761-Speed 2625.26 samples/sec   Loss 7.1586   LearningRate 0.0279   Epoch: 9   Global Step: 391230   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:49,664-Speed 2624.39 samples/sec   Loss 7.0770   LearningRate 0.0279   Epoch: 9   Global Step: 391240   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:53,566-Speed 2624.87 samples/sec   Loss 7.0828   LearningRate 0.0279   Epoch: 9   Global Step: 391250   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:41:57,474-Speed 2620.89 samples/sec   Loss 7.0234   LearningRate 0.0279   Epoch: 9   Global Step: 391260   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:42:01,379-Speed 2622.83 samples/sec   Loss 7.2129   LearningRate 0.0279   Epoch: 9   Global Step: 391270   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:42:05,278-Speed 2626.68 samples/sec   Loss 7.0998   LearningRate 0.0279   Epoch: 9   Global Step: 391280   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:42:09,146-Speed 2648.44 samples/sec   Loss 7.1782   LearningRate 0.0279   Epoch: 9   Global Step: 391290   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:13,045-Speed 2627.09 samples/sec   Loss 7.1288   LearningRate 0.0279   Epoch: 9   Global Step: 391300   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:16,942-Speed 2627.97 samples/sec   Loss 7.1086   LearningRate 0.0279   Epoch: 9   Global Step: 391310   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:20,842-Speed 2626.30 samples/sec   Loss 7.0754   LearningRate 0.0279   Epoch: 9   Global Step: 391320   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:24,779-Speed 2601.85 samples/sec   Loss 7.0197   LearningRate 0.0279   Epoch: 9   Global Step: 391330   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:28,678-Speed 2627.24 samples/sec   Loss 7.0480   LearningRate 0.0279   Epoch: 9   Global Step: 391340   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:32,575-Speed 2628.13 samples/sec   Loss 7.1418   LearningRate 0.0279   Epoch: 9   Global Step: 391350   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:36,476-Speed 2625.84 samples/sec   Loss 7.0344   LearningRate 0.0279   Epoch: 9   Global Step: 391360   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:40,377-Speed 2625.45 samples/sec   Loss 7.0170   LearningRate 0.0279   Epoch: 9   Global Step: 391370   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:44,283-Speed 2622.39 samples/sec   Loss 7.0076   LearningRate 0.0279   Epoch: 9   Global Step: 391380   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:42:48,379-Speed 2500.95 samples/sec   Loss 7.0544   LearningRate 0.0279   Epoch: 9   Global Step: 391390   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:42:52,304-Speed 2609.41 samples/sec   Loss 7.1074   LearningRate 0.0279   Epoch: 9   Global Step: 391400   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:42:56,219-Speed 2615.52 samples/sec   Loss 7.0310   LearningRate 0.0279   Epoch: 9   Global Step: 391410   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:00,114-Speed 2630.47 samples/sec   Loss 6.9856   LearningRate 0.0279   Epoch: 9   Global Step: 391420   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:04,062-Speed 2594.41 samples/sec   Loss 7.1295   LearningRate 0.0279   Epoch: 9   Global Step: 391430   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:07,971-Speed 2620.46 samples/sec   Loss 7.0880   LearningRate 0.0279   Epoch: 9   Global Step: 391440   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:11,863-Speed 2631.93 samples/sec   Loss 7.0649   LearningRate 0.0279   Epoch: 9   Global Step: 391450   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:15,798-Speed 2603.22 samples/sec   Loss 7.0368   LearningRate 0.0279   Epoch: 9   Global Step: 391460   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:19,700-Speed 2624.61 samples/sec   Loss 7.1468   LearningRate 0.0279   Epoch: 9   Global Step: 391470   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:23,623-Speed 2610.97 samples/sec   Loss 7.2503   LearningRate 0.0279   Epoch: 9   Global Step: 391480   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:43:27,541-Speed 2614.41 samples/sec   Loss 7.1859   LearningRate 0.0279   Epoch: 9   Global Step: 391490   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:31,437-Speed 2628.95 samples/sec   Loss 7.0397   LearningRate 0.0279   Epoch: 9   Global Step: 391500   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:35,337-Speed 2626.11 samples/sec   Loss 7.1276   LearningRate 0.0279   Epoch: 9   Global Step: 391510   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:39,241-Speed 2624.10 samples/sec   Loss 7.0545   LearningRate 0.0279   Epoch: 9   Global Step: 391520   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:43,135-Speed 2629.90 samples/sec   Loss 7.1107   LearningRate 0.0279   Epoch: 9   Global Step: 391530   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:47,048-Speed 2617.79 samples/sec   Loss 6.9942   LearningRate 0.0279   Epoch: 9   Global Step: 391540   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:50,946-Speed 2628.06 samples/sec   Loss 7.1939   LearningRate 0.0279   Epoch: 9   Global Step: 391550   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:54,861-Speed 2615.76 samples/sec   Loss 7.1078   LearningRate 0.0279   Epoch: 9   Global Step: 391560   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:43:58,800-Speed 2600.40 samples/sec   Loss 6.9621   LearningRate 0.0279   Epoch: 9   Global Step: 391570   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:44:02,684-Speed 2636.53 samples/sec   Loss 7.2372   LearningRate 0.0279   Epoch: 9   Global Step: 391580   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:06,595-Speed 2619.01 samples/sec   Loss 7.1048   LearningRate 0.0279   Epoch: 9   Global Step: 391590   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:10,515-Speed 2612.82 samples/sec   Loss 7.1740   LearningRate 0.0279   Epoch: 9   Global Step: 391600   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:14,474-Speed 2587.82 samples/sec   Loss 7.0302   LearningRate 0.0279   Epoch: 9   Global Step: 391610   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:18,538-Speed 2519.97 samples/sec   Loss 7.1801   LearningRate 0.0279   Epoch: 9   Global Step: 391620   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:22,441-Speed 2623.96 samples/sec   Loss 6.9425   LearningRate 0.0279   Epoch: 9   Global Step: 391630   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:26,349-Speed 2621.23 samples/sec   Loss 7.0463   LearningRate 0.0279   Epoch: 9   Global Step: 391640   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:30,268-Speed 2613.44 samples/sec   Loss 7.0494   LearningRate 0.0279   Epoch: 9   Global Step: 391650   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:34,177-Speed 2620.08 samples/sec   Loss 7.0699   LearningRate 0.0279   Epoch: 9   Global Step: 391660   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:38,082-Speed 2622.71 samples/sec   Loss 7.1169   LearningRate 0.0279   Epoch: 9   Global Step: 391670   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:44:42,009-Speed 2608.73 samples/sec   Loss 7.2851   LearningRate 0.0279   Epoch: 9   Global Step: 391680   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:44:45,921-Speed 2618.50 samples/sec   Loss 7.1153   LearningRate 0.0279   Epoch: 9   Global Step: 391690   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:44:49,823-Speed 2625.12 samples/sec   Loss 6.9952   LearningRate 0.0279   Epoch: 9   Global Step: 391700   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:44:53,735-Speed 2618.71 samples/sec   Loss 7.1441   LearningRate 0.0279   Epoch: 9   Global Step: 391710   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:44:57,647-Speed 2618.06 samples/sec   Loss 7.0588   LearningRate 0.0279   Epoch: 9   Global Step: 391720   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:01,545-Speed 2627.69 samples/sec   Loss 6.9674   LearningRate 0.0279   Epoch: 9   Global Step: 391730   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:05,457-Speed 2617.92 samples/sec   Loss 7.0515   LearningRate 0.0279   Epoch: 9   Global Step: 391740   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:09,565-Speed 2493.79 samples/sec   Loss 7.0388   LearningRate 0.0279   Epoch: 9   Global Step: 391750   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:13,594-Speed 2542.16 samples/sec   Loss 6.9623   LearningRate 0.0279   Epoch: 9   Global Step: 391760   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:17,542-Speed 2594.30 samples/sec   Loss 7.0292   LearningRate 0.0279   Epoch: 9   Global Step: 391770   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:21,474-Speed 2605.27 samples/sec   Loss 7.0131   LearningRate 0.0279   Epoch: 9   Global Step: 391780   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:45:25,384-Speed 2619.44 samples/sec   Loss 7.1069   LearningRate 0.0278   Epoch: 9   Global Step: 391790   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:45:29,645-Speed 2403.87 samples/sec   Loss 7.0928   LearningRate 0.0278   Epoch: 9   Global Step: 391800   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:33,777-Speed 2479.50 samples/sec   Loss 7.0758   LearningRate 0.0278   Epoch: 9   Global Step: 391810   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:37,726-Speed 2593.89 samples/sec   Loss 7.0779   LearningRate 0.0278   Epoch: 9   Global Step: 391820   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:41,641-Speed 2615.90 samples/sec   Loss 7.0681   LearningRate 0.0278   Epoch: 9   Global Step: 391830   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:45,563-Speed 2611.93 samples/sec   Loss 7.0406   LearningRate 0.0278   Epoch: 9   Global Step: 391840   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:49,465-Speed 2625.04 samples/sec   Loss 7.1304   LearningRate 0.0278   Epoch: 9   Global Step: 391850   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:53,366-Speed 2625.83 samples/sec   Loss 7.1041   LearningRate 0.0278   Epoch: 9   Global Step: 391860   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:45:57,237-Speed 2645.83 samples/sec   Loss 7.0985   LearningRate 0.0278   Epoch: 9   Global Step: 391870   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:01,148-Speed 2618.81 samples/sec   Loss 7.0838   LearningRate 0.0278   Epoch: 9   Global Step: 391880   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:05,050-Speed 2625.17 samples/sec   Loss 7.2140   LearningRate 0.0278   Epoch: 9   Global Step: 391890   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:08,953-Speed 2623.81 samples/sec   Loss 7.0338   LearningRate 0.0278   Epoch: 9   Global Step: 391900   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:12,846-Speed 2630.97 samples/sec   Loss 7.1866   LearningRate 0.0278   Epoch: 9   Global Step: 391910   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:16,746-Speed 2626.60 samples/sec   Loss 7.0368   LearningRate 0.0278   Epoch: 9   Global Step: 391920   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:20,663-Speed 2614.66 samples/sec   Loss 6.9828   LearningRate 0.0278   Epoch: 9   Global Step: 391930   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:24,560-Speed 2628.92 samples/sec   Loss 7.0358   LearningRate 0.0278   Epoch: 9   Global Step: 391940   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:28,453-Speed 2630.76 samples/sec   Loss 7.0444   LearningRate 0.0278   Epoch: 9   Global Step: 391950   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:32,357-Speed 2624.30 samples/sec   Loss 7.1320   LearningRate 0.0278   Epoch: 9   Global Step: 391960   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:46:36,258-Speed 2625.85 samples/sec   Loss 7.0691   LearningRate 0.0278   Epoch: 9   Global Step: 391970   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:46:40,125-Speed 2648.42 samples/sec   Loss 7.1076   LearningRate 0.0278   Epoch: 9   Global Step: 391980   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:46:44,042-Speed 2614.65 samples/sec   Loss 7.1191   LearningRate 0.0278   Epoch: 9   Global Step: 391990   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:46:47,959-Speed 2615.09 samples/sec   Loss 7.1141   LearningRate 0.0278   Epoch: 9   Global Step: 392000   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:46:51,880-Speed 2612.70 samples/sec   Loss 6.9602   LearningRate 0.0278   Epoch: 9   Global Step: 392010   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:46:55,773-Speed 2630.22 samples/sec   Loss 6.9487   LearningRate 0.0278   Epoch: 9   Global Step: 392020   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:46:59,704-Speed 2606.45 samples/sec   Loss 7.0203   LearningRate 0.0278   Epoch: 9   Global Step: 392030   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:47:03,598-Speed 2630.06 samples/sec   Loss 6.9970   LearningRate 0.0278   Epoch: 9   Global Step: 392040   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:47:07,519-Speed 2612.84 samples/sec   Loss 6.9270   LearningRate 0.0278   Epoch: 9   Global Step: 392050   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:47:11,419-Speed 2626.48 samples/sec   Loss 7.1524   LearningRate 0.0278   Epoch: 9   Global Step: 392060   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:47:15,354-Speed 2603.31 samples/sec   Loss 7.0291   LearningRate 0.0278   Epoch: 9   Global Step: 392070   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-14 15:47:19,259-Speed 2622.64 samples/sec   Loss 7.1185   LearningRate 0.0278   Epoch: 9   Global Step: 392080   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:23,173-Speed 2617.27 samples/sec   Loss 7.0808   LearningRate 0.0278   Epoch: 9   Global Step: 392090   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:27,082-Speed 2620.50 samples/sec   Loss 7.0044   LearningRate 0.0278   Epoch: 9   Global Step: 392100   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:30,984-Speed 2624.76 samples/sec   Loss 7.1024   LearningRate 0.0278   Epoch: 9   Global Step: 392110   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:34,899-Speed 2616.12 samples/sec   Loss 7.1096   LearningRate 0.0278   Epoch: 9   Global Step: 392120   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:38,797-Speed 2627.95 samples/sec   Loss 7.0758   LearningRate 0.0278   Epoch: 9   Global Step: 392130   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:42,693-Speed 2628.89 samples/sec   Loss 7.1696   LearningRate 0.0278   Epoch: 9   Global Step: 392140   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:46,599-Speed 2622.92 samples/sec   Loss 7.0191   LearningRate 0.0278   Epoch: 9   Global Step: 392150   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:50,497-Speed 2627.50 samples/sec   Loss 6.9551   LearningRate 0.0278   Epoch: 9   Global Step: 392160   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:54,395-Speed 2628.24 samples/sec   Loss 6.9564   LearningRate 0.0278   Epoch: 9   Global Step: 392170   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:47:58,294-Speed 2626.65 samples/sec   Loss 7.1083   LearningRate 0.0278   Epoch: 9   Global Step: 392180   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:02,196-Speed 2626.09 samples/sec   Loss 6.9122   LearningRate 0.0278   Epoch: 9   Global Step: 392190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:06,097-Speed 2625.18 samples/sec   Loss 7.1188   LearningRate 0.0278   Epoch: 9   Global Step: 392200   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:10,025-Speed 2607.36 samples/sec   Loss 7.0499   LearningRate 0.0278   Epoch: 9   Global Step: 392210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:13,934-Speed 2620.13 samples/sec   Loss 7.0730   LearningRate 0.0278   Epoch: 9   Global Step: 392220   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:17,911-Speed 2575.67 samples/sec   Loss 7.1152   LearningRate 0.0278   Epoch: 9   Global Step: 392230   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:21,857-Speed 2595.99 samples/sec   Loss 7.0364   LearningRate 0.0278   Epoch: 9   Global Step: 392240   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:25,755-Speed 2627.92 samples/sec   Loss 7.2179   LearningRate 0.0278   Epoch: 9   Global Step: 392250   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:48:29,666-Speed 2618.23 samples/sec   Loss 7.2557   LearningRate 0.0278   Epoch: 9   Global Step: 392260   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:48:33,579-Speed 2617.86 samples/sec   Loss 7.2262   LearningRate 0.0278   Epoch: 9   Global Step: 392270   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:48:37,484-Speed 2622.73 samples/sec   Loss 7.1540   LearningRate 0.0278   Epoch: 9   Global Step: 392280   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:48:41,386-Speed 2625.66 samples/sec   Loss 7.0718   LearningRate 0.0278   Epoch: 9   Global Step: 392290   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:48:45,284-Speed 2626.93 samples/sec   Loss 7.1888   LearningRate 0.0278   Epoch: 9   Global Step: 392300   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:48:49,189-Speed 2623.56 samples/sec   Loss 7.0798   LearningRate 0.0278   Epoch: 9   Global Step: 392310   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:48:53,094-Speed 2622.53 samples/sec   Loss 6.9878   LearningRate 0.0278   Epoch: 9   Global Step: 392320   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:48:56,998-Speed 2623.64 samples/sec   Loss 7.1539   LearningRate 0.0278   Epoch: 9   Global Step: 392330   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:49:00,899-Speed 2625.81 samples/sec   Loss 7.0692   LearningRate 0.0278   Epoch: 9   Global Step: 392340   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:49:04,806-Speed 2621.54 samples/sec   Loss 7.1807   LearningRate 0.0278   Epoch: 9   Global Step: 392350   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:49:08,710-Speed 2623.14 samples/sec   Loss 7.0623   LearningRate 0.0278   Epoch: 9   Global Step: 392360   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:12,613-Speed 2624.67 samples/sec   Loss 6.9697   LearningRate 0.0278   Epoch: 9   Global Step: 392370   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:16,516-Speed 2624.22 samples/sec   Loss 7.2091   LearningRate 0.0278   Epoch: 9   Global Step: 392380   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:20,415-Speed 2626.92 samples/sec   Loss 7.0169   LearningRate 0.0278   Epoch: 9   Global Step: 392390   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:24,313-Speed 2627.91 samples/sec   Loss 7.0625   LearningRate 0.0278   Epoch: 9   Global Step: 392400   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:28,228-Speed 2616.39 samples/sec   Loss 7.1719   LearningRate 0.0278   Epoch: 9   Global Step: 392410   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:32,127-Speed 2626.76 samples/sec   Loss 7.0193   LearningRate 0.0278   Epoch: 9   Global Step: 392420   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:36,031-Speed 2623.78 samples/sec   Loss 7.0433   LearningRate 0.0278   Epoch: 9   Global Step: 392430   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:39,937-Speed 2622.75 samples/sec   Loss 7.0867   LearningRate 0.0278   Epoch: 9   Global Step: 392440   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:43,843-Speed 2622.01 samples/sec   Loss 7.2238   LearningRate 0.0278   Epoch: 9   Global Step: 392450   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:49:47,751-Speed 2621.45 samples/sec   Loss 7.0619   LearningRate 0.0278   Epoch: 9   Global Step: 392460   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:49:51,653-Speed 2624.97 samples/sec   Loss 7.1258   LearningRate 0.0278   Epoch: 9   Global Step: 392470   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:49:55,545-Speed 2631.72 samples/sec   Loss 6.9838   LearningRate 0.0278   Epoch: 9   Global Step: 392480   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:49:59,457-Speed 2618.41 samples/sec   Loss 7.0389   LearningRate 0.0278   Epoch: 9   Global Step: 392490   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:03,439-Speed 2571.81 samples/sec   Loss 7.0782   LearningRate 0.0278   Epoch: 9   Global Step: 392500   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:07,338-Speed 2626.79 samples/sec   Loss 6.9318   LearningRate 0.0278   Epoch: 9   Global Step: 392510   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:11,243-Speed 2623.48 samples/sec   Loss 7.1103   LearningRate 0.0278   Epoch: 9   Global Step: 392520   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:15,158-Speed 2615.48 samples/sec   Loss 7.0633   LearningRate 0.0278   Epoch: 9   Global Step: 392530   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:19,062-Speed 2624.55 samples/sec   Loss 7.0830   LearningRate 0.0278   Epoch: 9   Global Step: 392540   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:22,987-Speed 2609.54 samples/sec   Loss 7.1022   LearningRate 0.0278   Epoch: 9   Global Step: 392550   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:27,002-Speed 2551.07 samples/sec   Loss 7.1001   LearningRate 0.0278   Epoch: 9   Global Step: 392560   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:50:30,921-Speed 2614.03 samples/sec   Loss 7.0142   LearningRate 0.0278   Epoch: 9   Global Step: 392570   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:50:34,824-Speed 2624.04 samples/sec   Loss 7.0256   LearningRate 0.0277   Epoch: 9   Global Step: 392580   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:50:38,721-Speed 2628.12 samples/sec   Loss 7.1286   LearningRate 0.0277   Epoch: 9   Global Step: 392590   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:50:42,640-Speed 2613.52 samples/sec   Loss 7.0064   LearningRate 0.0277   Epoch: 9   Global Step: 392600   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:50:46,524-Speed 2637.71 samples/sec   Loss 7.0943   LearningRate 0.0277   Epoch: 9   Global Step: 392610   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:50,426-Speed 2624.87 samples/sec   Loss 7.1815   LearningRate 0.0277   Epoch: 9   Global Step: 392620   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:54,348-Speed 2611.75 samples/sec   Loss 7.0655   LearningRate 0.0277   Epoch: 9   Global Step: 392630   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:50:58,255-Speed 2621.44 samples/sec   Loss 7.0051   LearningRate 0.0277   Epoch: 9   Global Step: 392640   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:02,159-Speed 2623.74 samples/sec   Loss 7.0689   LearningRate 0.0277   Epoch: 9   Global Step: 392650   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:06,074-Speed 2615.88 samples/sec   Loss 7.0784   LearningRate 0.0277   Epoch: 9   Global Step: 392660   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:09,981-Speed 2622.18 samples/sec   Loss 7.0113   LearningRate 0.0277   Epoch: 9   Global Step: 392670   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:13,889-Speed 2620.61 samples/sec   Loss 7.1826   LearningRate 0.0277   Epoch: 9   Global Step: 392680   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:17,794-Speed 2623.29 samples/sec   Loss 7.0975   LearningRate 0.0277   Epoch: 9   Global Step: 392690   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:21,697-Speed 2624.41 samples/sec   Loss 7.1450   LearningRate 0.0277   Epoch: 9   Global Step: 392700   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:25,600-Speed 2624.36 samples/sec   Loss 7.1668   LearningRate 0.0277   Epoch: 9   Global Step: 392710   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:51:29,474-Speed 2644.43 samples/sec   Loss 7.0565   LearningRate 0.0277   Epoch: 9   Global Step: 392720   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:33,379-Speed 2622.57 samples/sec   Loss 7.1558   LearningRate 0.0277   Epoch: 9   Global Step: 392730   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:37,287-Speed 2620.54 samples/sec   Loss 7.0198   LearningRate 0.0277   Epoch: 9   Global Step: 392740   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:41,195-Speed 2620.92 samples/sec   Loss 7.0358   LearningRate 0.0277   Epoch: 9   Global Step: 392750   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:45,098-Speed 2624.50 samples/sec   Loss 7.0861   LearningRate 0.0277   Epoch: 9   Global Step: 392760   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:49,007-Speed 2620.31 samples/sec   Loss 7.0692   LearningRate 0.0277   Epoch: 9   Global Step: 392770   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:51:52,900-Speed 2631.37 samples/sec   Loss 7.0728   LearningRate 0.0277   Epoch: 9   Global Step: 392780   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:51:56,807-Speed 2621.52 samples/sec   Loss 7.0841   LearningRate 0.0277   Epoch: 9   Global Step: 392790   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:00,717-Speed 2619.30 samples/sec   Loss 7.0643   LearningRate 0.0277   Epoch: 9   Global Step: 392800   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:04,656-Speed 2600.38 samples/sec   Loss 6.8969   LearningRate 0.0277   Epoch: 9   Global Step: 392810   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:08,594-Speed 2600.61 samples/sec   Loss 6.9715   LearningRate 0.0277   Epoch: 9   Global Step: 392820   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:12,516-Speed 2611.40 samples/sec   Loss 7.1156   LearningRate 0.0277   Epoch: 9   Global Step: 392830   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:16,426-Speed 2619.98 samples/sec   Loss 7.0548   LearningRate 0.0277   Epoch: 9   Global Step: 392840   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:20,329-Speed 2624.05 samples/sec   Loss 7.1708   LearningRate 0.0277   Epoch: 9   Global Step: 392850   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:24,232-Speed 2624.90 samples/sec   Loss 7.0906   LearningRate 0.0277   Epoch: 9   Global Step: 392860   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:28,133-Speed 2624.98 samples/sec   Loss 7.0902   LearningRate 0.0277   Epoch: 9   Global Step: 392870   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:52:32,068-Speed 2603.40 samples/sec   Loss 7.0686   LearningRate 0.0277   Epoch: 9   Global Step: 392880   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:52:35,969-Speed 2625.41 samples/sec   Loss 7.0754   LearningRate 0.0277   Epoch: 9   Global Step: 392890   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:52:39,889-Speed 2613.22 samples/sec   Loss 7.0248   LearningRate 0.0277   Epoch: 9   Global Step: 392900   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:52:43,787-Speed 2627.55 samples/sec   Loss 7.0564   LearningRate 0.0277   Epoch: 9   Global Step: 392910   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:52:47,685-Speed 2627.72 samples/sec   Loss 7.0701   LearningRate 0.0277   Epoch: 9   Global Step: 392920   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:52:51,585-Speed 2626.33 samples/sec   Loss 7.1502   LearningRate 0.0277   Epoch: 9   Global Step: 392930   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:52:55,493-Speed 2621.25 samples/sec   Loss 7.0967   LearningRate 0.0277   Epoch: 9   Global Step: 392940   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:52:59,390-Speed 2627.86 samples/sec   Loss 6.9481   LearningRate 0.0277   Epoch: 9   Global Step: 392950   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:53:03,284-Speed 2630.66 samples/sec   Loss 7.0541   LearningRate 0.0277   Epoch: 9   Global Step: 392960   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:53:07,196-Speed 2618.16 samples/sec   Loss 7.1166   LearningRate 0.0277   Epoch: 9   Global Step: 392970   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:53:11,079-Speed 2637.45 samples/sec   Loss 7.1821   LearningRate 0.0277   Epoch: 9   Global Step: 392980   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:14,979-Speed 2626.83 samples/sec   Loss 6.9730   LearningRate 0.0277   Epoch: 9   Global Step: 392990   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:18,883-Speed 2623.45 samples/sec   Loss 6.9594   LearningRate 0.0277   Epoch: 9   Global Step: 393000   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:22,791-Speed 2621.26 samples/sec   Loss 7.1059   LearningRate 0.0277   Epoch: 9   Global Step: 393010   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:26,691-Speed 2626.31 samples/sec   Loss 7.0633   LearningRate 0.0277   Epoch: 9   Global Step: 393020   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:30,588-Speed 2628.53 samples/sec   Loss 7.1288   LearningRate 0.0277   Epoch: 9   Global Step: 393030   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:34,491-Speed 2624.05 samples/sec   Loss 7.0007   LearningRate 0.0277   Epoch: 9   Global Step: 393040   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:38,401-Speed 2620.01 samples/sec   Loss 7.0302   LearningRate 0.0277   Epoch: 9   Global Step: 393050   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:42,296-Speed 2629.17 samples/sec   Loss 7.1456   LearningRate 0.0277   Epoch: 9   Global Step: 393060   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:46,236-Speed 2599.97 samples/sec   Loss 6.8742   LearningRate 0.0277   Epoch: 9   Global Step: 393070   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:53:50,144-Speed 2621.35 samples/sec   Loss 7.1619   LearningRate 0.0277   Epoch: 9   Global Step: 393080   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:53:54,045-Speed 2625.79 samples/sec   Loss 7.1730   LearningRate 0.0277   Epoch: 9   Global Step: 393090   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:53:57,929-Speed 2637.20 samples/sec   Loss 7.0910   LearningRate 0.0277   Epoch: 9   Global Step: 393100   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:01,829-Speed 2626.22 samples/sec   Loss 7.0190   LearningRate 0.0277   Epoch: 9   Global Step: 393110   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:05,728-Speed 2626.92 samples/sec   Loss 7.0480   LearningRate 0.0277   Epoch: 9   Global Step: 393120   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:09,774-Speed 2531.38 samples/sec   Loss 7.0593   LearningRate 0.0277   Epoch: 9   Global Step: 393130   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:13,676-Speed 2625.14 samples/sec   Loss 6.9602   LearningRate 0.0277   Epoch: 9   Global Step: 393140   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:17,593-Speed 2615.32 samples/sec   Loss 7.1150   LearningRate 0.0277   Epoch: 9   Global Step: 393150   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:21,508-Speed 2615.63 samples/sec   Loss 7.0663   LearningRate 0.0277   Epoch: 9   Global Step: 393160   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:25,416-Speed 2620.72 samples/sec   Loss 7.1808   LearningRate 0.0277   Epoch: 9   Global Step: 393170   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:29,320-Speed 2624.18 samples/sec   Loss 7.0348   LearningRate 0.0277   Epoch: 9   Global Step: 393180   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:33,220-Speed 2626.11 samples/sec   Loss 7.0934   LearningRate 0.0277   Epoch: 9   Global Step: 393190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:54:37,121-Speed 2626.09 samples/sec   Loss 7.2295   LearningRate 0.0277   Epoch: 9   Global Step: 393200   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:54:41,022-Speed 2625.60 samples/sec   Loss 7.1859   LearningRate 0.0277   Epoch: 9   Global Step: 393210   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:54:44,936-Speed 2616.36 samples/sec   Loss 6.9960   LearningRate 0.0277   Epoch: 9   Global Step: 393220   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:54:48,845-Speed 2620.77 samples/sec   Loss 7.0690   LearningRate 0.0277   Epoch: 9   Global Step: 393230   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:54:52,745-Speed 2626.03 samples/sec   Loss 7.0488   LearningRate 0.0277   Epoch: 9   Global Step: 393240   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:54:56,650-Speed 2622.88 samples/sec   Loss 7.0517   LearningRate 0.0277   Epoch: 9   Global Step: 393250   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:00,551-Speed 2625.49 samples/sec   Loss 7.0908   LearningRate 0.0277   Epoch: 9   Global Step: 393260   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:04,459-Speed 2621.04 samples/sec   Loss 7.1299   LearningRate 0.0277   Epoch: 9   Global Step: 393270   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:08,356-Speed 2628.07 samples/sec   Loss 6.9785   LearningRate 0.0277   Epoch: 9   Global Step: 393280   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:12,270-Speed 2617.08 samples/sec   Loss 6.9102   LearningRate 0.0277   Epoch: 9   Global Step: 393290   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:16,173-Speed 2624.27 samples/sec   Loss 7.0250   LearningRate 0.0277   Epoch: 9   Global Step: 393300   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:55:20,077-Speed 2623.46 samples/sec   Loss 7.0813   LearningRate 0.0277   Epoch: 9   Global Step: 393310   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:55:24,010-Speed 2604.14 samples/sec   Loss 7.0259   LearningRate 0.0277   Epoch: 9   Global Step: 393320   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:55:27,916-Speed 2622.43 samples/sec   Loss 7.0660   LearningRate 0.0277   Epoch: 9   Global Step: 393330   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:55:31,800-Speed 2636.88 samples/sec   Loss 6.9748   LearningRate 0.0277   Epoch: 9   Global Step: 393340   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:35,715-Speed 2616.24 samples/sec   Loss 7.0420   LearningRate 0.0277   Epoch: 9   Global Step: 393350   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:39,623-Speed 2621.16 samples/sec   Loss 7.1480   LearningRate 0.0276   Epoch: 9   Global Step: 393360   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:43,524-Speed 2625.75 samples/sec   Loss 7.0678   LearningRate 0.0276   Epoch: 9   Global Step: 393370   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:47,429-Speed 2623.10 samples/sec   Loss 6.9692   LearningRate 0.0276   Epoch: 9   Global Step: 393380   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:51,330-Speed 2624.94 samples/sec   Loss 7.0428   LearningRate 0.0276   Epoch: 9   Global Step: 393390   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:55,258-Speed 2608.10 samples/sec   Loss 7.1098   LearningRate 0.0276   Epoch: 9   Global Step: 393400   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:55:59,161-Speed 2624.05 samples/sec   Loss 7.0722   LearningRate 0.0276   Epoch: 9   Global Step: 393410   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:56:03,039-Speed 2641.31 samples/sec   Loss 6.9827   LearningRate 0.0276   Epoch: 9   Global Step: 393420   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:06,945-Speed 2622.28 samples/sec   Loss 7.3157   LearningRate 0.0276   Epoch: 9   Global Step: 393430   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:10,850-Speed 2622.95 samples/sec   Loss 7.0745   LearningRate 0.0276   Epoch: 9   Global Step: 393440   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:14,762-Speed 2617.69 samples/sec   Loss 7.1351   LearningRate 0.0276   Epoch: 9   Global Step: 393450   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:18,705-Speed 2598.08 samples/sec   Loss 6.8002   LearningRate 0.0276   Epoch: 9   Global Step: 393460   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:22,611-Speed 2621.94 samples/sec   Loss 7.0589   LearningRate 0.0276   Epoch: 9   Global Step: 393470   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:26,514-Speed 2625.28 samples/sec   Loss 7.0575   LearningRate 0.0276   Epoch: 9   Global Step: 393480   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:30,414-Speed 2626.25 samples/sec   Loss 7.1092   LearningRate 0.0276   Epoch: 9   Global Step: 393490   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:34,314-Speed 2625.80 samples/sec   Loss 6.9685   LearningRate 0.0276   Epoch: 9   Global Step: 393500   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:38,221-Speed 2621.66 samples/sec   Loss 7.1493   LearningRate 0.0276   Epoch: 9   Global Step: 393510   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 15:56:42,118-Speed 2628.69 samples/sec   Loss 6.9696   LearningRate 0.0276   Epoch: 9   Global Step: 393520   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:56:46,022-Speed 2623.54 samples/sec   Loss 7.0282   LearningRate 0.0276   Epoch: 9   Global Step: 393530   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:56:49,925-Speed 2624.50 samples/sec   Loss 7.0509   LearningRate 0.0276   Epoch: 9   Global Step: 393540   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:56:53,828-Speed 2624.48 samples/sec   Loss 6.9909   LearningRate 0.0276   Epoch: 9   Global Step: 393550   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:56:57,735-Speed 2621.79 samples/sec   Loss 6.9371   LearningRate 0.0276   Epoch: 9   Global Step: 393560   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:01,635-Speed 2626.31 samples/sec   Loss 7.0766   LearningRate 0.0276   Epoch: 9   Global Step: 393570   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:05,533-Speed 2627.91 samples/sec   Loss 7.1094   LearningRate 0.0276   Epoch: 9   Global Step: 393580   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:09,427-Speed 2630.05 samples/sec   Loss 7.0388   LearningRate 0.0276   Epoch: 9   Global Step: 393590   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:13,334-Speed 2621.87 samples/sec   Loss 6.9974   LearningRate 0.0276   Epoch: 9   Global Step: 393600   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:17,246-Speed 2618.52 samples/sec   Loss 7.0746   LearningRate 0.0276   Epoch: 9   Global Step: 393610   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:21,158-Speed 2617.76 samples/sec   Loss 7.0969   LearningRate 0.0276   Epoch: 9   Global Step: 393620   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:57:25,059-Speed 2625.88 samples/sec   Loss 7.1166   LearningRate 0.0276   Epoch: 9   Global Step: 393630   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:57:28,990-Speed 2606.02 samples/sec   Loss 7.0808   LearningRate 0.0276   Epoch: 9   Global Step: 393640   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:57:32,905-Speed 2615.72 samples/sec   Loss 7.1045   LearningRate 0.0276   Epoch: 9   Global Step: 393650   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:57:36,808-Speed 2624.35 samples/sec   Loss 6.9269   LearningRate 0.0276   Epoch: 9   Global Step: 393660   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:40,713-Speed 2623.35 samples/sec   Loss 6.9799   LearningRate 0.0276   Epoch: 9   Global Step: 393670   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:44,615-Speed 2624.40 samples/sec   Loss 7.0459   LearningRate 0.0276   Epoch: 9   Global Step: 393680   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:48,520-Speed 2623.19 samples/sec   Loss 6.9879   LearningRate 0.0276   Epoch: 9   Global Step: 393690   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:52,437-Speed 2615.32 samples/sec   Loss 7.0241   LearningRate 0.0276   Epoch: 9   Global Step: 393700   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:57:56,346-Speed 2620.27 samples/sec   Loss 7.0513   LearningRate 0.0276   Epoch: 9   Global Step: 393710   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:58:00,252-Speed 2621.93 samples/sec   Loss 7.1231   LearningRate 0.0276   Epoch: 9   Global Step: 393720   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:58:04,158-Speed 2622.33 samples/sec   Loss 6.9726   LearningRate 0.0276   Epoch: 9   Global Step: 393730   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:58:08,069-Speed 2618.79 samples/sec   Loss 6.9464   LearningRate 0.0276   Epoch: 9   Global Step: 393740   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:58:11,971-Speed 2625.56 samples/sec   Loss 7.1557   LearningRate 0.0276   Epoch: 9   Global Step: 393750   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:58:15,881-Speed 2619.60 samples/sec   Loss 7.0430   LearningRate 0.0276   Epoch: 9   Global Step: 393760   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:19,778-Speed 2628.57 samples/sec   Loss 6.9930   LearningRate 0.0276   Epoch: 9   Global Step: 393770   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:23,683-Speed 2622.98 samples/sec   Loss 6.8240   LearningRate 0.0276   Epoch: 9   Global Step: 393780   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:27,594-Speed 2619.12 samples/sec   Loss 7.0224   LearningRate 0.0276   Epoch: 9   Global Step: 393790   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:31,504-Speed 2619.35 samples/sec   Loss 6.9794   LearningRate 0.0276   Epoch: 9   Global Step: 393800   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:35,408-Speed 2623.27 samples/sec   Loss 7.1441   LearningRate 0.0276   Epoch: 9   Global Step: 393810   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:39,323-Speed 2616.41 samples/sec   Loss 7.0917   LearningRate 0.0276   Epoch: 9   Global Step: 393820   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:43,244-Speed 2612.43 samples/sec   Loss 7.0672   LearningRate 0.0276   Epoch: 9   Global Step: 393830   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:47,142-Speed 2627.63 samples/sec   Loss 6.9678   LearningRate 0.0276   Epoch: 9   Global Step: 393840   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:51,045-Speed 2624.82 samples/sec   Loss 6.9608   LearningRate 0.0276   Epoch: 9   Global Step: 393850   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:58:54,941-Speed 2628.80 samples/sec   Loss 7.1068   LearningRate 0.0276   Epoch: 9   Global Step: 393860   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:58:58,847-Speed 2622.23 samples/sec   Loss 6.9460   LearningRate 0.0276   Epoch: 9   Global Step: 393870   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 15:59:02,730-Speed 2637.77 samples/sec   Loss 6.9147   LearningRate 0.0276   Epoch: 9   Global Step: 393880   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:59:06,622-Speed 2631.93 samples/sec   Loss 7.1631   LearningRate 0.0276   Epoch: 9   Global Step: 393890   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:10,530-Speed 2621.13 samples/sec   Loss 7.0716   LearningRate 0.0276   Epoch: 9   Global Step: 393900   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:14,456-Speed 2608.82 samples/sec   Loss 7.1011   LearningRate 0.0276   Epoch: 9   Global Step: 393910   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:18,351-Speed 2629.94 samples/sec   Loss 7.0339   LearningRate 0.0276   Epoch: 9   Global Step: 393920   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:22,253-Speed 2624.92 samples/sec   Loss 6.9946   LearningRate 0.0276   Epoch: 9   Global Step: 393930   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:26,274-Speed 2547.44 samples/sec   Loss 6.9777   LearningRate 0.0276   Epoch: 9   Global Step: 393940   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:30,173-Speed 2626.74 samples/sec   Loss 7.1121   LearningRate 0.0276   Epoch: 9   Global Step: 393950   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:34,083-Speed 2619.34 samples/sec   Loss 6.9674   LearningRate 0.0276   Epoch: 9   Global Step: 393960   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:37,978-Speed 2629.57 samples/sec   Loss 7.0374   LearningRate 0.0276   Epoch: 9   Global Step: 393970   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:41,892-Speed 2616.66 samples/sec   Loss 7.0514   LearningRate 0.0276   Epoch: 9   Global Step: 393980   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:45,789-Speed 2628.48 samples/sec   Loss 7.0774   LearningRate 0.0276   Epoch: 9   Global Step: 393990   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:59:49,688-Speed 2626.88 samples/sec   Loss 7.0687   LearningRate 0.0276   Epoch: 9   Global Step: 394000   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 15:59:53,565-Speed 2641.97 samples/sec   Loss 6.9671   LearningRate 0.0276   Epoch: 9   Global Step: 394010   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 15:59:57,615-Speed 2529.49 samples/sec   Loss 7.0210   LearningRate 0.0276   Epoch: 9   Global Step: 394020   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:01,512-Speed 2627.75 samples/sec   Loss 7.1393   LearningRate 0.0276   Epoch: 9   Global Step: 394030   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:05,417-Speed 2623.00 samples/sec   Loss 7.0392   LearningRate 0.0276   Epoch: 9   Global Step: 394040   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:09,316-Speed 2626.82 samples/sec   Loss 7.0740   LearningRate 0.0276   Epoch: 9   Global Step: 394050   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:13,262-Speed 2596.30 samples/sec   Loss 7.0473   LearningRate 0.0276   Epoch: 9   Global Step: 394060   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:17,177-Speed 2616.25 samples/sec   Loss 6.9546   LearningRate 0.0276   Epoch: 9   Global Step: 394070   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:21,082-Speed 2622.44 samples/sec   Loss 6.9473   LearningRate 0.0276   Epoch: 9   Global Step: 394080   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:25,005-Speed 2611.03 samples/sec   Loss 6.9936   LearningRate 0.0276   Epoch: 9   Global Step: 394090   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:28,904-Speed 2626.86 samples/sec   Loss 6.9443   LearningRate 0.0276   Epoch: 9   Global Step: 394100   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:32,806-Speed 2625.57 samples/sec   Loss 7.0108   LearningRate 0.0276   Epoch: 9   Global Step: 394110   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:00:36,707-Speed 2625.23 samples/sec   Loss 7.0266   LearningRate 0.0276   Epoch: 9   Global Step: 394120   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:00:40,607-Speed 2626.15 samples/sec   Loss 7.1024   LearningRate 0.0276   Epoch: 9   Global Step: 394130   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:00:44,495-Speed 2635.13 samples/sec   Loss 7.1117   LearningRate 0.0276   Epoch: 9   Global Step: 394140   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:48,468-Speed 2577.95 samples/sec   Loss 7.0440   LearningRate 0.0275   Epoch: 9   Global Step: 394150   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:52,373-Speed 2623.13 samples/sec   Loss 7.0618   LearningRate 0.0275   Epoch: 9   Global Step: 394160   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:00:56,275-Speed 2624.59 samples/sec   Loss 7.1040   LearningRate 0.0275   Epoch: 9   Global Step: 394170   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:01:00,171-Speed 2629.23 samples/sec   Loss 7.1194   LearningRate 0.0275   Epoch: 9   Global Step: 394180   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:01:04,076-Speed 2622.45 samples/sec   Loss 6.9291   LearningRate 0.0275   Epoch: 9   Global Step: 394190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:01:07,983-Speed 2621.93 samples/sec   Loss 6.9612   LearningRate 0.0275   Epoch: 9   Global Step: 394200   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:01:11,881-Speed 2627.59 samples/sec   Loss 7.0443   LearningRate 0.0275   Epoch: 9   Global Step: 394210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:01:15,783-Speed 2625.27 samples/sec   Loss 6.9595   LearningRate 0.0275   Epoch: 9   Global Step: 394220   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:01:19,680-Speed 2628.29 samples/sec   Loss 7.0751   LearningRate 0.0275   Epoch: 9   Global Step: 394230   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:01:23,588-Speed 2620.60 samples/sec   Loss 7.0611   LearningRate 0.0275   Epoch: 9   Global Step: 394240   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:27,496-Speed 2621.69 samples/sec   Loss 7.0085   LearningRate 0.0275   Epoch: 9   Global Step: 394250   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:31,406-Speed 2619.00 samples/sec   Loss 6.8939   LearningRate 0.0275   Epoch: 9   Global Step: 394260   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:35,329-Speed 2610.68 samples/sec   Loss 7.0596   LearningRate 0.0275   Epoch: 9   Global Step: 394270   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:39,235-Speed 2621.90 samples/sec   Loss 7.0047   LearningRate 0.0275   Epoch: 9   Global Step: 394280   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:43,136-Speed 2626.39 samples/sec   Loss 7.0837   LearningRate 0.0275   Epoch: 9   Global Step: 394290   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:47,038-Speed 2624.29 samples/sec   Loss 7.1049   LearningRate 0.0275   Epoch: 9   Global Step: 394300   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:50,965-Speed 2608.27 samples/sec   Loss 7.0389   LearningRate 0.0275   Epoch: 9   Global Step: 394310   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:54,866-Speed 2625.93 samples/sec   Loss 7.0721   LearningRate 0.0275   Epoch: 9   Global Step: 394320   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:01:58,768-Speed 2625.31 samples/sec   Loss 7.0704   LearningRate 0.0275   Epoch: 9   Global Step: 394330   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:02:02,670-Speed 2624.79 samples/sec   Loss 7.0989   LearningRate 0.0275   Epoch: 9   Global Step: 394340   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:06,569-Speed 2626.55 samples/sec   Loss 7.2215   LearningRate 0.0275   Epoch: 9   Global Step: 394350   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:10,474-Speed 2623.07 samples/sec   Loss 6.9758   LearningRate 0.0275   Epoch: 9   Global Step: 394360   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:14,371-Speed 2628.10 samples/sec   Loss 7.0077   LearningRate 0.0275   Epoch: 9   Global Step: 394370   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:18,274-Speed 2624.95 samples/sec   Loss 6.9631   LearningRate 0.0275   Epoch: 9   Global Step: 394380   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:22,172-Speed 2627.38 samples/sec   Loss 6.9714   LearningRate 0.0275   Epoch: 9   Global Step: 394390   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:26,067-Speed 2630.29 samples/sec   Loss 6.9751   LearningRate 0.0275   Epoch: 9   Global Step: 394400   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:29,976-Speed 2620.48 samples/sec   Loss 7.0553   LearningRate 0.0275   Epoch: 9   Global Step: 394410   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:33,878-Speed 2624.95 samples/sec   Loss 6.9846   LearningRate 0.0275   Epoch: 9   Global Step: 394420   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:37,774-Speed 2628.49 samples/sec   Loss 7.1841   LearningRate 0.0275   Epoch: 9   Global Step: 394430   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:02:41,693-Speed 2613.53 samples/sec   Loss 6.9823   LearningRate 0.0275   Epoch: 9   Global Step: 394440   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:02:45,590-Speed 2628.33 samples/sec   Loss 7.0362   LearningRate 0.0275   Epoch: 9   Global Step: 394450   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:02:49,496-Speed 2622.62 samples/sec   Loss 7.1675   LearningRate 0.0275   Epoch: 9   Global Step: 394460   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:02:53,394-Speed 2627.69 samples/sec   Loss 7.0055   LearningRate 0.0275   Epoch: 9   Global Step: 394470   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:02:57,276-Speed 2639.51 samples/sec   Loss 7.1672   LearningRate 0.0275   Epoch: 9   Global Step: 394480   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:01,174-Speed 2627.30 samples/sec   Loss 6.9125   LearningRate 0.0275   Epoch: 9   Global Step: 394490   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:05,077-Speed 2623.94 samples/sec   Loss 7.2203   LearningRate 0.0275   Epoch: 9   Global Step: 394500   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:08,976-Speed 2626.45 samples/sec   Loss 6.7988   LearningRate 0.0275   Epoch: 9   Global Step: 394510   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:12,892-Speed 2615.94 samples/sec   Loss 7.0323   LearningRate 0.0275   Epoch: 9   Global Step: 394520   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:16,802-Speed 2620.04 samples/sec   Loss 6.9925   LearningRate 0.0275   Epoch: 9   Global Step: 394530   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:20,698-Speed 2630.53 samples/sec   Loss 7.0466   LearningRate 0.0275   Epoch: 9   Global Step: 394540   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:24,612-Speed 2616.81 samples/sec   Loss 7.0264   LearningRate 0.0275   Epoch: 9   Global Step: 394550   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:28,518-Speed 2622.58 samples/sec   Loss 7.0548   LearningRate 0.0275   Epoch: 9   Global Step: 394560   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:32,456-Speed 2600.69 samples/sec   Loss 7.0563   LearningRate 0.0275   Epoch: 9   Global Step: 394570   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:36,340-Speed 2637.11 samples/sec   Loss 7.0951   LearningRate 0.0275   Epoch: 9   Global Step: 394580   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:40,240-Speed 2626.12 samples/sec   Loss 7.1499   LearningRate 0.0275   Epoch: 9   Global Step: 394590   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:44,142-Speed 2625.68 samples/sec   Loss 7.1001   LearningRate 0.0275   Epoch: 9   Global Step: 394600   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:48,042-Speed 2626.61 samples/sec   Loss 6.9676   LearningRate 0.0275   Epoch: 9   Global Step: 394610   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:51,938-Speed 2628.57 samples/sec   Loss 6.9270   LearningRate 0.0275   Epoch: 9   Global Step: 394620   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:55,834-Speed 2629.36 samples/sec   Loss 6.9145   LearningRate 0.0275   Epoch: 9   Global Step: 394630   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:03:59,711-Speed 2642.22 samples/sec   Loss 7.0186   LearningRate 0.0275   Epoch: 9   Global Step: 394640   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:03,620-Speed 2619.65 samples/sec   Loss 6.9982   LearningRate 0.0275   Epoch: 9   Global Step: 394650   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:07,534-Speed 2616.69 samples/sec   Loss 7.0068   LearningRate 0.0275   Epoch: 9   Global Step: 394660   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:11,626-Speed 2503.71 samples/sec   Loss 7.0071   LearningRate 0.0275   Epoch: 9   Global Step: 394670   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:15,595-Speed 2580.96 samples/sec   Loss 7.2205   LearningRate 0.0275   Epoch: 9   Global Step: 394680   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:19,598-Speed 2558.67 samples/sec   Loss 7.1485   LearningRate 0.0275   Epoch: 9   Global Step: 394690   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:23,498-Speed 2626.28 samples/sec   Loss 7.0313   LearningRate 0.0275   Epoch: 9   Global Step: 394700   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:27,404-Speed 2622.48 samples/sec   Loss 7.1428   LearningRate 0.0275   Epoch: 9   Global Step: 394710   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:31,311-Speed 2621.77 samples/sec   Loss 7.0597   LearningRate 0.0275   Epoch: 9   Global Step: 394720   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:35,210-Speed 2627.08 samples/sec   Loss 7.0207   LearningRate 0.0275   Epoch: 9   Global Step: 394730   Fp16 Grad Scale: 32768   Required: 49 hours
Training: 2022-04-14 16:04:39,211-Speed 2560.03 samples/sec   Loss 7.1555   LearningRate 0.0275   Epoch: 9   Global Step: 394740   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:04:43,114-Speed 2624.07 samples/sec   Loss 6.9729   LearningRate 0.0275   Epoch: 9   Global Step: 394750   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:04:47,032-Speed 2614.34 samples/sec   Loss 7.1676   LearningRate 0.0275   Epoch: 9   Global Step: 394760   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:04:50,997-Speed 2584.99 samples/sec   Loss 7.2373   LearningRate 0.0275   Epoch: 9   Global Step: 394770   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:04:54,894-Speed 2628.56 samples/sec   Loss 7.0809   LearningRate 0.0275   Epoch: 9   Global Step: 394780   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:04:58,797-Speed 2624.60 samples/sec   Loss 7.1139   LearningRate 0.0275   Epoch: 9   Global Step: 394790   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:05:02,695-Speed 2627.82 samples/sec   Loss 7.0969   LearningRate 0.0275   Epoch: 9   Global Step: 394800   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:05:06,617-Speed 2610.80 samples/sec   Loss 7.1396   LearningRate 0.0275   Epoch: 9   Global Step: 394810   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:05:10,545-Speed 2607.34 samples/sec   Loss 7.0325   LearningRate 0.0275   Epoch: 9   Global Step: 394820   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:05:14,453-Speed 2621.95 samples/sec   Loss 6.9860   LearningRate 0.0275   Epoch: 9   Global Step: 394830   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:05:18,351-Speed 2627.39 samples/sec   Loss 7.0051   LearningRate 0.0275   Epoch: 9   Global Step: 394840   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:22,246-Speed 2629.53 samples/sec   Loss 6.9576   LearningRate 0.0275   Epoch: 9   Global Step: 394850   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:26,156-Speed 2619.97 samples/sec   Loss 7.0830   LearningRate 0.0275   Epoch: 9   Global Step: 394860   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:30,057-Speed 2625.86 samples/sec   Loss 6.9121   LearningRate 0.0275   Epoch: 9   Global Step: 394870   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:33,955-Speed 2627.49 samples/sec   Loss 7.0723   LearningRate 0.0275   Epoch: 9   Global Step: 394880   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:37,859-Speed 2623.66 samples/sec   Loss 6.9725   LearningRate 0.0275   Epoch: 9   Global Step: 394890   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:41,757-Speed 2627.16 samples/sec   Loss 6.9040   LearningRate 0.0275   Epoch: 9   Global Step: 394900   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:45,652-Speed 2629.95 samples/sec   Loss 7.0032   LearningRate 0.0275   Epoch: 9   Global Step: 394910   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:49,558-Speed 2622.35 samples/sec   Loss 7.0149   LearningRate 0.0275   Epoch: 9   Global Step: 394920   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:53,453-Speed 2629.44 samples/sec   Loss 7.0021   LearningRate 0.0275   Epoch: 9   Global Step: 394930   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:05:57,360-Speed 2622.22 samples/sec   Loss 6.9952   LearningRate 0.0275   Epoch: 9   Global Step: 394940   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:06:01,256-Speed 2628.33 samples/sec   Loss 7.0931   LearningRate 0.0274   Epoch: 9   Global Step: 394950   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:06:05,138-Speed 2638.52 samples/sec   Loss 7.0546   LearningRate 0.0274   Epoch: 9   Global Step: 394960   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:06:09,120-Speed 2572.05 samples/sec   Loss 7.1501   LearningRate 0.0274   Epoch: 9   Global Step: 394970   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:13,047-Speed 2608.80 samples/sec   Loss 7.0990   LearningRate 0.0274   Epoch: 9   Global Step: 394980   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:16,948-Speed 2625.45 samples/sec   Loss 7.0997   LearningRate 0.0274   Epoch: 9   Global Step: 394990   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:20,844-Speed 2628.55 samples/sec   Loss 7.1139   LearningRate 0.0274   Epoch: 9   Global Step: 395000   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:24,778-Speed 2603.59 samples/sec   Loss 7.0128   LearningRate 0.0274   Epoch: 9   Global Step: 395010   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:28,868-Speed 2505.15 samples/sec   Loss 7.0486   LearningRate 0.0274   Epoch: 9   Global Step: 395020   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:32,766-Speed 2627.43 samples/sec   Loss 7.0819   LearningRate 0.0274   Epoch: 9   Global Step: 395030   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:36,680-Speed 2616.34 samples/sec   Loss 7.0679   LearningRate 0.0274   Epoch: 9   Global Step: 395040   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:40,575-Speed 2630.24 samples/sec   Loss 7.1995   LearningRate 0.0274   Epoch: 9   Global Step: 395050   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:44,470-Speed 2629.59 samples/sec   Loss 6.9623   LearningRate 0.0274   Epoch: 9   Global Step: 395060   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:06:48,369-Speed 2627.31 samples/sec   Loss 6.9913   LearningRate 0.0274   Epoch: 9   Global Step: 395070   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:06:52,265-Speed 2629.43 samples/sec   Loss 6.9484   LearningRate 0.0274   Epoch: 9   Global Step: 395080   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:06:56,166-Speed 2625.14 samples/sec   Loss 7.1153   LearningRate 0.0274   Epoch: 9   Global Step: 395090   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:00,254-Speed 2505.56 samples/sec   Loss 6.9821   LearningRate 0.0274   Epoch: 9   Global Step: 395100   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:04,165-Speed 2618.50 samples/sec   Loss 7.0593   LearningRate 0.0274   Epoch: 9   Global Step: 395110   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:08,061-Speed 2629.73 samples/sec   Loss 7.0362   LearningRate 0.0274   Epoch: 9   Global Step: 395120   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:11,948-Speed 2634.67 samples/sec   Loss 6.9090   LearningRate 0.0274   Epoch: 9   Global Step: 395130   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:15,854-Speed 2622.02 samples/sec   Loss 6.9571   LearningRate 0.0274   Epoch: 9   Global Step: 395140   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:19,776-Speed 2611.52 samples/sec   Loss 6.9460   LearningRate 0.0274   Epoch: 9   Global Step: 395150   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:23,696-Speed 2613.29 samples/sec   Loss 7.0912   LearningRate 0.0274   Epoch: 9   Global Step: 395160   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:27,575-Speed 2640.37 samples/sec   Loss 7.1122   LearningRate 0.0274   Epoch: 9   Global Step: 395170   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:31,555-Speed 2573.48 samples/sec   Loss 7.0404   LearningRate 0.0274   Epoch: 9   Global Step: 395180   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:35,452-Speed 2628.03 samples/sec   Loss 7.0043   LearningRate 0.0274   Epoch: 9   Global Step: 395190   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:39,353-Speed 2625.61 samples/sec   Loss 7.0903   LearningRate 0.0274   Epoch: 9   Global Step: 395200   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:07:43,259-Speed 2622.29 samples/sec   Loss 7.1902   LearningRate 0.0274   Epoch: 9   Global Step: 395210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:07:47,161-Speed 2624.52 samples/sec   Loss 6.9832   LearningRate 0.0274   Epoch: 9   Global Step: 395220   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:07:51,143-Speed 2572.61 samples/sec   Loss 6.9333   LearningRate 0.0274   Epoch: 9   Global Step: 395230   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:07:55,127-Speed 2570.81 samples/sec   Loss 6.9537   LearningRate 0.0274   Epoch: 9   Global Step: 395240   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:07:59,046-Speed 2613.84 samples/sec   Loss 6.8885   LearningRate 0.0274   Epoch: 9   Global Step: 395250   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:02,940-Speed 2630.12 samples/sec   Loss 6.9926   LearningRate 0.0274   Epoch: 9   Global Step: 395260   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:06,837-Speed 2627.87 samples/sec   Loss 7.1411   LearningRate 0.0274   Epoch: 9   Global Step: 395270   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:10,735-Speed 2627.36 samples/sec   Loss 7.1072   LearningRate 0.0274   Epoch: 9   Global Step: 395280   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:14,631-Speed 2629.06 samples/sec   Loss 7.0189   LearningRate 0.0274   Epoch: 9   Global Step: 395290   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:18,559-Speed 2608.97 samples/sec   Loss 6.9936   LearningRate 0.0274   Epoch: 9   Global Step: 395300   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:22,456-Speed 2628.13 samples/sec   Loss 6.9223   LearningRate 0.0274   Epoch: 9   Global Step: 395310   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:08:26,356-Speed 2627.12 samples/sec   Loss 7.1200   LearningRate 0.0274   Epoch: 9   Global Step: 395320   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:08:30,242-Speed 2636.02 samples/sec   Loss 7.1370   LearningRate 0.0274   Epoch: 9   Global Step: 395330   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:34,140-Speed 2627.27 samples/sec   Loss 6.9752   LearningRate 0.0274   Epoch: 9   Global Step: 395340   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:38,038-Speed 2627.44 samples/sec   Loss 7.0435   LearningRate 0.0274   Epoch: 9   Global Step: 395350   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:41,935-Speed 2628.37 samples/sec   Loss 6.9406   LearningRate 0.0274   Epoch: 9   Global Step: 395360   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:45,836-Speed 2625.74 samples/sec   Loss 6.9676   LearningRate 0.0274   Epoch: 9   Global Step: 395370   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:49,733-Speed 2628.38 samples/sec   Loss 7.0204   LearningRate 0.0274   Epoch: 9   Global Step: 395380   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:53,631-Speed 2627.93 samples/sec   Loss 6.9798   LearningRate 0.0274   Epoch: 9   Global Step: 395390   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:08:57,540-Speed 2620.08 samples/sec   Loss 7.1201   LearningRate 0.0274   Epoch: 9   Global Step: 395400   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:01,441-Speed 2625.58 samples/sec   Loss 7.0379   LearningRate 0.0274   Epoch: 9   Global Step: 395410   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:05,340-Speed 2626.69 samples/sec   Loss 6.9860   LearningRate 0.0274   Epoch: 9   Global Step: 395420   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:09,236-Speed 2628.55 samples/sec   Loss 7.0009   LearningRate 0.0274   Epoch: 9   Global Step: 395430   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:09:13,118-Speed 2638.92 samples/sec   Loss 6.9243   LearningRate 0.0274   Epoch: 9   Global Step: 395440   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:17,031-Speed 2617.61 samples/sec   Loss 6.9399   LearningRate 0.0274   Epoch: 9   Global Step: 395450   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:20,929-Speed 2627.65 samples/sec   Loss 6.9613   LearningRate 0.0274   Epoch: 9   Global Step: 395460   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:24,827-Speed 2627.98 samples/sec   Loss 7.0573   LearningRate 0.0274   Epoch: 9   Global Step: 395470   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:28,752-Speed 2609.68 samples/sec   Loss 7.0007   LearningRate 0.0274   Epoch: 9   Global Step: 395480   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:32,664-Speed 2618.09 samples/sec   Loss 7.0285   LearningRate 0.0274   Epoch: 9   Global Step: 395490   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:36,566-Speed 2624.87 samples/sec   Loss 7.0695   LearningRate 0.0274   Epoch: 9   Global Step: 395500   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:40,467-Speed 2625.58 samples/sec   Loss 6.9925   LearningRate 0.0274   Epoch: 9   Global Step: 395510   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:44,361-Speed 2630.66 samples/sec   Loss 7.0363   LearningRate 0.0274   Epoch: 9   Global Step: 395520   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:48,259-Speed 2628.33 samples/sec   Loss 7.0386   LearningRate 0.0274   Epoch: 9   Global Step: 395530   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:09:52,156-Speed 2628.02 samples/sec   Loss 7.0028   LearningRate 0.0274   Epoch: 9   Global Step: 395540   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:09:56,051-Speed 2630.36 samples/sec   Loss 7.0741   LearningRate 0.0274   Epoch: 9   Global Step: 395550   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:09:59,975-Speed 2609.71 samples/sec   Loss 7.0807   LearningRate 0.0274   Epoch: 9   Global Step: 395560   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:10:03,876-Speed 2625.98 samples/sec   Loss 6.9469   LearningRate 0.0274   Epoch: 9   Global Step: 395570   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:10:07,775-Speed 2626.32 samples/sec   Loss 6.9340   LearningRate 0.0274   Epoch: 9   Global Step: 395580   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:10:11,675-Speed 2626.65 samples/sec   Loss 7.0846   LearningRate 0.0274   Epoch: 9   Global Step: 395590   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:10:15,575-Speed 2625.98 samples/sec   Loss 7.0558   LearningRate 0.0274   Epoch: 9   Global Step: 395600   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:10:19,474-Speed 2626.94 samples/sec   Loss 6.9497   LearningRate 0.0274   Epoch: 9   Global Step: 395610   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:10:23,374-Speed 2626.40 samples/sec   Loss 6.8740   LearningRate 0.0274   Epoch: 9   Global Step: 395620   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:10:27,249-Speed 2642.93 samples/sec   Loss 6.9859   LearningRate 0.0274   Epoch: 9   Global Step: 395630   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:31,160-Speed 2619.31 samples/sec   Loss 7.0725   LearningRate 0.0274   Epoch: 9   Global Step: 395640   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:35,063-Speed 2624.11 samples/sec   Loss 7.0234   LearningRate 0.0274   Epoch: 9   Global Step: 395650   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:38,963-Speed 2626.01 samples/sec   Loss 7.0939   LearningRate 0.0274   Epoch: 9   Global Step: 395660   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:42,861-Speed 2627.75 samples/sec   Loss 7.1329   LearningRate 0.0274   Epoch: 9   Global Step: 395670   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:46,763-Speed 2625.05 samples/sec   Loss 7.0038   LearningRate 0.0274   Epoch: 9   Global Step: 395680   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:50,658-Speed 2629.39 samples/sec   Loss 7.0757   LearningRate 0.0274   Epoch: 9   Global Step: 395690   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:54,554-Speed 2629.16 samples/sec   Loss 6.9321   LearningRate 0.0274   Epoch: 9   Global Step: 395700   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:10:58,457-Speed 2624.29 samples/sec   Loss 7.1787   LearningRate 0.0274   Epoch: 9   Global Step: 395710   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:02,356-Speed 2626.90 samples/sec   Loss 7.0504   LearningRate 0.0274   Epoch: 9   Global Step: 395720   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:06,259-Speed 2624.14 samples/sec   Loss 6.9649   LearningRate 0.0274   Epoch: 9   Global Step: 395730   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:11:10,159-Speed 2626.63 samples/sec   Loss 6.8575   LearningRate 0.0273   Epoch: 9   Global Step: 395740   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:11:14,053-Speed 2629.79 samples/sec   Loss 6.9892   LearningRate 0.0273   Epoch: 9   Global Step: 395750   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:11:17,935-Speed 2639.10 samples/sec   Loss 7.1090   LearningRate 0.0273   Epoch: 9   Global Step: 395760   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:21,841-Speed 2621.52 samples/sec   Loss 6.9995   LearningRate 0.0273   Epoch: 9   Global Step: 395770   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:25,752-Speed 2619.68 samples/sec   Loss 7.0709   LearningRate 0.0273   Epoch: 9   Global Step: 395780   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:29,669-Speed 2614.24 samples/sec   Loss 6.9445   LearningRate 0.0273   Epoch: 9   Global Step: 395790   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:33,576-Speed 2622.00 samples/sec   Loss 7.1263   LearningRate 0.0273   Epoch: 9   Global Step: 395800   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:37,473-Speed 2628.43 samples/sec   Loss 7.1003   LearningRate 0.0273   Epoch: 9   Global Step: 395810   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:41,391-Speed 2614.24 samples/sec   Loss 7.1099   LearningRate 0.0273   Epoch: 9   Global Step: 395820   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:45,312-Speed 2612.17 samples/sec   Loss 7.0340   LearningRate 0.0273   Epoch: 9   Global Step: 395830   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:49,210-Speed 2628.11 samples/sec   Loss 6.9711   LearningRate 0.0273   Epoch: 9   Global Step: 395840   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:53,104-Speed 2629.64 samples/sec   Loss 6.9884   LearningRate 0.0273   Epoch: 9   Global Step: 395850   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:11:57,015-Speed 2619.55 samples/sec   Loss 6.9702   LearningRate 0.0273   Epoch: 9   Global Step: 395860   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:00,919-Speed 2622.86 samples/sec   Loss 7.1906   LearningRate 0.0273   Epoch: 9   Global Step: 395870   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:04,828-Speed 2620.64 samples/sec   Loss 7.0477   LearningRate 0.0273   Epoch: 9   Global Step: 395880   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:08,721-Speed 2631.40 samples/sec   Loss 6.9350   LearningRate 0.0273   Epoch: 9   Global Step: 395890   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:12,645-Speed 2610.02 samples/sec   Loss 7.1542   LearningRate 0.0273   Epoch: 9   Global Step: 395900   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:16,576-Speed 2605.58 samples/sec   Loss 7.1134   LearningRate 0.0273   Epoch: 9   Global Step: 395910   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:20,472-Speed 2628.55 samples/sec   Loss 7.0974   LearningRate 0.0273   Epoch: 9   Global Step: 395920   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:24,374-Speed 2625.63 samples/sec   Loss 7.0009   LearningRate 0.0273   Epoch: 9   Global Step: 395930   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:28,310-Speed 2602.19 samples/sec   Loss 7.0672   LearningRate 0.0273   Epoch: 9   Global Step: 395940   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:32,214-Speed 2623.33 samples/sec   Loss 7.0443   LearningRate 0.0273   Epoch: 9   Global Step: 395950   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:36,099-Speed 2636.47 samples/sec   Loss 7.1131   LearningRate 0.0273   Epoch: 9   Global Step: 395960   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:12:39,975-Speed 2643.16 samples/sec   Loss 7.0913   LearningRate 0.0273   Epoch: 9   Global Step: 395970   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:12:43,923-Speed 2594.36 samples/sec   Loss 7.0471   LearningRate 0.0273   Epoch: 9   Global Step: 395980   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:12:47,824-Speed 2625.67 samples/sec   Loss 7.1199   LearningRate 0.0273   Epoch: 9   Global Step: 395990   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:12:51,721-Speed 2628.58 samples/sec   Loss 6.9295   LearningRate 0.0273   Epoch: 9   Global Step: 396000   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:12:55,626-Speed 2622.56 samples/sec   Loss 6.9649   LearningRate 0.0273   Epoch: 9   Global Step: 396010   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:12:59,527-Speed 2625.37 samples/sec   Loss 7.0466   LearningRate 0.0273   Epoch: 9   Global Step: 396020   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:03,425-Speed 2628.06 samples/sec   Loss 7.0061   LearningRate 0.0273   Epoch: 9   Global Step: 396030   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:07,319-Speed 2630.47 samples/sec   Loss 6.9410   LearningRate 0.0273   Epoch: 9   Global Step: 396040   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:11,217-Speed 2627.52 samples/sec   Loss 7.0842   LearningRate 0.0273   Epoch: 9   Global Step: 396050   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:15,120-Speed 2624.04 samples/sec   Loss 6.9006   LearningRate 0.0273   Epoch: 9   Global Step: 396060   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:19,032-Speed 2619.10 samples/sec   Loss 6.9475   LearningRate 0.0273   Epoch: 9   Global Step: 396070   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:13:22,925-Speed 2631.02 samples/sec   Loss 6.9466   LearningRate 0.0273   Epoch: 9   Global Step: 396080   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:13:26,820-Speed 2629.87 samples/sec   Loss 7.0210   LearningRate 0.0273   Epoch: 9   Global Step: 396090   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:13:30,711-Speed 2631.80 samples/sec   Loss 7.0562   LearningRate 0.0273   Epoch: 9   Global Step: 396100   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:13:34,611-Speed 2626.33 samples/sec   Loss 7.0061   LearningRate 0.0273   Epoch: 9   Global Step: 396110   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:13:38,505-Speed 2630.36 samples/sec   Loss 6.9666   LearningRate 0.0273   Epoch: 9   Global Step: 396120   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:13:42,399-Speed 2630.69 samples/sec   Loss 6.9939   LearningRate 0.0273   Epoch: 9   Global Step: 396130   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:13:46,293-Speed 2630.14 samples/sec   Loss 7.1314   LearningRate 0.0273   Epoch: 9   Global Step: 396140   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:50,200-Speed 2621.49 samples/sec   Loss 7.1777   LearningRate 0.0273   Epoch: 9   Global Step: 396150   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:54,094-Speed 2630.20 samples/sec   Loss 6.8987   LearningRate 0.0273   Epoch: 9   Global Step: 396160   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:13:57,989-Speed 2630.16 samples/sec   Loss 7.0502   LearningRate 0.0273   Epoch: 9   Global Step: 396170   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:01,883-Speed 2630.42 samples/sec   Loss 7.1092   LearningRate 0.0273   Epoch: 9   Global Step: 396180   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:05,791-Speed 2620.60 samples/sec   Loss 7.1840   LearningRate 0.0273   Epoch: 9   Global Step: 396190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:09,680-Speed 2633.29 samples/sec   Loss 7.1115   LearningRate 0.0273   Epoch: 9   Global Step: 396200   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:13,599-Speed 2613.96 samples/sec   Loss 7.0528   LearningRate 0.0273   Epoch: 9   Global Step: 396210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:17,493-Speed 2629.66 samples/sec   Loss 7.0022   LearningRate 0.0273   Epoch: 9   Global Step: 396220   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:21,384-Speed 2633.49 samples/sec   Loss 6.9587   LearningRate 0.0273   Epoch: 9   Global Step: 396230   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:25,278-Speed 2630.29 samples/sec   Loss 6.9874   LearningRate 0.0273   Epoch: 9   Global Step: 396240   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:14:29,175-Speed 2628.48 samples/sec   Loss 6.9398   LearningRate 0.0273   Epoch: 9   Global Step: 396250   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:14:33,071-Speed 2629.10 samples/sec   Loss 7.0111   LearningRate 0.0273   Epoch: 9   Global Step: 396260   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:14:36,969-Speed 2627.29 samples/sec   Loss 6.9267   LearningRate 0.0273   Epoch: 9   Global Step: 396270   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:14:40,849-Speed 2639.47 samples/sec   Loss 7.0391   LearningRate 0.0273   Epoch: 9   Global Step: 396280   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:44,749-Speed 2626.53 samples/sec   Loss 7.1371   LearningRate 0.0273   Epoch: 9   Global Step: 396290   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:48,645-Speed 2628.70 samples/sec   Loss 6.9290   LearningRate 0.0273   Epoch: 9   Global Step: 396300   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:52,546-Speed 2625.71 samples/sec   Loss 7.0614   LearningRate 0.0273   Epoch: 9   Global Step: 396310   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:14:56,441-Speed 2629.41 samples/sec   Loss 6.9076   LearningRate 0.0273   Epoch: 9   Global Step: 396320   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:15:00,342-Speed 2625.95 samples/sec   Loss 6.8840   LearningRate 0.0273   Epoch: 9   Global Step: 396330   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:15:04,234-Speed 2631.07 samples/sec   Loss 6.9903   LearningRate 0.0273   Epoch: 9   Global Step: 396340   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:15:08,134-Speed 2626.42 samples/sec   Loss 6.9575   LearningRate 0.0273   Epoch: 9   Global Step: 396350   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:15:12,031-Speed 2628.29 samples/sec   Loss 7.0043   LearningRate 0.0273   Epoch: 9   Global Step: 396360   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:15:15,933-Speed 2624.80 samples/sec   Loss 7.1997   LearningRate 0.0273   Epoch: 9   Global Step: 396370   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:15:19,827-Speed 2630.55 samples/sec   Loss 6.9589   LearningRate 0.0273   Epoch: 9   Global Step: 396380   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:23,722-Speed 2629.04 samples/sec   Loss 7.0235   LearningRate 0.0273   Epoch: 9   Global Step: 396390   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:27,650-Speed 2608.16 samples/sec   Loss 6.8821   LearningRate 0.0273   Epoch: 9   Global Step: 396400   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:31,543-Speed 2630.53 samples/sec   Loss 7.0311   LearningRate 0.0273   Epoch: 9   Global Step: 396410   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:35,437-Speed 2630.04 samples/sec   Loss 7.0107   LearningRate 0.0273   Epoch: 9   Global Step: 396420   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:39,331-Speed 2630.81 samples/sec   Loss 7.0301   LearningRate 0.0273   Epoch: 9   Global Step: 396430   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:43,223-Speed 2632.36 samples/sec   Loss 6.9728   LearningRate 0.0273   Epoch: 9   Global Step: 396440   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:47,118-Speed 2629.28 samples/sec   Loss 7.0262   LearningRate 0.0273   Epoch: 9   Global Step: 396450   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:51,016-Speed 2627.72 samples/sec   Loss 7.0055   LearningRate 0.0273   Epoch: 9   Global Step: 396460   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:54,927-Speed 2618.77 samples/sec   Loss 7.0105   LearningRate 0.0273   Epoch: 9   Global Step: 396470   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:15:58,811-Speed 2636.92 samples/sec   Loss 7.0568   LearningRate 0.0273   Epoch: 9   Global Step: 396480   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:02,705-Speed 2630.52 samples/sec   Loss 6.8573   LearningRate 0.0273   Epoch: 9   Global Step: 396490   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:06,603-Speed 2627.19 samples/sec   Loss 7.0069   LearningRate 0.0273   Epoch: 9   Global Step: 396500   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:10,500-Speed 2628.42 samples/sec   Loss 7.0983   LearningRate 0.0273   Epoch: 9   Global Step: 396510   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:14,398-Speed 2627.34 samples/sec   Loss 7.0678   LearningRate 0.0273   Epoch: 9   Global Step: 396520   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:18,322-Speed 2610.29 samples/sec   Loss 6.9242   LearningRate 0.0272   Epoch: 9   Global Step: 396530   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:22,224-Speed 2624.67 samples/sec   Loss 7.0352   LearningRate 0.0272   Epoch: 9   Global Step: 396540   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:26,119-Speed 2629.90 samples/sec   Loss 6.9130   LearningRate 0.0272   Epoch: 9   Global Step: 396550   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:30,015-Speed 2628.91 samples/sec   Loss 6.9822   LearningRate 0.0272   Epoch: 9   Global Step: 396560   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:16:33,903-Speed 2634.31 samples/sec   Loss 7.1471   LearningRate 0.0272   Epoch: 9   Global Step: 396570   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:16:37,805-Speed 2624.90 samples/sec   Loss 7.0371   LearningRate 0.0272   Epoch: 9   Global Step: 396580   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:16:41,705-Speed 2626.37 samples/sec   Loss 6.9478   LearningRate 0.0272   Epoch: 9   Global Step: 396590   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:16:45,610-Speed 2622.13 samples/sec   Loss 7.0253   LearningRate 0.0272   Epoch: 9   Global Step: 396600   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:16:49,513-Speed 2625.28 samples/sec   Loss 7.0234   LearningRate 0.0272   Epoch: 9   Global Step: 396610   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:16:53,425-Speed 2617.82 samples/sec   Loss 6.9998   LearningRate 0.0272   Epoch: 9   Global Step: 396620   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:16:57,326-Speed 2626.26 samples/sec   Loss 7.0329   LearningRate 0.0272   Epoch: 9   Global Step: 396630   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:17:01,228-Speed 2624.71 samples/sec   Loss 7.0084   LearningRate 0.0272   Epoch: 9   Global Step: 396640   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:17:05,124-Speed 2629.24 samples/sec   Loss 6.9638   LearningRate 0.0272   Epoch: 9   Global Step: 396650   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:17:09,022-Speed 2627.84 samples/sec   Loss 6.9861   LearningRate 0.0272   Epoch: 9   Global Step: 396660   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:17:12,927-Speed 2622.49 samples/sec   Loss 6.9393   LearningRate 0.0272   Epoch: 9   Global Step: 396670   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:16,827-Speed 2625.76 samples/sec   Loss 7.0119   LearningRate 0.0272   Epoch: 9   Global Step: 396680   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:20,747-Speed 2613.35 samples/sec   Loss 6.9836   LearningRate 0.0272   Epoch: 9   Global Step: 396690   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:24,640-Speed 2631.25 samples/sec   Loss 7.0970   LearningRate 0.0272   Epoch: 9   Global Step: 396700   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:28,542-Speed 2624.52 samples/sec   Loss 6.9620   LearningRate 0.0272   Epoch: 9   Global Step: 396710   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:32,454-Speed 2617.98 samples/sec   Loss 6.9054   LearningRate 0.0272   Epoch: 9   Global Step: 396720   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:36,351-Speed 2628.53 samples/sec   Loss 7.0126   LearningRate 0.0272   Epoch: 9   Global Step: 396730   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:40,252-Speed 2625.46 samples/sec   Loss 7.0725   LearningRate 0.0272   Epoch: 9   Global Step: 396740   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:44,152-Speed 2626.65 samples/sec   Loss 6.9986   LearningRate 0.0272   Epoch: 9   Global Step: 396750   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:48,045-Speed 2630.89 samples/sec   Loss 7.1389   LearningRate 0.0272   Epoch: 9   Global Step: 396760   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:51,924-Speed 2639.97 samples/sec   Loss 7.0701   LearningRate 0.0272   Epoch: 9   Global Step: 396770   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:55,817-Speed 2631.33 samples/sec   Loss 6.9835   LearningRate 0.0272   Epoch: 9   Global Step: 396780   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:17:59,716-Speed 2626.90 samples/sec   Loss 6.9185   LearningRate 0.0272   Epoch: 9   Global Step: 396790   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:03,615-Speed 2627.01 samples/sec   Loss 6.9109   LearningRate 0.0272   Epoch: 9   Global Step: 396800   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:07,528-Speed 2617.12 samples/sec   Loss 6.9887   LearningRate 0.0272   Epoch: 9   Global Step: 396810   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:11,424-Speed 2628.55 samples/sec   Loss 6.8265   LearningRate 0.0272   Epoch: 9   Global Step: 396820   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:15,322-Speed 2628.28 samples/sec   Loss 6.9954   LearningRate 0.0272   Epoch: 9   Global Step: 396830   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:19,221-Speed 2626.89 samples/sec   Loss 7.0522   LearningRate 0.0272   Epoch: 9   Global Step: 396840   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:23,116-Speed 2629.16 samples/sec   Loss 6.9042   LearningRate 0.0272   Epoch: 9   Global Step: 396850   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:27,007-Speed 2632.58 samples/sec   Loss 7.0830   LearningRate 0.0272   Epoch: 9   Global Step: 396860   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:30,893-Speed 2635.94 samples/sec   Loss 7.0352   LearningRate 0.0272   Epoch: 9   Global Step: 396870   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:18:34,792-Speed 2626.44 samples/sec   Loss 6.9889   LearningRate 0.0272   Epoch: 9   Global Step: 396880   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:18:38,686-Speed 2630.70 samples/sec   Loss 7.0311   LearningRate 0.0272   Epoch: 9   Global Step: 396890   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:42,588-Speed 2624.36 samples/sec   Loss 7.0718   LearningRate 0.0272   Epoch: 9   Global Step: 396900   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:46,486-Speed 2628.15 samples/sec   Loss 7.0867   LearningRate 0.0272   Epoch: 9   Global Step: 396910   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:50,382-Speed 2628.66 samples/sec   Loss 6.9943   LearningRate 0.0272   Epoch: 9   Global Step: 396920   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:54,279-Speed 2628.55 samples/sec   Loss 6.9996   LearningRate 0.0272   Epoch: 9   Global Step: 396930   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:18:58,181-Speed 2624.86 samples/sec   Loss 7.1088   LearningRate 0.0272   Epoch: 9   Global Step: 396940   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:02,092-Speed 2618.69 samples/sec   Loss 6.9633   LearningRate 0.0272   Epoch: 9   Global Step: 396950   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:05,994-Speed 2624.99 samples/sec   Loss 6.9923   LearningRate 0.0272   Epoch: 9   Global Step: 396960   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:09,953-Speed 2587.04 samples/sec   Loss 6.9972   LearningRate 0.0272   Epoch: 9   Global Step: 396970   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:13,850-Speed 2627.78 samples/sec   Loss 6.9979   LearningRate 0.0272   Epoch: 9   Global Step: 396980   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:17,740-Speed 2633.41 samples/sec   Loss 6.9114   LearningRate 0.0272   Epoch: 9   Global Step: 396990   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:21,634-Speed 2630.30 samples/sec   Loss 6.9574   LearningRate 0.0272   Epoch: 9   Global Step: 397000   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:25,537-Speed 2623.73 samples/sec   Loss 6.9118   LearningRate 0.0272   Epoch: 9   Global Step: 397010   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:29,464-Speed 2608.66 samples/sec   Loss 6.9382   LearningRate 0.0272   Epoch: 9   Global Step: 397020   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:33,353-Speed 2633.55 samples/sec   Loss 6.8885   LearningRate 0.0272   Epoch: 9   Global Step: 397030   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:37,257-Speed 2623.65 samples/sec   Loss 6.9804   LearningRate 0.0272   Epoch: 9   Global Step: 397040   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:19:41,137-Speed 2639.28 samples/sec   Loss 7.1183   LearningRate 0.0272   Epoch: 9   Global Step: 397050   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:19:45,095-Speed 2588.29 samples/sec   Loss 6.9484   LearningRate 0.0272   Epoch: 9   Global Step: 397060   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:19:48,995-Speed 2625.90 samples/sec   Loss 6.9961   LearningRate 0.0272   Epoch: 9   Global Step: 397070   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:19:52,888-Speed 2631.17 samples/sec   Loss 7.0696   LearningRate 0.0272   Epoch: 9   Global Step: 397080   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:19:56,786-Speed 2627.37 samples/sec   Loss 6.8866   LearningRate 0.0272   Epoch: 9   Global Step: 397090   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:00,680-Speed 2630.42 samples/sec   Loss 6.9774   LearningRate 0.0272   Epoch: 9   Global Step: 397100   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:04,583-Speed 2624.14 samples/sec   Loss 7.0390   LearningRate 0.0272   Epoch: 9   Global Step: 397110   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:08,482-Speed 2626.89 samples/sec   Loss 6.9689   LearningRate 0.0272   Epoch: 9   Global Step: 397120   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:12,379-Speed 2628.45 samples/sec   Loss 6.8014   LearningRate 0.0272   Epoch: 9   Global Step: 397130   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:16,272-Speed 2630.41 samples/sec   Loss 6.9489   LearningRate 0.0272   Epoch: 9   Global Step: 397140   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:20,170-Speed 2627.63 samples/sec   Loss 6.9924   LearningRate 0.0272   Epoch: 9   Global Step: 397150   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:20:24,069-Speed 2626.84 samples/sec   Loss 7.0244   LearningRate 0.0272   Epoch: 9   Global Step: 397160   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:20:27,966-Speed 2628.73 samples/sec   Loss 7.0092   LearningRate 0.0272   Epoch: 9   Global Step: 397170   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:20:31,871-Speed 2622.45 samples/sec   Loss 6.9002   LearningRate 0.0272   Epoch: 9   Global Step: 397180   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:20:35,749-Speed 2640.93 samples/sec   Loss 6.9843   LearningRate 0.0272   Epoch: 9   Global Step: 397190   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:39,659-Speed 2619.64 samples/sec   Loss 6.9518   LearningRate 0.0272   Epoch: 9   Global Step: 397200   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:43,554-Speed 2629.59 samples/sec   Loss 6.9969   LearningRate 0.0272   Epoch: 9   Global Step: 397210   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:47,455-Speed 2625.64 samples/sec   Loss 6.9376   LearningRate 0.0272   Epoch: 9   Global Step: 397220   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:51,352-Speed 2628.39 samples/sec   Loss 7.0016   LearningRate 0.0272   Epoch: 9   Global Step: 397230   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:55,253-Speed 2625.41 samples/sec   Loss 6.8669   LearningRate 0.0272   Epoch: 9   Global Step: 397240   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:20:59,149-Speed 2629.50 samples/sec   Loss 6.9157   LearningRate 0.0272   Epoch: 9   Global Step: 397250   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:21:03,042-Speed 2630.77 samples/sec   Loss 7.0380   LearningRate 0.0272   Epoch: 9   Global Step: 397260   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:21:06,935-Speed 2630.52 samples/sec   Loss 6.9234   LearningRate 0.0272   Epoch: 9   Global Step: 397270   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:21:10,827-Speed 2631.40 samples/sec   Loss 6.9160   LearningRate 0.0272   Epoch: 9   Global Step: 397280   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:21:14,721-Speed 2630.40 samples/sec   Loss 7.0379   LearningRate 0.0272   Epoch: 9   Global Step: 397290   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:18,621-Speed 2626.10 samples/sec   Loss 6.9722   LearningRate 0.0272   Epoch: 9   Global Step: 397300   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:22,521-Speed 2626.59 samples/sec   Loss 6.9146   LearningRate 0.0272   Epoch: 9   Global Step: 397310   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:26,422-Speed 2625.47 samples/sec   Loss 7.0080   LearningRate 0.0272   Epoch: 9   Global Step: 397320   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:30,319-Speed 2628.96 samples/sec   Loss 7.0658   LearningRate 0.0271   Epoch: 9   Global Step: 397330   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:34,219-Speed 2625.71 samples/sec   Loss 7.0233   LearningRate 0.0271   Epoch: 9   Global Step: 397340   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:38,117-Speed 2627.54 samples/sec   Loss 7.0416   LearningRate 0.0271   Epoch: 9   Global Step: 397350   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:42,009-Speed 2631.35 samples/sec   Loss 7.0837   LearningRate 0.0271   Epoch: 9   Global Step: 397360   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:45,905-Speed 2628.88 samples/sec   Loss 7.0690   LearningRate 0.0271   Epoch: 9   Global Step: 397370   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:49,805-Speed 2626.26 samples/sec   Loss 6.8675   LearningRate 0.0271   Epoch: 9   Global Step: 397380   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:21:53,704-Speed 2627.23 samples/sec   Loss 6.9021   LearningRate 0.0271   Epoch: 9   Global Step: 397390   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:21:57,585-Speed 2638.75 samples/sec   Loss 7.0246   LearningRate 0.0271   Epoch: 9   Global Step: 397400   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:01,482-Speed 2628.53 samples/sec   Loss 6.9638   LearningRate 0.0271   Epoch: 9   Global Step: 397410   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:05,384-Speed 2624.85 samples/sec   Loss 7.1097   LearningRate 0.0271   Epoch: 9   Global Step: 397420   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:09,278-Speed 2630.09 samples/sec   Loss 6.9651   LearningRate 0.0271   Epoch: 9   Global Step: 397430   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:13,172-Speed 2630.97 samples/sec   Loss 7.0652   LearningRate 0.0271   Epoch: 9   Global Step: 397440   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:17,072-Speed 2625.73 samples/sec   Loss 6.9744   LearningRate 0.0271   Epoch: 9   Global Step: 397450   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:20,980-Speed 2621.23 samples/sec   Loss 6.9759   LearningRate 0.0271   Epoch: 9   Global Step: 397460   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:24,907-Speed 2608.18 samples/sec   Loss 6.9863   LearningRate 0.0271   Epoch: 9   Global Step: 397470   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:28,816-Speed 2620.00 samples/sec   Loss 7.0035   LearningRate 0.0271   Epoch: 9   Global Step: 397480   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:32,709-Speed 2631.26 samples/sec   Loss 7.0582   LearningRate 0.0271   Epoch: 9   Global Step: 397490   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:36,607-Speed 2627.42 samples/sec   Loss 6.9760   LearningRate 0.0271   Epoch: 9   Global Step: 397500   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:22:40,523-Speed 2615.56 samples/sec   Loss 6.9214   LearningRate 0.0271   Epoch: 9   Global Step: 397510   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:22:44,400-Speed 2641.85 samples/sec   Loss 6.9688   LearningRate 0.0271   Epoch: 9   Global Step: 397520   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:48,298-Speed 2627.83 samples/sec   Loss 6.9589   LearningRate 0.0271   Epoch: 9   Global Step: 397530   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:52,196-Speed 2627.79 samples/sec   Loss 7.1416   LearningRate 0.0271   Epoch: 9   Global Step: 397540   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:56,093-Speed 2628.21 samples/sec   Loss 7.0093   LearningRate 0.0271   Epoch: 9   Global Step: 397550   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:22:59,992-Speed 2626.91 samples/sec   Loss 6.9875   LearningRate 0.0271   Epoch: 9   Global Step: 397560   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:03,900-Speed 2621.06 samples/sec   Loss 6.9586   LearningRate 0.0271   Epoch: 9   Global Step: 397570   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:07,805-Speed 2622.66 samples/sec   Loss 6.9314   LearningRate 0.0271   Epoch: 9   Global Step: 397580   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:11,693-Speed 2633.92 samples/sec   Loss 6.8670   LearningRate 0.0271   Epoch: 9   Global Step: 397590   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:15,595-Speed 2624.57 samples/sec   Loss 6.9646   LearningRate 0.0271   Epoch: 9   Global Step: 397600   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:19,491-Speed 2629.91 samples/sec   Loss 6.9218   LearningRate 0.0271   Epoch: 9   Global Step: 397610   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:23,390-Speed 2626.71 samples/sec   Loss 6.9193   LearningRate 0.0271   Epoch: 9   Global Step: 397620   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:23:27,332-Speed 2598.48 samples/sec   Loss 6.9283   LearningRate 0.0271   Epoch: 9   Global Step: 397630   Fp16 Grad Scale: 262144   Required: 49 hours
Training: 2022-04-14 16:23:31,257-Speed 2609.44 samples/sec   Loss 6.9906   LearningRate 0.0271   Epoch: 9   Global Step: 397640   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:35,155-Speed 2627.61 samples/sec   Loss 6.8981   LearningRate 0.0271   Epoch: 9   Global Step: 397650   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:39,107-Speed 2591.63 samples/sec   Loss 6.9205   LearningRate 0.0271   Epoch: 9   Global Step: 397660   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:43,003-Speed 2628.81 samples/sec   Loss 6.9718   LearningRate 0.0271   Epoch: 9   Global Step: 397670   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:46,908-Speed 2622.74 samples/sec   Loss 6.9297   LearningRate 0.0271   Epoch: 9   Global Step: 397680   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:50,812-Speed 2623.36 samples/sec   Loss 6.9888   LearningRate 0.0271   Epoch: 9   Global Step: 397690   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:54,715-Speed 2624.47 samples/sec   Loss 6.9604   LearningRate 0.0271   Epoch: 9   Global Step: 397700   Fp16 Grad Scale: 131072   Required: 49 hours
Training: 2022-04-14 16:23:58,597-Speed 2638.37 samples/sec   Loss 6.9978   LearningRate 0.0271   Epoch: 9   Global Step: 397710   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:24:02,495-Speed 2627.88 samples/sec   Loss 7.0031   LearningRate 0.0271   Epoch: 9   Global Step: 397720   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:24:06,391-Speed 2628.80 samples/sec   Loss 6.9494   LearningRate 0.0271   Epoch: 9   Global Step: 397730   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:24:10,295-Speed 2623.71 samples/sec   Loss 7.0709   LearningRate 0.0271   Epoch: 9   Global Step: 397740   Fp16 Grad Scale: 65536   Required: 49 hours
Training: 2022-04-14 16:24:14,206-Speed 2618.69 samples/sec   Loss 7.0633   LearningRate 0.0271   Epoch: 9   Global Step: 397750   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:18,103-Speed 2628.33 samples/sec   Loss 6.8993   LearningRate 0.0271   Epoch: 9   Global Step: 397760   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:21,998-Speed 2629.46 samples/sec   Loss 7.0489   LearningRate 0.0271   Epoch: 9   Global Step: 397770   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:25,891-Speed 2631.17 samples/sec   Loss 6.8404   LearningRate 0.0271   Epoch: 9   Global Step: 397780   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:29,805-Speed 2617.04 samples/sec   Loss 6.9839   LearningRate 0.0271   Epoch: 9   Global Step: 397790   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:33,699-Speed 2630.55 samples/sec   Loss 6.9960   LearningRate 0.0271   Epoch: 9   Global Step: 397800   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:37,597-Speed 2626.87 samples/sec   Loss 7.0176   LearningRate 0.0271   Epoch: 9   Global Step: 397810   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:24:41,476-Speed 2640.97 samples/sec   Loss 6.9545   LearningRate 0.0271   Epoch: 9   Global Step: 397820   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:45,368-Speed 2631.61 samples/sec   Loss 7.0465   LearningRate 0.0271   Epoch: 9   Global Step: 397830   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:49,262-Speed 2630.10 samples/sec   Loss 6.9800   LearningRate 0.0271   Epoch: 9   Global Step: 397840   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:53,161-Speed 2627.10 samples/sec   Loss 7.0607   LearningRate 0.0271   Epoch: 9   Global Step: 397850   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:24:57,069-Speed 2620.93 samples/sec   Loss 7.0079   LearningRate 0.0271   Epoch: 9   Global Step: 397860   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:01,126-Speed 2524.17 samples/sec   Loss 6.8706   LearningRate 0.0271   Epoch: 9   Global Step: 397870   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:05,147-Speed 2547.10 samples/sec   Loss 6.9733   LearningRate 0.0271   Epoch: 9   Global Step: 397880   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:09,044-Speed 2628.76 samples/sec   Loss 7.0313   LearningRate 0.0271   Epoch: 9   Global Step: 397890   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:12,950-Speed 2621.95 samples/sec   Loss 6.9075   LearningRate 0.0271   Epoch: 9   Global Step: 397900   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:16,854-Speed 2623.48 samples/sec   Loss 6.8946   LearningRate 0.0271   Epoch: 9   Global Step: 397910   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:20,764-Speed 2619.83 samples/sec   Loss 6.9756   LearningRate 0.0271   Epoch: 9   Global Step: 397920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:25:24,666-Speed 2624.48 samples/sec   Loss 6.9764   LearningRate 0.0271   Epoch: 9   Global Step: 397930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:25:28,569-Speed 2624.37 samples/sec   Loss 6.8709   LearningRate 0.0271   Epoch: 9   Global Step: 397940   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:25:32,483-Speed 2616.54 samples/sec   Loss 6.8519   LearningRate 0.0271   Epoch: 9   Global Step: 397950   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:36,392-Speed 2620.34 samples/sec   Loss 6.9295   LearningRate 0.0271   Epoch: 9   Global Step: 397960   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:40,308-Speed 2615.08 samples/sec   Loss 6.9311   LearningRate 0.0271   Epoch: 9   Global Step: 397970   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:44,211-Speed 2624.54 samples/sec   Loss 6.9038   LearningRate 0.0271   Epoch: 9   Global Step: 397980   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:48,115-Speed 2623.53 samples/sec   Loss 6.9492   LearningRate 0.0271   Epoch: 9   Global Step: 397990   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:52,019-Speed 2624.16 samples/sec   Loss 6.9809   LearningRate 0.0271   Epoch: 9   Global Step: 398000   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:55,915-Speed 2629.01 samples/sec   Loss 6.9065   LearningRate 0.0271   Epoch: 9   Global Step: 398010   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:25:59,823-Speed 2620.60 samples/sec   Loss 6.8765   LearningRate 0.0271   Epoch: 9   Global Step: 398020   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:26:03,726-Speed 2623.89 samples/sec   Loss 7.1012   LearningRate 0.0271   Epoch: 9   Global Step: 398030   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:26:07,626-Speed 2626.17 samples/sec   Loss 6.9370   LearningRate 0.0271   Epoch: 9   Global Step: 398040   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:26:11,522-Speed 2629.27 samples/sec   Loss 6.8575   LearningRate 0.0271   Epoch: 9   Global Step: 398050   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:15,414-Speed 2631.40 samples/sec   Loss 6.9226   LearningRate 0.0271   Epoch: 9   Global Step: 398060   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:19,320-Speed 2621.94 samples/sec   Loss 6.9474   LearningRate 0.0271   Epoch: 9   Global Step: 398070   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:23,224-Speed 2624.12 samples/sec   Loss 6.9345   LearningRate 0.0271   Epoch: 9   Global Step: 398080   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:27,122-Speed 2627.62 samples/sec   Loss 6.9236   LearningRate 0.0271   Epoch: 9   Global Step: 398090   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:31,029-Speed 2621.77 samples/sec   Loss 6.9859   LearningRate 0.0271   Epoch: 9   Global Step: 398100   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:34,929-Speed 2626.09 samples/sec   Loss 7.0321   LearningRate 0.0271   Epoch: 9   Global Step: 398110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:38,825-Speed 2628.48 samples/sec   Loss 6.9437   LearningRate 0.0270   Epoch: 9   Global Step: 398120   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:42,722-Speed 2628.10 samples/sec   Loss 7.0683   LearningRate 0.0270   Epoch: 9   Global Step: 398130   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:46,620-Speed 2627.81 samples/sec   Loss 7.0161   LearningRate 0.0270   Epoch: 9   Global Step: 398140   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:50,537-Speed 2615.15 samples/sec   Loss 6.8072   LearningRate 0.0270   Epoch: 9   Global Step: 398150   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:26:54,416-Speed 2640.73 samples/sec   Loss 6.9174   LearningRate 0.0270   Epoch: 9   Global Step: 398160   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:26:58,312-Speed 2628.67 samples/sec   Loss 7.0060   LearningRate 0.0270   Epoch: 9   Global Step: 398170   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:02,225-Speed 2617.43 samples/sec   Loss 7.0382   LearningRate 0.0270   Epoch: 9   Global Step: 398180   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:06,119-Speed 2630.73 samples/sec   Loss 6.8634   LearningRate 0.0270   Epoch: 9   Global Step: 398190   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:10,018-Speed 2627.21 samples/sec   Loss 6.9147   LearningRate 0.0270   Epoch: 9   Global Step: 398200   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:13,925-Speed 2621.59 samples/sec   Loss 6.9675   LearningRate 0.0270   Epoch: 9   Global Step: 398210   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:17,825-Speed 2626.17 samples/sec   Loss 6.8001   LearningRate 0.0270   Epoch: 9   Global Step: 398220   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:21,729-Speed 2623.18 samples/sec   Loss 6.9120   LearningRate 0.0270   Epoch: 9   Global Step: 398230   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:25,642-Speed 2618.17 samples/sec   Loss 6.9880   LearningRate 0.0270   Epoch: 9   Global Step: 398240   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:29,552-Speed 2619.54 samples/sec   Loss 7.0000   LearningRate 0.0270   Epoch: 9   Global Step: 398250   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:33,434-Speed 2638.30 samples/sec   Loss 7.0355   LearningRate 0.0270   Epoch: 9   Global Step: 398260   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:37,334-Speed 2626.12 samples/sec   Loss 6.9700   LearningRate 0.0270   Epoch: 9   Global Step: 398270   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:41,231-Speed 2628.04 samples/sec   Loss 7.1031   LearningRate 0.0270   Epoch: 9   Global Step: 398280   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:45,130-Speed 2627.10 samples/sec   Loss 6.9826   LearningRate 0.0270   Epoch: 9   Global Step: 398290   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:49,032-Speed 2624.71 samples/sec   Loss 6.8230   LearningRate 0.0270   Epoch: 9   Global Step: 398300   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:52,927-Speed 2629.70 samples/sec   Loss 6.9134   LearningRate 0.0270   Epoch: 9   Global Step: 398310   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:27:56,824-Speed 2628.17 samples/sec   Loss 7.0150   LearningRate 0.0270   Epoch: 9   Global Step: 398320   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:28:00,717-Speed 2631.22 samples/sec   Loss 6.9222   LearningRate 0.0270   Epoch: 9   Global Step: 398330   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:28:04,615-Speed 2627.38 samples/sec   Loss 6.9278   LearningRate 0.0270   Epoch: 9   Global Step: 398340   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:28:08,514-Speed 2627.31 samples/sec   Loss 6.9689   LearningRate 0.0270   Epoch: 9   Global Step: 398350   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:28:12,412-Speed 2627.03 samples/sec   Loss 6.9059   LearningRate 0.0270   Epoch: 9   Global Step: 398360   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:28:16,307-Speed 2630.08 samples/sec   Loss 6.9953   LearningRate 0.0270   Epoch: 9   Global Step: 398370   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:28:20,207-Speed 2626.22 samples/sec   Loss 7.1036   LearningRate 0.0270   Epoch: 9   Global Step: 398380   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:28:24,090-Speed 2637.88 samples/sec   Loss 6.9656   LearningRate 0.0270   Epoch: 9   Global Step: 398390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:28,003-Speed 2617.69 samples/sec   Loss 7.0843   LearningRate 0.0270   Epoch: 9   Global Step: 398400   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:31,892-Speed 2633.78 samples/sec   Loss 7.0359   LearningRate 0.0270   Epoch: 9   Global Step: 398410   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:35,792-Speed 2626.29 samples/sec   Loss 6.8342   LearningRate 0.0270   Epoch: 9   Global Step: 398420   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:39,688-Speed 2628.93 samples/sec   Loss 6.9682   LearningRate 0.0270   Epoch: 9   Global Step: 398430   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:43,589-Speed 2626.07 samples/sec   Loss 7.0716   LearningRate 0.0270   Epoch: 9   Global Step: 398440   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:47,484-Speed 2629.18 samples/sec   Loss 6.8836   LearningRate 0.0270   Epoch: 9   Global Step: 398450   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:51,379-Speed 2629.92 samples/sec   Loss 7.0413   LearningRate 0.0270   Epoch: 9   Global Step: 398460   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:55,272-Speed 2630.80 samples/sec   Loss 6.9398   LearningRate 0.0270   Epoch: 9   Global Step: 398470   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:28:59,164-Speed 2631.69 samples/sec   Loss 7.0328   LearningRate 0.0270   Epoch: 9   Global Step: 398480   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:29:03,063-Speed 2627.36 samples/sec   Loss 6.8765   LearningRate 0.0270   Epoch: 9   Global Step: 398490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:06,954-Speed 2632.10 samples/sec   Loss 6.7974   LearningRate 0.0270   Epoch: 9   Global Step: 398500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:10,849-Speed 2630.01 samples/sec   Loss 7.1945   LearningRate 0.0270   Epoch: 9   Global Step: 398510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:14,744-Speed 2628.97 samples/sec   Loss 6.9685   LearningRate 0.0270   Epoch: 9   Global Step: 398520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:18,679-Speed 2603.17 samples/sec   Loss 6.9532   LearningRate 0.0270   Epoch: 9   Global Step: 398530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:22,576-Speed 2628.34 samples/sec   Loss 7.0278   LearningRate 0.0270   Epoch: 9   Global Step: 398540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:26,477-Speed 2626.35 samples/sec   Loss 6.9493   LearningRate 0.0270   Epoch: 9   Global Step: 398550   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:30,371-Speed 2630.03 samples/sec   Loss 7.0045   LearningRate 0.0270   Epoch: 9   Global Step: 398560   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:34,271-Speed 2626.86 samples/sec   Loss 6.9320   LearningRate 0.0270   Epoch: 9   Global Step: 398570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:38,168-Speed 2628.18 samples/sec   Loss 7.0698   LearningRate 0.0270   Epoch: 9   Global Step: 398580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:42,045-Speed 2641.46 samples/sec   Loss 6.9608   LearningRate 0.0270   Epoch: 9   Global Step: 398590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:45,944-Speed 2626.70 samples/sec   Loss 7.0397   LearningRate 0.0270   Epoch: 9   Global Step: 398600   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:49,850-Speed 2622.82 samples/sec   Loss 6.8829   LearningRate 0.0270   Epoch: 9   Global Step: 398610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:53,750-Speed 2626.09 samples/sec   Loss 6.9358   LearningRate 0.0270   Epoch: 9   Global Step: 398620   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:29:57,648-Speed 2628.13 samples/sec   Loss 6.8859   LearningRate 0.0270   Epoch: 9   Global Step: 398630   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:30:01,549-Speed 2625.37 samples/sec   Loss 6.9224   LearningRate 0.0270   Epoch: 9   Global Step: 398640   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:30:05,441-Speed 2632.01 samples/sec   Loss 6.9392   LearningRate 0.0270   Epoch: 9   Global Step: 398650   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:30:09,339-Speed 2627.14 samples/sec   Loss 6.9387   LearningRate 0.0270   Epoch: 9   Global Step: 398660   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:30:13,216-Speed 2642.09 samples/sec   Loss 6.9725   LearningRate 0.0270   Epoch: 9   Global Step: 398670   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:17,129-Speed 2616.96 samples/sec   Loss 6.9758   LearningRate 0.0270   Epoch: 9   Global Step: 398680   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:21,022-Speed 2631.50 samples/sec   Loss 7.1484   LearningRate 0.0270   Epoch: 9   Global Step: 398690   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:24,926-Speed 2623.72 samples/sec   Loss 6.9491   LearningRate 0.0270   Epoch: 9   Global Step: 398700   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:28,827-Speed 2625.22 samples/sec   Loss 6.9998   LearningRate 0.0270   Epoch: 9   Global Step: 398710   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:32,729-Speed 2624.89 samples/sec   Loss 6.9689   LearningRate 0.0270   Epoch: 9   Global Step: 398720   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:36,620-Speed 2632.38 samples/sec   Loss 6.9936   LearningRate 0.0270   Epoch: 9   Global Step: 398730   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:40,514-Speed 2630.79 samples/sec   Loss 6.9704   LearningRate 0.0270   Epoch: 9   Global Step: 398740   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:44,409-Speed 2629.49 samples/sec   Loss 6.9822   LearningRate 0.0270   Epoch: 9   Global Step: 398750   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:48,310-Speed 2625.89 samples/sec   Loss 6.8314   LearningRate 0.0270   Epoch: 9   Global Step: 398760   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:30:52,219-Speed 2620.00 samples/sec   Loss 6.9764   LearningRate 0.0270   Epoch: 9   Global Step: 398770   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:30:56,117-Speed 2628.06 samples/sec   Loss 6.8979   LearningRate 0.0270   Epoch: 9   Global Step: 398780   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:00,014-Speed 2628.28 samples/sec   Loss 7.0244   LearningRate 0.0270   Epoch: 9   Global Step: 398790   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:03,922-Speed 2620.93 samples/sec   Loss 7.0749   LearningRate 0.0270   Epoch: 9   Global Step: 398800   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:07,821-Speed 2626.50 samples/sec   Loss 7.0099   LearningRate 0.0270   Epoch: 9   Global Step: 398810   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:11,720-Speed 2627.51 samples/sec   Loss 7.0482   LearningRate 0.0270   Epoch: 9   Global Step: 398820   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:15,615-Speed 2629.71 samples/sec   Loss 7.0996   LearningRate 0.0270   Epoch: 9   Global Step: 398830   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:19,525-Speed 2619.78 samples/sec   Loss 7.0570   LearningRate 0.0270   Epoch: 9   Global Step: 398840   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:23,434-Speed 2620.31 samples/sec   Loss 7.0154   LearningRate 0.0270   Epoch: 9   Global Step: 398850   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:27,343-Speed 2619.84 samples/sec   Loss 6.9535   LearningRate 0.0270   Epoch: 9   Global Step: 398860   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:31,252-Speed 2621.02 samples/sec   Loss 7.0158   LearningRate 0.0270   Epoch: 9   Global Step: 398870   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:31:35,146-Speed 2630.23 samples/sec   Loss 6.8261   LearningRate 0.0270   Epoch: 9   Global Step: 398880   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:31:39,047-Speed 2625.54 samples/sec   Loss 6.9929   LearningRate 0.0270   Epoch: 9   Global Step: 398890   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:31:42,954-Speed 2621.18 samples/sec   Loss 6.8703   LearningRate 0.0270   Epoch: 9   Global Step: 398900   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:31:46,849-Speed 2630.14 samples/sec   Loss 6.9589   LearningRate 0.0270   Epoch: 9   Global Step: 398910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:50,744-Speed 2630.25 samples/sec   Loss 6.9633   LearningRate 0.0269   Epoch: 9   Global Step: 398920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:54,639-Speed 2629.51 samples/sec   Loss 6.9538   LearningRate 0.0269   Epoch: 9   Global Step: 398930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:31:58,536-Speed 2628.99 samples/sec   Loss 6.9916   LearningRate 0.0269   Epoch: 9   Global Step: 398940   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:02,436-Speed 2625.94 samples/sec   Loss 7.0309   LearningRate 0.0269   Epoch: 9   Global Step: 398950   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:06,505-Speed 2517.05 samples/sec   Loss 6.8817   LearningRate 0.0269   Epoch: 9   Global Step: 398960   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:10,464-Speed 2587.59 samples/sec   Loss 6.8971   LearningRate 0.0269   Epoch: 9   Global Step: 398970   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:14,361-Speed 2628.40 samples/sec   Loss 6.8654   LearningRate 0.0269   Epoch: 9   Global Step: 398980   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:18,256-Speed 2629.24 samples/sec   Loss 7.0198   LearningRate 0.0269   Epoch: 9   Global Step: 398990   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:22,182-Speed 2609.18 samples/sec   Loss 6.9918   LearningRate 0.0269   Epoch: 9   Global Step: 399000   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:26,059-Speed 2641.93 samples/sec   Loss 6.9116   LearningRate 0.0269   Epoch: 9   Global Step: 399010   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:30,091-Speed 2540.63 samples/sec   Loss 6.9402   LearningRate 0.0269   Epoch: 9   Global Step: 399020   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:33,986-Speed 2630.08 samples/sec   Loss 7.0488   LearningRate 0.0269   Epoch: 9   Global Step: 399030   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:37,879-Speed 2630.49 samples/sec   Loss 7.0015   LearningRate 0.0269   Epoch: 9   Global Step: 399040   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:41,815-Speed 2602.17 samples/sec   Loss 6.9784   LearningRate 0.0269   Epoch: 9   Global Step: 399050   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:45,714-Speed 2627.13 samples/sec   Loss 6.8620   LearningRate 0.0269   Epoch: 9   Global Step: 399060   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:49,621-Speed 2621.62 samples/sec   Loss 7.0341   LearningRate 0.0269   Epoch: 9   Global Step: 399070   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:53,520-Speed 2627.45 samples/sec   Loss 7.0379   LearningRate 0.0269   Epoch: 9   Global Step: 399080   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:32:57,419-Speed 2626.45 samples/sec   Loss 6.9929   LearningRate 0.0269   Epoch: 9   Global Step: 399090   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:01,320-Speed 2626.38 samples/sec   Loss 6.9923   LearningRate 0.0269   Epoch: 9   Global Step: 399100   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:05,203-Speed 2637.60 samples/sec   Loss 6.9354   LearningRate 0.0269   Epoch: 9   Global Step: 399110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:09,105-Speed 2624.54 samples/sec   Loss 6.8036   LearningRate 0.0269   Epoch: 9   Global Step: 399120   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:13,014-Speed 2619.97 samples/sec   Loss 6.9233   LearningRate 0.0269   Epoch: 9   Global Step: 399130   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:16,918-Speed 2623.98 samples/sec   Loss 7.0334   LearningRate 0.0269   Epoch: 9   Global Step: 399140   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:20,817-Speed 2626.67 samples/sec   Loss 6.9517   LearningRate 0.0269   Epoch: 9   Global Step: 399150   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:24,726-Speed 2620.42 samples/sec   Loss 7.0624   LearningRate 0.0269   Epoch: 9   Global Step: 399160   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:28,629-Speed 2624.89 samples/sec   Loss 6.8876   LearningRate 0.0269   Epoch: 9   Global Step: 399170   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:32,550-Speed 2611.71 samples/sec   Loss 6.8729   LearningRate 0.0269   Epoch: 9   Global Step: 399180   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:36,443-Speed 2631.21 samples/sec   Loss 6.8661   LearningRate 0.0269   Epoch: 9   Global Step: 399190   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:40,339-Speed 2628.65 samples/sec   Loss 6.9740   LearningRate 0.0269   Epoch: 9   Global Step: 399200   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:44,284-Speed 2596.20 samples/sec   Loss 6.7919   LearningRate 0.0269   Epoch: 9   Global Step: 399210   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:33:48,164-Speed 2640.46 samples/sec   Loss 7.0402   LearningRate 0.0269   Epoch: 9   Global Step: 399220   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:52,061-Speed 2628.40 samples/sec   Loss 7.1000   LearningRate 0.0269   Epoch: 9   Global Step: 399230   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:55,961-Speed 2626.29 samples/sec   Loss 6.9432   LearningRate 0.0269   Epoch: 9   Global Step: 399240   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:33:59,868-Speed 2621.85 samples/sec   Loss 7.1353   LearningRate 0.0269   Epoch: 9   Global Step: 399250   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:03,779-Speed 2618.49 samples/sec   Loss 6.8572   LearningRate 0.0269   Epoch: 9   Global Step: 399260   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:07,673-Speed 2630.51 samples/sec   Loss 7.0322   LearningRate 0.0269   Epoch: 9   Global Step: 399270   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:11,561-Speed 2634.28 samples/sec   Loss 6.8599   LearningRate 0.0269   Epoch: 9   Global Step: 399280   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:15,468-Speed 2621.73 samples/sec   Loss 6.9030   LearningRate 0.0269   Epoch: 9   Global Step: 399290   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:19,401-Speed 2604.08 samples/sec   Loss 7.0178   LearningRate 0.0269   Epoch: 9   Global Step: 399300   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:23,296-Speed 2629.97 samples/sec   Loss 6.9936   LearningRate 0.0269   Epoch: 9   Global Step: 399310   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:27,180-Speed 2637.33 samples/sec   Loss 6.8928   LearningRate 0.0269   Epoch: 9   Global Step: 399320   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:31,077-Speed 2628.33 samples/sec   Loss 6.9749   LearningRate 0.0269   Epoch: 9   Global Step: 399330   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:34,987-Speed 2619.68 samples/sec   Loss 7.1003   LearningRate 0.0269   Epoch: 9   Global Step: 399340   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:38,882-Speed 2629.14 samples/sec   Loss 6.9120   LearningRate 0.0269   Epoch: 9   Global Step: 399350   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:42,781-Speed 2627.67 samples/sec   Loss 7.0195   LearningRate 0.0269   Epoch: 9   Global Step: 399360   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:46,681-Speed 2626.02 samples/sec   Loss 6.9320   LearningRate 0.0269   Epoch: 9   Global Step: 399370   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:50,576-Speed 2629.46 samples/sec   Loss 6.9707   LearningRate 0.0269   Epoch: 9   Global Step: 399380   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:54,506-Speed 2606.08 samples/sec   Loss 6.9461   LearningRate 0.0269   Epoch: 9   Global Step: 399390   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:34:58,411-Speed 2623.39 samples/sec   Loss 6.8727   LearningRate 0.0269   Epoch: 9   Global Step: 399400   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:35:02,319-Speed 2620.95 samples/sec   Loss 7.0013   LearningRate 0.0269   Epoch: 9   Global Step: 399410   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:35:06,211-Speed 2632.39 samples/sec   Loss 6.9407   LearningRate 0.0269   Epoch: 9   Global Step: 399420   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:35:10,138-Speed 2608.31 samples/sec   Loss 6.9246   LearningRate 0.0269   Epoch: 9   Global Step: 399430   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:35:14,033-Speed 2629.52 samples/sec   Loss 7.0121   LearningRate 0.0269   Epoch: 9   Global Step: 399440   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:35:17,912-Speed 2640.80 samples/sec   Loss 6.9827   LearningRate 0.0269   Epoch: 9   Global Step: 399450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:35:21,804-Speed 2631.76 samples/sec   Loss 6.9308   LearningRate 0.0269   Epoch: 9   Global Step: 399460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:35:25,704-Speed 2626.42 samples/sec   Loss 6.9528   LearningRate 0.0269   Epoch: 9   Global Step: 399470   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:35:29,577-Speed 2644.92 samples/sec   Loss 6.7828   LearningRate 0.0269   Epoch: 9   Global Step: 399480   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:35:33,479-Speed 2625.40 samples/sec   Loss 6.9232   LearningRate 0.0269   Epoch: 9   Global Step: 399490   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:35:37,373-Speed 2630.30 samples/sec   Loss 6.8677   LearningRate 0.0269   Epoch: 9   Global Step: 399500   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:35:41,267-Speed 2629.85 samples/sec   Loss 7.0009   LearningRate 0.0269   Epoch: 9   Global Step: 399510   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:35:45,169-Speed 2624.97 samples/sec   Loss 6.8322   LearningRate 0.0269   Epoch: 9   Global Step: 399520   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:35:49,069-Speed 2627.31 samples/sec   Loss 7.0566   LearningRate 0.0269   Epoch: 9   Global Step: 399530   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:35:52,977-Speed 2620.53 samples/sec   Loss 6.9649   LearningRate 0.0269   Epoch: 9   Global Step: 399540   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:35:56,878-Speed 2625.62 samples/sec   Loss 6.8842   LearningRate 0.0269   Epoch: 9   Global Step: 399550   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:36:00,795-Speed 2614.49 samples/sec   Loss 7.0300   LearningRate 0.0269   Epoch: 9   Global Step: 399560   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:36:04,700-Speed 2623.69 samples/sec   Loss 6.9779   LearningRate 0.0269   Epoch: 9   Global Step: 399570   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:36:08,597-Speed 2628.08 samples/sec   Loss 7.0031   LearningRate 0.0269   Epoch: 9   Global Step: 399580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:12,523-Speed 2608.97 samples/sec   Loss 6.9954   LearningRate 0.0269   Epoch: 9   Global Step: 399590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:16,425-Speed 2624.61 samples/sec   Loss 6.9518   LearningRate 0.0269   Epoch: 9   Global Step: 399600   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:20,321-Speed 2629.41 samples/sec   Loss 6.9756   LearningRate 0.0269   Epoch: 9   Global Step: 399610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:24,215-Speed 2630.14 samples/sec   Loss 6.9306   LearningRate 0.0269   Epoch: 9   Global Step: 399620   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:28,111-Speed 2629.06 samples/sec   Loss 6.9946   LearningRate 0.0269   Epoch: 9   Global Step: 399630   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:32,008-Speed 2628.68 samples/sec   Loss 6.9382   LearningRate 0.0269   Epoch: 9   Global Step: 399640   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:35,907-Speed 2627.21 samples/sec   Loss 6.9647   LearningRate 0.0269   Epoch: 9   Global Step: 399650   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:39,840-Speed 2603.91 samples/sec   Loss 6.8688   LearningRate 0.0269   Epoch: 9   Global Step: 399660   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:43,808-Speed 2581.50 samples/sec   Loss 7.0031   LearningRate 0.0269   Epoch: 9   Global Step: 399670   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:47,856-Speed 2530.54 samples/sec   Loss 6.9784   LearningRate 0.0269   Epoch: 9   Global Step: 399680   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:51,897-Speed 2534.83 samples/sec   Loss 7.0223   LearningRate 0.0269   Epoch: 9   Global Step: 399690   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:55,820-Speed 2610.91 samples/sec   Loss 6.8799   LearningRate 0.0269   Epoch: 9   Global Step: 399700   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:36:59,721-Speed 2625.67 samples/sec   Loss 6.9756   LearningRate 0.0269   Epoch: 9   Global Step: 399710   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:03,655-Speed 2603.52 samples/sec   Loss 6.9371   LearningRate 0.0268   Epoch: 9   Global Step: 399720   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:07,563-Speed 2620.89 samples/sec   Loss 6.9051   LearningRate 0.0268   Epoch: 9   Global Step: 399730   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:11,490-Speed 2608.38 samples/sec   Loss 6.9668   LearningRate 0.0268   Epoch: 9   Global Step: 399740   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:15,484-Speed 2564.25 samples/sec   Loss 6.9290   LearningRate 0.0268   Epoch: 9   Global Step: 399750   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:19,382-Speed 2627.83 samples/sec   Loss 6.9639   LearningRate 0.0268   Epoch: 9   Global Step: 399760   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:23,278-Speed 2628.91 samples/sec   Loss 6.8519   LearningRate 0.0268   Epoch: 9   Global Step: 399770   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:27,176-Speed 2627.96 samples/sec   Loss 6.8526   LearningRate 0.0268   Epoch: 9   Global Step: 399780   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:37:31,046-Speed 2646.81 samples/sec   Loss 6.8892   LearningRate 0.0268   Epoch: 9   Global Step: 399790   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:34,951-Speed 2622.86 samples/sec   Loss 6.8986   LearningRate 0.0268   Epoch: 9   Global Step: 399800   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:38,854-Speed 2623.81 samples/sec   Loss 6.9100   LearningRate 0.0268   Epoch: 9   Global Step: 399810   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:42,761-Speed 2621.34 samples/sec   Loss 6.8357   LearningRate 0.0268   Epoch: 9   Global Step: 399820   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:46,660-Speed 2628.32 samples/sec   Loss 6.9182   LearningRate 0.0268   Epoch: 9   Global Step: 399830   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:50,555-Speed 2629.31 samples/sec   Loss 7.0446   LearningRate 0.0268   Epoch: 9   Global Step: 399840   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:54,457-Speed 2625.22 samples/sec   Loss 7.0453   LearningRate 0.0268   Epoch: 9   Global Step: 399850   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:37:58,354-Speed 2628.29 samples/sec   Loss 6.8828   LearningRate 0.0268   Epoch: 9   Global Step: 399860   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:02,251-Speed 2628.61 samples/sec   Loss 6.9459   LearningRate 0.0268   Epoch: 9   Global Step: 399870   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:06,157-Speed 2621.85 samples/sec   Loss 6.9343   LearningRate 0.0268   Epoch: 9   Global Step: 399880   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:10,055-Speed 2627.38 samples/sec   Loss 6.8800   LearningRate 0.0268   Epoch: 9   Global Step: 399890   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:38:13,934-Speed 2640.53 samples/sec   Loss 6.9069   LearningRate 0.0268   Epoch: 9   Global Step: 399900   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:17,830-Speed 2629.41 samples/sec   Loss 7.0784   LearningRate 0.0268   Epoch: 9   Global Step: 399910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:21,732-Speed 2624.93 samples/sec   Loss 6.9707   LearningRate 0.0268   Epoch: 9   Global Step: 399920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:25,634-Speed 2624.90 samples/sec   Loss 6.8471   LearningRate 0.0268   Epoch: 9   Global Step: 399930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:29,529-Speed 2629.97 samples/sec   Loss 6.9568   LearningRate 0.0268   Epoch: 9   Global Step: 399940   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:33,434-Speed 2622.70 samples/sec   Loss 6.8514   LearningRate 0.0268   Epoch: 9   Global Step: 399950   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:37,342-Speed 2620.54 samples/sec   Loss 6.9225   LearningRate 0.0268   Epoch: 9   Global Step: 399960   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:41,240-Speed 2627.50 samples/sec   Loss 6.9263   LearningRate 0.0268   Epoch: 9   Global Step: 399970   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:45,140-Speed 2626.82 samples/sec   Loss 6.9109   LearningRate 0.0268   Epoch: 9   Global Step: 399980   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:49,061-Speed 2612.46 samples/sec   Loss 6.9075   LearningRate 0.0268   Epoch: 9   Global Step: 399990   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:38:53,015-Speed 2590.42 samples/sec   Loss 6.9768   LearningRate 0.0268   Epoch: 9   Global Step: 400000   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:39:36,028-[lfw][400000]XNorm: 23.593337
Training: 2022-04-14 16:39:36,029-[lfw][400000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 16:39:36,029-[lfw][400000]Accuracy-Highest: 0.99783
Training: 2022-04-14 16:40:25,997-[cfp_fp][400000]XNorm: 21.596733
Training: 2022-04-14 16:40:26,397-[cfp_fp][400000]Accuracy-Flip: 0.98757+-0.00599
Training: 2022-04-14 16:40:26,397-[cfp_fp][400000]Accuracy-Highest: 0.98757
Training: 2022-04-14 16:41:09,421-[agedb_30][400000]XNorm: 23.452849
Training: 2022-04-14 16:41:09,422-[agedb_30][400000]Accuracy-Flip: 0.97600+-0.00735
Training: 2022-04-14 16:41:09,423-[agedb_30][400000]Accuracy-Highest: 0.97700
Training: 2022-04-14 16:41:13,284-Speed 73.00 samples/sec   Loss 6.9528   LearningRate 0.0268   Epoch: 9   Global Step: 400010   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:17,171-Speed 2635.21 samples/sec   Loss 6.9368   LearningRate 0.0268   Epoch: 9   Global Step: 400020   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:21,051-Speed 2639.65 samples/sec   Loss 6.9735   LearningRate 0.0268   Epoch: 9   Global Step: 400030   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:24,930-Speed 2640.35 samples/sec   Loss 6.8156   LearningRate 0.0268   Epoch: 9   Global Step: 400040   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:28,813-Speed 2637.34 samples/sec   Loss 6.8604   LearningRate 0.0268   Epoch: 9   Global Step: 400050   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:32,716-Speed 2624.57 samples/sec   Loss 6.8941   LearningRate 0.0268   Epoch: 9   Global Step: 400060   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:36,622-Speed 2621.73 samples/sec   Loss 6.9199   LearningRate 0.0268   Epoch: 9   Global Step: 400070   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:40,518-Speed 2628.94 samples/sec   Loss 6.7846   LearningRate 0.0268   Epoch: 9   Global Step: 400080   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:44,408-Speed 2633.23 samples/sec   Loss 7.0424   LearningRate 0.0268   Epoch: 9   Global Step: 400090   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:48,297-Speed 2633.95 samples/sec   Loss 7.0460   LearningRate 0.0268   Epoch: 9   Global Step: 400100   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:41:52,190-Speed 2631.04 samples/sec   Loss 6.8042   LearningRate 0.0268   Epoch: 9   Global Step: 400110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:41:56,086-Speed 2628.58 samples/sec   Loss 6.9918   LearningRate 0.0268   Epoch: 9   Global Step: 400120   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:41:59,977-Speed 2632.18 samples/sec   Loss 7.0716   LearningRate 0.0268   Epoch: 9   Global Step: 400130   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:03,940-Speed 2584.66 samples/sec   Loss 6.9859   LearningRate 0.0268   Epoch: 9   Global Step: 400140   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:07,837-Speed 2628.53 samples/sec   Loss 7.0308   LearningRate 0.0268   Epoch: 9   Global Step: 400150   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:11,727-Speed 2633.04 samples/sec   Loss 7.0142   LearningRate 0.0268   Epoch: 9   Global Step: 400160   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:15,651-Speed 2610.17 samples/sec   Loss 6.8237   LearningRate 0.0268   Epoch: 9   Global Step: 400170   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:19,554-Speed 2624.55 samples/sec   Loss 6.8652   LearningRate 0.0268   Epoch: 9   Global Step: 400180   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:23,460-Speed 2622.33 samples/sec   Loss 6.8525   LearningRate 0.0268   Epoch: 9   Global Step: 400190   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:27,364-Speed 2623.29 samples/sec   Loss 6.8932   LearningRate 0.0268   Epoch: 9   Global Step: 400200   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:31,241-Speed 2641.73 samples/sec   Loss 6.8991   LearningRate 0.0268   Epoch: 9   Global Step: 400210   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:35,142-Speed 2626.14 samples/sec   Loss 6.8454   LearningRate 0.0268   Epoch: 9   Global Step: 400220   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:39,040-Speed 2627.63 samples/sec   Loss 6.9660   LearningRate 0.0268   Epoch: 9   Global Step: 400230   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:42,957-Speed 2615.24 samples/sec   Loss 7.0036   LearningRate 0.0268   Epoch: 9   Global Step: 400240   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:46,856-Speed 2626.71 samples/sec   Loss 6.9189   LearningRate 0.0268   Epoch: 9   Global Step: 400250   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:50,756-Speed 2626.47 samples/sec   Loss 6.9871   LearningRate 0.0268   Epoch: 9   Global Step: 400260   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:54,657-Speed 2625.48 samples/sec   Loss 6.9538   LearningRate 0.0268   Epoch: 9   Global Step: 400270   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:42:58,548-Speed 2632.28 samples/sec   Loss 6.8936   LearningRate 0.0268   Epoch: 9   Global Step: 400280   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:43:02,451-Speed 2624.24 samples/sec   Loss 7.0190   LearningRate 0.0268   Epoch: 9   Global Step: 400290   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:43:06,362-Speed 2619.09 samples/sec   Loss 6.9575   LearningRate 0.0268   Epoch: 9   Global Step: 400300   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:43:10,261-Speed 2626.86 samples/sec   Loss 7.0624   LearningRate 0.0268   Epoch: 9   Global Step: 400310   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:43:14,128-Speed 2649.00 samples/sec   Loss 6.8405   LearningRate 0.0268   Epoch: 9   Global Step: 400320   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:18,045-Speed 2615.00 samples/sec   Loss 6.8063   LearningRate 0.0268   Epoch: 9   Global Step: 400330   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:21,939-Speed 2630.46 samples/sec   Loss 6.9506   LearningRate 0.0268   Epoch: 9   Global Step: 400340   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:25,835-Speed 2628.64 samples/sec   Loss 6.9437   LearningRate 0.0268   Epoch: 9   Global Step: 400350   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:29,738-Speed 2624.36 samples/sec   Loss 6.9404   LearningRate 0.0268   Epoch: 9   Global Step: 400360   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:33,640-Speed 2624.57 samples/sec   Loss 6.9219   LearningRate 0.0268   Epoch: 9   Global Step: 400370   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:37,695-Speed 2526.44 samples/sec   Loss 6.8626   LearningRate 0.0268   Epoch: 9   Global Step: 400380   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:41,782-Speed 2505.85 samples/sec   Loss 7.0048   LearningRate 0.0268   Epoch: 9   Global Step: 400390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:45,713-Speed 2605.67 samples/sec   Loss 6.9155   LearningRate 0.0268   Epoch: 9   Global Step: 400400   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:49,613-Speed 2626.81 samples/sec   Loss 6.8409   LearningRate 0.0268   Epoch: 9   Global Step: 400410   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:43:53,513-Speed 2626.31 samples/sec   Loss 6.9290   LearningRate 0.0268   Epoch: 9   Global Step: 400420   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:43:57,415-Speed 2624.70 samples/sec   Loss 6.9509   LearningRate 0.0268   Epoch: 9   Global Step: 400430   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:01,313-Speed 2627.45 samples/sec   Loss 6.9726   LearningRate 0.0268   Epoch: 9   Global Step: 400440   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:05,213-Speed 2626.71 samples/sec   Loss 6.8815   LearningRate 0.0268   Epoch: 9   Global Step: 400450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:09,134-Speed 2612.10 samples/sec   Loss 6.9576   LearningRate 0.0268   Epoch: 9   Global Step: 400460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:13,036-Speed 2624.90 samples/sec   Loss 6.9145   LearningRate 0.0268   Epoch: 9   Global Step: 400470   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:16,947-Speed 2619.26 samples/sec   Loss 6.8865   LearningRate 0.0268   Epoch: 9   Global Step: 400480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:20,850-Speed 2624.28 samples/sec   Loss 6.9131   LearningRate 0.0268   Epoch: 9   Global Step: 400490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:24,747-Speed 2628.00 samples/sec   Loss 6.9682   LearningRate 0.0268   Epoch: 9   Global Step: 400500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:28,643-Speed 2629.02 samples/sec   Loss 6.9748   LearningRate 0.0268   Epoch: 9   Global Step: 400510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:32,542-Speed 2627.31 samples/sec   Loss 6.8952   LearningRate 0.0267   Epoch: 9   Global Step: 400520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:36,447-Speed 2622.67 samples/sec   Loss 6.8533   LearningRate 0.0267   Epoch: 9   Global Step: 400530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:40,355-Speed 2621.21 samples/sec   Loss 6.9403   LearningRate 0.0267   Epoch: 9   Global Step: 400540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:44,253-Speed 2627.55 samples/sec   Loss 6.8308   LearningRate 0.0267   Epoch: 9   Global Step: 400550   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:48,152-Speed 2627.91 samples/sec   Loss 6.9336   LearningRate 0.0267   Epoch: 9   Global Step: 400560   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:52,051-Speed 2626.98 samples/sec   Loss 6.8960   LearningRate 0.0267   Epoch: 9   Global Step: 400570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:55,957-Speed 2622.49 samples/sec   Loss 6.9953   LearningRate 0.0267   Epoch: 9   Global Step: 400580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:44:59,855-Speed 2627.28 samples/sec   Loss 6.9333   LearningRate 0.0267   Epoch: 9   Global Step: 400590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:03,762-Speed 2621.85 samples/sec   Loss 6.9507   LearningRate 0.0267   Epoch: 9   Global Step: 400600   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:07,660-Speed 2627.24 samples/sec   Loss 7.0367   LearningRate 0.0267   Epoch: 9   Global Step: 400610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:11,560-Speed 2626.57 samples/sec   Loss 7.0318   LearningRate 0.0267   Epoch: 9   Global Step: 400620   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:45:15,432-Speed 2644.80 samples/sec   Loss 6.9243   LearningRate 0.0267   Epoch: 9   Global Step: 400630   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:19,392-Speed 2586.97 samples/sec   Loss 6.8823   LearningRate 0.0267   Epoch: 9   Global Step: 400640   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:23,478-Speed 2506.95 samples/sec   Loss 6.9181   LearningRate 0.0267   Epoch: 9   Global Step: 400650   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:27,525-Speed 2530.13 samples/sec   Loss 6.9042   LearningRate 0.0267   Epoch: 9   Global Step: 400660   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:31,533-Speed 2556.17 samples/sec   Loss 6.9668   LearningRate 0.0267   Epoch: 9   Global Step: 400670   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:35,615-Speed 2509.02 samples/sec   Loss 7.0523   LearningRate 0.0267   Epoch: 9   Global Step: 400680   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:39,692-Speed 2512.31 samples/sec   Loss 6.8582   LearningRate 0.0267   Epoch: 9   Global Step: 400690   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:43,735-Speed 2533.43 samples/sec   Loss 6.9211   LearningRate 0.0267   Epoch: 9   Global Step: 400700   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:47,632-Speed 2628.83 samples/sec   Loss 6.8273   LearningRate 0.0267   Epoch: 9   Global Step: 400710   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:51,530-Speed 2627.90 samples/sec   Loss 6.9293   LearningRate 0.0267   Epoch: 9   Global Step: 400720   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:55,412-Speed 2638.21 samples/sec   Loss 6.9202   LearningRate 0.0267   Epoch: 9   Global Step: 400730   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:45:59,306-Speed 2630.55 samples/sec   Loss 6.9753   LearningRate 0.0267   Epoch: 9   Global Step: 400740   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:46:03,206-Speed 2626.63 samples/sec   Loss 6.9071   LearningRate 0.0267   Epoch: 9   Global Step: 400750   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:46:07,108-Speed 2624.93 samples/sec   Loss 6.8984   LearningRate 0.0267   Epoch: 9   Global Step: 400760   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:46:11,008-Speed 2626.48 samples/sec   Loss 6.8019   LearningRate 0.0267   Epoch: 9   Global Step: 400770   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:46:14,906-Speed 2627.53 samples/sec   Loss 6.9847   LearningRate 0.0267   Epoch: 9   Global Step: 400780   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:46:18,788-Speed 2639.12 samples/sec   Loss 6.9051   LearningRate 0.0267   Epoch: 9   Global Step: 400790   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:22,682-Speed 2630.11 samples/sec   Loss 6.8483   LearningRate 0.0267   Epoch: 9   Global Step: 400800   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:26,586-Speed 2623.26 samples/sec   Loss 6.9315   LearningRate 0.0267   Epoch: 9   Global Step: 400810   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:30,488-Speed 2625.16 samples/sec   Loss 6.9935   LearningRate 0.0267   Epoch: 9   Global Step: 400820   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:34,387-Speed 2626.96 samples/sec   Loss 6.8632   LearningRate 0.0267   Epoch: 9   Global Step: 400830   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:38,282-Speed 2629.52 samples/sec   Loss 6.9387   LearningRate 0.0267   Epoch: 9   Global Step: 400840   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:42,206-Speed 2610.11 samples/sec   Loss 6.8592   LearningRate 0.0267   Epoch: 9   Global Step: 400850   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:46,110-Speed 2624.15 samples/sec   Loss 6.8991   LearningRate 0.0267   Epoch: 9   Global Step: 400860   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:50,007-Speed 2628.04 samples/sec   Loss 6.8899   LearningRate 0.0267   Epoch: 9   Global Step: 400870   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:53,919-Speed 2618.55 samples/sec   Loss 6.9610   LearningRate 0.0267   Epoch: 9   Global Step: 400880   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:46:57,815-Speed 2629.49 samples/sec   Loss 6.8995   LearningRate 0.0267   Epoch: 9   Global Step: 400890   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:01,712-Speed 2627.67 samples/sec   Loss 6.8563   LearningRate 0.0267   Epoch: 9   Global Step: 400900   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:05,631-Speed 2613.93 samples/sec   Loss 6.9477   LearningRate 0.0267   Epoch: 9   Global Step: 400910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:09,535-Speed 2623.43 samples/sec   Loss 6.8335   LearningRate 0.0267   Epoch: 9   Global Step: 400920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:13,435-Speed 2626.49 samples/sec   Loss 6.8792   LearningRate 0.0267   Epoch: 9   Global Step: 400930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:17,339-Speed 2623.44 samples/sec   Loss 6.9503   LearningRate 0.0267   Epoch: 9   Global Step: 400940   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:21,237-Speed 2628.07 samples/sec   Loss 6.9202   LearningRate 0.0267   Epoch: 9   Global Step: 400950   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:25,134-Speed 2627.84 samples/sec   Loss 6.8287   LearningRate 0.0267   Epoch: 9   Global Step: 400960   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:29,047-Speed 2618.04 samples/sec   Loss 6.8943   LearningRate 0.0267   Epoch: 9   Global Step: 400970   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:32,966-Speed 2612.92 samples/sec   Loss 6.9344   LearningRate 0.0267   Epoch: 9   Global Step: 400980   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:36,880-Speed 2617.13 samples/sec   Loss 7.0155   LearningRate 0.0267   Epoch: 9   Global Step: 400990   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:47:40,797-Speed 2614.66 samples/sec   Loss 6.9785   LearningRate 0.0267   Epoch: 9   Global Step: 401000   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:47:44,697-Speed 2626.38 samples/sec   Loss 6.8876   LearningRate 0.0267   Epoch: 9   Global Step: 401010   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:47:48,578-Speed 2639.11 samples/sec   Loss 6.9535   LearningRate 0.0267   Epoch: 9   Global Step: 401020   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:52,546-Speed 2582.50 samples/sec   Loss 6.9586   LearningRate 0.0267   Epoch: 9   Global Step: 401030   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:47:56,465-Speed 2613.26 samples/sec   Loss 6.8466   LearningRate 0.0267   Epoch: 9   Global Step: 401040   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:00,366-Speed 2625.82 samples/sec   Loss 6.8466   LearningRate 0.0267   Epoch: 9   Global Step: 401050   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:04,261-Speed 2629.78 samples/sec   Loss 6.8895   LearningRate 0.0267   Epoch: 9   Global Step: 401060   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:08,166-Speed 2622.76 samples/sec   Loss 6.8271   LearningRate 0.0267   Epoch: 9   Global Step: 401070   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:12,073-Speed 2621.41 samples/sec   Loss 6.9235   LearningRate 0.0267   Epoch: 9   Global Step: 401080   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:15,968-Speed 2629.69 samples/sec   Loss 6.8762   LearningRate 0.0267   Epoch: 9   Global Step: 401090   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:19,898-Speed 2606.52 samples/sec   Loss 6.8298   LearningRate 0.0267   Epoch: 9   Global Step: 401100   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:23,802-Speed 2623.92 samples/sec   Loss 6.9161   LearningRate 0.0267   Epoch: 9   Global Step: 401110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:27,715-Speed 2617.90 samples/sec   Loss 6.9066   LearningRate 0.0267   Epoch: 9   Global Step: 401120   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:31,613-Speed 2627.15 samples/sec   Loss 6.9223   LearningRate 0.0267   Epoch: 9   Global Step: 401130   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:35,505-Speed 2632.10 samples/sec   Loss 6.9384   LearningRate 0.0267   Epoch: 9   Global Step: 401140   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:39,408-Speed 2623.92 samples/sec   Loss 6.8489   LearningRate 0.0267   Epoch: 9   Global Step: 401150   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:43,312-Speed 2624.11 samples/sec   Loss 6.9417   LearningRate 0.0267   Epoch: 9   Global Step: 401160   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:47,215-Speed 2624.05 samples/sec   Loss 6.8335   LearningRate 0.0267   Epoch: 9   Global Step: 401170   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:51,123-Speed 2620.89 samples/sec   Loss 6.7922   LearningRate 0.0267   Epoch: 9   Global Step: 401180   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:55,018-Speed 2629.83 samples/sec   Loss 6.9772   LearningRate 0.0267   Epoch: 9   Global Step: 401190   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:48:58,941-Speed 2610.92 samples/sec   Loss 6.8586   LearningRate 0.0267   Epoch: 9   Global Step: 401200   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:49:02,850-Speed 2620.23 samples/sec   Loss 6.9402   LearningRate 0.0267   Epoch: 9   Global Step: 401210   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:49:06,746-Speed 2628.83 samples/sec   Loss 6.8979   LearningRate 0.0267   Epoch: 9   Global Step: 401220   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:49:10,618-Speed 2645.16 samples/sec   Loss 6.9566   LearningRate 0.0267   Epoch: 9   Global Step: 401230   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:49:14,510-Speed 2632.09 samples/sec   Loss 6.9062   LearningRate 0.0267   Epoch: 9   Global Step: 401240   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:18,410-Speed 2626.73 samples/sec   Loss 6.9611   LearningRate 0.0267   Epoch: 9   Global Step: 401250   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:22,336-Speed 2609.25 samples/sec   Loss 6.9372   LearningRate 0.0267   Epoch: 9   Global Step: 401260   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:26,239-Speed 2624.79 samples/sec   Loss 6.9367   LearningRate 0.0267   Epoch: 9   Global Step: 401270   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:30,140-Speed 2625.98 samples/sec   Loss 6.8207   LearningRate 0.0267   Epoch: 9   Global Step: 401280   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:34,040-Speed 2626.02 samples/sec   Loss 6.9125   LearningRate 0.0267   Epoch: 9   Global Step: 401290   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:37,949-Speed 2620.52 samples/sec   Loss 6.8823   LearningRate 0.0267   Epoch: 9   Global Step: 401300   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:41,869-Speed 2612.31 samples/sec   Loss 7.0048   LearningRate 0.0267   Epoch: 9   Global Step: 401310   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:45,783-Speed 2617.27 samples/sec   Loss 6.8990   LearningRate 0.0267   Epoch: 9   Global Step: 401320   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:49,685-Speed 2625.81 samples/sec   Loss 7.0588   LearningRate 0.0266   Epoch: 9   Global Step: 401330   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:49:53,606-Speed 2612.05 samples/sec   Loss 6.8462   LearningRate 0.0266   Epoch: 9   Global Step: 401340   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:49:57,525-Speed 2613.64 samples/sec   Loss 6.8960   LearningRate 0.0266   Epoch: 9   Global Step: 401350   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:01,418-Speed 2631.23 samples/sec   Loss 6.9281   LearningRate 0.0266   Epoch: 9   Global Step: 401360   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:05,321-Speed 2624.10 samples/sec   Loss 6.9283   LearningRate 0.0266   Epoch: 9   Global Step: 401370   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:09,220-Speed 2626.84 samples/sec   Loss 6.8685   LearningRate 0.0266   Epoch: 9   Global Step: 401380   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:13,153-Speed 2604.56 samples/sec   Loss 6.9081   LearningRate 0.0266   Epoch: 9   Global Step: 401390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:17,050-Speed 2628.00 samples/sec   Loss 6.8599   LearningRate 0.0266   Epoch: 9   Global Step: 401400   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:20,972-Speed 2611.74 samples/sec   Loss 6.8683   LearningRate 0.0266   Epoch: 9   Global Step: 401410   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:24,884-Speed 2618.28 samples/sec   Loss 6.9392   LearningRate 0.0266   Epoch: 9   Global Step: 401420   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:28,781-Speed 2628.62 samples/sec   Loss 6.9230   LearningRate 0.0266   Epoch: 9   Global Step: 401430   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:32,682-Speed 2625.27 samples/sec   Loss 7.0972   LearningRate 0.0266   Epoch: 9   Global Step: 401440   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:50:36,601-Speed 2613.71 samples/sec   Loss 6.9417   LearningRate 0.0266   Epoch: 9   Global Step: 401450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:50:40,498-Speed 2628.24 samples/sec   Loss 6.8159   LearningRate 0.0266   Epoch: 9   Global Step: 401460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:50:44,392-Speed 2629.63 samples/sec   Loss 6.9773   LearningRate 0.0266   Epoch: 9   Global Step: 401470   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:50:48,297-Speed 2623.64 samples/sec   Loss 6.9928   LearningRate 0.0266   Epoch: 9   Global Step: 401480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:50:52,196-Speed 2626.84 samples/sec   Loss 6.9521   LearningRate 0.0266   Epoch: 9   Global Step: 401490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:50:56,095-Speed 2627.73 samples/sec   Loss 6.9394   LearningRate 0.0266   Epoch: 9   Global Step: 401500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:50:59,999-Speed 2623.27 samples/sec   Loss 6.8586   LearningRate 0.0266   Epoch: 9   Global Step: 401510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:03,904-Speed 2623.10 samples/sec   Loss 6.8704   LearningRate 0.0266   Epoch: 9   Global Step: 401520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:07,807-Speed 2623.86 samples/sec   Loss 7.0018   LearningRate 0.0266   Epoch: 9   Global Step: 401530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:11,704-Speed 2628.23 samples/sec   Loss 7.0541   LearningRate 0.0266   Epoch: 9   Global Step: 401540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:15,603-Speed 2626.80 samples/sec   Loss 6.9159   LearningRate 0.0266   Epoch: 9   Global Step: 401550   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:51:19,491-Speed 2635.00 samples/sec   Loss 7.0230   LearningRate 0.0266   Epoch: 9   Global Step: 401560   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:23,384-Speed 2630.56 samples/sec   Loss 7.0812   LearningRate 0.0266   Epoch: 9   Global Step: 401570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:27,298-Speed 2617.56 samples/sec   Loss 6.9074   LearningRate 0.0266   Epoch: 9   Global Step: 401580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:31,200-Speed 2625.16 samples/sec   Loss 6.8948   LearningRate 0.0266   Epoch: 9   Global Step: 401590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:35,106-Speed 2622.55 samples/sec   Loss 6.9450   LearningRate 0.0266   Epoch: 9   Global Step: 401600   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:39,019-Speed 2617.14 samples/sec   Loss 7.0106   LearningRate 0.0266   Epoch: 9   Global Step: 401610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:42,918-Speed 2626.79 samples/sec   Loss 7.0127   LearningRate 0.0266   Epoch: 9   Global Step: 401620   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:46,820-Speed 2625.20 samples/sec   Loss 6.9555   LearningRate 0.0266   Epoch: 9   Global Step: 401630   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:50,715-Speed 2629.45 samples/sec   Loss 6.7868   LearningRate 0.0266   Epoch: 9   Global Step: 401640   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:54,616-Speed 2626.17 samples/sec   Loss 6.8917   LearningRate 0.0266   Epoch: 9   Global Step: 401650   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:51:58,512-Speed 2629.04 samples/sec   Loss 6.8956   LearningRate 0.0266   Epoch: 9   Global Step: 401660   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:52:02,386-Speed 2643.82 samples/sec   Loss 6.8820   LearningRate 0.0266   Epoch: 9   Global Step: 401670   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:06,288-Speed 2624.94 samples/sec   Loss 6.8492   LearningRate 0.0266   Epoch: 9   Global Step: 401680   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:10,186-Speed 2627.65 samples/sec   Loss 6.7603   LearningRate 0.0266   Epoch: 9   Global Step: 401690   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:14,092-Speed 2622.10 samples/sec   Loss 6.8381   LearningRate 0.0266   Epoch: 9   Global Step: 401700   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:17,997-Speed 2623.25 samples/sec   Loss 7.0171   LearningRate 0.0266   Epoch: 9   Global Step: 401710   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:21,907-Speed 2619.12 samples/sec   Loss 6.8025   LearningRate 0.0266   Epoch: 9   Global Step: 401720   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:25,808-Speed 2626.35 samples/sec   Loss 6.7994   LearningRate 0.0266   Epoch: 9   Global Step: 401730   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:29,710-Speed 2625.17 samples/sec   Loss 6.8699   LearningRate 0.0266   Epoch: 9   Global Step: 401740   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:33,650-Speed 2599.63 samples/sec   Loss 6.8406   LearningRate 0.0266   Epoch: 9   Global Step: 401750   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:37,576-Speed 2608.32 samples/sec   Loss 6.9798   LearningRate 0.0266   Epoch: 9   Global Step: 401760   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:41,474-Speed 2627.42 samples/sec   Loss 6.9223   LearningRate 0.0266   Epoch: 9   Global Step: 401770   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:52:45,374-Speed 2626.77 samples/sec   Loss 7.0205   LearningRate 0.0266   Epoch: 9   Global Step: 401780   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:49,271-Speed 2627.99 samples/sec   Loss 6.9089   LearningRate 0.0266   Epoch: 9   Global Step: 401790   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:53,172-Speed 2625.75 samples/sec   Loss 6.8771   LearningRate 0.0266   Epoch: 9   Global Step: 401800   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:52:57,068-Speed 2628.82 samples/sec   Loss 6.8788   LearningRate 0.0266   Epoch: 9   Global Step: 401810   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:00,972-Speed 2623.79 samples/sec   Loss 6.8884   LearningRate 0.0266   Epoch: 9   Global Step: 401820   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:04,868-Speed 2629.06 samples/sec   Loss 6.9104   LearningRate 0.0266   Epoch: 9   Global Step: 401830   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:08,765-Speed 2628.34 samples/sec   Loss 7.0106   LearningRate 0.0266   Epoch: 9   Global Step: 401840   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:12,664-Speed 2626.77 samples/sec   Loss 6.9537   LearningRate 0.0266   Epoch: 9   Global Step: 401850   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:16,569-Speed 2622.94 samples/sec   Loss 7.0084   LearningRate 0.0266   Epoch: 9   Global Step: 401860   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:20,468-Speed 2626.95 samples/sec   Loss 6.9228   LearningRate 0.0266   Epoch: 9   Global Step: 401870   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:24,370-Speed 2624.98 samples/sec   Loss 6.9699   LearningRate 0.0266   Epoch: 9   Global Step: 401880   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:53:28,262-Speed 2632.04 samples/sec   Loss 6.8843   LearningRate 0.0266   Epoch: 9   Global Step: 401890   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:32,160-Speed 2627.64 samples/sec   Loss 6.8629   LearningRate 0.0266   Epoch: 9   Global Step: 401900   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:36,074-Speed 2616.47 samples/sec   Loss 6.9165   LearningRate 0.0266   Epoch: 9   Global Step: 401910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:39,972-Speed 2627.97 samples/sec   Loss 6.9695   LearningRate 0.0266   Epoch: 9   Global Step: 401920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:43,864-Speed 2631.44 samples/sec   Loss 6.8711   LearningRate 0.0266   Epoch: 9   Global Step: 401930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:47,765-Speed 2625.67 samples/sec   Loss 6.8798   LearningRate 0.0266   Epoch: 9   Global Step: 401940   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:51,661-Speed 2629.33 samples/sec   Loss 6.9318   LearningRate 0.0266   Epoch: 9   Global Step: 401950   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:55,558-Speed 2628.56 samples/sec   Loss 6.8504   LearningRate 0.0266   Epoch: 9   Global Step: 401960   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:53:59,462-Speed 2623.86 samples/sec   Loss 6.8619   LearningRate 0.0266   Epoch: 9   Global Step: 401970   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:03,394-Speed 2604.54 samples/sec   Loss 6.7394   LearningRate 0.0266   Epoch: 9   Global Step: 401980   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:07,274-Speed 2639.76 samples/sec   Loss 6.8610   LearningRate 0.0266   Epoch: 9   Global Step: 401990   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:11,171-Speed 2628.54 samples/sec   Loss 6.8953   LearningRate 0.0266   Epoch: 9   Global Step: 402000   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:15,064-Speed 2631.27 samples/sec   Loss 6.8445   LearningRate 0.0266   Epoch: 9   Global Step: 402010   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:18,986-Speed 2611.12 samples/sec   Loss 6.9181   LearningRate 0.0266   Epoch: 9   Global Step: 402020   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:22,884-Speed 2628.42 samples/sec   Loss 6.7987   LearningRate 0.0266   Epoch: 9   Global Step: 402030   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:26,780-Speed 2628.90 samples/sec   Loss 6.9104   LearningRate 0.0266   Epoch: 9   Global Step: 402040   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:30,689-Speed 2620.42 samples/sec   Loss 6.7997   LearningRate 0.0266   Epoch: 9   Global Step: 402050   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:34,587-Speed 2627.47 samples/sec   Loss 6.8387   LearningRate 0.0266   Epoch: 9   Global Step: 402060   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:38,486-Speed 2626.75 samples/sec   Loss 6.8277   LearningRate 0.0266   Epoch: 9   Global Step: 402070   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:42,383-Speed 2628.65 samples/sec   Loss 6.9323   LearningRate 0.0266   Epoch: 9   Global Step: 402080   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:46,295-Speed 2618.70 samples/sec   Loss 6.9646   LearningRate 0.0266   Epoch: 9   Global Step: 402090   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:54:50,175-Speed 2639.57 samples/sec   Loss 6.8257   LearningRate 0.0266   Epoch: 9   Global Step: 402100   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:54:54,049-Speed 2644.09 samples/sec   Loss 6.8812   LearningRate 0.0266   Epoch: 9   Global Step: 402110   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:54:57,940-Speed 2632.56 samples/sec   Loss 6.9249   LearningRate 0.0266   Epoch: 9   Global Step: 402120   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:01,843-Speed 2624.42 samples/sec   Loss 6.7559   LearningRate 0.0265   Epoch: 9   Global Step: 402130   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:05,744-Speed 2625.25 samples/sec   Loss 6.9672   LearningRate 0.0265   Epoch: 9   Global Step: 402140   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:09,637-Speed 2630.84 samples/sec   Loss 6.8498   LearningRate 0.0265   Epoch: 9   Global Step: 402150   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:13,533-Speed 2629.25 samples/sec   Loss 6.8759   LearningRate 0.0265   Epoch: 9   Global Step: 402160   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:17,429-Speed 2629.30 samples/sec   Loss 6.9096   LearningRate 0.0265   Epoch: 9   Global Step: 402170   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:21,324-Speed 2629.82 samples/sec   Loss 6.8156   LearningRate 0.0265   Epoch: 9   Global Step: 402180   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:25,258-Speed 2603.96 samples/sec   Loss 6.9342   LearningRate 0.0265   Epoch: 9   Global Step: 402190   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:29,153-Speed 2629.45 samples/sec   Loss 6.9214   LearningRate 0.0265   Epoch: 9   Global Step: 402200   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:55:33,058-Speed 2623.22 samples/sec   Loss 6.9231   LearningRate 0.0265   Epoch: 9   Global Step: 402210   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:55:36,951-Speed 2630.80 samples/sec   Loss 6.9396   LearningRate 0.0265   Epoch: 9   Global Step: 402220   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:55:40,855-Speed 2623.62 samples/sec   Loss 6.8505   LearningRate 0.0265   Epoch: 9   Global Step: 402230   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:55:44,753-Speed 2627.43 samples/sec   Loss 7.0593   LearningRate 0.0265   Epoch: 9   Global Step: 402240   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:55:48,651-Speed 2627.41 samples/sec   Loss 6.7800   LearningRate 0.0265   Epoch: 9   Global Step: 402250   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:55:52,567-Speed 2616.71 samples/sec   Loss 6.8133   LearningRate 0.0265   Epoch: 9   Global Step: 402260   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:55:56,462-Speed 2629.27 samples/sec   Loss 6.9093   LearningRate 0.0265   Epoch: 9   Global Step: 402270   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:56:00,365-Speed 2624.62 samples/sec   Loss 6.7682   LearningRate 0.0265   Epoch: 9   Global Step: 402280   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:56:04,263-Speed 2627.92 samples/sec   Loss 6.9003   LearningRate 0.0265   Epoch: 9   Global Step: 402290   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:56:08,142-Speed 2640.36 samples/sec   Loss 6.9094   LearningRate 0.0265   Epoch: 9   Global Step: 402300   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:12,040-Speed 2627.40 samples/sec   Loss 6.7954   LearningRate 0.0265   Epoch: 9   Global Step: 402310   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:15,984-Speed 2597.54 samples/sec   Loss 7.0255   LearningRate 0.0265   Epoch: 9   Global Step: 402320   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:19,882-Speed 2628.18 samples/sec   Loss 6.8079   LearningRate 0.0265   Epoch: 9   Global Step: 402330   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:23,775-Speed 2630.32 samples/sec   Loss 6.7628   LearningRate 0.0265   Epoch: 9   Global Step: 402340   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:27,682-Speed 2622.70 samples/sec   Loss 6.9099   LearningRate 0.0265   Epoch: 9   Global Step: 402350   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:31,579-Speed 2628.07 samples/sec   Loss 6.8234   LearningRate 0.0265   Epoch: 9   Global Step: 402360   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:35,486-Speed 2622.00 samples/sec   Loss 6.8969   LearningRate 0.0265   Epoch: 9   Global Step: 402370   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:39,408-Speed 2611.54 samples/sec   Loss 6.8127   LearningRate 0.0265   Epoch: 9   Global Step: 402380   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:43,305-Speed 2628.58 samples/sec   Loss 6.9651   LearningRate 0.0265   Epoch: 9   Global Step: 402390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:56:47,196-Speed 2631.78 samples/sec   Loss 6.8430   LearningRate 0.0265   Epoch: 9   Global Step: 402400   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:56:51,094-Speed 2628.05 samples/sec   Loss 6.8371   LearningRate 0.0265   Epoch: 9   Global Step: 402410   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:56:54,987-Speed 2630.96 samples/sec   Loss 6.8313   LearningRate 0.0265   Epoch: 9   Global Step: 402420   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:56:58,883-Speed 2628.99 samples/sec   Loss 6.8997   LearningRate 0.0265   Epoch: 9   Global Step: 402430   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:02,780-Speed 2628.60 samples/sec   Loss 6.8620   LearningRate 0.0265   Epoch: 9   Global Step: 402440   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:06,674-Speed 2630.10 samples/sec   Loss 6.9343   LearningRate 0.0265   Epoch: 9   Global Step: 402450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:10,568-Speed 2630.56 samples/sec   Loss 6.8978   LearningRate 0.0265   Epoch: 9   Global Step: 402460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:14,466-Speed 2627.93 samples/sec   Loss 6.9715   LearningRate 0.0265   Epoch: 9   Global Step: 402470   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:18,367-Speed 2625.37 samples/sec   Loss 6.8756   LearningRate 0.0265   Epoch: 9   Global Step: 402480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:22,282-Speed 2616.21 samples/sec   Loss 6.8331   LearningRate 0.0265   Epoch: 9   Global Step: 402490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:26,175-Speed 2631.07 samples/sec   Loss 6.9198   LearningRate 0.0265   Epoch: 9   Global Step: 402500   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 16:57:30,059-Speed 2636.99 samples/sec   Loss 6.7880   LearningRate 0.0265   Epoch: 9   Global Step: 402510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:34,064-Speed 2558.01 samples/sec   Loss 6.8967   LearningRate 0.0265   Epoch: 9   Global Step: 402520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:37,961-Speed 2628.12 samples/sec   Loss 6.8738   LearningRate 0.0265   Epoch: 9   Global Step: 402530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:41,873-Speed 2618.11 samples/sec   Loss 6.8208   LearningRate 0.0265   Epoch: 9   Global Step: 402540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:45,768-Speed 2629.87 samples/sec   Loss 6.8672   LearningRate 0.0265   Epoch: 9   Global Step: 402550   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:49,705-Speed 2601.53 samples/sec   Loss 6.9308   LearningRate 0.0265   Epoch: 9   Global Step: 402560   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:53,601-Speed 2629.43 samples/sec   Loss 6.9229   LearningRate 0.0265   Epoch: 9   Global Step: 402570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:57:57,496-Speed 2629.74 samples/sec   Loss 6.8791   LearningRate 0.0265   Epoch: 9   Global Step: 402580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:58:01,398-Speed 2624.95 samples/sec   Loss 6.9373   LearningRate 0.0265   Epoch: 9   Global Step: 402590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:58:05,362-Speed 2584.11 samples/sec   Loss 6.8750   LearningRate 0.0265   Epoch: 9   Global Step: 402600   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:58:09,269-Speed 2621.30 samples/sec   Loss 6.8470   LearningRate 0.0265   Epoch: 9   Global Step: 402610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:58:13,166-Speed 2628.23 samples/sec   Loss 6.8397   LearningRate 0.0265   Epoch: 9   Global Step: 402620   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:17,062-Speed 2628.94 samples/sec   Loss 6.9422   LearningRate 0.0265   Epoch: 9   Global Step: 402630   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:20,959-Speed 2628.64 samples/sec   Loss 6.9705   LearningRate 0.0265   Epoch: 9   Global Step: 402640   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:24,861-Speed 2625.15 samples/sec   Loss 6.8546   LearningRate 0.0265   Epoch: 9   Global Step: 402650   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:28,758-Speed 2628.20 samples/sec   Loss 7.0045   LearningRate 0.0265   Epoch: 9   Global Step: 402660   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:32,675-Speed 2614.82 samples/sec   Loss 6.9442   LearningRate 0.0265   Epoch: 9   Global Step: 402670   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:36,587-Speed 2618.58 samples/sec   Loss 6.8783   LearningRate 0.0265   Epoch: 9   Global Step: 402680   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:40,480-Speed 2631.33 samples/sec   Loss 6.8566   LearningRate 0.0265   Epoch: 9   Global Step: 402690   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:44,378-Speed 2627.54 samples/sec   Loss 7.0067   LearningRate 0.0265   Epoch: 9   Global Step: 402700   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:48,281-Speed 2623.99 samples/sec   Loss 6.9433   LearningRate 0.0265   Epoch: 9   Global Step: 402710   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:58:52,180-Speed 2626.68 samples/sec   Loss 6.8934   LearningRate 0.0265   Epoch: 9   Global Step: 402720   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:58:56,080-Speed 2626.33 samples/sec   Loss 6.8355   LearningRate 0.0265   Epoch: 9   Global Step: 402730   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:58:59,987-Speed 2621.91 samples/sec   Loss 6.9743   LearningRate 0.0265   Epoch: 9   Global Step: 402740   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:59:03,863-Speed 2642.12 samples/sec   Loss 6.7845   LearningRate 0.0265   Epoch: 9   Global Step: 402750   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:07,768-Speed 2623.12 samples/sec   Loss 6.8761   LearningRate 0.0265   Epoch: 9   Global Step: 402760   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:11,670-Speed 2625.27 samples/sec   Loss 6.9322   LearningRate 0.0265   Epoch: 9   Global Step: 402770   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:15,569-Speed 2627.20 samples/sec   Loss 6.8808   LearningRate 0.0265   Epoch: 9   Global Step: 402780   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:19,466-Speed 2627.96 samples/sec   Loss 6.7978   LearningRate 0.0265   Epoch: 9   Global Step: 402790   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:23,410-Speed 2596.88 samples/sec   Loss 6.8617   LearningRate 0.0265   Epoch: 9   Global Step: 402800   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:27,311-Speed 2625.96 samples/sec   Loss 6.8752   LearningRate 0.0265   Epoch: 9   Global Step: 402810   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:31,218-Speed 2621.77 samples/sec   Loss 6.8820   LearningRate 0.0265   Epoch: 9   Global Step: 402820   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:35,134-Speed 2615.66 samples/sec   Loss 6.8537   LearningRate 0.0265   Epoch: 9   Global Step: 402830   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:39,038-Speed 2623.63 samples/sec   Loss 6.8474   LearningRate 0.0265   Epoch: 9   Global Step: 402840   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 16:59:42,950-Speed 2618.54 samples/sec   Loss 6.8556   LearningRate 0.0265   Epoch: 9   Global Step: 402850   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:59:46,855-Speed 2622.71 samples/sec   Loss 6.7935   LearningRate 0.0265   Epoch: 9   Global Step: 402860   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:59:50,763-Speed 2621.60 samples/sec   Loss 6.8167   LearningRate 0.0265   Epoch: 9   Global Step: 402870   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:59:54,667-Speed 2623.48 samples/sec   Loss 6.8561   LearningRate 0.0265   Epoch: 9   Global Step: 402880   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 16:59:58,545-Speed 2641.68 samples/sec   Loss 6.8265   LearningRate 0.0265   Epoch: 9   Global Step: 402890   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:02,457-Speed 2617.88 samples/sec   Loss 6.8742   LearningRate 0.0265   Epoch: 9   Global Step: 402900   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:06,360-Speed 2624.51 samples/sec   Loss 6.9103   LearningRate 0.0265   Epoch: 9   Global Step: 402910   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:10,279-Speed 2613.29 samples/sec   Loss 6.8309   LearningRate 0.0265   Epoch: 9   Global Step: 402920   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:14,178-Speed 2627.80 samples/sec   Loss 6.8032   LearningRate 0.0265   Epoch: 9   Global Step: 402930   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:18,089-Speed 2618.31 samples/sec   Loss 6.8997   LearningRate 0.0264   Epoch: 9   Global Step: 402940   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:21,984-Speed 2630.06 samples/sec   Loss 6.9920   LearningRate 0.0264   Epoch: 9   Global Step: 402950   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:25,883-Speed 2627.05 samples/sec   Loss 6.9101   LearningRate 0.0264   Epoch: 9   Global Step: 402960   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:29,781-Speed 2628.08 samples/sec   Loss 6.8726   LearningRate 0.0264   Epoch: 9   Global Step: 402970   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:33,680-Speed 2626.78 samples/sec   Loss 6.8863   LearningRate 0.0264   Epoch: 9   Global Step: 402980   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:00:37,581-Speed 2625.15 samples/sec   Loss 6.8618   LearningRate 0.0264   Epoch: 9   Global Step: 402990   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:00:41,490-Speed 2620.33 samples/sec   Loss 6.8348   LearningRate 0.0264   Epoch: 9   Global Step: 403000   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:00:45,400-Speed 2619.58 samples/sec   Loss 6.9072   LearningRate 0.0264   Epoch: 9   Global Step: 403010   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:00:49,296-Speed 2629.08 samples/sec   Loss 6.9377   LearningRate 0.0264   Epoch: 9   Global Step: 403020   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:00:53,194-Speed 2627.96 samples/sec   Loss 6.8728   LearningRate 0.0264   Epoch: 9   Global Step: 403030   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:00:57,101-Speed 2621.75 samples/sec   Loss 6.9616   LearningRate 0.0264   Epoch: 9   Global Step: 403040   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:01,025-Speed 2610.41 samples/sec   Loss 6.8095   LearningRate 0.0264   Epoch: 9   Global Step: 403050   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:05,034-Speed 2554.46 samples/sec   Loss 6.9043   LearningRate 0.0264   Epoch: 9   Global Step: 403060   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:08,940-Speed 2622.23 samples/sec   Loss 6.8031   LearningRate 0.0264   Epoch: 9   Global Step: 403070   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:12,840-Speed 2626.28 samples/sec   Loss 6.7906   LearningRate 0.0264   Epoch: 9   Global Step: 403080   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:16,727-Speed 2635.69 samples/sec   Loss 6.8624   LearningRate 0.0264   Epoch: 9   Global Step: 403090   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:20,634-Speed 2621.93 samples/sec   Loss 6.8368   LearningRate 0.0264   Epoch: 9   Global Step: 403100   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:24,560-Speed 2608.48 samples/sec   Loss 6.8458   LearningRate 0.0264   Epoch: 9   Global Step: 403110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:01:28,431-Speed 2646.87 samples/sec   Loss 6.8750   LearningRate 0.0264   Epoch: 9   Global Step: 403120   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:32,327-Speed 2628.91 samples/sec   Loss 7.0447   LearningRate 0.0264   Epoch: 9   Global Step: 403130   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:36,225-Speed 2627.07 samples/sec   Loss 6.8263   LearningRate 0.0264   Epoch: 9   Global Step: 403140   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:40,123-Speed 2627.47 samples/sec   Loss 6.8940   LearningRate 0.0264   Epoch: 9   Global Step: 403150   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:44,018-Speed 2629.83 samples/sec   Loss 6.8275   LearningRate 0.0264   Epoch: 9   Global Step: 403160   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:47,919-Speed 2625.87 samples/sec   Loss 6.8906   LearningRate 0.0264   Epoch: 9   Global Step: 403170   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:51,819-Speed 2626.22 samples/sec   Loss 6.8025   LearningRate 0.0264   Epoch: 9   Global Step: 403180   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:55,717-Speed 2627.29 samples/sec   Loss 6.9082   LearningRate 0.0264   Epoch: 9   Global Step: 403190   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:01:59,616-Speed 2627.70 samples/sec   Loss 6.8642   LearningRate 0.0264   Epoch: 9   Global Step: 403200   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:02:03,516-Speed 2626.48 samples/sec   Loss 6.8912   LearningRate 0.0264   Epoch: 9   Global Step: 403210   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:02:07,412-Speed 2628.57 samples/sec   Loss 6.9884   LearningRate 0.0264   Epoch: 9   Global Step: 403220   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:11,448-Speed 2537.89 samples/sec   Loss 7.0667   LearningRate 0.0264   Epoch: 9   Global Step: 403230   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:15,522-Speed 2513.96 samples/sec   Loss 6.9347   LearningRate 0.0264   Epoch: 9   Global Step: 403240   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:19,434-Speed 2618.26 samples/sec   Loss 6.7539   LearningRate 0.0264   Epoch: 9   Global Step: 403250   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:23,344-Speed 2619.54 samples/sec   Loss 6.9790   LearningRate 0.0264   Epoch: 9   Global Step: 403260   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:27,244-Speed 2626.92 samples/sec   Loss 6.9372   LearningRate 0.0264   Epoch: 9   Global Step: 403270   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:31,146-Speed 2624.89 samples/sec   Loss 6.9681   LearningRate 0.0264   Epoch: 9   Global Step: 403280   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:35,045-Speed 2626.94 samples/sec   Loss 6.7988   LearningRate 0.0264   Epoch: 9   Global Step: 403290   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:38,944-Speed 2626.84 samples/sec   Loss 6.9112   LearningRate 0.0264   Epoch: 9   Global Step: 403300   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:42,841-Speed 2627.78 samples/sec   Loss 6.9535   LearningRate 0.0264   Epoch: 9   Global Step: 403310   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:46,744-Speed 2624.65 samples/sec   Loss 6.8494   LearningRate 0.0264   Epoch: 9   Global Step: 403320   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:02:50,617-Speed 2645.06 samples/sec   Loss 6.8202   LearningRate 0.0264   Epoch: 9   Global Step: 403330   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:54,517-Speed 2626.00 samples/sec   Loss 6.9602   LearningRate 0.0264   Epoch: 9   Global Step: 403340   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:02:58,414-Speed 2628.47 samples/sec   Loss 6.7707   LearningRate 0.0264   Epoch: 9   Global Step: 403350   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:03:02,312-Speed 2627.08 samples/sec   Loss 6.9274   LearningRate 0.0264   Epoch: 9   Global Step: 403360   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:03:06,211-Speed 2627.45 samples/sec   Loss 6.9264   LearningRate 0.0264   Epoch: 9   Global Step: 403370   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:03:10,106-Speed 2629.51 samples/sec   Loss 6.8894   LearningRate 0.0264   Epoch: 9   Global Step: 403380   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:14,003-Speed 2628.21 samples/sec   Loss 6.8541   LearningRate 0.0264   Epoch: 9   Global Step: 403390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:17,904-Speed 2625.72 samples/sec   Loss 6.9169   LearningRate 0.0264   Epoch: 9   Global Step: 403400   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:21,813-Speed 2620.20 samples/sec   Loss 6.9173   LearningRate 0.0264   Epoch: 9   Global Step: 403410   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:25,710-Speed 2628.29 samples/sec   Loss 6.8182   LearningRate 0.0264   Epoch: 9   Global Step: 403420   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:29,609-Speed 2627.30 samples/sec   Loss 6.7796   LearningRate 0.0264   Epoch: 9   Global Step: 403430   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:33,507-Speed 2627.63 samples/sec   Loss 6.7867   LearningRate 0.0264   Epoch: 9   Global Step: 403440   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:37,408-Speed 2625.41 samples/sec   Loss 6.7806   LearningRate 0.0264   Epoch: 9   Global Step: 403450   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:41,303-Speed 2629.65 samples/sec   Loss 6.8781   LearningRate 0.0264   Epoch: 9   Global Step: 403460   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:45,210-Speed 2621.00 samples/sec   Loss 6.9026   LearningRate 0.0264   Epoch: 9   Global Step: 403470   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:03:49,106-Speed 2629.54 samples/sec   Loss 6.9526   LearningRate 0.0264   Epoch: 9   Global Step: 403480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:03:53,002-Speed 2629.12 samples/sec   Loss 6.8666   LearningRate 0.0264   Epoch: 9   Global Step: 403490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:03:56,899-Speed 2628.65 samples/sec   Loss 6.8480   LearningRate 0.0264   Epoch: 9   Global Step: 403500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:00,798-Speed 2626.68 samples/sec   Loss 6.8707   LearningRate 0.0264   Epoch: 9   Global Step: 403510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:04,758-Speed 2586.73 samples/sec   Loss 6.9490   LearningRate 0.0264   Epoch: 9   Global Step: 403520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:08,650-Speed 2631.69 samples/sec   Loss 6.7507   LearningRate 0.0264   Epoch: 9   Global Step: 403530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:12,556-Speed 2621.90 samples/sec   Loss 6.9809   LearningRate 0.0264   Epoch: 9   Global Step: 403540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:16,449-Speed 2631.13 samples/sec   Loss 6.8572   LearningRate 0.0264   Epoch: 9   Global Step: 403550   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:20,365-Speed 2615.89 samples/sec   Loss 6.9890   LearningRate 0.0264   Epoch: 9   Global Step: 403560   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:24,265-Speed 2626.12 samples/sec   Loss 6.8360   LearningRate 0.0264   Epoch: 9   Global Step: 403570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:28,147-Speed 2638.43 samples/sec   Loss 6.9315   LearningRate 0.0264   Epoch: 9   Global Step: 403580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:32,081-Speed 2603.62 samples/sec   Loss 6.8964   LearningRate 0.0264   Epoch: 9   Global Step: 403590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:35,978-Speed 2628.92 samples/sec   Loss 6.8899   LearningRate 0.0264   Epoch: 9   Global Step: 403600   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:39,879-Speed 2625.50 samples/sec   Loss 6.8553   LearningRate 0.0264   Epoch: 9   Global Step: 403610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:43,774-Speed 2629.27 samples/sec   Loss 6.8740   LearningRate 0.0264   Epoch: 9   Global Step: 403620   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:47,675-Speed 2626.14 samples/sec   Loss 6.8395   LearningRate 0.0264   Epoch: 9   Global Step: 403630   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:51,574-Speed 2626.80 samples/sec   Loss 6.9077   LearningRate 0.0264   Epoch: 9   Global Step: 403640   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:55,468-Speed 2630.78 samples/sec   Loss 6.8466   LearningRate 0.0264   Epoch: 9   Global Step: 403650   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:04:59,366-Speed 2627.32 samples/sec   Loss 6.8389   LearningRate 0.0264   Epoch: 9   Global Step: 403660   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:05:03,258-Speed 2631.44 samples/sec   Loss 6.9522   LearningRate 0.0264   Epoch: 9   Global Step: 403670   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:05:07,154-Speed 2628.97 samples/sec   Loss 6.9260   LearningRate 0.0264   Epoch: 9   Global Step: 403680   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:05:11,122-Speed 2581.79 samples/sec   Loss 6.7139   LearningRate 0.0264   Epoch: 9   Global Step: 403690   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:05:15,047-Speed 2609.87 samples/sec   Loss 6.8303   LearningRate 0.0264   Epoch: 9   Global Step: 403700   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:05:19,046-Speed 2560.99 samples/sec   Loss 6.8935   LearningRate 0.0264   Epoch: 9   Global Step: 403710   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:05:22,943-Speed 2628.32 samples/sec   Loss 6.8384   LearningRate 0.0264   Epoch: 9   Global Step: 403720   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:05:26,842-Speed 2627.61 samples/sec   Loss 6.8663   LearningRate 0.0264   Epoch: 9   Global Step: 403730   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:05:30,722-Speed 2639.60 samples/sec   Loss 6.8644   LearningRate 0.0263   Epoch: 9   Global Step: 403740   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:05:34,622-Speed 2625.96 samples/sec   Loss 6.8726   LearningRate 0.0263   Epoch: 9   Global Step: 403750   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:05:38,527-Speed 2622.95 samples/sec   Loss 6.9161   LearningRate 0.0263   Epoch: 9   Global Step: 403760   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:05:42,431-Speed 2623.18 samples/sec   Loss 6.8022   LearningRate 0.0263   Epoch: 9   Global Step: 403770   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:05:46,338-Speed 2621.98 samples/sec   Loss 6.8755   LearningRate 0.0263   Epoch: 9   Global Step: 403780   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:05:50,233-Speed 2629.53 samples/sec   Loss 6.8810   LearningRate 0.0263   Epoch: 9   Global Step: 403790   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:05:54,128-Speed 2629.84 samples/sec   Loss 6.8622   LearningRate 0.0263   Epoch: 9   Global Step: 403800   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:05:58,028-Speed 2626.47 samples/sec   Loss 6.9276   LearningRate 0.0263   Epoch: 9   Global Step: 403810   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:06:01,936-Speed 2620.66 samples/sec   Loss 6.8656   LearningRate 0.0263   Epoch: 9   Global Step: 403820   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:06:05,831-Speed 2629.40 samples/sec   Loss 6.7369   LearningRate 0.0263   Epoch: 9   Global Step: 403830   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:06:09,729-Speed 2627.46 samples/sec   Loss 6.8440   LearningRate 0.0263   Epoch: 9   Global Step: 403840   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:13,627-Speed 2627.54 samples/sec   Loss 6.9481   LearningRate 0.0263   Epoch: 9   Global Step: 403850   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:17,526-Speed 2627.30 samples/sec   Loss 6.9248   LearningRate 0.0263   Epoch: 9   Global Step: 403860   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:21,446-Speed 2613.00 samples/sec   Loss 6.9052   LearningRate 0.0263   Epoch: 9   Global Step: 403870   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:25,344-Speed 2628.23 samples/sec   Loss 6.9588   LearningRate 0.0263   Epoch: 9   Global Step: 403880   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:29,244-Speed 2626.36 samples/sec   Loss 6.8928   LearningRate 0.0263   Epoch: 9   Global Step: 403890   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:33,154-Speed 2619.46 samples/sec   Loss 6.8906   LearningRate 0.0263   Epoch: 9   Global Step: 403900   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:37,057-Speed 2624.20 samples/sec   Loss 6.7542   LearningRate 0.0263   Epoch: 9   Global Step: 403910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:40,957-Speed 2626.43 samples/sec   Loss 6.8258   LearningRate 0.0263   Epoch: 9   Global Step: 403920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:44,855-Speed 2627.68 samples/sec   Loss 6.8201   LearningRate 0.0263   Epoch: 9   Global Step: 403930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:06:48,758-Speed 2624.58 samples/sec   Loss 6.9351   LearningRate 0.0263   Epoch: 9   Global Step: 403940   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:06:52,656-Speed 2628.55 samples/sec   Loss 6.8549   LearningRate 0.0263   Epoch: 9   Global Step: 403950   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:06:56,540-Speed 2636.55 samples/sec   Loss 6.8714   LearningRate 0.0263   Epoch: 9   Global Step: 403960   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:07:00,445-Speed 2623.58 samples/sec   Loss 6.9789   LearningRate 0.0263   Epoch: 9   Global Step: 403970   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:07:04,351-Speed 2621.94 samples/sec   Loss 6.8440   LearningRate 0.0263   Epoch: 9   Global Step: 403980   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:07:08,251-Speed 2626.57 samples/sec   Loss 6.7572   LearningRate 0.0263   Epoch: 9   Global Step: 403990   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:07:12,154-Speed 2624.19 samples/sec   Loss 6.8422   LearningRate 0.0263   Epoch: 9   Global Step: 404000   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:07:16,040-Speed 2635.70 samples/sec   Loss 6.8129   LearningRate 0.0263   Epoch: 9   Global Step: 404010   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:19,950-Speed 2619.77 samples/sec   Loss 6.9243   LearningRate 0.0263   Epoch: 9   Global Step: 404020   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:23,850-Speed 2627.03 samples/sec   Loss 6.8508   LearningRate 0.0263   Epoch: 9   Global Step: 404030   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:27,738-Speed 2634.14 samples/sec   Loss 6.7770   LearningRate 0.0263   Epoch: 9   Global Step: 404040   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:31,642-Speed 2623.98 samples/sec   Loss 6.7914   LearningRate 0.0263   Epoch: 9   Global Step: 404050   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:35,538-Speed 2628.35 samples/sec   Loss 6.8287   LearningRate 0.0263   Epoch: 9   Global Step: 404060   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:39,439-Speed 2626.05 samples/sec   Loss 6.8158   LearningRate 0.0263   Epoch: 9   Global Step: 404070   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:43,336-Speed 2628.08 samples/sec   Loss 6.7710   LearningRate 0.0263   Epoch: 9   Global Step: 404080   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:47,240-Speed 2624.03 samples/sec   Loss 6.8784   LearningRate 0.0263   Epoch: 9   Global Step: 404090   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:51,139-Speed 2626.65 samples/sec   Loss 6.8080   LearningRate 0.0263   Epoch: 9   Global Step: 404100   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:07:55,046-Speed 2621.42 samples/sec   Loss 6.8630   LearningRate 0.0263   Epoch: 9   Global Step: 404110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:07:58,949-Speed 2624.65 samples/sec   Loss 6.7967   LearningRate 0.0263   Epoch: 9   Global Step: 404120   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:08:02,851-Speed 2624.91 samples/sec   Loss 6.8560   LearningRate 0.0263   Epoch: 9   Global Step: 404130   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:08:06,760-Speed 2620.03 samples/sec   Loss 6.8530   LearningRate 0.0263   Epoch: 9   Global Step: 404140   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:08:10,655-Speed 2629.65 samples/sec   Loss 6.7337   LearningRate 0.0263   Epoch: 9   Global Step: 404150   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:08:14,606-Speed 2593.05 samples/sec   Loss 6.8676   LearningRate 0.0263   Epoch: 9   Global Step: 404160   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:08:18,474-Speed 2648.17 samples/sec   Loss 6.8815   LearningRate 0.0263   Epoch: 9   Global Step: 404170   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:22,370-Speed 2629.12 samples/sec   Loss 6.8173   LearningRate 0.0263   Epoch: 9   Global Step: 404180   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:26,266-Speed 2629.29 samples/sec   Loss 6.9371   LearningRate 0.0263   Epoch: 9   Global Step: 404190   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:30,173-Speed 2621.71 samples/sec   Loss 6.8785   LearningRate 0.0263   Epoch: 9   Global Step: 404200   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:34,094-Speed 2612.51 samples/sec   Loss 6.8319   LearningRate 0.0263   Epoch: 9   Global Step: 404210   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:37,985-Speed 2632.48 samples/sec   Loss 6.8854   LearningRate 0.0263   Epoch: 9   Global Step: 404220   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:41,889-Speed 2623.05 samples/sec   Loss 7.0140   LearningRate 0.0263   Epoch: 9   Global Step: 404230   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:45,786-Speed 2628.65 samples/sec   Loss 6.8271   LearningRate 0.0263   Epoch: 9   Global Step: 404240   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:49,680-Speed 2630.38 samples/sec   Loss 6.8112   LearningRate 0.0263   Epoch: 9   Global Step: 404250   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:53,581-Speed 2625.75 samples/sec   Loss 6.9461   LearningRate 0.0263   Epoch: 9   Global Step: 404260   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:08:57,475-Speed 2630.26 samples/sec   Loss 6.8970   LearningRate 0.0263   Epoch: 9   Global Step: 404270   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:01,388-Speed 2617.81 samples/sec   Loss 6.7254   LearningRate 0.0263   Epoch: 9   Global Step: 404280   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:05,301-Speed 2617.99 samples/sec   Loss 6.7903   LearningRate 0.0263   Epoch: 9   Global Step: 404290   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:09,198-Speed 2627.68 samples/sec   Loss 6.9243   LearningRate 0.0263   Epoch: 9   Global Step: 404300   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:13,094-Speed 2628.90 samples/sec   Loss 6.7431   LearningRate 0.0263   Epoch: 9   Global Step: 404310   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:16,993-Speed 2627.16 samples/sec   Loss 6.8841   LearningRate 0.0263   Epoch: 9   Global Step: 404320   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:20,894-Speed 2626.09 samples/sec   Loss 6.9538   LearningRate 0.0263   Epoch: 9   Global Step: 404330   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:24,792-Speed 2627.17 samples/sec   Loss 6.8671   LearningRate 0.0263   Epoch: 9   Global Step: 404340   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:09:28,680-Speed 2634.91 samples/sec   Loss 6.8252   LearningRate 0.0263   Epoch: 9   Global Step: 404350   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:32,573-Speed 2630.49 samples/sec   Loss 6.9866   LearningRate 0.0263   Epoch: 9   Global Step: 404360   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:36,473-Speed 2627.05 samples/sec   Loss 7.0046   LearningRate 0.0263   Epoch: 9   Global Step: 404370   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:40,377-Speed 2623.48 samples/sec   Loss 6.8468   LearningRate 0.0263   Epoch: 9   Global Step: 404380   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:44,296-Speed 2613.00 samples/sec   Loss 6.9777   LearningRate 0.0263   Epoch: 9   Global Step: 404390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:48,218-Speed 2611.42 samples/sec   Loss 6.9190   LearningRate 0.0263   Epoch: 9   Global Step: 404400   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:52,129-Speed 2619.82 samples/sec   Loss 6.7674   LearningRate 0.0263   Epoch: 9   Global Step: 404410   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:56,050-Speed 2612.28 samples/sec   Loss 6.8213   LearningRate 0.0263   Epoch: 9   Global Step: 404420   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:09:59,973-Speed 2610.82 samples/sec   Loss 6.8945   LearningRate 0.0263   Epoch: 9   Global Step: 404430   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:10:03,876-Speed 2624.45 samples/sec   Loss 6.9117   LearningRate 0.0263   Epoch: 9   Global Step: 404440   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:10:07,784-Speed 2620.83 samples/sec   Loss 6.8010   LearningRate 0.0263   Epoch: 9   Global Step: 404450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:11,708-Speed 2610.42 samples/sec   Loss 6.8129   LearningRate 0.0263   Epoch: 9   Global Step: 404460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:15,621-Speed 2617.75 samples/sec   Loss 6.7700   LearningRate 0.0263   Epoch: 9   Global Step: 404470   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:19,523-Speed 2624.50 samples/sec   Loss 6.8817   LearningRate 0.0263   Epoch: 9   Global Step: 404480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:23,418-Speed 2629.86 samples/sec   Loss 6.6957   LearningRate 0.0263   Epoch: 9   Global Step: 404490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:27,329-Speed 2619.56 samples/sec   Loss 6.9000   LearningRate 0.0263   Epoch: 9   Global Step: 404500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:31,232-Speed 2623.78 samples/sec   Loss 6.9284   LearningRate 0.0263   Epoch: 9   Global Step: 404510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:35,222-Speed 2567.22 samples/sec   Loss 6.8376   LearningRate 0.0263   Epoch: 9   Global Step: 404520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:39,119-Speed 2628.24 samples/sec   Loss 6.9872   LearningRate 0.0263   Epoch: 9   Global Step: 404530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:43,018-Speed 2627.24 samples/sec   Loss 6.7994   LearningRate 0.0263   Epoch: 9   Global Step: 404540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:46,916-Speed 2627.73 samples/sec   Loss 6.9292   LearningRate 0.0262   Epoch: 9   Global Step: 404550   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:10:50,809-Speed 2630.91 samples/sec   Loss 6.8524   LearningRate 0.0262   Epoch: 9   Global Step: 404560   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:10:54,683-Speed 2644.11 samples/sec   Loss 6.8986   LearningRate 0.0262   Epoch: 9   Global Step: 404570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:10:58,577-Speed 2630.40 samples/sec   Loss 6.7863   LearningRate 0.0262   Epoch: 9   Global Step: 404580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:02,482-Speed 2622.20 samples/sec   Loss 6.8071   LearningRate 0.0262   Epoch: 9   Global Step: 404590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:06,377-Speed 2629.46 samples/sec   Loss 6.9177   LearningRate 0.0262   Epoch: 9   Global Step: 404600   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:10,278-Speed 2626.37 samples/sec   Loss 6.7940   LearningRate 0.0262   Epoch: 9   Global Step: 404610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:14,224-Speed 2595.51 samples/sec   Loss 6.8021   LearningRate 0.0262   Epoch: 9   Global Step: 404620   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:18,123-Speed 2627.38 samples/sec   Loss 6.9395   LearningRate 0.0262   Epoch: 9   Global Step: 404630   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:22,036-Speed 2617.92 samples/sec   Loss 6.8219   LearningRate 0.0262   Epoch: 9   Global Step: 404640   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:25,930-Speed 2630.03 samples/sec   Loss 6.8757   LearningRate 0.0262   Epoch: 9   Global Step: 404650   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:29,831-Speed 2626.40 samples/sec   Loss 6.9319   LearningRate 0.0262   Epoch: 9   Global Step: 404660   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:33,724-Speed 2630.54 samples/sec   Loss 6.7930   LearningRate 0.0262   Epoch: 9   Global Step: 404670   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:11:37,607-Speed 2637.71 samples/sec   Loss 6.7937   LearningRate 0.0262   Epoch: 9   Global Step: 404680   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:41,504-Speed 2628.02 samples/sec   Loss 6.8581   LearningRate 0.0262   Epoch: 9   Global Step: 404690   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:11:45,389-Speed 2636.75 samples/sec   Loss 7.0308   LearningRate 0.0262   Epoch: 9   Global Step: 404700   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:11:49,304-Speed 2616.51 samples/sec   Loss 6.7442   LearningRate 0.0262   Epoch: 9   Global Step: 404710   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:11:53,209-Speed 2623.38 samples/sec   Loss 6.8578   LearningRate 0.0262   Epoch: 9   Global Step: 404720   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:11:57,132-Speed 2610.79 samples/sec   Loss 6.9268   LearningRate 0.0262   Epoch: 9   Global Step: 404730   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:12:01,134-Speed 2559.15 samples/sec   Loss 6.9769   LearningRate 0.0262   Epoch: 9   Global Step: 404740   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:12:05,033-Speed 2627.12 samples/sec   Loss 6.9335   LearningRate 0.0262   Epoch: 9   Global Step: 404750   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:12:08,934-Speed 2625.45 samples/sec   Loss 6.8983   LearningRate 0.0262   Epoch: 9   Global Step: 404760   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:12:12,831-Speed 2627.92 samples/sec   Loss 6.8393   LearningRate 0.0262   Epoch: 9   Global Step: 404770   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:12:16,728-Speed 2628.58 samples/sec   Loss 6.7915   LearningRate 0.0262   Epoch: 9   Global Step: 404780   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:12:20,623-Speed 2630.14 samples/sec   Loss 6.7294   LearningRate 0.0262   Epoch: 9   Global Step: 404790   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:12:24,523-Speed 2626.51 samples/sec   Loss 6.8130   LearningRate 0.0262   Epoch: 9   Global Step: 404800   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:28,420-Speed 2628.00 samples/sec   Loss 6.8972   LearningRate 0.0262   Epoch: 9   Global Step: 404810   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:32,321-Speed 2626.29 samples/sec   Loss 6.8560   LearningRate 0.0262   Epoch: 9   Global Step: 404820   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:36,219-Speed 2627.21 samples/sec   Loss 6.8877   LearningRate 0.0262   Epoch: 9   Global Step: 404830   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:40,113-Speed 2629.93 samples/sec   Loss 6.7806   LearningRate 0.0262   Epoch: 9   Global Step: 404840   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:44,015-Speed 2625.40 samples/sec   Loss 6.7986   LearningRate 0.0262   Epoch: 9   Global Step: 404850   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:47,917-Speed 2625.24 samples/sec   Loss 6.9130   LearningRate 0.0262   Epoch: 9   Global Step: 404860   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:51,820-Speed 2624.09 samples/sec   Loss 6.8185   LearningRate 0.0262   Epoch: 9   Global Step: 404870   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:55,725-Speed 2623.23 samples/sec   Loss 6.9179   LearningRate 0.0262   Epoch: 9   Global Step: 404880   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:12:59,650-Speed 2609.50 samples/sec   Loss 6.8328   LearningRate 0.0262   Epoch: 9   Global Step: 404890   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:13:03,546-Speed 2628.81 samples/sec   Loss 6.9444   LearningRate 0.0262   Epoch: 9   Global Step: 404900   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:13:07,428-Speed 2638.20 samples/sec   Loss 6.8004   LearningRate 0.0262   Epoch: 9   Global Step: 404910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:13:11,327-Speed 2626.90 samples/sec   Loss 6.8098   LearningRate 0.0262   Epoch: 9   Global Step: 404920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:13:15,225-Speed 2628.11 samples/sec   Loss 6.9086   LearningRate 0.0262   Epoch: 9   Global Step: 404930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:13:19,134-Speed 2620.00 samples/sec   Loss 6.7276   LearningRate 0.0262   Epoch: 9   Global Step: 404940   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:13:23,044-Speed 2619.90 samples/sec   Loss 6.9524   LearningRate 0.0262   Epoch: 9   Global Step: 404950   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:13:26,941-Speed 2628.01 samples/sec   Loss 6.8718   LearningRate 0.0262   Epoch: 9   Global Step: 404960   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:13:30,900-Speed 2587.76 samples/sec   Loss 6.8065   LearningRate 0.0262   Epoch: 9   Global Step: 404970   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:13:34,803-Speed 2623.83 samples/sec   Loss 6.8455   LearningRate 0.0262   Epoch: 9   Global Step: 404980   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:13:38,744-Speed 2599.11 samples/sec   Loss 6.9616   LearningRate 0.0262   Epoch: 9   Global Step: 404990   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:13:42,644-Speed 2626.19 samples/sec   Loss 6.8932   LearningRate 0.0262   Epoch: 9   Global Step: 405000   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:13:46,549-Speed 2623.07 samples/sec   Loss 6.8518   LearningRate 0.0262   Epoch: 9   Global Step: 405010   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:13:50,453-Speed 2623.33 samples/sec   Loss 6.8395   LearningRate 0.0262   Epoch: 9   Global Step: 405020   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:13:54,349-Speed 2629.18 samples/sec   Loss 6.8311   LearningRate 0.0262   Epoch: 9   Global Step: 405030   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:13:58,247-Speed 2627.25 samples/sec   Loss 6.8272   LearningRate 0.0262   Epoch: 9   Global Step: 405040   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:14:02,166-Speed 2614.25 samples/sec   Loss 6.8057   LearningRate 0.0262   Epoch: 9   Global Step: 405050   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:14:06,068-Speed 2624.51 samples/sec   Loss 6.7418   LearningRate 0.0262   Epoch: 9   Global Step: 405060   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:14:09,971-Speed 2624.71 samples/sec   Loss 6.8193   LearningRate 0.0262   Epoch: 9   Global Step: 405070   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:13,866-Speed 2629.01 samples/sec   Loss 6.8205   LearningRate 0.0262   Epoch: 9   Global Step: 405080   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:17,765-Speed 2627.55 samples/sec   Loss 6.9422   LearningRate 0.0262   Epoch: 9   Global Step: 405090   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:21,664-Speed 2626.96 samples/sec   Loss 6.8828   LearningRate 0.0262   Epoch: 9   Global Step: 405100   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:25,561-Speed 2627.95 samples/sec   Loss 6.8515   LearningRate 0.0262   Epoch: 9   Global Step: 405110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:29,460-Speed 2626.82 samples/sec   Loss 6.8283   LearningRate 0.0262   Epoch: 9   Global Step: 405120   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:33,361-Speed 2625.63 samples/sec   Loss 6.8627   LearningRate 0.0262   Epoch: 9   Global Step: 405130   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:37,266-Speed 2623.41 samples/sec   Loss 6.8060   LearningRate 0.0262   Epoch: 9   Global Step: 405140   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:41,174-Speed 2620.72 samples/sec   Loss 6.7542   LearningRate 0.0262   Epoch: 9   Global Step: 405150   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:45,068-Speed 2629.97 samples/sec   Loss 6.8522   LearningRate 0.0262   Epoch: 9   Global Step: 405160   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:48,963-Speed 2629.43 samples/sec   Loss 6.8363   LearningRate 0.0262   Epoch: 9   Global Step: 405170   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:14:52,894-Speed 2605.97 samples/sec   Loss 6.8933   LearningRate 0.0262   Epoch: 9   Global Step: 405180   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:14:56,789-Speed 2630.05 samples/sec   Loss 6.8206   LearningRate 0.0262   Epoch: 9   Global Step: 405190   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:15:00,698-Speed 2620.08 samples/sec   Loss 6.9592   LearningRate 0.0262   Epoch: 9   Global Step: 405200   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:15:04,590-Speed 2632.67 samples/sec   Loss 6.8122   LearningRate 0.0262   Epoch: 9   Global Step: 405210   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:15:08,481-Speed 2632.19 samples/sec   Loss 6.8608   LearningRate 0.0262   Epoch: 9   Global Step: 405220   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:15:12,382-Speed 2625.37 samples/sec   Loss 6.8016   LearningRate 0.0262   Epoch: 9   Global Step: 405230   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:15:16,259-Speed 2641.58 samples/sec   Loss 6.9281   LearningRate 0.0262   Epoch: 9   Global Step: 405240   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:20,190-Speed 2605.71 samples/sec   Loss 6.9399   LearningRate 0.0262   Epoch: 9   Global Step: 405250   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:24,089-Speed 2627.12 samples/sec   Loss 6.8422   LearningRate 0.0262   Epoch: 9   Global Step: 405260   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:27,994-Speed 2622.87 samples/sec   Loss 6.9219   LearningRate 0.0262   Epoch: 9   Global Step: 405270   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:31,895-Speed 2625.48 samples/sec   Loss 6.8533   LearningRate 0.0262   Epoch: 9   Global Step: 405280   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:35,797-Speed 2625.59 samples/sec   Loss 6.8488   LearningRate 0.0262   Epoch: 9   Global Step: 405290   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:39,739-Speed 2598.48 samples/sec   Loss 6.8612   LearningRate 0.0262   Epoch: 9   Global Step: 405300   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:43,634-Speed 2629.13 samples/sec   Loss 6.9075   LearningRate 0.0262   Epoch: 9   Global Step: 405310   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:47,537-Speed 2624.21 samples/sec   Loss 6.8978   LearningRate 0.0262   Epoch: 9   Global Step: 405320   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:51,436-Speed 2627.36 samples/sec   Loss 6.9366   LearningRate 0.0262   Epoch: 9   Global Step: 405330   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:15:55,348-Speed 2617.96 samples/sec   Loss 6.9162   LearningRate 0.0262   Epoch: 9   Global Step: 405340   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:15:59,287-Speed 2600.51 samples/sec   Loss 6.9861   LearningRate 0.0262   Epoch: 9   Global Step: 405350   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:16:03,184-Speed 2628.60 samples/sec   Loss 6.8542   LearningRate 0.0261   Epoch: 9   Global Step: 405360   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:16:07,128-Speed 2596.64 samples/sec   Loss 6.8967   LearningRate 0.0261   Epoch: 9   Global Step: 405370   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:16:11,008-Speed 2640.39 samples/sec   Loss 6.8273   LearningRate 0.0261   Epoch: 9   Global Step: 405380   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:14,907-Speed 2626.40 samples/sec   Loss 6.9119   LearningRate 0.0261   Epoch: 9   Global Step: 405390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:18,819-Speed 2618.01 samples/sec   Loss 6.8703   LearningRate 0.0261   Epoch: 9   Global Step: 405400   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:22,723-Speed 2624.14 samples/sec   Loss 6.9004   LearningRate 0.0261   Epoch: 9   Global Step: 405410   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:26,616-Speed 2630.92 samples/sec   Loss 6.7628   LearningRate 0.0261   Epoch: 9   Global Step: 405420   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:30,519-Speed 2624.48 samples/sec   Loss 6.7016   LearningRate 0.0261   Epoch: 9   Global Step: 405430   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:34,412-Speed 2630.87 samples/sec   Loss 6.8223   LearningRate 0.0261   Epoch: 9   Global Step: 405440   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:38,307-Speed 2629.90 samples/sec   Loss 6.8174   LearningRate 0.0261   Epoch: 9   Global Step: 405450   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:42,200-Speed 2631.14 samples/sec   Loss 6.8620   LearningRate 0.0261   Epoch: 9   Global Step: 405460   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:46,108-Speed 2621.16 samples/sec   Loss 6.8198   LearningRate 0.0261   Epoch: 9   Global Step: 405470   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:16:50,003-Speed 2629.53 samples/sec   Loss 6.7646   LearningRate 0.0261   Epoch: 9   Global Step: 405480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:16:53,909-Speed 2622.73 samples/sec   Loss 6.8763   LearningRate 0.0261   Epoch: 9   Global Step: 405490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:16:57,804-Speed 2629.50 samples/sec   Loss 6.9493   LearningRate 0.0261   Epoch: 9   Global Step: 405500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:01,699-Speed 2629.87 samples/sec   Loss 6.8341   LearningRate 0.0261   Epoch: 9   Global Step: 405510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:05,594-Speed 2629.79 samples/sec   Loss 6.9353   LearningRate 0.0261   Epoch: 9   Global Step: 405520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:09,488-Speed 2629.98 samples/sec   Loss 6.8344   LearningRate 0.0261   Epoch: 9   Global Step: 405530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:13,385-Speed 2628.60 samples/sec   Loss 6.8790   LearningRate 0.0261   Epoch: 9   Global Step: 405540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:17,294-Speed 2620.35 samples/sec   Loss 6.9944   LearningRate 0.0261   Epoch: 9   Global Step: 405550   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:21,195-Speed 2625.42 samples/sec   Loss 6.8000   LearningRate 0.0261   Epoch: 9   Global Step: 405560   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:25,100-Speed 2623.21 samples/sec   Loss 6.8226   LearningRate 0.0261   Epoch: 9   Global Step: 405570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:29,003-Speed 2624.05 samples/sec   Loss 6.8085   LearningRate 0.0261   Epoch: 9   Global Step: 405580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:17:32,884-Speed 2638.97 samples/sec   Loss 6.7718   LearningRate 0.0261   Epoch: 9   Global Step: 405590   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:17:36,792-Speed 2621.23 samples/sec   Loss 6.7805   LearningRate 0.0261   Epoch: 9   Global Step: 405600   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:17:40,689-Speed 2628.27 samples/sec   Loss 6.8188   LearningRate 0.0261   Epoch: 9   Global Step: 405610   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:17:44,585-Speed 2628.82 samples/sec   Loss 6.7810   LearningRate 0.0261   Epoch: 9   Global Step: 405620   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:17:48,493-Speed 2621.48 samples/sec   Loss 6.8275   LearningRate 0.0261   Epoch: 9   Global Step: 405630   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:17:52,414-Speed 2612.02 samples/sec   Loss 6.8443   LearningRate 0.0261   Epoch: 9   Global Step: 405640   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:17:56,330-Speed 2615.46 samples/sec   Loss 6.8897   LearningRate 0.0261   Epoch: 9   Global Step: 405650   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:00,226-Speed 2629.06 samples/sec   Loss 6.8299   LearningRate 0.0261   Epoch: 9   Global Step: 405660   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:04,120-Speed 2630.82 samples/sec   Loss 6.9231   LearningRate 0.0261   Epoch: 9   Global Step: 405670   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:08,021-Speed 2625.04 samples/sec   Loss 6.8190   LearningRate 0.0261   Epoch: 9   Global Step: 405680   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:11,928-Speed 2621.65 samples/sec   Loss 6.8767   LearningRate 0.0261   Epoch: 9   Global Step: 405690   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:15,839-Speed 2619.19 samples/sec   Loss 6.8347   LearningRate 0.0261   Epoch: 9   Global Step: 405700   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:19,739-Speed 2626.62 samples/sec   Loss 6.8730   LearningRate 0.0261   Epoch: 9   Global Step: 405710   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:23,645-Speed 2622.03 samples/sec   Loss 6.8021   LearningRate 0.0261   Epoch: 9   Global Step: 405720   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:27,543-Speed 2627.82 samples/sec   Loss 6.9316   LearningRate 0.0261   Epoch: 9   Global Step: 405730   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:31,439-Speed 2628.61 samples/sec   Loss 6.7971   LearningRate 0.0261   Epoch: 9   Global Step: 405740   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:35,354-Speed 2616.26 samples/sec   Loss 6.9642   LearningRate 0.0261   Epoch: 9   Global Step: 405750   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:39,252-Speed 2627.40 samples/sec   Loss 6.7888   LearningRate 0.0261   Epoch: 9   Global Step: 405760   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:18:43,132-Speed 2640.01 samples/sec   Loss 6.7702   LearningRate 0.0261   Epoch: 9   Global Step: 405770   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:47,025-Speed 2631.12 samples/sec   Loss 6.8518   LearningRate 0.0261   Epoch: 9   Global Step: 405780   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:50,918-Speed 2631.31 samples/sec   Loss 6.9087   LearningRate 0.0261   Epoch: 9   Global Step: 405790   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:54,818-Speed 2626.44 samples/sec   Loss 6.7937   LearningRate 0.0261   Epoch: 9   Global Step: 405800   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:18:58,734-Speed 2615.25 samples/sec   Loss 6.8506   LearningRate 0.0261   Epoch: 9   Global Step: 405810   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:02,633-Speed 2626.77 samples/sec   Loss 6.9186   LearningRate 0.0261   Epoch: 9   Global Step: 405820   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:06,527-Speed 2629.98 samples/sec   Loss 7.0167   LearningRate 0.0261   Epoch: 9   Global Step: 405830   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:10,420-Speed 2631.40 samples/sec   Loss 6.7713   LearningRate 0.0261   Epoch: 9   Global Step: 405840   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:14,315-Speed 2628.99 samples/sec   Loss 6.8623   LearningRate 0.0261   Epoch: 9   Global Step: 405850   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:18,210-Speed 2629.87 samples/sec   Loss 6.8897   LearningRate 0.0261   Epoch: 9   Global Step: 405860   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:22,104-Speed 2630.33 samples/sec   Loss 6.8160   LearningRate 0.0261   Epoch: 9   Global Step: 405870   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:19:26,001-Speed 2629.06 samples/sec   Loss 6.7670   LearningRate 0.0261   Epoch: 9   Global Step: 405880   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:19:29,898-Speed 2627.60 samples/sec   Loss 6.7867   LearningRate 0.0261   Epoch: 9   Global Step: 405890   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:19:33,795-Speed 2628.19 samples/sec   Loss 6.7284   LearningRate 0.0261   Epoch: 9   Global Step: 405900   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:19:37,706-Speed 2619.13 samples/sec   Loss 6.8361   LearningRate 0.0261   Epoch: 9   Global Step: 405910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:19:41,606-Speed 2626.41 samples/sec   Loss 6.9804   LearningRate 0.0261   Epoch: 9   Global Step: 405920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:19:45,481-Speed 2643.13 samples/sec   Loss 6.8999   LearningRate 0.0261   Epoch: 9   Global Step: 405930   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:49,398-Speed 2614.53 samples/sec   Loss 6.8834   LearningRate 0.0261   Epoch: 9   Global Step: 405940   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:53,306-Speed 2621.03 samples/sec   Loss 6.8535   LearningRate 0.0261   Epoch: 9   Global Step: 405950   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:19:57,201-Speed 2629.73 samples/sec   Loss 6.8046   LearningRate 0.0261   Epoch: 9   Global Step: 405960   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:20:01,108-Speed 2621.79 samples/sec   Loss 6.7431   LearningRate 0.0261   Epoch: 9   Global Step: 405970   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:20:05,005-Speed 2628.22 samples/sec   Loss 6.7714   LearningRate 0.0261   Epoch: 9   Global Step: 405980   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:20:08,901-Speed 2628.71 samples/sec   Loss 6.7304   LearningRate 0.0261   Epoch: 9   Global Step: 405990   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:20:12,801-Speed 2626.41 samples/sec   Loss 6.8636   LearningRate 0.0261   Epoch: 9   Global Step: 406000   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:20:16,697-Speed 2628.66 samples/sec   Loss 6.8237   LearningRate 0.0261   Epoch: 9   Global Step: 406010   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:20:20,602-Speed 2622.74 samples/sec   Loss 6.7940   LearningRate 0.0261   Epoch: 9   Global Step: 406020   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:20:24,567-Speed 2583.55 samples/sec   Loss 6.9471   LearningRate 0.0261   Epoch: 9   Global Step: 406030   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:28,470-Speed 2623.73 samples/sec   Loss 6.7125   LearningRate 0.0261   Epoch: 9   Global Step: 406040   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:32,469-Speed 2561.70 samples/sec   Loss 6.8151   LearningRate 0.0261   Epoch: 9   Global Step: 406050   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:36,361-Speed 2632.00 samples/sec   Loss 6.8060   LearningRate 0.0261   Epoch: 9   Global Step: 406060   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:40,269-Speed 2620.42 samples/sec   Loss 6.7556   LearningRate 0.0261   Epoch: 9   Global Step: 406070   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:44,165-Speed 2629.09 samples/sec   Loss 6.7356   LearningRate 0.0261   Epoch: 9   Global Step: 406080   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:48,062-Speed 2628.45 samples/sec   Loss 6.9834   LearningRate 0.0261   Epoch: 9   Global Step: 406090   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:51,964-Speed 2624.68 samples/sec   Loss 6.8534   LearningRate 0.0261   Epoch: 9   Global Step: 406100   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:55,861-Speed 2628.35 samples/sec   Loss 6.7633   LearningRate 0.0261   Epoch: 9   Global Step: 406110   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:20:59,754-Speed 2630.76 samples/sec   Loss 6.7827   LearningRate 0.0261   Epoch: 9   Global Step: 406120   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:21:03,632-Speed 2641.67 samples/sec   Loss 6.9425   LearningRate 0.0261   Epoch: 9   Global Step: 406130   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:21:07,535-Speed 2623.94 samples/sec   Loss 6.9778   LearningRate 0.0261   Epoch: 9   Global Step: 406140   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:21:11,436-Speed 2625.23 samples/sec   Loss 6.8856   LearningRate 0.0261   Epoch: 9   Global Step: 406150   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:21:15,324-Speed 2634.37 samples/sec   Loss 6.7174   LearningRate 0.0261   Epoch: 9   Global Step: 406160   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:19,222-Speed 2627.80 samples/sec   Loss 6.8066   LearningRate 0.0260   Epoch: 9   Global Step: 406170   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:23,120-Speed 2627.66 samples/sec   Loss 6.8181   LearningRate 0.0260   Epoch: 9   Global Step: 406180   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:27,018-Speed 2627.33 samples/sec   Loss 6.6909   LearningRate 0.0260   Epoch: 9   Global Step: 406190   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:30,918-Speed 2626.32 samples/sec   Loss 6.8643   LearningRate 0.0260   Epoch: 9   Global Step: 406200   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:34,811-Speed 2631.28 samples/sec   Loss 6.7414   LearningRate 0.0260   Epoch: 9   Global Step: 406210   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:38,709-Speed 2627.07 samples/sec   Loss 6.8680   LearningRate 0.0260   Epoch: 9   Global Step: 406220   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:42,606-Speed 2628.62 samples/sec   Loss 6.7670   LearningRate 0.0260   Epoch: 9   Global Step: 406230   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:46,512-Speed 2622.62 samples/sec   Loss 6.8578   LearningRate 0.0260   Epoch: 9   Global Step: 406240   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:50,408-Speed 2629.01 samples/sec   Loss 6.9184   LearningRate 0.0260   Epoch: 9   Global Step: 406250   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-04-14 17:21:54,303-Speed 2629.23 samples/sec   Loss 6.9083   LearningRate 0.0260   Epoch: 9   Global Step: 406260   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:21:58,208-Speed 2623.34 samples/sec   Loss 6.8656   LearningRate 0.0260   Epoch: 9   Global Step: 406270   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:02,109-Speed 2625.74 samples/sec   Loss 6.8304   LearningRate 0.0260   Epoch: 9   Global Step: 406280   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:06,006-Speed 2628.22 samples/sec   Loss 6.7757   LearningRate 0.0260   Epoch: 9   Global Step: 406290   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:09,897-Speed 2631.99 samples/sec   Loss 6.8997   LearningRate 0.0260   Epoch: 9   Global Step: 406300   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:13,794-Speed 2628.60 samples/sec   Loss 6.9018   LearningRate 0.0260   Epoch: 9   Global Step: 406310   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:17,695-Speed 2625.91 samples/sec   Loss 6.8271   LearningRate 0.0260   Epoch: 9   Global Step: 406320   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:21,593-Speed 2627.53 samples/sec   Loss 6.7784   LearningRate 0.0260   Epoch: 9   Global Step: 406330   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:25,490-Speed 2627.92 samples/sec   Loss 6.8850   LearningRate 0.0260   Epoch: 9   Global Step: 406340   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:29,413-Speed 2611.12 samples/sec   Loss 6.7340   LearningRate 0.0260   Epoch: 9   Global Step: 406350   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:33,323-Speed 2620.01 samples/sec   Loss 6.7459   LearningRate 0.0260   Epoch: 9   Global Step: 406360   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:22:37,214-Speed 2632.14 samples/sec   Loss 6.8466   LearningRate 0.0260   Epoch: 9   Global Step: 406370   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:41,113-Speed 2626.37 samples/sec   Loss 6.8819   LearningRate 0.0260   Epoch: 9   Global Step: 406380   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:45,012-Speed 2627.31 samples/sec   Loss 6.7361   LearningRate 0.0260   Epoch: 9   Global Step: 406390   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:48,908-Speed 2629.57 samples/sec   Loss 6.8408   LearningRate 0.0260   Epoch: 9   Global Step: 406400   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:52,804-Speed 2628.90 samples/sec   Loss 6.7555   LearningRate 0.0260   Epoch: 9   Global Step: 406410   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:22:56,711-Speed 2621.08 samples/sec   Loss 6.9230   LearningRate 0.0260   Epoch: 9   Global Step: 406420   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:00,621-Speed 2619.55 samples/sec   Loss 6.6788   LearningRate 0.0260   Epoch: 9   Global Step: 406430   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:04,517-Speed 2628.65 samples/sec   Loss 6.8723   LearningRate 0.0260   Epoch: 9   Global Step: 406440   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:08,420-Speed 2624.85 samples/sec   Loss 6.8453   LearningRate 0.0260   Epoch: 9   Global Step: 406450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:12,319-Speed 2627.08 samples/sec   Loss 6.8184   LearningRate 0.0260   Epoch: 9   Global Step: 406460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:16,216-Speed 2628.00 samples/sec   Loss 6.7268   LearningRate 0.0260   Epoch: 9   Global Step: 406470   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:23:20,181-Speed 2583.22 samples/sec   Loss 6.8848   LearningRate 0.0260   Epoch: 9   Global Step: 406480   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:23:24,089-Speed 2620.98 samples/sec   Loss 6.8786   LearningRate 0.0260   Epoch: 9   Global Step: 406490   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:23:27,965-Speed 2642.50 samples/sec   Loss 6.9462   LearningRate 0.0260   Epoch: 9   Global Step: 406500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:31,868-Speed 2624.13 samples/sec   Loss 6.8711   LearningRate 0.0260   Epoch: 9   Global Step: 406510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:35,765-Speed 2628.31 samples/sec   Loss 6.8012   LearningRate 0.0260   Epoch: 9   Global Step: 406520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:39,673-Speed 2620.84 samples/sec   Loss 6.8410   LearningRate 0.0260   Epoch: 9   Global Step: 406530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:43,565-Speed 2631.84 samples/sec   Loss 6.8347   LearningRate 0.0260   Epoch: 9   Global Step: 406540   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:47,462-Speed 2628.89 samples/sec   Loss 6.7570   LearningRate 0.0260   Epoch: 9   Global Step: 406550   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:51,356-Speed 2629.81 samples/sec   Loss 6.8345   LearningRate 0.0260   Epoch: 9   Global Step: 406560   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:55,251-Speed 2629.68 samples/sec   Loss 6.7484   LearningRate 0.0260   Epoch: 9   Global Step: 406570   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:23:59,155-Speed 2624.23 samples/sec   Loss 6.7700   LearningRate 0.0260   Epoch: 9   Global Step: 406580   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:03,050-Speed 2629.39 samples/sec   Loss 7.0227   LearningRate 0.0260   Epoch: 9   Global Step: 406590   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:06,963-Speed 2617.36 samples/sec   Loss 6.6515   LearningRate 0.0260   Epoch: 9   Global Step: 406600   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-04-14 17:24:10,859-Speed 2629.07 samples/sec   Loss 6.8896   LearningRate 0.0260   Epoch: 9   Global Step: 406610   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:14,753-Speed 2630.66 samples/sec   Loss 6.8486   LearningRate 0.0260   Epoch: 9   Global Step: 406620   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:18,647-Speed 2630.63 samples/sec   Loss 6.8679   LearningRate 0.0260   Epoch: 9   Global Step: 406630   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:22,559-Speed 2617.74 samples/sec   Loss 6.8015   LearningRate 0.0260   Epoch: 9   Global Step: 406640   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:26,456-Speed 2628.95 samples/sec   Loss 6.8266   LearningRate 0.0260   Epoch: 9   Global Step: 406650   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:30,347-Speed 2632.14 samples/sec   Loss 6.8165   LearningRate 0.0260   Epoch: 9   Global Step: 406660   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:34,238-Speed 2632.50 samples/sec   Loss 6.9547   LearningRate 0.0260   Epoch: 9   Global Step: 406670   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:38,146-Speed 2620.28 samples/sec   Loss 6.8689   LearningRate 0.0260   Epoch: 9   Global Step: 406680   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-04-14 17:24:42,072-Speed 2609.06 samples/sec   Loss 6.8764   LearningRate 0.0260   Epoch: 9   Global Step: 406690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:24:45,977-Speed 2622.65 samples/sec   Loss 6.7440   LearningRate 0.0260   Epoch: 9   Global Step: 406700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:24:49,859-Speed 2639.27 samples/sec   Loss 6.8472   LearningRate 0.0260   Epoch: 9   Global Step: 406710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:24:53,774-Speed 2615.84 samples/sec   Loss 6.9586   LearningRate 0.0260   Epoch: 9   Global Step: 406720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:24:57,755-Speed 2573.54 samples/sec   Loss 6.8350   LearningRate 0.0260   Epoch: 9   Global Step: 406730   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:01,650-Speed 2629.02 samples/sec   Loss 6.8535   LearningRate 0.0260   Epoch: 9   Global Step: 406740   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:05,548-Speed 2627.78 samples/sec   Loss 6.8148   LearningRate 0.0260   Epoch: 9   Global Step: 406750   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:09,519-Speed 2579.21 samples/sec   Loss 6.7491   LearningRate 0.0260   Epoch: 9   Global Step: 406760   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:13,423-Speed 2624.18 samples/sec   Loss 6.8557   LearningRate 0.0260   Epoch: 9   Global Step: 406770   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:17,320-Speed 2628.26 samples/sec   Loss 6.9896   LearningRate 0.0260   Epoch: 9   Global Step: 406780   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:21,214-Speed 2630.61 samples/sec   Loss 6.8031   LearningRate 0.0260   Epoch: 9   Global Step: 406790   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:25,205-Speed 2566.12 samples/sec   Loss 6.7788   LearningRate 0.0260   Epoch: 9   Global Step: 406800   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:25:29,105-Speed 2626.39 samples/sec   Loss 6.7826   LearningRate 0.0260   Epoch: 9   Global Step: 406810   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:25:32,978-Speed 2645.17 samples/sec   Loss 6.7711   LearningRate 0.0260   Epoch: 9   Global Step: 406820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:25:36,877-Speed 2626.78 samples/sec   Loss 6.7862   LearningRate 0.0260   Epoch: 9   Global Step: 406830   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:25:40,773-Speed 2628.80 samples/sec   Loss 6.9344   LearningRate 0.0260   Epoch: 9   Global Step: 406840   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:25:44,675-Speed 2625.08 samples/sec   Loss 6.7006   LearningRate 0.0260   Epoch: 9   Global Step: 406850   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:25:48,568-Speed 2631.29 samples/sec   Loss 6.8653   LearningRate 0.0260   Epoch: 9   Global Step: 406860   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:25:52,463-Speed 2629.88 samples/sec   Loss 6.9882   LearningRate 0.0260   Epoch: 9   Global Step: 406870   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:25:56,399-Speed 2602.59 samples/sec   Loss 6.8469   LearningRate 0.0260   Epoch: 9   Global Step: 406880   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:26:00,319-Speed 2612.54 samples/sec   Loss 6.7518   LearningRate 0.0260   Epoch: 9   Global Step: 406890   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:26:04,250-Speed 2605.49 samples/sec   Loss 6.8261   LearningRate 0.0260   Epoch: 9   Global Step: 406900   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:26:08,144-Speed 2630.84 samples/sec   Loss 6.8225   LearningRate 0.0260   Epoch: 9   Global Step: 406910   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:26:12,038-Speed 2630.29 samples/sec   Loss 6.9221   LearningRate 0.0260   Epoch: 9   Global Step: 406920   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:26:15,934-Speed 2628.95 samples/sec   Loss 6.7316   LearningRate 0.0260   Epoch: 9   Global Step: 406930   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:19,839-Speed 2622.99 samples/sec   Loss 6.8816   LearningRate 0.0260   Epoch: 9   Global Step: 406940   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:23,746-Speed 2621.44 samples/sec   Loss 6.7888   LearningRate 0.0260   Epoch: 9   Global Step: 406950   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:27,644-Speed 2627.95 samples/sec   Loss 6.7841   LearningRate 0.0260   Epoch: 9   Global Step: 406960   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:31,535-Speed 2632.51 samples/sec   Loss 6.8306   LearningRate 0.0260   Epoch: 9   Global Step: 406970   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:35,430-Speed 2629.11 samples/sec   Loss 6.7790   LearningRate 0.0260   Epoch: 9   Global Step: 406980   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:39,327-Speed 2628.54 samples/sec   Loss 6.8937   LearningRate 0.0259   Epoch: 9   Global Step: 406990   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:43,220-Speed 2630.87 samples/sec   Loss 6.8116   LearningRate 0.0259   Epoch: 9   Global Step: 407000   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:47,123-Speed 2624.61 samples/sec   Loss 6.8465   LearningRate 0.0259   Epoch: 9   Global Step: 407010   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:51,026-Speed 2624.00 samples/sec   Loss 6.7916   LearningRate 0.0259   Epoch: 9   Global Step: 407020   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:26:54,922-Speed 2629.49 samples/sec   Loss 6.8977   LearningRate 0.0259   Epoch: 9   Global Step: 407030   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:26:58,820-Speed 2627.99 samples/sec   Loss 6.8301   LearningRate 0.0259   Epoch: 9   Global Step: 407040   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:02,721-Speed 2625.25 samples/sec   Loss 6.8113   LearningRate 0.0259   Epoch: 9   Global Step: 407050   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:06,614-Speed 2630.81 samples/sec   Loss 6.8300   LearningRate 0.0259   Epoch: 9   Global Step: 407060   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:10,517-Speed 2624.10 samples/sec   Loss 6.8221   LearningRate 0.0259   Epoch: 9   Global Step: 407070   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:14,415-Speed 2627.57 samples/sec   Loss 6.7708   LearningRate 0.0259   Epoch: 9   Global Step: 407080   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:18,336-Speed 2612.77 samples/sec   Loss 6.7719   LearningRate 0.0259   Epoch: 9   Global Step: 407090   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:22,237-Speed 2625.25 samples/sec   Loss 6.7392   LearningRate 0.0259   Epoch: 9   Global Step: 407100   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:26,256-Speed 2548.93 samples/sec   Loss 6.6337   LearningRate 0.0259   Epoch: 9   Global Step: 407110   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:30,152-Speed 2628.81 samples/sec   Loss 6.7644   LearningRate 0.0259   Epoch: 9   Global Step: 407120   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:34,037-Speed 2636.70 samples/sec   Loss 6.9281   LearningRate 0.0259   Epoch: 9   Global Step: 407130   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:37,945-Speed 2621.12 samples/sec   Loss 6.7937   LearningRate 0.0259   Epoch: 9   Global Step: 407140   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:41,864-Speed 2613.24 samples/sec   Loss 6.7238   LearningRate 0.0259   Epoch: 9   Global Step: 407150   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:45,761-Speed 2627.99 samples/sec   Loss 6.7636   LearningRate 0.0259   Epoch: 9   Global Step: 407160   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:49,657-Speed 2629.44 samples/sec   Loss 6.9556   LearningRate 0.0259   Epoch: 9   Global Step: 407170   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:53,555-Speed 2627.71 samples/sec   Loss 6.9082   LearningRate 0.0259   Epoch: 9   Global Step: 407180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:27:57,454-Speed 2626.96 samples/sec   Loss 6.7106   LearningRate 0.0259   Epoch: 9   Global Step: 407190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:28:01,349-Speed 2630.47 samples/sec   Loss 6.7818   LearningRate 0.0259   Epoch: 9   Global Step: 407200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:28:05,253-Speed 2623.18 samples/sec   Loss 6.6907   LearningRate 0.0259   Epoch: 9   Global Step: 407210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:28:09,159-Speed 2622.05 samples/sec   Loss 6.8323   LearningRate 0.0259   Epoch: 9   Global Step: 407220   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:28:13,037-Speed 2640.77 samples/sec   Loss 6.7819   LearningRate 0.0259   Epoch: 9   Global Step: 407230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:28:16,942-Speed 2624.60 samples/sec   Loss 6.8443   LearningRate 0.0259   Epoch: 9   Global Step: 407240   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:20,841-Speed 2626.11 samples/sec   Loss 6.8624   LearningRate 0.0259   Epoch: 9   Global Step: 407250   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:24,746-Speed 2623.67 samples/sec   Loss 6.7399   LearningRate 0.0259   Epoch: 9   Global Step: 407260   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:28,655-Speed 2620.67 samples/sec   Loss 6.8328   LearningRate 0.0259   Epoch: 9   Global Step: 407270   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:32,555-Speed 2626.19 samples/sec   Loss 6.7837   LearningRate 0.0259   Epoch: 9   Global Step: 407280   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:36,515-Speed 2586.08 samples/sec   Loss 6.7159   LearningRate 0.0259   Epoch: 9   Global Step: 407290   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:40,416-Speed 2625.40 samples/sec   Loss 6.7894   LearningRate 0.0259   Epoch: 9   Global Step: 407300   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:44,318-Speed 2624.96 samples/sec   Loss 6.7269   LearningRate 0.0259   Epoch: 9   Global Step: 407310   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:48,223-Speed 2623.13 samples/sec   Loss 6.7463   LearningRate 0.0259   Epoch: 9   Global Step: 407320   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:52,122-Speed 2627.52 samples/sec   Loss 6.6910   LearningRate 0.0259   Epoch: 9   Global Step: 407330   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:28:56,024-Speed 2624.39 samples/sec   Loss 6.7108   LearningRate 0.0259   Epoch: 9   Global Step: 407340   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:28:59,937-Speed 2617.60 samples/sec   Loss 6.7798   LearningRate 0.0259   Epoch: 9   Global Step: 407350   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:03,843-Speed 2622.39 samples/sec   Loss 6.7750   LearningRate 0.0259   Epoch: 9   Global Step: 407360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:07,735-Speed 2632.00 samples/sec   Loss 6.7898   LearningRate 0.0259   Epoch: 9   Global Step: 407370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:11,630-Speed 2629.79 samples/sec   Loss 6.7724   LearningRate 0.0259   Epoch: 9   Global Step: 407380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:15,532-Speed 2624.81 samples/sec   Loss 6.7641   LearningRate 0.0259   Epoch: 9   Global Step: 407390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:19,425-Speed 2630.69 samples/sec   Loss 6.8593   LearningRate 0.0259   Epoch: 9   Global Step: 407400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:23,318-Speed 2631.37 samples/sec   Loss 6.8235   LearningRate 0.0259   Epoch: 9   Global Step: 407410   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:27,217-Speed 2627.58 samples/sec   Loss 6.6802   LearningRate 0.0259   Epoch: 9   Global Step: 407420   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:31,108-Speed 2632.51 samples/sec   Loss 6.7363   LearningRate 0.0259   Epoch: 9   Global Step: 407430   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:35,002-Speed 2630.08 samples/sec   Loss 6.8014   LearningRate 0.0259   Epoch: 9   Global Step: 407440   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:29:38,889-Speed 2634.63 samples/sec   Loss 6.8700   LearningRate 0.0259   Epoch: 9   Global Step: 407450   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:42,783-Speed 2630.62 samples/sec   Loss 6.8654   LearningRate 0.0259   Epoch: 9   Global Step: 407460   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:46,674-Speed 2632.59 samples/sec   Loss 6.7963   LearningRate 0.0259   Epoch: 9   Global Step: 407470   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:50,579-Speed 2623.02 samples/sec   Loss 6.8230   LearningRate 0.0259   Epoch: 9   Global Step: 407480   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:54,509-Speed 2605.83 samples/sec   Loss 6.8203   LearningRate 0.0259   Epoch: 9   Global Step: 407490   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:29:58,416-Speed 2622.36 samples/sec   Loss 6.9719   LearningRate 0.0259   Epoch: 9   Global Step: 407500   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:02,318-Speed 2625.06 samples/sec   Loss 6.8117   LearningRate 0.0259   Epoch: 9   Global Step: 407510   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:06,215-Speed 2627.90 samples/sec   Loss 6.8138   LearningRate 0.0259   Epoch: 9   Global Step: 407520   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:10,125-Speed 2619.66 samples/sec   Loss 6.8548   LearningRate 0.0259   Epoch: 9   Global Step: 407530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:14,021-Speed 2628.72 samples/sec   Loss 6.9601   LearningRate 0.0259   Epoch: 9   Global Step: 407540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:17,957-Speed 2602.42 samples/sec   Loss 6.8094   LearningRate 0.0259   Epoch: 9   Global Step: 407550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:21,853-Speed 2629.65 samples/sec   Loss 6.8494   LearningRate 0.0259   Epoch: 9   Global Step: 407560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:25,746-Speed 2630.52 samples/sec   Loss 6.8154   LearningRate 0.0259   Epoch: 9   Global Step: 407570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:29,643-Speed 2628.51 samples/sec   Loss 6.8535   LearningRate 0.0259   Epoch: 9   Global Step: 407580   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:33,588-Speed 2596.33 samples/sec   Loss 6.7681   LearningRate 0.0259   Epoch: 9   Global Step: 407590   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:37,723-Speed 2476.59 samples/sec   Loss 6.9306   LearningRate 0.0259   Epoch: 9   Global Step: 407600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:41,623-Speed 2626.56 samples/sec   Loss 6.6931   LearningRate 0.0259   Epoch: 9   Global Step: 407610   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:45,532-Speed 2619.40 samples/sec   Loss 6.8676   LearningRate 0.0259   Epoch: 9   Global Step: 407620   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:49,434-Speed 2625.61 samples/sec   Loss 6.8877   LearningRate 0.0259   Epoch: 9   Global Step: 407630   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:53,337-Speed 2624.03 samples/sec   Loss 6.7593   LearningRate 0.0259   Epoch: 9   Global Step: 407640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:30:57,213-Speed 2642.79 samples/sec   Loss 6.8342   LearningRate 0.0259   Epoch: 9   Global Step: 407650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:01,110-Speed 2628.00 samples/sec   Loss 6.7257   LearningRate 0.0259   Epoch: 9   Global Step: 407660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:05,010-Speed 2625.99 samples/sec   Loss 6.8127   LearningRate 0.0259   Epoch: 9   Global Step: 407670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:08,904-Speed 2630.47 samples/sec   Loss 6.7990   LearningRate 0.0259   Epoch: 9   Global Step: 407680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:12,797-Speed 2630.73 samples/sec   Loss 6.8791   LearningRate 0.0259   Epoch: 9   Global Step: 407690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:16,721-Speed 2610.19 samples/sec   Loss 6.8590   LearningRate 0.0259   Epoch: 9   Global Step: 407700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:20,669-Speed 2594.71 samples/sec   Loss 6.8743   LearningRate 0.0259   Epoch: 9   Global Step: 407710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:24,569-Speed 2625.97 samples/sec   Loss 6.6267   LearningRate 0.0259   Epoch: 9   Global Step: 407720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:28,464-Speed 2630.26 samples/sec   Loss 6.8268   LearningRate 0.0259   Epoch: 9   Global Step: 407730   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:32,363-Speed 2626.64 samples/sec   Loss 6.7795   LearningRate 0.0259   Epoch: 9   Global Step: 407740   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:36,259-Speed 2629.07 samples/sec   Loss 6.8119   LearningRate 0.0259   Epoch: 9   Global Step: 407750   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:31:40,138-Speed 2639.97 samples/sec   Loss 6.8188   LearningRate 0.0259   Epoch: 9   Global Step: 407760   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:31:44,013-Speed 2643.54 samples/sec   Loss 6.7861   LearningRate 0.0259   Epoch: 9   Global Step: 407770   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:31:47,908-Speed 2629.80 samples/sec   Loss 6.8297   LearningRate 0.0259   Epoch: 9   Global Step: 407780   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:31:51,808-Speed 2626.31 samples/sec   Loss 6.8069   LearningRate 0.0259   Epoch: 9   Global Step: 407790   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:31:55,704-Speed 2628.78 samples/sec   Loss 6.7691   LearningRate 0.0258   Epoch: 9   Global Step: 407800   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:31:59,610-Speed 2622.88 samples/sec   Loss 6.9298   LearningRate 0.0258   Epoch: 9   Global Step: 407810   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:03,506-Speed 2628.49 samples/sec   Loss 6.7795   LearningRate 0.0258   Epoch: 9   Global Step: 407820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:07,407-Speed 2625.86 samples/sec   Loss 6.8658   LearningRate 0.0258   Epoch: 9   Global Step: 407830   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:11,300-Speed 2630.94 samples/sec   Loss 6.7419   LearningRate 0.0258   Epoch: 9   Global Step: 407840   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:15,197-Speed 2628.54 samples/sec   Loss 6.8447   LearningRate 0.0258   Epoch: 9   Global Step: 407850   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:19,089-Speed 2631.71 samples/sec   Loss 6.8234   LearningRate 0.0258   Epoch: 9   Global Step: 407860   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:22,990-Speed 2625.52 samples/sec   Loss 6.8237   LearningRate 0.0258   Epoch: 9   Global Step: 407870   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:32:26,899-Speed 2619.82 samples/sec   Loss 6.7592   LearningRate 0.0258   Epoch: 9   Global Step: 407880   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:32:30,804-Speed 2623.19 samples/sec   Loss 6.8099   LearningRate 0.0258   Epoch: 9   Global Step: 407890   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:34,699-Speed 2629.30 samples/sec   Loss 6.7824   LearningRate 0.0258   Epoch: 9   Global Step: 407900   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:38,601-Speed 2625.62 samples/sec   Loss 6.7843   LearningRate 0.0258   Epoch: 9   Global Step: 407910   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:42,498-Speed 2627.92 samples/sec   Loss 6.7846   LearningRate 0.0258   Epoch: 9   Global Step: 407920   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:46,399-Speed 2625.44 samples/sec   Loss 6.7827   LearningRate 0.0258   Epoch: 9   Global Step: 407930   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:50,297-Speed 2627.62 samples/sec   Loss 6.8296   LearningRate 0.0258   Epoch: 9   Global Step: 407940   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:54,248-Speed 2592.84 samples/sec   Loss 6.7902   LearningRate 0.0258   Epoch: 9   Global Step: 407950   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:32:58,142-Speed 2629.58 samples/sec   Loss 6.9079   LearningRate 0.0258   Epoch: 9   Global Step: 407960   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:33:02,037-Speed 2630.02 samples/sec   Loss 6.8079   LearningRate 0.0258   Epoch: 9   Global Step: 407970   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:33:05,933-Speed 2628.41 samples/sec   Loss 6.7724   LearningRate 0.0258   Epoch: 9   Global Step: 407980   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:33:09,829-Speed 2629.61 samples/sec   Loss 6.7580   LearningRate 0.0258   Epoch: 9   Global Step: 407990   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:13,724-Speed 2629.06 samples/sec   Loss 6.8083   LearningRate 0.0258   Epoch: 9   Global Step: 408000   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:17,622-Speed 2627.73 samples/sec   Loss 6.7153   LearningRate 0.0258   Epoch: 9   Global Step: 408010   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:21,515-Speed 2630.94 samples/sec   Loss 6.8946   LearningRate 0.0258   Epoch: 9   Global Step: 408020   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:25,410-Speed 2629.90 samples/sec   Loss 6.7497   LearningRate 0.0258   Epoch: 9   Global Step: 408030   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:29,308-Speed 2627.82 samples/sec   Loss 6.8143   LearningRate 0.0258   Epoch: 9   Global Step: 408040   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:33,210-Speed 2624.36 samples/sec   Loss 6.7765   LearningRate 0.0258   Epoch: 9   Global Step: 408050   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:37,112-Speed 2625.51 samples/sec   Loss 6.7908   LearningRate 0.0258   Epoch: 9   Global Step: 408060   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:41,013-Speed 2624.89 samples/sec   Loss 6.7172   LearningRate 0.0258   Epoch: 9   Global Step: 408070   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:33:44,886-Speed 2644.86 samples/sec   Loss 6.8922   LearningRate 0.0258   Epoch: 9   Global Step: 408080   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:33:48,784-Speed 2627.09 samples/sec   Loss 6.8027   LearningRate 0.0258   Epoch: 9   Global Step: 408090   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:33:52,691-Speed 2621.96 samples/sec   Loss 6.7775   LearningRate 0.0258   Epoch: 9   Global Step: 408100   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:33:56,585-Speed 2630.60 samples/sec   Loss 6.8572   LearningRate 0.0258   Epoch: 9   Global Step: 408110   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:00,491-Speed 2622.34 samples/sec   Loss 6.8094   LearningRate 0.0258   Epoch: 9   Global Step: 408120   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:04,393-Speed 2624.76 samples/sec   Loss 6.6931   LearningRate 0.0258   Epoch: 9   Global Step: 408130   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:08,289-Speed 2629.04 samples/sec   Loss 6.7542   LearningRate 0.0258   Epoch: 9   Global Step: 408140   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:12,189-Speed 2626.06 samples/sec   Loss 6.6963   LearningRate 0.0258   Epoch: 9   Global Step: 408150   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:16,099-Speed 2619.50 samples/sec   Loss 6.6612   LearningRate 0.0258   Epoch: 9   Global Step: 408160   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:19,998-Speed 2626.81 samples/sec   Loss 6.8417   LearningRate 0.0258   Epoch: 9   Global Step: 408170   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:23,918-Speed 2612.76 samples/sec   Loss 6.8445   LearningRate 0.0258   Epoch: 9   Global Step: 408180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:34:27,821-Speed 2623.77 samples/sec   Loss 6.8322   LearningRate 0.0258   Epoch: 9   Global Step: 408190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:34:31,724-Speed 2624.53 samples/sec   Loss 6.7728   LearningRate 0.0258   Epoch: 9   Global Step: 408200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:34:35,618-Speed 2629.98 samples/sec   Loss 6.8231   LearningRate 0.0258   Epoch: 9   Global Step: 408210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:34:39,518-Speed 2626.39 samples/sec   Loss 6.8803   LearningRate 0.0258   Epoch: 9   Global Step: 408220   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:34:43,418-Speed 2626.64 samples/sec   Loss 6.7723   LearningRate 0.0258   Epoch: 9   Global Step: 408230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:34:47,315-Speed 2628.01 samples/sec   Loss 6.7695   LearningRate 0.0258   Epoch: 9   Global Step: 408240   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:34:51,195-Speed 2639.73 samples/sec   Loss 6.7213   LearningRate 0.0258   Epoch: 9   Global Step: 408250   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:55,091-Speed 2629.47 samples/sec   Loss 6.7778   LearningRate 0.0258   Epoch: 9   Global Step: 408260   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:34:58,987-Speed 2628.40 samples/sec   Loss 6.7975   LearningRate 0.0258   Epoch: 9   Global Step: 408270   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:02,882-Speed 2629.36 samples/sec   Loss 6.8343   LearningRate 0.0258   Epoch: 9   Global Step: 408280   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:06,781-Speed 2627.39 samples/sec   Loss 6.7843   LearningRate 0.0258   Epoch: 9   Global Step: 408290   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:10,689-Speed 2620.20 samples/sec   Loss 6.7108   LearningRate 0.0258   Epoch: 9   Global Step: 408300   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:14,591-Speed 2625.87 samples/sec   Loss 6.8652   LearningRate 0.0258   Epoch: 9   Global Step: 408310   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:18,491-Speed 2626.04 samples/sec   Loss 6.8130   LearningRate 0.0258   Epoch: 9   Global Step: 408320   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:22,395-Speed 2623.47 samples/sec   Loss 6.7174   LearningRate 0.0258   Epoch: 9   Global Step: 408330   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:26,293-Speed 2627.22 samples/sec   Loss 6.6777   LearningRate 0.0258   Epoch: 9   Global Step: 408340   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:35:30,192-Speed 2627.05 samples/sec   Loss 6.7692   LearningRate 0.0258   Epoch: 9   Global Step: 408350   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:35:34,108-Speed 2615.60 samples/sec   Loss 6.7491   LearningRate 0.0258   Epoch: 9   Global Step: 408360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:35:38,016-Speed 2620.78 samples/sec   Loss 6.7791   LearningRate 0.0258   Epoch: 9   Global Step: 408370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:35:41,923-Speed 2621.12 samples/sec   Loss 6.8244   LearningRate 0.0258   Epoch: 9   Global Step: 408380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:35:45,821-Speed 2627.74 samples/sec   Loss 6.8390   LearningRate 0.0258   Epoch: 9   Global Step: 408390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:35:49,752-Speed 2605.49 samples/sec   Loss 6.7374   LearningRate 0.0258   Epoch: 9   Global Step: 408400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:35:53,652-Speed 2626.62 samples/sec   Loss 6.7895   LearningRate 0.0258   Epoch: 9   Global Step: 408410   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:35:57,558-Speed 2622.10 samples/sec   Loss 6.8240   LearningRate 0.0258   Epoch: 9   Global Step: 408420   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:01,452-Speed 2630.38 samples/sec   Loss 6.8741   LearningRate 0.0258   Epoch: 9   Global Step: 408430   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:05,350-Speed 2627.63 samples/sec   Loss 6.7727   LearningRate 0.0258   Epoch: 9   Global Step: 408440   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:09,253-Speed 2624.07 samples/sec   Loss 6.8837   LearningRate 0.0258   Epoch: 9   Global Step: 408450   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:36:13,128-Speed 2643.03 samples/sec   Loss 6.7817   LearningRate 0.0258   Epoch: 9   Global Step: 408460   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:17,031-Speed 2623.91 samples/sec   Loss 6.7474   LearningRate 0.0258   Epoch: 9   Global Step: 408470   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:20,924-Speed 2631.59 samples/sec   Loss 6.7852   LearningRate 0.0258   Epoch: 9   Global Step: 408480   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:24,825-Speed 2625.28 samples/sec   Loss 6.7488   LearningRate 0.0258   Epoch: 9   Global Step: 408490   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:28,723-Speed 2627.65 samples/sec   Loss 6.8283   LearningRate 0.0258   Epoch: 9   Global Step: 408500   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:32,627-Speed 2623.68 samples/sec   Loss 6.7479   LearningRate 0.0258   Epoch: 9   Global Step: 408510   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:36,526-Speed 2626.75 samples/sec   Loss 6.8600   LearningRate 0.0258   Epoch: 9   Global Step: 408520   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:40,427-Speed 2625.48 samples/sec   Loss 6.8002   LearningRate 0.0258   Epoch: 9   Global Step: 408530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:44,325-Speed 2627.66 samples/sec   Loss 6.7189   LearningRate 0.0258   Epoch: 9   Global Step: 408540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:48,221-Speed 2629.05 samples/sec   Loss 6.7937   LearningRate 0.0258   Epoch: 9   Global Step: 408550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:52,102-Speed 2639.43 samples/sec   Loss 6.6785   LearningRate 0.0258   Epoch: 9   Global Step: 408560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:56,014-Speed 2617.96 samples/sec   Loss 6.7411   LearningRate 0.0258   Epoch: 9   Global Step: 408570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:36:59,927-Speed 2617.36 samples/sec   Loss 6.7291   LearningRate 0.0258   Epoch: 9   Global Step: 408580   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:37:03,816-Speed 2634.17 samples/sec   Loss 6.9530   LearningRate 0.0258   Epoch: 9   Global Step: 408590   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:07,708-Speed 2631.39 samples/sec   Loss 6.7700   LearningRate 0.0258   Epoch: 9   Global Step: 408600   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:11,605-Speed 2627.87 samples/sec   Loss 6.7461   LearningRate 0.0258   Epoch: 9   Global Step: 408610   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:15,501-Speed 2629.36 samples/sec   Loss 6.8490   LearningRate 0.0257   Epoch: 9   Global Step: 408620   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:19,405-Speed 2623.78 samples/sec   Loss 6.8136   LearningRate 0.0257   Epoch: 9   Global Step: 408630   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:23,324-Speed 2613.59 samples/sec   Loss 6.7809   LearningRate 0.0257   Epoch: 9   Global Step: 408640   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:27,222-Speed 2627.46 samples/sec   Loss 6.6919   LearningRate 0.0257   Epoch: 9   Global Step: 408650   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:31,126-Speed 2623.68 samples/sec   Loss 6.8989   LearningRate 0.0257   Epoch: 9   Global Step: 408660   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:35,039-Speed 2617.08 samples/sec   Loss 6.7906   LearningRate 0.0257   Epoch: 9   Global Step: 408670   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:38,934-Speed 2629.84 samples/sec   Loss 6.7579   LearningRate 0.0257   Epoch: 9   Global Step: 408680   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:37:42,827-Speed 2630.59 samples/sec   Loss 6.7144   LearningRate 0.0257   Epoch: 9   Global Step: 408690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:37:46,724-Speed 2628.67 samples/sec   Loss 6.7553   LearningRate 0.0257   Epoch: 9   Global Step: 408700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:37:50,621-Speed 2628.25 samples/sec   Loss 6.7519   LearningRate 0.0257   Epoch: 9   Global Step: 408710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:37:54,519-Speed 2628.05 samples/sec   Loss 6.7207   LearningRate 0.0257   Epoch: 9   Global Step: 408720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:37:58,420-Speed 2625.32 samples/sec   Loss 6.7069   LearningRate 0.0257   Epoch: 9   Global Step: 408730   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:38:02,300-Speed 2639.70 samples/sec   Loss 6.8104   LearningRate 0.0257   Epoch: 9   Global Step: 408740   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:06,198-Speed 2627.47 samples/sec   Loss 6.7776   LearningRate 0.0257   Epoch: 9   Global Step: 408750   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:10,093-Speed 2629.59 samples/sec   Loss 6.7496   LearningRate 0.0257   Epoch: 9   Global Step: 408760   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:13,987-Speed 2630.26 samples/sec   Loss 6.8564   LearningRate 0.0257   Epoch: 9   Global Step: 408770   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:17,883-Speed 2628.65 samples/sec   Loss 6.6938   LearningRate 0.0257   Epoch: 9   Global Step: 408780   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:21,781-Speed 2627.33 samples/sec   Loss 6.6820   LearningRate 0.0257   Epoch: 9   Global Step: 408790   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:25,680-Speed 2627.46 samples/sec   Loss 6.8285   LearningRate 0.0257   Epoch: 9   Global Step: 408800   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:29,576-Speed 2630.60 samples/sec   Loss 6.6372   LearningRate 0.0257   Epoch: 9   Global Step: 408810   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:33,471-Speed 2629.19 samples/sec   Loss 6.7109   LearningRate 0.0257   Epoch: 9   Global Step: 408820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:37,372-Speed 2625.56 samples/sec   Loss 6.8812   LearningRate 0.0257   Epoch: 9   Global Step: 408830   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:41,269-Speed 2628.18 samples/sec   Loss 6.9308   LearningRate 0.0257   Epoch: 9   Global Step: 408840   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:38:45,175-Speed 2622.13 samples/sec   Loss 6.8427   LearningRate 0.0257   Epoch: 9   Global Step: 408850   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:38:49,049-Speed 2644.03 samples/sec   Loss 6.8291   LearningRate 0.0257   Epoch: 9   Global Step: 408860   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:52,949-Speed 2626.13 samples/sec   Loss 6.7525   LearningRate 0.0257   Epoch: 9   Global Step: 408870   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:38:56,847-Speed 2627.10 samples/sec   Loss 6.6866   LearningRate 0.0257   Epoch: 9   Global Step: 408880   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:00,748-Speed 2626.01 samples/sec   Loss 6.7734   LearningRate 0.0257   Epoch: 9   Global Step: 408890   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:04,647-Speed 2627.28 samples/sec   Loss 6.6592   LearningRate 0.0257   Epoch: 9   Global Step: 408900   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:08,547-Speed 2626.13 samples/sec   Loss 6.8371   LearningRate 0.0257   Epoch: 9   Global Step: 408910   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:12,443-Speed 2629.13 samples/sec   Loss 6.8211   LearningRate 0.0257   Epoch: 9   Global Step: 408920   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:16,343-Speed 2626.37 samples/sec   Loss 6.7748   LearningRate 0.0257   Epoch: 9   Global Step: 408930   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:20,245-Speed 2624.64 samples/sec   Loss 6.6674   LearningRate 0.0257   Epoch: 9   Global Step: 408940   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:24,152-Speed 2621.09 samples/sec   Loss 6.8401   LearningRate 0.0257   Epoch: 9   Global Step: 408950   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:28,051-Speed 2627.44 samples/sec   Loss 6.7404   LearningRate 0.0257   Epoch: 9   Global Step: 408960   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:39:31,949-Speed 2627.16 samples/sec   Loss 6.8129   LearningRate 0.0257   Epoch: 9   Global Step: 408970   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:39:35,843-Speed 2630.95 samples/sec   Loss 6.8098   LearningRate 0.0257   Epoch: 9   Global Step: 408980   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:39:39,748-Speed 2623.21 samples/sec   Loss 6.7226   LearningRate 0.0257   Epoch: 9   Global Step: 408990   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:39:43,640-Speed 2631.76 samples/sec   Loss 6.7006   LearningRate 0.0257   Epoch: 9   Global Step: 409000   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:39:47,534-Speed 2630.14 samples/sec   Loss 6.7943   LearningRate 0.0257   Epoch: 9   Global Step: 409010   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:39:51,407-Speed 2644.36 samples/sec   Loss 6.8931   LearningRate 0.0257   Epoch: 9   Global Step: 409020   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:55,313-Speed 2622.15 samples/sec   Loss 6.8092   LearningRate 0.0257   Epoch: 9   Global Step: 409030   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:39:59,219-Speed 2622.00 samples/sec   Loss 6.7806   LearningRate 0.0257   Epoch: 9   Global Step: 409040   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:03,116-Speed 2628.09 samples/sec   Loss 6.7774   LearningRate 0.0257   Epoch: 9   Global Step: 409050   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:07,012-Speed 2630.28 samples/sec   Loss 6.8156   LearningRate 0.0257   Epoch: 9   Global Step: 409060   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:10,905-Speed 2630.17 samples/sec   Loss 6.8496   LearningRate 0.0257   Epoch: 9   Global Step: 409070   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:14,812-Speed 2622.84 samples/sec   Loss 6.8210   LearningRate 0.0257   Epoch: 9   Global Step: 409080   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:18,709-Speed 2628.36 samples/sec   Loss 6.7297   LearningRate 0.0257   Epoch: 9   Global Step: 409090   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:22,607-Speed 2627.84 samples/sec   Loss 6.7688   LearningRate 0.0257   Epoch: 9   Global Step: 409100   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:26,552-Speed 2595.90 samples/sec   Loss 6.7419   LearningRate 0.0257   Epoch: 9   Global Step: 409110   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:40:30,479-Speed 2608.63 samples/sec   Loss 6.7709   LearningRate 0.0257   Epoch: 9   Global Step: 409120   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:40:34,387-Speed 2621.10 samples/sec   Loss 6.6692   LearningRate 0.0257   Epoch: 9   Global Step: 409130   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:40:38,280-Speed 2631.33 samples/sec   Loss 6.7678   LearningRate 0.0257   Epoch: 9   Global Step: 409140   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:40:42,172-Speed 2631.20 samples/sec   Loss 6.7460   LearningRate 0.0257   Epoch: 9   Global Step: 409150   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:40:46,069-Speed 2628.47 samples/sec   Loss 6.8192   LearningRate 0.0257   Epoch: 9   Global Step: 409160   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:40:49,971-Speed 2624.79 samples/sec   Loss 6.7039   LearningRate 0.0257   Epoch: 9   Global Step: 409170   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:40:53,871-Speed 2626.07 samples/sec   Loss 6.9656   LearningRate 0.0257   Epoch: 9   Global Step: 409180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:40:57,779-Speed 2620.88 samples/sec   Loss 6.8234   LearningRate 0.0257   Epoch: 9   Global Step: 409190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:41:01,682-Speed 2624.47 samples/sec   Loss 6.8922   LearningRate 0.0257   Epoch: 9   Global Step: 409200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:41:05,567-Speed 2636.19 samples/sec   Loss 6.7074   LearningRate 0.0257   Epoch: 9   Global Step: 409210   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:09,474-Speed 2621.82 samples/sec   Loss 6.7683   LearningRate 0.0257   Epoch: 9   Global Step: 409220   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:13,371-Speed 2627.73 samples/sec   Loss 6.6961   LearningRate 0.0257   Epoch: 9   Global Step: 409230   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:17,264-Speed 2631.24 samples/sec   Loss 6.8550   LearningRate 0.0257   Epoch: 9   Global Step: 409240   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:21,162-Speed 2627.31 samples/sec   Loss 6.8153   LearningRate 0.0257   Epoch: 9   Global Step: 409250   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:25,067-Speed 2623.11 samples/sec   Loss 6.6055   LearningRate 0.0257   Epoch: 9   Global Step: 409260   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:28,961-Speed 2630.18 samples/sec   Loss 6.9252   LearningRate 0.0257   Epoch: 9   Global Step: 409270   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:32,858-Speed 2628.56 samples/sec   Loss 6.7548   LearningRate 0.0257   Epoch: 9   Global Step: 409280   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:36,842-Speed 2570.96 samples/sec   Loss 6.7781   LearningRate 0.0257   Epoch: 9   Global Step: 409290   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:40,736-Speed 2629.54 samples/sec   Loss 6.7714   LearningRate 0.0257   Epoch: 9   Global Step: 409300   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:41:44,633-Speed 2628.72 samples/sec   Loss 6.8466   LearningRate 0.0257   Epoch: 9   Global Step: 409310   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:41:48,527-Speed 2630.13 samples/sec   Loss 6.6702   LearningRate 0.0257   Epoch: 9   Global Step: 409320   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:41:52,439-Speed 2618.29 samples/sec   Loss 6.6960   LearningRate 0.0257   Epoch: 9   Global Step: 409330   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:41:56,334-Speed 2629.18 samples/sec   Loss 6.7535   LearningRate 0.0257   Epoch: 9   Global Step: 409340   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:42:00,233-Speed 2627.25 samples/sec   Loss 6.7464   LearningRate 0.0257   Epoch: 9   Global Step: 409350   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:42:04,134-Speed 2625.45 samples/sec   Loss 6.7470   LearningRate 0.0257   Epoch: 9   Global Step: 409360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:42:08,038-Speed 2623.55 samples/sec   Loss 6.6759   LearningRate 0.0257   Epoch: 9   Global Step: 409370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:42:11,935-Speed 2628.28 samples/sec   Loss 6.6901   LearningRate 0.0257   Epoch: 9   Global Step: 409380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:42:15,831-Speed 2629.50 samples/sec   Loss 6.6633   LearningRate 0.0257   Epoch: 9   Global Step: 409390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:42:19,727-Speed 2629.58 samples/sec   Loss 6.7740   LearningRate 0.0257   Epoch: 9   Global Step: 409400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:42:23,622-Speed 2629.16 samples/sec   Loss 6.7931   LearningRate 0.0257   Epoch: 9   Global Step: 409410   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:42:27,535-Speed 2617.66 samples/sec   Loss 6.7997   LearningRate 0.0257   Epoch: 9   Global Step: 409420   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:42:31,377-Speed 2665.99 samples/sec   Loss 6.5777   LearningRate 0.0257   Epoch: 9   Global Step: 409430   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:42:35,290-Speed 2617.00 samples/sec   Loss 6.7329   LearningRate 0.0256   Epoch: 9   Global Step: 409440   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:42:39,201-Speed 2618.89 samples/sec   Loss 6.7587   LearningRate 0.0256   Epoch: 9   Global Step: 409450   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:42:43,091-Speed 2633.32 samples/sec   Loss 6.6980   LearningRate 0.0256   Epoch: 9   Global Step: 409460   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:42:46,989-Speed 2627.66 samples/sec   Loss 6.8052   LearningRate 0.0256   Epoch: 9   Global Step: 409470   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:42:50,883-Speed 2630.66 samples/sec   Loss 6.7083   LearningRate 0.0256   Epoch: 9   Global Step: 409480   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:42:54,782-Speed 2626.28 samples/sec   Loss 6.7337   LearningRate 0.0256   Epoch: 9   Global Step: 409490   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:42:58,681-Speed 2627.20 samples/sec   Loss 6.8198   LearningRate 0.0256   Epoch: 9   Global Step: 409500   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:43:02,585-Speed 2623.44 samples/sec   Loss 6.7521   LearningRate 0.0256   Epoch: 9   Global Step: 409510   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:43:06,491-Speed 2622.08 samples/sec   Loss 6.8167   LearningRate 0.0256   Epoch: 9   Global Step: 409520   Fp16 Grad Scale: 32768   Required: 47 hours
Training: 2022-04-14 17:43:10,388-Speed 2628.32 samples/sec   Loss 6.7226   LearningRate 0.0256   Epoch: 9   Global Step: 409530   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:14,285-Speed 2628.08 samples/sec   Loss 6.8083   LearningRate 0.0256   Epoch: 9   Global Step: 409540   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:18,190-Speed 2622.44 samples/sec   Loss 6.8394   LearningRate 0.0256   Epoch: 9   Global Step: 409550   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:22,093-Speed 2624.86 samples/sec   Loss 6.8592   LearningRate 0.0256   Epoch: 9   Global Step: 409560   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:25,989-Speed 2628.99 samples/sec   Loss 6.9010   LearningRate 0.0256   Epoch: 9   Global Step: 409570   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:29,882-Speed 2631.34 samples/sec   Loss 6.7431   LearningRate 0.0256   Epoch: 9   Global Step: 409580   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:33,784-Speed 2624.79 samples/sec   Loss 6.6812   LearningRate 0.0256   Epoch: 9   Global Step: 409590   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:37,681-Speed 2628.13 samples/sec   Loss 6.7626   LearningRate 0.0256   Epoch: 9   Global Step: 409600   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:41,576-Speed 2628.98 samples/sec   Loss 6.8771   LearningRate 0.0256   Epoch: 9   Global Step: 409610   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:45,472-Speed 2629.42 samples/sec   Loss 6.8589   LearningRate 0.0256   Epoch: 9   Global Step: 409620   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:43:49,365-Speed 2630.57 samples/sec   Loss 6.7160   LearningRate 0.0256   Epoch: 9   Global Step: 409630   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:43:53,263-Speed 2627.65 samples/sec   Loss 6.7161   LearningRate 0.0256   Epoch: 9   Global Step: 409640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:43:57,164-Speed 2625.97 samples/sec   Loss 6.7441   LearningRate 0.0256   Epoch: 9   Global Step: 409650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:01,056-Speed 2631.73 samples/sec   Loss 6.7628   LearningRate 0.0256   Epoch: 9   Global Step: 409660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:04,954-Speed 2627.44 samples/sec   Loss 6.8309   LearningRate 0.0256   Epoch: 9   Global Step: 409670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:08,847-Speed 2631.54 samples/sec   Loss 6.7581   LearningRate 0.0256   Epoch: 9   Global Step: 409680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:12,740-Speed 2631.00 samples/sec   Loss 6.7747   LearningRate 0.0256   Epoch: 9   Global Step: 409690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:16,642-Speed 2624.91 samples/sec   Loss 6.7618   LearningRate 0.0256   Epoch: 9   Global Step: 409700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:20,543-Speed 2625.68 samples/sec   Loss 6.7141   LearningRate 0.0256   Epoch: 9   Global Step: 409710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:24,443-Speed 2626.29 samples/sec   Loss 6.7358   LearningRate 0.0256   Epoch: 9   Global Step: 409720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:28,318-Speed 2642.86 samples/sec   Loss 6.7577   LearningRate 0.0256   Epoch: 9   Global Step: 409730   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:32,216-Speed 2627.18 samples/sec   Loss 6.8465   LearningRate 0.0256   Epoch: 9   Global Step: 409740   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:36,111-Speed 2630.22 samples/sec   Loss 6.7717   LearningRate 0.0256   Epoch: 9   Global Step: 409750   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:40,013-Speed 2624.38 samples/sec   Loss 6.7216   LearningRate 0.0256   Epoch: 9   Global Step: 409760   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:43,910-Speed 2629.09 samples/sec   Loss 6.5616   LearningRate 0.0256   Epoch: 9   Global Step: 409770   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:47,815-Speed 2622.40 samples/sec   Loss 6.7320   LearningRate 0.0256   Epoch: 9   Global Step: 409780   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:51,710-Speed 2629.78 samples/sec   Loss 6.8301   LearningRate 0.0256   Epoch: 9   Global Step: 409790   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:55,607-Speed 2628.05 samples/sec   Loss 6.7313   LearningRate 0.0256   Epoch: 9   Global Step: 409800   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:44:59,522-Speed 2616.62 samples/sec   Loss 6.8266   LearningRate 0.0256   Epoch: 9   Global Step: 409810   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:03,423-Speed 2625.02 samples/sec   Loss 6.6665   LearningRate 0.0256   Epoch: 9   Global Step: 409820   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:07,308-Speed 2636.96 samples/sec   Loss 6.7994   LearningRate 0.0256   Epoch: 9   Global Step: 409830   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:11,201-Speed 2630.50 samples/sec   Loss 6.8690   LearningRate 0.0256   Epoch: 9   Global Step: 409840   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:15,100-Speed 2627.15 samples/sec   Loss 6.7415   LearningRate 0.0256   Epoch: 9   Global Step: 409850   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:18,997-Speed 2628.09 samples/sec   Loss 6.7156   LearningRate 0.0256   Epoch: 9   Global Step: 409860   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:22,902-Speed 2623.30 samples/sec   Loss 6.7726   LearningRate 0.0256   Epoch: 9   Global Step: 409870   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:26,805-Speed 2624.17 samples/sec   Loss 6.7379   LearningRate 0.0256   Epoch: 9   Global Step: 409880   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:30,703-Speed 2627.15 samples/sec   Loss 6.7083   LearningRate 0.0256   Epoch: 9   Global Step: 409890   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:34,599-Speed 2629.08 samples/sec   Loss 6.7390   LearningRate 0.0256   Epoch: 9   Global Step: 409900   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:38,516-Speed 2615.22 samples/sec   Loss 6.7402   LearningRate 0.0256   Epoch: 9   Global Step: 409910   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:42,412-Speed 2629.15 samples/sec   Loss 6.6981   LearningRate 0.0256   Epoch: 9   Global Step: 409920   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:46,306-Speed 2629.98 samples/sec   Loss 6.7530   LearningRate 0.0256   Epoch: 9   Global Step: 409930   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:45:50,210-Speed 2623.07 samples/sec   Loss 6.7691   LearningRate 0.0256   Epoch: 9   Global Step: 409940   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:45:54,115-Speed 2623.37 samples/sec   Loss 6.9012   LearningRate 0.0256   Epoch: 9   Global Step: 409950   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:45:58,204-Speed 2505.07 samples/sec   Loss 6.7149   LearningRate 0.0256   Epoch: 9   Global Step: 409960   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:46:02,186-Speed 2572.02 samples/sec   Loss 6.9023   LearningRate 0.0256   Epoch: 9   Global Step: 409970   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:46:06,081-Speed 2629.41 samples/sec   Loss 6.7609   LearningRate 0.0256   Epoch: 9   Global Step: 409980   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:46:09,984-Speed 2624.32 samples/sec   Loss 6.7807   LearningRate 0.0256   Epoch: 9   Global Step: 409990   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:46:13,881-Speed 2628.33 samples/sec   Loss 6.6822   LearningRate 0.0256   Epoch: 9   Global Step: 410000   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:46:56,874-[lfw][410000]XNorm: 23.137810
Training: 2022-04-14 17:46:56,875-[lfw][410000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-14 17:46:56,876-[lfw][410000]Accuracy-Highest: 0.99783
Training: 2022-04-14 17:47:46,575-[cfp_fp][410000]XNorm: 21.697988
Training: 2022-04-14 17:47:46,576-[cfp_fp][410000]Accuracy-Flip: 0.98743+-0.00460
Training: 2022-04-14 17:47:46,577-[cfp_fp][410000]Accuracy-Highest: 0.98757
Training: 2022-04-14 17:48:30,144-[agedb_30][410000]XNorm: 23.512937
Training: 2022-04-14 17:48:30,144-[agedb_30][410000]Accuracy-Flip: 0.97567+-0.00667
Training: 2022-04-14 17:48:30,145-[agedb_30][410000]Accuracy-Highest: 0.97700
Training: 2022-04-14 17:48:34,010-Speed 73.08 samples/sec   Loss 6.8507   LearningRate 0.0256   Epoch: 9   Global Step: 410010   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:48:37,879-Speed 2646.71 samples/sec   Loss 6.7371   LearningRate 0.0256   Epoch: 9   Global Step: 410020   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:48:41,751-Speed 2645.30 samples/sec   Loss 6.7200   LearningRate 0.0256   Epoch: 9   Global Step: 410030   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:48:45,626-Speed 2643.63 samples/sec   Loss 6.9014   LearningRate 0.0256   Epoch: 9   Global Step: 410040   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:48:49,483-Speed 2655.29 samples/sec   Loss 6.8184   LearningRate 0.0256   Epoch: 9   Global Step: 410050   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:48:53,362-Speed 2640.44 samples/sec   Loss 6.7314   LearningRate 0.0256   Epoch: 9   Global Step: 410060   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:48:57,253-Speed 2633.18 samples/sec   Loss 6.7465   LearningRate 0.0256   Epoch: 9   Global Step: 410070   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:01,133-Speed 2639.97 samples/sec   Loss 6.7128   LearningRate 0.0256   Epoch: 9   Global Step: 410080   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:05,029-Speed 2628.61 samples/sec   Loss 6.6678   LearningRate 0.0256   Epoch: 9   Global Step: 410090   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:08,912-Speed 2638.19 samples/sec   Loss 6.8084   LearningRate 0.0256   Epoch: 9   Global Step: 410100   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:12,802-Speed 2633.03 samples/sec   Loss 6.7267   LearningRate 0.0256   Epoch: 9   Global Step: 410110   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:16,695-Speed 2631.28 samples/sec   Loss 6.7525   LearningRate 0.0256   Epoch: 9   Global Step: 410120   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:20,587-Speed 2631.35 samples/sec   Loss 6.7245   LearningRate 0.0256   Epoch: 9   Global Step: 410130   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:24,476-Speed 2634.13 samples/sec   Loss 6.8087   LearningRate 0.0256   Epoch: 9   Global Step: 410140   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:28,347-Speed 2645.27 samples/sec   Loss 6.8889   LearningRate 0.0256   Epoch: 9   Global Step: 410150   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:32,246-Speed 2628.10 samples/sec   Loss 6.7882   LearningRate 0.0256   Epoch: 9   Global Step: 410160   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:36,139-Speed 2630.40 samples/sec   Loss 6.8911   LearningRate 0.0256   Epoch: 9   Global Step: 410170   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:40,047-Speed 2621.31 samples/sec   Loss 6.7715   LearningRate 0.0256   Epoch: 9   Global Step: 410180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:43,944-Speed 2627.92 samples/sec   Loss 6.9588   LearningRate 0.0256   Epoch: 9   Global Step: 410190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:47,838-Speed 2630.50 samples/sec   Loss 6.7303   LearningRate 0.0256   Epoch: 9   Global Step: 410200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:51,741-Speed 2624.05 samples/sec   Loss 6.7005   LearningRate 0.0256   Epoch: 9   Global Step: 410210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:49:55,612-Speed 2646.33 samples/sec   Loss 6.6639   LearningRate 0.0256   Epoch: 9   Global Step: 410220   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:49:59,504-Speed 2631.12 samples/sec   Loss 6.7632   LearningRate 0.0256   Epoch: 9   Global Step: 410230   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:03,398-Speed 2630.80 samples/sec   Loss 6.7264   LearningRate 0.0256   Epoch: 9   Global Step: 410240   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:07,292-Speed 2631.21 samples/sec   Loss 6.7399   LearningRate 0.0256   Epoch: 9   Global Step: 410250   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:11,202-Speed 2619.20 samples/sec   Loss 6.7174   LearningRate 0.0255   Epoch: 9   Global Step: 410260   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:15,099-Speed 2628.78 samples/sec   Loss 6.8469   LearningRate 0.0255   Epoch: 9   Global Step: 410270   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:18,995-Speed 2629.23 samples/sec   Loss 6.7187   LearningRate 0.0255   Epoch: 9   Global Step: 410280   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:22,891-Speed 2628.36 samples/sec   Loss 6.7469   LearningRate 0.0255   Epoch: 9   Global Step: 410290   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:26,790-Speed 2626.97 samples/sec   Loss 6.6427   LearningRate 0.0255   Epoch: 9   Global Step: 410300   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:30,699-Speed 2620.47 samples/sec   Loss 6.7555   LearningRate 0.0255   Epoch: 9   Global Step: 410310   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:50:34,722-Speed 2546.18 samples/sec   Loss 6.7301   LearningRate 0.0255   Epoch: 9   Global Step: 410320   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:50:38,624-Speed 2625.24 samples/sec   Loss 6.6465   LearningRate 0.0255   Epoch: 9   Global Step: 410330   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:50:42,524-Speed 2626.98 samples/sec   Loss 6.8398   LearningRate 0.0255   Epoch: 9   Global Step: 410340   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:50:46,424-Speed 2626.51 samples/sec   Loss 6.7758   LearningRate 0.0255   Epoch: 9   Global Step: 410350   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:50:50,355-Speed 2605.16 samples/sec   Loss 6.6659   LearningRate 0.0255   Epoch: 9   Global Step: 410360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:50:54,269-Speed 2617.29 samples/sec   Loss 6.7666   LearningRate 0.0255   Epoch: 9   Global Step: 410370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:50:58,169-Speed 2626.07 samples/sec   Loss 6.7487   LearningRate 0.0255   Epoch: 9   Global Step: 410380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:02,074-Speed 2623.03 samples/sec   Loss 6.7657   LearningRate 0.0255   Epoch: 9   Global Step: 410390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:06,012-Speed 2600.95 samples/sec   Loss 6.7846   LearningRate 0.0255   Epoch: 9   Global Step: 410400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:09,918-Speed 2622.38 samples/sec   Loss 6.7284   LearningRate 0.0255   Epoch: 9   Global Step: 410410   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:13,822-Speed 2623.81 samples/sec   Loss 6.7280   LearningRate 0.0255   Epoch: 9   Global Step: 410420   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:51:17,707-Speed 2636.91 samples/sec   Loss 6.7120   LearningRate 0.0255   Epoch: 9   Global Step: 410430   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:21,609-Speed 2624.48 samples/sec   Loss 6.7128   LearningRate 0.0255   Epoch: 9   Global Step: 410440   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:25,512-Speed 2624.49 samples/sec   Loss 6.7397   LearningRate 0.0255   Epoch: 9   Global Step: 410450   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:29,416-Speed 2623.33 samples/sec   Loss 6.8315   LearningRate 0.0255   Epoch: 9   Global Step: 410460   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:33,362-Speed 2595.57 samples/sec   Loss 6.5880   LearningRate 0.0255   Epoch: 9   Global Step: 410470   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:37,365-Speed 2558.94 samples/sec   Loss 6.6807   LearningRate 0.0255   Epoch: 9   Global Step: 410480   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:41,343-Speed 2575.21 samples/sec   Loss 6.6225   LearningRate 0.0255   Epoch: 9   Global Step: 410490   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:45,288-Speed 2596.31 samples/sec   Loss 6.7069   LearningRate 0.0255   Epoch: 9   Global Step: 410500   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:49,196-Speed 2620.59 samples/sec   Loss 6.6549   LearningRate 0.0255   Epoch: 9   Global Step: 410510   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:53,097-Speed 2625.41 samples/sec   Loss 6.8364   LearningRate 0.0255   Epoch: 9   Global Step: 410520   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:51:56,980-Speed 2638.27 samples/sec   Loss 6.7145   LearningRate 0.0255   Epoch: 9   Global Step: 410530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:00,885-Speed 2622.63 samples/sec   Loss 6.6973   LearningRate 0.0255   Epoch: 9   Global Step: 410540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:04,785-Speed 2626.81 samples/sec   Loss 6.7224   LearningRate 0.0255   Epoch: 9   Global Step: 410550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:08,694-Speed 2619.54 samples/sec   Loss 6.6312   LearningRate 0.0255   Epoch: 9   Global Step: 410560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:12,608-Speed 2617.60 samples/sec   Loss 6.6001   LearningRate 0.0255   Epoch: 9   Global Step: 410570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:16,509-Speed 2624.97 samples/sec   Loss 6.7708   LearningRate 0.0255   Epoch: 9   Global Step: 410580   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:20,452-Speed 2597.96 samples/sec   Loss 6.6993   LearningRate 0.0255   Epoch: 9   Global Step: 410590   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:24,352-Speed 2626.52 samples/sec   Loss 6.7935   LearningRate 0.0255   Epoch: 9   Global Step: 410600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:28,262-Speed 2619.88 samples/sec   Loss 6.7906   LearningRate 0.0255   Epoch: 9   Global Step: 410610   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:32,189-Speed 2607.92 samples/sec   Loss 6.7897   LearningRate 0.0255   Epoch: 9   Global Step: 410620   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:36,085-Speed 2629.25 samples/sec   Loss 6.7612   LearningRate 0.0255   Epoch: 9   Global Step: 410630   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:39,990-Speed 2622.53 samples/sec   Loss 6.7364   LearningRate 0.0255   Epoch: 9   Global Step: 410640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:43,893-Speed 2624.85 samples/sec   Loss 6.7569   LearningRate 0.0255   Epoch: 9   Global Step: 410650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:47,795-Speed 2625.40 samples/sec   Loss 6.7275   LearningRate 0.0255   Epoch: 9   Global Step: 410660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:51,702-Speed 2621.09 samples/sec   Loss 6.6506   LearningRate 0.0255   Epoch: 9   Global Step: 410670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:55,607-Speed 2623.22 samples/sec   Loss 6.7504   LearningRate 0.0255   Epoch: 9   Global Step: 410680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:52:59,517-Speed 2619.86 samples/sec   Loss 6.6850   LearningRate 0.0255   Epoch: 9   Global Step: 410690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:03,434-Speed 2614.60 samples/sec   Loss 6.6764   LearningRate 0.0255   Epoch: 9   Global Step: 410700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:07,372-Speed 2601.01 samples/sec   Loss 6.7491   LearningRate 0.0255   Epoch: 9   Global Step: 410710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:11,334-Speed 2585.72 samples/sec   Loss 6.6892   LearningRate 0.0255   Epoch: 9   Global Step: 410720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:15,332-Speed 2562.46 samples/sec   Loss 6.7626   LearningRate 0.0255   Epoch: 9   Global Step: 410730   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:53:19,238-Speed 2622.31 samples/sec   Loss 6.7760   LearningRate 0.0255   Epoch: 9   Global Step: 410740   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:23,151-Speed 2617.68 samples/sec   Loss 6.7255   LearningRate 0.0255   Epoch: 9   Global Step: 410750   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:27,141-Speed 2566.83 samples/sec   Loss 6.8232   LearningRate 0.0255   Epoch: 9   Global Step: 410760   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:31,048-Speed 2622.11 samples/sec   Loss 6.6733   LearningRate 0.0255   Epoch: 9   Global Step: 410770   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:34,961-Speed 2616.99 samples/sec   Loss 6.8368   LearningRate 0.0255   Epoch: 9   Global Step: 410780   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:38,880-Speed 2613.48 samples/sec   Loss 6.6723   LearningRate 0.0255   Epoch: 9   Global Step: 410790   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:42,787-Speed 2622.37 samples/sec   Loss 6.7909   LearningRate 0.0255   Epoch: 9   Global Step: 410800   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:53:46,679-Speed 2631.65 samples/sec   Loss 6.7439   LearningRate 0.0255   Epoch: 9   Global Step: 410810   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:53:50,581-Speed 2625.12 samples/sec   Loss 6.6399   LearningRate 0.0255   Epoch: 9   Global Step: 410820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:53:54,486-Speed 2622.47 samples/sec   Loss 6.7125   LearningRate 0.0255   Epoch: 9   Global Step: 410830   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:53:58,388-Speed 2625.35 samples/sec   Loss 6.7294   LearningRate 0.0255   Epoch: 9   Global Step: 410840   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:02,299-Speed 2619.02 samples/sec   Loss 6.7369   LearningRate 0.0255   Epoch: 9   Global Step: 410850   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:06,233-Speed 2603.67 samples/sec   Loss 6.8138   LearningRate 0.0255   Epoch: 9   Global Step: 410860   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:10,151-Speed 2614.20 samples/sec   Loss 6.7552   LearningRate 0.0255   Epoch: 9   Global Step: 410870   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:14,066-Speed 2616.50 samples/sec   Loss 6.6643   LearningRate 0.0255   Epoch: 9   Global Step: 410880   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:17,976-Speed 2619.53 samples/sec   Loss 6.6348   LearningRate 0.0255   Epoch: 9   Global Step: 410890   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:21,889-Speed 2617.42 samples/sec   Loss 6.9037   LearningRate 0.0255   Epoch: 9   Global Step: 410900   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:25,811-Speed 2612.32 samples/sec   Loss 6.7455   LearningRate 0.0255   Epoch: 9   Global Step: 410910   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:54:29,757-Speed 2595.16 samples/sec   Loss 6.6681   LearningRate 0.0255   Epoch: 9   Global Step: 410920   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:54:33,693-Speed 2602.89 samples/sec   Loss 6.7718   LearningRate 0.0255   Epoch: 9   Global Step: 410930   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:54:37,846-Speed 2465.89 samples/sec   Loss 6.7359   LearningRate 0.0255   Epoch: 9   Global Step: 410940   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:41,747-Speed 2625.84 samples/sec   Loss 6.7536   LearningRate 0.0255   Epoch: 9   Global Step: 410950   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:45,653-Speed 2621.91 samples/sec   Loss 6.7611   LearningRate 0.0255   Epoch: 9   Global Step: 410960   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:49,557-Speed 2623.84 samples/sec   Loss 6.7442   LearningRate 0.0255   Epoch: 9   Global Step: 410970   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:53,460-Speed 2624.92 samples/sec   Loss 6.8712   LearningRate 0.0255   Epoch: 9   Global Step: 410980   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:54:57,363-Speed 2623.89 samples/sec   Loss 6.7948   LearningRate 0.0255   Epoch: 9   Global Step: 410990   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:55:01,285-Speed 2612.11 samples/sec   Loss 6.7563   LearningRate 0.0255   Epoch: 9   Global Step: 411000   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:55:05,192-Speed 2621.53 samples/sec   Loss 6.7447   LearningRate 0.0255   Epoch: 9   Global Step: 411010   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:55:09,108-Speed 2615.48 samples/sec   Loss 6.7133   LearningRate 0.0255   Epoch: 9   Global Step: 411020   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:55:13,033-Speed 2609.80 samples/sec   Loss 6.6876   LearningRate 0.0255   Epoch: 9   Global Step: 411030   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:55:16,936-Speed 2624.43 samples/sec   Loss 6.6541   LearningRate 0.0255   Epoch: 9   Global Step: 411040   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:20,839-Speed 2623.90 samples/sec   Loss 6.6652   LearningRate 0.0255   Epoch: 9   Global Step: 411050   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:24,738-Speed 2627.42 samples/sec   Loss 6.6843   LearningRate 0.0255   Epoch: 9   Global Step: 411060   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:28,644-Speed 2622.35 samples/sec   Loss 6.8288   LearningRate 0.0255   Epoch: 9   Global Step: 411070   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:32,550-Speed 2622.13 samples/sec   Loss 6.5988   LearningRate 0.0254   Epoch: 9   Global Step: 411080   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:36,459-Speed 2620.13 samples/sec   Loss 6.6970   LearningRate 0.0254   Epoch: 9   Global Step: 411090   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:40,366-Speed 2622.20 samples/sec   Loss 6.6641   LearningRate 0.0254   Epoch: 9   Global Step: 411100   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:44,269-Speed 2625.25 samples/sec   Loss 6.8192   LearningRate 0.0254   Epoch: 9   Global Step: 411110   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:48,168-Speed 2627.11 samples/sec   Loss 6.6830   LearningRate 0.0254   Epoch: 9   Global Step: 411120   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:52,077-Speed 2619.98 samples/sec   Loss 6.6504   LearningRate 0.0254   Epoch: 9   Global Step: 411130   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:55,969-Speed 2632.20 samples/sec   Loss 6.7650   LearningRate 0.0254   Epoch: 9   Global Step: 411140   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:55:59,889-Speed 2612.72 samples/sec   Loss 6.7622   LearningRate 0.0254   Epoch: 9   Global Step: 411150   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:03,800-Speed 2619.63 samples/sec   Loss 6.6894   LearningRate 0.0254   Epoch: 9   Global Step: 411160   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:07,703-Speed 2623.78 samples/sec   Loss 6.8569   LearningRate 0.0254   Epoch: 9   Global Step: 411170   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:11,610-Speed 2621.54 samples/sec   Loss 6.6737   LearningRate 0.0254   Epoch: 9   Global Step: 411180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:15,518-Speed 2620.76 samples/sec   Loss 6.6191   LearningRate 0.0254   Epoch: 9   Global Step: 411190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:19,431-Speed 2618.46 samples/sec   Loss 6.8445   LearningRate 0.0254   Epoch: 9   Global Step: 411200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:23,341-Speed 2619.68 samples/sec   Loss 6.7747   LearningRate 0.0254   Epoch: 9   Global Step: 411210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:27,244-Speed 2624.51 samples/sec   Loss 6.8299   LearningRate 0.0254   Epoch: 9   Global Step: 411220   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:31,146-Speed 2624.58 samples/sec   Loss 6.7548   LearningRate 0.0254   Epoch: 9   Global Step: 411230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:35,036-Speed 2633.29 samples/sec   Loss 6.7869   LearningRate 0.0254   Epoch: 9   Global Step: 411240   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:38,963-Speed 2608.06 samples/sec   Loss 6.7762   LearningRate 0.0254   Epoch: 9   Global Step: 411250   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:42,902-Speed 2603.01 samples/sec   Loss 6.8376   LearningRate 0.0254   Epoch: 9   Global Step: 411260   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:46,808-Speed 2621.97 samples/sec   Loss 6.8333   LearningRate 0.0254   Epoch: 9   Global Step: 411270   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:50,715-Speed 2621.90 samples/sec   Loss 6.7707   LearningRate 0.0254   Epoch: 9   Global Step: 411280   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:54,623-Speed 2620.81 samples/sec   Loss 6.7844   LearningRate 0.0254   Epoch: 9   Global Step: 411290   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:56:58,531-Speed 2621.11 samples/sec   Loss 6.8089   LearningRate 0.0254   Epoch: 9   Global Step: 411300   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:02,471-Speed 2598.88 samples/sec   Loss 6.6909   LearningRate 0.0254   Epoch: 9   Global Step: 411310   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:06,377-Speed 2623.03 samples/sec   Loss 6.7911   LearningRate 0.0254   Epoch: 9   Global Step: 411320   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:10,356-Speed 2574.18 samples/sec   Loss 6.7112   LearningRate 0.0254   Epoch: 9   Global Step: 411330   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:14,284-Speed 2607.85 samples/sec   Loss 6.7218   LearningRate 0.0254   Epoch: 9   Global Step: 411340   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:57:18,182-Speed 2628.17 samples/sec   Loss 6.7405   LearningRate 0.0254   Epoch: 9   Global Step: 411350   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:57:22,077-Speed 2629.37 samples/sec   Loss 6.7714   LearningRate 0.0254   Epoch: 9   Global Step: 411360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:25,985-Speed 2621.28 samples/sec   Loss 6.7917   LearningRate 0.0254   Epoch: 9   Global Step: 411370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:29,886-Speed 2625.21 samples/sec   Loss 6.7267   LearningRate 0.0254   Epoch: 9   Global Step: 411380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:33,794-Speed 2620.70 samples/sec   Loss 6.6313   LearningRate 0.0254   Epoch: 9   Global Step: 411390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:37,699-Speed 2623.13 samples/sec   Loss 6.7299   LearningRate 0.0254   Epoch: 9   Global Step: 411400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:57:41,610-Speed 2619.61 samples/sec   Loss 6.7374   LearningRate 0.0254   Epoch: 9   Global Step: 411410   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:57:45,522-Speed 2617.49 samples/sec   Loss 6.7937   LearningRate 0.0254   Epoch: 9   Global Step: 411420   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:57:49,429-Speed 2622.17 samples/sec   Loss 6.6241   LearningRate 0.0254   Epoch: 9   Global Step: 411430   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:57:53,388-Speed 2587.41 samples/sec   Loss 6.6853   LearningRate 0.0254   Epoch: 9   Global Step: 411440   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:57:57,292-Speed 2623.38 samples/sec   Loss 6.7568   LearningRate 0.0254   Epoch: 9   Global Step: 411450   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:58:01,193-Speed 2625.73 samples/sec   Loss 6.8264   LearningRate 0.0254   Epoch: 9   Global Step: 411460   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:58:05,101-Speed 2620.91 samples/sec   Loss 6.7102   LearningRate 0.0254   Epoch: 9   Global Step: 411470   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:58:09,006-Speed 2623.11 samples/sec   Loss 6.7250   LearningRate 0.0254   Epoch: 9   Global Step: 411480   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:58:12,916-Speed 2619.56 samples/sec   Loss 6.7203   LearningRate 0.0254   Epoch: 9   Global Step: 411490   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:58:16,824-Speed 2620.45 samples/sec   Loss 6.6990   LearningRate 0.0254   Epoch: 9   Global Step: 411500   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:58:20,748-Speed 2610.85 samples/sec   Loss 6.6309   LearningRate 0.0254   Epoch: 9   Global Step: 411510   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:24,698-Speed 2592.58 samples/sec   Loss 6.8297   LearningRate 0.0254   Epoch: 9   Global Step: 411520   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:28,602-Speed 2623.98 samples/sec   Loss 6.7019   LearningRate 0.0254   Epoch: 9   Global Step: 411530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:32,510-Speed 2620.82 samples/sec   Loss 6.6447   LearningRate 0.0254   Epoch: 9   Global Step: 411540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:36,423-Speed 2617.52 samples/sec   Loss 6.6940   LearningRate 0.0254   Epoch: 9   Global Step: 411550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:40,341-Speed 2613.89 samples/sec   Loss 6.6333   LearningRate 0.0254   Epoch: 9   Global Step: 411560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:44,244-Speed 2625.14 samples/sec   Loss 6.6434   LearningRate 0.0254   Epoch: 9   Global Step: 411570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:48,145-Speed 2625.59 samples/sec   Loss 6.8457   LearningRate 0.0254   Epoch: 9   Global Step: 411580   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:52,066-Speed 2612.54 samples/sec   Loss 6.6998   LearningRate 0.0254   Epoch: 9   Global Step: 411590   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:55,964-Speed 2627.71 samples/sec   Loss 6.7683   LearningRate 0.0254   Epoch: 9   Global Step: 411600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:58:59,885-Speed 2612.05 samples/sec   Loss 6.8534   LearningRate 0.0254   Epoch: 9   Global Step: 411610   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:59:03,780-Speed 2629.31 samples/sec   Loss 6.7526   LearningRate 0.0254   Epoch: 9   Global Step: 411620   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:59:07,674-Speed 2630.23 samples/sec   Loss 6.6544   LearningRate 0.0254   Epoch: 9   Global Step: 411630   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:11,582-Speed 2621.54 samples/sec   Loss 6.6791   LearningRate 0.0254   Epoch: 9   Global Step: 411640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:15,499-Speed 2614.62 samples/sec   Loss 6.7490   LearningRate 0.0254   Epoch: 9   Global Step: 411650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:19,406-Speed 2621.57 samples/sec   Loss 6.6926   LearningRate 0.0254   Epoch: 9   Global Step: 411660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:23,310-Speed 2623.11 samples/sec   Loss 6.8180   LearningRate 0.0254   Epoch: 9   Global Step: 411670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:27,229-Speed 2613.99 samples/sec   Loss 6.7712   LearningRate 0.0254   Epoch: 9   Global Step: 411680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:31,137-Speed 2621.29 samples/sec   Loss 6.8055   LearningRate 0.0254   Epoch: 9   Global Step: 411690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:35,063-Speed 2608.65 samples/sec   Loss 6.7486   LearningRate 0.0254   Epoch: 9   Global Step: 411700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:38,999-Speed 2602.10 samples/sec   Loss 6.7296   LearningRate 0.0254   Epoch: 9   Global Step: 411710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:42,910-Speed 2619.40 samples/sec   Loss 6.6944   LearningRate 0.0254   Epoch: 9   Global Step: 411720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:46,819-Speed 2620.01 samples/sec   Loss 6.8634   LearningRate 0.0254   Epoch: 9   Global Step: 411730   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 17:59:50,705-Speed 2635.65 samples/sec   Loss 6.8445   LearningRate 0.0254   Epoch: 9   Global Step: 411740   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 17:59:54,599-Speed 2630.54 samples/sec   Loss 6.7760   LearningRate 0.0254   Epoch: 9   Global Step: 411750   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 17:59:58,506-Speed 2621.93 samples/sec   Loss 6.6399   LearningRate 0.0254   Epoch: 9   Global Step: 411760   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:02,410-Speed 2623.81 samples/sec   Loss 6.6546   LearningRate 0.0254   Epoch: 9   Global Step: 411770   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:06,310-Speed 2625.78 samples/sec   Loss 6.7103   LearningRate 0.0254   Epoch: 9   Global Step: 411780   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:10,249-Speed 2600.45 samples/sec   Loss 6.7667   LearningRate 0.0254   Epoch: 9   Global Step: 411790   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:14,148-Speed 2626.87 samples/sec   Loss 6.7053   LearningRate 0.0254   Epoch: 9   Global Step: 411800   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:18,056-Speed 2620.66 samples/sec   Loss 6.7414   LearningRate 0.0254   Epoch: 9   Global Step: 411810   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:21,963-Speed 2621.46 samples/sec   Loss 6.6897   LearningRate 0.0254   Epoch: 9   Global Step: 411820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:25,889-Speed 2609.20 samples/sec   Loss 6.7861   LearningRate 0.0254   Epoch: 9   Global Step: 411830   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:29,794-Speed 2623.19 samples/sec   Loss 6.7127   LearningRate 0.0254   Epoch: 9   Global Step: 411840   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:00:33,698-Speed 2623.57 samples/sec   Loss 6.6517   LearningRate 0.0254   Epoch: 9   Global Step: 411850   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:00:37,602-Speed 2623.58 samples/sec   Loss 6.6912   LearningRate 0.0254   Epoch: 9   Global Step: 411860   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:00:41,504-Speed 2624.89 samples/sec   Loss 6.7557   LearningRate 0.0254   Epoch: 9   Global Step: 411870   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:00:45,406-Speed 2625.13 samples/sec   Loss 6.8106   LearningRate 0.0254   Epoch: 9   Global Step: 411880   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:00:49,359-Speed 2592.91 samples/sec   Loss 6.7476   LearningRate 0.0254   Epoch: 9   Global Step: 411890   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:00:53,265-Speed 2622.71 samples/sec   Loss 6.7915   LearningRate 0.0253   Epoch: 9   Global Step: 411900   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:00:57,178-Speed 2617.88 samples/sec   Loss 6.7233   LearningRate 0.0253   Epoch: 9   Global Step: 411910   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:01:01,136-Speed 2588.32 samples/sec   Loss 6.7523   LearningRate 0.0253   Epoch: 9   Global Step: 411920   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:01:05,048-Speed 2618.01 samples/sec   Loss 6.7073   LearningRate 0.0253   Epoch: 9   Global Step: 411930   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:01:08,968-Speed 2612.73 samples/sec   Loss 6.7269   LearningRate 0.0253   Epoch: 9   Global Step: 411940   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:12,875-Speed 2621.39 samples/sec   Loss 6.6673   LearningRate 0.0253   Epoch: 9   Global Step: 411950   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:16,783-Speed 2621.02 samples/sec   Loss 6.7558   LearningRate 0.0253   Epoch: 9   Global Step: 411960   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:20,733-Speed 2593.37 samples/sec   Loss 6.7428   LearningRate 0.0253   Epoch: 9   Global Step: 411970   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:24,642-Speed 2620.65 samples/sec   Loss 6.7174   LearningRate 0.0253   Epoch: 9   Global Step: 411980   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:28,549-Speed 2621.16 samples/sec   Loss 6.6189   LearningRate 0.0253   Epoch: 9   Global Step: 411990   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:32,468-Speed 2613.91 samples/sec   Loss 6.7359   LearningRate 0.0253   Epoch: 9   Global Step: 412000   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:36,376-Speed 2621.02 samples/sec   Loss 6.7317   LearningRate 0.0253   Epoch: 9   Global Step: 412010   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:40,280-Speed 2623.79 samples/sec   Loss 6.6956   LearningRate 0.0253   Epoch: 9   Global Step: 412020   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:44,183-Speed 2623.60 samples/sec   Loss 6.8070   LearningRate 0.0253   Epoch: 9   Global Step: 412030   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:01:48,181-Speed 2562.42 samples/sec   Loss 6.7657   LearningRate 0.0253   Epoch: 9   Global Step: 412040   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:01:52,087-Speed 2622.70 samples/sec   Loss 6.6833   LearningRate 0.0253   Epoch: 9   Global Step: 412050   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:01:56,008-Speed 2612.10 samples/sec   Loss 6.9167   LearningRate 0.0253   Epoch: 9   Global Step: 412060   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:01:59,917-Speed 2620.15 samples/sec   Loss 6.8015   LearningRate 0.0253   Epoch: 9   Global Step: 412070   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:03,820-Speed 2624.74 samples/sec   Loss 6.7822   LearningRate 0.0253   Epoch: 9   Global Step: 412080   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:07,726-Speed 2621.76 samples/sec   Loss 6.7046   LearningRate 0.0253   Epoch: 9   Global Step: 412090   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:11,636-Speed 2619.97 samples/sec   Loss 6.6741   LearningRate 0.0253   Epoch: 9   Global Step: 412100   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:15,539-Speed 2623.76 samples/sec   Loss 6.7822   LearningRate 0.0253   Epoch: 9   Global Step: 412110   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:19,442-Speed 2624.96 samples/sec   Loss 6.7154   LearningRate 0.0253   Epoch: 9   Global Step: 412120   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:23,345-Speed 2623.84 samples/sec   Loss 6.8286   LearningRate 0.0253   Epoch: 9   Global Step: 412130   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:27,248-Speed 2624.12 samples/sec   Loss 6.8025   LearningRate 0.0253   Epoch: 9   Global Step: 412140   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:31,151-Speed 2624.49 samples/sec   Loss 6.7890   LearningRate 0.0253   Epoch: 9   Global Step: 412150   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:35,056-Speed 2622.79 samples/sec   Loss 6.7663   LearningRate 0.0253   Epoch: 9   Global Step: 412160   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:02:38,962-Speed 2622.28 samples/sec   Loss 6.7481   LearningRate 0.0253   Epoch: 9   Global Step: 412170   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:02:42,870-Speed 2620.71 samples/sec   Loss 6.7010   LearningRate 0.0253   Epoch: 9   Global Step: 412180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:02:46,784-Speed 2617.43 samples/sec   Loss 6.7358   LearningRate 0.0253   Epoch: 9   Global Step: 412190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:02:50,693-Speed 2620.36 samples/sec   Loss 6.7395   LearningRate 0.0253   Epoch: 9   Global Step: 412200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:02:54,597-Speed 2623.65 samples/sec   Loss 6.7245   LearningRate 0.0253   Epoch: 9   Global Step: 412210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:02:58,502-Speed 2622.85 samples/sec   Loss 6.7317   LearningRate 0.0253   Epoch: 9   Global Step: 412220   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:03:02,410-Speed 2620.43 samples/sec   Loss 6.6811   LearningRate 0.0253   Epoch: 9   Global Step: 412230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:03:06,334-Speed 2610.14 samples/sec   Loss 6.7483   LearningRate 0.0253   Epoch: 9   Global Step: 412240   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:03:10,238-Speed 2624.48 samples/sec   Loss 6.7724   LearningRate 0.0253   Epoch: 9   Global Step: 412250   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:03:14,144-Speed 2622.38 samples/sec   Loss 6.7083   LearningRate 0.0253   Epoch: 9   Global Step: 412260   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:03:18,044-Speed 2626.09 samples/sec   Loss 6.6850   LearningRate 0.0253   Epoch: 9   Global Step: 412270   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:21,970-Speed 2609.05 samples/sec   Loss 6.6905   LearningRate 0.0253   Epoch: 9   Global Step: 412280   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:25,871-Speed 2625.47 samples/sec   Loss 6.6021   LearningRate 0.0253   Epoch: 9   Global Step: 412290   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:29,780-Speed 2620.69 samples/sec   Loss 6.7683   LearningRate 0.0253   Epoch: 9   Global Step: 412300   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:33,683-Speed 2624.25 samples/sec   Loss 6.6758   LearningRate 0.0253   Epoch: 9   Global Step: 412310   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:37,585-Speed 2624.29 samples/sec   Loss 6.7213   LearningRate 0.0253   Epoch: 9   Global Step: 412320   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:41,487-Speed 2625.48 samples/sec   Loss 6.8182   LearningRate 0.0253   Epoch: 9   Global Step: 412330   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:45,393-Speed 2622.27 samples/sec   Loss 6.6813   LearningRate 0.0253   Epoch: 9   Global Step: 412340   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:49,298-Speed 2623.12 samples/sec   Loss 6.6290   LearningRate 0.0253   Epoch: 9   Global Step: 412350   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:53,203-Speed 2622.79 samples/sec   Loss 6.6776   LearningRate 0.0253   Epoch: 9   Global Step: 412360   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:03:57,110-Speed 2622.14 samples/sec   Loss 6.7348   LearningRate 0.0253   Epoch: 9   Global Step: 412370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:04:01,039-Speed 2606.71 samples/sec   Loss 6.6142   LearningRate 0.0253   Epoch: 9   Global Step: 412380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:04:04,959-Speed 2613.05 samples/sec   Loss 6.6824   LearningRate 0.0253   Epoch: 9   Global Step: 412390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:04:08,865-Speed 2622.39 samples/sec   Loss 6.6196   LearningRate 0.0253   Epoch: 9   Global Step: 412400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:04:12,771-Speed 2622.32 samples/sec   Loss 6.6470   LearningRate 0.0253   Epoch: 9   Global Step: 412410   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:04:16,672-Speed 2625.51 samples/sec   Loss 6.7792   LearningRate 0.0253   Epoch: 9   Global Step: 412420   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:04:20,558-Speed 2635.54 samples/sec   Loss 6.6895   LearningRate 0.0253   Epoch: 9   Global Step: 412430   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:24,467-Speed 2620.75 samples/sec   Loss 6.6099   LearningRate 0.0253   Epoch: 9   Global Step: 412440   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:28,383-Speed 2615.97 samples/sec   Loss 6.7633   LearningRate 0.0253   Epoch: 9   Global Step: 412450   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:32,288-Speed 2622.29 samples/sec   Loss 6.7401   LearningRate 0.0253   Epoch: 9   Global Step: 412460   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:36,191-Speed 2624.18 samples/sec   Loss 6.6650   LearningRate 0.0253   Epoch: 9   Global Step: 412470   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:40,098-Speed 2621.60 samples/sec   Loss 6.7805   LearningRate 0.0253   Epoch: 9   Global Step: 412480   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:43,997-Speed 2627.16 samples/sec   Loss 6.7751   LearningRate 0.0253   Epoch: 9   Global Step: 412490   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:47,901-Speed 2623.42 samples/sec   Loss 6.8531   LearningRate 0.0253   Epoch: 9   Global Step: 412500   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:51,832-Speed 2606.25 samples/sec   Loss 6.7463   LearningRate 0.0253   Epoch: 9   Global Step: 412510   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:55,755-Speed 2611.12 samples/sec   Loss 6.6961   LearningRate 0.0253   Epoch: 9   Global Step: 412520   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:04:59,665-Speed 2620.01 samples/sec   Loss 6.8572   LearningRate 0.0253   Epoch: 9   Global Step: 412530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:03,578-Speed 2617.14 samples/sec   Loss 6.7378   LearningRate 0.0253   Epoch: 9   Global Step: 412540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:07,544-Speed 2582.60 samples/sec   Loss 6.7732   LearningRate 0.0253   Epoch: 9   Global Step: 412550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:11,453-Speed 2620.14 samples/sec   Loss 6.7328   LearningRate 0.0253   Epoch: 9   Global Step: 412560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:15,357-Speed 2623.64 samples/sec   Loss 6.7678   LearningRate 0.0253   Epoch: 9   Global Step: 412570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:19,377-Speed 2548.41 samples/sec   Loss 6.6101   LearningRate 0.0253   Epoch: 9   Global Step: 412580   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:23,307-Speed 2606.30 samples/sec   Loss 6.5974   LearningRate 0.0253   Epoch: 9   Global Step: 412590   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:27,209-Speed 2625.11 samples/sec   Loss 6.7341   LearningRate 0.0253   Epoch: 9   Global Step: 412600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:31,147-Speed 2601.41 samples/sec   Loss 6.7143   LearningRate 0.0253   Epoch: 9   Global Step: 412610   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:35,192-Speed 2531.90 samples/sec   Loss 6.7935   LearningRate 0.0253   Epoch: 9   Global Step: 412620   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:39,135-Speed 2597.62 samples/sec   Loss 6.6572   LearningRate 0.0253   Epoch: 9   Global Step: 412630   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:05:43,022-Speed 2635.07 samples/sec   Loss 6.7849   LearningRate 0.0253   Epoch: 9   Global Step: 412640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:46,932-Speed 2619.23 samples/sec   Loss 6.7297   LearningRate 0.0253   Epoch: 9   Global Step: 412650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:50,842-Speed 2620.25 samples/sec   Loss 6.6558   LearningRate 0.0253   Epoch: 9   Global Step: 412660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:54,747-Speed 2622.83 samples/sec   Loss 6.7159   LearningRate 0.0253   Epoch: 9   Global Step: 412670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:05:58,648-Speed 2625.69 samples/sec   Loss 6.6714   LearningRate 0.0253   Epoch: 9   Global Step: 412680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:06:02,541-Speed 2630.97 samples/sec   Loss 6.6507   LearningRate 0.0253   Epoch: 9   Global Step: 412690   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:06,456-Speed 2616.46 samples/sec   Loss 6.7068   LearningRate 0.0253   Epoch: 9   Global Step: 412700   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:10,355-Speed 2626.42 samples/sec   Loss 6.7523   LearningRate 0.0253   Epoch: 9   Global Step: 412710   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:14,255-Speed 2626.21 samples/sec   Loss 6.7187   LearningRate 0.0253   Epoch: 9   Global Step: 412720   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:18,158-Speed 2624.59 samples/sec   Loss 6.7684   LearningRate 0.0252   Epoch: 9   Global Step: 412730   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:22,064-Speed 2622.34 samples/sec   Loss 6.6633   LearningRate 0.0252   Epoch: 9   Global Step: 412740   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:25,968-Speed 2623.57 samples/sec   Loss 6.6897   LearningRate 0.0252   Epoch: 9   Global Step: 412750   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:29,917-Speed 2593.77 samples/sec   Loss 6.7300   LearningRate 0.0252   Epoch: 9   Global Step: 412760   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:33,820-Speed 2623.80 samples/sec   Loss 6.7029   LearningRate 0.0252   Epoch: 9   Global Step: 412770   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:37,725-Speed 2622.70 samples/sec   Loss 6.7306   LearningRate 0.0252   Epoch: 9   Global Step: 412780   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:06:41,634-Speed 2620.48 samples/sec   Loss 6.8247   LearningRate 0.0252   Epoch: 9   Global Step: 412790   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:06:45,537-Speed 2623.72 samples/sec   Loss 6.7021   LearningRate 0.0252   Epoch: 9   Global Step: 412800   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:06:49,442-Speed 2623.36 samples/sec   Loss 6.7117   LearningRate 0.0252   Epoch: 9   Global Step: 412810   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:06:53,344-Speed 2624.69 samples/sec   Loss 6.7419   LearningRate 0.0252   Epoch: 9   Global Step: 412820   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:06:57,243-Speed 2626.90 samples/sec   Loss 6.7661   LearningRate 0.0252   Epoch: 9   Global Step: 412830   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:07:01,147-Speed 2623.96 samples/sec   Loss 6.6393   LearningRate 0.0252   Epoch: 9   Global Step: 412840   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:07:05,049-Speed 2624.84 samples/sec   Loss 6.7364   LearningRate 0.0252   Epoch: 9   Global Step: 412850   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:07:08,951-Speed 2624.45 samples/sec   Loss 6.7923   LearningRate 0.0252   Epoch: 9   Global Step: 412860   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:07:12,833-Speed 2638.50 samples/sec   Loss 6.8355   LearningRate 0.0252   Epoch: 9   Global Step: 412870   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:16,733-Speed 2626.12 samples/sec   Loss 6.6916   LearningRate 0.0252   Epoch: 9   Global Step: 412880   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:20,642-Speed 2620.53 samples/sec   Loss 6.8828   LearningRate 0.0252   Epoch: 9   Global Step: 412890   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:24,553-Speed 2620.47 samples/sec   Loss 6.6669   LearningRate 0.0252   Epoch: 9   Global Step: 412900   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:28,460-Speed 2621.41 samples/sec   Loss 6.6708   LearningRate 0.0252   Epoch: 9   Global Step: 412910   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:32,366-Speed 2622.59 samples/sec   Loss 6.7430   LearningRate 0.0252   Epoch: 9   Global Step: 412920   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:36,271-Speed 2623.10 samples/sec   Loss 6.6153   LearningRate 0.0252   Epoch: 9   Global Step: 412930   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:40,171-Speed 2626.33 samples/sec   Loss 6.6562   LearningRate 0.0252   Epoch: 9   Global Step: 412940   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:44,071-Speed 2625.87 samples/sec   Loss 6.6936   LearningRate 0.0252   Epoch: 9   Global Step: 412950   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:47,981-Speed 2619.45 samples/sec   Loss 6.7266   LearningRate 0.0252   Epoch: 9   Global Step: 412960   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:07:51,890-Speed 2620.05 samples/sec   Loss 6.7240   LearningRate 0.0252   Epoch: 9   Global Step: 412970   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:07:55,788-Speed 2627.67 samples/sec   Loss 6.6067   LearningRate 0.0252   Epoch: 9   Global Step: 412980   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:07:59,783-Speed 2563.79 samples/sec   Loss 6.8009   LearningRate 0.0252   Epoch: 9   Global Step: 412990   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:08:03,732-Speed 2594.59 samples/sec   Loss 6.7324   LearningRate 0.0252   Epoch: 9   Global Step: 413000   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:08:07,636-Speed 2623.57 samples/sec   Loss 6.7081   LearningRate 0.0252   Epoch: 9   Global Step: 413010   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:08:11,542-Speed 2621.80 samples/sec   Loss 6.8706   LearningRate 0.0252   Epoch: 9   Global Step: 413020   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:08:15,465-Speed 2611.33 samples/sec   Loss 6.6386   LearningRate 0.0252   Epoch: 9   Global Step: 413030   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:08:19,349-Speed 2637.25 samples/sec   Loss 6.6816   LearningRate 0.0252   Epoch: 9   Global Step: 413040   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:23,263-Speed 2616.03 samples/sec   Loss 6.6917   LearningRate 0.0252   Epoch: 9   Global Step: 413050   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:27,162-Speed 2627.20 samples/sec   Loss 6.7144   LearningRate 0.0252   Epoch: 9   Global Step: 413060   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:31,074-Speed 2618.27 samples/sec   Loss 6.6354   LearningRate 0.0252   Epoch: 9   Global Step: 413070   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:34,968-Speed 2629.96 samples/sec   Loss 6.6476   LearningRate 0.0252   Epoch: 9   Global Step: 413080   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:38,864-Speed 2628.91 samples/sec   Loss 6.6501   LearningRate 0.0252   Epoch: 9   Global Step: 413090   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:42,783-Speed 2613.07 samples/sec   Loss 6.7644   LearningRate 0.0252   Epoch: 9   Global Step: 413100   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:46,686-Speed 2624.57 samples/sec   Loss 6.7023   LearningRate 0.0252   Epoch: 9   Global Step: 413110   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:50,595-Speed 2620.31 samples/sec   Loss 6.6550   LearningRate 0.0252   Epoch: 9   Global Step: 413120   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:54,507-Speed 2618.80 samples/sec   Loss 6.6900   LearningRate 0.0252   Epoch: 9   Global Step: 413130   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:08:58,409-Speed 2624.75 samples/sec   Loss 6.6032   LearningRate 0.0252   Epoch: 9   Global Step: 413140   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:02,311-Speed 2624.52 samples/sec   Loss 6.6027   LearningRate 0.0252   Epoch: 9   Global Step: 413150   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:06,213-Speed 2624.64 samples/sec   Loss 6.7350   LearningRate 0.0252   Epoch: 9   Global Step: 413160   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:10,111-Speed 2627.85 samples/sec   Loss 6.6227   LearningRate 0.0252   Epoch: 9   Global Step: 413170   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:14,039-Speed 2607.24 samples/sec   Loss 6.6361   LearningRate 0.0252   Epoch: 9   Global Step: 413180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:17,967-Speed 2607.66 samples/sec   Loss 6.7690   LearningRate 0.0252   Epoch: 9   Global Step: 413190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:21,880-Speed 2617.39 samples/sec   Loss 6.7507   LearningRate 0.0252   Epoch: 9   Global Step: 413200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:25,789-Speed 2620.26 samples/sec   Loss 6.6622   LearningRate 0.0252   Epoch: 9   Global Step: 413210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:29,699-Speed 2619.59 samples/sec   Loss 6.6756   LearningRate 0.0252   Epoch: 9   Global Step: 413220   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:33,597-Speed 2627.32 samples/sec   Loss 6.7303   LearningRate 0.0252   Epoch: 9   Global Step: 413230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:37,487-Speed 2632.87 samples/sec   Loss 6.7211   LearningRate 0.0252   Epoch: 9   Global Step: 413240   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:41,392-Speed 2623.04 samples/sec   Loss 6.6472   LearningRate 0.0252   Epoch: 9   Global Step: 413250   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:45,294-Speed 2624.91 samples/sec   Loss 6.6544   LearningRate 0.0252   Epoch: 9   Global Step: 413260   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:49,200-Speed 2622.59 samples/sec   Loss 6.7455   LearningRate 0.0252   Epoch: 9   Global Step: 413270   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:53,105-Speed 2622.68 samples/sec   Loss 6.6834   LearningRate 0.0252   Epoch: 9   Global Step: 413280   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:09:57,019-Speed 2616.54 samples/sec   Loss 6.7895   LearningRate 0.0252   Epoch: 9   Global Step: 413290   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:00,926-Speed 2621.72 samples/sec   Loss 6.6455   LearningRate 0.0252   Epoch: 9   Global Step: 413300   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:04,831-Speed 2622.67 samples/sec   Loss 6.6921   LearningRate 0.0252   Epoch: 9   Global Step: 413310   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:08,732-Speed 2625.81 samples/sec   Loss 6.6944   LearningRate 0.0252   Epoch: 9   Global Step: 413320   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:12,645-Speed 2617.27 samples/sec   Loss 6.6984   LearningRate 0.0252   Epoch: 9   Global Step: 413330   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:16,534-Speed 2634.08 samples/sec   Loss 6.7632   LearningRate 0.0252   Epoch: 9   Global Step: 413340   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:20,437-Speed 2624.12 samples/sec   Loss 6.6642   LearningRate 0.0252   Epoch: 9   Global Step: 413350   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:24,345-Speed 2620.58 samples/sec   Loss 6.7622   LearningRate 0.0252   Epoch: 9   Global Step: 413360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:28,247-Speed 2625.24 samples/sec   Loss 6.6950   LearningRate 0.0252   Epoch: 9   Global Step: 413370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:32,151-Speed 2623.13 samples/sec   Loss 6.7247   LearningRate 0.0252   Epoch: 9   Global Step: 413380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:10:36,033-Speed 2638.51 samples/sec   Loss 6.7372   LearningRate 0.0252   Epoch: 9   Global Step: 413390   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:10:39,932-Speed 2627.23 samples/sec   Loss 6.7905   LearningRate 0.0252   Epoch: 9   Global Step: 413400   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:10:43,827-Speed 2629.73 samples/sec   Loss 6.8020   LearningRate 0.0252   Epoch: 9   Global Step: 413410   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:10:47,725-Speed 2627.78 samples/sec   Loss 6.6949   LearningRate 0.0252   Epoch: 9   Global Step: 413420   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:10:51,638-Speed 2617.07 samples/sec   Loss 6.7420   LearningRate 0.0252   Epoch: 9   Global Step: 413430   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:10:55,537-Speed 2627.06 samples/sec   Loss 6.6209   LearningRate 0.0252   Epoch: 9   Global Step: 413440   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:10:59,435-Speed 2627.89 samples/sec   Loss 6.6876   LearningRate 0.0252   Epoch: 9   Global Step: 413450   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:11:03,331-Speed 2628.59 samples/sec   Loss 6.5864   LearningRate 0.0252   Epoch: 9   Global Step: 413460   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:11:07,254-Speed 2610.76 samples/sec   Loss 6.7672   LearningRate 0.0252   Epoch: 9   Global Step: 413470   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:11:11,169-Speed 2615.92 samples/sec   Loss 6.8403   LearningRate 0.0252   Epoch: 9   Global Step: 413480   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:11:15,062-Speed 2631.17 samples/sec   Loss 6.6864   LearningRate 0.0252   Epoch: 9   Global Step: 413490   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:18,962-Speed 2626.85 samples/sec   Loss 6.6949   LearningRate 0.0252   Epoch: 9   Global Step: 413500   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:22,867-Speed 2622.78 samples/sec   Loss 6.6864   LearningRate 0.0252   Epoch: 9   Global Step: 413510   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:26,767-Speed 2626.48 samples/sec   Loss 6.7666   LearningRate 0.0252   Epoch: 9   Global Step: 413520   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:30,664-Speed 2627.91 samples/sec   Loss 6.6115   LearningRate 0.0252   Epoch: 9   Global Step: 413530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:34,559-Speed 2629.60 samples/sec   Loss 6.6454   LearningRate 0.0252   Epoch: 9   Global Step: 413540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:38,460-Speed 2625.80 samples/sec   Loss 6.5986   LearningRate 0.0251   Epoch: 9   Global Step: 413550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:42,365-Speed 2622.61 samples/sec   Loss 6.7439   LearningRate 0.0251   Epoch: 9   Global Step: 413560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:46,264-Speed 2626.79 samples/sec   Loss 6.7798   LearningRate 0.0251   Epoch: 9   Global Step: 413570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:50,160-Speed 2628.99 samples/sec   Loss 6.8014   LearningRate 0.0251   Epoch: 9   Global Step: 413580   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:11:54,065-Speed 2622.60 samples/sec   Loss 6.7503   LearningRate 0.0251   Epoch: 9   Global Step: 413590   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:11:57,953-Speed 2635.14 samples/sec   Loss 6.6768   LearningRate 0.0251   Epoch: 9   Global Step: 413600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:01,851-Speed 2627.27 samples/sec   Loss 6.6445   LearningRate 0.0251   Epoch: 9   Global Step: 413610   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:05,749-Speed 2627.97 samples/sec   Loss 6.8428   LearningRate 0.0251   Epoch: 9   Global Step: 413620   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:09,650-Speed 2624.89 samples/sec   Loss 6.5654   LearningRate 0.0251   Epoch: 9   Global Step: 413630   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:13,552-Speed 2625.16 samples/sec   Loss 6.6323   LearningRate 0.0251   Epoch: 9   Global Step: 413640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:17,456-Speed 2623.84 samples/sec   Loss 6.7542   LearningRate 0.0251   Epoch: 9   Global Step: 413650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:21,360-Speed 2622.93 samples/sec   Loss 6.7136   LearningRate 0.0251   Epoch: 9   Global Step: 413660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:25,269-Speed 2620.74 samples/sec   Loss 6.7469   LearningRate 0.0251   Epoch: 9   Global Step: 413670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:29,175-Speed 2622.03 samples/sec   Loss 6.5907   LearningRate 0.0251   Epoch: 9   Global Step: 413680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:33,084-Speed 2620.09 samples/sec   Loss 6.7610   LearningRate 0.0251   Epoch: 9   Global Step: 413690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:36,988-Speed 2623.43 samples/sec   Loss 6.6238   LearningRate 0.0251   Epoch: 9   Global Step: 413700   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:12:40,864-Speed 2642.89 samples/sec   Loss 6.8517   LearningRate 0.0251   Epoch: 9   Global Step: 413710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:44,762-Speed 2627.42 samples/sec   Loss 6.7094   LearningRate 0.0251   Epoch: 9   Global Step: 413720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:48,665-Speed 2625.30 samples/sec   Loss 6.6643   LearningRate 0.0251   Epoch: 9   Global Step: 413730   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:52,569-Speed 2623.19 samples/sec   Loss 6.6790   LearningRate 0.0251   Epoch: 9   Global Step: 413740   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:12:56,467-Speed 2627.64 samples/sec   Loss 6.7444   LearningRate 0.0251   Epoch: 9   Global Step: 413750   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:13:00,366-Speed 2626.90 samples/sec   Loss 6.7288   LearningRate 0.0251   Epoch: 9   Global Step: 413760   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:13:04,269-Speed 2624.36 samples/sec   Loss 6.7508   LearningRate 0.0251   Epoch: 9   Global Step: 413770   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:13:08,193-Speed 2610.14 samples/sec   Loss 6.6893   LearningRate 0.0251   Epoch: 9   Global Step: 413780   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:13:12,094-Speed 2625.97 samples/sec   Loss 6.6517   LearningRate 0.0251   Epoch: 9   Global Step: 413790   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:13:15,989-Speed 2629.39 samples/sec   Loss 6.5641   LearningRate 0.0251   Epoch: 9   Global Step: 413800   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:13:19,887-Speed 2627.54 samples/sec   Loss 6.5845   LearningRate 0.0251   Epoch: 9   Global Step: 413810   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:13:23,745-Speed 2654.65 samples/sec   Loss 6.6234   LearningRate 0.0251   Epoch: 9   Global Step: 413820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:27,719-Speed 2577.01 samples/sec   Loss 6.5585   LearningRate 0.0251   Epoch: 9   Global Step: 413830   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:31,692-Speed 2578.57 samples/sec   Loss 6.5711   LearningRate 0.0251   Epoch: 9   Global Step: 413840   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:35,586-Speed 2629.76 samples/sec   Loss 6.7340   LearningRate 0.0251   Epoch: 9   Global Step: 413850   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:39,495-Speed 2620.11 samples/sec   Loss 6.7690   LearningRate 0.0251   Epoch: 9   Global Step: 413860   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:43,408-Speed 2617.90 samples/sec   Loss 6.6212   LearningRate 0.0251   Epoch: 9   Global Step: 413870   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:47,307-Speed 2626.74 samples/sec   Loss 6.6695   LearningRate 0.0251   Epoch: 9   Global Step: 413880   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:51,209-Speed 2625.15 samples/sec   Loss 6.7628   LearningRate 0.0251   Epoch: 9   Global Step: 413890   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:55,101-Speed 2631.87 samples/sec   Loss 6.8098   LearningRate 0.0251   Epoch: 9   Global Step: 413900   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:13:58,997-Speed 2628.93 samples/sec   Loss 6.6288   LearningRate 0.0251   Epoch: 9   Global Step: 413910   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:02,893-Speed 2628.92 samples/sec   Loss 6.7317   LearningRate 0.0251   Epoch: 9   Global Step: 413920   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:14:06,788-Speed 2629.66 samples/sec   Loss 6.6745   LearningRate 0.0251   Epoch: 9   Global Step: 413930   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:14:10,698-Speed 2619.73 samples/sec   Loss 6.6606   LearningRate 0.0251   Epoch: 9   Global Step: 413940   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:14:14,605-Speed 2621.08 samples/sec   Loss 6.6811   LearningRate 0.0251   Epoch: 9   Global Step: 413950   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:14:18,506-Speed 2625.19 samples/sec   Loss 6.6437   LearningRate 0.0251   Epoch: 9   Global Step: 413960   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:14:22,384-Speed 2641.08 samples/sec   Loss 6.8417   LearningRate 0.0251   Epoch: 9   Global Step: 413970   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:26,281-Speed 2628.72 samples/sec   Loss 6.5650   LearningRate 0.0251   Epoch: 9   Global Step: 413980   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:30,180-Speed 2627.36 samples/sec   Loss 6.7767   LearningRate 0.0251   Epoch: 9   Global Step: 413990   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:34,080-Speed 2625.79 samples/sec   Loss 6.6887   LearningRate 0.0251   Epoch: 9   Global Step: 414000   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:37,993-Speed 2617.51 samples/sec   Loss 6.6870   LearningRate 0.0251   Epoch: 9   Global Step: 414010   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:41,889-Speed 2628.92 samples/sec   Loss 6.5180   LearningRate 0.0251   Epoch: 9   Global Step: 414020   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:45,790-Speed 2625.50 samples/sec   Loss 6.6518   LearningRate 0.0251   Epoch: 9   Global Step: 414030   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:49,692-Speed 2624.77 samples/sec   Loss 6.7457   LearningRate 0.0251   Epoch: 9   Global Step: 414040   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:53,589-Speed 2628.42 samples/sec   Loss 6.7420   LearningRate 0.0251   Epoch: 9   Global Step: 414050   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:14:57,502-Speed 2617.36 samples/sec   Loss 6.7052   LearningRate 0.0251   Epoch: 9   Global Step: 414060   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:15:01,500-Speed 2561.95 samples/sec   Loss 6.6923   LearningRate 0.0251   Epoch: 9   Global Step: 414070   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:05,402-Speed 2624.74 samples/sec   Loss 6.7383   LearningRate 0.0251   Epoch: 9   Global Step: 414080   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:09,307-Speed 2623.39 samples/sec   Loss 6.7165   LearningRate 0.0251   Epoch: 9   Global Step: 414090   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:13,203-Speed 2628.81 samples/sec   Loss 6.6974   LearningRate 0.0251   Epoch: 9   Global Step: 414100   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:17,109-Speed 2621.71 samples/sec   Loss 6.7904   LearningRate 0.0251   Epoch: 9   Global Step: 414110   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:21,002-Speed 2630.97 samples/sec   Loss 6.7335   LearningRate 0.0251   Epoch: 9   Global Step: 414120   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:24,908-Speed 2622.66 samples/sec   Loss 6.7842   LearningRate 0.0251   Epoch: 9   Global Step: 414130   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:28,804-Speed 2629.46 samples/sec   Loss 6.7567   LearningRate 0.0251   Epoch: 9   Global Step: 414140   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:32,699-Speed 2629.12 samples/sec   Loss 6.7082   LearningRate 0.0251   Epoch: 9   Global Step: 414150   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:36,598-Speed 2626.64 samples/sec   Loss 6.7100   LearningRate 0.0251   Epoch: 9   Global Step: 414160   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:40,495-Speed 2628.10 samples/sec   Loss 6.7293   LearningRate 0.0251   Epoch: 9   Global Step: 414170   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:15:44,382-Speed 2635.18 samples/sec   Loss 6.7219   LearningRate 0.0251   Epoch: 9   Global Step: 414180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:48,281-Speed 2627.57 samples/sec   Loss 6.7135   LearningRate 0.0251   Epoch: 9   Global Step: 414190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:52,196-Speed 2616.09 samples/sec   Loss 6.7049   LearningRate 0.0251   Epoch: 9   Global Step: 414200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:15:56,104-Speed 2620.43 samples/sec   Loss 6.5370   LearningRate 0.0251   Epoch: 9   Global Step: 414210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:16:00,016-Speed 2618.71 samples/sec   Loss 6.5930   LearningRate 0.0251   Epoch: 9   Global Step: 414220   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:16:03,918-Speed 2624.76 samples/sec   Loss 6.8286   LearningRate 0.0251   Epoch: 9   Global Step: 414230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:16:07,840-Speed 2611.26 samples/sec   Loss 6.7025   LearningRate 0.0251   Epoch: 9   Global Step: 414240   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:16:11,745-Speed 2622.76 samples/sec   Loss 6.6693   LearningRate 0.0251   Epoch: 9   Global Step: 414250   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:16:15,664-Speed 2613.40 samples/sec   Loss 6.8120   LearningRate 0.0251   Epoch: 9   Global Step: 414260   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:16:19,542-Speed 2641.02 samples/sec   Loss 6.6262   LearningRate 0.0251   Epoch: 9   Global Step: 414270   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:23,452-Speed 2619.85 samples/sec   Loss 6.7175   LearningRate 0.0251   Epoch: 9   Global Step: 414280   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:27,354-Speed 2624.65 samples/sec   Loss 6.6788   LearningRate 0.0251   Epoch: 9   Global Step: 414290   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:31,269-Speed 2616.24 samples/sec   Loss 6.6065   LearningRate 0.0251   Epoch: 9   Global Step: 414300   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:35,173-Speed 2623.51 samples/sec   Loss 6.5991   LearningRate 0.0251   Epoch: 9   Global Step: 414310   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:39,073-Speed 2626.18 samples/sec   Loss 6.6846   LearningRate 0.0251   Epoch: 9   Global Step: 414320   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:42,992-Speed 2613.37 samples/sec   Loss 6.5908   LearningRate 0.0251   Epoch: 9   Global Step: 414330   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:46,888-Speed 2628.69 samples/sec   Loss 6.6741   LearningRate 0.0251   Epoch: 9   Global Step: 414340   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:50,814-Speed 2608.93 samples/sec   Loss 6.6204   LearningRate 0.0251   Epoch: 9   Global Step: 414350   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:54,736-Speed 2611.75 samples/sec   Loss 6.5965   LearningRate 0.0251   Epoch: 9   Global Step: 414360   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:16:58,639-Speed 2624.19 samples/sec   Loss 6.7445   LearningRate 0.0251   Epoch: 9   Global Step: 414370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:17:02,555-Speed 2615.71 samples/sec   Loss 6.7108   LearningRate 0.0250   Epoch: 9   Global Step: 414380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:17:06,459-Speed 2623.25 samples/sec   Loss 6.6858   LearningRate 0.0250   Epoch: 9   Global Step: 414390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:17:10,367-Speed 2620.69 samples/sec   Loss 6.6747   LearningRate 0.0250   Epoch: 9   Global Step: 414400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:17:14,276-Speed 2620.46 samples/sec   Loss 6.7741   LearningRate 0.0250   Epoch: 9   Global Step: 414410   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:17:18,173-Speed 2628.35 samples/sec   Loss 6.6673   LearningRate 0.0250   Epoch: 9   Global Step: 414420   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:17:22,078-Speed 2622.93 samples/sec   Loss 6.7688   LearningRate 0.0250   Epoch: 9   Global Step: 414430   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:17:25,960-Speed 2638.20 samples/sec   Loss 6.6524   LearningRate 0.0250   Epoch: 9   Global Step: 414440   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:29,861-Speed 2625.66 samples/sec   Loss 6.6150   LearningRate 0.0250   Epoch: 9   Global Step: 414450   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:33,769-Speed 2620.79 samples/sec   Loss 6.6135   LearningRate 0.0250   Epoch: 9   Global Step: 414460   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:37,674-Speed 2622.53 samples/sec   Loss 6.5843   LearningRate 0.0250   Epoch: 9   Global Step: 414470   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:41,572-Speed 2627.78 samples/sec   Loss 6.6313   LearningRate 0.0250   Epoch: 9   Global Step: 414480   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:45,470-Speed 2627.71 samples/sec   Loss 6.6590   LearningRate 0.0250   Epoch: 9   Global Step: 414490   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:49,369-Speed 2626.90 samples/sec   Loss 6.5704   LearningRate 0.0250   Epoch: 9   Global Step: 414500   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:53,277-Speed 2621.09 samples/sec   Loss 6.6903   LearningRate 0.0250   Epoch: 9   Global Step: 414510   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:17:57,194-Speed 2614.87 samples/sec   Loss 6.5944   LearningRate 0.0250   Epoch: 9   Global Step: 414520   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:01,084-Speed 2633.09 samples/sec   Loss 6.7016   LearningRate 0.0250   Epoch: 9   Global Step: 414530   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:05,004-Speed 2612.72 samples/sec   Loss 6.8025   LearningRate 0.0250   Epoch: 9   Global Step: 414540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:18:08,879-Speed 2642.96 samples/sec   Loss 6.6949   LearningRate 0.0250   Epoch: 9   Global Step: 414550   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:12,777-Speed 2627.61 samples/sec   Loss 6.7361   LearningRate 0.0250   Epoch: 9   Global Step: 414560   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:16,677-Speed 2626.14 samples/sec   Loss 6.7119   LearningRate 0.0250   Epoch: 9   Global Step: 414570   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:20,578-Speed 2625.72 samples/sec   Loss 6.6936   LearningRate 0.0250   Epoch: 9   Global Step: 414580   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:24,478-Speed 2626.15 samples/sec   Loss 6.7140   LearningRate 0.0250   Epoch: 9   Global Step: 414590   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:28,370-Speed 2631.58 samples/sec   Loss 6.7420   LearningRate 0.0250   Epoch: 9   Global Step: 414600   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:32,278-Speed 2621.51 samples/sec   Loss 6.7499   LearningRate 0.0250   Epoch: 9   Global Step: 414610   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:36,173-Speed 2629.46 samples/sec   Loss 6.7276   LearningRate 0.0250   Epoch: 9   Global Step: 414620   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:40,070-Speed 2627.89 samples/sec   Loss 6.6017   LearningRate 0.0250   Epoch: 9   Global Step: 414630   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:43,968-Speed 2627.50 samples/sec   Loss 6.6959   LearningRate 0.0250   Epoch: 9   Global Step: 414640   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:18:47,865-Speed 2628.34 samples/sec   Loss 6.7079   LearningRate 0.0250   Epoch: 9   Global Step: 414650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:18:51,769-Speed 2623.53 samples/sec   Loss 6.8654   LearningRate 0.0250   Epoch: 9   Global Step: 414660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:18:55,665-Speed 2628.91 samples/sec   Loss 6.8122   LearningRate 0.0250   Epoch: 9   Global Step: 414670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:18:59,561-Speed 2628.79 samples/sec   Loss 6.6107   LearningRate 0.0250   Epoch: 9   Global Step: 414680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:19:03,431-Speed 2646.76 samples/sec   Loss 6.7077   LearningRate 0.0250   Epoch: 9   Global Step: 414690   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:07,322-Speed 2631.74 samples/sec   Loss 6.6601   LearningRate 0.0250   Epoch: 9   Global Step: 414700   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:11,222-Speed 2626.79 samples/sec   Loss 6.7466   LearningRate 0.0250   Epoch: 9   Global Step: 414710   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:15,114-Speed 2631.83 samples/sec   Loss 6.6423   LearningRate 0.0250   Epoch: 9   Global Step: 414720   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:19,009-Speed 2629.70 samples/sec   Loss 6.6232   LearningRate 0.0250   Epoch: 9   Global Step: 414730   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:22,929-Speed 2612.22 samples/sec   Loss 6.7433   LearningRate 0.0250   Epoch: 9   Global Step: 414740   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:26,832-Speed 2624.73 samples/sec   Loss 6.7222   LearningRate 0.0250   Epoch: 9   Global Step: 414750   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:30,746-Speed 2616.70 samples/sec   Loss 6.7550   LearningRate 0.0250   Epoch: 9   Global Step: 414760   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:34,640-Speed 2630.44 samples/sec   Loss 6.6741   LearningRate 0.0250   Epoch: 9   Global Step: 414770   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:38,550-Speed 2619.84 samples/sec   Loss 6.7562   LearningRate 0.0250   Epoch: 9   Global Step: 414780   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:19:42,443-Speed 2630.77 samples/sec   Loss 6.6415   LearningRate 0.0250   Epoch: 9   Global Step: 414790   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:20:03,911-Speed 477.00 samples/sec   Loss 6.6136   LearningRate 0.0250   Epoch: 10   Global Step: 414800   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:20:07,798-Speed 2635.51 samples/sec   Loss 6.7046   LearningRate 0.0250   Epoch: 10   Global Step: 414810   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:20:11,659-Speed 2653.07 samples/sec   Loss 6.6818   LearningRate 0.0250   Epoch: 10   Global Step: 414820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:15,540-Speed 2638.98 samples/sec   Loss 6.8428   LearningRate 0.0250   Epoch: 10   Global Step: 414830   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:19,438-Speed 2627.78 samples/sec   Loss 6.7264   LearningRate 0.0250   Epoch: 10   Global Step: 414840   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:23,322-Speed 2637.23 samples/sec   Loss 6.7687   LearningRate 0.0250   Epoch: 10   Global Step: 414850   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:27,212-Speed 2632.99 samples/sec   Loss 6.6144   LearningRate 0.0250   Epoch: 10   Global Step: 414860   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:31,102-Speed 2633.17 samples/sec   Loss 6.6380   LearningRate 0.0250   Epoch: 10   Global Step: 414870   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:34,995-Speed 2630.86 samples/sec   Loss 6.6934   LearningRate 0.0250   Epoch: 10   Global Step: 414880   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:38,884-Speed 2634.00 samples/sec   Loss 6.7166   LearningRate 0.0250   Epoch: 10   Global Step: 414890   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:42,782-Speed 2627.52 samples/sec   Loss 6.6576   LearningRate 0.0250   Epoch: 10   Global Step: 414900   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:46,723-Speed 2599.12 samples/sec   Loss 6.7692   LearningRate 0.0250   Epoch: 10   Global Step: 414910   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:20:50,612-Speed 2633.34 samples/sec   Loss 6.7700   LearningRate 0.0250   Epoch: 10   Global Step: 414920   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:20:54,516-Speed 2623.82 samples/sec   Loss 6.7246   LearningRate 0.0250   Epoch: 10   Global Step: 414930   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:20:58,410-Speed 2630.25 samples/sec   Loss 6.6584   LearningRate 0.0250   Epoch: 10   Global Step: 414940   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:21:02,309-Speed 2626.87 samples/sec   Loss 6.7072   LearningRate 0.0250   Epoch: 10   Global Step: 414950   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:21:06,217-Speed 2621.28 samples/sec   Loss 6.6743   LearningRate 0.0250   Epoch: 10   Global Step: 414960   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:21:10,122-Speed 2622.62 samples/sec   Loss 6.5546   LearningRate 0.0250   Epoch: 10   Global Step: 414970   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:21:14,025-Speed 2624.71 samples/sec   Loss 6.6345   LearningRate 0.0250   Epoch: 10   Global Step: 414980   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:21:18,039-Speed 2551.60 samples/sec   Loss 6.7359   LearningRate 0.0250   Epoch: 10   Global Step: 414990   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:21:22,167-Speed 2481.24 samples/sec   Loss 6.6081   LearningRate 0.0250   Epoch: 10   Global Step: 415000   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:21:26,120-Speed 2591.32 samples/sec   Loss 6.6717   LearningRate 0.0250   Epoch: 10   Global Step: 415010   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:30,018-Speed 2627.80 samples/sec   Loss 6.7200   LearningRate 0.0250   Epoch: 10   Global Step: 415020   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:33,919-Speed 2625.01 samples/sec   Loss 6.7822   LearningRate 0.0250   Epoch: 10   Global Step: 415030   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:37,827-Speed 2621.34 samples/sec   Loss 6.7206   LearningRate 0.0250   Epoch: 10   Global Step: 415040   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:41,729-Speed 2624.64 samples/sec   Loss 6.6701   LearningRate 0.0250   Epoch: 10   Global Step: 415050   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:45,630-Speed 2626.52 samples/sec   Loss 6.5811   LearningRate 0.0250   Epoch: 10   Global Step: 415060   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:49,533-Speed 2624.07 samples/sec   Loss 6.7182   LearningRate 0.0250   Epoch: 10   Global Step: 415070   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:53,432-Speed 2626.79 samples/sec   Loss 6.7163   LearningRate 0.0250   Epoch: 10   Global Step: 415080   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:21:57,328-Speed 2629.17 samples/sec   Loss 6.7075   LearningRate 0.0250   Epoch: 10   Global Step: 415090   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:22:01,224-Speed 2628.69 samples/sec   Loss 6.6656   LearningRate 0.0250   Epoch: 10   Global Step: 415100   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:22:05,119-Speed 2629.91 samples/sec   Loss 6.6575   LearningRate 0.0250   Epoch: 10   Global Step: 415110   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:09,019-Speed 2626.73 samples/sec   Loss 6.6409   LearningRate 0.0250   Epoch: 10   Global Step: 415120   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:12,973-Speed 2590.61 samples/sec   Loss 6.8056   LearningRate 0.0250   Epoch: 10   Global Step: 415130   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:16,876-Speed 2624.57 samples/sec   Loss 6.5898   LearningRate 0.0250   Epoch: 10   Global Step: 415140   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:20,784-Speed 2620.91 samples/sec   Loss 6.7034   LearningRate 0.0250   Epoch: 10   Global Step: 415150   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:24,717-Speed 2603.65 samples/sec   Loss 6.7823   LearningRate 0.0250   Epoch: 10   Global Step: 415160   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:28,632-Speed 2616.71 samples/sec   Loss 6.6345   LearningRate 0.0250   Epoch: 10   Global Step: 415170   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:32,528-Speed 2628.78 samples/sec   Loss 6.6079   LearningRate 0.0250   Epoch: 10   Global Step: 415180   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:36,433-Speed 2623.39 samples/sec   Loss 6.5879   LearningRate 0.0250   Epoch: 10   Global Step: 415190   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:40,329-Speed 2628.62 samples/sec   Loss 6.6128   LearningRate 0.0250   Epoch: 10   Global Step: 415200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:44,228-Speed 2627.47 samples/sec   Loss 6.6556   LearningRate 0.0249   Epoch: 10   Global Step: 415210   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:22:48,172-Speed 2596.95 samples/sec   Loss 6.6111   LearningRate 0.0249   Epoch: 10   Global Step: 415220   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:22:52,085-Speed 2617.59 samples/sec   Loss 6.5421   LearningRate 0.0249   Epoch: 10   Global Step: 415230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:56,035-Speed 2593.18 samples/sec   Loss 6.6298   LearningRate 0.0249   Epoch: 10   Global Step: 415240   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:22:59,914-Speed 2640.30 samples/sec   Loss 6.5906   LearningRate 0.0249   Epoch: 10   Global Step: 415250   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:03,812-Speed 2627.59 samples/sec   Loss 6.7154   LearningRate 0.0249   Epoch: 10   Global Step: 415260   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:07,734-Speed 2611.85 samples/sec   Loss 6.6078   LearningRate 0.0249   Epoch: 10   Global Step: 415270   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:11,627-Speed 2630.89 samples/sec   Loss 6.5437   LearningRate 0.0249   Epoch: 10   Global Step: 415280   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:15,524-Speed 2628.94 samples/sec   Loss 6.6054   LearningRate 0.0249   Epoch: 10   Global Step: 415290   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:19,421-Speed 2628.10 samples/sec   Loss 6.6635   LearningRate 0.0249   Epoch: 10   Global Step: 415300   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:23,315-Speed 2630.27 samples/sec   Loss 6.6438   LearningRate 0.0249   Epoch: 10   Global Step: 415310   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:27,219-Speed 2623.30 samples/sec   Loss 6.6085   LearningRate 0.0249   Epoch: 10   Global Step: 415320   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:31,145-Speed 2609.87 samples/sec   Loss 6.4879   LearningRate 0.0249   Epoch: 10   Global Step: 415330   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:35,049-Speed 2623.05 samples/sec   Loss 6.7571   LearningRate 0.0249   Epoch: 10   Global Step: 415340   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:23:38,986-Speed 2602.32 samples/sec   Loss 6.6994   LearningRate 0.0249   Epoch: 10   Global Step: 415350   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:23:42,933-Speed 2594.83 samples/sec   Loss 6.6087   LearningRate 0.0249   Epoch: 10   Global Step: 415360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:23:46,856-Speed 2611.20 samples/sec   Loss 6.7698   LearningRate 0.0249   Epoch: 10   Global Step: 415370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:23:50,750-Speed 2630.39 samples/sec   Loss 6.6361   LearningRate 0.0249   Epoch: 10   Global Step: 415380   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:23:54,659-Speed 2620.22 samples/sec   Loss 6.7704   LearningRate 0.0249   Epoch: 10   Global Step: 415390   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:23:58,562-Speed 2624.13 samples/sec   Loss 6.6950   LearningRate 0.0249   Epoch: 10   Global Step: 415400   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:24:02,463-Speed 2625.52 samples/sec   Loss 6.7378   LearningRate 0.0249   Epoch: 10   Global Step: 415410   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:24:06,361-Speed 2628.14 samples/sec   Loss 6.5895   LearningRate 0.0249   Epoch: 10   Global Step: 415420   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:24:10,259-Speed 2627.51 samples/sec   Loss 6.7282   LearningRate 0.0249   Epoch: 10   Global Step: 415430   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:24:14,159-Speed 2626.34 samples/sec   Loss 6.5110   LearningRate 0.0249   Epoch: 10   Global Step: 415440   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:24:18,059-Speed 2626.38 samples/sec   Loss 6.6229   LearningRate 0.0249   Epoch: 10   Global Step: 415450   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:24:21,954-Speed 2629.54 samples/sec   Loss 6.5709   LearningRate 0.0249   Epoch: 10   Global Step: 415460   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-04-14 18:24:25,833-Speed 2640.10 samples/sec   Loss 6.5086   LearningRate 0.0249   Epoch: 10   Global Step: 415470   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:24:29,728-Speed 2630.59 samples/sec   Loss 6.6290   LearningRate 0.0249   Epoch: 10   Global Step: 415480   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:24:33,610-Speed 2638.00 samples/sec   Loss 6.6571   LearningRate 0.0249   Epoch: 10   Global Step: 415490   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:24:37,504-Speed 2630.75 samples/sec   Loss 6.6028   LearningRate 0.0249   Epoch: 10   Global Step: 415500   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:24:41,410-Speed 2621.99 samples/sec   Loss 6.5420   LearningRate 0.0249   Epoch: 10   Global Step: 415510   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:24:45,307-Speed 2628.73 samples/sec   Loss 6.6470   LearningRate 0.0249   Epoch: 10   Global Step: 415520   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:24:49,206-Speed 2626.88 samples/sec   Loss 6.5743   LearningRate 0.0249   Epoch: 10   Global Step: 415530   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:24:53,118-Speed 2618.80 samples/sec   Loss 6.6906   LearningRate 0.0249   Epoch: 10   Global Step: 415540   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:24:57,012-Speed 2630.36 samples/sec   Loss 6.7334   LearningRate 0.0249   Epoch: 10   Global Step: 415550   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:25:00,925-Speed 2617.12 samples/sec   Loss 6.5558   LearningRate 0.0249   Epoch: 10   Global Step: 415560   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:25:04,824-Speed 2627.52 samples/sec   Loss 6.6484   LearningRate 0.0249   Epoch: 10   Global Step: 415570   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:25:08,718-Speed 2630.38 samples/sec   Loss 6.6741   LearningRate 0.0249   Epoch: 10   Global Step: 415580   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-04-14 18:25:12,614-Speed 2628.50 samples/sec   Loss 6.6756   LearningRate 0.0249   Epoch: 10   Global Step: 415590   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:16,633-Speed 2548.52 samples/sec   Loss 6.6100   LearningRate 0.0249   Epoch: 10   Global Step: 415600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:20,533-Speed 2626.38 samples/sec   Loss 6.6018   LearningRate 0.0249   Epoch: 10   Global Step: 415610   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:24,427-Speed 2630.95 samples/sec   Loss 6.6890   LearningRate 0.0249   Epoch: 10   Global Step: 415620   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:28,331-Speed 2622.94 samples/sec   Loss 6.7876   LearningRate 0.0249   Epoch: 10   Global Step: 415630   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:32,226-Speed 2630.23 samples/sec   Loss 6.6955   LearningRate 0.0249   Epoch: 10   Global Step: 415640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:36,123-Speed 2628.23 samples/sec   Loss 6.6654   LearningRate 0.0249   Epoch: 10   Global Step: 415650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:40,074-Speed 2592.50 samples/sec   Loss 6.6587   LearningRate 0.0249   Epoch: 10   Global Step: 415660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-04-14 18:25:43,969-Speed 2629.90 samples/sec   Loss 6.6755   LearningRate 0.0249   Epoch: 10   Global Step: 415670   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:25:47,884-Speed 2615.99 samples/sec   Loss 6.7907   LearningRate 0.0249   Epoch: 10   Global Step: 415680   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:25:51,790-Speed 2622.36 samples/sec   Loss 6.7129   LearningRate 0.0249   Epoch: 10   Global Step: 415690   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:25:55,691-Speed 2626.10 samples/sec   Loss 6.6212   LearningRate 0.0249   Epoch: 10   Global Step: 415700   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:25:59,595-Speed 2623.51 samples/sec   Loss 6.6619   LearningRate 0.0249   Epoch: 10   Global Step: 415710   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:03,499-Speed 2623.14 samples/sec   Loss 6.7675   LearningRate 0.0249   Epoch: 10   Global Step: 415720   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:07,399-Speed 2626.10 samples/sec   Loss 6.6359   LearningRate 0.0249   Epoch: 10   Global Step: 415730   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:11,302-Speed 2624.90 samples/sec   Loss 6.6521   LearningRate 0.0249   Epoch: 10   Global Step: 415740   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:15,195-Speed 2631.02 samples/sec   Loss 6.6767   LearningRate 0.0249   Epoch: 10   Global Step: 415750   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:19,110-Speed 2616.75 samples/sec   Loss 6.6509   LearningRate 0.0249   Epoch: 10   Global Step: 415760   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:23,015-Speed 2622.62 samples/sec   Loss 6.6780   LearningRate 0.0249   Epoch: 10   Global Step: 415770   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:26,941-Speed 2609.10 samples/sec   Loss 6.5860   LearningRate 0.0249   Epoch: 10   Global Step: 415780   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:26:30,866-Speed 2609.38 samples/sec   Loss 6.7352   LearningRate 0.0249   Epoch: 10   Global Step: 415790   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:26:34,775-Speed 2620.58 samples/sec   Loss 6.6400   LearningRate 0.0249   Epoch: 10   Global Step: 415800   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:26:38,688-Speed 2617.46 samples/sec   Loss 6.5194   LearningRate 0.0249   Epoch: 10   Global Step: 415810   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:26:42,588-Speed 2626.60 samples/sec   Loss 6.6929   LearningRate 0.0249   Epoch: 10   Global Step: 415820   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:26:46,487-Speed 2626.69 samples/sec   Loss 6.6300   LearningRate 0.0249   Epoch: 10   Global Step: 415830   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:26:50,376-Speed 2634.15 samples/sec   Loss 6.5883   LearningRate 0.0249   Epoch: 10   Global Step: 415840   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:54,366-Speed 2567.02 samples/sec   Loss 6.6801   LearningRate 0.0249   Epoch: 10   Global Step: 415850   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:26:58,298-Speed 2604.60 samples/sec   Loss 6.6002   LearningRate 0.0249   Epoch: 10   Global Step: 415860   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:02,214-Speed 2615.64 samples/sec   Loss 6.6339   LearningRate 0.0249   Epoch: 10   Global Step: 415870   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:06,111-Speed 2628.23 samples/sec   Loss 6.7221   LearningRate 0.0249   Epoch: 10   Global Step: 415880   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:10,011-Speed 2626.77 samples/sec   Loss 6.6368   LearningRate 0.0249   Epoch: 10   Global Step: 415890   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:13,910-Speed 2626.92 samples/sec   Loss 6.6334   LearningRate 0.0249   Epoch: 10   Global Step: 415900   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:17,813-Speed 2623.73 samples/sec   Loss 6.6400   LearningRate 0.0249   Epoch: 10   Global Step: 415910   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:21,721-Speed 2621.13 samples/sec   Loss 6.7128   LearningRate 0.0249   Epoch: 10   Global Step: 415920   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:25,623-Speed 2624.88 samples/sec   Loss 6.5775   LearningRate 0.0249   Epoch: 10   Global Step: 415930   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:27:29,542-Speed 2613.92 samples/sec   Loss 6.7040   LearningRate 0.0249   Epoch: 10   Global Step: 415940   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:27:33,438-Speed 2629.25 samples/sec   Loss 6.5469   LearningRate 0.0249   Epoch: 10   Global Step: 415950   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:27:37,345-Speed 2621.26 samples/sec   Loss 6.6215   LearningRate 0.0249   Epoch: 10   Global Step: 415960   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:27:41,242-Speed 2628.34 samples/sec   Loss 6.5305   LearningRate 0.0249   Epoch: 10   Global Step: 415970   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:27:45,138-Speed 2629.74 samples/sec   Loss 6.6872   LearningRate 0.0249   Epoch: 10   Global Step: 415980   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:27:49,033-Speed 2629.21 samples/sec   Loss 6.5547   LearningRate 0.0249   Epoch: 10   Global Step: 415990   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:27:52,933-Speed 2626.56 samples/sec   Loss 6.5374   LearningRate 0.0249   Epoch: 10   Global Step: 416000   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:27:56,834-Speed 2626.12 samples/sec   Loss 6.6368   LearningRate 0.0249   Epoch: 10   Global Step: 416010   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:28:00,713-Speed 2641.11 samples/sec   Loss 6.6637   LearningRate 0.0249   Epoch: 10   Global Step: 416020   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:04,645-Speed 2604.82 samples/sec   Loss 6.6745   LearningRate 0.0249   Epoch: 10   Global Step: 416030   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:08,564-Speed 2613.71 samples/sec   Loss 6.5566   LearningRate 0.0248   Epoch: 10   Global Step: 416040   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:12,464-Speed 2625.83 samples/sec   Loss 6.6394   LearningRate 0.0248   Epoch: 10   Global Step: 416050   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:16,364-Speed 2626.76 samples/sec   Loss 6.5397   LearningRate 0.0248   Epoch: 10   Global Step: 416060   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:20,274-Speed 2619.45 samples/sec   Loss 6.7181   LearningRate 0.0248   Epoch: 10   Global Step: 416070   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:24,183-Speed 2619.92 samples/sec   Loss 6.5740   LearningRate 0.0248   Epoch: 10   Global Step: 416080   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:28,089-Speed 2622.68 samples/sec   Loss 6.7217   LearningRate 0.0248   Epoch: 10   Global Step: 416090   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:31,984-Speed 2630.08 samples/sec   Loss 6.8088   LearningRate 0.0248   Epoch: 10   Global Step: 416100   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:35,880-Speed 2628.62 samples/sec   Loss 6.7379   LearningRate 0.0248   Epoch: 10   Global Step: 416110   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:28:39,776-Speed 2628.58 samples/sec   Loss 6.7041   LearningRate 0.0248   Epoch: 10   Global Step: 416120   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:28:43,670-Speed 2630.48 samples/sec   Loss 6.6069   LearningRate 0.0248   Epoch: 10   Global Step: 416130   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:28:47,569-Speed 2627.39 samples/sec   Loss 6.5702   LearningRate 0.0248   Epoch: 10   Global Step: 416140   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:28:51,470-Speed 2625.98 samples/sec   Loss 6.6822   LearningRate 0.0248   Epoch: 10   Global Step: 416150   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:28:55,390-Speed 2612.39 samples/sec   Loss 6.6728   LearningRate 0.0248   Epoch: 10   Global Step: 416160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:28:59,298-Speed 2621.63 samples/sec   Loss 6.6688   LearningRate 0.0248   Epoch: 10   Global Step: 416170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:03,261-Speed 2584.05 samples/sec   Loss 6.7796   LearningRate 0.0248   Epoch: 10   Global Step: 416180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:07,159-Speed 2628.32 samples/sec   Loss 6.6708   LearningRate 0.0248   Epoch: 10   Global Step: 416190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:11,056-Speed 2628.21 samples/sec   Loss 6.6369   LearningRate 0.0248   Epoch: 10   Global Step: 416200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:14,952-Speed 2629.05 samples/sec   Loss 6.6579   LearningRate 0.0248   Epoch: 10   Global Step: 416210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:18,838-Speed 2635.53 samples/sec   Loss 6.8030   LearningRate 0.0248   Epoch: 10   Global Step: 416220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:22,748-Speed 2619.66 samples/sec   Loss 6.6282   LearningRate 0.0248   Epoch: 10   Global Step: 416230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:26,660-Speed 2617.90 samples/sec   Loss 6.5540   LearningRate 0.0248   Epoch: 10   Global Step: 416240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:30,557-Speed 2628.95 samples/sec   Loss 6.5378   LearningRate 0.0248   Epoch: 10   Global Step: 416250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:29:34,441-Speed 2636.57 samples/sec   Loss 6.5806   LearningRate 0.0248   Epoch: 10   Global Step: 416260   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:29:38,366-Speed 2609.34 samples/sec   Loss 6.5760   LearningRate 0.0248   Epoch: 10   Global Step: 416270   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:29:42,279-Speed 2617.73 samples/sec   Loss 6.6190   LearningRate 0.0248   Epoch: 10   Global Step: 416280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:29:46,199-Speed 2613.07 samples/sec   Loss 6.7180   LearningRate 0.0248   Epoch: 10   Global Step: 416290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:29:50,104-Speed 2622.69 samples/sec   Loss 6.5895   LearningRate 0.0248   Epoch: 10   Global Step: 416300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:29:54,001-Speed 2628.67 samples/sec   Loss 6.6854   LearningRate 0.0248   Epoch: 10   Global Step: 416310   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:29:57,920-Speed 2613.34 samples/sec   Loss 6.5210   LearningRate 0.0248   Epoch: 10   Global Step: 416320   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:01,852-Speed 2605.55 samples/sec   Loss 6.5885   LearningRate 0.0248   Epoch: 10   Global Step: 416330   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:05,754-Speed 2625.03 samples/sec   Loss 6.7807   LearningRate 0.0248   Epoch: 10   Global Step: 416340   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:09,825-Speed 2515.85 samples/sec   Loss 6.6522   LearningRate 0.0248   Epoch: 10   Global Step: 416350   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:13,855-Speed 2541.75 samples/sec   Loss 6.6983   LearningRate 0.0248   Epoch: 10   Global Step: 416360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:30:17,729-Speed 2644.07 samples/sec   Loss 6.6636   LearningRate 0.0248   Epoch: 10   Global Step: 416370   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:21,624-Speed 2629.49 samples/sec   Loss 6.6870   LearningRate 0.0248   Epoch: 10   Global Step: 416380   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:25,525-Speed 2626.16 samples/sec   Loss 6.7320   LearningRate 0.0248   Epoch: 10   Global Step: 416390   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:29,424-Speed 2626.29 samples/sec   Loss 6.6857   LearningRate 0.0248   Epoch: 10   Global Step: 416400   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:33,321-Speed 2628.31 samples/sec   Loss 6.7141   LearningRate 0.0248   Epoch: 10   Global Step: 416410   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:37,247-Speed 2609.26 samples/sec   Loss 6.4895   LearningRate 0.0248   Epoch: 10   Global Step: 416420   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:41,148-Speed 2627.09 samples/sec   Loss 6.5241   LearningRate 0.0248   Epoch: 10   Global Step: 416430   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:45,041-Speed 2630.72 samples/sec   Loss 6.6109   LearningRate 0.0248   Epoch: 10   Global Step: 416440   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:48,950-Speed 2620.92 samples/sec   Loss 6.7097   LearningRate 0.0248   Epoch: 10   Global Step: 416450   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:52,845-Speed 2629.33 samples/sec   Loss 6.5920   LearningRate 0.0248   Epoch: 10   Global Step: 416460   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:30:56,738-Speed 2631.28 samples/sec   Loss 6.7307   LearningRate 0.0248   Epoch: 10   Global Step: 416470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:31:00,635-Speed 2628.44 samples/sec   Loss 6.6908   LearningRate 0.0248   Epoch: 10   Global Step: 416480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:31:04,510-Speed 2642.71 samples/sec   Loss 6.6524   LearningRate 0.0248   Epoch: 10   Global Step: 416490   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:08,405-Speed 2629.93 samples/sec   Loss 6.7374   LearningRate 0.0248   Epoch: 10   Global Step: 416500   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:12,304-Speed 2626.92 samples/sec   Loss 6.6040   LearningRate 0.0248   Epoch: 10   Global Step: 416510   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:16,200-Speed 2629.03 samples/sec   Loss 6.7025   LearningRate 0.0248   Epoch: 10   Global Step: 416520   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:20,097-Speed 2628.00 samples/sec   Loss 6.7064   LearningRate 0.0248   Epoch: 10   Global Step: 416530   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:23,993-Speed 2630.05 samples/sec   Loss 6.5739   LearningRate 0.0248   Epoch: 10   Global Step: 416540   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:27,891-Speed 2627.67 samples/sec   Loss 6.6303   LearningRate 0.0248   Epoch: 10   Global Step: 416550   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:31,783-Speed 2631.89 samples/sec   Loss 6.5515   LearningRate 0.0248   Epoch: 10   Global Step: 416560   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:35,676-Speed 2630.92 samples/sec   Loss 6.5896   LearningRate 0.0248   Epoch: 10   Global Step: 416570   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:39,571-Speed 2628.96 samples/sec   Loss 6.6604   LearningRate 0.0248   Epoch: 10   Global Step: 416580   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:31:43,479-Speed 2620.88 samples/sec   Loss 6.7212   LearningRate 0.0248   Epoch: 10   Global Step: 416590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:31:47,625-Speed 2470.58 samples/sec   Loss 6.6349   LearningRate 0.0248   Epoch: 10   Global Step: 416600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:31:51,558-Speed 2604.42 samples/sec   Loss 6.7308   LearningRate 0.0248   Epoch: 10   Global Step: 416610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:31:55,461-Speed 2624.55 samples/sec   Loss 6.5138   LearningRate 0.0248   Epoch: 10   Global Step: 416620   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:31:59,358-Speed 2628.85 samples/sec   Loss 6.7919   LearningRate 0.0248   Epoch: 10   Global Step: 416630   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:03,254-Speed 2628.64 samples/sec   Loss 6.6916   LearningRate 0.0248   Epoch: 10   Global Step: 416640   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:07,158-Speed 2623.18 samples/sec   Loss 6.6720   LearningRate 0.0248   Epoch: 10   Global Step: 416650   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:11,055-Speed 2628.50 samples/sec   Loss 6.5413   LearningRate 0.0248   Epoch: 10   Global Step: 416660   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:14,950-Speed 2629.72 samples/sec   Loss 6.6608   LearningRate 0.0248   Epoch: 10   Global Step: 416670   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:18,856-Speed 2622.01 samples/sec   Loss 6.6927   LearningRate 0.0248   Epoch: 10   Global Step: 416680   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:22,737-Speed 2639.42 samples/sec   Loss 6.5527   LearningRate 0.0248   Epoch: 10   Global Step: 416690   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:26,635-Speed 2627.82 samples/sec   Loss 6.6748   LearningRate 0.0248   Epoch: 10   Global Step: 416700   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:30,528-Speed 2631.25 samples/sec   Loss 6.5976   LearningRate 0.0248   Epoch: 10   Global Step: 416710   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:34,432-Speed 2623.25 samples/sec   Loss 6.6184   LearningRate 0.0248   Epoch: 10   Global Step: 416720   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:38,342-Speed 2619.45 samples/sec   Loss 6.6496   LearningRate 0.0248   Epoch: 10   Global Step: 416730   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:32:42,219-Speed 2641.68 samples/sec   Loss 6.7885   LearningRate 0.0248   Epoch: 10   Global Step: 416740   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:32:46,120-Speed 2626.17 samples/sec   Loss 6.6461   LearningRate 0.0248   Epoch: 10   Global Step: 416750   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:32:50,034-Speed 2616.90 samples/sec   Loss 6.7103   LearningRate 0.0248   Epoch: 10   Global Step: 416760   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:32:53,970-Speed 2603.08 samples/sec   Loss 6.6340   LearningRate 0.0248   Epoch: 10   Global Step: 416770   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:32:57,873-Speed 2624.13 samples/sec   Loss 6.6513   LearningRate 0.0248   Epoch: 10   Global Step: 416780   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:33:01,776-Speed 2624.31 samples/sec   Loss 6.6188   LearningRate 0.0248   Epoch: 10   Global Step: 416790   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:33:05,702-Speed 2608.50 samples/sec   Loss 6.7170   LearningRate 0.0248   Epoch: 10   Global Step: 416800   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:33:09,598-Speed 2628.96 samples/sec   Loss 6.6325   LearningRate 0.0248   Epoch: 10   Global Step: 416810   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:33:13,496-Speed 2628.18 samples/sec   Loss 6.5555   LearningRate 0.0248   Epoch: 10   Global Step: 416820   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:33:17,391-Speed 2629.22 samples/sec   Loss 6.5230   LearningRate 0.0248   Epoch: 10   Global Step: 416830   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:33:21,290-Speed 2626.77 samples/sec   Loss 6.6673   LearningRate 0.0248   Epoch: 10   Global Step: 416840   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:25,185-Speed 2630.21 samples/sec   Loss 6.5767   LearningRate 0.0248   Epoch: 10   Global Step: 416850   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:29,083-Speed 2627.63 samples/sec   Loss 6.6122   LearningRate 0.0248   Epoch: 10   Global Step: 416860   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:32,979-Speed 2629.28 samples/sec   Loss 6.7218   LearningRate 0.0247   Epoch: 10   Global Step: 416870   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:36,886-Speed 2621.60 samples/sec   Loss 6.5563   LearningRate 0.0247   Epoch: 10   Global Step: 416880   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:40,783-Speed 2627.62 samples/sec   Loss 6.5840   LearningRate 0.0247   Epoch: 10   Global Step: 416890   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:44,680-Speed 2628.18 samples/sec   Loss 6.7890   LearningRate 0.0247   Epoch: 10   Global Step: 416900   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:48,591-Speed 2619.41 samples/sec   Loss 6.5926   LearningRate 0.0247   Epoch: 10   Global Step: 416910   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:52,511-Speed 2613.50 samples/sec   Loss 6.5110   LearningRate 0.0247   Epoch: 10   Global Step: 416920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:33:56,406-Speed 2628.98 samples/sec   Loss 6.6914   LearningRate 0.0247   Epoch: 10   Global Step: 416930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:34:00,311-Speed 2623.51 samples/sec   Loss 6.5837   LearningRate 0.0247   Epoch: 10   Global Step: 416940   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:34:04,184-Speed 2644.53 samples/sec   Loss 6.6025   LearningRate 0.0247   Epoch: 10   Global Step: 416950   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:08,084-Speed 2625.51 samples/sec   Loss 6.6507   LearningRate 0.0247   Epoch: 10   Global Step: 416960   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:11,981-Speed 2628.49 samples/sec   Loss 6.5073   LearningRate 0.0247   Epoch: 10   Global Step: 416970   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:15,875-Speed 2630.89 samples/sec   Loss 6.5794   LearningRate 0.0247   Epoch: 10   Global Step: 416980   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:19,770-Speed 2629.59 samples/sec   Loss 6.6053   LearningRate 0.0247   Epoch: 10   Global Step: 416990   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:23,662-Speed 2631.15 samples/sec   Loss 6.7707   LearningRate 0.0247   Epoch: 10   Global Step: 417000   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:27,559-Speed 2628.81 samples/sec   Loss 6.5511   LearningRate 0.0247   Epoch: 10   Global Step: 417010   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:31,459-Speed 2625.70 samples/sec   Loss 6.5897   LearningRate 0.0247   Epoch: 10   Global Step: 417020   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:35,358-Speed 2627.09 samples/sec   Loss 6.7298   LearningRate 0.0247   Epoch: 10   Global Step: 417030   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:39,266-Speed 2621.27 samples/sec   Loss 6.7141   LearningRate 0.0247   Epoch: 10   Global Step: 417040   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:34:43,161-Speed 2629.62 samples/sec   Loss 6.4701   LearningRate 0.0247   Epoch: 10   Global Step: 417050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:34:47,052-Speed 2631.77 samples/sec   Loss 6.7462   LearningRate 0.0247   Epoch: 10   Global Step: 417060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:34:50,954-Speed 2625.44 samples/sec   Loss 6.6528   LearningRate 0.0247   Epoch: 10   Global Step: 417070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:34:54,879-Speed 2609.00 samples/sec   Loss 6.6112   LearningRate 0.0247   Epoch: 10   Global Step: 417080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:34:58,858-Speed 2574.57 samples/sec   Loss 6.4884   LearningRate 0.0247   Epoch: 10   Global Step: 417090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:02,763-Speed 2623.33 samples/sec   Loss 6.6113   LearningRate 0.0247   Epoch: 10   Global Step: 417100   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:06,662-Speed 2626.95 samples/sec   Loss 6.5154   LearningRate 0.0247   Epoch: 10   Global Step: 417110   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:10,583-Speed 2612.22 samples/sec   Loss 6.5593   LearningRate 0.0247   Epoch: 10   Global Step: 417120   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:14,483-Speed 2626.46 samples/sec   Loss 6.6875   LearningRate 0.0247   Epoch: 10   Global Step: 417130   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:18,378-Speed 2629.41 samples/sec   Loss 6.6284   LearningRate 0.0247   Epoch: 10   Global Step: 417140   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:22,275-Speed 2628.29 samples/sec   Loss 6.6844   LearningRate 0.0247   Epoch: 10   Global Step: 417150   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:35:26,157-Speed 2638.48 samples/sec   Loss 6.6413   LearningRate 0.0247   Epoch: 10   Global Step: 417160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:30,157-Speed 2560.94 samples/sec   Loss 6.5494   LearningRate 0.0247   Epoch: 10   Global Step: 417170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:34,057-Speed 2626.32 samples/sec   Loss 6.7277   LearningRate 0.0247   Epoch: 10   Global Step: 417180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:37,955-Speed 2627.48 samples/sec   Loss 6.6407   LearningRate 0.0247   Epoch: 10   Global Step: 417190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:41,854-Speed 2627.19 samples/sec   Loss 6.6892   LearningRate 0.0247   Epoch: 10   Global Step: 417200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:45,766-Speed 2618.47 samples/sec   Loss 6.7127   LearningRate 0.0247   Epoch: 10   Global Step: 417210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:49,664-Speed 2627.05 samples/sec   Loss 6.5685   LearningRate 0.0247   Epoch: 10   Global Step: 417220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:53,558-Speed 2631.07 samples/sec   Loss 6.7923   LearningRate 0.0247   Epoch: 10   Global Step: 417230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:35:57,452-Speed 2630.46 samples/sec   Loss 6.5299   LearningRate 0.0247   Epoch: 10   Global Step: 417240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:01,349-Speed 2627.90 samples/sec   Loss 6.7662   LearningRate 0.0247   Epoch: 10   Global Step: 417250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:05,233-Speed 2637.90 samples/sec   Loss 6.7153   LearningRate 0.0247   Epoch: 10   Global Step: 417260   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:09,150-Speed 2614.65 samples/sec   Loss 6.6850   LearningRate 0.0247   Epoch: 10   Global Step: 417270   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:13,045-Speed 2630.07 samples/sec   Loss 6.7318   LearningRate 0.0247   Epoch: 10   Global Step: 417280   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:16,955-Speed 2619.18 samples/sec   Loss 6.6782   LearningRate 0.0247   Epoch: 10   Global Step: 417290   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:21,050-Speed 2501.62 samples/sec   Loss 6.6563   LearningRate 0.0247   Epoch: 10   Global Step: 417300   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:25,087-Speed 2537.12 samples/sec   Loss 6.5767   LearningRate 0.0247   Epoch: 10   Global Step: 417310   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:28,984-Speed 2628.34 samples/sec   Loss 6.6048   LearningRate 0.0247   Epoch: 10   Global Step: 417320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:32,880-Speed 2629.20 samples/sec   Loss 6.7469   LearningRate 0.0247   Epoch: 10   Global Step: 417330   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:36,815-Speed 2602.90 samples/sec   Loss 6.7540   LearningRate 0.0247   Epoch: 10   Global Step: 417340   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:40,867-Speed 2527.90 samples/sec   Loss 6.6153   LearningRate 0.0247   Epoch: 10   Global Step: 417350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:36:44,770-Speed 2624.21 samples/sec   Loss 6.7268   LearningRate 0.0247   Epoch: 10   Global Step: 417360   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:36:48,634-Speed 2650.73 samples/sec   Loss 6.4912   LearningRate 0.0247   Epoch: 10   Global Step: 417370   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:36:52,534-Speed 2626.45 samples/sec   Loss 6.5449   LearningRate 0.0247   Epoch: 10   Global Step: 417380   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:36:56,434-Speed 2626.19 samples/sec   Loss 6.5476   LearningRate 0.0247   Epoch: 10   Global Step: 417390   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:00,345-Speed 2618.90 samples/sec   Loss 6.7309   LearningRate 0.0247   Epoch: 10   Global Step: 417400   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:04,269-Speed 2609.56 samples/sec   Loss 6.6664   LearningRate 0.0247   Epoch: 10   Global Step: 417410   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:08,170-Speed 2625.38 samples/sec   Loss 6.6325   LearningRate 0.0247   Epoch: 10   Global Step: 417420   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:12,070-Speed 2627.08 samples/sec   Loss 6.7016   LearningRate 0.0247   Epoch: 10   Global Step: 417430   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:15,976-Speed 2622.43 samples/sec   Loss 6.5952   LearningRate 0.0247   Epoch: 10   Global Step: 417440   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:19,874-Speed 2626.93 samples/sec   Loss 6.6002   LearningRate 0.0247   Epoch: 10   Global Step: 417450   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:23,776-Speed 2625.16 samples/sec   Loss 6.6118   LearningRate 0.0247   Epoch: 10   Global Step: 417460   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:37:27,679-Speed 2624.98 samples/sec   Loss 6.5146   LearningRate 0.0247   Epoch: 10   Global Step: 417470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:31,582-Speed 2624.36 samples/sec   Loss 6.6390   LearningRate 0.0247   Epoch: 10   Global Step: 417480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:35,496-Speed 2616.87 samples/sec   Loss 6.6002   LearningRate 0.0247   Epoch: 10   Global Step: 417490   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:39,396-Speed 2625.96 samples/sec   Loss 6.7179   LearningRate 0.0247   Epoch: 10   Global Step: 417500   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:43,293-Speed 2628.81 samples/sec   Loss 6.6642   LearningRate 0.0247   Epoch: 10   Global Step: 417510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:47,191-Speed 2627.57 samples/sec   Loss 6.6984   LearningRate 0.0247   Epoch: 10   Global Step: 417520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:51,089-Speed 2627.70 samples/sec   Loss 6.7630   LearningRate 0.0247   Epoch: 10   Global Step: 417530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:54,985-Speed 2629.38 samples/sec   Loss 6.6479   LearningRate 0.0247   Epoch: 10   Global Step: 417540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:37:58,881-Speed 2628.34 samples/sec   Loss 6.6685   LearningRate 0.0247   Epoch: 10   Global Step: 417550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:02,774-Speed 2631.74 samples/sec   Loss 6.4846   LearningRate 0.0247   Epoch: 10   Global Step: 417560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:06,672-Speed 2627.79 samples/sec   Loss 6.5841   LearningRate 0.0247   Epoch: 10   Global Step: 417570   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:38:10,553-Speed 2639.09 samples/sec   Loss 6.6014   LearningRate 0.0247   Epoch: 10   Global Step: 417580   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:14,448-Speed 2629.47 samples/sec   Loss 6.5487   LearningRate 0.0247   Epoch: 10   Global Step: 417590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:18,370-Speed 2612.21 samples/sec   Loss 6.6363   LearningRate 0.0247   Epoch: 10   Global Step: 417600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:22,267-Speed 2629.23 samples/sec   Loss 6.6368   LearningRate 0.0247   Epoch: 10   Global Step: 417610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:26,215-Speed 2594.13 samples/sec   Loss 6.6602   LearningRate 0.0247   Epoch: 10   Global Step: 417620   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:30,118-Speed 2625.09 samples/sec   Loss 6.6578   LearningRate 0.0247   Epoch: 10   Global Step: 417630   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:34,030-Speed 2618.31 samples/sec   Loss 6.4692   LearningRate 0.0247   Epoch: 10   Global Step: 417640   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:37,922-Speed 2631.81 samples/sec   Loss 6.6362   LearningRate 0.0247   Epoch: 10   Global Step: 417650   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:38:41,798-Speed 2642.03 samples/sec   Loss 6.6611   LearningRate 0.0247   Epoch: 10   Global Step: 417660   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:38:45,699-Speed 2632.08 samples/sec   Loss 6.6117   LearningRate 0.0247   Epoch: 10   Global Step: 417670   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:38:49,614-Speed 2616.02 samples/sec   Loss 6.6174   LearningRate 0.0247   Epoch: 10   Global Step: 417680   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:38:53,506-Speed 2631.98 samples/sec   Loss 6.5501   LearningRate 0.0247   Epoch: 10   Global Step: 417690   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:38:57,405-Speed 2627.20 samples/sec   Loss 6.6461   LearningRate 0.0247   Epoch: 10   Global Step: 417700   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:39:01,299-Speed 2630.32 samples/sec   Loss 6.6264   LearningRate 0.0246   Epoch: 10   Global Step: 417710   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:39:05,235-Speed 2602.42 samples/sec   Loss 6.6034   LearningRate 0.0246   Epoch: 10   Global Step: 417720   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:39:09,150-Speed 2615.82 samples/sec   Loss 6.6111   LearningRate 0.0246   Epoch: 10   Global Step: 417730   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:39:13,059-Speed 2620.27 samples/sec   Loss 6.6126   LearningRate 0.0246   Epoch: 10   Global Step: 417740   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:39:16,951-Speed 2631.36 samples/sec   Loss 6.5497   LearningRate 0.0246   Epoch: 10   Global Step: 417750   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:39:20,854-Speed 2624.51 samples/sec   Loss 6.5932   LearningRate 0.0246   Epoch: 10   Global Step: 417760   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:24,754-Speed 2626.82 samples/sec   Loss 6.7463   LearningRate 0.0246   Epoch: 10   Global Step: 417770   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:28,655-Speed 2625.22 samples/sec   Loss 6.6285   LearningRate 0.0246   Epoch: 10   Global Step: 417780   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:32,555-Speed 2626.59 samples/sec   Loss 6.6055   LearningRate 0.0246   Epoch: 10   Global Step: 417790   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:36,455-Speed 2626.28 samples/sec   Loss 6.5143   LearningRate 0.0246   Epoch: 10   Global Step: 417800   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:40,349-Speed 2629.71 samples/sec   Loss 6.5765   LearningRate 0.0246   Epoch: 10   Global Step: 417810   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:44,244-Speed 2630.06 samples/sec   Loss 6.6145   LearningRate 0.0246   Epoch: 10   Global Step: 417820   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:48,143-Speed 2626.73 samples/sec   Loss 6.5930   LearningRate 0.0246   Epoch: 10   Global Step: 417830   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:52,040-Speed 2628.81 samples/sec   Loss 6.6574   LearningRate 0.0246   Epoch: 10   Global Step: 417840   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:55,940-Speed 2626.56 samples/sec   Loss 6.6186   LearningRate 0.0246   Epoch: 10   Global Step: 417850   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:39:59,878-Speed 2600.87 samples/sec   Loss 6.6750   LearningRate 0.0246   Epoch: 10   Global Step: 417860   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:40:03,753-Speed 2643.20 samples/sec   Loss 6.5473   LearningRate 0.0246   Epoch: 10   Global Step: 417870   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:07,828-Speed 2513.38 samples/sec   Loss 6.5759   LearningRate 0.0246   Epoch: 10   Global Step: 417880   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:11,728-Speed 2626.14 samples/sec   Loss 6.5547   LearningRate 0.0246   Epoch: 10   Global Step: 417890   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:15,628-Speed 2626.52 samples/sec   Loss 6.5317   LearningRate 0.0246   Epoch: 10   Global Step: 417900   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:19,530-Speed 2625.18 samples/sec   Loss 6.6016   LearningRate 0.0246   Epoch: 10   Global Step: 417910   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:23,425-Speed 2629.51 samples/sec   Loss 6.5994   LearningRate 0.0246   Epoch: 10   Global Step: 417920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:27,330-Speed 2623.27 samples/sec   Loss 6.6535   LearningRate 0.0246   Epoch: 10   Global Step: 417930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:31,230-Speed 2626.16 samples/sec   Loss 6.6388   LearningRate 0.0246   Epoch: 10   Global Step: 417940   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:35,130-Speed 2626.27 samples/sec   Loss 6.5038   LearningRate 0.0246   Epoch: 10   Global Step: 417950   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:39,029-Speed 2626.78 samples/sec   Loss 6.7373   LearningRate 0.0246   Epoch: 10   Global Step: 417960   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:42,930-Speed 2625.15 samples/sec   Loss 6.5828   LearningRate 0.0246   Epoch: 10   Global Step: 417970   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:40:46,827-Speed 2628.91 samples/sec   Loss 6.6278   LearningRate 0.0246   Epoch: 10   Global Step: 417980   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:40:50,721-Speed 2629.63 samples/sec   Loss 6.6554   LearningRate 0.0246   Epoch: 10   Global Step: 417990   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:40:54,603-Speed 2639.39 samples/sec   Loss 6.6416   LearningRate 0.0246   Epoch: 10   Global Step: 418000   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:40:58,502-Speed 2626.93 samples/sec   Loss 6.6086   LearningRate 0.0246   Epoch: 10   Global Step: 418010   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:02,400-Speed 2627.20 samples/sec   Loss 6.5279   LearningRate 0.0246   Epoch: 10   Global Step: 418020   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:06,297-Speed 2628.44 samples/sec   Loss 6.5591   LearningRate 0.0246   Epoch: 10   Global Step: 418030   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:10,198-Speed 2625.21 samples/sec   Loss 6.6529   LearningRate 0.0246   Epoch: 10   Global Step: 418040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:14,094-Speed 2628.95 samples/sec   Loss 6.7021   LearningRate 0.0246   Epoch: 10   Global Step: 418050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:18,105-Speed 2554.23 samples/sec   Loss 6.5246   LearningRate 0.0246   Epoch: 10   Global Step: 418060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:22,006-Speed 2625.29 samples/sec   Loss 6.6388   LearningRate 0.0246   Epoch: 10   Global Step: 418070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:25,921-Speed 2616.66 samples/sec   Loss 6.6242   LearningRate 0.0246   Epoch: 10   Global Step: 418080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:29,822-Speed 2625.62 samples/sec   Loss 6.6560   LearningRate 0.0246   Epoch: 10   Global Step: 418090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:33,719-Speed 2628.30 samples/sec   Loss 6.7195   LearningRate 0.0246   Epoch: 10   Global Step: 418100   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:37,667-Speed 2594.79 samples/sec   Loss 6.6202   LearningRate 0.0246   Epoch: 10   Global Step: 418110   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:41,565-Speed 2627.38 samples/sec   Loss 6.7434   LearningRate 0.0246   Epoch: 10   Global Step: 418120   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:45,482-Speed 2614.67 samples/sec   Loss 6.6558   LearningRate 0.0246   Epoch: 10   Global Step: 418130   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:41:49,370-Speed 2634.63 samples/sec   Loss 6.7467   LearningRate 0.0246   Epoch: 10   Global Step: 418140   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:41:53,269-Speed 2627.12 samples/sec   Loss 6.5751   LearningRate 0.0246   Epoch: 10   Global Step: 418150   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:41:57,163-Speed 2630.02 samples/sec   Loss 6.6158   LearningRate 0.0246   Epoch: 10   Global Step: 418160   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:01,069-Speed 2622.58 samples/sec   Loss 6.5295   LearningRate 0.0246   Epoch: 10   Global Step: 418170   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:04,988-Speed 2613.28 samples/sec   Loss 6.7719   LearningRate 0.0246   Epoch: 10   Global Step: 418180   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:08,887-Speed 2627.22 samples/sec   Loss 6.6418   LearningRate 0.0246   Epoch: 10   Global Step: 418190   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:12,778-Speed 2632.00 samples/sec   Loss 6.6076   LearningRate 0.0246   Epoch: 10   Global Step: 418200   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:16,686-Speed 2621.72 samples/sec   Loss 6.6566   LearningRate 0.0246   Epoch: 10   Global Step: 418210   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:20,578-Speed 2631.01 samples/sec   Loss 6.4846   LearningRate 0.0246   Epoch: 10   Global Step: 418220   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:24,476-Speed 2627.95 samples/sec   Loss 6.6032   LearningRate 0.0246   Epoch: 10   Global Step: 418230   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:28,374-Speed 2627.54 samples/sec   Loss 6.7676   LearningRate 0.0246   Epoch: 10   Global Step: 418240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:42:32,272-Speed 2627.80 samples/sec   Loss 6.6852   LearningRate 0.0246   Epoch: 10   Global Step: 418250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:42:36,166-Speed 2630.35 samples/sec   Loss 6.6090   LearningRate 0.0246   Epoch: 10   Global Step: 418260   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:42:40,060-Speed 2629.72 samples/sec   Loss 6.6147   LearningRate 0.0246   Epoch: 10   Global Step: 418270   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:42:43,932-Speed 2646.07 samples/sec   Loss 6.6148   LearningRate 0.0246   Epoch: 10   Global Step: 418280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:47,824-Speed 2631.19 samples/sec   Loss 6.6057   LearningRate 0.0246   Epoch: 10   Global Step: 418290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:51,728-Speed 2625.82 samples/sec   Loss 6.6163   LearningRate 0.0246   Epoch: 10   Global Step: 418300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:55,625-Speed 2628.10 samples/sec   Loss 6.6191   LearningRate 0.0246   Epoch: 10   Global Step: 418310   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:42:59,542-Speed 2614.98 samples/sec   Loss 6.5388   LearningRate 0.0246   Epoch: 10   Global Step: 418320   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:03,455-Speed 2617.33 samples/sec   Loss 6.4827   LearningRate 0.0246   Epoch: 10   Global Step: 418330   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:07,356-Speed 2625.65 samples/sec   Loss 6.7702   LearningRate 0.0246   Epoch: 10   Global Step: 418340   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:11,262-Speed 2622.15 samples/sec   Loss 6.6353   LearningRate 0.0246   Epoch: 10   Global Step: 418350   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:15,158-Speed 2628.91 samples/sec   Loss 6.5665   LearningRate 0.0246   Epoch: 10   Global Step: 418360   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:19,076-Speed 2614.41 samples/sec   Loss 6.5983   LearningRate 0.0246   Epoch: 10   Global Step: 418370   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:23,148-Speed 2515.52 samples/sec   Loss 6.7388   LearningRate 0.0246   Epoch: 10   Global Step: 418380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:43:27,062-Speed 2617.15 samples/sec   Loss 6.4427   LearningRate 0.0246   Epoch: 10   Global Step: 418390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:43:30,958-Speed 2629.15 samples/sec   Loss 6.6220   LearningRate 0.0246   Epoch: 10   Global Step: 418400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:43:34,855-Speed 2628.10 samples/sec   Loss 6.5017   LearningRate 0.0246   Epoch: 10   Global Step: 418410   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:43:38,742-Speed 2634.81 samples/sec   Loss 6.5855   LearningRate 0.0246   Epoch: 10   Global Step: 418420   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:42,677-Speed 2603.32 samples/sec   Loss 6.6146   LearningRate 0.0246   Epoch: 10   Global Step: 418430   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:46,592-Speed 2616.58 samples/sec   Loss 6.6756   LearningRate 0.0246   Epoch: 10   Global Step: 418440   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:50,490-Speed 2627.74 samples/sec   Loss 6.5787   LearningRate 0.0246   Epoch: 10   Global Step: 418450   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:54,428-Speed 2600.93 samples/sec   Loss 6.5314   LearningRate 0.0246   Epoch: 10   Global Step: 418460   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:43:58,326-Speed 2627.60 samples/sec   Loss 6.6747   LearningRate 0.0246   Epoch: 10   Global Step: 418470   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:44:02,227-Speed 2625.54 samples/sec   Loss 6.6464   LearningRate 0.0246   Epoch: 10   Global Step: 418480   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:44:06,133-Speed 2622.40 samples/sec   Loss 6.7096   LearningRate 0.0246   Epoch: 10   Global Step: 418490   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:44:10,031-Speed 2627.41 samples/sec   Loss 6.5253   LearningRate 0.0246   Epoch: 10   Global Step: 418500   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:44:13,926-Speed 2630.07 samples/sec   Loss 6.5804   LearningRate 0.0246   Epoch: 10   Global Step: 418510   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:44:17,820-Speed 2630.81 samples/sec   Loss 6.7045   LearningRate 0.0246   Epoch: 10   Global Step: 418520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:21,715-Speed 2629.54 samples/sec   Loss 6.5726   LearningRate 0.0246   Epoch: 10   Global Step: 418530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:25,618-Speed 2624.87 samples/sec   Loss 6.6791   LearningRate 0.0246   Epoch: 10   Global Step: 418540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:29,525-Speed 2621.42 samples/sec   Loss 6.6605   LearningRate 0.0245   Epoch: 10   Global Step: 418550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:33,416-Speed 2631.84 samples/sec   Loss 6.6133   LearningRate 0.0245   Epoch: 10   Global Step: 418560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:37,315-Speed 2626.79 samples/sec   Loss 6.5184   LearningRate 0.0245   Epoch: 10   Global Step: 418570   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:41,216-Speed 2626.03 samples/sec   Loss 6.5624   LearningRate 0.0245   Epoch: 10   Global Step: 418580   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:45,111-Speed 2630.11 samples/sec   Loss 6.5725   LearningRate 0.0245   Epoch: 10   Global Step: 418590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:49,223-Speed 2490.57 samples/sec   Loss 6.6012   LearningRate 0.0245   Epoch: 10   Global Step: 418600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:53,139-Speed 2616.32 samples/sec   Loss 6.6612   LearningRate 0.0245   Epoch: 10   Global Step: 418610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:44:57,019-Speed 2640.64 samples/sec   Loss 6.5104   LearningRate 0.0245   Epoch: 10   Global Step: 418620   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:00,939-Speed 2612.82 samples/sec   Loss 6.5837   LearningRate 0.0245   Epoch: 10   Global Step: 418630   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:04,838-Speed 2627.25 samples/sec   Loss 6.5247   LearningRate 0.0245   Epoch: 10   Global Step: 418640   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:08,744-Speed 2622.03 samples/sec   Loss 6.5786   LearningRate 0.0245   Epoch: 10   Global Step: 418650   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:12,682-Speed 2600.75 samples/sec   Loss 6.6093   LearningRate 0.0245   Epoch: 10   Global Step: 418660   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:16,575-Speed 2631.02 samples/sec   Loss 6.5702   LearningRate 0.0245   Epoch: 10   Global Step: 418670   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:20,506-Speed 2605.95 samples/sec   Loss 6.5542   LearningRate 0.0245   Epoch: 10   Global Step: 418680   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:24,405-Speed 2626.92 samples/sec   Loss 6.5148   LearningRate 0.0245   Epoch: 10   Global Step: 418690   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:28,302-Speed 2629.51 samples/sec   Loss 6.5377   LearningRate 0.0245   Epoch: 10   Global Step: 418700   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:32,198-Speed 2629.17 samples/sec   Loss 6.6521   LearningRate 0.0245   Epoch: 10   Global Step: 418710   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:36,103-Speed 2622.67 samples/sec   Loss 6.6598   LearningRate 0.0245   Epoch: 10   Global Step: 418720   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:45:39,986-Speed 2637.57 samples/sec   Loss 6.6659   LearningRate 0.0245   Epoch: 10   Global Step: 418730   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:43,980-Speed 2564.29 samples/sec   Loss 6.6719   LearningRate 0.0245   Epoch: 10   Global Step: 418740   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:45:47,863-Speed 2637.98 samples/sec   Loss 6.5657   LearningRate 0.0245   Epoch: 10   Global Step: 418750   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:45:51,758-Speed 2630.03 samples/sec   Loss 6.6334   LearningRate 0.0245   Epoch: 10   Global Step: 418760   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:45:55,655-Speed 2628.53 samples/sec   Loss 6.6699   LearningRate 0.0245   Epoch: 10   Global Step: 418770   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:45:59,550-Speed 2630.03 samples/sec   Loss 6.6590   LearningRate 0.0245   Epoch: 10   Global Step: 418780   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:46:03,456-Speed 2621.95 samples/sec   Loss 6.5924   LearningRate 0.0245   Epoch: 10   Global Step: 418790   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:46:07,357-Speed 2625.40 samples/sec   Loss 6.5597   LearningRate 0.0245   Epoch: 10   Global Step: 418800   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:46:11,304-Speed 2595.41 samples/sec   Loss 6.5566   LearningRate 0.0245   Epoch: 10   Global Step: 418810   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:46:15,212-Speed 2621.22 samples/sec   Loss 6.6266   LearningRate 0.0245   Epoch: 10   Global Step: 418820   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:46:19,113-Speed 2625.63 samples/sec   Loss 6.4099   LearningRate 0.0245   Epoch: 10   Global Step: 418830   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:46:23,007-Speed 2630.94 samples/sec   Loss 6.6068   LearningRate 0.0245   Epoch: 10   Global Step: 418840   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:46:26,914-Speed 2621.57 samples/sec   Loss 6.6260   LearningRate 0.0245   Epoch: 10   Global Step: 418850   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:30,805-Speed 2632.56 samples/sec   Loss 6.6356   LearningRate 0.0245   Epoch: 10   Global Step: 418860   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:34,698-Speed 2631.30 samples/sec   Loss 6.5674   LearningRate 0.0245   Epoch: 10   Global Step: 418870   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:38,592-Speed 2630.24 samples/sec   Loss 6.6647   LearningRate 0.0245   Epoch: 10   Global Step: 418880   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:42,485-Speed 2630.47 samples/sec   Loss 6.7363   LearningRate 0.0245   Epoch: 10   Global Step: 418890   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:46,377-Speed 2632.15 samples/sec   Loss 6.6241   LearningRate 0.0245   Epoch: 10   Global Step: 418900   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:50,266-Speed 2633.69 samples/sec   Loss 6.7181   LearningRate 0.0245   Epoch: 10   Global Step: 418910   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:54,165-Speed 2627.31 samples/sec   Loss 6.5513   LearningRate 0.0245   Epoch: 10   Global Step: 418920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:46:58,077-Speed 2617.64 samples/sec   Loss 6.5812   LearningRate 0.0245   Epoch: 10   Global Step: 418930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:01,973-Speed 2629.68 samples/sec   Loss 6.6322   LearningRate 0.0245   Epoch: 10   Global Step: 418940   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:05,846-Speed 2644.25 samples/sec   Loss 6.5395   LearningRate 0.0245   Epoch: 10   Global Step: 418950   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:09,769-Speed 2610.95 samples/sec   Loss 6.6465   LearningRate 0.0245   Epoch: 10   Global Step: 418960   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:13,664-Speed 2629.29 samples/sec   Loss 6.5556   LearningRate 0.0245   Epoch: 10   Global Step: 418970   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:17,756-Speed 2503.70 samples/sec   Loss 6.5794   LearningRate 0.0245   Epoch: 10   Global Step: 418980   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:21,648-Speed 2631.63 samples/sec   Loss 6.6227   LearningRate 0.0245   Epoch: 10   Global Step: 418990   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:25,554-Speed 2622.28 samples/sec   Loss 6.5953   LearningRate 0.0245   Epoch: 10   Global Step: 419000   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:29,468-Speed 2616.89 samples/sec   Loss 6.5592   LearningRate 0.0245   Epoch: 10   Global Step: 419010   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:33,373-Speed 2623.31 samples/sec   Loss 6.6826   LearningRate 0.0245   Epoch: 10   Global Step: 419020   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:37,271-Speed 2627.23 samples/sec   Loss 6.6250   LearningRate 0.0245   Epoch: 10   Global Step: 419030   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:41,172-Speed 2625.27 samples/sec   Loss 6.5621   LearningRate 0.0245   Epoch: 10   Global Step: 419040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:45,048-Speed 2642.84 samples/sec   Loss 6.5211   LearningRate 0.0245   Epoch: 10   Global Step: 419050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:48,959-Speed 2619.19 samples/sec   Loss 6.5138   LearningRate 0.0245   Epoch: 10   Global Step: 419060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:52,857-Speed 2628.16 samples/sec   Loss 6.5490   LearningRate 0.0245   Epoch: 10   Global Step: 419070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:47:56,799-Speed 2598.40 samples/sec   Loss 6.5586   LearningRate 0.0245   Epoch: 10   Global Step: 419080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:48:00,700-Speed 2625.96 samples/sec   Loss 6.5660   LearningRate 0.0245   Epoch: 10   Global Step: 419090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:48:04,597-Speed 2628.16 samples/sec   Loss 6.6471   LearningRate 0.0245   Epoch: 10   Global Step: 419100   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:08,488-Speed 2632.05 samples/sec   Loss 6.5820   LearningRate 0.0245   Epoch: 10   Global Step: 419110   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:12,379-Speed 2632.30 samples/sec   Loss 6.5713   LearningRate 0.0245   Epoch: 10   Global Step: 419120   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:16,282-Speed 2624.89 samples/sec   Loss 6.5720   LearningRate 0.0245   Epoch: 10   Global Step: 419130   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:20,184-Speed 2624.90 samples/sec   Loss 6.4777   LearningRate 0.0245   Epoch: 10   Global Step: 419140   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:24,082-Speed 2627.68 samples/sec   Loss 6.5987   LearningRate 0.0245   Epoch: 10   Global Step: 419150   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:27,980-Speed 2628.03 samples/sec   Loss 6.6288   LearningRate 0.0245   Epoch: 10   Global Step: 419160   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:31,882-Speed 2624.85 samples/sec   Loss 6.5897   LearningRate 0.0245   Epoch: 10   Global Step: 419170   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:35,782-Speed 2626.39 samples/sec   Loss 6.5593   LearningRate 0.0245   Epoch: 10   Global Step: 419180   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:39,678-Speed 2629.08 samples/sec   Loss 6.5234   LearningRate 0.0245   Epoch: 10   Global Step: 419190   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:48:43,571-Speed 2630.70 samples/sec   Loss 6.4920   LearningRate 0.0245   Epoch: 10   Global Step: 419200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:48:47,480-Speed 2620.37 samples/sec   Loss 6.6288   LearningRate 0.0245   Epoch: 10   Global Step: 419210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:48:51,388-Speed 2621.61 samples/sec   Loss 6.6132   LearningRate 0.0245   Epoch: 10   Global Step: 419220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:48:55,304-Speed 2615.46 samples/sec   Loss 6.5851   LearningRate 0.0245   Epoch: 10   Global Step: 419230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:48:59,196-Speed 2631.52 samples/sec   Loss 6.6027   LearningRate 0.0245   Epoch: 10   Global Step: 419240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:03,092-Speed 2628.78 samples/sec   Loss 6.5625   LearningRate 0.0245   Epoch: 10   Global Step: 419250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:06,991-Speed 2627.64 samples/sec   Loss 6.6194   LearningRate 0.0245   Epoch: 10   Global Step: 419260   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:10,895-Speed 2623.20 samples/sec   Loss 6.5629   LearningRate 0.0245   Epoch: 10   Global Step: 419270   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:14,793-Speed 2627.84 samples/sec   Loss 6.6236   LearningRate 0.0245   Epoch: 10   Global Step: 419280   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:18,702-Speed 2620.32 samples/sec   Loss 6.5913   LearningRate 0.0245   Epoch: 10   Global Step: 419290   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:22,602-Speed 2626.46 samples/sec   Loss 6.5610   LearningRate 0.0245   Epoch: 10   Global Step: 419300   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:49:26,510-Speed 2621.58 samples/sec   Loss 6.6556   LearningRate 0.0245   Epoch: 10   Global Step: 419310   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:49:30,400-Speed 2632.79 samples/sec   Loss 6.5551   LearningRate 0.0245   Epoch: 10   Global Step: 419320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:34,308-Speed 2620.77 samples/sec   Loss 6.6848   LearningRate 0.0245   Epoch: 10   Global Step: 419330   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:38,203-Speed 2629.93 samples/sec   Loss 6.5926   LearningRate 0.0245   Epoch: 10   Global Step: 419340   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:42,103-Speed 2628.09 samples/sec   Loss 6.5479   LearningRate 0.0245   Epoch: 10   Global Step: 419350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:46,002-Speed 2626.49 samples/sec   Loss 6.6099   LearningRate 0.0245   Epoch: 10   Global Step: 419360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:49,921-Speed 2613.75 samples/sec   Loss 6.6058   LearningRate 0.0245   Epoch: 10   Global Step: 419370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:53,846-Speed 2609.66 samples/sec   Loss 6.6570   LearningRate 0.0244   Epoch: 10   Global Step: 419380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:49:57,751-Speed 2627.77 samples/sec   Loss 6.5952   LearningRate 0.0244   Epoch: 10   Global Step: 419390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:01,644-Speed 2630.64 samples/sec   Loss 6.5406   LearningRate 0.0244   Epoch: 10   Global Step: 419400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:05,551-Speed 2621.78 samples/sec   Loss 6.4914   LearningRate 0.0244   Epoch: 10   Global Step: 419410   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:09,429-Speed 2640.59 samples/sec   Loss 6.5656   LearningRate 0.0244   Epoch: 10   Global Step: 419420   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:13,343-Speed 2618.21 samples/sec   Loss 6.5124   LearningRate 0.0244   Epoch: 10   Global Step: 419430   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:17,242-Speed 2626.66 samples/sec   Loss 6.5278   LearningRate 0.0244   Epoch: 10   Global Step: 419440   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:21,141-Speed 2627.58 samples/sec   Loss 6.5523   LearningRate 0.0244   Epoch: 10   Global Step: 419450   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:25,043-Speed 2625.26 samples/sec   Loss 6.7447   LearningRate 0.0244   Epoch: 10   Global Step: 419460   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:28,940-Speed 2628.55 samples/sec   Loss 6.6282   LearningRate 0.0244   Epoch: 10   Global Step: 419470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:32,839-Speed 2626.91 samples/sec   Loss 6.6463   LearningRate 0.0244   Epoch: 10   Global Step: 419480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:36,741-Speed 2624.92 samples/sec   Loss 6.7194   LearningRate 0.0244   Epoch: 10   Global Step: 419490   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:40,640-Speed 2626.56 samples/sec   Loss 6.6751   LearningRate 0.0244   Epoch: 10   Global Step: 419500   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:44,615-Speed 2577.89 samples/sec   Loss 6.4907   LearningRate 0.0244   Epoch: 10   Global Step: 419510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:48,503-Speed 2634.16 samples/sec   Loss 6.5854   LearningRate 0.0244   Epoch: 10   Global Step: 419520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:52,419-Speed 2615.50 samples/sec   Loss 6.5552   LearningRate 0.0244   Epoch: 10   Global Step: 419530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:50:56,409-Speed 2567.52 samples/sec   Loss 6.5771   LearningRate 0.0244   Epoch: 10   Global Step: 419540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:00,314-Speed 2622.88 samples/sec   Loss 6.6371   LearningRate 0.0244   Epoch: 10   Global Step: 419550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:04,222-Speed 2620.53 samples/sec   Loss 6.5806   LearningRate 0.0244   Epoch: 10   Global Step: 419560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:08,126-Speed 2623.60 samples/sec   Loss 6.6160   LearningRate 0.0244   Epoch: 10   Global Step: 419570   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:12,021-Speed 2630.58 samples/sec   Loss 6.5803   LearningRate 0.0244   Epoch: 10   Global Step: 419580   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:15,932-Speed 2618.51 samples/sec   Loss 6.5221   LearningRate 0.0244   Epoch: 10   Global Step: 419590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:19,825-Speed 2631.27 samples/sec   Loss 6.5808   LearningRate 0.0244   Epoch: 10   Global Step: 419600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:23,740-Speed 2616.42 samples/sec   Loss 6.5996   LearningRate 0.0244   Epoch: 10   Global Step: 419610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:27,644-Speed 2623.94 samples/sec   Loss 6.6829   LearningRate 0.0244   Epoch: 10   Global Step: 419620   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:51:31,525-Speed 2639.27 samples/sec   Loss 6.5958   LearningRate 0.0244   Epoch: 10   Global Step: 419630   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:35,420-Speed 2629.58 samples/sec   Loss 6.6345   LearningRate 0.0244   Epoch: 10   Global Step: 419640   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:39,322-Speed 2624.72 samples/sec   Loss 6.5740   LearningRate 0.0244   Epoch: 10   Global Step: 419650   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:43,265-Speed 2597.28 samples/sec   Loss 6.5686   LearningRate 0.0244   Epoch: 10   Global Step: 419660   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:47,235-Speed 2580.04 samples/sec   Loss 6.5009   LearningRate 0.0244   Epoch: 10   Global Step: 419670   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:51,321-Speed 2506.98 samples/sec   Loss 6.6176   LearningRate 0.0244   Epoch: 10   Global Step: 419680   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:55,401-Speed 2510.28 samples/sec   Loss 6.5277   LearningRate 0.0244   Epoch: 10   Global Step: 419690   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:51:59,462-Speed 2521.96 samples/sec   Loss 6.5478   LearningRate 0.0244   Epoch: 10   Global Step: 419700   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:52:03,359-Speed 2628.76 samples/sec   Loss 6.5684   LearningRate 0.0244   Epoch: 10   Global Step: 419710   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:52:07,263-Speed 2623.56 samples/sec   Loss 6.6201   LearningRate 0.0244   Epoch: 10   Global Step: 419720   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:52:11,163-Speed 2625.51 samples/sec   Loss 6.4589   LearningRate 0.0244   Epoch: 10   Global Step: 419730   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:52:15,076-Speed 2618.30 samples/sec   Loss 6.6325   LearningRate 0.0244   Epoch: 10   Global Step: 419740   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:52:19,018-Speed 2598.45 samples/sec   Loss 6.4897   LearningRate 0.0244   Epoch: 10   Global Step: 419750   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:22,937-Speed 2613.96 samples/sec   Loss 6.6678   LearningRate 0.0244   Epoch: 10   Global Step: 419760   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:26,860-Speed 2610.67 samples/sec   Loss 6.6819   LearningRate 0.0244   Epoch: 10   Global Step: 419770   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:30,765-Speed 2623.47 samples/sec   Loss 6.6575   LearningRate 0.0244   Epoch: 10   Global Step: 419780   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:34,661-Speed 2629.06 samples/sec   Loss 6.6177   LearningRate 0.0244   Epoch: 10   Global Step: 419790   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:38,564-Speed 2623.80 samples/sec   Loss 6.5323   LearningRate 0.0244   Epoch: 10   Global Step: 419800   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:42,459-Speed 2629.69 samples/sec   Loss 6.5604   LearningRate 0.0244   Epoch: 10   Global Step: 419810   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:46,359-Speed 2626.21 samples/sec   Loss 6.6272   LearningRate 0.0244   Epoch: 10   Global Step: 419820   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:50,261-Speed 2624.56 samples/sec   Loss 6.5452   LearningRate 0.0244   Epoch: 10   Global Step: 419830   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:54,176-Speed 2616.59 samples/sec   Loss 6.5735   LearningRate 0.0244   Epoch: 10   Global Step: 419840   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:52:58,079-Speed 2624.25 samples/sec   Loss 6.6391   LearningRate 0.0244   Epoch: 10   Global Step: 419850   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:02,100-Speed 2547.66 samples/sec   Loss 6.6588   LearningRate 0.0244   Epoch: 10   Global Step: 419860   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:05,998-Speed 2627.44 samples/sec   Loss 6.6368   LearningRate 0.0244   Epoch: 10   Global Step: 419870   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:09,900-Speed 2624.97 samples/sec   Loss 6.6181   LearningRate 0.0244   Epoch: 10   Global Step: 419880   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:13,796-Speed 2628.64 samples/sec   Loss 6.6027   LearningRate 0.0244   Epoch: 10   Global Step: 419890   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:17,703-Speed 2623.11 samples/sec   Loss 6.5725   LearningRate 0.0244   Epoch: 10   Global Step: 419900   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:21,603-Speed 2626.08 samples/sec   Loss 6.6727   LearningRate 0.0244   Epoch: 10   Global Step: 419910   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:25,499-Speed 2629.76 samples/sec   Loss 6.5230   LearningRate 0.0244   Epoch: 10   Global Step: 419920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:29,397-Speed 2627.81 samples/sec   Loss 6.5886   LearningRate 0.0244   Epoch: 10   Global Step: 419930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:33,317-Speed 2613.12 samples/sec   Loss 6.5769   LearningRate 0.0244   Epoch: 10   Global Step: 419940   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:53:37,194-Speed 2641.74 samples/sec   Loss 6.5521   LearningRate 0.0244   Epoch: 10   Global Step: 419950   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:53:41,097-Speed 2624.61 samples/sec   Loss 6.5680   LearningRate 0.0244   Epoch: 10   Global Step: 419960   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:53:45,000-Speed 2623.80 samples/sec   Loss 6.6904   LearningRate 0.0244   Epoch: 10   Global Step: 419970   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:53:48,908-Speed 2621.10 samples/sec   Loss 6.4872   LearningRate 0.0244   Epoch: 10   Global Step: 419980   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:53:52,805-Speed 2628.38 samples/sec   Loss 6.4816   LearningRate 0.0244   Epoch: 10   Global Step: 419990   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:53:56,702-Speed 2628.73 samples/sec   Loss 6.5280   LearningRate 0.0244   Epoch: 10   Global Step: 420000   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:54:39,590-[lfw][420000]XNorm: 23.189145
Training: 2022-04-14 18:54:39,590-[lfw][420000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-14 18:54:39,591-[lfw][420000]Accuracy-Highest: 0.99783
Training: 2022-04-14 18:55:29,226-[cfp_fp][420000]XNorm: 21.710651
Training: 2022-04-14 18:55:29,227-[cfp_fp][420000]Accuracy-Flip: 0.98843+-0.00517
Training: 2022-04-14 18:55:29,228-[cfp_fp][420000]Accuracy-Highest: 0.98843
Training: 2022-04-14 18:56:12,483-[agedb_30][420000]XNorm: 23.156444
Training: 2022-04-14 18:56:12,484-[agedb_30][420000]Accuracy-Flip: 0.97733+-0.00429
Training: 2022-04-14 18:56:12,484-[agedb_30][420000]Accuracy-Highest: 0.97733
Training: 2022-04-14 18:56:16,363-Speed 73.32 samples/sec   Loss 6.5663   LearningRate 0.0244   Epoch: 10   Global Step: 420010   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:56:20,416-Speed 2527.20 samples/sec   Loss 6.6214   LearningRate 0.0244   Epoch: 10   Global Step: 420020   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:56:24,531-Speed 2488.92 samples/sec   Loss 6.5726   LearningRate 0.0244   Epoch: 10   Global Step: 420030   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:56:28,517-Speed 2569.80 samples/sec   Loss 6.7303   LearningRate 0.0244   Epoch: 10   Global Step: 420040   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:56:32,392-Speed 2643.20 samples/sec   Loss 6.5454   LearningRate 0.0244   Epoch: 10   Global Step: 420050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:56:36,297-Speed 2622.64 samples/sec   Loss 6.6500   LearningRate 0.0244   Epoch: 10   Global Step: 420060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:56:40,186-Speed 2634.92 samples/sec   Loss 6.5349   LearningRate 0.0244   Epoch: 10   Global Step: 420070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:56:44,236-Speed 2529.30 samples/sec   Loss 6.6165   LearningRate 0.0244   Epoch: 10   Global Step: 420080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:56:48,140-Speed 2623.42 samples/sec   Loss 6.5889   LearningRate 0.0244   Epoch: 10   Global Step: 420090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:56:52,061-Speed 2611.94 samples/sec   Loss 6.5852   LearningRate 0.0244   Epoch: 10   Global Step: 420100   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:56:55,978-Speed 2616.80 samples/sec   Loss 6.7120   LearningRate 0.0244   Epoch: 10   Global Step: 420110   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:56:59,882-Speed 2623.79 samples/sec   Loss 6.5262   LearningRate 0.0244   Epoch: 10   Global Step: 420120   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:03,791-Speed 2620.32 samples/sec   Loss 6.6043   LearningRate 0.0244   Epoch: 10   Global Step: 420130   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:07,693-Speed 2625.04 samples/sec   Loss 6.5641   LearningRate 0.0244   Epoch: 10   Global Step: 420140   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:11,609-Speed 2615.14 samples/sec   Loss 6.5982   LearningRate 0.0244   Epoch: 10   Global Step: 420150   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 18:57:15,524-Speed 2616.59 samples/sec   Loss 6.4839   LearningRate 0.0244   Epoch: 10   Global Step: 420160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:19,425-Speed 2625.67 samples/sec   Loss 6.6255   LearningRate 0.0244   Epoch: 10   Global Step: 420170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:23,329-Speed 2623.39 samples/sec   Loss 6.5437   LearningRate 0.0244   Epoch: 10   Global Step: 420180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:27,238-Speed 2620.53 samples/sec   Loss 6.5889   LearningRate 0.0244   Epoch: 10   Global Step: 420190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:31,158-Speed 2613.24 samples/sec   Loss 6.5834   LearningRate 0.0244   Epoch: 10   Global Step: 420200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:35,066-Speed 2620.59 samples/sec   Loss 6.5946   LearningRate 0.0244   Epoch: 10   Global Step: 420210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:57:38,960-Speed 2630.04 samples/sec   Loss 6.4654   LearningRate 0.0243   Epoch: 10   Global Step: 420220   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:57:42,873-Speed 2617.71 samples/sec   Loss 6.6003   LearningRate 0.0243   Epoch: 10   Global Step: 420230   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:57:46,773-Speed 2626.29 samples/sec   Loss 6.5167   LearningRate 0.0243   Epoch: 10   Global Step: 420240   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:57:50,755-Speed 2572.32 samples/sec   Loss 6.6565   LearningRate 0.0243   Epoch: 10   Global Step: 420250   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:57:54,667-Speed 2618.32 samples/sec   Loss 6.5951   LearningRate 0.0243   Epoch: 10   Global Step: 420260   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:57:58,600-Speed 2604.27 samples/sec   Loss 6.6246   LearningRate 0.0243   Epoch: 10   Global Step: 420270   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:58:02,524-Speed 2610.47 samples/sec   Loss 6.7312   LearningRate 0.0243   Epoch: 10   Global Step: 420280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:58:06,444-Speed 2612.89 samples/sec   Loss 6.5838   LearningRate 0.0243   Epoch: 10   Global Step: 420290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:58:10,363-Speed 2613.40 samples/sec   Loss 6.5225   LearningRate 0.0243   Epoch: 10   Global Step: 420300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:58:14,263-Speed 2626.11 samples/sec   Loss 6.7248   LearningRate 0.0243   Epoch: 10   Global Step: 420310   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:58:18,163-Speed 2626.02 samples/sec   Loss 6.6453   LearningRate 0.0243   Epoch: 10   Global Step: 420320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:22,072-Speed 2620.23 samples/sec   Loss 6.4056   LearningRate 0.0243   Epoch: 10   Global Step: 420330   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:25,979-Speed 2622.33 samples/sec   Loss 6.4951   LearningRate 0.0243   Epoch: 10   Global Step: 420340   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:29,916-Speed 2601.32 samples/sec   Loss 6.5814   LearningRate 0.0243   Epoch: 10   Global Step: 420350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:33,841-Speed 2609.35 samples/sec   Loss 6.5488   LearningRate 0.0243   Epoch: 10   Global Step: 420360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:37,762-Speed 2612.82 samples/sec   Loss 6.5444   LearningRate 0.0243   Epoch: 10   Global Step: 420370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:41,675-Speed 2617.78 samples/sec   Loss 6.5381   LearningRate 0.0243   Epoch: 10   Global Step: 420380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:45,588-Speed 2617.24 samples/sec   Loss 6.6701   LearningRate 0.0243   Epoch: 10   Global Step: 420390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 18:58:49,472-Speed 2637.23 samples/sec   Loss 6.6457   LearningRate 0.0243   Epoch: 10   Global Step: 420400   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:58:53,385-Speed 2617.15 samples/sec   Loss 6.5216   LearningRate 0.0243   Epoch: 10   Global Step: 420410   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:58:57,302-Speed 2615.34 samples/sec   Loss 6.5766   LearningRate 0.0243   Epoch: 10   Global Step: 420420   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:01,202-Speed 2625.61 samples/sec   Loss 6.5581   LearningRate 0.0243   Epoch: 10   Global Step: 420430   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:05,102-Speed 2626.33 samples/sec   Loss 6.4596   LearningRate 0.0243   Epoch: 10   Global Step: 420440   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:09,000-Speed 2627.81 samples/sec   Loss 6.5185   LearningRate 0.0243   Epoch: 10   Global Step: 420450   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:12,910-Speed 2619.72 samples/sec   Loss 6.5507   LearningRate 0.0243   Epoch: 10   Global Step: 420460   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:16,822-Speed 2618.02 samples/sec   Loss 6.6196   LearningRate 0.0243   Epoch: 10   Global Step: 420470   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:20,740-Speed 2614.58 samples/sec   Loss 6.5239   LearningRate 0.0243   Epoch: 10   Global Step: 420480   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:24,644-Speed 2623.76 samples/sec   Loss 6.5803   LearningRate 0.0243   Epoch: 10   Global Step: 420490   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:28,519-Speed 2643.39 samples/sec   Loss 6.4610   LearningRate 0.0243   Epoch: 10   Global Step: 420500   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:32,417-Speed 2627.49 samples/sec   Loss 6.4816   LearningRate 0.0243   Epoch: 10   Global Step: 420510   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:36,315-Speed 2627.75 samples/sec   Loss 6.5510   LearningRate 0.0243   Epoch: 10   Global Step: 420520   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:40,216-Speed 2625.92 samples/sec   Loss 6.4721   LearningRate 0.0243   Epoch: 10   Global Step: 420530   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:44,119-Speed 2623.62 samples/sec   Loss 6.6717   LearningRate 0.0243   Epoch: 10   Global Step: 420540   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:48,033-Speed 2617.36 samples/sec   Loss 6.5471   LearningRate 0.0243   Epoch: 10   Global Step: 420550   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:51,932-Speed 2627.33 samples/sec   Loss 6.5444   LearningRate 0.0243   Epoch: 10   Global Step: 420560   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:55,862-Speed 2606.10 samples/sec   Loss 6.5877   LearningRate 0.0243   Epoch: 10   Global Step: 420570   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 18:59:59,769-Speed 2621.23 samples/sec   Loss 6.6381   LearningRate 0.0243   Epoch: 10   Global Step: 420580   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:00:03,684-Speed 2617.08 samples/sec   Loss 6.5537   LearningRate 0.0243   Epoch: 10   Global Step: 420590   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:00:07,584-Speed 2625.64 samples/sec   Loss 6.6253   LearningRate 0.0243   Epoch: 10   Global Step: 420600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:11,507-Speed 2611.52 samples/sec   Loss 6.6168   LearningRate 0.0243   Epoch: 10   Global Step: 420610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:15,406-Speed 2626.59 samples/sec   Loss 6.4789   LearningRate 0.0243   Epoch: 10   Global Step: 420620   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:19,309-Speed 2624.67 samples/sec   Loss 6.4511   LearningRate 0.0243   Epoch: 10   Global Step: 420630   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:23,491-Speed 2449.36 samples/sec   Loss 6.5979   LearningRate 0.0243   Epoch: 10   Global Step: 420640   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:27,391-Speed 2625.95 samples/sec   Loss 6.5395   LearningRate 0.0243   Epoch: 10   Global Step: 420650   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:31,305-Speed 2616.92 samples/sec   Loss 6.5079   LearningRate 0.0243   Epoch: 10   Global Step: 420660   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:35,205-Speed 2626.40 samples/sec   Loss 6.5336   LearningRate 0.0243   Epoch: 10   Global Step: 420670   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:39,108-Speed 2624.53 samples/sec   Loss 6.5256   LearningRate 0.0243   Epoch: 10   Global Step: 420680   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:00:42,984-Speed 2642.72 samples/sec   Loss 6.6214   LearningRate 0.0243   Epoch: 10   Global Step: 420690   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:00:46,884-Speed 2626.75 samples/sec   Loss 6.6154   LearningRate 0.0243   Epoch: 10   Global Step: 420700   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:00:50,785-Speed 2625.25 samples/sec   Loss 6.6098   LearningRate 0.0243   Epoch: 10   Global Step: 420710   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:00:54,687-Speed 2625.47 samples/sec   Loss 6.5479   LearningRate 0.0243   Epoch: 10   Global Step: 420720   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:00:58,597-Speed 2619.46 samples/sec   Loss 6.5242   LearningRate 0.0243   Epoch: 10   Global Step: 420730   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:01:02,497-Speed 2626.24 samples/sec   Loss 6.5832   LearningRate 0.0243   Epoch: 10   Global Step: 420740   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:01:06,396-Speed 2626.69 samples/sec   Loss 6.4459   LearningRate 0.0243   Epoch: 10   Global Step: 420750   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:01:10,295-Speed 2627.52 samples/sec   Loss 6.6386   LearningRate 0.0243   Epoch: 10   Global Step: 420760   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:01:14,193-Speed 2627.30 samples/sec   Loss 6.6820   LearningRate 0.0243   Epoch: 10   Global Step: 420770   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:01:18,092-Speed 2626.97 samples/sec   Loss 6.5428   LearningRate 0.0243   Epoch: 10   Global Step: 420780   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:01:21,992-Speed 2627.43 samples/sec   Loss 6.6263   LearningRate 0.0243   Epoch: 10   Global Step: 420790   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:25,894-Speed 2624.68 samples/sec   Loss 6.6247   LearningRate 0.0243   Epoch: 10   Global Step: 420800   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:29,813-Speed 2613.92 samples/sec   Loss 6.6052   LearningRate 0.0243   Epoch: 10   Global Step: 420810   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:33,713-Speed 2626.24 samples/sec   Loss 6.6498   LearningRate 0.0243   Epoch: 10   Global Step: 420820   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:37,612-Speed 2626.78 samples/sec   Loss 6.5787   LearningRate 0.0243   Epoch: 10   Global Step: 420830   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:41,510-Speed 2627.80 samples/sec   Loss 6.5427   LearningRate 0.0243   Epoch: 10   Global Step: 420840   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:45,412-Speed 2624.73 samples/sec   Loss 6.5557   LearningRate 0.0243   Epoch: 10   Global Step: 420850   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:49,320-Speed 2620.71 samples/sec   Loss 6.6204   LearningRate 0.0243   Epoch: 10   Global Step: 420860   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:53,221-Speed 2626.10 samples/sec   Loss 6.5130   LearningRate 0.0243   Epoch: 10   Global Step: 420870   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:01:57,141-Speed 2612.79 samples/sec   Loss 6.6064   LearningRate 0.0243   Epoch: 10   Global Step: 420880   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:01,031-Speed 2633.67 samples/sec   Loss 6.6043   LearningRate 0.0243   Epoch: 10   Global Step: 420890   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:04,947-Speed 2615.67 samples/sec   Loss 6.4712   LearningRate 0.0243   Epoch: 10   Global Step: 420900   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:08,873-Speed 2608.66 samples/sec   Loss 6.5197   LearningRate 0.0243   Epoch: 10   Global Step: 420910   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:12,799-Speed 2608.83 samples/sec   Loss 6.6750   LearningRate 0.0243   Epoch: 10   Global Step: 420920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:16,715-Speed 2615.75 samples/sec   Loss 6.5373   LearningRate 0.0243   Epoch: 10   Global Step: 420930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:20,620-Speed 2622.80 samples/sec   Loss 6.5713   LearningRate 0.0243   Epoch: 10   Global Step: 420940   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:24,518-Speed 2627.66 samples/sec   Loss 6.6721   LearningRate 0.0243   Epoch: 10   Global Step: 420950   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:28,413-Speed 2629.98 samples/sec   Loss 6.6035   LearningRate 0.0243   Epoch: 10   Global Step: 420960   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:32,311-Speed 2627.61 samples/sec   Loss 6.5655   LearningRate 0.0243   Epoch: 10   Global Step: 420970   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:36,216-Speed 2622.55 samples/sec   Loss 6.5855   LearningRate 0.0243   Epoch: 10   Global Step: 420980   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:02:40,113-Speed 2628.25 samples/sec   Loss 6.4984   LearningRate 0.0243   Epoch: 10   Global Step: 420990   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:02:43,981-Speed 2648.54 samples/sec   Loss 6.6096   LearningRate 0.0243   Epoch: 10   Global Step: 421000   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:02:47,885-Speed 2623.21 samples/sec   Loss 6.5485   LearningRate 0.0243   Epoch: 10   Global Step: 421010   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:02:51,788-Speed 2624.41 samples/sec   Loss 6.6115   LearningRate 0.0243   Epoch: 10   Global Step: 421020   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:02:55,686-Speed 2627.60 samples/sec   Loss 6.7015   LearningRate 0.0243   Epoch: 10   Global Step: 421030   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:02:59,586-Speed 2626.58 samples/sec   Loss 6.5338   LearningRate 0.0243   Epoch: 10   Global Step: 421040   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:03:03,504-Speed 2614.41 samples/sec   Loss 6.5896   LearningRate 0.0243   Epoch: 10   Global Step: 421050   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:03:07,413-Speed 2620.21 samples/sec   Loss 6.6040   LearningRate 0.0242   Epoch: 10   Global Step: 421060   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:03:11,313-Speed 2626.11 samples/sec   Loss 6.7723   LearningRate 0.0242   Epoch: 10   Global Step: 421070   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:03:15,213-Speed 2626.15 samples/sec   Loss 6.6080   LearningRate 0.0242   Epoch: 10   Global Step: 421080   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:03:19,143-Speed 2606.35 samples/sec   Loss 6.6839   LearningRate 0.0242   Epoch: 10   Global Step: 421090   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:03:23,047-Speed 2624.53 samples/sec   Loss 6.5798   LearningRate 0.0242   Epoch: 10   Global Step: 421100   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:26,939-Speed 2631.45 samples/sec   Loss 6.5282   LearningRate 0.0242   Epoch: 10   Global Step: 421110   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:30,834-Speed 2630.31 samples/sec   Loss 6.6758   LearningRate 0.0242   Epoch: 10   Global Step: 421120   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:34,817-Speed 2571.32 samples/sec   Loss 6.5416   LearningRate 0.0242   Epoch: 10   Global Step: 421130   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:38,800-Speed 2571.58 samples/sec   Loss 6.4254   LearningRate 0.0242   Epoch: 10   Global Step: 421140   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:42,701-Speed 2625.08 samples/sec   Loss 6.6142   LearningRate 0.0242   Epoch: 10   Global Step: 421150   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:46,612-Speed 2619.18 samples/sec   Loss 6.5583   LearningRate 0.0242   Epoch: 10   Global Step: 421160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:50,515-Speed 2624.16 samples/sec   Loss 6.5205   LearningRate 0.0242   Epoch: 10   Global Step: 421170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:54,426-Speed 2618.75 samples/sec   Loss 6.7227   LearningRate 0.0242   Epoch: 10   Global Step: 421180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:03:58,325-Speed 2627.38 samples/sec   Loss 6.5832   LearningRate 0.0242   Epoch: 10   Global Step: 421190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:04:02,225-Speed 2626.48 samples/sec   Loss 6.5008   LearningRate 0.0242   Epoch: 10   Global Step: 421200   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:04:06,106-Speed 2639.04 samples/sec   Loss 6.5944   LearningRate 0.0242   Epoch: 10   Global Step: 421210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:04:10,025-Speed 2613.29 samples/sec   Loss 6.5534   LearningRate 0.0242   Epoch: 10   Global Step: 421220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:04:13,925-Speed 2625.94 samples/sec   Loss 6.6919   LearningRate 0.0242   Epoch: 10   Global Step: 421230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:04:17,827-Speed 2625.52 samples/sec   Loss 6.6298   LearningRate 0.0242   Epoch: 10   Global Step: 421240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:04:21,732-Speed 2623.13 samples/sec   Loss 6.5686   LearningRate 0.0242   Epoch: 10   Global Step: 421250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:04:25,654-Speed 2611.97 samples/sec   Loss 6.5166   LearningRate 0.0242   Epoch: 10   Global Step: 421260   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:04:29,528-Speed 2644.22 samples/sec   Loss 6.5259   LearningRate 0.0242   Epoch: 10   Global Step: 421270   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:04:33,424-Speed 2628.30 samples/sec   Loss 6.4926   LearningRate 0.0242   Epoch: 10   Global Step: 421280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:04:37,320-Speed 2629.41 samples/sec   Loss 6.4780   LearningRate 0.0242   Epoch: 10   Global Step: 421290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:04:41,224-Speed 2623.61 samples/sec   Loss 6.5036   LearningRate 0.0242   Epoch: 10   Global Step: 421300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:04:45,122-Speed 2627.61 samples/sec   Loss 6.4533   LearningRate 0.0242   Epoch: 10   Global Step: 421310   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:04:49,034-Speed 2618.54 samples/sec   Loss 6.5467   LearningRate 0.0242   Epoch: 10   Global Step: 421320   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:04:52,930-Speed 2628.56 samples/sec   Loss 6.6520   LearningRate 0.0242   Epoch: 10   Global Step: 421330   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:04:56,833-Speed 2624.73 samples/sec   Loss 6.5036   LearningRate 0.0242   Epoch: 10   Global Step: 421340   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:00,739-Speed 2621.55 samples/sec   Loss 6.5017   LearningRate 0.0242   Epoch: 10   Global Step: 421350   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:04,655-Speed 2615.59 samples/sec   Loss 6.5885   LearningRate 0.0242   Epoch: 10   Global Step: 421360   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:08,536-Speed 2639.41 samples/sec   Loss 6.5464   LearningRate 0.0242   Epoch: 10   Global Step: 421370   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:12,439-Speed 2624.54 samples/sec   Loss 6.5829   LearningRate 0.0242   Epoch: 10   Global Step: 421380   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:16,351-Speed 2618.27 samples/sec   Loss 6.5922   LearningRate 0.0242   Epoch: 10   Global Step: 421390   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:20,258-Speed 2621.56 samples/sec   Loss 6.5247   LearningRate 0.0242   Epoch: 10   Global Step: 421400   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:24,154-Speed 2628.91 samples/sec   Loss 6.6007   LearningRate 0.0242   Epoch: 10   Global Step: 421410   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:28,058-Speed 2623.19 samples/sec   Loss 6.6021   LearningRate 0.0242   Epoch: 10   Global Step: 421420   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:31,958-Speed 2626.34 samples/sec   Loss 6.6070   LearningRate 0.0242   Epoch: 10   Global Step: 421430   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:35,859-Speed 2625.90 samples/sec   Loss 6.5370   LearningRate 0.0242   Epoch: 10   Global Step: 421440   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:39,759-Speed 2626.13 samples/sec   Loss 6.4311   LearningRate 0.0242   Epoch: 10   Global Step: 421450   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:43,658-Speed 2627.38 samples/sec   Loss 6.5125   LearningRate 0.0242   Epoch: 10   Global Step: 421460   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:05:47,572-Speed 2616.75 samples/sec   Loss 6.5165   LearningRate 0.0242   Epoch: 10   Global Step: 421470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:05:51,503-Speed 2606.10 samples/sec   Loss 6.4745   LearningRate 0.0242   Epoch: 10   Global Step: 421480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:05:55,409-Speed 2623.15 samples/sec   Loss 6.5595   LearningRate 0.0242   Epoch: 10   Global Step: 421490   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:05:59,312-Speed 2624.43 samples/sec   Loss 6.5864   LearningRate 0.0242   Epoch: 10   Global Step: 421500   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:03,207-Speed 2629.05 samples/sec   Loss 6.5489   LearningRate 0.0242   Epoch: 10   Global Step: 421510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:07,106-Speed 2626.97 samples/sec   Loss 6.5676   LearningRate 0.0242   Epoch: 10   Global Step: 421520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:11,004-Speed 2628.20 samples/sec   Loss 6.4911   LearningRate 0.0242   Epoch: 10   Global Step: 421530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:14,899-Speed 2629.92 samples/sec   Loss 6.7070   LearningRate 0.0242   Epoch: 10   Global Step: 421540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:18,796-Speed 2628.37 samples/sec   Loss 6.4474   LearningRate 0.0242   Epoch: 10   Global Step: 421550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:22,698-Speed 2624.33 samples/sec   Loss 6.4984   LearningRate 0.0242   Epoch: 10   Global Step: 421560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:26,614-Speed 2615.47 samples/sec   Loss 6.5532   LearningRate 0.0242   Epoch: 10   Global Step: 421570   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:06:30,524-Speed 2619.31 samples/sec   Loss 6.5055   LearningRate 0.0242   Epoch: 10   Global Step: 421580   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:06:34,420-Speed 2629.26 samples/sec   Loss 6.6114   LearningRate 0.0242   Epoch: 10   Global Step: 421590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:38,326-Speed 2622.39 samples/sec   Loss 6.5081   LearningRate 0.0242   Epoch: 10   Global Step: 421600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:42,220-Speed 2630.70 samples/sec   Loss 6.4663   LearningRate 0.0242   Epoch: 10   Global Step: 421610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:06:46,097-Speed 2642.17 samples/sec   Loss 6.5950   LearningRate 0.0242   Epoch: 10   Global Step: 421620   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:06:49,998-Speed 2625.61 samples/sec   Loss 6.4475   LearningRate 0.0242   Epoch: 10   Global Step: 421630   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:06:53,893-Speed 2629.37 samples/sec   Loss 6.4295   LearningRate 0.0242   Epoch: 10   Global Step: 421640   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:06:57,788-Speed 2629.75 samples/sec   Loss 6.4516   LearningRate 0.0242   Epoch: 10   Global Step: 421650   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:01,699-Speed 2618.42 samples/sec   Loss 6.6409   LearningRate 0.0242   Epoch: 10   Global Step: 421660   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:05,605-Speed 2622.10 samples/sec   Loss 6.5823   LearningRate 0.0242   Epoch: 10   Global Step: 421670   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:09,503-Speed 2628.18 samples/sec   Loss 6.5089   LearningRate 0.0242   Epoch: 10   Global Step: 421680   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:13,421-Speed 2613.68 samples/sec   Loss 6.6614   LearningRate 0.0242   Epoch: 10   Global Step: 421690   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:17,318-Speed 2628.75 samples/sec   Loss 6.3498   LearningRate 0.0242   Epoch: 10   Global Step: 421700   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:21,217-Speed 2627.49 samples/sec   Loss 6.4938   LearningRate 0.0242   Epoch: 10   Global Step: 421710   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:25,137-Speed 2612.93 samples/sec   Loss 6.5563   LearningRate 0.0242   Epoch: 10   Global Step: 421720   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:07:29,036-Speed 2626.57 samples/sec   Loss 6.4648   LearningRate 0.0242   Epoch: 10   Global Step: 421730   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:07:32,944-Speed 2620.93 samples/sec   Loss 6.5229   LearningRate 0.0242   Epoch: 10   Global Step: 421740   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:36,853-Speed 2619.96 samples/sec   Loss 6.5089   LearningRate 0.0242   Epoch: 10   Global Step: 421750   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:40,755-Speed 2624.96 samples/sec   Loss 6.6020   LearningRate 0.0242   Epoch: 10   Global Step: 421760   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:44,654-Speed 2627.30 samples/sec   Loss 6.5735   LearningRate 0.0242   Epoch: 10   Global Step: 421770   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:48,553-Speed 2627.17 samples/sec   Loss 6.4835   LearningRate 0.0242   Epoch: 10   Global Step: 421780   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:52,560-Speed 2556.07 samples/sec   Loss 6.5155   LearningRate 0.0242   Epoch: 10   Global Step: 421790   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:07:56,454-Speed 2630.34 samples/sec   Loss 6.5172   LearningRate 0.0242   Epoch: 10   Global Step: 421800   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:08:00,353-Speed 2627.65 samples/sec   Loss 6.4864   LearningRate 0.0242   Epoch: 10   Global Step: 421810   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:08:04,250-Speed 2627.75 samples/sec   Loss 6.6247   LearningRate 0.0242   Epoch: 10   Global Step: 421820   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:08:08,158-Speed 2620.86 samples/sec   Loss 6.4807   LearningRate 0.0242   Epoch: 10   Global Step: 421830   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:08:12,059-Speed 2625.62 samples/sec   Loss 6.6187   LearningRate 0.0242   Epoch: 10   Global Step: 421840   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:15,962-Speed 2624.52 samples/sec   Loss 6.5301   LearningRate 0.0242   Epoch: 10   Global Step: 421850   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:19,859-Speed 2628.11 samples/sec   Loss 6.6390   LearningRate 0.0242   Epoch: 10   Global Step: 421860   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:23,765-Speed 2623.42 samples/sec   Loss 6.5646   LearningRate 0.0242   Epoch: 10   Global Step: 421870   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:27,675-Speed 2619.22 samples/sec   Loss 6.5707   LearningRate 0.0242   Epoch: 10   Global Step: 421880   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:31,585-Speed 2620.53 samples/sec   Loss 6.6874   LearningRate 0.0242   Epoch: 10   Global Step: 421890   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:35,480-Speed 2629.52 samples/sec   Loss 6.5161   LearningRate 0.0242   Epoch: 10   Global Step: 421900   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:39,379-Speed 2627.04 samples/sec   Loss 6.5138   LearningRate 0.0241   Epoch: 10   Global Step: 421910   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:43,284-Speed 2622.24 samples/sec   Loss 6.5859   LearningRate 0.0241   Epoch: 10   Global Step: 421920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:08:47,178-Speed 2630.69 samples/sec   Loss 6.4912   LearningRate 0.0241   Epoch: 10   Global Step: 421930   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:08:51,078-Speed 2626.21 samples/sec   Loss 6.5628   LearningRate 0.0241   Epoch: 10   Global Step: 421940   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:08:54,987-Speed 2620.31 samples/sec   Loss 6.5701   LearningRate 0.0241   Epoch: 10   Global Step: 421950   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:08:58,879-Speed 2632.23 samples/sec   Loss 6.5446   LearningRate 0.0241   Epoch: 10   Global Step: 421960   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:02,775-Speed 2628.79 samples/sec   Loss 6.5894   LearningRate 0.0241   Epoch: 10   Global Step: 421970   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:06,674-Speed 2626.63 samples/sec   Loss 6.5206   LearningRate 0.0241   Epoch: 10   Global Step: 421980   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:10,581-Speed 2621.99 samples/sec   Loss 6.5105   LearningRate 0.0241   Epoch: 10   Global Step: 421990   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:14,474-Speed 2631.29 samples/sec   Loss 6.5692   LearningRate 0.0241   Epoch: 10   Global Step: 422000   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:18,376-Speed 2624.23 samples/sec   Loss 6.6146   LearningRate 0.0241   Epoch: 10   Global Step: 422010   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:22,297-Speed 2613.18 samples/sec   Loss 6.6249   LearningRate 0.0241   Epoch: 10   Global Step: 422020   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:26,210-Speed 2617.11 samples/sec   Loss 6.5731   LearningRate 0.0241   Epoch: 10   Global Step: 422030   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:09:30,321-Speed 2491.95 samples/sec   Loss 6.5313   LearningRate 0.0241   Epoch: 10   Global Step: 422040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:09:34,214-Speed 2630.98 samples/sec   Loss 6.5840   LearningRate 0.0241   Epoch: 10   Global Step: 422050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:09:38,142-Speed 2607.25 samples/sec   Loss 6.6838   LearningRate 0.0241   Epoch: 10   Global Step: 422060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:09:42,038-Speed 2629.44 samples/sec   Loss 6.5746   LearningRate 0.0241   Epoch: 10   Global Step: 422070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:09:45,932-Speed 2630.72 samples/sec   Loss 6.5751   LearningRate 0.0241   Epoch: 10   Global Step: 422080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:09:49,810-Speed 2640.74 samples/sec   Loss 6.4154   LearningRate 0.0241   Epoch: 10   Global Step: 422090   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:53,710-Speed 2626.25 samples/sec   Loss 6.5361   LearningRate 0.0241   Epoch: 10   Global Step: 422100   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:09:57,618-Speed 2620.72 samples/sec   Loss 6.5893   LearningRate 0.0241   Epoch: 10   Global Step: 422110   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:01,522-Speed 2624.26 samples/sec   Loss 6.5104   LearningRate 0.0241   Epoch: 10   Global Step: 422120   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:05,423-Speed 2625.57 samples/sec   Loss 6.5686   LearningRate 0.0241   Epoch: 10   Global Step: 422130   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:09,341-Speed 2614.39 samples/sec   Loss 6.5693   LearningRate 0.0241   Epoch: 10   Global Step: 422140   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:13,249-Speed 2620.07 samples/sec   Loss 6.6213   LearningRate 0.0241   Epoch: 10   Global Step: 422150   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:17,347-Speed 2499.74 samples/sec   Loss 6.4355   LearningRate 0.0241   Epoch: 10   Global Step: 422160   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:21,348-Speed 2560.21 samples/sec   Loss 6.5321   LearningRate 0.0241   Epoch: 10   Global Step: 422170   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:25,244-Speed 2628.50 samples/sec   Loss 6.4815   LearningRate 0.0241   Epoch: 10   Global Step: 422180   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:10:29,143-Speed 2627.20 samples/sec   Loss 6.5168   LearningRate 0.0241   Epoch: 10   Global Step: 422190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:10:33,172-Speed 2542.25 samples/sec   Loss 6.3554   LearningRate 0.0241   Epoch: 10   Global Step: 422200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:10:37,070-Speed 2627.48 samples/sec   Loss 6.4946   LearningRate 0.0241   Epoch: 10   Global Step: 422210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:10:40,969-Speed 2627.50 samples/sec   Loss 6.5387   LearningRate 0.0241   Epoch: 10   Global Step: 422220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:10:44,867-Speed 2627.16 samples/sec   Loss 6.5329   LearningRate 0.0241   Epoch: 10   Global Step: 422230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:10:48,767-Speed 2626.51 samples/sec   Loss 6.5622   LearningRate 0.0241   Epoch: 10   Global Step: 422240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:10:52,668-Speed 2625.21 samples/sec   Loss 6.5025   LearningRate 0.0241   Epoch: 10   Global Step: 422250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:10:56,552-Speed 2637.66 samples/sec   Loss 6.6152   LearningRate 0.0241   Epoch: 10   Global Step: 422260   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:00,457-Speed 2622.53 samples/sec   Loss 6.4429   LearningRate 0.0241   Epoch: 10   Global Step: 422270   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:04,357-Speed 2626.11 samples/sec   Loss 6.6294   LearningRate 0.0241   Epoch: 10   Global Step: 422280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:08,258-Speed 2625.93 samples/sec   Loss 6.4557   LearningRate 0.0241   Epoch: 10   Global Step: 422290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:12,154-Speed 2628.96 samples/sec   Loss 6.4342   LearningRate 0.0241   Epoch: 10   Global Step: 422300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:16,052-Speed 2628.67 samples/sec   Loss 6.5969   LearningRate 0.0241   Epoch: 10   Global Step: 422310   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:19,955-Speed 2623.65 samples/sec   Loss 6.6176   LearningRate 0.0241   Epoch: 10   Global Step: 422320   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:23,863-Speed 2621.72 samples/sec   Loss 6.4873   LearningRate 0.0241   Epoch: 10   Global Step: 422330   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:27,764-Speed 2625.30 samples/sec   Loss 6.5074   LearningRate 0.0241   Epoch: 10   Global Step: 422340   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:31,661-Speed 2628.08 samples/sec   Loss 6.5973   LearningRate 0.0241   Epoch: 10   Global Step: 422350   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:11:35,558-Speed 2628.36 samples/sec   Loss 6.5269   LearningRate 0.0241   Epoch: 10   Global Step: 422360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:11:39,454-Speed 2629.37 samples/sec   Loss 6.6221   LearningRate 0.0241   Epoch: 10   Global Step: 422370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:11:43,369-Speed 2615.99 samples/sec   Loss 6.5818   LearningRate 0.0241   Epoch: 10   Global Step: 422380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:11:47,265-Speed 2629.32 samples/sec   Loss 6.4682   LearningRate 0.0241   Epoch: 10   Global Step: 422390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:11:51,195-Speed 2606.44 samples/sec   Loss 6.5417   LearningRate 0.0241   Epoch: 10   Global Step: 422400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:11:55,089-Speed 2630.28 samples/sec   Loss 6.5464   LearningRate 0.0241   Epoch: 10   Global Step: 422410   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:11:58,997-Speed 2620.92 samples/sec   Loss 6.5446   LearningRate 0.0241   Epoch: 10   Global Step: 422420   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:12:02,908-Speed 2618.95 samples/sec   Loss 6.5696   LearningRate 0.0241   Epoch: 10   Global Step: 422430   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:12:06,807-Speed 2626.51 samples/sec   Loss 6.5252   LearningRate 0.0241   Epoch: 10   Global Step: 422440   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:12:10,717-Speed 2619.55 samples/sec   Loss 6.5225   LearningRate 0.0241   Epoch: 10   Global Step: 422450   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:12:14,603-Speed 2636.11 samples/sec   Loss 6.7238   LearningRate 0.0241   Epoch: 10   Global Step: 422460   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:12:18,500-Speed 2628.77 samples/sec   Loss 6.5765   LearningRate 0.0241   Epoch: 10   Global Step: 422470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:12:22,371-Speed 2646.27 samples/sec   Loss 6.4887   LearningRate 0.0241   Epoch: 10   Global Step: 422480   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:26,304-Speed 2604.42 samples/sec   Loss 6.5326   LearningRate 0.0241   Epoch: 10   Global Step: 422490   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:30,215-Speed 2618.37 samples/sec   Loss 6.5411   LearningRate 0.0241   Epoch: 10   Global Step: 422500   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:34,116-Speed 2626.23 samples/sec   Loss 6.5506   LearningRate 0.0241   Epoch: 10   Global Step: 422510   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:38,015-Speed 2626.64 samples/sec   Loss 6.4540   LearningRate 0.0241   Epoch: 10   Global Step: 422520   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:41,913-Speed 2627.73 samples/sec   Loss 6.5279   LearningRate 0.0241   Epoch: 10   Global Step: 422530   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:45,812-Speed 2627.06 samples/sec   Loss 6.5640   LearningRate 0.0241   Epoch: 10   Global Step: 422540   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:49,707-Speed 2629.78 samples/sec   Loss 6.6413   LearningRate 0.0241   Epoch: 10   Global Step: 422550   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:53,618-Speed 2618.70 samples/sec   Loss 6.3966   LearningRate 0.0241   Epoch: 10   Global Step: 422560   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:12:57,516-Speed 2628.56 samples/sec   Loss 6.4529   LearningRate 0.0241   Epoch: 10   Global Step: 422570   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:01,409-Speed 2630.90 samples/sec   Loss 6.5476   LearningRate 0.0241   Epoch: 10   Global Step: 422580   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:13:05,307-Speed 2627.41 samples/sec   Loss 6.5982   LearningRate 0.0241   Epoch: 10   Global Step: 422590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:13:09,182-Speed 2642.89 samples/sec   Loss 6.5841   LearningRate 0.0241   Epoch: 10   Global Step: 422600   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:13,083-Speed 2626.07 samples/sec   Loss 6.5261   LearningRate 0.0241   Epoch: 10   Global Step: 422610   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:16,982-Speed 2626.91 samples/sec   Loss 6.5313   LearningRate 0.0241   Epoch: 10   Global Step: 422620   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:20,885-Speed 2624.27 samples/sec   Loss 6.4977   LearningRate 0.0241   Epoch: 10   Global Step: 422630   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:24,776-Speed 2632.63 samples/sec   Loss 6.4018   LearningRate 0.0241   Epoch: 10   Global Step: 422640   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:28,673-Speed 2628.60 samples/sec   Loss 6.5770   LearningRate 0.0241   Epoch: 10   Global Step: 422650   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:32,595-Speed 2611.59 samples/sec   Loss 6.5440   LearningRate 0.0241   Epoch: 10   Global Step: 422660   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:36,492-Speed 2628.16 samples/sec   Loss 6.5682   LearningRate 0.0241   Epoch: 10   Global Step: 422670   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:40,384-Speed 2631.74 samples/sec   Loss 6.5510   LearningRate 0.0241   Epoch: 10   Global Step: 422680   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:44,289-Speed 2622.65 samples/sec   Loss 6.5114   LearningRate 0.0241   Epoch: 10   Global Step: 422690   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:13:48,198-Speed 2620.11 samples/sec   Loss 6.4895   LearningRate 0.0241   Epoch: 10   Global Step: 422700   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:13:52,096-Speed 2627.55 samples/sec   Loss 6.6709   LearningRate 0.0241   Epoch: 10   Global Step: 422710   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:13:55,991-Speed 2630.32 samples/sec   Loss 6.4882   LearningRate 0.0241   Epoch: 10   Global Step: 422720   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:13:59,889-Speed 2627.31 samples/sec   Loss 6.4860   LearningRate 0.0241   Epoch: 10   Global Step: 422730   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:03,790-Speed 2625.89 samples/sec   Loss 6.5104   LearningRate 0.0241   Epoch: 10   Global Step: 422740   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:07,693-Speed 2624.03 samples/sec   Loss 6.5936   LearningRate 0.0240   Epoch: 10   Global Step: 422750   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:11,593-Speed 2626.20 samples/sec   Loss 6.4597   LearningRate 0.0240   Epoch: 10   Global Step: 422760   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:15,504-Speed 2618.66 samples/sec   Loss 6.4790   LearningRate 0.0240   Epoch: 10   Global Step: 422770   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:19,424-Speed 2612.71 samples/sec   Loss 6.6869   LearningRate 0.0240   Epoch: 10   Global Step: 422780   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:23,322-Speed 2628.33 samples/sec   Loss 6.6548   LearningRate 0.0240   Epoch: 10   Global Step: 422790   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:27,368-Speed 2531.42 samples/sec   Loss 6.6936   LearningRate 0.0240   Epoch: 10   Global Step: 422800   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:14:31,302-Speed 2603.61 samples/sec   Loss 6.5711   LearningRate 0.0240   Epoch: 10   Global Step: 422810   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:35,202-Speed 2625.95 samples/sec   Loss 6.5244   LearningRate 0.0240   Epoch: 10   Global Step: 422820   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:39,168-Speed 2582.47 samples/sec   Loss 6.5316   LearningRate 0.0240   Epoch: 10   Global Step: 422830   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:43,065-Speed 2628.16 samples/sec   Loss 6.5162   LearningRate 0.0240   Epoch: 10   Global Step: 422840   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:46,963-Speed 2627.95 samples/sec   Loss 6.5706   LearningRate 0.0240   Epoch: 10   Global Step: 422850   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:14:50,840-Speed 2641.73 samples/sec   Loss 6.5713   LearningRate 0.0240   Epoch: 10   Global Step: 422860   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:14:54,742-Speed 2625.36 samples/sec   Loss 6.4563   LearningRate 0.0240   Epoch: 10   Global Step: 422870   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:14:58,638-Speed 2628.85 samples/sec   Loss 6.5335   LearningRate 0.0240   Epoch: 10   Global Step: 422880   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:02,640-Speed 2559.21 samples/sec   Loss 6.6453   LearningRate 0.0240   Epoch: 10   Global Step: 422890   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:06,537-Speed 2628.62 samples/sec   Loss 6.4555   LearningRate 0.0240   Epoch: 10   Global Step: 422900   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:10,433-Speed 2629.13 samples/sec   Loss 6.5684   LearningRate 0.0240   Epoch: 10   Global Step: 422910   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:14,353-Speed 2612.42 samples/sec   Loss 6.5795   LearningRate 0.0240   Epoch: 10   Global Step: 422920   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:18,304-Speed 2592.70 samples/sec   Loss 6.5894   LearningRate 0.0240   Epoch: 10   Global Step: 422930   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:22,297-Speed 2565.26 samples/sec   Loss 6.5528   LearningRate 0.0240   Epoch: 10   Global Step: 422940   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:26,196-Speed 2627.27 samples/sec   Loss 6.5142   LearningRate 0.0240   Epoch: 10   Global Step: 422950   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:15:30,097-Speed 2625.53 samples/sec   Loss 6.4752   LearningRate 0.0240   Epoch: 10   Global Step: 422960   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:15:34,000-Speed 2623.85 samples/sec   Loss 6.4768   LearningRate 0.0240   Epoch: 10   Global Step: 422970   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:15:37,895-Speed 2629.98 samples/sec   Loss 6.5551   LearningRate 0.0240   Epoch: 10   Global Step: 422980   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:15:41,793-Speed 2627.65 samples/sec   Loss 6.4883   LearningRate 0.0240   Epoch: 10   Global Step: 422990   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:15:45,930-Speed 2476.18 samples/sec   Loss 6.6056   LearningRate 0.0240   Epoch: 10   Global Step: 423000   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:15:49,854-Speed 2609.91 samples/sec   Loss 6.6238   LearningRate 0.0240   Epoch: 10   Global Step: 423010   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:15:53,776-Speed 2611.70 samples/sec   Loss 6.4693   LearningRate 0.0240   Epoch: 10   Global Step: 423020   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:15:57,676-Speed 2626.02 samples/sec   Loss 6.4900   LearningRate 0.0240   Epoch: 10   Global Step: 423030   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:16:01,584-Speed 2621.03 samples/sec   Loss 6.5585   LearningRate 0.0240   Epoch: 10   Global Step: 423040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:16:05,498-Speed 2616.94 samples/sec   Loss 6.7279   LearningRate 0.0240   Epoch: 10   Global Step: 423050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:16:09,389-Speed 2632.09 samples/sec   Loss 6.5514   LearningRate 0.0240   Epoch: 10   Global Step: 423060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:16:13,278-Speed 2633.89 samples/sec   Loss 6.5285   LearningRate 0.0240   Epoch: 10   Global Step: 423070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:16:17,270-Speed 2565.94 samples/sec   Loss 6.4977   LearningRate 0.0240   Epoch: 10   Global Step: 423080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:16:21,175-Speed 2622.86 samples/sec   Loss 6.4719   LearningRate 0.0240   Epoch: 10   Global Step: 423090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:16:25,072-Speed 2628.62 samples/sec   Loss 6.4800   LearningRate 0.0240   Epoch: 10   Global Step: 423100   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:29,006-Speed 2603.03 samples/sec   Loss 6.4824   LearningRate 0.0240   Epoch: 10   Global Step: 423110   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:32,915-Speed 2620.68 samples/sec   Loss 6.4608   LearningRate 0.0240   Epoch: 10   Global Step: 423120   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:36,812-Speed 2628.60 samples/sec   Loss 6.4524   LearningRate 0.0240   Epoch: 10   Global Step: 423130   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:40,707-Speed 2629.52 samples/sec   Loss 6.4731   LearningRate 0.0240   Epoch: 10   Global Step: 423140   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:44,628-Speed 2611.89 samples/sec   Loss 6.5244   LearningRate 0.0240   Epoch: 10   Global Step: 423150   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:48,549-Speed 2612.46 samples/sec   Loss 6.5746   LearningRate 0.0240   Epoch: 10   Global Step: 423160   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:52,446-Speed 2628.54 samples/sec   Loss 6.5444   LearningRate 0.0240   Epoch: 10   Global Step: 423170   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:16:56,344-Speed 2627.98 samples/sec   Loss 6.4999   LearningRate 0.0240   Epoch: 10   Global Step: 423180   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:17:00,334-Speed 2566.72 samples/sec   Loss 6.5250   LearningRate 0.0240   Epoch: 10   Global Step: 423190   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:17:04,262-Speed 2608.18 samples/sec   Loss 6.4159   LearningRate 0.0240   Epoch: 10   Global Step: 423200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:08,161-Speed 2626.91 samples/sec   Loss 6.5176   LearningRate 0.0240   Epoch: 10   Global Step: 423210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:12,062-Speed 2625.55 samples/sec   Loss 6.5631   LearningRate 0.0240   Epoch: 10   Global Step: 423220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:15,964-Speed 2624.28 samples/sec   Loss 6.5915   LearningRate 0.0240   Epoch: 10   Global Step: 423230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:19,861-Speed 2628.91 samples/sec   Loss 6.6326   LearningRate 0.0240   Epoch: 10   Global Step: 423240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:23,770-Speed 2620.46 samples/sec   Loss 6.5148   LearningRate 0.0240   Epoch: 10   Global Step: 423250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:27,671-Speed 2625.97 samples/sec   Loss 6.5745   LearningRate 0.0240   Epoch: 10   Global Step: 423260   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:31,588-Speed 2614.28 samples/sec   Loss 6.6232   LearningRate 0.0240   Epoch: 10   Global Step: 423270   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:35,492-Speed 2623.90 samples/sec   Loss 6.5562   LearningRate 0.0240   Epoch: 10   Global Step: 423280   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:39,393-Speed 2625.58 samples/sec   Loss 6.4989   LearningRate 0.0240   Epoch: 10   Global Step: 423290   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:43,292-Speed 2627.05 samples/sec   Loss 6.4809   LearningRate 0.0240   Epoch: 10   Global Step: 423300   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:17:47,189-Speed 2628.51 samples/sec   Loss 6.5519   LearningRate 0.0240   Epoch: 10   Global Step: 423310   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:17:51,097-Speed 2620.91 samples/sec   Loss 6.5138   LearningRate 0.0240   Epoch: 10   Global Step: 423320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:54,996-Speed 2627.65 samples/sec   Loss 6.5935   LearningRate 0.0240   Epoch: 10   Global Step: 423330   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:17:58,891-Speed 2629.48 samples/sec   Loss 6.5290   LearningRate 0.0240   Epoch: 10   Global Step: 423340   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:02,790-Speed 2626.86 samples/sec   Loss 6.5709   LearningRate 0.0240   Epoch: 10   Global Step: 423350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:06,690-Speed 2625.74 samples/sec   Loss 6.5903   LearningRate 0.0240   Epoch: 10   Global Step: 423360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:10,597-Speed 2622.02 samples/sec   Loss 6.4302   LearningRate 0.0240   Epoch: 10   Global Step: 423370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:14,498-Speed 2625.46 samples/sec   Loss 6.5061   LearningRate 0.0240   Epoch: 10   Global Step: 423380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:18,402-Speed 2624.19 samples/sec   Loss 6.4613   LearningRate 0.0240   Epoch: 10   Global Step: 423390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:22,326-Speed 2609.57 samples/sec   Loss 6.4778   LearningRate 0.0240   Epoch: 10   Global Step: 423400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:26,220-Speed 2631.59 samples/sec   Loss 6.6814   LearningRate 0.0240   Epoch: 10   Global Step: 423410   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:30,146-Speed 2608.76 samples/sec   Loss 6.5140   LearningRate 0.0240   Epoch: 10   Global Step: 423420   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:18:34,022-Speed 2642.18 samples/sec   Loss 6.6688   LearningRate 0.0240   Epoch: 10   Global Step: 423430   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:37,945-Speed 2610.84 samples/sec   Loss 6.5296   LearningRate 0.0240   Epoch: 10   Global Step: 423440   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:41,855-Speed 2620.25 samples/sec   Loss 6.5494   LearningRate 0.0240   Epoch: 10   Global Step: 423450   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:45,764-Speed 2620.21 samples/sec   Loss 6.5478   LearningRate 0.0240   Epoch: 10   Global Step: 423460   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:49,673-Speed 2620.14 samples/sec   Loss 6.3881   LearningRate 0.0240   Epoch: 10   Global Step: 423470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:53,566-Speed 2631.03 samples/sec   Loss 6.5673   LearningRate 0.0240   Epoch: 10   Global Step: 423480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:18:57,460-Speed 2630.96 samples/sec   Loss 6.4304   LearningRate 0.0240   Epoch: 10   Global Step: 423490   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:01,357-Speed 2627.81 samples/sec   Loss 6.4544   LearningRate 0.0240   Epoch: 10   Global Step: 423500   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:05,358-Speed 2559.99 samples/sec   Loss 6.5216   LearningRate 0.0240   Epoch: 10   Global Step: 423510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:09,267-Speed 2619.67 samples/sec   Loss 6.4220   LearningRate 0.0240   Epoch: 10   Global Step: 423520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:13,143-Speed 2642.80 samples/sec   Loss 6.5680   LearningRate 0.0240   Epoch: 10   Global Step: 423530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:17,042-Speed 2626.96 samples/sec   Loss 6.5343   LearningRate 0.0240   Epoch: 10   Global Step: 423540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:20,941-Speed 2627.22 samples/sec   Loss 6.5660   LearningRate 0.0240   Epoch: 10   Global Step: 423550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:24,843-Speed 2625.26 samples/sec   Loss 6.5744   LearningRate 0.0240   Epoch: 10   Global Step: 423560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:28,742-Speed 2626.89 samples/sec   Loss 6.5547   LearningRate 0.0240   Epoch: 10   Global Step: 423570   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:32,639-Speed 2628.59 samples/sec   Loss 6.5086   LearningRate 0.0240   Epoch: 10   Global Step: 423580   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:36,533-Speed 2629.91 samples/sec   Loss 6.5418   LearningRate 0.0240   Epoch: 10   Global Step: 423590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:40,427-Speed 2630.05 samples/sec   Loss 6.4220   LearningRate 0.0239   Epoch: 10   Global Step: 423600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:44,322-Speed 2629.81 samples/sec   Loss 6.4559   LearningRate 0.0239   Epoch: 10   Global Step: 423610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:19:48,197-Speed 2643.14 samples/sec   Loss 6.5791   LearningRate 0.0239   Epoch: 10   Global Step: 423620   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:19:52,138-Speed 2599.21 samples/sec   Loss 6.5453   LearningRate 0.0239   Epoch: 10   Global Step: 423630   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:19:56,036-Speed 2627.42 samples/sec   Loss 6.5896   LearningRate 0.0239   Epoch: 10   Global Step: 423640   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:19:59,948-Speed 2618.80 samples/sec   Loss 6.4750   LearningRate 0.0239   Epoch: 10   Global Step: 423650   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:20:03,848-Speed 2626.36 samples/sec   Loss 6.5927   LearningRate 0.0239   Epoch: 10   Global Step: 423660   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:20:07,780-Speed 2604.94 samples/sec   Loss 6.6815   LearningRate 0.0239   Epoch: 10   Global Step: 423670   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:20:11,762-Speed 2572.19 samples/sec   Loss 6.5962   LearningRate 0.0239   Epoch: 10   Global Step: 423680   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:20:15,858-Speed 2501.04 samples/sec   Loss 6.6094   LearningRate 0.0239   Epoch: 10   Global Step: 423690   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:20:19,799-Speed 2598.67 samples/sec   Loss 6.6834   LearningRate 0.0239   Epoch: 10   Global Step: 423700   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:20:23,744-Speed 2596.37 samples/sec   Loss 6.5202   LearningRate 0.0239   Epoch: 10   Global Step: 423710   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:20:27,795-Speed 2528.15 samples/sec   Loss 6.3975   LearningRate 0.0239   Epoch: 10   Global Step: 423720   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:31,701-Speed 2622.93 samples/sec   Loss 6.6140   LearningRate 0.0239   Epoch: 10   Global Step: 423730   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:35,611-Speed 2618.98 samples/sec   Loss 6.5852   LearningRate 0.0239   Epoch: 10   Global Step: 423740   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:39,553-Speed 2598.53 samples/sec   Loss 6.5779   LearningRate 0.0239   Epoch: 10   Global Step: 423750   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:43,453-Speed 2626.45 samples/sec   Loss 6.4894   LearningRate 0.0239   Epoch: 10   Global Step: 423760   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:47,343-Speed 2633.65 samples/sec   Loss 6.5676   LearningRate 0.0239   Epoch: 10   Global Step: 423770   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:51,257-Speed 2616.84 samples/sec   Loss 6.4244   LearningRate 0.0239   Epoch: 10   Global Step: 423780   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:55,160-Speed 2625.01 samples/sec   Loss 6.4580   LearningRate 0.0239   Epoch: 10   Global Step: 423790   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:20:59,036-Speed 2642.47 samples/sec   Loss 6.6427   LearningRate 0.0239   Epoch: 10   Global Step: 423800   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:02,933-Speed 2628.89 samples/sec   Loss 6.4159   LearningRate 0.0239   Epoch: 10   Global Step: 423810   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:06,824-Speed 2631.70 samples/sec   Loss 6.5584   LearningRate 0.0239   Epoch: 10   Global Step: 423820   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:10,714-Speed 2633.31 samples/sec   Loss 6.5476   LearningRate 0.0239   Epoch: 10   Global Step: 423830   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:14,607-Speed 2630.78 samples/sec   Loss 6.4691   LearningRate 0.0239   Epoch: 10   Global Step: 423840   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:18,514-Speed 2622.02 samples/sec   Loss 6.4459   LearningRate 0.0239   Epoch: 10   Global Step: 423850   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:22,412-Speed 2627.63 samples/sec   Loss 6.4645   LearningRate 0.0239   Epoch: 10   Global Step: 423860   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:26,318-Speed 2621.88 samples/sec   Loss 6.5246   LearningRate 0.0239   Epoch: 10   Global Step: 423870   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:30,213-Speed 2630.55 samples/sec   Loss 6.4800   LearningRate 0.0239   Epoch: 10   Global Step: 423880   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:34,111-Speed 2627.29 samples/sec   Loss 6.4435   LearningRate 0.0239   Epoch: 10   Global Step: 423890   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:38,008-Speed 2627.97 samples/sec   Loss 6.4505   LearningRate 0.0239   Epoch: 10   Global Step: 423900   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:21:41,906-Speed 2627.88 samples/sec   Loss 6.5026   LearningRate 0.0239   Epoch: 10   Global Step: 423910   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:21:45,801-Speed 2629.38 samples/sec   Loss 6.5005   LearningRate 0.0239   Epoch: 10   Global Step: 423920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:21:49,709-Speed 2621.44 samples/sec   Loss 6.5244   LearningRate 0.0239   Epoch: 10   Global Step: 423930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:21:53,585-Speed 2642.85 samples/sec   Loss 6.4681   LearningRate 0.0239   Epoch: 10   Global Step: 423940   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:21:57,481-Speed 2628.67 samples/sec   Loss 6.4879   LearningRate 0.0239   Epoch: 10   Global Step: 423950   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:01,383-Speed 2625.77 samples/sec   Loss 6.5325   LearningRate 0.0239   Epoch: 10   Global Step: 423960   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:05,285-Speed 2624.54 samples/sec   Loss 6.5968   LearningRate 0.0239   Epoch: 10   Global Step: 423970   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:09,189-Speed 2623.41 samples/sec   Loss 6.4452   LearningRate 0.0239   Epoch: 10   Global Step: 423980   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:13,083-Speed 2629.90 samples/sec   Loss 6.5443   LearningRate 0.0239   Epoch: 10   Global Step: 423990   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:16,979-Speed 2629.43 samples/sec   Loss 6.5333   LearningRate 0.0239   Epoch: 10   Global Step: 424000   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:20,880-Speed 2625.36 samples/sec   Loss 6.5327   LearningRate 0.0239   Epoch: 10   Global Step: 424010   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:24,786-Speed 2622.52 samples/sec   Loss 6.4884   LearningRate 0.0239   Epoch: 10   Global Step: 424020   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:28,679-Speed 2631.29 samples/sec   Loss 6.5582   LearningRate 0.0239   Epoch: 10   Global Step: 424030   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:22:32,579-Speed 2626.33 samples/sec   Loss 6.5246   LearningRate 0.0239   Epoch: 10   Global Step: 424040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:22:36,487-Speed 2620.67 samples/sec   Loss 6.6285   LearningRate 0.0239   Epoch: 10   Global Step: 424050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:22:40,386-Speed 2626.99 samples/sec   Loss 6.5682   LearningRate 0.0239   Epoch: 10   Global Step: 424060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:22:44,286-Speed 2626.55 samples/sec   Loss 6.5286   LearningRate 0.0239   Epoch: 10   Global Step: 424070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:22:48,192-Speed 2621.61 samples/sec   Loss 6.5323   LearningRate 0.0239   Epoch: 10   Global Step: 424080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:22:52,090-Speed 2628.41 samples/sec   Loss 6.5854   LearningRate 0.0239   Epoch: 10   Global Step: 424090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:22:55,986-Speed 2629.12 samples/sec   Loss 6.6079   LearningRate 0.0239   Epoch: 10   Global Step: 424100   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:22:59,879-Speed 2630.40 samples/sec   Loss 6.4553   LearningRate 0.0239   Epoch: 10   Global Step: 424110   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:03,803-Speed 2610.28 samples/sec   Loss 6.4257   LearningRate 0.0239   Epoch: 10   Global Step: 424120   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:07,700-Speed 2629.04 samples/sec   Loss 6.4358   LearningRate 0.0239   Epoch: 10   Global Step: 424130   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:11,614-Speed 2616.85 samples/sec   Loss 6.5728   LearningRate 0.0239   Epoch: 10   Global Step: 424140   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-04-14 19:23:15,494-Speed 2639.34 samples/sec   Loss 6.5141   LearningRate 0.0239   Epoch: 10   Global Step: 424150   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:19,409-Speed 2616.54 samples/sec   Loss 6.3956   LearningRate 0.0239   Epoch: 10   Global Step: 424160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:23,321-Speed 2617.77 samples/sec   Loss 6.4432   LearningRate 0.0239   Epoch: 10   Global Step: 424170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:27,217-Speed 2629.57 samples/sec   Loss 6.4105   LearningRate 0.0239   Epoch: 10   Global Step: 424180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:31,117-Speed 2626.25 samples/sec   Loss 6.4650   LearningRate 0.0239   Epoch: 10   Global Step: 424190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:35,028-Speed 2618.99 samples/sec   Loss 6.5436   LearningRate 0.0239   Epoch: 10   Global Step: 424200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:23:38,914-Speed 2635.61 samples/sec   Loss 6.5913   LearningRate 0.0239   Epoch: 10   Global Step: 424210   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:23:42,813-Speed 2626.56 samples/sec   Loss 6.4330   LearningRate 0.0239   Epoch: 10   Global Step: 424220   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:23:46,723-Speed 2619.38 samples/sec   Loss 6.5330   LearningRate 0.0239   Epoch: 10   Global Step: 424230   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:23:50,626-Speed 2624.81 samples/sec   Loss 6.4814   LearningRate 0.0239   Epoch: 10   Global Step: 424240   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:23:54,522-Speed 2628.84 samples/sec   Loss 6.4576   LearningRate 0.0239   Epoch: 10   Global Step: 424250   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:23:58,425-Speed 2624.44 samples/sec   Loss 6.5228   LearningRate 0.0239   Epoch: 10   Global Step: 424260   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:24:02,321-Speed 2628.79 samples/sec   Loss 6.5947   LearningRate 0.0239   Epoch: 10   Global Step: 424270   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:24:06,218-Speed 2628.34 samples/sec   Loss 6.4045   LearningRate 0.0239   Epoch: 10   Global Step: 424280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:24:10,120-Speed 2624.57 samples/sec   Loss 6.4612   LearningRate 0.0239   Epoch: 10   Global Step: 424290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:24:14,080-Speed 2586.83 samples/sec   Loss 6.4139   LearningRate 0.0239   Epoch: 10   Global Step: 424300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:24:18,001-Speed 2612.27 samples/sec   Loss 6.5611   LearningRate 0.0239   Epoch: 10   Global Step: 424310   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:21,911-Speed 2619.65 samples/sec   Loss 6.5688   LearningRate 0.0239   Epoch: 10   Global Step: 424320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:25,809-Speed 2627.55 samples/sec   Loss 6.5325   LearningRate 0.0239   Epoch: 10   Global Step: 424330   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:29,703-Speed 2630.40 samples/sec   Loss 6.5684   LearningRate 0.0239   Epoch: 10   Global Step: 424340   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:33,601-Speed 2627.48 samples/sec   Loss 6.4935   LearningRate 0.0239   Epoch: 10   Global Step: 424350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:37,494-Speed 2630.54 samples/sec   Loss 6.4864   LearningRate 0.0239   Epoch: 10   Global Step: 424360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:41,390-Speed 2629.36 samples/sec   Loss 6.4648   LearningRate 0.0239   Epoch: 10   Global Step: 424370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:45,301-Speed 2618.98 samples/sec   Loss 6.5342   LearningRate 0.0239   Epoch: 10   Global Step: 424380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:49,208-Speed 2621.83 samples/sec   Loss 6.4475   LearningRate 0.0239   Epoch: 10   Global Step: 424390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:53,100-Speed 2631.27 samples/sec   Loss 6.4037   LearningRate 0.0239   Epoch: 10   Global Step: 424400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:24:56,977-Speed 2642.36 samples/sec   Loss 6.5311   LearningRate 0.0239   Epoch: 10   Global Step: 424410   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:00,874-Speed 2628.00 samples/sec   Loss 6.4894   LearningRate 0.0239   Epoch: 10   Global Step: 424420   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:04,770-Speed 2629.30 samples/sec   Loss 6.4591   LearningRate 0.0239   Epoch: 10   Global Step: 424430   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:08,710-Speed 2599.00 samples/sec   Loss 6.5927   LearningRate 0.0239   Epoch: 10   Global Step: 424440   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:12,615-Speed 2623.93 samples/sec   Loss 6.5617   LearningRate 0.0238   Epoch: 10   Global Step: 424450   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:16,506-Speed 2631.77 samples/sec   Loss 6.5793   LearningRate 0.0238   Epoch: 10   Global Step: 424460   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:20,407-Speed 2626.02 samples/sec   Loss 6.5027   LearningRate 0.0238   Epoch: 10   Global Step: 424470   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:24,319-Speed 2618.35 samples/sec   Loss 6.4905   LearningRate 0.0238   Epoch: 10   Global Step: 424480   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:28,218-Speed 2626.96 samples/sec   Loss 6.4222   LearningRate 0.0238   Epoch: 10   Global Step: 424490   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:32,111-Speed 2630.79 samples/sec   Loss 6.5961   LearningRate 0.0238   Epoch: 10   Global Step: 424500   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:25:36,007-Speed 2628.63 samples/sec   Loss 6.4740   LearningRate 0.0238   Epoch: 10   Global Step: 424510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:25:39,912-Speed 2623.26 samples/sec   Loss 6.6429   LearningRate 0.0238   Epoch: 10   Global Step: 424520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:25:43,846-Speed 2603.30 samples/sec   Loss 6.6364   LearningRate 0.0238   Epoch: 10   Global Step: 424530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:25:47,805-Speed 2587.65 samples/sec   Loss 6.6741   LearningRate 0.0238   Epoch: 10   Global Step: 424540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:25:51,703-Speed 2627.92 samples/sec   Loss 6.5839   LearningRate 0.0238   Epoch: 10   Global Step: 424550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:25:55,603-Speed 2626.41 samples/sec   Loss 6.5551   LearningRate 0.0238   Epoch: 10   Global Step: 424560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:25:59,508-Speed 2622.95 samples/sec   Loss 6.5552   LearningRate 0.0238   Epoch: 10   Global Step: 424570   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-04-14 19:26:03,433-Speed 2609.29 samples/sec   Loss 6.4583   LearningRate 0.0238   Epoch: 10   Global Step: 424580   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:26:07,327-Speed 2629.78 samples/sec   Loss 6.4876   LearningRate 0.0238   Epoch: 10   Global Step: 424590   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:26:11,228-Speed 2626.09 samples/sec   Loss 6.5622   LearningRate 0.0238   Epoch: 10   Global Step: 424600   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-04-14 19:26:15,122-Speed 2630.16 samples/sec   Loss 6.4161   LearningRate 0.0238   Epoch: 10   Global Step: 424610   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:26:19,020-Speed 2627.70 samples/sec   Loss 6.4960   LearningRate 0.0238   Epoch: 10   Global Step: 424620   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:26:22,928-Speed 2620.80 samples/sec   Loss 6.5093   LearningRate 0.0238   Epoch: 10   Global Step: 424630   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:26:26,841-Speed 2618.07 samples/sec   Loss 6.6292   LearningRate 0.0238   Epoch: 10   Global Step: 424640   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:26:30,753-Speed 2617.93 samples/sec   Loss 6.6590   LearningRate 0.0238   Epoch: 10   Global Step: 424650   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:26:34,662-Speed 2619.77 samples/sec   Loss 6.4321   LearningRate 0.0238   Epoch: 10   Global Step: 424660   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:26:38,578-Speed 2615.53 samples/sec   Loss 6.4358   LearningRate 0.0238   Epoch: 10   Global Step: 424670   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:26:42,478-Speed 2627.07 samples/sec   Loss 6.4877   LearningRate 0.0238   Epoch: 10   Global Step: 424680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:26:46,372-Speed 2630.26 samples/sec   Loss 6.5617   LearningRate 0.0238   Epoch: 10   Global Step: 424690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:26:50,295-Speed 2610.93 samples/sec   Loss 6.4418   LearningRate 0.0238   Epoch: 10   Global Step: 424700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:26:54,201-Speed 2622.27 samples/sec   Loss 6.4001   LearningRate 0.0238   Epoch: 10   Global Step: 424710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:26:58,103-Speed 2625.49 samples/sec   Loss 6.5146   LearningRate 0.0238   Epoch: 10   Global Step: 424720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:01,997-Speed 2630.34 samples/sec   Loss 6.5337   LearningRate 0.0238   Epoch: 10   Global Step: 424730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:05,894-Speed 2628.67 samples/sec   Loss 6.4454   LearningRate 0.0238   Epoch: 10   Global Step: 424740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:09,792-Speed 2627.55 samples/sec   Loss 6.3462   LearningRate 0.0238   Epoch: 10   Global Step: 424750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:13,693-Speed 2625.65 samples/sec   Loss 6.3753   LearningRate 0.0238   Epoch: 10   Global Step: 424760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:17,592-Speed 2627.52 samples/sec   Loss 6.4728   LearningRate 0.0238   Epoch: 10   Global Step: 424770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:21,472-Speed 2639.28 samples/sec   Loss 6.5224   LearningRate 0.0238   Epoch: 10   Global Step: 424780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:25,371-Speed 2627.18 samples/sec   Loss 6.5257   LearningRate 0.0238   Epoch: 10   Global Step: 424790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:29,272-Speed 2625.61 samples/sec   Loss 6.5284   LearningRate 0.0238   Epoch: 10   Global Step: 424800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:33,178-Speed 2622.21 samples/sec   Loss 6.4337   LearningRate 0.0238   Epoch: 10   Global Step: 424810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:37,075-Speed 2628.76 samples/sec   Loss 6.5014   LearningRate 0.0238   Epoch: 10   Global Step: 424820   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:41,051-Speed 2575.91 samples/sec   Loss 6.4999   LearningRate 0.0238   Epoch: 10   Global Step: 424830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:44,951-Speed 2625.94 samples/sec   Loss 6.4428   LearningRate 0.0238   Epoch: 10   Global Step: 424840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:48,867-Speed 2616.82 samples/sec   Loss 6.3948   LearningRate 0.0238   Epoch: 10   Global Step: 424850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:52,774-Speed 2621.36 samples/sec   Loss 6.5568   LearningRate 0.0238   Epoch: 10   Global Step: 424860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:27:56,668-Speed 2630.82 samples/sec   Loss 6.5196   LearningRate 0.0238   Epoch: 10   Global Step: 424870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:00,564-Speed 2628.89 samples/sec   Loss 6.5545   LearningRate 0.0238   Epoch: 10   Global Step: 424880   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:28:04,844-Speed 2393.13 samples/sec   Loss 6.4964   LearningRate 0.0238   Epoch: 10   Global Step: 424890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:08,738-Speed 2630.90 samples/sec   Loss 6.5306   LearningRate 0.0238   Epoch: 10   Global Step: 424900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:12,635-Speed 2628.51 samples/sec   Loss 6.3977   LearningRate 0.0238   Epoch: 10   Global Step: 424910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:16,531-Speed 2628.78 samples/sec   Loss 6.3545   LearningRate 0.0238   Epoch: 10   Global Step: 424920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:20,432-Speed 2625.23 samples/sec   Loss 6.5201   LearningRate 0.0238   Epoch: 10   Global Step: 424930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:24,329-Speed 2629.12 samples/sec   Loss 6.6056   LearningRate 0.0238   Epoch: 10   Global Step: 424940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:28,228-Speed 2627.35 samples/sec   Loss 6.4731   LearningRate 0.0238   Epoch: 10   Global Step: 424950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:32,121-Speed 2630.73 samples/sec   Loss 6.4911   LearningRate 0.0238   Epoch: 10   Global Step: 424960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:36,040-Speed 2613.21 samples/sec   Loss 6.4410   LearningRate 0.0238   Epoch: 10   Global Step: 424970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:39,940-Speed 2626.10 samples/sec   Loss 6.4704   LearningRate 0.0238   Epoch: 10   Global Step: 424980   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:43,839-Speed 2627.05 samples/sec   Loss 6.6485   LearningRate 0.0238   Epoch: 10   Global Step: 424990   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:28:47,719-Speed 2639.41 samples/sec   Loss 6.5782   LearningRate 0.0238   Epoch: 10   Global Step: 425000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:51,615-Speed 2629.48 samples/sec   Loss 6.6227   LearningRate 0.0238   Epoch: 10   Global Step: 425010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:55,520-Speed 2622.80 samples/sec   Loss 6.4582   LearningRate 0.0238   Epoch: 10   Global Step: 425020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:28:59,418-Speed 2627.93 samples/sec   Loss 6.4137   LearningRate 0.0238   Epoch: 10   Global Step: 425030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:03,314-Speed 2628.71 samples/sec   Loss 6.4339   LearningRate 0.0238   Epoch: 10   Global Step: 425040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:07,208-Speed 2630.20 samples/sec   Loss 6.4848   LearningRate 0.0238   Epoch: 10   Global Step: 425050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:11,104-Speed 2629.31 samples/sec   Loss 6.5166   LearningRate 0.0238   Epoch: 10   Global Step: 425060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:15,002-Speed 2627.21 samples/sec   Loss 6.4302   LearningRate 0.0238   Epoch: 10   Global Step: 425070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:18,908-Speed 2622.24 samples/sec   Loss 6.6050   LearningRate 0.0238   Epoch: 10   Global Step: 425080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:22,942-Speed 2538.83 samples/sec   Loss 6.5106   LearningRate 0.0238   Epoch: 10   Global Step: 425090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:26,825-Speed 2638.24 samples/sec   Loss 6.4628   LearningRate 0.0238   Epoch: 10   Global Step: 425100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:29:30,703-Speed 2641.26 samples/sec   Loss 6.4875   LearningRate 0.0238   Epoch: 10   Global Step: 425110   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:29:34,605-Speed 2624.56 samples/sec   Loss 6.5844   LearningRate 0.0238   Epoch: 10   Global Step: 425120   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:29:38,503-Speed 2627.67 samples/sec   Loss 6.6492   LearningRate 0.0238   Epoch: 10   Global Step: 425130   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:29:42,403-Speed 2626.05 samples/sec   Loss 6.5229   LearningRate 0.0238   Epoch: 10   Global Step: 425140   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:29:46,303-Speed 2626.60 samples/sec   Loss 6.5557   LearningRate 0.0238   Epoch: 10   Global Step: 425150   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:29:50,194-Speed 2632.02 samples/sec   Loss 6.5461   LearningRate 0.0238   Epoch: 10   Global Step: 425160   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:29:54,090-Speed 2629.30 samples/sec   Loss 6.5579   LearningRate 0.0238   Epoch: 10   Global Step: 425170   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:29:57,990-Speed 2625.90 samples/sec   Loss 6.5806   LearningRate 0.0238   Epoch: 10   Global Step: 425180   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:30:01,887-Speed 2628.23 samples/sec   Loss 6.5657   LearningRate 0.0238   Epoch: 10   Global Step: 425190   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:30:05,785-Speed 2627.86 samples/sec   Loss 6.5127   LearningRate 0.0238   Epoch: 10   Global Step: 425200   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:30:09,745-Speed 2586.35 samples/sec   Loss 6.6021   LearningRate 0.0238   Epoch: 10   Global Step: 425210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:13,654-Speed 2620.69 samples/sec   Loss 6.4957   LearningRate 0.0238   Epoch: 10   Global Step: 425220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:17,560-Speed 2622.09 samples/sec   Loss 6.4126   LearningRate 0.0238   Epoch: 10   Global Step: 425230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:21,455-Speed 2629.72 samples/sec   Loss 6.4843   LearningRate 0.0238   Epoch: 10   Global Step: 425240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:25,347-Speed 2631.57 samples/sec   Loss 6.4038   LearningRate 0.0238   Epoch: 10   Global Step: 425250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:29,347-Speed 2561.61 samples/sec   Loss 6.6430   LearningRate 0.0238   Epoch: 10   Global Step: 425260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:33,243-Speed 2628.54 samples/sec   Loss 6.5634   LearningRate 0.0238   Epoch: 10   Global Step: 425270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:37,145-Speed 2625.02 samples/sec   Loss 6.4773   LearningRate 0.0238   Epoch: 10   Global Step: 425280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:41,056-Speed 2618.94 samples/sec   Loss 6.5235   LearningRate 0.0238   Epoch: 10   Global Step: 425290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:30:44,939-Speed 2637.42 samples/sec   Loss 6.5464   LearningRate 0.0237   Epoch: 10   Global Step: 425300   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:30:48,849-Speed 2619.90 samples/sec   Loss 6.4472   LearningRate 0.0237   Epoch: 10   Global Step: 425310   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:30:52,754-Speed 2622.65 samples/sec   Loss 6.4730   LearningRate 0.0237   Epoch: 10   Global Step: 425320   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:30:56,652-Speed 2628.34 samples/sec   Loss 6.4767   LearningRate 0.0237   Epoch: 10   Global Step: 425330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:00,550-Speed 2627.52 samples/sec   Loss 6.6191   LearningRate 0.0237   Epoch: 10   Global Step: 425340   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:04,454-Speed 2623.12 samples/sec   Loss 6.3754   LearningRate 0.0237   Epoch: 10   Global Step: 425350   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:08,362-Speed 2621.16 samples/sec   Loss 6.5554   LearningRate 0.0237   Epoch: 10   Global Step: 425360   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:12,258-Speed 2629.28 samples/sec   Loss 6.5660   LearningRate 0.0237   Epoch: 10   Global Step: 425370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:16,157-Speed 2627.06 samples/sec   Loss 6.5320   LearningRate 0.0237   Epoch: 10   Global Step: 425380   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:20,054-Speed 2628.43 samples/sec   Loss 6.5746   LearningRate 0.0237   Epoch: 10   Global Step: 425390   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:23,952-Speed 2627.41 samples/sec   Loss 6.4324   LearningRate 0.0237   Epoch: 10   Global Step: 425400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:31:27,849-Speed 2628.78 samples/sec   Loss 6.5334   LearningRate 0.0237   Epoch: 10   Global Step: 425410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:31:31,745-Speed 2628.61 samples/sec   Loss 6.4985   LearningRate 0.0237   Epoch: 10   Global Step: 425420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:31:35,645-Speed 2626.13 samples/sec   Loss 6.5417   LearningRate 0.0237   Epoch: 10   Global Step: 425430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:31:39,550-Speed 2622.65 samples/sec   Loss 6.5657   LearningRate 0.0237   Epoch: 10   Global Step: 425440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:31:43,454-Speed 2624.02 samples/sec   Loss 6.4641   LearningRate 0.0237   Epoch: 10   Global Step: 425450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:31:47,356-Speed 2624.73 samples/sec   Loss 6.5045   LearningRate 0.0237   Epoch: 10   Global Step: 425460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:31:51,248-Speed 2632.08 samples/sec   Loss 6.4342   LearningRate 0.0237   Epoch: 10   Global Step: 425470   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:55,151-Speed 2624.30 samples/sec   Loss 6.4180   LearningRate 0.0237   Epoch: 10   Global Step: 425480   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:31:59,053-Speed 2625.15 samples/sec   Loss 6.4611   LearningRate 0.0237   Epoch: 10   Global Step: 425490   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:02,948-Speed 2629.55 samples/sec   Loss 6.4331   LearningRate 0.0237   Epoch: 10   Global Step: 425500   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:06,860-Speed 2618.21 samples/sec   Loss 6.5127   LearningRate 0.0237   Epoch: 10   Global Step: 425510   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:10,755-Speed 2629.71 samples/sec   Loss 6.4486   LearningRate 0.0237   Epoch: 10   Global Step: 425520   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:14,652-Speed 2628.54 samples/sec   Loss 6.5860   LearningRate 0.0237   Epoch: 10   Global Step: 425530   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:18,544-Speed 2631.76 samples/sec   Loss 6.4847   LearningRate 0.0237   Epoch: 10   Global Step: 425540   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:22,445-Speed 2625.65 samples/sec   Loss 6.4973   LearningRate 0.0237   Epoch: 10   Global Step: 425550   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:26,349-Speed 2623.79 samples/sec   Loss 6.4782   LearningRate 0.0237   Epoch: 10   Global Step: 425560   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:32:30,242-Speed 2630.92 samples/sec   Loss 6.5834   LearningRate 0.0237   Epoch: 10   Global Step: 425570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:32:34,177-Speed 2602.78 samples/sec   Loss 6.3511   LearningRate 0.0237   Epoch: 10   Global Step: 425580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:32:38,095-Speed 2614.71 samples/sec   Loss 6.4962   LearningRate 0.0237   Epoch: 10   Global Step: 425590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:32:41,991-Speed 2629.05 samples/sec   Loss 6.4561   LearningRate 0.0237   Epoch: 10   Global Step: 425600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:32:45,900-Speed 2620.30 samples/sec   Loss 6.5099   LearningRate 0.0237   Epoch: 10   Global Step: 425610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:32:49,797-Speed 2628.27 samples/sec   Loss 6.3956   LearningRate 0.0237   Epoch: 10   Global Step: 425620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:32:53,707-Speed 2619.45 samples/sec   Loss 6.4649   LearningRate 0.0237   Epoch: 10   Global Step: 425630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:32:57,603-Speed 2629.29 samples/sec   Loss 6.4519   LearningRate 0.0237   Epoch: 10   Global Step: 425640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:01,500-Speed 2628.56 samples/sec   Loss 6.5696   LearningRate 0.0237   Epoch: 10   Global Step: 425650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:05,418-Speed 2613.88 samples/sec   Loss 6.5485   LearningRate 0.0237   Epoch: 10   Global Step: 425660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:09,333-Speed 2616.55 samples/sec   Loss 6.4720   LearningRate 0.0237   Epoch: 10   Global Step: 425670   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:33:13,238-Speed 2623.05 samples/sec   Loss 6.6071   LearningRate 0.0237   Epoch: 10   Global Step: 425680   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:33:17,125-Speed 2634.50 samples/sec   Loss 6.2866   LearningRate 0.0237   Epoch: 10   Global Step: 425690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:21,035-Speed 2619.79 samples/sec   Loss 6.4614   LearningRate 0.0237   Epoch: 10   Global Step: 425700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:24,971-Speed 2602.57 samples/sec   Loss 6.4394   LearningRate 0.0237   Epoch: 10   Global Step: 425710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:28,871-Speed 2626.64 samples/sec   Loss 6.5000   LearningRate 0.0237   Epoch: 10   Global Step: 425720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:32,769-Speed 2627.82 samples/sec   Loss 6.4529   LearningRate 0.0237   Epoch: 10   Global Step: 425730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:36,663-Speed 2629.67 samples/sec   Loss 6.3852   LearningRate 0.0237   Epoch: 10   Global Step: 425740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:40,557-Speed 2630.35 samples/sec   Loss 6.4819   LearningRate 0.0237   Epoch: 10   Global Step: 425750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:44,454-Speed 2628.64 samples/sec   Loss 6.4518   LearningRate 0.0237   Epoch: 10   Global Step: 425760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:48,349-Speed 2629.99 samples/sec   Loss 6.4292   LearningRate 0.0237   Epoch: 10   Global Step: 425770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:33:52,231-Speed 2639.01 samples/sec   Loss 6.5328   LearningRate 0.0237   Epoch: 10   Global Step: 425780   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:33:56,130-Speed 2626.79 samples/sec   Loss 6.4634   LearningRate 0.0237   Epoch: 10   Global Step: 425790   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:00,065-Speed 2602.71 samples/sec   Loss 6.4921   LearningRate 0.0237   Epoch: 10   Global Step: 425800   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:03,969-Speed 2623.72 samples/sec   Loss 6.4446   LearningRate 0.0237   Epoch: 10   Global Step: 425810   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:07,867-Speed 2627.58 samples/sec   Loss 6.5561   LearningRate 0.0237   Epoch: 10   Global Step: 425820   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:11,763-Speed 2629.27 samples/sec   Loss 6.5298   LearningRate 0.0237   Epoch: 10   Global Step: 425830   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:15,659-Speed 2628.96 samples/sec   Loss 6.5242   LearningRate 0.0237   Epoch: 10   Global Step: 425840   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:19,557-Speed 2627.74 samples/sec   Loss 6.5750   LearningRate 0.0237   Epoch: 10   Global Step: 425850   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:23,453-Speed 2629.20 samples/sec   Loss 6.4260   LearningRate 0.0237   Epoch: 10   Global Step: 425860   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:27,359-Speed 2622.63 samples/sec   Loss 6.4632   LearningRate 0.0237   Epoch: 10   Global Step: 425870   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:31,257-Speed 2627.07 samples/sec   Loss 6.5005   LearningRate 0.0237   Epoch: 10   Global Step: 425880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:34:35,153-Speed 2628.83 samples/sec   Loss 6.4762   LearningRate 0.0237   Epoch: 10   Global Step: 425890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:34:39,048-Speed 2629.71 samples/sec   Loss 6.5191   LearningRate 0.0237   Epoch: 10   Global Step: 425900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:34:42,924-Speed 2642.53 samples/sec   Loss 6.5549   LearningRate 0.0237   Epoch: 10   Global Step: 425910   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:46,829-Speed 2623.37 samples/sec   Loss 6.6267   LearningRate 0.0237   Epoch: 10   Global Step: 425920   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:50,741-Speed 2617.86 samples/sec   Loss 6.4213   LearningRate 0.0237   Epoch: 10   Global Step: 425930   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:54,648-Speed 2621.84 samples/sec   Loss 6.5939   LearningRate 0.0237   Epoch: 10   Global Step: 425940   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:34:58,547-Speed 2626.77 samples/sec   Loss 6.4248   LearningRate 0.0237   Epoch: 10   Global Step: 425950   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:02,446-Speed 2626.86 samples/sec   Loss 6.6119   LearningRate 0.0237   Epoch: 10   Global Step: 425960   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:06,345-Speed 2626.70 samples/sec   Loss 6.5478   LearningRate 0.0237   Epoch: 10   Global Step: 425970   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:10,242-Speed 2628.05 samples/sec   Loss 6.4127   LearningRate 0.0237   Epoch: 10   Global Step: 425980   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:14,140-Speed 2628.45 samples/sec   Loss 6.6107   LearningRate 0.0237   Epoch: 10   Global Step: 425990   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:18,063-Speed 2610.51 samples/sec   Loss 6.5166   LearningRate 0.0237   Epoch: 10   Global Step: 426000   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:21,961-Speed 2634.48 samples/sec   Loss 6.5037   LearningRate 0.0237   Epoch: 10   Global Step: 426010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:35:25,852-Speed 2631.93 samples/sec   Loss 6.5301   LearningRate 0.0237   Epoch: 10   Global Step: 426020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:35:29,757-Speed 2623.62 samples/sec   Loss 6.3750   LearningRate 0.0237   Epoch: 10   Global Step: 426030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:35:33,635-Speed 2641.26 samples/sec   Loss 6.5386   LearningRate 0.0237   Epoch: 10   Global Step: 426040   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:37,533-Speed 2627.24 samples/sec   Loss 6.5038   LearningRate 0.0237   Epoch: 10   Global Step: 426050   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:41,430-Speed 2628.06 samples/sec   Loss 6.2744   LearningRate 0.0237   Epoch: 10   Global Step: 426060   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:45,330-Speed 2626.19 samples/sec   Loss 6.5298   LearningRate 0.0237   Epoch: 10   Global Step: 426070   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:49,224-Speed 2630.34 samples/sec   Loss 6.6121   LearningRate 0.0237   Epoch: 10   Global Step: 426080   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:53,134-Speed 2619.43 samples/sec   Loss 6.4488   LearningRate 0.0237   Epoch: 10   Global Step: 426090   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:35:57,019-Speed 2636.64 samples/sec   Loss 6.4007   LearningRate 0.0237   Epoch: 10   Global Step: 426100   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:36:00,977-Speed 2587.70 samples/sec   Loss 6.3732   LearningRate 0.0237   Epoch: 10   Global Step: 426110   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:36:04,869-Speed 2631.60 samples/sec   Loss 6.5330   LearningRate 0.0237   Epoch: 10   Global Step: 426120   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:36:08,766-Speed 2628.26 samples/sec   Loss 6.4101   LearningRate 0.0237   Epoch: 10   Global Step: 426130   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:36:12,660-Speed 2630.31 samples/sec   Loss 6.4748   LearningRate 0.0237   Epoch: 10   Global Step: 426140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:16,575-Speed 2616.35 samples/sec   Loss 6.4494   LearningRate 0.0236   Epoch: 10   Global Step: 426150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:20,471-Speed 2628.77 samples/sec   Loss 6.4750   LearningRate 0.0236   Epoch: 10   Global Step: 426160   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:24,373-Speed 2624.73 samples/sec   Loss 6.4283   LearningRate 0.0236   Epoch: 10   Global Step: 426170   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:28,270-Speed 2628.70 samples/sec   Loss 6.6048   LearningRate 0.0236   Epoch: 10   Global Step: 426180   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:32,170-Speed 2626.84 samples/sec   Loss 6.5621   LearningRate 0.0236   Epoch: 10   Global Step: 426190   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:36,070-Speed 2625.50 samples/sec   Loss 6.4056   LearningRate 0.0236   Epoch: 10   Global Step: 426200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:39,976-Speed 2622.62 samples/sec   Loss 6.4709   LearningRate 0.0236   Epoch: 10   Global Step: 426210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:43,868-Speed 2632.20 samples/sec   Loss 6.5538   LearningRate 0.0236   Epoch: 10   Global Step: 426220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:47,761-Speed 2630.76 samples/sec   Loss 6.4764   LearningRate 0.0236   Epoch: 10   Global Step: 426230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:51,634-Speed 2644.67 samples/sec   Loss 6.5635   LearningRate 0.0236   Epoch: 10   Global Step: 426240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:55,531-Speed 2628.17 samples/sec   Loss 6.4960   LearningRate 0.0236   Epoch: 10   Global Step: 426250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:36:59,441-Speed 2619.88 samples/sec   Loss 6.5580   LearningRate 0.0236   Epoch: 10   Global Step: 426260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:03,341-Speed 2626.48 samples/sec   Loss 6.4460   LearningRate 0.0236   Epoch: 10   Global Step: 426270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:07,242-Speed 2625.37 samples/sec   Loss 6.3942   LearningRate 0.0236   Epoch: 10   Global Step: 426280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:11,139-Speed 2627.99 samples/sec   Loss 6.4980   LearningRate 0.0236   Epoch: 10   Global Step: 426290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:15,045-Speed 2622.34 samples/sec   Loss 6.4759   LearningRate 0.0236   Epoch: 10   Global Step: 426300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:18,947-Speed 2625.26 samples/sec   Loss 6.4348   LearningRate 0.0236   Epoch: 10   Global Step: 426310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:22,850-Speed 2623.75 samples/sec   Loss 6.4190   LearningRate 0.0236   Epoch: 10   Global Step: 426320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:26,746-Speed 2629.19 samples/sec   Loss 6.4379   LearningRate 0.0236   Epoch: 10   Global Step: 426330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:30,637-Speed 2632.40 samples/sec   Loss 6.6078   LearningRate 0.0236   Epoch: 10   Global Step: 426340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:34,528-Speed 2632.65 samples/sec   Loss 6.4299   LearningRate 0.0236   Epoch: 10   Global Step: 426350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:38,428-Speed 2626.14 samples/sec   Loss 6.3701   LearningRate 0.0236   Epoch: 10   Global Step: 426360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:37:42,342-Speed 2617.06 samples/sec   Loss 6.3204   LearningRate 0.0236   Epoch: 10   Global Step: 426370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:37:46,432-Speed 2503.67 samples/sec   Loss 6.3861   LearningRate 0.0236   Epoch: 10   Global Step: 426380   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:37:50,529-Speed 2500.43 samples/sec   Loss 6.4470   LearningRate 0.0236   Epoch: 10   Global Step: 426390   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:37:54,498-Speed 2580.03 samples/sec   Loss 6.5002   LearningRate 0.0236   Epoch: 10   Global Step: 426400   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:37:58,421-Speed 2611.65 samples/sec   Loss 6.5223   LearningRate 0.0236   Epoch: 10   Global Step: 426410   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:38:02,325-Speed 2623.70 samples/sec   Loss 6.4003   LearningRate 0.0236   Epoch: 10   Global Step: 426420   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:38:06,217-Speed 2631.66 samples/sec   Loss 6.5831   LearningRate 0.0236   Epoch: 10   Global Step: 426430   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:38:10,114-Speed 2627.95 samples/sec   Loss 6.5529   LearningRate 0.0236   Epoch: 10   Global Step: 426440   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:38:14,012-Speed 2628.34 samples/sec   Loss 6.5767   LearningRate 0.0236   Epoch: 10   Global Step: 426450   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:38:17,910-Speed 2627.28 samples/sec   Loss 6.5419   LearningRate 0.0236   Epoch: 10   Global Step: 426460   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:38:21,807-Speed 2628.18 samples/sec   Loss 6.5405   LearningRate 0.0236   Epoch: 10   Global Step: 426470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:25,705-Speed 2627.82 samples/sec   Loss 6.4997   LearningRate 0.0236   Epoch: 10   Global Step: 426480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:29,602-Speed 2628.59 samples/sec   Loss 6.4550   LearningRate 0.0236   Epoch: 10   Global Step: 426490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:33,514-Speed 2618.23 samples/sec   Loss 6.5861   LearningRate 0.0236   Epoch: 10   Global Step: 426500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:37,412-Speed 2627.82 samples/sec   Loss 6.5518   LearningRate 0.0236   Epoch: 10   Global Step: 426510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:41,309-Speed 2628.25 samples/sec   Loss 6.4245   LearningRate 0.0236   Epoch: 10   Global Step: 426520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:45,237-Speed 2607.18 samples/sec   Loss 6.4597   LearningRate 0.0236   Epoch: 10   Global Step: 426530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:49,147-Speed 2619.78 samples/sec   Loss 6.4587   LearningRate 0.0236   Epoch: 10   Global Step: 426540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:53,081-Speed 2603.19 samples/sec   Loss 6.4665   LearningRate 0.0236   Epoch: 10   Global Step: 426550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:38:56,999-Speed 2620.34 samples/sec   Loss 6.5418   LearningRate 0.0236   Epoch: 10   Global Step: 426560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:00,878-Speed 2640.12 samples/sec   Loss 6.5273   LearningRate 0.0236   Epoch: 10   Global Step: 426570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:04,782-Speed 2624.20 samples/sec   Loss 6.4078   LearningRate 0.0236   Epoch: 10   Global Step: 426580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:08,689-Speed 2621.24 samples/sec   Loss 6.4739   LearningRate 0.0236   Epoch: 10   Global Step: 426590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:12,601-Speed 2618.08 samples/sec   Loss 6.4658   LearningRate 0.0236   Epoch: 10   Global Step: 426600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:16,509-Speed 2620.74 samples/sec   Loss 6.4588   LearningRate 0.0236   Epoch: 10   Global Step: 426610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:20,409-Speed 2626.50 samples/sec   Loss 6.6470   LearningRate 0.0236   Epoch: 10   Global Step: 426620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:24,309-Speed 2626.46 samples/sec   Loss 6.4205   LearningRate 0.0236   Epoch: 10   Global Step: 426630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:28,219-Speed 2619.37 samples/sec   Loss 6.4936   LearningRate 0.0236   Epoch: 10   Global Step: 426640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:32,129-Speed 2619.80 samples/sec   Loss 6.4951   LearningRate 0.0236   Epoch: 10   Global Step: 426650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:36,035-Speed 2622.46 samples/sec   Loss 6.5440   LearningRate 0.0236   Epoch: 10   Global Step: 426660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:39,942-Speed 2621.20 samples/sec   Loss 6.4300   LearningRate 0.0236   Epoch: 10   Global Step: 426670   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:39:43,824-Speed 2638.31 samples/sec   Loss 6.4915   LearningRate 0.0236   Epoch: 10   Global Step: 426680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:47,728-Speed 2623.85 samples/sec   Loss 6.4257   LearningRate 0.0236   Epoch: 10   Global Step: 426690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:51,621-Speed 2630.57 samples/sec   Loss 6.5427   LearningRate 0.0236   Epoch: 10   Global Step: 426700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:55,552-Speed 2606.07 samples/sec   Loss 6.5188   LearningRate 0.0236   Epoch: 10   Global Step: 426710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:39:59,458-Speed 2622.41 samples/sec   Loss 6.4128   LearningRate 0.0236   Epoch: 10   Global Step: 426720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:40:03,337-Speed 2640.58 samples/sec   Loss 6.5868   LearningRate 0.0236   Epoch: 10   Global Step: 426730   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:07,231-Speed 2629.80 samples/sec   Loss 6.5015   LearningRate 0.0236   Epoch: 10   Global Step: 426740   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:11,131-Speed 2626.63 samples/sec   Loss 6.5097   LearningRate 0.0236   Epoch: 10   Global Step: 426750   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:15,021-Speed 2632.92 samples/sec   Loss 6.4767   LearningRate 0.0236   Epoch: 10   Global Step: 426760   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:18,925-Speed 2623.93 samples/sec   Loss 6.4557   LearningRate 0.0236   Epoch: 10   Global Step: 426770   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:22,843-Speed 2614.26 samples/sec   Loss 6.4154   LearningRate 0.0236   Epoch: 10   Global Step: 426780   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:26,818-Speed 2576.32 samples/sec   Loss 6.6025   LearningRate 0.0236   Epoch: 10   Global Step: 426790   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:30,735-Speed 2614.68 samples/sec   Loss 6.4764   LearningRate 0.0236   Epoch: 10   Global Step: 426800   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:34,652-Speed 2615.88 samples/sec   Loss 6.3571   LearningRate 0.0236   Epoch: 10   Global Step: 426810   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:38,647-Speed 2563.33 samples/sec   Loss 6.4902   LearningRate 0.0236   Epoch: 10   Global Step: 426820   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:40:42,547-Speed 2626.05 samples/sec   Loss 6.5727   LearningRate 0.0236   Epoch: 10   Global Step: 426830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:40:46,474-Speed 2608.59 samples/sec   Loss 6.6140   LearningRate 0.0236   Epoch: 10   Global Step: 426840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:40:50,383-Speed 2620.22 samples/sec   Loss 6.5595   LearningRate 0.0236   Epoch: 10   Global Step: 426850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:40:54,274-Speed 2632.96 samples/sec   Loss 6.4053   LearningRate 0.0236   Epoch: 10   Global Step: 426860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:40:58,170-Speed 2628.82 samples/sec   Loss 6.3399   LearningRate 0.0236   Epoch: 10   Global Step: 426870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:02,072-Speed 2624.27 samples/sec   Loss 6.4252   LearningRate 0.0236   Epoch: 10   Global Step: 426880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:05,981-Speed 2620.76 samples/sec   Loss 6.4084   LearningRate 0.0236   Epoch: 10   Global Step: 426890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:09,883-Speed 2625.40 samples/sec   Loss 6.4612   LearningRate 0.0236   Epoch: 10   Global Step: 426900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:13,782-Speed 2627.03 samples/sec   Loss 6.3686   LearningRate 0.0236   Epoch: 10   Global Step: 426910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:17,682-Speed 2626.06 samples/sec   Loss 6.4319   LearningRate 0.0236   Epoch: 10   Global Step: 426920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:21,563-Speed 2639.51 samples/sec   Loss 6.5738   LearningRate 0.0236   Epoch: 10   Global Step: 426930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:25,459-Speed 2628.94 samples/sec   Loss 6.4103   LearningRate 0.0236   Epoch: 10   Global Step: 426940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:29,358-Speed 2627.10 samples/sec   Loss 6.4060   LearningRate 0.0236   Epoch: 10   Global Step: 426950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:33,264-Speed 2621.91 samples/sec   Loss 6.4016   LearningRate 0.0236   Epoch: 10   Global Step: 426960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:37,173-Speed 2620.45 samples/sec   Loss 6.6868   LearningRate 0.0236   Epoch: 10   Global Step: 426970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:41,070-Speed 2628.09 samples/sec   Loss 6.4933   LearningRate 0.0236   Epoch: 10   Global Step: 426980   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:45,004-Speed 2604.00 samples/sec   Loss 6.5237   LearningRate 0.0236   Epoch: 10   Global Step: 426990   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:48,899-Speed 2629.58 samples/sec   Loss 6.5062   LearningRate 0.0235   Epoch: 10   Global Step: 427000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:52,820-Speed 2612.26 samples/sec   Loss 6.5006   LearningRate 0.0235   Epoch: 10   Global Step: 427010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:41:56,730-Speed 2620.11 samples/sec   Loss 6.5173   LearningRate 0.0235   Epoch: 10   Global Step: 427020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:00,627-Speed 2628.11 samples/sec   Loss 6.5327   LearningRate 0.0235   Epoch: 10   Global Step: 427030   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:42:04,512-Speed 2636.24 samples/sec   Loss 6.5057   LearningRate 0.0235   Epoch: 10   Global Step: 427040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:08,413-Speed 2625.24 samples/sec   Loss 6.4207   LearningRate 0.0235   Epoch: 10   Global Step: 427050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:12,312-Speed 2626.96 samples/sec   Loss 6.4390   LearningRate 0.0235   Epoch: 10   Global Step: 427060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:16,215-Speed 2624.56 samples/sec   Loss 6.4683   LearningRate 0.0235   Epoch: 10   Global Step: 427070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:20,130-Speed 2616.17 samples/sec   Loss 6.5828   LearningRate 0.0235   Epoch: 10   Global Step: 427080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:24,033-Speed 2623.99 samples/sec   Loss 6.5310   LearningRate 0.0235   Epoch: 10   Global Step: 427090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:27,926-Speed 2630.91 samples/sec   Loss 6.4477   LearningRate 0.0235   Epoch: 10   Global Step: 427100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:31,824-Speed 2628.40 samples/sec   Loss 6.3842   LearningRate 0.0235   Epoch: 10   Global Step: 427110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:42:35,703-Speed 2640.22 samples/sec   Loss 6.5645   LearningRate 0.0235   Epoch: 10   Global Step: 427120   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:42:39,605-Speed 2624.90 samples/sec   Loss 6.3740   LearningRate 0.0235   Epoch: 10   Global Step: 427130   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:42:43,502-Speed 2627.99 samples/sec   Loss 6.6033   LearningRate 0.0235   Epoch: 10   Global Step: 427140   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:42:47,400-Speed 2627.38 samples/sec   Loss 6.4880   LearningRate 0.0235   Epoch: 10   Global Step: 427150   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:42:51,305-Speed 2623.49 samples/sec   Loss 6.5606   LearningRate 0.0235   Epoch: 10   Global Step: 427160   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:42:55,238-Speed 2603.80 samples/sec   Loss 6.4357   LearningRate 0.0235   Epoch: 10   Global Step: 427170   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:42:59,171-Speed 2604.33 samples/sec   Loss 6.4830   LearningRate 0.0235   Epoch: 10   Global Step: 427180   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:43:03,063-Speed 2631.47 samples/sec   Loss 6.4768   LearningRate 0.0235   Epoch: 10   Global Step: 427190   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:43:06,960-Speed 2628.24 samples/sec   Loss 6.4110   LearningRate 0.0235   Epoch: 10   Global Step: 427200   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:43:10,855-Speed 2629.70 samples/sec   Loss 6.3382   LearningRate 0.0235   Epoch: 10   Global Step: 427210   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:43:14,752-Speed 2628.41 samples/sec   Loss 6.4816   LearningRate 0.0235   Epoch: 10   Global Step: 427220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:18,651-Speed 2627.05 samples/sec   Loss 6.4241   LearningRate 0.0235   Epoch: 10   Global Step: 427230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:22,552-Speed 2626.10 samples/sec   Loss 6.4173   LearningRate 0.0235   Epoch: 10   Global Step: 427240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:26,450-Speed 2627.39 samples/sec   Loss 6.4151   LearningRate 0.0235   Epoch: 10   Global Step: 427250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:30,353-Speed 2624.01 samples/sec   Loss 6.4551   LearningRate 0.0235   Epoch: 10   Global Step: 427260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:34,248-Speed 2629.80 samples/sec   Loss 6.4284   LearningRate 0.0235   Epoch: 10   Global Step: 427270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:38,158-Speed 2619.37 samples/sec   Loss 6.4463   LearningRate 0.0235   Epoch: 10   Global Step: 427280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:42,072-Speed 2617.36 samples/sec   Loss 6.4508   LearningRate 0.0235   Epoch: 10   Global Step: 427290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:45,962-Speed 2632.58 samples/sec   Loss 6.4054   LearningRate 0.0235   Epoch: 10   Global Step: 427300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:49,856-Speed 2630.27 samples/sec   Loss 6.4075   LearningRate 0.0235   Epoch: 10   Global Step: 427310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:43:53,756-Speed 2626.68 samples/sec   Loss 6.5410   LearningRate 0.0235   Epoch: 10   Global Step: 427320   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:43:57,635-Speed 2640.39 samples/sec   Loss 6.5043   LearningRate 0.0235   Epoch: 10   Global Step: 427330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:01,534-Speed 2627.04 samples/sec   Loss 6.4600   LearningRate 0.0235   Epoch: 10   Global Step: 427340   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:05,428-Speed 2630.16 samples/sec   Loss 6.5854   LearningRate 0.0235   Epoch: 10   Global Step: 427350   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:09,319-Speed 2632.22 samples/sec   Loss 6.4929   LearningRate 0.0235   Epoch: 10   Global Step: 427360   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:13,214-Speed 2629.51 samples/sec   Loss 6.4829   LearningRate 0.0235   Epoch: 10   Global Step: 427370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:17,112-Speed 2627.77 samples/sec   Loss 6.4753   LearningRate 0.0235   Epoch: 10   Global Step: 427380   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:21,008-Speed 2628.89 samples/sec   Loss 6.3793   LearningRate 0.0235   Epoch: 10   Global Step: 427390   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:24,914-Speed 2622.03 samples/sec   Loss 6.5211   LearningRate 0.0235   Epoch: 10   Global Step: 427400   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:28,885-Speed 2579.08 samples/sec   Loss 6.4382   LearningRate 0.0235   Epoch: 10   Global Step: 427410   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:32,805-Speed 2612.99 samples/sec   Loss 6.4271   LearningRate 0.0235   Epoch: 10   Global Step: 427420   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:44:36,702-Speed 2628.57 samples/sec   Loss 6.4813   LearningRate 0.0235   Epoch: 10   Global Step: 427430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:44:40,597-Speed 2629.62 samples/sec   Loss 6.4564   LearningRate 0.0235   Epoch: 10   Global Step: 427440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:44:44,500-Speed 2624.24 samples/sec   Loss 6.3122   LearningRate 0.0235   Epoch: 10   Global Step: 427450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:44:48,470-Speed 2580.26 samples/sec   Loss 6.4544   LearningRate 0.0235   Epoch: 10   Global Step: 427460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:44:52,406-Speed 2602.13 samples/sec   Loss 6.5398   LearningRate 0.0235   Epoch: 10   Global Step: 427470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:44:56,305-Speed 2626.80 samples/sec   Loss 6.4528   LearningRate 0.0235   Epoch: 10   Global Step: 427480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:00,202-Speed 2627.91 samples/sec   Loss 6.3609   LearningRate 0.0235   Epoch: 10   Global Step: 427490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:04,101-Speed 2627.47 samples/sec   Loss 6.5233   LearningRate 0.0235   Epoch: 10   Global Step: 427500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:08,015-Speed 2616.61 samples/sec   Loss 6.5548   LearningRate 0.0235   Epoch: 10   Global Step: 427510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:11,910-Speed 2629.42 samples/sec   Loss 6.3736   LearningRate 0.0235   Epoch: 10   Global Step: 427520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:15,784-Speed 2643.44 samples/sec   Loss 6.3916   LearningRate 0.0235   Epoch: 10   Global Step: 427530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:19,680-Speed 2629.51 samples/sec   Loss 6.4644   LearningRate 0.0235   Epoch: 10   Global Step: 427540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:23,576-Speed 2629.08 samples/sec   Loss 6.3912   LearningRate 0.0235   Epoch: 10   Global Step: 427550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:27,474-Speed 2627.35 samples/sec   Loss 6.4575   LearningRate 0.0235   Epoch: 10   Global Step: 427560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:31,363-Speed 2633.55 samples/sec   Loss 6.5694   LearningRate 0.0235   Epoch: 10   Global Step: 427570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:45:35,241-Speed 2641.39 samples/sec   Loss 6.5046   LearningRate 0.0235   Epoch: 10   Global Step: 427580   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:45:39,140-Speed 2626.52 samples/sec   Loss 6.4069   LearningRate 0.0235   Epoch: 10   Global Step: 427590   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:45:43,039-Speed 2627.24 samples/sec   Loss 6.4593   LearningRate 0.0235   Epoch: 10   Global Step: 427600   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:45:46,945-Speed 2622.11 samples/sec   Loss 6.4418   LearningRate 0.0235   Epoch: 10   Global Step: 427610   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:45:50,848-Speed 2623.81 samples/sec   Loss 6.4542   LearningRate 0.0235   Epoch: 10   Global Step: 427620   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:45:54,743-Speed 2630.05 samples/sec   Loss 6.3532   LearningRate 0.0235   Epoch: 10   Global Step: 427630   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:45:58,660-Speed 2614.72 samples/sec   Loss 6.5523   LearningRate 0.0235   Epoch: 10   Global Step: 427640   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:02,589-Speed 2607.53 samples/sec   Loss 6.5373   LearningRate 0.0235   Epoch: 10   Global Step: 427650   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:06,482-Speed 2630.78 samples/sec   Loss 6.4690   LearningRate 0.0235   Epoch: 10   Global Step: 427660   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:10,382-Speed 2625.99 samples/sec   Loss 6.4325   LearningRate 0.0235   Epoch: 10   Global Step: 427670   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:14,302-Speed 2612.64 samples/sec   Loss 6.4515   LearningRate 0.0235   Epoch: 10   Global Step: 427680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:46:18,197-Speed 2629.76 samples/sec   Loss 6.4643   LearningRate 0.0235   Epoch: 10   Global Step: 427690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:46:22,118-Speed 2611.98 samples/sec   Loss 6.4638   LearningRate 0.0235   Epoch: 10   Global Step: 427700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:46:26,020-Speed 2624.68 samples/sec   Loss 6.3634   LearningRate 0.0235   Epoch: 10   Global Step: 427710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:46:29,911-Speed 2632.47 samples/sec   Loss 6.4839   LearningRate 0.0235   Epoch: 10   Global Step: 427720   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:33,801-Speed 2633.25 samples/sec   Loss 6.4835   LearningRate 0.0235   Epoch: 10   Global Step: 427730   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:37,696-Speed 2629.64 samples/sec   Loss 6.3955   LearningRate 0.0235   Epoch: 10   Global Step: 427740   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:41,593-Speed 2628.37 samples/sec   Loss 6.5061   LearningRate 0.0235   Epoch: 10   Global Step: 427750   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:46:45,469-Speed 2642.16 samples/sec   Loss 6.5096   LearningRate 0.0235   Epoch: 10   Global Step: 427760   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:46:49,363-Speed 2630.52 samples/sec   Loss 6.4116   LearningRate 0.0235   Epoch: 10   Global Step: 427770   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:46:53,267-Speed 2623.26 samples/sec   Loss 6.3634   LearningRate 0.0235   Epoch: 10   Global Step: 427780   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:46:57,166-Speed 2626.64 samples/sec   Loss 6.4826   LearningRate 0.0235   Epoch: 10   Global Step: 427790   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:47:01,061-Speed 2629.70 samples/sec   Loss 6.3533   LearningRate 0.0235   Epoch: 10   Global Step: 427800   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:47:04,955-Speed 2629.95 samples/sec   Loss 6.4187   LearningRate 0.0235   Epoch: 10   Global Step: 427810   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:47:08,869-Speed 2617.81 samples/sec   Loss 6.5178   LearningRate 0.0235   Epoch: 10   Global Step: 427820   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:47:12,760-Speed 2632.22 samples/sec   Loss 6.4155   LearningRate 0.0235   Epoch: 10   Global Step: 427830   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:47:16,650-Speed 2632.29 samples/sec   Loss 6.3666   LearningRate 0.0235   Epoch: 10   Global Step: 427840   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:47:20,562-Speed 2618.83 samples/sec   Loss 6.4813   LearningRate 0.0235   Epoch: 10   Global Step: 427850   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 19:47:24,456-Speed 2630.20 samples/sec   Loss 6.3590   LearningRate 0.0234   Epoch: 10   Global Step: 427860   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:28,352-Speed 2628.50 samples/sec   Loss 6.5190   LearningRate 0.0234   Epoch: 10   Global Step: 427870   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:32,367-Speed 2551.13 samples/sec   Loss 6.4386   LearningRate 0.0234   Epoch: 10   Global Step: 427880   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:36,291-Speed 2610.08 samples/sec   Loss 6.4845   LearningRate 0.0234   Epoch: 10   Global Step: 427890   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:40,207-Speed 2615.48 samples/sec   Loss 6.4783   LearningRate 0.0234   Epoch: 10   Global Step: 427900   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:44,103-Speed 2628.55 samples/sec   Loss 6.4340   LearningRate 0.0234   Epoch: 10   Global Step: 427910   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:48,002-Speed 2627.16 samples/sec   Loss 6.4109   LearningRate 0.0234   Epoch: 10   Global Step: 427920   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:51,901-Speed 2627.15 samples/sec   Loss 6.4759   LearningRate 0.0234   Epoch: 10   Global Step: 427930   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:55,800-Speed 2626.93 samples/sec   Loss 6.3503   LearningRate 0.0234   Epoch: 10   Global Step: 427940   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:47:59,699-Speed 2627.45 samples/sec   Loss 6.5041   LearningRate 0.0234   Epoch: 10   Global Step: 427950   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:48:03,600-Speed 2625.18 samples/sec   Loss 6.3346   LearningRate 0.0234   Epoch: 10   Global Step: 427960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:07,495-Speed 2629.82 samples/sec   Loss 6.4759   LearningRate 0.0234   Epoch: 10   Global Step: 427970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:11,393-Speed 2627.21 samples/sec   Loss 6.4546   LearningRate 0.0234   Epoch: 10   Global Step: 427980   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:15,290-Speed 2628.58 samples/sec   Loss 6.5400   LearningRate 0.0234   Epoch: 10   Global Step: 427990   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:19,193-Speed 2623.67 samples/sec   Loss 6.4159   LearningRate 0.0234   Epoch: 10   Global Step: 428000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:23,112-Speed 2614.29 samples/sec   Loss 6.4321   LearningRate 0.0234   Epoch: 10   Global Step: 428010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:27,015-Speed 2624.21 samples/sec   Loss 6.3736   LearningRate 0.0234   Epoch: 10   Global Step: 428020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:30,911-Speed 2629.73 samples/sec   Loss 6.5000   LearningRate 0.0234   Epoch: 10   Global Step: 428030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:34,826-Speed 2615.87 samples/sec   Loss 6.4901   LearningRate 0.0234   Epoch: 10   Global Step: 428040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:38,722-Speed 2629.08 samples/sec   Loss 6.3805   LearningRate 0.0234   Epoch: 10   Global Step: 428050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:42,616-Speed 2629.59 samples/sec   Loss 6.4824   LearningRate 0.0234   Epoch: 10   Global Step: 428060   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:48:46,490-Speed 2643.91 samples/sec   Loss 6.4364   LearningRate 0.0234   Epoch: 10   Global Step: 428070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:50,381-Speed 2632.15 samples/sec   Loss 6.4632   LearningRate 0.0234   Epoch: 10   Global Step: 428080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:54,409-Speed 2543.14 samples/sec   Loss 6.4213   LearningRate 0.0234   Epoch: 10   Global Step: 428090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:48:58,456-Speed 2530.69 samples/sec   Loss 6.5064   LearningRate 0.0234   Epoch: 10   Global Step: 428100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:02,351-Speed 2629.56 samples/sec   Loss 6.4260   LearningRate 0.0234   Epoch: 10   Global Step: 428110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:06,243-Speed 2631.87 samples/sec   Loss 6.4922   LearningRate 0.0234   Epoch: 10   Global Step: 428120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:10,135-Speed 2631.30 samples/sec   Loss 6.3862   LearningRate 0.0234   Epoch: 10   Global Step: 428130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:14,027-Speed 2633.30 samples/sec   Loss 6.3686   LearningRate 0.0234   Epoch: 10   Global Step: 428140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:17,922-Speed 2629.38 samples/sec   Loss 6.4605   LearningRate 0.0234   Epoch: 10   Global Step: 428150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:21,817-Speed 2629.74 samples/sec   Loss 6.5482   LearningRate 0.0234   Epoch: 10   Global Step: 428160   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:25,709-Speed 2631.14 samples/sec   Loss 6.5508   LearningRate 0.0234   Epoch: 10   Global Step: 428170   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:49:29,586-Speed 2642.47 samples/sec   Loss 6.6105   LearningRate 0.0234   Epoch: 10   Global Step: 428180   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:33,484-Speed 2627.06 samples/sec   Loss 6.4354   LearningRate 0.0234   Epoch: 10   Global Step: 428190   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:37,387-Speed 2624.34 samples/sec   Loss 6.5367   LearningRate 0.0234   Epoch: 10   Global Step: 428200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:41,281-Speed 2630.37 samples/sec   Loss 6.4422   LearningRate 0.0234   Epoch: 10   Global Step: 428210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:45,182-Speed 2625.61 samples/sec   Loss 6.4108   LearningRate 0.0234   Epoch: 10   Global Step: 428220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:49,082-Speed 2626.42 samples/sec   Loss 6.5748   LearningRate 0.0234   Epoch: 10   Global Step: 428230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:52,978-Speed 2628.86 samples/sec   Loss 6.5516   LearningRate 0.0234   Epoch: 10   Global Step: 428240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:49:56,875-Speed 2628.30 samples/sec   Loss 6.4961   LearningRate 0.0234   Epoch: 10   Global Step: 428250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:00,767-Speed 2631.46 samples/sec   Loss 6.5090   LearningRate 0.0234   Epoch: 10   Global Step: 428260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:04,863-Speed 2500.97 samples/sec   Loss 6.4356   LearningRate 0.0234   Epoch: 10   Global Step: 428270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:08,803-Speed 2599.34 samples/sec   Loss 6.3970   LearningRate 0.0234   Epoch: 10   Global Step: 428280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:12,788-Speed 2569.96 samples/sec   Loss 6.4145   LearningRate 0.0234   Epoch: 10   Global Step: 428290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:16,687-Speed 2626.55 samples/sec   Loss 6.5137   LearningRate 0.0234   Epoch: 10   Global Step: 428300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:20,583-Speed 2629.04 samples/sec   Loss 6.4199   LearningRate 0.0234   Epoch: 10   Global Step: 428310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:24,498-Speed 2616.51 samples/sec   Loss 6.4560   LearningRate 0.0234   Epoch: 10   Global Step: 428320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:28,400-Speed 2625.25 samples/sec   Loss 6.5834   LearningRate 0.0234   Epoch: 10   Global Step: 428330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:32,299-Speed 2626.42 samples/sec   Loss 6.4559   LearningRate 0.0234   Epoch: 10   Global Step: 428340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:36,209-Speed 2620.15 samples/sec   Loss 6.4116   LearningRate 0.0234   Epoch: 10   Global Step: 428350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:50:40,260-Speed 2528.01 samples/sec   Loss 6.4101   LearningRate 0.0234   Epoch: 10   Global Step: 428360   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:50:44,350-Speed 2503.84 samples/sec   Loss 6.3316   LearningRate 0.0234   Epoch: 10   Global Step: 428370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:50:48,274-Speed 2610.29 samples/sec   Loss 6.5350   LearningRate 0.0234   Epoch: 10   Global Step: 428380   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:50:52,170-Speed 2629.15 samples/sec   Loss 6.3742   LearningRate 0.0234   Epoch: 10   Global Step: 428390   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:50:56,064-Speed 2630.06 samples/sec   Loss 6.3541   LearningRate 0.0234   Epoch: 10   Global Step: 428400   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:50:59,959-Speed 2630.01 samples/sec   Loss 6.4275   LearningRate 0.0234   Epoch: 10   Global Step: 428410   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:51:03,862-Speed 2624.87 samples/sec   Loss 6.4048   LearningRate 0.0234   Epoch: 10   Global Step: 428420   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:51:07,758-Speed 2628.65 samples/sec   Loss 6.4237   LearningRate 0.0234   Epoch: 10   Global Step: 428430   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:51:11,658-Speed 2625.85 samples/sec   Loss 6.4194   LearningRate 0.0234   Epoch: 10   Global Step: 428440   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:51:15,554-Speed 2629.39 samples/sec   Loss 6.4298   LearningRate 0.0234   Epoch: 10   Global Step: 428450   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:51:19,449-Speed 2629.31 samples/sec   Loss 6.4877   LearningRate 0.0234   Epoch: 10   Global Step: 428460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:23,355-Speed 2622.53 samples/sec   Loss 6.4764   LearningRate 0.0234   Epoch: 10   Global Step: 428470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:27,247-Speed 2631.47 samples/sec   Loss 6.4110   LearningRate 0.0234   Epoch: 10   Global Step: 428480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:31,141-Speed 2630.34 samples/sec   Loss 6.4563   LearningRate 0.0234   Epoch: 10   Global Step: 428490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:35,047-Speed 2625.14 samples/sec   Loss 6.3522   LearningRate 0.0234   Epoch: 10   Global Step: 428500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:38,951-Speed 2623.84 samples/sec   Loss 6.4289   LearningRate 0.0234   Epoch: 10   Global Step: 428510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:42,861-Speed 2618.87 samples/sec   Loss 6.4993   LearningRate 0.0234   Epoch: 10   Global Step: 428520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:46,768-Speed 2621.72 samples/sec   Loss 6.5020   LearningRate 0.0234   Epoch: 10   Global Step: 428530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:50,673-Speed 2623.23 samples/sec   Loss 6.3794   LearningRate 0.0234   Epoch: 10   Global Step: 428540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:54,582-Speed 2619.63 samples/sec   Loss 6.5645   LearningRate 0.0234   Epoch: 10   Global Step: 428550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:51:58,480-Speed 2627.68 samples/sec   Loss 6.4701   LearningRate 0.0234   Epoch: 10   Global Step: 428560   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:52:02,358-Speed 2640.89 samples/sec   Loss 6.4249   LearningRate 0.0234   Epoch: 10   Global Step: 428570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:52:06,255-Speed 2628.92 samples/sec   Loss 6.4735   LearningRate 0.0234   Epoch: 10   Global Step: 428580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:52:10,140-Speed 2636.29 samples/sec   Loss 6.4504   LearningRate 0.0234   Epoch: 10   Global Step: 428590   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:14,038-Speed 2627.52 samples/sec   Loss 6.3870   LearningRate 0.0234   Epoch: 10   Global Step: 428600   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:17,936-Speed 2627.66 samples/sec   Loss 6.4656   LearningRate 0.0234   Epoch: 10   Global Step: 428610   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:21,833-Speed 2628.00 samples/sec   Loss 6.4463   LearningRate 0.0234   Epoch: 10   Global Step: 428620   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:25,726-Speed 2630.70 samples/sec   Loss 6.4116   LearningRate 0.0234   Epoch: 10   Global Step: 428630   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:29,626-Speed 2626.06 samples/sec   Loss 6.4812   LearningRate 0.0234   Epoch: 10   Global Step: 428640   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:33,520-Speed 2630.13 samples/sec   Loss 6.5856   LearningRate 0.0234   Epoch: 10   Global Step: 428650   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:37,414-Speed 2630.11 samples/sec   Loss 6.4589   LearningRate 0.0234   Epoch: 10   Global Step: 428660   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:41,312-Speed 2627.95 samples/sec   Loss 6.4161   LearningRate 0.0234   Epoch: 10   Global Step: 428670   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:45,211-Speed 2627.12 samples/sec   Loss 6.4812   LearningRate 0.0234   Epoch: 10   Global Step: 428680   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:52:49,125-Speed 2616.43 samples/sec   Loss 6.3469   LearningRate 0.0234   Epoch: 10   Global Step: 428690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:52:53,037-Speed 2619.18 samples/sec   Loss 6.5887   LearningRate 0.0234   Epoch: 10   Global Step: 428700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:52:56,943-Speed 2621.69 samples/sec   Loss 6.3583   LearningRate 0.0234   Epoch: 10   Global Step: 428710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:00,851-Speed 2620.93 samples/sec   Loss 6.4814   LearningRate 0.0233   Epoch: 10   Global Step: 428720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:04,760-Speed 2620.21 samples/sec   Loss 6.4804   LearningRate 0.0233   Epoch: 10   Global Step: 428730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:08,657-Speed 2628.23 samples/sec   Loss 6.3217   LearningRate 0.0233   Epoch: 10   Global Step: 428740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:12,557-Speed 2626.20 samples/sec   Loss 6.4028   LearningRate 0.0233   Epoch: 10   Global Step: 428750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:16,453-Speed 2628.62 samples/sec   Loss 6.4491   LearningRate 0.0233   Epoch: 10   Global Step: 428760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:20,353-Speed 2626.64 samples/sec   Loss 6.3718   LearningRate 0.0233   Epoch: 10   Global Step: 428770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:24,252-Speed 2626.66 samples/sec   Loss 6.4381   LearningRate 0.0233   Epoch: 10   Global Step: 428780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:28,153-Speed 2625.84 samples/sec   Loss 6.4947   LearningRate 0.0233   Epoch: 10   Global Step: 428790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:32,055-Speed 2624.70 samples/sec   Loss 6.3771   LearningRate 0.0233   Epoch: 10   Global Step: 428800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:35,949-Speed 2630.53 samples/sec   Loss 6.4844   LearningRate 0.0233   Epoch: 10   Global Step: 428810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:39,844-Speed 2629.64 samples/sec   Loss 6.5233   LearningRate 0.0233   Epoch: 10   Global Step: 428820   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:43,748-Speed 2623.76 samples/sec   Loss 6.3684   LearningRate 0.0233   Epoch: 10   Global Step: 428830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:47,642-Speed 2630.25 samples/sec   Loss 6.5601   LearningRate 0.0233   Epoch: 10   Global Step: 428840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:51,539-Speed 2627.98 samples/sec   Loss 6.4343   LearningRate 0.0233   Epoch: 10   Global Step: 428850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:55,442-Speed 2624.04 samples/sec   Loss 6.4609   LearningRate 0.0233   Epoch: 10   Global Step: 428860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:53:59,349-Speed 2622.19 samples/sec   Loss 6.4822   LearningRate 0.0233   Epoch: 10   Global Step: 428870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:54:03,253-Speed 2623.94 samples/sec   Loss 6.5302   LearningRate 0.0233   Epoch: 10   Global Step: 428880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:54:07,227-Speed 2577.09 samples/sec   Loss 6.3015   LearningRate 0.0233   Epoch: 10   Global Step: 428890   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:54:11,097-Speed 2646.62 samples/sec   Loss 6.4787   LearningRate 0.0233   Epoch: 10   Global Step: 428900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:54:14,992-Speed 2629.59 samples/sec   Loss 6.4838   LearningRate 0.0233   Epoch: 10   Global Step: 428910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:54:18,892-Speed 2626.20 samples/sec   Loss 6.4442   LearningRate 0.0233   Epoch: 10   Global Step: 428920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:54:22,790-Speed 2628.08 samples/sec   Loss 6.4189   LearningRate 0.0233   Epoch: 10   Global Step: 428930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:54:26,690-Speed 2625.80 samples/sec   Loss 6.5087   LearningRate 0.0233   Epoch: 10   Global Step: 428940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:54:30,579-Speed 2633.54 samples/sec   Loss 6.4089   LearningRate 0.0233   Epoch: 10   Global Step: 428950   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:54:34,486-Speed 2621.46 samples/sec   Loss 6.4434   LearningRate 0.0233   Epoch: 10   Global Step: 428960   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:54:38,396-Speed 2619.50 samples/sec   Loss 6.4732   LearningRate 0.0233   Epoch: 10   Global Step: 428970   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:54:42,309-Speed 2617.51 samples/sec   Loss 6.5529   LearningRate 0.0233   Epoch: 10   Global Step: 428980   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:54:46,206-Speed 2629.30 samples/sec   Loss 6.4895   LearningRate 0.0233   Epoch: 10   Global Step: 428990   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:54:50,109-Speed 2624.16 samples/sec   Loss 6.4926   LearningRate 0.0233   Epoch: 10   Global Step: 429000   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:54:54,006-Speed 2628.48 samples/sec   Loss 6.4939   LearningRate 0.0233   Epoch: 10   Global Step: 429010   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:54:57,931-Speed 2609.99 samples/sec   Loss 6.5533   LearningRate 0.0233   Epoch: 10   Global Step: 429020   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:01,831-Speed 2625.75 samples/sec   Loss 6.5022   LearningRate 0.0233   Epoch: 10   Global Step: 429030   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:05,726-Speed 2629.84 samples/sec   Loss 6.3705   LearningRate 0.0233   Epoch: 10   Global Step: 429040   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:09,629-Speed 2623.65 samples/sec   Loss 6.4674   LearningRate 0.0233   Epoch: 10   Global Step: 429050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:55:13,518-Speed 2634.04 samples/sec   Loss 6.3503   LearningRate 0.0233   Epoch: 10   Global Step: 429060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:55:17,410-Speed 2632.07 samples/sec   Loss 6.3307   LearningRate 0.0233   Epoch: 10   Global Step: 429070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:55:21,304-Speed 2630.52 samples/sec   Loss 6.3233   LearningRate 0.0233   Epoch: 10   Global Step: 429080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:55:25,198-Speed 2629.61 samples/sec   Loss 6.5038   LearningRate 0.0233   Epoch: 10   Global Step: 429090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:55:29,083-Speed 2637.76 samples/sec   Loss 6.3361   LearningRate 0.0233   Epoch: 10   Global Step: 429100   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:32,980-Speed 2627.63 samples/sec   Loss 6.4649   LearningRate 0.0233   Epoch: 10   Global Step: 429110   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:36,882-Speed 2624.64 samples/sec   Loss 6.4047   LearningRate 0.0233   Epoch: 10   Global Step: 429120   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:40,788-Speed 2622.14 samples/sec   Loss 6.4262   LearningRate 0.0233   Epoch: 10   Global Step: 429130   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:44,702-Speed 2617.38 samples/sec   Loss 6.5365   LearningRate 0.0233   Epoch: 10   Global Step: 429140   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:48,606-Speed 2623.10 samples/sec   Loss 6.4702   LearningRate 0.0233   Epoch: 10   Global Step: 429150   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:52,510-Speed 2624.25 samples/sec   Loss 6.5596   LearningRate 0.0233   Epoch: 10   Global Step: 429160   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:55:56,407-Speed 2628.24 samples/sec   Loss 6.3977   LearningRate 0.0233   Epoch: 10   Global Step: 429170   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:00,303-Speed 2629.12 samples/sec   Loss 6.3397   LearningRate 0.0233   Epoch: 10   Global Step: 429180   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:04,196-Speed 2630.41 samples/sec   Loss 6.4330   LearningRate 0.0233   Epoch: 10   Global Step: 429190   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:08,094-Speed 2627.41 samples/sec   Loss 6.3421   LearningRate 0.0233   Epoch: 10   Global Step: 429200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:56:11,990-Speed 2628.92 samples/sec   Loss 6.4012   LearningRate 0.0233   Epoch: 10   Global Step: 429210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:56:15,903-Speed 2617.81 samples/sec   Loss 6.3099   LearningRate 0.0233   Epoch: 10   Global Step: 429220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:56:19,798-Speed 2629.71 samples/sec   Loss 6.4007   LearningRate 0.0233   Epoch: 10   Global Step: 429230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:56:23,677-Speed 2640.04 samples/sec   Loss 6.4132   LearningRate 0.0233   Epoch: 10   Global Step: 429240   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:27,577-Speed 2626.67 samples/sec   Loss 6.5166   LearningRate 0.0233   Epoch: 10   Global Step: 429250   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:31,477-Speed 2626.51 samples/sec   Loss 6.4172   LearningRate 0.0233   Epoch: 10   Global Step: 429260   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:35,370-Speed 2630.97 samples/sec   Loss 6.3636   LearningRate 0.0233   Epoch: 10   Global Step: 429270   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:39,270-Speed 2625.78 samples/sec   Loss 6.3984   LearningRate 0.0233   Epoch: 10   Global Step: 429280   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:43,174-Speed 2623.88 samples/sec   Loss 6.3689   LearningRate 0.0233   Epoch: 10   Global Step: 429290   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:47,091-Speed 2614.33 samples/sec   Loss 6.4668   LearningRate 0.0233   Epoch: 10   Global Step: 429300   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:50,996-Speed 2623.26 samples/sec   Loss 6.3427   LearningRate 0.0233   Epoch: 10   Global Step: 429310   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:54,894-Speed 2627.12 samples/sec   Loss 6.5078   LearningRate 0.0233   Epoch: 10   Global Step: 429320   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:56:59,102-Speed 2434.25 samples/sec   Loss 6.4699   LearningRate 0.0233   Epoch: 10   Global Step: 429330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:02,998-Speed 2628.97 samples/sec   Loss 6.5579   LearningRate 0.0233   Epoch: 10   Global Step: 429340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:57:06,906-Speed 2621.33 samples/sec   Loss 6.5220   LearningRate 0.0233   Epoch: 10   Global Step: 429350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:57:10,801-Speed 2629.39 samples/sec   Loss 6.4901   LearningRate 0.0233   Epoch: 10   Global Step: 429360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:57:14,677-Speed 2641.94 samples/sec   Loss 6.4446   LearningRate 0.0233   Epoch: 10   Global Step: 429370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:18,571-Speed 2630.70 samples/sec   Loss 6.4462   LearningRate 0.0233   Epoch: 10   Global Step: 429380   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:22,464-Speed 2630.28 samples/sec   Loss 6.3733   LearningRate 0.0233   Epoch: 10   Global Step: 429390   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:26,353-Speed 2633.82 samples/sec   Loss 6.3665   LearningRate 0.0233   Epoch: 10   Global Step: 429400   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:30,243-Speed 2633.05 samples/sec   Loss 6.5127   LearningRate 0.0233   Epoch: 10   Global Step: 429410   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:34,134-Speed 2632.29 samples/sec   Loss 6.5461   LearningRate 0.0233   Epoch: 10   Global Step: 429420   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:38,024-Speed 2633.07 samples/sec   Loss 6.5131   LearningRate 0.0233   Epoch: 10   Global Step: 429430   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:41,923-Speed 2627.33 samples/sec   Loss 6.3318   LearningRate 0.0233   Epoch: 10   Global Step: 429440   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:45,816-Speed 2630.91 samples/sec   Loss 6.4499   LearningRate 0.0233   Epoch: 10   Global Step: 429450   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:49,721-Speed 2623.08 samples/sec   Loss 6.4935   LearningRate 0.0233   Epoch: 10   Global Step: 429460   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:53,606-Speed 2636.45 samples/sec   Loss 6.4372   LearningRate 0.0233   Epoch: 10   Global Step: 429470   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:57:57,517-Speed 2619.76 samples/sec   Loss 6.3999   LearningRate 0.0233   Epoch: 10   Global Step: 429480   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:01,449-Speed 2604.73 samples/sec   Loss 6.4538   LearningRate 0.0233   Epoch: 10   Global Step: 429490   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:05,341-Speed 2632.19 samples/sec   Loss 6.3539   LearningRate 0.0233   Epoch: 10   Global Step: 429500   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:09,253-Speed 2617.74 samples/sec   Loss 6.4393   LearningRate 0.0233   Epoch: 10   Global Step: 429510   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:13,152-Speed 2626.87 samples/sec   Loss 6.5007   LearningRate 0.0233   Epoch: 10   Global Step: 429520   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:17,049-Speed 2628.35 samples/sec   Loss 6.4676   LearningRate 0.0233   Epoch: 10   Global Step: 429530   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:20,945-Speed 2629.36 samples/sec   Loss 6.4375   LearningRate 0.0233   Epoch: 10   Global Step: 429540   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:24,843-Speed 2627.73 samples/sec   Loss 6.4040   LearningRate 0.0233   Epoch: 10   Global Step: 429550   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:28,745-Speed 2624.35 samples/sec   Loss 6.3673   LearningRate 0.0233   Epoch: 10   Global Step: 429560   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:58:32,669-Speed 2610.07 samples/sec   Loss 6.4230   LearningRate 0.0233   Epoch: 10   Global Step: 429570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:58:36,570-Speed 2626.27 samples/sec   Loss 6.3838   LearningRate 0.0232   Epoch: 10   Global Step: 429580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:58:40,465-Speed 2629.74 samples/sec   Loss 6.4199   LearningRate 0.0232   Epoch: 10   Global Step: 429590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:58:44,360-Speed 2629.57 samples/sec   Loss 6.4152   LearningRate 0.0232   Epoch: 10   Global Step: 429600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:58:48,256-Speed 2629.39 samples/sec   Loss 6.3466   LearningRate 0.0232   Epoch: 10   Global Step: 429610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:58:52,186-Speed 2605.70 samples/sec   Loss 6.4016   LearningRate 0.0232   Epoch: 10   Global Step: 429620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:58:56,092-Speed 2622.46 samples/sec   Loss 6.4539   LearningRate 0.0232   Epoch: 10   Global Step: 429630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:58:59,990-Speed 2627.72 samples/sec   Loss 6.5153   LearningRate 0.0232   Epoch: 10   Global Step: 429640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:59:03,894-Speed 2623.69 samples/sec   Loss 6.4143   LearningRate 0.0232   Epoch: 10   Global Step: 429650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:59:07,793-Speed 2627.05 samples/sec   Loss 6.3420   LearningRate 0.0232   Epoch: 10   Global Step: 429660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:59:11,690-Speed 2628.08 samples/sec   Loss 6.3389   LearningRate 0.0232   Epoch: 10   Global Step: 429670   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 19:59:15,568-Speed 2641.46 samples/sec   Loss 6.3971   LearningRate 0.0232   Epoch: 10   Global Step: 429680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 19:59:19,448-Speed 2639.80 samples/sec   Loss 6.4166   LearningRate 0.0232   Epoch: 10   Global Step: 429690   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:23,355-Speed 2621.53 samples/sec   Loss 6.4185   LearningRate 0.0232   Epoch: 10   Global Step: 429700   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:27,245-Speed 2632.69 samples/sec   Loss 6.4964   LearningRate 0.0232   Epoch: 10   Global Step: 429710   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:31,136-Speed 2632.77 samples/sec   Loss 6.4274   LearningRate 0.0232   Epoch: 10   Global Step: 429720   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:35,034-Speed 2628.14 samples/sec   Loss 6.4085   LearningRate 0.0232   Epoch: 10   Global Step: 429730   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:38,951-Speed 2614.33 samples/sec   Loss 6.4083   LearningRate 0.0232   Epoch: 10   Global Step: 429740   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:42,843-Speed 2632.05 samples/sec   Loss 6.4948   LearningRate 0.0232   Epoch: 10   Global Step: 429750   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:46,738-Speed 2630.00 samples/sec   Loss 6.4395   LearningRate 0.0232   Epoch: 10   Global Step: 429760   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:50,635-Speed 2627.90 samples/sec   Loss 6.4422   LearningRate 0.0232   Epoch: 10   Global Step: 429770   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:54,549-Speed 2616.81 samples/sec   Loss 6.3672   LearningRate 0.0232   Epoch: 10   Global Step: 429780   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 19:59:58,440-Speed 2632.99 samples/sec   Loss 6.5482   LearningRate 0.0232   Epoch: 10   Global Step: 429790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:00:02,333-Speed 2630.66 samples/sec   Loss 6.5221   LearningRate 0.0232   Epoch: 10   Global Step: 429800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:00:06,230-Speed 2628.44 samples/sec   Loss 6.4440   LearningRate 0.0232   Epoch: 10   Global Step: 429810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:00:10,132-Speed 2624.34 samples/sec   Loss 6.3923   LearningRate 0.0232   Epoch: 10   Global Step: 429820   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:00:14,029-Speed 2628.93 samples/sec   Loss 6.4182   LearningRate 0.0232   Epoch: 10   Global Step: 429830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:00:17,915-Speed 2635.14 samples/sec   Loss 6.5550   LearningRate 0.0232   Epoch: 10   Global Step: 429840   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:21,817-Speed 2625.03 samples/sec   Loss 6.3694   LearningRate 0.0232   Epoch: 10   Global Step: 429850   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:25,723-Speed 2622.49 samples/sec   Loss 6.4759   LearningRate 0.0232   Epoch: 10   Global Step: 429860   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:29,618-Speed 2629.44 samples/sec   Loss 6.5076   LearningRate 0.0232   Epoch: 10   Global Step: 429870   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:33,513-Speed 2629.98 samples/sec   Loss 6.3231   LearningRate 0.0232   Epoch: 10   Global Step: 429880   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:37,412-Speed 2626.91 samples/sec   Loss 6.3597   LearningRate 0.0232   Epoch: 10   Global Step: 429890   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:41,311-Speed 2626.65 samples/sec   Loss 6.4838   LearningRate 0.0232   Epoch: 10   Global Step: 429900   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:45,233-Speed 2611.81 samples/sec   Loss 6.4086   LearningRate 0.0232   Epoch: 10   Global Step: 429910   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:49,126-Speed 2631.42 samples/sec   Loss 6.3555   LearningRate 0.0232   Epoch: 10   Global Step: 429920   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:53,032-Speed 2622.85 samples/sec   Loss 6.3714   LearningRate 0.0232   Epoch: 10   Global Step: 429930   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:00:56,947-Speed 2616.33 samples/sec   Loss 6.5340   LearningRate 0.0232   Epoch: 10   Global Step: 429940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:01:00,842-Speed 2630.04 samples/sec   Loss 6.4656   LearningRate 0.0232   Epoch: 10   Global Step: 429950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:01:04,737-Speed 2629.56 samples/sec   Loss 6.3728   LearningRate 0.0232   Epoch: 10   Global Step: 429960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:01:08,633-Speed 2628.53 samples/sec   Loss 6.4661   LearningRate 0.0232   Epoch: 10   Global Step: 429970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:01:12,530-Speed 2628.37 samples/sec   Loss 6.3788   LearningRate 0.0232   Epoch: 10   Global Step: 429980   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:01:16,430-Speed 2626.63 samples/sec   Loss 6.3925   LearningRate 0.0232   Epoch: 10   Global Step: 429990   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:01:20,326-Speed 2628.82 samples/sec   Loss 6.4771   LearningRate 0.0232   Epoch: 10   Global Step: 430000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:02:03,095-[lfw][430000]XNorm: 22.466479
Training: 2022-04-14 20:02:03,096-[lfw][430000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 20:02:03,096-[lfw][430000]Accuracy-Highest: 0.99783
Training: 2022-04-14 20:02:52,903-[cfp_fp][430000]XNorm: 20.830325
Training: 2022-04-14 20:02:52,904-[cfp_fp][430000]Accuracy-Flip: 0.98814+-0.00384
Training: 2022-04-14 20:02:52,905-[cfp_fp][430000]Accuracy-Highest: 0.98843
Training: 2022-04-14 20:03:35,703-[agedb_30][430000]XNorm: 22.284805
Training: 2022-04-14 20:03:35,704-[agedb_30][430000]Accuracy-Flip: 0.97767+-0.00549
Training: 2022-04-14 20:03:35,705-[agedb_30][430000]Accuracy-Highest: 0.97767
Training: 2022-04-14 20:03:39,587-Speed 73.53 samples/sec   Loss 6.4008   LearningRate 0.0232   Epoch: 10   Global Step: 430010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:03:43,460-Speed 2644.40 samples/sec   Loss 6.4639   LearningRate 0.0232   Epoch: 10   Global Step: 430020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:03:47,336-Speed 2642.77 samples/sec   Loss 6.4191   LearningRate 0.0232   Epoch: 10   Global Step: 430030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:03:51,224-Speed 2634.55 samples/sec   Loss 6.4487   LearningRate 0.0232   Epoch: 10   Global Step: 430040   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 20:03:55,085-Speed 2652.49 samples/sec   Loss 6.4077   LearningRate 0.0232   Epoch: 10   Global Step: 430050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:03:58,976-Speed 2632.15 samples/sec   Loss 6.4880   LearningRate 0.0232   Epoch: 10   Global Step: 430060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:04:02,864-Speed 2635.14 samples/sec   Loss 6.3559   LearningRate 0.0232   Epoch: 10   Global Step: 430070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:04:06,757-Speed 2631.04 samples/sec   Loss 6.3997   LearningRate 0.0232   Epoch: 10   Global Step: 430080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:04:10,661-Speed 2624.03 samples/sec   Loss 6.5331   LearningRate 0.0232   Epoch: 10   Global Step: 430090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:04:14,532-Speed 2645.80 samples/sec   Loss 6.4263   LearningRate 0.0232   Epoch: 10   Global Step: 430100   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:18,427-Speed 2629.29 samples/sec   Loss 6.3941   LearningRate 0.0232   Epoch: 10   Global Step: 430110   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:22,323-Speed 2628.72 samples/sec   Loss 6.4402   LearningRate 0.0232   Epoch: 10   Global Step: 430120   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:26,214-Speed 2632.51 samples/sec   Loss 6.4270   LearningRate 0.0232   Epoch: 10   Global Step: 430130   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:30,110-Speed 2629.78 samples/sec   Loss 6.3570   LearningRate 0.0232   Epoch: 10   Global Step: 430140   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:34,037-Speed 2608.19 samples/sec   Loss 6.4948   LearningRate 0.0232   Epoch: 10   Global Step: 430150   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:37,935-Speed 2628.20 samples/sec   Loss 6.4366   LearningRate 0.0232   Epoch: 10   Global Step: 430160   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:41,828-Speed 2630.77 samples/sec   Loss 6.4051   LearningRate 0.0232   Epoch: 10   Global Step: 430170   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:45,738-Speed 2620.18 samples/sec   Loss 6.4565   LearningRate 0.0232   Epoch: 10   Global Step: 430180   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:49,632-Speed 2630.15 samples/sec   Loss 6.4345   LearningRate 0.0232   Epoch: 10   Global Step: 430190   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:04:53,528-Speed 2629.40 samples/sec   Loss 6.4187   LearningRate 0.0232   Epoch: 10   Global Step: 430200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:04:57,424-Speed 2628.82 samples/sec   Loss 6.3432   LearningRate 0.0232   Epoch: 10   Global Step: 430210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:01,324-Speed 2626.60 samples/sec   Loss 6.4029   LearningRate 0.0232   Epoch: 10   Global Step: 430220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:05,225-Speed 2624.86 samples/sec   Loss 6.4388   LearningRate 0.0232   Epoch: 10   Global Step: 430230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:09,123-Speed 2628.21 samples/sec   Loss 6.4217   LearningRate 0.0232   Epoch: 10   Global Step: 430240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:13,023-Speed 2626.74 samples/sec   Loss 6.4830   LearningRate 0.0232   Epoch: 10   Global Step: 430250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:16,927-Speed 2623.51 samples/sec   Loss 6.4006   LearningRate 0.0232   Epoch: 10   Global Step: 430260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:20,828-Speed 2625.32 samples/sec   Loss 6.3404   LearningRate 0.0232   Epoch: 10   Global Step: 430270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:24,726-Speed 2627.59 samples/sec   Loss 6.3343   LearningRate 0.0232   Epoch: 10   Global Step: 430280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:28,632-Speed 2622.18 samples/sec   Loss 6.3155   LearningRate 0.0232   Epoch: 10   Global Step: 430290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:32,515-Speed 2637.92 samples/sec   Loss 6.3487   LearningRate 0.0232   Epoch: 10   Global Step: 430300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:36,417-Speed 2624.98 samples/sec   Loss 6.4670   LearningRate 0.0232   Epoch: 10   Global Step: 430310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:40,314-Speed 2628.54 samples/sec   Loss 6.4798   LearningRate 0.0232   Epoch: 10   Global Step: 430320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:44,221-Speed 2621.19 samples/sec   Loss 6.3733   LearningRate 0.0232   Epoch: 10   Global Step: 430330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:48,139-Speed 2615.62 samples/sec   Loss 6.4239   LearningRate 0.0232   Epoch: 10   Global Step: 430340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:52,040-Speed 2625.79 samples/sec   Loss 6.3330   LearningRate 0.0232   Epoch: 10   Global Step: 430350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:05:55,939-Speed 2626.66 samples/sec   Loss 6.3925   LearningRate 0.0232   Epoch: 10   Global Step: 430360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:00,007-Speed 2517.68 samples/sec   Loss 6.3419   LearningRate 0.0232   Epoch: 10   Global Step: 430370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:03,913-Speed 2621.93 samples/sec   Loss 6.4577   LearningRate 0.0232   Epoch: 10   Global Step: 430380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:07,808-Speed 2629.87 samples/sec   Loss 6.3503   LearningRate 0.0232   Epoch: 10   Global Step: 430390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:11,683-Speed 2643.92 samples/sec   Loss 6.3946   LearningRate 0.0232   Epoch: 10   Global Step: 430400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:15,588-Speed 2622.74 samples/sec   Loss 6.3535   LearningRate 0.0232   Epoch: 10   Global Step: 430410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:19,490-Speed 2624.75 samples/sec   Loss 6.3612   LearningRate 0.0232   Epoch: 10   Global Step: 430420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:23,396-Speed 2622.42 samples/sec   Loss 6.3441   LearningRate 0.0232   Epoch: 10   Global Step: 430430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:27,302-Speed 2622.35 samples/sec   Loss 6.4000   LearningRate 0.0231   Epoch: 10   Global Step: 430440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:31,200-Speed 2627.18 samples/sec   Loss 6.5631   LearningRate 0.0231   Epoch: 10   Global Step: 430450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:35,106-Speed 2622.04 samples/sec   Loss 6.3804   LearningRate 0.0231   Epoch: 10   Global Step: 430460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:39,006-Speed 2626.19 samples/sec   Loss 6.5087   LearningRate 0.0231   Epoch: 10   Global Step: 430470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:42,905-Speed 2627.74 samples/sec   Loss 6.3982   LearningRate 0.0231   Epoch: 10   Global Step: 430480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:46,799-Speed 2630.03 samples/sec   Loss 6.4233   LearningRate 0.0231   Epoch: 10   Global Step: 430490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:50,698-Speed 2627.21 samples/sec   Loss 6.4799   LearningRate 0.0231   Epoch: 10   Global Step: 430500   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 20:06:54,576-Speed 2641.05 samples/sec   Loss 6.3839   LearningRate 0.0231   Epoch: 10   Global Step: 430510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:06:58,470-Speed 2630.00 samples/sec   Loss 6.3839   LearningRate 0.0231   Epoch: 10   Global Step: 430520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:02,363-Speed 2631.40 samples/sec   Loss 6.3405   LearningRate 0.0231   Epoch: 10   Global Step: 430530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:06,262-Speed 2626.75 samples/sec   Loss 6.4645   LearningRate 0.0231   Epoch: 10   Global Step: 430540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:10,157-Speed 2629.72 samples/sec   Loss 6.3774   LearningRate 0.0231   Epoch: 10   Global Step: 430550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:14,053-Speed 2629.58 samples/sec   Loss 6.4342   LearningRate 0.0231   Epoch: 10   Global Step: 430560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:17,954-Speed 2626.14 samples/sec   Loss 6.4313   LearningRate 0.0231   Epoch: 10   Global Step: 430570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:21,856-Speed 2624.65 samples/sec   Loss 6.4226   LearningRate 0.0231   Epoch: 10   Global Step: 430580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:25,754-Speed 2627.78 samples/sec   Loss 6.2888   LearningRate 0.0231   Epoch: 10   Global Step: 430590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:29,649-Speed 2629.34 samples/sec   Loss 6.4230   LearningRate 0.0231   Epoch: 10   Global Step: 430600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:33,548-Speed 2627.34 samples/sec   Loss 6.4322   LearningRate 0.0231   Epoch: 10   Global Step: 430610   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 20:07:37,435-Speed 2635.17 samples/sec   Loss 6.3593   LearningRate 0.0231   Epoch: 10   Global Step: 430620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:41,341-Speed 2622.57 samples/sec   Loss 6.3849   LearningRate 0.0231   Epoch: 10   Global Step: 430630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:45,236-Speed 2629.18 samples/sec   Loss 6.5310   LearningRate 0.0231   Epoch: 10   Global Step: 430640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:49,131-Speed 2629.72 samples/sec   Loss 6.4638   LearningRate 0.0231   Epoch: 10   Global Step: 430650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:53,047-Speed 2615.36 samples/sec   Loss 6.3349   LearningRate 0.0231   Epoch: 10   Global Step: 430660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:07:56,951-Speed 2623.57 samples/sec   Loss 6.4740   LearningRate 0.0231   Epoch: 10   Global Step: 430670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:08:00,840-Speed 2633.55 samples/sec   Loss 6.3694   LearningRate 0.0231   Epoch: 10   Global Step: 430680   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:04,749-Speed 2620.67 samples/sec   Loss 6.3466   LearningRate 0.0231   Epoch: 10   Global Step: 430690   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:08,648-Speed 2626.58 samples/sec   Loss 6.3942   LearningRate 0.0231   Epoch: 10   Global Step: 430700   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:12,550-Speed 2625.44 samples/sec   Loss 6.3988   LearningRate 0.0231   Epoch: 10   Global Step: 430710   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:16,469-Speed 2612.98 samples/sec   Loss 6.4584   LearningRate 0.0231   Epoch: 10   Global Step: 430720   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:20,382-Speed 2617.81 samples/sec   Loss 6.3968   LearningRate 0.0231   Epoch: 10   Global Step: 430730   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:24,324-Speed 2598.69 samples/sec   Loss 6.3655   LearningRate 0.0231   Epoch: 10   Global Step: 430740   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:28,221-Speed 2628.50 samples/sec   Loss 6.4328   LearningRate 0.0231   Epoch: 10   Global Step: 430750   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:32,170-Speed 2594.03 samples/sec   Loss 6.4058   LearningRate 0.0231   Epoch: 10   Global Step: 430760   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:36,073-Speed 2624.16 samples/sec   Loss 6.3381   LearningRate 0.0231   Epoch: 10   Global Step: 430770   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:39,999-Speed 2609.02 samples/sec   Loss 6.4387   LearningRate 0.0231   Epoch: 10   Global Step: 430780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:08:43,922-Speed 2610.79 samples/sec   Loss 6.2814   LearningRate 0.0231   Epoch: 10   Global Step: 430790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:08:47,821-Speed 2627.15 samples/sec   Loss 6.3909   LearningRate 0.0231   Epoch: 10   Global Step: 430800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:08:51,718-Speed 2628.24 samples/sec   Loss 6.4119   LearningRate 0.0231   Epoch: 10   Global Step: 430810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:08:55,601-Speed 2637.96 samples/sec   Loss 6.3789   LearningRate 0.0231   Epoch: 10   Global Step: 430820   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:08:59,511-Speed 2619.01 samples/sec   Loss 6.4734   LearningRate 0.0231   Epoch: 10   Global Step: 430830   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:03,407-Speed 2629.32 samples/sec   Loss 6.4161   LearningRate 0.0231   Epoch: 10   Global Step: 430840   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:07,303-Speed 2628.90 samples/sec   Loss 6.4148   LearningRate 0.0231   Epoch: 10   Global Step: 430850   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:11,200-Speed 2628.55 samples/sec   Loss 6.4934   LearningRate 0.0231   Epoch: 10   Global Step: 430860   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:15,096-Speed 2629.24 samples/sec   Loss 6.3579   LearningRate 0.0231   Epoch: 10   Global Step: 430870   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:18,990-Speed 2630.62 samples/sec   Loss 6.4728   LearningRate 0.0231   Epoch: 10   Global Step: 430880   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:22,897-Speed 2621.26 samples/sec   Loss 6.4663   LearningRate 0.0231   Epoch: 10   Global Step: 430890   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:26,802-Speed 2622.60 samples/sec   Loss 6.4236   LearningRate 0.0231   Epoch: 10   Global Step: 430900   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:30,711-Speed 2620.44 samples/sec   Loss 6.5153   LearningRate 0.0231   Epoch: 10   Global Step: 430910   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:34,614-Speed 2624.02 samples/sec   Loss 6.4532   LearningRate 0.0231   Epoch: 10   Global Step: 430920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:09:38,513-Speed 2627.61 samples/sec   Loss 6.4874   LearningRate 0.0231   Epoch: 10   Global Step: 430930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:09:42,417-Speed 2623.17 samples/sec   Loss 6.4781   LearningRate 0.0231   Epoch: 10   Global Step: 430940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:09:46,329-Speed 2618.71 samples/sec   Loss 6.4174   LearningRate 0.0231   Epoch: 10   Global Step: 430950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:09:50,234-Speed 2622.20 samples/sec   Loss 6.3945   LearningRate 0.0231   Epoch: 10   Global Step: 430960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:09:54,106-Speed 2645.23 samples/sec   Loss 6.3701   LearningRate 0.0231   Epoch: 10   Global Step: 430970   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:09:58,043-Speed 2601.56 samples/sec   Loss 6.3345   LearningRate 0.0231   Epoch: 10   Global Step: 430980   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:01,936-Speed 2631.35 samples/sec   Loss 6.3778   LearningRate 0.0231   Epoch: 10   Global Step: 430990   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:05,841-Speed 2622.78 samples/sec   Loss 6.4743   LearningRate 0.0231   Epoch: 10   Global Step: 431000   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:09,810-Speed 2580.62 samples/sec   Loss 6.4716   LearningRate 0.0231   Epoch: 10   Global Step: 431010   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:13,701-Speed 2632.23 samples/sec   Loss 6.4132   LearningRate 0.0231   Epoch: 10   Global Step: 431020   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:17,636-Speed 2603.20 samples/sec   Loss 6.4782   LearningRate 0.0231   Epoch: 10   Global Step: 431030   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:21,533-Speed 2628.48 samples/sec   Loss 6.3878   LearningRate 0.0231   Epoch: 10   Global Step: 431040   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:25,427-Speed 2630.54 samples/sec   Loss 6.4674   LearningRate 0.0231   Epoch: 10   Global Step: 431050   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:29,320-Speed 2630.45 samples/sec   Loss 6.4312   LearningRate 0.0231   Epoch: 10   Global Step: 431060   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:10:33,250-Speed 2607.73 samples/sec   Loss 6.3325   LearningRate 0.0231   Epoch: 10   Global Step: 431070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:10:37,145-Speed 2629.15 samples/sec   Loss 6.4520   LearningRate 0.0231   Epoch: 10   Global Step: 431080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:10:41,061-Speed 2616.04 samples/sec   Loss 6.4310   LearningRate 0.0231   Epoch: 10   Global Step: 431090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:10:44,960-Speed 2627.01 samples/sec   Loss 6.4078   LearningRate 0.0231   Epoch: 10   Global Step: 431100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:10:48,859-Speed 2626.88 samples/sec   Loss 6.4991   LearningRate 0.0231   Epoch: 10   Global Step: 431110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:10:52,757-Speed 2628.29 samples/sec   Loss 6.4142   LearningRate 0.0231   Epoch: 10   Global Step: 431120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:10:56,660-Speed 2624.20 samples/sec   Loss 6.2734   LearningRate 0.0231   Epoch: 10   Global Step: 431130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:00,551-Speed 2631.94 samples/sec   Loss 6.4160   LearningRate 0.0231   Epoch: 10   Global Step: 431140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:04,449-Speed 2628.27 samples/sec   Loss 6.4672   LearningRate 0.0231   Epoch: 10   Global Step: 431150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:08,356-Speed 2621.79 samples/sec   Loss 6.3639   LearningRate 0.0231   Epoch: 10   Global Step: 431160   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:12,237-Speed 2638.99 samples/sec   Loss 6.3771   LearningRate 0.0231   Epoch: 10   Global Step: 431170   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:16,133-Speed 2628.67 samples/sec   Loss 6.3925   LearningRate 0.0231   Epoch: 10   Global Step: 431180   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:20,031-Speed 2627.64 samples/sec   Loss 6.5025   LearningRate 0.0231   Epoch: 10   Global Step: 431190   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:23,933-Speed 2624.64 samples/sec   Loss 6.2939   LearningRate 0.0231   Epoch: 10   Global Step: 431200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:27,825-Speed 2631.60 samples/sec   Loss 6.4303   LearningRate 0.0231   Epoch: 10   Global Step: 431210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:11:31,698-Speed 2645.29 samples/sec   Loss 6.3706   LearningRate 0.0231   Epoch: 10   Global Step: 431220   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:11:35,591-Speed 2630.81 samples/sec   Loss 6.3729   LearningRate 0.0231   Epoch: 10   Global Step: 431230   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:11:39,484-Speed 2630.88 samples/sec   Loss 6.4747   LearningRate 0.0231   Epoch: 10   Global Step: 431240   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:11:43,381-Speed 2628.03 samples/sec   Loss 6.5146   LearningRate 0.0231   Epoch: 10   Global Step: 431250   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:11:47,276-Speed 2630.16 samples/sec   Loss 6.4020   LearningRate 0.0231   Epoch: 10   Global Step: 431260   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:11:51,166-Speed 2632.91 samples/sec   Loss 6.3528   LearningRate 0.0231   Epoch: 10   Global Step: 431270   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:11:55,065-Speed 2626.91 samples/sec   Loss 6.4348   LearningRate 0.0231   Epoch: 10   Global Step: 431280   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:11:58,964-Speed 2627.70 samples/sec   Loss 6.4539   LearningRate 0.0231   Epoch: 10   Global Step: 431290   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:12:02,862-Speed 2627.48 samples/sec   Loss 6.4818   LearningRate 0.0230   Epoch: 10   Global Step: 431300   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:12:06,761-Speed 2626.49 samples/sec   Loss 6.4361   LearningRate 0.0230   Epoch: 10   Global Step: 431310   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:12:10,657-Speed 2629.21 samples/sec   Loss 6.3888   LearningRate 0.0230   Epoch: 10   Global Step: 431320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:14,593-Speed 2602.21 samples/sec   Loss 6.4559   LearningRate 0.0230   Epoch: 10   Global Step: 431330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:18,489-Speed 2629.34 samples/sec   Loss 6.3930   LearningRate 0.0230   Epoch: 10   Global Step: 431340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:22,388-Speed 2626.92 samples/sec   Loss 6.3974   LearningRate 0.0230   Epoch: 10   Global Step: 431350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:26,286-Speed 2627.76 samples/sec   Loss 6.3267   LearningRate 0.0230   Epoch: 10   Global Step: 431360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:30,184-Speed 2627.55 samples/sec   Loss 6.4805   LearningRate 0.0230   Epoch: 10   Global Step: 431370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:34,087-Speed 2624.84 samples/sec   Loss 6.3999   LearningRate 0.0230   Epoch: 10   Global Step: 431380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:38,005-Speed 2613.84 samples/sec   Loss 6.3572   LearningRate 0.0230   Epoch: 10   Global Step: 431390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:41,911-Speed 2622.36 samples/sec   Loss 6.4115   LearningRate 0.0230   Epoch: 10   Global Step: 431400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:12:45,839-Speed 2607.89 samples/sec   Loss 6.3242   LearningRate 0.0230   Epoch: 10   Global Step: 431410   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:12:49,754-Speed 2616.21 samples/sec   Loss 6.3582   LearningRate 0.0230   Epoch: 10   Global Step: 431420   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:12:53,656-Speed 2624.78 samples/sec   Loss 6.3517   LearningRate 0.0230   Epoch: 10   Global Step: 431430   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:12:57,563-Speed 2622.77 samples/sec   Loss 6.3649   LearningRate 0.0230   Epoch: 10   Global Step: 431440   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:13:01,465-Speed 2624.84 samples/sec   Loss 6.5065   LearningRate 0.0230   Epoch: 10   Global Step: 431450   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:13:05,362-Speed 2627.53 samples/sec   Loss 6.4476   LearningRate 0.0230   Epoch: 10   Global Step: 431460   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:13:09,277-Speed 2616.67 samples/sec   Loss 6.3048   LearningRate 0.0230   Epoch: 10   Global Step: 431470   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:13:13,175-Speed 2627.23 samples/sec   Loss 6.3236   LearningRate 0.0230   Epoch: 10   Global Step: 431480   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:13:17,080-Speed 2622.95 samples/sec   Loss 6.3465   LearningRate 0.0230   Epoch: 10   Global Step: 431490   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:13:20,979-Speed 2627.63 samples/sec   Loss 6.3502   LearningRate 0.0230   Epoch: 10   Global Step: 431500   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:13:24,882-Speed 2623.94 samples/sec   Loss 6.4188   LearningRate 0.0230   Epoch: 10   Global Step: 431510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:28,781-Speed 2626.99 samples/sec   Loss 6.2746   LearningRate 0.0230   Epoch: 10   Global Step: 431520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:32,689-Speed 2620.71 samples/sec   Loss 6.3673   LearningRate 0.0230   Epoch: 10   Global Step: 431530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:36,589-Speed 2626.00 samples/sec   Loss 6.4160   LearningRate 0.0230   Epoch: 10   Global Step: 431540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:40,498-Speed 2619.98 samples/sec   Loss 6.3050   LearningRate 0.0230   Epoch: 10   Global Step: 431550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:44,393-Speed 2630.11 samples/sec   Loss 6.4522   LearningRate 0.0230   Epoch: 10   Global Step: 431560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:48,293-Speed 2627.07 samples/sec   Loss 6.4247   LearningRate 0.0230   Epoch: 10   Global Step: 431570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:52,190-Speed 2627.87 samples/sec   Loss 6.5316   LearningRate 0.0230   Epoch: 10   Global Step: 431580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:13:56,097-Speed 2622.07 samples/sec   Loss 6.4405   LearningRate 0.0230   Epoch: 10   Global Step: 431590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:00,014-Speed 2614.51 samples/sec   Loss 6.2638   LearningRate 0.0230   Epoch: 10   Global Step: 431600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:03,893-Speed 2640.80 samples/sec   Loss 6.4207   LearningRate 0.0230   Epoch: 10   Global Step: 431610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:07,806-Speed 2617.43 samples/sec   Loss 6.3973   LearningRate 0.0230   Epoch: 10   Global Step: 431620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:11,737-Speed 2605.46 samples/sec   Loss 6.3487   LearningRate 0.0230   Epoch: 10   Global Step: 431630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:15,679-Speed 2598.16 samples/sec   Loss 6.3315   LearningRate 0.0230   Epoch: 10   Global Step: 431640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:19,582-Speed 2625.05 samples/sec   Loss 6.4049   LearningRate 0.0230   Epoch: 10   Global Step: 431650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:23,667-Speed 2507.27 samples/sec   Loss 6.4135   LearningRate 0.0230   Epoch: 10   Global Step: 431660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:27,574-Speed 2621.45 samples/sec   Loss 6.3697   LearningRate 0.0230   Epoch: 10   Global Step: 431670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:31,485-Speed 2619.21 samples/sec   Loss 6.4889   LearningRate 0.0230   Epoch: 10   Global Step: 431680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:35,385-Speed 2625.76 samples/sec   Loss 6.4121   LearningRate 0.0230   Epoch: 10   Global Step: 431690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:39,286-Speed 2626.05 samples/sec   Loss 6.4139   LearningRate 0.0230   Epoch: 10   Global Step: 431700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:43,162-Speed 2642.13 samples/sec   Loss 6.3477   LearningRate 0.0230   Epoch: 10   Global Step: 431710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:47,070-Speed 2621.46 samples/sec   Loss 6.3976   LearningRate 0.0230   Epoch: 10   Global Step: 431720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:50,966-Speed 2628.25 samples/sec   Loss 6.5068   LearningRate 0.0230   Epoch: 10   Global Step: 431730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:54,876-Speed 2619.78 samples/sec   Loss 6.2875   LearningRate 0.0230   Epoch: 10   Global Step: 431740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:14:58,777-Speed 2625.43 samples/sec   Loss 6.4071   LearningRate 0.0230   Epoch: 10   Global Step: 431750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:15:02,678-Speed 2626.31 samples/sec   Loss 6.3049   LearningRate 0.0230   Epoch: 10   Global Step: 431760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:15:06,582-Speed 2623.17 samples/sec   Loss 6.4356   LearningRate 0.0230   Epoch: 10   Global Step: 431770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:15:10,482-Speed 2626.93 samples/sec   Loss 6.3467   LearningRate 0.0230   Epoch: 10   Global Step: 431780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:15:14,376-Speed 2630.17 samples/sec   Loss 6.4096   LearningRate 0.0230   Epoch: 10   Global Step: 431790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:15:18,251-Speed 2643.27 samples/sec   Loss 6.3307   LearningRate 0.0230   Epoch: 10   Global Step: 431800   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:15:22,125-Speed 2644.26 samples/sec   Loss 6.3286   LearningRate 0.0230   Epoch: 10   Global Step: 431810   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:26,021-Speed 2628.93 samples/sec   Loss 6.4399   LearningRate 0.0230   Epoch: 10   Global Step: 431820   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:29,925-Speed 2623.33 samples/sec   Loss 6.2211   LearningRate 0.0230   Epoch: 10   Global Step: 431830   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:33,820-Speed 2629.96 samples/sec   Loss 6.4341   LearningRate 0.0230   Epoch: 10   Global Step: 431840   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:37,757-Speed 2601.30 samples/sec   Loss 6.3831   LearningRate 0.0230   Epoch: 10   Global Step: 431850   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:41,723-Speed 2582.69 samples/sec   Loss 6.3873   LearningRate 0.0230   Epoch: 10   Global Step: 431860   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:45,617-Speed 2630.18 samples/sec   Loss 6.3670   LearningRate 0.0230   Epoch: 10   Global Step: 431870   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:49,513-Speed 2629.68 samples/sec   Loss 6.4342   LearningRate 0.0230   Epoch: 10   Global Step: 431880   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:53,412-Speed 2626.70 samples/sec   Loss 6.4038   LearningRate 0.0230   Epoch: 10   Global Step: 431890   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:15:57,308-Speed 2629.39 samples/sec   Loss 6.5049   LearningRate 0.0230   Epoch: 10   Global Step: 431900   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-04-14 20:16:01,202-Speed 2630.47 samples/sec   Loss 6.4249   LearningRate 0.0230   Epoch: 10   Global Step: 431910   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:05,112-Speed 2619.10 samples/sec   Loss 6.3570   LearningRate 0.0230   Epoch: 10   Global Step: 431920   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:09,043-Speed 2606.23 samples/sec   Loss 6.5062   LearningRate 0.0230   Epoch: 10   Global Step: 431930   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:12,937-Speed 2630.11 samples/sec   Loss 6.4860   LearningRate 0.0230   Epoch: 10   Global Step: 431940   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:16,832-Speed 2629.79 samples/sec   Loss 6.4093   LearningRate 0.0230   Epoch: 10   Global Step: 431950   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:20,728-Speed 2629.25 samples/sec   Loss 6.4285   LearningRate 0.0230   Epoch: 10   Global Step: 431960   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:24,624-Speed 2628.65 samples/sec   Loss 6.3116   LearningRate 0.0230   Epoch: 10   Global Step: 431970   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:28,522-Speed 2627.75 samples/sec   Loss 6.4613   LearningRate 0.0230   Epoch: 10   Global Step: 431980   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:32,414-Speed 2631.49 samples/sec   Loss 6.4636   LearningRate 0.0230   Epoch: 10   Global Step: 431990   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:36,313-Speed 2626.83 samples/sec   Loss 6.3382   LearningRate 0.0230   Epoch: 10   Global Step: 432000   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:16:40,226-Speed 2617.57 samples/sec   Loss 6.2830   LearningRate 0.0230   Epoch: 10   Global Step: 432010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:16:44,131-Speed 2624.02 samples/sec   Loss 6.3049   LearningRate 0.0230   Epoch: 10   Global Step: 432020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:16:48,022-Speed 2632.02 samples/sec   Loss 6.3929   LearningRate 0.0230   Epoch: 10   Global Step: 432030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:16:51,918-Speed 2628.89 samples/sec   Loss 6.3783   LearningRate 0.0230   Epoch: 10   Global Step: 432040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:16:55,810-Speed 2631.56 samples/sec   Loss 6.5055   LearningRate 0.0230   Epoch: 10   Global Step: 432050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:16:59,714-Speed 2623.65 samples/sec   Loss 6.3461   LearningRate 0.0230   Epoch: 10   Global Step: 432060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:03,609-Speed 2629.61 samples/sec   Loss 6.3521   LearningRate 0.0230   Epoch: 10   Global Step: 432070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:07,505-Speed 2628.57 samples/sec   Loss 6.3537   LearningRate 0.0230   Epoch: 10   Global Step: 432080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:11,399-Speed 2630.84 samples/sec   Loss 6.5050   LearningRate 0.0230   Epoch: 10   Global Step: 432090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:15,299-Speed 2626.07 samples/sec   Loss 6.3402   LearningRate 0.0230   Epoch: 10   Global Step: 432100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:19,178-Speed 2640.62 samples/sec   Loss 6.3603   LearningRate 0.0230   Epoch: 10   Global Step: 432110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:23,073-Speed 2629.28 samples/sec   Loss 6.3226   LearningRate 0.0230   Epoch: 10   Global Step: 432120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:26,978-Speed 2623.35 samples/sec   Loss 6.4173   LearningRate 0.0230   Epoch: 10   Global Step: 432130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:30,877-Speed 2626.63 samples/sec   Loss 6.4260   LearningRate 0.0230   Epoch: 10   Global Step: 432140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:34,776-Speed 2626.83 samples/sec   Loss 6.3797   LearningRate 0.0230   Epoch: 10   Global Step: 432150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:38,678-Speed 2625.02 samples/sec   Loss 6.4473   LearningRate 0.0230   Epoch: 10   Global Step: 432160   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:42,600-Speed 2611.77 samples/sec   Loss 6.3838   LearningRate 0.0229   Epoch: 10   Global Step: 432170   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:46,503-Speed 2624.32 samples/sec   Loss 6.3042   LearningRate 0.0229   Epoch: 10   Global Step: 432180   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:50,429-Speed 2608.89 samples/sec   Loss 6.3794   LearningRate 0.0229   Epoch: 10   Global Step: 432190   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:54,323-Speed 2630.70 samples/sec   Loss 6.3975   LearningRate 0.0229   Epoch: 10   Global Step: 432200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:17:58,207-Speed 2637.17 samples/sec   Loss 6.4314   LearningRate 0.0229   Epoch: 10   Global Step: 432210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:02,134-Speed 2608.10 samples/sec   Loss 6.4562   LearningRate 0.0229   Epoch: 10   Global Step: 432220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:06,091-Speed 2588.56 samples/sec   Loss 6.3408   LearningRate 0.0229   Epoch: 10   Global Step: 432230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:10,023-Speed 2604.90 samples/sec   Loss 6.3894   LearningRate 0.0229   Epoch: 10   Global Step: 432240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:13,936-Speed 2618.71 samples/sec   Loss 6.4571   LearningRate 0.0229   Epoch: 10   Global Step: 432250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:17,912-Speed 2576.12 samples/sec   Loss 6.3841   LearningRate 0.0229   Epoch: 10   Global Step: 432260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:21,807-Speed 2629.61 samples/sec   Loss 6.3786   LearningRate 0.0229   Epoch: 10   Global Step: 432270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:25,703-Speed 2628.62 samples/sec   Loss 6.2843   LearningRate 0.0229   Epoch: 10   Global Step: 432280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:29,617-Speed 2617.78 samples/sec   Loss 6.4422   LearningRate 0.0229   Epoch: 10   Global Step: 432290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:33,575-Speed 2587.70 samples/sec   Loss 6.3406   LearningRate 0.0229   Epoch: 10   Global Step: 432300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:37,487-Speed 2617.83 samples/sec   Loss 6.3212   LearningRate 0.0229   Epoch: 10   Global Step: 432310   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 20:18:41,373-Speed 2635.11 samples/sec   Loss 6.4452   LearningRate 0.0229   Epoch: 10   Global Step: 432320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:45,272-Speed 2627.65 samples/sec   Loss 6.3545   LearningRate 0.0229   Epoch: 10   Global Step: 432330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:49,180-Speed 2621.35 samples/sec   Loss 6.3929   LearningRate 0.0229   Epoch: 10   Global Step: 432340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:53,095-Speed 2616.21 samples/sec   Loss 6.4187   LearningRate 0.0229   Epoch: 10   Global Step: 432350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:18:57,017-Speed 2611.47 samples/sec   Loss 6.3313   LearningRate 0.0229   Epoch: 10   Global Step: 432360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:19:00,919-Speed 2624.88 samples/sec   Loss 6.5113   LearningRate 0.0229   Epoch: 10   Global Step: 432370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:19:04,820-Speed 2625.47 samples/sec   Loss 6.4262   LearningRate 0.0229   Epoch: 10   Global Step: 432380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:19:08,729-Speed 2620.17 samples/sec   Loss 6.3952   LearningRate 0.0229   Epoch: 10   Global Step: 432390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:19:12,633-Speed 2623.06 samples/sec   Loss 6.4766   LearningRate 0.0229   Epoch: 10   Global Step: 432400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:19:16,520-Speed 2634.98 samples/sec   Loss 6.3382   LearningRate 0.0229   Epoch: 10   Global Step: 432410   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:20,422-Speed 2625.47 samples/sec   Loss 6.3050   LearningRate 0.0229   Epoch: 10   Global Step: 432420   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:24,321-Speed 2627.10 samples/sec   Loss 6.2738   LearningRate 0.0229   Epoch: 10   Global Step: 432430   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:28,222-Speed 2625.41 samples/sec   Loss 6.3347   LearningRate 0.0229   Epoch: 10   Global Step: 432440   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:32,125-Speed 2624.51 samples/sec   Loss 6.2681   LearningRate 0.0229   Epoch: 10   Global Step: 432450   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:36,023-Speed 2627.32 samples/sec   Loss 6.4641   LearningRate 0.0229   Epoch: 10   Global Step: 432460   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:39,923-Speed 2626.23 samples/sec   Loss 6.4058   LearningRate 0.0229   Epoch: 10   Global Step: 432470   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:43,823-Speed 2626.41 samples/sec   Loss 6.3124   LearningRate 0.0229   Epoch: 10   Global Step: 432480   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:47,727-Speed 2623.02 samples/sec   Loss 6.2823   LearningRate 0.0229   Epoch: 10   Global Step: 432490   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:51,624-Speed 2628.99 samples/sec   Loss 6.4243   LearningRate 0.0229   Epoch: 10   Global Step: 432500   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:19:55,541-Speed 2614.79 samples/sec   Loss 6.3100   LearningRate 0.0229   Epoch: 10   Global Step: 432510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:19:59,436-Speed 2630.13 samples/sec   Loss 6.3455   LearningRate 0.0229   Epoch: 10   Global Step: 432520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:20:03,333-Speed 2627.74 samples/sec   Loss 6.4173   LearningRate 0.0229   Epoch: 10   Global Step: 432530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:20:07,227-Speed 2630.02 samples/sec   Loss 6.4259   LearningRate 0.0229   Epoch: 10   Global Step: 432540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:20:11,132-Speed 2622.75 samples/sec   Loss 6.3789   LearningRate 0.0229   Epoch: 10   Global Step: 432550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:20:15,039-Speed 2621.80 samples/sec   Loss 6.3900   LearningRate 0.0229   Epoch: 10   Global Step: 432560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:20:18,943-Speed 2624.12 samples/sec   Loss 6.4357   LearningRate 0.0229   Epoch: 10   Global Step: 432570   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:22,849-Speed 2621.79 samples/sec   Loss 6.3875   LearningRate 0.0229   Epoch: 10   Global Step: 432580   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:26,757-Speed 2620.96 samples/sec   Loss 6.4316   LearningRate 0.0229   Epoch: 10   Global Step: 432590   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:30,657-Speed 2626.28 samples/sec   Loss 6.3941   LearningRate 0.0229   Epoch: 10   Global Step: 432600   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:34,579-Speed 2611.59 samples/sec   Loss 6.1983   LearningRate 0.0229   Epoch: 10   Global Step: 432610   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:38,499-Speed 2612.67 samples/sec   Loss 6.4754   LearningRate 0.0229   Epoch: 10   Global Step: 432620   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:42,401-Speed 2625.12 samples/sec   Loss 6.3673   LearningRate 0.0229   Epoch: 10   Global Step: 432630   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:46,332-Speed 2605.79 samples/sec   Loss 6.3014   LearningRate 0.0229   Epoch: 10   Global Step: 432640   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:50,230-Speed 2627.52 samples/sec   Loss 6.3943   LearningRate 0.0229   Epoch: 10   Global Step: 432650   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:54,128-Speed 2627.90 samples/sec   Loss 6.4621   LearningRate 0.0229   Epoch: 10   Global Step: 432660   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:20:58,022-Speed 2629.88 samples/sec   Loss 6.4228   LearningRate 0.0229   Epoch: 10   Global Step: 432670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:01,922-Speed 2626.60 samples/sec   Loss 6.4458   LearningRate 0.0229   Epoch: 10   Global Step: 432680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:05,819-Speed 2628.40 samples/sec   Loss 6.3438   LearningRate 0.0229   Epoch: 10   Global Step: 432690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:09,722-Speed 2624.35 samples/sec   Loss 6.3899   LearningRate 0.0229   Epoch: 10   Global Step: 432700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:13,614-Speed 2631.61 samples/sec   Loss 6.4364   LearningRate 0.0229   Epoch: 10   Global Step: 432710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:17,519-Speed 2622.96 samples/sec   Loss 6.3752   LearningRate 0.0229   Epoch: 10   Global Step: 432720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:21,419-Speed 2626.42 samples/sec   Loss 6.3217   LearningRate 0.0229   Epoch: 10   Global Step: 432730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:25,331-Speed 2618.73 samples/sec   Loss 6.3459   LearningRate 0.0229   Epoch: 10   Global Step: 432740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:29,231-Speed 2626.00 samples/sec   Loss 6.3603   LearningRate 0.0229   Epoch: 10   Global Step: 432750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:33,126-Speed 2629.29 samples/sec   Loss 6.3068   LearningRate 0.0229   Epoch: 10   Global Step: 432760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:37,037-Speed 2618.90 samples/sec   Loss 6.3919   LearningRate 0.0229   Epoch: 10   Global Step: 432770   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 20:21:40,910-Speed 2645.11 samples/sec   Loss 6.2592   LearningRate 0.0229   Epoch: 10   Global Step: 432780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:45,023-Speed 2490.40 samples/sec   Loss 6.3656   LearningRate 0.0229   Epoch: 10   Global Step: 432790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:48,954-Speed 2605.56 samples/sec   Loss 6.3156   LearningRate 0.0229   Epoch: 10   Global Step: 432800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:52,868-Speed 2616.65 samples/sec   Loss 6.3848   LearningRate 0.0229   Epoch: 10   Global Step: 432810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:21:56,764-Speed 2629.31 samples/sec   Loss 6.3386   LearningRate 0.0229   Epoch: 10   Global Step: 432820   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:22:00,696-Speed 2604.62 samples/sec   Loss 6.4065   LearningRate 0.0229   Epoch: 10   Global Step: 432830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:22:04,637-Speed 2598.77 samples/sec   Loss 6.3681   LearningRate 0.0229   Epoch: 10   Global Step: 432840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:22:08,532-Speed 2629.89 samples/sec   Loss 6.4201   LearningRate 0.0229   Epoch: 10   Global Step: 432850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:22:12,431-Speed 2627.17 samples/sec   Loss 6.4020   LearningRate 0.0229   Epoch: 10   Global Step: 432860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:22:16,325-Speed 2630.99 samples/sec   Loss 6.3683   LearningRate 0.0229   Epoch: 10   Global Step: 432870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:22:20,216-Speed 2632.06 samples/sec   Loss 6.2524   LearningRate 0.0229   Epoch: 10   Global Step: 432880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:22:24,103-Speed 2635.47 samples/sec   Loss 6.4781   LearningRate 0.0229   Epoch: 10   Global Step: 432890   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:27,997-Speed 2630.40 samples/sec   Loss 6.4205   LearningRate 0.0229   Epoch: 10   Global Step: 432900   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:31,902-Speed 2622.76 samples/sec   Loss 6.3939   LearningRate 0.0229   Epoch: 10   Global Step: 432910   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:35,805-Speed 2623.89 samples/sec   Loss 6.3588   LearningRate 0.0229   Epoch: 10   Global Step: 432920   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:39,700-Speed 2630.28 samples/sec   Loss 6.2496   LearningRate 0.0229   Epoch: 10   Global Step: 432930   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:43,595-Speed 2629.40 samples/sec   Loss 6.3393   LearningRate 0.0229   Epoch: 10   Global Step: 432940   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:47,493-Speed 2627.97 samples/sec   Loss 6.4980   LearningRate 0.0229   Epoch: 10   Global Step: 432950   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:51,392-Speed 2626.80 samples/sec   Loss 6.4213   LearningRate 0.0229   Epoch: 10   Global Step: 432960   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:55,290-Speed 2628.23 samples/sec   Loss 6.3073   LearningRate 0.0229   Epoch: 10   Global Step: 432970   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:22:59,192-Speed 2624.42 samples/sec   Loss 6.4432   LearningRate 0.0229   Epoch: 10   Global Step: 432980   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:23:03,100-Speed 2620.93 samples/sec   Loss 6.3218   LearningRate 0.0229   Epoch: 10   Global Step: 432990   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:06,999-Speed 2627.04 samples/sec   Loss 6.3326   LearningRate 0.0229   Epoch: 10   Global Step: 433000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:10,903-Speed 2623.08 samples/sec   Loss 6.3663   LearningRate 0.0229   Epoch: 10   Global Step: 433010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:14,798-Speed 2630.13 samples/sec   Loss 6.3879   LearningRate 0.0229   Epoch: 10   Global Step: 433020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:18,701-Speed 2623.70 samples/sec   Loss 6.3676   LearningRate 0.0228   Epoch: 10   Global Step: 433030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:22,599-Speed 2628.38 samples/sec   Loss 6.3324   LearningRate 0.0228   Epoch: 10   Global Step: 433040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:26,499-Speed 2626.31 samples/sec   Loss 6.4202   LearningRate 0.0228   Epoch: 10   Global Step: 433050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:30,520-Speed 2547.36 samples/sec   Loss 6.2726   LearningRate 0.0228   Epoch: 10   Global Step: 433060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:34,428-Speed 2621.11 samples/sec   Loss 6.4644   LearningRate 0.0228   Epoch: 10   Global Step: 433070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:38,334-Speed 2622.07 samples/sec   Loss 6.3453   LearningRate 0.0228   Epoch: 10   Global Step: 433080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:42,235-Speed 2625.13 samples/sec   Loss 6.4103   LearningRate 0.0228   Epoch: 10   Global Step: 433090   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-04-14 20:23:46,119-Speed 2636.98 samples/sec   Loss 6.3518   LearningRate 0.0228   Epoch: 10   Global Step: 433100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:50,026-Speed 2622.06 samples/sec   Loss 6.4320   LearningRate 0.0228   Epoch: 10   Global Step: 433110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:23:53,906-Speed 2639.40 samples/sec   Loss 6.4379   LearningRate 0.0228   Epoch: 10   Global Step: 433120   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:23:57,808-Speed 2625.42 samples/sec   Loss 6.4695   LearningRate 0.0228   Epoch: 10   Global Step: 433130   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:01,706-Speed 2627.56 samples/sec   Loss 6.3647   LearningRate 0.0228   Epoch: 10   Global Step: 433140   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:05,605-Speed 2627.06 samples/sec   Loss 6.4224   LearningRate 0.0228   Epoch: 10   Global Step: 433150   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:09,500-Speed 2629.04 samples/sec   Loss 6.3298   LearningRate 0.0228   Epoch: 10   Global Step: 433160   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:13,407-Speed 2622.19 samples/sec   Loss 6.2580   LearningRate 0.0228   Epoch: 10   Global Step: 433170   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:17,310-Speed 2623.59 samples/sec   Loss 6.3169   LearningRate 0.0228   Epoch: 10   Global Step: 433180   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:21,217-Speed 2627.30 samples/sec   Loss 6.3307   LearningRate 0.0228   Epoch: 10   Global Step: 433190   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:25,209-Speed 2565.87 samples/sec   Loss 6.3292   LearningRate 0.0228   Epoch: 10   Global Step: 433200   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:29,222-Speed 2553.02 samples/sec   Loss 6.3521   LearningRate 0.0228   Epoch: 10   Global Step: 433210   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:33,126-Speed 2623.41 samples/sec   Loss 6.3925   LearningRate 0.0228   Epoch: 10   Global Step: 433220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:24:37,021-Speed 2629.07 samples/sec   Loss 6.3740   LearningRate 0.0228   Epoch: 10   Global Step: 433230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:24:40,899-Speed 2641.04 samples/sec   Loss 6.4015   LearningRate 0.0228   Epoch: 10   Global Step: 433240   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:44,794-Speed 2629.94 samples/sec   Loss 6.5139   LearningRate 0.0228   Epoch: 10   Global Step: 433250   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:48,693-Speed 2626.58 samples/sec   Loss 6.4100   LearningRate 0.0228   Epoch: 10   Global Step: 433260   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:52,592-Speed 2627.22 samples/sec   Loss 6.4335   LearningRate 0.0228   Epoch: 10   Global Step: 433270   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:24:56,494-Speed 2624.55 samples/sec   Loss 6.3002   LearningRate 0.0228   Epoch: 10   Global Step: 433280   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:00,387-Speed 2631.33 samples/sec   Loss 6.4759   LearningRate 0.0228   Epoch: 10   Global Step: 433290   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:04,283-Speed 2629.26 samples/sec   Loss 6.3346   LearningRate 0.0228   Epoch: 10   Global Step: 433300   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:08,175-Speed 2631.84 samples/sec   Loss 6.4630   LearningRate 0.0228   Epoch: 10   Global Step: 433310   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:12,067-Speed 2630.92 samples/sec   Loss 6.3448   LearningRate 0.0228   Epoch: 10   Global Step: 433320   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:15,963-Speed 2629.35 samples/sec   Loss 6.3665   LearningRate 0.0228   Epoch: 10   Global Step: 433330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:19,867-Speed 2624.17 samples/sec   Loss 6.4744   LearningRate 0.0228   Epoch: 10   Global Step: 433340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:25:23,760-Speed 2630.15 samples/sec   Loss 6.4281   LearningRate 0.0228   Epoch: 10   Global Step: 433350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:25:27,660-Speed 2626.64 samples/sec   Loss 6.2508   LearningRate 0.0228   Epoch: 10   Global Step: 433360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:25:31,546-Speed 2635.57 samples/sec   Loss 6.3414   LearningRate 0.0228   Epoch: 10   Global Step: 433370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:35,454-Speed 2621.53 samples/sec   Loss 6.3191   LearningRate 0.0228   Epoch: 10   Global Step: 433380   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:39,357-Speed 2623.87 samples/sec   Loss 6.4301   LearningRate 0.0228   Epoch: 10   Global Step: 433390   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:43,298-Speed 2598.96 samples/sec   Loss 6.3722   LearningRate 0.0228   Epoch: 10   Global Step: 433400   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:47,201-Speed 2624.73 samples/sec   Loss 6.3040   LearningRate 0.0228   Epoch: 10   Global Step: 433410   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:51,101-Speed 2626.20 samples/sec   Loss 6.3367   LearningRate 0.0228   Epoch: 10   Global Step: 433420   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:55,033-Speed 2604.98 samples/sec   Loss 6.4110   LearningRate 0.0228   Epoch: 10   Global Step: 433430   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:25:58,934-Speed 2625.94 samples/sec   Loss 6.3255   LearningRate 0.0228   Epoch: 10   Global Step: 433440   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:26:02,829-Speed 2629.54 samples/sec   Loss 6.4436   LearningRate 0.0228   Epoch: 10   Global Step: 433450   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:26:06,725-Speed 2629.36 samples/sec   Loss 6.4140   LearningRate 0.0228   Epoch: 10   Global Step: 433460   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-04-14 20:26:10,630-Speed 2622.59 samples/sec   Loss 6.2770   LearningRate 0.0228   Epoch: 10   Global Step: 433470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:26:14,538-Speed 2620.62 samples/sec   Loss 6.3519   LearningRate 0.0228   Epoch: 10   Global Step: 433480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:26:18,440-Speed 2625.03 samples/sec   Loss 6.4310   LearningRate 0.0228   Epoch: 10   Global Step: 433490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:26:22,336-Speed 2629.45 samples/sec   Loss 6.3476   LearningRate 0.0228   Epoch: 10   Global Step: 433500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:26:26,237-Speed 2625.26 samples/sec   Loss 6.3978   LearningRate 0.0228   Epoch: 10   Global Step: 433510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:26:30,262-Speed 2544.64 samples/sec   Loss 6.4041   LearningRate 0.0228   Epoch: 10   Global Step: 433520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-04-14 20:26:34,185-Speed 2611.45 samples/sec   Loss 6.4252   LearningRate 0.0228   Epoch: 10   Global Step: 433530   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:26:38,082-Speed 2628.64 samples/sec   Loss 6.3626   LearningRate 0.0228   Epoch: 10   Global Step: 433540   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:26:41,988-Speed 2622.04 samples/sec   Loss 6.3776   LearningRate 0.0228   Epoch: 10   Global Step: 433550   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:26:45,884-Speed 2628.83 samples/sec   Loss 6.3317   LearningRate 0.0228   Epoch: 10   Global Step: 433560   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:26:49,779-Speed 2629.75 samples/sec   Loss 6.3049   LearningRate 0.0228   Epoch: 10   Global Step: 433570   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-04-14 20:26:53,660-Speed 2640.21 samples/sec   Loss 6.3529   LearningRate 0.0228   Epoch: 10   Global Step: 433580   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:26:57,554-Speed 2629.69 samples/sec   Loss 6.3637   LearningRate 0.0228   Epoch: 10   Global Step: 433590   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:27:01,456-Speed 2624.85 samples/sec   Loss 6.3946   LearningRate 0.0228   Epoch: 10   Global Step: 433600   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:27:05,355-Speed 2626.91 samples/sec   Loss 6.3489   LearningRate 0.0228   Epoch: 10   Global Step: 433610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:27:09,232-Speed 2642.41 samples/sec   Loss 6.2559   LearningRate 0.0228   Epoch: 10   Global Step: 433620   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:13,127-Speed 2629.30 samples/sec   Loss 6.3068   LearningRate 0.0228   Epoch: 10   Global Step: 433630   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:17,021-Speed 2630.60 samples/sec   Loss 6.4615   LearningRate 0.0228   Epoch: 10   Global Step: 433640   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:20,918-Speed 2628.44 samples/sec   Loss 6.4398   LearningRate 0.0228   Epoch: 10   Global Step: 433650   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:24,810-Speed 2632.17 samples/sec   Loss 6.3608   LearningRate 0.0228   Epoch: 10   Global Step: 433660   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:28,714-Speed 2623.41 samples/sec   Loss 6.4227   LearningRate 0.0228   Epoch: 10   Global Step: 433670   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:32,623-Speed 2620.07 samples/sec   Loss 6.3391   LearningRate 0.0228   Epoch: 10   Global Step: 433680   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:36,528-Speed 2622.58 samples/sec   Loss 6.4014   LearningRate 0.0228   Epoch: 10   Global Step: 433690   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:40,421-Speed 2632.35 samples/sec   Loss 6.3830   LearningRate 0.0228   Epoch: 10   Global Step: 433700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:44,314-Speed 2631.43 samples/sec   Loss 6.5561   LearningRate 0.0228   Epoch: 10   Global Step: 433710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:48,219-Speed 2622.64 samples/sec   Loss 6.4505   LearningRate 0.0228   Epoch: 10   Global Step: 433720   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:27:52,100-Speed 2639.77 samples/sec   Loss 6.2189   LearningRate 0.0228   Epoch: 10   Global Step: 433730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:56,008-Speed 2621.12 samples/sec   Loss 6.3018   LearningRate 0.0228   Epoch: 10   Global Step: 433740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:27:59,906-Speed 2626.93 samples/sec   Loss 6.3197   LearningRate 0.0228   Epoch: 10   Global Step: 433750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:03,822-Speed 2615.80 samples/sec   Loss 6.3922   LearningRate 0.0228   Epoch: 10   Global Step: 433760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:07,721-Speed 2627.08 samples/sec   Loss 6.3383   LearningRate 0.0228   Epoch: 10   Global Step: 433770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:11,775-Speed 2526.24 samples/sec   Loss 6.3550   LearningRate 0.0228   Epoch: 10   Global Step: 433780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:15,809-Speed 2539.16 samples/sec   Loss 6.3505   LearningRate 0.0228   Epoch: 10   Global Step: 433790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:20,026-Speed 2429.03 samples/sec   Loss 6.3775   LearningRate 0.0228   Epoch: 10   Global Step: 433800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:23,931-Speed 2623.60 samples/sec   Loss 6.4104   LearningRate 0.0228   Epoch: 10   Global Step: 433810   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:27,826-Speed 2629.08 samples/sec   Loss 6.3948   LearningRate 0.0228   Epoch: 10   Global Step: 433820   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:31,733-Speed 2621.37 samples/sec   Loss 6.3407   LearningRate 0.0228   Epoch: 10   Global Step: 433830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:28:35,624-Speed 2632.56 samples/sec   Loss 6.3779   LearningRate 0.0228   Epoch: 10   Global Step: 433840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:28:39,537-Speed 2617.95 samples/sec   Loss 6.3603   LearningRate 0.0228   Epoch: 10   Global Step: 433850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:28:43,416-Speed 2640.07 samples/sec   Loss 6.3659   LearningRate 0.0228   Epoch: 10   Global Step: 433860   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:47,310-Speed 2630.72 samples/sec   Loss 6.3834   LearningRate 0.0228   Epoch: 10   Global Step: 433870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:51,223-Speed 2617.49 samples/sec   Loss 6.4645   LearningRate 0.0228   Epoch: 10   Global Step: 433880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:55,117-Speed 2630.61 samples/sec   Loss 6.3634   LearningRate 0.0228   Epoch: 10   Global Step: 433890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:28:59,018-Speed 2625.42 samples/sec   Loss 6.2362   LearningRate 0.0227   Epoch: 10   Global Step: 433900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:29:02,923-Speed 2622.86 samples/sec   Loss 6.2399   LearningRate 0.0227   Epoch: 10   Global Step: 433910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:29:06,826-Speed 2624.22 samples/sec   Loss 6.4629   LearningRate 0.0227   Epoch: 10   Global Step: 433920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:29:10,741-Speed 2616.56 samples/sec   Loss 6.3198   LearningRate 0.0227   Epoch: 10   Global Step: 433930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:29:14,643-Speed 2625.05 samples/sec   Loss 6.3740   LearningRate 0.0227   Epoch: 10   Global Step: 433940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:29:18,542-Speed 2626.77 samples/sec   Loss 6.4160   LearningRate 0.0227   Epoch: 10   Global Step: 433950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:29:22,461-Speed 2614.06 samples/sec   Loss 6.2635   LearningRate 0.0227   Epoch: 10   Global Step: 433960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:26,359-Speed 2627.58 samples/sec   Loss 6.3364   LearningRate 0.0227   Epoch: 10   Global Step: 433970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:30,253-Speed 2630.26 samples/sec   Loss 6.3962   LearningRate 0.0227   Epoch: 10   Global Step: 433980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:34,148-Speed 2629.40 samples/sec   Loss 6.3809   LearningRate 0.0227   Epoch: 10   Global Step: 433990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:38,063-Speed 2616.53 samples/sec   Loss 6.3565   LearningRate 0.0227   Epoch: 10   Global Step: 434000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:41,963-Speed 2626.10 samples/sec   Loss 6.2588   LearningRate 0.0227   Epoch: 10   Global Step: 434010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:45,860-Speed 2628.88 samples/sec   Loss 6.4080   LearningRate 0.0227   Epoch: 10   Global Step: 434020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:49,764-Speed 2623.62 samples/sec   Loss 6.4376   LearningRate 0.0227   Epoch: 10   Global Step: 434030   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:53,673-Speed 2619.93 samples/sec   Loss 6.3331   LearningRate 0.0227   Epoch: 10   Global Step: 434040   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:29:57,581-Speed 2621.29 samples/sec   Loss 6.3456   LearningRate 0.0227   Epoch: 10   Global Step: 434050   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:01,455-Speed 2644.18 samples/sec   Loss 6.4205   LearningRate 0.0227   Epoch: 10   Global Step: 434060   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:05,395-Speed 2599.13 samples/sec   Loss 6.3077   LearningRate 0.0227   Epoch: 10   Global Step: 434070   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:09,314-Speed 2613.90 samples/sec   Loss 6.3984   LearningRate 0.0227   Epoch: 10   Global Step: 434080   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:13,225-Speed 2618.97 samples/sec   Loss 6.4098   LearningRate 0.0227   Epoch: 10   Global Step: 434090   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:17,130-Speed 2623.16 samples/sec   Loss 6.3831   LearningRate 0.0227   Epoch: 10   Global Step: 434100   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:21,033-Speed 2624.46 samples/sec   Loss 6.3135   LearningRate 0.0227   Epoch: 10   Global Step: 434110   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:24,934-Speed 2624.90 samples/sec   Loss 6.3377   LearningRate 0.0227   Epoch: 10   Global Step: 434120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:28,840-Speed 2622.68 samples/sec   Loss 6.4226   LearningRate 0.0227   Epoch: 10   Global Step: 434130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:32,733-Speed 2631.13 samples/sec   Loss 6.4043   LearningRate 0.0227   Epoch: 10   Global Step: 434140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:36,629-Speed 2629.02 samples/sec   Loss 6.3412   LearningRate 0.0227   Epoch: 10   Global Step: 434150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:40,533-Speed 2623.16 samples/sec   Loss 6.3260   LearningRate 0.0227   Epoch: 10   Global Step: 434160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:44,427-Speed 2630.45 samples/sec   Loss 6.3239   LearningRate 0.0227   Epoch: 10   Global Step: 434170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:48,319-Speed 2631.83 samples/sec   Loss 6.3927   LearningRate 0.0227   Epoch: 10   Global Step: 434180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:52,219-Speed 2626.91 samples/sec   Loss 6.3244   LearningRate 0.0227   Epoch: 10   Global Step: 434190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:30:56,117-Speed 2626.93 samples/sec   Loss 6.3119   LearningRate 0.0227   Epoch: 10   Global Step: 434200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:00,010-Speed 2631.78 samples/sec   Loss 6.3251   LearningRate 0.0227   Epoch: 10   Global Step: 434210   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:03,903-Speed 2630.90 samples/sec   Loss 6.3541   LearningRate 0.0227   Epoch: 10   Global Step: 434220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:07,804-Speed 2625.63 samples/sec   Loss 6.3708   LearningRate 0.0227   Epoch: 10   Global Step: 434230   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:11,806-Speed 2558.96 samples/sec   Loss 6.3556   LearningRate 0.0227   Epoch: 10   Global Step: 434240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:15,698-Speed 2632.20 samples/sec   Loss 6.4216   LearningRate 0.0227   Epoch: 10   Global Step: 434250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:19,577-Speed 2640.75 samples/sec   Loss 6.3575   LearningRate 0.0227   Epoch: 10   Global Step: 434260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:23,475-Speed 2627.58 samples/sec   Loss 6.4343   LearningRate 0.0227   Epoch: 10   Global Step: 434270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:27,375-Speed 2626.43 samples/sec   Loss 6.4085   LearningRate 0.0227   Epoch: 10   Global Step: 434280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:31,282-Speed 2621.66 samples/sec   Loss 6.3810   LearningRate 0.0227   Epoch: 10   Global Step: 434290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:35,184-Speed 2624.68 samples/sec   Loss 6.3712   LearningRate 0.0227   Epoch: 10   Global Step: 434300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:39,090-Speed 2622.29 samples/sec   Loss 6.3429   LearningRate 0.0227   Epoch: 10   Global Step: 434310   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:43,031-Speed 2599.82 samples/sec   Loss 6.4073   LearningRate 0.0227   Epoch: 10   Global Step: 434320   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:31:46,909-Speed 2640.90 samples/sec   Loss 6.2227   LearningRate 0.0227   Epoch: 10   Global Step: 434330   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:31:50,803-Speed 2630.82 samples/sec   Loss 6.3165   LearningRate 0.0227   Epoch: 10   Global Step: 434340   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:31:54,728-Speed 2608.98 samples/sec   Loss 6.4026   LearningRate 0.0227   Epoch: 10   Global Step: 434350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:31:58,637-Speed 2620.58 samples/sec   Loss 6.3331   LearningRate 0.0227   Epoch: 10   Global Step: 434360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:32:02,536-Speed 2627.27 samples/sec   Loss 6.3823   LearningRate 0.0227   Epoch: 10   Global Step: 434370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:32:06,444-Speed 2620.59 samples/sec   Loss 6.4220   LearningRate 0.0227   Epoch: 10   Global Step: 434380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:32:10,341-Speed 2628.66 samples/sec   Loss 6.2962   LearningRate 0.0227   Epoch: 10   Global Step: 434390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:32:14,245-Speed 2623.69 samples/sec   Loss 6.2988   LearningRate 0.0227   Epoch: 10   Global Step: 434400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:32:18,141-Speed 2628.26 samples/sec   Loss 6.3512   LearningRate 0.0227   Epoch: 10   Global Step: 434410   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:32:22,024-Speed 2638.40 samples/sec   Loss 6.3947   LearningRate 0.0227   Epoch: 10   Global Step: 434420   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:25,921-Speed 2628.26 samples/sec   Loss 6.2036   LearningRate 0.0227   Epoch: 10   Global Step: 434430   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:29,862-Speed 2599.01 samples/sec   Loss 6.3930   LearningRate 0.0227   Epoch: 10   Global Step: 434440   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:33,772-Speed 2619.68 samples/sec   Loss 6.4032   LearningRate 0.0227   Epoch: 10   Global Step: 434450   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:37,668-Speed 2629.01 samples/sec   Loss 6.3915   LearningRate 0.0227   Epoch: 10   Global Step: 434460   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:41,564-Speed 2629.45 samples/sec   Loss 6.3025   LearningRate 0.0227   Epoch: 10   Global Step: 434470   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:45,484-Speed 2612.67 samples/sec   Loss 6.3251   LearningRate 0.0227   Epoch: 10   Global Step: 434480   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:49,378-Speed 2630.36 samples/sec   Loss 6.4590   LearningRate 0.0227   Epoch: 10   Global Step: 434490   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:53,272-Speed 2631.02 samples/sec   Loss 6.3985   LearningRate 0.0227   Epoch: 10   Global Step: 434500   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:32:57,166-Speed 2630.26 samples/sec   Loss 6.2635   LearningRate 0.0227   Epoch: 10   Global Step: 434510   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:01,061-Speed 2629.63 samples/sec   Loss 6.3696   LearningRate 0.0227   Epoch: 10   Global Step: 434520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:33:04,942-Speed 2638.69 samples/sec   Loss 6.4109   LearningRate 0.0227   Epoch: 10   Global Step: 434530   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:08,835-Speed 2631.44 samples/sec   Loss 6.3371   LearningRate 0.0227   Epoch: 10   Global Step: 434540   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:12,732-Speed 2628.41 samples/sec   Loss 6.4328   LearningRate 0.0227   Epoch: 10   Global Step: 434550   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:16,624-Speed 2631.68 samples/sec   Loss 6.3268   LearningRate 0.0227   Epoch: 10   Global Step: 434560   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:20,519-Speed 2629.88 samples/sec   Loss 6.3801   LearningRate 0.0227   Epoch: 10   Global Step: 434570   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:24,442-Speed 2611.17 samples/sec   Loss 6.2570   LearningRate 0.0227   Epoch: 10   Global Step: 434580   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:28,335-Speed 2630.76 samples/sec   Loss 6.4403   LearningRate 0.0227   Epoch: 10   Global Step: 434590   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:32,230-Speed 2629.79 samples/sec   Loss 6.3132   LearningRate 0.0227   Epoch: 10   Global Step: 434600   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:36,127-Speed 2628.00 samples/sec   Loss 6.3150   LearningRate 0.0227   Epoch: 10   Global Step: 434610   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:40,023-Speed 2629.42 samples/sec   Loss 6.4274   LearningRate 0.0227   Epoch: 10   Global Step: 434620   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 20:33:43,918-Speed 2629.93 samples/sec   Loss 6.3352   LearningRate 0.0227   Epoch: 10   Global Step: 434630   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:33:47,842-Speed 2610.12 samples/sec   Loss 6.3479   LearningRate 0.0227   Epoch: 10   Global Step: 434640   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:33:51,741-Speed 2626.37 samples/sec   Loss 6.3440   LearningRate 0.0227   Epoch: 10   Global Step: 434650   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:33:55,636-Speed 2630.67 samples/sec   Loss 6.3686   LearningRate 0.0227   Epoch: 10   Global Step: 434660   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:33:59,530-Speed 2629.92 samples/sec   Loss 6.4050   LearningRate 0.0227   Epoch: 10   Global Step: 434670   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:03,426-Speed 2628.66 samples/sec   Loss 6.2285   LearningRate 0.0227   Epoch: 10   Global Step: 434680   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:07,319-Speed 2630.65 samples/sec   Loss 6.3437   LearningRate 0.0227   Epoch: 10   Global Step: 434690   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:11,230-Speed 2619.37 samples/sec   Loss 6.3405   LearningRate 0.0227   Epoch: 10   Global Step: 434700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:15,239-Speed 2555.01 samples/sec   Loss 6.4205   LearningRate 0.0227   Epoch: 10   Global Step: 434710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:19,202-Speed 2584.37 samples/sec   Loss 6.3138   LearningRate 0.0227   Epoch: 10   Global Step: 434720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:23,110-Speed 2620.99 samples/sec   Loss 6.4438   LearningRate 0.0227   Epoch: 10   Global Step: 434730   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:34:27,006-Speed 2628.74 samples/sec   Loss 6.2468   LearningRate 0.0227   Epoch: 10   Global Step: 434740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:30,906-Speed 2626.30 samples/sec   Loss 6.2398   LearningRate 0.0227   Epoch: 10   Global Step: 434750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:34,801-Speed 2629.42 samples/sec   Loss 6.3179   LearningRate 0.0227   Epoch: 10   Global Step: 434760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:38,717-Speed 2615.82 samples/sec   Loss 6.4506   LearningRate 0.0226   Epoch: 10   Global Step: 434770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:42,617-Speed 2625.64 samples/sec   Loss 6.3466   LearningRate 0.0226   Epoch: 10   Global Step: 434780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:46,519-Speed 2625.33 samples/sec   Loss 6.3921   LearningRate 0.0226   Epoch: 10   Global Step: 434790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:50,421-Speed 2625.04 samples/sec   Loss 6.3525   LearningRate 0.0226   Epoch: 10   Global Step: 434800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:54,328-Speed 2621.68 samples/sec   Loss 6.3503   LearningRate 0.0226   Epoch: 10   Global Step: 434810   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:34:58,225-Speed 2628.07 samples/sec   Loss 6.2347   LearningRate 0.0226   Epoch: 10   Global Step: 434820   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:02,120-Speed 2629.48 samples/sec   Loss 6.4903   LearningRate 0.0226   Epoch: 10   Global Step: 434830   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:06,017-Speed 2628.28 samples/sec   Loss 6.2745   LearningRate 0.0226   Epoch: 10   Global Step: 434840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:35:10,055-Speed 2536.21 samples/sec   Loss 6.2887   LearningRate 0.0226   Epoch: 10   Global Step: 434850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:35:14,109-Speed 2526.85 samples/sec   Loss 6.3343   LearningRate 0.0226   Epoch: 10   Global Step: 434860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:35:18,017-Speed 2620.18 samples/sec   Loss 6.3146   LearningRate 0.0226   Epoch: 10   Global Step: 434870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:35:21,904-Speed 2635.29 samples/sec   Loss 6.2668   LearningRate 0.0226   Epoch: 10   Global Step: 434880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:25,899-Speed 2563.76 samples/sec   Loss 6.4016   LearningRate 0.0226   Epoch: 10   Global Step: 434890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:29,797-Speed 2627.80 samples/sec   Loss 6.4255   LearningRate 0.0226   Epoch: 10   Global Step: 434900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:33,694-Speed 2628.57 samples/sec   Loss 6.2868   LearningRate 0.0226   Epoch: 10   Global Step: 434910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:37,592-Speed 2627.45 samples/sec   Loss 6.2379   LearningRate 0.0226   Epoch: 10   Global Step: 434920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:41,492-Speed 2625.63 samples/sec   Loss 6.3239   LearningRate 0.0226   Epoch: 10   Global Step: 434930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:45,391-Speed 2627.55 samples/sec   Loss 6.2751   LearningRate 0.0226   Epoch: 10   Global Step: 434940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:49,286-Speed 2629.27 samples/sec   Loss 6.4410   LearningRate 0.0226   Epoch: 10   Global Step: 434950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:53,183-Speed 2628.39 samples/sec   Loss 6.4129   LearningRate 0.0226   Epoch: 10   Global Step: 434960   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:35:57,091-Speed 2620.77 samples/sec   Loss 6.3448   LearningRate 0.0226   Epoch: 10   Global Step: 434970   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:36:00,988-Speed 2627.89 samples/sec   Loss 6.3709   LearningRate 0.0226   Epoch: 10   Global Step: 434980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:04,882-Speed 2630.78 samples/sec   Loss 6.3860   LearningRate 0.0226   Epoch: 10   Global Step: 434990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:08,790-Speed 2620.77 samples/sec   Loss 6.3490   LearningRate 0.0226   Epoch: 10   Global Step: 435000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:12,684-Speed 2629.96 samples/sec   Loss 6.4025   LearningRate 0.0226   Epoch: 10   Global Step: 435010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:16,587-Speed 2624.39 samples/sec   Loss 6.3194   LearningRate 0.0226   Epoch: 10   Global Step: 435020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:20,495-Speed 2621.16 samples/sec   Loss 6.2723   LearningRate 0.0226   Epoch: 10   Global Step: 435030   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:24,390-Speed 2629.83 samples/sec   Loss 6.3612   LearningRate 0.0226   Epoch: 10   Global Step: 435040   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:28,281-Speed 2632.08 samples/sec   Loss 6.2052   LearningRate 0.0226   Epoch: 10   Global Step: 435050   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:32,207-Speed 2609.10 samples/sec   Loss 6.4196   LearningRate 0.0226   Epoch: 10   Global Step: 435060   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:36,101-Speed 2629.61 samples/sec   Loss 6.3076   LearningRate 0.0226   Epoch: 10   Global Step: 435070   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:39,985-Speed 2637.52 samples/sec   Loss 6.4457   LearningRate 0.0226   Epoch: 10   Global Step: 435080   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:43,900-Speed 2616.45 samples/sec   Loss 6.3510   LearningRate 0.0226   Epoch: 10   Global Step: 435090   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:47,790-Speed 2632.79 samples/sec   Loss 6.2802   LearningRate 0.0226   Epoch: 10   Global Step: 435100   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:51,686-Speed 2629.23 samples/sec   Loss 6.3785   LearningRate 0.0226   Epoch: 10   Global Step: 435110   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:55,581-Speed 2629.47 samples/sec   Loss 6.3164   LearningRate 0.0226   Epoch: 10   Global Step: 435120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:36:59,497-Speed 2615.22 samples/sec   Loss 6.3071   LearningRate 0.0226   Epoch: 10   Global Step: 435130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:03,454-Speed 2588.28 samples/sec   Loss 6.4423   LearningRate 0.0226   Epoch: 10   Global Step: 435140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:07,357-Speed 2624.32 samples/sec   Loss 6.3014   LearningRate 0.0226   Epoch: 10   Global Step: 435150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:11,251-Speed 2630.05 samples/sec   Loss 6.4760   LearningRate 0.0226   Epoch: 10   Global Step: 435160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:15,144-Speed 2630.78 samples/sec   Loss 6.4443   LearningRate 0.0226   Epoch: 10   Global Step: 435170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:19,027-Speed 2638.10 samples/sec   Loss 6.4604   LearningRate 0.0226   Epoch: 10   Global Step: 435180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:22,934-Speed 2621.97 samples/sec   Loss 6.4404   LearningRate 0.0226   Epoch: 10   Global Step: 435190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:26,830-Speed 2628.56 samples/sec   Loss 6.2027   LearningRate 0.0226   Epoch: 10   Global Step: 435200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:30,725-Speed 2629.71 samples/sec   Loss 6.3825   LearningRate 0.0226   Epoch: 10   Global Step: 435210   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:34,721-Speed 2563.47 samples/sec   Loss 6.3385   LearningRate 0.0226   Epoch: 10   Global Step: 435220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:38,807-Speed 2506.56 samples/sec   Loss 6.2416   LearningRate 0.0226   Epoch: 10   Global Step: 435230   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:42,719-Speed 2618.04 samples/sec   Loss 6.3448   LearningRate 0.0226   Epoch: 10   Global Step: 435240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:46,616-Speed 2629.05 samples/sec   Loss 6.4572   LearningRate 0.0226   Epoch: 10   Global Step: 435250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:50,514-Speed 2628.38 samples/sec   Loss 6.4223   LearningRate 0.0226   Epoch: 10   Global Step: 435260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:54,414-Speed 2625.89 samples/sec   Loss 6.3508   LearningRate 0.0226   Epoch: 10   Global Step: 435270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:37:58,317-Speed 2624.86 samples/sec   Loss 6.2668   LearningRate 0.0226   Epoch: 10   Global Step: 435280   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-04-14 20:38:02,202-Speed 2636.12 samples/sec   Loss 6.3969   LearningRate 0.0226   Epoch: 10   Global Step: 435290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:38:06,099-Speed 2628.33 samples/sec   Loss 6.4238   LearningRate 0.0226   Epoch: 10   Global Step: 435300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:38:09,995-Speed 2628.47 samples/sec   Loss 6.3315   LearningRate 0.0226   Epoch: 10   Global Step: 435310   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:38:13,895-Speed 2626.26 samples/sec   Loss 6.2584   LearningRate 0.0226   Epoch: 10   Global Step: 435320   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:38:17,792-Speed 2628.45 samples/sec   Loss 6.2379   LearningRate 0.0226   Epoch: 10   Global Step: 435330   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:38:21,692-Speed 2626.00 samples/sec   Loss 6.2845   LearningRate 0.0226   Epoch: 10   Global Step: 435340   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:38:25,568-Speed 2642.75 samples/sec   Loss 6.3300   LearningRate 0.0226   Epoch: 10   Global Step: 435350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:29,465-Speed 2628.49 samples/sec   Loss 6.2957   LearningRate 0.0226   Epoch: 10   Global Step: 435360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:33,363-Speed 2626.87 samples/sec   Loss 6.2432   LearningRate 0.0226   Epoch: 10   Global Step: 435370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:37,256-Speed 2631.73 samples/sec   Loss 6.3320   LearningRate 0.0226   Epoch: 10   Global Step: 435380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:41,148-Speed 2631.64 samples/sec   Loss 6.2248   LearningRate 0.0226   Epoch: 10   Global Step: 435390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:45,046-Speed 2627.17 samples/sec   Loss 6.3874   LearningRate 0.0226   Epoch: 10   Global Step: 435400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:48,949-Speed 2624.44 samples/sec   Loss 6.2118   LearningRate 0.0226   Epoch: 10   Global Step: 435410   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:52,869-Speed 2612.71 samples/sec   Loss 6.3740   LearningRate 0.0226   Epoch: 10   Global Step: 435420   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:38:56,772-Speed 2624.07 samples/sec   Loss 6.4534   LearningRate 0.0226   Epoch: 10   Global Step: 435430   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:00,674-Speed 2624.87 samples/sec   Loss 6.3697   LearningRate 0.0226   Epoch: 10   Global Step: 435440   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:04,594-Speed 2613.14 samples/sec   Loss 6.4769   LearningRate 0.0226   Epoch: 10   Global Step: 435450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:39:08,488-Speed 2629.65 samples/sec   Loss 6.3293   LearningRate 0.0226   Epoch: 10   Global Step: 435460   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:39:12,389-Speed 2625.61 samples/sec   Loss 6.2763   LearningRate 0.0226   Epoch: 10   Global Step: 435470   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:39:16,300-Speed 2624.08 samples/sec   Loss 6.3222   LearningRate 0.0226   Epoch: 10   Global Step: 435480   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:39:20,195-Speed 2629.58 samples/sec   Loss 6.2738   LearningRate 0.0226   Epoch: 10   Global Step: 435490   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:39:24,093-Speed 2627.82 samples/sec   Loss 6.3409   LearningRate 0.0226   Epoch: 10   Global Step: 435500   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:39:27,970-Speed 2641.82 samples/sec   Loss 6.2486   LearningRate 0.0226   Epoch: 10   Global Step: 435510   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:31,872-Speed 2624.78 samples/sec   Loss 6.2688   LearningRate 0.0226   Epoch: 10   Global Step: 435520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:35,772-Speed 2626.04 samples/sec   Loss 6.3278   LearningRate 0.0226   Epoch: 10   Global Step: 435530   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:39,672-Speed 2626.18 samples/sec   Loss 6.4065   LearningRate 0.0226   Epoch: 10   Global Step: 435540   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:43,586-Speed 2617.05 samples/sec   Loss 6.3267   LearningRate 0.0226   Epoch: 10   Global Step: 435550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:47,482-Speed 2629.03 samples/sec   Loss 6.3321   LearningRate 0.0226   Epoch: 10   Global Step: 435560   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:51,377-Speed 2629.43 samples/sec   Loss 6.3204   LearningRate 0.0226   Epoch: 10   Global Step: 435570   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:55,272-Speed 2629.75 samples/sec   Loss 6.2784   LearningRate 0.0226   Epoch: 10   Global Step: 435580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:39:59,168-Speed 2628.70 samples/sec   Loss 6.2462   LearningRate 0.0226   Epoch: 10   Global Step: 435590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:03,066-Speed 2627.87 samples/sec   Loss 6.3457   LearningRate 0.0226   Epoch: 10   Global Step: 435600   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:06,965-Speed 2626.89 samples/sec   Loss 6.3026   LearningRate 0.0226   Epoch: 10   Global Step: 435610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:40:10,863-Speed 2627.49 samples/sec   Loss 6.3208   LearningRate 0.0226   Epoch: 10   Global Step: 435620   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:40:14,758-Speed 2629.46 samples/sec   Loss 6.2753   LearningRate 0.0226   Epoch: 10   Global Step: 435630   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:40:18,651-Speed 2631.07 samples/sec   Loss 6.3041   LearningRate 0.0225   Epoch: 10   Global Step: 435640   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:40:22,546-Speed 2629.74 samples/sec   Loss 6.4258   LearningRate 0.0225   Epoch: 10   Global Step: 435650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:40:26,425-Speed 2640.37 samples/sec   Loss 6.2961   LearningRate 0.0225   Epoch: 10   Global Step: 435660   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:30,341-Speed 2615.91 samples/sec   Loss 6.3082   LearningRate 0.0225   Epoch: 10   Global Step: 435670   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:34,249-Speed 2620.77 samples/sec   Loss 6.3815   LearningRate 0.0225   Epoch: 10   Global Step: 435680   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:38,145-Speed 2628.25 samples/sec   Loss 6.3702   LearningRate 0.0225   Epoch: 10   Global Step: 435690   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:42,038-Speed 2631.53 samples/sec   Loss 6.2909   LearningRate 0.0225   Epoch: 10   Global Step: 435700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:45,935-Speed 2628.56 samples/sec   Loss 6.4083   LearningRate 0.0225   Epoch: 10   Global Step: 435710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:49,880-Speed 2595.78 samples/sec   Loss 6.3731   LearningRate 0.0225   Epoch: 10   Global Step: 435720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:53,915-Speed 2538.49 samples/sec   Loss 6.3982   LearningRate 0.0225   Epoch: 10   Global Step: 435730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:40:57,813-Speed 2627.85 samples/sec   Loss 6.3614   LearningRate 0.0225   Epoch: 10   Global Step: 435740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:01,721-Speed 2620.68 samples/sec   Loss 6.3705   LearningRate 0.0225   Epoch: 10   Global Step: 435750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:05,617-Speed 2628.82 samples/sec   Loss 6.3261   LearningRate 0.0225   Epoch: 10   Global Step: 435760   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:41:09,516-Speed 2627.38 samples/sec   Loss 6.2244   LearningRate 0.0225   Epoch: 10   Global Step: 435770   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:41:13,411-Speed 2629.51 samples/sec   Loss 6.3509   LearningRate 0.0225   Epoch: 10   Global Step: 435780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:41:17,290-Speed 2640.36 samples/sec   Loss 6.2364   LearningRate 0.0225   Epoch: 10   Global Step: 435790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:21,189-Speed 2627.10 samples/sec   Loss 6.4074   LearningRate 0.0225   Epoch: 10   Global Step: 435800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:25,111-Speed 2611.52 samples/sec   Loss 6.2444   LearningRate 0.0225   Epoch: 10   Global Step: 435810   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:29,024-Speed 2617.53 samples/sec   Loss 6.3916   LearningRate 0.0225   Epoch: 10   Global Step: 435820   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:32,926-Speed 2624.51 samples/sec   Loss 6.3966   LearningRate 0.0225   Epoch: 10   Global Step: 435830   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:36,823-Speed 2628.40 samples/sec   Loss 6.3773   LearningRate 0.0225   Epoch: 10   Global Step: 435840   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:40,718-Speed 2629.30 samples/sec   Loss 6.3805   LearningRate 0.0225   Epoch: 10   Global Step: 435850   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:44,614-Speed 2629.10 samples/sec   Loss 6.3019   LearningRate 0.0225   Epoch: 10   Global Step: 435860   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:48,509-Speed 2629.64 samples/sec   Loss 6.2837   LearningRate 0.0225   Epoch: 10   Global Step: 435870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:52,433-Speed 2610.60 samples/sec   Loss 6.2611   LearningRate 0.0225   Epoch: 10   Global Step: 435880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:41:56,331-Speed 2626.99 samples/sec   Loss 6.2153   LearningRate 0.0225   Epoch: 10   Global Step: 435890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:00,229-Speed 2627.90 samples/sec   Loss 6.3453   LearningRate 0.0225   Epoch: 10   Global Step: 435900   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:04,144-Speed 2616.35 samples/sec   Loss 6.3042   LearningRate 0.0225   Epoch: 10   Global Step: 435910   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:08,044-Speed 2626.02 samples/sec   Loss 6.1434   LearningRate 0.0225   Epoch: 10   Global Step: 435920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:11,950-Speed 2621.68 samples/sec   Loss 6.2515   LearningRate 0.0225   Epoch: 10   Global Step: 435930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:15,845-Speed 2629.76 samples/sec   Loss 6.3617   LearningRate 0.0225   Epoch: 10   Global Step: 435940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:19,777-Speed 2604.60 samples/sec   Loss 6.3779   LearningRate 0.0225   Epoch: 10   Global Step: 435950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:23,680-Speed 2624.87 samples/sec   Loss 6.2586   LearningRate 0.0225   Epoch: 10   Global Step: 435960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:42:27,555-Speed 2643.04 samples/sec   Loss 6.2350   LearningRate 0.0225   Epoch: 10   Global Step: 435970   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:31,450-Speed 2629.75 samples/sec   Loss 6.3592   LearningRate 0.0225   Epoch: 10   Global Step: 435980   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:35,350-Speed 2626.25 samples/sec   Loss 6.3403   LearningRate 0.0225   Epoch: 10   Global Step: 435990   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:39,276-Speed 2608.89 samples/sec   Loss 6.2830   LearningRate 0.0225   Epoch: 10   Global Step: 436000   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:43,176-Speed 2625.55 samples/sec   Loss 6.3801   LearningRate 0.0225   Epoch: 10   Global Step: 436010   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:47,074-Speed 2628.16 samples/sec   Loss 6.3171   LearningRate 0.0225   Epoch: 10   Global Step: 436020   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:50,981-Speed 2621.80 samples/sec   Loss 6.3362   LearningRate 0.0225   Epoch: 10   Global Step: 436030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:54,878-Speed 2627.65 samples/sec   Loss 6.3914   LearningRate 0.0225   Epoch: 10   Global Step: 436040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:42:58,825-Speed 2595.54 samples/sec   Loss 6.3657   LearningRate 0.0225   Epoch: 10   Global Step: 436050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:02,721-Speed 2628.99 samples/sec   Loss 6.3723   LearningRate 0.0225   Epoch: 10   Global Step: 436060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:06,618-Speed 2628.13 samples/sec   Loss 6.3546   LearningRate 0.0225   Epoch: 10   Global Step: 436070   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:43:10,516-Speed 2627.11 samples/sec   Loss 6.2867   LearningRate 0.0225   Epoch: 10   Global Step: 436080   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:43:14,609-Speed 2502.27 samples/sec   Loss 6.2844   LearningRate 0.0225   Epoch: 10   Global Step: 436090   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:43:18,700-Speed 2503.64 samples/sec   Loss 6.2493   LearningRate 0.0225   Epoch: 10   Global Step: 436100   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:43:22,732-Speed 2540.22 samples/sec   Loss 6.2888   LearningRate 0.0225   Epoch: 10   Global Step: 436110   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:43:26,628-Speed 2628.76 samples/sec   Loss 6.3285   LearningRate 0.0225   Epoch: 10   Global Step: 436120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:43:30,526-Speed 2627.98 samples/sec   Loss 6.3430   LearningRate 0.0225   Epoch: 10   Global Step: 436130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:43:34,406-Speed 2639.31 samples/sec   Loss 6.2764   LearningRate 0.0225   Epoch: 10   Global Step: 436140   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:38,302-Speed 2629.20 samples/sec   Loss 6.3724   LearningRate 0.0225   Epoch: 10   Global Step: 436150   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:42,205-Speed 2624.41 samples/sec   Loss 6.3183   LearningRate 0.0225   Epoch: 10   Global Step: 436160   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:46,109-Speed 2623.63 samples/sec   Loss 6.3699   LearningRate 0.0225   Epoch: 10   Global Step: 436170   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:50,007-Speed 2627.17 samples/sec   Loss 6.3283   LearningRate 0.0225   Epoch: 10   Global Step: 436180   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:53,904-Speed 2628.96 samples/sec   Loss 6.4127   LearningRate 0.0225   Epoch: 10   Global Step: 436190   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:43:57,798-Speed 2629.94 samples/sec   Loss 6.4001   LearningRate 0.0225   Epoch: 10   Global Step: 436200   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:01,699-Speed 2626.34 samples/sec   Loss 6.2999   LearningRate 0.0225   Epoch: 10   Global Step: 436210   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:05,602-Speed 2624.01 samples/sec   Loss 6.3659   LearningRate 0.0225   Epoch: 10   Global Step: 436220   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:09,500-Speed 2626.96 samples/sec   Loss 6.1954   LearningRate 0.0225   Epoch: 10   Global Step: 436230   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:13,397-Speed 2628.15 samples/sec   Loss 6.3017   LearningRate 0.0225   Epoch: 10   Global Step: 436240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:44:17,293-Speed 2629.50 samples/sec   Loss 6.2480   LearningRate 0.0225   Epoch: 10   Global Step: 436250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:44:21,566-Speed 2397.45 samples/sec   Loss 6.2308   LearningRate 0.0225   Epoch: 10   Global Step: 436260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:44:25,460-Speed 2630.25 samples/sec   Loss 6.3225   LearningRate 0.0225   Epoch: 10   Global Step: 436270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:44:29,359-Speed 2626.69 samples/sec   Loss 6.3121   LearningRate 0.0225   Epoch: 10   Global Step: 436280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:44:33,255-Speed 2628.53 samples/sec   Loss 6.3998   LearningRate 0.0225   Epoch: 10   Global Step: 436290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:44:37,129-Speed 2644.46 samples/sec   Loss 6.2320   LearningRate 0.0225   Epoch: 10   Global Step: 436300   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:41,024-Speed 2629.81 samples/sec   Loss 6.2967   LearningRate 0.0225   Epoch: 10   Global Step: 436310   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:44,949-Speed 2609.26 samples/sec   Loss 6.3966   LearningRate 0.0225   Epoch: 10   Global Step: 436320   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:48,846-Speed 2627.81 samples/sec   Loss 6.4316   LearningRate 0.0225   Epoch: 10   Global Step: 436330   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:52,761-Speed 2616.49 samples/sec   Loss 6.3539   LearningRate 0.0225   Epoch: 10   Global Step: 436340   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:44:56,720-Speed 2587.17 samples/sec   Loss 6.3214   LearningRate 0.0225   Epoch: 10   Global Step: 436350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:00,724-Speed 2558.22 samples/sec   Loss 6.3442   LearningRate 0.0225   Epoch: 10   Global Step: 436360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:04,631-Speed 2621.39 samples/sec   Loss 6.2869   LearningRate 0.0225   Epoch: 10   Global Step: 436370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:08,535-Speed 2623.61 samples/sec   Loss 6.2710   LearningRate 0.0225   Epoch: 10   Global Step: 436380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:12,447-Speed 2618.21 samples/sec   Loss 6.2048   LearningRate 0.0225   Epoch: 10   Global Step: 436390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:16,347-Speed 2626.85 samples/sec   Loss 6.2824   LearningRate 0.0225   Epoch: 10   Global Step: 436400   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:45:20,243-Speed 2628.41 samples/sec   Loss 6.2965   LearningRate 0.0225   Epoch: 10   Global Step: 436410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:45:24,139-Speed 2629.32 samples/sec   Loss 6.3396   LearningRate 0.0225   Epoch: 10   Global Step: 436420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:45:28,042-Speed 2624.60 samples/sec   Loss 6.2372   LearningRate 0.0225   Epoch: 10   Global Step: 436430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:45:31,939-Speed 2628.60 samples/sec   Loss 6.3803   LearningRate 0.0225   Epoch: 10   Global Step: 436440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:45:35,837-Speed 2627.45 samples/sec   Loss 6.4515   LearningRate 0.0225   Epoch: 10   Global Step: 436450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:45:39,715-Speed 2641.37 samples/sec   Loss 6.4671   LearningRate 0.0225   Epoch: 10   Global Step: 436460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:43,618-Speed 2623.91 samples/sec   Loss 6.4897   LearningRate 0.0225   Epoch: 10   Global Step: 436470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:47,514-Speed 2629.39 samples/sec   Loss 6.2310   LearningRate 0.0225   Epoch: 10   Global Step: 436480   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:51,497-Speed 2572.32 samples/sec   Loss 6.3592   LearningRate 0.0225   Epoch: 10   Global Step: 436490   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:55,618-Speed 2485.10 samples/sec   Loss 6.4298   LearningRate 0.0225   Epoch: 10   Global Step: 436500   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:45:59,528-Speed 2619.89 samples/sec   Loss 6.3418   LearningRate 0.0225   Epoch: 10   Global Step: 436510   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:03,445-Speed 2614.55 samples/sec   Loss 6.3420   LearningRate 0.0224   Epoch: 10   Global Step: 436520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:07,353-Speed 2620.67 samples/sec   Loss 6.4001   LearningRate 0.0224   Epoch: 10   Global Step: 436530   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:11,251-Speed 2627.83 samples/sec   Loss 6.2728   LearningRate 0.0224   Epoch: 10   Global Step: 436540   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:15,150-Speed 2626.78 samples/sec   Loss 6.2836   LearningRate 0.0224   Epoch: 10   Global Step: 436550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:19,047-Speed 2628.16 samples/sec   Loss 6.3874   LearningRate 0.0224   Epoch: 10   Global Step: 436560   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:46:22,941-Speed 2631.44 samples/sec   Loss 6.3704   LearningRate 0.0224   Epoch: 10   Global Step: 436570   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:46:26,836-Speed 2629.59 samples/sec   Loss 6.2974   LearningRate 0.0224   Epoch: 10   Global Step: 436580   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:46:30,744-Speed 2620.67 samples/sec   Loss 6.3857   LearningRate 0.0224   Epoch: 10   Global Step: 436590   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:46:34,640-Speed 2628.48 samples/sec   Loss 6.3771   LearningRate 0.0224   Epoch: 10   Global Step: 436600   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:46:38,535-Speed 2630.09 samples/sec   Loss 6.4001   LearningRate 0.0224   Epoch: 10   Global Step: 436610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:46:42,430-Speed 2629.22 samples/sec   Loss 6.3554   LearningRate 0.0224   Epoch: 10   Global Step: 436620   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:46:46,306-Speed 2642.25 samples/sec   Loss 6.3407   LearningRate 0.0224   Epoch: 10   Global Step: 436630   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:50,201-Speed 2630.02 samples/sec   Loss 6.2895   LearningRate 0.0224   Epoch: 10   Global Step: 436640   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:54,112-Speed 2619.12 samples/sec   Loss 6.4383   LearningRate 0.0224   Epoch: 10   Global Step: 436650   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:46:58,010-Speed 2627.84 samples/sec   Loss 6.3451   LearningRate 0.0224   Epoch: 10   Global Step: 436660   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:01,939-Speed 2607.53 samples/sec   Loss 6.2292   LearningRate 0.0224   Epoch: 10   Global Step: 436670   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:05,837-Speed 2627.30 samples/sec   Loss 6.3104   LearningRate 0.0224   Epoch: 10   Global Step: 436680   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:09,748-Speed 2618.46 samples/sec   Loss 6.2207   LearningRate 0.0224   Epoch: 10   Global Step: 436690   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:13,648-Speed 2626.14 samples/sec   Loss 6.3746   LearningRate 0.0224   Epoch: 10   Global Step: 436700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:17,550-Speed 2625.33 samples/sec   Loss 6.4399   LearningRate 0.0224   Epoch: 10   Global Step: 436710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:21,459-Speed 2620.09 samples/sec   Loss 6.2375   LearningRate 0.0224   Epoch: 10   Global Step: 436720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:25,409-Speed 2592.83 samples/sec   Loss 6.3900   LearningRate 0.0224   Epoch: 10   Global Step: 436730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:29,317-Speed 2621.39 samples/sec   Loss 6.2880   LearningRate 0.0224   Epoch: 10   Global Step: 436740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:33,240-Speed 2610.45 samples/sec   Loss 6.3070   LearningRate 0.0224   Epoch: 10   Global Step: 436750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:37,137-Speed 2628.40 samples/sec   Loss 6.3836   LearningRate 0.0224   Epoch: 10   Global Step: 436760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:41,041-Speed 2623.72 samples/sec   Loss 6.3690   LearningRate 0.0224   Epoch: 10   Global Step: 436770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:44,952-Speed 2618.68 samples/sec   Loss 6.4048   LearningRate 0.0224   Epoch: 10   Global Step: 436780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:48,846-Speed 2630.20 samples/sec   Loss 6.4363   LearningRate 0.0224   Epoch: 10   Global Step: 436790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:52,751-Speed 2622.90 samples/sec   Loss 6.2822   LearningRate 0.0224   Epoch: 10   Global Step: 436800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:47:56,657-Speed 2622.00 samples/sec   Loss 6.3171   LearningRate 0.0224   Epoch: 10   Global Step: 436810   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:48:00,555-Speed 2627.54 samples/sec   Loss 6.3611   LearningRate 0.0224   Epoch: 10   Global Step: 436820   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:48:04,455-Speed 2626.58 samples/sec   Loss 6.2752   LearningRate 0.0224   Epoch: 10   Global Step: 436830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:08,367-Speed 2618.20 samples/sec   Loss 6.2809   LearningRate 0.0224   Epoch: 10   Global Step: 436840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:12,269-Speed 2624.92 samples/sec   Loss 6.2914   LearningRate 0.0224   Epoch: 10   Global Step: 436850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:16,172-Speed 2624.20 samples/sec   Loss 6.2429   LearningRate 0.0224   Epoch: 10   Global Step: 436860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:20,201-Speed 2541.99 samples/sec   Loss 6.3066   LearningRate 0.0224   Epoch: 10   Global Step: 436870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:24,124-Speed 2610.94 samples/sec   Loss 6.3549   LearningRate 0.0224   Epoch: 10   Global Step: 436880   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:28,022-Speed 2627.28 samples/sec   Loss 6.3545   LearningRate 0.0224   Epoch: 10   Global Step: 436890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:31,930-Speed 2621.49 samples/sec   Loss 6.4014   LearningRate 0.0224   Epoch: 10   Global Step: 436900   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:35,826-Speed 2628.35 samples/sec   Loss 6.3882   LearningRate 0.0224   Epoch: 10   Global Step: 436910   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:39,722-Speed 2629.12 samples/sec   Loss 6.3094   LearningRate 0.0224   Epoch: 10   Global Step: 436920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:43,607-Speed 2636.59 samples/sec   Loss 6.2897   LearningRate 0.0224   Epoch: 10   Global Step: 436930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:47,510-Speed 2624.51 samples/sec   Loss 6.3929   LearningRate 0.0224   Epoch: 10   Global Step: 436940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:51,406-Speed 2628.93 samples/sec   Loss 6.3457   LearningRate 0.0224   Epoch: 10   Global Step: 436950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:55,299-Speed 2630.74 samples/sec   Loss 6.2440   LearningRate 0.0224   Epoch: 10   Global Step: 436960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:48:59,209-Speed 2620.18 samples/sec   Loss 6.2654   LearningRate 0.0224   Epoch: 10   Global Step: 436970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:49:03,111-Speed 2624.71 samples/sec   Loss 6.3768   LearningRate 0.0224   Epoch: 10   Global Step: 436980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:49:07,021-Speed 2619.68 samples/sec   Loss 6.4152   LearningRate 0.0224   Epoch: 10   Global Step: 436990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:49:10,913-Speed 2631.10 samples/sec   Loss 6.3131   LearningRate 0.0224   Epoch: 10   Global Step: 437000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:49:14,791-Speed 2641.14 samples/sec   Loss 6.3334   LearningRate 0.0224   Epoch: 10   Global Step: 437010   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:18,688-Speed 2628.39 samples/sec   Loss 6.2828   LearningRate 0.0224   Epoch: 10   Global Step: 437020   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:22,601-Speed 2618.22 samples/sec   Loss 6.2619   LearningRate 0.0224   Epoch: 10   Global Step: 437030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:26,498-Speed 2627.92 samples/sec   Loss 6.3681   LearningRate 0.0224   Epoch: 10   Global Step: 437040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:30,391-Speed 2630.79 samples/sec   Loss 6.2496   LearningRate 0.0224   Epoch: 10   Global Step: 437050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:34,287-Speed 2629.07 samples/sec   Loss 6.2604   LearningRate 0.0224   Epoch: 10   Global Step: 437060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:38,209-Speed 2611.37 samples/sec   Loss 6.4356   LearningRate 0.0224   Epoch: 10   Global Step: 437070   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:42,108-Speed 2626.78 samples/sec   Loss 6.3203   LearningRate 0.0224   Epoch: 10   Global Step: 437080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:46,003-Speed 2630.52 samples/sec   Loss 6.2670   LearningRate 0.0224   Epoch: 10   Global Step: 437090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:49,905-Speed 2624.45 samples/sec   Loss 6.3064   LearningRate 0.0224   Epoch: 10   Global Step: 437100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:49:53,813-Speed 2620.72 samples/sec   Loss 6.2043   LearningRate 0.0224   Epoch: 10   Global Step: 437110   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:49:57,713-Speed 2626.48 samples/sec   Loss 6.2578   LearningRate 0.0224   Epoch: 10   Global Step: 437120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:01,611-Speed 2628.01 samples/sec   Loss 6.4255   LearningRate 0.0224   Epoch: 10   Global Step: 437130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:05,513-Speed 2624.75 samples/sec   Loss 6.3628   LearningRate 0.0224   Epoch: 10   Global Step: 437140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:09,425-Speed 2617.90 samples/sec   Loss 6.3791   LearningRate 0.0224   Epoch: 10   Global Step: 437150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:13,335-Speed 2619.52 samples/sec   Loss 6.2389   LearningRate 0.0224   Epoch: 10   Global Step: 437160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:17,234-Speed 2626.67 samples/sec   Loss 6.2738   LearningRate 0.0224   Epoch: 10   Global Step: 437170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:21,131-Speed 2628.37 samples/sec   Loss 6.2722   LearningRate 0.0224   Epoch: 10   Global Step: 437180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:25,039-Speed 2620.40 samples/sec   Loss 6.3382   LearningRate 0.0224   Epoch: 10   Global Step: 437190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:28,945-Speed 2622.02 samples/sec   Loss 6.2474   LearningRate 0.0224   Epoch: 10   Global Step: 437200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:32,826-Speed 2639.40 samples/sec   Loss 6.3833   LearningRate 0.0224   Epoch: 10   Global Step: 437210   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:36,719-Speed 2631.38 samples/sec   Loss 6.2677   LearningRate 0.0224   Epoch: 10   Global Step: 437220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:40,613-Speed 2630.61 samples/sec   Loss 6.1946   LearningRate 0.0224   Epoch: 10   Global Step: 437230   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:44,510-Speed 2627.56 samples/sec   Loss 6.2124   LearningRate 0.0224   Epoch: 10   Global Step: 437240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:48,408-Speed 2627.96 samples/sec   Loss 6.2299   LearningRate 0.0224   Epoch: 10   Global Step: 437250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:52,308-Speed 2626.04 samples/sec   Loss 6.4169   LearningRate 0.0224   Epoch: 10   Global Step: 437260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:50:56,211-Speed 2624.60 samples/sec   Loss 6.3406   LearningRate 0.0224   Epoch: 10   Global Step: 437270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:00,105-Speed 2629.61 samples/sec   Loss 6.4025   LearningRate 0.0224   Epoch: 10   Global Step: 437280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:04,010-Speed 2623.18 samples/sec   Loss 6.2494   LearningRate 0.0224   Epoch: 10   Global Step: 437290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:07,908-Speed 2627.37 samples/sec   Loss 6.1872   LearningRate 0.0224   Epoch: 10   Global Step: 437300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:11,809-Speed 2625.49 samples/sec   Loss 6.2972   LearningRate 0.0224   Epoch: 10   Global Step: 437310   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-04-14 20:51:15,703-Speed 2630.64 samples/sec   Loss 6.3720   LearningRate 0.0224   Epoch: 10   Global Step: 437320   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:19,595-Speed 2631.44 samples/sec   Loss 6.4045   LearningRate 0.0224   Epoch: 10   Global Step: 437330   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:23,489-Speed 2630.45 samples/sec   Loss 6.4163   LearningRate 0.0224   Epoch: 10   Global Step: 437340   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:27,382-Speed 2630.66 samples/sec   Loss 6.3298   LearningRate 0.0224   Epoch: 10   Global Step: 437350   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:31,274-Speed 2631.69 samples/sec   Loss 6.3289   LearningRate 0.0224   Epoch: 10   Global Step: 437360   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:35,165-Speed 2632.30 samples/sec   Loss 6.2367   LearningRate 0.0224   Epoch: 10   Global Step: 437370   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:51:39,051-Speed 2635.92 samples/sec   Loss 6.3615   LearningRate 0.0224   Epoch: 10   Global Step: 437380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:51:42,947-Speed 2628.47 samples/sec   Loss 6.3664   LearningRate 0.0223   Epoch: 10   Global Step: 437390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:51:46,847-Speed 2626.35 samples/sec   Loss 6.3207   LearningRate 0.0223   Epoch: 10   Global Step: 437400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:51:50,746-Speed 2627.49 samples/sec   Loss 6.3430   LearningRate 0.0223   Epoch: 10   Global Step: 437410   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:51:54,657-Speed 2619.07 samples/sec   Loss 6.2726   LearningRate 0.0223   Epoch: 10   Global Step: 437420   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:51:58,561-Speed 2622.91 samples/sec   Loss 6.2872   LearningRate 0.0223   Epoch: 10   Global Step: 437430   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:02,458-Speed 2628.80 samples/sec   Loss 6.4406   LearningRate 0.0223   Epoch: 10   Global Step: 437440   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:06,357-Speed 2626.30 samples/sec   Loss 6.2669   LearningRate 0.0223   Epoch: 10   Global Step: 437450   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:10,260-Speed 2624.62 samples/sec   Loss 6.4368   LearningRate 0.0223   Epoch: 10   Global Step: 437460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:14,162-Speed 2624.78 samples/sec   Loss 6.3447   LearningRate 0.0223   Epoch: 10   Global Step: 437470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:18,067-Speed 2622.91 samples/sec   Loss 6.2603   LearningRate 0.0223   Epoch: 10   Global Step: 437480   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:52:21,962-Speed 2629.50 samples/sec   Loss 6.2768   LearningRate 0.0223   Epoch: 10   Global Step: 437490   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:52:25,859-Speed 2628.15 samples/sec   Loss 6.2501   LearningRate 0.0223   Epoch: 10   Global Step: 437500   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:52:29,760-Speed 2626.29 samples/sec   Loss 6.2802   LearningRate 0.0223   Epoch: 10   Global Step: 437510   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:52:33,659-Speed 2626.75 samples/sec   Loss 6.2931   LearningRate 0.0223   Epoch: 10   Global Step: 437520   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:52:37,559-Speed 2625.97 samples/sec   Loss 6.2845   LearningRate 0.0223   Epoch: 10   Global Step: 437530   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:52:41,461-Speed 2624.98 samples/sec   Loss 6.3254   LearningRate 0.0223   Epoch: 10   Global Step: 437540   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:52:45,354-Speed 2636.53 samples/sec   Loss 6.3358   LearningRate 0.0223   Epoch: 10   Global Step: 437550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:49,249-Speed 2628.90 samples/sec   Loss 6.1884   LearningRate 0.0223   Epoch: 10   Global Step: 437560   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:53,152-Speed 2624.72 samples/sec   Loss 6.2209   LearningRate 0.0223   Epoch: 10   Global Step: 437570   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:52:57,054-Speed 2624.81 samples/sec   Loss 6.3622   LearningRate 0.0223   Epoch: 10   Global Step: 437580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:00,953-Speed 2627.04 samples/sec   Loss 6.3989   LearningRate 0.0223   Epoch: 10   Global Step: 437590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:04,855-Speed 2624.68 samples/sec   Loss 6.2615   LearningRate 0.0223   Epoch: 10   Global Step: 437600   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:08,755-Speed 2626.27 samples/sec   Loss 6.2458   LearningRate 0.0223   Epoch: 10   Global Step: 437610   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:12,655-Speed 2626.25 samples/sec   Loss 6.3319   LearningRate 0.0223   Epoch: 10   Global Step: 437620   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:16,555-Speed 2626.16 samples/sec   Loss 6.1806   LearningRate 0.0223   Epoch: 10   Global Step: 437630   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:20,459-Speed 2623.28 samples/sec   Loss 6.2998   LearningRate 0.0223   Epoch: 10   Global Step: 437640   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:24,361-Speed 2624.93 samples/sec   Loss 6.2091   LearningRate 0.0223   Epoch: 10   Global Step: 437650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:53:28,259-Speed 2627.69 samples/sec   Loss 6.3724   LearningRate 0.0223   Epoch: 10   Global Step: 437660   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:53:32,154-Speed 2629.93 samples/sec   Loss 6.3796   LearningRate 0.0223   Epoch: 10   Global Step: 437670   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:53:36,051-Speed 2627.81 samples/sec   Loss 6.2903   LearningRate 0.0223   Epoch: 10   Global Step: 437680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:53:39,954-Speed 2624.20 samples/sec   Loss 6.3392   LearningRate 0.0223   Epoch: 10   Global Step: 437690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:53:43,855-Speed 2625.69 samples/sec   Loss 6.2533   LearningRate 0.0223   Epoch: 10   Global Step: 437700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:53:47,732-Speed 2641.55 samples/sec   Loss 6.3824   LearningRate 0.0223   Epoch: 10   Global Step: 437710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:51,632-Speed 2626.48 samples/sec   Loss 6.3796   LearningRate 0.0223   Epoch: 10   Global Step: 437720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:55,529-Speed 2628.87 samples/sec   Loss 6.2278   LearningRate 0.0223   Epoch: 10   Global Step: 437730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:53:59,427-Speed 2627.18 samples/sec   Loss 6.2245   LearningRate 0.0223   Epoch: 10   Global Step: 437740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:54:03,330-Speed 2624.19 samples/sec   Loss 6.3426   LearningRate 0.0223   Epoch: 10   Global Step: 437750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:54:07,225-Speed 2629.53 samples/sec   Loss 6.2650   LearningRate 0.0223   Epoch: 10   Global Step: 437760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:54:11,121-Speed 2629.06 samples/sec   Loss 6.3270   LearningRate 0.0223   Epoch: 10   Global Step: 437770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:54:15,024-Speed 2624.01 samples/sec   Loss 6.3248   LearningRate 0.0223   Epoch: 10   Global Step: 437780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:54:18,918-Speed 2630.45 samples/sec   Loss 6.2518   LearningRate 0.0223   Epoch: 10   Global Step: 437790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:54:22,814-Speed 2629.01 samples/sec   Loss 6.3875   LearningRate 0.0223   Epoch: 10   Global Step: 437800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:54:26,713-Speed 2626.41 samples/sec   Loss 6.1993   LearningRate 0.0223   Epoch: 10   Global Step: 437810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:30,611-Speed 2628.37 samples/sec   Loss 6.3354   LearningRate 0.0223   Epoch: 10   Global Step: 437820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:34,597-Speed 2569.03 samples/sec   Loss 6.2937   LearningRate 0.0223   Epoch: 10   Global Step: 437830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:38,493-Speed 2628.79 samples/sec   Loss 6.1676   LearningRate 0.0223   Epoch: 10   Global Step: 437840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:42,401-Speed 2621.15 samples/sec   Loss 6.3532   LearningRate 0.0223   Epoch: 10   Global Step: 437850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:46,304-Speed 2623.97 samples/sec   Loss 6.2736   LearningRate 0.0223   Epoch: 10   Global Step: 437860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:50,213-Speed 2620.06 samples/sec   Loss 6.3724   LearningRate 0.0223   Epoch: 10   Global Step: 437870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:54,118-Speed 2623.08 samples/sec   Loss 6.3245   LearningRate 0.0223   Epoch: 10   Global Step: 437880   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:54:58,023-Speed 2622.99 samples/sec   Loss 6.3258   LearningRate 0.0223   Epoch: 10   Global Step: 437890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:55:01,905-Speed 2638.46 samples/sec   Loss 6.2468   LearningRate 0.0223   Epoch: 10   Global Step: 437900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:05,813-Speed 2621.02 samples/sec   Loss 6.2462   LearningRate 0.0223   Epoch: 10   Global Step: 437910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:09,716-Speed 2624.49 samples/sec   Loss 6.3854   LearningRate 0.0223   Epoch: 10   Global Step: 437920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:13,612-Speed 2628.69 samples/sec   Loss 6.2011   LearningRate 0.0223   Epoch: 10   Global Step: 437930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:17,512-Speed 2626.39 samples/sec   Loss 6.2826   LearningRate 0.0223   Epoch: 10   Global Step: 437940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:21,403-Speed 2631.63 samples/sec   Loss 6.2486   LearningRate 0.0223   Epoch: 10   Global Step: 437950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:25,297-Speed 2630.53 samples/sec   Loss 6.2371   LearningRate 0.0223   Epoch: 10   Global Step: 437960   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:29,199-Speed 2624.63 samples/sec   Loss 6.2855   LearningRate 0.0223   Epoch: 10   Global Step: 437970   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:33,100-Speed 2625.96 samples/sec   Loss 6.2950   LearningRate 0.0223   Epoch: 10   Global Step: 437980   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:36,997-Speed 2627.89 samples/sec   Loss 6.4281   LearningRate 0.0223   Epoch: 10   Global Step: 437990   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:55:40,892-Speed 2629.38 samples/sec   Loss 6.4211   LearningRate 0.0223   Epoch: 10   Global Step: 438000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:55:44,789-Speed 2628.43 samples/sec   Loss 6.2792   LearningRate 0.0223   Epoch: 10   Global Step: 438010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:55:48,689-Speed 2627.06 samples/sec   Loss 6.2878   LearningRate 0.0223   Epoch: 10   Global Step: 438020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:55:52,591-Speed 2624.66 samples/sec   Loss 6.2652   LearningRate 0.0223   Epoch: 10   Global Step: 438030   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:55:56,489-Speed 2627.33 samples/sec   Loss 6.2773   LearningRate 0.0223   Epoch: 10   Global Step: 438040   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:56:00,386-Speed 2628.88 samples/sec   Loss 6.2564   LearningRate 0.0223   Epoch: 10   Global Step: 438050   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:56:04,282-Speed 2628.23 samples/sec   Loss 6.4159   LearningRate 0.0223   Epoch: 10   Global Step: 438060   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:56:08,180-Speed 2627.93 samples/sec   Loss 6.2553   LearningRate 0.0223   Epoch: 10   Global Step: 438070   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:56:12,060-Speed 2639.11 samples/sec   Loss 6.3581   LearningRate 0.0223   Epoch: 10   Global Step: 438080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:15,952-Speed 2631.70 samples/sec   Loss 6.2864   LearningRate 0.0223   Epoch: 10   Global Step: 438090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:19,858-Speed 2622.35 samples/sec   Loss 6.2399   LearningRate 0.0223   Epoch: 10   Global Step: 438100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:23,753-Speed 2630.02 samples/sec   Loss 6.2383   LearningRate 0.0223   Epoch: 10   Global Step: 438110   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:27,648-Speed 2629.69 samples/sec   Loss 6.2115   LearningRate 0.0223   Epoch: 10   Global Step: 438120   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:31,553-Speed 2622.65 samples/sec   Loss 6.2938   LearningRate 0.0223   Epoch: 10   Global Step: 438130   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:35,447-Speed 2630.21 samples/sec   Loss 6.2329   LearningRate 0.0223   Epoch: 10   Global Step: 438140   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:39,346-Speed 2626.69 samples/sec   Loss 6.1899   LearningRate 0.0223   Epoch: 10   Global Step: 438150   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:43,244-Speed 2628.17 samples/sec   Loss 6.3737   LearningRate 0.0223   Epoch: 10   Global Step: 438160   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:47,150-Speed 2622.15 samples/sec   Loss 6.2851   LearningRate 0.0223   Epoch: 10   Global Step: 438170   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:51,039-Speed 2633.57 samples/sec   Loss 6.3414   LearningRate 0.0223   Epoch: 10   Global Step: 438180   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:54,935-Speed 2628.65 samples/sec   Loss 6.2024   LearningRate 0.0223   Epoch: 10   Global Step: 438190   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:56:58,843-Speed 2620.49 samples/sec   Loss 6.2807   LearningRate 0.0223   Epoch: 10   Global Step: 438200   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:02,743-Speed 2626.45 samples/sec   Loss 6.2766   LearningRate 0.0223   Epoch: 10   Global Step: 438210   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:06,638-Speed 2630.91 samples/sec   Loss 6.3081   LearningRate 0.0223   Epoch: 10   Global Step: 438220   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:10,537-Speed 2626.32 samples/sec   Loss 6.3010   LearningRate 0.0223   Epoch: 10   Global Step: 438230   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:14,438-Speed 2625.41 samples/sec   Loss 6.3241   LearningRate 0.0223   Epoch: 10   Global Step: 438240   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:18,336-Speed 2628.03 samples/sec   Loss 6.3622   LearningRate 0.0223   Epoch: 10   Global Step: 438250   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:22,243-Speed 2621.53 samples/sec   Loss 6.3002   LearningRate 0.0223   Epoch: 10   Global Step: 438260   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:26,144-Speed 2625.54 samples/sec   Loss 6.2370   LearningRate 0.0222   Epoch: 10   Global Step: 438270   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:30,039-Speed 2629.52 samples/sec   Loss 6.1974   LearningRate 0.0222   Epoch: 10   Global Step: 438280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:57:33,938-Speed 2626.50 samples/sec   Loss 6.2153   LearningRate 0.0222   Epoch: 10   Global Step: 438290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:57:37,833-Speed 2629.72 samples/sec   Loss 6.3045   LearningRate 0.0222   Epoch: 10   Global Step: 438300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:57:41,708-Speed 2643.60 samples/sec   Loss 6.3704   LearningRate 0.0222   Epoch: 10   Global Step: 438310   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:45,604-Speed 2628.54 samples/sec   Loss 6.2354   LearningRate 0.0222   Epoch: 10   Global Step: 438320   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:49,497-Speed 2631.43 samples/sec   Loss 6.1917   LearningRate 0.0222   Epoch: 10   Global Step: 438330   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:53,394-Speed 2628.61 samples/sec   Loss 6.2083   LearningRate 0.0222   Epoch: 10   Global Step: 438340   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:57:57,294-Speed 2625.91 samples/sec   Loss 6.2609   LearningRate 0.0222   Epoch: 10   Global Step: 438350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:58:01,199-Speed 2623.19 samples/sec   Loss 6.3198   LearningRate 0.0222   Epoch: 10   Global Step: 438360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:58:05,094-Speed 2629.30 samples/sec   Loss 6.4078   LearningRate 0.0222   Epoch: 10   Global Step: 438370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:58:08,992-Speed 2627.55 samples/sec   Loss 6.2311   LearningRate 0.0222   Epoch: 10   Global Step: 438380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:58:12,885-Speed 2630.61 samples/sec   Loss 6.2666   LearningRate 0.0222   Epoch: 10   Global Step: 438390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:58:16,780-Speed 2629.30 samples/sec   Loss 6.2743   LearningRate 0.0222   Epoch: 10   Global Step: 438400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:58:20,678-Speed 2628.46 samples/sec   Loss 6.4093   LearningRate 0.0222   Epoch: 10   Global Step: 438410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:24,572-Speed 2630.32 samples/sec   Loss 6.2806   LearningRate 0.0222   Epoch: 10   Global Step: 438420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:28,469-Speed 2628.43 samples/sec   Loss 6.1766   LearningRate 0.0222   Epoch: 10   Global Step: 438430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:32,361-Speed 2631.12 samples/sec   Loss 6.3725   LearningRate 0.0222   Epoch: 10   Global Step: 438440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:36,259-Speed 2627.76 samples/sec   Loss 6.3496   LearningRate 0.0222   Epoch: 10   Global Step: 438450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:40,165-Speed 2621.93 samples/sec   Loss 6.3364   LearningRate 0.0222   Epoch: 10   Global Step: 438460   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:44,093-Speed 2608.48 samples/sec   Loss 6.4585   LearningRate 0.0222   Epoch: 10   Global Step: 438470   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:47,997-Speed 2623.46 samples/sec   Loss 6.2976   LearningRate 0.0222   Epoch: 10   Global Step: 438480   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:51,910-Speed 2617.15 samples/sec   Loss 6.3333   LearningRate 0.0222   Epoch: 10   Global Step: 438490   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:55,809-Speed 2627.50 samples/sec   Loss 6.2528   LearningRate 0.0222   Epoch: 10   Global Step: 438500   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:58:59,704-Speed 2629.49 samples/sec   Loss 6.3234   LearningRate 0.0222   Epoch: 10   Global Step: 438510   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-04-14 20:59:03,593-Speed 2633.75 samples/sec   Loss 6.2349   LearningRate 0.0222   Epoch: 10   Global Step: 438520   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:59:07,471-Speed 2640.51 samples/sec   Loss 6.4183   LearningRate 0.0222   Epoch: 10   Global Step: 438530   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:11,371-Speed 2626.50 samples/sec   Loss 6.2408   LearningRate 0.0222   Epoch: 10   Global Step: 438540   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:15,271-Speed 2625.90 samples/sec   Loss 6.3368   LearningRate 0.0222   Epoch: 10   Global Step: 438550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:19,178-Speed 2621.93 samples/sec   Loss 6.3521   LearningRate 0.0222   Epoch: 10   Global Step: 438560   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:23,097-Speed 2613.41 samples/sec   Loss 6.2122   LearningRate 0.0222   Epoch: 10   Global Step: 438570   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:26,998-Speed 2625.96 samples/sec   Loss 6.3428   LearningRate 0.0222   Epoch: 10   Global Step: 438580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:30,898-Speed 2625.48 samples/sec   Loss 6.2142   LearningRate 0.0222   Epoch: 10   Global Step: 438590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:34,798-Speed 2626.45 samples/sec   Loss 6.2601   LearningRate 0.0222   Epoch: 10   Global Step: 438600   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:38,695-Speed 2628.54 samples/sec   Loss 6.3275   LearningRate 0.0222   Epoch: 10   Global Step: 438610   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:42,602-Speed 2621.53 samples/sec   Loss 6.3317   LearningRate 0.0222   Epoch: 10   Global Step: 438620   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 20:59:46,496-Speed 2630.34 samples/sec   Loss 6.2610   LearningRate 0.0222   Epoch: 10   Global Step: 438630   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:59:50,393-Speed 2628.56 samples/sec   Loss 6.2707   LearningRate 0.0222   Epoch: 10   Global Step: 438640   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:59:54,289-Speed 2628.30 samples/sec   Loss 6.2592   LearningRate 0.0222   Epoch: 10   Global Step: 438650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 20:59:58,187-Speed 2628.25 samples/sec   Loss 6.3012   LearningRate 0.0222   Epoch: 10   Global Step: 438660   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:00:02,087-Speed 2626.33 samples/sec   Loss 6.3156   LearningRate 0.0222   Epoch: 10   Global Step: 438670   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:00:05,987-Speed 2625.97 samples/sec   Loss 6.3186   LearningRate 0.0222   Epoch: 10   Global Step: 438680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:00:09,899-Speed 2618.35 samples/sec   Loss 6.1475   LearningRate 0.0222   Epoch: 10   Global Step: 438690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:00:13,811-Speed 2618.32 samples/sec   Loss 6.3445   LearningRate 0.0222   Epoch: 10   Global Step: 438700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:00:17,722-Speed 2619.44 samples/sec   Loss 6.2939   LearningRate 0.0222   Epoch: 10   Global Step: 438710   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:00:21,607-Speed 2635.79 samples/sec   Loss 6.2619   LearningRate 0.0222   Epoch: 10   Global Step: 438720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:25,508-Speed 2626.18 samples/sec   Loss 6.1443   LearningRate 0.0222   Epoch: 10   Global Step: 438730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:29,409-Speed 2625.70 samples/sec   Loss 6.2739   LearningRate 0.0222   Epoch: 10   Global Step: 438740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:33,321-Speed 2618.24 samples/sec   Loss 6.2949   LearningRate 0.0222   Epoch: 10   Global Step: 438750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:37,213-Speed 2631.17 samples/sec   Loss 6.4069   LearningRate 0.0222   Epoch: 10   Global Step: 438760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:41,110-Speed 2628.48 samples/sec   Loss 6.2378   LearningRate 0.0222   Epoch: 10   Global Step: 438770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:45,005-Speed 2629.63 samples/sec   Loss 6.2482   LearningRate 0.0222   Epoch: 10   Global Step: 438780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:48,905-Speed 2626.60 samples/sec   Loss 6.2802   LearningRate 0.0222   Epoch: 10   Global Step: 438790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:52,799-Speed 2630.48 samples/sec   Loss 6.2776   LearningRate 0.0222   Epoch: 10   Global Step: 438800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:00:56,693-Speed 2630.07 samples/sec   Loss 6.1297   LearningRate 0.0222   Epoch: 10   Global Step: 438810   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:00,586-Speed 2630.64 samples/sec   Loss 6.3094   LearningRate 0.0222   Epoch: 10   Global Step: 438820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:01:04,485-Speed 2627.30 samples/sec   Loss 6.3217   LearningRate 0.0222   Epoch: 10   Global Step: 438830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:01:08,391-Speed 2621.66 samples/sec   Loss 6.2487   LearningRate 0.0222   Epoch: 10   Global Step: 438840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:01:12,295-Speed 2623.54 samples/sec   Loss 6.2551   LearningRate 0.0222   Epoch: 10   Global Step: 438850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:01:16,185-Speed 2633.17 samples/sec   Loss 6.2591   LearningRate 0.0222   Epoch: 10   Global Step: 438860   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:20,096-Speed 2619.05 samples/sec   Loss 6.3844   LearningRate 0.0222   Epoch: 10   Global Step: 438870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:23,992-Speed 2630.29 samples/sec   Loss 6.2555   LearningRate 0.0222   Epoch: 10   Global Step: 438880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:27,889-Speed 2628.76 samples/sec   Loss 6.2916   LearningRate 0.0222   Epoch: 10   Global Step: 438890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:31,785-Speed 2628.57 samples/sec   Loss 6.4286   LearningRate 0.0222   Epoch: 10   Global Step: 438900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:35,684-Speed 2626.84 samples/sec   Loss 6.3702   LearningRate 0.0222   Epoch: 10   Global Step: 438910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:39,581-Speed 2628.35 samples/sec   Loss 6.2623   LearningRate 0.0222   Epoch: 10   Global Step: 438920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:43,478-Speed 2627.81 samples/sec   Loss 6.2948   LearningRate 0.0222   Epoch: 10   Global Step: 438930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:47,401-Speed 2611.66 samples/sec   Loss 6.2703   LearningRate 0.0222   Epoch: 10   Global Step: 438940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:51,308-Speed 2621.08 samples/sec   Loss 6.1687   LearningRate 0.0222   Epoch: 10   Global Step: 438950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:01:55,214-Speed 2622.27 samples/sec   Loss 6.3986   LearningRate 0.0222   Epoch: 10   Global Step: 438960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:01:59,117-Speed 2624.51 samples/sec   Loss 6.2619   LearningRate 0.0222   Epoch: 10   Global Step: 438970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:02:03,017-Speed 2626.59 samples/sec   Loss 6.3389   LearningRate 0.0222   Epoch: 10   Global Step: 438980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:02:06,923-Speed 2622.24 samples/sec   Loss 6.3332   LearningRate 0.0222   Epoch: 10   Global Step: 438990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:02:10,827-Speed 2623.08 samples/sec   Loss 6.2701   LearningRate 0.0222   Epoch: 10   Global Step: 439000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:02:14,752-Speed 2609.31 samples/sec   Loss 6.3767   LearningRate 0.0222   Epoch: 10   Global Step: 439010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:02:18,643-Speed 2633.48 samples/sec   Loss 6.3881   LearningRate 0.0222   Epoch: 10   Global Step: 439020   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:22,550-Speed 2621.37 samples/sec   Loss 6.3860   LearningRate 0.0222   Epoch: 10   Global Step: 439030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:26,455-Speed 2622.81 samples/sec   Loss 6.3443   LearningRate 0.0222   Epoch: 10   Global Step: 439040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:30,361-Speed 2621.88 samples/sec   Loss 6.1497   LearningRate 0.0222   Epoch: 10   Global Step: 439050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:34,263-Speed 2624.80 samples/sec   Loss 6.2536   LearningRate 0.0222   Epoch: 10   Global Step: 439060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:38,172-Speed 2620.67 samples/sec   Loss 6.2902   LearningRate 0.0222   Epoch: 10   Global Step: 439070   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:42,077-Speed 2623.07 samples/sec   Loss 6.2380   LearningRate 0.0222   Epoch: 10   Global Step: 439080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:46,009-Speed 2604.95 samples/sec   Loss 6.1894   LearningRate 0.0222   Epoch: 10   Global Step: 439090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:49,910-Speed 2625.79 samples/sec   Loss 6.2180   LearningRate 0.0222   Epoch: 10   Global Step: 439100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:53,817-Speed 2621.02 samples/sec   Loss 6.3303   LearningRate 0.0222   Epoch: 10   Global Step: 439110   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:02:57,723-Speed 2622.59 samples/sec   Loss 6.2069   LearningRate 0.0222   Epoch: 10   Global Step: 439120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:03:01,615-Speed 2631.43 samples/sec   Loss 6.2362   LearningRate 0.0222   Epoch: 10   Global Step: 439130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:03:05,514-Speed 2626.27 samples/sec   Loss 6.2942   LearningRate 0.0222   Epoch: 10   Global Step: 439140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:03:09,421-Speed 2621.85 samples/sec   Loss 6.2433   LearningRate 0.0221   Epoch: 10   Global Step: 439150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:03:13,318-Speed 2628.85 samples/sec   Loss 6.4118   LearningRate 0.0221   Epoch: 10   Global Step: 439160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:03:17,202-Speed 2636.78 samples/sec   Loss 6.3233   LearningRate 0.0221   Epoch: 10   Global Step: 439170   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:21,098-Speed 2629.25 samples/sec   Loss 6.1553   LearningRate 0.0221   Epoch: 10   Global Step: 439180   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:24,990-Speed 2631.32 samples/sec   Loss 6.3902   LearningRate 0.0221   Epoch: 10   Global Step: 439190   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:28,888-Speed 2628.08 samples/sec   Loss 6.2793   LearningRate 0.0221   Epoch: 10   Global Step: 439200   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:32,794-Speed 2621.99 samples/sec   Loss 6.3182   LearningRate 0.0221   Epoch: 10   Global Step: 439210   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:36,699-Speed 2622.63 samples/sec   Loss 6.3014   LearningRate 0.0221   Epoch: 10   Global Step: 439220   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:40,609-Speed 2619.02 samples/sec   Loss 6.3191   LearningRate 0.0221   Epoch: 10   Global Step: 439230   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:44,510-Speed 2626.17 samples/sec   Loss 6.2335   LearningRate 0.0221   Epoch: 10   Global Step: 439240   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:48,411-Speed 2625.74 samples/sec   Loss 6.2984   LearningRate 0.0221   Epoch: 10   Global Step: 439250   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:52,308-Speed 2628.24 samples/sec   Loss 6.2369   LearningRate 0.0221   Epoch: 10   Global Step: 439260   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:03:56,203-Speed 2629.81 samples/sec   Loss 6.2992   LearningRate 0.0221   Epoch: 10   Global Step: 439270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:04:00,099-Speed 2628.45 samples/sec   Loss 6.2712   LearningRate 0.0221   Epoch: 10   Global Step: 439280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:04:04,001-Speed 2625.33 samples/sec   Loss 6.3027   LearningRate 0.0221   Epoch: 10   Global Step: 439290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:04:07,899-Speed 2626.92 samples/sec   Loss 6.2924   LearningRate 0.0221   Epoch: 10   Global Step: 439300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:04:11,772-Speed 2645.13 samples/sec   Loss 6.1527   LearningRate 0.0221   Epoch: 10   Global Step: 439310   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:15,678-Speed 2621.84 samples/sec   Loss 6.2183   LearningRate 0.0221   Epoch: 10   Global Step: 439320   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:19,589-Speed 2618.73 samples/sec   Loss 6.2982   LearningRate 0.0221   Epoch: 10   Global Step: 439330   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:23,495-Speed 2622.52 samples/sec   Loss 6.1981   LearningRate 0.0221   Epoch: 10   Global Step: 439340   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:27,389-Speed 2629.94 samples/sec   Loss 6.2960   LearningRate 0.0221   Epoch: 10   Global Step: 439350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:31,286-Speed 2628.46 samples/sec   Loss 6.2975   LearningRate 0.0221   Epoch: 10   Global Step: 439360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:35,184-Speed 2627.58 samples/sec   Loss 6.3313   LearningRate 0.0221   Epoch: 10   Global Step: 439370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:39,082-Speed 2627.36 samples/sec   Loss 6.1819   LearningRate 0.0221   Epoch: 10   Global Step: 439380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:42,980-Speed 2627.84 samples/sec   Loss 6.2661   LearningRate 0.0221   Epoch: 10   Global Step: 439390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:46,873-Speed 2630.98 samples/sec   Loss 6.2338   LearningRate 0.0221   Epoch: 10   Global Step: 439400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:04:50,790-Speed 2615.11 samples/sec   Loss 6.1665   LearningRate 0.0221   Epoch: 10   Global Step: 439410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:04:54,688-Speed 2627.21 samples/sec   Loss 6.2943   LearningRate 0.0221   Epoch: 10   Global Step: 439420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:04:58,584-Speed 2628.99 samples/sec   Loss 6.3450   LearningRate 0.0221   Epoch: 10   Global Step: 439430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:05:02,482-Speed 2627.15 samples/sec   Loss 6.2028   LearningRate 0.0221   Epoch: 10   Global Step: 439440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:05:06,379-Speed 2628.36 samples/sec   Loss 6.2658   LearningRate 0.0221   Epoch: 10   Global Step: 439450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:05:10,286-Speed 2621.58 samples/sec   Loss 6.2533   LearningRate 0.0221   Epoch: 10   Global Step: 439460   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:05:14,157-Speed 2646.65 samples/sec   Loss 6.2623   LearningRate 0.0221   Epoch: 10   Global Step: 439470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:18,054-Speed 2628.76 samples/sec   Loss 6.2242   LearningRate 0.0221   Epoch: 10   Global Step: 439480   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:21,951-Speed 2628.31 samples/sec   Loss 6.2698   LearningRate 0.0221   Epoch: 10   Global Step: 439490   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:25,843-Speed 2631.71 samples/sec   Loss 6.2521   LearningRate 0.0221   Epoch: 10   Global Step: 439500   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:29,744-Speed 2625.58 samples/sec   Loss 6.2683   LearningRate 0.0221   Epoch: 10   Global Step: 439510   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:33,649-Speed 2622.69 samples/sec   Loss 6.2790   LearningRate 0.0221   Epoch: 10   Global Step: 439520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:37,551-Speed 2624.23 samples/sec   Loss 6.2457   LearningRate 0.0221   Epoch: 10   Global Step: 439530   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:41,460-Speed 2620.99 samples/sec   Loss 6.0911   LearningRate 0.0221   Epoch: 10   Global Step: 439540   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:45,362-Speed 2624.65 samples/sec   Loss 6.3067   LearningRate 0.0221   Epoch: 10   Global Step: 439550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:49,260-Speed 2628.06 samples/sec   Loss 6.2755   LearningRate 0.0221   Epoch: 10   Global Step: 439560   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:05:53,163-Speed 2624.20 samples/sec   Loss 6.1967   LearningRate 0.0221   Epoch: 10   Global Step: 439570   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:05:57,057-Speed 2630.32 samples/sec   Loss 6.1567   LearningRate 0.0221   Epoch: 10   Global Step: 439580   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:06:00,932-Speed 2642.94 samples/sec   Loss 6.4043   LearningRate 0.0221   Epoch: 10   Global Step: 439590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:04,827-Speed 2629.38 samples/sec   Loss 6.1426   LearningRate 0.0221   Epoch: 10   Global Step: 439600   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:08,725-Speed 2627.44 samples/sec   Loss 6.2339   LearningRate 0.0221   Epoch: 10   Global Step: 439610   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:12,632-Speed 2622.45 samples/sec   Loss 6.2252   LearningRate 0.0221   Epoch: 10   Global Step: 439620   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:16,530-Speed 2627.47 samples/sec   Loss 6.1344   LearningRate 0.0221   Epoch: 10   Global Step: 439630   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:20,427-Speed 2627.97 samples/sec   Loss 6.3353   LearningRate 0.0221   Epoch: 10   Global Step: 439640   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:24,326-Speed 2627.50 samples/sec   Loss 6.2962   LearningRate 0.0221   Epoch: 10   Global Step: 439650   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:28,221-Speed 2629.05 samples/sec   Loss 6.1722   LearningRate 0.0221   Epoch: 10   Global Step: 439660   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:32,122-Speed 2625.92 samples/sec   Loss 6.1627   LearningRate 0.0221   Epoch: 10   Global Step: 439670   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:36,023-Speed 2625.26 samples/sec   Loss 6.2447   LearningRate 0.0221   Epoch: 10   Global Step: 439680   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:06:39,930-Speed 2621.72 samples/sec   Loss 6.2142   LearningRate 0.0221   Epoch: 10   Global Step: 439690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:06:43,832-Speed 2624.59 samples/sec   Loss 6.3037   LearningRate 0.0221   Epoch: 10   Global Step: 439700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:06:47,731-Speed 2626.87 samples/sec   Loss 6.1299   LearningRate 0.0221   Epoch: 10   Global Step: 439710   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:06:51,635-Speed 2623.79 samples/sec   Loss 6.2720   LearningRate 0.0221   Epoch: 10   Global Step: 439720   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:06:55,541-Speed 2622.03 samples/sec   Loss 6.2822   LearningRate 0.0221   Epoch: 10   Global Step: 439730   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:06:59,448-Speed 2621.42 samples/sec   Loss 6.3271   LearningRate 0.0221   Epoch: 10   Global Step: 439740   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:07:03,380-Speed 2605.52 samples/sec   Loss 6.2644   LearningRate 0.0221   Epoch: 10   Global Step: 439750   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:07:07,260-Speed 2639.55 samples/sec   Loss 6.2971   LearningRate 0.0221   Epoch: 10   Global Step: 439760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:11,157-Speed 2627.94 samples/sec   Loss 6.1496   LearningRate 0.0221   Epoch: 10   Global Step: 439770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:15,055-Speed 2628.07 samples/sec   Loss 6.2585   LearningRate 0.0221   Epoch: 10   Global Step: 439780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:18,957-Speed 2624.49 samples/sec   Loss 6.3417   LearningRate 0.0221   Epoch: 10   Global Step: 439790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:22,850-Speed 2631.07 samples/sec   Loss 6.3479   LearningRate 0.0221   Epoch: 10   Global Step: 439800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:26,745-Speed 2629.91 samples/sec   Loss 6.3210   LearningRate 0.0221   Epoch: 10   Global Step: 439810   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:30,657-Speed 2617.82 samples/sec   Loss 6.2410   LearningRate 0.0221   Epoch: 10   Global Step: 439820   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:34,553-Speed 2628.73 samples/sec   Loss 6.2700   LearningRate 0.0221   Epoch: 10   Global Step: 439830   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:38,448-Speed 2629.46 samples/sec   Loss 6.1748   LearningRate 0.0221   Epoch: 10   Global Step: 439840   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:42,340-Speed 2631.96 samples/sec   Loss 6.2808   LearningRate 0.0221   Epoch: 10   Global Step: 439850   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:07:46,241-Speed 2625.75 samples/sec   Loss 6.2453   LearningRate 0.0221   Epoch: 10   Global Step: 439860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:07:50,145-Speed 2623.05 samples/sec   Loss 6.1796   LearningRate 0.0221   Epoch: 10   Global Step: 439870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:07:54,053-Speed 2621.07 samples/sec   Loss 6.0885   LearningRate 0.0221   Epoch: 10   Global Step: 439880   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:07:57,943-Speed 2633.31 samples/sec   Loss 6.3199   LearningRate 0.0221   Epoch: 10   Global Step: 439890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:01,851-Speed 2621.05 samples/sec   Loss 6.2790   LearningRate 0.0221   Epoch: 10   Global Step: 439900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:05,762-Speed 2618.70 samples/sec   Loss 6.3188   LearningRate 0.0221   Epoch: 10   Global Step: 439910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:09,660-Speed 2627.18 samples/sec   Loss 6.4608   LearningRate 0.0221   Epoch: 10   Global Step: 439920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:13,558-Speed 2627.57 samples/sec   Loss 6.2050   LearningRate 0.0221   Epoch: 10   Global Step: 439930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:17,450-Speed 2631.89 samples/sec   Loss 6.2104   LearningRate 0.0221   Epoch: 10   Global Step: 439940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:21,368-Speed 2614.11 samples/sec   Loss 6.1729   LearningRate 0.0221   Epoch: 10   Global Step: 439950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:25,273-Speed 2623.13 samples/sec   Loss 6.2674   LearningRate 0.0221   Epoch: 10   Global Step: 439960   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:29,173-Speed 2626.04 samples/sec   Loss 6.2629   LearningRate 0.0221   Epoch: 10   Global Step: 439970   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:33,067-Speed 2629.82 samples/sec   Loss 6.3779   LearningRate 0.0221   Epoch: 10   Global Step: 439980   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:08:36,963-Speed 2628.85 samples/sec   Loss 6.3100   LearningRate 0.0221   Epoch: 10   Global Step: 439990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:08:40,862-Speed 2626.98 samples/sec   Loss 6.3238   LearningRate 0.0221   Epoch: 10   Global Step: 440000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:09:23,743-[lfw][440000]XNorm: 23.537625
Training: 2022-04-14 21:09:23,743-[lfw][440000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 21:09:23,744-[lfw][440000]Accuracy-Highest: 0.99783
Training: 2022-04-14 21:10:13,485-[cfp_fp][440000]XNorm: 22.381696
Training: 2022-04-14 21:10:13,485-[cfp_fp][440000]Accuracy-Flip: 0.98671+-0.00616
Training: 2022-04-14 21:10:13,486-[cfp_fp][440000]Accuracy-Highest: 0.98843
Training: 2022-04-14 21:10:56,129-[agedb_30][440000]XNorm: 23.667220
Training: 2022-04-14 21:10:56,129-[agedb_30][440000]Accuracy-Flip: 0.97767+-0.00790
Training: 2022-04-14 21:10:56,130-[agedb_30][440000]Accuracy-Highest: 0.97767
Training: 2022-04-14 21:11:00,002-Speed 73.60 samples/sec   Loss 6.2343   LearningRate 0.0221   Epoch: 10   Global Step: 440010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:11:03,948-Speed 2595.51 samples/sec   Loss 6.3596   LearningRate 0.0221   Epoch: 10   Global Step: 440020   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:07,843-Speed 2629.59 samples/sec   Loss 6.2790   LearningRate 0.0221   Epoch: 10   Global Step: 440030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:11,718-Speed 2643.49 samples/sec   Loss 6.3260   LearningRate 0.0220   Epoch: 10   Global Step: 440040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:15,593-Speed 2642.70 samples/sec   Loss 6.3500   LearningRate 0.0220   Epoch: 10   Global Step: 440050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:19,471-Speed 2640.75 samples/sec   Loss 6.4086   LearningRate 0.0220   Epoch: 10   Global Step: 440060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:23,393-Speed 2612.03 samples/sec   Loss 6.2468   LearningRate 0.0220   Epoch: 10   Global Step: 440070   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:27,319-Speed 2609.02 samples/sec   Loss 6.3152   LearningRate 0.0220   Epoch: 10   Global Step: 440080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:31,205-Speed 2636.09 samples/sec   Loss 6.4061   LearningRate 0.0220   Epoch: 10   Global Step: 440090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:35,090-Speed 2636.19 samples/sec   Loss 6.2125   LearningRate 0.0220   Epoch: 10   Global Step: 440100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:38,979-Speed 2633.10 samples/sec   Loss 6.2174   LearningRate 0.0220   Epoch: 10   Global Step: 440110   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:11:42,876-Speed 2628.16 samples/sec   Loss 6.1686   LearningRate 0.0220   Epoch: 10   Global Step: 440120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:11:46,776-Speed 2626.19 samples/sec   Loss 6.3549   LearningRate 0.0220   Epoch: 10   Global Step: 440130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:11:50,675-Speed 2627.47 samples/sec   Loss 6.3522   LearningRate 0.0220   Epoch: 10   Global Step: 440140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:11:54,581-Speed 2622.19 samples/sec   Loss 6.3021   LearningRate 0.0220   Epoch: 10   Global Step: 440150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:11:58,479-Speed 2627.61 samples/sec   Loss 6.4588   LearningRate 0.0220   Epoch: 10   Global Step: 440160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:02,377-Speed 2627.70 samples/sec   Loss 6.2182   LearningRate 0.0220   Epoch: 10   Global Step: 440170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:06,284-Speed 2621.25 samples/sec   Loss 6.3357   LearningRate 0.0220   Epoch: 10   Global Step: 440180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:10,222-Speed 2600.94 samples/sec   Loss 6.3953   LearningRate 0.0220   Epoch: 10   Global Step: 440190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:14,157-Speed 2602.59 samples/sec   Loss 6.2853   LearningRate 0.0220   Epoch: 10   Global Step: 440200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:18,148-Speed 2566.34 samples/sec   Loss 6.3381   LearningRate 0.0220   Epoch: 10   Global Step: 440210   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:22,036-Speed 2634.57 samples/sec   Loss 6.3043   LearningRate 0.0220   Epoch: 10   Global Step: 440220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:25,932-Speed 2628.93 samples/sec   Loss 6.2578   LearningRate 0.0220   Epoch: 10   Global Step: 440230   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:29,833-Speed 2625.39 samples/sec   Loss 6.2450   LearningRate 0.0220   Epoch: 10   Global Step: 440240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:33,731-Speed 2627.76 samples/sec   Loss 6.2257   LearningRate 0.0220   Epoch: 10   Global Step: 440250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:37,621-Speed 2633.56 samples/sec   Loss 6.2790   LearningRate 0.0220   Epoch: 10   Global Step: 440260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:41,517-Speed 2628.80 samples/sec   Loss 6.2789   LearningRate 0.0220   Epoch: 10   Global Step: 440270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:45,408-Speed 2632.12 samples/sec   Loss 6.3246   LearningRate 0.0220   Epoch: 10   Global Step: 440280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:49,303-Speed 2629.93 samples/sec   Loss 6.2043   LearningRate 0.0220   Epoch: 10   Global Step: 440290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:53,204-Speed 2625.39 samples/sec   Loss 6.2937   LearningRate 0.0220   Epoch: 10   Global Step: 440300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:12:57,098-Speed 2630.32 samples/sec   Loss 6.2429   LearningRate 0.0220   Epoch: 10   Global Step: 440310   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:13:00,980-Speed 2638.51 samples/sec   Loss 6.3234   LearningRate 0.0220   Epoch: 10   Global Step: 440320   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:13:04,899-Speed 2613.17 samples/sec   Loss 6.4180   LearningRate 0.0220   Epoch: 10   Global Step: 440330   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:13:08,798-Speed 2627.23 samples/sec   Loss 6.1501   LearningRate 0.0220   Epoch: 10   Global Step: 440340   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:13:12,699-Speed 2625.36 samples/sec   Loss 6.2876   LearningRate 0.0220   Epoch: 10   Global Step: 440350   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:13:16,601-Speed 2625.04 samples/sec   Loss 6.1974   LearningRate 0.0220   Epoch: 10   Global Step: 440360   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:13:20,479-Speed 2640.86 samples/sec   Loss 6.2410   LearningRate 0.0220   Epoch: 10   Global Step: 440370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:24,385-Speed 2622.37 samples/sec   Loss 6.2753   LearningRate 0.0220   Epoch: 10   Global Step: 440380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:28,282-Speed 2628.18 samples/sec   Loss 6.3038   LearningRate 0.0220   Epoch: 10   Global Step: 440390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:32,177-Speed 2629.96 samples/sec   Loss 6.2457   LearningRate 0.0220   Epoch: 10   Global Step: 440400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:36,087-Speed 2618.91 samples/sec   Loss 6.2532   LearningRate 0.0220   Epoch: 10   Global Step: 440410   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:40,076-Speed 2567.73 samples/sec   Loss 6.2964   LearningRate 0.0220   Epoch: 10   Global Step: 440420   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:43,978-Speed 2625.14 samples/sec   Loss 6.4332   LearningRate 0.0220   Epoch: 10   Global Step: 440430   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:47,882-Speed 2623.37 samples/sec   Loss 6.2338   LearningRate 0.0220   Epoch: 10   Global Step: 440440   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:51,774-Speed 2631.51 samples/sec   Loss 6.2542   LearningRate 0.0220   Epoch: 10   Global Step: 440450   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:55,675-Speed 2625.57 samples/sec   Loss 6.2680   LearningRate 0.0220   Epoch: 10   Global Step: 440460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:13:59,570-Speed 2630.00 samples/sec   Loss 6.2640   LearningRate 0.0220   Epoch: 10   Global Step: 440470   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:14:03,464-Speed 2630.45 samples/sec   Loss 6.2352   LearningRate 0.0220   Epoch: 10   Global Step: 440480   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:14:07,353-Speed 2633.09 samples/sec   Loss 6.2199   LearningRate 0.0220   Epoch: 10   Global Step: 440490   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:14:11,250-Speed 2628.62 samples/sec   Loss 6.3016   LearningRate 0.0220   Epoch: 10   Global Step: 440500   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:14:15,130-Speed 2639.71 samples/sec   Loss 6.3234   LearningRate 0.0220   Epoch: 10   Global Step: 440510   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:19,028-Speed 2627.32 samples/sec   Loss 6.2907   LearningRate 0.0220   Epoch: 10   Global Step: 440520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:22,931-Speed 2623.75 samples/sec   Loss 6.3081   LearningRate 0.0220   Epoch: 10   Global Step: 440530   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:26,820-Speed 2633.74 samples/sec   Loss 6.2186   LearningRate 0.0220   Epoch: 10   Global Step: 440540   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:30,718-Speed 2627.67 samples/sec   Loss 6.2518   LearningRate 0.0220   Epoch: 10   Global Step: 440550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:34,613-Speed 2629.82 samples/sec   Loss 6.3268   LearningRate 0.0220   Epoch: 10   Global Step: 440560   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:38,506-Speed 2630.84 samples/sec   Loss 6.1674   LearningRate 0.0220   Epoch: 10   Global Step: 440570   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:42,399-Speed 2631.03 samples/sec   Loss 6.2861   LearningRate 0.0220   Epoch: 10   Global Step: 440580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:46,293-Speed 2630.78 samples/sec   Loss 6.1539   LearningRate 0.0220   Epoch: 10   Global Step: 440590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:50,183-Speed 2632.68 samples/sec   Loss 6.2507   LearningRate 0.0220   Epoch: 10   Global Step: 440600   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:14:54,086-Speed 2624.02 samples/sec   Loss 6.2851   LearningRate 0.0220   Epoch: 10   Global Step: 440610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:14:57,982-Speed 2629.11 samples/sec   Loss 6.2335   LearningRate 0.0220   Epoch: 10   Global Step: 440620   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:01,873-Speed 2632.44 samples/sec   Loss 6.2761   LearningRate 0.0220   Epoch: 10   Global Step: 440630   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:05,779-Speed 2621.75 samples/sec   Loss 6.3155   LearningRate 0.0220   Epoch: 10   Global Step: 440640   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:09,676-Speed 2628.66 samples/sec   Loss 6.2316   LearningRate 0.0220   Epoch: 10   Global Step: 440650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:13,572-Speed 2628.73 samples/sec   Loss 6.2607   LearningRate 0.0220   Epoch: 10   Global Step: 440660   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:17,466-Speed 2630.84 samples/sec   Loss 6.3695   LearningRate 0.0220   Epoch: 10   Global Step: 440670   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:21,364-Speed 2627.64 samples/sec   Loss 6.2916   LearningRate 0.0220   Epoch: 10   Global Step: 440680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:25,254-Speed 2632.45 samples/sec   Loss 6.2919   LearningRate 0.0220   Epoch: 10   Global Step: 440690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:15:29,133-Speed 2640.88 samples/sec   Loss 6.2285   LearningRate 0.0220   Epoch: 10   Global Step: 440700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:15:33,028-Speed 2629.11 samples/sec   Loss 6.2045   LearningRate 0.0220   Epoch: 10   Global Step: 440710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:15:36,930-Speed 2624.90 samples/sec   Loss 6.2290   LearningRate 0.0220   Epoch: 10   Global Step: 440720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:15:40,831-Speed 2625.44 samples/sec   Loss 6.3011   LearningRate 0.0220   Epoch: 10   Global Step: 440730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:15:44,722-Speed 2632.26 samples/sec   Loss 6.2203   LearningRate 0.0220   Epoch: 10   Global Step: 440740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:15:48,619-Speed 2628.59 samples/sec   Loss 6.2775   LearningRate 0.0220   Epoch: 10   Global Step: 440750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:15:52,517-Speed 2627.98 samples/sec   Loss 6.4200   LearningRate 0.0220   Epoch: 10   Global Step: 440760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:15:56,411-Speed 2630.45 samples/sec   Loss 6.2815   LearningRate 0.0220   Epoch: 10   Global Step: 440770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:00,319-Speed 2621.49 samples/sec   Loss 6.3188   LearningRate 0.0220   Epoch: 10   Global Step: 440780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:04,215-Speed 2628.85 samples/sec   Loss 6.2259   LearningRate 0.0220   Epoch: 10   Global Step: 440790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:08,111-Speed 2628.84 samples/sec   Loss 6.2525   LearningRate 0.0220   Epoch: 10   Global Step: 440800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:16:12,011-Speed 2625.92 samples/sec   Loss 6.2214   LearningRate 0.0220   Epoch: 10   Global Step: 440810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:16:15,891-Speed 2640.24 samples/sec   Loss 6.1936   LearningRate 0.0220   Epoch: 10   Global Step: 440820   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:19,784-Speed 2631.18 samples/sec   Loss 6.2478   LearningRate 0.0220   Epoch: 10   Global Step: 440830   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:23,680-Speed 2628.83 samples/sec   Loss 6.3209   LearningRate 0.0220   Epoch: 10   Global Step: 440840   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:27,578-Speed 2627.74 samples/sec   Loss 6.2759   LearningRate 0.0220   Epoch: 10   Global Step: 440850   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:31,471-Speed 2631.49 samples/sec   Loss 6.2269   LearningRate 0.0220   Epoch: 10   Global Step: 440860   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:35,366-Speed 2629.68 samples/sec   Loss 6.2685   LearningRate 0.0220   Epoch: 10   Global Step: 440870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:39,261-Speed 2629.70 samples/sec   Loss 6.1301   LearningRate 0.0220   Epoch: 10   Global Step: 440880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:43,177-Speed 2615.11 samples/sec   Loss 6.2100   LearningRate 0.0220   Epoch: 10   Global Step: 440890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:47,077-Speed 2626.32 samples/sec   Loss 6.2007   LearningRate 0.0220   Epoch: 10   Global Step: 440900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:50,975-Speed 2627.55 samples/sec   Loss 6.2429   LearningRate 0.0220   Epoch: 10   Global Step: 440910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:16:54,873-Speed 2628.23 samples/sec   Loss 6.2743   LearningRate 0.0219   Epoch: 10   Global Step: 440920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:16:58,815-Speed 2598.14 samples/sec   Loss 6.2701   LearningRate 0.0219   Epoch: 10   Global Step: 440930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:02,730-Speed 2617.31 samples/sec   Loss 6.2273   LearningRate 0.0219   Epoch: 10   Global Step: 440940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:06,632-Speed 2624.84 samples/sec   Loss 6.2093   LearningRate 0.0219   Epoch: 10   Global Step: 440950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:10,641-Speed 2554.89 samples/sec   Loss 6.2637   LearningRate 0.0219   Epoch: 10   Global Step: 440960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:14,551-Speed 2619.28 samples/sec   Loss 6.2213   LearningRate 0.0219   Epoch: 10   Global Step: 440970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:18,463-Speed 2617.85 samples/sec   Loss 6.1904   LearningRate 0.0219   Epoch: 10   Global Step: 440980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:22,359-Speed 2629.29 samples/sec   Loss 6.2823   LearningRate 0.0219   Epoch: 10   Global Step: 440990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:26,274-Speed 2616.07 samples/sec   Loss 6.2009   LearningRate 0.0219   Epoch: 10   Global Step: 441000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:30,170-Speed 2629.49 samples/sec   Loss 6.2590   LearningRate 0.0219   Epoch: 10   Global Step: 441010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:34,048-Speed 2641.11 samples/sec   Loss 6.4397   LearningRate 0.0219   Epoch: 10   Global Step: 441020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:17:37,939-Speed 2632.43 samples/sec   Loss 6.1909   LearningRate 0.0219   Epoch: 10   Global Step: 441030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:17:41,843-Speed 2623.74 samples/sec   Loss 6.2035   LearningRate 0.0219   Epoch: 10   Global Step: 441040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:17:45,743-Speed 2626.21 samples/sec   Loss 6.2397   LearningRate 0.0219   Epoch: 10   Global Step: 441050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:17:49,663-Speed 2612.51 samples/sec   Loss 6.3583   LearningRate 0.0219   Epoch: 10   Global Step: 441060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:17:53,638-Speed 2577.37 samples/sec   Loss 6.2683   LearningRate 0.0219   Epoch: 10   Global Step: 441070   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:17:57,536-Speed 2627.11 samples/sec   Loss 6.2663   LearningRate 0.0219   Epoch: 10   Global Step: 441080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:01,447-Speed 2619.05 samples/sec   Loss 6.2112   LearningRate 0.0219   Epoch: 10   Global Step: 441090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:05,356-Speed 2620.13 samples/sec   Loss 6.2311   LearningRate 0.0219   Epoch: 10   Global Step: 441100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:09,262-Speed 2622.63 samples/sec   Loss 6.2990   LearningRate 0.0219   Epoch: 10   Global Step: 441110   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:13,167-Speed 2623.18 samples/sec   Loss 6.2336   LearningRate 0.0219   Epoch: 10   Global Step: 441120   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:17,066-Speed 2627.10 samples/sec   Loss 6.3075   LearningRate 0.0219   Epoch: 10   Global Step: 441130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:18:20,945-Speed 2640.43 samples/sec   Loss 6.2206   LearningRate 0.0219   Epoch: 10   Global Step: 441140   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:24,849-Speed 2623.64 samples/sec   Loss 6.2941   LearningRate 0.0219   Epoch: 10   Global Step: 441150   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:28,775-Speed 2608.40 samples/sec   Loss 6.2170   LearningRate 0.0219   Epoch: 10   Global Step: 441160   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:32,674-Speed 2627.97 samples/sec   Loss 6.3462   LearningRate 0.0219   Epoch: 10   Global Step: 441170   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:36,567-Speed 2630.69 samples/sec   Loss 6.2613   LearningRate 0.0219   Epoch: 10   Global Step: 441180   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:40,492-Speed 2609.80 samples/sec   Loss 6.2418   LearningRate 0.0219   Epoch: 10   Global Step: 441190   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:44,399-Speed 2621.47 samples/sec   Loss 6.1984   LearningRate 0.0219   Epoch: 10   Global Step: 441200   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:48,302-Speed 2624.34 samples/sec   Loss 6.2550   LearningRate 0.0219   Epoch: 10   Global Step: 441210   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:52,216-Speed 2617.18 samples/sec   Loss 6.1778   LearningRate 0.0219   Epoch: 10   Global Step: 441220   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:18:56,132-Speed 2615.13 samples/sec   Loss 6.3367   LearningRate 0.0219   Epoch: 10   Global Step: 441230   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:00,052-Speed 2613.08 samples/sec   Loss 6.2780   LearningRate 0.0219   Epoch: 10   Global Step: 441240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:19:04,041-Speed 2567.69 samples/sec   Loss 6.2565   LearningRate 0.0219   Epoch: 10   Global Step: 441250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:19:08,045-Speed 2558.31 samples/sec   Loss 6.1913   LearningRate 0.0219   Epoch: 10   Global Step: 441260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:19:11,942-Speed 2627.90 samples/sec   Loss 6.3344   LearningRate 0.0219   Epoch: 10   Global Step: 441270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:19:15,848-Speed 2622.98 samples/sec   Loss 6.2304   LearningRate 0.0219   Epoch: 10   Global Step: 441280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:19:19,727-Speed 2640.30 samples/sec   Loss 6.2353   LearningRate 0.0219   Epoch: 10   Global Step: 441290   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:23,623-Speed 2628.99 samples/sec   Loss 6.1735   LearningRate 0.0219   Epoch: 10   Global Step: 441300   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:27,517-Speed 2629.86 samples/sec   Loss 6.2267   LearningRate 0.0219   Epoch: 10   Global Step: 441310   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:31,417-Speed 2627.04 samples/sec   Loss 6.2669   LearningRate 0.0219   Epoch: 10   Global Step: 441320   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:35,314-Speed 2628.05 samples/sec   Loss 6.0785   LearningRate 0.0219   Epoch: 10   Global Step: 441330   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:39,226-Speed 2618.89 samples/sec   Loss 6.1950   LearningRate 0.0219   Epoch: 10   Global Step: 441340   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:43,142-Speed 2615.81 samples/sec   Loss 6.3127   LearningRate 0.0219   Epoch: 10   Global Step: 441350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:47,040-Speed 2628.20 samples/sec   Loss 6.1676   LearningRate 0.0219   Epoch: 10   Global Step: 441360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:50,937-Speed 2628.04 samples/sec   Loss 6.2051   LearningRate 0.0219   Epoch: 10   Global Step: 441370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:54,835-Speed 2627.02 samples/sec   Loss 6.3441   LearningRate 0.0219   Epoch: 10   Global Step: 441380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:19:58,736-Speed 2625.44 samples/sec   Loss 6.1924   LearningRate 0.0219   Epoch: 10   Global Step: 441390   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:20:02,637-Speed 2625.90 samples/sec   Loss 6.1282   LearningRate 0.0219   Epoch: 10   Global Step: 441400   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:20:06,547-Speed 2620.43 samples/sec   Loss 6.3088   LearningRate 0.0219   Epoch: 10   Global Step: 441410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:20:10,444-Speed 2627.91 samples/sec   Loss 6.3560   LearningRate 0.0219   Epoch: 10   Global Step: 441420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:20:14,351-Speed 2621.96 samples/sec   Loss 6.3312   LearningRate 0.0219   Epoch: 10   Global Step: 441430   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:18,253-Speed 2624.58 samples/sec   Loss 6.2441   LearningRate 0.0219   Epoch: 10   Global Step: 441440   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:22,161-Speed 2621.31 samples/sec   Loss 6.3071   LearningRate 0.0219   Epoch: 10   Global Step: 441450   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:26,050-Speed 2633.01 samples/sec   Loss 6.3212   LearningRate 0.0219   Epoch: 10   Global Step: 441460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:29,951-Speed 2626.35 samples/sec   Loss 6.3147   LearningRate 0.0219   Epoch: 10   Global Step: 441470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:33,843-Speed 2631.58 samples/sec   Loss 6.2791   LearningRate 0.0219   Epoch: 10   Global Step: 441480   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:37,816-Speed 2578.55 samples/sec   Loss 6.3031   LearningRate 0.0219   Epoch: 10   Global Step: 441490   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:41,720-Speed 2623.97 samples/sec   Loss 6.3801   LearningRate 0.0219   Epoch: 10   Global Step: 441500   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:45,635-Speed 2615.82 samples/sec   Loss 6.3100   LearningRate 0.0219   Epoch: 10   Global Step: 441510   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:49,549-Speed 2617.22 samples/sec   Loss 6.2327   LearningRate 0.0219   Epoch: 10   Global Step: 441520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:20:53,459-Speed 2619.82 samples/sec   Loss 6.2127   LearningRate 0.0219   Epoch: 10   Global Step: 441530   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:20:57,356-Speed 2628.20 samples/sec   Loss 6.3113   LearningRate 0.0219   Epoch: 10   Global Step: 441540   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:21:01,256-Speed 2626.52 samples/sec   Loss 6.2530   LearningRate 0.0219   Epoch: 10   Global Step: 441550   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:21:05,159-Speed 2624.14 samples/sec   Loss 6.2108   LearningRate 0.0219   Epoch: 10   Global Step: 441560   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:21:09,075-Speed 2615.30 samples/sec   Loss 6.2290   LearningRate 0.0219   Epoch: 10   Global Step: 441570   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:21:12,962-Speed 2635.47 samples/sec   Loss 6.2130   LearningRate 0.0219   Epoch: 10   Global Step: 441580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:16,864-Speed 2625.07 samples/sec   Loss 6.2103   LearningRate 0.0219   Epoch: 10   Global Step: 441590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:20,771-Speed 2621.86 samples/sec   Loss 6.1871   LearningRate 0.0219   Epoch: 10   Global Step: 441600   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:24,857-Speed 2506.61 samples/sec   Loss 6.3187   LearningRate 0.0219   Epoch: 10   Global Step: 441610   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:28,762-Speed 2623.03 samples/sec   Loss 6.2107   LearningRate 0.0219   Epoch: 10   Global Step: 441620   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:32,735-Speed 2578.50 samples/sec   Loss 6.2256   LearningRate 0.0219   Epoch: 10   Global Step: 441630   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:36,632-Speed 2628.20 samples/sec   Loss 6.2675   LearningRate 0.0219   Epoch: 10   Global Step: 441640   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:40,531-Speed 2626.64 samples/sec   Loss 6.2294   LearningRate 0.0219   Epoch: 10   Global Step: 441650   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:44,444-Speed 2617.38 samples/sec   Loss 6.2496   LearningRate 0.0219   Epoch: 10   Global Step: 441660   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:48,344-Speed 2625.91 samples/sec   Loss 6.2397   LearningRate 0.0219   Epoch: 10   Global Step: 441670   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:21:52,246-Speed 2625.76 samples/sec   Loss 6.2250   LearningRate 0.0219   Epoch: 10   Global Step: 441680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:21:56,147-Speed 2625.14 samples/sec   Loss 6.1442   LearningRate 0.0219   Epoch: 10   Global Step: 441690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:22:00,050-Speed 2624.76 samples/sec   Loss 6.1971   LearningRate 0.0219   Epoch: 10   Global Step: 441700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:22:03,963-Speed 2617.10 samples/sec   Loss 6.2099   LearningRate 0.0219   Epoch: 10   Global Step: 441710   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:22:07,862-Speed 2626.92 samples/sec   Loss 6.2119   LearningRate 0.0219   Epoch: 10   Global Step: 441720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:11,763-Speed 2625.86 samples/sec   Loss 6.1845   LearningRate 0.0219   Epoch: 10   Global Step: 441730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:15,665-Speed 2624.88 samples/sec   Loss 6.2021   LearningRate 0.0219   Epoch: 10   Global Step: 441740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:19,570-Speed 2623.34 samples/sec   Loss 6.2884   LearningRate 0.0219   Epoch: 10   Global Step: 441750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:23,496-Speed 2608.85 samples/sec   Loss 6.2205   LearningRate 0.0219   Epoch: 10   Global Step: 441760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:27,434-Speed 2600.81 samples/sec   Loss 6.1613   LearningRate 0.0219   Epoch: 10   Global Step: 441770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:31,333-Speed 2627.01 samples/sec   Loss 6.2692   LearningRate 0.0219   Epoch: 10   Global Step: 441780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:35,285-Speed 2591.79 samples/sec   Loss 6.1918   LearningRate 0.0219   Epoch: 10   Global Step: 441790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:22:39,174-Speed 2634.10 samples/sec   Loss 6.2663   LearningRate 0.0219   Epoch: 10   Global Step: 441800   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:22:43,090-Speed 2615.09 samples/sec   Loss 6.2525   LearningRate 0.0218   Epoch: 10   Global Step: 441810   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:22:46,987-Speed 2628.90 samples/sec   Loss 6.1154   LearningRate 0.0218   Epoch: 10   Global Step: 441820   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:22:50,890-Speed 2623.91 samples/sec   Loss 6.1648   LearningRate 0.0218   Epoch: 10   Global Step: 441830   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:22:54,791-Speed 2625.59 samples/sec   Loss 6.2966   LearningRate 0.0218   Epoch: 10   Global Step: 441840   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:22:58,692-Speed 2625.59 samples/sec   Loss 6.2905   LearningRate 0.0218   Epoch: 10   Global Step: 441850   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:23:02,592-Speed 2626.56 samples/sec   Loss 6.2940   LearningRate 0.0218   Epoch: 10   Global Step: 441860   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:23:06,552-Speed 2586.15 samples/sec   Loss 6.1502   LearningRate 0.0218   Epoch: 10   Global Step: 441870   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:23:10,450-Speed 2627.42 samples/sec   Loss 6.3086   LearningRate 0.0218   Epoch: 10   Global Step: 441880   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:23:14,354-Speed 2623.58 samples/sec   Loss 6.2423   LearningRate 0.0218   Epoch: 10   Global Step: 441890   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-04-14 21:23:18,254-Speed 2626.47 samples/sec   Loss 6.1922   LearningRate 0.0218   Epoch: 10   Global Step: 441900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:22,151-Speed 2628.34 samples/sec   Loss 6.2035   LearningRate 0.0218   Epoch: 10   Global Step: 441910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:26,046-Speed 2629.29 samples/sec   Loss 6.2472   LearningRate 0.0218   Epoch: 10   Global Step: 441920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:29,946-Speed 2626.53 samples/sec   Loss 6.1723   LearningRate 0.0218   Epoch: 10   Global Step: 441930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:33,850-Speed 2623.22 samples/sec   Loss 6.1634   LearningRate 0.0218   Epoch: 10   Global Step: 441940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:37,747-Speed 2628.40 samples/sec   Loss 6.3230   LearningRate 0.0218   Epoch: 10   Global Step: 441950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:41,643-Speed 2628.60 samples/sec   Loss 6.1892   LearningRate 0.0218   Epoch: 10   Global Step: 441960   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:45,539-Speed 2629.67 samples/sec   Loss 6.2960   LearningRate 0.0218   Epoch: 10   Global Step: 441970   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:49,433-Speed 2630.00 samples/sec   Loss 6.2793   LearningRate 0.0218   Epoch: 10   Global Step: 441980   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:53,331-Speed 2627.84 samples/sec   Loss 6.2833   LearningRate 0.0218   Epoch: 10   Global Step: 441990   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:23:57,233-Speed 2624.42 samples/sec   Loss 6.1722   LearningRate 0.0218   Epoch: 10   Global Step: 442000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:01,133-Speed 2626.81 samples/sec   Loss 6.3447   LearningRate 0.0218   Epoch: 10   Global Step: 442010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:05,036-Speed 2624.02 samples/sec   Loss 6.3101   LearningRate 0.0218   Epoch: 10   Global Step: 442020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:08,943-Speed 2621.30 samples/sec   Loss 6.2773   LearningRate 0.0218   Epoch: 10   Global Step: 442030   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:12,841-Speed 2627.16 samples/sec   Loss 6.2351   LearningRate 0.0218   Epoch: 10   Global Step: 442040   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:16,740-Speed 2627.44 samples/sec   Loss 6.2864   LearningRate 0.0218   Epoch: 10   Global Step: 442050   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:20,644-Speed 2623.91 samples/sec   Loss 6.3061   LearningRate 0.0218   Epoch: 10   Global Step: 442060   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:24,686-Speed 2533.74 samples/sec   Loss 6.2538   LearningRate 0.0218   Epoch: 10   Global Step: 442070   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:24:28,570-Speed 2637.08 samples/sec   Loss 6.1844   LearningRate 0.0218   Epoch: 10   Global Step: 442080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:32,469-Speed 2627.56 samples/sec   Loss 6.2651   LearningRate 0.0218   Epoch: 10   Global Step: 442090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:36,375-Speed 2621.79 samples/sec   Loss 6.3103   LearningRate 0.0218   Epoch: 10   Global Step: 442100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:40,295-Speed 2613.49 samples/sec   Loss 6.3422   LearningRate 0.0218   Epoch: 10   Global Step: 442110   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:44,215-Speed 2613.32 samples/sec   Loss 6.2210   LearningRate 0.0218   Epoch: 10   Global Step: 442120   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:48,126-Speed 2619.00 samples/sec   Loss 6.1806   LearningRate 0.0218   Epoch: 10   Global Step: 442130   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:52,036-Speed 2619.54 samples/sec   Loss 6.1476   LearningRate 0.0218   Epoch: 10   Global Step: 442140   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:55,936-Speed 2625.73 samples/sec   Loss 6.2377   LearningRate 0.0218   Epoch: 10   Global Step: 442150   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:24:59,983-Speed 2531.06 samples/sec   Loss 6.2372   LearningRate 0.0218   Epoch: 10   Global Step: 442160   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:03,878-Speed 2630.13 samples/sec   Loss 6.1814   LearningRate 0.0218   Epoch: 10   Global Step: 442170   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:07,819-Speed 2598.73 samples/sec   Loss 6.2516   LearningRate 0.0218   Epoch: 10   Global Step: 442180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:25:11,779-Speed 2586.48 samples/sec   Loss 6.3377   LearningRate 0.0218   Epoch: 10   Global Step: 442190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:25:15,691-Speed 2618.45 samples/sec   Loss 6.1358   LearningRate 0.0218   Epoch: 10   Global Step: 442200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:25:19,594-Speed 2624.12 samples/sec   Loss 6.2530   LearningRate 0.0218   Epoch: 10   Global Step: 442210   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:25:23,502-Speed 2621.42 samples/sec   Loss 6.1874   LearningRate 0.0218   Epoch: 10   Global Step: 442220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:25:27,381-Speed 2639.77 samples/sec   Loss 6.2244   LearningRate 0.0218   Epoch: 10   Global Step: 442230   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:31,335-Speed 2590.97 samples/sec   Loss 6.2539   LearningRate 0.0218   Epoch: 10   Global Step: 442240   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:35,248-Speed 2617.34 samples/sec   Loss 6.1837   LearningRate 0.0218   Epoch: 10   Global Step: 442250   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:39,255-Speed 2556.58 samples/sec   Loss 6.2391   LearningRate 0.0218   Epoch: 10   Global Step: 442260   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:43,361-Speed 2494.42 samples/sec   Loss 6.1569   LearningRate 0.0218   Epoch: 10   Global Step: 442270   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:47,460-Speed 2498.93 samples/sec   Loss 6.2564   LearningRate 0.0218   Epoch: 10   Global Step: 442280   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:51,550-Speed 2504.41 samples/sec   Loss 6.3308   LearningRate 0.0218   Epoch: 10   Global Step: 442290   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:55,576-Speed 2543.97 samples/sec   Loss 6.2042   LearningRate 0.0218   Epoch: 10   Global Step: 442300   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:25:59,484-Speed 2621.01 samples/sec   Loss 6.1858   LearningRate 0.0218   Epoch: 10   Global Step: 442310   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:03,392-Speed 2620.71 samples/sec   Loss 6.2266   LearningRate 0.0218   Epoch: 10   Global Step: 442320   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:07,277-Speed 2636.56 samples/sec   Loss 6.1989   LearningRate 0.0218   Epoch: 10   Global Step: 442330   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:11,188-Speed 2619.16 samples/sec   Loss 6.2657   LearningRate 0.0218   Epoch: 10   Global Step: 442340   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:15,130-Speed 2598.17 samples/sec   Loss 6.2189   LearningRate 0.0218   Epoch: 10   Global Step: 442350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:19,035-Speed 2623.54 samples/sec   Loss 6.1795   LearningRate 0.0218   Epoch: 10   Global Step: 442360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:22,934-Speed 2626.49 samples/sec   Loss 6.2671   LearningRate 0.0218   Epoch: 10   Global Step: 442370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:26,835-Speed 2625.35 samples/sec   Loss 6.2718   LearningRate 0.0218   Epoch: 10   Global Step: 442380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:30,748-Speed 2618.24 samples/sec   Loss 6.2943   LearningRate 0.0218   Epoch: 10   Global Step: 442390   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:34,656-Speed 2620.74 samples/sec   Loss 6.2392   LearningRate 0.0218   Epoch: 10   Global Step: 442400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:38,605-Speed 2593.90 samples/sec   Loss 6.1552   LearningRate 0.0218   Epoch: 10   Global Step: 442410   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:42,530-Speed 2610.20 samples/sec   Loss 6.2929   LearningRate 0.0218   Epoch: 10   Global Step: 442420   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-04-14 21:26:46,431-Speed 2625.54 samples/sec   Loss 6.2836   LearningRate 0.0218   Epoch: 10   Global Step: 442430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:26:50,360-Speed 2607.51 samples/sec   Loss 6.2151   LearningRate 0.0218   Epoch: 10   Global Step: 442440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:26:54,271-Speed 2618.50 samples/sec   Loss 6.2767   LearningRate 0.0218   Epoch: 10   Global Step: 442450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-04-14 21:26:58,171-Speed 2626.56 samples/sec   Loss 6.2305   LearningRate 0.0218   Epoch: 10   Global Step: 442460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:02,094-Speed 2610.67 samples/sec   Loss 6.1895   LearningRate 0.0218   Epoch: 10   Global Step: 442470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:06,043-Speed 2594.38 samples/sec   Loss 6.1453   LearningRate 0.0218   Epoch: 10   Global Step: 442480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:09,941-Speed 2627.43 samples/sec   Loss 6.1889   LearningRate 0.0218   Epoch: 10   Global Step: 442490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:13,851-Speed 2619.95 samples/sec   Loss 6.2298   LearningRate 0.0218   Epoch: 10   Global Step: 442500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:17,765-Speed 2617.06 samples/sec   Loss 6.2402   LearningRate 0.0218   Epoch: 10   Global Step: 442510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:21,665-Speed 2626.44 samples/sec   Loss 6.1280   LearningRate 0.0218   Epoch: 10   Global Step: 442520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:25,551-Speed 2635.98 samples/sec   Loss 6.3395   LearningRate 0.0218   Epoch: 10   Global Step: 442530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:29,452-Speed 2625.03 samples/sec   Loss 6.2046   LearningRate 0.0218   Epoch: 10   Global Step: 442540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:33,361-Speed 2620.17 samples/sec   Loss 6.2011   LearningRate 0.0218   Epoch: 10   Global Step: 442550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:37,267-Speed 2622.06 samples/sec   Loss 6.1671   LearningRate 0.0218   Epoch: 10   Global Step: 442560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:41,164-Speed 2628.84 samples/sec   Loss 6.2963   LearningRate 0.0218   Epoch: 10   Global Step: 442570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:45,075-Speed 2618.68 samples/sec   Loss 6.1754   LearningRate 0.0218   Epoch: 10   Global Step: 442580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:48,977-Speed 2625.04 samples/sec   Loss 6.2418   LearningRate 0.0218   Epoch: 10   Global Step: 442590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:52,877-Speed 2626.09 samples/sec   Loss 6.3017   LearningRate 0.0218   Epoch: 10   Global Step: 442600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:27:56,895-Speed 2549.50 samples/sec   Loss 6.1553   LearningRate 0.0218   Epoch: 10   Global Step: 442610   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:28:00,798-Speed 2624.70 samples/sec   Loss 6.1472   LearningRate 0.0218   Epoch: 10   Global Step: 442620   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:28:04,695-Speed 2627.86 samples/sec   Loss 6.2664   LearningRate 0.0218   Epoch: 10   Global Step: 442630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:28:08,595-Speed 2626.15 samples/sec   Loss 6.1681   LearningRate 0.0218   Epoch: 10   Global Step: 442640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:28:12,529-Speed 2603.90 samples/sec   Loss 6.2586   LearningRate 0.0218   Epoch: 10   Global Step: 442650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:28:16,427-Speed 2628.06 samples/sec   Loss 6.1780   LearningRate 0.0218   Epoch: 10   Global Step: 442660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:28:20,314-Speed 2634.97 samples/sec   Loss 6.1964   LearningRate 0.0218   Epoch: 10   Global Step: 442670   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:24,216-Speed 2625.05 samples/sec   Loss 6.3566   LearningRate 0.0218   Epoch: 10   Global Step: 442680   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:28,124-Speed 2621.40 samples/sec   Loss 6.3092   LearningRate 0.0217   Epoch: 10   Global Step: 442690   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:32,028-Speed 2623.42 samples/sec   Loss 6.2373   LearningRate 0.0217   Epoch: 10   Global Step: 442700   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:35,928-Speed 2625.81 samples/sec   Loss 6.2029   LearningRate 0.0217   Epoch: 10   Global Step: 442710   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:39,833-Speed 2623.04 samples/sec   Loss 6.2616   LearningRate 0.0217   Epoch: 10   Global Step: 442720   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:43,738-Speed 2622.96 samples/sec   Loss 6.2573   LearningRate 0.0217   Epoch: 10   Global Step: 442730   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:47,652-Speed 2617.21 samples/sec   Loss 6.2666   LearningRate 0.0217   Epoch: 10   Global Step: 442740   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:51,557-Speed 2622.87 samples/sec   Loss 6.2475   LearningRate 0.0217   Epoch: 10   Global Step: 442750   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:55,454-Speed 2628.17 samples/sec   Loss 6.3104   LearningRate 0.0217   Epoch: 10   Global Step: 442760   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:28:59,361-Speed 2621.78 samples/sec   Loss 6.2826   LearningRate 0.0217   Epoch: 10   Global Step: 442770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:03,280-Speed 2613.25 samples/sec   Loss 6.3587   LearningRate 0.0217   Epoch: 10   Global Step: 442780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:07,184-Speed 2623.35 samples/sec   Loss 6.2900   LearningRate 0.0217   Epoch: 10   Global Step: 442790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:11,096-Speed 2618.26 samples/sec   Loss 6.1552   LearningRate 0.0217   Epoch: 10   Global Step: 442800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:14,996-Speed 2626.42 samples/sec   Loss 6.1568   LearningRate 0.0217   Epoch: 10   Global Step: 442810   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:18,896-Speed 2626.74 samples/sec   Loss 6.2008   LearningRate 0.0217   Epoch: 10   Global Step: 442820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:22,797-Speed 2625.09 samples/sec   Loss 6.2135   LearningRate 0.0217   Epoch: 10   Global Step: 442830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:26,704-Speed 2622.16 samples/sec   Loss 6.2099   LearningRate 0.0217   Epoch: 10   Global Step: 442840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:30,608-Speed 2622.92 samples/sec   Loss 6.1658   LearningRate 0.0217   Epoch: 10   Global Step: 442850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:34,516-Speed 2621.22 samples/sec   Loss 6.1935   LearningRate 0.0217   Epoch: 10   Global Step: 442860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:29:38,424-Speed 2620.52 samples/sec   Loss 6.2621   LearningRate 0.0217   Epoch: 10   Global Step: 442870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:29:42,334-Speed 2619.45 samples/sec   Loss 6.2944   LearningRate 0.0217   Epoch: 10   Global Step: 442880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:29:46,244-Speed 2619.41 samples/sec   Loss 6.2868   LearningRate 0.0217   Epoch: 10   Global Step: 442890   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:29:50,169-Speed 2610.51 samples/sec   Loss 6.2989   LearningRate 0.0217   Epoch: 10   Global Step: 442900   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:29:54,071-Speed 2624.67 samples/sec   Loss 6.1168   LearningRate 0.0217   Epoch: 10   Global Step: 442910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:29:57,968-Speed 2628.43 samples/sec   Loss 6.1832   LearningRate 0.0217   Epoch: 10   Global Step: 442920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:01,871-Speed 2623.98 samples/sec   Loss 6.2360   LearningRate 0.0217   Epoch: 10   Global Step: 442930   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:05,773-Speed 2624.85 samples/sec   Loss 6.1774   LearningRate 0.0217   Epoch: 10   Global Step: 442940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:09,677-Speed 2623.44 samples/sec   Loss 6.2064   LearningRate 0.0217   Epoch: 10   Global Step: 442950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:13,583-Speed 2622.67 samples/sec   Loss 6.2062   LearningRate 0.0217   Epoch: 10   Global Step: 442960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:17,463-Speed 2639.94 samples/sec   Loss 6.1624   LearningRate 0.0217   Epoch: 10   Global Step: 442970   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:21,389-Speed 2608.73 samples/sec   Loss 6.1397   LearningRate 0.0217   Epoch: 10   Global Step: 442980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:25,305-Speed 2615.87 samples/sec   Loss 6.1531   LearningRate 0.0217   Epoch: 10   Global Step: 442990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:29,220-Speed 2616.85 samples/sec   Loss 6.2916   LearningRate 0.0217   Epoch: 10   Global Step: 443000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:33,119-Speed 2626.92 samples/sec   Loss 6.1821   LearningRate 0.0217   Epoch: 10   Global Step: 443010   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:30:36,996-Speed 2641.26 samples/sec   Loss 6.2661   LearningRate 0.0217   Epoch: 10   Global Step: 443020   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:30:40,899-Speed 2624.30 samples/sec   Loss 6.1978   LearningRate 0.0217   Epoch: 10   Global Step: 443030   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:30:44,804-Speed 2623.37 samples/sec   Loss 6.3801   LearningRate 0.0217   Epoch: 10   Global Step: 443040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:30:48,710-Speed 2622.61 samples/sec   Loss 6.2644   LearningRate 0.0217   Epoch: 10   Global Step: 443050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:30:52,622-Speed 2618.96 samples/sec   Loss 6.2692   LearningRate 0.0217   Epoch: 10   Global Step: 443060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:30:56,522-Speed 2626.05 samples/sec   Loss 6.0722   LearningRate 0.0217   Epoch: 10   Global Step: 443070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:00,422-Speed 2626.16 samples/sec   Loss 6.1741   LearningRate 0.0217   Epoch: 10   Global Step: 443080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:04,324-Speed 2624.68 samples/sec   Loss 6.2648   LearningRate 0.0217   Epoch: 10   Global Step: 443090   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:08,225-Speed 2625.80 samples/sec   Loss 6.1283   LearningRate 0.0217   Epoch: 10   Global Step: 443100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:12,131-Speed 2622.35 samples/sec   Loss 6.3358   LearningRate 0.0217   Epoch: 10   Global Step: 443110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:16,036-Speed 2623.49 samples/sec   Loss 6.3871   LearningRate 0.0217   Epoch: 10   Global Step: 443120   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:19,945-Speed 2620.25 samples/sec   Loss 6.2881   LearningRate 0.0217   Epoch: 10   Global Step: 443130   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:23,851-Speed 2622.39 samples/sec   Loss 6.2446   LearningRate 0.0217   Epoch: 10   Global Step: 443140   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:27,764-Speed 2617.42 samples/sec   Loss 6.2562   LearningRate 0.0217   Epoch: 10   Global Step: 443150   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:31,680-Speed 2615.17 samples/sec   Loss 6.1975   LearningRate 0.0217   Epoch: 10   Global Step: 443160   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:35,584-Speed 2623.48 samples/sec   Loss 6.3259   LearningRate 0.0217   Epoch: 10   Global Step: 443170   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:39,484-Speed 2625.94 samples/sec   Loss 6.2789   LearningRate 0.0217   Epoch: 10   Global Step: 443180   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:43,383-Speed 2627.69 samples/sec   Loss 6.2906   LearningRate 0.0217   Epoch: 10   Global Step: 443190   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:31:47,269-Speed 2635.83 samples/sec   Loss 6.3322   LearningRate 0.0217   Epoch: 10   Global Step: 443200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:51,170-Speed 2625.46 samples/sec   Loss 6.1937   LearningRate 0.0217   Epoch: 10   Global Step: 443210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:55,072-Speed 2624.80 samples/sec   Loss 6.2747   LearningRate 0.0217   Epoch: 10   Global Step: 443220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:31:58,984-Speed 2618.37 samples/sec   Loss 6.1886   LearningRate 0.0217   Epoch: 10   Global Step: 443230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:32:02,886-Speed 2624.86 samples/sec   Loss 6.1641   LearningRate 0.0217   Epoch: 10   Global Step: 443240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:32:06,799-Speed 2617.68 samples/sec   Loss 6.1850   LearningRate 0.0217   Epoch: 10   Global Step: 443250   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:32:10,702-Speed 2624.52 samples/sec   Loss 6.1766   LearningRate 0.0217   Epoch: 10   Global Step: 443260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:32:14,628-Speed 2608.87 samples/sec   Loss 6.0168   LearningRate 0.0217   Epoch: 10   Global Step: 443270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:32:18,532-Speed 2623.50 samples/sec   Loss 6.2710   LearningRate 0.0217   Epoch: 10   Global Step: 443280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:32:22,439-Speed 2621.78 samples/sec   Loss 6.3020   LearningRate 0.0217   Epoch: 10   Global Step: 443290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:32:26,349-Speed 2619.75 samples/sec   Loss 6.1649   LearningRate 0.0217   Epoch: 10   Global Step: 443300   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:30,277-Speed 2607.30 samples/sec   Loss 6.2611   LearningRate 0.0217   Epoch: 10   Global Step: 443310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:34,194-Speed 2614.20 samples/sec   Loss 6.2063   LearningRate 0.0217   Epoch: 10   Global Step: 443320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:38,095-Speed 2626.24 samples/sec   Loss 6.1554   LearningRate 0.0217   Epoch: 10   Global Step: 443330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:41,996-Speed 2625.74 samples/sec   Loss 6.2583   LearningRate 0.0217   Epoch: 10   Global Step: 443340   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:45,893-Speed 2628.45 samples/sec   Loss 6.2782   LearningRate 0.0217   Epoch: 10   Global Step: 443350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:49,844-Speed 2592.36 samples/sec   Loss 6.1190   LearningRate 0.0217   Epoch: 10   Global Step: 443360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:53,751-Speed 2622.01 samples/sec   Loss 6.3327   LearningRate 0.0217   Epoch: 10   Global Step: 443370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:32:57,691-Speed 2599.79 samples/sec   Loss 6.1038   LearningRate 0.0217   Epoch: 10   Global Step: 443380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:01,616-Speed 2609.80 samples/sec   Loss 6.2435   LearningRate 0.0217   Epoch: 10   Global Step: 443390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:05,528-Speed 2618.09 samples/sec   Loss 6.2428   LearningRate 0.0217   Epoch: 10   Global Step: 443400   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:33:09,424-Speed 2629.09 samples/sec   Loss 6.1294   LearningRate 0.0217   Epoch: 10   Global Step: 443410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:13,331-Speed 2622.01 samples/sec   Loss 6.1936   LearningRate 0.0217   Epoch: 10   Global Step: 443420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:17,380-Speed 2529.53 samples/sec   Loss 6.2707   LearningRate 0.0217   Epoch: 10   Global Step: 443430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:21,319-Speed 2600.40 samples/sec   Loss 6.1567   LearningRate 0.0217   Epoch: 10   Global Step: 443440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:25,233-Speed 2617.22 samples/sec   Loss 6.2302   LearningRate 0.0217   Epoch: 10   Global Step: 443450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:29,138-Speed 2623.33 samples/sec   Loss 6.1815   LearningRate 0.0217   Epoch: 10   Global Step: 443460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:33,059-Speed 2611.87 samples/sec   Loss 6.2042   LearningRate 0.0217   Epoch: 10   Global Step: 443470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:36,969-Speed 2619.28 samples/sec   Loss 6.1718   LearningRate 0.0217   Epoch: 10   Global Step: 443480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:40,911-Speed 2597.94 samples/sec   Loss 6.1819   LearningRate 0.0217   Epoch: 10   Global Step: 443490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:44,813-Speed 2625.16 samples/sec   Loss 6.1998   LearningRate 0.0217   Epoch: 10   Global Step: 443500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:33:48,716-Speed 2624.91 samples/sec   Loss 6.0901   LearningRate 0.0217   Epoch: 10   Global Step: 443510   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:33:52,587-Speed 2645.71 samples/sec   Loss 6.2134   LearningRate 0.0217   Epoch: 10   Global Step: 443520   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:33:56,491-Speed 2623.74 samples/sec   Loss 6.1575   LearningRate 0.0217   Epoch: 10   Global Step: 443530   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:00,405-Speed 2616.85 samples/sec   Loss 6.1207   LearningRate 0.0217   Epoch: 10   Global Step: 443540   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:04,310-Speed 2622.49 samples/sec   Loss 6.1592   LearningRate 0.0217   Epoch: 10   Global Step: 443550   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:08,216-Speed 2622.01 samples/sec   Loss 6.2549   LearningRate 0.0217   Epoch: 10   Global Step: 443560   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:12,122-Speed 2622.56 samples/sec   Loss 5.9988   LearningRate 0.0217   Epoch: 10   Global Step: 443570   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:16,026-Speed 2623.27 samples/sec   Loss 6.2390   LearningRate 0.0217   Epoch: 10   Global Step: 443580   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:20,002-Speed 2576.35 samples/sec   Loss 6.2414   LearningRate 0.0216   Epoch: 10   Global Step: 443590   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:23,965-Speed 2584.27 samples/sec   Loss 6.2581   LearningRate 0.0216   Epoch: 10   Global Step: 443600   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:27,867-Speed 2625.33 samples/sec   Loss 6.1707   LearningRate 0.0216   Epoch: 10   Global Step: 443610   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:34:31,775-Speed 2620.75 samples/sec   Loss 6.1285   LearningRate 0.0216   Epoch: 10   Global Step: 443620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:34:35,680-Speed 2622.68 samples/sec   Loss 6.1347   LearningRate 0.0216   Epoch: 10   Global Step: 443630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:34:39,583-Speed 2623.99 samples/sec   Loss 6.2805   LearningRate 0.0216   Epoch: 10   Global Step: 443640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:34:43,485-Speed 2625.86 samples/sec   Loss 6.2486   LearningRate 0.0216   Epoch: 10   Global Step: 443650   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:34:47,404-Speed 2613.33 samples/sec   Loss 6.1463   LearningRate 0.0216   Epoch: 10   Global Step: 443660   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:34:51,457-Speed 2527.05 samples/sec   Loss 6.3717   LearningRate 0.0216   Epoch: 10   Global Step: 443670   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:34:55,371-Speed 2617.14 samples/sec   Loss 6.2339   LearningRate 0.0216   Epoch: 10   Global Step: 443680   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:34:59,273-Speed 2625.43 samples/sec   Loss 6.2556   LearningRate 0.0216   Epoch: 10   Global Step: 443690   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:35:03,212-Speed 2599.98 samples/sec   Loss 6.1084   LearningRate 0.0216   Epoch: 10   Global Step: 443700   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:35:07,090-Speed 2641.25 samples/sec   Loss 6.0297   LearningRate 0.0216   Epoch: 10   Global Step: 443710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:10,990-Speed 2626.00 samples/sec   Loss 6.1183   LearningRate 0.0216   Epoch: 10   Global Step: 443720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:14,889-Speed 2626.73 samples/sec   Loss 6.2072   LearningRate 0.0216   Epoch: 10   Global Step: 443730   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:18,805-Speed 2616.20 samples/sec   Loss 6.2134   LearningRate 0.0216   Epoch: 10   Global Step: 443740   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:22,703-Speed 2627.37 samples/sec   Loss 6.1941   LearningRate 0.0216   Epoch: 10   Global Step: 443750   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:26,603-Speed 2626.83 samples/sec   Loss 6.2660   LearningRate 0.0216   Epoch: 10   Global Step: 443760   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:30,501-Speed 2627.64 samples/sec   Loss 6.1234   LearningRate 0.0216   Epoch: 10   Global Step: 443770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:34,398-Speed 2628.02 samples/sec   Loss 6.1876   LearningRate 0.0216   Epoch: 10   Global Step: 443780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:38,302-Speed 2623.77 samples/sec   Loss 6.1357   LearningRate 0.0216   Epoch: 10   Global Step: 443790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:42,200-Speed 2627.79 samples/sec   Loss 6.2326   LearningRate 0.0216   Epoch: 10   Global Step: 443800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:46,111-Speed 2618.75 samples/sec   Loss 6.2187   LearningRate 0.0216   Epoch: 10   Global Step: 443810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:35:50,026-Speed 2616.69 samples/sec   Loss 6.2239   LearningRate 0.0216   Epoch: 10   Global Step: 443820   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:35:53,967-Speed 2598.67 samples/sec   Loss 6.2676   LearningRate 0.0216   Epoch: 10   Global Step: 443830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:35:58,062-Speed 2501.94 samples/sec   Loss 6.1349   LearningRate 0.0216   Epoch: 10   Global Step: 443840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:01,962-Speed 2626.18 samples/sec   Loss 6.0763   LearningRate 0.0216   Epoch: 10   Global Step: 443850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:05,862-Speed 2625.93 samples/sec   Loss 6.0397   LearningRate 0.0216   Epoch: 10   Global Step: 443860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:09,762-Speed 2626.48 samples/sec   Loss 6.2248   LearningRate 0.0216   Epoch: 10   Global Step: 443870   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:13,670-Speed 2620.62 samples/sec   Loss 6.2397   LearningRate 0.0216   Epoch: 10   Global Step: 443880   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:17,572-Speed 2624.96 samples/sec   Loss 6.1135   LearningRate 0.0216   Epoch: 10   Global Step: 443890   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:21,484-Speed 2618.15 samples/sec   Loss 6.1284   LearningRate 0.0216   Epoch: 10   Global Step: 443900   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:25,394-Speed 2619.44 samples/sec   Loss 6.1813   LearningRate 0.0216   Epoch: 10   Global Step: 443910   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:29,335-Speed 2599.66 samples/sec   Loss 6.1610   LearningRate 0.0216   Epoch: 10   Global Step: 443920   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:36:33,293-Speed 2587.90 samples/sec   Loss 6.2188   LearningRate 0.0216   Epoch: 10   Global Step: 443930   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:36:37,196-Speed 2623.93 samples/sec   Loss 6.2222   LearningRate 0.0216   Epoch: 10   Global Step: 443940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:36:41,114-Speed 2614.14 samples/sec   Loss 6.1625   LearningRate 0.0216   Epoch: 10   Global Step: 443950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:36:45,087-Speed 2578.06 samples/sec   Loss 6.1089   LearningRate 0.0216   Epoch: 10   Global Step: 443960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:36:49,024-Speed 2601.89 samples/sec   Loss 6.1653   LearningRate 0.0216   Epoch: 10   Global Step: 443970   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:36:52,928-Speed 2623.96 samples/sec   Loss 6.2642   LearningRate 0.0216   Epoch: 10   Global Step: 443980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:36:56,834-Speed 2622.02 samples/sec   Loss 6.1206   LearningRate 0.0216   Epoch: 10   Global Step: 443990   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:00,752-Speed 2614.47 samples/sec   Loss 6.1512   LearningRate 0.0216   Epoch: 10   Global Step: 444000   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:04,723-Speed 2578.68 samples/sec   Loss 6.0974   LearningRate 0.0216   Epoch: 10   Global Step: 444010   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:08,630-Speed 2622.17 samples/sec   Loss 6.2652   LearningRate 0.0216   Epoch: 10   Global Step: 444020   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:12,548-Speed 2614.09 samples/sec   Loss 6.2765   LearningRate 0.0216   Epoch: 10   Global Step: 444030   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:16,448-Speed 2626.15 samples/sec   Loss 6.3965   LearningRate 0.0216   Epoch: 10   Global Step: 444040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:20,356-Speed 2621.20 samples/sec   Loss 6.1840   LearningRate 0.0216   Epoch: 10   Global Step: 444050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:24,262-Speed 2622.52 samples/sec   Loss 6.2730   LearningRate 0.0216   Epoch: 10   Global Step: 444060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:28,175-Speed 2617.26 samples/sec   Loss 6.1877   LearningRate 0.0216   Epoch: 10   Global Step: 444070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:32,093-Speed 2614.03 samples/sec   Loss 6.2499   LearningRate 0.0216   Epoch: 10   Global Step: 444080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:35,996-Speed 2623.91 samples/sec   Loss 6.1881   LearningRate 0.0216   Epoch: 10   Global Step: 444090   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:37:39,945-Speed 2593.99 samples/sec   Loss 6.1596   LearningRate 0.0216   Epoch: 10   Global Step: 444100   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:37:43,831-Speed 2636.27 samples/sec   Loss 6.2810   LearningRate 0.0216   Epoch: 10   Global Step: 444110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:47,762-Speed 2605.53 samples/sec   Loss 6.2282   LearningRate 0.0216   Epoch: 10   Global Step: 444120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:51,670-Speed 2621.12 samples/sec   Loss 6.1713   LearningRate 0.0216   Epoch: 10   Global Step: 444130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:55,577-Speed 2621.75 samples/sec   Loss 6.2775   LearningRate 0.0216   Epoch: 10   Global Step: 444140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:37:59,484-Speed 2621.62 samples/sec   Loss 6.2052   LearningRate 0.0216   Epoch: 10   Global Step: 444150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:03,504-Speed 2547.70 samples/sec   Loss 6.1945   LearningRate 0.0216   Epoch: 10   Global Step: 444160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:07,481-Speed 2575.30 samples/sec   Loss 6.1755   LearningRate 0.0216   Epoch: 10   Global Step: 444170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:11,596-Speed 2489.32 samples/sec   Loss 6.2827   LearningRate 0.0216   Epoch: 10   Global Step: 444180   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:15,504-Speed 2621.47 samples/sec   Loss 6.1781   LearningRate 0.0216   Epoch: 10   Global Step: 444190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:19,410-Speed 2622.35 samples/sec   Loss 6.2590   LearningRate 0.0216   Epoch: 10   Global Step: 444200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:23,302-Speed 2631.63 samples/sec   Loss 6.0587   LearningRate 0.0216   Epoch: 10   Global Step: 444210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:27,210-Speed 2620.79 samples/sec   Loss 6.2748   LearningRate 0.0216   Epoch: 10   Global Step: 444220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:31,129-Speed 2613.00 samples/sec   Loss 6.2406   LearningRate 0.0216   Epoch: 10   Global Step: 444230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:35,044-Speed 2616.61 samples/sec   Loss 6.2057   LearningRate 0.0216   Epoch: 10   Global Step: 444240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:38,947-Speed 2624.02 samples/sec   Loss 6.1972   LearningRate 0.0216   Epoch: 10   Global Step: 444250   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:42,890-Speed 2597.92 samples/sec   Loss 6.2737   LearningRate 0.0216   Epoch: 10   Global Step: 444260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:46,996-Speed 2494.47 samples/sec   Loss 6.1776   LearningRate 0.0216   Epoch: 10   Global Step: 444270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:51,069-Speed 2514.73 samples/sec   Loss 6.3341   LearningRate 0.0216   Epoch: 10   Global Step: 444280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:55,035-Speed 2582.81 samples/sec   Loss 6.3376   LearningRate 0.0216   Epoch: 10   Global Step: 444290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:38:58,935-Speed 2626.58 samples/sec   Loss 6.2608   LearningRate 0.0216   Epoch: 10   Global Step: 444300   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:39:02,835-Speed 2626.02 samples/sec   Loss 6.1664   LearningRate 0.0216   Epoch: 10   Global Step: 444310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:06,738-Speed 2624.34 samples/sec   Loss 6.1427   LearningRate 0.0216   Epoch: 10   Global Step: 444320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:10,650-Speed 2617.88 samples/sec   Loss 6.1191   LearningRate 0.0216   Epoch: 10   Global Step: 444330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:14,551-Speed 2626.43 samples/sec   Loss 6.1214   LearningRate 0.0216   Epoch: 10   Global Step: 444340   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:18,451-Speed 2626.42 samples/sec   Loss 6.1316   LearningRate 0.0216   Epoch: 10   Global Step: 444350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:22,353-Speed 2624.38 samples/sec   Loss 6.1151   LearningRate 0.0216   Epoch: 10   Global Step: 444360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:26,257-Speed 2623.91 samples/sec   Loss 6.2526   LearningRate 0.0216   Epoch: 10   Global Step: 444370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:30,158-Speed 2625.99 samples/sec   Loss 6.1480   LearningRate 0.0216   Epoch: 10   Global Step: 444380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:34,061-Speed 2623.91 samples/sec   Loss 6.2300   LearningRate 0.0216   Epoch: 10   Global Step: 444390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:37,968-Speed 2621.80 samples/sec   Loss 6.2183   LearningRate 0.0216   Epoch: 10   Global Step: 444400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:41,868-Speed 2626.52 samples/sec   Loss 6.2500   LearningRate 0.0216   Epoch: 10   Global Step: 444410   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:39:45,766-Speed 2627.34 samples/sec   Loss 6.1995   LearningRate 0.0216   Epoch: 10   Global Step: 444420   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:39:49,691-Speed 2609.78 samples/sec   Loss 6.0802   LearningRate 0.0216   Epoch: 10   Global Step: 444430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:53,590-Speed 2627.01 samples/sec   Loss 6.1771   LearningRate 0.0216   Epoch: 10   Global Step: 444440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:39:57,487-Speed 2628.77 samples/sec   Loss 6.2122   LearningRate 0.0216   Epoch: 10   Global Step: 444450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:40:01,362-Speed 2642.59 samples/sec   Loss 6.2879   LearningRate 0.0216   Epoch: 10   Global Step: 444460   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:05,265-Speed 2624.55 samples/sec   Loss 6.2589   LearningRate 0.0216   Epoch: 10   Global Step: 444470   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:09,178-Speed 2617.27 samples/sec   Loss 6.1461   LearningRate 0.0215   Epoch: 10   Global Step: 444480   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:13,082-Speed 2624.11 samples/sec   Loss 6.2286   LearningRate 0.0215   Epoch: 10   Global Step: 444490   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:16,994-Speed 2618.04 samples/sec   Loss 6.1780   LearningRate 0.0215   Epoch: 10   Global Step: 444500   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:20,894-Speed 2626.46 samples/sec   Loss 6.1744   LearningRate 0.0215   Epoch: 10   Global Step: 444510   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:24,799-Speed 2623.29 samples/sec   Loss 6.2660   LearningRate 0.0215   Epoch: 10   Global Step: 444520   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:28,705-Speed 2621.91 samples/sec   Loss 6.2576   LearningRate 0.0215   Epoch: 10   Global Step: 444530   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:32,612-Speed 2621.65 samples/sec   Loss 6.2915   LearningRate 0.0215   Epoch: 10   Global Step: 444540   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:36,533-Speed 2611.90 samples/sec   Loss 6.1329   LearningRate 0.0215   Epoch: 10   Global Step: 444550   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:40,444-Speed 2619.15 samples/sec   Loss 6.2283   LearningRate 0.0215   Epoch: 10   Global Step: 444560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:40:44,357-Speed 2618.40 samples/sec   Loss 6.1363   LearningRate 0.0215   Epoch: 10   Global Step: 444570   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:48,265-Speed 2620.69 samples/sec   Loss 6.1977   LearningRate 0.0215   Epoch: 10   Global Step: 444580   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:52,167-Speed 2625.32 samples/sec   Loss 6.1737   LearningRate 0.0215   Epoch: 10   Global Step: 444590   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:56,079-Speed 2617.87 samples/sec   Loss 6.2708   LearningRate 0.0215   Epoch: 10   Global Step: 444600   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:40:59,991-Speed 2619.04 samples/sec   Loss 6.1756   LearningRate 0.0215   Epoch: 10   Global Step: 444610   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:41:03,894-Speed 2623.78 samples/sec   Loss 6.1277   LearningRate 0.0215   Epoch: 10   Global Step: 444620   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:41:07,823-Speed 2606.61 samples/sec   Loss 6.1541   LearningRate 0.0215   Epoch: 10   Global Step: 444630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:41:11,729-Speed 2622.04 samples/sec   Loss 6.3054   LearningRate 0.0215   Epoch: 10   Global Step: 444640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:41:15,632-Speed 2624.28 samples/sec   Loss 6.1334   LearningRate 0.0215   Epoch: 10   Global Step: 444650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:41:19,531-Speed 2627.10 samples/sec   Loss 6.1757   LearningRate 0.0215   Epoch: 10   Global Step: 444660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:41:23,435-Speed 2623.64 samples/sec   Loss 6.1132   LearningRate 0.0215   Epoch: 10   Global Step: 444670   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:27,341-Speed 2622.43 samples/sec   Loss 6.2071   LearningRate 0.0215   Epoch: 10   Global Step: 444680   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:31,238-Speed 2628.60 samples/sec   Loss 6.2342   LearningRate 0.0215   Epoch: 10   Global Step: 444690   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:35,134-Speed 2628.36 samples/sec   Loss 6.2399   LearningRate 0.0215   Epoch: 10   Global Step: 444700   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:39,034-Speed 2626.51 samples/sec   Loss 6.1407   LearningRate 0.0215   Epoch: 10   Global Step: 444710   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:42,930-Speed 2628.56 samples/sec   Loss 6.1503   LearningRate 0.0215   Epoch: 10   Global Step: 444720   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:46,827-Speed 2628.55 samples/sec   Loss 6.1981   LearningRate 0.0215   Epoch: 10   Global Step: 444730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:50,725-Speed 2627.89 samples/sec   Loss 6.1966   LearningRate 0.0215   Epoch: 10   Global Step: 444740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:54,623-Speed 2627.98 samples/sec   Loss 6.2024   LearningRate 0.0215   Epoch: 10   Global Step: 444750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:41:58,529-Speed 2621.84 samples/sec   Loss 6.1531   LearningRate 0.0215   Epoch: 10   Global Step: 444760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:42:02,629-Speed 2498.90 samples/sec   Loss 6.2196   LearningRate 0.0215   Epoch: 10   Global Step: 444770   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:42:06,547-Speed 2614.16 samples/sec   Loss 6.1598   LearningRate 0.0215   Epoch: 10   Global Step: 444780   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:42:10,466-Speed 2613.55 samples/sec   Loss 6.2074   LearningRate 0.0215   Epoch: 10   Global Step: 444790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:14,360-Speed 2630.50 samples/sec   Loss 6.2175   LearningRate 0.0215   Epoch: 10   Global Step: 444800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:18,260-Speed 2626.77 samples/sec   Loss 6.2519   LearningRate 0.0215   Epoch: 10   Global Step: 444810   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:22,166-Speed 2621.64 samples/sec   Loss 6.2667   LearningRate 0.0215   Epoch: 10   Global Step: 444820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:26,067-Speed 2625.32 samples/sec   Loss 6.2488   LearningRate 0.0215   Epoch: 10   Global Step: 444830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:29,967-Speed 2627.08 samples/sec   Loss 6.1777   LearningRate 0.0215   Epoch: 10   Global Step: 444840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:33,916-Speed 2594.02 samples/sec   Loss 6.1872   LearningRate 0.0215   Epoch: 10   Global Step: 444850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:37,811-Speed 2629.30 samples/sec   Loss 6.1634   LearningRate 0.0215   Epoch: 10   Global Step: 444860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:41,721-Speed 2619.65 samples/sec   Loss 6.1557   LearningRate 0.0215   Epoch: 10   Global Step: 444870   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:45,641-Speed 2612.84 samples/sec   Loss 6.2923   LearningRate 0.0215   Epoch: 10   Global Step: 444880   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:42:49,535-Speed 2630.56 samples/sec   Loss 6.1419   LearningRate 0.0215   Epoch: 10   Global Step: 444890   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:42:53,441-Speed 2622.60 samples/sec   Loss 6.2459   LearningRate 0.0215   Epoch: 10   Global Step: 444900   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:42:57,317-Speed 2642.33 samples/sec   Loss 6.3260   LearningRate 0.0215   Epoch: 10   Global Step: 444910   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:43:01,216-Speed 2626.86 samples/sec   Loss 6.2122   LearningRate 0.0215   Epoch: 10   Global Step: 444920   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:43:05,120-Speed 2623.08 samples/sec   Loss 6.3010   LearningRate 0.0215   Epoch: 10   Global Step: 444930   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:43:09,079-Speed 2587.38 samples/sec   Loss 6.1853   LearningRate 0.0215   Epoch: 10   Global Step: 444940   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:43:13,172-Speed 2502.68 samples/sec   Loss 6.1244   LearningRate 0.0215   Epoch: 10   Global Step: 444950   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:43:17,110-Speed 2600.72 samples/sec   Loss 6.2879   LearningRate 0.0215   Epoch: 10   Global Step: 444960   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:21,004-Speed 2630.85 samples/sec   Loss 6.1935   LearningRate 0.0215   Epoch: 10   Global Step: 444970   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:24,904-Speed 2625.95 samples/sec   Loss 6.3598   LearningRate 0.0215   Epoch: 10   Global Step: 444980   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:28,812-Speed 2620.98 samples/sec   Loss 6.1939   LearningRate 0.0215   Epoch: 10   Global Step: 444990   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:32,710-Speed 2627.82 samples/sec   Loss 6.1865   LearningRate 0.0215   Epoch: 10   Global Step: 445000   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:36,617-Speed 2621.61 samples/sec   Loss 6.1984   LearningRate 0.0215   Epoch: 10   Global Step: 445010   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:40,514-Speed 2628.04 samples/sec   Loss 6.1984   LearningRate 0.0215   Epoch: 10   Global Step: 445020   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:44,411-Speed 2628.94 samples/sec   Loss 6.2517   LearningRate 0.0215   Epoch: 10   Global Step: 445030   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:48,311-Speed 2626.25 samples/sec   Loss 6.2154   LearningRate 0.0215   Epoch: 10   Global Step: 445040   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:52,206-Speed 2630.05 samples/sec   Loss 6.0990   LearningRate 0.0215   Epoch: 10   Global Step: 445050   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:43:56,127-Speed 2611.60 samples/sec   Loss 6.1454   LearningRate 0.0215   Epoch: 10   Global Step: 445060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:44:00,026-Speed 2627.46 samples/sec   Loss 6.2317   LearningRate 0.0215   Epoch: 10   Global Step: 445070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:44:03,924-Speed 2627.25 samples/sec   Loss 6.2336   LearningRate 0.0215   Epoch: 10   Global Step: 445080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:44:07,804-Speed 2640.05 samples/sec   Loss 6.2047   LearningRate 0.0215   Epoch: 10   Global Step: 445090   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:11,700-Speed 2629.03 samples/sec   Loss 6.1857   LearningRate 0.0215   Epoch: 10   Global Step: 445100   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:15,599-Speed 2627.00 samples/sec   Loss 6.1701   LearningRate 0.0215   Epoch: 10   Global Step: 445110   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:19,504-Speed 2623.47 samples/sec   Loss 6.1713   LearningRate 0.0215   Epoch: 10   Global Step: 445120   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:23,398-Speed 2629.87 samples/sec   Loss 6.1499   LearningRate 0.0215   Epoch: 10   Global Step: 445130   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:27,305-Speed 2621.97 samples/sec   Loss 6.1475   LearningRate 0.0215   Epoch: 10   Global Step: 445140   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:31,213-Speed 2620.41 samples/sec   Loss 6.1309   LearningRate 0.0215   Epoch: 10   Global Step: 445150   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:35,106-Speed 2631.52 samples/sec   Loss 6.1717   LearningRate 0.0215   Epoch: 10   Global Step: 445160   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:38,998-Speed 2631.26 samples/sec   Loss 6.1763   LearningRate 0.0215   Epoch: 10   Global Step: 445170   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:42,904-Speed 2622.85 samples/sec   Loss 6.1240   LearningRate 0.0215   Epoch: 10   Global Step: 445180   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-04-14 21:44:46,804-Speed 2625.77 samples/sec   Loss 6.3023   LearningRate 0.0215   Epoch: 10   Global Step: 445190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:44:50,712-Speed 2621.45 samples/sec   Loss 6.1623   LearningRate 0.0215   Epoch: 10   Global Step: 445200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:44:54,617-Speed 2622.62 samples/sec   Loss 6.2072   LearningRate 0.0215   Epoch: 10   Global Step: 445210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:44:58,541-Speed 2610.55 samples/sec   Loss 6.0940   LearningRate 0.0215   Epoch: 10   Global Step: 445220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:02,442-Speed 2625.84 samples/sec   Loss 6.2095   LearningRate 0.0215   Epoch: 10   Global Step: 445230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:06,338-Speed 2628.73 samples/sec   Loss 6.1290   LearningRate 0.0215   Epoch: 10   Global Step: 445240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:10,238-Speed 2626.20 samples/sec   Loss 6.2321   LearningRate 0.0215   Epoch: 10   Global Step: 445250   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:14,155-Speed 2615.30 samples/sec   Loss 6.2308   LearningRate 0.0215   Epoch: 10   Global Step: 445260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:18,064-Speed 2620.16 samples/sec   Loss 6.1790   LearningRate 0.0215   Epoch: 10   Global Step: 445270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:21,964-Speed 2626.72 samples/sec   Loss 6.1172   LearningRate 0.0215   Epoch: 10   Global Step: 445280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:25,869-Speed 2622.63 samples/sec   Loss 6.1877   LearningRate 0.0215   Epoch: 10   Global Step: 445290   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:45:29,769-Speed 2626.30 samples/sec   Loss 6.2610   LearningRate 0.0215   Epoch: 10   Global Step: 445300   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:45:33,647-Speed 2641.23 samples/sec   Loss 6.1569   LearningRate 0.0215   Epoch: 10   Global Step: 445310   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:37,571-Speed 2610.17 samples/sec   Loss 6.1981   LearningRate 0.0215   Epoch: 10   Global Step: 445320   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:41,471-Speed 2626.39 samples/sec   Loss 6.2043   LearningRate 0.0215   Epoch: 10   Global Step: 445330   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:45,369-Speed 2628.36 samples/sec   Loss 6.1436   LearningRate 0.0215   Epoch: 10   Global Step: 445340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:49,269-Speed 2626.23 samples/sec   Loss 6.1559   LearningRate 0.0215   Epoch: 10   Global Step: 445350   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:53,168-Speed 2626.96 samples/sec   Loss 6.1487   LearningRate 0.0215   Epoch: 10   Global Step: 445360   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:45:57,061-Speed 2630.50 samples/sec   Loss 6.2024   LearningRate 0.0214   Epoch: 10   Global Step: 445370   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:46:00,958-Speed 2628.55 samples/sec   Loss 6.1656   LearningRate 0.0214   Epoch: 10   Global Step: 445380   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:46:04,856-Speed 2627.50 samples/sec   Loss 6.1226   LearningRate 0.0214   Epoch: 10   Global Step: 445390   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:46:08,756-Speed 2626.13 samples/sec   Loss 6.2083   LearningRate 0.0214   Epoch: 10   Global Step: 445400   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:46:12,652-Speed 2629.38 samples/sec   Loss 6.1648   LearningRate 0.0214   Epoch: 10   Global Step: 445410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:16,551-Speed 2626.65 samples/sec   Loss 6.1429   LearningRate 0.0214   Epoch: 10   Global Step: 445420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:20,444-Speed 2631.28 samples/sec   Loss 6.0724   LearningRate 0.0214   Epoch: 10   Global Step: 445430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:24,392-Speed 2594.13 samples/sec   Loss 6.0471   LearningRate 0.0214   Epoch: 10   Global Step: 445440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:28,295-Speed 2624.61 samples/sec   Loss 6.2152   LearningRate 0.0214   Epoch: 10   Global Step: 445450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:32,194-Speed 2626.99 samples/sec   Loss 6.2364   LearningRate 0.0214   Epoch: 10   Global Step: 445460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:36,094-Speed 2626.09 samples/sec   Loss 6.1536   LearningRate 0.0214   Epoch: 10   Global Step: 445470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:39,989-Speed 2629.60 samples/sec   Loss 6.1739   LearningRate 0.0214   Epoch: 10   Global Step: 445480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:43,884-Speed 2629.56 samples/sec   Loss 6.1448   LearningRate 0.0214   Epoch: 10   Global Step: 445490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:47,784-Speed 2626.22 samples/sec   Loss 6.1497   LearningRate 0.0214   Epoch: 10   Global Step: 445500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:51,677-Speed 2631.23 samples/sec   Loss 6.1058   LearningRate 0.0214   Epoch: 10   Global Step: 445510   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:46:55,562-Speed 2636.15 samples/sec   Loss 6.1830   LearningRate 0.0214   Epoch: 10   Global Step: 445520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:46:59,464-Speed 2625.05 samples/sec   Loss 6.1534   LearningRate 0.0214   Epoch: 10   Global Step: 445530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:47:03,365-Speed 2626.20 samples/sec   Loss 6.1069   LearningRate 0.0214   Epoch: 10   Global Step: 445540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:47:07,273-Speed 2620.81 samples/sec   Loss 6.2280   LearningRate 0.0214   Epoch: 10   Global Step: 445550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:47:11,168-Speed 2628.80 samples/sec   Loss 6.2204   LearningRate 0.0214   Epoch: 10   Global Step: 445560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:47:15,065-Speed 2628.95 samples/sec   Loss 6.3008   LearningRate 0.0214   Epoch: 10   Global Step: 445570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:47:18,960-Speed 2629.89 samples/sec   Loss 6.2181   LearningRate 0.0214   Epoch: 10   Global Step: 445580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:47:22,880-Speed 2612.66 samples/sec   Loss 6.1665   LearningRate 0.0214   Epoch: 10   Global Step: 445590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:47:26,773-Speed 2630.91 samples/sec   Loss 6.1979   LearningRate 0.0214   Epoch: 10   Global Step: 445600   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:30,676-Speed 2624.77 samples/sec   Loss 6.1600   LearningRate 0.0214   Epoch: 10   Global Step: 445610   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:34,592-Speed 2615.05 samples/sec   Loss 6.1842   LearningRate 0.0214   Epoch: 10   Global Step: 445620   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:38,493-Speed 2625.64 samples/sec   Loss 6.2095   LearningRate 0.0214   Epoch: 10   Global Step: 445630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:42,605-Speed 2491.35 samples/sec   Loss 6.1070   LearningRate 0.0214   Epoch: 10   Global Step: 445640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:46,569-Speed 2583.96 samples/sec   Loss 6.2159   LearningRate 0.0214   Epoch: 10   Global Step: 445650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:50,465-Speed 2629.24 samples/sec   Loss 6.1919   LearningRate 0.0214   Epoch: 10   Global Step: 445660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:54,361-Speed 2628.74 samples/sec   Loss 6.1568   LearningRate 0.0214   Epoch: 10   Global Step: 445670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:47:58,283-Speed 2611.55 samples/sec   Loss 6.2037   LearningRate 0.0214   Epoch: 10   Global Step: 445680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:02,184-Speed 2625.67 samples/sec   Loss 6.1001   LearningRate 0.0214   Epoch: 10   Global Step: 445690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:06,109-Speed 2609.68 samples/sec   Loss 6.1698   LearningRate 0.0214   Epoch: 10   Global Step: 445700   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:48:10,001-Speed 2631.72 samples/sec   Loss 6.2408   LearningRate 0.0214   Epoch: 10   Global Step: 445710   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:48:13,898-Speed 2628.83 samples/sec   Loss 6.1505   LearningRate 0.0214   Epoch: 10   Global Step: 445720   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:48:17,807-Speed 2620.05 samples/sec   Loss 6.1083   LearningRate 0.0214   Epoch: 10   Global Step: 445730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:48:21,709-Speed 2624.80 samples/sec   Loss 6.3723   LearningRate 0.0214   Epoch: 10   Global Step: 445740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:48:25,591-Speed 2639.16 samples/sec   Loss 6.1664   LearningRate 0.0214   Epoch: 10   Global Step: 445750   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:29,511-Speed 2613.40 samples/sec   Loss 6.1269   LearningRate 0.0214   Epoch: 10   Global Step: 445760   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:33,414-Speed 2623.61 samples/sec   Loss 6.0922   LearningRate 0.0214   Epoch: 10   Global Step: 445770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:37,348-Speed 2603.44 samples/sec   Loss 6.1919   LearningRate 0.0214   Epoch: 10   Global Step: 445780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:41,249-Speed 2625.63 samples/sec   Loss 6.1148   LearningRate 0.0214   Epoch: 10   Global Step: 445790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:45,150-Speed 2626.10 samples/sec   Loss 6.1978   LearningRate 0.0214   Epoch: 10   Global Step: 445800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:49,047-Speed 2628.06 samples/sec   Loss 6.3134   LearningRate 0.0214   Epoch: 10   Global Step: 445810   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:52,945-Speed 2628.23 samples/sec   Loss 6.2115   LearningRate 0.0214   Epoch: 10   Global Step: 445820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:48:56,840-Speed 2629.59 samples/sec   Loss 6.1567   LearningRate 0.0214   Epoch: 10   Global Step: 445830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:49:00,742-Speed 2624.66 samples/sec   Loss 6.0891   LearningRate 0.0214   Epoch: 10   Global Step: 445840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:49:04,642-Speed 2626.16 samples/sec   Loss 6.1536   LearningRate 0.0214   Epoch: 10   Global Step: 445850   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:08,545-Speed 2624.66 samples/sec   Loss 6.2194   LearningRate 0.0214   Epoch: 10   Global Step: 445860   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:12,447-Speed 2625.10 samples/sec   Loss 6.2649   LearningRate 0.0214   Epoch: 10   Global Step: 445870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:16,343-Speed 2628.81 samples/sec   Loss 6.2311   LearningRate 0.0214   Epoch: 10   Global Step: 445880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:20,245-Speed 2624.71 samples/sec   Loss 6.2171   LearningRate 0.0214   Epoch: 10   Global Step: 445890   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:24,149-Speed 2623.33 samples/sec   Loss 6.2201   LearningRate 0.0214   Epoch: 10   Global Step: 445900   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:28,049-Speed 2626.94 samples/sec   Loss 6.2024   LearningRate 0.0214   Epoch: 10   Global Step: 445910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:31,946-Speed 2627.92 samples/sec   Loss 6.2359   LearningRate 0.0214   Epoch: 10   Global Step: 445920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:35,846-Speed 2626.48 samples/sec   Loss 6.2394   LearningRate 0.0214   Epoch: 10   Global Step: 445930   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:39,744-Speed 2627.05 samples/sec   Loss 6.1828   LearningRate 0.0214   Epoch: 10   Global Step: 445940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:43,639-Speed 2630.17 samples/sec   Loss 6.1534   LearningRate 0.0214   Epoch: 10   Global Step: 445950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:47,560-Speed 2612.67 samples/sec   Loss 6.1575   LearningRate 0.0214   Epoch: 10   Global Step: 445960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:49:51,449-Speed 2633.69 samples/sec   Loss 6.1866   LearningRate 0.0214   Epoch: 10   Global Step: 445970   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:49:55,346-Speed 2628.97 samples/sec   Loss 5.9872   LearningRate 0.0214   Epoch: 10   Global Step: 445980   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:49:59,243-Speed 2628.35 samples/sec   Loss 6.1829   LearningRate 0.0214   Epoch: 10   Global Step: 445990   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:03,189-Speed 2595.55 samples/sec   Loss 6.1556   LearningRate 0.0214   Epoch: 10   Global Step: 446000   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:07,137-Speed 2594.85 samples/sec   Loss 6.2052   LearningRate 0.0214   Epoch: 10   Global Step: 446010   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:11,037-Speed 2625.94 samples/sec   Loss 6.2777   LearningRate 0.0214   Epoch: 10   Global Step: 446020   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:14,932-Speed 2629.92 samples/sec   Loss 6.2424   LearningRate 0.0214   Epoch: 10   Global Step: 446030   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:18,833-Speed 2625.48 samples/sec   Loss 6.0999   LearningRate 0.0214   Epoch: 10   Global Step: 446040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:22,753-Speed 2613.59 samples/sec   Loss 6.0786   LearningRate 0.0214   Epoch: 10   Global Step: 446050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:26,651-Speed 2627.54 samples/sec   Loss 6.2076   LearningRate 0.0214   Epoch: 10   Global Step: 446060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:30,545-Speed 2630.61 samples/sec   Loss 6.1517   LearningRate 0.0214   Epoch: 10   Global Step: 446070   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:50:34,415-Speed 2645.96 samples/sec   Loss 6.2502   LearningRate 0.0214   Epoch: 10   Global Step: 446080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:38,310-Speed 2630.03 samples/sec   Loss 6.2981   LearningRate 0.0214   Epoch: 10   Global Step: 446090   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:42,203-Speed 2630.92 samples/sec   Loss 6.1278   LearningRate 0.0214   Epoch: 10   Global Step: 446100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:46,101-Speed 2627.83 samples/sec   Loss 6.2101   LearningRate 0.0214   Epoch: 10   Global Step: 446110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:50,014-Speed 2617.47 samples/sec   Loss 6.0871   LearningRate 0.0214   Epoch: 10   Global Step: 446120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:53,947-Speed 2604.08 samples/sec   Loss 6.2350   LearningRate 0.0214   Epoch: 10   Global Step: 446130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:50:57,846-Speed 2627.41 samples/sec   Loss 6.1836   LearningRate 0.0214   Epoch: 10   Global Step: 446140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:01,745-Speed 2627.20 samples/sec   Loss 6.1845   LearningRate 0.0214   Epoch: 10   Global Step: 446150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:05,655-Speed 2619.19 samples/sec   Loss 6.2144   LearningRate 0.0214   Epoch: 10   Global Step: 446160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:09,552-Speed 2628.36 samples/sec   Loss 6.2288   LearningRate 0.0214   Epoch: 10   Global Step: 446170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:13,440-Speed 2634.44 samples/sec   Loss 6.1459   LearningRate 0.0214   Epoch: 10   Global Step: 446180   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:17,361-Speed 2612.37 samples/sec   Loss 6.1586   LearningRate 0.0214   Epoch: 10   Global Step: 446190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:21,279-Speed 2614.18 samples/sec   Loss 6.1395   LearningRate 0.0214   Epoch: 10   Global Step: 446200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:25,177-Speed 2627.28 samples/sec   Loss 6.1712   LearningRate 0.0214   Epoch: 10   Global Step: 446210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:29,076-Speed 2627.71 samples/sec   Loss 6.0898   LearningRate 0.0214   Epoch: 10   Global Step: 446220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:32,971-Speed 2629.66 samples/sec   Loss 6.1631   LearningRate 0.0214   Epoch: 10   Global Step: 446230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:36,872-Speed 2625.54 samples/sec   Loss 6.2160   LearningRate 0.0214   Epoch: 10   Global Step: 446240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:40,772-Speed 2626.22 samples/sec   Loss 6.2993   LearningRate 0.0214   Epoch: 10   Global Step: 446250   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:44,683-Speed 2620.99 samples/sec   Loss 6.2238   LearningRate 0.0214   Epoch: 10   Global Step: 446260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:48,600-Speed 2615.06 samples/sec   Loss 6.2157   LearningRate 0.0213   Epoch: 10   Global Step: 446270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:51:52,490-Speed 2633.41 samples/sec   Loss 6.1616   LearningRate 0.0213   Epoch: 10   Global Step: 446280   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:51:56,386-Speed 2629.20 samples/sec   Loss 6.0972   LearningRate 0.0213   Epoch: 10   Global Step: 446290   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:00,285-Speed 2626.76 samples/sec   Loss 6.1876   LearningRate 0.0213   Epoch: 10   Global Step: 446300   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:04,197-Speed 2618.11 samples/sec   Loss 6.1331   LearningRate 0.0213   Epoch: 10   Global Step: 446310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:08,095-Speed 2627.33 samples/sec   Loss 6.1010   LearningRate 0.0213   Epoch: 10   Global Step: 446320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:11,990-Speed 2629.93 samples/sec   Loss 6.0161   LearningRate 0.0213   Epoch: 10   Global Step: 446330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:15,886-Speed 2628.96 samples/sec   Loss 6.1116   LearningRate 0.0213   Epoch: 10   Global Step: 446340   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:19,786-Speed 2626.79 samples/sec   Loss 6.1025   LearningRate 0.0213   Epoch: 10   Global Step: 446350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:23,680-Speed 2630.29 samples/sec   Loss 6.1865   LearningRate 0.0213   Epoch: 10   Global Step: 446360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:27,574-Speed 2630.43 samples/sec   Loss 6.1276   LearningRate 0.0213   Epoch: 10   Global Step: 446370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:31,470-Speed 2628.97 samples/sec   Loss 6.1694   LearningRate 0.0213   Epoch: 10   Global Step: 446380   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:52:35,354-Speed 2637.10 samples/sec   Loss 6.1528   LearningRate 0.0213   Epoch: 10   Global Step: 446390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:39,250-Speed 2628.86 samples/sec   Loss 6.1374   LearningRate 0.0213   Epoch: 10   Global Step: 446400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:43,149-Speed 2627.50 samples/sec   Loss 6.2802   LearningRate 0.0213   Epoch: 10   Global Step: 446410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:47,045-Speed 2628.71 samples/sec   Loss 6.0964   LearningRate 0.0213   Epoch: 10   Global Step: 446420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:50,941-Speed 2629.43 samples/sec   Loss 6.2471   LearningRate 0.0213   Epoch: 10   Global Step: 446430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:54,844-Speed 2623.78 samples/sec   Loss 6.2182   LearningRate 0.0213   Epoch: 10   Global Step: 446440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:52:58,746-Speed 2625.83 samples/sec   Loss 6.0779   LearningRate 0.0213   Epoch: 10   Global Step: 446450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:02,640-Speed 2630.02 samples/sec   Loss 6.1222   LearningRate 0.0213   Epoch: 10   Global Step: 446460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:06,535-Speed 2629.14 samples/sec   Loss 6.1199   LearningRate 0.0213   Epoch: 10   Global Step: 446470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:10,441-Speed 2622.73 samples/sec   Loss 6.1716   LearningRate 0.0213   Epoch: 10   Global Step: 446480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:14,323-Speed 2638.66 samples/sec   Loss 5.9964   LearningRate 0.0213   Epoch: 10   Global Step: 446490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:18,215-Speed 2631.52 samples/sec   Loss 6.2176   LearningRate 0.0213   Epoch: 10   Global Step: 446500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:22,161-Speed 2595.74 samples/sec   Loss 6.1089   LearningRate 0.0213   Epoch: 10   Global Step: 446510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:26,060-Speed 2627.47 samples/sec   Loss 6.0902   LearningRate 0.0213   Epoch: 10   Global Step: 446520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:29,990-Speed 2606.53 samples/sec   Loss 6.2556   LearningRate 0.0213   Epoch: 10   Global Step: 446530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:33,887-Speed 2628.08 samples/sec   Loss 6.0810   LearningRate 0.0213   Epoch: 10   Global Step: 446540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:37,790-Speed 2624.35 samples/sec   Loss 6.3218   LearningRate 0.0213   Epoch: 10   Global Step: 446550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:41,687-Speed 2628.20 samples/sec   Loss 6.3294   LearningRate 0.0213   Epoch: 10   Global Step: 446560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:45,582-Speed 2630.02 samples/sec   Loss 6.1230   LearningRate 0.0213   Epoch: 10   Global Step: 446570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:49,480-Speed 2627.45 samples/sec   Loss 6.1205   LearningRate 0.0213   Epoch: 10   Global Step: 446580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:53:53,390-Speed 2619.82 samples/sec   Loss 6.0927   LearningRate 0.0213   Epoch: 10   Global Step: 446590   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:53:57,276-Speed 2635.23 samples/sec   Loss 6.1415   LearningRate 0.0213   Epoch: 10   Global Step: 446600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:01,171-Speed 2630.09 samples/sec   Loss 6.1077   LearningRate 0.0213   Epoch: 10   Global Step: 446610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:05,063-Speed 2631.78 samples/sec   Loss 6.2628   LearningRate 0.0213   Epoch: 10   Global Step: 446620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:08,959-Speed 2628.80 samples/sec   Loss 6.1936   LearningRate 0.0213   Epoch: 10   Global Step: 446630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:12,856-Speed 2628.57 samples/sec   Loss 6.1455   LearningRate 0.0213   Epoch: 10   Global Step: 446640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:16,755-Speed 2626.57 samples/sec   Loss 6.2573   LearningRate 0.0213   Epoch: 10   Global Step: 446650   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:20,661-Speed 2622.22 samples/sec   Loss 6.1038   LearningRate 0.0213   Epoch: 10   Global Step: 446660   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:24,549-Speed 2634.04 samples/sec   Loss 6.1972   LearningRate 0.0213   Epoch: 10   Global Step: 446670   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:28,445-Speed 2629.17 samples/sec   Loss 6.1404   LearningRate 0.0213   Epoch: 10   Global Step: 446680   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:32,346-Speed 2625.53 samples/sec   Loss 6.2269   LearningRate 0.0213   Epoch: 10   Global Step: 446690   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:36,244-Speed 2627.92 samples/sec   Loss 6.1989   LearningRate 0.0213   Epoch: 10   Global Step: 446700   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:54:40,120-Speed 2642.71 samples/sec   Loss 6.2312   LearningRate 0.0213   Epoch: 10   Global Step: 446710   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:44,018-Speed 2627.37 samples/sec   Loss 6.0282   LearningRate 0.0213   Epoch: 10   Global Step: 446720   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:47,923-Speed 2623.38 samples/sec   Loss 6.1508   LearningRate 0.0213   Epoch: 10   Global Step: 446730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:51,824-Speed 2625.64 samples/sec   Loss 6.1395   LearningRate 0.0213   Epoch: 10   Global Step: 446740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:55,722-Speed 2627.78 samples/sec   Loss 6.1733   LearningRate 0.0213   Epoch: 10   Global Step: 446750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:54:59,619-Speed 2628.04 samples/sec   Loss 6.1318   LearningRate 0.0213   Epoch: 10   Global Step: 446760   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:03,529-Speed 2619.79 samples/sec   Loss 6.2256   LearningRate 0.0213   Epoch: 10   Global Step: 446770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:07,428-Speed 2627.16 samples/sec   Loss 6.1736   LearningRate 0.0213   Epoch: 10   Global Step: 446780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:11,328-Speed 2626.47 samples/sec   Loss 6.1418   LearningRate 0.0213   Epoch: 10   Global Step: 446790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:15,221-Speed 2630.97 samples/sec   Loss 6.1064   LearningRate 0.0213   Epoch: 10   Global Step: 446800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:19,116-Speed 2629.65 samples/sec   Loss 6.0722   LearningRate 0.0213   Epoch: 10   Global Step: 446810   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:23,022-Speed 2622.32 samples/sec   Loss 6.2172   LearningRate 0.0213   Epoch: 10   Global Step: 446820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:26,933-Speed 2618.92 samples/sec   Loss 6.1796   LearningRate 0.0213   Epoch: 10   Global Step: 446830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:30,963-Speed 2541.66 samples/sec   Loss 6.1150   LearningRate 0.0213   Epoch: 10   Global Step: 446840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:34,873-Speed 2619.79 samples/sec   Loss 6.1578   LearningRate 0.0213   Epoch: 10   Global Step: 446850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:38,770-Speed 2628.16 samples/sec   Loss 6.1696   LearningRate 0.0213   Epoch: 10   Global Step: 446860   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:55:42,781-Speed 2553.65 samples/sec   Loss 6.1849   LearningRate 0.0213   Epoch: 10   Global Step: 446870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:55:46,679-Speed 2627.73 samples/sec   Loss 6.2810   LearningRate 0.0213   Epoch: 10   Global Step: 446880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:55:50,565-Speed 2635.74 samples/sec   Loss 6.1965   LearningRate 0.0213   Epoch: 10   Global Step: 446890   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:54,462-Speed 2628.62 samples/sec   Loss 6.1927   LearningRate 0.0213   Epoch: 10   Global Step: 446900   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:55:58,358-Speed 2628.87 samples/sec   Loss 6.0909   LearningRate 0.0213   Epoch: 10   Global Step: 446910   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:02,261-Speed 2624.45 samples/sec   Loss 6.1538   LearningRate 0.0213   Epoch: 10   Global Step: 446920   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:06,181-Speed 2612.28 samples/sec   Loss 6.0958   LearningRate 0.0213   Epoch: 10   Global Step: 446930   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:10,124-Speed 2597.55 samples/sec   Loss 6.0255   LearningRate 0.0213   Epoch: 10   Global Step: 446940   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:14,085-Speed 2586.49 samples/sec   Loss 6.1464   LearningRate 0.0213   Epoch: 10   Global Step: 446950   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:17,982-Speed 2628.21 samples/sec   Loss 6.2030   LearningRate 0.0213   Epoch: 10   Global Step: 446960   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:21,885-Speed 2624.36 samples/sec   Loss 6.1878   LearningRate 0.0213   Epoch: 10   Global Step: 446970   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:25,790-Speed 2622.58 samples/sec   Loss 6.1187   LearningRate 0.0213   Epoch: 10   Global Step: 446980   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:56:29,698-Speed 2627.58 samples/sec   Loss 6.1457   LearningRate 0.0213   Epoch: 10   Global Step: 446990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:56:33,601-Speed 2624.68 samples/sec   Loss 6.1076   LearningRate 0.0213   Epoch: 10   Global Step: 447000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:56:37,496-Speed 2629.19 samples/sec   Loss 6.2448   LearningRate 0.0213   Epoch: 10   Global Step: 447010   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:56:41,388-Speed 2631.62 samples/sec   Loss 6.0282   LearningRate 0.0213   Epoch: 10   Global Step: 447020   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:56:45,290-Speed 2625.18 samples/sec   Loss 6.1864   LearningRate 0.0213   Epoch: 10   Global Step: 447030   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:56:49,199-Speed 2620.13 samples/sec   Loss 6.1514   LearningRate 0.0213   Epoch: 10   Global Step: 447040   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:56:53,096-Speed 2628.86 samples/sec   Loss 6.2301   LearningRate 0.0213   Epoch: 10   Global Step: 447050   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:56:56,968-Speed 2645.52 samples/sec   Loss 6.3153   LearningRate 0.0213   Epoch: 10   Global Step: 447060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:00,867-Speed 2626.75 samples/sec   Loss 6.1864   LearningRate 0.0213   Epoch: 10   Global Step: 447070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:04,771-Speed 2623.58 samples/sec   Loss 6.2375   LearningRate 0.0213   Epoch: 10   Global Step: 447080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:08,683-Speed 2618.08 samples/sec   Loss 6.1534   LearningRate 0.0213   Epoch: 10   Global Step: 447090   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:12,584-Speed 2625.14 samples/sec   Loss 6.1922   LearningRate 0.0213   Epoch: 10   Global Step: 447100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:16,478-Speed 2630.99 samples/sec   Loss 6.1491   LearningRate 0.0213   Epoch: 10   Global Step: 447110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:20,384-Speed 2622.44 samples/sec   Loss 6.2418   LearningRate 0.0213   Epoch: 10   Global Step: 447120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:24,305-Speed 2611.89 samples/sec   Loss 6.1160   LearningRate 0.0213   Epoch: 10   Global Step: 447130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:28,237-Speed 2606.16 samples/sec   Loss 6.1734   LearningRate 0.0213   Epoch: 10   Global Step: 447140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:32,140-Speed 2624.16 samples/sec   Loss 6.1308   LearningRate 0.0213   Epoch: 10   Global Step: 447150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:36,089-Speed 2593.40 samples/sec   Loss 6.2334   LearningRate 0.0213   Epoch: 10   Global Step: 447160   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:57:39,993-Speed 2623.70 samples/sec   Loss 6.1445   LearningRate 0.0212   Epoch: 10   Global Step: 447170   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:57:43,897-Speed 2623.55 samples/sec   Loss 6.2966   LearningRate 0.0212   Epoch: 10   Global Step: 447180   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:57:47,774-Speed 2641.75 samples/sec   Loss 6.2205   LearningRate 0.0212   Epoch: 10   Global Step: 447190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:51,670-Speed 2629.98 samples/sec   Loss 6.1008   LearningRate 0.0212   Epoch: 10   Global Step: 447200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:55,564-Speed 2629.67 samples/sec   Loss 6.0312   LearningRate 0.0212   Epoch: 10   Global Step: 447210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:57:59,477-Speed 2617.75 samples/sec   Loss 6.1871   LearningRate 0.0212   Epoch: 10   Global Step: 447220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:58:03,399-Speed 2611.41 samples/sec   Loss 6.0828   LearningRate 0.0212   Epoch: 10   Global Step: 447230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:58:07,297-Speed 2627.70 samples/sec   Loss 6.0962   LearningRate 0.0212   Epoch: 10   Global Step: 447240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:58:11,202-Speed 2622.93 samples/sec   Loss 6.1927   LearningRate 0.0212   Epoch: 10   Global Step: 447250   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:58:15,096-Speed 2630.60 samples/sec   Loss 6.0996   LearningRate 0.0212   Epoch: 10   Global Step: 447260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:58:18,992-Speed 2629.12 samples/sec   Loss 6.1789   LearningRate 0.0212   Epoch: 10   Global Step: 447270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:58:22,897-Speed 2622.63 samples/sec   Loss 6.2451   LearningRate 0.0212   Epoch: 10   Global Step: 447280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:58:26,797-Speed 2626.39 samples/sec   Loss 6.1890   LearningRate 0.0212   Epoch: 10   Global Step: 447290   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:30,692-Speed 2629.63 samples/sec   Loss 6.0990   LearningRate 0.0212   Epoch: 10   Global Step: 447300   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:34,588-Speed 2629.05 samples/sec   Loss 6.1252   LearningRate 0.0212   Epoch: 10   Global Step: 447310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:38,489-Speed 2625.50 samples/sec   Loss 5.9831   LearningRate 0.0212   Epoch: 10   Global Step: 447320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:42,394-Speed 2622.70 samples/sec   Loss 6.2779   LearningRate 0.0212   Epoch: 10   Global Step: 447330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:46,294-Speed 2626.51 samples/sec   Loss 6.1591   LearningRate 0.0212   Epoch: 10   Global Step: 447340   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:50,189-Speed 2629.43 samples/sec   Loss 6.0828   LearningRate 0.0212   Epoch: 10   Global Step: 447350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:54,087-Speed 2628.03 samples/sec   Loss 6.1326   LearningRate 0.0212   Epoch: 10   Global Step: 447360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:58:57,983-Speed 2629.37 samples/sec   Loss 6.2138   LearningRate 0.0212   Epoch: 10   Global Step: 447370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:59:01,969-Speed 2569.15 samples/sec   Loss 6.2053   LearningRate 0.0212   Epoch: 10   Global Step: 447380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:59:05,887-Speed 2614.49 samples/sec   Loss 6.1830   LearningRate 0.0212   Epoch: 10   Global Step: 447390   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 21:59:09,738-Speed 2659.56 samples/sec   Loss 6.1740   LearningRate 0.0212   Epoch: 10   Global Step: 447400   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:13,636-Speed 2627.88 samples/sec   Loss 6.1068   LearningRate 0.0212   Epoch: 10   Global Step: 447410   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:17,536-Speed 2625.81 samples/sec   Loss 6.1054   LearningRate 0.0212   Epoch: 10   Global Step: 447420   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:21,448-Speed 2618.80 samples/sec   Loss 6.0488   LearningRate 0.0212   Epoch: 10   Global Step: 447430   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:25,343-Speed 2630.14 samples/sec   Loss 6.1444   LearningRate 0.0212   Epoch: 10   Global Step: 447440   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:29,237-Speed 2629.94 samples/sec   Loss 6.1466   LearningRate 0.0212   Epoch: 10   Global Step: 447450   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:33,137-Speed 2626.48 samples/sec   Loss 6.1086   LearningRate 0.0212   Epoch: 10   Global Step: 447460   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:37,035-Speed 2627.77 samples/sec   Loss 6.2796   LearningRate 0.0212   Epoch: 10   Global Step: 447470   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:40,933-Speed 2627.09 samples/sec   Loss 6.2150   LearningRate 0.0212   Epoch: 10   Global Step: 447480   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:44,829-Speed 2629.36 samples/sec   Loss 6.3322   LearningRate 0.0212   Epoch: 10   Global Step: 447490   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 21:59:48,910-Speed 2509.71 samples/sec   Loss 6.1975   LearningRate 0.0212   Epoch: 10   Global Step: 447500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:59:52,958-Speed 2530.60 samples/sec   Loss 6.2265   LearningRate 0.0212   Epoch: 10   Global Step: 447510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 21:59:56,850-Speed 2631.79 samples/sec   Loss 6.1334   LearningRate 0.0212   Epoch: 10   Global Step: 447520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:00,754-Speed 2623.37 samples/sec   Loss 6.0342   LearningRate 0.0212   Epoch: 10   Global Step: 447530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:04,659-Speed 2622.68 samples/sec   Loss 6.0945   LearningRate 0.0212   Epoch: 10   Global Step: 447540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:08,552-Speed 2630.81 samples/sec   Loss 6.1565   LearningRate 0.0212   Epoch: 10   Global Step: 447550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:12,447-Speed 2629.59 samples/sec   Loss 6.0949   LearningRate 0.0212   Epoch: 10   Global Step: 447560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:16,342-Speed 2629.73 samples/sec   Loss 6.1889   LearningRate 0.0212   Epoch: 10   Global Step: 447570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:20,237-Speed 2629.70 samples/sec   Loss 6.1889   LearningRate 0.0212   Epoch: 10   Global Step: 447580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:24,149-Speed 2618.02 samples/sec   Loss 6.2200   LearningRate 0.0212   Epoch: 10   Global Step: 447590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:28,038-Speed 2633.63 samples/sec   Loss 6.0937   LearningRate 0.0212   Epoch: 10   Global Step: 447600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:31,932-Speed 2631.37 samples/sec   Loss 6.1814   LearningRate 0.0212   Epoch: 10   Global Step: 447610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:35,829-Speed 2629.11 samples/sec   Loss 6.1798   LearningRate 0.0212   Epoch: 10   Global Step: 447620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:00:39,713-Speed 2636.77 samples/sec   Loss 6.1377   LearningRate 0.0212   Epoch: 10   Global Step: 447630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:00:43,625-Speed 2618.50 samples/sec   Loss 6.1166   LearningRate 0.0212   Epoch: 10   Global Step: 447640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:00:47,525-Speed 2626.34 samples/sec   Loss 6.1289   LearningRate 0.0212   Epoch: 10   Global Step: 447650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:00:51,446-Speed 2612.32 samples/sec   Loss 6.3227   LearningRate 0.0212   Epoch: 10   Global Step: 447660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:00:55,351-Speed 2623.38 samples/sec   Loss 6.1674   LearningRate 0.0212   Epoch: 10   Global Step: 447670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:00:59,272-Speed 2612.16 samples/sec   Loss 6.1318   LearningRate 0.0212   Epoch: 10   Global Step: 447680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:03,172-Speed 2626.75 samples/sec   Loss 6.0981   LearningRate 0.0212   Epoch: 10   Global Step: 447690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:07,075-Speed 2623.60 samples/sec   Loss 6.2647   LearningRate 0.0212   Epoch: 10   Global Step: 447700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:10,972-Speed 2628.87 samples/sec   Loss 6.1506   LearningRate 0.0212   Epoch: 10   Global Step: 447710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:14,878-Speed 2622.05 samples/sec   Loss 6.2308   LearningRate 0.0212   Epoch: 10   Global Step: 447720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:18,779-Speed 2625.13 samples/sec   Loss 6.2178   LearningRate 0.0212   Epoch: 10   Global Step: 447730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:01:22,676-Speed 2629.04 samples/sec   Loss 6.2305   LearningRate 0.0212   Epoch: 10   Global Step: 447740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:01:26,571-Speed 2629.75 samples/sec   Loss 6.0364   LearningRate 0.0212   Epoch: 10   Global Step: 447750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:01:30,479-Speed 2621.02 samples/sec   Loss 6.1874   LearningRate 0.0212   Epoch: 10   Global Step: 447760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:01:34,360-Speed 2638.90 samples/sec   Loss 6.1182   LearningRate 0.0212   Epoch: 10   Global Step: 447770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:38,255-Speed 2629.53 samples/sec   Loss 6.2151   LearningRate 0.0212   Epoch: 10   Global Step: 447780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:42,169-Speed 2616.22 samples/sec   Loss 6.1676   LearningRate 0.0212   Epoch: 10   Global Step: 447790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:46,082-Speed 2618.04 samples/sec   Loss 6.1846   LearningRate 0.0212   Epoch: 10   Global Step: 447800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:49,980-Speed 2627.60 samples/sec   Loss 6.0476   LearningRate 0.0212   Epoch: 10   Global Step: 447810   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:53,897-Speed 2615.61 samples/sec   Loss 6.2153   LearningRate 0.0212   Epoch: 10   Global Step: 447820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:01:57,792-Speed 2629.52 samples/sec   Loss 6.0965   LearningRate 0.0212   Epoch: 10   Global Step: 447830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:02:01,738-Speed 2596.19 samples/sec   Loss 6.1621   LearningRate 0.0212   Epoch: 10   Global Step: 447840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:02:05,633-Speed 2629.54 samples/sec   Loss 6.1497   LearningRate 0.0212   Epoch: 10   Global Step: 447850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:02:09,568-Speed 2602.68 samples/sec   Loss 6.2454   LearningRate 0.0212   Epoch: 10   Global Step: 447860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:02:13,461-Speed 2631.16 samples/sec   Loss 6.1486   LearningRate 0.0212   Epoch: 10   Global Step: 447870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:17,360-Speed 2627.02 samples/sec   Loss 6.1799   LearningRate 0.0212   Epoch: 10   Global Step: 447880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:21,257-Speed 2628.56 samples/sec   Loss 6.1020   LearningRate 0.0212   Epoch: 10   Global Step: 447890   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:25,158-Speed 2625.00 samples/sec   Loss 6.1506   LearningRate 0.0212   Epoch: 10   Global Step: 447900   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:29,054-Speed 2630.05 samples/sec   Loss 6.2176   LearningRate 0.0212   Epoch: 10   Global Step: 447910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:32,950-Speed 2628.87 samples/sec   Loss 6.0663   LearningRate 0.0212   Epoch: 10   Global Step: 447920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:36,850-Speed 2625.58 samples/sec   Loss 6.1758   LearningRate 0.0212   Epoch: 10   Global Step: 447930   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:40,748-Speed 2627.99 samples/sec   Loss 6.0218   LearningRate 0.0212   Epoch: 10   Global Step: 447940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:44,651-Speed 2624.42 samples/sec   Loss 6.0969   LearningRate 0.0212   Epoch: 10   Global Step: 447950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:48,553-Speed 2624.80 samples/sec   Loss 6.1210   LearningRate 0.0212   Epoch: 10   Global Step: 447960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:02:52,464-Speed 2619.36 samples/sec   Loss 6.0838   LearningRate 0.0212   Epoch: 10   Global Step: 447970   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 22:02:56,359-Speed 2629.34 samples/sec   Loss 6.1267   LearningRate 0.0212   Epoch: 10   Global Step: 447980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:03:00,269-Speed 2619.70 samples/sec   Loss 6.2332   LearningRate 0.0212   Epoch: 10   Global Step: 447990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:03:04,199-Speed 2606.64 samples/sec   Loss 6.0617   LearningRate 0.0212   Epoch: 10   Global Step: 448000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:03:08,121-Speed 2610.86 samples/sec   Loss 6.1764   LearningRate 0.0212   Epoch: 10   Global Step: 448010   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:03:12,022-Speed 2626.22 samples/sec   Loss 6.1443   LearningRate 0.0212   Epoch: 10   Global Step: 448020   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:03:15,921-Speed 2627.12 samples/sec   Loss 6.1300   LearningRate 0.0212   Epoch: 10   Global Step: 448030   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:03:19,826-Speed 2622.94 samples/sec   Loss 6.1077   LearningRate 0.0212   Epoch: 10   Global Step: 448040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:23,729-Speed 2624.30 samples/sec   Loss 6.1565   LearningRate 0.0212   Epoch: 10   Global Step: 448050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:27,625-Speed 2628.62 samples/sec   Loss 6.1952   LearningRate 0.0212   Epoch: 10   Global Step: 448060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:31,616-Speed 2566.58 samples/sec   Loss 6.1345   LearningRate 0.0211   Epoch: 10   Global Step: 448070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:35,606-Speed 2567.05 samples/sec   Loss 6.1549   LearningRate 0.0211   Epoch: 10   Global Step: 448080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:39,523-Speed 2615.12 samples/sec   Loss 6.0354   LearningRate 0.0211   Epoch: 10   Global Step: 448090   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:43,428-Speed 2622.76 samples/sec   Loss 6.1940   LearningRate 0.0211   Epoch: 10   Global Step: 448100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:47,325-Speed 2628.56 samples/sec   Loss 6.2218   LearningRate 0.0211   Epoch: 10   Global Step: 448110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:51,219-Speed 2630.59 samples/sec   Loss 6.1310   LearningRate 0.0211   Epoch: 10   Global Step: 448120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:55,115-Speed 2628.99 samples/sec   Loss 6.1306   LearningRate 0.0211   Epoch: 10   Global Step: 448130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:03:59,017-Speed 2625.65 samples/sec   Loss 6.1490   LearningRate 0.0211   Epoch: 10   Global Step: 448140   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:02,935-Speed 2613.72 samples/sec   Loss 6.1884   LearningRate 0.0211   Epoch: 10   Global Step: 448150   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:06,840-Speed 2623.03 samples/sec   Loss 6.2318   LearningRate 0.0211   Epoch: 10   Global Step: 448160   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:10,740-Speed 2626.39 samples/sec   Loss 6.1406   LearningRate 0.0211   Epoch: 10   Global Step: 448170   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:14,640-Speed 2626.05 samples/sec   Loss 6.0738   LearningRate 0.0211   Epoch: 10   Global Step: 448180   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:18,643-Speed 2559.04 samples/sec   Loss 6.1540   LearningRate 0.0211   Epoch: 10   Global Step: 448190   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:22,540-Speed 2628.15 samples/sec   Loss 6.0157   LearningRate 0.0211   Epoch: 10   Global Step: 448200   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:26,443-Speed 2625.14 samples/sec   Loss 6.1330   LearningRate 0.0211   Epoch: 10   Global Step: 448210   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:30,338-Speed 2629.60 samples/sec   Loss 6.0949   LearningRate 0.0211   Epoch: 10   Global Step: 448220   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:34,237-Speed 2626.77 samples/sec   Loss 6.1839   LearningRate 0.0211   Epoch: 10   Global Step: 448230   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:38,113-Speed 2642.21 samples/sec   Loss 6.2369   LearningRate 0.0211   Epoch: 10   Global Step: 448240   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:42,015-Speed 2625.09 samples/sec   Loss 6.1106   LearningRate 0.0211   Epoch: 10   Global Step: 448250   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:45,915-Speed 2626.17 samples/sec   Loss 6.2154   LearningRate 0.0211   Epoch: 10   Global Step: 448260   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:49,811-Speed 2629.18 samples/sec   Loss 6.0911   LearningRate 0.0211   Epoch: 10   Global Step: 448270   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:53,703-Speed 2631.35 samples/sec   Loss 6.1506   LearningRate 0.0211   Epoch: 10   Global Step: 448280   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:04:57,597-Speed 2630.52 samples/sec   Loss 6.1593   LearningRate 0.0211   Epoch: 10   Global Step: 448290   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:01,492-Speed 2630.09 samples/sec   Loss 6.1858   LearningRate 0.0211   Epoch: 10   Global Step: 448300   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:05,383-Speed 2632.14 samples/sec   Loss 6.1578   LearningRate 0.0211   Epoch: 10   Global Step: 448310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:09,274-Speed 2632.32 samples/sec   Loss 6.1946   LearningRate 0.0211   Epoch: 10   Global Step: 448320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:13,169-Speed 2629.90 samples/sec   Loss 6.1663   LearningRate 0.0211   Epoch: 10   Global Step: 448330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:17,074-Speed 2622.31 samples/sec   Loss 6.1291   LearningRate 0.0211   Epoch: 10   Global Step: 448340   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 22:05:20,951-Speed 2642.28 samples/sec   Loss 6.1600   LearningRate 0.0211   Epoch: 10   Global Step: 448350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:24,850-Speed 2626.87 samples/sec   Loss 6.1204   LearningRate 0.0211   Epoch: 10   Global Step: 448360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:28,748-Speed 2628.53 samples/sec   Loss 6.1050   LearningRate 0.0211   Epoch: 10   Global Step: 448370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:32,643-Speed 2628.97 samples/sec   Loss 6.2325   LearningRate 0.0211   Epoch: 10   Global Step: 448380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:36,534-Speed 2632.73 samples/sec   Loss 5.9927   LearningRate 0.0211   Epoch: 10   Global Step: 448390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:40,426-Speed 2631.43 samples/sec   Loss 6.0963   LearningRate 0.0211   Epoch: 10   Global Step: 448400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:44,320-Speed 2630.07 samples/sec   Loss 6.2191   LearningRate 0.0211   Epoch: 10   Global Step: 448410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:48,213-Speed 2630.70 samples/sec   Loss 6.0894   LearningRate 0.0211   Epoch: 10   Global Step: 448420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:52,116-Speed 2624.96 samples/sec   Loss 6.1907   LearningRate 0.0211   Epoch: 10   Global Step: 448430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:56,016-Speed 2626.56 samples/sec   Loss 6.2026   LearningRate 0.0211   Epoch: 10   Global Step: 448440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:05:59,900-Speed 2636.92 samples/sec   Loss 6.2214   LearningRate 0.0211   Epoch: 10   Global Step: 448450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:06:03,802-Speed 2625.27 samples/sec   Loss 6.0636   LearningRate 0.0211   Epoch: 10   Global Step: 448460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:06:07,703-Speed 2625.38 samples/sec   Loss 6.1555   LearningRate 0.0211   Epoch: 10   Global Step: 448470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:06:11,601-Speed 2627.74 samples/sec   Loss 6.1747   LearningRate 0.0211   Epoch: 10   Global Step: 448480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:06:15,493-Speed 2631.47 samples/sec   Loss 6.0965   LearningRate 0.0211   Epoch: 10   Global Step: 448490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:06:19,362-Speed 2647.69 samples/sec   Loss 6.0821   LearningRate 0.0211   Epoch: 10   Global Step: 448500   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:23,253-Speed 2632.21 samples/sec   Loss 6.0803   LearningRate 0.0211   Epoch: 10   Global Step: 448510   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:27,145-Speed 2632.00 samples/sec   Loss 6.1537   LearningRate 0.0211   Epoch: 10   Global Step: 448520   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:31,044-Speed 2626.84 samples/sec   Loss 6.1995   LearningRate 0.0211   Epoch: 10   Global Step: 448530   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:34,942-Speed 2627.30 samples/sec   Loss 6.1048   LearningRate 0.0211   Epoch: 10   Global Step: 448540   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:38,840-Speed 2627.66 samples/sec   Loss 6.1372   LearningRate 0.0211   Epoch: 10   Global Step: 448550   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:42,734-Speed 2630.56 samples/sec   Loss 6.0801   LearningRate 0.0211   Epoch: 10   Global Step: 448560   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:46,628-Speed 2630.33 samples/sec   Loss 6.1840   LearningRate 0.0211   Epoch: 10   Global Step: 448570   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:50,522-Speed 2630.47 samples/sec   Loss 6.1730   LearningRate 0.0211   Epoch: 10   Global Step: 448580   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:54,423-Speed 2626.00 samples/sec   Loss 5.9872   LearningRate 0.0211   Epoch: 10   Global Step: 448590   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:06:58,317-Speed 2630.11 samples/sec   Loss 5.9542   LearningRate 0.0211   Epoch: 10   Global Step: 448600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:07:02,211-Speed 2630.19 samples/sec   Loss 6.2262   LearningRate 0.0211   Epoch: 10   Global Step: 448610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:07:06,126-Speed 2615.66 samples/sec   Loss 6.0939   LearningRate 0.0211   Epoch: 10   Global Step: 448620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:07:10,041-Speed 2616.83 samples/sec   Loss 6.1425   LearningRate 0.0211   Epoch: 10   Global Step: 448630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:07:13,934-Speed 2630.93 samples/sec   Loss 6.1531   LearningRate 0.0211   Epoch: 10   Global Step: 448640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:07:17,813-Speed 2640.73 samples/sec   Loss 6.1913   LearningRate 0.0211   Epoch: 10   Global Step: 448650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:21,721-Speed 2621.66 samples/sec   Loss 6.0586   LearningRate 0.0211   Epoch: 10   Global Step: 448660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:25,617-Speed 2628.86 samples/sec   Loss 6.1063   LearningRate 0.0211   Epoch: 10   Global Step: 448670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:29,513-Speed 2629.03 samples/sec   Loss 6.1356   LearningRate 0.0211   Epoch: 10   Global Step: 448680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:33,408-Speed 2629.49 samples/sec   Loss 6.1833   LearningRate 0.0211   Epoch: 10   Global Step: 448690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:37,310-Speed 2624.75 samples/sec   Loss 6.0051   LearningRate 0.0211   Epoch: 10   Global Step: 448700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:41,221-Speed 2618.76 samples/sec   Loss 6.2235   LearningRate 0.0211   Epoch: 10   Global Step: 448710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:45,117-Speed 2629.41 samples/sec   Loss 6.1948   LearningRate 0.0211   Epoch: 10   Global Step: 448720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:49,015-Speed 2627.57 samples/sec   Loss 6.0262   LearningRate 0.0211   Epoch: 10   Global Step: 448730   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:52,914-Speed 2627.69 samples/sec   Loss 6.0735   LearningRate 0.0211   Epoch: 10   Global Step: 448740   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:07:56,810-Speed 2628.76 samples/sec   Loss 6.1126   LearningRate 0.0211   Epoch: 10   Global Step: 448750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:08:00,713-Speed 2624.94 samples/sec   Loss 6.1988   LearningRate 0.0211   Epoch: 10   Global Step: 448760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:08:04,644-Speed 2604.90 samples/sec   Loss 6.0405   LearningRate 0.0211   Epoch: 10   Global Step: 448770   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:08:08,536-Speed 2632.01 samples/sec   Loss 6.2356   LearningRate 0.0211   Epoch: 10   Global Step: 448780   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:08:12,567-Speed 2540.63 samples/sec   Loss 6.1194   LearningRate 0.0211   Epoch: 10   Global Step: 448790   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:08:16,496-Speed 2607.49 samples/sec   Loss 6.1375   LearningRate 0.0211   Epoch: 10   Global Step: 448800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:08:20,392-Speed 2628.52 samples/sec   Loss 6.0875   LearningRate 0.0211   Epoch: 10   Global Step: 448810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:08:24,306-Speed 2617.03 samples/sec   Loss 6.1927   LearningRate 0.0211   Epoch: 10   Global Step: 448820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:28,200-Speed 2630.75 samples/sec   Loss 6.1326   LearningRate 0.0211   Epoch: 10   Global Step: 448830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:32,094-Speed 2630.30 samples/sec   Loss 5.9879   LearningRate 0.0211   Epoch: 10   Global Step: 448840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:35,989-Speed 2629.78 samples/sec   Loss 6.1702   LearningRate 0.0211   Epoch: 10   Global Step: 448850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:39,897-Speed 2620.87 samples/sec   Loss 6.1594   LearningRate 0.0211   Epoch: 10   Global Step: 448860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:43,818-Speed 2611.87 samples/sec   Loss 6.1199   LearningRate 0.0211   Epoch: 10   Global Step: 448870   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:47,725-Speed 2622.15 samples/sec   Loss 6.1331   LearningRate 0.0211   Epoch: 10   Global Step: 448880   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:51,625-Speed 2626.89 samples/sec   Loss 6.2526   LearningRate 0.0211   Epoch: 10   Global Step: 448890   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:55,517-Speed 2631.52 samples/sec   Loss 6.1082   LearningRate 0.0211   Epoch: 10   Global Step: 448900   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:08:59,412-Speed 2629.84 samples/sec   Loss 6.0667   LearningRate 0.0211   Epoch: 10   Global Step: 448910   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:09:03,309-Speed 2628.22 samples/sec   Loss 6.0617   LearningRate 0.0211   Epoch: 10   Global Step: 448920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:07,206-Speed 2628.48 samples/sec   Loss 6.0573   LearningRate 0.0211   Epoch: 10   Global Step: 448930   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:11,111-Speed 2622.17 samples/sec   Loss 6.0971   LearningRate 0.0211   Epoch: 10   Global Step: 448940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:15,005-Speed 2630.69 samples/sec   Loss 6.2355   LearningRate 0.0211   Epoch: 10   Global Step: 448950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:18,904-Speed 2626.91 samples/sec   Loss 6.0790   LearningRate 0.0211   Epoch: 10   Global Step: 448960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:22,806-Speed 2625.57 samples/sec   Loss 6.1573   LearningRate 0.0210   Epoch: 10   Global Step: 448970   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:26,697-Speed 2632.40 samples/sec   Loss 6.1713   LearningRate 0.0210   Epoch: 10   Global Step: 448980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:30,589-Speed 2631.23 samples/sec   Loss 6.0961   LearningRate 0.0210   Epoch: 10   Global Step: 448990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:34,486-Speed 2628.25 samples/sec   Loss 6.0464   LearningRate 0.0210   Epoch: 10   Global Step: 449000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:38,380-Speed 2630.29 samples/sec   Loss 6.1059   LearningRate 0.0210   Epoch: 10   Global Step: 449010   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:42,262-Speed 2638.01 samples/sec   Loss 6.1128   LearningRate 0.0210   Epoch: 10   Global Step: 449020   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:46,158-Speed 2629.49 samples/sec   Loss 6.3200   LearningRate 0.0210   Epoch: 10   Global Step: 449030   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:09:50,030-Speed 2645.38 samples/sec   Loss 6.0952   LearningRate 0.0210   Epoch: 10   Global Step: 449040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:09:53,927-Speed 2628.51 samples/sec   Loss 6.0845   LearningRate 0.0210   Epoch: 10   Global Step: 449050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:09:57,828-Speed 2625.85 samples/sec   Loss 6.2402   LearningRate 0.0210   Epoch: 10   Global Step: 449060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:01,726-Speed 2627.49 samples/sec   Loss 6.1080   LearningRate 0.0210   Epoch: 10   Global Step: 449070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:05,648-Speed 2611.23 samples/sec   Loss 6.1561   LearningRate 0.0210   Epoch: 10   Global Step: 449080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:09,545-Speed 2628.35 samples/sec   Loss 6.1148   LearningRate 0.0210   Epoch: 10   Global Step: 449090   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:13,440-Speed 2629.71 samples/sec   Loss 6.1250   LearningRate 0.0210   Epoch: 10   Global Step: 449100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:17,346-Speed 2622.32 samples/sec   Loss 6.0658   LearningRate 0.0210   Epoch: 10   Global Step: 449110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:21,246-Speed 2626.95 samples/sec   Loss 6.0742   LearningRate 0.0210   Epoch: 10   Global Step: 449120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:25,144-Speed 2627.37 samples/sec   Loss 6.1244   LearningRate 0.0210   Epoch: 10   Global Step: 449130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:29,045-Speed 2625.72 samples/sec   Loss 6.1215   LearningRate 0.0210   Epoch: 10   Global Step: 449140   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:10:32,919-Speed 2644.00 samples/sec   Loss 6.2065   LearningRate 0.0210   Epoch: 10   Global Step: 449150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:36,814-Speed 2629.68 samples/sec   Loss 6.1525   LearningRate 0.0210   Epoch: 10   Global Step: 449160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:40,719-Speed 2623.06 samples/sec   Loss 6.1514   LearningRate 0.0210   Epoch: 10   Global Step: 449170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:44,624-Speed 2622.54 samples/sec   Loss 6.1197   LearningRate 0.0210   Epoch: 10   Global Step: 449180   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:48,519-Speed 2629.45 samples/sec   Loss 6.1414   LearningRate 0.0210   Epoch: 10   Global Step: 449190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:52,416-Speed 2629.21 samples/sec   Loss 6.1065   LearningRate 0.0210   Epoch: 10   Global Step: 449200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:10:56,312-Speed 2628.64 samples/sec   Loss 6.0821   LearningRate 0.0210   Epoch: 10   Global Step: 449210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:00,214-Speed 2624.83 samples/sec   Loss 6.1457   LearningRate 0.0210   Epoch: 10   Global Step: 449220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:04,115-Speed 2625.66 samples/sec   Loss 6.1217   LearningRate 0.0210   Epoch: 10   Global Step: 449230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:08,031-Speed 2615.39 samples/sec   Loss 6.1424   LearningRate 0.0210   Epoch: 10   Global Step: 449240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:11,939-Speed 2620.96 samples/sec   Loss 6.2129   LearningRate 0.0210   Epoch: 10   Global Step: 449250   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:11:15,834-Speed 2629.24 samples/sec   Loss 6.1247   LearningRate 0.0210   Epoch: 10   Global Step: 449260   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:11:19,717-Speed 2638.34 samples/sec   Loss 6.2079   LearningRate 0.0210   Epoch: 10   Global Step: 449270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:23,618-Speed 2625.83 samples/sec   Loss 6.1577   LearningRate 0.0210   Epoch: 10   Global Step: 449280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:27,514-Speed 2629.01 samples/sec   Loss 6.1810   LearningRate 0.0210   Epoch: 10   Global Step: 449290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:31,418-Speed 2623.65 samples/sec   Loss 6.1713   LearningRate 0.0210   Epoch: 10   Global Step: 449300   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:35,315-Speed 2628.36 samples/sec   Loss 6.1953   LearningRate 0.0210   Epoch: 10   Global Step: 449310   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:39,209-Speed 2630.18 samples/sec   Loss 6.1049   LearningRate 0.0210   Epoch: 10   Global Step: 449320   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:43,109-Speed 2625.75 samples/sec   Loss 6.1678   LearningRate 0.0210   Epoch: 10   Global Step: 449330   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:47,002-Speed 2630.99 samples/sec   Loss 6.1363   LearningRate 0.0210   Epoch: 10   Global Step: 449340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:50,897-Speed 2630.02 samples/sec   Loss 6.0200   LearningRate 0.0210   Epoch: 10   Global Step: 449350   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:54,801-Speed 2623.42 samples/sec   Loss 6.1104   LearningRate 0.0210   Epoch: 10   Global Step: 449360   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:11:58,692-Speed 2632.65 samples/sec   Loss 6.1440   LearningRate 0.0210   Epoch: 10   Global Step: 449370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:02,583-Speed 2632.26 samples/sec   Loss 6.0984   LearningRate 0.0210   Epoch: 10   Global Step: 449380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:06,479-Speed 2629.20 samples/sec   Loss 6.1313   LearningRate 0.0210   Epoch: 10   Global Step: 449390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:10,381-Speed 2625.06 samples/sec   Loss 6.0431   LearningRate 0.0210   Epoch: 10   Global Step: 449400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:14,288-Speed 2621.01 samples/sec   Loss 6.0930   LearningRate 0.0210   Epoch: 10   Global Step: 449410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:18,214-Speed 2608.61 samples/sec   Loss 6.1559   LearningRate 0.0210   Epoch: 10   Global Step: 449420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:22,121-Speed 2622.15 samples/sec   Loss 6.0840   LearningRate 0.0210   Epoch: 10   Global Step: 449430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:26,031-Speed 2619.27 samples/sec   Loss 6.0967   LearningRate 0.0210   Epoch: 10   Global Step: 449440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:29,928-Speed 2628.29 samples/sec   Loss 6.0961   LearningRate 0.0210   Epoch: 10   Global Step: 449450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:33,824-Speed 2628.75 samples/sec   Loss 6.1385   LearningRate 0.0210   Epoch: 10   Global Step: 449460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:37,700-Speed 2643.06 samples/sec   Loss 6.1654   LearningRate 0.0210   Epoch: 10   Global Step: 449470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:41,603-Speed 2624.45 samples/sec   Loss 6.0077   LearningRate 0.0210   Epoch: 10   Global Step: 449480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:45,501-Speed 2627.16 samples/sec   Loss 6.1611   LearningRate 0.0210   Epoch: 10   Global Step: 449490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:49,396-Speed 2630.05 samples/sec   Loss 6.0446   LearningRate 0.0210   Epoch: 10   Global Step: 449500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:53,294-Speed 2627.14 samples/sec   Loss 6.1017   LearningRate 0.0210   Epoch: 10   Global Step: 449510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:12:57,193-Speed 2627.20 samples/sec   Loss 6.1312   LearningRate 0.0210   Epoch: 10   Global Step: 449520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:01,087-Speed 2629.95 samples/sec   Loss 6.1724   LearningRate 0.0210   Epoch: 10   Global Step: 449530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:05,041-Speed 2590.49 samples/sec   Loss 6.1155   LearningRate 0.0210   Epoch: 10   Global Step: 449540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:08,944-Speed 2624.28 samples/sec   Loss 6.0880   LearningRate 0.0210   Epoch: 10   Global Step: 449550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:12,844-Speed 2626.91 samples/sec   Loss 6.1014   LearningRate 0.0210   Epoch: 10   Global Step: 449560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:16,752-Speed 2620.59 samples/sec   Loss 6.1312   LearningRate 0.0210   Epoch: 10   Global Step: 449570   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 22:13:20,642-Speed 2633.33 samples/sec   Loss 6.0850   LearningRate 0.0210   Epoch: 10   Global Step: 449580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:24,549-Speed 2621.13 samples/sec   Loss 6.1252   LearningRate 0.0210   Epoch: 10   Global Step: 449590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:28,454-Speed 2623.12 samples/sec   Loss 6.1950   LearningRate 0.0210   Epoch: 10   Global Step: 449600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:32,352-Speed 2627.40 samples/sec   Loss 6.0518   LearningRate 0.0210   Epoch: 10   Global Step: 449610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:36,350-Speed 2562.55 samples/sec   Loss 6.1254   LearningRate 0.0210   Epoch: 10   Global Step: 449620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:40,448-Speed 2498.95 samples/sec   Loss 6.0595   LearningRate 0.0210   Epoch: 10   Global Step: 449630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:44,361-Speed 2617.76 samples/sec   Loss 6.0449   LearningRate 0.0210   Epoch: 10   Global Step: 449640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:48,265-Speed 2623.71 samples/sec   Loss 5.9505   LearningRate 0.0210   Epoch: 10   Global Step: 449650   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:52,176-Speed 2619.34 samples/sec   Loss 6.0297   LearningRate 0.0210   Epoch: 10   Global Step: 449660   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:13:56,042-Speed 2648.93 samples/sec   Loss 6.0552   LearningRate 0.0210   Epoch: 10   Global Step: 449670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:13:59,945-Speed 2624.56 samples/sec   Loss 6.1810   LearningRate 0.0210   Epoch: 10   Global Step: 449680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:03,861-Speed 2615.59 samples/sec   Loss 6.2948   LearningRate 0.0210   Epoch: 10   Global Step: 449690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:07,759-Speed 2628.01 samples/sec   Loss 6.1719   LearningRate 0.0210   Epoch: 10   Global Step: 449700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:11,666-Speed 2621.38 samples/sec   Loss 6.1439   LearningRate 0.0210   Epoch: 10   Global Step: 449710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:15,608-Speed 2598.64 samples/sec   Loss 6.0370   LearningRate 0.0210   Epoch: 10   Global Step: 449720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:19,508-Speed 2625.74 samples/sec   Loss 6.0434   LearningRate 0.0210   Epoch: 10   Global Step: 449730   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:23,420-Speed 2618.65 samples/sec   Loss 6.0708   LearningRate 0.0210   Epoch: 10   Global Step: 449740   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:27,316-Speed 2628.94 samples/sec   Loss 6.2001   LearningRate 0.0210   Epoch: 10   Global Step: 449750   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:31,208-Speed 2632.43 samples/sec   Loss 6.1280   LearningRate 0.0210   Epoch: 10   Global Step: 449760   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:14:35,101-Speed 2630.48 samples/sec   Loss 6.0792   LearningRate 0.0210   Epoch: 10   Global Step: 449770   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:14:39,022-Speed 2611.87 samples/sec   Loss 6.1865   LearningRate 0.0210   Epoch: 10   Global Step: 449780   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:14:42,917-Speed 2629.99 samples/sec   Loss 6.1839   LearningRate 0.0210   Epoch: 10   Global Step: 449790   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:14:46,811-Speed 2630.83 samples/sec   Loss 6.1760   LearningRate 0.0210   Epoch: 10   Global Step: 449800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:14:50,716-Speed 2622.75 samples/sec   Loss 6.0267   LearningRate 0.0210   Epoch: 10   Global Step: 449810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:14:54,610-Speed 2630.36 samples/sec   Loss 6.1250   LearningRate 0.0210   Epoch: 10   Global Step: 449820   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:14:58,528-Speed 2614.16 samples/sec   Loss 6.1210   LearningRate 0.0210   Epoch: 10   Global Step: 449830   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:15:02,402-Speed 2643.97 samples/sec   Loss 6.1797   LearningRate 0.0210   Epoch: 10   Global Step: 449840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:06,303-Speed 2625.87 samples/sec   Loss 6.1886   LearningRate 0.0210   Epoch: 10   Global Step: 449850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:10,196-Speed 2631.06 samples/sec   Loss 6.2616   LearningRate 0.0210   Epoch: 10   Global Step: 449860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:14,088-Speed 2631.25 samples/sec   Loss 6.1386   LearningRate 0.0210   Epoch: 10   Global Step: 449870   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:17,986-Speed 2627.80 samples/sec   Loss 6.1241   LearningRate 0.0209   Epoch: 10   Global Step: 449880   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:21,903-Speed 2615.28 samples/sec   Loss 6.0478   LearningRate 0.0209   Epoch: 10   Global Step: 449890   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:25,797-Speed 2630.49 samples/sec   Loss 6.1101   LearningRate 0.0209   Epoch: 10   Global Step: 449900   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:29,691-Speed 2630.43 samples/sec   Loss 6.1764   LearningRate 0.0209   Epoch: 10   Global Step: 449910   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:33,588-Speed 2627.94 samples/sec   Loss 6.0925   LearningRate 0.0209   Epoch: 10   Global Step: 449920   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:37,486-Speed 2627.81 samples/sec   Loss 6.0129   LearningRate 0.0209   Epoch: 10   Global Step: 449930   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:15:41,385-Speed 2626.60 samples/sec   Loss 6.0928   LearningRate 0.0209   Epoch: 10   Global Step: 449940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:15:45,288-Speed 2624.72 samples/sec   Loss 6.1307   LearningRate 0.0209   Epoch: 10   Global Step: 449950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:15:49,181-Speed 2631.17 samples/sec   Loss 5.9953   LearningRate 0.0209   Epoch: 10   Global Step: 449960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:15:53,076-Speed 2630.37 samples/sec   Loss 6.1911   LearningRate 0.0209   Epoch: 10   Global Step: 449970   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:15:57,001-Speed 2609.45 samples/sec   Loss 6.1274   LearningRate 0.0209   Epoch: 10   Global Step: 449980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:16:00,896-Speed 2629.72 samples/sec   Loss 6.1016   LearningRate 0.0209   Epoch: 10   Global Step: 449990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:16:04,797-Speed 2625.78 samples/sec   Loss 6.0232   LearningRate 0.0209   Epoch: 10   Global Step: 450000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:16:47,873-[lfw][450000]XNorm: 24.080963
Training: 2022-04-14 22:16:47,874-[lfw][450000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-14 22:16:47,874-[lfw][450000]Accuracy-Highest: 0.99783
Training: 2022-04-14 22:17:37,696-[cfp_fp][450000]XNorm: 22.422806
Training: 2022-04-14 22:17:37,697-[cfp_fp][450000]Accuracy-Flip: 0.98743+-0.00498
Training: 2022-04-14 22:17:37,698-[cfp_fp][450000]Accuracy-Highest: 0.98843
Training: 2022-04-14 22:18:20,934-[agedb_30][450000]XNorm: 24.076111
Training: 2022-04-14 22:18:20,935-[agedb_30][450000]Accuracy-Flip: 0.97817+-0.00669
Training: 2022-04-14 22:18:20,936-[agedb_30][450000]Accuracy-Highest: 0.97817
Training: 2022-04-14 22:18:24,828-Speed 73.13 samples/sec   Loss 6.1232   LearningRate 0.0209   Epoch: 10   Global Step: 450010   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:28,706-Speed 2641.20 samples/sec   Loss 6.1375   LearningRate 0.0209   Epoch: 10   Global Step: 450020   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:32,609-Speed 2624.28 samples/sec   Loss 6.1276   LearningRate 0.0209   Epoch: 10   Global Step: 450030   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:36,476-Speed 2649.23 samples/sec   Loss 6.0966   LearningRate 0.0209   Epoch: 10   Global Step: 450040   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:40,356-Speed 2639.52 samples/sec   Loss 6.1304   LearningRate 0.0209   Epoch: 10   Global Step: 450050   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:44,265-Speed 2620.25 samples/sec   Loss 6.0108   LearningRate 0.0209   Epoch: 10   Global Step: 450060   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:48,153-Speed 2634.86 samples/sec   Loss 6.1389   LearningRate 0.0209   Epoch: 10   Global Step: 450070   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:52,048-Speed 2629.22 samples/sec   Loss 6.1296   LearningRate 0.0209   Epoch: 10   Global Step: 450080   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:55,950-Speed 2625.40 samples/sec   Loss 6.0237   LearningRate 0.0209   Epoch: 10   Global Step: 450090   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:18:59,856-Speed 2622.22 samples/sec   Loss 6.2045   LearningRate 0.0209   Epoch: 10   Global Step: 450100   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:03,752-Speed 2629.50 samples/sec   Loss 5.9696   LearningRate 0.0209   Epoch: 10   Global Step: 450110   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:07,668-Speed 2616.10 samples/sec   Loss 6.0590   LearningRate 0.0209   Epoch: 10   Global Step: 450120   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:11,557-Speed 2633.26 samples/sec   Loss 6.0895   LearningRate 0.0209   Epoch: 10   Global Step: 450130   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:15,454-Speed 2628.15 samples/sec   Loss 6.1643   LearningRate 0.0209   Epoch: 10   Global Step: 450140   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-04-14 22:19:19,343-Speed 2633.59 samples/sec   Loss 6.0731   LearningRate 0.0209   Epoch: 10   Global Step: 450150   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:23,252-Speed 2620.92 samples/sec   Loss 6.0185   LearningRate 0.0209   Epoch: 10   Global Step: 450160   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:27,152-Speed 2625.88 samples/sec   Loss 6.1590   LearningRate 0.0209   Epoch: 10   Global Step: 450170   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:31,047-Speed 2629.71 samples/sec   Loss 6.0852   LearningRate 0.0209   Epoch: 10   Global Step: 450180   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:34,970-Speed 2611.15 samples/sec   Loss 6.1078   LearningRate 0.0209   Epoch: 10   Global Step: 450190   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:38,866-Speed 2629.20 samples/sec   Loss 6.0949   LearningRate 0.0209   Epoch: 10   Global Step: 450200   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:19:42,732-Speed 2649.23 samples/sec   Loss 6.0098   LearningRate 0.0209   Epoch: 10   Global Step: 450210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:19:46,633-Speed 2625.39 samples/sec   Loss 6.0777   LearningRate 0.0209   Epoch: 10   Global Step: 450220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:19:50,542-Speed 2620.41 samples/sec   Loss 6.1309   LearningRate 0.0209   Epoch: 10   Global Step: 450230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:19:54,461-Speed 2613.71 samples/sec   Loss 6.0673   LearningRate 0.0209   Epoch: 10   Global Step: 450240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:19:58,365-Speed 2624.62 samples/sec   Loss 6.1242   LearningRate 0.0209   Epoch: 10   Global Step: 450250   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:02,280-Speed 2616.44 samples/sec   Loss 6.1872   LearningRate 0.0209   Epoch: 10   Global Step: 450260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:06,204-Speed 2610.51 samples/sec   Loss 6.1132   LearningRate 0.0209   Epoch: 10   Global Step: 450270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:10,100-Speed 2628.29 samples/sec   Loss 6.1355   LearningRate 0.0209   Epoch: 10   Global Step: 450280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:13,996-Speed 2628.84 samples/sec   Loss 5.9458   LearningRate 0.0209   Epoch: 10   Global Step: 450290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:17,895-Speed 2626.92 samples/sec   Loss 6.0181   LearningRate 0.0209   Epoch: 10   Global Step: 450300   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:21,814-Speed 2613.73 samples/sec   Loss 6.0876   LearningRate 0.0209   Epoch: 10   Global Step: 450310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:20:25,709-Speed 2629.80 samples/sec   Loss 6.0749   LearningRate 0.0209   Epoch: 10   Global Step: 450320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:20:29,604-Speed 2629.86 samples/sec   Loss 5.9949   LearningRate 0.0209   Epoch: 10   Global Step: 450330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:20:33,479-Speed 2643.53 samples/sec   Loss 6.0896   LearningRate 0.0209   Epoch: 10   Global Step: 450340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:37,375-Speed 2628.62 samples/sec   Loss 6.0458   LearningRate 0.0209   Epoch: 10   Global Step: 450350   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:41,268-Speed 2630.74 samples/sec   Loss 6.0888   LearningRate 0.0209   Epoch: 10   Global Step: 450360   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:45,210-Speed 2598.69 samples/sec   Loss 6.0961   LearningRate 0.0209   Epoch: 10   Global Step: 450370   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:49,107-Speed 2628.22 samples/sec   Loss 6.0582   LearningRate 0.0209   Epoch: 10   Global Step: 450380   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:53,005-Speed 2628.07 samples/sec   Loss 6.2090   LearningRate 0.0209   Epoch: 10   Global Step: 450390   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:20:56,901-Speed 2628.64 samples/sec   Loss 6.1977   LearningRate 0.0209   Epoch: 10   Global Step: 450400   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:21:00,795-Speed 2630.66 samples/sec   Loss 6.0510   LearningRate 0.0209   Epoch: 10   Global Step: 450410   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:21:04,693-Speed 2627.54 samples/sec   Loss 6.0852   LearningRate 0.0209   Epoch: 10   Global Step: 450420   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:21:08,606-Speed 2617.72 samples/sec   Loss 6.2970   LearningRate 0.0209   Epoch: 10   Global Step: 450430   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:21:12,502-Speed 2628.42 samples/sec   Loss 6.0415   LearningRate 0.0209   Epoch: 10   Global Step: 450440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:16,405-Speed 2624.63 samples/sec   Loss 6.0668   LearningRate 0.0209   Epoch: 10   Global Step: 450450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:20,297-Speed 2631.96 samples/sec   Loss 6.1349   LearningRate 0.0209   Epoch: 10   Global Step: 450460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:24,197-Speed 2626.01 samples/sec   Loss 6.1039   LearningRate 0.0209   Epoch: 10   Global Step: 450470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:28,106-Speed 2620.30 samples/sec   Loss 5.9765   LearningRate 0.0209   Epoch: 10   Global Step: 450480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:32,002-Speed 2628.89 samples/sec   Loss 6.1628   LearningRate 0.0209   Epoch: 10   Global Step: 450490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:35,905-Speed 2623.89 samples/sec   Loss 6.1460   LearningRate 0.0209   Epoch: 10   Global Step: 450500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:39,803-Speed 2627.51 samples/sec   Loss 6.0318   LearningRate 0.0209   Epoch: 10   Global Step: 450510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:43,711-Speed 2621.22 samples/sec   Loss 6.1650   LearningRate 0.0209   Epoch: 10   Global Step: 450520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:47,602-Speed 2632.37 samples/sec   Loss 6.1242   LearningRate 0.0209   Epoch: 10   Global Step: 450530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:51,481-Speed 2640.90 samples/sec   Loss 6.1742   LearningRate 0.0209   Epoch: 10   Global Step: 450540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:21:55,362-Speed 2639.60 samples/sec   Loss 5.9646   LearningRate 0.0209   Epoch: 10   Global Step: 450550   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:21:59,255-Speed 2631.06 samples/sec   Loss 5.9563   LearningRate 0.0209   Epoch: 10   Global Step: 450560   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:03,151-Speed 2628.53 samples/sec   Loss 6.0138   LearningRate 0.0209   Epoch: 10   Global Step: 450570   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:07,099-Speed 2594.45 samples/sec   Loss 6.2093   LearningRate 0.0209   Epoch: 10   Global Step: 450580   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:11,003-Speed 2623.87 samples/sec   Loss 6.1080   LearningRate 0.0209   Epoch: 10   Global Step: 450590   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:14,910-Speed 2621.91 samples/sec   Loss 6.0760   LearningRate 0.0209   Epoch: 10   Global Step: 450600   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:18,806-Speed 2628.71 samples/sec   Loss 6.1836   LearningRate 0.0209   Epoch: 10   Global Step: 450610   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:22,709-Speed 2623.96 samples/sec   Loss 5.9944   LearningRate 0.0209   Epoch: 10   Global Step: 450620   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:26,615-Speed 2623.00 samples/sec   Loss 6.0552   LearningRate 0.0209   Epoch: 10   Global Step: 450630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:30,512-Speed 2628.35 samples/sec   Loss 6.2391   LearningRate 0.0209   Epoch: 10   Global Step: 450640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:34,409-Speed 2628.34 samples/sec   Loss 6.1307   LearningRate 0.0209   Epoch: 10   Global Step: 450650   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:22:38,309-Speed 2625.78 samples/sec   Loss 6.0997   LearningRate 0.0209   Epoch: 10   Global Step: 450660   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:22:42,192-Speed 2638.03 samples/sec   Loss 6.1186   LearningRate 0.0209   Epoch: 10   Global Step: 450670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:46,101-Speed 2619.91 samples/sec   Loss 6.1304   LearningRate 0.0209   Epoch: 10   Global Step: 450680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:50,010-Speed 2620.27 samples/sec   Loss 6.0714   LearningRate 0.0209   Epoch: 10   Global Step: 450690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:53,911-Speed 2625.75 samples/sec   Loss 6.1745   LearningRate 0.0209   Epoch: 10   Global Step: 450700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:22:57,820-Speed 2620.34 samples/sec   Loss 6.0704   LearningRate 0.0209   Epoch: 10   Global Step: 450710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:01,725-Speed 2623.20 samples/sec   Loss 6.1537   LearningRate 0.0209   Epoch: 10   Global Step: 450720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:05,703-Speed 2574.42 samples/sec   Loss 6.0615   LearningRate 0.0209   Epoch: 10   Global Step: 450730   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:09,803-Speed 2498.40 samples/sec   Loss 6.0351   LearningRate 0.0209   Epoch: 10   Global Step: 450740   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:13,889-Speed 2506.83 samples/sec   Loss 6.1132   LearningRate 0.0209   Epoch: 10   Global Step: 450750   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:17,979-Speed 2504.06 samples/sec   Loss 6.0437   LearningRate 0.0209   Epoch: 10   Global Step: 450760   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:21,994-Speed 2551.32 samples/sec   Loss 6.1393   LearningRate 0.0209   Epoch: 10   Global Step: 450770   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:23:25,869-Speed 2643.16 samples/sec   Loss 6.1230   LearningRate 0.0208   Epoch: 10   Global Step: 450780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:29,759-Speed 2633.22 samples/sec   Loss 6.0464   LearningRate 0.0208   Epoch: 10   Global Step: 450790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:33,655-Speed 2628.67 samples/sec   Loss 6.1381   LearningRate 0.0208   Epoch: 10   Global Step: 450800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:37,551-Speed 2629.30 samples/sec   Loss 6.0310   LearningRate 0.0208   Epoch: 10   Global Step: 450810   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:41,440-Speed 2633.57 samples/sec   Loss 6.0904   LearningRate 0.0208   Epoch: 10   Global Step: 450820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:45,333-Speed 2631.21 samples/sec   Loss 6.1498   LearningRate 0.0208   Epoch: 10   Global Step: 450830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:49,237-Speed 2623.28 samples/sec   Loss 6.2185   LearningRate 0.0208   Epoch: 10   Global Step: 450840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:53,140-Speed 2624.30 samples/sec   Loss 5.9734   LearningRate 0.0208   Epoch: 10   Global Step: 450850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:23:57,033-Speed 2631.11 samples/sec   Loss 6.1705   LearningRate 0.0208   Epoch: 10   Global Step: 450860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:00,929-Speed 2629.34 samples/sec   Loss 6.1410   LearningRate 0.0208   Epoch: 10   Global Step: 450870   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:04,821-Speed 2631.37 samples/sec   Loss 6.0611   LearningRate 0.0208   Epoch: 10   Global Step: 450880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:24:08,732-Speed 2618.79 samples/sec   Loss 5.9917   LearningRate 0.0208   Epoch: 10   Global Step: 450890   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:24:12,637-Speed 2623.08 samples/sec   Loss 6.1028   LearningRate 0.0208   Epoch: 10   Global Step: 450900   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:24:16,530-Speed 2630.93 samples/sec   Loss 6.0952   LearningRate 0.0208   Epoch: 10   Global Step: 450910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:24:20,429-Speed 2626.51 samples/sec   Loss 6.1412   LearningRate 0.0208   Epoch: 10   Global Step: 450920   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:24,329-Speed 2626.95 samples/sec   Loss 5.9490   LearningRate 0.0208   Epoch: 10   Global Step: 450930   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:28,222-Speed 2630.66 samples/sec   Loss 6.0620   LearningRate 0.0208   Epoch: 10   Global Step: 450940   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:32,129-Speed 2621.95 samples/sec   Loss 6.1228   LearningRate 0.0208   Epoch: 10   Global Step: 450950   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:36,023-Speed 2630.46 samples/sec   Loss 6.1120   LearningRate 0.0208   Epoch: 10   Global Step: 450960   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:39,918-Speed 2629.63 samples/sec   Loss 6.1070   LearningRate 0.0208   Epoch: 10   Global Step: 450970   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:43,826-Speed 2620.32 samples/sec   Loss 6.1451   LearningRate 0.0208   Epoch: 10   Global Step: 450980   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:47,721-Speed 2629.91 samples/sec   Loss 6.0900   LearningRate 0.0208   Epoch: 10   Global Step: 450990   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:51,624-Speed 2624.17 samples/sec   Loss 6.0767   LearningRate 0.0208   Epoch: 10   Global Step: 451000   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:55,517-Speed 2631.71 samples/sec   Loss 5.9874   LearningRate 0.0208   Epoch: 10   Global Step: 451010   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:24:59,412-Speed 2629.39 samples/sec   Loss 6.2664   LearningRate 0.0208   Epoch: 10   Global Step: 451020   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:03,308-Speed 2628.95 samples/sec   Loss 5.9423   LearningRate 0.0208   Epoch: 10   Global Step: 451030   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:07,213-Speed 2622.52 samples/sec   Loss 6.1269   LearningRate 0.0208   Epoch: 10   Global Step: 451040   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:11,117-Speed 2623.51 samples/sec   Loss 6.1047   LearningRate 0.0208   Epoch: 10   Global Step: 451050   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:15,026-Speed 2620.46 samples/sec   Loss 6.1095   LearningRate 0.0208   Epoch: 10   Global Step: 451060   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:18,924-Speed 2627.77 samples/sec   Loss 6.0113   LearningRate 0.0208   Epoch: 10   Global Step: 451070   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:22,830-Speed 2621.93 samples/sec   Loss 6.0828   LearningRate 0.0208   Epoch: 10   Global Step: 451080   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:26,728-Speed 2628.23 samples/sec   Loss 6.0895   LearningRate 0.0208   Epoch: 10   Global Step: 451090   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:30,626-Speed 2627.42 samples/sec   Loss 6.1004   LearningRate 0.0208   Epoch: 10   Global Step: 451100   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:34,522-Speed 2628.97 samples/sec   Loss 6.0578   LearningRate 0.0208   Epoch: 10   Global Step: 451110   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:38,419-Speed 2628.16 samples/sec   Loss 6.0538   LearningRate 0.0208   Epoch: 10   Global Step: 451120   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:25:42,295-Speed 2642.22 samples/sec   Loss 6.0238   LearningRate 0.0208   Epoch: 10   Global Step: 451130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:25:46,191-Speed 2628.92 samples/sec   Loss 6.2146   LearningRate 0.0208   Epoch: 10   Global Step: 451140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:25:50,086-Speed 2629.68 samples/sec   Loss 6.0443   LearningRate 0.0208   Epoch: 10   Global Step: 451150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:25:53,987-Speed 2626.08 samples/sec   Loss 6.0475   LearningRate 0.0208   Epoch: 10   Global Step: 451160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:25:57,892-Speed 2622.72 samples/sec   Loss 6.0911   LearningRate 0.0208   Epoch: 10   Global Step: 451170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:01,795-Speed 2624.88 samples/sec   Loss 6.0993   LearningRate 0.0208   Epoch: 10   Global Step: 451180   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:05,694-Speed 2626.28 samples/sec   Loss 6.0358   LearningRate 0.0208   Epoch: 10   Global Step: 451190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:09,587-Speed 2631.00 samples/sec   Loss 6.0916   LearningRate 0.0208   Epoch: 10   Global Step: 451200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:13,481-Speed 2630.18 samples/sec   Loss 5.9977   LearningRate 0.0208   Epoch: 10   Global Step: 451210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:17,380-Speed 2627.18 samples/sec   Loss 6.1272   LearningRate 0.0208   Epoch: 10   Global Step: 451220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:21,336-Speed 2589.10 samples/sec   Loss 6.0936   LearningRate 0.0208   Epoch: 10   Global Step: 451230   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:26:25,233-Speed 2628.67 samples/sec   Loss 6.1519   LearningRate 0.0208   Epoch: 10   Global Step: 451240   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:26:29,129-Speed 2629.34 samples/sec   Loss 6.0822   LearningRate 0.0208   Epoch: 10   Global Step: 451250   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:26:33,023-Speed 2630.38 samples/sec   Loss 5.9950   LearningRate 0.0208   Epoch: 10   Global Step: 451260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:36,922-Speed 2626.28 samples/sec   Loss 6.0807   LearningRate 0.0208   Epoch: 10   Global Step: 451270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:40,815-Speed 2631.10 samples/sec   Loss 6.0521   LearningRate 0.0208   Epoch: 10   Global Step: 451280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:44,714-Speed 2626.94 samples/sec   Loss 6.1481   LearningRate 0.0208   Epoch: 10   Global Step: 451290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:48,612-Speed 2628.05 samples/sec   Loss 6.1205   LearningRate 0.0208   Epoch: 10   Global Step: 451300   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:52,508-Speed 2629.18 samples/sec   Loss 6.1456   LearningRate 0.0208   Epoch: 10   Global Step: 451310   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:26:56,402-Speed 2630.50 samples/sec   Loss 6.0484   LearningRate 0.0208   Epoch: 10   Global Step: 451320   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:27:00,296-Speed 2630.18 samples/sec   Loss 6.1479   LearningRate 0.0208   Epoch: 10   Global Step: 451330   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:27:04,195-Speed 2626.37 samples/sec   Loss 6.1586   LearningRate 0.0208   Epoch: 10   Global Step: 451340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:27:08,095-Speed 2626.99 samples/sec   Loss 6.0146   LearningRate 0.0208   Epoch: 10   Global Step: 451350   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-04-14 22:27:11,991-Speed 2628.92 samples/sec   Loss 6.1862   LearningRate 0.0208   Epoch: 10   Global Step: 451360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:27:15,888-Speed 2628.46 samples/sec   Loss 6.1000   LearningRate 0.0208   Epoch: 10   Global Step: 451370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-04-14 22:27:19,778-Speed 2633.00 samples/sec   Loss 6.1872   LearningRate 0.0208   Epoch: 10   Global Step: 451380   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:23,687-Speed 2619.61 samples/sec   Loss 6.0026   LearningRate 0.0208   Epoch: 10   Global Step: 451390   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:27,581-Speed 2631.02 samples/sec   Loss 6.1495   LearningRate 0.0208   Epoch: 10   Global Step: 451400   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:31,494-Speed 2617.75 samples/sec   Loss 6.0650   LearningRate 0.0208   Epoch: 10   Global Step: 451410   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:35,395-Speed 2625.44 samples/sec   Loss 6.1702   LearningRate 0.0208   Epoch: 10   Global Step: 451420   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:39,290-Speed 2629.10 samples/sec   Loss 6.1061   LearningRate 0.0208   Epoch: 10   Global Step: 451430   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:43,196-Speed 2622.52 samples/sec   Loss 6.1711   LearningRate 0.0208   Epoch: 10   Global Step: 451440   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:47,095-Speed 2627.02 samples/sec   Loss 6.1079   LearningRate 0.0208   Epoch: 10   Global Step: 451450   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:50,994-Speed 2627.52 samples/sec   Loss 6.1710   LearningRate 0.0208   Epoch: 10   Global Step: 451460   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:54,890-Speed 2629.45 samples/sec   Loss 6.0641   LearningRate 0.0208   Epoch: 10   Global Step: 451470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:27:58,806-Speed 2615.54 samples/sec   Loss 6.0878   LearningRate 0.0208   Epoch: 10   Global Step: 451480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:28:02,711-Speed 2622.69 samples/sec   Loss 6.1427   LearningRate 0.0208   Epoch: 10   Global Step: 451490   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:28:06,613-Speed 2625.04 samples/sec   Loss 6.1945   LearningRate 0.0208   Epoch: 10   Global Step: 451500   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:28:10,554-Speed 2598.94 samples/sec   Loss 6.1098   LearningRate 0.0208   Epoch: 10   Global Step: 451510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:28:14,448-Speed 2630.51 samples/sec   Loss 6.1546   LearningRate 0.0208   Epoch: 10   Global Step: 451520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:28:18,330-Speed 2638.51 samples/sec   Loss 6.0991   LearningRate 0.0208   Epoch: 10   Global Step: 451530   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:22,230-Speed 2626.53 samples/sec   Loss 6.1257   LearningRate 0.0208   Epoch: 10   Global Step: 451540   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:26,155-Speed 2608.87 samples/sec   Loss 6.1001   LearningRate 0.0208   Epoch: 10   Global Step: 451550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:30,120-Speed 2583.99 samples/sec   Loss 6.0224   LearningRate 0.0208   Epoch: 10   Global Step: 451560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:34,013-Speed 2630.66 samples/sec   Loss 6.0725   LearningRate 0.0208   Epoch: 10   Global Step: 451570   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:37,921-Speed 2621.18 samples/sec   Loss 6.1436   LearningRate 0.0208   Epoch: 10   Global Step: 451580   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:41,813-Speed 2631.57 samples/sec   Loss 6.0670   LearningRate 0.0208   Epoch: 10   Global Step: 451590   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:45,708-Speed 2629.96 samples/sec   Loss 6.0412   LearningRate 0.0208   Epoch: 10   Global Step: 451600   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:49,625-Speed 2615.30 samples/sec   Loss 5.9650   LearningRate 0.0208   Epoch: 10   Global Step: 451610   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:53,527-Speed 2625.03 samples/sec   Loss 6.0302   LearningRate 0.0208   Epoch: 10   Global Step: 451620   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:28:57,424-Speed 2628.56 samples/sec   Loss 6.0570   LearningRate 0.0208   Epoch: 10   Global Step: 451630   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:01,329-Speed 2622.64 samples/sec   Loss 6.1357   LearningRate 0.0208   Epoch: 10   Global Step: 451640   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:05,225-Speed 2628.90 samples/sec   Loss 6.0656   LearningRate 0.0208   Epoch: 10   Global Step: 451650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:09,116-Speed 2632.33 samples/sec   Loss 5.9760   LearningRate 0.0208   Epoch: 10   Global Step: 451660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:13,007-Speed 2632.55 samples/sec   Loss 6.0925   LearningRate 0.0208   Epoch: 10   Global Step: 451670   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:16,911-Speed 2623.08 samples/sec   Loss 6.1055   LearningRate 0.0208   Epoch: 10   Global Step: 451680   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:20,804-Speed 2631.74 samples/sec   Loss 6.0831   LearningRate 0.0207   Epoch: 10   Global Step: 451690   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:24,703-Speed 2626.54 samples/sec   Loss 6.1294   LearningRate 0.0207   Epoch: 10   Global Step: 451700   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:28,635-Speed 2605.67 samples/sec   Loss 6.1047   LearningRate 0.0207   Epoch: 10   Global Step: 451710   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:32,528-Speed 2630.31 samples/sec   Loss 6.0330   LearningRate 0.0207   Epoch: 10   Global Step: 451720   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:36,399-Speed 2645.77 samples/sec   Loss 6.0685   LearningRate 0.0207   Epoch: 10   Global Step: 451730   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:40,296-Speed 2628.09 samples/sec   Loss 6.1512   LearningRate 0.0207   Epoch: 10   Global Step: 451740   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:44,192-Speed 2629.15 samples/sec   Loss 6.0049   LearningRate 0.0207   Epoch: 10   Global Step: 451750   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:48,089-Speed 2628.11 samples/sec   Loss 5.9852   LearningRate 0.0207   Epoch: 10   Global Step: 451760   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:51,994-Speed 2623.45 samples/sec   Loss 6.0268   LearningRate 0.0207   Epoch: 10   Global Step: 451770   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:55,894-Speed 2625.92 samples/sec   Loss 6.0763   LearningRate 0.0207   Epoch: 10   Global Step: 451780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:29:59,791-Speed 2629.10 samples/sec   Loss 6.0996   LearningRate 0.0207   Epoch: 10   Global Step: 451790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:30:03,687-Speed 2628.58 samples/sec   Loss 5.9392   LearningRate 0.0207   Epoch: 10   Global Step: 451800   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:30:07,598-Speed 2618.96 samples/sec   Loss 6.1094   LearningRate 0.0207   Epoch: 10   Global Step: 451810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:30:11,491-Speed 2630.71 samples/sec   Loss 6.0119   LearningRate 0.0207   Epoch: 10   Global Step: 451820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:30:15,346-Speed 2656.72 samples/sec   Loss 6.0620   LearningRate 0.0207   Epoch: 10   Global Step: 451830   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:19,247-Speed 2626.45 samples/sec   Loss 6.0614   LearningRate 0.0207   Epoch: 10   Global Step: 451840   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:23,141-Speed 2629.95 samples/sec   Loss 6.0282   LearningRate 0.0207   Epoch: 10   Global Step: 451850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:27,039-Speed 2628.47 samples/sec   Loss 6.0657   LearningRate 0.0207   Epoch: 10   Global Step: 451860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:31,125-Speed 2506.85 samples/sec   Loss 6.0601   LearningRate 0.0207   Epoch: 10   Global Step: 451870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:35,194-Speed 2517.71 samples/sec   Loss 6.1748   LearningRate 0.0207   Epoch: 10   Global Step: 451880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:39,088-Speed 2629.59 samples/sec   Loss 6.1031   LearningRate 0.0207   Epoch: 10   Global Step: 451890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:42,982-Speed 2630.35 samples/sec   Loss 6.0660   LearningRate 0.0207   Epoch: 10   Global Step: 451900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:46,876-Speed 2630.03 samples/sec   Loss 6.1195   LearningRate 0.0207   Epoch: 10   Global Step: 451910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:50,777-Speed 2626.17 samples/sec   Loss 6.1476   LearningRate 0.0207   Epoch: 10   Global Step: 451920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:30:54,668-Speed 2632.58 samples/sec   Loss 6.1348   LearningRate 0.0207   Epoch: 10   Global Step: 451930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:30:58,561-Speed 2631.29 samples/sec   Loss 6.0626   LearningRate 0.0207   Epoch: 10   Global Step: 451940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:31:02,460-Speed 2627.32 samples/sec   Loss 5.9509   LearningRate 0.0207   Epoch: 10   Global Step: 451950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:31:06,353-Speed 2631.16 samples/sec   Loss 6.0474   LearningRate 0.0207   Epoch: 10   Global Step: 451960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:31:10,255-Speed 2624.52 samples/sec   Loss 6.1015   LearningRate 0.0207   Epoch: 10   Global Step: 451970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:31:14,152-Speed 2627.90 samples/sec   Loss 6.1033   LearningRate 0.0207   Epoch: 10   Global Step: 451980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:31:18,047-Speed 2630.36 samples/sec   Loss 6.0497   LearningRate 0.0207   Epoch: 10   Global Step: 451990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:31:21,918-Speed 2646.04 samples/sec   Loss 6.0064   LearningRate 0.0207   Epoch: 10   Global Step: 452000   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:25,822-Speed 2623.92 samples/sec   Loss 5.9862   LearningRate 0.0207   Epoch: 10   Global Step: 452010   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:29,726-Speed 2623.35 samples/sec   Loss 6.1307   LearningRate 0.0207   Epoch: 10   Global Step: 452020   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:33,628-Speed 2624.63 samples/sec   Loss 6.0045   LearningRate 0.0207   Epoch: 10   Global Step: 452030   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:37,524-Speed 2629.27 samples/sec   Loss 6.1028   LearningRate 0.0207   Epoch: 10   Global Step: 452040   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:41,442-Speed 2613.90 samples/sec   Loss 5.9924   LearningRate 0.0207   Epoch: 10   Global Step: 452050   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:45,343-Speed 2625.56 samples/sec   Loss 6.1531   LearningRate 0.0207   Epoch: 10   Global Step: 452060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:49,242-Speed 2626.84 samples/sec   Loss 6.0185   LearningRate 0.0207   Epoch: 10   Global Step: 452070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:53,142-Speed 2626.01 samples/sec   Loss 6.2047   LearningRate 0.0207   Epoch: 10   Global Step: 452080   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:31:57,035-Speed 2631.42 samples/sec   Loss 6.0374   LearningRate 0.0207   Epoch: 10   Global Step: 452090   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:32:00,927-Speed 2631.91 samples/sec   Loss 5.9557   LearningRate 0.0207   Epoch: 10   Global Step: 452100   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:04,821-Speed 2630.13 samples/sec   Loss 6.1196   LearningRate 0.0207   Epoch: 10   Global Step: 452110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:08,724-Speed 2624.50 samples/sec   Loss 6.1340   LearningRate 0.0207   Epoch: 10   Global Step: 452120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:12,617-Speed 2630.52 samples/sec   Loss 6.0594   LearningRate 0.0207   Epoch: 10   Global Step: 452130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:16,516-Speed 2627.01 samples/sec   Loss 6.0921   LearningRate 0.0207   Epoch: 10   Global Step: 452140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:20,522-Speed 2556.38 samples/sec   Loss 6.0823   LearningRate 0.0207   Epoch: 10   Global Step: 452150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:24,418-Speed 2629.73 samples/sec   Loss 6.0862   LearningRate 0.0207   Epoch: 10   Global Step: 452160   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:28,316-Speed 2627.17 samples/sec   Loss 6.1431   LearningRate 0.0207   Epoch: 10   Global Step: 452170   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:32,212-Speed 2628.75 samples/sec   Loss 6.0604   LearningRate 0.0207   Epoch: 10   Global Step: 452180   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:32:36,083-Speed 2646.14 samples/sec   Loss 6.1851   LearningRate 0.0207   Epoch: 10   Global Step: 452190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:32:39,979-Speed 2629.54 samples/sec   Loss 6.0540   LearningRate 0.0207   Epoch: 10   Global Step: 452200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:32:43,877-Speed 2627.22 samples/sec   Loss 6.0669   LearningRate 0.0207   Epoch: 10   Global Step: 452210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:32:47,771-Speed 2630.58 samples/sec   Loss 6.0616   LearningRate 0.0207   Epoch: 10   Global Step: 452220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:32:51,672-Speed 2625.42 samples/sec   Loss 6.0861   LearningRate 0.0207   Epoch: 10   Global Step: 452230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:32:55,575-Speed 2624.63 samples/sec   Loss 6.0701   LearningRate 0.0207   Epoch: 10   Global Step: 452240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:32:59,478-Speed 2624.08 samples/sec   Loss 6.1803   LearningRate 0.0207   Epoch: 10   Global Step: 452250   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:03,385-Speed 2621.30 samples/sec   Loss 6.0574   LearningRate 0.0207   Epoch: 10   Global Step: 452260   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:07,281-Speed 2628.81 samples/sec   Loss 5.9940   LearningRate 0.0207   Epoch: 10   Global Step: 452270   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:11,183-Speed 2625.35 samples/sec   Loss 6.1547   LearningRate 0.0207   Epoch: 10   Global Step: 452280   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:15,078-Speed 2629.64 samples/sec   Loss 6.0713   LearningRate 0.0207   Epoch: 10   Global Step: 452290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:33:18,979-Speed 2625.40 samples/sec   Loss 6.0430   LearningRate 0.0207   Epoch: 10   Global Step: 452300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:33:22,885-Speed 2623.45 samples/sec   Loss 6.1196   LearningRate 0.0207   Epoch: 10   Global Step: 452310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:33:26,787-Speed 2624.50 samples/sec   Loss 6.0951   LearningRate 0.0207   Epoch: 10   Global Step: 452320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:33:30,688-Speed 2625.40 samples/sec   Loss 5.9745   LearningRate 0.0207   Epoch: 10   Global Step: 452330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:33:34,601-Speed 2617.50 samples/sec   Loss 5.9964   LearningRate 0.0207   Epoch: 10   Global Step: 452340   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:33:38,488-Speed 2635.31 samples/sec   Loss 6.0801   LearningRate 0.0207   Epoch: 10   Global Step: 452350   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:42,384-Speed 2628.42 samples/sec   Loss 6.1396   LearningRate 0.0207   Epoch: 10   Global Step: 452360   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:46,280-Speed 2629.63 samples/sec   Loss 6.0756   LearningRate 0.0207   Epoch: 10   Global Step: 452370   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:50,189-Speed 2621.03 samples/sec   Loss 6.0458   LearningRate 0.0207   Epoch: 10   Global Step: 452380   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:54,086-Speed 2628.07 samples/sec   Loss 5.9992   LearningRate 0.0207   Epoch: 10   Global Step: 452390   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:33:58,006-Speed 2613.61 samples/sec   Loss 6.0003   LearningRate 0.0207   Epoch: 10   Global Step: 452400   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:01,902-Speed 2628.35 samples/sec   Loss 6.0671   LearningRate 0.0207   Epoch: 10   Global Step: 452410   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:05,797-Speed 2629.82 samples/sec   Loss 6.0143   LearningRate 0.0207   Epoch: 10   Global Step: 452420   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:09,704-Speed 2621.39 samples/sec   Loss 5.9651   LearningRate 0.0207   Epoch: 10   Global Step: 452430   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:13,597-Speed 2631.15 samples/sec   Loss 6.1847   LearningRate 0.0207   Epoch: 10   Global Step: 452440   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:17,494-Speed 2628.00 samples/sec   Loss 6.1602   LearningRate 0.0207   Epoch: 10   Global Step: 452450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:34:21,393-Speed 2627.73 samples/sec   Loss 6.0950   LearningRate 0.0207   Epoch: 10   Global Step: 452460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:34:25,276-Speed 2638.00 samples/sec   Loss 6.1084   LearningRate 0.0207   Epoch: 10   Global Step: 452470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:29,224-Speed 2594.81 samples/sec   Loss 6.0420   LearningRate 0.0207   Epoch: 10   Global Step: 452480   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:33,131-Speed 2621.30 samples/sec   Loss 6.0573   LearningRate 0.0207   Epoch: 10   Global Step: 452490   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:37,024-Speed 2630.79 samples/sec   Loss 6.2141   LearningRate 0.0207   Epoch: 10   Global Step: 452500   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:40,926-Speed 2624.80 samples/sec   Loss 6.0143   LearningRate 0.0207   Epoch: 10   Global Step: 452510   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:44,821-Speed 2629.47 samples/sec   Loss 5.9001   LearningRate 0.0207   Epoch: 10   Global Step: 452520   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:48,719-Speed 2627.91 samples/sec   Loss 5.9946   LearningRate 0.0207   Epoch: 10   Global Step: 452530   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:52,621-Speed 2624.59 samples/sec   Loss 6.0182   LearningRate 0.0207   Epoch: 10   Global Step: 452540   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:34:56,574-Speed 2590.95 samples/sec   Loss 6.0681   LearningRate 0.0207   Epoch: 10   Global Step: 452550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:00,661-Speed 2506.32 samples/sec   Loss 6.0187   LearningRate 0.0207   Epoch: 10   Global Step: 452560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:04,679-Speed 2549.23 samples/sec   Loss 6.0521   LearningRate 0.0207   Epoch: 10   Global Step: 452570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:35:08,555-Speed 2642.43 samples/sec   Loss 6.0558   LearningRate 0.0207   Epoch: 10   Global Step: 452580   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:12,461-Speed 2622.11 samples/sec   Loss 6.1193   LearningRate 0.0207   Epoch: 10   Global Step: 452590   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:16,355-Speed 2630.55 samples/sec   Loss 6.2588   LearningRate 0.0207   Epoch: 10   Global Step: 452600   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:20,260-Speed 2623.39 samples/sec   Loss 6.0409   LearningRate 0.0206   Epoch: 10   Global Step: 452610   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:24,158-Speed 2627.40 samples/sec   Loss 6.0553   LearningRate 0.0206   Epoch: 10   Global Step: 452620   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:28,053-Speed 2629.38 samples/sec   Loss 6.1535   LearningRate 0.0206   Epoch: 10   Global Step: 452630   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:31,954-Speed 2625.82 samples/sec   Loss 6.0241   LearningRate 0.0206   Epoch: 10   Global Step: 452640   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:35,858-Speed 2623.12 samples/sec   Loss 6.1096   LearningRate 0.0206   Epoch: 10   Global Step: 452650   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:39,771-Speed 2618.16 samples/sec   Loss 6.1922   LearningRate 0.0206   Epoch: 10   Global Step: 452660   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:43,671-Speed 2626.65 samples/sec   Loss 6.0720   LearningRate 0.0206   Epoch: 10   Global Step: 452670   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:47,566-Speed 2629.37 samples/sec   Loss 6.0091   LearningRate 0.0206   Epoch: 10   Global Step: 452680   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:35:51,451-Speed 2636.69 samples/sec   Loss 6.0892   LearningRate 0.0206   Epoch: 10   Global Step: 452690   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:35:55,341-Speed 2632.63 samples/sec   Loss 6.0961   LearningRate 0.0206   Epoch: 10   Global Step: 452700   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:35:59,322-Speed 2573.02 samples/sec   Loss 5.9631   LearningRate 0.0206   Epoch: 10   Global Step: 452710   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:03,222-Speed 2626.61 samples/sec   Loss 6.1407   LearningRate 0.0206   Epoch: 10   Global Step: 452720   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:07,120-Speed 2627.43 samples/sec   Loss 6.0408   LearningRate 0.0206   Epoch: 10   Global Step: 452730   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:11,021-Speed 2625.83 samples/sec   Loss 6.0799   LearningRate 0.0206   Epoch: 10   Global Step: 452740   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:14,921-Speed 2626.03 samples/sec   Loss 6.0389   LearningRate 0.0206   Epoch: 10   Global Step: 452750   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:18,830-Speed 2620.26 samples/sec   Loss 6.2048   LearningRate 0.0206   Epoch: 10   Global Step: 452760   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:22,723-Speed 2630.83 samples/sec   Loss 6.0784   LearningRate 0.0206   Epoch: 10   Global Step: 452770   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:26,627-Speed 2624.05 samples/sec   Loss 6.0707   LearningRate 0.0206   Epoch: 10   Global Step: 452780   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:30,522-Speed 2629.13 samples/sec   Loss 6.0012   LearningRate 0.0206   Epoch: 10   Global Step: 452790   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:36:34,428-Speed 2622.71 samples/sec   Loss 5.9394   LearningRate 0.0206   Epoch: 10   Global Step: 452800   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:36:38,327-Speed 2626.98 samples/sec   Loss 6.0811   LearningRate 0.0206   Epoch: 10   Global Step: 452810   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:36:42,250-Speed 2610.87 samples/sec   Loss 6.0260   LearningRate 0.0206   Epoch: 10   Global Step: 452820   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:36:46,150-Speed 2625.67 samples/sec   Loss 6.0579   LearningRate 0.0206   Epoch: 10   Global Step: 452830   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:36:50,064-Speed 2617.05 samples/sec   Loss 6.1417   LearningRate 0.0206   Epoch: 10   Global Step: 452840   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:36:53,965-Speed 2625.64 samples/sec   Loss 6.0708   LearningRate 0.0206   Epoch: 10   Global Step: 452850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:36:57,866-Speed 2625.59 samples/sec   Loss 6.0319   LearningRate 0.0206   Epoch: 10   Global Step: 452860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:01,762-Speed 2629.09 samples/sec   Loss 6.1588   LearningRate 0.0206   Epoch: 10   Global Step: 452870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:05,743-Speed 2573.19 samples/sec   Loss 6.0549   LearningRate 0.0206   Epoch: 10   Global Step: 452880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:09,648-Speed 2623.01 samples/sec   Loss 6.1477   LearningRate 0.0206   Epoch: 10   Global Step: 452890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:13,543-Speed 2629.26 samples/sec   Loss 6.1556   LearningRate 0.0206   Epoch: 10   Global Step: 452900   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:37:17,447-Speed 2623.57 samples/sec   Loss 5.9963   LearningRate 0.0206   Epoch: 10   Global Step: 452910   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:37:21,346-Speed 2627.41 samples/sec   Loss 5.9948   LearningRate 0.0206   Epoch: 10   Global Step: 452920   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:37:25,473-Speed 2482.03 samples/sec   Loss 6.1307   LearningRate 0.0206   Epoch: 10   Global Step: 452930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:37:29,432-Speed 2586.87 samples/sec   Loss 6.1029   LearningRate 0.0206   Epoch: 10   Global Step: 452940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:37:33,325-Speed 2631.15 samples/sec   Loss 5.9885   LearningRate 0.0206   Epoch: 10   Global Step: 452950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:37:37,223-Speed 2627.46 samples/sec   Loss 6.0468   LearningRate 0.0206   Epoch: 10   Global Step: 452960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:37:41,102-Speed 2640.68 samples/sec   Loss 6.1209   LearningRate 0.0206   Epoch: 10   Global Step: 452970   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:44,995-Speed 2630.60 samples/sec   Loss 5.9749   LearningRate 0.0206   Epoch: 10   Global Step: 452980   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:48,893-Speed 2628.11 samples/sec   Loss 6.0674   LearningRate 0.0206   Epoch: 10   Global Step: 452990   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:52,791-Speed 2627.26 samples/sec   Loss 5.9722   LearningRate 0.0206   Epoch: 10   Global Step: 453000   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:37:56,693-Speed 2625.20 samples/sec   Loss 6.0324   LearningRate 0.0206   Epoch: 10   Global Step: 453010   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:38:00,606-Speed 2617.58 samples/sec   Loss 6.0540   LearningRate 0.0206   Epoch: 10   Global Step: 453020   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:38:04,508-Speed 2624.93 samples/sec   Loss 6.1343   LearningRate 0.0206   Epoch: 10   Global Step: 453030   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:38:08,403-Speed 2629.33 samples/sec   Loss 6.0490   LearningRate 0.0206   Epoch: 10   Global Step: 453040   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:38:12,302-Speed 2627.00 samples/sec   Loss 6.0338   LearningRate 0.0206   Epoch: 10   Global Step: 453050   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:38:16,203-Speed 2625.28 samples/sec   Loss 5.9461   LearningRate 0.0206   Epoch: 10   Global Step: 453060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:38:20,102-Speed 2627.45 samples/sec   Loss 5.9503   LearningRate 0.0206   Epoch: 10   Global Step: 453070   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:24,005-Speed 2624.18 samples/sec   Loss 6.0360   LearningRate 0.0206   Epoch: 10   Global Step: 453080   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:27,920-Speed 2615.70 samples/sec   Loss 6.0919   LearningRate 0.0206   Epoch: 10   Global Step: 453090   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:31,818-Speed 2627.89 samples/sec   Loss 6.0663   LearningRate 0.0206   Epoch: 10   Global Step: 453100   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:35,717-Speed 2626.83 samples/sec   Loss 6.1668   LearningRate 0.0206   Epoch: 10   Global Step: 453110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:39,606-Speed 2633.71 samples/sec   Loss 6.1080   LearningRate 0.0206   Epoch: 10   Global Step: 453120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:43,505-Speed 2626.72 samples/sec   Loss 6.1259   LearningRate 0.0206   Epoch: 10   Global Step: 453130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:47,400-Speed 2629.76 samples/sec   Loss 5.9317   LearningRate 0.0206   Epoch: 10   Global Step: 453140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:51,298-Speed 2627.65 samples/sec   Loss 6.0899   LearningRate 0.0206   Epoch: 10   Global Step: 453150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:55,196-Speed 2628.44 samples/sec   Loss 6.0407   LearningRate 0.0206   Epoch: 10   Global Step: 453160   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:38:59,068-Speed 2644.67 samples/sec   Loss 6.1164   LearningRate 0.0206   Epoch: 10   Global Step: 453170   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:02,966-Speed 2627.41 samples/sec   Loss 6.1113   LearningRate 0.0206   Epoch: 10   Global Step: 453180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:06,862-Speed 2629.09 samples/sec   Loss 6.0255   LearningRate 0.0206   Epoch: 10   Global Step: 453190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:10,767-Speed 2623.08 samples/sec   Loss 6.0277   LearningRate 0.0206   Epoch: 10   Global Step: 453200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:14,662-Speed 2629.51 samples/sec   Loss 6.1743   LearningRate 0.0206   Epoch: 10   Global Step: 453210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:18,551-Speed 2633.01 samples/sec   Loss 6.0797   LearningRate 0.0206   Epoch: 10   Global Step: 453220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:22,449-Speed 2628.02 samples/sec   Loss 6.0535   LearningRate 0.0206   Epoch: 10   Global Step: 453230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:26,346-Speed 2628.47 samples/sec   Loss 5.9822   LearningRate 0.0206   Epoch: 10   Global Step: 453240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:30,248-Speed 2624.87 samples/sec   Loss 6.1749   LearningRate 0.0206   Epoch: 10   Global Step: 453250   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:34,151-Speed 2624.05 samples/sec   Loss 6.1036   LearningRate 0.0206   Epoch: 10   Global Step: 453260   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:39:38,052-Speed 2625.94 samples/sec   Loss 6.0167   LearningRate 0.0206   Epoch: 10   Global Step: 453270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:39:41,952-Speed 2626.27 samples/sec   Loss 6.2011   LearningRate 0.0206   Epoch: 10   Global Step: 453280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:39:45,844-Speed 2631.77 samples/sec   Loss 6.0704   LearningRate 0.0206   Epoch: 10   Global Step: 453290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:39:49,742-Speed 2627.67 samples/sec   Loss 5.9916   LearningRate 0.0206   Epoch: 10   Global Step: 453300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:39:53,743-Speed 2560.38 samples/sec   Loss 6.1376   LearningRate 0.0206   Epoch: 10   Global Step: 453310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:39:57,726-Speed 2571.29 samples/sec   Loss 6.1126   LearningRate 0.0206   Epoch: 10   Global Step: 453320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:01,731-Speed 2558.04 samples/sec   Loss 6.0098   LearningRate 0.0206   Epoch: 10   Global Step: 453330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:05,648-Speed 2614.60 samples/sec   Loss 6.0658   LearningRate 0.0206   Epoch: 10   Global Step: 453340   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:09,555-Speed 2621.29 samples/sec   Loss 6.0342   LearningRate 0.0206   Epoch: 10   Global Step: 453350   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:13,462-Speed 2621.39 samples/sec   Loss 6.1243   LearningRate 0.0206   Epoch: 10   Global Step: 453360   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:17,360-Speed 2628.07 samples/sec   Loss 6.0251   LearningRate 0.0206   Epoch: 10   Global Step: 453370   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 22:40:21,246-Speed 2635.84 samples/sec   Loss 6.0662   LearningRate 0.0206   Epoch: 10   Global Step: 453380   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:25,144-Speed 2627.31 samples/sec   Loss 5.9891   LearningRate 0.0206   Epoch: 10   Global Step: 453390   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:29,038-Speed 2630.85 samples/sec   Loss 6.1043   LearningRate 0.0206   Epoch: 10   Global Step: 453400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:32,938-Speed 2626.14 samples/sec   Loss 6.0587   LearningRate 0.0206   Epoch: 10   Global Step: 453410   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:36,862-Speed 2610.08 samples/sec   Loss 5.9364   LearningRate 0.0206   Epoch: 10   Global Step: 453420   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:40,781-Speed 2613.64 samples/sec   Loss 6.1802   LearningRate 0.0206   Epoch: 10   Global Step: 453430   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:44,700-Speed 2614.03 samples/sec   Loss 6.1083   LearningRate 0.0206   Epoch: 10   Global Step: 453440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:48,636-Speed 2602.03 samples/sec   Loss 6.0467   LearningRate 0.0206   Epoch: 10   Global Step: 453450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:52,541-Speed 2623.37 samples/sec   Loss 6.0660   LearningRate 0.0206   Epoch: 10   Global Step: 453460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:40:56,457-Speed 2615.10 samples/sec   Loss 6.0570   LearningRate 0.0206   Epoch: 10   Global Step: 453470   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:00,328-Speed 2646.37 samples/sec   Loss 6.1450   LearningRate 0.0206   Epoch: 10   Global Step: 453480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:04,233-Speed 2623.48 samples/sec   Loss 6.0874   LearningRate 0.0206   Epoch: 10   Global Step: 453490   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:08,135-Speed 2624.23 samples/sec   Loss 6.0470   LearningRate 0.0206   Epoch: 10   Global Step: 453500   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:12,035-Speed 2626.55 samples/sec   Loss 6.0360   LearningRate 0.0206   Epoch: 10   Global Step: 453510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:15,932-Speed 2628.48 samples/sec   Loss 6.0989   LearningRate 0.0205   Epoch: 10   Global Step: 453520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:19,830-Speed 2627.53 samples/sec   Loss 5.9228   LearningRate 0.0205   Epoch: 10   Global Step: 453530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:23,745-Speed 2617.00 samples/sec   Loss 6.0620   LearningRate 0.0205   Epoch: 10   Global Step: 453540   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:27,656-Speed 2618.80 samples/sec   Loss 6.0723   LearningRate 0.0205   Epoch: 10   Global Step: 453550   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:31,551-Speed 2629.77 samples/sec   Loss 6.1159   LearningRate 0.0205   Epoch: 10   Global Step: 453560   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:35,447-Speed 2629.25 samples/sec   Loss 6.0585   LearningRate 0.0205   Epoch: 10   Global Step: 453570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:39,340-Speed 2630.87 samples/sec   Loss 6.2087   LearningRate 0.0205   Epoch: 10   Global Step: 453580   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 22:41:43,216-Speed 2642.60 samples/sec   Loss 6.0382   LearningRate 0.0205   Epoch: 10   Global Step: 453590   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:47,114-Speed 2627.30 samples/sec   Loss 6.0325   LearningRate 0.0205   Epoch: 10   Global Step: 453600   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:51,016-Speed 2625.41 samples/sec   Loss 6.0653   LearningRate 0.0205   Epoch: 10   Global Step: 453610   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:54,909-Speed 2630.66 samples/sec   Loss 6.0172   LearningRate 0.0205   Epoch: 10   Global Step: 453620   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:41:58,820-Speed 2619.42 samples/sec   Loss 6.1673   LearningRate 0.0205   Epoch: 10   Global Step: 453630   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:42:02,720-Speed 2626.11 samples/sec   Loss 6.0548   LearningRate 0.0205   Epoch: 10   Global Step: 453640   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:42:06,653-Speed 2604.33 samples/sec   Loss 5.9136   LearningRate 0.0205   Epoch: 10   Global Step: 453650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:42:10,549-Speed 2628.80 samples/sec   Loss 6.0377   LearningRate 0.0205   Epoch: 10   Global Step: 453660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:42:14,449-Speed 2627.61 samples/sec   Loss 5.9907   LearningRate 0.0205   Epoch: 10   Global Step: 453670   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:42:18,327-Speed 2640.44 samples/sec   Loss 6.0647   LearningRate 0.0205   Epoch: 10   Global Step: 453680   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:22,236-Speed 2620.03 samples/sec   Loss 6.0854   LearningRate 0.0205   Epoch: 10   Global Step: 453690   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:26,144-Speed 2621.56 samples/sec   Loss 6.0816   LearningRate 0.0205   Epoch: 10   Global Step: 453700   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:30,053-Speed 2620.57 samples/sec   Loss 6.1141   LearningRate 0.0205   Epoch: 10   Global Step: 453710   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:34,333-Speed 2393.32 samples/sec   Loss 6.1767   LearningRate 0.0205   Epoch: 10   Global Step: 453720   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:38,232-Speed 2626.74 samples/sec   Loss 6.0429   LearningRate 0.0205   Epoch: 10   Global Step: 453730   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:42,126-Speed 2630.53 samples/sec   Loss 6.0681   LearningRate 0.0205   Epoch: 10   Global Step: 453740   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:46,021-Speed 2629.55 samples/sec   Loss 5.9524   LearningRate 0.0205   Epoch: 10   Global Step: 453750   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:49,923-Speed 2624.57 samples/sec   Loss 6.0416   LearningRate 0.0205   Epoch: 10   Global Step: 453760   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:53,830-Speed 2621.29 samples/sec   Loss 5.9919   LearningRate 0.0205   Epoch: 10   Global Step: 453770   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:42:57,909-Speed 2511.65 samples/sec   Loss 5.8560   LearningRate 0.0205   Epoch: 10   Global Step: 453780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:01,800-Speed 2632.76 samples/sec   Loss 6.0261   LearningRate 0.0205   Epoch: 10   Global Step: 453790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:05,688-Speed 2634.13 samples/sec   Loss 6.0085   LearningRate 0.0205   Epoch: 10   Global Step: 453800   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:09,652-Speed 2584.39 samples/sec   Loss 6.1024   LearningRate 0.0205   Epoch: 10   Global Step: 453810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:13,617-Speed 2583.20 samples/sec   Loss 6.1217   LearningRate 0.0205   Epoch: 10   Global Step: 453820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:17,520-Speed 2623.76 samples/sec   Loss 6.0823   LearningRate 0.0205   Epoch: 10   Global Step: 453830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:21,416-Speed 2629.88 samples/sec   Loss 6.0225   LearningRate 0.0205   Epoch: 10   Global Step: 453840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:25,342-Speed 2609.09 samples/sec   Loss 6.1015   LearningRate 0.0205   Epoch: 10   Global Step: 453850   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:43:29,226-Speed 2636.93 samples/sec   Loss 5.8909   LearningRate 0.0205   Epoch: 10   Global Step: 453860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:43:33,122-Speed 2629.37 samples/sec   Loss 6.0710   LearningRate 0.0205   Epoch: 10   Global Step: 453870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:43:37,037-Speed 2616.33 samples/sec   Loss 6.0401   LearningRate 0.0205   Epoch: 10   Global Step: 453880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:43:40,943-Speed 2622.51 samples/sec   Loss 5.9971   LearningRate 0.0205   Epoch: 10   Global Step: 453890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:43:44,907-Speed 2583.74 samples/sec   Loss 6.0413   LearningRate 0.0205   Epoch: 10   Global Step: 453900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:43:48,921-Speed 2551.92 samples/sec   Loss 6.0717   LearningRate 0.0205   Epoch: 10   Global Step: 453910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:43:52,820-Speed 2626.82 samples/sec   Loss 6.0820   LearningRate 0.0205   Epoch: 10   Global Step: 453920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:43:56,724-Speed 2624.03 samples/sec   Loss 6.0949   LearningRate 0.0205   Epoch: 10   Global Step: 453930   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:00,622-Speed 2627.23 samples/sec   Loss 6.2100   LearningRate 0.0205   Epoch: 10   Global Step: 453940   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:04,523-Speed 2626.18 samples/sec   Loss 6.0508   LearningRate 0.0205   Epoch: 10   Global Step: 453950   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:08,524-Speed 2560.02 samples/sec   Loss 6.1240   LearningRate 0.0205   Epoch: 10   Global Step: 453960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:44:12,415-Speed 2632.39 samples/sec   Loss 6.0446   LearningRate 0.0205   Epoch: 10   Global Step: 453970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:44:16,308-Speed 2630.83 samples/sec   Loss 6.0917   LearningRate 0.0205   Epoch: 10   Global Step: 453980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:44:20,204-Speed 2629.25 samples/sec   Loss 6.0167   LearningRate 0.0205   Epoch: 10   Global Step: 453990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:44:24,098-Speed 2630.19 samples/sec   Loss 6.0931   LearningRate 0.0205   Epoch: 10   Global Step: 454000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:44:27,976-Speed 2640.72 samples/sec   Loss 5.9561   LearningRate 0.0205   Epoch: 10   Global Step: 454010   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:31,873-Speed 2629.37 samples/sec   Loss 6.1111   LearningRate 0.0205   Epoch: 10   Global Step: 454020   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:35,770-Speed 2629.26 samples/sec   Loss 6.1123   LearningRate 0.0205   Epoch: 10   Global Step: 454030   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:39,675-Speed 2622.94 samples/sec   Loss 6.0737   LearningRate 0.0205   Epoch: 10   Global Step: 454040   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:43,571-Speed 2628.97 samples/sec   Loss 6.1229   LearningRate 0.0205   Epoch: 10   Global Step: 454050   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:47,535-Speed 2583.72 samples/sec   Loss 6.1759   LearningRate 0.0205   Epoch: 10   Global Step: 454060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:51,433-Speed 2627.54 samples/sec   Loss 5.9649   LearningRate 0.0205   Epoch: 10   Global Step: 454070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:55,345-Speed 2618.87 samples/sec   Loss 6.1555   LearningRate 0.0205   Epoch: 10   Global Step: 454080   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:44:59,243-Speed 2627.35 samples/sec   Loss 5.9645   LearningRate 0.0205   Epoch: 10   Global Step: 454090   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:45:03,167-Speed 2610.36 samples/sec   Loss 6.0494   LearningRate 0.0205   Epoch: 10   Global Step: 454100   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:45:07,142-Speed 2576.34 samples/sec   Loss 6.0388   LearningRate 0.0205   Epoch: 10   Global Step: 454110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:11,041-Speed 2626.90 samples/sec   Loss 6.1378   LearningRate 0.0205   Epoch: 10   Global Step: 454120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:14,936-Speed 2629.87 samples/sec   Loss 5.9034   LearningRate 0.0205   Epoch: 10   Global Step: 454130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:18,835-Speed 2626.65 samples/sec   Loss 5.9955   LearningRate 0.0205   Epoch: 10   Global Step: 454140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:22,741-Speed 2622.22 samples/sec   Loss 6.0236   LearningRate 0.0205   Epoch: 10   Global Step: 454150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:26,634-Speed 2630.85 samples/sec   Loss 6.0117   LearningRate 0.0205   Epoch: 10   Global Step: 454160   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:30,536-Speed 2625.32 samples/sec   Loss 6.0357   LearningRate 0.0205   Epoch: 10   Global Step: 454170   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:34,433-Speed 2628.45 samples/sec   Loss 6.0974   LearningRate 0.0205   Epoch: 10   Global Step: 454180   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:38,334-Speed 2625.29 samples/sec   Loss 5.9580   LearningRate 0.0205   Epoch: 10   Global Step: 454190   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:42,243-Speed 2620.77 samples/sec   Loss 6.0858   LearningRate 0.0205   Epoch: 10   Global Step: 454200   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:46,144-Speed 2626.79 samples/sec   Loss 6.1392   LearningRate 0.0205   Epoch: 10   Global Step: 454210   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 22:45:50,041-Speed 2627.98 samples/sec   Loss 6.1480   LearningRate 0.0205   Epoch: 10   Global Step: 454220   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:53,950-Speed 2620.08 samples/sec   Loss 5.9168   LearningRate 0.0205   Epoch: 10   Global Step: 454230   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:45:57,842-Speed 2631.53 samples/sec   Loss 6.1481   LearningRate 0.0205   Epoch: 10   Global Step: 454240   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:46:01,744-Speed 2625.77 samples/sec   Loss 5.9970   LearningRate 0.0205   Epoch: 10   Global Step: 454250   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:46:05,620-Speed 2642.16 samples/sec   Loss 6.0519   LearningRate 0.0205   Epoch: 10   Global Step: 454260   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:09,542-Speed 2611.36 samples/sec   Loss 6.0658   LearningRate 0.0205   Epoch: 10   Global Step: 454270   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:13,445-Speed 2624.68 samples/sec   Loss 6.1246   LearningRate 0.0205   Epoch: 10   Global Step: 454280   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:17,337-Speed 2631.72 samples/sec   Loss 6.1095   LearningRate 0.0205   Epoch: 10   Global Step: 454290   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:21,233-Speed 2629.37 samples/sec   Loss 6.0415   LearningRate 0.0205   Epoch: 10   Global Step: 454300   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:25,127-Speed 2630.21 samples/sec   Loss 6.0683   LearningRate 0.0205   Epoch: 10   Global Step: 454310   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:29,029-Speed 2625.13 samples/sec   Loss 5.9817   LearningRate 0.0205   Epoch: 10   Global Step: 454320   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:32,931-Speed 2624.32 samples/sec   Loss 6.0623   LearningRate 0.0205   Epoch: 10   Global Step: 454330   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:36,833-Speed 2625.41 samples/sec   Loss 6.0812   LearningRate 0.0205   Epoch: 10   Global Step: 454340   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:40,730-Speed 2627.91 samples/sec   Loss 6.0436   LearningRate 0.0205   Epoch: 10   Global Step: 454350   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:46:44,625-Speed 2629.59 samples/sec   Loss 5.9923   LearningRate 0.0205   Epoch: 10   Global Step: 454360   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:46:48,528-Speed 2624.28 samples/sec   Loss 6.0886   LearningRate 0.0205   Epoch: 10   Global Step: 454370   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:46:52,425-Speed 2628.39 samples/sec   Loss 6.0091   LearningRate 0.0205   Epoch: 10   Global Step: 454380   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:46:56,324-Speed 2627.01 samples/sec   Loss 6.0433   LearningRate 0.0205   Epoch: 10   Global Step: 454390   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:00,236-Speed 2618.03 samples/sec   Loss 6.0802   LearningRate 0.0205   Epoch: 10   Global Step: 454400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:04,135-Speed 2626.94 samples/sec   Loss 6.0314   LearningRate 0.0205   Epoch: 10   Global Step: 454410   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:08,031-Speed 2629.09 samples/sec   Loss 5.9681   LearningRate 0.0205   Epoch: 10   Global Step: 454420   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:11,926-Speed 2629.46 samples/sec   Loss 6.0545   LearningRate 0.0205   Epoch: 10   Global Step: 454430   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:15,828-Speed 2625.28 samples/sec   Loss 6.0044   LearningRate 0.0204   Epoch: 10   Global Step: 454440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:19,730-Speed 2624.82 samples/sec   Loss 6.1487   LearningRate 0.0204   Epoch: 10   Global Step: 454450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:23,613-Speed 2637.79 samples/sec   Loss 6.0617   LearningRate 0.0204   Epoch: 10   Global Step: 454460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:27,511-Speed 2627.43 samples/sec   Loss 5.9281   LearningRate 0.0204   Epoch: 10   Global Step: 454470   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:31,415-Speed 2623.70 samples/sec   Loss 6.0209   LearningRate 0.0204   Epoch: 10   Global Step: 454480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:35,308-Speed 2630.23 samples/sec   Loss 6.1761   LearningRate 0.0204   Epoch: 10   Global Step: 454490   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:39,206-Speed 2627.65 samples/sec   Loss 5.9426   LearningRate 0.0204   Epoch: 10   Global Step: 454500   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:43,103-Speed 2628.53 samples/sec   Loss 6.0974   LearningRate 0.0204   Epoch: 10   Global Step: 454510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:47:46,987-Speed 2637.81 samples/sec   Loss 6.0004   LearningRate 0.0204   Epoch: 10   Global Step: 454520   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:47:50,898-Speed 2619.39 samples/sec   Loss 5.9537   LearningRate 0.0204   Epoch: 10   Global Step: 454530   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:47:54,795-Speed 2627.95 samples/sec   Loss 5.9826   LearningRate 0.0204   Epoch: 10   Global Step: 454540   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:47:58,695-Speed 2626.33 samples/sec   Loss 6.1431   LearningRate 0.0204   Epoch: 10   Global Step: 454550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:48:02,596-Speed 2625.02 samples/sec   Loss 5.9606   LearningRate 0.0204   Epoch: 10   Global Step: 454560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:48:06,497-Speed 2625.48 samples/sec   Loss 6.0389   LearningRate 0.0204   Epoch: 10   Global Step: 454570   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:48:10,417-Speed 2612.66 samples/sec   Loss 6.0254   LearningRate 0.0204   Epoch: 10   Global Step: 454580   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:48:14,413-Speed 2563.66 samples/sec   Loss 5.9647   LearningRate 0.0204   Epoch: 10   Global Step: 454590   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:18,311-Speed 2627.59 samples/sec   Loss 6.0846   LearningRate 0.0204   Epoch: 10   Global Step: 454600   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:22,204-Speed 2631.20 samples/sec   Loss 6.0846   LearningRate 0.0204   Epoch: 10   Global Step: 454610   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:26,101-Speed 2628.30 samples/sec   Loss 6.0297   LearningRate 0.0204   Epoch: 10   Global Step: 454620   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:30,005-Speed 2623.66 samples/sec   Loss 6.0605   LearningRate 0.0204   Epoch: 10   Global Step: 454630   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:33,905-Speed 2626.14 samples/sec   Loss 6.0606   LearningRate 0.0204   Epoch: 10   Global Step: 454640   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:37,799-Speed 2630.31 samples/sec   Loss 6.0178   LearningRate 0.0204   Epoch: 10   Global Step: 454650   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:41,695-Speed 2628.62 samples/sec   Loss 6.1934   LearningRate 0.0204   Epoch: 10   Global Step: 454660   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:45,591-Speed 2628.86 samples/sec   Loss 6.0983   LearningRate 0.0204   Epoch: 10   Global Step: 454670   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:49,492-Speed 2625.65 samples/sec   Loss 5.8929   LearningRate 0.0204   Epoch: 10   Global Step: 454680   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:48:53,392-Speed 2625.99 samples/sec   Loss 5.9524   LearningRate 0.0204   Epoch: 10   Global Step: 454690   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:48:57,290-Speed 2627.92 samples/sec   Loss 6.0510   LearningRate 0.0204   Epoch: 10   Global Step: 454700   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:01,187-Speed 2628.19 samples/sec   Loss 6.0769   LearningRate 0.0204   Epoch: 10   Global Step: 454710   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:05,088-Speed 2626.35 samples/sec   Loss 6.0104   LearningRate 0.0204   Epoch: 10   Global Step: 454720   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:09,009-Speed 2611.89 samples/sec   Loss 6.1171   LearningRate 0.0204   Epoch: 10   Global Step: 454730   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:12,904-Speed 2629.55 samples/sec   Loss 6.0530   LearningRate 0.0204   Epoch: 10   Global Step: 454740   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:16,800-Speed 2628.82 samples/sec   Loss 6.0159   LearningRate 0.0204   Epoch: 10   Global Step: 454750   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:20,711-Speed 2618.68 samples/sec   Loss 6.0060   LearningRate 0.0204   Epoch: 10   Global Step: 454760   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:24,604-Speed 2630.56 samples/sec   Loss 6.0860   LearningRate 0.0204   Epoch: 10   Global Step: 454770   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:28,500-Speed 2630.53 samples/sec   Loss 6.0387   LearningRate 0.0204   Epoch: 10   Global Step: 454780   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:32,406-Speed 2621.77 samples/sec   Loss 6.0060   LearningRate 0.0204   Epoch: 10   Global Step: 454790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:49:36,305-Speed 2627.15 samples/sec   Loss 6.2234   LearningRate 0.0204   Epoch: 10   Global Step: 454800   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:49:40,203-Speed 2627.82 samples/sec   Loss 6.0827   LearningRate 0.0204   Epoch: 10   Global Step: 454810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:49:44,099-Speed 2629.07 samples/sec   Loss 5.8837   LearningRate 0.0204   Epoch: 10   Global Step: 454820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:49:47,992-Speed 2630.70 samples/sec   Loss 5.9468   LearningRate 0.0204   Epoch: 10   Global Step: 454830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:49:51,887-Speed 2629.57 samples/sec   Loss 6.0980   LearningRate 0.0204   Epoch: 10   Global Step: 454840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:49:55,762-Speed 2642.81 samples/sec   Loss 5.9408   LearningRate 0.0204   Epoch: 10   Global Step: 454850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:49:59,654-Speed 2632.19 samples/sec   Loss 6.0165   LearningRate 0.0204   Epoch: 10   Global Step: 454860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:03,559-Speed 2622.22 samples/sec   Loss 6.0216   LearningRate 0.0204   Epoch: 10   Global Step: 454870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:07,457-Speed 2628.24 samples/sec   Loss 6.0746   LearningRate 0.0204   Epoch: 10   Global Step: 454880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:11,358-Speed 2625.50 samples/sec   Loss 6.0811   LearningRate 0.0204   Epoch: 10   Global Step: 454890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:15,257-Speed 2626.85 samples/sec   Loss 6.0342   LearningRate 0.0204   Epoch: 10   Global Step: 454900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:19,162-Speed 2622.78 samples/sec   Loss 5.9632   LearningRate 0.0204   Epoch: 10   Global Step: 454910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:23,053-Speed 2632.08 samples/sec   Loss 5.9940   LearningRate 0.0204   Epoch: 10   Global Step: 454920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:26,950-Speed 2632.14 samples/sec   Loss 6.0183   LearningRate 0.0204   Epoch: 10   Global Step: 454930   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:30,846-Speed 2629.13 samples/sec   Loss 6.1982   LearningRate 0.0204   Epoch: 10   Global Step: 454940   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:50:34,739-Speed 2630.71 samples/sec   Loss 5.9743   LearningRate 0.0204   Epoch: 10   Global Step: 454950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:50:38,641-Speed 2625.02 samples/sec   Loss 5.9837   LearningRate 0.0204   Epoch: 10   Global Step: 454960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:50:42,554-Speed 2617.59 samples/sec   Loss 6.0990   LearningRate 0.0204   Epoch: 10   Global Step: 454970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:50:46,449-Speed 2629.79 samples/sec   Loss 6.1228   LearningRate 0.0204   Epoch: 10   Global Step: 454980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:50:50,344-Speed 2628.86 samples/sec   Loss 6.0436   LearningRate 0.0204   Epoch: 10   Global Step: 454990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:50:54,243-Speed 2627.72 samples/sec   Loss 6.0217   LearningRate 0.0204   Epoch: 10   Global Step: 455000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:50:58,139-Speed 2628.81 samples/sec   Loss 6.0969   LearningRate 0.0204   Epoch: 10   Global Step: 455010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:51:02,032-Speed 2631.19 samples/sec   Loss 6.0052   LearningRate 0.0204   Epoch: 10   Global Step: 455020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:51:05,925-Speed 2630.67 samples/sec   Loss 5.9014   LearningRate 0.0204   Epoch: 10   Global Step: 455030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:51:09,801-Speed 2642.45 samples/sec   Loss 5.9688   LearningRate 0.0204   Epoch: 10   Global Step: 455040   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:51:13,713-Speed 2618.18 samples/sec   Loss 5.8956   LearningRate 0.0204   Epoch: 10   Global Step: 455050   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:51:17,620-Speed 2621.37 samples/sec   Loss 6.0354   LearningRate 0.0204   Epoch: 10   Global Step: 455060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:51:21,518-Speed 2628.07 samples/sec   Loss 6.0739   LearningRate 0.0204   Epoch: 10   Global Step: 455070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:51:25,403-Speed 2635.71 samples/sec   Loss 6.1343   LearningRate 0.0204   Epoch: 10   Global Step: 455080   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:29,297-Speed 2631.35 samples/sec   Loss 6.0815   LearningRate 0.0204   Epoch: 10   Global Step: 455090   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:33,192-Speed 2629.10 samples/sec   Loss 5.8924   LearningRate 0.0204   Epoch: 10   Global Step: 455100   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:37,091-Speed 2627.07 samples/sec   Loss 6.0411   LearningRate 0.0204   Epoch: 10   Global Step: 455110   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:40,983-Speed 2631.48 samples/sec   Loss 5.9404   LearningRate 0.0204   Epoch: 10   Global Step: 455120   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:44,877-Speed 2630.29 samples/sec   Loss 5.9575   LearningRate 0.0204   Epoch: 10   Global Step: 455130   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:48,778-Speed 2625.38 samples/sec   Loss 6.1197   LearningRate 0.0204   Epoch: 10   Global Step: 455140   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:52,678-Speed 2626.67 samples/sec   Loss 6.0069   LearningRate 0.0204   Epoch: 10   Global Step: 455150   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:51:56,575-Speed 2627.63 samples/sec   Loss 6.0131   LearningRate 0.0204   Epoch: 10   Global Step: 455160   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:52:00,474-Speed 2626.96 samples/sec   Loss 6.0812   LearningRate 0.0204   Epoch: 10   Global Step: 455170   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:52:04,371-Speed 2628.40 samples/sec   Loss 5.9805   LearningRate 0.0204   Epoch: 10   Global Step: 455180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:08,265-Speed 2630.14 samples/sec   Loss 6.1361   LearningRate 0.0204   Epoch: 10   Global Step: 455190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:12,162-Speed 2628.62 samples/sec   Loss 6.0591   LearningRate 0.0204   Epoch: 10   Global Step: 455200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:16,054-Speed 2631.38 samples/sec   Loss 5.9633   LearningRate 0.0204   Epoch: 10   Global Step: 455210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:19,950-Speed 2629.24 samples/sec   Loss 5.9919   LearningRate 0.0204   Epoch: 10   Global Step: 455220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:23,855-Speed 2622.86 samples/sec   Loss 5.9892   LearningRate 0.0204   Epoch: 10   Global Step: 455230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:27,757-Speed 2624.31 samples/sec   Loss 6.2046   LearningRate 0.0204   Epoch: 10   Global Step: 455240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:31,664-Speed 2621.64 samples/sec   Loss 6.0391   LearningRate 0.0204   Epoch: 10   Global Step: 455250   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:35,562-Speed 2627.54 samples/sec   Loss 6.0419   LearningRate 0.0204   Epoch: 10   Global Step: 455260   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:39,471-Speed 2620.54 samples/sec   Loss 6.0176   LearningRate 0.0204   Epoch: 10   Global Step: 455270   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:52:43,391-Speed 2612.31 samples/sec   Loss 6.0820   LearningRate 0.0204   Epoch: 10   Global Step: 455280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:52:47,299-Speed 2621.55 samples/sec   Loss 5.9968   LearningRate 0.0204   Epoch: 10   Global Step: 455290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:52:51,193-Speed 2630.48 samples/sec   Loss 6.0353   LearningRate 0.0204   Epoch: 10   Global Step: 455300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:52:55,095-Speed 2624.68 samples/sec   Loss 6.2159   LearningRate 0.0204   Epoch: 10   Global Step: 455310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:52:58,992-Speed 2629.05 samples/sec   Loss 5.9791   LearningRate 0.0204   Epoch: 10   Global Step: 455320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:02,888-Speed 2628.50 samples/sec   Loss 6.0799   LearningRate 0.0204   Epoch: 10   Global Step: 455330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:06,785-Speed 2628.29 samples/sec   Loss 6.0457   LearningRate 0.0204   Epoch: 10   Global Step: 455340   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:10,708-Speed 2610.37 samples/sec   Loss 6.0148   LearningRate 0.0203   Epoch: 10   Global Step: 455350   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:14,622-Speed 2617.28 samples/sec   Loss 5.9514   LearningRate 0.0203   Epoch: 10   Global Step: 455360   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:18,519-Speed 2628.14 samples/sec   Loss 5.9999   LearningRate 0.0203   Epoch: 10   Global Step: 455370   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:22,412-Speed 2631.02 samples/sec   Loss 6.0424   LearningRate 0.0203   Epoch: 10   Global Step: 455380   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 22:53:26,290-Speed 2641.34 samples/sec   Loss 6.0222   LearningRate 0.0203   Epoch: 10   Global Step: 455390   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:30,189-Speed 2626.97 samples/sec   Loss 5.9990   LearningRate 0.0203   Epoch: 10   Global Step: 455400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:53:34,091-Speed 2624.77 samples/sec   Loss 5.9340   LearningRate 0.0203   Epoch: 10   Global Step: 455410   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:53:37,988-Speed 2627.91 samples/sec   Loss 5.9897   LearningRate 0.0203   Epoch: 10   Global Step: 455420   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:53:41,889-Speed 2625.48 samples/sec   Loss 6.0160   LearningRate 0.0203   Epoch: 10   Global Step: 455430   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:53:45,793-Speed 2623.86 samples/sec   Loss 5.9687   LearningRate 0.0203   Epoch: 10   Global Step: 455440   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:53:49,712-Speed 2613.10 samples/sec   Loss 6.0043   LearningRate 0.0203   Epoch: 10   Global Step: 455450   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:53:53,609-Speed 2628.03 samples/sec   Loss 5.8876   LearningRate 0.0203   Epoch: 10   Global Step: 455460   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:53:57,508-Speed 2627.39 samples/sec   Loss 6.0552   LearningRate 0.0203   Epoch: 10   Global Step: 455470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:01,409-Speed 2625.68 samples/sec   Loss 5.9547   LearningRate 0.0203   Epoch: 10   Global Step: 455480   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:05,312-Speed 2624.00 samples/sec   Loss 5.9146   LearningRate 0.0203   Epoch: 10   Global Step: 455490   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:09,209-Speed 2628.47 samples/sec   Loss 5.9733   LearningRate 0.0203   Epoch: 10   Global Step: 455500   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:13,129-Speed 2612.56 samples/sec   Loss 6.0324   LearningRate 0.0203   Epoch: 10   Global Step: 455510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:54:17,025-Speed 2628.97 samples/sec   Loss 5.9406   LearningRate 0.0203   Epoch: 10   Global Step: 455520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:54:20,923-Speed 2627.75 samples/sec   Loss 6.0150   LearningRate 0.0203   Epoch: 10   Global Step: 455530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:54:24,826-Speed 2623.79 samples/sec   Loss 6.0803   LearningRate 0.0203   Epoch: 10   Global Step: 455540   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:54:28,733-Speed 2621.94 samples/sec   Loss 6.0045   LearningRate 0.0203   Epoch: 10   Global Step: 455550   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:54:32,629-Speed 2628.80 samples/sec   Loss 5.9902   LearningRate 0.0203   Epoch: 10   Global Step: 455560   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:54:36,525-Speed 2629.54 samples/sec   Loss 6.0443   LearningRate 0.0203   Epoch: 10   Global Step: 455570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:54:40,512-Speed 2569.18 samples/sec   Loss 5.9367   LearningRate 0.0203   Epoch: 10   Global Step: 455580   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:44,414-Speed 2624.78 samples/sec   Loss 5.9945   LearningRate 0.0203   Epoch: 10   Global Step: 455590   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:48,315-Speed 2625.05 samples/sec   Loss 5.9471   LearningRate 0.0203   Epoch: 10   Global Step: 455600   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:52,232-Speed 2614.91 samples/sec   Loss 6.1096   LearningRate 0.0203   Epoch: 10   Global Step: 455610   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:54:56,131-Speed 2627.16 samples/sec   Loss 6.0336   LearningRate 0.0203   Epoch: 10   Global Step: 455620   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:00,031-Speed 2626.19 samples/sec   Loss 6.0315   LearningRate 0.0203   Epoch: 10   Global Step: 455630   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:03,932-Speed 2625.61 samples/sec   Loss 5.9789   LearningRate 0.0203   Epoch: 10   Global Step: 455640   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:07,832-Speed 2626.30 samples/sec   Loss 5.9513   LearningRate 0.0203   Epoch: 10   Global Step: 455650   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:11,734-Speed 2624.33 samples/sec   Loss 6.0151   LearningRate 0.0203   Epoch: 10   Global Step: 455660   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:15,631-Speed 2628.52 samples/sec   Loss 6.1014   LearningRate 0.0203   Epoch: 10   Global Step: 455670   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:19,504-Speed 2644.81 samples/sec   Loss 6.0555   LearningRate 0.0203   Epoch: 10   Global Step: 455680   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:23,401-Speed 2628.17 samples/sec   Loss 6.0519   LearningRate 0.0203   Epoch: 10   Global Step: 455690   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:27,306-Speed 2622.81 samples/sec   Loss 6.1273   LearningRate 0.0203   Epoch: 10   Global Step: 455700   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:31,203-Speed 2628.11 samples/sec   Loss 6.1746   LearningRate 0.0203   Epoch: 10   Global Step: 455710   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:35,099-Speed 2628.78 samples/sec   Loss 6.1184   LearningRate 0.0203   Epoch: 10   Global Step: 455720   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:38,999-Speed 2626.59 samples/sec   Loss 6.2090   LearningRate 0.0203   Epoch: 10   Global Step: 455730   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:42,898-Speed 2626.34 samples/sec   Loss 6.0027   LearningRate 0.0203   Epoch: 10   Global Step: 455740   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:46,810-Speed 2618.40 samples/sec   Loss 6.0325   LearningRate 0.0203   Epoch: 10   Global Step: 455750   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:50,719-Speed 2620.05 samples/sec   Loss 6.0695   LearningRate 0.0203   Epoch: 10   Global Step: 455760   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:54,614-Speed 2629.73 samples/sec   Loss 6.0822   LearningRate 0.0203   Epoch: 10   Global Step: 455770   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:55:58,514-Speed 2626.55 samples/sec   Loss 6.0474   LearningRate 0.0203   Epoch: 10   Global Step: 455780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:56:02,414-Speed 2626.17 samples/sec   Loss 6.0085   LearningRate 0.0203   Epoch: 10   Global Step: 455790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:56:06,313-Speed 2627.95 samples/sec   Loss 6.1114   LearningRate 0.0203   Epoch: 10   Global Step: 455800   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:56:10,205-Speed 2630.95 samples/sec   Loss 6.0815   LearningRate 0.0203   Epoch: 10   Global Step: 455810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:56:14,100-Speed 2629.46 samples/sec   Loss 5.8891   LearningRate 0.0203   Epoch: 10   Global Step: 455820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:56:17,998-Speed 2627.43 samples/sec   Loss 6.0387   LearningRate 0.0203   Epoch: 10   Global Step: 455830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:56:21,897-Speed 2627.43 samples/sec   Loss 6.0157   LearningRate 0.0203   Epoch: 10   Global Step: 455840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:56:25,756-Speed 2653.66 samples/sec   Loss 6.0152   LearningRate 0.0203   Epoch: 10   Global Step: 455850   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:56:29,722-Speed 2582.92 samples/sec   Loss 5.9282   LearningRate 0.0203   Epoch: 10   Global Step: 455860   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:56:33,619-Speed 2628.08 samples/sec   Loss 6.1435   LearningRate 0.0203   Epoch: 10   Global Step: 455870   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:56:37,517-Speed 2627.68 samples/sec   Loss 6.0573   LearningRate 0.0203   Epoch: 10   Global Step: 455880   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:56:41,415-Speed 2627.67 samples/sec   Loss 6.0053   LearningRate 0.0203   Epoch: 10   Global Step: 455890   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:56:45,311-Speed 2628.92 samples/sec   Loss 5.9804   LearningRate 0.0203   Epoch: 10   Global Step: 455900   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:56:49,207-Speed 2631.60 samples/sec   Loss 6.1774   LearningRate 0.0203   Epoch: 10   Global Step: 455910   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:56:53,104-Speed 2628.49 samples/sec   Loss 6.0495   LearningRate 0.0203   Epoch: 10   Global Step: 455920   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:03,019-Speed 2508.32 samples/sec   Loss 6.0788   LearningRate 0.0203   Epoch: 10   Global Step: 455930   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:06,982-Speed 2584.67 samples/sec   Loss 5.9290   LearningRate 0.0203   Epoch: 10   Global Step: 455940   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:10,898-Speed 2626.90 samples/sec   Loss 5.9479   LearningRate 0.0203   Epoch: 10   Global Step: 455950   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:57:14,769-Speed 2645.47 samples/sec   Loss 6.0142   LearningRate 0.0203   Epoch: 10   Global Step: 455960   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:18,662-Speed 2630.69 samples/sec   Loss 6.0572   LearningRate 0.0203   Epoch: 10   Global Step: 455970   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:22,668-Speed 2639.25 samples/sec   Loss 6.0936   LearningRate 0.0203   Epoch: 10   Global Step: 455980   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:26,564-Speed 2629.30 samples/sec   Loss 6.0478   LearningRate 0.0203   Epoch: 10   Global Step: 455990   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:30,455-Speed 2635.27 samples/sec   Loss 6.0964   LearningRate 0.0203   Epoch: 10   Global Step: 456000   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:34,364-Speed 2620.22 samples/sec   Loss 6.0617   LearningRate 0.0203   Epoch: 10   Global Step: 456010   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:38,257-Speed 2630.98 samples/sec   Loss 6.1009   LearningRate 0.0203   Epoch: 10   Global Step: 456020   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:42,147-Speed 2632.81 samples/sec   Loss 6.0555   LearningRate 0.0203   Epoch: 10   Global Step: 456030   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:46,102-Speed 2635.81 samples/sec   Loss 5.9896   LearningRate 0.0203   Epoch: 10   Global Step: 456040   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:50,002-Speed 2626.27 samples/sec   Loss 6.0256   LearningRate 0.0203   Epoch: 10   Global Step: 456050   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 22:57:53,904-Speed 2624.70 samples/sec   Loss 5.9797   LearningRate 0.0203   Epoch: 10   Global Step: 456060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:57:57,798-Speed 2630.23 samples/sec   Loss 6.1211   LearningRate 0.0203   Epoch: 10   Global Step: 456070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:01,709-Speed 2618.94 samples/sec   Loss 6.0682   LearningRate 0.0203   Epoch: 10   Global Step: 456080   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:05,604-Speed 2630.25 samples/sec   Loss 6.0126   LearningRate 0.0203   Epoch: 10   Global Step: 456090   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:09,500-Speed 2628.78 samples/sec   Loss 5.9555   LearningRate 0.0203   Epoch: 10   Global Step: 456100   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:13,395-Speed 2629.43 samples/sec   Loss 5.9514   LearningRate 0.0203   Epoch: 10   Global Step: 456110   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:17,299-Speed 2623.02 samples/sec   Loss 6.0722   LearningRate 0.0203   Epoch: 10   Global Step: 456120   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:21,210-Speed 2618.98 samples/sec   Loss 6.0541   LearningRate 0.0203   Epoch: 10   Global Step: 456130   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:25,110-Speed 2626.26 samples/sec   Loss 5.9461   LearningRate 0.0203   Epoch: 10   Global Step: 456140   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:29,014-Speed 2623.89 samples/sec   Loss 6.1117   LearningRate 0.0203   Epoch: 10   Global Step: 456150   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:32,889-Speed 2643.16 samples/sec   Loss 6.1666   LearningRate 0.0203   Epoch: 10   Global Step: 456160   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:36,789-Speed 2625.81 samples/sec   Loss 6.0816   LearningRate 0.0203   Epoch: 10   Global Step: 456170   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:40,689-Speed 2626.38 samples/sec   Loss 6.0379   LearningRate 0.0203   Epoch: 10   Global Step: 456180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:44,591-Speed 2624.79 samples/sec   Loss 6.0290   LearningRate 0.0203   Epoch: 10   Global Step: 456190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:48,482-Speed 2632.47 samples/sec   Loss 6.0489   LearningRate 0.0203   Epoch: 10   Global Step: 456200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:52,375-Speed 2631.22 samples/sec   Loss 6.0261   LearningRate 0.0203   Epoch: 10   Global Step: 456210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:58:56,271-Speed 2628.63 samples/sec   Loss 5.9371   LearningRate 0.0203   Epoch: 10   Global Step: 456220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:00,163-Speed 2632.19 samples/sec   Loss 6.0669   LearningRate 0.0203   Epoch: 10   Global Step: 456230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:04,204-Speed 2533.98 samples/sec   Loss 6.0271   LearningRate 0.0203   Epoch: 10   Global Step: 456240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:08,123-Speed 2613.59 samples/sec   Loss 6.0618   LearningRate 0.0203   Epoch: 10   Global Step: 456250   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:12,061-Speed 2601.22 samples/sec   Loss 6.0928   LearningRate 0.0203   Epoch: 10   Global Step: 456260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:59:15,954-Speed 2630.82 samples/sec   Loss 6.0236   LearningRate 0.0202   Epoch: 10   Global Step: 456270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 22:59:39,420-Speed 436.38 samples/sec   Loss 5.9283   LearningRate 0.0202   Epoch: 11   Global Step: 456280   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:43,303-Speed 2638.74 samples/sec   Loss 6.0626   LearningRate 0.0202   Epoch: 11   Global Step: 456290   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:47,186-Speed 2637.72 samples/sec   Loss 6.0348   LearningRate 0.0202   Epoch: 11   Global Step: 456300   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:51,069-Speed 2637.67 samples/sec   Loss 5.9788   LearningRate 0.0202   Epoch: 11   Global Step: 456310   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:55,067-Speed 2561.70 samples/sec   Loss 6.0722   LearningRate 0.0202   Epoch: 11   Global Step: 456320   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 22:59:58,932-Speed 2650.95 samples/sec   Loss 6.1032   LearningRate 0.0202   Epoch: 11   Global Step: 456330   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:02,827-Speed 2629.30 samples/sec   Loss 5.9930   LearningRate 0.0202   Epoch: 11   Global Step: 456340   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:06,720-Speed 2631.72 samples/sec   Loss 5.8894   LearningRate 0.0202   Epoch: 11   Global Step: 456350   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:10,613-Speed 2630.82 samples/sec   Loss 6.0047   LearningRate 0.0202   Epoch: 11   Global Step: 456360   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:14,519-Speed 2621.82 samples/sec   Loss 5.9909   LearningRate 0.0202   Epoch: 11   Global Step: 456370   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:18,421-Speed 2624.77 samples/sec   Loss 5.9796   LearningRate 0.0202   Epoch: 11   Global Step: 456380   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:22,321-Speed 2627.12 samples/sec   Loss 6.0224   LearningRate 0.0202   Epoch: 11   Global Step: 456390   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:26,223-Speed 2624.88 samples/sec   Loss 6.0880   LearningRate 0.0202   Epoch: 11   Global Step: 456400   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:30,125-Speed 2625.05 samples/sec   Loss 6.0596   LearningRate 0.0202   Epoch: 11   Global Step: 456410   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:34,034-Speed 2619.93 samples/sec   Loss 6.0133   LearningRate 0.0202   Epoch: 11   Global Step: 456420   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:00:37,929-Speed 2630.04 samples/sec   Loss 6.0222   LearningRate 0.0202   Epoch: 11   Global Step: 456430   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:00:41,844-Speed 2616.06 samples/sec   Loss 5.9754   LearningRate 0.0202   Epoch: 11   Global Step: 456440   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:00:45,749-Speed 2622.98 samples/sec   Loss 6.0089   LearningRate 0.0202   Epoch: 11   Global Step: 456450   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:00:49,673-Speed 2609.69 samples/sec   Loss 6.0357   LearningRate 0.0202   Epoch: 11   Global Step: 456460   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:00:53,569-Speed 2629.85 samples/sec   Loss 5.8604   LearningRate 0.0202   Epoch: 11   Global Step: 456470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:00:57,465-Speed 2628.73 samples/sec   Loss 6.0043   LearningRate 0.0202   Epoch: 11   Global Step: 456480   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:01,359-Speed 2630.18 samples/sec   Loss 6.0675   LearningRate 0.0202   Epoch: 11   Global Step: 456490   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:05,256-Speed 2627.97 samples/sec   Loss 5.9824   LearningRate 0.0202   Epoch: 11   Global Step: 456500   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:09,150-Speed 2630.12 samples/sec   Loss 5.9993   LearningRate 0.0202   Epoch: 11   Global Step: 456510   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:13,050-Speed 2627.16 samples/sec   Loss 5.9853   LearningRate 0.0202   Epoch: 11   Global Step: 456520   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:16,942-Speed 2631.65 samples/sec   Loss 6.0576   LearningRate 0.0202   Epoch: 11   Global Step: 456530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:01:20,841-Speed 2626.61 samples/sec   Loss 5.9705   LearningRate 0.0202   Epoch: 11   Global Step: 456540   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:01:24,742-Speed 2626.13 samples/sec   Loss 5.9531   LearningRate 0.0202   Epoch: 11   Global Step: 456550   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:01:28,637-Speed 2629.57 samples/sec   Loss 5.9995   LearningRate 0.0202   Epoch: 11   Global Step: 456560   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:01:32,540-Speed 2624.11 samples/sec   Loss 5.9758   LearningRate 0.0202   Epoch: 11   Global Step: 456570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:01:36,435-Speed 2629.84 samples/sec   Loss 5.9857   LearningRate 0.0202   Epoch: 11   Global Step: 456580   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:01:40,319-Speed 2637.17 samples/sec   Loss 6.0545   LearningRate 0.0202   Epoch: 11   Global Step: 456590   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:44,214-Speed 2629.91 samples/sec   Loss 5.9946   LearningRate 0.0202   Epoch: 11   Global Step: 456600   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:48,111-Speed 2628.34 samples/sec   Loss 5.9440   LearningRate 0.0202   Epoch: 11   Global Step: 456610   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:52,019-Speed 2620.80 samples/sec   Loss 5.9927   LearningRate 0.0202   Epoch: 11   Global Step: 456620   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:55,928-Speed 2620.74 samples/sec   Loss 6.1051   LearningRate 0.0202   Epoch: 11   Global Step: 456630   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:01:59,823-Speed 2629.56 samples/sec   Loss 5.9650   LearningRate 0.0202   Epoch: 11   Global Step: 456640   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:02:03,721-Speed 2627.33 samples/sec   Loss 6.0124   LearningRate 0.0202   Epoch: 11   Global Step: 456650   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:02:07,621-Speed 2625.93 samples/sec   Loss 6.0092   LearningRate 0.0202   Epoch: 11   Global Step: 456660   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:02:11,525-Speed 2623.97 samples/sec   Loss 5.9810   LearningRate 0.0202   Epoch: 11   Global Step: 456670   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:02:15,426-Speed 2625.54 samples/sec   Loss 5.9340   LearningRate 0.0202   Epoch: 11   Global Step: 456680   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:02:19,323-Speed 2628.85 samples/sec   Loss 6.0976   LearningRate 0.0202   Epoch: 11   Global Step: 456690   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:23,218-Speed 2629.63 samples/sec   Loss 5.9665   LearningRate 0.0202   Epoch: 11   Global Step: 456700   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:27,117-Speed 2627.09 samples/sec   Loss 6.0142   LearningRate 0.0202   Epoch: 11   Global Step: 456710   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:31,011-Speed 2630.09 samples/sec   Loss 5.9778   LearningRate 0.0202   Epoch: 11   Global Step: 456720   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:34,919-Speed 2621.06 samples/sec   Loss 6.0008   LearningRate 0.0202   Epoch: 11   Global Step: 456730   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:38,813-Speed 2630.12 samples/sec   Loss 5.9360   LearningRate 0.0202   Epoch: 11   Global Step: 456740   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:42,714-Speed 2625.24 samples/sec   Loss 5.9918   LearningRate 0.0202   Epoch: 11   Global Step: 456750   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:46,612-Speed 2627.98 samples/sec   Loss 5.8914   LearningRate 0.0202   Epoch: 11   Global Step: 456760   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:50,521-Speed 2620.60 samples/sec   Loss 5.9768   LearningRate 0.0202   Epoch: 11   Global Step: 456770   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:54,413-Speed 2631.78 samples/sec   Loss 6.0188   LearningRate 0.0202   Epoch: 11   Global Step: 456780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:02:58,298-Speed 2635.95 samples/sec   Loss 6.0288   LearningRate 0.0202   Epoch: 11   Global Step: 456790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:02,198-Speed 2626.64 samples/sec   Loss 5.9288   LearningRate 0.0202   Epoch: 11   Global Step: 456800   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:06,095-Speed 2628.24 samples/sec   Loss 6.0841   LearningRate 0.0202   Epoch: 11   Global Step: 456810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:09,997-Speed 2624.41 samples/sec   Loss 6.0066   LearningRate 0.0202   Epoch: 11   Global Step: 456820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:13,897-Speed 2626.59 samples/sec   Loss 6.0088   LearningRate 0.0202   Epoch: 11   Global Step: 456830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:17,792-Speed 2629.47 samples/sec   Loss 5.9338   LearningRate 0.0202   Epoch: 11   Global Step: 456840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:21,685-Speed 2630.74 samples/sec   Loss 5.9162   LearningRate 0.0202   Epoch: 11   Global Step: 456850   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:25,579-Speed 2631.41 samples/sec   Loss 6.0506   LearningRate 0.0202   Epoch: 11   Global Step: 456860   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:29,475-Speed 2628.53 samples/sec   Loss 5.9802   LearningRate 0.0202   Epoch: 11   Global Step: 456870   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:33,466-Speed 2566.05 samples/sec   Loss 5.9588   LearningRate 0.0202   Epoch: 11   Global Step: 456880   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:37,336-Speed 2646.76 samples/sec   Loss 5.9173   LearningRate 0.0202   Epoch: 11   Global Step: 456890   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:41,233-Speed 2628.34 samples/sec   Loss 6.0510   LearningRate 0.0202   Epoch: 11   Global Step: 456900   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:45,137-Speed 2623.48 samples/sec   Loss 6.1268   LearningRate 0.0202   Epoch: 11   Global Step: 456910   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:49,036-Speed 2627.13 samples/sec   Loss 6.0058   LearningRate 0.0202   Epoch: 11   Global Step: 456920   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:52,927-Speed 2632.26 samples/sec   Loss 5.9993   LearningRate 0.0202   Epoch: 11   Global Step: 456930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:03:56,823-Speed 2629.88 samples/sec   Loss 5.9431   LearningRate 0.0202   Epoch: 11   Global Step: 456940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:00,729-Speed 2621.79 samples/sec   Loss 5.9985   LearningRate 0.0202   Epoch: 11   Global Step: 456950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:04,628-Speed 2626.83 samples/sec   Loss 6.0476   LearningRate 0.0202   Epoch: 11   Global Step: 456960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:08,534-Speed 2621.98 samples/sec   Loss 6.0034   LearningRate 0.0202   Epoch: 11   Global Step: 456970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:12,445-Speed 2619.49 samples/sec   Loss 5.9810   LearningRate 0.0202   Epoch: 11   Global Step: 456980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:16,351-Speed 2622.50 samples/sec   Loss 6.0014   LearningRate 0.0202   Epoch: 11   Global Step: 456990   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 23:04:20,256-Speed 2622.78 samples/sec   Loss 6.0898   LearningRate 0.0202   Epoch: 11   Global Step: 457000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:24,183-Speed 2608.40 samples/sec   Loss 5.9468   LearningRate 0.0202   Epoch: 11   Global Step: 457010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:28,084-Speed 2626.05 samples/sec   Loss 5.8669   LearningRate 0.0202   Epoch: 11   Global Step: 457020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:31,977-Speed 2630.43 samples/sec   Loss 5.9709   LearningRate 0.0202   Epoch: 11   Global Step: 457030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:35,875-Speed 2627.52 samples/sec   Loss 6.0662   LearningRate 0.0202   Epoch: 11   Global Step: 457040   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:04:39,767-Speed 2631.74 samples/sec   Loss 5.8902   LearningRate 0.0202   Epoch: 11   Global Step: 457050   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:04:43,668-Speed 2625.94 samples/sec   Loss 6.0553   LearningRate 0.0202   Epoch: 11   Global Step: 457060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:04:47,570-Speed 2625.36 samples/sec   Loss 6.0438   LearningRate 0.0202   Epoch: 11   Global Step: 457070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:04:51,472-Speed 2624.34 samples/sec   Loss 6.0743   LearningRate 0.0202   Epoch: 11   Global Step: 457080   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:04:55,380-Speed 2621.50 samples/sec   Loss 6.0205   LearningRate 0.0202   Epoch: 11   Global Step: 457090   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:04:59,276-Speed 2629.01 samples/sec   Loss 6.0850   LearningRate 0.0202   Epoch: 11   Global Step: 457100   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:03,172-Speed 2628.26 samples/sec   Loss 5.9405   LearningRate 0.0202   Epoch: 11   Global Step: 457110   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:07,069-Speed 2628.16 samples/sec   Loss 5.9650   LearningRate 0.0202   Epoch: 11   Global Step: 457120   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:10,979-Speed 2619.90 samples/sec   Loss 5.9675   LearningRate 0.0202   Epoch: 11   Global Step: 457130   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:14,914-Speed 2603.55 samples/sec   Loss 5.9943   LearningRate 0.0202   Epoch: 11   Global Step: 457140   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:18,809-Speed 2629.51 samples/sec   Loss 5.9624   LearningRate 0.0202   Epoch: 11   Global Step: 457150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:05:22,733-Speed 2610.32 samples/sec   Loss 5.9671   LearningRate 0.0202   Epoch: 11   Global Step: 457160   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:05:26,630-Speed 2628.31 samples/sec   Loss 5.9233   LearningRate 0.0202   Epoch: 11   Global Step: 457170   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:05:30,507-Speed 2642.34 samples/sec   Loss 5.8244   LearningRate 0.0202   Epoch: 11   Global Step: 457180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:34,407-Speed 2626.05 samples/sec   Loss 5.9624   LearningRate 0.0202   Epoch: 11   Global Step: 457190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:38,307-Speed 2625.64 samples/sec   Loss 5.9397   LearningRate 0.0201   Epoch: 11   Global Step: 457200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:42,203-Speed 2629.42 samples/sec   Loss 6.0456   LearningRate 0.0201   Epoch: 11   Global Step: 457210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:46,098-Speed 2629.77 samples/sec   Loss 6.0761   LearningRate 0.0201   Epoch: 11   Global Step: 457220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:49,993-Speed 2629.69 samples/sec   Loss 6.0189   LearningRate 0.0201   Epoch: 11   Global Step: 457230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:53,898-Speed 2622.71 samples/sec   Loss 6.0257   LearningRate 0.0201   Epoch: 11   Global Step: 457240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:05:57,802-Speed 2624.48 samples/sec   Loss 6.1108   LearningRate 0.0201   Epoch: 11   Global Step: 457250   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:01,696-Speed 2629.74 samples/sec   Loss 6.1562   LearningRate 0.0201   Epoch: 11   Global Step: 457260   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:05,594-Speed 2627.68 samples/sec   Loss 6.0543   LearningRate 0.0201   Epoch: 11   Global Step: 457270   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:09,504-Speed 2619.23 samples/sec   Loss 5.9805   LearningRate 0.0201   Epoch: 11   Global Step: 457280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:06:13,414-Speed 2620.47 samples/sec   Loss 5.9685   LearningRate 0.0201   Epoch: 11   Global Step: 457290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:06:17,318-Speed 2623.16 samples/sec   Loss 6.1003   LearningRate 0.0201   Epoch: 11   Global Step: 457300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:06:21,225-Speed 2621.32 samples/sec   Loss 6.0095   LearningRate 0.0201   Epoch: 11   Global Step: 457310   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:25,122-Speed 2628.87 samples/sec   Loss 5.9873   LearningRate 0.0201   Epoch: 11   Global Step: 457320   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:29,014-Speed 2632.24 samples/sec   Loss 6.0854   LearningRate 0.0201   Epoch: 11   Global Step: 457330   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:32,923-Speed 2620.99 samples/sec   Loss 5.9989   LearningRate 0.0201   Epoch: 11   Global Step: 457340   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:36,817-Speed 2630.51 samples/sec   Loss 6.0526   LearningRate 0.0201   Epoch: 11   Global Step: 457350   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:40,785-Speed 2580.58 samples/sec   Loss 5.9603   LearningRate 0.0201   Epoch: 11   Global Step: 457360   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:06:44,675-Speed 2633.53 samples/sec   Loss 6.0738   LearningRate 0.0201   Epoch: 11   Global Step: 457370   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:06:48,569-Speed 2630.37 samples/sec   Loss 5.9953   LearningRate 0.0201   Epoch: 11   Global Step: 457380   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:06:52,465-Speed 2629.30 samples/sec   Loss 6.1515   LearningRate 0.0201   Epoch: 11   Global Step: 457390   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:06:56,392-Speed 2607.87 samples/sec   Loss 5.8269   LearningRate 0.0201   Epoch: 11   Global Step: 457400   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:07:00,284-Speed 2631.62 samples/sec   Loss 5.9538   LearningRate 0.0201   Epoch: 11   Global Step: 457410   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:07:04,179-Speed 2629.75 samples/sec   Loss 5.9638   LearningRate 0.0201   Epoch: 11   Global Step: 457420   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:07:08,075-Speed 2629.48 samples/sec   Loss 5.9272   LearningRate 0.0201   Epoch: 11   Global Step: 457430   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:07:11,967-Speed 2631.48 samples/sec   Loss 6.0113   LearningRate 0.0201   Epoch: 11   Global Step: 457440   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:07:15,857-Speed 2632.50 samples/sec   Loss 5.9095   LearningRate 0.0201   Epoch: 11   Global Step: 457450   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:07:19,750-Speed 2631.20 samples/sec   Loss 6.0320   LearningRate 0.0201   Epoch: 11   Global Step: 457460   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-04-14 23:07:23,645-Speed 2629.55 samples/sec   Loss 6.0681   LearningRate 0.0201   Epoch: 11   Global Step: 457470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:27,537-Speed 2631.97 samples/sec   Loss 5.9410   LearningRate 0.0201   Epoch: 11   Global Step: 457480   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:31,444-Speed 2621.73 samples/sec   Loss 5.9431   LearningRate 0.0201   Epoch: 11   Global Step: 457490   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:35,338-Speed 2629.65 samples/sec   Loss 6.1345   LearningRate 0.0201   Epoch: 11   Global Step: 457500   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:39,235-Speed 2628.28 samples/sec   Loss 5.9460   LearningRate 0.0201   Epoch: 11   Global Step: 457510   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:43,146-Speed 2619.20 samples/sec   Loss 5.9271   LearningRate 0.0201   Epoch: 11   Global Step: 457520   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:47,042-Speed 2628.93 samples/sec   Loss 6.0123   LearningRate 0.0201   Epoch: 11   Global Step: 457530   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:50,936-Speed 2630.63 samples/sec   Loss 6.0310   LearningRate 0.0201   Epoch: 11   Global Step: 457540   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:54,845-Speed 2620.28 samples/sec   Loss 6.1241   LearningRate 0.0201   Epoch: 11   Global Step: 457550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:07:58,782-Speed 2601.66 samples/sec   Loss 6.1075   LearningRate 0.0201   Epoch: 11   Global Step: 457560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:02,692-Speed 2619.65 samples/sec   Loss 6.0830   LearningRate 0.0201   Epoch: 11   Global Step: 457570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:08:06,588-Speed 2628.45 samples/sec   Loss 5.9119   LearningRate 0.0201   Epoch: 11   Global Step: 457580   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:08:10,486-Speed 2627.43 samples/sec   Loss 5.9765   LearningRate 0.0201   Epoch: 11   Global Step: 457590   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:08:14,378-Speed 2632.89 samples/sec   Loss 6.0495   LearningRate 0.0201   Epoch: 11   Global Step: 457600   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:08:18,273-Speed 2630.52 samples/sec   Loss 5.9255   LearningRate 0.0201   Epoch: 11   Global Step: 457610   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:08:22,166-Speed 2630.76 samples/sec   Loss 6.1220   LearningRate 0.0201   Epoch: 11   Global Step: 457620   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:08:26,036-Speed 2646.72 samples/sec   Loss 5.9868   LearningRate 0.0201   Epoch: 11   Global Step: 457630   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:29,932-Speed 2628.71 samples/sec   Loss 6.0705   LearningRate 0.0201   Epoch: 11   Global Step: 457640   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:33,836-Speed 2623.78 samples/sec   Loss 5.8704   LearningRate 0.0201   Epoch: 11   Global Step: 457650   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:37,731-Speed 2629.57 samples/sec   Loss 5.8791   LearningRate 0.0201   Epoch: 11   Global Step: 457660   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:41,634-Speed 2623.97 samples/sec   Loss 5.9630   LearningRate 0.0201   Epoch: 11   Global Step: 457670   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:45,530-Speed 2628.71 samples/sec   Loss 5.8483   LearningRate 0.0201   Epoch: 11   Global Step: 457680   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:49,435-Speed 2623.36 samples/sec   Loss 5.8383   LearningRate 0.0201   Epoch: 11   Global Step: 457690   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:53,346-Speed 2618.70 samples/sec   Loss 6.1267   LearningRate 0.0201   Epoch: 11   Global Step: 457700   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:08:57,268-Speed 2611.87 samples/sec   Loss 6.0072   LearningRate 0.0201   Epoch: 11   Global Step: 457710   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:09:01,165-Speed 2628.05 samples/sec   Loss 6.0337   LearningRate 0.0201   Epoch: 11   Global Step: 457720   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:09:05,061-Speed 2629.28 samples/sec   Loss 5.9353   LearningRate 0.0201   Epoch: 11   Global Step: 457730   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:08,980-Speed 2613.45 samples/sec   Loss 5.9856   LearningRate 0.0201   Epoch: 11   Global Step: 457740   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:12,900-Speed 2612.51 samples/sec   Loss 6.0036   LearningRate 0.0201   Epoch: 11   Global Step: 457750   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:16,793-Speed 2631.00 samples/sec   Loss 5.9396   LearningRate 0.0201   Epoch: 11   Global Step: 457760   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:20,691-Speed 2628.16 samples/sec   Loss 6.0771   LearningRate 0.0201   Epoch: 11   Global Step: 457770   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:24,583-Speed 2631.79 samples/sec   Loss 6.1015   LearningRate 0.0201   Epoch: 11   Global Step: 457780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:28,487-Speed 2623.57 samples/sec   Loss 6.0265   LearningRate 0.0201   Epoch: 11   Global Step: 457790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:32,382-Speed 2630.10 samples/sec   Loss 6.0333   LearningRate 0.0201   Epoch: 11   Global Step: 457800   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:36,274-Speed 2631.02 samples/sec   Loss 5.9711   LearningRate 0.0201   Epoch: 11   Global Step: 457810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:40,172-Speed 2627.47 samples/sec   Loss 5.9157   LearningRate 0.0201   Epoch: 11   Global Step: 457820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:44,052-Speed 2640.03 samples/sec   Loss 5.9305   LearningRate 0.0201   Epoch: 11   Global Step: 457830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:09:47,929-Speed 2642.31 samples/sec   Loss 6.0957   LearningRate 0.0201   Epoch: 11   Global Step: 457840   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:09:51,823-Speed 2630.17 samples/sec   Loss 5.9576   LearningRate 0.0201   Epoch: 11   Global Step: 457850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:09:55,748-Speed 2610.01 samples/sec   Loss 5.9418   LearningRate 0.0201   Epoch: 11   Global Step: 457860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:09:59,656-Speed 2620.42 samples/sec   Loss 5.9501   LearningRate 0.0201   Epoch: 11   Global Step: 457870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:03,550-Speed 2630.15 samples/sec   Loss 6.0194   LearningRate 0.0201   Epoch: 11   Global Step: 457880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:07,451-Speed 2625.34 samples/sec   Loss 6.0431   LearningRate 0.0201   Epoch: 11   Global Step: 457890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:11,380-Speed 2607.64 samples/sec   Loss 5.9215   LearningRate 0.0201   Epoch: 11   Global Step: 457900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:15,288-Speed 2620.61 samples/sec   Loss 5.9789   LearningRate 0.0201   Epoch: 11   Global Step: 457910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:19,191-Speed 2624.07 samples/sec   Loss 5.9759   LearningRate 0.0201   Epoch: 11   Global Step: 457920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:23,113-Speed 2611.48 samples/sec   Loss 6.0079   LearningRate 0.0201   Epoch: 11   Global Step: 457930   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:27,011-Speed 2628.33 samples/sec   Loss 6.0018   LearningRate 0.0201   Epoch: 11   Global Step: 457940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:10:30,884-Speed 2644.19 samples/sec   Loss 5.9616   LearningRate 0.0201   Epoch: 11   Global Step: 457950   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:34,786-Speed 2625.08 samples/sec   Loss 5.9188   LearningRate 0.0201   Epoch: 11   Global Step: 457960   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:38,680-Speed 2629.64 samples/sec   Loss 6.0769   LearningRate 0.0201   Epoch: 11   Global Step: 457970   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:42,629-Speed 2594.17 samples/sec   Loss 5.9974   LearningRate 0.0201   Epoch: 11   Global Step: 457980   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:46,528-Speed 2627.46 samples/sec   Loss 5.9384   LearningRate 0.0201   Epoch: 11   Global Step: 457990   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:50,529-Speed 2560.01 samples/sec   Loss 5.7499   LearningRate 0.0201   Epoch: 11   Global Step: 458000   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:54,426-Speed 2628.50 samples/sec   Loss 5.9517   LearningRate 0.0201   Epoch: 11   Global Step: 458010   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:10:58,321-Speed 2629.63 samples/sec   Loss 5.9299   LearningRate 0.0201   Epoch: 11   Global Step: 458020   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:11:02,229-Speed 2621.09 samples/sec   Loss 6.0460   LearningRate 0.0201   Epoch: 11   Global Step: 458030   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:11:06,124-Speed 2629.42 samples/sec   Loss 5.9437   LearningRate 0.0201   Epoch: 11   Global Step: 458040   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:11:10,024-Speed 2625.65 samples/sec   Loss 5.9049   LearningRate 0.0201   Epoch: 11   Global Step: 458050   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:13,943-Speed 2614.51 samples/sec   Loss 6.0422   LearningRate 0.0201   Epoch: 11   Global Step: 458060   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:17,836-Speed 2630.79 samples/sec   Loss 6.0531   LearningRate 0.0201   Epoch: 11   Global Step: 458070   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:21,730-Speed 2630.42 samples/sec   Loss 6.0139   LearningRate 0.0201   Epoch: 11   Global Step: 458080   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:25,625-Speed 2629.70 samples/sec   Loss 5.9708   LearningRate 0.0201   Epoch: 11   Global Step: 458090   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:29,533-Speed 2620.71 samples/sec   Loss 5.9516   LearningRate 0.0201   Epoch: 11   Global Step: 458100   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:33,437-Speed 2623.69 samples/sec   Loss 5.9562   LearningRate 0.0201   Epoch: 11   Global Step: 458110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:37,354-Speed 2614.75 samples/sec   Loss 6.0325   LearningRate 0.0200   Epoch: 11   Global Step: 458120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:41,256-Speed 2624.40 samples/sec   Loss 6.0931   LearningRate 0.0200   Epoch: 11   Global Step: 458130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:45,155-Speed 2627.01 samples/sec   Loss 6.0456   LearningRate 0.0200   Epoch: 11   Global Step: 458140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:11:49,022-Speed 2648.89 samples/sec   Loss 6.0353   LearningRate 0.0200   Epoch: 11   Global Step: 458150   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:11:52,922-Speed 2626.39 samples/sec   Loss 6.0223   LearningRate 0.0200   Epoch: 11   Global Step: 458160   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:11:56,919-Speed 2562.79 samples/sec   Loss 6.0303   LearningRate 0.0200   Epoch: 11   Global Step: 458170   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:00,837-Speed 2613.82 samples/sec   Loss 5.9033   LearningRate 0.0200   Epoch: 11   Global Step: 458180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:04,743-Speed 2622.13 samples/sec   Loss 6.0091   LearningRate 0.0200   Epoch: 11   Global Step: 458190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:08,649-Speed 2621.91 samples/sec   Loss 6.0060   LearningRate 0.0200   Epoch: 11   Global Step: 458200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:12,562-Speed 2618.01 samples/sec   Loss 5.9398   LearningRate 0.0200   Epoch: 11   Global Step: 458210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:16,460-Speed 2627.29 samples/sec   Loss 5.9492   LearningRate 0.0200   Epoch: 11   Global Step: 458220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:20,374-Speed 2616.66 samples/sec   Loss 6.0306   LearningRate 0.0200   Epoch: 11   Global Step: 458230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:24,278-Speed 2624.34 samples/sec   Loss 6.0933   LearningRate 0.0200   Epoch: 11   Global Step: 458240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:12:28,174-Speed 2629.28 samples/sec   Loss 5.9849   LearningRate 0.0200   Epoch: 11   Global Step: 458250   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:32,099-Speed 2609.08 samples/sec   Loss 6.0091   LearningRate 0.0200   Epoch: 11   Global Step: 458260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:36,005-Speed 2622.14 samples/sec   Loss 6.0346   LearningRate 0.0200   Epoch: 11   Global Step: 458270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:39,904-Speed 2627.45 samples/sec   Loss 6.0577   LearningRate 0.0200   Epoch: 11   Global Step: 458280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:43,801-Speed 2628.16 samples/sec   Loss 5.9912   LearningRate 0.0200   Epoch: 11   Global Step: 458290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:47,701-Speed 2625.98 samples/sec   Loss 5.8894   LearningRate 0.0200   Epoch: 11   Global Step: 458300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:51,605-Speed 2624.20 samples/sec   Loss 5.9655   LearningRate 0.0200   Epoch: 11   Global Step: 458310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:55,500-Speed 2630.04 samples/sec   Loss 6.0191   LearningRate 0.0200   Epoch: 11   Global Step: 458320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:12:59,431-Speed 2605.02 samples/sec   Loss 5.9913   LearningRate 0.0200   Epoch: 11   Global Step: 458330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:03,329-Speed 2627.81 samples/sec   Loss 5.9974   LearningRate 0.0200   Epoch: 11   Global Step: 458340   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:07,228-Speed 2627.27 samples/sec   Loss 6.0256   LearningRate 0.0200   Epoch: 11   Global Step: 458350   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 23:13:11,108-Speed 2639.39 samples/sec   Loss 6.0047   LearningRate 0.0200   Epoch: 11   Global Step: 458360   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:15,004-Speed 2629.44 samples/sec   Loss 5.9852   LearningRate 0.0200   Epoch: 11   Global Step: 458370   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:18,905-Speed 2626.24 samples/sec   Loss 5.9733   LearningRate 0.0200   Epoch: 11   Global Step: 458380   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:22,904-Speed 2561.10 samples/sec   Loss 5.9613   LearningRate 0.0200   Epoch: 11   Global Step: 458390   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:26,796-Speed 2631.84 samples/sec   Loss 6.0592   LearningRate 0.0200   Epoch: 11   Global Step: 458400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:30,689-Speed 2631.19 samples/sec   Loss 5.9639   LearningRate 0.0200   Epoch: 11   Global Step: 458410   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:35,619-Speed 2077.48 samples/sec   Loss 5.9899   LearningRate 0.0200   Epoch: 11   Global Step: 458420   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:39,516-Speed 2628.50 samples/sec   Loss 5.8574   LearningRate 0.0200   Epoch: 11   Global Step: 458430   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:43,429-Speed 2617.00 samples/sec   Loss 5.9974   LearningRate 0.0200   Epoch: 11   Global Step: 458440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:47,336-Speed 2621.72 samples/sec   Loss 5.9906   LearningRate 0.0200   Epoch: 11   Global Step: 458450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:51,230-Speed 2630.38 samples/sec   Loss 5.9061   LearningRate 0.0200   Epoch: 11   Global Step: 458460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:55,310-Speed 2510.79 samples/sec   Loss 5.9854   LearningRate 0.0200   Epoch: 11   Global Step: 458470   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:13:59,208-Speed 2627.34 samples/sec   Loss 5.9493   LearningRate 0.0200   Epoch: 11   Global Step: 458480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:14:03,170-Speed 2584.93 samples/sec   Loss 6.0044   LearningRate 0.0200   Epoch: 11   Global Step: 458490   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:14:07,069-Speed 2626.76 samples/sec   Loss 5.8235   LearningRate 0.0200   Epoch: 11   Global Step: 458500   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:14:10,971-Speed 2625.15 samples/sec   Loss 5.9084   LearningRate 0.0200   Epoch: 11   Global Step: 458510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:14:14,872-Speed 2625.37 samples/sec   Loss 5.9621   LearningRate 0.0200   Epoch: 11   Global Step: 458520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:14:18,778-Speed 2622.52 samples/sec   Loss 5.9395   LearningRate 0.0200   Epoch: 11   Global Step: 458530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:14:22,679-Speed 2625.68 samples/sec   Loss 5.9949   LearningRate 0.0200   Epoch: 11   Global Step: 458540   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:14:26,552-Speed 2644.77 samples/sec   Loss 5.9680   LearningRate 0.0200   Epoch: 11   Global Step: 458550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:30,448-Speed 2629.16 samples/sec   Loss 5.8514   LearningRate 0.0200   Epoch: 11   Global Step: 458560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:34,343-Speed 2629.18 samples/sec   Loss 6.0411   LearningRate 0.0200   Epoch: 11   Global Step: 458570   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:38,242-Speed 2627.06 samples/sec   Loss 6.0956   LearningRate 0.0200   Epoch: 11   Global Step: 458580   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:42,138-Speed 2628.66 samples/sec   Loss 6.0976   LearningRate 0.0200   Epoch: 11   Global Step: 458590   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:46,047-Speed 2620.29 samples/sec   Loss 6.0671   LearningRate 0.0200   Epoch: 11   Global Step: 458600   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:49,946-Speed 2626.99 samples/sec   Loss 6.0010   LearningRate 0.0200   Epoch: 11   Global Step: 458610   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:53,855-Speed 2620.27 samples/sec   Loss 5.9927   LearningRate 0.0200   Epoch: 11   Global Step: 458620   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:14:57,749-Speed 2630.68 samples/sec   Loss 6.0124   LearningRate 0.0200   Epoch: 11   Global Step: 458630   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:01,644-Speed 2629.78 samples/sec   Loss 6.0330   LearningRate 0.0200   Epoch: 11   Global Step: 458640   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:05,544-Speed 2626.38 samples/sec   Loss 5.9872   LearningRate 0.0200   Epoch: 11   Global Step: 458650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:15:09,436-Speed 2631.20 samples/sec   Loss 5.8920   LearningRate 0.0200   Epoch: 11   Global Step: 458660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:15:13,327-Speed 2632.46 samples/sec   Loss 5.9161   LearningRate 0.0200   Epoch: 11   Global Step: 458670   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:15:17,236-Speed 2620.28 samples/sec   Loss 6.0450   LearningRate 0.0200   Epoch: 11   Global Step: 458680   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:15:21,144-Speed 2621.35 samples/sec   Loss 5.8806   LearningRate 0.0200   Epoch: 11   Global Step: 458690   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:15:25,039-Speed 2629.30 samples/sec   Loss 5.9944   LearningRate 0.0200   Epoch: 11   Global Step: 458700   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:15:28,915-Speed 2642.62 samples/sec   Loss 5.9933   LearningRate 0.0200   Epoch: 11   Global Step: 458710   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:32,816-Speed 2626.19 samples/sec   Loss 5.9691   LearningRate 0.0200   Epoch: 11   Global Step: 458720   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:36,711-Speed 2629.69 samples/sec   Loss 5.9197   LearningRate 0.0200   Epoch: 11   Global Step: 458730   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:40,608-Speed 2628.05 samples/sec   Loss 5.9956   LearningRate 0.0200   Epoch: 11   Global Step: 458740   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:44,504-Speed 2628.86 samples/sec   Loss 5.8907   LearningRate 0.0200   Epoch: 11   Global Step: 458750   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:48,402-Speed 2627.84 samples/sec   Loss 5.9528   LearningRate 0.0200   Epoch: 11   Global Step: 458760   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:52,309-Speed 2621.74 samples/sec   Loss 5.9800   LearningRate 0.0200   Epoch: 11   Global Step: 458770   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:15:56,214-Speed 2622.96 samples/sec   Loss 6.0004   LearningRate 0.0200   Epoch: 11   Global Step: 458780   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:16:00,113-Speed 2626.74 samples/sec   Loss 6.0734   LearningRate 0.0200   Epoch: 11   Global Step: 458790   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:16:04,014-Speed 2625.49 samples/sec   Loss 5.9337   LearningRate 0.0200   Epoch: 11   Global Step: 458800   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:16:07,928-Speed 2616.81 samples/sec   Loss 5.9191   LearningRate 0.0200   Epoch: 11   Global Step: 458810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:11,937-Speed 2554.55 samples/sec   Loss 6.0155   LearningRate 0.0200   Epoch: 11   Global Step: 458820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:15,949-Speed 2552.98 samples/sec   Loss 5.9046   LearningRate 0.0200   Epoch: 11   Global Step: 458830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:19,846-Speed 2628.51 samples/sec   Loss 5.9702   LearningRate 0.0200   Epoch: 11   Global Step: 458840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:23,754-Speed 2620.99 samples/sec   Loss 5.9516   LearningRate 0.0200   Epoch: 11   Global Step: 458850   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:27,659-Speed 2623.20 samples/sec   Loss 5.9985   LearningRate 0.0200   Epoch: 11   Global Step: 458860   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:31,581-Speed 2611.49 samples/sec   Loss 5.9516   LearningRate 0.0200   Epoch: 11   Global Step: 458870   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:35,479-Speed 2627.33 samples/sec   Loss 6.0005   LearningRate 0.0200   Epoch: 11   Global Step: 458880   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:39,375-Speed 2628.87 samples/sec   Loss 5.9127   LearningRate 0.0200   Epoch: 11   Global Step: 458890   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:43,270-Speed 2629.25 samples/sec   Loss 5.9443   LearningRate 0.0200   Epoch: 11   Global Step: 458900   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:47,163-Speed 2630.95 samples/sec   Loss 6.0653   LearningRate 0.0200   Epoch: 11   Global Step: 458910   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:51,059-Speed 2628.75 samples/sec   Loss 5.9510   LearningRate 0.0200   Epoch: 11   Global Step: 458920   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:54,959-Speed 2626.59 samples/sec   Loss 6.0666   LearningRate 0.0200   Epoch: 11   Global Step: 458930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:16:58,879-Speed 2613.17 samples/sec   Loss 5.9292   LearningRate 0.0200   Epoch: 11   Global Step: 458940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:17:02,780-Speed 2625.33 samples/sec   Loss 6.0349   LearningRate 0.0200   Epoch: 11   Global Step: 458950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:17:06,689-Speed 2620.29 samples/sec   Loss 6.0081   LearningRate 0.0200   Epoch: 11   Global Step: 458960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:17:10,590-Speed 2625.12 samples/sec   Loss 6.0495   LearningRate 0.0200   Epoch: 11   Global Step: 458970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:17:14,498-Speed 2620.75 samples/sec   Loss 6.0053   LearningRate 0.0200   Epoch: 11   Global Step: 458980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:17:18,398-Speed 2626.38 samples/sec   Loss 5.9486   LearningRate 0.0200   Epoch: 11   Global Step: 458990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:17:22,281-Speed 2637.59 samples/sec   Loss 5.9232   LearningRate 0.0200   Epoch: 11   Global Step: 459000   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:26,192-Speed 2619.14 samples/sec   Loss 5.9608   LearningRate 0.0200   Epoch: 11   Global Step: 459010   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:30,093-Speed 2625.60 samples/sec   Loss 5.9658   LearningRate 0.0200   Epoch: 11   Global Step: 459020   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:33,997-Speed 2623.74 samples/sec   Loss 5.9623   LearningRate 0.0200   Epoch: 11   Global Step: 459030   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:37,901-Speed 2623.83 samples/sec   Loss 5.9557   LearningRate 0.0200   Epoch: 11   Global Step: 459040   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:41,797-Speed 2628.79 samples/sec   Loss 5.9679   LearningRate 0.0199   Epoch: 11   Global Step: 459050   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:45,700-Speed 2624.20 samples/sec   Loss 5.8973   LearningRate 0.0199   Epoch: 11   Global Step: 459060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:49,593-Speed 2630.57 samples/sec   Loss 6.1125   LearningRate 0.0199   Epoch: 11   Global Step: 459070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:53,500-Speed 2621.76 samples/sec   Loss 6.0232   LearningRate 0.0199   Epoch: 11   Global Step: 459080   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:17:57,415-Speed 2615.86 samples/sec   Loss 5.9246   LearningRate 0.0199   Epoch: 11   Global Step: 459090   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:01,313-Speed 2627.74 samples/sec   Loss 5.8198   LearningRate 0.0199   Epoch: 11   Global Step: 459100   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:18:05,189-Speed 2642.96 samples/sec   Loss 5.8829   LearningRate 0.0199   Epoch: 11   Global Step: 459110   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:09,084-Speed 2628.88 samples/sec   Loss 6.0467   LearningRate 0.0199   Epoch: 11   Global Step: 459120   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:12,994-Speed 2619.62 samples/sec   Loss 5.9550   LearningRate 0.0199   Epoch: 11   Global Step: 459130   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:16,897-Speed 2624.67 samples/sec   Loss 5.9734   LearningRate 0.0199   Epoch: 11   Global Step: 459140   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:20,818-Speed 2611.89 samples/sec   Loss 5.9673   LearningRate 0.0199   Epoch: 11   Global Step: 459150   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:24,714-Speed 2628.91 samples/sec   Loss 5.9877   LearningRate 0.0199   Epoch: 11   Global Step: 459160   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:28,638-Speed 2610.60 samples/sec   Loss 6.0014   LearningRate 0.0199   Epoch: 11   Global Step: 459170   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:32,559-Speed 2611.52 samples/sec   Loss 6.0190   LearningRate 0.0199   Epoch: 11   Global Step: 459180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:36,467-Speed 2621.06 samples/sec   Loss 6.0225   LearningRate 0.0199   Epoch: 11   Global Step: 459190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:40,369-Speed 2624.67 samples/sec   Loss 5.9300   LearningRate 0.0199   Epoch: 11   Global Step: 459200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:18:44,271-Speed 2625.28 samples/sec   Loss 5.9391   LearningRate 0.0199   Epoch: 11   Global Step: 459210   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:18:48,168-Speed 2628.04 samples/sec   Loss 5.8871   LearningRate 0.0199   Epoch: 11   Global Step: 459220   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:18:52,071-Speed 2624.07 samples/sec   Loss 5.9985   LearningRate 0.0199   Epoch: 11   Global Step: 459230   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:18:55,973-Speed 2625.17 samples/sec   Loss 5.9965   LearningRate 0.0199   Epoch: 11   Global Step: 459240   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:18:59,899-Speed 2609.30 samples/sec   Loss 6.0364   LearningRate 0.0199   Epoch: 11   Global Step: 459250   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:19:03,797-Speed 2626.96 samples/sec   Loss 5.9013   LearningRate 0.0199   Epoch: 11   Global Step: 459260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:19:07,697-Speed 2626.38 samples/sec   Loss 6.0158   LearningRate 0.0199   Epoch: 11   Global Step: 459270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:19:11,601-Speed 2623.55 samples/sec   Loss 6.0168   LearningRate 0.0199   Epoch: 11   Global Step: 459280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:19:15,478-Speed 2642.29 samples/sec   Loss 5.9253   LearningRate 0.0199   Epoch: 11   Global Step: 459290   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:19,378-Speed 2626.17 samples/sec   Loss 5.8230   LearningRate 0.0199   Epoch: 11   Global Step: 459300   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:23,277-Speed 2626.39 samples/sec   Loss 5.9822   LearningRate 0.0199   Epoch: 11   Global Step: 459310   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:27,178-Speed 2625.77 samples/sec   Loss 6.0278   LearningRate 0.0199   Epoch: 11   Global Step: 459320   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:31,076-Speed 2628.09 samples/sec   Loss 5.9590   LearningRate 0.0199   Epoch: 11   Global Step: 459330   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:34,974-Speed 2627.68 samples/sec   Loss 5.9742   LearningRate 0.0199   Epoch: 11   Global Step: 459340   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:38,871-Speed 2627.87 samples/sec   Loss 5.9845   LearningRate 0.0199   Epoch: 11   Global Step: 459350   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:42,774-Speed 2624.09 samples/sec   Loss 6.0294   LearningRate 0.0199   Epoch: 11   Global Step: 459360   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:46,699-Speed 2609.84 samples/sec   Loss 5.9516   LearningRate 0.0199   Epoch: 11   Global Step: 459370   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:50,598-Speed 2626.82 samples/sec   Loss 6.0809   LearningRate 0.0199   Epoch: 11   Global Step: 459380   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:19:54,499-Speed 2625.44 samples/sec   Loss 5.9631   LearningRate 0.0199   Epoch: 11   Global Step: 459390   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:19:58,403-Speed 2623.70 samples/sec   Loss 6.0778   LearningRate 0.0199   Epoch: 11   Global Step: 459400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:02,296-Speed 2630.40 samples/sec   Loss 5.9133   LearningRate 0.0199   Epoch: 11   Global Step: 459410   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:06,200-Speed 2623.97 samples/sec   Loss 6.0015   LearningRate 0.0199   Epoch: 11   Global Step: 459420   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:10,097-Speed 2628.06 samples/sec   Loss 5.8816   LearningRate 0.0199   Epoch: 11   Global Step: 459430   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:13,995-Speed 2628.34 samples/sec   Loss 5.8660   LearningRate 0.0199   Epoch: 11   Global Step: 459440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:17,902-Speed 2621.35 samples/sec   Loss 5.9463   LearningRate 0.0199   Epoch: 11   Global Step: 459450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:21,797-Speed 2629.30 samples/sec   Loss 5.9421   LearningRate 0.0199   Epoch: 11   Global Step: 459460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:25,702-Speed 2622.60 samples/sec   Loss 5.9995   LearningRate 0.0199   Epoch: 11   Global Step: 459470   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:29,607-Speed 2623.48 samples/sec   Loss 5.8971   LearningRate 0.0199   Epoch: 11   Global Step: 459480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:33,504-Speed 2628.04 samples/sec   Loss 5.8812   LearningRate 0.0199   Epoch: 11   Global Step: 459490   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 23:20:37,571-Speed 2518.00 samples/sec   Loss 5.9473   LearningRate 0.0199   Epoch: 11   Global Step: 459500   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-04-14 23:20:41,624-Speed 2527.40 samples/sec   Loss 6.0013   LearningRate 0.0199   Epoch: 11   Global Step: 459510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:20:45,617-Speed 2565.72 samples/sec   Loss 5.9621   LearningRate 0.0199   Epoch: 11   Global Step: 459520   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:20:49,517-Speed 2626.60 samples/sec   Loss 6.1051   LearningRate 0.0199   Epoch: 11   Global Step: 459530   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:20:53,419-Speed 2624.80 samples/sec   Loss 5.8524   LearningRate 0.0199   Epoch: 11   Global Step: 459540   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:20:57,348-Speed 2607.25 samples/sec   Loss 6.0001   LearningRate 0.0199   Epoch: 11   Global Step: 459550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:21:01,258-Speed 2619.30 samples/sec   Loss 5.9600   LearningRate 0.0199   Epoch: 11   Global Step: 459560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:21:05,163-Speed 2623.02 samples/sec   Loss 5.9415   LearningRate 0.0199   Epoch: 11   Global Step: 459570   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:21:09,063-Speed 2626.10 samples/sec   Loss 5.8829   LearningRate 0.0199   Epoch: 11   Global Step: 459580   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:21:12,971-Speed 2620.76 samples/sec   Loss 5.8114   LearningRate 0.0199   Epoch: 11   Global Step: 459590   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:21:16,878-Speed 2621.25 samples/sec   Loss 6.0039   LearningRate 0.0199   Epoch: 11   Global Step: 459600   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:21:20,779-Speed 2626.15 samples/sec   Loss 6.0167   LearningRate 0.0199   Epoch: 11   Global Step: 459610   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:21:24,675-Speed 2628.65 samples/sec   Loss 5.9178   LearningRate 0.0199   Epoch: 11   Global Step: 459620   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:28,577-Speed 2625.31 samples/sec   Loss 5.9917   LearningRate 0.0199   Epoch: 11   Global Step: 459630   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:32,474-Speed 2627.85 samples/sec   Loss 5.9326   LearningRate 0.0199   Epoch: 11   Global Step: 459640   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:36,374-Speed 2626.56 samples/sec   Loss 5.8887   LearningRate 0.0199   Epoch: 11   Global Step: 459650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:40,270-Speed 2628.52 samples/sec   Loss 6.1694   LearningRate 0.0199   Epoch: 11   Global Step: 459660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:44,168-Speed 2628.01 samples/sec   Loss 6.0761   LearningRate 0.0199   Epoch: 11   Global Step: 459670   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:48,067-Speed 2626.70 samples/sec   Loss 6.0070   LearningRate 0.0199   Epoch: 11   Global Step: 459680   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:51,968-Speed 2625.45 samples/sec   Loss 6.0083   LearningRate 0.0199   Epoch: 11   Global Step: 459690   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:55,862-Speed 2630.10 samples/sec   Loss 6.0053   LearningRate 0.0199   Epoch: 11   Global Step: 459700   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:21:59,764-Speed 2625.27 samples/sec   Loss 6.0074   LearningRate 0.0199   Epoch: 11   Global Step: 459710   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:03,653-Speed 2633.55 samples/sec   Loss 5.9541   LearningRate 0.0199   Epoch: 11   Global Step: 459720   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:07,558-Speed 2623.20 samples/sec   Loss 5.9281   LearningRate 0.0199   Epoch: 11   Global Step: 459730   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:11,462-Speed 2622.97 samples/sec   Loss 6.0169   LearningRate 0.0199   Epoch: 11   Global Step: 459740   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:15,369-Speed 2621.50 samples/sec   Loss 5.9742   LearningRate 0.0199   Epoch: 11   Global Step: 459750   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:19,271-Speed 2625.41 samples/sec   Loss 5.9431   LearningRate 0.0199   Epoch: 11   Global Step: 459760   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:23,177-Speed 2621.76 samples/sec   Loss 5.9914   LearningRate 0.0199   Epoch: 11   Global Step: 459770   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:27,083-Speed 2622.22 samples/sec   Loss 5.9809   LearningRate 0.0199   Epoch: 11   Global Step: 459780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:22:30,958-Speed 2643.50 samples/sec   Loss 6.1228   LearningRate 0.0199   Epoch: 11   Global Step: 459790   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:22:34,858-Speed 2626.45 samples/sec   Loss 5.9098   LearningRate 0.0199   Epoch: 11   Global Step: 459800   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:22:38,763-Speed 2622.54 samples/sec   Loss 5.9785   LearningRate 0.0199   Epoch: 11   Global Step: 459810   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:22:42,664-Speed 2625.43 samples/sec   Loss 5.8791   LearningRate 0.0199   Epoch: 11   Global Step: 459820   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:22:46,561-Speed 2628.28 samples/sec   Loss 5.8242   LearningRate 0.0199   Epoch: 11   Global Step: 459830   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:22:50,461-Speed 2626.17 samples/sec   Loss 5.9464   LearningRate 0.0199   Epoch: 11   Global Step: 459840   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:22:54,359-Speed 2627.82 samples/sec   Loss 5.9611   LearningRate 0.0199   Epoch: 11   Global Step: 459850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:22:58,257-Speed 2627.31 samples/sec   Loss 5.9237   LearningRate 0.0199   Epoch: 11   Global Step: 459860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:23:02,156-Speed 2627.28 samples/sec   Loss 5.9810   LearningRate 0.0199   Epoch: 11   Global Step: 459870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:23:06,056-Speed 2626.23 samples/sec   Loss 5.9546   LearningRate 0.0199   Epoch: 11   Global Step: 459880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:23:09,955-Speed 2626.41 samples/sec   Loss 5.9392   LearningRate 0.0199   Epoch: 11   Global Step: 459890   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:13,849-Speed 2630.39 samples/sec   Loss 6.0102   LearningRate 0.0199   Epoch: 11   Global Step: 459900   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:17,754-Speed 2623.53 samples/sec   Loss 5.9153   LearningRate 0.0199   Epoch: 11   Global Step: 459910   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:21,673-Speed 2613.07 samples/sec   Loss 5.8901   LearningRate 0.0199   Epoch: 11   Global Step: 459920   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:25,580-Speed 2622.67 samples/sec   Loss 5.9388   LearningRate 0.0199   Epoch: 11   Global Step: 459930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:29,471-Speed 2631.99 samples/sec   Loss 6.0213   LearningRate 0.0199   Epoch: 11   Global Step: 459940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:33,365-Speed 2630.48 samples/sec   Loss 5.8549   LearningRate 0.0199   Epoch: 11   Global Step: 459950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:37,285-Speed 2612.32 samples/sec   Loss 5.9601   LearningRate 0.0199   Epoch: 11   Global Step: 459960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:41,181-Speed 2628.74 samples/sec   Loss 6.0283   LearningRate 0.0199   Epoch: 11   Global Step: 459970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:45,082-Speed 2625.94 samples/sec   Loss 5.9250   LearningRate 0.0198   Epoch: 11   Global Step: 459980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:48,972-Speed 2633.01 samples/sec   Loss 5.9942   LearningRate 0.0198   Epoch: 11   Global Step: 459990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:23:52,869-Speed 2628.56 samples/sec   Loss 5.8300   LearningRate 0.0198   Epoch: 11   Global Step: 460000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:24:35,690-[lfw][460000]XNorm: 23.337887
Training: 2022-04-14 23:24:35,691-[lfw][460000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-14 23:24:35,692-[lfw][460000]Accuracy-Highest: 0.99783
Training: 2022-04-14 23:25:25,868-[cfp_fp][460000]XNorm: 21.993573
Training: 2022-04-14 23:25:25,869-[cfp_fp][460000]Accuracy-Flip: 0.98843+-0.00391
Training: 2022-04-14 23:25:25,869-[cfp_fp][460000]Accuracy-Highest: 0.98843
Training: 2022-04-14 23:26:08,311-[agedb_30][460000]XNorm: 23.593340
Training: 2022-04-14 23:26:08,312-[agedb_30][460000]Accuracy-Flip: 0.97717+-0.00711
Training: 2022-04-14 23:26:08,312-[agedb_30][460000]Accuracy-Highest: 0.97817
Training: 2022-04-14 23:26:12,187-Speed 73.50 samples/sec   Loss 5.9777   LearningRate 0.0198   Epoch: 11   Global Step: 460010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:26:16,059-Speed 2644.85 samples/sec   Loss 5.8247   LearningRate 0.0198   Epoch: 11   Global Step: 460020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:26:19,955-Speed 2629.27 samples/sec   Loss 5.8848   LearningRate 0.0198   Epoch: 11   Global Step: 460030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:26:23,883-Speed 2607.36 samples/sec   Loss 6.0856   LearningRate 0.0198   Epoch: 11   Global Step: 460040   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:26:27,767-Speed 2637.22 samples/sec   Loss 6.0475   LearningRate 0.0198   Epoch: 11   Global Step: 460050   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:26:31,658-Speed 2632.21 samples/sec   Loss 5.8778   LearningRate 0.0198   Epoch: 11   Global Step: 460060   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:26:35,527-Speed 2646.88 samples/sec   Loss 6.0048   LearningRate 0.0198   Epoch: 11   Global Step: 460070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:26:39,421-Speed 2630.63 samples/sec   Loss 5.8898   LearningRate 0.0198   Epoch: 11   Global Step: 460080   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:26:43,312-Speed 2632.30 samples/sec   Loss 5.8244   LearningRate 0.0198   Epoch: 11   Global Step: 460090   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:26:47,214-Speed 2625.28 samples/sec   Loss 5.9381   LearningRate 0.0198   Epoch: 11   Global Step: 460100   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:26:51,113-Speed 2627.07 samples/sec   Loss 5.9107   LearningRate 0.0198   Epoch: 11   Global Step: 460110   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:26:55,014-Speed 2625.54 samples/sec   Loss 5.9769   LearningRate 0.0198   Epoch: 11   Global Step: 460120   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:26:58,910-Speed 2628.73 samples/sec   Loss 5.9966   LearningRate 0.0198   Epoch: 11   Global Step: 460130   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:27:02,812-Speed 2625.09 samples/sec   Loss 5.8871   LearningRate 0.0198   Epoch: 11   Global Step: 460140   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:27:06,709-Speed 2628.10 samples/sec   Loss 5.9303   LearningRate 0.0198   Epoch: 11   Global Step: 460150   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:27:10,626-Speed 2614.83 samples/sec   Loss 6.0066   LearningRate 0.0198   Epoch: 11   Global Step: 460160   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:27:14,540-Speed 2616.93 samples/sec   Loss 5.8873   LearningRate 0.0198   Epoch: 11   Global Step: 460170   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:18,441-Speed 2625.72 samples/sec   Loss 5.9095   LearningRate 0.0198   Epoch: 11   Global Step: 460180   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:22,341-Speed 2626.29 samples/sec   Loss 5.8994   LearningRate 0.0198   Epoch: 11   Global Step: 460190   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:26,235-Speed 2630.14 samples/sec   Loss 5.8389   LearningRate 0.0198   Epoch: 11   Global Step: 460200   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:30,155-Speed 2613.11 samples/sec   Loss 6.0217   LearningRate 0.0198   Epoch: 11   Global Step: 460210   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:34,047-Speed 2630.90 samples/sec   Loss 5.9545   LearningRate 0.0198   Epoch: 11   Global Step: 460220   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:37,943-Speed 2629.29 samples/sec   Loss 5.8418   LearningRate 0.0198   Epoch: 11   Global Step: 460230   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:41,845-Speed 2624.87 samples/sec   Loss 5.9390   LearningRate 0.0198   Epoch: 11   Global Step: 460240   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:45,734-Speed 2633.44 samples/sec   Loss 6.0291   LearningRate 0.0198   Epoch: 11   Global Step: 460250   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:49,643-Speed 2620.57 samples/sec   Loss 5.9401   LearningRate 0.0198   Epoch: 11   Global Step: 460260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:53,593-Speed 2593.15 samples/sec   Loss 5.9689   LearningRate 0.0198   Epoch: 11   Global Step: 460270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:27:57,673-Speed 2510.12 samples/sec   Loss 6.0217   LearningRate 0.0198   Epoch: 11   Global Step: 460280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:28:01,758-Speed 2507.55 samples/sec   Loss 5.8974   LearningRate 0.0198   Epoch: 11   Global Step: 460290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:28:05,822-Speed 2520.16 samples/sec   Loss 5.9463   LearningRate 0.0198   Epoch: 11   Global Step: 460300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:28:09,719-Speed 2628.06 samples/sec   Loss 5.9069   LearningRate 0.0198   Epoch: 11   Global Step: 460310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:28:13,655-Speed 2602.04 samples/sec   Loss 5.9360   LearningRate 0.0198   Epoch: 11   Global Step: 460320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:28:17,552-Speed 2628.92 samples/sec   Loss 5.9102   LearningRate 0.0198   Epoch: 11   Global Step: 460330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:28:21,445-Speed 2630.83 samples/sec   Loss 6.0327   LearningRate 0.0198   Epoch: 11   Global Step: 460340   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-04-14 23:28:25,347-Speed 2624.33 samples/sec   Loss 5.9500   LearningRate 0.0198   Epoch: 11   Global Step: 460350   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-04-14 23:28:29,243-Speed 2629.16 samples/sec   Loss 5.9525   LearningRate 0.0198   Epoch: 11   Global Step: 460360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:28:33,172-Speed 2607.01 samples/sec   Loss 5.9571   LearningRate 0.0198   Epoch: 11   Global Step: 460370   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:28:37,248-Speed 2512.64 samples/sec   Loss 5.9976   LearningRate 0.0198   Epoch: 11   Global Step: 460380   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:28:41,254-Speed 2563.06 samples/sec   Loss 6.0768   LearningRate 0.0198   Epoch: 11   Global Step: 460390   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:28:45,172-Speed 2613.79 samples/sec   Loss 5.9831   LearningRate 0.0198   Epoch: 11   Global Step: 460400   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:28:49,079-Speed 2621.93 samples/sec   Loss 5.9004   LearningRate 0.0198   Epoch: 11   Global Step: 460410   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:28:52,981-Speed 2625.17 samples/sec   Loss 6.0043   LearningRate 0.0198   Epoch: 11   Global Step: 460420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:28:56,876-Speed 2629.28 samples/sec   Loss 5.9665   LearningRate 0.0198   Epoch: 11   Global Step: 460430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:29:00,771-Speed 2629.23 samples/sec   Loss 5.8894   LearningRate 0.0198   Epoch: 11   Global Step: 460440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:29:04,667-Speed 2629.03 samples/sec   Loss 5.8979   LearningRate 0.0198   Epoch: 11   Global Step: 460450   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:08,560-Speed 2631.12 samples/sec   Loss 5.9814   LearningRate 0.0198   Epoch: 11   Global Step: 460460   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:12,460-Speed 2626.55 samples/sec   Loss 5.9548   LearningRate 0.0198   Epoch: 11   Global Step: 460470   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:16,354-Speed 2630.19 samples/sec   Loss 5.9542   LearningRate 0.0198   Epoch: 11   Global Step: 460480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:20,252-Speed 2627.65 samples/sec   Loss 6.0454   LearningRate 0.0198   Epoch: 11   Global Step: 460490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:24,149-Speed 2628.37 samples/sec   Loss 6.0396   LearningRate 0.0198   Epoch: 11   Global Step: 460500   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:28,045-Speed 2628.79 samples/sec   Loss 5.9623   LearningRate 0.0198   Epoch: 11   Global Step: 460510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:31,936-Speed 2631.81 samples/sec   Loss 5.9373   LearningRate 0.0198   Epoch: 11   Global Step: 460520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:35,835-Speed 2627.12 samples/sec   Loss 5.9309   LearningRate 0.0198   Epoch: 11   Global Step: 460530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:39,734-Speed 2626.99 samples/sec   Loss 5.8887   LearningRate 0.0198   Epoch: 11   Global Step: 460540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:43,603-Speed 2647.25 samples/sec   Loss 5.9134   LearningRate 0.0198   Epoch: 11   Global Step: 460550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:47,502-Speed 2627.13 samples/sec   Loss 5.9425   LearningRate 0.0198   Epoch: 11   Global Step: 460560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:51,395-Speed 2630.83 samples/sec   Loss 6.1320   LearningRate 0.0198   Epoch: 11   Global Step: 460570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:55,290-Speed 2629.43 samples/sec   Loss 5.8773   LearningRate 0.0198   Epoch: 11   Global Step: 460580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:29:59,192-Speed 2625.18 samples/sec   Loss 5.8891   LearningRate 0.0198   Epoch: 11   Global Step: 460590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:30:03,114-Speed 2611.19 samples/sec   Loss 5.9842   LearningRate 0.0198   Epoch: 11   Global Step: 460600   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:30:07,009-Speed 2629.90 samples/sec   Loss 5.9835   LearningRate 0.0198   Epoch: 11   Global Step: 460610   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:30:10,909-Speed 2626.05 samples/sec   Loss 6.0147   LearningRate 0.0198   Epoch: 11   Global Step: 460620   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:30:14,804-Speed 2629.80 samples/sec   Loss 5.8086   LearningRate 0.0198   Epoch: 11   Global Step: 460630   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:30:18,699-Speed 2629.69 samples/sec   Loss 5.9297   LearningRate 0.0198   Epoch: 11   Global Step: 460640   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:30:22,569-Speed 2646.36 samples/sec   Loss 5.9024   LearningRate 0.0198   Epoch: 11   Global Step: 460650   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:26,465-Speed 2628.78 samples/sec   Loss 5.9829   LearningRate 0.0198   Epoch: 11   Global Step: 460660   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:30,372-Speed 2621.89 samples/sec   Loss 6.0298   LearningRate 0.0198   Epoch: 11   Global Step: 460670   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:34,274-Speed 2624.96 samples/sec   Loss 5.8606   LearningRate 0.0198   Epoch: 11   Global Step: 460680   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:38,170-Speed 2628.49 samples/sec   Loss 5.9850   LearningRate 0.0198   Epoch: 11   Global Step: 460690   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:42,063-Speed 2630.96 samples/sec   Loss 5.9780   LearningRate 0.0198   Epoch: 11   Global Step: 460700   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:45,954-Speed 2632.65 samples/sec   Loss 6.0282   LearningRate 0.0198   Epoch: 11   Global Step: 460710   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:49,855-Speed 2625.44 samples/sec   Loss 5.7882   LearningRate 0.0198   Epoch: 11   Global Step: 460720   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:53,760-Speed 2622.77 samples/sec   Loss 5.9366   LearningRate 0.0198   Epoch: 11   Global Step: 460730   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:30:57,662-Speed 2625.15 samples/sec   Loss 6.0086   LearningRate 0.0198   Epoch: 11   Global Step: 460740   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:31:01,573-Speed 2618.63 samples/sec   Loss 5.8989   LearningRate 0.0198   Epoch: 11   Global Step: 460750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:05,472-Speed 2627.16 samples/sec   Loss 5.9254   LearningRate 0.0198   Epoch: 11   Global Step: 460760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:09,367-Speed 2629.92 samples/sec   Loss 5.8872   LearningRate 0.0198   Epoch: 11   Global Step: 460770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:13,264-Speed 2627.85 samples/sec   Loss 5.8815   LearningRate 0.0198   Epoch: 11   Global Step: 460780   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:17,159-Speed 2629.41 samples/sec   Loss 5.9480   LearningRate 0.0198   Epoch: 11   Global Step: 460790   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:21,069-Speed 2620.01 samples/sec   Loss 5.9693   LearningRate 0.0198   Epoch: 11   Global Step: 460800   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:24,965-Speed 2628.95 samples/sec   Loss 5.9813   LearningRate 0.0198   Epoch: 11   Global Step: 460810   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:28,862-Speed 2628.38 samples/sec   Loss 5.8703   LearningRate 0.0198   Epoch: 11   Global Step: 460820   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:32,765-Speed 2623.74 samples/sec   Loss 6.0856   LearningRate 0.0198   Epoch: 11   Global Step: 460830   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:36,664-Speed 2626.54 samples/sec   Loss 6.0098   LearningRate 0.0198   Epoch: 11   Global Step: 460840   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:31:40,561-Speed 2628.82 samples/sec   Loss 5.9578   LearningRate 0.0198   Epoch: 11   Global Step: 460850   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:31:44,452-Speed 2632.29 samples/sec   Loss 5.8506   LearningRate 0.0198   Epoch: 11   Global Step: 460860   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:31:48,343-Speed 2632.38 samples/sec   Loss 5.8542   LearningRate 0.0198   Epoch: 11   Global Step: 460870   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:31:52,236-Speed 2631.25 samples/sec   Loss 5.9582   LearningRate 0.0198   Epoch: 11   Global Step: 460880   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:31:56,128-Speed 2631.45 samples/sec   Loss 6.0517   LearningRate 0.0198   Epoch: 11   Global Step: 460890   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:00,026-Speed 2627.90 samples/sec   Loss 5.9105   LearningRate 0.0198   Epoch: 11   Global Step: 460900   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:03,919-Speed 2630.28 samples/sec   Loss 5.9269   LearningRate 0.0197   Epoch: 11   Global Step: 460910   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:07,823-Speed 2623.81 samples/sec   Loss 5.9453   LearningRate 0.0197   Epoch: 11   Global Step: 460920   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:11,717-Speed 2629.87 samples/sec   Loss 5.9604   LearningRate 0.0197   Epoch: 11   Global Step: 460930   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:15,638-Speed 2612.37 samples/sec   Loss 5.9185   LearningRate 0.0197   Epoch: 11   Global Step: 460940   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:19,517-Speed 2641.24 samples/sec   Loss 5.9250   LearningRate 0.0197   Epoch: 11   Global Step: 460950   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:23,422-Speed 2622.38 samples/sec   Loss 5.9558   LearningRate 0.0197   Epoch: 11   Global Step: 460960   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:27,317-Speed 2630.10 samples/sec   Loss 5.9265   LearningRate 0.0197   Epoch: 11   Global Step: 460970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:31,208-Speed 2632.12 samples/sec   Loss 5.9220   LearningRate 0.0197   Epoch: 11   Global Step: 460980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:35,097-Speed 2633.53 samples/sec   Loss 5.9478   LearningRate 0.0197   Epoch: 11   Global Step: 460990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:32:38,968-Speed 2645.72 samples/sec   Loss 5.9751   LearningRate 0.0197   Epoch: 11   Global Step: 461000   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:32:42,871-Speed 2624.48 samples/sec   Loss 5.9011   LearningRate 0.0197   Epoch: 11   Global Step: 461010   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:32:46,767-Speed 2628.40 samples/sec   Loss 5.9699   LearningRate 0.0197   Epoch: 11   Global Step: 461020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:32:50,668-Speed 2626.14 samples/sec   Loss 5.9249   LearningRate 0.0197   Epoch: 11   Global Step: 461030   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:32:54,563-Speed 2629.07 samples/sec   Loss 5.9903   LearningRate 0.0197   Epoch: 11   Global Step: 461040   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:32:58,460-Speed 2628.94 samples/sec   Loss 5.8599   LearningRate 0.0197   Epoch: 11   Global Step: 461050   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:02,350-Speed 2632.52 samples/sec   Loss 5.7924   LearningRate 0.0197   Epoch: 11   Global Step: 461060   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:06,245-Speed 2629.91 samples/sec   Loss 5.9508   LearningRate 0.0197   Epoch: 11   Global Step: 461070   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:10,147-Speed 2624.70 samples/sec   Loss 5.9603   LearningRate 0.0197   Epoch: 11   Global Step: 461080   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:14,069-Speed 2611.58 samples/sec   Loss 5.8163   LearningRate 0.0197   Epoch: 11   Global Step: 461090   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:17,964-Speed 2629.49 samples/sec   Loss 5.8611   LearningRate 0.0197   Epoch: 11   Global Step: 461100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:33:21,848-Speed 2636.81 samples/sec   Loss 5.9613   LearningRate 0.0197   Epoch: 11   Global Step: 461110   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:25,754-Speed 2622.96 samples/sec   Loss 6.0704   LearningRate 0.0197   Epoch: 11   Global Step: 461120   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:29,648-Speed 2630.42 samples/sec   Loss 5.8855   LearningRate 0.0197   Epoch: 11   Global Step: 461130   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:33,542-Speed 2630.12 samples/sec   Loss 5.9174   LearningRate 0.0197   Epoch: 11   Global Step: 461140   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:37,437-Speed 2629.14 samples/sec   Loss 5.9593   LearningRate 0.0197   Epoch: 11   Global Step: 461150   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:41,351-Speed 2617.67 samples/sec   Loss 5.8743   LearningRate 0.0197   Epoch: 11   Global Step: 461160   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:45,266-Speed 2615.70 samples/sec   Loss 5.8641   LearningRate 0.0197   Epoch: 11   Global Step: 461170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:49,180-Speed 2617.85 samples/sec   Loss 5.9145   LearningRate 0.0197   Epoch: 11   Global Step: 461180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:53,078-Speed 2626.90 samples/sec   Loss 5.9743   LearningRate 0.0197   Epoch: 11   Global Step: 461190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:33:56,972-Speed 2630.38 samples/sec   Loss 5.9841   LearningRate 0.0197   Epoch: 11   Global Step: 461200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:00,868-Speed 2629.44 samples/sec   Loss 5.8880   LearningRate 0.0197   Epoch: 11   Global Step: 461210   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:34:04,762-Speed 2629.65 samples/sec   Loss 5.9015   LearningRate 0.0197   Epoch: 11   Global Step: 461220   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:34:08,660-Speed 2627.31 samples/sec   Loss 5.9604   LearningRate 0.0197   Epoch: 11   Global Step: 461230   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:34:12,554-Speed 2631.09 samples/sec   Loss 5.8706   LearningRate 0.0197   Epoch: 11   Global Step: 461240   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:34:16,452-Speed 2627.28 samples/sec   Loss 5.9358   LearningRate 0.0197   Epoch: 11   Global Step: 461250   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:34:20,360-Speed 2621.37 samples/sec   Loss 5.9177   LearningRate 0.0197   Epoch: 11   Global Step: 461260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:34:24,276-Speed 2615.35 samples/sec   Loss 5.9353   LearningRate 0.0197   Epoch: 11   Global Step: 461270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:34:28,147-Speed 2646.33 samples/sec   Loss 5.8463   LearningRate 0.0197   Epoch: 11   Global Step: 461280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:32,041-Speed 2630.30 samples/sec   Loss 5.9975   LearningRate 0.0197   Epoch: 11   Global Step: 461290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:35,942-Speed 2625.58 samples/sec   Loss 5.8356   LearningRate 0.0197   Epoch: 11   Global Step: 461300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:39,853-Speed 2618.70 samples/sec   Loss 5.9005   LearningRate 0.0197   Epoch: 11   Global Step: 461310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:43,748-Speed 2629.99 samples/sec   Loss 5.9697   LearningRate 0.0197   Epoch: 11   Global Step: 461320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:47,639-Speed 2632.33 samples/sec   Loss 5.9839   LearningRate 0.0197   Epoch: 11   Global Step: 461330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:51,532-Speed 2630.74 samples/sec   Loss 5.9198   LearningRate 0.0197   Epoch: 11   Global Step: 461340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:55,450-Speed 2613.98 samples/sec   Loss 5.9253   LearningRate 0.0197   Epoch: 11   Global Step: 461350   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:34:59,350-Speed 2626.71 samples/sec   Loss 5.7891   LearningRate 0.0197   Epoch: 11   Global Step: 461360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:03,249-Speed 2626.91 samples/sec   Loss 5.9037   LearningRate 0.0197   Epoch: 11   Global Step: 461370   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:07,145-Speed 2629.06 samples/sec   Loss 6.0204   LearningRate 0.0197   Epoch: 11   Global Step: 461380   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:35:11,046-Speed 2625.24 samples/sec   Loss 5.8962   LearningRate 0.0197   Epoch: 11   Global Step: 461390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:35:14,956-Speed 2619.83 samples/sec   Loss 5.9133   LearningRate 0.0197   Epoch: 11   Global Step: 461400   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:35:18,833-Speed 2641.74 samples/sec   Loss 5.9239   LearningRate 0.0197   Epoch: 11   Global Step: 461410   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:22,726-Speed 2630.51 samples/sec   Loss 5.8939   LearningRate 0.0197   Epoch: 11   Global Step: 461420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:26,632-Speed 2622.93 samples/sec   Loss 5.8798   LearningRate 0.0197   Epoch: 11   Global Step: 461430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:30,570-Speed 2600.85 samples/sec   Loss 5.9869   LearningRate 0.0197   Epoch: 11   Global Step: 461440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:34,836-Speed 2402.09 samples/sec   Loss 6.0217   LearningRate 0.0197   Epoch: 11   Global Step: 461450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:38,736-Speed 2625.97 samples/sec   Loss 5.8537   LearningRate 0.0197   Epoch: 11   Global Step: 461460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:42,648-Speed 2618.27 samples/sec   Loss 5.9506   LearningRate 0.0197   Epoch: 11   Global Step: 461470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:46,541-Speed 2630.94 samples/sec   Loss 5.8315   LearningRate 0.0197   Epoch: 11   Global Step: 461480   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:50,442-Speed 2625.79 samples/sec   Loss 5.9906   LearningRate 0.0197   Epoch: 11   Global Step: 461490   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:54,493-Speed 2528.55 samples/sec   Loss 5.8740   LearningRate 0.0197   Epoch: 11   Global Step: 461500   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:35:58,432-Speed 2600.72 samples/sec   Loss 6.0161   LearningRate 0.0197   Epoch: 11   Global Step: 461510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:36:02,444-Speed 2552.75 samples/sec   Loss 5.9049   LearningRate 0.0197   Epoch: 11   Global Step: 461520   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:06,457-Speed 2552.73 samples/sec   Loss 6.0216   LearningRate 0.0197   Epoch: 11   Global Step: 461530   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:10,359-Speed 2624.25 samples/sec   Loss 5.8614   LearningRate 0.0197   Epoch: 11   Global Step: 461540   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:14,396-Speed 2537.96 samples/sec   Loss 5.9807   LearningRate 0.0197   Epoch: 11   Global Step: 461550   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:18,317-Speed 2611.83 samples/sec   Loss 5.8920   LearningRate 0.0197   Epoch: 11   Global Step: 461560   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:22,214-Speed 2628.51 samples/sec   Loss 5.8616   LearningRate 0.0197   Epoch: 11   Global Step: 461570   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:26,114-Speed 2626.09 samples/sec   Loss 5.9027   LearningRate 0.0197   Epoch: 11   Global Step: 461580   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:30,015-Speed 2625.52 samples/sec   Loss 5.9140   LearningRate 0.0197   Epoch: 11   Global Step: 461590   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:33,920-Speed 2622.49 samples/sec   Loss 5.8229   LearningRate 0.0197   Epoch: 11   Global Step: 461600   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:37,821-Speed 2625.53 samples/sec   Loss 5.9580   LearningRate 0.0197   Epoch: 11   Global Step: 461610   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:36:41,723-Speed 2624.87 samples/sec   Loss 5.9623   LearningRate 0.0197   Epoch: 11   Global Step: 461620   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:36:45,621-Speed 2628.42 samples/sec   Loss 5.9407   LearningRate 0.0197   Epoch: 11   Global Step: 461630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:36:49,516-Speed 2629.45 samples/sec   Loss 5.9008   LearningRate 0.0197   Epoch: 11   Global Step: 461640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:36:53,428-Speed 2618.67 samples/sec   Loss 5.8088   LearningRate 0.0197   Epoch: 11   Global Step: 461650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:36:57,425-Speed 2562.59 samples/sec   Loss 5.8479   LearningRate 0.0197   Epoch: 11   Global Step: 461660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:01,319-Speed 2630.23 samples/sec   Loss 5.9547   LearningRate 0.0197   Epoch: 11   Global Step: 461670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:05,252-Speed 2603.73 samples/sec   Loss 5.8733   LearningRate 0.0197   Epoch: 11   Global Step: 461680   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:09,154-Speed 2625.27 samples/sec   Loss 5.9365   LearningRate 0.0197   Epoch: 11   Global Step: 461690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:13,065-Speed 2619.51 samples/sec   Loss 5.9955   LearningRate 0.0197   Epoch: 11   Global Step: 461700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:16,960-Speed 2629.34 samples/sec   Loss 5.9381   LearningRate 0.0197   Epoch: 11   Global Step: 461710   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:20,861-Speed 2625.71 samples/sec   Loss 5.9803   LearningRate 0.0197   Epoch: 11   Global Step: 461720   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:24,769-Speed 2620.74 samples/sec   Loss 5.8743   LearningRate 0.0197   Epoch: 11   Global Step: 461730   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:37:28,650-Speed 2639.68 samples/sec   Loss 5.8323   LearningRate 0.0197   Epoch: 11   Global Step: 461740   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:32,542-Speed 2631.39 samples/sec   Loss 5.9727   LearningRate 0.0197   Epoch: 11   Global Step: 461750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:36,435-Speed 2631.04 samples/sec   Loss 6.0574   LearningRate 0.0197   Epoch: 11   Global Step: 461760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:40,331-Speed 2628.77 samples/sec   Loss 5.9585   LearningRate 0.0197   Epoch: 11   Global Step: 461770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:44,224-Speed 2631.62 samples/sec   Loss 6.0069   LearningRate 0.0197   Epoch: 11   Global Step: 461780   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:48,117-Speed 2630.65 samples/sec   Loss 5.8709   LearningRate 0.0197   Epoch: 11   Global Step: 461790   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:52,008-Speed 2632.31 samples/sec   Loss 6.0385   LearningRate 0.0197   Epoch: 11   Global Step: 461800   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:55,921-Speed 2618.00 samples/sec   Loss 5.9156   LearningRate 0.0197   Epoch: 11   Global Step: 461810   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:37:59,814-Speed 2630.55 samples/sec   Loss 5.9600   LearningRate 0.0197   Epoch: 11   Global Step: 461820   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:03,709-Speed 2629.30 samples/sec   Loss 5.9184   LearningRate 0.0197   Epoch: 11   Global Step: 461830   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:07,628-Speed 2614.12 samples/sec   Loss 5.9418   LearningRate 0.0197   Epoch: 11   Global Step: 461840   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:38:11,522-Speed 2630.11 samples/sec   Loss 5.9942   LearningRate 0.0196   Epoch: 11   Global Step: 461850   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:38:15,400-Speed 2641.33 samples/sec   Loss 5.8555   LearningRate 0.0196   Epoch: 11   Global Step: 461860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:19,309-Speed 2620.66 samples/sec   Loss 5.8587   LearningRate 0.0196   Epoch: 11   Global Step: 461870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:23,205-Speed 2628.70 samples/sec   Loss 5.9262   LearningRate 0.0196   Epoch: 11   Global Step: 461880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:27,097-Speed 2631.72 samples/sec   Loss 5.8928   LearningRate 0.0196   Epoch: 11   Global Step: 461890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:31,002-Speed 2623.10 samples/sec   Loss 6.0075   LearningRate 0.0196   Epoch: 11   Global Step: 461900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:34,905-Speed 2624.16 samples/sec   Loss 5.9267   LearningRate 0.0196   Epoch: 11   Global Step: 461910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:38,803-Speed 2627.38 samples/sec   Loss 5.9527   LearningRate 0.0196   Epoch: 11   Global Step: 461920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:42,692-Speed 2634.28 samples/sec   Loss 5.9548   LearningRate 0.0196   Epoch: 11   Global Step: 461930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:46,611-Speed 2613.50 samples/sec   Loss 5.8598   LearningRate 0.0196   Epoch: 11   Global Step: 461940   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:50,508-Speed 2628.32 samples/sec   Loss 5.8339   LearningRate 0.0196   Epoch: 11   Global Step: 461950   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:38:54,401-Speed 2630.47 samples/sec   Loss 5.9018   LearningRate 0.0196   Epoch: 11   Global Step: 461960   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:38:58,310-Speed 2620.54 samples/sec   Loss 6.0068   LearningRate 0.0196   Epoch: 11   Global Step: 461970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:02,244-Speed 2603.01 samples/sec   Loss 5.8926   LearningRate 0.0196   Epoch: 11   Global Step: 461980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:06,142-Speed 2627.99 samples/sec   Loss 5.8432   LearningRate 0.0196   Epoch: 11   Global Step: 461990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:10,051-Speed 2619.66 samples/sec   Loss 5.8361   LearningRate 0.0196   Epoch: 11   Global Step: 462000   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:13,950-Speed 2627.67 samples/sec   Loss 5.9144   LearningRate 0.0196   Epoch: 11   Global Step: 462010   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:17,848-Speed 2627.19 samples/sec   Loss 5.8293   LearningRate 0.0196   Epoch: 11   Global Step: 462020   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:21,749-Speed 2626.01 samples/sec   Loss 5.8825   LearningRate 0.0196   Epoch: 11   Global Step: 462030   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:25,649-Speed 2626.27 samples/sec   Loss 5.9157   LearningRate 0.0196   Epoch: 11   Global Step: 462040   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:29,547-Speed 2627.32 samples/sec   Loss 5.9216   LearningRate 0.0196   Epoch: 11   Global Step: 462050   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:33,422-Speed 2642.78 samples/sec   Loss 5.9725   LearningRate 0.0196   Epoch: 11   Global Step: 462060   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:37,323-Speed 2625.41 samples/sec   Loss 5.9800   LearningRate 0.0196   Epoch: 11   Global Step: 462070   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:41,220-Speed 2628.19 samples/sec   Loss 5.9629   LearningRate 0.0196   Epoch: 11   Global Step: 462080   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:45,114-Speed 2630.79 samples/sec   Loss 5.9376   LearningRate 0.0196   Epoch: 11   Global Step: 462090   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:49,014-Speed 2625.85 samples/sec   Loss 5.8511   LearningRate 0.0196   Epoch: 11   Global Step: 462100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:39:52,904-Speed 2633.32 samples/sec   Loss 5.9184   LearningRate 0.0196   Epoch: 11   Global Step: 462110   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:39:56,805-Speed 2626.18 samples/sec   Loss 5.9049   LearningRate 0.0196   Epoch: 11   Global Step: 462120   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:00,697-Speed 2631.67 samples/sec   Loss 5.9701   LearningRate 0.0196   Epoch: 11   Global Step: 462130   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:04,596-Speed 2626.47 samples/sec   Loss 5.9023   LearningRate 0.0196   Epoch: 11   Global Step: 462140   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:08,492-Speed 2629.23 samples/sec   Loss 6.0208   LearningRate 0.0196   Epoch: 11   Global Step: 462150   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:12,391-Speed 2626.80 samples/sec   Loss 5.9052   LearningRate 0.0196   Epoch: 11   Global Step: 462160   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:16,287-Speed 2628.57 samples/sec   Loss 5.8250   LearningRate 0.0196   Epoch: 11   Global Step: 462170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:20,185-Speed 2627.72 samples/sec   Loss 5.9352   LearningRate 0.0196   Epoch: 11   Global Step: 462180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:24,078-Speed 2631.25 samples/sec   Loss 5.9245   LearningRate 0.0196   Epoch: 11   Global Step: 462190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:27,988-Speed 2619.83 samples/sec   Loss 5.8474   LearningRate 0.0196   Epoch: 11   Global Step: 462200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:31,890-Speed 2625.15 samples/sec   Loss 6.0346   LearningRate 0.0196   Epoch: 11   Global Step: 462210   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:40:35,789-Speed 2626.55 samples/sec   Loss 5.9772   LearningRate 0.0196   Epoch: 11   Global Step: 462220   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:40:39,748-Speed 2587.21 samples/sec   Loss 5.9024   LearningRate 0.0196   Epoch: 11   Global Step: 462230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:43,645-Speed 2627.91 samples/sec   Loss 6.0089   LearningRate 0.0196   Epoch: 11   Global Step: 462240   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:47,544-Speed 2627.36 samples/sec   Loss 5.9355   LearningRate 0.0196   Epoch: 11   Global Step: 462250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:51,444-Speed 2625.94 samples/sec   Loss 5.8516   LearningRate 0.0196   Epoch: 11   Global Step: 462260   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:55,342-Speed 2627.17 samples/sec   Loss 5.8390   LearningRate 0.0196   Epoch: 11   Global Step: 462270   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:40:59,249-Speed 2621.76 samples/sec   Loss 6.0435   LearningRate 0.0196   Epoch: 11   Global Step: 462280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:41:03,148-Speed 2626.56 samples/sec   Loss 5.8527   LearningRate 0.0196   Epoch: 11   Global Step: 462290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:41:07,043-Speed 2629.97 samples/sec   Loss 5.9003   LearningRate 0.0196   Epoch: 11   Global Step: 462300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:41:10,939-Speed 2629.00 samples/sec   Loss 5.9553   LearningRate 0.0196   Epoch: 11   Global Step: 462310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:41:14,832-Speed 2630.79 samples/sec   Loss 5.8994   LearningRate 0.0196   Epoch: 11   Global Step: 462320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:41:18,723-Speed 2632.79 samples/sec   Loss 5.9788   LearningRate 0.0196   Epoch: 11   Global Step: 462330   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:22,624-Speed 2625.52 samples/sec   Loss 5.9276   LearningRate 0.0196   Epoch: 11   Global Step: 462340   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:26,532-Speed 2620.36 samples/sec   Loss 6.0051   LearningRate 0.0196   Epoch: 11   Global Step: 462350   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:30,600-Speed 2517.67 samples/sec   Loss 5.9566   LearningRate 0.0196   Epoch: 11   Global Step: 462360   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:34,495-Speed 2629.48 samples/sec   Loss 5.9291   LearningRate 0.0196   Epoch: 11   Global Step: 462370   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:38,402-Speed 2621.38 samples/sec   Loss 5.8923   LearningRate 0.0196   Epoch: 11   Global Step: 462380   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:42,309-Speed 2622.13 samples/sec   Loss 5.8377   LearningRate 0.0196   Epoch: 11   Global Step: 462390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:46,206-Speed 2628.23 samples/sec   Loss 5.8709   LearningRate 0.0196   Epoch: 11   Global Step: 462400   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:50,108-Speed 2624.91 samples/sec   Loss 5.8713   LearningRate 0.0196   Epoch: 11   Global Step: 462410   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:54,010-Speed 2624.67 samples/sec   Loss 5.9216   LearningRate 0.0196   Epoch: 11   Global Step: 462420   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:41:57,888-Speed 2640.88 samples/sec   Loss 6.0051   LearningRate 0.0196   Epoch: 11   Global Step: 462430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:01,783-Speed 2629.96 samples/sec   Loss 5.8461   LearningRate 0.0196   Epoch: 11   Global Step: 462440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:05,694-Speed 2618.36 samples/sec   Loss 5.9081   LearningRate 0.0196   Epoch: 11   Global Step: 462450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:09,594-Speed 2626.71 samples/sec   Loss 5.9250   LearningRate 0.0196   Epoch: 11   Global Step: 462460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:13,499-Speed 2622.46 samples/sec   Loss 5.8896   LearningRate 0.0196   Epoch: 11   Global Step: 462470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:17,399-Speed 2627.08 samples/sec   Loss 6.1239   LearningRate 0.0196   Epoch: 11   Global Step: 462480   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:21,297-Speed 2627.18 samples/sec   Loss 5.7696   LearningRate 0.0196   Epoch: 11   Global Step: 462490   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:25,201-Speed 2623.77 samples/sec   Loss 5.9714   LearningRate 0.0196   Epoch: 11   Global Step: 462500   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:29,095-Speed 2630.23 samples/sec   Loss 5.8396   LearningRate 0.0196   Epoch: 11   Global Step: 462510   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:32,999-Speed 2624.19 samples/sec   Loss 5.9959   LearningRate 0.0196   Epoch: 11   Global Step: 462520   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:42:36,906-Speed 2621.13 samples/sec   Loss 6.0174   LearningRate 0.0196   Epoch: 11   Global Step: 462530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:42:40,811-Speed 2622.90 samples/sec   Loss 5.8787   LearningRate 0.0196   Epoch: 11   Global Step: 462540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:42:44,707-Speed 2629.19 samples/sec   Loss 5.8779   LearningRate 0.0196   Epoch: 11   Global Step: 462550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:42:48,615-Speed 2620.90 samples/sec   Loss 5.8854   LearningRate 0.0196   Epoch: 11   Global Step: 462560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:42:52,521-Speed 2622.37 samples/sec   Loss 5.8468   LearningRate 0.0196   Epoch: 11   Global Step: 462570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:42:56,417-Speed 2628.51 samples/sec   Loss 5.8092   LearningRate 0.0196   Epoch: 11   Global Step: 462580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:43:00,315-Speed 2627.84 samples/sec   Loss 5.8540   LearningRate 0.0196   Epoch: 11   Global Step: 462590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:43:04,200-Speed 2636.56 samples/sec   Loss 5.9761   LearningRate 0.0196   Epoch: 11   Global Step: 462600   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:08,094-Speed 2629.82 samples/sec   Loss 5.9869   LearningRate 0.0196   Epoch: 11   Global Step: 462610   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:11,991-Speed 2628.31 samples/sec   Loss 5.8866   LearningRate 0.0196   Epoch: 11   Global Step: 462620   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:15,890-Speed 2627.07 samples/sec   Loss 5.9897   LearningRate 0.0196   Epoch: 11   Global Step: 462630   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:19,785-Speed 2629.73 samples/sec   Loss 5.8953   LearningRate 0.0196   Epoch: 11   Global Step: 462640   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:23,683-Speed 2626.88 samples/sec   Loss 5.9191   LearningRate 0.0196   Epoch: 11   Global Step: 462650   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:27,580-Speed 2628.83 samples/sec   Loss 5.9336   LearningRate 0.0196   Epoch: 11   Global Step: 462660   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:31,508-Speed 2607.34 samples/sec   Loss 5.9009   LearningRate 0.0196   Epoch: 11   Global Step: 462670   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:35,416-Speed 2620.53 samples/sec   Loss 5.9324   LearningRate 0.0196   Epoch: 11   Global Step: 462680   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:39,315-Speed 2627.08 samples/sec   Loss 5.8767   LearningRate 0.0196   Epoch: 11   Global Step: 462690   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:43,303-Speed 2568.23 samples/sec   Loss 5.9085   LearningRate 0.0196   Epoch: 11   Global Step: 462700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:43:47,210-Speed 2621.78 samples/sec   Loss 5.9263   LearningRate 0.0196   Epoch: 11   Global Step: 462710   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:43:51,099-Speed 2633.67 samples/sec   Loss 5.9271   LearningRate 0.0196   Epoch: 11   Global Step: 462720   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:54,997-Speed 2627.14 samples/sec   Loss 5.8628   LearningRate 0.0196   Epoch: 11   Global Step: 462730   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:43:58,902-Speed 2623.23 samples/sec   Loss 5.9567   LearningRate 0.0196   Epoch: 11   Global Step: 462740   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:02,811-Speed 2620.37 samples/sec   Loss 5.9464   LearningRate 0.0196   Epoch: 11   Global Step: 462750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:06,704-Speed 2630.28 samples/sec   Loss 5.8470   LearningRate 0.0196   Epoch: 11   Global Step: 462760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:10,614-Speed 2619.90 samples/sec   Loss 5.8763   LearningRate 0.0196   Epoch: 11   Global Step: 462770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:14,570-Speed 2589.19 samples/sec   Loss 5.8372   LearningRate 0.0195   Epoch: 11   Global Step: 462780   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:18,496-Speed 2609.12 samples/sec   Loss 5.8504   LearningRate 0.0195   Epoch: 11   Global Step: 462790   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:22,394-Speed 2627.15 samples/sec   Loss 5.8769   LearningRate 0.0195   Epoch: 11   Global Step: 462800   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:26,311-Speed 2614.74 samples/sec   Loss 5.8631   LearningRate 0.0195   Epoch: 11   Global Step: 462810   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:30,243-Speed 2605.09 samples/sec   Loss 5.8958   LearningRate 0.0195   Epoch: 11   Global Step: 462820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:44:34,161-Speed 2613.82 samples/sec   Loss 5.8834   LearningRate 0.0195   Epoch: 11   Global Step: 462830   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:44:38,070-Speed 2620.21 samples/sec   Loss 6.0092   LearningRate 0.0195   Epoch: 11   Global Step: 462840   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:44:41,949-Speed 2640.51 samples/sec   Loss 5.9581   LearningRate 0.0195   Epoch: 11   Global Step: 462850   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:45,840-Speed 2631.71 samples/sec   Loss 5.8995   LearningRate 0.0195   Epoch: 11   Global Step: 462860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:44:49,768-Speed 2608.09 samples/sec   Loss 5.9354   LearningRate 0.0195   Epoch: 11   Global Step: 462870   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:44:53,664-Speed 2629.12 samples/sec   Loss 5.8791   LearningRate 0.0195   Epoch: 11   Global Step: 462880   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:44:57,556-Speed 2631.83 samples/sec   Loss 5.8994   LearningRate 0.0195   Epoch: 11   Global Step: 462890   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:01,448-Speed 2630.98 samples/sec   Loss 5.7071   LearningRate 0.0195   Epoch: 11   Global Step: 462900   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:05,346-Speed 2627.66 samples/sec   Loss 5.9045   LearningRate 0.0195   Epoch: 11   Global Step: 462910   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:09,257-Speed 2618.71 samples/sec   Loss 5.7912   LearningRate 0.0195   Epoch: 11   Global Step: 462920   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:13,152-Speed 2629.63 samples/sec   Loss 5.9345   LearningRate 0.0195   Epoch: 11   Global Step: 462930   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:17,051-Speed 2626.66 samples/sec   Loss 5.9734   LearningRate 0.0195   Epoch: 11   Global Step: 462940   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:20,947-Speed 2629.65 samples/sec   Loss 5.8827   LearningRate 0.0195   Epoch: 11   Global Step: 462950   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:24,852-Speed 2622.53 samples/sec   Loss 5.8041   LearningRate 0.0195   Epoch: 11   Global Step: 462960   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:28,748-Speed 2629.27 samples/sec   Loss 5.9984   LearningRate 0.0195   Epoch: 11   Global Step: 462970   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:45:32,645-Speed 2628.11 samples/sec   Loss 5.9221   LearningRate 0.0195   Epoch: 11   Global Step: 462980   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:45:36,550-Speed 2622.80 samples/sec   Loss 5.9013   LearningRate 0.0195   Epoch: 11   Global Step: 462990   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:45:40,456-Speed 2621.89 samples/sec   Loss 6.0196   LearningRate 0.0195   Epoch: 11   Global Step: 463000   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:45:44,356-Speed 2626.79 samples/sec   Loss 5.8925   LearningRate 0.0195   Epoch: 11   Global Step: 463010   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:45:48,247-Speed 2631.90 samples/sec   Loss 5.8890   LearningRate 0.0195   Epoch: 11   Global Step: 463020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:45:52,122-Speed 2643.17 samples/sec   Loss 5.8291   LearningRate 0.0195   Epoch: 11   Global Step: 463030   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:56,030-Speed 2620.51 samples/sec   Loss 5.8336   LearningRate 0.0195   Epoch: 11   Global Step: 463040   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:45:59,925-Speed 2629.84 samples/sec   Loss 5.7747   LearningRate 0.0195   Epoch: 11   Global Step: 463050   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:03,832-Speed 2621.78 samples/sec   Loss 5.8484   LearningRate 0.0195   Epoch: 11   Global Step: 463060   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:07,819-Speed 2569.13 samples/sec   Loss 5.9761   LearningRate 0.0195   Epoch: 11   Global Step: 463070   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:11,900-Speed 2509.58 samples/sec   Loss 5.8044   LearningRate 0.0195   Epoch: 11   Global Step: 463080   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:15,944-Speed 2532.46 samples/sec   Loss 5.8535   LearningRate 0.0195   Epoch: 11   Global Step: 463090   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:19,842-Speed 2628.06 samples/sec   Loss 5.9342   LearningRate 0.0195   Epoch: 11   Global Step: 463100   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:23,746-Speed 2623.40 samples/sec   Loss 5.9003   LearningRate 0.0195   Epoch: 11   Global Step: 463110   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:27,639-Speed 2630.77 samples/sec   Loss 5.8973   LearningRate 0.0195   Epoch: 11   Global Step: 463120   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:46:31,546-Speed 2621.49 samples/sec   Loss 5.9023   LearningRate 0.0195   Epoch: 11   Global Step: 463130   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:46:35,457-Speed 2618.81 samples/sec   Loss 5.9545   LearningRate 0.0195   Epoch: 11   Global Step: 463140   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:46:39,359-Speed 2625.36 samples/sec   Loss 5.9407   LearningRate 0.0195   Epoch: 11   Global Step: 463150   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:46:43,254-Speed 2629.67 samples/sec   Loss 5.8842   LearningRate 0.0195   Epoch: 11   Global Step: 463160   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:46:47,151-Speed 2628.49 samples/sec   Loss 5.9270   LearningRate 0.0195   Epoch: 11   Global Step: 463170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:46:51,050-Speed 2626.62 samples/sec   Loss 5.9900   LearningRate 0.0195   Epoch: 11   Global Step: 463180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:46:54,944-Speed 2630.86 samples/sec   Loss 5.9322   LearningRate 0.0195   Epoch: 11   Global Step: 463190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:46:58,843-Speed 2626.65 samples/sec   Loss 5.8586   LearningRate 0.0195   Epoch: 11   Global Step: 463200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:47:02,773-Speed 2605.93 samples/sec   Loss 5.9486   LearningRate 0.0195   Epoch: 11   Global Step: 463210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:47:06,672-Speed 2626.65 samples/sec   Loss 5.9125   LearningRate 0.0195   Epoch: 11   Global Step: 463220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:47:10,607-Speed 2602.76 samples/sec   Loss 5.8836   LearningRate 0.0195   Epoch: 11   Global Step: 463230   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:14,513-Speed 2622.67 samples/sec   Loss 5.8443   LearningRate 0.0195   Epoch: 11   Global Step: 463240   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:18,411-Speed 2627.47 samples/sec   Loss 5.8944   LearningRate 0.0195   Epoch: 11   Global Step: 463250   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:22,324-Speed 2617.85 samples/sec   Loss 5.8764   LearningRate 0.0195   Epoch: 11   Global Step: 463260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:26,220-Speed 2629.08 samples/sec   Loss 5.8946   LearningRate 0.0195   Epoch: 11   Global Step: 463270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:30,116-Speed 2628.43 samples/sec   Loss 5.8324   LearningRate 0.0195   Epoch: 11   Global Step: 463280   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:34,012-Speed 2629.07 samples/sec   Loss 5.8517   LearningRate 0.0195   Epoch: 11   Global Step: 463290   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:37,951-Speed 2600.43 samples/sec   Loss 5.9307   LearningRate 0.0195   Epoch: 11   Global Step: 463300   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:41,850-Speed 2626.53 samples/sec   Loss 5.9042   LearningRate 0.0195   Epoch: 11   Global Step: 463310   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:47:45,771-Speed 2613.03 samples/sec   Loss 5.9003   LearningRate 0.0195   Epoch: 11   Global Step: 463320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:47:49,680-Speed 2619.56 samples/sec   Loss 5.8403   LearningRate 0.0195   Epoch: 11   Global Step: 463330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:47:53,598-Speed 2614.45 samples/sec   Loss 5.8883   LearningRate 0.0195   Epoch: 11   Global Step: 463340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:47:57,497-Speed 2627.40 samples/sec   Loss 5.9206   LearningRate 0.0195   Epoch: 11   Global Step: 463350   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:48:01,395-Speed 2627.62 samples/sec   Loss 5.9591   LearningRate 0.0195   Epoch: 11   Global Step: 463360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:48:05,294-Speed 2626.32 samples/sec   Loss 5.9993   LearningRate 0.0195   Epoch: 11   Global Step: 463370   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:48:09,238-Speed 2597.72 samples/sec   Loss 5.9686   LearningRate 0.0195   Epoch: 11   Global Step: 463380   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:48:13,134-Speed 2629.02 samples/sec   Loss 6.0411   LearningRate 0.0195   Epoch: 11   Global Step: 463390   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:48:17,029-Speed 2629.35 samples/sec   Loss 5.8094   LearningRate 0.0195   Epoch: 11   Global Step: 463400   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:48:20,928-Speed 2627.19 samples/sec   Loss 5.9763   LearningRate 0.0195   Epoch: 11   Global Step: 463410   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:48:24,825-Speed 2628.57 samples/sec   Loss 5.8682   LearningRate 0.0195   Epoch: 11   Global Step: 463420   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:28,722-Speed 2628.25 samples/sec   Loss 5.8903   LearningRate 0.0195   Epoch: 11   Global Step: 463430   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:32,617-Speed 2629.27 samples/sec   Loss 5.9091   LearningRate 0.0195   Epoch: 11   Global Step: 463440   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:36,530-Speed 2617.34 samples/sec   Loss 5.8556   LearningRate 0.0195   Epoch: 11   Global Step: 463450   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:40,429-Speed 2627.57 samples/sec   Loss 5.9089   LearningRate 0.0195   Epoch: 11   Global Step: 463460   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:44,327-Speed 2628.02 samples/sec   Loss 5.9739   LearningRate 0.0195   Epoch: 11   Global Step: 463470   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:48,217-Speed 2632.91 samples/sec   Loss 5.9126   LearningRate 0.0195   Epoch: 11   Global Step: 463480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:52,114-Speed 2628.72 samples/sec   Loss 5.7536   LearningRate 0.0195   Epoch: 11   Global Step: 463490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:56,006-Speed 2631.31 samples/sec   Loss 5.7606   LearningRate 0.0195   Epoch: 11   Global Step: 463500   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:48:59,901-Speed 2629.02 samples/sec   Loss 5.9351   LearningRate 0.0195   Epoch: 11   Global Step: 463510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:03,797-Speed 2629.17 samples/sec   Loss 5.8401   LearningRate 0.0195   Epoch: 11   Global Step: 463520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:07,695-Speed 2627.63 samples/sec   Loss 5.9618   LearningRate 0.0195   Epoch: 11   Global Step: 463530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:11,596-Speed 2625.77 samples/sec   Loss 5.9183   LearningRate 0.0195   Epoch: 11   Global Step: 463540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:15,493-Speed 2628.35 samples/sec   Loss 5.9384   LearningRate 0.0195   Epoch: 11   Global Step: 463550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:19,391-Speed 2627.99 samples/sec   Loss 5.9218   LearningRate 0.0195   Epoch: 11   Global Step: 463560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:23,289-Speed 2627.20 samples/sec   Loss 6.0049   LearningRate 0.0195   Epoch: 11   Global Step: 463570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:27,192-Speed 2624.75 samples/sec   Loss 5.9497   LearningRate 0.0195   Epoch: 11   Global Step: 463580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:31,087-Speed 2629.40 samples/sec   Loss 5.9196   LearningRate 0.0195   Epoch: 11   Global Step: 463590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:34,987-Speed 2626.11 samples/sec   Loss 5.8782   LearningRate 0.0195   Epoch: 11   Global Step: 463600   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:38,881-Speed 2629.98 samples/sec   Loss 5.8473   LearningRate 0.0195   Epoch: 11   Global Step: 463610   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:42,764-Speed 2638.38 samples/sec   Loss 5.9439   LearningRate 0.0195   Epoch: 11   Global Step: 463620   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:46,657-Speed 2631.31 samples/sec   Loss 5.8613   LearningRate 0.0195   Epoch: 11   Global Step: 463630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:50,550-Speed 2631.57 samples/sec   Loss 5.9220   LearningRate 0.0195   Epoch: 11   Global Step: 463640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:54,459-Speed 2619.57 samples/sec   Loss 5.8675   LearningRate 0.0195   Epoch: 11   Global Step: 463650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:49:58,357-Speed 2628.47 samples/sec   Loss 6.0114   LearningRate 0.0195   Epoch: 11   Global Step: 463660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:50:02,236-Speed 2640.48 samples/sec   Loss 5.8442   LearningRate 0.0195   Epoch: 11   Global Step: 463670   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:50:06,132-Speed 2628.79 samples/sec   Loss 5.8295   LearningRate 0.0195   Epoch: 11   Global Step: 463680   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:50:10,030-Speed 2626.92 samples/sec   Loss 5.8948   LearningRate 0.0195   Epoch: 11   Global Step: 463690   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:50:13,923-Speed 2631.15 samples/sec   Loss 5.8156   LearningRate 0.0195   Epoch: 11   Global Step: 463700   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:50:17,827-Speed 2624.20 samples/sec   Loss 5.9570   LearningRate 0.0195   Epoch: 11   Global Step: 463710   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:50:21,700-Speed 2644.85 samples/sec   Loss 5.9202   LearningRate 0.0194   Epoch: 11   Global Step: 463720   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:25,593-Speed 2630.73 samples/sec   Loss 6.0038   LearningRate 0.0194   Epoch: 11   Global Step: 463730   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:29,495-Speed 2625.88 samples/sec   Loss 5.8436   LearningRate 0.0194   Epoch: 11   Global Step: 463740   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:33,408-Speed 2616.84 samples/sec   Loss 5.8631   LearningRate 0.0194   Epoch: 11   Global Step: 463750   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:37,317-Speed 2620.37 samples/sec   Loss 5.8936   LearningRate 0.0194   Epoch: 11   Global Step: 463760   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:41,217-Speed 2625.99 samples/sec   Loss 5.8314   LearningRate 0.0194   Epoch: 11   Global Step: 463770   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:45,120-Speed 2625.08 samples/sec   Loss 5.9038   LearningRate 0.0194   Epoch: 11   Global Step: 463780   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:49,022-Speed 2624.26 samples/sec   Loss 5.9874   LearningRate 0.0194   Epoch: 11   Global Step: 463790   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:52,933-Speed 2619.17 samples/sec   Loss 5.9862   LearningRate 0.0194   Epoch: 11   Global Step: 463800   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:50:56,840-Speed 2622.05 samples/sec   Loss 5.7647   LearningRate 0.0194   Epoch: 11   Global Step: 463810   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:51:00,733-Speed 2630.98 samples/sec   Loss 5.9611   LearningRate 0.0194   Epoch: 11   Global Step: 463820   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:04,643-Speed 2619.46 samples/sec   Loss 5.8482   LearningRate 0.0194   Epoch: 11   Global Step: 463830   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:08,647-Speed 2558.16 samples/sec   Loss 5.8370   LearningRate 0.0194   Epoch: 11   Global Step: 463840   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:12,680-Speed 2539.45 samples/sec   Loss 5.9984   LearningRate 0.0194   Epoch: 11   Global Step: 463850   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:16,579-Speed 2626.74 samples/sec   Loss 5.6758   LearningRate 0.0194   Epoch: 11   Global Step: 463860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:20,479-Speed 2626.95 samples/sec   Loss 5.9567   LearningRate 0.0194   Epoch: 11   Global Step: 463870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:24,380-Speed 2625.43 samples/sec   Loss 5.8163   LearningRate 0.0194   Epoch: 11   Global Step: 463880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:28,283-Speed 2625.96 samples/sec   Loss 5.7364   LearningRate 0.0194   Epoch: 11   Global Step: 463890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:32,181-Speed 2627.55 samples/sec   Loss 5.8370   LearningRate 0.0194   Epoch: 11   Global Step: 463900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:36,268-Speed 2505.65 samples/sec   Loss 5.9062   LearningRate 0.0194   Epoch: 11   Global Step: 463910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:51:40,356-Speed 2505.48 samples/sec   Loss 5.8421   LearningRate 0.0194   Epoch: 11   Global Step: 463920   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:51:44,466-Speed 2492.42 samples/sec   Loss 5.9053   LearningRate 0.0194   Epoch: 11   Global Step: 463930   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:51:48,562-Speed 2500.65 samples/sec   Loss 5.9214   LearningRate 0.0194   Epoch: 11   Global Step: 463940   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:51:52,582-Speed 2547.95 samples/sec   Loss 5.7989   LearningRate 0.0194   Epoch: 11   Global Step: 463950   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:51:56,474-Speed 2632.17 samples/sec   Loss 5.9088   LearningRate 0.0194   Epoch: 11   Global Step: 463960   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:00,367-Speed 2631.27 samples/sec   Loss 5.8445   LearningRate 0.0194   Epoch: 11   Global Step: 463970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:04,269-Speed 2624.13 samples/sec   Loss 5.9014   LearningRate 0.0194   Epoch: 11   Global Step: 463980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:08,163-Speed 2630.72 samples/sec   Loss 5.7389   LearningRate 0.0194   Epoch: 11   Global Step: 463990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:12,062-Speed 2626.74 samples/sec   Loss 6.0084   LearningRate 0.0194   Epoch: 11   Global Step: 464000   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:15,975-Speed 2617.63 samples/sec   Loss 5.9422   LearningRate 0.0194   Epoch: 11   Global Step: 464010   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:19,853-Speed 2641.83 samples/sec   Loss 5.8291   LearningRate 0.0194   Epoch: 11   Global Step: 464020   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:23,751-Speed 2627.03 samples/sec   Loss 5.8887   LearningRate 0.0194   Epoch: 11   Global Step: 464030   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:27,650-Speed 2626.76 samples/sec   Loss 5.8983   LearningRate 0.0194   Epoch: 11   Global Step: 464040   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:31,548-Speed 2627.47 samples/sec   Loss 5.8937   LearningRate 0.0194   Epoch: 11   Global Step: 464050   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:35,461-Speed 2618.32 samples/sec   Loss 5.9594   LearningRate 0.0194   Epoch: 11   Global Step: 464060   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:39,355-Speed 2630.15 samples/sec   Loss 5.8158   LearningRate 0.0194   Epoch: 11   Global Step: 464070   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:43,249-Speed 2630.09 samples/sec   Loss 5.7933   LearningRate 0.0194   Epoch: 11   Global Step: 464080   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:52:47,351-Speed 2497.18 samples/sec   Loss 5.9463   LearningRate 0.0194   Epoch: 11   Global Step: 464090   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:52:51,290-Speed 2600.21 samples/sec   Loss 5.9510   LearningRate 0.0194   Epoch: 11   Global Step: 464100   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:52:55,213-Speed 2611.19 samples/sec   Loss 5.9042   LearningRate 0.0194   Epoch: 11   Global Step: 464110   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:52:59,117-Speed 2623.52 samples/sec   Loss 5.9158   LearningRate 0.0194   Epoch: 11   Global Step: 464120   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:03,015-Speed 2627.98 samples/sec   Loss 5.8563   LearningRate 0.0194   Epoch: 11   Global Step: 464130   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:06,909-Speed 2630.03 samples/sec   Loss 5.8467   LearningRate 0.0194   Epoch: 11   Global Step: 464140   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:10,816-Speed 2622.06 samples/sec   Loss 5.7826   LearningRate 0.0194   Epoch: 11   Global Step: 464150   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:14,710-Speed 2630.16 samples/sec   Loss 5.8728   LearningRate 0.0194   Epoch: 11   Global Step: 464160   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:18,606-Speed 2628.81 samples/sec   Loss 5.9162   LearningRate 0.0194   Epoch: 11   Global Step: 464170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:22,505-Speed 2627.22 samples/sec   Loss 5.9538   LearningRate 0.0194   Epoch: 11   Global Step: 464180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:26,401-Speed 2629.01 samples/sec   Loss 5.8814   LearningRate 0.0194   Epoch: 11   Global Step: 464190   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:53:30,291-Speed 2632.99 samples/sec   Loss 5.8309   LearningRate 0.0194   Epoch: 11   Global Step: 464200   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:53:34,188-Speed 2628.49 samples/sec   Loss 5.8613   LearningRate 0.0194   Epoch: 11   Global Step: 464210   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:53:38,157-Speed 2580.67 samples/sec   Loss 5.8193   LearningRate 0.0194   Epoch: 11   Global Step: 464220   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:53:42,069-Speed 2618.03 samples/sec   Loss 6.0135   LearningRate 0.0194   Epoch: 11   Global Step: 464230   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:53:45,969-Speed 2626.35 samples/sec   Loss 5.9264   LearningRate 0.0194   Epoch: 11   Global Step: 464240   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:53:49,846-Speed 2642.27 samples/sec   Loss 5.9127   LearningRate 0.0194   Epoch: 11   Global Step: 464250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:53,740-Speed 2630.34 samples/sec   Loss 5.8683   LearningRate 0.0194   Epoch: 11   Global Step: 464260   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:53:57,639-Speed 2626.94 samples/sec   Loss 5.9841   LearningRate 0.0194   Epoch: 11   Global Step: 464270   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:01,544-Speed 2622.78 samples/sec   Loss 5.9124   LearningRate 0.0194   Epoch: 11   Global Step: 464280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:05,446-Speed 2624.77 samples/sec   Loss 5.8264   LearningRate 0.0194   Epoch: 11   Global Step: 464290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:09,343-Speed 2628.56 samples/sec   Loss 5.8633   LearningRate 0.0194   Epoch: 11   Global Step: 464300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:13,241-Speed 2627.78 samples/sec   Loss 5.8139   LearningRate 0.0194   Epoch: 11   Global Step: 464310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:17,150-Speed 2620.42 samples/sec   Loss 6.0218   LearningRate 0.0194   Epoch: 11   Global Step: 464320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:21,046-Speed 2628.65 samples/sec   Loss 5.9854   LearningRate 0.0194   Epoch: 11   Global Step: 464330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:24,958-Speed 2618.94 samples/sec   Loss 5.9412   LearningRate 0.0194   Epoch: 11   Global Step: 464340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:54:28,832-Speed 2643.60 samples/sec   Loss 5.9346   LearningRate 0.0194   Epoch: 11   Global Step: 464350   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:54:32,733-Speed 2625.71 samples/sec   Loss 5.8150   LearningRate 0.0194   Epoch: 11   Global Step: 464360   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:54:36,630-Speed 2628.20 samples/sec   Loss 5.8272   LearningRate 0.0194   Epoch: 11   Global Step: 464370   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:54:40,525-Speed 2629.90 samples/sec   Loss 5.8440   LearningRate 0.0194   Epoch: 11   Global Step: 464380   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:54:44,428-Speed 2623.98 samples/sec   Loss 5.8381   LearningRate 0.0194   Epoch: 11   Global Step: 464390   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:54:48,322-Speed 2630.97 samples/sec   Loss 5.9307   LearningRate 0.0194   Epoch: 11   Global Step: 464400   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:54:52,222-Speed 2625.69 samples/sec   Loss 5.9025   LearningRate 0.0194   Epoch: 11   Global Step: 464410   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:54:56,115-Speed 2631.39 samples/sec   Loss 5.8399   LearningRate 0.0194   Epoch: 11   Global Step: 464420   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:55:00,022-Speed 2621.44 samples/sec   Loss 5.7867   LearningRate 0.0194   Epoch: 11   Global Step: 464430   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:55:03,931-Speed 2619.90 samples/sec   Loss 5.8762   LearningRate 0.0194   Epoch: 11   Global Step: 464440   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:55:07,829-Speed 2628.05 samples/sec   Loss 6.0031   LearningRate 0.0194   Epoch: 11   Global Step: 464450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:11,723-Speed 2630.47 samples/sec   Loss 5.9595   LearningRate 0.0194   Epoch: 11   Global Step: 464460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:15,616-Speed 2631.27 samples/sec   Loss 5.8583   LearningRate 0.0194   Epoch: 11   Global Step: 464470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:19,511-Speed 2629.88 samples/sec   Loss 5.8711   LearningRate 0.0194   Epoch: 11   Global Step: 464480   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:23,447-Speed 2602.35 samples/sec   Loss 5.9422   LearningRate 0.0194   Epoch: 11   Global Step: 464490   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:27,341-Speed 2630.51 samples/sec   Loss 5.9084   LearningRate 0.0194   Epoch: 11   Global Step: 464500   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:31,240-Speed 2627.00 samples/sec   Loss 6.0143   LearningRate 0.0194   Epoch: 11   Global Step: 464510   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:35,149-Speed 2619.92 samples/sec   Loss 5.9406   LearningRate 0.0194   Epoch: 11   Global Step: 464520   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:39,048-Speed 2627.45 samples/sec   Loss 5.9102   LearningRate 0.0194   Epoch: 11   Global Step: 464530   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:42,946-Speed 2627.44 samples/sec   Loss 5.8141   LearningRate 0.0194   Epoch: 11   Global Step: 464540   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:55:46,849-Speed 2624.12 samples/sec   Loss 5.9613   LearningRate 0.0194   Epoch: 11   Global Step: 464550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:55:50,754-Speed 2623.14 samples/sec   Loss 5.9711   LearningRate 0.0194   Epoch: 11   Global Step: 464560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:55:54,647-Speed 2630.72 samples/sec   Loss 5.8618   LearningRate 0.0194   Epoch: 11   Global Step: 464570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:55:58,540-Speed 2631.00 samples/sec   Loss 5.8655   LearningRate 0.0194   Epoch: 11   Global Step: 464580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:02,450-Speed 2619.78 samples/sec   Loss 5.8667   LearningRate 0.0194   Epoch: 11   Global Step: 464590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:06,344-Speed 2629.96 samples/sec   Loss 5.9328   LearningRate 0.0194   Epoch: 11   Global Step: 464600   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:10,239-Speed 2629.90 samples/sec   Loss 5.8584   LearningRate 0.0194   Epoch: 11   Global Step: 464610   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:14,139-Speed 2626.46 samples/sec   Loss 5.8004   LearningRate 0.0194   Epoch: 11   Global Step: 464620   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:18,048-Speed 2619.89 samples/sec   Loss 5.9426   LearningRate 0.0194   Epoch: 11   Global Step: 464630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:21,943-Speed 2629.90 samples/sec   Loss 5.9271   LearningRate 0.0194   Epoch: 11   Global Step: 464640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:25,824-Speed 2638.94 samples/sec   Loss 5.8794   LearningRate 0.0194   Epoch: 11   Global Step: 464650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:29,717-Speed 2630.73 samples/sec   Loss 5.8550   LearningRate 0.0193   Epoch: 11   Global Step: 464660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:33,617-Speed 2626.63 samples/sec   Loss 5.9286   LearningRate 0.0193   Epoch: 11   Global Step: 464670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:37,519-Speed 2625.20 samples/sec   Loss 5.8255   LearningRate 0.0193   Epoch: 11   Global Step: 464680   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:41,423-Speed 2623.18 samples/sec   Loss 5.7912   LearningRate 0.0193   Epoch: 11   Global Step: 464690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:56:45,302-Speed 2640.37 samples/sec   Loss 5.8215   LearningRate 0.0193   Epoch: 11   Global Step: 464700   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:56:49,198-Speed 2629.38 samples/sec   Loss 5.9216   LearningRate 0.0193   Epoch: 11   Global Step: 464710   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:56:53,091-Speed 2631.12 samples/sec   Loss 5.9168   LearningRate 0.0193   Epoch: 11   Global Step: 464720   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:56:56,988-Speed 2628.55 samples/sec   Loss 5.9322   LearningRate 0.0193   Epoch: 11   Global Step: 464730   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:00,930-Speed 2598.63 samples/sec   Loss 5.9346   LearningRate 0.0193   Epoch: 11   Global Step: 464740   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:04,825-Speed 2629.74 samples/sec   Loss 5.8323   LearningRate 0.0193   Epoch: 11   Global Step: 464750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:08,725-Speed 2625.59 samples/sec   Loss 5.8351   LearningRate 0.0193   Epoch: 11   Global Step: 464760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:12,621-Speed 2628.77 samples/sec   Loss 5.7727   LearningRate 0.0193   Epoch: 11   Global Step: 464770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:16,525-Speed 2624.66 samples/sec   Loss 5.9337   LearningRate 0.0193   Epoch: 11   Global Step: 464780   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:20,439-Speed 2616.61 samples/sec   Loss 5.8110   LearningRate 0.0193   Epoch: 11   Global Step: 464790   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:24,341-Speed 2624.66 samples/sec   Loss 5.9713   LearningRate 0.0193   Epoch: 11   Global Step: 464800   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:57:28,239-Speed 2628.19 samples/sec   Loss 5.8731   LearningRate 0.0193   Epoch: 11   Global Step: 464810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:57:32,168-Speed 2606.95 samples/sec   Loss 5.9049   LearningRate 0.0193   Epoch: 11   Global Step: 464820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:57:36,084-Speed 2615.14 samples/sec   Loss 5.9167   LearningRate 0.0193   Epoch: 11   Global Step: 464830   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:57:39,982-Speed 2628.18 samples/sec   Loss 5.8946   LearningRate 0.0193   Epoch: 11   Global Step: 464840   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:57:43,878-Speed 2629.46 samples/sec   Loss 5.8332   LearningRate 0.0193   Epoch: 11   Global Step: 464850   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:57:47,762-Speed 2636.36 samples/sec   Loss 5.8308   LearningRate 0.0193   Epoch: 11   Global Step: 464860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:51,658-Speed 2629.26 samples/sec   Loss 5.8375   LearningRate 0.0193   Epoch: 11   Global Step: 464870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:55,556-Speed 2627.23 samples/sec   Loss 5.7989   LearningRate 0.0193   Epoch: 11   Global Step: 464880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:57:59,451-Speed 2630.32 samples/sec   Loss 5.9440   LearningRate 0.0193   Epoch: 11   Global Step: 464890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:58:03,352-Speed 2625.59 samples/sec   Loss 5.9054   LearningRate 0.0193   Epoch: 11   Global Step: 464900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:58:07,270-Speed 2613.88 samples/sec   Loss 5.8589   LearningRate 0.0193   Epoch: 11   Global Step: 464910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:58:11,156-Speed 2635.77 samples/sec   Loss 5.8478   LearningRate 0.0193   Epoch: 11   Global Step: 464920   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:15,047-Speed 2632.93 samples/sec   Loss 5.7868   LearningRate 0.0193   Epoch: 11   Global Step: 464930   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:18,948-Speed 2625.31 samples/sec   Loss 5.8694   LearningRate 0.0193   Epoch: 11   Global Step: 464940   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:22,853-Speed 2622.58 samples/sec   Loss 5.8205   LearningRate 0.0193   Epoch: 11   Global Step: 464950   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:26,757-Speed 2623.86 samples/sec   Loss 5.8867   LearningRate 0.0193   Epoch: 11   Global Step: 464960   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:30,656-Speed 2627.79 samples/sec   Loss 5.8916   LearningRate 0.0193   Epoch: 11   Global Step: 464970   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:34,552-Speed 2628.43 samples/sec   Loss 5.9223   LearningRate 0.0193   Epoch: 11   Global Step: 464980   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:38,448-Speed 2628.62 samples/sec   Loss 5.8233   LearningRate 0.0193   Epoch: 11   Global Step: 464990   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:42,349-Speed 2625.74 samples/sec   Loss 5.8680   LearningRate 0.0193   Epoch: 11   Global Step: 465000   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:46,241-Speed 2632.44 samples/sec   Loss 5.8277   LearningRate 0.0193   Epoch: 11   Global Step: 465010   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-14 23:58:50,137-Speed 2628.93 samples/sec   Loss 5.9249   LearningRate 0.0193   Epoch: 11   Global Step: 465020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:58:54,033-Speed 2628.62 samples/sec   Loss 5.8386   LearningRate 0.0193   Epoch: 11   Global Step: 465030   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:58:57,948-Speed 2616.35 samples/sec   Loss 5.9048   LearningRate 0.0193   Epoch: 11   Global Step: 465040   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:01,845-Speed 2628.17 samples/sec   Loss 5.7596   LearningRate 0.0193   Epoch: 11   Global Step: 465050   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:05,743-Speed 2627.83 samples/sec   Loss 5.9459   LearningRate 0.0193   Epoch: 11   Global Step: 465060   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:09,635-Speed 2631.66 samples/sec   Loss 5.8753   LearningRate 0.0193   Epoch: 11   Global Step: 465070   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:13,530-Speed 2629.47 samples/sec   Loss 5.9007   LearningRate 0.0193   Epoch: 11   Global Step: 465080   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:17,430-Speed 2626.89 samples/sec   Loss 5.8956   LearningRate 0.0193   Epoch: 11   Global Step: 465090   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:21,327-Speed 2628.13 samples/sec   Loss 5.7961   LearningRate 0.0193   Epoch: 11   Global Step: 465100   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:25,322-Speed 2563.62 samples/sec   Loss 5.8735   LearningRate 0.0193   Epoch: 11   Global Step: 465110   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-14 23:59:29,236-Speed 2616.99 samples/sec   Loss 5.8749   LearningRate 0.0193   Epoch: 11   Global Step: 465120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:59:33,139-Speed 2623.80 samples/sec   Loss 5.8693   LearningRate 0.0193   Epoch: 11   Global Step: 465130   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:59:37,045-Speed 2622.83 samples/sec   Loss 5.8816   LearningRate 0.0193   Epoch: 11   Global Step: 465140   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:59:40,951-Speed 2622.16 samples/sec   Loss 5.8571   LearningRate 0.0193   Epoch: 11   Global Step: 465150   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:59:44,850-Speed 2626.60 samples/sec   Loss 5.8549   LearningRate 0.0193   Epoch: 11   Global Step: 465160   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:59:48,756-Speed 2622.16 samples/sec   Loss 5.8646   LearningRate 0.0193   Epoch: 11   Global Step: 465170   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:59:52,669-Speed 2617.25 samples/sec   Loss 5.9474   LearningRate 0.0193   Epoch: 11   Global Step: 465180   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-14 23:59:56,580-Speed 2619.07 samples/sec   Loss 5.9267   LearningRate 0.0193   Epoch: 11   Global Step: 465190   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:00:00,479-Speed 2627.10 samples/sec   Loss 5.9563   LearningRate 0.0193   Epoch: 11   Global Step: 465200   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:00:04,387-Speed 2620.61 samples/sec   Loss 5.8386   LearningRate 0.0193   Epoch: 11   Global Step: 465210   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:00:08,265-Speed 2641.20 samples/sec   Loss 5.7986   LearningRate 0.0193   Epoch: 11   Global Step: 465220   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:00:12,175-Speed 2619.93 samples/sec   Loss 5.9103   LearningRate 0.0193   Epoch: 11   Global Step: 465230   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:00:16,075-Speed 2627.05 samples/sec   Loss 5.9014   LearningRate 0.0193   Epoch: 11   Global Step: 465240   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:00:19,953-Speed 2641.03 samples/sec   Loss 5.9080   LearningRate 0.0193   Epoch: 11   Global Step: 465250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:23,847-Speed 2630.68 samples/sec   Loss 5.8941   LearningRate 0.0193   Epoch: 11   Global Step: 465260   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:27,741-Speed 2629.90 samples/sec   Loss 5.8410   LearningRate 0.0193   Epoch: 11   Global Step: 465270   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:31,644-Speed 2624.16 samples/sec   Loss 5.8693   LearningRate 0.0193   Epoch: 11   Global Step: 465280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:35,545-Speed 2625.60 samples/sec   Loss 5.8883   LearningRate 0.0193   Epoch: 11   Global Step: 465290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:39,438-Speed 2630.81 samples/sec   Loss 5.8958   LearningRate 0.0193   Epoch: 11   Global Step: 465300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:43,337-Speed 2627.22 samples/sec   Loss 5.8979   LearningRate 0.0193   Epoch: 11   Global Step: 465310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:47,232-Speed 2629.68 samples/sec   Loss 5.8278   LearningRate 0.0193   Epoch: 11   Global Step: 465320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:51,142-Speed 2619.67 samples/sec   Loss 5.8613   LearningRate 0.0193   Epoch: 11   Global Step: 465330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:55,039-Speed 2628.91 samples/sec   Loss 5.9250   LearningRate 0.0193   Epoch: 11   Global Step: 465340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:00:58,937-Speed 2627.07 samples/sec   Loss 5.8124   LearningRate 0.0193   Epoch: 11   Global Step: 465350   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:01:02,833-Speed 2628.89 samples/sec   Loss 5.7842   LearningRate 0.0193   Epoch: 11   Global Step: 465360   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:01:06,734-Speed 2625.86 samples/sec   Loss 5.8258   LearningRate 0.0193   Epoch: 11   Global Step: 465370   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:01:10,614-Speed 2640.30 samples/sec   Loss 5.9470   LearningRate 0.0193   Epoch: 11   Global Step: 465380   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:14,510-Speed 2628.54 samples/sec   Loss 5.8168   LearningRate 0.0193   Epoch: 11   Global Step: 465390   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:18,434-Speed 2610.01 samples/sec   Loss 5.8918   LearningRate 0.0193   Epoch: 11   Global Step: 465400   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:22,335-Speed 2626.07 samples/sec   Loss 5.8414   LearningRate 0.0193   Epoch: 11   Global Step: 465410   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:26,229-Speed 2630.21 samples/sec   Loss 5.7874   LearningRate 0.0193   Epoch: 11   Global Step: 465420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:30,122-Speed 2631.31 samples/sec   Loss 5.8621   LearningRate 0.0193   Epoch: 11   Global Step: 465430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:34,019-Speed 2628.00 samples/sec   Loss 5.8394   LearningRate 0.0193   Epoch: 11   Global Step: 465440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:37,917-Speed 2627.12 samples/sec   Loss 5.8206   LearningRate 0.0193   Epoch: 11   Global Step: 465450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:41,813-Speed 2629.26 samples/sec   Loss 5.9372   LearningRate 0.0193   Epoch: 11   Global Step: 465460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:45,709-Speed 2629.63 samples/sec   Loss 5.8425   LearningRate 0.0193   Epoch: 11   Global Step: 465470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:01:49,608-Speed 2626.86 samples/sec   Loss 5.8788   LearningRate 0.0193   Epoch: 11   Global Step: 465480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:01:53,518-Speed 2619.73 samples/sec   Loss 5.7907   LearningRate 0.0193   Epoch: 11   Global Step: 465490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:01:57,434-Speed 2615.49 samples/sec   Loss 5.7940   LearningRate 0.0193   Epoch: 11   Global Step: 465500   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:01,332-Speed 2628.43 samples/sec   Loss 5.8060   LearningRate 0.0193   Epoch: 11   Global Step: 465510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:05,247-Speed 2616.07 samples/sec   Loss 5.8710   LearningRate 0.0193   Epoch: 11   Global Step: 465520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:09,342-Speed 2500.77 samples/sec   Loss 5.7640   LearningRate 0.0193   Epoch: 11   Global Step: 465530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:13,356-Speed 2551.60 samples/sec   Loss 5.7999   LearningRate 0.0193   Epoch: 11   Global Step: 465540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:17,254-Speed 2628.27 samples/sec   Loss 5.8919   LearningRate 0.0193   Epoch: 11   Global Step: 465550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:21,158-Speed 2624.06 samples/sec   Loss 5.9388   LearningRate 0.0193   Epoch: 11   Global Step: 465560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:25,052-Speed 2629.61 samples/sec   Loss 5.8127   LearningRate 0.0193   Epoch: 11   Global Step: 465570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:28,927-Speed 2643.49 samples/sec   Loss 5.8860   LearningRate 0.0193   Epoch: 11   Global Step: 465580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:32,920-Speed 2564.74 samples/sec   Loss 5.7858   LearningRate 0.0193   Epoch: 11   Global Step: 465590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:36,836-Speed 2615.52 samples/sec   Loss 5.8758   LearningRate 0.0193   Epoch: 11   Global Step: 465600   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:40,743-Speed 2621.86 samples/sec   Loss 5.8856   LearningRate 0.0192   Epoch: 11   Global Step: 465610   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:44,638-Speed 2629.58 samples/sec   Loss 5.9108   LearningRate 0.0192   Epoch: 11   Global Step: 465620   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:48,543-Speed 2623.33 samples/sec   Loss 5.9027   LearningRate 0.0192   Epoch: 11   Global Step: 465630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:52,440-Speed 2628.13 samples/sec   Loss 5.9591   LearningRate 0.0192   Epoch: 11   Global Step: 465640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:02:56,338-Speed 2627.29 samples/sec   Loss 5.7317   LearningRate 0.0192   Epoch: 11   Global Step: 465650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:03:00,278-Speed 2599.78 samples/sec   Loss 5.9258   LearningRate 0.0192   Epoch: 11   Global Step: 465660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:03:04,291-Speed 2552.75 samples/sec   Loss 5.7907   LearningRate 0.0192   Epoch: 11   Global Step: 465670   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:08,267-Speed 2576.18 samples/sec   Loss 5.7463   LearningRate 0.0192   Epoch: 11   Global Step: 465680   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:12,160-Speed 2631.02 samples/sec   Loss 5.8283   LearningRate 0.0192   Epoch: 11   Global Step: 465690   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:16,068-Speed 2621.21 samples/sec   Loss 5.8142   LearningRate 0.0192   Epoch: 11   Global Step: 465700   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:19,963-Speed 2629.40 samples/sec   Loss 5.8976   LearningRate 0.0192   Epoch: 11   Global Step: 465710   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:23,918-Speed 2590.00 samples/sec   Loss 5.7960   LearningRate 0.0192   Epoch: 11   Global Step: 465720   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:27,817-Speed 2627.06 samples/sec   Loss 5.8588   LearningRate 0.0192   Epoch: 11   Global Step: 465730   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:31,710-Speed 2631.12 samples/sec   Loss 5.7568   LearningRate 0.0192   Epoch: 11   Global Step: 465740   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:35,605-Speed 2629.57 samples/sec   Loss 5.9215   LearningRate 0.0192   Epoch: 11   Global Step: 465750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:39,499-Speed 2629.94 samples/sec   Loss 5.8608   LearningRate 0.0192   Epoch: 11   Global Step: 465760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:43,395-Speed 2628.83 samples/sec   Loss 6.0430   LearningRate 0.0192   Epoch: 11   Global Step: 465770   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:03:47,270-Speed 2643.95 samples/sec   Loss 5.8602   LearningRate 0.0192   Epoch: 11   Global Step: 465780   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:51,174-Speed 2624.39 samples/sec   Loss 5.8316   LearningRate 0.0192   Epoch: 11   Global Step: 465790   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:55,104-Speed 2606.07 samples/sec   Loss 5.7807   LearningRate 0.0192   Epoch: 11   Global Step: 465800   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:03:58,997-Speed 2631.58 samples/sec   Loss 5.9315   LearningRate 0.0192   Epoch: 11   Global Step: 465810   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:02,894-Speed 2628.10 samples/sec   Loss 5.7883   LearningRate 0.0192   Epoch: 11   Global Step: 465820   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:06,804-Speed 2619.64 samples/sec   Loss 5.8485   LearningRate 0.0192   Epoch: 11   Global Step: 465830   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:10,700-Speed 2628.98 samples/sec   Loss 5.8715   LearningRate 0.0192   Epoch: 11   Global Step: 465840   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:14,593-Speed 2631.33 samples/sec   Loss 5.7634   LearningRate 0.0192   Epoch: 11   Global Step: 465850   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:18,488-Speed 2629.92 samples/sec   Loss 5.8902   LearningRate 0.0192   Epoch: 11   Global Step: 465860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:22,386-Speed 2627.77 samples/sec   Loss 5.7713   LearningRate 0.0192   Epoch: 11   Global Step: 465870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:26,286-Speed 2626.32 samples/sec   Loss 6.0045   LearningRate 0.0192   Epoch: 11   Global Step: 465880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:30,185-Speed 2627.00 samples/sec   Loss 5.8093   LearningRate 0.0192   Epoch: 11   Global Step: 465890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:34,080-Speed 2629.39 samples/sec   Loss 5.9084   LearningRate 0.0192   Epoch: 11   Global Step: 465900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:37,976-Speed 2629.00 samples/sec   Loss 5.9129   LearningRate 0.0192   Epoch: 11   Global Step: 465910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:41,867-Speed 2632.25 samples/sec   Loss 5.9294   LearningRate 0.0192   Epoch: 11   Global Step: 465920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:45,758-Speed 2632.49 samples/sec   Loss 5.9119   LearningRate 0.0192   Epoch: 11   Global Step: 465930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:49,668-Speed 2619.34 samples/sec   Loss 5.8328   LearningRate 0.0192   Epoch: 11   Global Step: 465940   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:53,563-Speed 2630.20 samples/sec   Loss 5.8349   LearningRate 0.0192   Epoch: 11   Global Step: 465950   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:04:57,463-Speed 2626.16 samples/sec   Loss 5.8940   LearningRate 0.0192   Epoch: 11   Global Step: 465960   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:01,357-Speed 2630.25 samples/sec   Loss 5.9531   LearningRate 0.0192   Epoch: 11   Global Step: 465970   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:05,250-Speed 2630.57 samples/sec   Loss 5.8880   LearningRate 0.0192   Epoch: 11   Global Step: 465980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:05:09,148-Speed 2628.01 samples/sec   Loss 5.9143   LearningRate 0.0192   Epoch: 11   Global Step: 465990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:05:13,048-Speed 2626.53 samples/sec   Loss 5.9247   LearningRate 0.0192   Epoch: 11   Global Step: 466000   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:05:16,949-Speed 2625.64 samples/sec   Loss 5.9199   LearningRate 0.0192   Epoch: 11   Global Step: 466010   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:05:20,824-Speed 2642.92 samples/sec   Loss 5.7840   LearningRate 0.0192   Epoch: 11   Global Step: 466020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:24,719-Speed 2630.13 samples/sec   Loss 5.9327   LearningRate 0.0192   Epoch: 11   Global Step: 466030   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:28,614-Speed 2628.93 samples/sec   Loss 5.7530   LearningRate 0.0192   Epoch: 11   Global Step: 466040   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:32,516-Speed 2624.67 samples/sec   Loss 5.8366   LearningRate 0.0192   Epoch: 11   Global Step: 466050   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:36,413-Speed 2628.19 samples/sec   Loss 5.9125   LearningRate 0.0192   Epoch: 11   Global Step: 466060   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:40,309-Speed 2629.84 samples/sec   Loss 5.8104   LearningRate 0.0192   Epoch: 11   Global Step: 466070   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:44,212-Speed 2623.87 samples/sec   Loss 5.7952   LearningRate 0.0192   Epoch: 11   Global Step: 466080   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:48,112-Speed 2626.91 samples/sec   Loss 5.9779   LearningRate 0.0192   Epoch: 11   Global Step: 466090   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:52,019-Speed 2621.20 samples/sec   Loss 5.7656   LearningRate 0.0192   Epoch: 11   Global Step: 466100   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:55,961-Speed 2598.92 samples/sec   Loss 5.9494   LearningRate 0.0192   Epoch: 11   Global Step: 466110   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:05:59,852-Speed 2632.51 samples/sec   Loss 5.9495   LearningRate 0.0192   Epoch: 11   Global Step: 466120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:06:03,754-Speed 2624.78 samples/sec   Loss 5.8447   LearningRate 0.0192   Epoch: 11   Global Step: 466130   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:06:07,659-Speed 2622.94 samples/sec   Loss 5.8602   LearningRate 0.0192   Epoch: 11   Global Step: 466140   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:06:11,569-Speed 2619.84 samples/sec   Loss 5.8673   LearningRate 0.0192   Epoch: 11   Global Step: 466150   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:06:15,465-Speed 2629.49 samples/sec   Loss 5.9072   LearningRate 0.0192   Epoch: 11   Global Step: 466160   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:06:19,362-Speed 2628.57 samples/sec   Loss 5.8074   LearningRate 0.0192   Epoch: 11   Global Step: 466170   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:06:23,236-Speed 2643.56 samples/sec   Loss 5.9139   LearningRate 0.0192   Epoch: 11   Global Step: 466180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:27,174-Speed 2601.23 samples/sec   Loss 5.8930   LearningRate 0.0192   Epoch: 11   Global Step: 466190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:31,073-Speed 2626.88 samples/sec   Loss 5.8975   LearningRate 0.0192   Epoch: 11   Global Step: 466200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:34,977-Speed 2623.50 samples/sec   Loss 5.7210   LearningRate 0.0192   Epoch: 11   Global Step: 466210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:38,876-Speed 2626.92 samples/sec   Loss 5.8311   LearningRate 0.0192   Epoch: 11   Global Step: 466220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:42,773-Speed 2628.44 samples/sec   Loss 5.9492   LearningRate 0.0192   Epoch: 11   Global Step: 466230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:46,668-Speed 2630.28 samples/sec   Loss 5.9372   LearningRate 0.0192   Epoch: 11   Global Step: 466240   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:50,562-Speed 2630.49 samples/sec   Loss 5.8831   LearningRate 0.0192   Epoch: 11   Global Step: 466250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:54,455-Speed 2630.83 samples/sec   Loss 5.8237   LearningRate 0.0192   Epoch: 11   Global Step: 466260   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:06:58,353-Speed 2627.69 samples/sec   Loss 5.8554   LearningRate 0.0192   Epoch: 11   Global Step: 466270   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:07:02,266-Speed 2617.03 samples/sec   Loss 5.8354   LearningRate 0.0192   Epoch: 11   Global Step: 466280   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:06,162-Speed 2629.59 samples/sec   Loss 5.8738   LearningRate 0.0192   Epoch: 11   Global Step: 466290   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:10,061-Speed 2627.21 samples/sec   Loss 5.7231   LearningRate 0.0192   Epoch: 11   Global Step: 466300   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:13,961-Speed 2626.09 samples/sec   Loss 5.8410   LearningRate 0.0192   Epoch: 11   Global Step: 466310   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:17,897-Speed 2602.70 samples/sec   Loss 5.8775   LearningRate 0.0192   Epoch: 11   Global Step: 466320   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:21,793-Speed 2628.73 samples/sec   Loss 5.8307   LearningRate 0.0192   Epoch: 11   Global Step: 466330   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:25,690-Speed 2628.60 samples/sec   Loss 5.7845   LearningRate 0.0192   Epoch: 11   Global Step: 466340   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:29,586-Speed 2629.06 samples/sec   Loss 5.9363   LearningRate 0.0192   Epoch: 11   Global Step: 466350   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:33,483-Speed 2628.00 samples/sec   Loss 5.7873   LearningRate 0.0192   Epoch: 11   Global Step: 466360   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:37,376-Speed 2630.96 samples/sec   Loss 5.7985   LearningRate 0.0192   Epoch: 11   Global Step: 466370   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:41,321-Speed 2596.25 samples/sec   Loss 5.9274   LearningRate 0.0192   Epoch: 11   Global Step: 466380   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-04-15 00:07:45,193-Speed 2646.04 samples/sec   Loss 5.8088   LearningRate 0.0192   Epoch: 11   Global Step: 466390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:49,111-Speed 2614.34 samples/sec   Loss 5.8653   LearningRate 0.0192   Epoch: 11   Global Step: 466400   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:53,037-Speed 2608.84 samples/sec   Loss 5.8567   LearningRate 0.0192   Epoch: 11   Global Step: 466410   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:07:56,933-Speed 2628.84 samples/sec   Loss 5.7722   LearningRate 0.0192   Epoch: 11   Global Step: 466420   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:00,929-Speed 2563.88 samples/sec   Loss 5.8226   LearningRate 0.0192   Epoch: 11   Global Step: 466430   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:04,825-Speed 2628.72 samples/sec   Loss 5.8936   LearningRate 0.0192   Epoch: 11   Global Step: 466440   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:08,737-Speed 2618.60 samples/sec   Loss 5.8259   LearningRate 0.0192   Epoch: 11   Global Step: 466450   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:12,631-Speed 2630.08 samples/sec   Loss 5.8639   LearningRate 0.0192   Epoch: 11   Global Step: 466460   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:16,546-Speed 2616.68 samples/sec   Loss 5.9458   LearningRate 0.0192   Epoch: 11   Global Step: 466470   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:20,445-Speed 2626.82 samples/sec   Loss 5.8157   LearningRate 0.0192   Epoch: 11   Global Step: 466480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:24,341-Speed 2628.93 samples/sec   Loss 5.8959   LearningRate 0.0192   Epoch: 11   Global Step: 466490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:08:28,223-Speed 2638.21 samples/sec   Loss 5.8585   LearningRate 0.0192   Epoch: 11   Global Step: 466500   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:32,126-Speed 2624.55 samples/sec   Loss 5.8325   LearningRate 0.0192   Epoch: 11   Global Step: 466510   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:36,029-Speed 2624.27 samples/sec   Loss 5.9517   LearningRate 0.0192   Epoch: 11   Global Step: 466520   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:39,923-Speed 2630.62 samples/sec   Loss 5.8751   LearningRate 0.0192   Epoch: 11   Global Step: 466530   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:43,815-Speed 2631.39 samples/sec   Loss 5.9555   LearningRate 0.0192   Epoch: 11   Global Step: 466540   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:47,712-Speed 2628.22 samples/sec   Loss 5.8323   LearningRate 0.0191   Epoch: 11   Global Step: 466550   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:51,612-Speed 2626.88 samples/sec   Loss 5.7874   LearningRate 0.0191   Epoch: 11   Global Step: 466560   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:55,506-Speed 2630.01 samples/sec   Loss 5.8350   LearningRate 0.0191   Epoch: 11   Global Step: 466570   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:08:59,405-Speed 2627.51 samples/sec   Loss 5.7600   LearningRate 0.0191   Epoch: 11   Global Step: 466580   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:09:03,312-Speed 2620.94 samples/sec   Loss 5.8858   LearningRate 0.0191   Epoch: 11   Global Step: 466590   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:09:07,237-Speed 2609.82 samples/sec   Loss 5.8818   LearningRate 0.0191   Epoch: 11   Global Step: 466600   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:11,146-Speed 2619.74 samples/sec   Loss 5.7480   LearningRate 0.0191   Epoch: 11   Global Step: 466610   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:15,041-Speed 2630.06 samples/sec   Loss 5.8889   LearningRate 0.0191   Epoch: 11   Global Step: 466620   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:18,940-Speed 2627.19 samples/sec   Loss 5.8104   LearningRate 0.0191   Epoch: 11   Global Step: 466630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:22,835-Speed 2629.64 samples/sec   Loss 5.8442   LearningRate 0.0191   Epoch: 11   Global Step: 466640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:26,734-Speed 2626.78 samples/sec   Loss 5.8551   LearningRate 0.0191   Epoch: 11   Global Step: 466650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:30,638-Speed 2623.51 samples/sec   Loss 5.8422   LearningRate 0.0191   Epoch: 11   Global Step: 466660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:34,536-Speed 2627.55 samples/sec   Loss 5.8501   LearningRate 0.0191   Epoch: 11   Global Step: 466670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:09:38,560-Speed 2545.51 samples/sec   Loss 5.9064   LearningRate 0.0191   Epoch: 11   Global Step: 466680   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:09:42,457-Speed 2627.97 samples/sec   Loss 5.8215   LearningRate 0.0191   Epoch: 11   Global Step: 466690   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:09:46,358-Speed 2625.59 samples/sec   Loss 5.7196   LearningRate 0.0191   Epoch: 11   Global Step: 466700   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:09:50,256-Speed 2627.58 samples/sec   Loss 5.8721   LearningRate 0.0191   Epoch: 11   Global Step: 466710   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:09:54,164-Speed 2621.09 samples/sec   Loss 5.7524   LearningRate 0.0191   Epoch: 11   Global Step: 466720   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:09:58,056-Speed 2632.15 samples/sec   Loss 5.9093   LearningRate 0.0191   Epoch: 11   Global Step: 466730   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:10:01,952-Speed 2629.05 samples/sec   Loss 5.8155   LearningRate 0.0191   Epoch: 11   Global Step: 466740   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:10:05,852-Speed 2625.98 samples/sec   Loss 5.9161   LearningRate 0.0191   Epoch: 11   Global Step: 466750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:10:09,745-Speed 2630.36 samples/sec   Loss 5.9135   LearningRate 0.0191   Epoch: 11   Global Step: 466760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:10:13,642-Speed 2628.20 samples/sec   Loss 5.9158   LearningRate 0.0191   Epoch: 11   Global Step: 466770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:10:17,543-Speed 2625.93 samples/sec   Loss 5.7465   LearningRate 0.0191   Epoch: 11   Global Step: 466780   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:21,450-Speed 2622.04 samples/sec   Loss 5.8435   LearningRate 0.0191   Epoch: 11   Global Step: 466790   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:25,357-Speed 2621.96 samples/sec   Loss 5.7765   LearningRate 0.0191   Epoch: 11   Global Step: 466800   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:29,279-Speed 2611.33 samples/sec   Loss 5.8342   LearningRate 0.0191   Epoch: 11   Global Step: 466810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:33,186-Speed 2621.69 samples/sec   Loss 5.7721   LearningRate 0.0191   Epoch: 11   Global Step: 466820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:37,084-Speed 2628.18 samples/sec   Loss 5.7901   LearningRate 0.0191   Epoch: 11   Global Step: 466830   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:40,980-Speed 2628.37 samples/sec   Loss 5.8331   LearningRate 0.0191   Epoch: 11   Global Step: 466840   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:44,881-Speed 2625.52 samples/sec   Loss 5.8790   LearningRate 0.0191   Epoch: 11   Global Step: 466850   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:48,791-Speed 2619.99 samples/sec   Loss 5.9010   LearningRate 0.0191   Epoch: 11   Global Step: 466860   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:10:52,676-Speed 2636.51 samples/sec   Loss 5.8459   LearningRate 0.0191   Epoch: 11   Global Step: 466870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:10:56,610-Speed 2603.83 samples/sec   Loss 5.8165   LearningRate 0.0191   Epoch: 11   Global Step: 466880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:00,574-Speed 2583.92 samples/sec   Loss 5.7102   LearningRate 0.0191   Epoch: 11   Global Step: 466890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:04,507-Speed 2604.66 samples/sec   Loss 5.7783   LearningRate 0.0191   Epoch: 11   Global Step: 466900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:08,432-Speed 2609.13 samples/sec   Loss 5.8975   LearningRate 0.0191   Epoch: 11   Global Step: 466910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:12,336-Speed 2624.18 samples/sec   Loss 5.8236   LearningRate 0.0191   Epoch: 11   Global Step: 466920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:16,236-Speed 2626.47 samples/sec   Loss 5.9227   LearningRate 0.0191   Epoch: 11   Global Step: 466930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:20,135-Speed 2626.36 samples/sec   Loss 5.8964   LearningRate 0.0191   Epoch: 11   Global Step: 466940   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:24,031-Speed 2629.47 samples/sec   Loss 5.8457   LearningRate 0.0191   Epoch: 11   Global Step: 466950   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:27,928-Speed 2628.11 samples/sec   Loss 5.9073   LearningRate 0.0191   Epoch: 11   Global Step: 466960   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:31,829-Speed 2625.90 samples/sec   Loss 5.8082   LearningRate 0.0191   Epoch: 11   Global Step: 466970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:11:35,730-Speed 2625.69 samples/sec   Loss 5.8042   LearningRate 0.0191   Epoch: 11   Global Step: 466980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:11:39,635-Speed 2622.78 samples/sec   Loss 5.9025   LearningRate 0.0191   Epoch: 11   Global Step: 466990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:11:43,531-Speed 2628.24 samples/sec   Loss 5.8620   LearningRate 0.0191   Epoch: 11   Global Step: 467000   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:11:47,436-Speed 2623.83 samples/sec   Loss 5.9437   LearningRate 0.0191   Epoch: 11   Global Step: 467010   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:11:51,333-Speed 2628.14 samples/sec   Loss 5.8933   LearningRate 0.0191   Epoch: 11   Global Step: 467020   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:11:55,207-Speed 2643.96 samples/sec   Loss 5.7980   LearningRate 0.0191   Epoch: 11   Global Step: 467030   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:11:59,137-Speed 2606.35 samples/sec   Loss 5.7398   LearningRate 0.0191   Epoch: 11   Global Step: 467040   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:12:03,031-Speed 2630.74 samples/sec   Loss 5.7730   LearningRate 0.0191   Epoch: 11   Global Step: 467050   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:12:06,929-Speed 2627.73 samples/sec   Loss 5.8239   LearningRate 0.0191   Epoch: 11   Global Step: 467060   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:12:10,809-Speed 2639.26 samples/sec   Loss 5.7849   LearningRate 0.0191   Epoch: 11   Global Step: 467070   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:14,711-Speed 2624.92 samples/sec   Loss 5.8281   LearningRate 0.0191   Epoch: 11   Global Step: 467080   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:18,608-Speed 2628.08 samples/sec   Loss 5.7673   LearningRate 0.0191   Epoch: 11   Global Step: 467090   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:22,507-Speed 2627.92 samples/sec   Loss 5.7975   LearningRate 0.0191   Epoch: 11   Global Step: 467100   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:26,409-Speed 2624.78 samples/sec   Loss 5.7942   LearningRate 0.0191   Epoch: 11   Global Step: 467110   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:30,329-Speed 2612.91 samples/sec   Loss 5.8174   LearningRate 0.0191   Epoch: 11   Global Step: 467120   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:34,234-Speed 2623.23 samples/sec   Loss 5.8414   LearningRate 0.0191   Epoch: 11   Global Step: 467130   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:38,132-Speed 2627.09 samples/sec   Loss 5.8890   LearningRate 0.0191   Epoch: 11   Global Step: 467140   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:42,028-Speed 2628.67 samples/sec   Loss 5.7255   LearningRate 0.0191   Epoch: 11   Global Step: 467150   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:45,946-Speed 2615.01 samples/sec   Loss 5.8926   LearningRate 0.0191   Epoch: 11   Global Step: 467160   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-04-15 00:12:49,838-Speed 2631.32 samples/sec   Loss 5.8032   LearningRate 0.0191   Epoch: 11   Global Step: 467170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:12:53,734-Speed 2629.27 samples/sec   Loss 5.8293   LearningRate 0.0191   Epoch: 11   Global Step: 467180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:12:57,634-Speed 2626.63 samples/sec   Loss 5.7905   LearningRate 0.0191   Epoch: 11   Global Step: 467190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:01,532-Speed 2627.72 samples/sec   Loss 5.9071   LearningRate 0.0191   Epoch: 11   Global Step: 467200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:05,433-Speed 2625.59 samples/sec   Loss 5.8653   LearningRate 0.0191   Epoch: 11   Global Step: 467210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:09,326-Speed 2630.90 samples/sec   Loss 5.7949   LearningRate 0.0191   Epoch: 11   Global Step: 467220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:13,231-Speed 2622.35 samples/sec   Loss 5.9010   LearningRate 0.0191   Epoch: 11   Global Step: 467230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:17,128-Speed 2628.86 samples/sec   Loss 5.8600   LearningRate 0.0191   Epoch: 11   Global Step: 467240   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:21,022-Speed 2630.53 samples/sec   Loss 5.8431   LearningRate 0.0191   Epoch: 11   Global Step: 467250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:24,918-Speed 2628.87 samples/sec   Loss 5.8507   LearningRate 0.0191   Epoch: 11   Global Step: 467260   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:28,824-Speed 2621.98 samples/sec   Loss 5.7872   LearningRate 0.0191   Epoch: 11   Global Step: 467270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:13:32,704-Speed 2640.38 samples/sec   Loss 5.9425   LearningRate 0.0191   Epoch: 11   Global Step: 467280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:36,602-Speed 2627.96 samples/sec   Loss 5.9062   LearningRate 0.0191   Epoch: 11   Global Step: 467290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:40,504-Speed 2624.53 samples/sec   Loss 5.8765   LearningRate 0.0191   Epoch: 11   Global Step: 467300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:44,403-Speed 2627.25 samples/sec   Loss 5.8116   LearningRate 0.0191   Epoch: 11   Global Step: 467310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:48,295-Speed 2631.86 samples/sec   Loss 5.8355   LearningRate 0.0191   Epoch: 11   Global Step: 467320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:52,201-Speed 2622.37 samples/sec   Loss 5.8353   LearningRate 0.0191   Epoch: 11   Global Step: 467330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:13:56,114-Speed 2617.55 samples/sec   Loss 5.8490   LearningRate 0.0191   Epoch: 11   Global Step: 467340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:00,017-Speed 2624.56 samples/sec   Loss 5.8159   LearningRate 0.0191   Epoch: 11   Global Step: 467350   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:03,950-Speed 2604.37 samples/sec   Loss 5.8367   LearningRate 0.0191   Epoch: 11   Global Step: 467360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:07,976-Speed 2543.91 samples/sec   Loss 5.6802   LearningRate 0.0191   Epoch: 11   Global Step: 467370   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:11,880-Speed 2623.36 samples/sec   Loss 6.0063   LearningRate 0.0191   Epoch: 11   Global Step: 467380   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:14:15,779-Speed 2627.42 samples/sec   Loss 5.9389   LearningRate 0.0191   Epoch: 11   Global Step: 467390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:14:19,681-Speed 2624.89 samples/sec   Loss 5.9283   LearningRate 0.0191   Epoch: 11   Global Step: 467400   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:14:23,598-Speed 2616.00 samples/sec   Loss 5.8862   LearningRate 0.0191   Epoch: 11   Global Step: 467410   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:14:27,485-Speed 2635.00 samples/sec   Loss 5.8586   LearningRate 0.0191   Epoch: 11   Global Step: 467420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:31,390-Speed 2628.71 samples/sec   Loss 5.7269   LearningRate 0.0191   Epoch: 11   Global Step: 467430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:35,288-Speed 2627.45 samples/sec   Loss 5.9046   LearningRate 0.0191   Epoch: 11   Global Step: 467440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:39,197-Speed 2620.59 samples/sec   Loss 5.9341   LearningRate 0.0191   Epoch: 11   Global Step: 467450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:43,093-Speed 2628.77 samples/sec   Loss 5.8814   LearningRate 0.0191   Epoch: 11   Global Step: 467460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:46,987-Speed 2630.83 samples/sec   Loss 5.8645   LearningRate 0.0191   Epoch: 11   Global Step: 467470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:50,883-Speed 2628.87 samples/sec   Loss 5.8718   LearningRate 0.0191   Epoch: 11   Global Step: 467480   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:54,776-Speed 2630.79 samples/sec   Loss 5.8471   LearningRate 0.0191   Epoch: 11   Global Step: 467490   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:14:58,682-Speed 2622.09 samples/sec   Loss 5.8865   LearningRate 0.0190   Epoch: 11   Global Step: 467500   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:15:02,582-Speed 2627.20 samples/sec   Loss 5.8720   LearningRate 0.0190   Epoch: 11   Global Step: 467510   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:15:06,486-Speed 2623.58 samples/sec   Loss 5.7486   LearningRate 0.0190   Epoch: 11   Global Step: 467520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:10,391-Speed 2622.16 samples/sec   Loss 5.6953   LearningRate 0.0190   Epoch: 11   Global Step: 467530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:14,288-Speed 2628.80 samples/sec   Loss 5.8953   LearningRate 0.0190   Epoch: 11   Global Step: 467540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:18,192-Speed 2623.68 samples/sec   Loss 5.7564   LearningRate 0.0190   Epoch: 11   Global Step: 467550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:22,089-Speed 2628.73 samples/sec   Loss 5.7423   LearningRate 0.0190   Epoch: 11   Global Step: 467560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:26,132-Speed 2532.79 samples/sec   Loss 5.7197   LearningRate 0.0190   Epoch: 11   Global Step: 467570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:30,224-Speed 2503.21 samples/sec   Loss 5.8054   LearningRate 0.0190   Epoch: 11   Global Step: 467580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:34,315-Speed 2503.98 samples/sec   Loss 5.8523   LearningRate 0.0190   Epoch: 11   Global Step: 467590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:38,322-Speed 2556.07 samples/sec   Loss 5.8189   LearningRate 0.0190   Epoch: 11   Global Step: 467600   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:42,221-Speed 2627.05 samples/sec   Loss 5.7652   LearningRate 0.0190   Epoch: 11   Global Step: 467610   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:46,121-Speed 2626.26 samples/sec   Loss 5.9184   LearningRate 0.0190   Epoch: 11   Global Step: 467620   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-04-15 00:15:50,011-Speed 2632.64 samples/sec   Loss 5.7712   LearningRate 0.0190   Epoch: 11   Global Step: 467630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:53,908-Speed 2628.92 samples/sec   Loss 5.8436   LearningRate 0.0190   Epoch: 11   Global Step: 467640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:15:57,806-Speed 2627.78 samples/sec   Loss 5.8528   LearningRate 0.0190   Epoch: 11   Global Step: 467650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:16:01,730-Speed 2610.35 samples/sec   Loss 5.8906   LearningRate 0.0190   Epoch: 11   Global Step: 467660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:16:05,622-Speed 2631.61 samples/sec   Loss 5.8172   LearningRate 0.0190   Epoch: 11   Global Step: 467670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:16:09,534-Speed 2618.01 samples/sec   Loss 6.0142   LearningRate 0.0190   Epoch: 11   Global Step: 467680   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:16:13,426-Speed 2631.97 samples/sec   Loss 5.9419   LearningRate 0.0190   Epoch: 11   Global Step: 467690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:16:17,319-Speed 2630.66 samples/sec   Loss 5.7693   LearningRate 0.0190   Epoch: 11   Global Step: 467700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:16:21,199-Speed 2640.46 samples/sec   Loss 5.8907   LearningRate 0.0190   Epoch: 11   Global Step: 467710   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:25,095-Speed 2628.48 samples/sec   Loss 5.8676   LearningRate 0.0190   Epoch: 11   Global Step: 467720   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:29,010-Speed 2616.16 samples/sec   Loss 5.8646   LearningRate 0.0190   Epoch: 11   Global Step: 467730   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:32,921-Speed 2619.42 samples/sec   Loss 5.7729   LearningRate 0.0190   Epoch: 11   Global Step: 467740   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:36,916-Speed 2564.23 samples/sec   Loss 5.7778   LearningRate 0.0190   Epoch: 11   Global Step: 467750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:40,837-Speed 2611.93 samples/sec   Loss 5.7647   LearningRate 0.0190   Epoch: 11   Global Step: 467760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:44,748-Speed 2619.13 samples/sec   Loss 5.9165   LearningRate 0.0190   Epoch: 11   Global Step: 467770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:48,659-Speed 2618.98 samples/sec   Loss 5.7971   LearningRate 0.0190   Epoch: 11   Global Step: 467780   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:52,561-Speed 2625.13 samples/sec   Loss 5.9007   LearningRate 0.0190   Epoch: 11   Global Step: 467790   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:16:56,461-Speed 2625.99 samples/sec   Loss 5.7988   LearningRate 0.0190   Epoch: 11   Global Step: 467800   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:00,372-Speed 2618.91 samples/sec   Loss 5.7874   LearningRate 0.0190   Epoch: 11   Global Step: 467810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:17:04,272-Speed 2626.69 samples/sec   Loss 5.7772   LearningRate 0.0190   Epoch: 11   Global Step: 467820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:17:08,152-Speed 2639.87 samples/sec   Loss 5.9270   LearningRate 0.0190   Epoch: 11   Global Step: 467830   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:12,049-Speed 2628.47 samples/sec   Loss 5.8260   LearningRate 0.0190   Epoch: 11   Global Step: 467840   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:15,947-Speed 2627.46 samples/sec   Loss 5.7141   LearningRate 0.0190   Epoch: 11   Global Step: 467850   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:19,877-Speed 2605.99 samples/sec   Loss 5.7980   LearningRate 0.0190   Epoch: 11   Global Step: 467860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:23,771-Speed 2630.60 samples/sec   Loss 5.7797   LearningRate 0.0190   Epoch: 11   Global Step: 467870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:27,670-Speed 2627.21 samples/sec   Loss 5.8985   LearningRate 0.0190   Epoch: 11   Global Step: 467880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:31,570-Speed 2626.56 samples/sec   Loss 5.9989   LearningRate 0.0190   Epoch: 11   Global Step: 467890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:35,469-Speed 2626.70 samples/sec   Loss 5.7329   LearningRate 0.0190   Epoch: 11   Global Step: 467900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:39,366-Speed 2628.24 samples/sec   Loss 5.8413   LearningRate 0.0190   Epoch: 11   Global Step: 467910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:43,262-Speed 2629.34 samples/sec   Loss 5.8440   LearningRate 0.0190   Epoch: 11   Global Step: 467920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:17:47,172-Speed 2619.98 samples/sec   Loss 5.8205   LearningRate 0.0190   Epoch: 11   Global Step: 467930   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:17:51,105-Speed 2604.43 samples/sec   Loss 5.8049   LearningRate 0.0190   Epoch: 11   Global Step: 467940   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:17:55,010-Speed 2623.35 samples/sec   Loss 5.7948   LearningRate 0.0190   Epoch: 11   Global Step: 467950   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:17:58,906-Speed 2628.86 samples/sec   Loss 5.8399   LearningRate 0.0190   Epoch: 11   Global Step: 467960   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:18:02,808-Speed 2624.75 samples/sec   Loss 5.8238   LearningRate 0.0190   Epoch: 11   Global Step: 467970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:18:06,716-Speed 2621.07 samples/sec   Loss 5.6903   LearningRate 0.0190   Epoch: 11   Global Step: 467980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:18:10,597-Speed 2639.73 samples/sec   Loss 5.8985   LearningRate 0.0190   Epoch: 11   Global Step: 467990   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:14,499-Speed 2624.25 samples/sec   Loss 5.8853   LearningRate 0.0190   Epoch: 11   Global Step: 468000   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:18,415-Speed 2623.63 samples/sec   Loss 5.8157   LearningRate 0.0190   Epoch: 11   Global Step: 468010   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:22,308-Speed 2631.17 samples/sec   Loss 5.9097   LearningRate 0.0190   Epoch: 11   Global Step: 468020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:26,201-Speed 2631.07 samples/sec   Loss 5.8836   LearningRate 0.0190   Epoch: 11   Global Step: 468030   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:30,133-Speed 2604.87 samples/sec   Loss 5.9548   LearningRate 0.0190   Epoch: 11   Global Step: 468040   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:34,031-Speed 2627.46 samples/sec   Loss 5.8135   LearningRate 0.0190   Epoch: 11   Global Step: 468050   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:37,931-Speed 2625.94 samples/sec   Loss 5.7416   LearningRate 0.0190   Epoch: 11   Global Step: 468060   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:41,825-Speed 2631.05 samples/sec   Loss 5.8184   LearningRate 0.0190   Epoch: 11   Global Step: 468070   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:45,718-Speed 2630.21 samples/sec   Loss 5.8433   LearningRate 0.0190   Epoch: 11   Global Step: 468080   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:18:49,631-Speed 2618.84 samples/sec   Loss 5.8451   LearningRate 0.0190   Epoch: 11   Global Step: 468090   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:18:53,521-Speed 2633.34 samples/sec   Loss 5.8064   LearningRate 0.0190   Epoch: 11   Global Step: 468100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:18:57,415-Speed 2630.24 samples/sec   Loss 5.8510   LearningRate 0.0190   Epoch: 11   Global Step: 468110   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:19:01,313-Speed 2627.65 samples/sec   Loss 5.8623   LearningRate 0.0190   Epoch: 11   Global Step: 468120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:19:05,206-Speed 2630.92 samples/sec   Loss 5.8770   LearningRate 0.0190   Epoch: 11   Global Step: 468130   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:19:09,083-Speed 2641.64 samples/sec   Loss 5.8609   LearningRate 0.0190   Epoch: 11   Global Step: 468140   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:13,000-Speed 2615.30 samples/sec   Loss 5.8764   LearningRate 0.0190   Epoch: 11   Global Step: 468150   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:16,902-Speed 2625.02 samples/sec   Loss 5.8488   LearningRate 0.0190   Epoch: 11   Global Step: 468160   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:20,796-Speed 2630.21 samples/sec   Loss 5.9433   LearningRate 0.0190   Epoch: 11   Global Step: 468170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:24,695-Speed 2626.78 samples/sec   Loss 5.7983   LearningRate 0.0190   Epoch: 11   Global Step: 468180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:28,607-Speed 2618.46 samples/sec   Loss 5.8245   LearningRate 0.0190   Epoch: 11   Global Step: 468190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:32,504-Speed 2628.63 samples/sec   Loss 5.7852   LearningRate 0.0190   Epoch: 11   Global Step: 468200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:36,397-Speed 2631.12 samples/sec   Loss 5.9405   LearningRate 0.0190   Epoch: 11   Global Step: 468210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:40,290-Speed 2630.90 samples/sec   Loss 5.7769   LearningRate 0.0190   Epoch: 11   Global Step: 468220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:44,186-Speed 2628.81 samples/sec   Loss 5.8399   LearningRate 0.0190   Epoch: 11   Global Step: 468230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:19:48,080-Speed 2630.81 samples/sec   Loss 5.9140   LearningRate 0.0190   Epoch: 11   Global Step: 468240   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:19:51,972-Speed 2631.34 samples/sec   Loss 5.8378   LearningRate 0.0190   Epoch: 11   Global Step: 468250   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:19:55,881-Speed 2620.69 samples/sec   Loss 5.8504   LearningRate 0.0190   Epoch: 11   Global Step: 468260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:19:59,776-Speed 2629.18 samples/sec   Loss 5.7454   LearningRate 0.0190   Epoch: 11   Global Step: 468270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:20:03,683-Speed 2621.63 samples/sec   Loss 5.8673   LearningRate 0.0190   Epoch: 11   Global Step: 468280   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:20:07,560-Speed 2642.23 samples/sec   Loss 5.8383   LearningRate 0.0190   Epoch: 11   Global Step: 468290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:11,466-Speed 2622.47 samples/sec   Loss 5.8914   LearningRate 0.0190   Epoch: 11   Global Step: 468300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:15,357-Speed 2632.05 samples/sec   Loss 5.7761   LearningRate 0.0190   Epoch: 11   Global Step: 468310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:19,279-Speed 2610.93 samples/sec   Loss 5.8778   LearningRate 0.0190   Epoch: 11   Global Step: 468320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:23,182-Speed 2625.14 samples/sec   Loss 5.9042   LearningRate 0.0190   Epoch: 11   Global Step: 468330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:27,079-Speed 2628.26 samples/sec   Loss 5.8280   LearningRate 0.0190   Epoch: 11   Global Step: 468340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:30,973-Speed 2630.10 samples/sec   Loss 5.8608   LearningRate 0.0190   Epoch: 11   Global Step: 468350   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:34,869-Speed 2629.21 samples/sec   Loss 5.8524   LearningRate 0.0190   Epoch: 11   Global Step: 468360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:38,769-Speed 2625.59 samples/sec   Loss 5.7433   LearningRate 0.0190   Epoch: 11   Global Step: 468370   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:42,661-Speed 2631.98 samples/sec   Loss 5.8970   LearningRate 0.0190   Epoch: 11   Global Step: 468380   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:20:46,552-Speed 2632.45 samples/sec   Loss 5.8902   LearningRate 0.0190   Epoch: 11   Global Step: 468390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:20:50,449-Speed 2628.68 samples/sec   Loss 5.8022   LearningRate 0.0190   Epoch: 11   Global Step: 468400   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:20:54,369-Speed 2612.98 samples/sec   Loss 5.7925   LearningRate 0.0190   Epoch: 11   Global Step: 468410   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:20:58,268-Speed 2626.52 samples/sec   Loss 5.6866   LearningRate 0.0190   Epoch: 11   Global Step: 468420   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:21:02,171-Speed 2624.74 samples/sec   Loss 5.8891   LearningRate 0.0190   Epoch: 11   Global Step: 468430   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:21:06,064-Speed 2630.93 samples/sec   Loss 5.7430   LearningRate 0.0190   Epoch: 11   Global Step: 468440   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:21:09,971-Speed 2621.55 samples/sec   Loss 5.8506   LearningRate 0.0190   Epoch: 11   Global Step: 468450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:13,865-Speed 2629.56 samples/sec   Loss 5.8302   LearningRate 0.0189   Epoch: 11   Global Step: 468460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:17,769-Speed 2624.68 samples/sec   Loss 5.8864   LearningRate 0.0189   Epoch: 11   Global Step: 468470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:21,680-Speed 2619.01 samples/sec   Loss 5.7689   LearningRate 0.0189   Epoch: 11   Global Step: 468480   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:25,576-Speed 2629.47 samples/sec   Loss 5.8749   LearningRate 0.0189   Epoch: 11   Global Step: 468490   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:29,471-Speed 2628.88 samples/sec   Loss 5.9249   LearningRate 0.0189   Epoch: 11   Global Step: 468500   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:33,393-Speed 2612.40 samples/sec   Loss 6.0254   LearningRate 0.0189   Epoch: 11   Global Step: 468510   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:37,298-Speed 2622.73 samples/sec   Loss 5.7566   LearningRate 0.0189   Epoch: 11   Global Step: 468520   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:41,210-Speed 2618.16 samples/sec   Loss 5.7634   LearningRate 0.0189   Epoch: 11   Global Step: 468530   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:45,114-Speed 2623.84 samples/sec   Loss 5.8613   LearningRate 0.0189   Epoch: 11   Global Step: 468540   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:21:49,008-Speed 2630.69 samples/sec   Loss 5.8847   LearningRate 0.0189   Epoch: 11   Global Step: 468550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:21:52,907-Speed 2626.64 samples/sec   Loss 5.7963   LearningRate 0.0189   Epoch: 11   Global Step: 468560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:21:56,785-Speed 2641.52 samples/sec   Loss 5.8660   LearningRate 0.0189   Epoch: 11   Global Step: 468570   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:00,701-Speed 2615.47 samples/sec   Loss 5.9214   LearningRate 0.0189   Epoch: 11   Global Step: 468580   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:04,592-Speed 2632.07 samples/sec   Loss 5.8149   LearningRate 0.0189   Epoch: 11   Global Step: 468590   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:08,489-Speed 2628.21 samples/sec   Loss 5.8363   LearningRate 0.0189   Epoch: 11   Global Step: 468600   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:12,383-Speed 2630.61 samples/sec   Loss 5.7838   LearningRate 0.0189   Epoch: 11   Global Step: 468610   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:16,297-Speed 2616.65 samples/sec   Loss 5.6974   LearningRate 0.0189   Epoch: 11   Global Step: 468620   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:20,195-Speed 2627.78 samples/sec   Loss 5.8875   LearningRate 0.0189   Epoch: 11   Global Step: 468630   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:24,100-Speed 2623.13 samples/sec   Loss 5.7568   LearningRate 0.0189   Epoch: 11   Global Step: 468640   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:27,995-Speed 2630.11 samples/sec   Loss 5.7853   LearningRate 0.0189   Epoch: 11   Global Step: 468650   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:31,955-Speed 2586.12 samples/sec   Loss 5.7385   LearningRate 0.0189   Epoch: 11   Global Step: 468660   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:35,880-Speed 2609.85 samples/sec   Loss 5.9293   LearningRate 0.0189   Epoch: 11   Global Step: 468670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:22:39,764-Speed 2636.94 samples/sec   Loss 5.8162   LearningRate 0.0189   Epoch: 11   Global Step: 468680   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:43,666-Speed 2625.50 samples/sec   Loss 5.7827   LearningRate 0.0189   Epoch: 11   Global Step: 468690   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:47,588-Speed 2611.04 samples/sec   Loss 5.7049   LearningRate 0.0189   Epoch: 11   Global Step: 468700   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:51,602-Speed 2551.73 samples/sec   Loss 5.7487   LearningRate 0.0189   Epoch: 11   Global Step: 468710   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:55,565-Speed 2585.17 samples/sec   Loss 5.9347   LearningRate 0.0189   Epoch: 11   Global Step: 468720   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:22:59,469-Speed 2623.31 samples/sec   Loss 5.8744   LearningRate 0.0189   Epoch: 11   Global Step: 468730   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:23:03,377-Speed 2620.91 samples/sec   Loss 5.8334   LearningRate 0.0189   Epoch: 11   Global Step: 468740   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:23:07,290-Speed 2617.59 samples/sec   Loss 5.8554   LearningRate 0.0189   Epoch: 11   Global Step: 468750   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:23:11,200-Speed 2619.98 samples/sec   Loss 5.7545   LearningRate 0.0189   Epoch: 11   Global Step: 468760   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:23:15,139-Speed 2599.94 samples/sec   Loss 5.8070   LearningRate 0.0189   Epoch: 11   Global Step: 468770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:23:19,035-Speed 2629.22 samples/sec   Loss 5.8421   LearningRate 0.0189   Epoch: 11   Global Step: 468780   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:22,934-Speed 2627.02 samples/sec   Loss 5.7414   LearningRate 0.0189   Epoch: 11   Global Step: 468790   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:26,833-Speed 2627.37 samples/sec   Loss 5.8028   LearningRate 0.0189   Epoch: 11   Global Step: 468800   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:30,731-Speed 2627.54 samples/sec   Loss 5.9112   LearningRate 0.0189   Epoch: 11   Global Step: 468810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:34,630-Speed 2626.59 samples/sec   Loss 5.8799   LearningRate 0.0189   Epoch: 11   Global Step: 468820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:38,527-Speed 2628.18 samples/sec   Loss 5.7823   LearningRate 0.0189   Epoch: 11   Global Step: 468830   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:42,439-Speed 2623.64 samples/sec   Loss 5.7999   LearningRate 0.0189   Epoch: 11   Global Step: 468840   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:46,339-Speed 2626.68 samples/sec   Loss 5.7970   LearningRate 0.0189   Epoch: 11   Global Step: 468850   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:50,244-Speed 2622.84 samples/sec   Loss 5.8318   LearningRate 0.0189   Epoch: 11   Global Step: 468860   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:54,154-Speed 2619.34 samples/sec   Loss 5.7359   LearningRate 0.0189   Epoch: 11   Global Step: 468870   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:23:58,038-Speed 2637.42 samples/sec   Loss 5.8771   LearningRate 0.0189   Epoch: 11   Global Step: 468880   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:24:01,936-Speed 2627.04 samples/sec   Loss 5.8474   LearningRate 0.0189   Epoch: 11   Global Step: 468890   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:24:05,834-Speed 2627.59 samples/sec   Loss 5.7815   LearningRate 0.0189   Epoch: 11   Global Step: 468900   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:24:09,739-Speed 2622.86 samples/sec   Loss 5.9196   LearningRate 0.0189   Epoch: 11   Global Step: 468910   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-04-15 00:24:13,626-Speed 2635.50 samples/sec   Loss 5.7891   LearningRate 0.0189   Epoch: 11   Global Step: 468920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:17,535-Speed 2620.10 samples/sec   Loss 5.7592   LearningRate 0.0189   Epoch: 11   Global Step: 468930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:21,460-Speed 2612.45 samples/sec   Loss 5.8624   LearningRate 0.0189   Epoch: 11   Global Step: 468940   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:25,354-Speed 2629.72 samples/sec   Loss 5.8273   LearningRate 0.0189   Epoch: 11   Global Step: 468950   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:29,260-Speed 2622.45 samples/sec   Loss 5.8129   LearningRate 0.0189   Epoch: 11   Global Step: 468960   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:33,166-Speed 2622.54 samples/sec   Loss 5.7334   LearningRate 0.0189   Epoch: 11   Global Step: 468970   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:37,065-Speed 2626.56 samples/sec   Loss 5.7718   LearningRate 0.0189   Epoch: 11   Global Step: 468980   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:40,981-Speed 2615.66 samples/sec   Loss 5.8988   LearningRate 0.0189   Epoch: 11   Global Step: 468990   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:44,879-Speed 2627.15 samples/sec   Loss 5.8001   LearningRate 0.0189   Epoch: 11   Global Step: 469000   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:48,778-Speed 2627.40 samples/sec   Loss 5.8401   LearningRate 0.0189   Epoch: 11   Global Step: 469010   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-04-15 00:24:52,681-Speed 2624.26 samples/sec   Loss 5.8198   LearningRate 0.0189   Epoch: 11   Global Step: 469020   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:24:56,586-Speed 2623.08 samples/sec   Loss 5.7718   LearningRate 0.0189   Epoch: 11   Global Step: 469030   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:25:00,473-Speed 2634.23 samples/sec   Loss 5.8644   LearningRate 0.0189   Epoch: 11   Global Step: 469040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:04,371-Speed 2627.69 samples/sec   Loss 5.8195   LearningRate 0.0189   Epoch: 11   Global Step: 469050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:08,268-Speed 2628.36 samples/sec   Loss 5.7621   LearningRate 0.0189   Epoch: 11   Global Step: 469060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:12,165-Speed 2628.50 samples/sec   Loss 5.8912   LearningRate 0.0189   Epoch: 11   Global Step: 469070   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:16,065-Speed 2626.39 samples/sec   Loss 5.7697   LearningRate 0.0189   Epoch: 11   Global Step: 469080   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:19,981-Speed 2615.34 samples/sec   Loss 5.7441   LearningRate 0.0189   Epoch: 11   Global Step: 469090   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:23,881-Speed 2626.48 samples/sec   Loss 5.7759   LearningRate 0.0189   Epoch: 11   Global Step: 469100   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:27,780-Speed 2626.45 samples/sec   Loss 5.8641   LearningRate 0.0189   Epoch: 11   Global Step: 469110   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:31,678-Speed 2627.49 samples/sec   Loss 5.7999   LearningRate 0.0189   Epoch: 11   Global Step: 469120   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:35,586-Speed 2621.18 samples/sec   Loss 5.7588   LearningRate 0.0189   Epoch: 11   Global Step: 469130   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:25:39,489-Speed 2623.56 samples/sec   Loss 5.8535   LearningRate 0.0189   Epoch: 11   Global Step: 469140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:25:43,390-Speed 2626.33 samples/sec   Loss 5.8720   LearningRate 0.0189   Epoch: 11   Global Step: 469150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:25:47,290-Speed 2626.56 samples/sec   Loss 5.8206   LearningRate 0.0189   Epoch: 11   Global Step: 469160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:25:51,186-Speed 2629.20 samples/sec   Loss 5.9205   LearningRate 0.0189   Epoch: 11   Global Step: 469170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:25:55,095-Speed 2619.89 samples/sec   Loss 5.7836   LearningRate 0.0189   Epoch: 11   Global Step: 469180   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:25:58,996-Speed 2625.75 samples/sec   Loss 5.7131   LearningRate 0.0189   Epoch: 11   Global Step: 469190   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:02,897-Speed 2625.61 samples/sec   Loss 5.7806   LearningRate 0.0189   Epoch: 11   Global Step: 469200   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:06,843-Speed 2595.41 samples/sec   Loss 5.8567   LearningRate 0.0189   Epoch: 11   Global Step: 469210   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:10,750-Speed 2621.22 samples/sec   Loss 5.8090   LearningRate 0.0189   Epoch: 11   Global Step: 469220   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:14,642-Speed 2632.14 samples/sec   Loss 5.8634   LearningRate 0.0189   Epoch: 11   Global Step: 469230   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:18,525-Speed 2637.42 samples/sec   Loss 5.7996   LearningRate 0.0189   Epoch: 11   Global Step: 469240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:22,425-Speed 2626.39 samples/sec   Loss 5.8999   LearningRate 0.0189   Epoch: 11   Global Step: 469250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:26,325-Speed 2626.51 samples/sec   Loss 5.8350   LearningRate 0.0189   Epoch: 11   Global Step: 469260   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:30,224-Speed 2627.27 samples/sec   Loss 5.8471   LearningRate 0.0189   Epoch: 11   Global Step: 469270   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:34,132-Speed 2620.91 samples/sec   Loss 5.8052   LearningRate 0.0189   Epoch: 11   Global Step: 469280   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:38,029-Speed 2628.22 samples/sec   Loss 5.7806   LearningRate 0.0189   Epoch: 11   Global Step: 469290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:41,933-Speed 2623.26 samples/sec   Loss 5.8597   LearningRate 0.0189   Epoch: 11   Global Step: 469300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:45,829-Speed 2628.31 samples/sec   Loss 5.7974   LearningRate 0.0189   Epoch: 11   Global Step: 469310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:49,724-Speed 2630.13 samples/sec   Loss 5.8162   LearningRate 0.0189   Epoch: 11   Global Step: 469320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:53,629-Speed 2622.94 samples/sec   Loss 5.8349   LearningRate 0.0189   Epoch: 11   Global Step: 469330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:26:57,530-Speed 2625.30 samples/sec   Loss 5.9106   LearningRate 0.0189   Epoch: 11   Global Step: 469340   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-04-15 00:27:01,413-Speed 2638.11 samples/sec   Loss 5.7605   LearningRate 0.0189   Epoch: 11   Global Step: 469350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:27:05,292-Speed 2640.46 samples/sec   Loss 5.7759   LearningRate 0.0189   Epoch: 11   Global Step: 469360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:09,211-Speed 2613.75 samples/sec   Loss 5.8192   LearningRate 0.0189   Epoch: 11   Global Step: 469370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:13,109-Speed 2626.91 samples/sec   Loss 5.8717   LearningRate 0.0189   Epoch: 11   Global Step: 469380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:17,014-Speed 2623.03 samples/sec   Loss 5.7022   LearningRate 0.0189   Epoch: 11   Global Step: 469390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:20,919-Speed 2622.89 samples/sec   Loss 5.8349   LearningRate 0.0189   Epoch: 11   Global Step: 469400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:24,816-Speed 2628.19 samples/sec   Loss 5.8721   LearningRate 0.0188   Epoch: 11   Global Step: 469410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:28,717-Speed 2625.59 samples/sec   Loss 5.7081   LearningRate 0.0188   Epoch: 11   Global Step: 469420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:32,625-Speed 2620.80 samples/sec   Loss 5.8689   LearningRate 0.0188   Epoch: 11   Global Step: 469430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:36,578-Speed 2591.33 samples/sec   Loss 5.6978   LearningRate 0.0188   Epoch: 11   Global Step: 469440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:40,491-Speed 2617.43 samples/sec   Loss 5.7591   LearningRate 0.0188   Epoch: 11   Global Step: 469450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:44,388-Speed 2628.44 samples/sec   Loss 5.8346   LearningRate 0.0188   Epoch: 11   Global Step: 469460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:27:48,298-Speed 2619.69 samples/sec   Loss 5.7717   LearningRate 0.0188   Epoch: 11   Global Step: 469470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:27:52,183-Speed 2635.67 samples/sec   Loss 5.7557   LearningRate 0.0188   Epoch: 11   Global Step: 469480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:56,082-Speed 2627.31 samples/sec   Loss 5.8925   LearningRate 0.0188   Epoch: 11   Global Step: 469490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:27:59,987-Speed 2622.49 samples/sec   Loss 5.7905   LearningRate 0.0188   Epoch: 11   Global Step: 469500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:03,894-Speed 2621.93 samples/sec   Loss 5.8097   LearningRate 0.0188   Epoch: 11   Global Step: 469510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:07,796-Speed 2624.82 samples/sec   Loss 5.8035   LearningRate 0.0188   Epoch: 11   Global Step: 469520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:11,702-Speed 2621.90 samples/sec   Loss 5.8200   LearningRate 0.0188   Epoch: 11   Global Step: 469530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:15,605-Speed 2623.90 samples/sec   Loss 5.7930   LearningRate 0.0188   Epoch: 11   Global Step: 469540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:19,504-Speed 2627.59 samples/sec   Loss 5.7997   LearningRate 0.0188   Epoch: 11   Global Step: 469550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:23,409-Speed 2623.04 samples/sec   Loss 5.7795   LearningRate 0.0188   Epoch: 11   Global Step: 469560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:27,318-Speed 2620.02 samples/sec   Loss 5.7862   LearningRate 0.0188   Epoch: 11   Global Step: 469570   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:31,212-Speed 2629.88 samples/sec   Loss 5.7744   LearningRate 0.0188   Epoch: 11   Global Step: 469580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:28:35,088-Speed 2642.63 samples/sec   Loss 5.8015   LearningRate 0.0188   Epoch: 11   Global Step: 469590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:38,986-Speed 2628.82 samples/sec   Loss 5.8349   LearningRate 0.0188   Epoch: 11   Global Step: 469600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:42,893-Speed 2621.25 samples/sec   Loss 5.8460   LearningRate 0.0188   Epoch: 11   Global Step: 469610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:46,798-Speed 2622.69 samples/sec   Loss 5.8543   LearningRate 0.0188   Epoch: 11   Global Step: 469620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:50,701-Speed 2624.49 samples/sec   Loss 5.7912   LearningRate 0.0188   Epoch: 11   Global Step: 469630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:54,599-Speed 2627.59 samples/sec   Loss 5.8334   LearningRate 0.0188   Epoch: 11   Global Step: 469640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:28:58,493-Speed 2630.52 samples/sec   Loss 5.8416   LearningRate 0.0188   Epoch: 11   Global Step: 469650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:02,398-Speed 2622.61 samples/sec   Loss 5.7691   LearningRate 0.0188   Epoch: 11   Global Step: 469660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:06,299-Speed 2625.44 samples/sec   Loss 5.8576   LearningRate 0.0188   Epoch: 11   Global Step: 469670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:10,196-Speed 2627.85 samples/sec   Loss 5.8550   LearningRate 0.0188   Epoch: 11   Global Step: 469680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:14,137-Speed 2599.59 samples/sec   Loss 5.6610   LearningRate 0.0188   Epoch: 11   Global Step: 469690   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:29:18,047-Speed 2620.02 samples/sec   Loss 5.9204   LearningRate 0.0188   Epoch: 11   Global Step: 469700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:29:21,945-Speed 2627.67 samples/sec   Loss 5.7428   LearningRate 0.0188   Epoch: 11   Global Step: 469710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:29:25,839-Speed 2630.20 samples/sec   Loss 5.7993   LearningRate 0.0188   Epoch: 11   Global Step: 469720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:29:29,722-Speed 2638.01 samples/sec   Loss 5.8757   LearningRate 0.0188   Epoch: 11   Global Step: 469730   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:33,623-Speed 2625.77 samples/sec   Loss 5.8017   LearningRate 0.0188   Epoch: 11   Global Step: 469740   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:37,518-Speed 2629.53 samples/sec   Loss 5.7650   LearningRate 0.0188   Epoch: 11   Global Step: 469750   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:41,422-Speed 2623.44 samples/sec   Loss 5.7184   LearningRate 0.0188   Epoch: 11   Global Step: 469760   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:45,314-Speed 2632.09 samples/sec   Loss 5.7610   LearningRate 0.0188   Epoch: 11   Global Step: 469770   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:49,216-Speed 2624.67 samples/sec   Loss 5.8103   LearningRate 0.0188   Epoch: 11   Global Step: 469780   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:53,118-Speed 2625.18 samples/sec   Loss 5.8054   LearningRate 0.0188   Epoch: 11   Global Step: 469790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:29:57,028-Speed 2619.46 samples/sec   Loss 5.7706   LearningRate 0.0188   Epoch: 11   Global Step: 469800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:01,010-Speed 2572.49 samples/sec   Loss 5.8092   LearningRate 0.0188   Epoch: 11   Global Step: 469810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:04,931-Speed 2611.94 samples/sec   Loss 5.7659   LearningRate 0.0188   Epoch: 11   Global Step: 469820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:08,829-Speed 2627.27 samples/sec   Loss 5.7843   LearningRate 0.0188   Epoch: 11   Global Step: 469830   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:30:12,775-Speed 2596.45 samples/sec   Loss 5.8320   LearningRate 0.0188   Epoch: 11   Global Step: 469840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:30:16,653-Speed 2641.32 samples/sec   Loss 5.8045   LearningRate 0.0188   Epoch: 11   Global Step: 469850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:20,577-Speed 2610.31 samples/sec   Loss 5.8504   LearningRate 0.0188   Epoch: 11   Global Step: 469860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:24,473-Speed 2628.97 samples/sec   Loss 5.8446   LearningRate 0.0188   Epoch: 11   Global Step: 469870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:28,365-Speed 2631.64 samples/sec   Loss 5.6886   LearningRate 0.0188   Epoch: 11   Global Step: 469880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:32,258-Speed 2631.26 samples/sec   Loss 5.6750   LearningRate 0.0188   Epoch: 11   Global Step: 469890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:36,152-Speed 2629.85 samples/sec   Loss 5.8325   LearningRate 0.0188   Epoch: 11   Global Step: 469900   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:40,047-Speed 2629.53 samples/sec   Loss 5.7491   LearningRate 0.0188   Epoch: 11   Global Step: 469910   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:43,940-Speed 2630.94 samples/sec   Loss 5.8067   LearningRate 0.0188   Epoch: 11   Global Step: 469920   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:47,838-Speed 2627.68 samples/sec   Loss 5.7446   LearningRate 0.0188   Epoch: 11   Global Step: 469930   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:51,739-Speed 2626.19 samples/sec   Loss 5.7638   LearningRate 0.0188   Epoch: 11   Global Step: 469940   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:30:55,642-Speed 2623.91 samples/sec   Loss 5.7095   LearningRate 0.0188   Epoch: 11   Global Step: 469950   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:30:59,541-Speed 2627.29 samples/sec   Loss 5.6627   LearningRate 0.0188   Epoch: 11   Global Step: 469960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:31:03,417-Speed 2642.56 samples/sec   Loss 5.8886   LearningRate 0.0188   Epoch: 11   Global Step: 469970   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:31:07,312-Speed 2629.36 samples/sec   Loss 5.8121   LearningRate 0.0188   Epoch: 11   Global Step: 469980   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:31:11,210-Speed 2627.06 samples/sec   Loss 5.7486   LearningRate 0.0188   Epoch: 11   Global Step: 469990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:31:15,106-Speed 2630.00 samples/sec   Loss 5.8653   LearningRate 0.0188   Epoch: 11   Global Step: 470000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:31:58,722-[lfw][470000]XNorm: 23.571234
Training: 2022-04-15 00:31:58,723-[lfw][470000]Accuracy-Flip: 0.99733+-0.00281
Training: 2022-04-15 00:31:58,724-[lfw][470000]Accuracy-Highest: 0.99783
Training: 2022-04-15 00:32:48,891-[cfp_fp][470000]XNorm: 21.783026
Training: 2022-04-15 00:32:48,891-[cfp_fp][470000]Accuracy-Flip: 0.98786+-0.00617
Training: 2022-04-15 00:32:48,892-[cfp_fp][470000]Accuracy-Highest: 0.98843
Training: 2022-04-15 00:33:31,960-[agedb_30][470000]XNorm: 23.653521
Training: 2022-04-15 00:33:31,961-[agedb_30][470000]Accuracy-Flip: 0.97750+-0.00720
Training: 2022-04-15 00:33:31,962-[agedb_30][470000]Accuracy-Highest: 0.97817
Training: 2022-04-15 00:33:35,845-Speed 72.76 samples/sec   Loss 5.7909   LearningRate 0.0188   Epoch: 11   Global Step: 470010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:33:39,715-Speed 2647.15 samples/sec   Loss 5.7153   LearningRate 0.0188   Epoch: 11   Global Step: 470020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:33:43,587-Speed 2645.82 samples/sec   Loss 5.8462   LearningRate 0.0188   Epoch: 11   Global Step: 470030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:33:47,473-Speed 2635.71 samples/sec   Loss 5.7933   LearningRate 0.0188   Epoch: 11   Global Step: 470040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:33:51,360-Speed 2634.83 samples/sec   Loss 5.7875   LearningRate 0.0188   Epoch: 11   Global Step: 470050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:33:55,239-Speed 2641.16 samples/sec   Loss 5.6908   LearningRate 0.0188   Epoch: 11   Global Step: 470060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:33:59,153-Speed 2617.45 samples/sec   Loss 5.7146   LearningRate 0.0188   Epoch: 11   Global Step: 470070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:34:03,049-Speed 2628.94 samples/sec   Loss 5.7852   LearningRate 0.0188   Epoch: 11   Global Step: 470080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:34:06,964-Speed 2616.52 samples/sec   Loss 5.7576   LearningRate 0.0188   Epoch: 11   Global Step: 470090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:34:10,848-Speed 2637.24 samples/sec   Loss 5.7133   LearningRate 0.0188   Epoch: 11   Global Step: 470100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:34:14,766-Speed 2613.80 samples/sec   Loss 5.7082   LearningRate 0.0188   Epoch: 11   Global Step: 470110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:34:18,667-Speed 2625.19 samples/sec   Loss 5.7105   LearningRate 0.0188   Epoch: 11   Global Step: 470120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:34:22,569-Speed 2625.66 samples/sec   Loss 5.7130   LearningRate 0.0188   Epoch: 11   Global Step: 470130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:34:26,433-Speed 2650.61 samples/sec   Loss 5.8893   LearningRate 0.0188   Epoch: 11   Global Step: 470140   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:30,337-Speed 2623.76 samples/sec   Loss 5.7082   LearningRate 0.0188   Epoch: 11   Global Step: 470150   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:34,228-Speed 2632.50 samples/sec   Loss 5.8540   LearningRate 0.0188   Epoch: 11   Global Step: 470160   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:38,118-Speed 2633.33 samples/sec   Loss 5.9184   LearningRate 0.0188   Epoch: 11   Global Step: 470170   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:42,016-Speed 2627.23 samples/sec   Loss 5.9036   LearningRate 0.0188   Epoch: 11   Global Step: 470180   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:45,932-Speed 2615.93 samples/sec   Loss 5.8447   LearningRate 0.0188   Epoch: 11   Global Step: 470190   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:49,834-Speed 2624.75 samples/sec   Loss 5.8140   LearningRate 0.0188   Epoch: 11   Global Step: 470200   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:53,729-Speed 2630.17 samples/sec   Loss 5.7316   LearningRate 0.0188   Epoch: 11   Global Step: 470210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:34:57,694-Speed 2583.23 samples/sec   Loss 5.9059   LearningRate 0.0188   Epoch: 11   Global Step: 470220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:01,591-Speed 2629.01 samples/sec   Loss 5.8376   LearningRate 0.0188   Epoch: 11   Global Step: 470230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:05,536-Speed 2601.54 samples/sec   Loss 5.8435   LearningRate 0.0188   Epoch: 11   Global Step: 470240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:35:09,430-Speed 2630.88 samples/sec   Loss 5.7662   LearningRate 0.0188   Epoch: 11   Global Step: 470250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:35:13,300-Speed 2645.93 samples/sec   Loss 5.7362   LearningRate 0.0188   Epoch: 11   Global Step: 470260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:17,201-Speed 2625.89 samples/sec   Loss 5.8045   LearningRate 0.0188   Epoch: 11   Global Step: 470270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:21,099-Speed 2628.23 samples/sec   Loss 5.8701   LearningRate 0.0188   Epoch: 11   Global Step: 470280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:25,003-Speed 2623.53 samples/sec   Loss 5.8085   LearningRate 0.0188   Epoch: 11   Global Step: 470290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:28,934-Speed 2605.64 samples/sec   Loss 5.7305   LearningRate 0.0188   Epoch: 11   Global Step: 470300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:32,838-Speed 2623.54 samples/sec   Loss 5.8141   LearningRate 0.0188   Epoch: 11   Global Step: 470310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:36,738-Speed 2626.71 samples/sec   Loss 5.8205   LearningRate 0.0188   Epoch: 11   Global Step: 470320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:40,633-Speed 2629.66 samples/sec   Loss 5.8213   LearningRate 0.0188   Epoch: 11   Global Step: 470330   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:44,532-Speed 2626.49 samples/sec   Loss 5.8579   LearningRate 0.0188   Epoch: 11   Global Step: 470340   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:48,429-Speed 2628.72 samples/sec   Loss 5.7562   LearningRate 0.0188   Epoch: 11   Global Step: 470350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:35:52,324-Speed 2629.64 samples/sec   Loss 5.7748   LearningRate 0.0188   Epoch: 11   Global Step: 470360   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:35:56,192-Speed 2647.68 samples/sec   Loss 5.8034   LearningRate 0.0187   Epoch: 11   Global Step: 470370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:00,084-Speed 2632.24 samples/sec   Loss 5.7965   LearningRate 0.0187   Epoch: 11   Global Step: 470380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:03,979-Speed 2629.60 samples/sec   Loss 5.8061   LearningRate 0.0187   Epoch: 11   Global Step: 470390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:07,867-Speed 2634.75 samples/sec   Loss 5.8233   LearningRate 0.0187   Epoch: 11   Global Step: 470400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:11,760-Speed 2630.88 samples/sec   Loss 5.7773   LearningRate 0.0187   Epoch: 11   Global Step: 470410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:15,655-Speed 2629.24 samples/sec   Loss 5.8311   LearningRate 0.0187   Epoch: 11   Global Step: 470420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:19,591-Speed 2602.12 samples/sec   Loss 5.7613   LearningRate 0.0187   Epoch: 11   Global Step: 470430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:23,482-Speed 2632.76 samples/sec   Loss 5.6981   LearningRate 0.0187   Epoch: 11   Global Step: 470440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:27,377-Speed 2630.01 samples/sec   Loss 5.8510   LearningRate 0.0187   Epoch: 11   Global Step: 470450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:31,281-Speed 2623.50 samples/sec   Loss 5.8350   LearningRate 0.0187   Epoch: 11   Global Step: 470460   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:35,188-Speed 2621.46 samples/sec   Loss 5.8191   LearningRate 0.0187   Epoch: 11   Global Step: 470470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:36:39,093-Speed 2623.40 samples/sec   Loss 5.7758   LearningRate 0.0187   Epoch: 11   Global Step: 470480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:36:43,000-Speed 2621.44 samples/sec   Loss 5.7272   LearningRate 0.0187   Epoch: 11   Global Step: 470490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:36:46,898-Speed 2627.49 samples/sec   Loss 5.8304   LearningRate 0.0187   Epoch: 11   Global Step: 470500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:36:50,775-Speed 2642.04 samples/sec   Loss 5.7452   LearningRate 0.0187   Epoch: 11   Global Step: 470510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:54,674-Speed 2626.58 samples/sec   Loss 5.9169   LearningRate 0.0187   Epoch: 11   Global Step: 470520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:36:58,605-Speed 2606.12 samples/sec   Loss 5.7375   LearningRate 0.0187   Epoch: 11   Global Step: 470530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:02,503-Speed 2627.94 samples/sec   Loss 5.8557   LearningRate 0.0187   Epoch: 11   Global Step: 470540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:06,416-Speed 2617.73 samples/sec   Loss 5.7527   LearningRate 0.0187   Epoch: 11   Global Step: 470550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:10,316-Speed 2626.14 samples/sec   Loss 5.8481   LearningRate 0.0187   Epoch: 11   Global Step: 470560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:14,215-Speed 2627.09 samples/sec   Loss 5.8721   LearningRate 0.0187   Epoch: 11   Global Step: 470570   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:18,149-Speed 2603.24 samples/sec   Loss 5.7957   LearningRate 0.0187   Epoch: 11   Global Step: 470580   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:22,088-Speed 2600.91 samples/sec   Loss 5.8943   LearningRate 0.0187   Epoch: 11   Global Step: 470590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:26,009-Speed 2612.14 samples/sec   Loss 5.8342   LearningRate 0.0187   Epoch: 11   Global Step: 470600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:29,920-Speed 2619.46 samples/sec   Loss 5.8777   LearningRate 0.0187   Epoch: 11   Global Step: 470610   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:37:33,816-Speed 2629.05 samples/sec   Loss 5.8103   LearningRate 0.0187   Epoch: 11   Global Step: 470620   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:37:37,695-Speed 2640.60 samples/sec   Loss 5.7515   LearningRate 0.0187   Epoch: 11   Global Step: 470630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:41,641-Speed 2595.56 samples/sec   Loss 5.7218   LearningRate 0.0187   Epoch: 11   Global Step: 470640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:45,541-Speed 2626.28 samples/sec   Loss 5.7206   LearningRate 0.0187   Epoch: 11   Global Step: 470650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:49,468-Speed 2608.48 samples/sec   Loss 5.8645   LearningRate 0.0187   Epoch: 11   Global Step: 470660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:53,363-Speed 2629.68 samples/sec   Loss 5.8214   LearningRate 0.0187   Epoch: 11   Global Step: 470670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:37:57,256-Speed 2630.99 samples/sec   Loss 5.7946   LearningRate 0.0187   Epoch: 11   Global Step: 470680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:01,152-Speed 2629.60 samples/sec   Loss 5.7021   LearningRate 0.0187   Epoch: 11   Global Step: 470690   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:05,049-Speed 2627.71 samples/sec   Loss 5.7674   LearningRate 0.0187   Epoch: 11   Global Step: 470700   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:08,963-Speed 2617.33 samples/sec   Loss 5.8250   LearningRate 0.0187   Epoch: 11   Global Step: 470710   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:12,858-Speed 2629.33 samples/sec   Loss 5.8489   LearningRate 0.0187   Epoch: 11   Global Step: 470720   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:16,750-Speed 2631.84 samples/sec   Loss 5.7652   LearningRate 0.0187   Epoch: 11   Global Step: 470730   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:38:20,643-Speed 2630.71 samples/sec   Loss 5.7407   LearningRate 0.0187   Epoch: 11   Global Step: 470740   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:38:24,535-Speed 2631.39 samples/sec   Loss 5.8404   LearningRate 0.0187   Epoch: 11   Global Step: 470750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:38:28,426-Speed 2632.76 samples/sec   Loss 5.8447   LearningRate 0.0187   Epoch: 11   Global Step: 470760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:38:32,323-Speed 2628.56 samples/sec   Loss 5.7291   LearningRate 0.0187   Epoch: 11   Global Step: 470770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:38:36,217-Speed 2630.03 samples/sec   Loss 5.8002   LearningRate 0.0187   Epoch: 11   Global Step: 470780   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:38:40,097-Speed 2639.72 samples/sec   Loss 5.7480   LearningRate 0.0187   Epoch: 11   Global Step: 470790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:44,025-Speed 2607.43 samples/sec   Loss 5.8628   LearningRate 0.0187   Epoch: 11   Global Step: 470800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:47,923-Speed 2628.20 samples/sec   Loss 5.7057   LearningRate 0.0187   Epoch: 11   Global Step: 470810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:51,819-Speed 2628.82 samples/sec   Loss 5.9126   LearningRate 0.0187   Epoch: 11   Global Step: 470820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:55,711-Speed 2631.59 samples/sec   Loss 5.7347   LearningRate 0.0187   Epoch: 11   Global Step: 470830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:38:59,607-Speed 2629.08 samples/sec   Loss 5.8270   LearningRate 0.0187   Epoch: 11   Global Step: 470840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:03,525-Speed 2614.80 samples/sec   Loss 5.8588   LearningRate 0.0187   Epoch: 11   Global Step: 470850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:07,419-Speed 2630.23 samples/sec   Loss 5.8187   LearningRate 0.0187   Epoch: 11   Global Step: 470860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:11,318-Speed 2626.92 samples/sec   Loss 5.7033   LearningRate 0.0187   Epoch: 11   Global Step: 470870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:15,217-Speed 2627.46 samples/sec   Loss 5.9110   LearningRate 0.0187   Epoch: 11   Global Step: 470880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:19,119-Speed 2624.81 samples/sec   Loss 5.7206   LearningRate 0.0187   Epoch: 11   Global Step: 470890   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:39:23,031-Speed 2618.49 samples/sec   Loss 5.7898   LearningRate 0.0187   Epoch: 11   Global Step: 470900   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:39:26,925-Speed 2630.51 samples/sec   Loss 5.8265   LearningRate 0.0187   Epoch: 11   Global Step: 470910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:39:30,819-Speed 2629.60 samples/sec   Loss 5.8256   LearningRate 0.0187   Epoch: 11   Global Step: 470920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:39:34,723-Speed 2623.48 samples/sec   Loss 5.7945   LearningRate 0.0187   Epoch: 11   Global Step: 470930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:39:38,623-Speed 2626.61 samples/sec   Loss 5.8412   LearningRate 0.0187   Epoch: 11   Global Step: 470940   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:39:42,493-Speed 2646.74 samples/sec   Loss 5.7888   LearningRate 0.0187   Epoch: 11   Global Step: 470950   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:46,383-Speed 2633.29 samples/sec   Loss 5.8050   LearningRate 0.0187   Epoch: 11   Global Step: 470960   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:50,290-Speed 2621.32 samples/sec   Loss 5.7931   LearningRate 0.0187   Epoch: 11   Global Step: 470970   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:54,188-Speed 2627.12 samples/sec   Loss 5.8263   LearningRate 0.0187   Epoch: 11   Global Step: 470980   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:39:58,088-Speed 2626.41 samples/sec   Loss 5.7869   LearningRate 0.0187   Epoch: 11   Global Step: 470990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:40:01,981-Speed 2631.11 samples/sec   Loss 5.7808   LearningRate 0.0187   Epoch: 11   Global Step: 471000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:40:05,886-Speed 2622.73 samples/sec   Loss 5.8193   LearningRate 0.0187   Epoch: 11   Global Step: 471010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:40:09,784-Speed 2627.31 samples/sec   Loss 5.7426   LearningRate 0.0187   Epoch: 11   Global Step: 471020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:40:13,680-Speed 2629.20 samples/sec   Loss 5.8271   LearningRate 0.0187   Epoch: 11   Global Step: 471030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:40:17,591-Speed 2619.15 samples/sec   Loss 5.7715   LearningRate 0.0187   Epoch: 11   Global Step: 471040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:40:21,488-Speed 2628.56 samples/sec   Loss 5.7929   LearningRate 0.0187   Epoch: 11   Global Step: 471050   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:25,393-Speed 2622.27 samples/sec   Loss 5.6498   LearningRate 0.0187   Epoch: 11   Global Step: 471060   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:29,287-Speed 2630.40 samples/sec   Loss 5.8589   LearningRate 0.0187   Epoch: 11   Global Step: 471070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:33,187-Speed 2626.42 samples/sec   Loss 5.8191   LearningRate 0.0187   Epoch: 11   Global Step: 471080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:37,088-Speed 2625.30 samples/sec   Loss 5.8537   LearningRate 0.0187   Epoch: 11   Global Step: 471090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:40,980-Speed 2631.44 samples/sec   Loss 5.7775   LearningRate 0.0187   Epoch: 11   Global Step: 471100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:44,903-Speed 2611.15 samples/sec   Loss 5.7208   LearningRate 0.0187   Epoch: 11   Global Step: 471110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:48,828-Speed 2609.29 samples/sec   Loss 5.7718   LearningRate 0.0187   Epoch: 11   Global Step: 471120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:52,732-Speed 2624.25 samples/sec   Loss 5.7723   LearningRate 0.0187   Epoch: 11   Global Step: 471130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:40:56,626-Speed 2630.12 samples/sec   Loss 5.7358   LearningRate 0.0187   Epoch: 11   Global Step: 471140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:41:00,509-Speed 2637.38 samples/sec   Loss 5.8573   LearningRate 0.0187   Epoch: 11   Global Step: 471150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:41:04,421-Speed 2618.05 samples/sec   Loss 5.6513   LearningRate 0.0187   Epoch: 11   Global Step: 471160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:41:08,326-Speed 2623.30 samples/sec   Loss 5.6735   LearningRate 0.0187   Epoch: 11   Global Step: 471170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:41:12,223-Speed 2627.86 samples/sec   Loss 5.8220   LearningRate 0.0187   Epoch: 11   Global Step: 471180   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:41:16,100-Speed 2641.44 samples/sec   Loss 5.7528   LearningRate 0.0187   Epoch: 11   Global Step: 471190   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:20,088-Speed 2568.42 samples/sec   Loss 5.8249   LearningRate 0.0187   Epoch: 11   Global Step: 471200   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:23,991-Speed 2624.05 samples/sec   Loss 5.6949   LearningRate 0.0187   Epoch: 11   Global Step: 471210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:27,888-Speed 2628.47 samples/sec   Loss 5.8525   LearningRate 0.0187   Epoch: 11   Global Step: 471220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:31,782-Speed 2630.51 samples/sec   Loss 5.9149   LearningRate 0.0187   Epoch: 11   Global Step: 471230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:35,680-Speed 2627.64 samples/sec   Loss 5.7454   LearningRate 0.0187   Epoch: 11   Global Step: 471240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:39,576-Speed 2629.24 samples/sec   Loss 5.6754   LearningRate 0.0187   Epoch: 11   Global Step: 471250   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:43,474-Speed 2627.52 samples/sec   Loss 5.6764   LearningRate 0.0187   Epoch: 11   Global Step: 471260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:47,371-Speed 2628.08 samples/sec   Loss 5.7659   LearningRate 0.0187   Epoch: 11   Global Step: 471270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:51,266-Speed 2629.88 samples/sec   Loss 5.7763   LearningRate 0.0187   Epoch: 11   Global Step: 471280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:41:55,166-Speed 2625.68 samples/sec   Loss 5.8241   LearningRate 0.0187   Epoch: 11   Global Step: 471290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:41:59,075-Speed 2620.38 samples/sec   Loss 5.7338   LearningRate 0.0187   Epoch: 11   Global Step: 471300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:42:02,983-Speed 2628.10 samples/sec   Loss 5.8107   LearningRate 0.0187   Epoch: 11   Global Step: 471310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:42:06,891-Speed 2621.20 samples/sec   Loss 5.7798   LearningRate 0.0187   Epoch: 11   Global Step: 471320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:42:10,791-Speed 2626.26 samples/sec   Loss 5.7301   LearningRate 0.0186   Epoch: 11   Global Step: 471330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:42:14,660-Speed 2647.46 samples/sec   Loss 5.8171   LearningRate 0.0186   Epoch: 11   Global Step: 471340   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:18,550-Speed 2632.85 samples/sec   Loss 5.8682   LearningRate 0.0186   Epoch: 11   Global Step: 471350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:22,448-Speed 2627.72 samples/sec   Loss 5.7629   LearningRate 0.0186   Epoch: 11   Global Step: 471360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:26,346-Speed 2627.63 samples/sec   Loss 5.7548   LearningRate 0.0186   Epoch: 11   Global Step: 471370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:30,246-Speed 2626.05 samples/sec   Loss 5.7625   LearningRate 0.0186   Epoch: 11   Global Step: 471380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:34,168-Speed 2611.58 samples/sec   Loss 5.7422   LearningRate 0.0186   Epoch: 11   Global Step: 471390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:38,080-Speed 2617.90 samples/sec   Loss 5.7820   LearningRate 0.0186   Epoch: 11   Global Step: 471400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:41,974-Speed 2630.45 samples/sec   Loss 5.8317   LearningRate 0.0186   Epoch: 11   Global Step: 471410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:45,867-Speed 2631.26 samples/sec   Loss 5.8588   LearningRate 0.0186   Epoch: 11   Global Step: 471420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:49,761-Speed 2630.43 samples/sec   Loss 5.7617   LearningRate 0.0186   Epoch: 11   Global Step: 471430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:42:53,657-Speed 2628.61 samples/sec   Loss 5.6840   LearningRate 0.0186   Epoch: 11   Global Step: 471440   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:42:57,549-Speed 2631.87 samples/sec   Loss 5.7235   LearningRate 0.0186   Epoch: 11   Global Step: 471450   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:43:01,446-Speed 2627.97 samples/sec   Loss 5.8358   LearningRate 0.0186   Epoch: 11   Global Step: 471460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:43:05,324-Speed 2641.14 samples/sec   Loss 5.7624   LearningRate 0.0186   Epoch: 11   Global Step: 471470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:09,234-Speed 2619.51 samples/sec   Loss 5.7010   LearningRate 0.0186   Epoch: 11   Global Step: 471480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:13,130-Speed 2629.47 samples/sec   Loss 5.8047   LearningRate 0.0186   Epoch: 11   Global Step: 471490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:17,029-Speed 2626.88 samples/sec   Loss 5.7783   LearningRate 0.0186   Epoch: 11   Global Step: 471500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:20,928-Speed 2627.24 samples/sec   Loss 5.8377   LearningRate 0.0186   Epoch: 11   Global Step: 471510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:24,863-Speed 2602.34 samples/sec   Loss 5.8335   LearningRate 0.0186   Epoch: 11   Global Step: 471520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:28,755-Speed 2632.38 samples/sec   Loss 5.6835   LearningRate 0.0186   Epoch: 11   Global Step: 471530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:32,650-Speed 2629.77 samples/sec   Loss 5.7121   LearningRate 0.0186   Epoch: 11   Global Step: 471540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:36,547-Speed 2627.79 samples/sec   Loss 5.6951   LearningRate 0.0186   Epoch: 11   Global Step: 471550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:40,441-Speed 2630.24 samples/sec   Loss 5.7133   LearningRate 0.0186   Epoch: 11   Global Step: 471560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:44,335-Speed 2631.20 samples/sec   Loss 5.7417   LearningRate 0.0186   Epoch: 11   Global Step: 471570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:43:48,231-Speed 2629.04 samples/sec   Loss 5.7276   LearningRate 0.0186   Epoch: 11   Global Step: 471580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:43:52,114-Speed 2637.31 samples/sec   Loss 5.7730   LearningRate 0.0186   Epoch: 11   Global Step: 471590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:56,008-Speed 2630.86 samples/sec   Loss 5.8480   LearningRate 0.0186   Epoch: 11   Global Step: 471600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:43:59,912-Speed 2623.51 samples/sec   Loss 5.7351   LearningRate 0.0186   Epoch: 11   Global Step: 471610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:03,825-Speed 2617.19 samples/sec   Loss 5.7795   LearningRate 0.0186   Epoch: 11   Global Step: 471620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:07,732-Speed 2621.90 samples/sec   Loss 5.7055   LearningRate 0.0186   Epoch: 11   Global Step: 471630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:11,674-Speed 2598.45 samples/sec   Loss 5.8706   LearningRate 0.0186   Epoch: 11   Global Step: 471640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:15,655-Speed 2572.89 samples/sec   Loss 5.8145   LearningRate 0.0186   Epoch: 11   Global Step: 471650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:19,553-Speed 2627.50 samples/sec   Loss 5.8489   LearningRate 0.0186   Epoch: 11   Global Step: 471660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:23,455-Speed 2626.01 samples/sec   Loss 5.8148   LearningRate 0.0186   Epoch: 11   Global Step: 471670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:27,348-Speed 2631.10 samples/sec   Loss 5.7732   LearningRate 0.0186   Epoch: 11   Global Step: 471680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:31,242-Speed 2630.01 samples/sec   Loss 5.7894   LearningRate 0.0186   Epoch: 11   Global Step: 471690   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:44:35,137-Speed 2629.60 samples/sec   Loss 5.7180   LearningRate 0.0186   Epoch: 11   Global Step: 471700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:44:39,061-Speed 2610.35 samples/sec   Loss 5.8348   LearningRate 0.0186   Epoch: 11   Global Step: 471710   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:42,959-Speed 2627.92 samples/sec   Loss 5.7306   LearningRate 0.0186   Epoch: 11   Global Step: 471720   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:46,906-Speed 2595.09 samples/sec   Loss 5.7741   LearningRate 0.0186   Epoch: 11   Global Step: 471730   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:50,818-Speed 2618.43 samples/sec   Loss 5.6723   LearningRate 0.0186   Epoch: 11   Global Step: 471740   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:54,715-Speed 2628.29 samples/sec   Loss 5.6341   LearningRate 0.0186   Epoch: 11   Global Step: 471750   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:44:58,632-Speed 2615.25 samples/sec   Loss 5.7421   LearningRate 0.0186   Epoch: 11   Global Step: 471760   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:45:02,540-Speed 2620.58 samples/sec   Loss 5.7681   LearningRate 0.0186   Epoch: 11   Global Step: 471770   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:45:06,439-Speed 2627.04 samples/sec   Loss 5.7180   LearningRate 0.0186   Epoch: 11   Global Step: 471780   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:45:10,355-Speed 2615.46 samples/sec   Loss 5.8153   LearningRate 0.0186   Epoch: 11   Global Step: 471790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:45:14,269-Speed 2616.71 samples/sec   Loss 5.7241   LearningRate 0.0186   Epoch: 11   Global Step: 471800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:45:18,162-Speed 2630.99 samples/sec   Loss 5.6661   LearningRate 0.0186   Epoch: 11   Global Step: 471810   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:22,123-Speed 2586.51 samples/sec   Loss 5.8484   LearningRate 0.0186   Epoch: 11   Global Step: 471820   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:26,022-Speed 2626.80 samples/sec   Loss 5.7559   LearningRate 0.0186   Epoch: 11   Global Step: 471830   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:29,932-Speed 2619.37 samples/sec   Loss 5.7939   LearningRate 0.0186   Epoch: 11   Global Step: 471840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:33,830-Speed 2627.79 samples/sec   Loss 5.7949   LearningRate 0.0186   Epoch: 11   Global Step: 471850   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:37,748-Speed 2614.42 samples/sec   Loss 5.7854   LearningRate 0.0186   Epoch: 11   Global Step: 471860   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:41,647-Speed 2626.58 samples/sec   Loss 5.7882   LearningRate 0.0186   Epoch: 11   Global Step: 471870   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:45,547-Speed 2627.11 samples/sec   Loss 5.6469   LearningRate 0.0186   Epoch: 11   Global Step: 471880   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:49,442-Speed 2629.77 samples/sec   Loss 5.7145   LearningRate 0.0186   Epoch: 11   Global Step: 471890   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:53,340-Speed 2628.37 samples/sec   Loss 5.7172   LearningRate 0.0186   Epoch: 11   Global Step: 471900   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:45:57,214-Speed 2643.74 samples/sec   Loss 5.7874   LearningRate 0.0186   Epoch: 11   Global Step: 471910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:01,110-Speed 2629.23 samples/sec   Loss 5.7900   LearningRate 0.0186   Epoch: 11   Global Step: 471920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:05,002-Speed 2631.44 samples/sec   Loss 5.7832   LearningRate 0.0186   Epoch: 11   Global Step: 471930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:08,895-Speed 2632.10 samples/sec   Loss 5.7117   LearningRate 0.0186   Epoch: 11   Global Step: 471940   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:12,793-Speed 2627.71 samples/sec   Loss 5.7754   LearningRate 0.0186   Epoch: 11   Global Step: 471950   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:16,688-Speed 2629.69 samples/sec   Loss 5.6696   LearningRate 0.0186   Epoch: 11   Global Step: 471960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:20,588-Speed 2626.17 samples/sec   Loss 5.8059   LearningRate 0.0186   Epoch: 11   Global Step: 471970   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:24,483-Speed 2629.94 samples/sec   Loss 5.7363   LearningRate 0.0186   Epoch: 11   Global Step: 471980   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:28,374-Speed 2632.11 samples/sec   Loss 5.8496   LearningRate 0.0186   Epoch: 11   Global Step: 471990   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:46:32,244-Speed 2646.32 samples/sec   Loss 5.8370   LearningRate 0.0186   Epoch: 11   Global Step: 472000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:46:36,151-Speed 2621.63 samples/sec   Loss 5.7601   LearningRate 0.0186   Epoch: 11   Global Step: 472010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:46:40,081-Speed 2606.62 samples/sec   Loss 5.9388   LearningRate 0.0186   Epoch: 11   Global Step: 472020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:46:43,978-Speed 2628.47 samples/sec   Loss 5.6813   LearningRate 0.0186   Epoch: 11   Global Step: 472030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:46:47,875-Speed 2628.70 samples/sec   Loss 5.7236   LearningRate 0.0186   Epoch: 11   Global Step: 472040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:46:51,780-Speed 2623.03 samples/sec   Loss 5.7295   LearningRate 0.0186   Epoch: 11   Global Step: 472050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:46:55,681-Speed 2625.54 samples/sec   Loss 5.8198   LearningRate 0.0186   Epoch: 11   Global Step: 472060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:46:59,622-Speed 2598.71 samples/sec   Loss 5.7485   LearningRate 0.0186   Epoch: 11   Global Step: 472070   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:47:03,518-Speed 2629.28 samples/sec   Loss 5.8389   LearningRate 0.0186   Epoch: 11   Global Step: 472080   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:47:07,446-Speed 2607.71 samples/sec   Loss 5.6527   LearningRate 0.0186   Epoch: 11   Global Step: 472090   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:47:11,343-Speed 2628.38 samples/sec   Loss 5.7400   LearningRate 0.0186   Epoch: 11   Global Step: 472100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:15,241-Speed 2627.07 samples/sec   Loss 5.7021   LearningRate 0.0186   Epoch: 11   Global Step: 472110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:19,144-Speed 2625.05 samples/sec   Loss 5.7643   LearningRate 0.0186   Epoch: 11   Global Step: 472120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:23,038-Speed 2629.82 samples/sec   Loss 5.8283   LearningRate 0.0186   Epoch: 11   Global Step: 472130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:26,932-Speed 2630.74 samples/sec   Loss 5.8021   LearningRate 0.0186   Epoch: 11   Global Step: 472140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:30,825-Speed 2631.04 samples/sec   Loss 5.6945   LearningRate 0.0186   Epoch: 11   Global Step: 472150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:34,722-Speed 2628.44 samples/sec   Loss 5.6962   LearningRate 0.0186   Epoch: 11   Global Step: 472160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:38,620-Speed 2627.35 samples/sec   Loss 5.7174   LearningRate 0.0186   Epoch: 11   Global Step: 472170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:42,521-Speed 2626.37 samples/sec   Loss 5.7837   LearningRate 0.0186   Epoch: 11   Global Step: 472180   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:47:46,401-Speed 2639.18 samples/sec   Loss 5.8364   LearningRate 0.0186   Epoch: 11   Global Step: 472190   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:47:50,304-Speed 2624.78 samples/sec   Loss 5.6361   LearningRate 0.0186   Epoch: 11   Global Step: 472200   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:47:54,213-Speed 2619.69 samples/sec   Loss 5.8134   LearningRate 0.0186   Epoch: 11   Global Step: 472210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:47:58,111-Speed 2628.06 samples/sec   Loss 5.7331   LearningRate 0.0186   Epoch: 11   Global Step: 472220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:02,004-Speed 2630.93 samples/sec   Loss 5.9344   LearningRate 0.0186   Epoch: 11   Global Step: 472230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:05,899-Speed 2629.72 samples/sec   Loss 5.8180   LearningRate 0.0186   Epoch: 11   Global Step: 472240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:09,808-Speed 2619.99 samples/sec   Loss 5.7241   LearningRate 0.0186   Epoch: 11   Global Step: 472250   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:13,712-Speed 2624.22 samples/sec   Loss 5.8318   LearningRate 0.0186   Epoch: 11   Global Step: 472260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:17,625-Speed 2617.23 samples/sec   Loss 5.7190   LearningRate 0.0186   Epoch: 11   Global Step: 472270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:21,575-Speed 2593.70 samples/sec   Loss 5.7364   LearningRate 0.0186   Epoch: 11   Global Step: 472280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:25,476-Speed 2625.77 samples/sec   Loss 5.7604   LearningRate 0.0185   Epoch: 11   Global Step: 472290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:48:29,383-Speed 2622.05 samples/sec   Loss 5.7686   LearningRate 0.0185   Epoch: 11   Global Step: 472300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:48:33,270-Speed 2634.45 samples/sec   Loss 5.6565   LearningRate 0.0185   Epoch: 11   Global Step: 472310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:37,192-Speed 2611.23 samples/sec   Loss 5.7005   LearningRate 0.0185   Epoch: 11   Global Step: 472320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:41,095-Speed 2624.72 samples/sec   Loss 5.7576   LearningRate 0.0185   Epoch: 11   Global Step: 472330   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:44,988-Speed 2631.65 samples/sec   Loss 5.7785   LearningRate 0.0185   Epoch: 11   Global Step: 472340   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:48,887-Speed 2626.83 samples/sec   Loss 5.7198   LearningRate 0.0185   Epoch: 11   Global Step: 472350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:52,794-Speed 2621.74 samples/sec   Loss 5.7456   LearningRate 0.0185   Epoch: 11   Global Step: 472360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:48:56,719-Speed 2609.37 samples/sec   Loss 5.7714   LearningRate 0.0185   Epoch: 11   Global Step: 472370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:49:00,614-Speed 2629.76 samples/sec   Loss 5.7699   LearningRate 0.0185   Epoch: 11   Global Step: 472380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:49:04,517-Speed 2624.03 samples/sec   Loss 5.8010   LearningRate 0.0185   Epoch: 11   Global Step: 472390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:49:08,430-Speed 2617.68 samples/sec   Loss 5.9077   LearningRate 0.0185   Epoch: 11   Global Step: 472400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:49:12,332-Speed 2625.22 samples/sec   Loss 5.8261   LearningRate 0.0185   Epoch: 11   Global Step: 472410   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:16,240-Speed 2620.19 samples/sec   Loss 5.7743   LearningRate 0.0185   Epoch: 11   Global Step: 472420   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:20,147-Speed 2621.93 samples/sec   Loss 5.7196   LearningRate 0.0185   Epoch: 11   Global Step: 472430   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:24,053-Speed 2622.37 samples/sec   Loss 5.7176   LearningRate 0.0185   Epoch: 11   Global Step: 472440   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:27,953-Speed 2626.11 samples/sec   Loss 5.8087   LearningRate 0.0185   Epoch: 11   Global Step: 472450   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:31,849-Speed 2628.66 samples/sec   Loss 5.7716   LearningRate 0.0185   Epoch: 11   Global Step: 472460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:35,743-Speed 2630.77 samples/sec   Loss 5.8117   LearningRate 0.0185   Epoch: 11   Global Step: 472470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:39,657-Speed 2616.84 samples/sec   Loss 5.7836   LearningRate 0.0185   Epoch: 11   Global Step: 472480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:43,596-Speed 2600.13 samples/sec   Loss 5.7500   LearningRate 0.0185   Epoch: 11   Global Step: 472490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:47,508-Speed 2618.26 samples/sec   Loss 5.8277   LearningRate 0.0185   Epoch: 11   Global Step: 472500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:51,384-Speed 2642.67 samples/sec   Loss 5.7275   LearningRate 0.0185   Epoch: 11   Global Step: 472510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:55,278-Speed 2629.61 samples/sec   Loss 5.7763   LearningRate 0.0185   Epoch: 11   Global Step: 472520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:49:59,174-Speed 2629.43 samples/sec   Loss 5.7648   LearningRate 0.0185   Epoch: 11   Global Step: 472530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:50:03,070-Speed 2628.79 samples/sec   Loss 5.7055   LearningRate 0.0185   Epoch: 11   Global Step: 472540   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:50:06,965-Speed 2629.34 samples/sec   Loss 5.8050   LearningRate 0.0185   Epoch: 11   Global Step: 472550   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:50:10,862-Speed 2628.16 samples/sec   Loss 5.8111   LearningRate 0.0185   Epoch: 11   Global Step: 472560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:50:14,756-Speed 2630.79 samples/sec   Loss 5.6841   LearningRate 0.0185   Epoch: 11   Global Step: 472570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:50:18,655-Speed 2626.97 samples/sec   Loss 5.7443   LearningRate 0.0185   Epoch: 11   Global Step: 472580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:50:22,556-Speed 2625.58 samples/sec   Loss 5.7173   LearningRate 0.0185   Epoch: 11   Global Step: 472590   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:50:26,428-Speed 2644.80 samples/sec   Loss 5.7073   LearningRate 0.0185   Epoch: 11   Global Step: 472600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:30,322-Speed 2630.36 samples/sec   Loss 5.7129   LearningRate 0.0185   Epoch: 11   Global Step: 472610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:34,220-Speed 2627.78 samples/sec   Loss 5.8000   LearningRate 0.0185   Epoch: 11   Global Step: 472620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:38,118-Speed 2627.33 samples/sec   Loss 5.7820   LearningRate 0.0185   Epoch: 11   Global Step: 472630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:42,013-Speed 2629.68 samples/sec   Loss 5.7747   LearningRate 0.0185   Epoch: 11   Global Step: 472640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:45,908-Speed 2629.46 samples/sec   Loss 5.7727   LearningRate 0.0185   Epoch: 11   Global Step: 472650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:49,815-Speed 2621.95 samples/sec   Loss 5.7847   LearningRate 0.0185   Epoch: 11   Global Step: 472660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:53,712-Speed 2628.34 samples/sec   Loss 5.7553   LearningRate 0.0185   Epoch: 11   Global Step: 472670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:50:57,626-Speed 2616.55 samples/sec   Loss 5.7554   LearningRate 0.0185   Epoch: 11   Global Step: 472680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:01,523-Speed 2628.28 samples/sec   Loss 5.7333   LearningRate 0.0185   Epoch: 11   Global Step: 472690   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:05,425-Speed 2625.66 samples/sec   Loss 5.7870   LearningRate 0.0185   Epoch: 11   Global Step: 472700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:51:09,296-Speed 2645.88 samples/sec   Loss 5.8218   LearningRate 0.0185   Epoch: 11   Global Step: 472710   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:13,196-Speed 2626.16 samples/sec   Loss 5.6646   LearningRate 0.0185   Epoch: 11   Global Step: 472720   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:17,102-Speed 2621.93 samples/sec   Loss 5.6499   LearningRate 0.0185   Epoch: 11   Global Step: 472730   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:20,996-Speed 2630.22 samples/sec   Loss 5.7705   LearningRate 0.0185   Epoch: 11   Global Step: 472740   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:24,894-Speed 2627.38 samples/sec   Loss 5.8804   LearningRate 0.0185   Epoch: 11   Global Step: 472750   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:28,791-Speed 2628.42 samples/sec   Loss 5.7373   LearningRate 0.0185   Epoch: 11   Global Step: 472760   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:32,686-Speed 2630.20 samples/sec   Loss 5.7661   LearningRate 0.0185   Epoch: 11   Global Step: 472770   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:36,581-Speed 2629.43 samples/sec   Loss 5.8005   LearningRate 0.0185   Epoch: 11   Global Step: 472780   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:40,479-Speed 2627.99 samples/sec   Loss 5.7639   LearningRate 0.0185   Epoch: 11   Global Step: 472790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:44,375-Speed 2628.80 samples/sec   Loss 5.6965   LearningRate 0.0185   Epoch: 11   Global Step: 472800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:48,242-Speed 2648.28 samples/sec   Loss 5.7496   LearningRate 0.0185   Epoch: 11   Global Step: 472810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:52,145-Speed 2624.53 samples/sec   Loss 5.7593   LearningRate 0.0185   Epoch: 11   Global Step: 472820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:56,043-Speed 2626.87 samples/sec   Loss 5.7504   LearningRate 0.0185   Epoch: 11   Global Step: 472830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:51:59,955-Speed 2618.46 samples/sec   Loss 5.6813   LearningRate 0.0185   Epoch: 11   Global Step: 472840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:52:03,855-Speed 2625.97 samples/sec   Loss 5.5505   LearningRate 0.0185   Epoch: 11   Global Step: 472850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:52:07,754-Speed 2627.08 samples/sec   Loss 5.8792   LearningRate 0.0185   Epoch: 11   Global Step: 472860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:52:11,659-Speed 2623.14 samples/sec   Loss 5.6960   LearningRate 0.0185   Epoch: 11   Global Step: 472870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:52:15,567-Speed 2620.67 samples/sec   Loss 5.7448   LearningRate 0.0185   Epoch: 11   Global Step: 472880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:52:19,463-Speed 2629.07 samples/sec   Loss 5.7692   LearningRate 0.0185   Epoch: 11   Global Step: 472890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:52:23,357-Speed 2630.60 samples/sec   Loss 5.7535   LearningRate 0.0185   Epoch: 11   Global Step: 472900   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:52:27,268-Speed 2618.29 samples/sec   Loss 5.8340   LearningRate 0.0185   Epoch: 11   Global Step: 472910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:31,170-Speed 2624.75 samples/sec   Loss 5.7151   LearningRate 0.0185   Epoch: 11   Global Step: 472920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:35,069-Speed 2627.49 samples/sec   Loss 5.7589   LearningRate 0.0185   Epoch: 11   Global Step: 472930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:38,970-Speed 2625.63 samples/sec   Loss 5.7620   LearningRate 0.0185   Epoch: 11   Global Step: 472940   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:42,863-Speed 2630.92 samples/sec   Loss 5.7966   LearningRate 0.0185   Epoch: 11   Global Step: 472950   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:46,756-Speed 2630.94 samples/sec   Loss 5.8070   LearningRate 0.0185   Epoch: 11   Global Step: 472960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:50,649-Speed 2631.00 samples/sec   Loss 5.6922   LearningRate 0.0185   Epoch: 11   Global Step: 472970   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:54,555-Speed 2622.38 samples/sec   Loss 5.7162   LearningRate 0.0185   Epoch: 11   Global Step: 472980   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:52:58,452-Speed 2628.35 samples/sec   Loss 5.7617   LearningRate 0.0185   Epoch: 11   Global Step: 472990   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:53:02,350-Speed 2627.39 samples/sec   Loss 5.6266   LearningRate 0.0185   Epoch: 11   Global Step: 473000   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:53:06,246-Speed 2628.86 samples/sec   Loss 5.7666   LearningRate 0.0185   Epoch: 11   Global Step: 473010   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-04-15 00:53:10,126-Speed 2639.81 samples/sec   Loss 5.7952   LearningRate 0.0185   Epoch: 11   Global Step: 473020   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:53:14,032-Speed 2622.00 samples/sec   Loss 5.7205   LearningRate 0.0185   Epoch: 11   Global Step: 473030   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:53:17,908-Speed 2642.93 samples/sec   Loss 5.7032   LearningRate 0.0185   Epoch: 11   Global Step: 473040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:21,802-Speed 2630.25 samples/sec   Loss 5.7311   LearningRate 0.0185   Epoch: 11   Global Step: 473050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:25,701-Speed 2627.12 samples/sec   Loss 5.7759   LearningRate 0.0185   Epoch: 11   Global Step: 473060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:29,598-Speed 2628.59 samples/sec   Loss 5.7601   LearningRate 0.0185   Epoch: 11   Global Step: 473070   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:33,503-Speed 2622.60 samples/sec   Loss 5.7247   LearningRate 0.0185   Epoch: 11   Global Step: 473080   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:37,398-Speed 2629.20 samples/sec   Loss 5.7774   LearningRate 0.0185   Epoch: 11   Global Step: 473090   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:41,295-Speed 2628.04 samples/sec   Loss 5.6692   LearningRate 0.0185   Epoch: 11   Global Step: 473100   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:45,195-Speed 2626.52 samples/sec   Loss 5.6858   LearningRate 0.0185   Epoch: 11   Global Step: 473110   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:49,105-Speed 2619.71 samples/sec   Loss 5.7694   LearningRate 0.0185   Epoch: 11   Global Step: 473120   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:53:52,997-Speed 2631.50 samples/sec   Loss 5.7131   LearningRate 0.0185   Epoch: 11   Global Step: 473130   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:53:56,905-Speed 2621.12 samples/sec   Loss 5.7602   LearningRate 0.0185   Epoch: 11   Global Step: 473140   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:00,807-Speed 2624.55 samples/sec   Loss 5.8276   LearningRate 0.0185   Epoch: 11   Global Step: 473150   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:04,747-Speed 2599.78 samples/sec   Loss 5.6739   LearningRate 0.0185   Epoch: 11   Global Step: 473160   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:08,644-Speed 2628.28 samples/sec   Loss 5.7404   LearningRate 0.0185   Epoch: 11   Global Step: 473170   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:12,567-Speed 2611.31 samples/sec   Loss 5.7973   LearningRate 0.0185   Epoch: 11   Global Step: 473180   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:16,538-Speed 2578.74 samples/sec   Loss 5.7395   LearningRate 0.0185   Epoch: 11   Global Step: 473190   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:20,431-Speed 2631.22 samples/sec   Loss 5.6329   LearningRate 0.0185   Epoch: 11   Global Step: 473200   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:24,327-Speed 2629.01 samples/sec   Loss 5.7599   LearningRate 0.0185   Epoch: 11   Global Step: 473210   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:28,231-Speed 2623.50 samples/sec   Loss 5.8151   LearningRate 0.0185   Epoch: 11   Global Step: 473220   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 00:54:32,127-Speed 2628.55 samples/sec   Loss 5.7315   LearningRate 0.0185   Epoch: 11   Global Step: 473230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:54:36,022-Speed 2629.43 samples/sec   Loss 5.6918   LearningRate 0.0185   Epoch: 11   Global Step: 473240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:54:39,922-Speed 2626.03 samples/sec   Loss 5.7713   LearningRate 0.0184   Epoch: 11   Global Step: 473250   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:54:43,845-Speed 2611.65 samples/sec   Loss 5.7427   LearningRate 0.0184   Epoch: 11   Global Step: 473260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:54:47,740-Speed 2629.39 samples/sec   Loss 5.6435   LearningRate 0.0184   Epoch: 11   Global Step: 473270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:54:51,643-Speed 2624.85 samples/sec   Loss 5.8448   LearningRate 0.0184   Epoch: 11   Global Step: 473280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:54:55,537-Speed 2629.63 samples/sec   Loss 5.6776   LearningRate 0.0184   Epoch: 11   Global Step: 473290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:54:59,430-Speed 2631.47 samples/sec   Loss 5.7497   LearningRate 0.0184   Epoch: 11   Global Step: 473300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:03,331-Speed 2625.55 samples/sec   Loss 5.7217   LearningRate 0.0184   Epoch: 11   Global Step: 473310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:07,243-Speed 2617.87 samples/sec   Loss 5.7661   LearningRate 0.0184   Epoch: 11   Global Step: 473320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:11,163-Speed 2612.89 samples/sec   Loss 5.5565   LearningRate 0.0184   Epoch: 11   Global Step: 473330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:55:15,060-Speed 2628.31 samples/sec   Loss 5.6658   LearningRate 0.0184   Epoch: 11   Global Step: 473340   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:55:18,954-Speed 2630.08 samples/sec   Loss 5.6290   LearningRate 0.0184   Epoch: 11   Global Step: 473350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:55:22,824-Speed 2647.14 samples/sec   Loss 5.6910   LearningRate 0.0184   Epoch: 11   Global Step: 473360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:26,720-Speed 2628.50 samples/sec   Loss 5.7984   LearningRate 0.0184   Epoch: 11   Global Step: 473370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:30,618-Speed 2628.21 samples/sec   Loss 5.8857   LearningRate 0.0184   Epoch: 11   Global Step: 473380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:34,515-Speed 2627.90 samples/sec   Loss 5.7198   LearningRate 0.0184   Epoch: 11   Global Step: 473390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:38,412-Speed 2628.13 samples/sec   Loss 5.8170   LearningRate 0.0184   Epoch: 11   Global Step: 473400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:42,305-Speed 2630.98 samples/sec   Loss 5.7049   LearningRate 0.0184   Epoch: 11   Global Step: 473410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:46,212-Speed 2621.30 samples/sec   Loss 5.7848   LearningRate 0.0184   Epoch: 11   Global Step: 473420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:50,108-Speed 2629.11 samples/sec   Loss 5.7946   LearningRate 0.0184   Epoch: 11   Global Step: 473430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:54,009-Speed 2625.24 samples/sec   Loss 5.6862   LearningRate 0.0184   Epoch: 11   Global Step: 473440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:55:57,945-Speed 2602.58 samples/sec   Loss 5.7112   LearningRate 0.0184   Epoch: 11   Global Step: 473450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:56:01,886-Speed 2599.17 samples/sec   Loss 5.7228   LearningRate 0.0184   Epoch: 11   Global Step: 473460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:05,777-Speed 2631.56 samples/sec   Loss 5.8079   LearningRate 0.0184   Epoch: 11   Global Step: 473470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:09,690-Speed 2617.96 samples/sec   Loss 5.8166   LearningRate 0.0184   Epoch: 11   Global Step: 473480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:13,582-Speed 2631.66 samples/sec   Loss 5.8798   LearningRate 0.0184   Epoch: 11   Global Step: 473490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:17,485-Speed 2623.59 samples/sec   Loss 5.7135   LearningRate 0.0184   Epoch: 11   Global Step: 473500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:21,393-Speed 2621.09 samples/sec   Loss 5.7863   LearningRate 0.0184   Epoch: 11   Global Step: 473510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:25,302-Speed 2620.26 samples/sec   Loss 5.7941   LearningRate 0.0184   Epoch: 11   Global Step: 473520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:29,193-Speed 2632.49 samples/sec   Loss 5.7247   LearningRate 0.0184   Epoch: 11   Global Step: 473530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:33,097-Speed 2623.72 samples/sec   Loss 5.7693   LearningRate 0.0184   Epoch: 11   Global Step: 473540   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:36,990-Speed 2630.99 samples/sec   Loss 5.6818   LearningRate 0.0184   Epoch: 11   Global Step: 473550   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:40,875-Speed 2636.77 samples/sec   Loss 5.6918   LearningRate 0.0184   Epoch: 11   Global Step: 473560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:44,772-Speed 2627.90 samples/sec   Loss 5.7148   LearningRate 0.0184   Epoch: 11   Global Step: 473570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:56:48,645-Speed 2644.94 samples/sec   Loss 5.6602   LearningRate 0.0184   Epoch: 11   Global Step: 473580   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:56:52,558-Speed 2617.32 samples/sec   Loss 5.7727   LearningRate 0.0184   Epoch: 11   Global Step: 473590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:56:56,458-Speed 2626.17 samples/sec   Loss 5.8229   LearningRate 0.0184   Epoch: 11   Global Step: 473600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:00,356-Speed 2627.12 samples/sec   Loss 5.7129   LearningRate 0.0184   Epoch: 11   Global Step: 473610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:04,278-Speed 2611.82 samples/sec   Loss 5.7108   LearningRate 0.0184   Epoch: 11   Global Step: 473620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:08,180-Speed 2624.87 samples/sec   Loss 5.7155   LearningRate 0.0184   Epoch: 11   Global Step: 473630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:12,072-Speed 2632.01 samples/sec   Loss 5.6978   LearningRate 0.0184   Epoch: 11   Global Step: 473640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:15,982-Speed 2619.70 samples/sec   Loss 5.8473   LearningRate 0.0184   Epoch: 11   Global Step: 473650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:19,885-Speed 2623.67 samples/sec   Loss 5.7627   LearningRate 0.0184   Epoch: 11   Global Step: 473660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:23,782-Speed 2628.44 samples/sec   Loss 5.7838   LearningRate 0.0184   Epoch: 11   Global Step: 473670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:57:27,679-Speed 2628.01 samples/sec   Loss 5.8275   LearningRate 0.0184   Epoch: 11   Global Step: 473680   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:31,585-Speed 2622.66 samples/sec   Loss 5.7156   LearningRate 0.0184   Epoch: 11   Global Step: 473690   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:35,528-Speed 2597.08 samples/sec   Loss 5.6338   LearningRate 0.0184   Epoch: 11   Global Step: 473700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:39,440-Speed 2618.71 samples/sec   Loss 5.5604   LearningRate 0.0184   Epoch: 11   Global Step: 473710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:43,335-Speed 2629.01 samples/sec   Loss 5.7593   LearningRate 0.0184   Epoch: 11   Global Step: 473720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:47,232-Speed 2628.63 samples/sec   Loss 5.6885   LearningRate 0.0184   Epoch: 11   Global Step: 473730   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:51,134-Speed 2625.31 samples/sec   Loss 5.7414   LearningRate 0.0184   Epoch: 11   Global Step: 473740   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:55,025-Speed 2632.17 samples/sec   Loss 5.7508   LearningRate 0.0184   Epoch: 11   Global Step: 473750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:57:58,919-Speed 2630.28 samples/sec   Loss 5.6901   LearningRate 0.0184   Epoch: 11   Global Step: 473760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:02,815-Speed 2628.34 samples/sec   Loss 5.7106   LearningRate 0.0184   Epoch: 11   Global Step: 473770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:06,691-Speed 2643.02 samples/sec   Loss 5.7326   LearningRate 0.0184   Epoch: 11   Global Step: 473780   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:10,598-Speed 2621.76 samples/sec   Loss 5.7431   LearningRate 0.0184   Epoch: 11   Global Step: 473790   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:14,505-Speed 2622.15 samples/sec   Loss 5.7647   LearningRate 0.0184   Epoch: 11   Global Step: 473800   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:18,407-Speed 2624.64 samples/sec   Loss 5.7275   LearningRate 0.0184   Epoch: 11   Global Step: 473810   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:22,311-Speed 2623.40 samples/sec   Loss 5.7583   LearningRate 0.0184   Epoch: 11   Global Step: 473820   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:26,203-Speed 2631.77 samples/sec   Loss 5.7812   LearningRate 0.0184   Epoch: 11   Global Step: 473830   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:30,102-Speed 2627.33 samples/sec   Loss 5.7831   LearningRate 0.0184   Epoch: 11   Global Step: 473840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:58:34,037-Speed 2602.74 samples/sec   Loss 5.8192   LearningRate 0.0184   Epoch: 11   Global Step: 473850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:58:37,934-Speed 2628.21 samples/sec   Loss 5.6950   LearningRate 0.0184   Epoch: 11   Global Step: 473860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:58:41,830-Speed 2628.41 samples/sec   Loss 5.7460   LearningRate 0.0184   Epoch: 11   Global Step: 473870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:58:45,737-Speed 2621.76 samples/sec   Loss 5.7140   LearningRate 0.0184   Epoch: 11   Global Step: 473880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:58:49,631-Speed 2630.39 samples/sec   Loss 5.8171   LearningRate 0.0184   Epoch: 11   Global Step: 473890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:58:53,541-Speed 2619.37 samples/sec   Loss 5.7663   LearningRate 0.0184   Epoch: 11   Global Step: 473900   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:58:57,437-Speed 2628.70 samples/sec   Loss 5.7380   LearningRate 0.0184   Epoch: 11   Global Step: 473910   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:59:01,340-Speed 2624.26 samples/sec   Loss 5.6883   LearningRate 0.0184   Epoch: 11   Global Step: 473920   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:59:05,236-Speed 2628.87 samples/sec   Loss 5.5937   LearningRate 0.0184   Epoch: 11   Global Step: 473930   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:59:09,127-Speed 2632.59 samples/sec   Loss 5.6194   LearningRate 0.0184   Epoch: 11   Global Step: 473940   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 00:59:13,033-Speed 2622.28 samples/sec   Loss 5.8899   LearningRate 0.0184   Epoch: 11   Global Step: 473950   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:16,931-Speed 2627.56 samples/sec   Loss 5.7171   LearningRate 0.0184   Epoch: 11   Global Step: 473960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:20,825-Speed 2630.36 samples/sec   Loss 5.7640   LearningRate 0.0184   Epoch: 11   Global Step: 473970   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:24,714-Speed 2633.11 samples/sec   Loss 5.8836   LearningRate 0.0184   Epoch: 11   Global Step: 473980   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:28,610-Speed 2629.64 samples/sec   Loss 5.7366   LearningRate 0.0184   Epoch: 11   Global Step: 473990   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:32,524-Speed 2616.65 samples/sec   Loss 5.6740   LearningRate 0.0184   Epoch: 11   Global Step: 474000   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:36,418-Speed 2629.51 samples/sec   Loss 5.7326   LearningRate 0.0184   Epoch: 11   Global Step: 474010   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:40,311-Speed 2631.20 samples/sec   Loss 5.7985   LearningRate 0.0184   Epoch: 11   Global Step: 474020   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:44,210-Speed 2627.48 samples/sec   Loss 5.7719   LearningRate 0.0184   Epoch: 11   Global Step: 474030   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:48,107-Speed 2628.03 samples/sec   Loss 5.6277   LearningRate 0.0184   Epoch: 11   Global Step: 474040   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:51,983-Speed 2642.70 samples/sec   Loss 5.6780   LearningRate 0.0184   Epoch: 11   Global Step: 474050   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:55,884-Speed 2625.53 samples/sec   Loss 5.7055   LearningRate 0.0184   Epoch: 11   Global Step: 474060   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 00:59:59,793-Speed 2620.75 samples/sec   Loss 5.7568   LearningRate 0.0184   Epoch: 11   Global Step: 474070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:00:03,704-Speed 2619.03 samples/sec   Loss 5.7128   LearningRate 0.0184   Epoch: 11   Global Step: 474080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:00:07,586-Speed 2638.07 samples/sec   Loss 5.7283   LearningRate 0.0184   Epoch: 11   Global Step: 474090   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:11,489-Speed 2623.74 samples/sec   Loss 5.7051   LearningRate 0.0184   Epoch: 11   Global Step: 474100   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:15,385-Speed 2629.54 samples/sec   Loss 5.7421   LearningRate 0.0184   Epoch: 11   Global Step: 474110   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:19,282-Speed 2628.53 samples/sec   Loss 5.8808   LearningRate 0.0184   Epoch: 11   Global Step: 474120   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:23,189-Speed 2621.79 samples/sec   Loss 5.6694   LearningRate 0.0184   Epoch: 11   Global Step: 474130   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:27,136-Speed 2594.99 samples/sec   Loss 5.7610   LearningRate 0.0184   Epoch: 11   Global Step: 474140   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:31,040-Speed 2623.52 samples/sec   Loss 5.7031   LearningRate 0.0184   Epoch: 11   Global Step: 474150   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:34,937-Speed 2628.13 samples/sec   Loss 5.8995   LearningRate 0.0184   Epoch: 11   Global Step: 474160   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:38,840-Speed 2623.72 samples/sec   Loss 5.6600   LearningRate 0.0184   Epoch: 11   Global Step: 474170   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:42,741-Speed 2625.54 samples/sec   Loss 5.6928   LearningRate 0.0184   Epoch: 11   Global Step: 474180   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:46,642-Speed 2625.47 samples/sec   Loss 5.7305   LearningRate 0.0184   Epoch: 11   Global Step: 474190   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:00:50,540-Speed 2627.92 samples/sec   Loss 5.7656   LearningRate 0.0184   Epoch: 11   Global Step: 474200   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:00:54,425-Speed 2636.50 samples/sec   Loss 5.7248   LearningRate 0.0184   Epoch: 11   Global Step: 474210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:00:58,321-Speed 2629.10 samples/sec   Loss 5.8208   LearningRate 0.0183   Epoch: 11   Global Step: 474220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:02,216-Speed 2629.57 samples/sec   Loss 5.7581   LearningRate 0.0183   Epoch: 11   Global Step: 474230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:06,113-Speed 2628.23 samples/sec   Loss 5.6660   LearningRate 0.0183   Epoch: 11   Global Step: 474240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:10,020-Speed 2621.34 samples/sec   Loss 5.7317   LearningRate 0.0183   Epoch: 11   Global Step: 474250   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:13,913-Speed 2631.21 samples/sec   Loss 5.6724   LearningRate 0.0183   Epoch: 11   Global Step: 474260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:17,818-Speed 2623.24 samples/sec   Loss 5.6523   LearningRate 0.0183   Epoch: 11   Global Step: 474270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:21,764-Speed 2595.32 samples/sec   Loss 5.7850   LearningRate 0.0183   Epoch: 11   Global Step: 474280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:25,723-Speed 2587.09 samples/sec   Loss 5.7329   LearningRate 0.0183   Epoch: 11   Global Step: 474290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:29,621-Speed 2633.42 samples/sec   Loss 5.7816   LearningRate 0.0183   Epoch: 11   Global Step: 474300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:33,517-Speed 2629.22 samples/sec   Loss 5.6956   LearningRate 0.0183   Epoch: 11   Global Step: 474310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:01:37,484-Speed 2581.91 samples/sec   Loss 5.6416   LearningRate 0.0183   Epoch: 11   Global Step: 474320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:01:41,466-Speed 2571.65 samples/sec   Loss 5.7233   LearningRate 0.0183   Epoch: 11   Global Step: 474330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:01:45,362-Speed 2628.92 samples/sec   Loss 5.7621   LearningRate 0.0183   Epoch: 11   Global Step: 474340   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:01:49,238-Speed 2642.71 samples/sec   Loss 5.6939   LearningRate 0.0183   Epoch: 11   Global Step: 474350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:53,132-Speed 2630.71 samples/sec   Loss 5.7545   LearningRate 0.0183   Epoch: 11   Global Step: 474360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:01:57,030-Speed 2627.23 samples/sec   Loss 5.7022   LearningRate 0.0183   Epoch: 11   Global Step: 474370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:02:00,923-Speed 2630.90 samples/sec   Loss 5.8043   LearningRate 0.0183   Epoch: 11   Global Step: 474380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:02:04,801-Speed 2641.34 samples/sec   Loss 5.6154   LearningRate 0.0183   Epoch: 11   Global Step: 474390   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:08,701-Speed 2625.74 samples/sec   Loss 5.7094   LearningRate 0.0183   Epoch: 11   Global Step: 474400   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:12,630-Speed 2606.91 samples/sec   Loss 6.0120   LearningRate 0.0183   Epoch: 11   Global Step: 474410   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:16,571-Speed 2599.84 samples/sec   Loss 5.7428   LearningRate 0.0183   Epoch: 11   Global Step: 474420   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:20,489-Speed 2614.34 samples/sec   Loss 5.8098   LearningRate 0.0183   Epoch: 11   Global Step: 474430   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:24,385-Speed 2628.27 samples/sec   Loss 5.6431   LearningRate 0.0183   Epoch: 11   Global Step: 474440   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:28,286-Speed 2625.44 samples/sec   Loss 5.7169   LearningRate 0.0183   Epoch: 11   Global Step: 474450   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:32,191-Speed 2623.21 samples/sec   Loss 5.7571   LearningRate 0.0183   Epoch: 11   Global Step: 474460   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:36,092-Speed 2625.78 samples/sec   Loss 5.5909   LearningRate 0.0183   Epoch: 11   Global Step: 474470   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:39,990-Speed 2627.50 samples/sec   Loss 5.6795   LearningRate 0.0183   Epoch: 11   Global Step: 474480   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:02:43,890-Speed 2626.35 samples/sec   Loss 5.7496   LearningRate 0.0183   Epoch: 11   Global Step: 474490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:02:47,789-Speed 2626.92 samples/sec   Loss 5.6960   LearningRate 0.0183   Epoch: 11   Global Step: 474500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:02:51,683-Speed 2630.41 samples/sec   Loss 5.6525   LearningRate 0.0183   Epoch: 11   Global Step: 474510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:02:55,575-Speed 2632.16 samples/sec   Loss 5.6763   LearningRate 0.0183   Epoch: 11   Global Step: 474520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:02:59,466-Speed 2632.22 samples/sec   Loss 5.8074   LearningRate 0.0183   Epoch: 11   Global Step: 474530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:03,362-Speed 2628.35 samples/sec   Loss 5.7501   LearningRate 0.0183   Epoch: 11   Global Step: 474540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:07,254-Speed 2631.90 samples/sec   Loss 5.8105   LearningRate 0.0183   Epoch: 11   Global Step: 474550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:11,144-Speed 2632.70 samples/sec   Loss 5.7691   LearningRate 0.0183   Epoch: 11   Global Step: 474560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:15,040-Speed 2629.49 samples/sec   Loss 5.7555   LearningRate 0.0183   Epoch: 11   Global Step: 474570   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:18,941-Speed 2625.39 samples/sec   Loss 5.7793   LearningRate 0.0183   Epoch: 11   Global Step: 474580   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:22,823-Speed 2638.32 samples/sec   Loss 5.7545   LearningRate 0.0183   Epoch: 11   Global Step: 474590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:26,717-Speed 2630.51 samples/sec   Loss 5.7698   LearningRate 0.0183   Epoch: 11   Global Step: 474600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:30,611-Speed 2630.40 samples/sec   Loss 5.7083   LearningRate 0.0183   Epoch: 11   Global Step: 474610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:34,505-Speed 2629.75 samples/sec   Loss 5.6289   LearningRate 0.0183   Epoch: 11   Global Step: 474620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:38,398-Speed 2631.18 samples/sec   Loss 5.7194   LearningRate 0.0183   Epoch: 11   Global Step: 474630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:03:42,270-Speed 2645.15 samples/sec   Loss 5.7129   LearningRate 0.0183   Epoch: 11   Global Step: 474640   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:03:46,166-Speed 2629.18 samples/sec   Loss 5.7387   LearningRate 0.0183   Epoch: 11   Global Step: 474650   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:03:50,057-Speed 2632.67 samples/sec   Loss 5.6833   LearningRate 0.0183   Epoch: 11   Global Step: 474660   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:03:53,951-Speed 2630.46 samples/sec   Loss 5.8437   LearningRate 0.0183   Epoch: 11   Global Step: 474670   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:03:57,851-Speed 2625.89 samples/sec   Loss 5.7555   LearningRate 0.0183   Epoch: 11   Global Step: 474680   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:04:01,747-Speed 2628.61 samples/sec   Loss 5.6080   LearningRate 0.0183   Epoch: 11   Global Step: 474690   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:04:05,647-Speed 2626.52 samples/sec   Loss 5.5309   LearningRate 0.0183   Epoch: 11   Global Step: 474700   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:04:09,539-Speed 2631.82 samples/sec   Loss 5.7133   LearningRate 0.0183   Epoch: 11   Global Step: 474710   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:04:13,435-Speed 2628.71 samples/sec   Loss 5.7060   LearningRate 0.0183   Epoch: 11   Global Step: 474720   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:04:17,332-Speed 2628.55 samples/sec   Loss 5.6716   LearningRate 0.0183   Epoch: 11   Global Step: 474730   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:04:21,328-Speed 2563.07 samples/sec   Loss 5.7634   LearningRate 0.0183   Epoch: 11   Global Step: 474740   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:25,222-Speed 2630.53 samples/sec   Loss 5.7387   LearningRate 0.0183   Epoch: 11   Global Step: 474750   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:29,120-Speed 2627.55 samples/sec   Loss 5.7002   LearningRate 0.0183   Epoch: 11   Global Step: 474760   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:33,023-Speed 2624.23 samples/sec   Loss 5.7820   LearningRate 0.0183   Epoch: 11   Global Step: 474770   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:36,925-Speed 2624.43 samples/sec   Loss 5.7790   LearningRate 0.0183   Epoch: 11   Global Step: 474780   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:40,841-Speed 2616.03 samples/sec   Loss 5.7888   LearningRate 0.0183   Epoch: 11   Global Step: 474790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:44,739-Speed 2627.26 samples/sec   Loss 5.7431   LearningRate 0.0183   Epoch: 11   Global Step: 474800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:48,643-Speed 2626.79 samples/sec   Loss 5.5903   LearningRate 0.0183   Epoch: 11   Global Step: 474810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:52,545-Speed 2624.57 samples/sec   Loss 5.8243   LearningRate 0.0183   Epoch: 11   Global Step: 474820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:04:56,441-Speed 2629.44 samples/sec   Loss 5.6740   LearningRate 0.0183   Epoch: 11   Global Step: 474830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:00,346-Speed 2622.32 samples/sec   Loss 5.6031   LearningRate 0.0183   Epoch: 11   Global Step: 474840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:05:04,240-Speed 2630.72 samples/sec   Loss 5.7730   LearningRate 0.0183   Epoch: 11   Global Step: 474850   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:05:08,110-Speed 2646.46 samples/sec   Loss 5.7620   LearningRate 0.0183   Epoch: 11   Global Step: 474860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:12,005-Speed 2629.31 samples/sec   Loss 5.7440   LearningRate 0.0183   Epoch: 11   Global Step: 474870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:15,902-Speed 2628.59 samples/sec   Loss 5.7140   LearningRate 0.0183   Epoch: 11   Global Step: 474880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:19,816-Speed 2616.53 samples/sec   Loss 5.7523   LearningRate 0.0183   Epoch: 11   Global Step: 474890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:23,728-Speed 2618.52 samples/sec   Loss 5.7026   LearningRate 0.0183   Epoch: 11   Global Step: 474900   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:27,623-Speed 2629.75 samples/sec   Loss 5.5824   LearningRate 0.0183   Epoch: 11   Global Step: 474910   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:31,524-Speed 2625.82 samples/sec   Loss 5.7068   LearningRate 0.0183   Epoch: 11   Global Step: 474920   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:35,423-Speed 2626.82 samples/sec   Loss 5.7276   LearningRate 0.0183   Epoch: 11   Global Step: 474930   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:39,324-Speed 2625.00 samples/sec   Loss 5.7054   LearningRate 0.0183   Epoch: 11   Global Step: 474940   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:43,220-Speed 2628.74 samples/sec   Loss 5.7361   LearningRate 0.0183   Epoch: 11   Global Step: 474950   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:47,117-Speed 2628.53 samples/sec   Loss 5.7385   LearningRate 0.0183   Epoch: 11   Global Step: 474960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:05:51,001-Speed 2636.90 samples/sec   Loss 5.8009   LearningRate 0.0183   Epoch: 11   Global Step: 474970   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:54,900-Speed 2626.62 samples/sec   Loss 5.7311   LearningRate 0.0183   Epoch: 11   Global Step: 474980   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:05:58,831-Speed 2605.60 samples/sec   Loss 5.7917   LearningRate 0.0183   Epoch: 11   Global Step: 474990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:02,740-Speed 2620.82 samples/sec   Loss 5.6428   LearningRate 0.0183   Epoch: 11   Global Step: 475000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:06,645-Speed 2622.99 samples/sec   Loss 5.6893   LearningRate 0.0183   Epoch: 11   Global Step: 475010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:10,542-Speed 2627.57 samples/sec   Loss 5.7728   LearningRate 0.0183   Epoch: 11   Global Step: 475020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:14,447-Speed 2623.85 samples/sec   Loss 5.7101   LearningRate 0.0183   Epoch: 11   Global Step: 475030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:18,359-Speed 2617.91 samples/sec   Loss 5.8106   LearningRate 0.0183   Epoch: 11   Global Step: 475040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:22,286-Speed 2608.70 samples/sec   Loss 5.7885   LearningRate 0.0183   Epoch: 11   Global Step: 475050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:26,188-Speed 2624.71 samples/sec   Loss 5.6848   LearningRate 0.0183   Epoch: 11   Global Step: 475060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:06:30,097-Speed 2620.70 samples/sec   Loss 5.6835   LearningRate 0.0183   Epoch: 11   Global Step: 475070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:06:33,998-Speed 2625.36 samples/sec   Loss 5.6958   LearningRate 0.0183   Epoch: 11   Global Step: 475080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:06:37,898-Speed 2625.92 samples/sec   Loss 5.6619   LearningRate 0.0183   Epoch: 11   Global Step: 475090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:06:41,812-Speed 2617.38 samples/sec   Loss 5.6288   LearningRate 0.0183   Epoch: 11   Global Step: 475100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:06:45,717-Speed 2623.03 samples/sec   Loss 5.6047   LearningRate 0.0183   Epoch: 11   Global Step: 475110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:06:49,615-Speed 2627.59 samples/sec   Loss 5.6755   LearningRate 0.0183   Epoch: 11   Global Step: 475120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:06:53,536-Speed 2612.27 samples/sec   Loss 5.6773   LearningRate 0.0183   Epoch: 11   Global Step: 475130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:06:57,434-Speed 2627.63 samples/sec   Loss 5.7668   LearningRate 0.0183   Epoch: 11   Global Step: 475140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:01,348-Speed 2616.79 samples/sec   Loss 5.7464   LearningRate 0.0183   Epoch: 11   Global Step: 475150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:05,245-Speed 2628.36 samples/sec   Loss 5.7140   LearningRate 0.0183   Epoch: 11   Global Step: 475160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:09,132-Speed 2635.05 samples/sec   Loss 5.7485   LearningRate 0.0183   Epoch: 11   Global Step: 475170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:13,026-Speed 2629.71 samples/sec   Loss 5.6199   LearningRate 0.0183   Epoch: 11   Global Step: 475180   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:16,922-Speed 2629.19 samples/sec   Loss 5.7454   LearningRate 0.0182   Epoch: 11   Global Step: 475190   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:20,817-Speed 2630.25 samples/sec   Loss 5.7122   LearningRate 0.0182   Epoch: 11   Global Step: 475200   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:24,716-Speed 2626.63 samples/sec   Loss 5.6703   LearningRate 0.0182   Epoch: 11   Global Step: 475210   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:07:28,599-Speed 2637.89 samples/sec   Loss 5.6331   LearningRate 0.0182   Epoch: 11   Global Step: 475220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:32,492-Speed 2631.05 samples/sec   Loss 5.6535   LearningRate 0.0182   Epoch: 11   Global Step: 475230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:36,386-Speed 2630.39 samples/sec   Loss 5.6878   LearningRate 0.0182   Epoch: 11   Global Step: 475240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:40,280-Speed 2630.03 samples/sec   Loss 5.6959   LearningRate 0.0182   Epoch: 11   Global Step: 475250   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:44,185-Speed 2622.79 samples/sec   Loss 5.7053   LearningRate 0.0182   Epoch: 11   Global Step: 475260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:48,094-Speed 2620.23 samples/sec   Loss 5.6449   LearningRate 0.0182   Epoch: 11   Global Step: 475270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:51,991-Speed 2628.45 samples/sec   Loss 5.7149   LearningRate 0.0182   Epoch: 11   Global Step: 475280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:55,889-Speed 2627.55 samples/sec   Loss 5.6557   LearningRate 0.0182   Epoch: 11   Global Step: 475290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:07:59,758-Speed 2647.70 samples/sec   Loss 5.7850   LearningRate 0.0182   Epoch: 11   Global Step: 475300   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:03,661-Speed 2624.83 samples/sec   Loss 5.6548   LearningRate 0.0182   Epoch: 11   Global Step: 475310   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:07,555-Speed 2630.28 samples/sec   Loss 5.7689   LearningRate 0.0182   Epoch: 11   Global Step: 475320   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:11,449-Speed 2629.97 samples/sec   Loss 5.7161   LearningRate 0.0182   Epoch: 11   Global Step: 475330   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:15,346-Speed 2628.65 samples/sec   Loss 5.7458   LearningRate 0.0182   Epoch: 11   Global Step: 475340   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:19,239-Speed 2630.25 samples/sec   Loss 5.7320   LearningRate 0.0182   Epoch: 11   Global Step: 475350   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:23,222-Speed 2571.95 samples/sec   Loss 5.7175   LearningRate 0.0182   Epoch: 11   Global Step: 475360   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:27,136-Speed 2616.72 samples/sec   Loss 5.7460   LearningRate 0.0182   Epoch: 11   Global Step: 475370   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:31,030-Speed 2630.76 samples/sec   Loss 5.6897   LearningRate 0.0182   Epoch: 11   Global Step: 475380   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:34,930-Speed 2626.63 samples/sec   Loss 5.6861   LearningRate 0.0182   Epoch: 11   Global Step: 475390   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:08:38,854-Speed 2609.66 samples/sec   Loss 5.6943   LearningRate 0.0182   Epoch: 11   Global Step: 475400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:08:42,747-Speed 2631.25 samples/sec   Loss 5.7211   LearningRate 0.0182   Epoch: 11   Global Step: 475410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:08:46,639-Speed 2631.55 samples/sec   Loss 5.6229   LearningRate 0.0182   Epoch: 11   Global Step: 475420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:08:50,540-Speed 2625.73 samples/sec   Loss 5.6665   LearningRate 0.0182   Epoch: 11   Global Step: 475430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:08:54,449-Speed 2620.49 samples/sec   Loss 5.7999   LearningRate 0.0182   Epoch: 11   Global Step: 475440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:08:58,361-Speed 2618.37 samples/sec   Loss 5.8156   LearningRate 0.0182   Epoch: 11   Global Step: 475450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:09:02,262-Speed 2624.84 samples/sec   Loss 5.7120   LearningRate 0.0182   Epoch: 11   Global Step: 475460   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:09:06,164-Speed 2625.54 samples/sec   Loss 5.7334   LearningRate 0.0182   Epoch: 11   Global Step: 475470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:09:10,058-Speed 2630.41 samples/sec   Loss 5.7086   LearningRate 0.0182   Epoch: 11   Global Step: 475480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:09:13,958-Speed 2625.82 samples/sec   Loss 5.7060   LearningRate 0.0182   Epoch: 11   Global Step: 475490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:09:17,853-Speed 2629.89 samples/sec   Loss 5.6616   LearningRate 0.0182   Epoch: 11   Global Step: 475500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:21,759-Speed 2622.66 samples/sec   Loss 5.6638   LearningRate 0.0182   Epoch: 11   Global Step: 475510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:25,661-Speed 2627.85 samples/sec   Loss 5.7171   LearningRate 0.0182   Epoch: 11   Global Step: 475520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:29,556-Speed 2629.39 samples/sec   Loss 5.7309   LearningRate 0.0182   Epoch: 11   Global Step: 475530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:33,450-Speed 2630.69 samples/sec   Loss 5.6862   LearningRate 0.0182   Epoch: 11   Global Step: 475540   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:37,346-Speed 2628.58 samples/sec   Loss 5.7750   LearningRate 0.0182   Epoch: 11   Global Step: 475550   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:41,243-Speed 2628.93 samples/sec   Loss 5.7725   LearningRate 0.0182   Epoch: 11   Global Step: 475560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:45,137-Speed 2630.45 samples/sec   Loss 5.6845   LearningRate 0.0182   Epoch: 11   Global Step: 475570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:49,032-Speed 2629.79 samples/sec   Loss 5.6784   LearningRate 0.0182   Epoch: 11   Global Step: 475580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:52,924-Speed 2631.39 samples/sec   Loss 5.7272   LearningRate 0.0182   Epoch: 11   Global Step: 475590   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:09:56,820-Speed 2629.27 samples/sec   Loss 5.7776   LearningRate 0.0182   Epoch: 11   Global Step: 475600   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-04-15 01:10:00,693-Speed 2644.06 samples/sec   Loss 5.7056   LearningRate 0.0182   Epoch: 11   Global Step: 475610   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:10:04,590-Speed 2628.09 samples/sec   Loss 5.7139   LearningRate 0.0182   Epoch: 11   Global Step: 475620   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:10:08,484-Speed 2630.12 samples/sec   Loss 5.6695   LearningRate 0.0182   Epoch: 11   Global Step: 475630   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:10:12,377-Speed 2632.05 samples/sec   Loss 5.6789   LearningRate 0.0182   Epoch: 11   Global Step: 475640   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:10:16,278-Speed 2626.18 samples/sec   Loss 5.7928   LearningRate 0.0182   Epoch: 11   Global Step: 475650   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:10:20,155-Speed 2641.40 samples/sec   Loss 5.6791   LearningRate 0.0182   Epoch: 11   Global Step: 475660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:24,051-Speed 2629.33 samples/sec   Loss 5.7703   LearningRate 0.0182   Epoch: 11   Global Step: 475670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:27,948-Speed 2628.48 samples/sec   Loss 5.7323   LearningRate 0.0182   Epoch: 11   Global Step: 475680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:31,852-Speed 2622.80 samples/sec   Loss 5.7500   LearningRate 0.0182   Epoch: 11   Global Step: 475690   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:35,745-Speed 2631.06 samples/sec   Loss 5.6477   LearningRate 0.0182   Epoch: 11   Global Step: 475700   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:39,657-Speed 2618.70 samples/sec   Loss 5.7855   LearningRate 0.0182   Epoch: 11   Global Step: 475710   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:43,553-Speed 2629.37 samples/sec   Loss 5.7247   LearningRate 0.0182   Epoch: 11   Global Step: 475720   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:47,453-Speed 2626.59 samples/sec   Loss 5.7495   LearningRate 0.0182   Epoch: 11   Global Step: 475730   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:51,339-Speed 2635.69 samples/sec   Loss 5.5916   LearningRate 0.0182   Epoch: 11   Global Step: 475740   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:55,230-Speed 2633.01 samples/sec   Loss 5.7061   LearningRate 0.0182   Epoch: 11   Global Step: 475750   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:10:59,135-Speed 2622.29 samples/sec   Loss 5.8818   LearningRate 0.0182   Epoch: 11   Global Step: 475760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:03,029-Speed 2630.13 samples/sec   Loss 5.8116   LearningRate 0.0182   Epoch: 11   Global Step: 475770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:06,920-Speed 2632.20 samples/sec   Loss 5.6217   LearningRate 0.0182   Epoch: 11   Global Step: 475780   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:10,814-Speed 2631.32 samples/sec   Loss 5.6717   LearningRate 0.0182   Epoch: 11   Global Step: 475790   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:14,716-Speed 2624.28 samples/sec   Loss 5.5949   LearningRate 0.0182   Epoch: 11   Global Step: 475800   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:18,641-Speed 2609.77 samples/sec   Loss 5.6457   LearningRate 0.0182   Epoch: 11   Global Step: 475810   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:22,542-Speed 2626.04 samples/sec   Loss 5.6973   LearningRate 0.0182   Epoch: 11   Global Step: 475820   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:26,445-Speed 2624.27 samples/sec   Loss 5.7797   LearningRate 0.0182   Epoch: 11   Global Step: 475830   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:30,337-Speed 2631.58 samples/sec   Loss 5.6496   LearningRate 0.0182   Epoch: 11   Global Step: 475840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:34,244-Speed 2621.40 samples/sec   Loss 5.6946   LearningRate 0.0182   Epoch: 11   Global Step: 475850   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:38,130-Speed 2635.79 samples/sec   Loss 5.7374   LearningRate 0.0182   Epoch: 11   Global Step: 475860   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:42,022-Speed 2631.71 samples/sec   Loss 5.7115   LearningRate 0.0182   Epoch: 11   Global Step: 475870   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:45,916-Speed 2630.21 samples/sec   Loss 5.5607   LearningRate 0.0182   Epoch: 11   Global Step: 475880   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:49,810-Speed 2630.91 samples/sec   Loss 5.7868   LearningRate 0.0182   Epoch: 11   Global Step: 475890   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:53,701-Speed 2632.86 samples/sec   Loss 5.7247   LearningRate 0.0182   Epoch: 11   Global Step: 475900   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:11:57,598-Speed 2627.90 samples/sec   Loss 5.8083   LearningRate 0.0182   Epoch: 11   Global Step: 475910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:12:01,530-Speed 2605.37 samples/sec   Loss 5.7976   LearningRate 0.0182   Epoch: 11   Global Step: 475920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:12:05,436-Speed 2622.21 samples/sec   Loss 5.7248   LearningRate 0.0182   Epoch: 11   Global Step: 475930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:12:09,459-Speed 2546.22 samples/sec   Loss 5.6332   LearningRate 0.0182   Epoch: 11   Global Step: 475940   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:12:13,388-Speed 2606.35 samples/sec   Loss 5.7785   LearningRate 0.0182   Epoch: 11   Global Step: 475950   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:12:17,283-Speed 2630.33 samples/sec   Loss 5.7184   LearningRate 0.0182   Epoch: 11   Global Step: 475960   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:21,193-Speed 2620.99 samples/sec   Loss 5.7368   LearningRate 0.0182   Epoch: 11   Global Step: 475970   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:25,186-Speed 2565.04 samples/sec   Loss 5.6993   LearningRate 0.0182   Epoch: 11   Global Step: 475980   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:29,087-Speed 2626.04 samples/sec   Loss 5.7637   LearningRate 0.0182   Epoch: 11   Global Step: 475990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:32,998-Speed 2618.62 samples/sec   Loss 5.6876   LearningRate 0.0182   Epoch: 11   Global Step: 476000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:36,909-Speed 2618.57 samples/sec   Loss 5.7717   LearningRate 0.0182   Epoch: 11   Global Step: 476010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:40,811-Speed 2624.72 samples/sec   Loss 5.6529   LearningRate 0.0182   Epoch: 11   Global Step: 476020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:44,720-Speed 2620.65 samples/sec   Loss 5.6963   LearningRate 0.0182   Epoch: 11   Global Step: 476030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:48,622-Speed 2624.99 samples/sec   Loss 5.6534   LearningRate 0.0182   Epoch: 11   Global Step: 476040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:52,524-Speed 2624.90 samples/sec   Loss 5.7627   LearningRate 0.0182   Epoch: 11   Global Step: 476050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:12:56,458-Speed 2603.41 samples/sec   Loss 5.6930   LearningRate 0.0182   Epoch: 11   Global Step: 476060   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:13:00,352-Speed 2630.67 samples/sec   Loss 5.6748   LearningRate 0.0182   Epoch: 11   Global Step: 476070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:13:04,254-Speed 2624.41 samples/sec   Loss 5.7106   LearningRate 0.0182   Epoch: 11   Global Step: 476080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:13:08,147-Speed 2631.45 samples/sec   Loss 5.8067   LearningRate 0.0182   Epoch: 11   Global Step: 476090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:13:12,042-Speed 2629.21 samples/sec   Loss 5.5525   LearningRate 0.0182   Epoch: 11   Global Step: 476100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:13:15,939-Speed 2628.61 samples/sec   Loss 5.6537   LearningRate 0.0182   Epoch: 11   Global Step: 476110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:13:19,808-Speed 2647.20 samples/sec   Loss 5.6680   LearningRate 0.0182   Epoch: 11   Global Step: 476120   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:23,727-Speed 2613.85 samples/sec   Loss 5.7368   LearningRate 0.0182   Epoch: 11   Global Step: 476130   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:27,621-Speed 2630.13 samples/sec   Loss 5.7477   LearningRate 0.0182   Epoch: 11   Global Step: 476140   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:31,536-Speed 2616.48 samples/sec   Loss 5.6795   LearningRate 0.0182   Epoch: 11   Global Step: 476150   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:35,492-Speed 2589.14 samples/sec   Loss 5.6467   LearningRate 0.0181   Epoch: 11   Global Step: 476160   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:39,583-Speed 2503.79 samples/sec   Loss 5.8316   LearningRate 0.0181   Epoch: 11   Global Step: 476170   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:43,678-Speed 2501.21 samples/sec   Loss 5.6868   LearningRate 0.0181   Epoch: 11   Global Step: 476180   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:47,605-Speed 2608.16 samples/sec   Loss 5.7210   LearningRate 0.0181   Epoch: 11   Global Step: 476190   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:51,527-Speed 2611.67 samples/sec   Loss 5.7470   LearningRate 0.0181   Epoch: 11   Global Step: 476200   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:55,620-Speed 2502.32 samples/sec   Loss 5.7311   LearningRate 0.0181   Epoch: 11   Global Step: 476210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:13:59,558-Speed 2601.37 samples/sec   Loss 5.6412   LearningRate 0.0181   Epoch: 11   Global Step: 476220   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:14:03,443-Speed 2636.61 samples/sec   Loss 5.7345   LearningRate 0.0181   Epoch: 11   Global Step: 476230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:14:07,347-Speed 2623.22 samples/sec   Loss 5.6109   LearningRate 0.0181   Epoch: 11   Global Step: 476240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:14:11,222-Speed 2642.69 samples/sec   Loss 5.7211   LearningRate 0.0181   Epoch: 11   Global Step: 476250   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:15,118-Speed 2629.53 samples/sec   Loss 5.7002   LearningRate 0.0181   Epoch: 11   Global Step: 476260   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:19,010-Speed 2631.87 samples/sec   Loss 5.7442   LearningRate 0.0181   Epoch: 11   Global Step: 476270   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:22,939-Speed 2606.43 samples/sec   Loss 5.6381   LearningRate 0.0181   Epoch: 11   Global Step: 476280   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:26,853-Speed 2617.73 samples/sec   Loss 5.6395   LearningRate 0.0181   Epoch: 11   Global Step: 476290   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:30,749-Speed 2629.27 samples/sec   Loss 5.6094   LearningRate 0.0181   Epoch: 11   Global Step: 476300   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:34,644-Speed 2629.09 samples/sec   Loss 5.6163   LearningRate 0.0181   Epoch: 11   Global Step: 476310   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:38,562-Speed 2614.63 samples/sec   Loss 5.6605   LearningRate 0.0181   Epoch: 11   Global Step: 476320   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:42,463-Speed 2625.79 samples/sec   Loss 5.7626   LearningRate 0.0181   Epoch: 11   Global Step: 476330   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:46,355-Speed 2631.78 samples/sec   Loss 5.6895   LearningRate 0.0181   Epoch: 11   Global Step: 476340   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-15 01:14:50,248-Speed 2631.49 samples/sec   Loss 5.7806   LearningRate 0.0181   Epoch: 11   Global Step: 476350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:14:54,142-Speed 2629.83 samples/sec   Loss 5.7097   LearningRate 0.0181   Epoch: 11   Global Step: 476360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:14:58,063-Speed 2612.57 samples/sec   Loss 5.7054   LearningRate 0.0181   Epoch: 11   Global Step: 476370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:02,306-Speed 2414.13 samples/sec   Loss 5.6145   LearningRate 0.0181   Epoch: 11   Global Step: 476380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:06,202-Speed 2628.75 samples/sec   Loss 5.7575   LearningRate 0.0181   Epoch: 11   Global Step: 476390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:10,101-Speed 2626.84 samples/sec   Loss 5.7143   LearningRate 0.0181   Epoch: 11   Global Step: 476400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:13,998-Speed 2628.19 samples/sec   Loss 5.6695   LearningRate 0.0181   Epoch: 11   Global Step: 476410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:17,899-Speed 2626.17 samples/sec   Loss 5.6219   LearningRate 0.0181   Epoch: 11   Global Step: 476420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:21,824-Speed 2609.22 samples/sec   Loss 5.6533   LearningRate 0.0181   Epoch: 11   Global Step: 476430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:25,720-Speed 2629.35 samples/sec   Loss 5.6845   LearningRate 0.0181   Epoch: 11   Global Step: 476440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:29,625-Speed 2623.01 samples/sec   Loss 5.7242   LearningRate 0.0181   Epoch: 11   Global Step: 476450   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:15:33,527-Speed 2624.90 samples/sec   Loss 5.6762   LearningRate 0.0181   Epoch: 11   Global Step: 476460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:15:37,431-Speed 2623.26 samples/sec   Loss 5.7707   LearningRate 0.0181   Epoch: 11   Global Step: 476470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:15:41,317-Speed 2635.94 samples/sec   Loss 5.6462   LearningRate 0.0181   Epoch: 11   Global Step: 476480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:45,219-Speed 2625.11 samples/sec   Loss 5.6840   LearningRate 0.0181   Epoch: 11   Global Step: 476490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:49,127-Speed 2621.06 samples/sec   Loss 5.6738   LearningRate 0.0181   Epoch: 11   Global Step: 476500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:53,043-Speed 2615.19 samples/sec   Loss 5.6810   LearningRate 0.0181   Epoch: 11   Global Step: 476510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:15:56,951-Speed 2621.50 samples/sec   Loss 5.6953   LearningRate 0.0181   Epoch: 11   Global Step: 476520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:00,851-Speed 2626.40 samples/sec   Loss 5.6658   LearningRate 0.0181   Epoch: 11   Global Step: 476530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:04,751-Speed 2626.30 samples/sec   Loss 5.6042   LearningRate 0.0181   Epoch: 11   Global Step: 476540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:08,649-Speed 2627.21 samples/sec   Loss 5.6475   LearningRate 0.0181   Epoch: 11   Global Step: 476550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:12,547-Speed 2627.56 samples/sec   Loss 5.7028   LearningRate 0.0181   Epoch: 11   Global Step: 476560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:16,488-Speed 2599.88 samples/sec   Loss 5.6095   LearningRate 0.0181   Epoch: 11   Global Step: 476570   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:20,389-Speed 2625.87 samples/sec   Loss 5.7051   LearningRate 0.0181   Epoch: 11   Global Step: 476580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:16:24,297-Speed 2620.82 samples/sec   Loss 5.6624   LearningRate 0.0181   Epoch: 11   Global Step: 476590   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:16:28,177-Speed 2639.78 samples/sec   Loss 5.6518   LearningRate 0.0181   Epoch: 11   Global Step: 476600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:32,084-Speed 2621.37 samples/sec   Loss 5.7255   LearningRate 0.0181   Epoch: 11   Global Step: 476610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:35,992-Speed 2621.20 samples/sec   Loss 5.7802   LearningRate 0.0181   Epoch: 11   Global Step: 476620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:39,893-Speed 2626.31 samples/sec   Loss 5.6205   LearningRate 0.0181   Epoch: 11   Global Step: 476630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:43,798-Speed 2622.16 samples/sec   Loss 5.7026   LearningRate 0.0181   Epoch: 11   Global Step: 476640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:47,700-Speed 2625.22 samples/sec   Loss 5.6911   LearningRate 0.0181   Epoch: 11   Global Step: 476650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:51,599-Speed 2627.16 samples/sec   Loss 5.6803   LearningRate 0.0181   Epoch: 11   Global Step: 476660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:55,500-Speed 2626.31 samples/sec   Loss 5.7288   LearningRate 0.0181   Epoch: 11   Global Step: 476670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:16:59,415-Speed 2615.46 samples/sec   Loss 5.7486   LearningRate 0.0181   Epoch: 11   Global Step: 476680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:17:03,325-Speed 2619.99 samples/sec   Loss 5.7389   LearningRate 0.0181   Epoch: 11   Global Step: 476690   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:17:07,226-Speed 2625.54 samples/sec   Loss 5.7130   LearningRate 0.0181   Epoch: 11   Global Step: 476700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:11,151-Speed 2610.05 samples/sec   Loss 5.6578   LearningRate 0.0181   Epoch: 11   Global Step: 476710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:15,048-Speed 2627.78 samples/sec   Loss 5.6268   LearningRate 0.0181   Epoch: 11   Global Step: 476720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:18,964-Speed 2615.73 samples/sec   Loss 5.6216   LearningRate 0.0181   Epoch: 11   Global Step: 476730   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:22,863-Speed 2627.81 samples/sec   Loss 5.6898   LearningRate 0.0181   Epoch: 11   Global Step: 476740   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:26,759-Speed 2628.96 samples/sec   Loss 5.6771   LearningRate 0.0181   Epoch: 11   Global Step: 476750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:30,654-Speed 2629.92 samples/sec   Loss 5.7195   LearningRate 0.0181   Epoch: 11   Global Step: 476760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:34,553-Speed 2626.72 samples/sec   Loss 5.7253   LearningRate 0.0181   Epoch: 11   Global Step: 476770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:38,451-Speed 2627.17 samples/sec   Loss 5.6893   LearningRate 0.0181   Epoch: 11   Global Step: 476780   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:42,348-Speed 2628.42 samples/sec   Loss 5.8009   LearningRate 0.0181   Epoch: 11   Global Step: 476790   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:46,239-Speed 2632.12 samples/sec   Loss 5.7236   LearningRate 0.0181   Epoch: 11   Global Step: 476800   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:50,142-Speed 2624.55 samples/sec   Loss 5.5737   LearningRate 0.0181   Epoch: 11   Global Step: 476810   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:54,040-Speed 2627.87 samples/sec   Loss 5.4651   LearningRate 0.0181   Epoch: 11   Global Step: 476820   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:17:57,940-Speed 2626.25 samples/sec   Loss 5.7440   LearningRate 0.0181   Epoch: 11   Global Step: 476830   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:18:01,835-Speed 2629.73 samples/sec   Loss 5.7010   LearningRate 0.0181   Epoch: 11   Global Step: 476840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:18:05,731-Speed 2629.35 samples/sec   Loss 5.6379   LearningRate 0.0181   Epoch: 11   Global Step: 476850   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:18:09,600-Speed 2646.83 samples/sec   Loss 5.6266   LearningRate 0.0181   Epoch: 11   Global Step: 476860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:13,549-Speed 2593.85 samples/sec   Loss 5.6174   LearningRate 0.0181   Epoch: 11   Global Step: 476870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:17,447-Speed 2628.54 samples/sec   Loss 5.7671   LearningRate 0.0181   Epoch: 11   Global Step: 476880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:21,345-Speed 2627.66 samples/sec   Loss 5.6712   LearningRate 0.0181   Epoch: 11   Global Step: 476890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:25,244-Speed 2626.59 samples/sec   Loss 5.6813   LearningRate 0.0181   Epoch: 11   Global Step: 476900   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:29,137-Speed 2631.96 samples/sec   Loss 5.6922   LearningRate 0.0181   Epoch: 11   Global Step: 476910   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:33,029-Speed 2631.46 samples/sec   Loss 5.6250   LearningRate 0.0181   Epoch: 11   Global Step: 476920   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:36,924-Speed 2629.34 samples/sec   Loss 5.6770   LearningRate 0.0181   Epoch: 11   Global Step: 476930   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:40,817-Speed 2630.89 samples/sec   Loss 5.7732   LearningRate 0.0181   Epoch: 11   Global Step: 476940   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:44,718-Speed 2625.73 samples/sec   Loss 5.7239   LearningRate 0.0181   Epoch: 11   Global Step: 476950   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:18:48,626-Speed 2621.02 samples/sec   Loss 5.5999   LearningRate 0.0181   Epoch: 11   Global Step: 476960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:18:52,518-Speed 2631.82 samples/sec   Loss 5.6484   LearningRate 0.0181   Epoch: 11   Global Step: 476970   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:18:56,417-Speed 2626.82 samples/sec   Loss 5.8050   LearningRate 0.0181   Epoch: 11   Global Step: 476980   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:19:00,292-Speed 2644.95 samples/sec   Loss 5.6860   LearningRate 0.0181   Epoch: 11   Global Step: 476990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:04,189-Speed 2628.19 samples/sec   Loss 5.7625   LearningRate 0.0181   Epoch: 11   Global Step: 477000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:08,083-Speed 2629.99 samples/sec   Loss 5.6194   LearningRate 0.0181   Epoch: 11   Global Step: 477010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:11,979-Speed 2628.71 samples/sec   Loss 5.6669   LearningRate 0.0181   Epoch: 11   Global Step: 477020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:15,875-Speed 2629.05 samples/sec   Loss 5.7692   LearningRate 0.0181   Epoch: 11   Global Step: 477030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:19,767-Speed 2632.05 samples/sec   Loss 5.7281   LearningRate 0.0181   Epoch: 11   Global Step: 477040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:23,670-Speed 2624.37 samples/sec   Loss 5.7049   LearningRate 0.0181   Epoch: 11   Global Step: 477050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:27,562-Speed 2631.15 samples/sec   Loss 5.7047   LearningRate 0.0181   Epoch: 11   Global Step: 477060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:31,460-Speed 2628.23 samples/sec   Loss 5.5937   LearningRate 0.0181   Epoch: 11   Global Step: 477070   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:35,354-Speed 2629.92 samples/sec   Loss 5.5780   LearningRate 0.0181   Epoch: 11   Global Step: 477080   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:19:39,248-Speed 2630.55 samples/sec   Loss 5.6351   LearningRate 0.0181   Epoch: 11   Global Step: 477090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:19:43,138-Speed 2632.69 samples/sec   Loss 5.7953   LearningRate 0.0181   Epoch: 11   Global Step: 477100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:19:47,031-Speed 2631.20 samples/sec   Loss 5.6648   LearningRate 0.0181   Epoch: 11   Global Step: 477110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:19:50,929-Speed 2627.47 samples/sec   Loss 5.6671   LearningRate 0.0181   Epoch: 11   Global Step: 477120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:19:54,828-Speed 2627.21 samples/sec   Loss 5.6475   LearningRate 0.0181   Epoch: 11   Global Step: 477130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:19:58,721-Speed 2631.10 samples/sec   Loss 5.6614   LearningRate 0.0180   Epoch: 11   Global Step: 477140   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:02,641-Speed 2613.19 samples/sec   Loss 5.6553   LearningRate 0.0180   Epoch: 11   Global Step: 477150   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:06,543-Speed 2624.70 samples/sec   Loss 5.7086   LearningRate 0.0180   Epoch: 11   Global Step: 477160   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:10,534-Speed 2566.20 samples/sec   Loss 5.5617   LearningRate 0.0180   Epoch: 11   Global Step: 477170   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:14,438-Speed 2623.77 samples/sec   Loss 5.6749   LearningRate 0.0180   Epoch: 11   Global Step: 477180   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:18,333-Speed 2630.10 samples/sec   Loss 5.6633   LearningRate 0.0180   Epoch: 11   Global Step: 477190   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:22,225-Speed 2631.56 samples/sec   Loss 5.7255   LearningRate 0.0180   Epoch: 11   Global Step: 477200   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:26,147-Speed 2611.91 samples/sec   Loss 5.6330   LearningRate 0.0180   Epoch: 11   Global Step: 477210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:30,038-Speed 2631.98 samples/sec   Loss 5.7220   LearningRate 0.0180   Epoch: 11   Global Step: 477220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:33,939-Speed 2625.29 samples/sec   Loss 5.7260   LearningRate 0.0180   Epoch: 11   Global Step: 477230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:37,831-Speed 2631.83 samples/sec   Loss 5.7203   LearningRate 0.0180   Epoch: 11   Global Step: 477240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:20:41,721-Speed 2633.31 samples/sec   Loss 5.7600   LearningRate 0.0180   Epoch: 11   Global Step: 477250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:20:45,603-Speed 2638.10 samples/sec   Loss 5.7065   LearningRate 0.0180   Epoch: 11   Global Step: 477260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:49,500-Speed 2629.30 samples/sec   Loss 5.6675   LearningRate 0.0180   Epoch: 11   Global Step: 477270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:53,396-Speed 2628.90 samples/sec   Loss 5.7740   LearningRate 0.0180   Epoch: 11   Global Step: 477280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:20:57,295-Speed 2626.59 samples/sec   Loss 5.5729   LearningRate 0.0180   Epoch: 11   Global Step: 477290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:01,198-Speed 2624.37 samples/sec   Loss 5.7631   LearningRate 0.0180   Epoch: 11   Global Step: 477300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:05,102-Speed 2623.25 samples/sec   Loss 5.7183   LearningRate 0.0180   Epoch: 11   Global Step: 477310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:08,998-Speed 2629.05 samples/sec   Loss 5.6735   LearningRate 0.0180   Epoch: 11   Global Step: 477320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:12,904-Speed 2622.36 samples/sec   Loss 5.6260   LearningRate 0.0180   Epoch: 11   Global Step: 477330   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:16,802-Speed 2627.98 samples/sec   Loss 5.6715   LearningRate 0.0180   Epoch: 11   Global Step: 477340   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:20,707-Speed 2622.86 samples/sec   Loss 5.6519   LearningRate 0.0180   Epoch: 11   Global Step: 477350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:24,606-Speed 2627.70 samples/sec   Loss 5.7006   LearningRate 0.0180   Epoch: 11   Global Step: 477360   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:21:28,507-Speed 2625.50 samples/sec   Loss 5.6363   LearningRate 0.0180   Epoch: 11   Global Step: 477370   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:21:32,399-Speed 2631.91 samples/sec   Loss 5.6275   LearningRate 0.0180   Epoch: 11   Global Step: 477380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:36,301-Speed 2624.69 samples/sec   Loss 5.7420   LearningRate 0.0180   Epoch: 11   Global Step: 477390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:40,199-Speed 2627.51 samples/sec   Loss 5.7081   LearningRate 0.0180   Epoch: 11   Global Step: 477400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:44,097-Speed 2627.76 samples/sec   Loss 5.7085   LearningRate 0.0180   Epoch: 11   Global Step: 477410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:47,990-Speed 2630.84 samples/sec   Loss 5.7096   LearningRate 0.0180   Epoch: 11   Global Step: 477420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:51,926-Speed 2602.89 samples/sec   Loss 5.6091   LearningRate 0.0180   Epoch: 11   Global Step: 477430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:21:56,010-Speed 2508.06 samples/sec   Loss 5.6043   LearningRate 0.0180   Epoch: 11   Global Step: 477440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:00,091-Speed 2509.68 samples/sec   Loss 5.6240   LearningRate 0.0180   Epoch: 11   Global Step: 477450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:04,080-Speed 2568.18 samples/sec   Loss 5.7324   LearningRate 0.0180   Epoch: 11   Global Step: 477460   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:07,976-Speed 2628.89 samples/sec   Loss 5.6808   LearningRate 0.0180   Epoch: 11   Global Step: 477470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:11,876-Speed 2625.88 samples/sec   Loss 5.7030   LearningRate 0.0180   Epoch: 11   Global Step: 477480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:22:15,769-Speed 2631.42 samples/sec   Loss 5.5855   LearningRate 0.0180   Epoch: 11   Global Step: 477490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:22:19,672-Speed 2624.29 samples/sec   Loss 5.6474   LearningRate 0.0180   Epoch: 11   Global Step: 477500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:22:23,552-Speed 2640.08 samples/sec   Loss 5.7034   LearningRate 0.0180   Epoch: 11   Global Step: 477510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:27,503-Speed 2592.46 samples/sec   Loss 5.6393   LearningRate 0.0180   Epoch: 11   Global Step: 477520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:31,396-Speed 2631.70 samples/sec   Loss 5.6082   LearningRate 0.0180   Epoch: 11   Global Step: 477530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:35,315-Speed 2612.86 samples/sec   Loss 5.6425   LearningRate 0.0180   Epoch: 11   Global Step: 477540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:39,218-Speed 2624.82 samples/sec   Loss 5.7196   LearningRate 0.0180   Epoch: 11   Global Step: 477550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:43,111-Speed 2630.56 samples/sec   Loss 5.7397   LearningRate 0.0180   Epoch: 11   Global Step: 477560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:47,016-Speed 2623.49 samples/sec   Loss 5.5454   LearningRate 0.0180   Epoch: 11   Global Step: 477570   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:50,953-Speed 2601.17 samples/sec   Loss 5.6443   LearningRate 0.0180   Epoch: 11   Global Step: 477580   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:54,850-Speed 2628.42 samples/sec   Loss 5.6731   LearningRate 0.0180   Epoch: 11   Global Step: 477590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:22:58,761-Speed 2619.39 samples/sec   Loss 5.6115   LearningRate 0.0180   Epoch: 11   Global Step: 477600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:02,658-Speed 2628.84 samples/sec   Loss 5.7019   LearningRate 0.0180   Epoch: 11   Global Step: 477610   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:23:06,559-Speed 2625.36 samples/sec   Loss 5.6631   LearningRate 0.0180   Epoch: 11   Global Step: 477620   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:23:10,572-Speed 2551.75 samples/sec   Loss 5.6191   LearningRate 0.0180   Epoch: 11   Global Step: 477630   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:23:14,468-Speed 2629.15 samples/sec   Loss 5.7456   LearningRate 0.0180   Epoch: 11   Global Step: 477640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:18,395-Speed 2608.26 samples/sec   Loss 5.6618   LearningRate 0.0180   Epoch: 11   Global Step: 477650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:22,295-Speed 2626.48 samples/sec   Loss 5.6190   LearningRate 0.0180   Epoch: 11   Global Step: 477660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:26,199-Speed 2624.17 samples/sec   Loss 5.6782   LearningRate 0.0180   Epoch: 11   Global Step: 477670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:30,138-Speed 2600.53 samples/sec   Loss 5.7582   LearningRate 0.0180   Epoch: 11   Global Step: 477680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:34,075-Speed 2601.05 samples/sec   Loss 5.6380   LearningRate 0.0180   Epoch: 11   Global Step: 477690   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:38,168-Speed 2502.36 samples/sec   Loss 5.7508   LearningRate 0.0180   Epoch: 11   Global Step: 477700   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:42,267-Speed 2498.42 samples/sec   Loss 5.7012   LearningRate 0.0180   Epoch: 11   Global Step: 477710   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:46,205-Speed 2601.65 samples/sec   Loss 5.6423   LearningRate 0.0180   Epoch: 11   Global Step: 477720   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:50,105-Speed 2626.07 samples/sec   Loss 5.6835   LearningRate 0.0180   Epoch: 11   Global Step: 477730   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:23:54,003-Speed 2627.82 samples/sec   Loss 5.7191   LearningRate 0.0180   Epoch: 11   Global Step: 477740   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:23:57,901-Speed 2627.61 samples/sec   Loss 5.5935   LearningRate 0.0180   Epoch: 11   Global Step: 477750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:24:01,811-Speed 2619.51 samples/sec   Loss 5.6735   LearningRate 0.0180   Epoch: 11   Global Step: 477760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:24:05,702-Speed 2632.64 samples/sec   Loss 5.5738   LearningRate 0.0180   Epoch: 11   Global Step: 477770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:24:09,596-Speed 2630.39 samples/sec   Loss 5.6820   LearningRate 0.0180   Epoch: 11   Global Step: 477780   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:24:13,466-Speed 2646.57 samples/sec   Loss 5.6203   LearningRate 0.0180   Epoch: 11   Global Step: 477790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:17,364-Speed 2627.48 samples/sec   Loss 5.7528   LearningRate 0.0180   Epoch: 11   Global Step: 477800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:21,269-Speed 2622.50 samples/sec   Loss 5.6594   LearningRate 0.0180   Epoch: 11   Global Step: 477810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:25,164-Speed 2629.57 samples/sec   Loss 5.6368   LearningRate 0.0180   Epoch: 11   Global Step: 477820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:29,089-Speed 2610.37 samples/sec   Loss 5.7219   LearningRate 0.0180   Epoch: 11   Global Step: 477830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:32,990-Speed 2625.16 samples/sec   Loss 5.7196   LearningRate 0.0180   Epoch: 11   Global Step: 477840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:36,894-Speed 2624.14 samples/sec   Loss 5.6264   LearningRate 0.0180   Epoch: 11   Global Step: 477850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:40,870-Speed 2575.98 samples/sec   Loss 5.7657   LearningRate 0.0180   Epoch: 11   Global Step: 477860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:44,763-Speed 2631.17 samples/sec   Loss 5.6715   LearningRate 0.0180   Epoch: 11   Global Step: 477870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:48,659-Speed 2628.57 samples/sec   Loss 5.6209   LearningRate 0.0180   Epoch: 11   Global Step: 477880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-04-15 01:24:52,553-Speed 2630.96 samples/sec   Loss 5.6746   LearningRate 0.0180   Epoch: 11   Global Step: 477890   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:24:56,466-Speed 2617.84 samples/sec   Loss 5.6256   LearningRate 0.0180   Epoch: 11   Global Step: 477900   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:25:00,370-Speed 2623.22 samples/sec   Loss 5.6973   LearningRate 0.0180   Epoch: 11   Global Step: 477910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:25:04,269-Speed 2627.32 samples/sec   Loss 5.6481   LearningRate 0.0180   Epoch: 11   Global Step: 477920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:25:08,163-Speed 2630.91 samples/sec   Loss 5.7032   LearningRate 0.0180   Epoch: 11   Global Step: 477930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:25:12,062-Speed 2626.77 samples/sec   Loss 5.6927   LearningRate 0.0180   Epoch: 11   Global Step: 477940   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-04-15 01:25:15,957-Speed 2629.21 samples/sec   Loss 5.6081   LearningRate 0.0180   Epoch: 11   Global Step: 477950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:19,855-Speed 2627.70 samples/sec   Loss 5.6870   LearningRate 0.0180   Epoch: 11   Global Step: 477960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:23,766-Speed 2618.72 samples/sec   Loss 5.5540   LearningRate 0.0180   Epoch: 11   Global Step: 477970   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:27,664-Speed 2627.82 samples/sec   Loss 5.5452   LearningRate 0.0180   Epoch: 11   Global Step: 477980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:31,568-Speed 2623.00 samples/sec   Loss 5.6067   LearningRate 0.0180   Epoch: 11   Global Step: 477990   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-04-15 01:25:35,465-Speed 2628.55 samples/sec   Loss 5.6587   LearningRate 0.0180   Epoch: 11   Global Step: 478000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:39,362-Speed 2628.65 samples/sec   Loss 5.6199   LearningRate 0.0180   Epoch: 11   Global Step: 478010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:43,268-Speed 2621.78 samples/sec   Loss 5.6750   LearningRate 0.0180   Epoch: 11   Global Step: 478020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:47,168-Speed 2626.67 samples/sec   Loss 5.5907   LearningRate 0.0180   Epoch: 11   Global Step: 478030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:51,058-Speed 2633.14 samples/sec   Loss 5.7191   LearningRate 0.0180   Epoch: 11   Global Step: 478040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:54,947-Speed 2633.33 samples/sec   Loss 5.6951   LearningRate 0.0180   Epoch: 11   Global Step: 478050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:25:58,841-Speed 2630.39 samples/sec   Loss 5.6704   LearningRate 0.0180   Epoch: 11   Global Step: 478060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:26:02,737-Speed 2629.03 samples/sec   Loss 5.7856   LearningRate 0.0180   Epoch: 11   Global Step: 478070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:26:06,629-Speed 2631.43 samples/sec   Loss 5.7475   LearningRate 0.0180   Epoch: 11   Global Step: 478080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:26:10,501-Speed 2645.03 samples/sec   Loss 5.8039   LearningRate 0.0180   Epoch: 11   Global Step: 478090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:26:14,399-Speed 2627.94 samples/sec   Loss 5.6569   LearningRate 0.0180   Epoch: 11   Global Step: 478100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:26:18,292-Speed 2631.20 samples/sec   Loss 5.7410   LearningRate 0.0179   Epoch: 11   Global Step: 478110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:26:22,163-Speed 2646.12 samples/sec   Loss 5.6335   LearningRate 0.0179   Epoch: 11   Global Step: 478120   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:26,055-Speed 2631.58 samples/sec   Loss 5.6637   LearningRate 0.0179   Epoch: 11   Global Step: 478130   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:29,954-Speed 2626.29 samples/sec   Loss 5.6749   LearningRate 0.0179   Epoch: 11   Global Step: 478140   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:33,851-Speed 2628.41 samples/sec   Loss 5.5740   LearningRate 0.0179   Epoch: 11   Global Step: 478150   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:37,745-Speed 2630.65 samples/sec   Loss 5.6597   LearningRate 0.0179   Epoch: 11   Global Step: 478160   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:41,636-Speed 2632.38 samples/sec   Loss 5.6600   LearningRate 0.0179   Epoch: 11   Global Step: 478170   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:45,527-Speed 2633.11 samples/sec   Loss 5.7014   LearningRate 0.0179   Epoch: 11   Global Step: 478180   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:49,425-Speed 2627.16 samples/sec   Loss 5.6738   LearningRate 0.0179   Epoch: 11   Global Step: 478190   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:53,328-Speed 2624.91 samples/sec   Loss 5.6351   LearningRate 0.0179   Epoch: 11   Global Step: 478200   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:26:57,246-Speed 2613.69 samples/sec   Loss 5.7583   LearningRate 0.0179   Epoch: 11   Global Step: 478210   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:27:01,140-Speed 2630.54 samples/sec   Loss 5.7030   LearningRate 0.0179   Epoch: 11   Global Step: 478220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:05,036-Speed 2629.11 samples/sec   Loss 5.8118   LearningRate 0.0179   Epoch: 11   Global Step: 478230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:08,934-Speed 2628.27 samples/sec   Loss 5.6794   LearningRate 0.0179   Epoch: 11   Global Step: 478240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:12,834-Speed 2626.80 samples/sec   Loss 5.5924   LearningRate 0.0179   Epoch: 11   Global Step: 478250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:16,736-Speed 2624.81 samples/sec   Loss 5.7317   LearningRate 0.0179   Epoch: 11   Global Step: 478260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:20,633-Speed 2628.75 samples/sec   Loss 5.6877   LearningRate 0.0179   Epoch: 11   Global Step: 478270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:24,537-Speed 2623.88 samples/sec   Loss 5.7907   LearningRate 0.0179   Epoch: 11   Global Step: 478280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:28,435-Speed 2628.41 samples/sec   Loss 5.6118   LearningRate 0.0179   Epoch: 11   Global Step: 478290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:32,331-Speed 2628.68 samples/sec   Loss 5.7161   LearningRate 0.0179   Epoch: 11   Global Step: 478300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:36,247-Speed 2615.18 samples/sec   Loss 5.7446   LearningRate 0.0179   Epoch: 11   Global Step: 478310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:27:40,143-Speed 2628.64 samples/sec   Loss 5.5929   LearningRate 0.0179   Epoch: 11   Global Step: 478320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:27:44,046-Speed 2625.21 samples/sec   Loss 5.6387   LearningRate 0.0179   Epoch: 11   Global Step: 478330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:27:47,951-Speed 2623.45 samples/sec   Loss 5.6139   LearningRate 0.0179   Epoch: 11   Global Step: 478340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:27:51,853-Speed 2624.95 samples/sec   Loss 5.6622   LearningRate 0.0179   Epoch: 11   Global Step: 478350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:27:55,758-Speed 2623.73 samples/sec   Loss 5.6234   LearningRate 0.0179   Epoch: 11   Global Step: 478360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:27:59,653-Speed 2629.43 samples/sec   Loss 5.6884   LearningRate 0.0179   Epoch: 11   Global Step: 478370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:03,563-Speed 2619.35 samples/sec   Loss 5.6893   LearningRate 0.0179   Epoch: 11   Global Step: 478380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:07,471-Speed 2620.93 samples/sec   Loss 5.6255   LearningRate 0.0179   Epoch: 11   Global Step: 478390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:11,380-Speed 2620.77 samples/sec   Loss 5.5456   LearningRate 0.0179   Epoch: 11   Global Step: 478400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:15,278-Speed 2627.74 samples/sec   Loss 5.7806   LearningRate 0.0179   Epoch: 11   Global Step: 478410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:19,157-Speed 2640.33 samples/sec   Loss 5.6678   LearningRate 0.0179   Epoch: 11   Global Step: 478420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:23,057-Speed 2626.55 samples/sec   Loss 5.7083   LearningRate 0.0179   Epoch: 11   Global Step: 478430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:27,040-Speed 2571.95 samples/sec   Loss 5.6367   LearningRate 0.0179   Epoch: 11   Global Step: 478440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:30,945-Speed 2622.99 samples/sec   Loss 5.5859   LearningRate 0.0179   Epoch: 11   Global Step: 478450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:34,847-Speed 2625.34 samples/sec   Loss 5.5947   LearningRate 0.0179   Epoch: 11   Global Step: 478460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:38,763-Speed 2615.23 samples/sec   Loss 5.7672   LearningRate 0.0179   Epoch: 11   Global Step: 478470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:42,657-Speed 2631.07 samples/sec   Loss 5.6875   LearningRate 0.0179   Epoch: 11   Global Step: 478480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:46,554-Speed 2627.97 samples/sec   Loss 5.7477   LearningRate 0.0179   Epoch: 11   Global Step: 478490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:50,449-Speed 2630.31 samples/sec   Loss 5.7165   LearningRate 0.0179   Epoch: 11   Global Step: 478500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:54,345-Speed 2629.40 samples/sec   Loss 5.6722   LearningRate 0.0179   Epoch: 11   Global Step: 478510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:28:58,220-Speed 2643.84 samples/sec   Loss 5.7194   LearningRate 0.0179   Epoch: 11   Global Step: 478520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:29:02,151-Speed 2605.05 samples/sec   Loss 5.6336   LearningRate 0.0179   Epoch: 11   Global Step: 478530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:29:06,051-Speed 2626.37 samples/sec   Loss 5.6862   LearningRate 0.0179   Epoch: 11   Global Step: 478540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:29:09,956-Speed 2622.72 samples/sec   Loss 5.7057   LearningRate 0.0179   Epoch: 11   Global Step: 478550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:29:13,854-Speed 2628.48 samples/sec   Loss 5.6223   LearningRate 0.0179   Epoch: 11   Global Step: 478560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:29:17,747-Speed 2631.01 samples/sec   Loss 5.8138   LearningRate 0.0179   Epoch: 11   Global Step: 478570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:29:21,644-Speed 2628.72 samples/sec   Loss 5.5706   LearningRate 0.0179   Epoch: 11   Global Step: 478580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:29:25,552-Speed 2621.56 samples/sec   Loss 5.6926   LearningRate 0.0179   Epoch: 11   Global Step: 478590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:29,452-Speed 2626.04 samples/sec   Loss 5.6470   LearningRate 0.0179   Epoch: 11   Global Step: 478600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:33,358-Speed 2622.56 samples/sec   Loss 5.6777   LearningRate 0.0179   Epoch: 11   Global Step: 478610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:37,253-Speed 2628.85 samples/sec   Loss 5.5772   LearningRate 0.0179   Epoch: 11   Global Step: 478620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:41,149-Speed 2629.60 samples/sec   Loss 5.6345   LearningRate 0.0179   Epoch: 11   Global Step: 478630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:45,040-Speed 2631.81 samples/sec   Loss 5.6741   LearningRate 0.0179   Epoch: 11   Global Step: 478640   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:48,933-Speed 2631.60 samples/sec   Loss 5.7427   LearningRate 0.0179   Epoch: 11   Global Step: 478650   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:52,838-Speed 2622.74 samples/sec   Loss 5.6807   LearningRate 0.0179   Epoch: 11   Global Step: 478660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:29:56,733-Speed 2630.35 samples/sec   Loss 5.6603   LearningRate 0.0179   Epoch: 11   Global Step: 478670   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:00,635-Speed 2624.84 samples/sec   Loss 5.6184   LearningRate 0.0179   Epoch: 11   Global Step: 478680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:04,508-Speed 2643.92 samples/sec   Loss 5.7207   LearningRate 0.0179   Epoch: 11   Global Step: 478690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:08,410-Speed 2625.15 samples/sec   Loss 5.6353   LearningRate 0.0179   Epoch: 11   Global Step: 478700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:12,334-Speed 2611.13 samples/sec   Loss 5.7205   LearningRate 0.0179   Epoch: 11   Global Step: 478710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:16,246-Speed 2618.03 samples/sec   Loss 5.6703   LearningRate 0.0179   Epoch: 11   Global Step: 478720   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:20,156-Speed 2619.44 samples/sec   Loss 5.7316   LearningRate 0.0179   Epoch: 11   Global Step: 478730   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:24,053-Speed 2628.75 samples/sec   Loss 5.6683   LearningRate 0.0179   Epoch: 11   Global Step: 478740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:27,949-Speed 2628.87 samples/sec   Loss 5.6465   LearningRate 0.0179   Epoch: 11   Global Step: 478750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:31,844-Speed 2630.21 samples/sec   Loss 5.5951   LearningRate 0.0179   Epoch: 11   Global Step: 478760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:35,737-Speed 2630.74 samples/sec   Loss 5.6914   LearningRate 0.0179   Epoch: 11   Global Step: 478770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:39,638-Speed 2625.06 samples/sec   Loss 5.6617   LearningRate 0.0179   Epoch: 11   Global Step: 478780   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:30:43,543-Speed 2623.06 samples/sec   Loss 5.6102   LearningRate 0.0179   Epoch: 11   Global Step: 478790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:30:47,441-Speed 2628.35 samples/sec   Loss 5.6299   LearningRate 0.0179   Epoch: 11   Global Step: 478800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:30:51,337-Speed 2628.39 samples/sec   Loss 5.6763   LearningRate 0.0179   Epoch: 11   Global Step: 478810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:30:55,253-Speed 2616.22 samples/sec   Loss 5.7523   LearningRate 0.0179   Epoch: 11   Global Step: 478820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:30:59,150-Speed 2628.13 samples/sec   Loss 5.6421   LearningRate 0.0179   Epoch: 11   Global Step: 478830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:03,059-Speed 2621.12 samples/sec   Loss 5.6354   LearningRate 0.0179   Epoch: 11   Global Step: 478840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:06,960-Speed 2625.19 samples/sec   Loss 5.5903   LearningRate 0.0179   Epoch: 11   Global Step: 478850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:10,856-Speed 2628.77 samples/sec   Loss 5.6602   LearningRate 0.0179   Epoch: 11   Global Step: 478860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:14,753-Speed 2628.46 samples/sec   Loss 5.7647   LearningRate 0.0179   Epoch: 11   Global Step: 478870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:18,660-Speed 2621.97 samples/sec   Loss 5.6423   LearningRate 0.0179   Epoch: 11   Global Step: 478880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:22,562-Speed 2625.12 samples/sec   Loss 5.6622   LearningRate 0.0179   Epoch: 11   Global Step: 478890   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-04-15 01:31:26,440-Speed 2641.01 samples/sec   Loss 5.6594   LearningRate 0.0179   Epoch: 11   Global Step: 478900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:30,335-Speed 2630.14 samples/sec   Loss 5.6273   LearningRate 0.0179   Epoch: 11   Global Step: 478910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:34,231-Speed 2628.41 samples/sec   Loss 5.6457   LearningRate 0.0179   Epoch: 11   Global Step: 478920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:38,130-Speed 2626.99 samples/sec   Loss 5.7345   LearningRate 0.0179   Epoch: 11   Global Step: 478930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:42,037-Speed 2621.09 samples/sec   Loss 5.6030   LearningRate 0.0179   Epoch: 11   Global Step: 478940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:45,934-Speed 2628.88 samples/sec   Loss 5.6886   LearningRate 0.0179   Epoch: 11   Global Step: 478950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:31:49,809-Speed 2643.55 samples/sec   Loss 5.7845   LearningRate 0.0179   Epoch: 11   Global Step: 478960   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:31:53,726-Speed 2615.46 samples/sec   Loss 5.6687   LearningRate 0.0179   Epoch: 11   Global Step: 478970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:31:57,640-Speed 2616.17 samples/sec   Loss 5.7004   LearningRate 0.0179   Epoch: 11   Global Step: 478980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:01,537-Speed 2628.49 samples/sec   Loss 5.6456   LearningRate 0.0179   Epoch: 11   Global Step: 478990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:05,505-Speed 2581.48 samples/sec   Loss 5.7624   LearningRate 0.0179   Epoch: 11   Global Step: 479000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:09,402-Speed 2627.76 samples/sec   Loss 5.5528   LearningRate 0.0179   Epoch: 11   Global Step: 479010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:13,319-Speed 2614.89 samples/sec   Loss 5.5664   LearningRate 0.0179   Epoch: 11   Global Step: 479020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:17,230-Speed 2619.88 samples/sec   Loss 5.5644   LearningRate 0.0179   Epoch: 11   Global Step: 479030   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:21,130-Speed 2625.87 samples/sec   Loss 5.6501   LearningRate 0.0179   Epoch: 11   Global Step: 479040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:25,033-Speed 2624.65 samples/sec   Loss 5.6804   LearningRate 0.0179   Epoch: 11   Global Step: 479050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:32:28,937-Speed 2624.09 samples/sec   Loss 5.6574   LearningRate 0.0179   Epoch: 11   Global Step: 479060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:32:32,832-Speed 2629.64 samples/sec   Loss 5.5336   LearningRate 0.0179   Epoch: 11   Global Step: 479070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:32:36,727-Speed 2629.17 samples/sec   Loss 5.6237   LearningRate 0.0179   Epoch: 11   Global Step: 479080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:32:40,623-Speed 2628.91 samples/sec   Loss 5.5867   LearningRate 0.0178   Epoch: 11   Global Step: 479090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:32:44,525-Speed 2625.08 samples/sec   Loss 5.6912   LearningRate 0.0178   Epoch: 11   Global Step: 479100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:32:48,423-Speed 2627.88 samples/sec   Loss 5.6615   LearningRate 0.0178   Epoch: 11   Global Step: 479110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:32:52,331-Speed 2620.99 samples/sec   Loss 5.5921   LearningRate 0.0178   Epoch: 11   Global Step: 479120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:32:56,233-Speed 2624.58 samples/sec   Loss 5.6395   LearningRate 0.0178   Epoch: 11   Global Step: 479130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:33:00,136-Speed 2624.32 samples/sec   Loss 5.6541   LearningRate 0.0178   Epoch: 11   Global Step: 479140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:33:04,028-Speed 2631.30 samples/sec   Loss 5.6253   LearningRate 0.0178   Epoch: 11   Global Step: 479150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:33:07,906-Speed 2642.10 samples/sec   Loss 5.6909   LearningRate 0.0178   Epoch: 11   Global Step: 479160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:33:11,801-Speed 2628.96 samples/sec   Loss 5.7226   LearningRate 0.0178   Epoch: 11   Global Step: 479170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:33:15,700-Speed 2626.77 samples/sec   Loss 5.6409   LearningRate 0.0178   Epoch: 11   Global Step: 479180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:33:19,603-Speed 2624.76 samples/sec   Loss 5.6862   LearningRate 0.0178   Epoch: 11   Global Step: 479190   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:23,493-Speed 2633.29 samples/sec   Loss 5.6398   LearningRate 0.0178   Epoch: 11   Global Step: 479200   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:27,386-Speed 2631.25 samples/sec   Loss 5.6330   LearningRate 0.0178   Epoch: 11   Global Step: 479210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:31,324-Speed 2600.85 samples/sec   Loss 5.6983   LearningRate 0.0178   Epoch: 11   Global Step: 479220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:35,264-Speed 2599.33 samples/sec   Loss 5.5752   LearningRate 0.0178   Epoch: 11   Global Step: 479230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:39,295-Speed 2541.11 samples/sec   Loss 5.6883   LearningRate 0.0178   Epoch: 11   Global Step: 479240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:43,191-Speed 2629.61 samples/sec   Loss 5.6614   LearningRate 0.0178   Epoch: 11   Global Step: 479250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:47,107-Speed 2614.77 samples/sec   Loss 5.6687   LearningRate 0.0178   Epoch: 11   Global Step: 479260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:51,019-Speed 2619.13 samples/sec   Loss 5.5811   LearningRate 0.0178   Epoch: 11   Global Step: 479270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:54,914-Speed 2629.85 samples/sec   Loss 5.6902   LearningRate 0.0178   Epoch: 11   Global Step: 479280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:33:58,828-Speed 2616.48 samples/sec   Loss 5.6124   LearningRate 0.0178   Epoch: 11   Global Step: 479290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:34:02,730-Speed 2624.97 samples/sec   Loss 5.5876   LearningRate 0.0178   Epoch: 11   Global Step: 479300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:34:06,602-Speed 2645.40 samples/sec   Loss 5.6890   LearningRate 0.0178   Epoch: 11   Global Step: 479310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:10,504-Speed 2625.30 samples/sec   Loss 5.6586   LearningRate 0.0178   Epoch: 11   Global Step: 479320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:14,459-Speed 2589.72 samples/sec   Loss 5.6416   LearningRate 0.0178   Epoch: 11   Global Step: 479330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:18,366-Speed 2621.44 samples/sec   Loss 5.7359   LearningRate 0.0178   Epoch: 11   Global Step: 479340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:22,264-Speed 2628.03 samples/sec   Loss 5.6809   LearningRate 0.0178   Epoch: 11   Global Step: 479350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:26,204-Speed 2599.73 samples/sec   Loss 5.6087   LearningRate 0.0178   Epoch: 11   Global Step: 479360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:30,100-Speed 2629.33 samples/sec   Loss 5.6439   LearningRate 0.0178   Epoch: 11   Global Step: 479370   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:33,999-Speed 2626.80 samples/sec   Loss 5.6327   LearningRate 0.0178   Epoch: 11   Global Step: 479380   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:37,893-Speed 2629.92 samples/sec   Loss 5.7470   LearningRate 0.0178   Epoch: 11   Global Step: 479390   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:41,788-Speed 2630.09 samples/sec   Loss 5.6040   LearningRate 0.0178   Epoch: 11   Global Step: 479400   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:34:45,688-Speed 2625.86 samples/sec   Loss 5.6631   LearningRate 0.0178   Epoch: 11   Global Step: 479410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:34:49,597-Speed 2620.42 samples/sec   Loss 5.6165   LearningRate 0.0178   Epoch: 11   Global Step: 479420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:34:53,507-Speed 2619.48 samples/sec   Loss 5.6782   LearningRate 0.0178   Epoch: 11   Global Step: 479430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:34:57,408-Speed 2626.42 samples/sec   Loss 5.5738   LearningRate 0.0178   Epoch: 11   Global Step: 479440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:01,307-Speed 2626.91 samples/sec   Loss 5.6656   LearningRate 0.0178   Epoch: 11   Global Step: 479450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:05,216-Speed 2620.30 samples/sec   Loss 5.6711   LearningRate 0.0178   Epoch: 11   Global Step: 479460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:09,116-Speed 2626.18 samples/sec   Loss 5.5825   LearningRate 0.0178   Epoch: 11   Global Step: 479470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:13,045-Speed 2607.52 samples/sec   Loss 5.5566   LearningRate 0.0178   Epoch: 11   Global Step: 479480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:16,941-Speed 2628.89 samples/sec   Loss 5.6350   LearningRate 0.0178   Epoch: 11   Global Step: 479490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:20,863-Speed 2611.76 samples/sec   Loss 5.5954   LearningRate 0.0178   Epoch: 11   Global Step: 479500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:24,738-Speed 2643.10 samples/sec   Loss 5.6551   LearningRate 0.0178   Epoch: 11   Global Step: 479510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:28,651-Speed 2617.97 samples/sec   Loss 5.6338   LearningRate 0.0178   Epoch: 11   Global Step: 479520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:35:32,521-Speed 2646.57 samples/sec   Loss 5.7349   LearningRate 0.0178   Epoch: 11   Global Step: 479530   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:35:36,418-Speed 2628.13 samples/sec   Loss 5.6835   LearningRate 0.0178   Epoch: 11   Global Step: 479540   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:35:40,321-Speed 2624.43 samples/sec   Loss 5.6595   LearningRate 0.0178   Epoch: 11   Global Step: 479550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:35:44,215-Speed 2630.18 samples/sec   Loss 5.6923   LearningRate 0.0178   Epoch: 11   Global Step: 479560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:35:48,115-Speed 2627.08 samples/sec   Loss 5.5506   LearningRate 0.0178   Epoch: 11   Global Step: 479570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:35:52,012-Speed 2628.47 samples/sec   Loss 5.6153   LearningRate 0.0178   Epoch: 11   Global Step: 479580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:35:55,935-Speed 2610.76 samples/sec   Loss 5.6161   LearningRate 0.0178   Epoch: 11   Global Step: 479590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:35:59,832-Speed 2628.30 samples/sec   Loss 5.5959   LearningRate 0.0178   Epoch: 11   Global Step: 479600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:36:03,724-Speed 2631.99 samples/sec   Loss 5.7068   LearningRate 0.0178   Epoch: 11   Global Step: 479610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:36:07,619-Speed 2629.66 samples/sec   Loss 5.7292   LearningRate 0.0178   Epoch: 11   Global Step: 479620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:36:11,520-Speed 2626.11 samples/sec   Loss 5.6958   LearningRate 0.0178   Epoch: 11   Global Step: 479630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:15,418-Speed 2627.28 samples/sec   Loss 5.5091   LearningRate 0.0178   Epoch: 11   Global Step: 479640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:19,315-Speed 2628.81 samples/sec   Loss 5.6588   LearningRate 0.0178   Epoch: 11   Global Step: 479650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:23,221-Speed 2622.50 samples/sec   Loss 5.5703   LearningRate 0.0178   Epoch: 11   Global Step: 479660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:27,144-Speed 2610.57 samples/sec   Loss 5.6337   LearningRate 0.0178   Epoch: 11   Global Step: 479670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:31,041-Speed 2628.65 samples/sec   Loss 5.6776   LearningRate 0.0178   Epoch: 11   Global Step: 479680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:34,948-Speed 2621.20 samples/sec   Loss 5.6528   LearningRate 0.0178   Epoch: 11   Global Step: 479690   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:38,845-Speed 2628.57 samples/sec   Loss 5.5926   LearningRate 0.0178   Epoch: 11   Global Step: 479700   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:36:42,718-Speed 2644.28 samples/sec   Loss 5.7273   LearningRate 0.0178   Epoch: 11   Global Step: 479710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:36:46,588-Speed 2647.07 samples/sec   Loss 5.6771   LearningRate 0.0178   Epoch: 11   Global Step: 479720   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:36:50,488-Speed 2626.28 samples/sec   Loss 5.5895   LearningRate 0.0178   Epoch: 11   Global Step: 479730   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:36:54,580-Speed 2503.31 samples/sec   Loss 5.4972   LearningRate 0.0178   Epoch: 11   Global Step: 479740   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:36:58,659-Speed 2510.99 samples/sec   Loss 5.6338   LearningRate 0.0178   Epoch: 11   Global Step: 479750   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:37:02,749-Speed 2505.69 samples/sec   Loss 5.6542   LearningRate 0.0178   Epoch: 11   Global Step: 479760   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:37:06,640-Speed 2632.18 samples/sec   Loss 5.5903   LearningRate 0.0178   Epoch: 11   Global Step: 479770   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:37:10,539-Speed 2626.53 samples/sec   Loss 5.7011   LearningRate 0.0178   Epoch: 11   Global Step: 479780   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:37:14,465-Speed 2608.92 samples/sec   Loss 5.6228   LearningRate 0.0178   Epoch: 11   Global Step: 479790   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:37:18,364-Speed 2627.71 samples/sec   Loss 5.6237   LearningRate 0.0178   Epoch: 11   Global Step: 479800   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:37:22,302-Speed 2600.72 samples/sec   Loss 5.6420   LearningRate 0.0178   Epoch: 11   Global Step: 479810   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:37:26,252-Speed 2593.38 samples/sec   Loss 5.5395   LearningRate 0.0178   Epoch: 11   Global Step: 479820   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:30,147-Speed 2630.20 samples/sec   Loss 5.6331   LearningRate 0.0178   Epoch: 11   Global Step: 479830   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:34,046-Speed 2627.37 samples/sec   Loss 5.6838   LearningRate 0.0178   Epoch: 11   Global Step: 479840   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:37,954-Speed 2620.87 samples/sec   Loss 5.5336   LearningRate 0.0178   Epoch: 11   Global Step: 479850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:41,847-Speed 2631.07 samples/sec   Loss 5.6864   LearningRate 0.0178   Epoch: 11   Global Step: 479860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:45,740-Speed 2630.84 samples/sec   Loss 5.4816   LearningRate 0.0178   Epoch: 11   Global Step: 479870   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:49,646-Speed 2622.84 samples/sec   Loss 5.6327   LearningRate 0.0178   Epoch: 11   Global Step: 479880   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:53,546-Speed 2626.92 samples/sec   Loss 5.6435   LearningRate 0.0178   Epoch: 11   Global Step: 479890   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:37:57,449-Speed 2623.87 samples/sec   Loss 5.6561   LearningRate 0.0178   Epoch: 11   Global Step: 479900   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:01,346-Speed 2629.31 samples/sec   Loss 5.6588   LearningRate 0.0178   Epoch: 11   Global Step: 479910   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:05,252-Speed 2622.10 samples/sec   Loss 5.5899   LearningRate 0.0178   Epoch: 11   Global Step: 479920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:38:09,141-Speed 2633.26 samples/sec   Loss 5.7734   LearningRate 0.0178   Epoch: 11   Global Step: 479930   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:13,039-Speed 2627.52 samples/sec   Loss 5.5526   LearningRate 0.0178   Epoch: 11   Global Step: 479940   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:16,941-Speed 2625.50 samples/sec   Loss 5.6944   LearningRate 0.0178   Epoch: 11   Global Step: 479950   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:20,845-Speed 2623.28 samples/sec   Loss 5.6005   LearningRate 0.0178   Epoch: 11   Global Step: 479960   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:24,744-Speed 2627.26 samples/sec   Loss 5.5385   LearningRate 0.0178   Epoch: 11   Global Step: 479970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:28,682-Speed 2602.54 samples/sec   Loss 5.6290   LearningRate 0.0178   Epoch: 11   Global Step: 479980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:32,596-Speed 2616.62 samples/sec   Loss 5.6120   LearningRate 0.0178   Epoch: 11   Global Step: 479990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:38:36,495-Speed 2626.84 samples/sec   Loss 5.6175   LearningRate 0.0178   Epoch: 11   Global Step: 480000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:39:19,805-[lfw][480000]XNorm: 23.695356
Training: 2022-04-15 01:39:19,806-[lfw][480000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-15 01:39:19,807-[lfw][480000]Accuracy-Highest: 0.99800
Training: 2022-04-15 01:40:10,104-[cfp_fp][480000]XNorm: 21.673020
Training: 2022-04-15 01:40:10,105-[cfp_fp][480000]Accuracy-Flip: 0.98871+-0.00509
Training: 2022-04-15 01:40:10,106-[cfp_fp][480000]Accuracy-Highest: 0.98871
Training: 2022-04-15 01:40:53,437-[agedb_30][480000]XNorm: 23.584162
Training: 2022-04-15 01:40:53,438-[agedb_30][480000]Accuracy-Flip: 0.97783+-0.00646
Training: 2022-04-15 01:40:53,439-[agedb_30][480000]Accuracy-Highest: 0.97817
Training: 2022-04-15 01:40:57,322-Speed 72.71 samples/sec   Loss 5.6357   LearningRate 0.0178   Epoch: 11   Global Step: 480010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:01,201-Speed 2640.86 samples/sec   Loss 5.6220   LearningRate 0.0178   Epoch: 11   Global Step: 480020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:05,120-Speed 2614.03 samples/sec   Loss 5.6867   LearningRate 0.0178   Epoch: 11   Global Step: 480030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:41:08,998-Speed 2641.04 samples/sec   Loss 5.5934   LearningRate 0.0178   Epoch: 11   Global Step: 480040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:41:12,863-Speed 2650.03 samples/sec   Loss 5.5848   LearningRate 0.0178   Epoch: 11   Global Step: 480050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:16,756-Speed 2630.81 samples/sec   Loss 5.6194   LearningRate 0.0178   Epoch: 11   Global Step: 480060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:20,647-Speed 2633.58 samples/sec   Loss 5.5887   LearningRate 0.0178   Epoch: 11   Global Step: 480070   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:24,548-Speed 2626.02 samples/sec   Loss 5.6471   LearningRate 0.0177   Epoch: 11   Global Step: 480080   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:28,655-Speed 2494.53 samples/sec   Loss 5.6188   LearningRate 0.0177   Epoch: 11   Global Step: 480090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:32,545-Speed 2632.82 samples/sec   Loss 5.6034   LearningRate 0.0177   Epoch: 11   Global Step: 480100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:36,437-Speed 2631.79 samples/sec   Loss 5.6306   LearningRate 0.0177   Epoch: 11   Global Step: 480110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:40,333-Speed 2628.99 samples/sec   Loss 5.7347   LearningRate 0.0177   Epoch: 11   Global Step: 480120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:44,226-Speed 2631.64 samples/sec   Loss 5.6312   LearningRate 0.0177   Epoch: 11   Global Step: 480130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:48,121-Speed 2629.70 samples/sec   Loss 5.5611   LearningRate 0.0177   Epoch: 11   Global Step: 480140   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:52,003-Speed 2638.72 samples/sec   Loss 5.5058   LearningRate 0.0177   Epoch: 11   Global Step: 480150   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:55,898-Speed 2629.69 samples/sec   Loss 5.6475   LearningRate 0.0177   Epoch: 11   Global Step: 480160   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:41:59,812-Speed 2617.21 samples/sec   Loss 5.6446   LearningRate 0.0177   Epoch: 11   Global Step: 480170   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:03,732-Speed 2612.61 samples/sec   Loss 5.6060   LearningRate 0.0177   Epoch: 11   Global Step: 480180   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:07,649-Speed 2614.92 samples/sec   Loss 5.6572   LearningRate 0.0177   Epoch: 11   Global Step: 480190   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:11,556-Speed 2621.88 samples/sec   Loss 5.6809   LearningRate 0.0177   Epoch: 11   Global Step: 480200   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:15,448-Speed 2631.78 samples/sec   Loss 5.6580   LearningRate 0.0177   Epoch: 11   Global Step: 480210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:19,346-Speed 2627.82 samples/sec   Loss 5.7900   LearningRate 0.0177   Epoch: 11   Global Step: 480220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:23,282-Speed 2602.19 samples/sec   Loss 5.6393   LearningRate 0.0177   Epoch: 11   Global Step: 480230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:27,178-Speed 2628.59 samples/sec   Loss 5.6503   LearningRate 0.0177   Epoch: 11   Global Step: 480240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:42:31,076-Speed 2627.60 samples/sec   Loss 5.5864   LearningRate 0.0177   Epoch: 11   Global Step: 480250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:42:34,977-Speed 2626.24 samples/sec   Loss 5.6533   LearningRate 0.0177   Epoch: 11   Global Step: 480260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:42:38,872-Speed 2629.98 samples/sec   Loss 5.6155   LearningRate 0.0177   Epoch: 11   Global Step: 480270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:42:42,765-Speed 2630.52 samples/sec   Loss 5.4998   LearningRate 0.0177   Epoch: 11   Global Step: 480280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:42:46,661-Speed 2629.37 samples/sec   Loss 5.5599   LearningRate 0.0177   Epoch: 11   Global Step: 480290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:42:50,564-Speed 2623.59 samples/sec   Loss 5.6517   LearningRate 0.0177   Epoch: 11   Global Step: 480300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:42:54,462-Speed 2627.59 samples/sec   Loss 5.6121   LearningRate 0.0177   Epoch: 11   Global Step: 480310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:42:58,364-Speed 2625.22 samples/sec   Loss 5.7226   LearningRate 0.0177   Epoch: 11   Global Step: 480320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:02,261-Speed 2628.27 samples/sec   Loss 5.6286   LearningRate 0.0177   Epoch: 11   Global Step: 480330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:06,153-Speed 2631.93 samples/sec   Loss 5.5457   LearningRate 0.0177   Epoch: 11   Global Step: 480340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:10,153-Speed 2560.70 samples/sec   Loss 5.6011   LearningRate 0.0177   Epoch: 11   Global Step: 480350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:14,246-Speed 2502.16 samples/sec   Loss 5.6106   LearningRate 0.0177   Epoch: 11   Global Step: 480360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:18,177-Speed 2605.77 samples/sec   Loss 5.5501   LearningRate 0.0177   Epoch: 11   Global Step: 480370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:22,072-Speed 2629.95 samples/sec   Loss 5.5944   LearningRate 0.0177   Epoch: 11   Global Step: 480380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:25,965-Speed 2630.59 samples/sec   Loss 5.7030   LearningRate 0.0177   Epoch: 11   Global Step: 480390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:29,859-Speed 2630.30 samples/sec   Loss 5.7544   LearningRate 0.0177   Epoch: 11   Global Step: 480400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:33,816-Speed 2588.62 samples/sec   Loss 5.6543   LearningRate 0.0177   Epoch: 11   Global Step: 480410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:37,756-Speed 2599.57 samples/sec   Loss 5.5997   LearningRate 0.0177   Epoch: 11   Global Step: 480420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:41,660-Speed 2623.84 samples/sec   Loss 5.6053   LearningRate 0.0177   Epoch: 11   Global Step: 480430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:45,555-Speed 2629.92 samples/sec   Loss 5.6411   LearningRate 0.0177   Epoch: 11   Global Step: 480440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:49,454-Speed 2627.55 samples/sec   Loss 5.7055   LearningRate 0.0177   Epoch: 11   Global Step: 480450   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-04-15 01:43:53,345-Speed 2631.65 samples/sec   Loss 5.6944   LearningRate 0.0177   Epoch: 11   Global Step: 480460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:43:57,226-Speed 2639.56 samples/sec   Loss 5.6340   LearningRate 0.0177   Epoch: 11   Global Step: 480470   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:01,131-Speed 2622.52 samples/sec   Loss 5.6894   LearningRate 0.0177   Epoch: 11   Global Step: 480480   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:05,027-Speed 2629.45 samples/sec   Loss 5.5944   LearningRate 0.0177   Epoch: 11   Global Step: 480490   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:08,923-Speed 2629.40 samples/sec   Loss 5.7053   LearningRate 0.0177   Epoch: 11   Global Step: 480500   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:12,820-Speed 2628.43 samples/sec   Loss 5.6126   LearningRate 0.0177   Epoch: 11   Global Step: 480510   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:16,716-Speed 2628.53 samples/sec   Loss 5.6175   LearningRate 0.0177   Epoch: 11   Global Step: 480520   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:20,623-Speed 2621.63 samples/sec   Loss 5.5687   LearningRate 0.0177   Epoch: 11   Global Step: 480530   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:24,521-Speed 2627.77 samples/sec   Loss 5.6907   LearningRate 0.0177   Epoch: 11   Global Step: 480540   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:28,418-Speed 2627.83 samples/sec   Loss 5.5089   LearningRate 0.0177   Epoch: 11   Global Step: 480550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:32,313-Speed 2629.72 samples/sec   Loss 5.6299   LearningRate 0.0177   Epoch: 11   Global Step: 480560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:36,209-Speed 2629.53 samples/sec   Loss 5.6738   LearningRate 0.0177   Epoch: 11   Global Step: 480570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:44:40,108-Speed 2627.51 samples/sec   Loss 5.6292   LearningRate 0.0177   Epoch: 11   Global Step: 480580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:44:44,004-Speed 2629.08 samples/sec   Loss 5.5814   LearningRate 0.0177   Epoch: 11   Global Step: 480590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:44:47,906-Speed 2624.44 samples/sec   Loss 5.5119   LearningRate 0.0177   Epoch: 11   Global Step: 480600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:44:51,781-Speed 2643.76 samples/sec   Loss 5.6546   LearningRate 0.0177   Epoch: 11   Global Step: 480610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:55,680-Speed 2626.28 samples/sec   Loss 5.6354   LearningRate 0.0177   Epoch: 11   Global Step: 480620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:44:59,579-Speed 2627.11 samples/sec   Loss 5.6344   LearningRate 0.0177   Epoch: 11   Global Step: 480630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:03,475-Speed 2628.97 samples/sec   Loss 5.6942   LearningRate 0.0177   Epoch: 11   Global Step: 480640   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:07,372-Speed 2628.53 samples/sec   Loss 5.6799   LearningRate 0.0177   Epoch: 11   Global Step: 480650   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:11,270-Speed 2627.44 samples/sec   Loss 5.6442   LearningRate 0.0177   Epoch: 11   Global Step: 480660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:15,164-Speed 2630.26 samples/sec   Loss 5.5991   LearningRate 0.0177   Epoch: 11   Global Step: 480670   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:19,067-Speed 2624.45 samples/sec   Loss 5.6512   LearningRate 0.0177   Epoch: 11   Global Step: 480680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:22,960-Speed 2631.18 samples/sec   Loss 5.6910   LearningRate 0.0177   Epoch: 11   Global Step: 480690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:26,855-Speed 2629.71 samples/sec   Loss 5.6368   LearningRate 0.0177   Epoch: 11   Global Step: 480700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:45:30,752-Speed 2627.92 samples/sec   Loss 5.6473   LearningRate 0.0177   Epoch: 11   Global Step: 480710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:45:34,647-Speed 2629.84 samples/sec   Loss 5.5895   LearningRate 0.0177   Epoch: 11   Global Step: 480720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:45:38,545-Speed 2627.50 samples/sec   Loss 5.7025   LearningRate 0.0177   Epoch: 11   Global Step: 480730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:45:42,439-Speed 2629.89 samples/sec   Loss 5.6382   LearningRate 0.0177   Epoch: 11   Global Step: 480740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:45:46,434-Speed 2564.13 samples/sec   Loss 5.6211   LearningRate 0.0177   Epoch: 11   Global Step: 480750   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:45:50,352-Speed 2614.67 samples/sec   Loss 5.6250   LearningRate 0.0177   Epoch: 11   Global Step: 480760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:45:54,318-Speed 2582.12 samples/sec   Loss 5.6441   LearningRate 0.0177   Epoch: 11   Global Step: 480770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:45:58,223-Speed 2622.90 samples/sec   Loss 5.6297   LearningRate 0.0177   Epoch: 11   Global Step: 480780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:46:02,123-Speed 2626.19 samples/sec   Loss 5.6626   LearningRate 0.0177   Epoch: 11   Global Step: 480790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:46:06,025-Speed 2625.14 samples/sec   Loss 5.5710   LearningRate 0.0177   Epoch: 11   Global Step: 480800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:46:09,896-Speed 2645.66 samples/sec   Loss 5.5725   LearningRate 0.0177   Epoch: 11   Global Step: 480810   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:46:13,809-Speed 2617.85 samples/sec   Loss 5.5085   LearningRate 0.0177   Epoch: 11   Global Step: 480820   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:46:17,716-Speed 2620.87 samples/sec   Loss 5.5886   LearningRate 0.0177   Epoch: 11   Global Step: 480830   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:46:21,612-Speed 2629.15 samples/sec   Loss 5.5902   LearningRate 0.0177   Epoch: 11   Global Step: 480840   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:46:25,508-Speed 2629.54 samples/sec   Loss 5.6183   LearningRate 0.0177   Epoch: 11   Global Step: 480850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:46:29,407-Speed 2627.22 samples/sec   Loss 5.5859   LearningRate 0.0177   Epoch: 11   Global Step: 480860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:46:33,310-Speed 2624.11 samples/sec   Loss 5.5472   LearningRate 0.0177   Epoch: 11   Global Step: 480870   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:46:37,184-Speed 2643.69 samples/sec   Loss 5.5759   LearningRate 0.0177   Epoch: 11   Global Step: 480880   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:46:41,082-Speed 2627.37 samples/sec   Loss 5.5325   LearningRate 0.0177   Epoch: 11   Global Step: 480890   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:46:44,977-Speed 2629.46 samples/sec   Loss 5.7653   LearningRate 0.0177   Epoch: 11   Global Step: 480900   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:46:48,874-Speed 2628.31 samples/sec   Loss 5.5540   LearningRate 0.0177   Epoch: 11   Global Step: 480910   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:46:52,776-Speed 2625.41 samples/sec   Loss 5.5510   LearningRate 0.0177   Epoch: 11   Global Step: 480920   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:46:56,668-Speed 2631.03 samples/sec   Loss 5.5981   LearningRate 0.0177   Epoch: 11   Global Step: 480930   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:47:00,567-Speed 2627.10 samples/sec   Loss 5.5211   LearningRate 0.0177   Epoch: 11   Global Step: 480940   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:47:04,466-Speed 2627.50 samples/sec   Loss 5.6030   LearningRate 0.0177   Epoch: 11   Global Step: 480950   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:47:08,368-Speed 2624.99 samples/sec   Loss 5.5325   LearningRate 0.0177   Epoch: 11   Global Step: 480960   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:47:12,272-Speed 2623.49 samples/sec   Loss 5.6298   LearningRate 0.0177   Epoch: 11   Global Step: 480970   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:47:16,166-Speed 2630.11 samples/sec   Loss 5.4718   LearningRate 0.0177   Epoch: 11   Global Step: 480980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:20,060-Speed 2630.26 samples/sec   Loss 5.5389   LearningRate 0.0177   Epoch: 11   Global Step: 480990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:23,958-Speed 2627.66 samples/sec   Loss 5.6495   LearningRate 0.0177   Epoch: 11   Global Step: 481000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:27,858-Speed 2626.18 samples/sec   Loss 5.5880   LearningRate 0.0177   Epoch: 11   Global Step: 481010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:31,762-Speed 2623.24 samples/sec   Loss 5.6579   LearningRate 0.0177   Epoch: 11   Global Step: 481020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:35,661-Speed 2626.92 samples/sec   Loss 5.6158   LearningRate 0.0177   Epoch: 11   Global Step: 481030   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:39,555-Speed 2630.48 samples/sec   Loss 5.6175   LearningRate 0.0177   Epoch: 11   Global Step: 481040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:43,449-Speed 2630.29 samples/sec   Loss 5.5596   LearningRate 0.0177   Epoch: 11   Global Step: 481050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:47,347-Speed 2628.15 samples/sec   Loss 5.7112   LearningRate 0.0176   Epoch: 11   Global Step: 481060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:51,241-Speed 2630.00 samples/sec   Loss 5.6982   LearningRate 0.0176   Epoch: 11   Global Step: 481070   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:55,316-Speed 2513.53 samples/sec   Loss 5.6389   LearningRate 0.0176   Epoch: 11   Global Step: 481080   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:47:59,390-Speed 2514.53 samples/sec   Loss 5.5971   LearningRate 0.0176   Epoch: 11   Global Step: 481090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:48:03,400-Speed 2553.78 samples/sec   Loss 5.6454   LearningRate 0.0176   Epoch: 11   Global Step: 481100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:48:07,304-Speed 2623.21 samples/sec   Loss 5.6942   LearningRate 0.0176   Epoch: 11   Global Step: 481110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:48:11,202-Speed 2627.61 samples/sec   Loss 5.5544   LearningRate 0.0176   Epoch: 11   Global Step: 481120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:48:15,099-Speed 2628.50 samples/sec   Loss 5.7402   LearningRate 0.0176   Epoch: 11   Global Step: 481130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:48:19,014-Speed 2616.19 samples/sec   Loss 5.6342   LearningRate 0.0176   Epoch: 11   Global Step: 481140   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:48:22,899-Speed 2636.78 samples/sec   Loss 5.5986   LearningRate 0.0176   Epoch: 11   Global Step: 481150   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:26,844-Speed 2596.31 samples/sec   Loss 5.6054   LearningRate 0.0176   Epoch: 11   Global Step: 481160   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:30,835-Speed 2566.51 samples/sec   Loss 5.5938   LearningRate 0.0176   Epoch: 11   Global Step: 481170   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:34,735-Speed 2626.48 samples/sec   Loss 5.6175   LearningRate 0.0176   Epoch: 11   Global Step: 481180   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:38,625-Speed 2632.66 samples/sec   Loss 5.6657   LearningRate 0.0176   Epoch: 11   Global Step: 481190   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:42,526-Speed 2626.00 samples/sec   Loss 5.5979   LearningRate 0.0176   Epoch: 11   Global Step: 481200   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:46,427-Speed 2625.17 samples/sec   Loss 5.6198   LearningRate 0.0176   Epoch: 11   Global Step: 481210   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:50,321-Speed 2631.12 samples/sec   Loss 5.5184   LearningRate 0.0176   Epoch: 11   Global Step: 481220   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:54,223-Speed 2624.45 samples/sec   Loss 5.6829   LearningRate 0.0176   Epoch: 11   Global Step: 481230   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:48:58,119-Speed 2629.12 samples/sec   Loss 5.6362   LearningRate 0.0176   Epoch: 11   Global Step: 481240   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:49:02,069-Speed 2593.25 samples/sec   Loss 5.5744   LearningRate 0.0176   Epoch: 11   Global Step: 481250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:05,966-Speed 2628.34 samples/sec   Loss 5.4616   LearningRate 0.0176   Epoch: 11   Global Step: 481260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:09,870-Speed 2623.63 samples/sec   Loss 5.5642   LearningRate 0.0176   Epoch: 11   Global Step: 481270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:13,776-Speed 2622.51 samples/sec   Loss 5.6004   LearningRate 0.0176   Epoch: 11   Global Step: 481280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:17,699-Speed 2611.38 samples/sec   Loss 5.6287   LearningRate 0.0176   Epoch: 11   Global Step: 481290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:21,595-Speed 2628.97 samples/sec   Loss 5.6470   LearningRate 0.0176   Epoch: 11   Global Step: 481300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:25,487-Speed 2631.93 samples/sec   Loss 5.6451   LearningRate 0.0176   Epoch: 11   Global Step: 481310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:29,379-Speed 2631.42 samples/sec   Loss 5.4520   LearningRate 0.0176   Epoch: 11   Global Step: 481320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:33,275-Speed 2629.19 samples/sec   Loss 5.6610   LearningRate 0.0176   Epoch: 11   Global Step: 481330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:37,176-Speed 2626.16 samples/sec   Loss 5.5598   LearningRate 0.0176   Epoch: 11   Global Step: 481340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:49:41,075-Speed 2627.38 samples/sec   Loss 5.5069   LearningRate 0.0176   Epoch: 11   Global Step: 481350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:49:44,975-Speed 2625.71 samples/sec   Loss 5.5985   LearningRate 0.0176   Epoch: 11   Global Step: 481360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:49:48,899-Speed 2610.49 samples/sec   Loss 5.5444   LearningRate 0.0176   Epoch: 11   Global Step: 481370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:49:52,794-Speed 2629.88 samples/sec   Loss 5.6861   LearningRate 0.0176   Epoch: 11   Global Step: 481380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:49:56,688-Speed 2630.43 samples/sec   Loss 5.7032   LearningRate 0.0176   Epoch: 11   Global Step: 481390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:00,584-Speed 2628.78 samples/sec   Loss 5.7111   LearningRate 0.0176   Epoch: 11   Global Step: 481400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:04,485-Speed 2625.43 samples/sec   Loss 5.6161   LearningRate 0.0176   Epoch: 11   Global Step: 481410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:08,382-Speed 2628.45 samples/sec   Loss 5.6183   LearningRate 0.0176   Epoch: 11   Global Step: 481420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:12,287-Speed 2623.38 samples/sec   Loss 5.6859   LearningRate 0.0176   Epoch: 11   Global Step: 481430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:16,185-Speed 2627.27 samples/sec   Loss 5.6152   LearningRate 0.0176   Epoch: 11   Global Step: 481440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:20,068-Speed 2638.88 samples/sec   Loss 5.5472   LearningRate 0.0176   Epoch: 11   Global Step: 481450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:23,972-Speed 2623.59 samples/sec   Loss 5.7433   LearningRate 0.0176   Epoch: 11   Global Step: 481460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:27,873-Speed 2625.09 samples/sec   Loss 5.6105   LearningRate 0.0176   Epoch: 11   Global Step: 481470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:50:31,752-Speed 2640.47 samples/sec   Loss 5.5894   LearningRate 0.0176   Epoch: 11   Global Step: 481480   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:50:35,674-Speed 2611.70 samples/sec   Loss 5.6967   LearningRate 0.0176   Epoch: 11   Global Step: 481490   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:50:39,576-Speed 2624.78 samples/sec   Loss 5.6296   LearningRate 0.0176   Epoch: 11   Global Step: 481500   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:50:43,494-Speed 2614.96 samples/sec   Loss 5.6362   LearningRate 0.0176   Epoch: 11   Global Step: 481510   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:50:47,391-Speed 2628.69 samples/sec   Loss 5.5243   LearningRate 0.0176   Epoch: 11   Global Step: 481520   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:50:51,292-Speed 2624.93 samples/sec   Loss 5.5814   LearningRate 0.0176   Epoch: 11   Global Step: 481530   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:50:55,186-Speed 2630.86 samples/sec   Loss 5.6353   LearningRate 0.0176   Epoch: 11   Global Step: 481540   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:50:59,082-Speed 2628.76 samples/sec   Loss 5.5651   LearningRate 0.0176   Epoch: 11   Global Step: 481550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:02,977-Speed 2629.82 samples/sec   Loss 5.6792   LearningRate 0.0176   Epoch: 11   Global Step: 481560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:06,941-Speed 2583.16 samples/sec   Loss 5.6114   LearningRate 0.0176   Epoch: 11   Global Step: 481570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:10,847-Speed 2623.06 samples/sec   Loss 5.6613   LearningRate 0.0176   Epoch: 11   Global Step: 481580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:51:14,747-Speed 2626.03 samples/sec   Loss 5.6280   LearningRate 0.0176   Epoch: 11   Global Step: 481590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:51:18,643-Speed 2629.66 samples/sec   Loss 5.5632   LearningRate 0.0176   Epoch: 11   Global Step: 481600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:51:22,538-Speed 2629.55 samples/sec   Loss 5.5891   LearningRate 0.0176   Epoch: 11   Global Step: 481610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:51:26,415-Speed 2641.75 samples/sec   Loss 5.7833   LearningRate 0.0176   Epoch: 11   Global Step: 481620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:30,317-Speed 2626.08 samples/sec   Loss 5.6497   LearningRate 0.0176   Epoch: 11   Global Step: 481630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:34,215-Speed 2627.47 samples/sec   Loss 5.6793   LearningRate 0.0176   Epoch: 11   Global Step: 481640   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:38,117-Speed 2624.52 samples/sec   Loss 5.6749   LearningRate 0.0176   Epoch: 11   Global Step: 481650   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:42,010-Speed 2630.68 samples/sec   Loss 5.5883   LearningRate 0.0176   Epoch: 11   Global Step: 481660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:45,902-Speed 2632.26 samples/sec   Loss 5.6000   LearningRate 0.0176   Epoch: 11   Global Step: 481670   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:49,801-Speed 2627.01 samples/sec   Loss 5.6531   LearningRate 0.0176   Epoch: 11   Global Step: 481680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:53,709-Speed 2621.15 samples/sec   Loss 5.6053   LearningRate 0.0176   Epoch: 11   Global Step: 481690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:51:57,618-Speed 2620.45 samples/sec   Loss 5.6726   LearningRate 0.0176   Epoch: 11   Global Step: 481700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:01,518-Speed 2626.47 samples/sec   Loss 5.6677   LearningRate 0.0176   Epoch: 11   Global Step: 481710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:05,423-Speed 2622.73 samples/sec   Loss 5.5967   LearningRate 0.0176   Epoch: 11   Global Step: 481720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:52:09,338-Speed 2616.19 samples/sec   Loss 5.6358   LearningRate 0.0176   Epoch: 11   Global Step: 481730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:52:13,251-Speed 2617.84 samples/sec   Loss 5.5994   LearningRate 0.0176   Epoch: 11   Global Step: 481740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:52:17,146-Speed 2629.55 samples/sec   Loss 5.6576   LearningRate 0.0176   Epoch: 11   Global Step: 481750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:21,050-Speed 2624.38 samples/sec   Loss 5.6371   LearningRate 0.0176   Epoch: 11   Global Step: 481760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:24,968-Speed 2613.97 samples/sec   Loss 5.5570   LearningRate 0.0176   Epoch: 11   Global Step: 481770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:28,866-Speed 2628.28 samples/sec   Loss 5.4844   LearningRate 0.0176   Epoch: 11   Global Step: 481780   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:32,773-Speed 2621.38 samples/sec   Loss 5.6342   LearningRate 0.0176   Epoch: 11   Global Step: 481790   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:36,689-Speed 2615.77 samples/sec   Loss 5.6436   LearningRate 0.0176   Epoch: 11   Global Step: 481800   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:40,589-Speed 2626.44 samples/sec   Loss 5.6354   LearningRate 0.0176   Epoch: 11   Global Step: 481810   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:44,489-Speed 2626.73 samples/sec   Loss 5.5659   LearningRate 0.0176   Epoch: 11   Global Step: 481820   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:48,385-Speed 2628.63 samples/sec   Loss 5.6300   LearningRate 0.0176   Epoch: 11   Global Step: 481830   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:52,275-Speed 2633.25 samples/sec   Loss 5.6124   LearningRate 0.0176   Epoch: 11   Global Step: 481840   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:52:56,188-Speed 2617.11 samples/sec   Loss 5.6215   LearningRate 0.0176   Epoch: 11   Global Step: 481850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:00,089-Speed 2625.81 samples/sec   Loss 5.5952   LearningRate 0.0176   Epoch: 11   Global Step: 481860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:03,990-Speed 2625.55 samples/sec   Loss 5.6793   LearningRate 0.0176   Epoch: 11   Global Step: 481870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:07,889-Speed 2627.26 samples/sec   Loss 5.6480   LearningRate 0.0176   Epoch: 11   Global Step: 481880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:11,784-Speed 2629.43 samples/sec   Loss 5.5698   LearningRate 0.0176   Epoch: 11   Global Step: 481890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:15,677-Speed 2630.98 samples/sec   Loss 5.5982   LearningRate 0.0176   Epoch: 11   Global Step: 481900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:19,587-Speed 2619.68 samples/sec   Loss 5.5264   LearningRate 0.0176   Epoch: 11   Global Step: 481910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:23,479-Speed 2631.67 samples/sec   Loss 5.5049   LearningRate 0.0176   Epoch: 11   Global Step: 481920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:27,372-Speed 2630.64 samples/sec   Loss 5.5658   LearningRate 0.0176   Epoch: 11   Global Step: 481930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:53:31,245-Speed 2644.74 samples/sec   Loss 5.6941   LearningRate 0.0176   Epoch: 11   Global Step: 481940   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:53:35,140-Speed 2629.95 samples/sec   Loss 5.5728   LearningRate 0.0176   Epoch: 11   Global Step: 481950   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:53:39,040-Speed 2626.02 samples/sec   Loss 5.7393   LearningRate 0.0176   Epoch: 11   Global Step: 481960   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:53:42,952-Speed 2618.27 samples/sec   Loss 5.5713   LearningRate 0.0176   Epoch: 11   Global Step: 481970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:53:46,850-Speed 2627.81 samples/sec   Loss 5.5820   LearningRate 0.0176   Epoch: 11   Global Step: 481980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:53:50,746-Speed 2628.98 samples/sec   Loss 5.6500   LearningRate 0.0176   Epoch: 11   Global Step: 481990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:53:54,646-Speed 2626.05 samples/sec   Loss 5.5966   LearningRate 0.0176   Epoch: 11   Global Step: 482000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:53:58,542-Speed 2628.67 samples/sec   Loss 5.5991   LearningRate 0.0176   Epoch: 11   Global Step: 482010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:02,444-Speed 2624.76 samples/sec   Loss 5.6328   LearningRate 0.0176   Epoch: 11   Global Step: 482020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:06,345-Speed 2625.42 samples/sec   Loss 5.6541   LearningRate 0.0176   Epoch: 11   Global Step: 482030   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:10,244-Speed 2627.30 samples/sec   Loss 5.6489   LearningRate 0.0176   Epoch: 11   Global Step: 482040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:54:14,147-Speed 2624.51 samples/sec   Loss 5.6101   LearningRate 0.0175   Epoch: 11   Global Step: 482050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:18,046-Speed 2627.04 samples/sec   Loss 5.5586   LearningRate 0.0175   Epoch: 11   Global Step: 482060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:21,943-Speed 2628.12 samples/sec   Loss 5.6199   LearningRate 0.0175   Epoch: 11   Global Step: 482070   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:25,855-Speed 2618.18 samples/sec   Loss 5.6657   LearningRate 0.0175   Epoch: 11   Global Step: 482080   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:29,751-Speed 2629.10 samples/sec   Loss 5.6245   LearningRate 0.0175   Epoch: 11   Global Step: 482090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:33,660-Speed 2620.02 samples/sec   Loss 5.5983   LearningRate 0.0175   Epoch: 11   Global Step: 482100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:37,560-Speed 2626.38 samples/sec   Loss 5.5157   LearningRate 0.0175   Epoch: 11   Global Step: 482110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:41,459-Speed 2627.08 samples/sec   Loss 5.5702   LearningRate 0.0175   Epoch: 11   Global Step: 482120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:45,376-Speed 2615.39 samples/sec   Loss 5.5813   LearningRate 0.0175   Epoch: 11   Global Step: 482130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:49,278-Speed 2624.43 samples/sec   Loss 5.5426   LearningRate 0.0175   Epoch: 11   Global Step: 482140   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:54:53,188-Speed 2620.06 samples/sec   Loss 5.6190   LearningRate 0.0175   Epoch: 11   Global Step: 482150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:54:57,086-Speed 2627.18 samples/sec   Loss 5.6569   LearningRate 0.0175   Epoch: 11   Global Step: 482160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:00,985-Speed 2626.65 samples/sec   Loss 5.6390   LearningRate 0.0175   Epoch: 11   Global Step: 482170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:04,894-Speed 2620.47 samples/sec   Loss 5.6574   LearningRate 0.0175   Epoch: 11   Global Step: 482180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:08,794-Speed 2626.35 samples/sec   Loss 5.6386   LearningRate 0.0175   Epoch: 11   Global Step: 482190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:12,699-Speed 2622.52 samples/sec   Loss 5.5294   LearningRate 0.0175   Epoch: 11   Global Step: 482200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:16,598-Speed 2627.21 samples/sec   Loss 5.6603   LearningRate 0.0175   Epoch: 11   Global Step: 482210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:20,500-Speed 2625.34 samples/sec   Loss 5.6740   LearningRate 0.0175   Epoch: 11   Global Step: 482220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:24,397-Speed 2628.27 samples/sec   Loss 5.5843   LearningRate 0.0175   Epoch: 11   Global Step: 482230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:28,295-Speed 2627.50 samples/sec   Loss 5.5970   LearningRate 0.0175   Epoch: 11   Global Step: 482240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:32,172-Speed 2642.01 samples/sec   Loss 5.5854   LearningRate 0.0175   Epoch: 11   Global Step: 482250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:36,069-Speed 2627.52 samples/sec   Loss 5.5269   LearningRate 0.0175   Epoch: 11   Global Step: 482260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:55:39,945-Speed 2642.71 samples/sec   Loss 5.6583   LearningRate 0.0175   Epoch: 11   Global Step: 482270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:55:43,851-Speed 2622.53 samples/sec   Loss 5.6299   LearningRate 0.0175   Epoch: 11   Global Step: 482280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:55:47,751-Speed 2626.29 samples/sec   Loss 5.6014   LearningRate 0.0175   Epoch: 11   Global Step: 482290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:55:51,672-Speed 2611.74 samples/sec   Loss 5.6029   LearningRate 0.0175   Epoch: 11   Global Step: 482300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:55:55,576-Speed 2623.81 samples/sec   Loss 5.5850   LearningRate 0.0175   Epoch: 11   Global Step: 482310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:55:59,474-Speed 2627.79 samples/sec   Loss 5.6781   LearningRate 0.0175   Epoch: 11   Global Step: 482320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:03,367-Speed 2630.82 samples/sec   Loss 5.5731   LearningRate 0.0175   Epoch: 11   Global Step: 482330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:07,269-Speed 2624.95 samples/sec   Loss 5.4986   LearningRate 0.0175   Epoch: 11   Global Step: 482340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:11,161-Speed 2631.58 samples/sec   Loss 5.5408   LearningRate 0.0175   Epoch: 11   Global Step: 482350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:15,057-Speed 2630.04 samples/sec   Loss 5.5527   LearningRate 0.0175   Epoch: 11   Global Step: 482360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:18,970-Speed 2617.40 samples/sec   Loss 5.5836   LearningRate 0.0175   Epoch: 11   Global Step: 482370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:56:22,870-Speed 2628.47 samples/sec   Loss 5.6106   LearningRate 0.0175   Epoch: 11   Global Step: 482380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:56:26,765-Speed 2629.42 samples/sec   Loss 5.6591   LearningRate 0.0175   Epoch: 11   Global Step: 482390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:56:30,656-Speed 2632.58 samples/sec   Loss 5.4983   LearningRate 0.0175   Epoch: 11   Global Step: 482400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:56:34,549-Speed 2630.40 samples/sec   Loss 5.6141   LearningRate 0.0175   Epoch: 11   Global Step: 482410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:56:38,454-Speed 2623.42 samples/sec   Loss 5.7152   LearningRate 0.0175   Epoch: 11   Global Step: 482420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:56:42,360-Speed 2622.20 samples/sec   Loss 5.6970   LearningRate 0.0175   Epoch: 11   Global Step: 482430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:56:46,243-Speed 2637.85 samples/sec   Loss 5.5728   LearningRate 0.0175   Epoch: 11   Global Step: 482440   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:50,139-Speed 2629.12 samples/sec   Loss 5.6785   LearningRate 0.0175   Epoch: 11   Global Step: 482450   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:54,030-Speed 2631.93 samples/sec   Loss 5.5893   LearningRate 0.0175   Epoch: 11   Global Step: 482460   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:56:57,927-Speed 2628.32 samples/sec   Loss 5.6658   LearningRate 0.0175   Epoch: 11   Global Step: 482470   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:57:01,828-Speed 2625.51 samples/sec   Loss 5.5688   LearningRate 0.0175   Epoch: 11   Global Step: 482480   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:57:05,747-Speed 2613.64 samples/sec   Loss 5.6376   LearningRate 0.0175   Epoch: 11   Global Step: 482490   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:57:09,655-Speed 2620.97 samples/sec   Loss 5.6874   LearningRate 0.0175   Epoch: 11   Global Step: 482500   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:57:13,550-Speed 2629.08 samples/sec   Loss 5.5869   LearningRate 0.0175   Epoch: 11   Global Step: 482510   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:57:17,441-Speed 2632.40 samples/sec   Loss 5.5306   LearningRate 0.0175   Epoch: 11   Global Step: 482520   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:57:21,334-Speed 2631.16 samples/sec   Loss 5.5928   LearningRate 0.0175   Epoch: 11   Global Step: 482530   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:57:25,235-Speed 2625.83 samples/sec   Loss 5.5314   LearningRate 0.0175   Epoch: 11   Global Step: 482540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:29,144-Speed 2620.46 samples/sec   Loss 5.6754   LearningRate 0.0175   Epoch: 11   Global Step: 482550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:33,036-Speed 2631.43 samples/sec   Loss 5.4417   LearningRate 0.0175   Epoch: 11   Global Step: 482560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:36,942-Speed 2621.79 samples/sec   Loss 5.6547   LearningRate 0.0175   Epoch: 11   Global Step: 482570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:40,844-Speed 2625.02 samples/sec   Loss 5.5135   LearningRate 0.0175   Epoch: 11   Global Step: 482580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:44,751-Speed 2621.67 samples/sec   Loss 5.6332   LearningRate 0.0175   Epoch: 11   Global Step: 482590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:48,642-Speed 2632.02 samples/sec   Loss 5.5470   LearningRate 0.0175   Epoch: 11   Global Step: 482600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:52,540-Speed 2628.33 samples/sec   Loss 5.5883   LearningRate 0.0175   Epoch: 11   Global Step: 482610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:57:56,426-Speed 2635.31 samples/sec   Loss 5.7168   LearningRate 0.0175   Epoch: 11   Global Step: 482620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:00,384-Speed 2588.16 samples/sec   Loss 5.5755   LearningRate 0.0175   Epoch: 11   Global Step: 482630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:04,280-Speed 2628.77 samples/sec   Loss 5.5551   LearningRate 0.0175   Epoch: 11   Global Step: 482640   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:08,176-Speed 2629.28 samples/sec   Loss 5.5583   LearningRate 0.0175   Epoch: 11   Global Step: 482650   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:12,080-Speed 2623.38 samples/sec   Loss 5.7197   LearningRate 0.0175   Epoch: 11   Global Step: 482660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:15,979-Speed 2626.80 samples/sec   Loss 5.6220   LearningRate 0.0175   Epoch: 11   Global Step: 482670   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:19,875-Speed 2628.97 samples/sec   Loss 5.6605   LearningRate 0.0175   Epoch: 11   Global Step: 482680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:23,769-Speed 2630.05 samples/sec   Loss 5.5686   LearningRate 0.0175   Epoch: 11   Global Step: 482690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:27,666-Speed 2628.61 samples/sec   Loss 5.5458   LearningRate 0.0175   Epoch: 11   Global Step: 482700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:31,568-Speed 2625.52 samples/sec   Loss 5.5700   LearningRate 0.0175   Epoch: 11   Global Step: 482710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:58:35,467-Speed 2626.84 samples/sec   Loss 5.5596   LearningRate 0.0175   Epoch: 11   Global Step: 482720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:58:39,390-Speed 2610.62 samples/sec   Loss 5.5459   LearningRate 0.0175   Epoch: 11   Global Step: 482730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:58:43,291-Speed 2625.22 samples/sec   Loss 5.5176   LearningRate 0.0175   Epoch: 11   Global Step: 482740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:58:47,202-Speed 2618.91 samples/sec   Loss 5.5275   LearningRate 0.0175   Epoch: 11   Global Step: 482750   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:58:51,095-Speed 2631.19 samples/sec   Loss 5.5285   LearningRate 0.0175   Epoch: 11   Global Step: 482760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:58:55,009-Speed 2616.46 samples/sec   Loss 5.6333   LearningRate 0.0175   Epoch: 11   Global Step: 482770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:58:58,918-Speed 2620.28 samples/sec   Loss 5.5626   LearningRate 0.0175   Epoch: 11   Global Step: 482780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 01:59:02,799-Speed 2639.34 samples/sec   Loss 5.6429   LearningRate 0.0175   Epoch: 11   Global Step: 482790   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:59:06,702-Speed 2623.65 samples/sec   Loss 5.6022   LearningRate 0.0175   Epoch: 11   Global Step: 482800   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:59:10,601-Speed 2627.23 samples/sec   Loss 5.5646   LearningRate 0.0175   Epoch: 11   Global Step: 482810   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:59:14,482-Speed 2639.64 samples/sec   Loss 5.6400   LearningRate 0.0175   Epoch: 11   Global Step: 482820   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:18,381-Speed 2626.70 samples/sec   Loss 5.5667   LearningRate 0.0175   Epoch: 11   Global Step: 482830   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:22,278-Speed 2628.24 samples/sec   Loss 5.6620   LearningRate 0.0175   Epoch: 11   Global Step: 482840   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:26,172-Speed 2630.08 samples/sec   Loss 5.7447   LearningRate 0.0175   Epoch: 11   Global Step: 482850   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:30,094-Speed 2611.72 samples/sec   Loss 5.6610   LearningRate 0.0175   Epoch: 11   Global Step: 482860   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:33,993-Speed 2626.61 samples/sec   Loss 5.4706   LearningRate 0.0175   Epoch: 11   Global Step: 482870   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:37,883-Speed 2633.44 samples/sec   Loss 5.6417   LearningRate 0.0175   Epoch: 11   Global Step: 482880   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:41,792-Speed 2619.50 samples/sec   Loss 5.6196   LearningRate 0.0175   Epoch: 11   Global Step: 482890   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:45,684-Speed 2632.05 samples/sec   Loss 5.5789   LearningRate 0.0175   Epoch: 11   Global Step: 482900   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:49,582-Speed 2627.82 samples/sec   Loss 5.5926   LearningRate 0.0175   Epoch: 11   Global Step: 482910   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 01:59:53,478-Speed 2629.29 samples/sec   Loss 5.7569   LearningRate 0.0175   Epoch: 11   Global Step: 482920   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 01:59:57,376-Speed 2627.15 samples/sec   Loss 5.5851   LearningRate 0.0175   Epoch: 11   Global Step: 482930   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:01,273-Speed 2629.22 samples/sec   Loss 5.5947   LearningRate 0.0175   Epoch: 11   Global Step: 482940   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:05,171-Speed 2627.16 samples/sec   Loss 5.5005   LearningRate 0.0175   Epoch: 11   Global Step: 482950   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:09,072-Speed 2625.68 samples/sec   Loss 5.5083   LearningRate 0.0175   Epoch: 11   Global Step: 482960   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:12,982-Speed 2619.10 samples/sec   Loss 5.5184   LearningRate 0.0175   Epoch: 11   Global Step: 482970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:16,880-Speed 2628.38 samples/sec   Loss 5.7139   LearningRate 0.0175   Epoch: 11   Global Step: 482980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:20,780-Speed 2626.48 samples/sec   Loss 5.5571   LearningRate 0.0175   Epoch: 11   Global Step: 482990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:24,677-Speed 2628.16 samples/sec   Loss 5.6844   LearningRate 0.0175   Epoch: 11   Global Step: 483000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:28,576-Speed 2627.33 samples/sec   Loss 5.5772   LearningRate 0.0175   Epoch: 11   Global Step: 483010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:32,479-Speed 2623.90 samples/sec   Loss 5.5322   LearningRate 0.0175   Epoch: 11   Global Step: 483020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:00:36,377-Speed 2628.83 samples/sec   Loss 5.5886   LearningRate 0.0175   Epoch: 11   Global Step: 483030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:00:40,271-Speed 2630.04 samples/sec   Loss 5.5621   LearningRate 0.0174   Epoch: 11   Global Step: 483040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:00:44,170-Speed 2627.06 samples/sec   Loss 5.5845   LearningRate 0.0174   Epoch: 11   Global Step: 483050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:00:48,064-Speed 2630.37 samples/sec   Loss 5.6206   LearningRate 0.0174   Epoch: 11   Global Step: 483060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:00:51,958-Speed 2630.30 samples/sec   Loss 5.5485   LearningRate 0.0174   Epoch: 11   Global Step: 483070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:00:55,846-Speed 2634.49 samples/sec   Loss 5.5911   LearningRate 0.0174   Epoch: 11   Global Step: 483080   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:00:59,739-Speed 2630.95 samples/sec   Loss 5.6154   LearningRate 0.0174   Epoch: 11   Global Step: 483090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:03,632-Speed 2630.39 samples/sec   Loss 5.6365   LearningRate 0.0174   Epoch: 11   Global Step: 483100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:07,529-Speed 2628.91 samples/sec   Loss 5.5172   LearningRate 0.0174   Epoch: 11   Global Step: 483110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:11,421-Speed 2632.09 samples/sec   Loss 5.5846   LearningRate 0.0174   Epoch: 11   Global Step: 483120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:15,318-Speed 2628.09 samples/sec   Loss 5.5760   LearningRate 0.0174   Epoch: 11   Global Step: 483130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:19,215-Speed 2628.21 samples/sec   Loss 5.6449   LearningRate 0.0174   Epoch: 11   Global Step: 483140   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:23,114-Speed 2627.03 samples/sec   Loss 5.5396   LearningRate 0.0174   Epoch: 11   Global Step: 483150   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:27,008-Speed 2629.97 samples/sec   Loss 5.6251   LearningRate 0.0174   Epoch: 11   Global Step: 483160   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:30,908-Speed 2626.23 samples/sec   Loss 5.5296   LearningRate 0.0174   Epoch: 11   Global Step: 483170   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:01:34,803-Speed 2629.37 samples/sec   Loss 5.5998   LearningRate 0.0174   Epoch: 11   Global Step: 483180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:01:38,699-Speed 2629.34 samples/sec   Loss 5.5258   LearningRate 0.0174   Epoch: 11   Global Step: 483190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:01:42,599-Speed 2626.23 samples/sec   Loss 5.5144   LearningRate 0.0174   Epoch: 11   Global Step: 483200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:01:46,491-Speed 2632.32 samples/sec   Loss 5.5949   LearningRate 0.0174   Epoch: 11   Global Step: 483210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:01:50,398-Speed 2621.74 samples/sec   Loss 5.5447   LearningRate 0.0174   Epoch: 11   Global Step: 483220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:01:54,294-Speed 2629.17 samples/sec   Loss 5.5302   LearningRate 0.0174   Epoch: 11   Global Step: 483230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:01:58,179-Speed 2635.78 samples/sec   Loss 5.4659   LearningRate 0.0174   Epoch: 11   Global Step: 483240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:02,107-Speed 2607.34 samples/sec   Loss 5.7172   LearningRate 0.0174   Epoch: 11   Global Step: 483250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:05,999-Speed 2631.85 samples/sec   Loss 5.5473   LearningRate 0.0174   Epoch: 11   Global Step: 483260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:09,894-Speed 2632.34 samples/sec   Loss 5.5860   LearningRate 0.0174   Epoch: 11   Global Step: 483270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:13,798-Speed 2623.55 samples/sec   Loss 5.5987   LearningRate 0.0174   Epoch: 11   Global Step: 483280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:17,693-Speed 2630.20 samples/sec   Loss 5.5561   LearningRate 0.0174   Epoch: 11   Global Step: 483290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:21,603-Speed 2619.00 samples/sec   Loss 5.7133   LearningRate 0.0174   Epoch: 11   Global Step: 483300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:25,508-Speed 2623.75 samples/sec   Loss 5.5477   LearningRate 0.0174   Epoch: 11   Global Step: 483310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:29,406-Speed 2627.77 samples/sec   Loss 5.7530   LearningRate 0.0174   Epoch: 11   Global Step: 483320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:33,305-Speed 2626.66 samples/sec   Loss 5.6575   LearningRate 0.0174   Epoch: 11   Global Step: 483330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:37,200-Speed 2629.16 samples/sec   Loss 5.4649   LearningRate 0.0174   Epoch: 11   Global Step: 483340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:02:41,099-Speed 2627.37 samples/sec   Loss 5.6915   LearningRate 0.0174   Epoch: 11   Global Step: 483350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:02:44,981-Speed 2637.89 samples/sec   Loss 5.6990   LearningRate 0.0174   Epoch: 11   Global Step: 483360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:48,877-Speed 2629.86 samples/sec   Loss 5.6024   LearningRate 0.0174   Epoch: 11   Global Step: 483370   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:52,771-Speed 2630.47 samples/sec   Loss 5.6520   LearningRate 0.0174   Epoch: 11   Global Step: 483380   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:02:56,726-Speed 2589.72 samples/sec   Loss 5.5829   LearningRate 0.0174   Epoch: 11   Global Step: 483390   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:00,622-Speed 2629.44 samples/sec   Loss 5.6171   LearningRate 0.0174   Epoch: 11   Global Step: 483400   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:04,695-Speed 2514.66 samples/sec   Loss 5.6805   LearningRate 0.0174   Epoch: 11   Global Step: 483410   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:08,689-Speed 2564.23 samples/sec   Loss 5.6385   LearningRate 0.0174   Epoch: 11   Global Step: 483420   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:12,605-Speed 2615.24 samples/sec   Loss 5.5783   LearningRate 0.0174   Epoch: 11   Global Step: 483430   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:16,521-Speed 2616.17 samples/sec   Loss 5.5366   LearningRate 0.0174   Epoch: 11   Global Step: 483440   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:20,425-Speed 2623.81 samples/sec   Loss 5.5240   LearningRate 0.0174   Epoch: 11   Global Step: 483450   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:24,328-Speed 2624.62 samples/sec   Loss 5.6686   LearningRate 0.0174   Epoch: 11   Global Step: 483460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:03:28,256-Speed 2607.92 samples/sec   Loss 5.5353   LearningRate 0.0174   Epoch: 11   Global Step: 483470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:03:32,165-Speed 2620.27 samples/sec   Loss 5.6891   LearningRate 0.0174   Epoch: 11   Global Step: 483480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:03:36,058-Speed 2631.12 samples/sec   Loss 5.5279   LearningRate 0.0174   Epoch: 11   Global Step: 483490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:03:39,935-Speed 2641.87 samples/sec   Loss 5.6769   LearningRate 0.0174   Epoch: 11   Global Step: 483500   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:43,841-Speed 2621.93 samples/sec   Loss 5.5816   LearningRate 0.0174   Epoch: 11   Global Step: 483510   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:47,745-Speed 2623.20 samples/sec   Loss 5.5258   LearningRate 0.0174   Epoch: 11   Global Step: 483520   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:51,651-Speed 2622.76 samples/sec   Loss 5.5887   LearningRate 0.0174   Epoch: 11   Global Step: 483530   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:55,647-Speed 2563.45 samples/sec   Loss 5.5527   LearningRate 0.0174   Epoch: 11   Global Step: 483540   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:03:59,561-Speed 2617.44 samples/sec   Loss 5.5761   LearningRate 0.0174   Epoch: 11   Global Step: 483550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:04:03,460-Speed 2626.88 samples/sec   Loss 5.4629   LearningRate 0.0174   Epoch: 11   Global Step: 483560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:04:07,365-Speed 2622.82 samples/sec   Loss 5.5701   LearningRate 0.0174   Epoch: 11   Global Step: 483570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:04:11,284-Speed 2612.83 samples/sec   Loss 5.5129   LearningRate 0.0174   Epoch: 11   Global Step: 483580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:04:15,186-Speed 2625.43 samples/sec   Loss 5.6161   LearningRate 0.0174   Epoch: 11   Global Step: 483590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:04:19,081-Speed 2629.73 samples/sec   Loss 5.4851   LearningRate 0.0174   Epoch: 11   Global Step: 483600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:22,976-Speed 2630.35 samples/sec   Loss 5.5620   LearningRate 0.0174   Epoch: 11   Global Step: 483610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:26,883-Speed 2621.72 samples/sec   Loss 5.5305   LearningRate 0.0174   Epoch: 11   Global Step: 483620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:30,780-Speed 2628.56 samples/sec   Loss 5.6391   LearningRate 0.0174   Epoch: 11   Global Step: 483630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:34,680-Speed 2626.45 samples/sec   Loss 5.6264   LearningRate 0.0174   Epoch: 11   Global Step: 483640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:38,577-Speed 2627.80 samples/sec   Loss 5.5046   LearningRate 0.0174   Epoch: 11   Global Step: 483650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:42,479-Speed 2625.07 samples/sec   Loss 5.6915   LearningRate 0.0174   Epoch: 11   Global Step: 483660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:46,375-Speed 2629.39 samples/sec   Loss 5.5659   LearningRate 0.0174   Epoch: 11   Global Step: 483670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:50,272-Speed 2627.77 samples/sec   Loss 5.5577   LearningRate 0.0174   Epoch: 11   Global Step: 483680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:04:54,162-Speed 2633.21 samples/sec   Loss 5.5457   LearningRate 0.0174   Epoch: 11   Global Step: 483690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:04:58,060-Speed 2627.47 samples/sec   Loss 5.6422   LearningRate 0.0174   Epoch: 11   Global Step: 483700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:05:01,951-Speed 2632.97 samples/sec   Loss 5.5893   LearningRate 0.0174   Epoch: 11   Global Step: 483710   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:05,851-Speed 2625.63 samples/sec   Loss 5.5498   LearningRate 0.0174   Epoch: 11   Global Step: 483720   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:09,748-Speed 2628.76 samples/sec   Loss 5.5420   LearningRate 0.0174   Epoch: 11   Global Step: 483730   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:13,646-Speed 2627.22 samples/sec   Loss 5.4873   LearningRate 0.0174   Epoch: 11   Global Step: 483740   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:17,544-Speed 2628.08 samples/sec   Loss 5.5812   LearningRate 0.0174   Epoch: 11   Global Step: 483750   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:21,443-Speed 2627.35 samples/sec   Loss 5.5473   LearningRate 0.0174   Epoch: 11   Global Step: 483760   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:25,337-Speed 2630.05 samples/sec   Loss 5.4919   LearningRate 0.0174   Epoch: 11   Global Step: 483770   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:29,255-Speed 2614.36 samples/sec   Loss 5.6444   LearningRate 0.0174   Epoch: 11   Global Step: 483780   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:33,159-Speed 2623.62 samples/sec   Loss 5.6274   LearningRate 0.0174   Epoch: 11   Global Step: 483790   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:37,056-Speed 2627.85 samples/sec   Loss 5.6185   LearningRate 0.0174   Epoch: 11   Global Step: 483800   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:05:40,957-Speed 2625.48 samples/sec   Loss 5.4954   LearningRate 0.0174   Epoch: 11   Global Step: 483810   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:05:44,865-Speed 2621.73 samples/sec   Loss 5.5423   LearningRate 0.0174   Epoch: 11   Global Step: 483820   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:05:48,757-Speed 2631.30 samples/sec   Loss 5.6169   LearningRate 0.0174   Epoch: 11   Global Step: 483830   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:05:52,654-Speed 2629.16 samples/sec   Loss 5.7493   LearningRate 0.0174   Epoch: 11   Global Step: 483840   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:05:56,550-Speed 2628.38 samples/sec   Loss 5.6155   LearningRate 0.0174   Epoch: 11   Global Step: 483850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:06:00,443-Speed 2631.25 samples/sec   Loss 5.4960   LearningRate 0.0174   Epoch: 11   Global Step: 483860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:06:04,364-Speed 2612.07 samples/sec   Loss 5.5951   LearningRate 0.0174   Epoch: 11   Global Step: 483870   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:06:08,279-Speed 2615.98 samples/sec   Loss 5.5427   LearningRate 0.0174   Epoch: 11   Global Step: 483880   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:12,184-Speed 2622.82 samples/sec   Loss 5.6428   LearningRate 0.0174   Epoch: 11   Global Step: 483890   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:16,083-Speed 2627.91 samples/sec   Loss 5.4852   LearningRate 0.0174   Epoch: 11   Global Step: 483900   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:19,979-Speed 2628.51 samples/sec   Loss 5.4744   LearningRate 0.0174   Epoch: 11   Global Step: 483910   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:23,876-Speed 2628.58 samples/sec   Loss 5.5750   LearningRate 0.0174   Epoch: 11   Global Step: 483920   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:27,778-Speed 2624.89 samples/sec   Loss 5.5492   LearningRate 0.0174   Epoch: 11   Global Step: 483930   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:31,673-Speed 2630.03 samples/sec   Loss 5.4794   LearningRate 0.0174   Epoch: 11   Global Step: 483940   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:35,571-Speed 2626.98 samples/sec   Loss 5.6613   LearningRate 0.0174   Epoch: 11   Global Step: 483950   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:39,462-Speed 2632.48 samples/sec   Loss 5.5825   LearningRate 0.0174   Epoch: 11   Global Step: 483960   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:43,358-Speed 2628.65 samples/sec   Loss 5.4497   LearningRate 0.0174   Epoch: 11   Global Step: 483970   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:06:47,259-Speed 2625.79 samples/sec   Loss 5.5463   LearningRate 0.0174   Epoch: 11   Global Step: 483980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:06:51,157-Speed 2627.60 samples/sec   Loss 5.6814   LearningRate 0.0174   Epoch: 11   Global Step: 483990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:06:55,055-Speed 2627.87 samples/sec   Loss 5.6038   LearningRate 0.0174   Epoch: 11   Global Step: 484000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:06:58,967-Speed 2618.60 samples/sec   Loss 5.6120   LearningRate 0.0174   Epoch: 11   Global Step: 484010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:07:02,862-Speed 2629.46 samples/sec   Loss 5.7320   LearningRate 0.0174   Epoch: 11   Global Step: 484020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:07:06,770-Speed 2620.61 samples/sec   Loss 5.7210   LearningRate 0.0174   Epoch: 11   Global Step: 484030   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:07:10,714-Speed 2597.08 samples/sec   Loss 5.5967   LearningRate 0.0173   Epoch: 11   Global Step: 484040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:07:14,611-Speed 2628.76 samples/sec   Loss 5.5529   LearningRate 0.0173   Epoch: 11   Global Step: 484050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:07:18,504-Speed 2631.02 samples/sec   Loss 5.4859   LearningRate 0.0173   Epoch: 11   Global Step: 484060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:07:22,401-Speed 2628.04 samples/sec   Loss 5.5303   LearningRate 0.0173   Epoch: 11   Global Step: 484070   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:07:26,303-Speed 2625.67 samples/sec   Loss 5.5457   LearningRate 0.0173   Epoch: 11   Global Step: 484080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:30,209-Speed 2621.88 samples/sec   Loss 5.6506   LearningRate 0.0173   Epoch: 11   Global Step: 484090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:34,114-Speed 2622.99 samples/sec   Loss 5.6073   LearningRate 0.0173   Epoch: 11   Global Step: 484100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:38,029-Speed 2616.03 samples/sec   Loss 5.5810   LearningRate 0.0173   Epoch: 11   Global Step: 484110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:42,154-Speed 2483.38 samples/sec   Loss 5.5907   LearningRate 0.0173   Epoch: 11   Global Step: 484120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:46,052-Speed 2628.10 samples/sec   Loss 5.5447   LearningRate 0.0173   Epoch: 11   Global Step: 484130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:49,956-Speed 2623.33 samples/sec   Loss 5.5845   LearningRate 0.0173   Epoch: 11   Global Step: 484140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:53,858-Speed 2624.96 samples/sec   Loss 5.5408   LearningRate 0.0173   Epoch: 11   Global Step: 484150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:07:57,738-Speed 2639.79 samples/sec   Loss 5.6334   LearningRate 0.0173   Epoch: 11   Global Step: 484160   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:01,639-Speed 2625.37 samples/sec   Loss 5.5570   LearningRate 0.0173   Epoch: 11   Global Step: 484170   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:05,552-Speed 2617.43 samples/sec   Loss 5.5789   LearningRate 0.0173   Epoch: 11   Global Step: 484180   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:09,464-Speed 2618.45 samples/sec   Loss 5.5646   LearningRate 0.0173   Epoch: 11   Global Step: 484190   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:13,364-Speed 2626.32 samples/sec   Loss 5.6423   LearningRate 0.0173   Epoch: 11   Global Step: 484200   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:17,265-Speed 2625.79 samples/sec   Loss 5.7169   LearningRate 0.0173   Epoch: 11   Global Step: 484210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:21,169-Speed 2623.71 samples/sec   Loss 5.6165   LearningRate 0.0173   Epoch: 11   Global Step: 484220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:25,071-Speed 2624.89 samples/sec   Loss 5.5100   LearningRate 0.0173   Epoch: 11   Global Step: 484230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:28,966-Speed 2630.02 samples/sec   Loss 5.6040   LearningRate 0.0173   Epoch: 11   Global Step: 484240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:32,859-Speed 2630.94 samples/sec   Loss 5.5260   LearningRate 0.0173   Epoch: 11   Global Step: 484250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:36,761-Speed 2624.79 samples/sec   Loss 5.6173   LearningRate 0.0173   Epoch: 11   Global Step: 484260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:08:40,689-Speed 2607.80 samples/sec   Loss 5.5700   LearningRate 0.0173   Epoch: 11   Global Step: 484270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:44,593-Speed 2623.76 samples/sec   Loss 5.5973   LearningRate 0.0173   Epoch: 11   Global Step: 484280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:48,488-Speed 2630.33 samples/sec   Loss 5.5044   LearningRate 0.0173   Epoch: 11   Global Step: 484290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:52,383-Speed 2629.29 samples/sec   Loss 5.5686   LearningRate 0.0173   Epoch: 11   Global Step: 484300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:08:56,278-Speed 2629.90 samples/sec   Loss 5.5380   LearningRate 0.0173   Epoch: 11   Global Step: 484310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:00,219-Speed 2598.87 samples/sec   Loss 5.5194   LearningRate 0.0173   Epoch: 11   Global Step: 484320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:04,130-Speed 2618.72 samples/sec   Loss 5.5378   LearningRate 0.0173   Epoch: 11   Global Step: 484330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:08,027-Speed 2628.41 samples/sec   Loss 5.5602   LearningRate 0.0173   Epoch: 11   Global Step: 484340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:11,931-Speed 2629.50 samples/sec   Loss 5.5661   LearningRate 0.0173   Epoch: 11   Global Step: 484350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:15,831-Speed 2626.72 samples/sec   Loss 5.5790   LearningRate 0.0173   Epoch: 11   Global Step: 484360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:19,738-Speed 2621.43 samples/sec   Loss 5.5998   LearningRate 0.0173   Epoch: 11   Global Step: 484370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:09:23,636-Speed 2627.92 samples/sec   Loss 5.6468   LearningRate 0.0173   Epoch: 11   Global Step: 484380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:09:27,513-Speed 2641.36 samples/sec   Loss 5.5545   LearningRate 0.0173   Epoch: 11   Global Step: 484390   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:31,412-Speed 2627.18 samples/sec   Loss 5.5038   LearningRate 0.0173   Epoch: 11   Global Step: 484400   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:35,308-Speed 2629.34 samples/sec   Loss 5.6070   LearningRate 0.0173   Epoch: 11   Global Step: 484410   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:39,204-Speed 2628.70 samples/sec   Loss 5.5931   LearningRate 0.0173   Epoch: 11   Global Step: 484420   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:43,115-Speed 2618.44 samples/sec   Loss 5.6486   LearningRate 0.0173   Epoch: 11   Global Step: 484430   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:47,020-Speed 2623.79 samples/sec   Loss 5.5438   LearningRate 0.0173   Epoch: 11   Global Step: 484440   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:50,953-Speed 2603.83 samples/sec   Loss 5.6197   LearningRate 0.0173   Epoch: 11   Global Step: 484450   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:54,852-Speed 2627.15 samples/sec   Loss 5.6527   LearningRate 0.0173   Epoch: 11   Global Step: 484460   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:09:58,893-Speed 2534.81 samples/sec   Loss 5.5394   LearningRate 0.0173   Epoch: 11   Global Step: 484470   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:10:02,791-Speed 2628.03 samples/sec   Loss 5.5200   LearningRate 0.0173   Epoch: 11   Global Step: 484480   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:10:06,691-Speed 2626.23 samples/sec   Loss 5.5376   LearningRate 0.0173   Epoch: 11   Global Step: 484490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:10,590-Speed 2626.45 samples/sec   Loss 5.4694   LearningRate 0.0173   Epoch: 11   Global Step: 484500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:14,489-Speed 2626.84 samples/sec   Loss 5.5773   LearningRate 0.0173   Epoch: 11   Global Step: 484510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:18,386-Speed 2628.31 samples/sec   Loss 5.5420   LearningRate 0.0173   Epoch: 11   Global Step: 484520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:22,298-Speed 2619.22 samples/sec   Loss 5.5029   LearningRate 0.0173   Epoch: 11   Global Step: 484530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:26,199-Speed 2625.33 samples/sec   Loss 5.5248   LearningRate 0.0173   Epoch: 11   Global Step: 484540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:30,134-Speed 2603.35 samples/sec   Loss 5.5133   LearningRate 0.0173   Epoch: 11   Global Step: 484550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:34,058-Speed 2610.15 samples/sec   Loss 5.6150   LearningRate 0.0173   Epoch: 11   Global Step: 484560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:37,954-Speed 2629.03 samples/sec   Loss 5.4175   LearningRate 0.0173   Epoch: 11   Global Step: 484570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:41,848-Speed 2629.75 samples/sec   Loss 5.6016   LearningRate 0.0173   Epoch: 11   Global Step: 484580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:45,727-Speed 2640.72 samples/sec   Loss 5.5251   LearningRate 0.0173   Epoch: 11   Global Step: 484590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:49,625-Speed 2627.69 samples/sec   Loss 5.5925   LearningRate 0.0173   Epoch: 11   Global Step: 484600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:53,526-Speed 2625.36 samples/sec   Loss 5.5628   LearningRate 0.0173   Epoch: 11   Global Step: 484610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:10:57,469-Speed 2597.79 samples/sec   Loss 5.4843   LearningRate 0.0173   Epoch: 11   Global Step: 484620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:11:01,609-Speed 2474.57 samples/sec   Loss 5.5153   LearningRate 0.0173   Epoch: 11   Global Step: 484630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:11:05,551-Speed 2598.13 samples/sec   Loss 5.6228   LearningRate 0.0173   Epoch: 11   Global Step: 484640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:11:09,415-Speed 2650.50 samples/sec   Loss 5.5646   LearningRate 0.0173   Epoch: 11   Global Step: 484650   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:13,314-Speed 2627.12 samples/sec   Loss 5.5618   LearningRate 0.0173   Epoch: 11   Global Step: 484660   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:17,224-Speed 2619.69 samples/sec   Loss 5.6345   LearningRate 0.0173   Epoch: 11   Global Step: 484670   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:21,124-Speed 2625.97 samples/sec   Loss 5.5461   LearningRate 0.0173   Epoch: 11   Global Step: 484680   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:25,019-Speed 2629.93 samples/sec   Loss 5.4549   LearningRate 0.0173   Epoch: 11   Global Step: 484690   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:28,922-Speed 2624.31 samples/sec   Loss 5.6048   LearningRate 0.0173   Epoch: 11   Global Step: 484700   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:32,822-Speed 2626.60 samples/sec   Loss 5.4807   LearningRate 0.0173   Epoch: 11   Global Step: 484710   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:36,735-Speed 2617.12 samples/sec   Loss 5.4505   LearningRate 0.0173   Epoch: 11   Global Step: 484720   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:40,631-Speed 2628.73 samples/sec   Loss 5.6669   LearningRate 0.0173   Epoch: 11   Global Step: 484730   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:44,560-Speed 2606.78 samples/sec   Loss 5.6289   LearningRate 0.0173   Epoch: 11   Global Step: 484740   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-15 02:11:48,452-Speed 2631.44 samples/sec   Loss 5.5243   LearningRate 0.0173   Epoch: 11   Global Step: 484750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:11:52,351-Speed 2627.36 samples/sec   Loss 5.5556   LearningRate 0.0173   Epoch: 11   Global Step: 484760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:11:56,255-Speed 2623.07 samples/sec   Loss 5.4772   LearningRate 0.0173   Epoch: 11   Global Step: 484770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:00,149-Speed 2630.41 samples/sec   Loss 5.5806   LearningRate 0.0173   Epoch: 11   Global Step: 484780   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:04,057-Speed 2621.00 samples/sec   Loss 5.5723   LearningRate 0.0173   Epoch: 11   Global Step: 484790   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:07,954-Speed 2628.16 samples/sec   Loss 5.6440   LearningRate 0.0173   Epoch: 11   Global Step: 484800   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:11,847-Speed 2631.12 samples/sec   Loss 5.5703   LearningRate 0.0173   Epoch: 11   Global Step: 484810   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:15,735-Speed 2634.67 samples/sec   Loss 5.5570   LearningRate 0.0173   Epoch: 11   Global Step: 484820   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:19,628-Speed 2630.85 samples/sec   Loss 5.5649   LearningRate 0.0173   Epoch: 11   Global Step: 484830   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:23,521-Speed 2630.76 samples/sec   Loss 5.5556   LearningRate 0.0173   Epoch: 11   Global Step: 484840   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:12:27,426-Speed 2622.71 samples/sec   Loss 5.5928   LearningRate 0.0173   Epoch: 11   Global Step: 484850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:31,324-Speed 2628.00 samples/sec   Loss 5.6091   LearningRate 0.0173   Epoch: 11   Global Step: 484860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:35,243-Speed 2613.39 samples/sec   Loss 5.5039   LearningRate 0.0173   Epoch: 11   Global Step: 484870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:39,145-Speed 2624.52 samples/sec   Loss 5.6023   LearningRate 0.0173   Epoch: 11   Global Step: 484880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:43,045-Speed 2626.57 samples/sec   Loss 5.5469   LearningRate 0.0173   Epoch: 11   Global Step: 484890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:46,952-Speed 2621.67 samples/sec   Loss 5.5597   LearningRate 0.0173   Epoch: 11   Global Step: 484900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:50,854-Speed 2624.84 samples/sec   Loss 5.5538   LearningRate 0.0173   Epoch: 11   Global Step: 484910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:54,771-Speed 2615.30 samples/sec   Loss 5.5272   LearningRate 0.0173   Epoch: 11   Global Step: 484920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:12:58,670-Speed 2626.85 samples/sec   Loss 5.6951   LearningRate 0.0173   Epoch: 11   Global Step: 484930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:13:02,564-Speed 2630.05 samples/sec   Loss 5.5028   LearningRate 0.0173   Epoch: 11   Global Step: 484940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:13:06,469-Speed 2622.51 samples/sec   Loss 5.5793   LearningRate 0.0173   Epoch: 11   Global Step: 484950   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-04-15 02:13:10,325-Speed 2656.35 samples/sec   Loss 5.5395   LearningRate 0.0173   Epoch: 11   Global Step: 484960   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:14,231-Speed 2621.92 samples/sec   Loss 5.4800   LearningRate 0.0173   Epoch: 11   Global Step: 484970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:18,130-Speed 2627.45 samples/sec   Loss 5.5478   LearningRate 0.0173   Epoch: 11   Global Step: 484980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:22,030-Speed 2625.77 samples/sec   Loss 5.5384   LearningRate 0.0173   Epoch: 11   Global Step: 484990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:25,926-Speed 2629.31 samples/sec   Loss 5.4965   LearningRate 0.0173   Epoch: 11   Global Step: 485000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:29,827-Speed 2625.61 samples/sec   Loss 5.5984   LearningRate 0.0173   Epoch: 11   Global Step: 485010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:33,749-Speed 2611.43 samples/sec   Loss 5.5167   LearningRate 0.0173   Epoch: 11   Global Step: 485020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:37,664-Speed 2616.43 samples/sec   Loss 5.6119   LearningRate 0.0172   Epoch: 11   Global Step: 485030   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:41,575-Speed 2619.14 samples/sec   Loss 5.6071   LearningRate 0.0172   Epoch: 11   Global Step: 485040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:45,481-Speed 2621.92 samples/sec   Loss 5.6279   LearningRate 0.0172   Epoch: 11   Global Step: 485050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:13:49,389-Speed 2621.15 samples/sec   Loss 5.5803   LearningRate 0.0172   Epoch: 11   Global Step: 485060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:13:53,297-Speed 2620.19 samples/sec   Loss 5.4447   LearningRate 0.0172   Epoch: 11   Global Step: 485070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:13:57,201-Speed 2623.93 samples/sec   Loss 5.6030   LearningRate 0.0172   Epoch: 11   Global Step: 485080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:14:01,098-Speed 2627.98 samples/sec   Loss 5.6164   LearningRate 0.0172   Epoch: 11   Global Step: 485090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:14:04,997-Speed 2627.37 samples/sec   Loss 5.4277   LearningRate 0.0172   Epoch: 11   Global Step: 485100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:14:08,887-Speed 2633.06 samples/sec   Loss 5.4560   LearningRate 0.0172   Epoch: 11   Global Step: 485110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:14:12,762-Speed 2642.76 samples/sec   Loss 5.6300   LearningRate 0.0172   Epoch: 11   Global Step: 485120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:16,668-Speed 2622.78 samples/sec   Loss 5.5506   LearningRate 0.0172   Epoch: 11   Global Step: 485130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:20,564-Speed 2628.49 samples/sec   Loss 5.5314   LearningRate 0.0172   Epoch: 11   Global Step: 485140   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:24,462-Speed 2627.87 samples/sec   Loss 5.5899   LearningRate 0.0172   Epoch: 11   Global Step: 485150   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:28,370-Speed 2620.67 samples/sec   Loss 5.5797   LearningRate 0.0172   Epoch: 11   Global Step: 485160   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:32,270-Speed 2626.15 samples/sec   Loss 5.4745   LearningRate 0.0172   Epoch: 11   Global Step: 485170   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:36,161-Speed 2632.17 samples/sec   Loss 5.5717   LearningRate 0.0172   Epoch: 11   Global Step: 485180   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:40,055-Speed 2630.38 samples/sec   Loss 5.5640   LearningRate 0.0172   Epoch: 11   Global Step: 485190   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:43,951-Speed 2629.36 samples/sec   Loss 5.5557   LearningRate 0.0172   Epoch: 11   Global Step: 485200   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:47,855-Speed 2623.82 samples/sec   Loss 5.5844   LearningRate 0.0172   Epoch: 11   Global Step: 485210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:14:51,755-Speed 2626.31 samples/sec   Loss 5.7642   LearningRate 0.0172   Epoch: 11   Global Step: 485220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:14:55,653-Speed 2627.13 samples/sec   Loss 5.5463   LearningRate 0.0172   Epoch: 11   Global Step: 485230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:14:59,569-Speed 2615.42 samples/sec   Loss 5.4636   LearningRate 0.0172   Epoch: 11   Global Step: 485240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:15:03,478-Speed 2620.50 samples/sec   Loss 5.5322   LearningRate 0.0172   Epoch: 11   Global Step: 485250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:15:07,374-Speed 2628.87 samples/sec   Loss 5.4825   LearningRate 0.0172   Epoch: 11   Global Step: 485260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:15:11,250-Speed 2642.26 samples/sec   Loss 5.5435   LearningRate 0.0172   Epoch: 11   Global Step: 485270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:15,149-Speed 2626.77 samples/sec   Loss 5.5719   LearningRate 0.0172   Epoch: 11   Global Step: 485280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:19,063-Speed 2616.56 samples/sec   Loss 5.6025   LearningRate 0.0172   Epoch: 11   Global Step: 485290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:22,970-Speed 2622.46 samples/sec   Loss 5.5726   LearningRate 0.0172   Epoch: 11   Global Step: 485300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:26,865-Speed 2629.68 samples/sec   Loss 5.4342   LearningRate 0.0172   Epoch: 11   Global Step: 485310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:30,779-Speed 2616.98 samples/sec   Loss 5.5621   LearningRate 0.0172   Epoch: 11   Global Step: 485320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:34,678-Speed 2626.66 samples/sec   Loss 5.5445   LearningRate 0.0172   Epoch: 11   Global Step: 485330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:38,575-Speed 2628.16 samples/sec   Loss 5.5277   LearningRate 0.0172   Epoch: 11   Global Step: 485340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:42,471-Speed 2628.87 samples/sec   Loss 5.4481   LearningRate 0.0172   Epoch: 11   Global Step: 485350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:46,366-Speed 2630.05 samples/sec   Loss 5.5033   LearningRate 0.0172   Epoch: 11   Global Step: 485360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:50,267-Speed 2625.01 samples/sec   Loss 5.5433   LearningRate 0.0172   Epoch: 11   Global Step: 485370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:15:54,145-Speed 2641.32 samples/sec   Loss 5.5649   LearningRate 0.0172   Epoch: 11   Global Step: 485380   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:15:58,042-Speed 2628.18 samples/sec   Loss 5.4401   LearningRate 0.0172   Epoch: 11   Global Step: 485390   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:01,937-Speed 2630.97 samples/sec   Loss 5.5005   LearningRate 0.0172   Epoch: 11   Global Step: 485400   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:05,832-Speed 2629.86 samples/sec   Loss 5.5367   LearningRate 0.0172   Epoch: 11   Global Step: 485410   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:09,727-Speed 2629.41 samples/sec   Loss 5.5709   LearningRate 0.0172   Epoch: 11   Global Step: 485420   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:13,626-Speed 2626.63 samples/sec   Loss 5.4845   LearningRate 0.0172   Epoch: 11   Global Step: 485430   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:17,519-Speed 2631.14 samples/sec   Loss 5.6775   LearningRate 0.0172   Epoch: 11   Global Step: 485440   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:21,420-Speed 2625.39 samples/sec   Loss 5.5071   LearningRate 0.0172   Epoch: 11   Global Step: 485450   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:25,328-Speed 2620.98 samples/sec   Loss 5.5737   LearningRate 0.0172   Epoch: 11   Global Step: 485460   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:29,223-Speed 2629.72 samples/sec   Loss 5.5306   LearningRate 0.0172   Epoch: 11   Global Step: 485470   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:33,118-Speed 2629.64 samples/sec   Loss 5.5501   LearningRate 0.0172   Epoch: 11   Global Step: 485480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:16:37,015-Speed 2628.24 samples/sec   Loss 5.6289   LearningRate 0.0172   Epoch: 11   Global Step: 485490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:16:40,910-Speed 2629.64 samples/sec   Loss 5.5087   LearningRate 0.0172   Epoch: 11   Global Step: 485500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:16:44,807-Speed 2628.39 samples/sec   Loss 5.5517   LearningRate 0.0172   Epoch: 11   Global Step: 485510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:16:48,818-Speed 2553.50 samples/sec   Loss 5.5633   LearningRate 0.0172   Epoch: 11   Global Step: 485520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:16:52,693-Speed 2643.85 samples/sec   Loss 5.5406   LearningRate 0.0172   Epoch: 11   Global Step: 485530   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:16:56,589-Speed 2628.81 samples/sec   Loss 5.5790   LearningRate 0.0172   Epoch: 11   Global Step: 485540   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:00,484-Speed 2629.49 samples/sec   Loss 5.5254   LearningRate 0.0172   Epoch: 11   Global Step: 485550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:04,395-Speed 2619.06 samples/sec   Loss 5.4895   LearningRate 0.0172   Epoch: 11   Global Step: 485560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:08,304-Speed 2620.17 samples/sec   Loss 5.4367   LearningRate 0.0172   Epoch: 11   Global Step: 485570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:12,203-Speed 2626.67 samples/sec   Loss 5.6074   LearningRate 0.0172   Epoch: 11   Global Step: 485580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:16,107-Speed 2623.77 samples/sec   Loss 5.5740   LearningRate 0.0172   Epoch: 11   Global Step: 485590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:20,017-Speed 2619.89 samples/sec   Loss 5.5254   LearningRate 0.0172   Epoch: 11   Global Step: 485600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:23,932-Speed 2616.17 samples/sec   Loss 5.5022   LearningRate 0.0172   Epoch: 11   Global Step: 485610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:27,874-Speed 2598.33 samples/sec   Loss 5.5553   LearningRate 0.0172   Epoch: 11   Global Step: 485620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:31,772-Speed 2627.33 samples/sec   Loss 5.6919   LearningRate 0.0172   Epoch: 11   Global Step: 485630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:17:35,671-Speed 2627.03 samples/sec   Loss 5.5771   LearningRate 0.0172   Epoch: 11   Global Step: 485640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:17:39,573-Speed 2624.44 samples/sec   Loss 5.5645   LearningRate 0.0172   Epoch: 11   Global Step: 485650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:17:43,477-Speed 2623.34 samples/sec   Loss 5.5139   LearningRate 0.0172   Epoch: 11   Global Step: 485660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:17:47,376-Speed 2627.14 samples/sec   Loss 5.5155   LearningRate 0.0172   Epoch: 11   Global Step: 485670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:17:51,250-Speed 2644.24 samples/sec   Loss 5.5939   LearningRate 0.0172   Epoch: 11   Global Step: 485680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:55,146-Speed 2629.35 samples/sec   Loss 5.6066   LearningRate 0.0172   Epoch: 11   Global Step: 485690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:17:59,042-Speed 2628.78 samples/sec   Loss 5.4145   LearningRate 0.0172   Epoch: 11   Global Step: 485700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:02,936-Speed 2630.65 samples/sec   Loss 5.5495   LearningRate 0.0172   Epoch: 11   Global Step: 485710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:06,836-Speed 2625.86 samples/sec   Loss 5.5310   LearningRate 0.0172   Epoch: 11   Global Step: 485720   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:10,737-Speed 2625.46 samples/sec   Loss 5.6260   LearningRate 0.0172   Epoch: 11   Global Step: 485730   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:14,642-Speed 2622.95 samples/sec   Loss 5.4652   LearningRate 0.0172   Epoch: 11   Global Step: 485740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:18,554-Speed 2618.09 samples/sec   Loss 5.5761   LearningRate 0.0172   Epoch: 11   Global Step: 485750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:22,454-Speed 2626.40 samples/sec   Loss 5.4125   LearningRate 0.0172   Epoch: 11   Global Step: 485760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:26,352-Speed 2627.28 samples/sec   Loss 5.5658   LearningRate 0.0172   Epoch: 11   Global Step: 485770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:18:30,263-Speed 2618.98 samples/sec   Loss 5.5287   LearningRate 0.0172   Epoch: 11   Global Step: 485780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:18:34,157-Speed 2630.38 samples/sec   Loss 5.5668   LearningRate 0.0172   Epoch: 11   Global Step: 485790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:18:38,051-Speed 2630.30 samples/sec   Loss 5.5053   LearningRate 0.0172   Epoch: 11   Global Step: 485800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:18:41,952-Speed 2625.80 samples/sec   Loss 5.6139   LearningRate 0.0172   Epoch: 11   Global Step: 485810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:18:45,846-Speed 2630.46 samples/sec   Loss 5.4958   LearningRate 0.0172   Epoch: 11   Global Step: 485820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:18:49,744-Speed 2626.97 samples/sec   Loss 5.5856   LearningRate 0.0172   Epoch: 11   Global Step: 485830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:18:53,644-Speed 2626.86 samples/sec   Loss 5.6357   LearningRate 0.0172   Epoch: 11   Global Step: 485840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:18:57,526-Speed 2638.19 samples/sec   Loss 5.5030   LearningRate 0.0172   Epoch: 11   Global Step: 485850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:01,423-Speed 2628.42 samples/sec   Loss 5.6171   LearningRate 0.0172   Epoch: 11   Global Step: 485860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:05,319-Speed 2628.52 samples/sec   Loss 5.6163   LearningRate 0.0172   Epoch: 11   Global Step: 485870   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:09,222-Speed 2624.29 samples/sec   Loss 5.5741   LearningRate 0.0172   Epoch: 11   Global Step: 485880   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:13,195-Speed 2578.15 samples/sec   Loss 5.3772   LearningRate 0.0172   Epoch: 11   Global Step: 485890   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:17,129-Speed 2603.89 samples/sec   Loss 5.5592   LearningRate 0.0172   Epoch: 11   Global Step: 485900   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:21,073-Speed 2597.08 samples/sec   Loss 5.6095   LearningRate 0.0172   Epoch: 11   Global Step: 485910   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:24,986-Speed 2620.65 samples/sec   Loss 5.5355   LearningRate 0.0172   Epoch: 11   Global Step: 485920   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:28,878-Speed 2631.13 samples/sec   Loss 5.6329   LearningRate 0.0172   Epoch: 11   Global Step: 485930   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:32,774-Speed 2628.93 samples/sec   Loss 5.4929   LearningRate 0.0172   Epoch: 11   Global Step: 485940   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:36,651-Speed 2641.83 samples/sec   Loss 5.5111   LearningRate 0.0172   Epoch: 11   Global Step: 485950   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:40,542-Speed 2632.84 samples/sec   Loss 5.6122   LearningRate 0.0172   Epoch: 11   Global Step: 485960   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:44,436-Speed 2630.48 samples/sec   Loss 5.5707   LearningRate 0.0172   Epoch: 11   Global Step: 485970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:48,333-Speed 2628.51 samples/sec   Loss 5.5487   LearningRate 0.0172   Epoch: 11   Global Step: 485980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:52,246-Speed 2617.29 samples/sec   Loss 5.4704   LearningRate 0.0172   Epoch: 11   Global Step: 485990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:19:56,140-Speed 2630.22 samples/sec   Loss 5.6343   LearningRate 0.0172   Epoch: 11   Global Step: 486000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:00,038-Speed 2627.80 samples/sec   Loss 5.4192   LearningRate 0.0172   Epoch: 11   Global Step: 486010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:03,944-Speed 2621.72 samples/sec   Loss 5.6220   LearningRate 0.0172   Epoch: 11   Global Step: 486020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:07,873-Speed 2606.58 samples/sec   Loss 5.5003   LearningRate 0.0171   Epoch: 11   Global Step: 486030   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:11,770-Speed 2628.94 samples/sec   Loss 5.4878   LearningRate 0.0171   Epoch: 11   Global Step: 486040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:15,666-Speed 2628.78 samples/sec   Loss 5.6857   LearningRate 0.0171   Epoch: 11   Global Step: 486050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:20:19,555-Speed 2633.57 samples/sec   Loss 5.5865   LearningRate 0.0171   Epoch: 11   Global Step: 486060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:23,456-Speed 2625.85 samples/sec   Loss 5.6723   LearningRate 0.0171   Epoch: 11   Global Step: 486070   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:27,354-Speed 2627.90 samples/sec   Loss 5.5648   LearningRate 0.0171   Epoch: 11   Global Step: 486080   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:31,300-Speed 2595.50 samples/sec   Loss 5.5353   LearningRate 0.0171   Epoch: 11   Global Step: 486090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:35,198-Speed 2627.28 samples/sec   Loss 5.3913   LearningRate 0.0171   Epoch: 11   Global Step: 486100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:39,095-Speed 2628.02 samples/sec   Loss 5.6218   LearningRate 0.0171   Epoch: 11   Global Step: 486110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:43,003-Speed 2621.24 samples/sec   Loss 5.5101   LearningRate 0.0171   Epoch: 11   Global Step: 486120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:46,901-Speed 2627.10 samples/sec   Loss 5.5462   LearningRate 0.0171   Epoch: 11   Global Step: 486130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:50,806-Speed 2623.22 samples/sec   Loss 5.5725   LearningRate 0.0171   Epoch: 11   Global Step: 486140   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:54,711-Speed 2622.79 samples/sec   Loss 5.5589   LearningRate 0.0171   Epoch: 11   Global Step: 486150   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:20:58,617-Speed 2622.61 samples/sec   Loss 5.6098   LearningRate 0.0171   Epoch: 11   Global Step: 486160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:21:02,612-Speed 2563.69 samples/sec   Loss 5.5144   LearningRate 0.0171   Epoch: 11   Global Step: 486170   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:06,707-Speed 2501.03 samples/sec   Loss 5.5735   LearningRate 0.0171   Epoch: 11   Global Step: 486180   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:10,807-Speed 2498.50 samples/sec   Loss 5.6774   LearningRate 0.0171   Epoch: 11   Global Step: 486190   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:14,839-Speed 2540.28 samples/sec   Loss 5.4937   LearningRate 0.0171   Epoch: 11   Global Step: 486200   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:18,738-Speed 2626.77 samples/sec   Loss 5.5902   LearningRate 0.0171   Epoch: 11   Global Step: 486210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:22,633-Speed 2629.85 samples/sec   Loss 5.5538   LearningRate 0.0171   Epoch: 11   Global Step: 486220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:26,540-Speed 2621.28 samples/sec   Loss 5.4810   LearningRate 0.0171   Epoch: 11   Global Step: 486230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:30,457-Speed 2614.67 samples/sec   Loss 5.5785   LearningRate 0.0171   Epoch: 11   Global Step: 486240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:34,360-Speed 2624.75 samples/sec   Loss 5.4676   LearningRate 0.0171   Epoch: 11   Global Step: 486250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:38,256-Speed 2628.79 samples/sec   Loss 5.5159   LearningRate 0.0171   Epoch: 11   Global Step: 486260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:21:42,154-Speed 2627.35 samples/sec   Loss 5.4812   LearningRate 0.0171   Epoch: 11   Global Step: 486270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:21:46,055-Speed 2625.69 samples/sec   Loss 5.5254   LearningRate 0.0171   Epoch: 11   Global Step: 486280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:21:49,951-Speed 2629.52 samples/sec   Loss 5.6266   LearningRate 0.0171   Epoch: 11   Global Step: 486290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:21:53,844-Speed 2630.51 samples/sec   Loss 5.5650   LearningRate 0.0171   Epoch: 11   Global Step: 486300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:21:57,739-Speed 2629.63 samples/sec   Loss 5.4991   LearningRate 0.0171   Epoch: 11   Global Step: 486310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:22:01,640-Speed 2625.45 samples/sec   Loss 5.5860   LearningRate 0.0171   Epoch: 11   Global Step: 486320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:22:05,536-Speed 2628.78 samples/sec   Loss 5.5203   LearningRate 0.0171   Epoch: 11   Global Step: 486330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:22:09,428-Speed 2631.58 samples/sec   Loss 5.4496   LearningRate 0.0171   Epoch: 11   Global Step: 486340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:22:13,303-Speed 2643.92 samples/sec   Loss 5.4125   LearningRate 0.0171   Epoch: 11   Global Step: 486350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:17,197-Speed 2630.15 samples/sec   Loss 5.6098   LearningRate 0.0171   Epoch: 11   Global Step: 486360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:21,102-Speed 2622.94 samples/sec   Loss 5.5881   LearningRate 0.0171   Epoch: 11   Global Step: 486370   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:25,024-Speed 2611.58 samples/sec   Loss 5.4256   LearningRate 0.0171   Epoch: 11   Global Step: 486380   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:28,924-Speed 2625.80 samples/sec   Loss 5.6090   LearningRate 0.0171   Epoch: 11   Global Step: 486390   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:32,818-Speed 2630.37 samples/sec   Loss 5.4878   LearningRate 0.0171   Epoch: 11   Global Step: 486400   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:36,718-Speed 2626.16 samples/sec   Loss 5.5698   LearningRate 0.0171   Epoch: 11   Global Step: 486410   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:40,612-Speed 2630.60 samples/sec   Loss 5.6038   LearningRate 0.0171   Epoch: 11   Global Step: 486420   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:44,513-Speed 2625.09 samples/sec   Loss 5.4879   LearningRate 0.0171   Epoch: 11   Global Step: 486430   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:48,412-Speed 2627.04 samples/sec   Loss 5.5105   LearningRate 0.0171   Epoch: 11   Global Step: 486440   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:22:52,313-Speed 2625.63 samples/sec   Loss 5.5538   LearningRate 0.0171   Epoch: 11   Global Step: 486450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:22:56,202-Speed 2633.91 samples/sec   Loss 5.5520   LearningRate 0.0171   Epoch: 11   Global Step: 486460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:00,095-Speed 2630.95 samples/sec   Loss 5.4354   LearningRate 0.0171   Epoch: 11   Global Step: 486470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:03,989-Speed 2631.22 samples/sec   Loss 5.5321   LearningRate 0.0171   Epoch: 11   Global Step: 486480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:07,894-Speed 2623.14 samples/sec   Loss 5.5176   LearningRate 0.0171   Epoch: 11   Global Step: 486490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:11,792-Speed 2627.00 samples/sec   Loss 5.5945   LearningRate 0.0171   Epoch: 11   Global Step: 486500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:15,691-Speed 2627.29 samples/sec   Loss 5.5260   LearningRate 0.0171   Epoch: 11   Global Step: 486510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:19,592-Speed 2624.98 samples/sec   Loss 5.5639   LearningRate 0.0171   Epoch: 11   Global Step: 486520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:23,487-Speed 2630.10 samples/sec   Loss 5.5620   LearningRate 0.0171   Epoch: 11   Global Step: 486530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:27,383-Speed 2629.27 samples/sec   Loss 5.5888   LearningRate 0.0171   Epoch: 11   Global Step: 486540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:31,255-Speed 2645.12 samples/sec   Loss 5.6177   LearningRate 0.0171   Epoch: 11   Global Step: 486550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:35,153-Speed 2627.40 samples/sec   Loss 5.5467   LearningRate 0.0171   Epoch: 11   Global Step: 486560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:23:39,041-Speed 2634.73 samples/sec   Loss 5.5226   LearningRate 0.0171   Epoch: 11   Global Step: 486570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:23:42,933-Speed 2631.44 samples/sec   Loss 5.6282   LearningRate 0.0171   Epoch: 11   Global Step: 486580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:23:46,831-Speed 2627.67 samples/sec   Loss 5.6458   LearningRate 0.0171   Epoch: 11   Global Step: 486590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:23:50,727-Speed 2628.79 samples/sec   Loss 5.6592   LearningRate 0.0171   Epoch: 11   Global Step: 486600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:23:54,676-Speed 2593.43 samples/sec   Loss 5.5710   LearningRate 0.0171   Epoch: 11   Global Step: 486610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:23:58,604-Speed 2607.28 samples/sec   Loss 5.5870   LearningRate 0.0171   Epoch: 11   Global Step: 486620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:02,508-Speed 2623.71 samples/sec   Loss 5.5360   LearningRate 0.0171   Epoch: 11   Global Step: 486630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:06,404-Speed 2629.47 samples/sec   Loss 5.4740   LearningRate 0.0171   Epoch: 11   Global Step: 486640   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:10,304-Speed 2626.20 samples/sec   Loss 5.4810   LearningRate 0.0171   Epoch: 11   Global Step: 486650   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:14,200-Speed 2628.77 samples/sec   Loss 5.4633   LearningRate 0.0171   Epoch: 11   Global Step: 486660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:18,104-Speed 2623.99 samples/sec   Loss 5.4981   LearningRate 0.0171   Epoch: 11   Global Step: 486670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:24:21,985-Speed 2638.59 samples/sec   Loss 5.5280   LearningRate 0.0171   Epoch: 11   Global Step: 486680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:25,882-Speed 2628.11 samples/sec   Loss 5.5881   LearningRate 0.0171   Epoch: 11   Global Step: 486690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:29,790-Speed 2621.06 samples/sec   Loss 5.5202   LearningRate 0.0171   Epoch: 11   Global Step: 486700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:33,694-Speed 2623.54 samples/sec   Loss 5.5818   LearningRate 0.0171   Epoch: 11   Global Step: 486710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:37,600-Speed 2622.39 samples/sec   Loss 5.4898   LearningRate 0.0171   Epoch: 11   Global Step: 486720   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:41,499-Speed 2626.70 samples/sec   Loss 5.4045   LearningRate 0.0171   Epoch: 11   Global Step: 486730   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:45,391-Speed 2631.91 samples/sec   Loss 5.6020   LearningRate 0.0171   Epoch: 11   Global Step: 486740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:49,284-Speed 2631.04 samples/sec   Loss 5.6245   LearningRate 0.0171   Epoch: 11   Global Step: 486750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:53,177-Speed 2630.95 samples/sec   Loss 5.5633   LearningRate 0.0171   Epoch: 11   Global Step: 486760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:24:57,094-Speed 2614.78 samples/sec   Loss 5.6340   LearningRate 0.0171   Epoch: 11   Global Step: 486770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:25:01,012-Speed 2614.22 samples/sec   Loss 5.6080   LearningRate 0.0171   Epoch: 11   Global Step: 486780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:25:04,932-Speed 2612.82 samples/sec   Loss 5.4753   LearningRate 0.0171   Epoch: 11   Global Step: 486790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:25:08,837-Speed 2622.84 samples/sec   Loss 5.5135   LearningRate 0.0171   Epoch: 11   Global Step: 486800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:25:12,744-Speed 2621.62 samples/sec   Loss 5.5005   LearningRate 0.0171   Epoch: 11   Global Step: 486810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:25:16,652-Speed 2621.03 samples/sec   Loss 5.5991   LearningRate 0.0171   Epoch: 11   Global Step: 486820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:25:20,550-Speed 2627.42 samples/sec   Loss 5.5478   LearningRate 0.0171   Epoch: 11   Global Step: 486830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:25:24,450-Speed 2626.47 samples/sec   Loss 5.6358   LearningRate 0.0171   Epoch: 11   Global Step: 486840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-04-15 02:25:28,329-Speed 2640.91 samples/sec   Loss 5.4087   LearningRate 0.0171   Epoch: 11   Global Step: 486850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:25:32,234-Speed 2622.62 samples/sec   Loss 5.6123   LearningRate 0.0171   Epoch: 11   Global Step: 486860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-04-15 02:25:36,125-Speed 2631.97 samples/sec   Loss 5.4062   LearningRate 0.0171   Epoch: 11   Global Step: 486870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:25:40,018-Speed 2631.28 samples/sec   Loss 5.6409   LearningRate 0.0171   Epoch: 11   Global Step: 486880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:25:43,912-Speed 2630.17 samples/sec   Loss 5.5022   LearningRate 0.0171   Epoch: 11   Global Step: 486890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:25:47,809-Speed 2628.57 samples/sec   Loss 5.5584   LearningRate 0.0171   Epoch: 11   Global Step: 486900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:25:51,707-Speed 2627.45 samples/sec   Loss 5.5293   LearningRate 0.0171   Epoch: 11   Global Step: 486910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:25:55,601-Speed 2630.77 samples/sec   Loss 5.4781   LearningRate 0.0171   Epoch: 11   Global Step: 486920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:25:59,504-Speed 2624.11 samples/sec   Loss 5.5595   LearningRate 0.0171   Epoch: 11   Global Step: 486930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:03,400-Speed 2628.69 samples/sec   Loss 5.5181   LearningRate 0.0171   Epoch: 11   Global Step: 486940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:07,298-Speed 2627.65 samples/sec   Loss 5.5294   LearningRate 0.0171   Epoch: 11   Global Step: 486950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:26:11,192-Speed 2630.21 samples/sec   Loss 5.5517   LearningRate 0.0171   Epoch: 11   Global Step: 486960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:26:15,090-Speed 2628.02 samples/sec   Loss 5.5861   LearningRate 0.0171   Epoch: 11   Global Step: 486970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:26:18,985-Speed 2629.28 samples/sec   Loss 5.4338   LearningRate 0.0171   Epoch: 11   Global Step: 486980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:26:22,873-Speed 2634.44 samples/sec   Loss 5.5404   LearningRate 0.0171   Epoch: 11   Global Step: 486990   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:26,780-Speed 2622.36 samples/sec   Loss 5.6567   LearningRate 0.0171   Epoch: 11   Global Step: 487000   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:30,685-Speed 2622.59 samples/sec   Loss 5.4864   LearningRate 0.0171   Epoch: 11   Global Step: 487010   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:34,587-Speed 2624.56 samples/sec   Loss 5.5651   LearningRate 0.0171   Epoch: 11   Global Step: 487020   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:38,477-Speed 2632.93 samples/sec   Loss 5.5427   LearningRate 0.0171   Epoch: 11   Global Step: 487030   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:42,372-Speed 2630.34 samples/sec   Loss 5.4804   LearningRate 0.0170   Epoch: 11   Global Step: 487040   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:46,273-Speed 2625.52 samples/sec   Loss 5.4780   LearningRate 0.0170   Epoch: 11   Global Step: 487050   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:26:50,234-Speed 2585.95 samples/sec   Loss 5.5564   LearningRate 0.0170   Epoch: 11   Global Step: 487060   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:26:54,154-Speed 2612.60 samples/sec   Loss 5.4689   LearningRate 0.0170   Epoch: 11   Global Step: 487070   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:26:58,048-Speed 2630.80 samples/sec   Loss 5.5169   LearningRate 0.0170   Epoch: 11   Global Step: 487080   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:01,943-Speed 2629.50 samples/sec   Loss 5.5554   LearningRate 0.0170   Epoch: 11   Global Step: 487090   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:05,852-Speed 2620.58 samples/sec   Loss 5.5361   LearningRate 0.0170   Epoch: 11   Global Step: 487100   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:09,754-Speed 2624.23 samples/sec   Loss 5.5633   LearningRate 0.0170   Epoch: 11   Global Step: 487110   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:13,651-Speed 2628.62 samples/sec   Loss 5.5657   LearningRate 0.0170   Epoch: 11   Global Step: 487120   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:17,552-Speed 2625.14 samples/sec   Loss 5.5822   LearningRate 0.0170   Epoch: 11   Global Step: 487130   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:21,451-Speed 2628.00 samples/sec   Loss 5.4692   LearningRate 0.0170   Epoch: 11   Global Step: 487140   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:25,360-Speed 2620.23 samples/sec   Loss 5.4176   LearningRate 0.0170   Epoch: 11   Global Step: 487150   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:27:29,283-Speed 2610.76 samples/sec   Loss 5.4754   LearningRate 0.0170   Epoch: 11   Global Step: 487160   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:27:33,211-Speed 2607.49 samples/sec   Loss 5.5055   LearningRate 0.0170   Epoch: 11   Global Step: 487170   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:27:37,103-Speed 2631.15 samples/sec   Loss 5.6573   LearningRate 0.0170   Epoch: 11   Global Step: 487180   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:27:40,997-Speed 2630.61 samples/sec   Loss 5.5295   LearningRate 0.0170   Epoch: 11   Global Step: 487190   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:27:44,901-Speed 2623.39 samples/sec   Loss 5.5526   LearningRate 0.0170   Epoch: 11   Global Step: 487200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:27:48,800-Speed 2627.02 samples/sec   Loss 5.4530   LearningRate 0.0170   Epoch: 11   Global Step: 487210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:27:52,701-Speed 2625.35 samples/sec   Loss 5.5713   LearningRate 0.0170   Epoch: 11   Global Step: 487220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:27:56,604-Speed 2624.74 samples/sec   Loss 5.5090   LearningRate 0.0170   Epoch: 11   Global Step: 487230   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:00,495-Speed 2631.86 samples/sec   Loss 5.7270   LearningRate 0.0170   Epoch: 11   Global Step: 487240   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:04,388-Speed 2631.05 samples/sec   Loss 5.4363   LearningRate 0.0170   Epoch: 11   Global Step: 487250   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:08,258-Speed 2646.50 samples/sec   Loss 5.5467   LearningRate 0.0170   Epoch: 11   Global Step: 487260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:12,152-Speed 2630.12 samples/sec   Loss 5.5809   LearningRate 0.0170   Epoch: 11   Global Step: 487270   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:16,046-Speed 2630.63 samples/sec   Loss 5.5768   LearningRate 0.0170   Epoch: 11   Global Step: 487280   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:19,952-Speed 2622.03 samples/sec   Loss 5.4872   LearningRate 0.0170   Epoch: 11   Global Step: 487290   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:23,849-Speed 2627.96 samples/sec   Loss 5.5382   LearningRate 0.0170   Epoch: 11   Global Step: 487300   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:27,756-Speed 2621.30 samples/sec   Loss 5.6023   LearningRate 0.0170   Epoch: 11   Global Step: 487310   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:31,653-Speed 2629.33 samples/sec   Loss 5.5336   LearningRate 0.0170   Epoch: 11   Global Step: 487320   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:35,550-Speed 2628.69 samples/sec   Loss 5.4095   LearningRate 0.0170   Epoch: 11   Global Step: 487330   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:39,454-Speed 2623.71 samples/sec   Loss 5.3789   LearningRate 0.0170   Epoch: 11   Global Step: 487340   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:43,353-Speed 2626.29 samples/sec   Loss 5.4829   LearningRate 0.0170   Epoch: 11   Global Step: 487350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:28:47,246-Speed 2631.41 samples/sec   Loss 5.4747   LearningRate 0.0170   Epoch: 11   Global Step: 487360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:28:51,150-Speed 2623.47 samples/sec   Loss 5.5045   LearningRate 0.0170   Epoch: 11   Global Step: 487370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:28:55,050-Speed 2626.78 samples/sec   Loss 5.5284   LearningRate 0.0170   Epoch: 11   Global Step: 487380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:28:58,948-Speed 2627.25 samples/sec   Loss 5.4752   LearningRate 0.0170   Epoch: 11   Global Step: 487390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:02,842-Speed 2630.70 samples/sec   Loss 5.4788   LearningRate 0.0170   Epoch: 11   Global Step: 487400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:06,743-Speed 2625.36 samples/sec   Loss 5.5243   LearningRate 0.0170   Epoch: 11   Global Step: 487410   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:10,646-Speed 2624.25 samples/sec   Loss 5.4249   LearningRate 0.0170   Epoch: 11   Global Step: 487420   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:14,544-Speed 2627.79 samples/sec   Loss 5.4650   LearningRate 0.0170   Epoch: 11   Global Step: 487430   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:18,453-Speed 2620.34 samples/sec   Loss 5.5719   LearningRate 0.0170   Epoch: 11   Global Step: 487440   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:22,349-Speed 2628.67 samples/sec   Loss 5.5544   LearningRate 0.0170   Epoch: 11   Global Step: 487450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:26,229-Speed 2639.88 samples/sec   Loss 5.5789   LearningRate 0.0170   Epoch: 11   Global Step: 487460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:30,124-Speed 2629.77 samples/sec   Loss 5.5495   LearningRate 0.0170   Epoch: 11   Global Step: 487470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:34,019-Speed 2631.64 samples/sec   Loss 5.4173   LearningRate 0.0170   Epoch: 11   Global Step: 487480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:37,926-Speed 2621.38 samples/sec   Loss 5.5035   LearningRate 0.0170   Epoch: 11   Global Step: 487490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:29:41,811-Speed 2636.04 samples/sec   Loss 5.5787   LearningRate 0.0170   Epoch: 11   Global Step: 487500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:29:45,714-Speed 2624.02 samples/sec   Loss 5.4949   LearningRate 0.0170   Epoch: 11   Global Step: 487510   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:29:49,627-Speed 2618.17 samples/sec   Loss 5.5651   LearningRate 0.0170   Epoch: 11   Global Step: 487520   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:29:53,528-Speed 2626.19 samples/sec   Loss 5.5545   LearningRate 0.0170   Epoch: 11   Global Step: 487530   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:29:57,421-Speed 2630.30 samples/sec   Loss 5.4822   LearningRate 0.0170   Epoch: 11   Global Step: 487540   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:01,324-Speed 2624.56 samples/sec   Loss 5.4814   LearningRate 0.0170   Epoch: 11   Global Step: 487550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:05,225-Speed 2625.54 samples/sec   Loss 5.4530   LearningRate 0.0170   Epoch: 11   Global Step: 487560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:09,129-Speed 2623.52 samples/sec   Loss 5.4256   LearningRate 0.0170   Epoch: 11   Global Step: 487570   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:13,026-Speed 2628.24 samples/sec   Loss 5.5013   LearningRate 0.0170   Epoch: 11   Global Step: 487580   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:16,932-Speed 2622.14 samples/sec   Loss 5.3979   LearningRate 0.0170   Epoch: 11   Global Step: 487590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:20,832-Speed 2626.14 samples/sec   Loss 5.4478   LearningRate 0.0170   Epoch: 11   Global Step: 487600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:24,731-Speed 2627.12 samples/sec   Loss 5.5318   LearningRate 0.0170   Epoch: 11   Global Step: 487610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:28,628-Speed 2628.34 samples/sec   Loss 5.5108   LearningRate 0.0170   Epoch: 11   Global Step: 487620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:32,526-Speed 2627.92 samples/sec   Loss 5.4312   LearningRate 0.0170   Epoch: 11   Global Step: 487630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:36,424-Speed 2627.31 samples/sec   Loss 5.6507   LearningRate 0.0170   Epoch: 11   Global Step: 487640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:40,331-Speed 2621.27 samples/sec   Loss 5.3474   LearningRate 0.0170   Epoch: 11   Global Step: 487650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:44,232-Speed 2626.21 samples/sec   Loss 5.5119   LearningRate 0.0170   Epoch: 11   Global Step: 487660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:48,127-Speed 2629.67 samples/sec   Loss 5.4824   LearningRate 0.0170   Epoch: 11   Global Step: 487670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:30:52,007-Speed 2642.56 samples/sec   Loss 5.5479   LearningRate 0.0170   Epoch: 11   Global Step: 487680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:55,902-Speed 2629.55 samples/sec   Loss 5.5426   LearningRate 0.0170   Epoch: 11   Global Step: 487690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:30:59,805-Speed 2624.20 samples/sec   Loss 5.4909   LearningRate 0.0170   Epoch: 11   Global Step: 487700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:03,702-Speed 2628.13 samples/sec   Loss 5.4258   LearningRate 0.0170   Epoch: 11   Global Step: 487710   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:07,604-Speed 2625.36 samples/sec   Loss 5.4747   LearningRate 0.0170   Epoch: 11   Global Step: 487720   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:11,505-Speed 2625.63 samples/sec   Loss 5.4530   LearningRate 0.0170   Epoch: 11   Global Step: 487730   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:15,399-Speed 2629.76 samples/sec   Loss 5.5132   LearningRate 0.0170   Epoch: 11   Global Step: 487740   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:19,299-Speed 2626.80 samples/sec   Loss 5.5852   LearningRate 0.0170   Epoch: 11   Global Step: 487750   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:23,194-Speed 2629.58 samples/sec   Loss 5.4814   LearningRate 0.0170   Epoch: 11   Global Step: 487760   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:27,088-Speed 2630.45 samples/sec   Loss 5.4824   LearningRate 0.0170   Epoch: 11   Global Step: 487770   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:30,980-Speed 2631.31 samples/sec   Loss 5.5818   LearningRate 0.0170   Epoch: 11   Global Step: 487780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:31:34,874-Speed 2630.20 samples/sec   Loss 5.4723   LearningRate 0.0170   Epoch: 11   Global Step: 487790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:31:38,747-Speed 2644.37 samples/sec   Loss 5.4888   LearningRate 0.0170   Epoch: 11   Global Step: 487800   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:42,654-Speed 2621.30 samples/sec   Loss 5.4062   LearningRate 0.0170   Epoch: 11   Global Step: 487810   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:46,550-Speed 2629.13 samples/sec   Loss 5.5550   LearningRate 0.0170   Epoch: 11   Global Step: 487820   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:50,455-Speed 2623.22 samples/sec   Loss 5.5733   LearningRate 0.0170   Epoch: 11   Global Step: 487830   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:54,351-Speed 2629.15 samples/sec   Loss 5.6516   LearningRate 0.0170   Epoch: 11   Global Step: 487840   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:31:58,249-Speed 2627.70 samples/sec   Loss 5.6044   LearningRate 0.0170   Epoch: 11   Global Step: 487850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:02,147-Speed 2627.49 samples/sec   Loss 5.4892   LearningRate 0.0170   Epoch: 11   Global Step: 487860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:06,046-Speed 2627.09 samples/sec   Loss 5.5790   LearningRate 0.0170   Epoch: 11   Global Step: 487870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:09,952-Speed 2621.46 samples/sec   Loss 5.5382   LearningRate 0.0170   Epoch: 11   Global Step: 487880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:13,859-Speed 2621.89 samples/sec   Loss 5.4898   LearningRate 0.0170   Epoch: 11   Global Step: 487890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:17,756-Speed 2628.20 samples/sec   Loss 5.4102   LearningRate 0.0170   Epoch: 11   Global Step: 487900   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:32:21,657-Speed 2625.97 samples/sec   Loss 5.4854   LearningRate 0.0170   Epoch: 11   Global Step: 487910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:32:25,557-Speed 2626.52 samples/sec   Loss 5.5722   LearningRate 0.0170   Epoch: 11   Global Step: 487920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:32:29,435-Speed 2641.14 samples/sec   Loss 5.5067   LearningRate 0.0170   Epoch: 11   Global Step: 487930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:33,340-Speed 2622.94 samples/sec   Loss 5.4239   LearningRate 0.0170   Epoch: 11   Global Step: 487940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:37,240-Speed 2625.91 samples/sec   Loss 5.4636   LearningRate 0.0170   Epoch: 11   Global Step: 487950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:41,138-Speed 2627.55 samples/sec   Loss 5.6092   LearningRate 0.0170   Epoch: 11   Global Step: 487960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:45,042-Speed 2624.53 samples/sec   Loss 5.5859   LearningRate 0.0170   Epoch: 11   Global Step: 487970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:48,945-Speed 2624.53 samples/sec   Loss 5.6080   LearningRate 0.0170   Epoch: 11   Global Step: 487980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:52,848-Speed 2624.16 samples/sec   Loss 5.4282   LearningRate 0.0170   Epoch: 11   Global Step: 487990   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:32:56,756-Speed 2621.05 samples/sec   Loss 5.4920   LearningRate 0.0170   Epoch: 11   Global Step: 488000   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:33:00,662-Speed 2622.10 samples/sec   Loss 5.4411   LearningRate 0.0170   Epoch: 11   Global Step: 488010   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:33:04,567-Speed 2623.19 samples/sec   Loss 5.4987   LearningRate 0.0170   Epoch: 11   Global Step: 488020   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:33:08,463-Speed 2628.40 samples/sec   Loss 5.5162   LearningRate 0.0170   Epoch: 11   Global Step: 488030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:12,357-Speed 2631.29 samples/sec   Loss 5.4638   LearningRate 0.0169   Epoch: 11   Global Step: 488040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:16,255-Speed 2627.45 samples/sec   Loss 5.4510   LearningRate 0.0169   Epoch: 11   Global Step: 488050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:20,156-Speed 2626.16 samples/sec   Loss 5.6024   LearningRate 0.0169   Epoch: 11   Global Step: 488060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:24,053-Speed 2628.19 samples/sec   Loss 5.5363   LearningRate 0.0169   Epoch: 11   Global Step: 488070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:27,953-Speed 2626.15 samples/sec   Loss 5.4573   LearningRate 0.0169   Epoch: 11   Global Step: 488080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:31,854-Speed 2625.17 samples/sec   Loss 5.5814   LearningRate 0.0169   Epoch: 11   Global Step: 488090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:35,747-Speed 2631.24 samples/sec   Loss 5.4760   LearningRate 0.0169   Epoch: 11   Global Step: 488100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:39,640-Speed 2631.01 samples/sec   Loss 5.4394   LearningRate 0.0169   Epoch: 11   Global Step: 488110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:43,542-Speed 2625.11 samples/sec   Loss 5.5530   LearningRate 0.0169   Epoch: 11   Global Step: 488120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:47,415-Speed 2644.39 samples/sec   Loss 5.4834   LearningRate 0.0169   Epoch: 11   Global Step: 488130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:51,309-Speed 2630.83 samples/sec   Loss 5.4917   LearningRate 0.0169   Epoch: 11   Global Step: 488140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:55,203-Speed 2629.67 samples/sec   Loss 5.4726   LearningRate 0.0169   Epoch: 11   Global Step: 488150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:33:59,099-Speed 2629.15 samples/sec   Loss 5.5148   LearningRate 0.0169   Epoch: 11   Global Step: 488160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:34:02,986-Speed 2635.44 samples/sec   Loss 5.5083   LearningRate 0.0169   Epoch: 11   Global Step: 488170   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:06,888-Speed 2624.66 samples/sec   Loss 5.4502   LearningRate 0.0169   Epoch: 11   Global Step: 488180   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:10,784-Speed 2628.32 samples/sec   Loss 5.4838   LearningRate 0.0169   Epoch: 11   Global Step: 488190   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:14,682-Speed 2628.11 samples/sec   Loss 5.4425   LearningRate 0.0169   Epoch: 11   Global Step: 488200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:18,582-Speed 2626.48 samples/sec   Loss 5.5892   LearningRate 0.0169   Epoch: 11   Global Step: 488210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:22,478-Speed 2629.03 samples/sec   Loss 5.4633   LearningRate 0.0169   Epoch: 11   Global Step: 488220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:26,376-Speed 2628.18 samples/sec   Loss 5.5943   LearningRate 0.0169   Epoch: 11   Global Step: 488230   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:30,270-Speed 2629.84 samples/sec   Loss 5.5583   LearningRate 0.0169   Epoch: 11   Global Step: 488240   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:34,183-Speed 2617.68 samples/sec   Loss 5.5218   LearningRate 0.0169   Epoch: 11   Global Step: 488250   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:38,080-Speed 2628.36 samples/sec   Loss 5.4728   LearningRate 0.0169   Epoch: 11   Global Step: 488260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:34:41,977-Speed 2628.04 samples/sec   Loss 5.5828   LearningRate 0.0169   Epoch: 11   Global Step: 488270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:34:45,875-Speed 2627.43 samples/sec   Loss 5.4623   LearningRate 0.0169   Epoch: 11   Global Step: 488280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:34:49,774-Speed 2627.95 samples/sec   Loss 5.4645   LearningRate 0.0169   Epoch: 11   Global Step: 488290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:34:53,671-Speed 2628.47 samples/sec   Loss 5.5053   LearningRate 0.0169   Epoch: 11   Global Step: 488300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:34:57,565-Speed 2630.12 samples/sec   Loss 5.5708   LearningRate 0.0169   Epoch: 11   Global Step: 488310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:35:01,459-Speed 2630.82 samples/sec   Loss 5.5114   LearningRate 0.0169   Epoch: 11   Global Step: 488320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:35:05,374-Speed 2616.08 samples/sec   Loss 5.5576   LearningRate 0.0169   Epoch: 11   Global Step: 488330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:35:09,270-Speed 2628.39 samples/sec   Loss 5.5036   LearningRate 0.0169   Epoch: 11   Global Step: 488340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:35:13,151-Speed 2638.87 samples/sec   Loss 5.5395   LearningRate 0.0169   Epoch: 11   Global Step: 488350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:17,049-Speed 2628.26 samples/sec   Loss 5.5235   LearningRate 0.0169   Epoch: 11   Global Step: 488360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:20,945-Speed 2628.82 samples/sec   Loss 5.5231   LearningRate 0.0169   Epoch: 11   Global Step: 488370   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:24,850-Speed 2623.61 samples/sec   Loss 5.5069   LearningRate 0.0169   Epoch: 11   Global Step: 488380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:28,746-Speed 2628.87 samples/sec   Loss 5.5629   LearningRate 0.0169   Epoch: 11   Global Step: 488390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:32,644-Speed 2628.18 samples/sec   Loss 5.4769   LearningRate 0.0169   Epoch: 11   Global Step: 488400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:36,569-Speed 2609.26 samples/sec   Loss 5.5376   LearningRate 0.0169   Epoch: 11   Global Step: 488410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:40,464-Speed 2629.94 samples/sec   Loss 5.4597   LearningRate 0.0169   Epoch: 11   Global Step: 488420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:44,357-Speed 2630.43 samples/sec   Loss 5.4975   LearningRate 0.0169   Epoch: 11   Global Step: 488430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:48,255-Speed 2628.56 samples/sec   Loss 5.5251   LearningRate 0.0169   Epoch: 11   Global Step: 488440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:35:52,173-Speed 2614.15 samples/sec   Loss 5.5499   LearningRate 0.0169   Epoch: 11   Global Step: 488450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:35:56,084-Speed 2619.26 samples/sec   Loss 5.5377   LearningRate 0.0169   Epoch: 11   Global Step: 488460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:35:59,992-Speed 2620.74 samples/sec   Loss 5.4976   LearningRate 0.0169   Epoch: 11   Global Step: 488470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:03,887-Speed 2630.44 samples/sec   Loss 5.5198   LearningRate 0.0169   Epoch: 11   Global Step: 488480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:07,780-Speed 2630.53 samples/sec   Loss 5.5034   LearningRate 0.0169   Epoch: 11   Global Step: 488490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:11,675-Speed 2629.24 samples/sec   Loss 5.4421   LearningRate 0.0169   Epoch: 11   Global Step: 488500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:15,577-Speed 2624.98 samples/sec   Loss 5.6153   LearningRate 0.0169   Epoch: 11   Global Step: 488510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:19,480-Speed 2624.50 samples/sec   Loss 5.5905   LearningRate 0.0169   Epoch: 11   Global Step: 488520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:23,375-Speed 2629.46 samples/sec   Loss 5.5419   LearningRate 0.0169   Epoch: 11   Global Step: 488530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:27,279-Speed 2631.10 samples/sec   Loss 5.4632   LearningRate 0.0169   Epoch: 11   Global Step: 488540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:36:31,155-Speed 2642.63 samples/sec   Loss 5.5732   LearningRate 0.0169   Epoch: 11   Global Step: 488550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:36:35,055-Speed 2628.62 samples/sec   Loss 5.4974   LearningRate 0.0169   Epoch: 11   Global Step: 488560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:36:38,949-Speed 2630.42 samples/sec   Loss 5.5153   LearningRate 0.0169   Epoch: 11   Global Step: 488570   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:36:42,845-Speed 2628.68 samples/sec   Loss 5.4507   LearningRate 0.0169   Epoch: 11   Global Step: 488580   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:36:46,742-Speed 2628.36 samples/sec   Loss 5.5034   LearningRate 0.0169   Epoch: 11   Global Step: 488590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:36:50,633-Speed 2632.10 samples/sec   Loss 5.5048   LearningRate 0.0169   Epoch: 11   Global Step: 488600   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:36:54,540-Speed 2622.52 samples/sec   Loss 5.4537   LearningRate 0.0169   Epoch: 11   Global Step: 488610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:36:58,433-Speed 2630.86 samples/sec   Loss 5.5768   LearningRate 0.0169   Epoch: 11   Global Step: 488620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:02,330-Speed 2628.09 samples/sec   Loss 5.5047   LearningRate 0.0169   Epoch: 11   Global Step: 488630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:06,225-Speed 2629.31 samples/sec   Loss 5.5563   LearningRate 0.0169   Epoch: 11   Global Step: 488640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:10,118-Speed 2631.65 samples/sec   Loss 5.5145   LearningRate 0.0169   Epoch: 11   Global Step: 488650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:37:14,013-Speed 2630.03 samples/sec   Loss 5.4363   LearningRate 0.0169   Epoch: 11   Global Step: 488660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:37:17,910-Speed 2627.93 samples/sec   Loss 5.4772   LearningRate 0.0169   Epoch: 11   Global Step: 488670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:37:21,786-Speed 2642.95 samples/sec   Loss 5.5731   LearningRate 0.0169   Epoch: 11   Global Step: 488680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:25,693-Speed 2621.44 samples/sec   Loss 5.5382   LearningRate 0.0169   Epoch: 11   Global Step: 488690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:29,588-Speed 2630.14 samples/sec   Loss 5.4993   LearningRate 0.0169   Epoch: 11   Global Step: 488700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:33,491-Speed 2623.85 samples/sec   Loss 5.4832   LearningRate 0.0169   Epoch: 11   Global Step: 488710   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:37,390-Speed 2627.02 samples/sec   Loss 5.5377   LearningRate 0.0169   Epoch: 11   Global Step: 488720   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:41,300-Speed 2619.64 samples/sec   Loss 5.5396   LearningRate 0.0169   Epoch: 11   Global Step: 488730   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:45,200-Speed 2626.68 samples/sec   Loss 5.4572   LearningRate 0.0169   Epoch: 11   Global Step: 488740   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:49,107-Speed 2621.60 samples/sec   Loss 5.5600   LearningRate 0.0169   Epoch: 11   Global Step: 488750   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:53,021-Speed 2616.34 samples/sec   Loss 5.5058   LearningRate 0.0169   Epoch: 11   Global Step: 488760   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:37:56,916-Speed 2630.48 samples/sec   Loss 5.4999   LearningRate 0.0169   Epoch: 11   Global Step: 488770   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:00,812-Speed 2629.13 samples/sec   Loss 5.5822   LearningRate 0.0169   Epoch: 11   Global Step: 488780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:38:04,686-Speed 2643.60 samples/sec   Loss 5.5355   LearningRate 0.0169   Epoch: 11   Global Step: 488790   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:08,577-Speed 2632.19 samples/sec   Loss 5.4815   LearningRate 0.0169   Epoch: 11   Global Step: 488800   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:12,471-Speed 2630.43 samples/sec   Loss 5.4491   LearningRate 0.0169   Epoch: 11   Global Step: 488810   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:16,369-Speed 2628.32 samples/sec   Loss 5.4751   LearningRate 0.0169   Epoch: 11   Global Step: 488820   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:20,263-Speed 2630.17 samples/sec   Loss 5.4945   LearningRate 0.0169   Epoch: 11   Global Step: 488830   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:24,161-Speed 2627.72 samples/sec   Loss 5.4313   LearningRate 0.0169   Epoch: 11   Global Step: 488840   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:28,074-Speed 2623.95 samples/sec   Loss 5.5724   LearningRate 0.0169   Epoch: 11   Global Step: 488850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:31,975-Speed 2625.60 samples/sec   Loss 5.5953   LearningRate 0.0169   Epoch: 11   Global Step: 488860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:35,877-Speed 2624.98 samples/sec   Loss 5.5008   LearningRate 0.0169   Epoch: 11   Global Step: 488870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:39,778-Speed 2625.49 samples/sec   Loss 5.4548   LearningRate 0.0169   Epoch: 11   Global Step: 488880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:43,652-Speed 2644.72 samples/sec   Loss 5.5511   LearningRate 0.0169   Epoch: 11   Global Step: 488890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:47,553-Speed 2625.09 samples/sec   Loss 5.4598   LearningRate 0.0169   Epoch: 11   Global Step: 488900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:51,492-Speed 2601.28 samples/sec   Loss 5.5643   LearningRate 0.0169   Epoch: 11   Global Step: 488910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:55,382-Speed 2632.64 samples/sec   Loss 5.5379   LearningRate 0.0169   Epoch: 11   Global Step: 488920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:38:59,278-Speed 2629.67 samples/sec   Loss 5.5195   LearningRate 0.0169   Epoch: 11   Global Step: 488930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:39:03,172-Speed 2630.13 samples/sec   Loss 5.5684   LearningRate 0.0169   Epoch: 11   Global Step: 488940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:39:07,068-Speed 2629.09 samples/sec   Loss 5.5975   LearningRate 0.0169   Epoch: 11   Global Step: 488950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:39:10,956-Speed 2634.04 samples/sec   Loss 5.6526   LearningRate 0.0169   Epoch: 11   Global Step: 488960   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:14,871-Speed 2616.44 samples/sec   Loss 5.5021   LearningRate 0.0169   Epoch: 11   Global Step: 488970   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:18,769-Speed 2627.22 samples/sec   Loss 5.4446   LearningRate 0.0169   Epoch: 11   Global Step: 488980   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:22,675-Speed 2623.25 samples/sec   Loss 5.4753   LearningRate 0.0169   Epoch: 11   Global Step: 488990   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:26,566-Speed 2631.77 samples/sec   Loss 5.4650   LearningRate 0.0169   Epoch: 11   Global Step: 489000   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:30,468-Speed 2625.48 samples/sec   Loss 5.5158   LearningRate 0.0169   Epoch: 11   Global Step: 489010   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:34,372-Speed 2623.47 samples/sec   Loss 5.4408   LearningRate 0.0169   Epoch: 11   Global Step: 489020   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:38,277-Speed 2622.65 samples/sec   Loss 5.5694   LearningRate 0.0169   Epoch: 11   Global Step: 489030   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:42,180-Speed 2624.61 samples/sec   Loss 5.5603   LearningRate 0.0169   Epoch: 11   Global Step: 489040   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:46,077-Speed 2627.80 samples/sec   Loss 5.5132   LearningRate 0.0168   Epoch: 11   Global Step: 489050   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:39:49,980-Speed 2624.73 samples/sec   Loss 5.5354   LearningRate 0.0168   Epoch: 11   Global Step: 489060   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:39:53,872-Speed 2631.54 samples/sec   Loss 5.4920   LearningRate 0.0168   Epoch: 11   Global Step: 489070   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:39:57,770-Speed 2627.93 samples/sec   Loss 5.5187   LearningRate 0.0168   Epoch: 11   Global Step: 489080   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:01,700-Speed 2606.29 samples/sec   Loss 5.3959   LearningRate 0.0168   Epoch: 11   Global Step: 489090   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:05,596-Speed 2629.12 samples/sec   Loss 5.5027   LearningRate 0.0168   Epoch: 11   Global Step: 489100   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:09,497-Speed 2626.55 samples/sec   Loss 5.4914   LearningRate 0.0168   Epoch: 11   Global Step: 489110   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:13,399-Speed 2625.36 samples/sec   Loss 5.5306   LearningRate 0.0168   Epoch: 11   Global Step: 489120   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:17,295-Speed 2628.54 samples/sec   Loss 5.5219   LearningRate 0.0168   Epoch: 11   Global Step: 489130   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:21,190-Speed 2629.89 samples/sec   Loss 5.5210   LearningRate 0.0168   Epoch: 11   Global Step: 489140   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:25,082-Speed 2631.66 samples/sec   Loss 5.4892   LearningRate 0.0168   Epoch: 11   Global Step: 489150   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:28,978-Speed 2629.30 samples/sec   Loss 5.3985   LearningRate 0.0168   Epoch: 11   Global Step: 489160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:40:32,874-Speed 2629.02 samples/sec   Loss 5.4273   LearningRate 0.0168   Epoch: 11   Global Step: 489170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:40:36,759-Speed 2636.62 samples/sec   Loss 5.5435   LearningRate 0.0168   Epoch: 11   Global Step: 489180   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:40,658-Speed 2626.22 samples/sec   Loss 5.5085   LearningRate 0.0168   Epoch: 11   Global Step: 489190   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:44,556-Speed 2627.91 samples/sec   Loss 5.4944   LearningRate 0.0168   Epoch: 11   Global Step: 489200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:48,460-Speed 2623.83 samples/sec   Loss 5.4080   LearningRate 0.0168   Epoch: 11   Global Step: 489210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:52,357-Speed 2628.64 samples/sec   Loss 5.4267   LearningRate 0.0168   Epoch: 11   Global Step: 489220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:40:56,251-Speed 2630.26 samples/sec   Loss 5.4375   LearningRate 0.0168   Epoch: 11   Global Step: 489230   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:00,148-Speed 2628.57 samples/sec   Loss 5.5445   LearningRate 0.0168   Epoch: 11   Global Step: 489240   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:04,045-Speed 2628.75 samples/sec   Loss 5.4511   LearningRate 0.0168   Epoch: 11   Global Step: 489250   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:07,946-Speed 2625.19 samples/sec   Loss 5.4406   LearningRate 0.0168   Epoch: 11   Global Step: 489260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:11,842-Speed 2628.70 samples/sec   Loss 5.4616   LearningRate 0.0168   Epoch: 11   Global Step: 489270   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:15,736-Speed 2630.43 samples/sec   Loss 5.3993   LearningRate 0.0168   Epoch: 11   Global Step: 489280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:41:19,682-Speed 2596.68 samples/sec   Loss 5.5016   LearningRate 0.0168   Epoch: 11   Global Step: 489290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:41:23,700-Speed 2549.14 samples/sec   Loss 5.5345   LearningRate 0.0168   Epoch: 11   Global Step: 489300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:41:27,577-Speed 2641.82 samples/sec   Loss 5.5304   LearningRate 0.0168   Epoch: 11   Global Step: 489310   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:31,485-Speed 2621.29 samples/sec   Loss 5.4402   LearningRate 0.0168   Epoch: 11   Global Step: 489320   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:35,382-Speed 2628.20 samples/sec   Loss 5.3448   LearningRate 0.0168   Epoch: 11   Global Step: 489330   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:39,286-Speed 2623.34 samples/sec   Loss 5.4632   LearningRate 0.0168   Epoch: 11   Global Step: 489340   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:43,195-Speed 2620.17 samples/sec   Loss 5.5735   LearningRate 0.0168   Epoch: 11   Global Step: 489350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:47,093-Speed 2627.69 samples/sec   Loss 5.5068   LearningRate 0.0168   Epoch: 11   Global Step: 489360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:50,991-Speed 2627.91 samples/sec   Loss 5.5275   LearningRate 0.0168   Epoch: 11   Global Step: 489370   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:54,889-Speed 2628.03 samples/sec   Loss 5.4928   LearningRate 0.0168   Epoch: 11   Global Step: 489380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:41:58,784-Speed 2629.38 samples/sec   Loss 5.5431   LearningRate 0.0168   Epoch: 11   Global Step: 489390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:42:02,652-Speed 2648.15 samples/sec   Loss 5.4973   LearningRate 0.0168   Epoch: 11   Global Step: 489400   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:06,554-Speed 2625.02 samples/sec   Loss 5.5108   LearningRate 0.0168   Epoch: 11   Global Step: 489410   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:10,464-Speed 2619.55 samples/sec   Loss 5.5529   LearningRate 0.0168   Epoch: 11   Global Step: 489420   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:14,382-Speed 2613.68 samples/sec   Loss 5.3750   LearningRate 0.0168   Epoch: 11   Global Step: 489430   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:18,280-Speed 2628.29 samples/sec   Loss 5.5056   LearningRate 0.0168   Epoch: 11   Global Step: 489440   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:22,177-Speed 2628.58 samples/sec   Loss 5.5008   LearningRate 0.0168   Epoch: 11   Global Step: 489450   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:26,070-Speed 2631.00 samples/sec   Loss 5.4733   LearningRate 0.0168   Epoch: 11   Global Step: 489460   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:29,964-Speed 2630.72 samples/sec   Loss 5.4527   LearningRate 0.0168   Epoch: 11   Global Step: 489470   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:33,859-Speed 2629.46 samples/sec   Loss 5.5028   LearningRate 0.0168   Epoch: 11   Global Step: 489480   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:37,758-Speed 2627.28 samples/sec   Loss 5.5016   LearningRate 0.0168   Epoch: 11   Global Step: 489490   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-15 02:42:41,656-Speed 2627.22 samples/sec   Loss 5.3653   LearningRate 0.0168   Epoch: 11   Global Step: 489500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:42:45,552-Speed 2629.10 samples/sec   Loss 5.5006   LearningRate 0.0168   Epoch: 11   Global Step: 489510   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:42:49,454-Speed 2624.41 samples/sec   Loss 5.5288   LearningRate 0.0168   Epoch: 11   Global Step: 489520   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:42:53,355-Speed 2626.21 samples/sec   Loss 5.5835   LearningRate 0.0168   Epoch: 11   Global Step: 489530   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:42:57,266-Speed 2618.30 samples/sec   Loss 5.6440   LearningRate 0.0168   Epoch: 11   Global Step: 489540   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:01,169-Speed 2624.53 samples/sec   Loss 5.5526   LearningRate 0.0168   Epoch: 11   Global Step: 489550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:05,066-Speed 2627.99 samples/sec   Loss 5.6142   LearningRate 0.0168   Epoch: 11   Global Step: 489560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:08,972-Speed 2622.69 samples/sec   Loss 5.5327   LearningRate 0.0168   Epoch: 11   Global Step: 489570   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:12,870-Speed 2627.85 samples/sec   Loss 5.5029   LearningRate 0.0168   Epoch: 11   Global Step: 489580   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:16,771-Speed 2625.07 samples/sec   Loss 5.5369   LearningRate 0.0168   Epoch: 11   Global Step: 489590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:20,675-Speed 2623.50 samples/sec   Loss 5.4305   LearningRate 0.0168   Epoch: 11   Global Step: 489600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:43:24,580-Speed 2622.74 samples/sec   Loss 5.5656   LearningRate 0.0168   Epoch: 11   Global Step: 489610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:43:28,486-Speed 2622.65 samples/sec   Loss 5.4077   LearningRate 0.0168   Epoch: 11   Global Step: 489620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:43:32,400-Speed 2617.00 samples/sec   Loss 5.4781   LearningRate 0.0168   Epoch: 11   Global Step: 489630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:43:36,303-Speed 2623.61 samples/sec   Loss 5.4942   LearningRate 0.0168   Epoch: 11   Global Step: 489640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:43:40,188-Speed 2636.48 samples/sec   Loss 5.5693   LearningRate 0.0168   Epoch: 11   Global Step: 489650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:44,091-Speed 2633.09 samples/sec   Loss 5.5880   LearningRate 0.0168   Epoch: 11   Global Step: 489660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:47,989-Speed 2627.23 samples/sec   Loss 5.3621   LearningRate 0.0168   Epoch: 11   Global Step: 489670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:51,885-Speed 2629.47 samples/sec   Loss 5.5500   LearningRate 0.0168   Epoch: 11   Global Step: 489680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:55,776-Speed 2631.81 samples/sec   Loss 5.6047   LearningRate 0.0168   Epoch: 11   Global Step: 489690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:43:59,675-Speed 2627.34 samples/sec   Loss 5.4925   LearningRate 0.0168   Epoch: 11   Global Step: 489700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:44:03,582-Speed 2620.83 samples/sec   Loss 5.5143   LearningRate 0.0168   Epoch: 11   Global Step: 489710   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:44:07,491-Speed 2620.61 samples/sec   Loss 5.5377   LearningRate 0.0168   Epoch: 11   Global Step: 489720   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:44:11,389-Speed 2627.42 samples/sec   Loss 5.5293   LearningRate 0.0168   Epoch: 11   Global Step: 489730   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:44:15,285-Speed 2629.09 samples/sec   Loss 5.5634   LearningRate 0.0168   Epoch: 11   Global Step: 489740   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:44:19,180-Speed 2629.53 samples/sec   Loss 5.4850   LearningRate 0.0168   Epoch: 11   Global Step: 489750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:23,077-Speed 2628.67 samples/sec   Loss 5.5709   LearningRate 0.0168   Epoch: 11   Global Step: 489760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:26,967-Speed 2632.69 samples/sec   Loss 5.3993   LearningRate 0.0168   Epoch: 11   Global Step: 489770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:30,859-Speed 2631.60 samples/sec   Loss 5.5918   LearningRate 0.0168   Epoch: 11   Global Step: 489780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:34,751-Speed 2631.56 samples/sec   Loss 5.4358   LearningRate 0.0168   Epoch: 11   Global Step: 489790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:38,646-Speed 2629.81 samples/sec   Loss 5.4403   LearningRate 0.0168   Epoch: 11   Global Step: 489800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:42,548-Speed 2624.57 samples/sec   Loss 5.4689   LearningRate 0.0168   Epoch: 11   Global Step: 489810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:46,445-Speed 2628.70 samples/sec   Loss 5.4023   LearningRate 0.0168   Epoch: 11   Global Step: 489820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:50,337-Speed 2631.99 samples/sec   Loss 5.6068   LearningRate 0.0168   Epoch: 11   Global Step: 489830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:54,231-Speed 2629.99 samples/sec   Loss 5.5041   LearningRate 0.0168   Epoch: 11   Global Step: 489840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:44:58,100-Speed 2647.32 samples/sec   Loss 5.4487   LearningRate 0.0168   Epoch: 11   Global Step: 489850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:45:01,970-Speed 2647.09 samples/sec   Loss 5.4336   LearningRate 0.0168   Epoch: 11   Global Step: 489860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:05,863-Speed 2630.61 samples/sec   Loss 5.4891   LearningRate 0.0168   Epoch: 11   Global Step: 489870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:09,760-Speed 2627.96 samples/sec   Loss 5.4193   LearningRate 0.0168   Epoch: 11   Global Step: 489880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:13,662-Speed 2625.41 samples/sec   Loss 5.4922   LearningRate 0.0168   Epoch: 11   Global Step: 489890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:17,577-Speed 2615.99 samples/sec   Loss 5.4629   LearningRate 0.0168   Epoch: 11   Global Step: 489900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:21,476-Speed 2626.85 samples/sec   Loss 5.4301   LearningRate 0.0168   Epoch: 11   Global Step: 489910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:25,380-Speed 2623.13 samples/sec   Loss 5.5504   LearningRate 0.0168   Epoch: 11   Global Step: 489920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:29,274-Speed 2630.67 samples/sec   Loss 5.3835   LearningRate 0.0168   Epoch: 11   Global Step: 489930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:33,169-Speed 2629.49 samples/sec   Loss 5.4835   LearningRate 0.0168   Epoch: 11   Global Step: 489940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:37,075-Speed 2622.32 samples/sec   Loss 5.4617   LearningRate 0.0168   Epoch: 11   Global Step: 489950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:45:40,972-Speed 2628.07 samples/sec   Loss 5.3919   LearningRate 0.0168   Epoch: 11   Global Step: 489960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:45:44,870-Speed 2627.79 samples/sec   Loss 5.5527   LearningRate 0.0168   Epoch: 11   Global Step: 489970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:45:48,767-Speed 2628.36 samples/sec   Loss 5.4004   LearningRate 0.0168   Epoch: 11   Global Step: 489980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:45:52,665-Speed 2627.75 samples/sec   Loss 5.4822   LearningRate 0.0168   Epoch: 11   Global Step: 489990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:45:56,557-Speed 2631.85 samples/sec   Loss 5.5311   LearningRate 0.0168   Epoch: 11   Global Step: 490000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:46:39,240-[lfw][490000]XNorm: 23.497655
Training: 2022-04-15 02:46:39,241-[lfw][490000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 02:46:39,241-[lfw][490000]Accuracy-Highest: 0.99800
Training: 2022-04-15 02:47:28,918-[cfp_fp][490000]XNorm: 22.108973
Training: 2022-04-15 02:47:28,919-[cfp_fp][490000]Accuracy-Flip: 0.98971+-0.00541
Training: 2022-04-15 02:47:28,919-[cfp_fp][490000]Accuracy-Highest: 0.98971
Training: 2022-04-15 02:48:11,543-[agedb_30][490000]XNorm: 23.511800
Training: 2022-04-15 02:48:11,544-[agedb_30][490000]Accuracy-Flip: 0.97950+-0.00742
Training: 2022-04-15 02:48:11,544-[agedb_30][490000]Accuracy-Highest: 0.97950
Training: 2022-04-15 02:48:15,385-Speed 73.76 samples/sec   Loss 5.4498   LearningRate 0.0168   Epoch: 11   Global Step: 490010   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:19,253-Speed 2647.78 samples/sec   Loss 5.5507   LearningRate 0.0168   Epoch: 11   Global Step: 490020   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:23,121-Speed 2648.68 samples/sec   Loss 5.4385   LearningRate 0.0168   Epoch: 11   Global Step: 490030   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:26,992-Speed 2645.77 samples/sec   Loss 5.4587   LearningRate 0.0168   Epoch: 11   Global Step: 490040   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:30,872-Speed 2639.39 samples/sec   Loss 5.5087   LearningRate 0.0168   Epoch: 11   Global Step: 490050   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:34,770-Speed 2627.42 samples/sec   Loss 5.5025   LearningRate 0.0167   Epoch: 11   Global Step: 490060   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:38,648-Speed 2641.70 samples/sec   Loss 5.4614   LearningRate 0.0167   Epoch: 11   Global Step: 490070   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:42,526-Speed 2641.15 samples/sec   Loss 5.5429   LearningRate 0.0167   Epoch: 11   Global Step: 490080   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:46,405-Speed 2639.83 samples/sec   Loss 5.3692   LearningRate 0.0167   Epoch: 11   Global Step: 490090   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:50,286-Speed 2639.69 samples/sec   Loss 5.4657   LearningRate 0.0167   Epoch: 11   Global Step: 490100   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:48:54,174-Speed 2633.94 samples/sec   Loss 5.3593   LearningRate 0.0167   Epoch: 11   Global Step: 490110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:48:58,056-Speed 2638.50 samples/sec   Loss 5.4743   LearningRate 0.0167   Epoch: 11   Global Step: 490120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:01,951-Speed 2629.49 samples/sec   Loss 5.4683   LearningRate 0.0167   Epoch: 11   Global Step: 490130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:05,843-Speed 2631.62 samples/sec   Loss 5.5154   LearningRate 0.0167   Epoch: 11   Global Step: 490140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:09,738-Speed 2629.54 samples/sec   Loss 5.4741   LearningRate 0.0167   Epoch: 11   Global Step: 490150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:13,633-Speed 2629.87 samples/sec   Loss 5.4397   LearningRate 0.0167   Epoch: 11   Global Step: 490160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:17,554-Speed 2612.52 samples/sec   Loss 5.4352   LearningRate 0.0167   Epoch: 11   Global Step: 490170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:21,451-Speed 2628.10 samples/sec   Loss 5.4378   LearningRate 0.0167   Epoch: 11   Global Step: 490180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:25,351-Speed 2625.97 samples/sec   Loss 5.4283   LearningRate 0.0167   Epoch: 11   Global Step: 490190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:29,247-Speed 2629.56 samples/sec   Loss 5.4482   LearningRate 0.0167   Epoch: 11   Global Step: 490200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:33,121-Speed 2643.44 samples/sec   Loss 5.5271   LearningRate 0.0167   Epoch: 11   Global Step: 490210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:37,014-Speed 2630.74 samples/sec   Loss 5.5498   LearningRate 0.0167   Epoch: 11   Global Step: 490220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:40,926-Speed 2618.29 samples/sec   Loss 5.4800   LearningRate 0.0167   Epoch: 11   Global Step: 490230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:44,829-Speed 2624.22 samples/sec   Loss 5.4813   LearningRate 0.0167   Epoch: 11   Global Step: 490240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:48,725-Speed 2629.51 samples/sec   Loss 5.4229   LearningRate 0.0167   Epoch: 11   Global Step: 490250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:49:52,606-Speed 2638.97 samples/sec   Loss 5.5635   LearningRate 0.0167   Epoch: 11   Global Step: 490260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:49:56,500-Speed 2630.68 samples/sec   Loss 5.5053   LearningRate 0.0167   Epoch: 11   Global Step: 490270   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:00,398-Speed 2627.50 samples/sec   Loss 5.4501   LearningRate 0.0167   Epoch: 11   Global Step: 490280   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:04,294-Speed 2629.21 samples/sec   Loss 5.3811   LearningRate 0.0167   Epoch: 11   Global Step: 490290   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:08,290-Speed 2562.93 samples/sec   Loss 5.5164   LearningRate 0.0167   Epoch: 11   Global Step: 490300   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:12,200-Speed 2619.31 samples/sec   Loss 5.4541   LearningRate 0.0167   Epoch: 11   Global Step: 490310   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:16,097-Speed 2628.07 samples/sec   Loss 5.3748   LearningRate 0.0167   Epoch: 11   Global Step: 490320   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:20,005-Speed 2621.52 samples/sec   Loss 5.4024   LearningRate 0.0167   Epoch: 11   Global Step: 490330   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:23,911-Speed 2621.98 samples/sec   Loss 5.4359   LearningRate 0.0167   Epoch: 11   Global Step: 490340   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:27,812-Speed 2625.84 samples/sec   Loss 5.4761   LearningRate 0.0167   Epoch: 11   Global Step: 490350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:31,714-Speed 2624.93 samples/sec   Loss 5.3597   LearningRate 0.0167   Epoch: 11   Global Step: 490360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:50:35,612-Speed 2628.06 samples/sec   Loss 5.4855   LearningRate 0.0167   Epoch: 11   Global Step: 490370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:50:39,508-Speed 2628.62 samples/sec   Loss 5.3709   LearningRate 0.0167   Epoch: 11   Global Step: 490380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:50:43,414-Speed 2622.54 samples/sec   Loss 5.3757   LearningRate 0.0167   Epoch: 11   Global Step: 490390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:50:47,353-Speed 2599.88 samples/sec   Loss 5.4189   LearningRate 0.0167   Epoch: 11   Global Step: 490400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:50:51,229-Speed 2642.64 samples/sec   Loss 5.4919   LearningRate 0.0167   Epoch: 11   Global Step: 490410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:55,125-Speed 2628.81 samples/sec   Loss 5.4441   LearningRate 0.0167   Epoch: 11   Global Step: 490420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:50:59,033-Speed 2620.74 samples/sec   Loss 5.5311   LearningRate 0.0167   Epoch: 11   Global Step: 490430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:02,943-Speed 2619.50 samples/sec   Loss 5.5701   LearningRate 0.0167   Epoch: 11   Global Step: 490440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:06,841-Speed 2627.76 samples/sec   Loss 5.4093   LearningRate 0.0167   Epoch: 11   Global Step: 490450   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:10,740-Speed 2626.51 samples/sec   Loss 5.5078   LearningRate 0.0167   Epoch: 11   Global Step: 490460   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:14,634-Speed 2630.84 samples/sec   Loss 5.5883   LearningRate 0.0167   Epoch: 11   Global Step: 490470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:18,537-Speed 2624.05 samples/sec   Loss 5.4496   LearningRate 0.0167   Epoch: 11   Global Step: 490480   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:22,432-Speed 2630.14 samples/sec   Loss 5.5562   LearningRate 0.0167   Epoch: 11   Global Step: 490490   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:26,327-Speed 2629.40 samples/sec   Loss 5.4908   LearningRate 0.0167   Epoch: 11   Global Step: 490500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:30,223-Speed 2629.21 samples/sec   Loss 5.4573   LearningRate 0.0167   Epoch: 11   Global Step: 490510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:51:34,124-Speed 2624.85 samples/sec   Loss 5.4911   LearningRate 0.0167   Epoch: 11   Global Step: 490520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:51:38,034-Speed 2619.61 samples/sec   Loss 5.4627   LearningRate 0.0167   Epoch: 11   Global Step: 490530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:51:41,930-Speed 2628.64 samples/sec   Loss 5.4918   LearningRate 0.0167   Epoch: 11   Global Step: 490540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:51:45,832-Speed 2624.77 samples/sec   Loss 5.5326   LearningRate 0.0167   Epoch: 11   Global Step: 490550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:51:49,701-Speed 2647.61 samples/sec   Loss 5.4401   LearningRate 0.0167   Epoch: 11   Global Step: 490560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:53,601-Speed 2626.36 samples/sec   Loss 5.4532   LearningRate 0.0167   Epoch: 11   Global Step: 490570   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:51:57,500-Speed 2627.00 samples/sec   Loss 5.4736   LearningRate 0.0167   Epoch: 11   Global Step: 490580   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:01,445-Speed 2596.29 samples/sec   Loss 5.6485   LearningRate 0.0167   Epoch: 11   Global Step: 490590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:05,342-Speed 2628.07 samples/sec   Loss 5.4794   LearningRate 0.0167   Epoch: 11   Global Step: 490600   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:09,240-Speed 2627.52 samples/sec   Loss 5.4463   LearningRate 0.0167   Epoch: 11   Global Step: 490610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:13,149-Speed 2625.08 samples/sec   Loss 5.4665   LearningRate 0.0167   Epoch: 11   Global Step: 490620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:17,046-Speed 2628.11 samples/sec   Loss 5.4158   LearningRate 0.0167   Epoch: 11   Global Step: 490630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:20,949-Speed 2624.53 samples/sec   Loss 5.5440   LearningRate 0.0167   Epoch: 11   Global Step: 490640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:24,881-Speed 2604.22 samples/sec   Loss 5.4445   LearningRate 0.0167   Epoch: 11   Global Step: 490650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:28,794-Speed 2618.61 samples/sec   Loss 5.3983   LearningRate 0.0167   Epoch: 11   Global Step: 490660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:52:32,688-Speed 2630.03 samples/sec   Loss 5.4863   LearningRate 0.0167   Epoch: 11   Global Step: 490670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:52:36,582-Speed 2629.88 samples/sec   Loss 5.3903   LearningRate 0.0167   Epoch: 11   Global Step: 490680   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:52:40,557-Speed 2577.05 samples/sec   Loss 5.5125   LearningRate 0.0167   Epoch: 11   Global Step: 490690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:52:44,473-Speed 2615.29 samples/sec   Loss 5.5033   LearningRate 0.0167   Epoch: 11   Global Step: 490700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:48,556-Speed 2508.45 samples/sec   Loss 5.4382   LearningRate 0.0167   Epoch: 11   Global Step: 490710   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:52,463-Speed 2621.99 samples/sec   Loss 5.4907   LearningRate 0.0167   Epoch: 11   Global Step: 490720   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:52:56,367-Speed 2623.52 samples/sec   Loss 5.4270   LearningRate 0.0167   Epoch: 11   Global Step: 490730   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:00,265-Speed 2627.16 samples/sec   Loss 5.5042   LearningRate 0.0167   Epoch: 11   Global Step: 490740   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:04,184-Speed 2613.89 samples/sec   Loss 5.3622   LearningRate 0.0167   Epoch: 11   Global Step: 490750   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:08,083-Speed 2626.82 samples/sec   Loss 5.5183   LearningRate 0.0167   Epoch: 11   Global Step: 490760   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:11,985-Speed 2624.63 samples/sec   Loss 5.4361   LearningRate 0.0167   Epoch: 11   Global Step: 490770   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:15,936-Speed 2593.03 samples/sec   Loss 5.3997   LearningRate 0.0167   Epoch: 11   Global Step: 490780   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:19,854-Speed 2614.39 samples/sec   Loss 5.4578   LearningRate 0.0167   Epoch: 11   Global Step: 490790   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:23,763-Speed 2619.93 samples/sec   Loss 5.5275   LearningRate 0.0167   Epoch: 11   Global Step: 490800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:53:27,641-Speed 2641.09 samples/sec   Loss 5.5829   LearningRate 0.0167   Epoch: 11   Global Step: 490810   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:31,548-Speed 2621.41 samples/sec   Loss 5.3856   LearningRate 0.0167   Epoch: 11   Global Step: 490820   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:35,465-Speed 2614.92 samples/sec   Loss 5.4387   LearningRate 0.0167   Epoch: 11   Global Step: 490830   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:39,367-Speed 2624.65 samples/sec   Loss 5.4330   LearningRate 0.0167   Epoch: 11   Global Step: 490840   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:43,394-Speed 2543.40 samples/sec   Loss 5.4926   LearningRate 0.0167   Epoch: 11   Global Step: 490850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:47,506-Speed 2491.53 samples/sec   Loss 5.4963   LearningRate 0.0167   Epoch: 11   Global Step: 490860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:51,485-Speed 2574.39 samples/sec   Loss 5.4882   LearningRate 0.0167   Epoch: 11   Global Step: 490870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:55,388-Speed 2623.84 samples/sec   Loss 5.4872   LearningRate 0.0167   Epoch: 11   Global Step: 490880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:53:59,400-Speed 2553.22 samples/sec   Loss 5.3676   LearningRate 0.0167   Epoch: 11   Global Step: 490890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:03,303-Speed 2624.15 samples/sec   Loss 5.4733   LearningRate 0.0167   Epoch: 11   Global Step: 490900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:07,201-Speed 2627.30 samples/sec   Loss 5.5036   LearningRate 0.0167   Epoch: 11   Global Step: 490910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:54:11,102-Speed 2625.03 samples/sec   Loss 5.5024   LearningRate 0.0167   Epoch: 11   Global Step: 490920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:54:15,009-Speed 2621.75 samples/sec   Loss 5.4528   LearningRate 0.0167   Epoch: 11   Global Step: 490930   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:54:18,908-Speed 2627.22 samples/sec   Loss 5.4526   LearningRate 0.0167   Epoch: 11   Global Step: 490940   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:54:22,786-Speed 2640.93 samples/sec   Loss 5.3206   LearningRate 0.0167   Epoch: 11   Global Step: 490950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:26,696-Speed 2620.39 samples/sec   Loss 5.4868   LearningRate 0.0167   Epoch: 11   Global Step: 490960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:30,634-Speed 2600.64 samples/sec   Loss 5.4629   LearningRate 0.0167   Epoch: 11   Global Step: 490970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:34,668-Speed 2538.90 samples/sec   Loss 5.4542   LearningRate 0.0167   Epoch: 11   Global Step: 490980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:38,576-Speed 2621.08 samples/sec   Loss 5.3737   LearningRate 0.0167   Epoch: 11   Global Step: 490990   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:42,483-Speed 2620.93 samples/sec   Loss 5.4516   LearningRate 0.0167   Epoch: 11   Global Step: 491000   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:46,385-Speed 2624.75 samples/sec   Loss 5.4587   LearningRate 0.0167   Epoch: 11   Global Step: 491010   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:50,295-Speed 2620.16 samples/sec   Loss 5.4796   LearningRate 0.0167   Epoch: 11   Global Step: 491020   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:54,207-Speed 2618.47 samples/sec   Loss 5.5110   LearningRate 0.0167   Epoch: 11   Global Step: 491030   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:54:58,126-Speed 2613.36 samples/sec   Loss 5.4874   LearningRate 0.0167   Epoch: 11   Global Step: 491040   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:02,033-Speed 2622.00 samples/sec   Loss 5.4620   LearningRate 0.0167   Epoch: 11   Global Step: 491050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:55:05,914-Speed 2638.97 samples/sec   Loss 5.5901   LearningRate 0.0167   Epoch: 11   Global Step: 491060   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:09,820-Speed 2621.97 samples/sec   Loss 5.4014   LearningRate 0.0167   Epoch: 11   Global Step: 491070   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:13,726-Speed 2622.27 samples/sec   Loss 5.4278   LearningRate 0.0166   Epoch: 11   Global Step: 491080   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:17,638-Speed 2618.17 samples/sec   Loss 5.5871   LearningRate 0.0166   Epoch: 11   Global Step: 491090   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:21,550-Speed 2618.45 samples/sec   Loss 5.4458   LearningRate 0.0166   Epoch: 11   Global Step: 491100   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:25,452-Speed 2624.96 samples/sec   Loss 5.3785   LearningRate 0.0166   Epoch: 11   Global Step: 491110   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:29,355-Speed 2624.23 samples/sec   Loss 5.5165   LearningRate 0.0166   Epoch: 11   Global Step: 491120   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:33,262-Speed 2621.72 samples/sec   Loss 5.4733   LearningRate 0.0166   Epoch: 11   Global Step: 491130   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:37,173-Speed 2618.53 samples/sec   Loss 5.5457   LearningRate 0.0166   Epoch: 11   Global Step: 491140   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:41,086-Speed 2617.70 samples/sec   Loss 5.4737   LearningRate 0.0166   Epoch: 11   Global Step: 491150   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:55:44,995-Speed 2620.27 samples/sec   Loss 5.5046   LearningRate 0.0166   Epoch: 11   Global Step: 491160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:55:48,908-Speed 2617.40 samples/sec   Loss 5.4546   LearningRate 0.0166   Epoch: 11   Global Step: 491170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:55:52,816-Speed 2620.91 samples/sec   Loss 5.4628   LearningRate 0.0166   Epoch: 11   Global Step: 491180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:55:56,737-Speed 2612.29 samples/sec   Loss 5.5445   LearningRate 0.0166   Epoch: 11   Global Step: 491190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:56:00,625-Speed 2633.96 samples/sec   Loss 5.4489   LearningRate 0.0166   Epoch: 11   Global Step: 491200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:04,532-Speed 2621.73 samples/sec   Loss 5.4190   LearningRate 0.0166   Epoch: 11   Global Step: 491210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:08,446-Speed 2617.13 samples/sec   Loss 5.4428   LearningRate 0.0166   Epoch: 11   Global Step: 491220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:12,384-Speed 2600.59 samples/sec   Loss 5.3645   LearningRate 0.0166   Epoch: 11   Global Step: 491230   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:16,299-Speed 2615.90 samples/sec   Loss 5.3484   LearningRate 0.0166   Epoch: 11   Global Step: 491240   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:20,213-Speed 2616.96 samples/sec   Loss 5.5132   LearningRate 0.0166   Epoch: 11   Global Step: 491250   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:24,124-Speed 2619.42 samples/sec   Loss 5.5400   LearningRate 0.0166   Epoch: 11   Global Step: 491260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:28,031-Speed 2621.24 samples/sec   Loss 5.5236   LearningRate 0.0166   Epoch: 11   Global Step: 491270   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:31,939-Speed 2620.75 samples/sec   Loss 5.4195   LearningRate 0.0166   Epoch: 11   Global Step: 491280   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:35,844-Speed 2622.88 samples/sec   Loss 5.3418   LearningRate 0.0166   Epoch: 11   Global Step: 491290   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:56:39,753-Speed 2620.23 samples/sec   Loss 5.4297   LearningRate 0.0166   Epoch: 11   Global Step: 491300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:56:43,674-Speed 2612.08 samples/sec   Loss 5.3838   LearningRate 0.0166   Epoch: 11   Global Step: 491310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:56:47,589-Speed 2616.69 samples/sec   Loss 5.3715   LearningRate 0.0166   Epoch: 11   Global Step: 491320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:56:51,517-Speed 2606.99 samples/sec   Loss 5.4538   LearningRate 0.0166   Epoch: 11   Global Step: 491330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:56:55,432-Speed 2616.96 samples/sec   Loss 5.4749   LearningRate 0.0166   Epoch: 11   Global Step: 491340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:56:59,337-Speed 2623.32 samples/sec   Loss 5.4433   LearningRate 0.0166   Epoch: 11   Global Step: 491350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:57:03,244-Speed 2621.38 samples/sec   Loss 5.4119   LearningRate 0.0166   Epoch: 11   Global Step: 491360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:57:07,156-Speed 2618.01 samples/sec   Loss 5.4735   LearningRate 0.0166   Epoch: 11   Global Step: 491370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:57:11,068-Speed 2618.13 samples/sec   Loss 5.4652   LearningRate 0.0166   Epoch: 11   Global Step: 491380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:57:14,983-Speed 2616.45 samples/sec   Loss 5.4785   LearningRate 0.0166   Epoch: 11   Global Step: 491390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:57:18,892-Speed 2620.04 samples/sec   Loss 5.5494   LearningRate 0.0166   Epoch: 11   Global Step: 491400   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-04-15 02:57:22,755-Speed 2651.37 samples/sec   Loss 5.4294   LearningRate 0.0166   Epoch: 11   Global Step: 491410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:26,682-Speed 2608.13 samples/sec   Loss 5.4574   LearningRate 0.0166   Epoch: 11   Global Step: 491420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:30,590-Speed 2621.41 samples/sec   Loss 5.4361   LearningRate 0.0166   Epoch: 11   Global Step: 491430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:34,504-Speed 2616.27 samples/sec   Loss 5.4724   LearningRate 0.0166   Epoch: 11   Global Step: 491440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:38,423-Speed 2613.72 samples/sec   Loss 5.4664   LearningRate 0.0166   Epoch: 11   Global Step: 491450   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:42,333-Speed 2619.83 samples/sec   Loss 5.4652   LearningRate 0.0166   Epoch: 11   Global Step: 491460   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:46,239-Speed 2622.01 samples/sec   Loss 5.3552   LearningRate 0.0166   Epoch: 11   Global Step: 491470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:50,152-Speed 2618.21 samples/sec   Loss 5.4295   LearningRate 0.0166   Epoch: 11   Global Step: 491480   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:54,064-Speed 2617.66 samples/sec   Loss 5.5125   LearningRate 0.0166   Epoch: 11   Global Step: 491490   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:57:57,986-Speed 2611.68 samples/sec   Loss 5.4714   LearningRate 0.0166   Epoch: 11   Global Step: 491500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:58:01,897-Speed 2619.02 samples/sec   Loss 5.4017   LearningRate 0.0166   Epoch: 11   Global Step: 491510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:05,802-Speed 2622.65 samples/sec   Loss 5.3283   LearningRate 0.0166   Epoch: 11   Global Step: 491520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:09,707-Speed 2622.54 samples/sec   Loss 5.4196   LearningRate 0.0166   Epoch: 11   Global Step: 491530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:13,616-Speed 2620.71 samples/sec   Loss 5.4407   LearningRate 0.0166   Epoch: 11   Global Step: 491540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:17,521-Speed 2623.19 samples/sec   Loss 5.3846   LearningRate 0.0166   Epoch: 11   Global Step: 491550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:21,429-Speed 2620.65 samples/sec   Loss 5.4742   LearningRate 0.0166   Epoch: 11   Global Step: 491560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:25,350-Speed 2612.35 samples/sec   Loss 5.3737   LearningRate 0.0166   Epoch: 11   Global Step: 491570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:29,265-Speed 2616.30 samples/sec   Loss 5.4699   LearningRate 0.0166   Epoch: 11   Global Step: 491580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:33,536-Speed 2397.95 samples/sec   Loss 5.4720   LearningRate 0.0166   Epoch: 11   Global Step: 491590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:37,458-Speed 2611.14 samples/sec   Loss 5.3608   LearningRate 0.0166   Epoch: 11   Global Step: 491600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:58:41,350-Speed 2631.97 samples/sec   Loss 5.4718   LearningRate 0.0166   Epoch: 11   Global Step: 491610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:58:45,258-Speed 2620.45 samples/sec   Loss 5.3512   LearningRate 0.0166   Epoch: 11   Global Step: 491620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:58:49,172-Speed 2617.07 samples/sec   Loss 5.4729   LearningRate 0.0166   Epoch: 11   Global Step: 491630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:58:53,093-Speed 2612.42 samples/sec   Loss 5.5796   LearningRate 0.0166   Epoch: 11   Global Step: 491640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:58:57,001-Speed 2621.04 samples/sec   Loss 5.4583   LearningRate 0.0166   Epoch: 11   Global Step: 491650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:59:00,919-Speed 2614.07 samples/sec   Loss 5.3598   LearningRate 0.0166   Epoch: 11   Global Step: 491660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:59:04,825-Speed 2622.63 samples/sec   Loss 5.4201   LearningRate 0.0166   Epoch: 11   Global Step: 491670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:59:08,732-Speed 2621.27 samples/sec   Loss 5.3848   LearningRate 0.0166   Epoch: 11   Global Step: 491680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:59:12,637-Speed 2622.65 samples/sec   Loss 5.5084   LearningRate 0.0166   Epoch: 11   Global Step: 491690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:59:16,550-Speed 2617.03 samples/sec   Loss 5.4723   LearningRate 0.0166   Epoch: 11   Global Step: 491700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 02:59:20,455-Speed 2622.98 samples/sec   Loss 5.4420   LearningRate 0.0166   Epoch: 11   Global Step: 491710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:24,358-Speed 2624.16 samples/sec   Loss 5.3746   LearningRate 0.0166   Epoch: 11   Global Step: 491720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:28,272-Speed 2617.47 samples/sec   Loss 5.3695   LearningRate 0.0166   Epoch: 11   Global Step: 491730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:32,183-Speed 2618.81 samples/sec   Loss 5.5044   LearningRate 0.0166   Epoch: 11   Global Step: 491740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:36,091-Speed 2620.64 samples/sec   Loss 5.4945   LearningRate 0.0166   Epoch: 11   Global Step: 491750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:39,998-Speed 2621.50 samples/sec   Loss 5.3770   LearningRate 0.0166   Epoch: 11   Global Step: 491760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:44,000-Speed 2559.65 samples/sec   Loss 5.5361   LearningRate 0.0166   Epoch: 11   Global Step: 491770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:47,937-Speed 2601.03 samples/sec   Loss 5.4440   LearningRate 0.0166   Epoch: 11   Global Step: 491780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:51,845-Speed 2620.97 samples/sec   Loss 5.5387   LearningRate 0.0166   Epoch: 11   Global Step: 491790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:55,765-Speed 2613.27 samples/sec   Loss 5.4985   LearningRate 0.0166   Epoch: 11   Global Step: 491800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 02:59:59,648-Speed 2637.82 samples/sec   Loss 5.4257   LearningRate 0.0166   Epoch: 11   Global Step: 491810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:03,630-Speed 2571.78 samples/sec   Loss 5.5264   LearningRate 0.0166   Epoch: 11   Global Step: 491820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:07,540-Speed 2620.09 samples/sec   Loss 5.4149   LearningRate 0.0166   Epoch: 11   Global Step: 491830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:11,508-Speed 2580.77 samples/sec   Loss 5.4069   LearningRate 0.0166   Epoch: 11   Global Step: 491840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:15,428-Speed 2613.38 samples/sec   Loss 5.3598   LearningRate 0.0166   Epoch: 11   Global Step: 491850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:19,337-Speed 2619.93 samples/sec   Loss 5.4648   LearningRate 0.0166   Epoch: 11   Global Step: 491860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:23,246-Speed 2620.34 samples/sec   Loss 5.5325   LearningRate 0.0166   Epoch: 11   Global Step: 491870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:27,151-Speed 2622.87 samples/sec   Loss 5.5259   LearningRate 0.0166   Epoch: 11   Global Step: 491880   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:00:31,051-Speed 2626.14 samples/sec   Loss 5.4790   LearningRate 0.0166   Epoch: 11   Global Step: 491890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:00:34,959-Speed 2621.24 samples/sec   Loss 5.4919   LearningRate 0.0166   Epoch: 11   Global Step: 491900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:00:38,866-Speed 2620.77 samples/sec   Loss 5.4970   LearningRate 0.0166   Epoch: 11   Global Step: 491910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:00:42,775-Speed 2620.61 samples/sec   Loss 5.4726   LearningRate 0.0166   Epoch: 11   Global Step: 491920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:00:46,688-Speed 2617.32 samples/sec   Loss 5.4224   LearningRate 0.0166   Epoch: 11   Global Step: 491930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:00:50,600-Speed 2618.59 samples/sec   Loss 5.3505   LearningRate 0.0166   Epoch: 11   Global Step: 491940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:00:54,513-Speed 2617.53 samples/sec   Loss 5.4002   LearningRate 0.0166   Epoch: 11   Global Step: 491950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:00:58,424-Speed 2618.75 samples/sec   Loss 5.3418   LearningRate 0.0166   Epoch: 11   Global Step: 491960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:02,336-Speed 2618.01 samples/sec   Loss 5.3973   LearningRate 0.0166   Epoch: 11   Global Step: 491970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:06,243-Speed 2622.01 samples/sec   Loss 5.5259   LearningRate 0.0166   Epoch: 11   Global Step: 491980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:10,157-Speed 2616.11 samples/sec   Loss 5.5174   LearningRate 0.0166   Epoch: 11   Global Step: 491990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:01:14,074-Speed 2615.22 samples/sec   Loss 5.4288   LearningRate 0.0166   Epoch: 11   Global Step: 492000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:01:18,008-Speed 2602.86 samples/sec   Loss 5.5479   LearningRate 0.0166   Epoch: 11   Global Step: 492010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:01:21,928-Speed 2612.92 samples/sec   Loss 5.3780   LearningRate 0.0166   Epoch: 11   Global Step: 492020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:01:25,842-Speed 2617.09 samples/sec   Loss 5.4061   LearningRate 0.0166   Epoch: 11   Global Step: 492030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:01:29,761-Speed 2614.02 samples/sec   Loss 5.4117   LearningRate 0.0166   Epoch: 11   Global Step: 492040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:01:33,659-Speed 2627.56 samples/sec   Loss 5.5177   LearningRate 0.0166   Epoch: 11   Global Step: 492050   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:37,569-Speed 2620.35 samples/sec   Loss 5.5736   LearningRate 0.0166   Epoch: 11   Global Step: 492060   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:41,470-Speed 2625.23 samples/sec   Loss 5.3872   LearningRate 0.0166   Epoch: 11   Global Step: 492070   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:45,377-Speed 2621.61 samples/sec   Loss 5.4065   LearningRate 0.0166   Epoch: 11   Global Step: 492080   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:49,282-Speed 2622.92 samples/sec   Loss 5.4923   LearningRate 0.0166   Epoch: 11   Global Step: 492090   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:53,206-Speed 2610.00 samples/sec   Loss 5.4925   LearningRate 0.0165   Epoch: 11   Global Step: 492100   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:01:57,113-Speed 2621.67 samples/sec   Loss 5.4248   LearningRate 0.0165   Epoch: 11   Global Step: 492110   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:02:01,021-Speed 2620.97 samples/sec   Loss 5.5341   LearningRate 0.0165   Epoch: 11   Global Step: 492120   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:02:04,945-Speed 2610.62 samples/sec   Loss 5.5198   LearningRate 0.0165   Epoch: 11   Global Step: 492130   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:02:08,860-Speed 2616.06 samples/sec   Loss 5.4412   LearningRate 0.0165   Epoch: 11   Global Step: 492140   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:02:12,762-Speed 2624.52 samples/sec   Loss 5.4764   LearningRate 0.0165   Epoch: 11   Global Step: 492150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:16,669-Speed 2621.51 samples/sec   Loss 5.4378   LearningRate 0.0165   Epoch: 11   Global Step: 492160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:20,573-Speed 2624.22 samples/sec   Loss 5.3282   LearningRate 0.0165   Epoch: 11   Global Step: 492170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:24,477-Speed 2622.99 samples/sec   Loss 5.4584   LearningRate 0.0165   Epoch: 11   Global Step: 492180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:28,379-Speed 2624.78 samples/sec   Loss 5.4042   LearningRate 0.0165   Epoch: 11   Global Step: 492190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:32,281-Speed 2625.03 samples/sec   Loss 5.4301   LearningRate 0.0165   Epoch: 11   Global Step: 492200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:36,186-Speed 2622.85 samples/sec   Loss 5.3563   LearningRate 0.0165   Epoch: 11   Global Step: 492210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:40,098-Speed 2618.09 samples/sec   Loss 5.4232   LearningRate 0.0165   Epoch: 11   Global Step: 492220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:44,008-Speed 2619.99 samples/sec   Loss 5.5734   LearningRate 0.0165   Epoch: 11   Global Step: 492230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:47,912-Speed 2623.66 samples/sec   Loss 5.4019   LearningRate 0.0165   Epoch: 11   Global Step: 492240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:51,816-Speed 2623.35 samples/sec   Loss 5.4083   LearningRate 0.0165   Epoch: 11   Global Step: 492250   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-04-15 03:02:55,699-Speed 2638.01 samples/sec   Loss 5.3755   LearningRate 0.0165   Epoch: 11   Global Step: 492260   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:02:59,580-Speed 2639.50 samples/sec   Loss 5.3891   LearningRate 0.0165   Epoch: 11   Global Step: 492270   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:03,485-Speed 2622.79 samples/sec   Loss 5.3561   LearningRate 0.0165   Epoch: 11   Global Step: 492280   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:07,391-Speed 2621.41 samples/sec   Loss 5.5201   LearningRate 0.0165   Epoch: 11   Global Step: 492290   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:11,291-Speed 2627.05 samples/sec   Loss 5.4345   LearningRate 0.0165   Epoch: 11   Global Step: 492300   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:15,194-Speed 2624.17 samples/sec   Loss 5.4916   LearningRate 0.0165   Epoch: 11   Global Step: 492310   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:19,093-Speed 2627.00 samples/sec   Loss 5.3359   LearningRate 0.0165   Epoch: 11   Global Step: 492320   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:22,997-Speed 2624.44 samples/sec   Loss 5.3729   LearningRate 0.0165   Epoch: 11   Global Step: 492330   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:26,904-Speed 2621.02 samples/sec   Loss 5.4757   LearningRate 0.0165   Epoch: 11   Global Step: 492340   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:30,820-Speed 2616.17 samples/sec   Loss 5.5016   LearningRate 0.0165   Epoch: 11   Global Step: 492350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:34,726-Speed 2621.82 samples/sec   Loss 5.4653   LearningRate 0.0165   Epoch: 11   Global Step: 492360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:38,631-Speed 2622.83 samples/sec   Loss 5.4537   LearningRate 0.0165   Epoch: 11   Global Step: 492370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:03:42,512-Speed 2638.76 samples/sec   Loss 5.4399   LearningRate 0.0165   Epoch: 11   Global Step: 492380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:46,424-Speed 2618.13 samples/sec   Loss 5.3776   LearningRate 0.0165   Epoch: 11   Global Step: 492390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:50,330-Speed 2623.01 samples/sec   Loss 5.4727   LearningRate 0.0165   Epoch: 11   Global Step: 492400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:54,237-Speed 2620.93 samples/sec   Loss 5.4522   LearningRate 0.0165   Epoch: 11   Global Step: 492410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:03:58,147-Speed 2620.09 samples/sec   Loss 5.4956   LearningRate 0.0165   Epoch: 11   Global Step: 492420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:02,054-Speed 2621.58 samples/sec   Loss 5.4613   LearningRate 0.0165   Epoch: 11   Global Step: 492430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:05,953-Speed 2627.07 samples/sec   Loss 5.5278   LearningRate 0.0165   Epoch: 11   Global Step: 492440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:09,873-Speed 2612.30 samples/sec   Loss 5.3903   LearningRate 0.0165   Epoch: 11   Global Step: 492450   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:13,778-Speed 2622.98 samples/sec   Loss 5.4093   LearningRate 0.0165   Epoch: 11   Global Step: 492460   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:17,680-Speed 2624.98 samples/sec   Loss 5.6398   LearningRate 0.0165   Epoch: 11   Global Step: 492470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:21,589-Speed 2620.21 samples/sec   Loss 5.4084   LearningRate 0.0165   Epoch: 11   Global Step: 492480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:04:25,492-Speed 2624.30 samples/sec   Loss 5.3550   LearningRate 0.0165   Epoch: 11   Global Step: 492490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:04:29,398-Speed 2622.07 samples/sec   Loss 5.4278   LearningRate 0.0165   Epoch: 11   Global Step: 492500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:04:33,308-Speed 2619.44 samples/sec   Loss 5.4328   LearningRate 0.0165   Epoch: 11   Global Step: 492510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:04:37,206-Speed 2627.72 samples/sec   Loss 5.3812   LearningRate 0.0165   Epoch: 11   Global Step: 492520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:04:41,115-Speed 2619.92 samples/sec   Loss 5.3270   LearningRate 0.0165   Epoch: 11   Global Step: 492530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:04:45,003-Speed 2634.68 samples/sec   Loss 5.5391   LearningRate 0.0165   Epoch: 11   Global Step: 492540   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:48,910-Speed 2621.35 samples/sec   Loss 5.5115   LearningRate 0.0165   Epoch: 11   Global Step: 492550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:52,824-Speed 2617.50 samples/sec   Loss 5.4679   LearningRate 0.0165   Epoch: 11   Global Step: 492560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:04:56,733-Speed 2619.91 samples/sec   Loss 5.4689   LearningRate 0.0165   Epoch: 11   Global Step: 492570   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:00,635-Speed 2624.83 samples/sec   Loss 5.4097   LearningRate 0.0165   Epoch: 11   Global Step: 492580   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:04,547-Speed 2618.14 samples/sec   Loss 5.4681   LearningRate 0.0165   Epoch: 11   Global Step: 492590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:08,452-Speed 2622.63 samples/sec   Loss 5.4373   LearningRate 0.0165   Epoch: 11   Global Step: 492600   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:12,358-Speed 2622.81 samples/sec   Loss 5.4224   LearningRate 0.0165   Epoch: 11   Global Step: 492610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:16,261-Speed 2623.98 samples/sec   Loss 5.3629   LearningRate 0.0165   Epoch: 11   Global Step: 492620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:20,168-Speed 2621.75 samples/sec   Loss 5.2737   LearningRate 0.0165   Epoch: 11   Global Step: 492630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:24,076-Speed 2621.27 samples/sec   Loss 5.3559   LearningRate 0.0165   Epoch: 11   Global Step: 492640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:05:27,954-Speed 2641.19 samples/sec   Loss 5.5204   LearningRate 0.0165   Epoch: 11   Global Step: 492650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:31,871-Speed 2614.15 samples/sec   Loss 5.3855   LearningRate 0.0165   Epoch: 11   Global Step: 492660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:35,773-Speed 2625.11 samples/sec   Loss 5.4316   LearningRate 0.0165   Epoch: 11   Global Step: 492670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:39,684-Speed 2618.87 samples/sec   Loss 5.4595   LearningRate 0.0165   Epoch: 11   Global Step: 492680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:43,585-Speed 2625.59 samples/sec   Loss 5.4792   LearningRate 0.0165   Epoch: 11   Global Step: 492690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:47,486-Speed 2625.38 samples/sec   Loss 5.4507   LearningRate 0.0165   Epoch: 11   Global Step: 492700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:51,389-Speed 2624.82 samples/sec   Loss 5.3835   LearningRate 0.0165   Epoch: 11   Global Step: 492710   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:55,286-Speed 2627.77 samples/sec   Loss 5.3643   LearningRate 0.0165   Epoch: 11   Global Step: 492720   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:05:59,189-Speed 2624.59 samples/sec   Loss 5.4902   LearningRate 0.0165   Epoch: 11   Global Step: 492730   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:06:03,088-Speed 2626.76 samples/sec   Loss 5.5067   LearningRate 0.0165   Epoch: 11   Global Step: 492740   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:06:06,997-Speed 2619.90 samples/sec   Loss 5.3587   LearningRate 0.0165   Epoch: 11   Global Step: 492750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:10,898-Speed 2625.63 samples/sec   Loss 5.4554   LearningRate 0.0165   Epoch: 11   Global Step: 492760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:14,795-Speed 2628.31 samples/sec   Loss 5.4833   LearningRate 0.0165   Epoch: 11   Global Step: 492770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:18,694-Speed 2627.06 samples/sec   Loss 5.4262   LearningRate 0.0165   Epoch: 11   Global Step: 492780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:22,600-Speed 2621.86 samples/sec   Loss 5.3454   LearningRate 0.0165   Epoch: 11   Global Step: 492790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:26,499-Speed 2627.35 samples/sec   Loss 5.4106   LearningRate 0.0165   Epoch: 11   Global Step: 492800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:30,410-Speed 2618.76 samples/sec   Loss 5.5276   LearningRate 0.0165   Epoch: 11   Global Step: 492810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:34,323-Speed 2617.68 samples/sec   Loss 5.4554   LearningRate 0.0165   Epoch: 11   Global Step: 492820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:38,229-Speed 2622.49 samples/sec   Loss 5.4668   LearningRate 0.0165   Epoch: 11   Global Step: 492830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:42,137-Speed 2620.65 samples/sec   Loss 5.4586   LearningRate 0.0165   Epoch: 11   Global Step: 492840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:46,018-Speed 2638.62 samples/sec   Loss 5.3893   LearningRate 0.0165   Epoch: 11   Global Step: 492850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:49,934-Speed 2616.04 samples/sec   Loss 5.4858   LearningRate 0.0165   Epoch: 11   Global Step: 492860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:53,835-Speed 2625.28 samples/sec   Loss 5.4932   LearningRate 0.0165   Epoch: 11   Global Step: 492870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:06:57,732-Speed 2628.74 samples/sec   Loss 5.5966   LearningRate 0.0165   Epoch: 11   Global Step: 492880   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:07:01,643-Speed 2618.35 samples/sec   Loss 5.4133   LearningRate 0.0165   Epoch: 11   Global Step: 492890   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:07:05,663-Speed 2548.65 samples/sec   Loss 5.4260   LearningRate 0.0165   Epoch: 11   Global Step: 492900   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:07:09,585-Speed 2611.57 samples/sec   Loss 5.5069   LearningRate 0.0165   Epoch: 11   Global Step: 492910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:07:13,484-Speed 2626.63 samples/sec   Loss 5.4660   LearningRate 0.0165   Epoch: 11   Global Step: 492920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:07:17,368-Speed 2636.91 samples/sec   Loss 5.4593   LearningRate 0.0165   Epoch: 11   Global Step: 492930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:21,269-Speed 2625.76 samples/sec   Loss 5.4368   LearningRate 0.0165   Epoch: 11   Global Step: 492940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:25,177-Speed 2620.55 samples/sec   Loss 5.3721   LearningRate 0.0165   Epoch: 11   Global Step: 492950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:29,076-Speed 2627.01 samples/sec   Loss 5.5232   LearningRate 0.0165   Epoch: 11   Global Step: 492960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:32,983-Speed 2621.58 samples/sec   Loss 5.4851   LearningRate 0.0165   Epoch: 11   Global Step: 492970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:36,903-Speed 2613.22 samples/sec   Loss 5.4374   LearningRate 0.0165   Epoch: 11   Global Step: 492980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:40,800-Speed 2628.05 samples/sec   Loss 5.4141   LearningRate 0.0165   Epoch: 11   Global Step: 492990   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:44,704-Speed 2623.61 samples/sec   Loss 5.4496   LearningRate 0.0165   Epoch: 11   Global Step: 493000   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:48,603-Speed 2626.57 samples/sec   Loss 5.3781   LearningRate 0.0165   Epoch: 11   Global Step: 493010   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:52,510-Speed 2621.32 samples/sec   Loss 5.3683   LearningRate 0.0165   Epoch: 11   Global Step: 493020   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:07:56,411-Speed 2626.10 samples/sec   Loss 5.4755   LearningRate 0.0165   Epoch: 11   Global Step: 493030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:00,312-Speed 2625.28 samples/sec   Loss 5.5836   LearningRate 0.0165   Epoch: 11   Global Step: 493040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:04,222-Speed 2619.74 samples/sec   Loss 5.4591   LearningRate 0.0165   Epoch: 11   Global Step: 493050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:08,118-Speed 2628.59 samples/sec   Loss 5.4481   LearningRate 0.0165   Epoch: 11   Global Step: 493060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:12,016-Speed 2627.55 samples/sec   Loss 5.4819   LearningRate 0.0165   Epoch: 11   Global Step: 493070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:15,922-Speed 2621.97 samples/sec   Loss 5.4403   LearningRate 0.0165   Epoch: 11   Global Step: 493080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:19,823-Speed 2625.93 samples/sec   Loss 5.2873   LearningRate 0.0165   Epoch: 11   Global Step: 493090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:23,721-Speed 2627.84 samples/sec   Loss 5.4416   LearningRate 0.0165   Epoch: 11   Global Step: 493100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:27,617-Speed 2628.75 samples/sec   Loss 5.3992   LearningRate 0.0165   Epoch: 11   Global Step: 493110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:31,515-Speed 2627.87 samples/sec   Loss 5.4861   LearningRate 0.0164   Epoch: 11   Global Step: 493120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:35,394-Speed 2640.75 samples/sec   Loss 5.3134   LearningRate 0.0164   Epoch: 11   Global Step: 493130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:39,304-Speed 2619.73 samples/sec   Loss 5.4630   LearningRate 0.0164   Epoch: 11   Global Step: 493140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:43,204-Speed 2626.13 samples/sec   Loss 5.4514   LearningRate 0.0164   Epoch: 11   Global Step: 493150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:47,104-Speed 2625.98 samples/sec   Loss 5.4178   LearningRate 0.0164   Epoch: 11   Global Step: 493160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:51,008-Speed 2623.41 samples/sec   Loss 5.5231   LearningRate 0.0164   Epoch: 11   Global Step: 493170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:54,924-Speed 2615.87 samples/sec   Loss 5.4235   LearningRate 0.0164   Epoch: 11   Global Step: 493180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:08:58,817-Speed 2631.22 samples/sec   Loss 5.5021   LearningRate 0.0164   Epoch: 11   Global Step: 493190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:02,722-Speed 2623.18 samples/sec   Loss 5.4490   LearningRate 0.0164   Epoch: 11   Global Step: 493200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:06,621-Speed 2626.27 samples/sec   Loss 5.3832   LearningRate 0.0164   Epoch: 11   Global Step: 493210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:10,522-Speed 2625.71 samples/sec   Loss 5.3985   LearningRate 0.0164   Epoch: 11   Global Step: 493220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:14,400-Speed 2641.06 samples/sec   Loss 5.5047   LearningRate 0.0164   Epoch: 11   Global Step: 493230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:18,302-Speed 2624.84 samples/sec   Loss 5.4414   LearningRate 0.0164   Epoch: 11   Global Step: 493240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:22,205-Speed 2624.42 samples/sec   Loss 5.4014   LearningRate 0.0164   Epoch: 11   Global Step: 493250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:26,107-Speed 2624.67 samples/sec   Loss 5.4227   LearningRate 0.0164   Epoch: 11   Global Step: 493260   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:30,014-Speed 2622.06 samples/sec   Loss 5.4048   LearningRate 0.0164   Epoch: 11   Global Step: 493270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:33,916-Speed 2624.43 samples/sec   Loss 5.4209   LearningRate 0.0164   Epoch: 11   Global Step: 493280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:37,812-Speed 2629.39 samples/sec   Loss 5.4110   LearningRate 0.0164   Epoch: 11   Global Step: 493290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:41,723-Speed 2618.61 samples/sec   Loss 5.4710   LearningRate 0.0164   Epoch: 11   Global Step: 493300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:45,686-Speed 2585.27 samples/sec   Loss 5.3170   LearningRate 0.0164   Epoch: 11   Global Step: 493310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:49,619-Speed 2603.72 samples/sec   Loss 5.4202   LearningRate 0.0164   Epoch: 11   Global Step: 493320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:09:53,522-Speed 2624.71 samples/sec   Loss 5.3365   LearningRate 0.0164   Epoch: 11   Global Step: 493330   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-04-15 03:09:57,406-Speed 2636.35 samples/sec   Loss 5.3602   LearningRate 0.0164   Epoch: 11   Global Step: 493340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:10:01,312-Speed 2622.39 samples/sec   Loss 5.4216   LearningRate 0.0164   Epoch: 11   Global Step: 493350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:10:05,197-Speed 2636.27 samples/sec   Loss 5.4151   LearningRate 0.0164   Epoch: 11   Global Step: 493360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:09,105-Speed 2621.22 samples/sec   Loss 5.3309   LearningRate 0.0164   Epoch: 11   Global Step: 493370   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:13,004-Speed 2626.42 samples/sec   Loss 5.4079   LearningRate 0.0164   Epoch: 11   Global Step: 493380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:16,901-Speed 2628.63 samples/sec   Loss 5.4104   LearningRate 0.0164   Epoch: 11   Global Step: 493390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:20,802-Speed 2625.90 samples/sec   Loss 5.3856   LearningRate 0.0164   Epoch: 11   Global Step: 493400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:24,699-Speed 2628.27 samples/sec   Loss 5.4890   LearningRate 0.0164   Epoch: 11   Global Step: 493410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:28,598-Speed 2626.96 samples/sec   Loss 5.3748   LearningRate 0.0164   Epoch: 11   Global Step: 493420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:32,495-Speed 2627.90 samples/sec   Loss 5.4602   LearningRate 0.0164   Epoch: 11   Global Step: 493430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:36,415-Speed 2613.10 samples/sec   Loss 5.3466   LearningRate 0.0164   Epoch: 11   Global Step: 493440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:40,323-Speed 2620.52 samples/sec   Loss 5.4057   LearningRate 0.0164   Epoch: 11   Global Step: 493450   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:10:44,241-Speed 2622.45 samples/sec   Loss 5.4936   LearningRate 0.0164   Epoch: 11   Global Step: 493460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:10:48,138-Speed 2628.62 samples/sec   Loss 5.4179   LearningRate 0.0164   Epoch: 11   Global Step: 493470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:10:52,032-Speed 2630.40 samples/sec   Loss 5.3773   LearningRate 0.0164   Epoch: 11   Global Step: 493480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:10:55,928-Speed 2628.64 samples/sec   Loss 5.5296   LearningRate 0.0164   Epoch: 11   Global Step: 493490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:10:59,832-Speed 2623.97 samples/sec   Loss 5.4187   LearningRate 0.0164   Epoch: 11   Global Step: 493500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:03,727-Speed 2629.80 samples/sec   Loss 5.4606   LearningRate 0.0164   Epoch: 11   Global Step: 493510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:07,622-Speed 2628.98 samples/sec   Loss 5.4364   LearningRate 0.0164   Epoch: 11   Global Step: 493520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:11,521-Speed 2626.99 samples/sec   Loss 5.4093   LearningRate 0.0164   Epoch: 11   Global Step: 493530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:15,422-Speed 2625.79 samples/sec   Loss 5.5791   LearningRate 0.0164   Epoch: 11   Global Step: 493540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:19,333-Speed 2618.84 samples/sec   Loss 5.3980   LearningRate 0.0164   Epoch: 11   Global Step: 493550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:23,212-Speed 2643.08 samples/sec   Loss 5.4118   LearningRate 0.0164   Epoch: 11   Global Step: 493560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:27,110-Speed 2627.93 samples/sec   Loss 5.3804   LearningRate 0.0164   Epoch: 11   Global Step: 493570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:31,009-Speed 2627.11 samples/sec   Loss 5.4490   LearningRate 0.0164   Epoch: 11   Global Step: 493580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:34,906-Speed 2627.96 samples/sec   Loss 5.3603   LearningRate 0.0164   Epoch: 11   Global Step: 493590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:38,804-Speed 2628.08 samples/sec   Loss 5.5101   LearningRate 0.0164   Epoch: 11   Global Step: 493600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:42,712-Speed 2620.24 samples/sec   Loss 5.4446   LearningRate 0.0164   Epoch: 11   Global Step: 493610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:46,607-Speed 2629.95 samples/sec   Loss 5.3699   LearningRate 0.0164   Epoch: 11   Global Step: 493620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:50,503-Speed 2628.80 samples/sec   Loss 5.5643   LearningRate 0.0164   Epoch: 11   Global Step: 493630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:11:54,374-Speed 2645.97 samples/sec   Loss 5.3516   LearningRate 0.0164   Epoch: 11   Global Step: 493640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:11:58,277-Speed 2624.06 samples/sec   Loss 5.4625   LearningRate 0.0164   Epoch: 11   Global Step: 493650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:02,196-Speed 2614.03 samples/sec   Loss 5.4351   LearningRate 0.0164   Epoch: 11   Global Step: 493660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:06,098-Speed 2624.78 samples/sec   Loss 5.3539   LearningRate 0.0164   Epoch: 11   Global Step: 493670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:09,995-Speed 2627.87 samples/sec   Loss 5.4522   LearningRate 0.0164   Epoch: 11   Global Step: 493680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:13,902-Speed 2621.85 samples/sec   Loss 5.3708   LearningRate 0.0164   Epoch: 11   Global Step: 493690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:17,804-Speed 2624.69 samples/sec   Loss 5.3493   LearningRate 0.0164   Epoch: 11   Global Step: 493700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:21,700-Speed 2629.26 samples/sec   Loss 5.4212   LearningRate 0.0164   Epoch: 11   Global Step: 493710   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:25,634-Speed 2603.24 samples/sec   Loss 5.4550   LearningRate 0.0164   Epoch: 11   Global Step: 493720   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:29,534-Speed 2626.69 samples/sec   Loss 5.4339   LearningRate 0.0164   Epoch: 11   Global Step: 493730   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:12:33,503-Speed 2580.37 samples/sec   Loss 5.5034   LearningRate 0.0164   Epoch: 11   Global Step: 493740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:12:37,437-Speed 2603.49 samples/sec   Loss 5.3481   LearningRate 0.0164   Epoch: 11   Global Step: 493750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:12:41,342-Speed 2622.73 samples/sec   Loss 5.4643   LearningRate 0.0164   Epoch: 11   Global Step: 493760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:12:45,259-Speed 2614.88 samples/sec   Loss 5.3349   LearningRate 0.0164   Epoch: 11   Global Step: 493770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:12:49,165-Speed 2622.03 samples/sec   Loss 5.4261   LearningRate 0.0164   Epoch: 11   Global Step: 493780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:12:53,073-Speed 2620.68 samples/sec   Loss 5.5870   LearningRate 0.0164   Epoch: 11   Global Step: 493790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:12:56,979-Speed 2622.56 samples/sec   Loss 5.4069   LearningRate 0.0164   Epoch: 11   Global Step: 493800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:01,389-Speed 2322.46 samples/sec   Loss 5.4000   LearningRate 0.0164   Epoch: 11   Global Step: 493810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:05,283-Speed 2630.33 samples/sec   Loss 5.3850   LearningRate 0.0164   Epoch: 11   Global Step: 493820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:09,183-Speed 2626.53 samples/sec   Loss 5.5240   LearningRate 0.0164   Epoch: 11   Global Step: 493830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:13,069-Speed 2635.49 samples/sec   Loss 5.4209   LearningRate 0.0164   Epoch: 11   Global Step: 493840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:16,965-Speed 2628.90 samples/sec   Loss 5.3254   LearningRate 0.0164   Epoch: 11   Global Step: 493850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:20,873-Speed 2620.92 samples/sec   Loss 5.4747   LearningRate 0.0164   Epoch: 11   Global Step: 493860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:24,766-Speed 2630.58 samples/sec   Loss 5.3987   LearningRate 0.0164   Epoch: 11   Global Step: 493870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:28,667-Speed 2625.86 samples/sec   Loss 5.4833   LearningRate 0.0164   Epoch: 11   Global Step: 493880   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:13:32,546-Speed 2640.91 samples/sec   Loss 5.4330   LearningRate 0.0164   Epoch: 11   Global Step: 493890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:13:36,504-Speed 2587.54 samples/sec   Loss 5.4510   LearningRate 0.0164   Epoch: 11   Global Step: 493900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:13:40,412-Speed 2620.71 samples/sec   Loss 5.3671   LearningRate 0.0164   Epoch: 11   Global Step: 493910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:13:44,312-Speed 2626.30 samples/sec   Loss 5.3234   LearningRate 0.0164   Epoch: 11   Global Step: 493920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:13:48,222-Speed 2619.48 samples/sec   Loss 5.4363   LearningRate 0.0164   Epoch: 11   Global Step: 493930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:13:52,114-Speed 2631.78 samples/sec   Loss 5.4739   LearningRate 0.0164   Epoch: 11   Global Step: 493940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:13:56,004-Speed 2632.62 samples/sec   Loss 5.3302   LearningRate 0.0164   Epoch: 11   Global Step: 493950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:13:59,900-Speed 2629.12 samples/sec   Loss 5.3723   LearningRate 0.0164   Epoch: 11   Global Step: 493960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:14:03,794-Speed 2630.48 samples/sec   Loss 5.3305   LearningRate 0.0164   Epoch: 11   Global Step: 493970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:14:07,688-Speed 2630.00 samples/sec   Loss 5.4541   LearningRate 0.0164   Epoch: 11   Global Step: 493980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:14:11,584-Speed 2629.57 samples/sec   Loss 5.4174   LearningRate 0.0164   Epoch: 11   Global Step: 493990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:15,486-Speed 2624.19 samples/sec   Loss 5.4359   LearningRate 0.0164   Epoch: 11   Global Step: 494000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:19,393-Speed 2621.92 samples/sec   Loss 5.4535   LearningRate 0.0164   Epoch: 11   Global Step: 494010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:23,292-Speed 2626.82 samples/sec   Loss 5.3800   LearningRate 0.0164   Epoch: 11   Global Step: 494020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:27,200-Speed 2621.22 samples/sec   Loss 5.5235   LearningRate 0.0164   Epoch: 11   Global Step: 494030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:31,249-Speed 2529.33 samples/sec   Loss 5.3666   LearningRate 0.0164   Epoch: 11   Global Step: 494040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:35,358-Speed 2492.80 samples/sec   Loss 5.3353   LearningRate 0.0164   Epoch: 11   Global Step: 494050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:39,307-Speed 2593.45 samples/sec   Loss 5.4453   LearningRate 0.0164   Epoch: 11   Global Step: 494060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:43,206-Speed 2626.56 samples/sec   Loss 5.4045   LearningRate 0.0164   Epoch: 11   Global Step: 494070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:47,107-Speed 2626.22 samples/sec   Loss 5.3571   LearningRate 0.0164   Epoch: 11   Global Step: 494080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:50,977-Speed 2646.47 samples/sec   Loss 5.4267   LearningRate 0.0164   Epoch: 11   Global Step: 494090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:54,874-Speed 2628.77 samples/sec   Loss 5.5005   LearningRate 0.0164   Epoch: 11   Global Step: 494100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:14:58,774-Speed 2626.08 samples/sec   Loss 5.4700   LearningRate 0.0164   Epoch: 11   Global Step: 494110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:02,681-Speed 2621.57 samples/sec   Loss 5.3837   LearningRate 0.0164   Epoch: 11   Global Step: 494120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:06,588-Speed 2621.03 samples/sec   Loss 5.3330   LearningRate 0.0164   Epoch: 11   Global Step: 494130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:10,485-Speed 2628.51 samples/sec   Loss 5.5332   LearningRate 0.0163   Epoch: 11   Global Step: 494140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:14,382-Speed 2627.89 samples/sec   Loss 5.3489   LearningRate 0.0163   Epoch: 11   Global Step: 494150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:18,283-Speed 2625.85 samples/sec   Loss 5.4557   LearningRate 0.0163   Epoch: 11   Global Step: 494160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:22,199-Speed 2615.72 samples/sec   Loss 5.3789   LearningRate 0.0163   Epoch: 11   Global Step: 494170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:26,101-Speed 2624.76 samples/sec   Loss 5.4439   LearningRate 0.0163   Epoch: 11   Global Step: 494180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:29,987-Speed 2636.24 samples/sec   Loss 5.3066   LearningRate 0.0163   Epoch: 11   Global Step: 494190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:33,889-Speed 2624.68 samples/sec   Loss 5.3483   LearningRate 0.0163   Epoch: 11   Global Step: 494200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:37,781-Speed 2631.77 samples/sec   Loss 5.3494   LearningRate 0.0163   Epoch: 11   Global Step: 494210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:41,679-Speed 2627.39 samples/sec   Loss 5.4348   LearningRate 0.0163   Epoch: 11   Global Step: 494220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:45,582-Speed 2624.56 samples/sec   Loss 5.4744   LearningRate 0.0163   Epoch: 11   Global Step: 494230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:49,475-Speed 2630.78 samples/sec   Loss 5.5058   LearningRate 0.0163   Epoch: 11   Global Step: 494240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:53,379-Speed 2623.55 samples/sec   Loss 5.4095   LearningRate 0.0163   Epoch: 11   Global Step: 494250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:15:57,275-Speed 2629.16 samples/sec   Loss 5.4226   LearningRate 0.0163   Epoch: 11   Global Step: 494260   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:16:01,168-Speed 2630.76 samples/sec   Loss 5.3679   LearningRate 0.0163   Epoch: 11   Global Step: 494270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:16:05,049-Speed 2639.33 samples/sec   Loss 5.3375   LearningRate 0.0163   Epoch: 11   Global Step: 494280   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:08,953-Speed 2623.59 samples/sec   Loss 5.5241   LearningRate 0.0163   Epoch: 11   Global Step: 494290   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:12,854-Speed 2625.20 samples/sec   Loss 5.5259   LearningRate 0.0163   Epoch: 11   Global Step: 494300   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:16,758-Speed 2624.26 samples/sec   Loss 5.4847   LearningRate 0.0163   Epoch: 11   Global Step: 494310   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:20,650-Speed 2631.52 samples/sec   Loss 5.4210   LearningRate 0.0163   Epoch: 11   Global Step: 494320   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:24,553-Speed 2624.09 samples/sec   Loss 5.4462   LearningRate 0.0163   Epoch: 11   Global Step: 494330   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:28,453-Speed 2626.17 samples/sec   Loss 5.3664   LearningRate 0.0163   Epoch: 11   Global Step: 494340   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:32,354-Speed 2626.13 samples/sec   Loss 5.4366   LearningRate 0.0163   Epoch: 11   Global Step: 494350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:36,270-Speed 2614.95 samples/sec   Loss 5.3394   LearningRate 0.0163   Epoch: 11   Global Step: 494360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:40,173-Speed 2624.18 samples/sec   Loss 5.5136   LearningRate 0.0163   Epoch: 11   Global Step: 494370   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:16:44,089-Speed 2616.28 samples/sec   Loss 5.4038   LearningRate 0.0163   Epoch: 11   Global Step: 494380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:16:47,988-Speed 2627.19 samples/sec   Loss 5.4045   LearningRate 0.0163   Epoch: 11   Global Step: 494390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:16:51,899-Speed 2618.63 samples/sec   Loss 5.3709   LearningRate 0.0163   Epoch: 11   Global Step: 494400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:16:55,805-Speed 2622.03 samples/sec   Loss 5.3521   LearningRate 0.0163   Epoch: 11   Global Step: 494410   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:16:59,683-Speed 2641.20 samples/sec   Loss 5.4176   LearningRate 0.0163   Epoch: 11   Global Step: 494420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:03,577-Speed 2630.62 samples/sec   Loss 5.3550   LearningRate 0.0163   Epoch: 11   Global Step: 494430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:07,495-Speed 2613.77 samples/sec   Loss 5.4673   LearningRate 0.0163   Epoch: 11   Global Step: 494440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:11,390-Speed 2629.30 samples/sec   Loss 5.4455   LearningRate 0.0163   Epoch: 11   Global Step: 494450   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:15,291-Speed 2626.18 samples/sec   Loss 5.3912   LearningRate 0.0163   Epoch: 11   Global Step: 494460   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:19,216-Speed 2609.69 samples/sec   Loss 5.3885   LearningRate 0.0163   Epoch: 11   Global Step: 494470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:23,115-Speed 2626.83 samples/sec   Loss 5.4535   LearningRate 0.0163   Epoch: 11   Global Step: 494480   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:27,012-Speed 2628.58 samples/sec   Loss 5.3498   LearningRate 0.0163   Epoch: 11   Global Step: 494490   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:30,908-Speed 2628.77 samples/sec   Loss 5.3438   LearningRate 0.0163   Epoch: 11   Global Step: 494500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:34,807-Speed 2626.86 samples/sec   Loss 5.2601   LearningRate 0.0163   Epoch: 11   Global Step: 494510   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:38,717-Speed 2618.95 samples/sec   Loss 5.4875   LearningRate 0.0163   Epoch: 11   Global Step: 494520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:17:42,628-Speed 2618.62 samples/sec   Loss 5.3953   LearningRate 0.0163   Epoch: 11   Global Step: 494530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:17:46,538-Speed 2620.21 samples/sec   Loss 5.4871   LearningRate 0.0163   Epoch: 11   Global Step: 494540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:17:50,412-Speed 2643.54 samples/sec   Loss 5.3276   LearningRate 0.0163   Epoch: 11   Global Step: 494550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:54,311-Speed 2627.00 samples/sec   Loss 5.3588   LearningRate 0.0163   Epoch: 11   Global Step: 494560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:17:58,210-Speed 2626.94 samples/sec   Loss 5.4477   LearningRate 0.0163   Epoch: 11   Global Step: 494570   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:02,104-Speed 2631.14 samples/sec   Loss 5.3832   LearningRate 0.0163   Epoch: 11   Global Step: 494580   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:06,000-Speed 2628.53 samples/sec   Loss 5.4300   LearningRate 0.0163   Epoch: 11   Global Step: 494590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:09,904-Speed 2623.64 samples/sec   Loss 5.3210   LearningRate 0.0163   Epoch: 11   Global Step: 494600   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:13,815-Speed 2618.74 samples/sec   Loss 5.4182   LearningRate 0.0163   Epoch: 11   Global Step: 494610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:17,743-Speed 2607.98 samples/sec   Loss 5.3681   LearningRate 0.0163   Epoch: 11   Global Step: 494620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:21,839-Speed 2500.36 samples/sec   Loss 5.2787   LearningRate 0.0163   Epoch: 11   Global Step: 494630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:25,784-Speed 2595.88 samples/sec   Loss 5.3785   LearningRate 0.0163   Epoch: 11   Global Step: 494640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:18:29,690-Speed 2622.75 samples/sec   Loss 5.4000   LearningRate 0.0163   Epoch: 11   Global Step: 494650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:18:33,594-Speed 2623.45 samples/sec   Loss 5.5049   LearningRate 0.0163   Epoch: 11   Global Step: 494660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:18:37,493-Speed 2627.08 samples/sec   Loss 5.3868   LearningRate 0.0163   Epoch: 11   Global Step: 494670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:18:41,390-Speed 2628.59 samples/sec   Loss 5.2919   LearningRate 0.0163   Epoch: 11   Global Step: 494680   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:18:45,286-Speed 2628.91 samples/sec   Loss 5.5713   LearningRate 0.0163   Epoch: 11   Global Step: 494690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:18:49,184-Speed 2627.41 samples/sec   Loss 5.4112   LearningRate 0.0163   Epoch: 11   Global Step: 494700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:18:53,078-Speed 2630.46 samples/sec   Loss 5.3823   LearningRate 0.0163   Epoch: 11   Global Step: 494710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:18:56,978-Speed 2626.41 samples/sec   Loss 5.4086   LearningRate 0.0163   Epoch: 11   Global Step: 494720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:00,884-Speed 2622.02 samples/sec   Loss 5.4314   LearningRate 0.0163   Epoch: 11   Global Step: 494730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:04,808-Speed 2610.12 samples/sec   Loss 5.3361   LearningRate 0.0163   Epoch: 11   Global Step: 494740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:08,686-Speed 2641.26 samples/sec   Loss 5.4693   LearningRate 0.0163   Epoch: 11   Global Step: 494750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:12,660-Speed 2577.14 samples/sec   Loss 5.4213   LearningRate 0.0163   Epoch: 11   Global Step: 494760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:16,556-Speed 2628.90 samples/sec   Loss 5.4027   LearningRate 0.0163   Epoch: 11   Global Step: 494770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:20,480-Speed 2610.81 samples/sec   Loss 5.3928   LearningRate 0.0163   Epoch: 11   Global Step: 494780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:24,375-Speed 2629.40 samples/sec   Loss 5.4195   LearningRate 0.0163   Epoch: 11   Global Step: 494790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:28,273-Speed 2628.18 samples/sec   Loss 5.3958   LearningRate 0.0163   Epoch: 11   Global Step: 494800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:19:32,146-Speed 2644.42 samples/sec   Loss 5.4088   LearningRate 0.0163   Epoch: 11   Global Step: 494810   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:19:36,044-Speed 2627.96 samples/sec   Loss 5.3629   LearningRate 0.0163   Epoch: 11   Global Step: 494820   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:19:39,946-Speed 2624.59 samples/sec   Loss 5.4637   LearningRate 0.0163   Epoch: 11   Global Step: 494830   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:19:43,846-Speed 2626.18 samples/sec   Loss 5.4904   LearningRate 0.0163   Epoch: 11   Global Step: 494840   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:19:47,742-Speed 2628.41 samples/sec   Loss 5.4971   LearningRate 0.0163   Epoch: 11   Global Step: 494850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:19:51,649-Speed 2622.23 samples/sec   Loss 5.4220   LearningRate 0.0163   Epoch: 11   Global Step: 494860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:19:55,551-Speed 2625.18 samples/sec   Loss 5.5046   LearningRate 0.0163   Epoch: 11   Global Step: 494870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:19:59,450-Speed 2627.08 samples/sec   Loss 5.3757   LearningRate 0.0163   Epoch: 11   Global Step: 494880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:20:03,346-Speed 2628.95 samples/sec   Loss 5.4776   LearningRate 0.0163   Epoch: 11   Global Step: 494890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:20:07,246-Speed 2626.74 samples/sec   Loss 5.4824   LearningRate 0.0163   Epoch: 11   Global Step: 494900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:20:11,152-Speed 2622.07 samples/sec   Loss 5.5730   LearningRate 0.0163   Epoch: 11   Global Step: 494910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:15,055-Speed 2624.00 samples/sec   Loss 5.4990   LearningRate 0.0163   Epoch: 11   Global Step: 494920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:18,956-Speed 2625.35 samples/sec   Loss 5.5048   LearningRate 0.0163   Epoch: 11   Global Step: 494930   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:22,856-Speed 2625.98 samples/sec   Loss 5.3638   LearningRate 0.0163   Epoch: 11   Global Step: 494940   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:26,757-Speed 2626.22 samples/sec   Loss 5.3616   LearningRate 0.0163   Epoch: 11   Global Step: 494950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:30,649-Speed 2631.28 samples/sec   Loss 5.4646   LearningRate 0.0163   Epoch: 11   Global Step: 494960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:34,545-Speed 2629.00 samples/sec   Loss 5.4050   LearningRate 0.0163   Epoch: 11   Global Step: 494970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:38,443-Speed 2628.13 samples/sec   Loss 5.3387   LearningRate 0.0163   Epoch: 11   Global Step: 494980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:42,337-Speed 2630.44 samples/sec   Loss 5.3990   LearningRate 0.0163   Epoch: 11   Global Step: 494990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:46,229-Speed 2631.60 samples/sec   Loss 5.3979   LearningRate 0.0163   Epoch: 11   Global Step: 495000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:50,146-Speed 2614.79 samples/sec   Loss 5.3098   LearningRate 0.0163   Epoch: 11   Global Step: 495010   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-04-15 03:20:54,026-Speed 2639.57 samples/sec   Loss 5.3470   LearningRate 0.0163   Epoch: 11   Global Step: 495020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:20:57,936-Speed 2619.99 samples/sec   Loss 5.3414   LearningRate 0.0163   Epoch: 11   Global Step: 495030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:01,927-Speed 2566.01 samples/sec   Loss 5.4374   LearningRate 0.0163   Epoch: 11   Global Step: 495040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:05,918-Speed 2566.24 samples/sec   Loss 5.4010   LearningRate 0.0163   Epoch: 11   Global Step: 495050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:09,811-Speed 2630.79 samples/sec   Loss 5.4164   LearningRate 0.0163   Epoch: 11   Global Step: 495060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:13,802-Speed 2566.48 samples/sec   Loss 5.4978   LearningRate 0.0163   Epoch: 11   Global Step: 495070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:17,714-Speed 2618.51 samples/sec   Loss 5.3296   LearningRate 0.0163   Epoch: 11   Global Step: 495080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:21,611-Speed 2627.78 samples/sec   Loss 5.4085   LearningRate 0.0163   Epoch: 11   Global Step: 495090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:25,513-Speed 2625.49 samples/sec   Loss 5.4197   LearningRate 0.0163   Epoch: 11   Global Step: 495100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:29,409-Speed 2628.69 samples/sec   Loss 5.4774   LearningRate 0.0163   Epoch: 11   Global Step: 495110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:33,283-Speed 2643.96 samples/sec   Loss 5.3709   LearningRate 0.0163   Epoch: 11   Global Step: 495120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:37,200-Speed 2614.47 samples/sec   Loss 5.3947   LearningRate 0.0163   Epoch: 11   Global Step: 495130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:41,095-Speed 2629.60 samples/sec   Loss 5.4270   LearningRate 0.0163   Epoch: 11   Global Step: 495140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:44,992-Speed 2628.13 samples/sec   Loss 5.3438   LearningRate 0.0163   Epoch: 11   Global Step: 495150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:48,888-Speed 2629.55 samples/sec   Loss 5.3337   LearningRate 0.0163   Epoch: 11   Global Step: 495160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:52,789-Speed 2625.98 samples/sec   Loss 5.3423   LearningRate 0.0162   Epoch: 11   Global Step: 495170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:21:56,692-Speed 2623.81 samples/sec   Loss 5.3955   LearningRate 0.0162   Epoch: 11   Global Step: 495180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:22:00,599-Speed 2621.84 samples/sec   Loss 5.3795   LearningRate 0.0162   Epoch: 11   Global Step: 495190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:22:04,495-Speed 2628.61 samples/sec   Loss 5.3691   LearningRate 0.0162   Epoch: 11   Global Step: 495200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:22:08,399-Speed 2623.85 samples/sec   Loss 5.3029   LearningRate 0.0162   Epoch: 11   Global Step: 495210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:12,312-Speed 2617.08 samples/sec   Loss 5.4414   LearningRate 0.0162   Epoch: 11   Global Step: 495220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:16,215-Speed 2624.53 samples/sec   Loss 5.3794   LearningRate 0.0162   Epoch: 11   Global Step: 495230   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:20,117-Speed 2624.95 samples/sec   Loss 5.3210   LearningRate 0.0162   Epoch: 11   Global Step: 495240   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:24,012-Speed 2629.89 samples/sec   Loss 5.3859   LearningRate 0.0162   Epoch: 11   Global Step: 495250   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:27,933-Speed 2612.61 samples/sec   Loss 5.2910   LearningRate 0.0162   Epoch: 11   Global Step: 495260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:31,830-Speed 2628.18 samples/sec   Loss 5.2938   LearningRate 0.0162   Epoch: 11   Global Step: 495270   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:35,791-Speed 2585.44 samples/sec   Loss 5.4137   LearningRate 0.0162   Epoch: 11   Global Step: 495280   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:39,685-Speed 2630.54 samples/sec   Loss 5.4036   LearningRate 0.0162   Epoch: 11   Global Step: 495290   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:43,581-Speed 2629.26 samples/sec   Loss 5.3444   LearningRate 0.0162   Epoch: 11   Global Step: 495300   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:22:47,481-Speed 2626.64 samples/sec   Loss 5.3190   LearningRate 0.0162   Epoch: 11   Global Step: 495310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:22:51,379-Speed 2627.19 samples/sec   Loss 5.2974   LearningRate 0.0162   Epoch: 11   Global Step: 495320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:22:55,285-Speed 2629.04 samples/sec   Loss 5.3867   LearningRate 0.0162   Epoch: 11   Global Step: 495330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:22:59,183-Speed 2627.28 samples/sec   Loss 5.4667   LearningRate 0.0162   Epoch: 11   Global Step: 495340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:23:03,088-Speed 2623.06 samples/sec   Loss 5.4201   LearningRate 0.0162   Epoch: 11   Global Step: 495350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:23:07,009-Speed 2612.49 samples/sec   Loss 5.2836   LearningRate 0.0162   Epoch: 11   Global Step: 495360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:23:10,923-Speed 2616.60 samples/sec   Loss 5.3100   LearningRate 0.0162   Epoch: 11   Global Step: 495370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:23:14,803-Speed 2639.78 samples/sec   Loss 5.4809   LearningRate 0.0162   Epoch: 11   Global Step: 495380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:18,806-Speed 2559.23 samples/sec   Loss 5.4653   LearningRate 0.0162   Epoch: 11   Global Step: 495390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:22,704-Speed 2627.64 samples/sec   Loss 5.3140   LearningRate 0.0162   Epoch: 11   Global Step: 495400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:26,615-Speed 2619.57 samples/sec   Loss 5.3708   LearningRate 0.0162   Epoch: 11   Global Step: 495410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:30,509-Speed 2630.40 samples/sec   Loss 5.3876   LearningRate 0.0162   Epoch: 11   Global Step: 495420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:34,408-Speed 2626.26 samples/sec   Loss 5.4327   LearningRate 0.0162   Epoch: 11   Global Step: 495430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:38,304-Speed 2629.57 samples/sec   Loss 5.3996   LearningRate 0.0162   Epoch: 11   Global Step: 495440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:42,213-Speed 2619.94 samples/sec   Loss 5.4249   LearningRate 0.0162   Epoch: 11   Global Step: 495450   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:46,109-Speed 2629.12 samples/sec   Loss 5.4569   LearningRate 0.0162   Epoch: 11   Global Step: 495460   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:50,006-Speed 2628.17 samples/sec   Loss 5.4435   LearningRate 0.0162   Epoch: 11   Global Step: 495470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:23:53,933-Speed 2608.64 samples/sec   Loss 5.3153   LearningRate 0.0162   Epoch: 11   Global Step: 495480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:23:57,907-Speed 2577.83 samples/sec   Loss 5.4369   LearningRate 0.0162   Epoch: 11   Global Step: 495490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:01,820-Speed 2617.31 samples/sec   Loss 5.2637   LearningRate 0.0162   Epoch: 11   Global Step: 495500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:05,715-Speed 2629.80 samples/sec   Loss 5.4604   LearningRate 0.0162   Epoch: 11   Global Step: 495510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:09,620-Speed 2622.72 samples/sec   Loss 5.4839   LearningRate 0.0162   Epoch: 11   Global Step: 495520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:13,519-Speed 2627.18 samples/sec   Loss 5.4195   LearningRate 0.0162   Epoch: 11   Global Step: 495530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:17,423-Speed 2623.48 samples/sec   Loss 5.3204   LearningRate 0.0162   Epoch: 11   Global Step: 495540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:21,338-Speed 2616.06 samples/sec   Loss 5.3872   LearningRate 0.0162   Epoch: 11   Global Step: 495550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:25,246-Speed 2620.88 samples/sec   Loss 5.4299   LearningRate 0.0162   Epoch: 11   Global Step: 495560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:29,164-Speed 2615.14 samples/sec   Loss 5.4836   LearningRate 0.0162   Epoch: 11   Global Step: 495570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:33,051-Speed 2635.07 samples/sec   Loss 5.3831   LearningRate 0.0162   Epoch: 11   Global Step: 495580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:36,954-Speed 2624.23 samples/sec   Loss 5.3503   LearningRate 0.0162   Epoch: 11   Global Step: 495590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:40,864-Speed 2619.10 samples/sec   Loss 5.4696   LearningRate 0.0162   Epoch: 11   Global Step: 495600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:44,767-Speed 2624.06 samples/sec   Loss 5.3711   LearningRate 0.0162   Epoch: 11   Global Step: 495610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:48,668-Speed 2625.55 samples/sec   Loss 5.4314   LearningRate 0.0162   Epoch: 11   Global Step: 495620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:52,582-Speed 2617.38 samples/sec   Loss 5.4473   LearningRate 0.0162   Epoch: 11   Global Step: 495630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:24:56,478-Speed 2628.43 samples/sec   Loss 5.4206   LearningRate 0.0162   Epoch: 11   Global Step: 495640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:00,376-Speed 2628.20 samples/sec   Loss 5.3914   LearningRate 0.0162   Epoch: 11   Global Step: 495650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:04,271-Speed 2629.53 samples/sec   Loss 5.3974   LearningRate 0.0162   Epoch: 11   Global Step: 495660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:08,205-Speed 2604.24 samples/sec   Loss 5.4993   LearningRate 0.0162   Epoch: 11   Global Step: 495670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:12,081-Speed 2642.41 samples/sec   Loss 5.5371   LearningRate 0.0162   Epoch: 11   Global Step: 495680   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:15,997-Speed 2615.20 samples/sec   Loss 5.3779   LearningRate 0.0162   Epoch: 11   Global Step: 495690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:19,902-Speed 2622.75 samples/sec   Loss 5.4892   LearningRate 0.0162   Epoch: 11   Global Step: 495700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:23,839-Speed 2601.66 samples/sec   Loss 5.3199   LearningRate 0.0162   Epoch: 11   Global Step: 495710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:27,741-Speed 2625.84 samples/sec   Loss 5.4320   LearningRate 0.0162   Epoch: 11   Global Step: 495720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:31,683-Speed 2597.77 samples/sec   Loss 5.2834   LearningRate 0.0162   Epoch: 11   Global Step: 495730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-04-15 03:25:35,551-Speed 2649.03 samples/sec   Loss 5.4804   LearningRate 0.0162   Epoch: 11   Global Step: 495740   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:25:39,470-Speed 2613.26 samples/sec   Loss 5.4371   LearningRate 0.0162   Epoch: 11   Global Step: 495750   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:25:43,370-Speed 2626.53 samples/sec   Loss 5.3920   LearningRate 0.0162   Epoch: 11   Global Step: 495760   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:25:47,278-Speed 2620.53 samples/sec   Loss 5.4116   LearningRate 0.0162   Epoch: 11   Global Step: 495770   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:25:51,179-Speed 2626.31 samples/sec   Loss 5.4419   LearningRate 0.0162   Epoch: 11   Global Step: 495780   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-04-15 03:25:55,102-Speed 2610.45 samples/sec   Loss 5.4122   LearningRate 0.0162   Epoch: 11   Global Step: 495790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:25:59,017-Speed 2616.71 samples/sec   Loss 5.3847   LearningRate 0.0162   Epoch: 11   Global Step: 495800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:02,913-Speed 2629.24 samples/sec   Loss 5.3194   LearningRate 0.0162   Epoch: 11   Global Step: 495810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:06,829-Speed 2615.46 samples/sec   Loss 5.4011   LearningRate 0.0162   Epoch: 11   Global Step: 495820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:10,724-Speed 2629.42 samples/sec   Loss 5.4150   LearningRate 0.0162   Epoch: 11   Global Step: 495830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:14,637-Speed 2618.08 samples/sec   Loss 5.4582   LearningRate 0.0162   Epoch: 11   Global Step: 495840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:26:18,536-Speed 2626.31 samples/sec   Loss 5.4786   LearningRate 0.0162   Epoch: 11   Global Step: 495850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:26:22,414-Speed 2641.51 samples/sec   Loss 5.4388   LearningRate 0.0162   Epoch: 11   Global Step: 495860   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:26,306-Speed 2631.52 samples/sec   Loss 5.3460   LearningRate 0.0162   Epoch: 11   Global Step: 495870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:30,216-Speed 2619.45 samples/sec   Loss 5.2711   LearningRate 0.0162   Epoch: 11   Global Step: 495880   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:34,117-Speed 2626.26 samples/sec   Loss 5.4409   LearningRate 0.0162   Epoch: 11   Global Step: 495890   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:38,046-Speed 2607.06 samples/sec   Loss 5.4456   LearningRate 0.0162   Epoch: 11   Global Step: 495900   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:41,948-Speed 2624.70 samples/sec   Loss 5.5611   LearningRate 0.0162   Epoch: 11   Global Step: 495910   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:45,840-Speed 2631.94 samples/sec   Loss 5.4234   LearningRate 0.0162   Epoch: 11   Global Step: 495920   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:49,733-Speed 2630.92 samples/sec   Loss 5.2915   LearningRate 0.0162   Epoch: 11   Global Step: 495930   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:53,640-Speed 2621.88 samples/sec   Loss 5.5029   LearningRate 0.0162   Epoch: 11   Global Step: 495940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:26:57,543-Speed 2623.77 samples/sec   Loss 5.3845   LearningRate 0.0162   Epoch: 11   Global Step: 495950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:27:01,450-Speed 2621.85 samples/sec   Loss 5.3116   LearningRate 0.0162   Epoch: 11   Global Step: 495960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:05,338-Speed 2633.98 samples/sec   Loss 5.3895   LearningRate 0.0162   Epoch: 11   Global Step: 495970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:09,230-Speed 2631.70 samples/sec   Loss 5.3960   LearningRate 0.0162   Epoch: 11   Global Step: 495980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:13,120-Speed 2633.51 samples/sec   Loss 5.4059   LearningRate 0.0162   Epoch: 11   Global Step: 495990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:17,016-Speed 2628.89 samples/sec   Loss 5.4670   LearningRate 0.0162   Epoch: 11   Global Step: 496000   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:20,910-Speed 2630.75 samples/sec   Loss 5.4658   LearningRate 0.0162   Epoch: 11   Global Step: 496010   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:24,831-Speed 2611.56 samples/sec   Loss 5.3306   LearningRate 0.0162   Epoch: 11   Global Step: 496020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:28,726-Speed 2629.90 samples/sec   Loss 5.4488   LearningRate 0.0162   Epoch: 11   Global Step: 496030   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:32,637-Speed 2618.54 samples/sec   Loss 5.3215   LearningRate 0.0162   Epoch: 11   Global Step: 496040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:27:36,542-Speed 2623.05 samples/sec   Loss 5.3634   LearningRate 0.0162   Epoch: 11   Global Step: 496050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:27:40,458-Speed 2615.29 samples/sec   Loss 5.3674   LearningRate 0.0162   Epoch: 11   Global Step: 496060   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:27:44,361-Speed 2623.88 samples/sec   Loss 5.3770   LearningRate 0.0162   Epoch: 11   Global Step: 496070   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:27:48,265-Speed 2624.16 samples/sec   Loss 5.3522   LearningRate 0.0162   Epoch: 11   Global Step: 496080   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:27:52,168-Speed 2624.70 samples/sec   Loss 5.4413   LearningRate 0.0162   Epoch: 11   Global Step: 496090   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:27:56,064-Speed 2629.62 samples/sec   Loss 5.4230   LearningRate 0.0162   Epoch: 11   Global Step: 496100   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:27:59,963-Speed 2626.47 samples/sec   Loss 5.3053   LearningRate 0.0162   Epoch: 11   Global Step: 496110   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:03,864-Speed 2625.80 samples/sec   Loss 5.4497   LearningRate 0.0162   Epoch: 11   Global Step: 496120   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:07,760-Speed 2628.35 samples/sec   Loss 5.4506   LearningRate 0.0162   Epoch: 11   Global Step: 496130   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:11,653-Speed 2631.47 samples/sec   Loss 5.3508   LearningRate 0.0162   Epoch: 11   Global Step: 496140   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:15,549-Speed 2628.46 samples/sec   Loss 5.3957   LearningRate 0.0162   Epoch: 11   Global Step: 496150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:28:19,467-Speed 2614.51 samples/sec   Loss 5.3995   LearningRate 0.0162   Epoch: 11   Global Step: 496160   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:28:23,386-Speed 2613.55 samples/sec   Loss 5.3330   LearningRate 0.0162   Epoch: 11   Global Step: 496170   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:28:27,284-Speed 2628.09 samples/sec   Loss 5.3894   LearningRate 0.0162   Epoch: 11   Global Step: 496180   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:28:31,184-Speed 2626.00 samples/sec   Loss 5.3134   LearningRate 0.0162   Epoch: 11   Global Step: 496190   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:28:35,078-Speed 2630.39 samples/sec   Loss 5.2502   LearningRate 0.0161   Epoch: 11   Global Step: 496200   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:28:38,975-Speed 2628.01 samples/sec   Loss 5.3856   LearningRate 0.0161   Epoch: 11   Global Step: 496210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:42,873-Speed 2628.16 samples/sec   Loss 5.3728   LearningRate 0.0161   Epoch: 11   Global Step: 496220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:46,792-Speed 2613.60 samples/sec   Loss 5.3956   LearningRate 0.0161   Epoch: 11   Global Step: 496230   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:50,693-Speed 2625.68 samples/sec   Loss 5.3371   LearningRate 0.0161   Epoch: 11   Global Step: 496240   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:54,600-Speed 2622.10 samples/sec   Loss 5.3989   LearningRate 0.0161   Epoch: 11   Global Step: 496250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:28:58,489-Speed 2633.33 samples/sec   Loss 5.3293   LearningRate 0.0161   Epoch: 11   Global Step: 496260   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:02,411-Speed 2611.59 samples/sec   Loss 5.3861   LearningRate 0.0161   Epoch: 11   Global Step: 496270   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:06,318-Speed 2621.38 samples/sec   Loss 5.3678   LearningRate 0.0161   Epoch: 11   Global Step: 496280   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:10,216-Speed 2628.19 samples/sec   Loss 5.3004   LearningRate 0.0161   Epoch: 11   Global Step: 496290   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:14,132-Speed 2615.48 samples/sec   Loss 5.4349   LearningRate 0.0161   Epoch: 11   Global Step: 496300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:18,034-Speed 2625.08 samples/sec   Loss 5.3798   LearningRate 0.0161   Epoch: 11   Global Step: 496310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:29:21,920-Speed 2636.16 samples/sec   Loss 5.4228   LearningRate 0.0161   Epoch: 11   Global Step: 496320   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:25,838-Speed 2614.32 samples/sec   Loss 5.3042   LearningRate 0.0161   Epoch: 11   Global Step: 496330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:29,748-Speed 2619.86 samples/sec   Loss 5.4220   LearningRate 0.0161   Epoch: 11   Global Step: 496340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:33,642-Speed 2630.14 samples/sec   Loss 5.3629   LearningRate 0.0161   Epoch: 11   Global Step: 496350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:37,568-Speed 2608.24 samples/sec   Loss 5.2647   LearningRate 0.0161   Epoch: 11   Global Step: 496360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:41,482-Speed 2618.08 samples/sec   Loss 5.3965   LearningRate 0.0161   Epoch: 11   Global Step: 496370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:45,396-Speed 2617.12 samples/sec   Loss 5.4643   LearningRate 0.0161   Epoch: 11   Global Step: 496380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:49,290-Speed 2630.11 samples/sec   Loss 5.3342   LearningRate 0.0161   Epoch: 11   Global Step: 496390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:53,183-Speed 2631.34 samples/sec   Loss 5.3537   LearningRate 0.0161   Epoch: 11   Global Step: 496400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:29:57,083-Speed 2626.22 samples/sec   Loss 5.3991   LearningRate 0.0161   Epoch: 11   Global Step: 496410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:00,961-Speed 2641.05 samples/sec   Loss 5.2606   LearningRate 0.0161   Epoch: 11   Global Step: 496420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:04,854-Speed 2631.07 samples/sec   Loss 5.3632   LearningRate 0.0161   Epoch: 11   Global Step: 496430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:08,743-Speed 2633.66 samples/sec   Loss 5.3111   LearningRate 0.0161   Epoch: 11   Global Step: 496440   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:12,636-Speed 2631.53 samples/sec   Loss 5.4664   LearningRate 0.0161   Epoch: 11   Global Step: 496450   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:16,528-Speed 2631.73 samples/sec   Loss 5.4536   LearningRate 0.0161   Epoch: 11   Global Step: 496460   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:20,421-Speed 2630.80 samples/sec   Loss 5.3356   LearningRate 0.0161   Epoch: 11   Global Step: 496470   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:24,325-Speed 2625.01 samples/sec   Loss 5.4816   LearningRate 0.0161   Epoch: 11   Global Step: 496480   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:28,216-Speed 2632.59 samples/sec   Loss 5.3660   LearningRate 0.0161   Epoch: 11   Global Step: 496490   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:32,113-Speed 2627.74 samples/sec   Loss 5.4298   LearningRate 0.0161   Epoch: 11   Global Step: 496500   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:36,009-Speed 2629.67 samples/sec   Loss 5.4173   LearningRate 0.0161   Epoch: 11   Global Step: 496510   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:30:39,905-Speed 2629.15 samples/sec   Loss 5.4543   LearningRate 0.0161   Epoch: 11   Global Step: 496520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:30:43,822-Speed 2614.82 samples/sec   Loss 5.3068   LearningRate 0.0161   Epoch: 11   Global Step: 496530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:30:47,727-Speed 2623.14 samples/sec   Loss 5.3398   LearningRate 0.0161   Epoch: 11   Global Step: 496540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:30:51,617-Speed 2632.95 samples/sec   Loss 5.3837   LearningRate 0.0161   Epoch: 11   Global Step: 496550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:30:55,515-Speed 2627.61 samples/sec   Loss 5.4691   LearningRate 0.0161   Epoch: 11   Global Step: 496560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:30:59,397-Speed 2639.78 samples/sec   Loss 5.2983   LearningRate 0.0161   Epoch: 11   Global Step: 496570   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:03,289-Speed 2631.31 samples/sec   Loss 5.3992   LearningRate 0.0161   Epoch: 11   Global Step: 496580   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:07,191-Speed 2625.45 samples/sec   Loss 5.3991   LearningRate 0.0161   Epoch: 11   Global Step: 496590   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:11,097-Speed 2622.18 samples/sec   Loss 5.4002   LearningRate 0.0161   Epoch: 11   Global Step: 496600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:14,997-Speed 2625.79 samples/sec   Loss 5.3746   LearningRate 0.0161   Epoch: 11   Global Step: 496610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:18,949-Speed 2591.67 samples/sec   Loss 5.4174   LearningRate 0.0161   Epoch: 11   Global Step: 496620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:22,845-Speed 2629.34 samples/sec   Loss 5.3420   LearningRate 0.0161   Epoch: 11   Global Step: 496630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:26,762-Speed 2615.06 samples/sec   Loss 5.3972   LearningRate 0.0161   Epoch: 11   Global Step: 496640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:30,655-Speed 2630.59 samples/sec   Loss 5.2885   LearningRate 0.0161   Epoch: 11   Global Step: 496650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:34,554-Speed 2627.15 samples/sec   Loss 5.2476   LearningRate 0.0161   Epoch: 11   Global Step: 496660   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:31:38,449-Speed 2629.83 samples/sec   Loss 5.4630   LearningRate 0.0161   Epoch: 11   Global Step: 496670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:31:42,340-Speed 2632.81 samples/sec   Loss 5.4495   LearningRate 0.0161   Epoch: 11   Global Step: 496680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:31:46,232-Speed 2631.28 samples/sec   Loss 5.4104   LearningRate 0.0161   Epoch: 11   Global Step: 496690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:31:50,130-Speed 2627.15 samples/sec   Loss 5.3744   LearningRate 0.0161   Epoch: 11   Global Step: 496700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:31:54,032-Speed 2625.23 samples/sec   Loss 5.4864   LearningRate 0.0161   Epoch: 11   Global Step: 496710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:31:57,910-Speed 2641.69 samples/sec   Loss 5.5087   LearningRate 0.0161   Epoch: 11   Global Step: 496720   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:01,810-Speed 2626.27 samples/sec   Loss 5.2800   LearningRate 0.0161   Epoch: 11   Global Step: 496730   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:05,707-Speed 2628.66 samples/sec   Loss 5.3941   LearningRate 0.0161   Epoch: 11   Global Step: 496740   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:09,637-Speed 2606.06 samples/sec   Loss 5.4796   LearningRate 0.0161   Epoch: 11   Global Step: 496750   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:13,535-Speed 2627.53 samples/sec   Loss 5.2480   LearningRate 0.0161   Epoch: 11   Global Step: 496760   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:17,435-Speed 2626.71 samples/sec   Loss 5.3403   LearningRate 0.0161   Epoch: 11   Global Step: 496770   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:21,332-Speed 2630.51 samples/sec   Loss 5.2947   LearningRate 0.0161   Epoch: 11   Global Step: 496780   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:25,224-Speed 2631.22 samples/sec   Loss 5.3268   LearningRate 0.0161   Epoch: 11   Global Step: 496790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:29,122-Speed 2628.03 samples/sec   Loss 5.4071   LearningRate 0.0161   Epoch: 11   Global Step: 496800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:33,019-Speed 2628.06 samples/sec   Loss 5.4734   LearningRate 0.0161   Epoch: 11   Global Step: 496810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:32:36,914-Speed 2630.08 samples/sec   Loss 5.4098   LearningRate 0.0161   Epoch: 11   Global Step: 496820   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:32:40,811-Speed 2628.22 samples/sec   Loss 5.3394   LearningRate 0.0161   Epoch: 11   Global Step: 496830   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:32:44,751-Speed 2599.83 samples/sec   Loss 5.3409   LearningRate 0.0161   Epoch: 11   Global Step: 496840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:32:48,668-Speed 2614.72 samples/sec   Loss 5.2672   LearningRate 0.0161   Epoch: 11   Global Step: 496850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:32:52,577-Speed 2620.36 samples/sec   Loss 5.3790   LearningRate 0.0161   Epoch: 11   Global Step: 496860   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:32:56,476-Speed 2626.83 samples/sec   Loss 5.3539   LearningRate 0.0161   Epoch: 11   Global Step: 496870   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:00,370-Speed 2630.38 samples/sec   Loss 5.3072   LearningRate 0.0161   Epoch: 11   Global Step: 496880   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:04,277-Speed 2621.71 samples/sec   Loss 5.3635   LearningRate 0.0161   Epoch: 11   Global Step: 496890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:08,172-Speed 2630.12 samples/sec   Loss 5.3581   LearningRate 0.0161   Epoch: 11   Global Step: 496900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:12,064-Speed 2631.50 samples/sec   Loss 5.4317   LearningRate 0.0161   Epoch: 11   Global Step: 496910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:15,938-Speed 2643.96 samples/sec   Loss 5.3521   LearningRate 0.0161   Epoch: 11   Global Step: 496920   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:19,845-Speed 2622.04 samples/sec   Loss 5.3570   LearningRate 0.0161   Epoch: 11   Global Step: 496930   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:23,738-Speed 2630.79 samples/sec   Loss 5.2760   LearningRate 0.0161   Epoch: 11   Global Step: 496940   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:27,637-Speed 2626.79 samples/sec   Loss 5.3147   LearningRate 0.0161   Epoch: 11   Global Step: 496950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:31,534-Speed 2628.62 samples/sec   Loss 5.4243   LearningRate 0.0161   Epoch: 11   Global Step: 496960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:33:35,407-Speed 2644.62 samples/sec   Loss 5.4060   LearningRate 0.0161   Epoch: 11   Global Step: 496970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:33:39,305-Speed 2627.16 samples/sec   Loss 5.3330   LearningRate 0.0161   Epoch: 11   Global Step: 496980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:33:43,202-Speed 2628.80 samples/sec   Loss 5.3830   LearningRate 0.0161   Epoch: 11   Global Step: 496990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:33:47,102-Speed 2626.28 samples/sec   Loss 5.4882   LearningRate 0.0161   Epoch: 11   Global Step: 497000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:33:51,007-Speed 2623.19 samples/sec   Loss 5.4503   LearningRate 0.0161   Epoch: 11   Global Step: 497010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:33:54,902-Speed 2629.57 samples/sec   Loss 5.3069   LearningRate 0.0161   Epoch: 11   Global Step: 497020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:33:58,818-Speed 2616.48 samples/sec   Loss 5.3900   LearningRate 0.0161   Epoch: 11   Global Step: 497030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:02,737-Speed 2613.23 samples/sec   Loss 5.4118   LearningRate 0.0161   Epoch: 11   Global Step: 497040   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:06,633-Speed 2629.35 samples/sec   Loss 5.2966   LearningRate 0.0161   Epoch: 11   Global Step: 497050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:10,529-Speed 2628.88 samples/sec   Loss 5.3809   LearningRate 0.0161   Epoch: 11   Global Step: 497060   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:14,425-Speed 2629.52 samples/sec   Loss 5.3830   LearningRate 0.0161   Epoch: 11   Global Step: 497070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:34:18,333-Speed 2621.31 samples/sec   Loss 5.4053   LearningRate 0.0161   Epoch: 11   Global Step: 497080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:34:22,227-Speed 2630.05 samples/sec   Loss 5.3159   LearningRate 0.0161   Epoch: 11   Global Step: 497090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:34:26,127-Speed 2627.27 samples/sec   Loss 5.3945   LearningRate 0.0161   Epoch: 11   Global Step: 497100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:34:30,044-Speed 2614.96 samples/sec   Loss 5.3073   LearningRate 0.0161   Epoch: 11   Global Step: 497110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:34:33,956-Speed 2618.43 samples/sec   Loss 5.4003   LearningRate 0.0161   Epoch: 11   Global Step: 497120   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:34:37,968-Speed 2552.72 samples/sec   Loss 5.3688   LearningRate 0.0161   Epoch: 11   Global Step: 497130   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:42,061-Speed 2502.78 samples/sec   Loss 5.3449   LearningRate 0.0161   Epoch: 11   Global Step: 497140   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:46,156-Speed 2501.07 samples/sec   Loss 5.3631   LearningRate 0.0161   Epoch: 11   Global Step: 497150   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:50,063-Speed 2622.08 samples/sec   Loss 5.3752   LearningRate 0.0161   Epoch: 11   Global Step: 497160   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:53,959-Speed 2629.01 samples/sec   Loss 5.4163   LearningRate 0.0161   Epoch: 11   Global Step: 497170   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:34:57,855-Speed 2629.30 samples/sec   Loss 5.3107   LearningRate 0.0161   Epoch: 11   Global Step: 497180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:01,750-Speed 2629.96 samples/sec   Loss 5.3728   LearningRate 0.0161   Epoch: 11   Global Step: 497190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:05,645-Speed 2629.15 samples/sec   Loss 5.4128   LearningRate 0.0161   Epoch: 11   Global Step: 497200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:09,550-Speed 2622.79 samples/sec   Loss 5.3963   LearningRate 0.0161   Epoch: 11   Global Step: 497210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:13,459-Speed 2620.81 samples/sec   Loss 5.4702   LearningRate 0.0161   Epoch: 11   Global Step: 497220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:17,355-Speed 2628.93 samples/sec   Loss 5.2847   LearningRate 0.0160   Epoch: 11   Global Step: 497230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:35:21,250-Speed 2630.00 samples/sec   Loss 5.3402   LearningRate 0.0160   Epoch: 11   Global Step: 497240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:35:25,154-Speed 2623.59 samples/sec   Loss 5.4060   LearningRate 0.0160   Epoch: 11   Global Step: 497250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:35:29,050-Speed 2629.73 samples/sec   Loss 5.3935   LearningRate 0.0160   Epoch: 11   Global Step: 497260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:35:33,037-Speed 2569.08 samples/sec   Loss 5.1841   LearningRate 0.0160   Epoch: 11   Global Step: 497270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:35:36,938-Speed 2625.43 samples/sec   Loss 5.3888   LearningRate 0.0160   Epoch: 11   Global Step: 497280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:35:40,837-Speed 2627.13 samples/sec   Loss 5.4054   LearningRate 0.0160   Epoch: 11   Global Step: 497290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:35:44,718-Speed 2639.84 samples/sec   Loss 5.3671   LearningRate 0.0160   Epoch: 11   Global Step: 497300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:48,613-Speed 2629.45 samples/sec   Loss 5.3908   LearningRate 0.0160   Epoch: 11   Global Step: 497310   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:52,517-Speed 2623.52 samples/sec   Loss 5.5381   LearningRate 0.0160   Epoch: 11   Global Step: 497320   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:35:56,427-Speed 2619.76 samples/sec   Loss 5.3124   LearningRate 0.0160   Epoch: 11   Global Step: 497330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:36:00,350-Speed 2610.74 samples/sec   Loss 5.4081   LearningRate 0.0160   Epoch: 11   Global Step: 497340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:36:04,253-Speed 2624.70 samples/sec   Loss 5.3247   LearningRate 0.0160   Epoch: 11   Global Step: 497350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:36:08,152-Speed 2626.89 samples/sec   Loss 5.4551   LearningRate 0.0160   Epoch: 11   Global Step: 497360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:36:12,093-Speed 2598.65 samples/sec   Loss 5.3166   LearningRate 0.0160   Epoch: 11   Global Step: 497370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:36:15,988-Speed 2630.11 samples/sec   Loss 5.4808   LearningRate 0.0160   Epoch: 11   Global Step: 497380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:36:19,887-Speed 2626.74 samples/sec   Loss 5.4047   LearningRate 0.0160   Epoch: 11   Global Step: 497390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:36:23,873-Speed 2570.26 samples/sec   Loss 5.3428   LearningRate 0.0160   Epoch: 11   Global Step: 497400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:27,767-Speed 2630.02 samples/sec   Loss 5.4875   LearningRate 0.0160   Epoch: 11   Global Step: 497410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:31,690-Speed 2611.08 samples/sec   Loss 5.4134   LearningRate 0.0160   Epoch: 11   Global Step: 497420   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:35,638-Speed 2594.62 samples/sec   Loss 5.3209   LearningRate 0.0160   Epoch: 11   Global Step: 497430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:39,537-Speed 2627.45 samples/sec   Loss 5.3739   LearningRate 0.0160   Epoch: 11   Global Step: 497440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:43,440-Speed 2624.20 samples/sec   Loss 5.4131   LearningRate 0.0160   Epoch: 11   Global Step: 497450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:47,347-Speed 2621.44 samples/sec   Loss 5.3682   LearningRate 0.0160   Epoch: 11   Global Step: 497460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:51,243-Speed 2628.41 samples/sec   Loss 5.3813   LearningRate 0.0160   Epoch: 11   Global Step: 497470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:55,136-Speed 2630.96 samples/sec   Loss 5.3965   LearningRate 0.0160   Epoch: 11   Global Step: 497480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:36:59,033-Speed 2629.15 samples/sec   Loss 5.3628   LearningRate 0.0160   Epoch: 11   Global Step: 497490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:02,908-Speed 2642.64 samples/sec   Loss 5.4324   LearningRate 0.0160   Epoch: 11   Global Step: 497500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:06,803-Speed 2629.82 samples/sec   Loss 5.3542   LearningRate 0.0160   Epoch: 11   Global Step: 497510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:10,704-Speed 2625.62 samples/sec   Loss 5.3702   LearningRate 0.0160   Epoch: 11   Global Step: 497520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:14,602-Speed 2627.63 samples/sec   Loss 5.3821   LearningRate 0.0160   Epoch: 11   Global Step: 497530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:18,506-Speed 2623.13 samples/sec   Loss 5.3860   LearningRate 0.0160   Epoch: 11   Global Step: 497540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:22,404-Speed 2627.96 samples/sec   Loss 5.3911   LearningRate 0.0160   Epoch: 11   Global Step: 497550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:26,294-Speed 2633.26 samples/sec   Loss 5.3828   LearningRate 0.0160   Epoch: 11   Global Step: 497560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:30,218-Speed 2610.24 samples/sec   Loss 5.2587   LearningRate 0.0160   Epoch: 11   Global Step: 497570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:34,112-Speed 2630.72 samples/sec   Loss 5.3717   LearningRate 0.0160   Epoch: 11   Global Step: 497580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:38,006-Speed 2630.02 samples/sec   Loss 5.3860   LearningRate 0.0160   Epoch: 11   Global Step: 497590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:41,885-Speed 2640.78 samples/sec   Loss 5.3798   LearningRate 0.0160   Epoch: 11   Global Step: 497600   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:45,799-Speed 2616.19 samples/sec   Loss 5.4253   LearningRate 0.0160   Epoch: 11   Global Step: 497610   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:49,698-Speed 2627.32 samples/sec   Loss 5.3404   LearningRate 0.0160   Epoch: 11   Global Step: 497620   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:37:53,585-Speed 2635.10 samples/sec   Loss 5.3727   LearningRate 0.0160   Epoch: 11   Global Step: 497630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:37:57,513-Speed 2607.21 samples/sec   Loss 5.3260   LearningRate 0.0160   Epoch: 11   Global Step: 497640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:01,408-Speed 2629.71 samples/sec   Loss 5.3501   LearningRate 0.0160   Epoch: 11   Global Step: 497650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:05,306-Speed 2628.11 samples/sec   Loss 5.3638   LearningRate 0.0160   Epoch: 11   Global Step: 497660   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:09,202-Speed 2628.96 samples/sec   Loss 5.4628   LearningRate 0.0160   Epoch: 11   Global Step: 497670   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:13,099-Speed 2628.01 samples/sec   Loss 5.3558   LearningRate 0.0160   Epoch: 11   Global Step: 497680   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:16,994-Speed 2629.49 samples/sec   Loss 5.3797   LearningRate 0.0160   Epoch: 11   Global Step: 497690   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:20,899-Speed 2622.85 samples/sec   Loss 5.3952   LearningRate 0.0160   Epoch: 11   Global Step: 497700   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:24,796-Speed 2628.88 samples/sec   Loss 5.4578   LearningRate 0.0160   Epoch: 11   Global Step: 497710   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:28,696-Speed 2627.13 samples/sec   Loss 5.3798   LearningRate 0.0160   Epoch: 11   Global Step: 497720   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:38:32,595-Speed 2626.24 samples/sec   Loss 5.3177   LearningRate 0.0160   Epoch: 11   Global Step: 497730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:38:36,491-Speed 2629.01 samples/sec   Loss 5.3411   LearningRate 0.0160   Epoch: 11   Global Step: 497740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:38:57,656-Speed 483.85 samples/sec   Loss 5.4467   LearningRate 0.0160   Epoch: 12   Global Step: 497750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:39:01,540-Speed 2637.87 samples/sec   Loss 5.4073   LearningRate 0.0160   Epoch: 12   Global Step: 497760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:39:05,426-Speed 2635.70 samples/sec   Loss 5.3457   LearningRate 0.0160   Epoch: 12   Global Step: 497770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:39:09,315-Speed 2634.17 samples/sec   Loss 5.3947   LearningRate 0.0160   Epoch: 12   Global Step: 497780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:39:13,209-Speed 2630.17 samples/sec   Loss 5.3839   LearningRate 0.0160   Epoch: 12   Global Step: 497790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:17,135-Speed 2609.56 samples/sec   Loss 5.3524   LearningRate 0.0160   Epoch: 12   Global Step: 497800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:21,029-Speed 2630.01 samples/sec   Loss 5.3134   LearningRate 0.0160   Epoch: 12   Global Step: 497810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:24,927-Speed 2628.29 samples/sec   Loss 5.2976   LearningRate 0.0160   Epoch: 12   Global Step: 497820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:28,823-Speed 2629.05 samples/sec   Loss 5.3161   LearningRate 0.0160   Epoch: 12   Global Step: 497830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:32,722-Speed 2626.79 samples/sec   Loss 5.3004   LearningRate 0.0160   Epoch: 12   Global Step: 497840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:36,621-Speed 2626.65 samples/sec   Loss 5.3595   LearningRate 0.0160   Epoch: 12   Global Step: 497850   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:40,519-Speed 2627.54 samples/sec   Loss 5.3078   LearningRate 0.0160   Epoch: 12   Global Step: 497860   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:44,425-Speed 2622.37 samples/sec   Loss 5.3237   LearningRate 0.0160   Epoch: 12   Global Step: 497870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:48,324-Speed 2626.73 samples/sec   Loss 5.3184   LearningRate 0.0160   Epoch: 12   Global Step: 497880   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:39:52,229-Speed 2623.42 samples/sec   Loss 5.4178   LearningRate 0.0160   Epoch: 12   Global Step: 497890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:39:56,129-Speed 2626.38 samples/sec   Loss 5.3031   LearningRate 0.0160   Epoch: 12   Global Step: 497900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:40:00,032-Speed 2624.88 samples/sec   Loss 5.3847   LearningRate 0.0160   Epoch: 12   Global Step: 497910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:40:03,937-Speed 2622.54 samples/sec   Loss 5.3713   LearningRate 0.0160   Epoch: 12   Global Step: 497920   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:40:07,838-Speed 2626.02 samples/sec   Loss 5.3509   LearningRate 0.0160   Epoch: 12   Global Step: 497930   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:40:11,715-Speed 2641.59 samples/sec   Loss 5.3225   LearningRate 0.0160   Epoch: 12   Global Step: 497940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:15,615-Speed 2625.61 samples/sec   Loss 5.3189   LearningRate 0.0160   Epoch: 12   Global Step: 497950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:19,516-Speed 2625.78 samples/sec   Loss 5.3564   LearningRate 0.0160   Epoch: 12   Global Step: 497960   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:23,412-Speed 2629.21 samples/sec   Loss 5.4004   LearningRate 0.0160   Epoch: 12   Global Step: 497970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:27,312-Speed 2626.60 samples/sec   Loss 5.4554   LearningRate 0.0160   Epoch: 12   Global Step: 497980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:31,211-Speed 2626.79 samples/sec   Loss 5.4230   LearningRate 0.0160   Epoch: 12   Global Step: 497990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:35,106-Speed 2629.22 samples/sec   Loss 5.2969   LearningRate 0.0160   Epoch: 12   Global Step: 498000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:39,008-Speed 2624.88 samples/sec   Loss 5.3382   LearningRate 0.0160   Epoch: 12   Global Step: 498010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:42,904-Speed 2629.50 samples/sec   Loss 5.4547   LearningRate 0.0160   Epoch: 12   Global Step: 498020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:46,814-Speed 2619.32 samples/sec   Loss 5.4167   LearningRate 0.0160   Epoch: 12   Global Step: 498030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:40:50,711-Speed 2628.89 samples/sec   Loss 5.4746   LearningRate 0.0160   Epoch: 12   Global Step: 498040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:40:54,608-Speed 2628.40 samples/sec   Loss 5.2940   LearningRate 0.0160   Epoch: 12   Global Step: 498050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:40:58,505-Speed 2628.76 samples/sec   Loss 5.2615   LearningRate 0.0160   Epoch: 12   Global Step: 498060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:41:02,407-Speed 2625.25 samples/sec   Loss 5.4027   LearningRate 0.0160   Epoch: 12   Global Step: 498070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:41:06,313-Speed 2622.20 samples/sec   Loss 5.3781   LearningRate 0.0160   Epoch: 12   Global Step: 498080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:41:10,215-Speed 2624.66 samples/sec   Loss 5.3980   LearningRate 0.0160   Epoch: 12   Global Step: 498090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:41:14,127-Speed 2618.50 samples/sec   Loss 5.3660   LearningRate 0.0160   Epoch: 12   Global Step: 498100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:41:18,028-Speed 2625.11 samples/sec   Loss 5.3836   LearningRate 0.0160   Epoch: 12   Global Step: 498110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:41:21,909-Speed 2639.93 samples/sec   Loss 5.2542   LearningRate 0.0160   Epoch: 12   Global Step: 498120   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:25,825-Speed 2615.44 samples/sec   Loss 5.3282   LearningRate 0.0160   Epoch: 12   Global Step: 498130   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:29,724-Speed 2627.06 samples/sec   Loss 5.3424   LearningRate 0.0160   Epoch: 12   Global Step: 498140   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:33,619-Speed 2629.14 samples/sec   Loss 5.2192   LearningRate 0.0160   Epoch: 12   Global Step: 498150   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:37,523-Speed 2623.57 samples/sec   Loss 5.3583   LearningRate 0.0160   Epoch: 12   Global Step: 498160   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:41,420-Speed 2627.82 samples/sec   Loss 5.3552   LearningRate 0.0160   Epoch: 12   Global Step: 498170   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:45,315-Speed 2629.79 samples/sec   Loss 5.3467   LearningRate 0.0160   Epoch: 12   Global Step: 498180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:49,217-Speed 2625.03 samples/sec   Loss 5.3667   LearningRate 0.0160   Epoch: 12   Global Step: 498190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:53,116-Speed 2627.03 samples/sec   Loss 5.3706   LearningRate 0.0160   Epoch: 12   Global Step: 498200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:41:57,011-Speed 2630.44 samples/sec   Loss 5.2925   LearningRate 0.0160   Epoch: 12   Global Step: 498210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:42:00,909-Speed 2627.71 samples/sec   Loss 5.3389   LearningRate 0.0160   Epoch: 12   Global Step: 498220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:04,835-Speed 2608.75 samples/sec   Loss 5.2739   LearningRate 0.0160   Epoch: 12   Global Step: 498230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:08,744-Speed 2619.62 samples/sec   Loss 5.3711   LearningRate 0.0160   Epoch: 12   Global Step: 498240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:12,650-Speed 2622.55 samples/sec   Loss 5.2709   LearningRate 0.0160   Epoch: 12   Global Step: 498250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:16,547-Speed 2628.06 samples/sec   Loss 5.3037   LearningRate 0.0160   Epoch: 12   Global Step: 498260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:20,457-Speed 2620.04 samples/sec   Loss 5.2933   LearningRate 0.0159   Epoch: 12   Global Step: 498270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:24,418-Speed 2585.46 samples/sec   Loss 5.3058   LearningRate 0.0159   Epoch: 12   Global Step: 498280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:28,446-Speed 2542.60 samples/sec   Loss 5.2487   LearningRate 0.0159   Epoch: 12   Global Step: 498290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:32,352-Speed 2622.95 samples/sec   Loss 5.4399   LearningRate 0.0159   Epoch: 12   Global Step: 498300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:36,253-Speed 2625.30 samples/sec   Loss 5.2930   LearningRate 0.0159   Epoch: 12   Global Step: 498310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:40,136-Speed 2637.83 samples/sec   Loss 5.3762   LearningRate 0.0159   Epoch: 12   Global Step: 498320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:44,048-Speed 2618.69 samples/sec   Loss 5.4075   LearningRate 0.0159   Epoch: 12   Global Step: 498330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:47,957-Speed 2619.58 samples/sec   Loss 5.3238   LearningRate 0.0159   Epoch: 12   Global Step: 498340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:51,854-Speed 2628.35 samples/sec   Loss 5.3643   LearningRate 0.0159   Epoch: 12   Global Step: 498350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:55,755-Speed 2625.49 samples/sec   Loss 5.2693   LearningRate 0.0159   Epoch: 12   Global Step: 498360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:42:59,629-Speed 2644.01 samples/sec   Loss 5.3334   LearningRate 0.0159   Epoch: 12   Global Step: 498370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:03,528-Speed 2626.64 samples/sec   Loss 5.3624   LearningRate 0.0159   Epoch: 12   Global Step: 498380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:07,426-Speed 2628.26 samples/sec   Loss 5.2648   LearningRate 0.0159   Epoch: 12   Global Step: 498390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:11,323-Speed 2628.21 samples/sec   Loss 5.1948   LearningRate 0.0159   Epoch: 12   Global Step: 498400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:15,218-Speed 2629.69 samples/sec   Loss 5.2908   LearningRate 0.0159   Epoch: 12   Global Step: 498410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:19,194-Speed 2575.87 samples/sec   Loss 5.3265   LearningRate 0.0159   Epoch: 12   Global Step: 498420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:23,096-Speed 2625.31 samples/sec   Loss 5.4585   LearningRate 0.0159   Epoch: 12   Global Step: 498430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:26,998-Speed 2624.79 samples/sec   Loss 5.3714   LearningRate 0.0159   Epoch: 12   Global Step: 498440   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:30,894-Speed 2628.38 samples/sec   Loss 5.2431   LearningRate 0.0159   Epoch: 12   Global Step: 498450   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:34,796-Speed 2625.17 samples/sec   Loss 5.4096   LearningRate 0.0159   Epoch: 12   Global Step: 498460   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:43:38,717-Speed 2612.28 samples/sec   Loss 5.2887   LearningRate 0.0159   Epoch: 12   Global Step: 498470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:43:42,617-Speed 2625.87 samples/sec   Loss 5.4221   LearningRate 0.0159   Epoch: 12   Global Step: 498480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:43:46,514-Speed 2628.47 samples/sec   Loss 5.4061   LearningRate 0.0159   Epoch: 12   Global Step: 498490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:43:50,415-Speed 2625.55 samples/sec   Loss 5.1827   LearningRate 0.0159   Epoch: 12   Global Step: 498500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:43:54,313-Speed 2628.06 samples/sec   Loss 5.2694   LearningRate 0.0159   Epoch: 12   Global Step: 498510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:43:58,214-Speed 2625.18 samples/sec   Loss 5.3008   LearningRate 0.0159   Epoch: 12   Global Step: 498520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:02,113-Speed 2627.36 samples/sec   Loss 5.3744   LearningRate 0.0159   Epoch: 12   Global Step: 498530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:06,009-Speed 2628.52 samples/sec   Loss 5.3260   LearningRate 0.0159   Epoch: 12   Global Step: 498540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:09,908-Speed 2626.88 samples/sec   Loss 5.3162   LearningRate 0.0159   Epoch: 12   Global Step: 498550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:13,819-Speed 2618.74 samples/sec   Loss 5.3172   LearningRate 0.0159   Epoch: 12   Global Step: 498560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:17,695-Speed 2642.88 samples/sec   Loss 5.3740   LearningRate 0.0159   Epoch: 12   Global Step: 498570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:21,591-Speed 2629.04 samples/sec   Loss 5.4633   LearningRate 0.0159   Epoch: 12   Global Step: 498580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:25,491-Speed 2626.61 samples/sec   Loss 5.3582   LearningRate 0.0159   Epoch: 12   Global Step: 498590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:29,388-Speed 2628.13 samples/sec   Loss 5.3418   LearningRate 0.0159   Epoch: 12   Global Step: 498600   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:33,291-Speed 2624.20 samples/sec   Loss 5.2913   LearningRate 0.0159   Epoch: 12   Global Step: 498610   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:37,190-Speed 2626.51 samples/sec   Loss 5.4546   LearningRate 0.0159   Epoch: 12   Global Step: 498620   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:44:41,065-Speed 2643.40 samples/sec   Loss 5.4459   LearningRate 0.0159   Epoch: 12   Global Step: 498630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:44:44,974-Speed 2620.48 samples/sec   Loss 5.4271   LearningRate 0.0159   Epoch: 12   Global Step: 498640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:44:48,874-Speed 2625.77 samples/sec   Loss 5.3102   LearningRate 0.0159   Epoch: 12   Global Step: 498650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:44:52,770-Speed 2629.88 samples/sec   Loss 5.3189   LearningRate 0.0159   Epoch: 12   Global Step: 498660   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:44:56,665-Speed 2629.69 samples/sec   Loss 5.4048   LearningRate 0.0159   Epoch: 12   Global Step: 498670   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:45:00,560-Speed 2629.61 samples/sec   Loss 5.4805   LearningRate 0.0159   Epoch: 12   Global Step: 498680   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:45:04,470-Speed 2619.00 samples/sec   Loss 5.3045   LearningRate 0.0159   Epoch: 12   Global Step: 498690   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:45:08,378-Speed 2621.20 samples/sec   Loss 5.3140   LearningRate 0.0159   Epoch: 12   Global Step: 498700   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:45:12,325-Speed 2594.75 samples/sec   Loss 5.2529   LearningRate 0.0159   Epoch: 12   Global Step: 498710   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:45:16,226-Speed 2625.33 samples/sec   Loss 5.3381   LearningRate 0.0159   Epoch: 12   Global Step: 498720   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:45:20,138-Speed 2618.75 samples/sec   Loss 5.3556   LearningRate 0.0159   Epoch: 12   Global Step: 498730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:24,050-Speed 2617.64 samples/sec   Loss 5.3424   LearningRate 0.0159   Epoch: 12   Global Step: 498740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:27,946-Speed 2629.67 samples/sec   Loss 5.4428   LearningRate 0.0159   Epoch: 12   Global Step: 498750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:31,844-Speed 2627.40 samples/sec   Loss 5.4170   LearningRate 0.0159   Epoch: 12   Global Step: 498760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:35,743-Speed 2626.98 samples/sec   Loss 5.3409   LearningRate 0.0159   Epoch: 12   Global Step: 498770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:39,641-Speed 2627.34 samples/sec   Loss 5.3274   LearningRate 0.0159   Epoch: 12   Global Step: 498780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:43,538-Speed 2628.34 samples/sec   Loss 5.2713   LearningRate 0.0159   Epoch: 12   Global Step: 498790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:47,436-Speed 2627.43 samples/sec   Loss 5.3723   LearningRate 0.0159   Epoch: 12   Global Step: 498800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:51,334-Speed 2627.93 samples/sec   Loss 5.4322   LearningRate 0.0159   Epoch: 12   Global Step: 498810   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:55,235-Speed 2625.31 samples/sec   Loss 5.3109   LearningRate 0.0159   Epoch: 12   Global Step: 498820   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:45:59,116-Speed 2639.19 samples/sec   Loss 5.2711   LearningRate 0.0159   Epoch: 12   Global Step: 498830   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:46:03,012-Speed 2629.04 samples/sec   Loss 5.3615   LearningRate 0.0159   Epoch: 12   Global Step: 498840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:46:06,923-Speed 2618.53 samples/sec   Loss 5.2216   LearningRate 0.0159   Epoch: 12   Global Step: 498850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:46:10,820-Speed 2628.20 samples/sec   Loss 5.3153   LearningRate 0.0159   Epoch: 12   Global Step: 498860   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:46:14,701-Speed 2640.09 samples/sec   Loss 5.3015   LearningRate 0.0159   Epoch: 12   Global Step: 498870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:18,597-Speed 2629.34 samples/sec   Loss 5.3868   LearningRate 0.0159   Epoch: 12   Global Step: 498880   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:22,499-Speed 2624.68 samples/sec   Loss 5.2417   LearningRate 0.0159   Epoch: 12   Global Step: 498890   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:26,395-Speed 2628.88 samples/sec   Loss 5.4504   LearningRate 0.0159   Epoch: 12   Global Step: 498900   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:30,291-Speed 2629.44 samples/sec   Loss 5.3224   LearningRate 0.0159   Epoch: 12   Global Step: 498910   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:34,187-Speed 2628.57 samples/sec   Loss 5.3016   LearningRate 0.0159   Epoch: 12   Global Step: 498920   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:38,079-Speed 2631.87 samples/sec   Loss 5.2120   LearningRate 0.0159   Epoch: 12   Global Step: 498930   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:41,980-Speed 2625.47 samples/sec   Loss 5.2740   LearningRate 0.0159   Epoch: 12   Global Step: 498940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:45,880-Speed 2625.81 samples/sec   Loss 5.3002   LearningRate 0.0159   Epoch: 12   Global Step: 498950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:49,799-Speed 2614.24 samples/sec   Loss 5.2410   LearningRate 0.0159   Epoch: 12   Global Step: 498960   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:53,672-Speed 2644.30 samples/sec   Loss 5.2726   LearningRate 0.0159   Epoch: 12   Global Step: 498970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:46:57,566-Speed 2631.08 samples/sec   Loss 5.3937   LearningRate 0.0159   Epoch: 12   Global Step: 498980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:01,461-Speed 2628.92 samples/sec   Loss 5.3191   LearningRate 0.0159   Epoch: 12   Global Step: 498990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:05,360-Speed 2627.00 samples/sec   Loss 5.2998   LearningRate 0.0159   Epoch: 12   Global Step: 499000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:09,254-Speed 2630.23 samples/sec   Loss 5.3490   LearningRate 0.0159   Epoch: 12   Global Step: 499010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:13,146-Speed 2631.38 samples/sec   Loss 5.3374   LearningRate 0.0159   Epoch: 12   Global Step: 499020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:17,049-Speed 2624.47 samples/sec   Loss 5.3735   LearningRate 0.0159   Epoch: 12   Global Step: 499030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:20,942-Speed 2630.65 samples/sec   Loss 5.4016   LearningRate 0.0159   Epoch: 12   Global Step: 499040   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:24,834-Speed 2631.58 samples/sec   Loss 5.3590   LearningRate 0.0159   Epoch: 12   Global Step: 499050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:47:28,708-Speed 2644.42 samples/sec   Loss 5.3195   LearningRate 0.0159   Epoch: 12   Global Step: 499060   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:32,611-Speed 2624.70 samples/sec   Loss 5.2758   LearningRate 0.0159   Epoch: 12   Global Step: 499070   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:36,506-Speed 2629.44 samples/sec   Loss 5.4518   LearningRate 0.0159   Epoch: 12   Global Step: 499080   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:40,417-Speed 2618.57 samples/sec   Loss 5.2740   LearningRate 0.0159   Epoch: 12   Global Step: 499090   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:44,315-Speed 2627.54 samples/sec   Loss 5.3367   LearningRate 0.0159   Epoch: 12   Global Step: 499100   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:48,214-Speed 2627.01 samples/sec   Loss 5.3480   LearningRate 0.0159   Epoch: 12   Global Step: 499110   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:52,116-Speed 2625.03 samples/sec   Loss 5.2683   LearningRate 0.0159   Epoch: 12   Global Step: 499120   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:56,015-Speed 2627.36 samples/sec   Loss 5.4309   LearningRate 0.0159   Epoch: 12   Global Step: 499130   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:47:59,913-Speed 2628.02 samples/sec   Loss 5.2501   LearningRate 0.0159   Epoch: 12   Global Step: 499140   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:48:03,819-Speed 2622.24 samples/sec   Loss 5.2400   LearningRate 0.0159   Epoch: 12   Global Step: 499150   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 03:48:07,722-Speed 2624.14 samples/sec   Loss 5.4322   LearningRate 0.0159   Epoch: 12   Global Step: 499160   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:11,621-Speed 2626.72 samples/sec   Loss 5.3753   LearningRate 0.0159   Epoch: 12   Global Step: 499170   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:15,529-Speed 2621.53 samples/sec   Loss 5.3798   LearningRate 0.0159   Epoch: 12   Global Step: 499180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:19,436-Speed 2621.85 samples/sec   Loss 5.3713   LearningRate 0.0159   Epoch: 12   Global Step: 499190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:23,332-Speed 2628.76 samples/sec   Loss 5.2775   LearningRate 0.0159   Epoch: 12   Global Step: 499200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:27,230-Speed 2627.20 samples/sec   Loss 5.2985   LearningRate 0.0159   Epoch: 12   Global Step: 499210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:31,128-Speed 2628.51 samples/sec   Loss 5.3325   LearningRate 0.0159   Epoch: 12   Global Step: 499220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:35,025-Speed 2628.74 samples/sec   Loss 5.3088   LearningRate 0.0159   Epoch: 12   Global Step: 499230   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:38,922-Speed 2627.83 samples/sec   Loss 5.4977   LearningRate 0.0159   Epoch: 12   Global Step: 499240   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:42,818-Speed 2628.60 samples/sec   Loss 5.4297   LearningRate 0.0159   Epoch: 12   Global Step: 499250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:48:46,717-Speed 2628.14 samples/sec   Loss 5.2396   LearningRate 0.0159   Epoch: 12   Global Step: 499260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:48:50,610-Speed 2630.39 samples/sec   Loss 5.3921   LearningRate 0.0159   Epoch: 12   Global Step: 499270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:48:54,525-Speed 2616.84 samples/sec   Loss 5.3866   LearningRate 0.0159   Epoch: 12   Global Step: 499280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:48:58,438-Speed 2617.39 samples/sec   Loss 5.3564   LearningRate 0.0159   Epoch: 12   Global Step: 499290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:49:02,338-Speed 2626.55 samples/sec   Loss 5.3167   LearningRate 0.0159   Epoch: 12   Global Step: 499300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:49:06,233-Speed 2629.51 samples/sec   Loss 5.4037   LearningRate 0.0158   Epoch: 12   Global Step: 499310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:49:10,135-Speed 2624.70 samples/sec   Loss 5.3811   LearningRate 0.0158   Epoch: 12   Global Step: 499320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:49:14,010-Speed 2643.47 samples/sec   Loss 5.2465   LearningRate 0.0158   Epoch: 12   Global Step: 499330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:17,908-Speed 2628.36 samples/sec   Loss 5.3632   LearningRate 0.0158   Epoch: 12   Global Step: 499340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:21,806-Speed 2627.59 samples/sec   Loss 5.3509   LearningRate 0.0158   Epoch: 12   Global Step: 499350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:25,714-Speed 2621.46 samples/sec   Loss 5.3674   LearningRate 0.0158   Epoch: 12   Global Step: 499360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:29,609-Speed 2629.49 samples/sec   Loss 5.2723   LearningRate 0.0158   Epoch: 12   Global Step: 499370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:33,509-Speed 2625.52 samples/sec   Loss 5.3685   LearningRate 0.0158   Epoch: 12   Global Step: 499380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:37,415-Speed 2622.15 samples/sec   Loss 5.2581   LearningRate 0.0158   Epoch: 12   Global Step: 499390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:41,312-Speed 2629.08 samples/sec   Loss 5.4028   LearningRate 0.0158   Epoch: 12   Global Step: 499400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:45,203-Speed 2632.21 samples/sec   Loss 5.3917   LearningRate 0.0158   Epoch: 12   Global Step: 499410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:49,101-Speed 2627.85 samples/sec   Loss 5.3047   LearningRate 0.0158   Epoch: 12   Global Step: 499420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:49:52,998-Speed 2628.37 samples/sec   Loss 5.4078   LearningRate 0.0158   Epoch: 12   Global Step: 499430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:49:56,872-Speed 2643.60 samples/sec   Loss 5.2901   LearningRate 0.0158   Epoch: 12   Global Step: 499440   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:00,766-Speed 2630.20 samples/sec   Loss 5.3884   LearningRate 0.0158   Epoch: 12   Global Step: 499450   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:04,675-Speed 2620.51 samples/sec   Loss 5.3829   LearningRate 0.0158   Epoch: 12   Global Step: 499460   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:08,591-Speed 2615.85 samples/sec   Loss 5.4026   LearningRate 0.0158   Epoch: 12   Global Step: 499470   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:12,493-Speed 2624.83 samples/sec   Loss 5.3397   LearningRate 0.0158   Epoch: 12   Global Step: 499480   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:16,393-Speed 2626.76 samples/sec   Loss 5.3942   LearningRate 0.0158   Epoch: 12   Global Step: 499490   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:20,329-Speed 2602.08 samples/sec   Loss 5.3366   LearningRate 0.0158   Epoch: 12   Global Step: 499500   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:24,240-Speed 2619.09 samples/sec   Loss 5.2172   LearningRate 0.0158   Epoch: 12   Global Step: 499510   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:28,138-Speed 2627.43 samples/sec   Loss 5.2966   LearningRate 0.0158   Epoch: 12   Global Step: 499520   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:32,086-Speed 2594.34 samples/sec   Loss 5.3132   LearningRate 0.0158   Epoch: 12   Global Step: 499530   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:35,998-Speed 2618.23 samples/sec   Loss 5.3767   LearningRate 0.0158   Epoch: 12   Global Step: 499540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:50:39,899-Speed 2625.48 samples/sec   Loss 5.4346   LearningRate 0.0158   Epoch: 12   Global Step: 499550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:50:43,779-Speed 2640.33 samples/sec   Loss 5.3831   LearningRate 0.0158   Epoch: 12   Global Step: 499560   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:47,684-Speed 2622.97 samples/sec   Loss 5.3872   LearningRate 0.0158   Epoch: 12   Global Step: 499570   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:51,581-Speed 2628.38 samples/sec   Loss 5.3511   LearningRate 0.0158   Epoch: 12   Global Step: 499580   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:55,529-Speed 2594.66 samples/sec   Loss 5.3492   LearningRate 0.0158   Epoch: 12   Global Step: 499590   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:50:59,438-Speed 2619.93 samples/sec   Loss 5.4394   LearningRate 0.0158   Epoch: 12   Global Step: 499600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:03,337-Speed 2627.28 samples/sec   Loss 5.2793   LearningRate 0.0158   Epoch: 12   Global Step: 499610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:07,261-Speed 2610.18 samples/sec   Loss 5.3797   LearningRate 0.0158   Epoch: 12   Global Step: 499620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:11,180-Speed 2614.66 samples/sec   Loss 5.2495   LearningRate 0.0158   Epoch: 12   Global Step: 499630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:15,077-Speed 2629.27 samples/sec   Loss 5.3402   LearningRate 0.0158   Epoch: 12   Global Step: 499640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:18,980-Speed 2623.79 samples/sec   Loss 5.2661   LearningRate 0.0158   Epoch: 12   Global Step: 499650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:22,901-Speed 2612.79 samples/sec   Loss 5.3248   LearningRate 0.0158   Epoch: 12   Global Step: 499660   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:51:26,776-Speed 2643.06 samples/sec   Loss 5.2387   LearningRate 0.0158   Epoch: 12   Global Step: 499670   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:30,697-Speed 2612.47 samples/sec   Loss 5.3852   LearningRate 0.0158   Epoch: 12   Global Step: 499680   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:34,594-Speed 2628.14 samples/sec   Loss 5.3271   LearningRate 0.0158   Epoch: 12   Global Step: 499690   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:38,489-Speed 2629.62 samples/sec   Loss 5.3309   LearningRate 0.0158   Epoch: 12   Global Step: 499700   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:42,396-Speed 2621.31 samples/sec   Loss 5.2525   LearningRate 0.0158   Epoch: 12   Global Step: 499710   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:46,294-Speed 2628.01 samples/sec   Loss 5.3165   LearningRate 0.0158   Epoch: 12   Global Step: 499720   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:50,189-Speed 2630.03 samples/sec   Loss 5.3544   LearningRate 0.0158   Epoch: 12   Global Step: 499730   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:54,086-Speed 2628.21 samples/sec   Loss 5.3123   LearningRate 0.0158   Epoch: 12   Global Step: 499740   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:51:58,015-Speed 2607.43 samples/sec   Loss 5.3207   LearningRate 0.0158   Epoch: 12   Global Step: 499750   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:01,917-Speed 2624.59 samples/sec   Loss 5.2818   LearningRate 0.0158   Epoch: 12   Global Step: 499760   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:05,813-Speed 2629.40 samples/sec   Loss 5.3838   LearningRate 0.0158   Epoch: 12   Global Step: 499770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:52:09,707-Speed 2630.17 samples/sec   Loss 5.2791   LearningRate 0.0158   Epoch: 12   Global Step: 499780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:52:13,580-Speed 2644.83 samples/sec   Loss 5.3362   LearningRate 0.0158   Epoch: 12   Global Step: 499790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:17,573-Speed 2565.02 samples/sec   Loss 5.2696   LearningRate 0.0158   Epoch: 12   Global Step: 499800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:21,470-Speed 2628.42 samples/sec   Loss 5.3434   LearningRate 0.0158   Epoch: 12   Global Step: 499810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:25,367-Speed 2628.13 samples/sec   Loss 5.2490   LearningRate 0.0158   Epoch: 12   Global Step: 499820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:29,271-Speed 2624.33 samples/sec   Loss 5.3653   LearningRate 0.0158   Epoch: 12   Global Step: 499830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:33,178-Speed 2621.38 samples/sec   Loss 5.3515   LearningRate 0.0158   Epoch: 12   Global Step: 499840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:37,075-Speed 2627.80 samples/sec   Loss 5.4284   LearningRate 0.0158   Epoch: 12   Global Step: 499850   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:40,968-Speed 2631.02 samples/sec   Loss 5.4141   LearningRate 0.0158   Epoch: 12   Global Step: 499860   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:44,860-Speed 2632.35 samples/sec   Loss 5.3096   LearningRate 0.0158   Epoch: 12   Global Step: 499870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:48,758-Speed 2627.51 samples/sec   Loss 5.3091   LearningRate 0.0158   Epoch: 12   Global Step: 499880   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:52:52,745-Speed 2569.37 samples/sec   Loss 5.3481   LearningRate 0.0158   Epoch: 12   Global Step: 499890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:52:56,643-Speed 2627.14 samples/sec   Loss 5.2132   LearningRate 0.0158   Epoch: 12   Global Step: 499900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:00,536-Speed 2631.54 samples/sec   Loss 5.2619   LearningRate 0.0158   Epoch: 12   Global Step: 499910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:04,430-Speed 2630.44 samples/sec   Loss 5.3511   LearningRate 0.0158   Epoch: 12   Global Step: 499920   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:08,325-Speed 2629.77 samples/sec   Loss 5.4226   LearningRate 0.0158   Epoch: 12   Global Step: 499930   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:12,251-Speed 2608.50 samples/sec   Loss 5.2875   LearningRate 0.0158   Epoch: 12   Global Step: 499940   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:16,151-Speed 2626.48 samples/sec   Loss 5.3128   LearningRate 0.0158   Epoch: 12   Global Step: 499950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:20,046-Speed 2630.27 samples/sec   Loss 5.2772   LearningRate 0.0158   Epoch: 12   Global Step: 499960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:23,958-Speed 2618.67 samples/sec   Loss 5.3100   LearningRate 0.0158   Epoch: 12   Global Step: 499970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:27,857-Speed 2626.83 samples/sec   Loss 5.2590   LearningRate 0.0158   Epoch: 12   Global Step: 499980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:31,774-Speed 2615.14 samples/sec   Loss 5.3633   LearningRate 0.0158   Epoch: 12   Global Step: 499990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:53:35,647-Speed 2644.52 samples/sec   Loss 5.3178   LearningRate 0.0158   Epoch: 12   Global Step: 500000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:54:18,662-[lfw][500000]XNorm: 22.839151
Training: 2022-04-15 03:54:18,663-[lfw][500000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-15 03:54:18,664-[lfw][500000]Accuracy-Highest: 0.99800
Training: 2022-04-15 03:55:08,750-[cfp_fp][500000]XNorm: 21.312026
Training: 2022-04-15 03:55:08,751-[cfp_fp][500000]Accuracy-Flip: 0.98929+-0.00439
Training: 2022-04-15 03:55:08,751-[cfp_fp][500000]Accuracy-Highest: 0.98971
Training: 2022-04-15 03:55:51,692-[agedb_30][500000]XNorm: 22.884774
Training: 2022-04-15 03:55:51,693-[agedb_30][500000]Accuracy-Flip: 0.97667+-0.00703
Training: 2022-04-15 03:55:51,694-[agedb_30][500000]Accuracy-Highest: 0.97950
Training: 2022-04-15 03:55:55,561-Speed 73.19 samples/sec   Loss 5.2997   LearningRate 0.0158   Epoch: 12   Global Step: 500010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:55:59,417-Speed 2656.28 samples/sec   Loss 5.2730   LearningRate 0.0158   Epoch: 12   Global Step: 500020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:03,280-Speed 2651.29 samples/sec   Loss 5.3116   LearningRate 0.0158   Epoch: 12   Global Step: 500030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:07,150-Speed 2646.50 samples/sec   Loss 5.2819   LearningRate 0.0158   Epoch: 12   Global Step: 500040   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:11,026-Speed 2642.34 samples/sec   Loss 5.3360   LearningRate 0.0158   Epoch: 12   Global Step: 500050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:14,912-Speed 2635.66 samples/sec   Loss 5.2410   LearningRate 0.0158   Epoch: 12   Global Step: 500060   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:18,793-Speed 2639.48 samples/sec   Loss 5.3157   LearningRate 0.0158   Epoch: 12   Global Step: 500070   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:22,685-Speed 2631.34 samples/sec   Loss 5.4727   LearningRate 0.0158   Epoch: 12   Global Step: 500080   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:26,569-Speed 2637.73 samples/sec   Loss 5.4389   LearningRate 0.0158   Epoch: 12   Global Step: 500090   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:56:30,445-Speed 2642.59 samples/sec   Loss 5.3291   LearningRate 0.0158   Epoch: 12   Global Step: 500100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:56:34,362-Speed 2615.06 samples/sec   Loss 5.2641   LearningRate 0.0158   Epoch: 12   Global Step: 500110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:56:38,244-Speed 2638.86 samples/sec   Loss 5.3202   LearningRate 0.0158   Epoch: 12   Global Step: 500120   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:56:42,126-Speed 2637.96 samples/sec   Loss 5.3024   LearningRate 0.0158   Epoch: 12   Global Step: 500130   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:56:46,017-Speed 2632.37 samples/sec   Loss 5.3095   LearningRate 0.0158   Epoch: 12   Global Step: 500140   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:56:49,906-Speed 2634.07 samples/sec   Loss 5.2356   LearningRate 0.0158   Epoch: 12   Global Step: 500150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:56:53,792-Speed 2636.07 samples/sec   Loss 5.2247   LearningRate 0.0158   Epoch: 12   Global Step: 500160   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:56:57,682-Speed 2633.30 samples/sec   Loss 5.2517   LearningRate 0.0158   Epoch: 12   Global Step: 500170   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:01,567-Speed 2636.31 samples/sec   Loss 5.3180   LearningRate 0.0158   Epoch: 12   Global Step: 500180   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:05,455-Speed 2634.95 samples/sec   Loss 5.3614   LearningRate 0.0158   Epoch: 12   Global Step: 500190   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:09,327-Speed 2644.98 samples/sec   Loss 5.1794   LearningRate 0.0158   Epoch: 12   Global Step: 500200   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:13,210-Speed 2637.35 samples/sec   Loss 5.3161   LearningRate 0.0158   Epoch: 12   Global Step: 500210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:17,092-Speed 2638.02 samples/sec   Loss 5.3050   LearningRate 0.0158   Epoch: 12   Global Step: 500220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:20,976-Speed 2637.93 samples/sec   Loss 5.3325   LearningRate 0.0158   Epoch: 12   Global Step: 500230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:24,866-Speed 2633.24 samples/sec   Loss 5.4303   LearningRate 0.0158   Epoch: 12   Global Step: 500240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:28,752-Speed 2636.17 samples/sec   Loss 5.4056   LearningRate 0.0158   Epoch: 12   Global Step: 500250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:32,636-Speed 2637.23 samples/sec   Loss 5.3294   LearningRate 0.0158   Epoch: 12   Global Step: 500260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:36,534-Speed 2627.33 samples/sec   Loss 5.3587   LearningRate 0.0158   Epoch: 12   Global Step: 500270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:57:40,397-Speed 2651.43 samples/sec   Loss 5.3903   LearningRate 0.0158   Epoch: 12   Global Step: 500280   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:57:44,290-Speed 2631.08 samples/sec   Loss 5.2877   LearningRate 0.0158   Epoch: 12   Global Step: 500290   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:57:48,184-Speed 2630.18 samples/sec   Loss 5.3112   LearningRate 0.0158   Epoch: 12   Global Step: 500300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:57:52,079-Speed 2630.01 samples/sec   Loss 5.2989   LearningRate 0.0158   Epoch: 12   Global Step: 500310   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:57:55,969-Speed 2633.35 samples/sec   Loss 5.2523   LearningRate 0.0158   Epoch: 12   Global Step: 500320   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:57:59,851-Speed 2638.17 samples/sec   Loss 5.2699   LearningRate 0.0158   Epoch: 12   Global Step: 500330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:58:03,737-Speed 2636.13 samples/sec   Loss 5.3075   LearningRate 0.0158   Epoch: 12   Global Step: 500340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:58:07,615-Speed 2641.58 samples/sec   Loss 5.2958   LearningRate 0.0158   Epoch: 12   Global Step: 500350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:58:11,503-Speed 2634.10 samples/sec   Loss 5.3510   LearningRate 0.0157   Epoch: 12   Global Step: 500360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:58:15,397-Speed 2629.95 samples/sec   Loss 5.4227   LearningRate 0.0157   Epoch: 12   Global Step: 500370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:58:19,282-Speed 2636.36 samples/sec   Loss 5.2197   LearningRate 0.0157   Epoch: 12   Global Step: 500380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:23,170-Speed 2634.68 samples/sec   Loss 5.3274   LearningRate 0.0157   Epoch: 12   Global Step: 500390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:27,054-Speed 2637.31 samples/sec   Loss 5.3312   LearningRate 0.0157   Epoch: 12   Global Step: 500400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:30,938-Speed 2637.26 samples/sec   Loss 5.3782   LearningRate 0.0157   Epoch: 12   Global Step: 500410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:34,819-Speed 2639.15 samples/sec   Loss 5.4777   LearningRate 0.0157   Epoch: 12   Global Step: 500420   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:38,702-Speed 2637.87 samples/sec   Loss 5.2947   LearningRate 0.0157   Epoch: 12   Global Step: 500430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:42,606-Speed 2623.59 samples/sec   Loss 5.1893   LearningRate 0.0157   Epoch: 12   Global Step: 500440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:46,487-Speed 2638.74 samples/sec   Loss 5.3129   LearningRate 0.0157   Epoch: 12   Global Step: 500450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:50,371-Speed 2637.16 samples/sec   Loss 5.3563   LearningRate 0.0157   Epoch: 12   Global Step: 500460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:54,257-Speed 2635.70 samples/sec   Loss 5.4487   LearningRate 0.0157   Epoch: 12   Global Step: 500470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:58:58,122-Speed 2650.33 samples/sec   Loss 5.3764   LearningRate 0.0157   Epoch: 12   Global Step: 500480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:02,011-Speed 2634.03 samples/sec   Loss 5.2939   LearningRate 0.0157   Epoch: 12   Global Step: 500490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:05,902-Speed 2632.32 samples/sec   Loss 5.2867   LearningRate 0.0157   Epoch: 12   Global Step: 500500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:09,791-Speed 2634.36 samples/sec   Loss 5.3584   LearningRate 0.0157   Epoch: 12   Global Step: 500510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:13,696-Speed 2622.74 samples/sec   Loss 5.4066   LearningRate 0.0157   Epoch: 12   Global Step: 500520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:17,584-Speed 2633.99 samples/sec   Loss 5.3075   LearningRate 0.0157   Epoch: 12   Global Step: 500530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:21,477-Speed 2630.45 samples/sec   Loss 5.3499   LearningRate 0.0157   Epoch: 12   Global Step: 500540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:25,366-Speed 2634.46 samples/sec   Loss 5.3413   LearningRate 0.0157   Epoch: 12   Global Step: 500550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:29,302-Speed 2601.97 samples/sec   Loss 5.3733   LearningRate 0.0157   Epoch: 12   Global Step: 500560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:33,192-Speed 2633.76 samples/sec   Loss 5.3681   LearningRate 0.0157   Epoch: 12   Global Step: 500570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:37,060-Speed 2647.30 samples/sec   Loss 5.3395   LearningRate 0.0157   Epoch: 12   Global Step: 500580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:40,946-Speed 2636.09 samples/sec   Loss 5.3564   LearningRate 0.0157   Epoch: 12   Global Step: 500590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 03:59:44,813-Speed 2648.31 samples/sec   Loss 5.1928   LearningRate 0.0157   Epoch: 12   Global Step: 500600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:59:48,709-Speed 2629.42 samples/sec   Loss 5.3175   LearningRate 0.0157   Epoch: 12   Global Step: 500610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:59:52,592-Speed 2637.00 samples/sec   Loss 5.2819   LearningRate 0.0157   Epoch: 12   Global Step: 500620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 03:59:56,478-Speed 2636.33 samples/sec   Loss 5.2359   LearningRate 0.0157   Epoch: 12   Global Step: 500630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:00:00,365-Speed 2634.95 samples/sec   Loss 5.4633   LearningRate 0.0157   Epoch: 12   Global Step: 500640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:00:04,230-Speed 2650.47 samples/sec   Loss 5.3422   LearningRate 0.0157   Epoch: 12   Global Step: 500650   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:08,126-Speed 2628.83 samples/sec   Loss 5.3741   LearningRate 0.0157   Epoch: 12   Global Step: 500660   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:12,019-Speed 2631.31 samples/sec   Loss 5.3288   LearningRate 0.0157   Epoch: 12   Global Step: 500670   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:15,909-Speed 2633.24 samples/sec   Loss 5.3815   LearningRate 0.0157   Epoch: 12   Global Step: 500680   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:19,798-Speed 2633.52 samples/sec   Loss 5.3969   LearningRate 0.0157   Epoch: 12   Global Step: 500690   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:23,686-Speed 2633.90 samples/sec   Loss 5.2874   LearningRate 0.0157   Epoch: 12   Global Step: 500700   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:27,581-Speed 2630.67 samples/sec   Loss 5.2440   LearningRate 0.0157   Epoch: 12   Global Step: 500710   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:31,469-Speed 2634.26 samples/sec   Loss 5.2633   LearningRate 0.0157   Epoch: 12   Global Step: 500720   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:35,370-Speed 2625.72 samples/sec   Loss 5.2969   LearningRate 0.0157   Epoch: 12   Global Step: 500730   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:39,277-Speed 2621.22 samples/sec   Loss 5.3091   LearningRate 0.0157   Epoch: 12   Global Step: 500740   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:00:43,167-Speed 2633.35 samples/sec   Loss 5.2683   LearningRate 0.0157   Epoch: 12   Global Step: 500750   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:00:47,055-Speed 2634.86 samples/sec   Loss 5.3373   LearningRate 0.0157   Epoch: 12   Global Step: 500760   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:00:50,944-Speed 2633.49 samples/sec   Loss 5.2952   LearningRate 0.0157   Epoch: 12   Global Step: 500770   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:00:54,835-Speed 2633.03 samples/sec   Loss 5.3656   LearningRate 0.0157   Epoch: 12   Global Step: 500780   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:00:58,725-Speed 2632.61 samples/sec   Loss 5.2837   LearningRate 0.0157   Epoch: 12   Global Step: 500790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:02,616-Speed 2632.94 samples/sec   Loss 5.2974   LearningRate 0.0157   Epoch: 12   Global Step: 500800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:06,514-Speed 2627.61 samples/sec   Loss 5.2625   LearningRate 0.0157   Epoch: 12   Global Step: 500810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:10,405-Speed 2632.27 samples/sec   Loss 5.2539   LearningRate 0.0157   Epoch: 12   Global Step: 500820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:14,289-Speed 2636.72 samples/sec   Loss 5.3147   LearningRate 0.0157   Epoch: 12   Global Step: 500830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:18,207-Speed 2615.44 samples/sec   Loss 5.2453   LearningRate 0.0157   Epoch: 12   Global Step: 500840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:22,084-Speed 2642.19 samples/sec   Loss 5.3717   LearningRate 0.0157   Epoch: 12   Global Step: 500850   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:25,997-Speed 2617.54 samples/sec   Loss 5.3517   LearningRate 0.0157   Epoch: 12   Global Step: 500860   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:29,882-Speed 2637.21 samples/sec   Loss 5.3341   LearningRate 0.0157   Epoch: 12   Global Step: 500870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:33,774-Speed 2631.31 samples/sec   Loss 5.3256   LearningRate 0.0157   Epoch: 12   Global Step: 500880   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:37,662-Speed 2634.48 samples/sec   Loss 5.3899   LearningRate 0.0157   Epoch: 12   Global Step: 500890   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:41,548-Speed 2635.76 samples/sec   Loss 5.3375   LearningRate 0.0157   Epoch: 12   Global Step: 500900   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:45,435-Speed 2635.15 samples/sec   Loss 5.3709   LearningRate 0.0157   Epoch: 12   Global Step: 500910   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:49,321-Speed 2635.30 samples/sec   Loss 5.2225   LearningRate 0.0157   Epoch: 12   Global Step: 500920   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:53,214-Speed 2631.86 samples/sec   Loss 5.3492   LearningRate 0.0157   Epoch: 12   Global Step: 500930   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:01:57,109-Speed 2629.69 samples/sec   Loss 5.4262   LearningRate 0.0157   Epoch: 12   Global Step: 500940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:01,005-Speed 2629.12 samples/sec   Loss 5.3673   LearningRate 0.0157   Epoch: 12   Global Step: 500950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:02:04,898-Speed 2630.89 samples/sec   Loss 5.2725   LearningRate 0.0157   Epoch: 12   Global Step: 500960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:02:08,790-Speed 2631.22 samples/sec   Loss 5.3418   LearningRate 0.0157   Epoch: 12   Global Step: 500970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:02:12,677-Speed 2635.26 samples/sec   Loss 5.3292   LearningRate 0.0157   Epoch: 12   Global Step: 500980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:02:16,563-Speed 2635.75 samples/sec   Loss 5.3116   LearningRate 0.0157   Epoch: 12   Global Step: 500990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:02:20,451-Speed 2634.33 samples/sec   Loss 5.2727   LearningRate 0.0157   Epoch: 12   Global Step: 501000   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:02:24,345-Speed 2630.76 samples/sec   Loss 5.2368   LearningRate 0.0157   Epoch: 12   Global Step: 501010   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:02:28,222-Speed 2642.25 samples/sec   Loss 5.2876   LearningRate 0.0157   Epoch: 12   Global Step: 501020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:32,107-Speed 2636.44 samples/sec   Loss 5.3374   LearningRate 0.0157   Epoch: 12   Global Step: 501030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:35,996-Speed 2633.62 samples/sec   Loss 5.2590   LearningRate 0.0157   Epoch: 12   Global Step: 501040   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:39,892-Speed 2628.86 samples/sec   Loss 5.2593   LearningRate 0.0157   Epoch: 12   Global Step: 501050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:43,782-Speed 2632.75 samples/sec   Loss 5.2425   LearningRate 0.0157   Epoch: 12   Global Step: 501060   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:47,677-Speed 2630.10 samples/sec   Loss 5.3454   LearningRate 0.0157   Epoch: 12   Global Step: 501070   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:51,568-Speed 2633.18 samples/sec   Loss 5.3507   LearningRate 0.0157   Epoch: 12   Global Step: 501080   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:55,460-Speed 2631.05 samples/sec   Loss 5.2937   LearningRate 0.0157   Epoch: 12   Global Step: 501090   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:02:59,353-Speed 2631.34 samples/sec   Loss 5.2994   LearningRate 0.0157   Epoch: 12   Global Step: 501100   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:03:03,225-Speed 2645.31 samples/sec   Loss 5.3605   LearningRate 0.0157   Epoch: 12   Global Step: 501110   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:07,125-Speed 2625.87 samples/sec   Loss 5.2139   LearningRate 0.0157   Epoch: 12   Global Step: 501120   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:11,018-Speed 2631.13 samples/sec   Loss 5.3539   LearningRate 0.0157   Epoch: 12   Global Step: 501130   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:14,906-Speed 2634.55 samples/sec   Loss 5.2868   LearningRate 0.0157   Epoch: 12   Global Step: 501140   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:18,794-Speed 2634.51 samples/sec   Loss 5.3339   LearningRate 0.0157   Epoch: 12   Global Step: 501150   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:22,678-Speed 2637.36 samples/sec   Loss 5.2841   LearningRate 0.0157   Epoch: 12   Global Step: 501160   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:26,577-Speed 2627.07 samples/sec   Loss 5.1903   LearningRate 0.0157   Epoch: 12   Global Step: 501170   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:30,471-Speed 2630.39 samples/sec   Loss 5.2319   LearningRate 0.0157   Epoch: 12   Global Step: 501180   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:34,361-Speed 2633.11 samples/sec   Loss 5.2958   LearningRate 0.0157   Epoch: 12   Global Step: 501190   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:38,249-Speed 2634.43 samples/sec   Loss 5.3716   LearningRate 0.0157   Epoch: 12   Global Step: 501200   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:03:42,176-Speed 2607.70 samples/sec   Loss 5.3065   LearningRate 0.0157   Epoch: 12   Global Step: 501210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:03:46,070-Speed 2630.60 samples/sec   Loss 5.3091   LearningRate 0.0157   Epoch: 12   Global Step: 501220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:03:49,960-Speed 2633.57 samples/sec   Loss 5.3616   LearningRate 0.0157   Epoch: 12   Global Step: 501230   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:03:53,850-Speed 2632.79 samples/sec   Loss 5.3104   LearningRate 0.0157   Epoch: 12   Global Step: 501240   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:03:57,744-Speed 2630.39 samples/sec   Loss 5.3533   LearningRate 0.0157   Epoch: 12   Global Step: 501250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:01,633-Speed 2633.79 samples/sec   Loss 5.2407   LearningRate 0.0157   Epoch: 12   Global Step: 501260   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:05,534-Speed 2625.54 samples/sec   Loss 5.3264   LearningRate 0.0157   Epoch: 12   Global Step: 501270   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:09,422-Speed 2634.02 samples/sec   Loss 5.2004   LearningRate 0.0157   Epoch: 12   Global Step: 501280   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:13,314-Speed 2631.53 samples/sec   Loss 5.2898   LearningRate 0.0157   Epoch: 12   Global Step: 501290   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:17,208-Speed 2630.88 samples/sec   Loss 5.2956   LearningRate 0.0157   Epoch: 12   Global Step: 501300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:21,101-Speed 2630.78 samples/sec   Loss 5.2310   LearningRate 0.0157   Epoch: 12   Global Step: 501310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:04:24,991-Speed 2632.91 samples/sec   Loss 5.2901   LearningRate 0.0157   Epoch: 12   Global Step: 501320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:04:28,864-Speed 2645.17 samples/sec   Loss 5.2940   LearningRate 0.0157   Epoch: 12   Global Step: 501330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:32,758-Speed 2629.84 samples/sec   Loss 5.3373   LearningRate 0.0157   Epoch: 12   Global Step: 501340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:36,652-Speed 2630.54 samples/sec   Loss 5.2947   LearningRate 0.0157   Epoch: 12   Global Step: 501350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:40,546-Speed 2630.44 samples/sec   Loss 5.4085   LearningRate 0.0157   Epoch: 12   Global Step: 501360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:44,437-Speed 2632.26 samples/sec   Loss 5.4412   LearningRate 0.0157   Epoch: 12   Global Step: 501370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:48,326-Speed 2633.94 samples/sec   Loss 5.2942   LearningRate 0.0157   Epoch: 12   Global Step: 501380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:52,231-Speed 2622.48 samples/sec   Loss 5.3164   LearningRate 0.0157   Epoch: 12   Global Step: 501390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:04:56,126-Speed 2629.86 samples/sec   Loss 5.3278   LearningRate 0.0156   Epoch: 12   Global Step: 501400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:05:00,074-Speed 2594.78 samples/sec   Loss 5.2206   LearningRate 0.0156   Epoch: 12   Global Step: 501410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:05:04,097-Speed 2545.96 samples/sec   Loss 5.2853   LearningRate 0.0156   Epoch: 12   Global Step: 501420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:05:07,994-Speed 2628.28 samples/sec   Loss 5.2977   LearningRate 0.0156   Epoch: 12   Global Step: 501430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:11,911-Speed 2615.09 samples/sec   Loss 5.2326   LearningRate 0.0156   Epoch: 12   Global Step: 501440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:15,806-Speed 2629.40 samples/sec   Loss 5.3863   LearningRate 0.0156   Epoch: 12   Global Step: 501450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:19,695-Speed 2633.81 samples/sec   Loss 5.2627   LearningRate 0.0156   Epoch: 12   Global Step: 501460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:23,586-Speed 2632.92 samples/sec   Loss 5.2927   LearningRate 0.0156   Epoch: 12   Global Step: 501470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:27,478-Speed 2631.70 samples/sec   Loss 5.2171   LearningRate 0.0156   Epoch: 12   Global Step: 501480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:31,379-Speed 2625.05 samples/sec   Loss 5.3188   LearningRate 0.0156   Epoch: 12   Global Step: 501490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:35,267-Speed 2634.92 samples/sec   Loss 5.2096   LearningRate 0.0156   Epoch: 12   Global Step: 501500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:39,171-Speed 2623.15 samples/sec   Loss 5.3546   LearningRate 0.0156   Epoch: 12   Global Step: 501510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:43,065-Speed 2630.26 samples/sec   Loss 5.2759   LearningRate 0.0156   Epoch: 12   Global Step: 501520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:46,965-Speed 2626.74 samples/sec   Loss 5.3359   LearningRate 0.0156   Epoch: 12   Global Step: 501530   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-04-15 04:05:50,842-Speed 2642.02 samples/sec   Loss 5.3267   LearningRate 0.0156   Epoch: 12   Global Step: 501540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:54,733-Speed 2631.72 samples/sec   Loss 5.3003   LearningRate 0.0156   Epoch: 12   Global Step: 501550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:05:58,623-Speed 2634.08 samples/sec   Loss 5.3652   LearningRate 0.0156   Epoch: 12   Global Step: 501560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:06:02,512-Speed 2633.43 samples/sec   Loss 5.3161   LearningRate 0.0156   Epoch: 12   Global Step: 501570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:06:06,411-Speed 2627.12 samples/sec   Loss 5.3236   LearningRate 0.0156   Epoch: 12   Global Step: 501580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:06:10,307-Speed 2628.58 samples/sec   Loss 5.2771   LearningRate 0.0156   Epoch: 12   Global Step: 501590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:06:14,187-Speed 2640.01 samples/sec   Loss 5.3022   LearningRate 0.0156   Epoch: 12   Global Step: 501600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:18,079-Speed 2631.88 samples/sec   Loss 5.3383   LearningRate 0.0156   Epoch: 12   Global Step: 501610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:21,966-Speed 2635.63 samples/sec   Loss 5.2039   LearningRate 0.0156   Epoch: 12   Global Step: 501620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:25,865-Speed 2627.19 samples/sec   Loss 5.2983   LearningRate 0.0156   Epoch: 12   Global Step: 501630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:29,752-Speed 2634.39 samples/sec   Loss 5.1992   LearningRate 0.0156   Epoch: 12   Global Step: 501640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:33,641-Speed 2634.00 samples/sec   Loss 5.2916   LearningRate 0.0156   Epoch: 12   Global Step: 501650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:37,537-Speed 2629.46 samples/sec   Loss 5.3119   LearningRate 0.0156   Epoch: 12   Global Step: 501660   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:41,425-Speed 2634.27 samples/sec   Loss 5.3362   LearningRate 0.0156   Epoch: 12   Global Step: 501670   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:45,314-Speed 2633.34 samples/sec   Loss 5.4193   LearningRate 0.0156   Epoch: 12   Global Step: 501680   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:49,204-Speed 2633.26 samples/sec   Loss 5.2218   LearningRate 0.0156   Epoch: 12   Global Step: 501690   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:53,082-Speed 2640.98 samples/sec   Loss 5.3799   LearningRate 0.0156   Epoch: 12   Global Step: 501700   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:06:56,981-Speed 2627.53 samples/sec   Loss 5.3080   LearningRate 0.0156   Epoch: 12   Global Step: 501710   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:00,918-Speed 2601.17 samples/sec   Loss 5.2286   LearningRate 0.0156   Epoch: 12   Global Step: 501720   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:04,830-Speed 2618.33 samples/sec   Loss 5.2084   LearningRate 0.0156   Epoch: 12   Global Step: 501730   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:08,727-Speed 2628.72 samples/sec   Loss 5.2847   LearningRate 0.0156   Epoch: 12   Global Step: 501740   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:12,619-Speed 2631.66 samples/sec   Loss 5.2366   LearningRate 0.0156   Epoch: 12   Global Step: 501750   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:16,517-Speed 2627.28 samples/sec   Loss 5.4110   LearningRate 0.0156   Epoch: 12   Global Step: 501760   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:20,428-Speed 2618.93 samples/sec   Loss 5.3651   LearningRate 0.0156   Epoch: 12   Global Step: 501770   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:24,316-Speed 2635.19 samples/sec   Loss 5.3710   LearningRate 0.0156   Epoch: 12   Global Step: 501780   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:28,213-Speed 2628.35 samples/sec   Loss 5.3044   LearningRate 0.0156   Epoch: 12   Global Step: 501790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:32,102-Speed 2633.67 samples/sec   Loss 5.2616   LearningRate 0.0156   Epoch: 12   Global Step: 501800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:07:35,991-Speed 2633.91 samples/sec   Loss 5.2514   LearningRate 0.0156   Epoch: 12   Global Step: 501810   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:07:39,879-Speed 2633.94 samples/sec   Loss 5.2834   LearningRate 0.0156   Epoch: 12   Global Step: 501820   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:07:43,772-Speed 2631.14 samples/sec   Loss 5.3072   LearningRate 0.0156   Epoch: 12   Global Step: 501830   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:07:47,644-Speed 2645.45 samples/sec   Loss 5.2448   LearningRate 0.0156   Epoch: 12   Global Step: 501840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:51,535-Speed 2632.27 samples/sec   Loss 5.4061   LearningRate 0.0156   Epoch: 12   Global Step: 501850   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:55,428-Speed 2631.00 samples/sec   Loss 5.3083   LearningRate 0.0156   Epoch: 12   Global Step: 501860   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:07:59,323-Speed 2629.52 samples/sec   Loss 5.2660   LearningRate 0.0156   Epoch: 12   Global Step: 501870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:03,216-Speed 2631.34 samples/sec   Loss 5.2572   LearningRate 0.0156   Epoch: 12   Global Step: 501880   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:07,108-Speed 2631.61 samples/sec   Loss 5.3039   LearningRate 0.0156   Epoch: 12   Global Step: 501890   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:10,999-Speed 2632.44 samples/sec   Loss 5.2406   LearningRate 0.0156   Epoch: 12   Global Step: 501900   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:14,889-Speed 2632.45 samples/sec   Loss 5.3488   LearningRate 0.0156   Epoch: 12   Global Step: 501910   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:18,789-Speed 2627.21 samples/sec   Loss 5.2815   LearningRate 0.0156   Epoch: 12   Global Step: 501920   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:22,683-Speed 2630.60 samples/sec   Loss 5.2237   LearningRate 0.0156   Epoch: 12   Global Step: 501930   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:26,565-Speed 2638.12 samples/sec   Loss 5.2169   LearningRate 0.0156   Epoch: 12   Global Step: 501940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:30,486-Speed 2612.57 samples/sec   Loss 5.2529   LearningRate 0.0156   Epoch: 12   Global Step: 501950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:34,379-Speed 2631.26 samples/sec   Loss 5.2800   LearningRate 0.0156   Epoch: 12   Global Step: 501960   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:38,269-Speed 2633.84 samples/sec   Loss 5.2567   LearningRate 0.0156   Epoch: 12   Global Step: 501970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:42,159-Speed 2633.17 samples/sec   Loss 5.3647   LearningRate 0.0156   Epoch: 12   Global Step: 501980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:46,051-Speed 2631.29 samples/sec   Loss 5.2249   LearningRate 0.0156   Epoch: 12   Global Step: 501990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:49,947-Speed 2629.05 samples/sec   Loss 5.3392   LearningRate 0.0156   Epoch: 12   Global Step: 502000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:53,839-Speed 2631.67 samples/sec   Loss 5.3783   LearningRate 0.0156   Epoch: 12   Global Step: 502010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:08:57,728-Speed 2634.28 samples/sec   Loss 5.3285   LearningRate 0.0156   Epoch: 12   Global Step: 502020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:01,619-Speed 2632.68 samples/sec   Loss 5.3186   LearningRate 0.0156   Epoch: 12   Global Step: 502030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:05,512-Speed 2631.06 samples/sec   Loss 5.2927   LearningRate 0.0156   Epoch: 12   Global Step: 502040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:09:09,406-Speed 2629.68 samples/sec   Loss 5.2452   LearningRate 0.0156   Epoch: 12   Global Step: 502050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:09:13,297-Speed 2632.55 samples/sec   Loss 5.3479   LearningRate 0.0156   Epoch: 12   Global Step: 502060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:09:17,187-Speed 2633.08 samples/sec   Loss 5.2677   LearningRate 0.0156   Epoch: 12   Global Step: 502070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:09:21,087-Speed 2626.19 samples/sec   Loss 5.2787   LearningRate 0.0156   Epoch: 12   Global Step: 502080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:09:24,979-Speed 2632.07 samples/sec   Loss 5.4216   LearningRate 0.0156   Epoch: 12   Global Step: 502090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:09:28,869-Speed 2632.96 samples/sec   Loss 5.3633   LearningRate 0.0156   Epoch: 12   Global Step: 502100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:09:32,738-Speed 2647.69 samples/sec   Loss 5.3127   LearningRate 0.0156   Epoch: 12   Global Step: 502110   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:36,630-Speed 2631.42 samples/sec   Loss 5.3006   LearningRate 0.0156   Epoch: 12   Global Step: 502120   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:40,525-Speed 2629.25 samples/sec   Loss 5.2173   LearningRate 0.0156   Epoch: 12   Global Step: 502130   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:44,412-Speed 2634.84 samples/sec   Loss 5.1935   LearningRate 0.0156   Epoch: 12   Global Step: 502140   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:48,303-Speed 2632.33 samples/sec   Loss 5.3534   LearningRate 0.0156   Epoch: 12   Global Step: 502150   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:52,197-Speed 2630.54 samples/sec   Loss 5.2682   LearningRate 0.0156   Epoch: 12   Global Step: 502160   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:56,085-Speed 2633.99 samples/sec   Loss 5.3094   LearningRate 0.0156   Epoch: 12   Global Step: 502170   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:09:59,984-Speed 2627.34 samples/sec   Loss 5.3033   LearningRate 0.0156   Epoch: 12   Global Step: 502180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:03,880-Speed 2629.10 samples/sec   Loss 5.3655   LearningRate 0.0156   Epoch: 12   Global Step: 502190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:07,768-Speed 2634.14 samples/sec   Loss 5.3614   LearningRate 0.0156   Epoch: 12   Global Step: 502200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:11,662-Speed 2630.55 samples/sec   Loss 5.3312   LearningRate 0.0156   Epoch: 12   Global Step: 502210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:10:15,549-Speed 2634.93 samples/sec   Loss 5.2661   LearningRate 0.0156   Epoch: 12   Global Step: 502220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:10:19,447-Speed 2627.47 samples/sec   Loss 5.3029   LearningRate 0.0156   Epoch: 12   Global Step: 502230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:10:23,337-Speed 2633.05 samples/sec   Loss 5.2639   LearningRate 0.0156   Epoch: 12   Global Step: 502240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:10:27,226-Speed 2633.93 samples/sec   Loss 5.3464   LearningRate 0.0156   Epoch: 12   Global Step: 502250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:31,126-Speed 2625.98 samples/sec   Loss 5.2130   LearningRate 0.0156   Epoch: 12   Global Step: 502260   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:35,016-Speed 2633.27 samples/sec   Loss 5.3557   LearningRate 0.0156   Epoch: 12   Global Step: 502270   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:38,909-Speed 2631.36 samples/sec   Loss 5.3030   LearningRate 0.0156   Epoch: 12   Global Step: 502280   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:42,822-Speed 2617.42 samples/sec   Loss 5.2919   LearningRate 0.0156   Epoch: 12   Global Step: 502290   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:46,713-Speed 2632.11 samples/sec   Loss 5.3554   LearningRate 0.0156   Epoch: 12   Global Step: 502300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:50,608-Speed 2629.92 samples/sec   Loss 5.2450   LearningRate 0.0156   Epoch: 12   Global Step: 502310   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:54,502-Speed 2630.09 samples/sec   Loss 5.2935   LearningRate 0.0156   Epoch: 12   Global Step: 502320   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:10:58,392-Speed 2633.21 samples/sec   Loss 5.3703   LearningRate 0.0156   Epoch: 12   Global Step: 502330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:02,287-Speed 2629.74 samples/sec   Loss 5.2440   LearningRate 0.0156   Epoch: 12   Global Step: 502340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:06,179-Speed 2631.59 samples/sec   Loss 5.2466   LearningRate 0.0156   Epoch: 12   Global Step: 502350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:11:10,053-Speed 2643.88 samples/sec   Loss 5.1657   LearningRate 0.0156   Epoch: 12   Global Step: 502360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:13,949-Speed 2628.68 samples/sec   Loss 5.1621   LearningRate 0.0156   Epoch: 12   Global Step: 502370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:17,859-Speed 2619.61 samples/sec   Loss 5.2985   LearningRate 0.0156   Epoch: 12   Global Step: 502380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:21,751-Speed 2631.58 samples/sec   Loss 5.3127   LearningRate 0.0156   Epoch: 12   Global Step: 502390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:25,646-Speed 2629.75 samples/sec   Loss 5.3251   LearningRate 0.0156   Epoch: 12   Global Step: 502400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:29,540-Speed 2630.22 samples/sec   Loss 5.2749   LearningRate 0.0156   Epoch: 12   Global Step: 502410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:33,486-Speed 2596.10 samples/sec   Loss 5.1911   LearningRate 0.0156   Epoch: 12   Global Step: 502420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:37,378-Speed 2631.59 samples/sec   Loss 5.2946   LearningRate 0.0156   Epoch: 12   Global Step: 502430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:41,289-Speed 2618.86 samples/sec   Loss 5.3673   LearningRate 0.0156   Epoch: 12   Global Step: 502440   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:45,184-Speed 2629.40 samples/sec   Loss 5.3008   LearningRate 0.0155   Epoch: 12   Global Step: 502450   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:11:49,092-Speed 2620.52 samples/sec   Loss 5.2590   LearningRate 0.0155   Epoch: 12   Global Step: 502460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:11:52,981-Speed 2633.61 samples/sec   Loss 5.1441   LearningRate 0.0155   Epoch: 12   Global Step: 502470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:11:56,874-Speed 2631.82 samples/sec   Loss 5.3387   LearningRate 0.0155   Epoch: 12   Global Step: 502480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:00,769-Speed 2629.37 samples/sec   Loss 5.1537   LearningRate 0.0155   Epoch: 12   Global Step: 502490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:04,702-Speed 2604.55 samples/sec   Loss 5.3070   LearningRate 0.0155   Epoch: 12   Global Step: 502500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:08,589-Speed 2634.68 samples/sec   Loss 5.3456   LearningRate 0.0155   Epoch: 12   Global Step: 502510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:12,476-Speed 2635.88 samples/sec   Loss 5.2979   LearningRate 0.0155   Epoch: 12   Global Step: 502520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:16,366-Speed 2632.85 samples/sec   Loss 5.3280   LearningRate 0.0155   Epoch: 12   Global Step: 502530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:20,255-Speed 2633.13 samples/sec   Loss 5.2043   LearningRate 0.0155   Epoch: 12   Global Step: 502540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:24,144-Speed 2634.05 samples/sec   Loss 5.4120   LearningRate 0.0155   Epoch: 12   Global Step: 502550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:12:28,015-Speed 2645.95 samples/sec   Loss 5.2749   LearningRate 0.0155   Epoch: 12   Global Step: 502560   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:31,916-Speed 2625.30 samples/sec   Loss 5.1682   LearningRate 0.0155   Epoch: 12   Global Step: 502570   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:35,810-Speed 2630.16 samples/sec   Loss 5.3617   LearningRate 0.0155   Epoch: 12   Global Step: 502580   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:39,700-Speed 2633.00 samples/sec   Loss 5.2655   LearningRate 0.0155   Epoch: 12   Global Step: 502590   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:43,592-Speed 2632.02 samples/sec   Loss 5.3254   LearningRate 0.0155   Epoch: 12   Global Step: 502600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:47,483-Speed 2632.15 samples/sec   Loss 5.3877   LearningRate 0.0155   Epoch: 12   Global Step: 502610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:51,375-Speed 2632.13 samples/sec   Loss 5.1801   LearningRate 0.0155   Epoch: 12   Global Step: 502620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:55,301-Speed 2608.14 samples/sec   Loss 5.4002   LearningRate 0.0155   Epoch: 12   Global Step: 502630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:12:59,197-Speed 2629.07 samples/sec   Loss 5.2121   LearningRate 0.0155   Epoch: 12   Global Step: 502640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:13:03,099-Speed 2624.38 samples/sec   Loss 5.3047   LearningRate 0.0155   Epoch: 12   Global Step: 502650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:13:06,992-Speed 2632.10 samples/sec   Loss 5.2549   LearningRate 0.0155   Epoch: 12   Global Step: 502660   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:10,880-Speed 2633.76 samples/sec   Loss 5.2234   LearningRate 0.0155   Epoch: 12   Global Step: 502670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:14,769-Speed 2633.86 samples/sec   Loss 5.1994   LearningRate 0.0155   Epoch: 12   Global Step: 502680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:18,696-Speed 2608.93 samples/sec   Loss 5.2900   LearningRate 0.0155   Epoch: 12   Global Step: 502690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:22,653-Speed 2587.83 samples/sec   Loss 5.2388   LearningRate 0.0155   Epoch: 12   Global Step: 502700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:26,567-Speed 2617.05 samples/sec   Loss 5.2745   LearningRate 0.0155   Epoch: 12   Global Step: 502710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:30,536-Speed 2580.48 samples/sec   Loss 5.2359   LearningRate 0.0155   Epoch: 12   Global Step: 502720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:34,510-Speed 2576.93 samples/sec   Loss 5.3168   LearningRate 0.0155   Epoch: 12   Global Step: 502730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:38,407-Speed 2628.53 samples/sec   Loss 5.2795   LearningRate 0.0155   Epoch: 12   Global Step: 502740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:42,309-Speed 2625.04 samples/sec   Loss 5.2148   LearningRate 0.0155   Epoch: 12   Global Step: 502750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:46,184-Speed 2642.85 samples/sec   Loss 5.2767   LearningRate 0.0155   Epoch: 12   Global Step: 502760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:50,076-Speed 2631.80 samples/sec   Loss 5.2649   LearningRate 0.0155   Epoch: 12   Global Step: 502770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:53,971-Speed 2630.11 samples/sec   Loss 5.3012   LearningRate 0.0155   Epoch: 12   Global Step: 502780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:13:57,860-Speed 2633.65 samples/sec   Loss 5.1555   LearningRate 0.0155   Epoch: 12   Global Step: 502790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:01,755-Speed 2629.18 samples/sec   Loss 5.2750   LearningRate 0.0155   Epoch: 12   Global Step: 502800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:05,658-Speed 2624.13 samples/sec   Loss 5.3650   LearningRate 0.0155   Epoch: 12   Global Step: 502810   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:09,549-Speed 2632.53 samples/sec   Loss 5.2768   LearningRate 0.0155   Epoch: 12   Global Step: 502820   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:13,454-Speed 2622.69 samples/sec   Loss 5.1807   LearningRate 0.0155   Epoch: 12   Global Step: 502830   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:17,348-Speed 2630.45 samples/sec   Loss 5.2756   LearningRate 0.0155   Epoch: 12   Global Step: 502840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:21,239-Speed 2632.79 samples/sec   Loss 5.2431   LearningRate 0.0155   Epoch: 12   Global Step: 502850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:25,107-Speed 2647.62 samples/sec   Loss 5.3157   LearningRate 0.0155   Epoch: 12   Global Step: 502860   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:29,003-Speed 2629.53 samples/sec   Loss 5.2372   LearningRate 0.0155   Epoch: 12   Global Step: 502870   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:32,894-Speed 2632.06 samples/sec   Loss 5.3152   LearningRate 0.0155   Epoch: 12   Global Step: 502880   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:36,792-Speed 2627.46 samples/sec   Loss 5.3014   LearningRate 0.0155   Epoch: 12   Global Step: 502890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:40,685-Speed 2630.98 samples/sec   Loss 5.2530   LearningRate 0.0155   Epoch: 12   Global Step: 502900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:14:44,563-Speed 2640.48 samples/sec   Loss 5.2920   LearningRate 0.0155   Epoch: 12   Global Step: 502910   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:14:48,466-Speed 2624.77 samples/sec   Loss 5.3862   LearningRate 0.0155   Epoch: 12   Global Step: 502920   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:14:52,362-Speed 2628.83 samples/sec   Loss 5.2798   LearningRate 0.0155   Epoch: 12   Global Step: 502930   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:14:56,259-Speed 2628.55 samples/sec   Loss 5.3757   LearningRate 0.0155   Epoch: 12   Global Step: 502940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:00,165-Speed 2621.59 samples/sec   Loss 5.2220   LearningRate 0.0155   Epoch: 12   Global Step: 502950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:04,058-Speed 2631.02 samples/sec   Loss 5.3256   LearningRate 0.0155   Epoch: 12   Global Step: 502960   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:08,057-Speed 2561.28 samples/sec   Loss 5.2243   LearningRate 0.0155   Epoch: 12   Global Step: 502970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:11,956-Speed 2627.63 samples/sec   Loss 5.2434   LearningRate 0.0155   Epoch: 12   Global Step: 502980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:15,849-Speed 2631.12 samples/sec   Loss 5.3094   LearningRate 0.0155   Epoch: 12   Global Step: 502990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:19,751-Speed 2624.95 samples/sec   Loss 5.2949   LearningRate 0.0155   Epoch: 12   Global Step: 503000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:23,652-Speed 2625.29 samples/sec   Loss 5.2500   LearningRate 0.0155   Epoch: 12   Global Step: 503010   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:15:27,554-Speed 2625.11 samples/sec   Loss 5.3211   LearningRate 0.0155   Epoch: 12   Global Step: 503020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:15:31,435-Speed 2638.85 samples/sec   Loss 5.2215   LearningRate 0.0155   Epoch: 12   Global Step: 503030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:35,329-Speed 2630.26 samples/sec   Loss 5.3595   LearningRate 0.0155   Epoch: 12   Global Step: 503040   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:39,220-Speed 2632.31 samples/sec   Loss 5.2763   LearningRate 0.0155   Epoch: 12   Global Step: 503050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:43,129-Speed 2620.61 samples/sec   Loss 5.1932   LearningRate 0.0155   Epoch: 12   Global Step: 503060   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:47,022-Speed 2630.83 samples/sec   Loss 5.2549   LearningRate 0.0155   Epoch: 12   Global Step: 503070   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:50,922-Speed 2626.48 samples/sec   Loss 5.3089   LearningRate 0.0155   Epoch: 12   Global Step: 503080   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:54,824-Speed 2624.57 samples/sec   Loss 5.3039   LearningRate 0.0155   Epoch: 12   Global Step: 503090   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:15:58,720-Speed 2629.04 samples/sec   Loss 5.2798   LearningRate 0.0155   Epoch: 12   Global Step: 503100   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:02,618-Speed 2627.73 samples/sec   Loss 5.2991   LearningRate 0.0155   Epoch: 12   Global Step: 503110   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:06,516-Speed 2627.39 samples/sec   Loss 5.2900   LearningRate 0.0155   Epoch: 12   Global Step: 503120   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:10,414-Speed 2627.28 samples/sec   Loss 5.2702   LearningRate 0.0155   Epoch: 12   Global Step: 503130   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:16:14,311-Speed 2628.61 samples/sec   Loss 5.1694   LearningRate 0.0155   Epoch: 12   Global Step: 503140   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:16:18,205-Speed 2630.22 samples/sec   Loss 5.2400   LearningRate 0.0155   Epoch: 12   Global Step: 503150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:16:22,081-Speed 2642.59 samples/sec   Loss 5.3457   LearningRate 0.0155   Epoch: 12   Global Step: 503160   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:25,970-Speed 2633.76 samples/sec   Loss 5.2480   LearningRate 0.0155   Epoch: 12   Global Step: 503170   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:29,863-Speed 2631.13 samples/sec   Loss 5.2018   LearningRate 0.0155   Epoch: 12   Global Step: 503180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:33,758-Speed 2629.83 samples/sec   Loss 5.2442   LearningRate 0.0155   Epoch: 12   Global Step: 503190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:37,654-Speed 2628.77 samples/sec   Loss 5.3074   LearningRate 0.0155   Epoch: 12   Global Step: 503200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:41,555-Speed 2625.53 samples/sec   Loss 5.2921   LearningRate 0.0155   Epoch: 12   Global Step: 503210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:45,452-Speed 2627.97 samples/sec   Loss 5.2419   LearningRate 0.0155   Epoch: 12   Global Step: 503220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:49,349-Speed 2628.13 samples/sec   Loss 5.2257   LearningRate 0.0155   Epoch: 12   Global Step: 503230   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:53,239-Speed 2633.33 samples/sec   Loss 5.1788   LearningRate 0.0155   Epoch: 12   Global Step: 503240   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:16:57,132-Speed 2630.92 samples/sec   Loss 5.3427   LearningRate 0.0155   Epoch: 12   Global Step: 503250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:17:01,027-Speed 2629.61 samples/sec   Loss 5.2311   LearningRate 0.0155   Epoch: 12   Global Step: 503260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:04,918-Speed 2632.35 samples/sec   Loss 5.3008   LearningRate 0.0155   Epoch: 12   Global Step: 503270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:08,809-Speed 2632.40 samples/sec   Loss 5.3541   LearningRate 0.0155   Epoch: 12   Global Step: 503280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:12,705-Speed 2628.96 samples/sec   Loss 5.3695   LearningRate 0.0155   Epoch: 12   Global Step: 503290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:16,671-Speed 2582.32 samples/sec   Loss 5.2091   LearningRate 0.0155   Epoch: 12   Global Step: 503300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:20,571-Speed 2626.30 samples/sec   Loss 5.2565   LearningRate 0.0155   Epoch: 12   Global Step: 503310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:24,462-Speed 2632.84 samples/sec   Loss 5.3542   LearningRate 0.0155   Epoch: 12   Global Step: 503320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:28,355-Speed 2631.02 samples/sec   Loss 5.2204   LearningRate 0.0155   Epoch: 12   Global Step: 503330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:32,246-Speed 2632.60 samples/sec   Loss 5.3202   LearningRate 0.0155   Epoch: 12   Global Step: 503340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:36,137-Speed 2632.83 samples/sec   Loss 5.1875   LearningRate 0.0155   Epoch: 12   Global Step: 503350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:40,010-Speed 2644.60 samples/sec   Loss 5.3399   LearningRate 0.0155   Epoch: 12   Global Step: 503360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:43,906-Speed 2628.72 samples/sec   Loss 5.2849   LearningRate 0.0155   Epoch: 12   Global Step: 503370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:47,801-Speed 2629.49 samples/sec   Loss 5.3048   LearningRate 0.0155   Epoch: 12   Global Step: 503380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:17:51,670-Speed 2647.41 samples/sec   Loss 5.3475   LearningRate 0.0155   Epoch: 12   Global Step: 503390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:17:55,565-Speed 2629.13 samples/sec   Loss 5.2006   LearningRate 0.0155   Epoch: 12   Global Step: 503400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:17:59,464-Speed 2627.75 samples/sec   Loss 5.1884   LearningRate 0.0155   Epoch: 12   Global Step: 503410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:03,378-Speed 2616.88 samples/sec   Loss 5.1713   LearningRate 0.0155   Epoch: 12   Global Step: 503420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:07,276-Speed 2628.05 samples/sec   Loss 5.2337   LearningRate 0.0155   Epoch: 12   Global Step: 503430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:11,174-Speed 2627.75 samples/sec   Loss 5.2949   LearningRate 0.0155   Epoch: 12   Global Step: 503440   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:15,072-Speed 2627.66 samples/sec   Loss 5.2300   LearningRate 0.0155   Epoch: 12   Global Step: 503450   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:18,966-Speed 2630.03 samples/sec   Loss 5.2785   LearningRate 0.0155   Epoch: 12   Global Step: 503460   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:22,862-Speed 2629.23 samples/sec   Loss 5.3055   LearningRate 0.0155   Epoch: 12   Global Step: 503470   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:26,761-Speed 2626.92 samples/sec   Loss 5.3105   LearningRate 0.0155   Epoch: 12   Global Step: 503480   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:18:30,654-Speed 2630.77 samples/sec   Loss 5.2598   LearningRate 0.0155   Epoch: 12   Global Step: 503490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:18:34,564-Speed 2623.48 samples/sec   Loss 5.3769   LearningRate 0.0155   Epoch: 12   Global Step: 503500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:18:38,460-Speed 2629.70 samples/sec   Loss 5.2854   LearningRate 0.0154   Epoch: 12   Global Step: 503510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:18:42,371-Speed 2618.53 samples/sec   Loss 5.2613   LearningRate 0.0154   Epoch: 12   Global Step: 503520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:18:46,268-Speed 2628.45 samples/sec   Loss 5.3976   LearningRate 0.0154   Epoch: 12   Global Step: 503530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:18:50,163-Speed 2629.91 samples/sec   Loss 5.3156   LearningRate 0.0154   Epoch: 12   Global Step: 503540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:18:54,073-Speed 2620.07 samples/sec   Loss 5.2815   LearningRate 0.0154   Epoch: 12   Global Step: 503550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:18:57,971-Speed 2627.67 samples/sec   Loss 5.3440   LearningRate 0.0154   Epoch: 12   Global Step: 503560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:01,864-Speed 2630.89 samples/sec   Loss 5.2678   LearningRate 0.0154   Epoch: 12   Global Step: 503570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:05,758-Speed 2629.76 samples/sec   Loss 5.2951   LearningRate 0.0154   Epoch: 12   Global Step: 503580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:09,651-Speed 2631.72 samples/sec   Loss 5.2982   LearningRate 0.0154   Epoch: 12   Global Step: 503590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:13,560-Speed 2629.18 samples/sec   Loss 5.2839   LearningRate 0.0154   Epoch: 12   Global Step: 503600   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:17,458-Speed 2627.83 samples/sec   Loss 5.2996   LearningRate 0.0154   Epoch: 12   Global Step: 503610   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:21,383-Speed 2609.32 samples/sec   Loss 5.1941   LearningRate 0.0154   Epoch: 12   Global Step: 503620   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:25,286-Speed 2624.47 samples/sec   Loss 5.3269   LearningRate 0.0154   Epoch: 12   Global Step: 503630   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:29,227-Speed 2599.35 samples/sec   Loss 5.3243   LearningRate 0.0154   Epoch: 12   Global Step: 503640   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:33,137-Speed 2619.12 samples/sec   Loss 5.2740   LearningRate 0.0154   Epoch: 12   Global Step: 503650   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:37,034-Speed 2628.57 samples/sec   Loss 5.2628   LearningRate 0.0154   Epoch: 12   Global Step: 503660   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:40,931-Speed 2627.96 samples/sec   Loss 5.2752   LearningRate 0.0154   Epoch: 12   Global Step: 503670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:44,834-Speed 2624.38 samples/sec   Loss 5.2772   LearningRate 0.0154   Epoch: 12   Global Step: 503680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:48,711-Speed 2643.03 samples/sec   Loss 5.3539   LearningRate 0.0154   Epoch: 12   Global Step: 503690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:52,614-Speed 2623.93 samples/sec   Loss 5.2477   LearningRate 0.0154   Epoch: 12   Global Step: 503700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:19:56,505-Speed 2632.24 samples/sec   Loss 5.2841   LearningRate 0.0154   Epoch: 12   Global Step: 503710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:00,394-Speed 2633.84 samples/sec   Loss 5.2460   LearningRate 0.0154   Epoch: 12   Global Step: 503720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:04,289-Speed 2629.14 samples/sec   Loss 5.1877   LearningRate 0.0154   Epoch: 12   Global Step: 503730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:08,200-Speed 2618.95 samples/sec   Loss 5.2963   LearningRate 0.0154   Epoch: 12   Global Step: 503740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:12,099-Speed 2627.67 samples/sec   Loss 5.1948   LearningRate 0.0154   Epoch: 12   Global Step: 503750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:15,996-Speed 2627.89 samples/sec   Loss 5.2423   LearningRate 0.0154   Epoch: 12   Global Step: 503760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:19,924-Speed 2607.73 samples/sec   Loss 5.2356   LearningRate 0.0154   Epoch: 12   Global Step: 503770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:23,818-Speed 2630.83 samples/sec   Loss 5.2494   LearningRate 0.0154   Epoch: 12   Global Step: 503780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:20:27,688-Speed 2646.97 samples/sec   Loss 5.1637   LearningRate 0.0154   Epoch: 12   Global Step: 503790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:31,585-Speed 2627.86 samples/sec   Loss 5.2418   LearningRate 0.0154   Epoch: 12   Global Step: 503800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:35,483-Speed 2627.38 samples/sec   Loss 5.2228   LearningRate 0.0154   Epoch: 12   Global Step: 503810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:39,386-Speed 2623.94 samples/sec   Loss 5.2678   LearningRate 0.0154   Epoch: 12   Global Step: 503820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:43,278-Speed 2632.77 samples/sec   Loss 5.3951   LearningRate 0.0154   Epoch: 12   Global Step: 503830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:47,172-Speed 2630.02 samples/sec   Loss 5.2227   LearningRate 0.0154   Epoch: 12   Global Step: 503840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:51,069-Speed 2628.58 samples/sec   Loss 5.3964   LearningRate 0.0154   Epoch: 12   Global Step: 503850   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:54,964-Speed 2629.63 samples/sec   Loss 5.3905   LearningRate 0.0154   Epoch: 12   Global Step: 503860   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:20:58,871-Speed 2621.80 samples/sec   Loss 5.2707   LearningRate 0.0154   Epoch: 12   Global Step: 503870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:21:02,768-Speed 2628.10 samples/sec   Loss 5.2956   LearningRate 0.0154   Epoch: 12   Global Step: 503880   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:21:06,673-Speed 2622.55 samples/sec   Loss 5.2347   LearningRate 0.0154   Epoch: 12   Global Step: 503890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:10,570-Speed 2628.23 samples/sec   Loss 5.3307   LearningRate 0.0154   Epoch: 12   Global Step: 503900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:14,467-Speed 2628.65 samples/sec   Loss 5.3247   LearningRate 0.0154   Epoch: 12   Global Step: 503910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:18,375-Speed 2621.26 samples/sec   Loss 5.2307   LearningRate 0.0154   Epoch: 12   Global Step: 503920   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:22,270-Speed 2629.18 samples/sec   Loss 5.2624   LearningRate 0.0154   Epoch: 12   Global Step: 503930   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:26,174-Speed 2624.24 samples/sec   Loss 5.2454   LearningRate 0.0154   Epoch: 12   Global Step: 503940   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:30,067-Speed 2630.75 samples/sec   Loss 5.2233   LearningRate 0.0154   Epoch: 12   Global Step: 503950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:33,964-Speed 2628.10 samples/sec   Loss 5.3361   LearningRate 0.0154   Epoch: 12   Global Step: 503960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:37,860-Speed 2628.80 samples/sec   Loss 5.2042   LearningRate 0.0154   Epoch: 12   Global Step: 503970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:41,753-Speed 2630.53 samples/sec   Loss 5.2902   LearningRate 0.0154   Epoch: 12   Global Step: 503980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:45,627-Speed 2643.87 samples/sec   Loss 5.2485   LearningRate 0.0154   Epoch: 12   Global Step: 503990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:21:49,514-Speed 2635.66 samples/sec   Loss 5.3431   LearningRate 0.0154   Epoch: 12   Global Step: 504000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:21:53,412-Speed 2627.58 samples/sec   Loss 5.2843   LearningRate 0.0154   Epoch: 12   Global Step: 504010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:21:57,321-Speed 2620.37 samples/sec   Loss 5.1351   LearningRate 0.0154   Epoch: 12   Global Step: 504020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:01,212-Speed 2632.18 samples/sec   Loss 5.3173   LearningRate 0.0154   Epoch: 12   Global Step: 504030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:05,106-Speed 2630.10 samples/sec   Loss 5.3736   LearningRate 0.0154   Epoch: 12   Global Step: 504040   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:09,000-Speed 2630.34 samples/sec   Loss 5.3016   LearningRate 0.0154   Epoch: 12   Global Step: 504050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:12,894-Speed 2630.09 samples/sec   Loss 5.2878   LearningRate 0.0154   Epoch: 12   Global Step: 504060   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:16,789-Speed 2629.87 samples/sec   Loss 5.2704   LearningRate 0.0154   Epoch: 12   Global Step: 504070   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:20,708-Speed 2613.31 samples/sec   Loss 5.1989   LearningRate 0.0154   Epoch: 12   Global Step: 504080   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:24,618-Speed 2619.87 samples/sec   Loss 5.1796   LearningRate 0.0154   Epoch: 12   Global Step: 504090   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:28,511-Speed 2630.83 samples/sec   Loss 5.2248   LearningRate 0.0154   Epoch: 12   Global Step: 504100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:22:32,387-Speed 2643.00 samples/sec   Loss 5.2687   LearningRate 0.0154   Epoch: 12   Global Step: 504110   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:36,282-Speed 2629.38 samples/sec   Loss 5.2788   LearningRate 0.0154   Epoch: 12   Global Step: 504120   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:40,178-Speed 2629.17 samples/sec   Loss 5.3257   LearningRate 0.0154   Epoch: 12   Global Step: 504130   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:44,072-Speed 2630.07 samples/sec   Loss 5.2670   LearningRate 0.0154   Epoch: 12   Global Step: 504140   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:47,971-Speed 2626.81 samples/sec   Loss 5.2284   LearningRate 0.0154   Epoch: 12   Global Step: 504150   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:51,878-Speed 2621.38 samples/sec   Loss 5.2765   LearningRate 0.0154   Epoch: 12   Global Step: 504160   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:55,796-Speed 2614.19 samples/sec   Loss 5.3107   LearningRate 0.0154   Epoch: 12   Global Step: 504170   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:22:59,701-Speed 2623.03 samples/sec   Loss 5.3224   LearningRate 0.0154   Epoch: 12   Global Step: 504180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:03,599-Speed 2627.82 samples/sec   Loss 5.2345   LearningRate 0.0154   Epoch: 12   Global Step: 504190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:07,502-Speed 2624.50 samples/sec   Loss 5.2770   LearningRate 0.0154   Epoch: 12   Global Step: 504200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:11,396-Speed 2630.35 samples/sec   Loss 5.2232   LearningRate 0.0154   Epoch: 12   Global Step: 504210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:23:15,298-Speed 2624.89 samples/sec   Loss 5.2199   LearningRate 0.0154   Epoch: 12   Global Step: 504220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:23:19,199-Speed 2625.79 samples/sec   Loss 5.2515   LearningRate 0.0154   Epoch: 12   Global Step: 504230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:23:23,094-Speed 2629.04 samples/sec   Loss 5.2270   LearningRate 0.0154   Epoch: 12   Global Step: 504240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:23:26,993-Speed 2626.99 samples/sec   Loss 5.1770   LearningRate 0.0154   Epoch: 12   Global Step: 504250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:23:30,864-Speed 2645.93 samples/sec   Loss 5.2335   LearningRate 0.0154   Epoch: 12   Global Step: 504260   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:34,758-Speed 2630.78 samples/sec   Loss 5.2509   LearningRate 0.0154   Epoch: 12   Global Step: 504270   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:38,662-Speed 2623.10 samples/sec   Loss 5.2138   LearningRate 0.0154   Epoch: 12   Global Step: 504280   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:42,580-Speed 2615.47 samples/sec   Loss 5.2094   LearningRate 0.0154   Epoch: 12   Global Step: 504290   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:46,553-Speed 2577.87 samples/sec   Loss 5.2567   LearningRate 0.0154   Epoch: 12   Global Step: 504300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:50,451-Speed 2627.31 samples/sec   Loss 5.2193   LearningRate 0.0154   Epoch: 12   Global Step: 504310   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:54,355-Speed 2623.50 samples/sec   Loss 5.2684   LearningRate 0.0154   Epoch: 12   Global Step: 504320   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:23:58,252-Speed 2628.24 samples/sec   Loss 5.2302   LearningRate 0.0154   Epoch: 12   Global Step: 504330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:02,159-Speed 2621.55 samples/sec   Loss 5.2976   LearningRate 0.0154   Epoch: 12   Global Step: 504340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:06,060-Speed 2625.64 samples/sec   Loss 5.2345   LearningRate 0.0154   Epoch: 12   Global Step: 504350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:09,965-Speed 2623.09 samples/sec   Loss 5.3239   LearningRate 0.0154   Epoch: 12   Global Step: 504360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:24:13,863-Speed 2627.61 samples/sec   Loss 5.2742   LearningRate 0.0154   Epoch: 12   Global Step: 504370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:24:17,777-Speed 2616.69 samples/sec   Loss 5.2693   LearningRate 0.0154   Epoch: 12   Global Step: 504380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:24:21,688-Speed 2619.43 samples/sec   Loss 5.2256   LearningRate 0.0154   Epoch: 12   Global Step: 504390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:24:25,565-Speed 2641.83 samples/sec   Loss 5.3691   LearningRate 0.0154   Epoch: 12   Global Step: 504400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:29,463-Speed 2627.20 samples/sec   Loss 5.2873   LearningRate 0.0154   Epoch: 12   Global Step: 504410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:33,355-Speed 2631.93 samples/sec   Loss 5.3021   LearningRate 0.0154   Epoch: 12   Global Step: 504420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:37,268-Speed 2617.61 samples/sec   Loss 5.2480   LearningRate 0.0154   Epoch: 12   Global Step: 504430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:41,167-Speed 2626.56 samples/sec   Loss 5.2311   LearningRate 0.0154   Epoch: 12   Global Step: 504440   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:45,069-Speed 2625.28 samples/sec   Loss 5.2844   LearningRate 0.0154   Epoch: 12   Global Step: 504450   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:48,968-Speed 2626.47 samples/sec   Loss 5.2590   LearningRate 0.0154   Epoch: 12   Global Step: 504460   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:52,858-Speed 2633.11 samples/sec   Loss 5.2705   LearningRate 0.0154   Epoch: 12   Global Step: 504470   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:24:56,747-Speed 2633.53 samples/sec   Loss 5.2442   LearningRate 0.0154   Epoch: 12   Global Step: 504480   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:25:00,636-Speed 2633.59 samples/sec   Loss 5.2460   LearningRate 0.0154   Epoch: 12   Global Step: 504490   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:25:04,539-Speed 2624.75 samples/sec   Loss 5.1916   LearningRate 0.0154   Epoch: 12   Global Step: 504500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:08,431-Speed 2631.33 samples/sec   Loss 5.2777   LearningRate 0.0154   Epoch: 12   Global Step: 504510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:12,327-Speed 2628.75 samples/sec   Loss 5.1688   LearningRate 0.0154   Epoch: 12   Global Step: 504520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:16,223-Speed 2629.23 samples/sec   Loss 5.1669   LearningRate 0.0154   Epoch: 12   Global Step: 504530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:20,114-Speed 2632.50 samples/sec   Loss 5.1302   LearningRate 0.0154   Epoch: 12   Global Step: 504540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:24,021-Speed 2621.42 samples/sec   Loss 5.1763   LearningRate 0.0154   Epoch: 12   Global Step: 504550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:27,916-Speed 2629.41 samples/sec   Loss 5.1636   LearningRate 0.0153   Epoch: 12   Global Step: 504560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:31,813-Speed 2628.10 samples/sec   Loss 5.2430   LearningRate 0.0153   Epoch: 12   Global Step: 504570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:35,799-Speed 2569.60 samples/sec   Loss 5.2217   LearningRate 0.0153   Epoch: 12   Global Step: 504580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:39,694-Speed 2629.73 samples/sec   Loss 5.2311   LearningRate 0.0153   Epoch: 12   Global Step: 504590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-04-15 04:25:43,660-Speed 2587.34 samples/sec   Loss 5.2074   LearningRate 0.0153   Epoch: 12   Global Step: 504600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:25:47,747-Speed 2506.13 samples/sec   Loss 5.2032   LearningRate 0.0153   Epoch: 12   Global Step: 504610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:25:51,841-Speed 2502.22 samples/sec   Loss 5.2511   LearningRate 0.0153   Epoch: 12   Global Step: 504620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:25:55,794-Speed 2590.89 samples/sec   Loss 5.2766   LearningRate 0.0153   Epoch: 12   Global Step: 504630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-15 04:25:59,671-Speed 2641.46 samples/sec   Loss 5.2923   LearningRate 0.0153   Epoch: 12   Global Step: 504640   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:03,569-Speed 2627.57 samples/sec   Loss 5.3459   LearningRate 0.0153   Epoch: 12   Global Step: 504650   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:07,476-Speed 2621.33 samples/sec   Loss 5.1381   LearningRate 0.0153   Epoch: 12   Global Step: 504660   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:11,374-Speed 2627.92 samples/sec   Loss 5.2781   LearningRate 0.0153   Epoch: 12   Global Step: 504670   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:15,272-Speed 2627.45 samples/sec   Loss 5.1294   LearningRate 0.0153   Epoch: 12   Global Step: 504680   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:19,170-Speed 2627.85 samples/sec   Loss 5.3053   LearningRate 0.0153   Epoch: 12   Global Step: 504690   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:23,069-Speed 2626.87 samples/sec   Loss 5.1205   LearningRate 0.0153   Epoch: 12   Global Step: 504700   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:26,964-Speed 2630.06 samples/sec   Loss 5.2255   LearningRate 0.0153   Epoch: 12   Global Step: 504710   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:30,855-Speed 2632.50 samples/sec   Loss 5.2935   LearningRate 0.0153   Epoch: 12   Global Step: 504720   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-15 04:26:34,747-Speed 2631.20 samples/sec   Loss 5.0982   LearningRate 0.0153   Epoch: 12   Global Step: 504730   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:26:38,639-Speed 2631.35 samples/sec   Loss 5.2547   LearningRate 0.0153   Epoch: 12   Global Step: 504740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:26:42,532-Speed 2631.31 samples/sec   Loss 5.2575   LearningRate 0.0153   Epoch: 12   Global Step: 504750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:26:46,425-Speed 2630.62 samples/sec   Loss 5.3285   LearningRate 0.0153   Epoch: 12   Global Step: 504760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:26:50,318-Speed 2631.21 samples/sec   Loss 5.4366   LearningRate 0.0153   Epoch: 12   Global Step: 504770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:26:54,210-Speed 2631.48 samples/sec   Loss 5.1632   LearningRate 0.0153   Epoch: 12   Global Step: 504780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:26:58,101-Speed 2633.04 samples/sec   Loss 5.1768   LearningRate 0.0153   Epoch: 12   Global Step: 504790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:01,997-Speed 2628.87 samples/sec   Loss 5.2565   LearningRate 0.0153   Epoch: 12   Global Step: 504800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:05,894-Speed 2628.34 samples/sec   Loss 5.2847   LearningRate 0.0153   Epoch: 12   Global Step: 504810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:09,794-Speed 2625.73 samples/sec   Loss 5.2565   LearningRate 0.0153   Epoch: 12   Global Step: 504820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:13,689-Speed 2630.62 samples/sec   Loss 5.1299   LearningRate 0.0153   Epoch: 12   Global Step: 504830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:17,606-Speed 2614.68 samples/sec   Loss 5.3382   LearningRate 0.0153   Epoch: 12   Global Step: 504840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:27:21,504-Speed 2627.57 samples/sec   Loss 5.2804   LearningRate 0.0153   Epoch: 12   Global Step: 504850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:27:25,557-Speed 2527.05 samples/sec   Loss 5.2233   LearningRate 0.0153   Epoch: 12   Global Step: 504860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:27:29,471-Speed 2617.06 samples/sec   Loss 5.1419   LearningRate 0.0153   Epoch: 12   Global Step: 504870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:27:33,349-Speed 2641.01 samples/sec   Loss 5.2663   LearningRate 0.0153   Epoch: 12   Global Step: 504880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:37,342-Speed 2564.80 samples/sec   Loss 5.3150   LearningRate 0.0153   Epoch: 12   Global Step: 504890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:41,241-Speed 2626.96 samples/sec   Loss 5.2369   LearningRate 0.0153   Epoch: 12   Global Step: 504900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:45,140-Speed 2627.47 samples/sec   Loss 5.2214   LearningRate 0.0153   Epoch: 12   Global Step: 504910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:49,043-Speed 2623.85 samples/sec   Loss 5.2687   LearningRate 0.0153   Epoch: 12   Global Step: 504920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:52,951-Speed 2621.27 samples/sec   Loss 5.2236   LearningRate 0.0153   Epoch: 12   Global Step: 504930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:27:56,852-Speed 2625.74 samples/sec   Loss 5.2309   LearningRate 0.0153   Epoch: 12   Global Step: 504940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:28:00,745-Speed 2630.84 samples/sec   Loss 5.0182   LearningRate 0.0153   Epoch: 12   Global Step: 504950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:28:04,645-Speed 2626.17 samples/sec   Loss 5.1565   LearningRate 0.0153   Epoch: 12   Global Step: 504960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:28:08,544-Speed 2626.72 samples/sec   Loss 5.2568   LearningRate 0.0153   Epoch: 12   Global Step: 504970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:28:12,440-Speed 2629.14 samples/sec   Loss 5.2275   LearningRate 0.0153   Epoch: 12   Global Step: 504980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:16,330-Speed 2633.37 samples/sec   Loss 5.3118   LearningRate 0.0153   Epoch: 12   Global Step: 504990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:20,225-Speed 2630.24 samples/sec   Loss 5.2653   LearningRate 0.0153   Epoch: 12   Global Step: 505000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:24,118-Speed 2631.70 samples/sec   Loss 5.1642   LearningRate 0.0153   Epoch: 12   Global Step: 505010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:28,012-Speed 2630.26 samples/sec   Loss 5.1998   LearningRate 0.0153   Epoch: 12   Global Step: 505020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:31,919-Speed 2621.18 samples/sec   Loss 5.2177   LearningRate 0.0153   Epoch: 12   Global Step: 505030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:35,829-Speed 2619.84 samples/sec   Loss 5.2822   LearningRate 0.0153   Epoch: 12   Global Step: 505040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:39,730-Speed 2626.14 samples/sec   Loss 5.2717   LearningRate 0.0153   Epoch: 12   Global Step: 505050   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:43,636-Speed 2622.74 samples/sec   Loss 5.3102   LearningRate 0.0153   Epoch: 12   Global Step: 505060   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:47,534-Speed 2628.06 samples/sec   Loss 5.2753   LearningRate 0.0153   Epoch: 12   Global Step: 505070   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:51,430-Speed 2628.93 samples/sec   Loss 5.3157   LearningRate 0.0153   Epoch: 12   Global Step: 505080   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-04-15 04:28:55,308-Speed 2641.32 samples/sec   Loss 5.2752   LearningRate 0.0153   Epoch: 12   Global Step: 505090   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:28:59,203-Speed 2629.86 samples/sec   Loss 5.2541   LearningRate 0.0153   Epoch: 12   Global Step: 505100   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:03,101-Speed 2628.08 samples/sec   Loss 5.2787   LearningRate 0.0153   Epoch: 12   Global Step: 505110   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:07,021-Speed 2612.69 samples/sec   Loss 5.2041   LearningRate 0.0153   Epoch: 12   Global Step: 505120   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:10,919-Speed 2627.31 samples/sec   Loss 5.2674   LearningRate 0.0153   Epoch: 12   Global Step: 505130   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:14,818-Speed 2627.63 samples/sec   Loss 5.2587   LearningRate 0.0153   Epoch: 12   Global Step: 505140   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:18,717-Speed 2627.15 samples/sec   Loss 5.1914   LearningRate 0.0153   Epoch: 12   Global Step: 505150   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:22,613-Speed 2628.43 samples/sec   Loss 5.1997   LearningRate 0.0153   Epoch: 12   Global Step: 505160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:26,505-Speed 2631.58 samples/sec   Loss 5.2302   LearningRate 0.0153   Epoch: 12   Global Step: 505170   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:30,403-Speed 2627.77 samples/sec   Loss 5.1479   LearningRate 0.0153   Epoch: 12   Global Step: 505180   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:34,279-Speed 2642.63 samples/sec   Loss 5.2157   LearningRate 0.0153   Epoch: 12   Global Step: 505190   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:29:38,138-Speed 2654.42 samples/sec   Loss 5.2356   LearningRate 0.0153   Epoch: 12   Global Step: 505200   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:29:42,033-Speed 2630.11 samples/sec   Loss 5.2459   LearningRate 0.0153   Epoch: 12   Global Step: 505210   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:29:45,929-Speed 2629.02 samples/sec   Loss 5.1902   LearningRate 0.0153   Epoch: 12   Global Step: 505220   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:29:49,844-Speed 2615.82 samples/sec   Loss 5.1932   LearningRate 0.0153   Epoch: 12   Global Step: 505230   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:29:53,767-Speed 2610.93 samples/sec   Loss 5.1994   LearningRate 0.0153   Epoch: 12   Global Step: 505240   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:29:57,663-Speed 2629.80 samples/sec   Loss 5.3487   LearningRate 0.0153   Epoch: 12   Global Step: 505250   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:01,606-Speed 2597.44 samples/sec   Loss 5.2214   LearningRate 0.0153   Epoch: 12   Global Step: 505260   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:05,502-Speed 2629.12 samples/sec   Loss 5.2799   LearningRate 0.0153   Epoch: 12   Global Step: 505270   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:09,452-Speed 2592.78 samples/sec   Loss 5.2761   LearningRate 0.0153   Epoch: 12   Global Step: 505280   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:13,353-Speed 2626.40 samples/sec   Loss 5.2272   LearningRate 0.0153   Epoch: 12   Global Step: 505290   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:17,250-Speed 2628.11 samples/sec   Loss 5.1771   LearningRate 0.0153   Epoch: 12   Global Step: 505300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:30:21,151-Speed 2625.21 samples/sec   Loss 5.2610   LearningRate 0.0153   Epoch: 12   Global Step: 505310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:30:25,047-Speed 2629.26 samples/sec   Loss 5.2095   LearningRate 0.0153   Epoch: 12   Global Step: 505320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:30:28,942-Speed 2630.20 samples/sec   Loss 5.2747   LearningRate 0.0153   Epoch: 12   Global Step: 505330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:30:32,838-Speed 2628.48 samples/sec   Loss 5.1806   LearningRate 0.0153   Epoch: 12   Global Step: 505340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:30:36,713-Speed 2642.98 samples/sec   Loss 5.2640   LearningRate 0.0153   Epoch: 12   Global Step: 505350   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:40,620-Speed 2630.64 samples/sec   Loss 5.2174   LearningRate 0.0153   Epoch: 12   Global Step: 505360   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:44,535-Speed 2616.41 samples/sec   Loss 5.1667   LearningRate 0.0153   Epoch: 12   Global Step: 505370   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:48,459-Speed 2610.06 samples/sec   Loss 5.1875   LearningRate 0.0153   Epoch: 12   Global Step: 505380   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:52,362-Speed 2624.86 samples/sec   Loss 5.2427   LearningRate 0.0153   Epoch: 12   Global Step: 505390   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:30:56,264-Speed 2624.31 samples/sec   Loss 5.2296   LearningRate 0.0153   Epoch: 12   Global Step: 505400   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:31:00,163-Speed 2627.48 samples/sec   Loss 5.2692   LearningRate 0.0153   Epoch: 12   Global Step: 505410   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:31:04,056-Speed 2630.81 samples/sec   Loss 5.1854   LearningRate 0.0153   Epoch: 12   Global Step: 505420   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:31:07,952-Speed 2629.62 samples/sec   Loss 5.2084   LearningRate 0.0153   Epoch: 12   Global Step: 505430   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:31:11,847-Speed 2629.43 samples/sec   Loss 5.2481   LearningRate 0.0153   Epoch: 12   Global Step: 505440   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:31:15,742-Speed 2629.86 samples/sec   Loss 5.2121   LearningRate 0.0153   Epoch: 12   Global Step: 505450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:19,651-Speed 2620.54 samples/sec   Loss 5.1585   LearningRate 0.0153   Epoch: 12   Global Step: 505460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:23,561-Speed 2619.79 samples/sec   Loss 5.3079   LearningRate 0.0153   Epoch: 12   Global Step: 505470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:27,482-Speed 2612.71 samples/sec   Loss 5.3541   LearningRate 0.0153   Epoch: 12   Global Step: 505480   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:31,380-Speed 2627.31 samples/sec   Loss 5.2971   LearningRate 0.0153   Epoch: 12   Global Step: 505490   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:35,392-Speed 2553.00 samples/sec   Loss 5.2154   LearningRate 0.0153   Epoch: 12   Global Step: 505500   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:39,300-Speed 2620.74 samples/sec   Loss 5.2121   LearningRate 0.0153   Epoch: 12   Global Step: 505510   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:43,198-Speed 2628.41 samples/sec   Loss 5.2664   LearningRate 0.0153   Epoch: 12   Global Step: 505520   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:47,108-Speed 2619.39 samples/sec   Loss 5.2887   LearningRate 0.0153   Epoch: 12   Global Step: 505530   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:51,005-Speed 2628.71 samples/sec   Loss 5.3047   LearningRate 0.0153   Epoch: 12   Global Step: 505540   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:31:54,903-Speed 2628.10 samples/sec   Loss 5.3225   LearningRate 0.0153   Epoch: 12   Global Step: 505550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:31:58,802-Speed 2627.36 samples/sec   Loss 5.2050   LearningRate 0.0153   Epoch: 12   Global Step: 505560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:32:02,703-Speed 2625.10 samples/sec   Loss 5.2794   LearningRate 0.0153   Epoch: 12   Global Step: 505570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:32:06,601-Speed 2627.92 samples/sec   Loss 5.2650   LearningRate 0.0153   Epoch: 12   Global Step: 505580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:32:10,488-Speed 2634.93 samples/sec   Loss 5.2540   LearningRate 0.0153   Epoch: 12   Global Step: 505590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:14,394-Speed 2622.89 samples/sec   Loss 5.3149   LearningRate 0.0153   Epoch: 12   Global Step: 505600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:18,292-Speed 2627.78 samples/sec   Loss 5.1920   LearningRate 0.0153   Epoch: 12   Global Step: 505610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:22,217-Speed 2610.02 samples/sec   Loss 5.2546   LearningRate 0.0152   Epoch: 12   Global Step: 505620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:26,122-Speed 2622.50 samples/sec   Loss 5.1907   LearningRate 0.0152   Epoch: 12   Global Step: 505630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:30,051-Speed 2607.65 samples/sec   Loss 5.3320   LearningRate 0.0152   Epoch: 12   Global Step: 505640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:33,942-Speed 2632.15 samples/sec   Loss 5.2071   LearningRate 0.0152   Epoch: 12   Global Step: 505650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:37,836-Speed 2630.14 samples/sec   Loss 5.2064   LearningRate 0.0152   Epoch: 12   Global Step: 505660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:41,732-Speed 2628.59 samples/sec   Loss 5.2381   LearningRate 0.0152   Epoch: 12   Global Step: 505670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:45,626-Speed 2633.64 samples/sec   Loss 5.2383   LearningRate 0.0152   Epoch: 12   Global Step: 505680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:32:49,520-Speed 2631.83 samples/sec   Loss 5.2876   LearningRate 0.0152   Epoch: 12   Global Step: 505690   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:32:53,438-Speed 2614.58 samples/sec   Loss 5.2026   LearningRate 0.0152   Epoch: 12   Global Step: 505700   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:32:57,351-Speed 2617.59 samples/sec   Loss 5.2776   LearningRate 0.0152   Epoch: 12   Global Step: 505710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:01,252-Speed 2626.17 samples/sec   Loss 5.2335   LearningRate 0.0152   Epoch: 12   Global Step: 505720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:05,199-Speed 2594.32 samples/sec   Loss 5.2187   LearningRate 0.0152   Epoch: 12   Global Step: 505730   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:09,095-Speed 2629.16 samples/sec   Loss 5.1715   LearningRate 0.0152   Epoch: 12   Global Step: 505740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:12,994-Speed 2627.20 samples/sec   Loss 5.2332   LearningRate 0.0152   Epoch: 12   Global Step: 505750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:16,890-Speed 2629.38 samples/sec   Loss 5.2494   LearningRate 0.0152   Epoch: 12   Global Step: 505760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:20,783-Speed 2630.98 samples/sec   Loss 5.2477   LearningRate 0.0152   Epoch: 12   Global Step: 505770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:24,690-Speed 2621.91 samples/sec   Loss 5.3436   LearningRate 0.0152   Epoch: 12   Global Step: 505780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:28,585-Speed 2630.11 samples/sec   Loss 5.1702   LearningRate 0.0152   Epoch: 12   Global Step: 505790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:32,480-Speed 2629.83 samples/sec   Loss 5.2662   LearningRate 0.0152   Epoch: 12   Global Step: 505800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:33:36,375-Speed 2629.55 samples/sec   Loss 5.2902   LearningRate 0.0152   Epoch: 12   Global Step: 505810   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:33:40,305-Speed 2606.08 samples/sec   Loss 5.1604   LearningRate 0.0152   Epoch: 12   Global Step: 505820   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:33:44,220-Speed 2616.23 samples/sec   Loss 5.3167   LearningRate 0.0152   Epoch: 12   Global Step: 505830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:33:48,141-Speed 2612.71 samples/sec   Loss 5.1127   LearningRate 0.0152   Epoch: 12   Global Step: 505840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:33:52,032-Speed 2632.05 samples/sec   Loss 5.2418   LearningRate 0.0152   Epoch: 12   Global Step: 505850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:33:55,931-Speed 2626.96 samples/sec   Loss 5.2206   LearningRate 0.0152   Epoch: 12   Global Step: 505860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:33:59,829-Speed 2627.94 samples/sec   Loss 5.2920   LearningRate 0.0152   Epoch: 12   Global Step: 505870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:34:03,726-Speed 2627.94 samples/sec   Loss 5.2856   LearningRate 0.0152   Epoch: 12   Global Step: 505880   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:34:07,595-Speed 2647.76 samples/sec   Loss 5.2460   LearningRate 0.0152   Epoch: 12   Global Step: 505890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:11,489-Speed 2630.38 samples/sec   Loss 5.1202   LearningRate 0.0152   Epoch: 12   Global Step: 505900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:15,385-Speed 2629.08 samples/sec   Loss 5.2432   LearningRate 0.0152   Epoch: 12   Global Step: 505910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:19,280-Speed 2629.51 samples/sec   Loss 5.1759   LearningRate 0.0152   Epoch: 12   Global Step: 505920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:23,175-Speed 2629.46 samples/sec   Loss 5.2562   LearningRate 0.0152   Epoch: 12   Global Step: 505930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:27,078-Speed 2623.55 samples/sec   Loss 5.3507   LearningRate 0.0152   Epoch: 12   Global Step: 505940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:30,983-Speed 2623.18 samples/sec   Loss 5.2151   LearningRate 0.0152   Epoch: 12   Global Step: 505950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:34,879-Speed 2629.29 samples/sec   Loss 5.3045   LearningRate 0.0152   Epoch: 12   Global Step: 505960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:38,782-Speed 2624.52 samples/sec   Loss 5.2133   LearningRate 0.0152   Epoch: 12   Global Step: 505970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:42,679-Speed 2628.87 samples/sec   Loss 5.1919   LearningRate 0.0152   Epoch: 12   Global Step: 505980   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:46,574-Speed 2629.02 samples/sec   Loss 5.2329   LearningRate 0.0152   Epoch: 12   Global Step: 505990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:34:50,445-Speed 2646.30 samples/sec   Loss 5.2225   LearningRate 0.0152   Epoch: 12   Global Step: 506000   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:54,346-Speed 2625.34 samples/sec   Loss 5.1881   LearningRate 0.0152   Epoch: 12   Global Step: 506010   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:34:58,240-Speed 2630.37 samples/sec   Loss 5.2383   LearningRate 0.0152   Epoch: 12   Global Step: 506020   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:02,149-Speed 2619.43 samples/sec   Loss 5.2191   LearningRate 0.0152   Epoch: 12   Global Step: 506030   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:06,049-Speed 2626.83 samples/sec   Loss 5.1355   LearningRate 0.0152   Epoch: 12   Global Step: 506040   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:09,943-Speed 2630.25 samples/sec   Loss 5.2224   LearningRate 0.0152   Epoch: 12   Global Step: 506050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:13,838-Speed 2630.00 samples/sec   Loss 5.3063   LearningRate 0.0152   Epoch: 12   Global Step: 506060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:17,733-Speed 2629.96 samples/sec   Loss 5.2003   LearningRate 0.0152   Epoch: 12   Global Step: 506070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:21,639-Speed 2622.07 samples/sec   Loss 5.2284   LearningRate 0.0152   Epoch: 12   Global Step: 506080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:25,535-Speed 2628.61 samples/sec   Loss 5.2285   LearningRate 0.0152   Epoch: 12   Global Step: 506090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:29,422-Speed 2635.07 samples/sec   Loss 5.2927   LearningRate 0.0152   Epoch: 12   Global Step: 506100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:33,314-Speed 2631.49 samples/sec   Loss 5.2454   LearningRate 0.0152   Epoch: 12   Global Step: 506110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:37,221-Speed 2621.97 samples/sec   Loss 5.1162   LearningRate 0.0152   Epoch: 12   Global Step: 506120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:41,118-Speed 2627.90 samples/sec   Loss 5.2234   LearningRate 0.0152   Epoch: 12   Global Step: 506130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:45,016-Speed 2627.61 samples/sec   Loss 5.2447   LearningRate 0.0152   Epoch: 12   Global Step: 506140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:48,909-Speed 2631.47 samples/sec   Loss 5.1657   LearningRate 0.0152   Epoch: 12   Global Step: 506150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:52,816-Speed 2621.71 samples/sec   Loss 5.2206   LearningRate 0.0152   Epoch: 12   Global Step: 506160   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:35:56,716-Speed 2625.98 samples/sec   Loss 5.2075   LearningRate 0.0152   Epoch: 12   Global Step: 506170   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:36:00,616-Speed 2626.11 samples/sec   Loss 5.1994   LearningRate 0.0152   Epoch: 12   Global Step: 506180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:36:04,525-Speed 2620.33 samples/sec   Loss 5.2672   LearningRate 0.0152   Epoch: 12   Global Step: 506190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:36:08,435-Speed 2619.24 samples/sec   Loss 5.1941   LearningRate 0.0152   Epoch: 12   Global Step: 506200   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:12,340-Speed 2623.57 samples/sec   Loss 5.3077   LearningRate 0.0152   Epoch: 12   Global Step: 506210   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:16,247-Speed 2620.87 samples/sec   Loss 5.2239   LearningRate 0.0152   Epoch: 12   Global Step: 506220   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:20,153-Speed 2622.58 samples/sec   Loss 5.1846   LearningRate 0.0152   Epoch: 12   Global Step: 506230   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:24,054-Speed 2625.18 samples/sec   Loss 5.2754   LearningRate 0.0152   Epoch: 12   Global Step: 506240   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:27,950-Speed 2628.89 samples/sec   Loss 5.1979   LearningRate 0.0152   Epoch: 12   Global Step: 506250   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:31,845-Speed 2629.82 samples/sec   Loss 5.2748   LearningRate 0.0152   Epoch: 12   Global Step: 506260   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:35,738-Speed 2631.61 samples/sec   Loss 5.2169   LearningRate 0.0152   Epoch: 12   Global Step: 506270   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:39,637-Speed 2627.27 samples/sec   Loss 5.2065   LearningRate 0.0152   Epoch: 12   Global Step: 506280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:43,532-Speed 2629.55 samples/sec   Loss 5.2113   LearningRate 0.0152   Epoch: 12   Global Step: 506290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:47,419-Speed 2635.68 samples/sec   Loss 5.1461   LearningRate 0.0152   Epoch: 12   Global Step: 506300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:51,324-Speed 2622.63 samples/sec   Loss 5.2259   LearningRate 0.0152   Epoch: 12   Global Step: 506310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:55,219-Speed 2629.83 samples/sec   Loss 5.2721   LearningRate 0.0152   Epoch: 12   Global Step: 506320   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:36:59,121-Speed 2624.47 samples/sec   Loss 5.1825   LearningRate 0.0152   Epoch: 12   Global Step: 506330   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:37:03,032-Speed 2618.65 samples/sec   Loss 5.1714   LearningRate 0.0152   Epoch: 12   Global Step: 506340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:37:06,929-Speed 2628.63 samples/sec   Loss 5.2060   LearningRate 0.0152   Epoch: 12   Global Step: 506350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:37:10,830-Speed 2626.19 samples/sec   Loss 5.3052   LearningRate 0.0152   Epoch: 12   Global Step: 506360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:37:14,719-Speed 2633.41 samples/sec   Loss 5.2193   LearningRate 0.0152   Epoch: 12   Global Step: 506370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:18,629-Speed 2619.51 samples/sec   Loss 5.2777   LearningRate 0.0152   Epoch: 12   Global Step: 506380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:22,557-Speed 2607.78 samples/sec   Loss 5.2823   LearningRate 0.0152   Epoch: 12   Global Step: 506390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:26,468-Speed 2618.96 samples/sec   Loss 5.3134   LearningRate 0.0152   Epoch: 12   Global Step: 506400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:30,363-Speed 2629.63 samples/sec   Loss 5.1615   LearningRate 0.0152   Epoch: 12   Global Step: 506410   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:34,276-Speed 2617.18 samples/sec   Loss 5.2499   LearningRate 0.0152   Epoch: 12   Global Step: 506420   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:38,184-Speed 2620.64 samples/sec   Loss 5.3783   LearningRate 0.0152   Epoch: 12   Global Step: 506430   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:42,078-Speed 2630.06 samples/sec   Loss 5.2744   LearningRate 0.0152   Epoch: 12   Global Step: 506440   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:45,976-Speed 2628.61 samples/sec   Loss 5.1946   LearningRate 0.0152   Epoch: 12   Global Step: 506450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:49,871-Speed 2629.58 samples/sec   Loss 5.2457   LearningRate 0.0152   Epoch: 12   Global Step: 506460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:37:53,770-Speed 2626.97 samples/sec   Loss 5.2527   LearningRate 0.0152   Epoch: 12   Global Step: 506470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:37:57,673-Speed 2626.54 samples/sec   Loss 5.3108   LearningRate 0.0152   Epoch: 12   Global Step: 506480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:01,568-Speed 2629.27 samples/sec   Loss 5.2572   LearningRate 0.0152   Epoch: 12   Global Step: 506490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:05,465-Speed 2627.92 samples/sec   Loss 5.2146   LearningRate 0.0152   Epoch: 12   Global Step: 506500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:09,361-Speed 2629.34 samples/sec   Loss 5.2208   LearningRate 0.0152   Epoch: 12   Global Step: 506510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:13,262-Speed 2625.45 samples/sec   Loss 5.2410   LearningRate 0.0152   Epoch: 12   Global Step: 506520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:17,161-Speed 2626.80 samples/sec   Loss 5.2872   LearningRate 0.0152   Epoch: 12   Global Step: 506530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:21,055-Speed 2630.51 samples/sec   Loss 5.1286   LearningRate 0.0152   Epoch: 12   Global Step: 506540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:24,950-Speed 2629.60 samples/sec   Loss 5.2147   LearningRate 0.0152   Epoch: 12   Global Step: 506550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:28,850-Speed 2626.55 samples/sec   Loss 5.2606   LearningRate 0.0152   Epoch: 12   Global Step: 506560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:32,770-Speed 2612.70 samples/sec   Loss 5.1395   LearningRate 0.0152   Epoch: 12   Global Step: 506570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:36,683-Speed 2617.17 samples/sec   Loss 5.1754   LearningRate 0.0152   Epoch: 12   Global Step: 506580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:38:40,559-Speed 2642.72 samples/sec   Loss 5.1680   LearningRate 0.0152   Epoch: 12   Global Step: 506590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:38:44,456-Speed 2628.63 samples/sec   Loss 5.2027   LearningRate 0.0152   Epoch: 12   Global Step: 506600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:38:48,353-Speed 2627.83 samples/sec   Loss 5.2441   LearningRate 0.0152   Epoch: 12   Global Step: 506610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:38:52,273-Speed 2613.24 samples/sec   Loss 5.2380   LearningRate 0.0152   Epoch: 12   Global Step: 506620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:38:56,171-Speed 2627.53 samples/sec   Loss 5.2402   LearningRate 0.0152   Epoch: 12   Global Step: 506630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:00,070-Speed 2626.69 samples/sec   Loss 5.2122   LearningRate 0.0152   Epoch: 12   Global Step: 506640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:03,967-Speed 2628.80 samples/sec   Loss 5.2437   LearningRate 0.0152   Epoch: 12   Global Step: 506650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:07,860-Speed 2630.39 samples/sec   Loss 5.2455   LearningRate 0.0152   Epoch: 12   Global Step: 506660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:11,754-Speed 2630.61 samples/sec   Loss 5.2809   LearningRate 0.0152   Epoch: 12   Global Step: 506670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:15,650-Speed 2628.99 samples/sec   Loss 5.2750   LearningRate 0.0152   Epoch: 12   Global Step: 506680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:19,543-Speed 2630.51 samples/sec   Loss 5.1582   LearningRate 0.0151   Epoch: 12   Global Step: 506690   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:39:23,440-Speed 2628.40 samples/sec   Loss 5.1801   LearningRate 0.0151   Epoch: 12   Global Step: 506700   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:39:27,338-Speed 2627.83 samples/sec   Loss 5.2093   LearningRate 0.0151   Epoch: 12   Global Step: 506710   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:39:31,242-Speed 2623.83 samples/sec   Loss 5.1306   LearningRate 0.0151   Epoch: 12   Global Step: 506720   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:39:35,133-Speed 2632.13 samples/sec   Loss 5.2315   LearningRate 0.0151   Epoch: 12   Global Step: 506730   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:39,032-Speed 2626.26 samples/sec   Loss 5.3317   LearningRate 0.0151   Epoch: 12   Global Step: 506740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:42,941-Speed 2620.17 samples/sec   Loss 5.2613   LearningRate 0.0151   Epoch: 12   Global Step: 506750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:46,853-Speed 2618.90 samples/sec   Loss 5.2376   LearningRate 0.0151   Epoch: 12   Global Step: 506760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:50,756-Speed 2624.21 samples/sec   Loss 5.1989   LearningRate 0.0151   Epoch: 12   Global Step: 506770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:54,657-Speed 2625.31 samples/sec   Loss 5.1269   LearningRate 0.0151   Epoch: 12   Global Step: 506780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:39:58,531-Speed 2644.26 samples/sec   Loss 5.3681   LearningRate 0.0151   Epoch: 12   Global Step: 506790   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:02,434-Speed 2624.39 samples/sec   Loss 5.1209   LearningRate 0.0151   Epoch: 12   Global Step: 506800   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:06,339-Speed 2622.41 samples/sec   Loss 5.2289   LearningRate 0.0151   Epoch: 12   Global Step: 506810   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:10,236-Speed 2628.47 samples/sec   Loss 5.1188   LearningRate 0.0151   Epoch: 12   Global Step: 506820   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:14,131-Speed 2629.64 samples/sec   Loss 5.2270   LearningRate 0.0151   Epoch: 12   Global Step: 506830   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:18,041-Speed 2619.51 samples/sec   Loss 5.1440   LearningRate 0.0151   Epoch: 12   Global Step: 506840   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:21,973-Speed 2604.98 samples/sec   Loss 5.2405   LearningRate 0.0151   Epoch: 12   Global Step: 506850   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:25,866-Speed 2631.23 samples/sec   Loss 5.1845   LearningRate 0.0151   Epoch: 12   Global Step: 506860   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:29,764-Speed 2627.34 samples/sec   Loss 5.2771   LearningRate 0.0151   Epoch: 12   Global Step: 506870   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:33,655-Speed 2632.57 samples/sec   Loss 5.1615   LearningRate 0.0151   Epoch: 12   Global Step: 506880   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:40:37,546-Speed 2631.94 samples/sec   Loss 5.1936   LearningRate 0.0151   Epoch: 12   Global Step: 506890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:40:41,440-Speed 2630.26 samples/sec   Loss 5.1915   LearningRate 0.0151   Epoch: 12   Global Step: 506900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:40:45,336-Speed 2629.27 samples/sec   Loss 5.2346   LearningRate 0.0151   Epoch: 12   Global Step: 506910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:40:49,236-Speed 2626.50 samples/sec   Loss 5.1981   LearningRate 0.0151   Epoch: 12   Global Step: 506920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:40:53,151-Speed 2615.87 samples/sec   Loss 5.2229   LearningRate 0.0151   Epoch: 12   Global Step: 506930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:40:57,041-Speed 2632.96 samples/sec   Loss 5.1960   LearningRate 0.0151   Epoch: 12   Global Step: 506940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:41:00,948-Speed 2621.90 samples/sec   Loss 5.1520   LearningRate 0.0151   Epoch: 12   Global Step: 506950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:41:04,847-Speed 2626.67 samples/sec   Loss 5.2055   LearningRate 0.0151   Epoch: 12   Global Step: 506960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:41:08,759-Speed 2618.61 samples/sec   Loss 5.2334   LearningRate 0.0151   Epoch: 12   Global Step: 506970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:41:12,654-Speed 2629.23 samples/sec   Loss 5.2434   LearningRate 0.0151   Epoch: 12   Global Step: 506980   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:41:16,528-Speed 2643.94 samples/sec   Loss 5.2055   LearningRate 0.0151   Epoch: 12   Global Step: 506990   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:20,434-Speed 2623.18 samples/sec   Loss 5.2247   LearningRate 0.0151   Epoch: 12   Global Step: 507000   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:24,333-Speed 2626.84 samples/sec   Loss 5.2004   LearningRate 0.0151   Epoch: 12   Global Step: 507010   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:28,230-Speed 2627.94 samples/sec   Loss 5.2553   LearningRate 0.0151   Epoch: 12   Global Step: 507020   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:32,162-Speed 2604.93 samples/sec   Loss 5.1535   LearningRate 0.0151   Epoch: 12   Global Step: 507030   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:36,153-Speed 2566.33 samples/sec   Loss 5.1768   LearningRate 0.0151   Epoch: 12   Global Step: 507040   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:40,049-Speed 2629.33 samples/sec   Loss 5.1483   LearningRate 0.0151   Epoch: 12   Global Step: 507050   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:43,957-Speed 2621.15 samples/sec   Loss 5.2212   LearningRate 0.0151   Epoch: 12   Global Step: 507060   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:47,887-Speed 2605.91 samples/sec   Loss 5.1449   LearningRate 0.0151   Epoch: 12   Global Step: 507070   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:51,783-Speed 2629.90 samples/sec   Loss 5.2571   LearningRate 0.0151   Epoch: 12   Global Step: 507080   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:41:55,680-Speed 2628.16 samples/sec   Loss 5.1600   LearningRate 0.0151   Epoch: 12   Global Step: 507090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:41:59,586-Speed 2621.56 samples/sec   Loss 5.1871   LearningRate 0.0151   Epoch: 12   Global Step: 507100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:03,479-Speed 2631.01 samples/sec   Loss 5.1720   LearningRate 0.0151   Epoch: 12   Global Step: 507110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:07,408-Speed 2607.39 samples/sec   Loss 5.2162   LearningRate 0.0151   Epoch: 12   Global Step: 507120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:11,353-Speed 2596.64 samples/sec   Loss 5.1592   LearningRate 0.0151   Epoch: 12   Global Step: 507130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:15,243-Speed 2632.29 samples/sec   Loss 5.1597   LearningRate 0.0151   Epoch: 12   Global Step: 507140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:19,158-Speed 2617.01 samples/sec   Loss 5.2103   LearningRate 0.0151   Epoch: 12   Global Step: 507150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:23,057-Speed 2626.11 samples/sec   Loss 5.2256   LearningRate 0.0151   Epoch: 12   Global Step: 507160   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:26,954-Speed 2628.99 samples/sec   Loss 5.2376   LearningRate 0.0151   Epoch: 12   Global Step: 507170   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:30,845-Speed 2632.25 samples/sec   Loss 5.1728   LearningRate 0.0151   Epoch: 12   Global Step: 507180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:34,750-Speed 2623.08 samples/sec   Loss 5.1793   LearningRate 0.0151   Epoch: 12   Global Step: 507190   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:42:38,649-Speed 2626.90 samples/sec   Loss 5.2947   LearningRate 0.0151   Epoch: 12   Global Step: 507200   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:42:42,545-Speed 2628.45 samples/sec   Loss 5.3773   LearningRate 0.0151   Epoch: 12   Global Step: 507210   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:42:46,456-Speed 2619.45 samples/sec   Loss 5.1795   LearningRate 0.0151   Epoch: 12   Global Step: 507220   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:42:50,330-Speed 2643.62 samples/sec   Loss 5.1931   LearningRate 0.0151   Epoch: 12   Global Step: 507230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:54,237-Speed 2621.49 samples/sec   Loss 5.2939   LearningRate 0.0151   Epoch: 12   Global Step: 507240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:42:58,152-Speed 2616.61 samples/sec   Loss 5.1681   LearningRate 0.0151   Epoch: 12   Global Step: 507250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:02,054-Speed 2624.41 samples/sec   Loss 5.2206   LearningRate 0.0151   Epoch: 12   Global Step: 507260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:05,950-Speed 2629.23 samples/sec   Loss 5.2301   LearningRate 0.0151   Epoch: 12   Global Step: 507270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:09,842-Speed 2631.96 samples/sec   Loss 5.2214   LearningRate 0.0151   Epoch: 12   Global Step: 507280   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:13,738-Speed 2628.58 samples/sec   Loss 5.2606   LearningRate 0.0151   Epoch: 12   Global Step: 507290   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:17,636-Speed 2627.57 samples/sec   Loss 5.1187   LearningRate 0.0151   Epoch: 12   Global Step: 507300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:21,537-Speed 2625.54 samples/sec   Loss 5.2003   LearningRate 0.0151   Epoch: 12   Global Step: 507310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:25,441-Speed 2623.73 samples/sec   Loss 5.2510   LearningRate 0.0151   Epoch: 12   Global Step: 507320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:43:29,339-Speed 2627.88 samples/sec   Loss 5.0912   LearningRate 0.0151   Epoch: 12   Global Step: 507330   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:43:33,233-Speed 2630.31 samples/sec   Loss 5.1106   LearningRate 0.0151   Epoch: 12   Global Step: 507340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:43:37,128-Speed 2629.84 samples/sec   Loss 5.2668   LearningRate 0.0151   Epoch: 12   Global Step: 507350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:43:41,020-Speed 2631.56 samples/sec   Loss 5.2403   LearningRate 0.0151   Epoch: 12   Global Step: 507360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:43:44,957-Speed 2601.60 samples/sec   Loss 5.1763   LearningRate 0.0151   Epoch: 12   Global Step: 507370   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:43:48,870-Speed 2617.71 samples/sec   Loss 5.2402   LearningRate 0.0151   Epoch: 12   Global Step: 507380   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:43:52,789-Speed 2613.99 samples/sec   Loss 5.2057   LearningRate 0.0151   Epoch: 12   Global Step: 507390   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:43:56,680-Speed 2632.13 samples/sec   Loss 5.2572   LearningRate 0.0151   Epoch: 12   Global Step: 507400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:00,573-Speed 2631.59 samples/sec   Loss 5.1663   LearningRate 0.0151   Epoch: 12   Global Step: 507410   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:04,473-Speed 2626.65 samples/sec   Loss 5.2241   LearningRate 0.0151   Epoch: 12   Global Step: 507420   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:08,385-Speed 2618.21 samples/sec   Loss 5.2492   LearningRate 0.0151   Epoch: 12   Global Step: 507430   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:12,292-Speed 2621.46 samples/sec   Loss 5.2853   LearningRate 0.0151   Epoch: 12   Global Step: 507440   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:16,187-Speed 2630.06 samples/sec   Loss 5.1898   LearningRate 0.0151   Epoch: 12   Global Step: 507450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:20,102-Speed 2615.64 samples/sec   Loss 5.1621   LearningRate 0.0151   Epoch: 12   Global Step: 507460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:24,011-Speed 2620.73 samples/sec   Loss 5.2876   LearningRate 0.0151   Epoch: 12   Global Step: 507470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:27,921-Speed 2619.42 samples/sec   Loss 5.2551   LearningRate 0.0151   Epoch: 12   Global Step: 507480   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:31,995-Speed 2514.25 samples/sec   Loss 5.2644   LearningRate 0.0151   Epoch: 12   Global Step: 507490   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:44:35,893-Speed 2627.46 samples/sec   Loss 5.1579   LearningRate 0.0151   Epoch: 12   Global Step: 507500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:44:39,788-Speed 2629.71 samples/sec   Loss 5.2428   LearningRate 0.0151   Epoch: 12   Global Step: 507510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:44:43,708-Speed 2612.54 samples/sec   Loss 5.2470   LearningRate 0.0151   Epoch: 12   Global Step: 507520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:44:47,618-Speed 2619.11 samples/sec   Loss 5.1091   LearningRate 0.0151   Epoch: 12   Global Step: 507530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:44:51,531-Speed 2629.72 samples/sec   Loss 5.2434   LearningRate 0.0151   Epoch: 12   Global Step: 507540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:44:55,442-Speed 2618.76 samples/sec   Loss 5.2234   LearningRate 0.0151   Epoch: 12   Global Step: 507550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:44:59,342-Speed 2626.61 samples/sec   Loss 5.2668   LearningRate 0.0151   Epoch: 12   Global Step: 507560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:45:03,240-Speed 2627.44 samples/sec   Loss 5.1987   LearningRate 0.0151   Epoch: 12   Global Step: 507570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:45:07,135-Speed 2629.62 samples/sec   Loss 5.1178   LearningRate 0.0151   Epoch: 12   Global Step: 507580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:45:11,035-Speed 2626.31 samples/sec   Loss 5.1384   LearningRate 0.0151   Epoch: 12   Global Step: 507590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:45:14,928-Speed 2631.17 samples/sec   Loss 5.2070   LearningRate 0.0151   Epoch: 12   Global Step: 507600   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-04-15 04:45:18,815-Speed 2634.61 samples/sec   Loss 5.3264   LearningRate 0.0151   Epoch: 12   Global Step: 507610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:45:22,714-Speed 2627.34 samples/sec   Loss 5.1999   LearningRate 0.0151   Epoch: 12   Global Step: 507620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:45:26,582-Speed 2648.36 samples/sec   Loss 5.1899   LearningRate 0.0151   Epoch: 12   Global Step: 507630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:30,483-Speed 2625.81 samples/sec   Loss 5.1804   LearningRate 0.0151   Epoch: 12   Global Step: 507640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:34,374-Speed 2631.93 samples/sec   Loss 5.1055   LearningRate 0.0151   Epoch: 12   Global Step: 507650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:38,266-Speed 2631.90 samples/sec   Loss 5.2152   LearningRate 0.0151   Epoch: 12   Global Step: 507660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:42,175-Speed 2620.24 samples/sec   Loss 5.1897   LearningRate 0.0151   Epoch: 12   Global Step: 507670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:46,074-Speed 2627.73 samples/sec   Loss 5.1786   LearningRate 0.0151   Epoch: 12   Global Step: 507680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:49,968-Speed 2629.58 samples/sec   Loss 5.2390   LearningRate 0.0151   Epoch: 12   Global Step: 507690   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:53,879-Speed 2619.29 samples/sec   Loss 5.2049   LearningRate 0.0151   Epoch: 12   Global Step: 507700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:45:57,785-Speed 2622.51 samples/sec   Loss 5.3351   LearningRate 0.0151   Epoch: 12   Global Step: 507710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:46:01,681-Speed 2629.04 samples/sec   Loss 5.2811   LearningRate 0.0151   Epoch: 12   Global Step: 507720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:46:05,586-Speed 2622.93 samples/sec   Loss 5.1898   LearningRate 0.0151   Epoch: 12   Global Step: 507730   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:09,492-Speed 2622.33 samples/sec   Loss 5.1911   LearningRate 0.0151   Epoch: 12   Global Step: 507740   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:13,402-Speed 2619.17 samples/sec   Loss 5.1678   LearningRate 0.0151   Epoch: 12   Global Step: 507750   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:17,297-Speed 2629.77 samples/sec   Loss 5.2610   LearningRate 0.0150   Epoch: 12   Global Step: 507760   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:21,222-Speed 2610.54 samples/sec   Loss 5.1020   LearningRate 0.0150   Epoch: 12   Global Step: 507770   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:25,125-Speed 2624.21 samples/sec   Loss 5.1993   LearningRate 0.0150   Epoch: 12   Global Step: 507780   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:29,025-Speed 2625.69 samples/sec   Loss 5.2060   LearningRate 0.0150   Epoch: 12   Global Step: 507790   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:32,962-Speed 2601.65 samples/sec   Loss 5.2344   LearningRate 0.0150   Epoch: 12   Global Step: 507800   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:36,857-Speed 2629.98 samples/sec   Loss 5.2073   LearningRate 0.0150   Epoch: 12   Global Step: 507810   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:40,748-Speed 2632.96 samples/sec   Loss 5.2859   LearningRate 0.0150   Epoch: 12   Global Step: 507820   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:44,619-Speed 2645.67 samples/sec   Loss 5.2762   LearningRate 0.0150   Epoch: 12   Global Step: 507830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:48,616-Speed 2563.30 samples/sec   Loss 5.2612   LearningRate 0.0150   Epoch: 12   Global Step: 507840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:52,511-Speed 2629.53 samples/sec   Loss 5.1431   LearningRate 0.0150   Epoch: 12   Global Step: 507850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:46:56,406-Speed 2630.07 samples/sec   Loss 5.2628   LearningRate 0.0150   Epoch: 12   Global Step: 507860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:47:00,313-Speed 2621.79 samples/sec   Loss 5.1182   LearningRate 0.0150   Epoch: 12   Global Step: 507870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:47:04,185-Speed 2645.05 samples/sec   Loss 5.1887   LearningRate 0.0150   Epoch: 12   Global Step: 507880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:08,083-Speed 2627.73 samples/sec   Loss 5.1641   LearningRate 0.0150   Epoch: 12   Global Step: 507890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:12,182-Speed 2498.70 samples/sec   Loss 5.2118   LearningRate 0.0150   Epoch: 12   Global Step: 507900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:16,079-Speed 2628.63 samples/sec   Loss 5.2025   LearningRate 0.0150   Epoch: 12   Global Step: 507910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:19,982-Speed 2624.48 samples/sec   Loss 5.1827   LearningRate 0.0150   Epoch: 12   Global Step: 507920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:23,879-Speed 2628.46 samples/sec   Loss 5.2032   LearningRate 0.0150   Epoch: 12   Global Step: 507930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:27,779-Speed 2626.44 samples/sec   Loss 5.1613   LearningRate 0.0150   Epoch: 12   Global Step: 507940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:31,699-Speed 2612.51 samples/sec   Loss 5.2382   LearningRate 0.0150   Epoch: 12   Global Step: 507950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:35,680-Speed 2572.90 samples/sec   Loss 5.2327   LearningRate 0.0150   Epoch: 12   Global Step: 507960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:39,593-Speed 2617.73 samples/sec   Loss 5.2488   LearningRate 0.0150   Epoch: 12   Global Step: 507970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:47:43,485-Speed 2631.18 samples/sec   Loss 5.1589   LearningRate 0.0150   Epoch: 12   Global Step: 507980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:47:47,382-Speed 2628.93 samples/sec   Loss 5.1650   LearningRate 0.0150   Epoch: 12   Global Step: 507990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:47:51,284-Speed 2625.08 samples/sec   Loss 5.1629   LearningRate 0.0150   Epoch: 12   Global Step: 508000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:47:55,179-Speed 2629.96 samples/sec   Loss 5.2390   LearningRate 0.0150   Epoch: 12   Global Step: 508010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:47:59,078-Speed 2626.79 samples/sec   Loss 5.1883   LearningRate 0.0150   Epoch: 12   Global Step: 508020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:48:02,973-Speed 2629.18 samples/sec   Loss 5.2351   LearningRate 0.0150   Epoch: 12   Global Step: 508030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:48:06,854-Speed 2639.10 samples/sec   Loss 5.2378   LearningRate 0.0150   Epoch: 12   Global Step: 508040   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:10,751-Speed 2628.33 samples/sec   Loss 5.1241   LearningRate 0.0150   Epoch: 12   Global Step: 508050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:14,654-Speed 2624.44 samples/sec   Loss 5.1671   LearningRate 0.0150   Epoch: 12   Global Step: 508060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:18,555-Speed 2625.45 samples/sec   Loss 5.1758   LearningRate 0.0150   Epoch: 12   Global Step: 508070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:22,458-Speed 2624.68 samples/sec   Loss 5.1729   LearningRate 0.0150   Epoch: 12   Global Step: 508080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:26,347-Speed 2634.02 samples/sec   Loss 5.1927   LearningRate 0.0150   Epoch: 12   Global Step: 508090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:30,243-Speed 2628.57 samples/sec   Loss 5.2363   LearningRate 0.0150   Epoch: 12   Global Step: 508100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:34,138-Speed 2629.82 samples/sec   Loss 5.1841   LearningRate 0.0150   Epoch: 12   Global Step: 508110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:38,034-Speed 2628.38 samples/sec   Loss 5.2348   LearningRate 0.0150   Epoch: 12   Global Step: 508120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:41,935-Speed 2625.74 samples/sec   Loss 5.2246   LearningRate 0.0150   Epoch: 12   Global Step: 508130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:45,829-Speed 2631.04 samples/sec   Loss 5.2710   LearningRate 0.0150   Epoch: 12   Global Step: 508140   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:48:49,700-Speed 2646.10 samples/sec   Loss 5.1325   LearningRate 0.0150   Epoch: 12   Global Step: 508150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:53,597-Speed 2628.62 samples/sec   Loss 5.1018   LearningRate 0.0150   Epoch: 12   Global Step: 508160   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:48:57,528-Speed 2605.27 samples/sec   Loss 5.1858   LearningRate 0.0150   Epoch: 12   Global Step: 508170   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:01,422-Speed 2630.63 samples/sec   Loss 5.2099   LearningRate 0.0150   Epoch: 12   Global Step: 508180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:05,320-Speed 2627.27 samples/sec   Loss 5.1814   LearningRate 0.0150   Epoch: 12   Global Step: 508190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:09,218-Speed 2628.10 samples/sec   Loss 5.2406   LearningRate 0.0150   Epoch: 12   Global Step: 508200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:13,108-Speed 2632.56 samples/sec   Loss 5.1304   LearningRate 0.0150   Epoch: 12   Global Step: 508210   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:17,002-Speed 2630.72 samples/sec   Loss 5.1992   LearningRate 0.0150   Epoch: 12   Global Step: 508220   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:20,898-Speed 2629.46 samples/sec   Loss 5.2181   LearningRate 0.0150   Epoch: 12   Global Step: 508230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:24,798-Speed 2625.98 samples/sec   Loss 5.1426   LearningRate 0.0150   Epoch: 12   Global Step: 508240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:28,715-Speed 2615.34 samples/sec   Loss 5.2213   LearningRate 0.0150   Epoch: 12   Global Step: 508250   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:49:32,630-Speed 2616.16 samples/sec   Loss 5.0917   LearningRate 0.0150   Epoch: 12   Global Step: 508260   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:49:36,525-Speed 2629.48 samples/sec   Loss 5.1513   LearningRate 0.0150   Epoch: 12   Global Step: 508270   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:49:40,430-Speed 2622.51 samples/sec   Loss 5.2781   LearningRate 0.0150   Epoch: 12   Global Step: 508280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:49:44,327-Speed 2628.40 samples/sec   Loss 5.3016   LearningRate 0.0150   Epoch: 12   Global Step: 508290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:49:48,232-Speed 2622.80 samples/sec   Loss 5.1891   LearningRate 0.0150   Epoch: 12   Global Step: 508300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:49:52,109-Speed 2642.57 samples/sec   Loss 5.2412   LearningRate 0.0150   Epoch: 12   Global Step: 508310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:56,016-Speed 2621.68 samples/sec   Loss 5.2472   LearningRate 0.0150   Epoch: 12   Global Step: 508320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:49:59,931-Speed 2616.21 samples/sec   Loss 5.1855   LearningRate 0.0150   Epoch: 12   Global Step: 508330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:03,831-Speed 2626.04 samples/sec   Loss 5.1433   LearningRate 0.0150   Epoch: 12   Global Step: 508340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:07,732-Speed 2625.75 samples/sec   Loss 5.1473   LearningRate 0.0150   Epoch: 12   Global Step: 508350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:11,636-Speed 2623.47 samples/sec   Loss 5.2597   LearningRate 0.0150   Epoch: 12   Global Step: 508360   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:15,568-Speed 2605.32 samples/sec   Loss 5.1620   LearningRate 0.0150   Epoch: 12   Global Step: 508370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:19,462-Speed 2629.77 samples/sec   Loss 5.1716   LearningRate 0.0150   Epoch: 12   Global Step: 508380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:23,376-Speed 2617.75 samples/sec   Loss 5.2282   LearningRate 0.0150   Epoch: 12   Global Step: 508390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:27,274-Speed 2627.42 samples/sec   Loss 5.1587   LearningRate 0.0150   Epoch: 12   Global Step: 508400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:50:31,174-Speed 2626.65 samples/sec   Loss 5.1973   LearningRate 0.0150   Epoch: 12   Global Step: 508410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:50:35,069-Speed 2629.62 samples/sec   Loss 5.1931   LearningRate 0.0150   Epoch: 12   Global Step: 508420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:50:38,970-Speed 2625.13 samples/sec   Loss 5.1116   LearningRate 0.0150   Epoch: 12   Global Step: 508430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:50:42,887-Speed 2615.19 samples/sec   Loss 5.2125   LearningRate 0.0150   Epoch: 12   Global Step: 508440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:50:46,799-Speed 2617.97 samples/sec   Loss 5.0916   LearningRate 0.0150   Epoch: 12   Global Step: 508450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:50:50,702-Speed 2624.68 samples/sec   Loss 5.1376   LearningRate 0.0150   Epoch: 12   Global Step: 508460   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:50:54,593-Speed 2632.46 samples/sec   Loss 5.1471   LearningRate 0.0150   Epoch: 12   Global Step: 508470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:50:58,494-Speed 2625.27 samples/sec   Loss 5.1393   LearningRate 0.0150   Epoch: 12   Global Step: 508480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:02,421-Speed 2608.60 samples/sec   Loss 5.1698   LearningRate 0.0150   Epoch: 12   Global Step: 508490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:06,319-Speed 2628.18 samples/sec   Loss 5.1833   LearningRate 0.0150   Epoch: 12   Global Step: 508500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:10,204-Speed 2635.81 samples/sec   Loss 5.2635   LearningRate 0.0150   Epoch: 12   Global Step: 508510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:14,104-Speed 2626.58 samples/sec   Loss 5.1074   LearningRate 0.0150   Epoch: 12   Global Step: 508520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:18,003-Speed 2627.22 samples/sec   Loss 5.2384   LearningRate 0.0150   Epoch: 12   Global Step: 508530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:21,902-Speed 2627.13 samples/sec   Loss 5.0871   LearningRate 0.0150   Epoch: 12   Global Step: 508540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:25,799-Speed 2628.69 samples/sec   Loss 5.2511   LearningRate 0.0150   Epoch: 12   Global Step: 508550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:29,697-Speed 2627.42 samples/sec   Loss 5.0821   LearningRate 0.0150   Epoch: 12   Global Step: 508560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:51:33,582-Speed 2636.68 samples/sec   Loss 5.1558   LearningRate 0.0150   Epoch: 12   Global Step: 508570   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:51:37,479-Speed 2628.15 samples/sec   Loss 5.2626   LearningRate 0.0150   Epoch: 12   Global Step: 508580   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:51:41,377-Speed 2627.30 samples/sec   Loss 5.2810   LearningRate 0.0150   Epoch: 12   Global Step: 508590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:51:45,273-Speed 2629.68 samples/sec   Loss 5.2269   LearningRate 0.0150   Epoch: 12   Global Step: 508600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:51:49,168-Speed 2629.67 samples/sec   Loss 5.3203   LearningRate 0.0150   Epoch: 12   Global Step: 508610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:51:53,069-Speed 2625.42 samples/sec   Loss 5.2102   LearningRate 0.0150   Epoch: 12   Global Step: 508620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:51:56,964-Speed 2629.84 samples/sec   Loss 5.1538   LearningRate 0.0150   Epoch: 12   Global Step: 508630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:00,867-Speed 2624.33 samples/sec   Loss 5.2220   LearningRate 0.0150   Epoch: 12   Global Step: 508640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:04,766-Speed 2626.86 samples/sec   Loss 5.0919   LearningRate 0.0150   Epoch: 12   Global Step: 508650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:08,673-Speed 2621.44 samples/sec   Loss 5.2049   LearningRate 0.0150   Epoch: 12   Global Step: 508660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:12,552-Speed 2640.56 samples/sec   Loss 5.1837   LearningRate 0.0150   Epoch: 12   Global Step: 508670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:16,455-Speed 2626.83 samples/sec   Loss 5.1637   LearningRate 0.0150   Epoch: 12   Global Step: 508680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:20,366-Speed 2618.72 samples/sec   Loss 5.1641   LearningRate 0.0150   Epoch: 12   Global Step: 508690   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:24,259-Speed 2630.78 samples/sec   Loss 5.0964   LearningRate 0.0150   Epoch: 12   Global Step: 508700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:28,156-Speed 2628.14 samples/sec   Loss 5.1859   LearningRate 0.0150   Epoch: 12   Global Step: 508710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:32,061-Speed 2623.39 samples/sec   Loss 5.1337   LearningRate 0.0150   Epoch: 12   Global Step: 508720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:35,959-Speed 2627.41 samples/sec   Loss 5.2507   LearningRate 0.0150   Epoch: 12   Global Step: 508730   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:39,859-Speed 2625.64 samples/sec   Loss 5.2592   LearningRate 0.0150   Epoch: 12   Global Step: 508740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:43,840-Speed 2572.90 samples/sec   Loss 5.1371   LearningRate 0.0150   Epoch: 12   Global Step: 508750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:47,746-Speed 2630.23 samples/sec   Loss 5.0258   LearningRate 0.0150   Epoch: 12   Global Step: 508760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:51,640-Speed 2630.53 samples/sec   Loss 5.2093   LearningRate 0.0150   Epoch: 12   Global Step: 508770   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:52:55,517-Speed 2641.73 samples/sec   Loss 5.2133   LearningRate 0.0150   Epoch: 12   Global Step: 508780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:52:59,416-Speed 2627.25 samples/sec   Loss 5.2113   LearningRate 0.0150   Epoch: 12   Global Step: 508790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:03,334-Speed 2613.72 samples/sec   Loss 5.2690   LearningRate 0.0150   Epoch: 12   Global Step: 508800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:07,229-Speed 2629.52 samples/sec   Loss 5.1317   LearningRate 0.0150   Epoch: 12   Global Step: 508810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:11,127-Speed 2628.04 samples/sec   Loss 5.2213   LearningRate 0.0150   Epoch: 12   Global Step: 508820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:15,027-Speed 2625.91 samples/sec   Loss 5.1429   LearningRate 0.0149   Epoch: 12   Global Step: 508830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:18,934-Speed 2621.63 samples/sec   Loss 5.2030   LearningRate 0.0149   Epoch: 12   Global Step: 508840   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:22,827-Speed 2631.03 samples/sec   Loss 5.2437   LearningRate 0.0149   Epoch: 12   Global Step: 508850   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:26,728-Speed 2625.89 samples/sec   Loss 5.1691   LearningRate 0.0149   Epoch: 12   Global Step: 508860   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:30,625-Speed 2628.45 samples/sec   Loss 5.1507   LearningRate 0.0149   Epoch: 12   Global Step: 508870   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:34,526-Speed 2625.04 samples/sec   Loss 5.2066   LearningRate 0.0149   Epoch: 12   Global Step: 508880   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:53:38,397-Speed 2646.21 samples/sec   Loss 5.1656   LearningRate 0.0149   Epoch: 12   Global Step: 508890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:42,295-Speed 2628.02 samples/sec   Loss 5.2027   LearningRate 0.0149   Epoch: 12   Global Step: 508900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:46,189-Speed 2630.32 samples/sec   Loss 5.1548   LearningRate 0.0149   Epoch: 12   Global Step: 508910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:50,085-Speed 2629.62 samples/sec   Loss 5.1569   LearningRate 0.0149   Epoch: 12   Global Step: 508920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:53,981-Speed 2628.82 samples/sec   Loss 5.1003   LearningRate 0.0149   Epoch: 12   Global Step: 508930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:53:57,878-Speed 2628.91 samples/sec   Loss 5.2774   LearningRate 0.0149   Epoch: 12   Global Step: 508940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:01,773-Speed 2629.48 samples/sec   Loss 5.2546   LearningRate 0.0149   Epoch: 12   Global Step: 508950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:05,676-Speed 2624.21 samples/sec   Loss 5.2146   LearningRate 0.0149   Epoch: 12   Global Step: 508960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:09,573-Speed 2628.38 samples/sec   Loss 5.2270   LearningRate 0.0149   Epoch: 12   Global Step: 508970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:13,468-Speed 2629.11 samples/sec   Loss 5.2157   LearningRate 0.0149   Epoch: 12   Global Step: 508980   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:17,366-Speed 2628.71 samples/sec   Loss 5.2275   LearningRate 0.0149   Epoch: 12   Global Step: 508990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:54:21,268-Speed 2624.76 samples/sec   Loss 5.1281   LearningRate 0.0149   Epoch: 12   Global Step: 509000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:54:25,212-Speed 2598.21 samples/sec   Loss 5.2079   LearningRate 0.0149   Epoch: 12   Global Step: 509010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:54:29,104-Speed 2631.38 samples/sec   Loss 5.1050   LearningRate 0.0149   Epoch: 12   Global Step: 509020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:54:32,999-Speed 2629.46 samples/sec   Loss 5.1273   LearningRate 0.0149   Epoch: 12   Global Step: 509030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:54:36,922-Speed 2611.19 samples/sec   Loss 5.3345   LearningRate 0.0149   Epoch: 12   Global Step: 509040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:54:40,837-Speed 2616.83 samples/sec   Loss 5.2131   LearningRate 0.0149   Epoch: 12   Global Step: 509050   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:54:44,714-Speed 2641.40 samples/sec   Loss 5.1493   LearningRate 0.0149   Epoch: 12   Global Step: 509060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:48,745-Speed 2541.26 samples/sec   Loss 5.1911   LearningRate 0.0149   Epoch: 12   Global Step: 509070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:52,641-Speed 2628.57 samples/sec   Loss 5.1199   LearningRate 0.0149   Epoch: 12   Global Step: 509080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:54:56,549-Speed 2621.73 samples/sec   Loss 5.2700   LearningRate 0.0149   Epoch: 12   Global Step: 509090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:00,445-Speed 2629.06 samples/sec   Loss 5.2764   LearningRate 0.0149   Epoch: 12   Global Step: 509100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:04,340-Speed 2629.29 samples/sec   Loss 5.1441   LearningRate 0.0149   Epoch: 12   Global Step: 509110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:08,231-Speed 2632.37 samples/sec   Loss 5.2650   LearningRate 0.0149   Epoch: 12   Global Step: 509120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:12,126-Speed 2629.73 samples/sec   Loss 5.1823   LearningRate 0.0149   Epoch: 12   Global Step: 509130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:16,043-Speed 2614.86 samples/sec   Loss 5.1619   LearningRate 0.0149   Epoch: 12   Global Step: 509140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:19,943-Speed 2626.33 samples/sec   Loss 5.1013   LearningRate 0.0149   Epoch: 12   Global Step: 509150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:23,853-Speed 2619.78 samples/sec   Loss 5.1690   LearningRate 0.0149   Epoch: 12   Global Step: 509160   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:27,747-Speed 2630.61 samples/sec   Loss 5.1505   LearningRate 0.0149   Epoch: 12   Global Step: 509170   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:31,665-Speed 2614.49 samples/sec   Loss 5.2061   LearningRate 0.0149   Epoch: 12   Global Step: 509180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:35,559-Speed 2630.34 samples/sec   Loss 5.1068   LearningRate 0.0149   Epoch: 12   Global Step: 509190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:55:39,440-Speed 2638.77 samples/sec   Loss 5.2091   LearningRate 0.0149   Epoch: 12   Global Step: 509200   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:55:43,340-Speed 2626.29 samples/sec   Loss 5.2444   LearningRate 0.0149   Epoch: 12   Global Step: 509210   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:55:47,235-Speed 2630.03 samples/sec   Loss 5.2373   LearningRate 0.0149   Epoch: 12   Global Step: 509220   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:55:51,136-Speed 2625.67 samples/sec   Loss 5.2324   LearningRate 0.0149   Epoch: 12   Global Step: 509230   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:55:55,040-Speed 2623.49 samples/sec   Loss 5.2312   LearningRate 0.0149   Epoch: 12   Global Step: 509240   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:55:58,984-Speed 2597.38 samples/sec   Loss 5.1332   LearningRate 0.0149   Epoch: 12   Global Step: 509250   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:56:02,875-Speed 2632.79 samples/sec   Loss 5.1364   LearningRate 0.0149   Epoch: 12   Global Step: 509260   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:56:06,778-Speed 2624.15 samples/sec   Loss 5.1483   LearningRate 0.0149   Epoch: 12   Global Step: 509270   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:56:10,677-Speed 2626.64 samples/sec   Loss 5.1937   LearningRate 0.0149   Epoch: 12   Global Step: 509280   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:56:14,688-Speed 2553.23 samples/sec   Loss 5.1754   LearningRate 0.0149   Epoch: 12   Global Step: 509290   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 04:56:18,596-Speed 2620.85 samples/sec   Loss 5.1765   LearningRate 0.0149   Epoch: 12   Global Step: 509300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:22,508-Speed 2618.95 samples/sec   Loss 5.1159   LearningRate 0.0149   Epoch: 12   Global Step: 509310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:26,434-Speed 2608.61 samples/sec   Loss 5.0923   LearningRate 0.0149   Epoch: 12   Global Step: 509320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:30,337-Speed 2624.83 samples/sec   Loss 5.2484   LearningRate 0.0149   Epoch: 12   Global Step: 509330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:34,232-Speed 2629.97 samples/sec   Loss 5.1782   LearningRate 0.0149   Epoch: 12   Global Step: 509340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:38,251-Speed 2548.01 samples/sec   Loss 5.1497   LearningRate 0.0149   Epoch: 12   Global Step: 509350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:42,160-Speed 2620.39 samples/sec   Loss 5.2195   LearningRate 0.0149   Epoch: 12   Global Step: 509360   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:46,057-Speed 2628.38 samples/sec   Loss 5.1216   LearningRate 0.0149   Epoch: 12   Global Step: 509370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:49,962-Speed 2622.85 samples/sec   Loss 5.0829   LearningRate 0.0149   Epoch: 12   Global Step: 509380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:53,891-Speed 2607.27 samples/sec   Loss 5.1704   LearningRate 0.0149   Epoch: 12   Global Step: 509390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:56:57,785-Speed 2630.30 samples/sec   Loss 5.1968   LearningRate 0.0149   Epoch: 12   Global Step: 509400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:01,697-Speed 2618.56 samples/sec   Loss 5.1052   LearningRate 0.0149   Epoch: 12   Global Step: 509410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:05,591-Speed 2630.55 samples/sec   Loss 5.1346   LearningRate 0.0149   Epoch: 12   Global Step: 509420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:09,490-Speed 2626.80 samples/sec   Loss 5.2206   LearningRate 0.0149   Epoch: 12   Global Step: 509430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:13,502-Speed 2553.12 samples/sec   Loss 5.2005   LearningRate 0.0149   Epoch: 12   Global Step: 509440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:17,582-Speed 2510.37 samples/sec   Loss 5.1773   LearningRate 0.0149   Epoch: 12   Global Step: 509450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:21,560-Speed 2574.45 samples/sec   Loss 5.1816   LearningRate 0.0149   Epoch: 12   Global Step: 509460   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:25,537-Speed 2575.94 samples/sec   Loss 5.1411   LearningRate 0.0149   Epoch: 12   Global Step: 509470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:29,441-Speed 2623.87 samples/sec   Loss 5.0933   LearningRate 0.0149   Epoch: 12   Global Step: 509480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:33,384-Speed 2597.66 samples/sec   Loss 5.2667   LearningRate 0.0149   Epoch: 12   Global Step: 509490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:57:37,270-Speed 2636.44 samples/sec   Loss 5.2840   LearningRate 0.0149   Epoch: 12   Global Step: 509500   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:57:41,165-Speed 2629.60 samples/sec   Loss 5.3079   LearningRate 0.0149   Epoch: 12   Global Step: 509510   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:57:45,082-Speed 2614.28 samples/sec   Loss 5.2344   LearningRate 0.0149   Epoch: 12   Global Step: 509520   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:57:48,979-Speed 2628.54 samples/sec   Loss 5.1151   LearningRate 0.0149   Epoch: 12   Global Step: 509530   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:57:52,876-Speed 2628.37 samples/sec   Loss 5.1560   LearningRate 0.0149   Epoch: 12   Global Step: 509540   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:57:56,771-Speed 2629.51 samples/sec   Loss 5.2370   LearningRate 0.0149   Epoch: 12   Global Step: 509550   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:00,678-Speed 2621.96 samples/sec   Loss 5.0913   LearningRate 0.0149   Epoch: 12   Global Step: 509560   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:04,571-Speed 2630.89 samples/sec   Loss 5.2236   LearningRate 0.0149   Epoch: 12   Global Step: 509570   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:08,473-Speed 2624.76 samples/sec   Loss 5.1852   LearningRate 0.0149   Epoch: 12   Global Step: 509580   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:12,379-Speed 2622.28 samples/sec   Loss 5.2391   LearningRate 0.0149   Epoch: 12   Global Step: 509590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:16,312-Speed 2605.11 samples/sec   Loss 5.1570   LearningRate 0.0149   Epoch: 12   Global Step: 509600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:58:20,250-Speed 2600.89 samples/sec   Loss 5.2165   LearningRate 0.0149   Epoch: 12   Global Step: 509610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:58:24,127-Speed 2641.87 samples/sec   Loss 5.1573   LearningRate 0.0149   Epoch: 12   Global Step: 509620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:28,028-Speed 2625.90 samples/sec   Loss 5.2090   LearningRate 0.0149   Epoch: 12   Global Step: 509630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:31,930-Speed 2625.88 samples/sec   Loss 5.1122   LearningRate 0.0149   Epoch: 12   Global Step: 509640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:35,831-Speed 2625.15 samples/sec   Loss 5.1702   LearningRate 0.0149   Epoch: 12   Global Step: 509650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:39,728-Speed 2628.32 samples/sec   Loss 5.2070   LearningRate 0.0149   Epoch: 12   Global Step: 509660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:43,626-Speed 2628.35 samples/sec   Loss 5.2070   LearningRate 0.0149   Epoch: 12   Global Step: 509670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:47,518-Speed 2631.85 samples/sec   Loss 5.2796   LearningRate 0.0149   Epoch: 12   Global Step: 509680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:51,414-Speed 2628.63 samples/sec   Loss 5.2002   LearningRate 0.0149   Epoch: 12   Global Step: 509690   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:55,311-Speed 2629.12 samples/sec   Loss 5.1606   LearningRate 0.0149   Epoch: 12   Global Step: 509700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:58:59,206-Speed 2629.24 samples/sec   Loss 5.2589   LearningRate 0.0149   Epoch: 12   Global Step: 509710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:03,084-Speed 2640.96 samples/sec   Loss 5.1690   LearningRate 0.0149   Epoch: 12   Global Step: 509720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:07,006-Speed 2612.11 samples/sec   Loss 5.1727   LearningRate 0.0149   Epoch: 12   Global Step: 509730   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:10,955-Speed 2593.70 samples/sec   Loss 5.1123   LearningRate 0.0149   Epoch: 12   Global Step: 509740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:14,965-Speed 2554.28 samples/sec   Loss 5.2197   LearningRate 0.0149   Epoch: 12   Global Step: 509750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:18,904-Speed 2600.61 samples/sec   Loss 5.2135   LearningRate 0.0149   Epoch: 12   Global Step: 509760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:22,811-Speed 2621.23 samples/sec   Loss 5.1435   LearningRate 0.0149   Epoch: 12   Global Step: 509770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:26,709-Speed 2628.23 samples/sec   Loss 5.2169   LearningRate 0.0149   Epoch: 12   Global Step: 509780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:30,606-Speed 2628.13 samples/sec   Loss 5.1042   LearningRate 0.0149   Epoch: 12   Global Step: 509790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:34,536-Speed 2606.14 samples/sec   Loss 5.1953   LearningRate 0.0149   Epoch: 12   Global Step: 509800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:38,429-Speed 2631.08 samples/sec   Loss 5.2176   LearningRate 0.0149   Epoch: 12   Global Step: 509810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 04:59:42,327-Speed 2628.86 samples/sec   Loss 5.1903   LearningRate 0.0149   Epoch: 12   Global Step: 509820   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:59:46,239-Speed 2618.21 samples/sec   Loss 5.2607   LearningRate 0.0149   Epoch: 12   Global Step: 509830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:59:50,142-Speed 2624.55 samples/sec   Loss 5.1528   LearningRate 0.0149   Epoch: 12   Global Step: 509840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:59:54,038-Speed 2628.33 samples/sec   Loss 5.0803   LearningRate 0.0149   Epoch: 12   Global Step: 509850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 04:59:57,934-Speed 2629.87 samples/sec   Loss 5.2531   LearningRate 0.0149   Epoch: 12   Global Step: 509860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:00:01,831-Speed 2628.05 samples/sec   Loss 5.1076   LearningRate 0.0149   Epoch: 12   Global Step: 509870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:00:05,734-Speed 2624.06 samples/sec   Loss 5.2742   LearningRate 0.0149   Epoch: 12   Global Step: 509880   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:00:09,632-Speed 2627.81 samples/sec   Loss 5.2622   LearningRate 0.0149   Epoch: 12   Global Step: 509890   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:00:13,518-Speed 2635.65 samples/sec   Loss 5.2156   LearningRate 0.0148   Epoch: 12   Global Step: 509900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:17,431-Speed 2618.71 samples/sec   Loss 5.3167   LearningRate 0.0148   Epoch: 12   Global Step: 509910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:21,332-Speed 2625.10 samples/sec   Loss 5.2192   LearningRate 0.0148   Epoch: 12   Global Step: 509920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:25,231-Speed 2627.58 samples/sec   Loss 5.1568   LearningRate 0.0148   Epoch: 12   Global Step: 509930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:29,125-Speed 2629.59 samples/sec   Loss 5.1473   LearningRate 0.0148   Epoch: 12   Global Step: 509940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:33,022-Speed 2628.43 samples/sec   Loss 5.0747   LearningRate 0.0148   Epoch: 12   Global Step: 509950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:36,931-Speed 2620.58 samples/sec   Loss 5.1523   LearningRate 0.0148   Epoch: 12   Global Step: 509960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:40,832-Speed 2625.66 samples/sec   Loss 5.1190   LearningRate 0.0148   Epoch: 12   Global Step: 509970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:44,738-Speed 2621.98 samples/sec   Loss 5.2441   LearningRate 0.0148   Epoch: 12   Global Step: 509980   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:48,645-Speed 2621.81 samples/sec   Loss 5.1908   LearningRate 0.0148   Epoch: 12   Global Step: 509990   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:00:52,551-Speed 2622.15 samples/sec   Loss 5.1976   LearningRate 0.0148   Epoch: 12   Global Step: 510000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:01:36,800-[lfw][510000]XNorm: 23.380763
Training: 2022-04-15 05:01:36,801-[lfw][510000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-15 05:01:36,802-[lfw][510000]Accuracy-Highest: 0.99800
Training: 2022-04-15 05:02:26,562-[cfp_fp][510000]XNorm: 21.950113
Training: 2022-04-15 05:02:26,563-[cfp_fp][510000]Accuracy-Flip: 0.99043+-0.00443
Training: 2022-04-15 05:02:26,564-[cfp_fp][510000]Accuracy-Highest: 0.99043
Training: 2022-04-15 05:03:09,839-[agedb_30][510000]XNorm: 23.299798
Training: 2022-04-15 05:03:09,840-[agedb_30][510000]Accuracy-Flip: 0.97767+-0.00688
Training: 2022-04-15 05:03:09,840-[agedb_30][510000]Accuracy-Highest: 0.97950
Training: 2022-04-15 05:03:13,720-Speed 72.54 samples/sec   Loss 5.1982   LearningRate 0.0148   Epoch: 12   Global Step: 510010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:03:17,582-Speed 2652.66 samples/sec   Loss 5.1285   LearningRate 0.0148   Epoch: 12   Global Step: 510020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:03:21,484-Speed 2624.98 samples/sec   Loss 5.2669   LearningRate 0.0148   Epoch: 12   Global Step: 510030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:03:25,353-Speed 2647.80 samples/sec   Loss 5.2115   LearningRate 0.0148   Epoch: 12   Global Step: 510040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:03:29,232-Speed 2640.18 samples/sec   Loss 5.1798   LearningRate 0.0148   Epoch: 12   Global Step: 510050   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:03:33,087-Speed 2656.85 samples/sec   Loss 5.0540   LearningRate 0.0148   Epoch: 12   Global Step: 510060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:03:37,083-Speed 2564.20 samples/sec   Loss 5.2652   LearningRate 0.0148   Epoch: 12   Global Step: 510070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:03:41,023-Speed 2599.85 samples/sec   Loss 5.1344   LearningRate 0.0148   Epoch: 12   Global Step: 510080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:03:44,907-Speed 2637.84 samples/sec   Loss 5.2528   LearningRate 0.0148   Epoch: 12   Global Step: 510090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:03:48,825-Speed 2614.01 samples/sec   Loss 5.1544   LearningRate 0.0148   Epoch: 12   Global Step: 510100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:03:52,746-Speed 2612.23 samples/sec   Loss 5.0851   LearningRate 0.0148   Epoch: 12   Global Step: 510110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:03:56,651-Speed 2622.52 samples/sec   Loss 5.2054   LearningRate 0.0148   Epoch: 12   Global Step: 510120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:00,547-Speed 2629.50 samples/sec   Loss 5.1183   LearningRate 0.0148   Epoch: 12   Global Step: 510130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:04,451-Speed 2623.89 samples/sec   Loss 5.1922   LearningRate 0.0148   Epoch: 12   Global Step: 510140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:08,346-Speed 2629.50 samples/sec   Loss 5.1056   LearningRate 0.0148   Epoch: 12   Global Step: 510150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:12,234-Speed 2635.61 samples/sec   Loss 5.0640   LearningRate 0.0148   Epoch: 12   Global Step: 510160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:04:16,159-Speed 2609.23 samples/sec   Loss 5.1390   LearningRate 0.0148   Epoch: 12   Global Step: 510170   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:04:20,033-Speed 2644.45 samples/sec   Loss 5.1363   LearningRate 0.0148   Epoch: 12   Global Step: 510180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:23,921-Speed 2634.15 samples/sec   Loss 5.2032   LearningRate 0.0148   Epoch: 12   Global Step: 510190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:27,848-Speed 2608.57 samples/sec   Loss 5.3004   LearningRate 0.0148   Epoch: 12   Global Step: 510200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:31,749-Speed 2625.19 samples/sec   Loss 5.0599   LearningRate 0.0148   Epoch: 12   Global Step: 510210   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:35,638-Speed 2634.44 samples/sec   Loss 5.1735   LearningRate 0.0148   Epoch: 12   Global Step: 510220   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:39,532-Speed 2630.33 samples/sec   Loss 5.1240   LearningRate 0.0148   Epoch: 12   Global Step: 510230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:43,439-Speed 2621.67 samples/sec   Loss 5.1745   LearningRate 0.0148   Epoch: 12   Global Step: 510240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:47,325-Speed 2636.16 samples/sec   Loss 5.1277   LearningRate 0.0148   Epoch: 12   Global Step: 510250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:51,215-Speed 2633.13 samples/sec   Loss 5.2136   LearningRate 0.0148   Epoch: 12   Global Step: 510260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:55,104-Speed 2634.43 samples/sec   Loss 5.2105   LearningRate 0.0148   Epoch: 12   Global Step: 510270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:04:58,995-Speed 2632.47 samples/sec   Loss 5.1707   LearningRate 0.0148   Epoch: 12   Global Step: 510280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:05:02,887-Speed 2631.33 samples/sec   Loss 5.1338   LearningRate 0.0148   Epoch: 12   Global Step: 510290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:05:06,797-Speed 2619.56 samples/sec   Loss 5.1590   LearningRate 0.0148   Epoch: 12   Global Step: 510300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:05:10,689-Speed 2631.84 samples/sec   Loss 5.2020   LearningRate 0.0148   Epoch: 12   Global Step: 510310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:05:14,561-Speed 2645.62 samples/sec   Loss 5.1559   LearningRate 0.0148   Epoch: 12   Global Step: 510320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:18,479-Speed 2613.71 samples/sec   Loss 5.2502   LearningRate 0.0148   Epoch: 12   Global Step: 510330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:22,378-Speed 2627.30 samples/sec   Loss 5.0095   LearningRate 0.0148   Epoch: 12   Global Step: 510340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:26,269-Speed 2632.67 samples/sec   Loss 5.1711   LearningRate 0.0148   Epoch: 12   Global Step: 510350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:30,196-Speed 2608.70 samples/sec   Loss 5.1826   LearningRate 0.0148   Epoch: 12   Global Step: 510360   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:34,093-Speed 2628.29 samples/sec   Loss 5.2244   LearningRate 0.0148   Epoch: 12   Global Step: 510370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:38,023-Speed 2605.69 samples/sec   Loss 5.2132   LearningRate 0.0148   Epoch: 12   Global Step: 510380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:41,922-Speed 2627.13 samples/sec   Loss 5.0908   LearningRate 0.0148   Epoch: 12   Global Step: 510390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:45,819-Speed 2628.52 samples/sec   Loss 5.1662   LearningRate 0.0148   Epoch: 12   Global Step: 510400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:49,718-Speed 2627.19 samples/sec   Loss 5.2488   LearningRate 0.0148   Epoch: 12   Global Step: 510410   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:05:53,625-Speed 2621.04 samples/sec   Loss 5.2278   LearningRate 0.0148   Epoch: 12   Global Step: 510420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:05:57,541-Speed 2616.48 samples/sec   Loss 5.1752   LearningRate 0.0148   Epoch: 12   Global Step: 510430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:06:01,409-Speed 2647.47 samples/sec   Loss 5.0491   LearningRate 0.0148   Epoch: 12   Global Step: 510440   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:05,306-Speed 2628.65 samples/sec   Loss 5.0105   LearningRate 0.0148   Epoch: 12   Global Step: 510450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:09,201-Speed 2629.38 samples/sec   Loss 5.1058   LearningRate 0.0148   Epoch: 12   Global Step: 510460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:13,106-Speed 2623.69 samples/sec   Loss 5.1111   LearningRate 0.0148   Epoch: 12   Global Step: 510470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:17,006-Speed 2625.59 samples/sec   Loss 5.0998   LearningRate 0.0148   Epoch: 12   Global Step: 510480   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:20,971-Speed 2583.41 samples/sec   Loss 5.2174   LearningRate 0.0148   Epoch: 12   Global Step: 510490   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:24,913-Speed 2598.71 samples/sec   Loss 5.0926   LearningRate 0.0148   Epoch: 12   Global Step: 510500   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:28,829-Speed 2616.46 samples/sec   Loss 5.1522   LearningRate 0.0148   Epoch: 12   Global Step: 510510   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:32,720-Speed 2632.27 samples/sec   Loss 5.2328   LearningRate 0.0148   Epoch: 12   Global Step: 510520   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:36,611-Speed 2632.82 samples/sec   Loss 5.1348   LearningRate 0.0148   Epoch: 12   Global Step: 510530   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:40,511-Speed 2625.82 samples/sec   Loss 5.2554   LearningRate 0.0148   Epoch: 12   Global Step: 510540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:06:44,412-Speed 2626.34 samples/sec   Loss 5.2241   LearningRate 0.0148   Epoch: 12   Global Step: 510550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:06:48,287-Speed 2642.84 samples/sec   Loss 5.1225   LearningRate 0.0148   Epoch: 12   Global Step: 510560   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:52,178-Speed 2632.71 samples/sec   Loss 5.1774   LearningRate 0.0148   Epoch: 12   Global Step: 510570   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:56,090-Speed 2617.89 samples/sec   Loss 5.1460   LearningRate 0.0148   Epoch: 12   Global Step: 510580   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:06:59,979-Speed 2634.02 samples/sec   Loss 5.1308   LearningRate 0.0148   Epoch: 12   Global Step: 510590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:03,878-Speed 2627.06 samples/sec   Loss 5.1703   LearningRate 0.0148   Epoch: 12   Global Step: 510600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:07,769-Speed 2632.47 samples/sec   Loss 5.0965   LearningRate 0.0148   Epoch: 12   Global Step: 510610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:11,670-Speed 2625.38 samples/sec   Loss 5.1639   LearningRate 0.0148   Epoch: 12   Global Step: 510620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:15,573-Speed 2624.67 samples/sec   Loss 5.2017   LearningRate 0.0148   Epoch: 12   Global Step: 510630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:19,496-Speed 2611.41 samples/sec   Loss 5.1579   LearningRate 0.0148   Epoch: 12   Global Step: 510640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:23,392-Speed 2629.14 samples/sec   Loss 5.1515   LearningRate 0.0148   Epoch: 12   Global Step: 510650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:27,325-Speed 2604.72 samples/sec   Loss 5.2098   LearningRate 0.0148   Epoch: 12   Global Step: 510660   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:07:31,200-Speed 2642.86 samples/sec   Loss 5.2006   LearningRate 0.0148   Epoch: 12   Global Step: 510670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:35,125-Speed 2610.05 samples/sec   Loss 5.1165   LearningRate 0.0148   Epoch: 12   Global Step: 510680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:39,020-Speed 2629.37 samples/sec   Loss 5.0857   LearningRate 0.0148   Epoch: 12   Global Step: 510690   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:42,916-Speed 2629.40 samples/sec   Loss 5.1256   LearningRate 0.0148   Epoch: 12   Global Step: 510700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:46,822-Speed 2621.46 samples/sec   Loss 5.0574   LearningRate 0.0148   Epoch: 12   Global Step: 510710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:50,718-Speed 2630.08 samples/sec   Loss 5.1655   LearningRate 0.0148   Epoch: 12   Global Step: 510720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:54,614-Speed 2628.96 samples/sec   Loss 5.1536   LearningRate 0.0148   Epoch: 12   Global Step: 510730   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:07:58,513-Speed 2626.62 samples/sec   Loss 5.1714   LearningRate 0.0148   Epoch: 12   Global Step: 510740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:02,428-Speed 2616.30 samples/sec   Loss 5.1965   LearningRate 0.0148   Epoch: 12   Global Step: 510750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:06,325-Speed 2627.85 samples/sec   Loss 5.1565   LearningRate 0.0148   Epoch: 12   Global Step: 510760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:10,223-Speed 2627.43 samples/sec   Loss 5.1158   LearningRate 0.0148   Epoch: 12   Global Step: 510770   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:08:14,097-Speed 2644.50 samples/sec   Loss 5.0397   LearningRate 0.0148   Epoch: 12   Global Step: 510780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:18,003-Speed 2621.87 samples/sec   Loss 5.2114   LearningRate 0.0148   Epoch: 12   Global Step: 510790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:21,902-Speed 2626.98 samples/sec   Loss 5.0751   LearningRate 0.0148   Epoch: 12   Global Step: 510800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:25,797-Speed 2630.39 samples/sec   Loss 5.2336   LearningRate 0.0148   Epoch: 12   Global Step: 510810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:29,693-Speed 2628.63 samples/sec   Loss 5.1502   LearningRate 0.0148   Epoch: 12   Global Step: 510820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:33,590-Speed 2628.25 samples/sec   Loss 5.1905   LearningRate 0.0148   Epoch: 12   Global Step: 510830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:37,492-Speed 2625.35 samples/sec   Loss 5.1206   LearningRate 0.0148   Epoch: 12   Global Step: 510840   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:41,403-Speed 2618.31 samples/sec   Loss 5.2369   LearningRate 0.0148   Epoch: 12   Global Step: 510850   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:45,300-Speed 2628.38 samples/sec   Loss 5.1316   LearningRate 0.0148   Epoch: 12   Global Step: 510860   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:49,311-Speed 2553.50 samples/sec   Loss 5.1990   LearningRate 0.0148   Epoch: 12   Global Step: 510870   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:08:53,213-Speed 2625.47 samples/sec   Loss 5.1106   LearningRate 0.0148   Epoch: 12   Global Step: 510880   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:08:57,114-Speed 2625.77 samples/sec   Loss 5.2154   LearningRate 0.0148   Epoch: 12   Global Step: 510890   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:09:01,006-Speed 2631.29 samples/sec   Loss 5.1062   LearningRate 0.0148   Epoch: 12   Global Step: 510900   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:09:04,912-Speed 2623.10 samples/sec   Loss 5.1570   LearningRate 0.0148   Epoch: 12   Global Step: 510910   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:09:08,787-Speed 2642.99 samples/sec   Loss 5.1817   LearningRate 0.0148   Epoch: 12   Global Step: 510920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:12,684-Speed 2627.74 samples/sec   Loss 5.2135   LearningRate 0.0148   Epoch: 12   Global Step: 510930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:16,578-Speed 2630.21 samples/sec   Loss 5.0550   LearningRate 0.0148   Epoch: 12   Global Step: 510940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:20,472-Speed 2630.59 samples/sec   Loss 5.1007   LearningRate 0.0148   Epoch: 12   Global Step: 510950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:24,367-Speed 2629.64 samples/sec   Loss 5.0767   LearningRate 0.0148   Epoch: 12   Global Step: 510960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:28,270-Speed 2624.45 samples/sec   Loss 5.1285   LearningRate 0.0148   Epoch: 12   Global Step: 510970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:32,178-Speed 2621.00 samples/sec   Loss 5.1507   LearningRate 0.0147   Epoch: 12   Global Step: 510980   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:36,077-Speed 2626.90 samples/sec   Loss 5.1825   LearningRate 0.0147   Epoch: 12   Global Step: 510990   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:39,973-Speed 2628.82 samples/sec   Loss 5.1137   LearningRate 0.0147   Epoch: 12   Global Step: 511000   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:43,921-Speed 2594.61 samples/sec   Loss 5.1778   LearningRate 0.0147   Epoch: 12   Global Step: 511010   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:47,875-Speed 2590.31 samples/sec   Loss 5.0960   LearningRate 0.0147   Epoch: 12   Global Step: 511020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:09:51,756-Speed 2639.39 samples/sec   Loss 5.2544   LearningRate 0.0147   Epoch: 12   Global Step: 511030   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:55,652-Speed 2628.49 samples/sec   Loss 5.1548   LearningRate 0.0147   Epoch: 12   Global Step: 511040   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:09:59,560-Speed 2621.08 samples/sec   Loss 5.0872   LearningRate 0.0147   Epoch: 12   Global Step: 511050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:10:03,458-Speed 2627.95 samples/sec   Loss 5.1336   LearningRate 0.0147   Epoch: 12   Global Step: 511060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:10:07,356-Speed 2627.93 samples/sec   Loss 5.2491   LearningRate 0.0147   Epoch: 12   Global Step: 511070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:10:11,227-Speed 2645.49 samples/sec   Loss 5.1743   LearningRate 0.0147   Epoch: 12   Global Step: 511080   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:15,123-Speed 2629.19 samples/sec   Loss 5.0947   LearningRate 0.0147   Epoch: 12   Global Step: 511090   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:19,018-Speed 2629.75 samples/sec   Loss 5.1066   LearningRate 0.0147   Epoch: 12   Global Step: 511100   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:22,916-Speed 2627.25 samples/sec   Loss 5.1007   LearningRate 0.0147   Epoch: 12   Global Step: 511110   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:26,812-Speed 2629.09 samples/sec   Loss 5.0531   LearningRate 0.0147   Epoch: 12   Global Step: 511120   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:30,708-Speed 2629.14 samples/sec   Loss 5.2530   LearningRate 0.0147   Epoch: 12   Global Step: 511130   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:34,617-Speed 2620.91 samples/sec   Loss 5.1596   LearningRate 0.0147   Epoch: 12   Global Step: 511140   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:38,543-Speed 2608.39 samples/sec   Loss 5.1331   LearningRate 0.0147   Epoch: 12   Global Step: 511150   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:42,454-Speed 2619.16 samples/sec   Loss 5.0385   LearningRate 0.0147   Epoch: 12   Global Step: 511160   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:46,357-Speed 2624.40 samples/sec   Loss 5.0957   LearningRate 0.0147   Epoch: 12   Global Step: 511170   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:10:50,259-Speed 2624.72 samples/sec   Loss 5.0860   LearningRate 0.0147   Epoch: 12   Global Step: 511180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:10:54,176-Speed 2615.63 samples/sec   Loss 5.1446   LearningRate 0.0147   Epoch: 12   Global Step: 511190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:10:58,074-Speed 2627.88 samples/sec   Loss 5.2070   LearningRate 0.0147   Epoch: 12   Global Step: 511200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:01,986-Speed 2618.04 samples/sec   Loss 5.1088   LearningRate 0.0147   Epoch: 12   Global Step: 511210   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:05,883-Speed 2627.97 samples/sec   Loss 5.1857   LearningRate 0.0147   Epoch: 12   Global Step: 511220   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:09,795-Speed 2618.32 samples/sec   Loss 5.1138   LearningRate 0.0147   Epoch: 12   Global Step: 511230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:13,696-Speed 2626.09 samples/sec   Loss 5.0740   LearningRate 0.0147   Epoch: 12   Global Step: 511240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:17,603-Speed 2621.37 samples/sec   Loss 5.1312   LearningRate 0.0147   Epoch: 12   Global Step: 511250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:21,500-Speed 2628.29 samples/sec   Loss 5.1656   LearningRate 0.0147   Epoch: 12   Global Step: 511260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:25,412-Speed 2617.83 samples/sec   Loss 5.1607   LearningRate 0.0147   Epoch: 12   Global Step: 511270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:29,286-Speed 2644.92 samples/sec   Loss 5.2067   LearningRate 0.0147   Epoch: 12   Global Step: 511280   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:33,191-Speed 2622.41 samples/sec   Loss 5.0940   LearningRate 0.0147   Epoch: 12   Global Step: 511290   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:37,086-Speed 2629.56 samples/sec   Loss 5.1785   LearningRate 0.0147   Epoch: 12   Global Step: 511300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:40,989-Speed 2624.08 samples/sec   Loss 5.0053   LearningRate 0.0147   Epoch: 12   Global Step: 511310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:44,896-Speed 2621.74 samples/sec   Loss 5.1371   LearningRate 0.0147   Epoch: 12   Global Step: 511320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:48,820-Speed 2610.19 samples/sec   Loss 5.2158   LearningRate 0.0147   Epoch: 12   Global Step: 511330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:52,741-Speed 2612.53 samples/sec   Loss 5.1309   LearningRate 0.0147   Epoch: 12   Global Step: 511340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:11:56,648-Speed 2621.21 samples/sec   Loss 5.0745   LearningRate 0.0147   Epoch: 12   Global Step: 511350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:00,546-Speed 2627.89 samples/sec   Loss 5.1696   LearningRate 0.0147   Epoch: 12   Global Step: 511360   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:04,441-Speed 2629.70 samples/sec   Loss 5.0485   LearningRate 0.0147   Epoch: 12   Global Step: 511370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:08,336-Speed 2629.81 samples/sec   Loss 5.0856   LearningRate 0.0147   Epoch: 12   Global Step: 511380   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:12,236-Speed 2626.31 samples/sec   Loss 5.1673   LearningRate 0.0147   Epoch: 12   Global Step: 511390   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:16,194-Speed 2587.89 samples/sec   Loss 5.1439   LearningRate 0.0147   Epoch: 12   Global Step: 511400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:20,089-Speed 2629.73 samples/sec   Loss 5.1352   LearningRate 0.0147   Epoch: 12   Global Step: 511410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:23,990-Speed 2625.74 samples/sec   Loss 5.1822   LearningRate 0.0147   Epoch: 12   Global Step: 511420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:27,892-Speed 2625.03 samples/sec   Loss 5.1736   LearningRate 0.0147   Epoch: 12   Global Step: 511430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:31,849-Speed 2588.56 samples/sec   Loss 5.1133   LearningRate 0.0147   Epoch: 12   Global Step: 511440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:35,741-Speed 2632.02 samples/sec   Loss 5.1532   LearningRate 0.0147   Epoch: 12   Global Step: 511450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:12:39,622-Speed 2638.63 samples/sec   Loss 5.2675   LearningRate 0.0147   Epoch: 12   Global Step: 511460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:43,533-Speed 2619.53 samples/sec   Loss 5.0645   LearningRate 0.0147   Epoch: 12   Global Step: 511470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:47,443-Speed 2619.27 samples/sec   Loss 5.3095   LearningRate 0.0147   Epoch: 12   Global Step: 511480   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:51,348-Speed 2622.94 samples/sec   Loss 5.1157   LearningRate 0.0147   Epoch: 12   Global Step: 511490   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:55,245-Speed 2628.72 samples/sec   Loss 5.1415   LearningRate 0.0147   Epoch: 12   Global Step: 511500   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:12:59,141-Speed 2629.23 samples/sec   Loss 5.1823   LearningRate 0.0147   Epoch: 12   Global Step: 511510   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:03,036-Speed 2629.27 samples/sec   Loss 5.1698   LearningRate 0.0147   Epoch: 12   Global Step: 511520   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:06,940-Speed 2623.46 samples/sec   Loss 5.1328   LearningRate 0.0147   Epoch: 12   Global Step: 511530   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:10,843-Speed 2624.11 samples/sec   Loss 5.1621   LearningRate 0.0147   Epoch: 12   Global Step: 511540   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:14,751-Speed 2621.06 samples/sec   Loss 5.1185   LearningRate 0.0147   Epoch: 12   Global Step: 511550   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:18,647-Speed 2628.77 samples/sec   Loss 5.1408   LearningRate 0.0147   Epoch: 12   Global Step: 511560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:13:22,554-Speed 2621.88 samples/sec   Loss 5.1366   LearningRate 0.0147   Epoch: 12   Global Step: 511570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:13:26,454-Speed 2626.67 samples/sec   Loss 5.0487   LearningRate 0.0147   Epoch: 12   Global Step: 511580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:13:30,361-Speed 2621.65 samples/sec   Loss 5.0590   LearningRate 0.0147   Epoch: 12   Global Step: 511590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:13:34,262-Speed 2626.16 samples/sec   Loss 5.1309   LearningRate 0.0147   Epoch: 12   Global Step: 511600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:13:38,159-Speed 2627.89 samples/sec   Loss 5.1755   LearningRate 0.0147   Epoch: 12   Global Step: 511610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:13:42,037-Speed 2640.80 samples/sec   Loss 5.1965   LearningRate 0.0147   Epoch: 12   Global Step: 511620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:45,935-Speed 2627.98 samples/sec   Loss 5.1029   LearningRate 0.0147   Epoch: 12   Global Step: 511630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:49,873-Speed 2614.66 samples/sec   Loss 5.1068   LearningRate 0.0147   Epoch: 12   Global Step: 511640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:53,772-Speed 2626.70 samples/sec   Loss 5.2060   LearningRate 0.0147   Epoch: 12   Global Step: 511650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:13:57,734-Speed 2597.26 samples/sec   Loss 5.1872   LearningRate 0.0147   Epoch: 12   Global Step: 511660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:14:01,635-Speed 2625.92 samples/sec   Loss 5.1390   LearningRate 0.0147   Epoch: 12   Global Step: 511670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:14:05,541-Speed 2621.88 samples/sec   Loss 5.0365   LearningRate 0.0147   Epoch: 12   Global Step: 511680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:14:09,437-Speed 2628.45 samples/sec   Loss 5.2693   LearningRate 0.0147   Epoch: 12   Global Step: 511690   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:14:13,834-Speed 2628.66 samples/sec   Loss 5.2483   LearningRate 0.0147   Epoch: 12   Global Step: 511700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:14:17,743-Speed 2620.03 samples/sec   Loss 5.2465   LearningRate 0.0147   Epoch: 12   Global Step: 511710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:14:22,133-Speed 2624.26 samples/sec   Loss 5.1241   LearningRate 0.0147   Epoch: 12   Global Step: 511720   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:26,596-Speed 2589.65 samples/sec   Loss 5.1039   LearningRate 0.0147   Epoch: 12   Global Step: 511730   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:30,893-Speed 2615.65 samples/sec   Loss 5.1618   LearningRate 0.0147   Epoch: 12   Global Step: 511740   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:34,820-Speed 2608.33 samples/sec   Loss 5.1233   LearningRate 0.0147   Epoch: 12   Global Step: 511750   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:38,797-Speed 2622.49 samples/sec   Loss 5.1464   LearningRate 0.0147   Epoch: 12   Global Step: 511760   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:42,701-Speed 2623.70 samples/sec   Loss 5.0474   LearningRate 0.0147   Epoch: 12   Global Step: 511770   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:46,629-Speed 2607.49 samples/sec   Loss 5.0529   LearningRate 0.0147   Epoch: 12   Global Step: 511780   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:50,529-Speed 2625.95 samples/sec   Loss 5.2171   LearningRate 0.0147   Epoch: 12   Global Step: 511790   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:54,433-Speed 2624.34 samples/sec   Loss 5.1297   LearningRate 0.0147   Epoch: 12   Global Step: 511800   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:14:58,348-Speed 2616.20 samples/sec   Loss 5.1920   LearningRate 0.0147   Epoch: 12   Global Step: 511810   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:15:02,220-Speed 2645.24 samples/sec   Loss 5.0838   LearningRate 0.0147   Epoch: 12   Global Step: 511820   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:15:06,131-Speed 2618.64 samples/sec   Loss 5.2511   LearningRate 0.0147   Epoch: 12   Global Step: 511830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:15:10,035-Speed 2624.17 samples/sec   Loss 5.1806   LearningRate 0.0147   Epoch: 12   Global Step: 511840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:15:14,032-Speed 2562.22 samples/sec   Loss 5.1319   LearningRate 0.0147   Epoch: 12   Global Step: 511850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:15:18,055-Speed 2546.04 samples/sec   Loss 5.1742   LearningRate 0.0147   Epoch: 12   Global Step: 511860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:15:22,052-Speed 2562.44 samples/sec   Loss 5.1578   LearningRate 0.0147   Epoch: 12   Global Step: 511870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:15:26,152-Speed 2498.66 samples/sec   Loss 5.1165   LearningRate 0.0147   Epoch: 12   Global Step: 511880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:30,239-Speed 2506.63 samples/sec   Loss 5.2018   LearningRate 0.0147   Epoch: 12   Global Step: 511890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:34,186-Speed 2594.69 samples/sec   Loss 5.1237   LearningRate 0.0147   Epoch: 12   Global Step: 511900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:38,096-Speed 2620.07 samples/sec   Loss 5.1133   LearningRate 0.0147   Epoch: 12   Global Step: 511910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:41,995-Speed 2626.80 samples/sec   Loss 5.0925   LearningRate 0.0147   Epoch: 12   Global Step: 511920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:45,918-Speed 2610.72 samples/sec   Loss 5.1415   LearningRate 0.0147   Epoch: 12   Global Step: 511930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:49,870-Speed 2592.07 samples/sec   Loss 5.1681   LearningRate 0.0147   Epoch: 12   Global Step: 511940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:53,784-Speed 2617.04 samples/sec   Loss 5.2498   LearningRate 0.0147   Epoch: 12   Global Step: 511950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:15:57,710-Speed 2608.50 samples/sec   Loss 5.1003   LearningRate 0.0147   Epoch: 12   Global Step: 511960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:01,614-Speed 2623.70 samples/sec   Loss 5.1930   LearningRate 0.0147   Epoch: 12   Global Step: 511970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:05,514-Speed 2626.15 samples/sec   Loss 5.2134   LearningRate 0.0147   Epoch: 12   Global Step: 511980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:16:09,434-Speed 2613.33 samples/sec   Loss 5.1883   LearningRate 0.0147   Epoch: 12   Global Step: 511990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:16:13,320-Speed 2635.95 samples/sec   Loss 5.0833   LearningRate 0.0147   Epoch: 12   Global Step: 512000   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:17,228-Speed 2620.94 samples/sec   Loss 5.1078   LearningRate 0.0147   Epoch: 12   Global Step: 512010   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:21,134-Speed 2622.17 samples/sec   Loss 5.1930   LearningRate 0.0147   Epoch: 12   Global Step: 512020   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:25,040-Speed 2622.56 samples/sec   Loss 5.1164   LearningRate 0.0147   Epoch: 12   Global Step: 512030   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:28,957-Speed 2614.55 samples/sec   Loss 5.0166   LearningRate 0.0147   Epoch: 12   Global Step: 512040   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:32,875-Speed 2613.67 samples/sec   Loss 5.0696   LearningRate 0.0147   Epoch: 12   Global Step: 512050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:36,782-Speed 2622.08 samples/sec   Loss 5.0705   LearningRate 0.0146   Epoch: 12   Global Step: 512060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:40,681-Speed 2627.27 samples/sec   Loss 5.1386   LearningRate 0.0146   Epoch: 12   Global Step: 512070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:44,584-Speed 2623.91 samples/sec   Loss 5.0709   LearningRate 0.0146   Epoch: 12   Global Step: 512080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:48,515-Speed 2606.38 samples/sec   Loss 5.0901   LearningRate 0.0146   Epoch: 12   Global Step: 512090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:16:52,411-Speed 2628.94 samples/sec   Loss 5.1066   LearningRate 0.0146   Epoch: 12   Global Step: 512100   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:16:56,291-Speed 2640.25 samples/sec   Loss 5.2346   LearningRate 0.0146   Epoch: 12   Global Step: 512110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:00,211-Speed 2612.70 samples/sec   Loss 5.1356   LearningRate 0.0146   Epoch: 12   Global Step: 512120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:04,115-Speed 2623.51 samples/sec   Loss 5.0945   LearningRate 0.0146   Epoch: 12   Global Step: 512130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:08,017-Speed 2624.54 samples/sec   Loss 5.1498   LearningRate 0.0146   Epoch: 12   Global Step: 512140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:11,930-Speed 2617.68 samples/sec   Loss 5.1840   LearningRate 0.0146   Epoch: 12   Global Step: 512150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:15,836-Speed 2622.81 samples/sec   Loss 5.2147   LearningRate 0.0146   Epoch: 12   Global Step: 512160   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:19,738-Speed 2624.77 samples/sec   Loss 5.2443   LearningRate 0.0146   Epoch: 12   Global Step: 512170   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:23,663-Speed 2609.26 samples/sec   Loss 5.2181   LearningRate 0.0146   Epoch: 12   Global Step: 512180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:27,567-Speed 2623.88 samples/sec   Loss 5.0886   LearningRate 0.0146   Epoch: 12   Global Step: 512190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:31,616-Speed 2529.82 samples/sec   Loss 5.2108   LearningRate 0.0146   Epoch: 12   Global Step: 512200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:35,523-Speed 2621.45 samples/sec   Loss 5.1111   LearningRate 0.0146   Epoch: 12   Global Step: 512210   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:17:39,425-Speed 2624.97 samples/sec   Loss 5.1886   LearningRate 0.0146   Epoch: 12   Global Step: 512220   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:17:43,327-Speed 2624.93 samples/sec   Loss 5.0924   LearningRate 0.0146   Epoch: 12   Global Step: 512230   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:17:47,213-Speed 2635.42 samples/sec   Loss 5.1180   LearningRate 0.0146   Epoch: 12   Global Step: 512240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:51,114-Speed 2625.42 samples/sec   Loss 5.2748   LearningRate 0.0146   Epoch: 12   Global Step: 512250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:55,048-Speed 2603.96 samples/sec   Loss 5.0590   LearningRate 0.0146   Epoch: 12   Global Step: 512260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:17:58,949-Speed 2625.65 samples/sec   Loss 5.1837   LearningRate 0.0146   Epoch: 12   Global Step: 512270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:18:02,854-Speed 2622.65 samples/sec   Loss 5.1412   LearningRate 0.0146   Epoch: 12   Global Step: 512280   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:18:06,756-Speed 2624.65 samples/sec   Loss 5.0760   LearningRate 0.0146   Epoch: 12   Global Step: 512290   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:18:10,656-Speed 2625.96 samples/sec   Loss 5.1221   LearningRate 0.0146   Epoch: 12   Global Step: 512300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:18:14,557-Speed 2626.44 samples/sec   Loss 5.0862   LearningRate 0.0146   Epoch: 12   Global Step: 512310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:18:18,457-Speed 2626.11 samples/sec   Loss 5.1486   LearningRate 0.0146   Epoch: 12   Global Step: 512320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:18:22,360-Speed 2624.32 samples/sec   Loss 5.0470   LearningRate 0.0146   Epoch: 12   Global Step: 512330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:18:26,265-Speed 2622.82 samples/sec   Loss 5.0791   LearningRate 0.0146   Epoch: 12   Global Step: 512340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:30,169-Speed 2624.20 samples/sec   Loss 5.1669   LearningRate 0.0146   Epoch: 12   Global Step: 512350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:34,069-Speed 2625.87 samples/sec   Loss 5.1457   LearningRate 0.0146   Epoch: 12   Global Step: 512360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:37,973-Speed 2622.99 samples/sec   Loss 5.0190   LearningRate 0.0146   Epoch: 12   Global Step: 512370   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:41,876-Speed 2624.42 samples/sec   Loss 5.1970   LearningRate 0.0146   Epoch: 12   Global Step: 512380   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:45,784-Speed 2621.48 samples/sec   Loss 5.1031   LearningRate 0.0146   Epoch: 12   Global Step: 512390   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:49,689-Speed 2623.01 samples/sec   Loss 5.0515   LearningRate 0.0146   Epoch: 12   Global Step: 512400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:53,587-Speed 2627.54 samples/sec   Loss 5.1731   LearningRate 0.0146   Epoch: 12   Global Step: 512410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:18:57,490-Speed 2624.21 samples/sec   Loss 5.1367   LearningRate 0.0146   Epoch: 12   Global Step: 512420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:19:01,391-Speed 2625.37 samples/sec   Loss 5.0912   LearningRate 0.0146   Epoch: 12   Global Step: 512430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:19:05,294-Speed 2624.17 samples/sec   Loss 5.2004   LearningRate 0.0146   Epoch: 12   Global Step: 512440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:19:09,213-Speed 2613.68 samples/sec   Loss 5.1106   LearningRate 0.0146   Epoch: 12   Global Step: 512450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:19:13,097-Speed 2636.76 samples/sec   Loss 5.0932   LearningRate 0.0146   Epoch: 12   Global Step: 512460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:17,015-Speed 2614.56 samples/sec   Loss 5.0340   LearningRate 0.0146   Epoch: 12   Global Step: 512470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:20,931-Speed 2615.80 samples/sec   Loss 5.1450   LearningRate 0.0146   Epoch: 12   Global Step: 512480   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:24,844-Speed 2617.70 samples/sec   Loss 5.1097   LearningRate 0.0146   Epoch: 12   Global Step: 512490   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:28,774-Speed 2606.21 samples/sec   Loss 5.1209   LearningRate 0.0146   Epoch: 12   Global Step: 512500   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:32,730-Speed 2589.23 samples/sec   Loss 5.1677   LearningRate 0.0146   Epoch: 12   Global Step: 512510   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:36,644-Speed 2617.17 samples/sec   Loss 5.1772   LearningRate 0.0146   Epoch: 12   Global Step: 512520   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:40,567-Speed 2610.47 samples/sec   Loss 5.0794   LearningRate 0.0146   Epoch: 12   Global Step: 512530   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:44,475-Speed 2621.12 samples/sec   Loss 5.1816   LearningRate 0.0146   Epoch: 12   Global Step: 512540   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:48,390-Speed 2616.38 samples/sec   Loss 5.1003   LearningRate 0.0146   Epoch: 12   Global Step: 512550   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:19:52,303-Speed 2617.54 samples/sec   Loss 5.1877   LearningRate 0.0146   Epoch: 12   Global Step: 512560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:19:56,226-Speed 2611.53 samples/sec   Loss 5.1823   LearningRate 0.0146   Epoch: 12   Global Step: 512570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:20:00,133-Speed 2621.56 samples/sec   Loss 5.1856   LearningRate 0.0146   Epoch: 12   Global Step: 512580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:20:04,038-Speed 2622.56 samples/sec   Loss 5.0398   LearningRate 0.0146   Epoch: 12   Global Step: 512590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:20:07,950-Speed 2618.35 samples/sec   Loss 5.1792   LearningRate 0.0146   Epoch: 12   Global Step: 512600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:20:11,850-Speed 2626.15 samples/sec   Loss 5.1648   LearningRate 0.0146   Epoch: 12   Global Step: 512610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:20:15,751-Speed 2626.00 samples/sec   Loss 5.0206   LearningRate 0.0146   Epoch: 12   Global Step: 512620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:20:19,638-Speed 2634.77 samples/sec   Loss 5.0943   LearningRate 0.0146   Epoch: 12   Global Step: 512630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:23,539-Speed 2625.46 samples/sec   Loss 5.1173   LearningRate 0.0146   Epoch: 12   Global Step: 512640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:27,447-Speed 2620.85 samples/sec   Loss 5.2027   LearningRate 0.0146   Epoch: 12   Global Step: 512650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:31,389-Speed 2598.80 samples/sec   Loss 5.1345   LearningRate 0.0146   Epoch: 12   Global Step: 512660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:35,388-Speed 2561.70 samples/sec   Loss 5.1216   LearningRate 0.0146   Epoch: 12   Global Step: 512670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:39,293-Speed 2622.65 samples/sec   Loss 5.0590   LearningRate 0.0146   Epoch: 12   Global Step: 512680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:43,204-Speed 2619.67 samples/sec   Loss 5.1791   LearningRate 0.0146   Epoch: 12   Global Step: 512690   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:47,114-Speed 2619.32 samples/sec   Loss 5.1451   LearningRate 0.0146   Epoch: 12   Global Step: 512700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:51,018-Speed 2623.50 samples/sec   Loss 5.1262   LearningRate 0.0146   Epoch: 12   Global Step: 512710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:54,926-Speed 2620.87 samples/sec   Loss 5.2708   LearningRate 0.0146   Epoch: 12   Global Step: 512720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:20:58,831-Speed 2623.32 samples/sec   Loss 5.1204   LearningRate 0.0146   Epoch: 12   Global Step: 512730   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:21:02,744-Speed 2617.73 samples/sec   Loss 5.1834   LearningRate 0.0146   Epoch: 12   Global Step: 512740   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:21:06,657-Speed 2617.54 samples/sec   Loss 5.1195   LearningRate 0.0146   Epoch: 12   Global Step: 512750   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:21:10,564-Speed 2621.53 samples/sec   Loss 4.9299   LearningRate 0.0146   Epoch: 12   Global Step: 512760   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:21:14,447-Speed 2638.44 samples/sec   Loss 5.0886   LearningRate 0.0146   Epoch: 12   Global Step: 512770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:18,349-Speed 2624.90 samples/sec   Loss 5.0951   LearningRate 0.0146   Epoch: 12   Global Step: 512780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:22,250-Speed 2625.71 samples/sec   Loss 5.1849   LearningRate 0.0146   Epoch: 12   Global Step: 512790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:26,145-Speed 2629.30 samples/sec   Loss 5.0942   LearningRate 0.0146   Epoch: 12   Global Step: 512800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:30,053-Speed 2621.46 samples/sec   Loss 5.1258   LearningRate 0.0146   Epoch: 12   Global Step: 512810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:33,980-Speed 2607.70 samples/sec   Loss 5.1938   LearningRate 0.0146   Epoch: 12   Global Step: 512820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:37,890-Speed 2625.26 samples/sec   Loss 5.0692   LearningRate 0.0146   Epoch: 12   Global Step: 512830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:41,792-Speed 2625.48 samples/sec   Loss 5.0771   LearningRate 0.0146   Epoch: 12   Global Step: 512840   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:45,692-Speed 2626.99 samples/sec   Loss 5.1723   LearningRate 0.0146   Epoch: 12   Global Step: 512850   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:49,609-Speed 2614.26 samples/sec   Loss 5.0552   LearningRate 0.0146   Epoch: 12   Global Step: 512860   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:21:53,517-Speed 2620.74 samples/sec   Loss 5.0905   LearningRate 0.0146   Epoch: 12   Global Step: 512870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:21:57,412-Speed 2629.63 samples/sec   Loss 5.1097   LearningRate 0.0146   Epoch: 12   Global Step: 512880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:01,332-Speed 2614.08 samples/sec   Loss 5.0490   LearningRate 0.0146   Epoch: 12   Global Step: 512890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:05,239-Speed 2621.67 samples/sec   Loss 5.1861   LearningRate 0.0146   Epoch: 12   Global Step: 512900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:09,140-Speed 2625.93 samples/sec   Loss 5.1669   LearningRate 0.0146   Epoch: 12   Global Step: 512910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:13,044-Speed 2623.58 samples/sec   Loss 5.0699   LearningRate 0.0146   Epoch: 12   Global Step: 512920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:16,961-Speed 2615.34 samples/sec   Loss 5.0978   LearningRate 0.0146   Epoch: 12   Global Step: 512930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:20,873-Speed 2618.35 samples/sec   Loss 5.1967   LearningRate 0.0146   Epoch: 12   Global Step: 512940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:24,790-Speed 2614.62 samples/sec   Loss 5.0210   LearningRate 0.0146   Epoch: 12   Global Step: 512950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:28,740-Speed 2593.00 samples/sec   Loss 5.0490   LearningRate 0.0146   Epoch: 12   Global Step: 512960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:32,662-Speed 2611.04 samples/sec   Loss 5.0491   LearningRate 0.0146   Epoch: 12   Global Step: 512970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:22:36,567-Speed 2623.51 samples/sec   Loss 5.1246   LearningRate 0.0146   Epoch: 12   Global Step: 512980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:22:40,466-Speed 2627.80 samples/sec   Loss 5.1451   LearningRate 0.0146   Epoch: 12   Global Step: 512990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:22:44,366-Speed 2625.85 samples/sec   Loss 5.0888   LearningRate 0.0146   Epoch: 12   Global Step: 513000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:22:48,269-Speed 2624.21 samples/sec   Loss 5.1491   LearningRate 0.0146   Epoch: 12   Global Step: 513010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:22:52,169-Speed 2626.71 samples/sec   Loss 5.0953   LearningRate 0.0146   Epoch: 12   Global Step: 513020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:22:56,070-Speed 2625.66 samples/sec   Loss 5.1491   LearningRate 0.0146   Epoch: 12   Global Step: 513030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:23:00,000-Speed 2606.00 samples/sec   Loss 5.0223   LearningRate 0.0146   Epoch: 12   Global Step: 513040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:23:03,903-Speed 2624.26 samples/sec   Loss 5.1310   LearningRate 0.0146   Epoch: 12   Global Step: 513050   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:23:07,779-Speed 2642.82 samples/sec   Loss 5.1267   LearningRate 0.0146   Epoch: 12   Global Step: 513060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:11,684-Speed 2622.78 samples/sec   Loss 5.1616   LearningRate 0.0146   Epoch: 12   Global Step: 513070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:15,948-Speed 2402.16 samples/sec   Loss 5.1236   LearningRate 0.0146   Epoch: 12   Global Step: 513080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:19,852-Speed 2623.39 samples/sec   Loss 5.0730   LearningRate 0.0146   Epoch: 12   Global Step: 513090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:23,756-Speed 2623.76 samples/sec   Loss 5.1332   LearningRate 0.0146   Epoch: 12   Global Step: 513100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:27,658-Speed 2625.16 samples/sec   Loss 5.1509   LearningRate 0.0146   Epoch: 12   Global Step: 513110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:31,564-Speed 2622.79 samples/sec   Loss 5.1502   LearningRate 0.0146   Epoch: 12   Global Step: 513120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:35,470-Speed 2622.10 samples/sec   Loss 5.1192   LearningRate 0.0146   Epoch: 12   Global Step: 513130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:39,378-Speed 2620.96 samples/sec   Loss 5.1745   LearningRate 0.0146   Epoch: 12   Global Step: 513140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:43,290-Speed 2617.96 samples/sec   Loss 5.1265   LearningRate 0.0145   Epoch: 12   Global Step: 513150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:23:47,190-Speed 2626.16 samples/sec   Loss 5.1587   LearningRate 0.0145   Epoch: 12   Global Step: 513160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:23:51,089-Speed 2626.95 samples/sec   Loss 5.0746   LearningRate 0.0145   Epoch: 12   Global Step: 513170   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:23:54,993-Speed 2623.54 samples/sec   Loss 5.2163   LearningRate 0.0145   Epoch: 12   Global Step: 513180   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:23:58,914-Speed 2617.44 samples/sec   Loss 5.1929   LearningRate 0.0145   Epoch: 12   Global Step: 513190   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:24:02,799-Speed 2636.14 samples/sec   Loss 5.1628   LearningRate 0.0145   Epoch: 12   Global Step: 513200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:24:06,702-Speed 2624.40 samples/sec   Loss 5.1267   LearningRate 0.0145   Epoch: 12   Global Step: 513210   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:24:10,628-Speed 2608.54 samples/sec   Loss 5.0901   LearningRate 0.0145   Epoch: 12   Global Step: 513220   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:24:14,555-Speed 2608.64 samples/sec   Loss 5.0287   LearningRate 0.0145   Epoch: 12   Global Step: 513230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:24:18,470-Speed 2615.82 samples/sec   Loss 5.1239   LearningRate 0.0145   Epoch: 12   Global Step: 513240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:24:22,380-Speed 2619.85 samples/sec   Loss 5.1168   LearningRate 0.0145   Epoch: 12   Global Step: 513250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:24:26,282-Speed 2624.57 samples/sec   Loss 4.9921   LearningRate 0.0145   Epoch: 12   Global Step: 513260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:24:30,159-Speed 2641.97 samples/sec   Loss 5.2187   LearningRate 0.0145   Epoch: 12   Global Step: 513270   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:24:34,061-Speed 2625.29 samples/sec   Loss 5.1013   LearningRate 0.0145   Epoch: 12   Global Step: 513280   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:24:37,959-Speed 2627.49 samples/sec   Loss 5.1986   LearningRate 0.0145   Epoch: 12   Global Step: 513290   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:24:41,864-Speed 2622.86 samples/sec   Loss 5.1399   LearningRate 0.0145   Epoch: 12   Global Step: 513300   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:24:45,772-Speed 2620.68 samples/sec   Loss 5.1767   LearningRate 0.0145   Epoch: 12   Global Step: 513310   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:24:49,681-Speed 2620.60 samples/sec   Loss 5.1494   LearningRate 0.0145   Epoch: 12   Global Step: 513320   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:24:53,589-Speed 2620.96 samples/sec   Loss 5.1434   LearningRate 0.0145   Epoch: 12   Global Step: 513330   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:24:57,494-Speed 2622.31 samples/sec   Loss 5.1449   LearningRate 0.0145   Epoch: 12   Global Step: 513340   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:25:01,398-Speed 2623.84 samples/sec   Loss 5.1020   LearningRate 0.0145   Epoch: 12   Global Step: 513350   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:25:05,299-Speed 2625.31 samples/sec   Loss 5.0797   LearningRate 0.0145   Epoch: 12   Global Step: 513360   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-04-15 05:25:09,201-Speed 2625.10 samples/sec   Loss 5.0296   LearningRate 0.0145   Epoch: 12   Global Step: 513370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:13,102-Speed 2625.46 samples/sec   Loss 5.0910   LearningRate 0.0145   Epoch: 12   Global Step: 513380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:17,006-Speed 2624.25 samples/sec   Loss 5.1390   LearningRate 0.0145   Epoch: 12   Global Step: 513390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:20,908-Speed 2627.12 samples/sec   Loss 5.1322   LearningRate 0.0145   Epoch: 12   Global Step: 513400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:24,811-Speed 2624.10 samples/sec   Loss 5.0075   LearningRate 0.0145   Epoch: 12   Global Step: 513410   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:28,723-Speed 2618.36 samples/sec   Loss 5.0085   LearningRate 0.0145   Epoch: 12   Global Step: 513420   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:32,643-Speed 2612.79 samples/sec   Loss 5.1374   LearningRate 0.0145   Epoch: 12   Global Step: 513430   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:36,544-Speed 2625.31 samples/sec   Loss 5.2815   LearningRate 0.0145   Epoch: 12   Global Step: 513440   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:40,444-Speed 2625.98 samples/sec   Loss 5.1890   LearningRate 0.0145   Epoch: 12   Global Step: 513450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:44,345-Speed 2626.93 samples/sec   Loss 5.0443   LearningRate 0.0145   Epoch: 12   Global Step: 513460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:25:48,247-Speed 2624.96 samples/sec   Loss 5.0866   LearningRate 0.0145   Epoch: 12   Global Step: 513470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:25:52,145-Speed 2627.18 samples/sec   Loss 5.0369   LearningRate 0.0145   Epoch: 12   Global Step: 513480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:25:56,047-Speed 2624.76 samples/sec   Loss 5.1485   LearningRate 0.0145   Epoch: 12   Global Step: 513490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:25:59,948-Speed 2626.35 samples/sec   Loss 5.1560   LearningRate 0.0145   Epoch: 12   Global Step: 513500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:26:03,849-Speed 2625.00 samples/sec   Loss 5.1231   LearningRate 0.0145   Epoch: 12   Global Step: 513510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:26:07,751-Speed 2625.11 samples/sec   Loss 5.1724   LearningRate 0.0145   Epoch: 12   Global Step: 513520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:26:11,656-Speed 2622.82 samples/sec   Loss 5.1555   LearningRate 0.0145   Epoch: 12   Global Step: 513530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:26:15,564-Speed 2620.64 samples/sec   Loss 5.0237   LearningRate 0.0145   Epoch: 12   Global Step: 513540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:26:19,467-Speed 2624.33 samples/sec   Loss 5.1212   LearningRate 0.0145   Epoch: 12   Global Step: 513550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:26:23,373-Speed 2622.14 samples/sec   Loss 5.0420   LearningRate 0.0145   Epoch: 12   Global Step: 513560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-15 05:26:27,231-Speed 2655.64 samples/sec   Loss 5.0401   LearningRate 0.0145   Epoch: 12   Global Step: 513570   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:31,136-Speed 2622.67 samples/sec   Loss 5.1891   LearningRate 0.0145   Epoch: 12   Global Step: 513580   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:35,060-Speed 2610.37 samples/sec   Loss 5.1328   LearningRate 0.0145   Epoch: 12   Global Step: 513590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:38,960-Speed 2626.00 samples/sec   Loss 5.1843   LearningRate 0.0145   Epoch: 12   Global Step: 513600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:42,872-Speed 2618.08 samples/sec   Loss 5.1138   LearningRate 0.0145   Epoch: 12   Global Step: 513610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:46,772-Speed 2626.73 samples/sec   Loss 5.1647   LearningRate 0.0145   Epoch: 12   Global Step: 513620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:50,673-Speed 2625.57 samples/sec   Loss 4.9885   LearningRate 0.0145   Epoch: 12   Global Step: 513630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:54,580-Speed 2621.44 samples/sec   Loss 5.0849   LearningRate 0.0145   Epoch: 12   Global Step: 513640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:26:58,482-Speed 2624.22 samples/sec   Loss 5.0111   LearningRate 0.0145   Epoch: 12   Global Step: 513650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-15 05:27:02,381-Speed 2627.53 samples/sec   Loss 5.0314   LearningRate 0.0145   Epoch: 12   Global Step: 513660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:27:06,280-Speed 2627.67 samples/sec   Loss 5.1301   LearningRate 0.0145   Epoch: 12   Global Step: 513670   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:10,180-Speed 2626.36 samples/sec   Loss 5.1238   LearningRate 0.0145   Epoch: 12   Global Step: 513680   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:14,088-Speed 2621.07 samples/sec   Loss 5.0693   LearningRate 0.0145   Epoch: 12   Global Step: 513690   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:17,989-Speed 2624.94 samples/sec   Loss 5.0783   LearningRate 0.0145   Epoch: 12   Global Step: 513700   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:21,891-Speed 2624.93 samples/sec   Loss 5.1628   LearningRate 0.0145   Epoch: 12   Global Step: 513710   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:25,794-Speed 2624.12 samples/sec   Loss 5.0640   LearningRate 0.0145   Epoch: 12   Global Step: 513720   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:29,697-Speed 2624.03 samples/sec   Loss 5.0051   LearningRate 0.0145   Epoch: 12   Global Step: 513730   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:33,598-Speed 2625.50 samples/sec   Loss 5.0957   LearningRate 0.0145   Epoch: 12   Global Step: 513740   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:37,518-Speed 2613.37 samples/sec   Loss 5.2434   LearningRate 0.0145   Epoch: 12   Global Step: 513750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:41,424-Speed 2622.19 samples/sec   Loss 5.2651   LearningRate 0.0145   Epoch: 12   Global Step: 513760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:45,327-Speed 2624.32 samples/sec   Loss 5.1686   LearningRate 0.0145   Epoch: 12   Global Step: 513770   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-15 05:27:49,211-Speed 2637.06 samples/sec   Loss 5.1188   LearningRate 0.0145   Epoch: 12   Global Step: 513780   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:53,114-Speed 2624.24 samples/sec   Loss 5.1756   LearningRate 0.0145   Epoch: 12   Global Step: 513790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:27:57,013-Speed 2626.68 samples/sec   Loss 5.1816   LearningRate 0.0145   Epoch: 12   Global Step: 513800   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:28:00,923-Speed 2627.71 samples/sec   Loss 5.0820   LearningRate 0.0145   Epoch: 12   Global Step: 513810   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:28:04,806-Speed 2637.96 samples/sec   Loss 4.9793   LearningRate 0.0145   Epoch: 12   Global Step: 513820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:08,704-Speed 2627.41 samples/sec   Loss 5.1255   LearningRate 0.0145   Epoch: 12   Global Step: 513830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:12,670-Speed 2582.16 samples/sec   Loss 5.1826   LearningRate 0.0145   Epoch: 12   Global Step: 513840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:16,577-Speed 2622.27 samples/sec   Loss 5.1092   LearningRate 0.0145   Epoch: 12   Global Step: 513850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:20,476-Speed 2626.94 samples/sec   Loss 5.0948   LearningRate 0.0145   Epoch: 12   Global Step: 513860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:24,380-Speed 2623.51 samples/sec   Loss 5.0824   LearningRate 0.0145   Epoch: 12   Global Step: 513870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:28,289-Speed 2619.99 samples/sec   Loss 5.0955   LearningRate 0.0145   Epoch: 12   Global Step: 513880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:32,197-Speed 2620.83 samples/sec   Loss 5.2076   LearningRate 0.0145   Epoch: 12   Global Step: 513890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:36,101-Speed 2623.79 samples/sec   Loss 5.1026   LearningRate 0.0145   Epoch: 12   Global Step: 513900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:40,008-Speed 2621.40 samples/sec   Loss 5.1006   LearningRate 0.0145   Epoch: 12   Global Step: 513910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:43,916-Speed 2621.04 samples/sec   Loss 5.0090   LearningRate 0.0145   Epoch: 12   Global Step: 513920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:28:47,820-Speed 2623.38 samples/sec   Loss 5.1145   LearningRate 0.0145   Epoch: 12   Global Step: 513930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:28:51,721-Speed 2625.66 samples/sec   Loss 5.0957   LearningRate 0.0145   Epoch: 12   Global Step: 513940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:28:55,598-Speed 2642.12 samples/sec   Loss 5.1063   LearningRate 0.0145   Epoch: 12   Global Step: 513950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:28:59,495-Speed 2628.60 samples/sec   Loss 5.0890   LearningRate 0.0145   Epoch: 12   Global Step: 513960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:03,394-Speed 2626.25 samples/sec   Loss 5.0459   LearningRate 0.0145   Epoch: 12   Global Step: 513970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:07,298-Speed 2623.80 samples/sec   Loss 5.0318   LearningRate 0.0145   Epoch: 12   Global Step: 513980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:11,201-Speed 2623.90 samples/sec   Loss 5.1886   LearningRate 0.0145   Epoch: 12   Global Step: 513990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:15,102-Speed 2625.41 samples/sec   Loss 5.0793   LearningRate 0.0145   Epoch: 12   Global Step: 514000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:19,013-Speed 2619.40 samples/sec   Loss 5.1856   LearningRate 0.0145   Epoch: 12   Global Step: 514010   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:22,911-Speed 2627.59 samples/sec   Loss 5.0294   LearningRate 0.0145   Epoch: 12   Global Step: 514020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:26,810-Speed 2626.51 samples/sec   Loss 5.1708   LearningRate 0.0145   Epoch: 12   Global Step: 514030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:30,713-Speed 2624.81 samples/sec   Loss 5.1172   LearningRate 0.0145   Epoch: 12   Global Step: 514040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:34,615-Speed 2624.87 samples/sec   Loss 5.0698   LearningRate 0.0145   Epoch: 12   Global Step: 514050   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:29:38,515-Speed 2626.04 samples/sec   Loss 5.0910   LearningRate 0.0145   Epoch: 12   Global Step: 514060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:29:42,424-Speed 2620.67 samples/sec   Loss 5.1717   LearningRate 0.0145   Epoch: 12   Global Step: 514070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:29:46,331-Speed 2621.63 samples/sec   Loss 5.1185   LearningRate 0.0145   Epoch: 12   Global Step: 514080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:29:50,220-Speed 2633.40 samples/sec   Loss 5.0731   LearningRate 0.0145   Epoch: 12   Global Step: 514090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:54,121-Speed 2625.77 samples/sec   Loss 5.1994   LearningRate 0.0145   Epoch: 12   Global Step: 514100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:29:58,019-Speed 2627.27 samples/sec   Loss 5.0588   LearningRate 0.0145   Epoch: 12   Global Step: 514110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:02,017-Speed 2561.80 samples/sec   Loss 5.1043   LearningRate 0.0145   Epoch: 12   Global Step: 514120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:05,919-Speed 2625.00 samples/sec   Loss 5.2301   LearningRate 0.0145   Epoch: 12   Global Step: 514130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:09,821-Speed 2624.79 samples/sec   Loss 5.1161   LearningRate 0.0145   Epoch: 12   Global Step: 514140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:13,722-Speed 2625.52 samples/sec   Loss 5.1097   LearningRate 0.0145   Epoch: 12   Global Step: 514150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:17,626-Speed 2623.69 samples/sec   Loss 5.1251   LearningRate 0.0145   Epoch: 12   Global Step: 514160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:21,525-Speed 2627.00 samples/sec   Loss 5.0752   LearningRate 0.0145   Epoch: 12   Global Step: 514170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:25,421-Speed 2628.93 samples/sec   Loss 5.1231   LearningRate 0.0145   Epoch: 12   Global Step: 514180   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:29,321-Speed 2627.33 samples/sec   Loss 5.0219   LearningRate 0.0145   Epoch: 12   Global Step: 514190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:30:33,231-Speed 2618.85 samples/sec   Loss 5.0923   LearningRate 0.0145   Epoch: 12   Global Step: 514200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:30:37,134-Speed 2624.14 samples/sec   Loss 5.1871   LearningRate 0.0145   Epoch: 12   Global Step: 514210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:30:41,030-Speed 2629.23 samples/sec   Loss 5.0919   LearningRate 0.0145   Epoch: 12   Global Step: 514220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:44,936-Speed 2622.18 samples/sec   Loss 5.1117   LearningRate 0.0145   Epoch: 12   Global Step: 514230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:48,857-Speed 2612.31 samples/sec   Loss 5.0931   LearningRate 0.0144   Epoch: 12   Global Step: 514240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:52,760-Speed 2624.31 samples/sec   Loss 4.9891   LearningRate 0.0144   Epoch: 12   Global Step: 514250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:30:56,739-Speed 2574.34 samples/sec   Loss 5.0893   LearningRate 0.0144   Epoch: 12   Global Step: 514260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:00,836-Speed 2499.90 samples/sec   Loss 5.0545   LearningRate 0.0144   Epoch: 12   Global Step: 514270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:04,760-Speed 2609.82 samples/sec   Loss 5.1187   LearningRate 0.0144   Epoch: 12   Global Step: 514280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:08,662-Speed 2625.28 samples/sec   Loss 5.0467   LearningRate 0.0144   Epoch: 12   Global Step: 514290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:12,561-Speed 2626.77 samples/sec   Loss 5.1615   LearningRate 0.0144   Epoch: 12   Global Step: 514300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:16,464-Speed 2624.04 samples/sec   Loss 5.0692   LearningRate 0.0144   Epoch: 12   Global Step: 514310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:20,362-Speed 2627.43 samples/sec   Loss 5.1794   LearningRate 0.0144   Epoch: 12   Global Step: 514320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:31:24,257-Speed 2630.11 samples/sec   Loss 5.1411   LearningRate 0.0144   Epoch: 12   Global Step: 514330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:31:28,157-Speed 2626.05 samples/sec   Loss 5.0337   LearningRate 0.0144   Epoch: 12   Global Step: 514340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:31:32,031-Speed 2644.60 samples/sec   Loss 5.0342   LearningRate 0.0144   Epoch: 12   Global Step: 514350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:35,956-Speed 2608.85 samples/sec   Loss 5.1736   LearningRate 0.0144   Epoch: 12   Global Step: 514360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:39,863-Speed 2621.92 samples/sec   Loss 5.1845   LearningRate 0.0144   Epoch: 12   Global Step: 514370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:43,764-Speed 2624.88 samples/sec   Loss 5.0700   LearningRate 0.0144   Epoch: 12   Global Step: 514380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:47,659-Speed 2630.31 samples/sec   Loss 5.0972   LearningRate 0.0144   Epoch: 12   Global Step: 514390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:51,583-Speed 2610.19 samples/sec   Loss 5.1155   LearningRate 0.0144   Epoch: 12   Global Step: 514400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:55,486-Speed 2623.95 samples/sec   Loss 5.0553   LearningRate 0.0144   Epoch: 12   Global Step: 514410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:31:59,390-Speed 2623.49 samples/sec   Loss 5.1088   LearningRate 0.0144   Epoch: 12   Global Step: 514420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:03,289-Speed 2626.96 samples/sec   Loss 5.0457   LearningRate 0.0144   Epoch: 12   Global Step: 514430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:07,198-Speed 2620.38 samples/sec   Loss 5.0747   LearningRate 0.0144   Epoch: 12   Global Step: 514440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:11,097-Speed 2627.09 samples/sec   Loss 4.9994   LearningRate 0.0144   Epoch: 12   Global Step: 514450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:32:14,992-Speed 2629.36 samples/sec   Loss 5.1037   LearningRate 0.0144   Epoch: 12   Global Step: 514460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:32:18,889-Speed 2628.41 samples/sec   Loss 5.1740   LearningRate 0.0144   Epoch: 12   Global Step: 514470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:32:22,761-Speed 2644.71 samples/sec   Loss 5.0165   LearningRate 0.0144   Epoch: 12   Global Step: 514480   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:26,661-Speed 2626.55 samples/sec   Loss 5.0448   LearningRate 0.0144   Epoch: 12   Global Step: 514490   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:30,559-Speed 2627.32 samples/sec   Loss 5.0268   LearningRate 0.0144   Epoch: 12   Global Step: 514500   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:34,461-Speed 2624.94 samples/sec   Loss 5.1733   LearningRate 0.0144   Epoch: 12   Global Step: 514510   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:38,359-Speed 2627.65 samples/sec   Loss 5.1006   LearningRate 0.0144   Epoch: 12   Global Step: 514520   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:42,267-Speed 2621.56 samples/sec   Loss 5.0921   LearningRate 0.0144   Epoch: 12   Global Step: 514530   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:46,166-Speed 2627.24 samples/sec   Loss 4.9295   LearningRate 0.0144   Epoch: 12   Global Step: 514540   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:50,082-Speed 2615.42 samples/sec   Loss 5.0618   LearningRate 0.0144   Epoch: 12   Global Step: 514550   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:53,983-Speed 2625.38 samples/sec   Loss 5.1600   LearningRate 0.0144   Epoch: 12   Global Step: 514560   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:32:57,891-Speed 2620.77 samples/sec   Loss 5.1425   LearningRate 0.0144   Epoch: 12   Global Step: 514570   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:33:01,793-Speed 2625.23 samples/sec   Loss 5.0572   LearningRate 0.0144   Epoch: 12   Global Step: 514580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:33:05,672-Speed 2640.66 samples/sec   Loss 5.0653   LearningRate 0.0144   Epoch: 12   Global Step: 514590   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:33:09,577-Speed 2622.69 samples/sec   Loss 5.0869   LearningRate 0.0144   Epoch: 12   Global Step: 514600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:33:13,456-Speed 2640.38 samples/sec   Loss 5.1467   LearningRate 0.0144   Epoch: 12   Global Step: 514610   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:17,355-Speed 2626.74 samples/sec   Loss 5.0649   LearningRate 0.0144   Epoch: 12   Global Step: 514620   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:21,261-Speed 2623.02 samples/sec   Loss 5.1150   LearningRate 0.0144   Epoch: 12   Global Step: 514630   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:25,157-Speed 2628.64 samples/sec   Loss 5.1874   LearningRate 0.0144   Epoch: 12   Global Step: 514640   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:29,053-Speed 2628.67 samples/sec   Loss 5.0654   LearningRate 0.0144   Epoch: 12   Global Step: 514650   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:33,032-Speed 2573.92 samples/sec   Loss 5.0884   LearningRate 0.0144   Epoch: 12   Global Step: 514660   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:36,930-Speed 2628.23 samples/sec   Loss 5.1233   LearningRate 0.0144   Epoch: 12   Global Step: 514670   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:40,829-Speed 2626.24 samples/sec   Loss 5.0178   LearningRate 0.0144   Epoch: 12   Global Step: 514680   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:44,735-Speed 2622.97 samples/sec   Loss 4.9983   LearningRate 0.0144   Epoch: 12   Global Step: 514690   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:48,643-Speed 2620.22 samples/sec   Loss 5.0432   LearningRate 0.0144   Epoch: 12   Global Step: 514700   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:33:52,538-Speed 2631.02 samples/sec   Loss 5.0833   LearningRate 0.0144   Epoch: 12   Global Step: 514710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:33:56,432-Speed 2630.12 samples/sec   Loss 5.1512   LearningRate 0.0144   Epoch: 12   Global Step: 514720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:00,330-Speed 2627.64 samples/sec   Loss 5.0331   LearningRate 0.0144   Epoch: 12   Global Step: 514730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:04,238-Speed 2620.66 samples/sec   Loss 5.1550   LearningRate 0.0144   Epoch: 12   Global Step: 514740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:08,138-Speed 2626.83 samples/sec   Loss 5.1543   LearningRate 0.0144   Epoch: 12   Global Step: 514750   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:12,035-Speed 2628.44 samples/sec   Loss 5.0990   LearningRate 0.0144   Epoch: 12   Global Step: 514760   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:15,934-Speed 2626.67 samples/sec   Loss 5.1354   LearningRate 0.0144   Epoch: 12   Global Step: 514770   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:19,836-Speed 2625.13 samples/sec   Loss 5.1826   LearningRate 0.0144   Epoch: 12   Global Step: 514780   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:23,746-Speed 2619.19 samples/sec   Loss 5.1565   LearningRate 0.0144   Epoch: 12   Global Step: 514790   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:27,651-Speed 2622.97 samples/sec   Loss 4.9358   LearningRate 0.0144   Epoch: 12   Global Step: 514800   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:34:31,551-Speed 2626.60 samples/sec   Loss 5.1209   LearningRate 0.0144   Epoch: 12   Global Step: 514810   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:34:35,446-Speed 2629.67 samples/sec   Loss 5.0518   LearningRate 0.0144   Epoch: 12   Global Step: 514820   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:34:39,342-Speed 2628.62 samples/sec   Loss 5.0834   LearningRate 0.0144   Epoch: 12   Global Step: 514830   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:34:43,243-Speed 2625.79 samples/sec   Loss 5.1016   LearningRate 0.0144   Epoch: 12   Global Step: 514840   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:34:47,148-Speed 2623.39 samples/sec   Loss 5.0484   LearningRate 0.0144   Epoch: 12   Global Step: 514850   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:34:51,054-Speed 2621.65 samples/sec   Loss 5.1495   LearningRate 0.0144   Epoch: 12   Global Step: 514860   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:34:54,970-Speed 2618.92 samples/sec   Loss 5.0783   LearningRate 0.0144   Epoch: 12   Global Step: 514870   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:34:58,869-Speed 2626.97 samples/sec   Loss 5.0728   LearningRate 0.0144   Epoch: 12   Global Step: 514880   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:02,778-Speed 2619.85 samples/sec   Loss 5.0773   LearningRate 0.0144   Epoch: 12   Global Step: 514890   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:06,672-Speed 2630.14 samples/sec   Loss 5.1861   LearningRate 0.0144   Epoch: 12   Global Step: 514900   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:10,552-Speed 2640.10 samples/sec   Loss 5.0749   LearningRate 0.0144   Epoch: 12   Global Step: 514910   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:14,450-Speed 2627.65 samples/sec   Loss 5.0921   LearningRate 0.0144   Epoch: 12   Global Step: 514920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:18,346-Speed 2629.36 samples/sec   Loss 4.9843   LearningRate 0.0144   Epoch: 12   Global Step: 514930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:22,242-Speed 2628.78 samples/sec   Loss 5.1471   LearningRate 0.0144   Epoch: 12   Global Step: 514940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:26,140-Speed 2627.70 samples/sec   Loss 5.0557   LearningRate 0.0144   Epoch: 12   Global Step: 514950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:30,041-Speed 2625.55 samples/sec   Loss 5.1962   LearningRate 0.0144   Epoch: 12   Global Step: 514960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:35:33,928-Speed 2635.01 samples/sec   Loss 5.0968   LearningRate 0.0144   Epoch: 12   Global Step: 514970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:35:37,836-Speed 2620.67 samples/sec   Loss 5.1454   LearningRate 0.0144   Epoch: 12   Global Step: 514980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:35:41,736-Speed 2625.69 samples/sec   Loss 5.1466   LearningRate 0.0144   Epoch: 12   Global Step: 514990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:35:45,624-Speed 2634.81 samples/sec   Loss 5.0428   LearningRate 0.0144   Epoch: 12   Global Step: 515000   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:35:49,524-Speed 2626.38 samples/sec   Loss 5.0757   LearningRate 0.0144   Epoch: 12   Global Step: 515010   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:35:53,437-Speed 2617.62 samples/sec   Loss 5.2200   LearningRate 0.0144   Epoch: 12   Global Step: 515020   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:35:57,349-Speed 2618.52 samples/sec   Loss 5.0356   LearningRate 0.0144   Epoch: 12   Global Step: 515030   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:36:01,247-Speed 2627.84 samples/sec   Loss 5.0143   LearningRate 0.0144   Epoch: 12   Global Step: 515040   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:36:05,147-Speed 2625.98 samples/sec   Loss 5.1874   LearningRate 0.0144   Epoch: 12   Global Step: 515050   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:36:09,052-Speed 2622.86 samples/sec   Loss 5.1000   LearningRate 0.0144   Epoch: 12   Global Step: 515060   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:36:12,948-Speed 2628.59 samples/sec   Loss 5.1293   LearningRate 0.0144   Epoch: 12   Global Step: 515070   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:36:16,842-Speed 2630.53 samples/sec   Loss 5.0291   LearningRate 0.0144   Epoch: 12   Global Step: 515080   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:36:20,741-Speed 2626.45 samples/sec   Loss 5.1234   LearningRate 0.0144   Epoch: 12   Global Step: 515090   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:36:24,646-Speed 2623.46 samples/sec   Loss 5.2065   LearningRate 0.0144   Epoch: 12   Global Step: 515100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:28,552-Speed 2622.65 samples/sec   Loss 5.0289   LearningRate 0.0144   Epoch: 12   Global Step: 515110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:32,457-Speed 2622.90 samples/sec   Loss 5.1002   LearningRate 0.0144   Epoch: 12   Global Step: 515120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:36,365-Speed 2620.56 samples/sec   Loss 5.0457   LearningRate 0.0144   Epoch: 12   Global Step: 515130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:40,261-Speed 2628.87 samples/sec   Loss 5.0532   LearningRate 0.0144   Epoch: 12   Global Step: 515140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:44,159-Speed 2627.47 samples/sec   Loss 5.1044   LearningRate 0.0144   Epoch: 12   Global Step: 515150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:48,060-Speed 2625.58 samples/sec   Loss 5.0598   LearningRate 0.0144   Epoch: 12   Global Step: 515160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:51,959-Speed 2630.83 samples/sec   Loss 5.0337   LearningRate 0.0144   Epoch: 12   Global Step: 515170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:55,857-Speed 2627.78 samples/sec   Loss 5.1139   LearningRate 0.0144   Epoch: 12   Global Step: 515180   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:36:59,756-Speed 2626.80 samples/sec   Loss 5.0196   LearningRate 0.0144   Epoch: 12   Global Step: 515190   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:03,658-Speed 2625.19 samples/sec   Loss 5.1677   LearningRate 0.0144   Epoch: 12   Global Step: 515200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:37:07,569-Speed 2618.67 samples/sec   Loss 5.0904   LearningRate 0.0144   Epoch: 12   Global Step: 515210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:37:11,450-Speed 2639.28 samples/sec   Loss 4.9786   LearningRate 0.0144   Epoch: 12   Global Step: 515220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:15,354-Speed 2623.65 samples/sec   Loss 5.1090   LearningRate 0.0144   Epoch: 12   Global Step: 515230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:19,262-Speed 2620.77 samples/sec   Loss 5.0966   LearningRate 0.0144   Epoch: 12   Global Step: 515240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:23,159-Speed 2628.52 samples/sec   Loss 5.0826   LearningRate 0.0144   Epoch: 12   Global Step: 515250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:27,051-Speed 2631.58 samples/sec   Loss 5.0602   LearningRate 0.0144   Epoch: 12   Global Step: 515260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:30,954-Speed 2623.91 samples/sec   Loss 5.0570   LearningRate 0.0144   Epoch: 12   Global Step: 515270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:34,878-Speed 2609.92 samples/sec   Loss 4.9964   LearningRate 0.0144   Epoch: 12   Global Step: 515280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:38,776-Speed 2628.08 samples/sec   Loss 5.1931   LearningRate 0.0144   Epoch: 12   Global Step: 515290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:42,673-Speed 2627.92 samples/sec   Loss 4.9929   LearningRate 0.0144   Epoch: 12   Global Step: 515300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:46,571-Speed 2628.35 samples/sec   Loss 5.0594   LearningRate 0.0144   Epoch: 12   Global Step: 515310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:50,465-Speed 2629.55 samples/sec   Loss 5.1221   LearningRate 0.0144   Epoch: 12   Global Step: 515320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:37:54,340-Speed 2643.35 samples/sec   Loss 5.0537   LearningRate 0.0143   Epoch: 12   Global Step: 515330   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:37:58,238-Speed 2627.69 samples/sec   Loss 5.0341   LearningRate 0.0143   Epoch: 12   Global Step: 515340   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:02,150-Speed 2618.05 samples/sec   Loss 5.1207   LearningRate 0.0143   Epoch: 12   Global Step: 515350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:06,062-Speed 2618.43 samples/sec   Loss 5.0627   LearningRate 0.0143   Epoch: 12   Global Step: 515360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:09,957-Speed 2629.41 samples/sec   Loss 5.0534   LearningRate 0.0143   Epoch: 12   Global Step: 515370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:13,851-Speed 2630.56 samples/sec   Loss 5.0948   LearningRate 0.0143   Epoch: 12   Global Step: 515380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:17,749-Speed 2627.87 samples/sec   Loss 5.0669   LearningRate 0.0143   Epoch: 12   Global Step: 515390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:21,647-Speed 2627.61 samples/sec   Loss 5.0974   LearningRate 0.0143   Epoch: 12   Global Step: 515400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:25,543-Speed 2628.86 samples/sec   Loss 4.9970   LearningRate 0.0143   Epoch: 12   Global Step: 515410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:29,452-Speed 2620.08 samples/sec   Loss 5.0738   LearningRate 0.0143   Epoch: 12   Global Step: 515420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:38:33,347-Speed 2629.66 samples/sec   Loss 5.1258   LearningRate 0.0143   Epoch: 12   Global Step: 515430   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:38:37,250-Speed 2624.30 samples/sec   Loss 5.0417   LearningRate 0.0143   Epoch: 12   Global Step: 515440   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:38:41,157-Speed 2621.09 samples/sec   Loss 5.0350   LearningRate 0.0143   Epoch: 12   Global Step: 515450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:38:45,057-Speed 2626.81 samples/sec   Loss 5.1659   LearningRate 0.0143   Epoch: 12   Global Step: 515460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:38:48,954-Speed 2627.92 samples/sec   Loss 5.0340   LearningRate 0.0143   Epoch: 12   Global Step: 515470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:38:52,854-Speed 2626.45 samples/sec   Loss 5.0365   LearningRate 0.0143   Epoch: 12   Global Step: 515480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:38:56,728-Speed 2644.32 samples/sec   Loss 5.0623   LearningRate 0.0143   Epoch: 12   Global Step: 515490   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:00,623-Speed 2629.50 samples/sec   Loss 5.1370   LearningRate 0.0143   Epoch: 12   Global Step: 515500   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:04,519-Speed 2628.87 samples/sec   Loss 5.1215   LearningRate 0.0143   Epoch: 12   Global Step: 515510   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:08,415-Speed 2629.16 samples/sec   Loss 5.0330   LearningRate 0.0143   Epoch: 12   Global Step: 515520   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:12,313-Speed 2627.45 samples/sec   Loss 5.1373   LearningRate 0.0143   Epoch: 12   Global Step: 515530   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:16,224-Speed 2619.03 samples/sec   Loss 5.1149   LearningRate 0.0143   Epoch: 12   Global Step: 515540   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:20,121-Speed 2627.97 samples/sec   Loss 5.0451   LearningRate 0.0143   Epoch: 12   Global Step: 515550   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:24,018-Speed 2628.01 samples/sec   Loss 5.0632   LearningRate 0.0143   Epoch: 12   Global Step: 515560   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:27,916-Speed 2627.49 samples/sec   Loss 5.1433   LearningRate 0.0143   Epoch: 12   Global Step: 515570   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:31,814-Speed 2627.94 samples/sec   Loss 5.1168   LearningRate 0.0143   Epoch: 12   Global Step: 515580   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:39:35,707-Speed 2631.49 samples/sec   Loss 5.0272   LearningRate 0.0143   Epoch: 12   Global Step: 515590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:39:39,599-Speed 2631.16 samples/sec   Loss 5.2078   LearningRate 0.0143   Epoch: 12   Global Step: 515600   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:39:43,497-Speed 2627.72 samples/sec   Loss 5.0747   LearningRate 0.0143   Epoch: 12   Global Step: 515610   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:39:47,409-Speed 2618.67 samples/sec   Loss 5.0670   LearningRate 0.0143   Epoch: 12   Global Step: 515620   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:39:51,306-Speed 2627.98 samples/sec   Loss 5.1508   LearningRate 0.0143   Epoch: 12   Global Step: 515630   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:39:55,200-Speed 2630.31 samples/sec   Loss 5.1745   LearningRate 0.0143   Epoch: 12   Global Step: 515640   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:39:59,084-Speed 2636.91 samples/sec   Loss 5.1670   LearningRate 0.0143   Epoch: 12   Global Step: 515650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:02,976-Speed 2631.46 samples/sec   Loss 5.1272   LearningRate 0.0143   Epoch: 12   Global Step: 515660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:06,871-Speed 2629.68 samples/sec   Loss 5.0198   LearningRate 0.0143   Epoch: 12   Global Step: 515670   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:10,769-Speed 2627.89 samples/sec   Loss 5.1785   LearningRate 0.0143   Epoch: 12   Global Step: 515680   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:14,665-Speed 2628.84 samples/sec   Loss 4.9395   LearningRate 0.0143   Epoch: 12   Global Step: 515690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:18,570-Speed 2622.93 samples/sec   Loss 5.1031   LearningRate 0.0143   Epoch: 12   Global Step: 515700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:22,472-Speed 2625.42 samples/sec   Loss 5.0242   LearningRate 0.0143   Epoch: 12   Global Step: 515710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:26,370-Speed 2627.21 samples/sec   Loss 5.1053   LearningRate 0.0143   Epoch: 12   Global Step: 515720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:30,265-Speed 2629.22 samples/sec   Loss 5.0669   LearningRate 0.0143   Epoch: 12   Global Step: 515730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:34,163-Speed 2627.53 samples/sec   Loss 5.0610   LearningRate 0.0143   Epoch: 12   Global Step: 515740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:38,057-Speed 2630.25 samples/sec   Loss 5.1048   LearningRate 0.0143   Epoch: 12   Global Step: 515750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:40:41,952-Speed 2629.49 samples/sec   Loss 5.1167   LearningRate 0.0143   Epoch: 12   Global Step: 515760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:40:45,852-Speed 2626.69 samples/sec   Loss 5.0418   LearningRate 0.0143   Epoch: 12   Global Step: 515770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:40:49,731-Speed 2640.54 samples/sec   Loss 5.0438   LearningRate 0.0143   Epoch: 12   Global Step: 515780   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:53,633-Speed 2625.27 samples/sec   Loss 5.1603   LearningRate 0.0143   Epoch: 12   Global Step: 515790   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:40:57,530-Speed 2628.33 samples/sec   Loss 4.9919   LearningRate 0.0143   Epoch: 12   Global Step: 515800   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:01,423-Speed 2630.37 samples/sec   Loss 4.9890   LearningRate 0.0143   Epoch: 12   Global Step: 515810   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:05,326-Speed 2623.99 samples/sec   Loss 5.0404   LearningRate 0.0143   Epoch: 12   Global Step: 515820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:09,223-Speed 2628.30 samples/sec   Loss 5.0922   LearningRate 0.0143   Epoch: 12   Global Step: 515830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:13,129-Speed 2622.59 samples/sec   Loss 5.1362   LearningRate 0.0143   Epoch: 12   Global Step: 515840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:17,028-Speed 2626.96 samples/sec   Loss 5.0603   LearningRate 0.0143   Epoch: 12   Global Step: 515850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:20,930-Speed 2625.08 samples/sec   Loss 5.0087   LearningRate 0.0143   Epoch: 12   Global Step: 515860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:24,827-Speed 2627.76 samples/sec   Loss 5.1100   LearningRate 0.0143   Epoch: 12   Global Step: 515870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:28,729-Speed 2625.60 samples/sec   Loss 5.1334   LearningRate 0.0143   Epoch: 12   Global Step: 515880   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:41:32,629-Speed 2626.19 samples/sec   Loss 5.0008   LearningRate 0.0143   Epoch: 12   Global Step: 515890   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:41:36,506-Speed 2641.84 samples/sec   Loss 5.1164   LearningRate 0.0143   Epoch: 12   Global Step: 515900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:40,404-Speed 2627.36 samples/sec   Loss 5.0106   LearningRate 0.0143   Epoch: 12   Global Step: 515910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:44,305-Speed 2625.52 samples/sec   Loss 5.1360   LearningRate 0.0143   Epoch: 12   Global Step: 515920   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:48,225-Speed 2612.97 samples/sec   Loss 5.0296   LearningRate 0.0143   Epoch: 12   Global Step: 515930   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:52,185-Speed 2586.61 samples/sec   Loss 5.1131   LearningRate 0.0143   Epoch: 12   Global Step: 515940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:41:56,091-Speed 2622.41 samples/sec   Loss 5.0359   LearningRate 0.0143   Epoch: 12   Global Step: 515950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:00,082-Speed 2566.25 samples/sec   Loss 5.0505   LearningRate 0.0143   Epoch: 12   Global Step: 515960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:03,983-Speed 2625.73 samples/sec   Loss 5.0278   LearningRate 0.0143   Epoch: 12   Global Step: 515970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:07,877-Speed 2629.96 samples/sec   Loss 5.0464   LearningRate 0.0143   Epoch: 12   Global Step: 515980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:11,782-Speed 2623.57 samples/sec   Loss 5.0316   LearningRate 0.0143   Epoch: 12   Global Step: 515990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:15,683-Speed 2625.35 samples/sec   Loss 5.0250   LearningRate 0.0143   Epoch: 12   Global Step: 516000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:42:19,580-Speed 2628.46 samples/sec   Loss 5.1506   LearningRate 0.0143   Epoch: 12   Global Step: 516010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:42:23,476-Speed 2628.31 samples/sec   Loss 5.0325   LearningRate 0.0143   Epoch: 12   Global Step: 516020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:42:27,358-Speed 2638.79 samples/sec   Loss 5.0748   LearningRate 0.0143   Epoch: 12   Global Step: 516030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:31,255-Speed 2628.44 samples/sec   Loss 5.0528   LearningRate 0.0143   Epoch: 12   Global Step: 516040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:35,163-Speed 2620.22 samples/sec   Loss 5.0597   LearningRate 0.0143   Epoch: 12   Global Step: 516050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:39,066-Speed 2624.64 samples/sec   Loss 5.1078   LearningRate 0.0143   Epoch: 12   Global Step: 516060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:42,967-Speed 2625.96 samples/sec   Loss 5.1552   LearningRate 0.0143   Epoch: 12   Global Step: 516070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:46,874-Speed 2621.30 samples/sec   Loss 5.1056   LearningRate 0.0143   Epoch: 12   Global Step: 516080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:50,913-Speed 2535.86 samples/sec   Loss 5.1319   LearningRate 0.0143   Epoch: 12   Global Step: 516090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:54,834-Speed 2612.30 samples/sec   Loss 5.0540   LearningRate 0.0143   Epoch: 12   Global Step: 516100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:42:58,729-Speed 2629.91 samples/sec   Loss 4.9941   LearningRate 0.0143   Epoch: 12   Global Step: 516110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:43:02,633-Speed 2623.55 samples/sec   Loss 5.1500   LearningRate 0.0143   Epoch: 12   Global Step: 516120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:43:06,508-Speed 2642.67 samples/sec   Loss 4.9917   LearningRate 0.0143   Epoch: 12   Global Step: 516130   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:10,406-Speed 2627.36 samples/sec   Loss 5.0487   LearningRate 0.0143   Epoch: 12   Global Step: 516140   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:14,305-Speed 2627.36 samples/sec   Loss 5.1185   LearningRate 0.0143   Epoch: 12   Global Step: 516150   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:18,215-Speed 2620.14 samples/sec   Loss 5.0502   LearningRate 0.0143   Epoch: 12   Global Step: 516160   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:22,113-Speed 2627.33 samples/sec   Loss 5.1233   LearningRate 0.0143   Epoch: 12   Global Step: 516170   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:26,010-Speed 2628.32 samples/sec   Loss 5.1123   LearningRate 0.0143   Epoch: 12   Global Step: 516180   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:29,913-Speed 2624.31 samples/sec   Loss 5.1487   LearningRate 0.0143   Epoch: 12   Global Step: 516190   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:33,806-Speed 2631.05 samples/sec   Loss 5.0620   LearningRate 0.0143   Epoch: 12   Global Step: 516200   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:37,701-Speed 2629.40 samples/sec   Loss 4.9923   LearningRate 0.0143   Epoch: 12   Global Step: 516210   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:41,596-Speed 2629.40 samples/sec   Loss 5.0722   LearningRate 0.0143   Epoch: 12   Global Step: 516220   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:43:45,497-Speed 2625.43 samples/sec   Loss 5.1521   LearningRate 0.0143   Epoch: 12   Global Step: 516230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:43:49,400-Speed 2624.44 samples/sec   Loss 5.0304   LearningRate 0.0143   Epoch: 12   Global Step: 516240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:43:53,301-Speed 2625.99 samples/sec   Loss 5.1001   LearningRate 0.0143   Epoch: 12   Global Step: 516250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:43:57,196-Speed 2629.69 samples/sec   Loss 5.0297   LearningRate 0.0143   Epoch: 12   Global Step: 516260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:01,095-Speed 2627.07 samples/sec   Loss 5.1096   LearningRate 0.0143   Epoch: 12   Global Step: 516270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:04,994-Speed 2626.53 samples/sec   Loss 5.0476   LearningRate 0.0143   Epoch: 12   Global Step: 516280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:08,893-Speed 2626.90 samples/sec   Loss 5.1229   LearningRate 0.0143   Epoch: 12   Global Step: 516290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:12,804-Speed 2619.30 samples/sec   Loss 5.0398   LearningRate 0.0143   Epoch: 12   Global Step: 516300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:16,710-Speed 2621.83 samples/sec   Loss 5.0143   LearningRate 0.0143   Epoch: 12   Global Step: 516310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:20,612-Speed 2624.54 samples/sec   Loss 5.0329   LearningRate 0.0143   Epoch: 12   Global Step: 516320   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:24,510-Speed 2627.57 samples/sec   Loss 5.1192   LearningRate 0.0143   Epoch: 12   Global Step: 516330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:44:28,399-Speed 2633.83 samples/sec   Loss 5.1541   LearningRate 0.0143   Epoch: 12   Global Step: 516340   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:32,295-Speed 2629.22 samples/sec   Loss 4.9671   LearningRate 0.0143   Epoch: 12   Global Step: 516350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:36,194-Speed 2626.97 samples/sec   Loss 5.0962   LearningRate 0.0143   Epoch: 12   Global Step: 516360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:40,087-Speed 2631.01 samples/sec   Loss 5.0356   LearningRate 0.0143   Epoch: 12   Global Step: 516370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:43,986-Speed 2626.67 samples/sec   Loss 4.9916   LearningRate 0.0143   Epoch: 12   Global Step: 516380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:47,882-Speed 2628.62 samples/sec   Loss 5.1054   LearningRate 0.0143   Epoch: 12   Global Step: 516390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:51,778-Speed 2629.52 samples/sec   Loss 5.0590   LearningRate 0.0143   Epoch: 12   Global Step: 516400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:55,681-Speed 2624.16 samples/sec   Loss 5.0045   LearningRate 0.0143   Epoch: 12   Global Step: 516410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:44:59,576-Speed 2629.43 samples/sec   Loss 5.0765   LearningRate 0.0143   Epoch: 12   Global Step: 516420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:45:03,472-Speed 2628.79 samples/sec   Loss 5.1226   LearningRate 0.0142   Epoch: 12   Global Step: 516430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:45:07,364-Speed 2631.39 samples/sec   Loss 5.0895   LearningRate 0.0142   Epoch: 12   Global Step: 516440   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:45:11,262-Speed 2627.86 samples/sec   Loss 5.1363   LearningRate 0.0142   Epoch: 12   Global Step: 516450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:45:15,161-Speed 2627.21 samples/sec   Loss 5.1416   LearningRate 0.0142   Epoch: 12   Global Step: 516460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:45:19,075-Speed 2616.84 samples/sec   Loss 5.1153   LearningRate 0.0142   Epoch: 12   Global Step: 516470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:45:22,976-Speed 2625.60 samples/sec   Loss 5.0173   LearningRate 0.0142   Epoch: 12   Global Step: 516480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:45:26,850-Speed 2643.78 samples/sec   Loss 5.0105   LearningRate 0.0142   Epoch: 12   Global Step: 516490   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:45:30,731-Speed 2639.29 samples/sec   Loss 5.1270   LearningRate 0.0142   Epoch: 12   Global Step: 516500   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:45:34,642-Speed 2618.22 samples/sec   Loss 5.1943   LearningRate 0.0142   Epoch: 12   Global Step: 516510   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:45:38,541-Speed 2627.23 samples/sec   Loss 5.1085   LearningRate 0.0142   Epoch: 12   Global Step: 516520   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:45:42,438-Speed 2628.31 samples/sec   Loss 5.0851   LearningRate 0.0142   Epoch: 12   Global Step: 516530   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:45:46,336-Speed 2627.49 samples/sec   Loss 5.0145   LearningRate 0.0142   Epoch: 12   Global Step: 516540   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:45:50,237-Speed 2626.10 samples/sec   Loss 4.9979   LearningRate 0.0142   Epoch: 12   Global Step: 516550   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:45:54,131-Speed 2630.72 samples/sec   Loss 5.0245   LearningRate 0.0142   Epoch: 12   Global Step: 516560   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:45:58,027-Speed 2628.31 samples/sec   Loss 5.1499   LearningRate 0.0142   Epoch: 12   Global Step: 516570   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:46:01,921-Speed 2630.25 samples/sec   Loss 5.0337   LearningRate 0.0142   Epoch: 12   Global Step: 516580   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:46:05,816-Speed 2629.66 samples/sec   Loss 4.9040   LearningRate 0.0142   Epoch: 12   Global Step: 516590   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:46:09,709-Speed 2630.64 samples/sec   Loss 4.9651   LearningRate 0.0142   Epoch: 12   Global Step: 516600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:13,615-Speed 2622.36 samples/sec   Loss 4.9987   LearningRate 0.0142   Epoch: 12   Global Step: 516610   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:17,512-Speed 2628.23 samples/sec   Loss 5.0679   LearningRate 0.0142   Epoch: 12   Global Step: 516620   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:21,409-Speed 2628.07 samples/sec   Loss 5.1061   LearningRate 0.0142   Epoch: 12   Global Step: 516630   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:25,306-Speed 2628.67 samples/sec   Loss 5.0102   LearningRate 0.0142   Epoch: 12   Global Step: 516640   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:29,204-Speed 2627.19 samples/sec   Loss 5.1458   LearningRate 0.0142   Epoch: 12   Global Step: 516650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:33,097-Speed 2631.34 samples/sec   Loss 4.9375   LearningRate 0.0142   Epoch: 12   Global Step: 516660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:36,991-Speed 2630.16 samples/sec   Loss 5.0421   LearningRate 0.0142   Epoch: 12   Global Step: 516670   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:40,889-Speed 2627.62 samples/sec   Loss 5.1295   LearningRate 0.0142   Epoch: 12   Global Step: 516680   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:44,792-Speed 2624.74 samples/sec   Loss 5.1237   LearningRate 0.0142   Epoch: 12   Global Step: 516690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:46:48,652-Speed 2653.01 samples/sec   Loss 5.0478   LearningRate 0.0142   Epoch: 12   Global Step: 516700   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:46:52,545-Speed 2631.39 samples/sec   Loss 4.9905   LearningRate 0.0142   Epoch: 12   Global Step: 516710   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:46:56,439-Speed 2629.88 samples/sec   Loss 5.1160   LearningRate 0.0142   Epoch: 12   Global Step: 516720   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:00,334-Speed 2629.69 samples/sec   Loss 5.1196   LearningRate 0.0142   Epoch: 12   Global Step: 516730   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:04,241-Speed 2621.19 samples/sec   Loss 5.0584   LearningRate 0.0142   Epoch: 12   Global Step: 516740   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:08,148-Speed 2622.19 samples/sec   Loss 4.9514   LearningRate 0.0142   Epoch: 12   Global Step: 516750   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:12,053-Speed 2623.02 samples/sec   Loss 5.1432   LearningRate 0.0142   Epoch: 12   Global Step: 516760   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:15,949-Speed 2628.49 samples/sec   Loss 5.0667   LearningRate 0.0142   Epoch: 12   Global Step: 516770   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:19,854-Speed 2623.13 samples/sec   Loss 5.1232   LearningRate 0.0142   Epoch: 12   Global Step: 516780   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:23,747-Speed 2631.08 samples/sec   Loss 5.0703   LearningRate 0.0142   Epoch: 12   Global Step: 516790   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:47:27,642-Speed 2629.79 samples/sec   Loss 5.0713   LearningRate 0.0142   Epoch: 12   Global Step: 516800   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:31,539-Speed 2628.28 samples/sec   Loss 5.1640   LearningRate 0.0142   Epoch: 12   Global Step: 516810   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:35,435-Speed 2628.24 samples/sec   Loss 4.9982   LearningRate 0.0142   Epoch: 12   Global Step: 516820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:39,336-Speed 2626.02 samples/sec   Loss 5.1181   LearningRate 0.0142   Epoch: 12   Global Step: 516830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:43,232-Speed 2629.50 samples/sec   Loss 5.0118   LearningRate 0.0142   Epoch: 12   Global Step: 516840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:47,205-Speed 2577.46 samples/sec   Loss 5.0287   LearningRate 0.0142   Epoch: 12   Global Step: 516850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:51,286-Speed 2510.09 samples/sec   Loss 5.1018   LearningRate 0.0142   Epoch: 12   Global Step: 516860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:55,275-Speed 2567.67 samples/sec   Loss 5.0365   LearningRate 0.0142   Epoch: 12   Global Step: 516870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:47:59,171-Speed 2629.21 samples/sec   Loss 5.0311   LearningRate 0.0142   Epoch: 12   Global Step: 516880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:48:03,095-Speed 2609.65 samples/sec   Loss 5.0806   LearningRate 0.0142   Epoch: 12   Global Step: 516890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:48:06,989-Speed 2630.28 samples/sec   Loss 5.0748   LearningRate 0.0142   Epoch: 12   Global Step: 516900   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:48:10,885-Speed 2628.58 samples/sec   Loss 5.0898   LearningRate 0.0142   Epoch: 12   Global Step: 516910   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:48:14,780-Speed 2630.72 samples/sec   Loss 4.9548   LearningRate 0.0142   Epoch: 12   Global Step: 516920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:48:18,677-Speed 2628.44 samples/sec   Loss 5.0419   LearningRate 0.0142   Epoch: 12   Global Step: 516930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:48:22,547-Speed 2646.71 samples/sec   Loss 4.9911   LearningRate 0.0142   Epoch: 12   Global Step: 516940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:48:26,445-Speed 2627.75 samples/sec   Loss 5.0578   LearningRate 0.0142   Epoch: 12   Global Step: 516950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:48:30,340-Speed 2629.65 samples/sec   Loss 5.0814   LearningRate 0.0142   Epoch: 12   Global Step: 516960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:48:34,246-Speed 2621.99 samples/sec   Loss 5.1334   LearningRate 0.0142   Epoch: 12   Global Step: 516970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:48:38,133-Speed 2634.57 samples/sec   Loss 5.0452   LearningRate 0.0142   Epoch: 12   Global Step: 516980   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:48:42,046-Speed 2618.08 samples/sec   Loss 5.0773   LearningRate 0.0142   Epoch: 12   Global Step: 516990   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:48:45,948-Speed 2624.76 samples/sec   Loss 5.0238   LearningRate 0.0142   Epoch: 12   Global Step: 517000   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:48:49,847-Speed 2627.21 samples/sec   Loss 5.0210   LearningRate 0.0142   Epoch: 12   Global Step: 517010   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:48:53,749-Speed 2625.19 samples/sec   Loss 5.2308   LearningRate 0.0142   Epoch: 12   Global Step: 517020   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:48:57,646-Speed 2628.34 samples/sec   Loss 5.0670   LearningRate 0.0142   Epoch: 12   Global Step: 517030   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:49:01,545-Speed 2626.68 samples/sec   Loss 5.0454   LearningRate 0.0142   Epoch: 12   Global Step: 517040   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:49:05,443-Speed 2627.52 samples/sec   Loss 4.9468   LearningRate 0.0142   Epoch: 12   Global Step: 517050   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:49:09,339-Speed 2629.22 samples/sec   Loss 4.9360   LearningRate 0.0142   Epoch: 12   Global Step: 517060   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:49:13,235-Speed 2629.76 samples/sec   Loss 5.0643   LearningRate 0.0142   Epoch: 12   Global Step: 517070   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 05:49:17,138-Speed 2624.15 samples/sec   Loss 5.0721   LearningRate 0.0142   Epoch: 12   Global Step: 517080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:21,037-Speed 2627.32 samples/sec   Loss 5.0901   LearningRate 0.0142   Epoch: 12   Global Step: 517090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:24,956-Speed 2612.96 samples/sec   Loss 5.0208   LearningRate 0.0142   Epoch: 12   Global Step: 517100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:28,850-Speed 2631.33 samples/sec   Loss 5.0687   LearningRate 0.0142   Epoch: 12   Global Step: 517110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:32,743-Speed 2630.60 samples/sec   Loss 5.0817   LearningRate 0.0142   Epoch: 12   Global Step: 517120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:36,660-Speed 2615.73 samples/sec   Loss 5.1076   LearningRate 0.0142   Epoch: 12   Global Step: 517130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:40,556-Speed 2629.10 samples/sec   Loss 5.0083   LearningRate 0.0142   Epoch: 12   Global Step: 517140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:44,458-Speed 2625.24 samples/sec   Loss 4.9198   LearningRate 0.0142   Epoch: 12   Global Step: 517150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:48,359-Speed 2625.52 samples/sec   Loss 5.1837   LearningRate 0.0142   Epoch: 12   Global Step: 517160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:52,259-Speed 2626.33 samples/sec   Loss 5.1356   LearningRate 0.0142   Epoch: 12   Global Step: 517170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:49:56,153-Speed 2630.47 samples/sec   Loss 5.1792   LearningRate 0.0142   Epoch: 12   Global Step: 517180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:50:00,054-Speed 2625.53 samples/sec   Loss 4.9828   LearningRate 0.0142   Epoch: 12   Global Step: 517190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:50:03,947-Speed 2630.97 samples/sec   Loss 5.0208   LearningRate 0.0142   Epoch: 12   Global Step: 517200   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:07,862-Speed 2615.87 samples/sec   Loss 5.1352   LearningRate 0.0142   Epoch: 12   Global Step: 517210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:11,769-Speed 2621.99 samples/sec   Loss 4.9089   LearningRate 0.0142   Epoch: 12   Global Step: 517220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:15,667-Speed 2627.50 samples/sec   Loss 5.1388   LearningRate 0.0142   Epoch: 12   Global Step: 517230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:19,570-Speed 2624.39 samples/sec   Loss 4.9678   LearningRate 0.0142   Epoch: 12   Global Step: 517240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:23,473-Speed 2625.06 samples/sec   Loss 4.9945   LearningRate 0.0142   Epoch: 12   Global Step: 517250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:27,374-Speed 2625.34 samples/sec   Loss 5.0467   LearningRate 0.0142   Epoch: 12   Global Step: 517260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:31,272-Speed 2627.59 samples/sec   Loss 5.0433   LearningRate 0.0142   Epoch: 12   Global Step: 517270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:35,172-Speed 2625.78 samples/sec   Loss 5.0429   LearningRate 0.0142   Epoch: 12   Global Step: 517280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:39,065-Speed 2631.45 samples/sec   Loss 5.0778   LearningRate 0.0142   Epoch: 12   Global Step: 517290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:42,941-Speed 2642.19 samples/sec   Loss 5.1003   LearningRate 0.0142   Epoch: 12   Global Step: 517300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:46,849-Speed 2621.47 samples/sec   Loss 4.9965   LearningRate 0.0142   Epoch: 12   Global Step: 517310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:50,740-Speed 2632.03 samples/sec   Loss 4.9889   LearningRate 0.0142   Epoch: 12   Global Step: 517320   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:54,633-Speed 2631.52 samples/sec   Loss 4.9996   LearningRate 0.0142   Epoch: 12   Global Step: 517330   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:50:58,563-Speed 2605.87 samples/sec   Loss 5.0669   LearningRate 0.0142   Epoch: 12   Global Step: 517340   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:51:02,461-Speed 2627.59 samples/sec   Loss 5.0887   LearningRate 0.0142   Epoch: 12   Global Step: 517350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:51:06,355-Speed 2630.59 samples/sec   Loss 5.0519   LearningRate 0.0142   Epoch: 12   Global Step: 517360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:51:10,252-Speed 2627.74 samples/sec   Loss 5.1512   LearningRate 0.0142   Epoch: 12   Global Step: 517370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:51:14,154-Speed 2625.86 samples/sec   Loss 4.9768   LearningRate 0.0142   Epoch: 12   Global Step: 517380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:51:18,057-Speed 2624.43 samples/sec   Loss 5.0781   LearningRate 0.0142   Epoch: 12   Global Step: 517390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:51:21,960-Speed 2623.85 samples/sec   Loss 4.9879   LearningRate 0.0142   Epoch: 12   Global Step: 517400   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:25,856-Speed 2629.61 samples/sec   Loss 5.1995   LearningRate 0.0142   Epoch: 12   Global Step: 517410   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:29,749-Speed 2631.09 samples/sec   Loss 5.0385   LearningRate 0.0142   Epoch: 12   Global Step: 517420   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:33,644-Speed 2629.10 samples/sec   Loss 5.0578   LearningRate 0.0142   Epoch: 12   Global Step: 517430   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:37,566-Speed 2611.98 samples/sec   Loss 5.0167   LearningRate 0.0142   Epoch: 12   Global Step: 517440   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:41,462-Speed 2629.27 samples/sec   Loss 5.0951   LearningRate 0.0142   Epoch: 12   Global Step: 517450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:45,377-Speed 2616.59 samples/sec   Loss 5.0807   LearningRate 0.0142   Epoch: 12   Global Step: 517460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:49,273-Speed 2629.14 samples/sec   Loss 5.0034   LearningRate 0.0142   Epoch: 12   Global Step: 517470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:53,173-Speed 2626.57 samples/sec   Loss 5.1087   LearningRate 0.0142   Epoch: 12   Global Step: 517480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:51:57,073-Speed 2626.24 samples/sec   Loss 5.1232   LearningRate 0.0142   Epoch: 12   Global Step: 517490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:00,965-Speed 2631.51 samples/sec   Loss 5.0403   LearningRate 0.0142   Epoch: 12   Global Step: 517500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:04,864-Speed 2626.64 samples/sec   Loss 5.1676   LearningRate 0.0142   Epoch: 12   Global Step: 517510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:08,761-Speed 2628.39 samples/sec   Loss 4.9780   LearningRate 0.0142   Epoch: 12   Global Step: 517520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:12,662-Speed 2625.54 samples/sec   Loss 5.0988   LearningRate 0.0141   Epoch: 12   Global Step: 517530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:16,618-Speed 2590.08 samples/sec   Loss 5.0797   LearningRate 0.0141   Epoch: 12   Global Step: 517540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:20,517-Speed 2626.95 samples/sec   Loss 5.0720   LearningRate 0.0141   Epoch: 12   Global Step: 517550   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:24,440-Speed 2611.18 samples/sec   Loss 5.0970   LearningRate 0.0141   Epoch: 12   Global Step: 517560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:52:28,322-Speed 2638.30 samples/sec   Loss 4.9994   LearningRate 0.0141   Epoch: 12   Global Step: 517570   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:32,233-Speed 2618.98 samples/sec   Loss 5.0142   LearningRate 0.0141   Epoch: 12   Global Step: 517580   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:36,136-Speed 2623.88 samples/sec   Loss 5.0905   LearningRate 0.0141   Epoch: 12   Global Step: 517590   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:40,039-Speed 2626.56 samples/sec   Loss 5.0627   LearningRate 0.0141   Epoch: 12   Global Step: 517600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:43,937-Speed 2628.37 samples/sec   Loss 5.0452   LearningRate 0.0141   Epoch: 12   Global Step: 517610   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:47,834-Speed 2628.10 samples/sec   Loss 5.0427   LearningRate 0.0141   Epoch: 12   Global Step: 517620   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:51,734-Speed 2626.57 samples/sec   Loss 5.1064   LearningRate 0.0141   Epoch: 12   Global Step: 517630   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:55,632-Speed 2627.28 samples/sec   Loss 4.9619   LearningRate 0.0141   Epoch: 12   Global Step: 517640   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:52:59,561-Speed 2606.82 samples/sec   Loss 5.0709   LearningRate 0.0141   Epoch: 12   Global Step: 517650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:03,465-Speed 2624.28 samples/sec   Loss 5.0421   LearningRate 0.0141   Epoch: 12   Global Step: 517660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:07,345-Speed 2640.79 samples/sec   Loss 4.9998   LearningRate 0.0141   Epoch: 12   Global Step: 517670   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:11,244-Speed 2626.81 samples/sec   Loss 5.0108   LearningRate 0.0141   Epoch: 12   Global Step: 517680   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:15,293-Speed 2529.93 samples/sec   Loss 5.0650   LearningRate 0.0141   Epoch: 12   Global Step: 517690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:19,196-Speed 2624.85 samples/sec   Loss 4.9562   LearningRate 0.0141   Epoch: 12   Global Step: 517700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:23,094-Speed 2627.77 samples/sec   Loss 5.0685   LearningRate 0.0141   Epoch: 12   Global Step: 517710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:26,992-Speed 2627.71 samples/sec   Loss 5.0695   LearningRate 0.0141   Epoch: 12   Global Step: 517720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:30,888-Speed 2628.94 samples/sec   Loss 5.0437   LearningRate 0.0141   Epoch: 12   Global Step: 517730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:34,785-Speed 2628.26 samples/sec   Loss 5.0949   LearningRate 0.0141   Epoch: 12   Global Step: 517740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:38,681-Speed 2629.22 samples/sec   Loss 4.9772   LearningRate 0.0141   Epoch: 12   Global Step: 517750   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:42,579-Speed 2627.56 samples/sec   Loss 5.0153   LearningRate 0.0141   Epoch: 12   Global Step: 517760   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:53:46,481-Speed 2624.76 samples/sec   Loss 5.0427   LearningRate 0.0141   Epoch: 12   Global Step: 517770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:53:50,413-Speed 2605.76 samples/sec   Loss 5.0494   LearningRate 0.0141   Epoch: 12   Global Step: 517780   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:53:54,313-Speed 2626.03 samples/sec   Loss 5.1536   LearningRate 0.0141   Epoch: 12   Global Step: 517790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:53:58,231-Speed 2614.53 samples/sec   Loss 4.9795   LearningRate 0.0141   Epoch: 12   Global Step: 517800   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:54:02,157-Speed 2608.69 samples/sec   Loss 5.0474   LearningRate 0.0141   Epoch: 12   Global Step: 517810   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:54:06,062-Speed 2623.05 samples/sec   Loss 5.0243   LearningRate 0.0141   Epoch: 12   Global Step: 517820   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:54:09,979-Speed 2614.89 samples/sec   Loss 5.0178   LearningRate 0.0141   Epoch: 12   Global Step: 517830   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:54:13,885-Speed 2622.12 samples/sec   Loss 5.0014   LearningRate 0.0141   Epoch: 12   Global Step: 517840   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:54:17,790-Speed 2622.95 samples/sec   Loss 5.1414   LearningRate 0.0141   Epoch: 12   Global Step: 517850   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:54:21,695-Speed 2623.26 samples/sec   Loss 5.1410   LearningRate 0.0141   Epoch: 12   Global Step: 517860   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:54:25,600-Speed 2623.22 samples/sec   Loss 4.9969   LearningRate 0.0141   Epoch: 12   Global Step: 517870   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-15 05:54:29,467-Speed 2648.61 samples/sec   Loss 5.0477   LearningRate 0.0141   Epoch: 12   Global Step: 517880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:54:33,370-Speed 2624.21 samples/sec   Loss 5.0014   LearningRate 0.0141   Epoch: 12   Global Step: 517890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:54:37,273-Speed 2623.92 samples/sec   Loss 5.0958   LearningRate 0.0141   Epoch: 12   Global Step: 517900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:54:41,193-Speed 2612.66 samples/sec   Loss 5.0633   LearningRate 0.0141   Epoch: 12   Global Step: 517910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:54:45,093-Speed 2626.49 samples/sec   Loss 4.9420   LearningRate 0.0141   Epoch: 12   Global Step: 517920   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:54:48,991-Speed 2628.49 samples/sec   Loss 4.9790   LearningRate 0.0141   Epoch: 12   Global Step: 517930   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:54:52,888-Speed 2627.90 samples/sec   Loss 5.0204   LearningRate 0.0141   Epoch: 12   Global Step: 517940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:54:56,790-Speed 2625.49 samples/sec   Loss 5.0206   LearningRate 0.0141   Epoch: 12   Global Step: 517950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:00,695-Speed 2622.93 samples/sec   Loss 5.0069   LearningRate 0.0141   Epoch: 12   Global Step: 517960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:04,597-Speed 2624.52 samples/sec   Loss 5.1378   LearningRate 0.0141   Epoch: 12   Global Step: 517970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:08,495-Speed 2627.72 samples/sec   Loss 5.0153   LearningRate 0.0141   Epoch: 12   Global Step: 517980   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:55:12,390-Speed 2629.39 samples/sec   Loss 5.0282   LearningRate 0.0141   Epoch: 12   Global Step: 517990   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:55:16,271-Speed 2638.79 samples/sec   Loss 5.0231   LearningRate 0.0141   Epoch: 12   Global Step: 518000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:20,173-Speed 2625.64 samples/sec   Loss 5.0053   LearningRate 0.0141   Epoch: 12   Global Step: 518010   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:24,074-Speed 2625.64 samples/sec   Loss 4.9912   LearningRate 0.0141   Epoch: 12   Global Step: 518020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:27,975-Speed 2625.88 samples/sec   Loss 5.0007   LearningRate 0.0141   Epoch: 12   Global Step: 518030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:31,891-Speed 2615.06 samples/sec   Loss 4.9931   LearningRate 0.0141   Epoch: 12   Global Step: 518040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:35,795-Speed 2623.87 samples/sec   Loss 4.9595   LearningRate 0.0141   Epoch: 12   Global Step: 518050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:39,721-Speed 2609.20 samples/sec   Loss 5.0655   LearningRate 0.0141   Epoch: 12   Global Step: 518060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:43,621-Speed 2626.05 samples/sec   Loss 5.1157   LearningRate 0.0141   Epoch: 12   Global Step: 518070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:47,532-Speed 2618.54 samples/sec   Loss 4.9893   LearningRate 0.0141   Epoch: 12   Global Step: 518080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:51,431-Speed 2627.61 samples/sec   Loss 5.0870   LearningRate 0.0141   Epoch: 12   Global Step: 518090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:55:55,335-Speed 2623.40 samples/sec   Loss 5.0481   LearningRate 0.0141   Epoch: 12   Global Step: 518100   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:55:59,235-Speed 2627.01 samples/sec   Loss 5.0365   LearningRate 0.0141   Epoch: 12   Global Step: 518110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:03,141-Speed 2621.51 samples/sec   Loss 4.9507   LearningRate 0.0141   Epoch: 12   Global Step: 518120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:07,048-Speed 2621.88 samples/sec   Loss 4.9725   LearningRate 0.0141   Epoch: 12   Global Step: 518130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:10,949-Speed 2625.47 samples/sec   Loss 5.0582   LearningRate 0.0141   Epoch: 12   Global Step: 518140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:14,848-Speed 2627.14 samples/sec   Loss 5.1404   LearningRate 0.0141   Epoch: 12   Global Step: 518150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:18,754-Speed 2622.39 samples/sec   Loss 5.0666   LearningRate 0.0141   Epoch: 12   Global Step: 518160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:22,665-Speed 2618.35 samples/sec   Loss 5.0614   LearningRate 0.0141   Epoch: 12   Global Step: 518170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:26,565-Speed 2626.61 samples/sec   Loss 5.0488   LearningRate 0.0141   Epoch: 12   Global Step: 518180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:30,466-Speed 2626.17 samples/sec   Loss 5.0282   LearningRate 0.0141   Epoch: 12   Global Step: 518190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:56:34,332-Speed 2649.60 samples/sec   Loss 4.9776   LearningRate 0.0141   Epoch: 12   Global Step: 518200   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:56:38,234-Speed 2624.65 samples/sec   Loss 4.9796   LearningRate 0.0141   Epoch: 12   Global Step: 518210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:56:42,130-Speed 2629.21 samples/sec   Loss 4.9847   LearningRate 0.0141   Epoch: 12   Global Step: 518220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:56:46,036-Speed 2622.55 samples/sec   Loss 4.9714   LearningRate 0.0141   Epoch: 12   Global Step: 518230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:56:49,943-Speed 2621.20 samples/sec   Loss 5.0989   LearningRate 0.0141   Epoch: 12   Global Step: 518240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:56:53,852-Speed 2620.20 samples/sec   Loss 5.0640   LearningRate 0.0141   Epoch: 12   Global Step: 518250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:56:57,752-Speed 2626.32 samples/sec   Loss 4.9857   LearningRate 0.0141   Epoch: 12   Global Step: 518260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:57:01,647-Speed 2629.65 samples/sec   Loss 5.1069   LearningRate 0.0141   Epoch: 12   Global Step: 518270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:57:05,541-Speed 2630.08 samples/sec   Loss 4.9377   LearningRate 0.0141   Epoch: 12   Global Step: 518280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:57:09,440-Speed 2627.21 samples/sec   Loss 5.0384   LearningRate 0.0141   Epoch: 12   Global Step: 518290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:57:13,338-Speed 2627.59 samples/sec   Loss 5.0212   LearningRate 0.0141   Epoch: 12   Global Step: 518300   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:17,241-Speed 2624.21 samples/sec   Loss 5.0396   LearningRate 0.0141   Epoch: 12   Global Step: 518310   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:21,138-Speed 2628.50 samples/sec   Loss 5.0664   LearningRate 0.0141   Epoch: 12   Global Step: 518320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:25,042-Speed 2623.33 samples/sec   Loss 4.9779   LearningRate 0.0141   Epoch: 12   Global Step: 518330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:28,943-Speed 2625.69 samples/sec   Loss 4.9845   LearningRate 0.0141   Epoch: 12   Global Step: 518340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:32,843-Speed 2626.19 samples/sec   Loss 5.0081   LearningRate 0.0141   Epoch: 12   Global Step: 518350   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:36,756-Speed 2617.80 samples/sec   Loss 5.0793   LearningRate 0.0141   Epoch: 12   Global Step: 518360   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:40,659-Speed 2624.31 samples/sec   Loss 5.0383   LearningRate 0.0141   Epoch: 12   Global Step: 518370   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:44,571-Speed 2618.00 samples/sec   Loss 5.1608   LearningRate 0.0141   Epoch: 12   Global Step: 518380   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:57:48,447-Speed 2642.68 samples/sec   Loss 4.9505   LearningRate 0.0141   Epoch: 12   Global Step: 518390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:57:52,343-Speed 2629.35 samples/sec   Loss 5.0802   LearningRate 0.0141   Epoch: 12   Global Step: 518400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:57:56,241-Speed 2627.29 samples/sec   Loss 5.1001   LearningRate 0.0141   Epoch: 12   Global Step: 518410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:00,137-Speed 2629.37 samples/sec   Loss 5.0800   LearningRate 0.0141   Epoch: 12   Global Step: 518420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:04,044-Speed 2620.96 samples/sec   Loss 5.1575   LearningRate 0.0141   Epoch: 12   Global Step: 518430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:07,945-Speed 2626.15 samples/sec   Loss 4.9678   LearningRate 0.0141   Epoch: 12   Global Step: 518440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:11,839-Speed 2630.12 samples/sec   Loss 5.0054   LearningRate 0.0141   Epoch: 12   Global Step: 518450   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:15,736-Speed 2628.78 samples/sec   Loss 5.0516   LearningRate 0.0141   Epoch: 12   Global Step: 518460   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:19,636-Speed 2626.24 samples/sec   Loss 5.0411   LearningRate 0.0141   Epoch: 12   Global Step: 518470   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:23,534-Speed 2627.29 samples/sec   Loss 4.9729   LearningRate 0.0141   Epoch: 12   Global Step: 518480   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:27,441-Speed 2621.94 samples/sec   Loss 5.1118   LearningRate 0.0141   Epoch: 12   Global Step: 518490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:58:31,386-Speed 2596.53 samples/sec   Loss 5.0786   LearningRate 0.0141   Epoch: 12   Global Step: 518500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:58:35,285-Speed 2627.15 samples/sec   Loss 5.0000   LearningRate 0.0141   Epoch: 12   Global Step: 518510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:58:39,219-Speed 2603.03 samples/sec   Loss 5.0660   LearningRate 0.0141   Epoch: 12   Global Step: 518520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:58:43,128-Speed 2620.66 samples/sec   Loss 5.0437   LearningRate 0.0141   Epoch: 12   Global Step: 518530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:58:47,025-Speed 2628.36 samples/sec   Loss 5.0625   LearningRate 0.0141   Epoch: 12   Global Step: 518540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:58:50,911-Speed 2635.93 samples/sec   Loss 5.0230   LearningRate 0.0141   Epoch: 12   Global Step: 518550   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:54,803-Speed 2631.26 samples/sec   Loss 4.9747   LearningRate 0.0141   Epoch: 12   Global Step: 518560   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:58:58,703-Speed 2627.03 samples/sec   Loss 5.0871   LearningRate 0.0141   Epoch: 12   Global Step: 518570   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:02,601-Speed 2627.26 samples/sec   Loss 4.9472   LearningRate 0.0141   Epoch: 12   Global Step: 518580   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:06,496-Speed 2629.86 samples/sec   Loss 5.0287   LearningRate 0.0141   Epoch: 12   Global Step: 518590   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:10,414-Speed 2613.86 samples/sec   Loss 5.0868   LearningRate 0.0141   Epoch: 12   Global Step: 518600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:14,311-Speed 2628.45 samples/sec   Loss 5.0119   LearningRate 0.0141   Epoch: 12   Global Step: 518610   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:18,203-Speed 2631.48 samples/sec   Loss 5.1205   LearningRate 0.0141   Epoch: 12   Global Step: 518620   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:22,159-Speed 2589.86 samples/sec   Loss 5.0716   LearningRate 0.0140   Epoch: 12   Global Step: 518630   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:26,056-Speed 2628.40 samples/sec   Loss 4.8936   LearningRate 0.0140   Epoch: 12   Global Step: 518640   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 05:59:29,974-Speed 2614.59 samples/sec   Loss 4.9968   LearningRate 0.0140   Epoch: 12   Global Step: 518650   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:59:33,876-Speed 2624.46 samples/sec   Loss 5.0958   LearningRate 0.0140   Epoch: 12   Global Step: 518660   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:59:37,873-Speed 2562.62 samples/sec   Loss 4.9900   LearningRate 0.0140   Epoch: 12   Global Step: 518670   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:59:41,894-Speed 2547.26 samples/sec   Loss 4.9079   LearningRate 0.0140   Epoch: 12   Global Step: 518680   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:59:45,789-Speed 2630.30 samples/sec   Loss 5.0255   LearningRate 0.0140   Epoch: 12   Global Step: 518690   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:59:49,689-Speed 2626.16 samples/sec   Loss 4.9557   LearningRate 0.0140   Epoch: 12   Global Step: 518700   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:59:53,602-Speed 2617.81 samples/sec   Loss 5.0451   LearningRate 0.0140   Epoch: 12   Global Step: 518710   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 05:59:57,502-Speed 2626.33 samples/sec   Loss 5.0480   LearningRate 0.0140   Epoch: 12   Global Step: 518720   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:01,403-Speed 2626.13 samples/sec   Loss 4.9959   LearningRate 0.0140   Epoch: 12   Global Step: 518730   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:05,297-Speed 2629.79 samples/sec   Loss 5.0654   LearningRate 0.0140   Epoch: 12   Global Step: 518740   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:09,182-Speed 2635.93 samples/sec   Loss 5.0163   LearningRate 0.0140   Epoch: 12   Global Step: 518750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:13,085-Speed 2624.77 samples/sec   Loss 4.9921   LearningRate 0.0140   Epoch: 12   Global Step: 518760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:17,030-Speed 2596.31 samples/sec   Loss 5.0121   LearningRate 0.0140   Epoch: 12   Global Step: 518770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:20,942-Speed 2618.60 samples/sec   Loss 5.0379   LearningRate 0.0140   Epoch: 12   Global Step: 518780   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:24,840-Speed 2627.17 samples/sec   Loss 5.0445   LearningRate 0.0140   Epoch: 12   Global Step: 518790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:00:28,719-Speed 2640.84 samples/sec   Loss 4.9436   LearningRate 0.0140   Epoch: 12   Global Step: 518800   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:32,614-Speed 2629.69 samples/sec   Loss 5.0406   LearningRate 0.0140   Epoch: 12   Global Step: 518810   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:36,561-Speed 2595.07 samples/sec   Loss 4.9750   LearningRate 0.0140   Epoch: 12   Global Step: 518820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:40,457-Speed 2629.02 samples/sec   Loss 5.1454   LearningRate 0.0140   Epoch: 12   Global Step: 518830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:44,352-Speed 2629.59 samples/sec   Loss 5.0585   LearningRate 0.0140   Epoch: 12   Global Step: 518840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:48,257-Speed 2622.79 samples/sec   Loss 4.9423   LearningRate 0.0140   Epoch: 12   Global Step: 518850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:52,171-Speed 2617.19 samples/sec   Loss 5.1069   LearningRate 0.0140   Epoch: 12   Global Step: 518860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:56,068-Speed 2627.91 samples/sec   Loss 5.0729   LearningRate 0.0140   Epoch: 12   Global Step: 518870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:00:59,989-Speed 2613.11 samples/sec   Loss 5.0941   LearningRate 0.0140   Epoch: 12   Global Step: 518880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:01:03,995-Speed 2556.32 samples/sec   Loss 5.0914   LearningRate 0.0140   Epoch: 12   Global Step: 518890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:01:07,989-Speed 2564.52 samples/sec   Loss 4.9329   LearningRate 0.0140   Epoch: 12   Global Step: 518900   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:11,895-Speed 2622.18 samples/sec   Loss 5.0869   LearningRate 0.0140   Epoch: 12   Global Step: 518910   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:15,900-Speed 2557.44 samples/sec   Loss 4.9508   LearningRate 0.0140   Epoch: 12   Global Step: 518920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:19,802-Speed 2625.33 samples/sec   Loss 4.9437   LearningRate 0.0140   Epoch: 12   Global Step: 518930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:23,721-Speed 2613.86 samples/sec   Loss 4.9865   LearningRate 0.0140   Epoch: 12   Global Step: 518940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:27,618-Speed 2628.45 samples/sec   Loss 5.0700   LearningRate 0.0140   Epoch: 12   Global Step: 518950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:31,583-Speed 2583.08 samples/sec   Loss 5.1003   LearningRate 0.0140   Epoch: 12   Global Step: 518960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:35,481-Speed 2627.72 samples/sec   Loss 4.9746   LearningRate 0.0140   Epoch: 12   Global Step: 518970   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:01:39,396-Speed 2615.87 samples/sec   Loss 4.9659   LearningRate 0.0140   Epoch: 12   Global Step: 518980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:01:43,324-Speed 2608.25 samples/sec   Loss 5.0373   LearningRate 0.0140   Epoch: 12   Global Step: 518990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:01:47,223-Speed 2626.89 samples/sec   Loss 5.1391   LearningRate 0.0140   Epoch: 12   Global Step: 519000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:01:51,120-Speed 2628.16 samples/sec   Loss 5.1477   LearningRate 0.0140   Epoch: 12   Global Step: 519010   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:01:55,050-Speed 2606.32 samples/sec   Loss 5.0214   LearningRate 0.0140   Epoch: 12   Global Step: 519020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:01:58,951-Speed 2626.09 samples/sec   Loss 5.1124   LearningRate 0.0140   Epoch: 12   Global Step: 519030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:02,876-Speed 2609.40 samples/sec   Loss 4.9385   LearningRate 0.0140   Epoch: 12   Global Step: 519040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:06,778-Speed 2624.55 samples/sec   Loss 5.0152   LearningRate 0.0140   Epoch: 12   Global Step: 519050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:10,682-Speed 2623.88 samples/sec   Loss 4.9666   LearningRate 0.0140   Epoch: 12   Global Step: 519060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:14,625-Speed 2597.96 samples/sec   Loss 4.9952   LearningRate 0.0140   Epoch: 12   Global Step: 519070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:18,550-Speed 2609.66 samples/sec   Loss 4.9444   LearningRate 0.0140   Epoch: 12   Global Step: 519080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:02:22,435-Speed 2636.32 samples/sec   Loss 4.9736   LearningRate 0.0140   Epoch: 12   Global Step: 519090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:26,341-Speed 2621.86 samples/sec   Loss 5.0484   LearningRate 0.0140   Epoch: 12   Global Step: 519100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:30,244-Speed 2624.68 samples/sec   Loss 5.0216   LearningRate 0.0140   Epoch: 12   Global Step: 519110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:34,149-Speed 2622.67 samples/sec   Loss 5.1967   LearningRate 0.0140   Epoch: 12   Global Step: 519120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:38,055-Speed 2622.05 samples/sec   Loss 5.0478   LearningRate 0.0140   Epoch: 12   Global Step: 519130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:41,978-Speed 2610.93 samples/sec   Loss 4.9684   LearningRate 0.0140   Epoch: 12   Global Step: 519140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:45,873-Speed 2630.03 samples/sec   Loss 5.0792   LearningRate 0.0140   Epoch: 12   Global Step: 519150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:49,770-Speed 2628.38 samples/sec   Loss 5.0865   LearningRate 0.0140   Epoch: 12   Global Step: 519160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:53,666-Speed 2629.48 samples/sec   Loss 5.0763   LearningRate 0.0140   Epoch: 12   Global Step: 519170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:02:57,564-Speed 2627.45 samples/sec   Loss 5.0089   LearningRate 0.0140   Epoch: 12   Global Step: 519180   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:03:01,438-Speed 2643.65 samples/sec   Loss 4.9789   LearningRate 0.0140   Epoch: 12   Global Step: 519190   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:03:05,343-Speed 2622.23 samples/sec   Loss 4.9577   LearningRate 0.0140   Epoch: 12   Global Step: 519200   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:03:09,253-Speed 2619.87 samples/sec   Loss 4.9626   LearningRate 0.0140   Epoch: 12   Global Step: 519210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:03:13,130-Speed 2642.14 samples/sec   Loss 5.0054   LearningRate 0.0140   Epoch: 12   Global Step: 519220   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:17,034-Speed 2623.07 samples/sec   Loss 5.0689   LearningRate 0.0140   Epoch: 12   Global Step: 519230   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:20,936-Speed 2625.65 samples/sec   Loss 5.0363   LearningRate 0.0140   Epoch: 12   Global Step: 519240   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:24,838-Speed 2624.71 samples/sec   Loss 5.0430   LearningRate 0.0140   Epoch: 12   Global Step: 519250   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:28,744-Speed 2622.59 samples/sec   Loss 4.9474   LearningRate 0.0140   Epoch: 12   Global Step: 519260   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:32,649-Speed 2622.74 samples/sec   Loss 5.0884   LearningRate 0.0140   Epoch: 12   Global Step: 519270   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:36,550-Speed 2625.29 samples/sec   Loss 4.9605   LearningRate 0.0140   Epoch: 12   Global Step: 519280   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:40,448-Speed 2627.41 samples/sec   Loss 4.9722   LearningRate 0.0140   Epoch: 12   Global Step: 519290   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:44,347-Speed 2627.50 samples/sec   Loss 4.9347   LearningRate 0.0140   Epoch: 12   Global Step: 519300   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:48,247-Speed 2626.13 samples/sec   Loss 5.0822   LearningRate 0.0140   Epoch: 12   Global Step: 519310   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:03:52,151-Speed 2623.86 samples/sec   Loss 5.0108   LearningRate 0.0140   Epoch: 12   Global Step: 519320   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:03:56,049-Speed 2627.73 samples/sec   Loss 5.0759   LearningRate 0.0140   Epoch: 12   Global Step: 519330   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:03:59,950-Speed 2626.73 samples/sec   Loss 5.0415   LearningRate 0.0140   Epoch: 12   Global Step: 519340   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:03,860-Speed 2618.93 samples/sec   Loss 5.0189   LearningRate 0.0140   Epoch: 12   Global Step: 519350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:07,781-Speed 2612.65 samples/sec   Loss 5.0446   LearningRate 0.0140   Epoch: 12   Global Step: 519360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:11,681-Speed 2625.98 samples/sec   Loss 5.0063   LearningRate 0.0140   Epoch: 12   Global Step: 519370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:15,582-Speed 2626.25 samples/sec   Loss 4.9969   LearningRate 0.0140   Epoch: 12   Global Step: 519380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:19,485-Speed 2624.08 samples/sec   Loss 4.9966   LearningRate 0.0140   Epoch: 12   Global Step: 519390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:23,403-Speed 2613.78 samples/sec   Loss 4.9537   LearningRate 0.0140   Epoch: 12   Global Step: 519400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:27,302-Speed 2627.88 samples/sec   Loss 5.0182   LearningRate 0.0140   Epoch: 12   Global Step: 519410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:31,213-Speed 2618.87 samples/sec   Loss 5.0844   LearningRate 0.0140   Epoch: 12   Global Step: 519420   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:04:35,118-Speed 2622.76 samples/sec   Loss 5.0175   LearningRate 0.0140   Epoch: 12   Global Step: 519430   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:04:38,997-Speed 2640.62 samples/sec   Loss 4.9878   LearningRate 0.0140   Epoch: 12   Global Step: 519440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:42,895-Speed 2627.85 samples/sec   Loss 4.9984   LearningRate 0.0140   Epoch: 12   Global Step: 519450   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:46,792-Speed 2628.04 samples/sec   Loss 5.1019   LearningRate 0.0140   Epoch: 12   Global Step: 519460   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:04:50,680-Speed 2634.47 samples/sec   Loss 5.0745   LearningRate 0.0140   Epoch: 12   Global Step: 519470   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:04:54,604-Speed 2610.39 samples/sec   Loss 5.0526   LearningRate 0.0140   Epoch: 12   Global Step: 519480   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:04:58,607-Speed 2558.89 samples/sec   Loss 4.9293   LearningRate 0.0140   Epoch: 12   Global Step: 519490   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:02,510-Speed 2624.55 samples/sec   Loss 5.0312   LearningRate 0.0140   Epoch: 12   Global Step: 519500   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:06,413-Speed 2623.57 samples/sec   Loss 5.0298   LearningRate 0.0140   Epoch: 12   Global Step: 519510   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:10,308-Speed 2629.90 samples/sec   Loss 5.0378   LearningRate 0.0140   Epoch: 12   Global Step: 519520   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:14,343-Speed 2538.38 samples/sec   Loss 4.9691   LearningRate 0.0140   Epoch: 12   Global Step: 519530   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:18,313-Speed 2580.09 samples/sec   Loss 5.0452   LearningRate 0.0140   Epoch: 12   Global Step: 519540   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:22,207-Speed 2630.34 samples/sec   Loss 5.0522   LearningRate 0.0140   Epoch: 12   Global Step: 519550   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:26,111-Speed 2623.36 samples/sec   Loss 5.0900   LearningRate 0.0140   Epoch: 12   Global Step: 519560   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-04-15 06:05:30,009-Speed 2627.29 samples/sec   Loss 5.0501   LearningRate 0.0140   Epoch: 12   Global Step: 519570   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:05:33,915-Speed 2622.77 samples/sec   Loss 5.0767   LearningRate 0.0140   Epoch: 12   Global Step: 519580   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:05:37,843-Speed 2607.09 samples/sec   Loss 5.0361   LearningRate 0.0140   Epoch: 12   Global Step: 519590   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:05:41,749-Speed 2622.52 samples/sec   Loss 4.9384   LearningRate 0.0140   Epoch: 12   Global Step: 519600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:05:45,647-Speed 2627.39 samples/sec   Loss 5.1730   LearningRate 0.0140   Epoch: 12   Global Step: 519610   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:05:49,542-Speed 2630.10 samples/sec   Loss 5.0547   LearningRate 0.0140   Epoch: 12   Global Step: 519620   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:05:53,439-Speed 2628.35 samples/sec   Loss 4.9475   LearningRate 0.0140   Epoch: 12   Global Step: 519630   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:05:57,341-Speed 2624.69 samples/sec   Loss 5.0476   LearningRate 0.0140   Epoch: 12   Global Step: 519640   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:01,240-Speed 2627.10 samples/sec   Loss 4.9785   LearningRate 0.0140   Epoch: 12   Global Step: 519650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:05,151-Speed 2618.26 samples/sec   Loss 5.0661   LearningRate 0.0140   Epoch: 12   Global Step: 519660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:09,052-Speed 2625.96 samples/sec   Loss 5.0010   LearningRate 0.0140   Epoch: 12   Global Step: 519670   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:06:12,950-Speed 2627.82 samples/sec   Loss 5.0050   LearningRate 0.0140   Epoch: 12   Global Step: 519680   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:06:16,832-Speed 2638.64 samples/sec   Loss 5.0127   LearningRate 0.0140   Epoch: 12   Global Step: 519690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:20,750-Speed 2614.08 samples/sec   Loss 4.9523   LearningRate 0.0140   Epoch: 12   Global Step: 519700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:24,678-Speed 2607.91 samples/sec   Loss 5.0346   LearningRate 0.0140   Epoch: 12   Global Step: 519710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:28,617-Speed 2600.67 samples/sec   Loss 5.1055   LearningRate 0.0140   Epoch: 12   Global Step: 519720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:32,550-Speed 2603.96 samples/sec   Loss 4.9721   LearningRate 0.0140   Epoch: 12   Global Step: 519730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:36,465-Speed 2616.60 samples/sec   Loss 5.1207   LearningRate 0.0139   Epoch: 12   Global Step: 519740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:40,363-Speed 2627.36 samples/sec   Loss 4.9549   LearningRate 0.0139   Epoch: 12   Global Step: 519750   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:44,263-Speed 2626.89 samples/sec   Loss 5.0465   LearningRate 0.0139   Epoch: 12   Global Step: 519760   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:48,164-Speed 2625.25 samples/sec   Loss 5.0863   LearningRate 0.0139   Epoch: 12   Global Step: 519770   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:52,075-Speed 2619.31 samples/sec   Loss 5.1000   LearningRate 0.0139   Epoch: 12   Global Step: 519780   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:06:55,977-Speed 2624.70 samples/sec   Loss 5.1422   LearningRate 0.0139   Epoch: 12   Global Step: 519790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:06:59,873-Speed 2629.87 samples/sec   Loss 4.9124   LearningRate 0.0139   Epoch: 12   Global Step: 519800   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:07:03,789-Speed 2615.27 samples/sec   Loss 5.1043   LearningRate 0.0139   Epoch: 12   Global Step: 519810   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:07:07,687-Speed 2628.12 samples/sec   Loss 5.0649   LearningRate 0.0139   Epoch: 12   Global Step: 519820   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:07:11,585-Speed 2627.30 samples/sec   Loss 4.9394   LearningRate 0.0139   Epoch: 12   Global Step: 519830   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:07:15,481-Speed 2628.80 samples/sec   Loss 5.0170   LearningRate 0.0139   Epoch: 12   Global Step: 519840   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:07:19,359-Speed 2640.99 samples/sec   Loss 4.9532   LearningRate 0.0139   Epoch: 12   Global Step: 519850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:23,254-Speed 2630.22 samples/sec   Loss 5.0432   LearningRate 0.0139   Epoch: 12   Global Step: 519860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:27,164-Speed 2619.64 samples/sec   Loss 5.1204   LearningRate 0.0139   Epoch: 12   Global Step: 519870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:31,060-Speed 2629.60 samples/sec   Loss 4.9013   LearningRate 0.0139   Epoch: 12   Global Step: 519880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:34,953-Speed 2630.70 samples/sec   Loss 4.9972   LearningRate 0.0139   Epoch: 12   Global Step: 519890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:38,869-Speed 2615.34 samples/sec   Loss 4.9773   LearningRate 0.0139   Epoch: 12   Global Step: 519900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:42,764-Speed 2629.55 samples/sec   Loss 4.9937   LearningRate 0.0139   Epoch: 12   Global Step: 519910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:46,798-Speed 2538.76 samples/sec   Loss 5.1379   LearningRate 0.0139   Epoch: 12   Global Step: 519920   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:50,691-Speed 2631.39 samples/sec   Loss 5.0146   LearningRate 0.0139   Epoch: 12   Global Step: 519930   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:54,590-Speed 2626.95 samples/sec   Loss 4.9790   LearningRate 0.0139   Epoch: 12   Global Step: 519940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:07:58,486-Speed 2629.69 samples/sec   Loss 5.0272   LearningRate 0.0139   Epoch: 12   Global Step: 519950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:08:02,359-Speed 2644.66 samples/sec   Loss 4.9056   LearningRate 0.0139   Epoch: 12   Global Step: 519960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:08:06,255-Speed 2628.68 samples/sec   Loss 4.9802   LearningRate 0.0139   Epoch: 12   Global Step: 519970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:08:10,149-Speed 2630.04 samples/sec   Loss 5.0199   LearningRate 0.0139   Epoch: 12   Global Step: 519980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:08:14,046-Speed 2628.27 samples/sec   Loss 5.0110   LearningRate 0.0139   Epoch: 12   Global Step: 519990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:08:17,944-Speed 2627.93 samples/sec   Loss 4.9788   LearningRate 0.0139   Epoch: 12   Global Step: 520000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:09:01,338-[lfw][520000]XNorm: 23.355590
Training: 2022-04-15 06:09:01,339-[lfw][520000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 06:09:01,340-[lfw][520000]Accuracy-Highest: 0.99800
Training: 2022-04-15 06:09:51,848-[cfp_fp][520000]XNorm: 22.104677
Training: 2022-04-15 06:09:51,849-[cfp_fp][520000]Accuracy-Flip: 0.99057+-0.00457
Training: 2022-04-15 06:09:51,850-[cfp_fp][520000]Accuracy-Highest: 0.99057
Training: 2022-04-15 06:10:35,325-[agedb_30][520000]XNorm: 23.607143
Training: 2022-04-15 06:10:35,325-[agedb_30][520000]Accuracy-Flip: 0.98083+-0.00534
Training: 2022-04-15 06:10:35,326-[agedb_30][520000]Accuracy-Highest: 0.98083
Training: 2022-04-15 06:10:39,216-Speed 72.49 samples/sec   Loss 4.9896   LearningRate 0.0139   Epoch: 12   Global Step: 520010   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:10:43,092-Speed 2642.37 samples/sec   Loss 5.1415   LearningRate 0.0139   Epoch: 12   Global Step: 520020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:10:46,973-Speed 2639.10 samples/sec   Loss 5.0566   LearningRate 0.0139   Epoch: 12   Global Step: 520030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:10:50,853-Speed 2639.97 samples/sec   Loss 5.0630   LearningRate 0.0139   Epoch: 12   Global Step: 520040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:10:54,736-Speed 2637.81 samples/sec   Loss 5.0163   LearningRate 0.0139   Epoch: 12   Global Step: 520050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:10:58,621-Speed 2637.24 samples/sec   Loss 5.0351   LearningRate 0.0139   Epoch: 12   Global Step: 520060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:02,510-Speed 2634.82 samples/sec   Loss 5.0040   LearningRate 0.0139   Epoch: 12   Global Step: 520070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:06,397-Speed 2634.91 samples/sec   Loss 5.0106   LearningRate 0.0139   Epoch: 12   Global Step: 520080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:10,414-Speed 2549.70 samples/sec   Loss 5.0946   LearningRate 0.0139   Epoch: 12   Global Step: 520090   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:14,425-Speed 2554.58 samples/sec   Loss 4.9358   LearningRate 0.0139   Epoch: 12   Global Step: 520100   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:18,327-Speed 2624.72 samples/sec   Loss 4.9818   LearningRate 0.0139   Epoch: 12   Global Step: 520110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:22,230-Speed 2624.11 samples/sec   Loss 4.9241   LearningRate 0.0139   Epoch: 12   Global Step: 520120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:26,127-Speed 2627.93 samples/sec   Loss 5.0003   LearningRate 0.0139   Epoch: 12   Global Step: 520130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:30,025-Speed 2628.86 samples/sec   Loss 5.0169   LearningRate 0.0139   Epoch: 12   Global Step: 520140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:33,922-Speed 2628.62 samples/sec   Loss 5.0255   LearningRate 0.0139   Epoch: 12   Global Step: 520150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:37,795-Speed 2644.58 samples/sec   Loss 4.9261   LearningRate 0.0139   Epoch: 12   Global Step: 520160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:41,698-Speed 2624.40 samples/sec   Loss 5.0169   LearningRate 0.0139   Epoch: 12   Global Step: 520170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:45,602-Speed 2623.47 samples/sec   Loss 4.9614   LearningRate 0.0139   Epoch: 12   Global Step: 520180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:49,507-Speed 2622.75 samples/sec   Loss 4.9708   LearningRate 0.0139   Epoch: 12   Global Step: 520190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:53,411-Speed 2623.82 samples/sec   Loss 4.9410   LearningRate 0.0139   Epoch: 12   Global Step: 520200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:11:57,316-Speed 2623.00 samples/sec   Loss 4.9978   LearningRate 0.0139   Epoch: 12   Global Step: 520210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:01,288-Speed 2578.61 samples/sec   Loss 5.1431   LearningRate 0.0139   Epoch: 12   Global Step: 520220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:05,202-Speed 2616.83 samples/sec   Loss 4.9808   LearningRate 0.0139   Epoch: 12   Global Step: 520230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:09,106-Speed 2624.42 samples/sec   Loss 5.0454   LearningRate 0.0139   Epoch: 12   Global Step: 520240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:13,051-Speed 2596.63 samples/sec   Loss 5.1032   LearningRate 0.0139   Epoch: 12   Global Step: 520250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:16,949-Speed 2627.67 samples/sec   Loss 4.9646   LearningRate 0.0139   Epoch: 12   Global Step: 520260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:20,881-Speed 2604.97 samples/sec   Loss 5.0175   LearningRate 0.0139   Epoch: 12   Global Step: 520270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:24,782-Speed 2625.84 samples/sec   Loss 5.0293   LearningRate 0.0139   Epoch: 12   Global Step: 520280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:28,678-Speed 2628.92 samples/sec   Loss 5.0560   LearningRate 0.0139   Epoch: 12   Global Step: 520290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:32,578-Speed 2627.12 samples/sec   Loss 5.0830   LearningRate 0.0139   Epoch: 12   Global Step: 520300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:36,474-Speed 2628.97 samples/sec   Loss 5.0339   LearningRate 0.0139   Epoch: 12   Global Step: 520310   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:12:40,376-Speed 2625.10 samples/sec   Loss 5.0285   LearningRate 0.0139   Epoch: 12   Global Step: 520320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:12:44,277-Speed 2625.43 samples/sec   Loss 5.1355   LearningRate 0.0139   Epoch: 12   Global Step: 520330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:12:48,172-Speed 2629.82 samples/sec   Loss 5.0373   LearningRate 0.0139   Epoch: 12   Global Step: 520340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:12:52,072-Speed 2626.03 samples/sec   Loss 5.0171   LearningRate 0.0139   Epoch: 12   Global Step: 520350   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:12:55,944-Speed 2645.22 samples/sec   Loss 5.0399   LearningRate 0.0139   Epoch: 12   Global Step: 520360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:12:59,861-Speed 2614.95 samples/sec   Loss 5.0539   LearningRate 0.0139   Epoch: 12   Global Step: 520370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:03,766-Speed 2623.10 samples/sec   Loss 4.9219   LearningRate 0.0139   Epoch: 12   Global Step: 520380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:07,664-Speed 2627.62 samples/sec   Loss 5.0764   LearningRate 0.0139   Epoch: 12   Global Step: 520390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:11,559-Speed 2629.52 samples/sec   Loss 5.0360   LearningRate 0.0139   Epoch: 12   Global Step: 520400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:15,459-Speed 2626.81 samples/sec   Loss 4.9224   LearningRate 0.0139   Epoch: 12   Global Step: 520410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:19,357-Speed 2627.52 samples/sec   Loss 5.0898   LearningRate 0.0139   Epoch: 12   Global Step: 520420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:23,261-Speed 2623.47 samples/sec   Loss 5.0804   LearningRate 0.0139   Epoch: 12   Global Step: 520430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:27,161-Speed 2626.03 samples/sec   Loss 4.9889   LearningRate 0.0139   Epoch: 12   Global Step: 520440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:31,077-Speed 2615.66 samples/sec   Loss 4.9984   LearningRate 0.0139   Epoch: 12   Global Step: 520450   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:34,974-Speed 2628.54 samples/sec   Loss 4.9416   LearningRate 0.0139   Epoch: 12   Global Step: 520460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:13:38,846-Speed 2645.50 samples/sec   Loss 5.0224   LearningRate 0.0139   Epoch: 12   Global Step: 520470   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:42,742-Speed 2628.64 samples/sec   Loss 5.1201   LearningRate 0.0139   Epoch: 12   Global Step: 520480   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:46,641-Speed 2627.03 samples/sec   Loss 4.9863   LearningRate 0.0139   Epoch: 12   Global Step: 520490   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:50,539-Speed 2627.83 samples/sec   Loss 4.9696   LearningRate 0.0139   Epoch: 12   Global Step: 520500   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:54,440-Speed 2625.73 samples/sec   Loss 4.8840   LearningRate 0.0139   Epoch: 12   Global Step: 520510   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:13:58,341-Speed 2626.25 samples/sec   Loss 5.1308   LearningRate 0.0139   Epoch: 12   Global Step: 520520   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:02,262-Speed 2612.08 samples/sec   Loss 4.9233   LearningRate 0.0139   Epoch: 12   Global Step: 520530   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:06,168-Speed 2622.32 samples/sec   Loss 4.9481   LearningRate 0.0139   Epoch: 12   Global Step: 520540   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:10,080-Speed 2617.81 samples/sec   Loss 5.0344   LearningRate 0.0139   Epoch: 12   Global Step: 520550   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:13,986-Speed 2622.95 samples/sec   Loss 4.9592   LearningRate 0.0139   Epoch: 12   Global Step: 520560   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:17,892-Speed 2622.18 samples/sec   Loss 4.9648   LearningRate 0.0139   Epoch: 12   Global Step: 520570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:14:21,792-Speed 2626.15 samples/sec   Loss 5.0611   LearningRate 0.0139   Epoch: 12   Global Step: 520580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:14:25,691-Speed 2626.23 samples/sec   Loss 4.9341   LearningRate 0.0139   Epoch: 12   Global Step: 520590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:14:29,629-Speed 2601.15 samples/sec   Loss 4.9702   LearningRate 0.0139   Epoch: 12   Global Step: 520600   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:14:33,529-Speed 2626.05 samples/sec   Loss 4.9677   LearningRate 0.0139   Epoch: 12   Global Step: 520610   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:14:37,429-Speed 2628.39 samples/sec   Loss 4.9786   LearningRate 0.0139   Epoch: 12   Global Step: 520620   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:14:41,362-Speed 2603.69 samples/sec   Loss 5.0035   LearningRate 0.0139   Epoch: 12   Global Step: 520630   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:14:45,234-Speed 2645.45 samples/sec   Loss 5.1219   LearningRate 0.0139   Epoch: 12   Global Step: 520640   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:49,135-Speed 2625.54 samples/sec   Loss 4.9714   LearningRate 0.0139   Epoch: 12   Global Step: 520650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:53,036-Speed 2626.12 samples/sec   Loss 5.0319   LearningRate 0.0139   Epoch: 12   Global Step: 520660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:14:56,936-Speed 2626.27 samples/sec   Loss 4.9672   LearningRate 0.0139   Epoch: 12   Global Step: 520670   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:00,921-Speed 2569.84 samples/sec   Loss 4.9464   LearningRate 0.0139   Epoch: 12   Global Step: 520680   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:04,847-Speed 2609.28 samples/sec   Loss 4.9857   LearningRate 0.0139   Epoch: 12   Global Step: 520690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:08,747-Speed 2626.96 samples/sec   Loss 4.9815   LearningRate 0.0139   Epoch: 12   Global Step: 520700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:12,653-Speed 2622.85 samples/sec   Loss 4.9874   LearningRate 0.0139   Epoch: 12   Global Step: 520710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:16,550-Speed 2628.47 samples/sec   Loss 5.0115   LearningRate 0.0139   Epoch: 12   Global Step: 520720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:20,446-Speed 2629.44 samples/sec   Loss 4.9840   LearningRate 0.0139   Epoch: 12   Global Step: 520730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:24,342-Speed 2628.65 samples/sec   Loss 4.9720   LearningRate 0.0139   Epoch: 12   Global Step: 520740   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:15:28,216-Speed 2644.74 samples/sec   Loss 4.9252   LearningRate 0.0139   Epoch: 12   Global Step: 520750   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:32,113-Speed 2627.99 samples/sec   Loss 5.0201   LearningRate 0.0139   Epoch: 12   Global Step: 520760   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:36,012-Speed 2627.07 samples/sec   Loss 5.1497   LearningRate 0.0139   Epoch: 12   Global Step: 520770   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:39,910-Speed 2627.39 samples/sec   Loss 5.0178   LearningRate 0.0139   Epoch: 12   Global Step: 520780   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:43,809-Speed 2627.26 samples/sec   Loss 5.0264   LearningRate 0.0139   Epoch: 12   Global Step: 520790   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:47,805-Speed 2563.29 samples/sec   Loss 5.0287   LearningRate 0.0139   Epoch: 12   Global Step: 520800   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:51,717-Speed 2618.20 samples/sec   Loss 5.0597   LearningRate 0.0139   Epoch: 12   Global Step: 520810   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:55,637-Speed 2613.74 samples/sec   Loss 4.9533   LearningRate 0.0139   Epoch: 12   Global Step: 520820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:15:59,536-Speed 2626.52 samples/sec   Loss 5.0332   LearningRate 0.0139   Epoch: 12   Global Step: 520830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:03,438-Speed 2624.93 samples/sec   Loss 5.0167   LearningRate 0.0139   Epoch: 12   Global Step: 520840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:07,345-Speed 2621.04 samples/sec   Loss 4.9966   LearningRate 0.0138   Epoch: 12   Global Step: 520850   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:16:11,237-Speed 2632.23 samples/sec   Loss 5.0092   LearningRate 0.0138   Epoch: 12   Global Step: 520860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:15,138-Speed 2625.44 samples/sec   Loss 4.9690   LearningRate 0.0138   Epoch: 12   Global Step: 520870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:19,094-Speed 2589.80 samples/sec   Loss 5.0372   LearningRate 0.0138   Epoch: 12   Global Step: 520880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:22,988-Speed 2630.54 samples/sec   Loss 4.9239   LearningRate 0.0138   Epoch: 12   Global Step: 520890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:26,901-Speed 2617.50 samples/sec   Loss 4.8928   LearningRate 0.0138   Epoch: 12   Global Step: 520900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:30,804-Speed 2623.88 samples/sec   Loss 4.9594   LearningRate 0.0138   Epoch: 12   Global Step: 520910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:34,707-Speed 2624.41 samples/sec   Loss 5.0305   LearningRate 0.0138   Epoch: 12   Global Step: 520920   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:38,605-Speed 2627.48 samples/sec   Loss 4.9708   LearningRate 0.0138   Epoch: 12   Global Step: 520930   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:42,503-Speed 2628.21 samples/sec   Loss 4.9704   LearningRate 0.0138   Epoch: 12   Global Step: 520940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:46,399-Speed 2628.99 samples/sec   Loss 4.9351   LearningRate 0.0138   Epoch: 12   Global Step: 520950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:16:50,300-Speed 2625.30 samples/sec   Loss 4.9711   LearningRate 0.0138   Epoch: 12   Global Step: 520960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:16:54,198-Speed 2627.69 samples/sec   Loss 4.9924   LearningRate 0.0138   Epoch: 12   Global Step: 520970   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:16:58,091-Speed 2631.30 samples/sec   Loss 5.0014   LearningRate 0.0138   Epoch: 12   Global Step: 520980   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:17:01,992-Speed 2625.57 samples/sec   Loss 4.9166   LearningRate 0.0138   Epoch: 12   Global Step: 520990   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:17:05,866-Speed 2643.57 samples/sec   Loss 4.9928   LearningRate 0.0138   Epoch: 12   Global Step: 521000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:09,797-Speed 2605.41 samples/sec   Loss 5.0258   LearningRate 0.0138   Epoch: 12   Global Step: 521010   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:13,697-Speed 2626.41 samples/sec   Loss 4.9043   LearningRate 0.0138   Epoch: 12   Global Step: 521020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:17,598-Speed 2625.67 samples/sec   Loss 4.9820   LearningRate 0.0138   Epoch: 12   Global Step: 521030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:21,497-Speed 2627.25 samples/sec   Loss 5.0478   LearningRate 0.0138   Epoch: 12   Global Step: 521040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:25,395-Speed 2628.11 samples/sec   Loss 4.9409   LearningRate 0.0138   Epoch: 12   Global Step: 521050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:29,292-Speed 2628.20 samples/sec   Loss 5.0975   LearningRate 0.0138   Epoch: 12   Global Step: 521060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:33,188-Speed 2629.13 samples/sec   Loss 4.9684   LearningRate 0.0138   Epoch: 12   Global Step: 521070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:37,087-Speed 2626.33 samples/sec   Loss 4.9988   LearningRate 0.0138   Epoch: 12   Global Step: 521080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:40,984-Speed 2628.09 samples/sec   Loss 4.9689   LearningRate 0.0138   Epoch: 12   Global Step: 521090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:17:44,882-Speed 2627.83 samples/sec   Loss 4.9440   LearningRate 0.0138   Epoch: 12   Global Step: 521100   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:17:48,791-Speed 2620.78 samples/sec   Loss 4.9928   LearningRate 0.0138   Epoch: 12   Global Step: 521110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:17:52,691-Speed 2625.70 samples/sec   Loss 4.9345   LearningRate 0.0138   Epoch: 12   Global Step: 521120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:17:56,592-Speed 2625.96 samples/sec   Loss 5.0380   LearningRate 0.0138   Epoch: 12   Global Step: 521130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:18:00,478-Speed 2635.85 samples/sec   Loss 4.9941   LearningRate 0.0138   Epoch: 12   Global Step: 521140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:04,377-Speed 2626.36 samples/sec   Loss 4.9904   LearningRate 0.0138   Epoch: 12   Global Step: 521150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:08,275-Speed 2627.63 samples/sec   Loss 5.0102   LearningRate 0.0138   Epoch: 12   Global Step: 521160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:12,174-Speed 2627.68 samples/sec   Loss 5.0040   LearningRate 0.0138   Epoch: 12   Global Step: 521170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:16,067-Speed 2630.93 samples/sec   Loss 4.9228   LearningRate 0.0138   Epoch: 12   Global Step: 521180   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:19,990-Speed 2610.95 samples/sec   Loss 5.1751   LearningRate 0.0138   Epoch: 12   Global Step: 521190   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:23,886-Speed 2629.17 samples/sec   Loss 5.0011   LearningRate 0.0138   Epoch: 12   Global Step: 521200   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:27,782-Speed 2628.87 samples/sec   Loss 4.9459   LearningRate 0.0138   Epoch: 12   Global Step: 521210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:31,680-Speed 2627.54 samples/sec   Loss 4.8379   LearningRate 0.0138   Epoch: 12   Global Step: 521220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:35,586-Speed 2622.86 samples/sec   Loss 5.0003   LearningRate 0.0138   Epoch: 12   Global Step: 521230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:39,480-Speed 2629.89 samples/sec   Loss 5.0183   LearningRate 0.0138   Epoch: 12   Global Step: 521240   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:18:43,385-Speed 2623.30 samples/sec   Loss 4.9802   LearningRate 0.0138   Epoch: 12   Global Step: 521250   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:18:47,274-Speed 2633.92 samples/sec   Loss 5.0106   LearningRate 0.0138   Epoch: 12   Global Step: 521260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:51,176-Speed 2624.72 samples/sec   Loss 4.9538   LearningRate 0.0138   Epoch: 12   Global Step: 521270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:55,092-Speed 2615.32 samples/sec   Loss 4.9377   LearningRate 0.0138   Epoch: 12   Global Step: 521280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:18:58,996-Speed 2623.82 samples/sec   Loss 4.9746   LearningRate 0.0138   Epoch: 12   Global Step: 521290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:02,913-Speed 2615.36 samples/sec   Loss 4.9387   LearningRate 0.0138   Epoch: 12   Global Step: 521300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:06,903-Speed 2566.92 samples/sec   Loss 4.9835   LearningRate 0.0138   Epoch: 12   Global Step: 521310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:10,816-Speed 2617.98 samples/sec   Loss 4.9005   LearningRate 0.0138   Epoch: 12   Global Step: 521320   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:14,718-Speed 2624.75 samples/sec   Loss 4.9205   LearningRate 0.0138   Epoch: 12   Global Step: 521330   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:18,621-Speed 2624.21 samples/sec   Loss 4.9134   LearningRate 0.0138   Epoch: 12   Global Step: 521340   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:22,520-Speed 2626.63 samples/sec   Loss 5.0373   LearningRate 0.0138   Epoch: 12   Global Step: 521350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:26,468-Speed 2594.55 samples/sec   Loss 4.9085   LearningRate 0.0138   Epoch: 12   Global Step: 521360   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:19:30,360-Speed 2631.65 samples/sec   Loss 4.9921   LearningRate 0.0138   Epoch: 12   Global Step: 521370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:34,263-Speed 2624.86 samples/sec   Loss 4.9943   LearningRate 0.0138   Epoch: 12   Global Step: 521380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:38,172-Speed 2619.87 samples/sec   Loss 4.9912   LearningRate 0.0138   Epoch: 12   Global Step: 521390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:42,084-Speed 2618.65 samples/sec   Loss 5.0068   LearningRate 0.0138   Epoch: 12   Global Step: 521400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:46,026-Speed 2597.79 samples/sec   Loss 4.9586   LearningRate 0.0138   Epoch: 12   Global Step: 521410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:49,939-Speed 2618.22 samples/sec   Loss 4.8691   LearningRate 0.0138   Epoch: 12   Global Step: 521420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:53,842-Speed 2624.24 samples/sec   Loss 5.0029   LearningRate 0.0138   Epoch: 12   Global Step: 521430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:19:57,742-Speed 2626.14 samples/sec   Loss 5.0262   LearningRate 0.0138   Epoch: 12   Global Step: 521440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:01,652-Speed 2619.76 samples/sec   Loss 5.0220   LearningRate 0.0138   Epoch: 12   Global Step: 521450   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:05,700-Speed 2530.25 samples/sec   Loss 4.8731   LearningRate 0.0138   Epoch: 12   Global Step: 521460   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:09,714-Speed 2551.87 samples/sec   Loss 5.0006   LearningRate 0.0138   Epoch: 12   Global Step: 521470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:20:13,611-Speed 2628.72 samples/sec   Loss 4.9582   LearningRate 0.0138   Epoch: 12   Global Step: 521480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:20:17,509-Speed 2627.02 samples/sec   Loss 4.9448   LearningRate 0.0138   Epoch: 12   Global Step: 521490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:20:21,418-Speed 2620.46 samples/sec   Loss 4.9744   LearningRate 0.0138   Epoch: 12   Global Step: 521500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:20:25,311-Speed 2630.78 samples/sec   Loss 4.9922   LearningRate 0.0138   Epoch: 12   Global Step: 521510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:20:29,210-Speed 2627.18 samples/sec   Loss 5.0168   LearningRate 0.0138   Epoch: 12   Global Step: 521520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:20:33,090-Speed 2640.55 samples/sec   Loss 5.0209   LearningRate 0.0138   Epoch: 12   Global Step: 521530   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:36,990-Speed 2625.87 samples/sec   Loss 4.9192   LearningRate 0.0138   Epoch: 12   Global Step: 521540   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:40,892-Speed 2625.07 samples/sec   Loss 5.0466   LearningRate 0.0138   Epoch: 12   Global Step: 521550   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:44,801-Speed 2621.03 samples/sec   Loss 5.0060   LearningRate 0.0138   Epoch: 12   Global Step: 521560   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:48,707-Speed 2622.01 samples/sec   Loss 5.0365   LearningRate 0.0138   Epoch: 12   Global Step: 521570   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:52,608-Speed 2625.83 samples/sec   Loss 4.9617   LearningRate 0.0138   Epoch: 12   Global Step: 521580   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:20:56,524-Speed 2615.75 samples/sec   Loss 5.0466   LearningRate 0.0138   Epoch: 12   Global Step: 521590   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:00,425-Speed 2625.42 samples/sec   Loss 5.0559   LearningRate 0.0138   Epoch: 12   Global Step: 521600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:04,343-Speed 2613.96 samples/sec   Loss 4.8999   LearningRate 0.0138   Epoch: 12   Global Step: 521610   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:08,249-Speed 2622.56 samples/sec   Loss 5.0890   LearningRate 0.0138   Epoch: 12   Global Step: 521620   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:12,152-Speed 2624.49 samples/sec   Loss 4.9968   LearningRate 0.0138   Epoch: 12   Global Step: 521630   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:21:16,046-Speed 2629.94 samples/sec   Loss 5.0261   LearningRate 0.0138   Epoch: 12   Global Step: 521640   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:21:19,963-Speed 2615.53 samples/sec   Loss 5.0124   LearningRate 0.0138   Epoch: 12   Global Step: 521650   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:21:23,861-Speed 2627.51 samples/sec   Loss 4.9853   LearningRate 0.0138   Epoch: 12   Global Step: 521660   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:21:27,769-Speed 2621.51 samples/sec   Loss 5.0073   LearningRate 0.0138   Epoch: 12   Global Step: 521670   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:21:31,664-Speed 2629.05 samples/sec   Loss 4.9376   LearningRate 0.0138   Epoch: 12   Global Step: 521680   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:21:35,582-Speed 2614.79 samples/sec   Loss 5.0010   LearningRate 0.0138   Epoch: 12   Global Step: 521690   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:21:39,463-Speed 2638.69 samples/sec   Loss 4.9488   LearningRate 0.0138   Epoch: 12   Global Step: 521700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:43,362-Speed 2627.52 samples/sec   Loss 5.0598   LearningRate 0.0138   Epoch: 12   Global Step: 521710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:47,266-Speed 2623.56 samples/sec   Loss 4.9709   LearningRate 0.0138   Epoch: 12   Global Step: 521720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:51,176-Speed 2620.35 samples/sec   Loss 5.0302   LearningRate 0.0138   Epoch: 12   Global Step: 521730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:55,092-Speed 2615.56 samples/sec   Loss 4.9492   LearningRate 0.0138   Epoch: 12   Global Step: 521740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:21:58,989-Speed 2628.33 samples/sec   Loss 4.9944   LearningRate 0.0138   Epoch: 12   Global Step: 521750   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:02,886-Speed 2628.53 samples/sec   Loss 5.0113   LearningRate 0.0138   Epoch: 12   Global Step: 521760   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:06,794-Speed 2620.64 samples/sec   Loss 4.9711   LearningRate 0.0138   Epoch: 12   Global Step: 521770   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:10,693-Speed 2626.80 samples/sec   Loss 5.0367   LearningRate 0.0138   Epoch: 12   Global Step: 521780   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:14,639-Speed 2596.41 samples/sec   Loss 5.0508   LearningRate 0.0138   Epoch: 12   Global Step: 521790   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:18,536-Speed 2628.18 samples/sec   Loss 5.0083   LearningRate 0.0138   Epoch: 12   Global Step: 521800   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:22:22,448-Speed 2618.15 samples/sec   Loss 4.9936   LearningRate 0.0138   Epoch: 12   Global Step: 521810   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:22:26,325-Speed 2642.43 samples/sec   Loss 4.9472   LearningRate 0.0138   Epoch: 12   Global Step: 521820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:30,227-Speed 2624.78 samples/sec   Loss 4.9437   LearningRate 0.0138   Epoch: 12   Global Step: 521830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:34,128-Speed 2626.05 samples/sec   Loss 5.0318   LearningRate 0.0138   Epoch: 12   Global Step: 521840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:38,028-Speed 2625.53 samples/sec   Loss 5.0831   LearningRate 0.0138   Epoch: 12   Global Step: 521850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:41,931-Speed 2625.26 samples/sec   Loss 4.9444   LearningRate 0.0138   Epoch: 12   Global Step: 521860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:45,832-Speed 2625.34 samples/sec   Loss 4.9662   LearningRate 0.0138   Epoch: 12   Global Step: 521870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:49,731-Speed 2627.59 samples/sec   Loss 5.0007   LearningRate 0.0138   Epoch: 12   Global Step: 521880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:53,629-Speed 2627.30 samples/sec   Loss 4.9850   LearningRate 0.0138   Epoch: 12   Global Step: 521890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:22:57,526-Speed 2629.07 samples/sec   Loss 4.9395   LearningRate 0.0138   Epoch: 12   Global Step: 521900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:01,427-Speed 2625.54 samples/sec   Loss 5.1562   LearningRate 0.0138   Epoch: 12   Global Step: 521910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:05,334-Speed 2621.49 samples/sec   Loss 5.0132   LearningRate 0.0138   Epoch: 12   Global Step: 521920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:23:09,237-Speed 2624.08 samples/sec   Loss 4.9990   LearningRate 0.0138   Epoch: 12   Global Step: 521930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:23:13,134-Speed 2628.56 samples/sec   Loss 5.0465   LearningRate 0.0138   Epoch: 12   Global Step: 521940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:23:17,032-Speed 2627.80 samples/sec   Loss 4.8250   LearningRate 0.0138   Epoch: 12   Global Step: 521950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:23:20,926-Speed 2630.95 samples/sec   Loss 4.9237   LearningRate 0.0138   Epoch: 12   Global Step: 521960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:23:24,804-Speed 2641.15 samples/sec   Loss 4.9714   LearningRate 0.0137   Epoch: 12   Global Step: 521970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:28,714-Speed 2619.73 samples/sec   Loss 4.9117   LearningRate 0.0137   Epoch: 12   Global Step: 521980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:32,621-Speed 2621.42 samples/sec   Loss 4.9772   LearningRate 0.0137   Epoch: 12   Global Step: 521990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:36,525-Speed 2623.74 samples/sec   Loss 4.9728   LearningRate 0.0137   Epoch: 12   Global Step: 522000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:40,469-Speed 2596.61 samples/sec   Loss 5.0856   LearningRate 0.0137   Epoch: 12   Global Step: 522010   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:44,364-Speed 2630.46 samples/sec   Loss 4.9645   LearningRate 0.0137   Epoch: 12   Global Step: 522020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:48,256-Speed 2630.95 samples/sec   Loss 4.9672   LearningRate 0.0137   Epoch: 12   Global Step: 522030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:52,153-Speed 2628.74 samples/sec   Loss 5.0913   LearningRate 0.0137   Epoch: 12   Global Step: 522040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:56,049-Speed 2628.98 samples/sec   Loss 4.9253   LearningRate 0.0137   Epoch: 12   Global Step: 522050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:23:59,948-Speed 2627.42 samples/sec   Loss 5.0310   LearningRate 0.0137   Epoch: 12   Global Step: 522060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:03,842-Speed 2630.01 samples/sec   Loss 4.8804   LearningRate 0.0137   Epoch: 12   Global Step: 522070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:24:07,750-Speed 2620.88 samples/sec   Loss 5.0139   LearningRate 0.0137   Epoch: 12   Global Step: 522080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:24:11,652-Speed 2624.75 samples/sec   Loss 5.1034   LearningRate 0.0137   Epoch: 12   Global Step: 522090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:15,559-Speed 2621.82 samples/sec   Loss 4.9877   LearningRate 0.0137   Epoch: 12   Global Step: 522100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:19,460-Speed 2625.99 samples/sec   Loss 4.9765   LearningRate 0.0137   Epoch: 12   Global Step: 522110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:23,361-Speed 2625.50 samples/sec   Loss 4.9984   LearningRate 0.0137   Epoch: 12   Global Step: 522120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:27,262-Speed 2624.88 samples/sec   Loss 4.8994   LearningRate 0.0137   Epoch: 12   Global Step: 522130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:31,168-Speed 2622.11 samples/sec   Loss 5.0091   LearningRate 0.0137   Epoch: 12   Global Step: 522140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:35,074-Speed 2622.82 samples/sec   Loss 5.0190   LearningRate 0.0137   Epoch: 12   Global Step: 522150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:38,970-Speed 2629.38 samples/sec   Loss 4.9467   LearningRate 0.0137   Epoch: 12   Global Step: 522160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:42,867-Speed 2628.47 samples/sec   Loss 4.9319   LearningRate 0.0137   Epoch: 12   Global Step: 522170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:46,764-Speed 2628.51 samples/sec   Loss 4.9560   LearningRate 0.0137   Epoch: 12   Global Step: 522180   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:24:50,660-Speed 2628.99 samples/sec   Loss 4.9726   LearningRate 0.0137   Epoch: 12   Global Step: 522190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:24:54,606-Speed 2595.18 samples/sec   Loss 5.0377   LearningRate 0.0137   Epoch: 12   Global Step: 522200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:24:58,499-Speed 2630.77 samples/sec   Loss 5.0071   LearningRate 0.0137   Epoch: 12   Global Step: 522210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:25:02,412-Speed 2617.55 samples/sec   Loss 4.9020   LearningRate 0.0137   Epoch: 12   Global Step: 522220   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:25:06,314-Speed 2625.44 samples/sec   Loss 4.9040   LearningRate 0.0137   Epoch: 12   Global Step: 522230   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:25:10,189-Speed 2643.63 samples/sec   Loss 4.9132   LearningRate 0.0137   Epoch: 12   Global Step: 522240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:14,084-Speed 2629.18 samples/sec   Loss 5.0793   LearningRate 0.0137   Epoch: 12   Global Step: 522250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:17,986-Speed 2625.58 samples/sec   Loss 4.9755   LearningRate 0.0137   Epoch: 12   Global Step: 522260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:21,888-Speed 2624.18 samples/sec   Loss 4.9893   LearningRate 0.0137   Epoch: 12   Global Step: 522270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:25,786-Speed 2628.02 samples/sec   Loss 4.9940   LearningRate 0.0137   Epoch: 12   Global Step: 522280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:29,685-Speed 2626.28 samples/sec   Loss 5.0209   LearningRate 0.0137   Epoch: 12   Global Step: 522290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:33,585-Speed 2626.95 samples/sec   Loss 4.8631   LearningRate 0.0137   Epoch: 12   Global Step: 522300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:37,482-Speed 2627.64 samples/sec   Loss 4.9506   LearningRate 0.0137   Epoch: 12   Global Step: 522310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:41,379-Speed 2628.50 samples/sec   Loss 5.0252   LearningRate 0.0137   Epoch: 12   Global Step: 522320   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:45,280-Speed 2625.92 samples/sec   Loss 4.9686   LearningRate 0.0137   Epoch: 12   Global Step: 522330   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:49,180-Speed 2630.45 samples/sec   Loss 5.0238   LearningRate 0.0137   Epoch: 12   Global Step: 522340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:25:53,051-Speed 2645.55 samples/sec   Loss 4.9019   LearningRate 0.0137   Epoch: 12   Global Step: 522350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:25:56,949-Speed 2627.17 samples/sec   Loss 5.0480   LearningRate 0.0137   Epoch: 12   Global Step: 522360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:00,842-Speed 2630.92 samples/sec   Loss 4.9141   LearningRate 0.0137   Epoch: 12   Global Step: 522370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:04,737-Speed 2629.95 samples/sec   Loss 4.9360   LearningRate 0.0137   Epoch: 12   Global Step: 522380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:08,635-Speed 2628.28 samples/sec   Loss 4.9378   LearningRate 0.0137   Epoch: 12   Global Step: 522390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:12,529-Speed 2629.87 samples/sec   Loss 4.9167   LearningRate 0.0137   Epoch: 12   Global Step: 522400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:16,422-Speed 2631.54 samples/sec   Loss 4.9951   LearningRate 0.0137   Epoch: 12   Global Step: 522410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:20,315-Speed 2630.83 samples/sec   Loss 4.9818   LearningRate 0.0137   Epoch: 12   Global Step: 522420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:24,211-Speed 2629.11 samples/sec   Loss 4.9744   LearningRate 0.0137   Epoch: 12   Global Step: 522430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:28,101-Speed 2632.86 samples/sec   Loss 4.9908   LearningRate 0.0137   Epoch: 12   Global Step: 522440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-15 06:26:31,995-Speed 2630.67 samples/sec   Loss 4.9069   LearningRate 0.0137   Epoch: 12   Global Step: 522450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:26:35,889-Speed 2629.98 samples/sec   Loss 4.9767   LearningRate 0.0137   Epoch: 12   Global Step: 522460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:26:39,785-Speed 2628.79 samples/sec   Loss 5.0789   LearningRate 0.0137   Epoch: 12   Global Step: 522470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:26:43,679-Speed 2631.65 samples/sec   Loss 5.0251   LearningRate 0.0137   Epoch: 12   Global Step: 522480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:26:47,578-Speed 2626.69 samples/sec   Loss 4.9557   LearningRate 0.0137   Epoch: 12   Global Step: 522490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:26:51,476-Speed 2627.67 samples/sec   Loss 4.9737   LearningRate 0.0137   Epoch: 12   Global Step: 522500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:26:55,397-Speed 2612.40 samples/sec   Loss 4.9312   LearningRate 0.0137   Epoch: 12   Global Step: 522510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:26:59,294-Speed 2627.85 samples/sec   Loss 5.0026   LearningRate 0.0137   Epoch: 12   Global Step: 522520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:27:03,189-Speed 2629.62 samples/sec   Loss 4.9860   LearningRate 0.0137   Epoch: 12   Global Step: 522530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:27:07,091-Speed 2625.32 samples/sec   Loss 4.8960   LearningRate 0.0137   Epoch: 12   Global Step: 522540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:27:10,961-Speed 2646.21 samples/sec   Loss 5.0272   LearningRate 0.0137   Epoch: 12   Global Step: 522550   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:27:14,873-Speed 2618.54 samples/sec   Loss 5.0502   LearningRate 0.0137   Epoch: 12   Global Step: 522560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-15 06:27:18,769-Speed 2629.35 samples/sec   Loss 4.8761   LearningRate 0.0137   Epoch: 12   Global Step: 522570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:27:22,664-Speed 2629.73 samples/sec   Loss 4.9519   LearningRate 0.0137   Epoch: 12   Global Step: 522580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:27:26,560-Speed 2628.77 samples/sec   Loss 4.9156   LearningRate 0.0137   Epoch: 12   Global Step: 522590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:27:30,467-Speed 2621.68 samples/sec   Loss 5.0481   LearningRate 0.0137   Epoch: 12   Global Step: 522600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:27:34,365-Speed 2627.30 samples/sec   Loss 4.9199   LearningRate 0.0137   Epoch: 12   Global Step: 522610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:27:38,269-Speed 2623.20 samples/sec   Loss 4.9501   LearningRate 0.0137   Epoch: 12   Global Step: 522620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:27:42,174-Speed 2623.56 samples/sec   Loss 4.9939   LearningRate 0.0137   Epoch: 12   Global Step: 522630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:27:46,060-Speed 2635.14 samples/sec   Loss 4.9141   LearningRate 0.0137   Epoch: 12   Global Step: 522640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:27:49,960-Speed 2627.23 samples/sec   Loss 5.0566   LearningRate 0.0137   Epoch: 12   Global Step: 522650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:27:53,860-Speed 2626.37 samples/sec   Loss 4.9478   LearningRate 0.0137   Epoch: 12   Global Step: 522660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:27:57,772-Speed 2618.01 samples/sec   Loss 4.8816   LearningRate 0.0137   Epoch: 12   Global Step: 522670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:01,675-Speed 2624.31 samples/sec   Loss 4.9795   LearningRate 0.0137   Epoch: 12   Global Step: 522680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:05,574-Speed 2626.32 samples/sec   Loss 4.9839   LearningRate 0.0137   Epoch: 12   Global Step: 522690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:09,468-Speed 2630.54 samples/sec   Loss 4.9874   LearningRate 0.0137   Epoch: 12   Global Step: 522700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:13,364-Speed 2629.19 samples/sec   Loss 4.9874   LearningRate 0.0137   Epoch: 12   Global Step: 522710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:17,260-Speed 2628.74 samples/sec   Loss 4.9212   LearningRate 0.0137   Epoch: 12   Global Step: 522720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:21,157-Speed 2628.12 samples/sec   Loss 4.9615   LearningRate 0.0137   Epoch: 12   Global Step: 522730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:25,058-Speed 2625.77 samples/sec   Loss 4.8467   LearningRate 0.0137   Epoch: 12   Global Step: 522740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:28:28,970-Speed 2618.96 samples/sec   Loss 4.8711   LearningRate 0.0137   Epoch: 12   Global Step: 522750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:28:32,862-Speed 2631.40 samples/sec   Loss 5.0405   LearningRate 0.0137   Epoch: 12   Global Step: 522760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:28:36,758-Speed 2628.38 samples/sec   Loss 4.8831   LearningRate 0.0137   Epoch: 12   Global Step: 522770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:28:40,657-Speed 2626.84 samples/sec   Loss 5.0045   LearningRate 0.0137   Epoch: 12   Global Step: 522780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:28:44,552-Speed 2630.19 samples/sec   Loss 4.9713   LearningRate 0.0137   Epoch: 12   Global Step: 522790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:48,449-Speed 2628.16 samples/sec   Loss 4.9856   LearningRate 0.0137   Epoch: 12   Global Step: 522800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:52,452-Speed 2559.02 samples/sec   Loss 4.9230   LearningRate 0.0137   Epoch: 12   Global Step: 522810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:28:56,353-Speed 2625.82 samples/sec   Loss 4.9587   LearningRate 0.0137   Epoch: 12   Global Step: 522820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:00,247-Speed 2631.43 samples/sec   Loss 4.9237   LearningRate 0.0137   Epoch: 12   Global Step: 522830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:04,141-Speed 2630.16 samples/sec   Loss 5.0307   LearningRate 0.0137   Epoch: 12   Global Step: 522840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:08,035-Speed 2629.86 samples/sec   Loss 4.8518   LearningRate 0.0137   Epoch: 12   Global Step: 522850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:11,940-Speed 2622.59 samples/sec   Loss 4.9369   LearningRate 0.0137   Epoch: 12   Global Step: 522860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:15,838-Speed 2627.78 samples/sec   Loss 4.8586   LearningRate 0.0137   Epoch: 12   Global Step: 522870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:19,740-Speed 2625.58 samples/sec   Loss 5.0168   LearningRate 0.0137   Epoch: 12   Global Step: 522880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:23,638-Speed 2627.03 samples/sec   Loss 4.9629   LearningRate 0.0137   Epoch: 12   Global Step: 522890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:27,562-Speed 2610.82 samples/sec   Loss 4.9984   LearningRate 0.0137   Epoch: 12   Global Step: 522900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:31,520-Speed 2587.16 samples/sec   Loss 4.9173   LearningRate 0.0137   Epoch: 12   Global Step: 522910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:35,419-Speed 2627.38 samples/sec   Loss 5.0091   LearningRate 0.0137   Epoch: 12   Global Step: 522920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:39,323-Speed 2622.89 samples/sec   Loss 5.0167   LearningRate 0.0137   Epoch: 12   Global Step: 522930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:43,265-Speed 2598.70 samples/sec   Loss 4.9981   LearningRate 0.0137   Epoch: 12   Global Step: 522940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:47,207-Speed 2598.29 samples/sec   Loss 4.9107   LearningRate 0.0137   Epoch: 12   Global Step: 522950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:51,105-Speed 2628.40 samples/sec   Loss 4.9178   LearningRate 0.0137   Epoch: 12   Global Step: 522960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:29:55,069-Speed 2583.22 samples/sec   Loss 5.0427   LearningRate 0.0137   Epoch: 12   Global Step: 522970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:29:59,058-Speed 2568.09 samples/sec   Loss 4.9765   LearningRate 0.0137   Epoch: 12   Global Step: 522980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:02,955-Speed 2628.71 samples/sec   Loss 5.0281   LearningRate 0.0137   Epoch: 12   Global Step: 522990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:06,853-Speed 2626.94 samples/sec   Loss 4.9598   LearningRate 0.0137   Epoch: 12   Global Step: 523000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:10,769-Speed 2615.89 samples/sec   Loss 5.0785   LearningRate 0.0137   Epoch: 12   Global Step: 523010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:14,667-Speed 2627.41 samples/sec   Loss 4.9980   LearningRate 0.0137   Epoch: 12   Global Step: 523020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:18,570-Speed 2624.52 samples/sec   Loss 5.0646   LearningRate 0.0137   Epoch: 12   Global Step: 523030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:22,466-Speed 2628.82 samples/sec   Loss 4.9658   LearningRate 0.0137   Epoch: 12   Global Step: 523040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:26,363-Speed 2627.91 samples/sec   Loss 4.9591   LearningRate 0.0137   Epoch: 12   Global Step: 523050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:30,267-Speed 2624.29 samples/sec   Loss 4.9363   LearningRate 0.0137   Epoch: 12   Global Step: 523060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:34,212-Speed 2596.50 samples/sec   Loss 4.9989   LearningRate 0.0137   Epoch: 12   Global Step: 523070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:30:38,109-Speed 2628.41 samples/sec   Loss 5.0229   LearningRate 0.0137   Epoch: 12   Global Step: 523080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:30:42,005-Speed 2629.36 samples/sec   Loss 4.8846   LearningRate 0.0136   Epoch: 12   Global Step: 523090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:30:45,880-Speed 2642.87 samples/sec   Loss 4.8748   LearningRate 0.0136   Epoch: 12   Global Step: 523100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:30:49,765-Speed 2636.98 samples/sec   Loss 5.0545   LearningRate 0.0136   Epoch: 12   Global Step: 523110   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:30:53,670-Speed 2622.58 samples/sec   Loss 5.0215   LearningRate 0.0136   Epoch: 12   Global Step: 523120   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:30:57,577-Speed 2622.01 samples/sec   Loss 5.0074   LearningRate 0.0136   Epoch: 12   Global Step: 523130   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:01,468-Speed 2631.76 samples/sec   Loss 5.0732   LearningRate 0.0136   Epoch: 12   Global Step: 523140   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:05,360-Speed 2632.02 samples/sec   Loss 4.9683   LearningRate 0.0136   Epoch: 12   Global Step: 523150   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:09,255-Speed 2629.79 samples/sec   Loss 5.0402   LearningRate 0.0136   Epoch: 12   Global Step: 523160   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:13,175-Speed 2613.13 samples/sec   Loss 4.9476   LearningRate 0.0136   Epoch: 12   Global Step: 523170   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:17,080-Speed 2622.98 samples/sec   Loss 4.8381   LearningRate 0.0136   Epoch: 12   Global Step: 523180   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:20,972-Speed 2632.00 samples/sec   Loss 4.9108   LearningRate 0.0136   Epoch: 12   Global Step: 523190   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:24,878-Speed 2622.11 samples/sec   Loss 5.0128   LearningRate 0.0136   Epoch: 12   Global Step: 523200   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:28,781-Speed 2623.91 samples/sec   Loss 4.8696   LearningRate 0.0136   Epoch: 12   Global Step: 523210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:31:32,660-Speed 2640.50 samples/sec   Loss 4.9820   LearningRate 0.0136   Epoch: 12   Global Step: 523220   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:36,556-Speed 2628.83 samples/sec   Loss 4.8407   LearningRate 0.0136   Epoch: 12   Global Step: 523230   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:40,463-Speed 2621.53 samples/sec   Loss 4.9999   LearningRate 0.0136   Epoch: 12   Global Step: 523240   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:44,356-Speed 2630.81 samples/sec   Loss 5.0623   LearningRate 0.0136   Epoch: 12   Global Step: 523250   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:48,250-Speed 2630.89 samples/sec   Loss 4.9055   LearningRate 0.0136   Epoch: 12   Global Step: 523260   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:52,151-Speed 2625.80 samples/sec   Loss 5.0145   LearningRate 0.0136   Epoch: 12   Global Step: 523270   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:56,044-Speed 2630.84 samples/sec   Loss 4.8774   LearningRate 0.0136   Epoch: 12   Global Step: 523280   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:31:59,939-Speed 2629.69 samples/sec   Loss 5.0049   LearningRate 0.0136   Epoch: 12   Global Step: 523290   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:32:03,835-Speed 2629.01 samples/sec   Loss 5.0875   LearningRate 0.0136   Epoch: 12   Global Step: 523300   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:32:07,728-Speed 2630.38 samples/sec   Loss 4.8858   LearningRate 0.0136   Epoch: 12   Global Step: 523310   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:32:11,629-Speed 2626.28 samples/sec   Loss 4.9864   LearningRate 0.0136   Epoch: 12   Global Step: 523320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:15,527-Speed 2627.55 samples/sec   Loss 4.9568   LearningRate 0.0136   Epoch: 12   Global Step: 523330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:19,429-Speed 2624.67 samples/sec   Loss 4.9311   LearningRate 0.0136   Epoch: 12   Global Step: 523340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:23,328-Speed 2627.08 samples/sec   Loss 4.8370   LearningRate 0.0136   Epoch: 12   Global Step: 523350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:27,228-Speed 2626.73 samples/sec   Loss 5.0074   LearningRate 0.0136   Epoch: 12   Global Step: 523360   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:31,124-Speed 2628.75 samples/sec   Loss 4.9331   LearningRate 0.0136   Epoch: 12   Global Step: 523370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:35,020-Speed 2629.84 samples/sec   Loss 4.9023   LearningRate 0.0136   Epoch: 12   Global Step: 523380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:38,916-Speed 2628.41 samples/sec   Loss 5.0277   LearningRate 0.0136   Epoch: 12   Global Step: 523390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:42,814-Speed 2627.93 samples/sec   Loss 5.0114   LearningRate 0.0136   Epoch: 12   Global Step: 523400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:46,741-Speed 2608.48 samples/sec   Loss 5.0110   LearningRate 0.0136   Epoch: 12   Global Step: 523410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:32:50,634-Speed 2630.81 samples/sec   Loss 5.0109   LearningRate 0.0136   Epoch: 12   Global Step: 523420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:32:54,533-Speed 2627.22 samples/sec   Loss 4.8904   LearningRate 0.0136   Epoch: 12   Global Step: 523430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:32:58,433-Speed 2626.14 samples/sec   Loss 4.9689   LearningRate 0.0136   Epoch: 12   Global Step: 523440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:33:02,319-Speed 2636.15 samples/sec   Loss 4.9029   LearningRate 0.0136   Epoch: 12   Global Step: 523450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:33:06,211-Speed 2631.28 samples/sec   Loss 5.1053   LearningRate 0.0136   Epoch: 12   Global Step: 523460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:33:10,128-Speed 2615.12 samples/sec   Loss 4.9067   LearningRate 0.0136   Epoch: 12   Global Step: 523470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:33:14,025-Speed 2628.63 samples/sec   Loss 4.9219   LearningRate 0.0136   Epoch: 12   Global Step: 523480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:33:17,902-Speed 2641.83 samples/sec   Loss 4.9477   LearningRate 0.0136   Epoch: 12   Global Step: 523490   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:21,827-Speed 2609.76 samples/sec   Loss 4.8144   LearningRate 0.0136   Epoch: 12   Global Step: 523500   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:25,727-Speed 2626.22 samples/sec   Loss 4.9741   LearningRate 0.0136   Epoch: 12   Global Step: 523510   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:29,628-Speed 2625.95 samples/sec   Loss 4.9361   LearningRate 0.0136   Epoch: 12   Global Step: 523520   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:33,519-Speed 2632.14 samples/sec   Loss 4.9644   LearningRate 0.0136   Epoch: 12   Global Step: 523530   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:37,413-Speed 2630.43 samples/sec   Loss 4.9634   LearningRate 0.0136   Epoch: 12   Global Step: 523540   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:41,307-Speed 2629.74 samples/sec   Loss 4.9664   LearningRate 0.0136   Epoch: 12   Global Step: 523550   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:45,199-Speed 2632.43 samples/sec   Loss 5.0021   LearningRate 0.0136   Epoch: 12   Global Step: 523560   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:49,092-Speed 2630.94 samples/sec   Loss 5.0069   LearningRate 0.0136   Epoch: 12   Global Step: 523570   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:52,997-Speed 2622.87 samples/sec   Loss 5.0482   LearningRate 0.0136   Epoch: 12   Global Step: 523580   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:33:56,944-Speed 2594.66 samples/sec   Loss 4.9194   LearningRate 0.0136   Epoch: 12   Global Step: 523590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:00,844-Speed 2627.36 samples/sec   Loss 4.9081   LearningRate 0.0136   Epoch: 12   Global Step: 523600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:04,739-Speed 2629.45 samples/sec   Loss 5.0584   LearningRate 0.0136   Epoch: 12   Global Step: 523610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:08,636-Speed 2627.94 samples/sec   Loss 4.8841   LearningRate 0.0136   Epoch: 12   Global Step: 523620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:12,529-Speed 2630.66 samples/sec   Loss 5.0066   LearningRate 0.0136   Epoch: 12   Global Step: 523630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:16,424-Speed 2629.57 samples/sec   Loss 4.9640   LearningRate 0.0136   Epoch: 12   Global Step: 523640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:20,320-Speed 2629.89 samples/sec   Loss 4.9481   LearningRate 0.0136   Epoch: 12   Global Step: 523650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:24,230-Speed 2619.46 samples/sec   Loss 4.8539   LearningRate 0.0136   Epoch: 12   Global Step: 523660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:28,227-Speed 2562.20 samples/sec   Loss 4.8975   LearningRate 0.0136   Epoch: 12   Global Step: 523670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:32,122-Speed 2629.79 samples/sec   Loss 4.8827   LearningRate 0.0136   Epoch: 12   Global Step: 523680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:35,996-Speed 2644.00 samples/sec   Loss 4.8949   LearningRate 0.0136   Epoch: 12   Global Step: 523690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:39,889-Speed 2631.05 samples/sec   Loss 4.9920   LearningRate 0.0136   Epoch: 12   Global Step: 523700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:43,803-Speed 2617.33 samples/sec   Loss 5.0544   LearningRate 0.0136   Epoch: 12   Global Step: 523710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:47,696-Speed 2631.11 samples/sec   Loss 4.9545   LearningRate 0.0136   Epoch: 12   Global Step: 523720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:51,595-Speed 2627.15 samples/sec   Loss 4.8944   LearningRate 0.0136   Epoch: 12   Global Step: 523730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:55,488-Speed 2630.55 samples/sec   Loss 4.8985   LearningRate 0.0136   Epoch: 12   Global Step: 523740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:34:59,397-Speed 2620.65 samples/sec   Loss 4.9250   LearningRate 0.0136   Epoch: 12   Global Step: 523750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:03,288-Speed 2632.22 samples/sec   Loss 4.9908   LearningRate 0.0136   Epoch: 12   Global Step: 523760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:07,178-Speed 2632.55 samples/sec   Loss 4.9962   LearningRate 0.0136   Epoch: 12   Global Step: 523770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:11,080-Speed 2625.00 samples/sec   Loss 5.0100   LearningRate 0.0136   Epoch: 12   Global Step: 523780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:14,975-Speed 2629.66 samples/sec   Loss 4.9530   LearningRate 0.0136   Epoch: 12   Global Step: 523790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:35:18,851-Speed 2642.17 samples/sec   Loss 4.8111   LearningRate 0.0136   Epoch: 12   Global Step: 523800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:22,751-Speed 2627.21 samples/sec   Loss 4.9558   LearningRate 0.0136   Epoch: 12   Global Step: 523810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:26,649-Speed 2627.81 samples/sec   Loss 4.9238   LearningRate 0.0136   Epoch: 12   Global Step: 523820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:30,543-Speed 2630.08 samples/sec   Loss 4.9625   LearningRate 0.0136   Epoch: 12   Global Step: 523830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:34,439-Speed 2628.80 samples/sec   Loss 5.0232   LearningRate 0.0136   Epoch: 12   Global Step: 523840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:38,334-Speed 2629.95 samples/sec   Loss 4.8707   LearningRate 0.0136   Epoch: 12   Global Step: 523850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:42,232-Speed 2627.38 samples/sec   Loss 5.0208   LearningRate 0.0136   Epoch: 12   Global Step: 523860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:46,125-Speed 2631.15 samples/sec   Loss 4.8253   LearningRate 0.0136   Epoch: 12   Global Step: 523870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:50,020-Speed 2629.54 samples/sec   Loss 4.9617   LearningRate 0.0136   Epoch: 12   Global Step: 523880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:53,920-Speed 2625.99 samples/sec   Loss 5.1934   LearningRate 0.0136   Epoch: 12   Global Step: 523890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:35:57,811-Speed 2632.86 samples/sec   Loss 4.8805   LearningRate 0.0136   Epoch: 12   Global Step: 523900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:36:01,702-Speed 2632.32 samples/sec   Loss 4.9360   LearningRate 0.0136   Epoch: 12   Global Step: 523910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:36:05,601-Speed 2627.24 samples/sec   Loss 4.9445   LearningRate 0.0136   Epoch: 12   Global Step: 523920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:36:09,503-Speed 2624.48 samples/sec   Loss 4.9052   LearningRate 0.0136   Epoch: 12   Global Step: 523930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:36:13,372-Speed 2648.02 samples/sec   Loss 5.0146   LearningRate 0.0136   Epoch: 12   Global Step: 523940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:17,264-Speed 2632.00 samples/sec   Loss 4.8632   LearningRate 0.0136   Epoch: 12   Global Step: 523950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:21,154-Speed 2632.84 samples/sec   Loss 4.8621   LearningRate 0.0136   Epoch: 12   Global Step: 523960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:25,066-Speed 2618.97 samples/sec   Loss 5.0333   LearningRate 0.0136   Epoch: 12   Global Step: 523970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:28,967-Speed 2624.95 samples/sec   Loss 5.0559   LearningRate 0.0136   Epoch: 12   Global Step: 523980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:32,866-Speed 2626.68 samples/sec   Loss 4.9037   LearningRate 0.0136   Epoch: 12   Global Step: 523990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:36,762-Speed 2629.05 samples/sec   Loss 4.9123   LearningRate 0.0136   Epoch: 12   Global Step: 524000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:40,655-Speed 2631.42 samples/sec   Loss 4.9940   LearningRate 0.0136   Epoch: 12   Global Step: 524010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:44,546-Speed 2632.45 samples/sec   Loss 5.0024   LearningRate 0.0136   Epoch: 12   Global Step: 524020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:48,443-Speed 2628.24 samples/sec   Loss 5.0095   LearningRate 0.0136   Epoch: 12   Global Step: 524030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:36:52,337-Speed 2630.84 samples/sec   Loss 4.9462   LearningRate 0.0136   Epoch: 12   Global Step: 524040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:36:56,230-Speed 2630.68 samples/sec   Loss 5.0668   LearningRate 0.0136   Epoch: 12   Global Step: 524050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:37:00,130-Speed 2625.78 samples/sec   Loss 4.8515   LearningRate 0.0136   Epoch: 12   Global Step: 524060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:37:04,036-Speed 2622.25 samples/sec   Loss 4.9744   LearningRate 0.0136   Epoch: 12   Global Step: 524070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:37:07,915-Speed 2641.19 samples/sec   Loss 4.8464   LearningRate 0.0136   Epoch: 12   Global Step: 524080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:11,814-Speed 2626.51 samples/sec   Loss 5.0482   LearningRate 0.0136   Epoch: 12   Global Step: 524090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:15,717-Speed 2624.64 samples/sec   Loss 5.0694   LearningRate 0.0136   Epoch: 12   Global Step: 524100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:19,631-Speed 2616.84 samples/sec   Loss 4.9032   LearningRate 0.0136   Epoch: 12   Global Step: 524110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:23,525-Speed 2630.39 samples/sec   Loss 4.9730   LearningRate 0.0136   Epoch: 12   Global Step: 524120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:27,420-Speed 2629.34 samples/sec   Loss 4.9897   LearningRate 0.0136   Epoch: 12   Global Step: 524130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:31,313-Speed 2630.95 samples/sec   Loss 4.9497   LearningRate 0.0136   Epoch: 12   Global Step: 524140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:35,231-Speed 2613.89 samples/sec   Loss 4.9483   LearningRate 0.0136   Epoch: 12   Global Step: 524150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:39,159-Speed 2608.33 samples/sec   Loss 4.9874   LearningRate 0.0136   Epoch: 12   Global Step: 524160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:43,056-Speed 2628.67 samples/sec   Loss 4.8869   LearningRate 0.0136   Epoch: 12   Global Step: 524170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:46,947-Speed 2632.00 samples/sec   Loss 4.8885   LearningRate 0.0136   Epoch: 12   Global Step: 524180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:37:50,842-Speed 2630.05 samples/sec   Loss 4.9553   LearningRate 0.0136   Epoch: 12   Global Step: 524190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:37:54,724-Speed 2638.85 samples/sec   Loss 4.9334   LearningRate 0.0136   Epoch: 12   Global Step: 524200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:37:58,622-Speed 2628.09 samples/sec   Loss 4.9813   LearningRate 0.0135   Epoch: 12   Global Step: 524210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:02,517-Speed 2628.81 samples/sec   Loss 4.9612   LearningRate 0.0135   Epoch: 12   Global Step: 524220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:06,414-Speed 2628.43 samples/sec   Loss 4.8444   LearningRate 0.0135   Epoch: 12   Global Step: 524230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:10,308-Speed 2630.83 samples/sec   Loss 4.9755   LearningRate 0.0135   Epoch: 12   Global Step: 524240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:14,204-Speed 2629.13 samples/sec   Loss 4.9979   LearningRate 0.0135   Epoch: 12   Global Step: 524250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:18,101-Speed 2628.66 samples/sec   Loss 5.0293   LearningRate 0.0135   Epoch: 12   Global Step: 524260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:21,995-Speed 2630.46 samples/sec   Loss 4.8871   LearningRate 0.0135   Epoch: 12   Global Step: 524270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:25,890-Speed 2629.88 samples/sec   Loss 5.0419   LearningRate 0.0135   Epoch: 12   Global Step: 524280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:29,784-Speed 2630.31 samples/sec   Loss 4.9168   LearningRate 0.0135   Epoch: 12   Global Step: 524290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:33,689-Speed 2622.87 samples/sec   Loss 4.9702   LearningRate 0.0135   Epoch: 12   Global Step: 524300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:38:37,596-Speed 2621.02 samples/sec   Loss 4.9481   LearningRate 0.0135   Epoch: 12   Global Step: 524310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:38:41,496-Speed 2627.07 samples/sec   Loss 4.9383   LearningRate 0.0135   Epoch: 12   Global Step: 524320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:38:45,372-Speed 2643.14 samples/sec   Loss 4.9339   LearningRate 0.0135   Epoch: 12   Global Step: 524330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:49,294-Speed 2611.07 samples/sec   Loss 4.9814   LearningRate 0.0135   Epoch: 12   Global Step: 524340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:53,191-Speed 2628.60 samples/sec   Loss 4.9208   LearningRate 0.0135   Epoch: 12   Global Step: 524350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:38:57,090-Speed 2626.77 samples/sec   Loss 4.9777   LearningRate 0.0135   Epoch: 12   Global Step: 524360   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:39:01,006-Speed 2615.74 samples/sec   Loss 5.0354   LearningRate 0.0135   Epoch: 12   Global Step: 524370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:39:04,882-Speed 2642.49 samples/sec   Loss 4.8597   LearningRate 0.0135   Epoch: 12   Global Step: 524380   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:08,835-Speed 2591.32 samples/sec   Loss 4.9767   LearningRate 0.0135   Epoch: 12   Global Step: 524390   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:12,749-Speed 2617.12 samples/sec   Loss 4.9362   LearningRate 0.0135   Epoch: 12   Global Step: 524400   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:16,652-Speed 2624.23 samples/sec   Loss 5.0407   LearningRate 0.0135   Epoch: 12   Global Step: 524410   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:20,558-Speed 2622.79 samples/sec   Loss 4.9280   LearningRate 0.0135   Epoch: 12   Global Step: 524420   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:24,453-Speed 2629.50 samples/sec   Loss 4.8918   LearningRate 0.0135   Epoch: 12   Global Step: 524430   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:28,353-Speed 2627.50 samples/sec   Loss 4.9581   LearningRate 0.0135   Epoch: 12   Global Step: 524440   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:32,270-Speed 2614.35 samples/sec   Loss 4.8799   LearningRate 0.0135   Epoch: 12   Global Step: 524450   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:36,166-Speed 2629.04 samples/sec   Loss 4.8890   LearningRate 0.0135   Epoch: 12   Global Step: 524460   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:40,071-Speed 2622.98 samples/sec   Loss 5.0513   LearningRate 0.0135   Epoch: 12   Global Step: 524470   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:39:43,971-Speed 2626.55 samples/sec   Loss 4.9588   LearningRate 0.0135   Epoch: 12   Global Step: 524480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:39:47,874-Speed 2623.98 samples/sec   Loss 4.9357   LearningRate 0.0135   Epoch: 12   Global Step: 524490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:39:51,772-Speed 2627.93 samples/sec   Loss 4.9516   LearningRate 0.0135   Epoch: 12   Global Step: 524500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:39:55,675-Speed 2624.15 samples/sec   Loss 5.0165   LearningRate 0.0135   Epoch: 12   Global Step: 524510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:39:59,591-Speed 2615.89 samples/sec   Loss 4.9143   LearningRate 0.0135   Epoch: 12   Global Step: 524520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:03,503-Speed 2618.22 samples/sec   Loss 4.9497   LearningRate 0.0135   Epoch: 12   Global Step: 524530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:07,513-Speed 2554.28 samples/sec   Loss 4.9215   LearningRate 0.0135   Epoch: 12   Global Step: 524540   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:11,435-Speed 2611.28 samples/sec   Loss 4.8371   LearningRate 0.0135   Epoch: 12   Global Step: 524550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:15,333-Speed 2627.63 samples/sec   Loss 4.9466   LearningRate 0.0135   Epoch: 12   Global Step: 524560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:19,231-Speed 2628.10 samples/sec   Loss 4.9607   LearningRate 0.0135   Epoch: 12   Global Step: 524570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:23,136-Speed 2623.37 samples/sec   Loss 4.9514   LearningRate 0.0135   Epoch: 12   Global Step: 524580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:40:27,012-Speed 2642.35 samples/sec   Loss 4.8776   LearningRate 0.0135   Epoch: 12   Global Step: 524590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:30,907-Speed 2629.83 samples/sec   Loss 4.9186   LearningRate 0.0135   Epoch: 12   Global Step: 524600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:34,801-Speed 2629.71 samples/sec   Loss 4.9543   LearningRate 0.0135   Epoch: 12   Global Step: 524610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:38,696-Speed 2629.65 samples/sec   Loss 4.9757   LearningRate 0.0135   Epoch: 12   Global Step: 524620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:42,605-Speed 2620.43 samples/sec   Loss 4.9445   LearningRate 0.0135   Epoch: 12   Global Step: 524630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:46,496-Speed 2632.28 samples/sec   Loss 4.9476   LearningRate 0.0135   Epoch: 12   Global Step: 524640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:50,405-Speed 2620.54 samples/sec   Loss 4.9423   LearningRate 0.0135   Epoch: 12   Global Step: 524650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:54,298-Speed 2630.89 samples/sec   Loss 4.9004   LearningRate 0.0135   Epoch: 12   Global Step: 524660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:40:58,172-Speed 2644.38 samples/sec   Loss 4.9266   LearningRate 0.0135   Epoch: 12   Global Step: 524670   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:02,088-Speed 2615.49 samples/sec   Loss 4.9258   LearningRate 0.0135   Epoch: 12   Global Step: 524680   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:06,122-Speed 2538.57 samples/sec   Loss 5.0402   LearningRate 0.0135   Epoch: 12   Global Step: 524690   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:10,024-Speed 2624.85 samples/sec   Loss 4.9132   LearningRate 0.0135   Epoch: 12   Global Step: 524700   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:13,919-Speed 2629.68 samples/sec   Loss 4.9553   LearningRate 0.0135   Epoch: 12   Global Step: 524710   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:17,816-Speed 2628.77 samples/sec   Loss 4.9500   LearningRate 0.0135   Epoch: 12   Global Step: 524720   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:21,709-Speed 2631.15 samples/sec   Loss 4.9635   LearningRate 0.0135   Epoch: 12   Global Step: 524730   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:25,614-Speed 2622.88 samples/sec   Loss 5.0277   LearningRate 0.0135   Epoch: 12   Global Step: 524740   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:29,509-Speed 2630.07 samples/sec   Loss 4.9162   LearningRate 0.0135   Epoch: 12   Global Step: 524750   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:33,406-Speed 2628.33 samples/sec   Loss 4.9146   LearningRate 0.0135   Epoch: 12   Global Step: 524760   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:41:37,306-Speed 2626.08 samples/sec   Loss 4.8850   LearningRate 0.0135   Epoch: 12   Global Step: 524770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:41:41,203-Speed 2628.58 samples/sec   Loss 4.8731   LearningRate 0.0135   Epoch: 12   Global Step: 524780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:41:45,097-Speed 2630.26 samples/sec   Loss 4.9480   LearningRate 0.0135   Epoch: 12   Global Step: 524790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:41:49,004-Speed 2621.40 samples/sec   Loss 4.8255   LearningRate 0.0135   Epoch: 12   Global Step: 524800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:41:52,904-Speed 2626.61 samples/sec   Loss 4.9747   LearningRate 0.0135   Epoch: 12   Global Step: 524810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:41:56,798-Speed 2630.20 samples/sec   Loss 4.8774   LearningRate 0.0135   Epoch: 12   Global Step: 524820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:00,694-Speed 2629.48 samples/sec   Loss 4.8443   LearningRate 0.0135   Epoch: 12   Global Step: 524830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:04,596-Speed 2625.10 samples/sec   Loss 5.0055   LearningRate 0.0135   Epoch: 12   Global Step: 524840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:08,497-Speed 2625.43 samples/sec   Loss 4.9267   LearningRate 0.0135   Epoch: 12   Global Step: 524850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:12,396-Speed 2627.03 samples/sec   Loss 4.8433   LearningRate 0.0135   Epoch: 12   Global Step: 524860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:16,307-Speed 2619.50 samples/sec   Loss 4.9611   LearningRate 0.0135   Epoch: 12   Global Step: 524870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:42:20,181-Speed 2644.00 samples/sec   Loss 4.8854   LearningRate 0.0135   Epoch: 12   Global Step: 524880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:24,151-Speed 2580.46 samples/sec   Loss 4.9169   LearningRate 0.0135   Epoch: 12   Global Step: 524890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:28,049-Speed 2627.13 samples/sec   Loss 4.8994   LearningRate 0.0135   Epoch: 12   Global Step: 524900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:31,946-Speed 2628.90 samples/sec   Loss 4.9025   LearningRate 0.0135   Epoch: 12   Global Step: 524910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:35,841-Speed 2629.74 samples/sec   Loss 4.9469   LearningRate 0.0135   Epoch: 12   Global Step: 524920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:39,741-Speed 2626.44 samples/sec   Loss 5.0166   LearningRate 0.0135   Epoch: 12   Global Step: 524930   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:43,636-Speed 2629.65 samples/sec   Loss 4.9855   LearningRate 0.0135   Epoch: 12   Global Step: 524940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:47,531-Speed 2629.91 samples/sec   Loss 4.9961   LearningRate 0.0135   Epoch: 12   Global Step: 524950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:51,433-Speed 2624.97 samples/sec   Loss 4.9899   LearningRate 0.0135   Epoch: 12   Global Step: 524960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:55,329-Speed 2629.26 samples/sec   Loss 4.9050   LearningRate 0.0135   Epoch: 12   Global Step: 524970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:42:59,229-Speed 2626.20 samples/sec   Loss 4.8609   LearningRate 0.0135   Epoch: 12   Global Step: 524980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:03,122-Speed 2630.99 samples/sec   Loss 4.9756   LearningRate 0.0135   Epoch: 12   Global Step: 524990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:07,015-Speed 2630.79 samples/sec   Loss 4.9939   LearningRate 0.0135   Epoch: 12   Global Step: 525000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:10,906-Speed 2633.00 samples/sec   Loss 4.9470   LearningRate 0.0135   Epoch: 12   Global Step: 525010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:14,799-Speed 2630.87 samples/sec   Loss 4.8756   LearningRate 0.0135   Epoch: 12   Global Step: 525020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:18,707-Speed 2621.33 samples/sec   Loss 4.8242   LearningRate 0.0135   Epoch: 12   Global Step: 525030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:22,604-Speed 2628.01 samples/sec   Loss 4.8971   LearningRate 0.0135   Epoch: 12   Global Step: 525040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:26,497-Speed 2631.60 samples/sec   Loss 4.9310   LearningRate 0.0135   Epoch: 12   Global Step: 525050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:30,394-Speed 2627.91 samples/sec   Loss 4.9334   LearningRate 0.0135   Epoch: 12   Global Step: 525060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:34,292-Speed 2627.82 samples/sec   Loss 4.8911   LearningRate 0.0135   Epoch: 12   Global Step: 525070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:43:38,169-Speed 2641.70 samples/sec   Loss 4.8898   LearningRate 0.0135   Epoch: 12   Global Step: 525080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:43:42,063-Speed 2630.41 samples/sec   Loss 4.9361   LearningRate 0.0135   Epoch: 12   Global Step: 525090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:43:45,958-Speed 2630.02 samples/sec   Loss 4.9748   LearningRate 0.0135   Epoch: 12   Global Step: 525100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:43:49,850-Speed 2631.22 samples/sec   Loss 5.0238   LearningRate 0.0135   Epoch: 12   Global Step: 525110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:43:53,742-Speed 2632.04 samples/sec   Loss 4.8774   LearningRate 0.0135   Epoch: 12   Global Step: 525120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:43:57,635-Speed 2630.71 samples/sec   Loss 4.9982   LearningRate 0.0135   Epoch: 12   Global Step: 525130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:01,528-Speed 2630.80 samples/sec   Loss 5.0072   LearningRate 0.0135   Epoch: 12   Global Step: 525140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:05,420-Speed 2631.99 samples/sec   Loss 4.8485   LearningRate 0.0135   Epoch: 12   Global Step: 525150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:09,319-Speed 2627.39 samples/sec   Loss 5.1038   LearningRate 0.0135   Epoch: 12   Global Step: 525160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:13,213-Speed 2630.12 samples/sec   Loss 4.8742   LearningRate 0.0135   Epoch: 12   Global Step: 525170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:17,087-Speed 2644.18 samples/sec   Loss 4.9116   LearningRate 0.0135   Epoch: 12   Global Step: 525180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:20,981-Speed 2630.05 samples/sec   Loss 4.9169   LearningRate 0.0135   Epoch: 12   Global Step: 525190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:24,877-Speed 2629.68 samples/sec   Loss 4.8885   LearningRate 0.0135   Epoch: 12   Global Step: 525200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:44:28,753-Speed 2642.92 samples/sec   Loss 4.9342   LearningRate 0.0135   Epoch: 12   Global Step: 525210   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:32,643-Speed 2632.29 samples/sec   Loss 4.9427   LearningRate 0.0135   Epoch: 12   Global Step: 525220   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:36,544-Speed 2626.17 samples/sec   Loss 4.9536   LearningRate 0.0135   Epoch: 12   Global Step: 525230   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:40,444-Speed 2626.56 samples/sec   Loss 4.9181   LearningRate 0.0135   Epoch: 12   Global Step: 525240   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:44,337-Speed 2630.44 samples/sec   Loss 4.9841   LearningRate 0.0135   Epoch: 12   Global Step: 525250   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:48,236-Speed 2627.60 samples/sec   Loss 4.9706   LearningRate 0.0135   Epoch: 12   Global Step: 525260   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:52,135-Speed 2627.02 samples/sec   Loss 4.8448   LearningRate 0.0135   Epoch: 12   Global Step: 525270   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:56,027-Speed 2631.65 samples/sec   Loss 4.9294   LearningRate 0.0135   Epoch: 12   Global Step: 525280   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:44:59,936-Speed 2619.97 samples/sec   Loss 4.9650   LearningRate 0.0135   Epoch: 12   Global Step: 525290   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:45:03,841-Speed 2623.15 samples/sec   Loss 4.9417   LearningRate 0.0135   Epoch: 12   Global Step: 525300   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:45:07,747-Speed 2622.28 samples/sec   Loss 4.9673   LearningRate 0.0135   Epoch: 12   Global Step: 525310   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:11,636-Speed 2633.98 samples/sec   Loss 4.8770   LearningRate 0.0135   Epoch: 12   Global Step: 525320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:15,528-Speed 2632.38 samples/sec   Loss 4.9210   LearningRate 0.0135   Epoch: 12   Global Step: 525330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:19,425-Speed 2628.28 samples/sec   Loss 4.8994   LearningRate 0.0134   Epoch: 12   Global Step: 525340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:23,324-Speed 2627.10 samples/sec   Loss 4.8337   LearningRate 0.0134   Epoch: 12   Global Step: 525350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:27,214-Speed 2633.50 samples/sec   Loss 5.0044   LearningRate 0.0134   Epoch: 12   Global Step: 525360   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:31,107-Speed 2631.13 samples/sec   Loss 4.9754   LearningRate 0.0134   Epoch: 12   Global Step: 525370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:35,001-Speed 2630.47 samples/sec   Loss 4.8734   LearningRate 0.0134   Epoch: 12   Global Step: 525380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:38,891-Speed 2632.78 samples/sec   Loss 4.9166   LearningRate 0.0134   Epoch: 12   Global Step: 525390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:42,795-Speed 2623.27 samples/sec   Loss 5.1323   LearningRate 0.0134   Epoch: 12   Global Step: 525400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:45:46,894-Speed 2499.52 samples/sec   Loss 4.7621   LearningRate 0.0134   Epoch: 12   Global Step: 525410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:45:50,786-Speed 2631.69 samples/sec   Loss 4.9857   LearningRate 0.0134   Epoch: 12   Global Step: 525420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:45:54,709-Speed 2611.11 samples/sec   Loss 4.9204   LearningRate 0.0134   Epoch: 12   Global Step: 525430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:45:58,590-Speed 2639.38 samples/sec   Loss 4.9372   LearningRate 0.0134   Epoch: 12   Global Step: 525440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:02,483-Speed 2630.73 samples/sec   Loss 4.9302   LearningRate 0.0134   Epoch: 12   Global Step: 525450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:06,383-Speed 2626.76 samples/sec   Loss 4.8934   LearningRate 0.0134   Epoch: 12   Global Step: 525460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:10,297-Speed 2616.35 samples/sec   Loss 4.9461   LearningRate 0.0134   Epoch: 12   Global Step: 525470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:14,247-Speed 2593.70 samples/sec   Loss 4.9641   LearningRate 0.0134   Epoch: 12   Global Step: 525480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:18,150-Speed 2623.85 samples/sec   Loss 4.8883   LearningRate 0.0134   Epoch: 12   Global Step: 525490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:22,068-Speed 2614.84 samples/sec   Loss 4.8814   LearningRate 0.0134   Epoch: 12   Global Step: 525500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:25,966-Speed 2627.84 samples/sec   Loss 4.9933   LearningRate 0.0134   Epoch: 12   Global Step: 525510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:29,857-Speed 2632.35 samples/sec   Loss 4.9624   LearningRate 0.0134   Epoch: 12   Global Step: 525520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:33,755-Speed 2627.20 samples/sec   Loss 5.0188   LearningRate 0.0134   Epoch: 12   Global Step: 525530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:37,648-Speed 2630.83 samples/sec   Loss 4.9582   LearningRate 0.0134   Epoch: 12   Global Step: 525540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:46:41,526-Speed 2641.34 samples/sec   Loss 4.9705   LearningRate 0.0134   Epoch: 12   Global Step: 525550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:45,429-Speed 2624.48 samples/sec   Loss 5.0200   LearningRate 0.0134   Epoch: 12   Global Step: 525560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:49,325-Speed 2629.00 samples/sec   Loss 4.8264   LearningRate 0.0134   Epoch: 12   Global Step: 525570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:53,232-Speed 2622.37 samples/sec   Loss 4.9016   LearningRate 0.0134   Epoch: 12   Global Step: 525580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:46:57,132-Speed 2625.91 samples/sec   Loss 4.7520   LearningRate 0.0134   Epoch: 12   Global Step: 525590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:01,031-Speed 2627.03 samples/sec   Loss 5.0318   LearningRate 0.0134   Epoch: 12   Global Step: 525600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:04,935-Speed 2623.28 samples/sec   Loss 5.0350   LearningRate 0.0134   Epoch: 12   Global Step: 525610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:08,840-Speed 2623.00 samples/sec   Loss 4.9745   LearningRate 0.0134   Epoch: 12   Global Step: 525620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:12,749-Speed 2619.86 samples/sec   Loss 5.0631   LearningRate 0.0134   Epoch: 12   Global Step: 525630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:16,645-Speed 2629.35 samples/sec   Loss 4.9355   LearningRate 0.0134   Epoch: 12   Global Step: 525640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:20,544-Speed 2627.20 samples/sec   Loss 4.9542   LearningRate 0.0134   Epoch: 12   Global Step: 525650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:47:24,443-Speed 2627.16 samples/sec   Loss 4.9001   LearningRate 0.0134   Epoch: 12   Global Step: 525660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:47:28,339-Speed 2628.50 samples/sec   Loss 4.9466   LearningRate 0.0134   Epoch: 12   Global Step: 525670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:47:32,238-Speed 2627.63 samples/sec   Loss 4.8586   LearningRate 0.0134   Epoch: 12   Global Step: 525680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:47:36,117-Speed 2640.47 samples/sec   Loss 4.8957   LearningRate 0.0134   Epoch: 12   Global Step: 525690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:40,018-Speed 2625.69 samples/sec   Loss 4.9574   LearningRate 0.0134   Epoch: 12   Global Step: 525700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:43,912-Speed 2629.80 samples/sec   Loss 4.9372   LearningRate 0.0134   Epoch: 12   Global Step: 525710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:47,807-Speed 2629.96 samples/sec   Loss 4.7825   LearningRate 0.0134   Epoch: 12   Global Step: 525720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:51,728-Speed 2612.86 samples/sec   Loss 4.9815   LearningRate 0.0134   Epoch: 12   Global Step: 525730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:55,672-Speed 2596.38 samples/sec   Loss 4.9283   LearningRate 0.0134   Epoch: 12   Global Step: 525740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:47:59,581-Speed 2620.59 samples/sec   Loss 4.9493   LearningRate 0.0134   Epoch: 12   Global Step: 525750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:03,482-Speed 2625.54 samples/sec   Loss 4.8112   LearningRate 0.0134   Epoch: 12   Global Step: 525760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:07,381-Speed 2627.24 samples/sec   Loss 5.0408   LearningRate 0.0134   Epoch: 12   Global Step: 525770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:11,279-Speed 2627.20 samples/sec   Loss 4.9633   LearningRate 0.0134   Epoch: 12   Global Step: 525780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:15,175-Speed 2629.58 samples/sec   Loss 4.9145   LearningRate 0.0134   Epoch: 12   Global Step: 525790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:48:19,079-Speed 2622.92 samples/sec   Loss 4.9167   LearningRate 0.0134   Epoch: 12   Global Step: 525800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:48:22,948-Speed 2647.54 samples/sec   Loss 4.8120   LearningRate 0.0134   Epoch: 12   Global Step: 525810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:26,854-Speed 2622.28 samples/sec   Loss 5.0238   LearningRate 0.0134   Epoch: 12   Global Step: 525820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:30,749-Speed 2629.60 samples/sec   Loss 4.9026   LearningRate 0.0134   Epoch: 12   Global Step: 525830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:34,653-Speed 2623.96 samples/sec   Loss 4.8427   LearningRate 0.0134   Epoch: 12   Global Step: 525840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:38,548-Speed 2629.33 samples/sec   Loss 4.9755   LearningRate 0.0134   Epoch: 12   Global Step: 525850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:42,445-Speed 2628.77 samples/sec   Loss 5.0147   LearningRate 0.0134   Epoch: 12   Global Step: 525860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:46,339-Speed 2630.37 samples/sec   Loss 4.8928   LearningRate 0.0134   Epoch: 12   Global Step: 525870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:50,234-Speed 2629.76 samples/sec   Loss 4.9655   LearningRate 0.0134   Epoch: 12   Global Step: 525880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:54,157-Speed 2611.10 samples/sec   Loss 4.9367   LearningRate 0.0134   Epoch: 12   Global Step: 525890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:48:58,136-Speed 2574.14 samples/sec   Loss 4.9338   LearningRate 0.0134   Epoch: 12   Global Step: 525900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:02,226-Speed 2503.74 samples/sec   Loss 4.8626   LearningRate 0.0134   Epoch: 12   Global Step: 525910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:49:06,323-Speed 2499.83 samples/sec   Loss 4.9629   LearningRate 0.0134   Epoch: 12   Global Step: 525920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:49:10,393-Speed 2517.01 samples/sec   Loss 4.8987   LearningRate 0.0134   Epoch: 12   Global Step: 525930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:49:14,289-Speed 2628.66 samples/sec   Loss 4.9077   LearningRate 0.0134   Epoch: 12   Global Step: 525940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:49:18,198-Speed 2621.11 samples/sec   Loss 4.8946   LearningRate 0.0134   Epoch: 12   Global Step: 525950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:49:22,072-Speed 2643.33 samples/sec   Loss 4.9930   LearningRate 0.0134   Epoch: 12   Global Step: 525960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:25,969-Speed 2628.79 samples/sec   Loss 4.7715   LearningRate 0.0134   Epoch: 12   Global Step: 525970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:29,866-Speed 2628.34 samples/sec   Loss 4.8069   LearningRate 0.0134   Epoch: 12   Global Step: 525980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:33,759-Speed 2630.48 samples/sec   Loss 4.9693   LearningRate 0.0134   Epoch: 12   Global Step: 525990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:37,650-Speed 2632.44 samples/sec   Loss 4.8637   LearningRate 0.0134   Epoch: 12   Global Step: 526000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:41,550-Speed 2626.60 samples/sec   Loss 4.9969   LearningRate 0.0134   Epoch: 12   Global Step: 526010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:45,452-Speed 2625.33 samples/sec   Loss 4.9261   LearningRate 0.0134   Epoch: 12   Global Step: 526020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:49,393-Speed 2598.74 samples/sec   Loss 4.9259   LearningRate 0.0134   Epoch: 12   Global Step: 526030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:53,487-Speed 2501.61 samples/sec   Loss 4.9409   LearningRate 0.0134   Epoch: 12   Global Step: 526040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:49:57,393-Speed 2622.42 samples/sec   Loss 4.8222   LearningRate 0.0134   Epoch: 12   Global Step: 526050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:01,306-Speed 2617.89 samples/sec   Loss 5.0603   LearningRate 0.0134   Epoch: 12   Global Step: 526060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:50:05,213-Speed 2621.15 samples/sec   Loss 4.8979   LearningRate 0.0134   Epoch: 12   Global Step: 526070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:50:09,122-Speed 2620.31 samples/sec   Loss 4.9350   LearningRate 0.0134   Epoch: 12   Global Step: 526080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:50:13,020-Speed 2627.76 samples/sec   Loss 4.8968   LearningRate 0.0134   Epoch: 12   Global Step: 526090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:50:16,898-Speed 2641.12 samples/sec   Loss 4.9587   LearningRate 0.0134   Epoch: 12   Global Step: 526100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:20,936-Speed 2536.26 samples/sec   Loss 4.9940   LearningRate 0.0134   Epoch: 12   Global Step: 526110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:25,032-Speed 2501.10 samples/sec   Loss 4.9863   LearningRate 0.0134   Epoch: 12   Global Step: 526120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:29,128-Speed 2500.36 samples/sec   Loss 4.9857   LearningRate 0.0134   Epoch: 12   Global Step: 526130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:33,202-Speed 2514.59 samples/sec   Loss 4.9506   LearningRate 0.0134   Epoch: 12   Global Step: 526140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:37,140-Speed 2601.05 samples/sec   Loss 4.8734   LearningRate 0.0134   Epoch: 12   Global Step: 526150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:41,035-Speed 2630.02 samples/sec   Loss 4.8957   LearningRate 0.0134   Epoch: 12   Global Step: 526160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:45,037-Speed 2559.25 samples/sec   Loss 4.8572   LearningRate 0.0134   Epoch: 12   Global Step: 526170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:48,929-Speed 2631.31 samples/sec   Loss 4.8256   LearningRate 0.0134   Epoch: 12   Global Step: 526180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:52,825-Speed 2629.59 samples/sec   Loss 4.9081   LearningRate 0.0134   Epoch: 12   Global Step: 526190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:50:56,697-Speed 2645.39 samples/sec   Loss 4.8777   LearningRate 0.0134   Epoch: 12   Global Step: 526200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:00,599-Speed 2625.02 samples/sec   Loss 4.9050   LearningRate 0.0134   Epoch: 12   Global Step: 526210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:04,495-Speed 2628.46 samples/sec   Loss 4.9014   LearningRate 0.0134   Epoch: 12   Global Step: 526220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:08,412-Speed 2614.58 samples/sec   Loss 4.9155   LearningRate 0.0134   Epoch: 12   Global Step: 526230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:12,309-Speed 2628.88 samples/sec   Loss 4.9084   LearningRate 0.0134   Epoch: 12   Global Step: 526240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:16,203-Speed 2630.74 samples/sec   Loss 4.8647   LearningRate 0.0134   Epoch: 12   Global Step: 526250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:20,106-Speed 2624.15 samples/sec   Loss 4.8933   LearningRate 0.0134   Epoch: 12   Global Step: 526260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:24,009-Speed 2624.83 samples/sec   Loss 4.9073   LearningRate 0.0134   Epoch: 12   Global Step: 526270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:27,908-Speed 2626.80 samples/sec   Loss 4.9369   LearningRate 0.0134   Epoch: 12   Global Step: 526280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:31,805-Speed 2628.27 samples/sec   Loss 4.9113   LearningRate 0.0134   Epoch: 12   Global Step: 526290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:35,714-Speed 2620.26 samples/sec   Loss 4.9802   LearningRate 0.0134   Epoch: 12   Global Step: 526300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:51:39,635-Speed 2611.89 samples/sec   Loss 4.8822   LearningRate 0.0134   Epoch: 12   Global Step: 526310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:51:43,555-Speed 2612.75 samples/sec   Loss 4.8030   LearningRate 0.0134   Epoch: 12   Global Step: 526320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:51:47,470-Speed 2616.52 samples/sec   Loss 4.9425   LearningRate 0.0134   Epoch: 12   Global Step: 526330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:51:51,359-Speed 2634.49 samples/sec   Loss 4.9084   LearningRate 0.0134   Epoch: 12   Global Step: 526340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:55,253-Speed 2629.99 samples/sec   Loss 4.8896   LearningRate 0.0134   Epoch: 12   Global Step: 526350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:51:59,194-Speed 2598.79 samples/sec   Loss 4.9515   LearningRate 0.0134   Epoch: 12   Global Step: 526360   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:03,128-Speed 2603.82 samples/sec   Loss 4.8727   LearningRate 0.0134   Epoch: 12   Global Step: 526370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:07,028-Speed 2626.42 samples/sec   Loss 5.0312   LearningRate 0.0134   Epoch: 12   Global Step: 526380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:10,922-Speed 2630.01 samples/sec   Loss 5.0091   LearningRate 0.0134   Epoch: 12   Global Step: 526390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:14,830-Speed 2621.08 samples/sec   Loss 4.8533   LearningRate 0.0134   Epoch: 12   Global Step: 526400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:18,736-Speed 2622.84 samples/sec   Loss 4.9082   LearningRate 0.0134   Epoch: 12   Global Step: 526410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:22,636-Speed 2626.32 samples/sec   Loss 4.8116   LearningRate 0.0134   Epoch: 12   Global Step: 526420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:26,538-Speed 2625.08 samples/sec   Loss 4.7829   LearningRate 0.0134   Epoch: 12   Global Step: 526430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:30,431-Speed 2630.77 samples/sec   Loss 4.9347   LearningRate 0.0134   Epoch: 12   Global Step: 526440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:52:34,334-Speed 2624.03 samples/sec   Loss 4.9123   LearningRate 0.0134   Epoch: 12   Global Step: 526450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:52:38,233-Speed 2627.04 samples/sec   Loss 4.9590   LearningRate 0.0134   Epoch: 12   Global Step: 526460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:52:42,106-Speed 2644.61 samples/sec   Loss 4.9223   LearningRate 0.0134   Epoch: 12   Global Step: 526470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:46,016-Speed 2620.35 samples/sec   Loss 4.8560   LearningRate 0.0133   Epoch: 12   Global Step: 526480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:49,913-Speed 2628.44 samples/sec   Loss 4.8504   LearningRate 0.0133   Epoch: 12   Global Step: 526490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:53,810-Speed 2628.76 samples/sec   Loss 4.8729   LearningRate 0.0133   Epoch: 12   Global Step: 526500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:52:57,722-Speed 2617.93 samples/sec   Loss 4.9138   LearningRate 0.0133   Epoch: 12   Global Step: 526510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:01,621-Speed 2627.14 samples/sec   Loss 4.9184   LearningRate 0.0133   Epoch: 12   Global Step: 526520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:05,558-Speed 2601.53 samples/sec   Loss 4.8712   LearningRate 0.0133   Epoch: 12   Global Step: 526530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:09,479-Speed 2612.33 samples/sec   Loss 4.8706   LearningRate 0.0133   Epoch: 12   Global Step: 526540   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:13,371-Speed 2631.53 samples/sec   Loss 4.9198   LearningRate 0.0133   Epoch: 12   Global Step: 526550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:17,268-Speed 2628.99 samples/sec   Loss 4.9441   LearningRate 0.0133   Epoch: 12   Global Step: 526560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:21,169-Speed 2625.91 samples/sec   Loss 4.8578   LearningRate 0.0133   Epoch: 12   Global Step: 526570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:53:25,073-Speed 2623.71 samples/sec   Loss 4.8887   LearningRate 0.0133   Epoch: 12   Global Step: 526580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:53:28,995-Speed 2611.54 samples/sec   Loss 4.8898   LearningRate 0.0133   Epoch: 12   Global Step: 526590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:53:32,892-Speed 2628.47 samples/sec   Loss 4.8793   LearningRate 0.0133   Epoch: 12   Global Step: 526600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:53:36,787-Speed 2629.77 samples/sec   Loss 4.8944   LearningRate 0.0133   Epoch: 12   Global Step: 526610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:40,687-Speed 2625.81 samples/sec   Loss 4.7930   LearningRate 0.0133   Epoch: 12   Global Step: 526620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:44,593-Speed 2622.35 samples/sec   Loss 4.8934   LearningRate 0.0133   Epoch: 12   Global Step: 526630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:48,541-Speed 2594.45 samples/sec   Loss 4.8749   LearningRate 0.0133   Epoch: 12   Global Step: 526640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:52,461-Speed 2613.48 samples/sec   Loss 4.9832   LearningRate 0.0133   Epoch: 12   Global Step: 526650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:53:56,355-Speed 2630.36 samples/sec   Loss 4.9307   LearningRate 0.0133   Epoch: 12   Global Step: 526660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:00,252-Speed 2628.20 samples/sec   Loss 4.8449   LearningRate 0.0133   Epoch: 12   Global Step: 526670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:04,165-Speed 2617.67 samples/sec   Loss 4.9213   LearningRate 0.0133   Epoch: 12   Global Step: 526680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:08,070-Speed 2622.63 samples/sec   Loss 4.9108   LearningRate 0.0133   Epoch: 12   Global Step: 526690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:11,978-Speed 2620.98 samples/sec   Loss 4.9114   LearningRate 0.0133   Epoch: 12   Global Step: 526700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:16,012-Speed 2539.19 samples/sec   Loss 4.9282   LearningRate 0.0133   Epoch: 12   Global Step: 526710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:19,910-Speed 2627.49 samples/sec   Loss 4.9430   LearningRate 0.0133   Epoch: 12   Global Step: 526720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:23,907-Speed 2562.99 samples/sec   Loss 4.9833   LearningRate 0.0133   Epoch: 12   Global Step: 526730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:27,800-Speed 2631.32 samples/sec   Loss 4.8426   LearningRate 0.0133   Epoch: 12   Global Step: 526740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:31,692-Speed 2632.22 samples/sec   Loss 4.8487   LearningRate 0.0133   Epoch: 12   Global Step: 526750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:35,589-Speed 2628.15 samples/sec   Loss 4.8666   LearningRate 0.0133   Epoch: 12   Global Step: 526760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:39,485-Speed 2628.55 samples/sec   Loss 4.9050   LearningRate 0.0133   Epoch: 12   Global Step: 526770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:43,384-Speed 2627.17 samples/sec   Loss 5.0139   LearningRate 0.0133   Epoch: 12   Global Step: 526780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:47,291-Speed 2621.27 samples/sec   Loss 4.8596   LearningRate 0.0133   Epoch: 12   Global Step: 526790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:51,191-Speed 2626.11 samples/sec   Loss 4.8546   LearningRate 0.0133   Epoch: 12   Global Step: 526800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:54:55,090-Speed 2627.52 samples/sec   Loss 4.8625   LearningRate 0.0133   Epoch: 12   Global Step: 526810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:54:59,007-Speed 2615.02 samples/sec   Loss 4.9187   LearningRate 0.0133   Epoch: 12   Global Step: 526820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:55:02,905-Speed 2627.46 samples/sec   Loss 4.7995   LearningRate 0.0133   Epoch: 12   Global Step: 526830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:55:06,818-Speed 2617.74 samples/sec   Loss 4.8295   LearningRate 0.0133   Epoch: 12   Global Step: 526840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:55:10,714-Speed 2628.67 samples/sec   Loss 4.9714   LearningRate 0.0133   Epoch: 12   Global Step: 526850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:55:14,621-Speed 2621.28 samples/sec   Loss 4.8838   LearningRate 0.0133   Epoch: 12   Global Step: 526860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:55:18,496-Speed 2643.04 samples/sec   Loss 5.0033   LearningRate 0.0133   Epoch: 12   Global Step: 526870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:22,392-Speed 2629.66 samples/sec   Loss 4.8291   LearningRate 0.0133   Epoch: 12   Global Step: 526880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:26,291-Speed 2626.62 samples/sec   Loss 4.9408   LearningRate 0.0133   Epoch: 12   Global Step: 526890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:30,194-Speed 2625.00 samples/sec   Loss 4.8385   LearningRate 0.0133   Epoch: 12   Global Step: 526900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:34,089-Speed 2629.56 samples/sec   Loss 4.8852   LearningRate 0.0133   Epoch: 12   Global Step: 526910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:37,981-Speed 2631.55 samples/sec   Loss 4.8205   LearningRate 0.0133   Epoch: 12   Global Step: 526920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:41,874-Speed 2631.20 samples/sec   Loss 4.9329   LearningRate 0.0133   Epoch: 12   Global Step: 526930   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:45,774-Speed 2626.24 samples/sec   Loss 4.8582   LearningRate 0.0133   Epoch: 12   Global Step: 526940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:49,682-Speed 2620.75 samples/sec   Loss 4.9278   LearningRate 0.0133   Epoch: 12   Global Step: 526950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:53,578-Speed 2628.94 samples/sec   Loss 4.8795   LearningRate 0.0133   Epoch: 12   Global Step: 526960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:55:57,473-Speed 2629.56 samples/sec   Loss 4.8295   LearningRate 0.0133   Epoch: 12   Global Step: 526970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:56:01,370-Speed 2628.60 samples/sec   Loss 5.0605   LearningRate 0.0133   Epoch: 12   Global Step: 526980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:56:05,268-Speed 2627.55 samples/sec   Loss 4.8757   LearningRate 0.0133   Epoch: 12   Global Step: 526990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:56:09,140-Speed 2645.47 samples/sec   Loss 4.8310   LearningRate 0.0133   Epoch: 12   Global Step: 527000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:13,038-Speed 2627.40 samples/sec   Loss 4.8326   LearningRate 0.0133   Epoch: 12   Global Step: 527010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:16,936-Speed 2628.05 samples/sec   Loss 4.7874   LearningRate 0.0133   Epoch: 12   Global Step: 527020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:20,862-Speed 2608.63 samples/sec   Loss 4.9131   LearningRate 0.0133   Epoch: 12   Global Step: 527030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:24,763-Speed 2625.49 samples/sec   Loss 4.9543   LearningRate 0.0133   Epoch: 12   Global Step: 527040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:28,658-Speed 2629.42 samples/sec   Loss 4.8494   LearningRate 0.0133   Epoch: 12   Global Step: 527050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:32,589-Speed 2605.88 samples/sec   Loss 4.8594   LearningRate 0.0133   Epoch: 12   Global Step: 527060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:36,482-Speed 2630.90 samples/sec   Loss 4.8395   LearningRate 0.0133   Epoch: 12   Global Step: 527070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:40,385-Speed 2624.66 samples/sec   Loss 4.8993   LearningRate 0.0133   Epoch: 12   Global Step: 527080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:44,282-Speed 2627.96 samples/sec   Loss 4.8697   LearningRate 0.0133   Epoch: 12   Global Step: 527090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:56:48,176-Speed 2631.38 samples/sec   Loss 4.8588   LearningRate 0.0133   Epoch: 12   Global Step: 527100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:56:52,115-Speed 2601.22 samples/sec   Loss 4.9455   LearningRate 0.0133   Epoch: 12   Global Step: 527110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:56:56,006-Speed 2632.00 samples/sec   Loss 4.8769   LearningRate 0.0133   Epoch: 12   Global Step: 527120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:56:59,899-Speed 2631.50 samples/sec   Loss 4.8399   LearningRate 0.0133   Epoch: 12   Global Step: 527130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:57:03,797-Speed 2627.52 samples/sec   Loss 4.9791   LearningRate 0.0133   Epoch: 12   Global Step: 527140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:57:07,699-Speed 2624.82 samples/sec   Loss 4.8691   LearningRate 0.0133   Epoch: 12   Global Step: 527150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:57:11,591-Speed 2631.44 samples/sec   Loss 4.7797   LearningRate 0.0133   Epoch: 12   Global Step: 527160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:57:15,486-Speed 2630.23 samples/sec   Loss 4.9249   LearningRate 0.0133   Epoch: 12   Global Step: 527170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:57:19,382-Speed 2628.74 samples/sec   Loss 4.9744   LearningRate 0.0133   Epoch: 12   Global Step: 527180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:57:23,288-Speed 2622.13 samples/sec   Loss 4.9443   LearningRate 0.0133   Epoch: 12   Global Step: 527190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:57:27,167-Speed 2640.38 samples/sec   Loss 4.9223   LearningRate 0.0133   Epoch: 12   Global Step: 527200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:31,062-Speed 2630.57 samples/sec   Loss 4.7938   LearningRate 0.0133   Epoch: 12   Global Step: 527210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:34,959-Speed 2628.13 samples/sec   Loss 4.8240   LearningRate 0.0133   Epoch: 12   Global Step: 527220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:38,854-Speed 2629.01 samples/sec   Loss 4.8980   LearningRate 0.0133   Epoch: 12   Global Step: 527230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:42,752-Speed 2627.78 samples/sec   Loss 4.9442   LearningRate 0.0133   Epoch: 12   Global Step: 527240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:46,643-Speed 2632.45 samples/sec   Loss 4.9455   LearningRate 0.0133   Epoch: 12   Global Step: 527250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:50,535-Speed 2631.87 samples/sec   Loss 4.9465   LearningRate 0.0133   Epoch: 12   Global Step: 527260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:54,431-Speed 2628.93 samples/sec   Loss 4.8945   LearningRate 0.0133   Epoch: 12   Global Step: 527270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:57:58,326-Speed 2629.74 samples/sec   Loss 4.8437   LearningRate 0.0133   Epoch: 12   Global Step: 527280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:02,226-Speed 2626.79 samples/sec   Loss 5.0036   LearningRate 0.0133   Epoch: 12   Global Step: 527290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:06,118-Speed 2631.45 samples/sec   Loss 4.9192   LearningRate 0.0133   Epoch: 12   Global Step: 527300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:58:10,029-Speed 2618.64 samples/sec   Loss 4.9733   LearningRate 0.0133   Epoch: 12   Global Step: 527310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:58:13,923-Speed 2630.45 samples/sec   Loss 5.0104   LearningRate 0.0133   Epoch: 12   Global Step: 527320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:58:17,796-Speed 2644.40 samples/sec   Loss 4.9416   LearningRate 0.0133   Epoch: 12   Global Step: 527330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:21,688-Speed 2632.18 samples/sec   Loss 4.7916   LearningRate 0.0133   Epoch: 12   Global Step: 527340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:25,592-Speed 2622.99 samples/sec   Loss 5.0089   LearningRate 0.0133   Epoch: 12   Global Step: 527350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:29,493-Speed 2625.56 samples/sec   Loss 4.8278   LearningRate 0.0133   Epoch: 12   Global Step: 527360   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:33,388-Speed 2629.64 samples/sec   Loss 4.8632   LearningRate 0.0133   Epoch: 12   Global Step: 527370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:37,284-Speed 2629.11 samples/sec   Loss 4.9593   LearningRate 0.0133   Epoch: 12   Global Step: 527380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:41,179-Speed 2629.81 samples/sec   Loss 5.0089   LearningRate 0.0133   Epoch: 12   Global Step: 527390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:45,075-Speed 2629.44 samples/sec   Loss 4.9131   LearningRate 0.0133   Epoch: 12   Global Step: 527400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:48,966-Speed 2631.71 samples/sec   Loss 4.9155   LearningRate 0.0133   Epoch: 12   Global Step: 527410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:52,853-Speed 2635.66 samples/sec   Loss 4.7582   LearningRate 0.0133   Epoch: 12   Global Step: 527420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:58:56,740-Speed 2634.45 samples/sec   Loss 4.8866   LearningRate 0.0133   Epoch: 12   Global Step: 527430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:00,642-Speed 2625.54 samples/sec   Loss 4.9583   LearningRate 0.0133   Epoch: 12   Global Step: 527440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:04,545-Speed 2624.13 samples/sec   Loss 4.9464   LearningRate 0.0133   Epoch: 12   Global Step: 527450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:08,445-Speed 2626.12 samples/sec   Loss 4.8551   LearningRate 0.0133   Epoch: 12   Global Step: 527460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:12,349-Speed 2623.66 samples/sec   Loss 4.9781   LearningRate 0.0133   Epoch: 12   Global Step: 527470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:16,252-Speed 2624.87 samples/sec   Loss 4.9250   LearningRate 0.0133   Epoch: 12   Global Step: 527480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:20,147-Speed 2629.20 samples/sec   Loss 4.8891   LearningRate 0.0133   Epoch: 12   Global Step: 527490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:24,047-Speed 2626.70 samples/sec   Loss 4.8323   LearningRate 0.0133   Epoch: 12   Global Step: 527500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:27,949-Speed 2624.85 samples/sec   Loss 4.9276   LearningRate 0.0133   Epoch: 12   Global Step: 527510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:31,842-Speed 2631.17 samples/sec   Loss 4.8892   LearningRate 0.0133   Epoch: 12   Global Step: 527520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:35,740-Speed 2627.81 samples/sec   Loss 4.8418   LearningRate 0.0133   Epoch: 12   Global Step: 527530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 06:59:39,617-Speed 2641.39 samples/sec   Loss 4.9030   LearningRate 0.0133   Epoch: 12   Global Step: 527540   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 06:59:43,514-Speed 2628.39 samples/sec   Loss 4.9226   LearningRate 0.0133   Epoch: 12   Global Step: 527550   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:59:47,419-Speed 2623.41 samples/sec   Loss 4.8973   LearningRate 0.0133   Epoch: 12   Global Step: 527560   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:59:51,348-Speed 2607.03 samples/sec   Loss 4.8826   LearningRate 0.0133   Epoch: 12   Global Step: 527570   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:59:55,255-Speed 2621.73 samples/sec   Loss 4.8068   LearningRate 0.0133   Epoch: 12   Global Step: 527580   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 06:59:59,206-Speed 2591.98 samples/sec   Loss 4.9704   LearningRate 0.0133   Epoch: 12   Global Step: 527590   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:00:03,115-Speed 2620.56 samples/sec   Loss 4.8530   LearningRate 0.0133   Epoch: 12   Global Step: 527600   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:00:07,009-Speed 2630.14 samples/sec   Loss 4.8928   LearningRate 0.0132   Epoch: 12   Global Step: 527610   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:00:10,909-Speed 2626.82 samples/sec   Loss 4.7962   LearningRate 0.0132   Epoch: 12   Global Step: 527620   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:00:14,809-Speed 2625.54 samples/sec   Loss 4.8440   LearningRate 0.0132   Epoch: 12   Global Step: 527630   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:00:18,707-Speed 2628.80 samples/sec   Loss 4.7862   LearningRate 0.0132   Epoch: 12   Global Step: 527640   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:00:22,603-Speed 2629.01 samples/sec   Loss 4.7940   LearningRate 0.0132   Epoch: 12   Global Step: 527650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:26,502-Speed 2626.75 samples/sec   Loss 4.9243   LearningRate 0.0132   Epoch: 12   Global Step: 527660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:30,399-Speed 2628.35 samples/sec   Loss 4.8683   LearningRate 0.0132   Epoch: 12   Global Step: 527670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:34,292-Speed 2630.76 samples/sec   Loss 4.9506   LearningRate 0.0132   Epoch: 12   Global Step: 527680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:38,186-Speed 2630.89 samples/sec   Loss 4.8988   LearningRate 0.0132   Epoch: 12   Global Step: 527690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:42,086-Speed 2626.58 samples/sec   Loss 4.7615   LearningRate 0.0132   Epoch: 12   Global Step: 527700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:45,983-Speed 2628.19 samples/sec   Loss 4.8749   LearningRate 0.0132   Epoch: 12   Global Step: 527710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:49,896-Speed 2618.14 samples/sec   Loss 4.8351   LearningRate 0.0132   Epoch: 12   Global Step: 527720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:53,796-Speed 2626.47 samples/sec   Loss 4.8862   LearningRate 0.0132   Epoch: 12   Global Step: 527730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:00:57,701-Speed 2623.17 samples/sec   Loss 4.9439   LearningRate 0.0132   Epoch: 12   Global Step: 527740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:01,603-Speed 2624.71 samples/sec   Loss 5.0139   LearningRate 0.0132   Epoch: 12   Global Step: 527750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:01:05,477-Speed 2643.66 samples/sec   Loss 4.9824   LearningRate 0.0132   Epoch: 12   Global Step: 527760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:09,379-Speed 2625.24 samples/sec   Loss 4.8690   LearningRate 0.0132   Epoch: 12   Global Step: 527770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:13,279-Speed 2626.50 samples/sec   Loss 5.0306   LearningRate 0.0132   Epoch: 12   Global Step: 527780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:17,182-Speed 2624.14 samples/sec   Loss 4.7596   LearningRate 0.0132   Epoch: 12   Global Step: 527790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:21,081-Speed 2627.50 samples/sec   Loss 5.0046   LearningRate 0.0132   Epoch: 12   Global Step: 527800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:24,986-Speed 2623.83 samples/sec   Loss 4.8611   LearningRate 0.0132   Epoch: 12   Global Step: 527810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:28,890-Speed 2623.10 samples/sec   Loss 4.8813   LearningRate 0.0132   Epoch: 12   Global Step: 527820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:32,786-Speed 2629.03 samples/sec   Loss 4.9040   LearningRate 0.0132   Epoch: 12   Global Step: 527830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:36,693-Speed 2621.30 samples/sec   Loss 4.9139   LearningRate 0.0132   Epoch: 12   Global Step: 527840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:40,606-Speed 2617.69 samples/sec   Loss 4.8794   LearningRate 0.0132   Epoch: 12   Global Step: 527850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:44,533-Speed 2608.43 samples/sec   Loss 4.8701   LearningRate 0.0132   Epoch: 12   Global Step: 527860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:48,452-Speed 2613.56 samples/sec   Loss 4.9255   LearningRate 0.0132   Epoch: 12   Global Step: 527870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:52,359-Speed 2621.36 samples/sec   Loss 4.8420   LearningRate 0.0132   Epoch: 12   Global Step: 527880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:01:56,260-Speed 2626.74 samples/sec   Loss 5.0033   LearningRate 0.0132   Epoch: 12   Global Step: 527890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:00,170-Speed 2619.65 samples/sec   Loss 4.9260   LearningRate 0.0132   Epoch: 12   Global Step: 527900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:04,098-Speed 2607.98 samples/sec   Loss 4.9415   LearningRate 0.0132   Epoch: 12   Global Step: 527910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:08,029-Speed 2604.92 samples/sec   Loss 4.8598   LearningRate 0.0132   Epoch: 12   Global Step: 527920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:11,931-Speed 2625.34 samples/sec   Loss 4.9661   LearningRate 0.0132   Epoch: 12   Global Step: 527930   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:15,828-Speed 2628.49 samples/sec   Loss 4.8694   LearningRate 0.0132   Epoch: 12   Global Step: 527940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:19,744-Speed 2616.25 samples/sec   Loss 4.9040   LearningRate 0.0132   Epoch: 12   Global Step: 527950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:23,643-Speed 2626.83 samples/sec   Loss 4.9445   LearningRate 0.0132   Epoch: 12   Global Step: 527960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:02:27,540-Speed 2629.19 samples/sec   Loss 4.9089   LearningRate 0.0132   Epoch: 12   Global Step: 527970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:31,430-Speed 2632.36 samples/sec   Loss 4.8776   LearningRate 0.0132   Epoch: 12   Global Step: 527980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:35,328-Speed 2627.81 samples/sec   Loss 4.9718   LearningRate 0.0132   Epoch: 12   Global Step: 527990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:39,250-Speed 2611.56 samples/sec   Loss 4.8715   LearningRate 0.0132   Epoch: 12   Global Step: 528000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:43,146-Speed 2629.24 samples/sec   Loss 4.8553   LearningRate 0.0132   Epoch: 12   Global Step: 528010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:47,038-Speed 2631.71 samples/sec   Loss 4.7703   LearningRate 0.0132   Epoch: 12   Global Step: 528020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:50,933-Speed 2629.48 samples/sec   Loss 4.8313   LearningRate 0.0132   Epoch: 12   Global Step: 528030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:54,825-Speed 2632.24 samples/sec   Loss 4.8813   LearningRate 0.0132   Epoch: 12   Global Step: 528040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:02:58,741-Speed 2615.53 samples/sec   Loss 4.8684   LearningRate 0.0132   Epoch: 12   Global Step: 528050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:02,636-Speed 2629.89 samples/sec   Loss 4.8494   LearningRate 0.0132   Epoch: 12   Global Step: 528060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:06,531-Speed 2629.70 samples/sec   Loss 4.8560   LearningRate 0.0132   Epoch: 12   Global Step: 528070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:03:10,429-Speed 2627.84 samples/sec   Loss 4.8397   LearningRate 0.0132   Epoch: 12   Global Step: 528080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:14,425-Speed 2563.49 samples/sec   Loss 4.9184   LearningRate 0.0132   Epoch: 12   Global Step: 528090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:18,322-Speed 2628.67 samples/sec   Loss 4.8351   LearningRate 0.0132   Epoch: 12   Global Step: 528100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:22,226-Speed 2623.66 samples/sec   Loss 4.8455   LearningRate 0.0132   Epoch: 12   Global Step: 528110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:26,128-Speed 2625.17 samples/sec   Loss 4.9227   LearningRate 0.0132   Epoch: 12   Global Step: 528120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:30,031-Speed 2624.26 samples/sec   Loss 4.7968   LearningRate 0.0132   Epoch: 12   Global Step: 528130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:33,939-Speed 2620.84 samples/sec   Loss 4.9002   LearningRate 0.0132   Epoch: 12   Global Step: 528140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:37,851-Speed 2618.22 samples/sec   Loss 4.8998   LearningRate 0.0132   Epoch: 12   Global Step: 528150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:41,755-Speed 2624.22 samples/sec   Loss 4.9753   LearningRate 0.0132   Epoch: 12   Global Step: 528160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:45,657-Speed 2625.12 samples/sec   Loss 4.8522   LearningRate 0.0132   Epoch: 12   Global Step: 528170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:03:49,553-Speed 2629.06 samples/sec   Loss 4.7916   LearningRate 0.0132   Epoch: 12   Global Step: 528180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:03:53,464-Speed 2618.48 samples/sec   Loss 4.8712   LearningRate 0.0132   Epoch: 12   Global Step: 528190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:03:57,371-Speed 2625.11 samples/sec   Loss 4.9671   LearningRate 0.0132   Epoch: 12   Global Step: 528200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:04:01,266-Speed 2629.31 samples/sec   Loss 4.9791   LearningRate 0.0132   Epoch: 12   Global Step: 528210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:04:05,184-Speed 2614.27 samples/sec   Loss 4.9700   LearningRate 0.0132   Epoch: 12   Global Step: 528220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:04:09,089-Speed 2623.33 samples/sec   Loss 4.9628   LearningRate 0.0132   Epoch: 12   Global Step: 528230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:04:13,005-Speed 2615.38 samples/sec   Loss 4.8528   LearningRate 0.0132   Epoch: 12   Global Step: 528240   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:04:16,934-Speed 2606.76 samples/sec   Loss 4.9347   LearningRate 0.0132   Epoch: 12   Global Step: 528250   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:04:20,810-Speed 2642.60 samples/sec   Loss 4.8928   LearningRate 0.0132   Epoch: 12   Global Step: 528260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:24,753-Speed 2598.73 samples/sec   Loss 4.9505   LearningRate 0.0132   Epoch: 12   Global Step: 528270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:28,681-Speed 2607.44 samples/sec   Loss 4.9386   LearningRate 0.0132   Epoch: 12   Global Step: 528280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:32,690-Speed 2554.64 samples/sec   Loss 4.8936   LearningRate 0.0132   Epoch: 12   Global Step: 528290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:36,586-Speed 2629.09 samples/sec   Loss 4.9073   LearningRate 0.0132   Epoch: 12   Global Step: 528300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:40,489-Speed 2624.44 samples/sec   Loss 4.8744   LearningRate 0.0132   Epoch: 12   Global Step: 528310   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:44,383-Speed 2630.21 samples/sec   Loss 4.9598   LearningRate 0.0132   Epoch: 12   Global Step: 528320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:48,285-Speed 2625.51 samples/sec   Loss 4.8399   LearningRate 0.0132   Epoch: 12   Global Step: 528330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:52,204-Speed 2612.78 samples/sec   Loss 4.9837   LearningRate 0.0132   Epoch: 12   Global Step: 528340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:04:56,106-Speed 2625.58 samples/sec   Loss 4.8884   LearningRate 0.0132   Epoch: 12   Global Step: 528350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:00,024-Speed 2614.50 samples/sec   Loss 4.9037   LearningRate 0.0132   Epoch: 12   Global Step: 528360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:05:03,917-Speed 2631.67 samples/sec   Loss 4.8196   LearningRate 0.0132   Epoch: 12   Global Step: 528370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:05:07,818-Speed 2625.36 samples/sec   Loss 4.9324   LearningRate 0.0132   Epoch: 12   Global Step: 528380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:05:11,713-Speed 2629.79 samples/sec   Loss 4.9079   LearningRate 0.0132   Epoch: 12   Global Step: 528390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:05:15,604-Speed 2631.76 samples/sec   Loss 4.8957   LearningRate 0.0132   Epoch: 12   Global Step: 528400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:05:19,479-Speed 2643.24 samples/sec   Loss 4.8354   LearningRate 0.0132   Epoch: 12   Global Step: 528410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:23,376-Speed 2629.13 samples/sec   Loss 4.8081   LearningRate 0.0132   Epoch: 12   Global Step: 528420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:27,278-Speed 2624.86 samples/sec   Loss 4.9085   LearningRate 0.0132   Epoch: 12   Global Step: 528430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:31,262-Speed 2570.92 samples/sec   Loss 4.7871   LearningRate 0.0132   Epoch: 12   Global Step: 528440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:35,157-Speed 2629.95 samples/sec   Loss 4.8143   LearningRate 0.0132   Epoch: 12   Global Step: 528450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:39,054-Speed 2627.52 samples/sec   Loss 4.9055   LearningRate 0.0132   Epoch: 12   Global Step: 528460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:42,951-Speed 2628.43 samples/sec   Loss 4.9026   LearningRate 0.0132   Epoch: 12   Global Step: 528470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:46,858-Speed 2621.72 samples/sec   Loss 4.8283   LearningRate 0.0132   Epoch: 12   Global Step: 528480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:50,759-Speed 2625.58 samples/sec   Loss 4.9000   LearningRate 0.0132   Epoch: 12   Global Step: 528490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:54,660-Speed 2626.08 samples/sec   Loss 4.8555   LearningRate 0.0132   Epoch: 12   Global Step: 528500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:05:58,578-Speed 2614.52 samples/sec   Loss 4.8532   LearningRate 0.0132   Epoch: 12   Global Step: 528510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:06:02,454-Speed 2643.07 samples/sec   Loss 4.8067   LearningRate 0.0132   Epoch: 12   Global Step: 528520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:06,362-Speed 2620.68 samples/sec   Loss 4.9763   LearningRate 0.0132   Epoch: 12   Global Step: 528530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:10,262-Speed 2626.38 samples/sec   Loss 4.8146   LearningRate 0.0132   Epoch: 12   Global Step: 528540   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:14,155-Speed 2631.50 samples/sec   Loss 5.0051   LearningRate 0.0132   Epoch: 12   Global Step: 528550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:18,050-Speed 2629.51 samples/sec   Loss 4.8089   LearningRate 0.0132   Epoch: 12   Global Step: 528560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:21,957-Speed 2621.66 samples/sec   Loss 4.8616   LearningRate 0.0132   Epoch: 12   Global Step: 528570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:25,852-Speed 2629.62 samples/sec   Loss 4.7309   LearningRate 0.0132   Epoch: 12   Global Step: 528580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:29,767-Speed 2615.65 samples/sec   Loss 4.9348   LearningRate 0.0132   Epoch: 12   Global Step: 528590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:33,662-Speed 2630.10 samples/sec   Loss 4.9156   LearningRate 0.0132   Epoch: 12   Global Step: 528600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:37,556-Speed 2631.42 samples/sec   Loss 4.8797   LearningRate 0.0132   Epoch: 12   Global Step: 528610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:41,456-Speed 2625.89 samples/sec   Loss 4.8777   LearningRate 0.0132   Epoch: 12   Global Step: 528620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:06:45,337-Speed 2639.91 samples/sec   Loss 4.8553   LearningRate 0.0132   Epoch: 12   Global Step: 528630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:49,237-Speed 2625.92 samples/sec   Loss 4.8154   LearningRate 0.0132   Epoch: 12   Global Step: 528640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:53,168-Speed 2605.83 samples/sec   Loss 4.8725   LearningRate 0.0132   Epoch: 12   Global Step: 528650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:06:57,060-Speed 2631.77 samples/sec   Loss 4.8117   LearningRate 0.0132   Epoch: 12   Global Step: 528660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:00,965-Speed 2622.70 samples/sec   Loss 4.8979   LearningRate 0.0132   Epoch: 12   Global Step: 528670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:04,859-Speed 2630.36 samples/sec   Loss 4.9063   LearningRate 0.0132   Epoch: 12   Global Step: 528680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:08,754-Speed 2629.93 samples/sec   Loss 4.9197   LearningRate 0.0132   Epoch: 12   Global Step: 528690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:12,662-Speed 2621.37 samples/sec   Loss 4.8955   LearningRate 0.0132   Epoch: 12   Global Step: 528700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:16,563-Speed 2625.03 samples/sec   Loss 4.8684   LearningRate 0.0132   Epoch: 12   Global Step: 528710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:20,465-Speed 2625.81 samples/sec   Loss 4.8126   LearningRate 0.0132   Epoch: 12   Global Step: 528720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:24,359-Speed 2630.27 samples/sec   Loss 4.8826   LearningRate 0.0132   Epoch: 12   Global Step: 528730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:07:28,233-Speed 2644.09 samples/sec   Loss 4.8890   LearningRate 0.0132   Epoch: 12   Global Step: 528740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:32,130-Speed 2628.06 samples/sec   Loss 4.8862   LearningRate 0.0131   Epoch: 12   Global Step: 528750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:36,021-Speed 2632.67 samples/sec   Loss 4.8438   LearningRate 0.0131   Epoch: 12   Global Step: 528760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:39,920-Speed 2626.59 samples/sec   Loss 4.8453   LearningRate 0.0131   Epoch: 12   Global Step: 528770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:43,825-Speed 2623.45 samples/sec   Loss 4.9203   LearningRate 0.0131   Epoch: 12   Global Step: 528780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:47,721-Speed 2629.16 samples/sec   Loss 4.8666   LearningRate 0.0131   Epoch: 12   Global Step: 528790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:07:51,604-Speed 2637.20 samples/sec   Loss 4.9664   LearningRate 0.0131   Epoch: 12   Global Step: 528800   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:07:55,498-Speed 2631.88 samples/sec   Loss 4.8862   LearningRate 0.0131   Epoch: 12   Global Step: 528810   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:07:59,392-Speed 2629.93 samples/sec   Loss 4.8203   LearningRate 0.0131   Epoch: 12   Global Step: 528820   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:03,289-Speed 2628.53 samples/sec   Loss 4.8335   LearningRate 0.0131   Epoch: 12   Global Step: 528830   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:07,186-Speed 2627.78 samples/sec   Loss 4.8451   LearningRate 0.0131   Epoch: 12   Global Step: 528840   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:11,084-Speed 2628.24 samples/sec   Loss 4.9412   LearningRate 0.0131   Epoch: 12   Global Step: 528850   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:14,979-Speed 2629.16 samples/sec   Loss 4.9767   LearningRate 0.0131   Epoch: 12   Global Step: 528860   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:18,879-Speed 2626.61 samples/sec   Loss 4.8912   LearningRate 0.0131   Epoch: 12   Global Step: 528870   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:22,773-Speed 2630.00 samples/sec   Loss 4.8312   LearningRate 0.0131   Epoch: 12   Global Step: 528880   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:26,668-Speed 2630.46 samples/sec   Loss 4.8306   LearningRate 0.0131   Epoch: 12   Global Step: 528890   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:08:30,563-Speed 2629.75 samples/sec   Loss 4.8635   LearningRate 0.0131   Epoch: 12   Global Step: 528900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:08:34,558-Speed 2563.39 samples/sec   Loss 4.8770   LearningRate 0.0131   Epoch: 12   Global Step: 528910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:08:38,469-Speed 2618.43 samples/sec   Loss 4.8513   LearningRate 0.0131   Epoch: 12   Global Step: 528920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:08:42,373-Speed 2623.78 samples/sec   Loss 4.8749   LearningRate 0.0131   Epoch: 12   Global Step: 528930   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:08:46,284-Speed 2619.25 samples/sec   Loss 4.8370   LearningRate 0.0131   Epoch: 12   Global Step: 528940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:08:50,182-Speed 2627.49 samples/sec   Loss 4.9427   LearningRate 0.0131   Epoch: 12   Global Step: 528950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:08:54,079-Speed 2628.29 samples/sec   Loss 4.9141   LearningRate 0.0131   Epoch: 12   Global Step: 528960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:08:57,978-Speed 2627.19 samples/sec   Loss 4.7979   LearningRate 0.0131   Epoch: 12   Global Step: 528970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:01,883-Speed 2623.37 samples/sec   Loss 4.7811   LearningRate 0.0131   Epoch: 12   Global Step: 528980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:05,820-Speed 2601.43 samples/sec   Loss 4.9218   LearningRate 0.0131   Epoch: 12   Global Step: 528990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:09,717-Speed 2628.08 samples/sec   Loss 4.8907   LearningRate 0.0131   Epoch: 12   Global Step: 529000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:09:13,619-Speed 2625.61 samples/sec   Loss 4.7502   LearningRate 0.0131   Epoch: 12   Global Step: 529010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:09:17,516-Speed 2627.68 samples/sec   Loss 4.8535   LearningRate 0.0131   Epoch: 12   Global Step: 529020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:09:21,392-Speed 2643.16 samples/sec   Loss 4.9053   LearningRate 0.0131   Epoch: 12   Global Step: 529030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:25,311-Speed 2612.87 samples/sec   Loss 4.8204   LearningRate 0.0131   Epoch: 12   Global Step: 529040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:29,213-Speed 2625.43 samples/sec   Loss 4.7835   LearningRate 0.0131   Epoch: 12   Global Step: 529050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:33,114-Speed 2625.75 samples/sec   Loss 4.8893   LearningRate 0.0131   Epoch: 12   Global Step: 529060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:37,006-Speed 2631.70 samples/sec   Loss 4.9389   LearningRate 0.0131   Epoch: 12   Global Step: 529070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:40,899-Speed 2630.48 samples/sec   Loss 4.7853   LearningRate 0.0131   Epoch: 12   Global Step: 529080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:44,817-Speed 2614.06 samples/sec   Loss 4.7999   LearningRate 0.0131   Epoch: 12   Global Step: 529090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:48,710-Speed 2631.54 samples/sec   Loss 4.7838   LearningRate 0.0131   Epoch: 12   Global Step: 529100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:52,606-Speed 2628.31 samples/sec   Loss 5.0161   LearningRate 0.0131   Epoch: 12   Global Step: 529110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:09:56,497-Speed 2632.58 samples/sec   Loss 4.8469   LearningRate 0.0131   Epoch: 12   Global Step: 529120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:00,369-Speed 2645.87 samples/sec   Loss 4.8274   LearningRate 0.0131   Epoch: 12   Global Step: 529130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:04,263-Speed 2630.08 samples/sec   Loss 4.8765   LearningRate 0.0131   Epoch: 12   Global Step: 529140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:08,164-Speed 2625.73 samples/sec   Loss 4.8926   LearningRate 0.0131   Epoch: 12   Global Step: 529150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:12,061-Speed 2629.68 samples/sec   Loss 4.8132   LearningRate 0.0131   Epoch: 12   Global Step: 529160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:15,954-Speed 2630.64 samples/sec   Loss 4.7996   LearningRate 0.0131   Epoch: 12   Global Step: 529170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:19,848-Speed 2630.23 samples/sec   Loss 4.8656   LearningRate 0.0131   Epoch: 12   Global Step: 529180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:23,753-Speed 2623.17 samples/sec   Loss 4.9411   LearningRate 0.0131   Epoch: 12   Global Step: 529190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:27,663-Speed 2619.28 samples/sec   Loss 4.8187   LearningRate 0.0131   Epoch: 12   Global Step: 529200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:31,566-Speed 2624.11 samples/sec   Loss 4.7909   LearningRate 0.0131   Epoch: 12   Global Step: 529210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:35,473-Speed 2621.70 samples/sec   Loss 4.8791   LearningRate 0.0131   Epoch: 12   Global Step: 529220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:10:39,347-Speed 2643.48 samples/sec   Loss 4.8630   LearningRate 0.0131   Epoch: 12   Global Step: 529230   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:10:43,265-Speed 2614.64 samples/sec   Loss 4.8622   LearningRate 0.0131   Epoch: 12   Global Step: 529240   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:10:47,168-Speed 2624.79 samples/sec   Loss 4.8761   LearningRate 0.0131   Epoch: 12   Global Step: 529250   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:10:51,072-Speed 2623.24 samples/sec   Loss 4.8735   LearningRate 0.0131   Epoch: 12   Global Step: 529260   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:10:54,979-Speed 2622.18 samples/sec   Loss 4.9936   LearningRate 0.0131   Epoch: 12   Global Step: 529270   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:10:58,894-Speed 2615.86 samples/sec   Loss 4.7241   LearningRate 0.0131   Epoch: 12   Global Step: 529280   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:11:02,792-Speed 2628.78 samples/sec   Loss 4.8421   LearningRate 0.0131   Epoch: 12   Global Step: 529290   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:11:06,689-Speed 2628.07 samples/sec   Loss 4.8877   LearningRate 0.0131   Epoch: 12   Global Step: 529300   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:11:10,594-Speed 2622.53 samples/sec   Loss 4.8660   LearningRate 0.0131   Epoch: 12   Global Step: 529310   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:11:14,498-Speed 2623.69 samples/sec   Loss 4.7125   LearningRate 0.0131   Epoch: 12   Global Step: 529320   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:11:18,414-Speed 2616.35 samples/sec   Loss 4.9899   LearningRate 0.0131   Epoch: 12   Global Step: 529330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:22,310-Speed 2629.28 samples/sec   Loss 4.8488   LearningRate 0.0131   Epoch: 12   Global Step: 529340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:26,211-Speed 2625.53 samples/sec   Loss 4.7979   LearningRate 0.0131   Epoch: 12   Global Step: 529350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:30,108-Speed 2628.46 samples/sec   Loss 4.7627   LearningRate 0.0131   Epoch: 12   Global Step: 529360   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:34,000-Speed 2631.36 samples/sec   Loss 4.9497   LearningRate 0.0131   Epoch: 12   Global Step: 529370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:37,894-Speed 2630.11 samples/sec   Loss 4.8982   LearningRate 0.0131   Epoch: 12   Global Step: 529380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:41,792-Speed 2627.42 samples/sec   Loss 4.8192   LearningRate 0.0131   Epoch: 12   Global Step: 529390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:45,684-Speed 2631.45 samples/sec   Loss 4.9377   LearningRate 0.0131   Epoch: 12   Global Step: 529400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:49,641-Speed 2589.46 samples/sec   Loss 4.8957   LearningRate 0.0131   Epoch: 12   Global Step: 529410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:53,532-Speed 2632.41 samples/sec   Loss 4.8734   LearningRate 0.0131   Epoch: 12   Global Step: 529420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:11:57,431-Speed 2627.19 samples/sec   Loss 4.8405   LearningRate 0.0131   Epoch: 12   Global Step: 529430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:12:01,322-Speed 2632.70 samples/sec   Loss 4.8926   LearningRate 0.0131   Epoch: 12   Global Step: 529440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:05,235-Speed 2617.47 samples/sec   Loss 4.8945   LearningRate 0.0131   Epoch: 12   Global Step: 529450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:09,161-Speed 2608.57 samples/sec   Loss 4.8612   LearningRate 0.0131   Epoch: 12   Global Step: 529460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:13,059-Speed 2627.47 samples/sec   Loss 4.8863   LearningRate 0.0131   Epoch: 12   Global Step: 529470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:16,969-Speed 2619.40 samples/sec   Loss 4.8598   LearningRate 0.0131   Epoch: 12   Global Step: 529480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:20,867-Speed 2628.06 samples/sec   Loss 4.7854   LearningRate 0.0131   Epoch: 12   Global Step: 529490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:24,801-Speed 2604.09 samples/sec   Loss 4.8455   LearningRate 0.0131   Epoch: 12   Global Step: 529500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:28,694-Speed 2631.00 samples/sec   Loss 4.8969   LearningRate 0.0131   Epoch: 12   Global Step: 529510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:32,590-Speed 2628.45 samples/sec   Loss 4.8379   LearningRate 0.0131   Epoch: 12   Global Step: 529520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:36,506-Speed 2615.56 samples/sec   Loss 4.8850   LearningRate 0.0131   Epoch: 12   Global Step: 529530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:40,413-Speed 2621.24 samples/sec   Loss 4.9656   LearningRate 0.0131   Epoch: 12   Global Step: 529540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:12:44,288-Speed 2643.50 samples/sec   Loss 4.8242   LearningRate 0.0131   Epoch: 12   Global Step: 529550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:48,192-Speed 2623.67 samples/sec   Loss 4.9141   LearningRate 0.0131   Epoch: 12   Global Step: 529560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:52,090-Speed 2628.46 samples/sec   Loss 4.8927   LearningRate 0.0131   Epoch: 12   Global Step: 529570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:55,985-Speed 2629.42 samples/sec   Loss 4.8265   LearningRate 0.0131   Epoch: 12   Global Step: 529580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:12:59,898-Speed 2617.64 samples/sec   Loss 4.8629   LearningRate 0.0131   Epoch: 12   Global Step: 529590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:03,939-Speed 2534.73 samples/sec   Loss 4.9239   LearningRate 0.0131   Epoch: 12   Global Step: 529600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:07,964-Speed 2544.97 samples/sec   Loss 4.9314   LearningRate 0.0131   Epoch: 12   Global Step: 529610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:11,858-Speed 2629.64 samples/sec   Loss 4.8113   LearningRate 0.0131   Epoch: 12   Global Step: 529620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:15,754-Speed 2629.19 samples/sec   Loss 4.8926   LearningRate 0.0131   Epoch: 12   Global Step: 529630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:19,658-Speed 2623.88 samples/sec   Loss 4.9354   LearningRate 0.0131   Epoch: 12   Global Step: 529640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:23,554-Speed 2628.54 samples/sec   Loss 4.8885   LearningRate 0.0131   Epoch: 12   Global Step: 529650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:13:27,450-Speed 2629.61 samples/sec   Loss 4.8625   LearningRate 0.0131   Epoch: 12   Global Step: 529660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:13:31,325-Speed 2643.06 samples/sec   Loss 4.8657   LearningRate 0.0131   Epoch: 12   Global Step: 529670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:35,222-Speed 2628.29 samples/sec   Loss 4.8727   LearningRate 0.0131   Epoch: 12   Global Step: 529680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:39,116-Speed 2630.05 samples/sec   Loss 4.8578   LearningRate 0.0131   Epoch: 12   Global Step: 529690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:43,044-Speed 2608.07 samples/sec   Loss 4.7514   LearningRate 0.0131   Epoch: 12   Global Step: 529700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:47,101-Speed 2524.34 samples/sec   Loss 4.8983   LearningRate 0.0131   Epoch: 12   Global Step: 529710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:51,142-Speed 2534.73 samples/sec   Loss 4.9128   LearningRate 0.0131   Epoch: 12   Global Step: 529720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:55,046-Speed 2624.18 samples/sec   Loss 4.8777   LearningRate 0.0131   Epoch: 12   Global Step: 529730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:13:58,944-Speed 2628.10 samples/sec   Loss 4.8828   LearningRate 0.0131   Epoch: 12   Global Step: 529740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:02,843-Speed 2626.66 samples/sec   Loss 4.8371   LearningRate 0.0131   Epoch: 12   Global Step: 529750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:06,745-Speed 2624.26 samples/sec   Loss 4.8455   LearningRate 0.0131   Epoch: 12   Global Step: 529760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:10,641-Speed 2629.33 samples/sec   Loss 4.9096   LearningRate 0.0131   Epoch: 12   Global Step: 529770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:14:14,540-Speed 2627.48 samples/sec   Loss 4.7966   LearningRate 0.0131   Epoch: 12   Global Step: 529780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:14:18,452-Speed 2618.52 samples/sec   Loss 4.8627   LearningRate 0.0131   Epoch: 12   Global Step: 529790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:14:22,355-Speed 2623.96 samples/sec   Loss 4.9222   LearningRate 0.0131   Epoch: 12   Global Step: 529800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:14:26,239-Speed 2637.82 samples/sec   Loss 4.8514   LearningRate 0.0131   Epoch: 12   Global Step: 529810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:30,134-Speed 2629.64 samples/sec   Loss 4.8795   LearningRate 0.0131   Epoch: 12   Global Step: 529820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:34,040-Speed 2622.05 samples/sec   Loss 4.8288   LearningRate 0.0131   Epoch: 12   Global Step: 529830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:37,941-Speed 2625.57 samples/sec   Loss 4.9267   LearningRate 0.0131   Epoch: 12   Global Step: 529840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:41,844-Speed 2624.75 samples/sec   Loss 4.8606   LearningRate 0.0131   Epoch: 12   Global Step: 529850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:45,746-Speed 2624.56 samples/sec   Loss 4.8451   LearningRate 0.0131   Epoch: 12   Global Step: 529860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:49,651-Speed 2623.04 samples/sec   Loss 4.8578   LearningRate 0.0131   Epoch: 12   Global Step: 529870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:53,559-Speed 2621.06 samples/sec   Loss 4.7763   LearningRate 0.0131   Epoch: 12   Global Step: 529880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:14:57,458-Speed 2627.19 samples/sec   Loss 4.8886   LearningRate 0.0131   Epoch: 12   Global Step: 529890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:15:01,357-Speed 2627.17 samples/sec   Loss 4.9290   LearningRate 0.0130   Epoch: 12   Global Step: 529900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:15:05,258-Speed 2624.93 samples/sec   Loss 4.8731   LearningRate 0.0130   Epoch: 12   Global Step: 529910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:15:09,155-Speed 2628.51 samples/sec   Loss 4.9457   LearningRate 0.0130   Epoch: 12   Global Step: 529920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:15:13,075-Speed 2613.22 samples/sec   Loss 4.8004   LearningRate 0.0130   Epoch: 12   Global Step: 529930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:15:16,967-Speed 2631.41 samples/sec   Loss 4.9108   LearningRate 0.0130   Epoch: 12   Global Step: 529940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:15:20,863-Speed 2629.35 samples/sec   Loss 4.8811   LearningRate 0.0130   Epoch: 12   Global Step: 529950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:15:24,757-Speed 2630.54 samples/sec   Loss 4.9135   LearningRate 0.0130   Epoch: 12   Global Step: 529960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:15:28,653-Speed 2629.36 samples/sec   Loss 4.8960   LearningRate 0.0130   Epoch: 12   Global Step: 529970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:15:32,571-Speed 2613.43 samples/sec   Loss 4.8263   LearningRate 0.0130   Epoch: 12   Global Step: 529980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:15:36,470-Speed 2627.41 samples/sec   Loss 4.9374   LearningRate 0.0130   Epoch: 12   Global Step: 529990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:15:40,365-Speed 2629.75 samples/sec   Loss 4.7252   LearningRate 0.0130   Epoch: 12   Global Step: 530000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:16:23,425-[lfw][530000]XNorm: 23.368143
Training: 2022-04-15 07:16:23,426-[lfw][530000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 07:16:23,426-[lfw][530000]Accuracy-Highest: 0.99800
Training: 2022-04-15 07:17:13,246-[cfp_fp][530000]XNorm: 21.864775
Training: 2022-04-15 07:17:13,247-[cfp_fp][530000]Accuracy-Flip: 0.99086+-0.00453
Training: 2022-04-15 07:17:13,248-[cfp_fp][530000]Accuracy-Highest: 0.99086
Training: 2022-04-15 07:17:56,131-[agedb_30][530000]XNorm: 23.350406
Training: 2022-04-15 07:17:56,132-[agedb_30][530000]Accuracy-Flip: 0.97850+-0.00689
Training: 2022-04-15 07:17:56,133-[agedb_30][530000]Accuracy-Highest: 0.98083
Training: 2022-04-15 07:18:00,029-Speed 73.32 samples/sec   Loss 4.7652   LearningRate 0.0130   Epoch: 12   Global Step: 530010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:03,905-Speed 2642.41 samples/sec   Loss 4.7585   LearningRate 0.0130   Epoch: 12   Global Step: 530020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:07,782-Speed 2642.01 samples/sec   Loss 4.8720   LearningRate 0.0130   Epoch: 12   Global Step: 530030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:11,664-Speed 2638.78 samples/sec   Loss 4.8995   LearningRate 0.0130   Epoch: 12   Global Step: 530040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:15,555-Speed 2631.85 samples/sec   Loss 4.9189   LearningRate 0.0130   Epoch: 12   Global Step: 530050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:19,438-Speed 2638.00 samples/sec   Loss 4.8507   LearningRate 0.0130   Epoch: 12   Global Step: 530060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:23,322-Speed 2636.97 samples/sec   Loss 4.9030   LearningRate 0.0130   Epoch: 12   Global Step: 530070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:27,219-Speed 2628.26 samples/sec   Loss 4.8991   LearningRate 0.0130   Epoch: 12   Global Step: 530080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:18:31,104-Speed 2636.30 samples/sec   Loss 4.9084   LearningRate 0.0130   Epoch: 12   Global Step: 530090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:18:34,996-Speed 2631.53 samples/sec   Loss 4.8767   LearningRate 0.0130   Epoch: 12   Global Step: 530100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:18:38,959-Speed 2585.20 samples/sec   Loss 4.8553   LearningRate 0.0130   Epoch: 12   Global Step: 530110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:42,855-Speed 2628.49 samples/sec   Loss 4.9029   LearningRate 0.0130   Epoch: 12   Global Step: 530120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:46,748-Speed 2631.24 samples/sec   Loss 4.8126   LearningRate 0.0130   Epoch: 12   Global Step: 530130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:50,643-Speed 2629.24 samples/sec   Loss 4.8522   LearningRate 0.0130   Epoch: 12   Global Step: 530140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:54,535-Speed 2631.98 samples/sec   Loss 4.8828   LearningRate 0.0130   Epoch: 12   Global Step: 530150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:18:58,430-Speed 2629.38 samples/sec   Loss 4.7781   LearningRate 0.0130   Epoch: 12   Global Step: 530160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:02,335-Speed 2622.94 samples/sec   Loss 4.9990   LearningRate 0.0130   Epoch: 12   Global Step: 530170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:06,223-Speed 2634.02 samples/sec   Loss 4.8572   LearningRate 0.0130   Epoch: 12   Global Step: 530180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:10,121-Speed 2627.95 samples/sec   Loss 4.9102   LearningRate 0.0130   Epoch: 12   Global Step: 530190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:14,019-Speed 2628.19 samples/sec   Loss 4.7569   LearningRate 0.0130   Epoch: 12   Global Step: 530200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:17,932-Speed 2617.34 samples/sec   Loss 4.9126   LearningRate 0.0130   Epoch: 12   Global Step: 530210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:19:21,837-Speed 2622.40 samples/sec   Loss 4.8315   LearningRate 0.0130   Epoch: 12   Global Step: 530220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:19:25,745-Speed 2621.61 samples/sec   Loss 4.8407   LearningRate 0.0130   Epoch: 12   Global Step: 530230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:29,821-Speed 2512.51 samples/sec   Loss 4.8887   LearningRate 0.0130   Epoch: 12   Global Step: 530240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:33,730-Speed 2620.54 samples/sec   Loss 4.8996   LearningRate 0.0130   Epoch: 12   Global Step: 530250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:37,629-Speed 2626.44 samples/sec   Loss 4.9516   LearningRate 0.0130   Epoch: 12   Global Step: 530260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:41,526-Speed 2628.96 samples/sec   Loss 4.8366   LearningRate 0.0130   Epoch: 12   Global Step: 530270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:45,450-Speed 2609.51 samples/sec   Loss 4.8869   LearningRate 0.0130   Epoch: 12   Global Step: 530280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:49,353-Speed 2624.46 samples/sec   Loss 4.8913   LearningRate 0.0130   Epoch: 12   Global Step: 530290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:53,248-Speed 2629.97 samples/sec   Loss 4.8334   LearningRate 0.0130   Epoch: 12   Global Step: 530300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:19:57,157-Speed 2619.91 samples/sec   Loss 4.8417   LearningRate 0.0130   Epoch: 12   Global Step: 530310   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:01,060-Speed 2625.00 samples/sec   Loss 4.8288   LearningRate 0.0130   Epoch: 12   Global Step: 530320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:04,958-Speed 2627.31 samples/sec   Loss 4.9042   LearningRate 0.0130   Epoch: 12   Global Step: 530330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:20:08,857-Speed 2626.69 samples/sec   Loss 4.9920   LearningRate 0.0130   Epoch: 12   Global Step: 530340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:20:12,758-Speed 2625.89 samples/sec   Loss 4.9119   LearningRate 0.0130   Epoch: 12   Global Step: 530350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:20:16,651-Speed 2630.73 samples/sec   Loss 4.7902   LearningRate 0.0130   Epoch: 12   Global Step: 530360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:20:20,551-Speed 2626.17 samples/sec   Loss 4.8502   LearningRate 0.0130   Epoch: 12   Global Step: 530370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:20:24,456-Speed 2622.73 samples/sec   Loss 4.8188   LearningRate 0.0130   Epoch: 12   Global Step: 530380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:20:28,433-Speed 2575.66 samples/sec   Loss 4.8161   LearningRate 0.0130   Epoch: 12   Global Step: 530390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:20:32,308-Speed 2643.83 samples/sec   Loss 4.8929   LearningRate 0.0130   Epoch: 12   Global Step: 530400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:36,207-Speed 2627.07 samples/sec   Loss 4.8829   LearningRate 0.0130   Epoch: 12   Global Step: 530410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:40,115-Speed 2620.37 samples/sec   Loss 4.7969   LearningRate 0.0130   Epoch: 12   Global Step: 530420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:44,012-Speed 2628.07 samples/sec   Loss 4.8121   LearningRate 0.0130   Epoch: 12   Global Step: 530430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:47,906-Speed 2630.55 samples/sec   Loss 4.8792   LearningRate 0.0130   Epoch: 12   Global Step: 530440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:52,245-Speed 2360.26 samples/sec   Loss 4.8200   LearningRate 0.0130   Epoch: 12   Global Step: 530450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:20:56,174-Speed 2607.40 samples/sec   Loss 4.9344   LearningRate 0.0130   Epoch: 12   Global Step: 530460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:21:00,073-Speed 2627.25 samples/sec   Loss 4.7754   LearningRate 0.0130   Epoch: 12   Global Step: 530470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:21:03,968-Speed 2629.26 samples/sec   Loss 4.7534   LearningRate 0.0130   Epoch: 12   Global Step: 530480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:21:07,918-Speed 2593.00 samples/sec   Loss 4.9251   LearningRate 0.0130   Epoch: 12   Global Step: 530490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:21:11,834-Speed 2615.37 samples/sec   Loss 4.8144   LearningRate 0.0130   Epoch: 12   Global Step: 530500   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:15,735-Speed 2625.63 samples/sec   Loss 4.7724   LearningRate 0.0130   Epoch: 12   Global Step: 530510   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:19,630-Speed 2629.91 samples/sec   Loss 4.7791   LearningRate 0.0130   Epoch: 12   Global Step: 530520   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:23,533-Speed 2624.64 samples/sec   Loss 4.8626   LearningRate 0.0130   Epoch: 12   Global Step: 530530   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:27,455-Speed 2611.02 samples/sec   Loss 4.8869   LearningRate 0.0130   Epoch: 12   Global Step: 530540   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:31,349-Speed 2630.32 samples/sec   Loss 4.9373   LearningRate 0.0130   Epoch: 12   Global Step: 530550   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:35,244-Speed 2629.93 samples/sec   Loss 4.9065   LearningRate 0.0130   Epoch: 12   Global Step: 530560   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:39,143-Speed 2627.34 samples/sec   Loss 4.7314   LearningRate 0.0130   Epoch: 12   Global Step: 530570   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:43,044-Speed 2624.82 samples/sec   Loss 4.9021   LearningRate 0.0130   Epoch: 12   Global Step: 530580   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:46,938-Speed 2630.82 samples/sec   Loss 4.8006   LearningRate 0.0130   Epoch: 12   Global Step: 530590   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-04-15 07:21:50,829-Speed 2632.43 samples/sec   Loss 4.7804   LearningRate 0.0130   Epoch: 12   Global Step: 530600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:21:54,724-Speed 2629.73 samples/sec   Loss 4.8322   LearningRate 0.0130   Epoch: 12   Global Step: 530610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:21:58,625-Speed 2625.47 samples/sec   Loss 4.8940   LearningRate 0.0130   Epoch: 12   Global Step: 530620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:02,516-Speed 2632.01 samples/sec   Loss 4.9125   LearningRate 0.0130   Epoch: 12   Global Step: 530630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:06,412-Speed 2629.32 samples/sec   Loss 4.8441   LearningRate 0.0130   Epoch: 12   Global Step: 530640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:10,305-Speed 2631.30 samples/sec   Loss 4.7478   LearningRate 0.0130   Epoch: 12   Global Step: 530650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:14,207-Speed 2624.38 samples/sec   Loss 4.7454   LearningRate 0.0130   Epoch: 12   Global Step: 530660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:18,102-Speed 2629.92 samples/sec   Loss 4.9169   LearningRate 0.0130   Epoch: 12   Global Step: 530670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:22,058-Speed 2589.25 samples/sec   Loss 4.7807   LearningRate 0.0130   Epoch: 12   Global Step: 530680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:26,017-Speed 2586.79 samples/sec   Loss 4.8529   LearningRate 0.0130   Epoch: 12   Global Step: 530690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:22:29,909-Speed 2631.43 samples/sec   Loss 4.8455   LearningRate 0.0130   Epoch: 12   Global Step: 530700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:22:33,808-Speed 2627.10 samples/sec   Loss 4.8696   LearningRate 0.0130   Epoch: 12   Global Step: 530710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:22:37,780-Speed 2578.56 samples/sec   Loss 4.8510   LearningRate 0.0130   Epoch: 12   Global Step: 530720   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:22:41,909-Speed 2480.34 samples/sec   Loss 4.9285   LearningRate 0.0130   Epoch: 12   Global Step: 530730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:22:45,854-Speed 2597.23 samples/sec   Loss 4.8388   LearningRate 0.0130   Epoch: 12   Global Step: 530740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:22:49,753-Speed 2627.04 samples/sec   Loss 4.8465   LearningRate 0.0130   Epoch: 12   Global Step: 530750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:22:53,660-Speed 2621.79 samples/sec   Loss 4.8765   LearningRate 0.0130   Epoch: 12   Global Step: 530760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:22:57,540-Speed 2639.53 samples/sec   Loss 4.8515   LearningRate 0.0130   Epoch: 12   Global Step: 530770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:01,447-Speed 2622.13 samples/sec   Loss 4.9187   LearningRate 0.0130   Epoch: 12   Global Step: 530780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:05,345-Speed 2627.46 samples/sec   Loss 4.9026   LearningRate 0.0130   Epoch: 12   Global Step: 530790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:09,248-Speed 2623.70 samples/sec   Loss 4.8708   LearningRate 0.0130   Epoch: 12   Global Step: 530800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:13,149-Speed 2625.51 samples/sec   Loss 4.8186   LearningRate 0.0130   Epoch: 12   Global Step: 530810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:17,042-Speed 2630.91 samples/sec   Loss 4.7647   LearningRate 0.0130   Epoch: 12   Global Step: 530820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:20,941-Speed 2627.02 samples/sec   Loss 4.8478   LearningRate 0.0130   Epoch: 12   Global Step: 530830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:24,837-Speed 2629.85 samples/sec   Loss 4.7932   LearningRate 0.0130   Epoch: 12   Global Step: 530840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:28,735-Speed 2627.14 samples/sec   Loss 4.9834   LearningRate 0.0130   Epoch: 12   Global Step: 530850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:32,638-Speed 2625.31 samples/sec   Loss 4.8055   LearningRate 0.0130   Epoch: 12   Global Step: 530860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:36,534-Speed 2628.46 samples/sec   Loss 4.7879   LearningRate 0.0130   Epoch: 12   Global Step: 530870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:23:40,432-Speed 2627.58 samples/sec   Loss 4.8619   LearningRate 0.0130   Epoch: 12   Global Step: 530880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:23:44,314-Speed 2637.94 samples/sec   Loss 4.8777   LearningRate 0.0130   Epoch: 12   Global Step: 530890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:48,246-Speed 2605.21 samples/sec   Loss 4.7531   LearningRate 0.0130   Epoch: 12   Global Step: 530900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:52,181-Speed 2603.06 samples/sec   Loss 4.8373   LearningRate 0.0130   Epoch: 12   Global Step: 530910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:56,074-Speed 2630.69 samples/sec   Loss 4.8094   LearningRate 0.0130   Epoch: 12   Global Step: 530920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:23:59,968-Speed 2630.43 samples/sec   Loss 4.9165   LearningRate 0.0130   Epoch: 12   Global Step: 530930   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:03,859-Speed 2632.67 samples/sec   Loss 4.7669   LearningRate 0.0130   Epoch: 12   Global Step: 530940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:07,752-Speed 2630.79 samples/sec   Loss 4.8474   LearningRate 0.0130   Epoch: 12   Global Step: 530950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:11,646-Speed 2629.71 samples/sec   Loss 4.8479   LearningRate 0.0130   Epoch: 12   Global Step: 530960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:15,552-Speed 2622.47 samples/sec   Loss 4.9280   LearningRate 0.0130   Epoch: 12   Global Step: 530970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:19,450-Speed 2627.31 samples/sec   Loss 4.8586   LearningRate 0.0130   Epoch: 12   Global Step: 530980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:23,357-Speed 2621.48 samples/sec   Loss 4.9148   LearningRate 0.0130   Epoch: 12   Global Step: 530990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:24:27,250-Speed 2630.98 samples/sec   Loss 4.7992   LearningRate 0.0130   Epoch: 12   Global Step: 531000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:24:31,146-Speed 2629.01 samples/sec   Loss 4.8875   LearningRate 0.0130   Epoch: 12   Global Step: 531010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:24:35,039-Speed 2631.29 samples/sec   Loss 4.8146   LearningRate 0.0130   Epoch: 12   Global Step: 531020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:24:38,921-Speed 2638.17 samples/sec   Loss 4.8659   LearningRate 0.0130   Epoch: 12   Global Step: 531030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:42,825-Speed 2623.92 samples/sec   Loss 4.8659   LearningRate 0.0130   Epoch: 12   Global Step: 531040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:46,725-Speed 2626.11 samples/sec   Loss 4.8436   LearningRate 0.0129   Epoch: 12   Global Step: 531050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:50,635-Speed 2619.63 samples/sec   Loss 4.8312   LearningRate 0.0129   Epoch: 12   Global Step: 531060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:54,532-Speed 2628.15 samples/sec   Loss 4.9184   LearningRate 0.0129   Epoch: 12   Global Step: 531070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:24:58,427-Speed 2630.23 samples/sec   Loss 4.9508   LearningRate 0.0129   Epoch: 12   Global Step: 531080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:02,366-Speed 2600.05 samples/sec   Loss 4.8500   LearningRate 0.0129   Epoch: 12   Global Step: 531090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:06,271-Speed 2622.73 samples/sec   Loss 4.7687   LearningRate 0.0129   Epoch: 12   Global Step: 531100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:10,170-Speed 2627.14 samples/sec   Loss 4.9012   LearningRate 0.0129   Epoch: 12   Global Step: 531110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:14,063-Speed 2630.63 samples/sec   Loss 4.8913   LearningRate 0.0129   Epoch: 12   Global Step: 531120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:17,977-Speed 2617.38 samples/sec   Loss 4.9441   LearningRate 0.0129   Epoch: 12   Global Step: 531130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:25:21,845-Speed 2647.88 samples/sec   Loss 4.8211   LearningRate 0.0129   Epoch: 12   Global Step: 531140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:25,744-Speed 2626.55 samples/sec   Loss 4.7793   LearningRate 0.0129   Epoch: 12   Global Step: 531150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:29,639-Speed 2630.22 samples/sec   Loss 4.8651   LearningRate 0.0129   Epoch: 12   Global Step: 531160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:33,536-Speed 2627.61 samples/sec   Loss 4.8495   LearningRate 0.0129   Epoch: 12   Global Step: 531170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:37,449-Speed 2618.09 samples/sec   Loss 4.8441   LearningRate 0.0129   Epoch: 12   Global Step: 531180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:41,348-Speed 2626.47 samples/sec   Loss 4.7537   LearningRate 0.0129   Epoch: 12   Global Step: 531190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:45,243-Speed 2630.03 samples/sec   Loss 4.9098   LearningRate 0.0129   Epoch: 12   Global Step: 531200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:49,135-Speed 2631.03 samples/sec   Loss 4.7469   LearningRate 0.0129   Epoch: 12   Global Step: 531210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:53,029-Speed 2630.59 samples/sec   Loss 4.9243   LearningRate 0.0129   Epoch: 12   Global Step: 531220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:25:56,924-Speed 2630.44 samples/sec   Loss 4.8985   LearningRate 0.0129   Epoch: 12   Global Step: 531230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:00,796-Speed 2645.03 samples/sec   Loss 4.8954   LearningRate 0.0129   Epoch: 12   Global Step: 531240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:04,691-Speed 2629.37 samples/sec   Loss 4.9285   LearningRate 0.0129   Epoch: 12   Global Step: 531250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:08,588-Speed 2628.00 samples/sec   Loss 4.8786   LearningRate 0.0129   Epoch: 12   Global Step: 531260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:12,492-Speed 2623.91 samples/sec   Loss 4.9101   LearningRate 0.0129   Epoch: 12   Global Step: 531270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:16,394-Speed 2624.31 samples/sec   Loss 4.8520   LearningRate 0.0129   Epoch: 12   Global Step: 531280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:20,302-Speed 2621.15 samples/sec   Loss 4.8083   LearningRate 0.0129   Epoch: 12   Global Step: 531290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:24,307-Speed 2557.23 samples/sec   Loss 4.8521   LearningRate 0.0129   Epoch: 12   Global Step: 531300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:28,210-Speed 2624.61 samples/sec   Loss 4.8236   LearningRate 0.0129   Epoch: 12   Global Step: 531310   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:32,113-Speed 2624.71 samples/sec   Loss 4.8579   LearningRate 0.0129   Epoch: 12   Global Step: 531320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:36,009-Speed 2628.43 samples/sec   Loss 4.8624   LearningRate 0.0129   Epoch: 12   Global Step: 531330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:39,903-Speed 2630.67 samples/sec   Loss 4.8524   LearningRate 0.0129   Epoch: 12   Global Step: 531340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:26:43,795-Speed 2631.12 samples/sec   Loss 4.8641   LearningRate 0.0129   Epoch: 12   Global Step: 531350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:26:47,690-Speed 2630.14 samples/sec   Loss 4.8468   LearningRate 0.0129   Epoch: 12   Global Step: 531360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:26:51,564-Speed 2643.69 samples/sec   Loss 4.8286   LearningRate 0.0129   Epoch: 12   Global Step: 531370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:55,472-Speed 2620.58 samples/sec   Loss 4.8304   LearningRate 0.0129   Epoch: 12   Global Step: 531380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:26:59,411-Speed 2600.25 samples/sec   Loss 4.8333   LearningRate 0.0129   Epoch: 12   Global Step: 531390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:03,303-Speed 2631.88 samples/sec   Loss 4.7404   LearningRate 0.0129   Epoch: 12   Global Step: 531400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:07,205-Speed 2625.01 samples/sec   Loss 4.7776   LearningRate 0.0129   Epoch: 12   Global Step: 531410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:11,118-Speed 2617.72 samples/sec   Loss 4.8442   LearningRate 0.0129   Epoch: 12   Global Step: 531420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:15,017-Speed 2626.70 samples/sec   Loss 4.7795   LearningRate 0.0129   Epoch: 12   Global Step: 531430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:18,921-Speed 2623.12 samples/sec   Loss 4.7742   LearningRate 0.0129   Epoch: 12   Global Step: 531440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:22,856-Speed 2603.15 samples/sec   Loss 4.9008   LearningRate 0.0129   Epoch: 12   Global Step: 531450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:26,961-Speed 2494.94 samples/sec   Loss 4.9172   LearningRate 0.0129   Epoch: 12   Global Step: 531460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-15 07:27:31,061-Speed 2498.05 samples/sec   Loss 4.9154   LearningRate 0.0129   Epoch: 12   Global Step: 531470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:27:34,996-Speed 2603.00 samples/sec   Loss 4.7699   LearningRate 0.0129   Epoch: 12   Global Step: 531480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-15 07:27:38,905-Speed 2620.05 samples/sec   Loss 4.8534   LearningRate 0.0129   Epoch: 12   Global Step: 531490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:27:42,911-Speed 2557.12 samples/sec   Loss 4.7087   LearningRate 0.0129   Epoch: 12   Global Step: 531500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:27:46,848-Speed 2601.29 samples/sec   Loss 4.6922   LearningRate 0.0129   Epoch: 12   Global Step: 531510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:27:50,738-Speed 2633.15 samples/sec   Loss 4.8972   LearningRate 0.0129   Epoch: 12   Global Step: 531520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:27:54,632-Speed 2630.40 samples/sec   Loss 4.7096   LearningRate 0.0129   Epoch: 12   Global Step: 531530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:27:58,526-Speed 2630.45 samples/sec   Loss 4.8044   LearningRate 0.0129   Epoch: 12   Global Step: 531540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:02,437-Speed 2619.03 samples/sec   Loss 4.8003   LearningRate 0.0129   Epoch: 12   Global Step: 531550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:06,333-Speed 2628.27 samples/sec   Loss 4.9083   LearningRate 0.0129   Epoch: 12   Global Step: 531560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:10,211-Speed 2641.48 samples/sec   Loss 4.8128   LearningRate 0.0129   Epoch: 12   Global Step: 531570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:14,106-Speed 2629.36 samples/sec   Loss 4.8722   LearningRate 0.0129   Epoch: 12   Global Step: 531580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:18,015-Speed 2620.67 samples/sec   Loss 4.8651   LearningRate 0.0129   Epoch: 12   Global Step: 531590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:21,916-Speed 2626.18 samples/sec   Loss 4.8696   LearningRate 0.0129   Epoch: 12   Global Step: 531600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:25,819-Speed 2624.20 samples/sec   Loss 4.8187   LearningRate 0.0129   Epoch: 12   Global Step: 531610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:29,721-Speed 2624.93 samples/sec   Loss 4.8601   LearningRate 0.0129   Epoch: 12   Global Step: 531620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:33,619-Speed 2627.60 samples/sec   Loss 4.7843   LearningRate 0.0129   Epoch: 12   Global Step: 531630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:37,507-Speed 2634.27 samples/sec   Loss 4.8660   LearningRate 0.0129   Epoch: 12   Global Step: 531640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:41,412-Speed 2622.24 samples/sec   Loss 4.8481   LearningRate 0.0129   Epoch: 12   Global Step: 531650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:45,319-Speed 2622.23 samples/sec   Loss 4.8286   LearningRate 0.0129   Epoch: 12   Global Step: 531660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:49,195-Speed 2642.14 samples/sec   Loss 4.8500   LearningRate 0.0129   Epoch: 12   Global Step: 531670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:53,093-Speed 2627.78 samples/sec   Loss 4.8026   LearningRate 0.0129   Epoch: 12   Global Step: 531680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:28:56,987-Speed 2630.55 samples/sec   Loss 4.7306   LearningRate 0.0129   Epoch: 12   Global Step: 531690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:29:00,891-Speed 2623.35 samples/sec   Loss 4.8431   LearningRate 0.0129   Epoch: 12   Global Step: 531700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:29:04,774-Speed 2638.18 samples/sec   Loss 4.8751   LearningRate 0.0129   Epoch: 12   Global Step: 531710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:08,669-Speed 2628.98 samples/sec   Loss 4.8546   LearningRate 0.0129   Epoch: 12   Global Step: 531720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:12,569-Speed 2626.02 samples/sec   Loss 4.8089   LearningRate 0.0129   Epoch: 12   Global Step: 531730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:16,475-Speed 2622.96 samples/sec   Loss 4.9025   LearningRate 0.0129   Epoch: 12   Global Step: 531740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:20,375-Speed 2625.71 samples/sec   Loss 4.8346   LearningRate 0.0129   Epoch: 12   Global Step: 531750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:24,290-Speed 2616.83 samples/sec   Loss 4.7644   LearningRate 0.0129   Epoch: 12   Global Step: 531760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:28,187-Speed 2627.61 samples/sec   Loss 4.8131   LearningRate 0.0129   Epoch: 12   Global Step: 531770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:32,085-Speed 2628.06 samples/sec   Loss 4.7827   LearningRate 0.0129   Epoch: 12   Global Step: 531780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:35,985-Speed 2626.65 samples/sec   Loss 4.8806   LearningRate 0.0129   Epoch: 12   Global Step: 531790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:39,883-Speed 2627.79 samples/sec   Loss 4.9098   LearningRate 0.0129   Epoch: 12   Global Step: 531800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:43,776-Speed 2630.24 samples/sec   Loss 4.8268   LearningRate 0.0129   Epoch: 12   Global Step: 531810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:29:47,670-Speed 2630.61 samples/sec   Loss 4.7916   LearningRate 0.0129   Epoch: 12   Global Step: 531820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:29:51,565-Speed 2629.83 samples/sec   Loss 4.8965   LearningRate 0.0129   Epoch: 12   Global Step: 531830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:29:55,441-Speed 2642.67 samples/sec   Loss 4.7718   LearningRate 0.0129   Epoch: 12   Global Step: 531840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:29:59,335-Speed 2629.99 samples/sec   Loss 4.8087   LearningRate 0.0129   Epoch: 12   Global Step: 531850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:03,263-Speed 2607.81 samples/sec   Loss 4.7189   LearningRate 0.0129   Epoch: 12   Global Step: 531860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:07,157-Speed 2629.74 samples/sec   Loss 4.7964   LearningRate 0.0129   Epoch: 12   Global Step: 531870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:11,061-Speed 2623.86 samples/sec   Loss 4.8272   LearningRate 0.0129   Epoch: 12   Global Step: 531880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:14,967-Speed 2622.39 samples/sec   Loss 4.8275   LearningRate 0.0129   Epoch: 12   Global Step: 531890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:18,864-Speed 2628.14 samples/sec   Loss 4.7332   LearningRate 0.0129   Epoch: 12   Global Step: 531900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:22,756-Speed 2631.30 samples/sec   Loss 4.8159   LearningRate 0.0129   Epoch: 12   Global Step: 531910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:26,657-Speed 2625.65 samples/sec   Loss 4.8748   LearningRate 0.0129   Epoch: 12   Global Step: 531920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:30,559-Speed 2624.95 samples/sec   Loss 4.8155   LearningRate 0.0129   Epoch: 12   Global Step: 531930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:34,457-Speed 2627.48 samples/sec   Loss 4.7888   LearningRate 0.0129   Epoch: 12   Global Step: 531940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:30:38,357-Speed 2626.85 samples/sec   Loss 4.8549   LearningRate 0.0129   Epoch: 12   Global Step: 531950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:30:42,254-Speed 2628.11 samples/sec   Loss 4.5832   LearningRate 0.0129   Epoch: 12   Global Step: 531960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:30:46,131-Speed 2641.37 samples/sec   Loss 4.8474   LearningRate 0.0129   Epoch: 12   Global Step: 531970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:50,022-Speed 2632.49 samples/sec   Loss 4.9758   LearningRate 0.0129   Epoch: 12   Global Step: 531980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:53,917-Speed 2629.91 samples/sec   Loss 4.7991   LearningRate 0.0129   Epoch: 12   Global Step: 531990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:30:57,812-Speed 2629.44 samples/sec   Loss 4.8136   LearningRate 0.0129   Epoch: 12   Global Step: 532000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:01,703-Speed 2632.00 samples/sec   Loss 4.7908   LearningRate 0.0129   Epoch: 12   Global Step: 532010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:05,601-Speed 2627.90 samples/sec   Loss 4.9121   LearningRate 0.0129   Epoch: 12   Global Step: 532020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:09,494-Speed 2630.49 samples/sec   Loss 4.6855   LearningRate 0.0129   Epoch: 12   Global Step: 532030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:13,395-Speed 2625.42 samples/sec   Loss 4.7368   LearningRate 0.0129   Epoch: 12   Global Step: 532040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:17,295-Speed 2626.40 samples/sec   Loss 4.8584   LearningRate 0.0129   Epoch: 12   Global Step: 532050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:21,187-Speed 2631.89 samples/sec   Loss 4.8436   LearningRate 0.0129   Epoch: 12   Global Step: 532060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:25,062-Speed 2643.31 samples/sec   Loss 4.8385   LearningRate 0.0129   Epoch: 12   Global Step: 532070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:28,954-Speed 2631.92 samples/sec   Loss 4.8552   LearningRate 0.0129   Epoch: 12   Global Step: 532080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:32,848-Speed 2630.51 samples/sec   Loss 4.7915   LearningRate 0.0129   Epoch: 12   Global Step: 532090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:36,747-Speed 2626.82 samples/sec   Loss 4.8018   LearningRate 0.0129   Epoch: 12   Global Step: 532100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:40,645-Speed 2627.06 samples/sec   Loss 4.7228   LearningRate 0.0129   Epoch: 12   Global Step: 532110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:44,544-Speed 2627.09 samples/sec   Loss 4.8127   LearningRate 0.0129   Epoch: 12   Global Step: 532120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:48,457-Speed 2617.38 samples/sec   Loss 4.8395   LearningRate 0.0129   Epoch: 12   Global Step: 532130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:52,354-Speed 2628.45 samples/sec   Loss 4.8813   LearningRate 0.0129   Epoch: 12   Global Step: 532140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:31:56,251-Speed 2628.04 samples/sec   Loss 4.9200   LearningRate 0.0129   Epoch: 12   Global Step: 532150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:32:00,146-Speed 2629.78 samples/sec   Loss 4.8608   LearningRate 0.0129   Epoch: 12   Global Step: 532160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:32:04,046-Speed 2625.98 samples/sec   Loss 4.8044   LearningRate 0.0129   Epoch: 12   Global Step: 532170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:32:07,948-Speed 2625.97 samples/sec   Loss 4.9153   LearningRate 0.0129   Epoch: 12   Global Step: 532180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:32:11,864-Speed 2615.16 samples/sec   Loss 4.7113   LearningRate 0.0129   Epoch: 12   Global Step: 532190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:32:15,743-Speed 2640.22 samples/sec   Loss 4.8435   LearningRate 0.0129   Epoch: 12   Global Step: 532200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:32:19,650-Speed 2621.44 samples/sec   Loss 4.7992   LearningRate 0.0128   Epoch: 12   Global Step: 532210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:32:23,542-Speed 2631.72 samples/sec   Loss 4.8328   LearningRate 0.0128   Epoch: 12   Global Step: 532220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:32:27,413-Speed 2646.26 samples/sec   Loss 4.7926   LearningRate 0.0128   Epoch: 12   Global Step: 532230   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:31,308-Speed 2629.34 samples/sec   Loss 4.9539   LearningRate 0.0128   Epoch: 12   Global Step: 532240   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:35,203-Speed 2629.38 samples/sec   Loss 4.8471   LearningRate 0.0128   Epoch: 12   Global Step: 532250   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:39,100-Speed 2628.39 samples/sec   Loss 4.8312   LearningRate 0.0128   Epoch: 12   Global Step: 532260   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:42,994-Speed 2630.52 samples/sec   Loss 4.8378   LearningRate 0.0128   Epoch: 12   Global Step: 532270   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:46,906-Speed 2618.55 samples/sec   Loss 4.8293   LearningRate 0.0128   Epoch: 12   Global Step: 532280   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:50,803-Speed 2628.02 samples/sec   Loss 4.8176   LearningRate 0.0128   Epoch: 12   Global Step: 532290   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:54,700-Speed 2628.13 samples/sec   Loss 4.8020   LearningRate 0.0128   Epoch: 12   Global Step: 532300   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:32:58,595-Speed 2629.58 samples/sec   Loss 4.9197   LearningRate 0.0128   Epoch: 12   Global Step: 532310   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:33:02,583-Speed 2568.44 samples/sec   Loss 4.8747   LearningRate 0.0128   Epoch: 12   Global Step: 532320   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:33:06,485-Speed 2625.16 samples/sec   Loss 4.6863   LearningRate 0.0128   Epoch: 12   Global Step: 532330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:10,558-Speed 2514.32 samples/sec   Loss 4.8647   LearningRate 0.0128   Epoch: 12   Global Step: 532340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:14,545-Speed 2568.87 samples/sec   Loss 4.7885   LearningRate 0.0128   Epoch: 12   Global Step: 532350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:18,456-Speed 2618.82 samples/sec   Loss 4.8398   LearningRate 0.0128   Epoch: 12   Global Step: 532360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:22,349-Speed 2631.58 samples/sec   Loss 4.7748   LearningRate 0.0128   Epoch: 12   Global Step: 532370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:26,241-Speed 2631.82 samples/sec   Loss 4.8857   LearningRate 0.0128   Epoch: 12   Global Step: 532380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:30,136-Speed 2629.34 samples/sec   Loss 4.8600   LearningRate 0.0128   Epoch: 12   Global Step: 532390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:34,029-Speed 2630.79 samples/sec   Loss 4.8884   LearningRate 0.0128   Epoch: 12   Global Step: 532400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:37,927-Speed 2627.09 samples/sec   Loss 4.8619   LearningRate 0.0128   Epoch: 12   Global Step: 532410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:41,999-Speed 2515.67 samples/sec   Loss 4.7207   LearningRate 0.0128   Epoch: 12   Global Step: 532420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:46,095-Speed 2500.57 samples/sec   Loss 4.7862   LearningRate 0.0128   Epoch: 12   Global Step: 532430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:33:50,190-Speed 2501.97 samples/sec   Loss 4.8944   LearningRate 0.0128   Epoch: 12   Global Step: 532440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:33:54,248-Speed 2523.86 samples/sec   Loss 4.8498   LearningRate 0.0128   Epoch: 12   Global Step: 532450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:33:58,300-Speed 2527.84 samples/sec   Loss 4.8800   LearningRate 0.0128   Epoch: 12   Global Step: 532460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:02,203-Speed 2624.04 samples/sec   Loss 4.9001   LearningRate 0.0128   Epoch: 12   Global Step: 532470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:06,098-Speed 2630.13 samples/sec   Loss 4.8245   LearningRate 0.0128   Epoch: 12   Global Step: 532480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:10,011-Speed 2617.29 samples/sec   Loss 4.6908   LearningRate 0.0128   Epoch: 12   Global Step: 532490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:13,918-Speed 2621.67 samples/sec   Loss 4.8012   LearningRate 0.0128   Epoch: 12   Global Step: 532500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:17,814-Speed 2628.76 samples/sec   Loss 4.7679   LearningRate 0.0128   Epoch: 12   Global Step: 532510   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:21,712-Speed 2627.39 samples/sec   Loss 4.8061   LearningRate 0.0128   Epoch: 12   Global Step: 532520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:25,606-Speed 2630.79 samples/sec   Loss 4.8569   LearningRate 0.0128   Epoch: 12   Global Step: 532530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:29,500-Speed 2630.17 samples/sec   Loss 4.7567   LearningRate 0.0128   Epoch: 12   Global Step: 532540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:33,405-Speed 2622.55 samples/sec   Loss 4.8514   LearningRate 0.0128   Epoch: 12   Global Step: 532550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:34:37,308-Speed 2623.91 samples/sec   Loss 4.7184   LearningRate 0.0128   Epoch: 12   Global Step: 532560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:34:41,201-Speed 2631.82 samples/sec   Loss 4.6982   LearningRate 0.0128   Epoch: 12   Global Step: 532570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:34:45,102-Speed 2625.66 samples/sec   Loss 4.7501   LearningRate 0.0128   Epoch: 12   Global Step: 532580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:34:48,994-Speed 2631.63 samples/sec   Loss 4.7903   LearningRate 0.0128   Epoch: 12   Global Step: 532590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:34:52,864-Speed 2646.19 samples/sec   Loss 4.8948   LearningRate 0.0128   Epoch: 12   Global Step: 532600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:34:56,740-Speed 2643.34 samples/sec   Loss 4.7843   LearningRate 0.0128   Epoch: 12   Global Step: 532610   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:00,641-Speed 2625.71 samples/sec   Loss 4.8146   LearningRate 0.0128   Epoch: 12   Global Step: 532620   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:04,537-Speed 2628.37 samples/sec   Loss 4.7964   LearningRate 0.0128   Epoch: 12   Global Step: 532630   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:08,431-Speed 2630.14 samples/sec   Loss 4.8830   LearningRate 0.0128   Epoch: 12   Global Step: 532640   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:12,335-Speed 2624.05 samples/sec   Loss 4.8356   LearningRate 0.0128   Epoch: 12   Global Step: 532650   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:16,232-Speed 2628.30 samples/sec   Loss 4.8672   LearningRate 0.0128   Epoch: 12   Global Step: 532660   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:20,125-Speed 2631.11 samples/sec   Loss 4.8094   LearningRate 0.0128   Epoch: 12   Global Step: 532670   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:24,022-Speed 2628.67 samples/sec   Loss 4.7460   LearningRate 0.0128   Epoch: 12   Global Step: 532680   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:27,916-Speed 2630.91 samples/sec   Loss 4.8002   LearningRate 0.0128   Epoch: 12   Global Step: 532690   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:31,811-Speed 2629.14 samples/sec   Loss 4.6564   LearningRate 0.0128   Epoch: 12   Global Step: 532700   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:35:35,703-Speed 2631.47 samples/sec   Loss 4.8590   LearningRate 0.0128   Epoch: 12   Global Step: 532710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:35:39,597-Speed 2630.58 samples/sec   Loss 4.7138   LearningRate 0.0128   Epoch: 12   Global Step: 532720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:35:43,489-Speed 2631.25 samples/sec   Loss 4.8759   LearningRate 0.0128   Epoch: 12   Global Step: 532730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:35:47,388-Speed 2627.46 samples/sec   Loss 4.8992   LearningRate 0.0128   Epoch: 12   Global Step: 532740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:35:51,282-Speed 2629.81 samples/sec   Loss 4.7975   LearningRate 0.0128   Epoch: 12   Global Step: 532750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:35:55,178-Speed 2629.39 samples/sec   Loss 4.9103   LearningRate 0.0128   Epoch: 12   Global Step: 532760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:35:59,070-Speed 2631.66 samples/sec   Loss 4.8518   LearningRate 0.0128   Epoch: 12   Global Step: 532770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:02,963-Speed 2630.85 samples/sec   Loss 4.7182   LearningRate 0.0128   Epoch: 12   Global Step: 532780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:06,869-Speed 2622.19 samples/sec   Loss 4.8904   LearningRate 0.0128   Epoch: 12   Global Step: 532790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:10,765-Speed 2628.69 samples/sec   Loss 4.7897   LearningRate 0.0128   Epoch: 12   Global Step: 532800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:14,660-Speed 2629.75 samples/sec   Loss 4.8502   LearningRate 0.0128   Epoch: 12   Global Step: 532810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:36:18,556-Speed 2629.37 samples/sec   Loss 4.8161   LearningRate 0.0128   Epoch: 12   Global Step: 532820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:36:22,457-Speed 2625.32 samples/sec   Loss 4.7541   LearningRate 0.0128   Epoch: 12   Global Step: 532830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:36:26,328-Speed 2646.05 samples/sec   Loss 4.7644   LearningRate 0.0128   Epoch: 12   Global Step: 532840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:30,241-Speed 2617.68 samples/sec   Loss 4.6742   LearningRate 0.0128   Epoch: 12   Global Step: 532850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:34,136-Speed 2629.17 samples/sec   Loss 4.7891   LearningRate 0.0128   Epoch: 12   Global Step: 532860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:38,093-Speed 2589.01 samples/sec   Loss 4.7949   LearningRate 0.0128   Epoch: 12   Global Step: 532870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:41,987-Speed 2630.14 samples/sec   Loss 4.8510   LearningRate 0.0128   Epoch: 12   Global Step: 532880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:45,877-Speed 2632.94 samples/sec   Loss 4.8245   LearningRate 0.0128   Epoch: 12   Global Step: 532890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:49,776-Speed 2627.08 samples/sec   Loss 4.8039   LearningRate 0.0128   Epoch: 12   Global Step: 532900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:53,676-Speed 2626.11 samples/sec   Loss 4.7719   LearningRate 0.0128   Epoch: 12   Global Step: 532910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:36:57,573-Speed 2628.31 samples/sec   Loss 4.7792   LearningRate 0.0128   Epoch: 12   Global Step: 532920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:37:01,453-Speed 2639.60 samples/sec   Loss 4.8408   LearningRate 0.0128   Epoch: 12   Global Step: 532930   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:05,343-Speed 2633.17 samples/sec   Loss 4.8335   LearningRate 0.0128   Epoch: 12   Global Step: 532940   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:09,243-Speed 2625.56 samples/sec   Loss 4.7833   LearningRate 0.0128   Epoch: 12   Global Step: 532950   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:13,144-Speed 2625.63 samples/sec   Loss 4.9153   LearningRate 0.0128   Epoch: 12   Global Step: 532960   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:17,048-Speed 2623.99 samples/sec   Loss 4.7171   LearningRate 0.0128   Epoch: 12   Global Step: 532970   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:20,951-Speed 2624.53 samples/sec   Loss 4.7073   LearningRate 0.0128   Epoch: 12   Global Step: 532980   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:24,858-Speed 2621.50 samples/sec   Loss 4.7699   LearningRate 0.0128   Epoch: 12   Global Step: 532990   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:28,757-Speed 2626.72 samples/sec   Loss 4.8658   LearningRate 0.0128   Epoch: 12   Global Step: 533000   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:32,661-Speed 2623.72 samples/sec   Loss 4.8270   LearningRate 0.0128   Epoch: 12   Global Step: 533010   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:36,560-Speed 2626.87 samples/sec   Loss 4.9293   LearningRate 0.0128   Epoch: 12   Global Step: 533020   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:37:40,466-Speed 2622.11 samples/sec   Loss 4.8120   LearningRate 0.0128   Epoch: 12   Global Step: 533030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:37:44,367-Speed 2625.41 samples/sec   Loss 4.8410   LearningRate 0.0128   Epoch: 12   Global Step: 533040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:37:48,264-Speed 2627.99 samples/sec   Loss 4.7764   LearningRate 0.0128   Epoch: 12   Global Step: 533050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:37:52,163-Speed 2627.42 samples/sec   Loss 4.7852   LearningRate 0.0128   Epoch: 12   Global Step: 533060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:37:56,059-Speed 2628.72 samples/sec   Loss 4.7564   LearningRate 0.0128   Epoch: 12   Global Step: 533070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:37:59,962-Speed 2624.11 samples/sec   Loss 4.7774   LearningRate 0.0128   Epoch: 12   Global Step: 533080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:03,874-Speed 2618.67 samples/sec   Loss 4.8348   LearningRate 0.0128   Epoch: 12   Global Step: 533090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:07,771-Speed 2628.21 samples/sec   Loss 4.8770   LearningRate 0.0128   Epoch: 12   Global Step: 533100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:11,667-Speed 2628.77 samples/sec   Loss 4.8305   LearningRate 0.0128   Epoch: 12   Global Step: 533110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:15,563-Speed 2628.68 samples/sec   Loss 4.8561   LearningRate 0.0128   Epoch: 12   Global Step: 533120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:19,463-Speed 2626.38 samples/sec   Loss 4.8196   LearningRate 0.0128   Epoch: 12   Global Step: 533130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:38:23,365-Speed 2624.92 samples/sec   Loss 4.7475   LearningRate 0.0128   Epoch: 12   Global Step: 533140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:38:27,251-Speed 2635.92 samples/sec   Loss 4.7822   LearningRate 0.0128   Epoch: 12   Global Step: 533150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:31,170-Speed 2613.35 samples/sec   Loss 4.8102   LearningRate 0.0128   Epoch: 12   Global Step: 533160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:35,074-Speed 2623.62 samples/sec   Loss 4.8156   LearningRate 0.0128   Epoch: 12   Global Step: 533170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:38,974-Speed 2625.89 samples/sec   Loss 4.8395   LearningRate 0.0128   Epoch: 12   Global Step: 533180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:42,868-Speed 2630.65 samples/sec   Loss 4.6814   LearningRate 0.0128   Epoch: 12   Global Step: 533190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:38:46,740-Speed 2644.98 samples/sec   Loss 4.6900   LearningRate 0.0128   Epoch: 12   Global Step: 533200   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:38:50,730-Speed 2567.80 samples/sec   Loss 4.8153   LearningRate 0.0128   Epoch: 12   Global Step: 533210   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:38:54,642-Speed 2617.68 samples/sec   Loss 4.7610   LearningRate 0.0128   Epoch: 12   Global Step: 533220   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:38:58,550-Speed 2621.05 samples/sec   Loss 4.7749   LearningRate 0.0128   Epoch: 12   Global Step: 533230   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:02,444-Speed 2630.01 samples/sec   Loss 4.8468   LearningRate 0.0128   Epoch: 12   Global Step: 533240   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:06,337-Speed 2631.73 samples/sec   Loss 4.8215   LearningRate 0.0128   Epoch: 12   Global Step: 533250   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:10,230-Speed 2630.89 samples/sec   Loss 4.8719   LearningRate 0.0128   Epoch: 12   Global Step: 533260   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:14,130-Speed 2626.27 samples/sec   Loss 4.8132   LearningRate 0.0128   Epoch: 12   Global Step: 533270   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:18,027-Speed 2628.62 samples/sec   Loss 4.7653   LearningRate 0.0128   Epoch: 12   Global Step: 533280   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:21,924-Speed 2627.65 samples/sec   Loss 4.7679   LearningRate 0.0128   Epoch: 12   Global Step: 533290   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:25,821-Speed 2628.69 samples/sec   Loss 4.7627   LearningRate 0.0128   Epoch: 12   Global Step: 533300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:39:29,723-Speed 2624.50 samples/sec   Loss 4.8024   LearningRate 0.0128   Epoch: 12   Global Step: 533310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:39:33,617-Speed 2630.54 samples/sec   Loss 4.8420   LearningRate 0.0128   Epoch: 12   Global Step: 533320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:39:37,514-Speed 2628.29 samples/sec   Loss 4.8627   LearningRate 0.0128   Epoch: 12   Global Step: 533330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:39:41,387-Speed 2644.18 samples/sec   Loss 4.8188   LearningRate 0.0128   Epoch: 12   Global Step: 533340   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:45,283-Speed 2629.08 samples/sec   Loss 4.7840   LearningRate 0.0128   Epoch: 12   Global Step: 533350   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:49,172-Speed 2634.46 samples/sec   Loss 4.7835   LearningRate 0.0128   Epoch: 12   Global Step: 533360   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:53,064-Speed 2631.15 samples/sec   Loss 4.8688   LearningRate 0.0127   Epoch: 12   Global Step: 533370   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:39:56,974-Speed 2619.69 samples/sec   Loss 4.7467   LearningRate 0.0127   Epoch: 12   Global Step: 533380   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:40:00,870-Speed 2629.20 samples/sec   Loss 4.8033   LearningRate 0.0127   Epoch: 12   Global Step: 533390   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:40:04,766-Speed 2628.58 samples/sec   Loss 4.7824   LearningRate 0.0127   Epoch: 12   Global Step: 533400   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:40:08,670-Speed 2623.15 samples/sec   Loss 4.7674   LearningRate 0.0127   Epoch: 12   Global Step: 533410   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:40:12,575-Speed 2624.19 samples/sec   Loss 4.8480   LearningRate 0.0127   Epoch: 12   Global Step: 533420   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:40:16,473-Speed 2627.11 samples/sec   Loss 4.7978   LearningRate 0.0127   Epoch: 12   Global Step: 533430   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:40:20,376-Speed 2624.79 samples/sec   Loss 4.7675   LearningRate 0.0127   Epoch: 12   Global Step: 533440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:24,290-Speed 2616.64 samples/sec   Loss 4.7363   LearningRate 0.0127   Epoch: 12   Global Step: 533450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:28,259-Speed 2581.15 samples/sec   Loss 4.7346   LearningRate 0.0127   Epoch: 12   Global Step: 533460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:32,158-Speed 2626.63 samples/sec   Loss 4.7947   LearningRate 0.0127   Epoch: 12   Global Step: 533470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:36,062-Speed 2623.56 samples/sec   Loss 4.8036   LearningRate 0.0127   Epoch: 12   Global Step: 533480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:39,959-Speed 2628.20 samples/sec   Loss 4.8713   LearningRate 0.0127   Epoch: 12   Global Step: 533490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:43,866-Speed 2621.30 samples/sec   Loss 4.7489   LearningRate 0.0127   Epoch: 12   Global Step: 533500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:47,761-Speed 2629.69 samples/sec   Loss 4.8045   LearningRate 0.0127   Epoch: 12   Global Step: 533510   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:51,655-Speed 2630.32 samples/sec   Loss 4.9303   LearningRate 0.0127   Epoch: 12   Global Step: 533520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:55,551-Speed 2628.98 samples/sec   Loss 4.7776   LearningRate 0.0127   Epoch: 12   Global Step: 533530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:40:59,454-Speed 2623.97 samples/sec   Loss 4.9811   LearningRate 0.0127   Epoch: 12   Global Step: 533540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:03,350-Speed 2629.47 samples/sec   Loss 4.6740   LearningRate 0.0127   Epoch: 12   Global Step: 533550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:07,246-Speed 2628.61 samples/sec   Loss 4.7773   LearningRate 0.0127   Epoch: 12   Global Step: 533560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:11,144-Speed 2628.32 samples/sec   Loss 4.7669   LearningRate 0.0127   Epoch: 12   Global Step: 533570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:15,047-Speed 2623.97 samples/sec   Loss 4.8035   LearningRate 0.0127   Epoch: 12   Global Step: 533580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:18,938-Speed 2632.14 samples/sec   Loss 4.7691   LearningRate 0.0127   Epoch: 12   Global Step: 533590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:22,835-Speed 2628.49 samples/sec   Loss 4.7712   LearningRate 0.0127   Epoch: 12   Global Step: 533600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:26,731-Speed 2628.69 samples/sec   Loss 4.6771   LearningRate 0.0127   Epoch: 12   Global Step: 533610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:30,625-Speed 2630.40 samples/sec   Loss 4.7867   LearningRate 0.0127   Epoch: 12   Global Step: 533620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:34,522-Speed 2627.84 samples/sec   Loss 4.8602   LearningRate 0.0127   Epoch: 12   Global Step: 533630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:38,402-Speed 2640.26 samples/sec   Loss 4.8584   LearningRate 0.0127   Epoch: 12   Global Step: 533640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:42,300-Speed 2627.99 samples/sec   Loss 4.8251   LearningRate 0.0127   Epoch: 12   Global Step: 533650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:46,197-Speed 2627.90 samples/sec   Loss 4.7993   LearningRate 0.0127   Epoch: 12   Global Step: 533660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:50,102-Speed 2622.72 samples/sec   Loss 4.7809   LearningRate 0.0127   Epoch: 12   Global Step: 533670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:41:53,974-Speed 2645.13 samples/sec   Loss 4.7390   LearningRate 0.0127   Epoch: 12   Global Step: 533680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:41:57,866-Speed 2631.81 samples/sec   Loss 4.8032   LearningRate 0.0127   Epoch: 12   Global Step: 533690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:01,772-Speed 2622.53 samples/sec   Loss 4.8602   LearningRate 0.0127   Epoch: 12   Global Step: 533700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:05,666-Speed 2629.59 samples/sec   Loss 4.8449   LearningRate 0.0127   Epoch: 12   Global Step: 533710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:09,562-Speed 2629.23 samples/sec   Loss 4.8055   LearningRate 0.0127   Epoch: 12   Global Step: 533720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:13,455-Speed 2630.84 samples/sec   Loss 4.7893   LearningRate 0.0127   Epoch: 12   Global Step: 533730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:17,360-Speed 2623.17 samples/sec   Loss 4.7218   LearningRate 0.0127   Epoch: 12   Global Step: 533740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:21,251-Speed 2631.94 samples/sec   Loss 4.8145   LearningRate 0.0127   Epoch: 12   Global Step: 533750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:25,142-Speed 2632.99 samples/sec   Loss 4.7490   LearningRate 0.0127   Epoch: 12   Global Step: 533760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:29,043-Speed 2625.00 samples/sec   Loss 4.7429   LearningRate 0.0127   Epoch: 12   Global Step: 533770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:32,942-Speed 2627.23 samples/sec   Loss 4.7105   LearningRate 0.0127   Epoch: 12   Global Step: 533780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:42:36,867-Speed 2609.55 samples/sec   Loss 4.8220   LearningRate 0.0127   Epoch: 12   Global Step: 533790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:42:40,745-Speed 2641.08 samples/sec   Loss 4.8148   LearningRate 0.0127   Epoch: 12   Global Step: 533800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:44,641-Speed 2628.87 samples/sec   Loss 4.7248   LearningRate 0.0127   Epoch: 12   Global Step: 533810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:42:48,522-Speed 2640.91 samples/sec   Loss 4.7734   LearningRate 0.0127   Epoch: 12   Global Step: 533820   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:42:52,417-Speed 2629.21 samples/sec   Loss 4.7705   LearningRate 0.0127   Epoch: 12   Global Step: 533830   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:42:56,354-Speed 2602.21 samples/sec   Loss 4.7062   LearningRate 0.0127   Epoch: 12   Global Step: 533840   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:00,343-Speed 2567.11 samples/sec   Loss 4.7571   LearningRate 0.0127   Epoch: 12   Global Step: 533850   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:04,245-Speed 2625.37 samples/sec   Loss 4.8439   LearningRate 0.0127   Epoch: 12   Global Step: 533860   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:08,140-Speed 2629.47 samples/sec   Loss 4.7726   LearningRate 0.0127   Epoch: 12   Global Step: 533870   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:12,042-Speed 2624.64 samples/sec   Loss 4.7554   LearningRate 0.0127   Epoch: 12   Global Step: 533880   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:15,944-Speed 2625.04 samples/sec   Loss 4.8146   LearningRate 0.0127   Epoch: 12   Global Step: 533890   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:19,841-Speed 2628.11 samples/sec   Loss 4.7835   LearningRate 0.0127   Epoch: 12   Global Step: 533900   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:23,745-Speed 2624.07 samples/sec   Loss 4.7769   LearningRate 0.0127   Epoch: 12   Global Step: 533910   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:43:27,643-Speed 2627.53 samples/sec   Loss 4.8546   LearningRate 0.0127   Epoch: 12   Global Step: 533920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:31,531-Speed 2636.23 samples/sec   Loss 4.7779   LearningRate 0.0127   Epoch: 12   Global Step: 533930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:35,423-Speed 2631.65 samples/sec   Loss 4.7363   LearningRate 0.0127   Epoch: 12   Global Step: 533940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:39,318-Speed 2629.73 samples/sec   Loss 4.7721   LearningRate 0.0127   Epoch: 12   Global Step: 533950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:43,212-Speed 2629.58 samples/sec   Loss 4.7928   LearningRate 0.0127   Epoch: 12   Global Step: 533960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:47,107-Speed 2630.01 samples/sec   Loss 4.8177   LearningRate 0.0127   Epoch: 12   Global Step: 533970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:51,003-Speed 2628.50 samples/sec   Loss 4.8247   LearningRate 0.0127   Epoch: 12   Global Step: 533980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:54,899-Speed 2629.21 samples/sec   Loss 4.7940   LearningRate 0.0127   Epoch: 12   Global Step: 533990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:43:58,826-Speed 2607.98 samples/sec   Loss 4.7014   LearningRate 0.0127   Epoch: 12   Global Step: 534000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:02,728-Speed 2625.40 samples/sec   Loss 4.7423   LearningRate 0.0127   Epoch: 12   Global Step: 534010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:06,654-Speed 2608.93 samples/sec   Loss 4.8812   LearningRate 0.0127   Epoch: 12   Global Step: 534020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:44:10,546-Speed 2631.43 samples/sec   Loss 4.7155   LearningRate 0.0127   Epoch: 12   Global Step: 534030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:44:14,442-Speed 2629.23 samples/sec   Loss 4.7421   LearningRate 0.0127   Epoch: 12   Global Step: 534040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:44:18,340-Speed 2627.48 samples/sec   Loss 4.7599   LearningRate 0.0127   Epoch: 12   Global Step: 534050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:44:22,210-Speed 2646.63 samples/sec   Loss 4.7812   LearningRate 0.0127   Epoch: 12   Global Step: 534060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:26,103-Speed 2630.83 samples/sec   Loss 4.8598   LearningRate 0.0127   Epoch: 12   Global Step: 534070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:29,997-Speed 2630.32 samples/sec   Loss 4.9078   LearningRate 0.0127   Epoch: 12   Global Step: 534080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:33,910-Speed 2617.79 samples/sec   Loss 4.8779   LearningRate 0.0127   Epoch: 12   Global Step: 534090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:37,806-Speed 2628.53 samples/sec   Loss 4.8079   LearningRate 0.0127   Epoch: 12   Global Step: 534100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:41,703-Speed 2628.32 samples/sec   Loss 4.7754   LearningRate 0.0127   Epoch: 12   Global Step: 534110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:45,604-Speed 2625.93 samples/sec   Loss 4.8152   LearningRate 0.0127   Epoch: 12   Global Step: 534120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:49,503-Speed 2627.17 samples/sec   Loss 4.8005   LearningRate 0.0127   Epoch: 12   Global Step: 534130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:53,400-Speed 2629.13 samples/sec   Loss 4.7009   LearningRate 0.0127   Epoch: 12   Global Step: 534140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:44:57,297-Speed 2627.85 samples/sec   Loss 4.9027   LearningRate 0.0127   Epoch: 12   Global Step: 534150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:01,191-Speed 2629.86 samples/sec   Loss 4.8078   LearningRate 0.0127   Epoch: 12   Global Step: 534160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:45:05,093-Speed 2625.09 samples/sec   Loss 4.8120   LearningRate 0.0127   Epoch: 12   Global Step: 534170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:45:08,994-Speed 2625.29 samples/sec   Loss 4.7356   LearningRate 0.0127   Epoch: 12   Global Step: 534180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:45:12,873-Speed 2640.29 samples/sec   Loss 4.8267   LearningRate 0.0127   Epoch: 12   Global Step: 534190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:16,767-Speed 2630.34 samples/sec   Loss 4.8491   LearningRate 0.0127   Epoch: 12   Global Step: 534200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:20,661-Speed 2630.91 samples/sec   Loss 4.7782   LearningRate 0.0127   Epoch: 12   Global Step: 534210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:24,554-Speed 2630.69 samples/sec   Loss 4.7787   LearningRate 0.0127   Epoch: 12   Global Step: 534220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:28,447-Speed 2631.26 samples/sec   Loss 4.7700   LearningRate 0.0127   Epoch: 12   Global Step: 534230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:32,347-Speed 2626.11 samples/sec   Loss 4.8553   LearningRate 0.0127   Epoch: 12   Global Step: 534240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:36,241-Speed 2630.41 samples/sec   Loss 4.7309   LearningRate 0.0127   Epoch: 12   Global Step: 534250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:40,132-Speed 2631.73 samples/sec   Loss 4.7748   LearningRate 0.0127   Epoch: 12   Global Step: 534260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:44,031-Speed 2627.41 samples/sec   Loss 4.8456   LearningRate 0.0127   Epoch: 12   Global Step: 534270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:47,932-Speed 2625.78 samples/sec   Loss 4.8230   LearningRate 0.0127   Epoch: 12   Global Step: 534280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:51,829-Speed 2628.22 samples/sec   Loss 4.7645   LearningRate 0.0127   Epoch: 12   Global Step: 534290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:45:55,708-Speed 2639.95 samples/sec   Loss 4.8025   LearningRate 0.0127   Epoch: 12   Global Step: 534300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:45:59,608-Speed 2627.91 samples/sec   Loss 4.8119   LearningRate 0.0127   Epoch: 12   Global Step: 534310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:03,508-Speed 2625.90 samples/sec   Loss 4.7713   LearningRate 0.0127   Epoch: 12   Global Step: 534320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:07,569-Speed 2521.97 samples/sec   Loss 4.7494   LearningRate 0.0127   Epoch: 12   Global Step: 534330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:11,568-Speed 2561.36 samples/sec   Loss 4.8298   LearningRate 0.0127   Epoch: 12   Global Step: 534340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:15,470-Speed 2624.88 samples/sec   Loss 4.7478   LearningRate 0.0127   Epoch: 12   Global Step: 534350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:19,373-Speed 2624.57 samples/sec   Loss 4.7612   LearningRate 0.0127   Epoch: 12   Global Step: 534360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:23,274-Speed 2625.13 samples/sec   Loss 4.6874   LearningRate 0.0127   Epoch: 12   Global Step: 534370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:27,175-Speed 2626.01 samples/sec   Loss 4.7505   LearningRate 0.0127   Epoch: 12   Global Step: 534380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:31,084-Speed 2620.02 samples/sec   Loss 4.8301   LearningRate 0.0127   Epoch: 12   Global Step: 534390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:34,987-Speed 2624.03 samples/sec   Loss 4.8997   LearningRate 0.0127   Epoch: 12   Global Step: 534400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:46:38,886-Speed 2626.98 samples/sec   Loss 4.7998   LearningRate 0.0127   Epoch: 12   Global Step: 534410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:46:42,765-Speed 2640.42 samples/sec   Loss 4.7524   LearningRate 0.0127   Epoch: 12   Global Step: 534420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:46,676-Speed 2619.12 samples/sec   Loss 4.7693   LearningRate 0.0127   Epoch: 12   Global Step: 534430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:50,573-Speed 2628.11 samples/sec   Loss 4.8638   LearningRate 0.0127   Epoch: 12   Global Step: 534440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:54,467-Speed 2630.35 samples/sec   Loss 4.7317   LearningRate 0.0127   Epoch: 12   Global Step: 534450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:46:58,367-Speed 2626.55 samples/sec   Loss 4.7573   LearningRate 0.0127   Epoch: 12   Global Step: 534460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:47:02,270-Speed 2623.77 samples/sec   Loss 4.8145   LearningRate 0.0127   Epoch: 12   Global Step: 534470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:47:06,181-Speed 2618.86 samples/sec   Loss 4.8427   LearningRate 0.0127   Epoch: 12   Global Step: 534480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:47:10,067-Speed 2635.85 samples/sec   Loss 4.7934   LearningRate 0.0127   Epoch: 12   Global Step: 534490   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:13,976-Speed 2619.76 samples/sec   Loss 4.8744   LearningRate 0.0127   Epoch: 12   Global Step: 534500   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:17,900-Speed 2610.65 samples/sec   Loss 4.8212   LearningRate 0.0127   Epoch: 12   Global Step: 534510   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:21,797-Speed 2628.70 samples/sec   Loss 4.8478   LearningRate 0.0127   Epoch: 12   Global Step: 534520   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:25,698-Speed 2625.29 samples/sec   Loss 4.7322   LearningRate 0.0126   Epoch: 12   Global Step: 534530   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:29,666-Speed 2581.73 samples/sec   Loss 4.7672   LearningRate 0.0126   Epoch: 12   Global Step: 534540   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:33,572-Speed 2621.66 samples/sec   Loss 4.7412   LearningRate 0.0126   Epoch: 12   Global Step: 534550   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:37,479-Speed 2621.14 samples/sec   Loss 4.8157   LearningRate 0.0126   Epoch: 12   Global Step: 534560   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:41,399-Speed 2613.34 samples/sec   Loss 4.6703   LearningRate 0.0126   Epoch: 12   Global Step: 534570   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:45,334-Speed 2602.86 samples/sec   Loss 4.7627   LearningRate 0.0126   Epoch: 12   Global Step: 534580   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:47:49,248-Speed 2616.67 samples/sec   Loss 4.8609   LearningRate 0.0126   Epoch: 12   Global Step: 534590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:47:53,158-Speed 2619.38 samples/sec   Loss 4.8261   LearningRate 0.0126   Epoch: 12   Global Step: 534600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:47:57,062-Speed 2624.36 samples/sec   Loss 4.8385   LearningRate 0.0126   Epoch: 12   Global Step: 534610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:00,962-Speed 2626.09 samples/sec   Loss 4.8796   LearningRate 0.0126   Epoch: 12   Global Step: 534620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:04,858-Speed 2628.56 samples/sec   Loss 4.8751   LearningRate 0.0126   Epoch: 12   Global Step: 534630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:08,769-Speed 2619.03 samples/sec   Loss 4.7492   LearningRate 0.0126   Epoch: 12   Global Step: 534640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:12,667-Speed 2628.09 samples/sec   Loss 4.7216   LearningRate 0.0126   Epoch: 12   Global Step: 534650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:16,563-Speed 2629.28 samples/sec   Loss 4.8025   LearningRate 0.0126   Epoch: 12   Global Step: 534660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:20,458-Speed 2628.96 samples/sec   Loss 4.7816   LearningRate 0.0126   Epoch: 12   Global Step: 534670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:24,361-Speed 2625.14 samples/sec   Loss 4.8042   LearningRate 0.0126   Epoch: 12   Global Step: 534680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:28,266-Speed 2622.48 samples/sec   Loss 4.7612   LearningRate 0.0126   Epoch: 12   Global Step: 534690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:48:32,168-Speed 2625.05 samples/sec   Loss 4.6951   LearningRate 0.0126   Epoch: 12   Global Step: 534700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:48:36,087-Speed 2613.67 samples/sec   Loss 4.7575   LearningRate 0.0126   Epoch: 12   Global Step: 534710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:48:40,017-Speed 2606.02 samples/sec   Loss 4.7665   LearningRate 0.0126   Epoch: 12   Global Step: 534720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:48:43,919-Speed 2624.71 samples/sec   Loss 4.7600   LearningRate 0.0126   Epoch: 12   Global Step: 534730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:48:47,816-Speed 2628.57 samples/sec   Loss 4.7341   LearningRate 0.0126   Epoch: 12   Global Step: 534740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:48:51,715-Speed 2626.63 samples/sec   Loss 4.8144   LearningRate 0.0126   Epoch: 12   Global Step: 534750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:48:55,587-Speed 2645.95 samples/sec   Loss 4.7679   LearningRate 0.0126   Epoch: 12   Global Step: 534760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:48:59,486-Speed 2626.56 samples/sec   Loss 4.7434   LearningRate 0.0126   Epoch: 12   Global Step: 534770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:03,392-Speed 2622.13 samples/sec   Loss 4.8520   LearningRate 0.0126   Epoch: 12   Global Step: 534780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:07,288-Speed 2629.17 samples/sec   Loss 4.7402   LearningRate 0.0126   Epoch: 12   Global Step: 534790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:11,182-Speed 2630.19 samples/sec   Loss 4.7875   LearningRate 0.0126   Epoch: 12   Global Step: 534800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:15,080-Speed 2627.80 samples/sec   Loss 4.7656   LearningRate 0.0126   Epoch: 12   Global Step: 534810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:18,972-Speed 2631.75 samples/sec   Loss 4.7075   LearningRate 0.0126   Epoch: 12   Global Step: 534820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:22,873-Speed 2625.61 samples/sec   Loss 4.6722   LearningRate 0.0126   Epoch: 12   Global Step: 534830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:26,774-Speed 2625.74 samples/sec   Loss 4.7747   LearningRate 0.0126   Epoch: 12   Global Step: 534840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:30,690-Speed 2615.15 samples/sec   Loss 4.7638   LearningRate 0.0126   Epoch: 12   Global Step: 534850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:34,603-Speed 2618.07 samples/sec   Loss 4.7475   LearningRate 0.0126   Epoch: 12   Global Step: 534860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:49:38,502-Speed 2626.47 samples/sec   Loss 4.8053   LearningRate 0.0126   Epoch: 12   Global Step: 534870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:49:42,378-Speed 2642.21 samples/sec   Loss 4.7778   LearningRate 0.0126   Epoch: 12   Global Step: 534880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:46,272-Speed 2630.18 samples/sec   Loss 4.7910   LearningRate 0.0126   Epoch: 12   Global Step: 534890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:50,165-Speed 2631.36 samples/sec   Loss 4.7842   LearningRate 0.0126   Epoch: 12   Global Step: 534900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:54,059-Speed 2630.65 samples/sec   Loss 4.7840   LearningRate 0.0126   Epoch: 12   Global Step: 534910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:49:57,963-Speed 2623.36 samples/sec   Loss 4.8334   LearningRate 0.0126   Epoch: 12   Global Step: 534920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:50:01,866-Speed 2624.47 samples/sec   Loss 4.7300   LearningRate 0.0126   Epoch: 12   Global Step: 534930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:50:05,762-Speed 2629.16 samples/sec   Loss 4.8225   LearningRate 0.0126   Epoch: 12   Global Step: 534940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:50:09,656-Speed 2630.17 samples/sec   Loss 4.8253   LearningRate 0.0126   Epoch: 12   Global Step: 534950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:50:13,550-Speed 2630.25 samples/sec   Loss 4.7748   LearningRate 0.0126   Epoch: 12   Global Step: 534960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:50:17,449-Speed 2627.15 samples/sec   Loss 4.8253   LearningRate 0.0126   Epoch: 12   Global Step: 534970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:50:21,341-Speed 2630.98 samples/sec   Loss 4.8126   LearningRate 0.0126   Epoch: 12   Global Step: 534980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:25,247-Speed 2622.64 samples/sec   Loss 4.7976   LearningRate 0.0126   Epoch: 12   Global Step: 534990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:29,147-Speed 2626.96 samples/sec   Loss 4.6565   LearningRate 0.0126   Epoch: 12   Global Step: 535000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:33,040-Speed 2630.37 samples/sec   Loss 4.8058   LearningRate 0.0126   Epoch: 12   Global Step: 535010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:36,950-Speed 2619.72 samples/sec   Loss 4.8343   LearningRate 0.0126   Epoch: 12   Global Step: 535020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:40,852-Speed 2624.44 samples/sec   Loss 4.8002   LearningRate 0.0126   Epoch: 12   Global Step: 535030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:44,749-Speed 2628.97 samples/sec   Loss 4.7888   LearningRate 0.0126   Epoch: 12   Global Step: 535040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:48,655-Speed 2622.19 samples/sec   Loss 4.7450   LearningRate 0.0126   Epoch: 12   Global Step: 535050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:52,659-Speed 2557.90 samples/sec   Loss 4.8221   LearningRate 0.0126   Epoch: 12   Global Step: 535060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:50:56,573-Speed 2616.65 samples/sec   Loss 4.7011   LearningRate 0.0126   Epoch: 12   Global Step: 535070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:51:00,456-Speed 2637.74 samples/sec   Loss 4.7849   LearningRate 0.0126   Epoch: 12   Global Step: 535080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:51:04,353-Speed 2628.07 samples/sec   Loss 4.8352   LearningRate 0.0126   Epoch: 12   Global Step: 535090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:51:08,225-Speed 2645.41 samples/sec   Loss 4.8072   LearningRate 0.0126   Epoch: 12   Global Step: 535100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:12,118-Speed 2631.37 samples/sec   Loss 4.7544   LearningRate 0.0126   Epoch: 12   Global Step: 535110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:16,014-Speed 2628.68 samples/sec   Loss 4.6717   LearningRate 0.0126   Epoch: 12   Global Step: 535120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:19,907-Speed 2631.27 samples/sec   Loss 4.7096   LearningRate 0.0126   Epoch: 12   Global Step: 535130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:23,798-Speed 2631.69 samples/sec   Loss 4.7804   LearningRate 0.0126   Epoch: 12   Global Step: 535140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:27,703-Speed 2623.61 samples/sec   Loss 4.7778   LearningRate 0.0126   Epoch: 12   Global Step: 535150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:31,604-Speed 2625.11 samples/sec   Loss 4.7922   LearningRate 0.0126   Epoch: 12   Global Step: 535160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:35,495-Speed 2631.98 samples/sec   Loss 4.7407   LearningRate 0.0126   Epoch: 12   Global Step: 535170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:39,390-Speed 2629.36 samples/sec   Loss 4.8291   LearningRate 0.0126   Epoch: 12   Global Step: 535180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:43,556-Speed 2459.02 samples/sec   Loss 4.7756   LearningRate 0.0126   Epoch: 12   Global Step: 535190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:51:47,463-Speed 2621.99 samples/sec   Loss 4.8015   LearningRate 0.0126   Epoch: 12   Global Step: 535200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:51:51,368-Speed 2622.47 samples/sec   Loss 4.8590   LearningRate 0.0126   Epoch: 12   Global Step: 535210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:51:55,267-Speed 2627.09 samples/sec   Loss 4.8746   LearningRate 0.0126   Epoch: 12   Global Step: 535220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:51:59,139-Speed 2645.40 samples/sec   Loss 4.8291   LearningRate 0.0126   Epoch: 12   Global Step: 535230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:03,031-Speed 2631.78 samples/sec   Loss 4.7970   LearningRate 0.0126   Epoch: 12   Global Step: 535240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:06,929-Speed 2626.92 samples/sec   Loss 4.7746   LearningRate 0.0126   Epoch: 12   Global Step: 535250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:10,828-Speed 2627.06 samples/sec   Loss 4.9271   LearningRate 0.0126   Epoch: 12   Global Step: 535260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:14,719-Speed 2632.09 samples/sec   Loss 4.7289   LearningRate 0.0126   Epoch: 12   Global Step: 535270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:18,626-Speed 2622.02 samples/sec   Loss 4.8017   LearningRate 0.0126   Epoch: 12   Global Step: 535280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:22,528-Speed 2625.41 samples/sec   Loss 4.7757   LearningRate 0.0126   Epoch: 12   Global Step: 535290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:26,422-Speed 2629.85 samples/sec   Loss 4.8400   LearningRate 0.0126   Epoch: 12   Global Step: 535300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:30,361-Speed 2600.48 samples/sec   Loss 4.8032   LearningRate 0.0126   Epoch: 12   Global Step: 535310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:34,257-Speed 2628.76 samples/sec   Loss 4.8188   LearningRate 0.0126   Epoch: 12   Global Step: 535320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:38,152-Speed 2629.93 samples/sec   Loss 4.6974   LearningRate 0.0126   Epoch: 12   Global Step: 535330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:52:42,021-Speed 2646.68 samples/sec   Loss 4.6973   LearningRate 0.0126   Epoch: 12   Global Step: 535340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:45,919-Speed 2627.92 samples/sec   Loss 4.7225   LearningRate 0.0126   Epoch: 12   Global Step: 535350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:49,833-Speed 2616.47 samples/sec   Loss 4.8137   LearningRate 0.0126   Epoch: 12   Global Step: 535360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:53,738-Speed 2623.15 samples/sec   Loss 4.7494   LearningRate 0.0126   Epoch: 12   Global Step: 535370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:52:57,646-Speed 2621.35 samples/sec   Loss 4.7154   LearningRate 0.0126   Epoch: 12   Global Step: 535380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:01,542-Speed 2628.85 samples/sec   Loss 4.7753   LearningRate 0.0126   Epoch: 12   Global Step: 535390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:05,438-Speed 2628.92 samples/sec   Loss 4.8265   LearningRate 0.0126   Epoch: 12   Global Step: 535400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:09,335-Speed 2627.99 samples/sec   Loss 4.7497   LearningRate 0.0126   Epoch: 12   Global Step: 535410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:13,244-Speed 2620.04 samples/sec   Loss 4.7505   LearningRate 0.0126   Epoch: 12   Global Step: 535420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:17,139-Speed 2629.57 samples/sec   Loss 4.7530   LearningRate 0.0126   Epoch: 12   Global Step: 535430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:21,011-Speed 2645.57 samples/sec   Loss 4.7286   LearningRate 0.0126   Epoch: 12   Global Step: 535440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:24,906-Speed 2629.51 samples/sec   Loss 4.7856   LearningRate 0.0126   Epoch: 12   Global Step: 535450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:28,819-Speed 2617.83 samples/sec   Loss 4.7023   LearningRate 0.0126   Epoch: 12   Global Step: 535460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:32,720-Speed 2625.38 samples/sec   Loss 4.8666   LearningRate 0.0126   Epoch: 12   Global Step: 535470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:36,613-Speed 2630.73 samples/sec   Loss 4.7112   LearningRate 0.0126   Epoch: 12   Global Step: 535480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:40,511-Speed 2628.10 samples/sec   Loss 4.8605   LearningRate 0.0126   Epoch: 12   Global Step: 535490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:53:44,398-Speed 2635.71 samples/sec   Loss 4.7380   LearningRate 0.0126   Epoch: 12   Global Step: 535500   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:53:48,294-Speed 2628.79 samples/sec   Loss 4.7364   LearningRate 0.0126   Epoch: 12   Global Step: 535510   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:53:52,190-Speed 2629.15 samples/sec   Loss 4.7177   LearningRate 0.0126   Epoch: 12   Global Step: 535520   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:53:56,082-Speed 2631.36 samples/sec   Loss 4.8118   LearningRate 0.0126   Epoch: 12   Global Step: 535530   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:53:59,992-Speed 2620.10 samples/sec   Loss 4.7611   LearningRate 0.0126   Epoch: 12   Global Step: 535540   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:54:03,904-Speed 2618.20 samples/sec   Loss 4.7286   LearningRate 0.0126   Epoch: 12   Global Step: 535550   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:54:07,802-Speed 2627.32 samples/sec   Loss 4.6777   LearningRate 0.0126   Epoch: 12   Global Step: 535560   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:54:11,702-Speed 2626.33 samples/sec   Loss 4.7757   LearningRate 0.0126   Epoch: 12   Global Step: 535570   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:54:15,605-Speed 2624.96 samples/sec   Loss 4.7356   LearningRate 0.0126   Epoch: 12   Global Step: 535580   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:54:19,504-Speed 2627.25 samples/sec   Loss 4.7255   LearningRate 0.0126   Epoch: 12   Global Step: 535590   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 07:54:23,412-Speed 2620.62 samples/sec   Loss 4.7616   LearningRate 0.0126   Epoch: 12   Global Step: 535600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:27,318-Speed 2622.69 samples/sec   Loss 4.7378   LearningRate 0.0126   Epoch: 12   Global Step: 535610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:31,219-Speed 2624.85 samples/sec   Loss 4.7767   LearningRate 0.0126   Epoch: 12   Global Step: 535620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:35,128-Speed 2620.66 samples/sec   Loss 4.8475   LearningRate 0.0126   Epoch: 12   Global Step: 535630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:39,031-Speed 2624.21 samples/sec   Loss 4.7851   LearningRate 0.0126   Epoch: 12   Global Step: 535640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:42,925-Speed 2630.22 samples/sec   Loss 4.7660   LearningRate 0.0126   Epoch: 12   Global Step: 535650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:46,819-Speed 2630.23 samples/sec   Loss 4.7354   LearningRate 0.0126   Epoch: 12   Global Step: 535660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:50,719-Speed 2626.54 samples/sec   Loss 4.8303   LearningRate 0.0126   Epoch: 12   Global Step: 535670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:54,613-Speed 2630.92 samples/sec   Loss 4.7895   LearningRate 0.0126   Epoch: 12   Global Step: 535680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:54:58,519-Speed 2621.54 samples/sec   Loss 4.6005   LearningRate 0.0126   Epoch: 12   Global Step: 535690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:02,412-Speed 2631.47 samples/sec   Loss 4.7424   LearningRate 0.0125   Epoch: 12   Global Step: 535700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:55:06,308-Speed 2628.68 samples/sec   Loss 4.7597   LearningRate 0.0125   Epoch: 12   Global Step: 535710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:55:10,206-Speed 2627.46 samples/sec   Loss 4.8931   LearningRate 0.0125   Epoch: 12   Global Step: 535720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:55:14,104-Speed 2627.51 samples/sec   Loss 4.7262   LearningRate 0.0125   Epoch: 12   Global Step: 535730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:55:18,019-Speed 2616.23 samples/sec   Loss 4.7511   LearningRate 0.0125   Epoch: 12   Global Step: 535740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:55:21,905-Speed 2635.45 samples/sec   Loss 4.7522   LearningRate 0.0125   Epoch: 12   Global Step: 535750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:25,798-Speed 2631.40 samples/sec   Loss 4.8155   LearningRate 0.0125   Epoch: 12   Global Step: 535760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:29,695-Speed 2628.26 samples/sec   Loss 4.7841   LearningRate 0.0125   Epoch: 12   Global Step: 535770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:33,591-Speed 2629.29 samples/sec   Loss 4.9095   LearningRate 0.0125   Epoch: 12   Global Step: 535780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:37,491-Speed 2626.52 samples/sec   Loss 4.7752   LearningRate 0.0125   Epoch: 12   Global Step: 535790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:41,387-Speed 2628.30 samples/sec   Loss 4.7823   LearningRate 0.0125   Epoch: 12   Global Step: 535800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:45,282-Speed 2629.61 samples/sec   Loss 4.9024   LearningRate 0.0125   Epoch: 12   Global Step: 535810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:49,180-Speed 2628.20 samples/sec   Loss 4.7103   LearningRate 0.0125   Epoch: 12   Global Step: 535820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:53,079-Speed 2626.22 samples/sec   Loss 4.7978   LearningRate 0.0125   Epoch: 12   Global Step: 535830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:55:57,065-Speed 2570.22 samples/sec   Loss 4.8045   LearningRate 0.0125   Epoch: 12   Global Step: 535840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:56:01,165-Speed 2497.97 samples/sec   Loss 4.7685   LearningRate 0.0125   Epoch: 12   Global Step: 535850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:05,138-Speed 2577.62 samples/sec   Loss 4.7217   LearningRate 0.0125   Epoch: 12   Global Step: 535860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:09,044-Speed 2622.69 samples/sec   Loss 4.7164   LearningRate 0.0125   Epoch: 12   Global Step: 535870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:12,945-Speed 2625.31 samples/sec   Loss 4.7777   LearningRate 0.0125   Epoch: 12   Global Step: 535880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:16,870-Speed 2609.57 samples/sec   Loss 4.8616   LearningRate 0.0125   Epoch: 12   Global Step: 535890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:20,813-Speed 2597.87 samples/sec   Loss 4.7644   LearningRate 0.0125   Epoch: 12   Global Step: 535900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:24,756-Speed 2597.80 samples/sec   Loss 4.7590   LearningRate 0.0125   Epoch: 12   Global Step: 535910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:28,659-Speed 2623.86 samples/sec   Loss 4.7830   LearningRate 0.0125   Epoch: 12   Global Step: 535920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:32,560-Speed 2625.62 samples/sec   Loss 4.7965   LearningRate 0.0125   Epoch: 12   Global Step: 535930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:36,457-Speed 2628.28 samples/sec   Loss 4.7616   LearningRate 0.0125   Epoch: 12   Global Step: 535940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:40,335-Speed 2640.75 samples/sec   Loss 4.6420   LearningRate 0.0125   Epoch: 12   Global Step: 535950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:44,278-Speed 2598.26 samples/sec   Loss 4.7538   LearningRate 0.0125   Epoch: 12   Global Step: 535960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:48,244-Speed 2582.68 samples/sec   Loss 4.6863   LearningRate 0.0125   Epoch: 12   Global Step: 535970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:52,138-Speed 2630.68 samples/sec   Loss 4.6751   LearningRate 0.0125   Epoch: 12   Global Step: 535980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:56,031-Speed 2630.62 samples/sec   Loss 4.6811   LearningRate 0.0125   Epoch: 12   Global Step: 535990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:56:59,926-Speed 2629.30 samples/sec   Loss 4.6203   LearningRate 0.0125   Epoch: 12   Global Step: 536000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:03,834-Speed 2620.95 samples/sec   Loss 4.7097   LearningRate 0.0125   Epoch: 12   Global Step: 536010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:07,745-Speed 2618.96 samples/sec   Loss 4.7748   LearningRate 0.0125   Epoch: 12   Global Step: 536020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:11,645-Speed 2625.90 samples/sec   Loss 4.7266   LearningRate 0.0125   Epoch: 12   Global Step: 536030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:15,542-Speed 2629.11 samples/sec   Loss 4.7974   LearningRate 0.0125   Epoch: 12   Global Step: 536040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:19,464-Speed 2611.90 samples/sec   Loss 4.7820   LearningRate 0.0125   Epoch: 12   Global Step: 536050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:23,360-Speed 2628.82 samples/sec   Loss 4.7729   LearningRate 0.0125   Epoch: 12   Global Step: 536060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:27,286-Speed 2608.97 samples/sec   Loss 4.7902   LearningRate 0.0125   Epoch: 12   Global Step: 536070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:31,204-Speed 2614.27 samples/sec   Loss 4.8143   LearningRate 0.0125   Epoch: 12   Global Step: 536080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:35,113-Speed 2620.74 samples/sec   Loss 4.6591   LearningRate 0.0125   Epoch: 12   Global Step: 536090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:57:39,017-Speed 2623.26 samples/sec   Loss 4.7949   LearningRate 0.0125   Epoch: 12   Global Step: 536100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:57:42,912-Speed 2630.03 samples/sec   Loss 4.7901   LearningRate 0.0125   Epoch: 12   Global Step: 536110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:57:46,806-Speed 2629.74 samples/sec   Loss 4.6971   LearningRate 0.0125   Epoch: 12   Global Step: 536120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:57:50,707-Speed 2626.26 samples/sec   Loss 4.7649   LearningRate 0.0125   Epoch: 12   Global Step: 536130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:57:54,603-Speed 2628.70 samples/sec   Loss 4.7559   LearningRate 0.0125   Epoch: 12   Global Step: 536140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:57:58,501-Speed 2628.39 samples/sec   Loss 4.8018   LearningRate 0.0125   Epoch: 12   Global Step: 536150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:58:02,396-Speed 2629.53 samples/sec   Loss 4.7974   LearningRate 0.0125   Epoch: 12   Global Step: 536160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:58:06,289-Speed 2630.51 samples/sec   Loss 4.8331   LearningRate 0.0125   Epoch: 12   Global Step: 536170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:58:10,160-Speed 2645.67 samples/sec   Loss 4.7428   LearningRate 0.0125   Epoch: 12   Global Step: 536180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:14,056-Speed 2629.42 samples/sec   Loss 4.7134   LearningRate 0.0125   Epoch: 12   Global Step: 536190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:17,960-Speed 2622.90 samples/sec   Loss 4.8368   LearningRate 0.0125   Epoch: 12   Global Step: 536200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:21,857-Speed 2628.76 samples/sec   Loss 4.8100   LearningRate 0.0125   Epoch: 12   Global Step: 536210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:25,755-Speed 2627.58 samples/sec   Loss 4.8760   LearningRate 0.0125   Epoch: 12   Global Step: 536220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:29,663-Speed 2621.51 samples/sec   Loss 4.8858   LearningRate 0.0125   Epoch: 12   Global Step: 536230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:33,564-Speed 2625.33 samples/sec   Loss 4.7885   LearningRate 0.0125   Epoch: 12   Global Step: 536240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:37,475-Speed 2618.84 samples/sec   Loss 4.7540   LearningRate 0.0125   Epoch: 12   Global Step: 536250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:41,368-Speed 2631.33 samples/sec   Loss 4.7449   LearningRate 0.0125   Epoch: 12   Global Step: 536260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:45,262-Speed 2630.27 samples/sec   Loss 4.7651   LearningRate 0.0125   Epoch: 12   Global Step: 536270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:58:49,168-Speed 2622.14 samples/sec   Loss 4.7027   LearningRate 0.0125   Epoch: 12   Global Step: 536280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:58:53,085-Speed 2615.41 samples/sec   Loss 4.6714   LearningRate 0.0125   Epoch: 12   Global Step: 536290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:58:56,980-Speed 2629.84 samples/sec   Loss 4.8586   LearningRate 0.0125   Epoch: 12   Global Step: 536300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:59:00,850-Speed 2646.50 samples/sec   Loss 4.7063   LearningRate 0.0125   Epoch: 12   Global Step: 536310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:04,771-Speed 2611.83 samples/sec   Loss 4.7960   LearningRate 0.0125   Epoch: 12   Global Step: 536320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:08,672-Speed 2625.44 samples/sec   Loss 4.7492   LearningRate 0.0125   Epoch: 12   Global Step: 536330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:12,575-Speed 2624.46 samples/sec   Loss 4.7146   LearningRate 0.0125   Epoch: 12   Global Step: 536340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:16,470-Speed 2629.56 samples/sec   Loss 4.7547   LearningRate 0.0125   Epoch: 12   Global Step: 536350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:20,383-Speed 2617.69 samples/sec   Loss 4.7956   LearningRate 0.0125   Epoch: 12   Global Step: 536360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:24,285-Speed 2624.64 samples/sec   Loss 4.7521   LearningRate 0.0125   Epoch: 12   Global Step: 536370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:28,203-Speed 2614.83 samples/sec   Loss 4.8029   LearningRate 0.0125   Epoch: 12   Global Step: 536380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:32,150-Speed 2595.25 samples/sec   Loss 4.7855   LearningRate 0.0125   Epoch: 12   Global Step: 536390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:36,147-Speed 2562.43 samples/sec   Loss 4.7705   LearningRate 0.0125   Epoch: 12   Global Step: 536400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:40,057-Speed 2619.28 samples/sec   Loss 4.8349   LearningRate 0.0125   Epoch: 12   Global Step: 536410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:59:43,954-Speed 2628.34 samples/sec   Loss 4.8185   LearningRate 0.0125   Epoch: 12   Global Step: 536420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 07:59:47,832-Speed 2641.21 samples/sec   Loss 4.7703   LearningRate 0.0125   Epoch: 12   Global Step: 536430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:51,882-Speed 2529.24 samples/sec   Loss 4.8328   LearningRate 0.0125   Epoch: 12   Global Step: 536440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 07:59:55,965-Speed 2508.33 samples/sec   Loss 4.7721   LearningRate 0.0125   Epoch: 12   Global Step: 536450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:00,032-Speed 2518.63 samples/sec   Loss 4.7718   LearningRate 0.0125   Epoch: 12   Global Step: 536460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:03,935-Speed 2623.84 samples/sec   Loss 4.7635   LearningRate 0.0125   Epoch: 12   Global Step: 536470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:07,838-Speed 2624.70 samples/sec   Loss 4.8187   LearningRate 0.0125   Epoch: 12   Global Step: 536480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:11,739-Speed 2625.20 samples/sec   Loss 4.7989   LearningRate 0.0125   Epoch: 12   Global Step: 536490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:15,692-Speed 2591.52 samples/sec   Loss 4.8222   LearningRate 0.0125   Epoch: 12   Global Step: 536500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:19,587-Speed 2629.66 samples/sec   Loss 4.6355   LearningRate 0.0125   Epoch: 12   Global Step: 536510   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:23,483-Speed 2629.20 samples/sec   Loss 4.7452   LearningRate 0.0125   Epoch: 12   Global Step: 536520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:27,378-Speed 2629.82 samples/sec   Loss 4.7563   LearningRate 0.0125   Epoch: 12   Global Step: 536530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:00:31,255-Speed 2642.20 samples/sec   Loss 4.7663   LearningRate 0.0125   Epoch: 12   Global Step: 536540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:35,148-Speed 2630.95 samples/sec   Loss 4.8005   LearningRate 0.0125   Epoch: 12   Global Step: 536550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:39,043-Speed 2629.37 samples/sec   Loss 4.7689   LearningRate 0.0125   Epoch: 12   Global Step: 536560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:42,936-Speed 2630.96 samples/sec   Loss 4.7586   LearningRate 0.0125   Epoch: 12   Global Step: 536570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:46,835-Speed 2627.14 samples/sec   Loss 4.8048   LearningRate 0.0125   Epoch: 12   Global Step: 536580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:50,741-Speed 2622.59 samples/sec   Loss 4.6975   LearningRate 0.0125   Epoch: 12   Global Step: 536590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:54,656-Speed 2616.85 samples/sec   Loss 4.6839   LearningRate 0.0125   Epoch: 12   Global Step: 536600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:00:58,552-Speed 2628.65 samples/sec   Loss 4.8262   LearningRate 0.0125   Epoch: 12   Global Step: 536610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:02,453-Speed 2626.08 samples/sec   Loss 4.8583   LearningRate 0.0125   Epoch: 12   Global Step: 536620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:06,345-Speed 2631.76 samples/sec   Loss 4.7259   LearningRate 0.0125   Epoch: 12   Global Step: 536630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:10,218-Speed 2644.31 samples/sec   Loss 4.7022   LearningRate 0.0125   Epoch: 12   Global Step: 536640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:14,113-Speed 2629.54 samples/sec   Loss 4.6780   LearningRate 0.0125   Epoch: 12   Global Step: 536650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:18,021-Speed 2621.82 samples/sec   Loss 4.7600   LearningRate 0.0125   Epoch: 12   Global Step: 536660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:21,914-Speed 2631.06 samples/sec   Loss 4.6855   LearningRate 0.0125   Epoch: 12   Global Step: 536670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:25,826-Speed 2618.27 samples/sec   Loss 4.8269   LearningRate 0.0125   Epoch: 12   Global Step: 536680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:29,718-Speed 2631.96 samples/sec   Loss 4.7781   LearningRate 0.0125   Epoch: 12   Global Step: 536690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:33,650-Speed 2604.93 samples/sec   Loss 4.6970   LearningRate 0.0125   Epoch: 12   Global Step: 536700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:37,553-Speed 2624.39 samples/sec   Loss 4.7319   LearningRate 0.0125   Epoch: 12   Global Step: 536710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:41,448-Speed 2629.59 samples/sec   Loss 4.6792   LearningRate 0.0125   Epoch: 12   Global Step: 536720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:45,338-Speed 2633.24 samples/sec   Loss 4.6136   LearningRate 0.0125   Epoch: 12   Global Step: 536730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:01:49,259-Speed 2612.62 samples/sec   Loss 4.7391   LearningRate 0.0125   Epoch: 12   Global Step: 536740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:01:53,158-Speed 2626.84 samples/sec   Loss 4.8927   LearningRate 0.0125   Epoch: 12   Global Step: 536750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:01:57,042-Speed 2637.60 samples/sec   Loss 4.7670   LearningRate 0.0125   Epoch: 12   Global Step: 536760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:00,943-Speed 2625.68 samples/sec   Loss 4.7760   LearningRate 0.0125   Epoch: 12   Global Step: 536770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:04,838-Speed 2630.10 samples/sec   Loss 4.7646   LearningRate 0.0125   Epoch: 12   Global Step: 536780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:08,732-Speed 2630.13 samples/sec   Loss 4.7727   LearningRate 0.0125   Epoch: 12   Global Step: 536790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:12,632-Speed 2625.77 samples/sec   Loss 4.7140   LearningRate 0.0125   Epoch: 12   Global Step: 536800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:16,527-Speed 2629.57 samples/sec   Loss 4.7413   LearningRate 0.0125   Epoch: 12   Global Step: 536810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:20,428-Speed 2625.96 samples/sec   Loss 4.7119   LearningRate 0.0125   Epoch: 12   Global Step: 536820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:24,329-Speed 2625.54 samples/sec   Loss 4.7393   LearningRate 0.0125   Epoch: 12   Global Step: 536830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:28,247-Speed 2614.23 samples/sec   Loss 4.6323   LearningRate 0.0125   Epoch: 12   Global Step: 536840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:32,147-Speed 2625.95 samples/sec   Loss 4.7208   LearningRate 0.0125   Epoch: 12   Global Step: 536850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:02:36,062-Speed 2616.34 samples/sec   Loss 4.7474   LearningRate 0.0125   Epoch: 12   Global Step: 536860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:02:39,966-Speed 2623.95 samples/sec   Loss 4.6553   LearningRate 0.0124   Epoch: 12   Global Step: 536870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:02:43,864-Speed 2627.09 samples/sec   Loss 4.7621   LearningRate 0.0124   Epoch: 12   Global Step: 536880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:02:47,773-Speed 2621.24 samples/sec   Loss 4.7229   LearningRate 0.0124   Epoch: 12   Global Step: 536890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:02:51,671-Speed 2627.21 samples/sec   Loss 4.7338   LearningRate 0.0124   Epoch: 12   Global Step: 536900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:02:55,569-Speed 2628.24 samples/sec   Loss 4.7319   LearningRate 0.0124   Epoch: 12   Global Step: 536910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:02:59,464-Speed 2629.47 samples/sec   Loss 4.7654   LearningRate 0.0124   Epoch: 12   Global Step: 536920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:03:03,380-Speed 2615.31 samples/sec   Loss 4.8072   LearningRate 0.0124   Epoch: 12   Global Step: 536930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:03:07,326-Speed 2595.33 samples/sec   Loss 4.6411   LearningRate 0.0124   Epoch: 12   Global Step: 536940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:03:11,316-Speed 2568.37 samples/sec   Loss 4.7834   LearningRate 0.0124   Epoch: 12   Global Step: 536950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:03:15,186-Speed 2646.63 samples/sec   Loss 4.8387   LearningRate 0.0124   Epoch: 12   Global Step: 536960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:03:19,102-Speed 2615.44 samples/sec   Loss 4.8021   LearningRate 0.0124   Epoch: 12   Global Step: 536970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:03:23,014-Speed 2618.76 samples/sec   Loss 4.7803   LearningRate 0.0124   Epoch: 12   Global Step: 536980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:03:26,908-Speed 2630.32 samples/sec   Loss 4.7810   LearningRate 0.0124   Epoch: 12   Global Step: 536990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:03:30,814-Speed 2622.17 samples/sec   Loss 4.7616   LearningRate 0.0124   Epoch: 12   Global Step: 537000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:03:34,695-Speed 2639.41 samples/sec   Loss 4.6803   LearningRate 0.0124   Epoch: 12   Global Step: 537010   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:03:38,592-Speed 2628.24 samples/sec   Loss 4.7034   LearningRate 0.0124   Epoch: 12   Global Step: 537020   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:03:42,485-Speed 2630.67 samples/sec   Loss 4.8750   LearningRate 0.0124   Epoch: 12   Global Step: 537030   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:03:46,399-Speed 2617.57 samples/sec   Loss 4.7252   LearningRate 0.0124   Epoch: 12   Global Step: 537040   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:03:50,297-Speed 2627.66 samples/sec   Loss 4.7842   LearningRate 0.0124   Epoch: 12   Global Step: 537050   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:03:54,201-Speed 2624.05 samples/sec   Loss 4.8222   LearningRate 0.0124   Epoch: 12   Global Step: 537060   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:03:58,109-Speed 2620.85 samples/sec   Loss 4.6946   LearningRate 0.0124   Epoch: 12   Global Step: 537070   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:04:02,009-Speed 2626.56 samples/sec   Loss 4.8079   LearningRate 0.0124   Epoch: 12   Global Step: 537080   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:04:05,903-Speed 2630.37 samples/sec   Loss 4.7977   LearningRate 0.0124   Epoch: 12   Global Step: 537090   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:04:09,804-Speed 2625.28 samples/sec   Loss 4.8120   LearningRate 0.0124   Epoch: 12   Global Step: 537100   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:04:13,703-Speed 2627.16 samples/sec   Loss 4.7965   LearningRate 0.0124   Epoch: 12   Global Step: 537110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:17,606-Speed 2623.84 samples/sec   Loss 4.8502   LearningRate 0.0124   Epoch: 12   Global Step: 537120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:21,498-Speed 2632.29 samples/sec   Loss 4.7747   LearningRate 0.0124   Epoch: 12   Global Step: 537130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:25,394-Speed 2628.65 samples/sec   Loss 4.8665   LearningRate 0.0124   Epoch: 12   Global Step: 537140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:29,291-Speed 2628.78 samples/sec   Loss 4.7999   LearningRate 0.0124   Epoch: 12   Global Step: 537150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:33,187-Speed 2629.15 samples/sec   Loss 4.7888   LearningRate 0.0124   Epoch: 12   Global Step: 537160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:37,094-Speed 2621.35 samples/sec   Loss 4.6714   LearningRate 0.0124   Epoch: 12   Global Step: 537170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:40,990-Speed 2628.69 samples/sec   Loss 4.7357   LearningRate 0.0124   Epoch: 12   Global Step: 537180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:44,888-Speed 2627.58 samples/sec   Loss 4.7655   LearningRate 0.0124   Epoch: 12   Global Step: 537190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:48,790-Speed 2624.95 samples/sec   Loss 4.7251   LearningRate 0.0124   Epoch: 12   Global Step: 537200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:04:52,687-Speed 2628.73 samples/sec   Loss 4.7330   LearningRate 0.0124   Epoch: 12   Global Step: 537210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:04:56,583-Speed 2628.94 samples/sec   Loss 4.6427   LearningRate 0.0124   Epoch: 12   Global Step: 537220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:05:00,536-Speed 2591.70 samples/sec   Loss 4.7344   LearningRate 0.0124   Epoch: 12   Global Step: 537230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:05:04,428-Speed 2631.26 samples/sec   Loss 4.7513   LearningRate 0.0124   Epoch: 12   Global Step: 537240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:05:08,329-Speed 2625.58 samples/sec   Loss 4.6873   LearningRate 0.0124   Epoch: 12   Global Step: 537250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:05:12,236-Speed 2621.32 samples/sec   Loss 4.9007   LearningRate 0.0124   Epoch: 12   Global Step: 537260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:05:16,140-Speed 2624.45 samples/sec   Loss 4.7607   LearningRate 0.0124   Epoch: 12   Global Step: 537270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:05:20,036-Speed 2629.09 samples/sec   Loss 4.7992   LearningRate 0.0124   Epoch: 12   Global Step: 537280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:05:23,913-Speed 2642.08 samples/sec   Loss 4.7878   LearningRate 0.0124   Epoch: 12   Global Step: 537290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:27,806-Speed 2630.99 samples/sec   Loss 4.6741   LearningRate 0.0124   Epoch: 12   Global Step: 537300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:31,701-Speed 2629.63 samples/sec   Loss 4.7644   LearningRate 0.0124   Epoch: 12   Global Step: 537310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:35,609-Speed 2621.36 samples/sec   Loss 4.7494   LearningRate 0.0124   Epoch: 12   Global Step: 537320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:39,506-Speed 2627.91 samples/sec   Loss 4.7284   LearningRate 0.0124   Epoch: 12   Global Step: 537330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:43,410-Speed 2624.26 samples/sec   Loss 4.8236   LearningRate 0.0124   Epoch: 12   Global Step: 537340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:47,354-Speed 2597.14 samples/sec   Loss 4.7677   LearningRate 0.0124   Epoch: 12   Global Step: 537350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:51,287-Speed 2603.89 samples/sec   Loss 4.8109   LearningRate 0.0124   Epoch: 12   Global Step: 537360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:55,185-Speed 2628.10 samples/sec   Loss 4.6589   LearningRate 0.0124   Epoch: 12   Global Step: 537370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:05:59,080-Speed 2630.00 samples/sec   Loss 4.6584   LearningRate 0.0124   Epoch: 12   Global Step: 537380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:02,995-Speed 2616.18 samples/sec   Loss 4.6901   LearningRate 0.0124   Epoch: 12   Global Step: 537390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:06:06,914-Speed 2612.97 samples/sec   Loss 4.7910   LearningRate 0.0124   Epoch: 12   Global Step: 537400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:06:10,818-Speed 2624.13 samples/sec   Loss 4.7748   LearningRate 0.0124   Epoch: 12   Global Step: 537410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:06:14,720-Speed 2625.41 samples/sec   Loss 4.5919   LearningRate 0.0124   Epoch: 12   Global Step: 537420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:06:18,617-Speed 2627.63 samples/sec   Loss 4.8329   LearningRate 0.0124   Epoch: 12   Global Step: 537430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:06:22,519-Speed 2625.16 samples/sec   Loss 4.7728   LearningRate 0.0124   Epoch: 12   Global Step: 537440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:06:26,423-Speed 2624.21 samples/sec   Loss 4.7259   LearningRate 0.0124   Epoch: 12   Global Step: 537450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:06:30,300-Speed 2641.77 samples/sec   Loss 4.6802   LearningRate 0.0124   Epoch: 12   Global Step: 537460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:34,222-Speed 2611.55 samples/sec   Loss 4.7319   LearningRate 0.0124   Epoch: 12   Global Step: 537470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:38,133-Speed 2619.54 samples/sec   Loss 4.7282   LearningRate 0.0124   Epoch: 12   Global Step: 537480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:42,033-Speed 2626.46 samples/sec   Loss 4.8488   LearningRate 0.0124   Epoch: 12   Global Step: 537490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:45,935-Speed 2624.55 samples/sec   Loss 4.7340   LearningRate 0.0124   Epoch: 12   Global Step: 537500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:49,835-Speed 2625.95 samples/sec   Loss 4.6642   LearningRate 0.0124   Epoch: 12   Global Step: 537510   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:53,740-Speed 2623.41 samples/sec   Loss 4.6825   LearningRate 0.0124   Epoch: 12   Global Step: 537520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:06:57,665-Speed 2609.43 samples/sec   Loss 4.7674   LearningRate 0.0124   Epoch: 12   Global Step: 537530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:07:01,581-Speed 2616.36 samples/sec   Loss 4.7598   LearningRate 0.0124   Epoch: 12   Global Step: 537540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:07:05,508-Speed 2608.43 samples/sec   Loss 4.7346   LearningRate 0.0124   Epoch: 12   Global Step: 537550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:07:09,419-Speed 2618.78 samples/sec   Loss 4.7737   LearningRate 0.0124   Epoch: 12   Global Step: 537560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:13,330-Speed 2619.34 samples/sec   Loss 4.7280   LearningRate 0.0124   Epoch: 12   Global Step: 537570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:17,237-Speed 2621.39 samples/sec   Loss 4.6864   LearningRate 0.0124   Epoch: 12   Global Step: 537580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:21,136-Speed 2627.04 samples/sec   Loss 4.8741   LearningRate 0.0124   Epoch: 12   Global Step: 537590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:25,057-Speed 2612.42 samples/sec   Loss 4.7708   LearningRate 0.0124   Epoch: 12   Global Step: 537600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:28,967-Speed 2619.73 samples/sec   Loss 4.7989   LearningRate 0.0124   Epoch: 12   Global Step: 537610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:32,874-Speed 2621.63 samples/sec   Loss 4.7128   LearningRate 0.0124   Epoch: 12   Global Step: 537620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:36,895-Speed 2547.92 samples/sec   Loss 4.8079   LearningRate 0.0124   Epoch: 12   Global Step: 537630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:40,794-Speed 2627.19 samples/sec   Loss 4.7586   LearningRate 0.0124   Epoch: 12   Global Step: 537640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:07:44,702-Speed 2620.38 samples/sec   Loss 4.7584   LearningRate 0.0124   Epoch: 12   Global Step: 537650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:07:48,615-Speed 2617.60 samples/sec   Loss 4.6319   LearningRate 0.0124   Epoch: 12   Global Step: 537660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:07:52,513-Speed 2627.74 samples/sec   Loss 4.6427   LearningRate 0.0124   Epoch: 12   Global Step: 537670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:07:56,415-Speed 2625.36 samples/sec   Loss 4.7279   LearningRate 0.0124   Epoch: 12   Global Step: 537680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:00,312-Speed 2628.55 samples/sec   Loss 4.6504   LearningRate 0.0124   Epoch: 12   Global Step: 537690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:04,240-Speed 2607.58 samples/sec   Loss 4.6799   LearningRate 0.0124   Epoch: 12   Global Step: 537700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:08,156-Speed 2616.74 samples/sec   Loss 4.7348   LearningRate 0.0124   Epoch: 12   Global Step: 537710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:12,060-Speed 2623.15 samples/sec   Loss 4.7287   LearningRate 0.0124   Epoch: 12   Global Step: 537720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:15,961-Speed 2625.40 samples/sec   Loss 4.7839   LearningRate 0.0124   Epoch: 12   Global Step: 537730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:19,860-Speed 2627.01 samples/sec   Loss 4.6062   LearningRate 0.0124   Epoch: 12   Global Step: 537740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:23,913-Speed 2528.00 samples/sec   Loss 4.7708   LearningRate 0.0124   Epoch: 12   Global Step: 537750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:08:27,869-Speed 2588.65 samples/sec   Loss 4.7097   LearningRate 0.0124   Epoch: 12   Global Step: 537760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:08:31,774-Speed 2623.21 samples/sec   Loss 4.7728   LearningRate 0.0124   Epoch: 12   Global Step: 537770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:08:35,671-Speed 2628.39 samples/sec   Loss 4.6764   LearningRate 0.0124   Epoch: 12   Global Step: 537780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:08:39,598-Speed 2608.36 samples/sec   Loss 4.7604   LearningRate 0.0124   Epoch: 12   Global Step: 537790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:08:43,513-Speed 2616.43 samples/sec   Loss 4.7951   LearningRate 0.0124   Epoch: 12   Global Step: 537800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:08:47,389-Speed 2642.76 samples/sec   Loss 4.7023   LearningRate 0.0124   Epoch: 12   Global Step: 537810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:51,281-Speed 2631.12 samples/sec   Loss 4.6970   LearningRate 0.0124   Epoch: 12   Global Step: 537820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:55,177-Speed 2629.77 samples/sec   Loss 4.7771   LearningRate 0.0124   Epoch: 12   Global Step: 537830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:08:59,072-Speed 2629.35 samples/sec   Loss 4.7149   LearningRate 0.0124   Epoch: 12   Global Step: 537840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:02,975-Speed 2624.10 samples/sec   Loss 4.6675   LearningRate 0.0124   Epoch: 12   Global Step: 537850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:06,870-Speed 2629.26 samples/sec   Loss 4.7473   LearningRate 0.0124   Epoch: 12   Global Step: 537860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:10,766-Speed 2629.26 samples/sec   Loss 4.6964   LearningRate 0.0124   Epoch: 12   Global Step: 537870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:14,775-Speed 2555.19 samples/sec   Loss 4.6163   LearningRate 0.0124   Epoch: 12   Global Step: 537880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:18,678-Speed 2624.21 samples/sec   Loss 4.6750   LearningRate 0.0124   Epoch: 12   Global Step: 537890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:22,571-Speed 2631.15 samples/sec   Loss 4.7145   LearningRate 0.0124   Epoch: 12   Global Step: 537900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:26,463-Speed 2631.89 samples/sec   Loss 4.7423   LearningRate 0.0124   Epoch: 12   Global Step: 537910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:09:30,336-Speed 2644.48 samples/sec   Loss 4.7502   LearningRate 0.0124   Epoch: 12   Global Step: 537920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:34,234-Speed 2627.93 samples/sec   Loss 4.6974   LearningRate 0.0124   Epoch: 12   Global Step: 537930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:38,141-Speed 2621.48 samples/sec   Loss 4.7306   LearningRate 0.0124   Epoch: 12   Global Step: 537940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:42,042-Speed 2625.54 samples/sec   Loss 4.6906   LearningRate 0.0124   Epoch: 12   Global Step: 537950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:45,938-Speed 2628.93 samples/sec   Loss 4.6845   LearningRate 0.0124   Epoch: 12   Global Step: 537960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:49,833-Speed 2629.44 samples/sec   Loss 4.6285   LearningRate 0.0124   Epoch: 12   Global Step: 537970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:53,726-Speed 2631.57 samples/sec   Loss 4.7235   LearningRate 0.0124   Epoch: 12   Global Step: 537980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:09:57,712-Speed 2569.26 samples/sec   Loss 4.6921   LearningRate 0.0124   Epoch: 12   Global Step: 537990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:01,606-Speed 2630.78 samples/sec   Loss 4.6785   LearningRate 0.0124   Epoch: 12   Global Step: 538000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:05,500-Speed 2630.15 samples/sec   Loss 4.8112   LearningRate 0.0124   Epoch: 12   Global Step: 538010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:09,391-Speed 2632.36 samples/sec   Loss 4.6811   LearningRate 0.0124   Epoch: 12   Global Step: 538020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:10:13,282-Speed 2632.22 samples/sec   Loss 4.6772   LearningRate 0.0124   Epoch: 12   Global Step: 538030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:17,183-Speed 2626.13 samples/sec   Loss 4.7035   LearningRate 0.0124   Epoch: 12   Global Step: 538040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:21,089-Speed 2622.48 samples/sec   Loss 4.8658   LearningRate 0.0123   Epoch: 12   Global Step: 538050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:24,998-Speed 2620.49 samples/sec   Loss 4.6160   LearningRate 0.0123   Epoch: 12   Global Step: 538060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:28,898-Speed 2626.35 samples/sec   Loss 4.9229   LearningRate 0.0123   Epoch: 12   Global Step: 538070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:32,798-Speed 2626.12 samples/sec   Loss 4.6714   LearningRate 0.0123   Epoch: 12   Global Step: 538080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:36,695-Speed 2628.25 samples/sec   Loss 4.6741   LearningRate 0.0123   Epoch: 12   Global Step: 538090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:40,594-Speed 2626.89 samples/sec   Loss 4.6891   LearningRate 0.0123   Epoch: 12   Global Step: 538100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:44,507-Speed 2617.84 samples/sec   Loss 4.6353   LearningRate 0.0123   Epoch: 12   Global Step: 538110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:48,410-Speed 2624.10 samples/sec   Loss 4.7309   LearningRate 0.0123   Epoch: 12   Global Step: 538120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:10:52,327-Speed 2615.83 samples/sec   Loss 4.7364   LearningRate 0.0123   Epoch: 12   Global Step: 538130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:10:56,211-Speed 2636.76 samples/sec   Loss 4.7358   LearningRate 0.0123   Epoch: 12   Global Step: 538140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:11:00,197-Speed 2569.52 samples/sec   Loss 4.7963   LearningRate 0.0123   Epoch: 12   Global Step: 538150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:11:04,101-Speed 2623.56 samples/sec   Loss 4.7939   LearningRate 0.0123   Epoch: 12   Global Step: 538160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:11:07,993-Speed 2631.61 samples/sec   Loss 4.7377   LearningRate 0.0123   Epoch: 12   Global Step: 538170   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:11,887-Speed 2630.86 samples/sec   Loss 4.7390   LearningRate 0.0123   Epoch: 12   Global Step: 538180   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:15,785-Speed 2629.43 samples/sec   Loss 4.7622   LearningRate 0.0123   Epoch: 12   Global Step: 538190   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:19,693-Speed 2620.13 samples/sec   Loss 4.6948   LearningRate 0.0123   Epoch: 12   Global Step: 538200   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:23,631-Speed 2601.79 samples/sec   Loss 4.7755   LearningRate 0.0123   Epoch: 12   Global Step: 538210   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:27,529-Speed 2627.43 samples/sec   Loss 4.7431   LearningRate 0.0123   Epoch: 12   Global Step: 538220   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:31,425-Speed 2629.53 samples/sec   Loss 4.7374   LearningRate 0.0123   Epoch: 12   Global Step: 538230   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:35,341-Speed 2615.24 samples/sec   Loss 4.6905   LearningRate 0.0123   Epoch: 12   Global Step: 538240   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:39,235-Speed 2630.34 samples/sec   Loss 4.7231   LearningRate 0.0123   Epoch: 12   Global Step: 538250   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:43,130-Speed 2629.63 samples/sec   Loss 4.6770   LearningRate 0.0123   Epoch: 12   Global Step: 538260   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-15 08:11:47,023-Speed 2631.52 samples/sec   Loss 4.6346   LearningRate 0.0123   Epoch: 12   Global Step: 538270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:11:50,958-Speed 2603.53 samples/sec   Loss 4.7209   LearningRate 0.0123   Epoch: 12   Global Step: 538280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:11:54,860-Speed 2624.38 samples/sec   Loss 4.7859   LearningRate 0.0123   Epoch: 12   Global Step: 538290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:11:58,789-Speed 2607.48 samples/sec   Loss 4.7554   LearningRate 0.0123   Epoch: 12   Global Step: 538300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:02,681-Speed 2631.36 samples/sec   Loss 4.7968   LearningRate 0.0123   Epoch: 12   Global Step: 538310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:06,576-Speed 2630.14 samples/sec   Loss 4.7234   LearningRate 0.0123   Epoch: 12   Global Step: 538320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:10,474-Speed 2627.31 samples/sec   Loss 4.6774   LearningRate 0.0123   Epoch: 12   Global Step: 538330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:14,406-Speed 2604.87 samples/sec   Loss 4.7233   LearningRate 0.0123   Epoch: 12   Global Step: 538340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:18,309-Speed 2624.84 samples/sec   Loss 4.7570   LearningRate 0.0123   Epoch: 12   Global Step: 538350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:22,208-Speed 2629.50 samples/sec   Loss 4.7003   LearningRate 0.0123   Epoch: 12   Global Step: 538360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:26,091-Speed 2638.56 samples/sec   Loss 4.6953   LearningRate 0.0123   Epoch: 12   Global Step: 538370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:29,986-Speed 2629.35 samples/sec   Loss 4.7484   LearningRate 0.0123   Epoch: 12   Global Step: 538380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:33,886-Speed 2627.06 samples/sec   Loss 4.7405   LearningRate 0.0123   Epoch: 12   Global Step: 538390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:37,784-Speed 2628.02 samples/sec   Loss 4.6670   LearningRate 0.0123   Epoch: 12   Global Step: 538400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:41,681-Speed 2628.49 samples/sec   Loss 4.6144   LearningRate 0.0123   Epoch: 12   Global Step: 538410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:45,593-Speed 2617.44 samples/sec   Loss 4.7054   LearningRate 0.0123   Epoch: 12   Global Step: 538420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:49,489-Speed 2628.96 samples/sec   Loss 4.7314   LearningRate 0.0123   Epoch: 12   Global Step: 538430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:53,388-Speed 2627.25 samples/sec   Loss 4.6657   LearningRate 0.0123   Epoch: 12   Global Step: 538440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:12:57,288-Speed 2626.26 samples/sec   Loss 4.7503   LearningRate 0.0123   Epoch: 12   Global Step: 538450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:01,189-Speed 2625.98 samples/sec   Loss 4.7465   LearningRate 0.0123   Epoch: 12   Global Step: 538460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:05,099-Speed 2619.19 samples/sec   Loss 4.6886   LearningRate 0.0123   Epoch: 12   Global Step: 538470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:13:08,981-Speed 2639.78 samples/sec   Loss 4.7599   LearningRate 0.0123   Epoch: 12   Global Step: 538480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:12,877-Speed 2628.77 samples/sec   Loss 4.7519   LearningRate 0.0123   Epoch: 12   Global Step: 538490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:16,780-Speed 2624.06 samples/sec   Loss 4.7332   LearningRate 0.0123   Epoch: 12   Global Step: 538500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:20,685-Speed 2623.03 samples/sec   Loss 4.7527   LearningRate 0.0123   Epoch: 12   Global Step: 538510   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:24,583-Speed 2627.45 samples/sec   Loss 4.8139   LearningRate 0.0123   Epoch: 12   Global Step: 538520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:28,486-Speed 2624.22 samples/sec   Loss 4.7168   LearningRate 0.0123   Epoch: 12   Global Step: 538530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:32,388-Speed 2624.86 samples/sec   Loss 4.6780   LearningRate 0.0123   Epoch: 12   Global Step: 538540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:36,289-Speed 2625.67 samples/sec   Loss 4.7085   LearningRate 0.0123   Epoch: 12   Global Step: 538550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:40,188-Speed 2627.02 samples/sec   Loss 4.7346   LearningRate 0.0123   Epoch: 12   Global Step: 538560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:44,083-Speed 2629.07 samples/sec   Loss 4.8157   LearningRate 0.0123   Epoch: 12   Global Step: 538570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:47,977-Speed 2630.11 samples/sec   Loss 4.7506   LearningRate 0.0123   Epoch: 12   Global Step: 538580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:13:51,873-Speed 2629.04 samples/sec   Loss 4.7545   LearningRate 0.0123   Epoch: 12   Global Step: 538590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:55,773-Speed 2626.79 samples/sec   Loss 4.7732   LearningRate 0.0123   Epoch: 12   Global Step: 538600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:13:59,684-Speed 2618.78 samples/sec   Loss 4.7342   LearningRate 0.0123   Epoch: 12   Global Step: 538610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:03,581-Speed 2628.62 samples/sec   Loss 4.7277   LearningRate 0.0123   Epoch: 12   Global Step: 538620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:07,484-Speed 2624.26 samples/sec   Loss 4.7747   LearningRate 0.0123   Epoch: 12   Global Step: 538630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:11,392-Speed 2620.84 samples/sec   Loss 4.7717   LearningRate 0.0123   Epoch: 12   Global Step: 538640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:15,302-Speed 2619.56 samples/sec   Loss 4.7908   LearningRate 0.0123   Epoch: 12   Global Step: 538650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:19,219-Speed 2614.33 samples/sec   Loss 4.7054   LearningRate 0.0123   Epoch: 12   Global Step: 538660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:23,132-Speed 2617.20 samples/sec   Loss 4.7679   LearningRate 0.0123   Epoch: 12   Global Step: 538670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:27,051-Speed 2614.19 samples/sec   Loss 4.7295   LearningRate 0.0123   Epoch: 12   Global Step: 538680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:30,948-Speed 2627.89 samples/sec   Loss 4.7186   LearningRate 0.0123   Epoch: 12   Global Step: 538690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:14:34,847-Speed 2627.15 samples/sec   Loss 4.6661   LearningRate 0.0123   Epoch: 12   Global Step: 538700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:14:38,749-Speed 2625.39 samples/sec   Loss 4.7542   LearningRate 0.0123   Epoch: 12   Global Step: 538710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:14:42,645-Speed 2629.02 samples/sec   Loss 4.7735   LearningRate 0.0123   Epoch: 12   Global Step: 538720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:46,546-Speed 2625.65 samples/sec   Loss 4.6812   LearningRate 0.0123   Epoch: 12   Global Step: 538730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:50,500-Speed 2589.98 samples/sec   Loss 4.7904   LearningRate 0.0123   Epoch: 12   Global Step: 538740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:54,400-Speed 2626.67 samples/sec   Loss 4.7646   LearningRate 0.0123   Epoch: 12   Global Step: 538750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:14:58,297-Speed 2627.90 samples/sec   Loss 4.7624   LearningRate 0.0123   Epoch: 12   Global Step: 538760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:02,236-Speed 2600.49 samples/sec   Loss 4.7533   LearningRate 0.0123   Epoch: 12   Global Step: 538770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:06,133-Speed 2628.15 samples/sec   Loss 4.7456   LearningRate 0.0123   Epoch: 12   Global Step: 538780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:10,026-Speed 2631.31 samples/sec   Loss 4.7157   LearningRate 0.0123   Epoch: 12   Global Step: 538790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:13,923-Speed 2627.77 samples/sec   Loss 4.6304   LearningRate 0.0123   Epoch: 12   Global Step: 538800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:17,824-Speed 2626.11 samples/sec   Loss 4.6514   LearningRate 0.0123   Epoch: 12   Global Step: 538810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:21,727-Speed 2624.33 samples/sec   Loss 4.7276   LearningRate 0.0123   Epoch: 12   Global Step: 538820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:15:25,602-Speed 2643.16 samples/sec   Loss 4.7099   LearningRate 0.0123   Epoch: 12   Global Step: 538830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:29,509-Speed 2621.87 samples/sec   Loss 4.7447   LearningRate 0.0123   Epoch: 12   Global Step: 538840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:33,426-Speed 2614.47 samples/sec   Loss 4.7612   LearningRate 0.0123   Epoch: 12   Global Step: 538850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:37,329-Speed 2624.09 samples/sec   Loss 4.7025   LearningRate 0.0123   Epoch: 12   Global Step: 538860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:41,236-Speed 2621.12 samples/sec   Loss 4.7450   LearningRate 0.0123   Epoch: 12   Global Step: 538870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:45,131-Speed 2629.76 samples/sec   Loss 4.6741   LearningRate 0.0123   Epoch: 12   Global Step: 538880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:49,027-Speed 2629.27 samples/sec   Loss 4.7283   LearningRate 0.0123   Epoch: 12   Global Step: 538890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:52,929-Speed 2624.88 samples/sec   Loss 4.7231   LearningRate 0.0123   Epoch: 12   Global Step: 538900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:15:56,826-Speed 2628.54 samples/sec   Loss 4.6412   LearningRate 0.0123   Epoch: 12   Global Step: 538910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:00,722-Speed 2628.86 samples/sec   Loss 4.7631   LearningRate 0.0123   Epoch: 12   Global Step: 538920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:04,660-Speed 2600.66 samples/sec   Loss 4.7849   LearningRate 0.0123   Epoch: 12   Global Step: 538930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:16:08,556-Speed 2629.19 samples/sec   Loss 4.7026   LearningRate 0.0123   Epoch: 12   Global Step: 538940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:16:12,448-Speed 2631.51 samples/sec   Loss 4.7360   LearningRate 0.0123   Epoch: 12   Global Step: 538950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:16:16,345-Speed 2628.04 samples/sec   Loss 4.7842   LearningRate 0.0123   Epoch: 12   Global Step: 538960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:16:20,241-Speed 2629.22 samples/sec   Loss 4.6564   LearningRate 0.0123   Epoch: 12   Global Step: 538970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:16:24,114-Speed 2644.34 samples/sec   Loss 4.7836   LearningRate 0.0123   Epoch: 12   Global Step: 538980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:28,011-Speed 2628.71 samples/sec   Loss 4.7422   LearningRate 0.0123   Epoch: 12   Global Step: 538990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:31,908-Speed 2628.54 samples/sec   Loss 4.7790   LearningRate 0.0123   Epoch: 12   Global Step: 539000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:35,804-Speed 2628.73 samples/sec   Loss 4.7853   LearningRate 0.0123   Epoch: 12   Global Step: 539010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:39,700-Speed 2628.55 samples/sec   Loss 4.7716   LearningRate 0.0123   Epoch: 12   Global Step: 539020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:43,599-Speed 2626.83 samples/sec   Loss 4.7216   LearningRate 0.0123   Epoch: 12   Global Step: 539030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:47,501-Speed 2625.46 samples/sec   Loss 4.7835   LearningRate 0.0123   Epoch: 12   Global Step: 539040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:51,395-Speed 2630.13 samples/sec   Loss 4.7045   LearningRate 0.0123   Epoch: 12   Global Step: 539050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:55,300-Speed 2622.73 samples/sec   Loss 4.5857   LearningRate 0.0123   Epoch: 12   Global Step: 539060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:16:59,202-Speed 2625.52 samples/sec   Loss 4.6983   LearningRate 0.0123   Epoch: 12   Global Step: 539070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:03,098-Speed 2628.58 samples/sec   Loss 4.7311   LearningRate 0.0123   Epoch: 12   Global Step: 539080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:17:06,998-Speed 2626.20 samples/sec   Loss 4.8203   LearningRate 0.0123   Epoch: 12   Global Step: 539090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:17:10,895-Speed 2628.15 samples/sec   Loss 4.7000   LearningRate 0.0123   Epoch: 12   Global Step: 539100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:17:14,772-Speed 2642.58 samples/sec   Loss 4.6610   LearningRate 0.0123   Epoch: 12   Global Step: 539110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:18,674-Speed 2624.18 samples/sec   Loss 4.7045   LearningRate 0.0123   Epoch: 12   Global Step: 539120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:22,573-Speed 2627.39 samples/sec   Loss 4.7425   LearningRate 0.0123   Epoch: 12   Global Step: 539130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:26,470-Speed 2628.19 samples/sec   Loss 4.7401   LearningRate 0.0123   Epoch: 12   Global Step: 539140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:30,369-Speed 2627.11 samples/sec   Loss 4.7179   LearningRate 0.0123   Epoch: 12   Global Step: 539150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:34,268-Speed 2627.00 samples/sec   Loss 4.7911   LearningRate 0.0123   Epoch: 12   Global Step: 539160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:38,164-Speed 2628.74 samples/sec   Loss 4.7737   LearningRate 0.0123   Epoch: 12   Global Step: 539170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:42,060-Speed 2629.15 samples/sec   Loss 4.8351   LearningRate 0.0123   Epoch: 12   Global Step: 539180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:45,954-Speed 2630.61 samples/sec   Loss 4.8690   LearningRate 0.0123   Epoch: 12   Global Step: 539190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:49,848-Speed 2629.67 samples/sec   Loss 4.6936   LearningRate 0.0123   Epoch: 12   Global Step: 539200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:17:53,740-Speed 2632.20 samples/sec   Loss 4.7707   LearningRate 0.0123   Epoch: 12   Global Step: 539210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:17:57,633-Speed 2630.83 samples/sec   Loss 4.7296   LearningRate 0.0123   Epoch: 12   Global Step: 539220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:18:19,113-Speed 476.74 samples/sec   Loss 4.7703   LearningRate 0.0122   Epoch: 13   Global Step: 539230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:18:22,973-Speed 2654.25 samples/sec   Loss 4.7500   LearningRate 0.0122   Epoch: 13   Global Step: 539240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:26,867-Speed 2630.27 samples/sec   Loss 4.7773   LearningRate 0.0122   Epoch: 13   Global Step: 539250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:30,769-Speed 2624.94 samples/sec   Loss 4.8075   LearningRate 0.0122   Epoch: 13   Global Step: 539260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:34,658-Speed 2634.21 samples/sec   Loss 4.7186   LearningRate 0.0122   Epoch: 13   Global Step: 539270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:38,553-Speed 2629.58 samples/sec   Loss 4.7342   LearningRate 0.0122   Epoch: 13   Global Step: 539280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:42,449-Speed 2628.70 samples/sec   Loss 4.7190   LearningRate 0.0122   Epoch: 13   Global Step: 539290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:46,350-Speed 2626.31 samples/sec   Loss 4.7301   LearningRate 0.0122   Epoch: 13   Global Step: 539300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:50,252-Speed 2624.32 samples/sec   Loss 4.6973   LearningRate 0.0122   Epoch: 13   Global Step: 539310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:54,181-Speed 2607.33 samples/sec   Loss 4.6979   LearningRate 0.0122   Epoch: 13   Global Step: 539320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:18:58,080-Speed 2626.87 samples/sec   Loss 4.6166   LearningRate 0.0122   Epoch: 13   Global Step: 539330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:01,980-Speed 2626.81 samples/sec   Loss 4.7125   LearningRate 0.0122   Epoch: 13   Global Step: 539340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:19:05,859-Speed 2640.20 samples/sec   Loss 4.7187   LearningRate 0.0122   Epoch: 13   Global Step: 539350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:09,783-Speed 2610.46 samples/sec   Loss 4.6808   LearningRate 0.0122   Epoch: 13   Global Step: 539360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:13,680-Speed 2627.99 samples/sec   Loss 4.6697   LearningRate 0.0122   Epoch: 13   Global Step: 539370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:17,598-Speed 2614.67 samples/sec   Loss 4.6173   LearningRate 0.0122   Epoch: 13   Global Step: 539380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:21,496-Speed 2627.81 samples/sec   Loss 4.8507   LearningRate 0.0122   Epoch: 13   Global Step: 539390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:25,396-Speed 2626.18 samples/sec   Loss 4.6801   LearningRate 0.0122   Epoch: 13   Global Step: 539400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:29,325-Speed 2607.19 samples/sec   Loss 4.7122   LearningRate 0.0122   Epoch: 13   Global Step: 539410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:33,229-Speed 2623.95 samples/sec   Loss 4.7245   LearningRate 0.0122   Epoch: 13   Global Step: 539420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:37,129-Speed 2625.76 samples/sec   Loss 4.7654   LearningRate 0.0122   Epoch: 13   Global Step: 539430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:41,029-Speed 2626.36 samples/sec   Loss 4.6609   LearningRate 0.0122   Epoch: 13   Global Step: 539440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:19:44,959-Speed 2606.93 samples/sec   Loss 4.7407   LearningRate 0.0122   Epoch: 13   Global Step: 539450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:19:48,862-Speed 2623.90 samples/sec   Loss 4.7325   LearningRate 0.0122   Epoch: 13   Global Step: 539460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:19:52,791-Speed 2607.19 samples/sec   Loss 4.6795   LearningRate 0.0122   Epoch: 13   Global Step: 539470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:19:56,686-Speed 2630.18 samples/sec   Loss 4.7473   LearningRate 0.0122   Epoch: 13   Global Step: 539480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:20:00,586-Speed 2626.73 samples/sec   Loss 4.7234   LearningRate 0.0122   Epoch: 13   Global Step: 539490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:20:04,487-Speed 2625.14 samples/sec   Loss 4.6194   LearningRate 0.0122   Epoch: 13   Global Step: 539500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:20:08,383-Speed 2628.99 samples/sec   Loss 4.6483   LearningRate 0.0122   Epoch: 13   Global Step: 539510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:20:12,275-Speed 2631.68 samples/sec   Loss 4.6652   LearningRate 0.0122   Epoch: 13   Global Step: 539520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:16,177-Speed 2625.23 samples/sec   Loss 4.7311   LearningRate 0.0122   Epoch: 13   Global Step: 539530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:20,072-Speed 2629.39 samples/sec   Loss 4.6404   LearningRate 0.0122   Epoch: 13   Global Step: 539540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:23,983-Speed 2618.93 samples/sec   Loss 4.7051   LearningRate 0.0122   Epoch: 13   Global Step: 539550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:27,888-Speed 2622.80 samples/sec   Loss 4.7686   LearningRate 0.0122   Epoch: 13   Global Step: 539560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:31,787-Speed 2627.79 samples/sec   Loss 4.7738   LearningRate 0.0122   Epoch: 13   Global Step: 539570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:35,679-Speed 2631.55 samples/sec   Loss 4.6848   LearningRate 0.0122   Epoch: 13   Global Step: 539580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:39,583-Speed 2623.28 samples/sec   Loss 4.7793   LearningRate 0.0122   Epoch: 13   Global Step: 539590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:43,490-Speed 2621.74 samples/sec   Loss 4.7407   LearningRate 0.0122   Epoch: 13   Global Step: 539600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:47,385-Speed 2629.53 samples/sec   Loss 4.6551   LearningRate 0.0122   Epoch: 13   Global Step: 539610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:20:51,279-Speed 2630.47 samples/sec   Loss 4.6717   LearningRate 0.0122   Epoch: 13   Global Step: 539620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:20:55,176-Speed 2628.87 samples/sec   Loss 4.7373   LearningRate 0.0122   Epoch: 13   Global Step: 539630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:20:59,073-Speed 2627.87 samples/sec   Loss 4.7437   LearningRate 0.0122   Epoch: 13   Global Step: 539640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:21:02,971-Speed 2629.04 samples/sec   Loss 4.5599   LearningRate 0.0122   Epoch: 13   Global Step: 539650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:21:06,877-Speed 2621.51 samples/sec   Loss 4.6794   LearningRate 0.0122   Epoch: 13   Global Step: 539660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:21:10,779-Speed 2625.33 samples/sec   Loss 4.6557   LearningRate 0.0122   Epoch: 13   Global Step: 539670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:21:14,683-Speed 2623.47 samples/sec   Loss 4.7406   LearningRate 0.0122   Epoch: 13   Global Step: 539680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:21:18,556-Speed 2644.24 samples/sec   Loss 4.7776   LearningRate 0.0122   Epoch: 13   Global Step: 539690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:22,461-Speed 2623.31 samples/sec   Loss 4.6803   LearningRate 0.0122   Epoch: 13   Global Step: 539700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:26,371-Speed 2619.45 samples/sec   Loss 4.6972   LearningRate 0.0122   Epoch: 13   Global Step: 539710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:30,272-Speed 2626.23 samples/sec   Loss 4.6774   LearningRate 0.0122   Epoch: 13   Global Step: 539720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:34,171-Speed 2626.71 samples/sec   Loss 4.6149   LearningRate 0.0122   Epoch: 13   Global Step: 539730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:38,080-Speed 2619.72 samples/sec   Loss 4.7018   LearningRate 0.0122   Epoch: 13   Global Step: 539740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:41,977-Speed 2628.69 samples/sec   Loss 4.6532   LearningRate 0.0122   Epoch: 13   Global Step: 539750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:45,872-Speed 2630.58 samples/sec   Loss 4.6450   LearningRate 0.0122   Epoch: 13   Global Step: 539760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:49,767-Speed 2629.43 samples/sec   Loss 4.7802   LearningRate 0.0122   Epoch: 13   Global Step: 539770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:53,691-Speed 2610.79 samples/sec   Loss 4.6431   LearningRate 0.0122   Epoch: 13   Global Step: 539780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:21:57,561-Speed 2646.62 samples/sec   Loss 4.5910   LearningRate 0.0122   Epoch: 13   Global Step: 539790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:01,458-Speed 2628.89 samples/sec   Loss 4.7147   LearningRate 0.0122   Epoch: 13   Global Step: 539800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:05,375-Speed 2614.63 samples/sec   Loss 4.6412   LearningRate 0.0122   Epoch: 13   Global Step: 539810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:09,273-Speed 2627.37 samples/sec   Loss 4.6482   LearningRate 0.0122   Epoch: 13   Global Step: 539820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:13,170-Speed 2628.38 samples/sec   Loss 4.7330   LearningRate 0.0122   Epoch: 13   Global Step: 539830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:17,065-Speed 2630.59 samples/sec   Loss 4.7910   LearningRate 0.0122   Epoch: 13   Global Step: 539840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:20,967-Speed 2624.49 samples/sec   Loss 4.5896   LearningRate 0.0122   Epoch: 13   Global Step: 539850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:24,864-Speed 2628.36 samples/sec   Loss 4.7403   LearningRate 0.0122   Epoch: 13   Global Step: 539860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:28,770-Speed 2622.38 samples/sec   Loss 4.7220   LearningRate 0.0122   Epoch: 13   Global Step: 539870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:32,676-Speed 2622.79 samples/sec   Loss 4.7391   LearningRate 0.0122   Epoch: 13   Global Step: 539880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:36,570-Speed 2629.94 samples/sec   Loss 4.7044   LearningRate 0.0122   Epoch: 13   Global Step: 539890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:22:40,467-Speed 2628.37 samples/sec   Loss 4.7056   LearningRate 0.0122   Epoch: 13   Global Step: 539900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:22:44,366-Speed 2626.72 samples/sec   Loss 4.7331   LearningRate 0.0122   Epoch: 13   Global Step: 539910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:22:48,260-Speed 2630.68 samples/sec   Loss 4.6811   LearningRate 0.0122   Epoch: 13   Global Step: 539920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:22:52,155-Speed 2630.71 samples/sec   Loss 4.7016   LearningRate 0.0122   Epoch: 13   Global Step: 539930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:56,049-Speed 2630.27 samples/sec   Loss 4.6914   LearningRate 0.0122   Epoch: 13   Global Step: 539940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:22:59,963-Speed 2616.73 samples/sec   Loss 4.6933   LearningRate 0.0122   Epoch: 13   Global Step: 539950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:23:03,871-Speed 2621.21 samples/sec   Loss 4.6264   LearningRate 0.0122   Epoch: 13   Global Step: 539960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:23:07,774-Speed 2623.97 samples/sec   Loss 4.6354   LearningRate 0.0122   Epoch: 13   Global Step: 539970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:23:11,728-Speed 2590.44 samples/sec   Loss 4.7110   LearningRate 0.0122   Epoch: 13   Global Step: 539980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:23:15,637-Speed 2620.52 samples/sec   Loss 4.6226   LearningRate 0.0122   Epoch: 13   Global Step: 539990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:23:19,542-Speed 2623.10 samples/sec   Loss 4.6408   LearningRate 0.0122   Epoch: 13   Global Step: 540000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:24:02,982-[lfw][540000]XNorm: 22.376632
Training: 2022-04-15 08:24:02,983-[lfw][540000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 08:24:02,983-[lfw][540000]Accuracy-Highest: 0.99800
Training: 2022-04-15 08:24:53,381-[cfp_fp][540000]XNorm: 20.796383
Training: 2022-04-15 08:24:53,382-[cfp_fp][540000]Accuracy-Flip: 0.98971+-0.00423
Training: 2022-04-15 08:24:53,383-[cfp_fp][540000]Accuracy-Highest: 0.99086
Training: 2022-04-15 08:25:36,817-[agedb_30][540000]XNorm: 22.333698
Training: 2022-04-15 08:25:36,818-[agedb_30][540000]Accuracy-Flip: 0.97917+-0.00720
Training: 2022-04-15 08:25:36,819-[agedb_30][540000]Accuracy-Highest: 0.98083
Training: 2022-04-15 08:25:40,713-Speed 72.54 samples/sec   Loss 4.6632   LearningRate 0.0122   Epoch: 13   Global Step: 540010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:25:44,755-Speed 2533.87 samples/sec   Loss 4.7313   LearningRate 0.0122   Epoch: 13   Global Step: 540020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:25:48,639-Speed 2637.78 samples/sec   Loss 4.7169   LearningRate 0.0122   Epoch: 13   Global Step: 540030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:25:52,528-Speed 2633.30 samples/sec   Loss 4.7077   LearningRate 0.0122   Epoch: 13   Global Step: 540040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:25:56,403-Speed 2643.31 samples/sec   Loss 4.6123   LearningRate 0.0122   Epoch: 13   Global Step: 540050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:00,282-Speed 2640.73 samples/sec   Loss 4.6621   LearningRate 0.0122   Epoch: 13   Global Step: 540060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:04,161-Speed 2640.90 samples/sec   Loss 4.7379   LearningRate 0.0122   Epoch: 13   Global Step: 540070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:08,074-Speed 2617.57 samples/sec   Loss 4.6431   LearningRate 0.0122   Epoch: 13   Global Step: 540080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:11,964-Speed 2633.44 samples/sec   Loss 4.6485   LearningRate 0.0122   Epoch: 13   Global Step: 540090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:15,860-Speed 2629.71 samples/sec   Loss 4.7362   LearningRate 0.0122   Epoch: 13   Global Step: 540100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:19,741-Speed 2639.12 samples/sec   Loss 4.7054   LearningRate 0.0122   Epoch: 13   Global Step: 540110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:23,634-Speed 2630.35 samples/sec   Loss 4.7243   LearningRate 0.0122   Epoch: 13   Global Step: 540120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:27,533-Speed 2627.18 samples/sec   Loss 4.6247   LearningRate 0.0122   Epoch: 13   Global Step: 540130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:31,432-Speed 2627.33 samples/sec   Loss 4.6666   LearningRate 0.0122   Epoch: 13   Global Step: 540140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:26:35,307-Speed 2643.66 samples/sec   Loss 4.6581   LearningRate 0.0122   Epoch: 13   Global Step: 540150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:39,229-Speed 2611.23 samples/sec   Loss 4.7735   LearningRate 0.0122   Epoch: 13   Global Step: 540160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:43,122-Speed 2631.33 samples/sec   Loss 4.6889   LearningRate 0.0122   Epoch: 13   Global Step: 540170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:47,017-Speed 2629.73 samples/sec   Loss 4.7480   LearningRate 0.0122   Epoch: 13   Global Step: 540180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:50,912-Speed 2629.83 samples/sec   Loss 4.7325   LearningRate 0.0122   Epoch: 13   Global Step: 540190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:54,809-Speed 2627.72 samples/sec   Loss 4.6339   LearningRate 0.0122   Epoch: 13   Global Step: 540200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:26:58,718-Speed 2620.82 samples/sec   Loss 4.6713   LearningRate 0.0122   Epoch: 13   Global Step: 540210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:27:02,607-Speed 2634.02 samples/sec   Loss 4.6846   LearningRate 0.0122   Epoch: 13   Global Step: 540220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:27:06,505-Speed 2627.55 samples/sec   Loss 4.6574   LearningRate 0.0122   Epoch: 13   Global Step: 540230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:27:10,410-Speed 2623.02 samples/sec   Loss 4.7081   LearningRate 0.0122   Epoch: 13   Global Step: 540240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:27:14,301-Speed 2632.50 samples/sec   Loss 4.7201   LearningRate 0.0122   Epoch: 13   Global Step: 540250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:18,195-Speed 2629.65 samples/sec   Loss 4.6269   LearningRate 0.0122   Epoch: 13   Global Step: 540260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:22,084-Speed 2634.16 samples/sec   Loss 4.6658   LearningRate 0.0122   Epoch: 13   Global Step: 540270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:26,002-Speed 2613.95 samples/sec   Loss 4.7204   LearningRate 0.0122   Epoch: 13   Global Step: 540280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:29,902-Speed 2626.94 samples/sec   Loss 4.7006   LearningRate 0.0122   Epoch: 13   Global Step: 540290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:33,796-Speed 2630.53 samples/sec   Loss 4.6949   LearningRate 0.0122   Epoch: 13   Global Step: 540300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:37,690-Speed 2629.91 samples/sec   Loss 4.7273   LearningRate 0.0122   Epoch: 13   Global Step: 540310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:41,584-Speed 2630.62 samples/sec   Loss 4.7323   LearningRate 0.0122   Epoch: 13   Global Step: 540320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:45,477-Speed 2631.17 samples/sec   Loss 4.7172   LearningRate 0.0122   Epoch: 13   Global Step: 540330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:49,372-Speed 2629.97 samples/sec   Loss 4.7196   LearningRate 0.0122   Epoch: 13   Global Step: 540340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:27:53,270-Speed 2627.45 samples/sec   Loss 4.7666   LearningRate 0.0122   Epoch: 13   Global Step: 540350   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-15 08:27:57,278-Speed 2555.57 samples/sec   Loss 4.6359   LearningRate 0.0122   Epoch: 13   Global Step: 540360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:28:01,368-Speed 2504.23 samples/sec   Loss 4.7516   LearningRate 0.0122   Epoch: 13   Global Step: 540370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-15 08:28:05,257-Speed 2634.27 samples/sec   Loss 4.6824   LearningRate 0.0122   Epoch: 13   Global Step: 540380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:28:09,154-Speed 2627.93 samples/sec   Loss 4.6229   LearningRate 0.0122   Epoch: 13   Global Step: 540390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:28:13,051-Speed 2628.87 samples/sec   Loss 4.6510   LearningRate 0.0122   Epoch: 13   Global Step: 540400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:28:16,949-Speed 2627.57 samples/sec   Loss 4.7831   LearningRate 0.0122   Epoch: 13   Global Step: 540410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:28:20,837-Speed 2634.37 samples/sec   Loss 4.6664   LearningRate 0.0121   Epoch: 13   Global Step: 540420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-15 08:28:24,734-Speed 2628.30 samples/sec   Loss 4.7675   LearningRate 0.0121   Epoch: 13   Global Step: 540430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:28:28,629-Speed 2630.06 samples/sec   Loss 4.6516   LearningRate 0.0121   Epoch: 13   Global Step: 540440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:28:32,565-Speed 2602.06 samples/sec   Loss 4.7528   LearningRate 0.0121   Epoch: 13   Global Step: 540450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:28:36,466-Speed 2626.23 samples/sec   Loss 4.8771   LearningRate 0.0121   Epoch: 13   Global Step: 540460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:28:40,366-Speed 2626.66 samples/sec   Loss 4.6180   LearningRate 0.0121   Epoch: 13   Global Step: 540470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:28:44,292-Speed 2608.61 samples/sec   Loss 4.7323   LearningRate 0.0121   Epoch: 13   Global Step: 540480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:28:48,198-Speed 2622.27 samples/sec   Loss 4.7319   LearningRate 0.0121   Epoch: 13   Global Step: 540490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:28:52,089-Speed 2632.34 samples/sec   Loss 4.6472   LearningRate 0.0121   Epoch: 13   Global Step: 540500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:28:55,982-Speed 2630.90 samples/sec   Loss 4.6805   LearningRate 0.0121   Epoch: 13   Global Step: 540510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:28:59,875-Speed 2631.09 samples/sec   Loss 4.6873   LearningRate 0.0121   Epoch: 13   Global Step: 540520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:03,773-Speed 2627.74 samples/sec   Loss 4.7390   LearningRate 0.0121   Epoch: 13   Global Step: 540530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:07,664-Speed 2632.99 samples/sec   Loss 4.6923   LearningRate 0.0121   Epoch: 13   Global Step: 540540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:11,554-Speed 2632.61 samples/sec   Loss 4.7008   LearningRate 0.0121   Epoch: 13   Global Step: 540550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:15,448-Speed 2630.52 samples/sec   Loss 4.6192   LearningRate 0.0121   Epoch: 13   Global Step: 540560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:19,383-Speed 2602.62 samples/sec   Loss 4.6934   LearningRate 0.0121   Epoch: 13   Global Step: 540570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:23,251-Speed 2648.09 samples/sec   Loss 4.7480   LearningRate 0.0121   Epoch: 13   Global Step: 540580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:27,141-Speed 2633.09 samples/sec   Loss 4.6506   LearningRate 0.0121   Epoch: 13   Global Step: 540590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:31,066-Speed 2609.89 samples/sec   Loss 4.7559   LearningRate 0.0121   Epoch: 13   Global Step: 540600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:34,961-Speed 2629.78 samples/sec   Loss 4.7045   LearningRate 0.0121   Epoch: 13   Global Step: 540610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:38,852-Speed 2632.77 samples/sec   Loss 4.6807   LearningRate 0.0121   Epoch: 13   Global Step: 540620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:42,750-Speed 2627.63 samples/sec   Loss 4.5710   LearningRate 0.0121   Epoch: 13   Global Step: 540630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:46,648-Speed 2628.47 samples/sec   Loss 4.6556   LearningRate 0.0121   Epoch: 13   Global Step: 540640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:29:50,509-Speed 2652.10 samples/sec   Loss 4.7326   LearningRate 0.0121   Epoch: 13   Global Step: 540650   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:29:54,422-Speed 2617.72 samples/sec   Loss 4.8145   LearningRate 0.0121   Epoch: 13   Global Step: 540660   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:29:58,316-Speed 2630.02 samples/sec   Loss 4.5972   LearningRate 0.0121   Epoch: 13   Global Step: 540670   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:02,212-Speed 2629.64 samples/sec   Loss 4.6780   LearningRate 0.0121   Epoch: 13   Global Step: 540680   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:06,108-Speed 2628.59 samples/sec   Loss 4.7834   LearningRate 0.0121   Epoch: 13   Global Step: 540690   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:09,998-Speed 2633.16 samples/sec   Loss 4.5576   LearningRate 0.0121   Epoch: 13   Global Step: 540700   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:13,888-Speed 2632.93 samples/sec   Loss 4.7727   LearningRate 0.0121   Epoch: 13   Global Step: 540710   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:17,789-Speed 2626.21 samples/sec   Loss 4.6840   LearningRate 0.0121   Epoch: 13   Global Step: 540720   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:21,721-Speed 2604.29 samples/sec   Loss 4.7291   LearningRate 0.0121   Epoch: 13   Global Step: 540730   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:25,800-Speed 2511.33 samples/sec   Loss 4.6791   LearningRate 0.0121   Epoch: 13   Global Step: 540740   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:30:29,887-Speed 2505.72 samples/sec   Loss 4.7322   LearningRate 0.0121   Epoch: 13   Global Step: 540750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:30:33,978-Speed 2504.06 samples/sec   Loss 4.7373   LearningRate 0.0121   Epoch: 13   Global Step: 540760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:30:38,064-Speed 2507.14 samples/sec   Loss 4.6750   LearningRate 0.0121   Epoch: 13   Global Step: 540770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:30:42,104-Speed 2535.26 samples/sec   Loss 4.6430   LearningRate 0.0121   Epoch: 13   Global Step: 540780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:30:45,998-Speed 2630.43 samples/sec   Loss 4.6758   LearningRate 0.0121   Epoch: 13   Global Step: 540790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:30:49,891-Speed 2631.23 samples/sec   Loss 4.6367   LearningRate 0.0121   Epoch: 13   Global Step: 540800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:30:53,785-Speed 2630.58 samples/sec   Loss 4.7259   LearningRate 0.0121   Epoch: 13   Global Step: 540810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:30:57,691-Speed 2621.88 samples/sec   Loss 4.6077   LearningRate 0.0121   Epoch: 13   Global Step: 540820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:01,588-Speed 2628.57 samples/sec   Loss 4.7406   LearningRate 0.0121   Epoch: 13   Global Step: 540830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:05,483-Speed 2629.62 samples/sec   Loss 4.8139   LearningRate 0.0121   Epoch: 13   Global Step: 540840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:09,384-Speed 2625.99 samples/sec   Loss 4.7669   LearningRate 0.0121   Epoch: 13   Global Step: 540850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:31:13,280-Speed 2629.41 samples/sec   Loss 4.6048   LearningRate 0.0121   Epoch: 13   Global Step: 540860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:31:17,173-Speed 2630.91 samples/sec   Loss 4.7511   LearningRate 0.0121   Epoch: 13   Global Step: 540870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:31:21,065-Speed 2632.15 samples/sec   Loss 4.7333   LearningRate 0.0121   Epoch: 13   Global Step: 540880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:31:24,967-Speed 2624.78 samples/sec   Loss 4.7068   LearningRate 0.0121   Epoch: 13   Global Step: 540890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:31:28,862-Speed 2629.43 samples/sec   Loss 4.6526   LearningRate 0.0121   Epoch: 13   Global Step: 540900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:31:32,776-Speed 2617.07 samples/sec   Loss 4.7219   LearningRate 0.0121   Epoch: 13   Global Step: 540910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:36,674-Speed 2626.99 samples/sec   Loss 4.6760   LearningRate 0.0121   Epoch: 13   Global Step: 540920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:40,576-Speed 2625.60 samples/sec   Loss 4.6180   LearningRate 0.0121   Epoch: 13   Global Step: 540930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:44,472-Speed 2629.38 samples/sec   Loss 4.6863   LearningRate 0.0121   Epoch: 13   Global Step: 540940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:48,371-Speed 2627.47 samples/sec   Loss 4.7232   LearningRate 0.0121   Epoch: 13   Global Step: 540950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:52,416-Speed 2532.23 samples/sec   Loss 4.7622   LearningRate 0.0121   Epoch: 13   Global Step: 540960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:31:56,309-Speed 2631.66 samples/sec   Loss 4.6414   LearningRate 0.0121   Epoch: 13   Global Step: 540970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:32:00,219-Speed 2619.70 samples/sec   Loss 4.7202   LearningRate 0.0121   Epoch: 13   Global Step: 540980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:32:04,114-Speed 2630.02 samples/sec   Loss 4.6193   LearningRate 0.0121   Epoch: 13   Global Step: 540990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:32:08,012-Speed 2627.52 samples/sec   Loss 4.7341   LearningRate 0.0121   Epoch: 13   Global Step: 541000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:32:11,925-Speed 2617.87 samples/sec   Loss 4.7051   LearningRate 0.0121   Epoch: 13   Global Step: 541010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:15,816-Speed 2631.86 samples/sec   Loss 4.6737   LearningRate 0.0121   Epoch: 13   Global Step: 541020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:19,713-Speed 2628.70 samples/sec   Loss 4.6602   LearningRate 0.0121   Epoch: 13   Global Step: 541030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:23,635-Speed 2611.50 samples/sec   Loss 4.6970   LearningRate 0.0121   Epoch: 13   Global Step: 541040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:27,533-Speed 2628.15 samples/sec   Loss 4.7259   LearningRate 0.0121   Epoch: 13   Global Step: 541050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:31,425-Speed 2631.43 samples/sec   Loss 4.5699   LearningRate 0.0121   Epoch: 13   Global Step: 541060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:35,321-Speed 2628.95 samples/sec   Loss 4.5676   LearningRate 0.0121   Epoch: 13   Global Step: 541070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:39,235-Speed 2616.94 samples/sec   Loss 4.7252   LearningRate 0.0121   Epoch: 13   Global Step: 541080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:43,127-Speed 2631.91 samples/sec   Loss 4.6663   LearningRate 0.0121   Epoch: 13   Global Step: 541090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:47,022-Speed 2629.48 samples/sec   Loss 4.6884   LearningRate 0.0121   Epoch: 13   Global Step: 541100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:50,899-Speed 2642.01 samples/sec   Loss 4.7183   LearningRate 0.0121   Epoch: 13   Global Step: 541110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:32:54,785-Speed 2635.80 samples/sec   Loss 4.7934   LearningRate 0.0121   Epoch: 13   Global Step: 541120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:32:58,685-Speed 2626.64 samples/sec   Loss 4.5808   LearningRate 0.0121   Epoch: 13   Global Step: 541130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:02,600-Speed 2616.03 samples/sec   Loss 4.7119   LearningRate 0.0121   Epoch: 13   Global Step: 541140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:06,700-Speed 2498.24 samples/sec   Loss 4.5894   LearningRate 0.0121   Epoch: 13   Global Step: 541150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:10,751-Speed 2528.37 samples/sec   Loss 4.7195   LearningRate 0.0121   Epoch: 13   Global Step: 541160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:14,646-Speed 2630.06 samples/sec   Loss 4.6560   LearningRate 0.0121   Epoch: 13   Global Step: 541170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:18,542-Speed 2629.52 samples/sec   Loss 4.7124   LearningRate 0.0121   Epoch: 13   Global Step: 541180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:22,451-Speed 2620.01 samples/sec   Loss 4.7440   LearningRate 0.0121   Epoch: 13   Global Step: 541190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:26,349-Speed 2628.40 samples/sec   Loss 4.7238   LearningRate 0.0121   Epoch: 13   Global Step: 541200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:30,263-Speed 2616.94 samples/sec   Loss 4.6563   LearningRate 0.0121   Epoch: 13   Global Step: 541210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:34,174-Speed 2618.55 samples/sec   Loss 4.6852   LearningRate 0.0121   Epoch: 13   Global Step: 541220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:33:38,071-Speed 2628.14 samples/sec   Loss 4.6468   LearningRate 0.0121   Epoch: 13   Global Step: 541230   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:33:41,945-Speed 2644.37 samples/sec   Loss 4.6059   LearningRate 0.0121   Epoch: 13   Global Step: 541240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:45,840-Speed 2629.99 samples/sec   Loss 4.6654   LearningRate 0.0121   Epoch: 13   Global Step: 541250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:49,738-Speed 2627.67 samples/sec   Loss 4.7670   LearningRate 0.0121   Epoch: 13   Global Step: 541260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:53,635-Speed 2627.96 samples/sec   Loss 4.7542   LearningRate 0.0121   Epoch: 13   Global Step: 541270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:33:57,534-Speed 2627.54 samples/sec   Loss 4.6544   LearningRate 0.0121   Epoch: 13   Global Step: 541280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:01,430-Speed 2628.59 samples/sec   Loss 4.7192   LearningRate 0.0121   Epoch: 13   Global Step: 541290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:05,328-Speed 2627.28 samples/sec   Loss 4.7351   LearningRate 0.0121   Epoch: 13   Global Step: 541300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:09,227-Speed 2627.05 samples/sec   Loss 4.5740   LearningRate 0.0121   Epoch: 13   Global Step: 541310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:13,121-Speed 2630.54 samples/sec   Loss 4.7077   LearningRate 0.0121   Epoch: 13   Global Step: 541320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:17,022-Speed 2625.16 samples/sec   Loss 4.6999   LearningRate 0.0121   Epoch: 13   Global Step: 541330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:20,923-Speed 2625.89 samples/sec   Loss 4.7171   LearningRate 0.0121   Epoch: 13   Global Step: 541340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:34:24,825-Speed 2624.47 samples/sec   Loss 4.6186   LearningRate 0.0121   Epoch: 13   Global Step: 541350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:34:28,722-Speed 2628.89 samples/sec   Loss 4.6395   LearningRate 0.0121   Epoch: 13   Global Step: 541360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:34:32,623-Speed 2625.14 samples/sec   Loss 4.7077   LearningRate 0.0121   Epoch: 13   Global Step: 541370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:34:36,517-Speed 2630.69 samples/sec   Loss 4.7149   LearningRate 0.0121   Epoch: 13   Global Step: 541380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:34:40,435-Speed 2614.26 samples/sec   Loss 4.6506   LearningRate 0.0121   Epoch: 13   Global Step: 541390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:34:44,310-Speed 2643.47 samples/sec   Loss 4.6925   LearningRate 0.0121   Epoch: 13   Global Step: 541400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:48,224-Speed 2616.54 samples/sec   Loss 4.6978   LearningRate 0.0121   Epoch: 13   Global Step: 541410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:52,121-Speed 2627.94 samples/sec   Loss 4.6665   LearningRate 0.0121   Epoch: 13   Global Step: 541420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:56,019-Speed 2628.00 samples/sec   Loss 4.7744   LearningRate 0.0121   Epoch: 13   Global Step: 541430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:34:59,912-Speed 2630.67 samples/sec   Loss 4.6867   LearningRate 0.0121   Epoch: 13   Global Step: 541440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:03,837-Speed 2609.68 samples/sec   Loss 4.7260   LearningRate 0.0121   Epoch: 13   Global Step: 541450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:07,746-Speed 2620.53 samples/sec   Loss 4.8217   LearningRate 0.0121   Epoch: 13   Global Step: 541460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:11,646-Speed 2626.18 samples/sec   Loss 4.6264   LearningRate 0.0121   Epoch: 13   Global Step: 541470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:15,556-Speed 2619.45 samples/sec   Loss 4.6894   LearningRate 0.0121   Epoch: 13   Global Step: 541480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:19,453-Speed 2628.81 samples/sec   Loss 4.7969   LearningRate 0.0121   Epoch: 13   Global Step: 541490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:23,347-Speed 2629.84 samples/sec   Loss 4.6469   LearningRate 0.0121   Epoch: 13   Global Step: 541500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:35:27,253-Speed 2622.67 samples/sec   Loss 4.6323   LearningRate 0.0121   Epoch: 13   Global Step: 541510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:35:31,150-Speed 2628.05 samples/sec   Loss 4.7355   LearningRate 0.0121   Epoch: 13   Global Step: 541520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:35:35,046-Speed 2629.79 samples/sec   Loss 4.6669   LearningRate 0.0121   Epoch: 13   Global Step: 541530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:35:38,939-Speed 2630.67 samples/sec   Loss 4.6310   LearningRate 0.0121   Epoch: 13   Global Step: 541540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:35:42,817-Speed 2641.81 samples/sec   Loss 4.6515   LearningRate 0.0121   Epoch: 13   Global Step: 541550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:46,711-Speed 2629.84 samples/sec   Loss 4.7007   LearningRate 0.0121   Epoch: 13   Global Step: 541560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:50,609-Speed 2627.89 samples/sec   Loss 4.6562   LearningRate 0.0121   Epoch: 13   Global Step: 541570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:54,524-Speed 2616.01 samples/sec   Loss 4.7426   LearningRate 0.0121   Epoch: 13   Global Step: 541580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:35:58,420-Speed 2628.97 samples/sec   Loss 4.6389   LearningRate 0.0121   Epoch: 13   Global Step: 541590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:36:02,323-Speed 2624.12 samples/sec   Loss 4.6656   LearningRate 0.0121   Epoch: 13   Global Step: 541600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:36:06,217-Speed 2631.11 samples/sec   Loss 4.5574   LearningRate 0.0120   Epoch: 13   Global Step: 541610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:36:10,112-Speed 2628.93 samples/sec   Loss 4.6617   LearningRate 0.0120   Epoch: 13   Global Step: 541620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:36:14,011-Speed 2627.60 samples/sec   Loss 4.8342   LearningRate 0.0120   Epoch: 13   Global Step: 541630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:36:17,916-Speed 2622.32 samples/sec   Loss 4.7435   LearningRate 0.0120   Epoch: 13   Global Step: 541640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:36:21,822-Speed 2622.34 samples/sec   Loss 4.7244   LearningRate 0.0120   Epoch: 13   Global Step: 541650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:25,717-Speed 2629.56 samples/sec   Loss 4.6700   LearningRate 0.0120   Epoch: 13   Global Step: 541660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:29,619-Speed 2625.60 samples/sec   Loss 4.7202   LearningRate 0.0120   Epoch: 13   Global Step: 541670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:33,519-Speed 2626.33 samples/sec   Loss 4.7011   LearningRate 0.0120   Epoch: 13   Global Step: 541680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:37,424-Speed 2622.87 samples/sec   Loss 4.6821   LearningRate 0.0120   Epoch: 13   Global Step: 541690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:41,325-Speed 2625.49 samples/sec   Loss 4.6304   LearningRate 0.0120   Epoch: 13   Global Step: 541700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:45,222-Speed 2628.56 samples/sec   Loss 4.6464   LearningRate 0.0120   Epoch: 13   Global Step: 541710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:49,134-Speed 2617.60 samples/sec   Loss 4.7204   LearningRate 0.0120   Epoch: 13   Global Step: 541720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:53,028-Speed 2630.52 samples/sec   Loss 4.6087   LearningRate 0.0120   Epoch: 13   Global Step: 541730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:36:56,909-Speed 2639.35 samples/sec   Loss 4.6350   LearningRate 0.0120   Epoch: 13   Global Step: 541740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:00,811-Speed 2624.78 samples/sec   Loss 4.6416   LearningRate 0.0120   Epoch: 13   Global Step: 541750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:04,710-Speed 2627.41 samples/sec   Loss 4.6596   LearningRate 0.0120   Epoch: 13   Global Step: 541760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:08,614-Speed 2623.56 samples/sec   Loss 4.6769   LearningRate 0.0120   Epoch: 13   Global Step: 541770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:12,510-Speed 2628.80 samples/sec   Loss 4.7161   LearningRate 0.0120   Epoch: 13   Global Step: 541780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:16,409-Speed 2626.51 samples/sec   Loss 4.6755   LearningRate 0.0120   Epoch: 13   Global Step: 541790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:20,310-Speed 2626.36 samples/sec   Loss 4.6474   LearningRate 0.0120   Epoch: 13   Global Step: 541800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:24,207-Speed 2627.89 samples/sec   Loss 4.6538   LearningRate 0.0120   Epoch: 13   Global Step: 541810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:28,112-Speed 2623.22 samples/sec   Loss 4.7510   LearningRate 0.0120   Epoch: 13   Global Step: 541820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:32,008-Speed 2628.79 samples/sec   Loss 4.6874   LearningRate 0.0120   Epoch: 13   Global Step: 541830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:35,914-Speed 2622.53 samples/sec   Loss 4.6704   LearningRate 0.0120   Epoch: 13   Global Step: 541840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:37:39,790-Speed 2642.33 samples/sec   Loss 4.6103   LearningRate 0.0120   Epoch: 13   Global Step: 541850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:43,689-Speed 2627.43 samples/sec   Loss 4.5808   LearningRate 0.0120   Epoch: 13   Global Step: 541860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:47,596-Speed 2621.20 samples/sec   Loss 4.7018   LearningRate 0.0120   Epoch: 13   Global Step: 541870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:51,498-Speed 2625.22 samples/sec   Loss 4.6212   LearningRate 0.0120   Epoch: 13   Global Step: 541880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:55,410-Speed 2618.32 samples/sec   Loss 4.6735   LearningRate 0.0120   Epoch: 13   Global Step: 541890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:37:59,310-Speed 2626.82 samples/sec   Loss 4.5511   LearningRate 0.0120   Epoch: 13   Global Step: 541900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:38:03,215-Speed 2622.34 samples/sec   Loss 4.6031   LearningRate 0.0120   Epoch: 13   Global Step: 541910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:38:07,126-Speed 2619.10 samples/sec   Loss 4.7167   LearningRate 0.0120   Epoch: 13   Global Step: 541920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:38:11,023-Speed 2628.02 samples/sec   Loss 4.6469   LearningRate 0.0120   Epoch: 13   Global Step: 541930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:38:14,942-Speed 2613.43 samples/sec   Loss 4.6035   LearningRate 0.0120   Epoch: 13   Global Step: 541940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:38:18,849-Speed 2622.49 samples/sec   Loss 4.6216   LearningRate 0.0120   Epoch: 13   Global Step: 541950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:22,750-Speed 2625.05 samples/sec   Loss 4.7040   LearningRate 0.0120   Epoch: 13   Global Step: 541960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:26,647-Speed 2629.20 samples/sec   Loss 4.6561   LearningRate 0.0120   Epoch: 13   Global Step: 541970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:30,544-Speed 2627.90 samples/sec   Loss 4.6822   LearningRate 0.0120   Epoch: 13   Global Step: 541980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:34,439-Speed 2629.27 samples/sec   Loss 4.6198   LearningRate 0.0120   Epoch: 13   Global Step: 541990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:38,340-Speed 2625.35 samples/sec   Loss 4.6716   LearningRate 0.0120   Epoch: 13   Global Step: 542000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:42,239-Speed 2627.63 samples/sec   Loss 4.6237   LearningRate 0.0120   Epoch: 13   Global Step: 542010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:46,144-Speed 2622.24 samples/sec   Loss 4.7090   LearningRate 0.0120   Epoch: 13   Global Step: 542020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:50,044-Speed 2626.75 samples/sec   Loss 4.7814   LearningRate 0.0120   Epoch: 13   Global Step: 542030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:53,944-Speed 2625.83 samples/sec   Loss 4.7160   LearningRate 0.0120   Epoch: 13   Global Step: 542040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:38:57,838-Speed 2630.80 samples/sec   Loss 4.6571   LearningRate 0.0120   Epoch: 13   Global Step: 542050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:39:01,736-Speed 2627.51 samples/sec   Loss 4.6631   LearningRate 0.0120   Epoch: 13   Global Step: 542060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:39:05,589-Speed 2658.11 samples/sec   Loss 4.5835   LearningRate 0.0120   Epoch: 13   Global Step: 542070   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:09,483-Speed 2630.71 samples/sec   Loss 4.6901   LearningRate 0.0120   Epoch: 13   Global Step: 542080   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:13,379-Speed 2628.76 samples/sec   Loss 4.7210   LearningRate 0.0120   Epoch: 13   Global Step: 542090   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:17,285-Speed 2622.06 samples/sec   Loss 4.6789   LearningRate 0.0120   Epoch: 13   Global Step: 542100   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:21,205-Speed 2613.78 samples/sec   Loss 4.6383   LearningRate 0.0120   Epoch: 13   Global Step: 542110   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:25,102-Speed 2628.08 samples/sec   Loss 4.6782   LearningRate 0.0120   Epoch: 13   Global Step: 542120   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:29,013-Speed 2619.42 samples/sec   Loss 4.5444   LearningRate 0.0120   Epoch: 13   Global Step: 542130   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:32,906-Speed 2630.84 samples/sec   Loss 4.6579   LearningRate 0.0120   Epoch: 13   Global Step: 542140   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:36,804-Speed 2627.48 samples/sec   Loss 4.6321   LearningRate 0.0120   Epoch: 13   Global Step: 542150   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:40,700-Speed 2628.65 samples/sec   Loss 4.6424   LearningRate 0.0120   Epoch: 13   Global Step: 542160   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:39:44,598-Speed 2628.04 samples/sec   Loss 4.5915   LearningRate 0.0120   Epoch: 13   Global Step: 542170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:39:48,496-Speed 2628.01 samples/sec   Loss 4.5824   LearningRate 0.0120   Epoch: 13   Global Step: 542180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:39:52,404-Speed 2620.72 samples/sec   Loss 4.7403   LearningRate 0.0120   Epoch: 13   Global Step: 542190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:39:56,310-Speed 2622.80 samples/sec   Loss 4.7242   LearningRate 0.0120   Epoch: 13   Global Step: 542200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:00,214-Speed 2623.47 samples/sec   Loss 4.5033   LearningRate 0.0120   Epoch: 13   Global Step: 542210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:04,118-Speed 2623.39 samples/sec   Loss 4.6556   LearningRate 0.0120   Epoch: 13   Global Step: 542220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:08,022-Speed 2623.09 samples/sec   Loss 4.7563   LearningRate 0.0120   Epoch: 13   Global Step: 542230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:11,947-Speed 2609.95 samples/sec   Loss 4.6501   LearningRate 0.0120   Epoch: 13   Global Step: 542240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:15,849-Speed 2625.18 samples/sec   Loss 4.6754   LearningRate 0.0120   Epoch: 13   Global Step: 542250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:19,750-Speed 2625.92 samples/sec   Loss 4.7492   LearningRate 0.0120   Epoch: 13   Global Step: 542260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:23,645-Speed 2629.72 samples/sec   Loss 4.6211   LearningRate 0.0120   Epoch: 13   Global Step: 542270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:40:27,547-Speed 2624.94 samples/sec   Loss 4.6022   LearningRate 0.0120   Epoch: 13   Global Step: 542280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:40:31,440-Speed 2631.23 samples/sec   Loss 4.6979   LearningRate 0.0120   Epoch: 13   Global Step: 542290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:35,337-Speed 2628.27 samples/sec   Loss 4.6393   LearningRate 0.0120   Epoch: 13   Global Step: 542300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:39,235-Speed 2627.18 samples/sec   Loss 4.6516   LearningRate 0.0120   Epoch: 13   Global Step: 542310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:43,139-Speed 2623.94 samples/sec   Loss 4.7634   LearningRate 0.0120   Epoch: 13   Global Step: 542320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:47,037-Speed 2627.58 samples/sec   Loss 4.7394   LearningRate 0.0120   Epoch: 13   Global Step: 542330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:50,939-Speed 2624.92 samples/sec   Loss 4.6773   LearningRate 0.0120   Epoch: 13   Global Step: 542340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:54,834-Speed 2629.36 samples/sec   Loss 4.7038   LearningRate 0.0120   Epoch: 13   Global Step: 542350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:40:58,733-Speed 2627.51 samples/sec   Loss 4.6470   LearningRate 0.0120   Epoch: 13   Global Step: 542360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:41:02,648-Speed 2615.96 samples/sec   Loss 4.6525   LearningRate 0.0120   Epoch: 13   Global Step: 542370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:41:06,549-Speed 2626.17 samples/sec   Loss 4.6647   LearningRate 0.0120   Epoch: 13   Global Step: 542380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:41:10,462-Speed 2617.82 samples/sec   Loss 4.7314   LearningRate 0.0120   Epoch: 13   Global Step: 542390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:14,359-Speed 2628.26 samples/sec   Loss 4.6648   LearningRate 0.0120   Epoch: 13   Global Step: 542400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:18,257-Speed 2627.68 samples/sec   Loss 4.6397   LearningRate 0.0120   Epoch: 13   Global Step: 542410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:22,159-Speed 2624.69 samples/sec   Loss 4.5985   LearningRate 0.0120   Epoch: 13   Global Step: 542420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:26,057-Speed 2627.91 samples/sec   Loss 4.7079   LearningRate 0.0120   Epoch: 13   Global Step: 542430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:29,957-Speed 2626.40 samples/sec   Loss 4.5636   LearningRate 0.0120   Epoch: 13   Global Step: 542440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:33,870-Speed 2617.95 samples/sec   Loss 4.7277   LearningRate 0.0120   Epoch: 13   Global Step: 542450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:37,769-Speed 2626.53 samples/sec   Loss 4.7517   LearningRate 0.0120   Epoch: 13   Global Step: 542460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:41,666-Speed 2628.67 samples/sec   Loss 4.6746   LearningRate 0.0120   Epoch: 13   Global Step: 542470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:41:45,545-Speed 2640.41 samples/sec   Loss 4.6328   LearningRate 0.0120   Epoch: 13   Global Step: 542480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:41:49,456-Speed 2618.89 samples/sec   Loss 4.7906   LearningRate 0.0120   Epoch: 13   Global Step: 542490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:41:53,353-Speed 2627.85 samples/sec   Loss 4.7150   LearningRate 0.0120   Epoch: 13   Global Step: 542500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:41:57,255-Speed 2625.52 samples/sec   Loss 4.7072   LearningRate 0.0120   Epoch: 13   Global Step: 542510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:42:01,154-Speed 2626.20 samples/sec   Loss 4.5481   LearningRate 0.0120   Epoch: 13   Global Step: 542520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:42:05,052-Speed 2628.50 samples/sec   Loss 4.5640   LearningRate 0.0120   Epoch: 13   Global Step: 542530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:42:08,959-Speed 2621.58 samples/sec   Loss 4.6180   LearningRate 0.0120   Epoch: 13   Global Step: 542540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:42:12,836-Speed 2642.11 samples/sec   Loss 4.7081   LearningRate 0.0120   Epoch: 13   Global Step: 542550   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:16,765-Speed 2606.71 samples/sec   Loss 4.5927   LearningRate 0.0120   Epoch: 13   Global Step: 542560   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:20,664-Speed 2627.24 samples/sec   Loss 4.6505   LearningRate 0.0120   Epoch: 13   Global Step: 542570   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:24,566-Speed 2624.61 samples/sec   Loss 4.6250   LearningRate 0.0120   Epoch: 13   Global Step: 542580   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:28,463-Speed 2628.79 samples/sec   Loss 4.6564   LearningRate 0.0120   Epoch: 13   Global Step: 542590   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:32,371-Speed 2620.93 samples/sec   Loss 4.6463   LearningRate 0.0120   Epoch: 13   Global Step: 542600   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:36,269-Speed 2627.90 samples/sec   Loss 4.5692   LearningRate 0.0120   Epoch: 13   Global Step: 542610   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:40,166-Speed 2628.23 samples/sec   Loss 4.6011   LearningRate 0.0120   Epoch: 13   Global Step: 542620   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:44,071-Speed 2623.67 samples/sec   Loss 4.5579   LearningRate 0.0120   Epoch: 13   Global Step: 542630   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:47,972-Speed 2625.72 samples/sec   Loss 4.6280   LearningRate 0.0120   Epoch: 13   Global Step: 542640   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:42:51,891-Speed 2613.66 samples/sec   Loss 4.6452   LearningRate 0.0120   Epoch: 13   Global Step: 542650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:42:55,829-Speed 2600.79 samples/sec   Loss 4.7456   LearningRate 0.0120   Epoch: 13   Global Step: 542660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:42:59,729-Speed 2626.51 samples/sec   Loss 4.6723   LearningRate 0.0120   Epoch: 13   Global Step: 542670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:03,638-Speed 2619.55 samples/sec   Loss 4.6571   LearningRate 0.0120   Epoch: 13   Global Step: 542680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:07,536-Speed 2628.37 samples/sec   Loss 4.6213   LearningRate 0.0120   Epoch: 13   Global Step: 542690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:11,433-Speed 2628.15 samples/sec   Loss 4.6122   LearningRate 0.0120   Epoch: 13   Global Step: 542700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:15,333-Speed 2626.60 samples/sec   Loss 4.6586   LearningRate 0.0120   Epoch: 13   Global Step: 542710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:19,326-Speed 2565.43 samples/sec   Loss 4.6774   LearningRate 0.0120   Epoch: 13   Global Step: 542720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:23,225-Speed 2626.66 samples/sec   Loss 4.6050   LearningRate 0.0120   Epoch: 13   Global Step: 542730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:27,119-Speed 2630.77 samples/sec   Loss 4.6707   LearningRate 0.0120   Epoch: 13   Global Step: 542740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:30,995-Speed 2642.13 samples/sec   Loss 4.7586   LearningRate 0.0120   Epoch: 13   Global Step: 542750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:34,895-Speed 2626.32 samples/sec   Loss 4.6284   LearningRate 0.0120   Epoch: 13   Global Step: 542760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:38,796-Speed 2625.64 samples/sec   Loss 4.6197   LearningRate 0.0120   Epoch: 13   Global Step: 542770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:42,696-Speed 2626.77 samples/sec   Loss 4.6562   LearningRate 0.0120   Epoch: 13   Global Step: 542780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:46,591-Speed 2629.64 samples/sec   Loss 4.7735   LearningRate 0.0120   Epoch: 13   Global Step: 542790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:50,490-Speed 2627.30 samples/sec   Loss 4.7639   LearningRate 0.0120   Epoch: 13   Global Step: 542800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:54,389-Speed 2626.22 samples/sec   Loss 4.7929   LearningRate 0.0119   Epoch: 13   Global Step: 542810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:43:58,299-Speed 2619.67 samples/sec   Loss 4.6814   LearningRate 0.0119   Epoch: 13   Global Step: 542820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:02,198-Speed 2626.97 samples/sec   Loss 4.6170   LearningRate 0.0119   Epoch: 13   Global Step: 542830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:06,094-Speed 2628.99 samples/sec   Loss 4.7345   LearningRate 0.0119   Epoch: 13   Global Step: 542840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:09,976-Speed 2637.93 samples/sec   Loss 4.6064   LearningRate 0.0119   Epoch: 13   Global Step: 542850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:13,875-Speed 2628.72 samples/sec   Loss 4.6031   LearningRate 0.0119   Epoch: 13   Global Step: 542860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:17,775-Speed 2625.70 samples/sec   Loss 4.5865   LearningRate 0.0119   Epoch: 13   Global Step: 542870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:21,678-Speed 2624.34 samples/sec   Loss 4.5801   LearningRate 0.0119   Epoch: 13   Global Step: 542880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:25,582-Speed 2623.88 samples/sec   Loss 4.5805   LearningRate 0.0119   Epoch: 13   Global Step: 542890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:29,579-Speed 2563.60 samples/sec   Loss 4.6604   LearningRate 0.0119   Epoch: 13   Global Step: 542900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:33,487-Speed 2620.59 samples/sec   Loss 4.7226   LearningRate 0.0119   Epoch: 13   Global Step: 542910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:37,386-Speed 2626.66 samples/sec   Loss 4.6671   LearningRate 0.0119   Epoch: 13   Global Step: 542920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:41,286-Speed 2626.45 samples/sec   Loss 4.6863   LearningRate 0.0119   Epoch: 13   Global Step: 542930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:45,188-Speed 2625.54 samples/sec   Loss 4.6917   LearningRate 0.0119   Epoch: 13   Global Step: 542940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:49,090-Speed 2624.93 samples/sec   Loss 4.5522   LearningRate 0.0119   Epoch: 13   Global Step: 542950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:44:52,974-Speed 2637.30 samples/sec   Loss 4.7131   LearningRate 0.0119   Epoch: 13   Global Step: 542960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:44:56,874-Speed 2626.40 samples/sec   Loss 4.6521   LearningRate 0.0119   Epoch: 13   Global Step: 542970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:00,769-Speed 2629.62 samples/sec   Loss 4.7011   LearningRate 0.0119   Epoch: 13   Global Step: 542980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:04,696-Speed 2608.60 samples/sec   Loss 4.6328   LearningRate 0.0119   Epoch: 13   Global Step: 542990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:08,636-Speed 2599.28 samples/sec   Loss 4.6523   LearningRate 0.0119   Epoch: 13   Global Step: 543000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:12,534-Speed 2627.74 samples/sec   Loss 4.6724   LearningRate 0.0119   Epoch: 13   Global Step: 543010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:16,434-Speed 2626.25 samples/sec   Loss 4.6345   LearningRate 0.0119   Epoch: 13   Global Step: 543020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:20,332-Speed 2627.68 samples/sec   Loss 4.7820   LearningRate 0.0119   Epoch: 13   Global Step: 543030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:24,311-Speed 2574.18 samples/sec   Loss 4.6246   LearningRate 0.0119   Epoch: 13   Global Step: 543040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:28,213-Speed 2625.25 samples/sec   Loss 4.6254   LearningRate 0.0119   Epoch: 13   Global Step: 543050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:32,102-Speed 2633.76 samples/sec   Loss 4.6632   LearningRate 0.0119   Epoch: 13   Global Step: 543060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:36,018-Speed 2615.10 samples/sec   Loss 4.6482   LearningRate 0.0119   Epoch: 13   Global Step: 543070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:39,927-Speed 2620.94 samples/sec   Loss 4.6011   LearningRate 0.0119   Epoch: 13   Global Step: 543080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:43,829-Speed 2624.98 samples/sec   Loss 4.6155   LearningRate 0.0119   Epoch: 13   Global Step: 543090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:47,789-Speed 2586.59 samples/sec   Loss 4.7003   LearningRate 0.0119   Epoch: 13   Global Step: 543100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:51,696-Speed 2621.54 samples/sec   Loss 4.5610   LearningRate 0.0119   Epoch: 13   Global Step: 543110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:55,598-Speed 2624.84 samples/sec   Loss 4.7040   LearningRate 0.0119   Epoch: 13   Global Step: 543120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:45:59,498-Speed 2626.59 samples/sec   Loss 4.6660   LearningRate 0.0119   Epoch: 13   Global Step: 543130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:03,396-Speed 2627.36 samples/sec   Loss 4.6854   LearningRate 0.0119   Epoch: 13   Global Step: 543140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:07,313-Speed 2614.75 samples/sec   Loss 4.6359   LearningRate 0.0119   Epoch: 13   Global Step: 543150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:11,214-Speed 2625.11 samples/sec   Loss 4.7387   LearningRate 0.0119   Epoch: 13   Global Step: 543160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:46:15,126-Speed 2618.93 samples/sec   Loss 4.6290   LearningRate 0.0119   Epoch: 13   Global Step: 543170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:46:19,116-Speed 2566.90 samples/sec   Loss 4.6614   LearningRate 0.0119   Epoch: 13   Global Step: 543180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:46:22,999-Speed 2637.79 samples/sec   Loss 4.6095   LearningRate 0.0119   Epoch: 13   Global Step: 543190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:26,908-Speed 2620.28 samples/sec   Loss 4.6533   LearningRate 0.0119   Epoch: 13   Global Step: 543200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:30,824-Speed 2616.02 samples/sec   Loss 4.6652   LearningRate 0.0119   Epoch: 13   Global Step: 543210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:34,729-Speed 2623.52 samples/sec   Loss 4.5769   LearningRate 0.0119   Epoch: 13   Global Step: 543220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:38,636-Speed 2621.53 samples/sec   Loss 4.6226   LearningRate 0.0119   Epoch: 13   Global Step: 543230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:42,546-Speed 2619.53 samples/sec   Loss 4.6005   LearningRate 0.0119   Epoch: 13   Global Step: 543240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:46,451-Speed 2622.39 samples/sec   Loss 4.7053   LearningRate 0.0119   Epoch: 13   Global Step: 543250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:50,354-Speed 2625.02 samples/sec   Loss 4.7178   LearningRate 0.0119   Epoch: 13   Global Step: 543260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:54,265-Speed 2618.39 samples/sec   Loss 4.6768   LearningRate 0.0119   Epoch: 13   Global Step: 543270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:46:58,168-Speed 2624.32 samples/sec   Loss 4.6732   LearningRate 0.0119   Epoch: 13   Global Step: 543280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:02,043-Speed 2643.47 samples/sec   Loss 4.6746   LearningRate 0.0119   Epoch: 13   Global Step: 543290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:05,988-Speed 2596.41 samples/sec   Loss 4.6518   LearningRate 0.0119   Epoch: 13   Global Step: 543300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:09,912-Speed 2611.06 samples/sec   Loss 4.6647   LearningRate 0.0119   Epoch: 13   Global Step: 543310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:13,814-Speed 2624.23 samples/sec   Loss 4.6868   LearningRate 0.0119   Epoch: 13   Global Step: 543320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:17,733-Speed 2613.79 samples/sec   Loss 4.6531   LearningRate 0.0119   Epoch: 13   Global Step: 543330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:21,637-Speed 2624.23 samples/sec   Loss 4.6636   LearningRate 0.0119   Epoch: 13   Global Step: 543340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:25,534-Speed 2628.10 samples/sec   Loss 4.6534   LearningRate 0.0119   Epoch: 13   Global Step: 543350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:47:29,414-Speed 2639.67 samples/sec   Loss 4.6522   LearningRate 0.0119   Epoch: 13   Global Step: 543360   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:47:33,317-Speed 2623.94 samples/sec   Loss 4.6623   LearningRate 0.0119   Epoch: 13   Global Step: 543370   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:47:37,226-Speed 2620.27 samples/sec   Loss 4.5156   LearningRate 0.0119   Epoch: 13   Global Step: 543380   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:47:41,142-Speed 2616.03 samples/sec   Loss 4.6763   LearningRate 0.0119   Epoch: 13   Global Step: 543390   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:47:45,050-Speed 2620.76 samples/sec   Loss 4.6904   LearningRate 0.0119   Epoch: 13   Global Step: 543400   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:47:48,967-Speed 2615.70 samples/sec   Loss 4.6743   LearningRate 0.0119   Epoch: 13   Global Step: 543410   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:47:52,889-Speed 2611.35 samples/sec   Loss 4.7187   LearningRate 0.0119   Epoch: 13   Global Step: 543420   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:47:56,802-Speed 2617.57 samples/sec   Loss 4.6128   LearningRate 0.0119   Epoch: 13   Global Step: 543430   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:48:00,710-Speed 2620.11 samples/sec   Loss 4.6198   LearningRate 0.0119   Epoch: 13   Global Step: 543440   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:48:04,653-Speed 2598.60 samples/sec   Loss 4.5729   LearningRate 0.0119   Epoch: 13   Global Step: 543450   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:48:08,574-Speed 2612.21 samples/sec   Loss 4.6634   LearningRate 0.0119   Epoch: 13   Global Step: 543460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:12,495-Speed 2612.35 samples/sec   Loss 4.6192   LearningRate 0.0119   Epoch: 13   Global Step: 543470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:16,391-Speed 2628.61 samples/sec   Loss 4.6671   LearningRate 0.0119   Epoch: 13   Global Step: 543480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:20,291-Speed 2626.98 samples/sec   Loss 4.6542   LearningRate 0.0119   Epoch: 13   Global Step: 543490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:24,190-Speed 2626.76 samples/sec   Loss 4.5860   LearningRate 0.0119   Epoch: 13   Global Step: 543500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:28,091-Speed 2626.14 samples/sec   Loss 4.6681   LearningRate 0.0119   Epoch: 13   Global Step: 543510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:31,989-Speed 2626.93 samples/sec   Loss 4.6829   LearningRate 0.0119   Epoch: 13   Global Step: 543520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:36,076-Speed 2506.43 samples/sec   Loss 4.6634   LearningRate 0.0119   Epoch: 13   Global Step: 543530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:39,972-Speed 2629.48 samples/sec   Loss 4.5951   LearningRate 0.0119   Epoch: 13   Global Step: 543540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:43,871-Speed 2627.26 samples/sec   Loss 4.7297   LearningRate 0.0119   Epoch: 13   Global Step: 543550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:47,772-Speed 2626.03 samples/sec   Loss 4.5705   LearningRate 0.0119   Epoch: 13   Global Step: 543560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:48:51,680-Speed 2620.38 samples/sec   Loss 4.6049   LearningRate 0.0119   Epoch: 13   Global Step: 543570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:48:55,583-Speed 2624.44 samples/sec   Loss 4.6450   LearningRate 0.0119   Epoch: 13   Global Step: 543580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:48:59,488-Speed 2623.32 samples/sec   Loss 4.6817   LearningRate 0.0119   Epoch: 13   Global Step: 543590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:49:03,386-Speed 2627.54 samples/sec   Loss 4.7004   LearningRate 0.0119   Epoch: 13   Global Step: 543600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:49:07,288-Speed 2624.84 samples/sec   Loss 4.6153   LearningRate 0.0119   Epoch: 13   Global Step: 543610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:49:11,163-Speed 2643.53 samples/sec   Loss 4.6962   LearningRate 0.0119   Epoch: 13   Global Step: 543620   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:15,060-Speed 2628.53 samples/sec   Loss 4.6416   LearningRate 0.0119   Epoch: 13   Global Step: 543630   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:18,972-Speed 2617.55 samples/sec   Loss 4.7496   LearningRate 0.0119   Epoch: 13   Global Step: 543640   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:22,875-Speed 2624.70 samples/sec   Loss 4.6133   LearningRate 0.0119   Epoch: 13   Global Step: 543650   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:26,784-Speed 2620.24 samples/sec   Loss 4.5834   LearningRate 0.0119   Epoch: 13   Global Step: 543660   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:30,696-Speed 2618.77 samples/sec   Loss 4.5446   LearningRate 0.0119   Epoch: 13   Global Step: 543670   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:34,641-Speed 2595.89 samples/sec   Loss 4.6895   LearningRate 0.0119   Epoch: 13   Global Step: 543680   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:38,546-Speed 2622.91 samples/sec   Loss 4.7550   LearningRate 0.0119   Epoch: 13   Global Step: 543690   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:42,452-Speed 2622.21 samples/sec   Loss 4.6553   LearningRate 0.0119   Epoch: 13   Global Step: 543700   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:46,356-Speed 2623.06 samples/sec   Loss 4.6058   LearningRate 0.0119   Epoch: 13   Global Step: 543710   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 08:49:50,256-Speed 2626.34 samples/sec   Loss 4.6187   LearningRate 0.0119   Epoch: 13   Global Step: 543720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:49:54,160-Speed 2623.82 samples/sec   Loss 4.6355   LearningRate 0.0119   Epoch: 13   Global Step: 543730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:49:58,061-Speed 2626.10 samples/sec   Loss 4.6198   LearningRate 0.0119   Epoch: 13   Global Step: 543740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:01,967-Speed 2622.05 samples/sec   Loss 4.6346   LearningRate 0.0119   Epoch: 13   Global Step: 543750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:05,877-Speed 2619.56 samples/sec   Loss 4.5670   LearningRate 0.0119   Epoch: 13   Global Step: 543760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:09,784-Speed 2621.78 samples/sec   Loss 4.5368   LearningRate 0.0119   Epoch: 13   Global Step: 543770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:13,694-Speed 2619.97 samples/sec   Loss 4.6032   LearningRate 0.0119   Epoch: 13   Global Step: 543780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:17,607-Speed 2616.97 samples/sec   Loss 4.7880   LearningRate 0.0119   Epoch: 13   Global Step: 543790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:21,521-Speed 2617.52 samples/sec   Loss 4.6450   LearningRate 0.0119   Epoch: 13   Global Step: 543800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:25,443-Speed 2610.79 samples/sec   Loss 4.6949   LearningRate 0.0119   Epoch: 13   Global Step: 543810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:29,348-Speed 2624.18 samples/sec   Loss 4.6542   LearningRate 0.0119   Epoch: 13   Global Step: 543820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:50:33,235-Speed 2634.58 samples/sec   Loss 4.6472   LearningRate 0.0119   Epoch: 13   Global Step: 543830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:37,141-Speed 2622.35 samples/sec   Loss 4.6363   LearningRate 0.0119   Epoch: 13   Global Step: 543840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:41,056-Speed 2615.59 samples/sec   Loss 4.6417   LearningRate 0.0119   Epoch: 13   Global Step: 543850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:44,969-Speed 2618.38 samples/sec   Loss 4.7093   LearningRate 0.0119   Epoch: 13   Global Step: 543860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:48,871-Speed 2624.61 samples/sec   Loss 4.5731   LearningRate 0.0119   Epoch: 13   Global Step: 543870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:52,773-Speed 2625.05 samples/sec   Loss 4.6558   LearningRate 0.0119   Epoch: 13   Global Step: 543880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:50:56,676-Speed 2623.81 samples/sec   Loss 4.6733   LearningRate 0.0119   Epoch: 13   Global Step: 543890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:00,580-Speed 2624.53 samples/sec   Loss 4.5930   LearningRate 0.0119   Epoch: 13   Global Step: 543900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:04,484-Speed 2623.74 samples/sec   Loss 4.6067   LearningRate 0.0119   Epoch: 13   Global Step: 543910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:08,389-Speed 2622.13 samples/sec   Loss 4.6618   LearningRate 0.0119   Epoch: 13   Global Step: 543920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:12,274-Speed 2636.25 samples/sec   Loss 4.6878   LearningRate 0.0119   Epoch: 13   Global Step: 543930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:16,172-Speed 2628.24 samples/sec   Loss 4.5266   LearningRate 0.0119   Epoch: 13   Global Step: 543940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:20,075-Speed 2624.05 samples/sec   Loss 4.5689   LearningRate 0.0119   Epoch: 13   Global Step: 543950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:23,979-Speed 2623.83 samples/sec   Loss 4.6681   LearningRate 0.0119   Epoch: 13   Global Step: 543960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:27,899-Speed 2622.26 samples/sec   Loss 4.6499   LearningRate 0.0119   Epoch: 13   Global Step: 543970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:31,796-Speed 2628.40 samples/sec   Loss 4.5571   LearningRate 0.0119   Epoch: 13   Global Step: 543980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:35,699-Speed 2624.15 samples/sec   Loss 4.6442   LearningRate 0.0119   Epoch: 13   Global Step: 543990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:39,599-Speed 2626.44 samples/sec   Loss 4.6294   LearningRate 0.0119   Epoch: 13   Global Step: 544000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:43,507-Speed 2621.16 samples/sec   Loss 4.6495   LearningRate 0.0118   Epoch: 13   Global Step: 544010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:47,406-Speed 2626.63 samples/sec   Loss 4.6115   LearningRate 0.0118   Epoch: 13   Global Step: 544020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:51:51,305-Speed 2626.99 samples/sec   Loss 4.6443   LearningRate 0.0118   Epoch: 13   Global Step: 544030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:51:55,234-Speed 2606.98 samples/sec   Loss 4.5241   LearningRate 0.0118   Epoch: 13   Global Step: 544040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:51:59,137-Speed 2624.41 samples/sec   Loss 4.5907   LearningRate 0.0118   Epoch: 13   Global Step: 544050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:52:03,089-Speed 2592.02 samples/sec   Loss 4.5750   LearningRate 0.0118   Epoch: 13   Global Step: 544060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:52:07,188-Speed 2498.39 samples/sec   Loss 4.5650   LearningRate 0.0118   Epoch: 13   Global Step: 544070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:52:11,139-Speed 2592.78 samples/sec   Loss 4.6799   LearningRate 0.0118   Epoch: 13   Global Step: 544080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:52:15,022-Speed 2638.51 samples/sec   Loss 4.6389   LearningRate 0.0118   Epoch: 13   Global Step: 544090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:19,746-Speed 2167.99 samples/sec   Loss 4.5867   LearningRate 0.0118   Epoch: 13   Global Step: 544100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:23,850-Speed 2495.31 samples/sec   Loss 4.6507   LearningRate 0.0118   Epoch: 13   Global Step: 544110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:27,828-Speed 2574.34 samples/sec   Loss 4.6367   LearningRate 0.0118   Epoch: 13   Global Step: 544120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:31,747-Speed 2613.87 samples/sec   Loss 4.7386   LearningRate 0.0118   Epoch: 13   Global Step: 544130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:35,648-Speed 2626.48 samples/sec   Loss 4.6358   LearningRate 0.0118   Epoch: 13   Global Step: 544140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:39,545-Speed 2628.07 samples/sec   Loss 4.6889   LearningRate 0.0118   Epoch: 13   Global Step: 544150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:43,447-Speed 2624.95 samples/sec   Loss 4.6527   LearningRate 0.0118   Epoch: 13   Global Step: 544160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:47,368-Speed 2612.05 samples/sec   Loss 4.6298   LearningRate 0.0118   Epoch: 13   Global Step: 544170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:51,275-Speed 2621.60 samples/sec   Loss 4.7185   LearningRate 0.0118   Epoch: 13   Global Step: 544180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:52:55,215-Speed 2600.68 samples/sec   Loss 4.6791   LearningRate 0.0118   Epoch: 13   Global Step: 544190   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:52:59,113-Speed 2627.20 samples/sec   Loss 4.6929   LearningRate 0.0118   Epoch: 13   Global Step: 544200   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:53:03,030-Speed 2615.15 samples/sec   Loss 4.6153   LearningRate 0.0118   Epoch: 13   Global Step: 544210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:06,930-Speed 2626.34 samples/sec   Loss 4.6686   LearningRate 0.0118   Epoch: 13   Global Step: 544220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:10,830-Speed 2626.58 samples/sec   Loss 4.6422   LearningRate 0.0118   Epoch: 13   Global Step: 544230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:14,729-Speed 2627.53 samples/sec   Loss 4.7197   LearningRate 0.0118   Epoch: 13   Global Step: 544240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:18,632-Speed 2623.64 samples/sec   Loss 4.5935   LearningRate 0.0118   Epoch: 13   Global Step: 544250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:22,531-Speed 2627.27 samples/sec   Loss 4.6778   LearningRate 0.0118   Epoch: 13   Global Step: 544260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:26,442-Speed 2619.33 samples/sec   Loss 4.6740   LearningRate 0.0118   Epoch: 13   Global Step: 544270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:30,721-Speed 2393.51 samples/sec   Loss 4.6183   LearningRate 0.0118   Epoch: 13   Global Step: 544280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:34,621-Speed 2627.03 samples/sec   Loss 4.5934   LearningRate 0.0118   Epoch: 13   Global Step: 544290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:38,531-Speed 2618.99 samples/sec   Loss 4.5789   LearningRate 0.0118   Epoch: 13   Global Step: 544300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:53:42,428-Speed 2628.50 samples/sec   Loss 4.6028   LearningRate 0.0118   Epoch: 13   Global Step: 544310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:53:46,362-Speed 2603.53 samples/sec   Loss 4.6464   LearningRate 0.0118   Epoch: 13   Global Step: 544320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:53:50,260-Speed 2627.92 samples/sec   Loss 4.6532   LearningRate 0.0118   Epoch: 13   Global Step: 544330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:53:54,161-Speed 2625.77 samples/sec   Loss 4.5185   LearningRate 0.0118   Epoch: 13   Global Step: 544340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:53:58,041-Speed 2640.06 samples/sec   Loss 4.6706   LearningRate 0.0118   Epoch: 13   Global Step: 544350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:01,945-Speed 2623.17 samples/sec   Loss 4.5848   LearningRate 0.0118   Epoch: 13   Global Step: 544360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:05,851-Speed 2622.66 samples/sec   Loss 4.5999   LearningRate 0.0118   Epoch: 13   Global Step: 544370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:09,753-Speed 2625.39 samples/sec   Loss 4.5907   LearningRate 0.0118   Epoch: 13   Global Step: 544380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:13,655-Speed 2624.44 samples/sec   Loss 4.7051   LearningRate 0.0118   Epoch: 13   Global Step: 544390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:17,559-Speed 2624.28 samples/sec   Loss 4.6654   LearningRate 0.0118   Epoch: 13   Global Step: 544400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:21,467-Speed 2620.49 samples/sec   Loss 4.6212   LearningRate 0.0118   Epoch: 13   Global Step: 544410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:25,415-Speed 2594.29 samples/sec   Loss 4.7221   LearningRate 0.0118   Epoch: 13   Global Step: 544420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:29,316-Speed 2625.73 samples/sec   Loss 4.6511   LearningRate 0.0118   Epoch: 13   Global Step: 544430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:33,219-Speed 2624.59 samples/sec   Loss 4.5958   LearningRate 0.0118   Epoch: 13   Global Step: 544440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:37,117-Speed 2627.05 samples/sec   Loss 4.7539   LearningRate 0.0118   Epoch: 13   Global Step: 544450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:54:40,990-Speed 2645.05 samples/sec   Loss 4.5154   LearningRate 0.0118   Epoch: 13   Global Step: 544460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:44,897-Speed 2622.07 samples/sec   Loss 4.6102   LearningRate 0.0118   Epoch: 13   Global Step: 544470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:48,792-Speed 2629.02 samples/sec   Loss 4.7220   LearningRate 0.0118   Epoch: 13   Global Step: 544480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:52,699-Speed 2622.46 samples/sec   Loss 4.5109   LearningRate 0.0118   Epoch: 13   Global Step: 544490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:54:56,601-Speed 2624.57 samples/sec   Loss 4.6033   LearningRate 0.0118   Epoch: 13   Global Step: 544500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:00,495-Speed 2630.47 samples/sec   Loss 4.6149   LearningRate 0.0118   Epoch: 13   Global Step: 544510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:04,428-Speed 2603.77 samples/sec   Loss 4.6578   LearningRate 0.0118   Epoch: 13   Global Step: 544520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:08,333-Speed 2623.80 samples/sec   Loss 4.5789   LearningRate 0.0118   Epoch: 13   Global Step: 544530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:12,233-Speed 2626.27 samples/sec   Loss 4.6411   LearningRate 0.0118   Epoch: 13   Global Step: 544540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:16,135-Speed 2624.94 samples/sec   Loss 4.6525   LearningRate 0.0118   Epoch: 13   Global Step: 544550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:20,036-Speed 2626.00 samples/sec   Loss 4.6690   LearningRate 0.0118   Epoch: 13   Global Step: 544560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:55:23,948-Speed 2618.44 samples/sec   Loss 4.5356   LearningRate 0.0118   Epoch: 13   Global Step: 544570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:55:27,849-Speed 2625.93 samples/sec   Loss 4.6561   LearningRate 0.0118   Epoch: 13   Global Step: 544580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:55:31,754-Speed 2623.00 samples/sec   Loss 4.7091   LearningRate 0.0118   Epoch: 13   Global Step: 544590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:55:35,678-Speed 2610.07 samples/sec   Loss 4.6905   LearningRate 0.0118   Epoch: 13   Global Step: 544600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:39,597-Speed 2613.48 samples/sec   Loss 4.5167   LearningRate 0.0118   Epoch: 13   Global Step: 544610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:43,501-Speed 2623.76 samples/sec   Loss 4.5961   LearningRate 0.0118   Epoch: 13   Global Step: 544620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:47,398-Speed 2628.56 samples/sec   Loss 4.6093   LearningRate 0.0118   Epoch: 13   Global Step: 544630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:51,307-Speed 2620.43 samples/sec   Loss 4.5762   LearningRate 0.0118   Epoch: 13   Global Step: 544640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:55,207-Speed 2626.35 samples/sec   Loss 4.6394   LearningRate 0.0118   Epoch: 13   Global Step: 544650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:55:59,116-Speed 2620.50 samples/sec   Loss 4.7132   LearningRate 0.0118   Epoch: 13   Global Step: 544660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:03,017-Speed 2625.19 samples/sec   Loss 4.7096   LearningRate 0.0118   Epoch: 13   Global Step: 544670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:06,915-Speed 2627.77 samples/sec   Loss 4.6037   LearningRate 0.0118   Epoch: 13   Global Step: 544680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:10,909-Speed 2564.53 samples/sec   Loss 4.5690   LearningRate 0.0118   Epoch: 13   Global Step: 544690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:14,836-Speed 2608.20 samples/sec   Loss 4.6216   LearningRate 0.0118   Epoch: 13   Global Step: 544700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:56:18,713-Speed 2642.04 samples/sec   Loss 4.5627   LearningRate 0.0118   Epoch: 13   Global Step: 544710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:22,616-Speed 2624.64 samples/sec   Loss 4.7287   LearningRate 0.0118   Epoch: 13   Global Step: 544720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:26,522-Speed 2622.57 samples/sec   Loss 4.5469   LearningRate 0.0118   Epoch: 13   Global Step: 544730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:30,421-Speed 2626.97 samples/sec   Loss 4.6398   LearningRate 0.0118   Epoch: 13   Global Step: 544740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:34,319-Speed 2627.82 samples/sec   Loss 4.6294   LearningRate 0.0118   Epoch: 13   Global Step: 544750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:38,217-Speed 2627.15 samples/sec   Loss 4.7063   LearningRate 0.0118   Epoch: 13   Global Step: 544760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:42,120-Speed 2624.62 samples/sec   Loss 4.5498   LearningRate 0.0118   Epoch: 13   Global Step: 544770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:46,029-Speed 2620.37 samples/sec   Loss 4.6890   LearningRate 0.0118   Epoch: 13   Global Step: 544780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:49,928-Speed 2627.12 samples/sec   Loss 4.5353   LearningRate 0.0118   Epoch: 13   Global Step: 544790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:53,859-Speed 2605.65 samples/sec   Loss 4.5999   LearningRate 0.0118   Epoch: 13   Global Step: 544800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:56:57,753-Speed 2630.49 samples/sec   Loss 4.5757   LearningRate 0.0118   Epoch: 13   Global Step: 544810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:57:01,650-Speed 2629.04 samples/sec   Loss 4.5955   LearningRate 0.0118   Epoch: 13   Global Step: 544820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:57:05,547-Speed 2627.74 samples/sec   Loss 4.7098   LearningRate 0.0118   Epoch: 13   Global Step: 544830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:57:09,464-Speed 2615.24 samples/sec   Loss 4.5700   LearningRate 0.0118   Epoch: 13   Global Step: 544840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:13,361-Speed 2628.00 samples/sec   Loss 4.7116   LearningRate 0.0118   Epoch: 13   Global Step: 544850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:17,260-Speed 2627.21 samples/sec   Loss 4.6511   LearningRate 0.0118   Epoch: 13   Global Step: 544860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:21,165-Speed 2622.98 samples/sec   Loss 4.6941   LearningRate 0.0118   Epoch: 13   Global Step: 544870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:25,066-Speed 2625.41 samples/sec   Loss 4.6444   LearningRate 0.0118   Epoch: 13   Global Step: 544880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:28,964-Speed 2627.97 samples/sec   Loss 4.6108   LearningRate 0.0118   Epoch: 13   Global Step: 544890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:32,870-Speed 2622.29 samples/sec   Loss 4.7097   LearningRate 0.0118   Epoch: 13   Global Step: 544900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:36,768-Speed 2628.01 samples/sec   Loss 4.6631   LearningRate 0.0118   Epoch: 13   Global Step: 544910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:40,663-Speed 2629.25 samples/sec   Loss 4.5921   LearningRate 0.0118   Epoch: 13   Global Step: 544920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:44,590-Speed 2608.27 samples/sec   Loss 4.6227   LearningRate 0.0118   Epoch: 13   Global Step: 544930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:57:48,493-Speed 2624.56 samples/sec   Loss 4.6498   LearningRate 0.0118   Epoch: 13   Global Step: 544940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:57:52,397-Speed 2623.80 samples/sec   Loss 4.5838   LearningRate 0.0118   Epoch: 13   Global Step: 544950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:57:56,298-Speed 2625.72 samples/sec   Loss 4.6172   LearningRate 0.0118   Epoch: 13   Global Step: 544960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:58:00,173-Speed 2644.02 samples/sec   Loss 4.6140   LearningRate 0.0118   Epoch: 13   Global Step: 544970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:04,069-Speed 2628.72 samples/sec   Loss 4.5742   LearningRate 0.0118   Epoch: 13   Global Step: 544980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:07,966-Speed 2628.09 samples/sec   Loss 4.6364   LearningRate 0.0118   Epoch: 13   Global Step: 544990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:11,884-Speed 2614.44 samples/sec   Loss 4.6710   LearningRate 0.0118   Epoch: 13   Global Step: 545000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:15,778-Speed 2630.37 samples/sec   Loss 4.5950   LearningRate 0.0118   Epoch: 13   Global Step: 545010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:19,681-Speed 2624.25 samples/sec   Loss 4.5970   LearningRate 0.0118   Epoch: 13   Global Step: 545020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:23,588-Speed 2621.40 samples/sec   Loss 4.6246   LearningRate 0.0118   Epoch: 13   Global Step: 545030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:27,486-Speed 2627.41 samples/sec   Loss 4.6794   LearningRate 0.0118   Epoch: 13   Global Step: 545040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:31,381-Speed 2630.19 samples/sec   Loss 4.7090   LearningRate 0.0118   Epoch: 13   Global Step: 545050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:35,280-Speed 2627.63 samples/sec   Loss 4.6906   LearningRate 0.0118   Epoch: 13   Global Step: 545060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:39,181-Speed 2625.16 samples/sec   Loss 4.6310   LearningRate 0.0118   Epoch: 13   Global Step: 545070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:58:43,086-Speed 2622.64 samples/sec   Loss 4.5342   LearningRate 0.0118   Epoch: 13   Global Step: 545080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:58:46,989-Speed 2625.11 samples/sec   Loss 4.5607   LearningRate 0.0118   Epoch: 13   Global Step: 545090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:58:50,893-Speed 2623.63 samples/sec   Loss 4.6673   LearningRate 0.0118   Epoch: 13   Global Step: 545100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 08:58:54,772-Speed 2640.33 samples/sec   Loss 4.6906   LearningRate 0.0118   Epoch: 13   Global Step: 545110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:58:58,680-Speed 2620.67 samples/sec   Loss 4.5478   LearningRate 0.0118   Epoch: 13   Global Step: 545120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:02,589-Speed 2620.27 samples/sec   Loss 4.5871   LearningRate 0.0118   Epoch: 13   Global Step: 545130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:06,496-Speed 2621.89 samples/sec   Loss 4.6648   LearningRate 0.0118   Epoch: 13   Global Step: 545140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:10,398-Speed 2624.84 samples/sec   Loss 4.7057   LearningRate 0.0118   Epoch: 13   Global Step: 545150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:14,294-Speed 2628.95 samples/sec   Loss 4.6918   LearningRate 0.0118   Epoch: 13   Global Step: 545160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:18,192-Speed 2628.25 samples/sec   Loss 4.5902   LearningRate 0.0118   Epoch: 13   Global Step: 545170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:22,085-Speed 2630.90 samples/sec   Loss 4.7425   LearningRate 0.0118   Epoch: 13   Global Step: 545180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:25,991-Speed 2622.32 samples/sec   Loss 4.5458   LearningRate 0.0118   Epoch: 13   Global Step: 545190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:29,887-Speed 2629.63 samples/sec   Loss 4.6453   LearningRate 0.0118   Epoch: 13   Global Step: 545200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:33,754-Speed 2648.48 samples/sec   Loss 4.6137   LearningRate 0.0118   Epoch: 13   Global Step: 545210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:37,676-Speed 2611.11 samples/sec   Loss 4.5367   LearningRate 0.0117   Epoch: 13   Global Step: 545220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:41,569-Speed 2631.24 samples/sec   Loss 4.6733   LearningRate 0.0117   Epoch: 13   Global Step: 545230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:45,467-Speed 2628.33 samples/sec   Loss 4.5308   LearningRate 0.0117   Epoch: 13   Global Step: 545240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:49,360-Speed 2631.28 samples/sec   Loss 4.6605   LearningRate 0.0117   Epoch: 13   Global Step: 545250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:53,257-Speed 2628.49 samples/sec   Loss 4.7361   LearningRate 0.0117   Epoch: 13   Global Step: 545260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 08:59:57,191-Speed 2603.39 samples/sec   Loss 4.6356   LearningRate 0.0117   Epoch: 13   Global Step: 545270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:01,095-Speed 2623.12 samples/sec   Loss 4.6028   LearningRate 0.0117   Epoch: 13   Global Step: 545280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:04,995-Speed 2626.84 samples/sec   Loss 4.6604   LearningRate 0.0117   Epoch: 13   Global Step: 545290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:08,892-Speed 2628.11 samples/sec   Loss 4.5508   LearningRate 0.0117   Epoch: 13   Global Step: 545300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:12,786-Speed 2630.39 samples/sec   Loss 4.6543   LearningRate 0.0117   Epoch: 13   Global Step: 545310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:00:16,716-Speed 2606.44 samples/sec   Loss 4.5790   LearningRate 0.0117   Epoch: 13   Global Step: 545320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:00:20,616-Speed 2626.35 samples/sec   Loss 4.6388   LearningRate 0.0117   Epoch: 13   Global Step: 545330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:00:24,514-Speed 2627.95 samples/sec   Loss 4.5208   LearningRate 0.0117   Epoch: 13   Global Step: 545340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:00:28,412-Speed 2627.37 samples/sec   Loss 4.7338   LearningRate 0.0117   Epoch: 13   Global Step: 545350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:00:32,311-Speed 2626.75 samples/sec   Loss 4.5935   LearningRate 0.0117   Epoch: 13   Global Step: 545360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:00:36,209-Speed 2627.42 samples/sec   Loss 4.5662   LearningRate 0.0117   Epoch: 13   Global Step: 545370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:00:40,113-Speed 2624.59 samples/sec   Loss 4.5873   LearningRate 0.0117   Epoch: 13   Global Step: 545380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:44,009-Speed 2628.25 samples/sec   Loss 4.5642   LearningRate 0.0117   Epoch: 13   Global Step: 545390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:47,925-Speed 2615.77 samples/sec   Loss 4.6019   LearningRate 0.0117   Epoch: 13   Global Step: 545400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:51,824-Speed 2627.21 samples/sec   Loss 4.5831   LearningRate 0.0117   Epoch: 13   Global Step: 545410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:55,719-Speed 2630.34 samples/sec   Loss 4.6003   LearningRate 0.0117   Epoch: 13   Global Step: 545420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:00:59,621-Speed 2624.27 samples/sec   Loss 4.5479   LearningRate 0.0117   Epoch: 13   Global Step: 545430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:03,519-Speed 2628.14 samples/sec   Loss 4.6072   LearningRate 0.0117   Epoch: 13   Global Step: 545440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:07,420-Speed 2625.71 samples/sec   Loss 4.6444   LearningRate 0.0117   Epoch: 13   Global Step: 545450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:11,372-Speed 2591.64 samples/sec   Loss 4.6336   LearningRate 0.0117   Epoch: 13   Global Step: 545460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:15,277-Speed 2622.94 samples/sec   Loss 4.5434   LearningRate 0.0117   Epoch: 13   Global Step: 545470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:19,177-Speed 2626.59 samples/sec   Loss 4.7676   LearningRate 0.0117   Epoch: 13   Global Step: 545480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:01:23,082-Speed 2623.00 samples/sec   Loss 4.5335   LearningRate 0.0117   Epoch: 13   Global Step: 545490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:01:26,977-Speed 2629.44 samples/sec   Loss 4.6700   LearningRate 0.0117   Epoch: 13   Global Step: 545500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:01:30,872-Speed 2629.96 samples/sec   Loss 4.6397   LearningRate 0.0117   Epoch: 13   Global Step: 545510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:01:34,749-Speed 2642.15 samples/sec   Loss 4.6282   LearningRate 0.0117   Epoch: 13   Global Step: 545520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:38,643-Speed 2630.03 samples/sec   Loss 4.6088   LearningRate 0.0117   Epoch: 13   Global Step: 545530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:42,536-Speed 2630.46 samples/sec   Loss 4.6354   LearningRate 0.0117   Epoch: 13   Global Step: 545540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:46,432-Speed 2629.61 samples/sec   Loss 4.6605   LearningRate 0.0117   Epoch: 13   Global Step: 545550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:50,327-Speed 2629.57 samples/sec   Loss 4.6864   LearningRate 0.0117   Epoch: 13   Global Step: 545560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:54,227-Speed 2626.90 samples/sec   Loss 4.5725   LearningRate 0.0117   Epoch: 13   Global Step: 545570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:01:58,119-Speed 2631.56 samples/sec   Loss 4.6712   LearningRate 0.0117   Epoch: 13   Global Step: 545580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:02,068-Speed 2594.06 samples/sec   Loss 4.6819   LearningRate 0.0117   Epoch: 13   Global Step: 545590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:05,959-Speed 2632.33 samples/sec   Loss 4.6198   LearningRate 0.0117   Epoch: 13   Global Step: 545600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:09,856-Speed 2628.09 samples/sec   Loss 4.6081   LearningRate 0.0117   Epoch: 13   Global Step: 545610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:13,752-Speed 2628.84 samples/sec   Loss 4.5916   LearningRate 0.0117   Epoch: 13   Global Step: 545620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:02:17,647-Speed 2629.78 samples/sec   Loss 4.5745   LearningRate 0.0117   Epoch: 13   Global Step: 545630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:02:21,589-Speed 2598.92 samples/sec   Loss 4.6196   LearningRate 0.0117   Epoch: 13   Global Step: 545640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:02:25,485-Speed 2628.74 samples/sec   Loss 4.6721   LearningRate 0.0117   Epoch: 13   Global Step: 545650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:02:29,422-Speed 2601.68 samples/sec   Loss 4.5649   LearningRate 0.0117   Epoch: 13   Global Step: 545660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:33,447-Speed 2544.94 samples/sec   Loss 4.6561   LearningRate 0.0117   Epoch: 13   Global Step: 545670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:37,434-Speed 2568.84 samples/sec   Loss 4.6819   LearningRate 0.0117   Epoch: 13   Global Step: 545680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:41,328-Speed 2629.81 samples/sec   Loss 4.4997   LearningRate 0.0117   Epoch: 13   Global Step: 545690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:45,225-Speed 2628.46 samples/sec   Loss 4.6764   LearningRate 0.0117   Epoch: 13   Global Step: 545700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:49,141-Speed 2615.85 samples/sec   Loss 4.7015   LearningRate 0.0117   Epoch: 13   Global Step: 545710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:53,043-Speed 2625.10 samples/sec   Loss 4.6141   LearningRate 0.0117   Epoch: 13   Global Step: 545720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:02:56,935-Speed 2631.44 samples/sec   Loss 4.5758   LearningRate 0.0117   Epoch: 13   Global Step: 545730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:00,828-Speed 2631.38 samples/sec   Loss 4.6248   LearningRate 0.0117   Epoch: 13   Global Step: 545740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:04,735-Speed 2621.49 samples/sec   Loss 4.5089   LearningRate 0.0117   Epoch: 13   Global Step: 545750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:08,619-Speed 2637.09 samples/sec   Loss 4.5219   LearningRate 0.0117   Epoch: 13   Global Step: 545760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:12,517-Speed 2627.73 samples/sec   Loss 4.5072   LearningRate 0.0117   Epoch: 13   Global Step: 545770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:16,428-Speed 2619.14 samples/sec   Loss 4.6837   LearningRate 0.0117   Epoch: 13   Global Step: 545780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:20,327-Speed 2626.60 samples/sec   Loss 4.6647   LearningRate 0.0117   Epoch: 13   Global Step: 545790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:24,240-Speed 2617.79 samples/sec   Loss 4.5412   LearningRate 0.0117   Epoch: 13   Global Step: 545800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:28,140-Speed 2626.23 samples/sec   Loss 4.6208   LearningRate 0.0117   Epoch: 13   Global Step: 545810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:32,037-Speed 2628.45 samples/sec   Loss 4.6194   LearningRate 0.0117   Epoch: 13   Global Step: 545820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:35,933-Speed 2629.10 samples/sec   Loss 4.5059   LearningRate 0.0117   Epoch: 13   Global Step: 545830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:39,825-Speed 2631.73 samples/sec   Loss 4.5676   LearningRate 0.0117   Epoch: 13   Global Step: 545840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:43,727-Speed 2624.15 samples/sec   Loss 4.4885   LearningRate 0.0117   Epoch: 13   Global Step: 545850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:03:47,619-Speed 2631.77 samples/sec   Loss 4.6105   LearningRate 0.0117   Epoch: 13   Global Step: 545860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:03:51,515-Speed 2629.82 samples/sec   Loss 4.6109   LearningRate 0.0117   Epoch: 13   Global Step: 545870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:03:55,417-Speed 2624.61 samples/sec   Loss 4.5642   LearningRate 0.0117   Epoch: 13   Global Step: 545880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:03:59,289-Speed 2646.41 samples/sec   Loss 4.6406   LearningRate 0.0117   Epoch: 13   Global Step: 545890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:03,184-Speed 2629.45 samples/sec   Loss 4.6924   LearningRate 0.0117   Epoch: 13   Global Step: 545900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:07,085-Speed 2625.72 samples/sec   Loss 4.5713   LearningRate 0.0117   Epoch: 13   Global Step: 545910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:10,998-Speed 2617.23 samples/sec   Loss 4.6911   LearningRate 0.0117   Epoch: 13   Global Step: 545920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:14,893-Speed 2630.02 samples/sec   Loss 4.5612   LearningRate 0.0117   Epoch: 13   Global Step: 545930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:18,793-Speed 2626.05 samples/sec   Loss 4.4990   LearningRate 0.0117   Epoch: 13   Global Step: 545940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:22,690-Speed 2628.60 samples/sec   Loss 4.6288   LearningRate 0.0117   Epoch: 13   Global Step: 545950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:26,582-Speed 2631.69 samples/sec   Loss 4.5480   LearningRate 0.0117   Epoch: 13   Global Step: 545960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:30,488-Speed 2622.54 samples/sec   Loss 4.4934   LearningRate 0.0117   Epoch: 13   Global Step: 545970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:34,398-Speed 2619.46 samples/sec   Loss 4.5983   LearningRate 0.0117   Epoch: 13   Global Step: 545980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:38,297-Speed 2626.35 samples/sec   Loss 4.6542   LearningRate 0.0117   Epoch: 13   Global Step: 545990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:04:42,197-Speed 2626.70 samples/sec   Loss 4.6232   LearningRate 0.0117   Epoch: 13   Global Step: 546000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:04:46,104-Speed 2622.31 samples/sec   Loss 4.6486   LearningRate 0.0117   Epoch: 13   Global Step: 546010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:04:49,979-Speed 2642.69 samples/sec   Loss 4.6286   LearningRate 0.0117   Epoch: 13   Global Step: 546020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:53,881-Speed 2625.70 samples/sec   Loss 4.5709   LearningRate 0.0117   Epoch: 13   Global Step: 546030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:04:57,784-Speed 2624.64 samples/sec   Loss 4.5652   LearningRate 0.0117   Epoch: 13   Global Step: 546040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:01,815-Speed 2540.67 samples/sec   Loss 4.5315   LearningRate 0.0117   Epoch: 13   Global Step: 546050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:05,710-Speed 2629.40 samples/sec   Loss 4.6165   LearningRate 0.0117   Epoch: 13   Global Step: 546060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:09,611-Speed 2626.15 samples/sec   Loss 4.5482   LearningRate 0.0117   Epoch: 13   Global Step: 546070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:13,522-Speed 2618.81 samples/sec   Loss 4.5212   LearningRate 0.0117   Epoch: 13   Global Step: 546080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:17,470-Speed 2594.58 samples/sec   Loss 4.6228   LearningRate 0.0117   Epoch: 13   Global Step: 546090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:21,373-Speed 2624.22 samples/sec   Loss 4.5474   LearningRate 0.0117   Epoch: 13   Global Step: 546100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:25,312-Speed 2600.68 samples/sec   Loss 4.5547   LearningRate 0.0117   Epoch: 13   Global Step: 546110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:29,212-Speed 2626.14 samples/sec   Loss 4.5991   LearningRate 0.0117   Epoch: 13   Global Step: 546120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:05:33,085-Speed 2644.65 samples/sec   Loss 4.6092   LearningRate 0.0117   Epoch: 13   Global Step: 546130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:36,991-Speed 2622.07 samples/sec   Loss 4.5652   LearningRate 0.0117   Epoch: 13   Global Step: 546140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:40,969-Speed 2574.80 samples/sec   Loss 4.5748   LearningRate 0.0117   Epoch: 13   Global Step: 546150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:44,875-Speed 2622.55 samples/sec   Loss 4.5221   LearningRate 0.0117   Epoch: 13   Global Step: 546160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:48,779-Speed 2623.78 samples/sec   Loss 4.6005   LearningRate 0.0117   Epoch: 13   Global Step: 546170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:52,676-Speed 2628.42 samples/sec   Loss 4.5739   LearningRate 0.0117   Epoch: 13   Global Step: 546180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:05:56,599-Speed 2611.18 samples/sec   Loss 4.5694   LearningRate 0.0117   Epoch: 13   Global Step: 546190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:00,494-Speed 2629.81 samples/sec   Loss 4.6893   LearningRate 0.0117   Epoch: 13   Global Step: 546200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:04,389-Speed 2629.17 samples/sec   Loss 4.6946   LearningRate 0.0117   Epoch: 13   Global Step: 546210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:08,295-Speed 2622.10 samples/sec   Loss 4.5658   LearningRate 0.0117   Epoch: 13   Global Step: 546220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:12,175-Speed 2640.23 samples/sec   Loss 4.5384   LearningRate 0.0117   Epoch: 13   Global Step: 546230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:16,069-Speed 2631.48 samples/sec   Loss 4.5991   LearningRate 0.0117   Epoch: 13   Global Step: 546240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:19,962-Speed 2630.78 samples/sec   Loss 4.6411   LearningRate 0.0117   Epoch: 13   Global Step: 546250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:24,013-Speed 2528.73 samples/sec   Loss 4.6701   LearningRate 0.0117   Epoch: 13   Global Step: 546260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:27,908-Speed 2629.41 samples/sec   Loss 4.6108   LearningRate 0.0117   Epoch: 13   Global Step: 546270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:31,800-Speed 2631.97 samples/sec   Loss 4.5600   LearningRate 0.0117   Epoch: 13   Global Step: 546280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:35,696-Speed 2628.32 samples/sec   Loss 4.5790   LearningRate 0.0117   Epoch: 13   Global Step: 546290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:39,641-Speed 2596.55 samples/sec   Loss 4.6258   LearningRate 0.0117   Epoch: 13   Global Step: 546300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:43,540-Speed 2627.08 samples/sec   Loss 4.6463   LearningRate 0.0117   Epoch: 13   Global Step: 546310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:47,555-Speed 2551.62 samples/sec   Loss 4.5968   LearningRate 0.0117   Epoch: 13   Global Step: 546320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:06:51,452-Speed 2628.45 samples/sec   Loss 4.4680   LearningRate 0.0117   Epoch: 13   Global Step: 546330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:06:55,394-Speed 2598.63 samples/sec   Loss 4.5637   LearningRate 0.0117   Epoch: 13   Global Step: 546340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:06:59,306-Speed 2618.19 samples/sec   Loss 4.6086   LearningRate 0.0117   Epoch: 13   Global Step: 546350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:07:03,181-Speed 2643.27 samples/sec   Loss 4.6022   LearningRate 0.0117   Epoch: 13   Global Step: 546360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:07,086-Speed 2622.99 samples/sec   Loss 4.5926   LearningRate 0.0117   Epoch: 13   Global Step: 546370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:10,980-Speed 2630.70 samples/sec   Loss 4.5815   LearningRate 0.0117   Epoch: 13   Global Step: 546380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:14,890-Speed 2619.90 samples/sec   Loss 4.5855   LearningRate 0.0117   Epoch: 13   Global Step: 546390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:18,786-Speed 2629.04 samples/sec   Loss 4.6281   LearningRate 0.0117   Epoch: 13   Global Step: 546400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:22,681-Speed 2629.89 samples/sec   Loss 4.6632   LearningRate 0.0117   Epoch: 13   Global Step: 546410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:26,586-Speed 2622.67 samples/sec   Loss 4.5661   LearningRate 0.0117   Epoch: 13   Global Step: 546420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:30,482-Speed 2629.70 samples/sec   Loss 4.6492   LearningRate 0.0116   Epoch: 13   Global Step: 546430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:34,379-Speed 2628.38 samples/sec   Loss 4.6622   LearningRate 0.0116   Epoch: 13   Global Step: 546440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:38,274-Speed 2629.78 samples/sec   Loss 4.4997   LearningRate 0.0116   Epoch: 13   Global Step: 546450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:07:42,195-Speed 2611.72 samples/sec   Loss 4.5280   LearningRate 0.0116   Epoch: 13   Global Step: 546460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:07:46,096-Speed 2625.86 samples/sec   Loss 4.6387   LearningRate 0.0116   Epoch: 13   Global Step: 546470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:07:49,992-Speed 2629.45 samples/sec   Loss 4.6361   LearningRate 0.0116   Epoch: 13   Global Step: 546480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:07:53,893-Speed 2625.57 samples/sec   Loss 4.5911   LearningRate 0.0116   Epoch: 13   Global Step: 546490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:07:57,804-Speed 2618.60 samples/sec   Loss 4.7285   LearningRate 0.0116   Epoch: 13   Global Step: 546500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:08:01,700-Speed 2629.63 samples/sec   Loss 4.6159   LearningRate 0.0116   Epoch: 13   Global Step: 546510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:08:05,598-Speed 2627.32 samples/sec   Loss 4.6051   LearningRate 0.0116   Epoch: 13   Global Step: 546520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:08:09,496-Speed 2627.61 samples/sec   Loss 4.5381   LearningRate 0.0116   Epoch: 13   Global Step: 546530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:08:13,392-Speed 2628.87 samples/sec   Loss 4.7298   LearningRate 0.0116   Epoch: 13   Global Step: 546540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:08:17,269-Speed 2642.53 samples/sec   Loss 4.6227   LearningRate 0.0116   Epoch: 13   Global Step: 546550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:21,164-Speed 2630.25 samples/sec   Loss 4.5518   LearningRate 0.0116   Epoch: 13   Global Step: 546560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:25,056-Speed 2631.36 samples/sec   Loss 4.7289   LearningRate 0.0116   Epoch: 13   Global Step: 546570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:28,949-Speed 2631.18 samples/sec   Loss 4.6688   LearningRate 0.0116   Epoch: 13   Global Step: 546580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:32,846-Speed 2628.32 samples/sec   Loss 4.5605   LearningRate 0.0116   Epoch: 13   Global Step: 546590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:36,756-Speed 2619.48 samples/sec   Loss 4.4509   LearningRate 0.0116   Epoch: 13   Global Step: 546600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:40,652-Speed 2628.89 samples/sec   Loss 4.6732   LearningRate 0.0116   Epoch: 13   Global Step: 546610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:44,556-Speed 2624.10 samples/sec   Loss 4.5438   LearningRate 0.0116   Epoch: 13   Global Step: 546620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:48,452-Speed 2628.81 samples/sec   Loss 4.6086   LearningRate 0.0116   Epoch: 13   Global Step: 546630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:52,345-Speed 2631.12 samples/sec   Loss 4.5572   LearningRate 0.0116   Epoch: 13   Global Step: 546640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:08:56,259-Speed 2616.99 samples/sec   Loss 4.6285   LearningRate 0.0116   Epoch: 13   Global Step: 546650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:00,158-Speed 2627.33 samples/sec   Loss 4.5970   LearningRate 0.0116   Epoch: 13   Global Step: 546660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:04,060-Speed 2625.07 samples/sec   Loss 4.6443   LearningRate 0.0116   Epoch: 13   Global Step: 546670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:07,955-Speed 2629.76 samples/sec   Loss 4.6375   LearningRate 0.0116   Epoch: 13   Global Step: 546680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:11,854-Speed 2626.68 samples/sec   Loss 4.5846   LearningRate 0.0116   Epoch: 13   Global Step: 546690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:15,751-Speed 2627.99 samples/sec   Loss 4.6491   LearningRate 0.0116   Epoch: 13   Global Step: 546700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:19,661-Speed 2619.42 samples/sec   Loss 4.6257   LearningRate 0.0116   Epoch: 13   Global Step: 546710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:23,558-Speed 2629.45 samples/sec   Loss 4.5736   LearningRate 0.0116   Epoch: 13   Global Step: 546720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:27,453-Speed 2629.44 samples/sec   Loss 4.6877   LearningRate 0.0116   Epoch: 13   Global Step: 546730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:31,347-Speed 2630.24 samples/sec   Loss 4.6857   LearningRate 0.0116   Epoch: 13   Global Step: 546740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:35,231-Speed 2637.29 samples/sec   Loss 4.6842   LearningRate 0.0116   Epoch: 13   Global Step: 546750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:39,127-Speed 2628.51 samples/sec   Loss 4.5774   LearningRate 0.0116   Epoch: 13   Global Step: 546760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:43,023-Speed 2629.39 samples/sec   Loss 4.6367   LearningRate 0.0116   Epoch: 13   Global Step: 546770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:47,121-Speed 2499.24 samples/sec   Loss 4.5319   LearningRate 0.0116   Epoch: 13   Global Step: 546780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:51,025-Speed 2623.59 samples/sec   Loss 4.5362   LearningRate 0.0116   Epoch: 13   Global Step: 546790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:54,917-Speed 2631.48 samples/sec   Loss 4.5510   LearningRate 0.0116   Epoch: 13   Global Step: 546800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:09:58,824-Speed 2621.97 samples/sec   Loss 4.4843   LearningRate 0.0116   Epoch: 13   Global Step: 546810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:10:02,698-Speed 2644.53 samples/sec   Loss 4.6282   LearningRate 0.0116   Epoch: 13   Global Step: 546820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:06,594-Speed 2628.34 samples/sec   Loss 4.5222   LearningRate 0.0116   Epoch: 13   Global Step: 546830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:10,504-Speed 2619.63 samples/sec   Loss 4.5557   LearningRate 0.0116   Epoch: 13   Global Step: 546840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:14,402-Speed 2627.11 samples/sec   Loss 4.6160   LearningRate 0.0116   Epoch: 13   Global Step: 546850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:18,299-Speed 2628.00 samples/sec   Loss 4.5508   LearningRate 0.0116   Epoch: 13   Global Step: 546860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:22,193-Speed 2630.86 samples/sec   Loss 4.4508   LearningRate 0.0116   Epoch: 13   Global Step: 546870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:26,086-Speed 2630.88 samples/sec   Loss 4.4596   LearningRate 0.0116   Epoch: 13   Global Step: 546880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:29,996-Speed 2620.05 samples/sec   Loss 4.6016   LearningRate 0.0116   Epoch: 13   Global Step: 546890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:33,895-Speed 2626.30 samples/sec   Loss 4.5794   LearningRate 0.0116   Epoch: 13   Global Step: 546900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:37,792-Speed 2628.60 samples/sec   Loss 4.5426   LearningRate 0.0116   Epoch: 13   Global Step: 546910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:10:41,779-Speed 2569.54 samples/sec   Loss 4.6035   LearningRate 0.0116   Epoch: 13   Global Step: 546920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:10:45,683-Speed 2623.59 samples/sec   Loss 4.6237   LearningRate 0.0116   Epoch: 13   Global Step: 546930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:10:49,591-Speed 2621.13 samples/sec   Loss 4.5717   LearningRate 0.0116   Epoch: 13   Global Step: 546940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:10:53,496-Speed 2622.52 samples/sec   Loss 4.6295   LearningRate 0.0116   Epoch: 13   Global Step: 546950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:10:57,414-Speed 2615.06 samples/sec   Loss 4.6000   LearningRate 0.0116   Epoch: 13   Global Step: 546960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:11:03,135-Speed 1790.01 samples/sec   Loss 4.5571   LearningRate 0.0116   Epoch: 13   Global Step: 546970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:11:07,052-Speed 2614.57 samples/sec   Loss 4.5718   LearningRate 0.0116   Epoch: 13   Global Step: 546980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:11:10,931-Speed 2641.02 samples/sec   Loss 4.5469   LearningRate 0.0116   Epoch: 13   Global Step: 546990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:14,828-Speed 2628.31 samples/sec   Loss 4.6704   LearningRate 0.0116   Epoch: 13   Global Step: 547000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:18,748-Speed 2613.10 samples/sec   Loss 4.5744   LearningRate 0.0116   Epoch: 13   Global Step: 547010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:22,644-Speed 2629.37 samples/sec   Loss 4.5684   LearningRate 0.0116   Epoch: 13   Global Step: 547020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:26,539-Speed 2630.17 samples/sec   Loss 4.4913   LearningRate 0.0116   Epoch: 13   Global Step: 547030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:30,446-Speed 2621.34 samples/sec   Loss 4.5259   LearningRate 0.0116   Epoch: 13   Global Step: 547040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:34,357-Speed 2618.18 samples/sec   Loss 4.7054   LearningRate 0.0116   Epoch: 13   Global Step: 547050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:38,250-Speed 2630.80 samples/sec   Loss 4.5818   LearningRate 0.0116   Epoch: 13   Global Step: 547060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:42,145-Speed 2630.30 samples/sec   Loss 4.5984   LearningRate 0.0116   Epoch: 13   Global Step: 547070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:46,039-Speed 2630.80 samples/sec   Loss 4.7015   LearningRate 0.0116   Epoch: 13   Global Step: 547080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:49,988-Speed 2593.71 samples/sec   Loss 4.5837   LearningRate 0.0116   Epoch: 13   Global Step: 547090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:11:53,862-Speed 2643.91 samples/sec   Loss 4.5670   LearningRate 0.0116   Epoch: 13   Global Step: 547100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:11:57,791-Speed 2607.64 samples/sec   Loss 4.6161   LearningRate 0.0116   Epoch: 13   Global Step: 547110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:01,715-Speed 2609.82 samples/sec   Loss 4.6874   LearningRate 0.0116   Epoch: 13   Global Step: 547120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:05,612-Speed 2628.55 samples/sec   Loss 4.5820   LearningRate 0.0116   Epoch: 13   Global Step: 547130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:09,501-Speed 2633.67 samples/sec   Loss 4.6177   LearningRate 0.0116   Epoch: 13   Global Step: 547140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:13,392-Speed 2633.08 samples/sec   Loss 4.6000   LearningRate 0.0116   Epoch: 13   Global Step: 547150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:17,292-Speed 2626.24 samples/sec   Loss 4.5974   LearningRate 0.0116   Epoch: 13   Global Step: 547160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:21,194-Speed 2624.95 samples/sec   Loss 4.6062   LearningRate 0.0116   Epoch: 13   Global Step: 547170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:25,095-Speed 2626.43 samples/sec   Loss 4.6971   LearningRate 0.0116   Epoch: 13   Global Step: 547180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:28,995-Speed 2626.13 samples/sec   Loss 4.5409   LearningRate 0.0116   Epoch: 13   Global Step: 547190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:32,912-Speed 2614.21 samples/sec   Loss 4.5473   LearningRate 0.0116   Epoch: 13   Global Step: 547200   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:12:36,810-Speed 2627.87 samples/sec   Loss 4.5014   LearningRate 0.0116   Epoch: 13   Global Step: 547210   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:12:40,688-Speed 2641.58 samples/sec   Loss 4.5341   LearningRate 0.0116   Epoch: 13   Global Step: 547220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:44,623-Speed 2603.12 samples/sec   Loss 4.5181   LearningRate 0.0116   Epoch: 13   Global Step: 547230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:48,515-Speed 2631.73 samples/sec   Loss 4.4708   LearningRate 0.0116   Epoch: 13   Global Step: 547240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:52,410-Speed 2629.68 samples/sec   Loss 4.5608   LearningRate 0.0116   Epoch: 13   Global Step: 547250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:12:56,318-Speed 2622.02 samples/sec   Loss 4.5834   LearningRate 0.0116   Epoch: 13   Global Step: 547260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:00,214-Speed 2628.87 samples/sec   Loss 4.6322   LearningRate 0.0116   Epoch: 13   Global Step: 547270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:04,140-Speed 2608.79 samples/sec   Loss 4.5990   LearningRate 0.0116   Epoch: 13   Global Step: 547280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:08,040-Speed 2626.39 samples/sec   Loss 4.6653   LearningRate 0.0116   Epoch: 13   Global Step: 547290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:11,933-Speed 2631.18 samples/sec   Loss 4.6517   LearningRate 0.0116   Epoch: 13   Global Step: 547300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:15,826-Speed 2630.88 samples/sec   Loss 4.6405   LearningRate 0.0116   Epoch: 13   Global Step: 547310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:19,735-Speed 2619.75 samples/sec   Loss 4.5431   LearningRate 0.0116   Epoch: 13   Global Step: 547320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:13:23,630-Speed 2630.40 samples/sec   Loss 4.6592   LearningRate 0.0116   Epoch: 13   Global Step: 547330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:27,521-Speed 2632.21 samples/sec   Loss 4.6458   LearningRate 0.0116   Epoch: 13   Global Step: 547340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:31,463-Speed 2598.87 samples/sec   Loss 4.5948   LearningRate 0.0116   Epoch: 13   Global Step: 547350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:35,359-Speed 2629.22 samples/sec   Loss 4.5950   LearningRate 0.0116   Epoch: 13   Global Step: 547360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:39,252-Speed 2630.60 samples/sec   Loss 4.5130   LearningRate 0.0116   Epoch: 13   Global Step: 547370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:43,153-Speed 2625.89 samples/sec   Loss 4.5675   LearningRate 0.0116   Epoch: 13   Global Step: 547380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:47,056-Speed 2624.52 samples/sec   Loss 4.5278   LearningRate 0.0116   Epoch: 13   Global Step: 547390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:50,957-Speed 2625.66 samples/sec   Loss 4.5467   LearningRate 0.0116   Epoch: 13   Global Step: 547400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:54,849-Speed 2632.14 samples/sec   Loss 4.6134   LearningRate 0.0116   Epoch: 13   Global Step: 547410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:13:58,742-Speed 2630.87 samples/sec   Loss 4.5248   LearningRate 0.0116   Epoch: 13   Global Step: 547420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:02,638-Speed 2629.39 samples/sec   Loss 4.5668   LearningRate 0.0116   Epoch: 13   Global Step: 547430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:14:06,544-Speed 2621.88 samples/sec   Loss 4.5827   LearningRate 0.0116   Epoch: 13   Global Step: 547440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:14:10,436-Speed 2631.60 samples/sec   Loss 4.5920   LearningRate 0.0116   Epoch: 13   Global Step: 547450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:14,482-Speed 2531.36 samples/sec   Loss 4.5857   LearningRate 0.0116   Epoch: 13   Global Step: 547460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:18,573-Speed 2504.32 samples/sec   Loss 4.5354   LearningRate 0.0116   Epoch: 13   Global Step: 547470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:22,533-Speed 2586.72 samples/sec   Loss 4.5373   LearningRate 0.0116   Epoch: 13   Global Step: 547480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:26,431-Speed 2627.61 samples/sec   Loss 4.5532   LearningRate 0.0116   Epoch: 13   Global Step: 547490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:30,422-Speed 2566.41 samples/sec   Loss 4.5754   LearningRate 0.0116   Epoch: 13   Global Step: 547500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:34,324-Speed 2624.55 samples/sec   Loss 4.5561   LearningRate 0.0116   Epoch: 13   Global Step: 547510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:38,225-Speed 2625.37 samples/sec   Loss 4.6422   LearningRate 0.0116   Epoch: 13   Global Step: 547520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:42,140-Speed 2615.93 samples/sec   Loss 4.6258   LearningRate 0.0116   Epoch: 13   Global Step: 547530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:46,050-Speed 2620.24 samples/sec   Loss 4.6894   LearningRate 0.0116   Epoch: 13   Global Step: 547540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:14:49,945-Speed 2629.31 samples/sec   Loss 4.6292   LearningRate 0.0116   Epoch: 13   Global Step: 547550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:14:53,839-Speed 2631.39 samples/sec   Loss 4.5614   LearningRate 0.0116   Epoch: 13   Global Step: 547560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:14:57,738-Speed 2626.31 samples/sec   Loss 4.5684   LearningRate 0.0116   Epoch: 13   Global Step: 547570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:15:01,665-Speed 2608.69 samples/sec   Loss 4.6103   LearningRate 0.0116   Epoch: 13   Global Step: 547580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:15:05,533-Speed 2647.71 samples/sec   Loss 4.4412   LearningRate 0.0116   Epoch: 13   Global Step: 547590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:09,423-Speed 2632.99 samples/sec   Loss 4.6097   LearningRate 0.0116   Epoch: 13   Global Step: 547600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:13,345-Speed 2611.56 samples/sec   Loss 4.5868   LearningRate 0.0116   Epoch: 13   Global Step: 547610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:17,242-Speed 2628.63 samples/sec   Loss 4.5617   LearningRate 0.0116   Epoch: 13   Global Step: 547620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:21,132-Speed 2633.60 samples/sec   Loss 4.5179   LearningRate 0.0116   Epoch: 13   Global Step: 547630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:25,025-Speed 2631.60 samples/sec   Loss 4.7084   LearningRate 0.0116   Epoch: 13   Global Step: 547640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:28,944-Speed 2613.40 samples/sec   Loss 4.6045   LearningRate 0.0115   Epoch: 13   Global Step: 547650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:32,835-Speed 2632.87 samples/sec   Loss 4.5462   LearningRate 0.0115   Epoch: 13   Global Step: 547660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:36,729-Speed 2629.74 samples/sec   Loss 4.5956   LearningRate 0.0115   Epoch: 13   Global Step: 547670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:40,645-Speed 2615.63 samples/sec   Loss 4.5196   LearningRate 0.0115   Epoch: 13   Global Step: 547680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:15:44,539-Speed 2630.40 samples/sec   Loss 4.5387   LearningRate 0.0115   Epoch: 13   Global Step: 547690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:15:48,452-Speed 2617.86 samples/sec   Loss 4.5945   LearningRate 0.0115   Epoch: 13   Global Step: 547700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:15:52,345-Speed 2631.05 samples/sec   Loss 4.4982   LearningRate 0.0115   Epoch: 13   Global Step: 547710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:15:56,260-Speed 2616.05 samples/sec   Loss 4.5618   LearningRate 0.0115   Epoch: 13   Global Step: 547720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:16:00,159-Speed 2627.50 samples/sec   Loss 4.6125   LearningRate 0.0115   Epoch: 13   Global Step: 547730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:16:04,030-Speed 2646.21 samples/sec   Loss 4.5834   LearningRate 0.0115   Epoch: 13   Global Step: 547740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:16:07,927-Speed 2628.28 samples/sec   Loss 4.5075   LearningRate 0.0115   Epoch: 13   Global Step: 547750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:16:11,823-Speed 2628.27 samples/sec   Loss 4.7005   LearningRate 0.0115   Epoch: 13   Global Step: 547760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:16:15,701-Speed 2641.67 samples/sec   Loss 4.5889   LearningRate 0.0115   Epoch: 13   Global Step: 547770   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:19,806-Speed 2495.38 samples/sec   Loss 4.6531   LearningRate 0.0115   Epoch: 13   Global Step: 547780   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:23,760-Speed 2590.20 samples/sec   Loss 4.5847   LearningRate 0.0115   Epoch: 13   Global Step: 547790   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:27,651-Speed 2632.92 samples/sec   Loss 4.5295   LearningRate 0.0115   Epoch: 13   Global Step: 547800   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:31,548-Speed 2628.39 samples/sec   Loss 4.6007   LearningRate 0.0115   Epoch: 13   Global Step: 547810   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:35,445-Speed 2627.86 samples/sec   Loss 4.6203   LearningRate 0.0115   Epoch: 13   Global Step: 547820   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:39,336-Speed 2632.78 samples/sec   Loss 4.5901   LearningRate 0.0115   Epoch: 13   Global Step: 547830   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:43,237-Speed 2625.85 samples/sec   Loss 4.5084   LearningRate 0.0115   Epoch: 13   Global Step: 547840   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:47,131-Speed 2630.88 samples/sec   Loss 4.5596   LearningRate 0.0115   Epoch: 13   Global Step: 547850   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:51,027-Speed 2628.50 samples/sec   Loss 4.5618   LearningRate 0.0115   Epoch: 13   Global Step: 547860   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:16:54,928-Speed 2626.30 samples/sec   Loss 4.5542   LearningRate 0.0115   Epoch: 13   Global Step: 547870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:16:58,822-Speed 2630.46 samples/sec   Loss 4.6506   LearningRate 0.0115   Epoch: 13   Global Step: 547880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:02,714-Speed 2630.99 samples/sec   Loss 4.6301   LearningRate 0.0115   Epoch: 13   Global Step: 547890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:06,608-Speed 2630.15 samples/sec   Loss 4.5725   LearningRate 0.0115   Epoch: 13   Global Step: 547900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:10,516-Speed 2621.33 samples/sec   Loss 4.6140   LearningRate 0.0115   Epoch: 13   Global Step: 547910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:14,410-Speed 2629.93 samples/sec   Loss 4.5452   LearningRate 0.0115   Epoch: 13   Global Step: 547920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:18,306-Speed 2630.57 samples/sec   Loss 4.5061   LearningRate 0.0115   Epoch: 13   Global Step: 547930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:22,199-Speed 2630.61 samples/sec   Loss 4.5522   LearningRate 0.0115   Epoch: 13   Global Step: 547940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:26,093-Speed 2630.65 samples/sec   Loss 4.5649   LearningRate 0.0115   Epoch: 13   Global Step: 547950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:29,986-Speed 2630.72 samples/sec   Loss 4.5519   LearningRate 0.0115   Epoch: 13   Global Step: 547960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:17:33,879-Speed 2631.06 samples/sec   Loss 4.5737   LearningRate 0.0115   Epoch: 13   Global Step: 547970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:17:37,770-Speed 2632.42 samples/sec   Loss 4.5374   LearningRate 0.0115   Epoch: 13   Global Step: 547980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:17:41,670-Speed 2626.33 samples/sec   Loss 4.5665   LearningRate 0.0115   Epoch: 13   Global Step: 547990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:17:45,564-Speed 2630.17 samples/sec   Loss 4.5317   LearningRate 0.0115   Epoch: 13   Global Step: 548000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:17:49,458-Speed 2631.10 samples/sec   Loss 4.6311   LearningRate 0.0115   Epoch: 13   Global Step: 548010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:17:53,358-Speed 2625.89 samples/sec   Loss 4.6233   LearningRate 0.0115   Epoch: 13   Global Step: 548020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:17:57,253-Speed 2630.09 samples/sec   Loss 4.5561   LearningRate 0.0115   Epoch: 13   Global Step: 548030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:18:01,136-Speed 2638.11 samples/sec   Loss 4.6029   LearningRate 0.0115   Epoch: 13   Global Step: 548040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:05,036-Speed 2626.23 samples/sec   Loss 4.5810   LearningRate 0.0115   Epoch: 13   Global Step: 548050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:08,933-Speed 2627.80 samples/sec   Loss 4.6038   LearningRate 0.0115   Epoch: 13   Global Step: 548060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:12,906-Speed 2578.81 samples/sec   Loss 4.5129   LearningRate 0.0115   Epoch: 13   Global Step: 548070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:16,883-Speed 2575.00 samples/sec   Loss 4.6650   LearningRate 0.0115   Epoch: 13   Global Step: 548080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:20,799-Speed 2616.07 samples/sec   Loss 4.5707   LearningRate 0.0115   Epoch: 13   Global Step: 548090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:24,702-Speed 2624.32 samples/sec   Loss 4.6478   LearningRate 0.0115   Epoch: 13   Global Step: 548100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:28,627-Speed 2609.77 samples/sec   Loss 4.5190   LearningRate 0.0115   Epoch: 13   Global Step: 548110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:32,563-Speed 2602.23 samples/sec   Loss 4.5549   LearningRate 0.0115   Epoch: 13   Global Step: 548120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:36,589-Speed 2543.97 samples/sec   Loss 4.4995   LearningRate 0.0115   Epoch: 13   Global Step: 548130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:18:40,460-Speed 2646.28 samples/sec   Loss 4.5357   LearningRate 0.0115   Epoch: 13   Global Step: 548140   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:18:44,367-Speed 2620.76 samples/sec   Loss 4.6216   LearningRate 0.0115   Epoch: 13   Global Step: 548150   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:18:48,272-Speed 2623.46 samples/sec   Loss 4.5543   LearningRate 0.0115   Epoch: 13   Global Step: 548160   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:18:52,179-Speed 2621.92 samples/sec   Loss 4.5744   LearningRate 0.0115   Epoch: 13   Global Step: 548170   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:18:56,081-Speed 2624.95 samples/sec   Loss 4.5583   LearningRate 0.0115   Epoch: 13   Global Step: 548180   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:18:59,984-Speed 2623.96 samples/sec   Loss 4.6100   LearningRate 0.0115   Epoch: 13   Global Step: 548190   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:19:03,896-Speed 2618.63 samples/sec   Loss 4.5186   LearningRate 0.0115   Epoch: 13   Global Step: 548200   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:19:07,867-Speed 2578.97 samples/sec   Loss 4.5674   LearningRate 0.0115   Epoch: 13   Global Step: 548210   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:19:11,759-Speed 2631.57 samples/sec   Loss 4.5267   LearningRate 0.0115   Epoch: 13   Global Step: 548220   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:19:15,658-Speed 2627.23 samples/sec   Loss 4.5868   LearningRate 0.0115   Epoch: 13   Global Step: 548230   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:19:19,558-Speed 2626.28 samples/sec   Loss 4.6178   LearningRate 0.0115   Epoch: 13   Global Step: 548240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:23,458-Speed 2626.60 samples/sec   Loss 4.5556   LearningRate 0.0115   Epoch: 13   Global Step: 548250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:27,357-Speed 2627.45 samples/sec   Loss 4.5879   LearningRate 0.0115   Epoch: 13   Global Step: 548260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:31,258-Speed 2625.09 samples/sec   Loss 4.6140   LearningRate 0.0115   Epoch: 13   Global Step: 548270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:35,163-Speed 2623.08 samples/sec   Loss 4.5782   LearningRate 0.0115   Epoch: 13   Global Step: 548280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:39,064-Speed 2625.18 samples/sec   Loss 4.5603   LearningRate 0.0115   Epoch: 13   Global Step: 548290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:42,958-Speed 2630.41 samples/sec   Loss 4.6403   LearningRate 0.0115   Epoch: 13   Global Step: 548300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:46,853-Speed 2629.76 samples/sec   Loss 4.5730   LearningRate 0.0115   Epoch: 13   Global Step: 548310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:50,753-Speed 2626.20 samples/sec   Loss 4.4807   LearningRate 0.0115   Epoch: 13   Global Step: 548320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:54,650-Speed 2628.36 samples/sec   Loss 4.5297   LearningRate 0.0115   Epoch: 13   Global Step: 548330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:19:58,560-Speed 2620.15 samples/sec   Loss 4.5974   LearningRate 0.0115   Epoch: 13   Global Step: 548340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:02,463-Speed 2624.42 samples/sec   Loss 4.5361   LearningRate 0.0115   Epoch: 13   Global Step: 548350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:06,364-Speed 2625.46 samples/sec   Loss 4.5814   LearningRate 0.0115   Epoch: 13   Global Step: 548360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:10,256-Speed 2631.21 samples/sec   Loss 4.5915   LearningRate 0.0115   Epoch: 13   Global Step: 548370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:14,152-Speed 2629.42 samples/sec   Loss 4.5495   LearningRate 0.0115   Epoch: 13   Global Step: 548380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:18,056-Speed 2623.90 samples/sec   Loss 4.5228   LearningRate 0.0115   Epoch: 13   Global Step: 548390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:21,953-Speed 2628.73 samples/sec   Loss 4.5787   LearningRate 0.0115   Epoch: 13   Global Step: 548400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:25,847-Speed 2630.37 samples/sec   Loss 4.4884   LearningRate 0.0115   Epoch: 13   Global Step: 548410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:29,745-Speed 2627.62 samples/sec   Loss 4.5742   LearningRate 0.0115   Epoch: 13   Global Step: 548420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:20:33,628-Speed 2638.13 samples/sec   Loss 4.5012   LearningRate 0.0115   Epoch: 13   Global Step: 548430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:20:37,526-Speed 2627.50 samples/sec   Loss 4.5726   LearningRate 0.0115   Epoch: 13   Global Step: 548440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:20:41,424-Speed 2627.53 samples/sec   Loss 4.4518   LearningRate 0.0115   Epoch: 13   Global Step: 548450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:20:45,330-Speed 2621.92 samples/sec   Loss 4.5366   LearningRate 0.0115   Epoch: 13   Global Step: 548460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:20:49,236-Speed 2622.18 samples/sec   Loss 4.5527   LearningRate 0.0115   Epoch: 13   Global Step: 548470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:20:53,135-Speed 2627.51 samples/sec   Loss 4.5125   LearningRate 0.0115   Epoch: 13   Global Step: 548480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:20:57,032-Speed 2628.79 samples/sec   Loss 4.5479   LearningRate 0.0115   Epoch: 13   Global Step: 548490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:21:00,928-Speed 2629.47 samples/sec   Loss 4.5569   LearningRate 0.0115   Epoch: 13   Global Step: 548500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:21:04,826-Speed 2627.23 samples/sec   Loss 4.5501   LearningRate 0.0115   Epoch: 13   Global Step: 548510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:21:08,737-Speed 2619.13 samples/sec   Loss 4.4794   LearningRate 0.0115   Epoch: 13   Global Step: 548520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:21:12,631-Speed 2630.01 samples/sec   Loss 4.5713   LearningRate 0.0115   Epoch: 13   Global Step: 548530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:21:16,533-Speed 2625.23 samples/sec   Loss 4.5744   LearningRate 0.0115   Epoch: 13   Global Step: 548540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:21:20,428-Speed 2629.31 samples/sec   Loss 4.6683   LearningRate 0.0115   Epoch: 13   Global Step: 548550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:21:24,327-Speed 2627.75 samples/sec   Loss 4.6426   LearningRate 0.0115   Epoch: 13   Global Step: 548560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:21:28,220-Speed 2630.81 samples/sec   Loss 4.5708   LearningRate 0.0115   Epoch: 13   Global Step: 548570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:21:32,112-Speed 2631.70 samples/sec   Loss 4.6595   LearningRate 0.0115   Epoch: 13   Global Step: 548580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:21:36,005-Speed 2631.42 samples/sec   Loss 4.4422   LearningRate 0.0115   Epoch: 13   Global Step: 548590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:21:39,876-Speed 2645.48 samples/sec   Loss 4.5838   LearningRate 0.0115   Epoch: 13   Global Step: 548600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:21:43,750-Speed 2643.86 samples/sec   Loss 4.5581   LearningRate 0.0115   Epoch: 13   Global Step: 548610   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:21:47,660-Speed 2619.98 samples/sec   Loss 4.4956   LearningRate 0.0115   Epoch: 13   Global Step: 548620   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:21:51,570-Speed 2619.57 samples/sec   Loss 4.6605   LearningRate 0.0115   Epoch: 13   Global Step: 548630   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:21:55,463-Speed 2631.38 samples/sec   Loss 4.4886   LearningRate 0.0115   Epoch: 13   Global Step: 548640   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:21:59,362-Speed 2626.59 samples/sec   Loss 4.5491   LearningRate 0.0115   Epoch: 13   Global Step: 548650   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:22:03,258-Speed 2629.61 samples/sec   Loss 4.5724   LearningRate 0.0115   Epoch: 13   Global Step: 548660   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:22:07,156-Speed 2627.96 samples/sec   Loss 4.5127   LearningRate 0.0115   Epoch: 13   Global Step: 548670   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:22:11,051-Speed 2629.49 samples/sec   Loss 4.5297   LearningRate 0.0115   Epoch: 13   Global Step: 548680   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:22:15,080-Speed 2541.64 samples/sec   Loss 4.5555   LearningRate 0.0115   Epoch: 13   Global Step: 548690   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:22:18,970-Speed 2633.60 samples/sec   Loss 4.4596   LearningRate 0.0115   Epoch: 13   Global Step: 548700   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-15 09:22:22,863-Speed 2631.35 samples/sec   Loss 4.5708   LearningRate 0.0115   Epoch: 13   Global Step: 548710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:26,769-Speed 2622.34 samples/sec   Loss 4.6493   LearningRate 0.0115   Epoch: 13   Global Step: 548720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:30,675-Speed 2622.29 samples/sec   Loss 4.5300   LearningRate 0.0115   Epoch: 13   Global Step: 548730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:34,571-Speed 2628.65 samples/sec   Loss 4.6124   LearningRate 0.0115   Epoch: 13   Global Step: 548740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:38,467-Speed 2629.87 samples/sec   Loss 4.5153   LearningRate 0.0115   Epoch: 13   Global Step: 548750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:42,364-Speed 2627.82 samples/sec   Loss 4.5389   LearningRate 0.0115   Epoch: 13   Global Step: 548760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:46,263-Speed 2627.35 samples/sec   Loss 4.4703   LearningRate 0.0115   Epoch: 13   Global Step: 548770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:50,159-Speed 2628.95 samples/sec   Loss 4.5303   LearningRate 0.0115   Epoch: 13   Global Step: 548780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:54,060-Speed 2626.68 samples/sec   Loss 4.4939   LearningRate 0.0115   Epoch: 13   Global Step: 548790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:22:57,958-Speed 2627.82 samples/sec   Loss 4.4669   LearningRate 0.0115   Epoch: 13   Global Step: 548800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:01,851-Speed 2631.29 samples/sec   Loss 4.5802   LearningRate 0.0115   Epoch: 13   Global Step: 548810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:23:05,729-Speed 2641.17 samples/sec   Loss 4.5837   LearningRate 0.0115   Epoch: 13   Global Step: 548820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:09,629-Speed 2626.47 samples/sec   Loss 4.5131   LearningRate 0.0115   Epoch: 13   Global Step: 548830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:13,519-Speed 2633.02 samples/sec   Loss 4.6204   LearningRate 0.0115   Epoch: 13   Global Step: 548840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:17,411-Speed 2631.46 samples/sec   Loss 4.5321   LearningRate 0.0115   Epoch: 13   Global Step: 548850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:21,311-Speed 2627.01 samples/sec   Loss 4.5181   LearningRate 0.0115   Epoch: 13   Global Step: 548860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:25,206-Speed 2629.69 samples/sec   Loss 4.5799   LearningRate 0.0114   Epoch: 13   Global Step: 548870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:29,098-Speed 2632.96 samples/sec   Loss 4.6164   LearningRate 0.0114   Epoch: 13   Global Step: 548880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:32,995-Speed 2628.36 samples/sec   Loss 4.5356   LearningRate 0.0114   Epoch: 13   Global Step: 548890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:36,892-Speed 2627.84 samples/sec   Loss 4.5458   LearningRate 0.0114   Epoch: 13   Global Step: 548900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:40,783-Speed 2631.98 samples/sec   Loss 4.6241   LearningRate 0.0114   Epoch: 13   Global Step: 548910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:23:44,684-Speed 2625.92 samples/sec   Loss 4.5031   LearningRate 0.0114   Epoch: 13   Global Step: 548920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:23:48,635-Speed 2592.78 samples/sec   Loss 4.5326   LearningRate 0.0114   Epoch: 13   Global Step: 548930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:23:52,661-Speed 2543.81 samples/sec   Loss 4.5625   LearningRate 0.0114   Epoch: 13   Global Step: 548940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:23:56,637-Speed 2576.85 samples/sec   Loss 4.5518   LearningRate 0.0114   Epoch: 13   Global Step: 548950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:24:00,527-Speed 2632.80 samples/sec   Loss 4.6477   LearningRate 0.0114   Epoch: 13   Global Step: 548960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:24:04,433-Speed 2622.57 samples/sec   Loss 4.6187   LearningRate 0.0114   Epoch: 13   Global Step: 548970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:24:08,319-Speed 2635.59 samples/sec   Loss 4.5314   LearningRate 0.0114   Epoch: 13   Global Step: 548980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:12,212-Speed 2631.22 samples/sec   Loss 4.5784   LearningRate 0.0114   Epoch: 13   Global Step: 548990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:16,105-Speed 2630.81 samples/sec   Loss 4.7314   LearningRate 0.0114   Epoch: 13   Global Step: 549000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:20,013-Speed 2621.28 samples/sec   Loss 4.6069   LearningRate 0.0114   Epoch: 13   Global Step: 549010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:23,908-Speed 2629.24 samples/sec   Loss 4.5969   LearningRate 0.0114   Epoch: 13   Global Step: 549020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:27,802-Speed 2631.78 samples/sec   Loss 4.5708   LearningRate 0.0114   Epoch: 13   Global Step: 549030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:31,768-Speed 2582.73 samples/sec   Loss 4.5821   LearningRate 0.0114   Epoch: 13   Global Step: 549040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:35,666-Speed 2627.14 samples/sec   Loss 4.5582   LearningRate 0.0114   Epoch: 13   Global Step: 549050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:39,577-Speed 2619.11 samples/sec   Loss 4.5234   LearningRate 0.0114   Epoch: 13   Global Step: 549060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:43,550-Speed 2578.60 samples/sec   Loss 4.5329   LearningRate 0.0114   Epoch: 13   Global Step: 549070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:47,693-Speed 2471.78 samples/sec   Loss 4.5437   LearningRate 0.0114   Epoch: 13   Global Step: 549080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-15 09:24:51,587-Speed 2630.77 samples/sec   Loss 4.5071   LearningRate 0.0114   Epoch: 13   Global Step: 549090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:55,479-Speed 2631.58 samples/sec   Loss 4.6481   LearningRate 0.0114   Epoch: 13   Global Step: 549100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:24:59,385-Speed 2623.17 samples/sec   Loss 4.6158   LearningRate 0.0114   Epoch: 13   Global Step: 549110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:25:03,297-Speed 2617.86 samples/sec   Loss 4.5894   LearningRate 0.0114   Epoch: 13   Global Step: 549120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:25:07,196-Speed 2626.92 samples/sec   Loss 4.4393   LearningRate 0.0114   Epoch: 13   Global Step: 549130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:25:11,165-Speed 2580.36 samples/sec   Loss 4.5138   LearningRate 0.0114   Epoch: 13   Global Step: 549140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:25:15,079-Speed 2617.15 samples/sec   Loss 4.5070   LearningRate 0.0114   Epoch: 13   Global Step: 549150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:25:18,978-Speed 2627.52 samples/sec   Loss 4.5895   LearningRate 0.0114   Epoch: 13   Global Step: 549160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-15 09:25:22,875-Speed 2628.26 samples/sec   Loss 4.5770   LearningRate 0.0114   Epoch: 13   Global Step: 549170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:25:26,776-Speed 2625.35 samples/sec   Loss 4.6142   LearningRate 0.0114   Epoch: 13   Global Step: 549180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:25:30,684-Speed 2621.10 samples/sec   Loss 4.5250   LearningRate 0.0114   Epoch: 13   Global Step: 549190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:25:34,590-Speed 2622.22 samples/sec   Loss 4.5887   LearningRate 0.0114   Epoch: 13   Global Step: 549200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:25:38,497-Speed 2621.58 samples/sec   Loss 4.4481   LearningRate 0.0114   Epoch: 13   Global Step: 549210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:25:42,549-Speed 2528.11 samples/sec   Loss 4.6054   LearningRate 0.0114   Epoch: 13   Global Step: 549220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:25:46,451-Speed 2624.12 samples/sec   Loss 4.5126   LearningRate 0.0114   Epoch: 13   Global Step: 549230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:25:50,358-Speed 2622.21 samples/sec   Loss 4.5668   LearningRate 0.0114   Epoch: 13   Global Step: 549240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:25:54,279-Speed 2612.00 samples/sec   Loss 4.5684   LearningRate 0.0114   Epoch: 13   Global Step: 549250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:25:58,193-Speed 2617.18 samples/sec   Loss 4.6187   LearningRate 0.0114   Epoch: 13   Global Step: 549260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:26:02,084-Speed 2632.26 samples/sec   Loss 4.4928   LearningRate 0.0114   Epoch: 13   Global Step: 549270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:26:05,980-Speed 2629.44 samples/sec   Loss 4.4926   LearningRate 0.0114   Epoch: 13   Global Step: 549280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:26:09,880-Speed 2625.83 samples/sec   Loss 4.6451   LearningRate 0.0114   Epoch: 13   Global Step: 549290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:26:13,776-Speed 2629.08 samples/sec   Loss 4.6611   LearningRate 0.0114   Epoch: 13   Global Step: 549300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:26:17,670-Speed 2630.99 samples/sec   Loss 4.5098   LearningRate 0.0114   Epoch: 13   Global Step: 549310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:26:21,538-Speed 2647.76 samples/sec   Loss 4.5170   LearningRate 0.0114   Epoch: 13   Global Step: 549320   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:25,429-Speed 2632.88 samples/sec   Loss 4.5879   LearningRate 0.0114   Epoch: 13   Global Step: 549330   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:29,322-Speed 2630.54 samples/sec   Loss 4.5702   LearningRate 0.0114   Epoch: 13   Global Step: 549340   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:33,223-Speed 2626.04 samples/sec   Loss 4.5434   LearningRate 0.0114   Epoch: 13   Global Step: 549350   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:37,153-Speed 2605.92 samples/sec   Loss 4.5882   LearningRate 0.0114   Epoch: 13   Global Step: 549360   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:41,045-Speed 2632.08 samples/sec   Loss 4.5162   LearningRate 0.0114   Epoch: 13   Global Step: 549370   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:44,938-Speed 2631.26 samples/sec   Loss 4.5899   LearningRate 0.0114   Epoch: 13   Global Step: 549380   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:48,830-Speed 2631.78 samples/sec   Loss 4.4946   LearningRate 0.0114   Epoch: 13   Global Step: 549390   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:52,724-Speed 2629.97 samples/sec   Loss 4.5241   LearningRate 0.0114   Epoch: 13   Global Step: 549400   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:26:56,637-Speed 2617.99 samples/sec   Loss 4.5546   LearningRate 0.0114   Epoch: 13   Global Step: 549410   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:27:00,531-Speed 2630.54 samples/sec   Loss 4.5002   LearningRate 0.0114   Epoch: 13   Global Step: 549420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:04,424-Speed 2631.41 samples/sec   Loss 4.5062   LearningRate 0.0114   Epoch: 13   Global Step: 549430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:08,316-Speed 2631.37 samples/sec   Loss 4.5068   LearningRate 0.0114   Epoch: 13   Global Step: 549440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:12,217-Speed 2625.81 samples/sec   Loss 4.5308   LearningRate 0.0114   Epoch: 13   Global Step: 549450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:16,106-Speed 2633.19 samples/sec   Loss 4.6409   LearningRate 0.0114   Epoch: 13   Global Step: 549460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:20,012-Speed 2622.21 samples/sec   Loss 4.6306   LearningRate 0.0114   Epoch: 13   Global Step: 549470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:23,917-Speed 2623.24 samples/sec   Loss 4.5814   LearningRate 0.0114   Epoch: 13   Global Step: 549480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:27,814-Speed 2628.80 samples/sec   Loss 4.6424   LearningRate 0.0114   Epoch: 13   Global Step: 549490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:31,711-Speed 2628.07 samples/sec   Loss 4.4670   LearningRate 0.0114   Epoch: 13   Global Step: 549500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:35,623-Speed 2618.41 samples/sec   Loss 4.6011   LearningRate 0.0114   Epoch: 13   Global Step: 549510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:27:39,531-Speed 2620.60 samples/sec   Loss 4.4637   LearningRate 0.0114   Epoch: 13   Global Step: 549520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:27:43,427-Speed 2629.50 samples/sec   Loss 4.5986   LearningRate 0.0114   Epoch: 13   Global Step: 549530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:27:47,321-Speed 2630.61 samples/sec   Loss 4.5092   LearningRate 0.0114   Epoch: 13   Global Step: 549540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:27:51,212-Speed 2631.76 samples/sec   Loss 4.5478   LearningRate 0.0114   Epoch: 13   Global Step: 549550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:27:55,113-Speed 2626.17 samples/sec   Loss 4.4888   LearningRate 0.0114   Epoch: 13   Global Step: 549560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:27:59,013-Speed 2626.12 samples/sec   Loss 4.5548   LearningRate 0.0114   Epoch: 13   Global Step: 549570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:02,924-Speed 2619.19 samples/sec   Loss 4.4965   LearningRate 0.0114   Epoch: 13   Global Step: 549580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:06,827-Speed 2624.23 samples/sec   Loss 4.6026   LearningRate 0.0114   Epoch: 13   Global Step: 549590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:10,722-Speed 2629.68 samples/sec   Loss 4.5019   LearningRate 0.0114   Epoch: 13   Global Step: 549600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:14,617-Speed 2629.08 samples/sec   Loss 4.5761   LearningRate 0.0114   Epoch: 13   Global Step: 549610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:18,498-Speed 2639.76 samples/sec   Loss 4.6651   LearningRate 0.0114   Epoch: 13   Global Step: 549620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:22,394-Speed 2629.01 samples/sec   Loss 4.5909   LearningRate 0.0114   Epoch: 13   Global Step: 549630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:26,376-Speed 2571.94 samples/sec   Loss 4.5730   LearningRate 0.0114   Epoch: 13   Global Step: 549640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:28:30,249-Speed 2644.26 samples/sec   Loss 4.5019   LearningRate 0.0114   Epoch: 13   Global Step: 549650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:28:34,142-Speed 2632.04 samples/sec   Loss 4.5343   LearningRate 0.0114   Epoch: 13   Global Step: 549660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:28:38,041-Speed 2627.31 samples/sec   Loss 4.4885   LearningRate 0.0114   Epoch: 13   Global Step: 549670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:28:41,937-Speed 2628.63 samples/sec   Loss 4.5199   LearningRate 0.0114   Epoch: 13   Global Step: 549680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:28:45,837-Speed 2626.47 samples/sec   Loss 4.6210   LearningRate 0.0114   Epoch: 13   Global Step: 549690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:28:49,737-Speed 2626.21 samples/sec   Loss 4.6058   LearningRate 0.0114   Epoch: 13   Global Step: 549700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:28:53,633-Speed 2629.35 samples/sec   Loss 4.4854   LearningRate 0.0114   Epoch: 13   Global Step: 549710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:28:57,542-Speed 2620.04 samples/sec   Loss 4.5351   LearningRate 0.0114   Epoch: 13   Global Step: 549720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:01,437-Speed 2629.98 samples/sec   Loss 4.6226   LearningRate 0.0114   Epoch: 13   Global Step: 549730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:05,339-Speed 2625.02 samples/sec   Loss 4.5932   LearningRate 0.0114   Epoch: 13   Global Step: 549740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:09,235-Speed 2629.19 samples/sec   Loss 4.5315   LearningRate 0.0114   Epoch: 13   Global Step: 549750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:29:13,141-Speed 2622.98 samples/sec   Loss 4.5463   LearningRate 0.0114   Epoch: 13   Global Step: 549760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:29:17,012-Speed 2645.59 samples/sec   Loss 4.5443   LearningRate 0.0114   Epoch: 13   Global Step: 549770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:20,932-Speed 2613.60 samples/sec   Loss 4.5001   LearningRate 0.0114   Epoch: 13   Global Step: 549780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:24,826-Speed 2630.12 samples/sec   Loss 4.6065   LearningRate 0.0114   Epoch: 13   Global Step: 549790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:28,726-Speed 2626.33 samples/sec   Loss 4.5859   LearningRate 0.0114   Epoch: 13   Global Step: 549800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:32,621-Speed 2629.55 samples/sec   Loss 4.5082   LearningRate 0.0114   Epoch: 13   Global Step: 549810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:36,514-Speed 2631.64 samples/sec   Loss 4.5553   LearningRate 0.0114   Epoch: 13   Global Step: 549820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:40,429-Speed 2615.74 samples/sec   Loss 4.5938   LearningRate 0.0114   Epoch: 13   Global Step: 549830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:44,337-Speed 2621.20 samples/sec   Loss 4.6579   LearningRate 0.0114   Epoch: 13   Global Step: 549840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:29:48,208-Speed 2646.15 samples/sec   Loss 4.5233   LearningRate 0.0114   Epoch: 13   Global Step: 549850   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:29:52,103-Speed 2629.75 samples/sec   Loss 4.5159   LearningRate 0.0114   Epoch: 13   Global Step: 549860   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:29:55,996-Speed 2631.04 samples/sec   Loss 4.4611   LearningRate 0.0114   Epoch: 13   Global Step: 549870   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:29:59,890-Speed 2630.23 samples/sec   Loss 4.5345   LearningRate 0.0114   Epoch: 13   Global Step: 549880   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:30:03,787-Speed 2628.20 samples/sec   Loss 4.5579   LearningRate 0.0114   Epoch: 13   Global Step: 549890   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:30:07,756-Speed 2580.81 samples/sec   Loss 4.5255   LearningRate 0.0114   Epoch: 13   Global Step: 549900   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:30:11,660-Speed 2623.54 samples/sec   Loss 4.5333   LearningRate 0.0114   Epoch: 13   Global Step: 549910   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:30:15,560-Speed 2626.29 samples/sec   Loss 4.4390   LearningRate 0.0114   Epoch: 13   Global Step: 549920   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:30:19,457-Speed 2629.04 samples/sec   Loss 4.4909   LearningRate 0.0114   Epoch: 13   Global Step: 549930   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:30:23,357-Speed 2625.76 samples/sec   Loss 4.5241   LearningRate 0.0114   Epoch: 13   Global Step: 549940   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:30:27,247-Speed 2633.50 samples/sec   Loss 4.5596   LearningRate 0.0114   Epoch: 13   Global Step: 549950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:30:31,140-Speed 2631.21 samples/sec   Loss 4.6035   LearningRate 0.0114   Epoch: 13   Global Step: 549960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:30:35,044-Speed 2622.97 samples/sec   Loss 4.5768   LearningRate 0.0114   Epoch: 13   Global Step: 549970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:30:38,962-Speed 2614.46 samples/sec   Loss 4.4665   LearningRate 0.0114   Epoch: 13   Global Step: 549980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:30:42,875-Speed 2617.84 samples/sec   Loss 4.5623   LearningRate 0.0114   Epoch: 13   Global Step: 549990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:30:46,773-Speed 2627.44 samples/sec   Loss 4.4588   LearningRate 0.0114   Epoch: 13   Global Step: 550000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:31:30,044-[lfw][550000]XNorm: 22.308013
Training: 2022-04-15 09:31:30,045-[lfw][550000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-15 09:31:30,045-[lfw][550000]Accuracy-Highest: 0.99800
Training: 2022-04-15 09:32:20,260-[cfp_fp][550000]XNorm: 20.862175
Training: 2022-04-15 09:32:20,261-[cfp_fp][550000]Accuracy-Flip: 0.99071+-0.00483
Training: 2022-04-15 09:32:20,263-[cfp_fp][550000]Accuracy-Highest: 0.99086
Training: 2022-04-15 09:33:03,531-[agedb_30][550000]XNorm: 22.551556
Training: 2022-04-15 09:33:03,532-[agedb_30][550000]Accuracy-Flip: 0.97850+-0.00762
Training: 2022-04-15 09:33:03,532-[agedb_30][550000]Accuracy-Highest: 0.98083
Training: 2022-04-15 09:33:07,402-Speed 72.82 samples/sec   Loss 4.4630   LearningRate 0.0114   Epoch: 13   Global Step: 550010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:11,304-Speed 2633.78 samples/sec   Loss 4.6039   LearningRate 0.0114   Epoch: 13   Global Step: 550020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:15,241-Speed 2601.63 samples/sec   Loss 4.6053   LearningRate 0.0114   Epoch: 13   Global Step: 550030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:19,149-Speed 2621.01 samples/sec   Loss 4.5317   LearningRate 0.0114   Epoch: 13   Global Step: 550040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:22,996-Speed 2662.75 samples/sec   Loss 4.5080   LearningRate 0.0114   Epoch: 13   Global Step: 550050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:26,867-Speed 2645.71 samples/sec   Loss 4.4993   LearningRate 0.0114   Epoch: 13   Global Step: 550060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:30,748-Speed 2639.28 samples/sec   Loss 4.5862   LearningRate 0.0114   Epoch: 13   Global Step: 550070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:34,628-Speed 2640.11 samples/sec   Loss 4.6174   LearningRate 0.0114   Epoch: 13   Global Step: 550080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:38,571-Speed 2598.99 samples/sec   Loss 4.5927   LearningRate 0.0114   Epoch: 13   Global Step: 550090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:42,645-Speed 2513.97 samples/sec   Loss 4.5261   LearningRate 0.0113   Epoch: 13   Global Step: 550100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:46,653-Speed 2556.09 samples/sec   Loss 4.5855   LearningRate 0.0113   Epoch: 13   Global Step: 550110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:50,530-Speed 2642.39 samples/sec   Loss 4.4566   LearningRate 0.0113   Epoch: 13   Global Step: 550120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:54,409-Speed 2640.02 samples/sec   Loss 4.4354   LearningRate 0.0113   Epoch: 13   Global Step: 550130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:33:58,335-Speed 2609.46 samples/sec   Loss 4.5951   LearningRate 0.0113   Epoch: 13   Global Step: 550140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:02,230-Speed 2629.89 samples/sec   Loss 4.5657   LearningRate 0.0113   Epoch: 13   Global Step: 550150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:34:06,113-Speed 2638.56 samples/sec   Loss 4.5609   LearningRate 0.0113   Epoch: 13   Global Step: 550160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:34:09,992-Speed 2641.01 samples/sec   Loss 4.6082   LearningRate 0.0113   Epoch: 13   Global Step: 550170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:34:13,884-Speed 2631.78 samples/sec   Loss 4.5367   LearningRate 0.0113   Epoch: 13   Global Step: 550180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:34:17,771-Speed 2634.69 samples/sec   Loss 4.5682   LearningRate 0.0113   Epoch: 13   Global Step: 550190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:34:21,668-Speed 2628.59 samples/sec   Loss 4.5381   LearningRate 0.0113   Epoch: 13   Global Step: 550200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:25,553-Speed 2636.07 samples/sec   Loss 4.5576   LearningRate 0.0113   Epoch: 13   Global Step: 550210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:29,437-Speed 2637.45 samples/sec   Loss 4.6566   LearningRate 0.0113   Epoch: 13   Global Step: 550220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:33,323-Speed 2635.97 samples/sec   Loss 4.4618   LearningRate 0.0113   Epoch: 13   Global Step: 550230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:37,209-Speed 2635.89 samples/sec   Loss 4.5766   LearningRate 0.0113   Epoch: 13   Global Step: 550240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:41,095-Speed 2636.03 samples/sec   Loss 4.5413   LearningRate 0.0113   Epoch: 13   Global Step: 550250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:45,056-Speed 2585.81 samples/sec   Loss 4.5459   LearningRate 0.0113   Epoch: 13   Global Step: 550260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:48,941-Speed 2636.07 samples/sec   Loss 4.5150   LearningRate 0.0113   Epoch: 13   Global Step: 550270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:52,824-Speed 2637.93 samples/sec   Loss 4.4779   LearningRate 0.0113   Epoch: 13   Global Step: 550280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:34:56,712-Speed 2634.25 samples/sec   Loss 4.5118   LearningRate 0.0113   Epoch: 13   Global Step: 550290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:00,601-Speed 2634.13 samples/sec   Loss 4.5939   LearningRate 0.0113   Epoch: 13   Global Step: 550300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:35:04,490-Speed 2633.82 samples/sec   Loss 4.6125   LearningRate 0.0113   Epoch: 13   Global Step: 550310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:35:08,399-Speed 2620.63 samples/sec   Loss 4.4293   LearningRate 0.0113   Epoch: 13   Global Step: 550320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:35:12,285-Speed 2635.51 samples/sec   Loss 4.5844   LearningRate 0.0113   Epoch: 13   Global Step: 550330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:35:16,269-Speed 2570.77 samples/sec   Loss 4.6051   LearningRate 0.0113   Epoch: 13   Global Step: 550340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:20,332-Speed 2520.83 samples/sec   Loss 4.5188   LearningRate 0.0113   Epoch: 13   Global Step: 550350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:24,301-Speed 2580.90 samples/sec   Loss 4.3770   LearningRate 0.0113   Epoch: 13   Global Step: 550360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:28,189-Speed 2634.92 samples/sec   Loss 4.6249   LearningRate 0.0113   Epoch: 13   Global Step: 550370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:32,086-Speed 2628.70 samples/sec   Loss 4.4695   LearningRate 0.0113   Epoch: 13   Global Step: 550380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:35,968-Speed 2638.54 samples/sec   Loss 4.5129   LearningRate 0.0113   Epoch: 13   Global Step: 550390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:39,862-Speed 2630.19 samples/sec   Loss 4.4624   LearningRate 0.0113   Epoch: 13   Global Step: 550400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:43,746-Speed 2636.84 samples/sec   Loss 4.4917   LearningRate 0.0113   Epoch: 13   Global Step: 550410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:47,636-Speed 2633.13 samples/sec   Loss 4.6036   LearningRate 0.0113   Epoch: 13   Global Step: 550420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:51,536-Speed 2626.78 samples/sec   Loss 4.5259   LearningRate 0.0113   Epoch: 13   Global Step: 550430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:35:55,428-Speed 2631.63 samples/sec   Loss 4.5368   LearningRate 0.0113   Epoch: 13   Global Step: 550440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:35:59,318-Speed 2633.56 samples/sec   Loss 4.5907   LearningRate 0.0113   Epoch: 13   Global Step: 550450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:36:03,213-Speed 2629.63 samples/sec   Loss 4.4847   LearningRate 0.0113   Epoch: 13   Global Step: 550460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:36:07,103-Speed 2638.07 samples/sec   Loss 4.4955   LearningRate 0.0113   Epoch: 13   Global Step: 550470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:36:10,962-Speed 2653.94 samples/sec   Loss 4.5022   LearningRate 0.0113   Epoch: 13   Global Step: 550480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:36:15,038-Speed 2512.92 samples/sec   Loss 4.5539   LearningRate 0.0113   Epoch: 13   Global Step: 550490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:36:18,921-Speed 2637.81 samples/sec   Loss 4.5367   LearningRate 0.0113   Epoch: 13   Global Step: 550500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:36:22,807-Speed 2635.99 samples/sec   Loss 4.5598   LearningRate 0.0113   Epoch: 13   Global Step: 550510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:36:26,694-Speed 2634.72 samples/sec   Loss 4.6121   LearningRate 0.0113   Epoch: 13   Global Step: 550520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:36:30,579-Speed 2637.51 samples/sec   Loss 4.6273   LearningRate 0.0113   Epoch: 13   Global Step: 550530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:36:34,476-Speed 2628.13 samples/sec   Loss 4.4915   LearningRate 0.0113   Epoch: 13   Global Step: 550540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:36:38,335-Speed 2654.57 samples/sec   Loss 4.5832   LearningRate 0.0113   Epoch: 13   Global Step: 550550   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:36:42,227-Speed 2631.67 samples/sec   Loss 4.5976   LearningRate 0.0113   Epoch: 13   Global Step: 550560   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:36:46,113-Speed 2635.55 samples/sec   Loss 4.5603   LearningRate 0.0113   Epoch: 13   Global Step: 550570   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:36:49,996-Speed 2637.74 samples/sec   Loss 4.4799   LearningRate 0.0113   Epoch: 13   Global Step: 550580   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:36:53,881-Speed 2636.91 samples/sec   Loss 4.4793   LearningRate 0.0113   Epoch: 13   Global Step: 550590   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:36:57,770-Speed 2633.97 samples/sec   Loss 4.4744   LearningRate 0.0113   Epoch: 13   Global Step: 550600   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:01,667-Speed 2628.34 samples/sec   Loss 4.6310   LearningRate 0.0113   Epoch: 13   Global Step: 550610   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:05,559-Speed 2631.89 samples/sec   Loss 4.5442   LearningRate 0.0113   Epoch: 13   Global Step: 550620   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:09,440-Speed 2638.52 samples/sec   Loss 4.5522   LearningRate 0.0113   Epoch: 13   Global Step: 550630   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:13,328-Speed 2634.76 samples/sec   Loss 4.5238   LearningRate 0.0113   Epoch: 13   Global Step: 550640   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:17,227-Speed 2626.80 samples/sec   Loss 4.5130   LearningRate 0.0113   Epoch: 13   Global Step: 550650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:37:21,115-Speed 2635.04 samples/sec   Loss 4.5381   LearningRate 0.0113   Epoch: 13   Global Step: 550660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:37:25,002-Speed 2634.72 samples/sec   Loss 4.5520   LearningRate 0.0113   Epoch: 13   Global Step: 550670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:37:28,900-Speed 2627.36 samples/sec   Loss 4.5810   LearningRate 0.0113   Epoch: 13   Global Step: 550680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:37:32,796-Speed 2628.81 samples/sec   Loss 4.4741   LearningRate 0.0113   Epoch: 13   Global Step: 550690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:37:36,696-Speed 2627.04 samples/sec   Loss 4.6323   LearningRate 0.0113   Epoch: 13   Global Step: 550700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:37:40,571-Speed 2643.06 samples/sec   Loss 4.5552   LearningRate 0.0113   Epoch: 13   Global Step: 550710   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:44,463-Speed 2631.43 samples/sec   Loss 4.5811   LearningRate 0.0113   Epoch: 13   Global Step: 550720   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:48,352-Speed 2633.82 samples/sec   Loss 4.4352   LearningRate 0.0113   Epoch: 13   Global Step: 550730   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:52,239-Speed 2635.18 samples/sec   Loss 4.5380   LearningRate 0.0113   Epoch: 13   Global Step: 550740   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:37:56,122-Speed 2637.35 samples/sec   Loss 4.4353   LearningRate 0.0113   Epoch: 13   Global Step: 550750   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:38:00,013-Speed 2633.27 samples/sec   Loss 4.4065   LearningRate 0.0113   Epoch: 13   Global Step: 550760   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:38:03,898-Speed 2636.59 samples/sec   Loss 4.4738   LearningRate 0.0113   Epoch: 13   Global Step: 550770   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:38:07,788-Speed 2633.26 samples/sec   Loss 4.5380   LearningRate 0.0113   Epoch: 13   Global Step: 550780   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:38:11,675-Speed 2635.00 samples/sec   Loss 4.5671   LearningRate 0.0113   Epoch: 13   Global Step: 550790   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:38:15,569-Speed 2630.38 samples/sec   Loss 4.6002   LearningRate 0.0113   Epoch: 13   Global Step: 550800   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:38:19,452-Speed 2637.46 samples/sec   Loss 4.5464   LearningRate 0.0113   Epoch: 13   Global Step: 550810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:23,361-Speed 2620.58 samples/sec   Loss 4.5423   LearningRate 0.0113   Epoch: 13   Global Step: 550820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:27,244-Speed 2637.76 samples/sec   Loss 4.4915   LearningRate 0.0113   Epoch: 13   Global Step: 550830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:31,126-Speed 2638.21 samples/sec   Loss 4.5240   LearningRate 0.0113   Epoch: 13   Global Step: 550840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:35,009-Speed 2637.34 samples/sec   Loss 4.5385   LearningRate 0.0113   Epoch: 13   Global Step: 550850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:38,892-Speed 2638.81 samples/sec   Loss 4.4860   LearningRate 0.0113   Epoch: 13   Global Step: 550860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:42,775-Speed 2637.18 samples/sec   Loss 4.5955   LearningRate 0.0113   Epoch: 13   Global Step: 550870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:46,665-Speed 2633.93 samples/sec   Loss 4.5491   LearningRate 0.0113   Epoch: 13   Global Step: 550880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:50,548-Speed 2637.29 samples/sec   Loss 4.5463   LearningRate 0.0113   Epoch: 13   Global Step: 550890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:54,430-Speed 2638.04 samples/sec   Loss 4.5698   LearningRate 0.0113   Epoch: 13   Global Step: 550900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:38:58,322-Speed 2631.95 samples/sec   Loss 4.4243   LearningRate 0.0113   Epoch: 13   Global Step: 550910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:39:02,183-Speed 2653.09 samples/sec   Loss 4.4930   LearningRate 0.0113   Epoch: 13   Global Step: 550920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:06,085-Speed 2625.05 samples/sec   Loss 4.5544   LearningRate 0.0113   Epoch: 13   Global Step: 550930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:09,968-Speed 2637.28 samples/sec   Loss 4.5033   LearningRate 0.0113   Epoch: 13   Global Step: 550940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:13,862-Speed 2630.91 samples/sec   Loss 4.5718   LearningRate 0.0113   Epoch: 13   Global Step: 550950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:17,752-Speed 2633.51 samples/sec   Loss 4.5083   LearningRate 0.0113   Epoch: 13   Global Step: 550960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:21,675-Speed 2610.73 samples/sec   Loss 4.6012   LearningRate 0.0113   Epoch: 13   Global Step: 550970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:25,569-Speed 2629.84 samples/sec   Loss 4.4946   LearningRate 0.0113   Epoch: 13   Global Step: 550980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:29,473-Speed 2623.53 samples/sec   Loss 4.5513   LearningRate 0.0113   Epoch: 13   Global Step: 550990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:33,362-Speed 2634.09 samples/sec   Loss 4.4951   LearningRate 0.0113   Epoch: 13   Global Step: 551000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:37,254-Speed 2631.62 samples/sec   Loss 4.5869   LearningRate 0.0113   Epoch: 13   Global Step: 551010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:39:41,230-Speed 2576.23 samples/sec   Loss 4.5124   LearningRate 0.0113   Epoch: 13   Global Step: 551020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:39:45,310-Speed 2510.29 samples/sec   Loss 4.4623   LearningRate 0.0113   Epoch: 13   Global Step: 551030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:39:49,394-Speed 2508.29 samples/sec   Loss 4.5500   LearningRate 0.0113   Epoch: 13   Global Step: 551040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:39:53,475-Speed 2510.06 samples/sec   Loss 4.4662   LearningRate 0.0113   Epoch: 13   Global Step: 551050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:39:57,518-Speed 2532.92 samples/sec   Loss 4.5634   LearningRate 0.0113   Epoch: 13   Global Step: 551060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:01,497-Speed 2575.01 samples/sec   Loss 4.6174   LearningRate 0.0113   Epoch: 13   Global Step: 551070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:05,384-Speed 2634.63 samples/sec   Loss 4.4628   LearningRate 0.0113   Epoch: 13   Global Step: 551080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:09,269-Speed 2636.44 samples/sec   Loss 4.5502   LearningRate 0.0113   Epoch: 13   Global Step: 551090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:13,181-Speed 2618.54 samples/sec   Loss 4.5904   LearningRate 0.0113   Epoch: 13   Global Step: 551100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:17,071-Speed 2633.59 samples/sec   Loss 4.5772   LearningRate 0.0113   Epoch: 13   Global Step: 551110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:20,964-Speed 2630.84 samples/sec   Loss 4.4989   LearningRate 0.0113   Epoch: 13   Global Step: 551120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:24,858-Speed 2630.03 samples/sec   Loss 4.5668   LearningRate 0.0113   Epoch: 13   Global Step: 551130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:28,741-Speed 2637.74 samples/sec   Loss 4.5421   LearningRate 0.0113   Epoch: 13   Global Step: 551140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:32,625-Speed 2637.19 samples/sec   Loss 4.4885   LearningRate 0.0113   Epoch: 13   Global Step: 551150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:40:36,511-Speed 2635.87 samples/sec   Loss 4.5101   LearningRate 0.0113   Epoch: 13   Global Step: 551160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:40:40,398-Speed 2634.60 samples/sec   Loss 4.4823   LearningRate 0.0113   Epoch: 13   Global Step: 551170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:40:44,284-Speed 2636.41 samples/sec   Loss 4.4853   LearningRate 0.0113   Epoch: 13   Global Step: 551180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:40:48,179-Speed 2629.57 samples/sec   Loss 4.5149   LearningRate 0.0113   Epoch: 13   Global Step: 551190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:40:52,066-Speed 2635.69 samples/sec   Loss 4.4617   LearningRate 0.0113   Epoch: 13   Global Step: 551200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:40:55,973-Speed 2621.25 samples/sec   Loss 4.5847   LearningRate 0.0113   Epoch: 13   Global Step: 551210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:40:59,846-Speed 2644.39 samples/sec   Loss 4.5725   LearningRate 0.0113   Epoch: 13   Global Step: 551220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:03,735-Speed 2633.96 samples/sec   Loss 4.5224   LearningRate 0.0113   Epoch: 13   Global Step: 551230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:07,624-Speed 2634.12 samples/sec   Loss 4.4283   LearningRate 0.0113   Epoch: 13   Global Step: 551240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:11,515-Speed 2632.10 samples/sec   Loss 4.4871   LearningRate 0.0113   Epoch: 13   Global Step: 551250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:15,424-Speed 2620.62 samples/sec   Loss 4.5996   LearningRate 0.0113   Epoch: 13   Global Step: 551260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:19,318-Speed 2630.33 samples/sec   Loss 4.5954   LearningRate 0.0113   Epoch: 13   Global Step: 551270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:23,215-Speed 2628.90 samples/sec   Loss 4.5037   LearningRate 0.0113   Epoch: 13   Global Step: 551280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:27,102-Speed 2634.84 samples/sec   Loss 4.5139   LearningRate 0.0113   Epoch: 13   Global Step: 551290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:31,003-Speed 2625.91 samples/sec   Loss 4.6073   LearningRate 0.0113   Epoch: 13   Global Step: 551300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:34,905-Speed 2624.73 samples/sec   Loss 4.4714   LearningRate 0.0113   Epoch: 13   Global Step: 551310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:38,798-Speed 2630.81 samples/sec   Loss 4.4685   LearningRate 0.0113   Epoch: 13   Global Step: 551320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:41:42,664-Speed 2650.42 samples/sec   Loss 4.5645   LearningRate 0.0113   Epoch: 13   Global Step: 551330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:46,553-Speed 2633.79 samples/sec   Loss 4.5987   LearningRate 0.0112   Epoch: 13   Global Step: 551340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:50,442-Speed 2634.04 samples/sec   Loss 4.6280   LearningRate 0.0112   Epoch: 13   Global Step: 551350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:54,331-Speed 2633.51 samples/sec   Loss 4.5056   LearningRate 0.0112   Epoch: 13   Global Step: 551360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:41:58,218-Speed 2635.24 samples/sec   Loss 4.5585   LearningRate 0.0112   Epoch: 13   Global Step: 551370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:42:02,110-Speed 2632.18 samples/sec   Loss 4.4598   LearningRate 0.0112   Epoch: 13   Global Step: 551380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:42:06,001-Speed 2632.00 samples/sec   Loss 4.4489   LearningRate 0.0112   Epoch: 13   Global Step: 551390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:42:09,900-Speed 2626.95 samples/sec   Loss 4.5131   LearningRate 0.0112   Epoch: 13   Global Step: 551400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:42:13,918-Speed 2549.35 samples/sec   Loss 4.5358   LearningRate 0.0112   Epoch: 13   Global Step: 551410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:42:17,824-Speed 2622.52 samples/sec   Loss 4.4265   LearningRate 0.0112   Epoch: 13   Global Step: 551420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:42:21,817-Speed 2565.11 samples/sec   Loss 4.6353   LearningRate 0.0112   Epoch: 13   Global Step: 551430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:42:25,728-Speed 2619.28 samples/sec   Loss 4.5256   LearningRate 0.0112   Epoch: 13   Global Step: 551440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:42:29,606-Speed 2642.35 samples/sec   Loss 4.4616   LearningRate 0.0112   Epoch: 13   Global Step: 551450   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:42:33,514-Speed 2621.07 samples/sec   Loss 4.4183   LearningRate 0.0112   Epoch: 13   Global Step: 551460   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:42:37,415-Speed 2625.48 samples/sec   Loss 4.5544   LearningRate 0.0112   Epoch: 13   Global Step: 551470   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:42:41,306-Speed 2632.52 samples/sec   Loss 4.5123   LearningRate 0.0112   Epoch: 13   Global Step: 551480   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:42:45,201-Speed 2629.37 samples/sec   Loss 4.4471   LearningRate 0.0112   Epoch: 13   Global Step: 551490   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:42:49,108-Speed 2621.68 samples/sec   Loss 4.5138   LearningRate 0.0112   Epoch: 13   Global Step: 551500   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:42:53,015-Speed 2621.96 samples/sec   Loss 4.5024   LearningRate 0.0112   Epoch: 13   Global Step: 551510   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:42:56,925-Speed 2620.35 samples/sec   Loss 4.6287   LearningRate 0.0112   Epoch: 13   Global Step: 551520   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:43:00,832-Speed 2621.58 samples/sec   Loss 4.5713   LearningRate 0.0112   Epoch: 13   Global Step: 551530   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:43:04,720-Speed 2634.39 samples/sec   Loss 4.5678   LearningRate 0.0112   Epoch: 13   Global Step: 551540   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:43:08,610-Speed 2632.95 samples/sec   Loss 4.5670   LearningRate 0.0112   Epoch: 13   Global Step: 551550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:12,513-Speed 2624.26 samples/sec   Loss 4.6525   LearningRate 0.0112   Epoch: 13   Global Step: 551560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:16,403-Speed 2633.06 samples/sec   Loss 4.5312   LearningRate 0.0112   Epoch: 13   Global Step: 551570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:20,308-Speed 2623.11 samples/sec   Loss 4.4854   LearningRate 0.0112   Epoch: 13   Global Step: 551580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:24,232-Speed 2610.30 samples/sec   Loss 4.3781   LearningRate 0.0112   Epoch: 13   Global Step: 551590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:28,127-Speed 2629.64 samples/sec   Loss 4.6162   LearningRate 0.0112   Epoch: 13   Global Step: 551600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:32,014-Speed 2634.73 samples/sec   Loss 4.5610   LearningRate 0.0112   Epoch: 13   Global Step: 551610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:35,905-Speed 2632.16 samples/sec   Loss 4.5135   LearningRate 0.0112   Epoch: 13   Global Step: 551620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:39,799-Speed 2630.07 samples/sec   Loss 4.5578   LearningRate 0.0112   Epoch: 13   Global Step: 551630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:43,699-Speed 2627.02 samples/sec   Loss 4.4929   LearningRate 0.0112   Epoch: 13   Global Step: 551640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:47,594-Speed 2629.72 samples/sec   Loss 4.5970   LearningRate 0.0112   Epoch: 13   Global Step: 551650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:43:51,460-Speed 2649.36 samples/sec   Loss 4.5177   LearningRate 0.0112   Epoch: 13   Global Step: 551660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:55,348-Speed 2634.21 samples/sec   Loss 4.5107   LearningRate 0.0112   Epoch: 13   Global Step: 551670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:43:59,260-Speed 2618.31 samples/sec   Loss 4.6349   LearningRate 0.0112   Epoch: 13   Global Step: 551680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:03,154-Speed 2630.16 samples/sec   Loss 4.5015   LearningRate 0.0112   Epoch: 13   Global Step: 551690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:07,067-Speed 2617.50 samples/sec   Loss 4.5453   LearningRate 0.0112   Epoch: 13   Global Step: 551700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:10,966-Speed 2626.78 samples/sec   Loss 4.4546   LearningRate 0.0112   Epoch: 13   Global Step: 551710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:14,906-Speed 2599.53 samples/sec   Loss 4.4838   LearningRate 0.0112   Epoch: 13   Global Step: 551720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:18,798-Speed 2632.55 samples/sec   Loss 4.5549   LearningRate 0.0112   Epoch: 13   Global Step: 551730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:22,689-Speed 2632.38 samples/sec   Loss 4.5436   LearningRate 0.0112   Epoch: 13   Global Step: 551740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:26,577-Speed 2634.38 samples/sec   Loss 4.5645   LearningRate 0.0112   Epoch: 13   Global Step: 551750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:30,471-Speed 2629.71 samples/sec   Loss 4.5401   LearningRate 0.0112   Epoch: 13   Global Step: 551760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:44:34,370-Speed 2627.48 samples/sec   Loss 4.4857   LearningRate 0.0112   Epoch: 13   Global Step: 551770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:44:38,268-Speed 2627.22 samples/sec   Loss 4.5661   LearningRate 0.0112   Epoch: 13   Global Step: 551780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:44:42,154-Speed 2635.49 samples/sec   Loss 4.5459   LearningRate 0.0112   Epoch: 13   Global Step: 551790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:46,052-Speed 2628.44 samples/sec   Loss 4.5355   LearningRate 0.0112   Epoch: 13   Global Step: 551800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:49,948-Speed 2629.04 samples/sec   Loss 4.5318   LearningRate 0.0112   Epoch: 13   Global Step: 551810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:53,856-Speed 2620.87 samples/sec   Loss 4.4368   LearningRate 0.0112   Epoch: 13   Global Step: 551820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:44:57,752-Speed 2629.46 samples/sec   Loss 4.4476   LearningRate 0.0112   Epoch: 13   Global Step: 551830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:01,647-Speed 2629.42 samples/sec   Loss 4.5616   LearningRate 0.0112   Epoch: 13   Global Step: 551840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:05,551-Speed 2622.91 samples/sec   Loss 4.5214   LearningRate 0.0112   Epoch: 13   Global Step: 551850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:09,443-Speed 2632.12 samples/sec   Loss 4.5429   LearningRate 0.0112   Epoch: 13   Global Step: 551860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:13,362-Speed 2613.27 samples/sec   Loss 4.5698   LearningRate 0.0112   Epoch: 13   Global Step: 551870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:17,254-Speed 2631.73 samples/sec   Loss 4.5522   LearningRate 0.0112   Epoch: 13   Global Step: 551880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:21,150-Speed 2629.32 samples/sec   Loss 4.4494   LearningRate 0.0112   Epoch: 13   Global Step: 551890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:45:25,021-Speed 2645.98 samples/sec   Loss 4.5249   LearningRate 0.0112   Epoch: 13   Global Step: 551900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:28,916-Speed 2629.13 samples/sec   Loss 4.4459   LearningRate 0.0112   Epoch: 13   Global Step: 551910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:32,812-Speed 2629.14 samples/sec   Loss 4.5604   LearningRate 0.0112   Epoch: 13   Global Step: 551920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:45:36,683-Speed 2645.56 samples/sec   Loss 4.5303   LearningRate 0.0112   Epoch: 13   Global Step: 551930   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:45:40,657-Speed 2577.60 samples/sec   Loss 4.4674   LearningRate 0.0112   Epoch: 13   Global Step: 551940   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:45:44,554-Speed 2628.36 samples/sec   Loss 4.4160   LearningRate 0.0112   Epoch: 13   Global Step: 551950   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:45:48,444-Speed 2633.13 samples/sec   Loss 4.5706   LearningRate 0.0112   Epoch: 13   Global Step: 551960   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:45:52,341-Speed 2628.34 samples/sec   Loss 4.5508   LearningRate 0.0112   Epoch: 13   Global Step: 551970   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:45:56,241-Speed 2626.28 samples/sec   Loss 4.4436   LearningRate 0.0112   Epoch: 13   Global Step: 551980   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:46:00,142-Speed 2625.48 samples/sec   Loss 4.4487   LearningRate 0.0112   Epoch: 13   Global Step: 551990   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:46:04,035-Speed 2630.53 samples/sec   Loss 4.4790   LearningRate 0.0112   Epoch: 13   Global Step: 552000   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:46:07,932-Speed 2628.32 samples/sec   Loss 4.5334   LearningRate 0.0112   Epoch: 13   Global Step: 552010   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:46:11,829-Speed 2628.68 samples/sec   Loss 4.4643   LearningRate 0.0112   Epoch: 13   Global Step: 552020   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:46:15,722-Speed 2631.36 samples/sec   Loss 4.4521   LearningRate 0.0112   Epoch: 13   Global Step: 552030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:19,615-Speed 2630.42 samples/sec   Loss 4.4931   LearningRate 0.0112   Epoch: 13   Global Step: 552040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:23,561-Speed 2596.46 samples/sec   Loss 4.5254   LearningRate 0.0112   Epoch: 13   Global Step: 552050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:27,471-Speed 2618.76 samples/sec   Loss 4.5332   LearningRate 0.0112   Epoch: 13   Global Step: 552060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:31,367-Speed 2629.44 samples/sec   Loss 4.4812   LearningRate 0.0112   Epoch: 13   Global Step: 552070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:35,261-Speed 2629.78 samples/sec   Loss 4.6360   LearningRate 0.0112   Epoch: 13   Global Step: 552080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:39,165-Speed 2624.04 samples/sec   Loss 4.4334   LearningRate 0.0112   Epoch: 13   Global Step: 552090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:43,056-Speed 2631.93 samples/sec   Loss 4.4635   LearningRate 0.0112   Epoch: 13   Global Step: 552100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:46,955-Speed 2627.29 samples/sec   Loss 4.5493   LearningRate 0.0112   Epoch: 13   Global Step: 552110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:50,849-Speed 2630.25 samples/sec   Loss 4.5913   LearningRate 0.0112   Epoch: 13   Global Step: 552120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:46:54,743-Speed 2630.37 samples/sec   Loss 4.5461   LearningRate 0.0112   Epoch: 13   Global Step: 552130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:46:58,639-Speed 2629.03 samples/sec   Loss 4.4652   LearningRate 0.0112   Epoch: 13   Global Step: 552140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:47:02,516-Speed 2641.83 samples/sec   Loss 4.5952   LearningRate 0.0112   Epoch: 13   Global Step: 552150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:06,429-Speed 2617.24 samples/sec   Loss 4.4784   LearningRate 0.0112   Epoch: 13   Global Step: 552160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:10,329-Speed 2626.27 samples/sec   Loss 4.4794   LearningRate 0.0112   Epoch: 13   Global Step: 552170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:14,225-Speed 2629.46 samples/sec   Loss 4.4625   LearningRate 0.0112   Epoch: 13   Global Step: 552180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:18,120-Speed 2629.49 samples/sec   Loss 4.5552   LearningRate 0.0112   Epoch: 13   Global Step: 552190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:22,015-Speed 2629.88 samples/sec   Loss 4.4428   LearningRate 0.0112   Epoch: 13   Global Step: 552200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:25,914-Speed 2626.51 samples/sec   Loss 4.4648   LearningRate 0.0112   Epoch: 13   Global Step: 552210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:29,820-Speed 2622.46 samples/sec   Loss 4.5098   LearningRate 0.0112   Epoch: 13   Global Step: 552220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:33,713-Speed 2631.04 samples/sec   Loss 4.4437   LearningRate 0.0112   Epoch: 13   Global Step: 552230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:37,611-Speed 2627.32 samples/sec   Loss 4.5080   LearningRate 0.0112   Epoch: 13   Global Step: 552240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:41,506-Speed 2629.45 samples/sec   Loss 4.5365   LearningRate 0.0112   Epoch: 13   Global Step: 552250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:47:45,401-Speed 2629.83 samples/sec   Loss 4.5267   LearningRate 0.0112   Epoch: 13   Global Step: 552260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:47:49,385-Speed 2570.94 samples/sec   Loss 4.5875   LearningRate 0.0112   Epoch: 13   Global Step: 552270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:47:53,252-Speed 2648.59 samples/sec   Loss 4.5001   LearningRate 0.0112   Epoch: 13   Global Step: 552280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:47:57,143-Speed 2632.65 samples/sec   Loss 4.4417   LearningRate 0.0112   Epoch: 13   Global Step: 552290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:01,050-Speed 2621.28 samples/sec   Loss 4.5389   LearningRate 0.0112   Epoch: 13   Global Step: 552300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:04,943-Speed 2631.01 samples/sec   Loss 4.4308   LearningRate 0.0112   Epoch: 13   Global Step: 552310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:08,849-Speed 2621.84 samples/sec   Loss 4.5063   LearningRate 0.0112   Epoch: 13   Global Step: 552320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:12,763-Speed 2617.82 samples/sec   Loss 4.5176   LearningRate 0.0112   Epoch: 13   Global Step: 552330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:16,655-Speed 2631.45 samples/sec   Loss 4.5874   LearningRate 0.0112   Epoch: 13   Global Step: 552340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:20,549-Speed 2630.64 samples/sec   Loss 4.4764   LearningRate 0.0112   Epoch: 13   Global Step: 552350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:24,443-Speed 2630.11 samples/sec   Loss 4.4828   LearningRate 0.0112   Epoch: 13   Global Step: 552360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:28,338-Speed 2629.75 samples/sec   Loss 4.5537   LearningRate 0.0112   Epoch: 13   Global Step: 552370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:32,228-Speed 2632.73 samples/sec   Loss 4.4839   LearningRate 0.0112   Epoch: 13   Global Step: 552380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:48:36,123-Speed 2629.02 samples/sec   Loss 4.4152   LearningRate 0.0112   Epoch: 13   Global Step: 552390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:48:39,993-Speed 2647.00 samples/sec   Loss 4.5112   LearningRate 0.0112   Epoch: 13   Global Step: 552400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:43,900-Speed 2621.50 samples/sec   Loss 4.4615   LearningRate 0.0112   Epoch: 13   Global Step: 552410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:47,798-Speed 2627.40 samples/sec   Loss 4.4460   LearningRate 0.0112   Epoch: 13   Global Step: 552420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:51,689-Speed 2632.49 samples/sec   Loss 4.4929   LearningRate 0.0112   Epoch: 13   Global Step: 552430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:55,583-Speed 2630.52 samples/sec   Loss 4.5504   LearningRate 0.0112   Epoch: 13   Global Step: 552440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:48:59,576-Speed 2565.32 samples/sec   Loss 4.4150   LearningRate 0.0112   Epoch: 13   Global Step: 552450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:03,473-Speed 2628.31 samples/sec   Loss 4.4530   LearningRate 0.0112   Epoch: 13   Global Step: 552460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:07,366-Speed 2630.45 samples/sec   Loss 4.4797   LearningRate 0.0112   Epoch: 13   Global Step: 552470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:11,263-Speed 2628.48 samples/sec   Loss 4.5070   LearningRate 0.0112   Epoch: 13   Global Step: 552480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:15,154-Speed 2632.06 samples/sec   Loss 4.5042   LearningRate 0.0112   Epoch: 13   Global Step: 552490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:19,035-Speed 2639.60 samples/sec   Loss 4.5861   LearningRate 0.0112   Epoch: 13   Global Step: 552500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:22,940-Speed 2622.63 samples/sec   Loss 4.5439   LearningRate 0.0112   Epoch: 13   Global Step: 552510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:26,843-Speed 2624.48 samples/sec   Loss 4.5219   LearningRate 0.0112   Epoch: 13   Global Step: 552520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:30,751-Speed 2620.97 samples/sec   Loss 4.5233   LearningRate 0.0112   Epoch: 13   Global Step: 552530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:34,646-Speed 2629.70 samples/sec   Loss 4.4453   LearningRate 0.0112   Epoch: 13   Global Step: 552540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:38,554-Speed 2620.67 samples/sec   Loss 4.4581   LearningRate 0.0112   Epoch: 13   Global Step: 552550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:42,453-Speed 2627.18 samples/sec   Loss 4.5655   LearningRate 0.0112   Epoch: 13   Global Step: 552560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:46,368-Speed 2615.51 samples/sec   Loss 4.5185   LearningRate 0.0111   Epoch: 13   Global Step: 552570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:50,269-Speed 2626.21 samples/sec   Loss 4.5093   LearningRate 0.0111   Epoch: 13   Global Step: 552580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:54,163-Speed 2629.97 samples/sec   Loss 4.4619   LearningRate 0.0111   Epoch: 13   Global Step: 552590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:49:58,075-Speed 2618.02 samples/sec   Loss 4.4148   LearningRate 0.0111   Epoch: 13   Global Step: 552600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:50:01,970-Speed 2629.63 samples/sec   Loss 4.5052   LearningRate 0.0111   Epoch: 13   Global Step: 552610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:50:05,859-Speed 2633.77 samples/sec   Loss 4.4667   LearningRate 0.0111   Epoch: 13   Global Step: 552620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:50:09,753-Speed 2630.30 samples/sec   Loss 4.6015   LearningRate 0.0111   Epoch: 13   Global Step: 552630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:50:13,622-Speed 2647.35 samples/sec   Loss 4.4598   LearningRate 0.0111   Epoch: 13   Global Step: 552640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:17,509-Speed 2634.90 samples/sec   Loss 4.5189   LearningRate 0.0111   Epoch: 13   Global Step: 552650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:21,399-Speed 2633.04 samples/sec   Loss 4.4164   LearningRate 0.0111   Epoch: 13   Global Step: 552660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:25,297-Speed 2627.75 samples/sec   Loss 4.5578   LearningRate 0.0111   Epoch: 13   Global Step: 552670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:29,198-Speed 2626.01 samples/sec   Loss 4.5607   LearningRate 0.0111   Epoch: 13   Global Step: 552680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:33,087-Speed 2633.09 samples/sec   Loss 4.5059   LearningRate 0.0111   Epoch: 13   Global Step: 552690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:36,977-Speed 2632.78 samples/sec   Loss 4.4692   LearningRate 0.0111   Epoch: 13   Global Step: 552700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:40,881-Speed 2624.04 samples/sec   Loss 4.5369   LearningRate 0.0111   Epoch: 13   Global Step: 552710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:44,780-Speed 2627.22 samples/sec   Loss 4.4793   LearningRate 0.0111   Epoch: 13   Global Step: 552720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:48,670-Speed 2633.12 samples/sec   Loss 4.4654   LearningRate 0.0111   Epoch: 13   Global Step: 552730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:50:52,565-Speed 2630.01 samples/sec   Loss 4.4833   LearningRate 0.0111   Epoch: 13   Global Step: 552740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:50:56,460-Speed 2629.45 samples/sec   Loss 4.5060   LearningRate 0.0111   Epoch: 13   Global Step: 552750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:51:00,334-Speed 2643.18 samples/sec   Loss 4.5311   LearningRate 0.0111   Epoch: 13   Global Step: 552760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:04,235-Speed 2625.59 samples/sec   Loss 4.5378   LearningRate 0.0111   Epoch: 13   Global Step: 552770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:08,132-Speed 2628.24 samples/sec   Loss 4.5342   LearningRate 0.0111   Epoch: 13   Global Step: 552780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:12,022-Speed 2632.83 samples/sec   Loss 4.4374   LearningRate 0.0111   Epoch: 13   Global Step: 552790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:15,913-Speed 2633.04 samples/sec   Loss 4.5363   LearningRate 0.0111   Epoch: 13   Global Step: 552800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:19,805-Speed 2631.05 samples/sec   Loss 4.5294   LearningRate 0.0111   Epoch: 13   Global Step: 552810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:23,699-Speed 2630.65 samples/sec   Loss 4.4595   LearningRate 0.0111   Epoch: 13   Global Step: 552820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:27,594-Speed 2630.13 samples/sec   Loss 4.4406   LearningRate 0.0111   Epoch: 13   Global Step: 552830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:31,503-Speed 2619.96 samples/sec   Loss 4.5405   LearningRate 0.0111   Epoch: 13   Global Step: 552840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:35,404-Speed 2625.12 samples/sec   Loss 4.4637   LearningRate 0.0111   Epoch: 13   Global Step: 552850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:39,299-Speed 2629.75 samples/sec   Loss 4.4274   LearningRate 0.0111   Epoch: 13   Global Step: 552860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:51:43,171-Speed 2645.37 samples/sec   Loss 4.4970   LearningRate 0.0111   Epoch: 13   Global Step: 552870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:47,075-Speed 2623.51 samples/sec   Loss 4.4390   LearningRate 0.0111   Epoch: 13   Global Step: 552880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:50,997-Speed 2611.14 samples/sec   Loss 4.4238   LearningRate 0.0111   Epoch: 13   Global Step: 552890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:54,901-Speed 2624.18 samples/sec   Loss 4.5725   LearningRate 0.0111   Epoch: 13   Global Step: 552900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:51:58,796-Speed 2629.55 samples/sec   Loss 4.5178   LearningRate 0.0111   Epoch: 13   Global Step: 552910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:02,691-Speed 2629.37 samples/sec   Loss 4.5225   LearningRate 0.0111   Epoch: 13   Global Step: 552920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:06,583-Speed 2631.95 samples/sec   Loss 4.4917   LearningRate 0.0111   Epoch: 13   Global Step: 552930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:10,474-Speed 2632.42 samples/sec   Loss 4.5029   LearningRate 0.0111   Epoch: 13   Global Step: 552940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:14,368-Speed 2629.92 samples/sec   Loss 4.4922   LearningRate 0.0111   Epoch: 13   Global Step: 552950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:18,263-Speed 2630.07 samples/sec   Loss 4.5039   LearningRate 0.0111   Epoch: 13   Global Step: 552960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:22,156-Speed 2630.75 samples/sec   Loss 4.4871   LearningRate 0.0111   Epoch: 13   Global Step: 552970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:52:26,053-Speed 2628.19 samples/sec   Loss 4.4859   LearningRate 0.0111   Epoch: 13   Global Step: 552980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:52:29,957-Speed 2623.68 samples/sec   Loss 4.4420   LearningRate 0.0111   Epoch: 13   Global Step: 552990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:52:33,846-Speed 2633.14 samples/sec   Loss 4.5887   LearningRate 0.0111   Epoch: 13   Global Step: 553000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:52:37,739-Speed 2631.02 samples/sec   Loss 4.4331   LearningRate 0.0111   Epoch: 13   Global Step: 553010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:52:41,612-Speed 2644.97 samples/sec   Loss 4.4803   LearningRate 0.0111   Epoch: 13   Global Step: 553020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:45,503-Speed 2632.55 samples/sec   Loss 4.5154   LearningRate 0.0111   Epoch: 13   Global Step: 553030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:49,395-Speed 2631.34 samples/sec   Loss 4.4863   LearningRate 0.0111   Epoch: 13   Global Step: 553040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:53,288-Speed 2630.95 samples/sec   Loss 4.5392   LearningRate 0.0111   Epoch: 13   Global Step: 553050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:52:57,252-Speed 2584.15 samples/sec   Loss 4.6212   LearningRate 0.0111   Epoch: 13   Global Step: 553060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:01,151-Speed 2626.79 samples/sec   Loss 4.4743   LearningRate 0.0111   Epoch: 13   Global Step: 553070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:05,052-Speed 2624.97 samples/sec   Loss 4.4955   LearningRate 0.0111   Epoch: 13   Global Step: 553080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:08,945-Speed 2631.26 samples/sec   Loss 4.4738   LearningRate 0.0111   Epoch: 13   Global Step: 553090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:12,842-Speed 2628.13 samples/sec   Loss 4.4478   LearningRate 0.0111   Epoch: 13   Global Step: 553100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:16,734-Speed 2631.97 samples/sec   Loss 4.5169   LearningRate 0.0111   Epoch: 13   Global Step: 553110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:20,639-Speed 2623.27 samples/sec   Loss 4.5005   LearningRate 0.0111   Epoch: 13   Global Step: 553120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:53:24,543-Speed 2623.14 samples/sec   Loss 4.4777   LearningRate 0.0111   Epoch: 13   Global Step: 553130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:53:28,421-Speed 2641.18 samples/sec   Loss 4.6034   LearningRate 0.0111   Epoch: 13   Global Step: 553140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:32,322-Speed 2625.62 samples/sec   Loss 4.5297   LearningRate 0.0111   Epoch: 13   Global Step: 553150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:36,222-Speed 2626.47 samples/sec   Loss 4.4656   LearningRate 0.0111   Epoch: 13   Global Step: 553160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:40,120-Speed 2627.42 samples/sec   Loss 4.4188   LearningRate 0.0111   Epoch: 13   Global Step: 553170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:44,021-Speed 2625.58 samples/sec   Loss 4.4382   LearningRate 0.0111   Epoch: 13   Global Step: 553180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:47,931-Speed 2619.47 samples/sec   Loss 4.5167   LearningRate 0.0111   Epoch: 13   Global Step: 553190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:51,833-Speed 2624.68 samples/sec   Loss 4.5294   LearningRate 0.0111   Epoch: 13   Global Step: 553200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:55,736-Speed 2624.46 samples/sec   Loss 4.4467   LearningRate 0.0111   Epoch: 13   Global Step: 553210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:53:59,630-Speed 2630.84 samples/sec   Loss 4.4752   LearningRate 0.0111   Epoch: 13   Global Step: 553220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:03,522-Speed 2631.35 samples/sec   Loss 4.4935   LearningRate 0.0111   Epoch: 13   Global Step: 553230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:07,423-Speed 2625.08 samples/sec   Loss 4.4286   LearningRate 0.0111   Epoch: 13   Global Step: 553240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:54:11,303-Speed 2639.73 samples/sec   Loss 4.4681   LearningRate 0.0111   Epoch: 13   Global Step: 553250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:15,215-Speed 2626.44 samples/sec   Loss 4.5184   LearningRate 0.0111   Epoch: 13   Global Step: 553260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:19,110-Speed 2629.60 samples/sec   Loss 4.6145   LearningRate 0.0111   Epoch: 13   Global Step: 553270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:23,008-Speed 2627.88 samples/sec   Loss 4.5401   LearningRate 0.0111   Epoch: 13   Global Step: 553280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:26,917-Speed 2620.25 samples/sec   Loss 4.5045   LearningRate 0.0111   Epoch: 13   Global Step: 553290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:30,816-Speed 2626.84 samples/sec   Loss 4.5637   LearningRate 0.0111   Epoch: 13   Global Step: 553300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:34,871-Speed 2526.28 samples/sec   Loss 4.5145   LearningRate 0.0111   Epoch: 13   Global Step: 553310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:38,920-Speed 2529.33 samples/sec   Loss 4.3969   LearningRate 0.0111   Epoch: 13   Global Step: 553320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:42,813-Speed 2631.51 samples/sec   Loss 4.5030   LearningRate 0.0111   Epoch: 13   Global Step: 553330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:46,707-Speed 2629.60 samples/sec   Loss 4.3945   LearningRate 0.0111   Epoch: 13   Global Step: 553340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:50,581-Speed 2644.60 samples/sec   Loss 4.4435   LearningRate 0.0111   Epoch: 13   Global Step: 553350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:54:54,457-Speed 2641.89 samples/sec   Loss 4.5173   LearningRate 0.0111   Epoch: 13   Global Step: 553360   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:54:58,352-Speed 2630.21 samples/sec   Loss 4.5031   LearningRate 0.0111   Epoch: 13   Global Step: 553370   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:02,260-Speed 2620.44 samples/sec   Loss 4.4471   LearningRate 0.0111   Epoch: 13   Global Step: 553380   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:06,171-Speed 2618.36 samples/sec   Loss 4.3828   LearningRate 0.0111   Epoch: 13   Global Step: 553390   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:10,063-Speed 2631.94 samples/sec   Loss 4.3984   LearningRate 0.0111   Epoch: 13   Global Step: 553400   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:13,960-Speed 2628.10 samples/sec   Loss 4.4128   LearningRate 0.0111   Epoch: 13   Global Step: 553410   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:17,855-Speed 2629.93 samples/sec   Loss 4.5349   LearningRate 0.0111   Epoch: 13   Global Step: 553420   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:21,753-Speed 2628.29 samples/sec   Loss 4.4501   LearningRate 0.0111   Epoch: 13   Global Step: 553430   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:25,650-Speed 2628.04 samples/sec   Loss 4.5588   LearningRate 0.0111   Epoch: 13   Global Step: 553440   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:29,545-Speed 2629.70 samples/sec   Loss 4.4183   LearningRate 0.0111   Epoch: 13   Global Step: 553450   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:55:33,439-Speed 2629.71 samples/sec   Loss 4.5669   LearningRate 0.0111   Epoch: 13   Global Step: 553460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:55:37,334-Speed 2629.48 samples/sec   Loss 4.4489   LearningRate 0.0111   Epoch: 13   Global Step: 553470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:55:41,225-Speed 2632.28 samples/sec   Loss 4.5162   LearningRate 0.0111   Epoch: 13   Global Step: 553480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:55:45,128-Speed 2624.15 samples/sec   Loss 4.4915   LearningRate 0.0111   Epoch: 13   Global Step: 553490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:55:49,023-Speed 2629.85 samples/sec   Loss 4.6169   LearningRate 0.0111   Epoch: 13   Global Step: 553500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:55:52,914-Speed 2632.27 samples/sec   Loss 4.5404   LearningRate 0.0111   Epoch: 13   Global Step: 553510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:55:56,789-Speed 2643.27 samples/sec   Loss 4.4647   LearningRate 0.0111   Epoch: 13   Global Step: 553520   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:00,703-Speed 2617.14 samples/sec   Loss 4.5086   LearningRate 0.0111   Epoch: 13   Global Step: 553530   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:04,602-Speed 2627.06 samples/sec   Loss 4.4158   LearningRate 0.0111   Epoch: 13   Global Step: 553540   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:08,502-Speed 2625.92 samples/sec   Loss 4.3793   LearningRate 0.0111   Epoch: 13   Global Step: 553550   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:12,403-Speed 2625.55 samples/sec   Loss 4.4956   LearningRate 0.0111   Epoch: 13   Global Step: 553560   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:16,302-Speed 2627.13 samples/sec   Loss 4.4902   LearningRate 0.0111   Epoch: 13   Global Step: 553570   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:20,201-Speed 2627.07 samples/sec   Loss 4.5487   LearningRate 0.0111   Epoch: 13   Global Step: 553580   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:24,097-Speed 2629.06 samples/sec   Loss 4.5557   LearningRate 0.0111   Epoch: 13   Global Step: 553590   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:27,993-Speed 2629.00 samples/sec   Loss 4.4843   LearningRate 0.0111   Epoch: 13   Global Step: 553600   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:31,887-Speed 2629.98 samples/sec   Loss 4.4384   LearningRate 0.0111   Epoch: 13   Global Step: 553610   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:56:35,780-Speed 2631.07 samples/sec   Loss 4.5531   LearningRate 0.0111   Epoch: 13   Global Step: 553620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:56:39,674-Speed 2630.45 samples/sec   Loss 4.4129   LearningRate 0.0111   Epoch: 13   Global Step: 553630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:56:43,573-Speed 2627.19 samples/sec   Loss 4.5130   LearningRate 0.0111   Epoch: 13   Global Step: 553640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:56:47,466-Speed 2630.96 samples/sec   Loss 4.4912   LearningRate 0.0111   Epoch: 13   Global Step: 553650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:56:51,361-Speed 2629.30 samples/sec   Loss 4.5228   LearningRate 0.0111   Epoch: 13   Global Step: 553660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:56:55,258-Speed 2628.59 samples/sec   Loss 4.5097   LearningRate 0.0111   Epoch: 13   Global Step: 553670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:56:59,161-Speed 2624.11 samples/sec   Loss 4.4621   LearningRate 0.0111   Epoch: 13   Global Step: 553680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:03,061-Speed 2626.23 samples/sec   Loss 4.5168   LearningRate 0.0111   Epoch: 13   Global Step: 553690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:06,974-Speed 2617.15 samples/sec   Loss 4.4128   LearningRate 0.0111   Epoch: 13   Global Step: 553700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:10,880-Speed 2622.60 samples/sec   Loss 4.4274   LearningRate 0.0111   Epoch: 13   Global Step: 553710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:14,783-Speed 2624.95 samples/sec   Loss 4.4757   LearningRate 0.0111   Epoch: 13   Global Step: 553720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:57:18,674-Speed 2631.94 samples/sec   Loss 4.5404   LearningRate 0.0111   Epoch: 13   Global Step: 553730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:22,565-Speed 2633.35 samples/sec   Loss 4.5491   LearningRate 0.0111   Epoch: 13   Global Step: 553740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:26,459-Speed 2629.48 samples/sec   Loss 4.5166   LearningRate 0.0111   Epoch: 13   Global Step: 553750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:30,366-Speed 2621.77 samples/sec   Loss 4.4982   LearningRate 0.0111   Epoch: 13   Global Step: 553760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:57:34,237-Speed 2645.46 samples/sec   Loss 4.4956   LearningRate 0.0111   Epoch: 13   Global Step: 553770   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:57:38,131-Speed 2631.45 samples/sec   Loss 4.4338   LearningRate 0.0111   Epoch: 13   Global Step: 553780   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:57:42,030-Speed 2626.50 samples/sec   Loss 4.5279   LearningRate 0.0111   Epoch: 13   Global Step: 553790   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:57:45,923-Speed 2631.62 samples/sec   Loss 4.5405   LearningRate 0.0111   Epoch: 13   Global Step: 553800   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:57:49,820-Speed 2628.14 samples/sec   Loss 4.4923   LearningRate 0.0111   Epoch: 13   Global Step: 553810   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:57:53,716-Speed 2629.50 samples/sec   Loss 4.4219   LearningRate 0.0110   Epoch: 13   Global Step: 553820   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:57:57,610-Speed 2629.96 samples/sec   Loss 4.5204   LearningRate 0.0110   Epoch: 13   Global Step: 553830   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:58:01,505-Speed 2629.64 samples/sec   Loss 4.4487   LearningRate 0.0110   Epoch: 13   Global Step: 553840   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:58:05,400-Speed 2629.10 samples/sec   Loss 4.4787   LearningRate 0.0110   Epoch: 13   Global Step: 553850   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:58:09,298-Speed 2627.95 samples/sec   Loss 4.4623   LearningRate 0.0110   Epoch: 13   Global Step: 553860   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:58:13,196-Speed 2627.10 samples/sec   Loss 4.4783   LearningRate 0.0110   Epoch: 13   Global Step: 553870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:17,100-Speed 2624.18 samples/sec   Loss 4.5224   LearningRate 0.0110   Epoch: 13   Global Step: 553880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:20,995-Speed 2629.58 samples/sec   Loss 4.4350   LearningRate 0.0110   Epoch: 13   Global Step: 553890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:24,894-Speed 2626.77 samples/sec   Loss 4.5070   LearningRate 0.0110   Epoch: 13   Global Step: 553900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:28,794-Speed 2626.79 samples/sec   Loss 4.5011   LearningRate 0.0110   Epoch: 13   Global Step: 553910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:32,698-Speed 2623.43 samples/sec   Loss 4.4012   LearningRate 0.0110   Epoch: 13   Global Step: 553920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:36,597-Speed 2626.61 samples/sec   Loss 4.5119   LearningRate 0.0110   Epoch: 13   Global Step: 553930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:40,500-Speed 2624.09 samples/sec   Loss 4.4800   LearningRate 0.0110   Epoch: 13   Global Step: 553940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:44,398-Speed 2627.37 samples/sec   Loss 4.4670   LearningRate 0.0110   Epoch: 13   Global Step: 553950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:48,298-Speed 2626.80 samples/sec   Loss 4.4942   LearningRate 0.0110   Epoch: 13   Global Step: 553960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:58:52,203-Speed 2622.78 samples/sec   Loss 4.5148   LearningRate 0.0110   Epoch: 13   Global Step: 553970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:58:56,096-Speed 2630.74 samples/sec   Loss 4.5049   LearningRate 0.0110   Epoch: 13   Global Step: 553980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:58:59,990-Speed 2630.16 samples/sec   Loss 4.3552   LearningRate 0.0110   Epoch: 13   Global Step: 553990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:59:03,888-Speed 2627.94 samples/sec   Loss 4.4642   LearningRate 0.0110   Epoch: 13   Global Step: 554000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 09:59:07,764-Speed 2642.49 samples/sec   Loss 4.5314   LearningRate 0.0110   Epoch: 13   Global Step: 554010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:59:11,657-Speed 2631.14 samples/sec   Loss 4.5900   LearningRate 0.0110   Epoch: 13   Global Step: 554020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:59:15,552-Speed 2630.14 samples/sec   Loss 4.4933   LearningRate 0.0110   Epoch: 13   Global Step: 554030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:59:19,448-Speed 2628.85 samples/sec   Loss 4.4507   LearningRate 0.0110   Epoch: 13   Global Step: 554040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 09:59:23,328-Speed 2639.43 samples/sec   Loss 4.5348   LearningRate 0.0110   Epoch: 13   Global Step: 554050   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:27,239-Speed 2618.88 samples/sec   Loss 4.4677   LearningRate 0.0110   Epoch: 13   Global Step: 554060   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:31,145-Speed 2622.39 samples/sec   Loss 4.5141   LearningRate 0.0110   Epoch: 13   Global Step: 554070   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:35,036-Speed 2632.28 samples/sec   Loss 4.4591   LearningRate 0.0110   Epoch: 13   Global Step: 554080   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:38,937-Speed 2625.47 samples/sec   Loss 4.5621   LearningRate 0.0110   Epoch: 13   Global Step: 554090   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:42,858-Speed 2612.62 samples/sec   Loss 4.4852   LearningRate 0.0110   Epoch: 13   Global Step: 554100   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:46,801-Speed 2597.30 samples/sec   Loss 4.4707   LearningRate 0.0110   Epoch: 13   Global Step: 554110   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:50,694-Speed 2631.65 samples/sec   Loss 4.4438   LearningRate 0.0110   Epoch: 13   Global Step: 554120   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:54,590-Speed 2628.78 samples/sec   Loss 4.5161   LearningRate 0.0110   Epoch: 13   Global Step: 554130   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 09:59:58,490-Speed 2626.44 samples/sec   Loss 4.3607   LearningRate 0.0110   Epoch: 13   Global Step: 554140   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:00:02,388-Speed 2627.42 samples/sec   Loss 4.4536   LearningRate 0.0110   Epoch: 13   Global Step: 554150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:06,285-Speed 2627.51 samples/sec   Loss 4.4859   LearningRate 0.0110   Epoch: 13   Global Step: 554160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:10,179-Speed 2630.65 samples/sec   Loss 4.5367   LearningRate 0.0110   Epoch: 13   Global Step: 554170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:14,080-Speed 2626.03 samples/sec   Loss 4.4826   LearningRate 0.0110   Epoch: 13   Global Step: 554180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:17,993-Speed 2617.42 samples/sec   Loss 4.4526   LearningRate 0.0110   Epoch: 13   Global Step: 554190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:21,885-Speed 2632.40 samples/sec   Loss 4.4026   LearningRate 0.0110   Epoch: 13   Global Step: 554200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:25,785-Speed 2626.02 samples/sec   Loss 4.5014   LearningRate 0.0110   Epoch: 13   Global Step: 554210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:29,684-Speed 2626.69 samples/sec   Loss 4.4912   LearningRate 0.0110   Epoch: 13   Global Step: 554220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:33,588-Speed 2623.72 samples/sec   Loss 4.3458   LearningRate 0.0110   Epoch: 13   Global Step: 554230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:00:37,465-Speed 2641.43 samples/sec   Loss 4.4648   LearningRate 0.0110   Epoch: 13   Global Step: 554240   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:00:41,363-Speed 2627.88 samples/sec   Loss 4.5491   LearningRate 0.0110   Epoch: 13   Global Step: 554250   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:00:45,258-Speed 2629.67 samples/sec   Loss 4.5546   LearningRate 0.0110   Epoch: 13   Global Step: 554260   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:00:49,149-Speed 2631.78 samples/sec   Loss 4.4444   LearningRate 0.0110   Epoch: 13   Global Step: 554270   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:00:53,051-Speed 2625.39 samples/sec   Loss 4.4949   LearningRate 0.0110   Epoch: 13   Global Step: 554280   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:00:56,951-Speed 2626.45 samples/sec   Loss 4.4940   LearningRate 0.0110   Epoch: 13   Global Step: 554290   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:01:00,850-Speed 2627.09 samples/sec   Loss 4.3870   LearningRate 0.0110   Epoch: 13   Global Step: 554300   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:01:04,791-Speed 2598.43 samples/sec   Loss 4.4023   LearningRate 0.0110   Epoch: 13   Global Step: 554310   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:01:08,694-Speed 2624.08 samples/sec   Loss 4.5472   LearningRate 0.0110   Epoch: 13   Global Step: 554320   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:01:12,600-Speed 2622.09 samples/sec   Loss 4.4924   LearningRate 0.0110   Epoch: 13   Global Step: 554330   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:01:16,499-Speed 2627.01 samples/sec   Loss 4.4283   LearningRate 0.0110   Epoch: 13   Global Step: 554340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:20,402-Speed 2624.66 samples/sec   Loss 4.5443   LearningRate 0.0110   Epoch: 13   Global Step: 554350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:24,304-Speed 2624.89 samples/sec   Loss 4.5859   LearningRate 0.0110   Epoch: 13   Global Step: 554360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:28,202-Speed 2627.83 samples/sec   Loss 4.5136   LearningRate 0.0110   Epoch: 13   Global Step: 554370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:32,116-Speed 2616.95 samples/sec   Loss 4.5051   LearningRate 0.0110   Epoch: 13   Global Step: 554380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:36,046-Speed 2605.90 samples/sec   Loss 4.4197   LearningRate 0.0110   Epoch: 13   Global Step: 554390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:39,949-Speed 2624.06 samples/sec   Loss 4.5817   LearningRate 0.0110   Epoch: 13   Global Step: 554400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:43,854-Speed 2622.88 samples/sec   Loss 4.4660   LearningRate 0.0110   Epoch: 13   Global Step: 554410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:47,753-Speed 2627.19 samples/sec   Loss 4.4222   LearningRate 0.0110   Epoch: 13   Global Step: 554420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:51,648-Speed 2629.42 samples/sec   Loss 4.4208   LearningRate 0.0110   Epoch: 13   Global Step: 554430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:01:55,546-Speed 2627.52 samples/sec   Loss 4.4300   LearningRate 0.0110   Epoch: 13   Global Step: 554440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:01:59,447-Speed 2626.15 samples/sec   Loss 4.4669   LearningRate 0.0110   Epoch: 13   Global Step: 554450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:02:03,343-Speed 2628.28 samples/sec   Loss 4.5376   LearningRate 0.0110   Epoch: 13   Global Step: 554460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:02:07,237-Speed 2630.53 samples/sec   Loss 4.5139   LearningRate 0.0110   Epoch: 13   Global Step: 554470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:02:11,112-Speed 2643.09 samples/sec   Loss 4.3448   LearningRate 0.0110   Epoch: 13   Global Step: 554480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:15,011-Speed 2627.15 samples/sec   Loss 4.4740   LearningRate 0.0110   Epoch: 13   Global Step: 554490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:18,917-Speed 2622.07 samples/sec   Loss 4.4814   LearningRate 0.0110   Epoch: 13   Global Step: 554500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:22,813-Speed 2628.83 samples/sec   Loss 4.4344   LearningRate 0.0110   Epoch: 13   Global Step: 554510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:26,746-Speed 2604.11 samples/sec   Loss 4.5433   LearningRate 0.0110   Epoch: 13   Global Step: 554520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:30,642-Speed 2629.69 samples/sec   Loss 4.3600   LearningRate 0.0110   Epoch: 13   Global Step: 554530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:34,546-Speed 2623.05 samples/sec   Loss 4.5990   LearningRate 0.0110   Epoch: 13   Global Step: 554540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:38,453-Speed 2621.15 samples/sec   Loss 4.4243   LearningRate 0.0110   Epoch: 13   Global Step: 554550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:42,365-Speed 2618.24 samples/sec   Loss 4.5051   LearningRate 0.0110   Epoch: 13   Global Step: 554560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:46,263-Speed 2627.85 samples/sec   Loss 4.4007   LearningRate 0.0110   Epoch: 13   Global Step: 554570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:50,140-Speed 2641.89 samples/sec   Loss 4.4813   LearningRate 0.0110   Epoch: 13   Global Step: 554580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:54,035-Speed 2630.02 samples/sec   Loss 4.3414   LearningRate 0.0110   Epoch: 13   Global Step: 554590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:02:57,929-Speed 2629.92 samples/sec   Loss 4.4498   LearningRate 0.0110   Epoch: 13   Global Step: 554600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:01,829-Speed 2626.38 samples/sec   Loss 4.4690   LearningRate 0.0110   Epoch: 13   Global Step: 554610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:05,730-Speed 2625.62 samples/sec   Loss 4.4151   LearningRate 0.0110   Epoch: 13   Global Step: 554620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:09,629-Speed 2627.00 samples/sec   Loss 4.5537   LearningRate 0.0110   Epoch: 13   Global Step: 554630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:13,530-Speed 2625.23 samples/sec   Loss 4.4888   LearningRate 0.0110   Epoch: 13   Global Step: 554640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:17,427-Speed 2628.40 samples/sec   Loss 4.4126   LearningRate 0.0110   Epoch: 13   Global Step: 554650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:21,327-Speed 2625.91 samples/sec   Loss 4.4754   LearningRate 0.0110   Epoch: 13   Global Step: 554660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:25,222-Speed 2630.00 samples/sec   Loss 4.4695   LearningRate 0.0110   Epoch: 13   Global Step: 554670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:29,163-Speed 2599.34 samples/sec   Loss 4.5195   LearningRate 0.0110   Epoch: 13   Global Step: 554680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:03:33,042-Speed 2640.12 samples/sec   Loss 4.4248   LearningRate 0.0110   Epoch: 13   Global Step: 554690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:36,937-Speed 2629.38 samples/sec   Loss 4.4659   LearningRate 0.0110   Epoch: 13   Global Step: 554700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:40,831-Speed 2630.49 samples/sec   Loss 4.5041   LearningRate 0.0110   Epoch: 13   Global Step: 554710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:44,725-Speed 2630.48 samples/sec   Loss 4.4380   LearningRate 0.0110   Epoch: 13   Global Step: 554720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:48,633-Speed 2620.71 samples/sec   Loss 4.4565   LearningRate 0.0110   Epoch: 13   Global Step: 554730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:52,531-Speed 2627.34 samples/sec   Loss 4.5188   LearningRate 0.0110   Epoch: 13   Global Step: 554740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:03:56,425-Speed 2630.80 samples/sec   Loss 4.4740   LearningRate 0.0110   Epoch: 13   Global Step: 554750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:00,322-Speed 2628.28 samples/sec   Loss 4.4477   LearningRate 0.0110   Epoch: 13   Global Step: 554760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:04,222-Speed 2625.93 samples/sec   Loss 4.4817   LearningRate 0.0110   Epoch: 13   Global Step: 554770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:08,120-Speed 2627.95 samples/sec   Loss 4.4471   LearningRate 0.0110   Epoch: 13   Global Step: 554780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:12,019-Speed 2627.21 samples/sec   Loss 4.3901   LearningRate 0.0110   Epoch: 13   Global Step: 554790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:04:15,918-Speed 2627.10 samples/sec   Loss 4.3889   LearningRate 0.0110   Epoch: 13   Global Step: 554800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:04:19,813-Speed 2629.13 samples/sec   Loss 4.3927   LearningRate 0.0110   Epoch: 13   Global Step: 554810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:04:23,751-Speed 2600.98 samples/sec   Loss 4.4370   LearningRate 0.0110   Epoch: 13   Global Step: 554820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:27,690-Speed 2600.37 samples/sec   Loss 4.5691   LearningRate 0.0110   Epoch: 13   Global Step: 554830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:31,616-Speed 2608.47 samples/sec   Loss 4.4316   LearningRate 0.0110   Epoch: 13   Global Step: 554840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:35,508-Speed 2631.59 samples/sec   Loss 4.4669   LearningRate 0.0110   Epoch: 13   Global Step: 554850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:39,434-Speed 2609.23 samples/sec   Loss 4.4591   LearningRate 0.0110   Epoch: 13   Global Step: 554860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:43,328-Speed 2630.05 samples/sec   Loss 4.4926   LearningRate 0.0110   Epoch: 13   Global Step: 554870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:47,220-Speed 2632.23 samples/sec   Loss 4.4840   LearningRate 0.0110   Epoch: 13   Global Step: 554880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:51,111-Speed 2632.06 samples/sec   Loss 4.3954   LearningRate 0.0110   Epoch: 13   Global Step: 554890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:55,013-Speed 2624.93 samples/sec   Loss 4.4834   LearningRate 0.0110   Epoch: 13   Global Step: 554900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:04:58,909-Speed 2628.75 samples/sec   Loss 4.4331   LearningRate 0.0110   Epoch: 13   Global Step: 554910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:02,797-Speed 2634.20 samples/sec   Loss 4.4846   LearningRate 0.0110   Epoch: 13   Global Step: 554920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:06,691-Speed 2630.02 samples/sec   Loss 4.4313   LearningRate 0.0110   Epoch: 13   Global Step: 554930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:10,598-Speed 2621.84 samples/sec   Loss 4.3228   LearningRate 0.0110   Epoch: 13   Global Step: 554940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:14,544-Speed 2595.96 samples/sec   Loss 4.4563   LearningRate 0.0110   Epoch: 13   Global Step: 554950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:18,447-Speed 2623.63 samples/sec   Loss 4.5070   LearningRate 0.0110   Epoch: 13   Global Step: 554960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:22,342-Speed 2630.22 samples/sec   Loss 4.4467   LearningRate 0.0110   Epoch: 13   Global Step: 554970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:26,238-Speed 2629.04 samples/sec   Loss 4.4633   LearningRate 0.0110   Epoch: 13   Global Step: 554980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:30,128-Speed 2633.12 samples/sec   Loss 4.5417   LearningRate 0.0110   Epoch: 13   Global Step: 554990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:34,021-Speed 2630.57 samples/sec   Loss 4.5250   LearningRate 0.0110   Epoch: 13   Global Step: 555000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:37,913-Speed 2631.49 samples/sec   Loss 4.5062   LearningRate 0.0110   Epoch: 13   Global Step: 555010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:41,805-Speed 2631.77 samples/sec   Loss 4.5323   LearningRate 0.0110   Epoch: 13   Global Step: 555020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:05:45,697-Speed 2631.48 samples/sec   Loss 4.4691   LearningRate 0.0110   Epoch: 13   Global Step: 555030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:05:49,598-Speed 2625.86 samples/sec   Loss 4.4608   LearningRate 0.0110   Epoch: 13   Global Step: 555040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:05:53,471-Speed 2644.77 samples/sec   Loss 4.5097   LearningRate 0.0110   Epoch: 13   Global Step: 555050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:05:57,375-Speed 2623.44 samples/sec   Loss 4.5657   LearningRate 0.0110   Epoch: 13   Global Step: 555060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:01,267-Speed 2631.92 samples/sec   Loss 4.4046   LearningRate 0.0109   Epoch: 13   Global Step: 555070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:05,170-Speed 2624.06 samples/sec   Loss 4.4237   LearningRate 0.0109   Epoch: 13   Global Step: 555080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:09,087-Speed 2615.12 samples/sec   Loss 4.4164   LearningRate 0.0109   Epoch: 13   Global Step: 555090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:12,979-Speed 2630.98 samples/sec   Loss 4.4268   LearningRate 0.0109   Epoch: 13   Global Step: 555100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:16,875-Speed 2629.46 samples/sec   Loss 4.4450   LearningRate 0.0109   Epoch: 13   Global Step: 555110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:20,774-Speed 2626.77 samples/sec   Loss 4.4697   LearningRate 0.0109   Epoch: 13   Global Step: 555120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:24,666-Speed 2631.36 samples/sec   Loss 4.4361   LearningRate 0.0109   Epoch: 13   Global Step: 555130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:28,561-Speed 2629.81 samples/sec   Loss 4.5182   LearningRate 0.0109   Epoch: 13   Global Step: 555140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:32,454-Speed 2631.23 samples/sec   Loss 4.4459   LearningRate 0.0109   Epoch: 13   Global Step: 555150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:06:36,349-Speed 2629.27 samples/sec   Loss 4.4028   LearningRate 0.0109   Epoch: 13   Global Step: 555160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:06:40,250-Speed 2625.97 samples/sec   Loss 4.5445   LearningRate 0.0109   Epoch: 13   Global Step: 555170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:06:44,119-Speed 2647.33 samples/sec   Loss 4.4631   LearningRate 0.0109   Epoch: 13   Global Step: 555180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:48,016-Speed 2628.25 samples/sec   Loss 4.4339   LearningRate 0.0109   Epoch: 13   Global Step: 555190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:51,909-Speed 2631.64 samples/sec   Loss 4.3872   LearningRate 0.0109   Epoch: 13   Global Step: 555200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:55,806-Speed 2628.31 samples/sec   Loss 4.4013   LearningRate 0.0109   Epoch: 13   Global Step: 555210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:06:59,715-Speed 2621.05 samples/sec   Loss 4.3599   LearningRate 0.0109   Epoch: 13   Global Step: 555220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:03,615-Speed 2625.82 samples/sec   Loss 4.4856   LearningRate 0.0109   Epoch: 13   Global Step: 555230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:07,519-Speed 2623.10 samples/sec   Loss 4.4892   LearningRate 0.0109   Epoch: 13   Global Step: 555240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:11,434-Speed 2615.96 samples/sec   Loss 4.4785   LearningRate 0.0109   Epoch: 13   Global Step: 555250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:15,336-Speed 2625.53 samples/sec   Loss 4.4965   LearningRate 0.0109   Epoch: 13   Global Step: 555260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:19,235-Speed 2627.18 samples/sec   Loss 4.4718   LearningRate 0.0109   Epoch: 13   Global Step: 555270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:23,128-Speed 2631.57 samples/sec   Loss 4.4745   LearningRate 0.0109   Epoch: 13   Global Step: 555280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:07:27,028-Speed 2625.90 samples/sec   Loss 4.5094   LearningRate 0.0109   Epoch: 13   Global Step: 555290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:07:30,926-Speed 2628.05 samples/sec   Loss 4.5203   LearningRate 0.0109   Epoch: 13   Global Step: 555300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:07:34,800-Speed 2643.76 samples/sec   Loss 4.4405   LearningRate 0.0109   Epoch: 13   Global Step: 555310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:38,693-Speed 2630.75 samples/sec   Loss 4.3398   LearningRate 0.0109   Epoch: 13   Global Step: 555320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:42,591-Speed 2627.78 samples/sec   Loss 4.4091   LearningRate 0.0109   Epoch: 13   Global Step: 555330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:46,509-Speed 2614.37 samples/sec   Loss 4.5140   LearningRate 0.0109   Epoch: 13   Global Step: 555340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:50,403-Speed 2630.06 samples/sec   Loss 4.5361   LearningRate 0.0109   Epoch: 13   Global Step: 555350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:54,301-Speed 2627.72 samples/sec   Loss 4.4969   LearningRate 0.0109   Epoch: 13   Global Step: 555360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:07:58,202-Speed 2625.75 samples/sec   Loss 4.4371   LearningRate 0.0109   Epoch: 13   Global Step: 555370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:02,099-Speed 2628.40 samples/sec   Loss 4.5255   LearningRate 0.0109   Epoch: 13   Global Step: 555380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:05,996-Speed 2628.50 samples/sec   Loss 4.4647   LearningRate 0.0109   Epoch: 13   Global Step: 555390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:09,888-Speed 2631.37 samples/sec   Loss 4.4160   LearningRate 0.0109   Epoch: 13   Global Step: 555400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:13,785-Speed 2628.18 samples/sec   Loss 4.5791   LearningRate 0.0109   Epoch: 13   Global Step: 555410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:08:17,680-Speed 2629.29 samples/sec   Loss 4.4705   LearningRate 0.0109   Epoch: 13   Global Step: 555420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:08:21,591-Speed 2619.14 samples/sec   Loss 4.5126   LearningRate 0.0109   Epoch: 13   Global Step: 555430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:08:25,482-Speed 2631.76 samples/sec   Loss 4.4841   LearningRate 0.0109   Epoch: 13   Global Step: 555440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:08:29,350-Speed 2648.05 samples/sec   Loss 4.4135   LearningRate 0.0109   Epoch: 13   Global Step: 555450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:33,241-Speed 2631.96 samples/sec   Loss 4.5151   LearningRate 0.0109   Epoch: 13   Global Step: 555460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:37,132-Speed 2633.17 samples/sec   Loss 4.3778   LearningRate 0.0109   Epoch: 13   Global Step: 555470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:41,026-Speed 2630.30 samples/sec   Loss 4.5484   LearningRate 0.0109   Epoch: 13   Global Step: 555480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:44,919-Speed 2630.99 samples/sec   Loss 4.4103   LearningRate 0.0109   Epoch: 13   Global Step: 555490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:48,817-Speed 2627.95 samples/sec   Loss 4.4409   LearningRate 0.0109   Epoch: 13   Global Step: 555500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:52,715-Speed 2626.93 samples/sec   Loss 4.4119   LearningRate 0.0109   Epoch: 13   Global Step: 555510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:08:56,607-Speed 2631.74 samples/sec   Loss 4.5091   LearningRate 0.0109   Epoch: 13   Global Step: 555520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:00,502-Speed 2630.08 samples/sec   Loss 4.4950   LearningRate 0.0109   Epoch: 13   Global Step: 555530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:04,400-Speed 2627.30 samples/sec   Loss 4.4180   LearningRate 0.0109   Epoch: 13   Global Step: 555540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:08,295-Speed 2629.55 samples/sec   Loss 4.3826   LearningRate 0.0109   Epoch: 13   Global Step: 555550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:09:12,191-Speed 2629.08 samples/sec   Loss 4.4502   LearningRate 0.0109   Epoch: 13   Global Step: 555560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:09:16,083-Speed 2632.10 samples/sec   Loss 4.4721   LearningRate 0.0109   Epoch: 13   Global Step: 555570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:09:19,980-Speed 2628.14 samples/sec   Loss 4.3667   LearningRate 0.0109   Epoch: 13   Global Step: 555580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:09:23,869-Speed 2634.33 samples/sec   Loss 4.4163   LearningRate 0.0109   Epoch: 13   Global Step: 555590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:27,775-Speed 2621.86 samples/sec   Loss 4.4640   LearningRate 0.0109   Epoch: 13   Global Step: 555600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:31,678-Speed 2624.18 samples/sec   Loss 4.4352   LearningRate 0.0109   Epoch: 13   Global Step: 555610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:35,577-Speed 2627.02 samples/sec   Loss 4.5120   LearningRate 0.0109   Epoch: 13   Global Step: 555620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:39,601-Speed 2544.94 samples/sec   Loss 4.5073   LearningRate 0.0109   Epoch: 13   Global Step: 555630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:43,680-Speed 2511.29 samples/sec   Loss 4.4006   LearningRate 0.0109   Epoch: 13   Global Step: 555640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:47,760-Speed 2509.97 samples/sec   Loss 4.4822   LearningRate 0.0109   Epoch: 13   Global Step: 555650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:51,814-Speed 2526.81 samples/sec   Loss 4.5636   LearningRate 0.0109   Epoch: 13   Global Step: 555660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:55,719-Speed 2622.89 samples/sec   Loss 4.3664   LearningRate 0.0109   Epoch: 13   Global Step: 555670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:09:59,630-Speed 2618.65 samples/sec   Loss 4.3381   LearningRate 0.0109   Epoch: 13   Global Step: 555680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:03,522-Speed 2631.50 samples/sec   Loss 4.4176   LearningRate 0.0109   Epoch: 13   Global Step: 555690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:10:07,426-Speed 2623.63 samples/sec   Loss 4.3663   LearningRate 0.0109   Epoch: 13   Global Step: 555700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:10:11,303-Speed 2642.10 samples/sec   Loss 4.5179   LearningRate 0.0109   Epoch: 13   Global Step: 555710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:15,204-Speed 2625.28 samples/sec   Loss 4.4446   LearningRate 0.0109   Epoch: 13   Global Step: 555720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:19,112-Speed 2621.11 samples/sec   Loss 4.4726   LearningRate 0.0109   Epoch: 13   Global Step: 555730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:23,007-Speed 2629.85 samples/sec   Loss 4.5154   LearningRate 0.0109   Epoch: 13   Global Step: 555740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:26,915-Speed 2620.41 samples/sec   Loss 4.3924   LearningRate 0.0109   Epoch: 13   Global Step: 555750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:30,821-Speed 2623.01 samples/sec   Loss 4.4076   LearningRate 0.0109   Epoch: 13   Global Step: 555760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:34,723-Speed 2624.60 samples/sec   Loss 4.5244   LearningRate 0.0109   Epoch: 13   Global Step: 555770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:38,622-Speed 2626.86 samples/sec   Loss 4.4328   LearningRate 0.0109   Epoch: 13   Global Step: 555780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:42,539-Speed 2614.72 samples/sec   Loss 4.4621   LearningRate 0.0109   Epoch: 13   Global Step: 555790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:46,439-Speed 2626.33 samples/sec   Loss 4.4682   LearningRate 0.0109   Epoch: 13   Global Step: 555800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:50,322-Speed 2637.77 samples/sec   Loss 4.4362   LearningRate 0.0109   Epoch: 13   Global Step: 555810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:54,255-Speed 2604.60 samples/sec   Loss 4.3279   LearningRate 0.0109   Epoch: 13   Global Step: 555820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:10:58,158-Speed 2624.11 samples/sec   Loss 4.4309   LearningRate 0.0109   Epoch: 13   Global Step: 555830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:02,053-Speed 2629.37 samples/sec   Loss 4.3982   LearningRate 0.0109   Epoch: 13   Global Step: 555840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:05,954-Speed 2625.88 samples/sec   Loss 4.4992   LearningRate 0.0109   Epoch: 13   Global Step: 555850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:09,852-Speed 2627.30 samples/sec   Loss 4.4595   LearningRate 0.0109   Epoch: 13   Global Step: 555860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:13,744-Speed 2631.65 samples/sec   Loss 4.4604   LearningRate 0.0109   Epoch: 13   Global Step: 555870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:17,657-Speed 2617.93 samples/sec   Loss 4.4276   LearningRate 0.0109   Epoch: 13   Global Step: 555880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:21,556-Speed 2627.09 samples/sec   Loss 4.4244   LearningRate 0.0109   Epoch: 13   Global Step: 555890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:25,448-Speed 2631.15 samples/sec   Loss 4.4596   LearningRate 0.0109   Epoch: 13   Global Step: 555900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:29,348-Speed 2626.23 samples/sec   Loss 4.3924   LearningRate 0.0109   Epoch: 13   Global Step: 555910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:11:33,231-Speed 2637.79 samples/sec   Loss 4.4411   LearningRate 0.0109   Epoch: 13   Global Step: 555920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:37,133-Speed 2625.10 samples/sec   Loss 4.4295   LearningRate 0.0109   Epoch: 13   Global Step: 555930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:41,034-Speed 2625.43 samples/sec   Loss 4.3375   LearningRate 0.0109   Epoch: 13   Global Step: 555940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:44,931-Speed 2628.65 samples/sec   Loss 4.4198   LearningRate 0.0109   Epoch: 13   Global Step: 555950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:48,832-Speed 2625.41 samples/sec   Loss 4.4240   LearningRate 0.0109   Epoch: 13   Global Step: 555960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:52,729-Speed 2629.18 samples/sec   Loss 4.4288   LearningRate 0.0109   Epoch: 13   Global Step: 555970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:11:56,629-Speed 2626.15 samples/sec   Loss 4.6208   LearningRate 0.0109   Epoch: 13   Global Step: 555980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:00,526-Speed 2628.33 samples/sec   Loss 4.4084   LearningRate 0.0109   Epoch: 13   Global Step: 555990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:04,422-Speed 2628.68 samples/sec   Loss 4.3900   LearningRate 0.0109   Epoch: 13   Global Step: 556000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:08,326-Speed 2623.48 samples/sec   Loss 4.4507   LearningRate 0.0109   Epoch: 13   Global Step: 556010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:12,202-Speed 2642.15 samples/sec   Loss 4.4816   LearningRate 0.0109   Epoch: 13   Global Step: 556020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:16,096-Speed 2631.29 samples/sec   Loss 4.5744   LearningRate 0.0109   Epoch: 13   Global Step: 556030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:19,992-Speed 2628.76 samples/sec   Loss 4.4863   LearningRate 0.0109   Epoch: 13   Global Step: 556040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:23,885-Speed 2631.49 samples/sec   Loss 4.3539   LearningRate 0.0109   Epoch: 13   Global Step: 556050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:27,785-Speed 2626.19 samples/sec   Loss 4.5118   LearningRate 0.0109   Epoch: 13   Global Step: 556060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:31,680-Speed 2629.60 samples/sec   Loss 4.4808   LearningRate 0.0109   Epoch: 13   Global Step: 556070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:35,575-Speed 2629.29 samples/sec   Loss 4.5059   LearningRate 0.0109   Epoch: 13   Global Step: 556080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:39,473-Speed 2627.96 samples/sec   Loss 4.4648   LearningRate 0.0109   Epoch: 13   Global Step: 556090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:43,368-Speed 2629.39 samples/sec   Loss 4.3757   LearningRate 0.0109   Epoch: 13   Global Step: 556100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:47,277-Speed 2620.03 samples/sec   Loss 4.4208   LearningRate 0.0109   Epoch: 13   Global Step: 556110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:12:51,187-Speed 2619.73 samples/sec   Loss 4.4437   LearningRate 0.0109   Epoch: 13   Global Step: 556120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:12:55,091-Speed 2623.10 samples/sec   Loss 4.3668   LearningRate 0.0109   Epoch: 13   Global Step: 556130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:12:58,986-Speed 2630.37 samples/sec   Loss 4.4483   LearningRate 0.0109   Epoch: 13   Global Step: 556140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:13:02,857-Speed 2646.15 samples/sec   Loss 4.4996   LearningRate 0.0109   Epoch: 13   Global Step: 556150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:06,750-Speed 2630.46 samples/sec   Loss 4.4765   LearningRate 0.0109   Epoch: 13   Global Step: 556160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:10,646-Speed 2629.10 samples/sec   Loss 4.3926   LearningRate 0.0109   Epoch: 13   Global Step: 556170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:14,550-Speed 2623.11 samples/sec   Loss 4.4880   LearningRate 0.0109   Epoch: 13   Global Step: 556180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:18,483-Speed 2604.51 samples/sec   Loss 4.4288   LearningRate 0.0109   Epoch: 13   Global Step: 556190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:22,385-Speed 2624.95 samples/sec   Loss 4.4146   LearningRate 0.0109   Epoch: 13   Global Step: 556200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:26,293-Speed 2620.38 samples/sec   Loss 4.4382   LearningRate 0.0109   Epoch: 13   Global Step: 556210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:30,195-Speed 2625.07 samples/sec   Loss 4.5142   LearningRate 0.0109   Epoch: 13   Global Step: 556220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:34,085-Speed 2633.44 samples/sec   Loss 4.4844   LearningRate 0.0109   Epoch: 13   Global Step: 556230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:37,977-Speed 2631.88 samples/sec   Loss 4.3829   LearningRate 0.0109   Epoch: 13   Global Step: 556240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:41,872-Speed 2628.91 samples/sec   Loss 4.4401   LearningRate 0.0109   Epoch: 13   Global Step: 556250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:13:45,767-Speed 2630.45 samples/sec   Loss 4.4835   LearningRate 0.0109   Epoch: 13   Global Step: 556260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:13:49,649-Speed 2637.87 samples/sec   Loss 4.4087   LearningRate 0.0109   Epoch: 13   Global Step: 556270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:53,544-Speed 2629.88 samples/sec   Loss 4.3842   LearningRate 0.0109   Epoch: 13   Global Step: 556280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:13:57,447-Speed 2624.12 samples/sec   Loss 4.3648   LearningRate 0.0109   Epoch: 13   Global Step: 556290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:14:01,324-Speed 2641.77 samples/sec   Loss 4.4686   LearningRate 0.0109   Epoch: 13   Global Step: 556300   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:05,224-Speed 2626.19 samples/sec   Loss 4.3617   LearningRate 0.0109   Epoch: 13   Global Step: 556310   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:09,121-Speed 2628.46 samples/sec   Loss 4.4882   LearningRate 0.0109   Epoch: 13   Global Step: 556320   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:13,019-Speed 2627.58 samples/sec   Loss 4.4266   LearningRate 0.0108   Epoch: 13   Global Step: 556330   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:16,928-Speed 2621.07 samples/sec   Loss 4.5101   LearningRate 0.0108   Epoch: 13   Global Step: 556340   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:20,823-Speed 2629.38 samples/sec   Loss 4.4574   LearningRate 0.0108   Epoch: 13   Global Step: 556350   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:24,717-Speed 2630.30 samples/sec   Loss 4.4955   LearningRate 0.0108   Epoch: 13   Global Step: 556360   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:28,610-Speed 2630.87 samples/sec   Loss 4.4568   LearningRate 0.0108   Epoch: 13   Global Step: 556370   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:32,504-Speed 2630.16 samples/sec   Loss 4.4670   LearningRate 0.0108   Epoch: 13   Global Step: 556380   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:36,395-Speed 2632.71 samples/sec   Loss 4.4196   LearningRate 0.0108   Epoch: 13   Global Step: 556390   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:14:40,315-Speed 2612.81 samples/sec   Loss 4.4619   LearningRate 0.0108   Epoch: 13   Global Step: 556400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:14:44,214-Speed 2626.97 samples/sec   Loss 4.4813   LearningRate 0.0108   Epoch: 13   Global Step: 556410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:14:48,119-Speed 2623.37 samples/sec   Loss 4.4350   LearningRate 0.0108   Epoch: 13   Global Step: 556420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:14:52,027-Speed 2620.19 samples/sec   Loss 4.3918   LearningRate 0.0108   Epoch: 13   Global Step: 556430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:14:55,924-Speed 2628.52 samples/sec   Loss 4.5422   LearningRate 0.0108   Epoch: 13   Global Step: 556440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:14:59,827-Speed 2624.51 samples/sec   Loss 4.4580   LearningRate 0.0108   Epoch: 13   Global Step: 556450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:03,731-Speed 2623.35 samples/sec   Loss 4.5546   LearningRate 0.0108   Epoch: 13   Global Step: 556460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:07,641-Speed 2618.86 samples/sec   Loss 4.4152   LearningRate 0.0108   Epoch: 13   Global Step: 556470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:11,545-Speed 2623.92 samples/sec   Loss 4.3991   LearningRate 0.0108   Epoch: 13   Global Step: 556480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:15,486-Speed 2598.59 samples/sec   Loss 4.4988   LearningRate 0.0108   Epoch: 13   Global Step: 556490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:19,406-Speed 2612.75 samples/sec   Loss 4.3603   LearningRate 0.0108   Epoch: 13   Global Step: 556500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:15:23,292-Speed 2636.44 samples/sec   Loss 4.3868   LearningRate 0.0108   Epoch: 13   Global Step: 556510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:27,186-Speed 2630.15 samples/sec   Loss 4.4226   LearningRate 0.0108   Epoch: 13   Global Step: 556520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:31,094-Speed 2621.16 samples/sec   Loss 4.4219   LearningRate 0.0108   Epoch: 13   Global Step: 556530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:34,991-Speed 2628.15 samples/sec   Loss 4.4528   LearningRate 0.0108   Epoch: 13   Global Step: 556540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:38,885-Speed 2630.27 samples/sec   Loss 4.4621   LearningRate 0.0108   Epoch: 13   Global Step: 556550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:42,778-Speed 2630.26 samples/sec   Loss 4.4012   LearningRate 0.0108   Epoch: 13   Global Step: 556560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:46,686-Speed 2621.55 samples/sec   Loss 4.3846   LearningRate 0.0108   Epoch: 13   Global Step: 556570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:50,588-Speed 2624.34 samples/sec   Loss 4.4075   LearningRate 0.0108   Epoch: 13   Global Step: 556580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:54,492-Speed 2624.41 samples/sec   Loss 4.4364   LearningRate 0.0108   Epoch: 13   Global Step: 556590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:15:58,388-Speed 2628.66 samples/sec   Loss 4.4768   LearningRate 0.0108   Epoch: 13   Global Step: 556600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:16:02,294-Speed 2622.35 samples/sec   Loss 4.4987   LearningRate 0.0108   Epoch: 13   Global Step: 556610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:06,196-Speed 2624.91 samples/sec   Loss 4.3692   LearningRate 0.0108   Epoch: 13   Global Step: 556620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:10,094-Speed 2627.56 samples/sec   Loss 4.3235   LearningRate 0.0108   Epoch: 13   Global Step: 556630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:13,993-Speed 2626.66 samples/sec   Loss 4.5049   LearningRate 0.0108   Epoch: 13   Global Step: 556640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:17,883-Speed 2633.30 samples/sec   Loss 4.4299   LearningRate 0.0108   Epoch: 13   Global Step: 556650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:21,779-Speed 2629.06 samples/sec   Loss 4.4009   LearningRate 0.0108   Epoch: 13   Global Step: 556660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:25,675-Speed 2628.91 samples/sec   Loss 4.3584   LearningRate 0.0108   Epoch: 13   Global Step: 556670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:29,581-Speed 2622.48 samples/sec   Loss 4.3954   LearningRate 0.0108   Epoch: 13   Global Step: 556680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:33,475-Speed 2629.75 samples/sec   Loss 4.3963   LearningRate 0.0108   Epoch: 13   Global Step: 556690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:16:37,347-Speed 2645.48 samples/sec   Loss 4.4006   LearningRate 0.0108   Epoch: 13   Global Step: 556700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:16:41,241-Speed 2630.37 samples/sec   Loss 4.4695   LearningRate 0.0108   Epoch: 13   Global Step: 556710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:16:45,134-Speed 2631.21 samples/sec   Loss 4.5187   LearningRate 0.0108   Epoch: 13   Global Step: 556720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:16:49,038-Speed 2623.77 samples/sec   Loss 4.4827   LearningRate 0.0108   Epoch: 13   Global Step: 556730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:16:52,936-Speed 2627.41 samples/sec   Loss 4.4873   LearningRate 0.0108   Epoch: 13   Global Step: 556740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:16:56,828-Speed 2631.20 samples/sec   Loss 4.3793   LearningRate 0.0108   Epoch: 13   Global Step: 556750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:00,723-Speed 2630.48 samples/sec   Loss 4.5026   LearningRate 0.0108   Epoch: 13   Global Step: 556760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:04,626-Speed 2623.43 samples/sec   Loss 4.4503   LearningRate 0.0108   Epoch: 13   Global Step: 556770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:08,531-Speed 2623.21 samples/sec   Loss 4.4437   LearningRate 0.0108   Epoch: 13   Global Step: 556780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:12,428-Speed 2627.73 samples/sec   Loss 4.5283   LearningRate 0.0108   Epoch: 13   Global Step: 556790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:16,334-Speed 2626.47 samples/sec   Loss 4.5094   LearningRate 0.0108   Epoch: 13   Global Step: 556800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:17:20,232-Speed 2627.62 samples/sec   Loss 4.4290   LearningRate 0.0108   Epoch: 13   Global Step: 556810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:17:24,126-Speed 2630.32 samples/sec   Loss 4.4295   LearningRate 0.0108   Epoch: 13   Global Step: 556820   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:17:28,002-Speed 2642.26 samples/sec   Loss 4.3812   LearningRate 0.0108   Epoch: 13   Global Step: 556830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:31,902-Speed 2626.87 samples/sec   Loss 4.3611   LearningRate 0.0108   Epoch: 13   Global Step: 556840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:35,796-Speed 2629.54 samples/sec   Loss 4.4313   LearningRate 0.0108   Epoch: 13   Global Step: 556850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:39,694-Speed 2627.48 samples/sec   Loss 4.4145   LearningRate 0.0108   Epoch: 13   Global Step: 556860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:43,594-Speed 2626.44 samples/sec   Loss 4.3393   LearningRate 0.0108   Epoch: 13   Global Step: 556870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:47,500-Speed 2622.12 samples/sec   Loss 4.5609   LearningRate 0.0108   Epoch: 13   Global Step: 556880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:51,400-Speed 2626.40 samples/sec   Loss 4.4174   LearningRate 0.0108   Epoch: 13   Global Step: 556890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:55,295-Speed 2629.52 samples/sec   Loss 4.4611   LearningRate 0.0108   Epoch: 13   Global Step: 556900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:17:59,192-Speed 2628.33 samples/sec   Loss 4.4196   LearningRate 0.0108   Epoch: 13   Global Step: 556910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:03,085-Speed 2631.72 samples/sec   Loss 4.4606   LearningRate 0.0108   Epoch: 13   Global Step: 556920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:06,966-Speed 2638.91 samples/sec   Loss 4.3625   LearningRate 0.0108   Epoch: 13   Global Step: 556930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:10,863-Speed 2627.79 samples/sec   Loss 4.4294   LearningRate 0.0108   Epoch: 13   Global Step: 556940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:14,757-Speed 2630.21 samples/sec   Loss 4.4046   LearningRate 0.0108   Epoch: 13   Global Step: 556950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:18,667-Speed 2619.37 samples/sec   Loss 4.3689   LearningRate 0.0108   Epoch: 13   Global Step: 556960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:22,585-Speed 2614.28 samples/sec   Loss 4.3948   LearningRate 0.0108   Epoch: 13   Global Step: 556970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:26,480-Speed 2629.44 samples/sec   Loss 4.4222   LearningRate 0.0108   Epoch: 13   Global Step: 556980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:30,386-Speed 2622.71 samples/sec   Loss 4.3833   LearningRate 0.0108   Epoch: 13   Global Step: 556990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:34,278-Speed 2631.37 samples/sec   Loss 4.4756   LearningRate 0.0108   Epoch: 13   Global Step: 557000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:38,175-Speed 2628.24 samples/sec   Loss 4.4245   LearningRate 0.0108   Epoch: 13   Global Step: 557010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:42,066-Speed 2632.27 samples/sec   Loss 4.4042   LearningRate 0.0108   Epoch: 13   Global Step: 557020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:45,958-Speed 2631.75 samples/sec   Loss 4.3801   LearningRate 0.0108   Epoch: 13   Global Step: 557030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:18:49,925-Speed 2581.68 samples/sec   Loss 4.5228   LearningRate 0.0108   Epoch: 13   Global Step: 557040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:53,833-Speed 2621.21 samples/sec   Loss 4.3818   LearningRate 0.0108   Epoch: 13   Global Step: 557050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:18:57,725-Speed 2631.82 samples/sec   Loss 4.3397   LearningRate 0.0108   Epoch: 13   Global Step: 557060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:01,626-Speed 2625.58 samples/sec   Loss 4.4193   LearningRate 0.0108   Epoch: 13   Global Step: 557070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:05,530-Speed 2623.61 samples/sec   Loss 4.4865   LearningRate 0.0108   Epoch: 13   Global Step: 557080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:09,421-Speed 2631.65 samples/sec   Loss 4.3494   LearningRate 0.0108   Epoch: 13   Global Step: 557090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:13,318-Speed 2628.43 samples/sec   Loss 4.5777   LearningRate 0.0108   Epoch: 13   Global Step: 557100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:17,213-Speed 2630.23 samples/sec   Loss 4.4007   LearningRate 0.0108   Epoch: 13   Global Step: 557110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:21,107-Speed 2629.94 samples/sec   Loss 4.3540   LearningRate 0.0108   Epoch: 13   Global Step: 557120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:25,003-Speed 2629.53 samples/sec   Loss 4.3866   LearningRate 0.0108   Epoch: 13   Global Step: 557130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:28,875-Speed 2644.67 samples/sec   Loss 4.3826   LearningRate 0.0108   Epoch: 13   Global Step: 557140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:32,774-Speed 2626.97 samples/sec   Loss 4.4143   LearningRate 0.0108   Epoch: 13   Global Step: 557150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:36,667-Speed 2631.01 samples/sec   Loss 4.4353   LearningRate 0.0108   Epoch: 13   Global Step: 557160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:40,566-Speed 2626.98 samples/sec   Loss 4.4212   LearningRate 0.0108   Epoch: 13   Global Step: 557170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:44,469-Speed 2624.11 samples/sec   Loss 4.4922   LearningRate 0.0108   Epoch: 13   Global Step: 557180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:48,385-Speed 2615.70 samples/sec   Loss 4.3556   LearningRate 0.0108   Epoch: 13   Global Step: 557190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:52,290-Speed 2623.22 samples/sec   Loss 4.4887   LearningRate 0.0108   Epoch: 13   Global Step: 557200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:19:56,189-Speed 2626.37 samples/sec   Loss 4.4376   LearningRate 0.0108   Epoch: 13   Global Step: 557210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:20:00,085-Speed 2629.73 samples/sec   Loss 4.4017   LearningRate 0.0108   Epoch: 13   Global Step: 557220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:20:03,977-Speed 2631.36 samples/sec   Loss 4.4298   LearningRate 0.0108   Epoch: 13   Global Step: 557230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:20:07,868-Speed 2632.17 samples/sec   Loss 4.3262   LearningRate 0.0108   Epoch: 13   Global Step: 557240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:20:11,744-Speed 2642.49 samples/sec   Loss 4.4051   LearningRate 0.0108   Epoch: 13   Global Step: 557250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:20:15,635-Speed 2632.30 samples/sec   Loss 4.4541   LearningRate 0.0108   Epoch: 13   Global Step: 557260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:20:19,532-Speed 2627.96 samples/sec   Loss 4.4521   LearningRate 0.0108   Epoch: 13   Global Step: 557270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:20:23,427-Speed 2629.66 samples/sec   Loss 4.3498   LearningRate 0.0108   Epoch: 13   Global Step: 557280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:20:27,313-Speed 2636.33 samples/sec   Loss 4.4513   LearningRate 0.0108   Epoch: 13   Global Step: 557290   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:31,213-Speed 2625.63 samples/sec   Loss 4.4381   LearningRate 0.0108   Epoch: 13   Global Step: 557300   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:35,113-Speed 2626.86 samples/sec   Loss 4.3915   LearningRate 0.0108   Epoch: 13   Global Step: 557310   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:39,013-Speed 2626.08 samples/sec   Loss 4.4288   LearningRate 0.0108   Epoch: 13   Global Step: 557320   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:42,910-Speed 2628.23 samples/sec   Loss 4.4496   LearningRate 0.0108   Epoch: 13   Global Step: 557330   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:46,804-Speed 2630.34 samples/sec   Loss 4.4980   LearningRate 0.0108   Epoch: 13   Global Step: 557340   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:50,699-Speed 2629.45 samples/sec   Loss 4.4751   LearningRate 0.0108   Epoch: 13   Global Step: 557350   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:54,595-Speed 2629.70 samples/sec   Loss 4.4049   LearningRate 0.0108   Epoch: 13   Global Step: 557360   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:20:58,486-Speed 2631.56 samples/sec   Loss 4.4174   LearningRate 0.0108   Epoch: 13   Global Step: 557370   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:21:02,379-Speed 2631.11 samples/sec   Loss 4.4909   LearningRate 0.0108   Epoch: 13   Global Step: 557380   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:21:06,276-Speed 2628.42 samples/sec   Loss 4.3963   LearningRate 0.0108   Epoch: 13   Global Step: 557390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:10,169-Speed 2630.98 samples/sec   Loss 4.3835   LearningRate 0.0108   Epoch: 13   Global Step: 557400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:14,063-Speed 2630.39 samples/sec   Loss 4.4014   LearningRate 0.0108   Epoch: 13   Global Step: 557410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:17,956-Speed 2631.26 samples/sec   Loss 4.4953   LearningRate 0.0108   Epoch: 13   Global Step: 557420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:21,863-Speed 2621.68 samples/sec   Loss 4.3167   LearningRate 0.0108   Epoch: 13   Global Step: 557430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:25,761-Speed 2627.36 samples/sec   Loss 4.3256   LearningRate 0.0108   Epoch: 13   Global Step: 557440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:29,658-Speed 2628.26 samples/sec   Loss 4.4311   LearningRate 0.0108   Epoch: 13   Global Step: 557450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:33,560-Speed 2624.76 samples/sec   Loss 4.3982   LearningRate 0.0108   Epoch: 13   Global Step: 557460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:37,457-Speed 2628.19 samples/sec   Loss 4.4508   LearningRate 0.0108   Epoch: 13   Global Step: 557470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:41,355-Speed 2627.50 samples/sec   Loss 4.4842   LearningRate 0.0108   Epoch: 13   Global Step: 557480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:45,237-Speed 2638.87 samples/sec   Loss 4.4707   LearningRate 0.0108   Epoch: 13   Global Step: 557490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:49,133-Speed 2628.88 samples/sec   Loss 4.3849   LearningRate 0.0108   Epoch: 13   Global Step: 557500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:53,029-Speed 2629.14 samples/sec   Loss 4.4240   LearningRate 0.0108   Epoch: 13   Global Step: 557510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:21:56,926-Speed 2627.95 samples/sec   Loss 4.4431   LearningRate 0.0108   Epoch: 13   Global Step: 557520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:00,820-Speed 2630.93 samples/sec   Loss 4.5097   LearningRate 0.0108   Epoch: 13   Global Step: 557530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:04,721-Speed 2624.99 samples/sec   Loss 4.4780   LearningRate 0.0108   Epoch: 13   Global Step: 557540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:08,711-Speed 2567.06 samples/sec   Loss 4.4452   LearningRate 0.0108   Epoch: 13   Global Step: 557550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:12,787-Speed 2512.70 samples/sec   Loss 4.2918   LearningRate 0.0108   Epoch: 13   Global Step: 557560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:16,758-Speed 2579.40 samples/sec   Loss 4.3004   LearningRate 0.0108   Epoch: 13   Global Step: 557570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:20,665-Speed 2621.11 samples/sec   Loss 4.4104   LearningRate 0.0108   Epoch: 13   Global Step: 557580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:24,562-Speed 2629.12 samples/sec   Loss 4.4409   LearningRate 0.0107   Epoch: 13   Global Step: 557590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:22:28,458-Speed 2629.15 samples/sec   Loss 4.4985   LearningRate 0.0107   Epoch: 13   Global Step: 557600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:22:32,352-Speed 2630.75 samples/sec   Loss 4.4389   LearningRate 0.0107   Epoch: 13   Global Step: 557610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:22:36,229-Speed 2641.77 samples/sec   Loss 4.4364   LearningRate 0.0107   Epoch: 13   Global Step: 557620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:40,122-Speed 2631.00 samples/sec   Loss 4.4346   LearningRate 0.0107   Epoch: 13   Global Step: 557630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:44,019-Speed 2627.81 samples/sec   Loss 4.3498   LearningRate 0.0107   Epoch: 13   Global Step: 557640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:47,956-Speed 2601.77 samples/sec   Loss 4.4523   LearningRate 0.0107   Epoch: 13   Global Step: 557650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:51,853-Speed 2628.36 samples/sec   Loss 4.4830   LearningRate 0.0107   Epoch: 13   Global Step: 557660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:55,749-Speed 2629.78 samples/sec   Loss 4.4840   LearningRate 0.0107   Epoch: 13   Global Step: 557670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:22:59,646-Speed 2628.50 samples/sec   Loss 4.5224   LearningRate 0.0107   Epoch: 13   Global Step: 557680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:03,547-Speed 2626.09 samples/sec   Loss 4.3526   LearningRate 0.0107   Epoch: 13   Global Step: 557690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:07,442-Speed 2628.95 samples/sec   Loss 4.3323   LearningRate 0.0107   Epoch: 13   Global Step: 557700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:11,344-Speed 2625.19 samples/sec   Loss 4.3568   LearningRate 0.0107   Epoch: 13   Global Step: 557710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:15,249-Speed 2623.01 samples/sec   Loss 4.3719   LearningRate 0.0107   Epoch: 13   Global Step: 557720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:23:19,159-Speed 2619.29 samples/sec   Loss 4.4420   LearningRate 0.0107   Epoch: 13   Global Step: 557730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:23:23,052-Speed 2632.41 samples/sec   Loss 4.5323   LearningRate 0.0107   Epoch: 13   Global Step: 557740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:23:26,945-Speed 2630.52 samples/sec   Loss 4.4829   LearningRate 0.0107   Epoch: 13   Global Step: 557750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:23:30,887-Speed 2599.39 samples/sec   Loss 4.4618   LearningRate 0.0107   Epoch: 13   Global Step: 557760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:23:34,802-Speed 2616.19 samples/sec   Loss 4.4433   LearningRate 0.0107   Epoch: 13   Global Step: 557770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:23:38,688-Speed 2635.53 samples/sec   Loss 4.4495   LearningRate 0.0107   Epoch: 13   Global Step: 557780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:42,583-Speed 2629.25 samples/sec   Loss 4.4661   LearningRate 0.0107   Epoch: 13   Global Step: 557790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:46,479-Speed 2629.73 samples/sec   Loss 4.4233   LearningRate 0.0107   Epoch: 13   Global Step: 557800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:50,373-Speed 2629.84 samples/sec   Loss 4.3825   LearningRate 0.0107   Epoch: 13   Global Step: 557810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:54,293-Speed 2613.21 samples/sec   Loss 4.3629   LearningRate 0.0107   Epoch: 13   Global Step: 557820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:23:58,198-Speed 2622.99 samples/sec   Loss 4.3514   LearningRate 0.0107   Epoch: 13   Global Step: 557830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:24:02,125-Speed 2608.63 samples/sec   Loss 4.4025   LearningRate 0.0107   Epoch: 13   Global Step: 557840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:24:06,020-Speed 2629.66 samples/sec   Loss 4.3454   LearningRate 0.0107   Epoch: 13   Global Step: 557850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:24:09,917-Speed 2628.81 samples/sec   Loss 4.4840   LearningRate 0.0107   Epoch: 13   Global Step: 557860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:24:13,808-Speed 2631.88 samples/sec   Loss 4.4233   LearningRate 0.0107   Epoch: 13   Global Step: 557870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:24:17,701-Speed 2631.64 samples/sec   Loss 4.5307   LearningRate 0.0107   Epoch: 13   Global Step: 557880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:24:21,594-Speed 2630.61 samples/sec   Loss 4.4194   LearningRate 0.0107   Epoch: 13   Global Step: 557890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:24:25,486-Speed 2631.49 samples/sec   Loss 4.4690   LearningRate 0.0107   Epoch: 13   Global Step: 557900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:24:29,381-Speed 2629.85 samples/sec   Loss 4.4059   LearningRate 0.0107   Epoch: 13   Global Step: 557910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:24:33,275-Speed 2630.56 samples/sec   Loss 4.3354   LearningRate 0.0107   Epoch: 13   Global Step: 557920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-15 10:24:37,148-Speed 2644.76 samples/sec   Loss 4.4650   LearningRate 0.0107   Epoch: 13   Global Step: 557930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:24:41,045-Speed 2628.49 samples/sec   Loss 4.4705   LearningRate 0.0107   Epoch: 13   Global Step: 557940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:24:44,916-Speed 2645.68 samples/sec   Loss 4.3880   LearningRate 0.0107   Epoch: 13   Global Step: 557950   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:24:48,809-Speed 2630.62 samples/sec   Loss 4.4396   LearningRate 0.0107   Epoch: 13   Global Step: 557960   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:24:52,708-Speed 2627.55 samples/sec   Loss 4.3379   LearningRate 0.0107   Epoch: 13   Global Step: 557970   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:24:56,600-Speed 2631.03 samples/sec   Loss 4.4179   LearningRate 0.0107   Epoch: 13   Global Step: 557980   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:25:00,492-Speed 2631.80 samples/sec   Loss 4.4075   LearningRate 0.0107   Epoch: 13   Global Step: 557990   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:25:04,390-Speed 2627.65 samples/sec   Loss 4.3170   LearningRate 0.0107   Epoch: 13   Global Step: 558000   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:25:08,295-Speed 2623.38 samples/sec   Loss 4.3987   LearningRate 0.0107   Epoch: 13   Global Step: 558010   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:25:12,188-Speed 2630.81 samples/sec   Loss 4.4419   LearningRate 0.0107   Epoch: 13   Global Step: 558020   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:25:16,094-Speed 2622.09 samples/sec   Loss 4.4447   LearningRate 0.0107   Epoch: 13   Global Step: 558030   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:25:19,988-Speed 2630.87 samples/sec   Loss 4.4569   LearningRate 0.0107   Epoch: 13   Global Step: 558040   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-15 10:25:23,881-Speed 2630.94 samples/sec   Loss 4.3701   LearningRate 0.0107   Epoch: 13   Global Step: 558050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:25:27,782-Speed 2626.10 samples/sec   Loss 4.5688   LearningRate 0.0107   Epoch: 13   Global Step: 558060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:25:31,703-Speed 2612.21 samples/sec   Loss 4.3900   LearningRate 0.0107   Epoch: 13   Global Step: 558070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:25:35,602-Speed 2626.87 samples/sec   Loss 4.3271   LearningRate 0.0107   Epoch: 13   Global Step: 558080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-15 10:25:39,497-Speed 2629.69 samples/sec   Loss 4.4411   LearningRate 0.0107   Epoch: 13   Global Step: 558090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:25:43,376-Speed 2640.27 samples/sec   Loss 4.4158   LearningRate 0.0107   Epoch: 13   Global Step: 558100   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:25:47,278-Speed 2624.99 samples/sec   Loss 4.4036   LearningRate 0.0107   Epoch: 13   Global Step: 558110   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:25:51,172-Speed 2630.65 samples/sec   Loss 4.4347   LearningRate 0.0107   Epoch: 13   Global Step: 558120   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:25:55,096-Speed 2610.39 samples/sec   Loss 4.3629   LearningRate 0.0107   Epoch: 13   Global Step: 558130   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:25:58,996-Speed 2627.25 samples/sec   Loss 4.3030   LearningRate 0.0107   Epoch: 13   Global Step: 558140   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:02,922-Speed 2608.87 samples/sec   Loss 4.3978   LearningRate 0.0107   Epoch: 13   Global Step: 558150   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:06,814-Speed 2631.97 samples/sec   Loss 4.4046   LearningRate 0.0107   Epoch: 13   Global Step: 558160   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:10,713-Speed 2627.04 samples/sec   Loss 4.3991   LearningRate 0.0107   Epoch: 13   Global Step: 558170   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:14,611-Speed 2627.81 samples/sec   Loss 4.3734   LearningRate 0.0107   Epoch: 13   Global Step: 558180   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:18,508-Speed 2628.20 samples/sec   Loss 4.3759   LearningRate 0.0107   Epoch: 13   Global Step: 558190   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:22,438-Speed 2606.12 samples/sec   Loss 4.3536   LearningRate 0.0107   Epoch: 13   Global Step: 558200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:26:26,318-Speed 2640.27 samples/sec   Loss 4.5187   LearningRate 0.0107   Epoch: 13   Global Step: 558210   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:30,225-Speed 2621.48 samples/sec   Loss 4.5099   LearningRate 0.0107   Epoch: 13   Global Step: 558220   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:34,138-Speed 2617.32 samples/sec   Loss 4.3736   LearningRate 0.0107   Epoch: 13   Global Step: 558230   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:38,035-Speed 2628.77 samples/sec   Loss 4.4574   LearningRate 0.0107   Epoch: 13   Global Step: 558240   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:41,937-Speed 2625.26 samples/sec   Loss 4.3342   LearningRate 0.0107   Epoch: 13   Global Step: 558250   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:45,837-Speed 2626.37 samples/sec   Loss 4.3055   LearningRate 0.0107   Epoch: 13   Global Step: 558260   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:49,756-Speed 2613.51 samples/sec   Loss 4.3536   LearningRate 0.0107   Epoch: 13   Global Step: 558270   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:53,648-Speed 2632.93 samples/sec   Loss 4.3692   LearningRate 0.0107   Epoch: 13   Global Step: 558280   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:26:57,543-Speed 2629.16 samples/sec   Loss 4.4595   LearningRate 0.0107   Epoch: 13   Global Step: 558290   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:01,440-Speed 2627.98 samples/sec   Loss 4.4683   LearningRate 0.0107   Epoch: 13   Global Step: 558300   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:05,341-Speed 2625.57 samples/sec   Loss 4.3090   LearningRate 0.0107   Epoch: 13   Global Step: 558310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:27:09,237-Speed 2629.81 samples/sec   Loss 4.4274   LearningRate 0.0107   Epoch: 13   Global Step: 558320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:27:13,134-Speed 2628.48 samples/sec   Loss 4.4591   LearningRate 0.0107   Epoch: 13   Global Step: 558330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:27:17,024-Speed 2632.64 samples/sec   Loss 4.3779   LearningRate 0.0107   Epoch: 13   Global Step: 558340   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:20,923-Speed 2627.58 samples/sec   Loss 4.4032   LearningRate 0.0107   Epoch: 13   Global Step: 558350   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:24,820-Speed 2628.58 samples/sec   Loss 4.4354   LearningRate 0.0107   Epoch: 13   Global Step: 558360   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:28,719-Speed 2627.38 samples/sec   Loss 4.4560   LearningRate 0.0107   Epoch: 13   Global Step: 558370   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:32,740-Speed 2546.87 samples/sec   Loss 4.4551   LearningRate 0.0107   Epoch: 13   Global Step: 558380   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:36,650-Speed 2619.33 samples/sec   Loss 4.3476   LearningRate 0.0107   Epoch: 13   Global Step: 558390   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:40,558-Speed 2621.33 samples/sec   Loss 4.3853   LearningRate 0.0107   Epoch: 13   Global Step: 558400   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:44,464-Speed 2622.88 samples/sec   Loss 4.4358   LearningRate 0.0107   Epoch: 13   Global Step: 558410   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:48,390-Speed 2609.09 samples/sec   Loss 4.4601   LearningRate 0.0107   Epoch: 13   Global Step: 558420   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:52,288-Speed 2627.35 samples/sec   Loss 4.3175   LearningRate 0.0107   Epoch: 13   Global Step: 558430   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:27:56,207-Speed 2614.18 samples/sec   Loss 4.3118   LearningRate 0.0107   Epoch: 13   Global Step: 558440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:28:00,122-Speed 2615.80 samples/sec   Loss 4.4520   LearningRate 0.0107   Epoch: 13   Global Step: 558450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:28:04,001-Speed 2640.49 samples/sec   Loss 4.4309   LearningRate 0.0107   Epoch: 13   Global Step: 558460   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:07,900-Speed 2626.64 samples/sec   Loss 4.4072   LearningRate 0.0107   Epoch: 13   Global Step: 558470   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:11,800-Speed 2626.70 samples/sec   Loss 4.4297   LearningRate 0.0107   Epoch: 13   Global Step: 558480   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:15,696-Speed 2629.15 samples/sec   Loss 4.4619   LearningRate 0.0107   Epoch: 13   Global Step: 558490   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:19,591-Speed 2629.83 samples/sec   Loss 4.4148   LearningRate 0.0107   Epoch: 13   Global Step: 558500   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:23,492-Speed 2625.66 samples/sec   Loss 4.3542   LearningRate 0.0107   Epoch: 13   Global Step: 558510   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:27,427-Speed 2603.22 samples/sec   Loss 4.3383   LearningRate 0.0107   Epoch: 13   Global Step: 558520   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:31,362-Speed 2603.07 samples/sec   Loss 4.4497   LearningRate 0.0107   Epoch: 13   Global Step: 558530   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:35,258-Speed 2628.56 samples/sec   Loss 4.3405   LearningRate 0.0107   Epoch: 13   Global Step: 558540   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:39,155-Speed 2628.50 samples/sec   Loss 4.4749   LearningRate 0.0107   Epoch: 13   Global Step: 558550   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:28:43,053-Speed 2628.43 samples/sec   Loss 4.3705   LearningRate 0.0107   Epoch: 13   Global Step: 558560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:28:46,954-Speed 2625.55 samples/sec   Loss 4.4589   LearningRate 0.0107   Epoch: 13   Global Step: 558570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:28:50,851-Speed 2628.61 samples/sec   Loss 4.4627   LearningRate 0.0107   Epoch: 13   Global Step: 558580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:28:54,750-Speed 2627.12 samples/sec   Loss 4.4210   LearningRate 0.0107   Epoch: 13   Global Step: 558590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:28:58,658-Speed 2621.41 samples/sec   Loss 4.4281   LearningRate 0.0107   Epoch: 13   Global Step: 558600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:02,653-Speed 2563.95 samples/sec   Loss 4.4026   LearningRate 0.0107   Epoch: 13   Global Step: 558610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:06,567-Speed 2616.55 samples/sec   Loss 4.3633   LearningRate 0.0107   Epoch: 13   Global Step: 558620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:10,459-Speed 2631.81 samples/sec   Loss 4.4258   LearningRate 0.0107   Epoch: 13   Global Step: 558630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:14,351-Speed 2631.99 samples/sec   Loss 4.4271   LearningRate 0.0107   Epoch: 13   Global Step: 558640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:18,247-Speed 2629.44 samples/sec   Loss 4.4075   LearningRate 0.0107   Epoch: 13   Global Step: 558650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:22,139-Speed 2632.25 samples/sec   Loss 4.3884   LearningRate 0.0107   Epoch: 13   Global Step: 558660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:29:26,005-Speed 2649.67 samples/sec   Loss 4.5391   LearningRate 0.0107   Epoch: 13   Global Step: 558670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:29,896-Speed 2632.05 samples/sec   Loss 4.4656   LearningRate 0.0107   Epoch: 13   Global Step: 558680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:33,793-Speed 2628.04 samples/sec   Loss 4.4625   LearningRate 0.0107   Epoch: 13   Global Step: 558690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:37,690-Speed 2628.24 samples/sec   Loss 4.4019   LearningRate 0.0107   Epoch: 13   Global Step: 558700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:41,584-Speed 2630.67 samples/sec   Loss 4.3991   LearningRate 0.0107   Epoch: 13   Global Step: 558710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:45,479-Speed 2630.20 samples/sec   Loss 4.3396   LearningRate 0.0107   Epoch: 13   Global Step: 558720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:29:49,351-Speed 2645.52 samples/sec   Loss 4.4527   LearningRate 0.0107   Epoch: 13   Global Step: 558730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:29:53,260-Speed 2619.87 samples/sec   Loss 4.4348   LearningRate 0.0107   Epoch: 13   Global Step: 558740   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:29:57,152-Speed 2632.35 samples/sec   Loss 4.5384   LearningRate 0.0107   Epoch: 13   Global Step: 558750   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:01,069-Speed 2614.92 samples/sec   Loss 4.2806   LearningRate 0.0107   Epoch: 13   Global Step: 558760   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:04,981-Speed 2617.62 samples/sec   Loss 4.4743   LearningRate 0.0107   Epoch: 13   Global Step: 558770   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:08,879-Speed 2627.45 samples/sec   Loss 4.4103   LearningRate 0.0107   Epoch: 13   Global Step: 558780   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:12,778-Speed 2627.78 samples/sec   Loss 4.3133   LearningRate 0.0107   Epoch: 13   Global Step: 558790   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:16,675-Speed 2628.48 samples/sec   Loss 4.4033   LearningRate 0.0107   Epoch: 13   Global Step: 558800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:20,594-Speed 2612.85 samples/sec   Loss 4.3777   LearningRate 0.0107   Epoch: 13   Global Step: 558810   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:24,504-Speed 2620.37 samples/sec   Loss 4.4944   LearningRate 0.0107   Epoch: 13   Global Step: 558820   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:30:28,403-Speed 2626.72 samples/sec   Loss 4.4700   LearningRate 0.0107   Epoch: 13   Global Step: 558830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:32,338-Speed 2603.32 samples/sec   Loss 4.4152   LearningRate 0.0107   Epoch: 13   Global Step: 558840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:36,229-Speed 2632.74 samples/sec   Loss 4.4266   LearningRate 0.0107   Epoch: 13   Global Step: 558850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:40,126-Speed 2627.86 samples/sec   Loss 4.3158   LearningRate 0.0106   Epoch: 13   Global Step: 558860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:44,033-Speed 2621.86 samples/sec   Loss 4.3255   LearningRate 0.0106   Epoch: 13   Global Step: 558870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:47,926-Speed 2631.73 samples/sec   Loss 4.3337   LearningRate 0.0106   Epoch: 13   Global Step: 558880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:51,819-Speed 2631.41 samples/sec   Loss 4.3572   LearningRate 0.0106   Epoch: 13   Global Step: 558890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:55,780-Speed 2585.52 samples/sec   Loss 4.4268   LearningRate 0.0106   Epoch: 13   Global Step: 558900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:30:59,675-Speed 2630.01 samples/sec   Loss 4.3510   LearningRate 0.0106   Epoch: 13   Global Step: 558910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:03,582-Speed 2621.83 samples/sec   Loss 4.3791   LearningRate 0.0106   Epoch: 13   Global Step: 558920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:07,469-Speed 2634.66 samples/sec   Loss 4.4118   LearningRate 0.0106   Epoch: 13   Global Step: 558930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:11,364-Speed 2629.77 samples/sec   Loss 4.4082   LearningRate 0.0106   Epoch: 13   Global Step: 558940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:15,260-Speed 2628.43 samples/sec   Loss 4.3680   LearningRate 0.0106   Epoch: 13   Global Step: 558950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:19,155-Speed 2630.33 samples/sec   Loss 4.5066   LearningRate 0.0106   Epoch: 13   Global Step: 558960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:23,048-Speed 2631.77 samples/sec   Loss 4.4116   LearningRate 0.0106   Epoch: 13   Global Step: 558970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:26,939-Speed 2632.14 samples/sec   Loss 4.3165   LearningRate 0.0106   Epoch: 13   Global Step: 558980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:30,835-Speed 2629.05 samples/sec   Loss 4.3560   LearningRate 0.0106   Epoch: 13   Global Step: 558990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:34,739-Speed 2623.77 samples/sec   Loss 4.4469   LearningRate 0.0106   Epoch: 13   Global Step: 559000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:38,636-Speed 2627.97 samples/sec   Loss 4.3450   LearningRate 0.0106   Epoch: 13   Global Step: 559010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:42,542-Speed 2622.36 samples/sec   Loss 4.4436   LearningRate 0.0106   Epoch: 13   Global Step: 559020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:31:46,455-Speed 2618.06 samples/sec   Loss 4.4233   LearningRate 0.0106   Epoch: 13   Global Step: 559030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:31:50,359-Speed 2623.28 samples/sec   Loss 4.3555   LearningRate 0.0106   Epoch: 13   Global Step: 559040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:31:54,292-Speed 2604.82 samples/sec   Loss 4.3909   LearningRate 0.0106   Epoch: 13   Global Step: 559050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:31:58,187-Speed 2629.56 samples/sec   Loss 4.3781   LearningRate 0.0106   Epoch: 13   Global Step: 559060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:32:02,059-Speed 2645.65 samples/sec   Loss 4.5332   LearningRate 0.0106   Epoch: 13   Global Step: 559070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:05,962-Speed 2624.34 samples/sec   Loss 4.5256   LearningRate 0.0106   Epoch: 13   Global Step: 559080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:09,901-Speed 2600.09 samples/sec   Loss 4.4551   LearningRate 0.0106   Epoch: 13   Global Step: 559090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:13,798-Speed 2627.96 samples/sec   Loss 4.4952   LearningRate 0.0106   Epoch: 13   Global Step: 559100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:17,693-Speed 2630.28 samples/sec   Loss 4.3982   LearningRate 0.0106   Epoch: 13   Global Step: 559110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:21,602-Speed 2619.80 samples/sec   Loss 4.4323   LearningRate 0.0106   Epoch: 13   Global Step: 559120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:25,505-Speed 2624.70 samples/sec   Loss 4.4802   LearningRate 0.0106   Epoch: 13   Global Step: 559130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:29,404-Speed 2630.37 samples/sec   Loss 4.4819   LearningRate 0.0106   Epoch: 13   Global Step: 559140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:33,298-Speed 2630.91 samples/sec   Loss 4.3563   LearningRate 0.0106   Epoch: 13   Global Step: 559150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:37,190-Speed 2631.80 samples/sec   Loss 4.3924   LearningRate 0.0106   Epoch: 13   Global Step: 559160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:41,067-Speed 2641.28 samples/sec   Loss 4.3738   LearningRate 0.0106   Epoch: 13   Global Step: 559170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:44,962-Speed 2630.18 samples/sec   Loss 4.3522   LearningRate 0.0106   Epoch: 13   Global Step: 559180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:48,854-Speed 2631.49 samples/sec   Loss 4.4670   LearningRate 0.0106   Epoch: 13   Global Step: 559190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:52,759-Speed 2623.10 samples/sec   Loss 4.4145   LearningRate 0.0106   Epoch: 13   Global Step: 559200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:32:56,672-Speed 2617.48 samples/sec   Loss 4.4068   LearningRate 0.0106   Epoch: 13   Global Step: 559210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:00,600-Speed 2607.30 samples/sec   Loss 4.3574   LearningRate 0.0106   Epoch: 13   Global Step: 559220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:04,514-Speed 2616.59 samples/sec   Loss 4.3923   LearningRate 0.0106   Epoch: 13   Global Step: 559230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:08,417-Speed 2624.84 samples/sec   Loss 4.3725   LearningRate 0.0106   Epoch: 13   Global Step: 559240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:12,373-Speed 2589.35 samples/sec   Loss 4.3537   LearningRate 0.0106   Epoch: 13   Global Step: 559250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:16,271-Speed 2627.17 samples/sec   Loss 4.4160   LearningRate 0.0106   Epoch: 13   Global Step: 559260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:20,152-Speed 2639.24 samples/sec   Loss 4.4850   LearningRate 0.0106   Epoch: 13   Global Step: 559270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:24,065-Speed 2618.08 samples/sec   Loss 4.4155   LearningRate 0.0106   Epoch: 13   Global Step: 559280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:27,978-Speed 2617.70 samples/sec   Loss 4.3142   LearningRate 0.0106   Epoch: 13   Global Step: 559290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:31,884-Speed 2622.18 samples/sec   Loss 4.3532   LearningRate 0.0106   Epoch: 13   Global Step: 559300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:35,787-Speed 2623.94 samples/sec   Loss 4.4223   LearningRate 0.0106   Epoch: 13   Global Step: 559310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:39,684-Speed 2628.52 samples/sec   Loss 4.3469   LearningRate 0.0106   Epoch: 13   Global Step: 559320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:43,585-Speed 2625.48 samples/sec   Loss 4.3964   LearningRate 0.0106   Epoch: 13   Global Step: 559330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:47,500-Speed 2616.59 samples/sec   Loss 4.4444   LearningRate 0.0106   Epoch: 13   Global Step: 559340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:51,399-Speed 2627.67 samples/sec   Loss 4.4485   LearningRate 0.0106   Epoch: 13   Global Step: 559350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:55,292-Speed 2630.84 samples/sec   Loss 4.3642   LearningRate 0.0106   Epoch: 13   Global Step: 559360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:33:59,207-Speed 2616.06 samples/sec   Loss 4.3455   LearningRate 0.0106   Epoch: 13   Global Step: 559370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:34:03,106-Speed 2627.54 samples/sec   Loss 4.4441   LearningRate 0.0106   Epoch: 13   Global Step: 559380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:34:07,020-Speed 2616.81 samples/sec   Loss 4.4072   LearningRate 0.0106   Epoch: 13   Global Step: 559390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:34:10,918-Speed 2627.38 samples/sec   Loss 4.2770   LearningRate 0.0106   Epoch: 13   Global Step: 559400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:34:14,818-Speed 2626.87 samples/sec   Loss 4.4008   LearningRate 0.0106   Epoch: 13   Global Step: 559410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:34:18,714-Speed 2630.43 samples/sec   Loss 4.4930   LearningRate 0.0106   Epoch: 13   Global Step: 559420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:34:22,592-Speed 2640.52 samples/sec   Loss 4.3557   LearningRate 0.0106   Epoch: 13   Global Step: 559430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:34:26,470-Speed 2641.80 samples/sec   Loss 4.4621   LearningRate 0.0106   Epoch: 13   Global Step: 559440   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:30,361-Speed 2632.28 samples/sec   Loss 4.3987   LearningRate 0.0106   Epoch: 13   Global Step: 559450   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:34,255-Speed 2630.43 samples/sec   Loss 4.4177   LearningRate 0.0106   Epoch: 13   Global Step: 559460   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:38,157-Speed 2624.74 samples/sec   Loss 4.3901   LearningRate 0.0106   Epoch: 13   Global Step: 559470   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:42,066-Speed 2620.61 samples/sec   Loss 4.3454   LearningRate 0.0106   Epoch: 13   Global Step: 559480   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:45,960-Speed 2630.52 samples/sec   Loss 4.4776   LearningRate 0.0106   Epoch: 13   Global Step: 559490   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:49,895-Speed 2602.83 samples/sec   Loss 4.3207   LearningRate 0.0106   Epoch: 13   Global Step: 559500   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:53,793-Speed 2627.40 samples/sec   Loss 4.5067   LearningRate 0.0106   Epoch: 13   Global Step: 559510   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:34:57,684-Speed 2632.78 samples/sec   Loss 4.2880   LearningRate 0.0106   Epoch: 13   Global Step: 559520   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:35:01,576-Speed 2631.84 samples/sec   Loss 4.3297   LearningRate 0.0106   Epoch: 13   Global Step: 559530   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:35:05,488-Speed 2617.51 samples/sec   Loss 4.3459   LearningRate 0.0106   Epoch: 13   Global Step: 559540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:09,385-Speed 2628.37 samples/sec   Loss 4.3446   LearningRate 0.0106   Epoch: 13   Global Step: 559550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:13,290-Speed 2623.12 samples/sec   Loss 4.4338   LearningRate 0.0106   Epoch: 13   Global Step: 559560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:17,180-Speed 2632.54 samples/sec   Loss 4.3950   LearningRate 0.0106   Epoch: 13   Global Step: 559570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:21,073-Speed 2631.51 samples/sec   Loss 4.4430   LearningRate 0.0106   Epoch: 13   Global Step: 559580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:24,967-Speed 2630.53 samples/sec   Loss 4.3626   LearningRate 0.0106   Epoch: 13   Global Step: 559590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:28,881-Speed 2617.02 samples/sec   Loss 4.3927   LearningRate 0.0106   Epoch: 13   Global Step: 559600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:32,798-Speed 2614.78 samples/sec   Loss 4.4233   LearningRate 0.0106   Epoch: 13   Global Step: 559610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:36,693-Speed 2629.27 samples/sec   Loss 4.4332   LearningRate 0.0106   Epoch: 13   Global Step: 559620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:40,592-Speed 2626.58 samples/sec   Loss 4.3608   LearningRate 0.0106   Epoch: 13   Global Step: 559630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:35:44,452-Speed 2653.63 samples/sec   Loss 4.4027   LearningRate 0.0106   Epoch: 13   Global Step: 559640   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:35:48,365-Speed 2617.98 samples/sec   Loss 4.3464   LearningRate 0.0106   Epoch: 13   Global Step: 559650   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:35:52,275-Speed 2619.26 samples/sec   Loss 4.3786   LearningRate 0.0106   Epoch: 13   Global Step: 559660   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:35:56,181-Speed 2622.16 samples/sec   Loss 4.3572   LearningRate 0.0106   Epoch: 13   Global Step: 559670   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:36:00,088-Speed 2621.92 samples/sec   Loss 4.3961   LearningRate 0.0106   Epoch: 13   Global Step: 559680   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:36:03,996-Speed 2620.47 samples/sec   Loss 4.4272   LearningRate 0.0106   Epoch: 13   Global Step: 559690   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:36:07,909-Speed 2617.43 samples/sec   Loss 4.3597   LearningRate 0.0106   Epoch: 13   Global Step: 559700   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:36:11,816-Speed 2622.00 samples/sec   Loss 4.4384   LearningRate 0.0106   Epoch: 13   Global Step: 559710   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:36:15,716-Speed 2626.22 samples/sec   Loss 4.3835   LearningRate 0.0106   Epoch: 13   Global Step: 559720   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:36:19,632-Speed 2615.35 samples/sec   Loss 4.3492   LearningRate 0.0106   Epoch: 13   Global Step: 559730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:36:23,542-Speed 2619.95 samples/sec   Loss 4.4683   LearningRate 0.0106   Epoch: 13   Global Step: 559740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:27,446-Speed 2623.44 samples/sec   Loss 4.4535   LearningRate 0.0106   Epoch: 13   Global Step: 559750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:31,342-Speed 2628.85 samples/sec   Loss 4.3753   LearningRate 0.0106   Epoch: 13   Global Step: 559760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:35,240-Speed 2627.86 samples/sec   Loss 4.3797   LearningRate 0.0106   Epoch: 13   Global Step: 559770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:39,144-Speed 2623.95 samples/sec   Loss 4.3264   LearningRate 0.0106   Epoch: 13   Global Step: 559780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:43,042-Speed 2627.16 samples/sec   Loss 4.3605   LearningRate 0.0106   Epoch: 13   Global Step: 559790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:46,940-Speed 2628.07 samples/sec   Loss 4.4697   LearningRate 0.0106   Epoch: 13   Global Step: 559800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:50,834-Speed 2630.03 samples/sec   Loss 4.3862   LearningRate 0.0106   Epoch: 13   Global Step: 559810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:54,730-Speed 2628.93 samples/sec   Loss 4.3786   LearningRate 0.0106   Epoch: 13   Global Step: 559820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:36:58,625-Speed 2629.28 samples/sec   Loss 4.3901   LearningRate 0.0106   Epoch: 13   Global Step: 559830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:02,517-Speed 2632.91 samples/sec   Loss 4.3672   LearningRate 0.0106   Epoch: 13   Global Step: 559840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:37:06,417-Speed 2625.88 samples/sec   Loss 4.3911   LearningRate 0.0106   Epoch: 13   Global Step: 559850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:37:10,313-Speed 2629.30 samples/sec   Loss 4.3336   LearningRate 0.0106   Epoch: 13   Global Step: 559860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:37:14,188-Speed 2643.00 samples/sec   Loss 4.4121   LearningRate 0.0106   Epoch: 13   Global Step: 559870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:18,085-Speed 2629.21 samples/sec   Loss 4.3533   LearningRate 0.0106   Epoch: 13   Global Step: 559880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:21,982-Speed 2627.87 samples/sec   Loss 4.3745   LearningRate 0.0106   Epoch: 13   Global Step: 559890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:25,876-Speed 2630.58 samples/sec   Loss 4.2610   LearningRate 0.0106   Epoch: 13   Global Step: 559900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:29,783-Speed 2621.48 samples/sec   Loss 4.3212   LearningRate 0.0106   Epoch: 13   Global Step: 559910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:33,683-Speed 2626.44 samples/sec   Loss 4.3650   LearningRate 0.0106   Epoch: 13   Global Step: 559920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:37,588-Speed 2622.58 samples/sec   Loss 4.3276   LearningRate 0.0106   Epoch: 13   Global Step: 559930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:41,491-Speed 2624.27 samples/sec   Loss 4.3723   LearningRate 0.0106   Epoch: 13   Global Step: 559940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:45,390-Speed 2626.85 samples/sec   Loss 4.4041   LearningRate 0.0106   Epoch: 13   Global Step: 559950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:49,302-Speed 2618.70 samples/sec   Loss 4.4409   LearningRate 0.0106   Epoch: 13   Global Step: 559960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:37:53,194-Speed 2631.26 samples/sec   Loss 4.3892   LearningRate 0.0106   Epoch: 13   Global Step: 559970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:37:57,071-Speed 2642.38 samples/sec   Loss 4.3588   LearningRate 0.0106   Epoch: 13   Global Step: 559980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:38:00,978-Speed 2621.19 samples/sec   Loss 4.3907   LearningRate 0.0106   Epoch: 13   Global Step: 559990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:38:04,883-Speed 2623.09 samples/sec   Loss 4.3959   LearningRate 0.0106   Epoch: 13   Global Step: 560000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:38:48,319-[lfw][560000]XNorm: 22.792994
Training: 2022-04-15 10:38:48,320-[lfw][560000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 10:38:48,321-[lfw][560000]Accuracy-Highest: 0.99800
Training: 2022-04-15 10:39:38,317-[cfp_fp][560000]XNorm: 21.402369
Training: 2022-04-15 10:39:38,318-[cfp_fp][560000]Accuracy-Flip: 0.99029+-0.00366
Training: 2022-04-15 10:39:38,319-[cfp_fp][560000]Accuracy-Highest: 0.99086
Training: 2022-04-15 10:40:21,246-[agedb_30][560000]XNorm: 22.867380
Training: 2022-04-15 10:40:21,247-[agedb_30][560000]Accuracy-Flip: 0.98067+-0.00578
Training: 2022-04-15 10:40:21,247-[agedb_30][560000]Accuracy-Highest: 0.98083
Training: 2022-04-15 10:40:25,142-Speed 73.01 samples/sec   Loss 4.3806   LearningRate 0.0106   Epoch: 13   Global Step: 560010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:40:29,018-Speed 2642.30 samples/sec   Loss 4.4120   LearningRate 0.0106   Epoch: 13   Global Step: 560020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:40:32,900-Speed 2639.25 samples/sec   Loss 4.4262   LearningRate 0.0106   Epoch: 13   Global Step: 560030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:40:36,926-Speed 2543.87 samples/sec   Loss 4.4426   LearningRate 0.0106   Epoch: 13   Global Step: 560040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:40:40,918-Speed 2565.76 samples/sec   Loss 4.3727   LearningRate 0.0106   Epoch: 13   Global Step: 560050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:40:44,811-Speed 2630.68 samples/sec   Loss 4.4476   LearningRate 0.0106   Epoch: 13   Global Step: 560060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:40:48,701-Speed 2633.14 samples/sec   Loss 4.3551   LearningRate 0.0106   Epoch: 13   Global Step: 560070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:40:52,595-Speed 2630.39 samples/sec   Loss 4.3802   LearningRate 0.0106   Epoch: 13   Global Step: 560080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:40:56,468-Speed 2644.70 samples/sec   Loss 4.3114   LearningRate 0.0106   Epoch: 13   Global Step: 560090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:41:00,371-Speed 2624.68 samples/sec   Loss 4.3172   LearningRate 0.0106   Epoch: 13   Global Step: 560100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:41:04,529-Speed 2462.96 samples/sec   Loss 4.4004   LearningRate 0.0106   Epoch: 13   Global Step: 560110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:41:08,420-Speed 2632.13 samples/sec   Loss 4.3991   LearningRate 0.0106   Epoch: 13   Global Step: 560120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:41:12,297-Speed 2642.78 samples/sec   Loss 4.3498   LearningRate 0.0105   Epoch: 13   Global Step: 560130   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:16,187-Speed 2632.79 samples/sec   Loss 4.3744   LearningRate 0.0105   Epoch: 13   Global Step: 560140   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:20,089-Speed 2624.67 samples/sec   Loss 4.4216   LearningRate 0.0105   Epoch: 13   Global Step: 560150   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:23,992-Speed 2624.72 samples/sec   Loss 4.2975   LearningRate 0.0105   Epoch: 13   Global Step: 560160   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:27,890-Speed 2627.30 samples/sec   Loss 4.4379   LearningRate 0.0105   Epoch: 13   Global Step: 560170   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:31,791-Speed 2626.14 samples/sec   Loss 4.4587   LearningRate 0.0105   Epoch: 13   Global Step: 560180   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:35,730-Speed 2600.05 samples/sec   Loss 4.2979   LearningRate 0.0105   Epoch: 13   Global Step: 560190   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:39,624-Speed 2630.68 samples/sec   Loss 4.3314   LearningRate 0.0105   Epoch: 13   Global Step: 560200   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:43,528-Speed 2623.73 samples/sec   Loss 4.3485   LearningRate 0.0105   Epoch: 13   Global Step: 560210   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:47,431-Speed 2624.09 samples/sec   Loss 4.3937   LearningRate 0.0105   Epoch: 13   Global Step: 560220   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:41:51,327-Speed 2629.62 samples/sec   Loss 4.4509   LearningRate 0.0105   Epoch: 13   Global Step: 560230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:41:55,232-Speed 2623.45 samples/sec   Loss 4.4452   LearningRate 0.0105   Epoch: 13   Global Step: 560240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:41:59,132-Speed 2626.19 samples/sec   Loss 4.3959   LearningRate 0.0105   Epoch: 13   Global Step: 560250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:03,100-Speed 2581.15 samples/sec   Loss 4.3951   LearningRate 0.0105   Epoch: 13   Global Step: 560260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:06,998-Speed 2628.35 samples/sec   Loss 4.3548   LearningRate 0.0105   Epoch: 13   Global Step: 560270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:10,944-Speed 2595.67 samples/sec   Loss 4.3933   LearningRate 0.0105   Epoch: 13   Global Step: 560280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:14,846-Speed 2624.95 samples/sec   Loss 4.3959   LearningRate 0.0105   Epoch: 13   Global Step: 560290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:18,746-Speed 2625.76 samples/sec   Loss 4.3506   LearningRate 0.0105   Epoch: 13   Global Step: 560300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:22,646-Speed 2626.84 samples/sec   Loss 4.3371   LearningRate 0.0105   Epoch: 13   Global Step: 560310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:26,540-Speed 2630.57 samples/sec   Loss 4.4803   LearningRate 0.0105   Epoch: 13   Global Step: 560320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:30,417-Speed 2641.49 samples/sec   Loss 4.3862   LearningRate 0.0105   Epoch: 13   Global Step: 560330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:34,314-Speed 2628.99 samples/sec   Loss 4.3273   LearningRate 0.0105   Epoch: 13   Global Step: 560340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:38,208-Speed 2630.42 samples/sec   Loss 4.3161   LearningRate 0.0105   Epoch: 13   Global Step: 560350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:42,100-Speed 2631.34 samples/sec   Loss 4.3211   LearningRate 0.0105   Epoch: 13   Global Step: 560360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:45,994-Speed 2630.83 samples/sec   Loss 4.2553   LearningRate 0.0105   Epoch: 13   Global Step: 560370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:49,907-Speed 2624.76 samples/sec   Loss 4.4052   LearningRate 0.0105   Epoch: 13   Global Step: 560380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:53,802-Speed 2629.86 samples/sec   Loss 4.3659   LearningRate 0.0105   Epoch: 13   Global Step: 560390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:42:57,695-Speed 2630.92 samples/sec   Loss 4.3169   LearningRate 0.0105   Epoch: 13   Global Step: 560400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:01,591-Speed 2629.00 samples/sec   Loss 4.2626   LearningRate 0.0105   Epoch: 13   Global Step: 560410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:05,485-Speed 2630.32 samples/sec   Loss 4.4193   LearningRate 0.0105   Epoch: 13   Global Step: 560420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:09,360-Speed 2643.01 samples/sec   Loss 4.3904   LearningRate 0.0105   Epoch: 13   Global Step: 560430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:13,263-Speed 2624.13 samples/sec   Loss 4.4562   LearningRate 0.0105   Epoch: 13   Global Step: 560440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:17,156-Speed 2632.10 samples/sec   Loss 4.4285   LearningRate 0.0105   Epoch: 13   Global Step: 560450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:21,066-Speed 2619.37 samples/sec   Loss 4.4137   LearningRate 0.0105   Epoch: 13   Global Step: 560460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:24,963-Speed 2628.20 samples/sec   Loss 4.4314   LearningRate 0.0105   Epoch: 13   Global Step: 560470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:28,853-Speed 2633.04 samples/sec   Loss 4.3364   LearningRate 0.0105   Epoch: 13   Global Step: 560480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:32,746-Speed 2631.40 samples/sec   Loss 4.4704   LearningRate 0.0105   Epoch: 13   Global Step: 560490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:36,639-Speed 2630.90 samples/sec   Loss 4.3429   LearningRate 0.0105   Epoch: 13   Global Step: 560500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:40,529-Speed 2632.91 samples/sec   Loss 4.3908   LearningRate 0.0105   Epoch: 13   Global Step: 560510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:44,422-Speed 2631.49 samples/sec   Loss 4.4165   LearningRate 0.0105   Epoch: 13   Global Step: 560520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:48,328-Speed 2622.44 samples/sec   Loss 4.2859   LearningRate 0.0105   Epoch: 13   Global Step: 560530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:43:52,218-Speed 2633.22 samples/sec   Loss 4.4660   LearningRate 0.0105   Epoch: 13   Global Step: 560540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:43:56,088-Speed 2646.78 samples/sec   Loss 4.3852   LearningRate 0.0105   Epoch: 13   Global Step: 560550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:43:59,986-Speed 2627.54 samples/sec   Loss 4.3001   LearningRate 0.0105   Epoch: 13   Global Step: 560560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:03,885-Speed 2626.84 samples/sec   Loss 4.3226   LearningRate 0.0105   Epoch: 13   Global Step: 560570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:07,785-Speed 2626.90 samples/sec   Loss 4.4806   LearningRate 0.0105   Epoch: 13   Global Step: 560580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:11,683-Speed 2627.45 samples/sec   Loss 4.4506   LearningRate 0.0105   Epoch: 13   Global Step: 560590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:15,581-Speed 2627.81 samples/sec   Loss 4.3158   LearningRate 0.0105   Epoch: 13   Global Step: 560600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:19,479-Speed 2627.45 samples/sec   Loss 4.3699   LearningRate 0.0105   Epoch: 13   Global Step: 560610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:23,374-Speed 2630.10 samples/sec   Loss 4.4010   LearningRate 0.0105   Epoch: 13   Global Step: 560620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:27,267-Speed 2630.89 samples/sec   Loss 4.3548   LearningRate 0.0105   Epoch: 13   Global Step: 560630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:31,185-Speed 2613.98 samples/sec   Loss 4.3395   LearningRate 0.0105   Epoch: 13   Global Step: 560640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:35,118-Speed 2604.11 samples/sec   Loss 4.3072   LearningRate 0.0105   Epoch: 13   Global Step: 560650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:44:39,022-Speed 2623.97 samples/sec   Loss 4.3875   LearningRate 0.0105   Epoch: 13   Global Step: 560660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:44:42,919-Speed 2628.41 samples/sec   Loss 4.4301   LearningRate 0.0105   Epoch: 13   Global Step: 560670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:44:46,810-Speed 2632.63 samples/sec   Loss 4.3525   LearningRate 0.0105   Epoch: 13   Global Step: 560680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:44:50,702-Speed 2631.19 samples/sec   Loss 4.3411   LearningRate 0.0105   Epoch: 13   Global Step: 560690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:54,597-Speed 2630.30 samples/sec   Loss 4.3589   LearningRate 0.0105   Epoch: 13   Global Step: 560700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:44:58,498-Speed 2625.65 samples/sec   Loss 4.2847   LearningRate 0.0105   Epoch: 13   Global Step: 560710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:02,405-Speed 2621.88 samples/sec   Loss 4.3556   LearningRate 0.0105   Epoch: 13   Global Step: 560720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:06,297-Speed 2631.24 samples/sec   Loss 4.3257   LearningRate 0.0105   Epoch: 13   Global Step: 560730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:10,238-Speed 2599.45 samples/sec   Loss 4.3473   LearningRate 0.0105   Epoch: 13   Global Step: 560740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:14,136-Speed 2627.75 samples/sec   Loss 4.2734   LearningRate 0.0105   Epoch: 13   Global Step: 560750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:18,039-Speed 2624.47 samples/sec   Loss 4.3420   LearningRate 0.0105   Epoch: 13   Global Step: 560760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:21,935-Speed 2628.95 samples/sec   Loss 4.3877   LearningRate 0.0105   Epoch: 13   Global Step: 560770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:25,830-Speed 2629.94 samples/sec   Loss 4.2796   LearningRate 0.0105   Epoch: 13   Global Step: 560780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:29,724-Speed 2631.04 samples/sec   Loss 4.4366   LearningRate 0.0105   Epoch: 13   Global Step: 560790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:45:33,614-Speed 2632.93 samples/sec   Loss 4.4230   LearningRate 0.0105   Epoch: 13   Global Step: 560800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:37,511-Speed 2627.61 samples/sec   Loss 4.3713   LearningRate 0.0105   Epoch: 13   Global Step: 560810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:41,439-Speed 2607.41 samples/sec   Loss 4.3173   LearningRate 0.0105   Epoch: 13   Global Step: 560820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:45,338-Speed 2627.10 samples/sec   Loss 4.4405   LearningRate 0.0105   Epoch: 13   Global Step: 560830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:49,232-Speed 2631.03 samples/sec   Loss 4.4179   LearningRate 0.0105   Epoch: 13   Global Step: 560840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:53,127-Speed 2629.80 samples/sec   Loss 4.3992   LearningRate 0.0105   Epoch: 13   Global Step: 560850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:45:57,021-Speed 2630.57 samples/sec   Loss 4.5392   LearningRate 0.0105   Epoch: 13   Global Step: 560860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:00,920-Speed 2626.54 samples/sec   Loss 4.3344   LearningRate 0.0105   Epoch: 13   Global Step: 560870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:04,842-Speed 2611.81 samples/sec   Loss 4.3948   LearningRate 0.0105   Epoch: 13   Global Step: 560880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:08,739-Speed 2627.88 samples/sec   Loss 4.2995   LearningRate 0.0105   Epoch: 13   Global Step: 560890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:12,676-Speed 2602.32 samples/sec   Loss 4.3387   LearningRate 0.0105   Epoch: 13   Global Step: 560900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:46:16,551-Speed 2643.13 samples/sec   Loss 4.3692   LearningRate 0.0105   Epoch: 13   Global Step: 560910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:20,472-Speed 2612.55 samples/sec   Loss 4.4178   LearningRate 0.0105   Epoch: 13   Global Step: 560920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:24,370-Speed 2627.38 samples/sec   Loss 4.3875   LearningRate 0.0105   Epoch: 13   Global Step: 560930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:28,304-Speed 2604.17 samples/sec   Loss 4.3282   LearningRate 0.0105   Epoch: 13   Global Step: 560940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:32,221-Speed 2614.41 samples/sec   Loss 4.3170   LearningRate 0.0105   Epoch: 13   Global Step: 560950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:36,126-Speed 2622.56 samples/sec   Loss 4.3933   LearningRate 0.0105   Epoch: 13   Global Step: 560960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:40,030-Speed 2623.70 samples/sec   Loss 4.3792   LearningRate 0.0105   Epoch: 13   Global Step: 560970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:43,985-Speed 2591.66 samples/sec   Loss 4.3849   LearningRate 0.0105   Epoch: 13   Global Step: 560980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:47,877-Speed 2631.54 samples/sec   Loss 4.3881   LearningRate 0.0105   Epoch: 13   Global Step: 560990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:51,769-Speed 2631.60 samples/sec   Loss 4.4298   LearningRate 0.0105   Epoch: 13   Global Step: 561000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:46:55,662-Speed 2630.68 samples/sec   Loss 4.4045   LearningRate 0.0105   Epoch: 13   Global Step: 561010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:46:59,557-Speed 2630.37 samples/sec   Loss 4.4039   LearningRate 0.0105   Epoch: 13   Global Step: 561020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:47:03,455-Speed 2627.90 samples/sec   Loss 4.3222   LearningRate 0.0105   Epoch: 13   Global Step: 561030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:47:07,349-Speed 2630.31 samples/sec   Loss 4.4076   LearningRate 0.0105   Epoch: 13   Global Step: 561040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:47:11,249-Speed 2626.21 samples/sec   Loss 4.4618   LearningRate 0.0105   Epoch: 13   Global Step: 561050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:47:15,121-Speed 2645.53 samples/sec   Loss 4.3596   LearningRate 0.0105   Epoch: 13   Global Step: 561060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:19,011-Speed 2633.17 samples/sec   Loss 4.4240   LearningRate 0.0105   Epoch: 13   Global Step: 561070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:22,906-Speed 2629.67 samples/sec   Loss 4.3579   LearningRate 0.0105   Epoch: 13   Global Step: 561080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:26,862-Speed 2589.62 samples/sec   Loss 4.4185   LearningRate 0.0105   Epoch: 13   Global Step: 561090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:30,760-Speed 2627.44 samples/sec   Loss 4.3858   LearningRate 0.0105   Epoch: 13   Global Step: 561100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:34,657-Speed 2627.97 samples/sec   Loss 4.3817   LearningRate 0.0105   Epoch: 13   Global Step: 561110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:38,552-Speed 2629.79 samples/sec   Loss 4.4888   LearningRate 0.0105   Epoch: 13   Global Step: 561120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:42,449-Speed 2628.92 samples/sec   Loss 4.2800   LearningRate 0.0105   Epoch: 13   Global Step: 561130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:46,357-Speed 2620.53 samples/sec   Loss 4.4031   LearningRate 0.0105   Epoch: 13   Global Step: 561140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:50,301-Speed 2597.20 samples/sec   Loss 4.4475   LearningRate 0.0105   Epoch: 13   Global Step: 561150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:47:54,200-Speed 2627.22 samples/sec   Loss 4.3725   LearningRate 0.0105   Epoch: 13   Global Step: 561160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:47:58,069-Speed 2647.76 samples/sec   Loss 4.3750   LearningRate 0.0105   Epoch: 13   Global Step: 561170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:01,964-Speed 2629.32 samples/sec   Loss 4.3654   LearningRate 0.0105   Epoch: 13   Global Step: 561180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:05,869-Speed 2622.78 samples/sec   Loss 4.3647   LearningRate 0.0105   Epoch: 13   Global Step: 561190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:09,766-Speed 2628.65 samples/sec   Loss 4.4040   LearningRate 0.0105   Epoch: 13   Global Step: 561200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:13,664-Speed 2627.45 samples/sec   Loss 4.3656   LearningRate 0.0105   Epoch: 13   Global Step: 561210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:17,560-Speed 2629.14 samples/sec   Loss 4.2732   LearningRate 0.0105   Epoch: 13   Global Step: 561220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:21,455-Speed 2629.46 samples/sec   Loss 4.2855   LearningRate 0.0105   Epoch: 13   Global Step: 561230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:25,359-Speed 2623.77 samples/sec   Loss 4.3538   LearningRate 0.0105   Epoch: 13   Global Step: 561240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:29,254-Speed 2630.28 samples/sec   Loss 4.4314   LearningRate 0.0105   Epoch: 13   Global Step: 561250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:33,157-Speed 2624.34 samples/sec   Loss 4.3842   LearningRate 0.0105   Epoch: 13   Global Step: 561260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:37,032-Speed 2643.04 samples/sec   Loss 4.4075   LearningRate 0.0105   Epoch: 13   Global Step: 561270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:40,932-Speed 2626.03 samples/sec   Loss 4.4334   LearningRate 0.0105   Epoch: 13   Global Step: 561280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:44,840-Speed 2621.00 samples/sec   Loss 4.3375   LearningRate 0.0105   Epoch: 13   Global Step: 561290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:48,727-Speed 2634.76 samples/sec   Loss 4.3273   LearningRate 0.0105   Epoch: 13   Global Step: 561300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:52,627-Speed 2626.91 samples/sec   Loss 4.3338   LearningRate 0.0105   Epoch: 13   Global Step: 561310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:48:56,524-Speed 2628.60 samples/sec   Loss 4.4859   LearningRate 0.0105   Epoch: 13   Global Step: 561320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:00,446-Speed 2611.03 samples/sec   Loss 4.4077   LearningRate 0.0105   Epoch: 13   Global Step: 561330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:04,340-Speed 2630.85 samples/sec   Loss 4.4138   LearningRate 0.0105   Epoch: 13   Global Step: 561340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:08,231-Speed 2632.44 samples/sec   Loss 4.3571   LearningRate 0.0105   Epoch: 13   Global Step: 561350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:12,122-Speed 2632.42 samples/sec   Loss 4.3720   LearningRate 0.0105   Epoch: 13   Global Step: 561360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:16,031-Speed 2620.89 samples/sec   Loss 4.2795   LearningRate 0.0105   Epoch: 13   Global Step: 561370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:49:19,920-Speed 2633.49 samples/sec   Loss 4.4187   LearningRate 0.0105   Epoch: 13   Global Step: 561380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:49:23,815-Speed 2630.11 samples/sec   Loss 4.3010   LearningRate 0.0105   Epoch: 13   Global Step: 561390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:49:27,686-Speed 2645.91 samples/sec   Loss 4.3362   LearningRate 0.0105   Epoch: 13   Global Step: 561400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:31,580-Speed 2630.31 samples/sec   Loss 4.3052   LearningRate 0.0104   Epoch: 13   Global Step: 561410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:35,491-Speed 2618.58 samples/sec   Loss 4.3848   LearningRate 0.0104   Epoch: 13   Global Step: 561420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:39,390-Speed 2627.70 samples/sec   Loss 4.3774   LearningRate 0.0104   Epoch: 13   Global Step: 561430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:43,295-Speed 2622.73 samples/sec   Loss 4.3665   LearningRate 0.0104   Epoch: 13   Global Step: 561440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:47,197-Speed 2625.16 samples/sec   Loss 4.4085   LearningRate 0.0104   Epoch: 13   Global Step: 561450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:51,108-Speed 2618.60 samples/sec   Loss 4.4499   LearningRate 0.0104   Epoch: 13   Global Step: 561460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:55,004-Speed 2629.67 samples/sec   Loss 4.3319   LearningRate 0.0104   Epoch: 13   Global Step: 561470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:49:58,898-Speed 2630.09 samples/sec   Loss 4.3688   LearningRate 0.0104   Epoch: 13   Global Step: 561480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:02,798-Speed 2626.20 samples/sec   Loss 4.4095   LearningRate 0.0104   Epoch: 13   Global Step: 561490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:06,702-Speed 2623.48 samples/sec   Loss 4.3062   LearningRate 0.0104   Epoch: 13   Global Step: 561500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:50:10,693-Speed 2566.31 samples/sec   Loss 4.4035   LearningRate 0.0104   Epoch: 13   Global Step: 561510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:50:14,571-Speed 2641.24 samples/sec   Loss 4.2677   LearningRate 0.0104   Epoch: 13   Global Step: 561520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:18,493-Speed 2611.27 samples/sec   Loss 4.4483   LearningRate 0.0104   Epoch: 13   Global Step: 561530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:22,422-Speed 2607.06 samples/sec   Loss 4.4049   LearningRate 0.0104   Epoch: 13   Global Step: 561540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:26,319-Speed 2628.35 samples/sec   Loss 4.3740   LearningRate 0.0104   Epoch: 13   Global Step: 561550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:30,217-Speed 2627.68 samples/sec   Loss 4.2795   LearningRate 0.0104   Epoch: 13   Global Step: 561560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:34,116-Speed 2626.99 samples/sec   Loss 4.3139   LearningRate 0.0104   Epoch: 13   Global Step: 561570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:38,017-Speed 2625.59 samples/sec   Loss 4.3433   LearningRate 0.0104   Epoch: 13   Global Step: 561580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:41,920-Speed 2624.03 samples/sec   Loss 4.3304   LearningRate 0.0104   Epoch: 13   Global Step: 561590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:45,822-Speed 2625.69 samples/sec   Loss 4.4277   LearningRate 0.0104   Epoch: 13   Global Step: 561600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:49,724-Speed 2624.60 samples/sec   Loss 4.3579   LearningRate 0.0104   Epoch: 13   Global Step: 561610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:50:53,625-Speed 2625.80 samples/sec   Loss 4.4302   LearningRate 0.0104   Epoch: 13   Global Step: 561620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:50:57,498-Speed 2644.20 samples/sec   Loss 4.2696   LearningRate 0.0104   Epoch: 13   Global Step: 561630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:01,400-Speed 2625.25 samples/sec   Loss 4.3170   LearningRate 0.0104   Epoch: 13   Global Step: 561640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:05,311-Speed 2618.91 samples/sec   Loss 4.3619   LearningRate 0.0104   Epoch: 13   Global Step: 561650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:09,216-Speed 2622.92 samples/sec   Loss 4.4409   LearningRate 0.0104   Epoch: 13   Global Step: 561660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:13,135-Speed 2613.10 samples/sec   Loss 4.3389   LearningRate 0.0104   Epoch: 13   Global Step: 561670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:17,075-Speed 2599.48 samples/sec   Loss 4.3834   LearningRate 0.0104   Epoch: 13   Global Step: 561680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:20,993-Speed 2614.51 samples/sec   Loss 4.2471   LearningRate 0.0104   Epoch: 13   Global Step: 561690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:24,896-Speed 2624.38 samples/sec   Loss 4.3419   LearningRate 0.0104   Epoch: 13   Global Step: 561700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:28,802-Speed 2622.21 samples/sec   Loss 4.2913   LearningRate 0.0104   Epoch: 13   Global Step: 561710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:32,724-Speed 2611.41 samples/sec   Loss 4.3128   LearningRate 0.0104   Epoch: 13   Global Step: 561720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:36,625-Speed 2625.43 samples/sec   Loss 4.3539   LearningRate 0.0104   Epoch: 13   Global Step: 561730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:51:40,526-Speed 2625.81 samples/sec   Loss 4.4403   LearningRate 0.0104   Epoch: 13   Global Step: 561740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:51:44,414-Speed 2634.58 samples/sec   Loss 4.3002   LearningRate 0.0104   Epoch: 13   Global Step: 561750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:48,317-Speed 2623.68 samples/sec   Loss 4.2568   LearningRate 0.0104   Epoch: 13   Global Step: 561760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:52,210-Speed 2631.19 samples/sec   Loss 4.3393   LearningRate 0.0104   Epoch: 13   Global Step: 561770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:51:56,111-Speed 2625.70 samples/sec   Loss 4.3423   LearningRate 0.0104   Epoch: 13   Global Step: 561780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:52:00,022-Speed 2618.90 samples/sec   Loss 4.4183   LearningRate 0.0104   Epoch: 13   Global Step: 561790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:52:03,906-Speed 2636.99 samples/sec   Loss 4.3397   LearningRate 0.0104   Epoch: 13   Global Step: 561800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:07,902-Speed 2563.18 samples/sec   Loss 4.3113   LearningRate 0.0104   Epoch: 13   Global Step: 561810   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:11,799-Speed 2628.21 samples/sec   Loss 4.3474   LearningRate 0.0104   Epoch: 13   Global Step: 561820   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:15,697-Speed 2627.32 samples/sec   Loss 4.3339   LearningRate 0.0104   Epoch: 13   Global Step: 561830   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:19,595-Speed 2628.32 samples/sec   Loss 4.3943   LearningRate 0.0104   Epoch: 13   Global Step: 561840   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:23,513-Speed 2614.22 samples/sec   Loss 4.4338   LearningRate 0.0104   Epoch: 13   Global Step: 561850   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:27,414-Speed 2625.89 samples/sec   Loss 4.4172   LearningRate 0.0104   Epoch: 13   Global Step: 561860   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:31,311-Speed 2628.44 samples/sec   Loss 4.4977   LearningRate 0.0104   Epoch: 13   Global Step: 561870   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:35,208-Speed 2628.31 samples/sec   Loss 4.3441   LearningRate 0.0104   Epoch: 13   Global Step: 561880   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:39,118-Speed 2619.46 samples/sec   Loss 4.3612   LearningRate 0.0104   Epoch: 13   Global Step: 561890   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:43,019-Speed 2625.65 samples/sec   Loss 4.3800   LearningRate 0.0104   Epoch: 13   Global Step: 561900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:52:46,926-Speed 2621.11 samples/sec   Loss 4.3336   LearningRate 0.0104   Epoch: 13   Global Step: 561910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:52:50,799-Speed 2645.17 samples/sec   Loss 4.3515   LearningRate 0.0104   Epoch: 13   Global Step: 561920   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:54,707-Speed 2620.91 samples/sec   Loss 4.3201   LearningRate 0.0104   Epoch: 13   Global Step: 561930   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:52:58,602-Speed 2629.55 samples/sec   Loss 4.3262   LearningRate 0.0104   Epoch: 13   Global Step: 561940   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:02,505-Speed 2624.09 samples/sec   Loss 4.3622   LearningRate 0.0104   Epoch: 13   Global Step: 561950   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:06,412-Speed 2621.39 samples/sec   Loss 4.3315   LearningRate 0.0104   Epoch: 13   Global Step: 561960   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:10,311-Speed 2627.00 samples/sec   Loss 4.3735   LearningRate 0.0104   Epoch: 13   Global Step: 561970   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:14,204-Speed 2631.35 samples/sec   Loss 4.4785   LearningRate 0.0104   Epoch: 13   Global Step: 561980   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:18,107-Speed 2624.03 samples/sec   Loss 4.4696   LearningRate 0.0104   Epoch: 13   Global Step: 561990   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:22,010-Speed 2624.30 samples/sec   Loss 4.3638   LearningRate 0.0104   Epoch: 13   Global Step: 562000   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:25,902-Speed 2631.82 samples/sec   Loss 4.2929   LearningRate 0.0104   Epoch: 13   Global Step: 562010   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:53:29,802-Speed 2626.39 samples/sec   Loss 4.4084   LearningRate 0.0104   Epoch: 13   Global Step: 562020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:53:33,692-Speed 2633.43 samples/sec   Loss 4.3744   LearningRate 0.0104   Epoch: 13   Global Step: 562030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:53:37,599-Speed 2621.48 samples/sec   Loss 4.4928   LearningRate 0.0104   Epoch: 13   Global Step: 562040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:53:41,498-Speed 2626.61 samples/sec   Loss 4.3653   LearningRate 0.0104   Epoch: 13   Global Step: 562050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:53:45,397-Speed 2626.96 samples/sec   Loss 4.3860   LearningRate 0.0104   Epoch: 13   Global Step: 562060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:53:49,290-Speed 2631.39 samples/sec   Loss 4.3627   LearningRate 0.0104   Epoch: 13   Global Step: 562070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:53:53,186-Speed 2629.37 samples/sec   Loss 4.3842   LearningRate 0.0104   Epoch: 13   Global Step: 562080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:53:57,081-Speed 2630.01 samples/sec   Loss 4.3805   LearningRate 0.0104   Epoch: 13   Global Step: 562090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:00,974-Speed 2631.25 samples/sec   Loss 4.3088   LearningRate 0.0104   Epoch: 13   Global Step: 562100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:04,921-Speed 2594.78 samples/sec   Loss 4.3255   LearningRate 0.0104   Epoch: 13   Global Step: 562110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:08,823-Speed 2625.04 samples/sec   Loss 4.4057   LearningRate 0.0104   Epoch: 13   Global Step: 562120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:54:12,797-Speed 2577.29 samples/sec   Loss 4.3903   LearningRate 0.0104   Epoch: 13   Global Step: 562130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:54:16,692-Speed 2629.39 samples/sec   Loss 4.3259   LearningRate 0.0104   Epoch: 13   Global Step: 562140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:20,587-Speed 2629.82 samples/sec   Loss 4.2305   LearningRate 0.0104   Epoch: 13   Global Step: 562150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:24,484-Speed 2629.04 samples/sec   Loss 4.3273   LearningRate 0.0104   Epoch: 13   Global Step: 562160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:28,384-Speed 2626.29 samples/sec   Loss 4.3878   LearningRate 0.0104   Epoch: 13   Global Step: 562170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:32,303-Speed 2613.33 samples/sec   Loss 4.3403   LearningRate 0.0104   Epoch: 13   Global Step: 562180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:36,195-Speed 2631.79 samples/sec   Loss 4.4004   LearningRate 0.0104   Epoch: 13   Global Step: 562190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:40,095-Speed 2626.75 samples/sec   Loss 4.3695   LearningRate 0.0104   Epoch: 13   Global Step: 562200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:43,990-Speed 2629.67 samples/sec   Loss 4.3653   LearningRate 0.0104   Epoch: 13   Global Step: 562210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:47,882-Speed 2631.73 samples/sec   Loss 4.2573   LearningRate 0.0104   Epoch: 13   Global Step: 562220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:51,775-Speed 2630.58 samples/sec   Loss 4.2541   LearningRate 0.0104   Epoch: 13   Global Step: 562230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:54:55,677-Speed 2625.52 samples/sec   Loss 4.3768   LearningRate 0.0104   Epoch: 13   Global Step: 562240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:54:59,553-Speed 2642.54 samples/sec   Loss 4.3923   LearningRate 0.0104   Epoch: 13   Global Step: 562250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:03,444-Speed 2632.55 samples/sec   Loss 4.3934   LearningRate 0.0104   Epoch: 13   Global Step: 562260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:07,332-Speed 2633.91 samples/sec   Loss 4.3653   LearningRate 0.0104   Epoch: 13   Global Step: 562270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:11,227-Speed 2630.27 samples/sec   Loss 4.3786   LearningRate 0.0104   Epoch: 13   Global Step: 562280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:15,121-Speed 2630.18 samples/sec   Loss 4.3606   LearningRate 0.0104   Epoch: 13   Global Step: 562290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:19,018-Speed 2628.19 samples/sec   Loss 4.4508   LearningRate 0.0104   Epoch: 13   Global Step: 562300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:22,913-Speed 2630.16 samples/sec   Loss 4.3880   LearningRate 0.0104   Epoch: 13   Global Step: 562310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:26,810-Speed 2628.01 samples/sec   Loss 4.2532   LearningRate 0.0104   Epoch: 13   Global Step: 562320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:30,705-Speed 2629.87 samples/sec   Loss 4.3576   LearningRate 0.0104   Epoch: 13   Global Step: 562330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:34,600-Speed 2629.34 samples/sec   Loss 4.4209   LearningRate 0.0104   Epoch: 13   Global Step: 562340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:38,473-Speed 2645.08 samples/sec   Loss 4.2914   LearningRate 0.0104   Epoch: 13   Global Step: 562350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:42,366-Speed 2631.06 samples/sec   Loss 4.3409   LearningRate 0.0104   Epoch: 13   Global Step: 562360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:46,258-Speed 2632.20 samples/sec   Loss 4.3176   LearningRate 0.0104   Epoch: 13   Global Step: 562370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:50,152-Speed 2630.08 samples/sec   Loss 4.2722   LearningRate 0.0104   Epoch: 13   Global Step: 562380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:54,047-Speed 2630.03 samples/sec   Loss 4.3019   LearningRate 0.0104   Epoch: 13   Global Step: 562390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:55:57,946-Speed 2627.43 samples/sec   Loss 4.3135   LearningRate 0.0104   Epoch: 13   Global Step: 562400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:01,853-Speed 2620.80 samples/sec   Loss 4.3726   LearningRate 0.0104   Epoch: 13   Global Step: 562410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:05,749-Speed 2629.50 samples/sec   Loss 4.3014   LearningRate 0.0104   Epoch: 13   Global Step: 562420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:09,639-Speed 2634.10 samples/sec   Loss 4.2727   LearningRate 0.0104   Epoch: 13   Global Step: 562430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:13,536-Speed 2628.68 samples/sec   Loss 4.3533   LearningRate 0.0104   Epoch: 13   Global Step: 562440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:17,430-Speed 2630.21 samples/sec   Loss 4.2960   LearningRate 0.0104   Epoch: 13   Global Step: 562450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:56:21,337-Speed 2621.10 samples/sec   Loss 4.4761   LearningRate 0.0104   Epoch: 13   Global Step: 562460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:56:25,236-Speed 2628.16 samples/sec   Loss 4.4329   LearningRate 0.0104   Epoch: 13   Global Step: 562470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:56:29,131-Speed 2629.33 samples/sec   Loss 4.4047   LearningRate 0.0104   Epoch: 13   Global Step: 562480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:56:33,024-Speed 2630.97 samples/sec   Loss 4.2638   LearningRate 0.0104   Epoch: 13   Global Step: 562490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:56:36,900-Speed 2642.00 samples/sec   Loss 4.3779   LearningRate 0.0104   Epoch: 13   Global Step: 562500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:40,794-Speed 2631.00 samples/sec   Loss 4.3373   LearningRate 0.0104   Epoch: 13   Global Step: 562510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:44,692-Speed 2627.52 samples/sec   Loss 4.3418   LearningRate 0.0104   Epoch: 13   Global Step: 562520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:48,600-Speed 2620.96 samples/sec   Loss 4.3049   LearningRate 0.0104   Epoch: 13   Global Step: 562530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:52,496-Speed 2628.89 samples/sec   Loss 4.2307   LearningRate 0.0104   Epoch: 13   Global Step: 562540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:56:56,415-Speed 2614.57 samples/sec   Loss 4.3289   LearningRate 0.0104   Epoch: 13   Global Step: 562550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:57:00,300-Speed 2635.77 samples/sec   Loss 4.3720   LearningRate 0.0104   Epoch: 13   Global Step: 562560   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:04,204-Speed 2623.67 samples/sec   Loss 4.2796   LearningRate 0.0104   Epoch: 13   Global Step: 562570   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:08,097-Speed 2631.25 samples/sec   Loss 4.3312   LearningRate 0.0104   Epoch: 13   Global Step: 562580   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:12,003-Speed 2622.04 samples/sec   Loss 4.3226   LearningRate 0.0104   Epoch: 13   Global Step: 562590   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:15,908-Speed 2623.15 samples/sec   Loss 4.3542   LearningRate 0.0104   Epoch: 13   Global Step: 562600   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:19,838-Speed 2606.39 samples/sec   Loss 4.3489   LearningRate 0.0104   Epoch: 13   Global Step: 562610   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:23,734-Speed 2629.35 samples/sec   Loss 4.3283   LearningRate 0.0104   Epoch: 13   Global Step: 562620   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:27,631-Speed 2628.74 samples/sec   Loss 4.3500   LearningRate 0.0104   Epoch: 13   Global Step: 562630   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:31,545-Speed 2616.48 samples/sec   Loss 4.2853   LearningRate 0.0104   Epoch: 13   Global Step: 562640   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:35,464-Speed 2613.34 samples/sec   Loss 4.3635   LearningRate 0.0104   Epoch: 13   Global Step: 562650   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 10:57:39,364-Speed 2626.20 samples/sec   Loss 4.3566   LearningRate 0.0104   Epoch: 13   Global Step: 562660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:57:43,259-Speed 2630.45 samples/sec   Loss 4.1869   LearningRate 0.0104   Epoch: 13   Global Step: 562670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:57:47,155-Speed 2628.27 samples/sec   Loss 4.3978   LearningRate 0.0104   Epoch: 13   Global Step: 562680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:57:51,072-Speed 2615.53 samples/sec   Loss 4.2817   LearningRate 0.0104   Epoch: 13   Global Step: 562690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:57:54,971-Speed 2627.05 samples/sec   Loss 4.3410   LearningRate 0.0103   Epoch: 13   Global Step: 562700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:57:58,879-Speed 2621.60 samples/sec   Loss 4.2615   LearningRate 0.0103   Epoch: 13   Global Step: 562710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:02,777-Speed 2626.92 samples/sec   Loss 4.3634   LearningRate 0.0103   Epoch: 13   Global Step: 562720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:06,694-Speed 2615.20 samples/sec   Loss 4.3864   LearningRate 0.0103   Epoch: 13   Global Step: 562730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:10,596-Speed 2624.57 samples/sec   Loss 4.3190   LearningRate 0.0103   Epoch: 13   Global Step: 562740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:14,605-Speed 2555.38 samples/sec   Loss 4.3650   LearningRate 0.0103   Epoch: 13   Global Step: 562750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:18,498-Speed 2630.94 samples/sec   Loss 4.3378   LearningRate 0.0103   Epoch: 13   Global Step: 562760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:58:22,373-Speed 2644.01 samples/sec   Loss 4.2844   LearningRate 0.0103   Epoch: 13   Global Step: 562770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:26,282-Speed 2619.97 samples/sec   Loss 4.1813   LearningRate 0.0103   Epoch: 13   Global Step: 562780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:30,178-Speed 2629.65 samples/sec   Loss 4.3122   LearningRate 0.0103   Epoch: 13   Global Step: 562790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:34,080-Speed 2624.49 samples/sec   Loss 4.3236   LearningRate 0.0103   Epoch: 13   Global Step: 562800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:37,980-Speed 2626.11 samples/sec   Loss 4.4464   LearningRate 0.0103   Epoch: 13   Global Step: 562810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:41,879-Speed 2626.90 samples/sec   Loss 4.3236   LearningRate 0.0103   Epoch: 13   Global Step: 562820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:45,774-Speed 2629.61 samples/sec   Loss 4.3447   LearningRate 0.0103   Epoch: 13   Global Step: 562830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:49,669-Speed 2629.41 samples/sec   Loss 4.4712   LearningRate 0.0103   Epoch: 13   Global Step: 562840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:53,577-Speed 2620.92 samples/sec   Loss 4.3342   LearningRate 0.0103   Epoch: 13   Global Step: 562850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:58:57,471-Speed 2630.71 samples/sec   Loss 4.2928   LearningRate 0.0103   Epoch: 13   Global Step: 562860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:01,362-Speed 2632.78 samples/sec   Loss 4.2944   LearningRate 0.0103   Epoch: 13   Global Step: 562870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:05,266-Speed 2623.60 samples/sec   Loss 4.3622   LearningRate 0.0103   Epoch: 13   Global Step: 562880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:09,158-Speed 2631.38 samples/sec   Loss 4.3602   LearningRate 0.0103   Epoch: 13   Global Step: 562890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:13,058-Speed 2626.13 samples/sec   Loss 4.2780   LearningRate 0.0103   Epoch: 13   Global Step: 562900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:16,949-Speed 2632.20 samples/sec   Loss 4.3810   LearningRate 0.0103   Epoch: 13   Global Step: 562910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:20,842-Speed 2631.12 samples/sec   Loss 4.3555   LearningRate 0.0103   Epoch: 13   Global Step: 562920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:24,735-Speed 2630.83 samples/sec   Loss 4.3441   LearningRate 0.0103   Epoch: 13   Global Step: 562930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:28,632-Speed 2628.22 samples/sec   Loss 4.3494   LearningRate 0.0103   Epoch: 13   Global Step: 562940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:32,526-Speed 2630.06 samples/sec   Loss 4.3740   LearningRate 0.0103   Epoch: 13   Global Step: 562950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:36,420-Speed 2630.39 samples/sec   Loss 4.2470   LearningRate 0.0103   Epoch: 13   Global Step: 562960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:40,312-Speed 2631.87 samples/sec   Loss 4.4493   LearningRate 0.0103   Epoch: 13   Global Step: 562970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:59:44,211-Speed 2627.30 samples/sec   Loss 4.4026   LearningRate 0.0103   Epoch: 13   Global Step: 562980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:59:48,114-Speed 2624.24 samples/sec   Loss 4.2949   LearningRate 0.0103   Epoch: 13   Global Step: 562990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 10:59:51,986-Speed 2645.30 samples/sec   Loss 4.3275   LearningRate 0.0103   Epoch: 13   Global Step: 563000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:55,879-Speed 2630.70 samples/sec   Loss 4.3175   LearningRate 0.0103   Epoch: 13   Global Step: 563010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 10:59:59,772-Speed 2631.15 samples/sec   Loss 4.3446   LearningRate 0.0103   Epoch: 13   Global Step: 563020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:00:03,664-Speed 2632.14 samples/sec   Loss 4.3478   LearningRate 0.0103   Epoch: 13   Global Step: 563030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:00:07,566-Speed 2624.99 samples/sec   Loss 4.3831   LearningRate 0.0103   Epoch: 13   Global Step: 563040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:00:11,497-Speed 2604.91 samples/sec   Loss 4.3428   LearningRate 0.0103   Epoch: 13   Global Step: 563050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:00:15,391-Speed 2630.87 samples/sec   Loss 4.3415   LearningRate 0.0103   Epoch: 13   Global Step: 563060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:00:19,290-Speed 2627.29 samples/sec   Loss 4.3069   LearningRate 0.0103   Epoch: 13   Global Step: 563070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:00:23,162-Speed 2645.00 samples/sec   Loss 4.3501   LearningRate 0.0103   Epoch: 13   Global Step: 563080   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:27,107-Speed 2596.40 samples/sec   Loss 4.3446   LearningRate 0.0103   Epoch: 13   Global Step: 563090   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:31,011-Speed 2623.97 samples/sec   Loss 4.2992   LearningRate 0.0103   Epoch: 13   Global Step: 563100   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:34,902-Speed 2632.20 samples/sec   Loss 4.3498   LearningRate 0.0103   Epoch: 13   Global Step: 563110   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:38,810-Speed 2621.18 samples/sec   Loss 4.2948   LearningRate 0.0103   Epoch: 13   Global Step: 563120   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:42,727-Speed 2614.60 samples/sec   Loss 4.3644   LearningRate 0.0103   Epoch: 13   Global Step: 563130   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:46,624-Speed 2629.00 samples/sec   Loss 4.3493   LearningRate 0.0103   Epoch: 13   Global Step: 563140   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:50,516-Speed 2631.55 samples/sec   Loss 4.3192   LearningRate 0.0103   Epoch: 13   Global Step: 563150   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:54,421-Speed 2622.93 samples/sec   Loss 4.3475   LearningRate 0.0103   Epoch: 13   Global Step: 563160   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:00:58,316-Speed 2629.16 samples/sec   Loss 4.3148   LearningRate 0.0103   Epoch: 13   Global Step: 563170   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:01:02,215-Speed 2626.75 samples/sec   Loss 4.2117   LearningRate 0.0103   Epoch: 13   Global Step: 563180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:06,111-Speed 2629.56 samples/sec   Loss 4.2864   LearningRate 0.0103   Epoch: 13   Global Step: 563190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:10,013-Speed 2625.09 samples/sec   Loss 4.2975   LearningRate 0.0103   Epoch: 13   Global Step: 563200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:13,911-Speed 2627.70 samples/sec   Loss 4.3744   LearningRate 0.0103   Epoch: 13   Global Step: 563210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:17,826-Speed 2616.20 samples/sec   Loss 4.2651   LearningRate 0.0103   Epoch: 13   Global Step: 563220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:21,721-Speed 2629.83 samples/sec   Loss 4.2751   LearningRate 0.0103   Epoch: 13   Global Step: 563230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:25,614-Speed 2631.18 samples/sec   Loss 4.3995   LearningRate 0.0103   Epoch: 13   Global Step: 563240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:29,507-Speed 2630.84 samples/sec   Loss 4.4318   LearningRate 0.0103   Epoch: 13   Global Step: 563250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:33,398-Speed 2632.14 samples/sec   Loss 4.2798   LearningRate 0.0103   Epoch: 13   Global Step: 563260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:37,293-Speed 2629.99 samples/sec   Loss 4.3313   LearningRate 0.0103   Epoch: 13   Global Step: 563270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:01:41,196-Speed 2624.31 samples/sec   Loss 4.4093   LearningRate 0.0103   Epoch: 13   Global Step: 563280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:01:45,088-Speed 2631.48 samples/sec   Loss 4.2298   LearningRate 0.0103   Epoch: 13   Global Step: 563290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:01:48,983-Speed 2630.16 samples/sec   Loss 4.2473   LearningRate 0.0103   Epoch: 13   Global Step: 563300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:01:52,892-Speed 2619.80 samples/sec   Loss 4.2795   LearningRate 0.0103   Epoch: 13   Global Step: 563310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:01:56,767-Speed 2644.19 samples/sec   Loss 4.3485   LearningRate 0.0103   Epoch: 13   Global Step: 563320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:00,665-Speed 2627.40 samples/sec   Loss 4.3684   LearningRate 0.0103   Epoch: 13   Global Step: 563330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:04,575-Speed 2619.07 samples/sec   Loss 4.3269   LearningRate 0.0103   Epoch: 13   Global Step: 563340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:08,497-Speed 2611.95 samples/sec   Loss 4.3594   LearningRate 0.0103   Epoch: 13   Global Step: 563350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:12,393-Speed 2628.84 samples/sec   Loss 4.2940   LearningRate 0.0103   Epoch: 13   Global Step: 563360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:16,289-Speed 2629.26 samples/sec   Loss 4.2482   LearningRate 0.0103   Epoch: 13   Global Step: 563370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:20,194-Speed 2623.22 samples/sec   Loss 4.2444   LearningRate 0.0103   Epoch: 13   Global Step: 563380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:24,090-Speed 2628.69 samples/sec   Loss 4.3969   LearningRate 0.0103   Epoch: 13   Global Step: 563390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:27,989-Speed 2627.20 samples/sec   Loss 4.3441   LearningRate 0.0103   Epoch: 13   Global Step: 563400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:31,890-Speed 2625.78 samples/sec   Loss 4.3016   LearningRate 0.0103   Epoch: 13   Global Step: 563410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:02:35,786-Speed 2628.89 samples/sec   Loss 4.3586   LearningRate 0.0103   Epoch: 13   Global Step: 563420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:02:39,682-Speed 2628.75 samples/sec   Loss 4.4287   LearningRate 0.0103   Epoch: 13   Global Step: 563430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:02:43,580-Speed 2627.83 samples/sec   Loss 4.3475   LearningRate 0.0103   Epoch: 13   Global Step: 563440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:02:47,484-Speed 2623.43 samples/sec   Loss 4.2565   LearningRate 0.0103   Epoch: 13   Global Step: 563450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:02:51,381-Speed 2628.30 samples/sec   Loss 4.3720   LearningRate 0.0103   Epoch: 13   Global Step: 563460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:02:55,292-Speed 2619.36 samples/sec   Loss 4.3657   LearningRate 0.0103   Epoch: 13   Global Step: 563470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:02:59,196-Speed 2622.87 samples/sec   Loss 4.3533   LearningRate 0.0103   Epoch: 13   Global Step: 563480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:03:03,081-Speed 2636.46 samples/sec   Loss 4.1988   LearningRate 0.0103   Epoch: 13   Global Step: 563490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:06,978-Speed 2628.79 samples/sec   Loss 4.4002   LearningRate 0.0103   Epoch: 13   Global Step: 563500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:10,876-Speed 2627.40 samples/sec   Loss 4.4451   LearningRate 0.0103   Epoch: 13   Global Step: 563510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:14,768-Speed 2631.43 samples/sec   Loss 4.2624   LearningRate 0.0103   Epoch: 13   Global Step: 563520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:18,672-Speed 2623.66 samples/sec   Loss 4.3724   LearningRate 0.0103   Epoch: 13   Global Step: 563530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:22,565-Speed 2631.46 samples/sec   Loss 4.3839   LearningRate 0.0103   Epoch: 13   Global Step: 563540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:26,456-Speed 2632.02 samples/sec   Loss 4.3304   LearningRate 0.0103   Epoch: 13   Global Step: 563550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:30,353-Speed 2629.18 samples/sec   Loss 4.4022   LearningRate 0.0103   Epoch: 13   Global Step: 563560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:34,273-Speed 2612.67 samples/sec   Loss 4.2843   LearningRate 0.0103   Epoch: 13   Global Step: 563570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:38,172-Speed 2626.78 samples/sec   Loss 4.3492   LearningRate 0.0103   Epoch: 13   Global Step: 563580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:42,066-Speed 2630.07 samples/sec   Loss 4.3384   LearningRate 0.0103   Epoch: 13   Global Step: 563590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:45,957-Speed 2632.79 samples/sec   Loss 4.3243   LearningRate 0.0103   Epoch: 13   Global Step: 563600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:49,859-Speed 2625.46 samples/sec   Loss 4.4290   LearningRate 0.0103   Epoch: 13   Global Step: 563610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:53,760-Speed 2625.39 samples/sec   Loss 4.3773   LearningRate 0.0103   Epoch: 13   Global Step: 563620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:03:57,664-Speed 2623.96 samples/sec   Loss 4.3659   LearningRate 0.0103   Epoch: 13   Global Step: 563630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:01,568-Speed 2624.03 samples/sec   Loss 4.3152   LearningRate 0.0103   Epoch: 13   Global Step: 563640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:05,460-Speed 2631.36 samples/sec   Loss 4.2929   LearningRate 0.0103   Epoch: 13   Global Step: 563650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:09,363-Speed 2624.48 samples/sec   Loss 4.3496   LearningRate 0.0103   Epoch: 13   Global Step: 563660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:13,259-Speed 2628.48 samples/sec   Loss 4.3312   LearningRate 0.0103   Epoch: 13   Global Step: 563670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:17,155-Speed 2628.94 samples/sec   Loss 4.2950   LearningRate 0.0103   Epoch: 13   Global Step: 563680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:21,051-Speed 2629.44 samples/sec   Loss 4.3228   LearningRate 0.0103   Epoch: 13   Global Step: 563690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:04:24,930-Speed 2640.37 samples/sec   Loss 4.2226   LearningRate 0.0103   Epoch: 13   Global Step: 563700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:28,829-Speed 2627.65 samples/sec   Loss 4.3849   LearningRate 0.0103   Epoch: 13   Global Step: 563710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:32,729-Speed 2626.32 samples/sec   Loss 4.2938   LearningRate 0.0103   Epoch: 13   Global Step: 563720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:36,634-Speed 2622.16 samples/sec   Loss 4.3623   LearningRate 0.0103   Epoch: 13   Global Step: 563730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:40,530-Speed 2628.89 samples/sec   Loss 4.3075   LearningRate 0.0103   Epoch: 13   Global Step: 563740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:44,468-Speed 2601.61 samples/sec   Loss 4.3243   LearningRate 0.0103   Epoch: 13   Global Step: 563750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:48,386-Speed 2614.31 samples/sec   Loss 4.3456   LearningRate 0.0103   Epoch: 13   Global Step: 563760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:52,297-Speed 2619.19 samples/sec   Loss 4.2850   LearningRate 0.0103   Epoch: 13   Global Step: 563770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:04:56,200-Speed 2624.00 samples/sec   Loss 4.3289   LearningRate 0.0103   Epoch: 13   Global Step: 563780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:00,093-Speed 2631.58 samples/sec   Loss 4.3575   LearningRate 0.0103   Epoch: 13   Global Step: 563790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:03,988-Speed 2629.41 samples/sec   Loss 4.2954   LearningRate 0.0103   Epoch: 13   Global Step: 563800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:05:07,867-Speed 2640.27 samples/sec   Loss 4.2664   LearningRate 0.0103   Epoch: 13   Global Step: 563810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:11,755-Speed 2634.15 samples/sec   Loss 4.2576   LearningRate 0.0103   Epoch: 13   Global Step: 563820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:15,659-Speed 2624.41 samples/sec   Loss 4.2553   LearningRate 0.0103   Epoch: 13   Global Step: 563830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:19,553-Speed 2629.76 samples/sec   Loss 4.2614   LearningRate 0.0103   Epoch: 13   Global Step: 563840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:23,452-Speed 2627.70 samples/sec   Loss 4.3395   LearningRate 0.0103   Epoch: 13   Global Step: 563850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:27,347-Speed 2629.06 samples/sec   Loss 4.2966   LearningRate 0.0103   Epoch: 13   Global Step: 563860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:31,262-Speed 2617.06 samples/sec   Loss 4.2949   LearningRate 0.0103   Epoch: 13   Global Step: 563870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:35,166-Speed 2623.55 samples/sec   Loss 4.3017   LearningRate 0.0103   Epoch: 13   Global Step: 563880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:39,235-Speed 2517.34 samples/sec   Loss 4.2814   LearningRate 0.0103   Epoch: 13   Global Step: 563890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:43,129-Speed 2629.83 samples/sec   Loss 4.2891   LearningRate 0.0103   Epoch: 13   Global Step: 563900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:05:47,023-Speed 2630.75 samples/sec   Loss 4.3359   LearningRate 0.0103   Epoch: 13   Global Step: 563910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:05:50,923-Speed 2626.74 samples/sec   Loss 4.2565   LearningRate 0.0103   Epoch: 13   Global Step: 563920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:05:54,833-Speed 2619.26 samples/sec   Loss 4.4418   LearningRate 0.0103   Epoch: 13   Global Step: 563930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:05:58,710-Speed 2642.03 samples/sec   Loss 4.2399   LearningRate 0.0103   Epoch: 13   Global Step: 563940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:02,609-Speed 2627.08 samples/sec   Loss 4.3809   LearningRate 0.0103   Epoch: 13   Global Step: 563950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:06,504-Speed 2629.75 samples/sec   Loss 4.2986   LearningRate 0.0103   Epoch: 13   Global Step: 563960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:10,411-Speed 2621.31 samples/sec   Loss 4.3065   LearningRate 0.0103   Epoch: 13   Global Step: 563970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:14,305-Speed 2630.42 samples/sec   Loss 4.2456   LearningRate 0.0103   Epoch: 13   Global Step: 563980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:18,362-Speed 2524.69 samples/sec   Loss 4.3384   LearningRate 0.0102   Epoch: 13   Global Step: 563990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:22,358-Speed 2563.30 samples/sec   Loss 4.3124   LearningRate 0.0102   Epoch: 13   Global Step: 564000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:26,259-Speed 2626.14 samples/sec   Loss 4.3278   LearningRate 0.0102   Epoch: 13   Global Step: 564010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:30,159-Speed 2626.08 samples/sec   Loss 4.3169   LearningRate 0.0102   Epoch: 13   Global Step: 564020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:34,062-Speed 2624.21 samples/sec   Loss 4.3487   LearningRate 0.0102   Epoch: 13   Global Step: 564030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:06:37,933-Speed 2646.14 samples/sec   Loss 4.2756   LearningRate 0.0102   Epoch: 13   Global Step: 564040   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:06:41,827-Speed 2630.38 samples/sec   Loss 4.1330   LearningRate 0.0102   Epoch: 13   Global Step: 564050   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:06:45,723-Speed 2628.85 samples/sec   Loss 4.2795   LearningRate 0.0102   Epoch: 13   Global Step: 564060   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:06:49,639-Speed 2615.37 samples/sec   Loss 4.3041   LearningRate 0.0102   Epoch: 13   Global Step: 564070   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:06:53,542-Speed 2624.74 samples/sec   Loss 4.3571   LearningRate 0.0102   Epoch: 13   Global Step: 564080   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:06:57,432-Speed 2633.05 samples/sec   Loss 4.3827   LearningRate 0.0102   Epoch: 13   Global Step: 564090   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:07:01,327-Speed 2630.12 samples/sec   Loss 4.3668   LearningRate 0.0102   Epoch: 13   Global Step: 564100   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:07:05,229-Speed 2624.45 samples/sec   Loss 4.3278   LearningRate 0.0102   Epoch: 13   Global Step: 564110   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:07:09,123-Speed 2630.52 samples/sec   Loss 4.2089   LearningRate 0.0102   Epoch: 13   Global Step: 564120   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:07:13,014-Speed 2632.35 samples/sec   Loss 4.1960   LearningRate 0.0102   Epoch: 13   Global Step: 564130   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:07:16,930-Speed 2615.73 samples/sec   Loss 4.3694   LearningRate 0.0102   Epoch: 13   Global Step: 564140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:20,826-Speed 2629.01 samples/sec   Loss 4.2923   LearningRate 0.0102   Epoch: 13   Global Step: 564150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:24,735-Speed 2620.73 samples/sec   Loss 4.2406   LearningRate 0.0102   Epoch: 13   Global Step: 564160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:28,637-Speed 2624.65 samples/sec   Loss 4.2695   LearningRate 0.0102   Epoch: 13   Global Step: 564170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:32,541-Speed 2623.48 samples/sec   Loss 4.3156   LearningRate 0.0102   Epoch: 13   Global Step: 564180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:36,442-Speed 2625.82 samples/sec   Loss 4.3424   LearningRate 0.0102   Epoch: 13   Global Step: 564190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:40,338-Speed 2628.86 samples/sec   Loss 4.3592   LearningRate 0.0102   Epoch: 13   Global Step: 564200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:44,256-Speed 2614.46 samples/sec   Loss 4.3107   LearningRate 0.0102   Epoch: 13   Global Step: 564210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:48,168-Speed 2618.70 samples/sec   Loss 4.3411   LearningRate 0.0102   Epoch: 13   Global Step: 564220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:52,066-Speed 2627.50 samples/sec   Loss 4.2997   LearningRate 0.0102   Epoch: 13   Global Step: 564230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:07:55,960-Speed 2630.79 samples/sec   Loss 4.3975   LearningRate 0.0102   Epoch: 13   Global Step: 564240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:07:59,856-Speed 2628.58 samples/sec   Loss 4.3163   LearningRate 0.0102   Epoch: 13   Global Step: 564250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:08:03,766-Speed 2619.47 samples/sec   Loss 4.2198   LearningRate 0.0102   Epoch: 13   Global Step: 564260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:08:07,663-Speed 2628.50 samples/sec   Loss 4.3322   LearningRate 0.0102   Epoch: 13   Global Step: 564270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:08:11,555-Speed 2631.48 samples/sec   Loss 4.2941   LearningRate 0.0102   Epoch: 13   Global Step: 564280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:08:15,453-Speed 2627.93 samples/sec   Loss 4.3164   LearningRate 0.0102   Epoch: 13   Global Step: 564290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:08:19,324-Speed 2646.73 samples/sec   Loss 4.2525   LearningRate 0.0102   Epoch: 13   Global Step: 564300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:23,223-Speed 2626.58 samples/sec   Loss 4.3774   LearningRate 0.0102   Epoch: 13   Global Step: 564310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:27,717-Speed 2280.03 samples/sec   Loss 4.3719   LearningRate 0.0102   Epoch: 13   Global Step: 564320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:31,623-Speed 2622.56 samples/sec   Loss 4.3640   LearningRate 0.0102   Epoch: 13   Global Step: 564330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:35,538-Speed 2615.62 samples/sec   Loss 4.3822   LearningRate 0.0102   Epoch: 13   Global Step: 564340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:39,442-Speed 2624.29 samples/sec   Loss 4.3682   LearningRate 0.0102   Epoch: 13   Global Step: 564350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:43,347-Speed 2622.87 samples/sec   Loss 4.3123   LearningRate 0.0102   Epoch: 13   Global Step: 564360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:47,247-Speed 2627.31 samples/sec   Loss 4.3064   LearningRate 0.0102   Epoch: 13   Global Step: 564370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:51,142-Speed 2629.02 samples/sec   Loss 4.2986   LearningRate 0.0102   Epoch: 13   Global Step: 564380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:55,054-Speed 2619.17 samples/sec   Loss 4.2828   LearningRate 0.0102   Epoch: 13   Global Step: 564390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:08:58,964-Speed 2619.60 samples/sec   Loss 4.1983   LearningRate 0.0102   Epoch: 13   Global Step: 564400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:02,875-Speed 2618.79 samples/sec   Loss 4.3022   LearningRate 0.0102   Epoch: 13   Global Step: 564410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:06,780-Speed 2622.83 samples/sec   Loss 4.2546   LearningRate 0.0102   Epoch: 13   Global Step: 564420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:10,675-Speed 2629.81 samples/sec   Loss 4.3908   LearningRate 0.0102   Epoch: 13   Global Step: 564430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:14,572-Speed 2628.66 samples/sec   Loss 4.3185   LearningRate 0.0102   Epoch: 13   Global Step: 564440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:18,486-Speed 2616.48 samples/sec   Loss 4.3040   LearningRate 0.0102   Epoch: 13   Global Step: 564450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:22,388-Speed 2625.22 samples/sec   Loss 4.3193   LearningRate 0.0102   Epoch: 13   Global Step: 564460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:26,281-Speed 2630.60 samples/sec   Loss 4.3563   LearningRate 0.0102   Epoch: 13   Global Step: 564470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:30,175-Speed 2630.81 samples/sec   Loss 4.2531   LearningRate 0.0102   Epoch: 13   Global Step: 564480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:34,067-Speed 2631.73 samples/sec   Loss 4.3009   LearningRate 0.0102   Epoch: 13   Global Step: 564490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:37,943-Speed 2642.59 samples/sec   Loss 4.3297   LearningRate 0.0102   Epoch: 13   Global Step: 564500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:09:41,822-Speed 2639.94 samples/sec   Loss 4.3496   LearningRate 0.0102   Epoch: 13   Global Step: 564510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:09:45,713-Speed 2632.38 samples/sec   Loss 4.3363   LearningRate 0.0102   Epoch: 13   Global Step: 564520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:09:49,730-Speed 2549.86 samples/sec   Loss 4.3099   LearningRate 0.0102   Epoch: 13   Global Step: 564530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:09:53,783-Speed 2527.79 samples/sec   Loss 4.2529   LearningRate 0.0102   Epoch: 13   Global Step: 564540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:09:57,675-Speed 2631.72 samples/sec   Loss 4.3270   LearningRate 0.0102   Epoch: 13   Global Step: 564550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:01,568-Speed 2631.19 samples/sec   Loss 4.3606   LearningRate 0.0102   Epoch: 13   Global Step: 564560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:05,460-Speed 2631.70 samples/sec   Loss 4.3588   LearningRate 0.0102   Epoch: 13   Global Step: 564570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:09,355-Speed 2629.75 samples/sec   Loss 4.2800   LearningRate 0.0102   Epoch: 13   Global Step: 564580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:13,247-Speed 2631.45 samples/sec   Loss 4.3459   LearningRate 0.0102   Epoch: 13   Global Step: 564590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:17,143-Speed 2629.47 samples/sec   Loss 4.2763   LearningRate 0.0102   Epoch: 13   Global Step: 564600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:21,075-Speed 2605.83 samples/sec   Loss 4.2667   LearningRate 0.0102   Epoch: 13   Global Step: 564610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:10:24,978-Speed 2624.32 samples/sec   Loss 4.4146   LearningRate 0.0102   Epoch: 13   Global Step: 564620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:10:28,878-Speed 2626.51 samples/sec   Loss 4.3057   LearningRate 0.0102   Epoch: 13   Global Step: 564630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:10:32,780-Speed 2624.93 samples/sec   Loss 4.3345   LearningRate 0.0102   Epoch: 13   Global Step: 564640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:10:36,677-Speed 2628.37 samples/sec   Loss 4.3504   LearningRate 0.0102   Epoch: 13   Global Step: 564650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:10:40,565-Speed 2634.31 samples/sec   Loss 4.3471   LearningRate 0.0102   Epoch: 13   Global Step: 564660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:44,477-Speed 2618.50 samples/sec   Loss 4.3005   LearningRate 0.0102   Epoch: 13   Global Step: 564670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:48,404-Speed 2607.86 samples/sec   Loss 4.3462   LearningRate 0.0102   Epoch: 13   Global Step: 564680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:52,311-Speed 2622.13 samples/sec   Loss 4.3886   LearningRate 0.0102   Epoch: 13   Global Step: 564690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:10:56,210-Speed 2626.74 samples/sec   Loss 4.2843   LearningRate 0.0102   Epoch: 13   Global Step: 564700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:00,117-Speed 2621.88 samples/sec   Loss 4.3562   LearningRate 0.0102   Epoch: 13   Global Step: 564710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:04,012-Speed 2629.62 samples/sec   Loss 4.2670   LearningRate 0.0102   Epoch: 13   Global Step: 564720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:07,907-Speed 2629.59 samples/sec   Loss 4.2681   LearningRate 0.0102   Epoch: 13   Global Step: 564730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:11,799-Speed 2631.46 samples/sec   Loss 4.3748   LearningRate 0.0102   Epoch: 13   Global Step: 564740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:15,697-Speed 2628.97 samples/sec   Loss 4.3363   LearningRate 0.0102   Epoch: 13   Global Step: 564750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:19,608-Speed 2618.39 samples/sec   Loss 4.3516   LearningRate 0.0102   Epoch: 13   Global Step: 564760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:11:23,503-Speed 2630.28 samples/sec   Loss 4.3424   LearningRate 0.0102   Epoch: 13   Global Step: 564770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:27,398-Speed 2629.75 samples/sec   Loss 4.3175   LearningRate 0.0102   Epoch: 13   Global Step: 564780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:31,295-Speed 2628.26 samples/sec   Loss 4.3054   LearningRate 0.0102   Epoch: 13   Global Step: 564790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:35,191-Speed 2629.02 samples/sec   Loss 4.2454   LearningRate 0.0102   Epoch: 13   Global Step: 564800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:39,084-Speed 2630.97 samples/sec   Loss 4.3406   LearningRate 0.0102   Epoch: 13   Global Step: 564810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:43,078-Speed 2564.47 samples/sec   Loss 4.2455   LearningRate 0.0102   Epoch: 13   Global Step: 564820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:46,982-Speed 2624.48 samples/sec   Loss 4.3397   LearningRate 0.0102   Epoch: 13   Global Step: 564830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:50,877-Speed 2629.57 samples/sec   Loss 4.3439   LearningRate 0.0102   Epoch: 13   Global Step: 564840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:54,774-Speed 2628.46 samples/sec   Loss 4.2891   LearningRate 0.0102   Epoch: 13   Global Step: 564850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:11:58,664-Speed 2632.81 samples/sec   Loss 4.3404   LearningRate 0.0102   Epoch: 13   Global Step: 564860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:02,565-Speed 2625.81 samples/sec   Loss 4.2973   LearningRate 0.0102   Epoch: 13   Global Step: 564870   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:12:06,441-Speed 2642.67 samples/sec   Loss 4.3527   LearningRate 0.0102   Epoch: 13   Global Step: 564880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:10,333-Speed 2631.25 samples/sec   Loss 4.3681   LearningRate 0.0102   Epoch: 13   Global Step: 564890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:14,236-Speed 2623.99 samples/sec   Loss 4.3408   LearningRate 0.0102   Epoch: 13   Global Step: 564900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:18,131-Speed 2630.13 samples/sec   Loss 4.3315   LearningRate 0.0102   Epoch: 13   Global Step: 564910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:22,040-Speed 2620.44 samples/sec   Loss 4.2721   LearningRate 0.0102   Epoch: 13   Global Step: 564920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:25,934-Speed 2630.51 samples/sec   Loss 4.3870   LearningRate 0.0102   Epoch: 13   Global Step: 564930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:29,831-Speed 2628.22 samples/sec   Loss 4.4157   LearningRate 0.0102   Epoch: 13   Global Step: 564940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:33,723-Speed 2631.72 samples/sec   Loss 4.2609   LearningRate 0.0102   Epoch: 13   Global Step: 564950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:37,643-Speed 2613.07 samples/sec   Loss 4.3149   LearningRate 0.0102   Epoch: 13   Global Step: 564960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:41,535-Speed 2631.56 samples/sec   Loss 4.2193   LearningRate 0.0102   Epoch: 13   Global Step: 564970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:12:45,422-Speed 2635.77 samples/sec   Loss 4.3377   LearningRate 0.0102   Epoch: 13   Global Step: 564980   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:12:49,350-Speed 2607.48 samples/sec   Loss 4.3063   LearningRate 0.0102   Epoch: 13   Global Step: 564990   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:12:53,296-Speed 2595.60 samples/sec   Loss 4.3369   LearningRate 0.0102   Epoch: 13   Global Step: 565000   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:12:57,188-Speed 2632.23 samples/sec   Loss 4.3259   LearningRate 0.0102   Epoch: 13   Global Step: 565010   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:13:01,083-Speed 2629.30 samples/sec   Loss 4.2533   LearningRate 0.0102   Epoch: 13   Global Step: 565020   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:13:05,019-Speed 2602.03 samples/sec   Loss 4.3359   LearningRate 0.0102   Epoch: 13   Global Step: 565030   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:13:08,918-Speed 2627.17 samples/sec   Loss 4.2975   LearningRate 0.0102   Epoch: 13   Global Step: 565040   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:13:12,820-Speed 2625.32 samples/sec   Loss 4.4208   LearningRate 0.0102   Epoch: 13   Global Step: 565050   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:13:16,725-Speed 2623.52 samples/sec   Loss 4.3067   LearningRate 0.0102   Epoch: 13   Global Step: 565060   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:13:20,622-Speed 2627.83 samples/sec   Loss 4.3004   LearningRate 0.0102   Epoch: 13   Global Step: 565070   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:13:24,641-Speed 2548.46 samples/sec   Loss 4.4005   LearningRate 0.0102   Epoch: 13   Global Step: 565080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:28,555-Speed 2616.77 samples/sec   Loss 4.1912   LearningRate 0.0102   Epoch: 13   Global Step: 565090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:32,447-Speed 2632.19 samples/sec   Loss 4.3298   LearningRate 0.0102   Epoch: 13   Global Step: 565100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:36,347-Speed 2626.44 samples/sec   Loss 4.2866   LearningRate 0.0102   Epoch: 13   Global Step: 565110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:40,241-Speed 2630.19 samples/sec   Loss 4.2663   LearningRate 0.0102   Epoch: 13   Global Step: 565120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:44,146-Speed 2622.74 samples/sec   Loss 4.3575   LearningRate 0.0102   Epoch: 13   Global Step: 565130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:48,041-Speed 2629.83 samples/sec   Loss 4.2981   LearningRate 0.0102   Epoch: 13   Global Step: 565140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:51,963-Speed 2611.70 samples/sec   Loss 4.3604   LearningRate 0.0102   Epoch: 13   Global Step: 565150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:55,864-Speed 2625.93 samples/sec   Loss 4.2656   LearningRate 0.0102   Epoch: 13   Global Step: 565160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:13:59,760-Speed 2629.86 samples/sec   Loss 4.2219   LearningRate 0.0102   Epoch: 13   Global Step: 565170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:03,659-Speed 2626.38 samples/sec   Loss 4.2653   LearningRate 0.0102   Epoch: 13   Global Step: 565180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:14:07,558-Speed 2627.58 samples/sec   Loss 4.2814   LearningRate 0.0102   Epoch: 13   Global Step: 565190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:14:11,450-Speed 2631.51 samples/sec   Loss 4.2812   LearningRate 0.0102   Epoch: 13   Global Step: 565200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:14:15,342-Speed 2631.59 samples/sec   Loss 4.2943   LearningRate 0.0102   Epoch: 13   Global Step: 565210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:14:19,259-Speed 2614.99 samples/sec   Loss 4.3390   LearningRate 0.0102   Epoch: 13   Global Step: 565220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:14:23,175-Speed 2615.74 samples/sec   Loss 4.2696   LearningRate 0.0102   Epoch: 13   Global Step: 565230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:14:27,044-Speed 2647.74 samples/sec   Loss 4.3893   LearningRate 0.0102   Epoch: 13   Global Step: 565240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:30,935-Speed 2632.68 samples/sec   Loss 4.3573   LearningRate 0.0102   Epoch: 13   Global Step: 565250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:34,837-Speed 2624.41 samples/sec   Loss 4.3319   LearningRate 0.0102   Epoch: 13   Global Step: 565260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:38,738-Speed 2625.87 samples/sec   Loss 4.2621   LearningRate 0.0102   Epoch: 13   Global Step: 565270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:42,632-Speed 2629.68 samples/sec   Loss 4.2296   LearningRate 0.0102   Epoch: 13   Global Step: 565280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:46,556-Speed 2610.68 samples/sec   Loss 4.3094   LearningRate 0.0101   Epoch: 13   Global Step: 565290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:50,471-Speed 2615.73 samples/sec   Loss 4.3884   LearningRate 0.0101   Epoch: 13   Global Step: 565300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:54,421-Speed 2593.52 samples/sec   Loss 4.2952   LearningRate 0.0101   Epoch: 13   Global Step: 565310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:14:58,316-Speed 2629.99 samples/sec   Loss 4.3108   LearningRate 0.0101   Epoch: 13   Global Step: 565320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:15:02,211-Speed 2629.69 samples/sec   Loss 4.3518   LearningRate 0.0101   Epoch: 13   Global Step: 565330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:15:06,080-Speed 2647.47 samples/sec   Loss 4.2412   LearningRate 0.0101   Epoch: 13   Global Step: 565340   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:09,975-Speed 2629.20 samples/sec   Loss 4.4082   LearningRate 0.0101   Epoch: 13   Global Step: 565350   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:13,873-Speed 2628.14 samples/sec   Loss 4.2684   LearningRate 0.0101   Epoch: 13   Global Step: 565360   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:17,766-Speed 2631.27 samples/sec   Loss 4.3147   LearningRate 0.0101   Epoch: 13   Global Step: 565370   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:21,655-Speed 2634.10 samples/sec   Loss 4.3487   LearningRate 0.0101   Epoch: 13   Global Step: 565380   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:25,553-Speed 2627.54 samples/sec   Loss 4.3297   LearningRate 0.0101   Epoch: 13   Global Step: 565390   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:29,454-Speed 2626.18 samples/sec   Loss 4.2597   LearningRate 0.0101   Epoch: 13   Global Step: 565400   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:33,348-Speed 2629.74 samples/sec   Loss 4.2476   LearningRate 0.0101   Epoch: 13   Global Step: 565410   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:37,256-Speed 2621.18 samples/sec   Loss 4.2925   LearningRate 0.0101   Epoch: 13   Global Step: 565420   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:41,149-Speed 2631.27 samples/sec   Loss 4.2268   LearningRate 0.0101   Epoch: 13   Global Step: 565430   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:15:45,044-Speed 2630.13 samples/sec   Loss 4.3283   LearningRate 0.0101   Epoch: 13   Global Step: 565440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:15:48,960-Speed 2614.93 samples/sec   Loss 4.2379   LearningRate 0.0101   Epoch: 13   Global Step: 565450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:15:52,870-Speed 2620.27 samples/sec   Loss 4.2349   LearningRate 0.0101   Epoch: 13   Global Step: 565460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:15:56,768-Speed 2627.28 samples/sec   Loss 4.3461   LearningRate 0.0101   Epoch: 13   Global Step: 565470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:00,663-Speed 2631.04 samples/sec   Loss 4.3973   LearningRate 0.0101   Epoch: 13   Global Step: 565480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:04,567-Speed 2623.68 samples/sec   Loss 4.2227   LearningRate 0.0101   Epoch: 13   Global Step: 565490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:08,467-Speed 2625.51 samples/sec   Loss 4.2866   LearningRate 0.0101   Epoch: 13   Global Step: 565500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:12,359-Speed 2631.86 samples/sec   Loss 4.3572   LearningRate 0.0101   Epoch: 13   Global Step: 565510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:16,261-Speed 2624.90 samples/sec   Loss 4.3460   LearningRate 0.0101   Epoch: 13   Global Step: 565520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:20,148-Speed 2634.89 samples/sec   Loss 4.3323   LearningRate 0.0101   Epoch: 13   Global Step: 565530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:24,040-Speed 2632.14 samples/sec   Loss 4.3195   LearningRate 0.0101   Epoch: 13   Global Step: 565540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:16:27,937-Speed 2628.23 samples/sec   Loss 4.2953   LearningRate 0.0101   Epoch: 13   Global Step: 565550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:16:31,811-Speed 2643.78 samples/sec   Loss 4.2705   LearningRate 0.0101   Epoch: 13   Global Step: 565560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:35,714-Speed 2624.86 samples/sec   Loss 4.2729   LearningRate 0.0101   Epoch: 13   Global Step: 565570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:39,616-Speed 2624.73 samples/sec   Loss 4.3243   LearningRate 0.0101   Epoch: 13   Global Step: 565580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:43,511-Speed 2629.29 samples/sec   Loss 4.4126   LearningRate 0.0101   Epoch: 13   Global Step: 565590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:47,417-Speed 2621.77 samples/sec   Loss 4.2806   LearningRate 0.0101   Epoch: 13   Global Step: 565600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:51,317-Speed 2626.60 samples/sec   Loss 4.3364   LearningRate 0.0101   Epoch: 13   Global Step: 565610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:55,224-Speed 2621.25 samples/sec   Loss 4.2642   LearningRate 0.0101   Epoch: 13   Global Step: 565620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:16:59,122-Speed 2627.91 samples/sec   Loss 4.4428   LearningRate 0.0101   Epoch: 13   Global Step: 565630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:03,027-Speed 2623.39 samples/sec   Loss 4.1667   LearningRate 0.0101   Epoch: 13   Global Step: 565640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:06,920-Speed 2630.85 samples/sec   Loss 4.2585   LearningRate 0.0101   Epoch: 13   Global Step: 565650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:10,792-Speed 2645.54 samples/sec   Loss 4.1995   LearningRate 0.0101   Epoch: 13   Global Step: 565660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:14,688-Speed 2628.94 samples/sec   Loss 4.2819   LearningRate 0.0101   Epoch: 13   Global Step: 565670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:18,583-Speed 2629.20 samples/sec   Loss 4.3261   LearningRate 0.0101   Epoch: 13   Global Step: 565680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:22,494-Speed 2618.95 samples/sec   Loss 4.2984   LearningRate 0.0101   Epoch: 13   Global Step: 565690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:26,385-Speed 2632.69 samples/sec   Loss 4.2492   LearningRate 0.0101   Epoch: 13   Global Step: 565700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:30,279-Speed 2629.96 samples/sec   Loss 4.2845   LearningRate 0.0101   Epoch: 13   Global Step: 565710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:34,177-Speed 2627.85 samples/sec   Loss 4.3160   LearningRate 0.0101   Epoch: 13   Global Step: 565720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:38,070-Speed 2631.21 samples/sec   Loss 4.3338   LearningRate 0.0101   Epoch: 13   Global Step: 565730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:41,965-Speed 2629.49 samples/sec   Loss 4.2339   LearningRate 0.0101   Epoch: 13   Global Step: 565740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:17:45,837-Speed 2645.08 samples/sec   Loss 4.2345   LearningRate 0.0101   Epoch: 13   Global Step: 565750   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:17:49,733-Speed 2628.61 samples/sec   Loss 4.3654   LearningRate 0.0101   Epoch: 13   Global Step: 565760   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:17:53,627-Speed 2632.91 samples/sec   Loss 4.2913   LearningRate 0.0101   Epoch: 13   Global Step: 565770   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:17:57,522-Speed 2629.99 samples/sec   Loss 4.2233   LearningRate 0.0101   Epoch: 13   Global Step: 565780   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:01,416-Speed 2629.98 samples/sec   Loss 4.2787   LearningRate 0.0101   Epoch: 13   Global Step: 565790   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:05,337-Speed 2612.13 samples/sec   Loss 4.3394   LearningRate 0.0101   Epoch: 13   Global Step: 565800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:09,354-Speed 2549.54 samples/sec   Loss 4.3872   LearningRate 0.0101   Epoch: 13   Global Step: 565810   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:13,245-Speed 2632.55 samples/sec   Loss 4.2250   LearningRate 0.0101   Epoch: 13   Global Step: 565820   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:17,148-Speed 2623.95 samples/sec   Loss 4.2686   LearningRate 0.0101   Epoch: 13   Global Step: 565830   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:21,041-Speed 2630.86 samples/sec   Loss 4.3683   LearningRate 0.0101   Epoch: 13   Global Step: 565840   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:24,914-Speed 2644.69 samples/sec   Loss 4.3358   LearningRate 0.0101   Epoch: 13   Global Step: 565850   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:28,803-Speed 2633.51 samples/sec   Loss 4.3736   LearningRate 0.0101   Epoch: 13   Global Step: 565860   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:32,711-Speed 2621.09 samples/sec   Loss 4.2444   LearningRate 0.0101   Epoch: 13   Global Step: 565870   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:36,610-Speed 2627.08 samples/sec   Loss 4.2475   LearningRate 0.0101   Epoch: 13   Global Step: 565880   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:40,509-Speed 2626.43 samples/sec   Loss 4.2848   LearningRate 0.0101   Epoch: 13   Global Step: 565890   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:44,407-Speed 2627.45 samples/sec   Loss 4.3565   LearningRate 0.0101   Epoch: 13   Global Step: 565900   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:48,308-Speed 2626.09 samples/sec   Loss 4.2420   LearningRate 0.0101   Epoch: 13   Global Step: 565910   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:52,206-Speed 2627.59 samples/sec   Loss 4.2943   LearningRate 0.0101   Epoch: 13   Global Step: 565920   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:18:56,098-Speed 2631.48 samples/sec   Loss 4.2585   LearningRate 0.0101   Epoch: 13   Global Step: 565930   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:19:00,023-Speed 2609.54 samples/sec   Loss 4.1934   LearningRate 0.0101   Epoch: 13   Global Step: 565940   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:19:04,059-Speed 2537.78 samples/sec   Loss 4.2395   LearningRate 0.0101   Epoch: 13   Global Step: 565950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:07,959-Speed 2626.10 samples/sec   Loss 4.2455   LearningRate 0.0101   Epoch: 13   Global Step: 565960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:11,857-Speed 2628.11 samples/sec   Loss 4.3085   LearningRate 0.0101   Epoch: 13   Global Step: 565970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:15,757-Speed 2625.99 samples/sec   Loss 4.3369   LearningRate 0.0101   Epoch: 13   Global Step: 565980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:19,652-Speed 2629.13 samples/sec   Loss 4.3578   LearningRate 0.0101   Epoch: 13   Global Step: 565990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:23,548-Speed 2629.32 samples/sec   Loss 4.3462   LearningRate 0.0101   Epoch: 13   Global Step: 566000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:27,476-Speed 2607.27 samples/sec   Loss 4.2846   LearningRate 0.0101   Epoch: 13   Global Step: 566010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:31,379-Speed 2625.13 samples/sec   Loss 4.3131   LearningRate 0.0101   Epoch: 13   Global Step: 566020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:35,273-Speed 2630.04 samples/sec   Loss 4.2997   LearningRate 0.0101   Epoch: 13   Global Step: 566030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:39,165-Speed 2631.91 samples/sec   Loss 4.2939   LearningRate 0.0101   Epoch: 13   Global Step: 566040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:43,061-Speed 2628.98 samples/sec   Loss 4.3002   LearningRate 0.0101   Epoch: 13   Global Step: 566050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:19:46,990-Speed 2607.41 samples/sec   Loss 4.2324   LearningRate 0.0101   Epoch: 13   Global Step: 566060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:19:50,887-Speed 2627.77 samples/sec   Loss 4.3339   LearningRate 0.0101   Epoch: 13   Global Step: 566070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:19:54,773-Speed 2635.98 samples/sec   Loss 4.2207   LearningRate 0.0101   Epoch: 13   Global Step: 566080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:19:58,690-Speed 2614.83 samples/sec   Loss 4.2127   LearningRate 0.0101   Epoch: 13   Global Step: 566090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:02,585-Speed 2630.13 samples/sec   Loss 4.1544   LearningRate 0.0101   Epoch: 13   Global Step: 566100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:06,482-Speed 2628.26 samples/sec   Loss 4.2635   LearningRate 0.0101   Epoch: 13   Global Step: 566110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:10,390-Speed 2620.88 samples/sec   Loss 4.2449   LearningRate 0.0101   Epoch: 13   Global Step: 566120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:14,305-Speed 2616.03 samples/sec   Loss 4.2812   LearningRate 0.0101   Epoch: 13   Global Step: 566130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:18,199-Speed 2630.58 samples/sec   Loss 4.3209   LearningRate 0.0101   Epoch: 13   Global Step: 566140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:22,110-Speed 2619.36 samples/sec   Loss 4.3198   LearningRate 0.0101   Epoch: 13   Global Step: 566150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:26,011-Speed 2625.03 samples/sec   Loss 4.2758   LearningRate 0.0101   Epoch: 13   Global Step: 566160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:29,935-Speed 2610.97 samples/sec   Loss 4.2232   LearningRate 0.0101   Epoch: 13   Global Step: 566170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:20:33,834-Speed 2626.59 samples/sec   Loss 4.2733   LearningRate 0.0101   Epoch: 13   Global Step: 566180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:20:37,702-Speed 2647.77 samples/sec   Loss 4.2307   LearningRate 0.0101   Epoch: 13   Global Step: 566190   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:20:41,601-Speed 2627.28 samples/sec   Loss 4.3140   LearningRate 0.0101   Epoch: 13   Global Step: 566200   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:20:45,497-Speed 2629.54 samples/sec   Loss 4.2406   LearningRate 0.0101   Epoch: 13   Global Step: 566210   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:20:49,389-Speed 2631.04 samples/sec   Loss 4.3933   LearningRate 0.0101   Epoch: 13   Global Step: 566220   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:20:53,283-Speed 2630.61 samples/sec   Loss 4.3194   LearningRate 0.0101   Epoch: 13   Global Step: 566230   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:20:57,176-Speed 2630.72 samples/sec   Loss 4.2828   LearningRate 0.0101   Epoch: 13   Global Step: 566240   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:21:23,510-Speed 388.88 samples/sec   Loss 4.2521   LearningRate 0.0101   Epoch: 13   Global Step: 566250   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:21:27,406-Speed 2629.65 samples/sec   Loss 4.3395   LearningRate 0.0101   Epoch: 13   Global Step: 566260   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:21:37,330-Speed 1031.95 samples/sec   Loss 4.3189   LearningRate 0.0101   Epoch: 13   Global Step: 566270   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:21:41,212-Speed 2638.47 samples/sec   Loss 4.2033   LearningRate 0.0101   Epoch: 13   Global Step: 566280   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:21:45,140-Speed 2607.79 samples/sec   Loss 4.3342   LearningRate 0.0101   Epoch: 13   Global Step: 566290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:21:49,038-Speed 2627.61 samples/sec   Loss 4.2995   LearningRate 0.0101   Epoch: 13   Global Step: 566300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:21:52,947-Speed 2620.52 samples/sec   Loss 4.3253   LearningRate 0.0101   Epoch: 13   Global Step: 566310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:21:56,841-Speed 2630.59 samples/sec   Loss 4.2340   LearningRate 0.0101   Epoch: 13   Global Step: 566320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:12,639-Speed 648.26 samples/sec   Loss 4.1861   LearningRate 0.0101   Epoch: 13   Global Step: 566330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:16,528-Speed 2633.77 samples/sec   Loss 4.2985   LearningRate 0.0101   Epoch: 13   Global Step: 566340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:20,443-Speed 2616.59 samples/sec   Loss 4.2652   LearningRate 0.0101   Epoch: 13   Global Step: 566350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:24,354-Speed 2619.22 samples/sec   Loss 4.2769   LearningRate 0.0101   Epoch: 13   Global Step: 566360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:28,278-Speed 2610.40 samples/sec   Loss 4.3246   LearningRate 0.0101   Epoch: 13   Global Step: 566370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:32,181-Speed 2624.29 samples/sec   Loss 4.3092   LearningRate 0.0101   Epoch: 13   Global Step: 566380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:36,085-Speed 2623.35 samples/sec   Loss 4.4216   LearningRate 0.0101   Epoch: 13   Global Step: 566390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:22:39,961-Speed 2642.69 samples/sec   Loss 4.4103   LearningRate 0.0101   Epoch: 13   Global Step: 566400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:43,874-Speed 2617.25 samples/sec   Loss 4.2662   LearningRate 0.0101   Epoch: 13   Global Step: 566410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:47,779-Speed 2623.53 samples/sec   Loss 4.2538   LearningRate 0.0101   Epoch: 13   Global Step: 566420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:51,683-Speed 2623.86 samples/sec   Loss 4.3767   LearningRate 0.0101   Epoch: 13   Global Step: 566430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:55,597-Speed 2616.55 samples/sec   Loss 4.3010   LearningRate 0.0101   Epoch: 13   Global Step: 566440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:22:59,491-Speed 2630.24 samples/sec   Loss 4.2176   LearningRate 0.0101   Epoch: 13   Global Step: 566450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:03,449-Speed 2587.32 samples/sec   Loss 4.2249   LearningRate 0.0101   Epoch: 13   Global Step: 566460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:07,523-Speed 2514.78 samples/sec   Loss 4.2679   LearningRate 0.0101   Epoch: 13   Global Step: 566470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:11,427-Speed 2623.48 samples/sec   Loss 4.2630   LearningRate 0.0101   Epoch: 13   Global Step: 566480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:15,359-Speed 2605.62 samples/sec   Loss 4.3250   LearningRate 0.0101   Epoch: 13   Global Step: 566490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:19,263-Speed 2623.65 samples/sec   Loss 4.2632   LearningRate 0.0101   Epoch: 13   Global Step: 566500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:23:23,162-Speed 2627.52 samples/sec   Loss 4.2941   LearningRate 0.0101   Epoch: 13   Global Step: 566510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:27,072-Speed 2619.40 samples/sec   Loss 4.1442   LearningRate 0.0101   Epoch: 13   Global Step: 566520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:30,988-Speed 2615.70 samples/sec   Loss 4.2475   LearningRate 0.0101   Epoch: 13   Global Step: 566530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:34,897-Speed 2620.02 samples/sec   Loss 4.2174   LearningRate 0.0101   Epoch: 13   Global Step: 566540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:38,799-Speed 2625.93 samples/sec   Loss 4.3204   LearningRate 0.0101   Epoch: 13   Global Step: 566550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:42,703-Speed 2624.44 samples/sec   Loss 4.2546   LearningRate 0.0101   Epoch: 13   Global Step: 566560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:46,602-Speed 2626.76 samples/sec   Loss 4.2615   LearningRate 0.0101   Epoch: 13   Global Step: 566570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:50,501-Speed 2627.36 samples/sec   Loss 4.3738   LearningRate 0.0101   Epoch: 13   Global Step: 566580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:54,517-Speed 2550.60 samples/sec   Loss 4.2939   LearningRate 0.0100   Epoch: 13   Global Step: 566590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:23:58,411-Speed 2630.25 samples/sec   Loss 4.2352   LearningRate 0.0100   Epoch: 13   Global Step: 566600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:02,307-Speed 2628.68 samples/sec   Loss 4.3577   LearningRate 0.0100   Epoch: 13   Global Step: 566610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:24:06,205-Speed 2627.46 samples/sec   Loss 4.3015   LearningRate 0.0100   Epoch: 13   Global Step: 566620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:24:10,113-Speed 2621.21 samples/sec   Loss 4.4214   LearningRate 0.0100   Epoch: 13   Global Step: 566630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:24:13,988-Speed 2644.18 samples/sec   Loss 4.2795   LearningRate 0.0100   Epoch: 13   Global Step: 566640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:17,895-Speed 2621.11 samples/sec   Loss 4.2436   LearningRate 0.0100   Epoch: 13   Global Step: 566650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:21,878-Speed 2572.10 samples/sec   Loss 4.1525   LearningRate 0.0100   Epoch: 13   Global Step: 566660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:25,798-Speed 2612.71 samples/sec   Loss 4.2684   LearningRate 0.0100   Epoch: 13   Global Step: 566670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:29,696-Speed 2627.59 samples/sec   Loss 4.2040   LearningRate 0.0100   Epoch: 13   Global Step: 566680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:33,605-Speed 2619.98 samples/sec   Loss 4.1683   LearningRate 0.0100   Epoch: 13   Global Step: 566690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:37,502-Speed 2628.57 samples/sec   Loss 4.3284   LearningRate 0.0100   Epoch: 13   Global Step: 566700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:41,403-Speed 2625.02 samples/sec   Loss 4.3058   LearningRate 0.0100   Epoch: 13   Global Step: 566710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:45,302-Speed 2627.78 samples/sec   Loss 4.4157   LearningRate 0.0100   Epoch: 13   Global Step: 566720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:24:49,175-Speed 2644.70 samples/sec   Loss 4.2423   LearningRate 0.0100   Epoch: 13   Global Step: 566730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:24:53,084-Speed 2620.19 samples/sec   Loss 4.4196   LearningRate 0.0100   Epoch: 13   Global Step: 566740   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:24:56,982-Speed 2627.56 samples/sec   Loss 4.3173   LearningRate 0.0100   Epoch: 13   Global Step: 566750   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:00,880-Speed 2627.23 samples/sec   Loss 4.2203   LearningRate 0.0100   Epoch: 13   Global Step: 566760   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:04,933-Speed 2527.06 samples/sec   Loss 4.2132   LearningRate 0.0100   Epoch: 13   Global Step: 566770   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:08,858-Speed 2610.03 samples/sec   Loss 4.1964   LearningRate 0.0100   Epoch: 13   Global Step: 566780   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:12,761-Speed 2624.42 samples/sec   Loss 4.2255   LearningRate 0.0100   Epoch: 13   Global Step: 566790   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:16,661-Speed 2626.21 samples/sec   Loss 4.3209   LearningRate 0.0100   Epoch: 13   Global Step: 566800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:20,569-Speed 2620.99 samples/sec   Loss 4.4032   LearningRate 0.0100   Epoch: 13   Global Step: 566810   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:24,531-Speed 2585.49 samples/sec   Loss 4.2115   LearningRate 0.0100   Epoch: 13   Global Step: 566820   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:28,440-Speed 2620.45 samples/sec   Loss 4.1734   LearningRate 0.0100   Epoch: 13   Global Step: 566830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:25:32,357-Speed 2615.07 samples/sec   Loss 4.2106   LearningRate 0.0100   Epoch: 13   Global Step: 566840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:25:36,278-Speed 2612.19 samples/sec   Loss 4.2948   LearningRate 0.0100   Epoch: 13   Global Step: 566850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:25:40,248-Speed 2579.45 samples/sec   Loss 4.2384   LearningRate 0.0100   Epoch: 13   Global Step: 566860   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:44,148-Speed 2626.88 samples/sec   Loss 4.2705   LearningRate 0.0100   Epoch: 13   Global Step: 566870   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:48,319-Speed 2455.64 samples/sec   Loss 4.2984   LearningRate 0.0100   Epoch: 13   Global Step: 566880   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:52,261-Speed 2598.29 samples/sec   Loss 4.2939   LearningRate 0.0100   Epoch: 13   Global Step: 566890   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:25:56,167-Speed 2622.17 samples/sec   Loss 4.3149   LearningRate 0.0100   Epoch: 13   Global Step: 566900   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:26:00,075-Speed 2621.10 samples/sec   Loss 4.3001   LearningRate 0.0100   Epoch: 13   Global Step: 566910   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:26:03,976-Speed 2625.92 samples/sec   Loss 4.2875   LearningRate 0.0100   Epoch: 13   Global Step: 566920   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:26:07,880-Speed 2623.27 samples/sec   Loss 4.2477   LearningRate 0.0100   Epoch: 13   Global Step: 566930   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:26:11,790-Speed 2619.65 samples/sec   Loss 4.3168   LearningRate 0.0100   Epoch: 13   Global Step: 566940   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:26:15,700-Speed 2619.22 samples/sec   Loss 4.2825   LearningRate 0.0100   Epoch: 13   Global Step: 566950   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-15 11:26:19,602-Speed 2624.89 samples/sec   Loss 4.3789   LearningRate 0.0100   Epoch: 13   Global Step: 566960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:26:23,501-Speed 2626.73 samples/sec   Loss 4.3498   LearningRate 0.0100   Epoch: 13   Global Step: 566970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:26:51,980-Speed 359.59 samples/sec   Loss 4.2229   LearningRate 0.0100   Epoch: 13   Global Step: 566980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:26:55,871-Speed 2633.06 samples/sec   Loss 4.2657   LearningRate 0.0100   Epoch: 13   Global Step: 566990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:26:59,749-Speed 2641.06 samples/sec   Loss 4.2439   LearningRate 0.0100   Epoch: 13   Global Step: 567000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:27:03,636-Speed 2635.06 samples/sec   Loss 4.3280   LearningRate 0.0100   Epoch: 13   Global Step: 567010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:27:07,531-Speed 2629.54 samples/sec   Loss 4.2440   LearningRate 0.0100   Epoch: 13   Global Step: 567020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:27:11,427-Speed 2628.96 samples/sec   Loss 4.2918   LearningRate 0.0100   Epoch: 13   Global Step: 567030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:27:15,312-Speed 2636.47 samples/sec   Loss 4.3791   LearningRate 0.0100   Epoch: 13   Global Step: 567040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:27:19,204-Speed 2631.92 samples/sec   Loss 4.2751   LearningRate 0.0100   Epoch: 13   Global Step: 567050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-15 11:27:23,108-Speed 2623.15 samples/sec   Loss 4.1886   LearningRate 0.0100   Epoch: 13   Global Step: 567060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-15 11:27:27,008-Speed 2626.85 samples/sec   Loss 4.3403   LearningRate 0.0100   Epoch: 13   Global Step: 567070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:27:30,883-Speed 2642.39 samples/sec   Loss 4.2936   LearningRate 0.0100   Epoch: 13   Global Step: 567080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:27:34,781-Speed 2627.65 samples/sec   Loss 4.3166   LearningRate 0.0100   Epoch: 13   Global Step: 567090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:27:38,680-Speed 2627.47 samples/sec   Loss 4.2330   LearningRate 0.0100   Epoch: 13   Global Step: 567100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:27:42,592-Speed 2618.32 samples/sec   Loss 4.3023   LearningRate 0.0100   Epoch: 13   Global Step: 567110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:27:46,502-Speed 2619.11 samples/sec   Loss 4.3109   LearningRate 0.0100   Epoch: 13   Global Step: 567120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:27:50,516-Speed 2552.07 samples/sec   Loss 4.2076   LearningRate 0.0100   Epoch: 13   Global Step: 567130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:27:54,414-Speed 2627.40 samples/sec   Loss 4.2855   LearningRate 0.0100   Epoch: 13   Global Step: 567140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:27:58,324-Speed 2621.51 samples/sec   Loss 4.3495   LearningRate 0.0100   Epoch: 13   Global Step: 567150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:28:02,246-Speed 2610.86 samples/sec   Loss 4.2400   LearningRate 0.0100   Epoch: 13   Global Step: 567160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:28:06,173-Speed 2608.34 samples/sec   Loss 4.2291   LearningRate 0.0100   Epoch: 13   Global Step: 567170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:28:10,091-Speed 2614.44 samples/sec   Loss 4.3447   LearningRate 0.0100   Epoch: 13   Global Step: 567180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:28:14,008-Speed 2615.30 samples/sec   Loss 4.2975   LearningRate 0.0100   Epoch: 13   Global Step: 567190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:28:17,916-Speed 2620.87 samples/sec   Loss 4.2265   LearningRate 0.0100   Epoch: 13   Global Step: 567200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:28:21,797-Speed 2639.06 samples/sec   Loss 4.3342   LearningRate 0.0100   Epoch: 13   Global Step: 567210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:28:25,690-Speed 2630.51 samples/sec   Loss 4.3683   LearningRate 0.0100   Epoch: 13   Global Step: 567220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:29,597-Speed 2622.18 samples/sec   Loss 4.2917   LearningRate 0.0100   Epoch: 13   Global Step: 567230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:33,512-Speed 2616.04 samples/sec   Loss 4.3066   LearningRate 0.0100   Epoch: 13   Global Step: 567240   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:37,423-Speed 2619.32 samples/sec   Loss 4.2023   LearningRate 0.0100   Epoch: 13   Global Step: 567250   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:41,360-Speed 2600.90 samples/sec   Loss 4.2719   LearningRate 0.0100   Epoch: 13   Global Step: 567260   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:45,276-Speed 2616.20 samples/sec   Loss 4.2640   LearningRate 0.0100   Epoch: 13   Global Step: 567270   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:49,182-Speed 2621.65 samples/sec   Loss 4.3503   LearningRate 0.0100   Epoch: 13   Global Step: 567280   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:53,103-Speed 2612.25 samples/sec   Loss 4.1948   LearningRate 0.0100   Epoch: 13   Global Step: 567290   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:28:57,011-Speed 2621.26 samples/sec   Loss 4.2399   LearningRate 0.0100   Epoch: 13   Global Step: 567300   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:29:00,914-Speed 2624.02 samples/sec   Loss 4.2513   LearningRate 0.0100   Epoch: 13   Global Step: 567310   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:29:04,812-Speed 2627.70 samples/sec   Loss 4.2418   LearningRate 0.0100   Epoch: 13   Global Step: 567320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:08,715-Speed 2624.12 samples/sec   Loss 4.2733   LearningRate 0.0100   Epoch: 13   Global Step: 567330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:12,611-Speed 2629.07 samples/sec   Loss 4.2486   LearningRate 0.0100   Epoch: 13   Global Step: 567340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:16,511-Speed 2626.10 samples/sec   Loss 4.2802   LearningRate 0.0100   Epoch: 13   Global Step: 567350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:20,407-Speed 2628.73 samples/sec   Loss 4.2999   LearningRate 0.0100   Epoch: 13   Global Step: 567360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:24,306-Speed 2627.02 samples/sec   Loss 4.2960   LearningRate 0.0100   Epoch: 13   Global Step: 567370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:28,203-Speed 2628.66 samples/sec   Loss 4.3109   LearningRate 0.0100   Epoch: 13   Global Step: 567380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:32,108-Speed 2622.38 samples/sec   Loss 4.2280   LearningRate 0.0100   Epoch: 13   Global Step: 567390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:36,019-Speed 2618.91 samples/sec   Loss 4.2712   LearningRate 0.0100   Epoch: 13   Global Step: 567400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:39,914-Speed 2629.22 samples/sec   Loss 4.3027   LearningRate 0.0100   Epoch: 13   Global Step: 567410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:29:43,796-Speed 2638.94 samples/sec   Loss 4.3092   LearningRate 0.0100   Epoch: 13   Global Step: 567420   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:29:47,697-Speed 2625.86 samples/sec   Loss 4.2690   LearningRate 0.0100   Epoch: 13   Global Step: 567430   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:29:51,595-Speed 2628.00 samples/sec   Loss 4.2654   LearningRate 0.0100   Epoch: 13   Global Step: 567440   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:29:55,495-Speed 2625.97 samples/sec   Loss 4.2423   LearningRate 0.0100   Epoch: 13   Global Step: 567450   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:29:59,402-Speed 2621.24 samples/sec   Loss 4.3110   LearningRate 0.0100   Epoch: 13   Global Step: 567460   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:30:03,306-Speed 2623.74 samples/sec   Loss 4.4246   LearningRate 0.0100   Epoch: 13   Global Step: 567470   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:30:07,307-Speed 2559.87 samples/sec   Loss 4.2403   LearningRate 0.0100   Epoch: 13   Global Step: 567480   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:30:11,401-Speed 2501.59 samples/sec   Loss 4.3546   LearningRate 0.0100   Epoch: 13   Global Step: 567490   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:30:15,302-Speed 2625.22 samples/sec   Loss 4.3056   LearningRate 0.0100   Epoch: 13   Global Step: 567500   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:30:19,201-Speed 2627.70 samples/sec   Loss 4.3428   LearningRate 0.0100   Epoch: 13   Global Step: 567510   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:30:23,096-Speed 2629.97 samples/sec   Loss 4.3561   LearningRate 0.0100   Epoch: 13   Global Step: 567520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:26,989-Speed 2630.97 samples/sec   Loss 4.2013   LearningRate 0.0100   Epoch: 13   Global Step: 567530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:30,885-Speed 2628.67 samples/sec   Loss 4.2783   LearningRate 0.0100   Epoch: 13   Global Step: 567540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:34,789-Speed 2623.71 samples/sec   Loss 4.3564   LearningRate 0.0100   Epoch: 13   Global Step: 567550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:38,686-Speed 2628.21 samples/sec   Loss 4.3753   LearningRate 0.0100   Epoch: 13   Global Step: 567560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:42,588-Speed 2624.84 samples/sec   Loss 4.2789   LearningRate 0.0100   Epoch: 13   Global Step: 567570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:46,495-Speed 2621.52 samples/sec   Loss 4.2835   LearningRate 0.0100   Epoch: 13   Global Step: 567580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:50,395-Speed 2626.89 samples/sec   Loss 4.2938   LearningRate 0.0100   Epoch: 13   Global Step: 567590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:54,294-Speed 2626.63 samples/sec   Loss 4.2099   LearningRate 0.0100   Epoch: 13   Global Step: 567600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:30:58,202-Speed 2621.47 samples/sec   Loss 4.2753   LearningRate 0.0100   Epoch: 13   Global Step: 567610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:02,102-Speed 2625.96 samples/sec   Loss 4.3516   LearningRate 0.0100   Epoch: 13   Global Step: 567620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:31:06,006-Speed 2623.58 samples/sec   Loss 4.2898   LearningRate 0.0100   Epoch: 13   Global Step: 567630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:31:09,916-Speed 2619.67 samples/sec   Loss 4.1590   LearningRate 0.0100   Epoch: 13   Global Step: 567640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:31:13,806-Speed 2632.68 samples/sec   Loss 4.2392   LearningRate 0.0100   Epoch: 13   Global Step: 567650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:17,708-Speed 2624.77 samples/sec   Loss 4.2544   LearningRate 0.0100   Epoch: 13   Global Step: 567660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:21,607-Speed 2627.28 samples/sec   Loss 4.2345   LearningRate 0.0100   Epoch: 13   Global Step: 567670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:25,521-Speed 2616.83 samples/sec   Loss 4.2098   LearningRate 0.0100   Epoch: 13   Global Step: 567680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:29,440-Speed 2613.47 samples/sec   Loss 4.2105   LearningRate 0.0100   Epoch: 13   Global Step: 567690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:33,352-Speed 2618.07 samples/sec   Loss 4.1927   LearningRate 0.0100   Epoch: 13   Global Step: 567700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:37,251-Speed 2626.86 samples/sec   Loss 4.1981   LearningRate 0.0100   Epoch: 13   Global Step: 567710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:41,151-Speed 2626.13 samples/sec   Loss 4.2783   LearningRate 0.0100   Epoch: 13   Global Step: 567720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:45,051-Speed 2626.84 samples/sec   Loss 4.1900   LearningRate 0.0100   Epoch: 13   Global Step: 567730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:48,949-Speed 2627.99 samples/sec   Loss 4.2287   LearningRate 0.0100   Epoch: 13   Global Step: 567740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:31:52,848-Speed 2626.82 samples/sec   Loss 4.3026   LearningRate 0.0100   Epoch: 13   Global Step: 567750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:31:56,752-Speed 2624.08 samples/sec   Loss 4.1967   LearningRate 0.0100   Epoch: 13   Global Step: 567760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:32:00,728-Speed 2575.74 samples/sec   Loss 4.2643   LearningRate 0.0100   Epoch: 13   Global Step: 567770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:32:04,628-Speed 2626.03 samples/sec   Loss 4.2866   LearningRate 0.0100   Epoch: 13   Global Step: 567780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:32:08,530-Speed 2624.96 samples/sec   Loss 4.2104   LearningRate 0.0100   Epoch: 13   Global Step: 567790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:32:12,420-Speed 2633.94 samples/sec   Loss 4.2230   LearningRate 0.0100   Epoch: 13   Global Step: 567800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:32:16,324-Speed 2623.78 samples/sec   Loss 4.3228   LearningRate 0.0100   Epoch: 13   Global Step: 567810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:32:20,229-Speed 2622.50 samples/sec   Loss 4.2564   LearningRate 0.0100   Epoch: 13   Global Step: 567820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:32:24,135-Speed 2622.73 samples/sec   Loss 4.2139   LearningRate 0.0100   Epoch: 13   Global Step: 567830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:32:28,028-Speed 2630.93 samples/sec   Loss 4.2774   LearningRate 0.0100   Epoch: 13   Global Step: 567840   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:32:32,020-Speed 2591.11 samples/sec   Loss 4.2285   LearningRate 0.0100   Epoch: 13   Global Step: 567850   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:32:36,022-Speed 2558.98 samples/sec   Loss 4.2725   LearningRate 0.0100   Epoch: 13   Global Step: 567860   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:32:39,933-Speed 2619.10 samples/sec   Loss 4.2605   LearningRate 0.0100   Epoch: 13   Global Step: 567870   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:32:43,830-Speed 2628.23 samples/sec   Loss 4.3190   LearningRate 0.0100   Epoch: 13   Global Step: 567880   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:32:51,119-Speed 2601.64 samples/sec   Loss 4.2625   LearningRate 0.0100   Epoch: 13   Global Step: 567890   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:32:55,632-Speed 2635.29 samples/sec   Loss 4.2282   LearningRate 0.0100   Epoch: 13   Global Step: 567900   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:32:59,562-Speed 2606.36 samples/sec   Loss 4.3012   LearningRate 0.0099   Epoch: 13   Global Step: 567910   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:33:03,497-Speed 2630.36 samples/sec   Loss 4.3199   LearningRate 0.0099   Epoch: 13   Global Step: 567920   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:33:07,405-Speed 2621.25 samples/sec   Loss 4.3269   LearningRate 0.0099   Epoch: 13   Global Step: 567930   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:33:11,304-Speed 2626.24 samples/sec   Loss 4.2576   LearningRate 0.0099   Epoch: 13   Global Step: 567940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:15,204-Speed 2626.57 samples/sec   Loss 4.1916   LearningRate 0.0099   Epoch: 13   Global Step: 567950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:19,233-Speed 2619.39 samples/sec   Loss 4.1610   LearningRate 0.0099   Epoch: 13   Global Step: 567960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:23,144-Speed 2619.41 samples/sec   Loss 4.2716   LearningRate 0.0099   Epoch: 13   Global Step: 567970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:27,043-Speed 2626.99 samples/sec   Loss 4.2246   LearningRate 0.0099   Epoch: 13   Global Step: 567980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:30,938-Speed 2629.71 samples/sec   Loss 4.3178   LearningRate 0.0099   Epoch: 13   Global Step: 567990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:34,848-Speed 2619.50 samples/sec   Loss 4.2299   LearningRate 0.0099   Epoch: 13   Global Step: 568000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:38,743-Speed 2630.04 samples/sec   Loss 4.1456   LearningRate 0.0099   Epoch: 13   Global Step: 568010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:42,649-Speed 2622.25 samples/sec   Loss 4.2616   LearningRate 0.0099   Epoch: 13   Global Step: 568020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:46,543-Speed 2630.11 samples/sec   Loss 4.2372   LearningRate 0.0099   Epoch: 13   Global Step: 568030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:50,438-Speed 2629.46 samples/sec   Loss 4.2033   LearningRate 0.0099   Epoch: 13   Global Step: 568040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:33:54,317-Speed 2641.01 samples/sec   Loss 4.2686   LearningRate 0.0099   Epoch: 13   Global Step: 568050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:33:58,214-Speed 2628.25 samples/sec   Loss 4.2735   LearningRate 0.0099   Epoch: 13   Global Step: 568060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:34:02,112-Speed 2627.61 samples/sec   Loss 4.3098   LearningRate 0.0099   Epoch: 13   Global Step: 568070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:34:06,007-Speed 2629.72 samples/sec   Loss 4.2336   LearningRate 0.0099   Epoch: 13   Global Step: 568080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:34:09,914-Speed 2621.08 samples/sec   Loss 4.1708   LearningRate 0.0099   Epoch: 13   Global Step: 568090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:34:13,801-Speed 2639.65 samples/sec   Loss 4.2761   LearningRate 0.0099   Epoch: 13   Global Step: 568100   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:17,694-Speed 2630.89 samples/sec   Loss 4.3388   LearningRate 0.0099   Epoch: 13   Global Step: 568110   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:21,603-Speed 2620.18 samples/sec   Loss 4.2540   LearningRate 0.0099   Epoch: 13   Global Step: 568120   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:25,504-Speed 2625.53 samples/sec   Loss 4.1993   LearningRate 0.0099   Epoch: 13   Global Step: 568130   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:29,407-Speed 2624.66 samples/sec   Loss 4.1907   LearningRate 0.0099   Epoch: 13   Global Step: 568140   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:33,314-Speed 2621.88 samples/sec   Loss 4.2485   LearningRate 0.0099   Epoch: 13   Global Step: 568150   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:37,212-Speed 2627.48 samples/sec   Loss 4.2169   LearningRate 0.0099   Epoch: 13   Global Step: 568160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:41,104-Speed 2631.18 samples/sec   Loss 4.3823   LearningRate 0.0099   Epoch: 13   Global Step: 568170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:45,011-Speed 2622.06 samples/sec   Loss 4.2881   LearningRate 0.0099   Epoch: 13   Global Step: 568180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:48,908-Speed 2628.13 samples/sec   Loss 4.1968   LearningRate 0.0099   Epoch: 13   Global Step: 568190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:34:52,809-Speed 2626.39 samples/sec   Loss 4.3572   LearningRate 0.0099   Epoch: 13   Global Step: 568200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:34:56,706-Speed 2628.35 samples/sec   Loss 4.2655   LearningRate 0.0099   Epoch: 13   Global Step: 568210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:00,599-Speed 2630.86 samples/sec   Loss 4.2250   LearningRate 0.0099   Epoch: 13   Global Step: 568220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:04,495-Speed 2628.79 samples/sec   Loss 4.1819   LearningRate 0.0099   Epoch: 13   Global Step: 568230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:08,395-Speed 2626.52 samples/sec   Loss 4.1723   LearningRate 0.0099   Epoch: 13   Global Step: 568240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:12,473-Speed 2511.89 samples/sec   Loss 4.1593   LearningRate 0.0099   Epoch: 13   Global Step: 568250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:16,383-Speed 2619.55 samples/sec   Loss 4.2929   LearningRate 0.0099   Epoch: 13   Global Step: 568260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:20,325-Speed 2598.56 samples/sec   Loss 4.2639   LearningRate 0.0099   Epoch: 13   Global Step: 568270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:24,251-Speed 2609.00 samples/sec   Loss 4.3195   LearningRate 0.0099   Epoch: 13   Global Step: 568280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:28,147-Speed 2628.96 samples/sec   Loss 4.2701   LearningRate 0.0099   Epoch: 13   Global Step: 568290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:32,028-Speed 2639.20 samples/sec   Loss 4.2978   LearningRate 0.0099   Epoch: 13   Global Step: 568300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:35,931-Speed 2624.04 samples/sec   Loss 4.2626   LearningRate 0.0099   Epoch: 13   Global Step: 568310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:39,829-Speed 2627.46 samples/sec   Loss 4.2299   LearningRate 0.0099   Epoch: 13   Global Step: 568320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:43,728-Speed 2629.38 samples/sec   Loss 4.2190   LearningRate 0.0099   Epoch: 13   Global Step: 568330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:47,634-Speed 2622.57 samples/sec   Loss 4.1842   LearningRate 0.0099   Epoch: 13   Global Step: 568340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:51,527-Speed 2630.71 samples/sec   Loss 4.1880   LearningRate 0.0099   Epoch: 13   Global Step: 568350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:55,426-Speed 2627.24 samples/sec   Loss 4.1993   LearningRate 0.0099   Epoch: 13   Global Step: 568360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:35:59,323-Speed 2628.40 samples/sec   Loss 4.2959   LearningRate 0.0099   Epoch: 13   Global Step: 568370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:03,225-Speed 2625.03 samples/sec   Loss 4.1891   LearningRate 0.0099   Epoch: 13   Global Step: 568380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:07,122-Speed 2628.39 samples/sec   Loss 4.1191   LearningRate 0.0099   Epoch: 13   Global Step: 568390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:11,037-Speed 2616.02 samples/sec   Loss 4.2232   LearningRate 0.0099   Epoch: 13   Global Step: 568400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:36:14,937-Speed 2626.34 samples/sec   Loss 4.3561   LearningRate 0.0099   Epoch: 13   Global Step: 568410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:36:18,830-Speed 2631.38 samples/sec   Loss 4.2386   LearningRate 0.0099   Epoch: 13   Global Step: 568420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:22,727-Speed 2628.09 samples/sec   Loss 4.1929   LearningRate 0.0099   Epoch: 13   Global Step: 568430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:26,629-Speed 2625.11 samples/sec   Loss 4.3193   LearningRate 0.0099   Epoch: 13   Global Step: 568440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:30,545-Speed 2615.92 samples/sec   Loss 4.2064   LearningRate 0.0099   Epoch: 13   Global Step: 568450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:34,441-Speed 2628.82 samples/sec   Loss 4.2665   LearningRate 0.0099   Epoch: 13   Global Step: 568460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:38,337-Speed 2628.74 samples/sec   Loss 4.2775   LearningRate 0.0099   Epoch: 13   Global Step: 568470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:42,247-Speed 2619.83 samples/sec   Loss 4.2425   LearningRate 0.0099   Epoch: 13   Global Step: 568480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:36:46,123-Speed 2642.68 samples/sec   Loss 4.3106   LearningRate 0.0099   Epoch: 13   Global Step: 568490   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:36:50,022-Speed 2627.16 samples/sec   Loss 4.1913   LearningRate 0.0099   Epoch: 13   Global Step: 568500   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:36:53,923-Speed 2625.95 samples/sec   Loss 4.2207   LearningRate 0.0099   Epoch: 13   Global Step: 568510   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:36:57,822-Speed 2627.22 samples/sec   Loss 4.3014   LearningRate 0.0099   Epoch: 13   Global Step: 568520   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:37:01,720-Speed 2627.47 samples/sec   Loss 4.1242   LearningRate 0.0099   Epoch: 13   Global Step: 568530   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:37:05,615-Speed 2629.18 samples/sec   Loss 4.3632   LearningRate 0.0099   Epoch: 13   Global Step: 568540   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:37:09,517-Speed 2624.70 samples/sec   Loss 4.2556   LearningRate 0.0099   Epoch: 13   Global Step: 568550   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:37:13,444-Speed 2609.03 samples/sec   Loss 4.1986   LearningRate 0.0099   Epoch: 13   Global Step: 568560   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:37:17,677-Speed 2419.59 samples/sec   Loss 4.2594   LearningRate 0.0099   Epoch: 13   Global Step: 568570   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:37:21,584-Speed 2621.33 samples/sec   Loss 4.2803   LearningRate 0.0099   Epoch: 13   Global Step: 568580   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:37:25,488-Speed 2624.35 samples/sec   Loss 4.0972   LearningRate 0.0099   Epoch: 13   Global Step: 568590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:29,389-Speed 2625.45 samples/sec   Loss 4.2731   LearningRate 0.0099   Epoch: 13   Global Step: 568600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:33,287-Speed 2627.51 samples/sec   Loss 4.3176   LearningRate 0.0099   Epoch: 13   Global Step: 568610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:37,195-Speed 2620.55 samples/sec   Loss 4.2478   LearningRate 0.0099   Epoch: 13   Global Step: 568620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:41,103-Speed 2620.66 samples/sec   Loss 4.2513   LearningRate 0.0099   Epoch: 13   Global Step: 568630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:45,000-Speed 2629.06 samples/sec   Loss 4.2758   LearningRate 0.0099   Epoch: 13   Global Step: 568640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:48,901-Speed 2625.94 samples/sec   Loss 4.3063   LearningRate 0.0099   Epoch: 13   Global Step: 568650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:52,845-Speed 2597.00 samples/sec   Loss 4.2151   LearningRate 0.0099   Epoch: 13   Global Step: 568660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:37:56,752-Speed 2627.38 samples/sec   Loss 4.1884   LearningRate 0.0099   Epoch: 13   Global Step: 568670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:00,708-Speed 2589.11 samples/sec   Loss 4.2449   LearningRate 0.0099   Epoch: 13   Global Step: 568680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:04,627-Speed 2613.55 samples/sec   Loss 4.2753   LearningRate 0.0099   Epoch: 13   Global Step: 568690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:38:08,564-Speed 2601.37 samples/sec   Loss 4.2924   LearningRate 0.0099   Epoch: 13   Global Step: 568700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:12,564-Speed 2561.32 samples/sec   Loss 4.2871   LearningRate 0.0099   Epoch: 13   Global Step: 568710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:16,465-Speed 2625.50 samples/sec   Loss 4.2188   LearningRate 0.0099   Epoch: 13   Global Step: 568720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:20,372-Speed 2621.89 samples/sec   Loss 4.1758   LearningRate 0.0099   Epoch: 13   Global Step: 568730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:24,284-Speed 2618.16 samples/sec   Loss 4.2626   LearningRate 0.0099   Epoch: 13   Global Step: 568740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:28,187-Speed 2624.36 samples/sec   Loss 4.2733   LearningRate 0.0099   Epoch: 13   Global Step: 568750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:32,089-Speed 2625.50 samples/sec   Loss 4.2337   LearningRate 0.0099   Epoch: 13   Global Step: 568760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:35,990-Speed 2625.74 samples/sec   Loss 4.2360   LearningRate 0.0099   Epoch: 13   Global Step: 568770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:39,892-Speed 2625.04 samples/sec   Loss 4.1790   LearningRate 0.0099   Epoch: 13   Global Step: 568780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:43,785-Speed 2630.58 samples/sec   Loss 4.2113   LearningRate 0.0099   Epoch: 13   Global Step: 568790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:38:47,701-Speed 2615.91 samples/sec   Loss 4.2108   LearningRate 0.0099   Epoch: 13   Global Step: 568800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:38:51,595-Speed 2630.13 samples/sec   Loss 4.3491   LearningRate 0.0099   Epoch: 13   Global Step: 568810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:38:55,491-Speed 2629.53 samples/sec   Loss 4.3192   LearningRate 0.0099   Epoch: 13   Global Step: 568820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:38:59,400-Speed 2620.35 samples/sec   Loss 4.2423   LearningRate 0.0099   Epoch: 13   Global Step: 568830   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:39:03,294-Speed 2630.32 samples/sec   Loss 4.2999   LearningRate 0.0099   Epoch: 13   Global Step: 568840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:39:07,198-Speed 2623.82 samples/sec   Loss 4.2544   LearningRate 0.0099   Epoch: 13   Global Step: 568850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:39:11,084-Speed 2635.61 samples/sec   Loss 4.2119   LearningRate 0.0099   Epoch: 13   Global Step: 568860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:39:14,976-Speed 2631.71 samples/sec   Loss 4.3118   LearningRate 0.0099   Epoch: 13   Global Step: 568870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:39:18,867-Speed 2632.47 samples/sec   Loss 4.2173   LearningRate 0.0099   Epoch: 13   Global Step: 568880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:39:22,749-Speed 2638.21 samples/sec   Loss 4.2374   LearningRate 0.0099   Epoch: 13   Global Step: 568890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:26,660-Speed 2619.10 samples/sec   Loss 4.2272   LearningRate 0.0099   Epoch: 13   Global Step: 568900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:30,561-Speed 2625.64 samples/sec   Loss 4.2429   LearningRate 0.0099   Epoch: 13   Global Step: 568910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:34,455-Speed 2630.90 samples/sec   Loss 4.2833   LearningRate 0.0099   Epoch: 13   Global Step: 568920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:38,346-Speed 2632.61 samples/sec   Loss 4.2014   LearningRate 0.0099   Epoch: 13   Global Step: 568930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:42,263-Speed 2614.26 samples/sec   Loss 4.3269   LearningRate 0.0099   Epoch: 13   Global Step: 568940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:46,163-Speed 2626.74 samples/sec   Loss 4.3347   LearningRate 0.0099   Epoch: 13   Global Step: 568950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:50,054-Speed 2632.43 samples/sec   Loss 4.2684   LearningRate 0.0099   Epoch: 13   Global Step: 568960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:53,947-Speed 2631.43 samples/sec   Loss 4.2048   LearningRate 0.0099   Epoch: 13   Global Step: 568970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:39:57,849-Speed 2624.71 samples/sec   Loss 4.2410   LearningRate 0.0099   Epoch: 13   Global Step: 568980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:01,748-Speed 2627.39 samples/sec   Loss 4.3053   LearningRate 0.0099   Epoch: 13   Global Step: 568990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:40:05,626-Speed 2640.79 samples/sec   Loss 4.2702   LearningRate 0.0099   Epoch: 13   Global Step: 569000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:09,523-Speed 2628.59 samples/sec   Loss 4.2785   LearningRate 0.0099   Epoch: 13   Global Step: 569010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:13,418-Speed 2629.03 samples/sec   Loss 4.2097   LearningRate 0.0099   Epoch: 13   Global Step: 569020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:17,319-Speed 2626.28 samples/sec   Loss 4.2468   LearningRate 0.0099   Epoch: 13   Global Step: 569030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:21,215-Speed 2629.43 samples/sec   Loss 4.2646   LearningRate 0.0099   Epoch: 13   Global Step: 569040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:25,135-Speed 2612.62 samples/sec   Loss 4.3265   LearningRate 0.0099   Epoch: 13   Global Step: 569050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:29,042-Speed 2621.84 samples/sec   Loss 4.2711   LearningRate 0.0099   Epoch: 13   Global Step: 569060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:40:32,919-Speed 2641.91 samples/sec   Loss 4.2978   LearningRate 0.0099   Epoch: 13   Global Step: 569070   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:40:36,817-Speed 2627.74 samples/sec   Loss 4.2779   LearningRate 0.0099   Epoch: 13   Global Step: 569080   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:40:40,711-Speed 2630.36 samples/sec   Loss 4.2036   LearningRate 0.0099   Epoch: 13   Global Step: 569090   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:40:44,617-Speed 2622.95 samples/sec   Loss 4.2100   LearningRate 0.0099   Epoch: 13   Global Step: 569100   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:40:48,517-Speed 2626.30 samples/sec   Loss 4.2555   LearningRate 0.0099   Epoch: 13   Global Step: 569110   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:40:52,418-Speed 2625.72 samples/sec   Loss 4.2049   LearningRate 0.0099   Epoch: 13   Global Step: 569120   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:40:56,313-Speed 2629.07 samples/sec   Loss 4.2573   LearningRate 0.0099   Epoch: 13   Global Step: 569130   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:41:00,210-Speed 2628.15 samples/sec   Loss 4.2486   LearningRate 0.0099   Epoch: 13   Global Step: 569140   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:41:04,107-Speed 2628.06 samples/sec   Loss 4.1673   LearningRate 0.0099   Epoch: 13   Global Step: 569150   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:41:08,008-Speed 2626.33 samples/sec   Loss 4.2102   LearningRate 0.0099   Epoch: 13   Global Step: 569160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:41:11,908-Speed 2625.97 samples/sec   Loss 4.2075   LearningRate 0.0099   Epoch: 13   Global Step: 569170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:15,841-Speed 2605.07 samples/sec   Loss 4.1732   LearningRate 0.0099   Epoch: 13   Global Step: 569180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:19,747-Speed 2622.32 samples/sec   Loss 4.2793   LearningRate 0.0099   Epoch: 13   Global Step: 569190   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:23,643-Speed 2628.90 samples/sec   Loss 4.2668   LearningRate 0.0099   Epoch: 13   Global Step: 569200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:27,557-Speed 2617.43 samples/sec   Loss 4.1608   LearningRate 0.0099   Epoch: 13   Global Step: 569210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:31,453-Speed 2629.06 samples/sec   Loss 4.1791   LearningRate 0.0098   Epoch: 13   Global Step: 569220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:35,348-Speed 2629.88 samples/sec   Loss 4.2900   LearningRate 0.0098   Epoch: 13   Global Step: 569230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:39,253-Speed 2622.32 samples/sec   Loss 4.2840   LearningRate 0.0098   Epoch: 13   Global Step: 569240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:43,163-Speed 2620.01 samples/sec   Loss 4.2460   LearningRate 0.0098   Epoch: 13   Global Step: 569250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:47,058-Speed 2629.93 samples/sec   Loss 4.1656   LearningRate 0.0098   Epoch: 13   Global Step: 569260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:50,934-Speed 2642.46 samples/sec   Loss 4.2474   LearningRate 0.0098   Epoch: 13   Global Step: 569270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:41:54,818-Speed 2637.33 samples/sec   Loss 4.3207   LearningRate 0.0098   Epoch: 13   Global Step: 569280   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:41:58,712-Speed 2631.24 samples/sec   Loss 4.1344   LearningRate 0.0098   Epoch: 13   Global Step: 569290   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:02,608-Speed 2628.81 samples/sec   Loss 4.3330   LearningRate 0.0098   Epoch: 13   Global Step: 569300   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:06,514-Speed 2622.00 samples/sec   Loss 4.3004   LearningRate 0.0098   Epoch: 13   Global Step: 569310   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:10,420-Speed 2621.84 samples/sec   Loss 4.1128   LearningRate 0.0098   Epoch: 13   Global Step: 569320   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:14,324-Speed 2624.12 samples/sec   Loss 4.0833   LearningRate 0.0098   Epoch: 13   Global Step: 569330   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:18,229-Speed 2623.24 samples/sec   Loss 4.2073   LearningRate 0.0098   Epoch: 13   Global Step: 569340   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:22,145-Speed 2615.80 samples/sec   Loss 4.2474   LearningRate 0.0098   Epoch: 13   Global Step: 569350   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:26,070-Speed 2609.40 samples/sec   Loss 4.1257   LearningRate 0.0098   Epoch: 13   Global Step: 569360   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:30,097-Speed 2543.58 samples/sec   Loss 4.3166   LearningRate 0.0098   Epoch: 13   Global Step: 569370   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:34,057-Speed 2586.77 samples/sec   Loss 4.2014   LearningRate 0.0098   Epoch: 13   Global Step: 569380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:42:37,959-Speed 2624.83 samples/sec   Loss 4.2252   LearningRate 0.0098   Epoch: 13   Global Step: 569390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:42:41,885-Speed 2608.37 samples/sec   Loss 4.2329   LearningRate 0.0098   Epoch: 13   Global Step: 569400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:42:45,793-Speed 2621.49 samples/sec   Loss 4.2165   LearningRate 0.0098   Epoch: 13   Global Step: 569410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:42:49,687-Speed 2630.71 samples/sec   Loss 4.2039   LearningRate 0.0098   Epoch: 13   Global Step: 569420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:42:53,629-Speed 2598.21 samples/sec   Loss 4.2333   LearningRate 0.0098   Epoch: 13   Global Step: 569430   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:42:57,533-Speed 2624.48 samples/sec   Loss 4.2757   LearningRate 0.0098   Epoch: 13   Global Step: 569440   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:01,427-Speed 2630.20 samples/sec   Loss 4.2778   LearningRate 0.0098   Epoch: 13   Global Step: 569450   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:05,324-Speed 2627.93 samples/sec   Loss 4.2305   LearningRate 0.0098   Epoch: 13   Global Step: 569460   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:09,231-Speed 2622.08 samples/sec   Loss 4.2148   LearningRate 0.0098   Epoch: 13   Global Step: 569470   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:13,122-Speed 2632.18 samples/sec   Loss 4.2551   LearningRate 0.0098   Epoch: 13   Global Step: 569480   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:17,018-Speed 2628.96 samples/sec   Loss 4.1722   LearningRate 0.0098   Epoch: 13   Global Step: 569490   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:20,962-Speed 2596.81 samples/sec   Loss 4.1997   LearningRate 0.0098   Epoch: 13   Global Step: 569500   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:24,867-Speed 2623.64 samples/sec   Loss 4.2540   LearningRate 0.0098   Epoch: 13   Global Step: 569510   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:28,766-Speed 2632.81 samples/sec   Loss 4.2444   LearningRate 0.0098   Epoch: 13   Global Step: 569520   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:43:32,660-Speed 2629.99 samples/sec   Loss 4.2508   LearningRate 0.0098   Epoch: 13   Global Step: 569530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:43:36,554-Speed 2629.69 samples/sec   Loss 4.2778   LearningRate 0.0098   Epoch: 13   Global Step: 569540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:43:40,448-Speed 2630.52 samples/sec   Loss 4.2070   LearningRate 0.0098   Epoch: 13   Global Step: 569550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:43:44,343-Speed 2630.01 samples/sec   Loss 4.2590   LearningRate 0.0098   Epoch: 13   Global Step: 569560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:43:48,237-Speed 2630.36 samples/sec   Loss 4.2228   LearningRate 0.0098   Epoch: 13   Global Step: 569570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:43:52,135-Speed 2627.91 samples/sec   Loss 4.2507   LearningRate 0.0098   Epoch: 13   Global Step: 569580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:43:56,029-Speed 2630.38 samples/sec   Loss 4.1931   LearningRate 0.0098   Epoch: 13   Global Step: 569590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:43:59,922-Speed 2631.42 samples/sec   Loss 4.2861   LearningRate 0.0098   Epoch: 13   Global Step: 569600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:44:03,814-Speed 2631.68 samples/sec   Loss 4.2929   LearningRate 0.0098   Epoch: 13   Global Step: 569610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:44:07,711-Speed 2628.08 samples/sec   Loss 4.2570   LearningRate 0.0098   Epoch: 13   Global Step: 569620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:44:11,588-Speed 2641.62 samples/sec   Loss 4.2868   LearningRate 0.0098   Epoch: 13   Global Step: 569630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:44:15,483-Speed 2630.16 samples/sec   Loss 4.2915   LearningRate 0.0098   Epoch: 13   Global Step: 569640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:44:19,370-Speed 2635.39 samples/sec   Loss 4.3098   LearningRate 0.0098   Epoch: 13   Global Step: 569650   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:23,271-Speed 2624.90 samples/sec   Loss 4.1846   LearningRate 0.0098   Epoch: 13   Global Step: 569660   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:27,172-Speed 2626.49 samples/sec   Loss 4.2328   LearningRate 0.0098   Epoch: 13   Global Step: 569670   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:31,067-Speed 2629.45 samples/sec   Loss 4.2937   LearningRate 0.0098   Epoch: 13   Global Step: 569680   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:34,964-Speed 2628.85 samples/sec   Loss 4.2462   LearningRate 0.0098   Epoch: 13   Global Step: 569690   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:38,914-Speed 2593.28 samples/sec   Loss 4.2672   LearningRate 0.0098   Epoch: 13   Global Step: 569700   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:42,885-Speed 2579.03 samples/sec   Loss 4.2645   LearningRate 0.0098   Epoch: 13   Global Step: 569710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:46,785-Speed 2625.96 samples/sec   Loss 4.2556   LearningRate 0.0098   Epoch: 13   Global Step: 569720   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:50,689-Speed 2624.34 samples/sec   Loss 4.2562   LearningRate 0.0098   Epoch: 13   Global Step: 569730   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:54,587-Speed 2627.81 samples/sec   Loss 4.2477   LearningRate 0.0098   Epoch: 13   Global Step: 569740   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:44:58,506-Speed 2613.01 samples/sec   Loss 4.0756   LearningRate 0.0098   Epoch: 13   Global Step: 569750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:45:02,403-Speed 2628.52 samples/sec   Loss 4.1919   LearningRate 0.0098   Epoch: 13   Global Step: 569760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:45:06,305-Speed 2625.31 samples/sec   Loss 4.1634   LearningRate 0.0098   Epoch: 13   Global Step: 569770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:45:10,201-Speed 2628.93 samples/sec   Loss 4.2260   LearningRate 0.0098   Epoch: 13   Global Step: 569780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:45:14,097-Speed 2628.42 samples/sec   Loss 4.2517   LearningRate 0.0098   Epoch: 13   Global Step: 569790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:45:17,969-Speed 2645.99 samples/sec   Loss 4.2969   LearningRate 0.0098   Epoch: 13   Global Step: 569800   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:21,882-Speed 2617.16 samples/sec   Loss 4.1707   LearningRate 0.0098   Epoch: 13   Global Step: 569810   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:25,786-Speed 2627.10 samples/sec   Loss 4.2178   LearningRate 0.0098   Epoch: 13   Global Step: 569820   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:29,688-Speed 2624.42 samples/sec   Loss 4.2173   LearningRate 0.0098   Epoch: 13   Global Step: 569830   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:33,585-Speed 2628.66 samples/sec   Loss 4.2067   LearningRate 0.0098   Epoch: 13   Global Step: 569840   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:37,485-Speed 2626.59 samples/sec   Loss 4.2289   LearningRate 0.0098   Epoch: 13   Global Step: 569850   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:41,417-Speed 2604.91 samples/sec   Loss 4.2702   LearningRate 0.0098   Epoch: 13   Global Step: 569860   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:45,318-Speed 2625.74 samples/sec   Loss 4.1885   LearningRate 0.0098   Epoch: 13   Global Step: 569870   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:49,216-Speed 2627.50 samples/sec   Loss 4.2469   LearningRate 0.0098   Epoch: 13   Global Step: 569880   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:53,118-Speed 2625.00 samples/sec   Loss 4.2193   LearningRate 0.0098   Epoch: 13   Global Step: 569890   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:45:57,040-Speed 2611.53 samples/sec   Loss 4.2225   LearningRate 0.0098   Epoch: 13   Global Step: 569900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:00,936-Speed 2628.82 samples/sec   Loss 4.1499   LearningRate 0.0098   Epoch: 13   Global Step: 569910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:04,833-Speed 2628.73 samples/sec   Loss 4.2239   LearningRate 0.0098   Epoch: 13   Global Step: 569920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:08,728-Speed 2629.54 samples/sec   Loss 4.2623   LearningRate 0.0098   Epoch: 13   Global Step: 569930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:12,630-Speed 2624.99 samples/sec   Loss 4.2228   LearningRate 0.0098   Epoch: 13   Global Step: 569940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:16,533-Speed 2624.26 samples/sec   Loss 4.2062   LearningRate 0.0098   Epoch: 13   Global Step: 569950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:20,431-Speed 2627.90 samples/sec   Loss 4.2864   LearningRate 0.0098   Epoch: 13   Global Step: 569960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:24,327-Speed 2628.46 samples/sec   Loss 4.2335   LearningRate 0.0098   Epoch: 13   Global Step: 569970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:28,232-Speed 2622.92 samples/sec   Loss 4.2304   LearningRate 0.0098   Epoch: 13   Global Step: 569980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:32,136-Speed 2623.23 samples/sec   Loss 4.1784   LearningRate 0.0098   Epoch: 13   Global Step: 569990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:46:36,035-Speed 2627.27 samples/sec   Loss 4.3347   LearningRate 0.0098   Epoch: 13   Global Step: 570000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:47:18,982-[lfw][570000]XNorm: 23.059112
Training: 2022-04-15 11:47:18,982-[lfw][570000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 11:47:18,983-[lfw][570000]Accuracy-Highest: 0.99800
Training: 2022-04-15 11:48:08,845-[cfp_fp][570000]XNorm: 21.829221
Training: 2022-04-15 11:48:08,847-[cfp_fp][570000]Accuracy-Flip: 0.98986+-0.00375
Training: 2022-04-15 11:48:08,848-[cfp_fp][570000]Accuracy-Highest: 0.99086
Training: 2022-04-15 11:48:51,832-[agedb_30][570000]XNorm: 23.189739
Training: 2022-04-15 11:48:51,833-[agedb_30][570000]Accuracy-Flip: 0.97983+-0.00570
Training: 2022-04-15 11:48:51,833-[agedb_30][570000]Accuracy-Highest: 0.98083
Training: 2022-04-15 11:48:55,682-Speed 73.33 samples/sec   Loss 4.2581   LearningRate 0.0098   Epoch: 13   Global Step: 570010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:48:59,550-Speed 2647.94 samples/sec   Loss 4.2374   LearningRate 0.0098   Epoch: 13   Global Step: 570020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:03,423-Speed 2645.11 samples/sec   Loss 4.2093   LearningRate 0.0098   Epoch: 13   Global Step: 570030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:07,316-Speed 2630.73 samples/sec   Loss 4.3120   LearningRate 0.0098   Epoch: 13   Global Step: 570040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:11,186-Speed 2647.35 samples/sec   Loss 4.2316   LearningRate 0.0098   Epoch: 13   Global Step: 570050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:15,060-Speed 2643.91 samples/sec   Loss 4.1814   LearningRate 0.0098   Epoch: 13   Global Step: 570060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:18,942-Speed 2639.20 samples/sec   Loss 4.1176   LearningRate 0.0098   Epoch: 13   Global Step: 570070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:22,814-Speed 2645.22 samples/sec   Loss 4.2174   LearningRate 0.0098   Epoch: 13   Global Step: 570080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:26,694-Speed 2640.52 samples/sec   Loss 4.2696   LearningRate 0.0098   Epoch: 13   Global Step: 570090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:30,573-Speed 2640.53 samples/sec   Loss 4.1925   LearningRate 0.0098   Epoch: 13   Global Step: 570100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:34,433-Speed 2654.06 samples/sec   Loss 4.2123   LearningRate 0.0098   Epoch: 13   Global Step: 570110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:38,318-Speed 2636.61 samples/sec   Loss 4.2585   LearningRate 0.0098   Epoch: 13   Global Step: 570120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:42,218-Speed 2626.04 samples/sec   Loss 4.3585   LearningRate 0.0098   Epoch: 13   Global Step: 570130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:46,101-Speed 2637.88 samples/sec   Loss 4.1979   LearningRate 0.0098   Epoch: 13   Global Step: 570140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:49,989-Speed 2634.30 samples/sec   Loss 4.2864   LearningRate 0.0098   Epoch: 13   Global Step: 570150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:53,882-Speed 2631.98 samples/sec   Loss 4.3085   LearningRate 0.0098   Epoch: 13   Global Step: 570160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:49:57,775-Speed 2630.66 samples/sec   Loss 4.1101   LearningRate 0.0098   Epoch: 13   Global Step: 570170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:50:01,669-Speed 2631.14 samples/sec   Loss 4.2267   LearningRate 0.0098   Epoch: 13   Global Step: 570180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:50:05,567-Speed 2627.14 samples/sec   Loss 4.2725   LearningRate 0.0098   Epoch: 13   Global Step: 570190   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:50:09,468-Speed 2625.99 samples/sec   Loss 4.1820   LearningRate 0.0098   Epoch: 13   Global Step: 570200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:50:13,368-Speed 2626.29 samples/sec   Loss 4.2973   LearningRate 0.0098   Epoch: 13   Global Step: 570210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:17,264-Speed 2628.71 samples/sec   Loss 4.2111   LearningRate 0.0098   Epoch: 13   Global Step: 570220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:21,167-Speed 2624.32 samples/sec   Loss 4.2632   LearningRate 0.0098   Epoch: 13   Global Step: 570230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:25,061-Speed 2630.49 samples/sec   Loss 4.2577   LearningRate 0.0098   Epoch: 13   Global Step: 570240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:28,957-Speed 2629.16 samples/sec   Loss 4.1977   LearningRate 0.0098   Epoch: 13   Global Step: 570250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:32,884-Speed 2607.62 samples/sec   Loss 4.2483   LearningRate 0.0098   Epoch: 13   Global Step: 570260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:36,821-Speed 2602.82 samples/sec   Loss 4.1405   LearningRate 0.0098   Epoch: 13   Global Step: 570270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:40,721-Speed 2626.55 samples/sec   Loss 4.2691   LearningRate 0.0098   Epoch: 13   Global Step: 570280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:44,645-Speed 2610.37 samples/sec   Loss 4.3072   LearningRate 0.0098   Epoch: 13   Global Step: 570290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:48,551-Speed 2622.97 samples/sec   Loss 4.2077   LearningRate 0.0098   Epoch: 13   Global Step: 570300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:52,417-Speed 2649.19 samples/sec   Loss 4.2135   LearningRate 0.0098   Epoch: 13   Global Step: 570310   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:50:56,340-Speed 2610.61 samples/sec   Loss 4.2241   LearningRate 0.0098   Epoch: 13   Global Step: 570320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:51:00,216-Speed 2643.14 samples/sec   Loss 4.2599   LearningRate 0.0098   Epoch: 13   Global Step: 570330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:04,114-Speed 2627.28 samples/sec   Loss 4.2244   LearningRate 0.0098   Epoch: 13   Global Step: 570340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:08,017-Speed 2624.56 samples/sec   Loss 4.2583   LearningRate 0.0098   Epoch: 13   Global Step: 570350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:11,923-Speed 2622.13 samples/sec   Loss 4.2874   LearningRate 0.0098   Epoch: 13   Global Step: 570360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:15,863-Speed 2600.33 samples/sec   Loss 4.2866   LearningRate 0.0098   Epoch: 13   Global Step: 570370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:19,770-Speed 2621.22 samples/sec   Loss 4.3070   LearningRate 0.0098   Epoch: 13   Global Step: 570380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:23,687-Speed 2615.51 samples/sec   Loss 4.2148   LearningRate 0.0098   Epoch: 13   Global Step: 570390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:27,598-Speed 2618.92 samples/sec   Loss 4.1918   LearningRate 0.0098   Epoch: 13   Global Step: 570400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:31,495-Speed 2628.03 samples/sec   Loss 4.2005   LearningRate 0.0098   Epoch: 13   Global Step: 570410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:35,456-Speed 2585.74 samples/sec   Loss 4.1873   LearningRate 0.0098   Epoch: 13   Global Step: 570420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:39,364-Speed 2620.73 samples/sec   Loss 4.1985   LearningRate 0.0098   Epoch: 13   Global Step: 570430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:51:43,270-Speed 2622.08 samples/sec   Loss 4.1894   LearningRate 0.0098   Epoch: 13   Global Step: 570440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:51:47,178-Speed 2620.84 samples/sec   Loss 4.2008   LearningRate 0.0098   Epoch: 13   Global Step: 570450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:51:51,072-Speed 2631.32 samples/sec   Loss 4.2229   LearningRate 0.0098   Epoch: 13   Global Step: 570460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:51:54,953-Speed 2638.92 samples/sec   Loss 4.2681   LearningRate 0.0098   Epoch: 13   Global Step: 570470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:51:58,844-Speed 2631.97 samples/sec   Loss 4.1911   LearningRate 0.0098   Epoch: 13   Global Step: 570480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:02,741-Speed 2628.38 samples/sec   Loss 4.2402   LearningRate 0.0098   Epoch: 13   Global Step: 570490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:06,648-Speed 2621.35 samples/sec   Loss 4.2791   LearningRate 0.0098   Epoch: 13   Global Step: 570500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:10,544-Speed 2628.77 samples/sec   Loss 4.2658   LearningRate 0.0098   Epoch: 13   Global Step: 570510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:14,449-Speed 2623.46 samples/sec   Loss 4.1496   LearningRate 0.0098   Epoch: 13   Global Step: 570520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:18,357-Speed 2620.92 samples/sec   Loss 4.2301   LearningRate 0.0098   Epoch: 13   Global Step: 570530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:22,271-Speed 2617.46 samples/sec   Loss 4.2548   LearningRate 0.0098   Epoch: 13   Global Step: 570540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:26,218-Speed 2594.63 samples/sec   Loss 4.1916   LearningRate 0.0097   Epoch: 13   Global Step: 570550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:30,114-Speed 2629.39 samples/sec   Loss 4.1996   LearningRate 0.0097   Epoch: 13   Global Step: 570560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:33,984-Speed 2646.48 samples/sec   Loss 4.2018   LearningRate 0.0097   Epoch: 13   Global Step: 570570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:37,881-Speed 2628.34 samples/sec   Loss 4.2178   LearningRate 0.0097   Epoch: 13   Global Step: 570580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:41,776-Speed 2629.15 samples/sec   Loss 4.2030   LearningRate 0.0097   Epoch: 13   Global Step: 570590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:45,676-Speed 2626.87 samples/sec   Loss 4.3154   LearningRate 0.0097   Epoch: 13   Global Step: 570600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:49,570-Speed 2630.19 samples/sec   Loss 4.2352   LearningRate 0.0097   Epoch: 13   Global Step: 570610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:53,464-Speed 2630.39 samples/sec   Loss 4.1943   LearningRate 0.0097   Epoch: 13   Global Step: 570620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:52:57,361-Speed 2628.58 samples/sec   Loss 4.2914   LearningRate 0.0097   Epoch: 13   Global Step: 570630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:01,262-Speed 2626.02 samples/sec   Loss 4.1535   LearningRate 0.0097   Epoch: 13   Global Step: 570640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:05,161-Speed 2626.52 samples/sec   Loss 4.2061   LearningRate 0.0097   Epoch: 13   Global Step: 570650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:09,059-Speed 2627.64 samples/sec   Loss 4.2130   LearningRate 0.0097   Epoch: 13   Global Step: 570660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:12,970-Speed 2618.84 samples/sec   Loss 4.3138   LearningRate 0.0097   Epoch: 13   Global Step: 570670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:53:16,846-Speed 2643.09 samples/sec   Loss 4.1603   LearningRate 0.0097   Epoch: 13   Global Step: 570680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:20,740-Speed 2630.81 samples/sec   Loss 4.2420   LearningRate 0.0097   Epoch: 13   Global Step: 570690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:24,636-Speed 2628.86 samples/sec   Loss 4.2001   LearningRate 0.0097   Epoch: 13   Global Step: 570700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:28,538-Speed 2624.81 samples/sec   Loss 4.2225   LearningRate 0.0097   Epoch: 13   Global Step: 570710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:32,444-Speed 2622.23 samples/sec   Loss 4.2476   LearningRate 0.0097   Epoch: 13   Global Step: 570720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:36,339-Speed 2629.87 samples/sec   Loss 4.2255   LearningRate 0.0097   Epoch: 13   Global Step: 570730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:40,237-Speed 2627.56 samples/sec   Loss 4.1641   LearningRate 0.0097   Epoch: 13   Global Step: 570740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:44,147-Speed 2619.88 samples/sec   Loss 4.2100   LearningRate 0.0097   Epoch: 13   Global Step: 570750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:48,054-Speed 2621.54 samples/sec   Loss 4.1425   LearningRate 0.0097   Epoch: 13   Global Step: 570760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:51,964-Speed 2619.79 samples/sec   Loss 4.2511   LearningRate 0.0097   Epoch: 13   Global Step: 570770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:55,842-Speed 2640.82 samples/sec   Loss 4.1283   LearningRate 0.0097   Epoch: 13   Global Step: 570780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:53:59,742-Speed 2626.95 samples/sec   Loss 4.1929   LearningRate 0.0097   Epoch: 13   Global Step: 570790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:03,649-Speed 2621.67 samples/sec   Loss 4.2377   LearningRate 0.0097   Epoch: 13   Global Step: 570800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:07,544-Speed 2629.67 samples/sec   Loss 4.2967   LearningRate 0.0097   Epoch: 13   Global Step: 570810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:11,448-Speed 2623.00 samples/sec   Loss 4.1736   LearningRate 0.0097   Epoch: 13   Global Step: 570820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:15,352-Speed 2624.43 samples/sec   Loss 4.1181   LearningRate 0.0097   Epoch: 13   Global Step: 570830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:19,275-Speed 2610.55 samples/sec   Loss 4.1130   LearningRate 0.0097   Epoch: 13   Global Step: 570840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:23,175-Speed 2626.43 samples/sec   Loss 4.1076   LearningRate 0.0097   Epoch: 13   Global Step: 570850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:27,091-Speed 2615.92 samples/sec   Loss 4.2532   LearningRate 0.0097   Epoch: 13   Global Step: 570860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:30,994-Speed 2624.13 samples/sec   Loss 4.2546   LearningRate 0.0097   Epoch: 13   Global Step: 570870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:34,914-Speed 2613.19 samples/sec   Loss 4.1301   LearningRate 0.0097   Epoch: 13   Global Step: 570880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:38,849-Speed 2602.80 samples/sec   Loss 4.2541   LearningRate 0.0097   Epoch: 13   Global Step: 570890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:42,768-Speed 2613.54 samples/sec   Loss 4.2534   LearningRate 0.0097   Epoch: 13   Global Step: 570900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:46,660-Speed 2631.56 samples/sec   Loss 4.2412   LearningRate 0.0097   Epoch: 13   Global Step: 570910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:50,559-Speed 2627.09 samples/sec   Loss 4.1793   LearningRate 0.0097   Epoch: 13   Global Step: 570920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:54,463-Speed 2623.77 samples/sec   Loss 4.2545   LearningRate 0.0097   Epoch: 13   Global Step: 570930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:54:58,364-Speed 2625.80 samples/sec   Loss 4.2278   LearningRate 0.0097   Epoch: 13   Global Step: 570940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:02,266-Speed 2624.92 samples/sec   Loss 4.1310   LearningRate 0.0097   Epoch: 13   Global Step: 570950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:06,181-Speed 2616.27 samples/sec   Loss 4.1937   LearningRate 0.0097   Epoch: 13   Global Step: 570960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:10,093-Speed 2618.66 samples/sec   Loss 4.2430   LearningRate 0.0097   Epoch: 13   Global Step: 570970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:14,001-Speed 2620.63 samples/sec   Loss 4.2765   LearningRate 0.0097   Epoch: 13   Global Step: 570980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:55:17,875-Speed 2643.96 samples/sec   Loss 4.2535   LearningRate 0.0097   Epoch: 13   Global Step: 570990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:21,799-Speed 2610.42 samples/sec   Loss 4.1851   LearningRate 0.0097   Epoch: 13   Global Step: 571000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:25,698-Speed 2627.74 samples/sec   Loss 4.1357   LearningRate 0.0097   Epoch: 13   Global Step: 571010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:29,613-Speed 2616.25 samples/sec   Loss 4.1987   LearningRate 0.0097   Epoch: 13   Global Step: 571020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:33,516-Speed 2624.65 samples/sec   Loss 4.2531   LearningRate 0.0097   Epoch: 13   Global Step: 571030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:37,429-Speed 2617.24 samples/sec   Loss 4.1909   LearningRate 0.0097   Epoch: 13   Global Step: 571040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:41,353-Speed 2610.07 samples/sec   Loss 4.2615   LearningRate 0.0097   Epoch: 13   Global Step: 571050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:45,257-Speed 2623.67 samples/sec   Loss 4.1736   LearningRate 0.0097   Epoch: 13   Global Step: 571060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:49,161-Speed 2624.28 samples/sec   Loss 4.1873   LearningRate 0.0097   Epoch: 13   Global Step: 571070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:53,070-Speed 2619.51 samples/sec   Loss 4.1736   LearningRate 0.0097   Epoch: 13   Global Step: 571080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:55:56,986-Speed 2615.76 samples/sec   Loss 4.1663   LearningRate 0.0097   Epoch: 13   Global Step: 571090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:56:00,888-Speed 2625.01 samples/sec   Loss 4.2105   LearningRate 0.0097   Epoch: 13   Global Step: 571100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:56:04,792-Speed 2624.01 samples/sec   Loss 4.1076   LearningRate 0.0097   Epoch: 13   Global Step: 571110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:56:08,695-Speed 2624.04 samples/sec   Loss 4.2059   LearningRate 0.0097   Epoch: 13   Global Step: 571120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:56:12,579-Speed 2637.33 samples/sec   Loss 4.2780   LearningRate 0.0097   Epoch: 13   Global Step: 571130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:56:16,483-Speed 2623.27 samples/sec   Loss 4.1618   LearningRate 0.0097   Epoch: 13   Global Step: 571140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:56:20,401-Speed 2614.55 samples/sec   Loss 4.2413   LearningRate 0.0097   Epoch: 13   Global Step: 571150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:56:24,307-Speed 2622.52 samples/sec   Loss 4.1560   LearningRate 0.0097   Epoch: 13   Global Step: 571160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:56:28,215-Speed 2620.65 samples/sec   Loss 4.3138   LearningRate 0.0097   Epoch: 13   Global Step: 571170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:56:32,133-Speed 2614.49 samples/sec   Loss 4.2246   LearningRate 0.0097   Epoch: 13   Global Step: 571180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:56:36,031-Speed 2629.24 samples/sec   Loss 4.2487   LearningRate 0.0097   Epoch: 13   Global Step: 571190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:56:39,949-Speed 2615.54 samples/sec   Loss 4.1793   LearningRate 0.0097   Epoch: 13   Global Step: 571200   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:56:43,851-Speed 2624.32 samples/sec   Loss 4.1892   LearningRate 0.0097   Epoch: 13   Global Step: 571210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:56:47,765-Speed 2618.08 samples/sec   Loss 4.2688   LearningRate 0.0097   Epoch: 13   Global Step: 571220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:56:51,687-Speed 2611.59 samples/sec   Loss 4.2105   LearningRate 0.0097   Epoch: 13   Global Step: 571230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:56:55,604-Speed 2614.75 samples/sec   Loss 4.1926   LearningRate 0.0097   Epoch: 13   Global Step: 571240   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:56:59,504-Speed 2626.46 samples/sec   Loss 4.2350   LearningRate 0.0097   Epoch: 13   Global Step: 571250   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:57:03,405-Speed 2625.55 samples/sec   Loss 4.2197   LearningRate 0.0097   Epoch: 13   Global Step: 571260   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:57:07,312-Speed 2621.53 samples/sec   Loss 4.2624   LearningRate 0.0097   Epoch: 13   Global Step: 571270   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:57:11,231-Speed 2613.83 samples/sec   Loss 4.2446   LearningRate 0.0097   Epoch: 13   Global Step: 571280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:15,134-Speed 2624.36 samples/sec   Loss 4.1629   LearningRate 0.0097   Epoch: 13   Global Step: 571290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:19,041-Speed 2621.74 samples/sec   Loss 4.2398   LearningRate 0.0097   Epoch: 13   Global Step: 571300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:22,942-Speed 2626.12 samples/sec   Loss 4.1859   LearningRate 0.0097   Epoch: 13   Global Step: 571310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:26,842-Speed 2626.43 samples/sec   Loss 4.1930   LearningRate 0.0097   Epoch: 13   Global Step: 571320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:30,746-Speed 2623.50 samples/sec   Loss 4.2313   LearningRate 0.0097   Epoch: 13   Global Step: 571330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:34,660-Speed 2616.97 samples/sec   Loss 4.2662   LearningRate 0.0097   Epoch: 13   Global Step: 571340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:38,570-Speed 2619.23 samples/sec   Loss 4.2747   LearningRate 0.0097   Epoch: 13   Global Step: 571350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:42,595-Speed 2544.60 samples/sec   Loss 4.2462   LearningRate 0.0097   Epoch: 13   Global Step: 571360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:46,495-Speed 2626.49 samples/sec   Loss 4.2183   LearningRate 0.0097   Epoch: 13   Global Step: 571370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:57:50,400-Speed 2623.26 samples/sec   Loss 4.2412   LearningRate 0.0097   Epoch: 13   Global Step: 571380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:57:54,318-Speed 2614.06 samples/sec   Loss 4.1624   LearningRate 0.0097   Epoch: 13   Global Step: 571390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 11:57:58,215-Speed 2628.46 samples/sec   Loss 4.2074   LearningRate 0.0097   Epoch: 13   Global Step: 571400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:58:02,343-Speed 2481.62 samples/sec   Loss 4.2828   LearningRate 0.0097   Epoch: 13   Global Step: 571410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:58:06,426-Speed 2508.30 samples/sec   Loss 4.1902   LearningRate 0.0097   Epoch: 13   Global Step: 571420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:58:10,343-Speed 2614.51 samples/sec   Loss 4.2039   LearningRate 0.0097   Epoch: 13   Global Step: 571430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:58:14,292-Speed 2594.30 samples/sec   Loss 4.2043   LearningRate 0.0097   Epoch: 13   Global Step: 571440   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:18,200-Speed 2620.63 samples/sec   Loss 4.2282   LearningRate 0.0097   Epoch: 13   Global Step: 571450   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:22,102-Speed 2625.40 samples/sec   Loss 4.2029   LearningRate 0.0097   Epoch: 13   Global Step: 571460   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:26,008-Speed 2622.36 samples/sec   Loss 4.1980   LearningRate 0.0097   Epoch: 13   Global Step: 571470   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:29,914-Speed 2622.59 samples/sec   Loss 4.3005   LearningRate 0.0097   Epoch: 13   Global Step: 571480   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:33,811-Speed 2627.88 samples/sec   Loss 4.2205   LearningRate 0.0097   Epoch: 13   Global Step: 571490   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:37,843-Speed 2540.24 samples/sec   Loss 4.2037   LearningRate 0.0097   Epoch: 13   Global Step: 571500   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:41,767-Speed 2610.39 samples/sec   Loss 4.1571   LearningRate 0.0097   Epoch: 13   Global Step: 571510   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:45,669-Speed 2625.13 samples/sec   Loss 4.1812   LearningRate 0.0097   Epoch: 13   Global Step: 571520   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:49,582-Speed 2617.46 samples/sec   Loss 4.1888   LearningRate 0.0097   Epoch: 13   Global Step: 571530   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:58:53,480-Speed 2628.10 samples/sec   Loss 4.2857   LearningRate 0.0097   Epoch: 13   Global Step: 571540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:58:57,364-Speed 2636.61 samples/sec   Loss 4.2371   LearningRate 0.0097   Epoch: 13   Global Step: 571550   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:01,270-Speed 2622.48 samples/sec   Loss 4.2246   LearningRate 0.0097   Epoch: 13   Global Step: 571560   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:05,175-Speed 2622.66 samples/sec   Loss 4.2619   LearningRate 0.0097   Epoch: 13   Global Step: 571570   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:09,083-Speed 2620.87 samples/sec   Loss 4.2865   LearningRate 0.0097   Epoch: 13   Global Step: 571580   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:12,984-Speed 2625.80 samples/sec   Loss 4.2143   LearningRate 0.0097   Epoch: 13   Global Step: 571590   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:16,887-Speed 2624.07 samples/sec   Loss 4.3028   LearningRate 0.0097   Epoch: 13   Global Step: 571600   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:20,789-Speed 2625.44 samples/sec   Loss 4.1198   LearningRate 0.0097   Epoch: 13   Global Step: 571610   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:24,704-Speed 2616.07 samples/sec   Loss 4.3071   LearningRate 0.0097   Epoch: 13   Global Step: 571620   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:28,607-Speed 2625.29 samples/sec   Loss 4.1757   LearningRate 0.0097   Epoch: 13   Global Step: 571630   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:32,507-Speed 2625.90 samples/sec   Loss 4.2063   LearningRate 0.0097   Epoch: 13   Global Step: 571640   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 11:59:36,407-Speed 2625.98 samples/sec   Loss 4.1928   LearningRate 0.0097   Epoch: 13   Global Step: 571650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:59:40,308-Speed 2625.90 samples/sec   Loss 4.2062   LearningRate 0.0097   Epoch: 13   Global Step: 571660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:59:44,211-Speed 2624.20 samples/sec   Loss 4.2332   LearningRate 0.0097   Epoch: 13   Global Step: 571670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:59:48,113-Speed 2624.85 samples/sec   Loss 4.2618   LearningRate 0.0097   Epoch: 13   Global Step: 571680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:59:52,021-Speed 2621.51 samples/sec   Loss 4.0970   LearningRate 0.0097   Epoch: 13   Global Step: 571690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:59:55,919-Speed 2627.52 samples/sec   Loss 4.2531   LearningRate 0.0097   Epoch: 13   Global Step: 571700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 11:59:59,804-Speed 2636.80 samples/sec   Loss 4.2227   LearningRate 0.0097   Epoch: 13   Global Step: 571710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:03,719-Speed 2616.31 samples/sec   Loss 4.2315   LearningRate 0.0097   Epoch: 13   Global Step: 571720   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:07,630-Speed 2619.29 samples/sec   Loss 4.1983   LearningRate 0.0097   Epoch: 13   Global Step: 571730   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:11,539-Speed 2619.85 samples/sec   Loss 4.2164   LearningRate 0.0097   Epoch: 13   Global Step: 571740   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:15,445-Speed 2622.59 samples/sec   Loss 4.2673   LearningRate 0.0097   Epoch: 13   Global Step: 571750   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:19,350-Speed 2622.47 samples/sec   Loss 4.2263   LearningRate 0.0097   Epoch: 13   Global Step: 571760   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:23,247-Speed 2628.21 samples/sec   Loss 4.2223   LearningRate 0.0097   Epoch: 13   Global Step: 571770   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:27,149-Speed 2625.09 samples/sec   Loss 4.2314   LearningRate 0.0097   Epoch: 13   Global Step: 571780   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:31,053-Speed 2624.21 samples/sec   Loss 4.1654   LearningRate 0.0097   Epoch: 13   Global Step: 571790   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:34,953-Speed 2626.39 samples/sec   Loss 4.2645   LearningRate 0.0097   Epoch: 13   Global Step: 571800   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:00:38,853-Speed 2626.02 samples/sec   Loss 4.2412   LearningRate 0.0097   Epoch: 13   Global Step: 571810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:00:42,764-Speed 2618.94 samples/sec   Loss 4.3061   LearningRate 0.0097   Epoch: 13   Global Step: 571820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:00:46,664-Speed 2625.98 samples/sec   Loss 4.1962   LearningRate 0.0097   Epoch: 13   Global Step: 571830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:00:50,566-Speed 2625.67 samples/sec   Loss 4.2139   LearningRate 0.0097   Epoch: 13   Global Step: 571840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:00:54,463-Speed 2627.95 samples/sec   Loss 4.2796   LearningRate 0.0097   Epoch: 13   Global Step: 571850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:00:58,360-Speed 2628.47 samples/sec   Loss 4.1216   LearningRate 0.0097   Epoch: 13   Global Step: 571860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:02,261-Speed 2626.04 samples/sec   Loss 4.1873   LearningRate 0.0097   Epoch: 13   Global Step: 571870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:06,160-Speed 2626.99 samples/sec   Loss 4.1874   LearningRate 0.0096   Epoch: 13   Global Step: 571880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:10,073-Speed 2617.53 samples/sec   Loss 4.2455   LearningRate 0.0096   Epoch: 13   Global Step: 571890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:13,979-Speed 2622.05 samples/sec   Loss 4.2514   LearningRate 0.0096   Epoch: 13   Global Step: 571900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:17,885-Speed 2621.84 samples/sec   Loss 4.1718   LearningRate 0.0096   Epoch: 13   Global Step: 571910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:01:21,792-Speed 2622.68 samples/sec   Loss 4.1751   LearningRate 0.0096   Epoch: 13   Global Step: 571920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:01:25,690-Speed 2627.49 samples/sec   Loss 4.1060   LearningRate 0.0096   Epoch: 13   Global Step: 571930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:01:29,628-Speed 2600.62 samples/sec   Loss 4.2201   LearningRate 0.0096   Epoch: 13   Global Step: 571940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:01:33,515-Speed 2635.69 samples/sec   Loss 4.1762   LearningRate 0.0096   Epoch: 13   Global Step: 571950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:37,414-Speed 2627.51 samples/sec   Loss 4.2082   LearningRate 0.0096   Epoch: 13   Global Step: 571960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:41,314-Speed 2626.16 samples/sec   Loss 4.1428   LearningRate 0.0096   Epoch: 13   Global Step: 571970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:45,220-Speed 2622.11 samples/sec   Loss 4.1722   LearningRate 0.0096   Epoch: 13   Global Step: 571980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:49,122-Speed 2624.75 samples/sec   Loss 4.1520   LearningRate 0.0096   Epoch: 13   Global Step: 571990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:53,031-Speed 2619.99 samples/sec   Loss 4.0598   LearningRate 0.0096   Epoch: 13   Global Step: 572000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:01:56,934-Speed 2624.57 samples/sec   Loss 4.1863   LearningRate 0.0096   Epoch: 13   Global Step: 572010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:00,843-Speed 2620.36 samples/sec   Loss 4.2488   LearningRate 0.0096   Epoch: 13   Global Step: 572020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:04,745-Speed 2625.33 samples/sec   Loss 4.2286   LearningRate 0.0096   Epoch: 13   Global Step: 572030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:08,648-Speed 2623.83 samples/sec   Loss 4.1750   LearningRate 0.0096   Epoch: 13   Global Step: 572040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:12,554-Speed 2622.36 samples/sec   Loss 4.2748   LearningRate 0.0096   Epoch: 13   Global Step: 572050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:02:16,440-Speed 2635.55 samples/sec   Loss 4.1200   LearningRate 0.0096   Epoch: 13   Global Step: 572060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:20,340-Speed 2626.10 samples/sec   Loss 4.2178   LearningRate 0.0096   Epoch: 13   Global Step: 572070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:24,249-Speed 2620.33 samples/sec   Loss 4.1418   LearningRate 0.0096   Epoch: 13   Global Step: 572080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:28,164-Speed 2616.26 samples/sec   Loss 4.2787   LearningRate 0.0096   Epoch: 13   Global Step: 572090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:32,068-Speed 2623.84 samples/sec   Loss 4.1364   LearningRate 0.0096   Epoch: 13   Global Step: 572100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:35,971-Speed 2623.95 samples/sec   Loss 4.2191   LearningRate 0.0096   Epoch: 13   Global Step: 572110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:02:39,850-Speed 2640.55 samples/sec   Loss 4.2201   LearningRate 0.0096   Epoch: 13   Global Step: 572120   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:02:43,758-Speed 2620.88 samples/sec   Loss 4.2443   LearningRate 0.0096   Epoch: 13   Global Step: 572130   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:02:47,804-Speed 2531.02 samples/sec   Loss 4.2088   LearningRate 0.0096   Epoch: 13   Global Step: 572140   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:02:51,805-Speed 2560.38 samples/sec   Loss 4.1718   LearningRate 0.0096   Epoch: 13   Global Step: 572150   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:02:55,751-Speed 2595.52 samples/sec   Loss 4.1299   LearningRate 0.0096   Epoch: 13   Global Step: 572160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:02:59,849-Speed 2499.98 samples/sec   Loss 4.1834   LearningRate 0.0096   Epoch: 13   Global Step: 572170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:03:03,947-Speed 2499.15 samples/sec   Loss 4.1360   LearningRate 0.0096   Epoch: 13   Global Step: 572180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:03:08,046-Speed 2498.85 samples/sec   Loss 4.2142   LearningRate 0.0096   Epoch: 13   Global Step: 572190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:03:12,149-Speed 2496.40 samples/sec   Loss 4.2328   LearningRate 0.0096   Epoch: 13   Global Step: 572200   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:03:16,142-Speed 2565.20 samples/sec   Loss 4.1949   LearningRate 0.0096   Epoch: 13   Global Step: 572210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:03:20,048-Speed 2622.78 samples/sec   Loss 4.1607   LearningRate 0.0096   Epoch: 13   Global Step: 572220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:23,951-Speed 2624.06 samples/sec   Loss 4.2214   LearningRate 0.0096   Epoch: 13   Global Step: 572230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:27,903-Speed 2592.20 samples/sec   Loss 4.1924   LearningRate 0.0096   Epoch: 13   Global Step: 572240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:31,812-Speed 2620.14 samples/sec   Loss 4.1310   LearningRate 0.0096   Epoch: 13   Global Step: 572250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:35,719-Speed 2621.36 samples/sec   Loss 4.1107   LearningRate 0.0096   Epoch: 13   Global Step: 572260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:39,620-Speed 2625.44 samples/sec   Loss 4.1984   LearningRate 0.0096   Epoch: 13   Global Step: 572270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:43,525-Speed 2623.12 samples/sec   Loss 4.2382   LearningRate 0.0096   Epoch: 13   Global Step: 572280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:47,427-Speed 2625.38 samples/sec   Loss 4.1274   LearningRate 0.0096   Epoch: 13   Global Step: 572290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:51,336-Speed 2620.71 samples/sec   Loss 4.1793   LearningRate 0.0096   Epoch: 13   Global Step: 572300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:55,247-Speed 2618.55 samples/sec   Loss 4.1903   LearningRate 0.0096   Epoch: 13   Global Step: 572310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:03:59,154-Speed 2621.76 samples/sec   Loss 4.2104   LearningRate 0.0096   Epoch: 13   Global Step: 572320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:04:03,059-Speed 2623.03 samples/sec   Loss 4.2234   LearningRate 0.0096   Epoch: 13   Global Step: 572330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:06,963-Speed 2624.17 samples/sec   Loss 4.1941   LearningRate 0.0096   Epoch: 13   Global Step: 572340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:10,864-Speed 2625.28 samples/sec   Loss 4.2701   LearningRate 0.0096   Epoch: 13   Global Step: 572350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:14,795-Speed 2605.53 samples/sec   Loss 4.2030   LearningRate 0.0096   Epoch: 13   Global Step: 572360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:18,703-Speed 2621.09 samples/sec   Loss 4.2107   LearningRate 0.0096   Epoch: 13   Global Step: 572370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:22,605-Speed 2624.69 samples/sec   Loss 4.1694   LearningRate 0.0096   Epoch: 13   Global Step: 572380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:26,515-Speed 2620.16 samples/sec   Loss 4.1140   LearningRate 0.0096   Epoch: 13   Global Step: 572390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:30,446-Speed 2605.62 samples/sec   Loss 4.2196   LearningRate 0.0096   Epoch: 13   Global Step: 572400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:34,355-Speed 2620.05 samples/sec   Loss 4.1445   LearningRate 0.0096   Epoch: 13   Global Step: 572410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:38,266-Speed 2619.87 samples/sec   Loss 4.2292   LearningRate 0.0096   Epoch: 13   Global Step: 572420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:04:42,166-Speed 2625.78 samples/sec   Loss 4.2048   LearningRate 0.0096   Epoch: 13   Global Step: 572430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:04:46,068-Speed 2625.42 samples/sec   Loss 4.2695   LearningRate 0.0096   Epoch: 13   Global Step: 572440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:04:49,988-Speed 2612.37 samples/sec   Loss 4.1681   LearningRate 0.0096   Epoch: 13   Global Step: 572450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:04:53,906-Speed 2614.55 samples/sec   Loss 4.1174   LearningRate 0.0096   Epoch: 13   Global Step: 572460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:04:57,797-Speed 2632.32 samples/sec   Loss 4.1720   LearningRate 0.0096   Epoch: 13   Global Step: 572470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:01,739-Speed 2598.57 samples/sec   Loss 4.1760   LearningRate 0.0096   Epoch: 13   Global Step: 572480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:05,640-Speed 2625.89 samples/sec   Loss 4.2742   LearningRate 0.0096   Epoch: 13   Global Step: 572490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:09,545-Speed 2623.16 samples/sec   Loss 4.1969   LearningRate 0.0096   Epoch: 13   Global Step: 572500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:13,448-Speed 2623.91 samples/sec   Loss 4.1398   LearningRate 0.0096   Epoch: 13   Global Step: 572510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:17,353-Speed 2623.86 samples/sec   Loss 4.2197   LearningRate 0.0096   Epoch: 13   Global Step: 572520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:21,319-Speed 2582.58 samples/sec   Loss 4.2137   LearningRate 0.0096   Epoch: 13   Global Step: 572530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:25,274-Speed 2589.37 samples/sec   Loss 4.2053   LearningRate 0.0096   Epoch: 13   Global Step: 572540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:29,177-Speed 2624.45 samples/sec   Loss 4.2668   LearningRate 0.0096   Epoch: 13   Global Step: 572550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:33,082-Speed 2623.73 samples/sec   Loss 4.1712   LearningRate 0.0096   Epoch: 13   Global Step: 572560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:36,994-Speed 2618.38 samples/sec   Loss 4.1585   LearningRate 0.0096   Epoch: 13   Global Step: 572570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:05:40,905-Speed 2619.05 samples/sec   Loss 4.1924   LearningRate 0.0096   Epoch: 13   Global Step: 572580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:05:44,817-Speed 2618.64 samples/sec   Loss 4.1985   LearningRate 0.0096   Epoch: 13   Global Step: 572590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:05:48,704-Speed 2635.06 samples/sec   Loss 4.1824   LearningRate 0.0096   Epoch: 13   Global Step: 572600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:52,612-Speed 2620.60 samples/sec   Loss 4.2739   LearningRate 0.0096   Epoch: 13   Global Step: 572610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:05:56,549-Speed 2601.88 samples/sec   Loss 4.2711   LearningRate 0.0096   Epoch: 13   Global Step: 572620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:00,453-Speed 2623.30 samples/sec   Loss 4.2413   LearningRate 0.0096   Epoch: 13   Global Step: 572630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:04,382-Speed 2606.95 samples/sec   Loss 4.1480   LearningRate 0.0096   Epoch: 13   Global Step: 572640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:08,289-Speed 2621.63 samples/sec   Loss 4.2102   LearningRate 0.0096   Epoch: 13   Global Step: 572650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:12,200-Speed 2619.15 samples/sec   Loss 4.1994   LearningRate 0.0096   Epoch: 13   Global Step: 572660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:16,103-Speed 2624.49 samples/sec   Loss 4.2680   LearningRate 0.0096   Epoch: 13   Global Step: 572670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:20,017-Speed 2616.91 samples/sec   Loss 4.1777   LearningRate 0.0096   Epoch: 13   Global Step: 572680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:23,934-Speed 2614.89 samples/sec   Loss 4.1199   LearningRate 0.0096   Epoch: 13   Global Step: 572690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:27,863-Speed 2606.76 samples/sec   Loss 4.1678   LearningRate 0.0096   Epoch: 13   Global Step: 572700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:06:31,743-Speed 2639.83 samples/sec   Loss 4.1721   LearningRate 0.0096   Epoch: 13   Global Step: 572710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:35,649-Speed 2622.82 samples/sec   Loss 4.1884   LearningRate 0.0096   Epoch: 13   Global Step: 572720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:39,699-Speed 2528.52 samples/sec   Loss 4.1079   LearningRate 0.0096   Epoch: 13   Global Step: 572730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:43,621-Speed 2611.18 samples/sec   Loss 4.1968   LearningRate 0.0096   Epoch: 13   Global Step: 572740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:47,524-Speed 2624.88 samples/sec   Loss 4.2047   LearningRate 0.0096   Epoch: 13   Global Step: 572750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:51,431-Speed 2620.85 samples/sec   Loss 4.2307   LearningRate 0.0096   Epoch: 13   Global Step: 572760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:55,333-Speed 2625.84 samples/sec   Loss 4.1508   LearningRate 0.0096   Epoch: 13   Global Step: 572770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:06:59,242-Speed 2619.83 samples/sec   Loss 4.1423   LearningRate 0.0096   Epoch: 13   Global Step: 572780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:03,158-Speed 2615.64 samples/sec   Loss 4.1259   LearningRate 0.0096   Epoch: 13   Global Step: 572790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:07,062-Speed 2623.31 samples/sec   Loss 4.2400   LearningRate 0.0096   Epoch: 13   Global Step: 572800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:10,941-Speed 2640.66 samples/sec   Loss 4.0899   LearningRate 0.0096   Epoch: 13   Global Step: 572810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:14,848-Speed 2621.36 samples/sec   Loss 4.1997   LearningRate 0.0096   Epoch: 13   Global Step: 572820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:18,747-Speed 2626.97 samples/sec   Loss 4.1635   LearningRate 0.0096   Epoch: 13   Global Step: 572830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:22,649-Speed 2625.19 samples/sec   Loss 4.1890   LearningRate 0.0096   Epoch: 13   Global Step: 572840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:26,554-Speed 2623.35 samples/sec   Loss 4.1374   LearningRate 0.0096   Epoch: 13   Global Step: 572850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:07:30,468-Speed 2616.74 samples/sec   Loss 4.2120   LearningRate 0.0096   Epoch: 13   Global Step: 572860   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:07:34,381-Speed 2617.49 samples/sec   Loss 4.2162   LearningRate 0.0096   Epoch: 13   Global Step: 572870   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:07:38,282-Speed 2625.64 samples/sec   Loss 4.1325   LearningRate 0.0096   Epoch: 13   Global Step: 572880   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:07:42,198-Speed 2615.45 samples/sec   Loss 4.1771   LearningRate 0.0096   Epoch: 13   Global Step: 572890   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:07:46,124-Speed 2609.26 samples/sec   Loss 4.1775   LearningRate 0.0096   Epoch: 13   Global Step: 572900   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:07:50,049-Speed 2609.36 samples/sec   Loss 4.2467   LearningRate 0.0096   Epoch: 13   Global Step: 572910   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:07:53,978-Speed 2607.66 samples/sec   Loss 4.1719   LearningRate 0.0096   Epoch: 13   Global Step: 572920   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:07:57,884-Speed 2622.32 samples/sec   Loss 4.1841   LearningRate 0.0096   Epoch: 13   Global Step: 572930   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:01,789-Speed 2623.09 samples/sec   Loss 4.1467   LearningRate 0.0096   Epoch: 13   Global Step: 572940   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:05,693-Speed 2623.36 samples/sec   Loss 4.1613   LearningRate 0.0096   Epoch: 13   Global Step: 572950   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:09,600-Speed 2621.67 samples/sec   Loss 4.1943   LearningRate 0.0096   Epoch: 13   Global Step: 572960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:08:13,505-Speed 2623.10 samples/sec   Loss 4.1456   LearningRate 0.0096   Epoch: 13   Global Step: 572970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:08:17,540-Speed 2538.67 samples/sec   Loss 4.1368   LearningRate 0.0096   Epoch: 13   Global Step: 572980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:08:21,456-Speed 2614.82 samples/sec   Loss 4.1180   LearningRate 0.0096   Epoch: 13   Global Step: 572990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:08:25,360-Speed 2624.10 samples/sec   Loss 4.1888   LearningRate 0.0096   Epoch: 13   Global Step: 573000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:08:29,239-Speed 2640.43 samples/sec   Loss 4.2160   LearningRate 0.0096   Epoch: 13   Global Step: 573010   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:33,141-Speed 2626.29 samples/sec   Loss 4.2172   LearningRate 0.0096   Epoch: 13   Global Step: 573020   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:37,047-Speed 2621.55 samples/sec   Loss 4.1895   LearningRate 0.0096   Epoch: 13   Global Step: 573030   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:40,949-Speed 2625.34 samples/sec   Loss 4.2624   LearningRate 0.0096   Epoch: 13   Global Step: 573040   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:44,853-Speed 2623.27 samples/sec   Loss 4.1793   LearningRate 0.0096   Epoch: 13   Global Step: 573050   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:48,757-Speed 2623.39 samples/sec   Loss 4.1925   LearningRate 0.0096   Epoch: 13   Global Step: 573060   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:52,697-Speed 2600.17 samples/sec   Loss 4.1209   LearningRate 0.0096   Epoch: 13   Global Step: 573070   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:08:56,600-Speed 2624.19 samples/sec   Loss 4.2917   LearningRate 0.0096   Epoch: 13   Global Step: 573080   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:00,518-Speed 2614.88 samples/sec   Loss 4.2414   LearningRate 0.0096   Epoch: 13   Global Step: 573090   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:04,423-Speed 2622.70 samples/sec   Loss 4.1916   LearningRate 0.0096   Epoch: 13   Global Step: 573100   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:08,330-Speed 2621.28 samples/sec   Loss 4.2160   LearningRate 0.0096   Epoch: 13   Global Step: 573110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:09:12,352-Speed 2546.80 samples/sec   Loss 4.0982   LearningRate 0.0096   Epoch: 13   Global Step: 573120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:09:16,263-Speed 2619.21 samples/sec   Loss 4.1852   LearningRate 0.0096   Epoch: 13   Global Step: 573130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:09:20,165-Speed 2624.99 samples/sec   Loss 4.2441   LearningRate 0.0096   Epoch: 13   Global Step: 573140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:09:24,039-Speed 2644.74 samples/sec   Loss 4.1812   LearningRate 0.0096   Epoch: 13   Global Step: 573150   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:27,937-Speed 2627.47 samples/sec   Loss 4.2086   LearningRate 0.0096   Epoch: 13   Global Step: 573160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:31,840-Speed 2624.30 samples/sec   Loss 4.2248   LearningRate 0.0096   Epoch: 13   Global Step: 573170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:35,756-Speed 2615.66 samples/sec   Loss 4.1816   LearningRate 0.0096   Epoch: 13   Global Step: 573180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:39,659-Speed 2624.32 samples/sec   Loss 4.1538   LearningRate 0.0096   Epoch: 13   Global Step: 573190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:43,555-Speed 2628.49 samples/sec   Loss 4.1681   LearningRate 0.0096   Epoch: 13   Global Step: 573200   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:47,474-Speed 2614.76 samples/sec   Loss 4.1226   LearningRate 0.0096   Epoch: 13   Global Step: 573210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:51,381-Speed 2621.11 samples/sec   Loss 4.1202   LearningRate 0.0095   Epoch: 13   Global Step: 573220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:55,278-Speed 2629.39 samples/sec   Loss 4.1817   LearningRate 0.0095   Epoch: 13   Global Step: 573230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:09:59,178-Speed 2625.94 samples/sec   Loss 4.2389   LearningRate 0.0095   Epoch: 13   Global Step: 573240   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:10:03,110-Speed 2604.92 samples/sec   Loss 4.1894   LearningRate 0.0095   Epoch: 13   Global Step: 573250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:07,011-Speed 2625.70 samples/sec   Loss 4.1985   LearningRate 0.0095   Epoch: 13   Global Step: 573260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:10,918-Speed 2621.81 samples/sec   Loss 4.1103   LearningRate 0.0095   Epoch: 13   Global Step: 573270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:14,826-Speed 2620.87 samples/sec   Loss 4.2661   LearningRate 0.0095   Epoch: 13   Global Step: 573280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:18,730-Speed 2623.47 samples/sec   Loss 4.1649   LearningRate 0.0095   Epoch: 13   Global Step: 573290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:22,631-Speed 2626.01 samples/sec   Loss 4.1479   LearningRate 0.0095   Epoch: 13   Global Step: 573300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:26,533-Speed 2625.12 samples/sec   Loss 4.0773   LearningRate 0.0095   Epoch: 13   Global Step: 573310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:30,439-Speed 2621.74 samples/sec   Loss 4.1714   LearningRate 0.0095   Epoch: 13   Global Step: 573320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:34,349-Speed 2619.89 samples/sec   Loss 4.0857   LearningRate 0.0095   Epoch: 13   Global Step: 573330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:38,255-Speed 2622.75 samples/sec   Loss 4.1780   LearningRate 0.0095   Epoch: 13   Global Step: 573340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:42,174-Speed 2613.91 samples/sec   Loss 4.0961   LearningRate 0.0095   Epoch: 13   Global Step: 573350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:10:46,088-Speed 2616.84 samples/sec   Loss 4.2152   LearningRate 0.0095   Epoch: 13   Global Step: 573360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:49,996-Speed 2620.98 samples/sec   Loss 4.1539   LearningRate 0.0095   Epoch: 13   Global Step: 573370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:53,896-Speed 2626.15 samples/sec   Loss 4.1774   LearningRate 0.0095   Epoch: 13   Global Step: 573380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:10:57,801-Speed 2623.19 samples/sec   Loss 4.1818   LearningRate 0.0095   Epoch: 13   Global Step: 573390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:01,730-Speed 2606.67 samples/sec   Loss 4.1461   LearningRate 0.0095   Epoch: 13   Global Step: 573400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:05,678-Speed 2594.62 samples/sec   Loss 4.1893   LearningRate 0.0095   Epoch: 13   Global Step: 573410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:09,582-Speed 2623.75 samples/sec   Loss 4.1452   LearningRate 0.0095   Epoch: 13   Global Step: 573420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:13,478-Speed 2629.38 samples/sec   Loss 4.1297   LearningRate 0.0095   Epoch: 13   Global Step: 573430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:17,394-Speed 2615.20 samples/sec   Loss 4.1198   LearningRate 0.0095   Epoch: 13   Global Step: 573440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:21,304-Speed 2619.85 samples/sec   Loss 4.1404   LearningRate 0.0095   Epoch: 13   Global Step: 573450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:25,211-Speed 2621.59 samples/sec   Loss 4.2711   LearningRate 0.0095   Epoch: 13   Global Step: 573460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:11:29,101-Speed 2633.40 samples/sec   Loss 4.1539   LearningRate 0.0095   Epoch: 13   Global Step: 573470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:33,009-Speed 2620.42 samples/sec   Loss 4.2097   LearningRate 0.0095   Epoch: 13   Global Step: 573480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:36,911-Speed 2625.51 samples/sec   Loss 4.1881   LearningRate 0.0095   Epoch: 13   Global Step: 573490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:40,813-Speed 2624.45 samples/sec   Loss 4.1615   LearningRate 0.0095   Epoch: 13   Global Step: 573500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:44,713-Speed 2626.74 samples/sec   Loss 4.1196   LearningRate 0.0095   Epoch: 13   Global Step: 573510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:48,616-Speed 2624.63 samples/sec   Loss 4.1562   LearningRate 0.0095   Epoch: 13   Global Step: 573520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:52,514-Speed 2627.40 samples/sec   Loss 4.1634   LearningRate 0.0095   Epoch: 13   Global Step: 573530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:11:56,408-Speed 2630.01 samples/sec   Loss 4.2209   LearningRate 0.0095   Epoch: 13   Global Step: 573540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:00,305-Speed 2628.57 samples/sec   Loss 4.1402   LearningRate 0.0095   Epoch: 13   Global Step: 573550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:04,206-Speed 2625.16 samples/sec   Loss 4.1547   LearningRate 0.0095   Epoch: 13   Global Step: 573560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:08,111-Speed 2623.03 samples/sec   Loss 4.1241   LearningRate 0.0095   Epoch: 13   Global Step: 573570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:12,016-Speed 2623.72 samples/sec   Loss 4.2326   LearningRate 0.0095   Epoch: 13   Global Step: 573580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:15,918-Speed 2625.05 samples/sec   Loss 4.2696   LearningRate 0.0095   Epoch: 13   Global Step: 573590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:19,819-Speed 2625.39 samples/sec   Loss 4.2045   LearningRate 0.0095   Epoch: 13   Global Step: 573600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:23,726-Speed 2621.68 samples/sec   Loss 4.0906   LearningRate 0.0095   Epoch: 13   Global Step: 573610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:27,637-Speed 2618.45 samples/sec   Loss 4.1501   LearningRate 0.0095   Epoch: 13   Global Step: 573620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:31,538-Speed 2625.61 samples/sec   Loss 4.1904   LearningRate 0.0095   Epoch: 13   Global Step: 573630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:35,442-Speed 2623.89 samples/sec   Loss 4.2084   LearningRate 0.0095   Epoch: 13   Global Step: 573640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:39,342-Speed 2626.50 samples/sec   Loss 4.1739   LearningRate 0.0095   Epoch: 13   Global Step: 573650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:43,294-Speed 2592.08 samples/sec   Loss 4.1727   LearningRate 0.0095   Epoch: 13   Global Step: 573660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:12:47,194-Speed 2626.06 samples/sec   Loss 4.2325   LearningRate 0.0095   Epoch: 13   Global Step: 573670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:12:51,102-Speed 2621.67 samples/sec   Loss 4.1778   LearningRate 0.0095   Epoch: 13   Global Step: 573680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:12:55,000-Speed 2626.95 samples/sec   Loss 4.2260   LearningRate 0.0095   Epoch: 13   Global Step: 573690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:12:58,881-Speed 2639.17 samples/sec   Loss 4.1965   LearningRate 0.0095   Epoch: 13   Global Step: 573700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:13:02,792-Speed 2618.64 samples/sec   Loss 4.2140   LearningRate 0.0095   Epoch: 13   Global Step: 573710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:13:06,694-Speed 2625.53 samples/sec   Loss 4.1727   LearningRate 0.0095   Epoch: 13   Global Step: 573720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:13:10,595-Speed 2625.21 samples/sec   Loss 4.2366   LearningRate 0.0095   Epoch: 13   Global Step: 573730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:13:14,495-Speed 2626.82 samples/sec   Loss 4.2270   LearningRate 0.0095   Epoch: 13   Global Step: 573740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:13:18,400-Speed 2622.55 samples/sec   Loss 4.1139   LearningRate 0.0095   Epoch: 13   Global Step: 573750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:13:22,276-Speed 2642.72 samples/sec   Loss 4.1715   LearningRate 0.0095   Epoch: 13   Global Step: 573760   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:26,172-Speed 2628.89 samples/sec   Loss 4.1637   LearningRate 0.0095   Epoch: 13   Global Step: 573770   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:30,084-Speed 2618.50 samples/sec   Loss 4.2499   LearningRate 0.0095   Epoch: 13   Global Step: 573780   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:33,997-Speed 2617.16 samples/sec   Loss 4.1385   LearningRate 0.0095   Epoch: 13   Global Step: 573790   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:37,896-Speed 2626.40 samples/sec   Loss 4.1868   LearningRate 0.0095   Epoch: 13   Global Step: 573800   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:41,799-Speed 2625.46 samples/sec   Loss 4.1756   LearningRate 0.0095   Epoch: 13   Global Step: 573810   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:45,697-Speed 2626.96 samples/sec   Loss 4.1011   LearningRate 0.0095   Epoch: 13   Global Step: 573820   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:49,615-Speed 2614.56 samples/sec   Loss 4.0635   LearningRate 0.0095   Epoch: 13   Global Step: 573830   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:53,516-Speed 2626.20 samples/sec   Loss 4.0629   LearningRate 0.0095   Epoch: 13   Global Step: 573840   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:13:57,412-Speed 2628.95 samples/sec   Loss 4.2548   LearningRate 0.0095   Epoch: 13   Global Step: 573850   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:14:01,329-Speed 2615.07 samples/sec   Loss 4.2216   LearningRate 0.0095   Epoch: 13   Global Step: 573860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:05,227-Speed 2627.41 samples/sec   Loss 4.1648   LearningRate 0.0095   Epoch: 13   Global Step: 573870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:09,130-Speed 2624.10 samples/sec   Loss 4.0942   LearningRate 0.0095   Epoch: 13   Global Step: 573880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:13,028-Speed 2627.95 samples/sec   Loss 4.1736   LearningRate 0.0095   Epoch: 13   Global Step: 573890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:16,934-Speed 2621.95 samples/sec   Loss 4.1050   LearningRate 0.0095   Epoch: 13   Global Step: 573900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:20,842-Speed 2621.12 samples/sec   Loss 4.1447   LearningRate 0.0095   Epoch: 13   Global Step: 573910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:24,748-Speed 2622.09 samples/sec   Loss 4.1813   LearningRate 0.0095   Epoch: 13   Global Step: 573920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:28,651-Speed 2624.95 samples/sec   Loss 4.1373   LearningRate 0.0095   Epoch: 13   Global Step: 573930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:32,553-Speed 2625.03 samples/sec   Loss 4.1338   LearningRate 0.0095   Epoch: 13   Global Step: 573940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:36,471-Speed 2613.96 samples/sec   Loss 4.1194   LearningRate 0.0095   Epoch: 13   Global Step: 573950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:40,387-Speed 2615.33 samples/sec   Loss 4.1596   LearningRate 0.0095   Epoch: 13   Global Step: 573960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:14:44,284-Speed 2628.84 samples/sec   Loss 4.1227   LearningRate 0.0095   Epoch: 13   Global Step: 573970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:14:48,160-Speed 2642.61 samples/sec   Loss 4.1326   LearningRate 0.0095   Epoch: 13   Global Step: 573980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:52,057-Speed 2627.89 samples/sec   Loss 4.2026   LearningRate 0.0095   Epoch: 13   Global Step: 573990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:55,981-Speed 2610.71 samples/sec   Loss 4.1009   LearningRate 0.0095   Epoch: 13   Global Step: 574000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:14:59,883-Speed 2624.79 samples/sec   Loss 4.0861   LearningRate 0.0095   Epoch: 13   Global Step: 574010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:03,789-Speed 2622.00 samples/sec   Loss 4.1945   LearningRate 0.0095   Epoch: 13   Global Step: 574020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:07,690-Speed 2625.68 samples/sec   Loss 4.1176   LearningRate 0.0095   Epoch: 13   Global Step: 574030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:11,602-Speed 2619.03 samples/sec   Loss 4.1929   LearningRate 0.0095   Epoch: 13   Global Step: 574040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:15,509-Speed 2620.95 samples/sec   Loss 4.1537   LearningRate 0.0095   Epoch: 13   Global Step: 574050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:19,417-Speed 2621.69 samples/sec   Loss 4.0834   LearningRate 0.0095   Epoch: 13   Global Step: 574060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:23,309-Speed 2631.35 samples/sec   Loss 4.1212   LearningRate 0.0095   Epoch: 13   Global Step: 574070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:27,205-Speed 2629.55 samples/sec   Loss 4.1659   LearningRate 0.0095   Epoch: 13   Global Step: 574080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:15:31,080-Speed 2643.16 samples/sec   Loss 4.1321   LearningRate 0.0095   Epoch: 13   Global Step: 574090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:34,978-Speed 2627.77 samples/sec   Loss 4.2384   LearningRate 0.0095   Epoch: 13   Global Step: 574100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:38,877-Speed 2626.54 samples/sec   Loss 4.1105   LearningRate 0.0095   Epoch: 13   Global Step: 574110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:42,771-Speed 2630.94 samples/sec   Loss 4.1679   LearningRate 0.0095   Epoch: 13   Global Step: 574120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:46,665-Speed 2630.37 samples/sec   Loss 4.1811   LearningRate 0.0095   Epoch: 13   Global Step: 574130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:50,567-Speed 2625.24 samples/sec   Loss 4.1501   LearningRate 0.0095   Epoch: 13   Global Step: 574140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:54,477-Speed 2619.19 samples/sec   Loss 4.1825   LearningRate 0.0095   Epoch: 13   Global Step: 574150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:15:58,381-Speed 2624.57 samples/sec   Loss 4.1737   LearningRate 0.0095   Epoch: 13   Global Step: 574160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:02,278-Speed 2627.84 samples/sec   Loss 4.1641   LearningRate 0.0095   Epoch: 13   Global Step: 574170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:06,193-Speed 2616.29 samples/sec   Loss 4.0515   LearningRate 0.0095   Epoch: 13   Global Step: 574180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:10,102-Speed 2620.23 samples/sec   Loss 4.1794   LearningRate 0.0095   Epoch: 13   Global Step: 574190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:16:13,971-Speed 2647.68 samples/sec   Loss 4.1826   LearningRate 0.0095   Epoch: 13   Global Step: 574200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:17,877-Speed 2622.38 samples/sec   Loss 4.0947   LearningRate 0.0095   Epoch: 13   Global Step: 574210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:21,788-Speed 2619.54 samples/sec   Loss 4.2021   LearningRate 0.0095   Epoch: 13   Global Step: 574220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:25,688-Speed 2625.82 samples/sec   Loss 4.2511   LearningRate 0.0095   Epoch: 13   Global Step: 574230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:29,584-Speed 2628.81 samples/sec   Loss 4.1695   LearningRate 0.0095   Epoch: 13   Global Step: 574240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:33,487-Speed 2624.71 samples/sec   Loss 4.2201   LearningRate 0.0095   Epoch: 13   Global Step: 574250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:37,390-Speed 2624.89 samples/sec   Loss 4.1138   LearningRate 0.0095   Epoch: 13   Global Step: 574260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:41,287-Speed 2628.46 samples/sec   Loss 4.2318   LearningRate 0.0095   Epoch: 13   Global Step: 574270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:45,189-Speed 2625.01 samples/sec   Loss 4.1945   LearningRate 0.0095   Epoch: 13   Global Step: 574280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:49,093-Speed 2623.40 samples/sec   Loss 4.1725   LearningRate 0.0095   Epoch: 13   Global Step: 574290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:52,978-Speed 2636.55 samples/sec   Loss 4.1726   LearningRate 0.0095   Epoch: 13   Global Step: 574300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:16:56,863-Speed 2636.92 samples/sec   Loss 4.1644   LearningRate 0.0095   Epoch: 13   Global Step: 574310   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:17:00,770-Speed 2621.32 samples/sec   Loss 4.1782   LearningRate 0.0095   Epoch: 13   Global Step: 574320   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:17:04,676-Speed 2621.85 samples/sec   Loss 4.1386   LearningRate 0.0095   Epoch: 13   Global Step: 574330   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:17:08,570-Speed 2630.66 samples/sec   Loss 4.2185   LearningRate 0.0095   Epoch: 13   Global Step: 574340   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:17:12,442-Speed 2645.83 samples/sec   Loss 4.0619   LearningRate 0.0095   Epoch: 13   Global Step: 574350   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:16,334-Speed 2631.36 samples/sec   Loss 4.2333   LearningRate 0.0095   Epoch: 13   Global Step: 574360   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:20,228-Speed 2630.77 samples/sec   Loss 4.2311   LearningRate 0.0095   Epoch: 13   Global Step: 574370   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:24,133-Speed 2622.62 samples/sec   Loss 4.1391   LearningRate 0.0095   Epoch: 13   Global Step: 574380   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:28,034-Speed 2625.87 samples/sec   Loss 4.1809   LearningRate 0.0095   Epoch: 13   Global Step: 574390   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:31,925-Speed 2632.09 samples/sec   Loss 4.2729   LearningRate 0.0095   Epoch: 13   Global Step: 574400   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:35,815-Speed 2633.60 samples/sec   Loss 4.1270   LearningRate 0.0095   Epoch: 13   Global Step: 574410   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:39,719-Speed 2623.58 samples/sec   Loss 4.0951   LearningRate 0.0095   Epoch: 13   Global Step: 574420   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:43,616-Speed 2628.62 samples/sec   Loss 4.1535   LearningRate 0.0095   Epoch: 13   Global Step: 574430   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:47,511-Speed 2629.72 samples/sec   Loss 4.0788   LearningRate 0.0095   Epoch: 13   Global Step: 574440   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-15 12:17:51,412-Speed 2625.82 samples/sec   Loss 4.1533   LearningRate 0.0095   Epoch: 13   Global Step: 574450   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:17:55,321-Speed 2619.92 samples/sec   Loss 4.1269   LearningRate 0.0095   Epoch: 13   Global Step: 574460   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:17:59,220-Speed 2626.88 samples/sec   Loss 4.1183   LearningRate 0.0095   Epoch: 13   Global Step: 574470   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:03,130-Speed 2619.91 samples/sec   Loss 4.1635   LearningRate 0.0095   Epoch: 13   Global Step: 574480   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:07,038-Speed 2620.90 samples/sec   Loss 4.1869   LearningRate 0.0095   Epoch: 13   Global Step: 574490   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:10,950-Speed 2617.84 samples/sec   Loss 4.2062   LearningRate 0.0095   Epoch: 13   Global Step: 574500   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:14,847-Speed 2628.42 samples/sec   Loss 4.1518   LearningRate 0.0095   Epoch: 13   Global Step: 574510   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:18,750-Speed 2624.47 samples/sec   Loss 4.2254   LearningRate 0.0095   Epoch: 13   Global Step: 574520   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:22,932-Speed 2449.78 samples/sec   Loss 4.0786   LearningRate 0.0095   Epoch: 13   Global Step: 574530   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:26,847-Speed 2615.93 samples/sec   Loss 4.2076   LearningRate 0.0095   Epoch: 13   Global Step: 574540   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:18:30,748-Speed 2625.83 samples/sec   Loss 4.1794   LearningRate 0.0095   Epoch: 13   Global Step: 574550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:18:34,645-Speed 2628.36 samples/sec   Loss 4.1722   LearningRate 0.0095   Epoch: 13   Global Step: 574560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:18:38,548-Speed 2624.36 samples/sec   Loss 4.0563   LearningRate 0.0094   Epoch: 13   Global Step: 574570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:18:42,452-Speed 2623.63 samples/sec   Loss 4.0921   LearningRate 0.0094   Epoch: 13   Global Step: 574580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:18:46,351-Speed 2627.56 samples/sec   Loss 4.1334   LearningRate 0.0094   Epoch: 13   Global Step: 574590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:18:50,249-Speed 2627.52 samples/sec   Loss 4.1090   LearningRate 0.0094   Epoch: 13   Global Step: 574600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:18:54,155-Speed 2624.15 samples/sec   Loss 4.0963   LearningRate 0.0094   Epoch: 13   Global Step: 574610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:18:58,079-Speed 2609.74 samples/sec   Loss 4.1547   LearningRate 0.0094   Epoch: 13   Global Step: 574620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:19:01,994-Speed 2616.18 samples/sec   Loss 4.1223   LearningRate 0.0094   Epoch: 13   Global Step: 574630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:19:05,889-Speed 2630.00 samples/sec   Loss 4.0858   LearningRate 0.0094   Epoch: 13   Global Step: 574640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:19:09,791-Speed 2624.40 samples/sec   Loss 4.1078   LearningRate 0.0094   Epoch: 13   Global Step: 574650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:19:13,689-Speed 2627.60 samples/sec   Loss 4.0814   LearningRate 0.0094   Epoch: 13   Global Step: 574660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:19:17,588-Speed 2627.39 samples/sec   Loss 4.1334   LearningRate 0.0094   Epoch: 13   Global Step: 574670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:19:21,494-Speed 2622.40 samples/sec   Loss 4.1425   LearningRate 0.0094   Epoch: 13   Global Step: 574680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:19:25,385-Speed 2632.15 samples/sec   Loss 4.1449   LearningRate 0.0094   Epoch: 13   Global Step: 574690   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:29,284-Speed 2627.69 samples/sec   Loss 4.1504   LearningRate 0.0094   Epoch: 13   Global Step: 574700   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:33,192-Speed 2620.55 samples/sec   Loss 4.1313   LearningRate 0.0094   Epoch: 13   Global Step: 574710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:37,099-Speed 2621.45 samples/sec   Loss 4.1339   LearningRate 0.0094   Epoch: 13   Global Step: 574720   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:41,017-Speed 2613.93 samples/sec   Loss 4.0904   LearningRate 0.0094   Epoch: 13   Global Step: 574730   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:44,915-Speed 2628.22 samples/sec   Loss 4.1570   LearningRate 0.0094   Epoch: 13   Global Step: 574740   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:48,806-Speed 2632.52 samples/sec   Loss 4.1453   LearningRate 0.0094   Epoch: 13   Global Step: 574750   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:52,705-Speed 2627.00 samples/sec   Loss 4.1890   LearningRate 0.0094   Epoch: 13   Global Step: 574760   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:19:56,596-Speed 2632.89 samples/sec   Loss 4.0624   LearningRate 0.0094   Epoch: 13   Global Step: 574770   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:20:00,497-Speed 2625.91 samples/sec   Loss 4.1080   LearningRate 0.0094   Epoch: 13   Global Step: 574780   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:20:04,405-Speed 2620.84 samples/sec   Loss 4.0862   LearningRate 0.0094   Epoch: 13   Global Step: 574790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:08,296-Speed 2632.02 samples/sec   Loss 4.1876   LearningRate 0.0094   Epoch: 13   Global Step: 574800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:12,190-Speed 2631.08 samples/sec   Loss 4.1878   LearningRate 0.0094   Epoch: 13   Global Step: 574810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:16,089-Speed 2626.80 samples/sec   Loss 4.2355   LearningRate 0.0094   Epoch: 13   Global Step: 574820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:20,032-Speed 2597.82 samples/sec   Loss 4.1897   LearningRate 0.0094   Epoch: 13   Global Step: 574830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:23,929-Speed 2628.54 samples/sec   Loss 4.1652   LearningRate 0.0094   Epoch: 13   Global Step: 574840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:27,837-Speed 2620.71 samples/sec   Loss 4.0765   LearningRate 0.0094   Epoch: 13   Global Step: 574850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:31,768-Speed 2605.81 samples/sec   Loss 4.2137   LearningRate 0.0094   Epoch: 13   Global Step: 574860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:35,667-Speed 2626.79 samples/sec   Loss 4.1283   LearningRate 0.0094   Epoch: 13   Global Step: 574870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:20:39,539-Speed 2645.48 samples/sec   Loss 4.2037   LearningRate 0.0094   Epoch: 13   Global Step: 574880   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:20:43,431-Speed 2631.55 samples/sec   Loss 4.1439   LearningRate 0.0094   Epoch: 13   Global Step: 574890   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:20:47,533-Speed 2496.94 samples/sec   Loss 4.1705   LearningRate 0.0094   Epoch: 13   Global Step: 574900   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:20:51,426-Speed 2631.81 samples/sec   Loss 4.1130   LearningRate 0.0094   Epoch: 13   Global Step: 574910   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:20:55,321-Speed 2629.46 samples/sec   Loss 4.1513   LearningRate 0.0094   Epoch: 13   Global Step: 574920   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:20:59,217-Speed 2629.10 samples/sec   Loss 4.1753   LearningRate 0.0094   Epoch: 13   Global Step: 574930   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:21:03,125-Speed 2620.95 samples/sec   Loss 4.1755   LearningRate 0.0094   Epoch: 13   Global Step: 574940   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:21:07,019-Speed 2630.22 samples/sec   Loss 4.0841   LearningRate 0.0094   Epoch: 13   Global Step: 574950   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:21:10,909-Speed 2633.09 samples/sec   Loss 4.1255   LearningRate 0.0094   Epoch: 13   Global Step: 574960   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:21:14,802-Speed 2631.41 samples/sec   Loss 4.1300   LearningRate 0.0094   Epoch: 13   Global Step: 574970   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:21:18,696-Speed 2631.09 samples/sec   Loss 4.1167   LearningRate 0.0094   Epoch: 13   Global Step: 574980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:22,599-Speed 2623.90 samples/sec   Loss 4.1286   LearningRate 0.0094   Epoch: 13   Global Step: 574990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:26,496-Speed 2628.35 samples/sec   Loss 4.0321   LearningRate 0.0094   Epoch: 13   Global Step: 575000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:30,396-Speed 2626.65 samples/sec   Loss 4.1253   LearningRate 0.0094   Epoch: 13   Global Step: 575010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:34,300-Speed 2623.16 samples/sec   Loss 4.1342   LearningRate 0.0094   Epoch: 13   Global Step: 575020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:38,199-Speed 2627.13 samples/sec   Loss 4.1502   LearningRate 0.0094   Epoch: 13   Global Step: 575030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:42,088-Speed 2633.69 samples/sec   Loss 4.1660   LearningRate 0.0094   Epoch: 13   Global Step: 575040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:46,070-Speed 2572.09 samples/sec   Loss 4.2049   LearningRate 0.0094   Epoch: 13   Global Step: 575050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:49,981-Speed 2619.02 samples/sec   Loss 4.1405   LearningRate 0.0094   Epoch: 13   Global Step: 575060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:53,889-Speed 2620.73 samples/sec   Loss 4.1139   LearningRate 0.0094   Epoch: 13   Global Step: 575070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:21:57,795-Speed 2623.19 samples/sec   Loss 4.2282   LearningRate 0.0094   Epoch: 13   Global Step: 575080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:22:01,708-Speed 2617.24 samples/sec   Loss 4.0881   LearningRate 0.0094   Epoch: 13   Global Step: 575090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:22:05,685-Speed 2575.34 samples/sec   Loss 4.1533   LearningRate 0.0094   Epoch: 13   Global Step: 575100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:22:09,587-Speed 2624.51 samples/sec   Loss 4.1056   LearningRate 0.0094   Epoch: 13   Global Step: 575110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:22:13,484-Speed 2629.45 samples/sec   Loss 4.0571   LearningRate 0.0094   Epoch: 13   Global Step: 575120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:22:17,379-Speed 2630.02 samples/sec   Loss 4.1618   LearningRate 0.0094   Epoch: 13   Global Step: 575130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:22:21,274-Speed 2629.07 samples/sec   Loss 4.1376   LearningRate 0.0094   Epoch: 13   Global Step: 575140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:22:25,183-Speed 2619.93 samples/sec   Loss 4.0980   LearningRate 0.0094   Epoch: 13   Global Step: 575150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:22:29,097-Speed 2617.38 samples/sec   Loss 4.1364   LearningRate 0.0094   Epoch: 13   Global Step: 575160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:22:32,975-Speed 2641.93 samples/sec   Loss 4.1356   LearningRate 0.0094   Epoch: 13   Global Step: 575170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:22:36,873-Speed 2627.62 samples/sec   Loss 4.1550   LearningRate 0.0094   Epoch: 13   Global Step: 575180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:22:40,772-Speed 2626.55 samples/sec   Loss 4.1181   LearningRate 0.0094   Epoch: 13   Global Step: 575190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:22:44,687-Speed 2616.71 samples/sec   Loss 4.2058   LearningRate 0.0094   Epoch: 13   Global Step: 575200   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:22:48,597-Speed 2619.20 samples/sec   Loss 4.0872   LearningRate 0.0094   Epoch: 13   Global Step: 575210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:22:52,502-Speed 2623.53 samples/sec   Loss 4.1245   LearningRate 0.0094   Epoch: 13   Global Step: 575220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:22:56,431-Speed 2607.14 samples/sec   Loss 4.2599   LearningRate 0.0094   Epoch: 13   Global Step: 575230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:00,330-Speed 2626.22 samples/sec   Loss 4.1199   LearningRate 0.0094   Epoch: 13   Global Step: 575240   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:04,226-Speed 2629.23 samples/sec   Loss 4.0853   LearningRate 0.0094   Epoch: 13   Global Step: 575250   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:08,122-Speed 2629.44 samples/sec   Loss 4.1662   LearningRate 0.0094   Epoch: 13   Global Step: 575260   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:12,025-Speed 2624.30 samples/sec   Loss 4.0817   LearningRate 0.0094   Epoch: 13   Global Step: 575270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:23:15,921-Speed 2629.59 samples/sec   Loss 4.0183   LearningRate 0.0094   Epoch: 13   Global Step: 575280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:23:19,841-Speed 2612.90 samples/sec   Loss 4.1576   LearningRate 0.0094   Epoch: 13   Global Step: 575290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:23:23,735-Speed 2630.55 samples/sec   Loss 4.1448   LearningRate 0.0094   Epoch: 13   Global Step: 575300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:23:27,670-Speed 2603.47 samples/sec   Loss 4.0660   LearningRate 0.0094   Epoch: 13   Global Step: 575310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:23:31,563-Speed 2630.34 samples/sec   Loss 4.1396   LearningRate 0.0094   Epoch: 13   Global Step: 575320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:23:35,448-Speed 2637.08 samples/sec   Loss 4.1554   LearningRate 0.0094   Epoch: 13   Global Step: 575330   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:39,344-Speed 2628.96 samples/sec   Loss 4.1395   LearningRate 0.0094   Epoch: 13   Global Step: 575340   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:43,241-Speed 2628.98 samples/sec   Loss 4.1965   LearningRate 0.0094   Epoch: 13   Global Step: 575350   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:47,133-Speed 2631.42 samples/sec   Loss 4.1454   LearningRate 0.0094   Epoch: 13   Global Step: 575360   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:51,024-Speed 2632.18 samples/sec   Loss 4.1458   LearningRate 0.0094   Epoch: 13   Global Step: 575370   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:54,917-Speed 2630.96 samples/sec   Loss 4.1430   LearningRate 0.0094   Epoch: 13   Global Step: 575380   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:23:58,817-Speed 2626.37 samples/sec   Loss 4.1684   LearningRate 0.0094   Epoch: 13   Global Step: 575390   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:24:02,714-Speed 2628.08 samples/sec   Loss 4.1286   LearningRate 0.0094   Epoch: 13   Global Step: 575400   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:24:06,623-Speed 2621.02 samples/sec   Loss 4.1214   LearningRate 0.0094   Epoch: 13   Global Step: 575410   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:24:10,520-Speed 2627.99 samples/sec   Loss 4.1881   LearningRate 0.0094   Epoch: 13   Global Step: 575420   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:24:14,482-Speed 2585.20 samples/sec   Loss 4.1339   LearningRate 0.0094   Epoch: 13   Global Step: 575430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:18,393-Speed 2619.49 samples/sec   Loss 4.1372   LearningRate 0.0094   Epoch: 13   Global Step: 575440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:22,345-Speed 2592.21 samples/sec   Loss 4.1489   LearningRate 0.0094   Epoch: 13   Global Step: 575450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:26,240-Speed 2629.53 samples/sec   Loss 4.0921   LearningRate 0.0094   Epoch: 13   Global Step: 575460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:30,151-Speed 2618.63 samples/sec   Loss 4.1138   LearningRate 0.0094   Epoch: 13   Global Step: 575470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:34,057-Speed 2622.30 samples/sec   Loss 4.0689   LearningRate 0.0094   Epoch: 13   Global Step: 575480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:37,971-Speed 2621.37 samples/sec   Loss 4.1083   LearningRate 0.0094   Epoch: 13   Global Step: 575490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:41,872-Speed 2625.12 samples/sec   Loss 4.2082   LearningRate 0.0094   Epoch: 13   Global Step: 575500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:45,771-Speed 2627.35 samples/sec   Loss 4.1869   LearningRate 0.0094   Epoch: 13   Global Step: 575510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:49,671-Speed 2626.45 samples/sec   Loss 4.1314   LearningRate 0.0094   Epoch: 13   Global Step: 575520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:24:53,576-Speed 2623.38 samples/sec   Loss 4.0903   LearningRate 0.0094   Epoch: 13   Global Step: 575530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:24:57,473-Speed 2628.21 samples/sec   Loss 4.1407   LearningRate 0.0094   Epoch: 13   Global Step: 575540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:25:01,366-Speed 2630.89 samples/sec   Loss 4.1230   LearningRate 0.0094   Epoch: 13   Global Step: 575550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:25:05,263-Speed 2628.33 samples/sec   Loss 4.1300   LearningRate 0.0094   Epoch: 13   Global Step: 575560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:25:09,163-Speed 2626.24 samples/sec   Loss 4.1001   LearningRate 0.0094   Epoch: 13   Global Step: 575570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:25:13,068-Speed 2623.10 samples/sec   Loss 4.0469   LearningRate 0.0094   Epoch: 13   Global Step: 575580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:25:16,947-Speed 2639.91 samples/sec   Loss 4.1683   LearningRate 0.0094   Epoch: 13   Global Step: 575590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:25:20,844-Speed 2629.15 samples/sec   Loss 4.2026   LearningRate 0.0094   Epoch: 13   Global Step: 575600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:25:24,745-Speed 2625.69 samples/sec   Loss 4.1295   LearningRate 0.0094   Epoch: 13   Global Step: 575610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:25:28,634-Speed 2633.72 samples/sec   Loss 4.1666   LearningRate 0.0094   Epoch: 13   Global Step: 575620   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:32,529-Speed 2629.88 samples/sec   Loss 4.1057   LearningRate 0.0094   Epoch: 13   Global Step: 575630   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:36,424-Speed 2629.41 samples/sec   Loss 4.0761   LearningRate 0.0094   Epoch: 13   Global Step: 575640   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:40,319-Speed 2629.66 samples/sec   Loss 4.1462   LearningRate 0.0094   Epoch: 13   Global Step: 575650   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:44,211-Speed 2631.92 samples/sec   Loss 4.1590   LearningRate 0.0094   Epoch: 13   Global Step: 575660   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:48,116-Speed 2623.12 samples/sec   Loss 4.1103   LearningRate 0.0094   Epoch: 13   Global Step: 575670   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:52,065-Speed 2594.00 samples/sec   Loss 4.1347   LearningRate 0.0094   Epoch: 13   Global Step: 575680   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:55,965-Speed 2626.27 samples/sec   Loss 4.1016   LearningRate 0.0094   Epoch: 13   Global Step: 575690   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:25:59,875-Speed 2619.66 samples/sec   Loss 4.0742   LearningRate 0.0094   Epoch: 13   Global Step: 575700   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:03,770-Speed 2629.31 samples/sec   Loss 4.1151   LearningRate 0.0094   Epoch: 13   Global Step: 575710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:07,668-Speed 2628.13 samples/sec   Loss 4.1355   LearningRate 0.0094   Epoch: 13   Global Step: 575720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:26:11,556-Speed 2634.45 samples/sec   Loss 4.1953   LearningRate 0.0094   Epoch: 13   Global Step: 575730   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:15,454-Speed 2627.51 samples/sec   Loss 4.0175   LearningRate 0.0094   Epoch: 13   Global Step: 575740   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:19,360-Speed 2621.93 samples/sec   Loss 4.0823   LearningRate 0.0094   Epoch: 13   Global Step: 575750   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:23,253-Speed 2630.97 samples/sec   Loss 4.0543   LearningRate 0.0094   Epoch: 13   Global Step: 575760   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:27,150-Speed 2628.81 samples/sec   Loss 4.1294   LearningRate 0.0094   Epoch: 13   Global Step: 575770   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:31,080-Speed 2606.50 samples/sec   Loss 4.1185   LearningRate 0.0094   Epoch: 13   Global Step: 575780   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:34,984-Speed 2623.70 samples/sec   Loss 4.1833   LearningRate 0.0094   Epoch: 13   Global Step: 575790   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:38,876-Speed 2631.55 samples/sec   Loss 4.1591   LearningRate 0.0094   Epoch: 13   Global Step: 575800   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:42,802-Speed 2608.79 samples/sec   Loss 4.1626   LearningRate 0.0094   Epoch: 13   Global Step: 575810   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:46,697-Speed 2630.04 samples/sec   Loss 4.1945   LearningRate 0.0094   Epoch: 13   Global Step: 575820   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-15 12:26:50,617-Speed 2612.85 samples/sec   Loss 4.0524   LearningRate 0.0094   Epoch: 13   Global Step: 575830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:26:54,527-Speed 2620.20 samples/sec   Loss 4.1749   LearningRate 0.0094   Epoch: 13   Global Step: 575840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:26:58,530-Speed 2558.70 samples/sec   Loss 4.1517   LearningRate 0.0094   Epoch: 13   Global Step: 575850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:02,454-Speed 2610.21 samples/sec   Loss 4.0711   LearningRate 0.0094   Epoch: 13   Global Step: 575860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:06,352-Speed 2628.06 samples/sec   Loss 4.2279   LearningRate 0.0094   Epoch: 13   Global Step: 575870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:10,266-Speed 2616.41 samples/sec   Loss 4.1738   LearningRate 0.0094   Epoch: 13   Global Step: 575880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:14,193-Speed 2609.05 samples/sec   Loss 4.1382   LearningRate 0.0094   Epoch: 13   Global Step: 575890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:18,085-Speed 2631.09 samples/sec   Loss 4.1506   LearningRate 0.0094   Epoch: 13   Global Step: 575900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:22,041-Speed 2589.57 samples/sec   Loss 4.0374   LearningRate 0.0094   Epoch: 13   Global Step: 575910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:25,933-Speed 2632.08 samples/sec   Loss 4.1458   LearningRate 0.0093   Epoch: 13   Global Step: 575920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:29,850-Speed 2614.73 samples/sec   Loss 4.2106   LearningRate 0.0093   Epoch: 13   Global Step: 575930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-15 12:27:33,732-Speed 2638.80 samples/sec   Loss 4.1431   LearningRate 0.0093   Epoch: 13   Global Step: 575940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:37,629-Speed 2628.41 samples/sec   Loss 4.1698   LearningRate 0.0093   Epoch: 13   Global Step: 575950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:41,522-Speed 2630.76 samples/sec   Loss 4.1306   LearningRate 0.0093   Epoch: 13   Global Step: 575960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:45,417-Speed 2629.67 samples/sec   Loss 4.1722   LearningRate 0.0093   Epoch: 13   Global Step: 575970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:49,333-Speed 2616.20 samples/sec   Loss 4.1954   LearningRate 0.0093   Epoch: 13   Global Step: 575980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-15 12:27:53,244-Speed 2619.10 samples/sec   Loss 4.2071   LearningRate 0.0093   Epoch: 13   Global Step: 575990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:27:57,154-Speed 2619.69 samples/sec   Loss 4.1731   LearningRate 0.0093   Epoch: 13   Global Step: 576000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:28:01,085-Speed 2605.24 samples/sec   Loss 4.1322   LearningRate 0.0093   Epoch: 13   Global Step: 576010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:28:04,994-Speed 2620.26 samples/sec   Loss 4.1554   LearningRate 0.0093   Epoch: 13   Global Step: 576020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:28:08,894-Speed 2625.99 samples/sec   Loss 4.1581   LearningRate 0.0093   Epoch: 13   Global Step: 576030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:28:12,793-Speed 2627.35 samples/sec   Loss 4.2358   LearningRate 0.0093   Epoch: 13   Global Step: 576040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:16,696-Speed 2624.05 samples/sec   Loss 4.0912   LearningRate 0.0093   Epoch: 13   Global Step: 576050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:20,589-Speed 2630.74 samples/sec   Loss 4.1183   LearningRate 0.0093   Epoch: 13   Global Step: 576060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:24,509-Speed 2613.52 samples/sec   Loss 4.2361   LearningRate 0.0093   Epoch: 13   Global Step: 576070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:28,406-Speed 2628.73 samples/sec   Loss 4.1288   LearningRate 0.0093   Epoch: 13   Global Step: 576080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:32,341-Speed 2602.60 samples/sec   Loss 4.0602   LearningRate 0.0093   Epoch: 13   Global Step: 576090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:36,229-Speed 2634.02 samples/sec   Loss 4.1331   LearningRate 0.0093   Epoch: 13   Global Step: 576100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:40,138-Speed 2621.10 samples/sec   Loss 4.1112   LearningRate 0.0093   Epoch: 13   Global Step: 576110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:44,033-Speed 2629.57 samples/sec   Loss 4.1187   LearningRate 0.0093   Epoch: 13   Global Step: 576120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:48,031-Speed 2561.75 samples/sec   Loss 4.0839   LearningRate 0.0093   Epoch: 13   Global Step: 576130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:51,921-Speed 2633.89 samples/sec   Loss 4.1395   LearningRate 0.0093   Epoch: 13   Global Step: 576140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:55,815-Speed 2630.28 samples/sec   Loss 4.0731   LearningRate 0.0093   Epoch: 13   Global Step: 576150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:28:59,730-Speed 2616.17 samples/sec   Loss 4.0474   LearningRate 0.0093   Epoch: 13   Global Step: 576160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:03,624-Speed 2630.03 samples/sec   Loss 4.1272   LearningRate 0.0093   Epoch: 13   Global Step: 576170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:07,525-Speed 2626.42 samples/sec   Loss 4.1267   LearningRate 0.0093   Epoch: 13   Global Step: 576180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:11,429-Speed 2622.90 samples/sec   Loss 4.0822   LearningRate 0.0093   Epoch: 13   Global Step: 576190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:15,352-Speed 2611.54 samples/sec   Loss 4.1567   LearningRate 0.0093   Epoch: 13   Global Step: 576200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:19,260-Speed 2621.02 samples/sec   Loss 4.1761   LearningRate 0.0093   Epoch: 13   Global Step: 576210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:23,150-Speed 2633.08 samples/sec   Loss 4.2256   LearningRate 0.0093   Epoch: 13   Global Step: 576220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:27,047-Speed 2628.48 samples/sec   Loss 4.0710   LearningRate 0.0093   Epoch: 13   Global Step: 576230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:29:30,919-Speed 2645.34 samples/sec   Loss 4.0169   LearningRate 0.0093   Epoch: 13   Global Step: 576240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:29:34,822-Speed 2623.95 samples/sec   Loss 4.0998   LearningRate 0.0093   Epoch: 13   Global Step: 576250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:29:38,719-Speed 2628.28 samples/sec   Loss 4.0957   LearningRate 0.0093   Epoch: 13   Global Step: 576260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:29:42,622-Speed 2624.05 samples/sec   Loss 4.1477   LearningRate 0.0093   Epoch: 13   Global Step: 576270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:29:46,524-Speed 2625.70 samples/sec   Loss 4.1141   LearningRate 0.0093   Epoch: 13   Global Step: 576280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:29:50,425-Speed 2625.27 samples/sec   Loss 4.1304   LearningRate 0.0093   Epoch: 13   Global Step: 576290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:29:54,321-Speed 2629.78 samples/sec   Loss 4.1243   LearningRate 0.0093   Epoch: 13   Global Step: 576300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:29:58,216-Speed 2629.28 samples/sec   Loss 4.0941   LearningRate 0.0093   Epoch: 13   Global Step: 576310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:02,124-Speed 2621.39 samples/sec   Loss 4.1184   LearningRate 0.0093   Epoch: 13   Global Step: 576320   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:06,029-Speed 2622.35 samples/sec   Loss 4.0510   LearningRate 0.0093   Epoch: 13   Global Step: 576330   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:09,929-Speed 2626.77 samples/sec   Loss 4.1978   LearningRate 0.0093   Epoch: 13   Global Step: 576340   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:13,826-Speed 2628.28 samples/sec   Loss 4.1593   LearningRate 0.0093   Epoch: 13   Global Step: 576350   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:17,761-Speed 2603.14 samples/sec   Loss 4.0622   LearningRate 0.0093   Epoch: 13   Global Step: 576360   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:21,660-Speed 2627.48 samples/sec   Loss 4.0827   LearningRate 0.0093   Epoch: 13   Global Step: 576370   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:25,558-Speed 2627.40 samples/sec   Loss 4.1584   LearningRate 0.0093   Epoch: 13   Global Step: 576380   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:29,464-Speed 2621.87 samples/sec   Loss 4.2135   LearningRate 0.0093   Epoch: 13   Global Step: 576390   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:30:33,368-Speed 2623.71 samples/sec   Loss 4.1463   LearningRate 0.0093   Epoch: 13   Global Step: 576400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:30:37,262-Speed 2631.03 samples/sec   Loss 4.0515   LearningRate 0.0093   Epoch: 13   Global Step: 576410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:30:41,156-Speed 2630.27 samples/sec   Loss 4.0819   LearningRate 0.0093   Epoch: 13   Global Step: 576420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:30:45,082-Speed 2608.93 samples/sec   Loss 4.0891   LearningRate 0.0093   Epoch: 13   Global Step: 576430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:30:48,985-Speed 2624.79 samples/sec   Loss 4.0520   LearningRate 0.0093   Epoch: 13   Global Step: 576440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:30:52,879-Speed 2630.48 samples/sec   Loss 4.1190   LearningRate 0.0093   Epoch: 13   Global Step: 576450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:30:56,780-Speed 2625.66 samples/sec   Loss 4.2272   LearningRate 0.0093   Epoch: 13   Global Step: 576460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:00,676-Speed 2628.63 samples/sec   Loss 4.1565   LearningRate 0.0093   Epoch: 13   Global Step: 576470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:04,592-Speed 2615.45 samples/sec   Loss 4.0870   LearningRate 0.0093   Epoch: 13   Global Step: 576480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:08,517-Speed 2609.94 samples/sec   Loss 4.1057   LearningRate 0.0093   Epoch: 13   Global Step: 576490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:12,395-Speed 2640.98 samples/sec   Loss 4.0594   LearningRate 0.0093   Epoch: 13   Global Step: 576500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:16,304-Speed 2620.10 samples/sec   Loss 4.1035   LearningRate 0.0093   Epoch: 13   Global Step: 576510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:20,204-Speed 2626.60 samples/sec   Loss 4.1444   LearningRate 0.0093   Epoch: 13   Global Step: 576520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:24,114-Speed 2619.66 samples/sec   Loss 4.1413   LearningRate 0.0093   Epoch: 13   Global Step: 576530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:28,019-Speed 2623.40 samples/sec   Loss 4.1161   LearningRate 0.0093   Epoch: 13   Global Step: 576540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:31,941-Speed 2610.99 samples/sec   Loss 4.1207   LearningRate 0.0093   Epoch: 13   Global Step: 576550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:35,846-Speed 2622.89 samples/sec   Loss 4.1186   LearningRate 0.0093   Epoch: 13   Global Step: 576560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:39,741-Speed 2629.53 samples/sec   Loss 4.1991   LearningRate 0.0093   Epoch: 13   Global Step: 576570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:43,840-Speed 2499.32 samples/sec   Loss 4.1536   LearningRate 0.0093   Epoch: 13   Global Step: 576580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:47,752-Speed 2617.53 samples/sec   Loss 4.0463   LearningRate 0.0093   Epoch: 13   Global Step: 576590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:51,651-Speed 2628.11 samples/sec   Loss 4.1562   LearningRate 0.0093   Epoch: 13   Global Step: 576600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:31:55,526-Speed 2642.61 samples/sec   Loss 4.1231   LearningRate 0.0093   Epoch: 13   Global Step: 576610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:31:59,427-Speed 2626.26 samples/sec   Loss 4.1351   LearningRate 0.0093   Epoch: 13   Global Step: 576620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:32:03,328-Speed 2625.75 samples/sec   Loss 4.1463   LearningRate 0.0093   Epoch: 13   Global Step: 576630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:32:07,219-Speed 2632.45 samples/sec   Loss 4.1923   LearningRate 0.0093   Epoch: 13   Global Step: 576640   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:11,252-Speed 2539.82 samples/sec   Loss 4.1057   LearningRate 0.0093   Epoch: 13   Global Step: 576650   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:15,169-Speed 2614.86 samples/sec   Loss 4.0863   LearningRate 0.0093   Epoch: 13   Global Step: 576660   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:19,099-Speed 2606.86 samples/sec   Loss 4.1136   LearningRate 0.0093   Epoch: 13   Global Step: 576670   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:22,995-Speed 2628.73 samples/sec   Loss 4.1961   LearningRate 0.0093   Epoch: 13   Global Step: 576680   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:26,883-Speed 2634.57 samples/sec   Loss 4.0406   LearningRate 0.0093   Epoch: 13   Global Step: 576690   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:30,773-Speed 2633.10 samples/sec   Loss 4.0441   LearningRate 0.0093   Epoch: 13   Global Step: 576700   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:34,668-Speed 2630.15 samples/sec   Loss 4.0608   LearningRate 0.0093   Epoch: 13   Global Step: 576710   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:38,639-Speed 2578.79 samples/sec   Loss 4.2035   LearningRate 0.0093   Epoch: 13   Global Step: 576720   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:42,554-Speed 2616.87 samples/sec   Loss 4.1377   LearningRate 0.0093   Epoch: 13   Global Step: 576730   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:46,451-Speed 2628.24 samples/sec   Loss 4.1492   LearningRate 0.0093   Epoch: 13   Global Step: 576740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:32:50,325-Speed 2644.90 samples/sec   Loss 4.1818   LearningRate 0.0093   Epoch: 13   Global Step: 576750   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:54,231-Speed 2622.38 samples/sec   Loss 4.1272   LearningRate 0.0093   Epoch: 13   Global Step: 576760   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:32:58,136-Speed 2622.83 samples/sec   Loss 4.1231   LearningRate 0.0093   Epoch: 13   Global Step: 576770   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:02,035-Speed 2626.92 samples/sec   Loss 4.1388   LearningRate 0.0093   Epoch: 13   Global Step: 576780   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:05,931-Speed 2629.36 samples/sec   Loss 3.9872   LearningRate 0.0093   Epoch: 13   Global Step: 576790   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:09,826-Speed 2629.13 samples/sec   Loss 4.1656   LearningRate 0.0093   Epoch: 13   Global Step: 576800   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:13,743-Speed 2615.63 samples/sec   Loss 4.1779   LearningRate 0.0093   Epoch: 13   Global Step: 576810   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:17,716-Speed 2577.86 samples/sec   Loss 4.1090   LearningRate 0.0093   Epoch: 13   Global Step: 576820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:21,615-Speed 2627.59 samples/sec   Loss 4.0751   LearningRate 0.0093   Epoch: 13   Global Step: 576830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:25,539-Speed 2610.15 samples/sec   Loss 4.1181   LearningRate 0.0093   Epoch: 13   Global Step: 576840   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:29,436-Speed 2628.93 samples/sec   Loss 4.1314   LearningRate 0.0093   Epoch: 13   Global Step: 576850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:33:33,333-Speed 2628.17 samples/sec   Loss 4.1687   LearningRate 0.0093   Epoch: 13   Global Step: 576860   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:37,284-Speed 2592.19 samples/sec   Loss 4.1603   LearningRate 0.0093   Epoch: 13   Global Step: 576870   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:41,179-Speed 2629.70 samples/sec   Loss 4.0929   LearningRate 0.0093   Epoch: 13   Global Step: 576880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:45,087-Speed 2620.96 samples/sec   Loss 4.1706   LearningRate 0.0093   Epoch: 13   Global Step: 576890   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:49,033-Speed 2595.89 samples/sec   Loss 4.2145   LearningRate 0.0093   Epoch: 13   Global Step: 576900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:52,925-Speed 2632.04 samples/sec   Loss 4.1273   LearningRate 0.0093   Epoch: 13   Global Step: 576910   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:33:56,849-Speed 2610.07 samples/sec   Loss 4.1341   LearningRate 0.0093   Epoch: 13   Global Step: 576920   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:00,743-Speed 2630.98 samples/sec   Loss 4.0614   LearningRate 0.0093   Epoch: 13   Global Step: 576930   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:04,638-Speed 2629.66 samples/sec   Loss 4.1125   LearningRate 0.0093   Epoch: 13   Global Step: 576940   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:08,535-Speed 2628.07 samples/sec   Loss 4.1574   LearningRate 0.0093   Epoch: 13   Global Step: 576950   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:12,431-Speed 2628.70 samples/sec   Loss 4.1500   LearningRate 0.0093   Epoch: 13   Global Step: 576960   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:16,330-Speed 2627.18 samples/sec   Loss 4.2335   LearningRate 0.0093   Epoch: 13   Global Step: 576970   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:20,232-Speed 2625.56 samples/sec   Loss 4.1349   LearningRate 0.0093   Epoch: 13   Global Step: 576980   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:24,131-Speed 2627.47 samples/sec   Loss 4.1238   LearningRate 0.0093   Epoch: 13   Global Step: 576990   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:28,033-Speed 2624.46 samples/sec   Loss 4.0595   LearningRate 0.0093   Epoch: 13   Global Step: 577000   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:31,942-Speed 2620.88 samples/sec   Loss 4.1299   LearningRate 0.0093   Epoch: 13   Global Step: 577010   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:35,845-Speed 2624.50 samples/sec   Loss 4.0961   LearningRate 0.0093   Epoch: 13   Global Step: 577020   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:39,788-Speed 2598.02 samples/sec   Loss 4.0739   LearningRate 0.0093   Epoch: 13   Global Step: 577030   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:43,688-Speed 2625.68 samples/sec   Loss 4.0484   LearningRate 0.0093   Epoch: 13   Global Step: 577040   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:47,615-Speed 2609.17 samples/sec   Loss 4.1473   LearningRate 0.0093   Epoch: 13   Global Step: 577050   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:34:51,519-Speed 2623.07 samples/sec   Loss 3.9814   LearningRate 0.0093   Epoch: 13   Global Step: 577060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:34:55,450-Speed 2606.32 samples/sec   Loss 4.0980   LearningRate 0.0093   Epoch: 13   Global Step: 577070   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:34:59,367-Speed 2615.05 samples/sec   Loss 4.0999   LearningRate 0.0093   Epoch: 13   Global Step: 577080   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:35:03,262-Speed 2629.36 samples/sec   Loss 4.2072   LearningRate 0.0093   Epoch: 13   Global Step: 577090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:35:07,168-Speed 2622.59 samples/sec   Loss 4.1375   LearningRate 0.0093   Epoch: 13   Global Step: 577100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:35:11,085-Speed 2615.47 samples/sec   Loss 4.0871   LearningRate 0.0093   Epoch: 13   Global Step: 577110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:35:14,976-Speed 2632.14 samples/sec   Loss 4.1491   LearningRate 0.0093   Epoch: 13   Global Step: 577120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:35:18,886-Speed 2619.52 samples/sec   Loss 4.0547   LearningRate 0.0093   Epoch: 13   Global Step: 577130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:35:22,762-Speed 2643.08 samples/sec   Loss 4.1683   LearningRate 0.0093   Epoch: 13   Global Step: 577140   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:26,653-Speed 2632.09 samples/sec   Loss 4.1038   LearningRate 0.0093   Epoch: 13   Global Step: 577150   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:30,544-Speed 2632.70 samples/sec   Loss 4.1152   LearningRate 0.0093   Epoch: 13   Global Step: 577160   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:34,438-Speed 2630.31 samples/sec   Loss 4.0855   LearningRate 0.0093   Epoch: 13   Global Step: 577170   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:38,331-Speed 2630.96 samples/sec   Loss 4.0382   LearningRate 0.0093   Epoch: 13   Global Step: 577180   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:42,238-Speed 2621.61 samples/sec   Loss 4.1472   LearningRate 0.0093   Epoch: 13   Global Step: 577190   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:46,128-Speed 2633.05 samples/sec   Loss 4.0755   LearningRate 0.0093   Epoch: 13   Global Step: 577200   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:50,023-Speed 2629.96 samples/sec   Loss 4.0922   LearningRate 0.0093   Epoch: 13   Global Step: 577210   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:53,932-Speed 2620.05 samples/sec   Loss 4.0324   LearningRate 0.0093   Epoch: 13   Global Step: 577220   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:35:57,877-Speed 2596.77 samples/sec   Loss 4.1505   LearningRate 0.0093   Epoch: 13   Global Step: 577230   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:01,777-Speed 2626.65 samples/sec   Loss 4.0579   LearningRate 0.0093   Epoch: 13   Global Step: 577240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:36:05,699-Speed 2611.51 samples/sec   Loss 4.1720   LearningRate 0.0093   Epoch: 13   Global Step: 577250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:36:09,605-Speed 2622.11 samples/sec   Loss 4.1150   LearningRate 0.0093   Epoch: 13   Global Step: 577260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:36:13,503-Speed 2628.02 samples/sec   Loss 4.0426   LearningRate 0.0093   Epoch: 13   Global Step: 577270   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:17,405-Speed 2624.61 samples/sec   Loss 4.1956   LearningRate 0.0092   Epoch: 13   Global Step: 577280   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:21,354-Speed 2593.90 samples/sec   Loss 4.1456   LearningRate 0.0092   Epoch: 13   Global Step: 577290   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:25,259-Speed 2623.18 samples/sec   Loss 4.1469   LearningRate 0.0092   Epoch: 13   Global Step: 577300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:29,158-Speed 2627.22 samples/sec   Loss 4.0519   LearningRate 0.0092   Epoch: 13   Global Step: 577310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:33,064-Speed 2622.02 samples/sec   Loss 4.1155   LearningRate 0.0092   Epoch: 13   Global Step: 577320   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:36,959-Speed 2629.76 samples/sec   Loss 4.0590   LearningRate 0.0092   Epoch: 13   Global Step: 577330   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:40,868-Speed 2619.97 samples/sec   Loss 4.1337   LearningRate 0.0092   Epoch: 13   Global Step: 577340   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:44,766-Speed 2628.64 samples/sec   Loss 4.1698   LearningRate 0.0092   Epoch: 13   Global Step: 577350   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:48,660-Speed 2629.79 samples/sec   Loss 4.0308   LearningRate 0.0092   Epoch: 13   Global Step: 577360   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:36:52,563-Speed 2624.78 samples/sec   Loss 4.0706   LearningRate 0.0092   Epoch: 13   Global Step: 577370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:36:56,466-Speed 2623.89 samples/sec   Loss 4.0568   LearningRate 0.0092   Epoch: 13   Global Step: 577380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:00,365-Speed 2627.16 samples/sec   Loss 4.0884   LearningRate 0.0092   Epoch: 13   Global Step: 577390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:04,271-Speed 2622.56 samples/sec   Loss 4.1358   LearningRate 0.0092   Epoch: 13   Global Step: 577400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:08,167-Speed 2629.04 samples/sec   Loss 4.0987   LearningRate 0.0092   Epoch: 13   Global Step: 577410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:12,068-Speed 2625.37 samples/sec   Loss 4.1242   LearningRate 0.0092   Epoch: 13   Global Step: 577420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:15,968-Speed 2626.09 samples/sec   Loss 4.0938   LearningRate 0.0092   Epoch: 13   Global Step: 577430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:19,883-Speed 2616.46 samples/sec   Loss 4.1233   LearningRate 0.0092   Epoch: 13   Global Step: 577440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:23,779-Speed 2629.22 samples/sec   Loss 4.1555   LearningRate 0.0092   Epoch: 13   Global Step: 577450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:27,689-Speed 2619.98 samples/sec   Loss 4.1342   LearningRate 0.0092   Epoch: 13   Global Step: 577460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:31,639-Speed 2592.58 samples/sec   Loss 4.0284   LearningRate 0.0092   Epoch: 13   Global Step: 577470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:35,566-Speed 2608.62 samples/sec   Loss 4.1739   LearningRate 0.0092   Epoch: 13   Global Step: 577480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:39,468-Speed 2625.01 samples/sec   Loss 4.1010   LearningRate 0.0092   Epoch: 13   Global Step: 577490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:43,361-Speed 2631.39 samples/sec   Loss 4.0869   LearningRate 0.0092   Epoch: 13   Global Step: 577500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:47,257-Speed 2628.80 samples/sec   Loss 4.1051   LearningRate 0.0092   Epoch: 13   Global Step: 577510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:37:51,128-Speed 2645.65 samples/sec   Loss 4.1792   LearningRate 0.0092   Epoch: 13   Global Step: 577520   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:37:55,023-Speed 2629.92 samples/sec   Loss 4.1002   LearningRate 0.0092   Epoch: 13   Global Step: 577530   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:37:58,927-Speed 2623.67 samples/sec   Loss 4.1071   LearningRate 0.0092   Epoch: 13   Global Step: 577540   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:02,936-Speed 2554.86 samples/sec   Loss 3.9667   LearningRate 0.0092   Epoch: 13   Global Step: 577550   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:06,835-Speed 2626.89 samples/sec   Loss 4.1622   LearningRate 0.0092   Epoch: 13   Global Step: 577560   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:10,746-Speed 2619.00 samples/sec   Loss 4.1166   LearningRate 0.0092   Epoch: 13   Global Step: 577570   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:14,643-Speed 2628.52 samples/sec   Loss 4.0556   LearningRate 0.0092   Epoch: 13   Global Step: 577580   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:18,566-Speed 2611.44 samples/sec   Loss 4.0869   LearningRate 0.0092   Epoch: 13   Global Step: 577590   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:22,454-Speed 2634.77 samples/sec   Loss 4.1200   LearningRate 0.0092   Epoch: 13   Global Step: 577600   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:26,348-Speed 2629.90 samples/sec   Loss 4.0628   LearningRate 0.0092   Epoch: 13   Global Step: 577610   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:38:30,239-Speed 2633.10 samples/sec   Loss 4.1339   LearningRate 0.0092   Epoch: 13   Global Step: 577620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:38:34,130-Speed 2632.17 samples/sec   Loss 4.2502   LearningRate 0.0092   Epoch: 13   Global Step: 577630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:38:38,036-Speed 2622.10 samples/sec   Loss 4.1007   LearningRate 0.0092   Epoch: 13   Global Step: 577640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:38:41,937-Speed 2625.55 samples/sec   Loss 4.0799   LearningRate 0.0092   Epoch: 13   Global Step: 577650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:38:45,855-Speed 2614.50 samples/sec   Loss 4.1988   LearningRate 0.0092   Epoch: 13   Global Step: 577660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:38:49,746-Speed 2632.53 samples/sec   Loss 4.1813   LearningRate 0.0092   Epoch: 13   Global Step: 577670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:38:53,639-Speed 2631.11 samples/sec   Loss 4.1171   LearningRate 0.0092   Epoch: 13   Global Step: 577680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:38:57,540-Speed 2626.21 samples/sec   Loss 4.0808   LearningRate 0.0092   Epoch: 13   Global Step: 577690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:01,436-Speed 2628.94 samples/sec   Loss 4.1593   LearningRate 0.0092   Epoch: 13   Global Step: 577700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:05,350-Speed 2616.46 samples/sec   Loss 4.1321   LearningRate 0.0092   Epoch: 13   Global Step: 577710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:09,254-Speed 2623.46 samples/sec   Loss 4.0156   LearningRate 0.0092   Epoch: 13   Global Step: 577720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:13,148-Speed 2631.29 samples/sec   Loss 4.1172   LearningRate 0.0092   Epoch: 13   Global Step: 577730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:17,047-Speed 2626.83 samples/sec   Loss 4.0117   LearningRate 0.0092   Epoch: 13   Global Step: 577740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:20,970-Speed 2611.27 samples/sec   Loss 4.0264   LearningRate 0.0092   Epoch: 13   Global Step: 577750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:24,873-Speed 2624.39 samples/sec   Loss 4.0405   LearningRate 0.0092   Epoch: 13   Global Step: 577760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:28,949-Speed 2513.03 samples/sec   Loss 4.1251   LearningRate 0.0092   Epoch: 13   Global Step: 577770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:32,848-Speed 2626.97 samples/sec   Loss 4.2298   LearningRate 0.0092   Epoch: 13   Global Step: 577780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:36,745-Speed 2628.26 samples/sec   Loss 4.1802   LearningRate 0.0092   Epoch: 13   Global Step: 577790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:40,639-Speed 2630.50 samples/sec   Loss 4.2468   LearningRate 0.0092   Epoch: 13   Global Step: 577800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:39:44,517-Speed 2641.18 samples/sec   Loss 4.2413   LearningRate 0.0092   Epoch: 13   Global Step: 577810   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:39:48,410-Speed 2630.61 samples/sec   Loss 4.1657   LearningRate 0.0092   Epoch: 13   Global Step: 577820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:39:52,311-Speed 2626.29 samples/sec   Loss 4.1440   LearningRate 0.0092   Epoch: 13   Global Step: 577830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:39:56,203-Speed 2631.28 samples/sec   Loss 4.1445   LearningRate 0.0092   Epoch: 13   Global Step: 577840   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:40:00,133-Speed 2606.78 samples/sec   Loss 4.1396   LearningRate 0.0092   Epoch: 13   Global Step: 577850   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:40:04,044-Speed 2619.12 samples/sec   Loss 4.1208   LearningRate 0.0092   Epoch: 13   Global Step: 577860   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:40:07,949-Speed 2622.92 samples/sec   Loss 4.0816   LearningRate 0.0092   Epoch: 13   Global Step: 577870   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:40:11,852-Speed 2623.99 samples/sec   Loss 4.1073   LearningRate 0.0092   Epoch: 13   Global Step: 577880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:40:15,749-Speed 2628.93 samples/sec   Loss 4.1051   LearningRate 0.0092   Epoch: 13   Global Step: 577890   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:40:19,642-Speed 2631.08 samples/sec   Loss 4.1561   LearningRate 0.0092   Epoch: 13   Global Step: 577900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:40:23,546-Speed 2623.72 samples/sec   Loss 4.0515   LearningRate 0.0092   Epoch: 13   Global Step: 577910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:27,443-Speed 2628.04 samples/sec   Loss 4.1088   LearningRate 0.0092   Epoch: 13   Global Step: 577920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:31,346-Speed 2624.70 samples/sec   Loss 4.1846   LearningRate 0.0092   Epoch: 13   Global Step: 577930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:35,251-Speed 2622.82 samples/sec   Loss 4.1031   LearningRate 0.0092   Epoch: 13   Global Step: 577940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:39,189-Speed 2601.18 samples/sec   Loss 4.0552   LearningRate 0.0092   Epoch: 13   Global Step: 577950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:43,118-Speed 2606.69 samples/sec   Loss 4.1659   LearningRate 0.0092   Epoch: 13   Global Step: 577960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:47,131-Speed 2552.37 samples/sec   Loss 4.2313   LearningRate 0.0092   Epoch: 13   Global Step: 577970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:51,025-Speed 2630.63 samples/sec   Loss 4.1511   LearningRate 0.0092   Epoch: 13   Global Step: 577980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:54,917-Speed 2631.66 samples/sec   Loss 4.0355   LearningRate 0.0092   Epoch: 13   Global Step: 577990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:40:58,816-Speed 2626.88 samples/sec   Loss 4.0867   LearningRate 0.0092   Epoch: 13   Global Step: 578000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:02,712-Speed 2628.75 samples/sec   Loss 4.0351   LearningRate 0.0092   Epoch: 13   Global Step: 578010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:41:06,604-Speed 2631.53 samples/sec   Loss 4.1676   LearningRate 0.0092   Epoch: 13   Global Step: 578020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:41:10,479-Speed 2643.88 samples/sec   Loss 4.0809   LearningRate 0.0092   Epoch: 13   Global Step: 578030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:14,378-Speed 2626.58 samples/sec   Loss 4.1014   LearningRate 0.0092   Epoch: 13   Global Step: 578040   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:18,269-Speed 2632.56 samples/sec   Loss 4.0275   LearningRate 0.0092   Epoch: 13   Global Step: 578050   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:22,173-Speed 2624.90 samples/sec   Loss 3.9355   LearningRate 0.0092   Epoch: 13   Global Step: 578060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:26,068-Speed 2629.72 samples/sec   Loss 4.1520   LearningRate 0.0092   Epoch: 13   Global Step: 578070   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:29,960-Speed 2631.90 samples/sec   Loss 4.1237   LearningRate 0.0092   Epoch: 13   Global Step: 578080   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:33,859-Speed 2626.93 samples/sec   Loss 4.2143   LearningRate 0.0092   Epoch: 13   Global Step: 578090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:37,753-Speed 2630.17 samples/sec   Loss 4.0838   LearningRate 0.0092   Epoch: 13   Global Step: 578100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:41,657-Speed 2623.51 samples/sec   Loss 4.1343   LearningRate 0.0092   Epoch: 13   Global Step: 578110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:45,652-Speed 2564.25 samples/sec   Loss 4.1636   LearningRate 0.0092   Epoch: 13   Global Step: 578120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:41:49,551-Speed 2627.19 samples/sec   Loss 4.1061   LearningRate 0.0092   Epoch: 13   Global Step: 578130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:41:53,461-Speed 2619.25 samples/sec   Loss 4.1358   LearningRate 0.0092   Epoch: 13   Global Step: 578140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:41:57,364-Speed 2624.58 samples/sec   Loss 4.0408   LearningRate 0.0092   Epoch: 13   Global Step: 578150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:42:01,234-Speed 2647.15 samples/sec   Loss 4.0465   LearningRate 0.0092   Epoch: 13   Global Step: 578160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:05,136-Speed 2624.88 samples/sec   Loss 4.0645   LearningRate 0.0092   Epoch: 13   Global Step: 578170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:09,036-Speed 2626.09 samples/sec   Loss 4.1176   LearningRate 0.0092   Epoch: 13   Global Step: 578180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:12,933-Speed 2627.96 samples/sec   Loss 4.0543   LearningRate 0.0092   Epoch: 13   Global Step: 578190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:16,844-Speed 2618.61 samples/sec   Loss 3.9863   LearningRate 0.0092   Epoch: 13   Global Step: 578200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:20,749-Speed 2623.55 samples/sec   Loss 4.1181   LearningRate 0.0092   Epoch: 13   Global Step: 578210   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:24,652-Speed 2624.43 samples/sec   Loss 4.0699   LearningRate 0.0092   Epoch: 13   Global Step: 578220   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:28,594-Speed 2598.51 samples/sec   Loss 4.1284   LearningRate 0.0092   Epoch: 13   Global Step: 578230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:32,489-Speed 2628.90 samples/sec   Loss 4.0444   LearningRate 0.0092   Epoch: 13   Global Step: 578240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:36,390-Speed 2625.64 samples/sec   Loss 4.1279   LearningRate 0.0092   Epoch: 13   Global Step: 578250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:40,290-Speed 2626.22 samples/sec   Loss 4.1863   LearningRate 0.0092   Epoch: 13   Global Step: 578260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:42:44,170-Speed 2640.12 samples/sec   Loss 4.0301   LearningRate 0.0092   Epoch: 13   Global Step: 578270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:48,069-Speed 2627.40 samples/sec   Loss 4.1390   LearningRate 0.0092   Epoch: 13   Global Step: 578280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:51,961-Speed 2631.80 samples/sec   Loss 4.1014   LearningRate 0.0092   Epoch: 13   Global Step: 578290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:55,864-Speed 2623.96 samples/sec   Loss 4.0826   LearningRate 0.0092   Epoch: 13   Global Step: 578300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:42:59,828-Speed 2584.67 samples/sec   Loss 4.2075   LearningRate 0.0092   Epoch: 13   Global Step: 578310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:03,723-Speed 2629.14 samples/sec   Loss 4.0799   LearningRate 0.0092   Epoch: 13   Global Step: 578320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:07,622-Speed 2626.84 samples/sec   Loss 4.0608   LearningRate 0.0092   Epoch: 13   Global Step: 578330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:11,526-Speed 2623.35 samples/sec   Loss 4.0455   LearningRate 0.0092   Epoch: 13   Global Step: 578340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:15,422-Speed 2628.84 samples/sec   Loss 4.0470   LearningRate 0.0092   Epoch: 13   Global Step: 578350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:19,316-Speed 2630.71 samples/sec   Loss 4.0582   LearningRate 0.0092   Epoch: 13   Global Step: 578360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:23,207-Speed 2632.83 samples/sec   Loss 4.2190   LearningRate 0.0092   Epoch: 13   Global Step: 578370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:43:27,102-Speed 2629.43 samples/sec   Loss 4.0679   LearningRate 0.0092   Epoch: 13   Global Step: 578380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:43:31,007-Speed 2622.92 samples/sec   Loss 4.1029   LearningRate 0.0092   Epoch: 13   Global Step: 578390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:43:34,917-Speed 2619.69 samples/sec   Loss 4.0080   LearningRate 0.0092   Epoch: 13   Global Step: 578400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:38,806-Speed 2633.29 samples/sec   Loss 4.0904   LearningRate 0.0092   Epoch: 13   Global Step: 578410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:42,716-Speed 2619.44 samples/sec   Loss 4.1299   LearningRate 0.0092   Epoch: 13   Global Step: 578420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:46,627-Speed 2620.01 samples/sec   Loss 4.1545   LearningRate 0.0092   Epoch: 13   Global Step: 578430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:50,532-Speed 2622.83 samples/sec   Loss 4.1510   LearningRate 0.0092   Epoch: 13   Global Step: 578440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:54,432-Speed 2626.50 samples/sec   Loss 4.1495   LearningRate 0.0092   Epoch: 13   Global Step: 578450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:43:58,328-Speed 2629.20 samples/sec   Loss 4.0825   LearningRate 0.0092   Epoch: 13   Global Step: 578460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:02,219-Speed 2632.37 samples/sec   Loss 4.0866   LearningRate 0.0092   Epoch: 13   Global Step: 578470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:06,114-Speed 2629.36 samples/sec   Loss 4.0614   LearningRate 0.0092   Epoch: 13   Global Step: 578480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:10,004-Speed 2633.14 samples/sec   Loss 4.0387   LearningRate 0.0092   Epoch: 13   Global Step: 578490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:13,909-Speed 2622.64 samples/sec   Loss 4.1067   LearningRate 0.0092   Epoch: 13   Global Step: 578500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:44:17,783-Speed 2644.07 samples/sec   Loss 4.0837   LearningRate 0.0092   Epoch: 13   Global Step: 578510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:21,680-Speed 2628.92 samples/sec   Loss 4.0635   LearningRate 0.0092   Epoch: 13   Global Step: 578520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:25,572-Speed 2631.97 samples/sec   Loss 4.0730   LearningRate 0.0092   Epoch: 13   Global Step: 578530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:29,467-Speed 2630.12 samples/sec   Loss 4.0993   LearningRate 0.0092   Epoch: 13   Global Step: 578540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:33,364-Speed 2627.65 samples/sec   Loss 3.9891   LearningRate 0.0092   Epoch: 13   Global Step: 578550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:37,266-Speed 2625.20 samples/sec   Loss 4.0659   LearningRate 0.0092   Epoch: 13   Global Step: 578560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:41,161-Speed 2629.15 samples/sec   Loss 4.0794   LearningRate 0.0092   Epoch: 13   Global Step: 578570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:45,129-Speed 2581.13 samples/sec   Loss 4.0802   LearningRate 0.0092   Epoch: 13   Global Step: 578580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:49,025-Speed 2629.40 samples/sec   Loss 4.0453   LearningRate 0.0092   Epoch: 13   Global Step: 578590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:52,918-Speed 2631.09 samples/sec   Loss 4.0619   LearningRate 0.0092   Epoch: 13   Global Step: 578600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:44:56,816-Speed 2627.96 samples/sec   Loss 4.0912   LearningRate 0.0092   Epoch: 13   Global Step: 578610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:45:00,710-Speed 2630.76 samples/sec   Loss 4.1558   LearningRate 0.0092   Epoch: 13   Global Step: 578620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:45:04,606-Speed 2628.80 samples/sec   Loss 4.1413   LearningRate 0.0092   Epoch: 13   Global Step: 578630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:45:08,519-Speed 2617.74 samples/sec   Loss 4.1680   LearningRate 0.0092   Epoch: 13   Global Step: 578640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:45:12,413-Speed 2630.09 samples/sec   Loss 4.0436   LearningRate 0.0091   Epoch: 13   Global Step: 578650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:45:16,306-Speed 2631.10 samples/sec   Loss 4.1369   LearningRate 0.0091   Epoch: 13   Global Step: 578660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:45:20,192-Speed 2635.68 samples/sec   Loss 4.0639   LearningRate 0.0091   Epoch: 13   Global Step: 578670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:24,089-Speed 2628.61 samples/sec   Loss 4.1085   LearningRate 0.0091   Epoch: 13   Global Step: 578680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:27,979-Speed 2633.32 samples/sec   Loss 4.0910   LearningRate 0.0091   Epoch: 13   Global Step: 578690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:31,873-Speed 2630.47 samples/sec   Loss 4.1169   LearningRate 0.0091   Epoch: 13   Global Step: 578700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:35,764-Speed 2631.72 samples/sec   Loss 4.0548   LearningRate 0.0091   Epoch: 13   Global Step: 578710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:39,664-Speed 2626.41 samples/sec   Loss 4.1271   LearningRate 0.0091   Epoch: 13   Global Step: 578720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:43,569-Speed 2622.80 samples/sec   Loss 4.1265   LearningRate 0.0091   Epoch: 13   Global Step: 578730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:47,464-Speed 2629.52 samples/sec   Loss 4.1794   LearningRate 0.0091   Epoch: 13   Global Step: 578740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:51,364-Speed 2626.86 samples/sec   Loss 4.1270   LearningRate 0.0091   Epoch: 13   Global Step: 578750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:55,264-Speed 2626.49 samples/sec   Loss 4.0189   LearningRate 0.0091   Epoch: 13   Global Step: 578760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:45:59,159-Speed 2629.95 samples/sec   Loss 4.0490   LearningRate 0.0091   Epoch: 13   Global Step: 578770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:46:03,054-Speed 2629.48 samples/sec   Loss 4.0541   LearningRate 0.0091   Epoch: 13   Global Step: 578780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:46:06,955-Speed 2625.11 samples/sec   Loss 4.0509   LearningRate 0.0091   Epoch: 13   Global Step: 578790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:46:10,850-Speed 2629.61 samples/sec   Loss 4.0802   LearningRate 0.0091   Epoch: 13   Global Step: 578800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:46:14,724-Speed 2644.10 samples/sec   Loss 4.0604   LearningRate 0.0091   Epoch: 13   Global Step: 578810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:46:18,615-Speed 2632.20 samples/sec   Loss 4.0565   LearningRate 0.0091   Epoch: 13   Global Step: 578820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:46:22,507-Speed 2631.35 samples/sec   Loss 4.0756   LearningRate 0.0091   Epoch: 13   Global Step: 578830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:46:26,406-Speed 2627.98 samples/sec   Loss 3.9722   LearningRate 0.0091   Epoch: 13   Global Step: 578840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:46:30,300-Speed 2630.58 samples/sec   Loss 4.1361   LearningRate 0.0091   Epoch: 13   Global Step: 578850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:46:34,197-Speed 2628.23 samples/sec   Loss 3.9583   LearningRate 0.0091   Epoch: 13   Global Step: 578860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:46:38,112-Speed 2615.71 samples/sec   Loss 4.1396   LearningRate 0.0091   Epoch: 13   Global Step: 578870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:46:41,998-Speed 2636.51 samples/sec   Loss 4.0355   LearningRate 0.0091   Epoch: 13   Global Step: 578880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:46:45,976-Speed 2575.18 samples/sec   Loss 4.0727   LearningRate 0.0091   Epoch: 13   Global Step: 578890   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:46:49,914-Speed 2600.87 samples/sec   Loss 4.0779   LearningRate 0.0091   Epoch: 13   Global Step: 578900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:46:53,808-Speed 2631.16 samples/sec   Loss 4.1039   LearningRate 0.0091   Epoch: 13   Global Step: 578910   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:46:57,696-Speed 2633.85 samples/sec   Loss 4.0538   LearningRate 0.0091   Epoch: 13   Global Step: 578920   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:47:01,590-Speed 2630.22 samples/sec   Loss 4.2167   LearningRate 0.0091   Epoch: 13   Global Step: 578930   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:47:05,485-Speed 2630.11 samples/sec   Loss 4.0418   LearningRate 0.0091   Epoch: 13   Global Step: 578940   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:47:09,381-Speed 2628.89 samples/sec   Loss 4.1295   LearningRate 0.0091   Epoch: 13   Global Step: 578950   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:47:13,280-Speed 2626.66 samples/sec   Loss 4.0344   LearningRate 0.0091   Epoch: 13   Global Step: 578960   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:47:17,176-Speed 2629.64 samples/sec   Loss 4.0886   LearningRate 0.0091   Epoch: 13   Global Step: 578970   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:47:21,073-Speed 2631.23 samples/sec   Loss 4.0879   LearningRate 0.0091   Epoch: 13   Global Step: 578980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:24,986-Speed 2617.73 samples/sec   Loss 4.1622   LearningRate 0.0091   Epoch: 13   Global Step: 578990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:28,882-Speed 2628.76 samples/sec   Loss 4.2042   LearningRate 0.0091   Epoch: 13   Global Step: 579000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:32,814-Speed 2605.28 samples/sec   Loss 4.0564   LearningRate 0.0091   Epoch: 13   Global Step: 579010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:36,712-Speed 2627.17 samples/sec   Loss 4.1980   LearningRate 0.0091   Epoch: 13   Global Step: 579020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:40,602-Speed 2632.71 samples/sec   Loss 4.2016   LearningRate 0.0091   Epoch: 13   Global Step: 579030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:44,508-Speed 2622.42 samples/sec   Loss 4.0496   LearningRate 0.0091   Epoch: 13   Global Step: 579040   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:48,401-Speed 2631.24 samples/sec   Loss 4.1456   LearningRate 0.0091   Epoch: 13   Global Step: 579050   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:52,294-Speed 2630.97 samples/sec   Loss 3.9871   LearningRate 0.0091   Epoch: 13   Global Step: 579060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:47:56,187-Speed 2631.30 samples/sec   Loss 4.0441   LearningRate 0.0091   Epoch: 13   Global Step: 579070   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:00,087-Speed 2626.34 samples/sec   Loss 4.1502   LearningRate 0.0091   Epoch: 13   Global Step: 579080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:48:03,981-Speed 2630.46 samples/sec   Loss 4.0273   LearningRate 0.0091   Epoch: 13   Global Step: 579090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:48:07,875-Speed 2630.33 samples/sec   Loss 4.1083   LearningRate 0.0091   Epoch: 13   Global Step: 579100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:48:11,755-Speed 2639.55 samples/sec   Loss 4.0782   LearningRate 0.0091   Epoch: 13   Global Step: 579110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:15,656-Speed 2625.46 samples/sec   Loss 4.0831   LearningRate 0.0091   Epoch: 13   Global Step: 579120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:19,587-Speed 2605.46 samples/sec   Loss 4.0520   LearningRate 0.0091   Epoch: 13   Global Step: 579130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:23,480-Speed 2631.15 samples/sec   Loss 4.0187   LearningRate 0.0091   Epoch: 13   Global Step: 579140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:27,373-Speed 2630.96 samples/sec   Loss 4.0188   LearningRate 0.0091   Epoch: 13   Global Step: 579150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:31,269-Speed 2629.61 samples/sec   Loss 4.0652   LearningRate 0.0091   Epoch: 13   Global Step: 579160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:35,158-Speed 2633.44 samples/sec   Loss 4.0981   LearningRate 0.0091   Epoch: 13   Global Step: 579170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:39,046-Speed 2634.15 samples/sec   Loss 4.1866   LearningRate 0.0091   Epoch: 13   Global Step: 579180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:42,943-Speed 2628.28 samples/sec   Loss 4.0681   LearningRate 0.0091   Epoch: 13   Global Step: 579190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:46,837-Speed 2630.66 samples/sec   Loss 4.0139   LearningRate 0.0091   Epoch: 13   Global Step: 579200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:48:50,738-Speed 2625.88 samples/sec   Loss 4.1634   LearningRate 0.0091   Epoch: 13   Global Step: 579210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:48:54,642-Speed 2623.39 samples/sec   Loss 4.0756   LearningRate 0.0091   Epoch: 13   Global Step: 579220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:48:58,556-Speed 2617.64 samples/sec   Loss 4.1084   LearningRate 0.0091   Epoch: 13   Global Step: 579230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:49:02,432-Speed 2642.39 samples/sec   Loss 4.0633   LearningRate 0.0091   Epoch: 13   Global Step: 579240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:49:06,325-Speed 2631.01 samples/sec   Loss 4.1806   LearningRate 0.0091   Epoch: 13   Global Step: 579250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:49:10,234-Speed 2620.01 samples/sec   Loss 3.9839   LearningRate 0.0091   Epoch: 13   Global Step: 579260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:49:14,135-Speed 2626.22 samples/sec   Loss 4.0419   LearningRate 0.0091   Epoch: 13   Global Step: 579270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:49:18,032-Speed 2628.31 samples/sec   Loss 4.0850   LearningRate 0.0091   Epoch: 13   Global Step: 579280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:49:21,909-Speed 2641.76 samples/sec   Loss 4.1639   LearningRate 0.0091   Epoch: 13   Global Step: 579290   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:25,822-Speed 2617.65 samples/sec   Loss 4.0244   LearningRate 0.0091   Epoch: 13   Global Step: 579300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:29,718-Speed 2629.21 samples/sec   Loss 4.0620   LearningRate 0.0091   Epoch: 13   Global Step: 579310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:33,609-Speed 2632.85 samples/sec   Loss 4.1041   LearningRate 0.0091   Epoch: 13   Global Step: 579320   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:37,499-Speed 2632.64 samples/sec   Loss 4.0779   LearningRate 0.0091   Epoch: 13   Global Step: 579330   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:41,389-Speed 2632.67 samples/sec   Loss 4.1242   LearningRate 0.0091   Epoch: 13   Global Step: 579340   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:45,283-Speed 2630.39 samples/sec   Loss 4.1099   LearningRate 0.0091   Epoch: 13   Global Step: 579350   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:49,175-Speed 2631.55 samples/sec   Loss 4.0481   LearningRate 0.0091   Epoch: 13   Global Step: 579360   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:53,069-Speed 2631.64 samples/sec   Loss 4.0061   LearningRate 0.0091   Epoch: 13   Global Step: 579370   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:49:56,976-Speed 2621.04 samples/sec   Loss 4.1091   LearningRate 0.0091   Epoch: 13   Global Step: 579380   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:50:00,875-Speed 2627.08 samples/sec   Loss 3.9973   LearningRate 0.0091   Epoch: 13   Global Step: 579390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:04,772-Speed 2627.97 samples/sec   Loss 4.1375   LearningRate 0.0091   Epoch: 13   Global Step: 579400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:08,669-Speed 2628.72 samples/sec   Loss 4.1404   LearningRate 0.0091   Epoch: 13   Global Step: 579410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:12,572-Speed 2623.87 samples/sec   Loss 4.1757   LearningRate 0.0091   Epoch: 13   Global Step: 579420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:16,475-Speed 2624.97 samples/sec   Loss 4.0594   LearningRate 0.0091   Epoch: 13   Global Step: 579430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:20,387-Speed 2617.74 samples/sec   Loss 4.0873   LearningRate 0.0091   Epoch: 13   Global Step: 579440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:24,349-Speed 2585.08 samples/sec   Loss 4.1418   LearningRate 0.0091   Epoch: 13   Global Step: 579450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:28,245-Speed 2629.49 samples/sec   Loss 4.1070   LearningRate 0.0091   Epoch: 13   Global Step: 579460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:32,141-Speed 2628.37 samples/sec   Loss 4.0428   LearningRate 0.0091   Epoch: 13   Global Step: 579470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:36,035-Speed 2631.16 samples/sec   Loss 4.0631   LearningRate 0.0091   Epoch: 13   Global Step: 579480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:39,930-Speed 2629.23 samples/sec   Loss 4.0708   LearningRate 0.0091   Epoch: 13   Global Step: 579490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:50:43,821-Speed 2632.46 samples/sec   Loss 4.0461   LearningRate 0.0091   Epoch: 13   Global Step: 579500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:50:47,719-Speed 2627.57 samples/sec   Loss 4.0420   LearningRate 0.0091   Epoch: 13   Global Step: 579510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:50:51,590-Speed 2646.25 samples/sec   Loss 4.0437   LearningRate 0.0091   Epoch: 13   Global Step: 579520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:50:55,467-Speed 2641.16 samples/sec   Loss 4.1230   LearningRate 0.0091   Epoch: 13   Global Step: 579530   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:50:59,359-Speed 2632.63 samples/sec   Loss 4.0640   LearningRate 0.0091   Epoch: 13   Global Step: 579540   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:03,264-Speed 2622.63 samples/sec   Loss 4.1317   LearningRate 0.0091   Epoch: 13   Global Step: 579550   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:07,166-Speed 2625.74 samples/sec   Loss 4.0909   LearningRate 0.0091   Epoch: 13   Global Step: 579560   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:11,073-Speed 2621.49 samples/sec   Loss 4.0573   LearningRate 0.0091   Epoch: 13   Global Step: 579570   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:14,990-Speed 2614.30 samples/sec   Loss 4.0710   LearningRate 0.0091   Epoch: 13   Global Step: 579580   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:18,886-Speed 2629.70 samples/sec   Loss 4.1269   LearningRate 0.0091   Epoch: 13   Global Step: 579590   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:22,778-Speed 2631.20 samples/sec   Loss 4.0564   LearningRate 0.0091   Epoch: 13   Global Step: 579600   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:26,671-Speed 2631.57 samples/sec   Loss 4.0707   LearningRate 0.0091   Epoch: 13   Global Step: 579610   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:30,581-Speed 2619.32 samples/sec   Loss 4.0992   LearningRate 0.0091   Epoch: 13   Global Step: 579620   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:51:34,484-Speed 2624.48 samples/sec   Loss 4.0776   LearningRate 0.0091   Epoch: 13   Global Step: 579630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:51:38,382-Speed 2627.33 samples/sec   Loss 4.1168   LearningRate 0.0091   Epoch: 13   Global Step: 579640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:51:42,290-Speed 2620.83 samples/sec   Loss 4.1171   LearningRate 0.0091   Epoch: 13   Global Step: 579650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:51:46,189-Speed 2626.80 samples/sec   Loss 4.1171   LearningRate 0.0091   Epoch: 13   Global Step: 579660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:51:50,082-Speed 2631.31 samples/sec   Loss 4.1811   LearningRate 0.0091   Epoch: 13   Global Step: 579670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:51:53,977-Speed 2629.72 samples/sec   Loss 4.0091   LearningRate 0.0091   Epoch: 13   Global Step: 579680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:51:57,870-Speed 2631.12 samples/sec   Loss 4.1075   LearningRate 0.0091   Epoch: 13   Global Step: 579690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:01,762-Speed 2631.26 samples/sec   Loss 4.0692   LearningRate 0.0091   Epoch: 13   Global Step: 579700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:05,654-Speed 2631.89 samples/sec   Loss 4.1661   LearningRate 0.0091   Epoch: 13   Global Step: 579710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:09,545-Speed 2632.39 samples/sec   Loss 4.1371   LearningRate 0.0091   Epoch: 13   Global Step: 579720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:13,437-Speed 2631.85 samples/sec   Loss 4.0267   LearningRate 0.0091   Epoch: 13   Global Step: 579730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:52:17,337-Speed 2626.01 samples/sec   Loss 4.0111   LearningRate 0.0091   Epoch: 13   Global Step: 579740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:52:21,235-Speed 2628.51 samples/sec   Loss 4.0659   LearningRate 0.0091   Epoch: 13   Global Step: 579750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:52:25,132-Speed 2627.96 samples/sec   Loss 4.0205   LearningRate 0.0091   Epoch: 13   Global Step: 579760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:52:29,025-Speed 2630.53 samples/sec   Loss 4.0918   LearningRate 0.0091   Epoch: 13   Global Step: 579770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:52:32,937-Speed 2618.65 samples/sec   Loss 4.1382   LearningRate 0.0091   Epoch: 13   Global Step: 579780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:52:36,816-Speed 2641.53 samples/sec   Loss 4.0841   LearningRate 0.0091   Epoch: 13   Global Step: 579790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:40,714-Speed 2627.66 samples/sec   Loss 4.1387   LearningRate 0.0091   Epoch: 13   Global Step: 579800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:44,611-Speed 2628.04 samples/sec   Loss 4.0982   LearningRate 0.0091   Epoch: 13   Global Step: 579810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:48,540-Speed 2607.49 samples/sec   Loss 4.0963   LearningRate 0.0091   Epoch: 13   Global Step: 579820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:52,429-Speed 2633.44 samples/sec   Loss 4.0949   LearningRate 0.0091   Epoch: 13   Global Step: 579830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:52:56,321-Speed 2632.12 samples/sec   Loss 4.1408   LearningRate 0.0091   Epoch: 13   Global Step: 579840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:00,218-Speed 2628.14 samples/sec   Loss 4.0883   LearningRate 0.0091   Epoch: 13   Global Step: 579850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:04,201-Speed 2571.15 samples/sec   Loss 3.9423   LearningRate 0.0091   Epoch: 13   Global Step: 579860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:08,107-Speed 2622.14 samples/sec   Loss 3.9957   LearningRate 0.0091   Epoch: 13   Global Step: 579870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:12,008-Speed 2626.08 samples/sec   Loss 4.0456   LearningRate 0.0091   Epoch: 13   Global Step: 579880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:15,908-Speed 2626.68 samples/sec   Loss 4.0406   LearningRate 0.0091   Epoch: 13   Global Step: 579890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:53:19,788-Speed 2639.67 samples/sec   Loss 4.1525   LearningRate 0.0091   Epoch: 13   Global Step: 579900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:23,686-Speed 2628.07 samples/sec   Loss 4.1217   LearningRate 0.0091   Epoch: 13   Global Step: 579910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:27,581-Speed 2629.62 samples/sec   Loss 3.9759   LearningRate 0.0091   Epoch: 13   Global Step: 579920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:31,480-Speed 2627.98 samples/sec   Loss 4.1075   LearningRate 0.0091   Epoch: 13   Global Step: 579930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:35,500-Speed 2547.82 samples/sec   Loss 4.0784   LearningRate 0.0091   Epoch: 13   Global Step: 579940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:53:39,370-Speed 2646.50 samples/sec   Loss 4.0537   LearningRate 0.0091   Epoch: 13   Global Step: 579950   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:53:43,245-Speed 2643.07 samples/sec   Loss 4.0955   LearningRate 0.0091   Epoch: 13   Global Step: 579960   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:53:47,143-Speed 2627.77 samples/sec   Loss 4.0056   LearningRate 0.0091   Epoch: 13   Global Step: 579970   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:53:51,042-Speed 2627.39 samples/sec   Loss 4.2058   LearningRate 0.0091   Epoch: 13   Global Step: 579980   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:53:54,943-Speed 2625.29 samples/sec   Loss 4.0292   LearningRate 0.0091   Epoch: 13   Global Step: 579990   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:53:58,846-Speed 2624.59 samples/sec   Loss 4.1257   LearningRate 0.0091   Epoch: 13   Global Step: 580000   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:54:42,065-[lfw][580000]XNorm: 23.535062
Training: 2022-04-15 12:54:42,066-[lfw][580000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-15 12:54:42,066-[lfw][580000]Accuracy-Highest: 0.99800
Training: 2022-04-15 12:55:32,145-[cfp_fp][580000]XNorm: 22.038366
Training: 2022-04-15 12:55:32,145-[cfp_fp][580000]Accuracy-Flip: 0.99143+-0.00433
Training: 2022-04-15 12:55:32,146-[cfp_fp][580000]Accuracy-Highest: 0.99143
Training: 2022-04-15 12:56:15,256-[agedb_30][580000]XNorm: 23.660515
Training: 2022-04-15 12:56:15,256-[agedb_30][580000]Accuracy-Flip: 0.98000+-0.00764
Training: 2022-04-15 12:56:15,257-[agedb_30][580000]Accuracy-Highest: 0.98083
Training: 2022-04-15 12:56:19,134-Speed 72.99 samples/sec   Loss 4.0368   LearningRate 0.0091   Epoch: 13   Global Step: 580010   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:56:23,005-Speed 2645.97 samples/sec   Loss 4.1650   LearningRate 0.0090   Epoch: 13   Global Step: 580020   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:56:26,876-Speed 2646.18 samples/sec   Loss 4.0754   LearningRate 0.0090   Epoch: 13   Global Step: 580030   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:56:30,813-Speed 2601.51 samples/sec   Loss 4.0298   LearningRate 0.0090   Epoch: 13   Global Step: 580040   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:56:34,700-Speed 2634.59 samples/sec   Loss 4.0829   LearningRate 0.0090   Epoch: 13   Global Step: 580050   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-04-15 12:56:38,595-Speed 2629.27 samples/sec   Loss 4.0880   LearningRate 0.0090   Epoch: 13   Global Step: 580060   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:56:42,484-Speed 2634.28 samples/sec   Loss 4.0391   LearningRate 0.0090   Epoch: 13   Global Step: 580070   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:56:46,460-Speed 2576.04 samples/sec   Loss 4.1199   LearningRate 0.0090   Epoch: 13   Global Step: 580080   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:56:50,358-Speed 2627.88 samples/sec   Loss 4.0003   LearningRate 0.0090   Epoch: 13   Global Step: 580090   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:56:54,243-Speed 2636.50 samples/sec   Loss 4.0607   LearningRate 0.0090   Epoch: 13   Global Step: 580100   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:56:58,134-Speed 2632.61 samples/sec   Loss 4.0264   LearningRate 0.0090   Epoch: 13   Global Step: 580110   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:02,026-Speed 2631.72 samples/sec   Loss 4.0292   LearningRate 0.0090   Epoch: 13   Global Step: 580120   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:05,918-Speed 2631.52 samples/sec   Loss 4.0798   LearningRate 0.0090   Epoch: 13   Global Step: 580130   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:09,812-Speed 2630.20 samples/sec   Loss 4.0364   LearningRate 0.0090   Epoch: 13   Global Step: 580140   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:13,705-Speed 2630.79 samples/sec   Loss 4.0819   LearningRate 0.0090   Epoch: 13   Global Step: 580150   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:17,602-Speed 2628.84 samples/sec   Loss 4.1812   LearningRate 0.0090   Epoch: 13   Global Step: 580160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:57:21,494-Speed 2631.58 samples/sec   Loss 4.0897   LearningRate 0.0090   Epoch: 13   Global Step: 580170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:57:25,391-Speed 2628.46 samples/sec   Loss 4.1148   LearningRate 0.0090   Epoch: 13   Global Step: 580180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:57:29,280-Speed 2633.94 samples/sec   Loss 4.1474   LearningRate 0.0090   Epoch: 13   Global Step: 580190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:57:33,187-Speed 2621.51 samples/sec   Loss 4.1199   LearningRate 0.0090   Epoch: 13   Global Step: 580200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:57:37,080-Speed 2631.16 samples/sec   Loss 4.1812   LearningRate 0.0090   Epoch: 13   Global Step: 580210   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:57:40,946-Speed 2649.39 samples/sec   Loss 3.9956   LearningRate 0.0090   Epoch: 13   Global Step: 580220   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:44,843-Speed 2628.27 samples/sec   Loss 4.1460   LearningRate 0.0090   Epoch: 13   Global Step: 580230   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:48,736-Speed 2631.68 samples/sec   Loss 4.0560   LearningRate 0.0090   Epoch: 13   Global Step: 580240   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:52,637-Speed 2625.79 samples/sec   Loss 4.0008   LearningRate 0.0090   Epoch: 13   Global Step: 580250   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:57:56,534-Speed 2628.31 samples/sec   Loss 4.0492   LearningRate 0.0090   Epoch: 13   Global Step: 580260   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:58:00,507-Speed 2578.43 samples/sec   Loss 4.1333   LearningRate 0.0090   Epoch: 13   Global Step: 580270   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:58:04,604-Speed 2500.04 samples/sec   Loss 4.0789   LearningRate 0.0090   Epoch: 13   Global Step: 580280   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:58:08,640-Speed 2537.30 samples/sec   Loss 4.0715   LearningRate 0.0090   Epoch: 13   Global Step: 580290   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:58:12,728-Speed 2506.03 samples/sec   Loss 3.8774   LearningRate 0.0090   Epoch: 13   Global Step: 580300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:58:16,804-Speed 2512.74 samples/sec   Loss 4.0842   LearningRate 0.0090   Epoch: 13   Global Step: 580310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 12:58:20,709-Speed 2622.77 samples/sec   Loss 4.0855   LearningRate 0.0090   Epoch: 13   Global Step: 580320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:24,616-Speed 2621.04 samples/sec   Loss 4.1685   LearningRate 0.0090   Epoch: 13   Global Step: 580330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:28,508-Speed 2632.18 samples/sec   Loss 4.0548   LearningRate 0.0090   Epoch: 13   Global Step: 580340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:32,401-Speed 2630.84 samples/sec   Loss 4.0878   LearningRate 0.0090   Epoch: 13   Global Step: 580350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:36,289-Speed 2634.72 samples/sec   Loss 4.1003   LearningRate 0.0090   Epoch: 13   Global Step: 580360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:40,186-Speed 2628.05 samples/sec   Loss 4.0971   LearningRate 0.0090   Epoch: 13   Global Step: 580370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:44,143-Speed 2588.46 samples/sec   Loss 4.1355   LearningRate 0.0090   Epoch: 13   Global Step: 580380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:48,223-Speed 2509.87 samples/sec   Loss 4.0173   LearningRate 0.0090   Epoch: 13   Global Step: 580390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:52,208-Speed 2570.98 samples/sec   Loss 4.1379   LearningRate 0.0090   Epoch: 13   Global Step: 580400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:56,099-Speed 2632.42 samples/sec   Loss 4.0539   LearningRate 0.0090   Epoch: 13   Global Step: 580410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:58:59,990-Speed 2632.28 samples/sec   Loss 4.0315   LearningRate 0.0090   Epoch: 13   Global Step: 580420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 12:59:03,862-Speed 2645.22 samples/sec   Loss 4.0786   LearningRate 0.0090   Epoch: 13   Global Step: 580430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:07,827-Speed 2583.72 samples/sec   Loss 4.1048   LearningRate 0.0090   Epoch: 13   Global Step: 580440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:11,753-Speed 2608.47 samples/sec   Loss 4.0297   LearningRate 0.0090   Epoch: 13   Global Step: 580450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:15,651-Speed 2627.41 samples/sec   Loss 4.1229   LearningRate 0.0090   Epoch: 13   Global Step: 580460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:19,546-Speed 2629.23 samples/sec   Loss 4.1319   LearningRate 0.0090   Epoch: 13   Global Step: 580470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:23,482-Speed 2602.80 samples/sec   Loss 4.1442   LearningRate 0.0090   Epoch: 13   Global Step: 580480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:27,401-Speed 2613.87 samples/sec   Loss 4.0841   LearningRate 0.0090   Epoch: 13   Global Step: 580490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:31,314-Speed 2617.49 samples/sec   Loss 4.0461   LearningRate 0.0090   Epoch: 13   Global Step: 580500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:35,218-Speed 2623.53 samples/sec   Loss 4.0540   LearningRate 0.0090   Epoch: 13   Global Step: 580510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:39,119-Speed 2625.81 samples/sec   Loss 4.0326   LearningRate 0.0090   Epoch: 13   Global Step: 580520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:42,991-Speed 2645.60 samples/sec   Loss 4.0558   LearningRate 0.0090   Epoch: 13   Global Step: 580530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:46,919-Speed 2607.17 samples/sec   Loss 4.0218   LearningRate 0.0090   Epoch: 13   Global Step: 580540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:50,814-Speed 2629.18 samples/sec   Loss 4.0280   LearningRate 0.0090   Epoch: 13   Global Step: 580550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:54,730-Speed 2615.54 samples/sec   Loss 4.0680   LearningRate 0.0090   Epoch: 13   Global Step: 580560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 12:59:58,637-Speed 2621.83 samples/sec   Loss 4.0245   LearningRate 0.0090   Epoch: 13   Global Step: 580570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:02,530-Speed 2630.94 samples/sec   Loss 4.0355   LearningRate 0.0090   Epoch: 13   Global Step: 580580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:06,426-Speed 2629.55 samples/sec   Loss 4.1067   LearningRate 0.0090   Epoch: 13   Global Step: 580590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:10,321-Speed 2629.17 samples/sec   Loss 3.9839   LearningRate 0.0090   Epoch: 13   Global Step: 580600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:14,219-Speed 2627.71 samples/sec   Loss 4.0339   LearningRate 0.0090   Epoch: 13   Global Step: 580610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:18,128-Speed 2620.35 samples/sec   Loss 4.0317   LearningRate 0.0090   Epoch: 13   Global Step: 580620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:22,026-Speed 2627.93 samples/sec   Loss 4.1855   LearningRate 0.0090   Epoch: 13   Global Step: 580630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:00:25,922-Speed 2629.21 samples/sec   Loss 4.1259   LearningRate 0.0090   Epoch: 13   Global Step: 580640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:29,832-Speed 2618.92 samples/sec   Loss 4.0763   LearningRate 0.0090   Epoch: 13   Global Step: 580650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:33,747-Speed 2616.71 samples/sec   Loss 4.0299   LearningRate 0.0090   Epoch: 13   Global Step: 580660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:37,648-Speed 2626.24 samples/sec   Loss 4.1130   LearningRate 0.0090   Epoch: 13   Global Step: 580670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:41,548-Speed 2626.14 samples/sec   Loss 4.1370   LearningRate 0.0090   Epoch: 13   Global Step: 580680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:45,450-Speed 2625.41 samples/sec   Loss 4.0827   LearningRate 0.0090   Epoch: 13   Global Step: 580690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:00:49,354-Speed 2623.44 samples/sec   Loss 4.0219   LearningRate 0.0090   Epoch: 13   Global Step: 580700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:01:10,616-Speed 481.67 samples/sec   Loss 3.9939   LearningRate 0.0090   Epoch: 14   Global Step: 580710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:01:14,496-Speed 2640.46 samples/sec   Loss 4.0927   LearningRate 0.0090   Epoch: 14   Global Step: 580720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:01:18,431-Speed 2602.84 samples/sec   Loss 4.1343   LearningRate 0.0090   Epoch: 14   Global Step: 580730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:01:22,338-Speed 2621.69 samples/sec   Loss 4.0022   LearningRate 0.0090   Epoch: 14   Global Step: 580740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:01:26,225-Speed 2635.30 samples/sec   Loss 4.0888   LearningRate 0.0090   Epoch: 14   Global Step: 580750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:01:30,112-Speed 2634.49 samples/sec   Loss 4.0637   LearningRate 0.0090   Epoch: 14   Global Step: 580760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:01:33,994-Speed 2638.34 samples/sec   Loss 4.1179   LearningRate 0.0090   Epoch: 14   Global Step: 580770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:01:37,875-Speed 2639.60 samples/sec   Loss 4.1188   LearningRate 0.0090   Epoch: 14   Global Step: 580780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:01:41,767-Speed 2631.63 samples/sec   Loss 4.1257   LearningRate 0.0090   Epoch: 14   Global Step: 580790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:01:45,657-Speed 2633.97 samples/sec   Loss 4.1550   LearningRate 0.0090   Epoch: 14   Global Step: 580800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:01:49,551-Speed 2630.25 samples/sec   Loss 4.1440   LearningRate 0.0090   Epoch: 14   Global Step: 580810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:01:53,436-Speed 2636.37 samples/sec   Loss 4.1401   LearningRate 0.0090   Epoch: 14   Global Step: 580820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:01:57,325-Speed 2633.55 samples/sec   Loss 4.0687   LearningRate 0.0090   Epoch: 14   Global Step: 580830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:01,224-Speed 2626.63 samples/sec   Loss 4.0096   LearningRate 0.0090   Epoch: 14   Global Step: 580840   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:05,125-Speed 2625.87 samples/sec   Loss 4.0558   LearningRate 0.0090   Epoch: 14   Global Step: 580850   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:09,022-Speed 2628.71 samples/sec   Loss 4.0123   LearningRate 0.0090   Epoch: 14   Global Step: 580860   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:13,006-Speed 2571.36 samples/sec   Loss 4.0783   LearningRate 0.0090   Epoch: 14   Global Step: 580870   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:16,921-Speed 2615.72 samples/sec   Loss 4.0728   LearningRate 0.0090   Epoch: 14   Global Step: 580880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:20,838-Speed 2615.43 samples/sec   Loss 4.0853   LearningRate 0.0090   Epoch: 14   Global Step: 580890   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:24,738-Speed 2626.11 samples/sec   Loss 4.1103   LearningRate 0.0090   Epoch: 14   Global Step: 580900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:28,632-Speed 2630.47 samples/sec   Loss 3.9753   LearningRate 0.0090   Epoch: 14   Global Step: 580910   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:02:32,525-Speed 2630.59 samples/sec   Loss 3.9478   LearningRate 0.0090   Epoch: 14   Global Step: 580920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:02:36,423-Speed 2627.91 samples/sec   Loss 4.0831   LearningRate 0.0090   Epoch: 14   Global Step: 580930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:02:40,376-Speed 2591.15 samples/sec   Loss 4.0631   LearningRate 0.0090   Epoch: 14   Global Step: 580940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:02:44,290-Speed 2617.28 samples/sec   Loss 4.0635   LearningRate 0.0090   Epoch: 14   Global Step: 580950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:02:48,236-Speed 2595.37 samples/sec   Loss 4.0503   LearningRate 0.0090   Epoch: 14   Global Step: 580960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:02:52,136-Speed 2627.33 samples/sec   Loss 4.0123   LearningRate 0.0090   Epoch: 14   Global Step: 580970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:02:56,031-Speed 2629.40 samples/sec   Loss 4.0641   LearningRate 0.0090   Epoch: 14   Global Step: 580980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:02:59,927-Speed 2628.54 samples/sec   Loss 4.0841   LearningRate 0.0090   Epoch: 14   Global Step: 580990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:03:03,802-Speed 2642.94 samples/sec   Loss 4.0421   LearningRate 0.0090   Epoch: 14   Global Step: 581000   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:07,716-Speed 2616.91 samples/sec   Loss 4.1545   LearningRate 0.0090   Epoch: 14   Global Step: 581010   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:11,610-Speed 2630.43 samples/sec   Loss 4.0433   LearningRate 0.0090   Epoch: 14   Global Step: 581020   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:15,509-Speed 2627.41 samples/sec   Loss 4.0277   LearningRate 0.0090   Epoch: 14   Global Step: 581030   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:19,401-Speed 2631.63 samples/sec   Loss 3.9746   LearningRate 0.0090   Epoch: 14   Global Step: 581040   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:23,296-Speed 2630.01 samples/sec   Loss 4.0340   LearningRate 0.0090   Epoch: 14   Global Step: 581050   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:27,186-Speed 2632.86 samples/sec   Loss 3.9820   LearningRate 0.0090   Epoch: 14   Global Step: 581060   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:31,078-Speed 2631.61 samples/sec   Loss 3.9731   LearningRate 0.0090   Epoch: 14   Global Step: 581070   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:34,978-Speed 2626.10 samples/sec   Loss 4.0227   LearningRate 0.0090   Epoch: 14   Global Step: 581080   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:38,870-Speed 2631.49 samples/sec   Loss 4.0045   LearningRate 0.0090   Epoch: 14   Global Step: 581090   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:03:42,765-Speed 2629.61 samples/sec   Loss 4.0930   LearningRate 0.0090   Epoch: 14   Global Step: 581100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:03:46,657-Speed 2631.48 samples/sec   Loss 4.0043   LearningRate 0.0090   Epoch: 14   Global Step: 581110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:03:50,550-Speed 2630.98 samples/sec   Loss 4.1235   LearningRate 0.0090   Epoch: 14   Global Step: 581120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:03:54,448-Speed 2627.80 samples/sec   Loss 4.0238   LearningRate 0.0090   Epoch: 14   Global Step: 581130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:03:58,347-Speed 2627.45 samples/sec   Loss 4.0569   LearningRate 0.0090   Epoch: 14   Global Step: 581140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:04:02,254-Speed 2621.41 samples/sec   Loss 4.0786   LearningRate 0.0090   Epoch: 14   Global Step: 581150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:04:06,163-Speed 2619.74 samples/sec   Loss 4.0869   LearningRate 0.0090   Epoch: 14   Global Step: 581160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:04:10,059-Speed 2629.04 samples/sec   Loss 4.0791   LearningRate 0.0090   Epoch: 14   Global Step: 581170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:04:13,958-Speed 2627.53 samples/sec   Loss 4.1513   LearningRate 0.0090   Epoch: 14   Global Step: 581180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:04:17,852-Speed 2630.11 samples/sec   Loss 4.0542   LearningRate 0.0090   Epoch: 14   Global Step: 581190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:04:21,757-Speed 2622.87 samples/sec   Loss 4.0654   LearningRate 0.0090   Epoch: 14   Global Step: 581200   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:25,654-Speed 2628.33 samples/sec   Loss 4.0406   LearningRate 0.0090   Epoch: 14   Global Step: 581210   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:29,556-Speed 2625.26 samples/sec   Loss 3.9799   LearningRate 0.0090   Epoch: 14   Global Step: 581220   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:33,455-Speed 2627.07 samples/sec   Loss 3.9928   LearningRate 0.0090   Epoch: 14   Global Step: 581230   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:37,346-Speed 2631.90 samples/sec   Loss 4.1384   LearningRate 0.0090   Epoch: 14   Global Step: 581240   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:41,258-Speed 2618.24 samples/sec   Loss 4.0479   LearningRate 0.0090   Epoch: 14   Global Step: 581250   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:45,158-Speed 2626.16 samples/sec   Loss 3.8933   LearningRate 0.0090   Epoch: 14   Global Step: 581260   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:49,056-Speed 2627.76 samples/sec   Loss 3.9286   LearningRate 0.0090   Epoch: 14   Global Step: 581270   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:52,949-Speed 2631.36 samples/sec   Loss 3.9805   LearningRate 0.0090   Epoch: 14   Global Step: 581280   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:04:56,845-Speed 2629.04 samples/sec   Loss 4.0178   LearningRate 0.0090   Epoch: 14   Global Step: 581290   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:05:00,742-Speed 2628.58 samples/sec   Loss 4.0654   LearningRate 0.0090   Epoch: 14   Global Step: 581300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:04,645-Speed 2623.95 samples/sec   Loss 4.0863   LearningRate 0.0090   Epoch: 14   Global Step: 581310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:08,538-Speed 2631.06 samples/sec   Loss 3.9770   LearningRate 0.0090   Epoch: 14   Global Step: 581320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:12,433-Speed 2629.33 samples/sec   Loss 4.1118   LearningRate 0.0090   Epoch: 14   Global Step: 581330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:16,380-Speed 2595.46 samples/sec   Loss 3.9929   LearningRate 0.0090   Epoch: 14   Global Step: 581340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:20,527-Speed 2469.79 samples/sec   Loss 4.0504   LearningRate 0.0090   Epoch: 14   Global Step: 581350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:24,447-Speed 2613.09 samples/sec   Loss 4.0318   LearningRate 0.0090   Epoch: 14   Global Step: 581360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:28,355-Speed 2621.09 samples/sec   Loss 4.0034   LearningRate 0.0090   Epoch: 14   Global Step: 581370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:32,253-Speed 2627.72 samples/sec   Loss 4.0715   LearningRate 0.0090   Epoch: 14   Global Step: 581380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:36,145-Speed 2631.94 samples/sec   Loss 4.0662   LearningRate 0.0090   Epoch: 14   Global Step: 581390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:40,039-Speed 2630.33 samples/sec   Loss 3.9844   LearningRate 0.0089   Epoch: 14   Global Step: 581400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:05:43,928-Speed 2633.43 samples/sec   Loss 3.9828   LearningRate 0.0089   Epoch: 14   Global Step: 581410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:47,823-Speed 2629.47 samples/sec   Loss 4.1321   LearningRate 0.0089   Epoch: 14   Global Step: 581420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:51,717-Speed 2630.32 samples/sec   Loss 3.9851   LearningRate 0.0089   Epoch: 14   Global Step: 581430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:55,610-Speed 2631.16 samples/sec   Loss 4.0610   LearningRate 0.0089   Epoch: 14   Global Step: 581440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:05:59,506-Speed 2629.32 samples/sec   Loss 3.9956   LearningRate 0.0089   Epoch: 14   Global Step: 581450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:06:03,406-Speed 2626.56 samples/sec   Loss 4.0074   LearningRate 0.0089   Epoch: 14   Global Step: 581460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:06:07,334-Speed 2607.69 samples/sec   Loss 3.9607   LearningRate 0.0089   Epoch: 14   Global Step: 581470   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:11,264-Speed 2605.95 samples/sec   Loss 3.9923   LearningRate 0.0089   Epoch: 14   Global Step: 581480   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:15,160-Speed 2629.07 samples/sec   Loss 4.0600   LearningRate 0.0089   Epoch: 14   Global Step: 581490   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:19,214-Speed 2526.10 samples/sec   Loss 3.9621   LearningRate 0.0089   Epoch: 14   Global Step: 581500   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:23,166-Speed 2591.85 samples/sec   Loss 4.0449   LearningRate 0.0089   Epoch: 14   Global Step: 581510   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:27,060-Speed 2629.79 samples/sec   Loss 4.1348   LearningRate 0.0089   Epoch: 14   Global Step: 581520   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:30,953-Speed 2631.30 samples/sec   Loss 4.0378   LearningRate 0.0089   Epoch: 14   Global Step: 581530   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:34,847-Speed 2631.04 samples/sec   Loss 3.9726   LearningRate 0.0089   Epoch: 14   Global Step: 581540   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:38,740-Speed 2631.00 samples/sec   Loss 3.9866   LearningRate 0.0089   Epoch: 14   Global Step: 581550   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:42,632-Speed 2631.45 samples/sec   Loss 4.0447   LearningRate 0.0089   Epoch: 14   Global Step: 581560   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:06:46,547-Speed 2616.77 samples/sec   Loss 4.0841   LearningRate 0.0089   Epoch: 14   Global Step: 581570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:06:50,441-Speed 2630.71 samples/sec   Loss 4.1003   LearningRate 0.0089   Epoch: 14   Global Step: 581580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:06:54,333-Speed 2631.82 samples/sec   Loss 4.0904   LearningRate 0.0089   Epoch: 14   Global Step: 581590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:06:58,230-Speed 2628.37 samples/sec   Loss 4.1021   LearningRate 0.0089   Epoch: 14   Global Step: 581600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:02,132-Speed 2624.60 samples/sec   Loss 3.9813   LearningRate 0.0089   Epoch: 14   Global Step: 581610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:06,028-Speed 2629.06 samples/sec   Loss 3.9871   LearningRate 0.0089   Epoch: 14   Global Step: 581620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:09,978-Speed 2593.04 samples/sec   Loss 4.0961   LearningRate 0.0089   Epoch: 14   Global Step: 581630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:13,876-Speed 2628.55 samples/sec   Loss 4.0076   LearningRate 0.0089   Epoch: 14   Global Step: 581640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:17,775-Speed 2626.78 samples/sec   Loss 4.0433   LearningRate 0.0089   Epoch: 14   Global Step: 581650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:21,675-Speed 2626.63 samples/sec   Loss 4.0654   LearningRate 0.0089   Epoch: 14   Global Step: 581660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:25,549-Speed 2644.37 samples/sec   Loss 4.0206   LearningRate 0.0089   Epoch: 14   Global Step: 581670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:29,452-Speed 2625.10 samples/sec   Loss 4.0342   LearningRate 0.0089   Epoch: 14   Global Step: 581680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:33,375-Speed 2610.48 samples/sec   Loss 4.0833   LearningRate 0.0089   Epoch: 14   Global Step: 581690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:37,272-Speed 2628.20 samples/sec   Loss 3.9992   LearningRate 0.0089   Epoch: 14   Global Step: 581700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:41,164-Speed 2631.97 samples/sec   Loss 4.0177   LearningRate 0.0089   Epoch: 14   Global Step: 581710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:45,072-Speed 2622.34 samples/sec   Loss 4.0384   LearningRate 0.0089   Epoch: 14   Global Step: 581720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:48,985-Speed 2618.09 samples/sec   Loss 4.0412   LearningRate 0.0089   Epoch: 14   Global Step: 581730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:07:52,905-Speed 2612.56 samples/sec   Loss 4.1611   LearningRate 0.0089   Epoch: 14   Global Step: 581740   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:07:56,798-Speed 2631.82 samples/sec   Loss 3.9819   LearningRate 0.0089   Epoch: 14   Global Step: 581750   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:00,715-Speed 2614.69 samples/sec   Loss 3.9482   LearningRate 0.0089   Epoch: 14   Global Step: 581760   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:04,610-Speed 2629.52 samples/sec   Loss 4.0789   LearningRate 0.0089   Epoch: 14   Global Step: 581770   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:08,502-Speed 2631.62 samples/sec   Loss 4.0746   LearningRate 0.0089   Epoch: 14   Global Step: 581780   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:12,393-Speed 2632.32 samples/sec   Loss 4.0302   LearningRate 0.0089   Epoch: 14   Global Step: 581790   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:16,288-Speed 2629.77 samples/sec   Loss 4.0188   LearningRate 0.0089   Epoch: 14   Global Step: 581800   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:20,189-Speed 2625.88 samples/sec   Loss 4.0393   LearningRate 0.0089   Epoch: 14   Global Step: 581810   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:24,096-Speed 2621.99 samples/sec   Loss 4.0485   LearningRate 0.0089   Epoch: 14   Global Step: 581820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:27,995-Speed 2626.70 samples/sec   Loss 4.0837   LearningRate 0.0089   Epoch: 14   Global Step: 581830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:31,903-Speed 2621.60 samples/sec   Loss 3.9895   LearningRate 0.0089   Epoch: 14   Global Step: 581840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:08:35,778-Speed 2642.85 samples/sec   Loss 4.1007   LearningRate 0.0089   Epoch: 14   Global Step: 581850   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:39,684-Speed 2622.03 samples/sec   Loss 4.1075   LearningRate 0.0089   Epoch: 14   Global Step: 581860   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:43,585-Speed 2625.95 samples/sec   Loss 4.0629   LearningRate 0.0089   Epoch: 14   Global Step: 581870   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:47,477-Speed 2631.84 samples/sec   Loss 4.0191   LearningRate 0.0089   Epoch: 14   Global Step: 581880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:51,380-Speed 2625.00 samples/sec   Loss 4.1009   LearningRate 0.0089   Epoch: 14   Global Step: 581890   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:55,278-Speed 2627.33 samples/sec   Loss 4.0476   LearningRate 0.0089   Epoch: 14   Global Step: 581900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:08:59,221-Speed 2606.47 samples/sec   Loss 4.0155   LearningRate 0.0089   Epoch: 14   Global Step: 581910   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:03,121-Speed 2626.55 samples/sec   Loss 4.0471   LearningRate 0.0089   Epoch: 14   Global Step: 581920   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:07,013-Speed 2631.48 samples/sec   Loss 4.1109   LearningRate 0.0089   Epoch: 14   Global Step: 581930   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:10,905-Speed 2631.71 samples/sec   Loss 4.0733   LearningRate 0.0089   Epoch: 14   Global Step: 581940   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:14,805-Speed 2626.48 samples/sec   Loss 4.0581   LearningRate 0.0089   Epoch: 14   Global Step: 581950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:09:18,711-Speed 2622.64 samples/sec   Loss 4.0202   LearningRate 0.0089   Epoch: 14   Global Step: 581960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:09:22,608-Speed 2628.66 samples/sec   Loss 4.1138   LearningRate 0.0089   Epoch: 14   Global Step: 581970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:09:26,502-Speed 2630.48 samples/sec   Loss 4.0594   LearningRate 0.0089   Epoch: 14   Global Step: 581980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:09:30,401-Speed 2627.14 samples/sec   Loss 3.9357   LearningRate 0.0089   Epoch: 14   Global Step: 581990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:09:34,295-Speed 2629.74 samples/sec   Loss 4.0949   LearningRate 0.0089   Epoch: 14   Global Step: 582000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:09:38,174-Speed 2640.50 samples/sec   Loss 4.0711   LearningRate 0.0089   Epoch: 14   Global Step: 582010   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:42,071-Speed 2629.37 samples/sec   Loss 3.9830   LearningRate 0.0089   Epoch: 14   Global Step: 582020   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:45,964-Speed 2630.80 samples/sec   Loss 3.9698   LearningRate 0.0089   Epoch: 14   Global Step: 582030   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:49,863-Speed 2626.83 samples/sec   Loss 4.0337   LearningRate 0.0089   Epoch: 14   Global Step: 582040   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:53,760-Speed 2628.40 samples/sec   Loss 4.0661   LearningRate 0.0089   Epoch: 14   Global Step: 582050   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:09:57,654-Speed 2630.69 samples/sec   Loss 4.0007   LearningRate 0.0089   Epoch: 14   Global Step: 582060   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:10:01,547-Speed 2630.94 samples/sec   Loss 4.0538   LearningRate 0.0089   Epoch: 14   Global Step: 582070   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:10:05,464-Speed 2614.73 samples/sec   Loss 3.9494   LearningRate 0.0089   Epoch: 14   Global Step: 582080   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:10:09,393-Speed 2607.35 samples/sec   Loss 4.1178   LearningRate 0.0089   Epoch: 14   Global Step: 582090   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:10:13,396-Speed 2558.58 samples/sec   Loss 4.0562   LearningRate 0.0089   Epoch: 14   Global Step: 582100   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:10:17,355-Speed 2587.09 samples/sec   Loss 4.0533   LearningRate 0.0089   Epoch: 14   Global Step: 582110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:21,250-Speed 2629.88 samples/sec   Loss 3.9665   LearningRate 0.0089   Epoch: 14   Global Step: 582120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:25,157-Speed 2621.84 samples/sec   Loss 4.0902   LearningRate 0.0089   Epoch: 14   Global Step: 582130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:29,054-Speed 2629.08 samples/sec   Loss 4.1180   LearningRate 0.0089   Epoch: 14   Global Step: 582140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:33,038-Speed 2570.81 samples/sec   Loss 3.9319   LearningRate 0.0089   Epoch: 14   Global Step: 582150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:36,958-Speed 2613.31 samples/sec   Loss 4.1113   LearningRate 0.0089   Epoch: 14   Global Step: 582160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:40,861-Speed 2624.01 samples/sec   Loss 3.9807   LearningRate 0.0089   Epoch: 14   Global Step: 582170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:44,779-Speed 2614.40 samples/sec   Loss 4.0049   LearningRate 0.0089   Epoch: 14   Global Step: 582180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:48,672-Speed 2630.68 samples/sec   Loss 4.0675   LearningRate 0.0089   Epoch: 14   Global Step: 582190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:52,563-Speed 2632.40 samples/sec   Loss 4.0400   LearningRate 0.0089   Epoch: 14   Global Step: 582200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:10:56,457-Speed 2630.25 samples/sec   Loss 3.9740   LearningRate 0.0089   Epoch: 14   Global Step: 582210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:11:00,375-Speed 2614.88 samples/sec   Loss 4.0663   LearningRate 0.0089   Epoch: 14   Global Step: 582220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:11:04,268-Speed 2631.16 samples/sec   Loss 3.9996   LearningRate 0.0089   Epoch: 14   Global Step: 582230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:11:08,183-Speed 2616.68 samples/sec   Loss 4.0779   LearningRate 0.0089   Epoch: 14   Global Step: 582240   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:12,082-Speed 2626.91 samples/sec   Loss 4.0750   LearningRate 0.0089   Epoch: 14   Global Step: 582250   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:15,986-Speed 2623.59 samples/sec   Loss 3.9907   LearningRate 0.0089   Epoch: 14   Global Step: 582260   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:19,881-Speed 2629.33 samples/sec   Loss 4.0662   LearningRate 0.0089   Epoch: 14   Global Step: 582270   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:23,776-Speed 2629.87 samples/sec   Loss 4.1043   LearningRate 0.0089   Epoch: 14   Global Step: 582280   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:27,667-Speed 2632.62 samples/sec   Loss 3.9944   LearningRate 0.0089   Epoch: 14   Global Step: 582290   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:31,563-Speed 2629.20 samples/sec   Loss 4.0409   LearningRate 0.0089   Epoch: 14   Global Step: 582300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:35,454-Speed 2632.28 samples/sec   Loss 4.0879   LearningRate 0.0089   Epoch: 14   Global Step: 582310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:39,348-Speed 2630.35 samples/sec   Loss 4.0530   LearningRate 0.0089   Epoch: 14   Global Step: 582320   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:43,258-Speed 2619.92 samples/sec   Loss 4.0117   LearningRate 0.0089   Epoch: 14   Global Step: 582330   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:11:47,161-Speed 2624.48 samples/sec   Loss 4.0163   LearningRate 0.0089   Epoch: 14   Global Step: 582340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:11:51,064-Speed 2623.85 samples/sec   Loss 3.9495   LearningRate 0.0089   Epoch: 14   Global Step: 582350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:11:54,963-Speed 2626.95 samples/sec   Loss 3.9748   LearningRate 0.0089   Epoch: 14   Global Step: 582360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:11:58,858-Speed 2629.71 samples/sec   Loss 4.0544   LearningRate 0.0089   Epoch: 14   Global Step: 582370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:12:02,755-Speed 2628.47 samples/sec   Loss 4.0584   LearningRate 0.0089   Epoch: 14   Global Step: 582380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:12:06,647-Speed 2632.00 samples/sec   Loss 4.0283   LearningRate 0.0089   Epoch: 14   Global Step: 582390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:12:10,546-Speed 2627.15 samples/sec   Loss 4.0266   LearningRate 0.0089   Epoch: 14   Global Step: 582400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:12:14,437-Speed 2632.21 samples/sec   Loss 4.0459   LearningRate 0.0089   Epoch: 14   Global Step: 582410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:12:18,337-Speed 2625.99 samples/sec   Loss 4.0575   LearningRate 0.0089   Epoch: 14   Global Step: 582420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:12:22,234-Speed 2628.86 samples/sec   Loss 3.9690   LearningRate 0.0089   Epoch: 14   Global Step: 582430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:12:26,127-Speed 2630.78 samples/sec   Loss 3.9783   LearningRate 0.0089   Epoch: 14   Global Step: 582440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:30,020-Speed 2631.37 samples/sec   Loss 4.0809   LearningRate 0.0089   Epoch: 14   Global Step: 582450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:33,918-Speed 2628.38 samples/sec   Loss 4.0830   LearningRate 0.0089   Epoch: 14   Global Step: 582460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:37,818-Speed 2626.07 samples/sec   Loss 4.0922   LearningRate 0.0089   Epoch: 14   Global Step: 582470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:41,715-Speed 2628.24 samples/sec   Loss 4.0162   LearningRate 0.0089   Epoch: 14   Global Step: 582480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:45,611-Speed 2629.26 samples/sec   Loss 4.0775   LearningRate 0.0089   Epoch: 14   Global Step: 582490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:49,519-Speed 2621.03 samples/sec   Loss 4.0569   LearningRate 0.0089   Epoch: 14   Global Step: 582500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:53,419-Speed 2626.42 samples/sec   Loss 4.0832   LearningRate 0.0089   Epoch: 14   Global Step: 582510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:12:57,296-Speed 2641.66 samples/sec   Loss 4.0690   LearningRate 0.0089   Epoch: 14   Global Step: 582520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:01,195-Speed 2626.50 samples/sec   Loss 4.0861   LearningRate 0.0089   Epoch: 14   Global Step: 582530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:05,120-Speed 2609.70 samples/sec   Loss 4.0067   LearningRate 0.0089   Epoch: 14   Global Step: 582540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:09,018-Speed 2627.44 samples/sec   Loss 3.9348   LearningRate 0.0089   Epoch: 14   Global Step: 582550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:12,911-Speed 2630.83 samples/sec   Loss 4.0327   LearningRate 0.0089   Epoch: 14   Global Step: 582560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:16,830-Speed 2614.14 samples/sec   Loss 4.0260   LearningRate 0.0089   Epoch: 14   Global Step: 582570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:20,814-Speed 2571.50 samples/sec   Loss 4.0137   LearningRate 0.0089   Epoch: 14   Global Step: 582580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:24,711-Speed 2627.92 samples/sec   Loss 4.0064   LearningRate 0.0089   Epoch: 14   Global Step: 582590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:28,613-Speed 2625.46 samples/sec   Loss 4.0392   LearningRate 0.0089   Epoch: 14   Global Step: 582600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:32,514-Speed 2624.89 samples/sec   Loss 4.0699   LearningRate 0.0089   Epoch: 14   Global Step: 582610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:36,425-Speed 2619.16 samples/sec   Loss 4.0348   LearningRate 0.0089   Epoch: 14   Global Step: 582620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:13:40,305-Speed 2639.56 samples/sec   Loss 3.9659   LearningRate 0.0089   Epoch: 14   Global Step: 582630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:44,211-Speed 2629.76 samples/sec   Loss 4.1219   LearningRate 0.0089   Epoch: 14   Global Step: 582640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:48,141-Speed 2606.66 samples/sec   Loss 3.9605   LearningRate 0.0089   Epoch: 14   Global Step: 582650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:52,037-Speed 2629.06 samples/sec   Loss 4.0726   LearningRate 0.0089   Epoch: 14   Global Step: 582660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:55,942-Speed 2622.62 samples/sec   Loss 4.0667   LearningRate 0.0089   Epoch: 14   Global Step: 582670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:13:59,841-Speed 2627.06 samples/sec   Loss 3.9833   LearningRate 0.0089   Epoch: 14   Global Step: 582680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:03,741-Speed 2626.60 samples/sec   Loss 3.9049   LearningRate 0.0089   Epoch: 14   Global Step: 582690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:07,642-Speed 2624.93 samples/sec   Loss 4.0481   LearningRate 0.0089   Epoch: 14   Global Step: 582700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:11,540-Speed 2628.03 samples/sec   Loss 4.0635   LearningRate 0.0089   Epoch: 14   Global Step: 582710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:15,442-Speed 2625.05 samples/sec   Loss 4.0076   LearningRate 0.0089   Epoch: 14   Global Step: 582720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:19,339-Speed 2628.60 samples/sec   Loss 3.9943   LearningRate 0.0089   Epoch: 14   Global Step: 582730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:14:23,238-Speed 2626.73 samples/sec   Loss 4.0297   LearningRate 0.0089   Epoch: 14   Global Step: 582740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:14:27,116-Speed 2641.89 samples/sec   Loss 4.0169   LearningRate 0.0089   Epoch: 14   Global Step: 582750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:31,015-Speed 2626.80 samples/sec   Loss 4.0535   LearningRate 0.0089   Epoch: 14   Global Step: 582760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:34,916-Speed 2625.30 samples/sec   Loss 4.0625   LearningRate 0.0089   Epoch: 14   Global Step: 582770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:38,815-Speed 2626.55 samples/sec   Loss 3.9538   LearningRate 0.0089   Epoch: 14   Global Step: 582780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:42,712-Speed 2628.54 samples/sec   Loss 3.9862   LearningRate 0.0088   Epoch: 14   Global Step: 582790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:46,606-Speed 2630.34 samples/sec   Loss 3.9262   LearningRate 0.0088   Epoch: 14   Global Step: 582800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:50,533-Speed 2608.63 samples/sec   Loss 4.0591   LearningRate 0.0088   Epoch: 14   Global Step: 582810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:54,434-Speed 2625.33 samples/sec   Loss 4.0493   LearningRate 0.0088   Epoch: 14   Global Step: 582820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:14:58,328-Speed 2631.03 samples/sec   Loss 4.0628   LearningRate 0.0088   Epoch: 14   Global Step: 582830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:15:02,224-Speed 2628.94 samples/sec   Loss 3.9914   LearningRate 0.0088   Epoch: 14   Global Step: 582840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:15:06,101-Speed 2641.63 samples/sec   Loss 3.9860   LearningRate 0.0088   Epoch: 14   Global Step: 582850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:15:10,001-Speed 2626.15 samples/sec   Loss 4.0187   LearningRate 0.0088   Epoch: 14   Global Step: 582860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:15:13,901-Speed 2627.06 samples/sec   Loss 3.9561   LearningRate 0.0088   Epoch: 14   Global Step: 582870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:15:17,774-Speed 2644.17 samples/sec   Loss 4.0954   LearningRate 0.0088   Epoch: 14   Global Step: 582880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:21,670-Speed 2629.28 samples/sec   Loss 4.1077   LearningRate 0.0088   Epoch: 14   Global Step: 582890   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:25,575-Speed 2623.03 samples/sec   Loss 3.9833   LearningRate 0.0088   Epoch: 14   Global Step: 582900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:29,469-Speed 2630.92 samples/sec   Loss 4.1168   LearningRate 0.0088   Epoch: 14   Global Step: 582910   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:33,373-Speed 2623.44 samples/sec   Loss 3.9855   LearningRate 0.0088   Epoch: 14   Global Step: 582920   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:37,276-Speed 2624.02 samples/sec   Loss 4.0637   LearningRate 0.0088   Epoch: 14   Global Step: 582930   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:41,187-Speed 2618.99 samples/sec   Loss 4.0087   LearningRate 0.0088   Epoch: 14   Global Step: 582940   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:45,088-Speed 2625.72 samples/sec   Loss 3.9908   LearningRate 0.0088   Epoch: 14   Global Step: 582950   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:49,030-Speed 2598.30 samples/sec   Loss 3.9944   LearningRate 0.0088   Epoch: 14   Global Step: 582960   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:52,930-Speed 2626.51 samples/sec   Loss 3.9835   LearningRate 0.0088   Epoch: 14   Global Step: 582970   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:15:56,828-Speed 2628.01 samples/sec   Loss 4.0362   LearningRate 0.0088   Epoch: 14   Global Step: 582980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:00,911-Speed 2508.64 samples/sec   Loss 4.1555   LearningRate 0.0088   Epoch: 14   Global Step: 582990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:04,918-Speed 2555.94 samples/sec   Loss 4.0225   LearningRate 0.0088   Epoch: 14   Global Step: 583000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:08,811-Speed 2630.96 samples/sec   Loss 4.1157   LearningRate 0.0088   Epoch: 14   Global Step: 583010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:12,711-Speed 2626.30 samples/sec   Loss 3.9672   LearningRate 0.0088   Epoch: 14   Global Step: 583020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:16,625-Speed 2616.88 samples/sec   Loss 4.0348   LearningRate 0.0088   Epoch: 14   Global Step: 583030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:20,530-Speed 2623.04 samples/sec   Loss 3.9989   LearningRate 0.0088   Epoch: 14   Global Step: 583040   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:24,425-Speed 2630.77 samples/sec   Loss 4.0686   LearningRate 0.0088   Epoch: 14   Global Step: 583050   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:28,318-Speed 2630.68 samples/sec   Loss 3.9809   LearningRate 0.0088   Epoch: 14   Global Step: 583060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:32,214-Speed 2628.91 samples/sec   Loss 3.9929   LearningRate 0.0088   Epoch: 14   Global Step: 583070   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:36,114-Speed 2626.37 samples/sec   Loss 4.0824   LearningRate 0.0088   Epoch: 14   Global Step: 583080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:16:39,990-Speed 2641.99 samples/sec   Loss 4.0450   LearningRate 0.0088   Epoch: 14   Global Step: 583090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:43,888-Speed 2627.76 samples/sec   Loss 3.9131   LearningRate 0.0088   Epoch: 14   Global Step: 583100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:47,788-Speed 2626.34 samples/sec   Loss 4.0347   LearningRate 0.0088   Epoch: 14   Global Step: 583110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:51,680-Speed 2631.37 samples/sec   Loss 3.9713   LearningRate 0.0088   Epoch: 14   Global Step: 583120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:55,587-Speed 2621.18 samples/sec   Loss 4.0588   LearningRate 0.0088   Epoch: 14   Global Step: 583130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:16:59,500-Speed 2618.28 samples/sec   Loss 4.0265   LearningRate 0.0088   Epoch: 14   Global Step: 583140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:03,404-Speed 2623.62 samples/sec   Loss 4.0523   LearningRate 0.0088   Epoch: 14   Global Step: 583150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:07,305-Speed 2625.72 samples/sec   Loss 3.9318   LearningRate 0.0088   Epoch: 14   Global Step: 583160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:11,196-Speed 2631.97 samples/sec   Loss 4.0533   LearningRate 0.0088   Epoch: 14   Global Step: 583170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:15,114-Speed 2614.67 samples/sec   Loss 4.0637   LearningRate 0.0088   Epoch: 14   Global Step: 583180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:19,016-Speed 2624.78 samples/sec   Loss 4.0244   LearningRate 0.0088   Epoch: 14   Global Step: 583190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:17:22,912-Speed 2628.93 samples/sec   Loss 3.9875   LearningRate 0.0088   Epoch: 14   Global Step: 583200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:17:26,816-Speed 2623.41 samples/sec   Loss 4.0443   LearningRate 0.0088   Epoch: 14   Global Step: 583210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:17:30,696-Speed 2639.89 samples/sec   Loss 3.9889   LearningRate 0.0088   Epoch: 14   Global Step: 583220   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:34,594-Speed 2627.86 samples/sec   Loss 3.9540   LearningRate 0.0088   Epoch: 14   Global Step: 583230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:38,494-Speed 2626.38 samples/sec   Loss 3.9993   LearningRate 0.0088   Epoch: 14   Global Step: 583240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:42,399-Speed 2622.99 samples/sec   Loss 3.9984   LearningRate 0.0088   Epoch: 14   Global Step: 583250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:46,290-Speed 2632.18 samples/sec   Loss 4.0024   LearningRate 0.0088   Epoch: 14   Global Step: 583260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:50,196-Speed 2621.81 samples/sec   Loss 3.9345   LearningRate 0.0088   Epoch: 14   Global Step: 583270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:54,089-Speed 2631.33 samples/sec   Loss 3.9495   LearningRate 0.0088   Epoch: 14   Global Step: 583280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:17:57,995-Speed 2621.82 samples/sec   Loss 4.0637   LearningRate 0.0088   Epoch: 14   Global Step: 583290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:01,916-Speed 2612.00 samples/sec   Loss 3.9558   LearningRate 0.0088   Epoch: 14   Global Step: 583300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:05,819-Speed 2624.61 samples/sec   Loss 3.9675   LearningRate 0.0088   Epoch: 14   Global Step: 583310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:09,716-Speed 2628.96 samples/sec   Loss 4.0417   LearningRate 0.0088   Epoch: 14   Global Step: 583320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:18:13,627-Speed 2618.70 samples/sec   Loss 4.1008   LearningRate 0.0088   Epoch: 14   Global Step: 583330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:18:17,537-Speed 2619.91 samples/sec   Loss 3.9492   LearningRate 0.0088   Epoch: 14   Global Step: 583340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:18:21,427-Speed 2633.09 samples/sec   Loss 4.0314   LearningRate 0.0088   Epoch: 14   Global Step: 583350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:18:25,300-Speed 2644.23 samples/sec   Loss 4.0405   LearningRate 0.0088   Epoch: 14   Global Step: 583360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:29,201-Speed 2626.01 samples/sec   Loss 3.9560   LearningRate 0.0088   Epoch: 14   Global Step: 583370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:33,111-Speed 2619.09 samples/sec   Loss 3.9562   LearningRate 0.0088   Epoch: 14   Global Step: 583380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:37,024-Speed 2617.66 samples/sec   Loss 4.0282   LearningRate 0.0088   Epoch: 14   Global Step: 583390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:40,927-Speed 2624.08 samples/sec   Loss 4.0203   LearningRate 0.0088   Epoch: 14   Global Step: 583400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:44,825-Speed 2628.38 samples/sec   Loss 3.9583   LearningRate 0.0088   Epoch: 14   Global Step: 583410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:48,716-Speed 2632.21 samples/sec   Loss 4.0929   LearningRate 0.0088   Epoch: 14   Global Step: 583420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:52,607-Speed 2632.67 samples/sec   Loss 4.1107   LearningRate 0.0088   Epoch: 14   Global Step: 583430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:18:56,518-Speed 2618.94 samples/sec   Loss 4.0444   LearningRate 0.0088   Epoch: 14   Global Step: 583440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:00,415-Speed 2628.05 samples/sec   Loss 4.0226   LearningRate 0.0088   Epoch: 14   Global Step: 583450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:04,318-Speed 2623.52 samples/sec   Loss 4.0222   LearningRate 0.0088   Epoch: 14   Global Step: 583460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:19:08,221-Speed 2624.94 samples/sec   Loss 4.0005   LearningRate 0.0088   Epoch: 14   Global Step: 583470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:19:12,091-Speed 2646.72 samples/sec   Loss 4.0486   LearningRate 0.0088   Epoch: 14   Global Step: 583480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:16,015-Speed 2610.65 samples/sec   Loss 4.0580   LearningRate 0.0088   Epoch: 14   Global Step: 583490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:19,911-Speed 2629.62 samples/sec   Loss 3.9332   LearningRate 0.0088   Epoch: 14   Global Step: 583500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:23,850-Speed 2600.28 samples/sec   Loss 3.9524   LearningRate 0.0088   Epoch: 14   Global Step: 583510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:27,774-Speed 2610.08 samples/sec   Loss 4.0773   LearningRate 0.0088   Epoch: 14   Global Step: 583520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:31,671-Speed 2627.95 samples/sec   Loss 4.0490   LearningRate 0.0088   Epoch: 14   Global Step: 583530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:35,571-Speed 2626.24 samples/sec   Loss 4.0436   LearningRate 0.0088   Epoch: 14   Global Step: 583540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:39,466-Speed 2629.90 samples/sec   Loss 4.0121   LearningRate 0.0088   Epoch: 14   Global Step: 583550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:43,364-Speed 2628.26 samples/sec   Loss 4.0771   LearningRate 0.0088   Epoch: 14   Global Step: 583560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:19:47,234-Speed 2646.55 samples/sec   Loss 3.9906   LearningRate 0.0088   Epoch: 14   Global Step: 583570   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:19:51,151-Speed 2615.07 samples/sec   Loss 4.0815   LearningRate 0.0088   Epoch: 14   Global Step: 583580   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:19:55,041-Speed 2632.83 samples/sec   Loss 3.9701   LearningRate 0.0088   Epoch: 14   Global Step: 583590   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:19:58,951-Speed 2620.24 samples/sec   Loss 3.9575   LearningRate 0.0088   Epoch: 14   Global Step: 583600   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:20:02,855-Speed 2630.99 samples/sec   Loss 3.9271   LearningRate 0.0088   Epoch: 14   Global Step: 583610   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:20:06,770-Speed 2615.95 samples/sec   Loss 3.9973   LearningRate 0.0088   Epoch: 14   Global Step: 583620   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:20:10,677-Speed 2621.50 samples/sec   Loss 4.0115   LearningRate 0.0088   Epoch: 14   Global Step: 583630   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:20:14,586-Speed 2621.17 samples/sec   Loss 3.9671   LearningRate 0.0088   Epoch: 14   Global Step: 583640   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:20:18,476-Speed 2632.24 samples/sec   Loss 4.0542   LearningRate 0.0088   Epoch: 14   Global Step: 583650   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:20:22,385-Speed 2620.83 samples/sec   Loss 3.9534   LearningRate 0.0088   Epoch: 14   Global Step: 583660   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:20:26,279-Speed 2630.62 samples/sec   Loss 4.0425   LearningRate 0.0088   Epoch: 14   Global Step: 583670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:30,174-Speed 2630.21 samples/sec   Loss 4.0563   LearningRate 0.0088   Epoch: 14   Global Step: 583680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:34,071-Speed 2627.87 samples/sec   Loss 3.9751   LearningRate 0.0088   Epoch: 14   Global Step: 583690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:37,991-Speed 2612.81 samples/sec   Loss 4.0485   LearningRate 0.0088   Epoch: 14   Global Step: 583700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:41,885-Speed 2630.41 samples/sec   Loss 3.9574   LearningRate 0.0088   Epoch: 14   Global Step: 583710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:45,783-Speed 2628.07 samples/sec   Loss 4.0031   LearningRate 0.0088   Epoch: 14   Global Step: 583720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:49,679-Speed 2628.88 samples/sec   Loss 3.9402   LearningRate 0.0088   Epoch: 14   Global Step: 583730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:53,591-Speed 2618.61 samples/sec   Loss 3.9948   LearningRate 0.0088   Epoch: 14   Global Step: 583740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:20:57,485-Speed 2630.06 samples/sec   Loss 4.0514   LearningRate 0.0088   Epoch: 14   Global Step: 583750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:01,376-Speed 2632.67 samples/sec   Loss 4.1038   LearningRate 0.0088   Epoch: 14   Global Step: 583760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:05,270-Speed 2630.09 samples/sec   Loss 3.9681   LearningRate 0.0088   Epoch: 14   Global Step: 583770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:21:09,166-Speed 2629.36 samples/sec   Loss 4.0383   LearningRate 0.0088   Epoch: 14   Global Step: 583780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:21:13,065-Speed 2627.10 samples/sec   Loss 4.0021   LearningRate 0.0088   Epoch: 14   Global Step: 583790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:21:16,959-Speed 2629.77 samples/sec   Loss 3.9829   LearningRate 0.0088   Epoch: 14   Global Step: 583800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:21:20,830-Speed 2646.53 samples/sec   Loss 3.9771   LearningRate 0.0088   Epoch: 14   Global Step: 583810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:24,731-Speed 2624.99 samples/sec   Loss 3.9579   LearningRate 0.0088   Epoch: 14   Global Step: 583820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:28,645-Speed 2617.10 samples/sec   Loss 3.9582   LearningRate 0.0088   Epoch: 14   Global Step: 583830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:32,542-Speed 2627.86 samples/sec   Loss 3.9308   LearningRate 0.0088   Epoch: 14   Global Step: 583840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:36,440-Speed 2628.10 samples/sec   Loss 4.0675   LearningRate 0.0088   Epoch: 14   Global Step: 583850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:40,338-Speed 2627.57 samples/sec   Loss 3.9595   LearningRate 0.0088   Epoch: 14   Global Step: 583860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:44,231-Speed 2631.41 samples/sec   Loss 3.8900   LearningRate 0.0088   Epoch: 14   Global Step: 583870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:48,128-Speed 2628.07 samples/sec   Loss 4.0879   LearningRate 0.0088   Epoch: 14   Global Step: 583880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:52,024-Speed 2628.61 samples/sec   Loss 4.0583   LearningRate 0.0088   Epoch: 14   Global Step: 583890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:55,920-Speed 2629.36 samples/sec   Loss 3.9808   LearningRate 0.0088   Epoch: 14   Global Step: 583900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:21:59,816-Speed 2628.73 samples/sec   Loss 4.0030   LearningRate 0.0088   Epoch: 14   Global Step: 583910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:22:03,805-Speed 2567.81 samples/sec   Loss 4.0006   LearningRate 0.0088   Epoch: 14   Global Step: 583920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:07,720-Speed 2615.67 samples/sec   Loss 4.0332   LearningRate 0.0088   Epoch: 14   Global Step: 583930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:11,618-Speed 2627.27 samples/sec   Loss 3.9305   LearningRate 0.0088   Epoch: 14   Global Step: 583940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:15,519-Speed 2625.92 samples/sec   Loss 4.0269   LearningRate 0.0088   Epoch: 14   Global Step: 583950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:19,410-Speed 2632.57 samples/sec   Loss 3.9895   LearningRate 0.0088   Epoch: 14   Global Step: 583960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:23,305-Speed 2629.72 samples/sec   Loss 4.1278   LearningRate 0.0088   Epoch: 14   Global Step: 583970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:27,195-Speed 2632.72 samples/sec   Loss 4.0379   LearningRate 0.0088   Epoch: 14   Global Step: 583980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:31,097-Speed 2625.33 samples/sec   Loss 4.0244   LearningRate 0.0088   Epoch: 14   Global Step: 583990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:22:34,986-Speed 2633.74 samples/sec   Loss 3.9839   LearningRate 0.0088   Epoch: 14   Global Step: 584000   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:22:38,885-Speed 2627.23 samples/sec   Loss 4.1097   LearningRate 0.0088   Epoch: 14   Global Step: 584010   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:22:42,777-Speed 2631.65 samples/sec   Loss 3.9691   LearningRate 0.0088   Epoch: 14   Global Step: 584020   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:22:46,664-Speed 2635.01 samples/sec   Loss 3.9763   LearningRate 0.0088   Epoch: 14   Global Step: 584030   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:22:50,560-Speed 2628.75 samples/sec   Loss 3.9879   LearningRate 0.0088   Epoch: 14   Global Step: 584040   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:22:54,458-Speed 2627.87 samples/sec   Loss 4.0119   LearningRate 0.0088   Epoch: 14   Global Step: 584050   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:22:58,351-Speed 2630.89 samples/sec   Loss 3.9791   LearningRate 0.0088   Epoch: 14   Global Step: 584060   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:23:02,248-Speed 2628.51 samples/sec   Loss 4.0346   LearningRate 0.0088   Epoch: 14   Global Step: 584070   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:23:06,147-Speed 2627.26 samples/sec   Loss 4.0239   LearningRate 0.0088   Epoch: 14   Global Step: 584080   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:23:10,039-Speed 2631.85 samples/sec   Loss 3.9731   LearningRate 0.0088   Epoch: 14   Global Step: 584090   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:23:13,935-Speed 2629.12 samples/sec   Loss 3.9660   LearningRate 0.0088   Epoch: 14   Global Step: 584100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:17,829-Speed 2630.65 samples/sec   Loss 3.9087   LearningRate 0.0088   Epoch: 14   Global Step: 584110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:21,724-Speed 2629.39 samples/sec   Loss 4.0349   LearningRate 0.0088   Epoch: 14   Global Step: 584120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:25,620-Speed 2628.74 samples/sec   Loss 4.0406   LearningRate 0.0088   Epoch: 14   Global Step: 584130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:29,524-Speed 2623.47 samples/sec   Loss 4.0421   LearningRate 0.0088   Epoch: 14   Global Step: 584140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:33,418-Speed 2630.93 samples/sec   Loss 3.9858   LearningRate 0.0088   Epoch: 14   Global Step: 584150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:37,315-Speed 2628.38 samples/sec   Loss 3.9616   LearningRate 0.0088   Epoch: 14   Global Step: 584160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:41,213-Speed 2627.17 samples/sec   Loss 4.0600   LearningRate 0.0088   Epoch: 14   Global Step: 584170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:45,115-Speed 2625.34 samples/sec   Loss 3.9703   LearningRate 0.0088   Epoch: 14   Global Step: 584180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:49,005-Speed 2633.46 samples/sec   Loss 4.0417   LearningRate 0.0087   Epoch: 14   Global Step: 584190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:23:52,896-Speed 2631.91 samples/sec   Loss 3.9590   LearningRate 0.0087   Epoch: 14   Global Step: 584200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:23:56,787-Speed 2632.32 samples/sec   Loss 4.0887   LearningRate 0.0087   Epoch: 14   Global Step: 584210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:24:00,746-Speed 2586.78 samples/sec   Loss 4.0135   LearningRate 0.0087   Epoch: 14   Global Step: 584220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:24:04,656-Speed 2620.46 samples/sec   Loss 4.0125   LearningRate 0.0087   Epoch: 14   Global Step: 584230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:24:08,555-Speed 2626.47 samples/sec   Loss 4.0830   LearningRate 0.0087   Epoch: 14   Global Step: 584240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:12,458-Speed 2625.09 samples/sec   Loss 4.0261   LearningRate 0.0087   Epoch: 14   Global Step: 584250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:16,352-Speed 2630.35 samples/sec   Loss 4.0610   LearningRate 0.0087   Epoch: 14   Global Step: 584260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:20,246-Speed 2630.77 samples/sec   Loss 3.9484   LearningRate 0.0087   Epoch: 14   Global Step: 584270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:24,141-Speed 2629.55 samples/sec   Loss 4.0529   LearningRate 0.0087   Epoch: 14   Global Step: 584280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:28,081-Speed 2599.55 samples/sec   Loss 4.0147   LearningRate 0.0087   Epoch: 14   Global Step: 584290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:32,069-Speed 2568.12 samples/sec   Loss 4.0063   LearningRate 0.0087   Epoch: 14   Global Step: 584300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:35,991-Speed 2611.81 samples/sec   Loss 4.0382   LearningRate 0.0087   Epoch: 14   Global Step: 584310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:39,960-Speed 2580.83 samples/sec   Loss 3.9678   LearningRate 0.0087   Epoch: 14   Global Step: 584320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:43,866-Speed 2622.23 samples/sec   Loss 3.9219   LearningRate 0.0087   Epoch: 14   Global Step: 584330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:24:47,769-Speed 2624.93 samples/sec   Loss 4.0508   LearningRate 0.0087   Epoch: 14   Global Step: 584340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:24:51,660-Speed 2632.02 samples/sec   Loss 3.9751   LearningRate 0.0087   Epoch: 14   Global Step: 584350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:24:55,556-Speed 2629.32 samples/sec   Loss 4.1173   LearningRate 0.0087   Epoch: 14   Global Step: 584360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:24:59,452-Speed 2629.43 samples/sec   Loss 4.0172   LearningRate 0.0087   Epoch: 14   Global Step: 584370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:25:03,329-Speed 2641.25 samples/sec   Loss 4.0200   LearningRate 0.0087   Epoch: 14   Global Step: 584380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:25:07,239-Speed 2620.29 samples/sec   Loss 4.0354   LearningRate 0.0087   Epoch: 14   Global Step: 584390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:25:11,164-Speed 2609.70 samples/sec   Loss 3.9591   LearningRate 0.0087   Epoch: 14   Global Step: 584400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:25:15,076-Speed 2618.38 samples/sec   Loss 3.9334   LearningRate 0.0087   Epoch: 14   Global Step: 584410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:25:18,950-Speed 2643.85 samples/sec   Loss 3.9949   LearningRate 0.0087   Epoch: 14   Global Step: 584420   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:22,847-Speed 2628.11 samples/sec   Loss 4.0375   LearningRate 0.0087   Epoch: 14   Global Step: 584430   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:26,743-Speed 2629.05 samples/sec   Loss 4.0893   LearningRate 0.0087   Epoch: 14   Global Step: 584440   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:30,650-Speed 2622.34 samples/sec   Loss 3.9160   LearningRate 0.0087   Epoch: 14   Global Step: 584450   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:34,557-Speed 2621.38 samples/sec   Loss 3.9417   LearningRate 0.0087   Epoch: 14   Global Step: 584460   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:38,462-Speed 2622.27 samples/sec   Loss 3.9216   LearningRate 0.0087   Epoch: 14   Global Step: 584470   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:42,372-Speed 2619.48 samples/sec   Loss 3.9712   LearningRate 0.0087   Epoch: 14   Global Step: 584480   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:46,269-Speed 2632.11 samples/sec   Loss 4.0126   LearningRate 0.0087   Epoch: 14   Global Step: 584490   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:50,165-Speed 2629.37 samples/sec   Loss 4.0552   LearningRate 0.0087   Epoch: 14   Global Step: 584500   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:54,059-Speed 2630.49 samples/sec   Loss 4.0431   LearningRate 0.0087   Epoch: 14   Global Step: 584510   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:25:57,966-Speed 2622.21 samples/sec   Loss 3.9702   LearningRate 0.0087   Epoch: 14   Global Step: 584520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:01,857-Speed 2631.99 samples/sec   Loss 4.0440   LearningRate 0.0087   Epoch: 14   Global Step: 584530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:05,754-Speed 2627.90 samples/sec   Loss 3.9267   LearningRate 0.0087   Epoch: 14   Global Step: 584540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:09,647-Speed 2631.41 samples/sec   Loss 3.9442   LearningRate 0.0087   Epoch: 14   Global Step: 584550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:13,556-Speed 2620.64 samples/sec   Loss 4.0567   LearningRate 0.0087   Epoch: 14   Global Step: 584560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:17,452-Speed 2628.95 samples/sec   Loss 4.0022   LearningRate 0.0087   Epoch: 14   Global Step: 584570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:21,353-Speed 2625.73 samples/sec   Loss 4.0744   LearningRate 0.0087   Epoch: 14   Global Step: 584580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:25,256-Speed 2624.31 samples/sec   Loss 3.9615   LearningRate 0.0087   Epoch: 14   Global Step: 584590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:29,149-Speed 2631.67 samples/sec   Loss 3.9753   LearningRate 0.0087   Epoch: 14   Global Step: 584600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:33,042-Speed 2630.55 samples/sec   Loss 3.9340   LearningRate 0.0087   Epoch: 14   Global Step: 584610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:36,971-Speed 2606.52 samples/sec   Loss 3.9233   LearningRate 0.0087   Epoch: 14   Global Step: 584620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:26:40,842-Speed 2646.61 samples/sec   Loss 3.9392   LearningRate 0.0087   Epoch: 14   Global Step: 584630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:44,741-Speed 2627.29 samples/sec   Loss 3.8735   LearningRate 0.0087   Epoch: 14   Global Step: 584640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:48,634-Speed 2630.31 samples/sec   Loss 3.9371   LearningRate 0.0087   Epoch: 14   Global Step: 584650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:52,527-Speed 2631.55 samples/sec   Loss 3.9366   LearningRate 0.0087   Epoch: 14   Global Step: 584660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:26:56,422-Speed 2629.09 samples/sec   Loss 3.9763   LearningRate 0.0087   Epoch: 14   Global Step: 584670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:00,319-Speed 2629.21 samples/sec   Loss 3.9872   LearningRate 0.0087   Epoch: 14   Global Step: 584680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:04,216-Speed 2628.20 samples/sec   Loss 3.9162   LearningRate 0.0087   Epoch: 14   Global Step: 584690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:08,281-Speed 2519.26 samples/sec   Loss 3.9557   LearningRate 0.0087   Epoch: 14   Global Step: 584700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:12,185-Speed 2623.85 samples/sec   Loss 3.9838   LearningRate 0.0087   Epoch: 14   Global Step: 584710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:16,076-Speed 2633.06 samples/sec   Loss 4.0516   LearningRate 0.0087   Epoch: 14   Global Step: 584720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:19,980-Speed 2623.50 samples/sec   Loss 3.9225   LearningRate 0.0087   Epoch: 14   Global Step: 584730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-15 13:27:23,851-Speed 2646.31 samples/sec   Loss 3.9958   LearningRate 0.0087   Epoch: 14   Global Step: 584740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:27,748-Speed 2628.05 samples/sec   Loss 3.8589   LearningRate 0.0087   Epoch: 14   Global Step: 584750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:31,642-Speed 2630.69 samples/sec   Loss 4.0131   LearningRate 0.0087   Epoch: 14   Global Step: 584760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:35,535-Speed 2630.65 samples/sec   Loss 3.9506   LearningRate 0.0087   Epoch: 14   Global Step: 584770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:39,429-Speed 2630.79 samples/sec   Loss 3.9919   LearningRate 0.0087   Epoch: 14   Global Step: 584780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:27:43,302-Speed 2644.30 samples/sec   Loss 4.0990   LearningRate 0.0087   Epoch: 14   Global Step: 584790   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:27:47,193-Speed 2631.66 samples/sec   Loss 4.0146   LearningRate 0.0087   Epoch: 14   Global Step: 584800   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:27:51,086-Speed 2631.85 samples/sec   Loss 4.0542   LearningRate 0.0087   Epoch: 14   Global Step: 584810   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:27:55,022-Speed 2602.26 samples/sec   Loss 4.0587   LearningRate 0.0087   Epoch: 14   Global Step: 584820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:27:58,917-Speed 2630.08 samples/sec   Loss 3.9640   LearningRate 0.0087   Epoch: 14   Global Step: 584830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:28:02,812-Speed 2629.79 samples/sec   Loss 3.9662   LearningRate 0.0087   Epoch: 14   Global Step: 584840   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:28:06,705-Speed 2630.37 samples/sec   Loss 4.0391   LearningRate 0.0087   Epoch: 14   Global Step: 584850   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:28:10,612-Speed 2621.80 samples/sec   Loss 4.0074   LearningRate 0.0087   Epoch: 14   Global Step: 584860   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:28:14,506-Speed 2630.81 samples/sec   Loss 4.0399   LearningRate 0.0087   Epoch: 14   Global Step: 584870   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:28:18,400-Speed 2630.07 samples/sec   Loss 3.9671   LearningRate 0.0087   Epoch: 14   Global Step: 584880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-15 13:28:22,309-Speed 2620.40 samples/sec   Loss 3.9946   LearningRate 0.0087   Epoch: 14   Global Step: 584890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:28:26,252-Speed 2598.02 samples/sec   Loss 3.9429   LearningRate 0.0087   Epoch: 14   Global Step: 584900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:28:30,150-Speed 2627.64 samples/sec   Loss 4.0545   LearningRate 0.0087   Epoch: 14   Global Step: 584910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-15 13:28:34,047-Speed 2628.11 samples/sec   Loss 4.0179   LearningRate 0.0087   Epoch: 14   Global Step: 584920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:28:37,994-Speed 2594.69 samples/sec   Loss 4.0287   LearningRate 0.0087   Epoch: 14   Global Step: 584930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:28:41,891-Speed 2628.98 samples/sec   Loss 3.9971   LearningRate 0.0087   Epoch: 14   Global Step: 584940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:28:45,785-Speed 2629.97 samples/sec   Loss 4.0254   LearningRate 0.0087   Epoch: 14   Global Step: 584950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:28:49,696-Speed 2619.57 samples/sec   Loss 4.1189   LearningRate 0.0087   Epoch: 14   Global Step: 584960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:28:53,590-Speed 2630.63 samples/sec   Loss 4.0164   LearningRate 0.0087   Epoch: 14   Global Step: 584970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:28:57,485-Speed 2629.92 samples/sec   Loss 4.0769   LearningRate 0.0087   Epoch: 14   Global Step: 584980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:29:01,376-Speed 2632.07 samples/sec   Loss 4.0898   LearningRate 0.0087   Epoch: 14   Global Step: 584990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:29:05,275-Speed 2627.31 samples/sec   Loss 4.0824   LearningRate 0.0087   Epoch: 14   Global Step: 585000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:29:09,217-Speed 2598.16 samples/sec   Loss 3.8493   LearningRate 0.0087   Epoch: 14   Global Step: 585010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:29:13,109-Speed 2632.74 samples/sec   Loss 4.0119   LearningRate 0.0087   Epoch: 14   Global Step: 585020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:29:16,983-Speed 2643.88 samples/sec   Loss 4.0386   LearningRate 0.0087   Epoch: 14   Global Step: 585030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:29:20,873-Speed 2632.89 samples/sec   Loss 3.9700   LearningRate 0.0087   Epoch: 14   Global Step: 585040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:29:24,767-Speed 2630.68 samples/sec   Loss 3.9565   LearningRate 0.0087   Epoch: 14   Global Step: 585050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:29:28,658-Speed 2631.87 samples/sec   Loss 3.9600   LearningRate 0.0087   Epoch: 14   Global Step: 585060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:29:32,526-Speed 2648.18 samples/sec   Loss 4.1102   LearningRate 0.0087   Epoch: 14   Global Step: 585070   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:29:36,421-Speed 2630.04 samples/sec   Loss 4.0780   LearningRate 0.0087   Epoch: 14   Global Step: 585080   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:29:40,332-Speed 2618.74 samples/sec   Loss 4.0013   LearningRate 0.0087   Epoch: 14   Global Step: 585090   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:29:44,229-Speed 2628.61 samples/sec   Loss 4.0067   LearningRate 0.0087   Epoch: 14   Global Step: 585100   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:29:48,122-Speed 2630.85 samples/sec   Loss 4.0189   LearningRate 0.0087   Epoch: 14   Global Step: 585110   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:29:52,023-Speed 2625.64 samples/sec   Loss 4.0579   LearningRate 0.0087   Epoch: 14   Global Step: 585120   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:29:55,915-Speed 2631.66 samples/sec   Loss 4.0004   LearningRate 0.0087   Epoch: 14   Global Step: 585130   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:29:59,807-Speed 2632.00 samples/sec   Loss 3.9281   LearningRate 0.0087   Epoch: 14   Global Step: 585140   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:03,703-Speed 2628.52 samples/sec   Loss 3.9616   LearningRate 0.0087   Epoch: 14   Global Step: 585150   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:07,614-Speed 2619.28 samples/sec   Loss 4.0391   LearningRate 0.0087   Epoch: 14   Global Step: 585160   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:11,561-Speed 2595.09 samples/sec   Loss 4.0193   LearningRate 0.0087   Epoch: 14   Global Step: 585170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:30:15,440-Speed 2640.58 samples/sec   Loss 3.9600   LearningRate 0.0087   Epoch: 14   Global Step: 585180   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:19,337-Speed 2628.95 samples/sec   Loss 4.0083   LearningRate 0.0087   Epoch: 14   Global Step: 585190   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:23,231-Speed 2630.10 samples/sec   Loss 3.9646   LearningRate 0.0087   Epoch: 14   Global Step: 585200   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:27,148-Speed 2615.01 samples/sec   Loss 3.9637   LearningRate 0.0087   Epoch: 14   Global Step: 585210   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:31,039-Speed 2632.08 samples/sec   Loss 4.0277   LearningRate 0.0087   Epoch: 14   Global Step: 585220   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:34,940-Speed 2626.42 samples/sec   Loss 3.9956   LearningRate 0.0087   Epoch: 14   Global Step: 585230   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:38,830-Speed 2632.75 samples/sec   Loss 3.9494   LearningRate 0.0087   Epoch: 14   Global Step: 585240   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:42,736-Speed 2622.18 samples/sec   Loss 3.8377   LearningRate 0.0087   Epoch: 14   Global Step: 585250   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:46,637-Speed 2625.52 samples/sec   Loss 4.0325   LearningRate 0.0087   Epoch: 14   Global Step: 585260   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:50,533-Speed 2629.59 samples/sec   Loss 3.9416   LearningRate 0.0087   Epoch: 14   Global Step: 585270   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:30:54,428-Speed 2630.22 samples/sec   Loss 4.0939   LearningRate 0.0087   Epoch: 14   Global Step: 585280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:30:58,297-Speed 2647.14 samples/sec   Loss 3.9396   LearningRate 0.0087   Epoch: 14   Global Step: 585290   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:02,196-Speed 2626.77 samples/sec   Loss 3.8761   LearningRate 0.0087   Epoch: 14   Global Step: 585300   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:06,097-Speed 2625.59 samples/sec   Loss 3.9323   LearningRate 0.0087   Epoch: 14   Global Step: 585310   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:09,989-Speed 2631.76 samples/sec   Loss 4.0413   LearningRate 0.0087   Epoch: 14   Global Step: 585320   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:13,889-Speed 2626.35 samples/sec   Loss 3.9178   LearningRate 0.0087   Epoch: 14   Global Step: 585330   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:17,816-Speed 2608.06 samples/sec   Loss 3.9682   LearningRate 0.0087   Epoch: 14   Global Step: 585340   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:21,718-Speed 2624.80 samples/sec   Loss 3.9965   LearningRate 0.0087   Epoch: 14   Global Step: 585350   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:25,613-Speed 2629.77 samples/sec   Loss 3.9882   LearningRate 0.0087   Epoch: 14   Global Step: 585360   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:29,515-Speed 2625.59 samples/sec   Loss 4.0575   LearningRate 0.0087   Epoch: 14   Global Step: 585370   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:33,416-Speed 2625.45 samples/sec   Loss 4.0394   LearningRate 0.0087   Epoch: 14   Global Step: 585380   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:31:37,320-Speed 2623.21 samples/sec   Loss 3.9369   LearningRate 0.0087   Epoch: 14   Global Step: 585390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:31:41,215-Speed 2629.19 samples/sec   Loss 4.1070   LearningRate 0.0087   Epoch: 14   Global Step: 585400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:31:45,121-Speed 2622.73 samples/sec   Loss 3.9199   LearningRate 0.0087   Epoch: 14   Global Step: 585410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:31:49,026-Speed 2622.67 samples/sec   Loss 3.9695   LearningRate 0.0087   Epoch: 14   Global Step: 585420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:31:52,925-Speed 2627.54 samples/sec   Loss 3.9117   LearningRate 0.0087   Epoch: 14   Global Step: 585430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:31:56,852-Speed 2608.04 samples/sec   Loss 3.8699   LearningRate 0.0087   Epoch: 14   Global Step: 585440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:32:00,770-Speed 2614.85 samples/sec   Loss 4.0179   LearningRate 0.0087   Epoch: 14   Global Step: 585450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:32:04,639-Speed 2646.99 samples/sec   Loss 3.9664   LearningRate 0.0087   Epoch: 14   Global Step: 585460   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:08,531-Speed 2631.23 samples/sec   Loss 3.9416   LearningRate 0.0087   Epoch: 14   Global Step: 585470   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:12,432-Speed 2625.53 samples/sec   Loss 4.0339   LearningRate 0.0087   Epoch: 14   Global Step: 585480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:16,325-Speed 2631.63 samples/sec   Loss 4.0544   LearningRate 0.0087   Epoch: 14   Global Step: 585490   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:20,217-Speed 2631.50 samples/sec   Loss 3.9564   LearningRate 0.0087   Epoch: 14   Global Step: 585500   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:24,115-Speed 2627.46 samples/sec   Loss 3.9585   LearningRate 0.0087   Epoch: 14   Global Step: 585510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:28,008-Speed 2630.73 samples/sec   Loss 3.9635   LearningRate 0.0087   Epoch: 14   Global Step: 585520   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:31,915-Speed 2622.14 samples/sec   Loss 4.0196   LearningRate 0.0087   Epoch: 14   Global Step: 585530   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:35,809-Speed 2630.47 samples/sec   Loss 3.9248   LearningRate 0.0087   Epoch: 14   Global Step: 585540   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:39,705-Speed 2628.51 samples/sec   Loss 4.0332   LearningRate 0.0087   Epoch: 14   Global Step: 585550   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:32:43,599-Speed 2630.71 samples/sec   Loss 3.9123   LearningRate 0.0087   Epoch: 14   Global Step: 585560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:32:47,519-Speed 2613.11 samples/sec   Loss 4.0064   LearningRate 0.0087   Epoch: 14   Global Step: 585570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:32:51,411-Speed 2631.68 samples/sec   Loss 3.9334   LearningRate 0.0087   Epoch: 14   Global Step: 585580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:32:55,312-Speed 2625.82 samples/sec   Loss 3.9339   LearningRate 0.0087   Epoch: 14   Global Step: 585590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:32:59,208-Speed 2629.25 samples/sec   Loss 3.9503   LearningRate 0.0086   Epoch: 14   Global Step: 585600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:33:03,101-Speed 2631.24 samples/sec   Loss 4.0467   LearningRate 0.0086   Epoch: 14   Global Step: 585610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:33:06,984-Speed 2637.41 samples/sec   Loss 3.8399   LearningRate 0.0086   Epoch: 14   Global Step: 585620   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:10,881-Speed 2628.30 samples/sec   Loss 4.0350   LearningRate 0.0086   Epoch: 14   Global Step: 585630   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:14,912-Speed 2541.19 samples/sec   Loss 4.0126   LearningRate 0.0086   Epoch: 14   Global Step: 585640   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:18,810-Speed 2628.28 samples/sec   Loss 3.9379   LearningRate 0.0086   Epoch: 14   Global Step: 585650   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:22,775-Speed 2583.31 samples/sec   Loss 4.0111   LearningRate 0.0086   Epoch: 14   Global Step: 585660   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:26,666-Speed 2631.82 samples/sec   Loss 3.9869   LearningRate 0.0086   Epoch: 14   Global Step: 585670   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:30,560-Speed 2631.23 samples/sec   Loss 3.9642   LearningRate 0.0086   Epoch: 14   Global Step: 585680   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:34,460-Speed 2626.29 samples/sec   Loss 4.0169   LearningRate 0.0086   Epoch: 14   Global Step: 585690   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:38,353-Speed 2630.80 samples/sec   Loss 3.8591   LearningRate 0.0086   Epoch: 14   Global Step: 585700   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:42,379-Speed 2543.66 samples/sec   Loss 3.9756   LearningRate 0.0086   Epoch: 14   Global Step: 585710   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:33:46,286-Speed 2622.65 samples/sec   Loss 4.0193   LearningRate 0.0086   Epoch: 14   Global Step: 585720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:33:50,198-Speed 2618.15 samples/sec   Loss 3.9596   LearningRate 0.0086   Epoch: 14   Global Step: 585730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:33:54,095-Speed 2628.30 samples/sec   Loss 3.9613   LearningRate 0.0086   Epoch: 14   Global Step: 585740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:33:58,022-Speed 2608.24 samples/sec   Loss 4.0288   LearningRate 0.0086   Epoch: 14   Global Step: 585750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:34:01,917-Speed 2629.80 samples/sec   Loss 4.0464   LearningRate 0.0086   Epoch: 14   Global Step: 585760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:34:05,817-Speed 2626.79 samples/sec   Loss 4.0582   LearningRate 0.0086   Epoch: 14   Global Step: 585770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:34:09,716-Speed 2626.50 samples/sec   Loss 3.9567   LearningRate 0.0086   Epoch: 14   Global Step: 585780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:34:13,610-Speed 2630.11 samples/sec   Loss 3.9751   LearningRate 0.0086   Epoch: 14   Global Step: 585790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:34:17,524-Speed 2617.18 samples/sec   Loss 4.0098   LearningRate 0.0086   Epoch: 14   Global Step: 585800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:34:21,424-Speed 2626.75 samples/sec   Loss 4.0343   LearningRate 0.0086   Epoch: 14   Global Step: 585810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:34:25,416-Speed 2565.98 samples/sec   Loss 3.9380   LearningRate 0.0086   Epoch: 14   Global Step: 585820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:29,421-Speed 2556.96 samples/sec   Loss 4.0286   LearningRate 0.0086   Epoch: 14   Global Step: 585830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:33,315-Speed 2630.38 samples/sec   Loss 3.9241   LearningRate 0.0086   Epoch: 14   Global Step: 585840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:37,216-Speed 2625.75 samples/sec   Loss 3.8906   LearningRate 0.0086   Epoch: 14   Global Step: 585850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:41,113-Speed 2628.62 samples/sec   Loss 3.8909   LearningRate 0.0086   Epoch: 14   Global Step: 585860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:45,009-Speed 2629.17 samples/sec   Loss 3.8758   LearningRate 0.0086   Epoch: 14   Global Step: 585870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:48,925-Speed 2615.51 samples/sec   Loss 3.9986   LearningRate 0.0086   Epoch: 14   Global Step: 585880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:52,825-Speed 2626.44 samples/sec   Loss 4.0139   LearningRate 0.0086   Epoch: 14   Global Step: 585890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:34:56,695-Speed 2647.43 samples/sec   Loss 3.9875   LearningRate 0.0086   Epoch: 14   Global Step: 585900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:00,586-Speed 2632.21 samples/sec   Loss 4.0354   LearningRate 0.0086   Epoch: 14   Global Step: 585910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:04,482-Speed 2629.12 samples/sec   Loss 3.9643   LearningRate 0.0086   Epoch: 14   Global Step: 585920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:08,373-Speed 2631.64 samples/sec   Loss 4.0163   LearningRate 0.0086   Epoch: 14   Global Step: 585930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:12,269-Speed 2629.84 samples/sec   Loss 4.0102   LearningRate 0.0086   Epoch: 14   Global Step: 585940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:16,180-Speed 2618.94 samples/sec   Loss 3.9157   LearningRate 0.0086   Epoch: 14   Global Step: 585950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:20,090-Speed 2619.26 samples/sec   Loss 3.9804   LearningRate 0.0086   Epoch: 14   Global Step: 585960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:23,986-Speed 2629.55 samples/sec   Loss 4.0169   LearningRate 0.0086   Epoch: 14   Global Step: 585970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:27,877-Speed 2632.11 samples/sec   Loss 4.0367   LearningRate 0.0086   Epoch: 14   Global Step: 585980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:31,770-Speed 2631.23 samples/sec   Loss 3.9308   LearningRate 0.0086   Epoch: 14   Global Step: 585990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:35,667-Speed 2627.60 samples/sec   Loss 3.9923   LearningRate 0.0086   Epoch: 14   Global Step: 586000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:35:39,563-Speed 2629.42 samples/sec   Loss 3.9169   LearningRate 0.0086   Epoch: 14   Global Step: 586010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:35:43,463-Speed 2626.51 samples/sec   Loss 4.0232   LearningRate 0.0086   Epoch: 14   Global Step: 586020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:35:47,352-Speed 2633.57 samples/sec   Loss 3.8978   LearningRate 0.0086   Epoch: 14   Global Step: 586030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:35:51,224-Speed 2645.39 samples/sec   Loss 3.9562   LearningRate 0.0086   Epoch: 14   Global Step: 586040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:55,118-Speed 2630.75 samples/sec   Loss 4.0172   LearningRate 0.0086   Epoch: 14   Global Step: 586050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:35:59,007-Speed 2633.69 samples/sec   Loss 3.9018   LearningRate 0.0086   Epoch: 14   Global Step: 586060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:02,900-Speed 2630.80 samples/sec   Loss 3.9008   LearningRate 0.0086   Epoch: 14   Global Step: 586070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:06,793-Speed 2630.57 samples/sec   Loss 4.0123   LearningRate 0.0086   Epoch: 14   Global Step: 586080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:10,685-Speed 2631.65 samples/sec   Loss 3.9804   LearningRate 0.0086   Epoch: 14   Global Step: 586090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:14,576-Speed 2632.51 samples/sec   Loss 4.0403   LearningRate 0.0086   Epoch: 14   Global Step: 586100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:18,472-Speed 2629.15 samples/sec   Loss 3.9442   LearningRate 0.0086   Epoch: 14   Global Step: 586110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:22,373-Speed 2625.42 samples/sec   Loss 3.9152   LearningRate 0.0086   Epoch: 14   Global Step: 586120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:26,270-Speed 2628.90 samples/sec   Loss 4.0160   LearningRate 0.0086   Epoch: 14   Global Step: 586130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:30,172-Speed 2624.58 samples/sec   Loss 4.0270   LearningRate 0.0086   Epoch: 14   Global Step: 586140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:36:34,052-Speed 2639.59 samples/sec   Loss 4.0338   LearningRate 0.0086   Epoch: 14   Global Step: 586150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:37,956-Speed 2623.91 samples/sec   Loss 3.8491   LearningRate 0.0086   Epoch: 14   Global Step: 586160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:41,846-Speed 2632.92 samples/sec   Loss 4.0011   LearningRate 0.0086   Epoch: 14   Global Step: 586170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:45,739-Speed 2630.68 samples/sec   Loss 3.9449   LearningRate 0.0086   Epoch: 14   Global Step: 586180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:49,638-Speed 2627.67 samples/sec   Loss 3.8964   LearningRate 0.0086   Epoch: 14   Global Step: 586190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:53,540-Speed 2625.15 samples/sec   Loss 4.0046   LearningRate 0.0086   Epoch: 14   Global Step: 586200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:36:57,523-Speed 2571.72 samples/sec   Loss 3.9447   LearningRate 0.0086   Epoch: 14   Global Step: 586210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:37:01,432-Speed 2620.40 samples/sec   Loss 3.9660   LearningRate 0.0086   Epoch: 14   Global Step: 586220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:37:05,333-Speed 2625.59 samples/sec   Loss 3.9768   LearningRate 0.0086   Epoch: 14   Global Step: 586230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:37:09,230-Speed 2628.58 samples/sec   Loss 4.0037   LearningRate 0.0086   Epoch: 14   Global Step: 586240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:37:13,128-Speed 2627.23 samples/sec   Loss 3.9278   LearningRate 0.0086   Epoch: 14   Global Step: 586250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:37:17,024-Speed 2629.22 samples/sec   Loss 3.9837   LearningRate 0.0086   Epoch: 14   Global Step: 586260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:37:20,920-Speed 2629.20 samples/sec   Loss 3.9591   LearningRate 0.0086   Epoch: 14   Global Step: 586270   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:37:24,796-Speed 2643.00 samples/sec   Loss 3.9700   LearningRate 0.0086   Epoch: 14   Global Step: 586280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:37:28,700-Speed 2623.24 samples/sec   Loss 3.9819   LearningRate 0.0086   Epoch: 14   Global Step: 586290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:37:32,628-Speed 2607.82 samples/sec   Loss 4.0086   LearningRate 0.0086   Epoch: 14   Global Step: 586300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:37:36,529-Speed 2625.92 samples/sec   Loss 3.9699   LearningRate 0.0086   Epoch: 14   Global Step: 586310   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:37:40,425-Speed 2628.62 samples/sec   Loss 3.9342   LearningRate 0.0086   Epoch: 14   Global Step: 586320   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:37:44,341-Speed 2615.45 samples/sec   Loss 3.9571   LearningRate 0.0086   Epoch: 14   Global Step: 586330   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:37:48,239-Speed 2627.70 samples/sec   Loss 3.9788   LearningRate 0.0086   Epoch: 14   Global Step: 586340   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:37:52,137-Speed 2628.06 samples/sec   Loss 3.9838   LearningRate 0.0086   Epoch: 14   Global Step: 586350   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:37:56,033-Speed 2629.07 samples/sec   Loss 3.9295   LearningRate 0.0086   Epoch: 14   Global Step: 586360   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:37:59,931-Speed 2627.83 samples/sec   Loss 3.9671   LearningRate 0.0086   Epoch: 14   Global Step: 586370   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:38:03,820-Speed 2633.25 samples/sec   Loss 3.8875   LearningRate 0.0086   Epoch: 14   Global Step: 586380   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:38:07,714-Speed 2630.27 samples/sec   Loss 3.9534   LearningRate 0.0086   Epoch: 14   Global Step: 586390   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:38:11,615-Speed 2625.44 samples/sec   Loss 4.0908   LearningRate 0.0086   Epoch: 14   Global Step: 586400   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:38:15,516-Speed 2626.21 samples/sec   Loss 3.9768   LearningRate 0.0086   Epoch: 14   Global Step: 586410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:19,414-Speed 2627.24 samples/sec   Loss 3.9893   LearningRate 0.0086   Epoch: 14   Global Step: 586420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:23,344-Speed 2606.47 samples/sec   Loss 3.9546   LearningRate 0.0086   Epoch: 14   Global Step: 586430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:27,241-Speed 2628.54 samples/sec   Loss 3.9870   LearningRate 0.0086   Epoch: 14   Global Step: 586440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:31,140-Speed 2627.17 samples/sec   Loss 3.9370   LearningRate 0.0086   Epoch: 14   Global Step: 586450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:35,043-Speed 2624.31 samples/sec   Loss 3.9935   LearningRate 0.0086   Epoch: 14   Global Step: 586460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:38,959-Speed 2615.89 samples/sec   Loss 3.9689   LearningRate 0.0086   Epoch: 14   Global Step: 586470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:42,853-Speed 2630.25 samples/sec   Loss 3.9965   LearningRate 0.0086   Epoch: 14   Global Step: 586480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:46,886-Speed 2539.38 samples/sec   Loss 3.9585   LearningRate 0.0086   Epoch: 14   Global Step: 586490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:50,805-Speed 2613.79 samples/sec   Loss 4.0338   LearningRate 0.0086   Epoch: 14   Global Step: 586500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:38:54,699-Speed 2630.92 samples/sec   Loss 3.9978   LearningRate 0.0086   Epoch: 14   Global Step: 586510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:38:58,592-Speed 2630.64 samples/sec   Loss 3.9722   LearningRate 0.0086   Epoch: 14   Global Step: 586520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:39:02,463-Speed 2646.29 samples/sec   Loss 3.9732   LearningRate 0.0086   Epoch: 14   Global Step: 586530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:06,356-Speed 2631.11 samples/sec   Loss 3.9517   LearningRate 0.0086   Epoch: 14   Global Step: 586540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:10,257-Speed 2625.77 samples/sec   Loss 4.0356   LearningRate 0.0086   Epoch: 14   Global Step: 586550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:14,150-Speed 2630.72 samples/sec   Loss 3.9419   LearningRate 0.0086   Epoch: 14   Global Step: 586560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:18,046-Speed 2631.53 samples/sec   Loss 3.9728   LearningRate 0.0086   Epoch: 14   Global Step: 586570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:21,985-Speed 2600.48 samples/sec   Loss 3.9592   LearningRate 0.0086   Epoch: 14   Global Step: 586580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:25,882-Speed 2628.52 samples/sec   Loss 4.0157   LearningRate 0.0086   Epoch: 14   Global Step: 586590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:29,780-Speed 2628.16 samples/sec   Loss 3.8535   LearningRate 0.0086   Epoch: 14   Global Step: 586600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:33,692-Speed 2618.33 samples/sec   Loss 4.1007   LearningRate 0.0086   Epoch: 14   Global Step: 586610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:37,593-Speed 2625.73 samples/sec   Loss 3.9308   LearningRate 0.0086   Epoch: 14   Global Step: 586620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:39:41,488-Speed 2629.50 samples/sec   Loss 3.9625   LearningRate 0.0086   Epoch: 14   Global Step: 586630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:39:45,389-Speed 2625.64 samples/sec   Loss 3.9870   LearningRate 0.0086   Epoch: 14   Global Step: 586640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:39:49,309-Speed 2613.14 samples/sec   Loss 4.0038   LearningRate 0.0086   Epoch: 14   Global Step: 586650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:39:53,205-Speed 2629.53 samples/sec   Loss 4.1296   LearningRate 0.0086   Epoch: 14   Global Step: 586660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:39:57,080-Speed 2642.79 samples/sec   Loss 3.8848   LearningRate 0.0086   Epoch: 14   Global Step: 586670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:40:00,959-Speed 2640.74 samples/sec   Loss 4.0862   LearningRate 0.0086   Epoch: 14   Global Step: 586680   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:04,853-Speed 2630.81 samples/sec   Loss 3.9093   LearningRate 0.0086   Epoch: 14   Global Step: 586690   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:08,779-Speed 2608.36 samples/sec   Loss 4.0195   LearningRate 0.0086   Epoch: 14   Global Step: 586700   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:12,673-Speed 2630.22 samples/sec   Loss 3.9246   LearningRate 0.0086   Epoch: 14   Global Step: 586710   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:16,568-Speed 2630.25 samples/sec   Loss 3.9335   LearningRate 0.0086   Epoch: 14   Global Step: 586720   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:20,464-Speed 2628.74 samples/sec   Loss 3.8669   LearningRate 0.0086   Epoch: 14   Global Step: 586730   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:24,356-Speed 2631.39 samples/sec   Loss 3.9751   LearningRate 0.0086   Epoch: 14   Global Step: 586740   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:28,252-Speed 2629.79 samples/sec   Loss 4.0168   LearningRate 0.0086   Epoch: 14   Global Step: 586750   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:32,161-Speed 2619.89 samples/sec   Loss 3.9529   LearningRate 0.0086   Epoch: 14   Global Step: 586760   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:36,054-Speed 2631.13 samples/sec   Loss 3.9788   LearningRate 0.0086   Epoch: 14   Global Step: 586770   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:40:39,948-Speed 2630.64 samples/sec   Loss 3.9524   LearningRate 0.0086   Epoch: 14   Global Step: 586780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:40:43,843-Speed 2629.59 samples/sec   Loss 3.9346   LearningRate 0.0086   Epoch: 14   Global Step: 586790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:40:47,742-Speed 2626.71 samples/sec   Loss 3.9870   LearningRate 0.0086   Epoch: 14   Global Step: 586800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:40:51,637-Speed 2629.44 samples/sec   Loss 3.9700   LearningRate 0.0086   Epoch: 14   Global Step: 586810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:40:55,542-Speed 2624.04 samples/sec   Loss 3.9942   LearningRate 0.0086   Epoch: 14   Global Step: 586820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:40:59,433-Speed 2632.10 samples/sec   Loss 3.9086   LearningRate 0.0086   Epoch: 14   Global Step: 586830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:03,327-Speed 2630.25 samples/sec   Loss 3.9347   LearningRate 0.0086   Epoch: 14   Global Step: 586840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:07,261-Speed 2603.19 samples/sec   Loss 3.8933   LearningRate 0.0086   Epoch: 14   Global Step: 586850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:11,174-Speed 2618.15 samples/sec   Loss 3.9158   LearningRate 0.0086   Epoch: 14   Global Step: 586860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:15,069-Speed 2629.35 samples/sec   Loss 3.9642   LearningRate 0.0086   Epoch: 14   Global Step: 586870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:18,966-Speed 2628.91 samples/sec   Loss 3.9953   LearningRate 0.0086   Epoch: 14   Global Step: 586880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:41:22,863-Speed 2627.83 samples/sec   Loss 4.0115   LearningRate 0.0086   Epoch: 14   Global Step: 586890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:26,778-Speed 2616.95 samples/sec   Loss 3.9339   LearningRate 0.0086   Epoch: 14   Global Step: 586900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:30,673-Speed 2629.79 samples/sec   Loss 3.9704   LearningRate 0.0086   Epoch: 14   Global Step: 586910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:34,570-Speed 2627.83 samples/sec   Loss 4.0465   LearningRate 0.0086   Epoch: 14   Global Step: 586920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:38,465-Speed 2629.42 samples/sec   Loss 3.8960   LearningRate 0.0086   Epoch: 14   Global Step: 586930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:42,371-Speed 2622.66 samples/sec   Loss 4.0356   LearningRate 0.0086   Epoch: 14   Global Step: 586940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:46,265-Speed 2630.10 samples/sec   Loss 3.9665   LearningRate 0.0086   Epoch: 14   Global Step: 586950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:50,159-Speed 2631.30 samples/sec   Loss 3.9325   LearningRate 0.0086   Epoch: 14   Global Step: 586960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:54,065-Speed 2622.22 samples/sec   Loss 3.9485   LearningRate 0.0086   Epoch: 14   Global Step: 586970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:41:57,960-Speed 2629.62 samples/sec   Loss 3.9503   LearningRate 0.0086   Epoch: 14   Global Step: 586980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:42:01,855-Speed 2629.29 samples/sec   Loss 3.9364   LearningRate 0.0086   Epoch: 14   Global Step: 586990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:42:05,754-Speed 2626.95 samples/sec   Loss 3.9186   LearningRate 0.0086   Epoch: 14   Global Step: 587000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:42:09,668-Speed 2617.02 samples/sec   Loss 3.9018   LearningRate 0.0085   Epoch: 14   Global Step: 587010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:42:13,569-Speed 2625.45 samples/sec   Loss 4.0390   LearningRate 0.0085   Epoch: 14   Global Step: 587020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:42:17,463-Speed 2630.39 samples/sec   Loss 4.0252   LearningRate 0.0085   Epoch: 14   Global Step: 587030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:42:21,364-Speed 2625.53 samples/sec   Loss 4.0109   LearningRate 0.0085   Epoch: 14   Global Step: 587040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:42:25,246-Speed 2639.21 samples/sec   Loss 3.8937   LearningRate 0.0085   Epoch: 14   Global Step: 587050   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:29,338-Speed 2502.90 samples/sec   Loss 3.9607   LearningRate 0.0085   Epoch: 14   Global Step: 587060   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:33,249-Speed 2619.18 samples/sec   Loss 3.9531   LearningRate 0.0085   Epoch: 14   Global Step: 587070   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:37,150-Speed 2625.03 samples/sec   Loss 3.8839   LearningRate 0.0085   Epoch: 14   Global Step: 587080   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:41,046-Speed 2629.17 samples/sec   Loss 3.9281   LearningRate 0.0085   Epoch: 14   Global Step: 587090   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:44,949-Speed 2624.48 samples/sec   Loss 3.9474   LearningRate 0.0085   Epoch: 14   Global Step: 587100   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:48,847-Speed 2627.71 samples/sec   Loss 3.9822   LearningRate 0.0085   Epoch: 14   Global Step: 587110   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:52,743-Speed 2628.89 samples/sec   Loss 3.8931   LearningRate 0.0085   Epoch: 14   Global Step: 587120   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:42:56,648-Speed 2623.57 samples/sec   Loss 3.8723   LearningRate 0.0085   Epoch: 14   Global Step: 587130   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:43:00,543-Speed 2629.46 samples/sec   Loss 3.9678   LearningRate 0.0085   Epoch: 14   Global Step: 587140   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:43:04,440-Speed 2628.07 samples/sec   Loss 3.8581   LearningRate 0.0085   Epoch: 14   Global Step: 587150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:08,338-Speed 2627.55 samples/sec   Loss 4.0033   LearningRate 0.0085   Epoch: 14   Global Step: 587160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:12,243-Speed 2623.05 samples/sec   Loss 4.0165   LearningRate 0.0085   Epoch: 14   Global Step: 587170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:16,137-Speed 2630.50 samples/sec   Loss 3.9751   LearningRate 0.0085   Epoch: 14   Global Step: 587180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:20,035-Speed 2627.24 samples/sec   Loss 4.0449   LearningRate 0.0085   Epoch: 14   Global Step: 587190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:23,932-Speed 2628.83 samples/sec   Loss 3.9515   LearningRate 0.0085   Epoch: 14   Global Step: 587200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:27,825-Speed 2630.79 samples/sec   Loss 3.9547   LearningRate 0.0085   Epoch: 14   Global Step: 587210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:31,718-Speed 2631.59 samples/sec   Loss 4.0484   LearningRate 0.0085   Epoch: 14   Global Step: 587220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:35,614-Speed 2629.06 samples/sec   Loss 3.9890   LearningRate 0.0085   Epoch: 14   Global Step: 587230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:39,511-Speed 2627.90 samples/sec   Loss 3.9925   LearningRate 0.0085   Epoch: 14   Global Step: 587240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:43,395-Speed 2637.12 samples/sec   Loss 4.0169   LearningRate 0.0085   Epoch: 14   Global Step: 587250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:47,292-Speed 2628.13 samples/sec   Loss 3.8462   LearningRate 0.0085   Epoch: 14   Global Step: 587260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:51,186-Speed 2630.49 samples/sec   Loss 3.9528   LearningRate 0.0085   Epoch: 14   Global Step: 587270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:55,082-Speed 2630.36 samples/sec   Loss 4.0040   LearningRate 0.0085   Epoch: 14   Global Step: 587280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:43:58,996-Speed 2616.68 samples/sec   Loss 4.0339   LearningRate 0.0085   Epoch: 14   Global Step: 587290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:02,896-Speed 2626.70 samples/sec   Loss 3.9591   LearningRate 0.0085   Epoch: 14   Global Step: 587300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:06,791-Speed 2629.61 samples/sec   Loss 3.9556   LearningRate 0.0085   Epoch: 14   Global Step: 587310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:10,687-Speed 2628.55 samples/sec   Loss 3.9152   LearningRate 0.0085   Epoch: 14   Global Step: 587320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:14,586-Speed 2627.07 samples/sec   Loss 3.8755   LearningRate 0.0085   Epoch: 14   Global Step: 587330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:18,497-Speed 2619.42 samples/sec   Loss 3.9683   LearningRate 0.0085   Epoch: 14   Global Step: 587340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:22,392-Speed 2630.30 samples/sec   Loss 3.9220   LearningRate 0.0085   Epoch: 14   Global Step: 587350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:44:26,282-Speed 2632.77 samples/sec   Loss 3.9305   LearningRate 0.0085   Epoch: 14   Global Step: 587360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:44:30,151-Speed 2647.34 samples/sec   Loss 4.0539   LearningRate 0.0085   Epoch: 14   Global Step: 587370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:34,063-Speed 2618.69 samples/sec   Loss 3.9866   LearningRate 0.0085   Epoch: 14   Global Step: 587380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:37,972-Speed 2619.45 samples/sec   Loss 4.0143   LearningRate 0.0085   Epoch: 14   Global Step: 587390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:41,870-Speed 2628.12 samples/sec   Loss 3.8706   LearningRate 0.0085   Epoch: 14   Global Step: 587400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:45,766-Speed 2629.32 samples/sec   Loss 3.9318   LearningRate 0.0085   Epoch: 14   Global Step: 587410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:49,664-Speed 2627.32 samples/sec   Loss 4.0075   LearningRate 0.0085   Epoch: 14   Global Step: 587420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:53,600-Speed 2603.01 samples/sec   Loss 3.9331   LearningRate 0.0085   Epoch: 14   Global Step: 587430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:44:57,517-Speed 2614.64 samples/sec   Loss 3.9916   LearningRate 0.0085   Epoch: 14   Global Step: 587440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:01,413-Speed 2629.05 samples/sec   Loss 3.9573   LearningRate 0.0085   Epoch: 14   Global Step: 587450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:05,470-Speed 2524.51 samples/sec   Loss 3.9218   LearningRate 0.0085   Epoch: 14   Global Step: 587460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:09,396-Speed 2609.01 samples/sec   Loss 3.9177   LearningRate 0.0085   Epoch: 14   Global Step: 587470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:45:13,290-Speed 2630.43 samples/sec   Loss 3.8660   LearningRate 0.0085   Epoch: 14   Global Step: 587480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:45:17,177-Speed 2635.64 samples/sec   Loss 3.8816   LearningRate 0.0085   Epoch: 14   Global Step: 587490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:21,075-Speed 2627.55 samples/sec   Loss 3.8999   LearningRate 0.0085   Epoch: 14   Global Step: 587500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:25,023-Speed 2594.87 samples/sec   Loss 3.9722   LearningRate 0.0085   Epoch: 14   Global Step: 587510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:28,947-Speed 2610.16 samples/sec   Loss 3.9754   LearningRate 0.0085   Epoch: 14   Global Step: 587520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:32,863-Speed 2616.11 samples/sec   Loss 3.9505   LearningRate 0.0085   Epoch: 14   Global Step: 587530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:36,781-Speed 2613.68 samples/sec   Loss 3.9156   LearningRate 0.0085   Epoch: 14   Global Step: 587540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:40,687-Speed 2622.55 samples/sec   Loss 3.9212   LearningRate 0.0085   Epoch: 14   Global Step: 587550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:44,580-Speed 2630.70 samples/sec   Loss 4.0267   LearningRate 0.0085   Epoch: 14   Global Step: 587560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:48,475-Speed 2629.64 samples/sec   Loss 3.9657   LearningRate 0.0085   Epoch: 14   Global Step: 587570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:52,487-Speed 2557.08 samples/sec   Loss 3.8239   LearningRate 0.0085   Epoch: 14   Global Step: 587580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:45:56,361-Speed 2643.99 samples/sec   Loss 3.9448   LearningRate 0.0085   Epoch: 14   Global Step: 587590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:00,266-Speed 2622.82 samples/sec   Loss 3.9703   LearningRate 0.0085   Epoch: 14   Global Step: 587600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:04,173-Speed 2621.16 samples/sec   Loss 3.9434   LearningRate 0.0085   Epoch: 14   Global Step: 587610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:08,072-Speed 2627.13 samples/sec   Loss 3.9242   LearningRate 0.0085   Epoch: 14   Global Step: 587620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:11,978-Speed 2622.27 samples/sec   Loss 3.9707   LearningRate 0.0085   Epoch: 14   Global Step: 587630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:15,870-Speed 2631.67 samples/sec   Loss 3.9601   LearningRate 0.0085   Epoch: 14   Global Step: 587640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:19,764-Speed 2630.64 samples/sec   Loss 4.0042   LearningRate 0.0085   Epoch: 14   Global Step: 587650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:23,656-Speed 2631.50 samples/sec   Loss 3.9149   LearningRate 0.0085   Epoch: 14   Global Step: 587660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:27,555-Speed 2628.16 samples/sec   Loss 3.9817   LearningRate 0.0085   Epoch: 14   Global Step: 587670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:31,451-Speed 2628.79 samples/sec   Loss 3.9280   LearningRate 0.0085   Epoch: 14   Global Step: 587680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:35,344-Speed 2630.88 samples/sec   Loss 3.8996   LearningRate 0.0085   Epoch: 14   Global Step: 587690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:46:39,239-Speed 2630.00 samples/sec   Loss 3.8043   LearningRate 0.0085   Epoch: 14   Global Step: 587700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:46:43,115-Speed 2642.32 samples/sec   Loss 3.9535   LearningRate 0.0085   Epoch: 14   Global Step: 587710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:47,008-Speed 2630.74 samples/sec   Loss 3.8985   LearningRate 0.0085   Epoch: 14   Global Step: 587720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:50,901-Speed 2631.46 samples/sec   Loss 3.9753   LearningRate 0.0085   Epoch: 14   Global Step: 587730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:54,796-Speed 2629.13 samples/sec   Loss 4.0616   LearningRate 0.0085   Epoch: 14   Global Step: 587740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:46:58,687-Speed 2632.66 samples/sec   Loss 4.0035   LearningRate 0.0085   Epoch: 14   Global Step: 587750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:02,581-Speed 2630.30 samples/sec   Loss 4.0145   LearningRate 0.0085   Epoch: 14   Global Step: 587760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:06,478-Speed 2628.66 samples/sec   Loss 3.8545   LearningRate 0.0085   Epoch: 14   Global Step: 587770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:10,375-Speed 2628.22 samples/sec   Loss 3.8895   LearningRate 0.0085   Epoch: 14   Global Step: 587780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:14,284-Speed 2619.81 samples/sec   Loss 3.9434   LearningRate 0.0085   Epoch: 14   Global Step: 587790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:18,181-Speed 2628.27 samples/sec   Loss 4.0639   LearningRate 0.0085   Epoch: 14   Global Step: 587800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:22,059-Speed 2641.45 samples/sec   Loss 3.9385   LearningRate 0.0085   Epoch: 14   Global Step: 587810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:25,952-Speed 2630.76 samples/sec   Loss 3.9032   LearningRate 0.0085   Epoch: 14   Global Step: 587820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:29,848-Speed 2629.48 samples/sec   Loss 3.8795   LearningRate 0.0085   Epoch: 14   Global Step: 587830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:33,739-Speed 2631.88 samples/sec   Loss 3.9263   LearningRate 0.0085   Epoch: 14   Global Step: 587840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:37,633-Speed 2630.19 samples/sec   Loss 3.9909   LearningRate 0.0085   Epoch: 14   Global Step: 587850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:41,567-Speed 2603.66 samples/sec   Loss 3.9504   LearningRate 0.0085   Epoch: 14   Global Step: 587860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:45,462-Speed 2630.25 samples/sec   Loss 3.8965   LearningRate 0.0085   Epoch: 14   Global Step: 587870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:49,350-Speed 2634.54 samples/sec   Loss 3.9869   LearningRate 0.0085   Epoch: 14   Global Step: 587880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:53,256-Speed 2622.06 samples/sec   Loss 4.0825   LearningRate 0.0085   Epoch: 14   Global Step: 587890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:47:57,149-Speed 2631.37 samples/sec   Loss 3.9011   LearningRate 0.0085   Epoch: 14   Global Step: 587900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:01,055-Speed 2622.27 samples/sec   Loss 3.8810   LearningRate 0.0085   Epoch: 14   Global Step: 587910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:48:04,942-Speed 2635.39 samples/sec   Loss 3.8555   LearningRate 0.0085   Epoch: 14   Global Step: 587920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:08,843-Speed 2625.09 samples/sec   Loss 3.9739   LearningRate 0.0085   Epoch: 14   Global Step: 587930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:12,738-Speed 2630.95 samples/sec   Loss 3.9317   LearningRate 0.0085   Epoch: 14   Global Step: 587940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:16,630-Speed 2631.49 samples/sec   Loss 4.0142   LearningRate 0.0085   Epoch: 14   Global Step: 587950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:20,523-Speed 2631.06 samples/sec   Loss 3.8615   LearningRate 0.0085   Epoch: 14   Global Step: 587960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:24,415-Speed 2632.11 samples/sec   Loss 3.9149   LearningRate 0.0085   Epoch: 14   Global Step: 587970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:28,315-Speed 2626.48 samples/sec   Loss 3.9070   LearningRate 0.0085   Epoch: 14   Global Step: 587980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:32,227-Speed 2617.54 samples/sec   Loss 3.9262   LearningRate 0.0085   Epoch: 14   Global Step: 587990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:36,127-Speed 2626.48 samples/sec   Loss 3.9524   LearningRate 0.0085   Epoch: 14   Global Step: 588000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:40,027-Speed 2626.25 samples/sec   Loss 3.9636   LearningRate 0.0085   Epoch: 14   Global Step: 588010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:48:43,960-Speed 2604.33 samples/sec   Loss 3.9412   LearningRate 0.0085   Epoch: 14   Global Step: 588020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:48:47,879-Speed 2614.13 samples/sec   Loss 4.0004   LearningRate 0.0085   Epoch: 14   Global Step: 588030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:48:51,772-Speed 2630.70 samples/sec   Loss 3.9597   LearningRate 0.0085   Epoch: 14   Global Step: 588040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:48:55,666-Speed 2630.38 samples/sec   Loss 3.9132   LearningRate 0.0085   Epoch: 14   Global Step: 588050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:48:59,538-Speed 2645.74 samples/sec   Loss 4.0344   LearningRate 0.0085   Epoch: 14   Global Step: 588060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:03,438-Speed 2625.97 samples/sec   Loss 3.9334   LearningRate 0.0085   Epoch: 14   Global Step: 588070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:07,350-Speed 2617.97 samples/sec   Loss 3.9668   LearningRate 0.0085   Epoch: 14   Global Step: 588080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:11,248-Speed 2628.61 samples/sec   Loss 3.9109   LearningRate 0.0085   Epoch: 14   Global Step: 588090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:15,142-Speed 2629.91 samples/sec   Loss 3.9715   LearningRate 0.0085   Epoch: 14   Global Step: 588100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:19,037-Speed 2630.16 samples/sec   Loss 3.9037   LearningRate 0.0085   Epoch: 14   Global Step: 588110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:22,928-Speed 2631.84 samples/sec   Loss 3.9029   LearningRate 0.0085   Epoch: 14   Global Step: 588120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:26,831-Speed 2624.91 samples/sec   Loss 3.9608   LearningRate 0.0085   Epoch: 14   Global Step: 588130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:30,727-Speed 2628.62 samples/sec   Loss 3.9229   LearningRate 0.0085   Epoch: 14   Global Step: 588140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:34,619-Speed 2632.17 samples/sec   Loss 3.9567   LearningRate 0.0085   Epoch: 14   Global Step: 588150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:38,524-Speed 2622.50 samples/sec   Loss 3.9696   LearningRate 0.0085   Epoch: 14   Global Step: 588160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:49:42,397-Speed 2645.16 samples/sec   Loss 4.0299   LearningRate 0.0085   Epoch: 14   Global Step: 588170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:46,293-Speed 2628.42 samples/sec   Loss 3.9542   LearningRate 0.0085   Epoch: 14   Global Step: 588180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:50,188-Speed 2629.63 samples/sec   Loss 3.7580   LearningRate 0.0085   Epoch: 14   Global Step: 588190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:54,083-Speed 2629.49 samples/sec   Loss 3.8871   LearningRate 0.0085   Epoch: 14   Global Step: 588200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:49:57,983-Speed 2626.79 samples/sec   Loss 4.0310   LearningRate 0.0085   Epoch: 14   Global Step: 588210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:01,878-Speed 2629.55 samples/sec   Loss 3.9961   LearningRate 0.0085   Epoch: 14   Global Step: 588220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:05,773-Speed 2630.19 samples/sec   Loss 3.9316   LearningRate 0.0085   Epoch: 14   Global Step: 588230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:09,666-Speed 2630.32 samples/sec   Loss 3.9181   LearningRate 0.0085   Epoch: 14   Global Step: 588240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:13,560-Speed 2631.22 samples/sec   Loss 3.9178   LearningRate 0.0085   Epoch: 14   Global Step: 588250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:17,453-Speed 2630.75 samples/sec   Loss 3.9924   LearningRate 0.0085   Epoch: 14   Global Step: 588260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:21,336-Speed 2637.69 samples/sec   Loss 3.9286   LearningRate 0.0085   Epoch: 14   Global Step: 588270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:25,576-Speed 2416.33 samples/sec   Loss 3.9282   LearningRate 0.0085   Epoch: 14   Global Step: 588280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:29,473-Speed 2627.84 samples/sec   Loss 3.8929   LearningRate 0.0085   Epoch: 14   Global Step: 588290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:33,440-Speed 2582.04 samples/sec   Loss 3.9283   LearningRate 0.0085   Epoch: 14   Global Step: 588300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:37,361-Speed 2612.11 samples/sec   Loss 3.9350   LearningRate 0.0085   Epoch: 14   Global Step: 588310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:41,255-Speed 2630.68 samples/sec   Loss 3.8990   LearningRate 0.0085   Epoch: 14   Global Step: 588320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:45,146-Speed 2632.29 samples/sec   Loss 3.8625   LearningRate 0.0085   Epoch: 14   Global Step: 588330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:49,037-Speed 2632.65 samples/sec   Loss 3.8898   LearningRate 0.0085   Epoch: 14   Global Step: 588340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:52,933-Speed 2628.65 samples/sec   Loss 3.9776   LearningRate 0.0085   Epoch: 14   Global Step: 588350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:50:56,833-Speed 2626.84 samples/sec   Loss 4.0009   LearningRate 0.0085   Epoch: 14   Global Step: 588360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:00,706-Speed 2644.43 samples/sec   Loss 3.9386   LearningRate 0.0085   Epoch: 14   Global Step: 588370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:04,604-Speed 2627.73 samples/sec   Loss 3.8854   LearningRate 0.0085   Epoch: 14   Global Step: 588380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:08,513-Speed 2619.86 samples/sec   Loss 3.9525   LearningRate 0.0085   Epoch: 14   Global Step: 588390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:12,411-Speed 2628.09 samples/sec   Loss 3.9425   LearningRate 0.0085   Epoch: 14   Global Step: 588400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:16,317-Speed 2622.34 samples/sec   Loss 3.9270   LearningRate 0.0085   Epoch: 14   Global Step: 588410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:20,206-Speed 2633.60 samples/sec   Loss 3.9564   LearningRate 0.0085   Epoch: 14   Global Step: 588420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:24,129-Speed 2611.10 samples/sec   Loss 3.9293   LearningRate 0.0085   Epoch: 14   Global Step: 588430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:28,023-Speed 2630.79 samples/sec   Loss 3.9721   LearningRate 0.0084   Epoch: 14   Global Step: 588440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:31,912-Speed 2633.27 samples/sec   Loss 3.8956   LearningRate 0.0084   Epoch: 14   Global Step: 588450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:35,802-Speed 2632.98 samples/sec   Loss 3.9397   LearningRate 0.0084   Epoch: 14   Global Step: 588460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:39,717-Speed 2616.45 samples/sec   Loss 3.9630   LearningRate 0.0084   Epoch: 14   Global Step: 588470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:51:43,614-Speed 2628.42 samples/sec   Loss 3.9175   LearningRate 0.0084   Epoch: 14   Global Step: 588480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:51:47,485-Speed 2646.45 samples/sec   Loss 3.9154   LearningRate 0.0084   Epoch: 14   Global Step: 588490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:51,382-Speed 2627.87 samples/sec   Loss 3.8499   LearningRate 0.0084   Epoch: 14   Global Step: 588500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:55,290-Speed 2621.16 samples/sec   Loss 3.9301   LearningRate 0.0084   Epoch: 14   Global Step: 588510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:51:59,186-Speed 2629.11 samples/sec   Loss 4.0003   LearningRate 0.0084   Epoch: 14   Global Step: 588520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:03,095-Speed 2620.59 samples/sec   Loss 4.0331   LearningRate 0.0084   Epoch: 14   Global Step: 588530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:06,994-Speed 2626.51 samples/sec   Loss 3.9942   LearningRate 0.0084   Epoch: 14   Global Step: 588540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:10,904-Speed 2620.45 samples/sec   Loss 3.8974   LearningRate 0.0084   Epoch: 14   Global Step: 588550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:14,797-Speed 2630.84 samples/sec   Loss 3.8636   LearningRate 0.0084   Epoch: 14   Global Step: 588560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:18,687-Speed 2633.48 samples/sec   Loss 3.8957   LearningRate 0.0084   Epoch: 14   Global Step: 588570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:22,581-Speed 2630.49 samples/sec   Loss 3.9131   LearningRate 0.0084   Epoch: 14   Global Step: 588580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:26,481-Speed 2626.20 samples/sec   Loss 4.0007   LearningRate 0.0084   Epoch: 14   Global Step: 588590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:52:30,359-Speed 2641.01 samples/sec   Loss 4.0429   LearningRate 0.0084   Epoch: 14   Global Step: 588600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:34,267-Speed 2621.50 samples/sec   Loss 3.9804   LearningRate 0.0084   Epoch: 14   Global Step: 588610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:38,163-Speed 2629.08 samples/sec   Loss 4.0512   LearningRate 0.0084   Epoch: 14   Global Step: 588620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:42,073-Speed 2619.53 samples/sec   Loss 3.9529   LearningRate 0.0084   Epoch: 14   Global Step: 588630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:52:45,960-Speed 2635.31 samples/sec   Loss 3.9526   LearningRate 0.0084   Epoch: 14   Global Step: 588640   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:52:49,862-Speed 2625.06 samples/sec   Loss 4.0049   LearningRate 0.0084   Epoch: 14   Global Step: 588650   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:52:53,757-Speed 2630.50 samples/sec   Loss 3.9143   LearningRate 0.0084   Epoch: 14   Global Step: 588660   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:52:57,652-Speed 2629.48 samples/sec   Loss 3.9515   LearningRate 0.0084   Epoch: 14   Global Step: 588670   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:01,547-Speed 2629.34 samples/sec   Loss 3.9442   LearningRate 0.0084   Epoch: 14   Global Step: 588680   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:05,447-Speed 2626.64 samples/sec   Loss 3.9723   LearningRate 0.0084   Epoch: 14   Global Step: 588690   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:09,346-Speed 2627.17 samples/sec   Loss 3.9305   LearningRate 0.0084   Epoch: 14   Global Step: 588700   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:13,248-Speed 2625.02 samples/sec   Loss 3.9636   LearningRate 0.0084   Epoch: 14   Global Step: 588710   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:17,146-Speed 2627.59 samples/sec   Loss 3.9052   LearningRate 0.0084   Epoch: 14   Global Step: 588720   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:21,040-Speed 2630.64 samples/sec   Loss 3.9585   LearningRate 0.0084   Epoch: 14   Global Step: 588730   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:24,936-Speed 2628.64 samples/sec   Loss 3.9004   LearningRate 0.0084   Epoch: 14   Global Step: 588740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:53:28,831-Speed 2629.81 samples/sec   Loss 3.9972   LearningRate 0.0084   Epoch: 14   Global Step: 588750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:53:32,723-Speed 2631.78 samples/sec   Loss 3.9828   LearningRate 0.0084   Epoch: 14   Global Step: 588760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:53:36,616-Speed 2631.02 samples/sec   Loss 3.9559   LearningRate 0.0084   Epoch: 14   Global Step: 588770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:53:40,512-Speed 2628.72 samples/sec   Loss 3.9473   LearningRate 0.0084   Epoch: 14   Global Step: 588780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:53:44,406-Speed 2630.69 samples/sec   Loss 3.9278   LearningRate 0.0084   Epoch: 14   Global Step: 588790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:53:48,304-Speed 2627.95 samples/sec   Loss 3.9564   LearningRate 0.0084   Epoch: 14   Global Step: 588800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:53:52,187-Speed 2637.06 samples/sec   Loss 4.0550   LearningRate 0.0084   Epoch: 14   Global Step: 588810   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:56,098-Speed 2619.97 samples/sec   Loss 3.9865   LearningRate 0.0084   Epoch: 14   Global Step: 588820   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:53:59,994-Speed 2628.56 samples/sec   Loss 3.8895   LearningRate 0.0084   Epoch: 14   Global Step: 588830   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:03,906-Speed 2618.32 samples/sec   Loss 3.8740   LearningRate 0.0084   Epoch: 14   Global Step: 588840   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:07,822-Speed 2615.31 samples/sec   Loss 3.9916   LearningRate 0.0084   Epoch: 14   Global Step: 588850   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:11,728-Speed 2623.18 samples/sec   Loss 3.9391   LearningRate 0.0084   Epoch: 14   Global Step: 588860   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:15,636-Speed 2620.37 samples/sec   Loss 3.9328   LearningRate 0.0084   Epoch: 14   Global Step: 588870   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:19,546-Speed 2620.04 samples/sec   Loss 3.9870   LearningRate 0.0084   Epoch: 14   Global Step: 588880   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:23,453-Speed 2621.34 samples/sec   Loss 3.9438   LearningRate 0.0084   Epoch: 14   Global Step: 588890   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:27,351-Speed 2627.82 samples/sec   Loss 3.9113   LearningRate 0.0084   Epoch: 14   Global Step: 588900   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:54:31,251-Speed 2626.00 samples/sec   Loss 3.9172   LearningRate 0.0084   Epoch: 14   Global Step: 588910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:54:35,146-Speed 2629.98 samples/sec   Loss 3.8432   LearningRate 0.0084   Epoch: 14   Global Step: 588920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:54:39,054-Speed 2620.70 samples/sec   Loss 3.9610   LearningRate 0.0084   Epoch: 14   Global Step: 588930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:54:42,974-Speed 2612.99 samples/sec   Loss 3.9534   LearningRate 0.0084   Epoch: 14   Global Step: 588940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:54:46,873-Speed 2626.81 samples/sec   Loss 3.9418   LearningRate 0.0084   Epoch: 14   Global Step: 588950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:54:50,771-Speed 2627.55 samples/sec   Loss 3.8913   LearningRate 0.0084   Epoch: 14   Global Step: 588960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:54:54,676-Speed 2623.21 samples/sec   Loss 3.9303   LearningRate 0.0084   Epoch: 14   Global Step: 588970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:54:58,622-Speed 2595.44 samples/sec   Loss 3.9766   LearningRate 0.0084   Epoch: 14   Global Step: 588980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:02,520-Speed 2628.31 samples/sec   Loss 3.9245   LearningRate 0.0084   Epoch: 14   Global Step: 588990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:06,414-Speed 2629.90 samples/sec   Loss 3.9567   LearningRate 0.0084   Epoch: 14   Global Step: 589000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:10,314-Speed 2626.07 samples/sec   Loss 3.9247   LearningRate 0.0084   Epoch: 14   Global Step: 589010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:55:14,211-Speed 2629.72 samples/sec   Loss 3.8825   LearningRate 0.0084   Epoch: 14   Global Step: 589020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:55:18,149-Speed 2600.85 samples/sec   Loss 3.8639   LearningRate 0.0084   Epoch: 14   Global Step: 589030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:55:22,044-Speed 2629.89 samples/sec   Loss 3.9329   LearningRate 0.0084   Epoch: 14   Global Step: 589040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:55:25,960-Speed 2615.70 samples/sec   Loss 3.8728   LearningRate 0.0084   Epoch: 14   Global Step: 589050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:29,858-Speed 2627.61 samples/sec   Loss 3.9471   LearningRate 0.0084   Epoch: 14   Global Step: 589060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:33,767-Speed 2620.12 samples/sec   Loss 3.9493   LearningRate 0.0084   Epoch: 14   Global Step: 589070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:37,680-Speed 2618.28 samples/sec   Loss 3.9296   LearningRate 0.0084   Epoch: 14   Global Step: 589080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:41,571-Speed 2632.84 samples/sec   Loss 3.8969   LearningRate 0.0084   Epoch: 14   Global Step: 589090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:45,470-Speed 2626.18 samples/sec   Loss 3.8788   LearningRate 0.0084   Epoch: 14   Global Step: 589100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:49,370-Speed 2626.86 samples/sec   Loss 3.9547   LearningRate 0.0084   Epoch: 14   Global Step: 589110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:53,264-Speed 2630.06 samples/sec   Loss 3.8878   LearningRate 0.0084   Epoch: 14   Global Step: 589120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:55:57,164-Speed 2626.73 samples/sec   Loss 3.9384   LearningRate 0.0084   Epoch: 14   Global Step: 589130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:01,056-Speed 2631.32 samples/sec   Loss 3.9498   LearningRate 0.0084   Epoch: 14   Global Step: 589140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:04,964-Speed 2621.11 samples/sec   Loss 3.8388   LearningRate 0.0084   Epoch: 14   Global Step: 589150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:56:08,861-Speed 2627.76 samples/sec   Loss 3.7857   LearningRate 0.0084   Epoch: 14   Global Step: 589160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:56:12,761-Speed 2626.81 samples/sec   Loss 3.9662   LearningRate 0.0084   Epoch: 14   Global Step: 589170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:56:16,666-Speed 2622.49 samples/sec   Loss 3.9204   LearningRate 0.0084   Epoch: 14   Global Step: 589180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:56:20,563-Speed 2628.58 samples/sec   Loss 3.9804   LearningRate 0.0084   Epoch: 14   Global Step: 589190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 13:56:24,442-Speed 2640.40 samples/sec   Loss 3.9094   LearningRate 0.0084   Epoch: 14   Global Step: 589200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:28,345-Speed 2624.22 samples/sec   Loss 3.9361   LearningRate 0.0084   Epoch: 14   Global Step: 589210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:32,251-Speed 2622.65 samples/sec   Loss 3.9722   LearningRate 0.0084   Epoch: 14   Global Step: 589220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:36,150-Speed 2626.47 samples/sec   Loss 3.9352   LearningRate 0.0084   Epoch: 14   Global Step: 589230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:40,074-Speed 2610.13 samples/sec   Loss 3.8434   LearningRate 0.0084   Epoch: 14   Global Step: 589240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:44,043-Speed 2581.17 samples/sec   Loss 3.9840   LearningRate 0.0084   Epoch: 14   Global Step: 589250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:48,027-Speed 2570.54 samples/sec   Loss 3.8716   LearningRate 0.0084   Epoch: 14   Global Step: 589260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:51,931-Speed 2623.92 samples/sec   Loss 3.9384   LearningRate 0.0084   Epoch: 14   Global Step: 589270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:55,848-Speed 2614.71 samples/sec   Loss 3.9012   LearningRate 0.0084   Epoch: 14   Global Step: 589280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:56:59,745-Speed 2628.77 samples/sec   Loss 3.8479   LearningRate 0.0084   Epoch: 14   Global Step: 589290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:57:03,621-Speed 2642.51 samples/sec   Loss 3.9327   LearningRate 0.0084   Epoch: 14   Global Step: 589300   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:07,517-Speed 2628.27 samples/sec   Loss 4.0023   LearningRate 0.0084   Epoch: 14   Global Step: 589310   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:11,417-Speed 2626.51 samples/sec   Loss 3.9143   LearningRate 0.0084   Epoch: 14   Global Step: 589320   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:15,318-Speed 2625.57 samples/sec   Loss 3.8904   LearningRate 0.0084   Epoch: 14   Global Step: 589330   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:19,226-Speed 2621.34 samples/sec   Loss 3.9495   LearningRate 0.0084   Epoch: 14   Global Step: 589340   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:23,125-Speed 2626.66 samples/sec   Loss 3.8980   LearningRate 0.0084   Epoch: 14   Global Step: 589350   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:27,027-Speed 2625.51 samples/sec   Loss 3.8576   LearningRate 0.0084   Epoch: 14   Global Step: 589360   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:30,932-Speed 2622.81 samples/sec   Loss 4.0048   LearningRate 0.0084   Epoch: 14   Global Step: 589370   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:34,835-Speed 2623.63 samples/sec   Loss 3.9271   LearningRate 0.0084   Epoch: 14   Global Step: 589380   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:38,734-Speed 2626.92 samples/sec   Loss 3.9000   LearningRate 0.0084   Epoch: 14   Global Step: 589390   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:57:42,645-Speed 2619.24 samples/sec   Loss 3.8802   LearningRate 0.0084   Epoch: 14   Global Step: 589400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:57:46,546-Speed 2625.37 samples/sec   Loss 3.9404   LearningRate 0.0084   Epoch: 14   Global Step: 589410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:57:50,443-Speed 2628.64 samples/sec   Loss 3.8999   LearningRate 0.0084   Epoch: 14   Global Step: 589420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:57:54,339-Speed 2629.27 samples/sec   Loss 3.9618   LearningRate 0.0084   Epoch: 14   Global Step: 589430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:57:58,224-Speed 2636.52 samples/sec   Loss 3.9875   LearningRate 0.0084   Epoch: 14   Global Step: 589440   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:02,142-Speed 2614.13 samples/sec   Loss 3.9521   LearningRate 0.0084   Epoch: 14   Global Step: 589450   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:06,041-Speed 2626.99 samples/sec   Loss 3.8516   LearningRate 0.0084   Epoch: 14   Global Step: 589460   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:09,933-Speed 2631.38 samples/sec   Loss 3.8331   LearningRate 0.0084   Epoch: 14   Global Step: 589470   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:13,829-Speed 2629.34 samples/sec   Loss 3.9069   LearningRate 0.0084   Epoch: 14   Global Step: 589480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:17,723-Speed 2630.33 samples/sec   Loss 3.9075   LearningRate 0.0084   Epoch: 14   Global Step: 589490   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:21,617-Speed 2630.59 samples/sec   Loss 3.9123   LearningRate 0.0084   Epoch: 14   Global Step: 589500   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:25,521-Speed 2623.43 samples/sec   Loss 3.8489   LearningRate 0.0084   Epoch: 14   Global Step: 589510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:29,446-Speed 2609.90 samples/sec   Loss 3.8998   LearningRate 0.0084   Epoch: 14   Global Step: 589520   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:33,344-Speed 2628.51 samples/sec   Loss 3.8804   LearningRate 0.0084   Epoch: 14   Global Step: 589530   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:58:37,268-Speed 2609.50 samples/sec   Loss 3.9783   LearningRate 0.0084   Epoch: 14   Global Step: 589540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:58:41,162-Speed 2630.75 samples/sec   Loss 4.0235   LearningRate 0.0084   Epoch: 14   Global Step: 589550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:58:45,064-Speed 2625.17 samples/sec   Loss 3.8455   LearningRate 0.0084   Epoch: 14   Global Step: 589560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:58:48,961-Speed 2628.49 samples/sec   Loss 3.8181   LearningRate 0.0084   Epoch: 14   Global Step: 589570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:58:52,863-Speed 2624.46 samples/sec   Loss 3.8817   LearningRate 0.0084   Epoch: 14   Global Step: 589580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:58:56,768-Speed 2623.06 samples/sec   Loss 3.9618   LearningRate 0.0084   Epoch: 14   Global Step: 589590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:59:00,663-Speed 2629.16 samples/sec   Loss 3.9538   LearningRate 0.0084   Epoch: 14   Global Step: 589600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:59:04,545-Speed 2638.66 samples/sec   Loss 3.8485   LearningRate 0.0084   Epoch: 14   Global Step: 589610   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:08,437-Speed 2632.49 samples/sec   Loss 3.9447   LearningRate 0.0084   Epoch: 14   Global Step: 589620   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:12,332-Speed 2629.35 samples/sec   Loss 3.9280   LearningRate 0.0084   Epoch: 14   Global Step: 589630   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:16,232-Speed 2626.31 samples/sec   Loss 3.9130   LearningRate 0.0084   Epoch: 14   Global Step: 589640   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:20,127-Speed 2629.47 samples/sec   Loss 3.9606   LearningRate 0.0084   Epoch: 14   Global Step: 589650   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:24,037-Speed 2619.92 samples/sec   Loss 3.9411   LearningRate 0.0084   Epoch: 14   Global Step: 589660   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:27,933-Speed 2629.26 samples/sec   Loss 3.9182   LearningRate 0.0084   Epoch: 14   Global Step: 589670   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:31,841-Speed 2620.25 samples/sec   Loss 3.8905   LearningRate 0.0084   Epoch: 14   Global Step: 589680   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:35,742-Speed 2625.99 samples/sec   Loss 3.8781   LearningRate 0.0084   Epoch: 14   Global Step: 589690   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:39,705-Speed 2584.27 samples/sec   Loss 3.8405   LearningRate 0.0084   Epoch: 14   Global Step: 589700   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 13:59:43,790-Speed 2508.01 samples/sec   Loss 3.9606   LearningRate 0.0084   Epoch: 14   Global Step: 589710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:59:47,892-Speed 2497.04 samples/sec   Loss 3.9750   LearningRate 0.0084   Epoch: 14   Global Step: 589720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:59:51,868-Speed 2576.20 samples/sec   Loss 3.8228   LearningRate 0.0084   Epoch: 14   Global Step: 589730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:59:55,771-Speed 2624.14 samples/sec   Loss 3.8313   LearningRate 0.0084   Epoch: 14   Global Step: 589740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 13:59:59,670-Speed 2627.14 samples/sec   Loss 3.8553   LearningRate 0.0084   Epoch: 14   Global Step: 589750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:03,565-Speed 2629.04 samples/sec   Loss 3.9657   LearningRate 0.0084   Epoch: 14   Global Step: 589760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:07,466-Speed 2625.49 samples/sec   Loss 3.8061   LearningRate 0.0084   Epoch: 14   Global Step: 589770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:11,365-Speed 2626.69 samples/sec   Loss 3.9144   LearningRate 0.0084   Epoch: 14   Global Step: 589780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:15,279-Speed 2617.92 samples/sec   Loss 3.7593   LearningRate 0.0084   Epoch: 14   Global Step: 589790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:19,174-Speed 2629.93 samples/sec   Loss 4.0217   LearningRate 0.0084   Epoch: 14   Global Step: 589800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:23,087-Speed 2617.54 samples/sec   Loss 3.9125   LearningRate 0.0084   Epoch: 14   Global Step: 589810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:00:26,984-Speed 2629.57 samples/sec   Loss 3.9360   LearningRate 0.0084   Epoch: 14   Global Step: 589820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:00:30,879-Speed 2629.40 samples/sec   Loss 3.8835   LearningRate 0.0084   Epoch: 14   Global Step: 589830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:00:34,784-Speed 2622.62 samples/sec   Loss 3.9935   LearningRate 0.0084   Epoch: 14   Global Step: 589840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:00:38,700-Speed 2615.87 samples/sec   Loss 3.9718   LearningRate 0.0084   Epoch: 14   Global Step: 589850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:00:42,568-Speed 2648.01 samples/sec   Loss 3.8247   LearningRate 0.0084   Epoch: 14   Global Step: 589860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:46,460-Speed 2631.65 samples/sec   Loss 3.8936   LearningRate 0.0083   Epoch: 14   Global Step: 589870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:50,371-Speed 2619.48 samples/sec   Loss 3.9465   LearningRate 0.0083   Epoch: 14   Global Step: 589880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:54,277-Speed 2621.87 samples/sec   Loss 3.9243   LearningRate 0.0083   Epoch: 14   Global Step: 589890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:00:58,197-Speed 2613.13 samples/sec   Loss 3.9032   LearningRate 0.0083   Epoch: 14   Global Step: 589900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:01:02,097-Speed 2625.93 samples/sec   Loss 3.9503   LearningRate 0.0083   Epoch: 14   Global Step: 589910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:01:05,994-Speed 2628.50 samples/sec   Loss 3.9823   LearningRate 0.0083   Epoch: 14   Global Step: 589920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:01:09,885-Speed 2631.98 samples/sec   Loss 3.8686   LearningRate 0.0083   Epoch: 14   Global Step: 589930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:01:13,782-Speed 2628.73 samples/sec   Loss 3.9439   LearningRate 0.0083   Epoch: 14   Global Step: 589940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:01:17,676-Speed 2630.35 samples/sec   Loss 3.9223   LearningRate 0.0083   Epoch: 14   Global Step: 589950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:01:21,583-Speed 2621.87 samples/sec   Loss 4.0053   LearningRate 0.0083   Epoch: 14   Global Step: 589960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:01:25,491-Speed 2621.26 samples/sec   Loss 3.9159   LearningRate 0.0083   Epoch: 14   Global Step: 589970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:01:29,395-Speed 2623.71 samples/sec   Loss 3.8316   LearningRate 0.0083   Epoch: 14   Global Step: 589980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:01:33,294-Speed 2626.98 samples/sec   Loss 3.8181   LearningRate 0.0083   Epoch: 14   Global Step: 589990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:01:37,191-Speed 2628.56 samples/sec   Loss 3.9263   LearningRate 0.0083   Epoch: 14   Global Step: 590000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:02:20,085-[lfw][590000]XNorm: 22.907722
Training: 2022-04-15 14:02:20,086-[lfw][590000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 14:02:20,087-[lfw][590000]Accuracy-Highest: 0.99800
Training: 2022-04-15 14:03:10,230-[cfp_fp][590000]XNorm: 21.833903
Training: 2022-04-15 14:03:10,231-[cfp_fp][590000]Accuracy-Flip: 0.99043+-0.00378
Training: 2022-04-15 14:03:10,232-[cfp_fp][590000]Accuracy-Highest: 0.99143
Training: 2022-04-15 14:03:53,137-[agedb_30][590000]XNorm: 23.177372
Training: 2022-04-15 14:03:53,138-[agedb_30][590000]Accuracy-Flip: 0.97867+-0.00726
Training: 2022-04-15 14:03:53,138-[agedb_30][590000]Accuracy-Highest: 0.98083
Training: 2022-04-15 14:03:57,097-Speed 73.19 samples/sec   Loss 3.9634   LearningRate 0.0083   Epoch: 14   Global Step: 590010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:00,984-Speed 2635.08 samples/sec   Loss 3.9249   LearningRate 0.0083   Epoch: 14   Global Step: 590020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:05,057-Speed 2515.36 samples/sec   Loss 3.8551   LearningRate 0.0083   Epoch: 14   Global Step: 590030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:08,986-Speed 2606.99 samples/sec   Loss 3.9306   LearningRate 0.0083   Epoch: 14   Global Step: 590040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:12,901-Speed 2615.45 samples/sec   Loss 3.8859   LearningRate 0.0083   Epoch: 14   Global Step: 590050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:16,788-Speed 2636.07 samples/sec   Loss 3.8795   LearningRate 0.0083   Epoch: 14   Global Step: 590060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:20,662-Speed 2644.65 samples/sec   Loss 3.8765   LearningRate 0.0083   Epoch: 14   Global Step: 590070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:24,591-Speed 2607.05 samples/sec   Loss 3.8039   LearningRate 0.0083   Epoch: 14   Global Step: 590080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:28,472-Speed 2639.56 samples/sec   Loss 3.8809   LearningRate 0.0083   Epoch: 14   Global Step: 590090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:04:32,332-Speed 2653.39 samples/sec   Loss 4.0121   LearningRate 0.0083   Epoch: 14   Global Step: 590100   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:04:36,216-Speed 2637.68 samples/sec   Loss 3.9791   LearningRate 0.0083   Epoch: 14   Global Step: 590110   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:04:40,098-Speed 2638.18 samples/sec   Loss 3.9479   LearningRate 0.0083   Epoch: 14   Global Step: 590120   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:04:43,996-Speed 2627.57 samples/sec   Loss 3.8732   LearningRate 0.0083   Epoch: 14   Global Step: 590130   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:04:47,883-Speed 2634.56 samples/sec   Loss 3.8787   LearningRate 0.0083   Epoch: 14   Global Step: 590140   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:04:51,778-Speed 2630.19 samples/sec   Loss 3.9288   LearningRate 0.0083   Epoch: 14   Global Step: 590150   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:04:55,667-Speed 2634.16 samples/sec   Loss 3.8531   LearningRate 0.0083   Epoch: 14   Global Step: 590160   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:04:59,556-Speed 2633.46 samples/sec   Loss 3.9253   LearningRate 0.0083   Epoch: 14   Global Step: 590170   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:03,467-Speed 2618.77 samples/sec   Loss 3.8917   LearningRate 0.0083   Epoch: 14   Global Step: 590180   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:07,365-Speed 2628.11 samples/sec   Loss 3.8922   LearningRate 0.0083   Epoch: 14   Global Step: 590190   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:11,238-Speed 2644.76 samples/sec   Loss 3.9170   LearningRate 0.0083   Epoch: 14   Global Step: 590200   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:15,135-Speed 2628.60 samples/sec   Loss 3.9525   LearningRate 0.0083   Epoch: 14   Global Step: 590210   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:19,048-Speed 2617.24 samples/sec   Loss 4.0389   LearningRate 0.0083   Epoch: 14   Global Step: 590220   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:22,946-Speed 2627.85 samples/sec   Loss 3.9005   LearningRate 0.0083   Epoch: 14   Global Step: 590230   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:26,847-Speed 2625.38 samples/sec   Loss 3.9622   LearningRate 0.0083   Epoch: 14   Global Step: 590240   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:30,747-Speed 2626.87 samples/sec   Loss 3.8534   LearningRate 0.0083   Epoch: 14   Global Step: 590250   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:34,655-Speed 2620.54 samples/sec   Loss 4.0189   LearningRate 0.0083   Epoch: 14   Global Step: 590260   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:38,546-Speed 2632.66 samples/sec   Loss 3.8470   LearningRate 0.0083   Epoch: 14   Global Step: 590270   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:42,439-Speed 2630.92 samples/sec   Loss 3.8804   LearningRate 0.0083   Epoch: 14   Global Step: 590280   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:46,331-Speed 2631.14 samples/sec   Loss 3.8307   LearningRate 0.0083   Epoch: 14   Global Step: 590290   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:05:50,223-Speed 2632.19 samples/sec   Loss 3.9040   LearningRate 0.0083   Epoch: 14   Global Step: 590300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:05:54,117-Speed 2630.02 samples/sec   Loss 3.8679   LearningRate 0.0083   Epoch: 14   Global Step: 590310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:05:58,012-Speed 2630.85 samples/sec   Loss 3.9681   LearningRate 0.0083   Epoch: 14   Global Step: 590320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:01,922-Speed 2618.93 samples/sec   Loss 3.8944   LearningRate 0.0083   Epoch: 14   Global Step: 590330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:05,825-Speed 2625.29 samples/sec   Loss 3.9393   LearningRate 0.0083   Epoch: 14   Global Step: 590340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:09,718-Speed 2630.91 samples/sec   Loss 3.8938   LearningRate 0.0083   Epoch: 14   Global Step: 590350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:13,630-Speed 2617.63 samples/sec   Loss 3.9121   LearningRate 0.0083   Epoch: 14   Global Step: 590360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:17,521-Speed 2632.28 samples/sec   Loss 3.9080   LearningRate 0.0083   Epoch: 14   Global Step: 590370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:21,464-Speed 2598.26 samples/sec   Loss 3.8746   LearningRate 0.0083   Epoch: 14   Global Step: 590380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:25,363-Speed 2626.62 samples/sec   Loss 3.9451   LearningRate 0.0083   Epoch: 14   Global Step: 590390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:29,303-Speed 2600.63 samples/sec   Loss 3.8831   LearningRate 0.0083   Epoch: 14   Global Step: 590400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:06:33,297-Speed 2564.71 samples/sec   Loss 3.8809   LearningRate 0.0083   Epoch: 14   Global Step: 590410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:06:37,218-Speed 2612.15 samples/sec   Loss 3.9516   LearningRate 0.0083   Epoch: 14   Global Step: 590420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:06:41,193-Speed 2576.87 samples/sec   Loss 3.8742   LearningRate 0.0083   Epoch: 14   Global Step: 590430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:06:45,078-Speed 2636.84 samples/sec   Loss 3.9823   LearningRate 0.0083   Epoch: 14   Global Step: 590440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:48,977-Speed 2626.71 samples/sec   Loss 3.9999   LearningRate 0.0083   Epoch: 14   Global Step: 590450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:52,871-Speed 2630.72 samples/sec   Loss 3.8458   LearningRate 0.0083   Epoch: 14   Global Step: 590460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:06:56,772-Speed 2625.64 samples/sec   Loss 3.8649   LearningRate 0.0083   Epoch: 14   Global Step: 590470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:00,686-Speed 2616.58 samples/sec   Loss 3.9468   LearningRate 0.0083   Epoch: 14   Global Step: 590480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:04,622-Speed 2602.12 samples/sec   Loss 3.9913   LearningRate 0.0083   Epoch: 14   Global Step: 590490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:08,519-Speed 2628.83 samples/sec   Loss 3.8431   LearningRate 0.0083   Epoch: 14   Global Step: 590500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:12,429-Speed 2619.88 samples/sec   Loss 3.9318   LearningRate 0.0083   Epoch: 14   Global Step: 590510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:16,322-Speed 2630.60 samples/sec   Loss 3.9083   LearningRate 0.0083   Epoch: 14   Global Step: 590520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:20,233-Speed 2619.55 samples/sec   Loss 3.9138   LearningRate 0.0083   Epoch: 14   Global Step: 590530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:24,131-Speed 2627.21 samples/sec   Loss 3.8700   LearningRate 0.0083   Epoch: 14   Global Step: 590540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:07:28,030-Speed 2627.22 samples/sec   Loss 3.9033   LearningRate 0.0083   Epoch: 14   Global Step: 590550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:07:31,928-Speed 2627.79 samples/sec   Loss 3.8882   LearningRate 0.0083   Epoch: 14   Global Step: 590560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:07:35,840-Speed 2618.31 samples/sec   Loss 3.8958   LearningRate 0.0083   Epoch: 14   Global Step: 590570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:07:39,727-Speed 2634.63 samples/sec   Loss 3.9266   LearningRate 0.0083   Epoch: 14   Global Step: 590580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:43,630-Speed 2624.72 samples/sec   Loss 3.9258   LearningRate 0.0083   Epoch: 14   Global Step: 590590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:47,531-Speed 2626.09 samples/sec   Loss 3.9124   LearningRate 0.0083   Epoch: 14   Global Step: 590600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:51,451-Speed 2612.46 samples/sec   Loss 3.8811   LearningRate 0.0083   Epoch: 14   Global Step: 590610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:55,401-Speed 2593.58 samples/sec   Loss 3.8138   LearningRate 0.0083   Epoch: 14   Global Step: 590620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:07:59,300-Speed 2626.80 samples/sec   Loss 3.9577   LearningRate 0.0083   Epoch: 14   Global Step: 590630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:03,199-Speed 2627.28 samples/sec   Loss 3.9056   LearningRate 0.0083   Epoch: 14   Global Step: 590640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:07,102-Speed 2623.97 samples/sec   Loss 3.8545   LearningRate 0.0083   Epoch: 14   Global Step: 590650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:11,056-Speed 2590.40 samples/sec   Loss 3.8797   LearningRate 0.0083   Epoch: 14   Global Step: 590660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:14,956-Speed 2626.20 samples/sec   Loss 3.8783   LearningRate 0.0083   Epoch: 14   Global Step: 590670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:18,863-Speed 2622.45 samples/sec   Loss 3.8552   LearningRate 0.0083   Epoch: 14   Global Step: 590680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:08:22,738-Speed 2642.97 samples/sec   Loss 3.8282   LearningRate 0.0083   Epoch: 14   Global Step: 590690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:26,661-Speed 2611.13 samples/sec   Loss 3.8926   LearningRate 0.0083   Epoch: 14   Global Step: 590700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:30,564-Speed 2624.18 samples/sec   Loss 3.7824   LearningRate 0.0083   Epoch: 14   Global Step: 590710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:34,472-Speed 2620.91 samples/sec   Loss 3.9873   LearningRate 0.0083   Epoch: 14   Global Step: 590720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:38,380-Speed 2620.59 samples/sec   Loss 3.9561   LearningRate 0.0083   Epoch: 14   Global Step: 590730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:42,290-Speed 2620.27 samples/sec   Loss 3.8280   LearningRate 0.0083   Epoch: 14   Global Step: 590740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:46,207-Speed 2614.50 samples/sec   Loss 3.8493   LearningRate 0.0083   Epoch: 14   Global Step: 590750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:50,148-Speed 2599.14 samples/sec   Loss 3.8987   LearningRate 0.0083   Epoch: 14   Global Step: 590760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:54,051-Speed 2624.33 samples/sec   Loss 3.8166   LearningRate 0.0083   Epoch: 14   Global Step: 590770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:08:57,953-Speed 2625.16 samples/sec   Loss 3.9311   LearningRate 0.0083   Epoch: 14   Global Step: 590780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:01,877-Speed 2610.51 samples/sec   Loss 3.8764   LearningRate 0.0083   Epoch: 14   Global Step: 590790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:09:05,792-Speed 2616.41 samples/sec   Loss 4.0387   LearningRate 0.0083   Epoch: 14   Global Step: 590800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:09:09,686-Speed 2630.26 samples/sec   Loss 3.9248   LearningRate 0.0083   Epoch: 14   Global Step: 590810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:13,587-Speed 2625.74 samples/sec   Loss 3.8996   LearningRate 0.0083   Epoch: 14   Global Step: 590820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:17,496-Speed 2620.38 samples/sec   Loss 3.8857   LearningRate 0.0083   Epoch: 14   Global Step: 590830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:21,399-Speed 2624.59 samples/sec   Loss 3.9238   LearningRate 0.0083   Epoch: 14   Global Step: 590840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:25,307-Speed 2620.29 samples/sec   Loss 4.0085   LearningRate 0.0083   Epoch: 14   Global Step: 590850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:29,223-Speed 2615.95 samples/sec   Loss 3.9239   LearningRate 0.0083   Epoch: 14   Global Step: 590860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:33,129-Speed 2622.10 samples/sec   Loss 3.8786   LearningRate 0.0083   Epoch: 14   Global Step: 590870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:37,031-Speed 2625.12 samples/sec   Loss 3.9594   LearningRate 0.0083   Epoch: 14   Global Step: 590880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:09:40,961-Speed 2606.06 samples/sec   Loss 3.9399   LearningRate 0.0083   Epoch: 14   Global Step: 590890   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:09:44,952-Speed 2566.17 samples/sec   Loss 3.8624   LearningRate 0.0083   Epoch: 14   Global Step: 590900   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:09:48,879-Speed 2608.53 samples/sec   Loss 3.8636   LearningRate 0.0083   Epoch: 14   Global Step: 590910   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:09:52,797-Speed 2614.16 samples/sec   Loss 3.9253   LearningRate 0.0083   Epoch: 14   Global Step: 590920   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:09:56,714-Speed 2614.98 samples/sec   Loss 3.8664   LearningRate 0.0083   Epoch: 14   Global Step: 590930   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:10:00,636-Speed 2611.37 samples/sec   Loss 3.8719   LearningRate 0.0083   Epoch: 14   Global Step: 590940   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:10:04,572-Speed 2602.57 samples/sec   Loss 3.9009   LearningRate 0.0083   Epoch: 14   Global Step: 590950   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:10:08,474-Speed 2625.27 samples/sec   Loss 3.8922   LearningRate 0.0083   Epoch: 14   Global Step: 590960   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:10:12,422-Speed 2593.97 samples/sec   Loss 3.9607   LearningRate 0.0083   Epoch: 14   Global Step: 590970   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:10:16,335-Speed 2618.34 samples/sec   Loss 3.9383   LearningRate 0.0083   Epoch: 14   Global Step: 590980   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:10:20,233-Speed 2627.22 samples/sec   Loss 3.8764   LearningRate 0.0083   Epoch: 14   Global Step: 590990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:24,132-Speed 2627.39 samples/sec   Loss 3.9751   LearningRate 0.0083   Epoch: 14   Global Step: 591000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:28,037-Speed 2622.96 samples/sec   Loss 3.8872   LearningRate 0.0083   Epoch: 14   Global Step: 591010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:31,938-Speed 2626.10 samples/sec   Loss 3.8223   LearningRate 0.0083   Epoch: 14   Global Step: 591020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:35,838-Speed 2625.54 samples/sec   Loss 3.8633   LearningRate 0.0083   Epoch: 14   Global Step: 591030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:39,747-Speed 2620.41 samples/sec   Loss 3.9483   LearningRate 0.0083   Epoch: 14   Global Step: 591040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:43,654-Speed 2621.46 samples/sec   Loss 3.8945   LearningRate 0.0083   Epoch: 14   Global Step: 591050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:47,555-Speed 2626.07 samples/sec   Loss 3.9267   LearningRate 0.0083   Epoch: 14   Global Step: 591060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:51,455-Speed 2626.02 samples/sec   Loss 3.9535   LearningRate 0.0083   Epoch: 14   Global Step: 591070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:55,375-Speed 2613.30 samples/sec   Loss 3.9106   LearningRate 0.0083   Epoch: 14   Global Step: 591080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:10:59,298-Speed 2610.89 samples/sec   Loss 3.8795   LearningRate 0.0083   Epoch: 14   Global Step: 591090   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:03,249-Speed 2593.02 samples/sec   Loss 4.0020   LearningRate 0.0083   Epoch: 14   Global Step: 591100   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:07,153-Speed 2623.62 samples/sec   Loss 4.0016   LearningRate 0.0083   Epoch: 14   Global Step: 591110   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:11,062-Speed 2620.25 samples/sec   Loss 3.8323   LearningRate 0.0083   Epoch: 14   Global Step: 591120   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:14,976-Speed 2616.39 samples/sec   Loss 3.9140   LearningRate 0.0083   Epoch: 14   Global Step: 591130   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:18,891-Speed 2616.43 samples/sec   Loss 3.9326   LearningRate 0.0083   Epoch: 14   Global Step: 591140   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:22,799-Speed 2621.21 samples/sec   Loss 3.9253   LearningRate 0.0083   Epoch: 14   Global Step: 591150   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:26,704-Speed 2623.16 samples/sec   Loss 3.8629   LearningRate 0.0083   Epoch: 14   Global Step: 591160   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:30,607-Speed 2624.46 samples/sec   Loss 3.9243   LearningRate 0.0083   Epoch: 14   Global Step: 591170   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:34,508-Speed 2625.53 samples/sec   Loss 3.9266   LearningRate 0.0083   Epoch: 14   Global Step: 591180   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:11:38,419-Speed 2618.98 samples/sec   Loss 3.9184   LearningRate 0.0083   Epoch: 14   Global Step: 591190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:11:42,322-Speed 2624.04 samples/sec   Loss 3.9207   LearningRate 0.0083   Epoch: 14   Global Step: 591200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:11:46,223-Speed 2625.37 samples/sec   Loss 3.8804   LearningRate 0.0083   Epoch: 14   Global Step: 591210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:11:50,125-Speed 2624.89 samples/sec   Loss 3.8705   LearningRate 0.0083   Epoch: 14   Global Step: 591220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:11:54,031-Speed 2622.72 samples/sec   Loss 3.7717   LearningRate 0.0083   Epoch: 14   Global Step: 591230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:11:57,934-Speed 2624.01 samples/sec   Loss 3.8737   LearningRate 0.0083   Epoch: 14   Global Step: 591240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:01,839-Speed 2623.04 samples/sec   Loss 3.8771   LearningRate 0.0083   Epoch: 14   Global Step: 591250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:05,749-Speed 2619.65 samples/sec   Loss 3.9088   LearningRate 0.0083   Epoch: 14   Global Step: 591260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:09,660-Speed 2618.25 samples/sec   Loss 3.9004   LearningRate 0.0083   Epoch: 14   Global Step: 591270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:13,571-Speed 2619.19 samples/sec   Loss 3.8395   LearningRate 0.0083   Epoch: 14   Global Step: 591280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:17,493-Speed 2611.64 samples/sec   Loss 3.8944   LearningRate 0.0083   Epoch: 14   Global Step: 591290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:12:21,373-Speed 2639.74 samples/sec   Loss 3.7878   LearningRate 0.0083   Epoch: 14   Global Step: 591300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:25,281-Speed 2620.80 samples/sec   Loss 3.8115   LearningRate 0.0082   Epoch: 14   Global Step: 591310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:29,185-Speed 2624.12 samples/sec   Loss 3.8848   LearningRate 0.0082   Epoch: 14   Global Step: 591320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:33,094-Speed 2621.09 samples/sec   Loss 3.8915   LearningRate 0.0082   Epoch: 14   Global Step: 591330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:37,021-Speed 2608.09 samples/sec   Loss 3.8759   LearningRate 0.0082   Epoch: 14   Global Step: 591340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:40,922-Speed 2625.00 samples/sec   Loss 3.8592   LearningRate 0.0082   Epoch: 14   Global Step: 591350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:44,826-Speed 2624.20 samples/sec   Loss 3.9238   LearningRate 0.0082   Epoch: 14   Global Step: 591360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:48,736-Speed 2619.38 samples/sec   Loss 3.9541   LearningRate 0.0082   Epoch: 14   Global Step: 591370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:52,678-Speed 2598.52 samples/sec   Loss 3.9288   LearningRate 0.0082   Epoch: 14   Global Step: 591380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:12:56,582-Speed 2623.49 samples/sec   Loss 3.9021   LearningRate 0.0082   Epoch: 14   Global Step: 591390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:00,501-Speed 2613.42 samples/sec   Loss 3.7805   LearningRate 0.0082   Epoch: 14   Global Step: 591400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:13:04,411-Speed 2619.46 samples/sec   Loss 3.9498   LearningRate 0.0082   Epoch: 14   Global Step: 591410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:13:08,318-Speed 2622.78 samples/sec   Loss 3.8614   LearningRate 0.0082   Epoch: 14   Global Step: 591420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:13:12,225-Speed 2621.16 samples/sec   Loss 3.8248   LearningRate 0.0082   Epoch: 14   Global Step: 591430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:16,137-Speed 2618.28 samples/sec   Loss 3.8641   LearningRate 0.0082   Epoch: 14   Global Step: 591440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:20,056-Speed 2613.59 samples/sec   Loss 3.8655   LearningRate 0.0082   Epoch: 14   Global Step: 591450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:23,972-Speed 2615.89 samples/sec   Loss 3.9003   LearningRate 0.0082   Epoch: 14   Global Step: 591460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:27,940-Speed 2582.02 samples/sec   Loss 3.8209   LearningRate 0.0082   Epoch: 14   Global Step: 591470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:31,843-Speed 2624.13 samples/sec   Loss 3.9155   LearningRate 0.0082   Epoch: 14   Global Step: 591480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:35,752-Speed 2619.92 samples/sec   Loss 3.8645   LearningRate 0.0082   Epoch: 14   Global Step: 591490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:39,656-Speed 2623.45 samples/sec   Loss 3.9124   LearningRate 0.0082   Epoch: 14   Global Step: 591500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:43,556-Speed 2627.30 samples/sec   Loss 3.9378   LearningRate 0.0082   Epoch: 14   Global Step: 591510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:47,457-Speed 2625.26 samples/sec   Loss 3.8889   LearningRate 0.0082   Epoch: 14   Global Step: 591520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:13:51,380-Speed 2612.13 samples/sec   Loss 3.8354   LearningRate 0.0082   Epoch: 14   Global Step: 591530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:13:55,286-Speed 2622.30 samples/sec   Loss 3.8606   LearningRate 0.0082   Epoch: 14   Global Step: 591540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:13:59,201-Speed 2615.99 samples/sec   Loss 3.9040   LearningRate 0.0082   Epoch: 14   Global Step: 591550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:14:03,110-Speed 2620.91 samples/sec   Loss 3.7993   LearningRate 0.0082   Epoch: 14   Global Step: 591560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:14:07,024-Speed 2616.28 samples/sec   Loss 3.8804   LearningRate 0.0082   Epoch: 14   Global Step: 591570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:14:10,929-Speed 2623.11 samples/sec   Loss 3.9443   LearningRate 0.0082   Epoch: 14   Global Step: 591580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:14:14,810-Speed 2639.31 samples/sec   Loss 3.8966   LearningRate 0.0082   Epoch: 14   Global Step: 591590   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:18,723-Speed 2617.87 samples/sec   Loss 3.8578   LearningRate 0.0082   Epoch: 14   Global Step: 591600   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:22,631-Speed 2620.88 samples/sec   Loss 3.8381   LearningRate 0.0082   Epoch: 14   Global Step: 591610   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:26,533-Speed 2624.94 samples/sec   Loss 3.9634   LearningRate 0.0082   Epoch: 14   Global Step: 591620   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:30,451-Speed 2614.05 samples/sec   Loss 3.8174   LearningRate 0.0082   Epoch: 14   Global Step: 591630   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:34,357-Speed 2622.51 samples/sec   Loss 3.8868   LearningRate 0.0082   Epoch: 14   Global Step: 591640   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:38,267-Speed 2619.37 samples/sec   Loss 3.9083   LearningRate 0.0082   Epoch: 14   Global Step: 591650   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:42,201-Speed 2604.38 samples/sec   Loss 3.8717   LearningRate 0.0082   Epoch: 14   Global Step: 591660   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:46,109-Speed 2620.24 samples/sec   Loss 3.9099   LearningRate 0.0082   Epoch: 14   Global Step: 591670   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:50,015-Speed 2622.84 samples/sec   Loss 3.8622   LearningRate 0.0082   Epoch: 14   Global Step: 591680   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:14:53,921-Speed 2621.39 samples/sec   Loss 3.9165   LearningRate 0.0082   Epoch: 14   Global Step: 591690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:14:57,846-Speed 2610.00 samples/sec   Loss 3.8834   LearningRate 0.0082   Epoch: 14   Global Step: 591700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:15:01,783-Speed 2601.68 samples/sec   Loss 3.8850   LearningRate 0.0082   Epoch: 14   Global Step: 591710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:15:05,691-Speed 2620.25 samples/sec   Loss 3.9272   LearningRate 0.0082   Epoch: 14   Global Step: 591720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:15:09,596-Speed 2623.57 samples/sec   Loss 3.8541   LearningRate 0.0082   Epoch: 14   Global Step: 591730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:15:13,515-Speed 2613.52 samples/sec   Loss 3.9313   LearningRate 0.0082   Epoch: 14   Global Step: 591740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:15:17,422-Speed 2621.60 samples/sec   Loss 3.9394   LearningRate 0.0082   Epoch: 14   Global Step: 591750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:15:21,365-Speed 2597.79 samples/sec   Loss 3.8661   LearningRate 0.0082   Epoch: 14   Global Step: 591760   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:25,275-Speed 2620.21 samples/sec   Loss 3.9449   LearningRate 0.0082   Epoch: 14   Global Step: 591770   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:29,182-Speed 2621.27 samples/sec   Loss 3.9506   LearningRate 0.0082   Epoch: 14   Global Step: 591780   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:33,097-Speed 2616.50 samples/sec   Loss 3.9287   LearningRate 0.0082   Epoch: 14   Global Step: 591790   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:37,000-Speed 2624.32 samples/sec   Loss 3.8492   LearningRate 0.0082   Epoch: 14   Global Step: 591800   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:40,905-Speed 2623.03 samples/sec   Loss 3.8192   LearningRate 0.0082   Epoch: 14   Global Step: 591810   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:44,807-Speed 2624.89 samples/sec   Loss 3.8810   LearningRate 0.0082   Epoch: 14   Global Step: 591820   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:48,712-Speed 2622.90 samples/sec   Loss 3.9390   LearningRate 0.0082   Epoch: 14   Global Step: 591830   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:52,613-Speed 2625.41 samples/sec   Loss 3.8681   LearningRate 0.0082   Epoch: 14   Global Step: 591840   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:15:56,522-Speed 2625.78 samples/sec   Loss 4.0166   LearningRate 0.0082   Epoch: 14   Global Step: 591850   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:16:00,422-Speed 2626.18 samples/sec   Loss 3.8942   LearningRate 0.0082   Epoch: 14   Global Step: 591860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:04,325-Speed 2624.00 samples/sec   Loss 3.9335   LearningRate 0.0082   Epoch: 14   Global Step: 591870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:08,229-Speed 2623.17 samples/sec   Loss 3.9317   LearningRate 0.0082   Epoch: 14   Global Step: 591880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:12,144-Speed 2617.23 samples/sec   Loss 3.8904   LearningRate 0.0082   Epoch: 14   Global Step: 591890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:16,049-Speed 2622.73 samples/sec   Loss 3.8624   LearningRate 0.0082   Epoch: 14   Global Step: 591900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:19,966-Speed 2615.59 samples/sec   Loss 3.8080   LearningRate 0.0082   Epoch: 14   Global Step: 591910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:23,875-Speed 2620.31 samples/sec   Loss 3.8697   LearningRate 0.0082   Epoch: 14   Global Step: 591920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:27,796-Speed 2612.31 samples/sec   Loss 3.8411   LearningRate 0.0082   Epoch: 14   Global Step: 591930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:31,702-Speed 2622.54 samples/sec   Loss 3.8825   LearningRate 0.0082   Epoch: 14   Global Step: 591940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:35,610-Speed 2620.71 samples/sec   Loss 3.8784   LearningRate 0.0082   Epoch: 14   Global Step: 591950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:39,518-Speed 2621.22 samples/sec   Loss 3.8774   LearningRate 0.0082   Epoch: 14   Global Step: 591960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:16:43,414-Speed 2628.77 samples/sec   Loss 3.8980   LearningRate 0.0082   Epoch: 14   Global Step: 591970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:47,314-Speed 2626.38 samples/sec   Loss 3.9107   LearningRate 0.0082   Epoch: 14   Global Step: 591980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:51,225-Speed 2619.28 samples/sec   Loss 3.8715   LearningRate 0.0082   Epoch: 14   Global Step: 591990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:55,145-Speed 2612.62 samples/sec   Loss 3.9232   LearningRate 0.0082   Epoch: 14   Global Step: 592000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:16:59,048-Speed 2624.69 samples/sec   Loss 3.8515   LearningRate 0.0082   Epoch: 14   Global Step: 592010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:02,957-Speed 2620.38 samples/sec   Loss 3.9154   LearningRate 0.0082   Epoch: 14   Global Step: 592020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:06,862-Speed 2622.66 samples/sec   Loss 3.8684   LearningRate 0.0082   Epoch: 14   Global Step: 592030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:10,761-Speed 2627.00 samples/sec   Loss 3.8681   LearningRate 0.0082   Epoch: 14   Global Step: 592040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:14,666-Speed 2622.72 samples/sec   Loss 3.9368   LearningRate 0.0082   Epoch: 14   Global Step: 592050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:18,577-Speed 2619.19 samples/sec   Loss 3.8782   LearningRate 0.0082   Epoch: 14   Global Step: 592060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:22,483-Speed 2622.30 samples/sec   Loss 3.9225   LearningRate 0.0082   Epoch: 14   Global Step: 592070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:17:26,387-Speed 2623.80 samples/sec   Loss 3.8422   LearningRate 0.0082   Epoch: 14   Global Step: 592080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:17:30,277-Speed 2632.32 samples/sec   Loss 3.8555   LearningRate 0.0082   Epoch: 14   Global Step: 592090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:34,187-Speed 2620.48 samples/sec   Loss 3.8983   LearningRate 0.0082   Epoch: 14   Global Step: 592100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:38,098-Speed 2618.45 samples/sec   Loss 3.9031   LearningRate 0.0082   Epoch: 14   Global Step: 592110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:42,009-Speed 2618.95 samples/sec   Loss 3.9324   LearningRate 0.0082   Epoch: 14   Global Step: 592120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:45,921-Speed 2618.36 samples/sec   Loss 3.8480   LearningRate 0.0082   Epoch: 14   Global Step: 592130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:17:49,848-Speed 2608.36 samples/sec   Loss 3.9288   LearningRate 0.0082   Epoch: 14   Global Step: 592140   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:17:53,750-Speed 2626.58 samples/sec   Loss 3.8521   LearningRate 0.0082   Epoch: 14   Global Step: 592150   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:17:57,654-Speed 2623.28 samples/sec   Loss 3.8650   LearningRate 0.0082   Epoch: 14   Global Step: 592160   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:01,557-Speed 2624.93 samples/sec   Loss 3.9912   LearningRate 0.0082   Epoch: 14   Global Step: 592170   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:05,462-Speed 2622.39 samples/sec   Loss 3.8495   LearningRate 0.0082   Epoch: 14   Global Step: 592180   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:09,518-Speed 2525.42 samples/sec   Loss 3.7308   LearningRate 0.0082   Epoch: 14   Global Step: 592190   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:13,554-Speed 2537.60 samples/sec   Loss 3.9049   LearningRate 0.0082   Epoch: 14   Global Step: 592200   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:17,453-Speed 2627.58 samples/sec   Loss 3.8768   LearningRate 0.0082   Epoch: 14   Global Step: 592210   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:21,374-Speed 2612.38 samples/sec   Loss 3.9497   LearningRate 0.0082   Epoch: 14   Global Step: 592220   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:25,309-Speed 2603.83 samples/sec   Loss 3.8471   LearningRate 0.0082   Epoch: 14   Global Step: 592230   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:18:29,212-Speed 2625.20 samples/sec   Loss 3.8590   LearningRate 0.0082   Epoch: 14   Global Step: 592240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:18:33,117-Speed 2622.72 samples/sec   Loss 3.8570   LearningRate 0.0082   Epoch: 14   Global Step: 592250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:18:37,087-Speed 2579.78 samples/sec   Loss 3.8938   LearningRate 0.0082   Epoch: 14   Global Step: 592260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:18:41,112-Speed 2544.69 samples/sec   Loss 3.8998   LearningRate 0.0082   Epoch: 14   Global Step: 592270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:18:45,223-Speed 2491.77 samples/sec   Loss 3.8846   LearningRate 0.0082   Epoch: 14   Global Step: 592280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:18:49,129-Speed 2622.10 samples/sec   Loss 3.8568   LearningRate 0.0082   Epoch: 14   Global Step: 592290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:18:53,050-Speed 2612.89 samples/sec   Loss 3.8791   LearningRate 0.0082   Epoch: 14   Global Step: 592300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:18:56,953-Speed 2624.46 samples/sec   Loss 3.8521   LearningRate 0.0082   Epoch: 14   Global Step: 592310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:00,865-Speed 2618.19 samples/sec   Loss 3.9079   LearningRate 0.0082   Epoch: 14   Global Step: 592320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:04,773-Speed 2620.84 samples/sec   Loss 3.9580   LearningRate 0.0082   Epoch: 14   Global Step: 592330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:08,693-Speed 2613.17 samples/sec   Loss 3.8550   LearningRate 0.0082   Epoch: 14   Global Step: 592340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:19:12,580-Speed 2635.45 samples/sec   Loss 3.8253   LearningRate 0.0082   Epoch: 14   Global Step: 592350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:16,503-Speed 2610.62 samples/sec   Loss 3.8485   LearningRate 0.0082   Epoch: 14   Global Step: 592360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:20,436-Speed 2604.73 samples/sec   Loss 3.9784   LearningRate 0.0082   Epoch: 14   Global Step: 592370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:24,354-Speed 2614.25 samples/sec   Loss 3.8434   LearningRate 0.0082   Epoch: 14   Global Step: 592380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:28,293-Speed 2600.47 samples/sec   Loss 3.8820   LearningRate 0.0082   Epoch: 14   Global Step: 592390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:32,205-Speed 2618.08 samples/sec   Loss 3.9054   LearningRate 0.0082   Epoch: 14   Global Step: 592400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:36,206-Speed 2560.16 samples/sec   Loss 3.9838   LearningRate 0.0082   Epoch: 14   Global Step: 592410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:40,294-Speed 2504.98 samples/sec   Loss 3.8935   LearningRate 0.0082   Epoch: 14   Global Step: 592420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:44,355-Speed 2522.62 samples/sec   Loss 3.8440   LearningRate 0.0082   Epoch: 14   Global Step: 592430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:48,303-Speed 2594.90 samples/sec   Loss 3.9401   LearningRate 0.0082   Epoch: 14   Global Step: 592440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:52,227-Speed 2610.19 samples/sec   Loss 3.8770   LearningRate 0.0082   Epoch: 14   Global Step: 592450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:19:56,140-Speed 2617.56 samples/sec   Loss 3.8813   LearningRate 0.0082   Epoch: 14   Global Step: 592460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:00,051-Speed 2619.74 samples/sec   Loss 3.9091   LearningRate 0.0082   Epoch: 14   Global Step: 592470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:03,997-Speed 2595.45 samples/sec   Loss 3.8441   LearningRate 0.0082   Epoch: 14   Global Step: 592480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:07,901-Speed 2623.84 samples/sec   Loss 3.9113   LearningRate 0.0082   Epoch: 14   Global Step: 592490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:11,805-Speed 2623.52 samples/sec   Loss 3.8857   LearningRate 0.0082   Epoch: 14   Global Step: 592500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:15,718-Speed 2617.41 samples/sec   Loss 3.8926   LearningRate 0.0082   Epoch: 14   Global Step: 592510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:19,619-Speed 2625.83 samples/sec   Loss 3.9113   LearningRate 0.0082   Epoch: 14   Global Step: 592520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:23,534-Speed 2616.58 samples/sec   Loss 3.9255   LearningRate 0.0082   Epoch: 14   Global Step: 592530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:27,435-Speed 2626.18 samples/sec   Loss 3.8833   LearningRate 0.0082   Epoch: 14   Global Step: 592540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:31,336-Speed 2625.49 samples/sec   Loss 3.8831   LearningRate 0.0082   Epoch: 14   Global Step: 592550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:20:35,236-Speed 2625.84 samples/sec   Loss 3.8439   LearningRate 0.0082   Epoch: 14   Global Step: 592560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:20:39,117-Speed 2639.12 samples/sec   Loss 3.7735   LearningRate 0.0082   Epoch: 14   Global Step: 592570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:43,018-Speed 2626.26 samples/sec   Loss 3.8265   LearningRate 0.0082   Epoch: 14   Global Step: 592580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:46,919-Speed 2625.68 samples/sec   Loss 3.9147   LearningRate 0.0082   Epoch: 14   Global Step: 592590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:50,822-Speed 2624.55 samples/sec   Loss 3.9729   LearningRate 0.0082   Epoch: 14   Global Step: 592600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:54,721-Speed 2626.89 samples/sec   Loss 3.8361   LearningRate 0.0082   Epoch: 14   Global Step: 592610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:20:58,637-Speed 2615.57 samples/sec   Loss 3.7973   LearningRate 0.0082   Epoch: 14   Global Step: 592620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:02,540-Speed 2624.36 samples/sec   Loss 3.9045   LearningRate 0.0082   Epoch: 14   Global Step: 592630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:06,455-Speed 2615.97 samples/sec   Loss 3.8618   LearningRate 0.0082   Epoch: 14   Global Step: 592640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:10,363-Speed 2620.80 samples/sec   Loss 3.9417   LearningRate 0.0082   Epoch: 14   Global Step: 592650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:14,266-Speed 2624.45 samples/sec   Loss 3.8779   LearningRate 0.0082   Epoch: 14   Global Step: 592660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:18,148-Speed 2638.71 samples/sec   Loss 3.9333   LearningRate 0.0082   Epoch: 14   Global Step: 592670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:22,090-Speed 2598.14 samples/sec   Loss 3.8024   LearningRate 0.0082   Epoch: 14   Global Step: 592680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:25,989-Speed 2626.74 samples/sec   Loss 3.8216   LearningRate 0.0082   Epoch: 14   Global Step: 592690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:29,892-Speed 2624.78 samples/sec   Loss 3.8469   LearningRate 0.0082   Epoch: 14   Global Step: 592700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:33,787-Speed 2629.37 samples/sec   Loss 3.8706   LearningRate 0.0082   Epoch: 14   Global Step: 592710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:37,685-Speed 2627.67 samples/sec   Loss 3.9072   LearningRate 0.0082   Epoch: 14   Global Step: 592720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:41,585-Speed 2626.26 samples/sec   Loss 3.8791   LearningRate 0.0082   Epoch: 14   Global Step: 592730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:45,500-Speed 2616.05 samples/sec   Loss 3.9155   LearningRate 0.0082   Epoch: 14   Global Step: 592740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:49,398-Speed 2627.65 samples/sec   Loss 3.8704   LearningRate 0.0081   Epoch: 14   Global Step: 592750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:53,300-Speed 2625.10 samples/sec   Loss 3.8468   LearningRate 0.0081   Epoch: 14   Global Step: 592760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:21:57,222-Speed 2612.28 samples/sec   Loss 3.9417   LearningRate 0.0081   Epoch: 14   Global Step: 592770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:22:01,101-Speed 2639.90 samples/sec   Loss 3.8197   LearningRate 0.0081   Epoch: 14   Global Step: 592780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:22:05,004-Speed 2624.42 samples/sec   Loss 3.9092   LearningRate 0.0081   Epoch: 14   Global Step: 592790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:22:08,916-Speed 2617.92 samples/sec   Loss 3.9002   LearningRate 0.0081   Epoch: 14   Global Step: 592800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:22:12,819-Speed 2624.41 samples/sec   Loss 3.8503   LearningRate 0.0081   Epoch: 14   Global Step: 592810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:22:16,720-Speed 2625.26 samples/sec   Loss 3.9338   LearningRate 0.0081   Epoch: 14   Global Step: 592820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:22:20,606-Speed 2636.28 samples/sec   Loss 3.8466   LearningRate 0.0081   Epoch: 14   Global Step: 592830   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:24,532-Speed 2609.29 samples/sec   Loss 3.8769   LearningRate 0.0081   Epoch: 14   Global Step: 592840   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:28,438-Speed 2622.13 samples/sec   Loss 3.9195   LearningRate 0.0081   Epoch: 14   Global Step: 592850   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:32,356-Speed 2614.71 samples/sec   Loss 3.7788   LearningRate 0.0081   Epoch: 14   Global Step: 592860   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:36,254-Speed 2627.54 samples/sec   Loss 3.8631   LearningRate 0.0081   Epoch: 14   Global Step: 592870   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:40,155-Speed 2625.83 samples/sec   Loss 3.8324   LearningRate 0.0081   Epoch: 14   Global Step: 592880   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:44,058-Speed 2624.33 samples/sec   Loss 3.7803   LearningRate 0.0081   Epoch: 14   Global Step: 592890   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:47,963-Speed 2623.37 samples/sec   Loss 3.7936   LearningRate 0.0081   Epoch: 14   Global Step: 592900   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:51,901-Speed 2601.06 samples/sec   Loss 3.8724   LearningRate 0.0081   Epoch: 14   Global Step: 592910   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:55,800-Speed 2627.29 samples/sec   Loss 3.8765   LearningRate 0.0081   Epoch: 14   Global Step: 592920   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:22:59,751-Speed 2592.23 samples/sec   Loss 3.8768   LearningRate 0.0081   Epoch: 14   Global Step: 592930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:03,650-Speed 2627.96 samples/sec   Loss 3.7592   LearningRate 0.0081   Epoch: 14   Global Step: 592940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:07,549-Speed 2627.00 samples/sec   Loss 3.8037   LearningRate 0.0081   Epoch: 14   Global Step: 592950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:11,463-Speed 2616.47 samples/sec   Loss 3.8436   LearningRate 0.0081   Epoch: 14   Global Step: 592960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:15,369-Speed 2622.53 samples/sec   Loss 3.8840   LearningRate 0.0081   Epoch: 14   Global Step: 592970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:19,297-Speed 2607.96 samples/sec   Loss 3.8906   LearningRate 0.0081   Epoch: 14   Global Step: 592980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:23,222-Speed 2609.67 samples/sec   Loss 3.9245   LearningRate 0.0081   Epoch: 14   Global Step: 592990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:27,127-Speed 2623.16 samples/sec   Loss 3.8492   LearningRate 0.0081   Epoch: 14   Global Step: 593000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:31,050-Speed 2610.99 samples/sec   Loss 3.9379   LearningRate 0.0081   Epoch: 14   Global Step: 593010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:34,956-Speed 2622.59 samples/sec   Loss 3.8131   LearningRate 0.0081   Epoch: 14   Global Step: 593020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:38,835-Speed 2640.33 samples/sec   Loss 3.8724   LearningRate 0.0081   Epoch: 14   Global Step: 593030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:23:42,714-Speed 2640.56 samples/sec   Loss 3.9244   LearningRate 0.0081   Epoch: 14   Global Step: 593040   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:23:46,626-Speed 2617.72 samples/sec   Loss 3.8363   LearningRate 0.0081   Epoch: 14   Global Step: 593050   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:23:50,541-Speed 2617.54 samples/sec   Loss 3.8572   LearningRate 0.0081   Epoch: 14   Global Step: 593060   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:23:54,449-Speed 2621.18 samples/sec   Loss 3.8335   LearningRate 0.0081   Epoch: 14   Global Step: 593070   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:23:58,377-Speed 2607.21 samples/sec   Loss 3.7622   LearningRate 0.0081   Epoch: 14   Global Step: 593080   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:24:02,277-Speed 2627.11 samples/sec   Loss 3.9457   LearningRate 0.0081   Epoch: 14   Global Step: 593090   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:24:06,221-Speed 2596.69 samples/sec   Loss 3.8308   LearningRate 0.0081   Epoch: 14   Global Step: 593100   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:24:10,123-Speed 2624.88 samples/sec   Loss 3.8108   LearningRate 0.0081   Epoch: 14   Global Step: 593110   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:24:14,040-Speed 2614.81 samples/sec   Loss 3.8912   LearningRate 0.0081   Epoch: 14   Global Step: 593120   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:24:18,001-Speed 2585.86 samples/sec   Loss 3.8191   LearningRate 0.0081   Epoch: 14   Global Step: 593130   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-15 14:24:21,973-Speed 2578.97 samples/sec   Loss 3.8929   LearningRate 0.0081   Epoch: 14   Global Step: 593140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:26,019-Speed 2531.94 samples/sec   Loss 3.9501   LearningRate 0.0081   Epoch: 14   Global Step: 593150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:29,926-Speed 2621.87 samples/sec   Loss 3.8228   LearningRate 0.0081   Epoch: 14   Global Step: 593160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:33,839-Speed 2617.65 samples/sec   Loss 3.8382   LearningRate 0.0081   Epoch: 14   Global Step: 593170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:37,754-Speed 2615.57 samples/sec   Loss 3.8726   LearningRate 0.0081   Epoch: 14   Global Step: 593180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:41,659-Speed 2623.41 samples/sec   Loss 3.9228   LearningRate 0.0081   Epoch: 14   Global Step: 593190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:45,578-Speed 2615.99 samples/sec   Loss 3.8852   LearningRate 0.0081   Epoch: 14   Global Step: 593200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:49,504-Speed 2609.06 samples/sec   Loss 3.8839   LearningRate 0.0081   Epoch: 14   Global Step: 593210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:53,411-Speed 2622.12 samples/sec   Loss 3.7832   LearningRate 0.0081   Epoch: 14   Global Step: 593220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:24:57,315-Speed 2623.25 samples/sec   Loss 3.8445   LearningRate 0.0081   Epoch: 14   Global Step: 593230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:01,243-Speed 2607.49 samples/sec   Loss 3.9037   LearningRate 0.0081   Epoch: 14   Global Step: 593240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:25:05,143-Speed 2626.78 samples/sec   Loss 3.8721   LearningRate 0.0081   Epoch: 14   Global Step: 593250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:09,044-Speed 2625.61 samples/sec   Loss 3.8853   LearningRate 0.0081   Epoch: 14   Global Step: 593260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:12,940-Speed 2628.82 samples/sec   Loss 3.8567   LearningRate 0.0081   Epoch: 14   Global Step: 593270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:16,846-Speed 2621.79 samples/sec   Loss 3.8667   LearningRate 0.0081   Epoch: 14   Global Step: 593280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:20,746-Speed 2626.79 samples/sec   Loss 3.8140   LearningRate 0.0081   Epoch: 14   Global Step: 593290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:24,648-Speed 2624.86 samples/sec   Loss 3.9334   LearningRate 0.0081   Epoch: 14   Global Step: 593300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:28,547-Speed 2627.47 samples/sec   Loss 3.8831   LearningRate 0.0081   Epoch: 14   Global Step: 593310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:32,450-Speed 2624.36 samples/sec   Loss 3.9068   LearningRate 0.0081   Epoch: 14   Global Step: 593320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:36,352-Speed 2624.58 samples/sec   Loss 3.8414   LearningRate 0.0081   Epoch: 14   Global Step: 593330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:40,253-Speed 2625.33 samples/sec   Loss 3.9708   LearningRate 0.0081   Epoch: 14   Global Step: 593340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:44,153-Speed 2626.51 samples/sec   Loss 3.8616   LearningRate 0.0081   Epoch: 14   Global Step: 593350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:25:48,034-Speed 2639.24 samples/sec   Loss 3.8465   LearningRate 0.0081   Epoch: 14   Global Step: 593360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:51,937-Speed 2624.54 samples/sec   Loss 3.8722   LearningRate 0.0081   Epoch: 14   Global Step: 593370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:55,837-Speed 2625.62 samples/sec   Loss 3.8508   LearningRate 0.0081   Epoch: 14   Global Step: 593380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:25:59,743-Speed 2622.92 samples/sec   Loss 3.8375   LearningRate 0.0081   Epoch: 14   Global Step: 593390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:03,642-Speed 2627.19 samples/sec   Loss 3.7904   LearningRate 0.0081   Epoch: 14   Global Step: 593400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:07,539-Speed 2628.25 samples/sec   Loss 3.8319   LearningRate 0.0081   Epoch: 14   Global Step: 593410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:11,472-Speed 2604.06 samples/sec   Loss 3.8330   LearningRate 0.0081   Epoch: 14   Global Step: 593420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:15,379-Speed 2621.90 samples/sec   Loss 3.7880   LearningRate 0.0081   Epoch: 14   Global Step: 593430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:19,280-Speed 2626.12 samples/sec   Loss 3.8335   LearningRate 0.0081   Epoch: 14   Global Step: 593440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:23,179-Speed 2626.63 samples/sec   Loss 3.8685   LearningRate 0.0081   Epoch: 14   Global Step: 593450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:27,081-Speed 2625.71 samples/sec   Loss 3.7804   LearningRate 0.0081   Epoch: 14   Global Step: 593460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:26:31,005-Speed 2610.00 samples/sec   Loss 3.8702   LearningRate 0.0081   Epoch: 14   Global Step: 593470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:26:34,923-Speed 2613.65 samples/sec   Loss 3.9210   LearningRate 0.0081   Epoch: 14   Global Step: 593480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:38,890-Speed 2582.19 samples/sec   Loss 3.8787   LearningRate 0.0081   Epoch: 14   Global Step: 593490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:42,790-Speed 2626.77 samples/sec   Loss 3.8652   LearningRate 0.0081   Epoch: 14   Global Step: 593500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:46,690-Speed 2625.90 samples/sec   Loss 3.9208   LearningRate 0.0081   Epoch: 14   Global Step: 593510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:50,606-Speed 2615.74 samples/sec   Loss 3.8214   LearningRate 0.0081   Epoch: 14   Global Step: 593520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:54,509-Speed 2624.38 samples/sec   Loss 3.7317   LearningRate 0.0081   Epoch: 14   Global Step: 593530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:26:58,409-Speed 2626.40 samples/sec   Loss 3.7730   LearningRate 0.0081   Epoch: 14   Global Step: 593540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:02,306-Speed 2628.70 samples/sec   Loss 3.7968   LearningRate 0.0081   Epoch: 14   Global Step: 593550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:06,207-Speed 2624.83 samples/sec   Loss 3.8304   LearningRate 0.0081   Epoch: 14   Global Step: 593560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:10,110-Speed 2624.30 samples/sec   Loss 3.8716   LearningRate 0.0081   Epoch: 14   Global Step: 593570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:14,014-Speed 2623.63 samples/sec   Loss 3.9849   LearningRate 0.0081   Epoch: 14   Global Step: 593580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:27:17,900-Speed 2635.84 samples/sec   Loss 3.7787   LearningRate 0.0081   Epoch: 14   Global Step: 593590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:21,828-Speed 2607.75 samples/sec   Loss 3.8488   LearningRate 0.0081   Epoch: 14   Global Step: 593600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:25,724-Speed 2629.23 samples/sec   Loss 3.7896   LearningRate 0.0081   Epoch: 14   Global Step: 593610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:29,628-Speed 2623.69 samples/sec   Loss 3.7917   LearningRate 0.0081   Epoch: 14   Global Step: 593620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:33,553-Speed 2609.40 samples/sec   Loss 3.8580   LearningRate 0.0081   Epoch: 14   Global Step: 593630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:37,448-Speed 2630.09 samples/sec   Loss 3.7436   LearningRate 0.0081   Epoch: 14   Global Step: 593640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:41,355-Speed 2621.18 samples/sec   Loss 3.9324   LearningRate 0.0081   Epoch: 14   Global Step: 593650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:45,257-Speed 2626.05 samples/sec   Loss 3.9152   LearningRate 0.0081   Epoch: 14   Global Step: 593660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:49,249-Speed 2566.05 samples/sec   Loss 3.8359   LearningRate 0.0081   Epoch: 14   Global Step: 593670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:53,181-Speed 2604.80 samples/sec   Loss 3.8392   LearningRate 0.0081   Epoch: 14   Global Step: 593680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:27:57,086-Speed 2623.14 samples/sec   Loss 3.8554   LearningRate 0.0081   Epoch: 14   Global Step: 593690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:28:01,009-Speed 2611.55 samples/sec   Loss 3.8374   LearningRate 0.0081   Epoch: 14   Global Step: 593700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:28:04,895-Speed 2635.30 samples/sec   Loss 3.9218   LearningRate 0.0081   Epoch: 14   Global Step: 593710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:08,793-Speed 2627.57 samples/sec   Loss 3.8069   LearningRate 0.0081   Epoch: 14   Global Step: 593720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:12,754-Speed 2586.23 samples/sec   Loss 3.9235   LearningRate 0.0081   Epoch: 14   Global Step: 593730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:16,648-Speed 2630.01 samples/sec   Loss 3.8870   LearningRate 0.0081   Epoch: 14   Global Step: 593740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:20,583-Speed 2603.07 samples/sec   Loss 3.8385   LearningRate 0.0081   Epoch: 14   Global Step: 593750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:24,522-Speed 2600.98 samples/sec   Loss 3.8427   LearningRate 0.0081   Epoch: 14   Global Step: 593760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:28,421-Speed 2626.45 samples/sec   Loss 3.8115   LearningRate 0.0081   Epoch: 14   Global Step: 593770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:32,351-Speed 2606.45 samples/sec   Loss 3.8567   LearningRate 0.0081   Epoch: 14   Global Step: 593780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:36,249-Speed 2628.20 samples/sec   Loss 3.9033   LearningRate 0.0081   Epoch: 14   Global Step: 593790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:40,146-Speed 2628.12 samples/sec   Loss 3.9332   LearningRate 0.0081   Epoch: 14   Global Step: 593800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:44,055-Speed 2620.06 samples/sec   Loss 3.9587   LearningRate 0.0081   Epoch: 14   Global Step: 593810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-15 14:28:47,938-Speed 2638.13 samples/sec   Loss 3.8589   LearningRate 0.0081   Epoch: 14   Global Step: 593820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-15 14:28:51,856-Speed 2613.87 samples/sec   Loss 3.8453   LearningRate 0.0081   Epoch: 14   Global Step: 593830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:28:55,805-Speed 2594.00 samples/sec   Loss 3.8690   LearningRate 0.0081   Epoch: 14   Global Step: 593840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:28:59,717-Speed 2617.87 samples/sec   Loss 3.7977   LearningRate 0.0081   Epoch: 14   Global Step: 593850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:03,622-Speed 2623.94 samples/sec   Loss 3.8597   LearningRate 0.0081   Epoch: 14   Global Step: 593860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:07,519-Speed 2628.21 samples/sec   Loss 3.7480   LearningRate 0.0081   Epoch: 14   Global Step: 593870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:11,497-Speed 2574.44 samples/sec   Loss 3.8653   LearningRate 0.0081   Epoch: 14   Global Step: 593880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:15,430-Speed 2604.62 samples/sec   Loss 3.8460   LearningRate 0.0081   Epoch: 14   Global Step: 593890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:19,331-Speed 2625.64 samples/sec   Loss 3.8439   LearningRate 0.0081   Epoch: 14   Global Step: 593900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:23,232-Speed 2625.33 samples/sec   Loss 3.8084   LearningRate 0.0081   Epoch: 14   Global Step: 593910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:27,153-Speed 2612.56 samples/sec   Loss 3.9040   LearningRate 0.0081   Epoch: 14   Global Step: 593920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:31,165-Speed 2553.31 samples/sec   Loss 3.8646   LearningRate 0.0081   Epoch: 14   Global Step: 593930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:35,066-Speed 2625.75 samples/sec   Loss 3.8506   LearningRate 0.0081   Epoch: 14   Global Step: 593940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:39,084-Speed 2548.93 samples/sec   Loss 3.8509   LearningRate 0.0081   Epoch: 14   Global Step: 593950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:43,200-Speed 2488.84 samples/sec   Loss 3.8488   LearningRate 0.0081   Epoch: 14   Global Step: 593960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:47,093-Speed 2631.56 samples/sec   Loss 3.6726   LearningRate 0.0081   Epoch: 14   Global Step: 593970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:50,986-Speed 2630.55 samples/sec   Loss 3.7641   LearningRate 0.0081   Epoch: 14   Global Step: 593980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:54,881-Speed 2629.54 samples/sec   Loss 3.8282   LearningRate 0.0081   Epoch: 14   Global Step: 593990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:29:58,779-Speed 2627.41 samples/sec   Loss 3.8699   LearningRate 0.0081   Epoch: 14   Global Step: 594000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:30:02,680-Speed 2626.54 samples/sec   Loss 3.8152   LearningRate 0.0081   Epoch: 14   Global Step: 594010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:30:06,582-Speed 2625.37 samples/sec   Loss 3.8469   LearningRate 0.0081   Epoch: 14   Global Step: 594020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:30:10,482-Speed 2625.85 samples/sec   Loss 3.8223   LearningRate 0.0081   Epoch: 14   Global Step: 594030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:30:14,394-Speed 2618.34 samples/sec   Loss 3.7629   LearningRate 0.0081   Epoch: 14   Global Step: 594040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:30:18,274-Speed 2640.04 samples/sec   Loss 3.8671   LearningRate 0.0081   Epoch: 14   Global Step: 594050   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:22,174-Speed 2626.30 samples/sec   Loss 3.8895   LearningRate 0.0081   Epoch: 14   Global Step: 594060   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:26,068-Speed 2630.68 samples/sec   Loss 3.9167   LearningRate 0.0081   Epoch: 14   Global Step: 594070   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:29,995-Speed 2608.53 samples/sec   Loss 3.9396   LearningRate 0.0081   Epoch: 14   Global Step: 594080   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:33,891-Speed 2629.03 samples/sec   Loss 3.8283   LearningRate 0.0081   Epoch: 14   Global Step: 594090   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:37,797-Speed 2621.97 samples/sec   Loss 3.8591   LearningRate 0.0081   Epoch: 14   Global Step: 594100   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:41,703-Speed 2622.61 samples/sec   Loss 4.0011   LearningRate 0.0081   Epoch: 14   Global Step: 594110   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:45,602-Speed 2626.79 samples/sec   Loss 3.8933   LearningRate 0.0081   Epoch: 14   Global Step: 594120   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:49,498-Speed 2629.08 samples/sec   Loss 3.8042   LearningRate 0.0081   Epoch: 14   Global Step: 594130   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:53,420-Speed 2611.42 samples/sec   Loss 3.8674   LearningRate 0.0081   Epoch: 14   Global Step: 594140   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:30:57,316-Speed 2629.07 samples/sec   Loss 3.8886   LearningRate 0.0081   Epoch: 14   Global Step: 594150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:01,213-Speed 2628.49 samples/sec   Loss 3.8311   LearningRate 0.0081   Epoch: 14   Global Step: 594160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:05,110-Speed 2627.98 samples/sec   Loss 3.8141   LearningRate 0.0081   Epoch: 14   Global Step: 594170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:09,027-Speed 2615.60 samples/sec   Loss 3.8246   LearningRate 0.0081   Epoch: 14   Global Step: 594180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:12,938-Speed 2618.38 samples/sec   Loss 3.8478   LearningRate 0.0081   Epoch: 14   Global Step: 594190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:16,839-Speed 2625.71 samples/sec   Loss 3.8468   LearningRate 0.0081   Epoch: 14   Global Step: 594200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:20,757-Speed 2614.68 samples/sec   Loss 3.8302   LearningRate 0.0080   Epoch: 14   Global Step: 594210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:24,670-Speed 2617.48 samples/sec   Loss 3.8723   LearningRate 0.0080   Epoch: 14   Global Step: 594220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:28,568-Speed 2628.10 samples/sec   Loss 3.8411   LearningRate 0.0080   Epoch: 14   Global Step: 594230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:32,466-Speed 2627.73 samples/sec   Loss 3.8865   LearningRate 0.0080   Epoch: 14   Global Step: 594240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:36,363-Speed 2628.16 samples/sec   Loss 3.9428   LearningRate 0.0080   Epoch: 14   Global Step: 594250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:31:40,279-Speed 2615.02 samples/sec   Loss 3.8501   LearningRate 0.0080   Epoch: 14   Global Step: 594260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:31:44,180-Speed 2626.26 samples/sec   Loss 3.8872   LearningRate 0.0080   Epoch: 14   Global Step: 594270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:31:48,076-Speed 2629.34 samples/sec   Loss 3.9188   LearningRate 0.0080   Epoch: 14   Global Step: 594280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:31:51,985-Speed 2620.36 samples/sec   Loss 3.8338   LearningRate 0.0080   Epoch: 14   Global Step: 594290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:31:55,866-Speed 2638.85 samples/sec   Loss 3.7790   LearningRate 0.0080   Epoch: 14   Global Step: 594300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:31:59,791-Speed 2609.52 samples/sec   Loss 3.8363   LearningRate 0.0080   Epoch: 14   Global Step: 594310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:03,700-Speed 2620.51 samples/sec   Loss 3.8102   LearningRate 0.0080   Epoch: 14   Global Step: 594320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:07,598-Speed 2627.58 samples/sec   Loss 3.8008   LearningRate 0.0080   Epoch: 14   Global Step: 594330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:11,494-Speed 2628.46 samples/sec   Loss 3.8268   LearningRate 0.0080   Epoch: 14   Global Step: 594340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:15,397-Speed 2625.04 samples/sec   Loss 3.8544   LearningRate 0.0080   Epoch: 14   Global Step: 594350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:19,298-Speed 2625.20 samples/sec   Loss 3.7731   LearningRate 0.0080   Epoch: 14   Global Step: 594360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:23,195-Speed 2628.22 samples/sec   Loss 3.7895   LearningRate 0.0080   Epoch: 14   Global Step: 594370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:27,120-Speed 2609.64 samples/sec   Loss 3.8777   LearningRate 0.0080   Epoch: 14   Global Step: 594380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:31,017-Speed 2628.60 samples/sec   Loss 3.7713   LearningRate 0.0080   Epoch: 14   Global Step: 594390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:34,891-Speed 2643.72 samples/sec   Loss 3.7985   LearningRate 0.0080   Epoch: 14   Global Step: 594400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:38,791-Speed 2626.64 samples/sec   Loss 3.8125   LearningRate 0.0080   Epoch: 14   Global Step: 594410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:42,712-Speed 2612.06 samples/sec   Loss 3.8064   LearningRate 0.0080   Epoch: 14   Global Step: 594420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:46,624-Speed 2618.49 samples/sec   Loss 3.9040   LearningRate 0.0080   Epoch: 14   Global Step: 594430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:50,518-Speed 2630.52 samples/sec   Loss 3.8256   LearningRate 0.0080   Epoch: 14   Global Step: 594440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:54,436-Speed 2614.14 samples/sec   Loss 3.8537   LearningRate 0.0080   Epoch: 14   Global Step: 594450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:32:58,327-Speed 2632.61 samples/sec   Loss 3.8386   LearningRate 0.0080   Epoch: 14   Global Step: 594460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:02,231-Speed 2623.60 samples/sec   Loss 3.8344   LearningRate 0.0080   Epoch: 14   Global Step: 594470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:06,127-Speed 2628.72 samples/sec   Loss 3.9370   LearningRate 0.0080   Epoch: 14   Global Step: 594480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:10,022-Speed 2630.16 samples/sec   Loss 3.9933   LearningRate 0.0080   Epoch: 14   Global Step: 594490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:13,921-Speed 2627.38 samples/sec   Loss 3.8519   LearningRate 0.0080   Epoch: 14   Global Step: 594500   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:33:17,829-Speed 2620.60 samples/sec   Loss 3.8211   LearningRate 0.0080   Epoch: 14   Global Step: 594510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:21,733-Speed 2624.64 samples/sec   Loss 3.8777   LearningRate 0.0080   Epoch: 14   Global Step: 594520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:25,651-Speed 2613.88 samples/sec   Loss 3.8077   LearningRate 0.0080   Epoch: 14   Global Step: 594530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:29,555-Speed 2623.89 samples/sec   Loss 3.9541   LearningRate 0.0080   Epoch: 14   Global Step: 594540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:33,473-Speed 2614.44 samples/sec   Loss 3.9221   LearningRate 0.0080   Epoch: 14   Global Step: 594550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:37,368-Speed 2629.58 samples/sec   Loss 3.8320   LearningRate 0.0080   Epoch: 14   Global Step: 594560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:41,303-Speed 2602.92 samples/sec   Loss 3.8510   LearningRate 0.0080   Epoch: 14   Global Step: 594570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:45,201-Speed 2627.59 samples/sec   Loss 3.7420   LearningRate 0.0080   Epoch: 14   Global Step: 594580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:49,096-Speed 2629.97 samples/sec   Loss 3.7218   LearningRate 0.0080   Epoch: 14   Global Step: 594590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:52,997-Speed 2626.12 samples/sec   Loss 3.8540   LearningRate 0.0080   Epoch: 14   Global Step: 594600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:33:56,894-Speed 2628.09 samples/sec   Loss 3.8742   LearningRate 0.0080   Epoch: 14   Global Step: 594610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:34:00,781-Speed 2635.33 samples/sec   Loss 3.6461   LearningRate 0.0080   Epoch: 14   Global Step: 594620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:04,696-Speed 2615.83 samples/sec   Loss 3.8856   LearningRate 0.0080   Epoch: 14   Global Step: 594630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:08,596-Speed 2626.32 samples/sec   Loss 3.7865   LearningRate 0.0080   Epoch: 14   Global Step: 594640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:12,537-Speed 2599.43 samples/sec   Loss 3.7841   LearningRate 0.0080   Epoch: 14   Global Step: 594650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:16,444-Speed 2621.70 samples/sec   Loss 3.8881   LearningRate 0.0080   Epoch: 14   Global Step: 594660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:20,348-Speed 2623.53 samples/sec   Loss 3.7935   LearningRate 0.0080   Epoch: 14   Global Step: 594670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:24,251-Speed 2624.70 samples/sec   Loss 3.8143   LearningRate 0.0080   Epoch: 14   Global Step: 594680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:28,149-Speed 2628.21 samples/sec   Loss 3.8014   LearningRate 0.0080   Epoch: 14   Global Step: 594690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:32,055-Speed 2621.88 samples/sec   Loss 3.7701   LearningRate 0.0080   Epoch: 14   Global Step: 594700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:36,046-Speed 2567.35 samples/sec   Loss 3.8372   LearningRate 0.0080   Epoch: 14   Global Step: 594710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:40,051-Speed 2557.36 samples/sec   Loss 3.8431   LearningRate 0.0080   Epoch: 14   Global Step: 594720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:34:44,056-Speed 2557.31 samples/sec   Loss 3.7769   LearningRate 0.0080   Epoch: 14   Global Step: 594730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:34:47,933-Speed 2642.07 samples/sec   Loss 3.8751   LearningRate 0.0080   Epoch: 14   Global Step: 594740   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:34:51,831-Speed 2627.25 samples/sec   Loss 3.8070   LearningRate 0.0080   Epoch: 14   Global Step: 594750   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:34:55,731-Speed 2627.14 samples/sec   Loss 3.8785   LearningRate 0.0080   Epoch: 14   Global Step: 594760   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:34:59,627-Speed 2628.43 samples/sec   Loss 3.7907   LearningRate 0.0080   Epoch: 14   Global Step: 594770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:35:03,526-Speed 2626.94 samples/sec   Loss 3.8332   LearningRate 0.0080   Epoch: 14   Global Step: 594780   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:35:07,523-Speed 2563.28 samples/sec   Loss 3.8494   LearningRate 0.0080   Epoch: 14   Global Step: 594790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:35:11,448-Speed 2609.11 samples/sec   Loss 3.8408   LearningRate 0.0080   Epoch: 14   Global Step: 594800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:35:15,344-Speed 2628.57 samples/sec   Loss 3.8392   LearningRate 0.0080   Epoch: 14   Global Step: 594810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:35:19,249-Speed 2623.74 samples/sec   Loss 3.8642   LearningRate 0.0080   Epoch: 14   Global Step: 594820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:35:23,153-Speed 2623.23 samples/sec   Loss 3.9735   LearningRate 0.0080   Epoch: 14   Global Step: 594830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:35:27,049-Speed 2629.30 samples/sec   Loss 3.8833   LearningRate 0.0080   Epoch: 14   Global Step: 594840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:30,948-Speed 2627.07 samples/sec   Loss 3.7678   LearningRate 0.0080   Epoch: 14   Global Step: 594850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:34,850-Speed 2625.65 samples/sec   Loss 3.8815   LearningRate 0.0080   Epoch: 14   Global Step: 594860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:38,748-Speed 2627.27 samples/sec   Loss 3.8219   LearningRate 0.0080   Epoch: 14   Global Step: 594870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:42,816-Speed 2517.59 samples/sec   Loss 3.7437   LearningRate 0.0080   Epoch: 14   Global Step: 594880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:46,732-Speed 2615.87 samples/sec   Loss 3.8507   LearningRate 0.0080   Epoch: 14   Global Step: 594890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:50,647-Speed 2616.02 samples/sec   Loss 3.8484   LearningRate 0.0080   Epoch: 14   Global Step: 594900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:54,553-Speed 2622.37 samples/sec   Loss 3.8005   LearningRate 0.0080   Epoch: 14   Global Step: 594910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:35:58,447-Speed 2630.71 samples/sec   Loss 3.7942   LearningRate 0.0080   Epoch: 14   Global Step: 594920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:02,344-Speed 2628.30 samples/sec   Loss 3.9094   LearningRate 0.0080   Epoch: 14   Global Step: 594930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:06,273-Speed 2607.06 samples/sec   Loss 3.8611   LearningRate 0.0080   Epoch: 14   Global Step: 594940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:36:10,171-Speed 2627.92 samples/sec   Loss 3.8747   LearningRate 0.0080   Epoch: 14   Global Step: 594950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:36:14,065-Speed 2629.98 samples/sec   Loss 3.8431   LearningRate 0.0080   Epoch: 14   Global Step: 594960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:36:17,960-Speed 2629.54 samples/sec   Loss 3.8344   LearningRate 0.0080   Epoch: 14   Global Step: 594970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:36:21,860-Speed 2626.81 samples/sec   Loss 3.8642   LearningRate 0.0080   Epoch: 14   Global Step: 594980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:25,766-Speed 2622.82 samples/sec   Loss 3.7389   LearningRate 0.0080   Epoch: 14   Global Step: 594990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:29,658-Speed 2631.40 samples/sec   Loss 3.7717   LearningRate 0.0080   Epoch: 14   Global Step: 595000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:33,564-Speed 2622.37 samples/sec   Loss 3.8644   LearningRate 0.0080   Epoch: 14   Global Step: 595010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:37,459-Speed 2629.68 samples/sec   Loss 3.8147   LearningRate 0.0080   Epoch: 14   Global Step: 595020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:41,353-Speed 2630.39 samples/sec   Loss 3.8591   LearningRate 0.0080   Epoch: 14   Global Step: 595030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:45,270-Speed 2615.16 samples/sec   Loss 3.7994   LearningRate 0.0080   Epoch: 14   Global Step: 595040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:49,192-Speed 2611.60 samples/sec   Loss 3.8590   LearningRate 0.0080   Epoch: 14   Global Step: 595050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:53,096-Speed 2623.61 samples/sec   Loss 3.8135   LearningRate 0.0080   Epoch: 14   Global Step: 595060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:36:56,989-Speed 2631.19 samples/sec   Loss 3.7735   LearningRate 0.0080   Epoch: 14   Global Step: 595070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:00,920-Speed 2605.72 samples/sec   Loss 3.9252   LearningRate 0.0080   Epoch: 14   Global Step: 595080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:37:04,817-Speed 2628.04 samples/sec   Loss 3.9646   LearningRate 0.0080   Epoch: 14   Global Step: 595090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:37:08,709-Speed 2632.04 samples/sec   Loss 3.8733   LearningRate 0.0080   Epoch: 14   Global Step: 595100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:37:12,586-Speed 2641.14 samples/sec   Loss 3.8855   LearningRate 0.0080   Epoch: 14   Global Step: 595110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:16,489-Speed 2625.47 samples/sec   Loss 3.8762   LearningRate 0.0080   Epoch: 14   Global Step: 595120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:20,379-Speed 2633.24 samples/sec   Loss 3.8506   LearningRate 0.0080   Epoch: 14   Global Step: 595130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:24,274-Speed 2629.58 samples/sec   Loss 3.8453   LearningRate 0.0080   Epoch: 14   Global Step: 595140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:28,203-Speed 2607.06 samples/sec   Loss 3.8055   LearningRate 0.0080   Epoch: 14   Global Step: 595150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:32,099-Speed 2629.27 samples/sec   Loss 3.7811   LearningRate 0.0080   Epoch: 14   Global Step: 595160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:35,992-Speed 2630.59 samples/sec   Loss 3.8889   LearningRate 0.0080   Epoch: 14   Global Step: 595170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:39,885-Speed 2630.98 samples/sec   Loss 3.9336   LearningRate 0.0080   Epoch: 14   Global Step: 595180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:43,996-Speed 2491.88 samples/sec   Loss 3.8751   LearningRate 0.0080   Epoch: 14   Global Step: 595190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:47,949-Speed 2591.25 samples/sec   Loss 3.8693   LearningRate 0.0080   Epoch: 14   Global Step: 595200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:37:51,849-Speed 2626.78 samples/sec   Loss 3.8111   LearningRate 0.0080   Epoch: 14   Global Step: 595210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:37:55,774-Speed 2609.08 samples/sec   Loss 3.8741   LearningRate 0.0080   Epoch: 14   Global Step: 595220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:37:59,694-Speed 2613.42 samples/sec   Loss 3.8539   LearningRate 0.0080   Epoch: 14   Global Step: 595230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:38:03,581-Speed 2635.07 samples/sec   Loss 3.7695   LearningRate 0.0080   Epoch: 14   Global Step: 595240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:07,476-Speed 2629.28 samples/sec   Loss 3.7880   LearningRate 0.0080   Epoch: 14   Global Step: 595250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:11,376-Speed 2626.29 samples/sec   Loss 3.9087   LearningRate 0.0080   Epoch: 14   Global Step: 595260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:15,272-Speed 2629.08 samples/sec   Loss 3.8610   LearningRate 0.0080   Epoch: 14   Global Step: 595270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:19,163-Speed 2632.33 samples/sec   Loss 3.8351   LearningRate 0.0080   Epoch: 14   Global Step: 595280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:23,057-Speed 2631.22 samples/sec   Loss 3.7567   LearningRate 0.0080   Epoch: 14   Global Step: 595290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:26,994-Speed 2601.47 samples/sec   Loss 3.8299   LearningRate 0.0080   Epoch: 14   Global Step: 595300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:30,891-Speed 2628.60 samples/sec   Loss 3.7604   LearningRate 0.0080   Epoch: 14   Global Step: 595310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:34,831-Speed 2599.77 samples/sec   Loss 3.8556   LearningRate 0.0080   Epoch: 14   Global Step: 595320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:38,811-Speed 2573.51 samples/sec   Loss 3.8294   LearningRate 0.0080   Epoch: 14   Global Step: 595330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:42,703-Speed 2631.37 samples/sec   Loss 3.9331   LearningRate 0.0080   Epoch: 14   Global Step: 595340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:38:46,572-Speed 2647.65 samples/sec   Loss 3.7506   LearningRate 0.0080   Epoch: 14   Global Step: 595350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:50,474-Speed 2625.61 samples/sec   Loss 3.7871   LearningRate 0.0080   Epoch: 14   Global Step: 595360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:54,378-Speed 2623.41 samples/sec   Loss 3.8020   LearningRate 0.0080   Epoch: 14   Global Step: 595370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:38:58,274-Speed 2629.26 samples/sec   Loss 3.7684   LearningRate 0.0080   Epoch: 14   Global Step: 595380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:39:02,194-Speed 2613.03 samples/sec   Loss 3.8529   LearningRate 0.0080   Epoch: 14   Global Step: 595390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:39:06,087-Speed 2630.85 samples/sec   Loss 3.8108   LearningRate 0.0080   Epoch: 14   Global Step: 595400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:39:09,986-Speed 2626.75 samples/sec   Loss 3.7563   LearningRate 0.0080   Epoch: 14   Global Step: 595410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:39:13,890-Speed 2623.88 samples/sec   Loss 3.7766   LearningRate 0.0080   Epoch: 14   Global Step: 595420   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:17,801-Speed 2618.64 samples/sec   Loss 3.8883   LearningRate 0.0080   Epoch: 14   Global Step: 595430   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:21,703-Speed 2625.12 samples/sec   Loss 3.7768   LearningRate 0.0080   Epoch: 14   Global Step: 595440   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:25,603-Speed 2626.45 samples/sec   Loss 3.9086   LearningRate 0.0080   Epoch: 14   Global Step: 595450   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:29,496-Speed 2631.46 samples/sec   Loss 3.8619   LearningRate 0.0080   Epoch: 14   Global Step: 595460   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:33,396-Speed 2626.52 samples/sec   Loss 3.8746   LearningRate 0.0080   Epoch: 14   Global Step: 595470   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:37,289-Speed 2630.57 samples/sec   Loss 3.7936   LearningRate 0.0080   Epoch: 14   Global Step: 595480   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:41,183-Speed 2630.27 samples/sec   Loss 3.8007   LearningRate 0.0080   Epoch: 14   Global Step: 595490   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:45,084-Speed 2625.87 samples/sec   Loss 3.8282   LearningRate 0.0080   Epoch: 14   Global Step: 595500   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:48,978-Speed 2629.89 samples/sec   Loss 3.7675   LearningRate 0.0080   Epoch: 14   Global Step: 595510   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:39:52,879-Speed 2626.08 samples/sec   Loss 3.8414   LearningRate 0.0080   Epoch: 14   Global Step: 595520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:39:56,759-Speed 2639.86 samples/sec   Loss 3.9033   LearningRate 0.0080   Epoch: 14   Global Step: 595530   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:00,675-Speed 2615.66 samples/sec   Loss 3.8430   LearningRate 0.0080   Epoch: 14   Global Step: 595540   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:04,576-Speed 2625.68 samples/sec   Loss 3.8693   LearningRate 0.0080   Epoch: 14   Global Step: 595550   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:08,472-Speed 2629.10 samples/sec   Loss 3.8669   LearningRate 0.0080   Epoch: 14   Global Step: 595560   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:12,388-Speed 2615.49 samples/sec   Loss 3.9238   LearningRate 0.0080   Epoch: 14   Global Step: 595570   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:16,281-Speed 2630.83 samples/sec   Loss 3.7355   LearningRate 0.0080   Epoch: 14   Global Step: 595580   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:20,188-Speed 2621.46 samples/sec   Loss 3.7679   LearningRate 0.0080   Epoch: 14   Global Step: 595590   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:24,083-Speed 2629.83 samples/sec   Loss 3.8491   LearningRate 0.0080   Epoch: 14   Global Step: 595600   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:27,991-Speed 2621.56 samples/sec   Loss 3.7760   LearningRate 0.0080   Epoch: 14   Global Step: 595610   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:31,896-Speed 2622.67 samples/sec   Loss 3.8104   LearningRate 0.0080   Epoch: 14   Global Step: 595620   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:35,834-Speed 2601.17 samples/sec   Loss 3.8897   LearningRate 0.0080   Epoch: 14   Global Step: 595630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:40:39,740-Speed 2622.48 samples/sec   Loss 3.7882   LearningRate 0.0080   Epoch: 14   Global Step: 595640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:40:43,647-Speed 2621.30 samples/sec   Loss 3.7703   LearningRate 0.0080   Epoch: 14   Global Step: 595650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:40:47,573-Speed 2608.49 samples/sec   Loss 3.8463   LearningRate 0.0080   Epoch: 14   Global Step: 595660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:40:51,571-Speed 2562.75 samples/sec   Loss 3.7667   LearningRate 0.0080   Epoch: 14   Global Step: 595670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:40:55,443-Speed 2644.98 samples/sec   Loss 3.8836   LearningRate 0.0079   Epoch: 14   Global Step: 595680   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:40:59,363-Speed 2613.49 samples/sec   Loss 3.7827   LearningRate 0.0079   Epoch: 14   Global Step: 595690   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:03,261-Speed 2626.93 samples/sec   Loss 3.6892   LearningRate 0.0079   Epoch: 14   Global Step: 595700   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:07,197-Speed 2603.22 samples/sec   Loss 3.8014   LearningRate 0.0079   Epoch: 14   Global Step: 595710   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:11,111-Speed 2616.47 samples/sec   Loss 3.8327   LearningRate 0.0079   Epoch: 14   Global Step: 595720   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:15,010-Speed 2627.11 samples/sec   Loss 3.8235   LearningRate 0.0079   Epoch: 14   Global Step: 595730   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:18,944-Speed 2603.41 samples/sec   Loss 3.9130   LearningRate 0.0079   Epoch: 14   Global Step: 595740   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:22,844-Speed 2626.13 samples/sec   Loss 3.7691   LearningRate 0.0079   Epoch: 14   Global Step: 595750   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:26,733-Speed 2634.41 samples/sec   Loss 3.8518   LearningRate 0.0079   Epoch: 14   Global Step: 595760   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:30,623-Speed 2632.74 samples/sec   Loss 3.7649   LearningRate 0.0079   Epoch: 14   Global Step: 595770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:34,529-Speed 2622.27 samples/sec   Loss 3.8787   LearningRate 0.0079   Epoch: 14   Global Step: 595780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:41:38,405-Speed 2642.40 samples/sec   Loss 3.8362   LearningRate 0.0079   Epoch: 14   Global Step: 595790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:42,302-Speed 2628.23 samples/sec   Loss 3.8579   LearningRate 0.0079   Epoch: 14   Global Step: 595800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:46,205-Speed 2624.15 samples/sec   Loss 3.8047   LearningRate 0.0079   Epoch: 14   Global Step: 595810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:50,098-Speed 2631.08 samples/sec   Loss 3.8194   LearningRate 0.0079   Epoch: 14   Global Step: 595820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:53,996-Speed 2627.64 samples/sec   Loss 3.8050   LearningRate 0.0079   Epoch: 14   Global Step: 595830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:41:57,938-Speed 2598.79 samples/sec   Loss 3.7748   LearningRate 0.0079   Epoch: 14   Global Step: 595840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:42:01,859-Speed 2611.83 samples/sec   Loss 3.8811   LearningRate 0.0079   Epoch: 14   Global Step: 595850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:42:05,780-Speed 2612.64 samples/sec   Loss 3.6897   LearningRate 0.0079   Epoch: 14   Global Step: 595860   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:42:09,690-Speed 2619.15 samples/sec   Loss 3.7405   LearningRate 0.0079   Epoch: 14   Global Step: 595870   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:42:13,586-Speed 2629.32 samples/sec   Loss 3.8633   LearningRate 0.0079   Epoch: 14   Global Step: 595880   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:42:17,479-Speed 2631.06 samples/sec   Loss 3.8730   LearningRate 0.0079   Epoch: 14   Global Step: 595890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:21,372-Speed 2630.99 samples/sec   Loss 3.7512   LearningRate 0.0079   Epoch: 14   Global Step: 595900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:25,322-Speed 2592.70 samples/sec   Loss 3.8414   LearningRate 0.0079   Epoch: 14   Global Step: 595910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:29,222-Speed 2626.70 samples/sec   Loss 3.7719   LearningRate 0.0079   Epoch: 14   Global Step: 595920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:33,192-Speed 2579.68 samples/sec   Loss 3.8310   LearningRate 0.0079   Epoch: 14   Global Step: 595930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:37,084-Speed 2632.27 samples/sec   Loss 3.8859   LearningRate 0.0079   Epoch: 14   Global Step: 595940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:40,976-Speed 2631.16 samples/sec   Loss 3.8477   LearningRate 0.0079   Epoch: 14   Global Step: 595950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:44,868-Speed 2631.83 samples/sec   Loss 3.8053   LearningRate 0.0079   Epoch: 14   Global Step: 595960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:48,763-Speed 2629.06 samples/sec   Loss 3.8368   LearningRate 0.0079   Epoch: 14   Global Step: 595970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:42:52,641-Speed 2641.33 samples/sec   Loss 3.8014   LearningRate 0.0079   Epoch: 14   Global Step: 595980   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:42:56,586-Speed 2596.96 samples/sec   Loss 3.7850   LearningRate 0.0079   Epoch: 14   Global Step: 595990   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:00,481-Speed 2629.35 samples/sec   Loss 3.7920   LearningRate 0.0079   Epoch: 14   Global Step: 596000   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:04,386-Speed 2622.67 samples/sec   Loss 3.7777   LearningRate 0.0079   Epoch: 14   Global Step: 596010   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:08,289-Speed 2624.81 samples/sec   Loss 3.8035   LearningRate 0.0079   Epoch: 14   Global Step: 596020   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:12,182-Speed 2630.41 samples/sec   Loss 3.9885   LearningRate 0.0079   Epoch: 14   Global Step: 596030   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:16,079-Speed 2628.31 samples/sec   Loss 3.7923   LearningRate 0.0079   Epoch: 14   Global Step: 596040   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:19,976-Speed 2628.72 samples/sec   Loss 3.9112   LearningRate 0.0079   Epoch: 14   Global Step: 596050   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:23,880-Speed 2623.99 samples/sec   Loss 3.8211   LearningRate 0.0079   Epoch: 14   Global Step: 596060   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:27,777-Speed 2628.07 samples/sec   Loss 3.8322   LearningRate 0.0079   Epoch: 14   Global Step: 596070   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:43:31,694-Speed 2615.08 samples/sec   Loss 3.7593   LearningRate 0.0079   Epoch: 14   Global Step: 596080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:43:35,619-Speed 2609.82 samples/sec   Loss 3.8322   LearningRate 0.0079   Epoch: 14   Global Step: 596090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:43:39,514-Speed 2629.33 samples/sec   Loss 3.8103   LearningRate 0.0079   Epoch: 14   Global Step: 596100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:43:43,433-Speed 2613.78 samples/sec   Loss 3.7602   LearningRate 0.0079   Epoch: 14   Global Step: 596110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:43:47,328-Speed 2629.13 samples/sec   Loss 3.8225   LearningRate 0.0079   Epoch: 14   Global Step: 596120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:43:51,234-Speed 2623.17 samples/sec   Loss 3.8234   LearningRate 0.0079   Epoch: 14   Global Step: 596130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:43:55,124-Speed 2632.43 samples/sec   Loss 3.7219   LearningRate 0.0079   Epoch: 14   Global Step: 596140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:43:59,024-Speed 2626.93 samples/sec   Loss 3.8160   LearningRate 0.0079   Epoch: 14   Global Step: 596150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:02,920-Speed 2628.67 samples/sec   Loss 3.8630   LearningRate 0.0079   Epoch: 14   Global Step: 596160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:06,819-Speed 2627.38 samples/sec   Loss 3.7825   LearningRate 0.0079   Epoch: 14   Global Step: 596170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:10,690-Speed 2646.29 samples/sec   Loss 3.8164   LearningRate 0.0079   Epoch: 14   Global Step: 596180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:14,598-Speed 2620.79 samples/sec   Loss 3.8836   LearningRate 0.0079   Epoch: 14   Global Step: 596190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:18,492-Speed 2629.95 samples/sec   Loss 3.7883   LearningRate 0.0079   Epoch: 14   Global Step: 596200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:22,398-Speed 2622.82 samples/sec   Loss 3.8777   LearningRate 0.0079   Epoch: 14   Global Step: 596210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:26,327-Speed 2607.40 samples/sec   Loss 3.7847   LearningRate 0.0079   Epoch: 14   Global Step: 596220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:30,236-Speed 2620.16 samples/sec   Loss 3.8753   LearningRate 0.0079   Epoch: 14   Global Step: 596230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:34,135-Speed 2627.07 samples/sec   Loss 3.8259   LearningRate 0.0079   Epoch: 14   Global Step: 596240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:38,061-Speed 2608.98 samples/sec   Loss 3.8550   LearningRate 0.0079   Epoch: 14   Global Step: 596250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:41,959-Speed 2627.30 samples/sec   Loss 3.7446   LearningRate 0.0079   Epoch: 14   Global Step: 596260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:45,855-Speed 2628.99 samples/sec   Loss 3.9024   LearningRate 0.0079   Epoch: 14   Global Step: 596270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:44:49,812-Speed 2588.78 samples/sec   Loss 3.8232   LearningRate 0.0079   Epoch: 14   Global Step: 596280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:44:53,713-Speed 2625.77 samples/sec   Loss 3.7833   LearningRate 0.0079   Epoch: 14   Global Step: 596290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:44:57,590-Speed 2642.73 samples/sec   Loss 3.8111   LearningRate 0.0079   Epoch: 14   Global Step: 596300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:01,497-Speed 2621.11 samples/sec   Loss 3.8229   LearningRate 0.0079   Epoch: 14   Global Step: 596310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:05,404-Speed 2621.96 samples/sec   Loss 3.8679   LearningRate 0.0079   Epoch: 14   Global Step: 596320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:09,306-Speed 2624.64 samples/sec   Loss 3.7290   LearningRate 0.0079   Epoch: 14   Global Step: 596330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:13,203-Speed 2628.68 samples/sec   Loss 3.7729   LearningRate 0.0079   Epoch: 14   Global Step: 596340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:17,102-Speed 2627.03 samples/sec   Loss 3.8074   LearningRate 0.0079   Epoch: 14   Global Step: 596350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:20,994-Speed 2631.98 samples/sec   Loss 3.7834   LearningRate 0.0079   Epoch: 14   Global Step: 596360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:24,894-Speed 2625.48 samples/sec   Loss 3.7972   LearningRate 0.0079   Epoch: 14   Global Step: 596370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:28,790-Speed 2629.74 samples/sec   Loss 3.7886   LearningRate 0.0079   Epoch: 14   Global Step: 596380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:45:32,668-Speed 2641.24 samples/sec   Loss 3.8042   LearningRate 0.0079   Epoch: 14   Global Step: 596390   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:45:36,563-Speed 2630.12 samples/sec   Loss 3.8055   LearningRate 0.0079   Epoch: 14   Global Step: 596400   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:45:40,495-Speed 2604.65 samples/sec   Loss 3.7543   LearningRate 0.0079   Epoch: 14   Global Step: 596410   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:45:44,394-Speed 2626.98 samples/sec   Loss 3.8734   LearningRate 0.0079   Epoch: 14   Global Step: 596420   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:45:48,292-Speed 2627.51 samples/sec   Loss 3.7712   LearningRate 0.0079   Epoch: 14   Global Step: 596430   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:45:52,196-Speed 2624.03 samples/sec   Loss 3.8828   LearningRate 0.0079   Epoch: 14   Global Step: 596440   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:45:56,094-Speed 2627.59 samples/sec   Loss 3.7932   LearningRate 0.0079   Epoch: 14   Global Step: 596450   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:45:59,992-Speed 2627.52 samples/sec   Loss 3.8428   LearningRate 0.0079   Epoch: 14   Global Step: 596460   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:46:03,899-Speed 2621.51 samples/sec   Loss 3.8754   LearningRate 0.0079   Epoch: 14   Global Step: 596470   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:46:07,794-Speed 2629.95 samples/sec   Loss 3.8592   LearningRate 0.0079   Epoch: 14   Global Step: 596480   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:46:11,689-Speed 2629.82 samples/sec   Loss 3.8413   LearningRate 0.0079   Epoch: 14   Global Step: 596490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:15,580-Speed 2631.76 samples/sec   Loss 3.7293   LearningRate 0.0079   Epoch: 14   Global Step: 596500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:19,473-Speed 2631.40 samples/sec   Loss 3.9145   LearningRate 0.0079   Epoch: 14   Global Step: 596510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:23,368-Speed 2629.12 samples/sec   Loss 3.7483   LearningRate 0.0079   Epoch: 14   Global Step: 596520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:27,259-Speed 2632.92 samples/sec   Loss 3.8403   LearningRate 0.0079   Epoch: 14   Global Step: 596530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:31,150-Speed 2632.14 samples/sec   Loss 3.7748   LearningRate 0.0079   Epoch: 14   Global Step: 596540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:35,058-Speed 2621.43 samples/sec   Loss 3.7918   LearningRate 0.0079   Epoch: 14   Global Step: 596550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:38,958-Speed 2626.47 samples/sec   Loss 3.8529   LearningRate 0.0079   Epoch: 14   Global Step: 596560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:42,865-Speed 2622.03 samples/sec   Loss 3.7785   LearningRate 0.0079   Epoch: 14   Global Step: 596570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:46,801-Speed 2602.03 samples/sec   Loss 3.7566   LearningRate 0.0079   Epoch: 14   Global Step: 596580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:50,696-Speed 2629.43 samples/sec   Loss 3.8036   LearningRate 0.0079   Epoch: 14   Global Step: 596590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:46:54,665-Speed 2580.33 samples/sec   Loss 3.8370   LearningRate 0.0079   Epoch: 14   Global Step: 596600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:46:58,561-Speed 2628.56 samples/sec   Loss 3.7301   LearningRate 0.0079   Epoch: 14   Global Step: 596610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:02,661-Speed 2498.88 samples/sec   Loss 3.8282   LearningRate 0.0079   Epoch: 14   Global Step: 596620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:06,590-Speed 2606.77 samples/sec   Loss 3.8316   LearningRate 0.0079   Epoch: 14   Global Step: 596630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:10,489-Speed 2626.83 samples/sec   Loss 3.7938   LearningRate 0.0079   Epoch: 14   Global Step: 596640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:14,385-Speed 2629.35 samples/sec   Loss 3.7535   LearningRate 0.0079   Epoch: 14   Global Step: 596650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:18,283-Speed 2627.61 samples/sec   Loss 3.8807   LearningRate 0.0079   Epoch: 14   Global Step: 596660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:22,175-Speed 2631.52 samples/sec   Loss 3.8391   LearningRate 0.0079   Epoch: 14   Global Step: 596670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:26,072-Speed 2628.59 samples/sec   Loss 3.8561   LearningRate 0.0079   Epoch: 14   Global Step: 596680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:29,986-Speed 2616.50 samples/sec   Loss 3.7944   LearningRate 0.0079   Epoch: 14   Global Step: 596690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:33,906-Speed 2613.46 samples/sec   Loss 3.8040   LearningRate 0.0079   Epoch: 14   Global Step: 596700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:37,808-Speed 2624.62 samples/sec   Loss 3.8564   LearningRate 0.0079   Epoch: 14   Global Step: 596710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:41,703-Speed 2630.02 samples/sec   Loss 3.8117   LearningRate 0.0079   Epoch: 14   Global Step: 596720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:45,599-Speed 2628.81 samples/sec   Loss 3.8388   LearningRate 0.0079   Epoch: 14   Global Step: 596730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:49,495-Speed 2629.18 samples/sec   Loss 3.8531   LearningRate 0.0079   Epoch: 14   Global Step: 596740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:53,399-Speed 2622.86 samples/sec   Loss 3.8188   LearningRate 0.0079   Epoch: 14   Global Step: 596750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:47:57,332-Speed 2604.88 samples/sec   Loss 3.7706   LearningRate 0.0079   Epoch: 14   Global Step: 596760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:48:01,234-Speed 2625.04 samples/sec   Loss 3.7141   LearningRate 0.0079   Epoch: 14   Global Step: 596770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:48:05,141-Speed 2621.72 samples/sec   Loss 3.8188   LearningRate 0.0079   Epoch: 14   Global Step: 596780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:48:09,040-Speed 2626.85 samples/sec   Loss 3.7947   LearningRate 0.0079   Epoch: 14   Global Step: 596790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:48:12,938-Speed 2628.08 samples/sec   Loss 3.8038   LearningRate 0.0079   Epoch: 14   Global Step: 596800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:48:16,837-Speed 2626.85 samples/sec   Loss 3.7889   LearningRate 0.0079   Epoch: 14   Global Step: 596810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:48:20,767-Speed 2605.76 samples/sec   Loss 3.7323   LearningRate 0.0079   Epoch: 14   Global Step: 596820   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:48:24,655-Speed 2634.58 samples/sec   Loss 3.7922   LearningRate 0.0079   Epoch: 14   Global Step: 596830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:48:28,524-Speed 2647.51 samples/sec   Loss 3.6651   LearningRate 0.0079   Epoch: 14   Global Step: 596840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:32,464-Speed 2599.76 samples/sec   Loss 3.8368   LearningRate 0.0079   Epoch: 14   Global Step: 596850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:36,379-Speed 2616.50 samples/sec   Loss 3.8309   LearningRate 0.0079   Epoch: 14   Global Step: 596860   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:40,272-Speed 2630.77 samples/sec   Loss 3.8399   LearningRate 0.0079   Epoch: 14   Global Step: 596870   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:44,168-Speed 2628.55 samples/sec   Loss 3.8287   LearningRate 0.0079   Epoch: 14   Global Step: 596880   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:48,066-Speed 2628.12 samples/sec   Loss 3.8723   LearningRate 0.0079   Epoch: 14   Global Step: 596890   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:51,965-Speed 2627.33 samples/sec   Loss 3.7650   LearningRate 0.0079   Epoch: 14   Global Step: 596900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:55,858-Speed 2631.06 samples/sec   Loss 3.7874   LearningRate 0.0079   Epoch: 14   Global Step: 596910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:48:59,750-Speed 2631.27 samples/sec   Loss 3.8228   LearningRate 0.0079   Epoch: 14   Global Step: 596920   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:49:03,644-Speed 2629.92 samples/sec   Loss 3.8004   LearningRate 0.0079   Epoch: 14   Global Step: 596930   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:49:07,546-Speed 2625.64 samples/sec   Loss 3.7993   LearningRate 0.0079   Epoch: 14   Global Step: 596940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:11,444-Speed 2627.46 samples/sec   Loss 3.8079   LearningRate 0.0079   Epoch: 14   Global Step: 596950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:15,340-Speed 2629.02 samples/sec   Loss 3.8255   LearningRate 0.0079   Epoch: 14   Global Step: 596960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:19,236-Speed 2628.88 samples/sec   Loss 3.7797   LearningRate 0.0079   Epoch: 14   Global Step: 596970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:23,129-Speed 2631.09 samples/sec   Loss 3.7405   LearningRate 0.0079   Epoch: 14   Global Step: 596980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:27,023-Speed 2629.99 samples/sec   Loss 3.7999   LearningRate 0.0079   Epoch: 14   Global Step: 596990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:30,931-Speed 2621.24 samples/sec   Loss 3.8113   LearningRate 0.0079   Epoch: 14   Global Step: 597000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:34,824-Speed 2630.71 samples/sec   Loss 3.8983   LearningRate 0.0079   Epoch: 14   Global Step: 597010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:38,719-Speed 2629.57 samples/sec   Loss 3.7757   LearningRate 0.0079   Epoch: 14   Global Step: 597020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:42,615-Speed 2629.44 samples/sec   Loss 3.7989   LearningRate 0.0079   Epoch: 14   Global Step: 597030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:49:46,512-Speed 2628.42 samples/sec   Loss 3.8627   LearningRate 0.0079   Epoch: 14   Global Step: 597040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:49:50,408-Speed 2628.40 samples/sec   Loss 3.7882   LearningRate 0.0079   Epoch: 14   Global Step: 597050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:49:54,318-Speed 2620.35 samples/sec   Loss 3.8153   LearningRate 0.0079   Epoch: 14   Global Step: 597060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:49:58,193-Speed 2643.63 samples/sec   Loss 3.7436   LearningRate 0.0079   Epoch: 14   Global Step: 597070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:02,083-Speed 2632.71 samples/sec   Loss 3.8191   LearningRate 0.0079   Epoch: 14   Global Step: 597080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:05,983-Speed 2626.01 samples/sec   Loss 3.6959   LearningRate 0.0079   Epoch: 14   Global Step: 597090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:09,906-Speed 2611.14 samples/sec   Loss 3.7997   LearningRate 0.0079   Epoch: 14   Global Step: 597100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:13,804-Speed 2626.96 samples/sec   Loss 3.8039   LearningRate 0.0079   Epoch: 14   Global Step: 597110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:17,724-Speed 2613.12 samples/sec   Loss 3.8273   LearningRate 0.0079   Epoch: 14   Global Step: 597120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:21,628-Speed 2624.37 samples/sec   Loss 3.8360   LearningRate 0.0079   Epoch: 14   Global Step: 597130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:25,524-Speed 2629.00 samples/sec   Loss 3.8660   LearningRate 0.0079   Epoch: 14   Global Step: 597140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:29,428-Speed 2623.33 samples/sec   Loss 3.7715   LearningRate 0.0078   Epoch: 14   Global Step: 597150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:33,323-Speed 2629.94 samples/sec   Loss 3.7868   LearningRate 0.0078   Epoch: 14   Global Step: 597160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:37,232-Speed 2620.07 samples/sec   Loss 3.7921   LearningRate 0.0078   Epoch: 14   Global Step: 597170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:50:41,132-Speed 2626.00 samples/sec   Loss 3.7264   LearningRate 0.0078   Epoch: 14   Global Step: 597180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:50:45,014-Speed 2639.15 samples/sec   Loss 3.7794   LearningRate 0.0078   Epoch: 14   Global Step: 597190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:48,925-Speed 2618.22 samples/sec   Loss 3.7715   LearningRate 0.0078   Epoch: 14   Global Step: 597200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:52,828-Speed 2624.95 samples/sec   Loss 3.7722   LearningRate 0.0078   Epoch: 14   Global Step: 597210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:50:56,725-Speed 2627.55 samples/sec   Loss 3.8196   LearningRate 0.0078   Epoch: 14   Global Step: 597220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:51:00,640-Speed 2616.69 samples/sec   Loss 3.8411   LearningRate 0.0078   Epoch: 14   Global Step: 597230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:51:04,565-Speed 2609.27 samples/sec   Loss 3.8381   LearningRate 0.0078   Epoch: 14   Global Step: 597240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:51:08,454-Speed 2633.68 samples/sec   Loss 3.8125   LearningRate 0.0078   Epoch: 14   Global Step: 597250   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:12,378-Speed 2609.94 samples/sec   Loss 3.8661   LearningRate 0.0078   Epoch: 14   Global Step: 597260   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:16,283-Speed 2623.56 samples/sec   Loss 3.8280   LearningRate 0.0078   Epoch: 14   Global Step: 597270   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:20,185-Speed 2625.07 samples/sec   Loss 3.8088   LearningRate 0.0078   Epoch: 14   Global Step: 597280   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:24,134-Speed 2593.32 samples/sec   Loss 3.8393   LearningRate 0.0078   Epoch: 14   Global Step: 597290   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:28,042-Speed 2621.51 samples/sec   Loss 3.7880   LearningRate 0.0078   Epoch: 14   Global Step: 597300   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:31,946-Speed 2624.28 samples/sec   Loss 3.7601   LearningRate 0.0078   Epoch: 14   Global Step: 597310   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:35,863-Speed 2614.64 samples/sec   Loss 3.8596   LearningRate 0.0078   Epoch: 14   Global Step: 597320   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:39,778-Speed 2616.43 samples/sec   Loss 3.7950   LearningRate 0.0078   Epoch: 14   Global Step: 597330   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:43,675-Speed 2627.95 samples/sec   Loss 3.8878   LearningRate 0.0078   Epoch: 14   Global Step: 597340   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:51:47,575-Speed 2627.12 samples/sec   Loss 3.7768   LearningRate 0.0078   Epoch: 14   Global Step: 597350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:51:51,480-Speed 2622.59 samples/sec   Loss 3.7685   LearningRate 0.0078   Epoch: 14   Global Step: 597360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:51:55,485-Speed 2557.72 samples/sec   Loss 3.7461   LearningRate 0.0078   Epoch: 14   Global Step: 597370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:51:59,428-Speed 2597.92 samples/sec   Loss 3.7727   LearningRate 0.0078   Epoch: 14   Global Step: 597380   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:03,350-Speed 2611.80 samples/sec   Loss 3.6828   LearningRate 0.0078   Epoch: 14   Global Step: 597390   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:07,249-Speed 2626.39 samples/sec   Loss 3.7333   LearningRate 0.0078   Epoch: 14   Global Step: 597400   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:11,150-Speed 2626.01 samples/sec   Loss 3.7666   LearningRate 0.0078   Epoch: 14   Global Step: 597410   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:15,044-Speed 2630.09 samples/sec   Loss 3.7910   LearningRate 0.0078   Epoch: 14   Global Step: 597420   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:18,943-Speed 2626.76 samples/sec   Loss 3.8098   LearningRate 0.0078   Epoch: 14   Global Step: 597430   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:22,841-Speed 2628.04 samples/sec   Loss 3.7318   LearningRate 0.0078   Epoch: 14   Global Step: 597440   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:26,736-Speed 2629.28 samples/sec   Loss 3.8372   LearningRate 0.0078   Epoch: 14   Global Step: 597450   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:30,635-Speed 2626.96 samples/sec   Loss 3.8708   LearningRate 0.0078   Epoch: 14   Global Step: 597460   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:34,533-Speed 2627.74 samples/sec   Loss 3.7420   LearningRate 0.0078   Epoch: 14   Global Step: 597470   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:52:38,434-Speed 2625.78 samples/sec   Loss 3.7160   LearningRate 0.0078   Epoch: 14   Global Step: 597480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:52:42,344-Speed 2619.73 samples/sec   Loss 3.7624   LearningRate 0.0078   Epoch: 14   Global Step: 597490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:52:46,239-Speed 2629.60 samples/sec   Loss 3.8311   LearningRate 0.0078   Epoch: 14   Global Step: 597500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:52:50,151-Speed 2617.96 samples/sec   Loss 3.8244   LearningRate 0.0078   Epoch: 14   Global Step: 597510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:52:54,056-Speed 2622.99 samples/sec   Loss 3.8113   LearningRate 0.0078   Epoch: 14   Global Step: 597520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:52:57,950-Speed 2630.60 samples/sec   Loss 3.8007   LearningRate 0.0078   Epoch: 14   Global Step: 597530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:01,846-Speed 2629.98 samples/sec   Loss 3.7104   LearningRate 0.0078   Epoch: 14   Global Step: 597540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:05,740-Speed 2630.26 samples/sec   Loss 3.7695   LearningRate 0.0078   Epoch: 14   Global Step: 597550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:09,639-Speed 2627.12 samples/sec   Loss 3.7774   LearningRate 0.0078   Epoch: 14   Global Step: 597560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:13,539-Speed 2626.81 samples/sec   Loss 3.8337   LearningRate 0.0078   Epoch: 14   Global Step: 597570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:17,447-Speed 2621.02 samples/sec   Loss 3.7892   LearningRate 0.0078   Epoch: 14   Global Step: 597580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:53:21,340-Speed 2630.26 samples/sec   Loss 3.8391   LearningRate 0.0078   Epoch: 14   Global Step: 597590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:53:25,228-Speed 2635.07 samples/sec   Loss 3.8916   LearningRate 0.0078   Epoch: 14   Global Step: 597600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:29,124-Speed 2628.89 samples/sec   Loss 3.8232   LearningRate 0.0078   Epoch: 14   Global Step: 597610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:33,022-Speed 2627.96 samples/sec   Loss 3.8054   LearningRate 0.0078   Epoch: 14   Global Step: 597620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:36,924-Speed 2625.10 samples/sec   Loss 3.7486   LearningRate 0.0078   Epoch: 14   Global Step: 597630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:53:40,805-Speed 2639.01 samples/sec   Loss 3.7212   LearningRate 0.0078   Epoch: 14   Global Step: 597640   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:53:44,705-Speed 2626.61 samples/sec   Loss 3.7412   LearningRate 0.0078   Epoch: 14   Global Step: 597650   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:53:48,607-Speed 2624.74 samples/sec   Loss 3.8215   LearningRate 0.0078   Epoch: 14   Global Step: 597660   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:53:52,528-Speed 2612.31 samples/sec   Loss 3.8607   LearningRate 0.0078   Epoch: 14   Global Step: 597670   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:53:56,428-Speed 2626.67 samples/sec   Loss 3.7769   LearningRate 0.0078   Epoch: 14   Global Step: 597680   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:00,326-Speed 2627.81 samples/sec   Loss 3.7945   LearningRate 0.0078   Epoch: 14   Global Step: 597690   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:04,226-Speed 2626.19 samples/sec   Loss 3.7894   LearningRate 0.0078   Epoch: 14   Global Step: 597700   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:08,131-Speed 2622.99 samples/sec   Loss 3.7791   LearningRate 0.0078   Epoch: 14   Global Step: 597710   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:12,028-Speed 2628.43 samples/sec   Loss 3.9140   LearningRate 0.0078   Epoch: 14   Global Step: 597720   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:15,930-Speed 2625.19 samples/sec   Loss 3.7828   LearningRate 0.0078   Epoch: 14   Global Step: 597730   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:19,875-Speed 2596.07 samples/sec   Loss 3.6739   LearningRate 0.0078   Epoch: 14   Global Step: 597740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:54:23,796-Speed 2612.11 samples/sec   Loss 3.7548   LearningRate 0.0078   Epoch: 14   Global Step: 597750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:54:27,695-Speed 2627.64 samples/sec   Loss 3.7904   LearningRate 0.0078   Epoch: 14   Global Step: 597760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:54:31,599-Speed 2623.03 samples/sec   Loss 3.7087   LearningRate 0.0078   Epoch: 14   Global Step: 597770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:54:35,495-Speed 2629.28 samples/sec   Loss 3.7902   LearningRate 0.0078   Epoch: 14   Global Step: 597780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:54:39,372-Speed 2641.24 samples/sec   Loss 3.7150   LearningRate 0.0078   Epoch: 14   Global Step: 597790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:43,270-Speed 2627.94 samples/sec   Loss 3.7970   LearningRate 0.0078   Epoch: 14   Global Step: 597800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:47,167-Speed 2628.52 samples/sec   Loss 3.7762   LearningRate 0.0078   Epoch: 14   Global Step: 597810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:51,069-Speed 2625.41 samples/sec   Loss 3.7993   LearningRate 0.0078   Epoch: 14   Global Step: 597820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:54,980-Speed 2618.61 samples/sec   Loss 3.7674   LearningRate 0.0078   Epoch: 14   Global Step: 597830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:54:58,872-Speed 2631.13 samples/sec   Loss 3.8170   LearningRate 0.0078   Epoch: 14   Global Step: 597840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:55:02,768-Speed 2629.00 samples/sec   Loss 3.7783   LearningRate 0.0078   Epoch: 14   Global Step: 597850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:55:06,665-Speed 2628.26 samples/sec   Loss 3.8605   LearningRate 0.0078   Epoch: 14   Global Step: 597860   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:55:10,571-Speed 2622.03 samples/sec   Loss 3.7398   LearningRate 0.0078   Epoch: 14   Global Step: 597870   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:55:14,472-Speed 2626.18 samples/sec   Loss 3.7830   LearningRate 0.0078   Epoch: 14   Global Step: 597880   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 14:55:18,369-Speed 2627.60 samples/sec   Loss 3.7853   LearningRate 0.0078   Epoch: 14   Global Step: 597890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:22,273-Speed 2632.75 samples/sec   Loss 3.7598   LearningRate 0.0078   Epoch: 14   Global Step: 597900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:26,169-Speed 2628.89 samples/sec   Loss 3.8157   LearningRate 0.0078   Epoch: 14   Global Step: 597910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:30,092-Speed 2610.85 samples/sec   Loss 3.7659   LearningRate 0.0078   Epoch: 14   Global Step: 597920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:34,000-Speed 2620.84 samples/sec   Loss 3.8159   LearningRate 0.0078   Epoch: 14   Global Step: 597930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:37,907-Speed 2621.63 samples/sec   Loss 3.7664   LearningRate 0.0078   Epoch: 14   Global Step: 597940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:41,806-Speed 2626.90 samples/sec   Loss 3.7859   LearningRate 0.0078   Epoch: 14   Global Step: 597950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:45,725-Speed 2613.74 samples/sec   Loss 3.7193   LearningRate 0.0078   Epoch: 14   Global Step: 597960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:49,625-Speed 2625.95 samples/sec   Loss 3.7782   LearningRate 0.0078   Epoch: 14   Global Step: 597970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:53,523-Speed 2627.55 samples/sec   Loss 3.7438   LearningRate 0.0078   Epoch: 14   Global Step: 597980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:55:57,424-Speed 2625.46 samples/sec   Loss 3.7606   LearningRate 0.0078   Epoch: 14   Global Step: 597990   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:56:01,365-Speed 2599.04 samples/sec   Loss 3.7906   LearningRate 0.0078   Epoch: 14   Global Step: 598000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:05,320-Speed 2589.85 samples/sec   Loss 3.7815   LearningRate 0.0078   Epoch: 14   Global Step: 598010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:09,214-Speed 2630.30 samples/sec   Loss 3.7950   LearningRate 0.0078   Epoch: 14   Global Step: 598020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:13,108-Speed 2630.48 samples/sec   Loss 3.8184   LearningRate 0.0078   Epoch: 14   Global Step: 598030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:17,009-Speed 2624.89 samples/sec   Loss 3.7489   LearningRate 0.0078   Epoch: 14   Global Step: 598040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:20,901-Speed 2631.87 samples/sec   Loss 3.7413   LearningRate 0.0078   Epoch: 14   Global Step: 598050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:24,808-Speed 2621.73 samples/sec   Loss 3.7724   LearningRate 0.0078   Epoch: 14   Global Step: 598060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:28,722-Speed 2616.19 samples/sec   Loss 3.8266   LearningRate 0.0078   Epoch: 14   Global Step: 598070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:32,626-Speed 2623.55 samples/sec   Loss 3.8488   LearningRate 0.0078   Epoch: 14   Global Step: 598080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:36,528-Speed 2625.07 samples/sec   Loss 3.8254   LearningRate 0.0078   Epoch: 14   Global Step: 598090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:40,427-Speed 2627.24 samples/sec   Loss 3.7589   LearningRate 0.0078   Epoch: 14   Global Step: 598100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:56:44,301-Speed 2643.90 samples/sec   Loss 3.7893   LearningRate 0.0078   Epoch: 14   Global Step: 598110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:48,195-Speed 2630.24 samples/sec   Loss 3.8379   LearningRate 0.0078   Epoch: 14   Global Step: 598120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:52,122-Speed 2608.83 samples/sec   Loss 3.7726   LearningRate 0.0078   Epoch: 14   Global Step: 598130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:56,023-Speed 2625.07 samples/sec   Loss 3.8144   LearningRate 0.0078   Epoch: 14   Global Step: 598140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:56:59,931-Speed 2620.63 samples/sec   Loss 3.8760   LearningRate 0.0078   Epoch: 14   Global Step: 598150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:03,835-Speed 2623.70 samples/sec   Loss 3.7331   LearningRate 0.0078   Epoch: 14   Global Step: 598160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:07,730-Speed 2629.92 samples/sec   Loss 3.8367   LearningRate 0.0078   Epoch: 14   Global Step: 598170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:11,634-Speed 2623.11 samples/sec   Loss 3.7355   LearningRate 0.0078   Epoch: 14   Global Step: 598180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:15,530-Speed 2629.60 samples/sec   Loss 3.7833   LearningRate 0.0078   Epoch: 14   Global Step: 598190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:19,430-Speed 2626.42 samples/sec   Loss 3.7433   LearningRate 0.0078   Epoch: 14   Global Step: 598200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:23,305-Speed 2643.47 samples/sec   Loss 3.7887   LearningRate 0.0078   Epoch: 14   Global Step: 598210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:27,199-Speed 2630.04 samples/sec   Loss 3.8736   LearningRate 0.0078   Epoch: 14   Global Step: 598220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:31,097-Speed 2627.67 samples/sec   Loss 3.7157   LearningRate 0.0078   Epoch: 14   Global Step: 598230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:34,996-Speed 2626.68 samples/sec   Loss 3.8063   LearningRate 0.0078   Epoch: 14   Global Step: 598240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:38,904-Speed 2620.97 samples/sec   Loss 3.8006   LearningRate 0.0078   Epoch: 14   Global Step: 598250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:42,811-Speed 2621.96 samples/sec   Loss 3.7405   LearningRate 0.0078   Epoch: 14   Global Step: 598260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:46,710-Speed 2627.42 samples/sec   Loss 3.7308   LearningRate 0.0078   Epoch: 14   Global Step: 598270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:50,602-Speed 2632.36 samples/sec   Loss 3.7246   LearningRate 0.0078   Epoch: 14   Global Step: 598280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:54,503-Speed 2625.33 samples/sec   Loss 3.8842   LearningRate 0.0078   Epoch: 14   Global Step: 598290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:57:58,419-Speed 2615.65 samples/sec   Loss 3.7651   LearningRate 0.0078   Epoch: 14   Global Step: 598300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:02,333-Speed 2616.68 samples/sec   Loss 3.9067   LearningRate 0.0078   Epoch: 14   Global Step: 598310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:58:06,233-Speed 2626.95 samples/sec   Loss 3.8420   LearningRate 0.0078   Epoch: 14   Global Step: 598320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:58:10,131-Speed 2627.72 samples/sec   Loss 3.7690   LearningRate 0.0078   Epoch: 14   Global Step: 598330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:58:14,002-Speed 2646.32 samples/sec   Loss 3.8706   LearningRate 0.0078   Epoch: 14   Global Step: 598340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:17,992-Speed 2567.91 samples/sec   Loss 3.7512   LearningRate 0.0078   Epoch: 14   Global Step: 598350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:21,891-Speed 2626.35 samples/sec   Loss 3.7666   LearningRate 0.0078   Epoch: 14   Global Step: 598360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:25,788-Speed 2628.55 samples/sec   Loss 3.7918   LearningRate 0.0078   Epoch: 14   Global Step: 598370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:29,696-Speed 2620.63 samples/sec   Loss 3.7229   LearningRate 0.0078   Epoch: 14   Global Step: 598380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:33,595-Speed 2627.04 samples/sec   Loss 3.8128   LearningRate 0.0078   Epoch: 14   Global Step: 598390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:37,492-Speed 2628.34 samples/sec   Loss 3.8557   LearningRate 0.0078   Epoch: 14   Global Step: 598400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:41,386-Speed 2630.68 samples/sec   Loss 3.7206   LearningRate 0.0078   Epoch: 14   Global Step: 598410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:45,276-Speed 2632.92 samples/sec   Loss 3.7250   LearningRate 0.0078   Epoch: 14   Global Step: 598420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:49,196-Speed 2612.79 samples/sec   Loss 3.8289   LearningRate 0.0078   Epoch: 14   Global Step: 598430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:58:53,095-Speed 2627.04 samples/sec   Loss 3.7654   LearningRate 0.0078   Epoch: 14   Global Step: 598440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:58:56,969-Speed 2644.25 samples/sec   Loss 3.7240   LearningRate 0.0078   Epoch: 14   Global Step: 598450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:00,863-Speed 2630.00 samples/sec   Loss 3.7208   LearningRate 0.0078   Epoch: 14   Global Step: 598460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:04,764-Speed 2626.04 samples/sec   Loss 3.8032   LearningRate 0.0078   Epoch: 14   Global Step: 598470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:08,658-Speed 2630.16 samples/sec   Loss 3.8651   LearningRate 0.0078   Epoch: 14   Global Step: 598480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:12,557-Speed 2627.26 samples/sec   Loss 3.7956   LearningRate 0.0078   Epoch: 14   Global Step: 598490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:16,461-Speed 2623.95 samples/sec   Loss 3.7057   LearningRate 0.0078   Epoch: 14   Global Step: 598500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:20,366-Speed 2622.94 samples/sec   Loss 3.7960   LearningRate 0.0078   Epoch: 14   Global Step: 598510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:24,262-Speed 2628.76 samples/sec   Loss 3.6749   LearningRate 0.0078   Epoch: 14   Global Step: 598520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:28,158-Speed 2629.55 samples/sec   Loss 3.7788   LearningRate 0.0078   Epoch: 14   Global Step: 598530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:32,056-Speed 2627.16 samples/sec   Loss 3.7580   LearningRate 0.0078   Epoch: 14   Global Step: 598540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:35,948-Speed 2632.35 samples/sec   Loss 3.7561   LearningRate 0.0078   Epoch: 14   Global Step: 598550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:59:39,842-Speed 2630.16 samples/sec   Loss 3.8209   LearningRate 0.0078   Epoch: 14   Global Step: 598560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:59:43,737-Speed 2630.35 samples/sec   Loss 3.8357   LearningRate 0.0078   Epoch: 14   Global Step: 598570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:59:47,644-Speed 2621.31 samples/sec   Loss 3.7916   LearningRate 0.0078   Epoch: 14   Global Step: 598580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 14:59:51,515-Speed 2646.00 samples/sec   Loss 3.7728   LearningRate 0.0078   Epoch: 14   Global Step: 598590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:55,442-Speed 2607.75 samples/sec   Loss 3.8202   LearningRate 0.0078   Epoch: 14   Global Step: 598600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 14:59:59,365-Speed 2611.36 samples/sec   Loss 3.7505   LearningRate 0.0078   Epoch: 14   Global Step: 598610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:03,264-Speed 2627.08 samples/sec   Loss 3.8876   LearningRate 0.0078   Epoch: 14   Global Step: 598620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:07,164-Speed 2626.12 samples/sec   Loss 3.7766   LearningRate 0.0078   Epoch: 14   Global Step: 598630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:11,071-Speed 2621.78 samples/sec   Loss 3.6891   LearningRate 0.0077   Epoch: 14   Global Step: 598640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:14,972-Speed 2625.94 samples/sec   Loss 3.7792   LearningRate 0.0077   Epoch: 14   Global Step: 598650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:18,872-Speed 2625.62 samples/sec   Loss 3.7600   LearningRate 0.0077   Epoch: 14   Global Step: 598660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:22,772-Speed 2626.76 samples/sec   Loss 3.7937   LearningRate 0.0077   Epoch: 14   Global Step: 598670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:26,724-Speed 2591.70 samples/sec   Loss 3.7134   LearningRate 0.0077   Epoch: 14   Global Step: 598680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:30,620-Speed 2628.99 samples/sec   Loss 3.6595   LearningRate 0.0077   Epoch: 14   Global Step: 598690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:00:34,524-Speed 2623.55 samples/sec   Loss 3.7306   LearningRate 0.0077   Epoch: 14   Global Step: 598700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:00:38,408-Speed 2637.32 samples/sec   Loss 3.7082   LearningRate 0.0077   Epoch: 14   Global Step: 598710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:42,306-Speed 2627.92 samples/sec   Loss 3.7861   LearningRate 0.0077   Epoch: 14   Global Step: 598720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:46,203-Speed 2628.70 samples/sec   Loss 3.8014   LearningRate 0.0077   Epoch: 14   Global Step: 598730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:50,105-Speed 2624.57 samples/sec   Loss 3.7452   LearningRate 0.0077   Epoch: 14   Global Step: 598740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:54,008-Speed 2624.66 samples/sec   Loss 3.7769   LearningRate 0.0077   Epoch: 14   Global Step: 598750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:00:57,882-Speed 2643.58 samples/sec   Loss 3.6900   LearningRate 0.0077   Epoch: 14   Global Step: 598760   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:01,783-Speed 2626.18 samples/sec   Loss 3.8185   LearningRate 0.0077   Epoch: 14   Global Step: 598770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:05,679-Speed 2629.07 samples/sec   Loss 3.6928   LearningRate 0.0077   Epoch: 14   Global Step: 598780   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:09,574-Speed 2629.58 samples/sec   Loss 3.7854   LearningRate 0.0077   Epoch: 14   Global Step: 598790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:13,504-Speed 2606.44 samples/sec   Loss 3.7180   LearningRate 0.0077   Epoch: 14   Global Step: 598800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:17,408-Speed 2623.80 samples/sec   Loss 3.7226   LearningRate 0.0077   Epoch: 14   Global Step: 598810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:21,306-Speed 2628.28 samples/sec   Loss 3.7885   LearningRate 0.0077   Epoch: 14   Global Step: 598820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:25,215-Speed 2619.57 samples/sec   Loss 3.8541   LearningRate 0.0077   Epoch: 14   Global Step: 598830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:29,115-Speed 2626.77 samples/sec   Loss 3.7271   LearningRate 0.0077   Epoch: 14   Global Step: 598840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:33,013-Speed 2627.72 samples/sec   Loss 3.9003   LearningRate 0.0077   Epoch: 14   Global Step: 598850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:36,913-Speed 2626.06 samples/sec   Loss 3.7668   LearningRate 0.0077   Epoch: 14   Global Step: 598860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:01:40,821-Speed 2620.86 samples/sec   Loss 3.7132   LearningRate 0.0077   Epoch: 14   Global Step: 598870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:01:44,722-Speed 2625.61 samples/sec   Loss 3.7132   LearningRate 0.0077   Epoch: 14   Global Step: 598880   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:48,615-Speed 2631.08 samples/sec   Loss 3.8208   LearningRate 0.0077   Epoch: 14   Global Step: 598890   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:52,511-Speed 2629.46 samples/sec   Loss 3.7583   LearningRate 0.0077   Epoch: 14   Global Step: 598900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:01:56,407-Speed 2628.72 samples/sec   Loss 3.8050   LearningRate 0.0077   Epoch: 14   Global Step: 598910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:02:00,303-Speed 2628.68 samples/sec   Loss 3.8150   LearningRate 0.0077   Epoch: 14   Global Step: 598920   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:02:04,203-Speed 2626.34 samples/sec   Loss 3.7550   LearningRate 0.0077   Epoch: 14   Global Step: 598930   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:02:08,119-Speed 2615.69 samples/sec   Loss 3.7706   LearningRate 0.0077   Epoch: 14   Global Step: 598940   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:02:12,064-Speed 2596.30 samples/sec   Loss 3.7803   LearningRate 0.0077   Epoch: 14   Global Step: 598950   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:02:16,011-Speed 2595.41 samples/sec   Loss 3.7861   LearningRate 0.0077   Epoch: 14   Global Step: 598960   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:02:19,908-Speed 2628.08 samples/sec   Loss 3.7605   LearningRate 0.0077   Epoch: 14   Global Step: 598970   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:02:23,808-Speed 2626.89 samples/sec   Loss 3.7958   LearningRate 0.0077   Epoch: 14   Global Step: 598980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:27,728-Speed 2612.79 samples/sec   Loss 3.8171   LearningRate 0.0077   Epoch: 14   Global Step: 598990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:31,653-Speed 2609.52 samples/sec   Loss 3.7420   LearningRate 0.0077   Epoch: 14   Global Step: 599000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:35,553-Speed 2626.14 samples/sec   Loss 3.7813   LearningRate 0.0077   Epoch: 14   Global Step: 599010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:39,453-Speed 2626.35 samples/sec   Loss 3.7622   LearningRate 0.0077   Epoch: 14   Global Step: 599020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:43,390-Speed 2601.86 samples/sec   Loss 3.6873   LearningRate 0.0077   Epoch: 14   Global Step: 599030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:47,289-Speed 2627.21 samples/sec   Loss 3.7093   LearningRate 0.0077   Epoch: 14   Global Step: 599040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:51,187-Speed 2628.14 samples/sec   Loss 3.7716   LearningRate 0.0077   Epoch: 14   Global Step: 599050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:55,086-Speed 2626.68 samples/sec   Loss 3.7545   LearningRate 0.0077   Epoch: 14   Global Step: 599060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:02:58,990-Speed 2623.34 samples/sec   Loss 3.7657   LearningRate 0.0077   Epoch: 14   Global Step: 599070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:02,869-Speed 2640.56 samples/sec   Loss 3.8062   LearningRate 0.0077   Epoch: 14   Global Step: 599080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:06,770-Speed 2625.40 samples/sec   Loss 3.7799   LearningRate 0.0077   Epoch: 14   Global Step: 599090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:10,673-Speed 2624.55 samples/sec   Loss 3.7548   LearningRate 0.0077   Epoch: 14   Global Step: 599100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:14,568-Speed 2629.67 samples/sec   Loss 3.7731   LearningRate 0.0077   Epoch: 14   Global Step: 599110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:18,468-Speed 2626.54 samples/sec   Loss 3.8154   LearningRate 0.0077   Epoch: 14   Global Step: 599120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:22,364-Speed 2629.43 samples/sec   Loss 3.8513   LearningRate 0.0077   Epoch: 14   Global Step: 599130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:26,263-Speed 2626.82 samples/sec   Loss 3.8403   LearningRate 0.0077   Epoch: 14   Global Step: 599140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:30,163-Speed 2626.29 samples/sec   Loss 3.7399   LearningRate 0.0077   Epoch: 14   Global Step: 599150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:34,067-Speed 2623.40 samples/sec   Loss 3.8219   LearningRate 0.0077   Epoch: 14   Global Step: 599160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:37,964-Speed 2628.31 samples/sec   Loss 3.8003   LearningRate 0.0077   Epoch: 14   Global Step: 599170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:41,873-Speed 2619.68 samples/sec   Loss 3.7059   LearningRate 0.0077   Epoch: 14   Global Step: 599180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:03:45,774-Speed 2625.44 samples/sec   Loss 3.8239   LearningRate 0.0077   Epoch: 14   Global Step: 599190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:03:49,661-Speed 2635.33 samples/sec   Loss 3.8974   LearningRate 0.0077   Epoch: 14   Global Step: 599200   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:03:53,562-Speed 2626.21 samples/sec   Loss 3.8742   LearningRate 0.0077   Epoch: 14   Global Step: 599210   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:03:57,464-Speed 2624.70 samples/sec   Loss 3.7426   LearningRate 0.0077   Epoch: 14   Global Step: 599220   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:01,373-Speed 2620.21 samples/sec   Loss 3.7981   LearningRate 0.0077   Epoch: 14   Global Step: 599230   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:05,267-Speed 2630.48 samples/sec   Loss 3.6823   LearningRate 0.0077   Epoch: 14   Global Step: 599240   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:09,172-Speed 2622.73 samples/sec   Loss 3.7677   LearningRate 0.0077   Epoch: 14   Global Step: 599250   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:13,067-Speed 2629.79 samples/sec   Loss 3.7921   LearningRate 0.0077   Epoch: 14   Global Step: 599260   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:16,962-Speed 2629.01 samples/sec   Loss 3.7258   LearningRate 0.0077   Epoch: 14   Global Step: 599270   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:20,863-Speed 2626.28 samples/sec   Loss 3.8365   LearningRate 0.0077   Epoch: 14   Global Step: 599280   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:24,759-Speed 2628.37 samples/sec   Loss 3.7630   LearningRate 0.0077   Epoch: 14   Global Step: 599290   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:04:28,655-Speed 2629.64 samples/sec   Loss 3.8217   LearningRate 0.0077   Epoch: 14   Global Step: 599300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:32,553-Speed 2626.92 samples/sec   Loss 3.7657   LearningRate 0.0077   Epoch: 14   Global Step: 599310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:36,449-Speed 2628.93 samples/sec   Loss 3.6962   LearningRate 0.0077   Epoch: 14   Global Step: 599320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:40,346-Speed 2628.35 samples/sec   Loss 3.7669   LearningRate 0.0077   Epoch: 14   Global Step: 599330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:44,260-Speed 2617.54 samples/sec   Loss 3.7230   LearningRate 0.0077   Epoch: 14   Global Step: 599340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:48,157-Speed 2628.41 samples/sec   Loss 3.8254   LearningRate 0.0077   Epoch: 14   Global Step: 599350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:52,059-Speed 2624.75 samples/sec   Loss 3.8191   LearningRate 0.0077   Epoch: 14   Global Step: 599360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:55,951-Speed 2631.48 samples/sec   Loss 3.7558   LearningRate 0.0077   Epoch: 14   Global Step: 599370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:04:59,845-Speed 2630.48 samples/sec   Loss 3.7294   LearningRate 0.0077   Epoch: 14   Global Step: 599380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:03,750-Speed 2622.78 samples/sec   Loss 3.8255   LearningRate 0.0077   Epoch: 14   Global Step: 599390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:07,649-Speed 2626.69 samples/sec   Loss 3.6697   LearningRate 0.0077   Epoch: 14   Global Step: 599400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:05:11,519-Speed 2647.06 samples/sec   Loss 3.7339   LearningRate 0.0077   Epoch: 14   Global Step: 599410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:15,416-Speed 2628.08 samples/sec   Loss 3.8521   LearningRate 0.0077   Epoch: 14   Global Step: 599420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:19,329-Speed 2617.87 samples/sec   Loss 3.7613   LearningRate 0.0077   Epoch: 14   Global Step: 599430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:23,226-Speed 2628.30 samples/sec   Loss 3.7775   LearningRate 0.0077   Epoch: 14   Global Step: 599440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:27,119-Speed 2631.50 samples/sec   Loss 3.6781   LearningRate 0.0077   Epoch: 14   Global Step: 599450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:31,017-Speed 2627.02 samples/sec   Loss 3.7694   LearningRate 0.0077   Epoch: 14   Global Step: 599460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:34,923-Speed 2622.46 samples/sec   Loss 3.7506   LearningRate 0.0077   Epoch: 14   Global Step: 599470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:38,820-Speed 2627.53 samples/sec   Loss 3.8525   LearningRate 0.0077   Epoch: 14   Global Step: 599480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:42,715-Speed 2629.84 samples/sec   Loss 3.7946   LearningRate 0.0077   Epoch: 14   Global Step: 599490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:46,613-Speed 2627.54 samples/sec   Loss 3.7450   LearningRate 0.0077   Epoch: 14   Global Step: 599500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:50,496-Speed 2638.28 samples/sec   Loss 3.7841   LearningRate 0.0077   Epoch: 14   Global Step: 599510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:54,395-Speed 2626.90 samples/sec   Loss 3.8076   LearningRate 0.0077   Epoch: 14   Global Step: 599520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:05:58,294-Speed 2627.67 samples/sec   Loss 3.7255   LearningRate 0.0077   Epoch: 14   Global Step: 599530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:02,198-Speed 2623.75 samples/sec   Loss 3.7518   LearningRate 0.0077   Epoch: 14   Global Step: 599540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:06,093-Speed 2629.69 samples/sec   Loss 3.7927   LearningRate 0.0077   Epoch: 14   Global Step: 599550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:09,994-Speed 2625.83 samples/sec   Loss 3.7342   LearningRate 0.0077   Epoch: 14   Global Step: 599560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:13,886-Speed 2631.62 samples/sec   Loss 3.6384   LearningRate 0.0077   Epoch: 14   Global Step: 599570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:17,795-Speed 2620.19 samples/sec   Loss 3.6437   LearningRate 0.0077   Epoch: 14   Global Step: 599580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:21,701-Speed 2622.66 samples/sec   Loss 3.7854   LearningRate 0.0077   Epoch: 14   Global Step: 599590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:25,592-Speed 2632.70 samples/sec   Loss 3.7275   LearningRate 0.0077   Epoch: 14   Global Step: 599600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:29,507-Speed 2616.39 samples/sec   Loss 3.7043   LearningRate 0.0077   Epoch: 14   Global Step: 599610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:06:33,408-Speed 2625.26 samples/sec   Loss 3.6822   LearningRate 0.0077   Epoch: 14   Global Step: 599620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:06:37,301-Speed 2631.17 samples/sec   Loss 3.6899   LearningRate 0.0077   Epoch: 14   Global Step: 599630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:41,204-Speed 2624.54 samples/sec   Loss 3.7337   LearningRate 0.0077   Epoch: 14   Global Step: 599640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:45,117-Speed 2617.18 samples/sec   Loss 3.7490   LearningRate 0.0077   Epoch: 14   Global Step: 599650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:49,016-Speed 2627.45 samples/sec   Loss 3.7359   LearningRate 0.0077   Epoch: 14   Global Step: 599660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:52,923-Speed 2621.38 samples/sec   Loss 3.8214   LearningRate 0.0077   Epoch: 14   Global Step: 599670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:06:56,825-Speed 2625.18 samples/sec   Loss 3.7066   LearningRate 0.0077   Epoch: 14   Global Step: 599680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:07:00,726-Speed 2625.48 samples/sec   Loss 3.7291   LearningRate 0.0077   Epoch: 14   Global Step: 599690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:07:04,638-Speed 2618.10 samples/sec   Loss 3.7824   LearningRate 0.0077   Epoch: 14   Global Step: 599700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:07:08,566-Speed 2607.72 samples/sec   Loss 3.7911   LearningRate 0.0077   Epoch: 14   Global Step: 599710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:07:12,513-Speed 2595.17 samples/sec   Loss 3.7453   LearningRate 0.0077   Epoch: 14   Global Step: 599720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:07:16,445-Speed 2604.85 samples/sec   Loss 3.7696   LearningRate 0.0077   Epoch: 14   Global Step: 599730   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:20,342-Speed 2628.86 samples/sec   Loss 3.7962   LearningRate 0.0077   Epoch: 14   Global Step: 599740   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:24,259-Speed 2614.85 samples/sec   Loss 3.7350   LearningRate 0.0077   Epoch: 14   Global Step: 599750   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:28,155-Speed 2629.10 samples/sec   Loss 3.8080   LearningRate 0.0077   Epoch: 14   Global Step: 599760   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:32,051-Speed 2628.99 samples/sec   Loss 3.6901   LearningRate 0.0077   Epoch: 14   Global Step: 599770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:35,949-Speed 2627.70 samples/sec   Loss 3.7761   LearningRate 0.0077   Epoch: 14   Global Step: 599780   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:39,845-Speed 2628.76 samples/sec   Loss 3.7681   LearningRate 0.0077   Epoch: 14   Global Step: 599790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:43,746-Speed 2625.88 samples/sec   Loss 3.8151   LearningRate 0.0077   Epoch: 14   Global Step: 599800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:47,642-Speed 2629.35 samples/sec   Loss 3.7203   LearningRate 0.0077   Epoch: 14   Global Step: 599810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:51,533-Speed 2631.94 samples/sec   Loss 3.7485   LearningRate 0.0077   Epoch: 14   Global Step: 599820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:07:55,430-Speed 2628.59 samples/sec   Loss 3.7540   LearningRate 0.0077   Epoch: 14   Global Step: 599830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:07:59,332-Speed 2624.97 samples/sec   Loss 3.7356   LearningRate 0.0077   Epoch: 14   Global Step: 599840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:08:03,236-Speed 2623.78 samples/sec   Loss 3.7756   LearningRate 0.0077   Epoch: 14   Global Step: 599850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:08:07,128-Speed 2631.19 samples/sec   Loss 3.7724   LearningRate 0.0077   Epoch: 14   Global Step: 599860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:08:11,037-Speed 2620.54 samples/sec   Loss 3.6830   LearningRate 0.0077   Epoch: 14   Global Step: 599870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:08:14,938-Speed 2626.16 samples/sec   Loss 3.7379   LearningRate 0.0077   Epoch: 14   Global Step: 599880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:08:18,831-Speed 2631.35 samples/sec   Loss 3.7240   LearningRate 0.0077   Epoch: 14   Global Step: 599890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:08:22,709-Speed 2640.98 samples/sec   Loss 3.8080   LearningRate 0.0077   Epoch: 14   Global Step: 599900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:26,608-Speed 2627.22 samples/sec   Loss 3.7752   LearningRate 0.0077   Epoch: 14   Global Step: 599910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:30,506-Speed 2627.70 samples/sec   Loss 3.7143   LearningRate 0.0077   Epoch: 14   Global Step: 599920   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:34,407-Speed 2625.39 samples/sec   Loss 3.7389   LearningRate 0.0077   Epoch: 14   Global Step: 599930   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:38,299-Speed 2631.22 samples/sec   Loss 3.7115   LearningRate 0.0077   Epoch: 14   Global Step: 599940   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:42,195-Speed 2628.74 samples/sec   Loss 3.7696   LearningRate 0.0077   Epoch: 14   Global Step: 599950   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:46,181-Speed 2569.48 samples/sec   Loss 3.7328   LearningRate 0.0077   Epoch: 14   Global Step: 599960   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:50,082-Speed 2625.56 samples/sec   Loss 3.8004   LearningRate 0.0077   Epoch: 14   Global Step: 599970   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:53,979-Speed 2628.83 samples/sec   Loss 3.7494   LearningRate 0.0077   Epoch: 14   Global Step: 599980   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:08:57,877-Speed 2627.58 samples/sec   Loss 3.7860   LearningRate 0.0077   Epoch: 14   Global Step: 599990   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:09:01,800-Speed 2610.59 samples/sec   Loss 3.7552   LearningRate 0.0077   Epoch: 14   Global Step: 600000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:09:44,281-[lfw][600000]XNorm: 22.678364
Training: 2022-04-15 15:09:44,282-[lfw][600000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 15:09:44,282-[lfw][600000]Accuracy-Highest: 0.99800
Training: 2022-04-15 15:10:34,202-[cfp_fp][600000]XNorm: 21.877079
Training: 2022-04-15 15:10:34,203-[cfp_fp][600000]Accuracy-Flip: 0.98971+-0.00403
Training: 2022-04-15 15:10:34,204-[cfp_fp][600000]Accuracy-Highest: 0.99143
Training: 2022-04-15 15:11:17,095-[agedb_30][600000]XNorm: 22.971854
Training: 2022-04-15 15:11:17,096-[agedb_30][600000]Accuracy-Flip: 0.98033+-0.00572
Training: 2022-04-15 15:11:17,097-[agedb_30][600000]Accuracy-Highest: 0.98083
Training: 2022-04-15 15:11:20,973-Speed 73.58 samples/sec   Loss 3.8381   LearningRate 0.0077   Epoch: 14   Global Step: 600010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:11:24,845-Speed 2645.49 samples/sec   Loss 3.7869   LearningRate 0.0077   Epoch: 14   Global Step: 600020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:11:28,723-Speed 2641.03 samples/sec   Loss 3.8141   LearningRate 0.0077   Epoch: 14   Global Step: 600030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:11:32,607-Speed 2636.44 samples/sec   Loss 3.7497   LearningRate 0.0077   Epoch: 14   Global Step: 600040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:11:36,493-Speed 2635.54 samples/sec   Loss 3.8217   LearningRate 0.0077   Epoch: 14   Global Step: 600050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:11:40,355-Speed 2652.73 samples/sec   Loss 3.7626   LearningRate 0.0077   Epoch: 14   Global Step: 600060   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:11:44,265-Speed 2619.62 samples/sec   Loss 3.7827   LearningRate 0.0077   Epoch: 14   Global Step: 600070   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:11:48,167-Speed 2624.90 samples/sec   Loss 3.7485   LearningRate 0.0077   Epoch: 14   Global Step: 600080   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:11:52,054-Speed 2635.12 samples/sec   Loss 3.7824   LearningRate 0.0077   Epoch: 14   Global Step: 600090   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:11:55,940-Speed 2636.09 samples/sec   Loss 3.7711   LearningRate 0.0077   Epoch: 14   Global Step: 600100   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:11:59,827-Speed 2635.45 samples/sec   Loss 3.7727   LearningRate 0.0077   Epoch: 14   Global Step: 600110   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:12:03,718-Speed 2632.45 samples/sec   Loss 3.7274   LearningRate 0.0077   Epoch: 14   Global Step: 600120   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:12:07,583-Speed 2650.08 samples/sec   Loss 3.7388   LearningRate 0.0076   Epoch: 14   Global Step: 600130   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:11,476-Speed 2630.82 samples/sec   Loss 3.7702   LearningRate 0.0076   Epoch: 14   Global Step: 600140   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:15,369-Speed 2630.88 samples/sec   Loss 3.7260   LearningRate 0.0076   Epoch: 14   Global Step: 600150   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:19,271-Speed 2625.09 samples/sec   Loss 3.6948   LearningRate 0.0076   Epoch: 14   Global Step: 600160   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:23,165-Speed 2630.66 samples/sec   Loss 3.8442   LearningRate 0.0076   Epoch: 14   Global Step: 600170   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:27,062-Speed 2627.83 samples/sec   Loss 3.6644   LearningRate 0.0076   Epoch: 14   Global Step: 600180   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:30,955-Speed 2631.76 samples/sec   Loss 3.7831   LearningRate 0.0076   Epoch: 14   Global Step: 600190   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:34,849-Speed 2631.32 samples/sec   Loss 3.8019   LearningRate 0.0076   Epoch: 14   Global Step: 600200   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:38,747-Speed 2627.52 samples/sec   Loss 3.6977   LearningRate 0.0076   Epoch: 14   Global Step: 600210   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:42,655-Speed 2620.90 samples/sec   Loss 3.7359   LearningRate 0.0076   Epoch: 14   Global Step: 600220   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:12:46,546-Speed 2631.73 samples/sec   Loss 3.7177   LearningRate 0.0076   Epoch: 14   Global Step: 600230   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:12:50,447-Speed 2626.44 samples/sec   Loss 3.6378   LearningRate 0.0076   Epoch: 14   Global Step: 600240   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:12:54,340-Speed 2631.06 samples/sec   Loss 3.7998   LearningRate 0.0076   Epoch: 14   Global Step: 600250   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:12:58,249-Speed 2620.52 samples/sec   Loss 3.7435   LearningRate 0.0076   Epoch: 14   Global Step: 600260   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:13:02,143-Speed 2630.43 samples/sec   Loss 3.7196   LearningRate 0.0076   Epoch: 14   Global Step: 600270   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:13:06,035-Speed 2632.04 samples/sec   Loss 3.7333   LearningRate 0.0076   Epoch: 14   Global Step: 600280   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:13:10,026-Speed 2566.82 samples/sec   Loss 3.7127   LearningRate 0.0076   Epoch: 14   Global Step: 600290   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:13:14,122-Speed 2500.68 samples/sec   Loss 3.6176   LearningRate 0.0076   Epoch: 14   Global Step: 600300   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:13:18,227-Speed 2494.59 samples/sec   Loss 3.6996   LearningRate 0.0076   Epoch: 14   Global Step: 600310   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:13:22,320-Speed 2502.45 samples/sec   Loss 3.7187   LearningRate 0.0076   Epoch: 14   Global Step: 600320   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:13:26,413-Speed 2502.41 samples/sec   Loss 3.7271   LearningRate 0.0076   Epoch: 14   Global Step: 600330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:30,505-Speed 2503.11 samples/sec   Loss 3.6522   LearningRate 0.0076   Epoch: 14   Global Step: 600340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:34,601-Speed 2500.97 samples/sec   Loss 3.7756   LearningRate 0.0076   Epoch: 14   Global Step: 600350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:38,534-Speed 2604.37 samples/sec   Loss 3.6716   LearningRate 0.0076   Epoch: 14   Global Step: 600360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:42,430-Speed 2628.34 samples/sec   Loss 3.7469   LearningRate 0.0076   Epoch: 14   Global Step: 600370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:46,331-Speed 2626.25 samples/sec   Loss 3.7941   LearningRate 0.0076   Epoch: 14   Global Step: 600380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:50,229-Speed 2627.50 samples/sec   Loss 3.6642   LearningRate 0.0076   Epoch: 14   Global Step: 600390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:54,124-Speed 2629.54 samples/sec   Loss 3.7033   LearningRate 0.0076   Epoch: 14   Global Step: 600400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:13:58,016-Speed 2631.48 samples/sec   Loss 3.7450   LearningRate 0.0076   Epoch: 14   Global Step: 600410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:01,908-Speed 2631.82 samples/sec   Loss 3.7939   LearningRate 0.0076   Epoch: 14   Global Step: 600420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:05,809-Speed 2625.61 samples/sec   Loss 3.7542   LearningRate 0.0076   Epoch: 14   Global Step: 600430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:14:09,679-Speed 2646.78 samples/sec   Loss 3.7603   LearningRate 0.0076   Epoch: 14   Global Step: 600440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:13,572-Speed 2631.51 samples/sec   Loss 3.7849   LearningRate 0.0076   Epoch: 14   Global Step: 600450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:17,484-Speed 2617.57 samples/sec   Loss 3.8048   LearningRate 0.0076   Epoch: 14   Global Step: 600460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:21,388-Speed 2623.73 samples/sec   Loss 3.7899   LearningRate 0.0076   Epoch: 14   Global Step: 600470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:25,289-Speed 2625.94 samples/sec   Loss 3.7995   LearningRate 0.0076   Epoch: 14   Global Step: 600480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:29,194-Speed 2622.96 samples/sec   Loss 3.8386   LearningRate 0.0076   Epoch: 14   Global Step: 600490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:33,084-Speed 2632.63 samples/sec   Loss 3.7599   LearningRate 0.0076   Epoch: 14   Global Step: 600500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:37,010-Speed 2608.90 samples/sec   Loss 3.7781   LearningRate 0.0076   Epoch: 14   Global Step: 600510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:40,903-Speed 2631.27 samples/sec   Loss 3.8159   LearningRate 0.0076   Epoch: 14   Global Step: 600520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:45,018-Speed 2489.93 samples/sec   Loss 3.7752   LearningRate 0.0076   Epoch: 14   Global Step: 600530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:14:49,076-Speed 2524.36 samples/sec   Loss 3.8231   LearningRate 0.0076   Epoch: 14   Global Step: 600540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:14:52,964-Speed 2633.88 samples/sec   Loss 3.7037   LearningRate 0.0076   Epoch: 14   Global Step: 600550   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:14:56,871-Speed 2622.07 samples/sec   Loss 3.8472   LearningRate 0.0076   Epoch: 14   Global Step: 600560   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:00,768-Speed 2628.43 samples/sec   Loss 3.7346   LearningRate 0.0076   Epoch: 14   Global Step: 600570   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:04,673-Speed 2622.72 samples/sec   Loss 3.7161   LearningRate 0.0076   Epoch: 14   Global Step: 600580   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:08,583-Speed 2619.96 samples/sec   Loss 3.7516   LearningRate 0.0076   Epoch: 14   Global Step: 600590   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:12,478-Speed 2629.15 samples/sec   Loss 3.7273   LearningRate 0.0076   Epoch: 14   Global Step: 600600   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:16,380-Speed 2625.37 samples/sec   Loss 3.7929   LearningRate 0.0076   Epoch: 14   Global Step: 600610   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:20,275-Speed 2629.79 samples/sec   Loss 3.7377   LearningRate 0.0076   Epoch: 14   Global Step: 600620   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:24,174-Speed 2627.20 samples/sec   Loss 3.8136   LearningRate 0.0076   Epoch: 14   Global Step: 600630   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:28,082-Speed 2620.82 samples/sec   Loss 3.8015   LearningRate 0.0076   Epoch: 14   Global Step: 600640   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:15:31,992-Speed 2619.18 samples/sec   Loss 3.6909   LearningRate 0.0076   Epoch: 14   Global Step: 600650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:15:35,887-Speed 2629.73 samples/sec   Loss 3.7909   LearningRate 0.0076   Epoch: 14   Global Step: 600660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:15:39,809-Speed 2611.83 samples/sec   Loss 3.7338   LearningRate 0.0076   Epoch: 14   Global Step: 600670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:15:43,721-Speed 2618.05 samples/sec   Loss 3.7927   LearningRate 0.0076   Epoch: 14   Global Step: 600680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:15:47,716-Speed 2563.32 samples/sec   Loss 3.7164   LearningRate 0.0076   Epoch: 14   Global Step: 600690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:15:51,698-Speed 2572.37 samples/sec   Loss 3.7179   LearningRate 0.0076   Epoch: 14   Global Step: 600700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:15:55,593-Speed 2630.47 samples/sec   Loss 3.6896   LearningRate 0.0076   Epoch: 14   Global Step: 600710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:15:59,492-Speed 2627.08 samples/sec   Loss 3.6974   LearningRate 0.0076   Epoch: 14   Global Step: 600720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:03,392-Speed 2626.11 samples/sec   Loss 3.7388   LearningRate 0.0076   Epoch: 14   Global Step: 600730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:07,287-Speed 2629.62 samples/sec   Loss 3.6755   LearningRate 0.0076   Epoch: 14   Global Step: 600740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:11,184-Speed 2628.17 samples/sec   Loss 3.7223   LearningRate 0.0076   Epoch: 14   Global Step: 600750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:16:15,061-Speed 2641.91 samples/sec   Loss 3.7380   LearningRate 0.0076   Epoch: 14   Global Step: 600760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:18,966-Speed 2622.89 samples/sec   Loss 3.7232   LearningRate 0.0076   Epoch: 14   Global Step: 600770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:22,872-Speed 2622.03 samples/sec   Loss 3.7313   LearningRate 0.0076   Epoch: 14   Global Step: 600780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:26,769-Speed 2628.36 samples/sec   Loss 3.7414   LearningRate 0.0076   Epoch: 14   Global Step: 600790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:30,664-Speed 2629.54 samples/sec   Loss 3.7373   LearningRate 0.0076   Epoch: 14   Global Step: 600800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:34,558-Speed 2631.26 samples/sec   Loss 3.7861   LearningRate 0.0076   Epoch: 14   Global Step: 600810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:38,450-Speed 2631.42 samples/sec   Loss 3.7532   LearningRate 0.0076   Epoch: 14   Global Step: 600820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:42,345-Speed 2628.98 samples/sec   Loss 3.6559   LearningRate 0.0076   Epoch: 14   Global Step: 600830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:46,334-Speed 2568.61 samples/sec   Loss 3.7666   LearningRate 0.0076   Epoch: 14   Global Step: 600840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:50,225-Speed 2632.59 samples/sec   Loss 3.7881   LearningRate 0.0076   Epoch: 14   Global Step: 600850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:16:54,120-Speed 2629.74 samples/sec   Loss 3.7711   LearningRate 0.0076   Epoch: 14   Global Step: 600860   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:16:58,002-Speed 2638.87 samples/sec   Loss 3.7026   LearningRate 0.0076   Epoch: 14   Global Step: 600870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:01,901-Speed 2626.25 samples/sec   Loss 3.8238   LearningRate 0.0076   Epoch: 14   Global Step: 600880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:05,818-Speed 2615.38 samples/sec   Loss 3.7676   LearningRate 0.0076   Epoch: 14   Global Step: 600890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:09,715-Speed 2628.80 samples/sec   Loss 3.7436   LearningRate 0.0076   Epoch: 14   Global Step: 600900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:13,618-Speed 2624.11 samples/sec   Loss 3.8770   LearningRate 0.0076   Epoch: 14   Global Step: 600910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:17,509-Speed 2632.48 samples/sec   Loss 3.7232   LearningRate 0.0076   Epoch: 14   Global Step: 600920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:21,405-Speed 2629.23 samples/sec   Loss 3.7152   LearningRate 0.0076   Epoch: 14   Global Step: 600930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:25,296-Speed 2632.60 samples/sec   Loss 3.7208   LearningRate 0.0076   Epoch: 14   Global Step: 600940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:29,196-Speed 2626.45 samples/sec   Loss 3.7551   LearningRate 0.0076   Epoch: 14   Global Step: 600950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:33,093-Speed 2627.95 samples/sec   Loss 3.6779   LearningRate 0.0076   Epoch: 14   Global Step: 600960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:36,988-Speed 2629.73 samples/sec   Loss 3.7346   LearningRate 0.0076   Epoch: 14   Global Step: 600970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:17:40,856-Speed 2648.22 samples/sec   Loss 3.7819   LearningRate 0.0076   Epoch: 14   Global Step: 600980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:44,753-Speed 2628.35 samples/sec   Loss 3.7525   LearningRate 0.0076   Epoch: 14   Global Step: 600990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:48,726-Speed 2577.41 samples/sec   Loss 3.6895   LearningRate 0.0076   Epoch: 14   Global Step: 601000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:52,817-Speed 2503.88 samples/sec   Loss 3.7930   LearningRate 0.0076   Epoch: 14   Global Step: 601010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:17:56,711-Speed 2629.78 samples/sec   Loss 3.7178   LearningRate 0.0076   Epoch: 14   Global Step: 601020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:00,698-Speed 2569.51 samples/sec   Loss 3.7606   LearningRate 0.0076   Epoch: 14   Global Step: 601030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:04,597-Speed 2626.95 samples/sec   Loss 3.8138   LearningRate 0.0076   Epoch: 14   Global Step: 601040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:08,493-Speed 2628.76 samples/sec   Loss 3.7702   LearningRate 0.0076   Epoch: 14   Global Step: 601050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:12,388-Speed 2629.83 samples/sec   Loss 3.7322   LearningRate 0.0076   Epoch: 14   Global Step: 601060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:16,283-Speed 2629.92 samples/sec   Loss 3.7206   LearningRate 0.0076   Epoch: 14   Global Step: 601070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:20,188-Speed 2623.24 samples/sec   Loss 3.7559   LearningRate 0.0076   Epoch: 14   Global Step: 601080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:18:24,086-Speed 2627.43 samples/sec   Loss 3.7041   LearningRate 0.0076   Epoch: 14   Global Step: 601090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:18:27,976-Speed 2632.85 samples/sec   Loss 3.7354   LearningRate 0.0076   Epoch: 14   Global Step: 601100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:18:31,854-Speed 2640.51 samples/sec   Loss 3.7556   LearningRate 0.0076   Epoch: 14   Global Step: 601110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:35,751-Speed 2629.02 samples/sec   Loss 3.6747   LearningRate 0.0076   Epoch: 14   Global Step: 601120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:39,645-Speed 2629.94 samples/sec   Loss 3.7242   LearningRate 0.0076   Epoch: 14   Global Step: 601130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:43,540-Speed 2629.86 samples/sec   Loss 3.7030   LearningRate 0.0076   Epoch: 14   Global Step: 601140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:47,721-Speed 2449.64 samples/sec   Loss 3.7305   LearningRate 0.0076   Epoch: 14   Global Step: 601150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:51,612-Speed 2633.02 samples/sec   Loss 3.8387   LearningRate 0.0076   Epoch: 14   Global Step: 601160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:55,525-Speed 2617.81 samples/sec   Loss 3.6957   LearningRate 0.0076   Epoch: 14   Global Step: 601170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:18:59,423-Speed 2627.10 samples/sec   Loss 3.7489   LearningRate 0.0076   Epoch: 14   Global Step: 601180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:03,321-Speed 2627.64 samples/sec   Loss 3.7280   LearningRate 0.0076   Epoch: 14   Global Step: 601190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:07,214-Speed 2631.06 samples/sec   Loss 3.6745   LearningRate 0.0076   Epoch: 14   Global Step: 601200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:11,091-Speed 2642.39 samples/sec   Loss 3.7374   LearningRate 0.0076   Epoch: 14   Global Step: 601210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:15,027-Speed 2602.61 samples/sec   Loss 3.7079   LearningRate 0.0076   Epoch: 14   Global Step: 601220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:18,923-Speed 2628.76 samples/sec   Loss 3.7675   LearningRate 0.0076   Epoch: 14   Global Step: 601230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:22,814-Speed 2631.71 samples/sec   Loss 3.6714   LearningRate 0.0076   Epoch: 14   Global Step: 601240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:26,713-Speed 2627.23 samples/sec   Loss 3.7043   LearningRate 0.0076   Epoch: 14   Global Step: 601250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:30,632-Speed 2614.65 samples/sec   Loss 3.7577   LearningRate 0.0076   Epoch: 14   Global Step: 601260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:34,525-Speed 2630.63 samples/sec   Loss 3.7549   LearningRate 0.0076   Epoch: 14   Global Step: 601270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:38,427-Speed 2624.66 samples/sec   Loss 3.7828   LearningRate 0.0076   Epoch: 14   Global Step: 601280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:42,329-Speed 2624.94 samples/sec   Loss 3.7179   LearningRate 0.0076   Epoch: 14   Global Step: 601290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:46,414-Speed 2507.63 samples/sec   Loss 3.6068   LearningRate 0.0076   Epoch: 14   Global Step: 601300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:50,414-Speed 2560.77 samples/sec   Loss 3.7449   LearningRate 0.0076   Epoch: 14   Global Step: 601310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-15 15:19:54,286-Speed 2645.37 samples/sec   Loss 3.7429   LearningRate 0.0076   Epoch: 14   Global Step: 601320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:19:58,194-Speed 2620.96 samples/sec   Loss 3.6991   LearningRate 0.0076   Epoch: 14   Global Step: 601330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:02,093-Speed 2626.86 samples/sec   Loss 3.7098   LearningRate 0.0076   Epoch: 14   Global Step: 601340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:05,994-Speed 2625.73 samples/sec   Loss 3.7159   LearningRate 0.0076   Epoch: 14   Global Step: 601350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:09,906-Speed 2618.72 samples/sec   Loss 3.7770   LearningRate 0.0076   Epoch: 14   Global Step: 601360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:13,850-Speed 2596.79 samples/sec   Loss 3.7113   LearningRate 0.0076   Epoch: 14   Global Step: 601370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:17,743-Speed 2631.16 samples/sec   Loss 3.7576   LearningRate 0.0076   Epoch: 14   Global Step: 601380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:21,651-Speed 2621.27 samples/sec   Loss 3.6528   LearningRate 0.0076   Epoch: 14   Global Step: 601390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:25,560-Speed 2620.20 samples/sec   Loss 3.7017   LearningRate 0.0076   Epoch: 14   Global Step: 601400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:20:29,440-Speed 2640.18 samples/sec   Loss 3.7260   LearningRate 0.0076   Epoch: 14   Global Step: 601410   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:20:33,337-Speed 2627.97 samples/sec   Loss 3.6660   LearningRate 0.0076   Epoch: 14   Global Step: 601420   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:20:37,232-Speed 2629.83 samples/sec   Loss 3.6827   LearningRate 0.0076   Epoch: 14   Global Step: 601430   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:20:41,125-Speed 2630.95 samples/sec   Loss 3.7348   LearningRate 0.0076   Epoch: 14   Global Step: 601440   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:20:45,026-Speed 2625.87 samples/sec   Loss 3.7367   LearningRate 0.0076   Epoch: 14   Global Step: 601450   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:20:48,918-Speed 2631.70 samples/sec   Loss 3.7478   LearningRate 0.0076   Epoch: 14   Global Step: 601460   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:20:52,826-Speed 2621.03 samples/sec   Loss 3.7219   LearningRate 0.0076   Epoch: 14   Global Step: 601470   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:20:56,722-Speed 2629.92 samples/sec   Loss 3.7370   LearningRate 0.0076   Epoch: 14   Global Step: 601480   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:00,613-Speed 2632.71 samples/sec   Loss 3.7269   LearningRate 0.0076   Epoch: 14   Global Step: 601490   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:04,508-Speed 2629.55 samples/sec   Loss 3.6875   LearningRate 0.0076   Epoch: 14   Global Step: 601500   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:08,410-Speed 2624.76 samples/sec   Loss 3.7318   LearningRate 0.0076   Epoch: 14   Global Step: 601510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:21:12,317-Speed 2621.35 samples/sec   Loss 3.7067   LearningRate 0.0076   Epoch: 14   Global Step: 601520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:21:16,220-Speed 2624.49 samples/sec   Loss 3.6528   LearningRate 0.0076   Epoch: 14   Global Step: 601530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:21:20,116-Speed 2629.47 samples/sec   Loss 3.5965   LearningRate 0.0076   Epoch: 14   Global Step: 601540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:21:24,006-Speed 2632.92 samples/sec   Loss 3.7538   LearningRate 0.0076   Epoch: 14   Global Step: 601550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:21:27,906-Speed 2626.07 samples/sec   Loss 3.6919   LearningRate 0.0076   Epoch: 14   Global Step: 601560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:21:31,775-Speed 2648.00 samples/sec   Loss 3.7515   LearningRate 0.0076   Epoch: 14   Global Step: 601570   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:35,674-Speed 2627.01 samples/sec   Loss 3.7398   LearningRate 0.0076   Epoch: 14   Global Step: 601580   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:39,719-Speed 2532.31 samples/sec   Loss 3.7514   LearningRate 0.0076   Epoch: 14   Global Step: 601590   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:43,812-Speed 2502.47 samples/sec   Loss 3.7699   LearningRate 0.0076   Epoch: 14   Global Step: 601600   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:47,893-Speed 2509.61 samples/sec   Loss 3.7913   LearningRate 0.0076   Epoch: 14   Global Step: 601610   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:51,867-Speed 2577.98 samples/sec   Loss 3.7290   LearningRate 0.0076   Epoch: 14   Global Step: 601620   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:55,757-Speed 2633.14 samples/sec   Loss 3.6071   LearningRate 0.0076   Epoch: 14   Global Step: 601630   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:21:59,648-Speed 2632.25 samples/sec   Loss 3.7540   LearningRate 0.0075   Epoch: 14   Global Step: 601640   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:03,555-Speed 2620.88 samples/sec   Loss 3.7487   LearningRate 0.0075   Epoch: 14   Global Step: 601650   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:07,448-Speed 2631.35 samples/sec   Loss 3.6261   LearningRate 0.0075   Epoch: 14   Global Step: 601660   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:11,356-Speed 2620.88 samples/sec   Loss 3.6937   LearningRate 0.0075   Epoch: 14   Global Step: 601670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:22:15,250-Speed 2631.05 samples/sec   Loss 3.8268   LearningRate 0.0075   Epoch: 14   Global Step: 601680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:22:19,150-Speed 2626.38 samples/sec   Loss 3.6620   LearningRate 0.0075   Epoch: 14   Global Step: 601690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:22:23,029-Speed 2639.92 samples/sec   Loss 3.7842   LearningRate 0.0075   Epoch: 14   Global Step: 601700   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:26,931-Speed 2625.16 samples/sec   Loss 3.7429   LearningRate 0.0075   Epoch: 14   Global Step: 601710   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:30,824-Speed 2632.05 samples/sec   Loss 3.7268   LearningRate 0.0075   Epoch: 14   Global Step: 601720   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:34,723-Speed 2626.97 samples/sec   Loss 3.7268   LearningRate 0.0075   Epoch: 14   Global Step: 601730   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:38,622-Speed 2626.97 samples/sec   Loss 3.7393   LearningRate 0.0075   Epoch: 14   Global Step: 601740   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:42,522-Speed 2626.40 samples/sec   Loss 3.7263   LearningRate 0.0075   Epoch: 14   Global Step: 601750   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:22:46,397-Speed 2643.73 samples/sec   Loss 3.7890   LearningRate 0.0075   Epoch: 14   Global Step: 601760   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:22:50,305-Speed 2621.19 samples/sec   Loss 3.7334   LearningRate 0.0075   Epoch: 14   Global Step: 601770   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:22:54,209-Speed 2623.42 samples/sec   Loss 3.6979   LearningRate 0.0075   Epoch: 14   Global Step: 601780   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:22:58,112-Speed 2623.78 samples/sec   Loss 3.6332   LearningRate 0.0075   Epoch: 14   Global Step: 601790   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:23:02,022-Speed 2619.74 samples/sec   Loss 3.7538   LearningRate 0.0075   Epoch: 14   Global Step: 601800   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:23:05,921-Speed 2627.44 samples/sec   Loss 3.7075   LearningRate 0.0075   Epoch: 14   Global Step: 601810   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:23:09,814-Speed 2631.12 samples/sec   Loss 3.7157   LearningRate 0.0075   Epoch: 14   Global Step: 601820   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:23:13,717-Speed 2624.43 samples/sec   Loss 3.7211   LearningRate 0.0075   Epoch: 14   Global Step: 601830   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:23:17,617-Speed 2625.92 samples/sec   Loss 3.7172   LearningRate 0.0075   Epoch: 14   Global Step: 601840   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:23:21,512-Speed 2630.26 samples/sec   Loss 3.7656   LearningRate 0.0075   Epoch: 14   Global Step: 601850   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-04-15 15:23:25,408-Speed 2629.29 samples/sec   Loss 3.6656   LearningRate 0.0075   Epoch: 14   Global Step: 601860   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:29,302-Speed 2630.11 samples/sec   Loss 3.7422   LearningRate 0.0075   Epoch: 14   Global Step: 601870   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:33,210-Speed 2620.92 samples/sec   Loss 3.7708   LearningRate 0.0075   Epoch: 14   Global Step: 601880   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:37,105-Speed 2629.84 samples/sec   Loss 3.8114   LearningRate 0.0075   Epoch: 14   Global Step: 601890   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:41,015-Speed 2619.94 samples/sec   Loss 3.7963   LearningRate 0.0075   Epoch: 14   Global Step: 601900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:44,932-Speed 2615.09 samples/sec   Loss 3.7368   LearningRate 0.0075   Epoch: 14   Global Step: 601910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:48,845-Speed 2616.97 samples/sec   Loss 3.7499   LearningRate 0.0075   Epoch: 14   Global Step: 601920   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:52,738-Speed 2631.11 samples/sec   Loss 3.7200   LearningRate 0.0075   Epoch: 14   Global Step: 601930   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:23:56,634-Speed 2629.39 samples/sec   Loss 3.7351   LearningRate 0.0075   Epoch: 14   Global Step: 601940   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:00,528-Speed 2630.82 samples/sec   Loss 3.7823   LearningRate 0.0075   Epoch: 14   Global Step: 601950   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:04,424-Speed 2629.01 samples/sec   Loss 3.7630   LearningRate 0.0075   Epoch: 14   Global Step: 601960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:24:08,403-Speed 2573.90 samples/sec   Loss 3.7665   LearningRate 0.0075   Epoch: 14   Global Step: 601970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:24:12,308-Speed 2622.31 samples/sec   Loss 3.6254   LearningRate 0.0075   Epoch: 14   Global Step: 601980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:24:16,206-Speed 2628.14 samples/sec   Loss 3.7344   LearningRate 0.0075   Epoch: 14   Global Step: 601990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:24:20,104-Speed 2627.76 samples/sec   Loss 3.7114   LearningRate 0.0075   Epoch: 14   Global Step: 602000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:24:23,980-Speed 2642.04 samples/sec   Loss 3.7185   LearningRate 0.0075   Epoch: 14   Global Step: 602010   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:27,878-Speed 2628.00 samples/sec   Loss 3.7308   LearningRate 0.0075   Epoch: 14   Global Step: 602020   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:31,772-Speed 2631.22 samples/sec   Loss 3.7136   LearningRate 0.0075   Epoch: 14   Global Step: 602030   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:35,664-Speed 2631.59 samples/sec   Loss 3.6506   LearningRate 0.0075   Epoch: 14   Global Step: 602040   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:39,558-Speed 2630.94 samples/sec   Loss 3.6178   LearningRate 0.0075   Epoch: 14   Global Step: 602050   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:43,447-Speed 2633.16 samples/sec   Loss 3.7076   LearningRate 0.0075   Epoch: 14   Global Step: 602060   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:47,340-Speed 2630.83 samples/sec   Loss 3.6996   LearningRate 0.0075   Epoch: 14   Global Step: 602070   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:51,242-Speed 2625.76 samples/sec   Loss 3.7007   LearningRate 0.0075   Epoch: 14   Global Step: 602080   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:55,136-Speed 2629.83 samples/sec   Loss 3.5941   LearningRate 0.0075   Epoch: 14   Global Step: 602090   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:24:59,033-Speed 2628.66 samples/sec   Loss 3.7297   LearningRate 0.0075   Epoch: 14   Global Step: 602100   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:02,934-Speed 2625.65 samples/sec   Loss 3.7503   LearningRate 0.0075   Epoch: 14   Global Step: 602110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:25:06,840-Speed 2622.45 samples/sec   Loss 3.6918   LearningRate 0.0075   Epoch: 14   Global Step: 602120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:25:10,712-Speed 2645.53 samples/sec   Loss 3.7672   LearningRate 0.0075   Epoch: 14   Global Step: 602130   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:14,602-Speed 2633.05 samples/sec   Loss 3.6187   LearningRate 0.0075   Epoch: 14   Global Step: 602140   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:18,497-Speed 2629.86 samples/sec   Loss 3.7710   LearningRate 0.0075   Epoch: 14   Global Step: 602150   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:22,394-Speed 2627.73 samples/sec   Loss 3.7305   LearningRate 0.0075   Epoch: 14   Global Step: 602160   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:26,372-Speed 2575.53 samples/sec   Loss 3.7696   LearningRate 0.0075   Epoch: 14   Global Step: 602170   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:30,266-Speed 2629.88 samples/sec   Loss 3.7170   LearningRate 0.0075   Epoch: 14   Global Step: 602180   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:34,183-Speed 2614.90 samples/sec   Loss 3.6937   LearningRate 0.0075   Epoch: 14   Global Step: 602190   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:38,079-Speed 2628.77 samples/sec   Loss 3.7621   LearningRate 0.0075   Epoch: 14   Global Step: 602200   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:42,005-Speed 2610.29 samples/sec   Loss 3.7428   LearningRate 0.0075   Epoch: 14   Global Step: 602210   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:45,896-Speed 2633.08 samples/sec   Loss 3.6598   LearningRate 0.0075   Epoch: 14   Global Step: 602220   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:25:49,810-Speed 2616.43 samples/sec   Loss 3.8362   LearningRate 0.0075   Epoch: 14   Global Step: 602230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:25:53,726-Speed 2615.46 samples/sec   Loss 3.6958   LearningRate 0.0075   Epoch: 14   Global Step: 602240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:25:57,623-Speed 2628.08 samples/sec   Loss 3.6886   LearningRate 0.0075   Epoch: 14   Global Step: 602250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:26:01,529-Speed 2622.69 samples/sec   Loss 3.7141   LearningRate 0.0075   Epoch: 14   Global Step: 602260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:26:05,408-Speed 2641.18 samples/sec   Loss 3.6863   LearningRate 0.0075   Epoch: 14   Global Step: 602270   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:09,345-Speed 2601.19 samples/sec   Loss 3.6946   LearningRate 0.0075   Epoch: 14   Global Step: 602280   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:13,248-Speed 2624.37 samples/sec   Loss 3.7090   LearningRate 0.0075   Epoch: 14   Global Step: 602290   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:17,171-Speed 2610.87 samples/sec   Loss 3.7557   LearningRate 0.0075   Epoch: 14   Global Step: 602300   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:21,084-Speed 2617.98 samples/sec   Loss 3.7469   LearningRate 0.0075   Epoch: 14   Global Step: 602310   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:24,993-Speed 2620.46 samples/sec   Loss 3.7279   LearningRate 0.0075   Epoch: 14   Global Step: 602320   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:28,900-Speed 2620.99 samples/sec   Loss 3.6184   LearningRate 0.0075   Epoch: 14   Global Step: 602330   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:32,804-Speed 2624.23 samples/sec   Loss 3.6862   LearningRate 0.0075   Epoch: 14   Global Step: 602340   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:36,694-Speed 2632.99 samples/sec   Loss 3.7773   LearningRate 0.0075   Epoch: 14   Global Step: 602350   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:40,585-Speed 2632.19 samples/sec   Loss 3.7763   LearningRate 0.0075   Epoch: 14   Global Step: 602360   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:26:44,485-Speed 2626.28 samples/sec   Loss 3.6499   LearningRate 0.0075   Epoch: 14   Global Step: 602370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:26:48,386-Speed 2625.15 samples/sec   Loss 3.7167   LearningRate 0.0075   Epoch: 14   Global Step: 602380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:26:52,278-Speed 2632.26 samples/sec   Loss 3.7396   LearningRate 0.0075   Epoch: 14   Global Step: 602390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:26:56,172-Speed 2630.68 samples/sec   Loss 3.7781   LearningRate 0.0075   Epoch: 14   Global Step: 602400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:00,064-Speed 2631.58 samples/sec   Loss 3.7316   LearningRate 0.0075   Epoch: 14   Global Step: 602410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:03,960-Speed 2629.12 samples/sec   Loss 3.6901   LearningRate 0.0075   Epoch: 14   Global Step: 602420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:07,853-Speed 2630.75 samples/sec   Loss 3.8093   LearningRate 0.0075   Epoch: 14   Global Step: 602430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:11,752-Speed 2627.53 samples/sec   Loss 3.7376   LearningRate 0.0075   Epoch: 14   Global Step: 602440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:15,659-Speed 2621.63 samples/sec   Loss 3.7669   LearningRate 0.0075   Epoch: 14   Global Step: 602450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:19,554-Speed 2629.11 samples/sec   Loss 3.6306   LearningRate 0.0075   Epoch: 14   Global Step: 602460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:23,430-Speed 2644.47 samples/sec   Loss 3.7767   LearningRate 0.0075   Epoch: 14   Global Step: 602470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:27:27,300-Speed 2647.92 samples/sec   Loss 3.8002   LearningRate 0.0075   Epoch: 14   Global Step: 602480   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:31,197-Speed 2628.38 samples/sec   Loss 3.8161   LearningRate 0.0075   Epoch: 14   Global Step: 602490   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:35,089-Speed 2631.96 samples/sec   Loss 3.7876   LearningRate 0.0075   Epoch: 14   Global Step: 602500   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:38,984-Speed 2629.26 samples/sec   Loss 3.7103   LearningRate 0.0075   Epoch: 14   Global Step: 602510   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:42,877-Speed 2630.93 samples/sec   Loss 3.7505   LearningRate 0.0075   Epoch: 14   Global Step: 602520   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:46,769-Speed 2631.38 samples/sec   Loss 3.7972   LearningRate 0.0075   Epoch: 14   Global Step: 602530   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:50,660-Speed 2632.47 samples/sec   Loss 3.6945   LearningRate 0.0075   Epoch: 14   Global Step: 602540   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:54,554-Speed 2630.50 samples/sec   Loss 3.7207   LearningRate 0.0075   Epoch: 14   Global Step: 602550   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:27:58,445-Speed 2631.86 samples/sec   Loss 3.6568   LearningRate 0.0075   Epoch: 14   Global Step: 602560   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:28:02,337-Speed 2632.64 samples/sec   Loss 3.7803   LearningRate 0.0075   Epoch: 14   Global Step: 602570   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-15 15:28:06,258-Speed 2612.73 samples/sec   Loss 3.7120   LearningRate 0.0075   Epoch: 14   Global Step: 602580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:10,163-Speed 2622.88 samples/sec   Loss 3.7543   LearningRate 0.0075   Epoch: 14   Global Step: 602590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:14,067-Speed 2623.39 samples/sec   Loss 3.7625   LearningRate 0.0075   Epoch: 14   Global Step: 602600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:17,962-Speed 2629.59 samples/sec   Loss 3.6682   LearningRate 0.0075   Epoch: 14   Global Step: 602610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:21,879-Speed 2615.11 samples/sec   Loss 3.7816   LearningRate 0.0075   Epoch: 14   Global Step: 602620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:25,780-Speed 2625.81 samples/sec   Loss 3.7357   LearningRate 0.0075   Epoch: 14   Global Step: 602630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:29,680-Speed 2625.85 samples/sec   Loss 3.6451   LearningRate 0.0075   Epoch: 14   Global Step: 602640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:33,578-Speed 2627.58 samples/sec   Loss 3.7229   LearningRate 0.0075   Epoch: 14   Global Step: 602650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:37,475-Speed 2628.44 samples/sec   Loss 3.7102   LearningRate 0.0075   Epoch: 14   Global Step: 602660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:41,388-Speed 2618.33 samples/sec   Loss 3.7150   LearningRate 0.0075   Epoch: 14   Global Step: 602670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:45,266-Speed 2641.36 samples/sec   Loss 3.7456   LearningRate 0.0075   Epoch: 14   Global Step: 602680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:49,168-Speed 2624.94 samples/sec   Loss 3.7750   LearningRate 0.0075   Epoch: 14   Global Step: 602690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:53,067-Speed 2626.78 samples/sec   Loss 3.6965   LearningRate 0.0075   Epoch: 14   Global Step: 602700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:28:57,013-Speed 2595.66 samples/sec   Loss 3.7705   LearningRate 0.0075   Epoch: 14   Global Step: 602710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:29:00,909-Speed 2629.45 samples/sec   Loss 3.6838   LearningRate 0.0075   Epoch: 14   Global Step: 602720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:29:04,807-Speed 2627.63 samples/sec   Loss 3.7176   LearningRate 0.0075   Epoch: 14   Global Step: 602730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-15 15:29:08,745-Speed 2600.50 samples/sec   Loss 3.7309   LearningRate 0.0075   Epoch: 14   Global Step: 602740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:12,646-Speed 2626.10 samples/sec   Loss 3.6945   LearningRate 0.0075   Epoch: 14   Global Step: 602750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:16,547-Speed 2625.63 samples/sec   Loss 3.6195   LearningRate 0.0075   Epoch: 14   Global Step: 602760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:20,444-Speed 2628.59 samples/sec   Loss 3.6550   LearningRate 0.0075   Epoch: 14   Global Step: 602770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:24,329-Speed 2636.34 samples/sec   Loss 3.6499   LearningRate 0.0075   Epoch: 14   Global Step: 602780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:28,234-Speed 2622.64 samples/sec   Loss 3.6995   LearningRate 0.0075   Epoch: 14   Global Step: 602790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:32,127-Speed 2631.28 samples/sec   Loss 3.6214   LearningRate 0.0075   Epoch: 14   Global Step: 602800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:36,155-Speed 2543.66 samples/sec   Loss 3.6866   LearningRate 0.0075   Epoch: 14   Global Step: 602810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:40,086-Speed 2605.55 samples/sec   Loss 3.7216   LearningRate 0.0075   Epoch: 14   Global Step: 602820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:43,989-Speed 2624.11 samples/sec   Loss 3.8232   LearningRate 0.0075   Epoch: 14   Global Step: 602830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:47,921-Speed 2604.58 samples/sec   Loss 3.6026   LearningRate 0.0075   Epoch: 14   Global Step: 602840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:51,830-Speed 2621.21 samples/sec   Loss 3.7543   LearningRate 0.0075   Epoch: 14   Global Step: 602850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:55,721-Speed 2632.13 samples/sec   Loss 3.7138   LearningRate 0.0075   Epoch: 14   Global Step: 602860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:29:59,623-Speed 2624.93 samples/sec   Loss 3.7170   LearningRate 0.0075   Epoch: 14   Global Step: 602870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:03,553-Speed 2605.77 samples/sec   Loss 3.7004   LearningRate 0.0075   Epoch: 14   Global Step: 602880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:30:07,468-Speed 2616.91 samples/sec   Loss 3.7541   LearningRate 0.0075   Epoch: 14   Global Step: 602890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:11,401-Speed 2604.70 samples/sec   Loss 3.7145   LearningRate 0.0075   Epoch: 14   Global Step: 602900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:15,326-Speed 2608.90 samples/sec   Loss 3.7350   LearningRate 0.0075   Epoch: 14   Global Step: 602910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:19,234-Speed 2621.57 samples/sec   Loss 3.6733   LearningRate 0.0075   Epoch: 14   Global Step: 602920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:23,144-Speed 2619.27 samples/sec   Loss 3.6978   LearningRate 0.0075   Epoch: 14   Global Step: 602930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:27,051-Speed 2621.99 samples/sec   Loss 3.7742   LearningRate 0.0075   Epoch: 14   Global Step: 602940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:30,950-Speed 2627.16 samples/sec   Loss 3.6964   LearningRate 0.0075   Epoch: 14   Global Step: 602950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:34,854-Speed 2623.59 samples/sec   Loss 3.7241   LearningRate 0.0075   Epoch: 14   Global Step: 602960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:38,753-Speed 2626.37 samples/sec   Loss 3.7792   LearningRate 0.0075   Epoch: 14   Global Step: 602970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:42,663-Speed 2620.22 samples/sec   Loss 3.7390   LearningRate 0.0075   Epoch: 14   Global Step: 602980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:46,565-Speed 2625.05 samples/sec   Loss 3.7282   LearningRate 0.0075   Epoch: 14   Global Step: 602990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:30:50,446-Speed 2639.24 samples/sec   Loss 3.7346   LearningRate 0.0075   Epoch: 14   Global Step: 603000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:54,348-Speed 2624.51 samples/sec   Loss 3.6980   LearningRate 0.0075   Epoch: 14   Global Step: 603010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:30:58,249-Speed 2625.80 samples/sec   Loss 3.8062   LearningRate 0.0075   Epoch: 14   Global Step: 603020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:02,152-Speed 2624.57 samples/sec   Loss 3.6723   LearningRate 0.0075   Epoch: 14   Global Step: 603030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:06,066-Speed 2617.41 samples/sec   Loss 3.6589   LearningRate 0.0075   Epoch: 14   Global Step: 603040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:09,966-Speed 2625.92 samples/sec   Loss 3.7809   LearningRate 0.0075   Epoch: 14   Global Step: 603050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:13,857-Speed 2632.07 samples/sec   Loss 3.6850   LearningRate 0.0075   Epoch: 14   Global Step: 603060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:17,756-Speed 2627.46 samples/sec   Loss 3.6783   LearningRate 0.0075   Epoch: 14   Global Step: 603070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:21,649-Speed 2631.74 samples/sec   Loss 3.7495   LearningRate 0.0075   Epoch: 14   Global Step: 603080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:25,541-Speed 2631.13 samples/sec   Loss 3.6774   LearningRate 0.0075   Epoch: 14   Global Step: 603090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:29,412-Speed 2646.35 samples/sec   Loss 3.7375   LearningRate 0.0075   Epoch: 14   Global Step: 603100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:33,309-Speed 2627.92 samples/sec   Loss 3.6615   LearningRate 0.0075   Epoch: 14   Global Step: 603110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:37,202-Speed 2630.77 samples/sec   Loss 3.7762   LearningRate 0.0075   Epoch: 14   Global Step: 603120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:41,104-Speed 2625.82 samples/sec   Loss 3.7908   LearningRate 0.0075   Epoch: 14   Global Step: 603130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:45,001-Speed 2628.15 samples/sec   Loss 3.7453   LearningRate 0.0075   Epoch: 14   Global Step: 603140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:48,914-Speed 2617.53 samples/sec   Loss 3.7010   LearningRate 0.0074   Epoch: 14   Global Step: 603150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:52,801-Speed 2635.26 samples/sec   Loss 3.6043   LearningRate 0.0074   Epoch: 14   Global Step: 603160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:31:56,687-Speed 2635.73 samples/sec   Loss 3.6268   LearningRate 0.0074   Epoch: 14   Global Step: 603170   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:00,578-Speed 2631.97 samples/sec   Loss 3.6947   LearningRate 0.0074   Epoch: 14   Global Step: 603180   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:04,472-Speed 2630.29 samples/sec   Loss 3.6820   LearningRate 0.0074   Epoch: 14   Global Step: 603190   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:08,373-Speed 2625.76 samples/sec   Loss 3.6654   LearningRate 0.0074   Epoch: 14   Global Step: 603200   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:12,266-Speed 2631.11 samples/sec   Loss 3.7179   LearningRate 0.0074   Epoch: 14   Global Step: 603210   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:16,156-Speed 2633.28 samples/sec   Loss 3.7058   LearningRate 0.0074   Epoch: 14   Global Step: 603220   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:20,055-Speed 2627.21 samples/sec   Loss 3.7600   LearningRate 0.0074   Epoch: 14   Global Step: 603230   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:23,958-Speed 2623.54 samples/sec   Loss 3.7681   LearningRate 0.0074   Epoch: 14   Global Step: 603240   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:27,854-Speed 2629.35 samples/sec   Loss 3.7189   LearningRate 0.0074   Epoch: 14   Global Step: 603250   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:31,776-Speed 2611.77 samples/sec   Loss 3.7495   LearningRate 0.0074   Epoch: 14   Global Step: 603260   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:32:35,679-Speed 2624.34 samples/sec   Loss 3.6930   LearningRate 0.0074   Epoch: 14   Global Step: 603270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:32:39,576-Speed 2627.63 samples/sec   Loss 3.6980   LearningRate 0.0074   Epoch: 14   Global Step: 603280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:32:43,468-Speed 2631.79 samples/sec   Loss 3.7618   LearningRate 0.0074   Epoch: 14   Global Step: 603290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:32:47,374-Speed 2622.91 samples/sec   Loss 3.7541   LearningRate 0.0074   Epoch: 14   Global Step: 603300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:32:51,267-Speed 2631.51 samples/sec   Loss 3.7292   LearningRate 0.0074   Epoch: 14   Global Step: 603310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:32:55,159-Speed 2630.92 samples/sec   Loss 3.5851   LearningRate 0.0074   Epoch: 14   Global Step: 603320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:32:59,052-Speed 2631.15 samples/sec   Loss 3.6722   LearningRate 0.0074   Epoch: 14   Global Step: 603330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:02,958-Speed 2622.40 samples/sec   Loss 3.7037   LearningRate 0.0074   Epoch: 14   Global Step: 603340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:06,878-Speed 2613.20 samples/sec   Loss 3.7975   LearningRate 0.0074   Epoch: 14   Global Step: 603350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:10,771-Speed 2630.90 samples/sec   Loss 3.6933   LearningRate 0.0074   Epoch: 14   Global Step: 603360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:14,673-Speed 2625.00 samples/sec   Loss 3.6564   LearningRate 0.0074   Epoch: 14   Global Step: 603370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:33:18,556-Speed 2637.66 samples/sec   Loss 3.6781   LearningRate 0.0074   Epoch: 14   Global Step: 603380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:22,522-Speed 2583.05 samples/sec   Loss 3.7476   LearningRate 0.0074   Epoch: 14   Global Step: 603390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:26,414-Speed 2632.24 samples/sec   Loss 3.6835   LearningRate 0.0074   Epoch: 14   Global Step: 603400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:30,312-Speed 2627.11 samples/sec   Loss 3.6636   LearningRate 0.0074   Epoch: 14   Global Step: 603410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:34,211-Speed 2626.83 samples/sec   Loss 3.6181   LearningRate 0.0074   Epoch: 14   Global Step: 603420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:38,111-Speed 2626.99 samples/sec   Loss 3.7926   LearningRate 0.0074   Epoch: 14   Global Step: 603430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:33:41,984-Speed 2644.76 samples/sec   Loss 3.7090   LearningRate 0.0074   Epoch: 14   Global Step: 603440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:33:45,912-Speed 2607.49 samples/sec   Loss 3.6479   LearningRate 0.0074   Epoch: 14   Global Step: 603450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:33:49,815-Speed 2623.70 samples/sec   Loss 3.5613   LearningRate 0.0074   Epoch: 14   Global Step: 603460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:33:53,714-Speed 2627.40 samples/sec   Loss 3.7295   LearningRate 0.0074   Epoch: 14   Global Step: 603470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:33:57,608-Speed 2630.71 samples/sec   Loss 3.7376   LearningRate 0.0074   Epoch: 14   Global Step: 603480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:34:01,532-Speed 2610.40 samples/sec   Loss 3.7039   LearningRate 0.0074   Epoch: 14   Global Step: 603490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:34:05,430-Speed 2627.60 samples/sec   Loss 3.6179   LearningRate 0.0074   Epoch: 14   Global Step: 603500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:34:09,322-Speed 2631.42 samples/sec   Loss 3.7479   LearningRate 0.0074   Epoch: 14   Global Step: 603510   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:34:13,228-Speed 2621.96 samples/sec   Loss 3.6773   LearningRate 0.0074   Epoch: 14   Global Step: 603520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:34:17,260-Speed 2541.22 samples/sec   Loss 3.7573   LearningRate 0.0074   Epoch: 14   Global Step: 603530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:34:21,166-Speed 2622.15 samples/sec   Loss 3.7309   LearningRate 0.0074   Epoch: 14   Global Step: 603540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:25,103-Speed 2601.29 samples/sec   Loss 3.7481   LearningRate 0.0074   Epoch: 14   Global Step: 603550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:28,997-Speed 2630.50 samples/sec   Loss 3.8029   LearningRate 0.0074   Epoch: 14   Global Step: 603560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:32,888-Speed 2633.35 samples/sec   Loss 3.6229   LearningRate 0.0074   Epoch: 14   Global Step: 603570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:36,813-Speed 2609.65 samples/sec   Loss 3.7110   LearningRate 0.0074   Epoch: 14   Global Step: 603580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:40,709-Speed 2629.43 samples/sec   Loss 3.6869   LearningRate 0.0074   Epoch: 14   Global Step: 603590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:44,601-Speed 2631.65 samples/sec   Loss 3.6775   LearningRate 0.0074   Epoch: 14   Global Step: 603600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:48,681-Speed 2510.45 samples/sec   Loss 3.7471   LearningRate 0.0074   Epoch: 14   Global Step: 603610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:52,783-Speed 2497.02 samples/sec   Loss 3.6815   LearningRate 0.0074   Epoch: 14   Global Step: 603620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:34:56,733-Speed 2593.14 samples/sec   Loss 3.7310   LearningRate 0.0074   Epoch: 14   Global Step: 603630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:00,669-Speed 2601.87 samples/sec   Loss 3.6836   LearningRate 0.0074   Epoch: 14   Global Step: 603640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:35:04,571-Speed 2625.43 samples/sec   Loss 3.6811   LearningRate 0.0074   Epoch: 14   Global Step: 603650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:35:08,446-Speed 2642.72 samples/sec   Loss 3.6931   LearningRate 0.0074   Epoch: 14   Global Step: 603660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:12,345-Speed 2627.89 samples/sec   Loss 3.7519   LearningRate 0.0074   Epoch: 14   Global Step: 603670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:16,244-Speed 2626.53 samples/sec   Loss 3.6312   LearningRate 0.0074   Epoch: 14   Global Step: 603680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:20,153-Speed 2620.48 samples/sec   Loss 3.7033   LearningRate 0.0074   Epoch: 14   Global Step: 603690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:24,075-Speed 2611.80 samples/sec   Loss 3.7897   LearningRate 0.0074   Epoch: 14   Global Step: 603700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:27,972-Speed 2628.25 samples/sec   Loss 3.6500   LearningRate 0.0074   Epoch: 14   Global Step: 603710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:31,865-Speed 2631.76 samples/sec   Loss 3.7007   LearningRate 0.0074   Epoch: 14   Global Step: 603720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:35:35,741-Speed 2642.47 samples/sec   Loss 3.7426   LearningRate 0.0074   Epoch: 14   Global Step: 603730   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:35:39,641-Speed 2626.17 samples/sec   Loss 3.6914   LearningRate 0.0074   Epoch: 14   Global Step: 603740   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:35:43,519-Speed 2640.99 samples/sec   Loss 3.8034   LearningRate 0.0074   Epoch: 14   Global Step: 603750   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:35:47,415-Speed 2629.32 samples/sec   Loss 3.7210   LearningRate 0.0074   Epoch: 14   Global Step: 603760   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:35:51,309-Speed 2630.52 samples/sec   Loss 3.6757   LearningRate 0.0074   Epoch: 14   Global Step: 603770   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:35:55,236-Speed 2608.23 samples/sec   Loss 3.6465   LearningRate 0.0074   Epoch: 14   Global Step: 603780   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:35:59,136-Speed 2626.44 samples/sec   Loss 3.7687   LearningRate 0.0074   Epoch: 14   Global Step: 603790   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:36:03,053-Speed 2614.93 samples/sec   Loss 3.7140   LearningRate 0.0074   Epoch: 14   Global Step: 603800   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:36:06,952-Speed 2627.00 samples/sec   Loss 3.6595   LearningRate 0.0074   Epoch: 14   Global Step: 603810   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:36:10,852-Speed 2626.00 samples/sec   Loss 3.6856   LearningRate 0.0074   Epoch: 14   Global Step: 603820   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:36:14,766-Speed 2616.92 samples/sec   Loss 3.6493   LearningRate 0.0074   Epoch: 14   Global Step: 603830   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:36:18,671-Speed 2622.50 samples/sec   Loss 3.7118   LearningRate 0.0074   Epoch: 14   Global Step: 603840   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-15 15:36:22,570-Speed 2627.27 samples/sec   Loss 3.6432   LearningRate 0.0074   Epoch: 14   Global Step: 603850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:26,467-Speed 2628.53 samples/sec   Loss 3.6957   LearningRate 0.0074   Epoch: 14   Global Step: 603860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:30,368-Speed 2625.39 samples/sec   Loss 3.6280   LearningRate 0.0074   Epoch: 14   Global Step: 603870   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:34,274-Speed 2622.45 samples/sec   Loss 3.5838   LearningRate 0.0074   Epoch: 14   Global Step: 603880   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:38,175-Speed 2625.59 samples/sec   Loss 3.7289   LearningRate 0.0074   Epoch: 14   Global Step: 603890   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:42,072-Speed 2627.88 samples/sec   Loss 3.7462   LearningRate 0.0074   Epoch: 14   Global Step: 603900   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:45,975-Speed 2624.71 samples/sec   Loss 3.7369   LearningRate 0.0074   Epoch: 14   Global Step: 603910   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:49,876-Speed 2625.55 samples/sec   Loss 3.6902   LearningRate 0.0074   Epoch: 14   Global Step: 603920   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:53,770-Speed 2630.84 samples/sec   Loss 3.6766   LearningRate 0.0074   Epoch: 14   Global Step: 603930   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:36:57,663-Speed 2630.25 samples/sec   Loss 3.6107   LearningRate 0.0074   Epoch: 14   Global Step: 603940   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:01,567-Speed 2624.12 samples/sec   Loss 3.6747   LearningRate 0.0074   Epoch: 14   Global Step: 603950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:37:05,475-Speed 2621.12 samples/sec   Loss 3.6667   LearningRate 0.0074   Epoch: 14   Global Step: 603960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:37:09,345-Speed 2646.22 samples/sec   Loss 3.6102   LearningRate 0.0074   Epoch: 14   Global Step: 603970   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:13,278-Speed 2604.55 samples/sec   Loss 3.6946   LearningRate 0.0074   Epoch: 14   Global Step: 603980   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:17,178-Speed 2626.31 samples/sec   Loss 3.6747   LearningRate 0.0074   Epoch: 14   Global Step: 603990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:21,095-Speed 2615.31 samples/sec   Loss 3.6899   LearningRate 0.0074   Epoch: 14   Global Step: 604000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:24,987-Speed 2631.71 samples/sec   Loss 3.7099   LearningRate 0.0074   Epoch: 14   Global Step: 604010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:28,911-Speed 2609.94 samples/sec   Loss 3.6992   LearningRate 0.0074   Epoch: 14   Global Step: 604020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:32,819-Speed 2621.58 samples/sec   Loss 3.6888   LearningRate 0.0074   Epoch: 14   Global Step: 604030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:36,716-Speed 2628.33 samples/sec   Loss 3.6500   LearningRate 0.0074   Epoch: 14   Global Step: 604040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:40,613-Speed 2627.73 samples/sec   Loss 3.6352   LearningRate 0.0074   Epoch: 14   Global Step: 604050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:44,509-Speed 2629.48 samples/sec   Loss 3.7951   LearningRate 0.0074   Epoch: 14   Global Step: 604060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:37:48,401-Speed 2632.11 samples/sec   Loss 3.7312   LearningRate 0.0074   Epoch: 14   Global Step: 604070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:37:52,295-Speed 2630.46 samples/sec   Loss 3.6207   LearningRate 0.0074   Epoch: 14   Global Step: 604080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:37:56,192-Speed 2627.76 samples/sec   Loss 3.6852   LearningRate 0.0074   Epoch: 14   Global Step: 604090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:38:00,104-Speed 2618.59 samples/sec   Loss 3.6977   LearningRate 0.0074   Epoch: 14   Global Step: 604100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:38:03,997-Speed 2631.18 samples/sec   Loss 3.6608   LearningRate 0.0074   Epoch: 14   Global Step: 604110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:38:07,896-Speed 2626.68 samples/sec   Loss 3.6665   LearningRate 0.0074   Epoch: 14   Global Step: 604120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:38:11,798-Speed 2625.41 samples/sec   Loss 3.6721   LearningRate 0.0074   Epoch: 14   Global Step: 604130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:38:15,715-Speed 2615.33 samples/sec   Loss 3.7571   LearningRate 0.0074   Epoch: 14   Global Step: 604140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:38:19,611-Speed 2628.59 samples/sec   Loss 3.6264   LearningRate 0.0074   Epoch: 14   Global Step: 604150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:38:23,485-Speed 2644.37 samples/sec   Loss 3.7257   LearningRate 0.0074   Epoch: 14   Global Step: 604160   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:27,386-Speed 2626.62 samples/sec   Loss 3.6428   LearningRate 0.0074   Epoch: 14   Global Step: 604170   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:31,278-Speed 2630.96 samples/sec   Loss 3.5921   LearningRate 0.0074   Epoch: 14   Global Step: 604180   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:35,172-Speed 2631.12 samples/sec   Loss 3.7321   LearningRate 0.0074   Epoch: 14   Global Step: 604190   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:39,062-Speed 2633.16 samples/sec   Loss 3.6457   LearningRate 0.0074   Epoch: 14   Global Step: 604200   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:42,959-Speed 2627.63 samples/sec   Loss 3.6718   LearningRate 0.0074   Epoch: 14   Global Step: 604210   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:46,855-Speed 2629.06 samples/sec   Loss 3.6889   LearningRate 0.0074   Epoch: 14   Global Step: 604220   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:50,774-Speed 2613.41 samples/sec   Loss 3.7203   LearningRate 0.0074   Epoch: 14   Global Step: 604230   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:54,672-Speed 2627.49 samples/sec   Loss 3.7331   LearningRate 0.0074   Epoch: 14   Global Step: 604240   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:38:58,591-Speed 2614.14 samples/sec   Loss 3.6264   LearningRate 0.0074   Epoch: 14   Global Step: 604250   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:02,481-Speed 2632.81 samples/sec   Loss 3.7143   LearningRate 0.0074   Epoch: 14   Global Step: 604260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:39:06,372-Speed 2632.83 samples/sec   Loss 3.5919   LearningRate 0.0074   Epoch: 14   Global Step: 604270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:39:10,272-Speed 2625.93 samples/sec   Loss 3.5881   LearningRate 0.0074   Epoch: 14   Global Step: 604280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:39:14,176-Speed 2623.85 samples/sec   Loss 3.6287   LearningRate 0.0074   Epoch: 14   Global Step: 604290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:39:18,067-Speed 2632.08 samples/sec   Loss 3.7144   LearningRate 0.0074   Epoch: 14   Global Step: 604300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:39:21,965-Speed 2627.88 samples/sec   Loss 3.6660   LearningRate 0.0074   Epoch: 14   Global Step: 604310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:39:25,864-Speed 2627.34 samples/sec   Loss 3.6949   LearningRate 0.0074   Epoch: 14   Global Step: 604320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:39:29,736-Speed 2645.22 samples/sec   Loss 3.5988   LearningRate 0.0074   Epoch: 14   Global Step: 604330   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:33,631-Speed 2629.61 samples/sec   Loss 3.7121   LearningRate 0.0074   Epoch: 14   Global Step: 604340   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:37,537-Speed 2622.73 samples/sec   Loss 3.6058   LearningRate 0.0074   Epoch: 14   Global Step: 604350   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:41,428-Speed 2632.04 samples/sec   Loss 3.7093   LearningRate 0.0074   Epoch: 14   Global Step: 604360   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:45,336-Speed 2621.00 samples/sec   Loss 3.7247   LearningRate 0.0074   Epoch: 14   Global Step: 604370   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:49,277-Speed 2598.60 samples/sec   Loss 3.7374   LearningRate 0.0074   Epoch: 14   Global Step: 604380   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:53,169-Speed 2632.63 samples/sec   Loss 3.7576   LearningRate 0.0074   Epoch: 14   Global Step: 604390   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:39:57,063-Speed 2630.56 samples/sec   Loss 3.6672   LearningRate 0.0074   Epoch: 14   Global Step: 604400   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:00,965-Speed 2624.42 samples/sec   Loss 3.6693   LearningRate 0.0074   Epoch: 14   Global Step: 604410   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:04,870-Speed 2623.64 samples/sec   Loss 3.6620   LearningRate 0.0074   Epoch: 14   Global Step: 604420   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:08,780-Speed 2619.45 samples/sec   Loss 3.6565   LearningRate 0.0074   Epoch: 14   Global Step: 604430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:40:12,660-Speed 2639.58 samples/sec   Loss 3.7763   LearningRate 0.0074   Epoch: 14   Global Step: 604440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:16,561-Speed 2625.40 samples/sec   Loss 3.7026   LearningRate 0.0074   Epoch: 14   Global Step: 604450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:20,474-Speed 2618.00 samples/sec   Loss 3.6922   LearningRate 0.0074   Epoch: 14   Global Step: 604460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:24,375-Speed 2625.33 samples/sec   Loss 3.7069   LearningRate 0.0074   Epoch: 14   Global Step: 604470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:28,284-Speed 2620.60 samples/sec   Loss 3.6879   LearningRate 0.0074   Epoch: 14   Global Step: 604480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:32,177-Speed 2631.37 samples/sec   Loss 3.6881   LearningRate 0.0074   Epoch: 14   Global Step: 604490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:36,113-Speed 2601.61 samples/sec   Loss 3.5721   LearningRate 0.0074   Epoch: 14   Global Step: 604500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:40,023-Speed 2619.56 samples/sec   Loss 3.6399   LearningRate 0.0074   Epoch: 14   Global Step: 604510   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:43,916-Speed 2631.01 samples/sec   Loss 3.6746   LearningRate 0.0074   Epoch: 14   Global Step: 604520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:47,809-Speed 2630.99 samples/sec   Loss 3.7252   LearningRate 0.0074   Epoch: 14   Global Step: 604530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:40:51,708-Speed 2626.66 samples/sec   Loss 3.6116   LearningRate 0.0074   Epoch: 14   Global Step: 604540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:40:55,609-Speed 2625.99 samples/sec   Loss 3.7592   LearningRate 0.0074   Epoch: 14   Global Step: 604550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:40:59,504-Speed 2630.04 samples/sec   Loss 3.6111   LearningRate 0.0074   Epoch: 14   Global Step: 604560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:03,399-Speed 2629.61 samples/sec   Loss 3.6885   LearningRate 0.0074   Epoch: 14   Global Step: 604570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:07,303-Speed 2623.21 samples/sec   Loss 3.6148   LearningRate 0.0074   Epoch: 14   Global Step: 604580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:11,212-Speed 2620.96 samples/sec   Loss 3.6409   LearningRate 0.0074   Epoch: 14   Global Step: 604590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:15,109-Speed 2628.38 samples/sec   Loss 3.6928   LearningRate 0.0074   Epoch: 14   Global Step: 604600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:19,020-Speed 2618.59 samples/sec   Loss 3.6726   LearningRate 0.0074   Epoch: 14   Global Step: 604610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:22,930-Speed 2625.19 samples/sec   Loss 3.6122   LearningRate 0.0074   Epoch: 14   Global Step: 604620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:26,825-Speed 2630.00 samples/sec   Loss 3.6820   LearningRate 0.0074   Epoch: 14   Global Step: 604630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:30,722-Speed 2628.90 samples/sec   Loss 3.6864   LearningRate 0.0074   Epoch: 14   Global Step: 604640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:41:34,626-Speed 2623.51 samples/sec   Loss 3.6522   LearningRate 0.0074   Epoch: 14   Global Step: 604650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:38,538-Speed 2618.02 samples/sec   Loss 3.6709   LearningRate 0.0074   Epoch: 14   Global Step: 604660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:42,435-Speed 2628.24 samples/sec   Loss 3.6997   LearningRate 0.0074   Epoch: 14   Global Step: 604670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:46,353-Speed 2614.69 samples/sec   Loss 3.6952   LearningRate 0.0073   Epoch: 14   Global Step: 604680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:50,263-Speed 2619.27 samples/sec   Loss 3.7237   LearningRate 0.0073   Epoch: 14   Global Step: 604690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:54,154-Speed 2632.93 samples/sec   Loss 3.7323   LearningRate 0.0073   Epoch: 14   Global Step: 604700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:41:58,045-Speed 2632.08 samples/sec   Loss 3.6329   LearningRate 0.0073   Epoch: 14   Global Step: 604710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:42:01,949-Speed 2623.83 samples/sec   Loss 3.6364   LearningRate 0.0073   Epoch: 14   Global Step: 604720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:42:05,939-Speed 2566.62 samples/sec   Loss 3.5422   LearningRate 0.0073   Epoch: 14   Global Step: 604730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:42:09,815-Speed 2642.37 samples/sec   Loss 3.6039   LearningRate 0.0073   Epoch: 14   Global Step: 604740   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:13,715-Speed 2626.49 samples/sec   Loss 3.6021   LearningRate 0.0073   Epoch: 14   Global Step: 604750   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:17,604-Speed 2633.69 samples/sec   Loss 3.6592   LearningRate 0.0073   Epoch: 14   Global Step: 604760   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:21,499-Speed 2629.65 samples/sec   Loss 3.6741   LearningRate 0.0073   Epoch: 14   Global Step: 604770   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:25,402-Speed 2624.68 samples/sec   Loss 3.7143   LearningRate 0.0073   Epoch: 14   Global Step: 604780   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:29,306-Speed 2623.27 samples/sec   Loss 3.6431   LearningRate 0.0073   Epoch: 14   Global Step: 604790   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:33,204-Speed 2628.56 samples/sec   Loss 3.7390   LearningRate 0.0073   Epoch: 14   Global Step: 604800   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:37,105-Speed 2625.50 samples/sec   Loss 3.7448   LearningRate 0.0073   Epoch: 14   Global Step: 604810   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:41,001-Speed 2628.65 samples/sec   Loss 3.6645   LearningRate 0.0073   Epoch: 14   Global Step: 604820   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:44,905-Speed 2623.45 samples/sec   Loss 3.7240   LearningRate 0.0073   Epoch: 14   Global Step: 604830   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:42:48,800-Speed 2630.52 samples/sec   Loss 3.7369   LearningRate 0.0073   Epoch: 14   Global Step: 604840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:42:52,698-Speed 2627.95 samples/sec   Loss 3.6827   LearningRate 0.0073   Epoch: 14   Global Step: 604850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:42:56,600-Speed 2625.27 samples/sec   Loss 3.7039   LearningRate 0.0073   Epoch: 14   Global Step: 604860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:43:00,500-Speed 2627.28 samples/sec   Loss 3.6752   LearningRate 0.0073   Epoch: 14   Global Step: 604870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:43:04,391-Speed 2631.78 samples/sec   Loss 3.6486   LearningRate 0.0073   Epoch: 14   Global Step: 604880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:43:08,287-Speed 2628.88 samples/sec   Loss 3.6569   LearningRate 0.0073   Epoch: 14   Global Step: 604890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:43:12,191-Speed 2623.63 samples/sec   Loss 3.6468   LearningRate 0.0073   Epoch: 14   Global Step: 604900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:43:16,088-Speed 2628.85 samples/sec   Loss 3.7334   LearningRate 0.0073   Epoch: 14   Global Step: 604910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:43:19,986-Speed 2627.31 samples/sec   Loss 3.7467   LearningRate 0.0073   Epoch: 14   Global Step: 604920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:43:23,904-Speed 2614.74 samples/sec   Loss 3.5955   LearningRate 0.0073   Epoch: 14   Global Step: 604930   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:27,802-Speed 2627.83 samples/sec   Loss 3.6721   LearningRate 0.0073   Epoch: 14   Global Step: 604940   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:31,709-Speed 2622.07 samples/sec   Loss 3.7036   LearningRate 0.0073   Epoch: 14   Global Step: 604950   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:35,617-Speed 2620.44 samples/sec   Loss 3.7019   LearningRate 0.0073   Epoch: 14   Global Step: 604960   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:39,514-Speed 2628.70 samples/sec   Loss 3.6632   LearningRate 0.0073   Epoch: 14   Global Step: 604970   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:43,430-Speed 2615.25 samples/sec   Loss 3.6531   LearningRate 0.0073   Epoch: 14   Global Step: 604980   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:47,357-Speed 2608.82 samples/sec   Loss 3.7187   LearningRate 0.0073   Epoch: 14   Global Step: 604990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:51,253-Speed 2628.74 samples/sec   Loss 3.6847   LearningRate 0.0073   Epoch: 14   Global Step: 605000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:55,145-Speed 2631.82 samples/sec   Loss 3.5841   LearningRate 0.0073   Epoch: 14   Global Step: 605010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:43:59,036-Speed 2632.43 samples/sec   Loss 3.6896   LearningRate 0.0073   Epoch: 14   Global Step: 605020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:44:02,938-Speed 2629.77 samples/sec   Loss 3.7035   LearningRate 0.0073   Epoch: 14   Global Step: 605030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:06,845-Speed 2621.40 samples/sec   Loss 3.5843   LearningRate 0.0073   Epoch: 14   Global Step: 605040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:10,742-Speed 2628.27 samples/sec   Loss 3.6745   LearningRate 0.0073   Epoch: 14   Global Step: 605050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:14,643-Speed 2625.18 samples/sec   Loss 3.6365   LearningRate 0.0073   Epoch: 14   Global Step: 605060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:18,535-Speed 2632.26 samples/sec   Loss 3.7447   LearningRate 0.0073   Epoch: 14   Global Step: 605070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:22,444-Speed 2620.62 samples/sec   Loss 3.6484   LearningRate 0.0073   Epoch: 14   Global Step: 605080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:26,350-Speed 2622.02 samples/sec   Loss 3.7366   LearningRate 0.0073   Epoch: 14   Global Step: 605090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:30,240-Speed 2633.29 samples/sec   Loss 3.7065   LearningRate 0.0073   Epoch: 14   Global Step: 605100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:34,136-Speed 2628.62 samples/sec   Loss 3.6462   LearningRate 0.0073   Epoch: 14   Global Step: 605110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:44:38,004-Speed 2648.13 samples/sec   Loss 3.6987   LearningRate 0.0073   Epoch: 14   Global Step: 605120   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:44:41,899-Speed 2629.96 samples/sec   Loss 3.6975   LearningRate 0.0073   Epoch: 14   Global Step: 605130   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:44:45,789-Speed 2632.84 samples/sec   Loss 3.6868   LearningRate 0.0073   Epoch: 14   Global Step: 605140   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:44:49,684-Speed 2629.43 samples/sec   Loss 3.6544   LearningRate 0.0073   Epoch: 14   Global Step: 605150   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:44:53,588-Speed 2623.87 samples/sec   Loss 3.6995   LearningRate 0.0073   Epoch: 14   Global Step: 605160   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:44:57,484-Speed 2630.01 samples/sec   Loss 3.6405   LearningRate 0.0073   Epoch: 14   Global Step: 605170   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:45:01,379-Speed 2628.96 samples/sec   Loss 3.5111   LearningRate 0.0073   Epoch: 14   Global Step: 605180   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:45:05,277-Speed 2628.43 samples/sec   Loss 3.6489   LearningRate 0.0073   Epoch: 14   Global Step: 605190   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:45:09,196-Speed 2613.04 samples/sec   Loss 3.6583   LearningRate 0.0073   Epoch: 14   Global Step: 605200   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:45:13,089-Speed 2631.36 samples/sec   Loss 3.6873   LearningRate 0.0073   Epoch: 14   Global Step: 605210   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:45:16,987-Speed 2627.58 samples/sec   Loss 3.6700   LearningRate 0.0073   Epoch: 14   Global Step: 605220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:20,892-Speed 2622.97 samples/sec   Loss 3.6938   LearningRate 0.0073   Epoch: 14   Global Step: 605230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:24,787-Speed 2629.76 samples/sec   Loss 3.5613   LearningRate 0.0073   Epoch: 14   Global Step: 605240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:28,683-Speed 2629.50 samples/sec   Loss 3.7186   LearningRate 0.0073   Epoch: 14   Global Step: 605250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:32,728-Speed 2532.01 samples/sec   Loss 3.6764   LearningRate 0.0073   Epoch: 14   Global Step: 605260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:36,722-Speed 2564.87 samples/sec   Loss 3.6452   LearningRate 0.0073   Epoch: 14   Global Step: 605270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:40,619-Speed 2628.58 samples/sec   Loss 3.6476   LearningRate 0.0073   Epoch: 14   Global Step: 605280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:44,549-Speed 2605.80 samples/sec   Loss 3.6706   LearningRate 0.0073   Epoch: 14   Global Step: 605290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:45:48,424-Speed 2642.92 samples/sec   Loss 3.7139   LearningRate 0.0073   Epoch: 14   Global Step: 605300   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:45:52,320-Speed 2629.62 samples/sec   Loss 3.7067   LearningRate 0.0073   Epoch: 14   Global Step: 605310   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:45:56,216-Speed 2628.99 samples/sec   Loss 3.5963   LearningRate 0.0073   Epoch: 14   Global Step: 605320   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:00,121-Speed 2622.87 samples/sec   Loss 3.7150   LearningRate 0.0073   Epoch: 14   Global Step: 605330   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:04,036-Speed 2616.86 samples/sec   Loss 3.6885   LearningRate 0.0073   Epoch: 14   Global Step: 605340   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:07,940-Speed 2631.93 samples/sec   Loss 3.5536   LearningRate 0.0073   Epoch: 14   Global Step: 605350   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:11,851-Speed 2619.32 samples/sec   Loss 3.6754   LearningRate 0.0073   Epoch: 14   Global Step: 605360   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:15,742-Speed 2632.31 samples/sec   Loss 3.6751   LearningRate 0.0073   Epoch: 14   Global Step: 605370   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:19,651-Speed 2619.79 samples/sec   Loss 3.6345   LearningRate 0.0073   Epoch: 14   Global Step: 605380   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:23,564-Speed 2618.44 samples/sec   Loss 3.7034   LearningRate 0.0073   Epoch: 14   Global Step: 605390   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:27,457-Speed 2630.90 samples/sec   Loss 3.6355   LearningRate 0.0073   Epoch: 14   Global Step: 605400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:46:31,355-Speed 2627.45 samples/sec   Loss 3.6945   LearningRate 0.0073   Epoch: 14   Global Step: 605410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:46:35,250-Speed 2629.92 samples/sec   Loss 3.6275   LearningRate 0.0073   Epoch: 14   Global Step: 605420   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:39,151-Speed 2625.53 samples/sec   Loss 3.6906   LearningRate 0.0073   Epoch: 14   Global Step: 605430   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:43,040-Speed 2634.15 samples/sec   Loss 3.7254   LearningRate 0.0073   Epoch: 14   Global Step: 605440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:46,938-Speed 2627.36 samples/sec   Loss 3.7413   LearningRate 0.0073   Epoch: 14   Global Step: 605450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:50,829-Speed 2632.60 samples/sec   Loss 3.6500   LearningRate 0.0073   Epoch: 14   Global Step: 605460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:54,744-Speed 2615.69 samples/sec   Loss 3.6423   LearningRate 0.0073   Epoch: 14   Global Step: 605470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:46:58,637-Speed 2631.59 samples/sec   Loss 3.7346   LearningRate 0.0073   Epoch: 14   Global Step: 605480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:02,530-Speed 2631.49 samples/sec   Loss 3.6748   LearningRate 0.0073   Epoch: 14   Global Step: 605490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:06,428-Speed 2627.19 samples/sec   Loss 3.7103   LearningRate 0.0073   Epoch: 14   Global Step: 605500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:10,326-Speed 2627.86 samples/sec   Loss 3.5801   LearningRate 0.0073   Epoch: 14   Global Step: 605510   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:14,223-Speed 2629.52 samples/sec   Loss 3.6637   LearningRate 0.0073   Epoch: 14   Global Step: 605520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:47:18,131-Speed 2620.92 samples/sec   Loss 3.7355   LearningRate 0.0073   Epoch: 14   Global Step: 605530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:47:22,034-Speed 2624.33 samples/sec   Loss 3.6623   LearningRate 0.0073   Epoch: 14   Global Step: 605540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:47:25,930-Speed 2628.66 samples/sec   Loss 3.7301   LearningRate 0.0073   Epoch: 14   Global Step: 605550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:47:29,822-Speed 2631.37 samples/sec   Loss 3.7277   LearningRate 0.0073   Epoch: 14   Global Step: 605560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:47:33,714-Speed 2632.09 samples/sec   Loss 3.7255   LearningRate 0.0073   Epoch: 14   Global Step: 605570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:47:37,608-Speed 2631.02 samples/sec   Loss 3.5844   LearningRate 0.0073   Epoch: 14   Global Step: 605580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:47:41,501-Speed 2630.47 samples/sec   Loss 3.6634   LearningRate 0.0073   Epoch: 14   Global Step: 605590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:45,396-Speed 2629.80 samples/sec   Loss 3.6748   LearningRate 0.0073   Epoch: 14   Global Step: 605600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:49,294-Speed 2628.71 samples/sec   Loss 3.6039   LearningRate 0.0073   Epoch: 14   Global Step: 605610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:53,185-Speed 2632.69 samples/sec   Loss 3.6309   LearningRate 0.0073   Epoch: 14   Global Step: 605620   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:47:57,074-Speed 2632.91 samples/sec   Loss 3.6466   LearningRate 0.0073   Epoch: 14   Global Step: 605630   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:00,979-Speed 2622.93 samples/sec   Loss 3.5662   LearningRate 0.0073   Epoch: 14   Global Step: 605640   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:04,869-Speed 2633.03 samples/sec   Loss 3.6046   LearningRate 0.0073   Epoch: 14   Global Step: 605650   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:08,762-Speed 2631.43 samples/sec   Loss 3.6280   LearningRate 0.0073   Epoch: 14   Global Step: 605660   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:12,657-Speed 2630.21 samples/sec   Loss 3.6304   LearningRate 0.0073   Epoch: 14   Global Step: 605670   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:16,552-Speed 2629.50 samples/sec   Loss 3.6852   LearningRate 0.0073   Epoch: 14   Global Step: 605680   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:20,450-Speed 2627.60 samples/sec   Loss 3.6667   LearningRate 0.0073   Epoch: 14   Global Step: 605690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:48:24,377-Speed 2608.59 samples/sec   Loss 3.6054   LearningRate 0.0073   Epoch: 14   Global Step: 605700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:48:28,279-Speed 2625.15 samples/sec   Loss 3.5801   LearningRate 0.0073   Epoch: 14   Global Step: 605710   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:32,238-Speed 2587.21 samples/sec   Loss 3.6379   LearningRate 0.0073   Epoch: 14   Global Step: 605720   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:36,157-Speed 2613.35 samples/sec   Loss 3.6633   LearningRate 0.0073   Epoch: 14   Global Step: 605730   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:40,048-Speed 2632.41 samples/sec   Loss 3.6883   LearningRate 0.0073   Epoch: 14   Global Step: 605740   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:43,939-Speed 2632.25 samples/sec   Loss 3.6709   LearningRate 0.0073   Epoch: 14   Global Step: 605750   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:47,828-Speed 2634.22 samples/sec   Loss 3.6765   LearningRate 0.0073   Epoch: 14   Global Step: 605760   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:51,737-Speed 2620.02 samples/sec   Loss 3.7963   LearningRate 0.0073   Epoch: 14   Global Step: 605770   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:55,629-Speed 2631.79 samples/sec   Loss 3.5834   LearningRate 0.0073   Epoch: 14   Global Step: 605780   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:48:59,522-Speed 2631.09 samples/sec   Loss 3.7448   LearningRate 0.0073   Epoch: 14   Global Step: 605790   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:49:03,420-Speed 2627.54 samples/sec   Loss 3.6967   LearningRate 0.0073   Epoch: 14   Global Step: 605800   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:49:07,309-Speed 2634.13 samples/sec   Loss 3.6982   LearningRate 0.0073   Epoch: 14   Global Step: 605810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:11,204-Speed 2629.65 samples/sec   Loss 3.6630   LearningRate 0.0073   Epoch: 14   Global Step: 605820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:15,122-Speed 2614.02 samples/sec   Loss 3.5946   LearningRate 0.0073   Epoch: 14   Global Step: 605830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:19,063-Speed 2599.30 samples/sec   Loss 3.6368   LearningRate 0.0073   Epoch: 14   Global Step: 605840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:22,966-Speed 2626.65 samples/sec   Loss 3.6426   LearningRate 0.0073   Epoch: 14   Global Step: 605850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:26,877-Speed 2618.90 samples/sec   Loss 3.6414   LearningRate 0.0073   Epoch: 14   Global Step: 605860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:30,771-Speed 2630.21 samples/sec   Loss 3.5690   LearningRate 0.0073   Epoch: 14   Global Step: 605870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:34,666-Speed 2629.16 samples/sec   Loss 3.6387   LearningRate 0.0073   Epoch: 14   Global Step: 605880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:38,640-Speed 2578.28 samples/sec   Loss 3.7225   LearningRate 0.0073   Epoch: 14   Global Step: 605890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:42,532-Speed 2631.51 samples/sec   Loss 3.6654   LearningRate 0.0073   Epoch: 14   Global Step: 605900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:46,425-Speed 2630.95 samples/sec   Loss 3.6364   LearningRate 0.0073   Epoch: 14   Global Step: 605910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:49:50,340-Speed 2615.65 samples/sec   Loss 3.6904   LearningRate 0.0073   Epoch: 14   Global Step: 605920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:49:54,228-Speed 2635.38 samples/sec   Loss 3.6853   LearningRate 0.0073   Epoch: 14   Global Step: 605930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:49:58,119-Speed 2632.24 samples/sec   Loss 3.6636   LearningRate 0.0073   Epoch: 14   Global Step: 605940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:02,019-Speed 2626.66 samples/sec   Loss 3.6339   LearningRate 0.0073   Epoch: 14   Global Step: 605950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:05,911-Speed 2631.11 samples/sec   Loss 3.6659   LearningRate 0.0073   Epoch: 14   Global Step: 605960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:09,815-Speed 2623.54 samples/sec   Loss 3.6120   LearningRate 0.0073   Epoch: 14   Global Step: 605970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:13,707-Speed 2632.12 samples/sec   Loss 3.6859   LearningRate 0.0073   Epoch: 14   Global Step: 605980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:17,601-Speed 2630.12 samples/sec   Loss 3.7105   LearningRate 0.0073   Epoch: 14   Global Step: 605990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:21,493-Speed 2631.30 samples/sec   Loss 3.6523   LearningRate 0.0073   Epoch: 14   Global Step: 606000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:25,388-Speed 2629.90 samples/sec   Loss 3.6153   LearningRate 0.0073   Epoch: 14   Global Step: 606010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:29,288-Speed 2626.57 samples/sec   Loss 3.6619   LearningRate 0.0073   Epoch: 14   Global Step: 606020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:33,193-Speed 2623.50 samples/sec   Loss 3.6571   LearningRate 0.0073   Epoch: 14   Global Step: 606030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:50:37,087-Speed 2630.13 samples/sec   Loss 3.7290   LearningRate 0.0073   Epoch: 14   Global Step: 606040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:50:40,962-Speed 2643.25 samples/sec   Loss 3.6381   LearningRate 0.0073   Epoch: 14   Global Step: 606050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:44,866-Speed 2623.80 samples/sec   Loss 3.7029   LearningRate 0.0073   Epoch: 14   Global Step: 606060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:48,759-Speed 2630.90 samples/sec   Loss 3.6697   LearningRate 0.0073   Epoch: 14   Global Step: 606070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:52,651-Speed 2631.54 samples/sec   Loss 3.5711   LearningRate 0.0073   Epoch: 14   Global Step: 606080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:50:56,549-Speed 2627.88 samples/sec   Loss 3.6783   LearningRate 0.0073   Epoch: 14   Global Step: 606090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:00,447-Speed 2627.65 samples/sec   Loss 3.6497   LearningRate 0.0073   Epoch: 14   Global Step: 606100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:04,346-Speed 2627.18 samples/sec   Loss 3.6307   LearningRate 0.0073   Epoch: 14   Global Step: 606110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:08,262-Speed 2615.21 samples/sec   Loss 3.6889   LearningRate 0.0073   Epoch: 14   Global Step: 606120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:12,168-Speed 2623.23 samples/sec   Loss 3.6769   LearningRate 0.0073   Epoch: 14   Global Step: 606130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:16,140-Speed 2578.92 samples/sec   Loss 3.6768   LearningRate 0.0073   Epoch: 14   Global Step: 606140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:20,091-Speed 2592.14 samples/sec   Loss 3.7116   LearningRate 0.0073   Epoch: 14   Global Step: 606150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:23,983-Speed 2631.19 samples/sec   Loss 3.6112   LearningRate 0.0073   Epoch: 14   Global Step: 606160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:27,877-Speed 2631.27 samples/sec   Loss 3.6810   LearningRate 0.0073   Epoch: 14   Global Step: 606170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:31,774-Speed 2627.93 samples/sec   Loss 3.6523   LearningRate 0.0073   Epoch: 14   Global Step: 606180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:35,670-Speed 2629.48 samples/sec   Loss 3.6530   LearningRate 0.0073   Epoch: 14   Global Step: 606190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:39,574-Speed 2623.58 samples/sec   Loss 3.6237   LearningRate 0.0073   Epoch: 14   Global Step: 606200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:43,474-Speed 2626.11 samples/sec   Loss 3.5606   LearningRate 0.0072   Epoch: 14   Global Step: 606210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:47,381-Speed 2621.70 samples/sec   Loss 3.5571   LearningRate 0.0072   Epoch: 14   Global Step: 606220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:51,280-Speed 2627.06 samples/sec   Loss 3.7722   LearningRate 0.0072   Epoch: 14   Global Step: 606230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:55,183-Speed 2623.82 samples/sec   Loss 3.5878   LearningRate 0.0072   Epoch: 14   Global Step: 606240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:51:59,098-Speed 2617.11 samples/sec   Loss 3.6406   LearningRate 0.0072   Epoch: 14   Global Step: 606250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:52:02,990-Speed 2631.88 samples/sec   Loss 3.6399   LearningRate 0.0072   Epoch: 14   Global Step: 606260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:52:06,886-Speed 2628.93 samples/sec   Loss 3.6386   LearningRate 0.0072   Epoch: 14   Global Step: 606270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:52:10,859-Speed 2578.58 samples/sec   Loss 3.7766   LearningRate 0.0072   Epoch: 14   Global Step: 606280   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:14,815-Speed 2588.83 samples/sec   Loss 3.7742   LearningRate 0.0072   Epoch: 14   Global Step: 606290   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:18,714-Speed 2626.74 samples/sec   Loss 3.6498   LearningRate 0.0072   Epoch: 14   Global Step: 606300   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:22,610-Speed 2629.37 samples/sec   Loss 3.6270   LearningRate 0.0072   Epoch: 14   Global Step: 606310   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:26,543-Speed 2604.54 samples/sec   Loss 3.5700   LearningRate 0.0072   Epoch: 14   Global Step: 606320   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:30,512-Speed 2580.15 samples/sec   Loss 3.7508   LearningRate 0.0072   Epoch: 14   Global Step: 606330   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:34,409-Speed 2628.93 samples/sec   Loss 3.6411   LearningRate 0.0072   Epoch: 14   Global Step: 606340   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:38,308-Speed 2627.33 samples/sec   Loss 3.6266   LearningRate 0.0072   Epoch: 14   Global Step: 606350   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:42,205-Speed 2627.94 samples/sec   Loss 3.6323   LearningRate 0.0072   Epoch: 14   Global Step: 606360   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:46,107-Speed 2625.59 samples/sec   Loss 3.6273   LearningRate 0.0072   Epoch: 14   Global Step: 606370   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:52:50,005-Speed 2627.32 samples/sec   Loss 3.6430   LearningRate 0.0072   Epoch: 14   Global Step: 606380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:52:53,985-Speed 2573.55 samples/sec   Loss 3.6275   LearningRate 0.0072   Epoch: 14   Global Step: 606390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:52:57,902-Speed 2615.43 samples/sec   Loss 3.6359   LearningRate 0.0072   Epoch: 14   Global Step: 606400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:01,800-Speed 2627.40 samples/sec   Loss 3.6325   LearningRate 0.0072   Epoch: 14   Global Step: 606410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:05,695-Speed 2629.85 samples/sec   Loss 3.6031   LearningRate 0.0072   Epoch: 14   Global Step: 606420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:09,591-Speed 2628.84 samples/sec   Loss 3.6598   LearningRate 0.0072   Epoch: 14   Global Step: 606430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:13,515-Speed 2610.72 samples/sec   Loss 3.7719   LearningRate 0.0072   Epoch: 14   Global Step: 606440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:17,412-Speed 2627.64 samples/sec   Loss 3.6376   LearningRate 0.0072   Epoch: 14   Global Step: 606450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:21,312-Speed 2627.96 samples/sec   Loss 3.6762   LearningRate 0.0072   Epoch: 14   Global Step: 606460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:25,209-Speed 2628.00 samples/sec   Loss 3.6542   LearningRate 0.0072   Epoch: 14   Global Step: 606470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:29,103-Speed 2630.79 samples/sec   Loss 3.5737   LearningRate 0.0072   Epoch: 14   Global Step: 606480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:53:33,008-Speed 2622.26 samples/sec   Loss 3.6336   LearningRate 0.0072   Epoch: 14   Global Step: 606490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:53:36,888-Speed 2639.89 samples/sec   Loss 3.6219   LearningRate 0.0072   Epoch: 14   Global Step: 606500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:40,796-Speed 2620.95 samples/sec   Loss 3.7421   LearningRate 0.0072   Epoch: 14   Global Step: 606510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:53:44,680-Speed 2637.46 samples/sec   Loss 3.6351   LearningRate 0.0072   Epoch: 14   Global Step: 606520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:53:48,578-Speed 2627.87 samples/sec   Loss 3.6709   LearningRate 0.0072   Epoch: 14   Global Step: 606530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:53:52,476-Speed 2626.91 samples/sec   Loss 3.6642   LearningRate 0.0072   Epoch: 14   Global Step: 606540   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:53:56,378-Speed 2625.60 samples/sec   Loss 3.6084   LearningRate 0.0072   Epoch: 14   Global Step: 606550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:00,273-Speed 2629.82 samples/sec   Loss 3.6786   LearningRate 0.0072   Epoch: 14   Global Step: 606560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:04,177-Speed 2623.53 samples/sec   Loss 3.5747   LearningRate 0.0072   Epoch: 14   Global Step: 606570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:08,129-Speed 2591.05 samples/sec   Loss 3.6805   LearningRate 0.0072   Epoch: 14   Global Step: 606580   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:12,093-Speed 2584.12 samples/sec   Loss 3.6431   LearningRate 0.0072   Epoch: 14   Global Step: 606590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:15,986-Speed 2630.70 samples/sec   Loss 3.6027   LearningRate 0.0072   Epoch: 14   Global Step: 606600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:19,879-Speed 2631.98 samples/sec   Loss 3.7112   LearningRate 0.0072   Epoch: 14   Global Step: 606610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:23,775-Speed 2628.78 samples/sec   Loss 3.6021   LearningRate 0.0072   Epoch: 14   Global Step: 606620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:54:27,667-Speed 2632.34 samples/sec   Loss 3.7309   LearningRate 0.0072   Epoch: 14   Global Step: 606630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:54:31,562-Speed 2629.76 samples/sec   Loss 3.6210   LearningRate 0.0072   Epoch: 14   Global Step: 606640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:54:35,459-Speed 2628.38 samples/sec   Loss 3.6855   LearningRate 0.0072   Epoch: 14   Global Step: 606650   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:39,351-Speed 2631.46 samples/sec   Loss 3.6402   LearningRate 0.0072   Epoch: 14   Global Step: 606660   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:43,248-Speed 2628.24 samples/sec   Loss 3.6752   LearningRate 0.0072   Epoch: 14   Global Step: 606670   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:47,140-Speed 2631.37 samples/sec   Loss 3.6824   LearningRate 0.0072   Epoch: 14   Global Step: 606680   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:51,038-Speed 2628.00 samples/sec   Loss 3.6378   LearningRate 0.0072   Epoch: 14   Global Step: 606690   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:54,938-Speed 2626.40 samples/sec   Loss 3.6749   LearningRate 0.0072   Epoch: 14   Global Step: 606700   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:54:58,835-Speed 2628.70 samples/sec   Loss 3.6851   LearningRate 0.0072   Epoch: 14   Global Step: 606710   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:02,733-Speed 2627.39 samples/sec   Loss 3.6906   LearningRate 0.0072   Epoch: 14   Global Step: 606720   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:06,627-Speed 2630.07 samples/sec   Loss 3.7105   LearningRate 0.0072   Epoch: 14   Global Step: 606730   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:10,530-Speed 2624.42 samples/sec   Loss 3.6640   LearningRate 0.0072   Epoch: 14   Global Step: 606740   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:14,428-Speed 2627.99 samples/sec   Loss 3.6033   LearningRate 0.0072   Epoch: 14   Global Step: 606750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:55:18,319-Speed 2632.94 samples/sec   Loss 3.6340   LearningRate 0.0072   Epoch: 14   Global Step: 606760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:55:22,207-Speed 2634.22 samples/sec   Loss 3.5891   LearningRate 0.0072   Epoch: 14   Global Step: 606770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:55:26,082-Speed 2643.58 samples/sec   Loss 3.5524   LearningRate 0.0072   Epoch: 14   Global Step: 606780   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:29,981-Speed 2627.02 samples/sec   Loss 3.6713   LearningRate 0.0072   Epoch: 14   Global Step: 606790   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:33,870-Speed 2633.10 samples/sec   Loss 3.6506   LearningRate 0.0072   Epoch: 14   Global Step: 606800   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:37,760-Speed 2632.82 samples/sec   Loss 3.6650   LearningRate 0.0072   Epoch: 14   Global Step: 606810   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:41,651-Speed 2633.42 samples/sec   Loss 3.6447   LearningRate 0.0072   Epoch: 14   Global Step: 606820   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:45,544-Speed 2630.54 samples/sec   Loss 3.6439   LearningRate 0.0072   Epoch: 14   Global Step: 606830   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:49,445-Speed 2626.10 samples/sec   Loss 3.6458   LearningRate 0.0072   Epoch: 14   Global Step: 606840   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:53,336-Speed 2631.90 samples/sec   Loss 3.6462   LearningRate 0.0072   Epoch: 14   Global Step: 606850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:55:57,229-Speed 2631.90 samples/sec   Loss 3.6453   LearningRate 0.0072   Epoch: 14   Global Step: 606860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:01,119-Speed 2632.42 samples/sec   Loss 3.5670   LearningRate 0.0072   Epoch: 14   Global Step: 606870   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:05,014-Speed 2630.11 samples/sec   Loss 3.6468   LearningRate 0.0072   Epoch: 14   Global Step: 606880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:56:08,912-Speed 2627.40 samples/sec   Loss 3.6721   LearningRate 0.0072   Epoch: 14   Global Step: 606890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:56:12,814-Speed 2625.09 samples/sec   Loss 3.6265   LearningRate 0.0072   Epoch: 14   Global Step: 606900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:56:16,711-Speed 2628.13 samples/sec   Loss 3.6574   LearningRate 0.0072   Epoch: 14   Global Step: 606910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:56:20,607-Speed 2628.94 samples/sec   Loss 3.7160   LearningRate 0.0072   Epoch: 14   Global Step: 606920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:56:24,579-Speed 2579.19 samples/sec   Loss 3.6827   LearningRate 0.0072   Epoch: 14   Global Step: 606930   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:28,470-Speed 2632.55 samples/sec   Loss 3.6107   LearningRate 0.0072   Epoch: 14   Global Step: 606940   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:32,362-Speed 2631.66 samples/sec   Loss 3.6011   LearningRate 0.0072   Epoch: 14   Global Step: 606950   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:36,257-Speed 2629.71 samples/sec   Loss 3.6317   LearningRate 0.0072   Epoch: 14   Global Step: 606960   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:40,156-Speed 2626.91 samples/sec   Loss 3.5744   LearningRate 0.0072   Epoch: 14   Global Step: 606970   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:44,058-Speed 2625.79 samples/sec   Loss 3.6772   LearningRate 0.0072   Epoch: 14   Global Step: 606980   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:47,950-Speed 2631.82 samples/sec   Loss 3.6673   LearningRate 0.0072   Epoch: 14   Global Step: 606990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:51,863-Speed 2617.37 samples/sec   Loss 3.6245   LearningRate 0.0072   Epoch: 14   Global Step: 607000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:55,760-Speed 2628.24 samples/sec   Loss 3.6425   LearningRate 0.0072   Epoch: 14   Global Step: 607010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:56:59,656-Speed 2629.36 samples/sec   Loss 3.6391   LearningRate 0.0072   Epoch: 14   Global Step: 607020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:57:03,566-Speed 2619.53 samples/sec   Loss 3.6896   LearningRate 0.0072   Epoch: 14   Global Step: 607030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:07,460-Speed 2630.26 samples/sec   Loss 3.6573   LearningRate 0.0072   Epoch: 14   Global Step: 607040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:11,357-Speed 2628.79 samples/sec   Loss 3.6091   LearningRate 0.0072   Epoch: 14   Global Step: 607050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:15,257-Speed 2625.69 samples/sec   Loss 3.6084   LearningRate 0.0072   Epoch: 14   Global Step: 607060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:19,154-Speed 2628.98 samples/sec   Loss 3.6488   LearningRate 0.0072   Epoch: 14   Global Step: 607070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:23,049-Speed 2629.23 samples/sec   Loss 3.6538   LearningRate 0.0072   Epoch: 14   Global Step: 607080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:26,946-Speed 2628.35 samples/sec   Loss 3.6364   LearningRate 0.0072   Epoch: 14   Global Step: 607090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:30,847-Speed 2625.59 samples/sec   Loss 3.6357   LearningRate 0.0072   Epoch: 14   Global Step: 607100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:34,764-Speed 2615.43 samples/sec   Loss 3.6867   LearningRate 0.0072   Epoch: 14   Global Step: 607110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:38,656-Speed 2631.47 samples/sec   Loss 3.6263   LearningRate 0.0072   Epoch: 14   Global Step: 607120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:42,528-Speed 2645.14 samples/sec   Loss 3.6329   LearningRate 0.0072   Epoch: 14   Global Step: 607130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:46,428-Speed 2626.49 samples/sec   Loss 3.7171   LearningRate 0.0072   Epoch: 14   Global Step: 607140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:50,343-Speed 2616.22 samples/sec   Loss 3.6208   LearningRate 0.0072   Epoch: 14   Global Step: 607150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:54,240-Speed 2628.51 samples/sec   Loss 3.6146   LearningRate 0.0072   Epoch: 14   Global Step: 607160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:57:58,144-Speed 2623.81 samples/sec   Loss 3.6980   LearningRate 0.0072   Epoch: 14   Global Step: 607170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:02,037-Speed 2630.33 samples/sec   Loss 3.6272   LearningRate 0.0072   Epoch: 14   Global Step: 607180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:05,991-Speed 2591.10 samples/sec   Loss 3.6671   LearningRate 0.0072   Epoch: 14   Global Step: 607190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:09,925-Speed 2603.41 samples/sec   Loss 3.6886   LearningRate 0.0072   Epoch: 14   Global Step: 607200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:13,822-Speed 2628.21 samples/sec   Loss 3.6150   LearningRate 0.0072   Epoch: 14   Global Step: 607210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:17,714-Speed 2631.54 samples/sec   Loss 3.6441   LearningRate 0.0072   Epoch: 14   Global Step: 607220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:21,613-Speed 2627.00 samples/sec   Loss 3.6515   LearningRate 0.0072   Epoch: 14   Global Step: 607230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:58:25,521-Speed 2620.74 samples/sec   Loss 3.6515   LearningRate 0.0072   Epoch: 14   Global Step: 607240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:58:29,418-Speed 2628.45 samples/sec   Loss 3.6324   LearningRate 0.0072   Epoch: 14   Global Step: 607250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:58:33,294-Speed 2642.48 samples/sec   Loss 3.6298   LearningRate 0.0072   Epoch: 14   Global Step: 607260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:37,195-Speed 2625.81 samples/sec   Loss 3.6000   LearningRate 0.0072   Epoch: 14   Global Step: 607270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:41,089-Speed 2630.01 samples/sec   Loss 3.6243   LearningRate 0.0072   Epoch: 14   Global Step: 607280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:45,026-Speed 2601.84 samples/sec   Loss 3.6361   LearningRate 0.0072   Epoch: 14   Global Step: 607290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:48,927-Speed 2625.95 samples/sec   Loss 3.5715   LearningRate 0.0072   Epoch: 14   Global Step: 607300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:52,839-Speed 2618.44 samples/sec   Loss 3.6209   LearningRate 0.0072   Epoch: 14   Global Step: 607310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:58:56,738-Speed 2626.71 samples/sec   Loss 3.6544   LearningRate 0.0072   Epoch: 14   Global Step: 607320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:00,648-Speed 2619.63 samples/sec   Loss 3.6491   LearningRate 0.0072   Epoch: 14   Global Step: 607330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:04,575-Speed 2608.44 samples/sec   Loss 3.5992   LearningRate 0.0072   Epoch: 14   Global Step: 607340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:08,467-Speed 2631.55 samples/sec   Loss 3.5806   LearningRate 0.0072   Epoch: 14   Global Step: 607350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:12,371-Speed 2623.15 samples/sec   Loss 3.5896   LearningRate 0.0072   Epoch: 14   Global Step: 607360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 15:59:16,255-Speed 2637.94 samples/sec   Loss 3.6577   LearningRate 0.0072   Epoch: 14   Global Step: 607370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:20,149-Speed 2630.34 samples/sec   Loss 3.6511   LearningRate 0.0072   Epoch: 14   Global Step: 607380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:24,052-Speed 2624.74 samples/sec   Loss 3.6910   LearningRate 0.0072   Epoch: 14   Global Step: 607390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:27,946-Speed 2629.62 samples/sec   Loss 3.6547   LearningRate 0.0072   Epoch: 14   Global Step: 607400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 15:59:31,867-Speed 2612.39 samples/sec   Loss 3.5930   LearningRate 0.0072   Epoch: 14   Global Step: 607410   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:59:35,763-Speed 2629.30 samples/sec   Loss 3.6770   LearningRate 0.0072   Epoch: 14   Global Step: 607420   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:59:39,661-Speed 2627.50 samples/sec   Loss 3.5732   LearningRate 0.0072   Epoch: 14   Global Step: 607430   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:59:43,557-Speed 2629.18 samples/sec   Loss 3.6036   LearningRate 0.0072   Epoch: 14   Global Step: 607440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:59:47,497-Speed 2599.61 samples/sec   Loss 3.6948   LearningRate 0.0072   Epoch: 14   Global Step: 607450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:59:51,514-Speed 2550.24 samples/sec   Loss 3.6021   LearningRate 0.0072   Epoch: 14   Global Step: 607460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:59:55,410-Speed 2629.00 samples/sec   Loss 3.6947   LearningRate 0.0072   Epoch: 14   Global Step: 607470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 15:59:59,300-Speed 2632.67 samples/sec   Loss 3.6236   LearningRate 0.0072   Epoch: 14   Global Step: 607480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:03,196-Speed 2629.18 samples/sec   Loss 3.7089   LearningRate 0.0072   Epoch: 14   Global Step: 607490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:07,090-Speed 2630.03 samples/sec   Loss 3.6406   LearningRate 0.0072   Epoch: 14   Global Step: 607500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:10,984-Speed 2630.65 samples/sec   Loss 3.7880   LearningRate 0.0072   Epoch: 14   Global Step: 607510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:00:14,853-Speed 2647.36 samples/sec   Loss 3.6728   LearningRate 0.0072   Epoch: 14   Global Step: 607520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:18,750-Speed 2628.57 samples/sec   Loss 3.6671   LearningRate 0.0072   Epoch: 14   Global Step: 607530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:22,641-Speed 2632.35 samples/sec   Loss 3.5529   LearningRate 0.0072   Epoch: 14   Global Step: 607540   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:26,539-Speed 2627.38 samples/sec   Loss 3.6395   LearningRate 0.0072   Epoch: 14   Global Step: 607550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:30,435-Speed 2629.18 samples/sec   Loss 3.5518   LearningRate 0.0072   Epoch: 14   Global Step: 607560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:34,430-Speed 2563.66 samples/sec   Loss 3.6108   LearningRate 0.0072   Epoch: 14   Global Step: 607570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:38,328-Speed 2628.06 samples/sec   Loss 3.7007   LearningRate 0.0072   Epoch: 14   Global Step: 607580   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:42,264-Speed 2602.09 samples/sec   Loss 3.6346   LearningRate 0.0072   Epoch: 14   Global Step: 607590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:46,161-Speed 2628.50 samples/sec   Loss 3.5796   LearningRate 0.0072   Epoch: 14   Global Step: 607600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:50,073-Speed 2618.62 samples/sec   Loss 3.6319   LearningRate 0.0072   Epoch: 14   Global Step: 607610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:00:54,034-Speed 2585.53 samples/sec   Loss 3.6632   LearningRate 0.0072   Epoch: 14   Global Step: 607620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:00:57,934-Speed 2626.66 samples/sec   Loss 3.5345   LearningRate 0.0072   Epoch: 14   Global Step: 607630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:01:01,829-Speed 2629.62 samples/sec   Loss 3.6493   LearningRate 0.0072   Epoch: 14   Global Step: 607640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:01:05,727-Speed 2627.71 samples/sec   Loss 3.6901   LearningRate 0.0072   Epoch: 14   Global Step: 607650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:01:09,622-Speed 2629.61 samples/sec   Loss 3.6271   LearningRate 0.0072   Epoch: 14   Global Step: 607660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:01:13,530-Speed 2621.78 samples/sec   Loss 3.6652   LearningRate 0.0072   Epoch: 14   Global Step: 607670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:01:17,422-Speed 2631.33 samples/sec   Loss 3.6792   LearningRate 0.0072   Epoch: 14   Global Step: 607680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:01:21,298-Speed 2643.03 samples/sec   Loss 3.6494   LearningRate 0.0072   Epoch: 14   Global Step: 607690   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:25,191-Speed 2631.03 samples/sec   Loss 3.6573   LearningRate 0.0072   Epoch: 14   Global Step: 607700   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:29,087-Speed 2629.54 samples/sec   Loss 3.6125   LearningRate 0.0072   Epoch: 14   Global Step: 607710   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:32,983-Speed 2628.42 samples/sec   Loss 3.6450   LearningRate 0.0072   Epoch: 14   Global Step: 607720   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:36,898-Speed 2616.01 samples/sec   Loss 3.6507   LearningRate 0.0072   Epoch: 14   Global Step: 607730   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:40,811-Speed 2617.91 samples/sec   Loss 3.5967   LearningRate 0.0072   Epoch: 14   Global Step: 607740   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:44,704-Speed 2631.03 samples/sec   Loss 3.5474   LearningRate 0.0072   Epoch: 14   Global Step: 607750   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:48,602-Speed 2627.41 samples/sec   Loss 3.6786   LearningRate 0.0071   Epoch: 14   Global Step: 607760   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:52,501-Speed 2627.35 samples/sec   Loss 3.6399   LearningRate 0.0071   Epoch: 14   Global Step: 607770   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:01:56,398-Speed 2628.41 samples/sec   Loss 3.6131   LearningRate 0.0071   Epoch: 14   Global Step: 607780   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:00,293-Speed 2630.22 samples/sec   Loss 3.5985   LearningRate 0.0071   Epoch: 14   Global Step: 607790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:02:04,190-Speed 2627.65 samples/sec   Loss 3.6937   LearningRate 0.0071   Epoch: 14   Global Step: 607800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:02:08,100-Speed 2619.34 samples/sec   Loss 3.6388   LearningRate 0.0071   Epoch: 14   Global Step: 607810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:02:12,011-Speed 2618.84 samples/sec   Loss 3.7096   LearningRate 0.0071   Epoch: 14   Global Step: 607820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:02:15,901-Speed 2632.96 samples/sec   Loss 3.5813   LearningRate 0.0071   Epoch: 14   Global Step: 607830   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:19,799-Speed 2627.93 samples/sec   Loss 3.6100   LearningRate 0.0071   Epoch: 14   Global Step: 607840   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:23,696-Speed 2628.38 samples/sec   Loss 3.6476   LearningRate 0.0071   Epoch: 14   Global Step: 607850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:27,588-Speed 2631.76 samples/sec   Loss 3.6529   LearningRate 0.0071   Epoch: 14   Global Step: 607860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:31,488-Speed 2626.28 samples/sec   Loss 3.6782   LearningRate 0.0071   Epoch: 14   Global Step: 607870   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:35,392-Speed 2623.26 samples/sec   Loss 3.7281   LearningRate 0.0071   Epoch: 14   Global Step: 607880   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:39,303-Speed 2619.19 samples/sec   Loss 3.5954   LearningRate 0.0071   Epoch: 14   Global Step: 607890   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:43,240-Speed 2601.87 samples/sec   Loss 3.6134   LearningRate 0.0071   Epoch: 14   Global Step: 607900   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:47,138-Speed 2627.57 samples/sec   Loss 3.6180   LearningRate 0.0071   Epoch: 14   Global Step: 607910   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:51,039-Speed 2625.79 samples/sec   Loss 3.5657   LearningRate 0.0071   Epoch: 14   Global Step: 607920   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:02:54,938-Speed 2627.27 samples/sec   Loss 3.5770   LearningRate 0.0071   Epoch: 14   Global Step: 607930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:02:58,833-Speed 2629.52 samples/sec   Loss 3.6801   LearningRate 0.0071   Epoch: 14   Global Step: 607940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:02,735-Speed 2625.13 samples/sec   Loss 3.5644   LearningRate 0.0071   Epoch: 14   Global Step: 607950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:06,632-Speed 2628.13 samples/sec   Loss 3.6099   LearningRate 0.0071   Epoch: 14   Global Step: 607960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:10,530-Speed 2627.77 samples/sec   Loss 3.7189   LearningRate 0.0071   Epoch: 14   Global Step: 607970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:14,433-Speed 2624.26 samples/sec   Loss 3.6512   LearningRate 0.0071   Epoch: 14   Global Step: 607980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:18,341-Speed 2620.85 samples/sec   Loss 3.6194   LearningRate 0.0071   Epoch: 14   Global Step: 607990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:22,236-Speed 2629.89 samples/sec   Loss 3.6419   LearningRate 0.0071   Epoch: 14   Global Step: 608000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:26,134-Speed 2627.44 samples/sec   Loss 3.6603   LearningRate 0.0071   Epoch: 14   Global Step: 608010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:30,027-Speed 2632.15 samples/sec   Loss 3.6988   LearningRate 0.0071   Epoch: 14   Global Step: 608020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:33,918-Speed 2632.48 samples/sec   Loss 3.7036   LearningRate 0.0071   Epoch: 14   Global Step: 608030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:03:37,810-Speed 2631.19 samples/sec   Loss 3.6992   LearningRate 0.0071   Epoch: 14   Global Step: 608040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:41,708-Speed 2627.52 samples/sec   Loss 3.6756   LearningRate 0.0071   Epoch: 14   Global Step: 608050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:45,605-Speed 2628.76 samples/sec   Loss 3.6434   LearningRate 0.0071   Epoch: 14   Global Step: 608060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:49,499-Speed 2630.83 samples/sec   Loss 3.5832   LearningRate 0.0071   Epoch: 14   Global Step: 608070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:53,391-Speed 2631.27 samples/sec   Loss 3.5102   LearningRate 0.0071   Epoch: 14   Global Step: 608080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:03:57,296-Speed 2623.58 samples/sec   Loss 3.5864   LearningRate 0.0071   Epoch: 14   Global Step: 608090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:01,189-Speed 2630.66 samples/sec   Loss 3.5661   LearningRate 0.0071   Epoch: 14   Global Step: 608100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:05,109-Speed 2613.28 samples/sec   Loss 3.6258   LearningRate 0.0071   Epoch: 14   Global Step: 608110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:09,011-Speed 2624.69 samples/sec   Loss 3.6253   LearningRate 0.0071   Epoch: 14   Global Step: 608120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:12,905-Speed 2630.57 samples/sec   Loss 3.6828   LearningRate 0.0071   Epoch: 14   Global Step: 608130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:16,802-Speed 2628.02 samples/sec   Loss 3.6546   LearningRate 0.0071   Epoch: 14   Global Step: 608140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:04:20,710-Speed 2620.93 samples/sec   Loss 3.6028   LearningRate 0.0071   Epoch: 14   Global Step: 608150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:04:24,606-Speed 2628.93 samples/sec   Loss 3.5140   LearningRate 0.0071   Epoch: 14   Global Step: 608160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:04:28,498-Speed 2632.48 samples/sec   Loss 3.5834   LearningRate 0.0071   Epoch: 14   Global Step: 608170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:04:32,389-Speed 2632.04 samples/sec   Loss 3.5900   LearningRate 0.0071   Epoch: 14   Global Step: 608180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:04:36,263-Speed 2643.60 samples/sec   Loss 3.5957   LearningRate 0.0071   Epoch: 14   Global Step: 608190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:40,157-Speed 2630.20 samples/sec   Loss 3.5952   LearningRate 0.0071   Epoch: 14   Global Step: 608200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:44,066-Speed 2620.46 samples/sec   Loss 3.6772   LearningRate 0.0071   Epoch: 14   Global Step: 608210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:47,958-Speed 2631.72 samples/sec   Loss 3.6867   LearningRate 0.0071   Epoch: 14   Global Step: 608220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:51,862-Speed 2624.51 samples/sec   Loss 3.5946   LearningRate 0.0071   Epoch: 14   Global Step: 608230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:55,764-Speed 2624.57 samples/sec   Loss 3.6563   LearningRate 0.0071   Epoch: 14   Global Step: 608240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:04:59,752-Speed 2568.91 samples/sec   Loss 3.5903   LearningRate 0.0071   Epoch: 14   Global Step: 608250   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:03,653-Speed 2625.63 samples/sec   Loss 3.6085   LearningRate 0.0071   Epoch: 14   Global Step: 608260   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:07,554-Speed 2624.95 samples/sec   Loss 3.6654   LearningRate 0.0071   Epoch: 14   Global Step: 608270   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:11,456-Speed 2624.76 samples/sec   Loss 3.6585   LearningRate 0.0071   Epoch: 14   Global Step: 608280   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:15,351-Speed 2630.27 samples/sec   Loss 3.5600   LearningRate 0.0071   Epoch: 14   Global Step: 608290   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:19,246-Speed 2629.47 samples/sec   Loss 3.6027   LearningRate 0.0071   Epoch: 14   Global Step: 608300   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:23,138-Speed 2631.84 samples/sec   Loss 3.5906   LearningRate 0.0071   Epoch: 14   Global Step: 608310   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:27,030-Speed 2632.11 samples/sec   Loss 3.6187   LearningRate 0.0071   Epoch: 14   Global Step: 608320   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:30,925-Speed 2629.69 samples/sec   Loss 3.6669   LearningRate 0.0071   Epoch: 14   Global Step: 608330   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:34,834-Speed 2620.23 samples/sec   Loss 3.5636   LearningRate 0.0071   Epoch: 14   Global Step: 608340   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:05:38,726-Speed 2631.08 samples/sec   Loss 3.6039   LearningRate 0.0071   Epoch: 14   Global Step: 608350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:05:42,620-Speed 2631.07 samples/sec   Loss 3.5839   LearningRate 0.0071   Epoch: 14   Global Step: 608360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:05:46,514-Speed 2630.04 samples/sec   Loss 3.5540   LearningRate 0.0071   Epoch: 14   Global Step: 608370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:05:50,406-Speed 2632.22 samples/sec   Loss 3.6222   LearningRate 0.0071   Epoch: 14   Global Step: 608380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:05:54,308-Speed 2624.49 samples/sec   Loss 3.5755   LearningRate 0.0071   Epoch: 14   Global Step: 608390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:05:58,223-Speed 2617.01 samples/sec   Loss 3.6223   LearningRate 0.0071   Epoch: 14   Global Step: 608400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:06:02,117-Speed 2629.88 samples/sec   Loss 3.6600   LearningRate 0.0071   Epoch: 14   Global Step: 608410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:06:06,013-Speed 2629.04 samples/sec   Loss 3.6151   LearningRate 0.0071   Epoch: 14   Global Step: 608420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:06:09,921-Speed 2620.32 samples/sec   Loss 3.6332   LearningRate 0.0071   Epoch: 14   Global Step: 608430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:06:13,820-Speed 2627.37 samples/sec   Loss 3.6029   LearningRate 0.0071   Epoch: 14   Global Step: 608440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:06:17,693-Speed 2645.06 samples/sec   Loss 3.7057   LearningRate 0.0071   Epoch: 14   Global Step: 608450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:21,590-Speed 2628.31 samples/sec   Loss 3.6343   LearningRate 0.0071   Epoch: 14   Global Step: 608460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:25,491-Speed 2625.47 samples/sec   Loss 3.6886   LearningRate 0.0071   Epoch: 14   Global Step: 608470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:29,383-Speed 2631.87 samples/sec   Loss 3.5514   LearningRate 0.0071   Epoch: 14   Global Step: 608480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:33,280-Speed 2628.01 samples/sec   Loss 3.6356   LearningRate 0.0071   Epoch: 14   Global Step: 608490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:37,187-Speed 2621.61 samples/sec   Loss 3.6047   LearningRate 0.0071   Epoch: 14   Global Step: 608500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:41,081-Speed 2630.23 samples/sec   Loss 3.5629   LearningRate 0.0071   Epoch: 14   Global Step: 608510   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:44,980-Speed 2627.07 samples/sec   Loss 3.5633   LearningRate 0.0071   Epoch: 14   Global Step: 608520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:48,877-Speed 2628.66 samples/sec   Loss 3.6109   LearningRate 0.0071   Epoch: 14   Global Step: 608530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:52,793-Speed 2615.31 samples/sec   Loss 3.5232   LearningRate 0.0071   Epoch: 14   Global Step: 608540   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:06:56,713-Speed 2613.30 samples/sec   Loss 3.6545   LearningRate 0.0071   Epoch: 14   Global Step: 608550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:00,613-Speed 2626.54 samples/sec   Loss 3.6273   LearningRate 0.0071   Epoch: 14   Global Step: 608560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:04,503-Speed 2632.81 samples/sec   Loss 3.6906   LearningRate 0.0071   Epoch: 14   Global Step: 608570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:08,403-Speed 2626.15 samples/sec   Loss 3.5992   LearningRate 0.0071   Epoch: 14   Global Step: 608580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:12,311-Speed 2620.57 samples/sec   Loss 3.6582   LearningRate 0.0071   Epoch: 14   Global Step: 608590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:16,210-Speed 2627.67 samples/sec   Loss 3.6468   LearningRate 0.0071   Epoch: 14   Global Step: 608600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:20,104-Speed 2630.86 samples/sec   Loss 3.6104   LearningRate 0.0071   Epoch: 14   Global Step: 608610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:23,999-Speed 2629.28 samples/sec   Loss 3.5929   LearningRate 0.0071   Epoch: 14   Global Step: 608620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:27,897-Speed 2627.96 samples/sec   Loss 3.6487   LearningRate 0.0071   Epoch: 14   Global Step: 608630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:31,802-Speed 2623.04 samples/sec   Loss 3.6795   LearningRate 0.0071   Epoch: 14   Global Step: 608640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:35,696-Speed 2630.04 samples/sec   Loss 3.5737   LearningRate 0.0071   Epoch: 14   Global Step: 608650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:07:39,569-Speed 2643.85 samples/sec   Loss 3.7007   LearningRate 0.0071   Epoch: 14   Global Step: 608660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:07:43,438-Speed 2648.19 samples/sec   Loss 3.6652   LearningRate 0.0071   Epoch: 14   Global Step: 608670   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:07:47,330-Speed 2631.81 samples/sec   Loss 3.5764   LearningRate 0.0071   Epoch: 14   Global Step: 608680   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:07:51,244-Speed 2616.62 samples/sec   Loss 3.6524   LearningRate 0.0071   Epoch: 14   Global Step: 608690   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:07:55,141-Speed 2628.63 samples/sec   Loss 3.6027   LearningRate 0.0071   Epoch: 14   Global Step: 608700   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:07:59,041-Speed 2626.53 samples/sec   Loss 3.7045   LearningRate 0.0071   Epoch: 14   Global Step: 608710   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:08:02,944-Speed 2624.12 samples/sec   Loss 3.7052   LearningRate 0.0071   Epoch: 14   Global Step: 608720   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:08:06,840-Speed 2628.68 samples/sec   Loss 3.5565   LearningRate 0.0071   Epoch: 14   Global Step: 608730   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:08:10,738-Speed 2627.13 samples/sec   Loss 3.5958   LearningRate 0.0071   Epoch: 14   Global Step: 608740   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:08:14,635-Speed 2628.94 samples/sec   Loss 3.6169   LearningRate 0.0071   Epoch: 14   Global Step: 608750   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:08:18,530-Speed 2629.43 samples/sec   Loss 3.5647   LearningRate 0.0071   Epoch: 14   Global Step: 608760   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:08:22,421-Speed 2632.35 samples/sec   Loss 3.5851   LearningRate 0.0071   Epoch: 14   Global Step: 608770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:26,320-Speed 2626.99 samples/sec   Loss 3.6288   LearningRate 0.0071   Epoch: 14   Global Step: 608780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:30,212-Speed 2632.35 samples/sec   Loss 3.6296   LearningRate 0.0071   Epoch: 14   Global Step: 608790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:34,105-Speed 2630.71 samples/sec   Loss 3.6161   LearningRate 0.0071   Epoch: 14   Global Step: 608800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:38,009-Speed 2623.66 samples/sec   Loss 3.6371   LearningRate 0.0071   Epoch: 14   Global Step: 608810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:41,927-Speed 2614.46 samples/sec   Loss 3.6528   LearningRate 0.0071   Epoch: 14   Global Step: 608820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:45,822-Speed 2629.53 samples/sec   Loss 3.6600   LearningRate 0.0071   Epoch: 14   Global Step: 608830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:49,718-Speed 2629.38 samples/sec   Loss 3.6038   LearningRate 0.0071   Epoch: 14   Global Step: 608840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:53,617-Speed 2627.17 samples/sec   Loss 3.5716   LearningRate 0.0071   Epoch: 14   Global Step: 608850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:08:57,512-Speed 2629.49 samples/sec   Loss 3.6793   LearningRate 0.0071   Epoch: 14   Global Step: 608860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:09:01,403-Speed 2631.59 samples/sec   Loss 3.5670   LearningRate 0.0071   Epoch: 14   Global Step: 608870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:09:05,275-Speed 2645.78 samples/sec   Loss 3.5535   LearningRate 0.0071   Epoch: 14   Global Step: 608880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:09:09,165-Speed 2633.38 samples/sec   Loss 3.5594   LearningRate 0.0071   Epoch: 14   Global Step: 608890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:09:13,039-Speed 2648.36 samples/sec   Loss 3.6375   LearningRate 0.0071   Epoch: 14   Global Step: 608900   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:16,944-Speed 2622.83 samples/sec   Loss 3.5585   LearningRate 0.0071   Epoch: 14   Global Step: 608910   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:20,840-Speed 2629.51 samples/sec   Loss 3.6041   LearningRate 0.0071   Epoch: 14   Global Step: 608920   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:24,732-Speed 2631.31 samples/sec   Loss 3.6959   LearningRate 0.0071   Epoch: 14   Global Step: 608930   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:28,623-Speed 2632.20 samples/sec   Loss 3.5439   LearningRate 0.0071   Epoch: 14   Global Step: 608940   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:32,519-Speed 2628.86 samples/sec   Loss 3.6130   LearningRate 0.0071   Epoch: 14   Global Step: 608950   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:36,487-Speed 2581.84 samples/sec   Loss 3.6109   LearningRate 0.0071   Epoch: 14   Global Step: 608960   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:40,382-Speed 2629.59 samples/sec   Loss 3.6549   LearningRate 0.0071   Epoch: 14   Global Step: 608970   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:44,286-Speed 2623.76 samples/sec   Loss 3.6094   LearningRate 0.0071   Epoch: 14   Global Step: 608980   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:48,188-Speed 2625.75 samples/sec   Loss 3.6091   LearningRate 0.0071   Epoch: 14   Global Step: 608990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:09:52,081-Speed 2630.82 samples/sec   Loss 3.6158   LearningRate 0.0071   Epoch: 14   Global Step: 609000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:09:55,992-Speed 2618.41 samples/sec   Loss 3.7135   LearningRate 0.0071   Epoch: 14   Global Step: 609010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:09:59,898-Speed 2622.08 samples/sec   Loss 3.6820   LearningRate 0.0071   Epoch: 14   Global Step: 609020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:03,799-Speed 2626.48 samples/sec   Loss 3.5597   LearningRate 0.0071   Epoch: 14   Global Step: 609030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:07,694-Speed 2629.44 samples/sec   Loss 3.6566   LearningRate 0.0071   Epoch: 14   Global Step: 609040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:11,598-Speed 2623.50 samples/sec   Loss 3.6684   LearningRate 0.0071   Epoch: 14   Global Step: 609050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:15,512-Speed 2617.04 samples/sec   Loss 3.5827   LearningRate 0.0071   Epoch: 14   Global Step: 609060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:19,427-Speed 2616.78 samples/sec   Loss 3.6187   LearningRate 0.0071   Epoch: 14   Global Step: 609070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:23,321-Speed 2630.17 samples/sec   Loss 3.6152   LearningRate 0.0071   Epoch: 14   Global Step: 609080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:27,217-Speed 2628.55 samples/sec   Loss 3.5712   LearningRate 0.0071   Epoch: 14   Global Step: 609090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:31,107-Speed 2633.35 samples/sec   Loss 3.6267   LearningRate 0.0071   Epoch: 14   Global Step: 609100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:35,009-Speed 2625.18 samples/sec   Loss 3.6348   LearningRate 0.0071   Epoch: 14   Global Step: 609110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:38,907-Speed 2627.67 samples/sec   Loss 3.6585   LearningRate 0.0071   Epoch: 14   Global Step: 609120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:42,805-Speed 2627.46 samples/sec   Loss 3.5560   LearningRate 0.0071   Epoch: 14   Global Step: 609130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:46,696-Speed 2632.51 samples/sec   Loss 3.6169   LearningRate 0.0071   Epoch: 14   Global Step: 609140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:50,594-Speed 2628.17 samples/sec   Loss 3.5825   LearningRate 0.0071   Epoch: 14   Global Step: 609150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:54,500-Speed 2622.51 samples/sec   Loss 3.5951   LearningRate 0.0071   Epoch: 14   Global Step: 609160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:10:58,400-Speed 2626.00 samples/sec   Loss 3.6655   LearningRate 0.0071   Epoch: 14   Global Step: 609170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:02,306-Speed 2622.43 samples/sec   Loss 3.5710   LearningRate 0.0071   Epoch: 14   Global Step: 609180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:06,201-Speed 2628.84 samples/sec   Loss 3.5784   LearningRate 0.0071   Epoch: 14   Global Step: 609190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:10,118-Speed 2615.95 samples/sec   Loss 3.6138   LearningRate 0.0071   Epoch: 14   Global Step: 609200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:11:14,023-Speed 2622.37 samples/sec   Loss 3.5334   LearningRate 0.0071   Epoch: 14   Global Step: 609210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:11:17,939-Speed 2615.79 samples/sec   Loss 3.6472   LearningRate 0.0071   Epoch: 14   Global Step: 609220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:11:21,813-Speed 2644.58 samples/sec   Loss 3.6271   LearningRate 0.0071   Epoch: 14   Global Step: 609230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:25,712-Speed 2626.71 samples/sec   Loss 3.6204   LearningRate 0.0071   Epoch: 14   Global Step: 609240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:29,602-Speed 2632.88 samples/sec   Loss 3.5882   LearningRate 0.0071   Epoch: 14   Global Step: 609250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:33,511-Speed 2620.80 samples/sec   Loss 3.5475   LearningRate 0.0071   Epoch: 14   Global Step: 609260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:37,506-Speed 2563.31 samples/sec   Loss 3.6295   LearningRate 0.0071   Epoch: 14   Global Step: 609270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:41,410-Speed 2624.05 samples/sec   Loss 3.5713   LearningRate 0.0071   Epoch: 14   Global Step: 609280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:45,308-Speed 2628.03 samples/sec   Loss 3.6414   LearningRate 0.0071   Epoch: 14   Global Step: 609290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:11:49,178-Speed 2646.53 samples/sec   Loss 3.6161   LearningRate 0.0071   Epoch: 14   Global Step: 609300   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:11:53,070-Speed 2633.21 samples/sec   Loss 3.6689   LearningRate 0.0071   Epoch: 14   Global Step: 609310   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:11:56,971-Speed 2625.63 samples/sec   Loss 3.4918   LearningRate 0.0070   Epoch: 14   Global Step: 609320   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:00,866-Speed 2629.98 samples/sec   Loss 3.6025   LearningRate 0.0070   Epoch: 14   Global Step: 609330   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:04,761-Speed 2629.60 samples/sec   Loss 3.5915   LearningRate 0.0070   Epoch: 14   Global Step: 609340   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:08,667-Speed 2622.01 samples/sec   Loss 3.6635   LearningRate 0.0070   Epoch: 14   Global Step: 609350   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:12,577-Speed 2619.25 samples/sec   Loss 3.6636   LearningRate 0.0070   Epoch: 14   Global Step: 609360   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:16,470-Speed 2632.34 samples/sec   Loss 3.5382   LearningRate 0.0070   Epoch: 14   Global Step: 609370   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:20,369-Speed 2626.92 samples/sec   Loss 3.5693   LearningRate 0.0070   Epoch: 14   Global Step: 609380   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:24,266-Speed 2628.39 samples/sec   Loss 3.5657   LearningRate 0.0070   Epoch: 14   Global Step: 609390   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:12:28,157-Speed 2632.81 samples/sec   Loss 3.6278   LearningRate 0.0070   Epoch: 14   Global Step: 609400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:32,054-Speed 2628.11 samples/sec   Loss 3.6219   LearningRate 0.0070   Epoch: 14   Global Step: 609410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:35,958-Speed 2623.45 samples/sec   Loss 3.5915   LearningRate 0.0070   Epoch: 14   Global Step: 609420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:39,851-Speed 2630.79 samples/sec   Loss 3.6666   LearningRate 0.0070   Epoch: 14   Global Step: 609430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:43,752-Speed 2626.30 samples/sec   Loss 3.6199   LearningRate 0.0070   Epoch: 14   Global Step: 609440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:47,666-Speed 2616.82 samples/sec   Loss 3.5100   LearningRate 0.0070   Epoch: 14   Global Step: 609450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:51,577-Speed 2619.04 samples/sec   Loss 3.5538   LearningRate 0.0070   Epoch: 14   Global Step: 609460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:55,486-Speed 2620.06 samples/sec   Loss 3.5718   LearningRate 0.0070   Epoch: 14   Global Step: 609470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:12:59,378-Speed 2632.03 samples/sec   Loss 3.5306   LearningRate 0.0070   Epoch: 14   Global Step: 609480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:03,271-Speed 2630.86 samples/sec   Loss 3.6465   LearningRate 0.0070   Epoch: 14   Global Step: 609490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:07,244-Speed 2578.38 samples/sec   Loss 3.6238   LearningRate 0.0070   Epoch: 14   Global Step: 609500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:11,136-Speed 2631.55 samples/sec   Loss 3.5656   LearningRate 0.0070   Epoch: 14   Global Step: 609510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:15,025-Speed 2633.46 samples/sec   Loss 3.5975   LearningRate 0.0070   Epoch: 14   Global Step: 609520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:18,918-Speed 2631.28 samples/sec   Loss 3.5463   LearningRate 0.0070   Epoch: 14   Global Step: 609530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:22,828-Speed 2619.70 samples/sec   Loss 3.6788   LearningRate 0.0070   Epoch: 14   Global Step: 609540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:26,721-Speed 2630.98 samples/sec   Loss 3.7261   LearningRate 0.0070   Epoch: 14   Global Step: 609550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:30,619-Speed 2627.64 samples/sec   Loss 3.5705   LearningRate 0.0070   Epoch: 14   Global Step: 609560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:34,510-Speed 2632.47 samples/sec   Loss 3.5799   LearningRate 0.0070   Epoch: 14   Global Step: 609570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:38,398-Speed 2634.25 samples/sec   Loss 3.6526   LearningRate 0.0070   Epoch: 14   Global Step: 609580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:42,291-Speed 2631.08 samples/sec   Loss 3.5340   LearningRate 0.0070   Epoch: 14   Global Step: 609590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:46,194-Speed 2624.71 samples/sec   Loss 3.5986   LearningRate 0.0070   Epoch: 14   Global Step: 609600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-15 16:13:50,125-Speed 2610.00 samples/sec   Loss 3.5562   LearningRate 0.0070   Epoch: 14   Global Step: 609610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:54,028-Speed 2624.40 samples/sec   Loss 3.5922   LearningRate 0.0070   Epoch: 14   Global Step: 609620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:13:57,942-Speed 2617.24 samples/sec   Loss 3.5687   LearningRate 0.0070   Epoch: 14   Global Step: 609630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:14:01,822-Speed 2639.65 samples/sec   Loss 3.6110   LearningRate 0.0070   Epoch: 14   Global Step: 609640   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:05,714-Speed 2631.67 samples/sec   Loss 3.5729   LearningRate 0.0070   Epoch: 14   Global Step: 609650   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:09,610-Speed 2628.59 samples/sec   Loss 3.6005   LearningRate 0.0070   Epoch: 14   Global Step: 609660   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:13,507-Speed 2628.93 samples/sec   Loss 3.5338   LearningRate 0.0070   Epoch: 14   Global Step: 609670   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:17,424-Speed 2614.55 samples/sec   Loss 3.6147   LearningRate 0.0070   Epoch: 14   Global Step: 609680   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:21,332-Speed 2621.52 samples/sec   Loss 3.5281   LearningRate 0.0070   Epoch: 14   Global Step: 609690   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:25,229-Speed 2627.61 samples/sec   Loss 3.6293   LearningRate 0.0070   Epoch: 14   Global Step: 609700   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:29,128-Speed 2627.55 samples/sec   Loss 3.6215   LearningRate 0.0070   Epoch: 14   Global Step: 609710   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:33,029-Speed 2626.04 samples/sec   Loss 3.5991   LearningRate 0.0070   Epoch: 14   Global Step: 609720   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:36,952-Speed 2610.69 samples/sec   Loss 3.6171   LearningRate 0.0070   Epoch: 14   Global Step: 609730   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:14:40,853-Speed 2625.61 samples/sec   Loss 3.6148   LearningRate 0.0070   Epoch: 14   Global Step: 609740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:14:44,748-Speed 2629.77 samples/sec   Loss 3.5848   LearningRate 0.0070   Epoch: 14   Global Step: 609750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:14:48,639-Speed 2632.36 samples/sec   Loss 3.5654   LearningRate 0.0070   Epoch: 14   Global Step: 609760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:14:52,549-Speed 2619.45 samples/sec   Loss 3.5293   LearningRate 0.0070   Epoch: 14   Global Step: 609770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:14:56,448-Speed 2627.50 samples/sec   Loss 3.6051   LearningRate 0.0070   Epoch: 14   Global Step: 609780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:15:00,359-Speed 2619.02 samples/sec   Loss 3.5595   LearningRate 0.0070   Epoch: 14   Global Step: 609790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:15:04,281-Speed 2611.83 samples/sec   Loss 3.6877   LearningRate 0.0070   Epoch: 14   Global Step: 609800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:15:08,155-Speed 2643.92 samples/sec   Loss 3.5740   LearningRate 0.0070   Epoch: 14   Global Step: 609810   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:12,047-Speed 2631.09 samples/sec   Loss 3.5776   LearningRate 0.0070   Epoch: 14   Global Step: 609820   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:15,946-Speed 2627.62 samples/sec   Loss 3.5385   LearningRate 0.0070   Epoch: 14   Global Step: 609830   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:19,840-Speed 2630.08 samples/sec   Loss 3.5548   LearningRate 0.0070   Epoch: 14   Global Step: 609840   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:23,758-Speed 2614.65 samples/sec   Loss 3.6093   LearningRate 0.0070   Epoch: 14   Global Step: 609850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:27,650-Speed 2631.60 samples/sec   Loss 3.6325   LearningRate 0.0070   Epoch: 14   Global Step: 609860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:31,543-Speed 2631.79 samples/sec   Loss 3.5732   LearningRate 0.0070   Epoch: 14   Global Step: 609870   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:35,447-Speed 2623.28 samples/sec   Loss 3.5748   LearningRate 0.0070   Epoch: 14   Global Step: 609880   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:39,339-Speed 2632.35 samples/sec   Loss 3.5370   LearningRate 0.0070   Epoch: 14   Global Step: 609890   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:43,238-Speed 2626.65 samples/sec   Loss 3.5233   LearningRate 0.0070   Epoch: 14   Global Step: 609900   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:15:47,135-Speed 2628.42 samples/sec   Loss 3.5728   LearningRate 0.0070   Epoch: 14   Global Step: 609910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:15:51,047-Speed 2618.49 samples/sec   Loss 3.5883   LearningRate 0.0070   Epoch: 14   Global Step: 609920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:15:54,940-Speed 2631.03 samples/sec   Loss 3.5451   LearningRate 0.0070   Epoch: 14   Global Step: 609930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:15:58,845-Speed 2623.08 samples/sec   Loss 3.6223   LearningRate 0.0070   Epoch: 14   Global Step: 609940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:16:02,762-Speed 2615.07 samples/sec   Loss 3.6136   LearningRate 0.0070   Epoch: 14   Global Step: 609950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:16:06,654-Speed 2631.52 samples/sec   Loss 3.5919   LearningRate 0.0070   Epoch: 14   Global Step: 609960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:16:10,549-Speed 2630.03 samples/sec   Loss 3.5430   LearningRate 0.0070   Epoch: 14   Global Step: 609970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:16:14,444-Speed 2629.40 samples/sec   Loss 3.6058   LearningRate 0.0070   Epoch: 14   Global Step: 609980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:16:18,314-Speed 2646.98 samples/sec   Loss 3.7054   LearningRate 0.0070   Epoch: 14   Global Step: 609990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:16:22,208-Speed 2630.79 samples/sec   Loss 3.6040   LearningRate 0.0070   Epoch: 14   Global Step: 610000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:17:05,040-[lfw][610000]XNorm: 22.652629
Training: 2022-04-15 16:17:05,041-[lfw][610000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 16:17:05,041-[lfw][610000]Accuracy-Highest: 0.99800
Training: 2022-04-15 16:17:55,210-[cfp_fp][610000]XNorm: 21.952519
Training: 2022-04-15 16:17:55,211-[cfp_fp][610000]Accuracy-Flip: 0.99057+-0.00504
Training: 2022-04-15 16:17:55,212-[cfp_fp][610000]Accuracy-Highest: 0.99143
Training: 2022-04-15 16:18:38,630-[agedb_30][610000]XNorm: 22.843816
Training: 2022-04-15 16:18:38,630-[agedb_30][610000]Accuracy-Flip: 0.98150+-0.00589
Training: 2022-04-15 16:18:38,631-[agedb_30][610000]Accuracy-Highest: 0.98150
Training: 2022-04-15 16:18:42,504-Speed 72.99 samples/sec   Loss 3.5391   LearningRate 0.0070   Epoch: 14   Global Step: 610010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:18:46,371-Speed 2648.66 samples/sec   Loss 3.5281   LearningRate 0.0070   Epoch: 14   Global Step: 610020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:18:50,236-Speed 2649.83 samples/sec   Loss 3.5712   LearningRate 0.0070   Epoch: 14   Global Step: 610030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:18:54,105-Speed 2648.20 samples/sec   Loss 3.6348   LearningRate 0.0070   Epoch: 14   Global Step: 610040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:18:57,981-Speed 2642.65 samples/sec   Loss 3.5932   LearningRate 0.0070   Epoch: 14   Global Step: 610050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:01,859-Speed 2642.62 samples/sec   Loss 3.5525   LearningRate 0.0070   Epoch: 14   Global Step: 610060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:05,736-Speed 2641.87 samples/sec   Loss 3.6197   LearningRate 0.0070   Epoch: 14   Global Step: 610070   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:09,614-Speed 2641.29 samples/sec   Loss 3.6069   LearningRate 0.0070   Epoch: 14   Global Step: 610080   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:13,497-Speed 2637.95 samples/sec   Loss 3.6902   LearningRate 0.0070   Epoch: 14   Global Step: 610090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:19:17,394-Speed 2628.56 samples/sec   Loss 3.6035   LearningRate 0.0070   Epoch: 14   Global Step: 610100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:19:21,288-Speed 2630.28 samples/sec   Loss 3.6624   LearningRate 0.0070   Epoch: 14   Global Step: 610110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:19:25,179-Speed 2632.54 samples/sec   Loss 3.6168   LearningRate 0.0070   Epoch: 14   Global Step: 610120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:19:29,068-Speed 2634.39 samples/sec   Loss 3.5496   LearningRate 0.0070   Epoch: 14   Global Step: 610130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:19:32,934-Speed 2648.94 samples/sec   Loss 3.6636   LearningRate 0.0070   Epoch: 14   Global Step: 610140   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:36,821-Speed 2635.25 samples/sec   Loss 3.5468   LearningRate 0.0070   Epoch: 14   Global Step: 610150   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:40,707-Speed 2635.96 samples/sec   Loss 3.6159   LearningRate 0.0070   Epoch: 14   Global Step: 610160   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:44,599-Speed 2631.36 samples/sec   Loss 3.6031   LearningRate 0.0070   Epoch: 14   Global Step: 610170   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:48,511-Speed 2618.31 samples/sec   Loss 3.5810   LearningRate 0.0070   Epoch: 14   Global Step: 610180   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:52,408-Speed 2628.76 samples/sec   Loss 3.5550   LearningRate 0.0070   Epoch: 14   Global Step: 610190   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:19:56,303-Speed 2629.58 samples/sec   Loss 3.6036   LearningRate 0.0070   Epoch: 14   Global Step: 610200   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:20:00,199-Speed 2629.45 samples/sec   Loss 3.6514   LearningRate 0.0070   Epoch: 14   Global Step: 610210   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:20:04,093-Speed 2630.10 samples/sec   Loss 3.5972   LearningRate 0.0070   Epoch: 14   Global Step: 610220   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:20:07,989-Speed 2628.80 samples/sec   Loss 3.4851   LearningRate 0.0070   Epoch: 14   Global Step: 610230   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:20:11,995-Speed 2556.65 samples/sec   Loss 3.6154   LearningRate 0.0070   Epoch: 14   Global Step: 610240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:15,988-Speed 2566.05 samples/sec   Loss 3.6135   LearningRate 0.0070   Epoch: 14   Global Step: 610250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:19,974-Speed 2569.36 samples/sec   Loss 3.6397   LearningRate 0.0070   Epoch: 14   Global Step: 610260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:23,867-Speed 2631.26 samples/sec   Loss 3.5847   LearningRate 0.0070   Epoch: 14   Global Step: 610270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:27,763-Speed 2628.33 samples/sec   Loss 3.5486   LearningRate 0.0070   Epoch: 14   Global Step: 610280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:31,662-Speed 2627.55 samples/sec   Loss 3.6280   LearningRate 0.0070   Epoch: 14   Global Step: 610290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:35,570-Speed 2620.59 samples/sec   Loss 3.6279   LearningRate 0.0070   Epoch: 14   Global Step: 610300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:39,463-Speed 2631.19 samples/sec   Loss 3.5977   LearningRate 0.0070   Epoch: 14   Global Step: 610310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:43,358-Speed 2629.28 samples/sec   Loss 3.6628   LearningRate 0.0070   Epoch: 14   Global Step: 610320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:47,249-Speed 2632.87 samples/sec   Loss 3.5907   LearningRate 0.0070   Epoch: 14   Global Step: 610330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:51,115-Speed 2649.22 samples/sec   Loss 3.5594   LearningRate 0.0070   Epoch: 14   Global Step: 610340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:55,034-Speed 2613.64 samples/sec   Loss 3.6000   LearningRate 0.0070   Epoch: 14   Global Step: 610350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:20:58,930-Speed 2629.05 samples/sec   Loss 3.5681   LearningRate 0.0070   Epoch: 14   Global Step: 610360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:21:02,810-Speed 2639.70 samples/sec   Loss 3.6194   LearningRate 0.0070   Epoch: 14   Global Step: 610370   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:06,706-Speed 2629.01 samples/sec   Loss 3.5996   LearningRate 0.0070   Epoch: 14   Global Step: 610380   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:10,601-Speed 2629.66 samples/sec   Loss 3.6127   LearningRate 0.0070   Epoch: 14   Global Step: 610390   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:14,491-Speed 2633.18 samples/sec   Loss 3.5399   LearningRate 0.0070   Epoch: 14   Global Step: 610400   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:18,387-Speed 2629.19 samples/sec   Loss 3.6403   LearningRate 0.0070   Epoch: 14   Global Step: 610410   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:22,279-Speed 2631.86 samples/sec   Loss 3.5383   LearningRate 0.0070   Epoch: 14   Global Step: 610420   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:26,169-Speed 2632.39 samples/sec   Loss 3.6140   LearningRate 0.0070   Epoch: 14   Global Step: 610430   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:30,063-Speed 2630.50 samples/sec   Loss 3.4986   LearningRate 0.0070   Epoch: 14   Global Step: 610440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:33,955-Speed 2631.68 samples/sec   Loss 3.5947   LearningRate 0.0070   Epoch: 14   Global Step: 610450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:37,846-Speed 2632.47 samples/sec   Loss 3.6070   LearningRate 0.0070   Epoch: 14   Global Step: 610460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:21:41,742-Speed 2628.73 samples/sec   Loss 3.5408   LearningRate 0.0070   Epoch: 14   Global Step: 610470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:21:45,634-Speed 2631.83 samples/sec   Loss 3.6398   LearningRate 0.0070   Epoch: 14   Global Step: 610480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:21:49,532-Speed 2627.75 samples/sec   Loss 3.6454   LearningRate 0.0070   Epoch: 14   Global Step: 610490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:21:53,433-Speed 2626.01 samples/sec   Loss 3.5311   LearningRate 0.0070   Epoch: 14   Global Step: 610500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:21:57,361-Speed 2607.03 samples/sec   Loss 3.6245   LearningRate 0.0070   Epoch: 14   Global Step: 610510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:22:01,232-Speed 2645.86 samples/sec   Loss 3.5985   LearningRate 0.0070   Epoch: 14   Global Step: 610520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:05,144-Speed 2618.69 samples/sec   Loss 3.6149   LearningRate 0.0070   Epoch: 14   Global Step: 610530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:09,057-Speed 2616.90 samples/sec   Loss 3.5588   LearningRate 0.0070   Epoch: 14   Global Step: 610540   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:12,941-Speed 2636.93 samples/sec   Loss 3.5803   LearningRate 0.0070   Epoch: 14   Global Step: 610550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:16,834-Speed 2631.65 samples/sec   Loss 3.5502   LearningRate 0.0070   Epoch: 14   Global Step: 610560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:20,731-Speed 2628.47 samples/sec   Loss 3.6550   LearningRate 0.0070   Epoch: 14   Global Step: 610570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:24,641-Speed 2619.50 samples/sec   Loss 3.5494   LearningRate 0.0070   Epoch: 14   Global Step: 610580   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:28,542-Speed 2625.25 samples/sec   Loss 3.6003   LearningRate 0.0070   Epoch: 14   Global Step: 610590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:32,440-Speed 2628.35 samples/sec   Loss 3.5880   LearningRate 0.0070   Epoch: 14   Global Step: 610600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:36,338-Speed 2627.16 samples/sec   Loss 3.5751   LearningRate 0.0070   Epoch: 14   Global Step: 610610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:22:40,280-Speed 2598.19 samples/sec   Loss 3.5863   LearningRate 0.0070   Epoch: 14   Global Step: 610620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:22:44,176-Speed 2628.89 samples/sec   Loss 3.5565   LearningRate 0.0070   Epoch: 14   Global Step: 610630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:22:48,084-Speed 2620.80 samples/sec   Loss 3.6249   LearningRate 0.0070   Epoch: 14   Global Step: 610640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:22:51,987-Speed 2623.86 samples/sec   Loss 3.5225   LearningRate 0.0070   Epoch: 14   Global Step: 610650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:22:55,891-Speed 2624.06 samples/sec   Loss 3.6183   LearningRate 0.0070   Epoch: 14   Global Step: 610660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:22:59,798-Speed 2621.39 samples/sec   Loss 3.5276   LearningRate 0.0070   Epoch: 14   Global Step: 610670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:03,700-Speed 2624.49 samples/sec   Loss 3.5722   LearningRate 0.0070   Epoch: 14   Global Step: 610680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:07,597-Speed 2628.37 samples/sec   Loss 3.5954   LearningRate 0.0070   Epoch: 14   Global Step: 610690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:11,496-Speed 2626.73 samples/sec   Loss 3.6075   LearningRate 0.0070   Epoch: 14   Global Step: 610700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:15,530-Speed 2539.72 samples/sec   Loss 3.5820   LearningRate 0.0070   Epoch: 14   Global Step: 610710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:19,406-Speed 2642.01 samples/sec   Loss 3.5871   LearningRate 0.0070   Epoch: 14   Global Step: 610720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:23,303-Speed 2628.98 samples/sec   Loss 3.5771   LearningRate 0.0070   Epoch: 14   Global Step: 610730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:27,245-Speed 2598.09 samples/sec   Loss 3.6334   LearningRate 0.0070   Epoch: 14   Global Step: 610740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:31,148-Speed 2623.86 samples/sec   Loss 3.5706   LearningRate 0.0070   Epoch: 14   Global Step: 610750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:35,041-Speed 2631.03 samples/sec   Loss 3.5658   LearningRate 0.0070   Epoch: 14   Global Step: 610760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:23:38,915-Speed 2643.44 samples/sec   Loss 3.6017   LearningRate 0.0070   Epoch: 14   Global Step: 610770   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:23:42,808-Speed 2631.40 samples/sec   Loss 3.5627   LearningRate 0.0070   Epoch: 14   Global Step: 610780   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:23:46,695-Speed 2634.63 samples/sec   Loss 3.6102   LearningRate 0.0070   Epoch: 14   Global Step: 610790   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:23:50,584-Speed 2633.96 samples/sec   Loss 3.5426   LearningRate 0.0070   Epoch: 14   Global Step: 610800   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:23:54,476-Speed 2631.73 samples/sec   Loss 3.5482   LearningRate 0.0070   Epoch: 14   Global Step: 610810   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:23:58,369-Speed 2631.19 samples/sec   Loss 3.5682   LearningRate 0.0070   Epoch: 14   Global Step: 610820   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:24:02,279-Speed 2620.16 samples/sec   Loss 3.5022   LearningRate 0.0070   Epoch: 14   Global Step: 610830   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:24:06,177-Speed 2627.34 samples/sec   Loss 3.5460   LearningRate 0.0070   Epoch: 14   Global Step: 610840   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:24:10,075-Speed 2627.30 samples/sec   Loss 3.6825   LearningRate 0.0070   Epoch: 14   Global Step: 610850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:24:13,976-Speed 2625.01 samples/sec   Loss 3.5726   LearningRate 0.0070   Epoch: 14   Global Step: 610860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:24:17,875-Speed 2627.45 samples/sec   Loss 3.6076   LearningRate 0.0070   Epoch: 14   Global Step: 610870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:21,774-Speed 2626.84 samples/sec   Loss 3.5808   LearningRate 0.0069   Epoch: 14   Global Step: 610880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:25,673-Speed 2626.91 samples/sec   Loss 3.5508   LearningRate 0.0069   Epoch: 14   Global Step: 610890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:29,595-Speed 2611.57 samples/sec   Loss 3.5879   LearningRate 0.0069   Epoch: 14   Global Step: 610900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:33,494-Speed 2626.77 samples/sec   Loss 3.5650   LearningRate 0.0069   Epoch: 14   Global Step: 610910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:37,410-Speed 2615.86 samples/sec   Loss 3.5143   LearningRate 0.0069   Epoch: 14   Global Step: 610920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:41,325-Speed 2616.64 samples/sec   Loss 3.5943   LearningRate 0.0069   Epoch: 14   Global Step: 610930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:45,219-Speed 2629.92 samples/sec   Loss 3.5872   LearningRate 0.0069   Epoch: 14   Global Step: 610940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:49,109-Speed 2632.53 samples/sec   Loss 3.6103   LearningRate 0.0069   Epoch: 14   Global Step: 610950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:53,211-Speed 2497.66 samples/sec   Loss 3.5759   LearningRate 0.0069   Epoch: 14   Global Step: 610960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:24:57,081-Speed 2647.06 samples/sec   Loss 3.6703   LearningRate 0.0069   Epoch: 14   Global Step: 610970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:25:00,976-Speed 2629.46 samples/sec   Loss 3.5730   LearningRate 0.0069   Epoch: 14   Global Step: 610980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:25:04,870-Speed 2630.31 samples/sec   Loss 3.5467   LearningRate 0.0069   Epoch: 14   Global Step: 610990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:25:08,807-Speed 2602.21 samples/sec   Loss 3.6113   LearningRate 0.0069   Epoch: 14   Global Step: 611000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:25:12,704-Speed 2628.27 samples/sec   Loss 3.5938   LearningRate 0.0069   Epoch: 14   Global Step: 611010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:25:16,577-Speed 2644.33 samples/sec   Loss 3.4893   LearningRate 0.0069   Epoch: 14   Global Step: 611020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:20,469-Speed 2631.73 samples/sec   Loss 3.6312   LearningRate 0.0069   Epoch: 14   Global Step: 611030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:24,364-Speed 2629.79 samples/sec   Loss 3.4681   LearningRate 0.0069   Epoch: 14   Global Step: 611040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:28,259-Speed 2630.35 samples/sec   Loss 3.5627   LearningRate 0.0069   Epoch: 14   Global Step: 611050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:32,151-Speed 2631.55 samples/sec   Loss 3.5210   LearningRate 0.0069   Epoch: 14   Global Step: 611060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:36,058-Speed 2621.66 samples/sec   Loss 3.6306   LearningRate 0.0069   Epoch: 14   Global Step: 611070   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:39,948-Speed 2632.96 samples/sec   Loss 3.5771   LearningRate 0.0069   Epoch: 14   Global Step: 611080   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:43,883-Speed 2602.97 samples/sec   Loss 3.6620   LearningRate 0.0069   Epoch: 14   Global Step: 611090   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:47,773-Speed 2633.07 samples/sec   Loss 3.5453   LearningRate 0.0069   Epoch: 14   Global Step: 611100   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:51,695-Speed 2612.46 samples/sec   Loss 3.5398   LearningRate 0.0069   Epoch: 14   Global Step: 611110   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:25:55,602-Speed 2621.45 samples/sec   Loss 3.6383   LearningRate 0.0069   Epoch: 14   Global Step: 611120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:25:59,476-Speed 2643.91 samples/sec   Loss 3.6741   LearningRate 0.0069   Epoch: 14   Global Step: 611130   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:03,372-Speed 2628.72 samples/sec   Loss 3.6035   LearningRate 0.0069   Epoch: 14   Global Step: 611140   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:07,387-Speed 2551.56 samples/sec   Loss 3.6188   LearningRate 0.0069   Epoch: 14   Global Step: 611150   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:11,365-Speed 2574.72 samples/sec   Loss 3.7016   LearningRate 0.0069   Epoch: 14   Global Step: 611160   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:15,263-Speed 2627.56 samples/sec   Loss 3.5370   LearningRate 0.0069   Epoch: 14   Global Step: 611170   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:19,166-Speed 2623.93 samples/sec   Loss 3.5613   LearningRate 0.0069   Epoch: 14   Global Step: 611180   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:23,067-Speed 2625.71 samples/sec   Loss 3.5961   LearningRate 0.0069   Epoch: 14   Global Step: 611190   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:26,966-Speed 2627.41 samples/sec   Loss 3.5481   LearningRate 0.0069   Epoch: 14   Global Step: 611200   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:30,861-Speed 2629.24 samples/sec   Loss 3.5380   LearningRate 0.0069   Epoch: 14   Global Step: 611210   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:34,762-Speed 2625.46 samples/sec   Loss 3.5504   LearningRate 0.0069   Epoch: 14   Global Step: 611220   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:38,686-Speed 2610.54 samples/sec   Loss 3.6288   LearningRate 0.0069   Epoch: 14   Global Step: 611230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:26:42,579-Speed 2631.29 samples/sec   Loss 3.5597   LearningRate 0.0069   Epoch: 14   Global Step: 611240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:26:46,472-Speed 2631.18 samples/sec   Loss 3.5425   LearningRate 0.0069   Epoch: 14   Global Step: 611250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:26:50,346-Speed 2643.82 samples/sec   Loss 3.4998   LearningRate 0.0069   Epoch: 14   Global Step: 611260   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:54,260-Speed 2617.38 samples/sec   Loss 3.6536   LearningRate 0.0069   Epoch: 14   Global Step: 611270   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:26:58,160-Speed 2625.58 samples/sec   Loss 3.6140   LearningRate 0.0069   Epoch: 14   Global Step: 611280   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:02,062-Speed 2624.85 samples/sec   Loss 3.5746   LearningRate 0.0069   Epoch: 14   Global Step: 611290   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:05,970-Speed 2621.15 samples/sec   Loss 3.5636   LearningRate 0.0069   Epoch: 14   Global Step: 611300   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:09,864-Speed 2630.53 samples/sec   Loss 3.5948   LearningRate 0.0069   Epoch: 14   Global Step: 611310   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:13,757-Speed 2630.84 samples/sec   Loss 3.5331   LearningRate 0.0069   Epoch: 14   Global Step: 611320   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:17,653-Speed 2629.75 samples/sec   Loss 3.5883   LearningRate 0.0069   Epoch: 14   Global Step: 611330   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:21,549-Speed 2629.25 samples/sec   Loss 3.5083   LearningRate 0.0069   Epoch: 14   Global Step: 611340   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:25,448-Speed 2627.03 samples/sec   Loss 3.5871   LearningRate 0.0069   Epoch: 14   Global Step: 611350   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:27:29,343-Speed 2630.37 samples/sec   Loss 3.6160   LearningRate 0.0069   Epoch: 14   Global Step: 611360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:27:33,239-Speed 2628.56 samples/sec   Loss 3.5705   LearningRate 0.0069   Epoch: 14   Global Step: 611370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:27:37,136-Speed 2628.09 samples/sec   Loss 3.5952   LearningRate 0.0069   Epoch: 14   Global Step: 611380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:27:41,035-Speed 2626.53 samples/sec   Loss 3.5869   LearningRate 0.0069   Epoch: 14   Global Step: 611390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:27:44,931-Speed 2629.83 samples/sec   Loss 3.5921   LearningRate 0.0069   Epoch: 14   Global Step: 611400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:27:48,832-Speed 2625.93 samples/sec   Loss 3.5651   LearningRate 0.0069   Epoch: 14   Global Step: 611410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:27:52,724-Speed 2631.95 samples/sec   Loss 3.5771   LearningRate 0.0069   Epoch: 14   Global Step: 611420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:27:56,621-Speed 2628.23 samples/sec   Loss 3.6125   LearningRate 0.0069   Epoch: 14   Global Step: 611430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:00,528-Speed 2621.82 samples/sec   Loss 3.5411   LearningRate 0.0069   Epoch: 14   Global Step: 611440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:04,423-Speed 2628.90 samples/sec   Loss 3.6647   LearningRate 0.0069   Epoch: 14   Global Step: 611450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:08,304-Speed 2638.95 samples/sec   Loss 3.5039   LearningRate 0.0069   Epoch: 14   Global Step: 611460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:12,202-Speed 2628.23 samples/sec   Loss 3.5440   LearningRate 0.0069   Epoch: 14   Global Step: 611470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:16,116-Speed 2617.00 samples/sec   Loss 3.5637   LearningRate 0.0069   Epoch: 14   Global Step: 611480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:20,029-Speed 2617.17 samples/sec   Loss 3.6540   LearningRate 0.0069   Epoch: 14   Global Step: 611490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:23,927-Speed 2628.51 samples/sec   Loss 3.6091   LearningRate 0.0069   Epoch: 14   Global Step: 611500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:27,823-Speed 2628.84 samples/sec   Loss 3.5062   LearningRate 0.0069   Epoch: 14   Global Step: 611510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:28:31,700-Speed 2641.98 samples/sec   Loss 3.6815   LearningRate 0.0069   Epoch: 14   Global Step: 611520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:28:35,594-Speed 2629.61 samples/sec   Loss 3.6816   LearningRate 0.0069   Epoch: 14   Global Step: 611530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:28:39,509-Speed 2616.63 samples/sec   Loss 3.4888   LearningRate 0.0069   Epoch: 14   Global Step: 611540   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:28:43,409-Speed 2625.70 samples/sec   Loss 3.5984   LearningRate 0.0069   Epoch: 14   Global Step: 611550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:28:47,347-Speed 2601.42 samples/sec   Loss 3.6309   LearningRate 0.0069   Epoch: 14   Global Step: 611560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:28:51,258-Speed 2618.69 samples/sec   Loss 3.5125   LearningRate 0.0069   Epoch: 14   Global Step: 611570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:28:55,174-Speed 2615.81 samples/sec   Loss 3.6456   LearningRate 0.0069   Epoch: 14   Global Step: 611580   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:28:59,076-Speed 2625.20 samples/sec   Loss 3.5778   LearningRate 0.0069   Epoch: 14   Global Step: 611590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:29:02,979-Speed 2624.78 samples/sec   Loss 3.5811   LearningRate 0.0069   Epoch: 14   Global Step: 611600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:29:06,883-Speed 2623.15 samples/sec   Loss 3.5659   LearningRate 0.0069   Epoch: 14   Global Step: 611610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-15 16:29:10,778-Speed 2630.00 samples/sec   Loss 3.5919   LearningRate 0.0069   Epoch: 14   Global Step: 611620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:29:14,692-Speed 2616.58 samples/sec   Loss 3.5996   LearningRate 0.0069   Epoch: 14   Global Step: 611630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-15 16:29:18,591-Speed 2626.75 samples/sec   Loss 3.6210   LearningRate 0.0069   Epoch: 14   Global Step: 611640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:29:22,491-Speed 2627.18 samples/sec   Loss 3.5889   LearningRate 0.0069   Epoch: 14   Global Step: 611650   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:26,387-Speed 2628.54 samples/sec   Loss 3.5781   LearningRate 0.0069   Epoch: 14   Global Step: 611660   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:30,287-Speed 2626.63 samples/sec   Loss 3.5789   LearningRate 0.0069   Epoch: 14   Global Step: 611670   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:34,198-Speed 2618.84 samples/sec   Loss 3.5206   LearningRate 0.0069   Epoch: 14   Global Step: 611680   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:38,108-Speed 2619.46 samples/sec   Loss 3.5956   LearningRate 0.0069   Epoch: 14   Global Step: 611690   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:42,007-Speed 2626.94 samples/sec   Loss 3.5756   LearningRate 0.0069   Epoch: 14   Global Step: 611700   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:45,905-Speed 2627.97 samples/sec   Loss 3.5746   LearningRate 0.0069   Epoch: 14   Global Step: 611710   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:49,803-Speed 2627.05 samples/sec   Loss 3.6700   LearningRate 0.0069   Epoch: 14   Global Step: 611720   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:53,701-Speed 2627.88 samples/sec   Loss 3.5419   LearningRate 0.0069   Epoch: 14   Global Step: 611730   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:29:57,598-Speed 2628.25 samples/sec   Loss 3.6405   LearningRate 0.0069   Epoch: 14   Global Step: 611740   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:30:01,495-Speed 2628.56 samples/sec   Loss 3.5262   LearningRate 0.0069   Epoch: 14   Global Step: 611750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:05,396-Speed 2625.11 samples/sec   Loss 3.5337   LearningRate 0.0069   Epoch: 14   Global Step: 611760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:09,296-Speed 2626.42 samples/sec   Loss 3.5728   LearningRate 0.0069   Epoch: 14   Global Step: 611770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:13,193-Speed 2627.70 samples/sec   Loss 3.5990   LearningRate 0.0069   Epoch: 14   Global Step: 611780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:17,093-Speed 2627.07 samples/sec   Loss 3.5314   LearningRate 0.0069   Epoch: 14   Global Step: 611790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:20,993-Speed 2626.08 samples/sec   Loss 3.5926   LearningRate 0.0069   Epoch: 14   Global Step: 611800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:24,897-Speed 2623.36 samples/sec   Loss 3.5484   LearningRate 0.0069   Epoch: 14   Global Step: 611810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:28,797-Speed 2626.86 samples/sec   Loss 3.5535   LearningRate 0.0069   Epoch: 14   Global Step: 611820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:32,703-Speed 2622.33 samples/sec   Loss 3.6215   LearningRate 0.0069   Epoch: 14   Global Step: 611830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:36,602-Speed 2626.71 samples/sec   Loss 3.5488   LearningRate 0.0069   Epoch: 14   Global Step: 611840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:40,482-Speed 2639.77 samples/sec   Loss 3.5041   LearningRate 0.0069   Epoch: 14   Global Step: 611850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:44,494-Speed 2552.70 samples/sec   Loss 3.5678   LearningRate 0.0069   Epoch: 14   Global Step: 611860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:48,623-Speed 2480.65 samples/sec   Loss 3.5717   LearningRate 0.0069   Epoch: 14   Global Step: 611870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:52,545-Speed 2611.61 samples/sec   Loss 3.5659   LearningRate 0.0069   Epoch: 14   Global Step: 611880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:30:56,444-Speed 2627.16 samples/sec   Loss 3.5070   LearningRate 0.0069   Epoch: 14   Global Step: 611890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:31:00,348-Speed 2624.10 samples/sec   Loss 3.6115   LearningRate 0.0069   Epoch: 14   Global Step: 611900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:31:04,249-Speed 2624.93 samples/sec   Loss 3.5377   LearningRate 0.0069   Epoch: 14   Global Step: 611910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:31:08,128-Speed 2641.81 samples/sec   Loss 3.6420   LearningRate 0.0069   Epoch: 14   Global Step: 611920   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:12,138-Speed 2554.07 samples/sec   Loss 3.5596   LearningRate 0.0069   Epoch: 14   Global Step: 611930   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:16,040-Speed 2624.83 samples/sec   Loss 3.5133   LearningRate 0.0069   Epoch: 14   Global Step: 611940   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:19,937-Speed 2627.85 samples/sec   Loss 3.4750   LearningRate 0.0069   Epoch: 14   Global Step: 611950   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:23,835-Speed 2627.25 samples/sec   Loss 3.5374   LearningRate 0.0069   Epoch: 14   Global Step: 611960   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:27,749-Speed 2617.76 samples/sec   Loss 3.5837   LearningRate 0.0069   Epoch: 14   Global Step: 611970   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:31,658-Speed 2620.26 samples/sec   Loss 3.5340   LearningRate 0.0069   Epoch: 14   Global Step: 611980   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:35,567-Speed 2620.47 samples/sec   Loss 3.6262   LearningRate 0.0069   Epoch: 14   Global Step: 611990   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:39,465-Speed 2627.54 samples/sec   Loss 3.5404   LearningRate 0.0069   Epoch: 14   Global Step: 612000   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:43,369-Speed 2622.88 samples/sec   Loss 3.5722   LearningRate 0.0069   Epoch: 14   Global Step: 612010   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:31:47,269-Speed 2626.60 samples/sec   Loss 3.5462   LearningRate 0.0069   Epoch: 14   Global Step: 612020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:31:51,188-Speed 2613.72 samples/sec   Loss 3.6054   LearningRate 0.0069   Epoch: 14   Global Step: 612030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:31:55,106-Speed 2613.45 samples/sec   Loss 3.5333   LearningRate 0.0069   Epoch: 14   Global Step: 612040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:31:59,161-Speed 2526.38 samples/sec   Loss 3.5350   LearningRate 0.0069   Epoch: 14   Global Step: 612050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:03,066-Speed 2622.99 samples/sec   Loss 3.5290   LearningRate 0.0069   Epoch: 14   Global Step: 612060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:06,968-Speed 2625.67 samples/sec   Loss 3.5284   LearningRate 0.0069   Epoch: 14   Global Step: 612070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:10,914-Speed 2595.43 samples/sec   Loss 3.5404   LearningRate 0.0069   Epoch: 14   Global Step: 612080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:14,809-Speed 2630.12 samples/sec   Loss 3.5825   LearningRate 0.0069   Epoch: 14   Global Step: 612090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:18,709-Speed 2625.89 samples/sec   Loss 3.5466   LearningRate 0.0069   Epoch: 14   Global Step: 612100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:22,625-Speed 2615.91 samples/sec   Loss 3.5576   LearningRate 0.0069   Epoch: 14   Global Step: 612110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:26,530-Speed 2623.30 samples/sec   Loss 3.5378   LearningRate 0.0069   Epoch: 14   Global Step: 612120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:32:30,408-Speed 2640.97 samples/sec   Loss 3.5395   LearningRate 0.0069   Epoch: 14   Global Step: 612130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:34,311-Speed 2623.98 samples/sec   Loss 3.5282   LearningRate 0.0069   Epoch: 14   Global Step: 612140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:38,212-Speed 2625.62 samples/sec   Loss 3.4934   LearningRate 0.0069   Epoch: 14   Global Step: 612150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:42,112-Speed 2627.05 samples/sec   Loss 3.4925   LearningRate 0.0069   Epoch: 14   Global Step: 612160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:46,013-Speed 2625.24 samples/sec   Loss 3.5396   LearningRate 0.0069   Epoch: 14   Global Step: 612170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:49,915-Speed 2626.22 samples/sec   Loss 3.6738   LearningRate 0.0069   Epoch: 14   Global Step: 612180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:53,813-Speed 2627.55 samples/sec   Loss 3.5407   LearningRate 0.0069   Epoch: 14   Global Step: 612190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:32:57,711-Speed 2627.52 samples/sec   Loss 3.5749   LearningRate 0.0069   Epoch: 14   Global Step: 612200   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:01,615-Speed 2623.87 samples/sec   Loss 3.5464   LearningRate 0.0069   Epoch: 14   Global Step: 612210   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:05,522-Speed 2621.45 samples/sec   Loss 3.5522   LearningRate 0.0069   Epoch: 14   Global Step: 612220   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:09,420-Speed 2627.36 samples/sec   Loss 3.6147   LearningRate 0.0069   Epoch: 14   Global Step: 612230   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:13,324-Speed 2623.84 samples/sec   Loss 3.5251   LearningRate 0.0069   Epoch: 14   Global Step: 612240   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:17,275-Speed 2593.18 samples/sec   Loss 3.5244   LearningRate 0.0069   Epoch: 14   Global Step: 612250   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:21,220-Speed 2595.76 samples/sec   Loss 3.5072   LearningRate 0.0069   Epoch: 14   Global Step: 612260   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:25,123-Speed 2624.81 samples/sec   Loss 3.5635   LearningRate 0.0069   Epoch: 14   Global Step: 612270   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:29,031-Speed 2620.83 samples/sec   Loss 3.5158   LearningRate 0.0069   Epoch: 14   Global Step: 612280   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:32,929-Speed 2627.72 samples/sec   Loss 3.5650   LearningRate 0.0069   Epoch: 14   Global Step: 612290   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:33:36,851-Speed 2611.90 samples/sec   Loss 3.5326   LearningRate 0.0069   Epoch: 14   Global Step: 612300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:33:40,758-Speed 2621.17 samples/sec   Loss 3.5667   LearningRate 0.0069   Epoch: 14   Global Step: 612310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:33:44,657-Speed 2627.00 samples/sec   Loss 3.5477   LearningRate 0.0069   Epoch: 14   Global Step: 612320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:33:48,556-Speed 2627.47 samples/sec   Loss 3.5162   LearningRate 0.0069   Epoch: 14   Global Step: 612330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:33:52,458-Speed 2625.10 samples/sec   Loss 3.5732   LearningRate 0.0069   Epoch: 14   Global Step: 612340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:33:56,360-Speed 2624.79 samples/sec   Loss 3.5962   LearningRate 0.0069   Epoch: 14   Global Step: 612350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:34:00,257-Speed 2627.93 samples/sec   Loss 3.5510   LearningRate 0.0069   Epoch: 14   Global Step: 612360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:34:04,156-Speed 2626.84 samples/sec   Loss 3.4766   LearningRate 0.0069   Epoch: 14   Global Step: 612370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:34:08,064-Speed 2620.84 samples/sec   Loss 3.5502   LearningRate 0.0069   Epoch: 14   Global Step: 612380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:34:11,975-Speed 2618.72 samples/sec   Loss 3.4875   LearningRate 0.0069   Epoch: 14   Global Step: 612390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:34:15,875-Speed 2626.25 samples/sec   Loss 3.5062   LearningRate 0.0069   Epoch: 14   Global Step: 612400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:34:19,764-Speed 2633.95 samples/sec   Loss 3.4767   LearningRate 0.0069   Epoch: 14   Global Step: 612410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:34:23,649-Speed 2636.61 samples/sec   Loss 3.6263   LearningRate 0.0069   Epoch: 14   Global Step: 612420   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:27,558-Speed 2620.66 samples/sec   Loss 3.5759   LearningRate 0.0069   Epoch: 14   Global Step: 612430   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:31,459-Speed 2625.17 samples/sec   Loss 3.5223   LearningRate 0.0069   Epoch: 14   Global Step: 612440   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:35,362-Speed 2624.58 samples/sec   Loss 3.5743   LearningRate 0.0069   Epoch: 14   Global Step: 612450   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:39,265-Speed 2624.00 samples/sec   Loss 3.5240   LearningRate 0.0068   Epoch: 14   Global Step: 612460   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:43,177-Speed 2617.68 samples/sec   Loss 3.5912   LearningRate 0.0068   Epoch: 14   Global Step: 612470   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:47,081-Speed 2624.28 samples/sec   Loss 3.5493   LearningRate 0.0068   Epoch: 14   Global Step: 612480   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:50,982-Speed 2625.97 samples/sec   Loss 3.6295   LearningRate 0.0068   Epoch: 14   Global Step: 612490   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:54,885-Speed 2624.21 samples/sec   Loss 3.5659   LearningRate 0.0068   Epoch: 14   Global Step: 612500   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:34:58,786-Speed 2625.36 samples/sec   Loss 3.5423   LearningRate 0.0068   Epoch: 14   Global Step: 612510   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:35:02,691-Speed 2623.58 samples/sec   Loss 3.5720   LearningRate 0.0068   Epoch: 14   Global Step: 612520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:06,596-Speed 2622.89 samples/sec   Loss 3.5691   LearningRate 0.0068   Epoch: 14   Global Step: 612530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:10,571-Speed 2576.42 samples/sec   Loss 3.5372   LearningRate 0.0068   Epoch: 14   Global Step: 612540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:14,524-Speed 2590.83 samples/sec   Loss 3.6102   LearningRate 0.0068   Epoch: 14   Global Step: 612550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:18,438-Speed 2617.03 samples/sec   Loss 3.5708   LearningRate 0.0068   Epoch: 14   Global Step: 612560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:22,404-Speed 2582.95 samples/sec   Loss 3.4781   LearningRate 0.0068   Epoch: 14   Global Step: 612570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:26,310-Speed 2622.20 samples/sec   Loss 3.5890   LearningRate 0.0068   Epoch: 14   Global Step: 612580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:30,232-Speed 2611.95 samples/sec   Loss 3.5579   LearningRate 0.0068   Epoch: 14   Global Step: 612590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:34,170-Speed 2601.07 samples/sec   Loss 3.5585   LearningRate 0.0068   Epoch: 14   Global Step: 612600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:38,077-Speed 2621.43 samples/sec   Loss 3.6143   LearningRate 0.0068   Epoch: 14   Global Step: 612610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:41,952-Speed 2642.88 samples/sec   Loss 3.5720   LearningRate 0.0068   Epoch: 14   Global Step: 612620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:45,864-Speed 2618.68 samples/sec   Loss 3.6082   LearningRate 0.0068   Epoch: 14   Global Step: 612630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:49,769-Speed 2622.94 samples/sec   Loss 3.5627   LearningRate 0.0068   Epoch: 14   Global Step: 612640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:53,670-Speed 2625.57 samples/sec   Loss 3.5756   LearningRate 0.0068   Epoch: 14   Global Step: 612650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:35:57,578-Speed 2621.37 samples/sec   Loss 3.5414   LearningRate 0.0068   Epoch: 14   Global Step: 612660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:01,487-Speed 2620.52 samples/sec   Loss 3.4953   LearningRate 0.0068   Epoch: 14   Global Step: 612670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:05,408-Speed 2611.87 samples/sec   Loss 3.5174   LearningRate 0.0068   Epoch: 14   Global Step: 612680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:09,310-Speed 2624.77 samples/sec   Loss 3.4907   LearningRate 0.0068   Epoch: 14   Global Step: 612690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:13,211-Speed 2625.23 samples/sec   Loss 3.5661   LearningRate 0.0068   Epoch: 14   Global Step: 612700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:17,108-Speed 2628.94 samples/sec   Loss 3.5510   LearningRate 0.0068   Epoch: 14   Global Step: 612710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:21,019-Speed 2618.79 samples/sec   Loss 3.5589   LearningRate 0.0068   Epoch: 14   Global Step: 612720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:36:24,903-Speed 2636.51 samples/sec   Loss 3.6425   LearningRate 0.0068   Epoch: 14   Global Step: 612730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:28,810-Speed 2622.76 samples/sec   Loss 3.6050   LearningRate 0.0068   Epoch: 14   Global Step: 612740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:32,711-Speed 2625.85 samples/sec   Loss 3.5672   LearningRate 0.0068   Epoch: 14   Global Step: 612750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:36,617-Speed 2622.30 samples/sec   Loss 3.5356   LearningRate 0.0068   Epoch: 14   Global Step: 612760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:40,527-Speed 2619.30 samples/sec   Loss 3.4713   LearningRate 0.0068   Epoch: 14   Global Step: 612770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:44,432-Speed 2623.02 samples/sec   Loss 3.5832   LearningRate 0.0068   Epoch: 14   Global Step: 612780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:48,343-Speed 2618.71 samples/sec   Loss 3.5097   LearningRate 0.0068   Epoch: 14   Global Step: 612790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:52,260-Speed 2615.07 samples/sec   Loss 3.5082   LearningRate 0.0068   Epoch: 14   Global Step: 612800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:36:56,168-Speed 2620.30 samples/sec   Loss 3.5694   LearningRate 0.0068   Epoch: 14   Global Step: 612810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:00,071-Speed 2624.35 samples/sec   Loss 3.5738   LearningRate 0.0068   Epoch: 14   Global Step: 612820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:04,043-Speed 2578.97 samples/sec   Loss 3.5360   LearningRate 0.0068   Epoch: 14   Global Step: 612830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:37:07,925-Speed 2639.00 samples/sec   Loss 3.5868   LearningRate 0.0068   Epoch: 14   Global Step: 612840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:11,837-Speed 2617.72 samples/sec   Loss 3.5476   LearningRate 0.0068   Epoch: 14   Global Step: 612850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:15,742-Speed 2622.90 samples/sec   Loss 3.6217   LearningRate 0.0068   Epoch: 14   Global Step: 612860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:19,645-Speed 2623.86 samples/sec   Loss 3.5401   LearningRate 0.0068   Epoch: 14   Global Step: 612870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:23,546-Speed 2625.80 samples/sec   Loss 3.6158   LearningRate 0.0068   Epoch: 14   Global Step: 612880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:27,465-Speed 2614.60 samples/sec   Loss 3.6335   LearningRate 0.0068   Epoch: 14   Global Step: 612890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:31,379-Speed 2616.14 samples/sec   Loss 3.5548   LearningRate 0.0068   Epoch: 14   Global Step: 612900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:35,286-Speed 2621.97 samples/sec   Loss 3.5346   LearningRate 0.0068   Epoch: 14   Global Step: 612910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:39,183-Speed 2627.83 samples/sec   Loss 3.5044   LearningRate 0.0068   Epoch: 14   Global Step: 612920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:37:43,060-Speed 2641.96 samples/sec   Loss 3.5871   LearningRate 0.0068   Epoch: 14   Global Step: 612930   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:37:46,961-Speed 2625.79 samples/sec   Loss 3.5473   LearningRate 0.0068   Epoch: 14   Global Step: 612940   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:37:50,874-Speed 2617.66 samples/sec   Loss 3.5724   LearningRate 0.0068   Epoch: 14   Global Step: 612950   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:37:54,781-Speed 2621.29 samples/sec   Loss 3.6058   LearningRate 0.0068   Epoch: 14   Global Step: 612960   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:37:58,691-Speed 2619.61 samples/sec   Loss 3.5051   LearningRate 0.0068   Epoch: 14   Global Step: 612970   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:38:02,600-Speed 2620.28 samples/sec   Loss 3.6563   LearningRate 0.0068   Epoch: 14   Global Step: 612980   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:38:06,501-Speed 2625.87 samples/sec   Loss 3.5809   LearningRate 0.0068   Epoch: 14   Global Step: 612990   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:38:10,403-Speed 2624.88 samples/sec   Loss 3.5495   LearningRate 0.0068   Epoch: 14   Global Step: 613000   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:38:14,315-Speed 2618.16 samples/sec   Loss 3.5770   LearningRate 0.0068   Epoch: 14   Global Step: 613010   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:38:18,218-Speed 2623.44 samples/sec   Loss 3.5310   LearningRate 0.0068   Epoch: 14   Global Step: 613020   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:38:22,137-Speed 2614.39 samples/sec   Loss 3.7175   LearningRate 0.0068   Epoch: 14   Global Step: 613030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:26,045-Speed 2621.03 samples/sec   Loss 3.6127   LearningRate 0.0068   Epoch: 14   Global Step: 613040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:29,958-Speed 2617.31 samples/sec   Loss 3.5858   LearningRate 0.0068   Epoch: 14   Global Step: 613050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:33,867-Speed 2620.38 samples/sec   Loss 3.4612   LearningRate 0.0068   Epoch: 14   Global Step: 613060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:37,774-Speed 2621.26 samples/sec   Loss 3.5283   LearningRate 0.0068   Epoch: 14   Global Step: 613070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:41,689-Speed 2616.32 samples/sec   Loss 3.5223   LearningRate 0.0068   Epoch: 14   Global Step: 613080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:45,599-Speed 2619.68 samples/sec   Loss 3.5127   LearningRate 0.0068   Epoch: 14   Global Step: 613090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:49,510-Speed 2618.50 samples/sec   Loss 3.5552   LearningRate 0.0068   Epoch: 14   Global Step: 613100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:53,415-Speed 2623.33 samples/sec   Loss 3.5277   LearningRate 0.0068   Epoch: 14   Global Step: 613110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:38:57,318-Speed 2623.99 samples/sec   Loss 3.6239   LearningRate 0.0068   Epoch: 14   Global Step: 613120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:01,225-Speed 2621.38 samples/sec   Loss 3.4847   LearningRate 0.0068   Epoch: 14   Global Step: 613130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:39:05,103-Speed 2641.67 samples/sec   Loss 3.6080   LearningRate 0.0068   Epoch: 14   Global Step: 613140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:09,004-Speed 2625.72 samples/sec   Loss 3.4693   LearningRate 0.0068   Epoch: 14   Global Step: 613150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:12,906-Speed 2624.84 samples/sec   Loss 3.6453   LearningRate 0.0068   Epoch: 14   Global Step: 613160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:16,807-Speed 2625.58 samples/sec   Loss 3.4532   LearningRate 0.0068   Epoch: 14   Global Step: 613170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:20,714-Speed 2621.67 samples/sec   Loss 3.5084   LearningRate 0.0068   Epoch: 14   Global Step: 613180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:24,618-Speed 2623.40 samples/sec   Loss 3.5642   LearningRate 0.0068   Epoch: 14   Global Step: 613190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:28,534-Speed 2616.57 samples/sec   Loss 3.5914   LearningRate 0.0068   Epoch: 14   Global Step: 613200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:32,450-Speed 2614.97 samples/sec   Loss 3.5336   LearningRate 0.0068   Epoch: 14   Global Step: 613210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:39:36,354-Speed 2623.45 samples/sec   Loss 3.6121   LearningRate 0.0068   Epoch: 14   Global Step: 613220   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:39:40,264-Speed 2619.86 samples/sec   Loss 3.5729   LearningRate 0.0068   Epoch: 14   Global Step: 613230   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:39:44,173-Speed 2621.22 samples/sec   Loss 3.5644   LearningRate 0.0068   Epoch: 14   Global Step: 613240   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:39:48,076-Speed 2623.99 samples/sec   Loss 3.4971   LearningRate 0.0068   Epoch: 14   Global Step: 613250   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:39:51,977-Speed 2625.31 samples/sec   Loss 3.5481   LearningRate 0.0068   Epoch: 14   Global Step: 613260   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:39:55,909-Speed 2605.11 samples/sec   Loss 3.5664   LearningRate 0.0068   Epoch: 14   Global Step: 613270   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:39:59,818-Speed 2620.17 samples/sec   Loss 3.4164   LearningRate 0.0068   Epoch: 14   Global Step: 613280   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:40:03,729-Speed 2619.13 samples/sec   Loss 3.5268   LearningRate 0.0068   Epoch: 14   Global Step: 613290   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:40:07,633-Speed 2623.83 samples/sec   Loss 3.4437   LearningRate 0.0068   Epoch: 14   Global Step: 613300   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:40:11,534-Speed 2626.21 samples/sec   Loss 3.5215   LearningRate 0.0068   Epoch: 14   Global Step: 613310   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:40:15,435-Speed 2625.25 samples/sec   Loss 3.5867   LearningRate 0.0068   Epoch: 14   Global Step: 613320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:19,340-Speed 2623.27 samples/sec   Loss 3.5354   LearningRate 0.0068   Epoch: 14   Global Step: 613330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:23,252-Speed 2617.71 samples/sec   Loss 3.5176   LearningRate 0.0068   Epoch: 14   Global Step: 613340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:27,343-Speed 2504.47 samples/sec   Loss 3.5323   LearningRate 0.0068   Epoch: 14   Global Step: 613350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:31,480-Speed 2475.53 samples/sec   Loss 3.6104   LearningRate 0.0068   Epoch: 14   Global Step: 613360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:35,496-Speed 2550.33 samples/sec   Loss 3.5784   LearningRate 0.0068   Epoch: 14   Global Step: 613370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:39,416-Speed 2613.24 samples/sec   Loss 3.4931   LearningRate 0.0068   Epoch: 14   Global Step: 613380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:43,327-Speed 2619.01 samples/sec   Loss 3.5959   LearningRate 0.0068   Epoch: 14   Global Step: 613390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:47,258-Speed 2605.12 samples/sec   Loss 3.4671   LearningRate 0.0068   Epoch: 14   Global Step: 613400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:51,176-Speed 2615.29 samples/sec   Loss 3.5496   LearningRate 0.0068   Epoch: 14   Global Step: 613410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:55,158-Speed 2572.02 samples/sec   Loss 3.5445   LearningRate 0.0068   Epoch: 14   Global Step: 613420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:40:59,063-Speed 2622.77 samples/sec   Loss 3.4395   LearningRate 0.0068   Epoch: 14   Global Step: 613430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:03,142-Speed 2511.26 samples/sec   Loss 3.5811   LearningRate 0.0068   Epoch: 14   Global Step: 613440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:07,055-Speed 2617.52 samples/sec   Loss 3.5832   LearningRate 0.0068   Epoch: 14   Global Step: 613450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:10,957-Speed 2624.56 samples/sec   Loss 3.4689   LearningRate 0.0068   Epoch: 14   Global Step: 613460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:14,862-Speed 2622.88 samples/sec   Loss 3.5138   LearningRate 0.0068   Epoch: 14   Global Step: 613470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:18,763-Speed 2626.07 samples/sec   Loss 3.5401   LearningRate 0.0068   Epoch: 14   Global Step: 613480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:22,666-Speed 2624.62 samples/sec   Loss 3.4637   LearningRate 0.0068   Epoch: 14   Global Step: 613490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:26,566-Speed 2626.04 samples/sec   Loss 3.5237   LearningRate 0.0068   Epoch: 14   Global Step: 613500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:30,467-Speed 2625.83 samples/sec   Loss 3.5445   LearningRate 0.0068   Epoch: 14   Global Step: 613510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:34,370-Speed 2623.67 samples/sec   Loss 3.5919   LearningRate 0.0068   Epoch: 14   Global Step: 613520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:41:38,247-Speed 2641.74 samples/sec   Loss 3.4765   LearningRate 0.0068   Epoch: 14   Global Step: 613530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:41:42,167-Speed 2613.40 samples/sec   Loss 3.5252   LearningRate 0.0068   Epoch: 14   Global Step: 613540   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:41:46,071-Speed 2623.67 samples/sec   Loss 3.5580   LearningRate 0.0068   Epoch: 14   Global Step: 613550   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:41:49,976-Speed 2624.19 samples/sec   Loss 3.4906   LearningRate 0.0068   Epoch: 14   Global Step: 613560   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:41:53,891-Speed 2615.95 samples/sec   Loss 3.5788   LearningRate 0.0068   Epoch: 14   Global Step: 613570   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:41:57,803-Speed 2618.34 samples/sec   Loss 3.5556   LearningRate 0.0068   Epoch: 14   Global Step: 613580   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:42:01,704-Speed 2626.04 samples/sec   Loss 3.4797   LearningRate 0.0068   Epoch: 14   Global Step: 613590   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:42:05,623-Speed 2612.98 samples/sec   Loss 3.4815   LearningRate 0.0068   Epoch: 14   Global Step: 613600   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:42:09,533-Speed 2619.73 samples/sec   Loss 3.5717   LearningRate 0.0068   Epoch: 14   Global Step: 613610   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:42:13,493-Speed 2587.10 samples/sec   Loss 3.5186   LearningRate 0.0068   Epoch: 14   Global Step: 613620   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:42:17,409-Speed 2615.03 samples/sec   Loss 3.5419   LearningRate 0.0068   Epoch: 14   Global Step: 613630   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:42:21,320-Speed 2619.11 samples/sec   Loss 3.5917   LearningRate 0.0068   Epoch: 14   Global Step: 613640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:25,227-Speed 2621.19 samples/sec   Loss 3.5495   LearningRate 0.0068   Epoch: 14   Global Step: 613650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:29,219-Speed 2566.03 samples/sec   Loss 3.5278   LearningRate 0.0068   Epoch: 14   Global Step: 613660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:33,129-Speed 2619.72 samples/sec   Loss 3.5881   LearningRate 0.0068   Epoch: 14   Global Step: 613670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:37,039-Speed 2619.69 samples/sec   Loss 3.6360   LearningRate 0.0068   Epoch: 14   Global Step: 613680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:40,950-Speed 2619.28 samples/sec   Loss 3.5423   LearningRate 0.0068   Epoch: 14   Global Step: 613690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:44,873-Speed 2610.76 samples/sec   Loss 3.6311   LearningRate 0.0068   Epoch: 14   Global Step: 613700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:48,777-Speed 2623.49 samples/sec   Loss 3.5593   LearningRate 0.0068   Epoch: 14   Global Step: 613710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:52,705-Speed 2607.64 samples/sec   Loss 3.5319   LearningRate 0.0068   Epoch: 14   Global Step: 613720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:42:56,655-Speed 2593.26 samples/sec   Loss 3.5669   LearningRate 0.0068   Epoch: 14   Global Step: 613730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:00,562-Speed 2621.53 samples/sec   Loss 3.4682   LearningRate 0.0068   Epoch: 14   Global Step: 613740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:43:04,452-Speed 2633.59 samples/sec   Loss 3.5797   LearningRate 0.0068   Epoch: 14   Global Step: 613750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:08,358-Speed 2622.32 samples/sec   Loss 3.5667   LearningRate 0.0068   Epoch: 14   Global Step: 613760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:12,272-Speed 2617.24 samples/sec   Loss 3.5015   LearningRate 0.0068   Epoch: 14   Global Step: 613770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:16,176-Speed 2623.50 samples/sec   Loss 3.4801   LearningRate 0.0068   Epoch: 14   Global Step: 613780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:20,088-Speed 2618.28 samples/sec   Loss 3.5900   LearningRate 0.0068   Epoch: 14   Global Step: 613790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:23,990-Speed 2624.62 samples/sec   Loss 3.5441   LearningRate 0.0068   Epoch: 14   Global Step: 613800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:27,901-Speed 2618.73 samples/sec   Loss 3.5678   LearningRate 0.0068   Epoch: 14   Global Step: 613810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:31,805-Speed 2623.58 samples/sec   Loss 3.5981   LearningRate 0.0068   Epoch: 14   Global Step: 613820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:35,730-Speed 2610.12 samples/sec   Loss 3.4911   LearningRate 0.0068   Epoch: 14   Global Step: 613830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:39,633-Speed 2624.37 samples/sec   Loss 3.5464   LearningRate 0.0068   Epoch: 14   Global Step: 613840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:43,516-Speed 2637.61 samples/sec   Loss 3.5163   LearningRate 0.0068   Epoch: 14   Global Step: 613850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:47,418-Speed 2624.80 samples/sec   Loss 3.5223   LearningRate 0.0068   Epoch: 14   Global Step: 613860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:51,322-Speed 2623.93 samples/sec   Loss 3.5520   LearningRate 0.0068   Epoch: 14   Global Step: 613870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:55,230-Speed 2620.43 samples/sec   Loss 3.5574   LearningRate 0.0068   Epoch: 14   Global Step: 613880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:43:59,134-Speed 2623.59 samples/sec   Loss 3.5778   LearningRate 0.0068   Epoch: 14   Global Step: 613890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:44:03,038-Speed 2624.09 samples/sec   Loss 3.5567   LearningRate 0.0068   Epoch: 14   Global Step: 613900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:44:06,922-Speed 2637.01 samples/sec   Loss 3.4865   LearningRate 0.0068   Epoch: 14   Global Step: 613910   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:10,833-Speed 2619.49 samples/sec   Loss 3.4509   LearningRate 0.0068   Epoch: 14   Global Step: 613920   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:14,772-Speed 2599.95 samples/sec   Loss 3.4855   LearningRate 0.0068   Epoch: 14   Global Step: 613930   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:18,674-Speed 2624.84 samples/sec   Loss 3.4322   LearningRate 0.0068   Epoch: 14   Global Step: 613940   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:22,577-Speed 2624.44 samples/sec   Loss 3.5719   LearningRate 0.0068   Epoch: 14   Global Step: 613950   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:26,480-Speed 2624.25 samples/sec   Loss 3.5158   LearningRate 0.0068   Epoch: 14   Global Step: 613960   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:30,385-Speed 2623.32 samples/sec   Loss 3.5388   LearningRate 0.0068   Epoch: 14   Global Step: 613970   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:34,290-Speed 2622.58 samples/sec   Loss 3.5040   LearningRate 0.0068   Epoch: 14   Global Step: 613980   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:38,193-Speed 2625.33 samples/sec   Loss 3.4775   LearningRate 0.0068   Epoch: 14   Global Step: 613990   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:42,096-Speed 2624.10 samples/sec   Loss 3.5937   LearningRate 0.0068   Epoch: 14   Global Step: 614000   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:44:46,058-Speed 2585.16 samples/sec   Loss 3.5815   LearningRate 0.0068   Epoch: 14   Global Step: 614010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:44:50,059-Speed 2559.97 samples/sec   Loss 3.5241   LearningRate 0.0068   Epoch: 14   Global Step: 614020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:44:53,963-Speed 2623.73 samples/sec   Loss 3.5190   LearningRate 0.0068   Epoch: 14   Global Step: 614030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:44:57,862-Speed 2627.14 samples/sec   Loss 3.5567   LearningRate 0.0068   Epoch: 14   Global Step: 614040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:01,763-Speed 2625.55 samples/sec   Loss 3.4921   LearningRate 0.0067   Epoch: 14   Global Step: 614050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:05,659-Speed 2628.57 samples/sec   Loss 3.5434   LearningRate 0.0067   Epoch: 14   Global Step: 614060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:09,563-Speed 2623.58 samples/sec   Loss 3.4299   LearningRate 0.0067   Epoch: 14   Global Step: 614070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:13,463-Speed 2626.90 samples/sec   Loss 3.6157   LearningRate 0.0067   Epoch: 14   Global Step: 614080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:17,367-Speed 2623.39 samples/sec   Loss 3.5616   LearningRate 0.0067   Epoch: 14   Global Step: 614090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:21,272-Speed 2623.13 samples/sec   Loss 3.5133   LearningRate 0.0067   Epoch: 14   Global Step: 614100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:25,152-Speed 2639.79 samples/sec   Loss 3.5096   LearningRate 0.0067   Epoch: 14   Global Step: 614110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:29,051-Speed 2627.37 samples/sec   Loss 3.6310   LearningRate 0.0067   Epoch: 14   Global Step: 614120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:32,951-Speed 2625.98 samples/sec   Loss 3.5667   LearningRate 0.0067   Epoch: 14   Global Step: 614130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:36,854-Speed 2624.19 samples/sec   Loss 3.5481   LearningRate 0.0067   Epoch: 14   Global Step: 614140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:40,760-Speed 2622.37 samples/sec   Loss 3.5562   LearningRate 0.0067   Epoch: 14   Global Step: 614150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:44,662-Speed 2625.53 samples/sec   Loss 3.4802   LearningRate 0.0067   Epoch: 14   Global Step: 614160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:48,564-Speed 2625.07 samples/sec   Loss 3.5847   LearningRate 0.0067   Epoch: 14   Global Step: 614170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:52,466-Speed 2625.21 samples/sec   Loss 3.4968   LearningRate 0.0067   Epoch: 14   Global Step: 614180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:45:56,395-Speed 2607.02 samples/sec   Loss 3.5020   LearningRate 0.0067   Epoch: 14   Global Step: 614190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:46:00,295-Speed 2626.56 samples/sec   Loss 3.4984   LearningRate 0.0067   Epoch: 14   Global Step: 614200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:46:04,201-Speed 2621.89 samples/sec   Loss 3.5716   LearningRate 0.0067   Epoch: 14   Global Step: 614210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:46:08,099-Speed 2627.26 samples/sec   Loss 3.5296   LearningRate 0.0067   Epoch: 14   Global Step: 614220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:46:12,005-Speed 2622.64 samples/sec   Loss 3.6038   LearningRate 0.0067   Epoch: 14   Global Step: 614230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:46:15,911-Speed 2622.24 samples/sec   Loss 3.5881   LearningRate 0.0067   Epoch: 14   Global Step: 614240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:46:19,827-Speed 2616.25 samples/sec   Loss 3.5165   LearningRate 0.0067   Epoch: 14   Global Step: 614250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:46:23,729-Speed 2624.89 samples/sec   Loss 3.5389   LearningRate 0.0067   Epoch: 14   Global Step: 614260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:46:27,648-Speed 2613.77 samples/sec   Loss 3.6003   LearningRate 0.0067   Epoch: 14   Global Step: 614270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:46:31,662-Speed 2551.81 samples/sec   Loss 3.5925   LearningRate 0.0067   Epoch: 14   Global Step: 614280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:46:35,581-Speed 2613.13 samples/sec   Loss 3.4460   LearningRate 0.0067   Epoch: 14   Global Step: 614290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:46:39,500-Speed 2613.11 samples/sec   Loss 3.5752   LearningRate 0.0067   Epoch: 14   Global Step: 614300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:46:43,410-Speed 2620.60 samples/sec   Loss 3.6157   LearningRate 0.0067   Epoch: 14   Global Step: 614310   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:46:47,353-Speed 2597.16 samples/sec   Loss 3.4290   LearningRate 0.0067   Epoch: 14   Global Step: 614320   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:46:51,265-Speed 2621.39 samples/sec   Loss 3.4528   LearningRate 0.0067   Epoch: 14   Global Step: 614330   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:46:55,167-Speed 2625.32 samples/sec   Loss 3.5806   LearningRate 0.0067   Epoch: 14   Global Step: 614340   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:46:59,072-Speed 2623.45 samples/sec   Loss 3.5224   LearningRate 0.0067   Epoch: 14   Global Step: 614350   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:47:02,978-Speed 2622.11 samples/sec   Loss 3.5552   LearningRate 0.0067   Epoch: 14   Global Step: 614360   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:47:06,875-Speed 2628.20 samples/sec   Loss 3.5525   LearningRate 0.0067   Epoch: 14   Global Step: 614370   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:47:10,772-Speed 2628.36 samples/sec   Loss 3.5174   LearningRate 0.0067   Epoch: 14   Global Step: 614380   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:47:14,671-Speed 2626.88 samples/sec   Loss 3.4592   LearningRate 0.0067   Epoch: 14   Global Step: 614390   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:47:18,581-Speed 2619.99 samples/sec   Loss 3.4994   LearningRate 0.0067   Epoch: 14   Global Step: 614400   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:47:22,478-Speed 2628.69 samples/sec   Loss 3.4854   LearningRate 0.0067   Epoch: 14   Global Step: 614410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:26,375-Speed 2628.14 samples/sec   Loss 3.5424   LearningRate 0.0067   Epoch: 14   Global Step: 614420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:30,278-Speed 2624.27 samples/sec   Loss 3.4850   LearningRate 0.0067   Epoch: 14   Global Step: 614430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:34,179-Speed 2625.68 samples/sec   Loss 3.5850   LearningRate 0.0067   Epoch: 14   Global Step: 614440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:38,080-Speed 2625.50 samples/sec   Loss 3.4737   LearningRate 0.0067   Epoch: 14   Global Step: 614450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:41,981-Speed 2625.97 samples/sec   Loss 3.5998   LearningRate 0.0067   Epoch: 14   Global Step: 614460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:45,886-Speed 2623.32 samples/sec   Loss 3.5924   LearningRate 0.0067   Epoch: 14   Global Step: 614470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:49,787-Speed 2626.03 samples/sec   Loss 3.4955   LearningRate 0.0067   Epoch: 14   Global Step: 614480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:53,684-Speed 2628.03 samples/sec   Loss 3.5177   LearningRate 0.0067   Epoch: 14   Global Step: 614490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:47:57,626-Speed 2598.44 samples/sec   Loss 3.5035   LearningRate 0.0067   Epoch: 14   Global Step: 614500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:48:01,523-Speed 2628.49 samples/sec   Loss 3.4141   LearningRate 0.0067   Epoch: 14   Global Step: 614510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:48:05,441-Speed 2614.23 samples/sec   Loss 3.4872   LearningRate 0.0067   Epoch: 14   Global Step: 614520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:48:09,312-Speed 2645.72 samples/sec   Loss 3.5251   LearningRate 0.0067   Epoch: 14   Global Step: 614530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:48:13,211-Speed 2627.25 samples/sec   Loss 3.5052   LearningRate 0.0067   Epoch: 14   Global Step: 614540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:48:17,112-Speed 2625.91 samples/sec   Loss 3.5348   LearningRate 0.0067   Epoch: 14   Global Step: 614550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:48:21,046-Speed 2602.81 samples/sec   Loss 3.4840   LearningRate 0.0067   Epoch: 14   Global Step: 614560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:48:24,948-Speed 2625.44 samples/sec   Loss 3.4963   LearningRate 0.0067   Epoch: 14   Global Step: 614570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:48:28,846-Speed 2628.26 samples/sec   Loss 3.5192   LearningRate 0.0067   Epoch: 14   Global Step: 614580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:48:32,718-Speed 2645.41 samples/sec   Loss 3.5037   LearningRate 0.0067   Epoch: 14   Global Step: 614590   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:48:36,630-Speed 2618.17 samples/sec   Loss 3.5093   LearningRate 0.0067   Epoch: 14   Global Step: 614600   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:48:40,552-Speed 2611.54 samples/sec   Loss 3.4587   LearningRate 0.0067   Epoch: 14   Global Step: 614610   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:48:44,452-Speed 2626.41 samples/sec   Loss 3.5736   LearningRate 0.0067   Epoch: 14   Global Step: 614620   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:48:48,346-Speed 2630.59 samples/sec   Loss 3.6551   LearningRate 0.0067   Epoch: 14   Global Step: 614630   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:48:52,249-Speed 2624.33 samples/sec   Loss 3.6089   LearningRate 0.0067   Epoch: 14   Global Step: 614640   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:48:56,149-Speed 2626.00 samples/sec   Loss 3.5207   LearningRate 0.0067   Epoch: 14   Global Step: 614650   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:49:00,051-Speed 2625.28 samples/sec   Loss 3.4514   LearningRate 0.0067   Epoch: 14   Global Step: 614660   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:49:04,091-Speed 2535.39 samples/sec   Loss 3.5551   LearningRate 0.0067   Epoch: 14   Global Step: 614670   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:49:08,030-Speed 2600.32 samples/sec   Loss 3.5531   LearningRate 0.0067   Epoch: 14   Global Step: 614680   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:49:11,982-Speed 2592.55 samples/sec   Loss 3.5223   LearningRate 0.0067   Epoch: 14   Global Step: 614690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:15,898-Speed 2615.18 samples/sec   Loss 3.5331   LearningRate 0.0067   Epoch: 14   Global Step: 614700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:19,796-Speed 2627.67 samples/sec   Loss 3.5216   LearningRate 0.0067   Epoch: 14   Global Step: 614710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:23,691-Speed 2629.45 samples/sec   Loss 3.5658   LearningRate 0.0067   Epoch: 14   Global Step: 614720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:27,602-Speed 2619.48 samples/sec   Loss 3.4508   LearningRate 0.0067   Epoch: 14   Global Step: 614730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:31,497-Speed 2629.42 samples/sec   Loss 3.5081   LearningRate 0.0067   Epoch: 14   Global Step: 614740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:35,393-Speed 2629.29 samples/sec   Loss 3.4890   LearningRate 0.0067   Epoch: 14   Global Step: 614750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:39,303-Speed 2619.86 samples/sec   Loss 3.5094   LearningRate 0.0067   Epoch: 14   Global Step: 614760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:43,204-Speed 2626.44 samples/sec   Loss 3.4971   LearningRate 0.0067   Epoch: 14   Global Step: 614770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:47,115-Speed 2618.42 samples/sec   Loss 3.5081   LearningRate 0.0067   Epoch: 14   Global Step: 614780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:50,998-Speed 2637.27 samples/sec   Loss 3.4863   LearningRate 0.0067   Epoch: 14   Global Step: 614790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:54,912-Speed 2617.13 samples/sec   Loss 3.4599   LearningRate 0.0067   Epoch: 14   Global Step: 614800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:49:58,944-Speed 2540.73 samples/sec   Loss 3.5490   LearningRate 0.0067   Epoch: 14   Global Step: 614810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:50:02,894-Speed 2592.72 samples/sec   Loss 3.6267   LearningRate 0.0067   Epoch: 14   Global Step: 614820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:50:06,797-Speed 2624.64 samples/sec   Loss 3.4539   LearningRate 0.0067   Epoch: 14   Global Step: 614830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:50:10,701-Speed 2623.52 samples/sec   Loss 3.4901   LearningRate 0.0067   Epoch: 14   Global Step: 614840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:50:14,608-Speed 2622.22 samples/sec   Loss 3.5027   LearningRate 0.0067   Epoch: 14   Global Step: 614850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:50:18,492-Speed 2636.60 samples/sec   Loss 3.4369   LearningRate 0.0067   Epoch: 14   Global Step: 614860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:22,390-Speed 2627.62 samples/sec   Loss 3.5316   LearningRate 0.0067   Epoch: 14   Global Step: 614870   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:26,287-Speed 2628.45 samples/sec   Loss 3.4441   LearningRate 0.0067   Epoch: 14   Global Step: 614880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:30,187-Speed 2627.09 samples/sec   Loss 3.5645   LearningRate 0.0067   Epoch: 14   Global Step: 614890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:34,085-Speed 2627.74 samples/sec   Loss 3.4586   LearningRate 0.0067   Epoch: 14   Global Step: 614900   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:37,985-Speed 2626.12 samples/sec   Loss 3.4872   LearningRate 0.0067   Epoch: 14   Global Step: 614910   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:41,887-Speed 2624.47 samples/sec   Loss 3.4241   LearningRate 0.0067   Epoch: 14   Global Step: 614920   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:45,786-Speed 2627.23 samples/sec   Loss 3.4634   LearningRate 0.0067   Epoch: 14   Global Step: 614930   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:49,684-Speed 2627.10 samples/sec   Loss 3.4972   LearningRate 0.0067   Epoch: 14   Global Step: 614940   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:53,609-Speed 2609.98 samples/sec   Loss 3.5221   LearningRate 0.0067   Epoch: 14   Global Step: 614950   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:50:57,501-Speed 2631.80 samples/sec   Loss 3.5299   LearningRate 0.0067   Epoch: 14   Global Step: 614960   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:01,417-Speed 2615.08 samples/sec   Loss 3.5235   LearningRate 0.0067   Epoch: 14   Global Step: 614970   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:05,316-Speed 2627.11 samples/sec   Loss 3.5570   LearningRate 0.0067   Epoch: 14   Global Step: 614980   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:09,213-Speed 2629.05 samples/sec   Loss 3.4944   LearningRate 0.0067   Epoch: 14   Global Step: 614990   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:13,143-Speed 2605.73 samples/sec   Loss 3.5981   LearningRate 0.0067   Epoch: 14   Global Step: 615000   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:17,041-Speed 2627.68 samples/sec   Loss 3.5010   LearningRate 0.0067   Epoch: 14   Global Step: 615010   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:20,935-Speed 2630.51 samples/sec   Loss 3.6607   LearningRate 0.0067   Epoch: 14   Global Step: 615020   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:24,848-Speed 2617.65 samples/sec   Loss 3.5123   LearningRate 0.0067   Epoch: 14   Global Step: 615030   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:28,744-Speed 2629.49 samples/sec   Loss 3.5376   LearningRate 0.0067   Epoch: 14   Global Step: 615040   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:32,641-Speed 2628.45 samples/sec   Loss 3.4973   LearningRate 0.0067   Epoch: 14   Global Step: 615050   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:51:36,543-Speed 2624.52 samples/sec   Loss 3.5281   LearningRate 0.0067   Epoch: 14   Global Step: 615060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:51:40,439-Speed 2629.05 samples/sec   Loss 3.4972   LearningRate 0.0067   Epoch: 14   Global Step: 615070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:51:44,351-Speed 2618.41 samples/sec   Loss 3.5747   LearningRate 0.0067   Epoch: 14   Global Step: 615080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:51:48,250-Speed 2627.08 samples/sec   Loss 3.5428   LearningRate 0.0067   Epoch: 14   Global Step: 615090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:51:52,149-Speed 2627.42 samples/sec   Loss 3.5759   LearningRate 0.0067   Epoch: 14   Global Step: 615100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:51:56,057-Speed 2620.40 samples/sec   Loss 3.5189   LearningRate 0.0067   Epoch: 14   Global Step: 615110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:51:59,969-Speed 2618.44 samples/sec   Loss 3.3957   LearningRate 0.0067   Epoch: 14   Global Step: 615120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:03,891-Speed 2612.04 samples/sec   Loss 3.5343   LearningRate 0.0067   Epoch: 14   Global Step: 615130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:07,796-Speed 2622.30 samples/sec   Loss 3.4981   LearningRate 0.0067   Epoch: 14   Global Step: 615140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:11,695-Speed 2626.78 samples/sec   Loss 3.5688   LearningRate 0.0067   Epoch: 14   Global Step: 615150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:15,624-Speed 2607.95 samples/sec   Loss 3.4786   LearningRate 0.0067   Epoch: 14   Global Step: 615160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:52:19,500-Speed 2642.93 samples/sec   Loss 3.5784   LearningRate 0.0067   Epoch: 14   Global Step: 615170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:23,395-Speed 2629.30 samples/sec   Loss 3.4401   LearningRate 0.0067   Epoch: 14   Global Step: 615180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:27,293-Speed 2628.03 samples/sec   Loss 3.4365   LearningRate 0.0067   Epoch: 14   Global Step: 615190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:31,192-Speed 2627.24 samples/sec   Loss 3.5185   LearningRate 0.0067   Epoch: 14   Global Step: 615200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:35,087-Speed 2629.11 samples/sec   Loss 3.4953   LearningRate 0.0067   Epoch: 14   Global Step: 615210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:38,983-Speed 2628.76 samples/sec   Loss 3.5353   LearningRate 0.0067   Epoch: 14   Global Step: 615220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:42,881-Speed 2628.28 samples/sec   Loss 3.6655   LearningRate 0.0067   Epoch: 14   Global Step: 615230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:46,778-Speed 2628.43 samples/sec   Loss 3.5435   LearningRate 0.0067   Epoch: 14   Global Step: 615240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:50,688-Speed 2619.95 samples/sec   Loss 3.5265   LearningRate 0.0067   Epoch: 14   Global Step: 615250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:54,611-Speed 2610.27 samples/sec   Loss 3.5243   LearningRate 0.0067   Epoch: 14   Global Step: 615260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:52:58,487-Speed 2643.54 samples/sec   Loss 3.5128   LearningRate 0.0067   Epoch: 14   Global Step: 615270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:53:02,382-Speed 2629.44 samples/sec   Loss 3.5342   LearningRate 0.0067   Epoch: 14   Global Step: 615280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:53:06,319-Speed 2601.42 samples/sec   Loss 3.5024   LearningRate 0.0067   Epoch: 14   Global Step: 615290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:53:10,207-Speed 2634.34 samples/sec   Loss 3.6224   LearningRate 0.0067   Epoch: 14   Global Step: 615300   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:14,107-Speed 2626.45 samples/sec   Loss 3.5497   LearningRate 0.0067   Epoch: 14   Global Step: 615310   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:18,005-Speed 2627.63 samples/sec   Loss 3.5052   LearningRate 0.0067   Epoch: 14   Global Step: 615320   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:21,905-Speed 2626.45 samples/sec   Loss 3.5979   LearningRate 0.0067   Epoch: 14   Global Step: 615330   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:25,811-Speed 2622.55 samples/sec   Loss 3.4428   LearningRate 0.0067   Epoch: 14   Global Step: 615340   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:29,717-Speed 2622.35 samples/sec   Loss 3.5539   LearningRate 0.0067   Epoch: 14   Global Step: 615350   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:33,614-Speed 2628.35 samples/sec   Loss 3.6343   LearningRate 0.0067   Epoch: 14   Global Step: 615360   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:37,516-Speed 2624.51 samples/sec   Loss 3.4088   LearningRate 0.0067   Epoch: 14   Global Step: 615370   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:41,428-Speed 2618.51 samples/sec   Loss 3.5275   LearningRate 0.0067   Epoch: 14   Global Step: 615380   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:45,342-Speed 2616.67 samples/sec   Loss 3.5151   LearningRate 0.0067   Epoch: 14   Global Step: 615390   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:53:49,244-Speed 2626.17 samples/sec   Loss 3.4998   LearningRate 0.0067   Epoch: 14   Global Step: 615400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:53:53,147-Speed 2624.15 samples/sec   Loss 3.5239   LearningRate 0.0067   Epoch: 14   Global Step: 615410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:53:57,070-Speed 2610.85 samples/sec   Loss 3.5437   LearningRate 0.0067   Epoch: 14   Global Step: 615420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:54:00,972-Speed 2625.09 samples/sec   Loss 3.5612   LearningRate 0.0067   Epoch: 14   Global Step: 615430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:54:04,879-Speed 2621.58 samples/sec   Loss 3.5263   LearningRate 0.0067   Epoch: 14   Global Step: 615440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:54:08,775-Speed 2628.86 samples/sec   Loss 3.4607   LearningRate 0.0067   Epoch: 14   Global Step: 615450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:54:12,681-Speed 2621.92 samples/sec   Loss 3.5283   LearningRate 0.0067   Epoch: 14   Global Step: 615460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:54:16,560-Speed 2640.78 samples/sec   Loss 3.4385   LearningRate 0.0067   Epoch: 14   Global Step: 615470   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:20,459-Speed 2627.41 samples/sec   Loss 3.5009   LearningRate 0.0067   Epoch: 14   Global Step: 615480   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:24,357-Speed 2628.17 samples/sec   Loss 3.4536   LearningRate 0.0067   Epoch: 14   Global Step: 615490   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:28,259-Speed 2624.48 samples/sec   Loss 3.4556   LearningRate 0.0067   Epoch: 14   Global Step: 615500   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:32,215-Speed 2589.10 samples/sec   Loss 3.3799   LearningRate 0.0067   Epoch: 14   Global Step: 615510   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:36,114-Speed 2626.86 samples/sec   Loss 3.5926   LearningRate 0.0067   Epoch: 14   Global Step: 615520   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:40,017-Speed 2624.93 samples/sec   Loss 3.5186   LearningRate 0.0067   Epoch: 14   Global Step: 615530   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:43,917-Speed 2625.51 samples/sec   Loss 3.4003   LearningRate 0.0067   Epoch: 14   Global Step: 615540   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:47,817-Speed 2626.84 samples/sec   Loss 3.4173   LearningRate 0.0067   Epoch: 14   Global Step: 615550   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:51,713-Speed 2628.32 samples/sec   Loss 3.5856   LearningRate 0.0067   Epoch: 14   Global Step: 615560   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 16:54:55,613-Speed 2626.82 samples/sec   Loss 3.5477   LearningRate 0.0067   Epoch: 14   Global Step: 615570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:54:59,511-Speed 2627.63 samples/sec   Loss 3.5569   LearningRate 0.0067   Epoch: 14   Global Step: 615580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:03,413-Speed 2625.04 samples/sec   Loss 3.3980   LearningRate 0.0067   Epoch: 14   Global Step: 615590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:07,313-Speed 2626.31 samples/sec   Loss 3.5142   LearningRate 0.0067   Epoch: 14   Global Step: 615600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:11,214-Speed 2625.41 samples/sec   Loss 3.5195   LearningRate 0.0067   Epoch: 14   Global Step: 615610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:15,121-Speed 2621.68 samples/sec   Loss 3.4860   LearningRate 0.0067   Epoch: 14   Global Step: 615620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:19,023-Speed 2625.06 samples/sec   Loss 3.4852   LearningRate 0.0067   Epoch: 14   Global Step: 615630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:22,952-Speed 2606.93 samples/sec   Loss 3.4996   LearningRate 0.0067   Epoch: 14   Global Step: 615640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:26,852-Speed 2626.55 samples/sec   Loss 3.5797   LearningRate 0.0067   Epoch: 14   Global Step: 615650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:30,755-Speed 2624.17 samples/sec   Loss 3.6125   LearningRate 0.0066   Epoch: 14   Global Step: 615660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:34,654-Speed 2627.29 samples/sec   Loss 3.5152   LearningRate 0.0066   Epoch: 14   Global Step: 615670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:55:38,559-Speed 2622.36 samples/sec   Loss 3.5056   LearningRate 0.0066   Epoch: 14   Global Step: 615680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:42,457-Speed 2628.19 samples/sec   Loss 3.5890   LearningRate 0.0066   Epoch: 14   Global Step: 615690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:46,356-Speed 2626.82 samples/sec   Loss 3.5495   LearningRate 0.0066   Epoch: 14   Global Step: 615700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:50,266-Speed 2620.01 samples/sec   Loss 3.4999   LearningRate 0.0066   Epoch: 14   Global Step: 615710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:54,161-Speed 2629.39 samples/sec   Loss 3.5330   LearningRate 0.0066   Epoch: 14   Global Step: 615720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:55:58,062-Speed 2625.63 samples/sec   Loss 3.5482   LearningRate 0.0066   Epoch: 14   Global Step: 615730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:01,963-Speed 2625.54 samples/sec   Loss 3.5204   LearningRate 0.0066   Epoch: 14   Global Step: 615740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:05,880-Speed 2614.96 samples/sec   Loss 3.4683   LearningRate 0.0066   Epoch: 14   Global Step: 615750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:09,799-Speed 2613.81 samples/sec   Loss 3.5441   LearningRate 0.0066   Epoch: 14   Global Step: 615760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:13,697-Speed 2628.02 samples/sec   Loss 3.3929   LearningRate 0.0066   Epoch: 14   Global Step: 615770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:17,572-Speed 2643.28 samples/sec   Loss 3.4885   LearningRate 0.0066   Epoch: 14   Global Step: 615780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:21,470-Speed 2627.20 samples/sec   Loss 3.4694   LearningRate 0.0066   Epoch: 14   Global Step: 615790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:25,373-Speed 2624.34 samples/sec   Loss 3.4967   LearningRate 0.0066   Epoch: 14   Global Step: 615800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:29,387-Speed 2552.07 samples/sec   Loss 3.4543   LearningRate 0.0066   Epoch: 14   Global Step: 615810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:33,283-Speed 2628.95 samples/sec   Loss 3.3852   LearningRate 0.0066   Epoch: 14   Global Step: 615820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:37,181-Speed 2627.74 samples/sec   Loss 3.4550   LearningRate 0.0066   Epoch: 14   Global Step: 615830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:41,103-Speed 2611.64 samples/sec   Loss 3.5259   LearningRate 0.0066   Epoch: 14   Global Step: 615840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:45,001-Speed 2628.99 samples/sec   Loss 3.4552   LearningRate 0.0066   Epoch: 14   Global Step: 615850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:48,901-Speed 2626.09 samples/sec   Loss 3.4870   LearningRate 0.0066   Epoch: 14   Global Step: 615860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:52,797-Speed 2629.05 samples/sec   Loss 3.4516   LearningRate 0.0066   Epoch: 14   Global Step: 615870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:56:56,746-Speed 2593.69 samples/sec   Loss 3.5512   LearningRate 0.0066   Epoch: 14   Global Step: 615880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:00,644-Speed 2627.86 samples/sec   Loss 3.4832   LearningRate 0.0066   Epoch: 14   Global Step: 615890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:04,641-Speed 2562.99 samples/sec   Loss 3.4879   LearningRate 0.0066   Epoch: 14   Global Step: 615900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:08,532-Speed 2632.45 samples/sec   Loss 3.5817   LearningRate 0.0066   Epoch: 14   Global Step: 615910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:12,445-Speed 2617.08 samples/sec   Loss 3.4627   LearningRate 0.0066   Epoch: 14   Global Step: 615920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:16,344-Speed 2627.08 samples/sec   Loss 3.4095   LearningRate 0.0066   Epoch: 14   Global Step: 615930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:20,255-Speed 2619.17 samples/sec   Loss 3.4420   LearningRate 0.0066   Epoch: 14   Global Step: 615940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:24,151-Speed 2628.92 samples/sec   Loss 3.4169   LearningRate 0.0066   Epoch: 14   Global Step: 615950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:28,081-Speed 2606.34 samples/sec   Loss 3.4901   LearningRate 0.0066   Epoch: 14   Global Step: 615960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:31,982-Speed 2625.75 samples/sec   Loss 3.4624   LearningRate 0.0066   Epoch: 14   Global Step: 615970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:35,876-Speed 2630.56 samples/sec   Loss 3.5329   LearningRate 0.0066   Epoch: 14   Global Step: 615980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:57:39,756-Speed 2639.68 samples/sec   Loss 3.4461   LearningRate 0.0066   Epoch: 14   Global Step: 615990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:43,676-Speed 2612.91 samples/sec   Loss 3.4872   LearningRate 0.0066   Epoch: 14   Global Step: 616000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:47,583-Speed 2621.73 samples/sec   Loss 3.5428   LearningRate 0.0066   Epoch: 14   Global Step: 616010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:51,483-Speed 2626.98 samples/sec   Loss 3.5303   LearningRate 0.0066   Epoch: 14   Global Step: 616020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:55,410-Speed 2607.86 samples/sec   Loss 3.3769   LearningRate 0.0066   Epoch: 14   Global Step: 616030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:57:59,302-Speed 2631.70 samples/sec   Loss 3.4664   LearningRate 0.0066   Epoch: 14   Global Step: 616040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:03,201-Speed 2627.28 samples/sec   Loss 3.5017   LearningRate 0.0066   Epoch: 14   Global Step: 616050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:07,092-Speed 2632.05 samples/sec   Loss 3.5924   LearningRate 0.0066   Epoch: 14   Global Step: 616060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:11,000-Speed 2621.08 samples/sec   Loss 3.4591   LearningRate 0.0066   Epoch: 14   Global Step: 616070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:14,895-Speed 2629.34 samples/sec   Loss 3.5935   LearningRate 0.0066   Epoch: 14   Global Step: 616080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:18,769-Speed 2644.39 samples/sec   Loss 3.4601   LearningRate 0.0066   Epoch: 14   Global Step: 616090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:22,703-Speed 2604.87 samples/sec   Loss 3.4300   LearningRate 0.0066   Epoch: 14   Global Step: 616100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:26,627-Speed 2610.26 samples/sec   Loss 3.5276   LearningRate 0.0066   Epoch: 14   Global Step: 616110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:30,523-Speed 2629.31 samples/sec   Loss 3.5083   LearningRate 0.0066   Epoch: 14   Global Step: 616120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:34,432-Speed 2620.03 samples/sec   Loss 3.4618   LearningRate 0.0066   Epoch: 14   Global Step: 616130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:38,327-Speed 2629.68 samples/sec   Loss 3.4237   LearningRate 0.0066   Epoch: 14   Global Step: 616140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:42,230-Speed 2624.05 samples/sec   Loss 3.5053   LearningRate 0.0066   Epoch: 14   Global Step: 616150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:46,132-Speed 2625.60 samples/sec   Loss 3.3668   LearningRate 0.0066   Epoch: 14   Global Step: 616160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:50,036-Speed 2623.47 samples/sec   Loss 3.4779   LearningRate 0.0066   Epoch: 14   Global Step: 616170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:53,955-Speed 2613.32 samples/sec   Loss 3.5025   LearningRate 0.0066   Epoch: 14   Global Step: 616180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:58:57,870-Speed 2616.78 samples/sec   Loss 3.4178   LearningRate 0.0066   Epoch: 14   Global Step: 616190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:59:01,774-Speed 2623.50 samples/sec   Loss 3.5271   LearningRate 0.0066   Epoch: 14   Global Step: 616200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 16:59:05,653-Speed 2640.57 samples/sec   Loss 3.4845   LearningRate 0.0066   Epoch: 14   Global Step: 616210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:09,547-Speed 2629.74 samples/sec   Loss 3.4498   LearningRate 0.0066   Epoch: 14   Global Step: 616220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:13,442-Speed 2630.01 samples/sec   Loss 3.3977   LearningRate 0.0066   Epoch: 14   Global Step: 616230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:17,349-Speed 2621.81 samples/sec   Loss 3.5147   LearningRate 0.0066   Epoch: 14   Global Step: 616240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:21,251-Speed 2624.73 samples/sec   Loss 3.5093   LearningRate 0.0066   Epoch: 14   Global Step: 616250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:25,163-Speed 2618.00 samples/sec   Loss 3.5349   LearningRate 0.0066   Epoch: 14   Global Step: 616260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:29,059-Speed 2629.56 samples/sec   Loss 3.4903   LearningRate 0.0066   Epoch: 14   Global Step: 616270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:32,987-Speed 2607.21 samples/sec   Loss 3.4440   LearningRate 0.0066   Epoch: 14   Global Step: 616280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:36,980-Speed 2565.41 samples/sec   Loss 3.4588   LearningRate 0.0066   Epoch: 14   Global Step: 616290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:41,115-Speed 2476.76 samples/sec   Loss 3.4744   LearningRate 0.0066   Epoch: 14   Global Step: 616300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:44,990-Speed 2643.66 samples/sec   Loss 3.4708   LearningRate 0.0066   Epoch: 14   Global Step: 616310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:48,951-Speed 2585.47 samples/sec   Loss 3.4960   LearningRate 0.0066   Epoch: 14   Global Step: 616320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:52,849-Speed 2627.70 samples/sec   Loss 3.4649   LearningRate 0.0066   Epoch: 14   Global Step: 616330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 16:59:56,747-Speed 2627.85 samples/sec   Loss 3.4841   LearningRate 0.0066   Epoch: 14   Global Step: 616340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:00:00,660-Speed 2617.49 samples/sec   Loss 3.5054   LearningRate 0.0066   Epoch: 14   Global Step: 616350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:00:04,740-Speed 2510.28 samples/sec   Loss 3.4546   LearningRate 0.0066   Epoch: 14   Global Step: 616360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:00:08,678-Speed 2601.11 samples/sec   Loss 3.5431   LearningRate 0.0066   Epoch: 14   Global Step: 616370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:00:12,580-Speed 2624.75 samples/sec   Loss 3.5424   LearningRate 0.0066   Epoch: 14   Global Step: 616380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:00:16,483-Speed 2624.28 samples/sec   Loss 3.5330   LearningRate 0.0066   Epoch: 14   Global Step: 616390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:00:20,360-Speed 2642.10 samples/sec   Loss 3.5520   LearningRate 0.0066   Epoch: 14   Global Step: 616400   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:24,277-Speed 2614.47 samples/sec   Loss 3.4280   LearningRate 0.0066   Epoch: 14   Global Step: 616410   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:28,171-Speed 2630.22 samples/sec   Loss 3.4681   LearningRate 0.0066   Epoch: 14   Global Step: 616420   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:32,071-Speed 2626.15 samples/sec   Loss 3.5519   LearningRate 0.0066   Epoch: 14   Global Step: 616430   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:35,964-Speed 2631.32 samples/sec   Loss 3.4414   LearningRate 0.0066   Epoch: 14   Global Step: 616440   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:39,864-Speed 2626.83 samples/sec   Loss 3.4429   LearningRate 0.0066   Epoch: 14   Global Step: 616450   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:43,825-Speed 2585.57 samples/sec   Loss 3.5541   LearningRate 0.0066   Epoch: 14   Global Step: 616460   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:47,728-Speed 2624.98 samples/sec   Loss 3.5329   LearningRate 0.0066   Epoch: 14   Global Step: 616470   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:51,622-Speed 2629.96 samples/sec   Loss 3.4744   LearningRate 0.0066   Epoch: 14   Global Step: 616480   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:55,518-Speed 2629.29 samples/sec   Loss 3.4740   LearningRate 0.0066   Epoch: 14   Global Step: 616490   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:00:59,426-Speed 2621.30 samples/sec   Loss 3.4556   LearningRate 0.0066   Epoch: 14   Global Step: 616500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:03,322-Speed 2628.22 samples/sec   Loss 3.4299   LearningRate 0.0066   Epoch: 14   Global Step: 616510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:07,219-Speed 2628.52 samples/sec   Loss 3.4631   LearningRate 0.0066   Epoch: 14   Global Step: 616520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:11,117-Speed 2628.07 samples/sec   Loss 3.4897   LearningRate 0.0066   Epoch: 14   Global Step: 616530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:15,015-Speed 2627.93 samples/sec   Loss 3.4378   LearningRate 0.0066   Epoch: 14   Global Step: 616540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:18,910-Speed 2629.56 samples/sec   Loss 3.5113   LearningRate 0.0066   Epoch: 14   Global Step: 616550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:22,822-Speed 2618.49 samples/sec   Loss 3.5340   LearningRate 0.0066   Epoch: 14   Global Step: 616560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:26,717-Speed 2629.73 samples/sec   Loss 3.5930   LearningRate 0.0066   Epoch: 14   Global Step: 616570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:30,634-Speed 2614.59 samples/sec   Loss 3.4608   LearningRate 0.0066   Epoch: 14   Global Step: 616580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:34,539-Speed 2622.99 samples/sec   Loss 3.4868   LearningRate 0.0066   Epoch: 14   Global Step: 616590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:38,436-Speed 2628.86 samples/sec   Loss 3.4954   LearningRate 0.0066   Epoch: 14   Global Step: 616600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:01:42,317-Speed 2639.14 samples/sec   Loss 3.4578   LearningRate 0.0066   Epoch: 14   Global Step: 616610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:46,260-Speed 2598.05 samples/sec   Loss 3.5144   LearningRate 0.0066   Epoch: 14   Global Step: 616620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:50,159-Speed 2626.23 samples/sec   Loss 3.5016   LearningRate 0.0066   Epoch: 14   Global Step: 616630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:54,067-Speed 2621.54 samples/sec   Loss 3.5856   LearningRate 0.0066   Epoch: 14   Global Step: 616640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:01:57,971-Speed 2623.68 samples/sec   Loss 3.4024   LearningRate 0.0066   Epoch: 14   Global Step: 616650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:01,867-Speed 2628.77 samples/sec   Loss 3.4486   LearningRate 0.0066   Epoch: 14   Global Step: 616660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:05,771-Speed 2623.15 samples/sec   Loss 3.4978   LearningRate 0.0066   Epoch: 14   Global Step: 616670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:09,671-Speed 2626.59 samples/sec   Loss 3.4734   LearningRate 0.0066   Epoch: 14   Global Step: 616680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:13,570-Speed 2627.54 samples/sec   Loss 3.4746   LearningRate 0.0066   Epoch: 14   Global Step: 616690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:17,469-Speed 2627.46 samples/sec   Loss 3.4638   LearningRate 0.0066   Epoch: 14   Global Step: 616700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:21,349-Speed 2639.67 samples/sec   Loss 3.3951   LearningRate 0.0066   Epoch: 14   Global Step: 616710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:25,284-Speed 2603.43 samples/sec   Loss 3.5491   LearningRate 0.0066   Epoch: 14   Global Step: 616720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:29,177-Speed 2630.81 samples/sec   Loss 3.5521   LearningRate 0.0066   Epoch: 14   Global Step: 616730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:33,095-Speed 2613.95 samples/sec   Loss 3.5061   LearningRate 0.0066   Epoch: 14   Global Step: 616740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:36,995-Speed 2626.62 samples/sec   Loss 3.4869   LearningRate 0.0066   Epoch: 14   Global Step: 616750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:40,901-Speed 2622.41 samples/sec   Loss 3.5488   LearningRate 0.0066   Epoch: 14   Global Step: 616760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:44,804-Speed 2624.38 samples/sec   Loss 3.4826   LearningRate 0.0066   Epoch: 14   Global Step: 616770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:48,698-Speed 2630.46 samples/sec   Loss 3.4140   LearningRate 0.0066   Epoch: 14   Global Step: 616780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:52,615-Speed 2615.79 samples/sec   Loss 3.4394   LearningRate 0.0066   Epoch: 14   Global Step: 616790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:02:56,524-Speed 2619.97 samples/sec   Loss 3.5235   LearningRate 0.0066   Epoch: 14   Global Step: 616800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:03:00,410-Speed 2635.60 samples/sec   Loss 3.4313   LearningRate 0.0066   Epoch: 14   Global Step: 616810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:04,307-Speed 2627.91 samples/sec   Loss 3.4767   LearningRate 0.0066   Epoch: 14   Global Step: 616820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:08,227-Speed 2613.70 samples/sec   Loss 3.4615   LearningRate 0.0066   Epoch: 14   Global Step: 616830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:12,179-Speed 2591.57 samples/sec   Loss 3.5182   LearningRate 0.0066   Epoch: 14   Global Step: 616840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:16,073-Speed 2630.50 samples/sec   Loss 3.5207   LearningRate 0.0066   Epoch: 14   Global Step: 616850   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:19,980-Speed 2622.38 samples/sec   Loss 3.5389   LearningRate 0.0066   Epoch: 14   Global Step: 616860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:23,875-Speed 2629.36 samples/sec   Loss 3.4658   LearningRate 0.0066   Epoch: 14   Global Step: 616870   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:27,788-Speed 2618.47 samples/sec   Loss 3.4816   LearningRate 0.0066   Epoch: 14   Global Step: 616880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:31,687-Speed 2626.61 samples/sec   Loss 3.4374   LearningRate 0.0066   Epoch: 14   Global Step: 616890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:35,584-Speed 2627.94 samples/sec   Loss 3.5296   LearningRate 0.0066   Epoch: 14   Global Step: 616900   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:03:39,477-Speed 2630.95 samples/sec   Loss 3.4908   LearningRate 0.0066   Epoch: 14   Global Step: 616910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:03:43,382-Speed 2623.37 samples/sec   Loss 3.4866   LearningRate 0.0066   Epoch: 14   Global Step: 616920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:03:47,278-Speed 2629.10 samples/sec   Loss 3.3833   LearningRate 0.0066   Epoch: 14   Global Step: 616930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:03:51,214-Speed 2602.27 samples/sec   Loss 3.5079   LearningRate 0.0066   Epoch: 14   Global Step: 616940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:03:55,122-Speed 2620.92 samples/sec   Loss 3.5792   LearningRate 0.0066   Epoch: 14   Global Step: 616950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:03:59,017-Speed 2630.24 samples/sec   Loss 3.4784   LearningRate 0.0066   Epoch: 14   Global Step: 616960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:02,917-Speed 2625.95 samples/sec   Loss 3.4592   LearningRate 0.0066   Epoch: 14   Global Step: 616970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:06,811-Speed 2630.25 samples/sec   Loss 3.4532   LearningRate 0.0066   Epoch: 14   Global Step: 616980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:10,704-Speed 2631.05 samples/sec   Loss 3.5196   LearningRate 0.0066   Epoch: 14   Global Step: 616990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:14,668-Speed 2584.02 samples/sec   Loss 3.5011   LearningRate 0.0066   Epoch: 14   Global Step: 617000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:18,700-Speed 2540.35 samples/sec   Loss 3.4016   LearningRate 0.0066   Epoch: 14   Global Step: 617010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:22,620-Speed 2613.76 samples/sec   Loss 3.5020   LearningRate 0.0066   Epoch: 14   Global Step: 617020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:26,550-Speed 2606.16 samples/sec   Loss 3.5051   LearningRate 0.0066   Epoch: 14   Global Step: 617030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:30,454-Speed 2624.02 samples/sec   Loss 3.5457   LearningRate 0.0066   Epoch: 14   Global Step: 617040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:34,373-Speed 2612.90 samples/sec   Loss 3.5418   LearningRate 0.0066   Epoch: 14   Global Step: 617050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:38,293-Speed 2612.79 samples/sec   Loss 3.4933   LearningRate 0.0066   Epoch: 14   Global Step: 617060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:42,196-Speed 2624.99 samples/sec   Loss 3.4713   LearningRate 0.0066   Epoch: 14   Global Step: 617070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:46,108-Speed 2618.27 samples/sec   Loss 3.4464   LearningRate 0.0066   Epoch: 14   Global Step: 617080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:50,010-Speed 2625.30 samples/sec   Loss 3.5368   LearningRate 0.0066   Epoch: 14   Global Step: 617090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:53,925-Speed 2616.21 samples/sec   Loss 3.4843   LearningRate 0.0066   Epoch: 14   Global Step: 617100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:04:57,796-Speed 2646.50 samples/sec   Loss 3.4728   LearningRate 0.0066   Epoch: 14   Global Step: 617110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:01,725-Speed 2607.02 samples/sec   Loss 3.4260   LearningRate 0.0066   Epoch: 14   Global Step: 617120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:05,639-Speed 2617.02 samples/sec   Loss 3.4021   LearningRate 0.0066   Epoch: 14   Global Step: 617130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:09,537-Speed 2627.10 samples/sec   Loss 3.3910   LearningRate 0.0066   Epoch: 14   Global Step: 617140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:13,446-Speed 2620.86 samples/sec   Loss 3.4425   LearningRate 0.0066   Epoch: 14   Global Step: 617150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:17,546-Speed 2497.63 samples/sec   Loss 3.4238   LearningRate 0.0066   Epoch: 14   Global Step: 617160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:21,505-Speed 2588.58 samples/sec   Loss 3.6628   LearningRate 0.0066   Epoch: 14   Global Step: 617170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:25,466-Speed 2585.65 samples/sec   Loss 3.5454   LearningRate 0.0066   Epoch: 14   Global Step: 617180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:29,380-Speed 2617.89 samples/sec   Loss 3.4520   LearningRate 0.0066   Epoch: 14   Global Step: 617190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:33,302-Speed 2611.93 samples/sec   Loss 3.4382   LearningRate 0.0066   Epoch: 14   Global Step: 617200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:37,199-Speed 2628.13 samples/sec   Loss 3.4876   LearningRate 0.0066   Epoch: 14   Global Step: 617210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:05:41,071-Speed 2645.08 samples/sec   Loss 3.4838   LearningRate 0.0066   Epoch: 14   Global Step: 617220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:44,974-Speed 2624.31 samples/sec   Loss 3.4778   LearningRate 0.0066   Epoch: 14   Global Step: 617230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:48,868-Speed 2630.49 samples/sec   Loss 3.5303   LearningRate 0.0066   Epoch: 14   Global Step: 617240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:52,761-Speed 2631.52 samples/sec   Loss 3.5047   LearningRate 0.0066   Epoch: 14   Global Step: 617250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:05:56,661-Speed 2625.60 samples/sec   Loss 3.4917   LearningRate 0.0066   Epoch: 14   Global Step: 617260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:00,553-Speed 2632.23 samples/sec   Loss 3.4852   LearningRate 0.0065   Epoch: 14   Global Step: 617270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:04,467-Speed 2616.61 samples/sec   Loss 3.4283   LearningRate 0.0065   Epoch: 14   Global Step: 617280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:08,363-Speed 2628.63 samples/sec   Loss 3.4437   LearningRate 0.0065   Epoch: 14   Global Step: 617290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:12,264-Speed 2625.90 samples/sec   Loss 3.5077   LearningRate 0.0065   Epoch: 14   Global Step: 617300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:16,154-Speed 2633.40 samples/sec   Loss 3.4515   LearningRate 0.0065   Epoch: 14   Global Step: 617310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:20,022-Speed 2648.37 samples/sec   Loss 3.4835   LearningRate 0.0065   Epoch: 14   Global Step: 617320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:23,927-Speed 2622.16 samples/sec   Loss 3.4296   LearningRate 0.0065   Epoch: 14   Global Step: 617330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:27,832-Speed 2622.74 samples/sec   Loss 3.4519   LearningRate 0.0065   Epoch: 14   Global Step: 617340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:31,726-Speed 2630.66 samples/sec   Loss 3.4897   LearningRate 0.0065   Epoch: 14   Global Step: 617350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:35,629-Speed 2625.11 samples/sec   Loss 3.4598   LearningRate 0.0065   Epoch: 14   Global Step: 617360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:39,522-Speed 2630.77 samples/sec   Loss 3.4934   LearningRate 0.0065   Epoch: 14   Global Step: 617370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:43,422-Speed 2625.76 samples/sec   Loss 3.4794   LearningRate 0.0065   Epoch: 14   Global Step: 617380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:47,322-Speed 2626.94 samples/sec   Loss 3.4595   LearningRate 0.0065   Epoch: 14   Global Step: 617390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:51,227-Speed 2622.83 samples/sec   Loss 3.4992   LearningRate 0.0065   Epoch: 14   Global Step: 617400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:55,125-Speed 2628.08 samples/sec   Loss 3.4475   LearningRate 0.0065   Epoch: 14   Global Step: 617410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:06:59,033-Speed 2620.43 samples/sec   Loss 3.4208   LearningRate 0.0065   Epoch: 14   Global Step: 617420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:07:02,910-Speed 2641.79 samples/sec   Loss 3.4509   LearningRate 0.0065   Epoch: 14   Global Step: 617430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:07:06,810-Speed 2626.11 samples/sec   Loss 3.4451   LearningRate 0.0065   Epoch: 14   Global Step: 617440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:07:10,708-Speed 2628.34 samples/sec   Loss 3.5119   LearningRate 0.0065   Epoch: 14   Global Step: 617450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:07:14,606-Speed 2627.56 samples/sec   Loss 3.5486   LearningRate 0.0065   Epoch: 14   Global Step: 617460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:07:18,532-Speed 2609.05 samples/sec   Loss 3.5340   LearningRate 0.0065   Epoch: 14   Global Step: 617470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:07:22,430-Speed 2627.47 samples/sec   Loss 3.4783   LearningRate 0.0065   Epoch: 14   Global Step: 617480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:07:26,305-Speed 2643.41 samples/sec   Loss 3.5091   LearningRate 0.0065   Epoch: 14   Global Step: 617490   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:30,216-Speed 2619.26 samples/sec   Loss 3.5136   LearningRate 0.0065   Epoch: 14   Global Step: 617500   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:34,112-Speed 2628.74 samples/sec   Loss 3.4677   LearningRate 0.0065   Epoch: 14   Global Step: 617510   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:38,014-Speed 2624.32 samples/sec   Loss 3.4708   LearningRate 0.0065   Epoch: 14   Global Step: 617520   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:41,915-Speed 2626.46 samples/sec   Loss 3.4450   LearningRate 0.0065   Epoch: 14   Global Step: 617530   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:45,817-Speed 2625.28 samples/sec   Loss 3.5654   LearningRate 0.0065   Epoch: 14   Global Step: 617540   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:49,714-Speed 2628.23 samples/sec   Loss 3.5065   LearningRate 0.0065   Epoch: 14   Global Step: 617550   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:53,618-Speed 2623.62 samples/sec   Loss 3.4187   LearningRate 0.0065   Epoch: 14   Global Step: 617560   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:07:57,527-Speed 2620.56 samples/sec   Loss 3.6257   LearningRate 0.0065   Epoch: 14   Global Step: 617570   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:08:01,417-Speed 2632.60 samples/sec   Loss 3.4592   LearningRate 0.0065   Epoch: 14   Global Step: 617580   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:08:05,311-Speed 2630.45 samples/sec   Loss 3.5318   LearningRate 0.0065   Epoch: 14   Global Step: 617590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:09,244-Speed 2604.49 samples/sec   Loss 3.4958   LearningRate 0.0065   Epoch: 14   Global Step: 617600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:13,145-Speed 2625.43 samples/sec   Loss 3.4031   LearningRate 0.0065   Epoch: 14   Global Step: 617610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:17,061-Speed 2615.98 samples/sec   Loss 3.4621   LearningRate 0.0065   Epoch: 14   Global Step: 617620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:20,971-Speed 2619.19 samples/sec   Loss 3.4721   LearningRate 0.0065   Epoch: 14   Global Step: 617630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:24,868-Speed 2628.97 samples/sec   Loss 3.4447   LearningRate 0.0065   Epoch: 14   Global Step: 617640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:28,761-Speed 2630.89 samples/sec   Loss 3.4800   LearningRate 0.0065   Epoch: 14   Global Step: 617650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:32,654-Speed 2631.18 samples/sec   Loss 3.4943   LearningRate 0.0065   Epoch: 14   Global Step: 617660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:36,572-Speed 2613.52 samples/sec   Loss 3.6068   LearningRate 0.0065   Epoch: 14   Global Step: 617670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:40,464-Speed 2632.18 samples/sec   Loss 3.5134   LearningRate 0.0065   Epoch: 14   Global Step: 617680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:44,345-Speed 2638.93 samples/sec   Loss 3.4060   LearningRate 0.0065   Epoch: 14   Global Step: 617690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:48,239-Speed 2630.77 samples/sec   Loss 3.5043   LearningRate 0.0065   Epoch: 14   Global Step: 617700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:52,134-Speed 2629.56 samples/sec   Loss 3.4399   LearningRate 0.0065   Epoch: 14   Global Step: 617710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:56,027-Speed 2631.26 samples/sec   Loss 3.4577   LearningRate 0.0065   Epoch: 14   Global Step: 617720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:08:59,936-Speed 2619.98 samples/sec   Loss 3.3907   LearningRate 0.0065   Epoch: 14   Global Step: 617730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:03,846-Speed 2619.54 samples/sec   Loss 3.4936   LearningRate 0.0065   Epoch: 14   Global Step: 617740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:07,738-Speed 2631.09 samples/sec   Loss 3.4359   LearningRate 0.0065   Epoch: 14   Global Step: 617750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:11,631-Speed 2632.19 samples/sec   Loss 3.4771   LearningRate 0.0065   Epoch: 14   Global Step: 617760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:15,544-Speed 2617.46 samples/sec   Loss 3.3848   LearningRate 0.0065   Epoch: 14   Global Step: 617770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:19,435-Speed 2632.31 samples/sec   Loss 3.5218   LearningRate 0.0065   Epoch: 14   Global Step: 617780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:23,348-Speed 2618.18 samples/sec   Loss 3.4002   LearningRate 0.0065   Epoch: 14   Global Step: 617790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:09:27,221-Speed 2644.30 samples/sec   Loss 3.4820   LearningRate 0.0065   Epoch: 14   Global Step: 617800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:31,135-Speed 2617.23 samples/sec   Loss 3.4739   LearningRate 0.0065   Epoch: 14   Global Step: 617810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:35,033-Speed 2627.69 samples/sec   Loss 3.4851   LearningRate 0.0065   Epoch: 14   Global Step: 617820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:09:38,912-Speed 2640.65 samples/sec   Loss 3.4407   LearningRate 0.0065   Epoch: 14   Global Step: 617830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:09:42,810-Speed 2627.00 samples/sec   Loss 3.4839   LearningRate 0.0065   Epoch: 14   Global Step: 617840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:09:46,706-Speed 2629.50 samples/sec   Loss 3.5062   LearningRate 0.0065   Epoch: 14   Global Step: 617850   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:09:50,597-Speed 2632.61 samples/sec   Loss 3.5045   LearningRate 0.0065   Epoch: 14   Global Step: 617860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:09:54,503-Speed 2622.32 samples/sec   Loss 3.4980   LearningRate 0.0065   Epoch: 14   Global Step: 617870   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:09:58,401-Speed 2628.36 samples/sec   Loss 3.5068   LearningRate 0.0065   Epoch: 14   Global Step: 617880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:10:02,299-Speed 2627.04 samples/sec   Loss 3.3983   LearningRate 0.0065   Epoch: 14   Global Step: 617890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:10:06,196-Speed 2628.43 samples/sec   Loss 3.4729   LearningRate 0.0065   Epoch: 14   Global Step: 617900   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:10:10,099-Speed 2624.31 samples/sec   Loss 3.5038   LearningRate 0.0065   Epoch: 14   Global Step: 617910   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:10:13,995-Speed 2628.62 samples/sec   Loss 3.4232   LearningRate 0.0065   Epoch: 14   Global Step: 617920   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:10:17,890-Speed 2629.70 samples/sec   Loss 3.4149   LearningRate 0.0065   Epoch: 14   Global Step: 617930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:21,783-Speed 2631.63 samples/sec   Loss 3.5227   LearningRate 0.0065   Epoch: 14   Global Step: 617940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:25,688-Speed 2622.84 samples/sec   Loss 3.4873   LearningRate 0.0065   Epoch: 14   Global Step: 617950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:29,596-Speed 2621.28 samples/sec   Loss 3.4925   LearningRate 0.0065   Epoch: 14   Global Step: 617960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:33,576-Speed 2572.80 samples/sec   Loss 3.4627   LearningRate 0.0065   Epoch: 14   Global Step: 617970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:37,484-Speed 2620.79 samples/sec   Loss 3.4891   LearningRate 0.0065   Epoch: 14   Global Step: 617980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:41,415-Speed 2605.61 samples/sec   Loss 3.4625   LearningRate 0.0065   Epoch: 14   Global Step: 617990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:45,313-Speed 2628.36 samples/sec   Loss 3.4826   LearningRate 0.0065   Epoch: 14   Global Step: 618000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:49,209-Speed 2628.47 samples/sec   Loss 3.3909   LearningRate 0.0065   Epoch: 14   Global Step: 618010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:53,106-Speed 2628.95 samples/sec   Loss 3.4688   LearningRate 0.0065   Epoch: 14   Global Step: 618020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:10:56,975-Speed 2646.98 samples/sec   Loss 3.4914   LearningRate 0.0065   Epoch: 14   Global Step: 618030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:00,867-Speed 2632.41 samples/sec   Loss 3.4317   LearningRate 0.0065   Epoch: 14   Global Step: 618040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:04,769-Speed 2624.72 samples/sec   Loss 3.3664   LearningRate 0.0065   Epoch: 14   Global Step: 618050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:08,662-Speed 2630.66 samples/sec   Loss 3.4952   LearningRate 0.0065   Epoch: 14   Global Step: 618060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:12,554-Speed 2631.79 samples/sec   Loss 3.5289   LearningRate 0.0065   Epoch: 14   Global Step: 618070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:16,452-Speed 2627.78 samples/sec   Loss 3.4775   LearningRate 0.0065   Epoch: 14   Global Step: 618080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:20,345-Speed 2630.61 samples/sec   Loss 3.4829   LearningRate 0.0065   Epoch: 14   Global Step: 618090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:24,241-Speed 2629.21 samples/sec   Loss 3.4857   LearningRate 0.0065   Epoch: 14   Global Step: 618100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:28,133-Speed 2631.44 samples/sec   Loss 3.4375   LearningRate 0.0065   Epoch: 14   Global Step: 618110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:32,029-Speed 2629.18 samples/sec   Loss 3.4257   LearningRate 0.0065   Epoch: 14   Global Step: 618120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:35,928-Speed 2627.14 samples/sec   Loss 3.3636   LearningRate 0.0065   Epoch: 14   Global Step: 618130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:11:39,825-Speed 2628.10 samples/sec   Loss 3.4928   LearningRate 0.0065   Epoch: 14   Global Step: 618140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:11:43,696-Speed 2645.54 samples/sec   Loss 3.4206   LearningRate 0.0065   Epoch: 14   Global Step: 618150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:47,603-Speed 2621.22 samples/sec   Loss 3.5025   LearningRate 0.0065   Epoch: 14   Global Step: 618160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:51,497-Speed 2631.24 samples/sec   Loss 3.4746   LearningRate 0.0065   Epoch: 14   Global Step: 618170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:55,394-Speed 2628.05 samples/sec   Loss 3.4873   LearningRate 0.0065   Epoch: 14   Global Step: 618180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:11:59,299-Speed 2622.92 samples/sec   Loss 3.4560   LearningRate 0.0065   Epoch: 14   Global Step: 618190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:03,194-Speed 2629.83 samples/sec   Loss 3.5777   LearningRate 0.0065   Epoch: 14   Global Step: 618200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:07,099-Speed 2623.34 samples/sec   Loss 3.4571   LearningRate 0.0065   Epoch: 14   Global Step: 618210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:10,997-Speed 2627.40 samples/sec   Loss 3.4034   LearningRate 0.0065   Epoch: 14   Global Step: 618220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:14,901-Speed 2623.41 samples/sec   Loss 3.4975   LearningRate 0.0065   Epoch: 14   Global Step: 618230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:18,798-Speed 2628.77 samples/sec   Loss 3.4891   LearningRate 0.0065   Epoch: 14   Global Step: 618240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:22,694-Speed 2628.97 samples/sec   Loss 3.4699   LearningRate 0.0065   Epoch: 14   Global Step: 618250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:12:26,578-Speed 2636.78 samples/sec   Loss 3.4849   LearningRate 0.0065   Epoch: 14   Global Step: 618260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:30,485-Speed 2621.65 samples/sec   Loss 3.5080   LearningRate 0.0065   Epoch: 14   Global Step: 618270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:34,384-Speed 2626.71 samples/sec   Loss 3.4791   LearningRate 0.0065   Epoch: 14   Global Step: 618280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:38,281-Speed 2628.76 samples/sec   Loss 3.4433   LearningRate 0.0065   Epoch: 14   Global Step: 618290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:42,181-Speed 2625.96 samples/sec   Loss 3.4355   LearningRate 0.0065   Epoch: 14   Global Step: 618300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:46,085-Speed 2624.21 samples/sec   Loss 3.4309   LearningRate 0.0065   Epoch: 14   Global Step: 618310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:49,992-Speed 2621.60 samples/sec   Loss 3.5047   LearningRate 0.0065   Epoch: 14   Global Step: 618320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:53,887-Speed 2629.61 samples/sec   Loss 3.4772   LearningRate 0.0065   Epoch: 14   Global Step: 618330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:12:57,810-Speed 2611.10 samples/sec   Loss 3.4949   LearningRate 0.0065   Epoch: 14   Global Step: 618340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:01,717-Speed 2621.16 samples/sec   Loss 3.4922   LearningRate 0.0065   Epoch: 14   Global Step: 618350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:05,598-Speed 2639.58 samples/sec   Loss 3.4396   LearningRate 0.0065   Epoch: 14   Global Step: 618360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:09,491-Speed 2630.92 samples/sec   Loss 3.4442   LearningRate 0.0065   Epoch: 14   Global Step: 618370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:13,382-Speed 2632.54 samples/sec   Loss 3.3948   LearningRate 0.0065   Epoch: 14   Global Step: 618380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:17,284-Speed 2624.95 samples/sec   Loss 3.4346   LearningRate 0.0065   Epoch: 14   Global Step: 618390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:21,176-Speed 2631.40 samples/sec   Loss 3.4047   LearningRate 0.0065   Epoch: 14   Global Step: 618400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:25,074-Speed 2628.36 samples/sec   Loss 3.5125   LearningRate 0.0065   Epoch: 14   Global Step: 618410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:28,979-Speed 2623.12 samples/sec   Loss 3.5267   LearningRate 0.0065   Epoch: 14   Global Step: 618420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:32,947-Speed 2581.20 samples/sec   Loss 3.4682   LearningRate 0.0065   Epoch: 14   Global Step: 618430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:36,868-Speed 2612.34 samples/sec   Loss 3.4403   LearningRate 0.0065   Epoch: 14   Global Step: 618440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:40,763-Speed 2631.21 samples/sec   Loss 3.5006   LearningRate 0.0065   Epoch: 14   Global Step: 618450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:44,658-Speed 2629.12 samples/sec   Loss 3.4587   LearningRate 0.0065   Epoch: 14   Global Step: 618460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:13:48,529-Speed 2646.61 samples/sec   Loss 3.4794   LearningRate 0.0065   Epoch: 14   Global Step: 618470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:52,469-Speed 2599.49 samples/sec   Loss 3.4121   LearningRate 0.0065   Epoch: 14   Global Step: 618480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:13:56,457-Speed 2568.79 samples/sec   Loss 3.4897   LearningRate 0.0065   Epoch: 14   Global Step: 618490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:00,361-Speed 2623.33 samples/sec   Loss 3.3514   LearningRate 0.0065   Epoch: 14   Global Step: 618500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:04,250-Speed 2633.62 samples/sec   Loss 3.4191   LearningRate 0.0065   Epoch: 14   Global Step: 618510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:08,141-Speed 2632.17 samples/sec   Loss 3.4074   LearningRate 0.0065   Epoch: 14   Global Step: 618520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:12,038-Speed 2629.33 samples/sec   Loss 3.5018   LearningRate 0.0065   Epoch: 14   Global Step: 618530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:15,930-Speed 2632.35 samples/sec   Loss 3.5059   LearningRate 0.0065   Epoch: 14   Global Step: 618540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:19,855-Speed 2609.27 samples/sec   Loss 3.5208   LearningRate 0.0065   Epoch: 14   Global Step: 618550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:23,776-Speed 2612.74 samples/sec   Loss 3.3334   LearningRate 0.0065   Epoch: 14   Global Step: 618560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:14:27,647-Speed 2646.21 samples/sec   Loss 3.4805   LearningRate 0.0065   Epoch: 14   Global Step: 618570   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:31,541-Speed 2629.97 samples/sec   Loss 3.4640   LearningRate 0.0065   Epoch: 14   Global Step: 618580   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:35,438-Speed 2628.01 samples/sec   Loss 3.4046   LearningRate 0.0065   Epoch: 14   Global Step: 618590   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:39,336-Speed 2628.42 samples/sec   Loss 3.4070   LearningRate 0.0065   Epoch: 14   Global Step: 618600   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:43,237-Speed 2625.66 samples/sec   Loss 3.4566   LearningRate 0.0065   Epoch: 14   Global Step: 618610   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:47,128-Speed 2632.46 samples/sec   Loss 3.4941   LearningRate 0.0065   Epoch: 14   Global Step: 618620   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:51,041-Speed 2617.33 samples/sec   Loss 3.4626   LearningRate 0.0065   Epoch: 14   Global Step: 618630   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:54,979-Speed 2601.40 samples/sec   Loss 3.5015   LearningRate 0.0065   Epoch: 14   Global Step: 618640   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:14:58,870-Speed 2632.18 samples/sec   Loss 3.4036   LearningRate 0.0065   Epoch: 14   Global Step: 618650   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:15:02,763-Speed 2630.58 samples/sec   Loss 3.4875   LearningRate 0.0065   Epoch: 14   Global Step: 618660   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:15:06,662-Speed 2627.15 samples/sec   Loss 3.4507   LearningRate 0.0065   Epoch: 14   Global Step: 618670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:10,560-Speed 2627.56 samples/sec   Loss 3.4788   LearningRate 0.0065   Epoch: 14   Global Step: 618680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:14,506-Speed 2595.57 samples/sec   Loss 3.4127   LearningRate 0.0065   Epoch: 14   Global Step: 618690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:18,538-Speed 2540.42 samples/sec   Loss 3.4791   LearningRate 0.0065   Epoch: 14   Global Step: 618700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:22,452-Speed 2618.52 samples/sec   Loss 3.4628   LearningRate 0.0065   Epoch: 14   Global Step: 618710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:26,351-Speed 2626.79 samples/sec   Loss 3.4947   LearningRate 0.0065   Epoch: 14   Global Step: 618720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:30,297-Speed 2596.12 samples/sec   Loss 3.5456   LearningRate 0.0065   Epoch: 14   Global Step: 618730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:34,196-Speed 2627.07 samples/sec   Loss 3.4017   LearningRate 0.0065   Epoch: 14   Global Step: 618740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:38,105-Speed 2619.94 samples/sec   Loss 3.4853   LearningRate 0.0065   Epoch: 14   Global Step: 618750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:42,048-Speed 2598.09 samples/sec   Loss 3.4635   LearningRate 0.0065   Epoch: 14   Global Step: 618760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:15:45,925-Speed 2642.06 samples/sec   Loss 3.4856   LearningRate 0.0065   Epoch: 14   Global Step: 618770   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:15:49,818-Speed 2630.72 samples/sec   Loss 3.4581   LearningRate 0.0065   Epoch: 14   Global Step: 618780   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:15:53,726-Speed 2621.45 samples/sec   Loss 3.4400   LearningRate 0.0065   Epoch: 14   Global Step: 618790   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:15:57,632-Speed 2622.45 samples/sec   Loss 3.4455   LearningRate 0.0065   Epoch: 14   Global Step: 618800   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:01,542-Speed 2619.96 samples/sec   Loss 3.4718   LearningRate 0.0065   Epoch: 14   Global Step: 618810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:05,436-Speed 2630.22 samples/sec   Loss 3.5856   LearningRate 0.0065   Epoch: 14   Global Step: 618820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:09,344-Speed 2620.97 samples/sec   Loss 3.5317   LearningRate 0.0065   Epoch: 14   Global Step: 618830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:13,240-Speed 2628.56 samples/sec   Loss 3.5355   LearningRate 0.0065   Epoch: 14   Global Step: 618840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:17,140-Speed 2627.02 samples/sec   Loss 3.4553   LearningRate 0.0065   Epoch: 14   Global Step: 618850   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:21,094-Speed 2590.95 samples/sec   Loss 3.4649   LearningRate 0.0065   Epoch: 14   Global Step: 618860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:24,990-Speed 2629.05 samples/sec   Loss 3.4708   LearningRate 0.0065   Epoch: 14   Global Step: 618870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:16:28,876-Speed 2635.47 samples/sec   Loss 3.3698   LearningRate 0.0065   Epoch: 14   Global Step: 618880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:32,793-Speed 2615.27 samples/sec   Loss 3.4904   LearningRate 0.0065   Epoch: 14   Global Step: 618890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:36,690-Speed 2628.50 samples/sec   Loss 3.4530   LearningRate 0.0064   Epoch: 14   Global Step: 618900   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:40,591-Speed 2625.46 samples/sec   Loss 3.4242   LearningRate 0.0064   Epoch: 14   Global Step: 618910   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:44,488-Speed 2628.27 samples/sec   Loss 3.3855   LearningRate 0.0064   Epoch: 14   Global Step: 618920   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:48,381-Speed 2631.37 samples/sec   Loss 3.4877   LearningRate 0.0064   Epoch: 14   Global Step: 618930   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:52,278-Speed 2628.49 samples/sec   Loss 3.4452   LearningRate 0.0064   Epoch: 14   Global Step: 618940   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:16:56,199-Speed 2611.98 samples/sec   Loss 3.4561   LearningRate 0.0064   Epoch: 14   Global Step: 618950   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:00,099-Speed 2627.20 samples/sec   Loss 3.4713   LearningRate 0.0064   Epoch: 14   Global Step: 618960   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:03,995-Speed 2628.76 samples/sec   Loss 3.5233   LearningRate 0.0064   Epoch: 14   Global Step: 618970   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:07,896-Speed 2625.31 samples/sec   Loss 3.4374   LearningRate 0.0064   Epoch: 14   Global Step: 618980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:17:11,772-Speed 2642.58 samples/sec   Loss 3.4918   LearningRate 0.0064   Epoch: 14   Global Step: 618990   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:15,682-Speed 2620.06 samples/sec   Loss 3.4392   LearningRate 0.0064   Epoch: 14   Global Step: 619000   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:19,611-Speed 2606.44 samples/sec   Loss 3.4252   LearningRate 0.0064   Epoch: 14   Global Step: 619010   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:23,506-Speed 2630.28 samples/sec   Loss 3.4244   LearningRate 0.0064   Epoch: 14   Global Step: 619020   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:27,406-Speed 2626.08 samples/sec   Loss 3.5428   LearningRate 0.0064   Epoch: 14   Global Step: 619030   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:31,309-Speed 2624.49 samples/sec   Loss 3.3681   LearningRate 0.0064   Epoch: 14   Global Step: 619040   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:35,347-Speed 2536.63 samples/sec   Loss 3.4626   LearningRate 0.0064   Epoch: 14   Global Step: 619050   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:39,263-Speed 2615.42 samples/sec   Loss 3.4822   LearningRate 0.0064   Epoch: 14   Global Step: 619060   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:43,170-Speed 2621.07 samples/sec   Loss 3.4444   LearningRate 0.0064   Epoch: 14   Global Step: 619070   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:47,068-Speed 2628.04 samples/sec   Loss 3.3711   LearningRate 0.0064   Epoch: 14   Global Step: 619080   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:17:50,962-Speed 2630.57 samples/sec   Loss 3.4444   LearningRate 0.0064   Epoch: 14   Global Step: 619090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:17:54,861-Speed 2626.79 samples/sec   Loss 3.4402   LearningRate 0.0064   Epoch: 14   Global Step: 619100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:17:58,761-Speed 2626.73 samples/sec   Loss 3.5762   LearningRate 0.0064   Epoch: 14   Global Step: 619110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:18:02,637-Speed 2642.10 samples/sec   Loss 3.4834   LearningRate 0.0064   Epoch: 14   Global Step: 619120   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:06,527-Speed 2633.41 samples/sec   Loss 3.4947   LearningRate 0.0064   Epoch: 14   Global Step: 619130   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:10,419-Speed 2631.11 samples/sec   Loss 3.4957   LearningRate 0.0064   Epoch: 14   Global Step: 619140   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:14,330-Speed 2618.97 samples/sec   Loss 3.4382   LearningRate 0.0064   Epoch: 14   Global Step: 619150   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:18,275-Speed 2596.17 samples/sec   Loss 3.5306   LearningRate 0.0064   Epoch: 14   Global Step: 619160   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:22,255-Speed 2573.69 samples/sec   Loss 3.4840   LearningRate 0.0064   Epoch: 14   Global Step: 619170   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:26,156-Speed 2625.28 samples/sec   Loss 3.4756   LearningRate 0.0064   Epoch: 14   Global Step: 619180   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:30,065-Speed 2620.62 samples/sec   Loss 3.4157   LearningRate 0.0064   Epoch: 14   Global Step: 619190   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:33,973-Speed 2620.75 samples/sec   Loss 3.4560   LearningRate 0.0064   Epoch: 14   Global Step: 619200   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:37,880-Speed 2622.06 samples/sec   Loss 3.4864   LearningRate 0.0064   Epoch: 14   Global Step: 619210   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:18:41,785-Speed 2623.04 samples/sec   Loss 3.4329   LearningRate 0.0064   Epoch: 14   Global Step: 619220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:18:45,693-Speed 2621.09 samples/sec   Loss 3.4634   LearningRate 0.0064   Epoch: 14   Global Step: 619230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:18:49,586-Speed 2631.27 samples/sec   Loss 3.3900   LearningRate 0.0064   Epoch: 14   Global Step: 619240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:18:53,490-Speed 2623.15 samples/sec   Loss 3.4875   LearningRate 0.0064   Epoch: 14   Global Step: 619250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:18:57,389-Speed 2627.19 samples/sec   Loss 3.4412   LearningRate 0.0064   Epoch: 14   Global Step: 619260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:19:01,307-Speed 2614.76 samples/sec   Loss 3.5050   LearningRate 0.0064   Epoch: 14   Global Step: 619270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:19:05,226-Speed 2613.20 samples/sec   Loss 3.5117   LearningRate 0.0064   Epoch: 14   Global Step: 619280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:19:09,120-Speed 2630.07 samples/sec   Loss 3.4496   LearningRate 0.0064   Epoch: 14   Global Step: 619290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:19:12,997-Speed 2642.64 samples/sec   Loss 3.4664   LearningRate 0.0064   Epoch: 14   Global Step: 619300   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:16,977-Speed 2573.82 samples/sec   Loss 3.3999   LearningRate 0.0064   Epoch: 14   Global Step: 619310   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:20,899-Speed 2610.92 samples/sec   Loss 3.4897   LearningRate 0.0064   Epoch: 14   Global Step: 619320   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:24,800-Speed 2626.12 samples/sec   Loss 3.4102   LearningRate 0.0064   Epoch: 14   Global Step: 619330   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:28,699-Speed 2627.55 samples/sec   Loss 3.4813   LearningRate 0.0064   Epoch: 14   Global Step: 619340   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:32,606-Speed 2621.70 samples/sec   Loss 3.3819   LearningRate 0.0064   Epoch: 14   Global Step: 619350   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:36,535-Speed 2606.76 samples/sec   Loss 3.4237   LearningRate 0.0064   Epoch: 14   Global Step: 619360   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:40,449-Speed 2617.11 samples/sec   Loss 3.5157   LearningRate 0.0064   Epoch: 14   Global Step: 619370   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:44,406-Speed 2588.18 samples/sec   Loss 3.3785   LearningRate 0.0064   Epoch: 14   Global Step: 619380   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:48,322-Speed 2616.21 samples/sec   Loss 3.5094   LearningRate 0.0064   Epoch: 14   Global Step: 619390   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:19:52,220-Speed 2627.82 samples/sec   Loss 3.3938   LearningRate 0.0064   Epoch: 14   Global Step: 619400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:19:56,146-Speed 2608.78 samples/sec   Loss 3.4842   LearningRate 0.0064   Epoch: 14   Global Step: 619410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:00,097-Speed 2592.40 samples/sec   Loss 3.5009   LearningRate 0.0064   Epoch: 14   Global Step: 619420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:03,996-Speed 2627.34 samples/sec   Loss 3.4440   LearningRate 0.0064   Epoch: 14   Global Step: 619430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:07,887-Speed 2631.92 samples/sec   Loss 3.3851   LearningRate 0.0064   Epoch: 14   Global Step: 619440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:11,785-Speed 2627.37 samples/sec   Loss 3.3939   LearningRate 0.0064   Epoch: 14   Global Step: 619450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:15,678-Speed 2631.04 samples/sec   Loss 3.4449   LearningRate 0.0064   Epoch: 14   Global Step: 619460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:19,584-Speed 2622.69 samples/sec   Loss 3.4930   LearningRate 0.0064   Epoch: 14   Global Step: 619470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:23,475-Speed 2632.47 samples/sec   Loss 3.4186   LearningRate 0.0064   Epoch: 14   Global Step: 619480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:27,369-Speed 2629.84 samples/sec   Loss 3.4546   LearningRate 0.0064   Epoch: 14   Global Step: 619490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:31,261-Speed 2632.13 samples/sec   Loss 3.4567   LearningRate 0.0064   Epoch: 14   Global Step: 619500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:20:35,135-Speed 2643.78 samples/sec   Loss 3.4505   LearningRate 0.0064   Epoch: 14   Global Step: 619510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:20:39,018-Speed 2637.45 samples/sec   Loss 3.4354   LearningRate 0.0064   Epoch: 14   Global Step: 619520   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:20:42,909-Speed 2632.31 samples/sec   Loss 3.4755   LearningRate 0.0064   Epoch: 14   Global Step: 619530   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:20:46,810-Speed 2626.21 samples/sec   Loss 3.3934   LearningRate 0.0064   Epoch: 14   Global Step: 619540   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:20:50,706-Speed 2628.66 samples/sec   Loss 3.4409   LearningRate 0.0064   Epoch: 14   Global Step: 619550   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:20:54,603-Speed 2628.59 samples/sec   Loss 3.4535   LearningRate 0.0064   Epoch: 14   Global Step: 619560   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:20:58,502-Speed 2627.17 samples/sec   Loss 3.5268   LearningRate 0.0064   Epoch: 14   Global Step: 619570   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:21:02,418-Speed 2616.08 samples/sec   Loss 3.4371   LearningRate 0.0064   Epoch: 14   Global Step: 619580   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:21:06,329-Speed 2618.58 samples/sec   Loss 3.4824   LearningRate 0.0064   Epoch: 14   Global Step: 619590   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:21:10,253-Speed 2610.38 samples/sec   Loss 3.4376   LearningRate 0.0064   Epoch: 14   Global Step: 619600   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:21:14,167-Speed 2616.76 samples/sec   Loss 3.4098   LearningRate 0.0064   Epoch: 14   Global Step: 619610   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:21:18,071-Speed 2624.17 samples/sec   Loss 3.3998   LearningRate 0.0064   Epoch: 14   Global Step: 619620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:21,970-Speed 2626.95 samples/sec   Loss 3.3746   LearningRate 0.0064   Epoch: 14   Global Step: 619630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:25,873-Speed 2623.91 samples/sec   Loss 3.4608   LearningRate 0.0064   Epoch: 14   Global Step: 619640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:29,775-Speed 2625.30 samples/sec   Loss 3.4457   LearningRate 0.0064   Epoch: 14   Global Step: 619650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:33,681-Speed 2622.36 samples/sec   Loss 3.4061   LearningRate 0.0064   Epoch: 14   Global Step: 619660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:37,584-Speed 2624.50 samples/sec   Loss 3.4106   LearningRate 0.0064   Epoch: 14   Global Step: 619670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:41,491-Speed 2621.17 samples/sec   Loss 3.4777   LearningRate 0.0064   Epoch: 14   Global Step: 619680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:45,394-Speed 2624.21 samples/sec   Loss 3.4065   LearningRate 0.0064   Epoch: 14   Global Step: 619690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:49,302-Speed 2621.29 samples/sec   Loss 3.4390   LearningRate 0.0064   Epoch: 14   Global Step: 619700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:53,212-Speed 2619.61 samples/sec   Loss 3.4867   LearningRate 0.0064   Epoch: 14   Global Step: 619710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:21:57,122-Speed 2619.53 samples/sec   Loss 3.4174   LearningRate 0.0064   Epoch: 14   Global Step: 619720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:01,027-Speed 2623.48 samples/sec   Loss 3.4392   LearningRate 0.0064   Epoch: 14   Global Step: 619730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:04,926-Speed 2626.85 samples/sec   Loss 3.4268   LearningRate 0.0064   Epoch: 14   Global Step: 619740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:08,826-Speed 2626.06 samples/sec   Loss 3.4423   LearningRate 0.0064   Epoch: 14   Global Step: 619750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:12,727-Speed 2625.37 samples/sec   Loss 3.5608   LearningRate 0.0064   Epoch: 14   Global Step: 619760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:16,625-Speed 2628.21 samples/sec   Loss 3.4667   LearningRate 0.0064   Epoch: 14   Global Step: 619770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:20,529-Speed 2623.57 samples/sec   Loss 3.3568   LearningRate 0.0064   Epoch: 14   Global Step: 619780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:24,432-Speed 2624.62 samples/sec   Loss 3.4400   LearningRate 0.0064   Epoch: 14   Global Step: 619790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:28,332-Speed 2625.59 samples/sec   Loss 3.4030   LearningRate 0.0064   Epoch: 14   Global Step: 619800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:22:32,211-Speed 2641.75 samples/sec   Loss 3.4402   LearningRate 0.0064   Epoch: 14   Global Step: 619810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:22:36,110-Speed 2626.68 samples/sec   Loss 3.4906   LearningRate 0.0064   Epoch: 14   Global Step: 619820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:22:40,014-Speed 2623.17 samples/sec   Loss 3.3658   LearningRate 0.0064   Epoch: 14   Global Step: 619830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:22:43,912-Speed 2627.73 samples/sec   Loss 3.4034   LearningRate 0.0064   Epoch: 14   Global Step: 619840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:22:47,894-Speed 2572.69 samples/sec   Loss 3.4170   LearningRate 0.0064   Epoch: 14   Global Step: 619850   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:22:51,792-Speed 2627.85 samples/sec   Loss 3.4376   LearningRate 0.0064   Epoch: 14   Global Step: 619860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:22:55,684-Speed 2631.69 samples/sec   Loss 3.4841   LearningRate 0.0064   Epoch: 14   Global Step: 619870   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:22:59,583-Speed 2627.22 samples/sec   Loss 3.4034   LearningRate 0.0064   Epoch: 14   Global Step: 619880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:23:03,480-Speed 2628.16 samples/sec   Loss 3.5180   LearningRate 0.0064   Epoch: 14   Global Step: 619890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:23:07,374-Speed 2630.40 samples/sec   Loss 3.4927   LearningRate 0.0064   Epoch: 14   Global Step: 619900   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:23:11,267-Speed 2630.42 samples/sec   Loss 3.4243   LearningRate 0.0064   Epoch: 14   Global Step: 619910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:15,183-Speed 2616.50 samples/sec   Loss 3.4078   LearningRate 0.0064   Epoch: 14   Global Step: 619920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:19,084-Speed 2625.12 samples/sec   Loss 3.5084   LearningRate 0.0064   Epoch: 14   Global Step: 619930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:22,980-Speed 2629.37 samples/sec   Loss 3.4237   LearningRate 0.0064   Epoch: 14   Global Step: 619940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:26,879-Speed 2626.75 samples/sec   Loss 3.4485   LearningRate 0.0064   Epoch: 14   Global Step: 619950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:30,779-Speed 2627.32 samples/sec   Loss 3.4320   LearningRate 0.0064   Epoch: 14   Global Step: 619960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:34,685-Speed 2622.01 samples/sec   Loss 3.4586   LearningRate 0.0064   Epoch: 14   Global Step: 619970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:38,608-Speed 2610.50 samples/sec   Loss 3.4031   LearningRate 0.0064   Epoch: 14   Global Step: 619980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:42,614-Speed 2557.21 samples/sec   Loss 3.5287   LearningRate 0.0064   Epoch: 14   Global Step: 619990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:23:46,614-Speed 2560.65 samples/sec   Loss 3.3443   LearningRate 0.0064   Epoch: 14   Global Step: 620000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:24:29,281-[lfw][620000]XNorm: 22.645209
Training: 2022-04-15 17:24:29,282-[lfw][620000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 17:24:29,283-[lfw][620000]Accuracy-Highest: 0.99800
Training: 2022-04-15 17:25:19,500-[cfp_fp][620000]XNorm: 21.815986
Training: 2022-04-15 17:25:19,501-[cfp_fp][620000]Accuracy-Flip: 0.99200+-0.00333
Training: 2022-04-15 17:25:19,502-[cfp_fp][620000]Accuracy-Highest: 0.99200
Training: 2022-04-15 17:26:02,715-[agedb_30][620000]XNorm: 22.851277
Training: 2022-04-15 17:26:02,716-[agedb_30][620000]Accuracy-Flip: 0.98083+-0.00739
Training: 2022-04-15 17:26:02,717-[agedb_30][620000]Accuracy-Highest: 0.98150
Training: 2022-04-15 17:26:06,597-Speed 73.15 samples/sec   Loss 3.4136   LearningRate 0.0064   Epoch: 14   Global Step: 620010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-15 17:26:10,465-Speed 2647.98 samples/sec   Loss 3.5243   LearningRate 0.0064   Epoch: 14   Global Step: 620020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:26:14,330-Speed 2650.61 samples/sec   Loss 3.4093   LearningRate 0.0064   Epoch: 14   Global Step: 620030   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:18,206-Speed 2642.73 samples/sec   Loss 3.4335   LearningRate 0.0064   Epoch: 14   Global Step: 620040   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:22,088-Speed 2639.07 samples/sec   Loss 3.3992   LearningRate 0.0064   Epoch: 14   Global Step: 620050   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:26,004-Speed 2615.42 samples/sec   Loss 3.5197   LearningRate 0.0064   Epoch: 14   Global Step: 620060   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:29,896-Speed 2632.96 samples/sec   Loss 3.4249   LearningRate 0.0064   Epoch: 14   Global Step: 620070   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:33,786-Speed 2632.85 samples/sec   Loss 3.5207   LearningRate 0.0064   Epoch: 14   Global Step: 620080   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:37,672-Speed 2635.40 samples/sec   Loss 3.4077   LearningRate 0.0064   Epoch: 14   Global Step: 620090   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:41,573-Speed 2625.41 samples/sec   Loss 3.4362   LearningRate 0.0064   Epoch: 14   Global Step: 620100   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:45,480-Speed 2628.22 samples/sec   Loss 3.4558   LearningRate 0.0064   Epoch: 14   Global Step: 620110   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:49,376-Speed 2628.88 samples/sec   Loss 3.4201   LearningRate 0.0064   Epoch: 14   Global Step: 620120   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:26:53,276-Speed 2626.66 samples/sec   Loss 3.4047   LearningRate 0.0064   Epoch: 14   Global Step: 620130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:26:57,178-Speed 2624.92 samples/sec   Loss 3.3970   LearningRate 0.0064   Epoch: 14   Global Step: 620140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:27:01,058-Speed 2639.74 samples/sec   Loss 3.4140   LearningRate 0.0064   Epoch: 14   Global Step: 620150   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:04,955-Speed 2628.61 samples/sec   Loss 3.4441   LearningRate 0.0064   Epoch: 14   Global Step: 620160   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:08,850-Speed 2629.22 samples/sec   Loss 3.4822   LearningRate 0.0064   Epoch: 14   Global Step: 620170   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:12,748-Speed 2627.45 samples/sec   Loss 3.3713   LearningRate 0.0064   Epoch: 14   Global Step: 620180   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:16,646-Speed 2628.03 samples/sec   Loss 3.4021   LearningRate 0.0064   Epoch: 14   Global Step: 620190   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:20,549-Speed 2624.02 samples/sec   Loss 3.3778   LearningRate 0.0064   Epoch: 14   Global Step: 620200   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:24,449-Speed 2626.77 samples/sec   Loss 3.3538   LearningRate 0.0064   Epoch: 14   Global Step: 620210   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:28,342-Speed 2630.93 samples/sec   Loss 3.4574   LearningRate 0.0064   Epoch: 14   Global Step: 620220   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:32,238-Speed 2629.07 samples/sec   Loss 3.4599   LearningRate 0.0064   Epoch: 14   Global Step: 620230   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:36,133-Speed 2629.28 samples/sec   Loss 3.5034   LearningRate 0.0064   Epoch: 14   Global Step: 620240   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:27:40,032-Speed 2626.81 samples/sec   Loss 3.4210   LearningRate 0.0064   Epoch: 14   Global Step: 620250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:27:43,936-Speed 2623.62 samples/sec   Loss 3.4157   LearningRate 0.0064   Epoch: 14   Global Step: 620260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:27:47,829-Speed 2631.57 samples/sec   Loss 3.4322   LearningRate 0.0064   Epoch: 14   Global Step: 620270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:27:51,736-Speed 2620.81 samples/sec   Loss 3.4117   LearningRate 0.0064   Epoch: 14   Global Step: 620280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:27:55,628-Speed 2632.41 samples/sec   Loss 3.4642   LearningRate 0.0064   Epoch: 14   Global Step: 620290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:27:59,525-Speed 2628.45 samples/sec   Loss 3.3948   LearningRate 0.0064   Epoch: 14   Global Step: 620300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:03,425-Speed 2626.15 samples/sec   Loss 3.5230   LearningRate 0.0064   Epoch: 14   Global Step: 620310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:07,325-Speed 2626.52 samples/sec   Loss 3.3916   LearningRate 0.0064   Epoch: 14   Global Step: 620320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:11,222-Speed 2627.74 samples/sec   Loss 3.4354   LearningRate 0.0064   Epoch: 14   Global Step: 620330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:15,119-Speed 2628.31 samples/sec   Loss 3.4363   LearningRate 0.0064   Epoch: 14   Global Step: 620340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:18,994-Speed 2643.44 samples/sec   Loss 3.4208   LearningRate 0.0064   Epoch: 14   Global Step: 620350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:22,897-Speed 2624.53 samples/sec   Loss 3.4291   LearningRate 0.0064   Epoch: 14   Global Step: 620360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:26,792-Speed 2629.13 samples/sec   Loss 3.4726   LearningRate 0.0064   Epoch: 14   Global Step: 620370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:28:30,668-Speed 2642.70 samples/sec   Loss 3.4136   LearningRate 0.0064   Epoch: 14   Global Step: 620380   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:28:34,565-Speed 2628.51 samples/sec   Loss 3.4460   LearningRate 0.0064   Epoch: 14   Global Step: 620390   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:28:38,462-Speed 2628.82 samples/sec   Loss 3.4797   LearningRate 0.0064   Epoch: 14   Global Step: 620400   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:28:42,352-Speed 2632.75 samples/sec   Loss 3.4440   LearningRate 0.0064   Epoch: 14   Global Step: 620410   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:28:46,244-Speed 2631.42 samples/sec   Loss 3.4214   LearningRate 0.0064   Epoch: 14   Global Step: 620420   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:28:50,137-Speed 2631.19 samples/sec   Loss 3.4210   LearningRate 0.0064   Epoch: 14   Global Step: 620430   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:28:54,029-Speed 2631.40 samples/sec   Loss 3.4961   LearningRate 0.0064   Epoch: 14   Global Step: 620440   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:28:58,121-Speed 2503.62 samples/sec   Loss 3.4202   LearningRate 0.0064   Epoch: 14   Global Step: 620450   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:29:02,045-Speed 2610.51 samples/sec   Loss 3.5083   LearningRate 0.0064   Epoch: 14   Global Step: 620460   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:29:05,943-Speed 2627.86 samples/sec   Loss 3.5425   LearningRate 0.0064   Epoch: 14   Global Step: 620470   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:29:09,864-Speed 2611.65 samples/sec   Loss 3.4507   LearningRate 0.0064   Epoch: 14   Global Step: 620480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-15 17:29:13,771-Speed 2621.58 samples/sec   Loss 3.4023   LearningRate 0.0064   Epoch: 14   Global Step: 620490   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:29:17,666-Speed 2630.11 samples/sec   Loss 3.4034   LearningRate 0.0064   Epoch: 14   Global Step: 620500   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:29:21,561-Speed 2629.40 samples/sec   Loss 3.4657   LearningRate 0.0064   Epoch: 14   Global Step: 620510   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-15 17:29:25,441-Speed 2640.59 samples/sec   Loss 3.4147   LearningRate 0.0064   Epoch: 14   Global Step: 620520   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-04-15 17:29:29,346-Speed 2622.95 samples/sec   Loss 3.4554   LearningRate 0.0064   Epoch: 14   Global Step: 620530   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-04-15 17:29:33,238-Speed 2631.78 samples/sec   Loss 3.4303   LearningRate 0.0063   Epoch: 14   Global Step: 620540   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-04-15 17:29:37,137-Speed 2627.00 samples/sec   Loss 3.4527   LearningRate 0.0063   Epoch: 14   Global Step: 620550   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:29:41,031-Speed 2630.33 samples/sec   Loss 3.3192   LearningRate 0.0063   Epoch: 14   Global Step: 620560   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:29:44,936-Speed 2623.16 samples/sec   Loss 3.3748   LearningRate 0.0063   Epoch: 14   Global Step: 620570   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:29:48,827-Speed 2631.78 samples/sec   Loss 3.3178   LearningRate 0.0063   Epoch: 14   Global Step: 620580   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:29:52,732-Speed 2623.18 samples/sec   Loss 3.4061   LearningRate 0.0063   Epoch: 14   Global Step: 620590   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:29:56,625-Speed 2631.38 samples/sec   Loss 3.5021   LearningRate 0.0063   Epoch: 14   Global Step: 620600   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:30:00,521-Speed 2629.31 samples/sec   Loss 3.3920   LearningRate 0.0063   Epoch: 14   Global Step: 620610   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:30:04,420-Speed 2626.56 samples/sec   Loss 3.3813   LearningRate 0.0063   Epoch: 14   Global Step: 620620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:08,315-Speed 2630.29 samples/sec   Loss 3.3697   LearningRate 0.0063   Epoch: 14   Global Step: 620630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:12,211-Speed 2629.05 samples/sec   Loss 3.4886   LearningRate 0.0063   Epoch: 14   Global Step: 620640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:16,139-Speed 2607.18 samples/sec   Loss 3.2926   LearningRate 0.0063   Epoch: 14   Global Step: 620650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:20,036-Speed 2628.26 samples/sec   Loss 3.3967   LearningRate 0.0063   Epoch: 14   Global Step: 620660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:23,928-Speed 2632.57 samples/sec   Loss 3.4234   LearningRate 0.0063   Epoch: 14   Global Step: 620670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:27,822-Speed 2630.61 samples/sec   Loss 3.4741   LearningRate 0.0063   Epoch: 14   Global Step: 620680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:31,719-Speed 2628.07 samples/sec   Loss 3.4614   LearningRate 0.0063   Epoch: 14   Global Step: 620690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:35,610-Speed 2631.92 samples/sec   Loss 3.4028   LearningRate 0.0063   Epoch: 14   Global Step: 620700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:39,502-Speed 2631.55 samples/sec   Loss 3.4036   LearningRate 0.0063   Epoch: 14   Global Step: 620710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:43,398-Speed 2629.95 samples/sec   Loss 3.4123   LearningRate 0.0063   Epoch: 14   Global Step: 620720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:30:47,275-Speed 2641.51 samples/sec   Loss 3.4373   LearningRate 0.0063   Epoch: 14   Global Step: 620730   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:51,178-Speed 2624.74 samples/sec   Loss 3.4581   LearningRate 0.0063   Epoch: 14   Global Step: 620740   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:55,075-Speed 2628.11 samples/sec   Loss 3.3138   LearningRate 0.0063   Epoch: 14   Global Step: 620750   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:30:58,967-Speed 2631.97 samples/sec   Loss 3.3957   LearningRate 0.0063   Epoch: 14   Global Step: 620760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:02,866-Speed 2626.70 samples/sec   Loss 3.3480   LearningRate 0.0063   Epoch: 14   Global Step: 620770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:06,766-Speed 2626.35 samples/sec   Loss 3.4639   LearningRate 0.0063   Epoch: 14   Global Step: 620780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:10,663-Speed 2628.47 samples/sec   Loss 3.5499   LearningRate 0.0063   Epoch: 14   Global Step: 620790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:14,564-Speed 2625.82 samples/sec   Loss 3.4181   LearningRate 0.0063   Epoch: 14   Global Step: 620800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:18,457-Speed 2630.98 samples/sec   Loss 3.4905   LearningRate 0.0063   Epoch: 14   Global Step: 620810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:22,355-Speed 2627.72 samples/sec   Loss 3.3901   LearningRate 0.0063   Epoch: 14   Global Step: 620820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:26,253-Speed 2628.36 samples/sec   Loss 3.4096   LearningRate 0.0063   Epoch: 14   Global Step: 620830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:31:30,148-Speed 2629.07 samples/sec   Loss 3.4295   LearningRate 0.0063   Epoch: 14   Global Step: 620840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:31:34,042-Speed 2630.84 samples/sec   Loss 3.4088   LearningRate 0.0063   Epoch: 14   Global Step: 620850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:31:37,911-Speed 2646.66 samples/sec   Loss 3.4478   LearningRate 0.0063   Epoch: 14   Global Step: 620860   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:41,808-Speed 2628.92 samples/sec   Loss 3.4007   LearningRate 0.0063   Epoch: 14   Global Step: 620870   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:45,714-Speed 2622.02 samples/sec   Loss 3.3670   LearningRate 0.0063   Epoch: 14   Global Step: 620880   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:49,606-Speed 2631.65 samples/sec   Loss 3.4820   LearningRate 0.0063   Epoch: 14   Global Step: 620890   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:53,498-Speed 2631.89 samples/sec   Loss 3.5227   LearningRate 0.0063   Epoch: 14   Global Step: 620900   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:31:57,393-Speed 2629.88 samples/sec   Loss 3.4005   LearningRate 0.0063   Epoch: 14   Global Step: 620910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:01,291-Speed 2627.49 samples/sec   Loss 3.3785   LearningRate 0.0063   Epoch: 14   Global Step: 620920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:05,202-Speed 2618.98 samples/sec   Loss 3.4401   LearningRate 0.0063   Epoch: 14   Global Step: 620930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:09,097-Speed 2629.58 samples/sec   Loss 3.4364   LearningRate 0.0063   Epoch: 14   Global Step: 620940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:13,001-Speed 2624.16 samples/sec   Loss 3.4049   LearningRate 0.0063   Epoch: 14   Global Step: 620950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:16,906-Speed 2622.95 samples/sec   Loss 3.4595   LearningRate 0.0063   Epoch: 14   Global Step: 620960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:32:20,812-Speed 2622.78 samples/sec   Loss 3.4632   LearningRate 0.0063   Epoch: 14   Global Step: 620970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:32:24,703-Speed 2632.16 samples/sec   Loss 3.4604   LearningRate 0.0063   Epoch: 14   Global Step: 620980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:32:28,607-Speed 2623.56 samples/sec   Loss 3.4682   LearningRate 0.0063   Epoch: 14   Global Step: 620990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:32:32,502-Speed 2629.68 samples/sec   Loss 3.5207   LearningRate 0.0063   Epoch: 14   Global Step: 621000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:32:36,401-Speed 2626.82 samples/sec   Loss 3.5300   LearningRate 0.0063   Epoch: 14   Global Step: 621010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:32:40,275-Speed 2643.85 samples/sec   Loss 3.4230   LearningRate 0.0063   Epoch: 14   Global Step: 621020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:44,198-Speed 2611.53 samples/sec   Loss 3.4102   LearningRate 0.0063   Epoch: 14   Global Step: 621030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:48,093-Speed 2629.72 samples/sec   Loss 3.4223   LearningRate 0.0063   Epoch: 14   Global Step: 621040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:51,988-Speed 2629.31 samples/sec   Loss 3.3538   LearningRate 0.0063   Epoch: 14   Global Step: 621050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:55,881-Speed 2631.66 samples/sec   Loss 3.4714   LearningRate 0.0063   Epoch: 14   Global Step: 621060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:32:59,782-Speed 2625.02 samples/sec   Loss 3.3787   LearningRate 0.0063   Epoch: 14   Global Step: 621070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:03,679-Speed 2628.65 samples/sec   Loss 3.3855   LearningRate 0.0063   Epoch: 14   Global Step: 621080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:07,575-Speed 2629.22 samples/sec   Loss 3.4092   LearningRate 0.0063   Epoch: 14   Global Step: 621090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:11,473-Speed 2627.78 samples/sec   Loss 3.3690   LearningRate 0.0063   Epoch: 14   Global Step: 621100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:15,455-Speed 2572.46 samples/sec   Loss 3.3992   LearningRate 0.0063   Epoch: 14   Global Step: 621110   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:19,356-Speed 2625.50 samples/sec   Loss 3.4181   LearningRate 0.0063   Epoch: 14   Global Step: 621120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:33:23,251-Speed 2629.53 samples/sec   Loss 3.3694   LearningRate 0.0063   Epoch: 14   Global Step: 621130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:33:27,147-Speed 2630.29 samples/sec   Loss 3.3988   LearningRate 0.0063   Epoch: 14   Global Step: 621140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:33:31,021-Speed 2643.84 samples/sec   Loss 3.4054   LearningRate 0.0063   Epoch: 14   Global Step: 621150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:34,929-Speed 2620.88 samples/sec   Loss 3.3670   LearningRate 0.0063   Epoch: 14   Global Step: 621160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:38,849-Speed 2612.53 samples/sec   Loss 3.4603   LearningRate 0.0063   Epoch: 14   Global Step: 621170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:42,750-Speed 2626.28 samples/sec   Loss 3.3561   LearningRate 0.0063   Epoch: 14   Global Step: 621180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:46,641-Speed 2632.27 samples/sec   Loss 3.3850   LearningRate 0.0063   Epoch: 14   Global Step: 621190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:50,532-Speed 2632.13 samples/sec   Loss 3.3951   LearningRate 0.0063   Epoch: 14   Global Step: 621200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:54,427-Speed 2630.25 samples/sec   Loss 3.4091   LearningRate 0.0063   Epoch: 14   Global Step: 621210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:33:58,319-Speed 2632.27 samples/sec   Loss 3.4452   LearningRate 0.0063   Epoch: 14   Global Step: 621220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:02,244-Speed 2609.11 samples/sec   Loss 3.4203   LearningRate 0.0063   Epoch: 14   Global Step: 621230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:06,143-Speed 2627.41 samples/sec   Loss 3.4230   LearningRate 0.0063   Epoch: 14   Global Step: 621240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:10,017-Speed 2643.94 samples/sec   Loss 3.4496   LearningRate 0.0063   Epoch: 14   Global Step: 621250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:13,928-Speed 2619.30 samples/sec   Loss 3.4877   LearningRate 0.0063   Epoch: 14   Global Step: 621260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:17,837-Speed 2620.13 samples/sec   Loss 3.3966   LearningRate 0.0063   Epoch: 14   Global Step: 621270   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:21,726-Speed 2633.33 samples/sec   Loss 3.4764   LearningRate 0.0063   Epoch: 14   Global Step: 621280   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:25,620-Speed 2630.61 samples/sec   Loss 3.3581   LearningRate 0.0063   Epoch: 14   Global Step: 621290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:29,521-Speed 2629.53 samples/sec   Loss 3.4016   LearningRate 0.0063   Epoch: 14   Global Step: 621300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:33,420-Speed 2627.21 samples/sec   Loss 3.4381   LearningRate 0.0063   Epoch: 14   Global Step: 621310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:37,314-Speed 2631.03 samples/sec   Loss 3.4423   LearningRate 0.0063   Epoch: 14   Global Step: 621320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:41,209-Speed 2629.43 samples/sec   Loss 3.5820   LearningRate 0.0063   Epoch: 14   Global Step: 621330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:45,195-Speed 2569.91 samples/sec   Loss 3.4386   LearningRate 0.0063   Epoch: 14   Global Step: 621340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:34:49,105-Speed 2619.88 samples/sec   Loss 3.4293   LearningRate 0.0063   Epoch: 14   Global Step: 621350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:34:53,017-Speed 2617.99 samples/sec   Loss 3.3500   LearningRate 0.0063   Epoch: 14   Global Step: 621360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:34:57,116-Speed 2499.16 samples/sec   Loss 3.4320   LearningRate 0.0063   Epoch: 14   Global Step: 621370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:01,127-Speed 2553.74 samples/sec   Loss 3.4542   LearningRate 0.0063   Epoch: 14   Global Step: 621380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:05,028-Speed 2625.59 samples/sec   Loss 3.4243   LearningRate 0.0063   Epoch: 14   Global Step: 621390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:08,924-Speed 2628.87 samples/sec   Loss 3.4049   LearningRate 0.0063   Epoch: 14   Global Step: 621400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:12,833-Speed 2620.41 samples/sec   Loss 3.4583   LearningRate 0.0063   Epoch: 14   Global Step: 621410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:16,732-Speed 2627.04 samples/sec   Loss 3.3322   LearningRate 0.0063   Epoch: 14   Global Step: 621420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:20,623-Speed 2632.34 samples/sec   Loss 3.3979   LearningRate 0.0063   Epoch: 14   Global Step: 621430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:24,514-Speed 2632.50 samples/sec   Loss 3.3906   LearningRate 0.0063   Epoch: 14   Global Step: 621440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:28,393-Speed 2640.81 samples/sec   Loss 3.4418   LearningRate 0.0063   Epoch: 14   Global Step: 621450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:32,306-Speed 2617.25 samples/sec   Loss 3.4485   LearningRate 0.0063   Epoch: 14   Global Step: 621460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:36,209-Speed 2624.93 samples/sec   Loss 3.3769   LearningRate 0.0063   Epoch: 14   Global Step: 621470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:40,106-Speed 2627.98 samples/sec   Loss 3.4582   LearningRate 0.0063   Epoch: 14   Global Step: 621480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:44,122-Speed 2550.94 samples/sec   Loss 3.4127   LearningRate 0.0063   Epoch: 14   Global Step: 621490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:48,014-Speed 2631.46 samples/sec   Loss 3.3417   LearningRate 0.0063   Epoch: 14   Global Step: 621500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:51,935-Speed 2612.25 samples/sec   Loss 3.4974   LearningRate 0.0063   Epoch: 14   Global Step: 621510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:55,830-Speed 2629.41 samples/sec   Loss 3.4920   LearningRate 0.0063   Epoch: 14   Global Step: 621520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:35:59,727-Speed 2628.54 samples/sec   Loss 3.3647   LearningRate 0.0063   Epoch: 14   Global Step: 621530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:03,630-Speed 2624.75 samples/sec   Loss 3.4408   LearningRate 0.0063   Epoch: 14   Global Step: 621540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:07,523-Speed 2630.58 samples/sec   Loss 3.4103   LearningRate 0.0063   Epoch: 14   Global Step: 621550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:36:11,402-Speed 2640.82 samples/sec   Loss 3.4773   LearningRate 0.0063   Epoch: 14   Global Step: 621560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:15,312-Speed 2619.83 samples/sec   Loss 3.4215   LearningRate 0.0063   Epoch: 14   Global Step: 621570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:19,206-Speed 2629.92 samples/sec   Loss 3.4194   LearningRate 0.0063   Epoch: 14   Global Step: 621580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:23,098-Speed 2631.79 samples/sec   Loss 3.4181   LearningRate 0.0063   Epoch: 14   Global Step: 621590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:26,997-Speed 2627.82 samples/sec   Loss 3.3843   LearningRate 0.0063   Epoch: 14   Global Step: 621600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:30,889-Speed 2631.17 samples/sec   Loss 3.3676   LearningRate 0.0063   Epoch: 14   Global Step: 621610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:34,785-Speed 2629.71 samples/sec   Loss 3.3913   LearningRate 0.0063   Epoch: 14   Global Step: 621620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:38,683-Speed 2628.10 samples/sec   Loss 3.4998   LearningRate 0.0063   Epoch: 14   Global Step: 621630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:42,574-Speed 2632.03 samples/sec   Loss 3.4027   LearningRate 0.0063   Epoch: 14   Global Step: 621640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:46,476-Speed 2624.96 samples/sec   Loss 3.4148   LearningRate 0.0063   Epoch: 14   Global Step: 621650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:50,373-Speed 2628.14 samples/sec   Loss 3.3194   LearningRate 0.0063   Epoch: 14   Global Step: 621660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:36:54,249-Speed 2642.86 samples/sec   Loss 3.4236   LearningRate 0.0063   Epoch: 14   Global Step: 621670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:36:58,152-Speed 2624.12 samples/sec   Loss 3.4816   LearningRate 0.0063   Epoch: 14   Global Step: 621680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:02,054-Speed 2624.89 samples/sec   Loss 3.3707   LearningRate 0.0063   Epoch: 14   Global Step: 621690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:05,948-Speed 2629.97 samples/sec   Loss 3.4500   LearningRate 0.0063   Epoch: 14   Global Step: 621700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:09,842-Speed 2631.12 samples/sec   Loss 3.3721   LearningRate 0.0063   Epoch: 14   Global Step: 621710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:13,734-Speed 2631.55 samples/sec   Loss 3.4086   LearningRate 0.0063   Epoch: 14   Global Step: 621720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:17,638-Speed 2623.20 samples/sec   Loss 3.4682   LearningRate 0.0063   Epoch: 14   Global Step: 621730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:21,534-Speed 2633.00 samples/sec   Loss 3.4495   LearningRate 0.0063   Epoch: 14   Global Step: 621740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:25,426-Speed 2631.90 samples/sec   Loss 3.4714   LearningRate 0.0063   Epoch: 14   Global Step: 621750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:37:29,310-Speed 2637.22 samples/sec   Loss 3.3776   LearningRate 0.0063   Epoch: 14   Global Step: 621760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:37:33,205-Speed 2630.01 samples/sec   Loss 3.4032   LearningRate 0.0063   Epoch: 14   Global Step: 621770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:37:37,099-Speed 2630.07 samples/sec   Loss 3.4592   LearningRate 0.0063   Epoch: 14   Global Step: 621780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:37:40,997-Speed 2627.35 samples/sec   Loss 3.4081   LearningRate 0.0063   Epoch: 14   Global Step: 621790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:37:44,886-Speed 2633.74 samples/sec   Loss 3.4323   LearningRate 0.0063   Epoch: 14   Global Step: 621800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:37:48,777-Speed 2632.92 samples/sec   Loss 3.3623   LearningRate 0.0063   Epoch: 14   Global Step: 621810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:37:52,680-Speed 2624.04 samples/sec   Loss 3.4923   LearningRate 0.0063   Epoch: 14   Global Step: 621820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:37:56,573-Speed 2631.33 samples/sec   Loss 3.4414   LearningRate 0.0063   Epoch: 14   Global Step: 621830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:00,504-Speed 2605.04 samples/sec   Loss 3.4313   LearningRate 0.0063   Epoch: 14   Global Step: 621840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:04,407-Speed 2624.18 samples/sec   Loss 3.3139   LearningRate 0.0063   Epoch: 14   Global Step: 621850   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:08,299-Speed 2631.38 samples/sec   Loss 3.4211   LearningRate 0.0063   Epoch: 14   Global Step: 621860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:38:12,200-Speed 2626.30 samples/sec   Loss 3.4386   LearningRate 0.0063   Epoch: 14   Global Step: 621870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:38:16,096-Speed 2628.92 samples/sec   Loss 3.3722   LearningRate 0.0063   Epoch: 14   Global Step: 621880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:38:20,007-Speed 2619.51 samples/sec   Loss 3.3745   LearningRate 0.0063   Epoch: 14   Global Step: 621890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:38:23,900-Speed 2630.55 samples/sec   Loss 3.4567   LearningRate 0.0063   Epoch: 14   Global Step: 621900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:38:27,806-Speed 2622.35 samples/sec   Loss 3.5157   LearningRate 0.0063   Epoch: 14   Global Step: 621910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:38:31,710-Speed 2623.50 samples/sec   Loss 3.4187   LearningRate 0.0063   Epoch: 14   Global Step: 621920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:38:35,639-Speed 2606.75 samples/sec   Loss 3.4815   LearningRate 0.0063   Epoch: 14   Global Step: 621930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:39,528-Speed 2633.62 samples/sec   Loss 3.4124   LearningRate 0.0063   Epoch: 14   Global Step: 621940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:43,431-Speed 2625.18 samples/sec   Loss 3.3799   LearningRate 0.0063   Epoch: 14   Global Step: 621950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:47,348-Speed 2614.86 samples/sec   Loss 3.3798   LearningRate 0.0063   Epoch: 14   Global Step: 621960   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:51,243-Speed 2629.64 samples/sec   Loss 3.3951   LearningRate 0.0063   Epoch: 14   Global Step: 621970   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:55,188-Speed 2596.50 samples/sec   Loss 3.3884   LearningRate 0.0063   Epoch: 14   Global Step: 621980   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:38:59,093-Speed 2623.49 samples/sec   Loss 3.4010   LearningRate 0.0063   Epoch: 14   Global Step: 621990   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:39:02,995-Speed 2624.88 samples/sec   Loss 3.4064   LearningRate 0.0063   Epoch: 14   Global Step: 622000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:39:06,894-Speed 2626.45 samples/sec   Loss 3.4228   LearningRate 0.0063   Epoch: 14   Global Step: 622010   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:39:10,798-Speed 2623.68 samples/sec   Loss 3.4420   LearningRate 0.0063   Epoch: 14   Global Step: 622020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:39:14,699-Speed 2626.33 samples/sec   Loss 3.3246   LearningRate 0.0063   Epoch: 14   Global Step: 622030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:18,605-Speed 2623.25 samples/sec   Loss 3.4077   LearningRate 0.0063   Epoch: 14   Global Step: 622040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:22,503-Speed 2627.57 samples/sec   Loss 3.4442   LearningRate 0.0063   Epoch: 14   Global Step: 622050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:26,397-Speed 2630.63 samples/sec   Loss 3.4777   LearningRate 0.0063   Epoch: 14   Global Step: 622060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:30,288-Speed 2632.04 samples/sec   Loss 3.3832   LearningRate 0.0063   Epoch: 14   Global Step: 622070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:34,184-Speed 2629.03 samples/sec   Loss 3.4249   LearningRate 0.0063   Epoch: 14   Global Step: 622080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:38,078-Speed 2630.38 samples/sec   Loss 3.3773   LearningRate 0.0063   Epoch: 14   Global Step: 622090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:41,970-Speed 2631.64 samples/sec   Loss 3.4459   LearningRate 0.0063   Epoch: 14   Global Step: 622100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:45,871-Speed 2625.61 samples/sec   Loss 3.4656   LearningRate 0.0063   Epoch: 14   Global Step: 622110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:49,761-Speed 2633.38 samples/sec   Loss 3.4279   LearningRate 0.0063   Epoch: 14   Global Step: 622120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:39:53,659-Speed 2627.29 samples/sec   Loss 3.3465   LearningRate 0.0063   Epoch: 14   Global Step: 622130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:39:57,506-Speed 2662.88 samples/sec   Loss 3.3743   LearningRate 0.0063   Epoch: 14   Global Step: 622140   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:01,402-Speed 2629.00 samples/sec   Loss 3.3633   LearningRate 0.0063   Epoch: 14   Global Step: 622150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:05,305-Speed 2624.35 samples/sec   Loss 3.4442   LearningRate 0.0063   Epoch: 14   Global Step: 622160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:09,203-Speed 2627.83 samples/sec   Loss 3.3702   LearningRate 0.0063   Epoch: 14   Global Step: 622170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:13,100-Speed 2627.84 samples/sec   Loss 3.4437   LearningRate 0.0063   Epoch: 14   Global Step: 622180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:35,103-Speed 465.41 samples/sec   Loss 3.4411   LearningRate 0.0062   Epoch: 15   Global Step: 622190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:38,983-Speed 2640.42 samples/sec   Loss 3.4603   LearningRate 0.0062   Epoch: 15   Global Step: 622200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:42,880-Speed 2628.53 samples/sec   Loss 3.4112   LearningRate 0.0062   Epoch: 15   Global Step: 622210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:46,769-Speed 2633.68 samples/sec   Loss 3.4590   LearningRate 0.0062   Epoch: 15   Global Step: 622220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:50,701-Speed 2604.99 samples/sec   Loss 3.3997   LearningRate 0.0062   Epoch: 15   Global Step: 622230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:40:54,594-Speed 2630.67 samples/sec   Loss 3.4301   LearningRate 0.0062   Epoch: 15   Global Step: 622240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:40:58,497-Speed 2624.81 samples/sec   Loss 3.4336   LearningRate 0.0062   Epoch: 15   Global Step: 622250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:41:02,397-Speed 2625.86 samples/sec   Loss 3.3296   LearningRate 0.0062   Epoch: 15   Global Step: 622260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:41:06,296-Speed 2628.06 samples/sec   Loss 3.5517   LearningRate 0.0062   Epoch: 15   Global Step: 622270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:41:10,214-Speed 2613.71 samples/sec   Loss 3.5038   LearningRate 0.0062   Epoch: 15   Global Step: 622280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:41:14,098-Speed 2637.41 samples/sec   Loss 3.4753   LearningRate 0.0062   Epoch: 15   Global Step: 622290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:18,008-Speed 2619.59 samples/sec   Loss 3.3374   LearningRate 0.0062   Epoch: 15   Global Step: 622300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:21,912-Speed 2623.60 samples/sec   Loss 3.4587   LearningRate 0.0062   Epoch: 15   Global Step: 622310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:25,827-Speed 2616.04 samples/sec   Loss 3.3725   LearningRate 0.0062   Epoch: 15   Global Step: 622320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:29,734-Speed 2622.27 samples/sec   Loss 3.5078   LearningRate 0.0062   Epoch: 15   Global Step: 622330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:33,656-Speed 2610.89 samples/sec   Loss 3.3869   LearningRate 0.0062   Epoch: 15   Global Step: 622340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:37,576-Speed 2617.90 samples/sec   Loss 3.3338   LearningRate 0.0062   Epoch: 15   Global Step: 622350   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:41,484-Speed 2620.62 samples/sec   Loss 3.3919   LearningRate 0.0062   Epoch: 15   Global Step: 622360   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:45,402-Speed 2613.86 samples/sec   Loss 3.4390   LearningRate 0.0062   Epoch: 15   Global Step: 622370   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:49,314-Speed 2618.33 samples/sec   Loss 3.3784   LearningRate 0.0062   Epoch: 15   Global Step: 622380   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:41:53,225-Speed 2619.64 samples/sec   Loss 3.3592   LearningRate 0.0062   Epoch: 15   Global Step: 622390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:41:57,132-Speed 2621.45 samples/sec   Loss 3.3990   LearningRate 0.0062   Epoch: 15   Global Step: 622400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:42:01,042-Speed 2619.46 samples/sec   Loss 3.4070   LearningRate 0.0062   Epoch: 15   Global Step: 622410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:42:04,948-Speed 2622.08 samples/sec   Loss 3.3916   LearningRate 0.0062   Epoch: 15   Global Step: 622420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:42:08,835-Speed 2635.11 samples/sec   Loss 3.4576   LearningRate 0.0062   Epoch: 15   Global Step: 622430   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:12,742-Speed 2621.81 samples/sec   Loss 3.4159   LearningRate 0.0062   Epoch: 15   Global Step: 622440   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:16,654-Speed 2617.76 samples/sec   Loss 3.3518   LearningRate 0.0062   Epoch: 15   Global Step: 622450   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:20,560-Speed 2621.81 samples/sec   Loss 3.4554   LearningRate 0.0062   Epoch: 15   Global Step: 622460   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:24,464-Speed 2624.28 samples/sec   Loss 3.3573   LearningRate 0.0062   Epoch: 15   Global Step: 622470   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:28,370-Speed 2622.27 samples/sec   Loss 3.3899   LearningRate 0.0062   Epoch: 15   Global Step: 622480   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:32,275-Speed 2622.88 samples/sec   Loss 3.2905   LearningRate 0.0062   Epoch: 15   Global Step: 622490   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:36,181-Speed 2622.60 samples/sec   Loss 3.5280   LearningRate 0.0062   Epoch: 15   Global Step: 622500   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:40,083-Speed 2624.44 samples/sec   Loss 3.3000   LearningRate 0.0062   Epoch: 15   Global Step: 622510   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:43,990-Speed 2622.30 samples/sec   Loss 3.4100   LearningRate 0.0062   Epoch: 15   Global Step: 622520   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:42:47,895-Speed 2622.10 samples/sec   Loss 3.4463   LearningRate 0.0062   Epoch: 15   Global Step: 622530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:42:51,805-Speed 2619.38 samples/sec   Loss 3.4138   LearningRate 0.0062   Epoch: 15   Global Step: 622540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:42:55,718-Speed 2617.63 samples/sec   Loss 3.4010   LearningRate 0.0062   Epoch: 15   Global Step: 622550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:42:59,624-Speed 2622.49 samples/sec   Loss 3.3644   LearningRate 0.0062   Epoch: 15   Global Step: 622560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:03,527-Speed 2624.38 samples/sec   Loss 3.3639   LearningRate 0.0062   Epoch: 15   Global Step: 622570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:07,432-Speed 2623.51 samples/sec   Loss 3.3987   LearningRate 0.0062   Epoch: 15   Global Step: 622580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:11,334-Speed 2624.37 samples/sec   Loss 3.4230   LearningRate 0.0062   Epoch: 15   Global Step: 622590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:15,249-Speed 2617.01 samples/sec   Loss 3.4186   LearningRate 0.0062   Epoch: 15   Global Step: 622600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:19,169-Speed 2612.22 samples/sec   Loss 3.3692   LearningRate 0.0062   Epoch: 15   Global Step: 622610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:23,068-Speed 2626.84 samples/sec   Loss 3.3567   LearningRate 0.0062   Epoch: 15   Global Step: 622620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:26,956-Speed 2634.12 samples/sec   Loss 3.4336   LearningRate 0.0062   Epoch: 15   Global Step: 622630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:30,858-Speed 2624.95 samples/sec   Loss 3.3602   LearningRate 0.0062   Epoch: 15   Global Step: 622640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:34,765-Speed 2621.90 samples/sec   Loss 3.2659   LearningRate 0.0062   Epoch: 15   Global Step: 622650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:38,671-Speed 2621.90 samples/sec   Loss 3.3044   LearningRate 0.0062   Epoch: 15   Global Step: 622660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:42,608-Speed 2602.24 samples/sec   Loss 3.4097   LearningRate 0.0062   Epoch: 15   Global Step: 622670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:46,523-Speed 2616.28 samples/sec   Loss 3.3865   LearningRate 0.0062   Epoch: 15   Global Step: 622680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:50,432-Speed 2620.21 samples/sec   Loss 3.4457   LearningRate 0.0062   Epoch: 15   Global Step: 622690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:54,347-Speed 2616.43 samples/sec   Loss 3.3429   LearningRate 0.0062   Epoch: 15   Global Step: 622700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:43:58,248-Speed 2625.95 samples/sec   Loss 3.4424   LearningRate 0.0062   Epoch: 15   Global Step: 622710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:44:02,151-Speed 2624.24 samples/sec   Loss 3.3021   LearningRate 0.0062   Epoch: 15   Global Step: 622720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:44:06,052-Speed 2625.53 samples/sec   Loss 3.4423   LearningRate 0.0062   Epoch: 15   Global Step: 622730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:44:09,956-Speed 2623.70 samples/sec   Loss 3.3828   LearningRate 0.0062   Epoch: 15   Global Step: 622740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:44:13,865-Speed 2620.84 samples/sec   Loss 3.3138   LearningRate 0.0062   Epoch: 15   Global Step: 622750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:44:17,760-Speed 2629.84 samples/sec   Loss 3.4689   LearningRate 0.0062   Epoch: 15   Global Step: 622760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:44:21,643-Speed 2637.57 samples/sec   Loss 3.4422   LearningRate 0.0062   Epoch: 15   Global Step: 622770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:25,550-Speed 2620.94 samples/sec   Loss 3.3793   LearningRate 0.0062   Epoch: 15   Global Step: 622780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:29,457-Speed 2622.40 samples/sec   Loss 3.3341   LearningRate 0.0062   Epoch: 15   Global Step: 622790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:33,368-Speed 2619.09 samples/sec   Loss 3.3940   LearningRate 0.0062   Epoch: 15   Global Step: 622800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:37,269-Speed 2625.29 samples/sec   Loss 3.3623   LearningRate 0.0062   Epoch: 15   Global Step: 622810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:41,167-Speed 2627.64 samples/sec   Loss 3.4531   LearningRate 0.0062   Epoch: 15   Global Step: 622820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:45,070-Speed 2625.25 samples/sec   Loss 3.3405   LearningRate 0.0062   Epoch: 15   Global Step: 622830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:48,985-Speed 2615.66 samples/sec   Loss 3.3017   LearningRate 0.0062   Epoch: 15   Global Step: 622840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:52,894-Speed 2620.61 samples/sec   Loss 3.4267   LearningRate 0.0062   Epoch: 15   Global Step: 622850   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:44:56,820-Speed 2608.63 samples/sec   Loss 3.5089   LearningRate 0.0062   Epoch: 15   Global Step: 622860   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:00,723-Speed 2624.13 samples/sec   Loss 3.3060   LearningRate 0.0062   Epoch: 15   Global Step: 622870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:45:04,627-Speed 2624.02 samples/sec   Loss 3.3685   LearningRate 0.0062   Epoch: 15   Global Step: 622880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:45:08,531-Speed 2623.80 samples/sec   Loss 3.3244   LearningRate 0.0062   Epoch: 15   Global Step: 622890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:45:12,436-Speed 2622.99 samples/sec   Loss 3.3902   LearningRate 0.0062   Epoch: 15   Global Step: 622900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:45:16,314-Speed 2640.93 samples/sec   Loss 3.3499   LearningRate 0.0062   Epoch: 15   Global Step: 622910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:20,225-Speed 2618.84 samples/sec   Loss 3.3761   LearningRate 0.0062   Epoch: 15   Global Step: 622920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:24,130-Speed 2622.84 samples/sec   Loss 3.4135   LearningRate 0.0062   Epoch: 15   Global Step: 622930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:28,048-Speed 2614.92 samples/sec   Loss 3.2927   LearningRate 0.0062   Epoch: 15   Global Step: 622940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:31,962-Speed 2616.42 samples/sec   Loss 3.3381   LearningRate 0.0062   Epoch: 15   Global Step: 622950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:35,865-Speed 2624.31 samples/sec   Loss 3.4664   LearningRate 0.0062   Epoch: 15   Global Step: 622960   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:39,773-Speed 2620.70 samples/sec   Loss 3.4084   LearningRate 0.0062   Epoch: 15   Global Step: 622970   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:43,678-Speed 2623.77 samples/sec   Loss 3.3867   LearningRate 0.0062   Epoch: 15   Global Step: 622980   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:47,604-Speed 2608.42 samples/sec   Loss 3.3488   LearningRate 0.0062   Epoch: 15   Global Step: 622990   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:51,510-Speed 2622.48 samples/sec   Loss 3.3913   LearningRate 0.0062   Epoch: 15   Global Step: 623000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:45:55,414-Speed 2623.83 samples/sec   Loss 3.3246   LearningRate 0.0062   Epoch: 15   Global Step: 623010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:45:59,334-Speed 2613.32 samples/sec   Loss 3.4015   LearningRate 0.0062   Epoch: 15   Global Step: 623020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:46:03,218-Speed 2636.74 samples/sec   Loss 3.4134   LearningRate 0.0062   Epoch: 15   Global Step: 623030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:46:07,109-Speed 2632.36 samples/sec   Loss 3.3910   LearningRate 0.0062   Epoch: 15   Global Step: 623040   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:11,007-Speed 2627.59 samples/sec   Loss 3.3564   LearningRate 0.0062   Epoch: 15   Global Step: 623050   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:14,907-Speed 2626.60 samples/sec   Loss 3.4025   LearningRate 0.0062   Epoch: 15   Global Step: 623060   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:18,828-Speed 2612.30 samples/sec   Loss 3.3891   LearningRate 0.0062   Epoch: 15   Global Step: 623070   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:22,756-Speed 2608.28 samples/sec   Loss 3.2911   LearningRate 0.0062   Epoch: 15   Global Step: 623080   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:26,659-Speed 2624.16 samples/sec   Loss 3.4171   LearningRate 0.0062   Epoch: 15   Global Step: 623090   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:30,560-Speed 2626.01 samples/sec   Loss 3.3444   LearningRate 0.0062   Epoch: 15   Global Step: 623100   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:34,485-Speed 2609.60 samples/sec   Loss 3.3993   LearningRate 0.0062   Epoch: 15   Global Step: 623110   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:38,391-Speed 2621.73 samples/sec   Loss 3.3475   LearningRate 0.0062   Epoch: 15   Global Step: 623120   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:42,322-Speed 2605.33 samples/sec   Loss 3.4112   LearningRate 0.0062   Epoch: 15   Global Step: 623130   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 17:46:46,227-Speed 2623.10 samples/sec   Loss 3.3695   LearningRate 0.0062   Epoch: 15   Global Step: 623140   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:46:50,135-Speed 2621.49 samples/sec   Loss 3.3862   LearningRate 0.0062   Epoch: 15   Global Step: 623150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:46:54,040-Speed 2622.91 samples/sec   Loss 3.3651   LearningRate 0.0062   Epoch: 15   Global Step: 623160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:46:57,948-Speed 2620.55 samples/sec   Loss 3.4410   LearningRate 0.0062   Epoch: 15   Global Step: 623170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:47:01,893-Speed 2596.58 samples/sec   Loss 3.2743   LearningRate 0.0062   Epoch: 15   Global Step: 623180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:47:05,800-Speed 2620.77 samples/sec   Loss 3.3730   LearningRate 0.0062   Epoch: 15   Global Step: 623190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:47:09,702-Speed 2625.39 samples/sec   Loss 3.3575   LearningRate 0.0062   Epoch: 15   Global Step: 623200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:47:13,604-Speed 2625.38 samples/sec   Loss 3.4675   LearningRate 0.0062   Epoch: 15   Global Step: 623210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:47:17,507-Speed 2623.95 samples/sec   Loss 3.3828   LearningRate 0.0062   Epoch: 15   Global Step: 623220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:47:21,437-Speed 2606.22 samples/sec   Loss 3.3976   LearningRate 0.0062   Epoch: 15   Global Step: 623230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:47:25,347-Speed 2620.05 samples/sec   Loss 3.3692   LearningRate 0.0062   Epoch: 15   Global Step: 623240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:29,281-Speed 2604.15 samples/sec   Loss 3.3967   LearningRate 0.0062   Epoch: 15   Global Step: 623250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:33,203-Speed 2611.39 samples/sec   Loss 3.2540   LearningRate 0.0062   Epoch: 15   Global Step: 623260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:37,119-Speed 2615.15 samples/sec   Loss 3.3470   LearningRate 0.0062   Epoch: 15   Global Step: 623270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:41,024-Speed 2623.22 samples/sec   Loss 3.2775   LearningRate 0.0062   Epoch: 15   Global Step: 623280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:44,941-Speed 2615.34 samples/sec   Loss 3.3222   LearningRate 0.0062   Epoch: 15   Global Step: 623290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:48,863-Speed 2611.26 samples/sec   Loss 3.3264   LearningRate 0.0062   Epoch: 15   Global Step: 623300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:52,783-Speed 2612.93 samples/sec   Loss 3.3713   LearningRate 0.0062   Epoch: 15   Global Step: 623310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:47:56,675-Speed 2631.90 samples/sec   Loss 3.3271   LearningRate 0.0062   Epoch: 15   Global Step: 623320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:00,586-Speed 2619.40 samples/sec   Loss 3.4430   LearningRate 0.0062   Epoch: 15   Global Step: 623330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:04,502-Speed 2614.99 samples/sec   Loss 3.4216   LearningRate 0.0062   Epoch: 15   Global Step: 623340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:08,455-Speed 2591.21 samples/sec   Loss 3.3631   LearningRate 0.0062   Epoch: 15   Global Step: 623350   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:12,375-Speed 2612.63 samples/sec   Loss 3.3650   LearningRate 0.0062   Epoch: 15   Global Step: 623360   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:16,439-Speed 2520.07 samples/sec   Loss 3.4188   LearningRate 0.0062   Epoch: 15   Global Step: 623370   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:20,500-Speed 2522.77 samples/sec   Loss 3.4467   LearningRate 0.0062   Epoch: 15   Global Step: 623380   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:24,418-Speed 2614.18 samples/sec   Loss 3.4164   LearningRate 0.0062   Epoch: 15   Global Step: 623390   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:28,324-Speed 2622.32 samples/sec   Loss 3.3917   LearningRate 0.0062   Epoch: 15   Global Step: 623400   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:32,233-Speed 2620.75 samples/sec   Loss 3.3787   LearningRate 0.0062   Epoch: 15   Global Step: 623410   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:48:36,151-Speed 2614.83 samples/sec   Loss 3.3961   LearningRate 0.0062   Epoch: 15   Global Step: 623420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:48:40,054-Speed 2624.54 samples/sec   Loss 3.4633   LearningRate 0.0062   Epoch: 15   Global Step: 623430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:48:43,957-Speed 2623.61 samples/sec   Loss 3.3475   LearningRate 0.0062   Epoch: 15   Global Step: 623440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:48:47,862-Speed 2623.00 samples/sec   Loss 3.3410   LearningRate 0.0062   Epoch: 15   Global Step: 623450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:48:51,767-Speed 2622.55 samples/sec   Loss 3.3513   LearningRate 0.0062   Epoch: 15   Global Step: 623460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:48:55,676-Speed 2620.57 samples/sec   Loss 3.3298   LearningRate 0.0062   Epoch: 15   Global Step: 623470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:48:59,603-Speed 2609.70 samples/sec   Loss 3.3042   LearningRate 0.0062   Epoch: 15   Global Step: 623480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:49:03,516-Speed 2617.64 samples/sec   Loss 3.3629   LearningRate 0.0062   Epoch: 15   Global Step: 623490   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:07,422-Speed 2622.28 samples/sec   Loss 3.3550   LearningRate 0.0062   Epoch: 15   Global Step: 623500   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:11,330-Speed 2620.91 samples/sec   Loss 3.3962   LearningRate 0.0062   Epoch: 15   Global Step: 623510   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:15,236-Speed 2622.30 samples/sec   Loss 3.4257   LearningRate 0.0062   Epoch: 15   Global Step: 623520   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:19,147-Speed 2618.43 samples/sec   Loss 3.3921   LearningRate 0.0062   Epoch: 15   Global Step: 623530   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:23,064-Speed 2615.51 samples/sec   Loss 3.3828   LearningRate 0.0062   Epoch: 15   Global Step: 623540   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:26,978-Speed 2616.75 samples/sec   Loss 3.4234   LearningRate 0.0062   Epoch: 15   Global Step: 623550   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:30,884-Speed 2622.50 samples/sec   Loss 3.3767   LearningRate 0.0062   Epoch: 15   Global Step: 623560   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:34,782-Speed 2627.81 samples/sec   Loss 3.3457   LearningRate 0.0062   Epoch: 15   Global Step: 623570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:38,683-Speed 2625.88 samples/sec   Loss 3.4815   LearningRate 0.0062   Epoch: 15   Global Step: 623580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:49:42,587-Speed 2623.53 samples/sec   Loss 3.4098   LearningRate 0.0062   Epoch: 15   Global Step: 623590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:49:46,489-Speed 2624.29 samples/sec   Loss 3.3552   LearningRate 0.0062   Epoch: 15   Global Step: 623600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:49:50,397-Speed 2621.07 samples/sec   Loss 3.3502   LearningRate 0.0062   Epoch: 15   Global Step: 623610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:49:54,320-Speed 2610.95 samples/sec   Loss 3.3826   LearningRate 0.0062   Epoch: 15   Global Step: 623620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:49:58,224-Speed 2624.19 samples/sec   Loss 3.4010   LearningRate 0.0062   Epoch: 15   Global Step: 623630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:50:02,139-Speed 2616.17 samples/sec   Loss 3.3585   LearningRate 0.0062   Epoch: 15   Global Step: 623640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:06,041-Speed 2625.22 samples/sec   Loss 3.4489   LearningRate 0.0062   Epoch: 15   Global Step: 623650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:09,944-Speed 2625.33 samples/sec   Loss 3.3672   LearningRate 0.0062   Epoch: 15   Global Step: 623660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:13,852-Speed 2620.60 samples/sec   Loss 3.4061   LearningRate 0.0062   Epoch: 15   Global Step: 623670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:17,757-Speed 2622.53 samples/sec   Loss 3.4685   LearningRate 0.0062   Epoch: 15   Global Step: 623680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:21,780-Speed 2546.00 samples/sec   Loss 3.3628   LearningRate 0.0062   Epoch: 15   Global Step: 623690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:25,777-Speed 2598.82 samples/sec   Loss 3.3560   LearningRate 0.0062   Epoch: 15   Global Step: 623700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:29,714-Speed 2617.27 samples/sec   Loss 3.3931   LearningRate 0.0062   Epoch: 15   Global Step: 623710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:33,721-Speed 2556.35 samples/sec   Loss 3.4007   LearningRate 0.0062   Epoch: 15   Global Step: 623720   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:37,690-Speed 2580.24 samples/sec   Loss 3.3518   LearningRate 0.0062   Epoch: 15   Global Step: 623730   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:50:41,602-Speed 2618.21 samples/sec   Loss 3.3744   LearningRate 0.0062   Epoch: 15   Global Step: 623740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:50:46,034-Speed 2624.61 samples/sec   Loss 3.3251   LearningRate 0.0062   Epoch: 15   Global Step: 623750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:50:49,945-Speed 2618.65 samples/sec   Loss 3.4052   LearningRate 0.0062   Epoch: 15   Global Step: 623760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:50:53,857-Speed 2622.61 samples/sec   Loss 3.3696   LearningRate 0.0062   Epoch: 15   Global Step: 623770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:50:57,758-Speed 2625.49 samples/sec   Loss 3.4518   LearningRate 0.0062   Epoch: 15   Global Step: 623780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:01,755-Speed 2563.02 samples/sec   Loss 3.3714   LearningRate 0.0062   Epoch: 15   Global Step: 623790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:05,658-Speed 2624.27 samples/sec   Loss 3.4368   LearningRate 0.0062   Epoch: 15   Global Step: 623800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:09,809-Speed 2623.19 samples/sec   Loss 3.3798   LearningRate 0.0062   Epoch: 15   Global Step: 623810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:13,744-Speed 2603.34 samples/sec   Loss 3.3616   LearningRate 0.0062   Epoch: 15   Global Step: 623820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:17,665-Speed 2612.37 samples/sec   Loss 3.3825   LearningRate 0.0062   Epoch: 15   Global Step: 623830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:21,548-Speed 2637.43 samples/sec   Loss 3.3601   LearningRate 0.0062   Epoch: 15   Global Step: 623840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:25,473-Speed 2609.39 samples/sec   Loss 3.3972   LearningRate 0.0062   Epoch: 15   Global Step: 623850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:29,384-Speed 2619.57 samples/sec   Loss 3.4039   LearningRate 0.0061   Epoch: 15   Global Step: 623860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:33,291-Speed 2622.15 samples/sec   Loss 3.3632   LearningRate 0.0061   Epoch: 15   Global Step: 623870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:37,209-Speed 2613.52 samples/sec   Loss 3.3321   LearningRate 0.0061   Epoch: 15   Global Step: 623880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:41,124-Speed 2616.44 samples/sec   Loss 3.4274   LearningRate 0.0061   Epoch: 15   Global Step: 623890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:45,022-Speed 2628.10 samples/sec   Loss 3.4170   LearningRate 0.0061   Epoch: 15   Global Step: 623900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:48,924-Speed 2624.49 samples/sec   Loss 3.3558   LearningRate 0.0061   Epoch: 15   Global Step: 623910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:52,832-Speed 2620.80 samples/sec   Loss 3.4175   LearningRate 0.0061   Epoch: 15   Global Step: 623920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:51:56,734-Speed 2625.49 samples/sec   Loss 3.4538   LearningRate 0.0061   Epoch: 15   Global Step: 623930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:00,614-Speed 2640.35 samples/sec   Loss 3.3365   LearningRate 0.0061   Epoch: 15   Global Step: 623940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:04,515-Speed 2626.19 samples/sec   Loss 3.4041   LearningRate 0.0061   Epoch: 15   Global Step: 623950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:08,454-Speed 2600.02 samples/sec   Loss 3.5255   LearningRate 0.0061   Epoch: 15   Global Step: 623960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:12,357-Speed 2624.38 samples/sec   Loss 3.2901   LearningRate 0.0061   Epoch: 15   Global Step: 623970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:16,256-Speed 2626.93 samples/sec   Loss 3.3145   LearningRate 0.0061   Epoch: 15   Global Step: 623980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:20,155-Speed 2627.58 samples/sec   Loss 3.3692   LearningRate 0.0061   Epoch: 15   Global Step: 623990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:24,062-Speed 2621.65 samples/sec   Loss 3.4351   LearningRate 0.0061   Epoch: 15   Global Step: 624000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:27,974-Speed 2618.56 samples/sec   Loss 3.4045   LearningRate 0.0061   Epoch: 15   Global Step: 624010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:31,910-Speed 2602.20 samples/sec   Loss 3.3449   LearningRate 0.0061   Epoch: 15   Global Step: 624020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:35,822-Speed 2618.41 samples/sec   Loss 3.4279   LearningRate 0.0061   Epoch: 15   Global Step: 624030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:52:39,704-Speed 2639.00 samples/sec   Loss 3.4026   LearningRate 0.0061   Epoch: 15   Global Step: 624040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:52:43,623-Speed 2613.29 samples/sec   Loss 3.3367   LearningRate 0.0061   Epoch: 15   Global Step: 624050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:52:47,525-Speed 2624.31 samples/sec   Loss 3.3779   LearningRate 0.0061   Epoch: 15   Global Step: 624060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:52:51,432-Speed 2621.96 samples/sec   Loss 3.4009   LearningRate 0.0061   Epoch: 15   Global Step: 624070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:52:55,340-Speed 2622.00 samples/sec   Loss 3.4220   LearningRate 0.0061   Epoch: 15   Global Step: 624080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:52:59,276-Speed 2602.63 samples/sec   Loss 3.3902   LearningRate 0.0061   Epoch: 15   Global Step: 624090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:53:03,188-Speed 2618.17 samples/sec   Loss 3.3955   LearningRate 0.0061   Epoch: 15   Global Step: 624100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:53:07,095-Speed 2621.44 samples/sec   Loss 3.3343   LearningRate 0.0061   Epoch: 15   Global Step: 624110   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:53:11,003-Speed 2621.38 samples/sec   Loss 3.4168   LearningRate 0.0061   Epoch: 15   Global Step: 624120   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:53:14,903-Speed 2625.57 samples/sec   Loss 3.4216   LearningRate 0.0061   Epoch: 15   Global Step: 624130   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:53:18,825-Speed 2611.86 samples/sec   Loss 3.3779   LearningRate 0.0061   Epoch: 15   Global Step: 624140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:22,729-Speed 2623.98 samples/sec   Loss 3.3550   LearningRate 0.0061   Epoch: 15   Global Step: 624150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:26,634-Speed 2623.07 samples/sec   Loss 3.3212   LearningRate 0.0061   Epoch: 15   Global Step: 624160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:30,556-Speed 2611.88 samples/sec   Loss 3.3168   LearningRate 0.0061   Epoch: 15   Global Step: 624170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:34,471-Speed 2616.06 samples/sec   Loss 3.3509   LearningRate 0.0061   Epoch: 15   Global Step: 624180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:38,376-Speed 2622.98 samples/sec   Loss 3.3502   LearningRate 0.0061   Epoch: 15   Global Step: 624190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:42,284-Speed 2620.49 samples/sec   Loss 3.3061   LearningRate 0.0061   Epoch: 15   Global Step: 624200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:46,191-Speed 2621.26 samples/sec   Loss 3.3385   LearningRate 0.0061   Epoch: 15   Global Step: 624210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:50,095-Speed 2623.90 samples/sec   Loss 3.3183   LearningRate 0.0061   Epoch: 15   Global Step: 624220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:54,002-Speed 2621.45 samples/sec   Loss 3.3741   LearningRate 0.0061   Epoch: 15   Global Step: 624230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:53:57,908-Speed 2622.98 samples/sec   Loss 3.4090   LearningRate 0.0061   Epoch: 15   Global Step: 624240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:54:01,792-Speed 2636.87 samples/sec   Loss 3.3898   LearningRate 0.0061   Epoch: 15   Global Step: 624250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:05,706-Speed 2617.02 samples/sec   Loss 3.3999   LearningRate 0.0061   Epoch: 15   Global Step: 624260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:09,608-Speed 2625.05 samples/sec   Loss 3.4034   LearningRate 0.0061   Epoch: 15   Global Step: 624270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:13,515-Speed 2621.33 samples/sec   Loss 3.3626   LearningRate 0.0061   Epoch: 15   Global Step: 624280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:17,425-Speed 2619.92 samples/sec   Loss 3.3704   LearningRate 0.0061   Epoch: 15   Global Step: 624290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:21,366-Speed 2599.29 samples/sec   Loss 3.3351   LearningRate 0.0061   Epoch: 15   Global Step: 624300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:25,274-Speed 2621.40 samples/sec   Loss 3.3448   LearningRate 0.0061   Epoch: 15   Global Step: 624310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:29,180-Speed 2622.37 samples/sec   Loss 3.4252   LearningRate 0.0061   Epoch: 15   Global Step: 624320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:33,108-Speed 2607.07 samples/sec   Loss 3.3341   LearningRate 0.0061   Epoch: 15   Global Step: 624330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:37,082-Speed 2577.80 samples/sec   Loss 3.3770   LearningRate 0.0061   Epoch: 15   Global Step: 624340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:40,990-Speed 2620.70 samples/sec   Loss 3.3112   LearningRate 0.0061   Epoch: 15   Global Step: 624350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:54:44,917-Speed 2608.67 samples/sec   Loss 3.3249   LearningRate 0.0061   Epoch: 15   Global Step: 624360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:48,837-Speed 2612.28 samples/sec   Loss 3.3215   LearningRate 0.0061   Epoch: 15   Global Step: 624370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:54:52,734-Speed 2628.03 samples/sec   Loss 3.3800   LearningRate 0.0061   Epoch: 15   Global Step: 624380   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:54:56,645-Speed 2619.04 samples/sec   Loss 3.4436   LearningRate 0.0061   Epoch: 15   Global Step: 624390   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:00,551-Speed 2622.70 samples/sec   Loss 3.2826   LearningRate 0.0061   Epoch: 15   Global Step: 624400   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:04,454-Speed 2624.07 samples/sec   Loss 3.3802   LearningRate 0.0061   Epoch: 15   Global Step: 624410   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:08,357-Speed 2624.44 samples/sec   Loss 3.2728   LearningRate 0.0061   Epoch: 15   Global Step: 624420   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:12,299-Speed 2597.96 samples/sec   Loss 3.3465   LearningRate 0.0061   Epoch: 15   Global Step: 624430   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:16,205-Speed 2622.46 samples/sec   Loss 3.3311   LearningRate 0.0061   Epoch: 15   Global Step: 624440   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:20,131-Speed 2608.96 samples/sec   Loss 3.4471   LearningRate 0.0061   Epoch: 15   Global Step: 624450   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:24,038-Speed 2621.14 samples/sec   Loss 3.3977   LearningRate 0.0061   Epoch: 15   Global Step: 624460   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:27,946-Speed 2621.25 samples/sec   Loss 3.3159   LearningRate 0.0061   Epoch: 15   Global Step: 624470   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:55:31,858-Speed 2618.12 samples/sec   Loss 3.4109   LearningRate 0.0061   Epoch: 15   Global Step: 624480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:55:35,761-Speed 2624.45 samples/sec   Loss 3.4013   LearningRate 0.0061   Epoch: 15   Global Step: 624490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:55:39,680-Speed 2613.69 samples/sec   Loss 3.4275   LearningRate 0.0061   Epoch: 15   Global Step: 624500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:55:43,721-Speed 2534.78 samples/sec   Loss 3.4276   LearningRate 0.0061   Epoch: 15   Global Step: 624510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:55:47,628-Speed 2621.65 samples/sec   Loss 3.3709   LearningRate 0.0061   Epoch: 15   Global Step: 624520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:55:51,557-Speed 2607.02 samples/sec   Loss 3.3567   LearningRate 0.0061   Epoch: 15   Global Step: 624530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:55:55,480-Speed 2611.17 samples/sec   Loss 3.3441   LearningRate 0.0061   Epoch: 15   Global Step: 624540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:55:59,398-Speed 2614.68 samples/sec   Loss 3.3342   LearningRate 0.0061   Epoch: 15   Global Step: 624550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:03,311-Speed 2617.81 samples/sec   Loss 3.3074   LearningRate 0.0061   Epoch: 15   Global Step: 624560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:07,215-Speed 2623.53 samples/sec   Loss 3.4075   LearningRate 0.0061   Epoch: 15   Global Step: 624570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:11,094-Speed 2640.40 samples/sec   Loss 3.3779   LearningRate 0.0061   Epoch: 15   Global Step: 624580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:15,048-Speed 2591.26 samples/sec   Loss 3.3317   LearningRate 0.0061   Epoch: 15   Global Step: 624590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:18,956-Speed 2620.49 samples/sec   Loss 3.4121   LearningRate 0.0061   Epoch: 15   Global Step: 624600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:22,857-Speed 2625.58 samples/sec   Loss 3.3372   LearningRate 0.0061   Epoch: 15   Global Step: 624610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:26,764-Speed 2621.76 samples/sec   Loss 3.4101   LearningRate 0.0061   Epoch: 15   Global Step: 624620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:30,681-Speed 2614.94 samples/sec   Loss 3.3248   LearningRate 0.0061   Epoch: 15   Global Step: 624630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:34,583-Speed 2625.04 samples/sec   Loss 3.3834   LearningRate 0.0061   Epoch: 15   Global Step: 624640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:38,492-Speed 2620.15 samples/sec   Loss 3.3769   LearningRate 0.0061   Epoch: 15   Global Step: 624650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:42,411-Speed 2613.45 samples/sec   Loss 3.3482   LearningRate 0.0061   Epoch: 15   Global Step: 624660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:46,310-Speed 2627.28 samples/sec   Loss 3.3484   LearningRate 0.0061   Epoch: 15   Global Step: 624670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:50,212-Speed 2624.80 samples/sec   Loss 3.2989   LearningRate 0.0061   Epoch: 15   Global Step: 624680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:54,120-Speed 2621.41 samples/sec   Loss 3.3604   LearningRate 0.0061   Epoch: 15   Global Step: 624690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:56:58,021-Speed 2625.19 samples/sec   Loss 3.3424   LearningRate 0.0061   Epoch: 15   Global Step: 624700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:01,923-Speed 2625.25 samples/sec   Loss 3.4153   LearningRate 0.0061   Epoch: 15   Global Step: 624710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:05,828-Speed 2622.36 samples/sec   Loss 3.3042   LearningRate 0.0061   Epoch: 15   Global Step: 624720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:09,739-Speed 2619.78 samples/sec   Loss 3.3813   LearningRate 0.0061   Epoch: 15   Global Step: 624730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:13,643-Speed 2623.44 samples/sec   Loss 3.4064   LearningRate 0.0061   Epoch: 15   Global Step: 624740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:17,555-Speed 2618.10 samples/sec   Loss 3.4077   LearningRate 0.0061   Epoch: 15   Global Step: 624750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:21,455-Speed 2626.62 samples/sec   Loss 3.3815   LearningRate 0.0061   Epoch: 15   Global Step: 624760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:25,360-Speed 2623.21 samples/sec   Loss 3.2886   LearningRate 0.0061   Epoch: 15   Global Step: 624770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:29,259-Speed 2627.14 samples/sec   Loss 3.2840   LearningRate 0.0061   Epoch: 15   Global Step: 624780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:57:33,141-Speed 2638.32 samples/sec   Loss 3.3200   LearningRate 0.0061   Epoch: 15   Global Step: 624790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:37,113-Speed 2578.72 samples/sec   Loss 3.4492   LearningRate 0.0061   Epoch: 15   Global Step: 624800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:41,012-Speed 2626.92 samples/sec   Loss 3.3208   LearningRate 0.0061   Epoch: 15   Global Step: 624810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:44,915-Speed 2624.80 samples/sec   Loss 3.3070   LearningRate 0.0061   Epoch: 15   Global Step: 624820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:48,815-Speed 2625.92 samples/sec   Loss 3.3827   LearningRate 0.0061   Epoch: 15   Global Step: 624830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:52,753-Speed 2600.98 samples/sec   Loss 3.3883   LearningRate 0.0061   Epoch: 15   Global Step: 624840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:57:56,663-Speed 2620.04 samples/sec   Loss 3.3742   LearningRate 0.0061   Epoch: 15   Global Step: 624850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:00,564-Speed 2626.56 samples/sec   Loss 3.4660   LearningRate 0.0061   Epoch: 15   Global Step: 624860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:04,474-Speed 2619.45 samples/sec   Loss 3.3427   LearningRate 0.0061   Epoch: 15   Global Step: 624870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:08,388-Speed 2616.83 samples/sec   Loss 3.3139   LearningRate 0.0061   Epoch: 15   Global Step: 624880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:12,294-Speed 2622.39 samples/sec   Loss 3.3166   LearningRate 0.0061   Epoch: 15   Global Step: 624890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 17:58:16,176-Speed 2638.94 samples/sec   Loss 3.4013   LearningRate 0.0061   Epoch: 15   Global Step: 624900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:20,080-Speed 2623.99 samples/sec   Loss 3.3749   LearningRate 0.0061   Epoch: 15   Global Step: 624910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:23,981-Speed 2625.36 samples/sec   Loss 3.3457   LearningRate 0.0061   Epoch: 15   Global Step: 624920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:27,881-Speed 2626.41 samples/sec   Loss 3.3446   LearningRate 0.0061   Epoch: 15   Global Step: 624930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:31,785-Speed 2623.24 samples/sec   Loss 3.4424   LearningRate 0.0061   Epoch: 15   Global Step: 624940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:35,687-Speed 2624.94 samples/sec   Loss 3.3767   LearningRate 0.0061   Epoch: 15   Global Step: 624950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:39,587-Speed 2625.99 samples/sec   Loss 3.4097   LearningRate 0.0061   Epoch: 15   Global Step: 624960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:43,498-Speed 2619.46 samples/sec   Loss 3.3718   LearningRate 0.0061   Epoch: 15   Global Step: 624970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:47,407-Speed 2620.28 samples/sec   Loss 3.3592   LearningRate 0.0061   Epoch: 15   Global Step: 624980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:51,306-Speed 2627.01 samples/sec   Loss 3.3657   LearningRate 0.0061   Epoch: 15   Global Step: 624990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:58:55,182-Speed 2642.34 samples/sec   Loss 3.4080   LearningRate 0.0061   Epoch: 15   Global Step: 625000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:58:59,082-Speed 2627.47 samples/sec   Loss 3.3564   LearningRate 0.0061   Epoch: 15   Global Step: 625010   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:02,982-Speed 2626.25 samples/sec   Loss 3.3528   LearningRate 0.0061   Epoch: 15   Global Step: 625020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:06,888-Speed 2622.03 samples/sec   Loss 3.3701   LearningRate 0.0061   Epoch: 15   Global Step: 625030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:10,789-Speed 2625.28 samples/sec   Loss 3.3469   LearningRate 0.0061   Epoch: 15   Global Step: 625040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:14,691-Speed 2625.19 samples/sec   Loss 3.3682   LearningRate 0.0061   Epoch: 15   Global Step: 625050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:18,597-Speed 2622.08 samples/sec   Loss 3.3717   LearningRate 0.0061   Epoch: 15   Global Step: 625060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:22,506-Speed 2620.83 samples/sec   Loss 3.3163   LearningRate 0.0061   Epoch: 15   Global Step: 625070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:26,414-Speed 2620.49 samples/sec   Loss 3.3526   LearningRate 0.0061   Epoch: 15   Global Step: 625080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:30,322-Speed 2621.22 samples/sec   Loss 3.3237   LearningRate 0.0061   Epoch: 15   Global Step: 625090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 17:59:34,242-Speed 2612.71 samples/sec   Loss 3.3236   LearningRate 0.0061   Epoch: 15   Global Step: 625100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:59:38,155-Speed 2617.47 samples/sec   Loss 3.2751   LearningRate 0.0061   Epoch: 15   Global Step: 625110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:59:42,059-Speed 2623.55 samples/sec   Loss 3.2676   LearningRate 0.0061   Epoch: 15   Global Step: 625120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:59:46,209-Speed 2468.21 samples/sec   Loss 3.3212   LearningRate 0.0061   Epoch: 15   Global Step: 625130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:59:50,148-Speed 2599.81 samples/sec   Loss 3.3888   LearningRate 0.0061   Epoch: 15   Global Step: 625140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:59:54,146-Speed 2562.37 samples/sec   Loss 3.4043   LearningRate 0.0061   Epoch: 15   Global Step: 625150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 17:59:58,093-Speed 2595.10 samples/sec   Loss 3.3453   LearningRate 0.0061   Epoch: 15   Global Step: 625160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:01,996-Speed 2624.81 samples/sec   Loss 3.3838   LearningRate 0.0061   Epoch: 15   Global Step: 625170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:05,901-Speed 2622.67 samples/sec   Loss 3.3399   LearningRate 0.0061   Epoch: 15   Global Step: 625180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:09,813-Speed 2618.01 samples/sec   Loss 3.3680   LearningRate 0.0061   Epoch: 15   Global Step: 625190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:13,705-Speed 2631.30 samples/sec   Loss 3.3767   LearningRate 0.0061   Epoch: 15   Global Step: 625200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:17,604-Speed 2627.19 samples/sec   Loss 3.3445   LearningRate 0.0061   Epoch: 15   Global Step: 625210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:21,509-Speed 2622.85 samples/sec   Loss 3.2799   LearningRate 0.0061   Epoch: 15   Global Step: 625220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:25,416-Speed 2621.80 samples/sec   Loss 3.3728   LearningRate 0.0061   Epoch: 15   Global Step: 625230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:29,315-Speed 2626.32 samples/sec   Loss 3.3807   LearningRate 0.0061   Epoch: 15   Global Step: 625240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:00:33,200-Speed 2636.53 samples/sec   Loss 3.3611   LearningRate 0.0061   Epoch: 15   Global Step: 625250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:00:37,098-Speed 2628.34 samples/sec   Loss 3.2975   LearningRate 0.0061   Epoch: 15   Global Step: 625260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:00:41,018-Speed 2613.03 samples/sec   Loss 3.4518   LearningRate 0.0061   Epoch: 15   Global Step: 625270   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:00:44,929-Speed 2618.44 samples/sec   Loss 3.3297   LearningRate 0.0061   Epoch: 15   Global Step: 625280   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:00:48,841-Speed 2618.82 samples/sec   Loss 3.3816   LearningRate 0.0061   Epoch: 15   Global Step: 625290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:00:52,743-Speed 2624.52 samples/sec   Loss 3.3969   LearningRate 0.0061   Epoch: 15   Global Step: 625300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:00:56,648-Speed 2623.03 samples/sec   Loss 3.3003   LearningRate 0.0061   Epoch: 15   Global Step: 625310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:01:00,558-Speed 2619.42 samples/sec   Loss 3.2512   LearningRate 0.0061   Epoch: 15   Global Step: 625320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:01:04,461-Speed 2624.99 samples/sec   Loss 3.3213   LearningRate 0.0061   Epoch: 15   Global Step: 625330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:01:08,365-Speed 2622.94 samples/sec   Loss 3.3770   LearningRate 0.0061   Epoch: 15   Global Step: 625340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:01:12,268-Speed 2624.76 samples/sec   Loss 3.3572   LearningRate 0.0061   Epoch: 15   Global Step: 625350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:16,169-Speed 2625.82 samples/sec   Loss 3.3746   LearningRate 0.0061   Epoch: 15   Global Step: 625360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:20,106-Speed 2601.50 samples/sec   Loss 3.2887   LearningRate 0.0061   Epoch: 15   Global Step: 625370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:24,017-Speed 2619.58 samples/sec   Loss 3.3049   LearningRate 0.0061   Epoch: 15   Global Step: 625380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:27,934-Speed 2614.37 samples/sec   Loss 3.3140   LearningRate 0.0061   Epoch: 15   Global Step: 625390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:31,844-Speed 2619.62 samples/sec   Loss 3.3598   LearningRate 0.0061   Epoch: 15   Global Step: 625400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:35,745-Speed 2625.97 samples/sec   Loss 3.3265   LearningRate 0.0061   Epoch: 15   Global Step: 625410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:39,648-Speed 2624.45 samples/sec   Loss 3.2960   LearningRate 0.0061   Epoch: 15   Global Step: 625420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:43,547-Speed 2626.93 samples/sec   Loss 3.3111   LearningRate 0.0061   Epoch: 15   Global Step: 625430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:47,474-Speed 2607.83 samples/sec   Loss 3.4341   LearningRate 0.0061   Epoch: 15   Global Step: 625440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:51,372-Speed 2628.40 samples/sec   Loss 3.2834   LearningRate 0.0061   Epoch: 15   Global Step: 625450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 18:01:55,255-Speed 2637.84 samples/sec   Loss 3.3608   LearningRate 0.0061   Epoch: 15   Global Step: 625460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:01:59,155-Speed 2626.01 samples/sec   Loss 3.3795   LearningRate 0.0061   Epoch: 15   Global Step: 625470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:03,119-Speed 2584.06 samples/sec   Loss 3.3301   LearningRate 0.0061   Epoch: 15   Global Step: 625480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:07,225-Speed 2494.44 samples/sec   Loss 3.4059   LearningRate 0.0061   Epoch: 15   Global Step: 625490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:11,242-Speed 2549.91 samples/sec   Loss 3.4147   LearningRate 0.0061   Epoch: 15   Global Step: 625500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:15,141-Speed 2627.04 samples/sec   Loss 3.4353   LearningRate 0.0061   Epoch: 15   Global Step: 625510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:19,048-Speed 2622.24 samples/sec   Loss 3.3914   LearningRate 0.0061   Epoch: 15   Global Step: 625520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:22,948-Speed 2626.04 samples/sec   Loss 3.2100   LearningRate 0.0061   Epoch: 15   Global Step: 625530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:26,850-Speed 2624.42 samples/sec   Loss 3.3523   LearningRate 0.0060   Epoch: 15   Global Step: 625540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:30,753-Speed 2624.29 samples/sec   Loss 3.4731   LearningRate 0.0060   Epoch: 15   Global Step: 625550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:34,668-Speed 2620.76 samples/sec   Loss 3.3026   LearningRate 0.0060   Epoch: 15   Global Step: 625560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 18:02:38,545-Speed 2641.28 samples/sec   Loss 3.3739   LearningRate 0.0060   Epoch: 15   Global Step: 625570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:42,451-Speed 2622.09 samples/sec   Loss 3.3875   LearningRate 0.0060   Epoch: 15   Global Step: 625580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:46,361-Speed 2620.00 samples/sec   Loss 3.3091   LearningRate 0.0060   Epoch: 15   Global Step: 625590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:50,272-Speed 2619.00 samples/sec   Loss 3.3718   LearningRate 0.0060   Epoch: 15   Global Step: 625600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:54,173-Speed 2626.26 samples/sec   Loss 3.3472   LearningRate 0.0060   Epoch: 15   Global Step: 625610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:02:58,079-Speed 2622.27 samples/sec   Loss 3.3759   LearningRate 0.0060   Epoch: 15   Global Step: 625620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:03:01,981-Speed 2624.60 samples/sec   Loss 3.2897   LearningRate 0.0060   Epoch: 15   Global Step: 625630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:03:05,886-Speed 2623.00 samples/sec   Loss 3.3494   LearningRate 0.0060   Epoch: 15   Global Step: 625640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:03:09,777-Speed 2632.88 samples/sec   Loss 3.4629   LearningRate 0.0060   Epoch: 15   Global Step: 625650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:13,676-Speed 2626.70 samples/sec   Loss 3.3072   LearningRate 0.0060   Epoch: 15   Global Step: 625660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:17,578-Speed 2625.10 samples/sec   Loss 3.3168   LearningRate 0.0060   Epoch: 15   Global Step: 625670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:21,486-Speed 2621.23 samples/sec   Loss 3.3595   LearningRate 0.0060   Epoch: 15   Global Step: 625680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:25,400-Speed 2616.92 samples/sec   Loss 3.3335   LearningRate 0.0060   Epoch: 15   Global Step: 625690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:29,325-Speed 2609.88 samples/sec   Loss 3.3398   LearningRate 0.0060   Epoch: 15   Global Step: 625700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:33,238-Speed 2617.60 samples/sec   Loss 3.3485   LearningRate 0.0060   Epoch: 15   Global Step: 625710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:37,162-Speed 2610.05 samples/sec   Loss 3.3129   LearningRate 0.0060   Epoch: 15   Global Step: 625720   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:41,065-Speed 2624.27 samples/sec   Loss 3.2727   LearningRate 0.0060   Epoch: 15   Global Step: 625730   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:44,966-Speed 2625.59 samples/sec   Loss 3.3830   LearningRate 0.0060   Epoch: 15   Global Step: 625740   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:48,866-Speed 2626.20 samples/sec   Loss 3.4124   LearningRate 0.0060   Epoch: 15   Global Step: 625750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:03:52,750-Speed 2637.24 samples/sec   Loss 3.2988   LearningRate 0.0060   Epoch: 15   Global Step: 625760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:03:56,653-Speed 2624.66 samples/sec   Loss 3.2921   LearningRate 0.0060   Epoch: 15   Global Step: 625770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:00,549-Speed 2629.04 samples/sec   Loss 3.3560   LearningRate 0.0060   Epoch: 15   Global Step: 625780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:04,456-Speed 2621.30 samples/sec   Loss 3.3204   LearningRate 0.0060   Epoch: 15   Global Step: 625790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:08,358-Speed 2624.70 samples/sec   Loss 3.2942   LearningRate 0.0060   Epoch: 15   Global Step: 625800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:12,265-Speed 2621.57 samples/sec   Loss 3.3312   LearningRate 0.0060   Epoch: 15   Global Step: 625810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:16,168-Speed 2624.38 samples/sec   Loss 3.2788   LearningRate 0.0060   Epoch: 15   Global Step: 625820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:20,083-Speed 2616.30 samples/sec   Loss 3.3537   LearningRate 0.0060   Epoch: 15   Global Step: 625830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:23,993-Speed 2619.37 samples/sec   Loss 3.3371   LearningRate 0.0060   Epoch: 15   Global Step: 625840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:27,896-Speed 2624.29 samples/sec   Loss 3.3234   LearningRate 0.0060   Epoch: 15   Global Step: 625850   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:04:31,798-Speed 2625.45 samples/sec   Loss 3.3601   LearningRate 0.0060   Epoch: 15   Global Step: 625860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:04:35,703-Speed 2622.70 samples/sec   Loss 3.3296   LearningRate 0.0060   Epoch: 15   Global Step: 625870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:04:39,609-Speed 2622.16 samples/sec   Loss 3.2845   LearningRate 0.0060   Epoch: 15   Global Step: 625880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:04:43,522-Speed 2617.79 samples/sec   Loss 3.4163   LearningRate 0.0060   Epoch: 15   Global Step: 625890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:04:47,434-Speed 2618.54 samples/sec   Loss 3.3031   LearningRate 0.0060   Epoch: 15   Global Step: 625900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:04:51,335-Speed 2625.91 samples/sec   Loss 3.2903   LearningRate 0.0060   Epoch: 15   Global Step: 625910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:04:55,234-Speed 2627.10 samples/sec   Loss 3.3038   LearningRate 0.0060   Epoch: 15   Global Step: 625920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:04:59,145-Speed 2618.87 samples/sec   Loss 3.3069   LearningRate 0.0060   Epoch: 15   Global Step: 625930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:03,133-Speed 2568.74 samples/sec   Loss 3.3676   LearningRate 0.0060   Epoch: 15   Global Step: 625940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:07,032-Speed 2626.81 samples/sec   Loss 3.2779   LearningRate 0.0060   Epoch: 15   Global Step: 625950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:10,941-Speed 2620.47 samples/sec   Loss 3.3424   LearningRate 0.0060   Epoch: 15   Global Step: 625960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 18:05:14,823-Speed 2638.17 samples/sec   Loss 3.3343   LearningRate 0.0060   Epoch: 15   Global Step: 625970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:18,720-Speed 2628.45 samples/sec   Loss 3.3256   LearningRate 0.0060   Epoch: 15   Global Step: 625980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:22,619-Speed 2627.25 samples/sec   Loss 3.2415   LearningRate 0.0060   Epoch: 15   Global Step: 625990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:26,520-Speed 2625.33 samples/sec   Loss 3.3344   LearningRate 0.0060   Epoch: 15   Global Step: 626000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:30,424-Speed 2624.47 samples/sec   Loss 3.3163   LearningRate 0.0060   Epoch: 15   Global Step: 626010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:34,339-Speed 2616.05 samples/sec   Loss 3.3803   LearningRate 0.0060   Epoch: 15   Global Step: 626020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:05:38,225-Speed 2634.97 samples/sec   Loss 3.3020   LearningRate 0.0060   Epoch: 15   Global Step: 626030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:05:42,125-Speed 2626.29 samples/sec   Loss 3.3272   LearningRate 0.0060   Epoch: 15   Global Step: 626040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:05:46,057-Speed 2605.76 samples/sec   Loss 3.4127   LearningRate 0.0060   Epoch: 15   Global Step: 626050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:05:50,003-Speed 2595.38 samples/sec   Loss 3.3259   LearningRate 0.0060   Epoch: 15   Global Step: 626060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:05:53,905-Speed 2625.41 samples/sec   Loss 3.3069   LearningRate 0.0060   Epoch: 15   Global Step: 626070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:05:57,825-Speed 2612.65 samples/sec   Loss 3.3879   LearningRate 0.0060   Epoch: 15   Global Step: 626080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:01,735-Speed 2620.07 samples/sec   Loss 3.3329   LearningRate 0.0060   Epoch: 15   Global Step: 626090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:05,659-Speed 2610.08 samples/sec   Loss 3.3878   LearningRate 0.0060   Epoch: 15   Global Step: 626100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:09,566-Speed 2621.18 samples/sec   Loss 3.4329   LearningRate 0.0060   Epoch: 15   Global Step: 626110   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:13,467-Speed 2625.54 samples/sec   Loss 3.3077   LearningRate 0.0060   Epoch: 15   Global Step: 626120   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:17,366-Speed 2628.08 samples/sec   Loss 3.3157   LearningRate 0.0060   Epoch: 15   Global Step: 626130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:06:21,255-Speed 2634.28 samples/sec   Loss 3.3962   LearningRate 0.0060   Epoch: 15   Global Step: 626140   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:25,216-Speed 2585.57 samples/sec   Loss 3.3275   LearningRate 0.0060   Epoch: 15   Global Step: 626150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:29,112-Speed 2628.63 samples/sec   Loss 3.2749   LearningRate 0.0060   Epoch: 15   Global Step: 626160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:33,009-Speed 2627.97 samples/sec   Loss 3.3605   LearningRate 0.0060   Epoch: 15   Global Step: 626170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:36,928-Speed 2614.53 samples/sec   Loss 3.3830   LearningRate 0.0060   Epoch: 15   Global Step: 626180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:40,833-Speed 2623.07 samples/sec   Loss 3.3005   LearningRate 0.0060   Epoch: 15   Global Step: 626190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:44,772-Speed 2600.24 samples/sec   Loss 3.3840   LearningRate 0.0060   Epoch: 15   Global Step: 626200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:48,704-Speed 2606.13 samples/sec   Loss 3.3200   LearningRate 0.0060   Epoch: 15   Global Step: 626210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:52,606-Speed 2624.36 samples/sec   Loss 3.2806   LearningRate 0.0060   Epoch: 15   Global Step: 626220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:06:56,525-Speed 2613.86 samples/sec   Loss 3.3992   LearningRate 0.0060   Epoch: 15   Global Step: 626230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:00,432-Speed 2621.79 samples/sec   Loss 3.2729   LearningRate 0.0060   Epoch: 15   Global Step: 626240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:07:04,336-Speed 2623.72 samples/sec   Loss 3.3482   LearningRate 0.0060   Epoch: 15   Global Step: 626250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:07:08,241-Speed 2622.83 samples/sec   Loss 3.3702   LearningRate 0.0060   Epoch: 15   Global Step: 626260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:07:12,141-Speed 2626.38 samples/sec   Loss 3.2627   LearningRate 0.0060   Epoch: 15   Global Step: 626270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:07:16,043-Speed 2625.33 samples/sec   Loss 3.3030   LearningRate 0.0060   Epoch: 15   Global Step: 626280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:07:19,926-Speed 2637.68 samples/sec   Loss 3.3888   LearningRate 0.0060   Epoch: 15   Global Step: 626290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:23,827-Speed 2627.02 samples/sec   Loss 3.2884   LearningRate 0.0060   Epoch: 15   Global Step: 626300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:27,729-Speed 2624.65 samples/sec   Loss 3.3127   LearningRate 0.0060   Epoch: 15   Global Step: 626310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:31,630-Speed 2625.82 samples/sec   Loss 3.3940   LearningRate 0.0060   Epoch: 15   Global Step: 626320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:35,533-Speed 2623.57 samples/sec   Loss 3.3402   LearningRate 0.0060   Epoch: 15   Global Step: 626330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:39,443-Speed 2620.07 samples/sec   Loss 3.2723   LearningRate 0.0060   Epoch: 15   Global Step: 626340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:43,340-Speed 2628.61 samples/sec   Loss 3.3800   LearningRate 0.0060   Epoch: 15   Global Step: 626350   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:47,235-Speed 2630.07 samples/sec   Loss 3.3651   LearningRate 0.0060   Epoch: 15   Global Step: 626360   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:51,139-Speed 2622.83 samples/sec   Loss 3.3117   LearningRate 0.0060   Epoch: 15   Global Step: 626370   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:55,037-Speed 2628.46 samples/sec   Loss 3.2755   LearningRate 0.0060   Epoch: 15   Global Step: 626380   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:07:58,935-Speed 2627.47 samples/sec   Loss 3.3909   LearningRate 0.0060   Epoch: 15   Global Step: 626390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:08:02,833-Speed 2627.67 samples/sec   Loss 3.2643   LearningRate 0.0060   Epoch: 15   Global Step: 626400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:08:06,748-Speed 2615.53 samples/sec   Loss 3.3706   LearningRate 0.0060   Epoch: 15   Global Step: 626410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:08:10,644-Speed 2629.39 samples/sec   Loss 3.2610   LearningRate 0.0060   Epoch: 15   Global Step: 626420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:08:14,521-Speed 2641.97 samples/sec   Loss 3.3844   LearningRate 0.0060   Epoch: 15   Global Step: 626430   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:08:18,436-Speed 2616.86 samples/sec   Loss 3.3041   LearningRate 0.0060   Epoch: 15   Global Step: 626440   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:08:22,317-Speed 2639.09 samples/sec   Loss 3.3593   LearningRate 0.0060   Epoch: 15   Global Step: 626450   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:26,248-Speed 2605.06 samples/sec   Loss 3.3238   LearningRate 0.0060   Epoch: 15   Global Step: 626460   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:30,158-Speed 2619.62 samples/sec   Loss 3.3106   LearningRate 0.0060   Epoch: 15   Global Step: 626470   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:34,054-Speed 2628.57 samples/sec   Loss 3.3035   LearningRate 0.0060   Epoch: 15   Global Step: 626480   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:37,951-Speed 2628.29 samples/sec   Loss 3.3571   LearningRate 0.0060   Epoch: 15   Global Step: 626490   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:41,850-Speed 2627.50 samples/sec   Loss 3.2699   LearningRate 0.0060   Epoch: 15   Global Step: 626500   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:45,746-Speed 2628.76 samples/sec   Loss 3.3530   LearningRate 0.0060   Epoch: 15   Global Step: 626510   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:49,687-Speed 2599.80 samples/sec   Loss 3.3283   LearningRate 0.0060   Epoch: 15   Global Step: 626520   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:53,584-Speed 2628.18 samples/sec   Loss 3.3504   LearningRate 0.0060   Epoch: 15   Global Step: 626530   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:08:57,750-Speed 2459.47 samples/sec   Loss 3.3862   LearningRate 0.0060   Epoch: 15   Global Step: 626540   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:09:01,690-Speed 2599.33 samples/sec   Loss 3.4363   LearningRate 0.0060   Epoch: 15   Global Step: 626550   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:05,739-Speed 2529.59 samples/sec   Loss 3.4069   LearningRate 0.0060   Epoch: 15   Global Step: 626560   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:09,644-Speed 2623.01 samples/sec   Loss 3.2937   LearningRate 0.0060   Epoch: 15   Global Step: 626570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:13,547-Speed 2624.59 samples/sec   Loss 3.4058   LearningRate 0.0060   Epoch: 15   Global Step: 626580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:17,441-Speed 2630.45 samples/sec   Loss 3.2843   LearningRate 0.0060   Epoch: 15   Global Step: 626590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:21,347-Speed 2623.51 samples/sec   Loss 3.3062   LearningRate 0.0060   Epoch: 15   Global Step: 626600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:25,249-Speed 2624.54 samples/sec   Loss 3.3212   LearningRate 0.0060   Epoch: 15   Global Step: 626610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:29,152-Speed 2624.92 samples/sec   Loss 3.3507   LearningRate 0.0060   Epoch: 15   Global Step: 626620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:33,083-Speed 2605.28 samples/sec   Loss 3.3248   LearningRate 0.0060   Epoch: 15   Global Step: 626630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:36,979-Speed 2629.27 samples/sec   Loss 3.3538   LearningRate 0.0060   Epoch: 15   Global Step: 626640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:09:40,875-Speed 2628.98 samples/sec   Loss 3.3669   LearningRate 0.0060   Epoch: 15   Global Step: 626650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:09:44,774-Speed 2627.54 samples/sec   Loss 3.3385   LearningRate 0.0060   Epoch: 15   Global Step: 626660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:09:48,674-Speed 2625.95 samples/sec   Loss 3.3229   LearningRate 0.0060   Epoch: 15   Global Step: 626670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:09:52,571-Speed 2628.79 samples/sec   Loss 3.3271   LearningRate 0.0060   Epoch: 15   Global Step: 626680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:09:56,487-Speed 2615.83 samples/sec   Loss 3.3694   LearningRate 0.0060   Epoch: 15   Global Step: 626690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:10:00,380-Speed 2631.12 samples/sec   Loss 3.3070   LearningRate 0.0060   Epoch: 15   Global Step: 626700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:10:04,257-Speed 2641.67 samples/sec   Loss 3.2563   LearningRate 0.0060   Epoch: 15   Global Step: 626710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:08,158-Speed 2625.71 samples/sec   Loss 3.3106   LearningRate 0.0060   Epoch: 15   Global Step: 626720   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:12,059-Speed 2625.47 samples/sec   Loss 3.3150   LearningRate 0.0060   Epoch: 15   Global Step: 626730   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:15,954-Speed 2629.86 samples/sec   Loss 3.3398   LearningRate 0.0060   Epoch: 15   Global Step: 626740   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:19,856-Speed 2624.75 samples/sec   Loss 3.2961   LearningRate 0.0060   Epoch: 15   Global Step: 626750   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:23,754-Speed 2627.81 samples/sec   Loss 3.3797   LearningRate 0.0060   Epoch: 15   Global Step: 626760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:27,651-Speed 2628.48 samples/sec   Loss 3.3316   LearningRate 0.0060   Epoch: 15   Global Step: 626770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:31,546-Speed 2629.71 samples/sec   Loss 3.2881   LearningRate 0.0060   Epoch: 15   Global Step: 626780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:35,444-Speed 2627.51 samples/sec   Loss 3.3455   LearningRate 0.0060   Epoch: 15   Global Step: 626790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:39,343-Speed 2626.96 samples/sec   Loss 3.3533   LearningRate 0.0060   Epoch: 15   Global Step: 626800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:10:43,242-Speed 2626.44 samples/sec   Loss 3.2741   LearningRate 0.0060   Epoch: 15   Global Step: 626810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:10:47,236-Speed 2564.94 samples/sec   Loss 3.2919   LearningRate 0.0060   Epoch: 15   Global Step: 626820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:10:51,150-Speed 2616.63 samples/sec   Loss 3.3622   LearningRate 0.0060   Epoch: 15   Global Step: 626830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:10:55,051-Speed 2626.21 samples/sec   Loss 3.3475   LearningRate 0.0060   Epoch: 15   Global Step: 626840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:10:58,951-Speed 2626.17 samples/sec   Loss 3.3795   LearningRate 0.0060   Epoch: 15   Global Step: 626850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:11:02,847-Speed 2629.41 samples/sec   Loss 3.3232   LearningRate 0.0060   Epoch: 15   Global Step: 626860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:11:06,722-Speed 2643.03 samples/sec   Loss 3.2983   LearningRate 0.0060   Epoch: 15   Global Step: 626870   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:10,619-Speed 2627.98 samples/sec   Loss 3.4530   LearningRate 0.0060   Epoch: 15   Global Step: 626880   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:14,517-Speed 2627.43 samples/sec   Loss 3.2323   LearningRate 0.0060   Epoch: 15   Global Step: 626890   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:18,413-Speed 2628.94 samples/sec   Loss 3.2618   LearningRate 0.0060   Epoch: 15   Global Step: 626900   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:22,315-Speed 2624.89 samples/sec   Loss 3.3902   LearningRate 0.0060   Epoch: 15   Global Step: 626910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:26,223-Speed 2620.60 samples/sec   Loss 3.2382   LearningRate 0.0060   Epoch: 15   Global Step: 626920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:30,129-Speed 2623.21 samples/sec   Loss 3.3096   LearningRate 0.0060   Epoch: 15   Global Step: 626930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:34,038-Speed 2620.21 samples/sec   Loss 3.2400   LearningRate 0.0060   Epoch: 15   Global Step: 626940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:37,939-Speed 2625.55 samples/sec   Loss 3.3231   LearningRate 0.0060   Epoch: 15   Global Step: 626950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:41,837-Speed 2627.05 samples/sec   Loss 3.3600   LearningRate 0.0060   Epoch: 15   Global Step: 626960   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:11:45,736-Speed 2627.07 samples/sec   Loss 3.3524   LearningRate 0.0060   Epoch: 15   Global Step: 626970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:11:49,632-Speed 2628.75 samples/sec   Loss 3.2995   LearningRate 0.0060   Epoch: 15   Global Step: 626980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:11:53,533-Speed 2625.94 samples/sec   Loss 3.3166   LearningRate 0.0060   Epoch: 15   Global Step: 626990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:11:57,422-Speed 2633.75 samples/sec   Loss 3.3996   LearningRate 0.0060   Epoch: 15   Global Step: 627000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:01,325-Speed 2624.33 samples/sec   Loss 3.2530   LearningRate 0.0060   Epoch: 15   Global Step: 627010   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:05,225-Speed 2626.50 samples/sec   Loss 3.3281   LearningRate 0.0060   Epoch: 15   Global Step: 627020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:09,168-Speed 2597.36 samples/sec   Loss 3.2575   LearningRate 0.0060   Epoch: 15   Global Step: 627030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:13,065-Speed 2628.33 samples/sec   Loss 3.3522   LearningRate 0.0060   Epoch: 15   Global Step: 627040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:16,962-Speed 2628.95 samples/sec   Loss 3.3258   LearningRate 0.0060   Epoch: 15   Global Step: 627050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:20,868-Speed 2621.85 samples/sec   Loss 3.3406   LearningRate 0.0060   Epoch: 15   Global Step: 627060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:24,772-Speed 2623.84 samples/sec   Loss 3.4109   LearningRate 0.0060   Epoch: 15   Global Step: 627070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:28,676-Speed 2623.06 samples/sec   Loss 3.3177   LearningRate 0.0060   Epoch: 15   Global Step: 627080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:32,591-Speed 2617.03 samples/sec   Loss 3.3161   LearningRate 0.0060   Epoch: 15   Global Step: 627090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:12:36,468-Speed 2642.39 samples/sec   Loss 3.3125   LearningRate 0.0060   Epoch: 15   Global Step: 627100   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:12:40,373-Speed 2622.89 samples/sec   Loss 3.3736   LearningRate 0.0060   Epoch: 15   Global Step: 627110   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:12:44,270-Speed 2627.54 samples/sec   Loss 3.3493   LearningRate 0.0060   Epoch: 15   Global Step: 627120   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:12:48,188-Speed 2614.74 samples/sec   Loss 3.3398   LearningRate 0.0060   Epoch: 15   Global Step: 627130   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:12:52,099-Speed 2618.80 samples/sec   Loss 3.3513   LearningRate 0.0060   Epoch: 15   Global Step: 627140   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:12:56,015-Speed 2615.55 samples/sec   Loss 3.3411   LearningRate 0.0060   Epoch: 15   Global Step: 627150   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:12:59,920-Speed 2622.97 samples/sec   Loss 3.3566   LearningRate 0.0060   Epoch: 15   Global Step: 627160   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:13:03,828-Speed 2621.25 samples/sec   Loss 3.3588   LearningRate 0.0060   Epoch: 15   Global Step: 627170   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:13:07,724-Speed 2628.50 samples/sec   Loss 3.3273   LearningRate 0.0060   Epoch: 15   Global Step: 627180   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:13:11,618-Speed 2630.91 samples/sec   Loss 3.2309   LearningRate 0.0060   Epoch: 15   Global Step: 627190   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-15 18:13:15,515-Speed 2628.23 samples/sec   Loss 3.2947   LearningRate 0.0060   Epoch: 15   Global Step: 627200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:19,413-Speed 2627.64 samples/sec   Loss 3.3352   LearningRate 0.0060   Epoch: 15   Global Step: 627210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:23,326-Speed 2617.37 samples/sec   Loss 3.2818   LearningRate 0.0060   Epoch: 15   Global Step: 627220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:27,225-Speed 2627.08 samples/sec   Loss 3.2779   LearningRate 0.0059   Epoch: 15   Global Step: 627230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:31,134-Speed 2620.97 samples/sec   Loss 3.3535   LearningRate 0.0059   Epoch: 15   Global Step: 627240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:35,032-Speed 2626.97 samples/sec   Loss 3.3144   LearningRate 0.0059   Epoch: 15   Global Step: 627250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:38,931-Speed 2626.99 samples/sec   Loss 3.2615   LearningRate 0.0059   Epoch: 15   Global Step: 627260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:42,871-Speed 2599.72 samples/sec   Loss 3.3318   LearningRate 0.0059   Epoch: 15   Global Step: 627270   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:46,772-Speed 2626.04 samples/sec   Loss 3.3954   LearningRate 0.0059   Epoch: 15   Global Step: 627280   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:50,674-Speed 2625.08 samples/sec   Loss 3.3562   LearningRate 0.0059   Epoch: 15   Global Step: 627290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:13:54,611-Speed 2601.95 samples/sec   Loss 3.3377   LearningRate 0.0059   Epoch: 15   Global Step: 627300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:13:58,520-Speed 2620.33 samples/sec   Loss 3.3253   LearningRate 0.0059   Epoch: 15   Global Step: 627310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:14:02,418-Speed 2627.43 samples/sec   Loss 3.3453   LearningRate 0.0059   Epoch: 15   Global Step: 627320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:14:06,315-Speed 2628.37 samples/sec   Loss 3.2057   LearningRate 0.0059   Epoch: 15   Global Step: 627330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:14:10,212-Speed 2628.71 samples/sec   Loss 3.3026   LearningRate 0.0059   Epoch: 15   Global Step: 627340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:14:14,085-Speed 2644.62 samples/sec   Loss 3.3825   LearningRate 0.0059   Epoch: 15   Global Step: 627350   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:17,982-Speed 2628.44 samples/sec   Loss 3.3880   LearningRate 0.0059   Epoch: 15   Global Step: 627360   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:21,914-Speed 2604.76 samples/sec   Loss 3.3165   LearningRate 0.0059   Epoch: 15   Global Step: 627370   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:25,810-Speed 2629.00 samples/sec   Loss 3.2771   LearningRate 0.0059   Epoch: 15   Global Step: 627380   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:29,706-Speed 2629.50 samples/sec   Loss 3.2392   LearningRate 0.0059   Epoch: 15   Global Step: 627390   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:33,617-Speed 2618.61 samples/sec   Loss 3.2999   LearningRate 0.0059   Epoch: 15   Global Step: 627400   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:37,579-Speed 2585.15 samples/sec   Loss 3.1609   LearningRate 0.0059   Epoch: 15   Global Step: 627410   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:41,475-Speed 2628.54 samples/sec   Loss 3.3494   LearningRate 0.0059   Epoch: 15   Global Step: 627420   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:45,372-Speed 2628.20 samples/sec   Loss 3.3359   LearningRate 0.0059   Epoch: 15   Global Step: 627430   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:49,294-Speed 2611.90 samples/sec   Loss 3.2576   LearningRate 0.0059   Epoch: 15   Global Step: 627440   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:14:53,191-Speed 2628.78 samples/sec   Loss 3.3267   LearningRate 0.0059   Epoch: 15   Global Step: 627450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:14:57,093-Speed 2625.03 samples/sec   Loss 3.2988   LearningRate 0.0059   Epoch: 15   Global Step: 627460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:01,002-Speed 2620.17 samples/sec   Loss 3.3590   LearningRate 0.0059   Epoch: 15   Global Step: 627470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:04,917-Speed 2615.70 samples/sec   Loss 3.3209   LearningRate 0.0059   Epoch: 15   Global Step: 627480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:08,811-Speed 2630.70 samples/sec   Loss 3.3710   LearningRate 0.0059   Epoch: 15   Global Step: 627490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:12,718-Speed 2621.42 samples/sec   Loss 3.2627   LearningRate 0.0059   Epoch: 15   Global Step: 627500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:16,615-Speed 2628.30 samples/sec   Loss 3.3184   LearningRate 0.0059   Epoch: 15   Global Step: 627510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:20,515-Speed 2626.26 samples/sec   Loss 3.3434   LearningRate 0.0059   Epoch: 15   Global Step: 627520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:24,415-Speed 2626.63 samples/sec   Loss 3.2835   LearningRate 0.0059   Epoch: 15   Global Step: 627530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:28,313-Speed 2627.75 samples/sec   Loss 3.4327   LearningRate 0.0059   Epoch: 15   Global Step: 627540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:32,183-Speed 2647.04 samples/sec   Loss 3.2958   LearningRate 0.0059   Epoch: 15   Global Step: 627550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:36,090-Speed 2620.95 samples/sec   Loss 3.2675   LearningRate 0.0059   Epoch: 15   Global Step: 627560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:39,989-Speed 2626.84 samples/sec   Loss 3.3581   LearningRate 0.0059   Epoch: 15   Global Step: 627570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:43,898-Speed 2620.08 samples/sec   Loss 3.2169   LearningRate 0.0059   Epoch: 15   Global Step: 627580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:47,798-Speed 2626.74 samples/sec   Loss 3.3664   LearningRate 0.0059   Epoch: 15   Global Step: 627590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:51,698-Speed 2626.68 samples/sec   Loss 3.3240   LearningRate 0.0059   Epoch: 15   Global Step: 627600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:55,602-Speed 2623.06 samples/sec   Loss 3.3868   LearningRate 0.0059   Epoch: 15   Global Step: 627610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:15:59,498-Speed 2629.81 samples/sec   Loss 3.3850   LearningRate 0.0059   Epoch: 15   Global Step: 627620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:16:03,394-Speed 2628.56 samples/sec   Loss 3.3139   LearningRate 0.0059   Epoch: 15   Global Step: 627630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:16:07,302-Speed 2620.90 samples/sec   Loss 3.2297   LearningRate 0.0059   Epoch: 15   Global Step: 627640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:16:11,187-Speed 2635.92 samples/sec   Loss 3.3913   LearningRate 0.0059   Epoch: 15   Global Step: 627650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:16:15,087-Speed 2627.12 samples/sec   Loss 3.2882   LearningRate 0.0059   Epoch: 15   Global Step: 627660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:16:18,958-Speed 2645.71 samples/sec   Loss 3.2910   LearningRate 0.0059   Epoch: 15   Global Step: 627670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:22,867-Speed 2620.53 samples/sec   Loss 3.4320   LearningRate 0.0059   Epoch: 15   Global Step: 627680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:26,765-Speed 2627.80 samples/sec   Loss 3.3408   LearningRate 0.0059   Epoch: 15   Global Step: 627690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:31,757-Speed 2051.92 samples/sec   Loss 3.2369   LearningRate 0.0059   Epoch: 15   Global Step: 627700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:35,655-Speed 2627.24 samples/sec   Loss 3.3531   LearningRate 0.0059   Epoch: 15   Global Step: 627710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:39,547-Speed 2632.13 samples/sec   Loss 3.2978   LearningRate 0.0059   Epoch: 15   Global Step: 627720   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:43,441-Speed 2630.73 samples/sec   Loss 3.2524   LearningRate 0.0059   Epoch: 15   Global Step: 627730   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:47,335-Speed 2630.28 samples/sec   Loss 3.2979   LearningRate 0.0059   Epoch: 15   Global Step: 627740   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:51,241-Speed 2622.25 samples/sec   Loss 3.2558   LearningRate 0.0059   Epoch: 15   Global Step: 627750   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:55,144-Speed 2624.53 samples/sec   Loss 3.2560   LearningRate 0.0059   Epoch: 15   Global Step: 627760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:16:59,023-Speed 2640.70 samples/sec   Loss 3.2411   LearningRate 0.0059   Epoch: 15   Global Step: 627770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:02,919-Speed 2628.97 samples/sec   Loss 3.3010   LearningRate 0.0059   Epoch: 15   Global Step: 627780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:06,819-Speed 2625.89 samples/sec   Loss 3.2733   LearningRate 0.0059   Epoch: 15   Global Step: 627790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:10,712-Speed 2631.22 samples/sec   Loss 3.4009   LearningRate 0.0059   Epoch: 15   Global Step: 627800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:14,611-Speed 2626.45 samples/sec   Loss 3.3114   LearningRate 0.0059   Epoch: 15   Global Step: 627810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:18,514-Speed 2624.95 samples/sec   Loss 3.3389   LearningRate 0.0059   Epoch: 15   Global Step: 627820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:22,409-Speed 2629.38 samples/sec   Loss 3.3104   LearningRate 0.0059   Epoch: 15   Global Step: 627830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:26,307-Speed 2627.37 samples/sec   Loss 3.2998   LearningRate 0.0059   Epoch: 15   Global Step: 627840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:30,202-Speed 2630.59 samples/sec   Loss 3.2804   LearningRate 0.0059   Epoch: 15   Global Step: 627850   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:34,104-Speed 2624.64 samples/sec   Loss 3.3158   LearningRate 0.0059   Epoch: 15   Global Step: 627860   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:17:37,999-Speed 2629.28 samples/sec   Loss 3.3097   LearningRate 0.0059   Epoch: 15   Global Step: 627870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:17:41,899-Speed 2626.61 samples/sec   Loss 3.3048   LearningRate 0.0059   Epoch: 15   Global Step: 627880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:17:45,794-Speed 2629.43 samples/sec   Loss 3.2676   LearningRate 0.0059   Epoch: 15   Global Step: 627890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:17:49,700-Speed 2622.64 samples/sec   Loss 3.3108   LearningRate 0.0059   Epoch: 15   Global Step: 627900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:17:53,594-Speed 2630.00 samples/sec   Loss 3.2161   LearningRate 0.0059   Epoch: 15   Global Step: 627910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:17:57,488-Speed 2630.27 samples/sec   Loss 3.2469   LearningRate 0.0059   Epoch: 15   Global Step: 627920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:01,383-Speed 2630.01 samples/sec   Loss 3.2693   LearningRate 0.0059   Epoch: 15   Global Step: 627930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:05,274-Speed 2632.52 samples/sec   Loss 3.3507   LearningRate 0.0059   Epoch: 15   Global Step: 627940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:09,170-Speed 2628.86 samples/sec   Loss 3.2705   LearningRate 0.0059   Epoch: 15   Global Step: 627950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:13,064-Speed 2629.94 samples/sec   Loss 3.3582   LearningRate 0.0059   Epoch: 15   Global Step: 627960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:16,962-Speed 2627.84 samples/sec   Loss 3.2936   LearningRate 0.0059   Epoch: 15   Global Step: 627970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 18:18:20,837-Speed 2642.78 samples/sec   Loss 3.3069   LearningRate 0.0059   Epoch: 15   Global Step: 627980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:24,730-Speed 2631.22 samples/sec   Loss 3.3133   LearningRate 0.0059   Epoch: 15   Global Step: 627990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:28,636-Speed 2622.49 samples/sec   Loss 3.2603   LearningRate 0.0059   Epoch: 15   Global Step: 628000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:32,544-Speed 2620.92 samples/sec   Loss 3.2874   LearningRate 0.0059   Epoch: 15   Global Step: 628010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:36,439-Speed 2629.46 samples/sec   Loss 3.3321   LearningRate 0.0059   Epoch: 15   Global Step: 628020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:40,337-Speed 2627.62 samples/sec   Loss 3.3417   LearningRate 0.0059   Epoch: 15   Global Step: 628030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:44,232-Speed 2630.09 samples/sec   Loss 3.3067   LearningRate 0.0059   Epoch: 15   Global Step: 628040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:48,127-Speed 2629.54 samples/sec   Loss 3.2302   LearningRate 0.0059   Epoch: 15   Global Step: 628050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:52,020-Speed 2630.88 samples/sec   Loss 3.3940   LearningRate 0.0059   Epoch: 15   Global Step: 628060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:55,915-Speed 2629.47 samples/sec   Loss 3.2544   LearningRate 0.0059   Epoch: 15   Global Step: 628070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:18:59,818-Speed 2624.30 samples/sec   Loss 3.3134   LearningRate 0.0059   Epoch: 15   Global Step: 628080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 18:19:03,706-Speed 2633.77 samples/sec   Loss 3.2842   LearningRate 0.0059   Epoch: 15   Global Step: 628090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:07,621-Speed 2616.43 samples/sec   Loss 3.3039   LearningRate 0.0059   Epoch: 15   Global Step: 628100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:11,518-Speed 2628.24 samples/sec   Loss 3.3209   LearningRate 0.0059   Epoch: 15   Global Step: 628110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:15,427-Speed 2620.59 samples/sec   Loss 3.2633   LearningRate 0.0059   Epoch: 15   Global Step: 628120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:19,339-Speed 2618.51 samples/sec   Loss 3.3133   LearningRate 0.0059   Epoch: 15   Global Step: 628130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:23,230-Speed 2631.74 samples/sec   Loss 3.4367   LearningRate 0.0059   Epoch: 15   Global Step: 628140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:27,128-Speed 2628.32 samples/sec   Loss 3.2712   LearningRate 0.0059   Epoch: 15   Global Step: 628150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:31,030-Speed 2624.47 samples/sec   Loss 3.3537   LearningRate 0.0059   Epoch: 15   Global Step: 628160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:34,924-Speed 2629.86 samples/sec   Loss 3.3809   LearningRate 0.0059   Epoch: 15   Global Step: 628170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:38,821-Speed 2628.47 samples/sec   Loss 3.3189   LearningRate 0.0059   Epoch: 15   Global Step: 628180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:42,730-Speed 2620.53 samples/sec   Loss 3.3416   LearningRate 0.0059   Epoch: 15   Global Step: 628190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 18:19:46,683-Speed 2590.54 samples/sec   Loss 3.3146   LearningRate 0.0059   Epoch: 15   Global Step: 628200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:50,598-Speed 2616.43 samples/sec   Loss 3.2939   LearningRate 0.0059   Epoch: 15   Global Step: 628210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:19:54,474-Speed 2642.22 samples/sec   Loss 3.2982   LearningRate 0.0059   Epoch: 15   Global Step: 628220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:19:58,373-Speed 2628.13 samples/sec   Loss 3.3555   LearningRate 0.0059   Epoch: 15   Global Step: 628230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:02,273-Speed 2625.99 samples/sec   Loss 3.3555   LearningRate 0.0059   Epoch: 15   Global Step: 628240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:06,167-Speed 2630.29 samples/sec   Loss 3.2829   LearningRate 0.0059   Epoch: 15   Global Step: 628250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:10,059-Speed 2631.09 samples/sec   Loss 3.3140   LearningRate 0.0059   Epoch: 15   Global Step: 628260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:13,959-Speed 2626.76 samples/sec   Loss 3.3091   LearningRate 0.0059   Epoch: 15   Global Step: 628270   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:17,852-Speed 2630.95 samples/sec   Loss 3.2657   LearningRate 0.0059   Epoch: 15   Global Step: 628280   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:21,758-Speed 2621.91 samples/sec   Loss 3.3557   LearningRate 0.0059   Epoch: 15   Global Step: 628290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:25,655-Speed 2629.01 samples/sec   Loss 3.3237   LearningRate 0.0059   Epoch: 15   Global Step: 628300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:29,549-Speed 2629.88 samples/sec   Loss 3.2858   LearningRate 0.0059   Epoch: 15   Global Step: 628310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:20:33,466-Speed 2614.94 samples/sec   Loss 3.3131   LearningRate 0.0059   Epoch: 15   Global Step: 628320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:20:37,387-Speed 2611.85 samples/sec   Loss 3.4078   LearningRate 0.0059   Epoch: 15   Global Step: 628330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:20:41,291-Speed 2624.17 samples/sec   Loss 3.3185   LearningRate 0.0059   Epoch: 15   Global Step: 628340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:20:45,185-Speed 2630.20 samples/sec   Loss 3.3515   LearningRate 0.0059   Epoch: 15   Global Step: 628350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:20:49,078-Speed 2630.95 samples/sec   Loss 3.2636   LearningRate 0.0059   Epoch: 15   Global Step: 628360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:20:52,972-Speed 2630.15 samples/sec   Loss 3.3146   LearningRate 0.0059   Epoch: 15   Global Step: 628370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:20:56,866-Speed 2630.62 samples/sec   Loss 3.3772   LearningRate 0.0059   Epoch: 15   Global Step: 628380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:21:00,768-Speed 2624.66 samples/sec   Loss 3.3052   LearningRate 0.0059   Epoch: 15   Global Step: 628390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:21:04,681-Speed 2617.10 samples/sec   Loss 3.3253   LearningRate 0.0059   Epoch: 15   Global Step: 628400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:21:08,561-Speed 2639.97 samples/sec   Loss 3.2790   LearningRate 0.0059   Epoch: 15   Global Step: 628410   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:12,456-Speed 2629.77 samples/sec   Loss 3.3352   LearningRate 0.0059   Epoch: 15   Global Step: 628420   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:16,349-Speed 2631.13 samples/sec   Loss 3.2204   LearningRate 0.0059   Epoch: 15   Global Step: 628430   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:20,240-Speed 2632.58 samples/sec   Loss 3.3050   LearningRate 0.0059   Epoch: 15   Global Step: 628440   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:24,131-Speed 2632.27 samples/sec   Loss 3.3539   LearningRate 0.0059   Epoch: 15   Global Step: 628450   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:28,031-Speed 2626.22 samples/sec   Loss 3.4183   LearningRate 0.0059   Epoch: 15   Global Step: 628460   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:31,929-Speed 2627.31 samples/sec   Loss 3.3239   LearningRate 0.0059   Epoch: 15   Global Step: 628470   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:35,837-Speed 2620.97 samples/sec   Loss 3.2834   LearningRate 0.0059   Epoch: 15   Global Step: 628480   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:39,758-Speed 2612.09 samples/sec   Loss 3.2028   LearningRate 0.0059   Epoch: 15   Global Step: 628490   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:43,656-Speed 2627.79 samples/sec   Loss 3.3718   LearningRate 0.0059   Epoch: 15   Global Step: 628500   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:21:47,549-Speed 2631.44 samples/sec   Loss 3.3785   LearningRate 0.0059   Epoch: 15   Global Step: 628510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:21:51,444-Speed 2629.32 samples/sec   Loss 3.2940   LearningRate 0.0059   Epoch: 15   Global Step: 628520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:21:55,334-Speed 2633.48 samples/sec   Loss 3.3363   LearningRate 0.0059   Epoch: 15   Global Step: 628530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:21:59,229-Speed 2629.81 samples/sec   Loss 3.3046   LearningRate 0.0059   Epoch: 15   Global Step: 628540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:03,127-Speed 2627.52 samples/sec   Loss 3.3499   LearningRate 0.0059   Epoch: 15   Global Step: 628550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:07,125-Speed 2561.27 samples/sec   Loss 3.3268   LearningRate 0.0059   Epoch: 15   Global Step: 628560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:11,060-Speed 2603.82 samples/sec   Loss 3.2724   LearningRate 0.0059   Epoch: 15   Global Step: 628570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:14,960-Speed 2626.10 samples/sec   Loss 3.2008   LearningRate 0.0059   Epoch: 15   Global Step: 628580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:18,862-Speed 2625.24 samples/sec   Loss 3.1697   LearningRate 0.0059   Epoch: 15   Global Step: 628590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:22,772-Speed 2619.37 samples/sec   Loss 3.2622   LearningRate 0.0059   Epoch: 15   Global Step: 628600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:26,670-Speed 2628.11 samples/sec   Loss 3.2317   LearningRate 0.0059   Epoch: 15   Global Step: 628610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:30,576-Speed 2621.87 samples/sec   Loss 3.2731   LearningRate 0.0059   Epoch: 15   Global Step: 628620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:34,519-Speed 2597.81 samples/sec   Loss 3.2979   LearningRate 0.0059   Epoch: 15   Global Step: 628630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:38,422-Speed 2624.51 samples/sec   Loss 3.2868   LearningRate 0.0059   Epoch: 15   Global Step: 628640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:42,316-Speed 2630.64 samples/sec   Loss 3.3012   LearningRate 0.0059   Epoch: 15   Global Step: 628650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:46,229-Speed 2617.36 samples/sec   Loss 3.2261   LearningRate 0.0059   Epoch: 15   Global Step: 628660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:50,171-Speed 2598.76 samples/sec   Loss 3.3002   LearningRate 0.0059   Epoch: 15   Global Step: 628670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:54,069-Speed 2627.62 samples/sec   Loss 3.2345   LearningRate 0.0059   Epoch: 15   Global Step: 628680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:22:57,969-Speed 2627.22 samples/sec   Loss 3.2875   LearningRate 0.0059   Epoch: 15   Global Step: 628690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:01,972-Speed 2558.67 samples/sec   Loss 3.3524   LearningRate 0.0059   Epoch: 15   Global Step: 628700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:05,844-Speed 2645.51 samples/sec   Loss 3.3314   LearningRate 0.0059   Epoch: 15   Global Step: 628710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:09,744-Speed 2626.09 samples/sec   Loss 3.2938   LearningRate 0.0059   Epoch: 15   Global Step: 628720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:13,651-Speed 2622.19 samples/sec   Loss 3.2983   LearningRate 0.0059   Epoch: 15   Global Step: 628730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:17,573-Speed 2611.81 samples/sec   Loss 3.3371   LearningRate 0.0059   Epoch: 15   Global Step: 628740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:21,470-Speed 2628.12 samples/sec   Loss 3.3553   LearningRate 0.0059   Epoch: 15   Global Step: 628750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:25,367-Speed 2628.67 samples/sec   Loss 3.2826   LearningRate 0.0059   Epoch: 15   Global Step: 628760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:29,268-Speed 2626.15 samples/sec   Loss 3.3222   LearningRate 0.0059   Epoch: 15   Global Step: 628770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:33,160-Speed 2631.49 samples/sec   Loss 3.2336   LearningRate 0.0059   Epoch: 15   Global Step: 628780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:37,056-Speed 2628.70 samples/sec   Loss 3.3282   LearningRate 0.0059   Epoch: 15   Global Step: 628790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:40,958-Speed 2624.99 samples/sec   Loss 3.3480   LearningRate 0.0059   Epoch: 15   Global Step: 628800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:44,865-Speed 2622.13 samples/sec   Loss 3.2910   LearningRate 0.0059   Epoch: 15   Global Step: 628810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:48,763-Speed 2627.99 samples/sec   Loss 3.2783   LearningRate 0.0059   Epoch: 15   Global Step: 628820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:52,658-Speed 2629.03 samples/sec   Loss 3.3218   LearningRate 0.0059   Epoch: 15   Global Step: 628830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:23:56,550-Speed 2632.13 samples/sec   Loss 3.2966   LearningRate 0.0059   Epoch: 15   Global Step: 628840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:24:00,447-Speed 2628.80 samples/sec   Loss 3.2938   LearningRate 0.0059   Epoch: 15   Global Step: 628850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:24:04,322-Speed 2642.88 samples/sec   Loss 3.4038   LearningRate 0.0059   Epoch: 15   Global Step: 628860   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:08,221-Speed 2626.86 samples/sec   Loss 3.2867   LearningRate 0.0059   Epoch: 15   Global Step: 628870   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:12,110-Speed 2634.29 samples/sec   Loss 3.2662   LearningRate 0.0059   Epoch: 15   Global Step: 628880   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:16,001-Speed 2631.73 samples/sec   Loss 3.3266   LearningRate 0.0059   Epoch: 15   Global Step: 628890   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:19,910-Speed 2620.24 samples/sec   Loss 3.2790   LearningRate 0.0059   Epoch: 15   Global Step: 628900   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:23,802-Speed 2632.16 samples/sec   Loss 3.2042   LearningRate 0.0059   Epoch: 15   Global Step: 628910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:27,720-Speed 2615.09 samples/sec   Loss 3.2037   LearningRate 0.0059   Epoch: 15   Global Step: 628920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:31,623-Speed 2624.28 samples/sec   Loss 3.2847   LearningRate 0.0059   Epoch: 15   Global Step: 628930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:35,544-Speed 2612.22 samples/sec   Loss 3.2705   LearningRate 0.0058   Epoch: 15   Global Step: 628940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:39,438-Speed 2630.74 samples/sec   Loss 3.4173   LearningRate 0.0058   Epoch: 15   Global Step: 628950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:24:43,333-Speed 2630.22 samples/sec   Loss 3.2435   LearningRate 0.0058   Epoch: 15   Global Step: 628960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:24:47,231-Speed 2627.92 samples/sec   Loss 3.2546   LearningRate 0.0058   Epoch: 15   Global Step: 628970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:24:51,150-Speed 2613.59 samples/sec   Loss 3.2708   LearningRate 0.0058   Epoch: 15   Global Step: 628980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:24:55,061-Speed 2618.98 samples/sec   Loss 3.3076   LearningRate 0.0058   Epoch: 15   Global Step: 628990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:24:58,979-Speed 2614.00 samples/sec   Loss 3.2284   LearningRate 0.0058   Epoch: 15   Global Step: 629000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:25:02,876-Speed 2628.65 samples/sec   Loss 3.2098   LearningRate 0.0058   Epoch: 15   Global Step: 629010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:25:06,751-Speed 2643.34 samples/sec   Loss 3.2720   LearningRate 0.0058   Epoch: 15   Global Step: 629020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:10,655-Speed 2623.24 samples/sec   Loss 3.3565   LearningRate 0.0058   Epoch: 15   Global Step: 629030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:14,760-Speed 2495.18 samples/sec   Loss 3.3328   LearningRate 0.0058   Epoch: 15   Global Step: 629040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:18,801-Speed 2535.42 samples/sec   Loss 3.2955   LearningRate 0.0058   Epoch: 15   Global Step: 629050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:22,745-Speed 2596.95 samples/sec   Loss 3.3133   LearningRate 0.0058   Epoch: 15   Global Step: 629060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:26,668-Speed 2610.70 samples/sec   Loss 3.2846   LearningRate 0.0058   Epoch: 15   Global Step: 629070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:30,562-Speed 2630.90 samples/sec   Loss 3.2520   LearningRate 0.0058   Epoch: 15   Global Step: 629080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:34,457-Speed 2629.54 samples/sec   Loss 3.2667   LearningRate 0.0058   Epoch: 15   Global Step: 629090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:38,356-Speed 2626.60 samples/sec   Loss 3.3237   LearningRate 0.0058   Epoch: 15   Global Step: 629100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:42,258-Speed 2625.56 samples/sec   Loss 3.3825   LearningRate 0.0058   Epoch: 15   Global Step: 629110   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:25:46,153-Speed 2629.78 samples/sec   Loss 3.2616   LearningRate 0.0058   Epoch: 15   Global Step: 629120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:25:50,054-Speed 2625.94 samples/sec   Loss 3.3094   LearningRate 0.0058   Epoch: 15   Global Step: 629130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:25:53,955-Speed 2625.38 samples/sec   Loss 3.2518   LearningRate 0.0058   Epoch: 15   Global Step: 629140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:25:57,864-Speed 2620.55 samples/sec   Loss 3.3600   LearningRate 0.0058   Epoch: 15   Global Step: 629150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:01,767-Speed 2624.18 samples/sec   Loss 3.2736   LearningRate 0.0058   Epoch: 15   Global Step: 629160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:05,663-Speed 2629.23 samples/sec   Loss 3.2399   LearningRate 0.0058   Epoch: 15   Global Step: 629170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:09,564-Speed 2625.74 samples/sec   Loss 3.2700   LearningRate 0.0058   Epoch: 15   Global Step: 629180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:13,476-Speed 2618.27 samples/sec   Loss 3.3489   LearningRate 0.0058   Epoch: 15   Global Step: 629190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:17,385-Speed 2620.29 samples/sec   Loss 3.2855   LearningRate 0.0058   Epoch: 15   Global Step: 629200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:21,341-Speed 2590.22 samples/sec   Loss 3.2358   LearningRate 0.0058   Epoch: 15   Global Step: 629210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:25,306-Speed 2582.78 samples/sec   Loss 3.3434   LearningRate 0.0058   Epoch: 15   Global Step: 629220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-15 18:26:29,181-Speed 2643.83 samples/sec   Loss 3.2896   LearningRate 0.0058   Epoch: 15   Global Step: 629230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:33,112-Speed 2605.38 samples/sec   Loss 3.3554   LearningRate 0.0058   Epoch: 15   Global Step: 629240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:37,011-Speed 2627.51 samples/sec   Loss 3.2557   LearningRate 0.0058   Epoch: 15   Global Step: 629250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:40,950-Speed 2599.74 samples/sec   Loss 3.3413   LearningRate 0.0058   Epoch: 15   Global Step: 629260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:44,844-Speed 2630.55 samples/sec   Loss 3.2709   LearningRate 0.0058   Epoch: 15   Global Step: 629270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:48,740-Speed 2629.86 samples/sec   Loss 3.1792   LearningRate 0.0058   Epoch: 15   Global Step: 629280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:52,634-Speed 2630.56 samples/sec   Loss 3.2930   LearningRate 0.0058   Epoch: 15   Global Step: 629290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:26:56,527-Speed 2631.14 samples/sec   Loss 3.3155   LearningRate 0.0058   Epoch: 15   Global Step: 629300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-15 18:27:00,404-Speed 2641.86 samples/sec   Loss 3.2590   LearningRate 0.0058   Epoch: 15   Global Step: 629310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:27:04,306-Speed 2625.08 samples/sec   Loss 3.2172   LearningRate 0.0058   Epoch: 15   Global Step: 629320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:27:08,219-Speed 2616.98 samples/sec   Loss 3.2912   LearningRate 0.0058   Epoch: 15   Global Step: 629330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:27:12,119-Speed 2626.46 samples/sec   Loss 3.4092   LearningRate 0.0058   Epoch: 15   Global Step: 629340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:27:16,019-Speed 2626.89 samples/sec   Loss 3.2652   LearningRate 0.0058   Epoch: 15   Global Step: 629350   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-15 18:27:19,919-Speed 2626.40 samples/sec   Loss 3.2711   LearningRate 0.0058   Epoch: 15   Global Step: 629360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:27:23,852-Speed 2603.74 samples/sec   Loss 3.2830   LearningRate 0.0058   Epoch: 15   Global Step: 629370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:27:27,762-Speed 2620.82 samples/sec   Loss 3.2334   LearningRate 0.0058   Epoch: 15   Global Step: 629380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:27:31,660-Speed 2627.35 samples/sec   Loss 3.2187   LearningRate 0.0058   Epoch: 15   Global Step: 629390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:27:35,583-Speed 2610.37 samples/sec   Loss 3.2789   LearningRate 0.0058   Epoch: 15   Global Step: 629400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:27:39,486-Speed 2624.86 samples/sec   Loss 3.3265   LearningRate 0.0058   Epoch: 15   Global Step: 629410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:27:43,392-Speed 2622.40 samples/sec   Loss 3.3448   LearningRate 0.0058   Epoch: 15   Global Step: 629420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:27:47,308-Speed 2615.58 samples/sec   Loss 3.3095   LearningRate 0.0058   Epoch: 15   Global Step: 629430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:27:51,211-Speed 2624.77 samples/sec   Loss 3.2992   LearningRate 0.0058   Epoch: 15   Global Step: 629440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:27:55,167-Speed 2588.99 samples/sec   Loss 3.3080   LearningRate 0.0058   Epoch: 15   Global Step: 629450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:27:59,091-Speed 2610.09 samples/sec   Loss 3.3469   LearningRate 0.0058   Epoch: 15   Global Step: 629460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:28:02,999-Speed 2621.16 samples/sec   Loss 3.3313   LearningRate 0.0058   Epoch: 15   Global Step: 629470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:28:06,892-Speed 2630.76 samples/sec   Loss 3.3829   LearningRate 0.0058   Epoch: 15   Global Step: 629480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:28:10,791-Speed 2627.00 samples/sec   Loss 3.2182   LearningRate 0.0058   Epoch: 15   Global Step: 629490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:28:14,721-Speed 2606.57 samples/sec   Loss 3.3131   LearningRate 0.0058   Epoch: 15   Global Step: 629500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:28:18,628-Speed 2622.36 samples/sec   Loss 3.2717   LearningRate 0.0058   Epoch: 15   Global Step: 629510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 18:28:22,523-Speed 2629.11 samples/sec   Loss 3.2382   LearningRate 0.0058   Epoch: 15   Global Step: 629520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 18:28:26,405-Speed 2639.17 samples/sec   Loss 3.2926   LearningRate 0.0058   Epoch: 15   Global Step: 629530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:28:30,285-Speed 2639.87 samples/sec   Loss 3.2925   LearningRate 0.0058   Epoch: 15   Global Step: 629540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:28:34,182-Speed 2628.92 samples/sec   Loss 3.3124   LearningRate 0.0058   Epoch: 15   Global Step: 629550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:28:38,121-Speed 2600.37 samples/sec   Loss 3.2573   LearningRate 0.0058   Epoch: 15   Global Step: 629560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:28:42,031-Speed 2619.16 samples/sec   Loss 3.2688   LearningRate 0.0058   Epoch: 15   Global Step: 629570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:28:45,926-Speed 2629.90 samples/sec   Loss 3.3002   LearningRate 0.0058   Epoch: 15   Global Step: 629580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:28:49,831-Speed 2623.20 samples/sec   Loss 3.2662   LearningRate 0.0058   Epoch: 15   Global Step: 629590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:28:53,736-Speed 2624.31 samples/sec   Loss 3.2021   LearningRate 0.0058   Epoch: 15   Global Step: 629600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:28:57,633-Speed 2628.26 samples/sec   Loss 3.2485   LearningRate 0.0058   Epoch: 15   Global Step: 629610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:01,529-Speed 2628.62 samples/sec   Loss 3.2729   LearningRate 0.0058   Epoch: 15   Global Step: 629620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:05,422-Speed 2631.26 samples/sec   Loss 3.2947   LearningRate 0.0058   Epoch: 15   Global Step: 629630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:09,317-Speed 2629.42 samples/sec   Loss 3.2259   LearningRate 0.0058   Epoch: 15   Global Step: 629640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:29:13,222-Speed 2622.78 samples/sec   Loss 3.2889   LearningRate 0.0058   Epoch: 15   Global Step: 629650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:29:17,133-Speed 2619.33 samples/sec   Loss 3.3935   LearningRate 0.0058   Epoch: 15   Global Step: 629660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:29:21,037-Speed 2624.11 samples/sec   Loss 3.2265   LearningRate 0.0058   Epoch: 15   Global Step: 629670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:29:24,958-Speed 2612.02 samples/sec   Loss 3.2420   LearningRate 0.0058   Epoch: 15   Global Step: 629680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:29:28,853-Speed 2629.80 samples/sec   Loss 3.3028   LearningRate 0.0058   Epoch: 15   Global Step: 629690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:29:32,747-Speed 2630.15 samples/sec   Loss 3.2097   LearningRate 0.0058   Epoch: 15   Global Step: 629700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:29:36,631-Speed 2637.35 samples/sec   Loss 3.2808   LearningRate 0.0058   Epoch: 15   Global Step: 629710   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:40,528-Speed 2628.06 samples/sec   Loss 3.2388   LearningRate 0.0058   Epoch: 15   Global Step: 629720   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:44,432-Speed 2623.99 samples/sec   Loss 3.3077   LearningRate 0.0058   Epoch: 15   Global Step: 629730   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:48,329-Speed 2628.18 samples/sec   Loss 3.2248   LearningRate 0.0058   Epoch: 15   Global Step: 629740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:52,234-Speed 2622.73 samples/sec   Loss 3.2964   LearningRate 0.0058   Epoch: 15   Global Step: 629750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:29:56,176-Speed 2598.76 samples/sec   Loss 3.2418   LearningRate 0.0058   Epoch: 15   Global Step: 629760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:00,076-Speed 2626.53 samples/sec   Loss 3.2634   LearningRate 0.0058   Epoch: 15   Global Step: 629770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:03,970-Speed 2630.44 samples/sec   Loss 3.1896   LearningRate 0.0058   Epoch: 15   Global Step: 629780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:07,865-Speed 2629.80 samples/sec   Loss 3.2690   LearningRate 0.0058   Epoch: 15   Global Step: 629790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:11,759-Speed 2630.05 samples/sec   Loss 3.2751   LearningRate 0.0058   Epoch: 15   Global Step: 629800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:15,654-Speed 2630.23 samples/sec   Loss 3.2684   LearningRate 0.0058   Epoch: 15   Global Step: 629810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:30:19,599-Speed 2596.08 samples/sec   Loss 3.2716   LearningRate 0.0058   Epoch: 15   Global Step: 629820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:30:23,495-Speed 2629.52 samples/sec   Loss 3.2854   LearningRate 0.0058   Epoch: 15   Global Step: 629830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:30:27,404-Speed 2620.32 samples/sec   Loss 3.3964   LearningRate 0.0058   Epoch: 15   Global Step: 629840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:30:31,311-Speed 2621.40 samples/sec   Loss 3.2179   LearningRate 0.0058   Epoch: 15   Global Step: 629850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:30:35,221-Speed 2619.42 samples/sec   Loss 3.3094   LearningRate 0.0058   Epoch: 15   Global Step: 629860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:30:39,132-Speed 2619.54 samples/sec   Loss 3.2727   LearningRate 0.0058   Epoch: 15   Global Step: 629870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:30:43,108-Speed 2576.19 samples/sec   Loss 3.3291   LearningRate 0.0058   Epoch: 15   Global Step: 629880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:47,028-Speed 2612.58 samples/sec   Loss 3.2648   LearningRate 0.0058   Epoch: 15   Global Step: 629890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:50,958-Speed 2606.86 samples/sec   Loss 3.2532   LearningRate 0.0058   Epoch: 15   Global Step: 629900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:54,851-Speed 2631.08 samples/sec   Loss 3.2670   LearningRate 0.0058   Epoch: 15   Global Step: 629910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:30:58,777-Speed 2609.02 samples/sec   Loss 3.2908   LearningRate 0.0058   Epoch: 15   Global Step: 629920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:31:02,672-Speed 2629.88 samples/sec   Loss 3.2525   LearningRate 0.0058   Epoch: 15   Global Step: 629930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:31:06,587-Speed 2616.17 samples/sec   Loss 3.2430   LearningRate 0.0058   Epoch: 15   Global Step: 629940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:31:10,515-Speed 2607.26 samples/sec   Loss 3.3337   LearningRate 0.0058   Epoch: 15   Global Step: 629950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:31:14,498-Speed 2571.56 samples/sec   Loss 3.3010   LearningRate 0.0058   Epoch: 15   Global Step: 629960   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:31:18,391-Speed 2631.72 samples/sec   Loss 3.2822   LearningRate 0.0058   Epoch: 15   Global Step: 629970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:31:22,285-Speed 2630.10 samples/sec   Loss 3.3475   LearningRate 0.0058   Epoch: 15   Global Step: 629980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:31:26,179-Speed 2630.74 samples/sec   Loss 3.2898   LearningRate 0.0058   Epoch: 15   Global Step: 629990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:31:30,077-Speed 2627.02 samples/sec   Loss 3.2556   LearningRate 0.0058   Epoch: 15   Global Step: 630000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:32:13,174-[lfw][630000]XNorm: 22.623001
Training: 2022-04-15 18:32:13,175-[lfw][630000]Accuracy-Flip: 0.99800+-0.00256
Training: 2022-04-15 18:32:13,175-[lfw][630000]Accuracy-Highest: 0.99800
Training: 2022-04-15 18:33:03,538-[cfp_fp][630000]XNorm: 21.724245
Training: 2022-04-15 18:33:03,539-[cfp_fp][630000]Accuracy-Flip: 0.99243+-0.00362
Training: 2022-04-15 18:33:03,540-[cfp_fp][630000]Accuracy-Highest: 0.99243
Training: 2022-04-15 18:33:46,912-[agedb_30][630000]XNorm: 22.713212
Training: 2022-04-15 18:33:46,913-[agedb_30][630000]Accuracy-Flip: 0.98050+-0.00633
Training: 2022-04-15 18:33:46,914-[agedb_30][630000]Accuracy-Highest: 0.98150
Training: 2022-04-15 18:33:50,784-Speed 72.78 samples/sec   Loss 3.3057   LearningRate 0.0058   Epoch: 15   Global Step: 630010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:33:54,655-Speed 2646.13 samples/sec   Loss 3.2381   LearningRate 0.0058   Epoch: 15   Global Step: 630020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:33:58,518-Speed 2651.44 samples/sec   Loss 3.2823   LearningRate 0.0058   Epoch: 15   Global Step: 630030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:02,394-Speed 2642.95 samples/sec   Loss 3.2828   LearningRate 0.0058   Epoch: 15   Global Step: 630040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:06,270-Speed 2642.54 samples/sec   Loss 3.2562   LearningRate 0.0058   Epoch: 15   Global Step: 630050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:10,169-Speed 2627.18 samples/sec   Loss 3.2891   LearningRate 0.0058   Epoch: 15   Global Step: 630060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:14,052-Speed 2638.30 samples/sec   Loss 3.3241   LearningRate 0.0058   Epoch: 15   Global Step: 630070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:17,911-Speed 2653.95 samples/sec   Loss 3.2170   LearningRate 0.0058   Epoch: 15   Global Step: 630080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:21,791-Speed 2641.07 samples/sec   Loss 3.2578   LearningRate 0.0058   Epoch: 15   Global Step: 630090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:25,674-Speed 2637.34 samples/sec   Loss 3.2823   LearningRate 0.0058   Epoch: 15   Global Step: 630100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:34:29,544-Speed 2646.69 samples/sec   Loss 3.3010   LearningRate 0.0058   Epoch: 15   Global Step: 630110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:34:33,441-Speed 2628.63 samples/sec   Loss 3.2588   LearningRate 0.0058   Epoch: 15   Global Step: 630120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:34:37,330-Speed 2634.11 samples/sec   Loss 3.3000   LearningRate 0.0058   Epoch: 15   Global Step: 630130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:34:41,226-Speed 2628.53 samples/sec   Loss 3.3259   LearningRate 0.0058   Epoch: 15   Global Step: 630140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:34:45,128-Speed 2625.67 samples/sec   Loss 3.1829   LearningRate 0.0058   Epoch: 15   Global Step: 630150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:34:49,068-Speed 2599.49 samples/sec   Loss 3.2795   LearningRate 0.0058   Epoch: 15   Global Step: 630160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:34:52,967-Speed 2627.46 samples/sec   Loss 3.2219   LearningRate 0.0058   Epoch: 15   Global Step: 630170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:34:56,873-Speed 2622.34 samples/sec   Loss 3.3362   LearningRate 0.0058   Epoch: 15   Global Step: 630180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:35:00,796-Speed 2610.77 samples/sec   Loss 3.3146   LearningRate 0.0058   Epoch: 15   Global Step: 630190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:35:04,698-Speed 2625.50 samples/sec   Loss 3.2438   LearningRate 0.0058   Epoch: 15   Global Step: 630200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:35:08,598-Speed 2625.86 samples/sec   Loss 3.2640   LearningRate 0.0058   Epoch: 15   Global Step: 630210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:12,510-Speed 2618.13 samples/sec   Loss 3.3767   LearningRate 0.0058   Epoch: 15   Global Step: 630220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:16,430-Speed 2613.07 samples/sec   Loss 3.3272   LearningRate 0.0058   Epoch: 15   Global Step: 630230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:20,336-Speed 2622.93 samples/sec   Loss 3.2400   LearningRate 0.0058   Epoch: 15   Global Step: 630240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:24,239-Speed 2624.12 samples/sec   Loss 3.2969   LearningRate 0.0058   Epoch: 15   Global Step: 630250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:28,150-Speed 2618.92 samples/sec   Loss 3.3221   LearningRate 0.0058   Epoch: 15   Global Step: 630260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:32,067-Speed 2615.19 samples/sec   Loss 3.2483   LearningRate 0.0058   Epoch: 15   Global Step: 630270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:35,979-Speed 2617.70 samples/sec   Loss 3.3636   LearningRate 0.0058   Epoch: 15   Global Step: 630280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:39,903-Speed 2610.55 samples/sec   Loss 3.2878   LearningRate 0.0058   Epoch: 15   Global Step: 630290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:43,813-Speed 2619.69 samples/sec   Loss 3.2771   LearningRate 0.0058   Epoch: 15   Global Step: 630300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:47,729-Speed 2615.17 samples/sec   Loss 3.3002   LearningRate 0.0058   Epoch: 15   Global Step: 630310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 18:35:51,623-Speed 2630.80 samples/sec   Loss 3.2446   LearningRate 0.0058   Epoch: 15   Global Step: 630320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:55,540-Speed 2615.10 samples/sec   Loss 3.2728   LearningRate 0.0058   Epoch: 15   Global Step: 630330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:35:59,439-Speed 2627.14 samples/sec   Loss 3.1827   LearningRate 0.0058   Epoch: 15   Global Step: 630340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:03,365-Speed 2608.36 samples/sec   Loss 3.2698   LearningRate 0.0058   Epoch: 15   Global Step: 630350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:07,294-Speed 2607.15 samples/sec   Loss 3.3292   LearningRate 0.0058   Epoch: 15   Global Step: 630360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:11,230-Speed 2601.98 samples/sec   Loss 3.3081   LearningRate 0.0058   Epoch: 15   Global Step: 630370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:15,216-Speed 2570.23 samples/sec   Loss 3.2952   LearningRate 0.0058   Epoch: 15   Global Step: 630380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:19,174-Speed 2587.88 samples/sec   Loss 3.2552   LearningRate 0.0058   Epoch: 15   Global Step: 630390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:23,140-Speed 2582.19 samples/sec   Loss 3.2954   LearningRate 0.0058   Epoch: 15   Global Step: 630400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:27,112-Speed 2578.99 samples/sec   Loss 3.2798   LearningRate 0.0058   Epoch: 15   Global Step: 630410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:31,101-Speed 2567.81 samples/sec   Loss 3.2748   LearningRate 0.0058   Epoch: 15   Global Step: 630420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:35,079-Speed 2574.59 samples/sec   Loss 3.2730   LearningRate 0.0058   Epoch: 15   Global Step: 630430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:36:39,065-Speed 2569.64 samples/sec   Loss 3.2463   LearningRate 0.0058   Epoch: 15   Global Step: 630440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:36:43,050-Speed 2570.34 samples/sec   Loss 3.2217   LearningRate 0.0058   Epoch: 15   Global Step: 630450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:36:47,055-Speed 2557.28 samples/sec   Loss 3.2386   LearningRate 0.0058   Epoch: 15   Global Step: 630460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:36:51,063-Speed 2555.55 samples/sec   Loss 3.2048   LearningRate 0.0058   Epoch: 15   Global Step: 630470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:36:55,045-Speed 2572.39 samples/sec   Loss 3.2043   LearningRate 0.0058   Epoch: 15   Global Step: 630480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:36:59,068-Speed 2546.44 samples/sec   Loss 3.2252   LearningRate 0.0058   Epoch: 15   Global Step: 630490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:03,079-Speed 2553.69 samples/sec   Loss 3.2763   LearningRate 0.0058   Epoch: 15   Global Step: 630500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:07,078-Speed 2561.47 samples/sec   Loss 3.2817   LearningRate 0.0058   Epoch: 15   Global Step: 630510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:11,063-Speed 2570.69 samples/sec   Loss 3.2862   LearningRate 0.0058   Epoch: 15   Global Step: 630520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:15,043-Speed 2573.14 samples/sec   Loss 3.2894   LearningRate 0.0058   Epoch: 15   Global Step: 630530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:19,012-Speed 2580.25 samples/sec   Loss 3.3375   LearningRate 0.0058   Epoch: 15   Global Step: 630540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 18:37:22,986-Speed 2577.64 samples/sec   Loss 3.2222   LearningRate 0.0058   Epoch: 15   Global Step: 630550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 18:37:26,932-Speed 2595.35 samples/sec   Loss 3.2492   LearningRate 0.0058   Epoch: 15   Global Step: 630560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:30,896-Speed 2584.36 samples/sec   Loss 3.2252   LearningRate 0.0058   Epoch: 15   Global Step: 630570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:34,850-Speed 2590.97 samples/sec   Loss 3.3811   LearningRate 0.0058   Epoch: 15   Global Step: 630580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:38,820-Speed 2579.61 samples/sec   Loss 3.2550   LearningRate 0.0058   Epoch: 15   Global Step: 630590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:37:42,754-Speed 2603.34 samples/sec   Loss 3.2214   LearningRate 0.0058   Epoch: 15   Global Step: 630600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:37:46,709-Speed 2589.71 samples/sec   Loss 3.2836   LearningRate 0.0058   Epoch: 15   Global Step: 630610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:37:50,679-Speed 2580.46 samples/sec   Loss 3.3183   LearningRate 0.0058   Epoch: 15   Global Step: 630620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:37:54,636-Speed 2588.42 samples/sec   Loss 3.1937   LearningRate 0.0058   Epoch: 15   Global Step: 630630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:37:58,582-Speed 2596.32 samples/sec   Loss 3.2260   LearningRate 0.0058   Epoch: 15   Global Step: 630640   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:02,521-Speed 2600.52 samples/sec   Loss 3.2919   LearningRate 0.0058   Epoch: 15   Global Step: 630650   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:06,500-Speed 2573.78 samples/sec   Loss 3.2698   LearningRate 0.0057   Epoch: 15   Global Step: 630660   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:10,452-Speed 2591.61 samples/sec   Loss 3.3855   LearningRate 0.0057   Epoch: 15   Global Step: 630670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:14,387-Speed 2603.16 samples/sec   Loss 3.2928   LearningRate 0.0057   Epoch: 15   Global Step: 630680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:18,325-Speed 2600.54 samples/sec   Loss 3.3212   LearningRate 0.0057   Epoch: 15   Global Step: 630690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:22,270-Speed 2596.81 samples/sec   Loss 3.2280   LearningRate 0.0057   Epoch: 15   Global Step: 630700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:38:26,212-Speed 2599.03 samples/sec   Loss 3.2569   LearningRate 0.0057   Epoch: 15   Global Step: 630710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:38:30,141-Speed 2606.98 samples/sec   Loss 3.2604   LearningRate 0.0057   Epoch: 15   Global Step: 630720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:38:34,065-Speed 2610.23 samples/sec   Loss 3.1964   LearningRate 0.0057   Epoch: 15   Global Step: 630730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:38:37,967-Speed 2624.47 samples/sec   Loss 3.2527   LearningRate 0.0057   Epoch: 15   Global Step: 630740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:41,892-Speed 2610.85 samples/sec   Loss 3.2233   LearningRate 0.0057   Epoch: 15   Global Step: 630750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:45,808-Speed 2616.04 samples/sec   Loss 3.2604   LearningRate 0.0057   Epoch: 15   Global Step: 630760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:49,728-Speed 2612.66 samples/sec   Loss 3.2957   LearningRate 0.0057   Epoch: 15   Global Step: 630770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:53,676-Speed 2594.75 samples/sec   Loss 3.2761   LearningRate 0.0057   Epoch: 15   Global Step: 630780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:38:57,592-Speed 2616.01 samples/sec   Loss 3.2509   LearningRate 0.0057   Epoch: 15   Global Step: 630790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:01,521-Speed 2606.76 samples/sec   Loss 3.2827   LearningRate 0.0057   Epoch: 15   Global Step: 630800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:05,464-Speed 2597.47 samples/sec   Loss 3.2313   LearningRate 0.0057   Epoch: 15   Global Step: 630810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:09,384-Speed 2612.90 samples/sec   Loss 3.2567   LearningRate 0.0057   Epoch: 15   Global Step: 630820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:13,375-Speed 2566.29 samples/sec   Loss 3.2663   LearningRate 0.0057   Epoch: 15   Global Step: 630830   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:17,383-Speed 2556.23 samples/sec   Loss 3.2381   LearningRate 0.0057   Epoch: 15   Global Step: 630840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:39:21,301-Speed 2614.50 samples/sec   Loss 3.3542   LearningRate 0.0057   Epoch: 15   Global Step: 630850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:39:25,222-Speed 2611.95 samples/sec   Loss 3.3250   LearningRate 0.0057   Epoch: 15   Global Step: 630860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:39:29,162-Speed 2600.25 samples/sec   Loss 3.2462   LearningRate 0.0057   Epoch: 15   Global Step: 630870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:39:33,075-Speed 2617.58 samples/sec   Loss 3.2893   LearningRate 0.0057   Epoch: 15   Global Step: 630880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:39:37,008-Speed 2604.16 samples/sec   Loss 3.3042   LearningRate 0.0057   Epoch: 15   Global Step: 630890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:39:40,899-Speed 2632.33 samples/sec   Loss 3.2021   LearningRate 0.0057   Epoch: 15   Global Step: 630900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:44,847-Speed 2594.75 samples/sec   Loss 3.2340   LearningRate 0.0057   Epoch: 15   Global Step: 630910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:48,753-Speed 2622.35 samples/sec   Loss 3.2507   LearningRate 0.0057   Epoch: 15   Global Step: 630920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:52,663-Speed 2619.72 samples/sec   Loss 3.2629   LearningRate 0.0057   Epoch: 15   Global Step: 630930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:39:56,579-Speed 2615.51 samples/sec   Loss 3.2102   LearningRate 0.0057   Epoch: 15   Global Step: 630940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:00,494-Speed 2616.46 samples/sec   Loss 3.2516   LearningRate 0.0057   Epoch: 15   Global Step: 630950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:04,418-Speed 2609.89 samples/sec   Loss 3.2894   LearningRate 0.0057   Epoch: 15   Global Step: 630960   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:08,332-Speed 2617.13 samples/sec   Loss 3.2101   LearningRate 0.0057   Epoch: 15   Global Step: 630970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:12,251-Speed 2613.12 samples/sec   Loss 3.2415   LearningRate 0.0057   Epoch: 15   Global Step: 630980   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:16,170-Speed 2614.06 samples/sec   Loss 3.2899   LearningRate 0.0057   Epoch: 15   Global Step: 630990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:20,083-Speed 2617.37 samples/sec   Loss 3.3121   LearningRate 0.0057   Epoch: 15   Global Step: 631000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:40:23,998-Speed 2616.78 samples/sec   Loss 3.2175   LearningRate 0.0057   Epoch: 15   Global Step: 631010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:40:27,949-Speed 2592.47 samples/sec   Loss 3.2495   LearningRate 0.0057   Epoch: 15   Global Step: 631020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:40:31,866-Speed 2615.52 samples/sec   Loss 3.2613   LearningRate 0.0057   Epoch: 15   Global Step: 631030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:40:35,773-Speed 2621.29 samples/sec   Loss 3.3308   LearningRate 0.0057   Epoch: 15   Global Step: 631040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:40:39,689-Speed 2615.01 samples/sec   Loss 3.2524   LearningRate 0.0057   Epoch: 15   Global Step: 631050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:40:43,579-Speed 2633.30 samples/sec   Loss 3.2612   LearningRate 0.0057   Epoch: 15   Global Step: 631060   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:47,495-Speed 2615.58 samples/sec   Loss 3.2423   LearningRate 0.0057   Epoch: 15   Global Step: 631070   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:51,432-Speed 2602.27 samples/sec   Loss 3.2414   LearningRate 0.0057   Epoch: 15   Global Step: 631080   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:55,339-Speed 2621.06 samples/sec   Loss 3.3066   LearningRate 0.0057   Epoch: 15   Global Step: 631090   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:40:59,262-Speed 2611.31 samples/sec   Loss 3.3446   LearningRate 0.0057   Epoch: 15   Global Step: 631100   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:03,193-Speed 2605.56 samples/sec   Loss 3.2861   LearningRate 0.0057   Epoch: 15   Global Step: 631110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:07,103-Speed 2619.40 samples/sec   Loss 3.2354   LearningRate 0.0057   Epoch: 15   Global Step: 631120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:11,024-Speed 2612.10 samples/sec   Loss 3.2673   LearningRate 0.0057   Epoch: 15   Global Step: 631130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:14,945-Speed 2612.30 samples/sec   Loss 3.2476   LearningRate 0.0057   Epoch: 15   Global Step: 631140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:18,958-Speed 2552.95 samples/sec   Loss 3.2500   LearningRate 0.0057   Epoch: 15   Global Step: 631150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:22,875-Speed 2615.54 samples/sec   Loss 3.2497   LearningRate 0.0057   Epoch: 15   Global Step: 631160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:41:26,796-Speed 2611.83 samples/sec   Loss 3.2416   LearningRate 0.0057   Epoch: 15   Global Step: 631170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:41:30,709-Speed 2618.21 samples/sec   Loss 3.3173   LearningRate 0.0057   Epoch: 15   Global Step: 631180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:41:34,625-Speed 2615.54 samples/sec   Loss 3.3228   LearningRate 0.0057   Epoch: 15   Global Step: 631190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:41:38,544-Speed 2612.92 samples/sec   Loss 3.1790   LearningRate 0.0057   Epoch: 15   Global Step: 631200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:41:42,431-Speed 2634.82 samples/sec   Loss 3.1857   LearningRate 0.0057   Epoch: 15   Global Step: 631210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:46,592-Speed 2461.95 samples/sec   Loss 3.2712   LearningRate 0.0057   Epoch: 15   Global Step: 631220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:50,605-Speed 2552.19 samples/sec   Loss 3.3287   LearningRate 0.0057   Epoch: 15   Global Step: 631230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:54,516-Speed 2619.30 samples/sec   Loss 3.3415   LearningRate 0.0057   Epoch: 15   Global Step: 631240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:41:58,431-Speed 2616.14 samples/sec   Loss 3.3010   LearningRate 0.0057   Epoch: 15   Global Step: 631250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:02,344-Speed 2618.15 samples/sec   Loss 3.2348   LearningRate 0.0057   Epoch: 15   Global Step: 631260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:06,271-Speed 2608.08 samples/sec   Loss 3.3492   LearningRate 0.0057   Epoch: 15   Global Step: 631270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:10,179-Speed 2620.95 samples/sec   Loss 3.1578   LearningRate 0.0057   Epoch: 15   Global Step: 631280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:14,088-Speed 2620.03 samples/sec   Loss 3.3008   LearningRate 0.0057   Epoch: 15   Global Step: 631290   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:18,007-Speed 2613.91 samples/sec   Loss 3.2152   LearningRate 0.0057   Epoch: 15   Global Step: 631300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:21,923-Speed 2616.26 samples/sec   Loss 3.3015   LearningRate 0.0057   Epoch: 15   Global Step: 631310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:42:25,835-Speed 2618.30 samples/sec   Loss 3.1807   LearningRate 0.0057   Epoch: 15   Global Step: 631320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:42:29,756-Speed 2611.90 samples/sec   Loss 3.2878   LearningRate 0.0057   Epoch: 15   Global Step: 631330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:42:33,665-Speed 2620.82 samples/sec   Loss 3.2497   LearningRate 0.0057   Epoch: 15   Global Step: 631340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:42:37,553-Speed 2635.15 samples/sec   Loss 3.2371   LearningRate 0.0057   Epoch: 15   Global Step: 631350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:41,466-Speed 2617.19 samples/sec   Loss 3.3297   LearningRate 0.0057   Epoch: 15   Global Step: 631360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:45,380-Speed 2617.03 samples/sec   Loss 3.3222   LearningRate 0.0057   Epoch: 15   Global Step: 631370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:49,286-Speed 2622.33 samples/sec   Loss 3.2301   LearningRate 0.0057   Epoch: 15   Global Step: 631380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:53,196-Speed 2619.15 samples/sec   Loss 3.1761   LearningRate 0.0057   Epoch: 15   Global Step: 631390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:42:57,104-Speed 2621.13 samples/sec   Loss 3.2814   LearningRate 0.0057   Epoch: 15   Global Step: 631400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:43:01,023-Speed 2613.25 samples/sec   Loss 3.2744   LearningRate 0.0057   Epoch: 15   Global Step: 631410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:43:04,965-Speed 2599.42 samples/sec   Loss 3.2041   LearningRate 0.0057   Epoch: 15   Global Step: 631420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:43:08,908-Speed 2597.49 samples/sec   Loss 3.1877   LearningRate 0.0057   Epoch: 15   Global Step: 631430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:43:12,818-Speed 2619.44 samples/sec   Loss 3.1995   LearningRate 0.0057   Epoch: 15   Global Step: 631440   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:43:16,722-Speed 2623.30 samples/sec   Loss 3.1926   LearningRate 0.0057   Epoch: 15   Global Step: 631450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:20,636-Speed 2617.36 samples/sec   Loss 3.1980   LearningRate 0.0057   Epoch: 15   Global Step: 631460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:24,539-Speed 2624.39 samples/sec   Loss 3.3141   LearningRate 0.0057   Epoch: 15   Global Step: 631470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:28,448-Speed 2620.33 samples/sec   Loss 3.2727   LearningRate 0.0057   Epoch: 15   Global Step: 631480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:32,355-Speed 2621.94 samples/sec   Loss 3.3009   LearningRate 0.0057   Epoch: 15   Global Step: 631490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:36,257-Speed 2625.60 samples/sec   Loss 3.2771   LearningRate 0.0057   Epoch: 15   Global Step: 631500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:40,163-Speed 2622.12 samples/sec   Loss 3.2465   LearningRate 0.0057   Epoch: 15   Global Step: 631510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:44,076-Speed 2617.16 samples/sec   Loss 3.2385   LearningRate 0.0057   Epoch: 15   Global Step: 631520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:47,986-Speed 2619.77 samples/sec   Loss 3.2786   LearningRate 0.0057   Epoch: 15   Global Step: 631530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:51,894-Speed 2621.13 samples/sec   Loss 3.2886   LearningRate 0.0057   Epoch: 15   Global Step: 631540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:55,776-Speed 2638.35 samples/sec   Loss 3.1771   LearningRate 0.0057   Epoch: 15   Global Step: 631550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:43:59,706-Speed 2606.98 samples/sec   Loss 3.2654   LearningRate 0.0057   Epoch: 15   Global Step: 631560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:03,616-Speed 2619.44 samples/sec   Loss 3.2551   LearningRate 0.0057   Epoch: 15   Global Step: 631570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:07,517-Speed 2625.43 samples/sec   Loss 3.1921   LearningRate 0.0057   Epoch: 15   Global Step: 631580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:11,439-Speed 2611.93 samples/sec   Loss 3.2281   LearningRate 0.0057   Epoch: 15   Global Step: 631590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:15,344-Speed 2623.09 samples/sec   Loss 3.1431   LearningRate 0.0057   Epoch: 15   Global Step: 631600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:19,249-Speed 2622.64 samples/sec   Loss 3.2705   LearningRate 0.0057   Epoch: 15   Global Step: 631610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:23,158-Speed 2620.12 samples/sec   Loss 3.2564   LearningRate 0.0057   Epoch: 15   Global Step: 631620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:27,101-Speed 2598.23 samples/sec   Loss 3.3861   LearningRate 0.0057   Epoch: 15   Global Step: 631630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:31,020-Speed 2613.48 samples/sec   Loss 3.2485   LearningRate 0.0057   Epoch: 15   Global Step: 631640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:34,901-Speed 2639.54 samples/sec   Loss 3.2247   LearningRate 0.0057   Epoch: 15   Global Step: 631650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:38,804-Speed 2623.65 samples/sec   Loss 3.2427   LearningRate 0.0057   Epoch: 15   Global Step: 631660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:42,719-Speed 2616.72 samples/sec   Loss 3.2681   LearningRate 0.0057   Epoch: 15   Global Step: 631670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:46,622-Speed 2624.36 samples/sec   Loss 3.1229   LearningRate 0.0057   Epoch: 15   Global Step: 631680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:50,529-Speed 2621.93 samples/sec   Loss 3.2439   LearningRate 0.0057   Epoch: 15   Global Step: 631690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:54,441-Speed 2618.62 samples/sec   Loss 3.2914   LearningRate 0.0057   Epoch: 15   Global Step: 631700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:44:58,349-Speed 2620.73 samples/sec   Loss 3.1741   LearningRate 0.0057   Epoch: 15   Global Step: 631710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:45:02,261-Speed 2618.70 samples/sec   Loss 3.3941   LearningRate 0.0057   Epoch: 15   Global Step: 631720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:45:06,166-Speed 2622.71 samples/sec   Loss 3.3095   LearningRate 0.0057   Epoch: 15   Global Step: 631730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:45:10,107-Speed 2599.06 samples/sec   Loss 3.1738   LearningRate 0.0057   Epoch: 15   Global Step: 631740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:45:14,004-Speed 2628.98 samples/sec   Loss 3.3541   LearningRate 0.0057   Epoch: 15   Global Step: 631750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:45:17,896-Speed 2631.46 samples/sec   Loss 3.3057   LearningRate 0.0057   Epoch: 15   Global Step: 631760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:21,821-Speed 2609.94 samples/sec   Loss 3.2921   LearningRate 0.0057   Epoch: 15   Global Step: 631770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:25,722-Speed 2625.58 samples/sec   Loss 3.1726   LearningRate 0.0057   Epoch: 15   Global Step: 631780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:29,630-Speed 2621.00 samples/sec   Loss 3.1903   LearningRate 0.0057   Epoch: 15   Global Step: 631790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:33,541-Speed 2619.18 samples/sec   Loss 3.2032   LearningRate 0.0057   Epoch: 15   Global Step: 631800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:37,454-Speed 2617.09 samples/sec   Loss 3.2414   LearningRate 0.0057   Epoch: 15   Global Step: 631810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:41,376-Speed 2611.72 samples/sec   Loss 3.2881   LearningRate 0.0057   Epoch: 15   Global Step: 631820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:45,300-Speed 2610.72 samples/sec   Loss 3.2422   LearningRate 0.0057   Epoch: 15   Global Step: 631830   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:49,215-Speed 2615.76 samples/sec   Loss 3.2150   LearningRate 0.0057   Epoch: 15   Global Step: 631840   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:53,139-Speed 2610.55 samples/sec   Loss 3.2023   LearningRate 0.0057   Epoch: 15   Global Step: 631850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:45:57,040-Speed 2625.60 samples/sec   Loss 3.1433   LearningRate 0.0057   Epoch: 15   Global Step: 631860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:00,959-Speed 2613.83 samples/sec   Loss 3.2487   LearningRate 0.0057   Epoch: 15   Global Step: 631870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:04,864-Speed 2623.05 samples/sec   Loss 3.2960   LearningRate 0.0057   Epoch: 15   Global Step: 631880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:08,768-Speed 2623.68 samples/sec   Loss 3.2206   LearningRate 0.0057   Epoch: 15   Global Step: 631890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:12,670-Speed 2624.54 samples/sec   Loss 3.2170   LearningRate 0.0057   Epoch: 15   Global Step: 631900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:16,569-Speed 2627.73 samples/sec   Loss 3.2796   LearningRate 0.0057   Epoch: 15   Global Step: 631910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:20,468-Speed 2626.35 samples/sec   Loss 3.2938   LearningRate 0.0057   Epoch: 15   Global Step: 631920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:24,408-Speed 2600.27 samples/sec   Loss 3.2185   LearningRate 0.0057   Epoch: 15   Global Step: 631930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:46:28,289-Speed 2638.97 samples/sec   Loss 3.1730   LearningRate 0.0057   Epoch: 15   Global Step: 631940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:32,190-Speed 2625.79 samples/sec   Loss 3.1518   LearningRate 0.0057   Epoch: 15   Global Step: 631950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:36,109-Speed 2613.63 samples/sec   Loss 3.2123   LearningRate 0.0057   Epoch: 15   Global Step: 631960   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:40,029-Speed 2612.36 samples/sec   Loss 3.1815   LearningRate 0.0057   Epoch: 15   Global Step: 631970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:43,944-Speed 2616.48 samples/sec   Loss 3.3312   LearningRate 0.0057   Epoch: 15   Global Step: 631980   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:47,846-Speed 2625.43 samples/sec   Loss 3.2539   LearningRate 0.0057   Epoch: 15   Global Step: 631990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:51,782-Speed 2602.76 samples/sec   Loss 3.2316   LearningRate 0.0057   Epoch: 15   Global Step: 632000   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:55,711-Speed 2606.58 samples/sec   Loss 3.2445   LearningRate 0.0057   Epoch: 15   Global Step: 632010   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:46:59,630-Speed 2614.19 samples/sec   Loss 3.2453   LearningRate 0.0057   Epoch: 15   Global Step: 632020   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:47:03,538-Speed 2620.70 samples/sec   Loss 3.2865   LearningRate 0.0057   Epoch: 15   Global Step: 632030   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:47:07,446-Speed 2620.86 samples/sec   Loss 3.1843   LearningRate 0.0057   Epoch: 15   Global Step: 632040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:47:11,360-Speed 2616.85 samples/sec   Loss 3.3031   LearningRate 0.0057   Epoch: 15   Global Step: 632050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:47:15,345-Speed 2570.78 samples/sec   Loss 3.2541   LearningRate 0.0057   Epoch: 15   Global Step: 632060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:47:19,414-Speed 2517.26 samples/sec   Loss 3.2429   LearningRate 0.0057   Epoch: 15   Global Step: 632070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:47:23,286-Speed 2645.26 samples/sec   Loss 3.2783   LearningRate 0.0057   Epoch: 15   Global Step: 632080   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:27,189-Speed 2624.38 samples/sec   Loss 3.2545   LearningRate 0.0057   Epoch: 15   Global Step: 632090   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:31,098-Speed 2620.37 samples/sec   Loss 3.2715   LearningRate 0.0057   Epoch: 15   Global Step: 632100   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:35,049-Speed 2592.49 samples/sec   Loss 3.2379   LearningRate 0.0057   Epoch: 15   Global Step: 632110   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:38,953-Speed 2623.39 samples/sec   Loss 3.2366   LearningRate 0.0057   Epoch: 15   Global Step: 632120   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:42,865-Speed 2618.64 samples/sec   Loss 3.1993   LearningRate 0.0057   Epoch: 15   Global Step: 632130   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:46,835-Speed 2580.23 samples/sec   Loss 3.2775   LearningRate 0.0057   Epoch: 15   Global Step: 632140   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:50,739-Speed 2623.42 samples/sec   Loss 3.2717   LearningRate 0.0057   Epoch: 15   Global Step: 632150   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:54,640-Speed 2626.31 samples/sec   Loss 3.2754   LearningRate 0.0057   Epoch: 15   Global Step: 632160   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:47:58,632-Speed 2565.15 samples/sec   Loss 3.2124   LearningRate 0.0057   Epoch: 15   Global Step: 632170   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:48:02,534-Speed 2625.82 samples/sec   Loss 3.1857   LearningRate 0.0057   Epoch: 15   Global Step: 632180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:06,437-Speed 2623.59 samples/sec   Loss 3.2383   LearningRate 0.0057   Epoch: 15   Global Step: 632190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:10,335-Speed 2627.91 samples/sec   Loss 3.2057   LearningRate 0.0057   Epoch: 15   Global Step: 632200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:14,243-Speed 2620.42 samples/sec   Loss 3.1965   LearningRate 0.0057   Epoch: 15   Global Step: 632210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:18,151-Speed 2621.16 samples/sec   Loss 3.1592   LearningRate 0.0057   Epoch: 15   Global Step: 632220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:22,075-Speed 2611.04 samples/sec   Loss 3.2458   LearningRate 0.0057   Epoch: 15   Global Step: 632230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:25,997-Speed 2611.62 samples/sec   Loss 3.2041   LearningRate 0.0057   Epoch: 15   Global Step: 632240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:29,899-Speed 2630.60 samples/sec   Loss 3.2013   LearningRate 0.0057   Epoch: 15   Global Step: 632250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:33,795-Speed 2629.29 samples/sec   Loss 3.2060   LearningRate 0.0057   Epoch: 15   Global Step: 632260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:37,698-Speed 2624.49 samples/sec   Loss 3.1832   LearningRate 0.0057   Epoch: 15   Global Step: 632270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:41,599-Speed 2625.28 samples/sec   Loss 3.2272   LearningRate 0.0057   Epoch: 15   Global Step: 632280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:48:45,522-Speed 2610.91 samples/sec   Loss 3.2683   LearningRate 0.0057   Epoch: 15   Global Step: 632290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:48:49,409-Speed 2635.32 samples/sec   Loss 3.2678   LearningRate 0.0057   Epoch: 15   Global Step: 632300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:53,318-Speed 2620.49 samples/sec   Loss 3.2815   LearningRate 0.0057   Epoch: 15   Global Step: 632310   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:48:57,263-Speed 2597.08 samples/sec   Loss 3.2312   LearningRate 0.0057   Epoch: 15   Global Step: 632320   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:01,164-Speed 2626.23 samples/sec   Loss 3.2650   LearningRate 0.0057   Epoch: 15   Global Step: 632330   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:05,090-Speed 2608.88 samples/sec   Loss 3.1829   LearningRate 0.0057   Epoch: 15   Global Step: 632340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:09,028-Speed 2600.94 samples/sec   Loss 3.2914   LearningRate 0.0057   Epoch: 15   Global Step: 632350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:12,925-Speed 2628.19 samples/sec   Loss 3.3087   LearningRate 0.0057   Epoch: 15   Global Step: 632360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:16,828-Speed 2624.00 samples/sec   Loss 3.2329   LearningRate 0.0057   Epoch: 15   Global Step: 632370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:20,727-Speed 2627.65 samples/sec   Loss 3.2563   LearningRate 0.0057   Epoch: 15   Global Step: 632380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:24,624-Speed 2628.30 samples/sec   Loss 3.2241   LearningRate 0.0057   Epoch: 15   Global Step: 632390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:49:28,507-Speed 2638.26 samples/sec   Loss 3.2930   LearningRate 0.0056   Epoch: 15   Global Step: 632400   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:32,405-Speed 2627.15 samples/sec   Loss 3.2994   LearningRate 0.0056   Epoch: 15   Global Step: 632410   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:36,303-Speed 2627.99 samples/sec   Loss 3.2270   LearningRate 0.0056   Epoch: 15   Global Step: 632420   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:40,204-Speed 2626.01 samples/sec   Loss 3.2182   LearningRate 0.0056   Epoch: 15   Global Step: 632430   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:44,104-Speed 2626.25 samples/sec   Loss 3.2325   LearningRate 0.0056   Epoch: 15   Global Step: 632440   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:48,005-Speed 2625.30 samples/sec   Loss 3.2246   LearningRate 0.0056   Epoch: 15   Global Step: 632450   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:51,911-Speed 2626.92 samples/sec   Loss 3.0998   LearningRate 0.0056   Epoch: 15   Global Step: 632460   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:55,814-Speed 2623.66 samples/sec   Loss 3.2078   LearningRate 0.0056   Epoch: 15   Global Step: 632470   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:49:59,826-Speed 2553.72 samples/sec   Loss 3.1705   LearningRate 0.0056   Epoch: 15   Global Step: 632480   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:50:03,725-Speed 2627.13 samples/sec   Loss 3.2723   LearningRate 0.0056   Epoch: 15   Global Step: 632490   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:50:07,629-Speed 2623.34 samples/sec   Loss 3.3210   LearningRate 0.0056   Epoch: 15   Global Step: 632500   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:11,544-Speed 2616.20 samples/sec   Loss 3.1536   LearningRate 0.0056   Epoch: 15   Global Step: 632510   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:15,480-Speed 2602.64 samples/sec   Loss 3.2415   LearningRate 0.0056   Epoch: 15   Global Step: 632520   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:19,374-Speed 2630.03 samples/sec   Loss 3.2709   LearningRate 0.0056   Epoch: 15   Global Step: 632530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:23,303-Speed 2607.30 samples/sec   Loss 3.3303   LearningRate 0.0056   Epoch: 15   Global Step: 632540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:27,206-Speed 2624.99 samples/sec   Loss 3.2272   LearningRate 0.0056   Epoch: 15   Global Step: 632550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:31,106-Speed 2626.03 samples/sec   Loss 3.2805   LearningRate 0.0056   Epoch: 15   Global Step: 632560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:35,012-Speed 2621.62 samples/sec   Loss 3.2785   LearningRate 0.0056   Epoch: 15   Global Step: 632570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:38,941-Speed 2607.53 samples/sec   Loss 3.2232   LearningRate 0.0056   Epoch: 15   Global Step: 632580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:42,841-Speed 2626.62 samples/sec   Loss 3.3282   LearningRate 0.0056   Epoch: 15   Global Step: 632590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:46,753-Speed 2617.63 samples/sec   Loss 3.3080   LearningRate 0.0056   Epoch: 15   Global Step: 632600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:50:50,828-Speed 2514.23 samples/sec   Loss 3.2723   LearningRate 0.0056   Epoch: 15   Global Step: 632610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:54,905-Speed 2511.82 samples/sec   Loss 3.2005   LearningRate 0.0056   Epoch: 15   Global Step: 632620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:50:58,866-Speed 2586.97 samples/sec   Loss 3.2288   LearningRate 0.0056   Epoch: 15   Global Step: 632630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:02,769-Speed 2624.13 samples/sec   Loss 3.3249   LearningRate 0.0056   Epoch: 15   Global Step: 632640   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:06,670-Speed 2625.48 samples/sec   Loss 3.1587   LearningRate 0.0056   Epoch: 15   Global Step: 632650   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:10,574-Speed 2623.83 samples/sec   Loss 3.2156   LearningRate 0.0056   Epoch: 15   Global Step: 632660   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:14,572-Speed 2562.24 samples/sec   Loss 3.2523   LearningRate 0.0056   Epoch: 15   Global Step: 632670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:18,466-Speed 2630.10 samples/sec   Loss 3.2918   LearningRate 0.0056   Epoch: 15   Global Step: 632680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:22,396-Speed 2607.09 samples/sec   Loss 3.2337   LearningRate 0.0056   Epoch: 15   Global Step: 632690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:26,299-Speed 2624.65 samples/sec   Loss 3.2192   LearningRate 0.0056   Epoch: 15   Global Step: 632700   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:30,230-Speed 2605.97 samples/sec   Loss 3.2120   LearningRate 0.0056   Epoch: 15   Global Step: 632710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:51:34,145-Speed 2616.17 samples/sec   Loss 3.2261   LearningRate 0.0056   Epoch: 15   Global Step: 632720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:51:38,042-Speed 2627.73 samples/sec   Loss 3.1911   LearningRate 0.0056   Epoch: 15   Global Step: 632730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:51:41,937-Speed 2629.37 samples/sec   Loss 3.2911   LearningRate 0.0056   Epoch: 15   Global Step: 632740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:45,846-Speed 2620.93 samples/sec   Loss 3.1871   LearningRate 0.0056   Epoch: 15   Global Step: 632750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:49,782-Speed 2602.16 samples/sec   Loss 3.3014   LearningRate 0.0056   Epoch: 15   Global Step: 632760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:53,691-Speed 2620.62 samples/sec   Loss 3.2826   LearningRate 0.0056   Epoch: 15   Global Step: 632770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:51:57,596-Speed 2623.11 samples/sec   Loss 3.1756   LearningRate 0.0056   Epoch: 15   Global Step: 632780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:01,506-Speed 2619.64 samples/sec   Loss 3.2010   LearningRate 0.0056   Epoch: 15   Global Step: 632790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:05,409-Speed 2624.84 samples/sec   Loss 3.2509   LearningRate 0.0056   Epoch: 15   Global Step: 632800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:09,305-Speed 2628.44 samples/sec   Loss 3.1637   LearningRate 0.0056   Epoch: 15   Global Step: 632810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:13,211-Speed 2622.21 samples/sec   Loss 3.2330   LearningRate 0.0056   Epoch: 15   Global Step: 632820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:17,142-Speed 2606.36 samples/sec   Loss 3.2264   LearningRate 0.0056   Epoch: 15   Global Step: 632830   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:21,049-Speed 2621.91 samples/sec   Loss 3.2838   LearningRate 0.0056   Epoch: 15   Global Step: 632840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:52:24,935-Speed 2636.38 samples/sec   Loss 3.2159   LearningRate 0.0056   Epoch: 15   Global Step: 632850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:28,835-Speed 2626.01 samples/sec   Loss 3.2662   LearningRate 0.0056   Epoch: 15   Global Step: 632860   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:32,733-Speed 2628.16 samples/sec   Loss 3.3360   LearningRate 0.0056   Epoch: 15   Global Step: 632870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:36,632-Speed 2627.03 samples/sec   Loss 3.2064   LearningRate 0.0056   Epoch: 15   Global Step: 632880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:40,529-Speed 2628.37 samples/sec   Loss 3.2642   LearningRate 0.0056   Epoch: 15   Global Step: 632890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:44,426-Speed 2627.68 samples/sec   Loss 3.2368   LearningRate 0.0056   Epoch: 15   Global Step: 632900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:48,327-Speed 2626.59 samples/sec   Loss 3.2812   LearningRate 0.0056   Epoch: 15   Global Step: 632910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:52,235-Speed 2621.76 samples/sec   Loss 3.2039   LearningRate 0.0056   Epoch: 15   Global Step: 632920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:52:56,146-Speed 2619.06 samples/sec   Loss 3.2475   LearningRate 0.0056   Epoch: 15   Global Step: 632930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:00,048-Speed 2625.37 samples/sec   Loss 3.2546   LearningRate 0.0056   Epoch: 15   Global Step: 632940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:03,950-Speed 2624.94 samples/sec   Loss 3.1650   LearningRate 0.0056   Epoch: 15   Global Step: 632950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:53:07,869-Speed 2612.79 samples/sec   Loss 3.2199   LearningRate 0.0056   Epoch: 15   Global Step: 632960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:53:11,806-Speed 2601.86 samples/sec   Loss 3.2262   LearningRate 0.0056   Epoch: 15   Global Step: 632970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:53:15,729-Speed 2611.56 samples/sec   Loss 3.3154   LearningRate 0.0056   Epoch: 15   Global Step: 632980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:53:19,623-Speed 2630.35 samples/sec   Loss 3.2048   LearningRate 0.0056   Epoch: 15   Global Step: 632990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:23,547-Speed 2610.12 samples/sec   Loss 3.1853   LearningRate 0.0056   Epoch: 15   Global Step: 633000   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:27,445-Speed 2627.91 samples/sec   Loss 3.2130   LearningRate 0.0056   Epoch: 15   Global Step: 633010   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:31,344-Speed 2626.82 samples/sec   Loss 3.2641   LearningRate 0.0056   Epoch: 15   Global Step: 633020   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:35,280-Speed 2602.62 samples/sec   Loss 3.2707   LearningRate 0.0056   Epoch: 15   Global Step: 633030   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:39,182-Speed 2624.64 samples/sec   Loss 3.2247   LearningRate 0.0056   Epoch: 15   Global Step: 633040   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:43,079-Speed 2628.67 samples/sec   Loss 3.1816   LearningRate 0.0056   Epoch: 15   Global Step: 633050   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:46,979-Speed 2626.53 samples/sec   Loss 3.2477   LearningRate 0.0056   Epoch: 15   Global Step: 633060   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:50,885-Speed 2621.69 samples/sec   Loss 3.2415   LearningRate 0.0056   Epoch: 15   Global Step: 633070   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:54,789-Speed 2623.86 samples/sec   Loss 3.1502   LearningRate 0.0056   Epoch: 15   Global Step: 633080   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:53:58,690-Speed 2625.60 samples/sec   Loss 3.1940   LearningRate 0.0056   Epoch: 15   Global Step: 633090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:54:02,585-Speed 2630.19 samples/sec   Loss 3.1875   LearningRate 0.0056   Epoch: 15   Global Step: 633100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:54:06,455-Speed 2646.20 samples/sec   Loss 3.2425   LearningRate 0.0056   Epoch: 15   Global Step: 633110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:10,370-Speed 2615.93 samples/sec   Loss 3.2242   LearningRate 0.0056   Epoch: 15   Global Step: 633120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:14,268-Speed 2627.61 samples/sec   Loss 3.2482   LearningRate 0.0056   Epoch: 15   Global Step: 633130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:18,169-Speed 2626.63 samples/sec   Loss 3.2627   LearningRate 0.0056   Epoch: 15   Global Step: 633140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:22,064-Speed 2629.29 samples/sec   Loss 3.2283   LearningRate 0.0056   Epoch: 15   Global Step: 633150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:25,962-Speed 2627.23 samples/sec   Loss 3.3320   LearningRate 0.0056   Epoch: 15   Global Step: 633160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:29,873-Speed 2619.05 samples/sec   Loss 3.1651   LearningRate 0.0056   Epoch: 15   Global Step: 633170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:33,780-Speed 2621.90 samples/sec   Loss 3.2903   LearningRate 0.0056   Epoch: 15   Global Step: 633180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:37,730-Speed 2593.62 samples/sec   Loss 3.1264   LearningRate 0.0056   Epoch: 15   Global Step: 633190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:41,636-Speed 2622.08 samples/sec   Loss 3.1003   LearningRate 0.0056   Epoch: 15   Global Step: 633200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:45,538-Speed 2625.18 samples/sec   Loss 3.2722   LearningRate 0.0056   Epoch: 15   Global Step: 633210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:54:49,451-Speed 2616.95 samples/sec   Loss 3.2675   LearningRate 0.0056   Epoch: 15   Global Step: 633220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:54:53,335-Speed 2637.89 samples/sec   Loss 3.2863   LearningRate 0.0056   Epoch: 15   Global Step: 633230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:54:57,234-Speed 2627.40 samples/sec   Loss 3.3029   LearningRate 0.0056   Epoch: 15   Global Step: 633240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:55:01,129-Speed 2629.56 samples/sec   Loss 3.1723   LearningRate 0.0056   Epoch: 15   Global Step: 633250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:55:05,032-Speed 2624.55 samples/sec   Loss 3.2840   LearningRate 0.0056   Epoch: 15   Global Step: 633260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:55:08,941-Speed 2619.56 samples/sec   Loss 3.1999   LearningRate 0.0056   Epoch: 15   Global Step: 633270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:55:12,819-Speed 2641.52 samples/sec   Loss 3.3326   LearningRate 0.0056   Epoch: 15   Global Step: 633280   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:16,716-Speed 2628.11 samples/sec   Loss 3.2911   LearningRate 0.0056   Epoch: 15   Global Step: 633290   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:20,615-Speed 2627.20 samples/sec   Loss 3.1831   LearningRate 0.0056   Epoch: 15   Global Step: 633300   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:24,517-Speed 2625.13 samples/sec   Loss 3.2123   LearningRate 0.0056   Epoch: 15   Global Step: 633310   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:28,413-Speed 2629.35 samples/sec   Loss 3.1881   LearningRate 0.0056   Epoch: 15   Global Step: 633320   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:32,420-Speed 2556.20 samples/sec   Loss 3.2097   LearningRate 0.0056   Epoch: 15   Global Step: 633330   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:36,324-Speed 2624.04 samples/sec   Loss 3.1552   LearningRate 0.0056   Epoch: 15   Global Step: 633340   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:40,226-Speed 2624.66 samples/sec   Loss 3.1714   LearningRate 0.0056   Epoch: 15   Global Step: 633350   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:44,125-Speed 2627.47 samples/sec   Loss 3.2086   LearningRate 0.0056   Epoch: 15   Global Step: 633360   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:48,026-Speed 2625.20 samples/sec   Loss 3.2024   LearningRate 0.0056   Epoch: 15   Global Step: 633370   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 18:55:51,925-Speed 2627.14 samples/sec   Loss 3.2337   LearningRate 0.0056   Epoch: 15   Global Step: 633380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:55:55,827-Speed 2625.10 samples/sec   Loss 3.1793   LearningRate 0.0056   Epoch: 15   Global Step: 633390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:55:59,729-Speed 2625.20 samples/sec   Loss 3.2035   LearningRate 0.0056   Epoch: 15   Global Step: 633400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:03,633-Speed 2623.62 samples/sec   Loss 3.2064   LearningRate 0.0056   Epoch: 15   Global Step: 633410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:07,570-Speed 2601.96 samples/sec   Loss 3.2473   LearningRate 0.0056   Epoch: 15   Global Step: 633420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:11,479-Speed 2620.03 samples/sec   Loss 3.1855   LearningRate 0.0056   Epoch: 15   Global Step: 633430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:15,378-Speed 2627.25 samples/sec   Loss 3.2251   LearningRate 0.0056   Epoch: 15   Global Step: 633440   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:19,287-Speed 2619.54 samples/sec   Loss 3.2773   LearningRate 0.0056   Epoch: 15   Global Step: 633450   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:23,198-Speed 2619.64 samples/sec   Loss 3.2099   LearningRate 0.0056   Epoch: 15   Global Step: 633460   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:27,106-Speed 2621.10 samples/sec   Loss 3.2062   LearningRate 0.0056   Epoch: 15   Global Step: 633470   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:31,002-Speed 2628.83 samples/sec   Loss 3.3050   LearningRate 0.0056   Epoch: 15   Global Step: 633480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:56:34,902-Speed 2625.52 samples/sec   Loss 3.2124   LearningRate 0.0056   Epoch: 15   Global Step: 633490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:56:38,801-Speed 2627.42 samples/sec   Loss 3.1876   LearningRate 0.0056   Epoch: 15   Global Step: 633500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:56:42,706-Speed 2623.83 samples/sec   Loss 3.2091   LearningRate 0.0056   Epoch: 15   Global Step: 633510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:56:46,621-Speed 2616.04 samples/sec   Loss 3.1564   LearningRate 0.0056   Epoch: 15   Global Step: 633520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:56:50,532-Speed 2618.62 samples/sec   Loss 3.2271   LearningRate 0.0056   Epoch: 15   Global Step: 633530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:56:54,430-Speed 2627.93 samples/sec   Loss 3.2814   LearningRate 0.0056   Epoch: 15   Global Step: 633540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:56:58,330-Speed 2626.06 samples/sec   Loss 3.2514   LearningRate 0.0056   Epoch: 15   Global Step: 633550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:02,249-Speed 2613.53 samples/sec   Loss 3.1915   LearningRate 0.0056   Epoch: 15   Global Step: 633560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:06,152-Speed 2624.71 samples/sec   Loss 3.1899   LearningRate 0.0056   Epoch: 15   Global Step: 633570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:10,068-Speed 2615.64 samples/sec   Loss 3.2145   LearningRate 0.0056   Epoch: 15   Global Step: 633580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:13,992-Speed 2610.63 samples/sec   Loss 3.3067   LearningRate 0.0056   Epoch: 15   Global Step: 633590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:17,904-Speed 2618.84 samples/sec   Loss 3.2212   LearningRate 0.0056   Epoch: 15   Global Step: 633600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:21,808-Speed 2623.07 samples/sec   Loss 3.2096   LearningRate 0.0056   Epoch: 15   Global Step: 633610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:25,802-Speed 2565.09 samples/sec   Loss 3.2148   LearningRate 0.0056   Epoch: 15   Global Step: 633620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:29,699-Speed 2628.37 samples/sec   Loss 3.1960   LearningRate 0.0056   Epoch: 15   Global Step: 633630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:57:33,612-Speed 2617.30 samples/sec   Loss 3.2163   LearningRate 0.0056   Epoch: 15   Global Step: 633640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:57:37,590-Speed 2574.64 samples/sec   Loss 3.2227   LearningRate 0.0056   Epoch: 15   Global Step: 633650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:57:41,489-Speed 2627.02 samples/sec   Loss 3.2759   LearningRate 0.0056   Epoch: 15   Global Step: 633660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:57:45,389-Speed 2625.97 samples/sec   Loss 3.2871   LearningRate 0.0056   Epoch: 15   Global Step: 633670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:57:49,289-Speed 2627.88 samples/sec   Loss 3.2089   LearningRate 0.0056   Epoch: 15   Global Step: 633680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:57:53,180-Speed 2632.47 samples/sec   Loss 3.1795   LearningRate 0.0056   Epoch: 15   Global Step: 633690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:57:57,082-Speed 2625.63 samples/sec   Loss 3.2867   LearningRate 0.0056   Epoch: 15   Global Step: 633700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:58:00,981-Speed 2626.88 samples/sec   Loss 3.1926   LearningRate 0.0056   Epoch: 15   Global Step: 633710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:58:04,884-Speed 2624.18 samples/sec   Loss 3.2916   LearningRate 0.0056   Epoch: 15   Global Step: 633720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:58:08,764-Speed 2640.13 samples/sec   Loss 3.2208   LearningRate 0.0056   Epoch: 15   Global Step: 633730   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:12,660-Speed 2628.93 samples/sec   Loss 3.2846   LearningRate 0.0056   Epoch: 15   Global Step: 633740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:16,557-Speed 2628.31 samples/sec   Loss 3.2038   LearningRate 0.0056   Epoch: 15   Global Step: 633750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:20,452-Speed 2629.64 samples/sec   Loss 3.1053   LearningRate 0.0056   Epoch: 15   Global Step: 633760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:24,376-Speed 2609.96 samples/sec   Loss 3.2099   LearningRate 0.0056   Epoch: 15   Global Step: 633770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:28,274-Speed 2628.72 samples/sec   Loss 3.2008   LearningRate 0.0056   Epoch: 15   Global Step: 633780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:32,170-Speed 2628.54 samples/sec   Loss 3.2413   LearningRate 0.0056   Epoch: 15   Global Step: 633790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:36,067-Speed 2627.94 samples/sec   Loss 3.2374   LearningRate 0.0056   Epoch: 15   Global Step: 633800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:39,963-Speed 2628.99 samples/sec   Loss 3.1861   LearningRate 0.0056   Epoch: 15   Global Step: 633810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:43,867-Speed 2624.40 samples/sec   Loss 3.2267   LearningRate 0.0056   Epoch: 15   Global Step: 633820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 18:58:47,766-Speed 2627.10 samples/sec   Loss 3.2866   LearningRate 0.0056   Epoch: 15   Global Step: 633830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:58:51,668-Speed 2624.94 samples/sec   Loss 3.2010   LearningRate 0.0056   Epoch: 15   Global Step: 633840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:58:55,602-Speed 2603.92 samples/sec   Loss 3.1868   LearningRate 0.0056   Epoch: 15   Global Step: 633850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:58:59,498-Speed 2629.27 samples/sec   Loss 3.2185   LearningRate 0.0056   Epoch: 15   Global Step: 633860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:03,393-Speed 2628.91 samples/sec   Loss 3.2297   LearningRate 0.0056   Epoch: 15   Global Step: 633870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:07,298-Speed 2622.66 samples/sec   Loss 3.2136   LearningRate 0.0056   Epoch: 15   Global Step: 633880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:11,201-Speed 2625.04 samples/sec   Loss 3.1784   LearningRate 0.0056   Epoch: 15   Global Step: 633890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:15,109-Speed 2621.05 samples/sec   Loss 3.2126   LearningRate 0.0056   Epoch: 15   Global Step: 633900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:19,010-Speed 2626.23 samples/sec   Loss 3.1324   LearningRate 0.0056   Epoch: 15   Global Step: 633910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:22,909-Speed 2626.89 samples/sec   Loss 3.2487   LearningRate 0.0056   Epoch: 15   Global Step: 633920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:26,813-Speed 2623.63 samples/sec   Loss 3.2651   LearningRate 0.0056   Epoch: 15   Global Step: 633930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 18:59:30,691-Speed 2641.51 samples/sec   Loss 3.2288   LearningRate 0.0056   Epoch: 15   Global Step: 633940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:34,584-Speed 2630.96 samples/sec   Loss 3.2453   LearningRate 0.0056   Epoch: 15   Global Step: 633950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:38,482-Speed 2627.18 samples/sec   Loss 3.2687   LearningRate 0.0056   Epoch: 15   Global Step: 633960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:42,382-Speed 2626.52 samples/sec   Loss 3.3031   LearningRate 0.0056   Epoch: 15   Global Step: 633970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:46,283-Speed 2625.97 samples/sec   Loss 3.2332   LearningRate 0.0056   Epoch: 15   Global Step: 633980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:50,206-Speed 2610.65 samples/sec   Loss 3.1690   LearningRate 0.0056   Epoch: 15   Global Step: 633990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:54,119-Speed 2617.79 samples/sec   Loss 3.2955   LearningRate 0.0056   Epoch: 15   Global Step: 634000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 18:59:58,219-Speed 2498.40 samples/sec   Loss 3.1827   LearningRate 0.0056   Epoch: 15   Global Step: 634010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:00:02,158-Speed 2600.89 samples/sec   Loss 3.1595   LearningRate 0.0056   Epoch: 15   Global Step: 634020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:00:06,051-Speed 2630.50 samples/sec   Loss 3.3004   LearningRate 0.0056   Epoch: 15   Global Step: 634030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:00:09,966-Speed 2616.08 samples/sec   Loss 3.2385   LearningRate 0.0056   Epoch: 15   Global Step: 634040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:00:13,964-Speed 2562.67 samples/sec   Loss 3.1821   LearningRate 0.0056   Epoch: 15   Global Step: 634050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:00:17,839-Speed 2643.59 samples/sec   Loss 3.1945   LearningRate 0.0056   Epoch: 15   Global Step: 634060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:00:21,749-Speed 2619.45 samples/sec   Loss 3.1798   LearningRate 0.0056   Epoch: 15   Global Step: 634070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:00:25,642-Speed 2631.25 samples/sec   Loss 3.2056   LearningRate 0.0056   Epoch: 15   Global Step: 634080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:00:29,532-Speed 2632.70 samples/sec   Loss 3.1519   LearningRate 0.0056   Epoch: 15   Global Step: 634090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:00:33,429-Speed 2628.52 samples/sec   Loss 3.2903   LearningRate 0.0056   Epoch: 15   Global Step: 634100   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:00:37,327-Speed 2628.33 samples/sec   Loss 3.2508   LearningRate 0.0056   Epoch: 15   Global Step: 634110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:00:41,237-Speed 2618.95 samples/sec   Loss 3.3123   LearningRate 0.0056   Epoch: 15   Global Step: 634120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:00:45,158-Speed 2612.51 samples/sec   Loss 3.2481   LearningRate 0.0056   Epoch: 15   Global Step: 634130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:00:49,062-Speed 2623.96 samples/sec   Loss 3.2972   LearningRate 0.0056   Epoch: 15   Global Step: 634140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:00:52,970-Speed 2621.23 samples/sec   Loss 3.3378   LearningRate 0.0055   Epoch: 15   Global Step: 634150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:00:56,897-Speed 2608.02 samples/sec   Loss 3.2176   LearningRate 0.0055   Epoch: 15   Global Step: 634160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:00,797-Speed 2626.46 samples/sec   Loss 3.2707   LearningRate 0.0055   Epoch: 15   Global Step: 634170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:04,714-Speed 2615.20 samples/sec   Loss 3.2211   LearningRate 0.0055   Epoch: 15   Global Step: 634180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:08,706-Speed 2565.42 samples/sec   Loss 3.1519   LearningRate 0.0055   Epoch: 15   Global Step: 634190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:12,737-Speed 2540.80 samples/sec   Loss 3.1451   LearningRate 0.0055   Epoch: 15   Global Step: 634200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:01:16,635-Speed 2628.54 samples/sec   Loss 3.2409   LearningRate 0.0055   Epoch: 15   Global Step: 634210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:01:20,553-Speed 2614.00 samples/sec   Loss 3.3234   LearningRate 0.0055   Epoch: 15   Global Step: 634220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:01:24,447-Speed 2630.62 samples/sec   Loss 3.2319   LearningRate 0.0055   Epoch: 15   Global Step: 634230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:28,344-Speed 2627.84 samples/sec   Loss 3.2857   LearningRate 0.0055   Epoch: 15   Global Step: 634240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:32,242-Speed 2628.46 samples/sec   Loss 3.1757   LearningRate 0.0055   Epoch: 15   Global Step: 634250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:36,134-Speed 2631.48 samples/sec   Loss 3.2347   LearningRate 0.0055   Epoch: 15   Global Step: 634260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:40,055-Speed 2611.94 samples/sec   Loss 3.1801   LearningRate 0.0055   Epoch: 15   Global Step: 634270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:43,970-Speed 2616.18 samples/sec   Loss 3.1911   LearningRate 0.0055   Epoch: 15   Global Step: 634280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:47,866-Speed 2629.63 samples/sec   Loss 3.1624   LearningRate 0.0055   Epoch: 15   Global Step: 634290   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:51,762-Speed 2628.62 samples/sec   Loss 3.2657   LearningRate 0.0055   Epoch: 15   Global Step: 634300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:55,659-Speed 2628.49 samples/sec   Loss 3.1228   LearningRate 0.0055   Epoch: 15   Global Step: 634310   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:01:59,554-Speed 2630.09 samples/sec   Loss 3.2014   LearningRate 0.0055   Epoch: 15   Global Step: 634320   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:02:03,451-Speed 2628.68 samples/sec   Loss 3.2051   LearningRate 0.0055   Epoch: 15   Global Step: 634330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:02:07,355-Speed 2623.19 samples/sec   Loss 3.2973   LearningRate 0.0055   Epoch: 15   Global Step: 634340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:02:11,266-Speed 2619.28 samples/sec   Loss 3.2228   LearningRate 0.0055   Epoch: 15   Global Step: 634350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:02:15,180-Speed 2616.55 samples/sec   Loss 3.2046   LearningRate 0.0055   Epoch: 15   Global Step: 634360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:02:19,077-Speed 2628.82 samples/sec   Loss 3.1904   LearningRate 0.0055   Epoch: 15   Global Step: 634370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:02:22,995-Speed 2617.42 samples/sec   Loss 3.2124   LearningRate 0.0055   Epoch: 15   Global Step: 634380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:02:26,874-Speed 2641.13 samples/sec   Loss 3.1790   LearningRate 0.0055   Epoch: 15   Global Step: 634390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:02:30,769-Speed 2628.96 samples/sec   Loss 3.2148   LearningRate 0.0055   Epoch: 15   Global Step: 634400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:02:34,666-Speed 2629.18 samples/sec   Loss 3.2485   LearningRate 0.0055   Epoch: 15   Global Step: 634410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:02:38,554-Speed 2634.61 samples/sec   Loss 3.2101   LearningRate 0.0055   Epoch: 15   Global Step: 634420   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:02:42,450-Speed 2628.45 samples/sec   Loss 3.2322   LearningRate 0.0055   Epoch: 15   Global Step: 634430   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:02:46,356-Speed 2623.10 samples/sec   Loss 3.1481   LearningRate 0.0055   Epoch: 15   Global Step: 634440   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:02:50,255-Speed 2626.52 samples/sec   Loss 3.1599   LearningRate 0.0055   Epoch: 15   Global Step: 634450   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:02:54,161-Speed 2622.94 samples/sec   Loss 3.2786   LearningRate 0.0055   Epoch: 15   Global Step: 634460   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:02:58,068-Speed 2621.71 samples/sec   Loss 3.1893   LearningRate 0.0055   Epoch: 15   Global Step: 634470   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:03:01,965-Speed 2628.05 samples/sec   Loss 3.1682   LearningRate 0.0055   Epoch: 15   Global Step: 634480   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:03:05,859-Speed 2630.35 samples/sec   Loss 3.2545   LearningRate 0.0055   Epoch: 15   Global Step: 634490   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:03:09,776-Speed 2614.75 samples/sec   Loss 3.1649   LearningRate 0.0055   Epoch: 15   Global Step: 634500   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:03:13,678-Speed 2625.09 samples/sec   Loss 3.2153   LearningRate 0.0055   Epoch: 15   Global Step: 634510   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-04-15 19:03:17,575-Speed 2628.13 samples/sec   Loss 3.1639   LearningRate 0.0055   Epoch: 15   Global Step: 634520   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:21,471-Speed 2629.37 samples/sec   Loss 3.1767   LearningRate 0.0055   Epoch: 15   Global Step: 634530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:25,370-Speed 2626.87 samples/sec   Loss 3.3421   LearningRate 0.0055   Epoch: 15   Global Step: 634540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:29,262-Speed 2631.54 samples/sec   Loss 3.1538   LearningRate 0.0055   Epoch: 15   Global Step: 634550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:33,159-Speed 2628.52 samples/sec   Loss 3.1685   LearningRate 0.0055   Epoch: 15   Global Step: 634560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:37,058-Speed 2626.59 samples/sec   Loss 3.1809   LearningRate 0.0055   Epoch: 15   Global Step: 634570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:40,952-Speed 2630.02 samples/sec   Loss 3.1562   LearningRate 0.0055   Epoch: 15   Global Step: 634580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:44,898-Speed 2596.17 samples/sec   Loss 3.1989   LearningRate 0.0055   Epoch: 15   Global Step: 634590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:48,813-Speed 2616.99 samples/sec   Loss 3.2264   LearningRate 0.0055   Epoch: 15   Global Step: 634600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:52,703-Speed 2632.65 samples/sec   Loss 3.2931   LearningRate 0.0055   Epoch: 15   Global Step: 634610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:03:56,601-Speed 2627.88 samples/sec   Loss 3.2540   LearningRate 0.0055   Epoch: 15   Global Step: 634620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:04:00,499-Speed 2628.09 samples/sec   Loss 3.1367   LearningRate 0.0055   Epoch: 15   Global Step: 634630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:04:04,397-Speed 2627.62 samples/sec   Loss 3.2171   LearningRate 0.0055   Epoch: 15   Global Step: 634640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:04:08,293-Speed 2628.62 samples/sec   Loss 3.1272   LearningRate 0.0055   Epoch: 15   Global Step: 634650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:04:12,164-Speed 2646.71 samples/sec   Loss 3.3408   LearningRate 0.0055   Epoch: 15   Global Step: 634660   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:16,059-Speed 2629.45 samples/sec   Loss 3.1980   LearningRate 0.0055   Epoch: 15   Global Step: 634670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:19,955-Speed 2629.14 samples/sec   Loss 3.1666   LearningRate 0.0055   Epoch: 15   Global Step: 634680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:23,854-Speed 2627.19 samples/sec   Loss 3.2320   LearningRate 0.0055   Epoch: 15   Global Step: 634690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:27,780-Speed 2609.23 samples/sec   Loss 3.2648   LearningRate 0.0055   Epoch: 15   Global Step: 634700   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:31,679-Speed 2626.51 samples/sec   Loss 3.1865   LearningRate 0.0055   Epoch: 15   Global Step: 634710   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:35,588-Speed 2620.36 samples/sec   Loss 3.1542   LearningRate 0.0055   Epoch: 15   Global Step: 634720   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:39,491-Speed 2623.70 samples/sec   Loss 3.2270   LearningRate 0.0055   Epoch: 15   Global Step: 634730   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:43,409-Speed 2614.86 samples/sec   Loss 3.2390   LearningRate 0.0055   Epoch: 15   Global Step: 634740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:47,307-Speed 2627.54 samples/sec   Loss 3.2110   LearningRate 0.0055   Epoch: 15   Global Step: 634750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:04:51,204-Speed 2628.51 samples/sec   Loss 3.2081   LearningRate 0.0055   Epoch: 15   Global Step: 634760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:04:55,102-Speed 2628.27 samples/sec   Loss 3.1687   LearningRate 0.0055   Epoch: 15   Global Step: 634770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:04:58,997-Speed 2629.90 samples/sec   Loss 3.2692   LearningRate 0.0055   Epoch: 15   Global Step: 634780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:05:02,894-Speed 2627.80 samples/sec   Loss 3.2607   LearningRate 0.0055   Epoch: 15   Global Step: 634790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:05:06,790-Speed 2629.28 samples/sec   Loss 3.2219   LearningRate 0.0055   Epoch: 15   Global Step: 634800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:05:10,714-Speed 2609.87 samples/sec   Loss 3.2053   LearningRate 0.0055   Epoch: 15   Global Step: 634810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:05:14,612-Speed 2628.33 samples/sec   Loss 3.2319   LearningRate 0.0055   Epoch: 15   Global Step: 634820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:05:18,508-Speed 2629.00 samples/sec   Loss 3.0863   LearningRate 0.0055   Epoch: 15   Global Step: 634830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:05:22,417-Speed 2619.94 samples/sec   Loss 3.2909   LearningRate 0.0055   Epoch: 15   Global Step: 634840   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:26,311-Speed 2631.24 samples/sec   Loss 3.0731   LearningRate 0.0055   Epoch: 15   Global Step: 634850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:30,203-Speed 2631.24 samples/sec   Loss 3.1319   LearningRate 0.0055   Epoch: 15   Global Step: 634860   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:34,095-Speed 2631.35 samples/sec   Loss 3.1776   LearningRate 0.0055   Epoch: 15   Global Step: 634870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:37,994-Speed 2627.30 samples/sec   Loss 3.1858   LearningRate 0.0055   Epoch: 15   Global Step: 634880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:41,892-Speed 2628.20 samples/sec   Loss 3.1513   LearningRate 0.0055   Epoch: 15   Global Step: 634890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:45,785-Speed 2630.84 samples/sec   Loss 3.1762   LearningRate 0.0055   Epoch: 15   Global Step: 634900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:49,684-Speed 2626.67 samples/sec   Loss 3.2056   LearningRate 0.0055   Epoch: 15   Global Step: 634910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:53,576-Speed 2632.13 samples/sec   Loss 3.1000   LearningRate 0.0055   Epoch: 15   Global Step: 634920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:05:57,468-Speed 2631.77 samples/sec   Loss 3.1537   LearningRate 0.0055   Epoch: 15   Global Step: 634930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:06:01,389-Speed 2612.42 samples/sec   Loss 3.1483   LearningRate 0.0055   Epoch: 15   Global Step: 634940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:05,293-Speed 2623.42 samples/sec   Loss 3.1593   LearningRate 0.0055   Epoch: 15   Global Step: 634950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:09,189-Speed 2628.95 samples/sec   Loss 3.1740   LearningRate 0.0055   Epoch: 15   Global Step: 634960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:13,087-Speed 2627.74 samples/sec   Loss 3.2669   LearningRate 0.0055   Epoch: 15   Global Step: 634970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:17,000-Speed 2617.66 samples/sec   Loss 3.2172   LearningRate 0.0055   Epoch: 15   Global Step: 634980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:20,898-Speed 2627.62 samples/sec   Loss 3.2554   LearningRate 0.0055   Epoch: 15   Global Step: 634990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:24,805-Speed 2621.88 samples/sec   Loss 3.2400   LearningRate 0.0055   Epoch: 15   Global Step: 635000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:28,709-Speed 2623.76 samples/sec   Loss 3.2174   LearningRate 0.0055   Epoch: 15   Global Step: 635010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:32,606-Speed 2628.06 samples/sec   Loss 3.1931   LearningRate 0.0055   Epoch: 15   Global Step: 635020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:36,503-Speed 2628.59 samples/sec   Loss 3.2537   LearningRate 0.0055   Epoch: 15   Global Step: 635030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:40,396-Speed 2630.81 samples/sec   Loss 3.0818   LearningRate 0.0055   Epoch: 15   Global Step: 635040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:06:44,268-Speed 2644.48 samples/sec   Loss 3.1799   LearningRate 0.0055   Epoch: 15   Global Step: 635050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:48,199-Speed 2606.55 samples/sec   Loss 3.1283   LearningRate 0.0055   Epoch: 15   Global Step: 635060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:52,145-Speed 2595.55 samples/sec   Loss 3.2646   LearningRate 0.0055   Epoch: 15   Global Step: 635070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:56,080-Speed 2603.47 samples/sec   Loss 3.2333   LearningRate 0.0055   Epoch: 15   Global Step: 635080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:06:59,983-Speed 2624.40 samples/sec   Loss 3.0986   LearningRate 0.0055   Epoch: 15   Global Step: 635090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:07:03,877-Speed 2631.57 samples/sec   Loss 3.2483   LearningRate 0.0055   Epoch: 15   Global Step: 635100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:07:07,748-Speed 2645.62 samples/sec   Loss 3.2060   LearningRate 0.0055   Epoch: 15   Global Step: 635110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:11,781-Speed 2539.34 samples/sec   Loss 3.1343   LearningRate 0.0055   Epoch: 15   Global Step: 635120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:15,876-Speed 2501.48 samples/sec   Loss 3.1552   LearningRate 0.0055   Epoch: 15   Global Step: 635130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:19,810-Speed 2603.45 samples/sec   Loss 3.1344   LearningRate 0.0055   Epoch: 15   Global Step: 635140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:23,708-Speed 2627.92 samples/sec   Loss 3.1866   LearningRate 0.0055   Epoch: 15   Global Step: 635150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:27,603-Speed 2629.56 samples/sec   Loss 3.2371   LearningRate 0.0055   Epoch: 15   Global Step: 635160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:31,497-Speed 2630.88 samples/sec   Loss 3.0738   LearningRate 0.0055   Epoch: 15   Global Step: 635170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:35,397-Speed 2625.47 samples/sec   Loss 3.2138   LearningRate 0.0055   Epoch: 15   Global Step: 635180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:39,311-Speed 2616.97 samples/sec   Loss 3.2896   LearningRate 0.0055   Epoch: 15   Global Step: 635190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:43,204-Speed 2630.95 samples/sec   Loss 3.2720   LearningRate 0.0055   Epoch: 15   Global Step: 635200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:07:47,106-Speed 2624.98 samples/sec   Loss 3.2487   LearningRate 0.0055   Epoch: 15   Global Step: 635210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:07:50,995-Speed 2633.24 samples/sec   Loss 3.1844   LearningRate 0.0055   Epoch: 15   Global Step: 635220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:07:54,893-Speed 2627.97 samples/sec   Loss 3.2119   LearningRate 0.0055   Epoch: 15   Global Step: 635230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:07:58,783-Speed 2632.96 samples/sec   Loss 3.2137   LearningRate 0.0055   Epoch: 15   Global Step: 635240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:08:02,677-Speed 2630.87 samples/sec   Loss 3.2043   LearningRate 0.0055   Epoch: 15   Global Step: 635250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:08:06,584-Speed 2621.26 samples/sec   Loss 3.2761   LearningRate 0.0055   Epoch: 15   Global Step: 635260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:08:10,460-Speed 2642.70 samples/sec   Loss 3.2368   LearningRate 0.0055   Epoch: 15   Global Step: 635270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:14,350-Speed 2632.62 samples/sec   Loss 3.1216   LearningRate 0.0055   Epoch: 15   Global Step: 635280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:18,377-Speed 2543.68 samples/sec   Loss 3.1495   LearningRate 0.0055   Epoch: 15   Global Step: 635290   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:22,272-Speed 2629.14 samples/sec   Loss 3.2406   LearningRate 0.0055   Epoch: 15   Global Step: 635300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:26,168-Speed 2628.84 samples/sec   Loss 3.3441   LearningRate 0.0055   Epoch: 15   Global Step: 635310   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:30,079-Speed 2619.65 samples/sec   Loss 3.2052   LearningRate 0.0055   Epoch: 15   Global Step: 635320   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:33,987-Speed 2621.24 samples/sec   Loss 3.2598   LearningRate 0.0055   Epoch: 15   Global Step: 635330   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:37,877-Speed 2632.43 samples/sec   Loss 3.1409   LearningRate 0.0055   Epoch: 15   Global Step: 635340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:41,786-Speed 2620.66 samples/sec   Loss 3.1981   LearningRate 0.0055   Epoch: 15   Global Step: 635350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:45,693-Speed 2622.08 samples/sec   Loss 3.1563   LearningRate 0.0055   Epoch: 15   Global Step: 635360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:49,598-Speed 2622.36 samples/sec   Loss 3.2107   LearningRate 0.0055   Epoch: 15   Global Step: 635370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:08:53,472-Speed 2644.27 samples/sec   Loss 3.2086   LearningRate 0.0055   Epoch: 15   Global Step: 635380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:08:57,383-Speed 2618.43 samples/sec   Loss 3.0902   LearningRate 0.0055   Epoch: 15   Global Step: 635390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:01,283-Speed 2626.66 samples/sec   Loss 3.2518   LearningRate 0.0055   Epoch: 15   Global Step: 635400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:05,189-Speed 2622.41 samples/sec   Loss 3.1216   LearningRate 0.0055   Epoch: 15   Global Step: 635410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:09,081-Speed 2631.69 samples/sec   Loss 3.1664   LearningRate 0.0055   Epoch: 15   Global Step: 635420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:12,979-Speed 2627.44 samples/sec   Loss 3.0958   LearningRate 0.0055   Epoch: 15   Global Step: 635430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:16,878-Speed 2626.68 samples/sec   Loss 3.2456   LearningRate 0.0055   Epoch: 15   Global Step: 635440   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:20,808-Speed 2607.04 samples/sec   Loss 3.1304   LearningRate 0.0055   Epoch: 15   Global Step: 635450   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:24,719-Speed 2618.69 samples/sec   Loss 3.1770   LearningRate 0.0055   Epoch: 15   Global Step: 635460   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:28,632-Speed 2618.17 samples/sec   Loss 3.1704   LearningRate 0.0055   Epoch: 15   Global Step: 635470   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:09:32,522-Speed 2632.40 samples/sec   Loss 3.2230   LearningRate 0.0055   Epoch: 15   Global Step: 635480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:09:36,423-Speed 2626.05 samples/sec   Loss 3.1054   LearningRate 0.0055   Epoch: 15   Global Step: 635490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:09:40,420-Speed 2562.44 samples/sec   Loss 3.1973   LearningRate 0.0055   Epoch: 15   Global Step: 635500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:09:44,346-Speed 2608.93 samples/sec   Loss 3.1941   LearningRate 0.0055   Epoch: 15   Global Step: 635510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:09:48,398-Speed 2528.09 samples/sec   Loss 3.2027   LearningRate 0.0055   Epoch: 15   Global Step: 635520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:09:52,374-Speed 2576.48 samples/sec   Loss 3.2629   LearningRate 0.0055   Epoch: 15   Global Step: 635530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:09:56,277-Speed 2624.26 samples/sec   Loss 3.1515   LearningRate 0.0055   Epoch: 15   Global Step: 635540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:00,172-Speed 2629.69 samples/sec   Loss 3.1758   LearningRate 0.0055   Epoch: 15   Global Step: 635550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:04,069-Speed 2628.60 samples/sec   Loss 3.0895   LearningRate 0.0055   Epoch: 15   Global Step: 635560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:07,964-Speed 2629.67 samples/sec   Loss 3.2406   LearningRate 0.0055   Epoch: 15   Global Step: 635570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:11,931-Speed 2581.22 samples/sec   Loss 3.1600   LearningRate 0.0055   Epoch: 15   Global Step: 635580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:15,982-Speed 2528.48 samples/sec   Loss 3.2597   LearningRate 0.0055   Epoch: 15   Global Step: 635590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:19,884-Speed 2624.82 samples/sec   Loss 3.1958   LearningRate 0.0055   Epoch: 15   Global Step: 635600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:23,788-Speed 2624.21 samples/sec   Loss 3.2928   LearningRate 0.0055   Epoch: 15   Global Step: 635610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:27,693-Speed 2622.95 samples/sec   Loss 3.2580   LearningRate 0.0055   Epoch: 15   Global Step: 635620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:31,589-Speed 2628.66 samples/sec   Loss 3.1908   LearningRate 0.0055   Epoch: 15   Global Step: 635630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:35,495-Speed 2622.26 samples/sec   Loss 3.1253   LearningRate 0.0055   Epoch: 15   Global Step: 635640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:39,391-Speed 2629.10 samples/sec   Loss 3.2493   LearningRate 0.0055   Epoch: 15   Global Step: 635650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:43,292-Speed 2625.92 samples/sec   Loss 3.1872   LearningRate 0.0055   Epoch: 15   Global Step: 635660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:47,186-Speed 2630.26 samples/sec   Loss 3.2216   LearningRate 0.0055   Epoch: 15   Global Step: 635670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:51,100-Speed 2617.24 samples/sec   Loss 3.1201   LearningRate 0.0055   Epoch: 15   Global Step: 635680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:10:54,971-Speed 2645.73 samples/sec   Loss 3.2720   LearningRate 0.0055   Epoch: 15   Global Step: 635690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:10:58,866-Speed 2629.70 samples/sec   Loss 3.1260   LearningRate 0.0055   Epoch: 15   Global Step: 635700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:02,760-Speed 2630.58 samples/sec   Loss 3.1642   LearningRate 0.0055   Epoch: 15   Global Step: 635710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:06,654-Speed 2630.12 samples/sec   Loss 3.1710   LearningRate 0.0055   Epoch: 15   Global Step: 635720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:10,546-Speed 2631.39 samples/sec   Loss 3.2222   LearningRate 0.0055   Epoch: 15   Global Step: 635730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:14,445-Speed 2627.25 samples/sec   Loss 3.1810   LearningRate 0.0055   Epoch: 15   Global Step: 635740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:18,342-Speed 2628.41 samples/sec   Loss 3.1672   LearningRate 0.0055   Epoch: 15   Global Step: 635750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:22,235-Speed 2631.41 samples/sec   Loss 3.1844   LearningRate 0.0055   Epoch: 15   Global Step: 635760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:26,129-Speed 2630.56 samples/sec   Loss 3.1163   LearningRate 0.0055   Epoch: 15   Global Step: 635770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:30,018-Speed 2633.49 samples/sec   Loss 3.1443   LearningRate 0.0055   Epoch: 15   Global Step: 635780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:33,893-Speed 2643.14 samples/sec   Loss 3.1649   LearningRate 0.0055   Epoch: 15   Global Step: 635790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:37,794-Speed 2625.29 samples/sec   Loss 3.2067   LearningRate 0.0055   Epoch: 15   Global Step: 635800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:41,702-Speed 2620.76 samples/sec   Loss 3.2386   LearningRate 0.0055   Epoch: 15   Global Step: 635810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:11:45,582-Speed 2640.22 samples/sec   Loss 3.1482   LearningRate 0.0055   Epoch: 15   Global Step: 635820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:11:49,484-Speed 2625.29 samples/sec   Loss 3.1276   LearningRate 0.0055   Epoch: 15   Global Step: 635830   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:11:53,381-Speed 2628.60 samples/sec   Loss 3.1435   LearningRate 0.0055   Epoch: 15   Global Step: 635840   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:11:57,277-Speed 2629.35 samples/sec   Loss 3.1936   LearningRate 0.0055   Epoch: 15   Global Step: 635850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:01,171-Speed 2630.15 samples/sec   Loss 3.1998   LearningRate 0.0055   Epoch: 15   Global Step: 635860   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:05,073-Speed 2625.19 samples/sec   Loss 3.1737   LearningRate 0.0055   Epoch: 15   Global Step: 635870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:09,010-Speed 2601.10 samples/sec   Loss 3.1340   LearningRate 0.0055   Epoch: 15   Global Step: 635880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:12,961-Speed 2592.91 samples/sec   Loss 3.2584   LearningRate 0.0055   Epoch: 15   Global Step: 635890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:16,872-Speed 2619.48 samples/sec   Loss 3.2380   LearningRate 0.0055   Epoch: 15   Global Step: 635900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:20,767-Speed 2629.93 samples/sec   Loss 3.1450   LearningRate 0.0055   Epoch: 15   Global Step: 635910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:24,678-Speed 2619.82 samples/sec   Loss 3.1862   LearningRate 0.0054   Epoch: 15   Global Step: 635920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:12:28,545-Speed 2648.22 samples/sec   Loss 3.1891   LearningRate 0.0054   Epoch: 15   Global Step: 635930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:32,465-Speed 2613.36 samples/sec   Loss 3.1693   LearningRate 0.0054   Epoch: 15   Global Step: 635940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:36,357-Speed 2632.03 samples/sec   Loss 3.1972   LearningRate 0.0054   Epoch: 15   Global Step: 635950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:40,251-Speed 2630.78 samples/sec   Loss 3.1425   LearningRate 0.0054   Epoch: 15   Global Step: 635960   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:44,148-Speed 2628.65 samples/sec   Loss 3.1475   LearningRate 0.0054   Epoch: 15   Global Step: 635970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:48,041-Speed 2630.45 samples/sec   Loss 3.1537   LearningRate 0.0054   Epoch: 15   Global Step: 635980   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:51,933-Speed 2631.75 samples/sec   Loss 3.2198   LearningRate 0.0054   Epoch: 15   Global Step: 635990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:55,837-Speed 2623.81 samples/sec   Loss 3.1711   LearningRate 0.0054   Epoch: 15   Global Step: 636000   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:12:59,729-Speed 2632.78 samples/sec   Loss 3.1882   LearningRate 0.0054   Epoch: 15   Global Step: 636010   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:03,628-Speed 2626.86 samples/sec   Loss 3.2089   LearningRate 0.0054   Epoch: 15   Global Step: 636020   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:07,517-Speed 2633.92 samples/sec   Loss 3.2009   LearningRate 0.0054   Epoch: 15   Global Step: 636030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:13:11,427-Speed 2619.75 samples/sec   Loss 3.1054   LearningRate 0.0054   Epoch: 15   Global Step: 636040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:13:15,337-Speed 2619.04 samples/sec   Loss 3.1297   LearningRate 0.0054   Epoch: 15   Global Step: 636050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:13:19,251-Speed 2617.18 samples/sec   Loss 3.1430   LearningRate 0.0054   Epoch: 15   Global Step: 636060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:13:23,183-Speed 2604.86 samples/sec   Loss 3.2807   LearningRate 0.0054   Epoch: 15   Global Step: 636070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:13:27,116-Speed 2604.56 samples/sec   Loss 3.2263   LearningRate 0.0054   Epoch: 15   Global Step: 636080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:13:31,017-Speed 2635.36 samples/sec   Loss 3.1775   LearningRate 0.0054   Epoch: 15   Global Step: 636090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:13:34,896-Speed 2640.30 samples/sec   Loss 3.2319   LearningRate 0.0054   Epoch: 15   Global Step: 636100   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:38,802-Speed 2622.16 samples/sec   Loss 3.1853   LearningRate 0.0054   Epoch: 15   Global Step: 636110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:42,698-Speed 2629.07 samples/sec   Loss 3.1799   LearningRate 0.0054   Epoch: 15   Global Step: 636120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:46,602-Speed 2623.93 samples/sec   Loss 3.2143   LearningRate 0.0054   Epoch: 15   Global Step: 636130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:50,492-Speed 2632.32 samples/sec   Loss 3.1692   LearningRate 0.0054   Epoch: 15   Global Step: 636140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:54,387-Speed 2630.30 samples/sec   Loss 3.1911   LearningRate 0.0054   Epoch: 15   Global Step: 636150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:13:58,294-Speed 2621.31 samples/sec   Loss 3.1496   LearningRate 0.0054   Epoch: 15   Global Step: 636160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:02,198-Speed 2624.11 samples/sec   Loss 3.2016   LearningRate 0.0054   Epoch: 15   Global Step: 636170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:06,107-Speed 2620.09 samples/sec   Loss 3.1731   LearningRate 0.0054   Epoch: 15   Global Step: 636180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:10,011-Speed 2624.22 samples/sec   Loss 3.1856   LearningRate 0.0054   Epoch: 15   Global Step: 636190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:14,001-Speed 2567.03 samples/sec   Loss 3.1541   LearningRate 0.0054   Epoch: 15   Global Step: 636200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:14:17,903-Speed 2624.73 samples/sec   Loss 3.0859   LearningRate 0.0054   Epoch: 15   Global Step: 636210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:14:21,794-Speed 2632.48 samples/sec   Loss 3.1564   LearningRate 0.0054   Epoch: 15   Global Step: 636220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:14:25,702-Speed 2621.22 samples/sec   Loss 3.2029   LearningRate 0.0054   Epoch: 15   Global Step: 636230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:14:29,611-Speed 2621.04 samples/sec   Loss 3.2612   LearningRate 0.0054   Epoch: 15   Global Step: 636240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:14:33,478-Speed 2648.36 samples/sec   Loss 3.2194   LearningRate 0.0054   Epoch: 15   Global Step: 636250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:37,368-Speed 2632.66 samples/sec   Loss 3.1671   LearningRate 0.0054   Epoch: 15   Global Step: 636260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:41,261-Speed 2631.43 samples/sec   Loss 3.2071   LearningRate 0.0054   Epoch: 15   Global Step: 636270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:45,149-Speed 2634.53 samples/sec   Loss 3.1978   LearningRate 0.0054   Epoch: 15   Global Step: 636280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:49,042-Speed 2631.15 samples/sec   Loss 3.1557   LearningRate 0.0054   Epoch: 15   Global Step: 636290   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:52,954-Speed 2618.33 samples/sec   Loss 3.1777   LearningRate 0.0054   Epoch: 15   Global Step: 636300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:14:56,846-Speed 2631.60 samples/sec   Loss 3.1848   LearningRate 0.0054   Epoch: 15   Global Step: 636310   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:15:00,739-Speed 2631.31 samples/sec   Loss 3.2047   LearningRate 0.0054   Epoch: 15   Global Step: 636320   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:15:04,636-Speed 2627.64 samples/sec   Loss 3.1414   LearningRate 0.0054   Epoch: 15   Global Step: 636330   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:15:08,543-Speed 2622.89 samples/sec   Loss 3.1330   LearningRate 0.0054   Epoch: 15   Global Step: 636340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:15:12,450-Speed 2621.68 samples/sec   Loss 3.1642   LearningRate 0.0054   Epoch: 15   Global Step: 636350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:16,347-Speed 2627.92 samples/sec   Loss 3.1780   LearningRate 0.0054   Epoch: 15   Global Step: 636360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:20,268-Speed 2612.06 samples/sec   Loss 3.1972   LearningRate 0.0054   Epoch: 15   Global Step: 636370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:24,169-Speed 2626.16 samples/sec   Loss 3.1242   LearningRate 0.0054   Epoch: 15   Global Step: 636380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:28,066-Speed 2628.74 samples/sec   Loss 3.3212   LearningRate 0.0054   Epoch: 15   Global Step: 636390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:31,969-Speed 2624.01 samples/sec   Loss 3.2374   LearningRate 0.0054   Epoch: 15   Global Step: 636400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:35,888-Speed 2613.40 samples/sec   Loss 3.1928   LearningRate 0.0054   Epoch: 15   Global Step: 636410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:39,822-Speed 2603.07 samples/sec   Loss 3.2080   LearningRate 0.0054   Epoch: 15   Global Step: 636420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:43,736-Speed 2617.50 samples/sec   Loss 3.1747   LearningRate 0.0054   Epoch: 15   Global Step: 636430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:47,628-Speed 2631.26 samples/sec   Loss 3.1651   LearningRate 0.0054   Epoch: 15   Global Step: 636440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:51,524-Speed 2629.36 samples/sec   Loss 3.1651   LearningRate 0.0054   Epoch: 15   Global Step: 636450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:15:55,441-Speed 2615.41 samples/sec   Loss 3.2459   LearningRate 0.0054   Epoch: 15   Global Step: 636460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:15:59,339-Speed 2627.76 samples/sec   Loss 3.2259   LearningRate 0.0054   Epoch: 15   Global Step: 636470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:16:03,230-Speed 2631.93 samples/sec   Loss 3.1597   LearningRate 0.0054   Epoch: 15   Global Step: 636480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:16:07,164-Speed 2603.79 samples/sec   Loss 3.2589   LearningRate 0.0054   Epoch: 15   Global Step: 636490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:16:11,130-Speed 2582.27 samples/sec   Loss 3.1582   LearningRate 0.0054   Epoch: 15   Global Step: 636500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:16:15,028-Speed 2634.95 samples/sec   Loss 3.1127   LearningRate 0.0054   Epoch: 15   Global Step: 636510   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:18,997-Speed 2580.75 samples/sec   Loss 3.1808   LearningRate 0.0054   Epoch: 15   Global Step: 636520   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:22,900-Speed 2624.06 samples/sec   Loss 3.2136   LearningRate 0.0054   Epoch: 15   Global Step: 636530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:26,791-Speed 2632.36 samples/sec   Loss 3.1547   LearningRate 0.0054   Epoch: 15   Global Step: 636540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:30,686-Speed 2630.43 samples/sec   Loss 3.2081   LearningRate 0.0054   Epoch: 15   Global Step: 636550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:34,584-Speed 2627.56 samples/sec   Loss 3.1610   LearningRate 0.0054   Epoch: 15   Global Step: 636560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:38,477-Speed 2630.28 samples/sec   Loss 3.1866   LearningRate 0.0054   Epoch: 15   Global Step: 636570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:42,371-Speed 2630.68 samples/sec   Loss 3.1645   LearningRate 0.0054   Epoch: 15   Global Step: 636580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:46,270-Speed 2627.05 samples/sec   Loss 3.1862   LearningRate 0.0054   Epoch: 15   Global Step: 636590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:50,158-Speed 2634.07 samples/sec   Loss 3.2000   LearningRate 0.0054   Epoch: 15   Global Step: 636600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:16:54,061-Speed 2624.53 samples/sec   Loss 3.1504   LearningRate 0.0054   Epoch: 15   Global Step: 636610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:16:57,955-Speed 2630.57 samples/sec   Loss 3.2677   LearningRate 0.0054   Epoch: 15   Global Step: 636620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:01,847-Speed 2631.72 samples/sec   Loss 3.1201   LearningRate 0.0054   Epoch: 15   Global Step: 636630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:05,741-Speed 2629.79 samples/sec   Loss 3.1213   LearningRate 0.0054   Epoch: 15   Global Step: 636640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:09,632-Speed 2633.16 samples/sec   Loss 3.1975   LearningRate 0.0054   Epoch: 15   Global Step: 636650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:13,524-Speed 2631.82 samples/sec   Loss 3.1652   LearningRate 0.0054   Epoch: 15   Global Step: 636660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:17,514-Speed 2566.85 samples/sec   Loss 3.1915   LearningRate 0.0054   Epoch: 15   Global Step: 636670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:21,480-Speed 2583.27 samples/sec   Loss 3.1489   LearningRate 0.0054   Epoch: 15   Global Step: 636680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:25,528-Speed 2530.35 samples/sec   Loss 3.1650   LearningRate 0.0054   Epoch: 15   Global Step: 636690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:29,633-Speed 2495.21 samples/sec   Loss 3.1060   LearningRate 0.0054   Epoch: 15   Global Step: 636700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:33,686-Speed 2527.36 samples/sec   Loss 3.1381   LearningRate 0.0054   Epoch: 15   Global Step: 636710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:37,670-Speed 2570.31 samples/sec   Loss 3.1030   LearningRate 0.0054   Epoch: 15   Global Step: 636720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:41,564-Speed 2630.82 samples/sec   Loss 3.1576   LearningRate 0.0054   Epoch: 15   Global Step: 636730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:45,458-Speed 2629.98 samples/sec   Loss 3.2389   LearningRate 0.0054   Epoch: 15   Global Step: 636740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:49,379-Speed 2612.08 samples/sec   Loss 3.1546   LearningRate 0.0054   Epoch: 15   Global Step: 636750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:17:53,250-Speed 2646.41 samples/sec   Loss 3.0810   LearningRate 0.0054   Epoch: 15   Global Step: 636760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:17:57,147-Speed 2627.95 samples/sec   Loss 3.1763   LearningRate 0.0054   Epoch: 15   Global Step: 636770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:01,038-Speed 2634.87 samples/sec   Loss 3.1616   LearningRate 0.0054   Epoch: 15   Global Step: 636780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:04,931-Speed 2630.70 samples/sec   Loss 3.1328   LearningRate 0.0054   Epoch: 15   Global Step: 636790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:08,833-Speed 2624.86 samples/sec   Loss 3.2072   LearningRate 0.0054   Epoch: 15   Global Step: 636800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:12,729-Speed 2628.45 samples/sec   Loss 3.1451   LearningRate 0.0054   Epoch: 15   Global Step: 636810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:16,632-Speed 2624.22 samples/sec   Loss 3.1425   LearningRate 0.0054   Epoch: 15   Global Step: 636820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:20,525-Speed 2631.46 samples/sec   Loss 3.2860   LearningRate 0.0054   Epoch: 15   Global Step: 636830   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:24,420-Speed 2629.70 samples/sec   Loss 3.1746   LearningRate 0.0054   Epoch: 15   Global Step: 636840   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:28,318-Speed 2627.67 samples/sec   Loss 3.1695   LearningRate 0.0054   Epoch: 15   Global Step: 636850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:32,231-Speed 2617.48 samples/sec   Loss 3.1734   LearningRate 0.0054   Epoch: 15   Global Step: 636860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:18:36,129-Speed 2628.02 samples/sec   Loss 3.1636   LearningRate 0.0054   Epoch: 15   Global Step: 636870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:18:40,030-Speed 2625.27 samples/sec   Loss 3.1475   LearningRate 0.0054   Epoch: 15   Global Step: 636880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:18:43,936-Speed 2623.35 samples/sec   Loss 3.1900   LearningRate 0.0054   Epoch: 15   Global Step: 636890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:18:47,835-Speed 2626.78 samples/sec   Loss 3.1202   LearningRate 0.0054   Epoch: 15   Global Step: 636900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:18:51,708-Speed 2644.20 samples/sec   Loss 3.1592   LearningRate 0.0054   Epoch: 15   Global Step: 636910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:55,604-Speed 2629.21 samples/sec   Loss 3.1728   LearningRate 0.0054   Epoch: 15   Global Step: 636920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:18:59,500-Speed 2629.28 samples/sec   Loss 3.2183   LearningRate 0.0054   Epoch: 15   Global Step: 636930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:03,401-Speed 2625.47 samples/sec   Loss 3.1855   LearningRate 0.0054   Epoch: 15   Global Step: 636940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:07,297-Speed 2629.17 samples/sec   Loss 3.0931   LearningRate 0.0054   Epoch: 15   Global Step: 636950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:11,221-Speed 2610.13 samples/sec   Loss 3.2394   LearningRate 0.0054   Epoch: 15   Global Step: 636960   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:15,119-Speed 2628.60 samples/sec   Loss 3.1688   LearningRate 0.0054   Epoch: 15   Global Step: 636970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:19,026-Speed 2621.03 samples/sec   Loss 3.1609   LearningRate 0.0054   Epoch: 15   Global Step: 636980   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:22,917-Speed 2632.77 samples/sec   Loss 3.1860   LearningRate 0.0054   Epoch: 15   Global Step: 636990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:26,814-Speed 2628.07 samples/sec   Loss 3.1130   LearningRate 0.0054   Epoch: 15   Global Step: 637000   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:19:30,721-Speed 2622.11 samples/sec   Loss 3.2217   LearningRate 0.0054   Epoch: 15   Global Step: 637010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:19:34,634-Speed 2619.58 samples/sec   Loss 3.0420   LearningRate 0.0054   Epoch: 15   Global Step: 637020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:19:38,557-Speed 2611.05 samples/sec   Loss 3.1595   LearningRate 0.0054   Epoch: 15   Global Step: 637030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:19:42,454-Speed 2628.47 samples/sec   Loss 3.2138   LearningRate 0.0054   Epoch: 15   Global Step: 637040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:19:46,375-Speed 2612.71 samples/sec   Loss 3.1174   LearningRate 0.0054   Epoch: 15   Global Step: 637050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:19:50,265-Speed 2633.01 samples/sec   Loss 3.1740   LearningRate 0.0054   Epoch: 15   Global Step: 637060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:19:54,161-Speed 2629.02 samples/sec   Loss 3.1997   LearningRate 0.0054   Epoch: 15   Global Step: 637070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:19:58,059-Speed 2627.47 samples/sec   Loss 3.1343   LearningRate 0.0054   Epoch: 15   Global Step: 637080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:01,961-Speed 2625.00 samples/sec   Loss 3.1930   LearningRate 0.0054   Epoch: 15   Global Step: 637090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:05,874-Speed 2617.51 samples/sec   Loss 3.1589   LearningRate 0.0054   Epoch: 15   Global Step: 637100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:09,782-Speed 2621.11 samples/sec   Loss 3.1580   LearningRate 0.0054   Epoch: 15   Global Step: 637110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:20:13,667-Speed 2636.64 samples/sec   Loss 3.1587   LearningRate 0.0054   Epoch: 15   Global Step: 637120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:17,563-Speed 2629.04 samples/sec   Loss 3.1330   LearningRate 0.0054   Epoch: 15   Global Step: 637130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:21,471-Speed 2620.38 samples/sec   Loss 3.2372   LearningRate 0.0054   Epoch: 15   Global Step: 637140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:25,367-Speed 2628.88 samples/sec   Loss 3.1370   LearningRate 0.0054   Epoch: 15   Global Step: 637150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:29,266-Speed 2627.61 samples/sec   Loss 3.1468   LearningRate 0.0054   Epoch: 15   Global Step: 637160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:33,164-Speed 2627.38 samples/sec   Loss 3.1249   LearningRate 0.0054   Epoch: 15   Global Step: 637170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:37,061-Speed 2627.64 samples/sec   Loss 3.2004   LearningRate 0.0054   Epoch: 15   Global Step: 637180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:40,957-Speed 2629.21 samples/sec   Loss 3.1273   LearningRate 0.0054   Epoch: 15   Global Step: 637190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:44,855-Speed 2628.20 samples/sec   Loss 3.1514   LearningRate 0.0054   Epoch: 15   Global Step: 637200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:48,753-Speed 2627.57 samples/sec   Loss 3.1329   LearningRate 0.0054   Epoch: 15   Global Step: 637210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:52,624-Speed 2646.31 samples/sec   Loss 3.0531   LearningRate 0.0054   Epoch: 15   Global Step: 637220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:20:56,554-Speed 2606.21 samples/sec   Loss 3.1813   LearningRate 0.0054   Epoch: 15   Global Step: 637230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:00,458-Speed 2623.27 samples/sec   Loss 3.0848   LearningRate 0.0054   Epoch: 15   Global Step: 637240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:04,446-Speed 2568.46 samples/sec   Loss 3.0960   LearningRate 0.0054   Epoch: 15   Global Step: 637250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:08,380-Speed 2603.32 samples/sec   Loss 3.1698   LearningRate 0.0054   Epoch: 15   Global Step: 637260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:12,298-Speed 2614.53 samples/sec   Loss 3.1332   LearningRate 0.0054   Epoch: 15   Global Step: 637270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:16,197-Speed 2627.23 samples/sec   Loss 3.2226   LearningRate 0.0054   Epoch: 15   Global Step: 637280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:20,100-Speed 2624.47 samples/sec   Loss 3.1689   LearningRate 0.0054   Epoch: 15   Global Step: 637290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:24,001-Speed 2625.52 samples/sec   Loss 3.1417   LearningRate 0.0054   Epoch: 15   Global Step: 637300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:27,900-Speed 2627.33 samples/sec   Loss 3.2015   LearningRate 0.0054   Epoch: 15   Global Step: 637310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:31,794-Speed 2630.68 samples/sec   Loss 3.1090   LearningRate 0.0054   Epoch: 15   Global Step: 637320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:35,694-Speed 2625.94 samples/sec   Loss 3.1582   LearningRate 0.0054   Epoch: 15   Global Step: 637330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:21:39,584-Speed 2633.26 samples/sec   Loss 3.2222   LearningRate 0.0054   Epoch: 15   Global Step: 637340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:21:43,485-Speed 2625.86 samples/sec   Loss 3.1835   LearningRate 0.0054   Epoch: 15   Global Step: 637350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:21:47,384-Speed 2627.26 samples/sec   Loss 3.2023   LearningRate 0.0054   Epoch: 15   Global Step: 637360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:21:51,319-Speed 2603.10 samples/sec   Loss 3.3061   LearningRate 0.0054   Epoch: 15   Global Step: 637370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:21:55,215-Speed 2628.76 samples/sec   Loss 3.1154   LearningRate 0.0054   Epoch: 15   Global Step: 637380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:21:59,110-Speed 2629.91 samples/sec   Loss 3.2374   LearningRate 0.0054   Epoch: 15   Global Step: 637390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:22:03,010-Speed 2626.34 samples/sec   Loss 3.1099   LearningRate 0.0054   Epoch: 15   Global Step: 637400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:22:06,907-Speed 2628.11 samples/sec   Loss 3.0544   LearningRate 0.0054   Epoch: 15   Global Step: 637410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:22:10,825-Speed 2614.54 samples/sec   Loss 3.1229   LearningRate 0.0054   Epoch: 15   Global Step: 637420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:22:14,760-Speed 2603.04 samples/sec   Loss 3.1363   LearningRate 0.0054   Epoch: 15   Global Step: 637430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:22:18,653-Speed 2631.27 samples/sec   Loss 3.1297   LearningRate 0.0054   Epoch: 15   Global Step: 637440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:22,544-Speed 2632.07 samples/sec   Loss 3.1375   LearningRate 0.0054   Epoch: 15   Global Step: 637450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:26,435-Speed 2632.16 samples/sec   Loss 3.1458   LearningRate 0.0054   Epoch: 15   Global Step: 637460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:30,336-Speed 2626.65 samples/sec   Loss 3.1377   LearningRate 0.0054   Epoch: 15   Global Step: 637470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:34,226-Speed 2632.90 samples/sec   Loss 3.1277   LearningRate 0.0054   Epoch: 15   Global Step: 637480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:38,153-Speed 2607.73 samples/sec   Loss 3.1895   LearningRate 0.0054   Epoch: 15   Global Step: 637490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:42,047-Speed 2630.22 samples/sec   Loss 3.1163   LearningRate 0.0054   Epoch: 15   Global Step: 637500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:45,952-Speed 2623.48 samples/sec   Loss 3.2070   LearningRate 0.0054   Epoch: 15   Global Step: 637510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:49,848-Speed 2628.68 samples/sec   Loss 3.0865   LearningRate 0.0054   Epoch: 15   Global Step: 637520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:53,741-Speed 2631.38 samples/sec   Loss 3.1266   LearningRate 0.0054   Epoch: 15   Global Step: 637530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:22:57,655-Speed 2616.45 samples/sec   Loss 3.1603   LearningRate 0.0054   Epoch: 15   Global Step: 637540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:23:01,535-Speed 2640.00 samples/sec   Loss 3.2108   LearningRate 0.0054   Epoch: 15   Global Step: 637550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:23:05,431-Speed 2629.65 samples/sec   Loss 3.0353   LearningRate 0.0054   Epoch: 15   Global Step: 637560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:23:09,322-Speed 2632.31 samples/sec   Loss 3.1115   LearningRate 0.0054   Epoch: 15   Global Step: 637570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:23:13,219-Speed 2628.14 samples/sec   Loss 3.1331   LearningRate 0.0054   Epoch: 15   Global Step: 637580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:23:17,094-Speed 2642.83 samples/sec   Loss 3.1576   LearningRate 0.0054   Epoch: 15   Global Step: 637590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:20,998-Speed 2623.95 samples/sec   Loss 3.1694   LearningRate 0.0054   Epoch: 15   Global Step: 637600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:24,896-Speed 2627.10 samples/sec   Loss 3.1628   LearningRate 0.0054   Epoch: 15   Global Step: 637610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:28,790-Speed 2630.41 samples/sec   Loss 3.1923   LearningRate 0.0054   Epoch: 15   Global Step: 637620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:32,697-Speed 2621.46 samples/sec   Loss 3.1726   LearningRate 0.0054   Epoch: 15   Global Step: 637630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:36,598-Speed 2625.39 samples/sec   Loss 3.1140   LearningRate 0.0054   Epoch: 15   Global Step: 637640   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:40,508-Speed 2619.78 samples/sec   Loss 3.2036   LearningRate 0.0054   Epoch: 15   Global Step: 637650   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:44,428-Speed 2613.20 samples/sec   Loss 3.1676   LearningRate 0.0054   Epoch: 15   Global Step: 637660   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:48,326-Speed 2627.27 samples/sec   Loss 3.1742   LearningRate 0.0054   Epoch: 15   Global Step: 637670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:52,222-Speed 2629.32 samples/sec   Loss 3.1526   LearningRate 0.0054   Epoch: 15   Global Step: 637680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:23:56,117-Speed 2629.69 samples/sec   Loss 3.1246   LearningRate 0.0054   Epoch: 15   Global Step: 637690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:24:00,013-Speed 2629.12 samples/sec   Loss 3.0934   LearningRate 0.0053   Epoch: 15   Global Step: 637700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:24:03,907-Speed 2630.61 samples/sec   Loss 3.0762   LearningRate 0.0053   Epoch: 15   Global Step: 637710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:24:07,805-Speed 2627.64 samples/sec   Loss 3.2026   LearningRate 0.0053   Epoch: 15   Global Step: 637720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:24:11,707-Speed 2624.63 samples/sec   Loss 3.1908   LearningRate 0.0053   Epoch: 15   Global Step: 637730   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:15,605-Speed 2628.38 samples/sec   Loss 3.1293   LearningRate 0.0053   Epoch: 15   Global Step: 637740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:19,497-Speed 2631.20 samples/sec   Loss 3.2626   LearningRate 0.0053   Epoch: 15   Global Step: 637750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:23,392-Speed 2629.76 samples/sec   Loss 3.1393   LearningRate 0.0053   Epoch: 15   Global Step: 637760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:27,299-Speed 2621.80 samples/sec   Loss 3.1333   LearningRate 0.0053   Epoch: 15   Global Step: 637770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:31,194-Speed 2629.95 samples/sec   Loss 3.1549   LearningRate 0.0053   Epoch: 15   Global Step: 637780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:35,090-Speed 2629.32 samples/sec   Loss 3.1282   LearningRate 0.0053   Epoch: 15   Global Step: 637790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:39,001-Speed 2618.26 samples/sec   Loss 3.1253   LearningRate 0.0053   Epoch: 15   Global Step: 637800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:42,897-Speed 2629.49 samples/sec   Loss 3.2265   LearningRate 0.0053   Epoch: 15   Global Step: 637810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:46,787-Speed 2633.34 samples/sec   Loss 3.1732   LearningRate 0.0053   Epoch: 15   Global Step: 637820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:24:50,682-Speed 2629.86 samples/sec   Loss 3.1167   LearningRate 0.0053   Epoch: 15   Global Step: 637830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:24:54,581-Speed 2626.78 samples/sec   Loss 3.1632   LearningRate 0.0053   Epoch: 15   Global Step: 637840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:24:58,474-Speed 2631.26 samples/sec   Loss 3.1824   LearningRate 0.0053   Epoch: 15   Global Step: 637850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:25:02,382-Speed 2620.52 samples/sec   Loss 3.2018   LearningRate 0.0053   Epoch: 15   Global Step: 637860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:25:06,291-Speed 2620.86 samples/sec   Loss 3.1206   LearningRate 0.0053   Epoch: 15   Global Step: 637870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:25:10,190-Speed 2627.30 samples/sec   Loss 3.1488   LearningRate 0.0053   Epoch: 15   Global Step: 637880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:25:14,063-Speed 2643.96 samples/sec   Loss 3.1281   LearningRate 0.0053   Epoch: 15   Global Step: 637890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:17,964-Speed 2627.06 samples/sec   Loss 3.0779   LearningRate 0.0053   Epoch: 15   Global Step: 637900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:21,860-Speed 2628.61 samples/sec   Loss 3.2537   LearningRate 0.0053   Epoch: 15   Global Step: 637910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:25,778-Speed 2614.39 samples/sec   Loss 3.1478   LearningRate 0.0053   Epoch: 15   Global Step: 637920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:29,670-Speed 2631.53 samples/sec   Loss 3.1835   LearningRate 0.0053   Epoch: 15   Global Step: 637930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:33,581-Speed 2619.05 samples/sec   Loss 3.1386   LearningRate 0.0053   Epoch: 15   Global Step: 637940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:37,481-Speed 2626.63 samples/sec   Loss 3.1490   LearningRate 0.0053   Epoch: 15   Global Step: 637950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:41,377-Speed 2629.06 samples/sec   Loss 3.1126   LearningRate 0.0053   Epoch: 15   Global Step: 637960   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:45,277-Speed 2626.35 samples/sec   Loss 3.0446   LearningRate 0.0053   Epoch: 15   Global Step: 637970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:49,181-Speed 2624.43 samples/sec   Loss 3.1623   LearningRate 0.0053   Epoch: 15   Global Step: 637980   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:25:53,094-Speed 2617.52 samples/sec   Loss 3.0841   LearningRate 0.0053   Epoch: 15   Global Step: 637990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:25:56,995-Speed 2624.92 samples/sec   Loss 3.1416   LearningRate 0.0053   Epoch: 15   Global Step: 638000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:26:00,909-Speed 2617.27 samples/sec   Loss 3.1541   LearningRate 0.0053   Epoch: 15   Global Step: 638010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:26:04,786-Speed 2642.31 samples/sec   Loss 3.1617   LearningRate 0.0053   Epoch: 15   Global Step: 638020   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:08,871-Speed 2507.22 samples/sec   Loss 3.1765   LearningRate 0.0053   Epoch: 15   Global Step: 638030   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:12,901-Speed 2542.12 samples/sec   Loss 3.0749   LearningRate 0.0053   Epoch: 15   Global Step: 638040   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:16,797-Speed 2628.71 samples/sec   Loss 3.1578   LearningRate 0.0053   Epoch: 15   Global Step: 638050   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:20,705-Speed 2621.57 samples/sec   Loss 3.1592   LearningRate 0.0053   Epoch: 15   Global Step: 638060   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:24,606-Speed 2625.19 samples/sec   Loss 3.1761   LearningRate 0.0053   Epoch: 15   Global Step: 638070   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:28,508-Speed 2624.59 samples/sec   Loss 3.0821   LearningRate 0.0053   Epoch: 15   Global Step: 638080   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:32,405-Speed 2628.45 samples/sec   Loss 3.1399   LearningRate 0.0053   Epoch: 15   Global Step: 638090   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:36,406-Speed 2560.36 samples/sec   Loss 3.1784   LearningRate 0.0053   Epoch: 15   Global Step: 638100   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:40,298-Speed 2631.64 samples/sec   Loss 3.1620   LearningRate 0.0053   Epoch: 15   Global Step: 638110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-15 19:26:44,201-Speed 2624.30 samples/sec   Loss 3.1500   LearningRate 0.0053   Epoch: 15   Global Step: 638120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:26:48,092-Speed 2632.93 samples/sec   Loss 3.1942   LearningRate 0.0053   Epoch: 15   Global Step: 638130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:26:52,001-Speed 2619.93 samples/sec   Loss 3.1893   LearningRate 0.0053   Epoch: 15   Global Step: 638140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:26:55,902-Speed 2625.61 samples/sec   Loss 3.1127   LearningRate 0.0053   Epoch: 15   Global Step: 638150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:26:59,797-Speed 2629.18 samples/sec   Loss 3.1208   LearningRate 0.0053   Epoch: 15   Global Step: 638160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:03,694-Speed 2628.24 samples/sec   Loss 3.1039   LearningRate 0.0053   Epoch: 15   Global Step: 638170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:07,590-Speed 2629.07 samples/sec   Loss 3.1372   LearningRate 0.0053   Epoch: 15   Global Step: 638180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:11,493-Speed 2623.86 samples/sec   Loss 3.1625   LearningRate 0.0053   Epoch: 15   Global Step: 638190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:15,399-Speed 2622.90 samples/sec   Loss 3.0548   LearningRate 0.0053   Epoch: 15   Global Step: 638200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:19,293-Speed 2630.17 samples/sec   Loss 3.0862   LearningRate 0.0053   Epoch: 15   Global Step: 638210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:23,203-Speed 2620.11 samples/sec   Loss 3.1965   LearningRate 0.0053   Epoch: 15   Global Step: 638220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-15 19:27:27,081-Speed 2641.10 samples/sec   Loss 3.1640   LearningRate 0.0053   Epoch: 15   Global Step: 638230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:31,000-Speed 2613.58 samples/sec   Loss 3.1468   LearningRate 0.0053   Epoch: 15   Global Step: 638240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:34,900-Speed 2626.18 samples/sec   Loss 3.1036   LearningRate 0.0053   Epoch: 15   Global Step: 638250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:38,802-Speed 2624.89 samples/sec   Loss 3.1331   LearningRate 0.0053   Epoch: 15   Global Step: 638260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-15 19:27:42,693-Speed 2632.04 samples/sec   Loss 3.1289   LearningRate 0.0053   Epoch: 15   Global Step: 638270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:27:46,599-Speed 2622.79 samples/sec   Loss 3.1494   LearningRate 0.0053   Epoch: 15   Global Step: 638280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:27:50,494-Speed 2630.16 samples/sec   Loss 3.1888   LearningRate 0.0053   Epoch: 15   Global Step: 638290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:27:54,412-Speed 2613.66 samples/sec   Loss 3.2181   LearningRate 0.0053   Epoch: 15   Global Step: 638300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:27:58,309-Speed 2628.81 samples/sec   Loss 3.1842   LearningRate 0.0053   Epoch: 15   Global Step: 638310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:02,215-Speed 2621.72 samples/sec   Loss 3.2039   LearningRate 0.0053   Epoch: 15   Global Step: 638320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:06,086-Speed 2646.49 samples/sec   Loss 3.2043   LearningRate 0.0053   Epoch: 15   Global Step: 638330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:09,988-Speed 2624.61 samples/sec   Loss 3.1249   LearningRate 0.0053   Epoch: 15   Global Step: 638340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:13,884-Speed 2629.52 samples/sec   Loss 3.1153   LearningRate 0.0053   Epoch: 15   Global Step: 638350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:17,795-Speed 2618.49 samples/sec   Loss 3.1753   LearningRate 0.0053   Epoch: 15   Global Step: 638360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:21,702-Speed 2622.09 samples/sec   Loss 3.1438   LearningRate 0.0053   Epoch: 15   Global Step: 638370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:25,600-Speed 2626.96 samples/sec   Loss 3.1542   LearningRate 0.0053   Epoch: 15   Global Step: 638380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:29,497-Speed 2628.68 samples/sec   Loss 3.0856   LearningRate 0.0053   Epoch: 15   Global Step: 638390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:33,402-Speed 2622.54 samples/sec   Loss 3.2089   LearningRate 0.0053   Epoch: 15   Global Step: 638400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:37,305-Speed 2624.24 samples/sec   Loss 3.1254   LearningRate 0.0053   Epoch: 15   Global Step: 638410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:41,204-Speed 2627.02 samples/sec   Loss 3.1839   LearningRate 0.0053   Epoch: 15   Global Step: 638420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:45,130-Speed 2609.32 samples/sec   Loss 3.1529   LearningRate 0.0053   Epoch: 15   Global Step: 638430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 19:28:49,023-Speed 2630.72 samples/sec   Loss 3.1263   LearningRate 0.0053   Epoch: 15   Global Step: 638440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:52,922-Speed 2627.31 samples/sec   Loss 3.1705   LearningRate 0.0053   Epoch: 15   Global Step: 638450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:28:56,820-Speed 2627.07 samples/sec   Loss 3.1768   LearningRate 0.0053   Epoch: 15   Global Step: 638460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:00,723-Speed 2624.66 samples/sec   Loss 3.1833   LearningRate 0.0053   Epoch: 15   Global Step: 638470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:04,617-Speed 2630.15 samples/sec   Loss 3.2009   LearningRate 0.0053   Epoch: 15   Global Step: 638480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:08,522-Speed 2622.44 samples/sec   Loss 3.1863   LearningRate 0.0053   Epoch: 15   Global Step: 638490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:12,424-Speed 2624.54 samples/sec   Loss 3.1454   LearningRate 0.0053   Epoch: 15   Global Step: 638500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:16,319-Speed 2630.28 samples/sec   Loss 3.1161   LearningRate 0.0053   Epoch: 15   Global Step: 638510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:20,217-Speed 2627.85 samples/sec   Loss 3.1890   LearningRate 0.0053   Epoch: 15   Global Step: 638520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:24,112-Speed 2629.77 samples/sec   Loss 3.2039   LearningRate 0.0053   Epoch: 15   Global Step: 638530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:28,006-Speed 2630.66 samples/sec   Loss 3.1542   LearningRate 0.0053   Epoch: 15   Global Step: 638540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 19:29:31,877-Speed 2646.18 samples/sec   Loss 3.1639   LearningRate 0.0053   Epoch: 15   Global Step: 638550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:35,769-Speed 2630.91 samples/sec   Loss 3.1878   LearningRate 0.0053   Epoch: 15   Global Step: 638560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:39,663-Speed 2630.41 samples/sec   Loss 3.1423   LearningRate 0.0053   Epoch: 15   Global Step: 638570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:43,563-Speed 2626.21 samples/sec   Loss 3.1375   LearningRate 0.0053   Epoch: 15   Global Step: 638580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:47,471-Speed 2621.05 samples/sec   Loss 3.1394   LearningRate 0.0053   Epoch: 15   Global Step: 638590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:29:51,418-Speed 2595.23 samples/sec   Loss 3.2413   LearningRate 0.0053   Epoch: 15   Global Step: 638600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:29:55,319-Speed 2625.15 samples/sec   Loss 3.0920   LearningRate 0.0053   Epoch: 15   Global Step: 638610   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:29:59,209-Speed 2633.51 samples/sec   Loss 3.1402   LearningRate 0.0053   Epoch: 15   Global Step: 638620   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:03,108-Speed 2626.78 samples/sec   Loss 3.1173   LearningRate 0.0053   Epoch: 15   Global Step: 638630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:07,005-Speed 2628.08 samples/sec   Loss 3.1364   LearningRate 0.0053   Epoch: 15   Global Step: 638640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:10,903-Speed 2627.93 samples/sec   Loss 3.1347   LearningRate 0.0053   Epoch: 15   Global Step: 638650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:14,797-Speed 2630.11 samples/sec   Loss 3.1563   LearningRate 0.0053   Epoch: 15   Global Step: 638660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:18,689-Speed 2631.46 samples/sec   Loss 3.1246   LearningRate 0.0053   Epoch: 15   Global Step: 638670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:22,586-Speed 2628.60 samples/sec   Loss 3.1975   LearningRate 0.0053   Epoch: 15   Global Step: 638680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:26,479-Speed 2631.27 samples/sec   Loss 3.1911   LearningRate 0.0053   Epoch: 15   Global Step: 638690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:30:30,374-Speed 2629.80 samples/sec   Loss 3.1575   LearningRate 0.0053   Epoch: 15   Global Step: 638700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:30:34,269-Speed 2629.60 samples/sec   Loss 3.0808   LearningRate 0.0053   Epoch: 15   Global Step: 638710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:30:38,162-Speed 2630.83 samples/sec   Loss 3.1796   LearningRate 0.0053   Epoch: 15   Global Step: 638720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:30:42,055-Speed 2631.97 samples/sec   Loss 3.1472   LearningRate 0.0053   Epoch: 15   Global Step: 638730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:30:45,946-Speed 2631.95 samples/sec   Loss 3.1512   LearningRate 0.0053   Epoch: 15   Global Step: 638740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:30:49,855-Speed 2621.42 samples/sec   Loss 3.1676   LearningRate 0.0053   Epoch: 15   Global Step: 638750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:30:53,750-Speed 2629.26 samples/sec   Loss 3.1517   LearningRate 0.0053   Epoch: 15   Global Step: 638760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:30:57,648-Speed 2627.46 samples/sec   Loss 3.1413   LearningRate 0.0053   Epoch: 15   Global Step: 638770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:31:01,520-Speed 2645.18 samples/sec   Loss 3.1096   LearningRate 0.0053   Epoch: 15   Global Step: 638780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:05,420-Speed 2626.21 samples/sec   Loss 3.1767   LearningRate 0.0053   Epoch: 15   Global Step: 638790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:09,319-Speed 2626.69 samples/sec   Loss 3.1440   LearningRate 0.0053   Epoch: 15   Global Step: 638800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:13,214-Speed 2629.94 samples/sec   Loss 3.1797   LearningRate 0.0053   Epoch: 15   Global Step: 638810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:17,107-Speed 2631.24 samples/sec   Loss 3.2143   LearningRate 0.0053   Epoch: 15   Global Step: 638820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:21,004-Speed 2628.62 samples/sec   Loss 3.1161   LearningRate 0.0053   Epoch: 15   Global Step: 638830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:24,910-Speed 2621.75 samples/sec   Loss 3.0942   LearningRate 0.0053   Epoch: 15   Global Step: 638840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:28,806-Speed 2628.60 samples/sec   Loss 3.0969   LearningRate 0.0053   Epoch: 15   Global Step: 638850   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:32,701-Speed 2630.00 samples/sec   Loss 3.1971   LearningRate 0.0053   Epoch: 15   Global Step: 638860   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:36,595-Speed 2629.81 samples/sec   Loss 3.1890   LearningRate 0.0053   Epoch: 15   Global Step: 638870   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:31:40,490-Speed 2629.96 samples/sec   Loss 3.0851   LearningRate 0.0053   Epoch: 15   Global Step: 638880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:31:44,384-Speed 2630.08 samples/sec   Loss 3.1642   LearningRate 0.0053   Epoch: 15   Global Step: 638890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:31:48,298-Speed 2616.98 samples/sec   Loss 3.0342   LearningRate 0.0053   Epoch: 15   Global Step: 638900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:31:52,197-Speed 2626.66 samples/sec   Loss 3.1601   LearningRate 0.0053   Epoch: 15   Global Step: 638910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:31:56,100-Speed 2624.65 samples/sec   Loss 3.0485   LearningRate 0.0053   Epoch: 15   Global Step: 638920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:31:59,997-Speed 2628.44 samples/sec   Loss 3.1379   LearningRate 0.0053   Epoch: 15   Global Step: 638930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:03,901-Speed 2622.87 samples/sec   Loss 3.2023   LearningRate 0.0053   Epoch: 15   Global Step: 638940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:07,805-Speed 2624.56 samples/sec   Loss 3.1210   LearningRate 0.0053   Epoch: 15   Global Step: 638950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:11,702-Speed 2628.43 samples/sec   Loss 3.1044   LearningRate 0.0053   Epoch: 15   Global Step: 638960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:15,597-Speed 2629.48 samples/sec   Loss 3.1639   LearningRate 0.0053   Epoch: 15   Global Step: 638970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:19,519-Speed 2611.60 samples/sec   Loss 3.1021   LearningRate 0.0053   Epoch: 15   Global Step: 638980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 19:32:23,460-Speed 2598.64 samples/sec   Loss 3.1158   LearningRate 0.0053   Epoch: 15   Global Step: 638990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:27,361-Speed 2625.49 samples/sec   Loss 3.1863   LearningRate 0.0053   Epoch: 15   Global Step: 639000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:31,252-Speed 2632.56 samples/sec   Loss 3.1390   LearningRate 0.0053   Epoch: 15   Global Step: 639010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:35,142-Speed 2633.41 samples/sec   Loss 3.0962   LearningRate 0.0053   Epoch: 15   Global Step: 639020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:39,046-Speed 2623.77 samples/sec   Loss 3.1676   LearningRate 0.0053   Epoch: 15   Global Step: 639030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:42,946-Speed 2626.59 samples/sec   Loss 3.1126   LearningRate 0.0053   Epoch: 15   Global Step: 639040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:46,845-Speed 2626.99 samples/sec   Loss 3.0955   LearningRate 0.0053   Epoch: 15   Global Step: 639050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:50,757-Speed 2617.97 samples/sec   Loss 3.0563   LearningRate 0.0053   Epoch: 15   Global Step: 639060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:54,667-Speed 2620.12 samples/sec   Loss 3.1419   LearningRate 0.0053   Epoch: 15   Global Step: 639070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:32:58,588-Speed 2612.21 samples/sec   Loss 3.1013   LearningRate 0.0053   Epoch: 15   Global Step: 639080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:02,484-Speed 2628.88 samples/sec   Loss 3.1679   LearningRate 0.0053   Epoch: 15   Global Step: 639090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:06,377-Speed 2630.69 samples/sec   Loss 3.0790   LearningRate 0.0053   Epoch: 15   Global Step: 639100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:10,271-Speed 2630.57 samples/sec   Loss 3.2368   LearningRate 0.0053   Epoch: 15   Global Step: 639110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:14,195-Speed 2610.63 samples/sec   Loss 3.1073   LearningRate 0.0053   Epoch: 15   Global Step: 639120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:18,110-Speed 2615.88 samples/sec   Loss 3.0947   LearningRate 0.0053   Epoch: 15   Global Step: 639130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:22,034-Speed 2611.17 samples/sec   Loss 3.1903   LearningRate 0.0053   Epoch: 15   Global Step: 639140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:25,939-Speed 2622.53 samples/sec   Loss 3.1368   LearningRate 0.0053   Epoch: 15   Global Step: 639150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:33:29,816-Speed 2642.30 samples/sec   Loss 3.1360   LearningRate 0.0053   Epoch: 15   Global Step: 639160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:33:33,713-Speed 2627.66 samples/sec   Loss 3.1966   LearningRate 0.0053   Epoch: 15   Global Step: 639170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:33:37,612-Speed 2627.03 samples/sec   Loss 3.1118   LearningRate 0.0053   Epoch: 15   Global Step: 639180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:33:41,512-Speed 2626.49 samples/sec   Loss 3.1088   LearningRate 0.0053   Epoch: 15   Global Step: 639190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:33:45,405-Speed 2630.66 samples/sec   Loss 3.0745   LearningRate 0.0053   Epoch: 15   Global Step: 639200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:33:49,298-Speed 2631.29 samples/sec   Loss 3.1185   LearningRate 0.0053   Epoch: 15   Global Step: 639210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:33:53,193-Speed 2629.10 samples/sec   Loss 3.1544   LearningRate 0.0053   Epoch: 15   Global Step: 639220   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:33:57,085-Speed 2632.30 samples/sec   Loss 3.1288   LearningRate 0.0053   Epoch: 15   Global Step: 639230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:00,976-Speed 2632.56 samples/sec   Loss 3.1552   LearningRate 0.0053   Epoch: 15   Global Step: 639240   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:04,869-Speed 2630.56 samples/sec   Loss 3.1385   LearningRate 0.0053   Epoch: 15   Global Step: 639250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:08,786-Speed 2614.90 samples/sec   Loss 3.0884   LearningRate 0.0053   Epoch: 15   Global Step: 639260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:34:12,684-Speed 2627.28 samples/sec   Loss 3.1750   LearningRate 0.0053   Epoch: 15   Global Step: 639270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:34:16,589-Speed 2623.50 samples/sec   Loss 3.1119   LearningRate 0.0053   Epoch: 15   Global Step: 639280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:34:20,485-Speed 2628.23 samples/sec   Loss 3.0836   LearningRate 0.0053   Epoch: 15   Global Step: 639290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:34:24,362-Speed 2642.63 samples/sec   Loss 3.2133   LearningRate 0.0053   Epoch: 15   Global Step: 639300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:28,255-Speed 2630.55 samples/sec   Loss 3.0595   LearningRate 0.0053   Epoch: 15   Global Step: 639310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:32,152-Speed 2628.18 samples/sec   Loss 3.1575   LearningRate 0.0053   Epoch: 15   Global Step: 639320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:36,050-Speed 2628.25 samples/sec   Loss 3.1178   LearningRate 0.0053   Epoch: 15   Global Step: 639330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:39,951-Speed 2625.01 samples/sec   Loss 3.1900   LearningRate 0.0053   Epoch: 15   Global Step: 639340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:43,849-Speed 2627.54 samples/sec   Loss 3.1533   LearningRate 0.0053   Epoch: 15   Global Step: 639350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:47,754-Speed 2623.29 samples/sec   Loss 3.2072   LearningRate 0.0053   Epoch: 15   Global Step: 639360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:51,654-Speed 2625.94 samples/sec   Loss 3.1599   LearningRate 0.0053   Epoch: 15   Global Step: 639370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:55,546-Speed 2631.82 samples/sec   Loss 3.0845   LearningRate 0.0053   Epoch: 15   Global Step: 639380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:34:59,439-Speed 2631.33 samples/sec   Loss 3.1224   LearningRate 0.0053   Epoch: 15   Global Step: 639390   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:35:03,337-Speed 2627.31 samples/sec   Loss 3.0944   LearningRate 0.0053   Epoch: 15   Global Step: 639400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:07,290-Speed 2590.76 samples/sec   Loss 3.0903   LearningRate 0.0053   Epoch: 15   Global Step: 639410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:11,182-Speed 2632.12 samples/sec   Loss 3.1087   LearningRate 0.0053   Epoch: 15   Global Step: 639420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:15,081-Speed 2627.11 samples/sec   Loss 3.1669   LearningRate 0.0053   Epoch: 15   Global Step: 639430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:18,974-Speed 2630.34 samples/sec   Loss 3.0833   LearningRate 0.0053   Epoch: 15   Global Step: 639440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:22,867-Speed 2631.60 samples/sec   Loss 3.1020   LearningRate 0.0053   Epoch: 15   Global Step: 639450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:26,758-Speed 2631.78 samples/sec   Loss 3.1330   LearningRate 0.0053   Epoch: 15   Global Step: 639460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:30,648-Speed 2633.10 samples/sec   Loss 3.1121   LearningRate 0.0053   Epoch: 15   Global Step: 639470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:34,557-Speed 2619.97 samples/sec   Loss 3.1376   LearningRate 0.0053   Epoch: 15   Global Step: 639480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:35:38,435-Speed 2641.17 samples/sec   Loss 3.0896   LearningRate 0.0053   Epoch: 15   Global Step: 639490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:35:42,331-Speed 2628.56 samples/sec   Loss 3.1492   LearningRate 0.0052   Epoch: 15   Global Step: 639500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:35:46,226-Speed 2629.88 samples/sec   Loss 3.1872   LearningRate 0.0052   Epoch: 15   Global Step: 639510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:35:50,137-Speed 2619.00 samples/sec   Loss 3.0186   LearningRate 0.0052   Epoch: 15   Global Step: 639520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:35:54,030-Speed 2631.47 samples/sec   Loss 3.1268   LearningRate 0.0052   Epoch: 15   Global Step: 639530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:35:57,933-Speed 2624.19 samples/sec   Loss 3.1393   LearningRate 0.0052   Epoch: 15   Global Step: 639540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:01,827-Speed 2630.26 samples/sec   Loss 3.0941   LearningRate 0.0052   Epoch: 15   Global Step: 639550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:05,722-Speed 2629.33 samples/sec   Loss 3.0789   LearningRate 0.0052   Epoch: 15   Global Step: 639560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:09,617-Speed 2629.64 samples/sec   Loss 3.1399   LearningRate 0.0052   Epoch: 15   Global Step: 639570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:13,516-Speed 2626.37 samples/sec   Loss 3.1392   LearningRate 0.0052   Epoch: 15   Global Step: 639580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:17,412-Speed 2629.31 samples/sec   Loss 3.1494   LearningRate 0.0052   Epoch: 15   Global Step: 639590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:36:21,308-Speed 2629.30 samples/sec   Loss 3.0774   LearningRate 0.0052   Epoch: 15   Global Step: 639600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:36:25,212-Speed 2623.09 samples/sec   Loss 3.1088   LearningRate 0.0052   Epoch: 15   Global Step: 639610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:36:29,117-Speed 2623.20 samples/sec   Loss 3.1957   LearningRate 0.0052   Epoch: 15   Global Step: 639620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:36:33,014-Speed 2628.62 samples/sec   Loss 3.0681   LearningRate 0.0052   Epoch: 15   Global Step: 639630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:36:36,884-Speed 2646.49 samples/sec   Loss 3.1480   LearningRate 0.0052   Epoch: 15   Global Step: 639640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:40,782-Speed 2627.88 samples/sec   Loss 3.0891   LearningRate 0.0052   Epoch: 15   Global Step: 639650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:44,677-Speed 2629.70 samples/sec   Loss 3.0645   LearningRate 0.0052   Epoch: 15   Global Step: 639660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:48,577-Speed 2625.89 samples/sec   Loss 3.1481   LearningRate 0.0052   Epoch: 15   Global Step: 639670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:52,487-Speed 2619.66 samples/sec   Loss 3.1538   LearningRate 0.0052   Epoch: 15   Global Step: 639680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:36:56,383-Speed 2629.30 samples/sec   Loss 3.2206   LearningRate 0.0052   Epoch: 15   Global Step: 639690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:00,287-Speed 2623.50 samples/sec   Loss 3.1898   LearningRate 0.0052   Epoch: 15   Global Step: 639700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:04,179-Speed 2631.21 samples/sec   Loss 3.1639   LearningRate 0.0052   Epoch: 15   Global Step: 639710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:08,079-Speed 2626.80 samples/sec   Loss 3.0988   LearningRate 0.0052   Epoch: 15   Global Step: 639720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:11,978-Speed 2626.92 samples/sec   Loss 3.1702   LearningRate 0.0052   Epoch: 15   Global Step: 639730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:15,848-Speed 2646.61 samples/sec   Loss 3.0971   LearningRate 0.0052   Epoch: 15   Global Step: 639740   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:19,744-Speed 2629.74 samples/sec   Loss 3.0727   LearningRate 0.0052   Epoch: 15   Global Step: 639750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:23,650-Speed 2621.92 samples/sec   Loss 3.1238   LearningRate 0.0052   Epoch: 15   Global Step: 639760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:27,541-Speed 2631.91 samples/sec   Loss 3.1302   LearningRate 0.0052   Epoch: 15   Global Step: 639770   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:31,457-Speed 2615.49 samples/sec   Loss 3.0797   LearningRate 0.0052   Epoch: 15   Global Step: 639780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:35,364-Speed 2622.01 samples/sec   Loss 3.1773   LearningRate 0.0052   Epoch: 15   Global Step: 639790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:39,267-Speed 2623.90 samples/sec   Loss 3.1688   LearningRate 0.0052   Epoch: 15   Global Step: 639800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:43,171-Speed 2624.08 samples/sec   Loss 3.0997   LearningRate 0.0052   Epoch: 15   Global Step: 639810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:47,068-Speed 2628.32 samples/sec   Loss 3.0499   LearningRate 0.0052   Epoch: 15   Global Step: 639820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:50,965-Speed 2628.41 samples/sec   Loss 3.0636   LearningRate 0.0052   Epoch: 15   Global Step: 639830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:37:54,858-Speed 2630.41 samples/sec   Loss 3.0601   LearningRate 0.0052   Epoch: 15   Global Step: 639840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:37:58,761-Speed 2624.55 samples/sec   Loss 3.1679   LearningRate 0.0052   Epoch: 15   Global Step: 639850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:02,650-Speed 2633.23 samples/sec   Loss 3.0553   LearningRate 0.0052   Epoch: 15   Global Step: 639860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:06,545-Speed 2629.80 samples/sec   Loss 3.0962   LearningRate 0.0052   Epoch: 15   Global Step: 639870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:10,438-Speed 2630.53 samples/sec   Loss 3.1609   LearningRate 0.0052   Epoch: 15   Global Step: 639880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:14,331-Speed 2631.44 samples/sec   Loss 3.1288   LearningRate 0.0052   Epoch: 15   Global Step: 639890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:18,231-Speed 2626.32 samples/sec   Loss 3.0318   LearningRate 0.0052   Epoch: 15   Global Step: 639900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:22,126-Speed 2629.63 samples/sec   Loss 3.1063   LearningRate 0.0052   Epoch: 15   Global Step: 639910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:26,034-Speed 2620.71 samples/sec   Loss 3.1551   LearningRate 0.0052   Epoch: 15   Global Step: 639920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:29,929-Speed 2629.68 samples/sec   Loss 3.0337   LearningRate 0.0052   Epoch: 15   Global Step: 639930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:38:33,820-Speed 2632.28 samples/sec   Loss 3.1251   LearningRate 0.0052   Epoch: 15   Global Step: 639940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 19:38:37,663-Speed 2665.60 samples/sec   Loss 3.0568   LearningRate 0.0052   Epoch: 15   Global Step: 639950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:38:41,574-Speed 2619.18 samples/sec   Loss 3.0376   LearningRate 0.0052   Epoch: 15   Global Step: 639960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:38:45,479-Speed 2622.75 samples/sec   Loss 3.1808   LearningRate 0.0052   Epoch: 15   Global Step: 639970   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:38:49,371-Speed 2631.45 samples/sec   Loss 3.0801   LearningRate 0.0052   Epoch: 15   Global Step: 639980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:38:53,263-Speed 2631.87 samples/sec   Loss 3.1680   LearningRate 0.0052   Epoch: 15   Global Step: 639990   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:38:57,183-Speed 2613.56 samples/sec   Loss 3.1327   LearningRate 0.0052   Epoch: 15   Global Step: 640000   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:39:40,171-[lfw][640000]XNorm: 22.309158
Training: 2022-04-15 19:39:40,172-[lfw][640000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-15 19:39:40,172-[lfw][640000]Accuracy-Highest: 0.99817
Training: 2022-04-15 19:40:31,816-[cfp_fp][640000]XNorm: 21.524490
Training: 2022-04-15 19:40:31,817-[cfp_fp][640000]Accuracy-Flip: 0.99186+-0.00409
Training: 2022-04-15 19:40:31,818-[cfp_fp][640000]Accuracy-Highest: 0.99243
Training: 2022-04-15 19:41:15,250-[agedb_30][640000]XNorm: 22.692452
Training: 2022-04-15 19:41:15,260-[agedb_30][640000]Accuracy-Flip: 0.98017+-0.00550
Training: 2022-04-15 19:41:15,260-[agedb_30][640000]Accuracy-Highest: 0.98150
Training: 2022-04-15 19:41:19,149-Speed 72.13 samples/sec   Loss 3.1260   LearningRate 0.0052   Epoch: 15   Global Step: 640010   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:23,025-Speed 2642.50 samples/sec   Loss 3.1826   LearningRate 0.0052   Epoch: 15   Global Step: 640020   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:26,902-Speed 2642.10 samples/sec   Loss 3.0967   LearningRate 0.0052   Epoch: 15   Global Step: 640030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:30,780-Speed 2641.36 samples/sec   Loss 3.0793   LearningRate 0.0052   Epoch: 15   Global Step: 640040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:34,664-Speed 2636.75 samples/sec   Loss 3.0979   LearningRate 0.0052   Epoch: 15   Global Step: 640050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:41:38,534-Speed 2647.54 samples/sec   Loss 3.2112   LearningRate 0.0052   Epoch: 15   Global Step: 640060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:42,431-Speed 2628.73 samples/sec   Loss 3.1202   LearningRate 0.0052   Epoch: 15   Global Step: 640070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:46,409-Speed 2574.55 samples/sec   Loss 3.0746   LearningRate 0.0052   Epoch: 15   Global Step: 640080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:50,307-Speed 2628.10 samples/sec   Loss 3.0643   LearningRate 0.0052   Epoch: 15   Global Step: 640090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:54,202-Speed 2629.78 samples/sec   Loss 3.1168   LearningRate 0.0052   Epoch: 15   Global Step: 640100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:41:58,101-Speed 2626.93 samples/sec   Loss 3.0550   LearningRate 0.0052   Epoch: 15   Global Step: 640110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:42:01,993-Speed 2631.47 samples/sec   Loss 3.0917   LearningRate 0.0052   Epoch: 15   Global Step: 640120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:42:05,920-Speed 2607.52 samples/sec   Loss 3.1172   LearningRate 0.0052   Epoch: 15   Global Step: 640130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:42:09,855-Speed 2603.26 samples/sec   Loss 3.1300   LearningRate 0.0052   Epoch: 15   Global Step: 640140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:42:13,746-Speed 2632.67 samples/sec   Loss 3.1444   LearningRate 0.0052   Epoch: 15   Global Step: 640150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:42:17,643-Speed 2628.77 samples/sec   Loss 3.1176   LearningRate 0.0052   Epoch: 15   Global Step: 640160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:21,575-Speed 2605.09 samples/sec   Loss 3.1314   LearningRate 0.0052   Epoch: 15   Global Step: 640170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:25,472-Speed 2628.63 samples/sec   Loss 3.0446   LearningRate 0.0052   Epoch: 15   Global Step: 640180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:29,368-Speed 2628.59 samples/sec   Loss 3.1790   LearningRate 0.0052   Epoch: 15   Global Step: 640190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:33,261-Speed 2631.39 samples/sec   Loss 3.1654   LearningRate 0.0052   Epoch: 15   Global Step: 640200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:37,153-Speed 2631.55 samples/sec   Loss 3.0671   LearningRate 0.0052   Epoch: 15   Global Step: 640210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:41,058-Speed 2623.04 samples/sec   Loss 3.0877   LearningRate 0.0052   Epoch: 15   Global Step: 640220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:44,958-Speed 2626.60 samples/sec   Loss 3.1535   LearningRate 0.0052   Epoch: 15   Global Step: 640230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:48,851-Speed 2630.75 samples/sec   Loss 3.1521   LearningRate 0.0052   Epoch: 15   Global Step: 640240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:52,766-Speed 2616.53 samples/sec   Loss 3.1703   LearningRate 0.0052   Epoch: 15   Global Step: 640250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:42:56,663-Speed 2628.33 samples/sec   Loss 3.1589   LearningRate 0.0052   Epoch: 15   Global Step: 640260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 19:43:00,555-Speed 2631.70 samples/sec   Loss 3.2101   LearningRate 0.0052   Epoch: 15   Global Step: 640270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:43:04,444-Speed 2633.40 samples/sec   Loss 3.0979   LearningRate 0.0052   Epoch: 15   Global Step: 640280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:43:08,322-Speed 2641.74 samples/sec   Loss 3.1301   LearningRate 0.0052   Epoch: 15   Global Step: 640290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:12,217-Speed 2629.01 samples/sec   Loss 3.0833   LearningRate 0.0052   Epoch: 15   Global Step: 640300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:16,118-Speed 2625.90 samples/sec   Loss 3.1298   LearningRate 0.0052   Epoch: 15   Global Step: 640310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:20,009-Speed 2632.50 samples/sec   Loss 3.1509   LearningRate 0.0052   Epoch: 15   Global Step: 640320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:23,903-Speed 2630.55 samples/sec   Loss 3.0874   LearningRate 0.0052   Epoch: 15   Global Step: 640330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:27,872-Speed 2580.73 samples/sec   Loss 3.1015   LearningRate 0.0052   Epoch: 15   Global Step: 640340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:31,783-Speed 2619.08 samples/sec   Loss 3.1080   LearningRate 0.0052   Epoch: 15   Global Step: 640350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:35,675-Speed 2631.56 samples/sec   Loss 3.1834   LearningRate 0.0052   Epoch: 15   Global Step: 640360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:39,573-Speed 2627.33 samples/sec   Loss 3.0975   LearningRate 0.0052   Epoch: 15   Global Step: 640370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:43:43,444-Speed 2646.37 samples/sec   Loss 3.1311   LearningRate 0.0052   Epoch: 15   Global Step: 640380   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:43:47,345-Speed 2625.77 samples/sec   Loss 3.0242   LearningRate 0.0052   Epoch: 15   Global Step: 640390   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:43:51,244-Speed 2627.18 samples/sec   Loss 3.1322   LearningRate 0.0052   Epoch: 15   Global Step: 640400   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:43:55,143-Speed 2626.68 samples/sec   Loss 3.1336   LearningRate 0.0052   Epoch: 15   Global Step: 640410   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:43:59,041-Speed 2628.09 samples/sec   Loss 3.0901   LearningRate 0.0052   Epoch: 15   Global Step: 640420   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:44:02,938-Speed 2628.39 samples/sec   Loss 3.0328   LearningRate 0.0052   Epoch: 15   Global Step: 640430   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:44:06,866-Speed 2607.60 samples/sec   Loss 3.0880   LearningRate 0.0052   Epoch: 15   Global Step: 640440   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:44:10,768-Speed 2624.51 samples/sec   Loss 3.1290   LearningRate 0.0052   Epoch: 15   Global Step: 640450   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:44:14,666-Speed 2627.98 samples/sec   Loss 3.0865   LearningRate 0.0052   Epoch: 15   Global Step: 640460   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:44:18,569-Speed 2624.43 samples/sec   Loss 3.1409   LearningRate 0.0052   Epoch: 15   Global Step: 640470   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:44:22,480-Speed 2619.25 samples/sec   Loss 3.0791   LearningRate 0.0052   Epoch: 15   Global Step: 640480   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:26,392-Speed 2618.14 samples/sec   Loss 3.0281   LearningRate 0.0052   Epoch: 15   Global Step: 640490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:30,295-Speed 2624.12 samples/sec   Loss 3.0772   LearningRate 0.0052   Epoch: 15   Global Step: 640500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:34,191-Speed 2629.06 samples/sec   Loss 3.1875   LearningRate 0.0052   Epoch: 15   Global Step: 640510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:38,088-Speed 2628.40 samples/sec   Loss 3.0844   LearningRate 0.0052   Epoch: 15   Global Step: 640520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:41,990-Speed 2624.98 samples/sec   Loss 3.1267   LearningRate 0.0052   Epoch: 15   Global Step: 640530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:45,895-Speed 2623.08 samples/sec   Loss 3.0143   LearningRate 0.0052   Epoch: 15   Global Step: 640540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:49,791-Speed 2629.31 samples/sec   Loss 3.0681   LearningRate 0.0052   Epoch: 15   Global Step: 640550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:53,684-Speed 2630.52 samples/sec   Loss 3.1154   LearningRate 0.0052   Epoch: 15   Global Step: 640560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:44:57,566-Speed 2639.36 samples/sec   Loss 3.1472   LearningRate 0.0052   Epoch: 15   Global Step: 640570   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:01,461-Speed 2629.87 samples/sec   Loss 3.1463   LearningRate 0.0052   Epoch: 15   Global Step: 640580   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:05,389-Speed 2607.04 samples/sec   Loss 3.1923   LearningRate 0.0052   Epoch: 15   Global Step: 640590   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:09,303-Speed 2617.03 samples/sec   Loss 3.0968   LearningRate 0.0052   Epoch: 15   Global Step: 640600   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:13,196-Speed 2631.27 samples/sec   Loss 3.1095   LearningRate 0.0052   Epoch: 15   Global Step: 640610   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:17,105-Speed 2620.16 samples/sec   Loss 3.1413   LearningRate 0.0052   Epoch: 15   Global Step: 640620   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:21,006-Speed 2626.05 samples/sec   Loss 3.0984   LearningRate 0.0052   Epoch: 15   Global Step: 640630   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:24,904-Speed 2627.91 samples/sec   Loss 3.1096   LearningRate 0.0052   Epoch: 15   Global Step: 640640   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:28,796-Speed 2632.18 samples/sec   Loss 3.1627   LearningRate 0.0052   Epoch: 15   Global Step: 640650   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:32,689-Speed 2630.95 samples/sec   Loss 3.1314   LearningRate 0.0052   Epoch: 15   Global Step: 640660   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:45:36,611-Speed 2611.42 samples/sec   Loss 3.0918   LearningRate 0.0052   Epoch: 15   Global Step: 640670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:45:40,504-Speed 2630.84 samples/sec   Loss 3.0787   LearningRate 0.0052   Epoch: 15   Global Step: 640680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:45:44,408-Speed 2624.12 samples/sec   Loss 3.1899   LearningRate 0.0052   Epoch: 15   Global Step: 640690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:45:48,305-Speed 2628.20 samples/sec   Loss 3.1239   LearningRate 0.0052   Epoch: 15   Global Step: 640700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:45:52,218-Speed 2617.41 samples/sec   Loss 3.1356   LearningRate 0.0052   Epoch: 15   Global Step: 640710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:45:56,121-Speed 2624.17 samples/sec   Loss 3.1561   LearningRate 0.0052   Epoch: 15   Global Step: 640720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:00,015-Speed 2631.27 samples/sec   Loss 3.1106   LearningRate 0.0052   Epoch: 15   Global Step: 640730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:03,908-Speed 2630.50 samples/sec   Loss 3.0768   LearningRate 0.0052   Epoch: 15   Global Step: 640740   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:07,819-Speed 2618.78 samples/sec   Loss 3.1050   LearningRate 0.0052   Epoch: 15   Global Step: 640750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:11,726-Speed 2621.90 samples/sec   Loss 3.1286   LearningRate 0.0052   Epoch: 15   Global Step: 640760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:15,621-Speed 2630.00 samples/sec   Loss 3.0990   LearningRate 0.0052   Epoch: 15   Global Step: 640770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:46:19,532-Speed 2619.00 samples/sec   Loss 3.1316   LearningRate 0.0052   Epoch: 15   Global Step: 640780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:46:23,443-Speed 2618.48 samples/sec   Loss 3.0758   LearningRate 0.0052   Epoch: 15   Global Step: 640790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:46:27,343-Speed 2626.97 samples/sec   Loss 3.0383   LearningRate 0.0052   Epoch: 15   Global Step: 640800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:46:31,272-Speed 2606.76 samples/sec   Loss 3.1456   LearningRate 0.0052   Epoch: 15   Global Step: 640810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:46:35,192-Speed 2613.52 samples/sec   Loss 3.1114   LearningRate 0.0052   Epoch: 15   Global Step: 640820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:46:39,091-Speed 2626.81 samples/sec   Loss 3.0660   LearningRate 0.0052   Epoch: 15   Global Step: 640830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:46:42,961-Speed 2646.82 samples/sec   Loss 3.1179   LearningRate 0.0052   Epoch: 15   Global Step: 640840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:46,856-Speed 2628.95 samples/sec   Loss 3.1242   LearningRate 0.0052   Epoch: 15   Global Step: 640850   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:50,799-Speed 2598.02 samples/sec   Loss 3.0106   LearningRate 0.0052   Epoch: 15   Global Step: 640860   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:54,694-Speed 2630.58 samples/sec   Loss 3.1642   LearningRate 0.0052   Epoch: 15   Global Step: 640870   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:46:58,588-Speed 2629.84 samples/sec   Loss 3.1915   LearningRate 0.0052   Epoch: 15   Global Step: 640880   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:02,500-Speed 2619.09 samples/sec   Loss 3.1222   LearningRate 0.0052   Epoch: 15   Global Step: 640890   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:06,396-Speed 2629.07 samples/sec   Loss 3.0866   LearningRate 0.0052   Epoch: 15   Global Step: 640900   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:10,301-Speed 2622.75 samples/sec   Loss 3.1321   LearningRate 0.0052   Epoch: 15   Global Step: 640910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:14,202-Speed 2625.31 samples/sec   Loss 3.0499   LearningRate 0.0052   Epoch: 15   Global Step: 640920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:18,123-Speed 2612.48 samples/sec   Loss 3.0956   LearningRate 0.0052   Epoch: 15   Global Step: 640930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:22,023-Speed 2626.56 samples/sec   Loss 3.1474   LearningRate 0.0052   Epoch: 15   Global Step: 640940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:47:25,920-Speed 2628.08 samples/sec   Loss 3.1277   LearningRate 0.0052   Epoch: 15   Global Step: 640950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:47:29,833-Speed 2617.46 samples/sec   Loss 3.0467   LearningRate 0.0052   Epoch: 15   Global Step: 640960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:47:33,734-Speed 2625.83 samples/sec   Loss 3.1252   LearningRate 0.0052   Epoch: 15   Global Step: 640970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:47:37,615-Speed 2639.36 samples/sec   Loss 3.1189   LearningRate 0.0052   Epoch: 15   Global Step: 640980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:41,504-Speed 2633.93 samples/sec   Loss 3.0867   LearningRate 0.0052   Epoch: 15   Global Step: 640990   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:45,400-Speed 2628.53 samples/sec   Loss 3.1534   LearningRate 0.0052   Epoch: 15   Global Step: 641000   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:49,291-Speed 2632.58 samples/sec   Loss 3.1629   LearningRate 0.0052   Epoch: 15   Global Step: 641010   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:53,194-Speed 2624.28 samples/sec   Loss 3.0287   LearningRate 0.0052   Epoch: 15   Global Step: 641020   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:47:57,090-Speed 2628.68 samples/sec   Loss 3.1576   LearningRate 0.0052   Epoch: 15   Global Step: 641030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:00,985-Speed 2629.95 samples/sec   Loss 3.2157   LearningRate 0.0052   Epoch: 15   Global Step: 641040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:04,880-Speed 2630.08 samples/sec   Loss 3.0740   LearningRate 0.0052   Epoch: 15   Global Step: 641050   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:08,777-Speed 2628.52 samples/sec   Loss 3.0894   LearningRate 0.0052   Epoch: 15   Global Step: 641060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:12,672-Speed 2629.33 samples/sec   Loss 3.1074   LearningRate 0.0052   Epoch: 15   Global Step: 641070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:16,565-Speed 2630.90 samples/sec   Loss 3.0946   LearningRate 0.0052   Epoch: 15   Global Step: 641080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:48:20,456-Speed 2632.75 samples/sec   Loss 3.0494   LearningRate 0.0052   Epoch: 15   Global Step: 641090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:48:24,384-Speed 2607.66 samples/sec   Loss 3.1186   LearningRate 0.0052   Epoch: 15   Global Step: 641100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:48:28,281-Speed 2628.59 samples/sec   Loss 3.0724   LearningRate 0.0052   Epoch: 15   Global Step: 641110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:48:32,233-Speed 2591.82 samples/sec   Loss 3.0433   LearningRate 0.0052   Epoch: 15   Global Step: 641120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:48:36,181-Speed 2594.72 samples/sec   Loss 3.0602   LearningRate 0.0052   Epoch: 15   Global Step: 641130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:40,073-Speed 2631.57 samples/sec   Loss 3.0961   LearningRate 0.0052   Epoch: 15   Global Step: 641140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:43,975-Speed 2625.13 samples/sec   Loss 3.1326   LearningRate 0.0052   Epoch: 15   Global Step: 641150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:47,868-Speed 2632.01 samples/sec   Loss 3.1055   LearningRate 0.0052   Epoch: 15   Global Step: 641160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:51,760-Speed 2630.98 samples/sec   Loss 3.0420   LearningRate 0.0052   Epoch: 15   Global Step: 641170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:55,672-Speed 2619.28 samples/sec   Loss 3.1013   LearningRate 0.0052   Epoch: 15   Global Step: 641180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:48:59,568-Speed 2628.42 samples/sec   Loss 3.0823   LearningRate 0.0052   Epoch: 15   Global Step: 641190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:03,470-Speed 2625.07 samples/sec   Loss 3.0504   LearningRate 0.0052   Epoch: 15   Global Step: 641200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:07,371-Speed 2625.47 samples/sec   Loss 3.0428   LearningRate 0.0052   Epoch: 15   Global Step: 641210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:11,271-Speed 2626.50 samples/sec   Loss 3.1100   LearningRate 0.0052   Epoch: 15   Global Step: 641220   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:15,169-Speed 2628.33 samples/sec   Loss 3.1377   LearningRate 0.0052   Epoch: 15   Global Step: 641230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:49:19,083-Speed 2616.43 samples/sec   Loss 3.0796   LearningRate 0.0052   Epoch: 15   Global Step: 641240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:49:22,978-Speed 2630.21 samples/sec   Loss 3.1154   LearningRate 0.0052   Epoch: 15   Global Step: 641250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:49:26,874-Speed 2629.32 samples/sec   Loss 3.1549   LearningRate 0.0052   Epoch: 15   Global Step: 641260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:49:30,753-Speed 2640.55 samples/sec   Loss 3.0786   LearningRate 0.0052   Epoch: 15   Global Step: 641270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:34,652-Speed 2627.06 samples/sec   Loss 3.0533   LearningRate 0.0052   Epoch: 15   Global Step: 641280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:38,545-Speed 2630.73 samples/sec   Loss 3.1132   LearningRate 0.0052   Epoch: 15   Global Step: 641290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:42,457-Speed 2618.38 samples/sec   Loss 3.0559   LearningRate 0.0052   Epoch: 15   Global Step: 641300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:46,360-Speed 2624.13 samples/sec   Loss 3.1247   LearningRate 0.0052   Epoch: 15   Global Step: 641310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:50,256-Speed 2629.87 samples/sec   Loss 3.0367   LearningRate 0.0051   Epoch: 15   Global Step: 641320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:54,148-Speed 2631.22 samples/sec   Loss 3.2009   LearningRate 0.0051   Epoch: 15   Global Step: 641330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:49:58,044-Speed 2630.00 samples/sec   Loss 3.1090   LearningRate 0.0051   Epoch: 15   Global Step: 641340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:01,939-Speed 2629.50 samples/sec   Loss 3.0875   LearningRate 0.0051   Epoch: 15   Global Step: 641350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:05,845-Speed 2621.84 samples/sec   Loss 3.0935   LearningRate 0.0051   Epoch: 15   Global Step: 641360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:09,745-Speed 2626.28 samples/sec   Loss 3.0973   LearningRate 0.0051   Epoch: 15   Global Step: 641370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:50:13,646-Speed 2625.80 samples/sec   Loss 3.0789   LearningRate 0.0051   Epoch: 15   Global Step: 641380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:50:17,541-Speed 2629.75 samples/sec   Loss 3.0256   LearningRate 0.0051   Epoch: 15   Global Step: 641390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:50:21,438-Speed 2628.88 samples/sec   Loss 3.0663   LearningRate 0.0051   Epoch: 15   Global Step: 641400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:50:25,339-Speed 2625.27 samples/sec   Loss 3.0177   LearningRate 0.0051   Epoch: 15   Global Step: 641410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:50:29,248-Speed 2620.42 samples/sec   Loss 3.1006   LearningRate 0.0051   Epoch: 15   Global Step: 641420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:50:33,153-Speed 2623.18 samples/sec   Loss 3.1325   LearningRate 0.0051   Epoch: 15   Global Step: 641430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:50:37,030-Speed 2641.61 samples/sec   Loss 3.1265   LearningRate 0.0051   Epoch: 15   Global Step: 641440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:40,991-Speed 2585.68 samples/sec   Loss 3.1284   LearningRate 0.0051   Epoch: 15   Global Step: 641450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:44,888-Speed 2629.23 samples/sec   Loss 3.0130   LearningRate 0.0051   Epoch: 15   Global Step: 641460   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:48,779-Speed 2632.20 samples/sec   Loss 3.0216   LearningRate 0.0051   Epoch: 15   Global Step: 641470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:52,695-Speed 2615.58 samples/sec   Loss 3.1031   LearningRate 0.0051   Epoch: 15   Global Step: 641480   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:50:56,590-Speed 2630.23 samples/sec   Loss 3.1039   LearningRate 0.0051   Epoch: 15   Global Step: 641490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:51:00,495-Speed 2623.34 samples/sec   Loss 3.0836   LearningRate 0.0051   Epoch: 15   Global Step: 641500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:51:04,416-Speed 2612.03 samples/sec   Loss 3.0898   LearningRate 0.0051   Epoch: 15   Global Step: 641510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:51:08,310-Speed 2630.27 samples/sec   Loss 3.1816   LearningRate 0.0051   Epoch: 15   Global Step: 641520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:51:12,202-Speed 2631.38 samples/sec   Loss 3.0020   LearningRate 0.0051   Epoch: 15   Global Step: 641530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:51:16,106-Speed 2623.97 samples/sec   Loss 3.1163   LearningRate 0.0051   Epoch: 15   Global Step: 641540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:20,006-Speed 2626.57 samples/sec   Loss 3.0586   LearningRate 0.0051   Epoch: 15   Global Step: 641550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:23,901-Speed 2629.40 samples/sec   Loss 3.0740   LearningRate 0.0051   Epoch: 15   Global Step: 641560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:27,816-Speed 2616.42 samples/sec   Loss 3.0783   LearningRate 0.0051   Epoch: 15   Global Step: 641570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:31,720-Speed 2624.03 samples/sec   Loss 3.1284   LearningRate 0.0051   Epoch: 15   Global Step: 641580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:35,623-Speed 2624.11 samples/sec   Loss 3.1298   LearningRate 0.0051   Epoch: 15   Global Step: 641590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:39,518-Speed 2629.39 samples/sec   Loss 3.1219   LearningRate 0.0051   Epoch: 15   Global Step: 641600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:43,419-Speed 2625.30 samples/sec   Loss 3.1329   LearningRate 0.0051   Epoch: 15   Global Step: 641610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:47,317-Speed 2628.04 samples/sec   Loss 3.1265   LearningRate 0.0051   Epoch: 15   Global Step: 641620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:51,210-Speed 2632.55 samples/sec   Loss 3.1232   LearningRate 0.0051   Epoch: 15   Global Step: 641630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:55,080-Speed 2646.92 samples/sec   Loss 3.0598   LearningRate 0.0051   Epoch: 15   Global Step: 641640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:51:58,973-Speed 2631.01 samples/sec   Loss 3.1110   LearningRate 0.0051   Epoch: 15   Global Step: 641650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:02,870-Speed 2627.96 samples/sec   Loss 3.0737   LearningRate 0.0051   Epoch: 15   Global Step: 641660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:06,768-Speed 2627.85 samples/sec   Loss 3.0575   LearningRate 0.0051   Epoch: 15   Global Step: 641670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:10,660-Speed 2631.36 samples/sec   Loss 3.1614   LearningRate 0.0051   Epoch: 15   Global Step: 641680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:14,552-Speed 2631.88 samples/sec   Loss 3.0996   LearningRate 0.0051   Epoch: 15   Global Step: 641690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:18,444-Speed 2631.53 samples/sec   Loss 3.1202   LearningRate 0.0051   Epoch: 15   Global Step: 641700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:22,336-Speed 2631.79 samples/sec   Loss 3.0411   LearningRate 0.0051   Epoch: 15   Global Step: 641710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:26,292-Speed 2589.53 samples/sec   Loss 3.1114   LearningRate 0.0051   Epoch: 15   Global Step: 641720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:30,196-Speed 2623.70 samples/sec   Loss 3.0825   LearningRate 0.0051   Epoch: 15   Global Step: 641730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:34,120-Speed 2610.15 samples/sec   Loss 3.2049   LearningRate 0.0051   Epoch: 15   Global Step: 641740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:38,013-Speed 2630.79 samples/sec   Loss 3.0702   LearningRate 0.0051   Epoch: 15   Global Step: 641750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:41,913-Speed 2626.81 samples/sec   Loss 3.1183   LearningRate 0.0051   Epoch: 15   Global Step: 641760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:45,809-Speed 2628.50 samples/sec   Loss 2.9753   LearningRate 0.0051   Epoch: 15   Global Step: 641770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:49,704-Speed 2629.97 samples/sec   Loss 3.0862   LearningRate 0.0051   Epoch: 15   Global Step: 641780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:53,609-Speed 2622.63 samples/sec   Loss 3.1023   LearningRate 0.0051   Epoch: 15   Global Step: 641790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:52:57,516-Speed 2622.99 samples/sec   Loss 3.1274   LearningRate 0.0051   Epoch: 15   Global Step: 641800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:53:01,489-Speed 2578.21 samples/sec   Loss 3.0706   LearningRate 0.0051   Epoch: 15   Global Step: 641810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:53:05,397-Speed 2621.13 samples/sec   Loss 3.1168   LearningRate 0.0051   Epoch: 15   Global Step: 641820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:53:09,333-Speed 2602.05 samples/sec   Loss 3.0853   LearningRate 0.0051   Epoch: 15   Global Step: 641830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:53:13,251-Speed 2614.51 samples/sec   Loss 3.1114   LearningRate 0.0051   Epoch: 15   Global Step: 641840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:53:17,148-Speed 2628.18 samples/sec   Loss 3.0973   LearningRate 0.0051   Epoch: 15   Global Step: 641850   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:53:21,043-Speed 2630.07 samples/sec   Loss 3.0710   LearningRate 0.0051   Epoch: 15   Global Step: 641860   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:53:24,937-Speed 2630.21 samples/sec   Loss 3.0618   LearningRate 0.0051   Epoch: 15   Global Step: 641870   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:53:28,851-Speed 2617.23 samples/sec   Loss 2.9820   LearningRate 0.0051   Epoch: 15   Global Step: 641880   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:53:32,726-Speed 2643.38 samples/sec   Loss 3.0565   LearningRate 0.0051   Epoch: 15   Global Step: 641890   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:53:36,626-Speed 2626.26 samples/sec   Loss 3.0134   LearningRate 0.0051   Epoch: 15   Global Step: 641900   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:53:40,539-Speed 2617.94 samples/sec   Loss 3.0688   LearningRate 0.0051   Epoch: 15   Global Step: 641910   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:53:44,434-Speed 2628.84 samples/sec   Loss 3.1463   LearningRate 0.0051   Epoch: 15   Global Step: 641920   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:53:48,333-Speed 2627.19 samples/sec   Loss 3.0768   LearningRate 0.0051   Epoch: 15   Global Step: 641930   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:53:52,231-Speed 2627.19 samples/sec   Loss 3.1076   LearningRate 0.0051   Epoch: 15   Global Step: 641940   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:53:56,126-Speed 2630.16 samples/sec   Loss 3.0672   LearningRate 0.0051   Epoch: 15   Global Step: 641950   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:54:00,023-Speed 2628.62 samples/sec   Loss 3.0643   LearningRate 0.0051   Epoch: 15   Global Step: 641960   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:54:03,921-Speed 2627.53 samples/sec   Loss 3.0757   LearningRate 0.0051   Epoch: 15   Global Step: 641970   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:54:07,817-Speed 2628.90 samples/sec   Loss 3.1035   LearningRate 0.0051   Epoch: 15   Global Step: 641980   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 19:54:11,721-Speed 2623.52 samples/sec   Loss 3.0414   LearningRate 0.0051   Epoch: 15   Global Step: 641990   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:15,618-Speed 2627.76 samples/sec   Loss 3.1441   LearningRate 0.0051   Epoch: 15   Global Step: 642000   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:19,521-Speed 2624.97 samples/sec   Loss 3.0730   LearningRate 0.0051   Epoch: 15   Global Step: 642010   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:23,418-Speed 2628.00 samples/sec   Loss 3.0540   LearningRate 0.0051   Epoch: 15   Global Step: 642020   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:27,407-Speed 2567.69 samples/sec   Loss 3.1821   LearningRate 0.0051   Epoch: 15   Global Step: 642030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:31,518-Speed 2491.92 samples/sec   Loss 3.0305   LearningRate 0.0051   Epoch: 15   Global Step: 642040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:35,413-Speed 2629.69 samples/sec   Loss 3.0879   LearningRate 0.0051   Epoch: 15   Global Step: 642050   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:39,321-Speed 2620.63 samples/sec   Loss 3.0757   LearningRate 0.0051   Epoch: 15   Global Step: 642060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:43,276-Speed 2590.12 samples/sec   Loss 3.1209   LearningRate 0.0051   Epoch: 15   Global Step: 642070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:47,171-Speed 2629.85 samples/sec   Loss 3.0071   LearningRate 0.0051   Epoch: 15   Global Step: 642080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:51,045-Speed 2643.64 samples/sec   Loss 3.0494   LearningRate 0.0051   Epoch: 15   Global Step: 642090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:54,937-Speed 2631.71 samples/sec   Loss 3.1306   LearningRate 0.0051   Epoch: 15   Global Step: 642100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:54:58,837-Speed 2626.77 samples/sec   Loss 3.1295   LearningRate 0.0051   Epoch: 15   Global Step: 642110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:02,733-Speed 2628.88 samples/sec   Loss 3.0747   LearningRate 0.0051   Epoch: 15   Global Step: 642120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:06,628-Speed 2629.11 samples/sec   Loss 3.1267   LearningRate 0.0051   Epoch: 15   Global Step: 642130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:10,565-Speed 2602.07 samples/sec   Loss 3.1061   LearningRate 0.0051   Epoch: 15   Global Step: 642140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:14,460-Speed 2629.30 samples/sec   Loss 3.0108   LearningRate 0.0051   Epoch: 15   Global Step: 642150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:18,354-Speed 2630.27 samples/sec   Loss 3.0321   LearningRate 0.0051   Epoch: 15   Global Step: 642160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:22,250-Speed 2629.75 samples/sec   Loss 3.0636   LearningRate 0.0051   Epoch: 15   Global Step: 642170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:26,145-Speed 2629.25 samples/sec   Loss 3.1091   LearningRate 0.0051   Epoch: 15   Global Step: 642180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:55:30,046-Speed 2626.58 samples/sec   Loss 2.9949   LearningRate 0.0051   Epoch: 15   Global Step: 642190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:55:33,941-Speed 2630.05 samples/sec   Loss 3.0864   LearningRate 0.0051   Epoch: 15   Global Step: 642200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:55:37,839-Speed 2626.89 samples/sec   Loss 3.1671   LearningRate 0.0051   Epoch: 15   Global Step: 642210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:55:41,807-Speed 2581.53 samples/sec   Loss 3.0854   LearningRate 0.0051   Epoch: 15   Global Step: 642220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:55:45,698-Speed 2632.12 samples/sec   Loss 3.0993   LearningRate 0.0051   Epoch: 15   Global Step: 642230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:55:49,588-Speed 2632.78 samples/sec   Loss 3.0742   LearningRate 0.0051   Epoch: 15   Global Step: 642240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:55:53,488-Speed 2626.54 samples/sec   Loss 3.0469   LearningRate 0.0051   Epoch: 15   Global Step: 642250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:55:57,387-Speed 2627.23 samples/sec   Loss 3.0735   LearningRate 0.0051   Epoch: 15   Global Step: 642260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:01,288-Speed 2626.01 samples/sec   Loss 3.1330   LearningRate 0.0051   Epoch: 15   Global Step: 642270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:05,179-Speed 2632.33 samples/sec   Loss 3.0381   LearningRate 0.0051   Epoch: 15   Global Step: 642280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:09,070-Speed 2631.78 samples/sec   Loss 3.0364   LearningRate 0.0051   Epoch: 15   Global Step: 642290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 19:56:12,959-Speed 2633.75 samples/sec   Loss 3.1030   LearningRate 0.0051   Epoch: 15   Global Step: 642300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 19:56:16,841-Speed 2638.49 samples/sec   Loss 3.0284   LearningRate 0.0051   Epoch: 15   Global Step: 642310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:20,796-Speed 2590.39 samples/sec   Loss 3.0909   LearningRate 0.0051   Epoch: 15   Global Step: 642320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:24,692-Speed 2628.85 samples/sec   Loss 3.1120   LearningRate 0.0051   Epoch: 15   Global Step: 642330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:28,584-Speed 2632.07 samples/sec   Loss 3.0853   LearningRate 0.0051   Epoch: 15   Global Step: 642340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:32,476-Speed 2631.27 samples/sec   Loss 3.0279   LearningRate 0.0051   Epoch: 15   Global Step: 642350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:36,380-Speed 2623.73 samples/sec   Loss 3.1340   LearningRate 0.0051   Epoch: 15   Global Step: 642360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:40,289-Speed 2620.05 samples/sec   Loss 3.0743   LearningRate 0.0051   Epoch: 15   Global Step: 642370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:44,190-Speed 2626.36 samples/sec   Loss 3.0833   LearningRate 0.0051   Epoch: 15   Global Step: 642380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:48,088-Speed 2627.30 samples/sec   Loss 3.0192   LearningRate 0.0051   Epoch: 15   Global Step: 642390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:52,017-Speed 2607.52 samples/sec   Loss 3.1349   LearningRate 0.0051   Epoch: 15   Global Step: 642400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:55,896-Speed 2640.38 samples/sec   Loss 3.1895   LearningRate 0.0051   Epoch: 15   Global Step: 642410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:56:59,801-Speed 2623.04 samples/sec   Loss 2.9693   LearningRate 0.0051   Epoch: 15   Global Step: 642420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:57:03,673-Speed 2645.76 samples/sec   Loss 3.1037   LearningRate 0.0051   Epoch: 15   Global Step: 642430   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:07,575-Speed 2624.38 samples/sec   Loss 3.0453   LearningRate 0.0051   Epoch: 15   Global Step: 642440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:11,480-Speed 2622.52 samples/sec   Loss 3.0279   LearningRate 0.0051   Epoch: 15   Global Step: 642450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:15,430-Speed 2594.17 samples/sec   Loss 3.0592   LearningRate 0.0051   Epoch: 15   Global Step: 642460   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:19,346-Speed 2615.28 samples/sec   Loss 3.1909   LearningRate 0.0051   Epoch: 15   Global Step: 642470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:23,240-Speed 2630.85 samples/sec   Loss 3.0679   LearningRate 0.0051   Epoch: 15   Global Step: 642480   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:27,142-Speed 2624.66 samples/sec   Loss 3.0540   LearningRate 0.0051   Epoch: 15   Global Step: 642490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:31,034-Speed 2632.37 samples/sec   Loss 3.1473   LearningRate 0.0051   Epoch: 15   Global Step: 642500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:34,933-Speed 2626.91 samples/sec   Loss 2.9940   LearningRate 0.0051   Epoch: 15   Global Step: 642510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:38,840-Speed 2621.11 samples/sec   Loss 3.0619   LearningRate 0.0051   Epoch: 15   Global Step: 642520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:57:42,736-Speed 2628.96 samples/sec   Loss 3.1520   LearningRate 0.0051   Epoch: 15   Global Step: 642530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:57:46,652-Speed 2616.53 samples/sec   Loss 3.1176   LearningRate 0.0051   Epoch: 15   Global Step: 642540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:57:50,541-Speed 2633.71 samples/sec   Loss 3.0013   LearningRate 0.0051   Epoch: 15   Global Step: 642550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:57:54,434-Speed 2630.71 samples/sec   Loss 3.0332   LearningRate 0.0051   Epoch: 15   Global Step: 642560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:57:58,345-Speed 2619.80 samples/sec   Loss 3.1333   LearningRate 0.0051   Epoch: 15   Global Step: 642570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:02,247-Speed 2624.47 samples/sec   Loss 3.0614   LearningRate 0.0051   Epoch: 15   Global Step: 642580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:06,142-Speed 2629.56 samples/sec   Loss 3.1440   LearningRate 0.0051   Epoch: 15   Global Step: 642590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:10,052-Speed 2619.45 samples/sec   Loss 3.1127   LearningRate 0.0051   Epoch: 15   Global Step: 642600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:13,951-Speed 2627.55 samples/sec   Loss 3.0849   LearningRate 0.0051   Epoch: 15   Global Step: 642610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:17,843-Speed 2631.53 samples/sec   Loss 3.1112   LearningRate 0.0051   Epoch: 15   Global Step: 642620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:21,721-Speed 2641.50 samples/sec   Loss 3.1125   LearningRate 0.0051   Epoch: 15   Global Step: 642630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:25,621-Speed 2626.90 samples/sec   Loss 3.0527   LearningRate 0.0051   Epoch: 15   Global Step: 642640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:29,684-Speed 2521.04 samples/sec   Loss 3.0900   LearningRate 0.0051   Epoch: 15   Global Step: 642650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:33,610-Speed 2608.45 samples/sec   Loss 2.9892   LearningRate 0.0051   Epoch: 15   Global Step: 642660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:37,547-Speed 2601.71 samples/sec   Loss 3.0638   LearningRate 0.0051   Epoch: 15   Global Step: 642670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:41,463-Speed 2615.53 samples/sec   Loss 3.0123   LearningRate 0.0051   Epoch: 15   Global Step: 642680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:45,357-Speed 2630.45 samples/sec   Loss 3.0664   LearningRate 0.0051   Epoch: 15   Global Step: 642690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:49,309-Speed 2592.31 samples/sec   Loss 3.0987   LearningRate 0.0051   Epoch: 15   Global Step: 642700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:53,213-Speed 2623.15 samples/sec   Loss 3.0538   LearningRate 0.0051   Epoch: 15   Global Step: 642710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:58:57,115-Speed 2625.41 samples/sec   Loss 3.0248   LearningRate 0.0051   Epoch: 15   Global Step: 642720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:59:01,004-Speed 2633.63 samples/sec   Loss 3.0961   LearningRate 0.0051   Epoch: 15   Global Step: 642730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:59:04,908-Speed 2624.50 samples/sec   Loss 3.0352   LearningRate 0.0051   Epoch: 15   Global Step: 642740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:59:08,776-Speed 2647.67 samples/sec   Loss 3.1268   LearningRate 0.0051   Epoch: 15   Global Step: 642750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:12,671-Speed 2629.61 samples/sec   Loss 3.1400   LearningRate 0.0051   Epoch: 15   Global Step: 642760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:16,577-Speed 2622.17 samples/sec   Loss 3.1317   LearningRate 0.0051   Epoch: 15   Global Step: 642770   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:20,480-Speed 2624.49 samples/sec   Loss 3.1383   LearningRate 0.0051   Epoch: 15   Global Step: 642780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:24,376-Speed 2628.55 samples/sec   Loss 3.1085   LearningRate 0.0051   Epoch: 15   Global Step: 642790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:28,336-Speed 2587.08 samples/sec   Loss 3.0627   LearningRate 0.0051   Epoch: 15   Global Step: 642800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:32,241-Speed 2622.76 samples/sec   Loss 3.0753   LearningRate 0.0051   Epoch: 15   Global Step: 642810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:36,137-Speed 2629.52 samples/sec   Loss 3.0687   LearningRate 0.0051   Epoch: 15   Global Step: 642820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:40,039-Speed 2624.78 samples/sec   Loss 3.0372   LearningRate 0.0051   Epoch: 15   Global Step: 642830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:43,950-Speed 2619.09 samples/sec   Loss 3.0803   LearningRate 0.0051   Epoch: 15   Global Step: 642840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:47,871-Speed 2611.84 samples/sec   Loss 3.0359   LearningRate 0.0051   Epoch: 15   Global Step: 642850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:59:51,787-Speed 2616.08 samples/sec   Loss 3.0604   LearningRate 0.0051   Epoch: 15   Global Step: 642860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 19:59:55,681-Speed 2630.27 samples/sec   Loss 3.0634   LearningRate 0.0051   Epoch: 15   Global Step: 642870   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 19:59:59,687-Speed 2557.06 samples/sec   Loss 3.0096   LearningRate 0.0051   Epoch: 15   Global Step: 642880   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:03,587-Speed 2626.48 samples/sec   Loss 3.1280   LearningRate 0.0051   Epoch: 15   Global Step: 642890   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:07,491-Speed 2623.90 samples/sec   Loss 3.1076   LearningRate 0.0051   Epoch: 15   Global Step: 642900   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:11,393-Speed 2624.14 samples/sec   Loss 3.0206   LearningRate 0.0051   Epoch: 15   Global Step: 642910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:15,291-Speed 2627.89 samples/sec   Loss 2.9695   LearningRate 0.0051   Epoch: 15   Global Step: 642920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:19,187-Speed 2629.29 samples/sec   Loss 3.0576   LearningRate 0.0051   Epoch: 15   Global Step: 642930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:23,085-Speed 2627.44 samples/sec   Loss 3.0704   LearningRate 0.0051   Epoch: 15   Global Step: 642940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:26,982-Speed 2628.55 samples/sec   Loss 3.1586   LearningRate 0.0051   Epoch: 15   Global Step: 642950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:30,877-Speed 2630.38 samples/sec   Loss 3.0793   LearningRate 0.0051   Epoch: 15   Global Step: 642960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:34,789-Speed 2617.94 samples/sec   Loss 3.0811   LearningRate 0.0051   Epoch: 15   Global Step: 642970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:00:38,685-Speed 2628.34 samples/sec   Loss 3.0675   LearningRate 0.0051   Epoch: 15   Global Step: 642980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:00:42,580-Speed 2630.08 samples/sec   Loss 3.0212   LearningRate 0.0051   Epoch: 15   Global Step: 642990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:00:46,476-Speed 2629.77 samples/sec   Loss 3.1644   LearningRate 0.0051   Epoch: 15   Global Step: 643000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:00:50,350-Speed 2643.61 samples/sec   Loss 3.0396   LearningRate 0.0051   Epoch: 15   Global Step: 643010   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:54,255-Speed 2632.20 samples/sec   Loss 3.0385   LearningRate 0.0051   Epoch: 15   Global Step: 643020   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:00:58,149-Speed 2630.03 samples/sec   Loss 3.1231   LearningRate 0.0051   Epoch: 15   Global Step: 643030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:02,052-Speed 2624.48 samples/sec   Loss 3.0675   LearningRate 0.0051   Epoch: 15   Global Step: 643040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:05,955-Speed 2623.90 samples/sec   Loss 3.0103   LearningRate 0.0051   Epoch: 15   Global Step: 643050   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:09,850-Speed 2630.47 samples/sec   Loss 2.9946   LearningRate 0.0051   Epoch: 15   Global Step: 643060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:13,745-Speed 2629.61 samples/sec   Loss 3.0180   LearningRate 0.0051   Epoch: 15   Global Step: 643070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:17,640-Speed 2629.49 samples/sec   Loss 3.1225   LearningRate 0.0051   Epoch: 15   Global Step: 643080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:21,590-Speed 2593.26 samples/sec   Loss 3.0506   LearningRate 0.0051   Epoch: 15   Global Step: 643090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:25,484-Speed 2630.90 samples/sec   Loss 3.0093   LearningRate 0.0051   Epoch: 15   Global Step: 643100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:01:29,376-Speed 2631.60 samples/sec   Loss 2.9860   LearningRate 0.0051   Epoch: 15   Global Step: 643110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:01:33,273-Speed 2628.17 samples/sec   Loss 3.0694   LearningRate 0.0051   Epoch: 15   Global Step: 643120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:01:37,170-Speed 2628.76 samples/sec   Loss 3.1818   LearningRate 0.0051   Epoch: 15   Global Step: 643130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:01:41,064-Speed 2629.71 samples/sec   Loss 3.1076   LearningRate 0.0051   Epoch: 15   Global Step: 643140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:01:44,958-Speed 2630.78 samples/sec   Loss 3.1536   LearningRate 0.0051   Epoch: 15   Global Step: 643150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:01:48,858-Speed 2626.24 samples/sec   Loss 3.0627   LearningRate 0.0050   Epoch: 15   Global Step: 643160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:01:52,753-Speed 2629.93 samples/sec   Loss 3.0715   LearningRate 0.0050   Epoch: 15   Global Step: 643170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:01:56,647-Speed 2630.09 samples/sec   Loss 3.0771   LearningRate 0.0050   Epoch: 15   Global Step: 643180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:00,544-Speed 2628.62 samples/sec   Loss 3.0113   LearningRate 0.0050   Epoch: 15   Global Step: 643190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:04,437-Speed 2630.90 samples/sec   Loss 3.1520   LearningRate 0.0050   Epoch: 15   Global Step: 643200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:08,333-Speed 2628.55 samples/sec   Loss 3.1091   LearningRate 0.0050   Epoch: 15   Global Step: 643210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:02:12,244-Speed 2618.95 samples/sec   Loss 3.1121   LearningRate 0.0050   Epoch: 15   Global Step: 643220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:16,139-Speed 2630.18 samples/sec   Loss 3.0638   LearningRate 0.0050   Epoch: 15   Global Step: 643230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:20,033-Speed 2630.21 samples/sec   Loss 3.1021   LearningRate 0.0050   Epoch: 15   Global Step: 643240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:23,984-Speed 2592.71 samples/sec   Loss 3.0853   LearningRate 0.0050   Epoch: 15   Global Step: 643250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:27,879-Speed 2629.78 samples/sec   Loss 2.9985   LearningRate 0.0050   Epoch: 15   Global Step: 643260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:31,776-Speed 2628.99 samples/sec   Loss 3.0270   LearningRate 0.0050   Epoch: 15   Global Step: 643270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:35,728-Speed 2591.35 samples/sec   Loss 3.0098   LearningRate 0.0050   Epoch: 15   Global Step: 643280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:39,627-Speed 2626.69 samples/sec   Loss 3.0476   LearningRate 0.0050   Epoch: 15   Global Step: 643290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:43,527-Speed 2626.44 samples/sec   Loss 3.0483   LearningRate 0.0050   Epoch: 15   Global Step: 643300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:47,419-Speed 2631.97 samples/sec   Loss 3.1364   LearningRate 0.0050   Epoch: 15   Global Step: 643310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:02:51,304-Speed 2636.67 samples/sec   Loss 2.9962   LearningRate 0.0050   Epoch: 15   Global Step: 643320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:02:55,206-Speed 2625.31 samples/sec   Loss 3.0356   LearningRate 0.0050   Epoch: 15   Global Step: 643330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:02:59,096-Speed 2634.26 samples/sec   Loss 3.0454   LearningRate 0.0050   Epoch: 15   Global Step: 643340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:03,014-Speed 2613.95 samples/sec   Loss 3.0915   LearningRate 0.0050   Epoch: 15   Global Step: 643350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:06,906-Speed 2631.59 samples/sec   Loss 3.0122   LearningRate 0.0050   Epoch: 15   Global Step: 643360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:10,798-Speed 2631.86 samples/sec   Loss 3.0417   LearningRate 0.0050   Epoch: 15   Global Step: 643370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:14,695-Speed 2628.25 samples/sec   Loss 3.0145   LearningRate 0.0050   Epoch: 15   Global Step: 643380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:18,593-Speed 2627.22 samples/sec   Loss 3.0633   LearningRate 0.0050   Epoch: 15   Global Step: 643390   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:22,617-Speed 2545.89 samples/sec   Loss 2.9911   LearningRate 0.0050   Epoch: 15   Global Step: 643400   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:26,612-Speed 2563.85 samples/sec   Loss 3.1199   LearningRate 0.0050   Epoch: 15   Global Step: 643410   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:03:30,518-Speed 2622.76 samples/sec   Loss 3.1106   LearningRate 0.0050   Epoch: 15   Global Step: 643420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:03:34,552-Speed 2539.00 samples/sec   Loss 3.0243   LearningRate 0.0050   Epoch: 15   Global Step: 643430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:03:38,646-Speed 2501.93 samples/sec   Loss 3.0291   LearningRate 0.0050   Epoch: 15   Global Step: 643440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:03:42,652-Speed 2556.67 samples/sec   Loss 3.0867   LearningRate 0.0050   Epoch: 15   Global Step: 643450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:03:46,550-Speed 2627.77 samples/sec   Loss 3.0683   LearningRate 0.0050   Epoch: 15   Global Step: 643460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:03:50,443-Speed 2631.03 samples/sec   Loss 3.0929   LearningRate 0.0050   Epoch: 15   Global Step: 643470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:03:54,339-Speed 2629.61 samples/sec   Loss 3.0992   LearningRate 0.0050   Epoch: 15   Global Step: 643480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:03:58,246-Speed 2621.10 samples/sec   Loss 3.0169   LearningRate 0.0050   Epoch: 15   Global Step: 643490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:04:02,144-Speed 2627.74 samples/sec   Loss 3.0880   LearningRate 0.0050   Epoch: 15   Global Step: 643500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:04:06,038-Speed 2630.39 samples/sec   Loss 3.1022   LearningRate 0.0050   Epoch: 15   Global Step: 643510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:04:09,932-Speed 2630.60 samples/sec   Loss 3.0543   LearningRate 0.0050   Epoch: 15   Global Step: 643520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:04:13,809-Speed 2641.65 samples/sec   Loss 3.1061   LearningRate 0.0050   Epoch: 15   Global Step: 643530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:04:17,685-Speed 2642.86 samples/sec   Loss 2.9623   LearningRate 0.0050   Epoch: 15   Global Step: 643540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:21,584-Speed 2627.19 samples/sec   Loss 3.0503   LearningRate 0.0050   Epoch: 15   Global Step: 643550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:25,485-Speed 2625.06 samples/sec   Loss 3.0801   LearningRate 0.0050   Epoch: 15   Global Step: 643560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:29,397-Speed 2618.19 samples/sec   Loss 3.0442   LearningRate 0.0050   Epoch: 15   Global Step: 643570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:33,296-Speed 2627.17 samples/sec   Loss 3.0544   LearningRate 0.0050   Epoch: 15   Global Step: 643580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:37,197-Speed 2625.55 samples/sec   Loss 3.0832   LearningRate 0.0050   Epoch: 15   Global Step: 643590   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:41,102-Speed 2623.36 samples/sec   Loss 2.9942   LearningRate 0.0050   Epoch: 15   Global Step: 643600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:44,999-Speed 2628.24 samples/sec   Loss 3.1458   LearningRate 0.0050   Epoch: 15   Global Step: 643610   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:48,895-Speed 2629.01 samples/sec   Loss 3.0162   LearningRate 0.0050   Epoch: 15   Global Step: 643620   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:52,924-Speed 2541.75 samples/sec   Loss 3.0960   LearningRate 0.0050   Epoch: 15   Global Step: 643630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:04:57,030-Speed 2494.86 samples/sec   Loss 3.0421   LearningRate 0.0050   Epoch: 15   Global Step: 643640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:00,969-Speed 2600.04 samples/sec   Loss 3.0835   LearningRate 0.0050   Epoch: 15   Global Step: 643650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:04,866-Speed 2628.48 samples/sec   Loss 3.1093   LearningRate 0.0050   Epoch: 15   Global Step: 643660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:08,776-Speed 2619.25 samples/sec   Loss 3.0333   LearningRate 0.0050   Epoch: 15   Global Step: 643670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:12,673-Speed 2628.22 samples/sec   Loss 3.0912   LearningRate 0.0050   Epoch: 15   Global Step: 643680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:16,587-Speed 2616.69 samples/sec   Loss 3.0859   LearningRate 0.0050   Epoch: 15   Global Step: 643690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:20,485-Speed 2627.84 samples/sec   Loss 3.1229   LearningRate 0.0050   Epoch: 15   Global Step: 643700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:24,380-Speed 2629.33 samples/sec   Loss 3.0163   LearningRate 0.0050   Epoch: 15   Global Step: 643710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:28,274-Speed 2630.63 samples/sec   Loss 3.0342   LearningRate 0.0050   Epoch: 15   Global Step: 643720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:32,171-Speed 2628.41 samples/sec   Loss 3.0434   LearningRate 0.0050   Epoch: 15   Global Step: 643730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:36,044-Speed 2643.88 samples/sec   Loss 2.9830   LearningRate 0.0050   Epoch: 15   Global Step: 643740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:39,936-Speed 2632.17 samples/sec   Loss 2.9967   LearningRate 0.0050   Epoch: 15   Global Step: 643750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:43,832-Speed 2628.50 samples/sec   Loss 3.0467   LearningRate 0.0050   Epoch: 15   Global Step: 643760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:47,741-Speed 2620.48 samples/sec   Loss 2.9732   LearningRate 0.0050   Epoch: 15   Global Step: 643770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:51,639-Speed 2627.03 samples/sec   Loss 2.9927   LearningRate 0.0050   Epoch: 15   Global Step: 643780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:55,537-Speed 2628.00 samples/sec   Loss 3.0157   LearningRate 0.0050   Epoch: 15   Global Step: 643790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:05:59,457-Speed 2612.62 samples/sec   Loss 3.0867   LearningRate 0.0050   Epoch: 15   Global Step: 643800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:03,354-Speed 2628.93 samples/sec   Loss 3.0616   LearningRate 0.0050   Epoch: 15   Global Step: 643810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:07,253-Speed 2626.76 samples/sec   Loss 3.0520   LearningRate 0.0050   Epoch: 15   Global Step: 643820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:11,150-Speed 2628.18 samples/sec   Loss 3.0670   LearningRate 0.0050   Epoch: 15   Global Step: 643830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:15,043-Speed 2630.66 samples/sec   Loss 3.0779   LearningRate 0.0050   Epoch: 15   Global Step: 643840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:18,944-Speed 2626.00 samples/sec   Loss 3.1002   LearningRate 0.0050   Epoch: 15   Global Step: 643850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:22,839-Speed 2629.39 samples/sec   Loss 3.0348   LearningRate 0.0050   Epoch: 15   Global Step: 643860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:26,750-Speed 2618.56 samples/sec   Loss 3.0499   LearningRate 0.0050   Epoch: 15   Global Step: 643870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:30,644-Speed 2630.53 samples/sec   Loss 3.0725   LearningRate 0.0050   Epoch: 15   Global Step: 643880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:34,542-Speed 2627.38 samples/sec   Loss 3.1005   LearningRate 0.0050   Epoch: 15   Global Step: 643890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:38,445-Speed 2624.52 samples/sec   Loss 3.0547   LearningRate 0.0050   Epoch: 15   Global Step: 643900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:42,340-Speed 2630.21 samples/sec   Loss 3.0600   LearningRate 0.0050   Epoch: 15   Global Step: 643910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:46,231-Speed 2632.19 samples/sec   Loss 3.0926   LearningRate 0.0050   Epoch: 15   Global Step: 643920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:50,125-Speed 2630.55 samples/sec   Loss 3.0774   LearningRate 0.0050   Epoch: 15   Global Step: 643930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:06:54,036-Speed 2619.05 samples/sec   Loss 3.1247   LearningRate 0.0050   Epoch: 15   Global Step: 643940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:06:57,956-Speed 2613.03 samples/sec   Loss 3.0045   LearningRate 0.0050   Epoch: 15   Global Step: 643950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:07:01,837-Speed 2638.88 samples/sec   Loss 3.1321   LearningRate 0.0050   Epoch: 15   Global Step: 643960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:05,730-Speed 2631.00 samples/sec   Loss 3.0611   LearningRate 0.0050   Epoch: 15   Global Step: 643970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:09,653-Speed 2610.65 samples/sec   Loss 3.1168   LearningRate 0.0050   Epoch: 15   Global Step: 643980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:13,566-Speed 2617.84 samples/sec   Loss 3.0000   LearningRate 0.0050   Epoch: 15   Global Step: 643990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:17,496-Speed 2606.36 samples/sec   Loss 3.0524   LearningRate 0.0050   Epoch: 15   Global Step: 644000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:21,407-Speed 2619.37 samples/sec   Loss 3.1034   LearningRate 0.0050   Epoch: 15   Global Step: 644010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:25,326-Speed 2613.15 samples/sec   Loss 3.0263   LearningRate 0.0050   Epoch: 15   Global Step: 644020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:29,245-Speed 2614.02 samples/sec   Loss 3.0669   LearningRate 0.0050   Epoch: 15   Global Step: 644030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:33,148-Speed 2624.24 samples/sec   Loss 3.0589   LearningRate 0.0050   Epoch: 15   Global Step: 644040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:37,045-Speed 2628.32 samples/sec   Loss 3.0787   LearningRate 0.0050   Epoch: 15   Global Step: 644050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:40,925-Speed 2640.00 samples/sec   Loss 3.1175   LearningRate 0.0050   Epoch: 15   Global Step: 644060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:44,839-Speed 2617.10 samples/sec   Loss 3.0071   LearningRate 0.0050   Epoch: 15   Global Step: 644070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:48,733-Speed 2629.99 samples/sec   Loss 3.0554   LearningRate 0.0050   Epoch: 15   Global Step: 644080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:52,630-Speed 2629.14 samples/sec   Loss 3.0222   LearningRate 0.0050   Epoch: 15   Global Step: 644090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:07:56,538-Speed 2620.40 samples/sec   Loss 3.1511   LearningRate 0.0050   Epoch: 15   Global Step: 644100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:08:00,418-Speed 2639.84 samples/sec   Loss 3.1061   LearningRate 0.0050   Epoch: 15   Global Step: 644110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:04,320-Speed 2625.51 samples/sec   Loss 3.0811   LearningRate 0.0050   Epoch: 15   Global Step: 644120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:08,216-Speed 2628.54 samples/sec   Loss 2.9784   LearningRate 0.0050   Epoch: 15   Global Step: 644130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:12,111-Speed 2629.86 samples/sec   Loss 3.0825   LearningRate 0.0050   Epoch: 15   Global Step: 644140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:16,012-Speed 2625.65 samples/sec   Loss 3.0647   LearningRate 0.0050   Epoch: 15   Global Step: 644150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:19,907-Speed 2629.89 samples/sec   Loss 3.0625   LearningRate 0.0050   Epoch: 15   Global Step: 644160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:23,801-Speed 2630.67 samples/sec   Loss 3.0535   LearningRate 0.0050   Epoch: 15   Global Step: 644170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:27,703-Speed 2624.30 samples/sec   Loss 3.0615   LearningRate 0.0050   Epoch: 15   Global Step: 644180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:31,601-Speed 2628.51 samples/sec   Loss 3.0233   LearningRate 0.0050   Epoch: 15   Global Step: 644190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:35,503-Speed 2624.64 samples/sec   Loss 3.0516   LearningRate 0.0050   Epoch: 15   Global Step: 644200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:39,402-Speed 2627.14 samples/sec   Loss 3.0171   LearningRate 0.0050   Epoch: 15   Global Step: 644210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:08:43,306-Speed 2623.24 samples/sec   Loss 3.0858   LearningRate 0.0050   Epoch: 15   Global Step: 644220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:08:47,211-Speed 2622.80 samples/sec   Loss 3.0371   LearningRate 0.0050   Epoch: 15   Global Step: 644230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:08:51,109-Speed 2628.22 samples/sec   Loss 3.0352   LearningRate 0.0050   Epoch: 15   Global Step: 644240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:08:54,988-Speed 2640.75 samples/sec   Loss 3.1152   LearningRate 0.0050   Epoch: 15   Global Step: 644250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:08:58,901-Speed 2617.17 samples/sec   Loss 3.0678   LearningRate 0.0050   Epoch: 15   Global Step: 644260   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:02,811-Speed 2620.09 samples/sec   Loss 3.0990   LearningRate 0.0050   Epoch: 15   Global Step: 644270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:06,711-Speed 2625.67 samples/sec   Loss 3.0851   LearningRate 0.0050   Epoch: 15   Global Step: 644280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:10,616-Speed 2622.66 samples/sec   Loss 3.1116   LearningRate 0.0050   Epoch: 15   Global Step: 644290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:14,515-Speed 2626.74 samples/sec   Loss 3.0532   LearningRate 0.0050   Epoch: 15   Global Step: 644300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:18,415-Speed 2627.71 samples/sec   Loss 2.9920   LearningRate 0.0050   Epoch: 15   Global Step: 644310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:22,316-Speed 2625.57 samples/sec   Loss 3.0105   LearningRate 0.0050   Epoch: 15   Global Step: 644320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:26,231-Speed 2616.41 samples/sec   Loss 3.0432   LearningRate 0.0050   Epoch: 15   Global Step: 644330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:30,128-Speed 2629.12 samples/sec   Loss 3.1090   LearningRate 0.0050   Epoch: 15   Global Step: 644340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:34,027-Speed 2626.94 samples/sec   Loss 3.1299   LearningRate 0.0050   Epoch: 15   Global Step: 644350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:09:37,923-Speed 2629.05 samples/sec   Loss 3.0442   LearningRate 0.0050   Epoch: 15   Global Step: 644360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:09:41,819-Speed 2628.50 samples/sec   Loss 3.0794   LearningRate 0.0050   Epoch: 15   Global Step: 644370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:09:45,731-Speed 2619.03 samples/sec   Loss 2.9757   LearningRate 0.0050   Epoch: 15   Global Step: 644380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:49,636-Speed 2623.16 samples/sec   Loss 3.0722   LearningRate 0.0050   Epoch: 15   Global Step: 644390   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:53,536-Speed 2625.82 samples/sec   Loss 3.0626   LearningRate 0.0050   Epoch: 15   Global Step: 644400   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:09:57,439-Speed 2624.75 samples/sec   Loss 3.0491   LearningRate 0.0050   Epoch: 15   Global Step: 644410   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:01,339-Speed 2626.48 samples/sec   Loss 3.0828   LearningRate 0.0050   Epoch: 15   Global Step: 644420   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:05,244-Speed 2622.71 samples/sec   Loss 2.9157   LearningRate 0.0050   Epoch: 15   Global Step: 644430   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:09,141-Speed 2628.21 samples/sec   Loss 3.0473   LearningRate 0.0050   Epoch: 15   Global Step: 644440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:13,048-Speed 2621.57 samples/sec   Loss 3.0511   LearningRate 0.0050   Epoch: 15   Global Step: 644450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:16,944-Speed 2628.70 samples/sec   Loss 3.0494   LearningRate 0.0050   Epoch: 15   Global Step: 644460   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:20,845-Speed 2626.04 samples/sec   Loss 3.0198   LearningRate 0.0050   Epoch: 15   Global Step: 644470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:24,760-Speed 2616.38 samples/sec   Loss 3.0238   LearningRate 0.0050   Epoch: 15   Global Step: 644480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:10:28,652-Speed 2631.13 samples/sec   Loss 3.0622   LearningRate 0.0050   Epoch: 15   Global Step: 644490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:32,549-Speed 2628.32 samples/sec   Loss 3.0721   LearningRate 0.0050   Epoch: 15   Global Step: 644500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:36,454-Speed 2623.08 samples/sec   Loss 3.0299   LearningRate 0.0050   Epoch: 15   Global Step: 644510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:40,358-Speed 2623.77 samples/sec   Loss 3.0169   LearningRate 0.0050   Epoch: 15   Global Step: 644520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:44,266-Speed 2620.89 samples/sec   Loss 3.0602   LearningRate 0.0050   Epoch: 15   Global Step: 644530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:48,168-Speed 2625.51 samples/sec   Loss 3.0412   LearningRate 0.0050   Epoch: 15   Global Step: 644540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:52,077-Speed 2620.35 samples/sec   Loss 3.0543   LearningRate 0.0050   Epoch: 15   Global Step: 644550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:55,970-Speed 2630.66 samples/sec   Loss 3.1492   LearningRate 0.0050   Epoch: 15   Global Step: 644560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:10:59,864-Speed 2630.35 samples/sec   Loss 3.0375   LearningRate 0.0050   Epoch: 15   Global Step: 644570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:11:03,757-Speed 2631.24 samples/sec   Loss 3.0677   LearningRate 0.0050   Epoch: 15   Global Step: 644580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:11:07,654-Speed 2628.37 samples/sec   Loss 3.0122   LearningRate 0.0050   Epoch: 15   Global Step: 644590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:11,547-Speed 2630.82 samples/sec   Loss 2.9712   LearningRate 0.0050   Epoch: 15   Global Step: 644600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:15,448-Speed 2626.02 samples/sec   Loss 3.0617   LearningRate 0.0050   Epoch: 15   Global Step: 644610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:19,360-Speed 2618.43 samples/sec   Loss 3.0806   LearningRate 0.0050   Epoch: 15   Global Step: 644620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:23,257-Speed 2628.02 samples/sec   Loss 3.0064   LearningRate 0.0050   Epoch: 15   Global Step: 644630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:27,160-Speed 2624.66 samples/sec   Loss 3.1019   LearningRate 0.0050   Epoch: 15   Global Step: 644640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:31,060-Speed 2626.68 samples/sec   Loss 3.0274   LearningRate 0.0050   Epoch: 15   Global Step: 644650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:34,966-Speed 2622.35 samples/sec   Loss 3.0419   LearningRate 0.0050   Epoch: 15   Global Step: 644660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:11:38,849-Speed 2637.97 samples/sec   Loss 2.9875   LearningRate 0.0050   Epoch: 15   Global Step: 644670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:11:42,743-Speed 2629.72 samples/sec   Loss 3.0562   LearningRate 0.0050   Epoch: 15   Global Step: 644680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:11:46,667-Speed 2610.62 samples/sec   Loss 3.0323   LearningRate 0.0050   Epoch: 15   Global Step: 644690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:11:50,578-Speed 2619.12 samples/sec   Loss 3.0780   LearningRate 0.0050   Epoch: 15   Global Step: 644700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:11:54,472-Speed 2629.94 samples/sec   Loss 3.0401   LearningRate 0.0050   Epoch: 15   Global Step: 644710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:11:58,369-Speed 2628.66 samples/sec   Loss 3.0509   LearningRate 0.0050   Epoch: 15   Global Step: 644720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:12:02,271-Speed 2624.86 samples/sec   Loss 3.0020   LearningRate 0.0050   Epoch: 15   Global Step: 644730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:12:06,168-Speed 2628.32 samples/sec   Loss 3.0541   LearningRate 0.0050   Epoch: 15   Global Step: 644740   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:12:10,076-Speed 2620.83 samples/sec   Loss 3.1250   LearningRate 0.0050   Epoch: 15   Global Step: 644750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:12:13,972-Speed 2628.90 samples/sec   Loss 3.0429   LearningRate 0.0050   Epoch: 15   Global Step: 644760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:12:17,864-Speed 2632.64 samples/sec   Loss 3.1189   LearningRate 0.0050   Epoch: 15   Global Step: 644770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:21,756-Speed 2631.60 samples/sec   Loss 3.0398   LearningRate 0.0050   Epoch: 15   Global Step: 644780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:25,646-Speed 2632.79 samples/sec   Loss 3.0731   LearningRate 0.0050   Epoch: 15   Global Step: 644790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:29,545-Speed 2627.05 samples/sec   Loss 3.0003   LearningRate 0.0050   Epoch: 15   Global Step: 644800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:33,458-Speed 2617.81 samples/sec   Loss 3.0204   LearningRate 0.0050   Epoch: 15   Global Step: 644810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:37,369-Speed 2619.45 samples/sec   Loss 3.0082   LearningRate 0.0050   Epoch: 15   Global Step: 644820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:41,366-Speed 2562.25 samples/sec   Loss 3.0007   LearningRate 0.0050   Epoch: 15   Global Step: 644830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:45,273-Speed 2621.54 samples/sec   Loss 3.0903   LearningRate 0.0050   Epoch: 15   Global Step: 644840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:49,192-Speed 2614.23 samples/sec   Loss 3.1120   LearningRate 0.0050   Epoch: 15   Global Step: 644850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:53,093-Speed 2625.51 samples/sec   Loss 3.1095   LearningRate 0.0050   Epoch: 15   Global Step: 644860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:12:56,994-Speed 2626.30 samples/sec   Loss 2.9582   LearningRate 0.0050   Epoch: 15   Global Step: 644870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:13:00,905-Speed 2618.38 samples/sec   Loss 2.9963   LearningRate 0.0050   Epoch: 15   Global Step: 644880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:13:04,807-Speed 2625.17 samples/sec   Loss 3.0790   LearningRate 0.0050   Epoch: 15   Global Step: 644890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:13:08,701-Speed 2630.44 samples/sec   Loss 3.1060   LearningRate 0.0050   Epoch: 15   Global Step: 644900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:13:12,618-Speed 2615.39 samples/sec   Loss 3.0296   LearningRate 0.0050   Epoch: 15   Global Step: 644910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:16,517-Speed 2626.79 samples/sec   Loss 3.0363   LearningRate 0.0050   Epoch: 15   Global Step: 644920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:20,416-Speed 2626.68 samples/sec   Loss 3.0735   LearningRate 0.0050   Epoch: 15   Global Step: 644930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:24,369-Speed 2593.80 samples/sec   Loss 3.0607   LearningRate 0.0050   Epoch: 15   Global Step: 644940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:28,279-Speed 2620.06 samples/sec   Loss 3.0595   LearningRate 0.0050   Epoch: 15   Global Step: 644950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:32,185-Speed 2622.50 samples/sec   Loss 3.0551   LearningRate 0.0050   Epoch: 15   Global Step: 644960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:36,083-Speed 2627.01 samples/sec   Loss 3.0516   LearningRate 0.0050   Epoch: 15   Global Step: 644970   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:40,008-Speed 2609.32 samples/sec   Loss 3.0319   LearningRate 0.0050   Epoch: 15   Global Step: 644980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:43,918-Speed 2619.61 samples/sec   Loss 3.0571   LearningRate 0.0050   Epoch: 15   Global Step: 644990   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:47,815-Speed 2629.03 samples/sec   Loss 2.9579   LearningRate 0.0050   Epoch: 15   Global Step: 645000   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:51,722-Speed 2621.15 samples/sec   Loss 2.9362   LearningRate 0.0050   Epoch: 15   Global Step: 645010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:13:55,592-Speed 2646.91 samples/sec   Loss 3.0437   LearningRate 0.0049   Epoch: 15   Global Step: 645020   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:13:59,489-Speed 2628.50 samples/sec   Loss 3.0174   LearningRate 0.0049   Epoch: 15   Global Step: 645030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:03,394-Speed 2622.96 samples/sec   Loss 3.0544   LearningRate 0.0049   Epoch: 15   Global Step: 645040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:07,295-Speed 2625.90 samples/sec   Loss 3.1066   LearningRate 0.0049   Epoch: 15   Global Step: 645050   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:11,190-Speed 2629.53 samples/sec   Loss 3.0461   LearningRate 0.0049   Epoch: 15   Global Step: 645060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:15,090-Speed 2626.24 samples/sec   Loss 3.0494   LearningRate 0.0049   Epoch: 15   Global Step: 645070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:18,986-Speed 2628.93 samples/sec   Loss 3.0432   LearningRate 0.0049   Epoch: 15   Global Step: 645080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:22,883-Speed 2629.15 samples/sec   Loss 3.0282   LearningRate 0.0049   Epoch: 15   Global Step: 645090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:26,837-Speed 2589.96 samples/sec   Loss 3.0002   LearningRate 0.0049   Epoch: 15   Global Step: 645100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:30,733-Speed 2629.33 samples/sec   Loss 3.0245   LearningRate 0.0049   Epoch: 15   Global Step: 645110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:34,628-Speed 2629.24 samples/sec   Loss 3.0794   LearningRate 0.0049   Epoch: 15   Global Step: 645120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:14:38,525-Speed 2628.47 samples/sec   Loss 3.1480   LearningRate 0.0049   Epoch: 15   Global Step: 645130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:14:42,394-Speed 2647.57 samples/sec   Loss 2.9554   LearningRate 0.0049   Epoch: 15   Global Step: 645140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:46,290-Speed 2628.93 samples/sec   Loss 3.0797   LearningRate 0.0049   Epoch: 15   Global Step: 645150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:50,216-Speed 2608.26 samples/sec   Loss 2.9797   LearningRate 0.0049   Epoch: 15   Global Step: 645160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:54,110-Speed 2631.01 samples/sec   Loss 2.9814   LearningRate 0.0049   Epoch: 15   Global Step: 645170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:14:58,009-Speed 2626.93 samples/sec   Loss 2.9865   LearningRate 0.0049   Epoch: 15   Global Step: 645180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:01,913-Speed 2623.81 samples/sec   Loss 3.0816   LearningRate 0.0049   Epoch: 15   Global Step: 645190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:05,807-Speed 2630.10 samples/sec   Loss 3.0690   LearningRate 0.0049   Epoch: 15   Global Step: 645200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:09,779-Speed 2578.66 samples/sec   Loss 3.0376   LearningRate 0.0049   Epoch: 15   Global Step: 645210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:13,671-Speed 2631.89 samples/sec   Loss 3.1102   LearningRate 0.0049   Epoch: 15   Global Step: 645220   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:17,565-Speed 2630.42 samples/sec   Loss 3.0444   LearningRate 0.0049   Epoch: 15   Global Step: 645230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:21,466-Speed 2626.01 samples/sec   Loss 3.0961   LearningRate 0.0049   Epoch: 15   Global Step: 645240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:15:25,412-Speed 2595.56 samples/sec   Loss 3.0555   LearningRate 0.0049   Epoch: 15   Global Step: 645250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:15:29,495-Speed 2508.26 samples/sec   Loss 3.0151   LearningRate 0.0049   Epoch: 15   Global Step: 645260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:15:33,519-Speed 2545.92 samples/sec   Loss 3.0632   LearningRate 0.0049   Epoch: 15   Global Step: 645270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:15:37,415-Speed 2628.87 samples/sec   Loss 3.0000   LearningRate 0.0049   Epoch: 15   Global Step: 645280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:15:41,297-Speed 2638.13 samples/sec   Loss 3.0392   LearningRate 0.0049   Epoch: 15   Global Step: 645290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:45,189-Speed 2631.66 samples/sec   Loss 3.0991   LearningRate 0.0049   Epoch: 15   Global Step: 645300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:49,096-Speed 2621.80 samples/sec   Loss 2.9608   LearningRate 0.0049   Epoch: 15   Global Step: 645310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:52,987-Speed 2632.46 samples/sec   Loss 3.0580   LearningRate 0.0049   Epoch: 15   Global Step: 645320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:15:56,888-Speed 2625.46 samples/sec   Loss 3.0539   LearningRate 0.0049   Epoch: 15   Global Step: 645330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:16:00,788-Speed 2626.55 samples/sec   Loss 3.0945   LearningRate 0.0049   Epoch: 15   Global Step: 645340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:16:04,690-Speed 2625.06 samples/sec   Loss 3.0038   LearningRate 0.0049   Epoch: 15   Global Step: 645350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:16:08,612-Speed 2611.41 samples/sec   Loss 3.0159   LearningRate 0.0049   Epoch: 15   Global Step: 645360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:16:12,516-Speed 2623.20 samples/sec   Loss 3.0686   LearningRate 0.0049   Epoch: 15   Global Step: 645370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:16:16,417-Speed 2625.91 samples/sec   Loss 3.0744   LearningRate 0.0049   Epoch: 15   Global Step: 645380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:16:20,309-Speed 2631.47 samples/sec   Loss 2.9989   LearningRate 0.0049   Epoch: 15   Global Step: 645390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:24,203-Speed 2630.62 samples/sec   Loss 3.0729   LearningRate 0.0049   Epoch: 15   Global Step: 645400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:28,106-Speed 2624.42 samples/sec   Loss 3.1033   LearningRate 0.0049   Epoch: 15   Global Step: 645410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:32,192-Speed 2507.02 samples/sec   Loss 2.9484   LearningRate 0.0049   Epoch: 15   Global Step: 645420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:36,146-Speed 2589.97 samples/sec   Loss 3.0277   LearningRate 0.0049   Epoch: 15   Global Step: 645430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:40,056-Speed 2620.72 samples/sec   Loss 2.9881   LearningRate 0.0049   Epoch: 15   Global Step: 645440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:43,987-Speed 2605.74 samples/sec   Loss 3.0239   LearningRate 0.0049   Epoch: 15   Global Step: 645450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:47,902-Speed 2616.80 samples/sec   Loss 3.0312   LearningRate 0.0049   Epoch: 15   Global Step: 645460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:51,799-Speed 2627.81 samples/sec   Loss 3.0045   LearningRate 0.0049   Epoch: 15   Global Step: 645470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:55,697-Speed 2627.85 samples/sec   Loss 3.0216   LearningRate 0.0049   Epoch: 15   Global Step: 645480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:16:59,611-Speed 2616.82 samples/sec   Loss 3.0267   LearningRate 0.0049   Epoch: 15   Global Step: 645490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:17:03,492-Speed 2639.78 samples/sec   Loss 3.0770   LearningRate 0.0049   Epoch: 15   Global Step: 645500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:17:07,397-Speed 2623.54 samples/sec   Loss 2.9853   LearningRate 0.0049   Epoch: 15   Global Step: 645510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:17:11,267-Speed 2646.62 samples/sec   Loss 3.0142   LearningRate 0.0049   Epoch: 15   Global Step: 645520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:15,175-Speed 2620.97 samples/sec   Loss 3.0345   LearningRate 0.0049   Epoch: 15   Global Step: 645530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:19,078-Speed 2624.10 samples/sec   Loss 3.0177   LearningRate 0.0049   Epoch: 15   Global Step: 645540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:22,973-Speed 2629.56 samples/sec   Loss 3.0275   LearningRate 0.0049   Epoch: 15   Global Step: 645550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:26,869-Speed 2628.85 samples/sec   Loss 3.0106   LearningRate 0.0049   Epoch: 15   Global Step: 645560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:30,765-Speed 2629.52 samples/sec   Loss 2.9620   LearningRate 0.0049   Epoch: 15   Global Step: 645570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:34,666-Speed 2625.42 samples/sec   Loss 3.0227   LearningRate 0.0049   Epoch: 15   Global Step: 645580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:38,564-Speed 2627.90 samples/sec   Loss 3.0389   LearningRate 0.0049   Epoch: 15   Global Step: 645590   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:42,474-Speed 2619.23 samples/sec   Loss 3.0241   LearningRate 0.0049   Epoch: 15   Global Step: 645600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:46,396-Speed 2611.92 samples/sec   Loss 2.9665   LearningRate 0.0049   Epoch: 15   Global Step: 645610   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:17:50,292-Speed 2628.95 samples/sec   Loss 3.0899   LearningRate 0.0049   Epoch: 15   Global Step: 645620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:17:54,188-Speed 2628.93 samples/sec   Loss 3.0156   LearningRate 0.0049   Epoch: 15   Global Step: 645630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:17:58,089-Speed 2625.67 samples/sec   Loss 3.0302   LearningRate 0.0049   Epoch: 15   Global Step: 645640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:18:01,995-Speed 2622.46 samples/sec   Loss 3.0049   LearningRate 0.0049   Epoch: 15   Global Step: 645650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:18:05,889-Speed 2630.49 samples/sec   Loss 2.9683   LearningRate 0.0049   Epoch: 15   Global Step: 645660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:18:09,816-Speed 2608.51 samples/sec   Loss 3.0071   LearningRate 0.0049   Epoch: 15   Global Step: 645670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:18:13,712-Speed 2628.71 samples/sec   Loss 3.0010   LearningRate 0.0049   Epoch: 15   Global Step: 645680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:18:17,608-Speed 2629.63 samples/sec   Loss 2.9879   LearningRate 0.0049   Epoch: 15   Global Step: 645690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:18:21,481-Speed 2644.23 samples/sec   Loss 3.0220   LearningRate 0.0049   Epoch: 15   Global Step: 645700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:25,374-Speed 2630.62 samples/sec   Loss 3.0024   LearningRate 0.0049   Epoch: 15   Global Step: 645710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:29,280-Speed 2622.39 samples/sec   Loss 2.9884   LearningRate 0.0049   Epoch: 15   Global Step: 645720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:33,178-Speed 2627.80 samples/sec   Loss 2.9740   LearningRate 0.0049   Epoch: 15   Global Step: 645730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:37,071-Speed 2630.69 samples/sec   Loss 3.0865   LearningRate 0.0049   Epoch: 15   Global Step: 645740   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:40,970-Speed 2627.02 samples/sec   Loss 2.9322   LearningRate 0.0049   Epoch: 15   Global Step: 645750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:44,864-Speed 2630.99 samples/sec   Loss 3.0360   LearningRate 0.0049   Epoch: 15   Global Step: 645760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:48,756-Speed 2631.59 samples/sec   Loss 3.0081   LearningRate 0.0049   Epoch: 15   Global Step: 645770   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:52,661-Speed 2622.56 samples/sec   Loss 2.9516   LearningRate 0.0049   Epoch: 15   Global Step: 645780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:18:56,555-Speed 2630.22 samples/sec   Loss 3.0135   LearningRate 0.0049   Epoch: 15   Global Step: 645790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:00,447-Speed 2631.48 samples/sec   Loss 2.9719   LearningRate 0.0049   Epoch: 15   Global Step: 645800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:19:04,338-Speed 2632.64 samples/sec   Loss 2.9832   LearningRate 0.0049   Epoch: 15   Global Step: 645810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:19:08,226-Speed 2634.51 samples/sec   Loss 3.1177   LearningRate 0.0049   Epoch: 15   Global Step: 645820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:19:12,118-Speed 2631.46 samples/sec   Loss 3.0653   LearningRate 0.0049   Epoch: 15   Global Step: 645830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:19:16,025-Speed 2621.62 samples/sec   Loss 3.0508   LearningRate 0.0049   Epoch: 15   Global Step: 645840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:19:19,923-Speed 2627.73 samples/sec   Loss 3.0614   LearningRate 0.0049   Epoch: 15   Global Step: 645850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:19:23,798-Speed 2643.05 samples/sec   Loss 2.9928   LearningRate 0.0049   Epoch: 15   Global Step: 645860   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:27,689-Speed 2633.33 samples/sec   Loss 3.0429   LearningRate 0.0049   Epoch: 15   Global Step: 645870   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:31,586-Speed 2627.87 samples/sec   Loss 2.9771   LearningRate 0.0049   Epoch: 15   Global Step: 645880   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:35,484-Speed 2627.62 samples/sec   Loss 3.0444   LearningRate 0.0049   Epoch: 15   Global Step: 645890   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:39,400-Speed 2615.10 samples/sec   Loss 3.1056   LearningRate 0.0049   Epoch: 15   Global Step: 645900   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:43,300-Speed 2626.64 samples/sec   Loss 2.9497   LearningRate 0.0049   Epoch: 15   Global Step: 645910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:47,193-Speed 2630.87 samples/sec   Loss 2.9711   LearningRate 0.0049   Epoch: 15   Global Step: 645920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:51,090-Speed 2628.94 samples/sec   Loss 2.9431   LearningRate 0.0049   Epoch: 15   Global Step: 645930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:54,982-Speed 2631.52 samples/sec   Loss 3.0284   LearningRate 0.0049   Epoch: 15   Global Step: 645940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:19:58,878-Speed 2629.16 samples/sec   Loss 3.0604   LearningRate 0.0049   Epoch: 15   Global Step: 645950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:20:02,778-Speed 2626.18 samples/sec   Loss 3.0711   LearningRate 0.0049   Epoch: 15   Global Step: 645960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:06,680-Speed 2624.65 samples/sec   Loss 3.0525   LearningRate 0.0049   Epoch: 15   Global Step: 645970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:10,575-Speed 2629.56 samples/sec   Loss 3.0376   LearningRate 0.0049   Epoch: 15   Global Step: 645980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:14,473-Speed 2628.05 samples/sec   Loss 2.9673   LearningRate 0.0049   Epoch: 15   Global Step: 645990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:18,374-Speed 2626.34 samples/sec   Loss 3.0207   LearningRate 0.0049   Epoch: 15   Global Step: 646000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:22,271-Speed 2627.66 samples/sec   Loss 2.9708   LearningRate 0.0049   Epoch: 15   Global Step: 646010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:26,178-Speed 2621.59 samples/sec   Loss 3.0146   LearningRate 0.0049   Epoch: 15   Global Step: 646020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:30,076-Speed 2628.26 samples/sec   Loss 2.9751   LearningRate 0.0049   Epoch: 15   Global Step: 646030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:33,973-Speed 2628.23 samples/sec   Loss 3.0495   LearningRate 0.0049   Epoch: 15   Global Step: 646040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:37,866-Speed 2630.86 samples/sec   Loss 3.0193   LearningRate 0.0049   Epoch: 15   Global Step: 646050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:41,768-Speed 2624.80 samples/sec   Loss 2.9479   LearningRate 0.0049   Epoch: 15   Global Step: 646060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:20:45,644-Speed 2642.70 samples/sec   Loss 3.0042   LearningRate 0.0049   Epoch: 15   Global Step: 646070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:49,548-Speed 2623.85 samples/sec   Loss 2.8987   LearningRate 0.0049   Epoch: 15   Global Step: 646080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:53,450-Speed 2624.87 samples/sec   Loss 2.9931   LearningRate 0.0049   Epoch: 15   Global Step: 646090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:20:57,353-Speed 2629.97 samples/sec   Loss 3.0083   LearningRate 0.0049   Epoch: 15   Global Step: 646100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:21:01,243-Speed 2632.66 samples/sec   Loss 2.9503   LearningRate 0.0049   Epoch: 15   Global Step: 646110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:21:05,145-Speed 2625.28 samples/sec   Loss 3.0124   LearningRate 0.0049   Epoch: 15   Global Step: 646120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:21:09,020-Speed 2643.34 samples/sec   Loss 3.0065   LearningRate 0.0049   Epoch: 15   Global Step: 646130   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:12,921-Speed 2625.65 samples/sec   Loss 3.0107   LearningRate 0.0049   Epoch: 15   Global Step: 646140   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:16,814-Speed 2631.02 samples/sec   Loss 3.0293   LearningRate 0.0049   Epoch: 15   Global Step: 646150   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:20,718-Speed 2623.24 samples/sec   Loss 3.0343   LearningRate 0.0049   Epoch: 15   Global Step: 646160   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:24,611-Speed 2631.40 samples/sec   Loss 3.0358   LearningRate 0.0049   Epoch: 15   Global Step: 646170   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:28,520-Speed 2620.26 samples/sec   Loss 3.0163   LearningRate 0.0049   Epoch: 15   Global Step: 646180   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:32,413-Speed 2631.14 samples/sec   Loss 3.0624   LearningRate 0.0049   Epoch: 15   Global Step: 646190   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:36,307-Speed 2630.36 samples/sec   Loss 2.9498   LearningRate 0.0049   Epoch: 15   Global Step: 646200   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:40,198-Speed 2631.99 samples/sec   Loss 2.9239   LearningRate 0.0049   Epoch: 15   Global Step: 646210   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:44,094-Speed 2629.50 samples/sec   Loss 2.9886   LearningRate 0.0049   Epoch: 15   Global Step: 646220   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:21:47,985-Speed 2632.60 samples/sec   Loss 3.0323   LearningRate 0.0049   Epoch: 15   Global Step: 646230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:21:51,908-Speed 2610.80 samples/sec   Loss 3.0797   LearningRate 0.0049   Epoch: 15   Global Step: 646240   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:21:55,801-Speed 2631.34 samples/sec   Loss 3.0939   LearningRate 0.0049   Epoch: 15   Global Step: 646250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:21:59,699-Speed 2627.57 samples/sec   Loss 3.0759   LearningRate 0.0049   Epoch: 15   Global Step: 646260   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:22:03,606-Speed 2621.65 samples/sec   Loss 3.0438   LearningRate 0.0049   Epoch: 15   Global Step: 646270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:22:07,498-Speed 2631.53 samples/sec   Loss 3.0391   LearningRate 0.0049   Epoch: 15   Global Step: 646280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:22:11,389-Speed 2632.76 samples/sec   Loss 2.9850   LearningRate 0.0049   Epoch: 15   Global Step: 646290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:22:15,362-Speed 2577.82 samples/sec   Loss 2.9745   LearningRate 0.0049   Epoch: 15   Global Step: 646300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:22:19,266-Speed 2624.11 samples/sec   Loss 3.0098   LearningRate 0.0049   Epoch: 15   Global Step: 646310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:22:23,165-Speed 2627.24 samples/sec   Loss 3.0753   LearningRate 0.0049   Epoch: 15   Global Step: 646320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:22:27,058-Speed 2630.65 samples/sec   Loss 3.0410   LearningRate 0.0049   Epoch: 15   Global Step: 646330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:30,965-Speed 2621.69 samples/sec   Loss 3.0328   LearningRate 0.0049   Epoch: 15   Global Step: 646340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:34,859-Speed 2629.96 samples/sec   Loss 3.0021   LearningRate 0.0049   Epoch: 15   Global Step: 646350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:38,761-Speed 2625.40 samples/sec   Loss 3.0338   LearningRate 0.0049   Epoch: 15   Global Step: 646360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:42,668-Speed 2621.04 samples/sec   Loss 2.9716   LearningRate 0.0049   Epoch: 15   Global Step: 646370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:46,562-Speed 2631.04 samples/sec   Loss 3.1027   LearningRate 0.0049   Epoch: 15   Global Step: 646380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:50,463-Speed 2625.92 samples/sec   Loss 3.0355   LearningRate 0.0049   Epoch: 15   Global Step: 646390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:54,372-Speed 2625.38 samples/sec   Loss 3.1120   LearningRate 0.0049   Epoch: 15   Global Step: 646400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:22:58,268-Speed 2628.72 samples/sec   Loss 3.0901   LearningRate 0.0049   Epoch: 15   Global Step: 646410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:02,163-Speed 2629.87 samples/sec   Loss 3.0281   LearningRate 0.0049   Epoch: 15   Global Step: 646420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:06,149-Speed 2569.37 samples/sec   Loss 3.0429   LearningRate 0.0049   Epoch: 15   Global Step: 646430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:23:10,046-Speed 2628.26 samples/sec   Loss 2.9915   LearningRate 0.0049   Epoch: 15   Global Step: 646440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:23:13,940-Speed 2630.28 samples/sec   Loss 3.0175   LearningRate 0.0049   Epoch: 15   Global Step: 646450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:17,832-Speed 2631.59 samples/sec   Loss 3.0802   LearningRate 0.0049   Epoch: 15   Global Step: 646460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:21,779-Speed 2595.06 samples/sec   Loss 2.9632   LearningRate 0.0049   Epoch: 15   Global Step: 646470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:25,696-Speed 2614.84 samples/sec   Loss 3.0138   LearningRate 0.0049   Epoch: 15   Global Step: 646480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:29,593-Speed 2628.11 samples/sec   Loss 3.0024   LearningRate 0.0049   Epoch: 15   Global Step: 646490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:33,489-Speed 2628.92 samples/sec   Loss 2.9877   LearningRate 0.0049   Epoch: 15   Global Step: 646500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:23:37,384-Speed 2629.92 samples/sec   Loss 3.0473   LearningRate 0.0049   Epoch: 15   Global Step: 646510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:23:41,345-Speed 2585.76 samples/sec   Loss 2.9523   LearningRate 0.0049   Epoch: 15   Global Step: 646520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:23:45,252-Speed 2621.77 samples/sec   Loss 2.9879   LearningRate 0.0049   Epoch: 15   Global Step: 646530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:23:49,149-Speed 2627.63 samples/sec   Loss 2.9536   LearningRate 0.0049   Epoch: 15   Global Step: 646540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:23:53,054-Speed 2623.49 samples/sec   Loss 2.9417   LearningRate 0.0049   Epoch: 15   Global Step: 646550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:23:56,947-Speed 2630.45 samples/sec   Loss 2.9792   LearningRate 0.0049   Epoch: 15   Global Step: 646560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:24:00,839-Speed 2631.76 samples/sec   Loss 3.0076   LearningRate 0.0049   Epoch: 15   Global Step: 646570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:24:04,732-Speed 2630.55 samples/sec   Loss 3.0687   LearningRate 0.0049   Epoch: 15   Global Step: 646580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:24:08,642-Speed 2620.11 samples/sec   Loss 3.0407   LearningRate 0.0049   Epoch: 15   Global Step: 646590   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:24:12,538-Speed 2628.93 samples/sec   Loss 3.1015   LearningRate 0.0049   Epoch: 15   Global Step: 646600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:24:16,537-Speed 2561.40 samples/sec   Loss 2.9467   LearningRate 0.0049   Epoch: 15   Global Step: 646610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:20,430-Speed 2631.17 samples/sec   Loss 2.9977   LearningRate 0.0049   Epoch: 15   Global Step: 646620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:24,324-Speed 2630.58 samples/sec   Loss 3.0632   LearningRate 0.0049   Epoch: 15   Global Step: 646630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:28,340-Speed 2550.29 samples/sec   Loss 3.0311   LearningRate 0.0049   Epoch: 15   Global Step: 646640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:32,428-Speed 2505.06 samples/sec   Loss 2.9839   LearningRate 0.0049   Epoch: 15   Global Step: 646650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:36,504-Speed 2513.06 samples/sec   Loss 3.0292   LearningRate 0.0049   Epoch: 15   Global Step: 646660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:40,401-Speed 2628.61 samples/sec   Loss 3.0380   LearningRate 0.0049   Epoch: 15   Global Step: 646670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:44,307-Speed 2622.59 samples/sec   Loss 2.9449   LearningRate 0.0049   Epoch: 15   Global Step: 646680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:48,210-Speed 2624.25 samples/sec   Loss 2.9631   LearningRate 0.0049   Epoch: 15   Global Step: 646690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:52,112-Speed 2625.37 samples/sec   Loss 2.9600   LearningRate 0.0049   Epoch: 15   Global Step: 646700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:24:56,016-Speed 2623.78 samples/sec   Loss 2.9292   LearningRate 0.0049   Epoch: 15   Global Step: 646710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-15 20:24:59,974-Speed 2587.38 samples/sec   Loss 2.9274   LearningRate 0.0049   Epoch: 15   Global Step: 646720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:04,001-Speed 2543.46 samples/sec   Loss 3.0355   LearningRate 0.0049   Epoch: 15   Global Step: 646730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:07,915-Speed 2617.51 samples/sec   Loss 2.8994   LearningRate 0.0049   Epoch: 15   Global Step: 646740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:11,820-Speed 2622.94 samples/sec   Loss 3.0332   LearningRate 0.0049   Epoch: 15   Global Step: 646750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:15,724-Speed 2623.08 samples/sec   Loss 2.9984   LearningRate 0.0049   Epoch: 15   Global Step: 646760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:19,643-Speed 2614.54 samples/sec   Loss 2.9756   LearningRate 0.0049   Epoch: 15   Global Step: 646770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:23,548-Speed 2622.83 samples/sec   Loss 3.0424   LearningRate 0.0049   Epoch: 15   Global Step: 646780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:27,453-Speed 2623.10 samples/sec   Loss 3.0224   LearningRate 0.0049   Epoch: 15   Global Step: 646790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:31,355-Speed 2624.81 samples/sec   Loss 2.9164   LearningRate 0.0049   Epoch: 15   Global Step: 646800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:35,253-Speed 2627.36 samples/sec   Loss 2.9597   LearningRate 0.0049   Epoch: 15   Global Step: 646810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:39,131-Speed 2641.14 samples/sec   Loss 3.0045   LearningRate 0.0049   Epoch: 15   Global Step: 646820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:43,028-Speed 2629.04 samples/sec   Loss 2.9612   LearningRate 0.0049   Epoch: 15   Global Step: 646830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:46,923-Speed 2629.17 samples/sec   Loss 3.0260   LearningRate 0.0049   Epoch: 15   Global Step: 646840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:50,816-Speed 2631.58 samples/sec   Loss 3.0058   LearningRate 0.0049   Epoch: 15   Global Step: 646850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:54,718-Speed 2624.43 samples/sec   Loss 2.9496   LearningRate 0.0049   Epoch: 15   Global Step: 646860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:25:58,632-Speed 2617.46 samples/sec   Loss 2.9717   LearningRate 0.0049   Epoch: 15   Global Step: 646870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:26:02,533-Speed 2625.41 samples/sec   Loss 3.0010   LearningRate 0.0049   Epoch: 15   Global Step: 646880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:26:06,405-Speed 2645.39 samples/sec   Loss 2.9876   LearningRate 0.0048   Epoch: 15   Global Step: 646890   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:10,309-Speed 2623.46 samples/sec   Loss 3.0648   LearningRate 0.0048   Epoch: 15   Global Step: 646900   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:14,202-Speed 2631.11 samples/sec   Loss 2.9612   LearningRate 0.0048   Epoch: 15   Global Step: 646910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:18,098-Speed 2629.31 samples/sec   Loss 2.9819   LearningRate 0.0048   Epoch: 15   Global Step: 646920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:21,994-Speed 2628.78 samples/sec   Loss 3.0066   LearningRate 0.0048   Epoch: 15   Global Step: 646930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:25,908-Speed 2616.78 samples/sec   Loss 3.0915   LearningRate 0.0048   Epoch: 15   Global Step: 646940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:29,807-Speed 2627.78 samples/sec   Loss 2.9767   LearningRate 0.0048   Epoch: 15   Global Step: 646950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:33,708-Speed 2625.55 samples/sec   Loss 2.9488   LearningRate 0.0048   Epoch: 15   Global Step: 646960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:26:37,585-Speed 2641.33 samples/sec   Loss 3.0047   LearningRate 0.0048   Epoch: 15   Global Step: 646970   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:26:41,486-Speed 2626.01 samples/sec   Loss 3.0208   LearningRate 0.0048   Epoch: 15   Global Step: 646980   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:26:45,391-Speed 2622.50 samples/sec   Loss 2.9964   LearningRate 0.0048   Epoch: 15   Global Step: 646990   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:26:49,291-Speed 2626.76 samples/sec   Loss 3.0946   LearningRate 0.0048   Epoch: 15   Global Step: 647000   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:26:53,191-Speed 2626.34 samples/sec   Loss 3.0129   LearningRate 0.0048   Epoch: 15   Global Step: 647010   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:26:57,085-Speed 2630.38 samples/sec   Loss 3.0629   LearningRate 0.0048   Epoch: 15   Global Step: 647020   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:27:00,981-Speed 2628.77 samples/sec   Loss 3.0697   LearningRate 0.0048   Epoch: 15   Global Step: 647030   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:27:04,881-Speed 2626.10 samples/sec   Loss 2.9325   LearningRate 0.0048   Epoch: 15   Global Step: 647040   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:27:08,782-Speed 2625.34 samples/sec   Loss 2.9226   LearningRate 0.0048   Epoch: 15   Global Step: 647050   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:27:12,689-Speed 2622.11 samples/sec   Loss 3.0109   LearningRate 0.0048   Epoch: 15   Global Step: 647060   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-15 20:27:16,588-Speed 2626.62 samples/sec   Loss 3.0761   LearningRate 0.0048   Epoch: 15   Global Step: 647070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:20,484-Speed 2629.66 samples/sec   Loss 3.0491   LearningRate 0.0048   Epoch: 15   Global Step: 647080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:24,379-Speed 2629.41 samples/sec   Loss 2.9858   LearningRate 0.0048   Epoch: 15   Global Step: 647090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:28,285-Speed 2622.55 samples/sec   Loss 2.9529   LearningRate 0.0048   Epoch: 15   Global Step: 647100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:32,185-Speed 2626.47 samples/sec   Loss 3.0544   LearningRate 0.0048   Epoch: 15   Global Step: 647110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:36,088-Speed 2624.09 samples/sec   Loss 3.0385   LearningRate 0.0048   Epoch: 15   Global Step: 647120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:39,989-Speed 2625.02 samples/sec   Loss 2.9136   LearningRate 0.0048   Epoch: 15   Global Step: 647130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:43,890-Speed 2625.91 samples/sec   Loss 2.9973   LearningRate 0.0048   Epoch: 15   Global Step: 647140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:47,800-Speed 2619.50 samples/sec   Loss 2.9738   LearningRate 0.0048   Epoch: 15   Global Step: 647150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:51,696-Speed 2629.79 samples/sec   Loss 2.9362   LearningRate 0.0048   Epoch: 15   Global Step: 647160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-15 20:27:55,628-Speed 2604.60 samples/sec   Loss 2.9481   LearningRate 0.0048   Epoch: 15   Global Step: 647170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-15 20:27:59,521-Speed 2631.46 samples/sec   Loss 2.9763   LearningRate 0.0048   Epoch: 15   Global Step: 647180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:28:03,424-Speed 2623.91 samples/sec   Loss 3.0218   LearningRate 0.0048   Epoch: 15   Global Step: 647190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:28:07,340-Speed 2615.35 samples/sec   Loss 2.9613   LearningRate 0.0048   Epoch: 15   Global Step: 647200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:28:11,250-Speed 2619.41 samples/sec   Loss 2.9964   LearningRate 0.0048   Epoch: 15   Global Step: 647210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:28:15,154-Speed 2624.45 samples/sec   Loss 2.9812   LearningRate 0.0048   Epoch: 15   Global Step: 647220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:28:19,064-Speed 2619.88 samples/sec   Loss 3.0246   LearningRate 0.0048   Epoch: 15   Global Step: 647230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:28:23,071-Speed 2555.63 samples/sec   Loss 2.9589   LearningRate 0.0048   Epoch: 15   Global Step: 647240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:26,980-Speed 2620.64 samples/sec   Loss 3.0134   LearningRate 0.0048   Epoch: 15   Global Step: 647250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:30,877-Speed 2628.05 samples/sec   Loss 3.0781   LearningRate 0.0048   Epoch: 15   Global Step: 647260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:34,774-Speed 2628.62 samples/sec   Loss 2.9987   LearningRate 0.0048   Epoch: 15   Global Step: 647270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:38,666-Speed 2632.05 samples/sec   Loss 2.9589   LearningRate 0.0048   Epoch: 15   Global Step: 647280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:42,557-Speed 2632.11 samples/sec   Loss 3.0422   LearningRate 0.0048   Epoch: 15   Global Step: 647290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:46,455-Speed 2627.98 samples/sec   Loss 3.0966   LearningRate 0.0048   Epoch: 15   Global Step: 647300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:50,354-Speed 2626.62 samples/sec   Loss 2.9999   LearningRate 0.0048   Epoch: 15   Global Step: 647310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:54,271-Speed 2615.32 samples/sec   Loss 3.1233   LearningRate 0.0048   Epoch: 15   Global Step: 647320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:28:58,178-Speed 2621.53 samples/sec   Loss 2.9666   LearningRate 0.0048   Epoch: 15   Global Step: 647330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:02,080-Speed 2624.85 samples/sec   Loss 3.0647   LearningRate 0.0048   Epoch: 15   Global Step: 647340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:29:05,976-Speed 2628.93 samples/sec   Loss 2.9819   LearningRate 0.0048   Epoch: 15   Global Step: 647350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:29:09,871-Speed 2629.84 samples/sec   Loss 3.0152   LearningRate 0.0048   Epoch: 15   Global Step: 647360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:29:13,766-Speed 2629.94 samples/sec   Loss 3.0317   LearningRate 0.0048   Epoch: 15   Global Step: 647370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:29:17,662-Speed 2628.46 samples/sec   Loss 3.0333   LearningRate 0.0048   Epoch: 15   Global Step: 647380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:29:21,562-Speed 2626.76 samples/sec   Loss 2.9920   LearningRate 0.0048   Epoch: 15   Global Step: 647390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:29:25,480-Speed 2613.62 samples/sec   Loss 2.9693   LearningRate 0.0048   Epoch: 15   Global Step: 647400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:29:29,361-Speed 2639.16 samples/sec   Loss 3.0894   LearningRate 0.0048   Epoch: 15   Global Step: 647410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:33,260-Speed 2626.65 samples/sec   Loss 3.0084   LearningRate 0.0048   Epoch: 15   Global Step: 647420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:37,165-Speed 2623.45 samples/sec   Loss 3.0740   LearningRate 0.0048   Epoch: 15   Global Step: 647430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:41,063-Speed 2627.56 samples/sec   Loss 3.0002   LearningRate 0.0048   Epoch: 15   Global Step: 647440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:44,969-Speed 2622.73 samples/sec   Loss 2.9669   LearningRate 0.0048   Epoch: 15   Global Step: 647450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:48,865-Speed 2628.71 samples/sec   Loss 3.0020   LearningRate 0.0048   Epoch: 15   Global Step: 647460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:52,764-Speed 2627.46 samples/sec   Loss 3.0437   LearningRate 0.0048   Epoch: 15   Global Step: 647470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:29:56,654-Speed 2632.40 samples/sec   Loss 3.0338   LearningRate 0.0048   Epoch: 15   Global Step: 647480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:00,546-Speed 2631.58 samples/sec   Loss 3.0665   LearningRate 0.0048   Epoch: 15   Global Step: 647490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:04,445-Speed 2627.06 samples/sec   Loss 3.0108   LearningRate 0.0048   Epoch: 15   Global Step: 647500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:08,365-Speed 2613.19 samples/sec   Loss 2.9139   LearningRate 0.0048   Epoch: 15   Global Step: 647510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:30:12,266-Speed 2625.43 samples/sec   Loss 3.0245   LearningRate 0.0048   Epoch: 15   Global Step: 647520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:30:16,137-Speed 2646.34 samples/sec   Loss 2.9436   LearningRate 0.0048   Epoch: 15   Global Step: 647530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:20,204-Speed 2521.08 samples/sec   Loss 3.0931   LearningRate 0.0048   Epoch: 15   Global Step: 647540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:24,110-Speed 2623.14 samples/sec   Loss 2.9977   LearningRate 0.0048   Epoch: 15   Global Step: 647550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:28,010-Speed 2625.66 samples/sec   Loss 3.0279   LearningRate 0.0048   Epoch: 15   Global Step: 647560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:31,910-Speed 2626.27 samples/sec   Loss 2.9595   LearningRate 0.0048   Epoch: 15   Global Step: 647570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:35,807-Speed 2628.48 samples/sec   Loss 2.9892   LearningRate 0.0048   Epoch: 15   Global Step: 647580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:39,713-Speed 2622.10 samples/sec   Loss 3.0109   LearningRate 0.0048   Epoch: 15   Global Step: 647590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:43,606-Speed 2631.77 samples/sec   Loss 2.9595   LearningRate 0.0048   Epoch: 15   Global Step: 647600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:47,498-Speed 2631.72 samples/sec   Loss 3.0565   LearningRate 0.0048   Epoch: 15   Global Step: 647610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:51,435-Speed 2607.04 samples/sec   Loss 2.9082   LearningRate 0.0048   Epoch: 15   Global Step: 647620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:30:55,514-Speed 2510.78 samples/sec   Loss 3.0332   LearningRate 0.0048   Epoch: 15   Global Step: 647630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:30:59,504-Speed 2566.93 samples/sec   Loss 2.9690   LearningRate 0.0048   Epoch: 15   Global Step: 647640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:31:03,491-Speed 2572.08 samples/sec   Loss 3.0255   LearningRate 0.0048   Epoch: 15   Global Step: 647650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:07,405-Speed 2616.77 samples/sec   Loss 3.0292   LearningRate 0.0048   Epoch: 15   Global Step: 647660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:11,304-Speed 2626.80 samples/sec   Loss 2.9600   LearningRate 0.0048   Epoch: 15   Global Step: 647670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:15,211-Speed 2622.46 samples/sec   Loss 2.9371   LearningRate 0.0048   Epoch: 15   Global Step: 647680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:19,107-Speed 2629.11 samples/sec   Loss 2.9594   LearningRate 0.0048   Epoch: 15   Global Step: 647690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:23,007-Speed 2626.12 samples/sec   Loss 2.9963   LearningRate 0.0048   Epoch: 15   Global Step: 647700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:26,900-Speed 2631.44 samples/sec   Loss 2.9795   LearningRate 0.0048   Epoch: 15   Global Step: 647710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:30,795-Speed 2629.50 samples/sec   Loss 3.0526   LearningRate 0.0048   Epoch: 15   Global Step: 647720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:34,698-Speed 2624.60 samples/sec   Loss 3.0692   LearningRate 0.0048   Epoch: 15   Global Step: 647730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:38,610-Speed 2617.78 samples/sec   Loss 2.9896   LearningRate 0.0048   Epoch: 15   Global Step: 647740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:31:42,576-Speed 2583.19 samples/sec   Loss 3.0429   LearningRate 0.0048   Epoch: 15   Global Step: 647750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:31:46,468-Speed 2631.41 samples/sec   Loss 2.9925   LearningRate 0.0048   Epoch: 15   Global Step: 647760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:31:50,361-Speed 2631.34 samples/sec   Loss 2.9888   LearningRate 0.0048   Epoch: 15   Global Step: 647770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:31:54,257-Speed 2628.96 samples/sec   Loss 3.0143   LearningRate 0.0048   Epoch: 15   Global Step: 647780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:31:58,151-Speed 2630.57 samples/sec   Loss 3.0350   LearningRate 0.0048   Epoch: 15   Global Step: 647790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:32:02,055-Speed 2623.63 samples/sec   Loss 2.9818   LearningRate 0.0048   Epoch: 15   Global Step: 647800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:32:05,951-Speed 2629.07 samples/sec   Loss 2.9934   LearningRate 0.0048   Epoch: 15   Global Step: 647810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:32:09,830-Speed 2640.28 samples/sec   Loss 2.9608   LearningRate 0.0048   Epoch: 15   Global Step: 647820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:13,721-Speed 2632.74 samples/sec   Loss 3.0370   LearningRate 0.0048   Epoch: 15   Global Step: 647830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:17,613-Speed 2631.72 samples/sec   Loss 3.0555   LearningRate 0.0048   Epoch: 15   Global Step: 647840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:21,520-Speed 2621.90 samples/sec   Loss 3.0192   LearningRate 0.0048   Epoch: 15   Global Step: 647850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:25,442-Speed 2611.70 samples/sec   Loss 3.0012   LearningRate 0.0048   Epoch: 15   Global Step: 647860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:29,381-Speed 2600.44 samples/sec   Loss 3.0193   LearningRate 0.0048   Epoch: 15   Global Step: 647870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:33,277-Speed 2628.96 samples/sec   Loss 3.0413   LearningRate 0.0048   Epoch: 15   Global Step: 647880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:37,186-Speed 2620.36 samples/sec   Loss 3.0473   LearningRate 0.0048   Epoch: 15   Global Step: 647890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:41,081-Speed 2629.19 samples/sec   Loss 2.9218   LearningRate 0.0048   Epoch: 15   Global Step: 647900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:44,981-Speed 2626.81 samples/sec   Loss 3.0243   LearningRate 0.0048   Epoch: 15   Global Step: 647910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:32:48,909-Speed 2607.45 samples/sec   Loss 3.0021   LearningRate 0.0048   Epoch: 15   Global Step: 647920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:32:52,800-Speed 2632.36 samples/sec   Loss 2.9867   LearningRate 0.0048   Epoch: 15   Global Step: 647930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:32:56,697-Speed 2629.21 samples/sec   Loss 2.9209   LearningRate 0.0048   Epoch: 15   Global Step: 647940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:33:00,587-Speed 2632.73 samples/sec   Loss 2.9809   LearningRate 0.0048   Epoch: 15   Global Step: 647950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:33:04,459-Speed 2645.31 samples/sec   Loss 2.9382   LearningRate 0.0048   Epoch: 15   Global Step: 647960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:08,364-Speed 2622.86 samples/sec   Loss 2.9555   LearningRate 0.0048   Epoch: 15   Global Step: 647970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:12,262-Speed 2627.79 samples/sec   Loss 3.0120   LearningRate 0.0048   Epoch: 15   Global Step: 647980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:16,179-Speed 2614.71 samples/sec   Loss 2.9713   LearningRate 0.0048   Epoch: 15   Global Step: 647990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:20,095-Speed 2615.98 samples/sec   Loss 3.0046   LearningRate 0.0048   Epoch: 15   Global Step: 648000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:23,990-Speed 2629.24 samples/sec   Loss 2.9687   LearningRate 0.0048   Epoch: 15   Global Step: 648010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:27,893-Speed 2624.37 samples/sec   Loss 2.9348   LearningRate 0.0048   Epoch: 15   Global Step: 648020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:31,796-Speed 2624.56 samples/sec   Loss 3.0092   LearningRate 0.0048   Epoch: 15   Global Step: 648030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:35,706-Speed 2619.43 samples/sec   Loss 3.0823   LearningRate 0.0048   Epoch: 15   Global Step: 648040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:39,616-Speed 2619.45 samples/sec   Loss 3.0295   LearningRate 0.0048   Epoch: 15   Global Step: 648050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:43,511-Speed 2630.46 samples/sec   Loss 2.9585   LearningRate 0.0048   Epoch: 15   Global Step: 648060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:33:47,402-Speed 2631.91 samples/sec   Loss 2.9230   LearningRate 0.0048   Epoch: 15   Global Step: 648070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:33:51,283-Speed 2639.48 samples/sec   Loss 2.9631   LearningRate 0.0048   Epoch: 15   Global Step: 648080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:55,180-Speed 2627.95 samples/sec   Loss 3.0150   LearningRate 0.0048   Epoch: 15   Global Step: 648090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:33:59,077-Speed 2628.74 samples/sec   Loss 3.0776   LearningRate 0.0048   Epoch: 15   Global Step: 648100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:02,970-Speed 2631.08 samples/sec   Loss 2.9519   LearningRate 0.0048   Epoch: 15   Global Step: 648110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:06,860-Speed 2632.62 samples/sec   Loss 2.9749   LearningRate 0.0048   Epoch: 15   Global Step: 648120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:10,750-Speed 2633.39 samples/sec   Loss 2.9358   LearningRate 0.0048   Epoch: 15   Global Step: 648130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:14,648-Speed 2628.01 samples/sec   Loss 3.0613   LearningRate 0.0048   Epoch: 15   Global Step: 648140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:18,543-Speed 2629.76 samples/sec   Loss 2.9587   LearningRate 0.0048   Epoch: 15   Global Step: 648150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:22,440-Speed 2627.58 samples/sec   Loss 3.0160   LearningRate 0.0048   Epoch: 15   Global Step: 648160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:26,375-Speed 2603.15 samples/sec   Loss 2.9718   LearningRate 0.0048   Epoch: 15   Global Step: 648170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:34:30,269-Speed 2630.16 samples/sec   Loss 2.9625   LearningRate 0.0048   Epoch: 15   Global Step: 648180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:34:34,177-Speed 2621.89 samples/sec   Loss 3.0068   LearningRate 0.0048   Epoch: 15   Global Step: 648190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:34:38,084-Speed 2621.52 samples/sec   Loss 2.9551   LearningRate 0.0048   Epoch: 15   Global Step: 648200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:34:41,976-Speed 2630.90 samples/sec   Loss 3.0370   LearningRate 0.0048   Epoch: 15   Global Step: 648210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:34:45,868-Speed 2632.52 samples/sec   Loss 3.0367   LearningRate 0.0048   Epoch: 15   Global Step: 648220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:34:49,769-Speed 2625.31 samples/sec   Loss 2.9793   LearningRate 0.0048   Epoch: 15   Global Step: 648230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:34:53,663-Speed 2630.13 samples/sec   Loss 2.9761   LearningRate 0.0048   Epoch: 15   Global Step: 648240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:34:57,558-Speed 2629.91 samples/sec   Loss 2.9936   LearningRate 0.0048   Epoch: 15   Global Step: 648250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:01,455-Speed 2628.07 samples/sec   Loss 2.9854   LearningRate 0.0048   Epoch: 15   Global Step: 648260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:05,359-Speed 2623.69 samples/sec   Loss 3.0032   LearningRate 0.0048   Epoch: 15   Global Step: 648270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:09,250-Speed 2632.59 samples/sec   Loss 2.9371   LearningRate 0.0048   Epoch: 15   Global Step: 648280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:13,144-Speed 2630.25 samples/sec   Loss 2.9463   LearningRate 0.0048   Epoch: 15   Global Step: 648290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:17,085-Speed 2599.23 samples/sec   Loss 3.0104   LearningRate 0.0048   Epoch: 15   Global Step: 648300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:20,992-Speed 2621.81 samples/sec   Loss 2.9346   LearningRate 0.0048   Epoch: 15   Global Step: 648310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:24,887-Speed 2630.06 samples/sec   Loss 2.9569   LearningRate 0.0048   Epoch: 15   Global Step: 648320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:28,781-Speed 2629.94 samples/sec   Loss 2.9369   LearningRate 0.0048   Epoch: 15   Global Step: 648330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:32,677-Speed 2629.28 samples/sec   Loss 2.9830   LearningRate 0.0048   Epoch: 15   Global Step: 648340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:36,587-Speed 2619.13 samples/sec   Loss 2.9501   LearningRate 0.0048   Epoch: 15   Global Step: 648350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:40,494-Speed 2622.04 samples/sec   Loss 2.9202   LearningRate 0.0048   Epoch: 15   Global Step: 648360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:44,484-Speed 2567.24 samples/sec   Loss 2.9828   LearningRate 0.0048   Epoch: 15   Global Step: 648370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:48,383-Speed 2626.67 samples/sec   Loss 2.9810   LearningRate 0.0048   Epoch: 15   Global Step: 648380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-15 20:35:52,254-Speed 2646.76 samples/sec   Loss 2.9080   LearningRate 0.0048   Epoch: 15   Global Step: 648390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:35:56,135-Speed 2639.22 samples/sec   Loss 3.0537   LearningRate 0.0048   Epoch: 15   Global Step: 648400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:00,025-Speed 2632.77 samples/sec   Loss 2.9619   LearningRate 0.0048   Epoch: 15   Global Step: 648410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:03,919-Speed 2630.26 samples/sec   Loss 2.9919   LearningRate 0.0048   Epoch: 15   Global Step: 648420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:07,828-Speed 2620.75 samples/sec   Loss 2.9274   LearningRate 0.0048   Epoch: 15   Global Step: 648430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:11,723-Speed 2629.22 samples/sec   Loss 2.9843   LearningRate 0.0048   Epoch: 15   Global Step: 648440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:15,638-Speed 2616.89 samples/sec   Loss 3.0297   LearningRate 0.0048   Epoch: 15   Global Step: 648450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:19,554-Speed 2615.56 samples/sec   Loss 3.0047   LearningRate 0.0048   Epoch: 15   Global Step: 648460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:23,443-Speed 2633.63 samples/sec   Loss 2.9647   LearningRate 0.0048   Epoch: 15   Global Step: 648470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:27,356-Speed 2618.03 samples/sec   Loss 3.0112   LearningRate 0.0048   Epoch: 15   Global Step: 648480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:31,249-Speed 2630.93 samples/sec   Loss 3.0159   LearningRate 0.0048   Epoch: 15   Global Step: 648490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:35,146-Speed 2628.23 samples/sec   Loss 2.9175   LearningRate 0.0048   Epoch: 15   Global Step: 648500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:36:39,040-Speed 2630.36 samples/sec   Loss 3.0277   LearningRate 0.0048   Epoch: 15   Global Step: 648510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:36:42,920-Speed 2639.52 samples/sec   Loss 2.9121   LearningRate 0.0048   Epoch: 15   Global Step: 648520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:46,813-Speed 2631.35 samples/sec   Loss 2.9759   LearningRate 0.0048   Epoch: 15   Global Step: 648530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:50,707-Speed 2630.58 samples/sec   Loss 3.0711   LearningRate 0.0048   Epoch: 15   Global Step: 648540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:54,613-Speed 2622.75 samples/sec   Loss 2.9529   LearningRate 0.0048   Epoch: 15   Global Step: 648550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:36:58,538-Speed 2609.17 samples/sec   Loss 3.0622   LearningRate 0.0048   Epoch: 15   Global Step: 648560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:02,443-Speed 2622.82 samples/sec   Loss 2.9944   LearningRate 0.0048   Epoch: 15   Global Step: 648570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:06,342-Speed 2627.26 samples/sec   Loss 3.0032   LearningRate 0.0048   Epoch: 15   Global Step: 648580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:10,242-Speed 2626.71 samples/sec   Loss 3.0112   LearningRate 0.0048   Epoch: 15   Global Step: 648590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:14,142-Speed 2626.21 samples/sec   Loss 2.9118   LearningRate 0.0048   Epoch: 15   Global Step: 648600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:18,044-Speed 2624.32 samples/sec   Loss 3.0560   LearningRate 0.0048   Epoch: 15   Global Step: 648610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:21,959-Speed 2617.13 samples/sec   Loss 3.0152   LearningRate 0.0048   Epoch: 15   Global Step: 648620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:37:25,877-Speed 2614.31 samples/sec   Loss 2.9423   LearningRate 0.0048   Epoch: 15   Global Step: 648630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:37:29,771-Speed 2630.29 samples/sec   Loss 2.9945   LearningRate 0.0048   Epoch: 15   Global Step: 648640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:37:33,672-Speed 2625.71 samples/sec   Loss 2.9770   LearningRate 0.0048   Epoch: 15   Global Step: 648650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:37:37,576-Speed 2623.81 samples/sec   Loss 2.8689   LearningRate 0.0048   Epoch: 15   Global Step: 648660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:37:41,473-Speed 2627.50 samples/sec   Loss 2.9710   LearningRate 0.0048   Epoch: 15   Global Step: 648670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:37:45,354-Speed 2639.49 samples/sec   Loss 2.9722   LearningRate 0.0048   Epoch: 15   Global Step: 648680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:49,259-Speed 2623.99 samples/sec   Loss 3.0304   LearningRate 0.0048   Epoch: 15   Global Step: 648690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:53,166-Speed 2621.11 samples/sec   Loss 2.9327   LearningRate 0.0048   Epoch: 15   Global Step: 648700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:37:57,085-Speed 2613.94 samples/sec   Loss 2.9985   LearningRate 0.0048   Epoch: 15   Global Step: 648710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:01,000-Speed 2616.39 samples/sec   Loss 3.0012   LearningRate 0.0048   Epoch: 15   Global Step: 648720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:04,903-Speed 2624.67 samples/sec   Loss 2.9888   LearningRate 0.0048   Epoch: 15   Global Step: 648730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:08,808-Speed 2622.67 samples/sec   Loss 3.0493   LearningRate 0.0048   Epoch: 15   Global Step: 648740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:12,707-Speed 2627.18 samples/sec   Loss 3.0395   LearningRate 0.0048   Epoch: 15   Global Step: 648750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:16,604-Speed 2628.22 samples/sec   Loss 2.9979   LearningRate 0.0048   Epoch: 15   Global Step: 648760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:20,502-Speed 2627.96 samples/sec   Loss 2.9802   LearningRate 0.0048   Epoch: 15   Global Step: 648770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:24,399-Speed 2628.49 samples/sec   Loss 2.9704   LearningRate 0.0047   Epoch: 15   Global Step: 648780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:38:28,306-Speed 2621.96 samples/sec   Loss 3.0299   LearningRate 0.0047   Epoch: 15   Global Step: 648790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:38:32,204-Speed 2627.18 samples/sec   Loss 3.0560   LearningRate 0.0047   Epoch: 15   Global Step: 648800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:38:36,102-Speed 2627.76 samples/sec   Loss 2.9520   LearningRate 0.0047   Epoch: 15   Global Step: 648810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:38:40,014-Speed 2617.72 samples/sec   Loss 2.9932   LearningRate 0.0047   Epoch: 15   Global Step: 648820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:38:43,918-Speed 2624.09 samples/sec   Loss 2.9332   LearningRate 0.0047   Epoch: 15   Global Step: 648830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:38:47,844-Speed 2608.67 samples/sec   Loss 2.9580   LearningRate 0.0047   Epoch: 15   Global Step: 648840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:38:51,718-Speed 2643.80 samples/sec   Loss 3.0142   LearningRate 0.0047   Epoch: 15   Global Step: 648850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:55,611-Speed 2631.13 samples/sec   Loss 3.0017   LearningRate 0.0047   Epoch: 15   Global Step: 648860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:38:59,505-Speed 2631.58 samples/sec   Loss 3.0475   LearningRate 0.0047   Epoch: 15   Global Step: 648870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:39:03,406-Speed 2625.23 samples/sec   Loss 3.0376   LearningRate 0.0047   Epoch: 15   Global Step: 648880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:39:07,277-Speed 2645.30 samples/sec   Loss 2.9496   LearningRate 0.0047   Epoch: 15   Global Step: 648890   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:11,175-Speed 2627.60 samples/sec   Loss 2.9137   LearningRate 0.0047   Epoch: 15   Global Step: 648900   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:15,069-Speed 2630.70 samples/sec   Loss 2.9868   LearningRate 0.0047   Epoch: 15   Global Step: 648910   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:18,965-Speed 2628.83 samples/sec   Loss 3.0314   LearningRate 0.0047   Epoch: 15   Global Step: 648920   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:22,862-Speed 2628.54 samples/sec   Loss 2.9591   LearningRate 0.0047   Epoch: 15   Global Step: 648930   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:26,755-Speed 2630.83 samples/sec   Loss 3.0002   LearningRate 0.0047   Epoch: 15   Global Step: 648940   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:30,647-Speed 2631.78 samples/sec   Loss 2.9245   LearningRate 0.0047   Epoch: 15   Global Step: 648950   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:34,538-Speed 2632.97 samples/sec   Loss 2.9747   LearningRate 0.0047   Epoch: 15   Global Step: 648960   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:38,431-Speed 2630.29 samples/sec   Loss 2.9852   LearningRate 0.0047   Epoch: 15   Global Step: 648970   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:42,369-Speed 2600.95 samples/sec   Loss 3.0388   LearningRate 0.0047   Epoch: 15   Global Step: 648980   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:39:46,258-Speed 2633.90 samples/sec   Loss 2.9944   LearningRate 0.0047   Epoch: 15   Global Step: 648990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:39:50,150-Speed 2631.84 samples/sec   Loss 2.9984   LearningRate 0.0047   Epoch: 15   Global Step: 649000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:39:54,057-Speed 2621.88 samples/sec   Loss 2.9967   LearningRate 0.0047   Epoch: 15   Global Step: 649010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:39:57,953-Speed 2629.04 samples/sec   Loss 2.9850   LearningRate 0.0047   Epoch: 15   Global Step: 649020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:01,851-Speed 2628.05 samples/sec   Loss 2.9111   LearningRate 0.0047   Epoch: 15   Global Step: 649030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:05,748-Speed 2628.22 samples/sec   Loss 2.9871   LearningRate 0.0047   Epoch: 15   Global Step: 649040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:09,677-Speed 2606.58 samples/sec   Loss 3.0078   LearningRate 0.0047   Epoch: 15   Global Step: 649050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:13,711-Speed 2539.03 samples/sec   Loss 2.9543   LearningRate 0.0047   Epoch: 15   Global Step: 649060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:17,803-Speed 2503.06 samples/sec   Loss 2.9392   LearningRate 0.0047   Epoch: 15   Global Step: 649070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:21,790-Speed 2569.16 samples/sec   Loss 2.9290   LearningRate 0.0047   Epoch: 15   Global Step: 649080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:25,684-Speed 2630.45 samples/sec   Loss 3.0017   LearningRate 0.0047   Epoch: 15   Global Step: 649090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:40:29,576-Speed 2632.08 samples/sec   Loss 2.9433   LearningRate 0.0047   Epoch: 15   Global Step: 649100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:40:33,470-Speed 2629.75 samples/sec   Loss 2.9600   LearningRate 0.0047   Epoch: 15   Global Step: 649110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:40:37,375-Speed 2623.21 samples/sec   Loss 2.9205   LearningRate 0.0047   Epoch: 15   Global Step: 649120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:40:41,272-Speed 2628.90 samples/sec   Loss 2.9928   LearningRate 0.0047   Epoch: 15   Global Step: 649130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:40:45,163-Speed 2632.14 samples/sec   Loss 2.9115   LearningRate 0.0047   Epoch: 15   Global Step: 649140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:49,057-Speed 2630.08 samples/sec   Loss 2.8971   LearningRate 0.0047   Epoch: 15   Global Step: 649150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:52,948-Speed 2633.15 samples/sec   Loss 2.9976   LearningRate 0.0047   Epoch: 15   Global Step: 649160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:40:56,839-Speed 2632.06 samples/sec   Loss 2.9929   LearningRate 0.0047   Epoch: 15   Global Step: 649170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:41:00,736-Speed 2627.77 samples/sec   Loss 2.9865   LearningRate 0.0047   Epoch: 15   Global Step: 649180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:41:04,628-Speed 2631.45 samples/sec   Loss 2.9773   LearningRate 0.0047   Epoch: 15   Global Step: 649190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:41:08,521-Speed 2631.63 samples/sec   Loss 2.9874   LearningRate 0.0047   Epoch: 15   Global Step: 649200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:41:12,412-Speed 2632.26 samples/sec   Loss 2.9874   LearningRate 0.0047   Epoch: 15   Global Step: 649210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:41:16,314-Speed 2625.06 samples/sec   Loss 2.9898   LearningRate 0.0047   Epoch: 15   Global Step: 649220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:41:20,224-Speed 2619.22 samples/sec   Loss 3.0369   LearningRate 0.0047   Epoch: 15   Global Step: 649230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:41:24,139-Speed 2616.54 samples/sec   Loss 2.9700   LearningRate 0.0047   Epoch: 15   Global Step: 649240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:28,031-Speed 2632.23 samples/sec   Loss 2.9459   LearningRate 0.0047   Epoch: 15   Global Step: 649250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:31,963-Speed 2604.97 samples/sec   Loss 2.8945   LearningRate 0.0047   Epoch: 15   Global Step: 649260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:35,858-Speed 2629.08 samples/sec   Loss 2.9810   LearningRate 0.0047   Epoch: 15   Global Step: 649270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:39,770-Speed 2618.48 samples/sec   Loss 3.0101   LearningRate 0.0047   Epoch: 15   Global Step: 649280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:43,677-Speed 2621.99 samples/sec   Loss 2.9962   LearningRate 0.0047   Epoch: 15   Global Step: 649290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:47,570-Speed 2631.10 samples/sec   Loss 2.9827   LearningRate 0.0047   Epoch: 15   Global Step: 649300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:51,467-Speed 2628.54 samples/sec   Loss 2.9435   LearningRate 0.0047   Epoch: 15   Global Step: 649310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:55,362-Speed 2630.53 samples/sec   Loss 3.0217   LearningRate 0.0047   Epoch: 15   Global Step: 649320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:41:59,256-Speed 2630.64 samples/sec   Loss 3.0614   LearningRate 0.0047   Epoch: 15   Global Step: 649330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:42:03,149-Speed 2630.65 samples/sec   Loss 2.9437   LearningRate 0.0047   Epoch: 15   Global Step: 649340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:07,041-Speed 2631.44 samples/sec   Loss 2.9907   LearningRate 0.0047   Epoch: 15   Global Step: 649350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:10,976-Speed 2603.45 samples/sec   Loss 2.9795   LearningRate 0.0047   Epoch: 15   Global Step: 649360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:14,879-Speed 2624.76 samples/sec   Loss 2.9763   LearningRate 0.0047   Epoch: 15   Global Step: 649370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:18,776-Speed 2627.97 samples/sec   Loss 3.0045   LearningRate 0.0047   Epoch: 15   Global Step: 649380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:22,701-Speed 2609.87 samples/sec   Loss 2.9055   LearningRate 0.0047   Epoch: 15   Global Step: 649390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:26,637-Speed 2601.65 samples/sec   Loss 2.9739   LearningRate 0.0047   Epoch: 15   Global Step: 649400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:30,533-Speed 2629.00 samples/sec   Loss 2.9199   LearningRate 0.0047   Epoch: 15   Global Step: 649410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:34,429-Speed 2629.35 samples/sec   Loss 3.0179   LearningRate 0.0047   Epoch: 15   Global Step: 649420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:42:38,324-Speed 2629.59 samples/sec   Loss 2.8451   LearningRate 0.0047   Epoch: 15   Global Step: 649430   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:42:42,214-Speed 2632.93 samples/sec   Loss 2.9726   LearningRate 0.0047   Epoch: 15   Global Step: 649440   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:42:46,132-Speed 2614.86 samples/sec   Loss 3.0249   LearningRate 0.0047   Epoch: 15   Global Step: 649450   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:42:50,028-Speed 2628.35 samples/sec   Loss 2.9473   LearningRate 0.0047   Epoch: 15   Global Step: 649460   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:42:53,925-Speed 2629.24 samples/sec   Loss 3.0087   LearningRate 0.0047   Epoch: 15   Global Step: 649470   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:42:57,818-Speed 2630.91 samples/sec   Loss 2.9572   LearningRate 0.0047   Epoch: 15   Global Step: 649480   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:43:01,714-Speed 2628.29 samples/sec   Loss 2.9480   LearningRate 0.0047   Epoch: 15   Global Step: 649490   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:43:05,627-Speed 2617.64 samples/sec   Loss 2.8790   LearningRate 0.0047   Epoch: 15   Global Step: 649500   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:43:09,606-Speed 2573.98 samples/sec   Loss 2.9628   LearningRate 0.0047   Epoch: 15   Global Step: 649510   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:43:13,505-Speed 2627.30 samples/sec   Loss 3.0193   LearningRate 0.0047   Epoch: 15   Global Step: 649520   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:43:17,417-Speed 2617.80 samples/sec   Loss 2.9305   LearningRate 0.0047   Epoch: 15   Global Step: 649530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:21,377-Speed 2587.13 samples/sec   Loss 2.9392   LearningRate 0.0047   Epoch: 15   Global Step: 649540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:25,286-Speed 2620.06 samples/sec   Loss 2.9518   LearningRate 0.0047   Epoch: 15   Global Step: 649550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:29,194-Speed 2621.26 samples/sec   Loss 2.9572   LearningRate 0.0047   Epoch: 15   Global Step: 649560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:33,095-Speed 2625.53 samples/sec   Loss 2.9641   LearningRate 0.0047   Epoch: 15   Global Step: 649570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:37,009-Speed 2616.65 samples/sec   Loss 2.9320   LearningRate 0.0047   Epoch: 15   Global Step: 649580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:40,903-Speed 2630.07 samples/sec   Loss 2.9964   LearningRate 0.0047   Epoch: 15   Global Step: 649590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:44,801-Speed 2628.18 samples/sec   Loss 3.0064   LearningRate 0.0047   Epoch: 15   Global Step: 649600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:48,699-Speed 2627.46 samples/sec   Loss 3.0060   LearningRate 0.0047   Epoch: 15   Global Step: 649610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:52,595-Speed 2629.39 samples/sec   Loss 2.9040   LearningRate 0.0047   Epoch: 15   Global Step: 649620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:43:56,466-Speed 2645.70 samples/sec   Loss 2.9895   LearningRate 0.0047   Epoch: 15   Global Step: 649630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:00,362-Speed 2629.53 samples/sec   Loss 3.0492   LearningRate 0.0047   Epoch: 15   Global Step: 649640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:04,255-Speed 2630.61 samples/sec   Loss 3.0017   LearningRate 0.0047   Epoch: 15   Global Step: 649650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:08,149-Speed 2630.33 samples/sec   Loss 2.9878   LearningRate 0.0047   Epoch: 15   Global Step: 649660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:12,041-Speed 2631.30 samples/sec   Loss 2.8666   LearningRate 0.0047   Epoch: 15   Global Step: 649670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:15,945-Speed 2623.91 samples/sec   Loss 3.0513   LearningRate 0.0047   Epoch: 15   Global Step: 649680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:19,841-Speed 2628.94 samples/sec   Loss 2.9297   LearningRate 0.0047   Epoch: 15   Global Step: 649690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:23,737-Speed 2629.41 samples/sec   Loss 2.9281   LearningRate 0.0047   Epoch: 15   Global Step: 649700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:27,635-Speed 2627.15 samples/sec   Loss 2.9752   LearningRate 0.0047   Epoch: 15   Global Step: 649710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:31,562-Speed 2608.42 samples/sec   Loss 3.0184   LearningRate 0.0047   Epoch: 15   Global Step: 649720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:35,463-Speed 2626.10 samples/sec   Loss 3.0410   LearningRate 0.0047   Epoch: 15   Global Step: 649730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:44:39,345-Speed 2637.97 samples/sec   Loss 2.9674   LearningRate 0.0047   Epoch: 15   Global Step: 649740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:43,245-Speed 2626.19 samples/sec   Loss 2.9161   LearningRate 0.0047   Epoch: 15   Global Step: 649750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:47,144-Speed 2626.55 samples/sec   Loss 3.0057   LearningRate 0.0047   Epoch: 15   Global Step: 649760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:51,061-Speed 2615.38 samples/sec   Loss 2.9108   LearningRate 0.0047   Epoch: 15   Global Step: 649770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:54,961-Speed 2626.49 samples/sec   Loss 2.9703   LearningRate 0.0047   Epoch: 15   Global Step: 649780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:44:58,860-Speed 2627.26 samples/sec   Loss 2.9309   LearningRate 0.0047   Epoch: 15   Global Step: 649790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:02,760-Speed 2625.90 samples/sec   Loss 2.9628   LearningRate 0.0047   Epoch: 15   Global Step: 649800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:06,661-Speed 2625.81 samples/sec   Loss 2.9651   LearningRate 0.0047   Epoch: 15   Global Step: 649810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:10,553-Speed 2631.92 samples/sec   Loss 3.0078   LearningRate 0.0047   Epoch: 15   Global Step: 649820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:14,448-Speed 2629.99 samples/sec   Loss 2.9848   LearningRate 0.0047   Epoch: 15   Global Step: 649830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:18,349-Speed 2625.27 samples/sec   Loss 2.9524   LearningRate 0.0047   Epoch: 15   Global Step: 649840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:45:22,241-Speed 2632.09 samples/sec   Loss 3.0930   LearningRate 0.0047   Epoch: 15   Global Step: 649850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:45:26,137-Speed 2628.24 samples/sec   Loss 2.9764   LearningRate 0.0047   Epoch: 15   Global Step: 649860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:45:30,033-Speed 2629.97 samples/sec   Loss 2.8991   LearningRate 0.0047   Epoch: 15   Global Step: 649870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:45:33,901-Speed 2647.87 samples/sec   Loss 2.9582   LearningRate 0.0047   Epoch: 15   Global Step: 649880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:37,793-Speed 2631.68 samples/sec   Loss 2.9862   LearningRate 0.0047   Epoch: 15   Global Step: 649890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:41,702-Speed 2620.21 samples/sec   Loss 2.8655   LearningRate 0.0047   Epoch: 15   Global Step: 649900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:45,600-Speed 2627.81 samples/sec   Loss 2.9710   LearningRate 0.0047   Epoch: 15   Global Step: 649910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:49,505-Speed 2622.67 samples/sec   Loss 2.9800   LearningRate 0.0047   Epoch: 15   Global Step: 649920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:53,397-Speed 2632.34 samples/sec   Loss 2.9580   LearningRate 0.0047   Epoch: 15   Global Step: 649930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:45:57,296-Speed 2626.41 samples/sec   Loss 2.9972   LearningRate 0.0047   Epoch: 15   Global Step: 649940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:46:01,193-Speed 2628.69 samples/sec   Loss 2.9880   LearningRate 0.0047   Epoch: 15   Global Step: 649950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:46:05,093-Speed 2626.40 samples/sec   Loss 2.9925   LearningRate 0.0047   Epoch: 15   Global Step: 649960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:46:09,008-Speed 2615.86 samples/sec   Loss 2.9882   LearningRate 0.0047   Epoch: 15   Global Step: 649970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:46:12,880-Speed 2645.22 samples/sec   Loss 2.9768   LearningRate 0.0047   Epoch: 15   Global Step: 649980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:46:16,750-Speed 2646.93 samples/sec   Loss 2.9714   LearningRate 0.0047   Epoch: 15   Global Step: 649990   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:46:20,650-Speed 2626.43 samples/sec   Loss 2.9143   LearningRate 0.0047   Epoch: 15   Global Step: 650000   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:47:03,388-[lfw][650000]XNorm: 22.672565
Training: 2022-04-15 20:47:03,389-[lfw][650000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-15 20:47:03,389-[lfw][650000]Accuracy-Highest: 0.99817
Training: 2022-04-15 20:47:53,130-[cfp_fp][650000]XNorm: 21.908579
Training: 2022-04-15 20:47:53,131-[cfp_fp][650000]Accuracy-Flip: 0.99171+-0.00432
Training: 2022-04-15 20:47:53,132-[cfp_fp][650000]Accuracy-Highest: 0.99243
Training: 2022-04-15 20:48:36,004-[agedb_30][650000]XNorm: 23.018914
Training: 2022-04-15 20:48:36,004-[agedb_30][650000]Accuracy-Flip: 0.97967+-0.00799
Training: 2022-04-15 20:48:36,005-[agedb_30][650000]Accuracy-Highest: 0.98150
Training: 2022-04-15 20:48:39,880-Speed 73.55 samples/sec   Loss 2.9369   LearningRate 0.0047   Epoch: 15   Global Step: 650010   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:48:43,757-Speed 2642.00 samples/sec   Loss 2.9878   LearningRate 0.0047   Epoch: 15   Global Step: 650020   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:48:47,636-Speed 2640.42 samples/sec   Loss 2.9899   LearningRate 0.0047   Epoch: 15   Global Step: 650030   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:48:51,521-Speed 2637.60 samples/sec   Loss 2.9291   LearningRate 0.0047   Epoch: 15   Global Step: 650040   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:48:55,432-Speed 2619.01 samples/sec   Loss 2.9352   LearningRate 0.0047   Epoch: 15   Global Step: 650050   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:48:59,350-Speed 2613.93 samples/sec   Loss 3.0095   LearningRate 0.0047   Epoch: 15   Global Step: 650060   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:49:03,236-Speed 2636.30 samples/sec   Loss 2.9650   LearningRate 0.0047   Epoch: 15   Global Step: 650070   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:49:07,136-Speed 2626.35 samples/sec   Loss 2.9691   LearningRate 0.0047   Epoch: 15   Global Step: 650080   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 20:49:11,058-Speed 2611.03 samples/sec   Loss 3.0543   LearningRate 0.0047   Epoch: 15   Global Step: 650090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:14,955-Speed 2628.90 samples/sec   Loss 2.8746   LearningRate 0.0047   Epoch: 15   Global Step: 650100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:18,844-Speed 2633.74 samples/sec   Loss 2.9504   LearningRate 0.0047   Epoch: 15   Global Step: 650110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:22,743-Speed 2627.44 samples/sec   Loss 2.9815   LearningRate 0.0047   Epoch: 15   Global Step: 650120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:26,654-Speed 2625.81 samples/sec   Loss 2.9618   LearningRate 0.0047   Epoch: 15   Global Step: 650130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:30,557-Speed 2624.33 samples/sec   Loss 2.9365   LearningRate 0.0047   Epoch: 15   Global Step: 650140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:34,459-Speed 2624.87 samples/sec   Loss 2.9664   LearningRate 0.0047   Epoch: 15   Global Step: 650150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:38,357-Speed 2628.62 samples/sec   Loss 3.0055   LearningRate 0.0047   Epoch: 15   Global Step: 650160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:42,249-Speed 2632.01 samples/sec   Loss 2.9937   LearningRate 0.0047   Epoch: 15   Global Step: 650170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:46,187-Speed 2601.03 samples/sec   Loss 2.9541   LearningRate 0.0047   Epoch: 15   Global Step: 650180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:49:50,100-Speed 2616.92 samples/sec   Loss 2.9924   LearningRate 0.0047   Epoch: 15   Global Step: 650190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:49:54,000-Speed 2627.29 samples/sec   Loss 2.8952   LearningRate 0.0047   Epoch: 15   Global Step: 650200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:49:57,900-Speed 2626.58 samples/sec   Loss 2.9074   LearningRate 0.0047   Epoch: 15   Global Step: 650210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:50:01,812-Speed 2618.01 samples/sec   Loss 2.9079   LearningRate 0.0047   Epoch: 15   Global Step: 650220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:50:05,693-Speed 2638.78 samples/sec   Loss 2.9807   LearningRate 0.0047   Epoch: 15   Global Step: 650230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:09,598-Speed 2623.04 samples/sec   Loss 2.9912   LearningRate 0.0047   Epoch: 15   Global Step: 650240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:13,515-Speed 2618.95 samples/sec   Loss 2.9681   LearningRate 0.0047   Epoch: 15   Global Step: 650250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:17,427-Speed 2618.90 samples/sec   Loss 3.0178   LearningRate 0.0047   Epoch: 15   Global Step: 650260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:21,340-Speed 2616.85 samples/sec   Loss 3.0116   LearningRate 0.0047   Epoch: 15   Global Step: 650270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:25,239-Speed 2627.27 samples/sec   Loss 2.8927   LearningRate 0.0047   Epoch: 15   Global Step: 650280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:29,141-Speed 2625.66 samples/sec   Loss 2.9533   LearningRate 0.0047   Epoch: 15   Global Step: 650290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:33,053-Speed 2618.16 samples/sec   Loss 2.9890   LearningRate 0.0047   Epoch: 15   Global Step: 650300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:36,967-Speed 2616.86 samples/sec   Loss 3.0746   LearningRate 0.0047   Epoch: 15   Global Step: 650310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:40,863-Speed 2629.01 samples/sec   Loss 2.9823   LearningRate 0.0047   Epoch: 15   Global Step: 650320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:44,824-Speed 2585.53 samples/sec   Loss 2.9829   LearningRate 0.0047   Epoch: 15   Global Step: 650330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:50:48,803-Speed 2574.16 samples/sec   Loss 3.0216   LearningRate 0.0047   Epoch: 15   Global Step: 650340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:52,704-Speed 2626.08 samples/sec   Loss 3.0087   LearningRate 0.0047   Epoch: 15   Global Step: 650350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:50:56,604-Speed 2626.46 samples/sec   Loss 2.9110   LearningRate 0.0047   Epoch: 15   Global Step: 650360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:00,510-Speed 2626.47 samples/sec   Loss 2.9516   LearningRate 0.0047   Epoch: 15   Global Step: 650370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:04,432-Speed 2611.11 samples/sec   Loss 3.0112   LearningRate 0.0047   Epoch: 15   Global Step: 650380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:08,335-Speed 2624.37 samples/sec   Loss 2.9413   LearningRate 0.0047   Epoch: 15   Global Step: 650390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:12,230-Speed 2629.84 samples/sec   Loss 2.8867   LearningRate 0.0047   Epoch: 15   Global Step: 650400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:16,134-Speed 2623.14 samples/sec   Loss 3.0168   LearningRate 0.0047   Epoch: 15   Global Step: 650410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:20,036-Speed 2624.85 samples/sec   Loss 2.8990   LearningRate 0.0047   Epoch: 15   Global Step: 650420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:23,934-Speed 2628.56 samples/sec   Loss 3.0060   LearningRate 0.0047   Epoch: 15   Global Step: 650430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:27,856-Speed 2611.92 samples/sec   Loss 2.9574   LearningRate 0.0047   Epoch: 15   Global Step: 650440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:51:31,733-Speed 2641.45 samples/sec   Loss 2.8962   LearningRate 0.0047   Epoch: 15   Global Step: 650450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:35,632-Speed 2627.00 samples/sec   Loss 2.9689   LearningRate 0.0047   Epoch: 15   Global Step: 650460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:39,536-Speed 2623.69 samples/sec   Loss 2.9766   LearningRate 0.0047   Epoch: 15   Global Step: 650470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:43,438-Speed 2624.72 samples/sec   Loss 2.9632   LearningRate 0.0047   Epoch: 15   Global Step: 650480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:47,340-Speed 2624.45 samples/sec   Loss 2.9346   LearningRate 0.0047   Epoch: 15   Global Step: 650490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:51,236-Speed 2629.31 samples/sec   Loss 2.9151   LearningRate 0.0047   Epoch: 15   Global Step: 650500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:55,132-Speed 2629.22 samples/sec   Loss 2.9505   LearningRate 0.0047   Epoch: 15   Global Step: 650510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:51:59,022-Speed 2632.98 samples/sec   Loss 2.9466   LearningRate 0.0047   Epoch: 15   Global Step: 650520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:52:02,941-Speed 2613.67 samples/sec   Loss 2.9401   LearningRate 0.0047   Epoch: 15   Global Step: 650530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:52:06,843-Speed 2624.91 samples/sec   Loss 2.9507   LearningRate 0.0047   Epoch: 15   Global Step: 650540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:52:10,740-Speed 2628.22 samples/sec   Loss 2.9286   LearningRate 0.0047   Epoch: 15   Global Step: 650550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:14,639-Speed 2626.70 samples/sec   Loss 2.8931   LearningRate 0.0047   Epoch: 15   Global Step: 650560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:18,553-Speed 2616.91 samples/sec   Loss 2.9358   LearningRate 0.0047   Epoch: 15   Global Step: 650570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:22,448-Speed 2629.58 samples/sec   Loss 2.9444   LearningRate 0.0047   Epoch: 15   Global Step: 650580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:26,350-Speed 2625.34 samples/sec   Loss 2.9528   LearningRate 0.0047   Epoch: 15   Global Step: 650590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:30,246-Speed 2628.65 samples/sec   Loss 3.0214   LearningRate 0.0047   Epoch: 15   Global Step: 650600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:34,145-Speed 2627.59 samples/sec   Loss 2.9233   LearningRate 0.0047   Epoch: 15   Global Step: 650610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:38,108-Speed 2584.45 samples/sec   Loss 2.9379   LearningRate 0.0047   Epoch: 15   Global Step: 650620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:42,011-Speed 2624.84 samples/sec   Loss 2.9385   LearningRate 0.0047   Epoch: 15   Global Step: 650630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:45,913-Speed 2625.35 samples/sec   Loss 2.9597   LearningRate 0.0047   Epoch: 15   Global Step: 650640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:49,820-Speed 2621.08 samples/sec   Loss 2.9358   LearningRate 0.0047   Epoch: 15   Global Step: 650650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-15 20:52:53,704-Speed 2637.93 samples/sec   Loss 3.0262   LearningRate 0.0047   Epoch: 15   Global Step: 650660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:52:57,600-Speed 2628.95 samples/sec   Loss 2.9009   LearningRate 0.0047   Epoch: 15   Global Step: 650670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:53:01,506-Speed 2622.03 samples/sec   Loss 2.9460   LearningRate 0.0047   Epoch: 15   Global Step: 650680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:53:05,400-Speed 2630.23 samples/sec   Loss 2.9214   LearningRate 0.0047   Epoch: 15   Global Step: 650690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:53:09,300-Speed 2626.53 samples/sec   Loss 2.9479   LearningRate 0.0046   Epoch: 15   Global Step: 650700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:53:13,196-Speed 2628.98 samples/sec   Loss 3.0258   LearningRate 0.0046   Epoch: 15   Global Step: 650710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:53:17,107-Speed 2619.12 samples/sec   Loss 2.9412   LearningRate 0.0046   Epoch: 15   Global Step: 650720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:53:20,988-Speed 2639.75 samples/sec   Loss 2.9670   LearningRate 0.0046   Epoch: 15   Global Step: 650730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:24,972-Speed 2570.67 samples/sec   Loss 2.8950   LearningRate 0.0046   Epoch: 15   Global Step: 650740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:28,872-Speed 2626.24 samples/sec   Loss 2.9287   LearningRate 0.0046   Epoch: 15   Global Step: 650750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:32,825-Speed 2591.33 samples/sec   Loss 2.9693   LearningRate 0.0046   Epoch: 15   Global Step: 650760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:36,728-Speed 2624.51 samples/sec   Loss 3.0274   LearningRate 0.0046   Epoch: 15   Global Step: 650770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:40,626-Speed 2627.55 samples/sec   Loss 3.0017   LearningRate 0.0046   Epoch: 15   Global Step: 650780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:44,530-Speed 2623.78 samples/sec   Loss 2.9271   LearningRate 0.0046   Epoch: 15   Global Step: 650790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:48,443-Speed 2617.50 samples/sec   Loss 2.9958   LearningRate 0.0046   Epoch: 15   Global Step: 650800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:52,346-Speed 2624.23 samples/sec   Loss 2.9465   LearningRate 0.0046   Epoch: 15   Global Step: 650810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:53:56,249-Speed 2624.88 samples/sec   Loss 2.9028   LearningRate 0.0046   Epoch: 15   Global Step: 650820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:00,152-Speed 2624.06 samples/sec   Loss 2.8939   LearningRate 0.0046   Epoch: 15   Global Step: 650830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:04,050-Speed 2627.17 samples/sec   Loss 2.9394   LearningRate 0.0046   Epoch: 15   Global Step: 650840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:07,958-Speed 2620.98 samples/sec   Loss 2.9266   LearningRate 0.0046   Epoch: 15   Global Step: 650850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:11,860-Speed 2625.46 samples/sec   Loss 2.9598   LearningRate 0.0046   Epoch: 15   Global Step: 650860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:15,766-Speed 2621.71 samples/sec   Loss 2.9163   LearningRate 0.0046   Epoch: 15   Global Step: 650870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:19,664-Speed 2628.11 samples/sec   Loss 2.9427   LearningRate 0.0046   Epoch: 15   Global Step: 650880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:23,558-Speed 2630.16 samples/sec   Loss 2.9338   LearningRate 0.0046   Epoch: 15   Global Step: 650890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:27,455-Speed 2628.19 samples/sec   Loss 2.9310   LearningRate 0.0046   Epoch: 15   Global Step: 650900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:54:31,338-Speed 2638.21 samples/sec   Loss 2.9597   LearningRate 0.0046   Epoch: 15   Global Step: 650910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:35,237-Speed 2626.38 samples/sec   Loss 2.9234   LearningRate 0.0046   Epoch: 15   Global Step: 650920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:39,136-Speed 2627.33 samples/sec   Loss 2.9689   LearningRate 0.0046   Epoch: 15   Global Step: 650930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:43,041-Speed 2623.11 samples/sec   Loss 2.8827   LearningRate 0.0046   Epoch: 15   Global Step: 650940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:46,939-Speed 2627.36 samples/sec   Loss 2.9542   LearningRate 0.0046   Epoch: 15   Global Step: 650950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:50,842-Speed 2625.19 samples/sec   Loss 2.9398   LearningRate 0.0046   Epoch: 15   Global Step: 650960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:54,746-Speed 2623.60 samples/sec   Loss 2.9612   LearningRate 0.0046   Epoch: 15   Global Step: 650970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:54:58,699-Speed 2591.21 samples/sec   Loss 2.9516   LearningRate 0.0046   Epoch: 15   Global Step: 650980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:02,618-Speed 2613.84 samples/sec   Loss 2.9324   LearningRate 0.0046   Epoch: 15   Global Step: 650990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:06,520-Speed 2624.59 samples/sec   Loss 2.9102   LearningRate 0.0046   Epoch: 15   Global Step: 651000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:10,418-Speed 2627.66 samples/sec   Loss 2.9388   LearningRate 0.0046   Epoch: 15   Global Step: 651010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:55:14,292-Speed 2643.58 samples/sec   Loss 2.9574   LearningRate 0.0046   Epoch: 15   Global Step: 651020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:18,191-Speed 2627.25 samples/sec   Loss 2.9262   LearningRate 0.0046   Epoch: 15   Global Step: 651030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:22,093-Speed 2625.41 samples/sec   Loss 2.8523   LearningRate 0.0046   Epoch: 15   Global Step: 651040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:25,994-Speed 2625.95 samples/sec   Loss 2.8848   LearningRate 0.0046   Epoch: 15   Global Step: 651050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:29,898-Speed 2623.16 samples/sec   Loss 3.0053   LearningRate 0.0046   Epoch: 15   Global Step: 651060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:33,807-Speed 2620.29 samples/sec   Loss 2.9405   LearningRate 0.0046   Epoch: 15   Global Step: 651070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:37,725-Speed 2614.41 samples/sec   Loss 2.9262   LearningRate 0.0046   Epoch: 15   Global Step: 651080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:41,775-Speed 2529.31 samples/sec   Loss 2.9087   LearningRate 0.0046   Epoch: 15   Global Step: 651090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:45,673-Speed 2627.86 samples/sec   Loss 2.9541   LearningRate 0.0046   Epoch: 15   Global Step: 651100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:49,572-Speed 2627.21 samples/sec   Loss 2.9707   LearningRate 0.0046   Epoch: 15   Global Step: 651110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:55:53,472-Speed 2625.64 samples/sec   Loss 2.9277   LearningRate 0.0046   Epoch: 15   Global Step: 651120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:55:57,375-Speed 2625.01 samples/sec   Loss 2.8964   LearningRate 0.0046   Epoch: 15   Global Step: 651130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:01,271-Speed 2628.76 samples/sec   Loss 2.9784   LearningRate 0.0046   Epoch: 15   Global Step: 651140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:05,170-Speed 2626.82 samples/sec   Loss 2.9569   LearningRate 0.0046   Epoch: 15   Global Step: 651150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:09,065-Speed 2629.40 samples/sec   Loss 2.9965   LearningRate 0.0046   Epoch: 15   Global Step: 651160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:12,964-Speed 2627.64 samples/sec   Loss 2.8904   LearningRate 0.0046   Epoch: 15   Global Step: 651170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:16,873-Speed 2620.58 samples/sec   Loss 2.9472   LearningRate 0.0046   Epoch: 15   Global Step: 651180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:20,773-Speed 2625.86 samples/sec   Loss 2.9236   LearningRate 0.0046   Epoch: 15   Global Step: 651190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:24,693-Speed 2612.99 samples/sec   Loss 2.8874   LearningRate 0.0046   Epoch: 15   Global Step: 651200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:28,608-Speed 2616.73 samples/sec   Loss 2.9408   LearningRate 0.0046   Epoch: 15   Global Step: 651210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:32,509-Speed 2625.72 samples/sec   Loss 2.9467   LearningRate 0.0046   Epoch: 15   Global Step: 651220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:56:36,413-Speed 2623.50 samples/sec   Loss 2.9143   LearningRate 0.0046   Epoch: 15   Global Step: 651230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:56:40,393-Speed 2573.63 samples/sec   Loss 2.9527   LearningRate 0.0046   Epoch: 15   Global Step: 651240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:56:44,325-Speed 2605.43 samples/sec   Loss 2.9524   LearningRate 0.0046   Epoch: 15   Global Step: 651250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:56:48,222-Speed 2627.93 samples/sec   Loss 2.9316   LearningRate 0.0046   Epoch: 15   Global Step: 651260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:56:52,119-Speed 2628.37 samples/sec   Loss 2.9154   LearningRate 0.0046   Epoch: 15   Global Step: 651270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:56:56,025-Speed 2621.89 samples/sec   Loss 2.8914   LearningRate 0.0046   Epoch: 15   Global Step: 651280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:56:59,926-Speed 2625.63 samples/sec   Loss 2.9243   LearningRate 0.0046   Epoch: 15   Global Step: 651290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:57:03,822-Speed 2629.33 samples/sec   Loss 2.8829   LearningRate 0.0046   Epoch: 15   Global Step: 651300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:57:07,695-Speed 2644.96 samples/sec   Loss 2.9533   LearningRate 0.0046   Epoch: 15   Global Step: 651310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:11,619-Speed 2610.14 samples/sec   Loss 2.9448   LearningRate 0.0046   Epoch: 15   Global Step: 651320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:15,516-Speed 2628.21 samples/sec   Loss 2.8814   LearningRate 0.0046   Epoch: 15   Global Step: 651330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:19,433-Speed 2615.18 samples/sec   Loss 2.9435   LearningRate 0.0046   Epoch: 15   Global Step: 651340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:23,332-Speed 2626.72 samples/sec   Loss 2.9584   LearningRate 0.0046   Epoch: 15   Global Step: 651350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:27,231-Speed 2627.28 samples/sec   Loss 2.9371   LearningRate 0.0046   Epoch: 15   Global Step: 651360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:31,130-Speed 2626.98 samples/sec   Loss 2.8913   LearningRate 0.0046   Epoch: 15   Global Step: 651370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:35,070-Speed 2599.71 samples/sec   Loss 2.9177   LearningRate 0.0046   Epoch: 15   Global Step: 651380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:38,969-Speed 2626.72 samples/sec   Loss 2.9948   LearningRate 0.0046   Epoch: 15   Global Step: 651390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:42,863-Speed 2630.79 samples/sec   Loss 2.9765   LearningRate 0.0046   Epoch: 15   Global Step: 651400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:46,772-Speed 2620.09 samples/sec   Loss 2.8512   LearningRate 0.0046   Epoch: 15   Global Step: 651410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:57:50,655-Speed 2638.17 samples/sec   Loss 2.8709   LearningRate 0.0046   Epoch: 15   Global Step: 651420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:54,549-Speed 2629.78 samples/sec   Loss 2.9652   LearningRate 0.0046   Epoch: 15   Global Step: 651430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:57:58,652-Speed 2496.68 samples/sec   Loss 2.8474   LearningRate 0.0046   Epoch: 15   Global Step: 651440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:02,557-Speed 2622.76 samples/sec   Loss 2.8722   LearningRate 0.0046   Epoch: 15   Global Step: 651450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:06,456-Speed 2626.99 samples/sec   Loss 2.9920   LearningRate 0.0046   Epoch: 15   Global Step: 651460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:10,351-Speed 2629.82 samples/sec   Loss 2.9487   LearningRate 0.0046   Epoch: 15   Global Step: 651470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:14,249-Speed 2627.91 samples/sec   Loss 3.0033   LearningRate 0.0046   Epoch: 15   Global Step: 651480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:18,239-Speed 2567.24 samples/sec   Loss 2.9497   LearningRate 0.0046   Epoch: 15   Global Step: 651490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:22,207-Speed 2581.11 samples/sec   Loss 2.9068   LearningRate 0.0046   Epoch: 15   Global Step: 651500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:26,108-Speed 2625.07 samples/sec   Loss 2.9805   LearningRate 0.0046   Epoch: 15   Global Step: 651510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:58:30,007-Speed 2627.08 samples/sec   Loss 2.8710   LearningRate 0.0046   Epoch: 15   Global Step: 651520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:58:33,907-Speed 2626.39 samples/sec   Loss 2.9487   LearningRate 0.0046   Epoch: 15   Global Step: 651530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:58:37,815-Speed 2620.98 samples/sec   Loss 2.9074   LearningRate 0.0046   Epoch: 15   Global Step: 651540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:58:41,726-Speed 2618.78 samples/sec   Loss 2.9402   LearningRate 0.0046   Epoch: 15   Global Step: 651550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:58:45,627-Speed 2625.93 samples/sec   Loss 2.9253   LearningRate 0.0046   Epoch: 15   Global Step: 651560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:58:49,527-Speed 2626.62 samples/sec   Loss 2.9801   LearningRate 0.0046   Epoch: 15   Global Step: 651570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:58:53,433-Speed 2622.43 samples/sec   Loss 2.9515   LearningRate 0.0046   Epoch: 15   Global Step: 651580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:58:57,348-Speed 2616.17 samples/sec   Loss 2.8977   LearningRate 0.0046   Epoch: 15   Global Step: 651590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:59:01,226-Speed 2641.29 samples/sec   Loss 2.8908   LearningRate 0.0046   Epoch: 15   Global Step: 651600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:05,135-Speed 2620.30 samples/sec   Loss 2.9696   LearningRate 0.0046   Epoch: 15   Global Step: 651610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:09,060-Speed 2609.97 samples/sec   Loss 3.0572   LearningRate 0.0046   Epoch: 15   Global Step: 651620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:12,965-Speed 2623.11 samples/sec   Loss 2.9358   LearningRate 0.0046   Epoch: 15   Global Step: 651630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:16,859-Speed 2629.68 samples/sec   Loss 2.8909   LearningRate 0.0046   Epoch: 15   Global Step: 651640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:20,755-Speed 2629.85 samples/sec   Loss 2.9321   LearningRate 0.0046   Epoch: 15   Global Step: 651650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:24,652-Speed 2627.56 samples/sec   Loss 3.0015   LearningRate 0.0046   Epoch: 15   Global Step: 651660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:28,557-Speed 2625.22 samples/sec   Loss 2.9426   LearningRate 0.0046   Epoch: 15   Global Step: 651670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:32,462-Speed 2622.98 samples/sec   Loss 3.0007   LearningRate 0.0046   Epoch: 15   Global Step: 651680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:36,361-Speed 2627.21 samples/sec   Loss 3.0248   LearningRate 0.0046   Epoch: 15   Global Step: 651690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 20:59:40,255-Speed 2629.87 samples/sec   Loss 2.8412   LearningRate 0.0046   Epoch: 15   Global Step: 651700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:59:44,156-Speed 2626.13 samples/sec   Loss 2.9446   LearningRate 0.0046   Epoch: 15   Global Step: 651710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:59:48,049-Speed 2630.92 samples/sec   Loss 2.8581   LearningRate 0.0046   Epoch: 15   Global Step: 651720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:59:51,944-Speed 2630.25 samples/sec   Loss 3.0465   LearningRate 0.0046   Epoch: 15   Global Step: 651730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:59:55,870-Speed 2608.65 samples/sec   Loss 2.9431   LearningRate 0.0046   Epoch: 15   Global Step: 651740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 20:59:59,772-Speed 2625.67 samples/sec   Loss 2.9535   LearningRate 0.0046   Epoch: 15   Global Step: 651750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:00:03,667-Speed 2629.69 samples/sec   Loss 2.9370   LearningRate 0.0046   Epoch: 15   Global Step: 651760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:00:07,559-Speed 2631.46 samples/sec   Loss 2.9148   LearningRate 0.0046   Epoch: 15   Global Step: 651770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:00:11,430-Speed 2645.37 samples/sec   Loss 2.9481   LearningRate 0.0046   Epoch: 15   Global Step: 651780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:15,341-Speed 2618.67 samples/sec   Loss 2.9072   LearningRate 0.0046   Epoch: 15   Global Step: 651790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:19,250-Speed 2621.05 samples/sec   Loss 2.8910   LearningRate 0.0046   Epoch: 15   Global Step: 651800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:23,145-Speed 2629.93 samples/sec   Loss 2.9449   LearningRate 0.0046   Epoch: 15   Global Step: 651810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:27,040-Speed 2629.40 samples/sec   Loss 2.8705   LearningRate 0.0046   Epoch: 15   Global Step: 651820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:30,935-Speed 2630.07 samples/sec   Loss 2.9390   LearningRate 0.0046   Epoch: 15   Global Step: 651830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:34,849-Speed 2616.64 samples/sec   Loss 2.9321   LearningRate 0.0046   Epoch: 15   Global Step: 651840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:38,748-Speed 2627.16 samples/sec   Loss 2.9536   LearningRate 0.0046   Epoch: 15   Global Step: 651850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:42,652-Speed 2623.69 samples/sec   Loss 2.9332   LearningRate 0.0046   Epoch: 15   Global Step: 651860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:46,551-Speed 2627.25 samples/sec   Loss 2.9348   LearningRate 0.0046   Epoch: 15   Global Step: 651870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:50,432-Speed 2639.12 samples/sec   Loss 2.9831   LearningRate 0.0046   Epoch: 15   Global Step: 651880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:54,363-Speed 2605.82 samples/sec   Loss 2.9192   LearningRate 0.0046   Epoch: 15   Global Step: 651890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:00:58,258-Speed 2629.91 samples/sec   Loss 2.9055   LearningRate 0.0046   Epoch: 15   Global Step: 651900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:02,151-Speed 2630.66 samples/sec   Loss 2.9208   LearningRate 0.0046   Epoch: 15   Global Step: 651910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:06,067-Speed 2615.96 samples/sec   Loss 2.9409   LearningRate 0.0046   Epoch: 15   Global Step: 651920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:09,971-Speed 2623.61 samples/sec   Loss 2.8884   LearningRate 0.0046   Epoch: 15   Global Step: 651930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:13,875-Speed 2623.68 samples/sec   Loss 2.9509   LearningRate 0.0046   Epoch: 15   Global Step: 651940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:17,775-Speed 2626.17 samples/sec   Loss 2.8889   LearningRate 0.0046   Epoch: 15   Global Step: 651950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:21,684-Speed 2620.32 samples/sec   Loss 2.9612   LearningRate 0.0046   Epoch: 15   Global Step: 651960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:25,581-Speed 2628.24 samples/sec   Loss 2.9936   LearningRate 0.0046   Epoch: 15   Global Step: 651970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:29,475-Speed 2630.22 samples/sec   Loss 2.9145   LearningRate 0.0046   Epoch: 15   Global Step: 651980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:01:33,368-Speed 2631.17 samples/sec   Loss 2.8721   LearningRate 0.0046   Epoch: 15   Global Step: 651990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:01:37,263-Speed 2629.53 samples/sec   Loss 2.9207   LearningRate 0.0046   Epoch: 15   Global Step: 652000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:01:41,166-Speed 2623.95 samples/sec   Loss 2.9392   LearningRate 0.0046   Epoch: 15   Global Step: 652010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:01:45,046-Speed 2640.44 samples/sec   Loss 2.9898   LearningRate 0.0046   Epoch: 15   Global Step: 652020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:48,953-Speed 2621.64 samples/sec   Loss 2.9432   LearningRate 0.0046   Epoch: 15   Global Step: 652030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:52,862-Speed 2619.54 samples/sec   Loss 3.0138   LearningRate 0.0046   Epoch: 15   Global Step: 652040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:01:56,757-Speed 2630.10 samples/sec   Loss 2.8872   LearningRate 0.0046   Epoch: 15   Global Step: 652050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:00,678-Speed 2612.40 samples/sec   Loss 2.8685   LearningRate 0.0046   Epoch: 15   Global Step: 652060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:04,603-Speed 2609.38 samples/sec   Loss 2.8743   LearningRate 0.0046   Epoch: 15   Global Step: 652070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:08,497-Speed 2629.79 samples/sec   Loss 2.9134   LearningRate 0.0046   Epoch: 15   Global Step: 652080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:12,391-Speed 2631.49 samples/sec   Loss 2.9078   LearningRate 0.0046   Epoch: 15   Global Step: 652090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:16,282-Speed 2632.08 samples/sec   Loss 2.9126   LearningRate 0.0046   Epoch: 15   Global Step: 652100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:20,179-Speed 2628.69 samples/sec   Loss 2.9040   LearningRate 0.0046   Epoch: 15   Global Step: 652110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:24,072-Speed 2631.06 samples/sec   Loss 2.9028   LearningRate 0.0046   Epoch: 15   Global Step: 652120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:02:27,965-Speed 2631.38 samples/sec   Loss 2.8811   LearningRate 0.0046   Epoch: 15   Global Step: 652130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:02:31,862-Speed 2628.00 samples/sec   Loss 2.9574   LearningRate 0.0046   Epoch: 15   Global Step: 652140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:02:35,758-Speed 2629.66 samples/sec   Loss 2.8797   LearningRate 0.0046   Epoch: 15   Global Step: 652150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:02:39,631-Speed 2643.94 samples/sec   Loss 2.9367   LearningRate 0.0046   Epoch: 15   Global Step: 652160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:43,531-Speed 2626.62 samples/sec   Loss 2.8934   LearningRate 0.0046   Epoch: 15   Global Step: 652170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:47,428-Speed 2628.18 samples/sec   Loss 3.0043   LearningRate 0.0046   Epoch: 15   Global Step: 652180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:51,326-Speed 2627.61 samples/sec   Loss 2.8687   LearningRate 0.0046   Epoch: 15   Global Step: 652190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:55,238-Speed 2618.17 samples/sec   Loss 2.9578   LearningRate 0.0046   Epoch: 15   Global Step: 652200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:02:59,143-Speed 2623.42 samples/sec   Loss 2.9292   LearningRate 0.0046   Epoch: 15   Global Step: 652210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:03:03,035-Speed 2631.44 samples/sec   Loss 2.9325   LearningRate 0.0046   Epoch: 15   Global Step: 652220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:03:06,932-Speed 2628.21 samples/sec   Loss 2.9948   LearningRate 0.0046   Epoch: 15   Global Step: 652230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:03:10,832-Speed 2626.28 samples/sec   Loss 2.8950   LearningRate 0.0046   Epoch: 15   Global Step: 652240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:03:14,730-Speed 2628.09 samples/sec   Loss 2.8550   LearningRate 0.0046   Epoch: 15   Global Step: 652250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:03:18,638-Speed 2620.65 samples/sec   Loss 2.9817   LearningRate 0.0046   Epoch: 15   Global Step: 652260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:22,529-Speed 2632.30 samples/sec   Loss 2.8408   LearningRate 0.0046   Epoch: 15   Global Step: 652270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:26,426-Speed 2628.00 samples/sec   Loss 2.9632   LearningRate 0.0046   Epoch: 15   Global Step: 652280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:30,321-Speed 2630.03 samples/sec   Loss 2.8843   LearningRate 0.0046   Epoch: 15   Global Step: 652290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:34,212-Speed 2632.64 samples/sec   Loss 3.0481   LearningRate 0.0046   Epoch: 15   Global Step: 652300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:38,108-Speed 2629.24 samples/sec   Loss 2.8630   LearningRate 0.0046   Epoch: 15   Global Step: 652310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:42,007-Speed 2626.94 samples/sec   Loss 2.9136   LearningRate 0.0046   Epoch: 15   Global Step: 652320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:45,907-Speed 2625.99 samples/sec   Loss 2.8794   LearningRate 0.0046   Epoch: 15   Global Step: 652330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:03:49,777-Speed 2646.51 samples/sec   Loss 2.9213   LearningRate 0.0046   Epoch: 15   Global Step: 652340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:03:53,674-Speed 2627.85 samples/sec   Loss 2.9166   LearningRate 0.0046   Epoch: 15   Global Step: 652350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:03:57,581-Speed 2622.66 samples/sec   Loss 2.9933   LearningRate 0.0046   Epoch: 15   Global Step: 652360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:01,479-Speed 2627.38 samples/sec   Loss 2.9221   LearningRate 0.0046   Epoch: 15   Global Step: 652370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:05,386-Speed 2621.68 samples/sec   Loss 2.9273   LearningRate 0.0046   Epoch: 15   Global Step: 652380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:09,285-Speed 2626.92 samples/sec   Loss 2.8661   LearningRate 0.0046   Epoch: 15   Global Step: 652390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:13,178-Speed 2631.12 samples/sec   Loss 2.9734   LearningRate 0.0046   Epoch: 15   Global Step: 652400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:17,109-Speed 2605.22 samples/sec   Loss 2.9129   LearningRate 0.0046   Epoch: 15   Global Step: 652410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:21,136-Speed 2543.52 samples/sec   Loss 2.9406   LearningRate 0.0046   Epoch: 15   Global Step: 652420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:25,029-Speed 2630.87 samples/sec   Loss 2.9143   LearningRate 0.0046   Epoch: 15   Global Step: 652430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:04:28,903-Speed 2644.12 samples/sec   Loss 2.9253   LearningRate 0.0046   Epoch: 15   Global Step: 652440   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:04:32,801-Speed 2627.60 samples/sec   Loss 2.9414   LearningRate 0.0046   Epoch: 15   Global Step: 652450   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:04:36,697-Speed 2628.87 samples/sec   Loss 2.8311   LearningRate 0.0046   Epoch: 15   Global Step: 652460   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:04:40,594-Speed 2628.73 samples/sec   Loss 3.0117   LearningRate 0.0046   Epoch: 15   Global Step: 652470   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:04:44,494-Speed 2626.93 samples/sec   Loss 2.9600   LearningRate 0.0046   Epoch: 15   Global Step: 652480   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:04:48,394-Speed 2626.34 samples/sec   Loss 2.8280   LearningRate 0.0046   Epoch: 15   Global Step: 652490   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:04:52,290-Speed 2629.02 samples/sec   Loss 2.9393   LearningRate 0.0046   Epoch: 15   Global Step: 652500   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:04:56,186-Speed 2628.41 samples/sec   Loss 2.9262   LearningRate 0.0046   Epoch: 15   Global Step: 652510   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:05:00,079-Speed 2631.06 samples/sec   Loss 2.9949   LearningRate 0.0046   Epoch: 15   Global Step: 652520   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:05:03,978-Speed 2626.91 samples/sec   Loss 2.8842   LearningRate 0.0046   Epoch: 15   Global Step: 652530   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:05:07,873-Speed 2630.98 samples/sec   Loss 2.9020   LearningRate 0.0046   Epoch: 15   Global Step: 652540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:11,771-Speed 2626.88 samples/sec   Loss 2.9290   LearningRate 0.0046   Epoch: 15   Global Step: 652550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:15,664-Speed 2631.92 samples/sec   Loss 2.9572   LearningRate 0.0046   Epoch: 15   Global Step: 652560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:19,560-Speed 2629.52 samples/sec   Loss 2.9486   LearningRate 0.0046   Epoch: 15   Global Step: 652570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:23,460-Speed 2626.07 samples/sec   Loss 2.9433   LearningRate 0.0046   Epoch: 15   Global Step: 652580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:27,354-Speed 2630.84 samples/sec   Loss 2.9665   LearningRate 0.0046   Epoch: 15   Global Step: 652590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:31,259-Speed 2622.92 samples/sec   Loss 2.9098   LearningRate 0.0046   Epoch: 15   Global Step: 652600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:35,167-Speed 2620.78 samples/sec   Loss 2.9568   LearningRate 0.0046   Epoch: 15   Global Step: 652610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:39,058-Speed 2632.23 samples/sec   Loss 2.9070   LearningRate 0.0046   Epoch: 15   Global Step: 652620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:42,951-Speed 2631.87 samples/sec   Loss 2.9615   LearningRate 0.0045   Epoch: 15   Global Step: 652630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:46,840-Speed 2633.33 samples/sec   Loss 2.9029   LearningRate 0.0045   Epoch: 15   Global Step: 652640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:05:50,712-Speed 2645.66 samples/sec   Loss 2.8810   LearningRate 0.0045   Epoch: 15   Global Step: 652650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:54,611-Speed 2626.63 samples/sec   Loss 2.8761   LearningRate 0.0045   Epoch: 15   Global Step: 652660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:05:58,509-Speed 2627.85 samples/sec   Loss 2.8760   LearningRate 0.0045   Epoch: 15   Global Step: 652670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:02,418-Speed 2620.66 samples/sec   Loss 2.8640   LearningRate 0.0045   Epoch: 15   Global Step: 652680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:06,324-Speed 2621.59 samples/sec   Loss 2.9841   LearningRate 0.0045   Epoch: 15   Global Step: 652690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:10,241-Speed 2614.95 samples/sec   Loss 2.8948   LearningRate 0.0045   Epoch: 15   Global Step: 652700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:14,189-Speed 2595.15 samples/sec   Loss 2.9088   LearningRate 0.0045   Epoch: 15   Global Step: 652710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:18,273-Speed 2507.73 samples/sec   Loss 2.9432   LearningRate 0.0045   Epoch: 15   Global Step: 652720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:22,188-Speed 2616.24 samples/sec   Loss 2.9116   LearningRate 0.0045   Epoch: 15   Global Step: 652730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:26,093-Speed 2622.93 samples/sec   Loss 2.8809   LearningRate 0.0045   Epoch: 15   Global Step: 652740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:06:29,993-Speed 2626.67 samples/sec   Loss 2.9190   LearningRate 0.0045   Epoch: 15   Global Step: 652750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:06:33,886-Speed 2631.23 samples/sec   Loss 2.9025   LearningRate 0.0045   Epoch: 15   Global Step: 652760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:06:37,785-Speed 2626.63 samples/sec   Loss 2.8111   LearningRate 0.0045   Epoch: 15   Global Step: 652770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:06:41,686-Speed 2625.55 samples/sec   Loss 2.9488   LearningRate 0.0045   Epoch: 15   Global Step: 652780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:06:45,587-Speed 2626.47 samples/sec   Loss 2.9077   LearningRate 0.0045   Epoch: 15   Global Step: 652790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:06:49,485-Speed 2627.70 samples/sec   Loss 2.9323   LearningRate 0.0045   Epoch: 15   Global Step: 652800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:06:53,398-Speed 2617.19 samples/sec   Loss 2.9601   LearningRate 0.0045   Epoch: 15   Global Step: 652810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:06:57,360-Speed 2586.26 samples/sec   Loss 2.9241   LearningRate 0.0045   Epoch: 15   Global Step: 652820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:01,269-Speed 2620.22 samples/sec   Loss 2.9286   LearningRate 0.0045   Epoch: 15   Global Step: 652830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:05,170-Speed 2625.51 samples/sec   Loss 2.9409   LearningRate 0.0045   Epoch: 15   Global Step: 652840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:09,059-Speed 2633.86 samples/sec   Loss 2.8369   LearningRate 0.0045   Epoch: 15   Global Step: 652850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:12,973-Speed 2616.84 samples/sec   Loss 2.9248   LearningRate 0.0045   Epoch: 15   Global Step: 652860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:16,873-Speed 2626.04 samples/sec   Loss 2.8591   LearningRate 0.0045   Epoch: 15   Global Step: 652870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:20,862-Speed 2569.60 samples/sec   Loss 2.9123   LearningRate 0.0045   Epoch: 15   Global Step: 652880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:24,801-Speed 2599.80 samples/sec   Loss 2.8331   LearningRate 0.0045   Epoch: 15   Global Step: 652890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:07:28,675-Speed 2644.18 samples/sec   Loss 2.9711   LearningRate 0.0045   Epoch: 15   Global Step: 652900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:32,568-Speed 2631.12 samples/sec   Loss 2.9224   LearningRate 0.0045   Epoch: 15   Global Step: 652910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:36,463-Speed 2629.32 samples/sec   Loss 2.9395   LearningRate 0.0045   Epoch: 15   Global Step: 652920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:40,364-Speed 2625.54 samples/sec   Loss 2.9112   LearningRate 0.0045   Epoch: 15   Global Step: 652930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:44,279-Speed 2616.52 samples/sec   Loss 3.0127   LearningRate 0.0045   Epoch: 15   Global Step: 652940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:48,201-Speed 2611.48 samples/sec   Loss 2.8611   LearningRate 0.0045   Epoch: 15   Global Step: 652950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:52,095-Speed 2631.10 samples/sec   Loss 2.8443   LearningRate 0.0045   Epoch: 15   Global Step: 652960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:55,987-Speed 2631.58 samples/sec   Loss 2.9375   LearningRate 0.0045   Epoch: 15   Global Step: 652970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:07:59,879-Speed 2631.40 samples/sec   Loss 2.8875   LearningRate 0.0045   Epoch: 15   Global Step: 652980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:03,775-Speed 2628.86 samples/sec   Loss 2.9044   LearningRate 0.0045   Epoch: 15   Global Step: 652990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:07,679-Speed 2623.46 samples/sec   Loss 2.9140   LearningRate 0.0045   Epoch: 15   Global Step: 653000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:08:11,591-Speed 2618.27 samples/sec   Loss 2.9442   LearningRate 0.0045   Epoch: 15   Global Step: 653010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:08:15,463-Speed 2645.75 samples/sec   Loss 2.9629   LearningRate 0.0045   Epoch: 15   Global Step: 653020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:19,359-Speed 2629.53 samples/sec   Loss 2.9136   LearningRate 0.0045   Epoch: 15   Global Step: 653030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:23,257-Speed 2627.41 samples/sec   Loss 2.9019   LearningRate 0.0045   Epoch: 15   Global Step: 653040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:27,157-Speed 2626.77 samples/sec   Loss 2.9119   LearningRate 0.0045   Epoch: 15   Global Step: 653050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:31,053-Speed 2628.36 samples/sec   Loss 2.9317   LearningRate 0.0045   Epoch: 15   Global Step: 653060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:34,948-Speed 2629.99 samples/sec   Loss 2.9332   LearningRate 0.0045   Epoch: 15   Global Step: 653070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:38,842-Speed 2630.16 samples/sec   Loss 2.9264   LearningRate 0.0045   Epoch: 15   Global Step: 653080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:42,734-Speed 2631.71 samples/sec   Loss 3.0177   LearningRate 0.0045   Epoch: 15   Global Step: 653090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:46,634-Speed 2626.58 samples/sec   Loss 2.8738   LearningRate 0.0045   Epoch: 15   Global Step: 653100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:50,525-Speed 2632.61 samples/sec   Loss 2.8902   LearningRate 0.0045   Epoch: 15   Global Step: 653110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:08:54,418-Speed 2630.37 samples/sec   Loss 2.8546   LearningRate 0.0045   Epoch: 15   Global Step: 653120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:08:58,324-Speed 2622.69 samples/sec   Loss 2.9215   LearningRate 0.0045   Epoch: 15   Global Step: 653130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:09:02,205-Speed 2638.56 samples/sec   Loss 2.9691   LearningRate 0.0045   Epoch: 15   Global Step: 653140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:06,106-Speed 2625.56 samples/sec   Loss 2.9181   LearningRate 0.0045   Epoch: 15   Global Step: 653150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:10,006-Speed 2626.64 samples/sec   Loss 2.9062   LearningRate 0.0045   Epoch: 15   Global Step: 653160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:13,918-Speed 2618.34 samples/sec   Loss 2.8841   LearningRate 0.0045   Epoch: 15   Global Step: 653170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:17,819-Speed 2625.52 samples/sec   Loss 3.0000   LearningRate 0.0045   Epoch: 15   Global Step: 653180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:21,730-Speed 2618.99 samples/sec   Loss 2.8739   LearningRate 0.0045   Epoch: 15   Global Step: 653190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:25,630-Speed 2625.44 samples/sec   Loss 2.9122   LearningRate 0.0045   Epoch: 15   Global Step: 653200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:29,534-Speed 2623.50 samples/sec   Loss 2.8881   LearningRate 0.0045   Epoch: 15   Global Step: 653210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:33,444-Speed 2620.07 samples/sec   Loss 2.9537   LearningRate 0.0045   Epoch: 15   Global Step: 653220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:37,342-Speed 2627.69 samples/sec   Loss 2.9084   LearningRate 0.0045   Epoch: 15   Global Step: 653230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:09:41,238-Speed 2629.28 samples/sec   Loss 2.9670   LearningRate 0.0045   Epoch: 15   Global Step: 653240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:09:45,243-Speed 2557.00 samples/sec   Loss 2.9257   LearningRate 0.0045   Epoch: 15   Global Step: 653250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:09:49,141-Speed 2628.11 samples/sec   Loss 2.9458   LearningRate 0.0045   Epoch: 15   Global Step: 653260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:09:53,043-Speed 2625.09 samples/sec   Loss 2.8846   LearningRate 0.0045   Epoch: 15   Global Step: 653270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:09:56,945-Speed 2624.74 samples/sec   Loss 2.9629   LearningRate 0.0045   Epoch: 15   Global Step: 653280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:10:00,842-Speed 2627.92 samples/sec   Loss 2.9314   LearningRate 0.0045   Epoch: 15   Global Step: 653290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:10:04,738-Speed 2628.81 samples/sec   Loss 2.9015   LearningRate 0.0045   Epoch: 15   Global Step: 653300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:10:08,609-Speed 2646.10 samples/sec   Loss 2.9367   LearningRate 0.0045   Epoch: 15   Global Step: 653310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:12,503-Speed 2630.78 samples/sec   Loss 2.9008   LearningRate 0.0045   Epoch: 15   Global Step: 653320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:16,404-Speed 2625.23 samples/sec   Loss 2.8890   LearningRate 0.0045   Epoch: 15   Global Step: 653330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:20,305-Speed 2626.08 samples/sec   Loss 2.9340   LearningRate 0.0045   Epoch: 15   Global Step: 653340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:24,216-Speed 2619.35 samples/sec   Loss 2.9430   LearningRate 0.0045   Epoch: 15   Global Step: 653350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:28,114-Speed 2627.30 samples/sec   Loss 2.9029   LearningRate 0.0045   Epoch: 15   Global Step: 653360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:32,128-Speed 2551.77 samples/sec   Loss 2.9032   LearningRate 0.0045   Epoch: 15   Global Step: 653370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:36,119-Speed 2566.77 samples/sec   Loss 2.9182   LearningRate 0.0045   Epoch: 15   Global Step: 653380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:40,020-Speed 2626.22 samples/sec   Loss 2.8864   LearningRate 0.0045   Epoch: 15   Global Step: 653390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:43,934-Speed 2616.59 samples/sec   Loss 2.8903   LearningRate 0.0045   Epoch: 15   Global Step: 653400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:47,830-Speed 2629.53 samples/sec   Loss 2.9056   LearningRate 0.0045   Epoch: 15   Global Step: 653410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:10:51,707-Speed 2641.78 samples/sec   Loss 2.9274   LearningRate 0.0045   Epoch: 15   Global Step: 653420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:55,617-Speed 2619.83 samples/sec   Loss 2.9849   LearningRate 0.0045   Epoch: 15   Global Step: 653430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:10:59,512-Speed 2629.42 samples/sec   Loss 2.9069   LearningRate 0.0045   Epoch: 15   Global Step: 653440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:03,419-Speed 2621.34 samples/sec   Loss 2.8935   LearningRate 0.0045   Epoch: 15   Global Step: 653450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:07,340-Speed 2611.97 samples/sec   Loss 2.8806   LearningRate 0.0045   Epoch: 15   Global Step: 653460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:11,246-Speed 2622.69 samples/sec   Loss 2.8872   LearningRate 0.0045   Epoch: 15   Global Step: 653470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:15,150-Speed 2623.79 samples/sec   Loss 2.8994   LearningRate 0.0045   Epoch: 15   Global Step: 653480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:19,047-Speed 2628.16 samples/sec   Loss 3.0075   LearningRate 0.0045   Epoch: 15   Global Step: 653490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:22,947-Speed 2626.36 samples/sec   Loss 2.9183   LearningRate 0.0045   Epoch: 15   Global Step: 653500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:26,845-Speed 2627.54 samples/sec   Loss 2.8692   LearningRate 0.0045   Epoch: 15   Global Step: 653510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:30,738-Speed 2631.72 samples/sec   Loss 2.9127   LearningRate 0.0045   Epoch: 15   Global Step: 653520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:11:34,637-Speed 2626.79 samples/sec   Loss 2.8962   LearningRate 0.0045   Epoch: 15   Global Step: 653530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:11:38,533-Speed 2628.72 samples/sec   Loss 2.8989   LearningRate 0.0045   Epoch: 15   Global Step: 653540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:11:42,430-Speed 2627.73 samples/sec   Loss 2.9288   LearningRate 0.0045   Epoch: 15   Global Step: 653550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:11:46,333-Speed 2625.01 samples/sec   Loss 2.9267   LearningRate 0.0045   Epoch: 15   Global Step: 653560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:11:50,203-Speed 2646.69 samples/sec   Loss 2.8730   LearningRate 0.0045   Epoch: 15   Global Step: 653570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:54,151-Speed 2594.58 samples/sec   Loss 2.8671   LearningRate 0.0045   Epoch: 15   Global Step: 653580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:11:58,048-Speed 2628.44 samples/sec   Loss 2.9549   LearningRate 0.0045   Epoch: 15   Global Step: 653590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:01,946-Speed 2627.44 samples/sec   Loss 2.9720   LearningRate 0.0045   Epoch: 15   Global Step: 653600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:05,851-Speed 2622.67 samples/sec   Loss 2.9292   LearningRate 0.0045   Epoch: 15   Global Step: 653610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:09,785-Speed 2603.21 samples/sec   Loss 2.9416   LearningRate 0.0045   Epoch: 15   Global Step: 653620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:13,678-Speed 2631.28 samples/sec   Loss 2.9654   LearningRate 0.0045   Epoch: 15   Global Step: 653630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:17,570-Speed 2631.72 samples/sec   Loss 2.8519   LearningRate 0.0045   Epoch: 15   Global Step: 653640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:21,473-Speed 2624.29 samples/sec   Loss 2.8438   LearningRate 0.0045   Epoch: 15   Global Step: 653650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:25,365-Speed 2631.43 samples/sec   Loss 2.8841   LearningRate 0.0045   Epoch: 15   Global Step: 653660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:29,237-Speed 2646.79 samples/sec   Loss 2.8892   LearningRate 0.0045   Epoch: 15   Global Step: 653670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:33,142-Speed 2622.57 samples/sec   Loss 2.8311   LearningRate 0.0045   Epoch: 15   Global Step: 653680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:37,035-Speed 2630.38 samples/sec   Loss 2.9021   LearningRate 0.0045   Epoch: 15   Global Step: 653690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:40,929-Speed 2630.43 samples/sec   Loss 2.8412   LearningRate 0.0045   Epoch: 15   Global Step: 653700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:44,825-Speed 2629.38 samples/sec   Loss 2.9074   LearningRate 0.0045   Epoch: 15   Global Step: 653710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:48,719-Speed 2630.53 samples/sec   Loss 2.8304   LearningRate 0.0045   Epoch: 15   Global Step: 653720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:52,635-Speed 2617.47 samples/sec   Loss 2.9003   LearningRate 0.0045   Epoch: 15   Global Step: 653730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:12:56,538-Speed 2624.14 samples/sec   Loss 2.9021   LearningRate 0.0045   Epoch: 15   Global Step: 653740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:13:00,450-Speed 2618.79 samples/sec   Loss 2.8462   LearningRate 0.0045   Epoch: 15   Global Step: 653750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:13:04,359-Speed 2619.94 samples/sec   Loss 2.8437   LearningRate 0.0045   Epoch: 15   Global Step: 653760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:13:08,284-Speed 2609.73 samples/sec   Loss 2.9318   LearningRate 0.0045   Epoch: 15   Global Step: 653770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:12,195-Speed 2618.28 samples/sec   Loss 2.9574   LearningRate 0.0045   Epoch: 15   Global Step: 653780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:16,101-Speed 2623.18 samples/sec   Loss 2.8714   LearningRate 0.0045   Epoch: 15   Global Step: 653790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:19,995-Speed 2630.38 samples/sec   Loss 2.9080   LearningRate 0.0045   Epoch: 15   Global Step: 653800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:23,915-Speed 2612.71 samples/sec   Loss 2.8656   LearningRate 0.0045   Epoch: 15   Global Step: 653810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:27,828-Speed 2617.94 samples/sec   Loss 2.8480   LearningRate 0.0045   Epoch: 15   Global Step: 653820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:31,740-Speed 2618.34 samples/sec   Loss 2.8192   LearningRate 0.0045   Epoch: 15   Global Step: 653830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:35,649-Speed 2620.28 samples/sec   Loss 2.8488   LearningRate 0.0045   Epoch: 15   Global Step: 653840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:39,543-Speed 2630.40 samples/sec   Loss 2.8823   LearningRate 0.0045   Epoch: 15   Global Step: 653850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:43,441-Speed 2627.70 samples/sec   Loss 2.9389   LearningRate 0.0045   Epoch: 15   Global Step: 653860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:47,320-Speed 2640.90 samples/sec   Loss 2.9336   LearningRate 0.0045   Epoch: 15   Global Step: 653870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:51,388-Speed 2517.72 samples/sec   Loss 2.8379   LearningRate 0.0045   Epoch: 15   Global Step: 653880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:55,322-Speed 2603.76 samples/sec   Loss 2.9180   LearningRate 0.0045   Epoch: 15   Global Step: 653890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:13:59,216-Speed 2630.58 samples/sec   Loss 2.8902   LearningRate 0.0045   Epoch: 15   Global Step: 653900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:03,118-Speed 2625.09 samples/sec   Loss 2.8853   LearningRate 0.0045   Epoch: 15   Global Step: 653910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:07,037-Speed 2613.08 samples/sec   Loss 2.8741   LearningRate 0.0045   Epoch: 15   Global Step: 653920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:10,934-Speed 2628.77 samples/sec   Loss 2.8509   LearningRate 0.0045   Epoch: 15   Global Step: 653930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:14,840-Speed 2622.69 samples/sec   Loss 2.9025   LearningRate 0.0045   Epoch: 15   Global Step: 653940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:18,738-Speed 2627.38 samples/sec   Loss 2.8631   LearningRate 0.0045   Epoch: 15   Global Step: 653950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:22,660-Speed 2611.22 samples/sec   Loss 2.8363   LearningRate 0.0045   Epoch: 15   Global Step: 653960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:26,541-Speed 2640.55 samples/sec   Loss 2.9015   LearningRate 0.0045   Epoch: 15   Global Step: 653970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:30,441-Speed 2626.68 samples/sec   Loss 2.8158   LearningRate 0.0045   Epoch: 15   Global Step: 653980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:14:34,344-Speed 2624.02 samples/sec   Loss 2.9416   LearningRate 0.0045   Epoch: 15   Global Step: 653990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:14:38,260-Speed 2615.50 samples/sec   Loss 2.9236   LearningRate 0.0045   Epoch: 15   Global Step: 654000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:14:42,163-Speed 2624.44 samples/sec   Loss 2.9099   LearningRate 0.0045   Epoch: 15   Global Step: 654010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:14:46,070-Speed 2621.58 samples/sec   Loss 2.9156   LearningRate 0.0045   Epoch: 15   Global Step: 654020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:14:50,146-Speed 2512.90 samples/sec   Loss 2.8418   LearningRate 0.0045   Epoch: 15   Global Step: 654030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:14:54,045-Speed 2627.21 samples/sec   Loss 2.9115   LearningRate 0.0045   Epoch: 15   Global Step: 654040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:14:57,944-Speed 2626.84 samples/sec   Loss 2.9593   LearningRate 0.0045   Epoch: 15   Global Step: 654050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:15:01,862-Speed 2613.89 samples/sec   Loss 2.8825   LearningRate 0.0045   Epoch: 15   Global Step: 654060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:15:05,793-Speed 2606.26 samples/sec   Loss 2.8732   LearningRate 0.0045   Epoch: 15   Global Step: 654070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:15:09,687-Speed 2630.76 samples/sec   Loss 2.7822   LearningRate 0.0045   Epoch: 15   Global Step: 654080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:15:13,617-Speed 2606.80 samples/sec   Loss 2.9583   LearningRate 0.0045   Epoch: 15   Global Step: 654090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:17,514-Speed 2628.82 samples/sec   Loss 2.9397   LearningRate 0.0045   Epoch: 15   Global Step: 654100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:21,418-Speed 2623.24 samples/sec   Loss 2.8760   LearningRate 0.0045   Epoch: 15   Global Step: 654110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:25,315-Speed 2628.16 samples/sec   Loss 2.8629   LearningRate 0.0045   Epoch: 15   Global Step: 654120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:29,206-Speed 2632.74 samples/sec   Loss 2.9191   LearningRate 0.0045   Epoch: 15   Global Step: 654130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:33,099-Speed 2631.31 samples/sec   Loss 2.9251   LearningRate 0.0045   Epoch: 15   Global Step: 654140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:36,994-Speed 2629.55 samples/sec   Loss 2.9084   LearningRate 0.0045   Epoch: 15   Global Step: 654150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:40,896-Speed 2625.72 samples/sec   Loss 2.9121   LearningRate 0.0045   Epoch: 15   Global Step: 654160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:44,806-Speed 2619.31 samples/sec   Loss 2.9709   LearningRate 0.0045   Epoch: 15   Global Step: 654170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:48,700-Speed 2630.92 samples/sec   Loss 2.7925   LearningRate 0.0045   Epoch: 15   Global Step: 654180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:15:52,575-Speed 2642.65 samples/sec   Loss 2.9343   LearningRate 0.0045   Epoch: 15   Global Step: 654190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:15:56,478-Speed 2624.10 samples/sec   Loss 2.9441   LearningRate 0.0045   Epoch: 15   Global Step: 654200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:00,375-Speed 2628.82 samples/sec   Loss 2.8924   LearningRate 0.0045   Epoch: 15   Global Step: 654210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:04,289-Speed 2617.01 samples/sec   Loss 2.9028   LearningRate 0.0045   Epoch: 15   Global Step: 654220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:08,187-Speed 2627.34 samples/sec   Loss 2.8508   LearningRate 0.0045   Epoch: 15   Global Step: 654230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:12,081-Speed 2630.36 samples/sec   Loss 2.8662   LearningRate 0.0045   Epoch: 15   Global Step: 654240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:15,973-Speed 2631.83 samples/sec   Loss 2.8984   LearningRate 0.0045   Epoch: 15   Global Step: 654250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:19,872-Speed 2627.49 samples/sec   Loss 2.9540   LearningRate 0.0045   Epoch: 15   Global Step: 654260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:23,770-Speed 2627.54 samples/sec   Loss 2.9380   LearningRate 0.0045   Epoch: 15   Global Step: 654270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:27,669-Speed 2627.01 samples/sec   Loss 2.8499   LearningRate 0.0045   Epoch: 15   Global Step: 654280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:16:31,575-Speed 2621.92 samples/sec   Loss 2.9225   LearningRate 0.0045   Epoch: 15   Global Step: 654290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:16:35,467-Speed 2632.12 samples/sec   Loss 2.9287   LearningRate 0.0045   Epoch: 15   Global Step: 654300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:16:39,427-Speed 2586.49 samples/sec   Loss 2.9065   LearningRate 0.0045   Epoch: 15   Global Step: 654310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:16:43,346-Speed 2612.86 samples/sec   Loss 2.8628   LearningRate 0.0045   Epoch: 15   Global Step: 654320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:16:47,300-Speed 2590.85 samples/sec   Loss 2.9338   LearningRate 0.0045   Epoch: 15   Global Step: 654330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:16:51,196-Speed 2629.60 samples/sec   Loss 2.7921   LearningRate 0.0045   Epoch: 15   Global Step: 654340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:16:55,109-Speed 2617.81 samples/sec   Loss 2.8679   LearningRate 0.0045   Epoch: 15   Global Step: 654350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:16:59,020-Speed 2618.61 samples/sec   Loss 2.9477   LearningRate 0.0045   Epoch: 15   Global Step: 654360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:02,919-Speed 2627.31 samples/sec   Loss 2.8526   LearningRate 0.0045   Epoch: 15   Global Step: 654370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:06,834-Speed 2615.47 samples/sec   Loss 2.9669   LearningRate 0.0045   Epoch: 15   Global Step: 654380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:10,718-Speed 2637.71 samples/sec   Loss 2.8113   LearningRate 0.0045   Epoch: 15   Global Step: 654390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:14,612-Speed 2630.38 samples/sec   Loss 2.9399   LearningRate 0.0045   Epoch: 15   Global Step: 654400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:18,532-Speed 2613.03 samples/sec   Loss 2.9322   LearningRate 0.0045   Epoch: 15   Global Step: 654410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:22,437-Speed 2623.26 samples/sec   Loss 2.7972   LearningRate 0.0045   Epoch: 15   Global Step: 654420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:26,333-Speed 2629.13 samples/sec   Loss 2.8909   LearningRate 0.0045   Epoch: 15   Global Step: 654430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:30,248-Speed 2616.43 samples/sec   Loss 2.9079   LearningRate 0.0045   Epoch: 15   Global Step: 654440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:34,149-Speed 2625.26 samples/sec   Loss 2.9158   LearningRate 0.0045   Epoch: 15   Global Step: 654450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:38,041-Speed 2631.92 samples/sec   Loss 2.9150   LearningRate 0.0045   Epoch: 15   Global Step: 654460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:41,938-Speed 2627.79 samples/sec   Loss 2.9835   LearningRate 0.0045   Epoch: 15   Global Step: 654470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:45,834-Speed 2629.60 samples/sec   Loss 2.9591   LearningRate 0.0045   Epoch: 15   Global Step: 654480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:49,703-Speed 2647.56 samples/sec   Loss 2.9148   LearningRate 0.0045   Epoch: 15   Global Step: 654490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:53,596-Speed 2631.08 samples/sec   Loss 2.8423   LearningRate 0.0045   Epoch: 15   Global Step: 654500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:17:57,474-Speed 2641.12 samples/sec   Loss 2.8952   LearningRate 0.0045   Epoch: 15   Global Step: 654510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:01,368-Speed 2630.65 samples/sec   Loss 2.9050   LearningRate 0.0045   Epoch: 15   Global Step: 654520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:05,268-Speed 2626.36 samples/sec   Loss 2.8571   LearningRate 0.0045   Epoch: 15   Global Step: 654530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:09,163-Speed 2629.57 samples/sec   Loss 2.9363   LearningRate 0.0045   Epoch: 15   Global Step: 654540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:13,062-Speed 2626.63 samples/sec   Loss 2.9223   LearningRate 0.0045   Epoch: 15   Global Step: 654550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:16,958-Speed 2629.15 samples/sec   Loss 2.8279   LearningRate 0.0045   Epoch: 15   Global Step: 654560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:20,852-Speed 2630.94 samples/sec   Loss 2.9592   LearningRate 0.0045   Epoch: 15   Global Step: 654570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:24,751-Speed 2626.77 samples/sec   Loss 2.8757   LearningRate 0.0044   Epoch: 15   Global Step: 654580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:28,653-Speed 2631.35 samples/sec   Loss 2.9407   LearningRate 0.0044   Epoch: 15   Global Step: 654590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:32,548-Speed 2629.56 samples/sec   Loss 2.8684   LearningRate 0.0044   Epoch: 15   Global Step: 654600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:18:36,447-Speed 2626.76 samples/sec   Loss 2.8081   LearningRate 0.0044   Epoch: 15   Global Step: 654610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:18:40,346-Speed 2627.00 samples/sec   Loss 2.8120   LearningRate 0.0044   Epoch: 15   Global Step: 654620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:18:44,240-Speed 2630.26 samples/sec   Loss 2.8251   LearningRate 0.0044   Epoch: 15   Global Step: 654630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:18:48,134-Speed 2629.90 samples/sec   Loss 2.8912   LearningRate 0.0044   Epoch: 15   Global Step: 654640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:18:52,029-Speed 2630.75 samples/sec   Loss 2.8555   LearningRate 0.0044   Epoch: 15   Global Step: 654650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:18:55,926-Speed 2628.38 samples/sec   Loss 2.9436   LearningRate 0.0044   Epoch: 15   Global Step: 654660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:18:59,820-Speed 2630.80 samples/sec   Loss 2.9226   LearningRate 0.0044   Epoch: 15   Global Step: 654670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:19:03,719-Speed 2626.65 samples/sec   Loss 2.9127   LearningRate 0.0044   Epoch: 15   Global Step: 654680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:19:07,615-Speed 2628.56 samples/sec   Loss 2.9427   LearningRate 0.0044   Epoch: 15   Global Step: 654690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:19:11,515-Speed 2626.06 samples/sec   Loss 2.8885   LearningRate 0.0044   Epoch: 15   Global Step: 654700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:19:15,410-Speed 2630.85 samples/sec   Loss 2.8714   LearningRate 0.0044   Epoch: 15   Global Step: 654710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-15 21:19:19,282-Speed 2644.70 samples/sec   Loss 2.8687   LearningRate 0.0044   Epoch: 15   Global Step: 654720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:19:23,160-Speed 2641.47 samples/sec   Loss 2.8551   LearningRate 0.0044   Epoch: 15   Global Step: 654730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:19:27,057-Speed 2628.15 samples/sec   Loss 2.8815   LearningRate 0.0044   Epoch: 15   Global Step: 654740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:19:30,996-Speed 2601.04 samples/sec   Loss 2.8910   LearningRate 0.0044   Epoch: 15   Global Step: 654750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:19:34,924-Speed 2607.46 samples/sec   Loss 2.9187   LearningRate 0.0044   Epoch: 15   Global Step: 654760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:19:38,821-Speed 2628.18 samples/sec   Loss 2.8485   LearningRate 0.0044   Epoch: 15   Global Step: 654770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:19:42,717-Speed 2629.13 samples/sec   Loss 2.8731   LearningRate 0.0044   Epoch: 15   Global Step: 654780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:19:46,646-Speed 2607.15 samples/sec   Loss 2.9060   LearningRate 0.0044   Epoch: 15   Global Step: 654790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:19:50,517-Speed 2646.25 samples/sec   Loss 2.9168   LearningRate 0.0044   Epoch: 15   Global Step: 654800   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:19:54,417-Speed 2626.75 samples/sec   Loss 2.8979   LearningRate 0.0044   Epoch: 15   Global Step: 654810   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:19:58,315-Speed 2627.58 samples/sec   Loss 2.9491   LearningRate 0.0044   Epoch: 15   Global Step: 654820   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:02,216-Speed 2625.37 samples/sec   Loss 2.9102   LearningRate 0.0044   Epoch: 15   Global Step: 654830   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:06,116-Speed 2626.56 samples/sec   Loss 2.9056   LearningRate 0.0044   Epoch: 15   Global Step: 654840   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:10,009-Speed 2630.72 samples/sec   Loss 2.7900   LearningRate 0.0044   Epoch: 15   Global Step: 654850   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:13,902-Speed 2631.00 samples/sec   Loss 2.9134   LearningRate 0.0044   Epoch: 15   Global Step: 654860   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:17,798-Speed 2629.48 samples/sec   Loss 2.8689   LearningRate 0.0044   Epoch: 15   Global Step: 654870   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:21,694-Speed 2629.25 samples/sec   Loss 2.8419   LearningRate 0.0044   Epoch: 15   Global Step: 654880   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:25,593-Speed 2626.87 samples/sec   Loss 2.8448   LearningRate 0.0044   Epoch: 15   Global Step: 654890   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:20:29,492-Speed 2626.89 samples/sec   Loss 2.8260   LearningRate 0.0044   Epoch: 15   Global Step: 654900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:20:33,390-Speed 2627.15 samples/sec   Loss 2.8770   LearningRate 0.0044   Epoch: 15   Global Step: 654910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:20:37,348-Speed 2587.78 samples/sec   Loss 2.9182   LearningRate 0.0044   Epoch: 15   Global Step: 654920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:20:41,248-Speed 2626.45 samples/sec   Loss 2.9265   LearningRate 0.0044   Epoch: 15   Global Step: 654930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:20:45,150-Speed 2625.33 samples/sec   Loss 2.8147   LearningRate 0.0044   Epoch: 15   Global Step: 654940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:20:49,135-Speed 2570.23 samples/sec   Loss 2.8762   LearningRate 0.0044   Epoch: 15   Global Step: 654950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:20:53,033-Speed 2627.99 samples/sec   Loss 2.8708   LearningRate 0.0044   Epoch: 15   Global Step: 654960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:20:56,942-Speed 2620.15 samples/sec   Loss 2.8597   LearningRate 0.0044   Epoch: 15   Global Step: 654970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:00,844-Speed 2625.10 samples/sec   Loss 2.8717   LearningRate 0.0044   Epoch: 15   Global Step: 654980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:04,738-Speed 2629.90 samples/sec   Loss 2.9238   LearningRate 0.0044   Epoch: 15   Global Step: 654990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:08,633-Speed 2629.37 samples/sec   Loss 3.0090   LearningRate 0.0044   Epoch: 15   Global Step: 655000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:21:12,636-Speed 2558.61 samples/sec   Loss 2.9379   LearningRate 0.0044   Epoch: 15   Global Step: 655010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:21:16,533-Speed 2628.40 samples/sec   Loss 2.8651   LearningRate 0.0044   Epoch: 15   Global Step: 655020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:21:20,420-Speed 2635.35 samples/sec   Loss 2.9360   LearningRate 0.0044   Epoch: 15   Global Step: 655030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:24,324-Speed 2623.70 samples/sec   Loss 2.9329   LearningRate 0.0044   Epoch: 15   Global Step: 655040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:28,240-Speed 2615.58 samples/sec   Loss 2.9303   LearningRate 0.0044   Epoch: 15   Global Step: 655050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:32,134-Speed 2630.08 samples/sec   Loss 2.8045   LearningRate 0.0044   Epoch: 15   Global Step: 655060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:36,030-Speed 2628.74 samples/sec   Loss 2.8706   LearningRate 0.0044   Epoch: 15   Global Step: 655070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:39,932-Speed 2625.64 samples/sec   Loss 2.9293   LearningRate 0.0044   Epoch: 15   Global Step: 655080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:43,834-Speed 2624.63 samples/sec   Loss 2.8749   LearningRate 0.0044   Epoch: 15   Global Step: 655090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:47,738-Speed 2624.14 samples/sec   Loss 2.8566   LearningRate 0.0044   Epoch: 15   Global Step: 655100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:51,632-Speed 2630.01 samples/sec   Loss 2.8565   LearningRate 0.0044   Epoch: 15   Global Step: 655110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:55,534-Speed 2625.82 samples/sec   Loss 2.9320   LearningRate 0.0044   Epoch: 15   Global Step: 655120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:21:59,445-Speed 2618.89 samples/sec   Loss 2.8574   LearningRate 0.0044   Epoch: 15   Global Step: 655130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:22:03,327-Speed 2638.11 samples/sec   Loss 2.8818   LearningRate 0.0044   Epoch: 15   Global Step: 655140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:22:07,222-Speed 2629.72 samples/sec   Loss 2.8873   LearningRate 0.0044   Epoch: 15   Global Step: 655150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:22:11,120-Speed 2627.20 samples/sec   Loss 2.9109   LearningRate 0.0044   Epoch: 15   Global Step: 655160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:22:14,996-Speed 2643.04 samples/sec   Loss 2.7957   LearningRate 0.0044   Epoch: 15   Global Step: 655170   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:18,906-Speed 2620.10 samples/sec   Loss 2.8984   LearningRate 0.0044   Epoch: 15   Global Step: 655180   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:22,842-Speed 2601.63 samples/sec   Loss 2.9312   LearningRate 0.0044   Epoch: 15   Global Step: 655190   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:26,745-Speed 2625.79 samples/sec   Loss 2.8865   LearningRate 0.0044   Epoch: 15   Global Step: 655200   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:30,643-Speed 2627.26 samples/sec   Loss 2.8130   LearningRate 0.0044   Epoch: 15   Global Step: 655210   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:34,538-Speed 2629.48 samples/sec   Loss 2.9115   LearningRate 0.0044   Epoch: 15   Global Step: 655220   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:38,435-Speed 2628.27 samples/sec   Loss 2.8933   LearningRate 0.0044   Epoch: 15   Global Step: 655230   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:42,333-Speed 2627.53 samples/sec   Loss 2.9153   LearningRate 0.0044   Epoch: 15   Global Step: 655240   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:46,237-Speed 2623.94 samples/sec   Loss 2.9185   LearningRate 0.0044   Epoch: 15   Global Step: 655250   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:50,149-Speed 2618.69 samples/sec   Loss 2.8769   LearningRate 0.0044   Epoch: 15   Global Step: 655260   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-15 21:22:54,059-Speed 2619.78 samples/sec   Loss 2.9231   LearningRate 0.0044   Epoch: 15   Global Step: 655270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:22:57,954-Speed 2629.52 samples/sec   Loss 2.8645   LearningRate 0.0044   Epoch: 15   Global Step: 655280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:01,853-Speed 2626.40 samples/sec   Loss 2.8523   LearningRate 0.0044   Epoch: 15   Global Step: 655290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:05,751-Speed 2627.62 samples/sec   Loss 2.8863   LearningRate 0.0044   Epoch: 15   Global Step: 655300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:09,654-Speed 2624.50 samples/sec   Loss 2.8965   LearningRate 0.0044   Epoch: 15   Global Step: 655310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:13,546-Speed 2631.97 samples/sec   Loss 2.8119   LearningRate 0.0044   Epoch: 15   Global Step: 655320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:17,448-Speed 2624.45 samples/sec   Loss 2.8303   LearningRate 0.0044   Epoch: 15   Global Step: 655330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:21,342-Speed 2630.78 samples/sec   Loss 2.8523   LearningRate 0.0044   Epoch: 15   Global Step: 655340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:25,247-Speed 2622.74 samples/sec   Loss 2.8456   LearningRate 0.0044   Epoch: 15   Global Step: 655350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:29,145-Speed 2628.46 samples/sec   Loss 2.9390   LearningRate 0.0044   Epoch: 15   Global Step: 655360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:23:33,041-Speed 2629.11 samples/sec   Loss 2.8560   LearningRate 0.0044   Epoch: 15   Global Step: 655370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:23:36,932-Speed 2632.35 samples/sec   Loss 2.9124   LearningRate 0.0044   Epoch: 15   Global Step: 655380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:23:40,824-Speed 2631.22 samples/sec   Loss 2.8491   LearningRate 0.0044   Epoch: 15   Global Step: 655390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:23:44,724-Speed 2627.10 samples/sec   Loss 2.8776   LearningRate 0.0044   Epoch: 15   Global Step: 655400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:23:48,618-Speed 2630.23 samples/sec   Loss 2.8601   LearningRate 0.0044   Epoch: 15   Global Step: 655410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:23:52,519-Speed 2625.51 samples/sec   Loss 2.8920   LearningRate 0.0044   Epoch: 15   Global Step: 655420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:23:56,411-Speed 2632.14 samples/sec   Loss 2.9342   LearningRate 0.0044   Epoch: 15   Global Step: 655430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:24:00,308-Speed 2628.91 samples/sec   Loss 2.8373   LearningRate 0.0044   Epoch: 15   Global Step: 655440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:24:04,210-Speed 2624.41 samples/sec   Loss 2.8697   LearningRate 0.0044   Epoch: 15   Global Step: 655450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:24:08,083-Speed 2644.47 samples/sec   Loss 2.8246   LearningRate 0.0044   Epoch: 15   Global Step: 655460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:11,975-Speed 2631.47 samples/sec   Loss 2.8718   LearningRate 0.0044   Epoch: 15   Global Step: 655470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:15,866-Speed 2633.17 samples/sec   Loss 2.8809   LearningRate 0.0044   Epoch: 15   Global Step: 655480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:19,759-Speed 2630.56 samples/sec   Loss 2.8698   LearningRate 0.0044   Epoch: 15   Global Step: 655490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:23,656-Speed 2628.88 samples/sec   Loss 2.8495   LearningRate 0.0044   Epoch: 15   Global Step: 655500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:27,545-Speed 2633.33 samples/sec   Loss 2.8669   LearningRate 0.0044   Epoch: 15   Global Step: 655510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:31,477-Speed 2605.14 samples/sec   Loss 2.8349   LearningRate 0.0044   Epoch: 15   Global Step: 655520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:35,375-Speed 2627.90 samples/sec   Loss 3.0340   LearningRate 0.0044   Epoch: 15   Global Step: 655530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:39,269-Speed 2630.49 samples/sec   Loss 2.8473   LearningRate 0.0044   Epoch: 15   Global Step: 655540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:43,181-Speed 2617.66 samples/sec   Loss 2.8537   LearningRate 0.0044   Epoch: 15   Global Step: 655550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:24:47,073-Speed 2632.75 samples/sec   Loss 2.9153   LearningRate 0.0044   Epoch: 15   Global Step: 655560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:24:50,965-Speed 2632.17 samples/sec   Loss 2.8413   LearningRate 0.0044   Epoch: 15   Global Step: 655570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:24:54,870-Speed 2622.94 samples/sec   Loss 2.9152   LearningRate 0.0044   Epoch: 15   Global Step: 655580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:24:58,766-Speed 2629.27 samples/sec   Loss 2.8667   LearningRate 0.0044   Epoch: 15   Global Step: 655590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:25:02,660-Speed 2630.33 samples/sec   Loss 2.8672   LearningRate 0.0044   Epoch: 15   Global Step: 655600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:25:06,551-Speed 2632.23 samples/sec   Loss 2.8466   LearningRate 0.0044   Epoch: 15   Global Step: 655610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:25:10,442-Speed 2632.51 samples/sec   Loss 2.9066   LearningRate 0.0044   Epoch: 15   Global Step: 655620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:25:14,338-Speed 2629.16 samples/sec   Loss 2.8863   LearningRate 0.0044   Epoch: 15   Global Step: 655630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:25:18,245-Speed 2621.17 samples/sec   Loss 2.8483   LearningRate 0.0044   Epoch: 15   Global Step: 655640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:25:22,118-Speed 2645.16 samples/sec   Loss 2.8937   LearningRate 0.0044   Epoch: 15   Global Step: 655650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:26,008-Speed 2632.47 samples/sec   Loss 2.9106   LearningRate 0.0044   Epoch: 15   Global Step: 655660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:29,899-Speed 2632.78 samples/sec   Loss 2.8276   LearningRate 0.0044   Epoch: 15   Global Step: 655670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:33,801-Speed 2624.32 samples/sec   Loss 2.9553   LearningRate 0.0044   Epoch: 15   Global Step: 655680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:37,696-Speed 2629.82 samples/sec   Loss 2.9056   LearningRate 0.0044   Epoch: 15   Global Step: 655690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:41,594-Speed 2627.44 samples/sec   Loss 2.9411   LearningRate 0.0044   Epoch: 15   Global Step: 655700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:45,487-Speed 2631.40 samples/sec   Loss 2.8762   LearningRate 0.0044   Epoch: 15   Global Step: 655710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:49,381-Speed 2630.46 samples/sec   Loss 2.8579   LearningRate 0.0044   Epoch: 15   Global Step: 655720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:53,282-Speed 2625.94 samples/sec   Loss 2.8278   LearningRate 0.0044   Epoch: 15   Global Step: 655730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:25:57,177-Speed 2629.87 samples/sec   Loss 2.8489   LearningRate 0.0044   Epoch: 15   Global Step: 655740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:01,070-Speed 2631.03 samples/sec   Loss 2.8059   LearningRate 0.0044   Epoch: 15   Global Step: 655750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:26:04,971-Speed 2625.29 samples/sec   Loss 2.8878   LearningRate 0.0044   Epoch: 15   Global Step: 655760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:26:08,867-Speed 2629.28 samples/sec   Loss 2.8506   LearningRate 0.0044   Epoch: 15   Global Step: 655770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:26:12,738-Speed 2645.66 samples/sec   Loss 2.8536   LearningRate 0.0044   Epoch: 15   Global Step: 655780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:16,638-Speed 2627.09 samples/sec   Loss 2.8714   LearningRate 0.0044   Epoch: 15   Global Step: 655790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:20,529-Speed 2632.07 samples/sec   Loss 2.8021   LearningRate 0.0044   Epoch: 15   Global Step: 655800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:24,442-Speed 2617.51 samples/sec   Loss 2.9801   LearningRate 0.0044   Epoch: 15   Global Step: 655810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:28,342-Speed 2626.13 samples/sec   Loss 2.8759   LearningRate 0.0044   Epoch: 15   Global Step: 655820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:32,247-Speed 2623.54 samples/sec   Loss 2.8362   LearningRate 0.0044   Epoch: 15   Global Step: 655830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:36,147-Speed 2626.74 samples/sec   Loss 2.9037   LearningRate 0.0044   Epoch: 15   Global Step: 655840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:40,039-Speed 2631.56 samples/sec   Loss 2.8818   LearningRate 0.0044   Epoch: 15   Global Step: 655850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:43,937-Speed 2626.93 samples/sec   Loss 2.8973   LearningRate 0.0044   Epoch: 15   Global Step: 655860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:47,849-Speed 2619.15 samples/sec   Loss 2.8745   LearningRate 0.0044   Epoch: 15   Global Step: 655870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:51,742-Speed 2630.98 samples/sec   Loss 2.8280   LearningRate 0.0044   Epoch: 15   Global Step: 655880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:26:55,631-Speed 2633.77 samples/sec   Loss 2.8964   LearningRate 0.0044   Epoch: 15   Global Step: 655890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:26:59,531-Speed 2626.34 samples/sec   Loss 2.8390   LearningRate 0.0044   Epoch: 15   Global Step: 655900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:03,434-Speed 2624.46 samples/sec   Loss 2.8359   LearningRate 0.0044   Epoch: 15   Global Step: 655910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:07,329-Speed 2629.54 samples/sec   Loss 2.8970   LearningRate 0.0044   Epoch: 15   Global Step: 655920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:11,223-Speed 2630.28 samples/sec   Loss 2.7679   LearningRate 0.0044   Epoch: 15   Global Step: 655930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:15,120-Speed 2628.41 samples/sec   Loss 2.8216   LearningRate 0.0044   Epoch: 15   Global Step: 655940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:19,014-Speed 2630.73 samples/sec   Loss 2.8630   LearningRate 0.0044   Epoch: 15   Global Step: 655950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:22,903-Speed 2633.47 samples/sec   Loss 2.7159   LearningRate 0.0044   Epoch: 15   Global Step: 655960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:26,803-Speed 2626.96 samples/sec   Loss 2.7884   LearningRate 0.0044   Epoch: 15   Global Step: 655970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:30,698-Speed 2629.91 samples/sec   Loss 2.7995   LearningRate 0.0044   Epoch: 15   Global Step: 655980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:34,594-Speed 2628.91 samples/sec   Loss 2.9264   LearningRate 0.0044   Epoch: 15   Global Step: 655990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-15 21:27:38,467-Speed 2644.22 samples/sec   Loss 2.8387   LearningRate 0.0044   Epoch: 15   Global Step: 656000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:42,367-Speed 2626.10 samples/sec   Loss 2.8706   LearningRate 0.0044   Epoch: 15   Global Step: 656010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:46,268-Speed 2626.64 samples/sec   Loss 2.8863   LearningRate 0.0044   Epoch: 15   Global Step: 656020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:50,165-Speed 2627.75 samples/sec   Loss 2.8335   LearningRate 0.0044   Epoch: 15   Global Step: 656030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:54,059-Speed 2630.75 samples/sec   Loss 2.8248   LearningRate 0.0044   Epoch: 15   Global Step: 656040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:27:57,953-Speed 2630.01 samples/sec   Loss 2.8517   LearningRate 0.0044   Epoch: 15   Global Step: 656050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:28:01,854-Speed 2625.69 samples/sec   Loss 2.9020   LearningRate 0.0044   Epoch: 15   Global Step: 656060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:28:05,748-Speed 2630.23 samples/sec   Loss 2.8074   LearningRate 0.0044   Epoch: 15   Global Step: 656070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-15 21:28:09,671-Speed 2611.49 samples/sec   Loss 2.8083   LearningRate 0.0044   Epoch: 15   Global Step: 656080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:13,569-Speed 2627.19 samples/sec   Loss 2.8825   LearningRate 0.0044   Epoch: 15   Global Step: 656090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:17,503-Speed 2603.86 samples/sec   Loss 2.8672   LearningRate 0.0044   Epoch: 15   Global Step: 656100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:28:21,373-Speed 2646.60 samples/sec   Loss 2.8112   LearningRate 0.0044   Epoch: 15   Global Step: 656110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:25,267-Speed 2631.55 samples/sec   Loss 2.9049   LearningRate 0.0044   Epoch: 15   Global Step: 656120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:29,166-Speed 2626.98 samples/sec   Loss 2.8470   LearningRate 0.0044   Epoch: 15   Global Step: 656130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:33,067-Speed 2625.49 samples/sec   Loss 2.9048   LearningRate 0.0044   Epoch: 15   Global Step: 656140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:36,959-Speed 2631.77 samples/sec   Loss 2.8201   LearningRate 0.0044   Epoch: 15   Global Step: 656150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:40,863-Speed 2623.68 samples/sec   Loss 2.8477   LearningRate 0.0044   Epoch: 15   Global Step: 656160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:44,768-Speed 2623.03 samples/sec   Loss 2.8122   LearningRate 0.0044   Epoch: 15   Global Step: 656170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:48,660-Speed 2631.60 samples/sec   Loss 2.8524   LearningRate 0.0044   Epoch: 15   Global Step: 656180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:52,554-Speed 2630.33 samples/sec   Loss 2.8050   LearningRate 0.0044   Epoch: 15   Global Step: 656190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:28:56,447-Speed 2631.66 samples/sec   Loss 2.8993   LearningRate 0.0044   Epoch: 15   Global Step: 656200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:29:00,319-Speed 2644.76 samples/sec   Loss 2.8399   LearningRate 0.0044   Epoch: 15   Global Step: 656210   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:04,212-Speed 2631.15 samples/sec   Loss 2.8031   LearningRate 0.0044   Epoch: 15   Global Step: 656220   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:08,110-Speed 2627.29 samples/sec   Loss 2.8149   LearningRate 0.0044   Epoch: 15   Global Step: 656230   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:12,006-Speed 2629.47 samples/sec   Loss 2.8884   LearningRate 0.0044   Epoch: 15   Global Step: 656240   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:15,907-Speed 2624.99 samples/sec   Loss 2.8526   LearningRate 0.0044   Epoch: 15   Global Step: 656250   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:19,804-Speed 2628.45 samples/sec   Loss 2.8862   LearningRate 0.0044   Epoch: 15   Global Step: 656260   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:23,729-Speed 2614.55 samples/sec   Loss 2.9222   LearningRate 0.0044   Epoch: 15   Global Step: 656270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:27,620-Speed 2632.11 samples/sec   Loss 2.8570   LearningRate 0.0044   Epoch: 15   Global Step: 656280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:31,521-Speed 2626.13 samples/sec   Loss 2.9176   LearningRate 0.0044   Epoch: 15   Global Step: 656290   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:35,416-Speed 2629.60 samples/sec   Loss 2.9104   LearningRate 0.0044   Epoch: 15   Global Step: 656300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:29:39,343-Speed 2607.78 samples/sec   Loss 2.8700   LearningRate 0.0044   Epoch: 15   Global Step: 656310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:29:43,268-Speed 2609.26 samples/sec   Loss 2.8071   LearningRate 0.0044   Epoch: 15   Global Step: 656320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:29:47,163-Speed 2630.14 samples/sec   Loss 2.9003   LearningRate 0.0044   Epoch: 15   Global Step: 656330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:29:51,065-Speed 2625.04 samples/sec   Loss 2.9014   LearningRate 0.0044   Epoch: 15   Global Step: 656340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:29:54,970-Speed 2622.97 samples/sec   Loss 2.8875   LearningRate 0.0044   Epoch: 15   Global Step: 656350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:29:58,930-Speed 2586.90 samples/sec   Loss 2.8837   LearningRate 0.0044   Epoch: 15   Global Step: 656360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:30:02,825-Speed 2629.24 samples/sec   Loss 2.9002   LearningRate 0.0044   Epoch: 15   Global Step: 656370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:30:06,720-Speed 2629.97 samples/sec   Loss 2.8573   LearningRate 0.0044   Epoch: 15   Global Step: 656380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:30:10,613-Speed 2630.90 samples/sec   Loss 2.7845   LearningRate 0.0044   Epoch: 15   Global Step: 656390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:30:14,507-Speed 2630.83 samples/sec   Loss 2.8885   LearningRate 0.0044   Epoch: 15   Global Step: 656400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:30:18,397-Speed 2632.60 samples/sec   Loss 2.9551   LearningRate 0.0044   Epoch: 15   Global Step: 656410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:22,294-Speed 2628.52 samples/sec   Loss 2.7634   LearningRate 0.0044   Epoch: 15   Global Step: 656420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:26,188-Speed 2630.21 samples/sec   Loss 2.8544   LearningRate 0.0044   Epoch: 15   Global Step: 656430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:30,084-Speed 2629.52 samples/sec   Loss 2.8198   LearningRate 0.0044   Epoch: 15   Global Step: 656440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:33,991-Speed 2621.59 samples/sec   Loss 2.8535   LearningRate 0.0044   Epoch: 15   Global Step: 656450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:37,883-Speed 2631.76 samples/sec   Loss 2.8038   LearningRate 0.0044   Epoch: 15   Global Step: 656460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:41,772-Speed 2633.23 samples/sec   Loss 2.8615   LearningRate 0.0044   Epoch: 15   Global Step: 656470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:45,690-Speed 2614.62 samples/sec   Loss 2.8625   LearningRate 0.0044   Epoch: 15   Global Step: 656480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:49,588-Speed 2628.02 samples/sec   Loss 2.8798   LearningRate 0.0044   Epoch: 15   Global Step: 656490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:53,483-Speed 2629.38 samples/sec   Loss 2.9150   LearningRate 0.0044   Epoch: 15   Global Step: 656500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:30:57,343-Speed 2653.18 samples/sec   Loss 2.8469   LearningRate 0.0044   Epoch: 15   Global Step: 656510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:01,242-Speed 2627.19 samples/sec   Loss 2.8953   LearningRate 0.0044   Epoch: 15   Global Step: 656520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:05,132-Speed 2633.37 samples/sec   Loss 2.8432   LearningRate 0.0044   Epoch: 15   Global Step: 656530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:09,028-Speed 2628.77 samples/sec   Loss 2.8219   LearningRate 0.0044   Epoch: 15   Global Step: 656540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:12,920-Speed 2631.64 samples/sec   Loss 2.8579   LearningRate 0.0044   Epoch: 15   Global Step: 656550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:16,815-Speed 2629.98 samples/sec   Loss 2.9300   LearningRate 0.0043   Epoch: 15   Global Step: 656560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:20,706-Speed 2631.73 samples/sec   Loss 2.8182   LearningRate 0.0043   Epoch: 15   Global Step: 656570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:24,597-Speed 2633.16 samples/sec   Loss 2.8280   LearningRate 0.0043   Epoch: 15   Global Step: 656580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:28,489-Speed 2630.97 samples/sec   Loss 2.8525   LearningRate 0.0043   Epoch: 15   Global Step: 656590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:32,396-Speed 2622.27 samples/sec   Loss 2.8216   LearningRate 0.0043   Epoch: 15   Global Step: 656600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:31:36,292-Speed 2628.44 samples/sec   Loss 2.8581   LearningRate 0.0043   Epoch: 15   Global Step: 656610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:31:40,203-Speed 2618.73 samples/sec   Loss 2.8831   LearningRate 0.0043   Epoch: 15   Global Step: 656620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:31:44,101-Speed 2628.17 samples/sec   Loss 2.8629   LearningRate 0.0043   Epoch: 15   Global Step: 656630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:31:47,992-Speed 2632.62 samples/sec   Loss 2.8848   LearningRate 0.0043   Epoch: 15   Global Step: 656640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:31:51,885-Speed 2630.65 samples/sec   Loss 2.8142   LearningRate 0.0043   Epoch: 15   Global Step: 656650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:31:55,780-Speed 2629.53 samples/sec   Loss 2.8321   LearningRate 0.0043   Epoch: 15   Global Step: 656660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:31:59,680-Speed 2626.57 samples/sec   Loss 2.9635   LearningRate 0.0043   Epoch: 15   Global Step: 656670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:32:03,566-Speed 2635.91 samples/sec   Loss 2.8261   LearningRate 0.0043   Epoch: 15   Global Step: 656680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:32:07,457-Speed 2631.80 samples/sec   Loss 2.8238   LearningRate 0.0043   Epoch: 15   Global Step: 656690   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:11,370-Speed 2617.19 samples/sec   Loss 2.8114   LearningRate 0.0043   Epoch: 15   Global Step: 656700   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:15,277-Speed 2622.28 samples/sec   Loss 2.8732   LearningRate 0.0043   Epoch: 15   Global Step: 656710   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:19,171-Speed 2630.25 samples/sec   Loss 2.8266   LearningRate 0.0043   Epoch: 15   Global Step: 656720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:23,071-Speed 2627.30 samples/sec   Loss 2.8172   LearningRate 0.0043   Epoch: 15   Global Step: 656730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:26,963-Speed 2631.67 samples/sec   Loss 2.7897   LearningRate 0.0043   Epoch: 15   Global Step: 656740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:30,857-Speed 2630.51 samples/sec   Loss 2.9083   LearningRate 0.0043   Epoch: 15   Global Step: 656750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:34,771-Speed 2616.35 samples/sec   Loss 2.8453   LearningRate 0.0043   Epoch: 15   Global Step: 656760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:38,680-Speed 2620.05 samples/sec   Loss 2.8216   LearningRate 0.0043   Epoch: 15   Global Step: 656770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:42,575-Speed 2630.13 samples/sec   Loss 2.8245   LearningRate 0.0043   Epoch: 15   Global Step: 656780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:32:46,472-Speed 2627.90 samples/sec   Loss 2.9087   LearningRate 0.0043   Epoch: 15   Global Step: 656790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:32:50,368-Speed 2629.27 samples/sec   Loss 2.9179   LearningRate 0.0043   Epoch: 15   Global Step: 656800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:32:54,260-Speed 2632.31 samples/sec   Loss 2.8179   LearningRate 0.0043   Epoch: 15   Global Step: 656810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:32:58,169-Speed 2620.11 samples/sec   Loss 2.8506   LearningRate 0.0043   Epoch: 15   Global Step: 656820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:02,079-Speed 2619.38 samples/sec   Loss 2.8840   LearningRate 0.0043   Epoch: 15   Global Step: 656830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:06,027-Speed 2593.72 samples/sec   Loss 2.8736   LearningRate 0.0043   Epoch: 15   Global Step: 656840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:09,932-Speed 2623.55 samples/sec   Loss 2.8620   LearningRate 0.0043   Epoch: 15   Global Step: 656850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:13,827-Speed 2629.33 samples/sec   Loss 2.8474   LearningRate 0.0043   Epoch: 15   Global Step: 656860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:17,726-Speed 2628.75 samples/sec   Loss 2.7420   LearningRate 0.0043   Epoch: 15   Global Step: 656870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:21,619-Speed 2631.68 samples/sec   Loss 2.8790   LearningRate 0.0043   Epoch: 15   Global Step: 656880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:25,513-Speed 2630.75 samples/sec   Loss 2.9342   LearningRate 0.0043   Epoch: 15   Global Step: 656890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:33:29,413-Speed 2625.69 samples/sec   Loss 2.8882   LearningRate 0.0043   Epoch: 15   Global Step: 656900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:33:33,286-Speed 2644.80 samples/sec   Loss 2.7719   LearningRate 0.0043   Epoch: 15   Global Step: 656910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:37,187-Speed 2625.43 samples/sec   Loss 2.8706   LearningRate 0.0043   Epoch: 15   Global Step: 656920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:41,235-Speed 2530.68 samples/sec   Loss 2.9443   LearningRate 0.0043   Epoch: 15   Global Step: 656930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:45,156-Speed 2612.04 samples/sec   Loss 2.8592   LearningRate 0.0043   Epoch: 15   Global Step: 656940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:49,051-Speed 2629.69 samples/sec   Loss 2.8495   LearningRate 0.0043   Epoch: 15   Global Step: 656950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:52,965-Speed 2617.03 samples/sec   Loss 2.8347   LearningRate 0.0043   Epoch: 15   Global Step: 656960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:33:56,883-Speed 2614.68 samples/sec   Loss 2.8112   LearningRate 0.0043   Epoch: 15   Global Step: 656970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:00,780-Speed 2627.88 samples/sec   Loss 2.8400   LearningRate 0.0043   Epoch: 15   Global Step: 656980   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:04,688-Speed 2621.16 samples/sec   Loss 2.7408   LearningRate 0.0043   Epoch: 15   Global Step: 656990   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:08,586-Speed 2627.58 samples/sec   Loss 2.9365   LearningRate 0.0043   Epoch: 15   Global Step: 657000   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:12,483-Speed 2628.17 samples/sec   Loss 2.8408   LearningRate 0.0043   Epoch: 15   Global Step: 657010   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:16,380-Speed 2628.44 samples/sec   Loss 2.8481   LearningRate 0.0043   Epoch: 15   Global Step: 657020   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:20,303-Speed 2611.75 samples/sec   Loss 2.8139   LearningRate 0.0043   Epoch: 15   Global Step: 657030   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:24,193-Speed 2632.64 samples/sec   Loss 2.8755   LearningRate 0.0043   Epoch: 15   Global Step: 657040   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:28,094-Speed 2626.10 samples/sec   Loss 2.8352   LearningRate 0.0043   Epoch: 15   Global Step: 657050   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:31,993-Speed 2626.76 samples/sec   Loss 2.8259   LearningRate 0.0043   Epoch: 15   Global Step: 657060   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:34:35,892-Speed 2627.04 samples/sec   Loss 2.8871   LearningRate 0.0043   Epoch: 15   Global Step: 657070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:34:39,787-Speed 2629.33 samples/sec   Loss 2.8377   LearningRate 0.0043   Epoch: 15   Global Step: 657080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:34:43,680-Speed 2630.95 samples/sec   Loss 2.9330   LearningRate 0.0043   Epoch: 15   Global Step: 657090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:34:47,577-Speed 2628.35 samples/sec   Loss 2.8664   LearningRate 0.0043   Epoch: 15   Global Step: 657100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:34:51,472-Speed 2629.91 samples/sec   Loss 2.8184   LearningRate 0.0043   Epoch: 15   Global Step: 657110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:34:55,365-Speed 2631.87 samples/sec   Loss 2.8670   LearningRate 0.0043   Epoch: 15   Global Step: 657120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:34:59,268-Speed 2624.38 samples/sec   Loss 2.8107   LearningRate 0.0043   Epoch: 15   Global Step: 657130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:35:03,159-Speed 2632.07 samples/sec   Loss 2.8435   LearningRate 0.0043   Epoch: 15   Global Step: 657140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:35:07,053-Speed 2629.78 samples/sec   Loss 2.8710   LearningRate 0.0043   Epoch: 15   Global Step: 657150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:35:10,967-Speed 2617.56 samples/sec   Loss 2.8531   LearningRate 0.0043   Epoch: 15   Global Step: 657160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:35:14,859-Speed 2631.29 samples/sec   Loss 2.8396   LearningRate 0.0043   Epoch: 15   Global Step: 657170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:18,758-Speed 2626.57 samples/sec   Loss 2.8887   LearningRate 0.0043   Epoch: 15   Global Step: 657180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:22,654-Speed 2629.80 samples/sec   Loss 2.8989   LearningRate 0.0043   Epoch: 15   Global Step: 657190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:26,633-Speed 2573.95 samples/sec   Loss 2.8892   LearningRate 0.0043   Epoch: 15   Global Step: 657200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:30,526-Speed 2631.20 samples/sec   Loss 2.8463   LearningRate 0.0043   Epoch: 15   Global Step: 657210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:34,422-Speed 2628.95 samples/sec   Loss 2.9078   LearningRate 0.0043   Epoch: 15   Global Step: 657220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:38,319-Speed 2627.78 samples/sec   Loss 2.7859   LearningRate 0.0043   Epoch: 15   Global Step: 657230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:42,215-Speed 2628.83 samples/sec   Loss 2.8495   LearningRate 0.0043   Epoch: 15   Global Step: 657240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:46,108-Speed 2631.54 samples/sec   Loss 2.8244   LearningRate 0.0043   Epoch: 15   Global Step: 657250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:50,010-Speed 2624.56 samples/sec   Loss 2.8159   LearningRate 0.0043   Epoch: 15   Global Step: 657260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:53,876-Speed 2649.66 samples/sec   Loss 2.8259   LearningRate 0.0043   Epoch: 15   Global Step: 657270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:35:57,748-Speed 2645.42 samples/sec   Loss 2.8876   LearningRate 0.0043   Epoch: 15   Global Step: 657280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:01,645-Speed 2628.47 samples/sec   Loss 2.8507   LearningRate 0.0043   Epoch: 15   Global Step: 657290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:05,621-Speed 2576.01 samples/sec   Loss 2.8356   LearningRate 0.0043   Epoch: 15   Global Step: 657300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:09,543-Speed 2611.79 samples/sec   Loss 2.8958   LearningRate 0.0043   Epoch: 15   Global Step: 657310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:13,433-Speed 2632.78 samples/sec   Loss 2.8555   LearningRate 0.0043   Epoch: 15   Global Step: 657320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:17,329-Speed 2628.67 samples/sec   Loss 2.9102   LearningRate 0.0043   Epoch: 15   Global Step: 657330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:21,230-Speed 2625.50 samples/sec   Loss 2.8350   LearningRate 0.0043   Epoch: 15   Global Step: 657340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:25,129-Speed 2628.29 samples/sec   Loss 2.9027   LearningRate 0.0043   Epoch: 15   Global Step: 657350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:29,025-Speed 2628.67 samples/sec   Loss 2.8134   LearningRate 0.0043   Epoch: 15   Global Step: 657360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:32,921-Speed 2629.16 samples/sec   Loss 2.8447   LearningRate 0.0043   Epoch: 15   Global Step: 657370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:36,815-Speed 2629.98 samples/sec   Loss 2.8522   LearningRate 0.0043   Epoch: 15   Global Step: 657380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:36:40,710-Speed 2630.17 samples/sec   Loss 2.8267   LearningRate 0.0043   Epoch: 15   Global Step: 657390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:36:44,601-Speed 2632.23 samples/sec   Loss 2.7890   LearningRate 0.0043   Epoch: 15   Global Step: 657400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:36:48,467-Speed 2649.31 samples/sec   Loss 2.7541   LearningRate 0.0043   Epoch: 15   Global Step: 657410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:52,365-Speed 2628.09 samples/sec   Loss 2.7443   LearningRate 0.0043   Epoch: 15   Global Step: 657420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:36:56,260-Speed 2629.55 samples/sec   Loss 2.8291   LearningRate 0.0043   Epoch: 15   Global Step: 657430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:00,189-Speed 2606.88 samples/sec   Loss 2.8275   LearningRate 0.0043   Epoch: 15   Global Step: 657440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:04,106-Speed 2615.18 samples/sec   Loss 2.8053   LearningRate 0.0043   Epoch: 15   Global Step: 657450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:08,025-Speed 2612.97 samples/sec   Loss 2.8292   LearningRate 0.0043   Epoch: 15   Global Step: 657460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:11,922-Speed 2628.59 samples/sec   Loss 2.8342   LearningRate 0.0043   Epoch: 15   Global Step: 657470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:15,831-Speed 2620.85 samples/sec   Loss 2.8588   LearningRate 0.0043   Epoch: 15   Global Step: 657480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:19,736-Speed 2622.95 samples/sec   Loss 2.8315   LearningRate 0.0043   Epoch: 15   Global Step: 657490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:23,628-Speed 2631.90 samples/sec   Loss 2.8057   LearningRate 0.0043   Epoch: 15   Global Step: 657500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:27,542-Speed 2616.61 samples/sec   Loss 2.7898   LearningRate 0.0043   Epoch: 15   Global Step: 657510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:37:31,434-Speed 2631.43 samples/sec   Loss 2.7973   LearningRate 0.0043   Epoch: 15   Global Step: 657520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:35,334-Speed 2626.23 samples/sec   Loss 2.8437   LearningRate 0.0043   Epoch: 15   Global Step: 657530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:39,230-Speed 2628.99 samples/sec   Loss 2.7519   LearningRate 0.0043   Epoch: 15   Global Step: 657540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:43,124-Speed 2630.34 samples/sec   Loss 2.7806   LearningRate 0.0043   Epoch: 15   Global Step: 657550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:37:47,001-Speed 2641.90 samples/sec   Loss 2.8502   LearningRate 0.0043   Epoch: 15   Global Step: 657560   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:37:50,894-Speed 2631.18 samples/sec   Loss 2.8355   LearningRate 0.0043   Epoch: 15   Global Step: 657570   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:37:54,786-Speed 2631.73 samples/sec   Loss 2.8434   LearningRate 0.0043   Epoch: 15   Global Step: 657580   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:37:58,684-Speed 2627.57 samples/sec   Loss 2.8271   LearningRate 0.0043   Epoch: 15   Global Step: 657590   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:38:02,577-Speed 2630.66 samples/sec   Loss 2.7422   LearningRate 0.0043   Epoch: 15   Global Step: 657600   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:38:06,479-Speed 2625.37 samples/sec   Loss 2.7301   LearningRate 0.0043   Epoch: 15   Global Step: 657610   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:38:10,371-Speed 2631.79 samples/sec   Loss 2.8183   LearningRate 0.0043   Epoch: 15   Global Step: 657620   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:38:14,266-Speed 2629.03 samples/sec   Loss 2.9027   LearningRate 0.0043   Epoch: 15   Global Step: 657630   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:38:18,163-Speed 2628.71 samples/sec   Loss 2.8093   LearningRate 0.0043   Epoch: 15   Global Step: 657640   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:38:22,064-Speed 2626.43 samples/sec   Loss 2.7886   LearningRate 0.0043   Epoch: 15   Global Step: 657650   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:38:25,957-Speed 2630.23 samples/sec   Loss 2.7807   LearningRate 0.0043   Epoch: 15   Global Step: 657660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:29,853-Speed 2628.94 samples/sec   Loss 2.8121   LearningRate 0.0043   Epoch: 15   Global Step: 657670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:33,758-Speed 2624.32 samples/sec   Loss 2.9068   LearningRate 0.0043   Epoch: 15   Global Step: 657680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:37,651-Speed 2631.47 samples/sec   Loss 2.8425   LearningRate 0.0043   Epoch: 15   Global Step: 657690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:41,550-Speed 2626.76 samples/sec   Loss 2.7888   LearningRate 0.0043   Epoch: 15   Global Step: 657700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:45,458-Speed 2620.90 samples/sec   Loss 2.8857   LearningRate 0.0043   Epoch: 15   Global Step: 657710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:49,353-Speed 2629.67 samples/sec   Loss 2.8158   LearningRate 0.0043   Epoch: 15   Global Step: 657720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:53,246-Speed 2630.63 samples/sec   Loss 2.8902   LearningRate 0.0043   Epoch: 15   Global Step: 657730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:38:57,114-Speed 2648.73 samples/sec   Loss 2.8878   LearningRate 0.0043   Epoch: 15   Global Step: 657740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:01,011-Speed 2627.70 samples/sec   Loss 2.8811   LearningRate 0.0043   Epoch: 15   Global Step: 657750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:04,924-Speed 2617.87 samples/sec   Loss 2.9041   LearningRate 0.0043   Epoch: 15   Global Step: 657760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:08,827-Speed 2623.85 samples/sec   Loss 2.7998   LearningRate 0.0043   Epoch: 15   Global Step: 657770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:12,720-Speed 2631.27 samples/sec   Loss 2.8385   LearningRate 0.0043   Epoch: 15   Global Step: 657780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:16,615-Speed 2629.10 samples/sec   Loss 2.7723   LearningRate 0.0043   Epoch: 15   Global Step: 657790   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:20,513-Speed 2628.45 samples/sec   Loss 2.8872   LearningRate 0.0043   Epoch: 15   Global Step: 657800   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:24,417-Speed 2623.23 samples/sec   Loss 2.8547   LearningRate 0.0043   Epoch: 15   Global Step: 657810   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:28,324-Speed 2622.10 samples/sec   Loss 2.7734   LearningRate 0.0043   Epoch: 15   Global Step: 657820   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:32,217-Speed 2630.91 samples/sec   Loss 2.8326   LearningRate 0.0043   Epoch: 15   Global Step: 657830   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:36,154-Speed 2600.93 samples/sec   Loss 2.9138   LearningRate 0.0043   Epoch: 15   Global Step: 657840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:39:40,052-Speed 2627.38 samples/sec   Loss 2.8209   LearningRate 0.0043   Epoch: 15   Global Step: 657850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:39:43,946-Speed 2630.85 samples/sec   Loss 2.8446   LearningRate 0.0043   Epoch: 15   Global Step: 657860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:39:47,852-Speed 2622.55 samples/sec   Loss 2.7647   LearningRate 0.0043   Epoch: 15   Global Step: 657870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:39:51,728-Speed 2642.48 samples/sec   Loss 2.8802   LearningRate 0.0043   Epoch: 15   Global Step: 657880   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:55,625-Speed 2628.79 samples/sec   Loss 2.8893   LearningRate 0.0043   Epoch: 15   Global Step: 657890   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:39:59,520-Speed 2629.48 samples/sec   Loss 2.8060   LearningRate 0.0043   Epoch: 15   Global Step: 657900   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:03,414-Speed 2630.09 samples/sec   Loss 2.9022   LearningRate 0.0043   Epoch: 15   Global Step: 657910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:07,319-Speed 2622.78 samples/sec   Loss 2.8514   LearningRate 0.0043   Epoch: 15   Global Step: 657920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:11,230-Speed 2621.62 samples/sec   Loss 2.8188   LearningRate 0.0043   Epoch: 15   Global Step: 657930   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:15,186-Speed 2589.00 samples/sec   Loss 2.8300   LearningRate 0.0043   Epoch: 15   Global Step: 657940   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:19,085-Speed 2626.94 samples/sec   Loss 2.8256   LearningRate 0.0043   Epoch: 15   Global Step: 657950   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:22,982-Speed 2627.61 samples/sec   Loss 2.7853   LearningRate 0.0043   Epoch: 15   Global Step: 657960   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:26,885-Speed 2624.83 samples/sec   Loss 2.8260   LearningRate 0.0043   Epoch: 15   Global Step: 657970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:40:30,785-Speed 2625.69 samples/sec   Loss 2.8654   LearningRate 0.0043   Epoch: 15   Global Step: 657980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:40:34,683-Speed 2627.66 samples/sec   Loss 2.7209   LearningRate 0.0043   Epoch: 15   Global Step: 657990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:40:38,579-Speed 2629.11 samples/sec   Loss 2.8095   LearningRate 0.0043   Epoch: 15   Global Step: 658000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:40:42,491-Speed 2618.63 samples/sec   Loss 2.7983   LearningRate 0.0043   Epoch: 15   Global Step: 658010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:40:46,382-Speed 2631.80 samples/sec   Loss 2.9015   LearningRate 0.0043   Epoch: 15   Global Step: 658020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:40:50,275-Speed 2631.15 samples/sec   Loss 2.7807   LearningRate 0.0043   Epoch: 15   Global Step: 658030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:40:54,170-Speed 2629.79 samples/sec   Loss 2.7963   LearningRate 0.0043   Epoch: 15   Global Step: 658040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:40:58,066-Speed 2628.99 samples/sec   Loss 2.8434   LearningRate 0.0043   Epoch: 15   Global Step: 658050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:02,050-Speed 2570.53 samples/sec   Loss 2.8158   LearningRate 0.0043   Epoch: 15   Global Step: 658060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:05,960-Speed 2619.61 samples/sec   Loss 2.8849   LearningRate 0.0043   Epoch: 15   Global Step: 658070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:09,878-Speed 2614.33 samples/sec   Loss 2.7851   LearningRate 0.0043   Epoch: 15   Global Step: 658080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:41:13,752-Speed 2644.07 samples/sec   Loss 2.7910   LearningRate 0.0043   Epoch: 15   Global Step: 658090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:17,658-Speed 2621.89 samples/sec   Loss 2.8605   LearningRate 0.0043   Epoch: 15   Global Step: 658100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:21,553-Speed 2630.28 samples/sec   Loss 2.7956   LearningRate 0.0043   Epoch: 15   Global Step: 658110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:25,445-Speed 2631.24 samples/sec   Loss 2.7641   LearningRate 0.0043   Epoch: 15   Global Step: 658120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:29,338-Speed 2630.98 samples/sec   Loss 2.7818   LearningRate 0.0043   Epoch: 15   Global Step: 658130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:33,234-Speed 2628.90 samples/sec   Loss 2.8941   LearningRate 0.0043   Epoch: 15   Global Step: 658140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:37,128-Speed 2629.88 samples/sec   Loss 2.8925   LearningRate 0.0043   Epoch: 15   Global Step: 658150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:41,026-Speed 2628.18 samples/sec   Loss 2.8740   LearningRate 0.0043   Epoch: 15   Global Step: 658160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:44,938-Speed 2618.24 samples/sec   Loss 2.8325   LearningRate 0.0043   Epoch: 15   Global Step: 658170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:48,831-Speed 2630.69 samples/sec   Loss 2.7452   LearningRate 0.0043   Epoch: 15   Global Step: 658180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:41:52,728-Speed 2628.09 samples/sec   Loss 2.8398   LearningRate 0.0043   Epoch: 15   Global Step: 658190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:41:56,621-Speed 2631.99 samples/sec   Loss 2.7915   LearningRate 0.0043   Epoch: 15   Global Step: 658200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:42:00,515-Speed 2630.34 samples/sec   Loss 2.8624   LearningRate 0.0043   Epoch: 15   Global Step: 658210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:04,429-Speed 2617.03 samples/sec   Loss 2.8737   LearningRate 0.0043   Epoch: 15   Global Step: 658220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:08,328-Speed 2626.35 samples/sec   Loss 2.8081   LearningRate 0.0043   Epoch: 15   Global Step: 658230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:12,224-Speed 2629.58 samples/sec   Loss 2.7942   LearningRate 0.0043   Epoch: 15   Global Step: 658240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:16,118-Speed 2630.11 samples/sec   Loss 2.7771   LearningRate 0.0043   Epoch: 15   Global Step: 658250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:20,015-Speed 2628.03 samples/sec   Loss 2.8366   LearningRate 0.0043   Epoch: 15   Global Step: 658260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:23,914-Speed 2626.72 samples/sec   Loss 2.8526   LearningRate 0.0043   Epoch: 15   Global Step: 658270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:27,874-Speed 2586.77 samples/sec   Loss 2.8158   LearningRate 0.0043   Epoch: 15   Global Step: 658280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:31,776-Speed 2624.52 samples/sec   Loss 2.8010   LearningRate 0.0043   Epoch: 15   Global Step: 658290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:35,671-Speed 2629.86 samples/sec   Loss 2.8267   LearningRate 0.0043   Epoch: 15   Global Step: 658300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:42:39,569-Speed 2627.80 samples/sec   Loss 2.7596   LearningRate 0.0043   Epoch: 15   Global Step: 658310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:42:43,463-Speed 2630.35 samples/sec   Loss 2.7920   LearningRate 0.0043   Epoch: 15   Global Step: 658320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:42:47,364-Speed 2625.59 samples/sec   Loss 2.7998   LearningRate 0.0043   Epoch: 15   Global Step: 658330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:42:51,265-Speed 2625.93 samples/sec   Loss 2.8150   LearningRate 0.0043   Epoch: 15   Global Step: 658340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:42:55,165-Speed 2626.53 samples/sec   Loss 2.8846   LearningRate 0.0043   Epoch: 15   Global Step: 658350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:42:59,075-Speed 2619.42 samples/sec   Loss 2.8342   LearningRate 0.0043   Epoch: 15   Global Step: 658360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:43:02,996-Speed 2612.42 samples/sec   Loss 2.7704   LearningRate 0.0043   Epoch: 15   Global Step: 658370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:43:06,884-Speed 2634.19 samples/sec   Loss 2.8253   LearningRate 0.0043   Epoch: 15   Global Step: 658380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:10,781-Speed 2627.85 samples/sec   Loss 2.7958   LearningRate 0.0043   Epoch: 15   Global Step: 658390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:14,684-Speed 2624.94 samples/sec   Loss 2.8080   LearningRate 0.0043   Epoch: 15   Global Step: 658400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:18,578-Speed 2630.64 samples/sec   Loss 2.8657   LearningRate 0.0043   Epoch: 15   Global Step: 658410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:22,487-Speed 2620.30 samples/sec   Loss 2.8441   LearningRate 0.0043   Epoch: 15   Global Step: 658420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:26,380-Speed 2630.94 samples/sec   Loss 2.7795   LearningRate 0.0043   Epoch: 15   Global Step: 658430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:30,282-Speed 2624.93 samples/sec   Loss 2.8100   LearningRate 0.0043   Epoch: 15   Global Step: 658440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:34,180-Speed 2627.68 samples/sec   Loss 2.8316   LearningRate 0.0043   Epoch: 15   Global Step: 658450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:38,088-Speed 2620.79 samples/sec   Loss 2.8204   LearningRate 0.0043   Epoch: 15   Global Step: 658460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:41,995-Speed 2621.41 samples/sec   Loss 2.7458   LearningRate 0.0043   Epoch: 15   Global Step: 658470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:43:45,890-Speed 2630.29 samples/sec   Loss 2.7831   LearningRate 0.0043   Epoch: 15   Global Step: 658480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:43:49,791-Speed 2625.54 samples/sec   Loss 2.9389   LearningRate 0.0043   Epoch: 15   Global Step: 658490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:43:53,691-Speed 2626.56 samples/sec   Loss 2.9019   LearningRate 0.0043   Epoch: 15   Global Step: 658500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:43:57,588-Speed 2627.93 samples/sec   Loss 2.7580   LearningRate 0.0043   Epoch: 15   Global Step: 658510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:44:01,481-Speed 2630.95 samples/sec   Loss 2.8685   LearningRate 0.0043   Epoch: 15   Global Step: 658520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:44:05,376-Speed 2629.54 samples/sec   Loss 2.7886   LearningRate 0.0043   Epoch: 15   Global Step: 658530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:44:09,248-Speed 2645.24 samples/sec   Loss 2.8020   LearningRate 0.0043   Epoch: 15   Global Step: 658540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:13,145-Speed 2628.34 samples/sec   Loss 2.8139   LearningRate 0.0043   Epoch: 15   Global Step: 658550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:17,043-Speed 2627.89 samples/sec   Loss 2.7952   LearningRate 0.0042   Epoch: 15   Global Step: 658560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:20,942-Speed 2626.67 samples/sec   Loss 2.8841   LearningRate 0.0042   Epoch: 15   Global Step: 658570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:24,839-Speed 2628.15 samples/sec   Loss 2.8585   LearningRate 0.0042   Epoch: 15   Global Step: 658580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:28,733-Speed 2630.63 samples/sec   Loss 2.7510   LearningRate 0.0042   Epoch: 15   Global Step: 658590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:32,636-Speed 2624.77 samples/sec   Loss 2.7374   LearningRate 0.0042   Epoch: 15   Global Step: 658600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:36,538-Speed 2625.20 samples/sec   Loss 2.7831   LearningRate 0.0042   Epoch: 15   Global Step: 658610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:40,434-Speed 2628.58 samples/sec   Loss 2.8904   LearningRate 0.0042   Epoch: 15   Global Step: 658620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:44,329-Speed 2629.60 samples/sec   Loss 2.8297   LearningRate 0.0042   Epoch: 15   Global Step: 658630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:48,204-Speed 2643.09 samples/sec   Loss 2.8598   LearningRate 0.0042   Epoch: 15   Global Step: 658640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:52,101-Speed 2628.74 samples/sec   Loss 2.8286   LearningRate 0.0042   Epoch: 15   Global Step: 658650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:56,013-Speed 2618.00 samples/sec   Loss 2.8257   LearningRate 0.0042   Epoch: 15   Global Step: 658660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:44:59,925-Speed 2618.46 samples/sec   Loss 2.8168   LearningRate 0.0042   Epoch: 15   Global Step: 658670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:03,824-Speed 2626.50 samples/sec   Loss 2.8588   LearningRate 0.0042   Epoch: 15   Global Step: 658680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:07,722-Speed 2627.82 samples/sec   Loss 2.7474   LearningRate 0.0042   Epoch: 15   Global Step: 658690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:11,621-Speed 2626.80 samples/sec   Loss 2.8064   LearningRate 0.0042   Epoch: 15   Global Step: 658700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:15,541-Speed 2612.73 samples/sec   Loss 2.7707   LearningRate 0.0042   Epoch: 15   Global Step: 658710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:19,454-Speed 2618.31 samples/sec   Loss 2.7683   LearningRate 0.0042   Epoch: 15   Global Step: 658720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:23,357-Speed 2623.58 samples/sec   Loss 2.8470   LearningRate 0.0042   Epoch: 15   Global Step: 658730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:27,281-Speed 2610.85 samples/sec   Loss 2.8110   LearningRate 0.0042   Epoch: 15   Global Step: 658740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:45:31,240-Speed 2586.91 samples/sec   Loss 2.8326   LearningRate 0.0042   Epoch: 15   Global Step: 658750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:35,131-Speed 2631.84 samples/sec   Loss 2.8511   LearningRate 0.0042   Epoch: 15   Global Step: 658760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:39,033-Speed 2624.75 samples/sec   Loss 2.8280   LearningRate 0.0042   Epoch: 15   Global Step: 658770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:42,933-Speed 2626.58 samples/sec   Loss 2.7858   LearningRate 0.0042   Epoch: 15   Global Step: 658780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:46,842-Speed 2620.13 samples/sec   Loss 2.8706   LearningRate 0.0042   Epoch: 15   Global Step: 658790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:50,760-Speed 2614.54 samples/sec   Loss 2.8267   LearningRate 0.0042   Epoch: 15   Global Step: 658800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:54,670-Speed 2619.44 samples/sec   Loss 2.8414   LearningRate 0.0042   Epoch: 15   Global Step: 658810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:45:58,564-Speed 2630.70 samples/sec   Loss 2.8755   LearningRate 0.0042   Epoch: 15   Global Step: 658820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:02,478-Speed 2616.27 samples/sec   Loss 2.7505   LearningRate 0.0042   Epoch: 15   Global Step: 658830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:06,379-Speed 2625.73 samples/sec   Loss 2.7840   LearningRate 0.0042   Epoch: 15   Global Step: 658840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:10,275-Speed 2629.03 samples/sec   Loss 2.7857   LearningRate 0.0042   Epoch: 15   Global Step: 658850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:46:14,179-Speed 2623.25 samples/sec   Loss 2.7842   LearningRate 0.0042   Epoch: 15   Global Step: 658860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:46:18,086-Speed 2622.06 samples/sec   Loss 2.7675   LearningRate 0.0042   Epoch: 15   Global Step: 658870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:46:21,983-Speed 2628.25 samples/sec   Loss 2.8601   LearningRate 0.0042   Epoch: 15   Global Step: 658880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:46:25,854-Speed 2646.17 samples/sec   Loss 2.8052   LearningRate 0.0042   Epoch: 15   Global Step: 658890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:29,747-Speed 2630.41 samples/sec   Loss 2.8787   LearningRate 0.0042   Epoch: 15   Global Step: 658900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:33,647-Speed 2626.49 samples/sec   Loss 2.8169   LearningRate 0.0042   Epoch: 15   Global Step: 658910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:37,539-Speed 2631.45 samples/sec   Loss 2.7702   LearningRate 0.0042   Epoch: 15   Global Step: 658920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:41,435-Speed 2629.13 samples/sec   Loss 2.7916   LearningRate 0.0042   Epoch: 15   Global Step: 658930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:45,329-Speed 2629.99 samples/sec   Loss 2.7962   LearningRate 0.0042   Epoch: 15   Global Step: 658940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:49,227-Speed 2627.76 samples/sec   Loss 2.8594   LearningRate 0.0042   Epoch: 15   Global Step: 658950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:53,121-Speed 2630.21 samples/sec   Loss 2.8376   LearningRate 0.0042   Epoch: 15   Global Step: 658960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:46:57,013-Speed 2632.38 samples/sec   Loss 2.7557   LearningRate 0.0042   Epoch: 15   Global Step: 658970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:00,915-Speed 2624.69 samples/sec   Loss 2.8343   LearningRate 0.0042   Epoch: 15   Global Step: 658980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:04,809-Speed 2630.14 samples/sec   Loss 2.8073   LearningRate 0.0042   Epoch: 15   Global Step: 658990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:47:08,702-Speed 2630.69 samples/sec   Loss 2.8420   LearningRate 0.0042   Epoch: 15   Global Step: 659000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:47:12,592-Speed 2633.63 samples/sec   Loss 2.8878   LearningRate 0.0042   Epoch: 15   Global Step: 659010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:16,492-Speed 2625.99 samples/sec   Loss 2.8596   LearningRate 0.0042   Epoch: 15   Global Step: 659020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:20,394-Speed 2625.34 samples/sec   Loss 2.7605   LearningRate 0.0042   Epoch: 15   Global Step: 659030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:24,307-Speed 2617.08 samples/sec   Loss 2.8856   LearningRate 0.0042   Epoch: 15   Global Step: 659040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:28,204-Speed 2628.84 samples/sec   Loss 2.7743   LearningRate 0.0042   Epoch: 15   Global Step: 659050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:32,095-Speed 2632.16 samples/sec   Loss 2.7744   LearningRate 0.0042   Epoch: 15   Global Step: 659060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:35,994-Speed 2626.68 samples/sec   Loss 2.7703   LearningRate 0.0042   Epoch: 15   Global Step: 659070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:39,925-Speed 2604.88 samples/sec   Loss 2.8851   LearningRate 0.0042   Epoch: 15   Global Step: 659080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:43,824-Speed 2627.30 samples/sec   Loss 2.7596   LearningRate 0.0042   Epoch: 15   Global Step: 659090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:47,715-Speed 2632.37 samples/sec   Loss 2.7963   LearningRate 0.0042   Epoch: 15   Global Step: 659100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:51,607-Speed 2631.59 samples/sec   Loss 2.8150   LearningRate 0.0042   Epoch: 15   Global Step: 659110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:47:55,497-Speed 2633.06 samples/sec   Loss 2.8617   LearningRate 0.0042   Epoch: 15   Global Step: 659120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:47:59,397-Speed 2626.90 samples/sec   Loss 2.7775   LearningRate 0.0042   Epoch: 15   Global Step: 659130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:03,302-Speed 2622.37 samples/sec   Loss 2.7777   LearningRate 0.0042   Epoch: 15   Global Step: 659140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:07,203-Speed 2625.66 samples/sec   Loss 2.8360   LearningRate 0.0042   Epoch: 15   Global Step: 659150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:11,101-Speed 2627.25 samples/sec   Loss 2.8608   LearningRate 0.0042   Epoch: 15   Global Step: 659160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:14,994-Speed 2630.68 samples/sec   Loss 2.7808   LearningRate 0.0042   Epoch: 15   Global Step: 659170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:18,890-Speed 2629.39 samples/sec   Loss 2.7196   LearningRate 0.0042   Epoch: 15   Global Step: 659180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:22,792-Speed 2624.49 samples/sec   Loss 2.7895   LearningRate 0.0042   Epoch: 15   Global Step: 659190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:26,688-Speed 2629.38 samples/sec   Loss 2.8257   LearningRate 0.0042   Epoch: 15   Global Step: 659200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:30,584-Speed 2628.93 samples/sec   Loss 2.8129   LearningRate 0.0042   Epoch: 15   Global Step: 659210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:48:34,491-Speed 2622.09 samples/sec   Loss 2.8548   LearningRate 0.0042   Epoch: 15   Global Step: 659220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:48:38,387-Speed 2628.85 samples/sec   Loss 2.8047   LearningRate 0.0042   Epoch: 15   Global Step: 659230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:48:42,279-Speed 2631.52 samples/sec   Loss 2.7809   LearningRate 0.0042   Epoch: 15   Global Step: 659240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:48:46,177-Speed 2627.38 samples/sec   Loss 2.7776   LearningRate 0.0042   Epoch: 15   Global Step: 659250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:48:50,074-Speed 2627.64 samples/sec   Loss 2.8560   LearningRate 0.0042   Epoch: 15   Global Step: 659260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:48:53,968-Speed 2630.63 samples/sec   Loss 2.7630   LearningRate 0.0042   Epoch: 15   Global Step: 659270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:48:57,854-Speed 2635.65 samples/sec   Loss 2.8048   LearningRate 0.0042   Epoch: 15   Global Step: 659280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:01,748-Speed 2630.67 samples/sec   Loss 2.7761   LearningRate 0.0042   Epoch: 15   Global Step: 659290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:05,650-Speed 2624.57 samples/sec   Loss 2.8867   LearningRate 0.0042   Epoch: 15   Global Step: 659300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:09,544-Speed 2630.81 samples/sec   Loss 2.7601   LearningRate 0.0042   Epoch: 15   Global Step: 659310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:13,438-Speed 2629.81 samples/sec   Loss 2.7522   LearningRate 0.0042   Epoch: 15   Global Step: 659320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:17,333-Speed 2630.23 samples/sec   Loss 2.8549   LearningRate 0.0042   Epoch: 15   Global Step: 659330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:21,224-Speed 2631.72 samples/sec   Loss 2.8026   LearningRate 0.0042   Epoch: 15   Global Step: 659340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:25,140-Speed 2615.69 samples/sec   Loss 2.7816   LearningRate 0.0042   Epoch: 15   Global Step: 659350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:29,034-Speed 2630.14 samples/sec   Loss 2.7660   LearningRate 0.0042   Epoch: 15   Global Step: 659360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:32,934-Speed 2626.61 samples/sec   Loss 2.8267   LearningRate 0.0042   Epoch: 15   Global Step: 659370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:36,830-Speed 2629.01 samples/sec   Loss 2.8596   LearningRate 0.0042   Epoch: 15   Global Step: 659380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:49:40,711-Speed 2639.10 samples/sec   Loss 2.8178   LearningRate 0.0042   Epoch: 15   Global Step: 659390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:44,604-Speed 2630.41 samples/sec   Loss 2.8399   LearningRate 0.0042   Epoch: 15   Global Step: 659400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:48,497-Speed 2631.82 samples/sec   Loss 2.8379   LearningRate 0.0042   Epoch: 15   Global Step: 659410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:52,411-Speed 2616.71 samples/sec   Loss 2.8180   LearningRate 0.0042   Epoch: 15   Global Step: 659420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:49:56,312-Speed 2626.37 samples/sec   Loss 2.8313   LearningRate 0.0042   Epoch: 15   Global Step: 659430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:00,203-Speed 2631.86 samples/sec   Loss 2.8373   LearningRate 0.0042   Epoch: 15   Global Step: 659440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:04,102-Speed 2626.94 samples/sec   Loss 2.7939   LearningRate 0.0042   Epoch: 15   Global Step: 659450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:07,996-Speed 2630.28 samples/sec   Loss 2.7838   LearningRate 0.0042   Epoch: 15   Global Step: 659460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:11,892-Speed 2629.15 samples/sec   Loss 2.8157   LearningRate 0.0042   Epoch: 15   Global Step: 659470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:15,786-Speed 2629.99 samples/sec   Loss 2.7922   LearningRate 0.0042   Epoch: 15   Global Step: 659480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:19,679-Speed 2630.73 samples/sec   Loss 2.7334   LearningRate 0.0042   Epoch: 15   Global Step: 659490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:50:23,550-Speed 2646.72 samples/sec   Loss 2.8174   LearningRate 0.0042   Epoch: 15   Global Step: 659500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:27,446-Speed 2628.66 samples/sec   Loss 2.7816   LearningRate 0.0042   Epoch: 15   Global Step: 659510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:31,341-Speed 2629.55 samples/sec   Loss 2.8095   LearningRate 0.0042   Epoch: 15   Global Step: 659520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:35,243-Speed 2624.97 samples/sec   Loss 2.8054   LearningRate 0.0042   Epoch: 15   Global Step: 659530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:39,291-Speed 2530.14 samples/sec   Loss 2.7906   LearningRate 0.0042   Epoch: 15   Global Step: 659540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:43,187-Speed 2629.16 samples/sec   Loss 2.7311   LearningRate 0.0042   Epoch: 15   Global Step: 659550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:47,101-Speed 2617.21 samples/sec   Loss 2.7270   LearningRate 0.0042   Epoch: 15   Global Step: 659560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:51,002-Speed 2625.77 samples/sec   Loss 2.7924   LearningRate 0.0042   Epoch: 15   Global Step: 659570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:54,895-Speed 2630.76 samples/sec   Loss 2.8071   LearningRate 0.0042   Epoch: 15   Global Step: 659580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:50:58,788-Speed 2630.33 samples/sec   Loss 2.8149   LearningRate 0.0042   Epoch: 15   Global Step: 659590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:02,686-Speed 2628.19 samples/sec   Loss 2.8046   LearningRate 0.0042   Epoch: 15   Global Step: 659600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:51:06,578-Speed 2631.88 samples/sec   Loss 2.8669   LearningRate 0.0042   Epoch: 15   Global Step: 659610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:51:10,472-Speed 2630.36 samples/sec   Loss 2.8267   LearningRate 0.0042   Epoch: 15   Global Step: 659620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:51:14,417-Speed 2596.66 samples/sec   Loss 2.7867   LearningRate 0.0042   Epoch: 15   Global Step: 659630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:51:18,372-Speed 2589.71 samples/sec   Loss 2.7628   LearningRate 0.0042   Epoch: 15   Global Step: 659640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:51:22,268-Speed 2628.78 samples/sec   Loss 2.8371   LearningRate 0.0042   Epoch: 15   Global Step: 659650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:51:26,182-Speed 2616.95 samples/sec   Loss 2.7892   LearningRate 0.0042   Epoch: 15   Global Step: 659660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:51:30,054-Speed 2645.25 samples/sec   Loss 2.8560   LearningRate 0.0042   Epoch: 15   Global Step: 659670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:33,947-Speed 2631.07 samples/sec   Loss 2.9083   LearningRate 0.0042   Epoch: 15   Global Step: 659680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:37,841-Speed 2630.11 samples/sec   Loss 2.7871   LearningRate 0.0042   Epoch: 15   Global Step: 659690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:41,785-Speed 2597.10 samples/sec   Loss 2.7889   LearningRate 0.0042   Epoch: 15   Global Step: 659700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:45,717-Speed 2604.71 samples/sec   Loss 2.7762   LearningRate 0.0042   Epoch: 15   Global Step: 659710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:49,606-Speed 2633.97 samples/sec   Loss 2.8577   LearningRate 0.0042   Epoch: 15   Global Step: 659720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:53,501-Speed 2629.60 samples/sec   Loss 2.8506   LearningRate 0.0042   Epoch: 15   Global Step: 659730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:51:57,401-Speed 2626.49 samples/sec   Loss 2.7604   LearningRate 0.0042   Epoch: 15   Global Step: 659740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:52:01,294-Speed 2630.59 samples/sec   Loss 2.8567   LearningRate 0.0042   Epoch: 15   Global Step: 659750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:52:05,194-Speed 2625.97 samples/sec   Loss 2.8650   LearningRate 0.0042   Epoch: 15   Global Step: 659760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:52:09,097-Speed 2624.08 samples/sec   Loss 2.7569   LearningRate 0.0042   Epoch: 15   Global Step: 659770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:52:12,998-Speed 2626.11 samples/sec   Loss 2.7862   LearningRate 0.0042   Epoch: 15   Global Step: 659780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:52:16,891-Speed 2630.26 samples/sec   Loss 2.7695   LearningRate 0.0042   Epoch: 15   Global Step: 659790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:52:20,769-Speed 2642.18 samples/sec   Loss 2.7824   LearningRate 0.0042   Epoch: 15   Global Step: 659800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:52:24,661-Speed 2631.50 samples/sec   Loss 2.8047   LearningRate 0.0042   Epoch: 15   Global Step: 659810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:52:28,555-Speed 2630.97 samples/sec   Loss 2.8954   LearningRate 0.0042   Epoch: 15   Global Step: 659820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:52:32,423-Speed 2647.99 samples/sec   Loss 2.8196   LearningRate 0.0042   Epoch: 15   Global Step: 659830   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:52:36,316-Speed 2630.86 samples/sec   Loss 2.7160   LearningRate 0.0042   Epoch: 15   Global Step: 659840   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:52:40,211-Speed 2629.00 samples/sec   Loss 2.8022   LearningRate 0.0042   Epoch: 15   Global Step: 659850   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:52:44,105-Speed 2630.41 samples/sec   Loss 2.8286   LearningRate 0.0042   Epoch: 15   Global Step: 659860   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:52:48,003-Speed 2627.39 samples/sec   Loss 2.8381   LearningRate 0.0042   Epoch: 15   Global Step: 659870   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:52:51,909-Speed 2622.73 samples/sec   Loss 2.9072   LearningRate 0.0042   Epoch: 15   Global Step: 659880   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:52:55,809-Speed 2625.78 samples/sec   Loss 2.8095   LearningRate 0.0042   Epoch: 15   Global Step: 659890   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:52:59,715-Speed 2622.96 samples/sec   Loss 2.8781   LearningRate 0.0042   Epoch: 15   Global Step: 659900   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:53:03,613-Speed 2627.44 samples/sec   Loss 2.8669   LearningRate 0.0042   Epoch: 15   Global Step: 659910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:53:07,511-Speed 2627.63 samples/sec   Loss 2.7855   LearningRate 0.0042   Epoch: 15   Global Step: 659920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 21:53:11,423-Speed 2617.81 samples/sec   Loss 2.8571   LearningRate 0.0042   Epoch: 15   Global Step: 659930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:53:15,318-Speed 2629.92 samples/sec   Loss 2.8593   LearningRate 0.0042   Epoch: 15   Global Step: 659940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:53:19,216-Speed 2627.81 samples/sec   Loss 2.7633   LearningRate 0.0042   Epoch: 15   Global Step: 659950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:53:23,108-Speed 2631.03 samples/sec   Loss 2.7841   LearningRate 0.0042   Epoch: 15   Global Step: 659960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:53:27,004-Speed 2628.90 samples/sec   Loss 2.7642   LearningRate 0.0042   Epoch: 15   Global Step: 659970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:53:30,900-Speed 2629.27 samples/sec   Loss 2.8615   LearningRate 0.0042   Epoch: 15   Global Step: 659980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:53:34,791-Speed 2632.00 samples/sec   Loss 2.8393   LearningRate 0.0042   Epoch: 15   Global Step: 659990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:53:38,689-Speed 2627.18 samples/sec   Loss 2.7984   LearningRate 0.0042   Epoch: 15   Global Step: 660000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:54:21,694-[lfw][660000]XNorm: 22.610302
Training: 2022-04-15 21:54:21,694-[lfw][660000]Accuracy-Flip: 0.99833+-0.00197
Training: 2022-04-15 21:54:21,695-[lfw][660000]Accuracy-Highest: 0.99833
Training: 2022-04-15 21:55:11,361-[cfp_fp][660000]XNorm: 21.929637
Training: 2022-04-15 21:55:11,362-[cfp_fp][660000]Accuracy-Flip: 0.99271+-0.00375
Training: 2022-04-15 21:55:11,362-[cfp_fp][660000]Accuracy-Highest: 0.99271
Training: 2022-04-15 21:55:53,988-[agedb_30][660000]XNorm: 22.953074
Training: 2022-04-15 21:55:53,989-[agedb_30][660000]Accuracy-Flip: 0.98217+-0.00679
Training: 2022-04-15 21:55:53,990-[agedb_30][660000]Accuracy-Highest: 0.98217
Training: 2022-04-15 21:55:57,851-Speed 73.58 samples/sec   Loss 2.8242   LearningRate 0.0042   Epoch: 15   Global Step: 660010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:01,711-Speed 2653.90 samples/sec   Loss 2.8627   LearningRate 0.0042   Epoch: 15   Global Step: 660020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:05,574-Speed 2651.29 samples/sec   Loss 2.7262   LearningRate 0.0042   Epoch: 15   Global Step: 660030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:56:09,439-Speed 2649.87 samples/sec   Loss 2.7573   LearningRate 0.0042   Epoch: 15   Global Step: 660040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:56:13,287-Speed 2661.61 samples/sec   Loss 2.8003   LearningRate 0.0042   Epoch: 15   Global Step: 660050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:17,157-Speed 2646.68 samples/sec   Loss 2.7715   LearningRate 0.0042   Epoch: 15   Global Step: 660060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:21,033-Speed 2643.09 samples/sec   Loss 2.7872   LearningRate 0.0042   Epoch: 15   Global Step: 660070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:24,920-Speed 2634.47 samples/sec   Loss 2.7719   LearningRate 0.0042   Epoch: 15   Global Step: 660080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:28,803-Speed 2639.00 samples/sec   Loss 2.8047   LearningRate 0.0042   Epoch: 15   Global Step: 660090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:32,690-Speed 2634.71 samples/sec   Loss 2.8880   LearningRate 0.0042   Epoch: 15   Global Step: 660100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:36,586-Speed 2629.15 samples/sec   Loss 2.8407   LearningRate 0.0042   Epoch: 15   Global Step: 660110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:40,481-Speed 2629.45 samples/sec   Loss 2.8355   LearningRate 0.0042   Epoch: 15   Global Step: 660120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:44,373-Speed 2631.15 samples/sec   Loss 2.7916   LearningRate 0.0042   Epoch: 15   Global Step: 660130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:48,263-Speed 2632.72 samples/sec   Loss 2.8646   LearningRate 0.0042   Epoch: 15   Global Step: 660140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:56:52,156-Speed 2631.48 samples/sec   Loss 2.8555   LearningRate 0.0042   Epoch: 15   Global Step: 660150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:56:56,051-Speed 2629.80 samples/sec   Loss 2.8441   LearningRate 0.0042   Epoch: 15   Global Step: 660160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:56:59,952-Speed 2625.97 samples/sec   Loss 2.7911   LearningRate 0.0042   Epoch: 15   Global Step: 660170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:57:03,848-Speed 2628.47 samples/sec   Loss 2.7252   LearningRate 0.0042   Epoch: 15   Global Step: 660180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:57:07,804-Speed 2589.87 samples/sec   Loss 2.8216   LearningRate 0.0042   Epoch: 15   Global Step: 660190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:57:11,681-Speed 2641.49 samples/sec   Loss 2.7677   LearningRate 0.0042   Epoch: 15   Global Step: 660200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:15,593-Speed 2618.56 samples/sec   Loss 2.7684   LearningRate 0.0042   Epoch: 15   Global Step: 660210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:19,499-Speed 2621.80 samples/sec   Loss 2.8927   LearningRate 0.0042   Epoch: 15   Global Step: 660220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:23,404-Speed 2623.22 samples/sec   Loss 2.8122   LearningRate 0.0042   Epoch: 15   Global Step: 660230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:27,310-Speed 2622.04 samples/sec   Loss 2.8703   LearningRate 0.0042   Epoch: 15   Global Step: 660240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:31,205-Speed 2630.13 samples/sec   Loss 2.8190   LearningRate 0.0042   Epoch: 15   Global Step: 660250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:35,111-Speed 2622.56 samples/sec   Loss 2.7819   LearningRate 0.0042   Epoch: 15   Global Step: 660260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:39,005-Speed 2631.02 samples/sec   Loss 2.7850   LearningRate 0.0042   Epoch: 15   Global Step: 660270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:42,905-Speed 2625.71 samples/sec   Loss 2.8147   LearningRate 0.0042   Epoch: 15   Global Step: 660280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:46,799-Speed 2630.12 samples/sec   Loss 2.7229   LearningRate 0.0042   Epoch: 15   Global Step: 660290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:50,678-Speed 2640.45 samples/sec   Loss 2.7974   LearningRate 0.0042   Epoch: 15   Global Step: 660300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:54,570-Speed 2632.07 samples/sec   Loss 2.8078   LearningRate 0.0042   Epoch: 15   Global Step: 660310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:57:58,467-Speed 2628.14 samples/sec   Loss 2.7923   LearningRate 0.0042   Epoch: 15   Global Step: 660320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:02,376-Speed 2620.04 samples/sec   Loss 2.7612   LearningRate 0.0042   Epoch: 15   Global Step: 660330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:06,274-Speed 2627.49 samples/sec   Loss 2.7256   LearningRate 0.0042   Epoch: 15   Global Step: 660340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:10,175-Speed 2625.74 samples/sec   Loss 2.8083   LearningRate 0.0042   Epoch: 15   Global Step: 660350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:14,080-Speed 2622.88 samples/sec   Loss 2.8267   LearningRate 0.0042   Epoch: 15   Global Step: 660360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:17,982-Speed 2625.34 samples/sec   Loss 2.8923   LearningRate 0.0042   Epoch: 15   Global Step: 660370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:21,894-Speed 2618.15 samples/sec   Loss 2.7285   LearningRate 0.0042   Epoch: 15   Global Step: 660380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:25,789-Speed 2629.43 samples/sec   Loss 2.7819   LearningRate 0.0042   Epoch: 15   Global Step: 660390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:29,679-Speed 2633.36 samples/sec   Loss 2.7735   LearningRate 0.0042   Epoch: 15   Global Step: 660400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:58:33,574-Speed 2629.49 samples/sec   Loss 2.8144   LearningRate 0.0042   Epoch: 15   Global Step: 660410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:58:37,467-Speed 2630.32 samples/sec   Loss 2.8261   LearningRate 0.0042   Epoch: 15   Global Step: 660420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:58:41,367-Speed 2626.56 samples/sec   Loss 2.7940   LearningRate 0.0042   Epoch: 15   Global Step: 660430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:58:45,256-Speed 2633.62 samples/sec   Loss 2.8140   LearningRate 0.0042   Epoch: 15   Global Step: 660440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:58:49,130-Speed 2643.84 samples/sec   Loss 2.7305   LearningRate 0.0042   Epoch: 15   Global Step: 660450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:53,021-Speed 2632.80 samples/sec   Loss 2.8016   LearningRate 0.0042   Epoch: 15   Global Step: 660460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:58:56,924-Speed 2624.49 samples/sec   Loss 2.7555   LearningRate 0.0042   Epoch: 15   Global Step: 660470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:00,811-Speed 2634.44 samples/sec   Loss 2.7437   LearningRate 0.0042   Epoch: 15   Global Step: 660480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:04,698-Speed 2635.12 samples/sec   Loss 2.8534   LearningRate 0.0042   Epoch: 15   Global Step: 660490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:08,590-Speed 2631.75 samples/sec   Loss 2.7306   LearningRate 0.0042   Epoch: 15   Global Step: 660500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:12,490-Speed 2625.93 samples/sec   Loss 2.7855   LearningRate 0.0042   Epoch: 15   Global Step: 660510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:16,393-Speed 2623.89 samples/sec   Loss 2.8146   LearningRate 0.0042   Epoch: 15   Global Step: 660520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:20,302-Speed 2620.40 samples/sec   Loss 2.7675   LearningRate 0.0042   Epoch: 15   Global Step: 660530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:24,201-Speed 2627.26 samples/sec   Loss 2.7976   LearningRate 0.0042   Epoch: 15   Global Step: 660540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:28,100-Speed 2627.52 samples/sec   Loss 2.7778   LearningRate 0.0042   Epoch: 15   Global Step: 660550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:59:31,997-Speed 2628.41 samples/sec   Loss 2.8112   LearningRate 0.0042   Epoch: 15   Global Step: 660560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:59:35,892-Speed 2629.17 samples/sec   Loss 2.8063   LearningRate 0.0042   Epoch: 15   Global Step: 660570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:59:39,816-Speed 2610.14 samples/sec   Loss 2.7424   LearningRate 0.0042   Epoch: 15   Global Step: 660580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:59:43,718-Speed 2625.04 samples/sec   Loss 2.7800   LearningRate 0.0041   Epoch: 15   Global Step: 660590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 21:59:47,588-Speed 2646.36 samples/sec   Loss 2.8049   LearningRate 0.0041   Epoch: 15   Global Step: 660600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:51,478-Speed 2633.42 samples/sec   Loss 2.8125   LearningRate 0.0041   Epoch: 15   Global Step: 660610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:55,380-Speed 2624.61 samples/sec   Loss 2.8253   LearningRate 0.0041   Epoch: 15   Global Step: 660620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 21:59:59,276-Speed 2629.24 samples/sec   Loss 2.8032   LearningRate 0.0041   Epoch: 15   Global Step: 660630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:00:03,170-Speed 2629.88 samples/sec   Loss 2.8146   LearningRate 0.0041   Epoch: 15   Global Step: 660640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:00:07,057-Speed 2634.97 samples/sec   Loss 2.8611   LearningRate 0.0041   Epoch: 15   Global Step: 660650   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:10,951-Speed 2630.69 samples/sec   Loss 2.7999   LearningRate 0.0041   Epoch: 15   Global Step: 660660   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:14,847-Speed 2634.22 samples/sec   Loss 2.7819   LearningRate 0.0041   Epoch: 15   Global Step: 660670   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:18,805-Speed 2587.53 samples/sec   Loss 2.7359   LearningRate 0.0041   Epoch: 15   Global Step: 660680   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:22,700-Speed 2629.40 samples/sec   Loss 2.7471   LearningRate 0.0041   Epoch: 15   Global Step: 660690   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:26,600-Speed 2626.55 samples/sec   Loss 2.7903   LearningRate 0.0041   Epoch: 15   Global Step: 660700   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:30,493-Speed 2631.00 samples/sec   Loss 2.7196   LearningRate 0.0041   Epoch: 15   Global Step: 660710   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:34,385-Speed 2631.54 samples/sec   Loss 2.8890   LearningRate 0.0041   Epoch: 15   Global Step: 660720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:38,279-Speed 2629.70 samples/sec   Loss 2.8476   LearningRate 0.0041   Epoch: 15   Global Step: 660730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:42,170-Speed 2632.23 samples/sec   Loss 2.7774   LearningRate 0.0041   Epoch: 15   Global Step: 660740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:00:46,065-Speed 2630.00 samples/sec   Loss 2.7588   LearningRate 0.0041   Epoch: 15   Global Step: 660750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:00:49,960-Speed 2630.54 samples/sec   Loss 2.7345   LearningRate 0.0041   Epoch: 15   Global Step: 660760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:00:53,857-Speed 2627.74 samples/sec   Loss 2.7544   LearningRate 0.0041   Epoch: 15   Global Step: 660770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:00:57,757-Speed 2626.69 samples/sec   Loss 2.8083   LearningRate 0.0041   Epoch: 15   Global Step: 660780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:01,647-Speed 2632.47 samples/sec   Loss 2.8426   LearningRate 0.0041   Epoch: 15   Global Step: 660790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:05,541-Speed 2630.74 samples/sec   Loss 2.8205   LearningRate 0.0041   Epoch: 15   Global Step: 660800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:09,451-Speed 2619.43 samples/sec   Loss 2.7864   LearningRate 0.0041   Epoch: 15   Global Step: 660810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:13,342-Speed 2632.45 samples/sec   Loss 2.7842   LearningRate 0.0041   Epoch: 15   Global Step: 660820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:17,238-Speed 2628.72 samples/sec   Loss 2.8901   LearningRate 0.0041   Epoch: 15   Global Step: 660830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:21,134-Speed 2629.56 samples/sec   Loss 2.7261   LearningRate 0.0041   Epoch: 15   Global Step: 660840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:25,033-Speed 2626.81 samples/sec   Loss 2.7961   LearningRate 0.0041   Epoch: 15   Global Step: 660850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:01:28,932-Speed 2627.12 samples/sec   Loss 2.8250   LearningRate 0.0041   Epoch: 15   Global Step: 660860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:01:32,822-Speed 2633.34 samples/sec   Loss 2.8041   LearningRate 0.0041   Epoch: 15   Global Step: 660870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:01:36,723-Speed 2625.42 samples/sec   Loss 2.7501   LearningRate 0.0041   Epoch: 15   Global Step: 660880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:01:40,619-Speed 2628.77 samples/sec   Loss 2.7843   LearningRate 0.0041   Epoch: 15   Global Step: 660890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:01:44,510-Speed 2632.39 samples/sec   Loss 2.7885   LearningRate 0.0041   Epoch: 15   Global Step: 660900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:01:48,389-Speed 2640.11 samples/sec   Loss 2.8679   LearningRate 0.0041   Epoch: 15   Global Step: 660910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:52,293-Speed 2624.63 samples/sec   Loss 2.7476   LearningRate 0.0041   Epoch: 15   Global Step: 660920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:01:56,198-Speed 2622.77 samples/sec   Loss 2.7572   LearningRate 0.0041   Epoch: 15   Global Step: 660930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:00,097-Speed 2626.84 samples/sec   Loss 2.7956   LearningRate 0.0041   Epoch: 15   Global Step: 660940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:03,998-Speed 2625.90 samples/sec   Loss 2.7634   LearningRate 0.0041   Epoch: 15   Global Step: 660950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:07,890-Speed 2631.39 samples/sec   Loss 2.7878   LearningRate 0.0041   Epoch: 15   Global Step: 660960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:11,794-Speed 2623.44 samples/sec   Loss 2.8046   LearningRate 0.0041   Epoch: 15   Global Step: 660970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:15,686-Speed 2632.00 samples/sec   Loss 2.8063   LearningRate 0.0041   Epoch: 15   Global Step: 660980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:19,580-Speed 2630.53 samples/sec   Loss 2.7425   LearningRate 0.0041   Epoch: 15   Global Step: 660990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:23,470-Speed 2633.03 samples/sec   Loss 2.8525   LearningRate 0.0041   Epoch: 15   Global Step: 661000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:27,381-Speed 2618.58 samples/sec   Loss 2.7796   LearningRate 0.0041   Epoch: 15   Global Step: 661010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:02:31,273-Speed 2631.43 samples/sec   Loss 2.8132   LearningRate 0.0041   Epoch: 15   Global Step: 661020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:02:35,172-Speed 2627.11 samples/sec   Loss 2.7678   LearningRate 0.0041   Epoch: 15   Global Step: 661030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:02:39,068-Speed 2629.03 samples/sec   Loss 2.7945   LearningRate 0.0041   Epoch: 15   Global Step: 661040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:02:42,958-Speed 2632.88 samples/sec   Loss 2.7825   LearningRate 0.0041   Epoch: 15   Global Step: 661050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:02:46,832-Speed 2644.05 samples/sec   Loss 2.8435   LearningRate 0.0041   Epoch: 15   Global Step: 661060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:50,728-Speed 2629.47 samples/sec   Loss 2.8678   LearningRate 0.0041   Epoch: 15   Global Step: 661070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:54,624-Speed 2629.11 samples/sec   Loss 2.7424   LearningRate 0.0041   Epoch: 15   Global Step: 661080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:02:58,519-Speed 2629.10 samples/sec   Loss 2.8640   LearningRate 0.0041   Epoch: 15   Global Step: 661090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:03:02,542-Speed 2546.62 samples/sec   Loss 2.7907   LearningRate 0.0041   Epoch: 15   Global Step: 661100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:03:06,564-Speed 2546.34 samples/sec   Loss 2.7766   LearningRate 0.0041   Epoch: 15   Global Step: 661110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:03:10,497-Speed 2603.84 samples/sec   Loss 2.8409   LearningRate 0.0041   Epoch: 15   Global Step: 661120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:03:14,395-Speed 2627.78 samples/sec   Loss 2.7574   LearningRate 0.0041   Epoch: 15   Global Step: 661130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:03:18,306-Speed 2618.54 samples/sec   Loss 2.7552   LearningRate 0.0041   Epoch: 15   Global Step: 661140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:03:22,203-Speed 2628.45 samples/sec   Loss 2.8159   LearningRate 0.0041   Epoch: 15   Global Step: 661150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:03:26,126-Speed 2612.11 samples/sec   Loss 2.7711   LearningRate 0.0041   Epoch: 15   Global Step: 661160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:30,016-Speed 2632.62 samples/sec   Loss 2.7586   LearningRate 0.0041   Epoch: 15   Global Step: 661170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:33,950-Speed 2603.32 samples/sec   Loss 2.7260   LearningRate 0.0041   Epoch: 15   Global Step: 661180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:37,853-Speed 2624.69 samples/sec   Loss 2.8376   LearningRate 0.0041   Epoch: 15   Global Step: 661190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:41,754-Speed 2625.13 samples/sec   Loss 2.7773   LearningRate 0.0041   Epoch: 15   Global Step: 661200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:45,653-Speed 2626.42 samples/sec   Loss 2.7093   LearningRate 0.0041   Epoch: 15   Global Step: 661210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:49,546-Speed 2631.39 samples/sec   Loss 2.8303   LearningRate 0.0041   Epoch: 15   Global Step: 661220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:53,440-Speed 2630.19 samples/sec   Loss 2.7846   LearningRate 0.0041   Epoch: 15   Global Step: 661230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:03:57,333-Speed 2631.19 samples/sec   Loss 2.7431   LearningRate 0.0041   Epoch: 15   Global Step: 661240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:04:01,224-Speed 2632.46 samples/sec   Loss 2.7969   LearningRate 0.0041   Epoch: 15   Global Step: 661250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:04:05,101-Speed 2642.56 samples/sec   Loss 2.7310   LearningRate 0.0041   Epoch: 15   Global Step: 661260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:04:08,994-Speed 2630.96 samples/sec   Loss 2.7799   LearningRate 0.0041   Epoch: 15   Global Step: 661270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:04:12,888-Speed 2630.25 samples/sec   Loss 2.7588   LearningRate 0.0041   Epoch: 15   Global Step: 661280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:04:16,783-Speed 2629.50 samples/sec   Loss 2.8028   LearningRate 0.0041   Epoch: 15   Global Step: 661290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:04:20,660-Speed 2642.06 samples/sec   Loss 2.7990   LearningRate 0.0041   Epoch: 15   Global Step: 661300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:24,562-Speed 2625.52 samples/sec   Loss 2.8017   LearningRate 0.0041   Epoch: 15   Global Step: 661310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:28,463-Speed 2625.59 samples/sec   Loss 2.7404   LearningRate 0.0041   Epoch: 15   Global Step: 661320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:32,359-Speed 2628.66 samples/sec   Loss 2.7264   LearningRate 0.0041   Epoch: 15   Global Step: 661330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:36,260-Speed 2626.14 samples/sec   Loss 2.8599   LearningRate 0.0041   Epoch: 15   Global Step: 661340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:40,156-Speed 2628.75 samples/sec   Loss 2.8136   LearningRate 0.0041   Epoch: 15   Global Step: 661350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:44,050-Speed 2630.13 samples/sec   Loss 2.7526   LearningRate 0.0041   Epoch: 15   Global Step: 661360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:47,948-Speed 2628.18 samples/sec   Loss 2.6830   LearningRate 0.0041   Epoch: 15   Global Step: 661370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:51,851-Speed 2623.69 samples/sec   Loss 2.7967   LearningRate 0.0041   Epoch: 15   Global Step: 661380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:55,746-Speed 2630.08 samples/sec   Loss 2.7880   LearningRate 0.0041   Epoch: 15   Global Step: 661390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:04:59,641-Speed 2629.25 samples/sec   Loss 2.8549   LearningRate 0.0041   Epoch: 15   Global Step: 661400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:05:03,538-Speed 2628.08 samples/sec   Loss 2.7000   LearningRate 0.0041   Epoch: 15   Global Step: 661410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:05:07,439-Speed 2625.67 samples/sec   Loss 2.8001   LearningRate 0.0041   Epoch: 15   Global Step: 661420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:05:11,336-Speed 2629.11 samples/sec   Loss 2.8356   LearningRate 0.0041   Epoch: 15   Global Step: 661430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:05:15,232-Speed 2629.81 samples/sec   Loss 2.7431   LearningRate 0.0041   Epoch: 15   Global Step: 661440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:05:19,130-Speed 2627.39 samples/sec   Loss 2.7570   LearningRate 0.0041   Epoch: 15   Global Step: 661450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:05:23,031-Speed 2625.47 samples/sec   Loss 2.7671   LearningRate 0.0041   Epoch: 15   Global Step: 661460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:05:26,914-Speed 2637.30 samples/sec   Loss 2.7861   LearningRate 0.0041   Epoch: 15   Global Step: 661470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:05:30,856-Speed 2598.58 samples/sec   Loss 2.8635   LearningRate 0.0041   Epoch: 15   Global Step: 661480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:05:34,820-Speed 2583.94 samples/sec   Loss 2.8363   LearningRate 0.0041   Epoch: 15   Global Step: 661490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:05:38,748-Speed 2607.43 samples/sec   Loss 2.8270   LearningRate 0.0041   Epoch: 15   Global Step: 661500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:05:42,643-Speed 2629.22 samples/sec   Loss 2.7629   LearningRate 0.0041   Epoch: 15   Global Step: 661510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:05:46,543-Speed 2626.82 samples/sec   Loss 2.7651   LearningRate 0.0041   Epoch: 15   Global Step: 661520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:05:50,421-Speed 2641.21 samples/sec   Loss 2.7616   LearningRate 0.0041   Epoch: 15   Global Step: 661530   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:05:54,408-Speed 2568.87 samples/sec   Loss 2.7600   LearningRate 0.0041   Epoch: 15   Global Step: 661540   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:05:58,334-Speed 2609.40 samples/sec   Loss 2.8154   LearningRate 0.0041   Epoch: 15   Global Step: 661550   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:02,235-Speed 2625.00 samples/sec   Loss 2.7259   LearningRate 0.0041   Epoch: 15   Global Step: 661560   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:06,128-Speed 2630.94 samples/sec   Loss 2.8561   LearningRate 0.0041   Epoch: 15   Global Step: 661570   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:10,032-Speed 2623.60 samples/sec   Loss 2.7532   LearningRate 0.0041   Epoch: 15   Global Step: 661580   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:13,939-Speed 2621.55 samples/sec   Loss 2.6868   LearningRate 0.0041   Epoch: 15   Global Step: 661590   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:17,842-Speed 2624.07 samples/sec   Loss 2.7193   LearningRate 0.0041   Epoch: 15   Global Step: 661600   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:21,744-Speed 2625.36 samples/sec   Loss 2.7793   LearningRate 0.0041   Epoch: 15   Global Step: 661610   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:25,648-Speed 2623.38 samples/sec   Loss 2.8799   LearningRate 0.0041   Epoch: 15   Global Step: 661620   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:06:29,555-Speed 2622.19 samples/sec   Loss 2.7428   LearningRate 0.0041   Epoch: 15   Global Step: 661630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:06:33,454-Speed 2626.33 samples/sec   Loss 2.7686   LearningRate 0.0041   Epoch: 15   Global Step: 661640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:06:37,368-Speed 2616.75 samples/sec   Loss 2.7979   LearningRate 0.0041   Epoch: 15   Global Step: 661650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:06:41,274-Speed 2622.29 samples/sec   Loss 2.7543   LearningRate 0.0041   Epoch: 15   Global Step: 661660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:06:45,175-Speed 2625.70 samples/sec   Loss 2.7338   LearningRate 0.0041   Epoch: 15   Global Step: 661670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:06:49,079-Speed 2623.72 samples/sec   Loss 2.8305   LearningRate 0.0041   Epoch: 15   Global Step: 661680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:06:52,991-Speed 2618.07 samples/sec   Loss 2.7842   LearningRate 0.0041   Epoch: 15   Global Step: 661690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:06:56,922-Speed 2605.18 samples/sec   Loss 2.8092   LearningRate 0.0041   Epoch: 15   Global Step: 661700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:00,819-Speed 2628.37 samples/sec   Loss 2.7777   LearningRate 0.0041   Epoch: 15   Global Step: 661710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:04,757-Speed 2601.58 samples/sec   Loss 2.8988   LearningRate 0.0041   Epoch: 15   Global Step: 661720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:08,654-Speed 2627.82 samples/sec   Loss 2.7716   LearningRate 0.0041   Epoch: 15   Global Step: 661730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:07:12,536-Speed 2638.63 samples/sec   Loss 2.8078   LearningRate 0.0041   Epoch: 15   Global Step: 661740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:16,436-Speed 2626.45 samples/sec   Loss 2.7657   LearningRate 0.0041   Epoch: 15   Global Step: 661750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:20,335-Speed 2626.94 samples/sec   Loss 2.7758   LearningRate 0.0041   Epoch: 15   Global Step: 661760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:24,236-Speed 2625.44 samples/sec   Loss 2.7675   LearningRate 0.0041   Epoch: 15   Global Step: 661770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:28,157-Speed 2612.03 samples/sec   Loss 2.7857   LearningRate 0.0041   Epoch: 15   Global Step: 661780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:32,112-Speed 2590.01 samples/sec   Loss 2.6533   LearningRate 0.0041   Epoch: 15   Global Step: 661790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:36,034-Speed 2611.35 samples/sec   Loss 2.7350   LearningRate 0.0041   Epoch: 15   Global Step: 661800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:39,940-Speed 2621.81 samples/sec   Loss 2.6998   LearningRate 0.0041   Epoch: 15   Global Step: 661810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:43,836-Speed 2629.71 samples/sec   Loss 2.7884   LearningRate 0.0041   Epoch: 15   Global Step: 661820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:47,738-Speed 2624.74 samples/sec   Loss 2.7252   LearningRate 0.0041   Epoch: 15   Global Step: 661830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:51,639-Speed 2626.05 samples/sec   Loss 2.8407   LearningRate 0.0041   Epoch: 15   Global Step: 661840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:07:55,511-Speed 2644.72 samples/sec   Loss 2.7109   LearningRate 0.0041   Epoch: 15   Global Step: 661850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:07:59,410-Speed 2627.40 samples/sec   Loss 2.7374   LearningRate 0.0041   Epoch: 15   Global Step: 661860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:03,310-Speed 2626.40 samples/sec   Loss 2.7536   LearningRate 0.0041   Epoch: 15   Global Step: 661870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:07,206-Speed 2628.25 samples/sec   Loss 2.7908   LearningRate 0.0041   Epoch: 15   Global Step: 661880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:11,112-Speed 2622.35 samples/sec   Loss 2.7730   LearningRate 0.0041   Epoch: 15   Global Step: 661890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:15,016-Speed 2623.39 samples/sec   Loss 2.7921   LearningRate 0.0041   Epoch: 15   Global Step: 661900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:18,923-Speed 2621.47 samples/sec   Loss 2.7949   LearningRate 0.0041   Epoch: 15   Global Step: 661910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:22,819-Speed 2629.34 samples/sec   Loss 2.6698   LearningRate 0.0041   Epoch: 15   Global Step: 661920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:26,722-Speed 2624.45 samples/sec   Loss 2.7669   LearningRate 0.0041   Epoch: 15   Global Step: 661930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:30,624-Speed 2625.04 samples/sec   Loss 2.7750   LearningRate 0.0041   Epoch: 15   Global Step: 661940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:08:34,530-Speed 2622.27 samples/sec   Loss 2.6970   LearningRate 0.0041   Epoch: 15   Global Step: 661950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:08:38,465-Speed 2602.39 samples/sec   Loss 2.7575   LearningRate 0.0041   Epoch: 15   Global Step: 661960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:08:42,456-Speed 2566.60 samples/sec   Loss 2.7800   LearningRate 0.0041   Epoch: 15   Global Step: 661970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:08:46,353-Speed 2627.86 samples/sec   Loss 2.7259   LearningRate 0.0041   Epoch: 15   Global Step: 661980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:08:50,254-Speed 2625.66 samples/sec   Loss 2.7734   LearningRate 0.0041   Epoch: 15   Global Step: 661990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:08:54,154-Speed 2626.36 samples/sec   Loss 2.8083   LearningRate 0.0041   Epoch: 15   Global Step: 662000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:08:58,153-Speed 2561.55 samples/sec   Loss 2.7858   LearningRate 0.0041   Epoch: 15   Global Step: 662010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:09:02,114-Speed 2585.45 samples/sec   Loss 2.6932   LearningRate 0.0041   Epoch: 15   Global Step: 662020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:09:06,016-Speed 2625.20 samples/sec   Loss 2.7248   LearningRate 0.0041   Epoch: 15   Global Step: 662030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:09:09,914-Speed 2627.82 samples/sec   Loss 2.7692   LearningRate 0.0041   Epoch: 15   Global Step: 662040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:09:13,818-Speed 2623.48 samples/sec   Loss 2.8633   LearningRate 0.0041   Epoch: 15   Global Step: 662050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:09:17,692-Speed 2643.81 samples/sec   Loss 2.8653   LearningRate 0.0041   Epoch: 15   Global Step: 662060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:21,593-Speed 2625.16 samples/sec   Loss 2.7523   LearningRate 0.0041   Epoch: 15   Global Step: 662070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:25,490-Speed 2628.50 samples/sec   Loss 2.7670   LearningRate 0.0041   Epoch: 15   Global Step: 662080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:29,387-Speed 2628.16 samples/sec   Loss 2.7821   LearningRate 0.0041   Epoch: 15   Global Step: 662090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:33,282-Speed 2630.30 samples/sec   Loss 2.8833   LearningRate 0.0041   Epoch: 15   Global Step: 662100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:37,182-Speed 2625.99 samples/sec   Loss 2.7021   LearningRate 0.0041   Epoch: 15   Global Step: 662110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:41,080-Speed 2627.41 samples/sec   Loss 2.7580   LearningRate 0.0041   Epoch: 15   Global Step: 662120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:44,977-Speed 2628.51 samples/sec   Loss 2.7996   LearningRate 0.0041   Epoch: 15   Global Step: 662130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:09:48,851-Speed 2644.21 samples/sec   Loss 2.8025   LearningRate 0.0041   Epoch: 15   Global Step: 662140   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:09:52,747-Speed 2628.18 samples/sec   Loss 2.7188   LearningRate 0.0041   Epoch: 15   Global Step: 662150   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:09:56,668-Speed 2612.86 samples/sec   Loss 2.8004   LearningRate 0.0041   Epoch: 15   Global Step: 662160   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:00,564-Speed 2628.20 samples/sec   Loss 2.7622   LearningRate 0.0041   Epoch: 15   Global Step: 662170   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:04,478-Speed 2617.33 samples/sec   Loss 2.8610   LearningRate 0.0041   Epoch: 15   Global Step: 662180   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:08,375-Speed 2627.80 samples/sec   Loss 2.7348   LearningRate 0.0041   Epoch: 15   Global Step: 662190   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:12,275-Speed 2626.61 samples/sec   Loss 2.7567   LearningRate 0.0041   Epoch: 15   Global Step: 662200   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:16,176-Speed 2625.52 samples/sec   Loss 2.7796   LearningRate 0.0041   Epoch: 15   Global Step: 662210   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:20,074-Speed 2627.62 samples/sec   Loss 2.8074   LearningRate 0.0041   Epoch: 15   Global Step: 662220   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:23,985-Speed 2619.15 samples/sec   Loss 2.7526   LearningRate 0.0041   Epoch: 15   Global Step: 662230   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:27,897-Speed 2618.04 samples/sec   Loss 2.7990   LearningRate 0.0041   Epoch: 15   Global Step: 662240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:10:31,793-Speed 2629.29 samples/sec   Loss 2.7728   LearningRate 0.0041   Epoch: 15   Global Step: 662250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:10:35,671-Speed 2641.44 samples/sec   Loss 2.7520   LearningRate 0.0041   Epoch: 15   Global Step: 662260   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:39,567-Speed 2628.87 samples/sec   Loss 2.7866   LearningRate 0.0041   Epoch: 15   Global Step: 662270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:43,479-Speed 2618.14 samples/sec   Loss 2.7495   LearningRate 0.0041   Epoch: 15   Global Step: 662280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:47,384-Speed 2623.00 samples/sec   Loss 2.7297   LearningRate 0.0041   Epoch: 15   Global Step: 662290   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:51,279-Speed 2629.14 samples/sec   Loss 2.7296   LearningRate 0.0041   Epoch: 15   Global Step: 662300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:55,198-Speed 2613.98 samples/sec   Loss 2.7629   LearningRate 0.0041   Epoch: 15   Global Step: 662310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:10:59,095-Speed 2627.98 samples/sec   Loss 2.7213   LearningRate 0.0041   Epoch: 15   Global Step: 662320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:02,999-Speed 2623.61 samples/sec   Loss 2.7953   LearningRate 0.0041   Epoch: 15   Global Step: 662330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:06,915-Speed 2615.18 samples/sec   Loss 2.7907   LearningRate 0.0041   Epoch: 15   Global Step: 662340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:10,821-Speed 2622.83 samples/sec   Loss 2.7529   LearningRate 0.0041   Epoch: 15   Global Step: 662350   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:14,720-Speed 2626.89 samples/sec   Loss 2.7614   LearningRate 0.0041   Epoch: 15   Global Step: 662360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:11:18,633-Speed 2616.89 samples/sec   Loss 2.6738   LearningRate 0.0041   Epoch: 15   Global Step: 662370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:11:22,535-Speed 2625.79 samples/sec   Loss 2.7420   LearningRate 0.0041   Epoch: 15   Global Step: 662380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:11:26,490-Speed 2589.53 samples/sec   Loss 2.8125   LearningRate 0.0041   Epoch: 15   Global Step: 662390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:11:30,537-Speed 2530.85 samples/sec   Loss 2.7218   LearningRate 0.0041   Epoch: 15   Global Step: 662400   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:34,623-Speed 2506.87 samples/sec   Loss 2.7898   LearningRate 0.0041   Epoch: 15   Global Step: 662410   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:38,598-Speed 2576.75 samples/sec   Loss 2.7435   LearningRate 0.0041   Epoch: 15   Global Step: 662420   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:42,497-Speed 2627.43 samples/sec   Loss 2.8081   LearningRate 0.0041   Epoch: 15   Global Step: 662430   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:46,401-Speed 2622.88 samples/sec   Loss 2.7562   LearningRate 0.0041   Epoch: 15   Global Step: 662440   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:50,303-Speed 2625.20 samples/sec   Loss 2.7490   LearningRate 0.0041   Epoch: 15   Global Step: 662450   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:54,215-Speed 2618.26 samples/sec   Loss 2.6834   LearningRate 0.0041   Epoch: 15   Global Step: 662460   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:11:58,118-Speed 2624.45 samples/sec   Loss 2.8072   LearningRate 0.0041   Epoch: 15   Global Step: 662470   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:12:02,025-Speed 2621.34 samples/sec   Loss 2.7610   LearningRate 0.0041   Epoch: 15   Global Step: 662480   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:12:05,930-Speed 2622.44 samples/sec   Loss 2.8267   LearningRate 0.0041   Epoch: 15   Global Step: 662490   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:12:09,830-Speed 2626.33 samples/sec   Loss 2.7290   LearningRate 0.0041   Epoch: 15   Global Step: 662500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:13,733-Speed 2624.60 samples/sec   Loss 2.7619   LearningRate 0.0041   Epoch: 15   Global Step: 662510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:17,633-Speed 2626.36 samples/sec   Loss 2.7206   LearningRate 0.0041   Epoch: 15   Global Step: 662520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:21,534-Speed 2625.68 samples/sec   Loss 2.7953   LearningRate 0.0041   Epoch: 15   Global Step: 662530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:25,437-Speed 2624.02 samples/sec   Loss 2.8239   LearningRate 0.0041   Epoch: 15   Global Step: 662540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:29,346-Speed 2620.52 samples/sec   Loss 2.8304   LearningRate 0.0041   Epoch: 15   Global Step: 662550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:33,258-Speed 2618.10 samples/sec   Loss 2.8199   LearningRate 0.0041   Epoch: 15   Global Step: 662560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:37,159-Speed 2625.76 samples/sec   Loss 2.7506   LearningRate 0.0041   Epoch: 15   Global Step: 662570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:41,058-Speed 2626.34 samples/sec   Loss 2.7051   LearningRate 0.0041   Epoch: 15   Global Step: 662580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:44,960-Speed 2625.06 samples/sec   Loss 2.8154   LearningRate 0.0041   Epoch: 15   Global Step: 662590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:12:48,866-Speed 2622.63 samples/sec   Loss 2.7751   LearningRate 0.0041   Epoch: 15   Global Step: 662600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:12:52,766-Speed 2626.40 samples/sec   Loss 2.7487   LearningRate 0.0041   Epoch: 15   Global Step: 662610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:12:56,676-Speed 2620.02 samples/sec   Loss 2.7575   LearningRate 0.0041   Epoch: 15   Global Step: 662620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:13:00,605-Speed 2606.61 samples/sec   Loss 2.7072   LearningRate 0.0041   Epoch: 15   Global Step: 662630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:13:04,495-Speed 2633.33 samples/sec   Loss 2.7510   LearningRate 0.0040   Epoch: 15   Global Step: 662640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:13:08,406-Speed 2618.37 samples/sec   Loss 2.7457   LearningRate 0.0040   Epoch: 15   Global Step: 662650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:13:12,314-Speed 2621.00 samples/sec   Loss 2.8180   LearningRate 0.0040   Epoch: 15   Global Step: 662660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:13:16,211-Speed 2628.67 samples/sec   Loss 2.7219   LearningRate 0.0040   Epoch: 15   Global Step: 662670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:13:20,112-Speed 2625.36 samples/sec   Loss 2.7947   LearningRate 0.0040   Epoch: 15   Global Step: 662680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:13:24,049-Speed 2602.10 samples/sec   Loss 2.8024   LearningRate 0.0040   Epoch: 15   Global Step: 662690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:13:27,924-Speed 2643.63 samples/sec   Loss 2.7726   LearningRate 0.0040   Epoch: 15   Global Step: 662700   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:31,820-Speed 2628.78 samples/sec   Loss 2.8200   LearningRate 0.0040   Epoch: 15   Global Step: 662710   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:35,724-Speed 2623.66 samples/sec   Loss 2.7749   LearningRate 0.0040   Epoch: 15   Global Step: 662720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:39,634-Speed 2619.63 samples/sec   Loss 2.7210   LearningRate 0.0040   Epoch: 15   Global Step: 662730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:43,534-Speed 2626.57 samples/sec   Loss 2.7184   LearningRate 0.0040   Epoch: 15   Global Step: 662740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:47,435-Speed 2625.54 samples/sec   Loss 2.7851   LearningRate 0.0040   Epoch: 15   Global Step: 662750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:51,340-Speed 2623.08 samples/sec   Loss 2.7620   LearningRate 0.0040   Epoch: 15   Global Step: 662760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:55,238-Speed 2628.08 samples/sec   Loss 2.7725   LearningRate 0.0040   Epoch: 15   Global Step: 662770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:13:59,140-Speed 2624.82 samples/sec   Loss 2.7251   LearningRate 0.0040   Epoch: 15   Global Step: 662780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:14:03,048-Speed 2621.55 samples/sec   Loss 2.8700   LearningRate 0.0040   Epoch: 15   Global Step: 662790   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:14:06,954-Speed 2621.65 samples/sec   Loss 2.8275   LearningRate 0.0040   Epoch: 15   Global Step: 662800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:10,883-Speed 2606.97 samples/sec   Loss 2.7941   LearningRate 0.0040   Epoch: 15   Global Step: 662810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:14,791-Speed 2621.63 samples/sec   Loss 2.7729   LearningRate 0.0040   Epoch: 15   Global Step: 662820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:18,690-Speed 2626.80 samples/sec   Loss 2.7080   LearningRate 0.0040   Epoch: 15   Global Step: 662830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:22,587-Speed 2628.02 samples/sec   Loss 2.7514   LearningRate 0.0040   Epoch: 15   Global Step: 662840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:26,486-Speed 2627.78 samples/sec   Loss 2.7047   LearningRate 0.0040   Epoch: 15   Global Step: 662850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:30,386-Speed 2625.93 samples/sec   Loss 2.7221   LearningRate 0.0040   Epoch: 15   Global Step: 662860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:34,288-Speed 2624.89 samples/sec   Loss 2.7206   LearningRate 0.0040   Epoch: 15   Global Step: 662870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:38,197-Speed 2620.06 samples/sec   Loss 2.7895   LearningRate 0.0040   Epoch: 15   Global Step: 662880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:42,110-Speed 2617.41 samples/sec   Loss 2.8063   LearningRate 0.0040   Epoch: 15   Global Step: 662890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:45,988-Speed 2641.69 samples/sec   Loss 2.7607   LearningRate 0.0040   Epoch: 15   Global Step: 662900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:49,884-Speed 2629.21 samples/sec   Loss 2.7963   LearningRate 0.0040   Epoch: 15   Global Step: 662910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:53,786-Speed 2624.97 samples/sec   Loss 2.8146   LearningRate 0.0040   Epoch: 15   Global Step: 662920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:14:57,690-Speed 2623.30 samples/sec   Loss 2.7721   LearningRate 0.0040   Epoch: 15   Global Step: 662930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:01,592-Speed 2625.30 samples/sec   Loss 2.7416   LearningRate 0.0040   Epoch: 15   Global Step: 662940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:05,490-Speed 2627.81 samples/sec   Loss 2.7654   LearningRate 0.0040   Epoch: 15   Global Step: 662950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:09,398-Speed 2620.76 samples/sec   Loss 2.8446   LearningRate 0.0040   Epoch: 15   Global Step: 662960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:13,355-Speed 2588.90 samples/sec   Loss 2.7642   LearningRate 0.0040   Epoch: 15   Global Step: 662970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:17,258-Speed 2623.94 samples/sec   Loss 2.7423   LearningRate 0.0040   Epoch: 15   Global Step: 662980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:21,164-Speed 2622.19 samples/sec   Loss 2.7044   LearningRate 0.0040   Epoch: 15   Global Step: 662990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:25,069-Speed 2623.62 samples/sec   Loss 2.7215   LearningRate 0.0040   Epoch: 15   Global Step: 663000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:15:29,022-Speed 2591.47 samples/sec   Loss 2.8201   LearningRate 0.0040   Epoch: 15   Global Step: 663010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:15:32,939-Speed 2614.48 samples/sec   Loss 2.7600   LearningRate 0.0040   Epoch: 15   Global Step: 663020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:36,839-Speed 2626.77 samples/sec   Loss 2.7510   LearningRate 0.0040   Epoch: 15   Global Step: 663030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:40,737-Speed 2627.94 samples/sec   Loss 2.7859   LearningRate 0.0040   Epoch: 15   Global Step: 663040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:44,640-Speed 2624.14 samples/sec   Loss 2.7491   LearningRate 0.0040   Epoch: 15   Global Step: 663050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:48,541-Speed 2625.40 samples/sec   Loss 2.7909   LearningRate 0.0040   Epoch: 15   Global Step: 663060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:52,437-Speed 2630.40 samples/sec   Loss 2.7474   LearningRate 0.0040   Epoch: 15   Global Step: 663070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:15:56,363-Speed 2609.63 samples/sec   Loss 2.7945   LearningRate 0.0040   Epoch: 15   Global Step: 663080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:16:00,262-Speed 2626.50 samples/sec   Loss 2.7459   LearningRate 0.0040   Epoch: 15   Global Step: 663090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:16:04,162-Speed 2626.55 samples/sec   Loss 2.7591   LearningRate 0.0040   Epoch: 15   Global Step: 663100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:16:08,063-Speed 2625.61 samples/sec   Loss 2.7576   LearningRate 0.0040   Epoch: 15   Global Step: 663110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:16:11,963-Speed 2626.75 samples/sec   Loss 2.7555   LearningRate 0.0040   Epoch: 15   Global Step: 663120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:15,862-Speed 2626.19 samples/sec   Loss 2.7639   LearningRate 0.0040   Epoch: 15   Global Step: 663130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:19,762-Speed 2626.87 samples/sec   Loss 2.7168   LearningRate 0.0040   Epoch: 15   Global Step: 663140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:23,664-Speed 2625.25 samples/sec   Loss 2.8225   LearningRate 0.0040   Epoch: 15   Global Step: 663150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:27,571-Speed 2621.69 samples/sec   Loss 2.7106   LearningRate 0.0040   Epoch: 15   Global Step: 663160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:31,470-Speed 2627.04 samples/sec   Loss 2.8511   LearningRate 0.0040   Epoch: 15   Global Step: 663170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:35,380-Speed 2619.31 samples/sec   Loss 2.7792   LearningRate 0.0040   Epoch: 15   Global Step: 663180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:39,279-Speed 2626.72 samples/sec   Loss 2.8243   LearningRate 0.0040   Epoch: 15   Global Step: 663190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:43,188-Speed 2620.33 samples/sec   Loss 2.7432   LearningRate 0.0040   Epoch: 15   Global Step: 663200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:47,105-Speed 2615.58 samples/sec   Loss 2.6979   LearningRate 0.0040   Epoch: 15   Global Step: 663210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:16:50,992-Speed 2634.44 samples/sec   Loss 2.7326   LearningRate 0.0040   Epoch: 15   Global Step: 663220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:16:54,891-Speed 2627.84 samples/sec   Loss 2.7592   LearningRate 0.0040   Epoch: 15   Global Step: 663230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:16:58,801-Speed 2619.54 samples/sec   Loss 2.7010   LearningRate 0.0040   Epoch: 15   Global Step: 663240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:17:02,701-Speed 2626.22 samples/sec   Loss 2.7218   LearningRate 0.0040   Epoch: 15   Global Step: 663250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:17:06,600-Speed 2626.26 samples/sec   Loss 2.7836   LearningRate 0.0040   Epoch: 15   Global Step: 663260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:17:10,482-Speed 2638.83 samples/sec   Loss 2.7517   LearningRate 0.0040   Epoch: 15   Global Step: 663270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:14,394-Speed 2618.19 samples/sec   Loss 2.7773   LearningRate 0.0040   Epoch: 15   Global Step: 663280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:18,298-Speed 2623.87 samples/sec   Loss 2.7423   LearningRate 0.0040   Epoch: 15   Global Step: 663290   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:22,200-Speed 2624.62 samples/sec   Loss 2.7631   LearningRate 0.0040   Epoch: 15   Global Step: 663300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:26,105-Speed 2623.32 samples/sec   Loss 2.7940   LearningRate 0.0040   Epoch: 15   Global Step: 663310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:30,009-Speed 2623.72 samples/sec   Loss 2.7939   LearningRate 0.0040   Epoch: 15   Global Step: 663320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:33,919-Speed 2619.77 samples/sec   Loss 2.7481   LearningRate 0.0040   Epoch: 15   Global Step: 663330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:37,857-Speed 2600.86 samples/sec   Loss 2.7033   LearningRate 0.0040   Epoch: 15   Global Step: 663340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:41,756-Speed 2626.81 samples/sec   Loss 2.7454   LearningRate 0.0040   Epoch: 15   Global Step: 663350   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:45,656-Speed 2626.19 samples/sec   Loss 2.7765   LearningRate 0.0040   Epoch: 15   Global Step: 663360   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:17:49,550-Speed 2630.45 samples/sec   Loss 2.7182   LearningRate 0.0040   Epoch: 15   Global Step: 663370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:17:53,465-Speed 2616.31 samples/sec   Loss 2.6769   LearningRate 0.0040   Epoch: 15   Global Step: 663380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:17:57,361-Speed 2629.45 samples/sec   Loss 2.7329   LearningRate 0.0040   Epoch: 15   Global Step: 663390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:01,281-Speed 2612.92 samples/sec   Loss 2.8658   LearningRate 0.0040   Epoch: 15   Global Step: 663400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:05,268-Speed 2569.07 samples/sec   Loss 2.7874   LearningRate 0.0040   Epoch: 15   Global Step: 663410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:09,182-Speed 2617.22 samples/sec   Loss 2.6753   LearningRate 0.0040   Epoch: 15   Global Step: 663420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:13,090-Speed 2620.58 samples/sec   Loss 2.7041   LearningRate 0.0040   Epoch: 15   Global Step: 663430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:17,018-Speed 2607.84 samples/sec   Loss 2.8118   LearningRate 0.0040   Epoch: 15   Global Step: 663440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:20,932-Speed 2617.22 samples/sec   Loss 2.8227   LearningRate 0.0040   Epoch: 15   Global Step: 663450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:24,832-Speed 2626.56 samples/sec   Loss 2.6529   LearningRate 0.0040   Epoch: 15   Global Step: 663460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:28,732-Speed 2626.46 samples/sec   Loss 2.6973   LearningRate 0.0040   Epoch: 15   Global Step: 663470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:18:32,614-Speed 2638.57 samples/sec   Loss 2.7081   LearningRate 0.0040   Epoch: 15   Global Step: 663480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:36,519-Speed 2622.70 samples/sec   Loss 2.8130   LearningRate 0.0040   Epoch: 15   Global Step: 663490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:40,423-Speed 2623.68 samples/sec   Loss 2.7786   LearningRate 0.0040   Epoch: 15   Global Step: 663500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:44,321-Speed 2627.71 samples/sec   Loss 2.7452   LearningRate 0.0040   Epoch: 15   Global Step: 663510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:48,239-Speed 2614.43 samples/sec   Loss 2.7724   LearningRate 0.0040   Epoch: 15   Global Step: 663520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:52,142-Speed 2624.00 samples/sec   Loss 2.6944   LearningRate 0.0040   Epoch: 15   Global Step: 663530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:56,056-Speed 2617.26 samples/sec   Loss 2.8162   LearningRate 0.0040   Epoch: 15   Global Step: 663540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:18:59,957-Speed 2625.59 samples/sec   Loss 2.7135   LearningRate 0.0040   Epoch: 15   Global Step: 663550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:19:03,859-Speed 2625.32 samples/sec   Loss 2.7933   LearningRate 0.0040   Epoch: 15   Global Step: 663560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:19:07,756-Speed 2627.66 samples/sec   Loss 2.8104   LearningRate 0.0040   Epoch: 15   Global Step: 663570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:19:11,658-Speed 2624.86 samples/sec   Loss 2.7766   LearningRate 0.0040   Epoch: 15   Global Step: 663580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:15,565-Speed 2621.94 samples/sec   Loss 2.8742   LearningRate 0.0040   Epoch: 15   Global Step: 663590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:19,464-Speed 2626.75 samples/sec   Loss 2.8507   LearningRate 0.0040   Epoch: 15   Global Step: 663600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:23,371-Speed 2622.07 samples/sec   Loss 2.7701   LearningRate 0.0040   Epoch: 15   Global Step: 663610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:27,271-Speed 2626.18 samples/sec   Loss 2.7158   LearningRate 0.0040   Epoch: 15   Global Step: 663620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:31,170-Speed 2626.89 samples/sec   Loss 2.8370   LearningRate 0.0040   Epoch: 15   Global Step: 663630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:35,069-Speed 2626.84 samples/sec   Loss 2.7189   LearningRate 0.0040   Epoch: 15   Global Step: 663640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:38,990-Speed 2612.16 samples/sec   Loss 2.7347   LearningRate 0.0040   Epoch: 15   Global Step: 663650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:19:42,899-Speed 2620.23 samples/sec   Loss 2.7720   LearningRate 0.0040   Epoch: 15   Global Step: 663660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:20:03,803-Speed 489.90 samples/sec   Loss 2.7837   LearningRate 0.0040   Epoch: 16   Global Step: 663670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:20:07,673-Speed 2647.39 samples/sec   Loss 2.8160   LearningRate 0.0040   Epoch: 16   Global Step: 663680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:11,570-Speed 2628.13 samples/sec   Loss 2.7989   LearningRate 0.0040   Epoch: 16   Global Step: 663690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:15,626-Speed 2525.60 samples/sec   Loss 2.7963   LearningRate 0.0040   Epoch: 16   Global Step: 663700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:19,560-Speed 2603.43 samples/sec   Loss 2.7702   LearningRate 0.0040   Epoch: 16   Global Step: 663710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:23,464-Speed 2624.47 samples/sec   Loss 2.7471   LearningRate 0.0040   Epoch: 16   Global Step: 663720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:27,365-Speed 2625.03 samples/sec   Loss 2.6767   LearningRate 0.0040   Epoch: 16   Global Step: 663730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:31,269-Speed 2624.01 samples/sec   Loss 2.7962   LearningRate 0.0040   Epoch: 16   Global Step: 663740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:35,361-Speed 2502.92 samples/sec   Loss 2.7654   LearningRate 0.0040   Epoch: 16   Global Step: 663750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:39,266-Speed 2622.77 samples/sec   Loss 2.7618   LearningRate 0.0040   Epoch: 16   Global Step: 663760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:43,185-Speed 2613.85 samples/sec   Loss 2.7605   LearningRate 0.0040   Epoch: 16   Global Step: 663770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:20:47,106-Speed 2612.61 samples/sec   Loss 2.8181   LearningRate 0.0040   Epoch: 16   Global Step: 663780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:20:51,076-Speed 2579.55 samples/sec   Loss 2.7611   LearningRate 0.0040   Epoch: 16   Global Step: 663790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:20:55,079-Speed 2559.99 samples/sec   Loss 2.6931   LearningRate 0.0040   Epoch: 16   Global Step: 663800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:20:58,994-Speed 2616.29 samples/sec   Loss 2.7148   LearningRate 0.0040   Epoch: 16   Global Step: 663810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:21:02,911-Speed 2614.63 samples/sec   Loss 2.7250   LearningRate 0.0040   Epoch: 16   Global Step: 663820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:21:06,826-Speed 2616.75 samples/sec   Loss 2.7894   LearningRate 0.0040   Epoch: 16   Global Step: 663830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:10,740-Speed 2616.46 samples/sec   Loss 2.6982   LearningRate 0.0040   Epoch: 16   Global Step: 663840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:14,658-Speed 2614.16 samples/sec   Loss 2.6932   LearningRate 0.0040   Epoch: 16   Global Step: 663850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:18,586-Speed 2607.68 samples/sec   Loss 2.7129   LearningRate 0.0040   Epoch: 16   Global Step: 663860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:22,581-Speed 2564.08 samples/sec   Loss 2.7592   LearningRate 0.0040   Epoch: 16   Global Step: 663870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:26,499-Speed 2613.74 samples/sec   Loss 2.6922   LearningRate 0.0040   Epoch: 16   Global Step: 663880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:30,416-Speed 2615.22 samples/sec   Loss 2.7381   LearningRate 0.0040   Epoch: 16   Global Step: 663890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:34,370-Speed 2590.87 samples/sec   Loss 2.7143   LearningRate 0.0040   Epoch: 16   Global Step: 663900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:38,290-Speed 2612.94 samples/sec   Loss 2.7409   LearningRate 0.0040   Epoch: 16   Global Step: 663910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:42,208-Speed 2613.63 samples/sec   Loss 2.7150   LearningRate 0.0040   Epoch: 16   Global Step: 663920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:46,141-Speed 2604.43 samples/sec   Loss 2.7907   LearningRate 0.0040   Epoch: 16   Global Step: 663930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:21:50,050-Speed 2620.25 samples/sec   Loss 2.7061   LearningRate 0.0040   Epoch: 16   Global Step: 663940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:21:53,951-Speed 2633.59 samples/sec   Loss 2.7030   LearningRate 0.0040   Epoch: 16   Global Step: 663950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:21:57,877-Speed 2608.31 samples/sec   Loss 2.7002   LearningRate 0.0040   Epoch: 16   Global Step: 663960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:01,796-Speed 2614.16 samples/sec   Loss 2.7587   LearningRate 0.0040   Epoch: 16   Global Step: 663970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:05,713-Speed 2614.76 samples/sec   Loss 2.8049   LearningRate 0.0040   Epoch: 16   Global Step: 663980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:09,651-Speed 2601.13 samples/sec   Loss 2.7130   LearningRate 0.0040   Epoch: 16   Global Step: 663990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:13,567-Speed 2615.60 samples/sec   Loss 2.7296   LearningRate 0.0040   Epoch: 16   Global Step: 664000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:17,638-Speed 2516.08 samples/sec   Loss 2.7677   LearningRate 0.0040   Epoch: 16   Global Step: 664010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:21,590-Speed 2592.10 samples/sec   Loss 2.7455   LearningRate 0.0040   Epoch: 16   Global Step: 664020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:25,504-Speed 2616.77 samples/sec   Loss 2.7720   LearningRate 0.0040   Epoch: 16   Global Step: 664030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:29,415-Speed 2619.05 samples/sec   Loss 2.6975   LearningRate 0.0040   Epoch: 16   Global Step: 664040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:33,338-Speed 2610.87 samples/sec   Loss 2.7108   LearningRate 0.0040   Epoch: 16   Global Step: 664050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:22:37,406-Speed 2518.00 samples/sec   Loss 2.7401   LearningRate 0.0040   Epoch: 16   Global Step: 664060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:22:41,461-Speed 2525.41 samples/sec   Loss 2.7070   LearningRate 0.0040   Epoch: 16   Global Step: 664070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:45,547-Speed 2507.44 samples/sec   Loss 2.7448   LearningRate 0.0040   Epoch: 16   Global Step: 664080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:49,448-Speed 2625.27 samples/sec   Loss 2.7490   LearningRate 0.0040   Epoch: 16   Global Step: 664090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:53,408-Speed 2586.72 samples/sec   Loss 2.7920   LearningRate 0.0040   Epoch: 16   Global Step: 664100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:22:57,390-Speed 2572.27 samples/sec   Loss 2.6853   LearningRate 0.0040   Epoch: 16   Global Step: 664110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:01,308-Speed 2614.69 samples/sec   Loss 2.6761   LearningRate 0.0040   Epoch: 16   Global Step: 664120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:05,218-Speed 2619.55 samples/sec   Loss 2.7630   LearningRate 0.0040   Epoch: 16   Global Step: 664130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:09,125-Speed 2621.55 samples/sec   Loss 2.7154   LearningRate 0.0040   Epoch: 16   Global Step: 664140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:13,037-Speed 2617.80 samples/sec   Loss 2.7709   LearningRate 0.0040   Epoch: 16   Global Step: 664150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:16,947-Speed 2619.63 samples/sec   Loss 2.6684   LearningRate 0.0040   Epoch: 16   Global Step: 664160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:20,851-Speed 2623.99 samples/sec   Loss 2.7803   LearningRate 0.0040   Epoch: 16   Global Step: 664170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:23:24,730-Speed 2641.04 samples/sec   Loss 2.7328   LearningRate 0.0040   Epoch: 16   Global Step: 664180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:28,642-Speed 2617.87 samples/sec   Loss 2.8014   LearningRate 0.0040   Epoch: 16   Global Step: 664190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:32,546-Speed 2623.63 samples/sec   Loss 2.7882   LearningRate 0.0040   Epoch: 16   Global Step: 664200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:36,456-Speed 2619.56 samples/sec   Loss 2.7877   LearningRate 0.0040   Epoch: 16   Global Step: 664210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:40,358-Speed 2624.64 samples/sec   Loss 2.6446   LearningRate 0.0040   Epoch: 16   Global Step: 664220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:44,265-Speed 2621.83 samples/sec   Loss 2.7931   LearningRate 0.0040   Epoch: 16   Global Step: 664230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:48,173-Speed 2620.86 samples/sec   Loss 2.7250   LearningRate 0.0040   Epoch: 16   Global Step: 664240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:23:52,070-Speed 2628.76 samples/sec   Loss 2.7302   LearningRate 0.0040   Epoch: 16   Global Step: 664250   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:23:55,977-Speed 2621.39 samples/sec   Loss 2.7987   LearningRate 0.0040   Epoch: 16   Global Step: 664260   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:23:59,878-Speed 2626.02 samples/sec   Loss 2.6226   LearningRate 0.0040   Epoch: 16   Global Step: 664270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:03,790-Speed 2617.83 samples/sec   Loss 2.6764   LearningRate 0.0040   Epoch: 16   Global Step: 664280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:07,702-Speed 2618.26 samples/sec   Loss 2.7544   LearningRate 0.0040   Epoch: 16   Global Step: 664290   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:11,612-Speed 2619.54 samples/sec   Loss 2.7155   LearningRate 0.0040   Epoch: 16   Global Step: 664300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:15,518-Speed 2622.73 samples/sec   Loss 2.7082   LearningRate 0.0040   Epoch: 16   Global Step: 664310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:19,426-Speed 2621.01 samples/sec   Loss 2.6988   LearningRate 0.0040   Epoch: 16   Global Step: 664320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:23,327-Speed 2625.88 samples/sec   Loss 2.6823   LearningRate 0.0040   Epoch: 16   Global Step: 664330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:27,229-Speed 2624.58 samples/sec   Loss 2.6952   LearningRate 0.0040   Epoch: 16   Global Step: 664340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:24:31,140-Speed 2619.39 samples/sec   Loss 2.6925   LearningRate 0.0040   Epoch: 16   Global Step: 664350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:24:35,064-Speed 2610.27 samples/sec   Loss 2.7026   LearningRate 0.0040   Epoch: 16   Global Step: 664360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:24:38,971-Speed 2621.96 samples/sec   Loss 2.8158   LearningRate 0.0040   Epoch: 16   Global Step: 664370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:24:42,874-Speed 2624.03 samples/sec   Loss 2.7087   LearningRate 0.0040   Epoch: 16   Global Step: 664380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:24:46,779-Speed 2622.89 samples/sec   Loss 2.7577   LearningRate 0.0040   Epoch: 16   Global Step: 664390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:24:50,688-Speed 2619.97 samples/sec   Loss 2.7635   LearningRate 0.0040   Epoch: 16   Global Step: 664400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:24:54,593-Speed 2623.62 samples/sec   Loss 2.7142   LearningRate 0.0040   Epoch: 16   Global Step: 664410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:24:58,496-Speed 2624.05 samples/sec   Loss 2.7209   LearningRate 0.0040   Epoch: 16   Global Step: 664420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:02,411-Speed 2616.72 samples/sec   Loss 2.6638   LearningRate 0.0040   Epoch: 16   Global Step: 664430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:06,322-Speed 2618.39 samples/sec   Loss 2.7083   LearningRate 0.0040   Epoch: 16   Global Step: 664440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:10,237-Speed 2616.83 samples/sec   Loss 2.6847   LearningRate 0.0040   Epoch: 16   Global Step: 664450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:25:14,118-Speed 2639.10 samples/sec   Loss 2.7199   LearningRate 0.0040   Epoch: 16   Global Step: 664460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:18,026-Speed 2620.74 samples/sec   Loss 2.7229   LearningRate 0.0040   Epoch: 16   Global Step: 664470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:21,933-Speed 2622.25 samples/sec   Loss 2.7749   LearningRate 0.0040   Epoch: 16   Global Step: 664480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:25,845-Speed 2618.18 samples/sec   Loss 2.6623   LearningRate 0.0040   Epoch: 16   Global Step: 664490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:29,752-Speed 2621.97 samples/sec   Loss 2.7430   LearningRate 0.0040   Epoch: 16   Global Step: 664500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:33,680-Speed 2607.36 samples/sec   Loss 2.7379   LearningRate 0.0040   Epoch: 16   Global Step: 664510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:37,587-Speed 2621.86 samples/sec   Loss 2.6877   LearningRate 0.0040   Epoch: 16   Global Step: 664520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:25:41,474-Speed 2634.70 samples/sec   Loss 2.6865   LearningRate 0.0040   Epoch: 16   Global Step: 664530   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:25:45,386-Speed 2618.16 samples/sec   Loss 2.7823   LearningRate 0.0040   Epoch: 16   Global Step: 664540   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:25:49,302-Speed 2615.59 samples/sec   Loss 2.7273   LearningRate 0.0040   Epoch: 16   Global Step: 664550   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:25:53,204-Speed 2625.27 samples/sec   Loss 2.7685   LearningRate 0.0040   Epoch: 16   Global Step: 664560   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:25:57,120-Speed 2615.25 samples/sec   Loss 2.7151   LearningRate 0.0040   Epoch: 16   Global Step: 664570   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:26:01,026-Speed 2622.62 samples/sec   Loss 2.7232   LearningRate 0.0040   Epoch: 16   Global Step: 664580   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:26:04,927-Speed 2625.51 samples/sec   Loss 2.7403   LearningRate 0.0040   Epoch: 16   Global Step: 664590   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:26:08,833-Speed 2621.96 samples/sec   Loss 2.7086   LearningRate 0.0040   Epoch: 16   Global Step: 664600   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:26:12,738-Speed 2623.26 samples/sec   Loss 2.6912   LearningRate 0.0040   Epoch: 16   Global Step: 664610   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:26:16,646-Speed 2621.29 samples/sec   Loss 2.6659   LearningRate 0.0040   Epoch: 16   Global Step: 664620   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-15 22:26:20,556-Speed 2619.40 samples/sec   Loss 2.8114   LearningRate 0.0040   Epoch: 16   Global Step: 664630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:24,462-Speed 2621.86 samples/sec   Loss 2.6806   LearningRate 0.0040   Epoch: 16   Global Step: 664640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:28,365-Speed 2625.36 samples/sec   Loss 2.6873   LearningRate 0.0040   Epoch: 16   Global Step: 664650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:32,262-Speed 2628.28 samples/sec   Loss 2.6709   LearningRate 0.0040   Epoch: 16   Global Step: 664660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:36,174-Speed 2618.12 samples/sec   Loss 2.7601   LearningRate 0.0040   Epoch: 16   Global Step: 664670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:40,110-Speed 2602.95 samples/sec   Loss 2.7608   LearningRate 0.0040   Epoch: 16   Global Step: 664680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:44,059-Speed 2593.56 samples/sec   Loss 2.7929   LearningRate 0.0040   Epoch: 16   Global Step: 664690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:47,991-Speed 2605.32 samples/sec   Loss 2.7319   LearningRate 0.0040   Epoch: 16   Global Step: 664700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:51,896-Speed 2623.30 samples/sec   Loss 2.7351   LearningRate 0.0039   Epoch: 16   Global Step: 664710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:55,799-Speed 2624.01 samples/sec   Loss 2.7390   LearningRate 0.0039   Epoch: 16   Global Step: 664720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:26:59,698-Speed 2626.99 samples/sec   Loss 2.7572   LearningRate 0.0039   Epoch: 16   Global Step: 664730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:03,605-Speed 2621.58 samples/sec   Loss 2.7433   LearningRate 0.0039   Epoch: 16   Global Step: 664740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:07,508-Speed 2625.18 samples/sec   Loss 2.6893   LearningRate 0.0039   Epoch: 16   Global Step: 664750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:11,423-Speed 2616.08 samples/sec   Loss 2.6181   LearningRate 0.0039   Epoch: 16   Global Step: 664760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:15,335-Speed 2618.16 samples/sec   Loss 2.7409   LearningRate 0.0039   Epoch: 16   Global Step: 664770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:19,235-Speed 2626.99 samples/sec   Loss 2.7287   LearningRate 0.0039   Epoch: 16   Global Step: 664780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:23,133-Speed 2627.31 samples/sec   Loss 2.7837   LearningRate 0.0039   Epoch: 16   Global Step: 664790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:27,037-Speed 2623.61 samples/sec   Loss 2.7966   LearningRate 0.0039   Epoch: 16   Global Step: 664800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:30,936-Speed 2627.07 samples/sec   Loss 2.7062   LearningRate 0.0039   Epoch: 16   Global Step: 664810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:34,837-Speed 2625.52 samples/sec   Loss 2.7210   LearningRate 0.0039   Epoch: 16   Global Step: 664820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:38,718-Speed 2639.95 samples/sec   Loss 2.6753   LearningRate 0.0039   Epoch: 16   Global Step: 664830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:42,618-Speed 2625.74 samples/sec   Loss 2.7149   LearningRate 0.0039   Epoch: 16   Global Step: 664840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:46,514-Speed 2629.42 samples/sec   Loss 2.7190   LearningRate 0.0039   Epoch: 16   Global Step: 664850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:50,413-Speed 2626.58 samples/sec   Loss 2.6968   LearningRate 0.0039   Epoch: 16   Global Step: 664860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:54,314-Speed 2626.36 samples/sec   Loss 2.7576   LearningRate 0.0039   Epoch: 16   Global Step: 664870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:27:58,214-Speed 2625.92 samples/sec   Loss 2.6816   LearningRate 0.0039   Epoch: 16   Global Step: 664880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:02,161-Speed 2595.51 samples/sec   Loss 2.8019   LearningRate 0.0039   Epoch: 16   Global Step: 664890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:06,083-Speed 2611.31 samples/sec   Loss 2.7296   LearningRate 0.0039   Epoch: 16   Global Step: 664900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:10,012-Speed 2609.30 samples/sec   Loss 2.7008   LearningRate 0.0039   Epoch: 16   Global Step: 664910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:13,910-Speed 2627.61 samples/sec   Loss 2.7370   LearningRate 0.0039   Epoch: 16   Global Step: 664920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:17,813-Speed 2624.52 samples/sec   Loss 2.7167   LearningRate 0.0039   Epoch: 16   Global Step: 664930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-15 22:28:21,690-Speed 2641.88 samples/sec   Loss 2.8020   LearningRate 0.0039   Epoch: 16   Global Step: 664940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:25,612-Speed 2611.59 samples/sec   Loss 2.6999   LearningRate 0.0039   Epoch: 16   Global Step: 664950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:29,518-Speed 2621.53 samples/sec   Loss 2.7550   LearningRate 0.0039   Epoch: 16   Global Step: 664960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-15 22:28:33,397-Speed 2641.25 samples/sec   Loss 2.7516   LearningRate 0.0039   Epoch: 16   Global Step: 664970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:28:37,300-Speed 2624.36 samples/sec   Loss 2.7862   LearningRate 0.0039   Epoch: 16   Global Step: 664980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:28:41,201-Speed 2625.51 samples/sec   Loss 2.6915   LearningRate 0.0039   Epoch: 16   Global Step: 664990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-15 22:28:45,106-Speed 2622.61 samples/sec   Loss 2.7620   LearningRate 0.0039   Epoch: 16   Global Step: 665000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:28:49,034-Speed 2609.04 samples/sec   Loss 2.7016   LearningRate 0.0039   Epoch: 16   Global Step: 665010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:28:52,949-Speed 2615.55 samples/sec   Loss 2.7351   LearningRate 0.0039   Epoch: 16   Global Step: 665020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:28:56,847-Speed 2627.90 samples/sec   Loss 2.7623   LearningRate 0.0039   Epoch: 16   Global Step: 665030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:00,751-Speed 2623.61 samples/sec   Loss 2.7389   LearningRate 0.0039   Epoch: 16   Global Step: 665040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:04,665-Speed 2617.08 samples/sec   Loss 2.6605   LearningRate 0.0039   Epoch: 16   Global Step: 665050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:08,578-Speed 2617.10 samples/sec   Loss 2.7281   LearningRate 0.0039   Epoch: 16   Global Step: 665060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:12,483-Speed 2623.23 samples/sec   Loss 2.6589   LearningRate 0.0039   Epoch: 16   Global Step: 665070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:29:16,383-Speed 2626.18 samples/sec   Loss 2.6765   LearningRate 0.0039   Epoch: 16   Global Step: 665080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:29:20,262-Speed 2640.15 samples/sec   Loss 2.7742   LearningRate 0.0039   Epoch: 16   Global Step: 665090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:24,173-Speed 2619.22 samples/sec   Loss 2.7297   LearningRate 0.0039   Epoch: 16   Global Step: 665100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:28,072-Speed 2626.99 samples/sec   Loss 2.7281   LearningRate 0.0039   Epoch: 16   Global Step: 665110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:31,978-Speed 2622.22 samples/sec   Loss 2.7059   LearningRate 0.0039   Epoch: 16   Global Step: 665120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:35,883-Speed 2623.20 samples/sec   Loss 2.6979   LearningRate 0.0039   Epoch: 16   Global Step: 665130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:39,792-Speed 2620.22 samples/sec   Loss 2.7374   LearningRate 0.0039   Epoch: 16   Global Step: 665140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:43,692-Speed 2625.73 samples/sec   Loss 2.6543   LearningRate 0.0039   Epoch: 16   Global Step: 665150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:47,601-Speed 2620.79 samples/sec   Loss 2.7173   LearningRate 0.0039   Epoch: 16   Global Step: 665160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:51,507-Speed 2622.33 samples/sec   Loss 2.7448   LearningRate 0.0039   Epoch: 16   Global Step: 665170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:55,411-Speed 2623.51 samples/sec   Loss 2.7016   LearningRate 0.0039   Epoch: 16   Global Step: 665180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:29:59,311-Speed 2626.67 samples/sec   Loss 2.6673   LearningRate 0.0039   Epoch: 16   Global Step: 665190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:30:03,218-Speed 2622.07 samples/sec   Loss 2.6785   LearningRate 0.0039   Epoch: 16   Global Step: 665200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:30:07,099-Speed 2638.42 samples/sec   Loss 2.7177   LearningRate 0.0039   Epoch: 16   Global Step: 665210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:11,003-Speed 2623.64 samples/sec   Loss 2.7677   LearningRate 0.0039   Epoch: 16   Global Step: 665220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:14,913-Speed 2619.61 samples/sec   Loss 2.7195   LearningRate 0.0039   Epoch: 16   Global Step: 665230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:18,811-Speed 2628.13 samples/sec   Loss 2.7029   LearningRate 0.0039   Epoch: 16   Global Step: 665240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:22,713-Speed 2625.20 samples/sec   Loss 2.7074   LearningRate 0.0039   Epoch: 16   Global Step: 665250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:26,619-Speed 2622.41 samples/sec   Loss 2.6858   LearningRate 0.0039   Epoch: 16   Global Step: 665260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:30,516-Speed 2628.47 samples/sec   Loss 2.7031   LearningRate 0.0039   Epoch: 16   Global Step: 665270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:34,441-Speed 2609.24 samples/sec   Loss 2.6837   LearningRate 0.0039   Epoch: 16   Global Step: 665280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:38,356-Speed 2616.46 samples/sec   Loss 2.7126   LearningRate 0.0039   Epoch: 16   Global Step: 665290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:42,258-Speed 2625.05 samples/sec   Loss 2.7737   LearningRate 0.0039   Epoch: 16   Global Step: 665300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:30:46,190-Speed 2605.24 samples/sec   Loss 2.7165   LearningRate 0.0039   Epoch: 16   Global Step: 665310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:30:50,097-Speed 2621.71 samples/sec   Loss 2.7547   LearningRate 0.0039   Epoch: 16   Global Step: 665320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:30:53,994-Speed 2628.34 samples/sec   Loss 2.7325   LearningRate 0.0039   Epoch: 16   Global Step: 665330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:30:57,892-Speed 2627.16 samples/sec   Loss 2.6758   LearningRate 0.0039   Epoch: 16   Global Step: 665340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:31:01,807-Speed 2616.96 samples/sec   Loss 2.7861   LearningRate 0.0039   Epoch: 16   Global Step: 665350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:31:05,703-Speed 2629.14 samples/sec   Loss 2.7083   LearningRate 0.0039   Epoch: 16   Global Step: 665360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:31:09,597-Speed 2629.70 samples/sec   Loss 2.6540   LearningRate 0.0039   Epoch: 16   Global Step: 665370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:31:13,496-Speed 2627.07 samples/sec   Loss 2.6564   LearningRate 0.0039   Epoch: 16   Global Step: 665380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:31:17,401-Speed 2623.01 samples/sec   Loss 2.7620   LearningRate 0.0039   Epoch: 16   Global Step: 665390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:31:21,323-Speed 2612.02 samples/sec   Loss 2.6506   LearningRate 0.0039   Epoch: 16   Global Step: 665400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:31:25,210-Speed 2634.97 samples/sec   Loss 2.7371   LearningRate 0.0039   Epoch: 16   Global Step: 665410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:29,224-Speed 2551.59 samples/sec   Loss 2.7058   LearningRate 0.0039   Epoch: 16   Global Step: 665420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:33,179-Speed 2590.25 samples/sec   Loss 2.7807   LearningRate 0.0039   Epoch: 16   Global Step: 665430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:37,099-Speed 2612.66 samples/sec   Loss 2.7214   LearningRate 0.0039   Epoch: 16   Global Step: 665440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:41,039-Speed 2599.46 samples/sec   Loss 2.7071   LearningRate 0.0039   Epoch: 16   Global Step: 665450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:45,137-Speed 2498.99 samples/sec   Loss 2.6870   LearningRate 0.0039   Epoch: 16   Global Step: 665460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:49,257-Speed 2486.60 samples/sec   Loss 2.7284   LearningRate 0.0039   Epoch: 16   Global Step: 665470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:53,361-Speed 2495.98 samples/sec   Loss 2.7273   LearningRate 0.0039   Epoch: 16   Global Step: 665480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:31:57,310-Speed 2593.25 samples/sec   Loss 2.7028   LearningRate 0.0039   Epoch: 16   Global Step: 665490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:01,214-Speed 2624.46 samples/sec   Loss 2.7542   LearningRate 0.0039   Epoch: 16   Global Step: 665500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:05,088-Speed 2643.45 samples/sec   Loss 2.7313   LearningRate 0.0039   Epoch: 16   Global Step: 665510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:09,001-Speed 2617.67 samples/sec   Loss 2.7098   LearningRate 0.0039   Epoch: 16   Global Step: 665520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:12,905-Speed 2623.17 samples/sec   Loss 2.7643   LearningRate 0.0039   Epoch: 16   Global Step: 665530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:16,812-Speed 2622.32 samples/sec   Loss 2.7373   LearningRate 0.0039   Epoch: 16   Global Step: 665540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:20,720-Speed 2620.71 samples/sec   Loss 2.7729   LearningRate 0.0039   Epoch: 16   Global Step: 665550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:24,633-Speed 2617.71 samples/sec   Loss 2.6896   LearningRate 0.0039   Epoch: 16   Global Step: 665560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:28,544-Speed 2618.86 samples/sec   Loss 2.7218   LearningRate 0.0039   Epoch: 16   Global Step: 665570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:32,439-Speed 2629.60 samples/sec   Loss 2.7268   LearningRate 0.0039   Epoch: 16   Global Step: 665580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:36,375-Speed 2602.49 samples/sec   Loss 2.6540   LearningRate 0.0039   Epoch: 16   Global Step: 665590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:32:40,375-Speed 2560.08 samples/sec   Loss 2.7219   LearningRate 0.0039   Epoch: 16   Global Step: 665600   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:32:44,301-Speed 2609.81 samples/sec   Loss 2.6699   LearningRate 0.0039   Epoch: 16   Global Step: 665610   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:32:48,264-Speed 2584.65 samples/sec   Loss 2.6347   LearningRate 0.0039   Epoch: 16   Global Step: 665620   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:32:52,175-Speed 2618.79 samples/sec   Loss 2.6585   LearningRate 0.0039   Epoch: 16   Global Step: 665630   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:32:56,081-Speed 2622.51 samples/sec   Loss 2.7627   LearningRate 0.0039   Epoch: 16   Global Step: 665640   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:32:59,978-Speed 2628.19 samples/sec   Loss 2.7102   LearningRate 0.0039   Epoch: 16   Global Step: 665650   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:33:03,892-Speed 2616.70 samples/sec   Loss 2.7280   LearningRate 0.0039   Epoch: 16   Global Step: 665660   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:33:07,801-Speed 2620.70 samples/sec   Loss 2.7158   LearningRate 0.0039   Epoch: 16   Global Step: 665670   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:33:11,723-Speed 2611.02 samples/sec   Loss 2.7089   LearningRate 0.0039   Epoch: 16   Global Step: 665680   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:33:15,628-Speed 2623.62 samples/sec   Loss 2.7562   LearningRate 0.0039   Epoch: 16   Global Step: 665690   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:33:19,526-Speed 2627.37 samples/sec   Loss 2.7002   LearningRate 0.0039   Epoch: 16   Global Step: 665700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:23,422-Speed 2629.18 samples/sec   Loss 2.6871   LearningRate 0.0039   Epoch: 16   Global Step: 665710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:27,318-Speed 2628.66 samples/sec   Loss 2.6537   LearningRate 0.0039   Epoch: 16   Global Step: 665720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:31,221-Speed 2624.06 samples/sec   Loss 2.8106   LearningRate 0.0039   Epoch: 16   Global Step: 665730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:35,126-Speed 2622.86 samples/sec   Loss 2.6561   LearningRate 0.0039   Epoch: 16   Global Step: 665740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:39,147-Speed 2547.50 samples/sec   Loss 2.7605   LearningRate 0.0039   Epoch: 16   Global Step: 665750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:43,125-Speed 2575.64 samples/sec   Loss 2.7042   LearningRate 0.0039   Epoch: 16   Global Step: 665760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:47,042-Speed 2614.54 samples/sec   Loss 2.7178   LearningRate 0.0039   Epoch: 16   Global Step: 665770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:50,945-Speed 2624.43 samples/sec   Loss 2.6409   LearningRate 0.0039   Epoch: 16   Global Step: 665780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:54,841-Speed 2629.16 samples/sec   Loss 2.7224   LearningRate 0.0039   Epoch: 16   Global Step: 665790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:33:58,744-Speed 2623.91 samples/sec   Loss 2.7251   LearningRate 0.0039   Epoch: 16   Global Step: 665800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:34:02,645-Speed 2626.19 samples/sec   Loss 2.7150   LearningRate 0.0039   Epoch: 16   Global Step: 665810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:34:06,553-Speed 2620.77 samples/sec   Loss 2.7135   LearningRate 0.0039   Epoch: 16   Global Step: 665820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:34:10,438-Speed 2636.24 samples/sec   Loss 2.7590   LearningRate 0.0039   Epoch: 16   Global Step: 665830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:14,346-Speed 2621.71 samples/sec   Loss 2.6909   LearningRate 0.0039   Epoch: 16   Global Step: 665840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:18,244-Speed 2627.73 samples/sec   Loss 2.6827   LearningRate 0.0039   Epoch: 16   Global Step: 665850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:22,141-Speed 2628.34 samples/sec   Loss 2.6939   LearningRate 0.0039   Epoch: 16   Global Step: 665860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:26,050-Speed 2620.46 samples/sec   Loss 2.7366   LearningRate 0.0039   Epoch: 16   Global Step: 665870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:29,944-Speed 2630.00 samples/sec   Loss 2.7429   LearningRate 0.0039   Epoch: 16   Global Step: 665880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:33,837-Speed 2630.67 samples/sec   Loss 2.6504   LearningRate 0.0039   Epoch: 16   Global Step: 665890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:37,735-Speed 2628.80 samples/sec   Loss 2.6899   LearningRate 0.0039   Epoch: 16   Global Step: 665900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:41,648-Speed 2617.22 samples/sec   Loss 2.6949   LearningRate 0.0039   Epoch: 16   Global Step: 665910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:45,581-Speed 2604.76 samples/sec   Loss 2.6640   LearningRate 0.0039   Epoch: 16   Global Step: 665920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:34:49,487-Speed 2622.07 samples/sec   Loss 2.6760   LearningRate 0.0039   Epoch: 16   Global Step: 665930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:34:53,383-Speed 2629.64 samples/sec   Loss 2.7039   LearningRate 0.0039   Epoch: 16   Global Step: 665940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:34:57,277-Speed 2629.83 samples/sec   Loss 2.7270   LearningRate 0.0039   Epoch: 16   Global Step: 665950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:35:01,185-Speed 2620.96 samples/sec   Loss 2.7677   LearningRate 0.0039   Epoch: 16   Global Step: 665960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:35:05,096-Speed 2618.92 samples/sec   Loss 2.6884   LearningRate 0.0039   Epoch: 16   Global Step: 665970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:35:08,971-Speed 2643.12 samples/sec   Loss 2.6818   LearningRate 0.0039   Epoch: 16   Global Step: 665980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:12,869-Speed 2628.23 samples/sec   Loss 2.7964   LearningRate 0.0039   Epoch: 16   Global Step: 665990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:16,785-Speed 2615.32 samples/sec   Loss 2.7049   LearningRate 0.0039   Epoch: 16   Global Step: 666000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:20,681-Speed 2629.88 samples/sec   Loss 2.7713   LearningRate 0.0039   Epoch: 16   Global Step: 666010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:24,577-Speed 2629.13 samples/sec   Loss 2.5949   LearningRate 0.0039   Epoch: 16   Global Step: 666020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:28,477-Speed 2626.42 samples/sec   Loss 2.6929   LearningRate 0.0039   Epoch: 16   Global Step: 666030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:32,398-Speed 2612.02 samples/sec   Loss 2.7219   LearningRate 0.0039   Epoch: 16   Global Step: 666040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:36,303-Speed 2622.39 samples/sec   Loss 2.6662   LearningRate 0.0039   Epoch: 16   Global Step: 666050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:40,205-Speed 2624.68 samples/sec   Loss 2.7692   LearningRate 0.0039   Epoch: 16   Global Step: 666060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:44,115-Speed 2620.52 samples/sec   Loss 2.6558   LearningRate 0.0039   Epoch: 16   Global Step: 666070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:48,030-Speed 2616.40 samples/sec   Loss 2.6656   LearningRate 0.0039   Epoch: 16   Global Step: 666080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:35:51,938-Speed 2620.84 samples/sec   Loss 2.6946   LearningRate 0.0039   Epoch: 16   Global Step: 666090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:35:55,818-Speed 2639.86 samples/sec   Loss 2.6585   LearningRate 0.0039   Epoch: 16   Global Step: 666100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:35:59,721-Speed 2624.59 samples/sec   Loss 2.7467   LearningRate 0.0039   Epoch: 16   Global Step: 666110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:03,629-Speed 2620.34 samples/sec   Loss 2.8008   LearningRate 0.0039   Epoch: 16   Global Step: 666120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:07,525-Speed 2629.53 samples/sec   Loss 2.7419   LearningRate 0.0039   Epoch: 16   Global Step: 666130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:11,429-Speed 2623.62 samples/sec   Loss 2.7474   LearningRate 0.0039   Epoch: 16   Global Step: 666140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:15,322-Speed 2631.22 samples/sec   Loss 2.7068   LearningRate 0.0039   Epoch: 16   Global Step: 666150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:19,415-Speed 2502.51 samples/sec   Loss 2.6786   LearningRate 0.0039   Epoch: 16   Global Step: 666160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:23,309-Speed 2629.94 samples/sec   Loss 2.6621   LearningRate 0.0039   Epoch: 16   Global Step: 666170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:27,214-Speed 2623.34 samples/sec   Loss 2.6953   LearningRate 0.0039   Epoch: 16   Global Step: 666180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:31,112-Speed 2627.43 samples/sec   Loss 2.7437   LearningRate 0.0039   Epoch: 16   Global Step: 666190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:36:35,017-Speed 2623.31 samples/sec   Loss 2.7094   LearningRate 0.0039   Epoch: 16   Global Step: 666200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:36:38,918-Speed 2625.45 samples/sec   Loss 2.7361   LearningRate 0.0039   Epoch: 16   Global Step: 666210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:36:42,822-Speed 2623.82 samples/sec   Loss 2.7052   LearningRate 0.0039   Epoch: 16   Global Step: 666220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:36:46,723-Speed 2625.30 samples/sec   Loss 2.6509   LearningRate 0.0039   Epoch: 16   Global Step: 666230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:36:50,669-Speed 2595.99 samples/sec   Loss 2.6931   LearningRate 0.0039   Epoch: 16   Global Step: 666240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:36:54,580-Speed 2619.44 samples/sec   Loss 2.7097   LearningRate 0.0039   Epoch: 16   Global Step: 666250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:36:58,478-Speed 2628.13 samples/sec   Loss 2.7452   LearningRate 0.0039   Epoch: 16   Global Step: 666260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:37:02,349-Speed 2645.34 samples/sec   Loss 2.6961   LearningRate 0.0039   Epoch: 16   Global Step: 666270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:06,244-Speed 2629.43 samples/sec   Loss 2.7420   LearningRate 0.0039   Epoch: 16   Global Step: 666280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:10,138-Speed 2630.41 samples/sec   Loss 2.7490   LearningRate 0.0039   Epoch: 16   Global Step: 666290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:14,029-Speed 2632.91 samples/sec   Loss 2.6768   LearningRate 0.0039   Epoch: 16   Global Step: 666300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:17,980-Speed 2592.97 samples/sec   Loss 2.6906   LearningRate 0.0039   Epoch: 16   Global Step: 666310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:21,947-Speed 2581.56 samples/sec   Loss 2.7160   LearningRate 0.0039   Epoch: 16   Global Step: 666320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:25,874-Speed 2609.18 samples/sec   Loss 2.7602   LearningRate 0.0039   Epoch: 16   Global Step: 666330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:29,784-Speed 2619.48 samples/sec   Loss 2.7695   LearningRate 0.0039   Epoch: 16   Global Step: 666340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:33,707-Speed 2610.86 samples/sec   Loss 2.6811   LearningRate 0.0039   Epoch: 16   Global Step: 666350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:37,693-Speed 2569.64 samples/sec   Loss 2.6593   LearningRate 0.0039   Epoch: 16   Global Step: 666360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:41,592-Speed 2627.72 samples/sec   Loss 2.7142   LearningRate 0.0039   Epoch: 16   Global Step: 666370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:37:45,474-Speed 2638.66 samples/sec   Loss 2.6536   LearningRate 0.0039   Epoch: 16   Global Step: 666380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:49,386-Speed 2618.38 samples/sec   Loss 2.7204   LearningRate 0.0039   Epoch: 16   Global Step: 666390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:53,286-Speed 2626.41 samples/sec   Loss 2.6907   LearningRate 0.0039   Epoch: 16   Global Step: 666400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:37:57,179-Speed 2631.46 samples/sec   Loss 2.7636   LearningRate 0.0039   Epoch: 16   Global Step: 666410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:01,093-Speed 2616.65 samples/sec   Loss 2.6945   LearningRate 0.0039   Epoch: 16   Global Step: 666420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:04,994-Speed 2625.19 samples/sec   Loss 2.7370   LearningRate 0.0039   Epoch: 16   Global Step: 666430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:08,889-Speed 2629.65 samples/sec   Loss 2.6660   LearningRate 0.0039   Epoch: 16   Global Step: 666440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:12,784-Speed 2629.96 samples/sec   Loss 2.7478   LearningRate 0.0039   Epoch: 16   Global Step: 666450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:16,691-Speed 2621.59 samples/sec   Loss 2.6387   LearningRate 0.0039   Epoch: 16   Global Step: 666460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:20,601-Speed 2619.46 samples/sec   Loss 2.6646   LearningRate 0.0039   Epoch: 16   Global Step: 666470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:24,493-Speed 2631.90 samples/sec   Loss 2.7087   LearningRate 0.0039   Epoch: 16   Global Step: 666480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:28,386-Speed 2631.45 samples/sec   Loss 2.6957   LearningRate 0.0039   Epoch: 16   Global Step: 666490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:32,296-Speed 2619.64 samples/sec   Loss 2.7033   LearningRate 0.0039   Epoch: 16   Global Step: 666500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:36,191-Speed 2629.14 samples/sec   Loss 2.6837   LearningRate 0.0039   Epoch: 16   Global Step: 666510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:40,094-Speed 2624.31 samples/sec   Loss 2.5860   LearningRate 0.0039   Epoch: 16   Global Step: 666520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:44,180-Speed 2507.05 samples/sec   Loss 2.7073   LearningRate 0.0039   Epoch: 16   Global Step: 666530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:48,073-Speed 2631.31 samples/sec   Loss 2.6781   LearningRate 0.0039   Epoch: 16   Global Step: 666540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:51,974-Speed 2626.15 samples/sec   Loss 2.6830   LearningRate 0.0039   Epoch: 16   Global Step: 666550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:55,874-Speed 2626.17 samples/sec   Loss 2.6476   LearningRate 0.0039   Epoch: 16   Global Step: 666560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:38:59,793-Speed 2613.82 samples/sec   Loss 2.7074   LearningRate 0.0039   Epoch: 16   Global Step: 666570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:03,684-Speed 2632.28 samples/sec   Loss 2.7232   LearningRate 0.0039   Epoch: 16   Global Step: 666580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:39:07,582-Speed 2627.48 samples/sec   Loss 2.7835   LearningRate 0.0039   Epoch: 16   Global Step: 666590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:39:11,483-Speed 2625.55 samples/sec   Loss 2.6788   LearningRate 0.0039   Epoch: 16   Global Step: 666600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:39:15,404-Speed 2612.95 samples/sec   Loss 2.7208   LearningRate 0.0039   Epoch: 16   Global Step: 666610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:39:19,301-Speed 2628.68 samples/sec   Loss 2.7074   LearningRate 0.0039   Epoch: 16   Global Step: 666620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:39:23,207-Speed 2622.51 samples/sec   Loss 2.6397   LearningRate 0.0039   Epoch: 16   Global Step: 666630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:39:27,100-Speed 2630.70 samples/sec   Loss 2.6920   LearningRate 0.0039   Epoch: 16   Global Step: 666640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:30,993-Speed 2631.17 samples/sec   Loss 2.6486   LearningRate 0.0039   Epoch: 16   Global Step: 666650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:34,885-Speed 2631.44 samples/sec   Loss 2.6521   LearningRate 0.0039   Epoch: 16   Global Step: 666660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:38,812-Speed 2608.35 samples/sec   Loss 2.6181   LearningRate 0.0039   Epoch: 16   Global Step: 666670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:42,710-Speed 2627.68 samples/sec   Loss 2.7401   LearningRate 0.0039   Epoch: 16   Global Step: 666680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:46,601-Speed 2633.16 samples/sec   Loss 2.6868   LearningRate 0.0039   Epoch: 16   Global Step: 666690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:50,600-Speed 2560.59 samples/sec   Loss 2.7500   LearningRate 0.0039   Epoch: 16   Global Step: 666700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:54,505-Speed 2623.59 samples/sec   Loss 2.6688   LearningRate 0.0039   Epoch: 16   Global Step: 666710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:39:58,401-Speed 2628.77 samples/sec   Loss 2.7182   LearningRate 0.0039   Epoch: 16   Global Step: 666720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:02,306-Speed 2622.87 samples/sec   Loss 2.8141   LearningRate 0.0039   Epoch: 16   Global Step: 666730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:06,200-Speed 2630.13 samples/sec   Loss 2.6676   LearningRate 0.0039   Epoch: 16   Global Step: 666740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:40:10,097-Speed 2628.18 samples/sec   Loss 2.6626   LearningRate 0.0039   Epoch: 16   Global Step: 666750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:40:13,994-Speed 2628.99 samples/sec   Loss 2.6330   LearningRate 0.0039   Epoch: 16   Global Step: 666760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:40:17,890-Speed 2628.69 samples/sec   Loss 2.6481   LearningRate 0.0039   Epoch: 16   Global Step: 666770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:40:21,760-Speed 2646.74 samples/sec   Loss 2.7231   LearningRate 0.0039   Epoch: 16   Global Step: 666780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:25,664-Speed 2624.06 samples/sec   Loss 2.6646   LearningRate 0.0039   Epoch: 16   Global Step: 666790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:29,576-Speed 2617.90 samples/sec   Loss 2.6558   LearningRate 0.0039   Epoch: 16   Global Step: 666800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:33,512-Speed 2602.36 samples/sec   Loss 2.7707   LearningRate 0.0038   Epoch: 16   Global Step: 666810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:37,527-Speed 2551.93 samples/sec   Loss 2.7619   LearningRate 0.0038   Epoch: 16   Global Step: 666820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:41,617-Speed 2503.72 samples/sec   Loss 2.6349   LearningRate 0.0038   Epoch: 16   Global Step: 666830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:45,714-Speed 2500.22 samples/sec   Loss 2.6626   LearningRate 0.0038   Epoch: 16   Global Step: 666840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:40:49,708-Speed 2564.89 samples/sec   Loss 2.8237   LearningRate 0.0038   Epoch: 16   Global Step: 666850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:40:53,614-Speed 2622.66 samples/sec   Loss 2.6363   LearningRate 0.0038   Epoch: 16   Global Step: 666860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:40:57,537-Speed 2610.55 samples/sec   Loss 2.6811   LearningRate 0.0038   Epoch: 16   Global Step: 666870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:01,433-Speed 2629.06 samples/sec   Loss 2.5944   LearningRate 0.0038   Epoch: 16   Global Step: 666880   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:05,333-Speed 2625.87 samples/sec   Loss 2.7013   LearningRate 0.0038   Epoch: 16   Global Step: 666890   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:09,239-Speed 2622.76 samples/sec   Loss 2.6684   LearningRate 0.0038   Epoch: 16   Global Step: 666900   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:13,142-Speed 2624.02 samples/sec   Loss 2.6991   LearningRate 0.0038   Epoch: 16   Global Step: 666910   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:17,050-Speed 2621.35 samples/sec   Loss 2.6927   LearningRate 0.0038   Epoch: 16   Global Step: 666920   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:20,953-Speed 2624.46 samples/sec   Loss 2.7162   LearningRate 0.0038   Epoch: 16   Global Step: 666930   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:24,864-Speed 2618.89 samples/sec   Loss 2.6919   LearningRate 0.0038   Epoch: 16   Global Step: 666940   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:41:28,757-Speed 2630.86 samples/sec   Loss 2.6514   LearningRate 0.0038   Epoch: 16   Global Step: 666950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:32,653-Speed 2629.00 samples/sec   Loss 2.6983   LearningRate 0.0038   Epoch: 16   Global Step: 666960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:36,563-Speed 2619.23 samples/sec   Loss 2.6653   LearningRate 0.0038   Epoch: 16   Global Step: 666970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:40,457-Speed 2630.45 samples/sec   Loss 2.6844   LearningRate 0.0038   Epoch: 16   Global Step: 666980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:44,353-Speed 2629.08 samples/sec   Loss 2.6751   LearningRate 0.0038   Epoch: 16   Global Step: 666990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:48,253-Speed 2627.25 samples/sec   Loss 2.6461   LearningRate 0.0038   Epoch: 16   Global Step: 667000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:52,159-Speed 2621.75 samples/sec   Loss 2.6422   LearningRate 0.0038   Epoch: 16   Global Step: 667010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:56,060-Speed 2626.01 samples/sec   Loss 2.6766   LearningRate 0.0038   Epoch: 16   Global Step: 667020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:41:59,912-Speed 2658.91 samples/sec   Loss 2.6270   LearningRate 0.0038   Epoch: 16   Global Step: 667030   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:03,841-Speed 2606.83 samples/sec   Loss 2.7553   LearningRate 0.0038   Epoch: 16   Global Step: 667040   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:07,734-Speed 2631.57 samples/sec   Loss 2.7515   LearningRate 0.0038   Epoch: 16   Global Step: 667050   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:11,629-Speed 2629.39 samples/sec   Loss 2.6697   LearningRate 0.0038   Epoch: 16   Global Step: 667060   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:15,540-Speed 2618.97 samples/sec   Loss 2.6683   LearningRate 0.0038   Epoch: 16   Global Step: 667070   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:19,450-Speed 2620.11 samples/sec   Loss 2.6588   LearningRate 0.0038   Epoch: 16   Global Step: 667080   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:23,347-Speed 2627.87 samples/sec   Loss 2.7215   LearningRate 0.0038   Epoch: 16   Global Step: 667090   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:27,251-Speed 2623.94 samples/sec   Loss 2.6586   LearningRate 0.0038   Epoch: 16   Global Step: 667100   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:31,147-Speed 2628.79 samples/sec   Loss 2.6615   LearningRate 0.0038   Epoch: 16   Global Step: 667110   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:35,046-Speed 2626.85 samples/sec   Loss 2.5742   LearningRate 0.0038   Epoch: 16   Global Step: 667120   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-04-15 22:42:38,939-Speed 2631.04 samples/sec   Loss 2.6510   LearningRate 0.0038   Epoch: 16   Global Step: 667130   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:42:42,977-Speed 2536.86 samples/sec   Loss 2.7281   LearningRate 0.0038   Epoch: 16   Global Step: 667140   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:42:46,880-Speed 2624.44 samples/sec   Loss 2.6905   LearningRate 0.0038   Epoch: 16   Global Step: 667150   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:42:50,781-Speed 2625.70 samples/sec   Loss 2.6381   LearningRate 0.0038   Epoch: 16   Global Step: 667160   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:42:54,679-Speed 2627.63 samples/sec   Loss 2.7489   LearningRate 0.0038   Epoch: 16   Global Step: 667170   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:42:58,586-Speed 2622.27 samples/sec   Loss 2.6769   LearningRate 0.0038   Epoch: 16   Global Step: 667180   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:43:02,478-Speed 2631.20 samples/sec   Loss 2.7327   LearningRate 0.0038   Epoch: 16   Global Step: 667190   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:43:06,375-Speed 2628.41 samples/sec   Loss 2.6937   LearningRate 0.0038   Epoch: 16   Global Step: 667200   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:43:10,276-Speed 2625.46 samples/sec   Loss 2.6628   LearningRate 0.0038   Epoch: 16   Global Step: 667210   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:43:14,170-Speed 2630.57 samples/sec   Loss 2.6641   LearningRate 0.0038   Epoch: 16   Global Step: 667220   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:43:18,062-Speed 2631.51 samples/sec   Loss 2.6730   LearningRate 0.0038   Epoch: 16   Global Step: 667230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:22,113-Speed 2528.36 samples/sec   Loss 2.6371   LearningRate 0.0038   Epoch: 16   Global Step: 667240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:26,004-Speed 2632.33 samples/sec   Loss 2.7264   LearningRate 0.0038   Epoch: 16   Global Step: 667250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:29,907-Speed 2624.87 samples/sec   Loss 2.7501   LearningRate 0.0038   Epoch: 16   Global Step: 667260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:33,801-Speed 2629.55 samples/sec   Loss 2.6903   LearningRate 0.0038   Epoch: 16   Global Step: 667270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:37,697-Speed 2629.00 samples/sec   Loss 2.7144   LearningRate 0.0038   Epoch: 16   Global Step: 667280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:41,594-Speed 2628.16 samples/sec   Loss 2.6384   LearningRate 0.0038   Epoch: 16   Global Step: 667290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:45,491-Speed 2628.39 samples/sec   Loss 2.7291   LearningRate 0.0038   Epoch: 16   Global Step: 667300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:49,387-Speed 2629.41 samples/sec   Loss 2.7362   LearningRate 0.0038   Epoch: 16   Global Step: 667310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:53,281-Speed 2630.02 samples/sec   Loss 2.7293   LearningRate 0.0038   Epoch: 16   Global Step: 667320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:43:57,178-Speed 2628.61 samples/sec   Loss 2.6990   LearningRate 0.0038   Epoch: 16   Global Step: 667330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:44:01,090-Speed 2618.01 samples/sec   Loss 2.7288   LearningRate 0.0038   Epoch: 16   Global Step: 667340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:44:04,963-Speed 2644.17 samples/sec   Loss 2.7042   LearningRate 0.0038   Epoch: 16   Global Step: 667350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:08,860-Speed 2628.13 samples/sec   Loss 2.6353   LearningRate 0.0038   Epoch: 16   Global Step: 667360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:12,765-Speed 2623.43 samples/sec   Loss 2.7201   LearningRate 0.0038   Epoch: 16   Global Step: 667370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:16,676-Speed 2619.18 samples/sec   Loss 2.6621   LearningRate 0.0038   Epoch: 16   Global Step: 667380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:20,587-Speed 2618.39 samples/sec   Loss 2.7271   LearningRate 0.0038   Epoch: 16   Global Step: 667390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:24,486-Speed 2627.66 samples/sec   Loss 2.7076   LearningRate 0.0038   Epoch: 16   Global Step: 667400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:28,384-Speed 2627.30 samples/sec   Loss 2.6331   LearningRate 0.0038   Epoch: 16   Global Step: 667410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:32,292-Speed 2620.90 samples/sec   Loss 2.7544   LearningRate 0.0038   Epoch: 16   Global Step: 667420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:36,188-Speed 2628.36 samples/sec   Loss 2.7361   LearningRate 0.0038   Epoch: 16   Global Step: 667430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:40,087-Speed 2627.07 samples/sec   Loss 2.7095   LearningRate 0.0038   Epoch: 16   Global Step: 667440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:44:43,981-Speed 2631.14 samples/sec   Loss 2.6378   LearningRate 0.0038   Epoch: 16   Global Step: 667450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:44:47,894-Speed 2617.23 samples/sec   Loss 2.7280   LearningRate 0.0038   Epoch: 16   Global Step: 667460   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:44:51,807-Speed 2618.05 samples/sec   Loss 2.6350   LearningRate 0.0038   Epoch: 16   Global Step: 667470   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:44:55,725-Speed 2613.98 samples/sec   Loss 2.7193   LearningRate 0.0038   Epoch: 16   Global Step: 667480   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:44:59,623-Speed 2627.68 samples/sec   Loss 2.6586   LearningRate 0.0038   Epoch: 16   Global Step: 667490   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:45:03,529-Speed 2622.27 samples/sec   Loss 2.6481   LearningRate 0.0038   Epoch: 16   Global Step: 667500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:45:07,431-Speed 2625.15 samples/sec   Loss 2.7219   LearningRate 0.0038   Epoch: 16   Global Step: 667510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:45:11,331-Speed 2625.97 samples/sec   Loss 2.6006   LearningRate 0.0038   Epoch: 16   Global Step: 667520   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:45:15,243-Speed 2618.41 samples/sec   Loss 2.7298   LearningRate 0.0038   Epoch: 16   Global Step: 667530   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:45:19,144-Speed 2625.84 samples/sec   Loss 2.6008   LearningRate 0.0038   Epoch: 16   Global Step: 667540   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:45:23,059-Speed 2616.31 samples/sec   Loss 2.7207   LearningRate 0.0038   Epoch: 16   Global Step: 667550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:26,959-Speed 2626.14 samples/sec   Loss 2.6773   LearningRate 0.0038   Epoch: 16   Global Step: 667560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:30,868-Speed 2620.50 samples/sec   Loss 2.6586   LearningRate 0.0038   Epoch: 16   Global Step: 667570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:34,853-Speed 2569.73 samples/sec   Loss 2.6733   LearningRate 0.0038   Epoch: 16   Global Step: 667580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:38,746-Speed 2631.98 samples/sec   Loss 2.7437   LearningRate 0.0038   Epoch: 16   Global Step: 667590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:42,637-Speed 2632.01 samples/sec   Loss 2.5677   LearningRate 0.0038   Epoch: 16   Global Step: 667600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:46,528-Speed 2632.51 samples/sec   Loss 2.6522   LearningRate 0.0038   Epoch: 16   Global Step: 667610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:50,427-Speed 2627.18 samples/sec   Loss 2.6558   LearningRate 0.0038   Epoch: 16   Global Step: 667620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:54,343-Speed 2615.10 samples/sec   Loss 2.6411   LearningRate 0.0038   Epoch: 16   Global Step: 667630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:45:58,258-Speed 2616.97 samples/sec   Loss 2.6829   LearningRate 0.0038   Epoch: 16   Global Step: 667640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:02,171-Speed 2617.09 samples/sec   Loss 2.6342   LearningRate 0.0038   Epoch: 16   Global Step: 667650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:46:06,078-Speed 2621.74 samples/sec   Loss 2.6776   LearningRate 0.0038   Epoch: 16   Global Step: 667660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:46:10,054-Speed 2575.99 samples/sec   Loss 2.7307   LearningRate 0.0038   Epoch: 16   Global Step: 667670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:46:13,962-Speed 2621.38 samples/sec   Loss 2.7270   LearningRate 0.0038   Epoch: 16   Global Step: 667680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:46:17,863-Speed 2625.66 samples/sec   Loss 2.7586   LearningRate 0.0038   Epoch: 16   Global Step: 667690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:46:21,757-Speed 2630.37 samples/sec   Loss 2.6997   LearningRate 0.0038   Epoch: 16   Global Step: 667700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:25,676-Speed 2613.28 samples/sec   Loss 2.6292   LearningRate 0.0038   Epoch: 16   Global Step: 667710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:29,580-Speed 2624.26 samples/sec   Loss 2.7316   LearningRate 0.0038   Epoch: 16   Global Step: 667720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:33,475-Speed 2629.36 samples/sec   Loss 2.6026   LearningRate 0.0038   Epoch: 16   Global Step: 667730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:37,379-Speed 2623.73 samples/sec   Loss 2.6356   LearningRate 0.0038   Epoch: 16   Global Step: 667740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:41,271-Speed 2631.61 samples/sec   Loss 2.6717   LearningRate 0.0038   Epoch: 16   Global Step: 667750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:45,169-Speed 2628.02 samples/sec   Loss 2.7202   LearningRate 0.0038   Epoch: 16   Global Step: 667760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:49,070-Speed 2625.19 samples/sec   Loss 2.7548   LearningRate 0.0038   Epoch: 16   Global Step: 667770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:52,971-Speed 2626.27 samples/sec   Loss 2.6246   LearningRate 0.0038   Epoch: 16   Global Step: 667780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:46:56,865-Speed 2629.71 samples/sec   Loss 2.7012   LearningRate 0.0038   Epoch: 16   Global Step: 667790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:00,766-Speed 2626.07 samples/sec   Loss 2.7276   LearningRate 0.0038   Epoch: 16   Global Step: 667800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:47:04,671-Speed 2622.35 samples/sec   Loss 2.6369   LearningRate 0.0038   Epoch: 16   Global Step: 667810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:47:08,552-Speed 2639.44 samples/sec   Loss 2.7220   LearningRate 0.0038   Epoch: 16   Global Step: 667820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:12,462-Speed 2619.67 samples/sec   Loss 2.7247   LearningRate 0.0038   Epoch: 16   Global Step: 667830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:16,366-Speed 2623.32 samples/sec   Loss 2.7025   LearningRate 0.0038   Epoch: 16   Global Step: 667840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:20,258-Speed 2632.17 samples/sec   Loss 2.7707   LearningRate 0.0038   Epoch: 16   Global Step: 667850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:24,148-Speed 2632.68 samples/sec   Loss 2.6204   LearningRate 0.0038   Epoch: 16   Global Step: 667860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:28,043-Speed 2629.96 samples/sec   Loss 2.6377   LearningRate 0.0038   Epoch: 16   Global Step: 667870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:31,936-Speed 2630.99 samples/sec   Loss 2.6282   LearningRate 0.0038   Epoch: 16   Global Step: 667880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:35,832-Speed 2628.81 samples/sec   Loss 2.6849   LearningRate 0.0038   Epoch: 16   Global Step: 667890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:39,731-Speed 2626.92 samples/sec   Loss 2.6703   LearningRate 0.0038   Epoch: 16   Global Step: 667900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:43,645-Speed 2616.83 samples/sec   Loss 2.7185   LearningRate 0.0038   Epoch: 16   Global Step: 667910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:47,549-Speed 2623.55 samples/sec   Loss 2.7037   LearningRate 0.0038   Epoch: 16   Global Step: 667920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:47:51,441-Speed 2631.70 samples/sec   Loss 2.7636   LearningRate 0.0038   Epoch: 16   Global Step: 667930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:47:55,318-Speed 2641.98 samples/sec   Loss 2.6909   LearningRate 0.0038   Epoch: 16   Global Step: 667940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:47:59,211-Speed 2631.45 samples/sec   Loss 2.7099   LearningRate 0.0038   Epoch: 16   Global Step: 667950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:03,107-Speed 2628.88 samples/sec   Loss 2.6852   LearningRate 0.0038   Epoch: 16   Global Step: 667960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:07,003-Speed 2628.68 samples/sec   Loss 2.7341   LearningRate 0.0038   Epoch: 16   Global Step: 667970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:10,897-Speed 2629.83 samples/sec   Loss 2.7042   LearningRate 0.0038   Epoch: 16   Global Step: 667980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:14,797-Speed 2626.75 samples/sec   Loss 2.6456   LearningRate 0.0038   Epoch: 16   Global Step: 667990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:18,689-Speed 2631.48 samples/sec   Loss 2.6486   LearningRate 0.0038   Epoch: 16   Global Step: 668000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:22,585-Speed 2629.58 samples/sec   Loss 2.7069   LearningRate 0.0038   Epoch: 16   Global Step: 668010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:26,476-Speed 2631.88 samples/sec   Loss 2.6487   LearningRate 0.0038   Epoch: 16   Global Step: 668020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:30,369-Speed 2631.64 samples/sec   Loss 2.7619   LearningRate 0.0038   Epoch: 16   Global Step: 668030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:34,259-Speed 2632.76 samples/sec   Loss 2.6615   LearningRate 0.0038   Epoch: 16   Global Step: 668040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:48:38,153-Speed 2630.59 samples/sec   Loss 2.6187   LearningRate 0.0038   Epoch: 16   Global Step: 668050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:48:42,056-Speed 2623.64 samples/sec   Loss 2.7415   LearningRate 0.0038   Epoch: 16   Global Step: 668060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:45,954-Speed 2628.33 samples/sec   Loss 2.6405   LearningRate 0.0038   Epoch: 16   Global Step: 668070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:49,848-Speed 2630.65 samples/sec   Loss 2.6806   LearningRate 0.0038   Epoch: 16   Global Step: 668080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:53,747-Speed 2626.68 samples/sec   Loss 2.6720   LearningRate 0.0038   Epoch: 16   Global Step: 668090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:48:57,641-Speed 2630.38 samples/sec   Loss 2.7041   LearningRate 0.0038   Epoch: 16   Global Step: 668100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:01,535-Speed 2630.33 samples/sec   Loss 2.5789   LearningRate 0.0038   Epoch: 16   Global Step: 668110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:05,430-Speed 2630.05 samples/sec   Loss 2.6251   LearningRate 0.0038   Epoch: 16   Global Step: 668120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:09,323-Speed 2630.43 samples/sec   Loss 2.7153   LearningRate 0.0038   Epoch: 16   Global Step: 668130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:13,240-Speed 2615.19 samples/sec   Loss 2.6556   LearningRate 0.0038   Epoch: 16   Global Step: 668140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:17,133-Speed 2630.75 samples/sec   Loss 2.6216   LearningRate 0.0038   Epoch: 16   Global Step: 668150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:21,005-Speed 2645.96 samples/sec   Loss 2.6925   LearningRate 0.0038   Epoch: 16   Global Step: 668160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:24,896-Speed 2632.37 samples/sec   Loss 2.6932   LearningRate 0.0038   Epoch: 16   Global Step: 668170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:28,796-Speed 2626.64 samples/sec   Loss 2.6628   LearningRate 0.0038   Epoch: 16   Global Step: 668180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:32,756-Speed 2586.46 samples/sec   Loss 2.7463   LearningRate 0.0038   Epoch: 16   Global Step: 668190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:49:36,625-Speed 2647.43 samples/sec   Loss 2.7020   LearningRate 0.0038   Epoch: 16   Global Step: 668200   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:49:40,516-Speed 2632.34 samples/sec   Loss 2.6415   LearningRate 0.0038   Epoch: 16   Global Step: 668210   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:49:44,449-Speed 2604.40 samples/sec   Loss 2.7199   LearningRate 0.0038   Epoch: 16   Global Step: 668220   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:49:48,367-Speed 2613.89 samples/sec   Loss 2.6341   LearningRate 0.0038   Epoch: 16   Global Step: 668230   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:49:52,266-Speed 2627.36 samples/sec   Loss 2.6984   LearningRate 0.0038   Epoch: 16   Global Step: 668240   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:49:56,171-Speed 2623.42 samples/sec   Loss 2.7370   LearningRate 0.0038   Epoch: 16   Global Step: 668250   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:50:00,073-Speed 2624.99 samples/sec   Loss 2.7041   LearningRate 0.0038   Epoch: 16   Global Step: 668260   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:50:03,993-Speed 2612.63 samples/sec   Loss 2.5773   LearningRate 0.0038   Epoch: 16   Global Step: 668270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:50:07,902-Speed 2620.50 samples/sec   Loss 2.6414   LearningRate 0.0038   Epoch: 16   Global Step: 668280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:50:11,802-Speed 2626.09 samples/sec   Loss 2.6343   LearningRate 0.0038   Epoch: 16   Global Step: 668290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:50:15,704-Speed 2625.57 samples/sec   Loss 2.6910   LearningRate 0.0038   Epoch: 16   Global Step: 668300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:19,656-Speed 2592.12 samples/sec   Loss 2.6354   LearningRate 0.0038   Epoch: 16   Global Step: 668310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:23,570-Speed 2616.46 samples/sec   Loss 2.6764   LearningRate 0.0038   Epoch: 16   Global Step: 668320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:27,462-Speed 2631.54 samples/sec   Loss 2.6577   LearningRate 0.0038   Epoch: 16   Global Step: 668330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:31,519-Speed 2524.67 samples/sec   Loss 2.6842   LearningRate 0.0038   Epoch: 16   Global Step: 668340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:35,417-Speed 2628.32 samples/sec   Loss 2.6324   LearningRate 0.0038   Epoch: 16   Global Step: 668350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:39,320-Speed 2624.63 samples/sec   Loss 2.6448   LearningRate 0.0038   Epoch: 16   Global Step: 668360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:43,239-Speed 2613.22 samples/sec   Loss 2.6532   LearningRate 0.0038   Epoch: 16   Global Step: 668370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:47,136-Speed 2628.52 samples/sec   Loss 2.7212   LearningRate 0.0038   Epoch: 16   Global Step: 668380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:51,057-Speed 2611.88 samples/sec   Loss 2.6653   LearningRate 0.0038   Epoch: 16   Global Step: 668390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:54,932-Speed 2644.10 samples/sec   Loss 2.6567   LearningRate 0.0038   Epoch: 16   Global Step: 668400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:50:58,835-Speed 2623.82 samples/sec   Loss 2.6510   LearningRate 0.0038   Epoch: 16   Global Step: 668410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:51:02,729-Speed 2630.07 samples/sec   Loss 2.6370   LearningRate 0.0038   Epoch: 16   Global Step: 668420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:51:06,625-Speed 2628.79 samples/sec   Loss 2.7576   LearningRate 0.0038   Epoch: 16   Global Step: 668430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:51:10,511-Speed 2636.51 samples/sec   Loss 2.6877   LearningRate 0.0038   Epoch: 16   Global Step: 668440   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:14,442-Speed 2605.21 samples/sec   Loss 2.6348   LearningRate 0.0038   Epoch: 16   Global Step: 668450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:18,340-Speed 2627.98 samples/sec   Loss 2.6703   LearningRate 0.0038   Epoch: 16   Global Step: 668460   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:22,298-Speed 2587.42 samples/sec   Loss 2.6115   LearningRate 0.0038   Epoch: 16   Global Step: 668470   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:26,201-Speed 2624.33 samples/sec   Loss 2.6446   LearningRate 0.0038   Epoch: 16   Global Step: 668480   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:30,130-Speed 2607.24 samples/sec   Loss 2.6993   LearningRate 0.0038   Epoch: 16   Global Step: 668490   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:34,023-Speed 2631.53 samples/sec   Loss 2.6403   LearningRate 0.0038   Epoch: 16   Global Step: 668500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:37,918-Speed 2629.28 samples/sec   Loss 2.6374   LearningRate 0.0038   Epoch: 16   Global Step: 668510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:41,810-Speed 2631.42 samples/sec   Loss 2.6245   LearningRate 0.0038   Epoch: 16   Global Step: 668520   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:45,706-Speed 2629.48 samples/sec   Loss 2.6069   LearningRate 0.0038   Epoch: 16   Global Step: 668530   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:51:49,596-Speed 2633.14 samples/sec   Loss 2.6094   LearningRate 0.0038   Epoch: 16   Global Step: 668540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:51:53,493-Speed 2627.65 samples/sec   Loss 2.7099   LearningRate 0.0038   Epoch: 16   Global Step: 668550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:51:57,382-Speed 2634.09 samples/sec   Loss 2.6305   LearningRate 0.0038   Epoch: 16   Global Step: 668560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:01,281-Speed 2627.02 samples/sec   Loss 2.7101   LearningRate 0.0038   Epoch: 16   Global Step: 668570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:05,173-Speed 2631.84 samples/sec   Loss 2.6376   LearningRate 0.0038   Epoch: 16   Global Step: 668580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:09,067-Speed 2629.94 samples/sec   Loss 2.5682   LearningRate 0.0038   Epoch: 16   Global Step: 668590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:12,970-Speed 2624.52 samples/sec   Loss 2.5921   LearningRate 0.0038   Epoch: 16   Global Step: 668600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:16,866-Speed 2629.23 samples/sec   Loss 2.6955   LearningRate 0.0038   Epoch: 16   Global Step: 668610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:20,767-Speed 2626.35 samples/sec   Loss 2.6545   LearningRate 0.0038   Epoch: 16   Global Step: 668620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:24,661-Speed 2629.96 samples/sec   Loss 2.6830   LearningRate 0.0038   Epoch: 16   Global Step: 668630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:28,568-Speed 2621.97 samples/sec   Loss 2.6595   LearningRate 0.0038   Epoch: 16   Global Step: 668640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:52:32,466-Speed 2627.48 samples/sec   Loss 2.7087   LearningRate 0.0038   Epoch: 16   Global Step: 668650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:52:36,347-Speed 2639.36 samples/sec   Loss 2.6207   LearningRate 0.0038   Epoch: 16   Global Step: 668660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:40,238-Speed 2631.91 samples/sec   Loss 2.7086   LearningRate 0.0038   Epoch: 16   Global Step: 668670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:44,130-Speed 2632.13 samples/sec   Loss 2.6342   LearningRate 0.0038   Epoch: 16   Global Step: 668680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:48,021-Speed 2632.52 samples/sec   Loss 2.6468   LearningRate 0.0038   Epoch: 16   Global Step: 668690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:51,913-Speed 2631.61 samples/sec   Loss 2.6299   LearningRate 0.0038   Epoch: 16   Global Step: 668700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:55,818-Speed 2622.96 samples/sec   Loss 2.6754   LearningRate 0.0038   Epoch: 16   Global Step: 668710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:52:59,724-Speed 2622.03 samples/sec   Loss 2.6772   LearningRate 0.0038   Epoch: 16   Global Step: 668720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:03,629-Speed 2623.60 samples/sec   Loss 2.6477   LearningRate 0.0038   Epoch: 16   Global Step: 668730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:07,522-Speed 2630.51 samples/sec   Loss 2.6655   LearningRate 0.0038   Epoch: 16   Global Step: 668740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:11,420-Speed 2627.26 samples/sec   Loss 2.6706   LearningRate 0.0038   Epoch: 16   Global Step: 668750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:15,318-Speed 2628.18 samples/sec   Loss 2.6612   LearningRate 0.0038   Epoch: 16   Global Step: 668760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:53:19,220-Speed 2624.74 samples/sec   Loss 2.6886   LearningRate 0.0038   Epoch: 16   Global Step: 668770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:53:23,117-Speed 2628.74 samples/sec   Loss 2.6396   LearningRate 0.0038   Epoch: 16   Global Step: 668780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:53:26,989-Speed 2644.81 samples/sec   Loss 2.7853   LearningRate 0.0038   Epoch: 16   Global Step: 668790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:31,054-Speed 2519.69 samples/sec   Loss 2.6993   LearningRate 0.0038   Epoch: 16   Global Step: 668800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:34,949-Speed 2629.69 samples/sec   Loss 2.6398   LearningRate 0.0038   Epoch: 16   Global Step: 668810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:38,844-Speed 2629.52 samples/sec   Loss 2.5518   LearningRate 0.0038   Epoch: 16   Global Step: 668820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:42,749-Speed 2623.03 samples/sec   Loss 2.6618   LearningRate 0.0038   Epoch: 16   Global Step: 668830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:46,645-Speed 2628.64 samples/sec   Loss 2.7082   LearningRate 0.0038   Epoch: 16   Global Step: 668840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:50,539-Speed 2630.81 samples/sec   Loss 2.6433   LearningRate 0.0038   Epoch: 16   Global Step: 668850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:54,436-Speed 2628.20 samples/sec   Loss 2.7161   LearningRate 0.0038   Epoch: 16   Global Step: 668860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:53:58,381-Speed 2596.65 samples/sec   Loss 2.6673   LearningRate 0.0038   Epoch: 16   Global Step: 668870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:02,274-Speed 2630.75 samples/sec   Loss 2.6696   LearningRate 0.0038   Epoch: 16   Global Step: 668880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:06,186-Speed 2618.40 samples/sec   Loss 2.5584   LearningRate 0.0038   Epoch: 16   Global Step: 668890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:54:10,073-Speed 2635.19 samples/sec   Loss 2.5746   LearningRate 0.0038   Epoch: 16   Global Step: 668900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:13,965-Speed 2632.93 samples/sec   Loss 2.6775   LearningRate 0.0038   Epoch: 16   Global Step: 668910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:17,861-Speed 2629.06 samples/sec   Loss 2.6253   LearningRate 0.0038   Epoch: 16   Global Step: 668920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:21,757-Speed 2628.97 samples/sec   Loss 2.6460   LearningRate 0.0038   Epoch: 16   Global Step: 668930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:25,650-Speed 2631.27 samples/sec   Loss 2.6436   LearningRate 0.0037   Epoch: 16   Global Step: 668940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:29,571-Speed 2612.02 samples/sec   Loss 2.6840   LearningRate 0.0037   Epoch: 16   Global Step: 668950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:54:33,443-Speed 2645.75 samples/sec   Loss 2.7896   LearningRate 0.0037   Epoch: 16   Global Step: 668960   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:54:37,346-Speed 2624.16 samples/sec   Loss 2.7408   LearningRate 0.0037   Epoch: 16   Global Step: 668970   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:54:41,240-Speed 2630.46 samples/sec   Loss 2.6450   LearningRate 0.0037   Epoch: 16   Global Step: 668980   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:54:45,141-Speed 2625.06 samples/sec   Loss 2.7059   LearningRate 0.0037   Epoch: 16   Global Step: 668990   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:54:49,053-Speed 2618.93 samples/sec   Loss 2.6184   LearningRate 0.0037   Epoch: 16   Global Step: 669000   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:54:52,949-Speed 2628.75 samples/sec   Loss 2.6060   LearningRate 0.0037   Epoch: 16   Global Step: 669010   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:54:56,842-Speed 2631.23 samples/sec   Loss 2.5941   LearningRate 0.0037   Epoch: 16   Global Step: 669020   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:55:00,764-Speed 2611.74 samples/sec   Loss 2.5917   LearningRate 0.0037   Epoch: 16   Global Step: 669030   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:55:04,701-Speed 2602.30 samples/sec   Loss 2.6628   LearningRate 0.0037   Epoch: 16   Global Step: 669040   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:55:08,665-Speed 2583.79 samples/sec   Loss 2.6796   LearningRate 0.0037   Epoch: 16   Global Step: 669050   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:55:12,566-Speed 2625.76 samples/sec   Loss 2.6797   LearningRate 0.0037   Epoch: 16   Global Step: 669060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:16,503-Speed 2601.54 samples/sec   Loss 2.7428   LearningRate 0.0037   Epoch: 16   Global Step: 669070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:20,399-Speed 2629.43 samples/sec   Loss 2.6526   LearningRate 0.0037   Epoch: 16   Global Step: 669080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:24,291-Speed 2631.63 samples/sec   Loss 2.6425   LearningRate 0.0037   Epoch: 16   Global Step: 669090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:28,190-Speed 2627.43 samples/sec   Loss 2.6323   LearningRate 0.0037   Epoch: 16   Global Step: 669100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:32,110-Speed 2612.60 samples/sec   Loss 2.6688   LearningRate 0.0037   Epoch: 16   Global Step: 669110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:36,013-Speed 2624.77 samples/sec   Loss 2.7560   LearningRate 0.0037   Epoch: 16   Global Step: 669120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:39,948-Speed 2603.27 samples/sec   Loss 2.6262   LearningRate 0.0037   Epoch: 16   Global Step: 669130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:43,850-Speed 2624.47 samples/sec   Loss 2.6553   LearningRate 0.0037   Epoch: 16   Global Step: 669140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:47,745-Speed 2629.53 samples/sec   Loss 2.6961   LearningRate 0.0037   Epoch: 16   Global Step: 669150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:51,642-Speed 2628.50 samples/sec   Loss 2.6461   LearningRate 0.0037   Epoch: 16   Global Step: 669160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:55:55,522-Speed 2640.42 samples/sec   Loss 2.6064   LearningRate 0.0037   Epoch: 16   Global Step: 669170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:55:59,421-Speed 2626.23 samples/sec   Loss 2.6429   LearningRate 0.0037   Epoch: 16   Global Step: 669180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:03,312-Speed 2632.91 samples/sec   Loss 2.6792   LearningRate 0.0037   Epoch: 16   Global Step: 669190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:07,244-Speed 2605.29 samples/sec   Loss 2.6392   LearningRate 0.0037   Epoch: 16   Global Step: 669200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:11,137-Speed 2631.00 samples/sec   Loss 2.6978   LearningRate 0.0037   Epoch: 16   Global Step: 669210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:15,029-Speed 2631.60 samples/sec   Loss 2.6224   LearningRate 0.0037   Epoch: 16   Global Step: 669220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:18,924-Speed 2629.53 samples/sec   Loss 2.6366   LearningRate 0.0037   Epoch: 16   Global Step: 669230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:22,816-Speed 2631.68 samples/sec   Loss 2.6262   LearningRate 0.0037   Epoch: 16   Global Step: 669240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:26,727-Speed 2618.71 samples/sec   Loss 2.6958   LearningRate 0.0037   Epoch: 16   Global Step: 669250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:30,617-Speed 2633.63 samples/sec   Loss 2.6560   LearningRate 0.0037   Epoch: 16   Global Step: 669260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:34,487-Speed 2646.88 samples/sec   Loss 2.7291   LearningRate 0.0037   Epoch: 16   Global Step: 669270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:38,394-Speed 2621.28 samples/sec   Loss 2.7163   LearningRate 0.0037   Epoch: 16   Global Step: 669280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:42,288-Speed 2630.29 samples/sec   Loss 2.6141   LearningRate 0.0037   Epoch: 16   Global Step: 669290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:46,184-Speed 2628.94 samples/sec   Loss 2.7216   LearningRate 0.0037   Epoch: 16   Global Step: 669300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:50,080-Speed 2629.06 samples/sec   Loss 2.6469   LearningRate 0.0037   Epoch: 16   Global Step: 669310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:54,069-Speed 2567.80 samples/sec   Loss 2.6774   LearningRate 0.0037   Epoch: 16   Global Step: 669320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:56:57,975-Speed 2621.89 samples/sec   Loss 2.6979   LearningRate 0.0037   Epoch: 16   Global Step: 669330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:01,869-Speed 2630.11 samples/sec   Loss 2.6349   LearningRate 0.0037   Epoch: 16   Global Step: 669340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:05,768-Speed 2627.14 samples/sec   Loss 2.6376   LearningRate 0.0037   Epoch: 16   Global Step: 669350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:09,659-Speed 2632.79 samples/sec   Loss 2.6060   LearningRate 0.0037   Epoch: 16   Global Step: 669360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:13,555-Speed 2628.53 samples/sec   Loss 2.6142   LearningRate 0.0037   Epoch: 16   Global Step: 669370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:57:17,448-Speed 2630.94 samples/sec   Loss 2.6921   LearningRate 0.0037   Epoch: 16   Global Step: 669380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:57:21,340-Speed 2631.82 samples/sec   Loss 2.6819   LearningRate 0.0037   Epoch: 16   Global Step: 669390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:57:25,257-Speed 2614.77 samples/sec   Loss 2.6331   LearningRate 0.0037   Epoch: 16   Global Step: 669400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:57:29,181-Speed 2611.04 samples/sec   Loss 2.6418   LearningRate 0.0037   Epoch: 16   Global Step: 669410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:57:33,071-Speed 2632.93 samples/sec   Loss 2.6583   LearningRate 0.0037   Epoch: 16   Global Step: 669420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:57:36,940-Speed 2646.91 samples/sec   Loss 2.7061   LearningRate 0.0037   Epoch: 16   Global Step: 669430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:40,836-Speed 2628.92 samples/sec   Loss 2.6162   LearningRate 0.0037   Epoch: 16   Global Step: 669440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:44,741-Speed 2623.30 samples/sec   Loss 2.6613   LearningRate 0.0037   Epoch: 16   Global Step: 669450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:48,636-Speed 2629.84 samples/sec   Loss 2.6935   LearningRate 0.0037   Epoch: 16   Global Step: 669460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:52,538-Speed 2624.71 samples/sec   Loss 2.6762   LearningRate 0.0037   Epoch: 16   Global Step: 669470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:57:56,432-Speed 2630.20 samples/sec   Loss 2.6548   LearningRate 0.0037   Epoch: 16   Global Step: 669480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:00,333-Speed 2626.59 samples/sec   Loss 2.6079   LearningRate 0.0037   Epoch: 16   Global Step: 669490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:04,224-Speed 2631.87 samples/sec   Loss 2.5797   LearningRate 0.0037   Epoch: 16   Global Step: 669500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:08,119-Speed 2629.39 samples/sec   Loss 2.5941   LearningRate 0.0037   Epoch: 16   Global Step: 669510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:12,022-Speed 2624.25 samples/sec   Loss 2.6793   LearningRate 0.0037   Epoch: 16   Global Step: 669520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:15,918-Speed 2629.26 samples/sec   Loss 2.6242   LearningRate 0.0037   Epoch: 16   Global Step: 669530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:58:19,836-Speed 2614.53 samples/sec   Loss 2.5764   LearningRate 0.0037   Epoch: 16   Global Step: 669540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:58:23,729-Speed 2630.88 samples/sec   Loss 2.6363   LearningRate 0.0037   Epoch: 16   Global Step: 669550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:58:27,628-Speed 2628.17 samples/sec   Loss 2.6367   LearningRate 0.0037   Epoch: 16   Global Step: 669560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:58:31,523-Speed 2629.60 samples/sec   Loss 2.6691   LearningRate 0.0037   Epoch: 16   Global Step: 669570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:58:35,400-Speed 2641.44 samples/sec   Loss 2.6332   LearningRate 0.0037   Epoch: 16   Global Step: 669580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:39,304-Speed 2623.17 samples/sec   Loss 2.6640   LearningRate 0.0037   Epoch: 16   Global Step: 669590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:43,250-Speed 2596.46 samples/sec   Loss 2.7079   LearningRate 0.0037   Epoch: 16   Global Step: 669600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:47,153-Speed 2624.26 samples/sec   Loss 2.6774   LearningRate 0.0037   Epoch: 16   Global Step: 669610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:51,056-Speed 2624.15 samples/sec   Loss 2.7005   LearningRate 0.0037   Epoch: 16   Global Step: 669620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:54,963-Speed 2621.51 samples/sec   Loss 2.6697   LearningRate 0.0037   Epoch: 16   Global Step: 669630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:58:58,855-Speed 2631.64 samples/sec   Loss 2.6628   LearningRate 0.0037   Epoch: 16   Global Step: 669640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:59:02,752-Speed 2628.96 samples/sec   Loss 2.6588   LearningRate 0.0037   Epoch: 16   Global Step: 669650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:59:06,658-Speed 2621.57 samples/sec   Loss 2.6521   LearningRate 0.0037   Epoch: 16   Global Step: 669660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:59:10,553-Speed 2629.72 samples/sec   Loss 2.7382   LearningRate 0.0037   Epoch: 16   Global Step: 669670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:59:14,454-Speed 2625.82 samples/sec   Loss 2.5916   LearningRate 0.0037   Epoch: 16   Global Step: 669680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:59:18,352-Speed 2627.28 samples/sec   Loss 2.6845   LearningRate 0.0037   Epoch: 16   Global Step: 669690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 22:59:22,232-Speed 2640.28 samples/sec   Loss 2.6782   LearningRate 0.0037   Epoch: 16   Global Step: 669700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:59:26,126-Speed 2630.07 samples/sec   Loss 2.7126   LearningRate 0.0037   Epoch: 16   Global Step: 669710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 22:59:29,994-Speed 2648.28 samples/sec   Loss 2.7156   LearningRate 0.0037   Epoch: 16   Global Step: 669720   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:59:33,888-Speed 2630.27 samples/sec   Loss 2.6748   LearningRate 0.0037   Epoch: 16   Global Step: 669730   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:59:37,781-Speed 2631.20 samples/sec   Loss 2.6034   LearningRate 0.0037   Epoch: 16   Global Step: 669740   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:59:41,674-Speed 2630.35 samples/sec   Loss 2.6445   LearningRate 0.0037   Epoch: 16   Global Step: 669750   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:59:45,579-Speed 2623.78 samples/sec   Loss 2.6438   LearningRate 0.0037   Epoch: 16   Global Step: 669760   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:59:49,476-Speed 2628.03 samples/sec   Loss 2.6810   LearningRate 0.0037   Epoch: 16   Global Step: 669770   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:59:53,373-Speed 2628.63 samples/sec   Loss 2.5892   LearningRate 0.0037   Epoch: 16   Global Step: 669780   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 22:59:57,280-Speed 2621.02 samples/sec   Loss 2.6357   LearningRate 0.0037   Epoch: 16   Global Step: 669790   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:00:01,238-Speed 2588.50 samples/sec   Loss 2.6560   LearningRate 0.0037   Epoch: 16   Global Step: 669800   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:00:05,134-Speed 2628.56 samples/sec   Loss 2.6310   LearningRate 0.0037   Epoch: 16   Global Step: 669810   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:00:09,030-Speed 2628.94 samples/sec   Loss 2.6523   LearningRate 0.0037   Epoch: 16   Global Step: 669820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:12,922-Speed 2632.04 samples/sec   Loss 2.6838   LearningRate 0.0037   Epoch: 16   Global Step: 669830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:16,814-Speed 2631.86 samples/sec   Loss 2.6128   LearningRate 0.0037   Epoch: 16   Global Step: 669840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:20,711-Speed 2628.35 samples/sec   Loss 2.6108   LearningRate 0.0037   Epoch: 16   Global Step: 669850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:24,608-Speed 2628.13 samples/sec   Loss 2.6572   LearningRate 0.0037   Epoch: 16   Global Step: 669860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:28,514-Speed 2622.31 samples/sec   Loss 2.7098   LearningRate 0.0037   Epoch: 16   Global Step: 669870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:32,408-Speed 2630.43 samples/sec   Loss 2.6530   LearningRate 0.0037   Epoch: 16   Global Step: 669880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:36,302-Speed 2631.17 samples/sec   Loss 2.5833   LearningRate 0.0037   Epoch: 16   Global Step: 669890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:40,204-Speed 2624.64 samples/sec   Loss 2.7343   LearningRate 0.0037   Epoch: 16   Global Step: 669900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:44,344-Speed 2473.76 samples/sec   Loss 2.7106   LearningRate 0.0037   Epoch: 16   Global Step: 669910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:00:48,241-Speed 2627.97 samples/sec   Loss 2.6163   LearningRate 0.0037   Epoch: 16   Global Step: 669920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:00:52,153-Speed 2618.07 samples/sec   Loss 2.5901   LearningRate 0.0037   Epoch: 16   Global Step: 669930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:00:56,053-Speed 2626.35 samples/sec   Loss 2.6359   LearningRate 0.0037   Epoch: 16   Global Step: 669940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:00:59,924-Speed 2645.91 samples/sec   Loss 2.6566   LearningRate 0.0037   Epoch: 16   Global Step: 669950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:01:03,823-Speed 2627.28 samples/sec   Loss 2.6455   LearningRate 0.0037   Epoch: 16   Global Step: 669960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:01:07,725-Speed 2624.99 samples/sec   Loss 2.6316   LearningRate 0.0037   Epoch: 16   Global Step: 669970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:01:11,641-Speed 2616.04 samples/sec   Loss 2.7221   LearningRate 0.0037   Epoch: 16   Global Step: 669980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:01:15,538-Speed 2628.35 samples/sec   Loss 2.7439   LearningRate 0.0037   Epoch: 16   Global Step: 669990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:01:19,556-Speed 2548.94 samples/sec   Loss 2.6673   LearningRate 0.0037   Epoch: 16   Global Step: 670000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:02:02,564-[lfw][670000]XNorm: 22.308158
Training: 2022-04-15 23:02:02,565-[lfw][670000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-15 23:02:02,565-[lfw][670000]Accuracy-Highest: 0.99833
Training: 2022-04-15 23:02:52,676-[cfp_fp][670000]XNorm: 21.941925
Training: 2022-04-15 23:02:52,677-[cfp_fp][670000]Accuracy-Flip: 0.99143+-0.00319
Training: 2022-04-15 23:02:52,678-[cfp_fp][670000]Accuracy-Highest: 0.99271
Training: 2022-04-15 23:03:35,839-[agedb_30][670000]XNorm: 22.795367
Training: 2022-04-15 23:03:35,840-[agedb_30][670000]Accuracy-Flip: 0.98233+-0.00659
Training: 2022-04-15 23:03:35,841-[agedb_30][670000]Accuracy-Highest: 0.98233
Training: 2022-04-15 23:03:39,723-Speed 73.06 samples/sec   Loss 2.6827   LearningRate 0.0037   Epoch: 16   Global Step: 670010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:03:43,625-Speed 2625.02 samples/sec   Loss 2.6550   LearningRate 0.0037   Epoch: 16   Global Step: 670020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:03:47,555-Speed 2606.77 samples/sec   Loss 2.6592   LearningRate 0.0037   Epoch: 16   Global Step: 670030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:03:51,479-Speed 2610.44 samples/sec   Loss 2.5874   LearningRate 0.0037   Epoch: 16   Global Step: 670040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:03:55,407-Speed 2607.07 samples/sec   Loss 2.7200   LearningRate 0.0037   Epoch: 16   Global Step: 670050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:03:59,318-Speed 2619.50 samples/sec   Loss 2.6134   LearningRate 0.0037   Epoch: 16   Global Step: 670060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:04:03,180-Speed 2653.05 samples/sec   Loss 2.6135   LearningRate 0.0037   Epoch: 16   Global Step: 670070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:07,068-Speed 2634.89 samples/sec   Loss 2.5912   LearningRate 0.0037   Epoch: 16   Global Step: 670080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:10,963-Speed 2629.84 samples/sec   Loss 2.5953   LearningRate 0.0037   Epoch: 16   Global Step: 670090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:14,854-Speed 2632.34 samples/sec   Loss 2.6170   LearningRate 0.0037   Epoch: 16   Global Step: 670100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:18,748-Speed 2630.07 samples/sec   Loss 2.6675   LearningRate 0.0037   Epoch: 16   Global Step: 670110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:22,643-Speed 2629.86 samples/sec   Loss 2.6776   LearningRate 0.0037   Epoch: 16   Global Step: 670120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:26,546-Speed 2624.93 samples/sec   Loss 2.5905   LearningRate 0.0037   Epoch: 16   Global Step: 670130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:30,457-Speed 2618.72 samples/sec   Loss 2.6772   LearningRate 0.0037   Epoch: 16   Global Step: 670140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:34,363-Speed 2622.18 samples/sec   Loss 2.6510   LearningRate 0.0037   Epoch: 16   Global Step: 670150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:38,275-Speed 2618.17 samples/sec   Loss 2.5838   LearningRate 0.0037   Epoch: 16   Global Step: 670160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:04:42,178-Speed 2624.40 samples/sec   Loss 2.6979   LearningRate 0.0037   Epoch: 16   Global Step: 670170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:04:46,132-Speed 2590.39 samples/sec   Loss 2.6115   LearningRate 0.0037   Epoch: 16   Global Step: 670180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:04:50,063-Speed 2605.37 samples/sec   Loss 2.6803   LearningRate 0.0037   Epoch: 16   Global Step: 670190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:04:53,970-Speed 2621.65 samples/sec   Loss 2.6782   LearningRate 0.0037   Epoch: 16   Global Step: 670200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:04:57,891-Speed 2613.05 samples/sec   Loss 2.6498   LearningRate 0.0037   Epoch: 16   Global Step: 670210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:05:01,801-Speed 2619.18 samples/sec   Loss 2.6881   LearningRate 0.0037   Epoch: 16   Global Step: 670220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:05:05,719-Speed 2615.01 samples/sec   Loss 2.6104   LearningRate 0.0037   Epoch: 16   Global Step: 670230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:05:09,619-Speed 2625.75 samples/sec   Loss 2.5566   LearningRate 0.0037   Epoch: 16   Global Step: 670240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:05:13,525-Speed 2622.82 samples/sec   Loss 2.6341   LearningRate 0.0037   Epoch: 16   Global Step: 670250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:05:17,408-Speed 2638.10 samples/sec   Loss 2.7135   LearningRate 0.0037   Epoch: 16   Global Step: 670260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:21,310-Speed 2624.38 samples/sec   Loss 2.6294   LearningRate 0.0037   Epoch: 16   Global Step: 670270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:25,217-Speed 2621.58 samples/sec   Loss 2.6173   LearningRate 0.0037   Epoch: 16   Global Step: 670280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:29,123-Speed 2622.78 samples/sec   Loss 2.6527   LearningRate 0.0037   Epoch: 16   Global Step: 670290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:33,029-Speed 2621.98 samples/sec   Loss 2.5986   LearningRate 0.0037   Epoch: 16   Global Step: 670300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:36,934-Speed 2622.95 samples/sec   Loss 2.5914   LearningRate 0.0037   Epoch: 16   Global Step: 670310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:40,836-Speed 2625.53 samples/sec   Loss 2.5965   LearningRate 0.0037   Epoch: 16   Global Step: 670320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:44,752-Speed 2615.69 samples/sec   Loss 2.6340   LearningRate 0.0037   Epoch: 16   Global Step: 670330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:48,651-Speed 2626.23 samples/sec   Loss 2.7302   LearningRate 0.0037   Epoch: 16   Global Step: 670340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:52,549-Speed 2627.53 samples/sec   Loss 2.6537   LearningRate 0.0037   Epoch: 16   Global Step: 670350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:05:56,447-Speed 2627.99 samples/sec   Loss 2.5812   LearningRate 0.0037   Epoch: 16   Global Step: 670360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:06:00,337-Speed 2633.08 samples/sec   Loss 2.6449   LearningRate 0.0037   Epoch: 16   Global Step: 670370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:06:04,238-Speed 2625.53 samples/sec   Loss 2.6352   LearningRate 0.0037   Epoch: 16   Global Step: 670380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:06:08,230-Speed 2566.26 samples/sec   Loss 2.6293   LearningRate 0.0037   Epoch: 16   Global Step: 670390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:06:12,124-Speed 2630.52 samples/sec   Loss 2.6330   LearningRate 0.0037   Epoch: 16   Global Step: 670400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:06:16,029-Speed 2622.80 samples/sec   Loss 2.5872   LearningRate 0.0037   Epoch: 16   Global Step: 670410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:06:19,926-Speed 2628.38 samples/sec   Loss 2.5771   LearningRate 0.0037   Epoch: 16   Global Step: 670420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:06:23,821-Speed 2629.68 samples/sec   Loss 2.7192   LearningRate 0.0037   Epoch: 16   Global Step: 670430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:06:27,719-Speed 2627.97 samples/sec   Loss 2.5832   LearningRate 0.0037   Epoch: 16   Global Step: 670440   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:31,624-Speed 2623.33 samples/sec   Loss 2.6149   LearningRate 0.0037   Epoch: 16   Global Step: 670450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:35,528-Speed 2623.61 samples/sec   Loss 2.6120   LearningRate 0.0037   Epoch: 16   Global Step: 670460   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:39,439-Speed 2619.10 samples/sec   Loss 2.6047   LearningRate 0.0037   Epoch: 16   Global Step: 670470   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:43,340-Speed 2625.07 samples/sec   Loss 2.6345   LearningRate 0.0037   Epoch: 16   Global Step: 670480   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:47,241-Speed 2626.26 samples/sec   Loss 2.6409   LearningRate 0.0037   Epoch: 16   Global Step: 670490   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:51,135-Speed 2630.47 samples/sec   Loss 2.6041   LearningRate 0.0037   Epoch: 16   Global Step: 670500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:55,044-Speed 2619.82 samples/sec   Loss 2.6255   LearningRate 0.0037   Epoch: 16   Global Step: 670510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:06:58,948-Speed 2623.67 samples/sec   Loss 2.6178   LearningRate 0.0037   Epoch: 16   Global Step: 670520   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:07:02,852-Speed 2624.10 samples/sec   Loss 2.6079   LearningRate 0.0037   Epoch: 16   Global Step: 670530   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:07:06,761-Speed 2620.19 samples/sec   Loss 2.6433   LearningRate 0.0037   Epoch: 16   Global Step: 670540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:10,662-Speed 2625.78 samples/sec   Loss 2.6864   LearningRate 0.0037   Epoch: 16   Global Step: 670550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:14,560-Speed 2627.56 samples/sec   Loss 2.6115   LearningRate 0.0037   Epoch: 16   Global Step: 670560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:18,455-Speed 2629.74 samples/sec   Loss 2.7174   LearningRate 0.0037   Epoch: 16   Global Step: 670570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:22,368-Speed 2617.44 samples/sec   Loss 2.5981   LearningRate 0.0037   Epoch: 16   Global Step: 670580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:26,275-Speed 2622.12 samples/sec   Loss 2.7013   LearningRate 0.0037   Epoch: 16   Global Step: 670590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:30,173-Speed 2627.09 samples/sec   Loss 2.6290   LearningRate 0.0037   Epoch: 16   Global Step: 670600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:34,083-Speed 2619.83 samples/sec   Loss 2.6695   LearningRate 0.0037   Epoch: 16   Global Step: 670610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:38,015-Speed 2605.35 samples/sec   Loss 2.6308   LearningRate 0.0037   Epoch: 16   Global Step: 670620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:41,913-Speed 2627.99 samples/sec   Loss 2.6353   LearningRate 0.0037   Epoch: 16   Global Step: 670630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:07:45,816-Speed 2624.32 samples/sec   Loss 2.6902   LearningRate 0.0037   Epoch: 16   Global Step: 670640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:07:49,726-Speed 2619.59 samples/sec   Loss 2.6452   LearningRate 0.0037   Epoch: 16   Global Step: 670650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:07:53,624-Speed 2627.87 samples/sec   Loss 2.6176   LearningRate 0.0037   Epoch: 16   Global Step: 670660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:07:57,500-Speed 2642.28 samples/sec   Loss 2.5922   LearningRate 0.0037   Epoch: 16   Global Step: 670670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:01,418-Speed 2614.44 samples/sec   Loss 2.6142   LearningRate 0.0037   Epoch: 16   Global Step: 670680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:05,334-Speed 2615.73 samples/sec   Loss 2.6113   LearningRate 0.0037   Epoch: 16   Global Step: 670690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:09,231-Speed 2628.60 samples/sec   Loss 2.6841   LearningRate 0.0037   Epoch: 16   Global Step: 670700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:13,168-Speed 2601.77 samples/sec   Loss 2.5932   LearningRate 0.0037   Epoch: 16   Global Step: 670710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:17,082-Speed 2617.52 samples/sec   Loss 2.6909   LearningRate 0.0037   Epoch: 16   Global Step: 670720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:20,979-Speed 2627.70 samples/sec   Loss 2.6678   LearningRate 0.0037   Epoch: 16   Global Step: 670730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:24,917-Speed 2601.00 samples/sec   Loss 2.6567   LearningRate 0.0037   Epoch: 16   Global Step: 670740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:28,811-Speed 2630.39 samples/sec   Loss 2.6378   LearningRate 0.0037   Epoch: 16   Global Step: 670750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:32,710-Speed 2627.81 samples/sec   Loss 2.6367   LearningRate 0.0037   Epoch: 16   Global Step: 670760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:08:36,606-Speed 2628.39 samples/sec   Loss 2.6419   LearningRate 0.0037   Epoch: 16   Global Step: 670770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:08:40,516-Speed 2619.08 samples/sec   Loss 2.6200   LearningRate 0.0037   Epoch: 16   Global Step: 670780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:08:44,418-Speed 2626.08 samples/sec   Loss 2.6537   LearningRate 0.0037   Epoch: 16   Global Step: 670790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:08:48,337-Speed 2613.49 samples/sec   Loss 2.6450   LearningRate 0.0037   Epoch: 16   Global Step: 670800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:08:52,249-Speed 2618.33 samples/sec   Loss 2.6625   LearningRate 0.0037   Epoch: 16   Global Step: 670810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:08:56,157-Speed 2621.01 samples/sec   Loss 2.5655   LearningRate 0.0037   Epoch: 16   Global Step: 670820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:00,060-Speed 2624.01 samples/sec   Loss 2.5926   LearningRate 0.0037   Epoch: 16   Global Step: 670830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:03,958-Speed 2627.27 samples/sec   Loss 2.6038   LearningRate 0.0037   Epoch: 16   Global Step: 670840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:07,853-Speed 2630.10 samples/sec   Loss 2.6524   LearningRate 0.0037   Epoch: 16   Global Step: 670850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:11,767-Speed 2617.22 samples/sec   Loss 2.6269   LearningRate 0.0037   Epoch: 16   Global Step: 670860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:15,662-Speed 2629.80 samples/sec   Loss 2.6204   LearningRate 0.0037   Epoch: 16   Global Step: 670870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:19,563-Speed 2625.53 samples/sec   Loss 2.6188   LearningRate 0.0037   Epoch: 16   Global Step: 670880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:23,488-Speed 2609.20 samples/sec   Loss 2.6143   LearningRate 0.0037   Epoch: 16   Global Step: 670890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:27,387-Speed 2627.42 samples/sec   Loss 2.6808   LearningRate 0.0037   Epoch: 16   Global Step: 670900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:31,282-Speed 2629.62 samples/sec   Loss 2.6087   LearningRate 0.0037   Epoch: 16   Global Step: 670910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:35,183-Speed 2625.30 samples/sec   Loss 2.6013   LearningRate 0.0037   Epoch: 16   Global Step: 670920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:09:39,092-Speed 2620.48 samples/sec   Loss 2.6325   LearningRate 0.0037   Epoch: 16   Global Step: 670930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:09:42,994-Speed 2625.49 samples/sec   Loss 2.6300   LearningRate 0.0037   Epoch: 16   Global Step: 670940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:46,897-Speed 2624.25 samples/sec   Loss 2.6576   LearningRate 0.0037   Epoch: 16   Global Step: 670950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:50,797-Speed 2626.69 samples/sec   Loss 2.6531   LearningRate 0.0037   Epoch: 16   Global Step: 670960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:54,708-Speed 2618.87 samples/sec   Loss 2.5614   LearningRate 0.0037   Epoch: 16   Global Step: 670970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:09:58,605-Speed 2628.98 samples/sec   Loss 2.6902   LearningRate 0.0037   Epoch: 16   Global Step: 670980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:02,499-Speed 2630.06 samples/sec   Loss 2.6097   LearningRate 0.0037   Epoch: 16   Global Step: 670990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:06,396-Speed 2628.52 samples/sec   Loss 2.6442   LearningRate 0.0037   Epoch: 16   Global Step: 671000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:10,300-Speed 2623.40 samples/sec   Loss 2.6079   LearningRate 0.0037   Epoch: 16   Global Step: 671010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:14,194-Speed 2631.69 samples/sec   Loss 2.6559   LearningRate 0.0037   Epoch: 16   Global Step: 671020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:18,090-Speed 2629.03 samples/sec   Loss 2.6014   LearningRate 0.0037   Epoch: 16   Global Step: 671030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:21,986-Speed 2628.88 samples/sec   Loss 2.6653   LearningRate 0.0037   Epoch: 16   Global Step: 671040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:10:25,858-Speed 2645.20 samples/sec   Loss 2.5914   LearningRate 0.0037   Epoch: 16   Global Step: 671050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:29,757-Speed 2626.95 samples/sec   Loss 2.5633   LearningRate 0.0037   Epoch: 16   Global Step: 671060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:33,656-Speed 2626.95 samples/sec   Loss 2.6563   LearningRate 0.0037   Epoch: 16   Global Step: 671070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:37,557-Speed 2625.49 samples/sec   Loss 2.6169   LearningRate 0.0037   Epoch: 16   Global Step: 671080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:41,462-Speed 2622.76 samples/sec   Loss 2.6723   LearningRate 0.0036   Epoch: 16   Global Step: 671090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:45,361-Speed 2627.11 samples/sec   Loss 2.6997   LearningRate 0.0036   Epoch: 16   Global Step: 671100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:49,259-Speed 2628.53 samples/sec   Loss 2.6343   LearningRate 0.0036   Epoch: 16   Global Step: 671110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:53,156-Speed 2627.61 samples/sec   Loss 2.6855   LearningRate 0.0036   Epoch: 16   Global Step: 671120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:10:57,052-Speed 2629.63 samples/sec   Loss 2.6031   LearningRate 0.0036   Epoch: 16   Global Step: 671130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:00,971-Speed 2613.31 samples/sec   Loss 2.6729   LearningRate 0.0036   Epoch: 16   Global Step: 671140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:04,873-Speed 2624.43 samples/sec   Loss 2.6342   LearningRate 0.0036   Epoch: 16   Global Step: 671150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:11:08,752-Speed 2640.52 samples/sec   Loss 2.6888   LearningRate 0.0036   Epoch: 16   Global Step: 671160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:12,659-Speed 2622.17 samples/sec   Loss 2.6395   LearningRate 0.0036   Epoch: 16   Global Step: 671170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:16,602-Speed 2597.44 samples/sec   Loss 2.6047   LearningRate 0.0036   Epoch: 16   Global Step: 671180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:20,506-Speed 2623.62 samples/sec   Loss 2.6626   LearningRate 0.0036   Epoch: 16   Global Step: 671190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:24,402-Speed 2629.03 samples/sec   Loss 2.6299   LearningRate 0.0036   Epoch: 16   Global Step: 671200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:28,303-Speed 2626.14 samples/sec   Loss 2.6443   LearningRate 0.0036   Epoch: 16   Global Step: 671210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:32,199-Speed 2628.70 samples/sec   Loss 2.5860   LearningRate 0.0036   Epoch: 16   Global Step: 671220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:36,111-Speed 2617.73 samples/sec   Loss 2.6548   LearningRate 0.0036   Epoch: 16   Global Step: 671230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:40,009-Speed 2627.49 samples/sec   Loss 2.6633   LearningRate 0.0036   Epoch: 16   Global Step: 671240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:43,908-Speed 2627.32 samples/sec   Loss 2.5851   LearningRate 0.0036   Epoch: 16   Global Step: 671250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:11:47,812-Speed 2623.80 samples/sec   Loss 2.5971   LearningRate 0.0036   Epoch: 16   Global Step: 671260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:11:51,728-Speed 2615.71 samples/sec   Loss 2.6710   LearningRate 0.0036   Epoch: 16   Global Step: 671270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:11:55,644-Speed 2615.64 samples/sec   Loss 2.5596   LearningRate 0.0036   Epoch: 16   Global Step: 671280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:11:59,549-Speed 2622.90 samples/sec   Loss 2.5471   LearningRate 0.0036   Epoch: 16   Global Step: 671290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:12:03,457-Speed 2620.93 samples/sec   Loss 2.5620   LearningRate 0.0036   Epoch: 16   Global Step: 671300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:12:07,356-Speed 2626.62 samples/sec   Loss 2.6618   LearningRate 0.0036   Epoch: 16   Global Step: 671310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:12:11,242-Speed 2635.65 samples/sec   Loss 2.6334   LearningRate 0.0036   Epoch: 16   Global Step: 671320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:15,141-Speed 2626.64 samples/sec   Loss 2.6263   LearningRate 0.0036   Epoch: 16   Global Step: 671330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:19,047-Speed 2622.23 samples/sec   Loss 2.5881   LearningRate 0.0036   Epoch: 16   Global Step: 671340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:22,953-Speed 2622.95 samples/sec   Loss 2.5855   LearningRate 0.0036   Epoch: 16   Global Step: 671350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:26,857-Speed 2623.32 samples/sec   Loss 2.5820   LearningRate 0.0036   Epoch: 16   Global Step: 671360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:30,821-Speed 2583.51 samples/sec   Loss 2.6642   LearningRate 0.0036   Epoch: 16   Global Step: 671370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:34,721-Speed 2626.94 samples/sec   Loss 2.6035   LearningRate 0.0036   Epoch: 16   Global Step: 671380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:38,632-Speed 2619.51 samples/sec   Loss 2.6373   LearningRate 0.0036   Epoch: 16   Global Step: 671390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:42,530-Speed 2627.06 samples/sec   Loss 2.5849   LearningRate 0.0036   Epoch: 16   Global Step: 671400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:46,424-Speed 2630.54 samples/sec   Loss 2.6941   LearningRate 0.0036   Epoch: 16   Global Step: 671410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:50,324-Speed 2626.65 samples/sec   Loss 2.5766   LearningRate 0.0036   Epoch: 16   Global Step: 671420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:12:54,196-Speed 2644.91 samples/sec   Loss 2.7422   LearningRate 0.0036   Epoch: 16   Global Step: 671430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:12:58,090-Speed 2629.68 samples/sec   Loss 2.5964   LearningRate 0.0036   Epoch: 16   Global Step: 671440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:01,994-Speed 2624.11 samples/sec   Loss 2.6254   LearningRate 0.0036   Epoch: 16   Global Step: 671450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:05,905-Speed 2619.15 samples/sec   Loss 2.5678   LearningRate 0.0036   Epoch: 16   Global Step: 671460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:09,810-Speed 2623.54 samples/sec   Loss 2.6504   LearningRate 0.0036   Epoch: 16   Global Step: 671470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:13,801-Speed 2565.86 samples/sec   Loss 2.6229   LearningRate 0.0036   Epoch: 16   Global Step: 671480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:17,703-Speed 2625.49 samples/sec   Loss 2.6390   LearningRate 0.0036   Epoch: 16   Global Step: 671490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:21,601-Speed 2627.72 samples/sec   Loss 2.6018   LearningRate 0.0036   Epoch: 16   Global Step: 671500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:25,507-Speed 2621.83 samples/sec   Loss 2.6275   LearningRate 0.0036   Epoch: 16   Global Step: 671510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:29,404-Speed 2627.98 samples/sec   Loss 2.5910   LearningRate 0.0036   Epoch: 16   Global Step: 671520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:33,334-Speed 2606.69 samples/sec   Loss 2.6466   LearningRate 0.0036   Epoch: 16   Global Step: 671530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:13:37,258-Speed 2610.05 samples/sec   Loss 2.6076   LearningRate 0.0036   Epoch: 16   Global Step: 671540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:13:41,158-Speed 2627.53 samples/sec   Loss 2.6697   LearningRate 0.0036   Epoch: 16   Global Step: 671550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:13:45,054-Speed 2628.52 samples/sec   Loss 2.6119   LearningRate 0.0036   Epoch: 16   Global Step: 671560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:13:48,958-Speed 2624.22 samples/sec   Loss 2.6293   LearningRate 0.0036   Epoch: 16   Global Step: 671570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:52,857-Speed 2626.49 samples/sec   Loss 2.5486   LearningRate 0.0036   Epoch: 16   Global Step: 671580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:13:56,752-Speed 2629.52 samples/sec   Loss 2.5736   LearningRate 0.0036   Epoch: 16   Global Step: 671590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:00,644-Speed 2631.50 samples/sec   Loss 2.6267   LearningRate 0.0036   Epoch: 16   Global Step: 671600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:04,540-Speed 2629.69 samples/sec   Loss 2.6144   LearningRate 0.0036   Epoch: 16   Global Step: 671610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:08,433-Speed 2631.60 samples/sec   Loss 2.5567   LearningRate 0.0036   Epoch: 16   Global Step: 671620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:12,333-Speed 2625.84 samples/sec   Loss 2.5330   LearningRate 0.0036   Epoch: 16   Global Step: 671630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:16,237-Speed 2624.40 samples/sec   Loss 2.6844   LearningRate 0.0036   Epoch: 16   Global Step: 671640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:20,137-Speed 2626.04 samples/sec   Loss 2.6035   LearningRate 0.0036   Epoch: 16   Global Step: 671650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:24,050-Speed 2617.30 samples/sec   Loss 2.6319   LearningRate 0.0036   Epoch: 16   Global Step: 671660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:27,949-Speed 2626.88 samples/sec   Loss 2.6108   LearningRate 0.0036   Epoch: 16   Global Step: 671670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:14:31,836-Speed 2635.89 samples/sec   Loss 2.6357   LearningRate 0.0036   Epoch: 16   Global Step: 671680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:35,754-Speed 2614.21 samples/sec   Loss 2.5986   LearningRate 0.0036   Epoch: 16   Global Step: 671690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:39,660-Speed 2622.49 samples/sec   Loss 2.5714   LearningRate 0.0036   Epoch: 16   Global Step: 671700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:43,589-Speed 2606.92 samples/sec   Loss 2.5836   LearningRate 0.0036   Epoch: 16   Global Step: 671710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:47,502-Speed 2617.75 samples/sec   Loss 2.6005   LearningRate 0.0036   Epoch: 16   Global Step: 671720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:51,417-Speed 2618.05 samples/sec   Loss 2.7096   LearningRate 0.0036   Epoch: 16   Global Step: 671730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:55,361-Speed 2596.49 samples/sec   Loss 2.6971   LearningRate 0.0036   Epoch: 16   Global Step: 671740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:14:59,255-Speed 2630.88 samples/sec   Loss 2.6407   LearningRate 0.0036   Epoch: 16   Global Step: 671750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:03,148-Speed 2630.76 samples/sec   Loss 2.5491   LearningRate 0.0036   Epoch: 16   Global Step: 671760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:07,059-Speed 2618.90 samples/sec   Loss 2.6386   LearningRate 0.0036   Epoch: 16   Global Step: 671770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:10,952-Speed 2630.84 samples/sec   Loss 2.6118   LearningRate 0.0036   Epoch: 16   Global Step: 671780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:15:14,850-Speed 2628.10 samples/sec   Loss 2.6289   LearningRate 0.0036   Epoch: 16   Global Step: 671790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:15:18,756-Speed 2622.38 samples/sec   Loss 2.6471   LearningRate 0.0036   Epoch: 16   Global Step: 671800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:15:22,661-Speed 2622.82 samples/sec   Loss 2.6023   LearningRate 0.0036   Epoch: 16   Global Step: 671810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:15:26,573-Speed 2618.47 samples/sec   Loss 2.6482   LearningRate 0.0036   Epoch: 16   Global Step: 671820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:15:30,492-Speed 2613.30 samples/sec   Loss 2.6053   LearningRate 0.0036   Epoch: 16   Global Step: 671830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:15:34,395-Speed 2624.82 samples/sec   Loss 2.5892   LearningRate 0.0036   Epoch: 16   Global Step: 671840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:15:38,284-Speed 2633.38 samples/sec   Loss 2.6369   LearningRate 0.0036   Epoch: 16   Global Step: 671850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:42,185-Speed 2625.88 samples/sec   Loss 2.5841   LearningRate 0.0036   Epoch: 16   Global Step: 671860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:46,080-Speed 2629.31 samples/sec   Loss 2.5454   LearningRate 0.0036   Epoch: 16   Global Step: 671870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:49,987-Speed 2622.07 samples/sec   Loss 2.6673   LearningRate 0.0036   Epoch: 16   Global Step: 671880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:53,881-Speed 2629.94 samples/sec   Loss 2.5665   LearningRate 0.0036   Epoch: 16   Global Step: 671890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:15:57,789-Speed 2621.68 samples/sec   Loss 2.6145   LearningRate 0.0036   Epoch: 16   Global Step: 671900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:16:01,687-Speed 2627.50 samples/sec   Loss 2.5543   LearningRate 0.0036   Epoch: 16   Global Step: 671910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:16:05,620-Speed 2604.04 samples/sec   Loss 2.5452   LearningRate 0.0036   Epoch: 16   Global Step: 671920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:16:09,522-Speed 2625.07 samples/sec   Loss 2.5937   LearningRate 0.0036   Epoch: 16   Global Step: 671930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:16:13,390-Speed 2648.72 samples/sec   Loss 2.5835   LearningRate 0.0036   Epoch: 16   Global Step: 671940   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:17,290-Speed 2626.36 samples/sec   Loss 2.6558   LearningRate 0.0036   Epoch: 16   Global Step: 671950   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:21,184-Speed 2630.32 samples/sec   Loss 2.6548   LearningRate 0.0036   Epoch: 16   Global Step: 671960   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:25,071-Speed 2635.13 samples/sec   Loss 2.5916   LearningRate 0.0036   Epoch: 16   Global Step: 671970   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:28,962-Speed 2632.41 samples/sec   Loss 2.5750   LearningRate 0.0036   Epoch: 16   Global Step: 671980   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:32,867-Speed 2623.33 samples/sec   Loss 2.5917   LearningRate 0.0036   Epoch: 16   Global Step: 671990   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:36,772-Speed 2622.44 samples/sec   Loss 2.6306   LearningRate 0.0036   Epoch: 16   Global Step: 672000   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:40,666-Speed 2630.21 samples/sec   Loss 2.6445   LearningRate 0.0036   Epoch: 16   Global Step: 672010   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:44,568-Speed 2625.59 samples/sec   Loss 2.5431   LearningRate 0.0036   Epoch: 16   Global Step: 672020   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:48,475-Speed 2621.43 samples/sec   Loss 2.7168   LearningRate 0.0036   Epoch: 16   Global Step: 672030   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:16:52,377-Speed 2624.82 samples/sec   Loss 2.5895   LearningRate 0.0036   Epoch: 16   Global Step: 672040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:16:56,273-Speed 2629.16 samples/sec   Loss 2.6168   LearningRate 0.0036   Epoch: 16   Global Step: 672050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:00,172-Speed 2627.31 samples/sec   Loss 2.5395   LearningRate 0.0036   Epoch: 16   Global Step: 672060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:04,068-Speed 2628.74 samples/sec   Loss 2.6453   LearningRate 0.0036   Epoch: 16   Global Step: 672070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:07,964-Speed 2628.71 samples/sec   Loss 2.5971   LearningRate 0.0036   Epoch: 16   Global Step: 672080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:11,860-Speed 2629.46 samples/sec   Loss 2.5714   LearningRate 0.0036   Epoch: 16   Global Step: 672090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:15,750-Speed 2632.59 samples/sec   Loss 2.5979   LearningRate 0.0036   Epoch: 16   Global Step: 672100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:19,653-Speed 2625.04 samples/sec   Loss 2.5349   LearningRate 0.0036   Epoch: 16   Global Step: 672110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:23,599-Speed 2595.80 samples/sec   Loss 2.5829   LearningRate 0.0036   Epoch: 16   Global Step: 672120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:27,506-Speed 2621.62 samples/sec   Loss 2.6415   LearningRate 0.0036   Epoch: 16   Global Step: 672130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:31,401-Speed 2629.41 samples/sec   Loss 2.5627   LearningRate 0.0036   Epoch: 16   Global Step: 672140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:17:35,294-Speed 2631.33 samples/sec   Loss 2.5689   LearningRate 0.0036   Epoch: 16   Global Step: 672150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:17:39,189-Speed 2629.09 samples/sec   Loss 2.6218   LearningRate 0.0036   Epoch: 16   Global Step: 672160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:17:43,057-Speed 2648.68 samples/sec   Loss 2.5442   LearningRate 0.0036   Epoch: 16   Global Step: 672170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:46,977-Speed 2612.65 samples/sec   Loss 2.5886   LearningRate 0.0036   Epoch: 16   Global Step: 672180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:50,875-Speed 2627.82 samples/sec   Loss 2.6906   LearningRate 0.0036   Epoch: 16   Global Step: 672190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:54,782-Speed 2621.36 samples/sec   Loss 2.6341   LearningRate 0.0036   Epoch: 16   Global Step: 672200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:17:58,678-Speed 2629.80 samples/sec   Loss 2.6366   LearningRate 0.0036   Epoch: 16   Global Step: 672210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:02,569-Speed 2632.12 samples/sec   Loss 2.6237   LearningRate 0.0036   Epoch: 16   Global Step: 672220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:06,458-Speed 2633.28 samples/sec   Loss 2.5666   LearningRate 0.0036   Epoch: 16   Global Step: 672230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:10,351-Speed 2631.54 samples/sec   Loss 2.5729   LearningRate 0.0036   Epoch: 16   Global Step: 672240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:14,245-Speed 2630.77 samples/sec   Loss 2.6707   LearningRate 0.0036   Epoch: 16   Global Step: 672250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:18,162-Speed 2614.79 samples/sec   Loss 2.5702   LearningRate 0.0036   Epoch: 16   Global Step: 672260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:22,061-Speed 2626.55 samples/sec   Loss 2.5994   LearningRate 0.0036   Epoch: 16   Global Step: 672270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:18:25,958-Speed 2628.13 samples/sec   Loss 2.5540   LearningRate 0.0036   Epoch: 16   Global Step: 672280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:18:29,850-Speed 2632.64 samples/sec   Loss 2.6610   LearningRate 0.0036   Epoch: 16   Global Step: 672290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:18:33,744-Speed 2630.44 samples/sec   Loss 2.6134   LearningRate 0.0036   Epoch: 16   Global Step: 672300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:18:37,636-Speed 2631.53 samples/sec   Loss 2.5784   LearningRate 0.0036   Epoch: 16   Global Step: 672310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:18:41,530-Speed 2630.40 samples/sec   Loss 2.6635   LearningRate 0.0036   Epoch: 16   Global Step: 672320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:18:45,445-Speed 2615.85 samples/sec   Loss 2.6043   LearningRate 0.0036   Epoch: 16   Global Step: 672330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:18:49,337-Speed 2632.60 samples/sec   Loss 2.6490   LearningRate 0.0036   Epoch: 16   Global Step: 672340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:53,246-Speed 2620.15 samples/sec   Loss 2.5532   LearningRate 0.0036   Epoch: 16   Global Step: 672350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:18:57,145-Speed 2626.42 samples/sec   Loss 2.6631   LearningRate 0.0036   Epoch: 16   Global Step: 672360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:01,047-Speed 2625.00 samples/sec   Loss 2.5704   LearningRate 0.0036   Epoch: 16   Global Step: 672370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:04,950-Speed 2624.86 samples/sec   Loss 2.6496   LearningRate 0.0036   Epoch: 16   Global Step: 672380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:08,844-Speed 2631.06 samples/sec   Loss 2.6973   LearningRate 0.0036   Epoch: 16   Global Step: 672390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:12,740-Speed 2628.54 samples/sec   Loss 2.5189   LearningRate 0.0036   Epoch: 16   Global Step: 672400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:16,654-Speed 2617.43 samples/sec   Loss 2.5938   LearningRate 0.0036   Epoch: 16   Global Step: 672410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:20,561-Speed 2620.90 samples/sec   Loss 2.6548   LearningRate 0.0036   Epoch: 16   Global Step: 672420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:24,462-Speed 2625.68 samples/sec   Loss 2.5828   LearningRate 0.0036   Epoch: 16   Global Step: 672430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:28,372-Speed 2619.81 samples/sec   Loss 2.6133   LearningRate 0.0036   Epoch: 16   Global Step: 672440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:19:32,276-Speed 2624.30 samples/sec   Loss 2.5867   LearningRate 0.0036   Epoch: 16   Global Step: 672450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:19:36,160-Speed 2636.96 samples/sec   Loss 2.6629   LearningRate 0.0036   Epoch: 16   Global Step: 672460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:40,076-Speed 2615.44 samples/sec   Loss 2.5428   LearningRate 0.0036   Epoch: 16   Global Step: 672470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:43,990-Speed 2616.66 samples/sec   Loss 2.6200   LearningRate 0.0036   Epoch: 16   Global Step: 672480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:47,891-Speed 2625.91 samples/sec   Loss 2.6327   LearningRate 0.0036   Epoch: 16   Global Step: 672490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:51,887-Speed 2563.34 samples/sec   Loss 2.5255   LearningRate 0.0036   Epoch: 16   Global Step: 672500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:55,793-Speed 2622.33 samples/sec   Loss 2.5776   LearningRate 0.0036   Epoch: 16   Global Step: 672510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:19:59,835-Speed 2533.82 samples/sec   Loss 2.5295   LearningRate 0.0036   Epoch: 16   Global Step: 672520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:03,818-Speed 2571.25 samples/sec   Loss 2.6529   LearningRate 0.0036   Epoch: 16   Global Step: 672530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:07,710-Speed 2631.77 samples/sec   Loss 2.6113   LearningRate 0.0036   Epoch: 16   Global Step: 672540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:11,605-Speed 2629.43 samples/sec   Loss 2.5920   LearningRate 0.0036   Epoch: 16   Global Step: 672550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:15,509-Speed 2623.37 samples/sec   Loss 2.5507   LearningRate 0.0036   Epoch: 16   Global Step: 672560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:20:19,381-Speed 2645.96 samples/sec   Loss 2.6089   LearningRate 0.0036   Epoch: 16   Global Step: 672570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:23,279-Speed 2627.54 samples/sec   Loss 2.6290   LearningRate 0.0036   Epoch: 16   Global Step: 672580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:27,174-Speed 2629.75 samples/sec   Loss 2.5384   LearningRate 0.0036   Epoch: 16   Global Step: 672590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:31,069-Speed 2629.55 samples/sec   Loss 2.5727   LearningRate 0.0036   Epoch: 16   Global Step: 672600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:34,982-Speed 2618.33 samples/sec   Loss 2.7149   LearningRate 0.0036   Epoch: 16   Global Step: 672610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:38,915-Speed 2604.22 samples/sec   Loss 2.6939   LearningRate 0.0036   Epoch: 16   Global Step: 672620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:42,961-Speed 2531.77 samples/sec   Loss 2.6341   LearningRate 0.0036   Epoch: 16   Global Step: 672630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:46,860-Speed 2627.82 samples/sec   Loss 2.6292   LearningRate 0.0036   Epoch: 16   Global Step: 672640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:50,788-Speed 2607.55 samples/sec   Loss 2.6152   LearningRate 0.0036   Epoch: 16   Global Step: 672650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:54,683-Speed 2629.68 samples/sec   Loss 2.5859   LearningRate 0.0036   Epoch: 16   Global Step: 672660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:20:58,584-Speed 2625.54 samples/sec   Loss 2.6904   LearningRate 0.0036   Epoch: 16   Global Step: 672670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:02,480-Speed 2628.77 samples/sec   Loss 2.5530   LearningRate 0.0036   Epoch: 16   Global Step: 672680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:06,379-Speed 2626.55 samples/sec   Loss 2.5600   LearningRate 0.0036   Epoch: 16   Global Step: 672690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:10,278-Speed 2627.65 samples/sec   Loss 2.6595   LearningRate 0.0036   Epoch: 16   Global Step: 672700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:14,175-Speed 2628.76 samples/sec   Loss 2.6900   LearningRate 0.0036   Epoch: 16   Global Step: 672710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:18,069-Speed 2630.01 samples/sec   Loss 2.6553   LearningRate 0.0036   Epoch: 16   Global Step: 672720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:21,965-Speed 2628.89 samples/sec   Loss 2.5960   LearningRate 0.0036   Epoch: 16   Global Step: 672730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:25,862-Speed 2628.17 samples/sec   Loss 2.5794   LearningRate 0.0036   Epoch: 16   Global Step: 672740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:29,803-Speed 2599.30 samples/sec   Loss 2.5733   LearningRate 0.0036   Epoch: 16   Global Step: 672750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:33,731-Speed 2607.60 samples/sec   Loss 2.5749   LearningRate 0.0036   Epoch: 16   Global Step: 672760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:37,603-Speed 2644.61 samples/sec   Loss 2.6317   LearningRate 0.0036   Epoch: 16   Global Step: 672770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:41,505-Speed 2625.27 samples/sec   Loss 2.5732   LearningRate 0.0036   Epoch: 16   Global Step: 672780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:21:45,380-Speed 2643.63 samples/sec   Loss 2.6437   LearningRate 0.0036   Epoch: 16   Global Step: 672790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:21:49,274-Speed 2630.23 samples/sec   Loss 2.6220   LearningRate 0.0036   Epoch: 16   Global Step: 672800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:21:53,191-Speed 2614.64 samples/sec   Loss 2.5625   LearningRate 0.0036   Epoch: 16   Global Step: 672810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:21:57,084-Speed 2631.37 samples/sec   Loss 2.5781   LearningRate 0.0036   Epoch: 16   Global Step: 672820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:00,979-Speed 2629.34 samples/sec   Loss 2.6143   LearningRate 0.0036   Epoch: 16   Global Step: 672830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:05,176-Speed 2440.19 samples/sec   Loss 2.5509   LearningRate 0.0036   Epoch: 16   Global Step: 672840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:09,086-Speed 2619.17 samples/sec   Loss 2.6550   LearningRate 0.0036   Epoch: 16   Global Step: 672850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:12,988-Speed 2625.13 samples/sec   Loss 2.6173   LearningRate 0.0036   Epoch: 16   Global Step: 672860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:16,886-Speed 2627.49 samples/sec   Loss 2.5950   LearningRate 0.0036   Epoch: 16   Global Step: 672870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:20,782-Speed 2629.59 samples/sec   Loss 2.6182   LearningRate 0.0036   Epoch: 16   Global Step: 672880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:24,698-Speed 2615.21 samples/sec   Loss 2.5616   LearningRate 0.0036   Epoch: 16   Global Step: 672890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:28,589-Speed 2633.30 samples/sec   Loss 2.6413   LearningRate 0.0036   Epoch: 16   Global Step: 672900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:32,482-Speed 2630.61 samples/sec   Loss 2.6148   LearningRate 0.0036   Epoch: 16   Global Step: 672910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:36,379-Speed 2628.10 samples/sec   Loss 2.5873   LearningRate 0.0036   Epoch: 16   Global Step: 672920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:40,282-Speed 2624.10 samples/sec   Loss 2.6122   LearningRate 0.0036   Epoch: 16   Global Step: 672930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:44,209-Speed 2609.19 samples/sec   Loss 2.5580   LearningRate 0.0036   Epoch: 16   Global Step: 672940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:48,107-Speed 2627.23 samples/sec   Loss 2.6499   LearningRate 0.0036   Epoch: 16   Global Step: 672950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:51,999-Speed 2631.92 samples/sec   Loss 2.6393   LearningRate 0.0036   Epoch: 16   Global Step: 672960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:55,893-Speed 2630.78 samples/sec   Loss 2.6025   LearningRate 0.0036   Epoch: 16   Global Step: 672970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:22:59,786-Speed 2631.04 samples/sec   Loss 2.6387   LearningRate 0.0036   Epoch: 16   Global Step: 672980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:23:03,711-Speed 2609.26 samples/sec   Loss 2.5613   LearningRate 0.0036   Epoch: 16   Global Step: 672990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:07,611-Speed 2626.63 samples/sec   Loss 2.5484   LearningRate 0.0036   Epoch: 16   Global Step: 673000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:11,511-Speed 2626.43 samples/sec   Loss 2.6329   LearningRate 0.0036   Epoch: 16   Global Step: 673010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:15,418-Speed 2621.44 samples/sec   Loss 2.5405   LearningRate 0.0036   Epoch: 16   Global Step: 673020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:19,317-Speed 2627.82 samples/sec   Loss 2.6042   LearningRate 0.0036   Epoch: 16   Global Step: 673030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:23,213-Speed 2628.59 samples/sec   Loss 2.5094   LearningRate 0.0036   Epoch: 16   Global Step: 673040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:27,130-Speed 2615.43 samples/sec   Loss 2.6271   LearningRate 0.0036   Epoch: 16   Global Step: 673050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:31,023-Speed 2630.81 samples/sec   Loss 2.6443   LearningRate 0.0036   Epoch: 16   Global Step: 673060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:34,958-Speed 2602.80 samples/sec   Loss 2.5534   LearningRate 0.0036   Epoch: 16   Global Step: 673070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:38,982-Speed 2545.74 samples/sec   Loss 2.5766   LearningRate 0.0036   Epoch: 16   Global Step: 673080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:42,892-Speed 2619.74 samples/sec   Loss 2.5823   LearningRate 0.0036   Epoch: 16   Global Step: 673090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-15 23:23:46,766-Speed 2643.72 samples/sec   Loss 2.6082   LearningRate 0.0036   Epoch: 16   Global Step: 673100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:50,662-Speed 2630.34 samples/sec   Loss 2.6162   LearningRate 0.0036   Epoch: 16   Global Step: 673110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:54,596-Speed 2603.03 samples/sec   Loss 2.6519   LearningRate 0.0036   Epoch: 16   Global Step: 673120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:23:58,501-Speed 2624.02 samples/sec   Loss 2.5432   LearningRate 0.0036   Epoch: 16   Global Step: 673130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:24:02,418-Speed 2614.32 samples/sec   Loss 2.5888   LearningRate 0.0036   Epoch: 16   Global Step: 673140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:24:06,299-Speed 2638.95 samples/sec   Loss 2.6507   LearningRate 0.0036   Epoch: 16   Global Step: 673150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:10,252-Speed 2590.97 samples/sec   Loss 2.5388   LearningRate 0.0036   Epoch: 16   Global Step: 673160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:14,156-Speed 2624.23 samples/sec   Loss 2.5096   LearningRate 0.0036   Epoch: 16   Global Step: 673170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:18,054-Speed 2627.26 samples/sec   Loss 2.6108   LearningRate 0.0036   Epoch: 16   Global Step: 673180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:21,986-Speed 2604.68 samples/sec   Loss 2.5213   LearningRate 0.0036   Epoch: 16   Global Step: 673190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:25,879-Speed 2631.53 samples/sec   Loss 2.6417   LearningRate 0.0036   Epoch: 16   Global Step: 673200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:29,829-Speed 2594.46 samples/sec   Loss 2.6424   LearningRate 0.0036   Epoch: 16   Global Step: 673210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:33,782-Speed 2591.65 samples/sec   Loss 2.6511   LearningRate 0.0036   Epoch: 16   Global Step: 673220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:37,747-Speed 2583.22 samples/sec   Loss 2.6536   LearningRate 0.0036   Epoch: 16   Global Step: 673230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:41,659-Speed 2617.90 samples/sec   Loss 2.5749   LearningRate 0.0036   Epoch: 16   Global Step: 673240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:45,533-Speed 2644.35 samples/sec   Loss 2.5375   LearningRate 0.0036   Epoch: 16   Global Step: 673250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:24:49,416-Speed 2637.43 samples/sec   Loss 2.5591   LearningRate 0.0036   Epoch: 16   Global Step: 673260   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:24:53,313-Speed 2628.74 samples/sec   Loss 2.5747   LearningRate 0.0036   Epoch: 16   Global Step: 673270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:24:57,220-Speed 2621.38 samples/sec   Loss 2.5775   LearningRate 0.0035   Epoch: 16   Global Step: 673280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:01,120-Speed 2626.34 samples/sec   Loss 2.5250   LearningRate 0.0035   Epoch: 16   Global Step: 673290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:05,020-Speed 2626.39 samples/sec   Loss 2.5580   LearningRate 0.0035   Epoch: 16   Global Step: 673300   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:08,918-Speed 2627.54 samples/sec   Loss 2.6711   LearningRate 0.0035   Epoch: 16   Global Step: 673310   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:12,814-Speed 2628.61 samples/sec   Loss 2.5917   LearningRate 0.0035   Epoch: 16   Global Step: 673320   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:16,712-Speed 2627.79 samples/sec   Loss 2.5406   LearningRate 0.0035   Epoch: 16   Global Step: 673330   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:20,609-Speed 2628.26 samples/sec   Loss 2.5912   LearningRate 0.0035   Epoch: 16   Global Step: 673340   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:24,515-Speed 2622.36 samples/sec   Loss 2.5787   LearningRate 0.0035   Epoch: 16   Global Step: 673350   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-15 23:25:28,418-Speed 2623.81 samples/sec   Loss 2.5654   LearningRate 0.0035   Epoch: 16   Global Step: 673360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:32,312-Speed 2630.60 samples/sec   Loss 2.6487   LearningRate 0.0035   Epoch: 16   Global Step: 673370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:36,217-Speed 2622.97 samples/sec   Loss 2.6199   LearningRate 0.0035   Epoch: 16   Global Step: 673380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:40,121-Speed 2623.68 samples/sec   Loss 2.5360   LearningRate 0.0035   Epoch: 16   Global Step: 673390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:44,023-Speed 2624.81 samples/sec   Loss 2.6033   LearningRate 0.0035   Epoch: 16   Global Step: 673400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:47,938-Speed 2616.29 samples/sec   Loss 2.5652   LearningRate 0.0035   Epoch: 16   Global Step: 673410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:51,835-Speed 2627.72 samples/sec   Loss 2.5656   LearningRate 0.0035   Epoch: 16   Global Step: 673420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:55,826-Speed 2566.95 samples/sec   Loss 2.7058   LearningRate 0.0035   Epoch: 16   Global Step: 673430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:25:59,731-Speed 2622.40 samples/sec   Loss 2.5936   LearningRate 0.0035   Epoch: 16   Global Step: 673440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:03,637-Speed 2622.55 samples/sec   Loss 2.5847   LearningRate 0.0035   Epoch: 16   Global Step: 673450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:07,540-Speed 2623.52 samples/sec   Loss 2.6461   LearningRate 0.0035   Epoch: 16   Global Step: 673460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:26:11,440-Speed 2626.13 samples/sec   Loss 2.5331   LearningRate 0.0035   Epoch: 16   Global Step: 673470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:26:15,318-Speed 2641.97 samples/sec   Loss 2.6380   LearningRate 0.0035   Epoch: 16   Global Step: 673480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:19,222-Speed 2623.27 samples/sec   Loss 2.5682   LearningRate 0.0035   Epoch: 16   Global Step: 673490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:23,142-Speed 2613.11 samples/sec   Loss 2.6321   LearningRate 0.0035   Epoch: 16   Global Step: 673500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:27,045-Speed 2623.91 samples/sec   Loss 2.5998   LearningRate 0.0035   Epoch: 16   Global Step: 673510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:30,963-Speed 2614.31 samples/sec   Loss 2.5981   LearningRate 0.0035   Epoch: 16   Global Step: 673520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:34,867-Speed 2623.05 samples/sec   Loss 2.6157   LearningRate 0.0035   Epoch: 16   Global Step: 673530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:38,770-Speed 2624.48 samples/sec   Loss 2.6463   LearningRate 0.0035   Epoch: 16   Global Step: 673540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:42,689-Speed 2613.41 samples/sec   Loss 2.5958   LearningRate 0.0035   Epoch: 16   Global Step: 673550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:46,587-Speed 2627.96 samples/sec   Loss 2.5472   LearningRate 0.0035   Epoch: 16   Global Step: 673560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:50,492-Speed 2622.75 samples/sec   Loss 2.5916   LearningRate 0.0035   Epoch: 16   Global Step: 673570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:26:54,387-Speed 2630.19 samples/sec   Loss 2.5522   LearningRate 0.0035   Epoch: 16   Global Step: 673580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:26:58,264-Speed 2641.92 samples/sec   Loss 2.6489   LearningRate 0.0035   Epoch: 16   Global Step: 673590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:02,159-Speed 2629.45 samples/sec   Loss 2.5680   LearningRate 0.0035   Epoch: 16   Global Step: 673600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:06,054-Speed 2628.94 samples/sec   Loss 2.6091   LearningRate 0.0035   Epoch: 16   Global Step: 673610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:09,955-Speed 2626.01 samples/sec   Loss 2.6365   LearningRate 0.0035   Epoch: 16   Global Step: 673620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:13,863-Speed 2620.86 samples/sec   Loss 2.6149   LearningRate 0.0035   Epoch: 16   Global Step: 673630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:17,766-Speed 2624.24 samples/sec   Loss 2.6005   LearningRate 0.0035   Epoch: 16   Global Step: 673640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:21,668-Speed 2624.60 samples/sec   Loss 2.6362   LearningRate 0.0035   Epoch: 16   Global Step: 673650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:25,574-Speed 2621.94 samples/sec   Loss 2.6059   LearningRate 0.0035   Epoch: 16   Global Step: 673660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:29,473-Speed 2627.46 samples/sec   Loss 2.5625   LearningRate 0.0035   Epoch: 16   Global Step: 673670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:33,368-Speed 2629.77 samples/sec   Loss 2.5493   LearningRate 0.0035   Epoch: 16   Global Step: 673680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:37,289-Speed 2611.76 samples/sec   Loss 2.5888   LearningRate 0.0035   Epoch: 16   Global Step: 673690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:41,321-Speed 2540.19 samples/sec   Loss 2.4967   LearningRate 0.0035   Epoch: 16   Global Step: 673700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:45,217-Speed 2629.38 samples/sec   Loss 2.5759   LearningRate 0.0035   Epoch: 16   Global Step: 673710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:49,130-Speed 2617.25 samples/sec   Loss 2.6110   LearningRate 0.0035   Epoch: 16   Global Step: 673720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:53,022-Speed 2632.15 samples/sec   Loss 2.6427   LearningRate 0.0035   Epoch: 16   Global Step: 673730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:27:56,918-Speed 2628.59 samples/sec   Loss 2.6655   LearningRate 0.0035   Epoch: 16   Global Step: 673740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:28:00,821-Speed 2624.44 samples/sec   Loss 2.5515   LearningRate 0.0035   Epoch: 16   Global Step: 673750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:28:04,723-Speed 2625.18 samples/sec   Loss 2.5491   LearningRate 0.0035   Epoch: 16   Global Step: 673760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:28:08,619-Speed 2628.64 samples/sec   Loss 2.5596   LearningRate 0.0035   Epoch: 16   Global Step: 673770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:28:12,510-Speed 2632.32 samples/sec   Loss 2.5544   LearningRate 0.0035   Epoch: 16   Global Step: 673780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:28:16,405-Speed 2629.63 samples/sec   Loss 2.6208   LearningRate 0.0035   Epoch: 16   Global Step: 673790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:20,306-Speed 2625.21 samples/sec   Loss 2.5993   LearningRate 0.0035   Epoch: 16   Global Step: 673800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:24,206-Speed 2626.49 samples/sec   Loss 2.5339   LearningRate 0.0035   Epoch: 16   Global Step: 673810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:28,130-Speed 2610.24 samples/sec   Loss 2.5659   LearningRate 0.0035   Epoch: 16   Global Step: 673820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:32,021-Speed 2631.90 samples/sec   Loss 2.6472   LearningRate 0.0035   Epoch: 16   Global Step: 673830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:35,917-Speed 2629.07 samples/sec   Loss 2.6745   LearningRate 0.0035   Epoch: 16   Global Step: 673840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:39,825-Speed 2621.11 samples/sec   Loss 2.5549   LearningRate 0.0035   Epoch: 16   Global Step: 673850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:43,726-Speed 2626.07 samples/sec   Loss 2.5158   LearningRate 0.0035   Epoch: 16   Global Step: 673860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:47,626-Speed 2626.19 samples/sec   Loss 2.6203   LearningRate 0.0035   Epoch: 16   Global Step: 673870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:51,530-Speed 2623.49 samples/sec   Loss 2.5670   LearningRate 0.0035   Epoch: 16   Global Step: 673880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-15 23:28:55,396-Speed 2649.40 samples/sec   Loss 2.6023   LearningRate 0.0035   Epoch: 16   Global Step: 673890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-15 23:28:59,286-Speed 2634.12 samples/sec   Loss 2.5619   LearningRate 0.0035   Epoch: 16   Global Step: 673900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:03,195-Speed 2620.58 samples/sec   Loss 2.5539   LearningRate 0.0035   Epoch: 16   Global Step: 673910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:07,091-Speed 2628.73 samples/sec   Loss 2.5884   LearningRate 0.0035   Epoch: 16   Global Step: 673920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:10,990-Speed 2626.91 samples/sec   Loss 2.5826   LearningRate 0.0035   Epoch: 16   Global Step: 673930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:14,885-Speed 2629.39 samples/sec   Loss 2.5217   LearningRate 0.0035   Epoch: 16   Global Step: 673940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:18,781-Speed 2629.27 samples/sec   Loss 2.5857   LearningRate 0.0035   Epoch: 16   Global Step: 673950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:22,678-Speed 2628.33 samples/sec   Loss 2.5656   LearningRate 0.0035   Epoch: 16   Global Step: 673960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:26,579-Speed 2625.47 samples/sec   Loss 2.6740   LearningRate 0.0035   Epoch: 16   Global Step: 673970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:30,476-Speed 2628.56 samples/sec   Loss 2.5106   LearningRate 0.0035   Epoch: 16   Global Step: 673980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:29:34,379-Speed 2623.85 samples/sec   Loss 2.5861   LearningRate 0.0035   Epoch: 16   Global Step: 673990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:29:38,280-Speed 2625.56 samples/sec   Loss 2.5872   LearningRate 0.0035   Epoch: 16   Global Step: 674000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:29:42,178-Speed 2627.37 samples/sec   Loss 2.6042   LearningRate 0.0035   Epoch: 16   Global Step: 674010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:29:46,072-Speed 2630.31 samples/sec   Loss 2.6249   LearningRate 0.0035   Epoch: 16   Global Step: 674020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:29:49,975-Speed 2624.12 samples/sec   Loss 2.6523   LearningRate 0.0035   Epoch: 16   Global Step: 674030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:29:53,878-Speed 2624.34 samples/sec   Loss 2.5744   LearningRate 0.0035   Epoch: 16   Global Step: 674040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:29:57,759-Speed 2639.53 samples/sec   Loss 2.6647   LearningRate 0.0035   Epoch: 16   Global Step: 674050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:30:01,655-Speed 2628.82 samples/sec   Loss 2.6236   LearningRate 0.0035   Epoch: 16   Global Step: 674060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:30:05,561-Speed 2622.60 samples/sec   Loss 2.6301   LearningRate 0.0035   Epoch: 16   Global Step: 674070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:30:09,487-Speed 2608.33 samples/sec   Loss 2.6195   LearningRate 0.0035   Epoch: 16   Global Step: 674080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:30:13,390-Speed 2624.64 samples/sec   Loss 2.5883   LearningRate 0.0035   Epoch: 16   Global Step: 674090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:30:17,301-Speed 2618.94 samples/sec   Loss 2.6478   LearningRate 0.0035   Epoch: 16   Global Step: 674100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:30:21,177-Speed 2642.37 samples/sec   Loss 2.5256   LearningRate 0.0035   Epoch: 16   Global Step: 674110   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:25,091-Speed 2617.45 samples/sec   Loss 2.6232   LearningRate 0.0035   Epoch: 16   Global Step: 674120   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:29,001-Speed 2619.61 samples/sec   Loss 2.5516   LearningRate 0.0035   Epoch: 16   Global Step: 674130   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:32,899-Speed 2627.36 samples/sec   Loss 2.5723   LearningRate 0.0035   Epoch: 16   Global Step: 674140   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:36,793-Speed 2630.12 samples/sec   Loss 2.5061   LearningRate 0.0035   Epoch: 16   Global Step: 674150   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:40,691-Speed 2628.33 samples/sec   Loss 2.6862   LearningRate 0.0035   Epoch: 16   Global Step: 674160   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:44,585-Speed 2630.13 samples/sec   Loss 2.5641   LearningRate 0.0035   Epoch: 16   Global Step: 674170   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:48,481-Speed 2629.34 samples/sec   Loss 2.6285   LearningRate 0.0035   Epoch: 16   Global Step: 674180   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:52,386-Speed 2622.79 samples/sec   Loss 2.5555   LearningRate 0.0035   Epoch: 16   Global Step: 674190   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:30:56,307-Speed 2613.25 samples/sec   Loss 2.6705   LearningRate 0.0035   Epoch: 16   Global Step: 674200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:00,204-Speed 2628.25 samples/sec   Loss 2.6117   LearningRate 0.0035   Epoch: 16   Global Step: 674210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:31:04,107-Speed 2623.55 samples/sec   Loss 2.6540   LearningRate 0.0035   Epoch: 16   Global Step: 674220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:31:08,005-Speed 2628.14 samples/sec   Loss 2.5976   LearningRate 0.0035   Epoch: 16   Global Step: 674230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:31:11,901-Speed 2628.58 samples/sec   Loss 2.5729   LearningRate 0.0035   Epoch: 16   Global Step: 674240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:31:15,777-Speed 2642.71 samples/sec   Loss 2.6160   LearningRate 0.0035   Epoch: 16   Global Step: 674250   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:19,681-Speed 2623.41 samples/sec   Loss 2.5665   LearningRate 0.0035   Epoch: 16   Global Step: 674260   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:23,590-Speed 2620.86 samples/sec   Loss 2.6326   LearningRate 0.0035   Epoch: 16   Global Step: 674270   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:27,486-Speed 2628.59 samples/sec   Loss 2.5217   LearningRate 0.0035   Epoch: 16   Global Step: 674280   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:31,381-Speed 2629.56 samples/sec   Loss 2.5647   LearningRate 0.0035   Epoch: 16   Global Step: 674290   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:35,280-Speed 2626.84 samples/sec   Loss 2.5438   LearningRate 0.0035   Epoch: 16   Global Step: 674300   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:39,215-Speed 2603.01 samples/sec   Loss 2.5261   LearningRate 0.0035   Epoch: 16   Global Step: 674310   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:43,114-Speed 2627.45 samples/sec   Loss 2.5451   LearningRate 0.0035   Epoch: 16   Global Step: 674320   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:47,012-Speed 2628.34 samples/sec   Loss 2.5384   LearningRate 0.0035   Epoch: 16   Global Step: 674330   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:50,905-Speed 2630.62 samples/sec   Loss 2.6012   LearningRate 0.0035   Epoch: 16   Global Step: 674340   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:31:54,798-Speed 2632.67 samples/sec   Loss 2.6146   LearningRate 0.0035   Epoch: 16   Global Step: 674350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:31:58,696-Speed 2627.11 samples/sec   Loss 2.5639   LearningRate 0.0035   Epoch: 16   Global Step: 674360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:02,620-Speed 2610.24 samples/sec   Loss 2.5621   LearningRate 0.0035   Epoch: 16   Global Step: 674370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:06,530-Speed 2619.26 samples/sec   Loss 2.5617   LearningRate 0.0035   Epoch: 16   Global Step: 674380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:10,427-Speed 2628.77 samples/sec   Loss 2.5622   LearningRate 0.0035   Epoch: 16   Global Step: 674390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:14,331-Speed 2623.75 samples/sec   Loss 2.4825   LearningRate 0.0035   Epoch: 16   Global Step: 674400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:18,247-Speed 2616.67 samples/sec   Loss 2.5804   LearningRate 0.0035   Epoch: 16   Global Step: 674410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:22,158-Speed 2618.39 samples/sec   Loss 2.5059   LearningRate 0.0035   Epoch: 16   Global Step: 674420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:26,059-Speed 2626.31 samples/sec   Loss 2.6191   LearningRate 0.0035   Epoch: 16   Global Step: 674430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:30,021-Speed 2585.29 samples/sec   Loss 2.6724   LearningRate 0.0035   Epoch: 16   Global Step: 674440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:33,959-Speed 2600.83 samples/sec   Loss 2.5218   LearningRate 0.0035   Epoch: 16   Global Step: 674450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:32:37,840-Speed 2638.82 samples/sec   Loss 2.5729   LearningRate 0.0035   Epoch: 16   Global Step: 674460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:41,743-Speed 2624.67 samples/sec   Loss 2.6757   LearningRate 0.0035   Epoch: 16   Global Step: 674470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:45,643-Speed 2626.76 samples/sec   Loss 2.5667   LearningRate 0.0035   Epoch: 16   Global Step: 674480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:49,544-Speed 2625.01 samples/sec   Loss 2.5982   LearningRate 0.0035   Epoch: 16   Global Step: 674490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:53,456-Speed 2618.35 samples/sec   Loss 2.5142   LearningRate 0.0035   Epoch: 16   Global Step: 674500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:32:57,354-Speed 2628.04 samples/sec   Loss 2.5933   LearningRate 0.0035   Epoch: 16   Global Step: 674510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:01,259-Speed 2622.61 samples/sec   Loss 2.5411   LearningRate 0.0035   Epoch: 16   Global Step: 674520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:05,159-Speed 2626.19 samples/sec   Loss 2.5600   LearningRate 0.0035   Epoch: 16   Global Step: 674530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:09,059-Speed 2626.90 samples/sec   Loss 2.5008   LearningRate 0.0035   Epoch: 16   Global Step: 674540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:12,955-Speed 2629.26 samples/sec   Loss 2.5552   LearningRate 0.0035   Epoch: 16   Global Step: 674550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:16,852-Speed 2628.36 samples/sec   Loss 2.5982   LearningRate 0.0035   Epoch: 16   Global Step: 674560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:33:20,749-Speed 2628.84 samples/sec   Loss 2.6063   LearningRate 0.0035   Epoch: 16   Global Step: 674570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:33:24,644-Speed 2629.20 samples/sec   Loss 2.6500   LearningRate 0.0035   Epoch: 16   Global Step: 674580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:33:28,601-Speed 2589.13 samples/sec   Loss 2.6121   LearningRate 0.0035   Epoch: 16   Global Step: 674590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:33:32,480-Speed 2640.44 samples/sec   Loss 2.6291   LearningRate 0.0035   Epoch: 16   Global Step: 674600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:36,373-Speed 2630.60 samples/sec   Loss 2.6497   LearningRate 0.0035   Epoch: 16   Global Step: 674610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:40,278-Speed 2623.39 samples/sec   Loss 2.5829   LearningRate 0.0035   Epoch: 16   Global Step: 674620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:44,175-Speed 2628.28 samples/sec   Loss 2.5769   LearningRate 0.0035   Epoch: 16   Global Step: 674630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:48,069-Speed 2630.37 samples/sec   Loss 2.4869   LearningRate 0.0035   Epoch: 16   Global Step: 674640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:51,973-Speed 2624.05 samples/sec   Loss 2.6055   LearningRate 0.0035   Epoch: 16   Global Step: 674650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:55,869-Speed 2628.99 samples/sec   Loss 2.5780   LearningRate 0.0035   Epoch: 16   Global Step: 674660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:33:59,780-Speed 2619.02 samples/sec   Loss 2.6700   LearningRate 0.0035   Epoch: 16   Global Step: 674670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:03,685-Speed 2622.63 samples/sec   Loss 2.5796   LearningRate 0.0035   Epoch: 16   Global Step: 674680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:07,577-Speed 2631.99 samples/sec   Loss 2.6091   LearningRate 0.0035   Epoch: 16   Global Step: 674690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:11,473-Speed 2629.06 samples/sec   Loss 2.5171   LearningRate 0.0035   Epoch: 16   Global Step: 674700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:34:15,370-Speed 2628.51 samples/sec   Loss 2.5165   LearningRate 0.0035   Epoch: 16   Global Step: 674710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:34:19,270-Speed 2625.68 samples/sec   Loss 2.5640   LearningRate 0.0035   Epoch: 16   Global Step: 674720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:34:23,166-Speed 2629.87 samples/sec   Loss 2.5673   LearningRate 0.0035   Epoch: 16   Global Step: 674730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:34:27,059-Speed 2630.63 samples/sec   Loss 2.5186   LearningRate 0.0035   Epoch: 16   Global Step: 674740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:34:30,970-Speed 2619.06 samples/sec   Loss 2.4583   LearningRate 0.0035   Epoch: 16   Global Step: 674750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:34:34,872-Speed 2625.02 samples/sec   Loss 2.5473   LearningRate 0.0035   Epoch: 16   Global Step: 674760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:38,770-Speed 2627.49 samples/sec   Loss 2.5901   LearningRate 0.0035   Epoch: 16   Global Step: 674770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:42,665-Speed 2629.31 samples/sec   Loss 2.6248   LearningRate 0.0035   Epoch: 16   Global Step: 674780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:46,557-Speed 2632.60 samples/sec   Loss 2.5925   LearningRate 0.0035   Epoch: 16   Global Step: 674790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:50,453-Speed 2628.77 samples/sec   Loss 2.5044   LearningRate 0.0035   Epoch: 16   Global Step: 674800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:54,352-Speed 2627.20 samples/sec   Loss 2.5849   LearningRate 0.0035   Epoch: 16   Global Step: 674810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:34:58,250-Speed 2629.25 samples/sec   Loss 2.5320   LearningRate 0.0035   Epoch: 16   Global Step: 674820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:02,153-Speed 2623.59 samples/sec   Loss 2.5729   LearningRate 0.0035   Epoch: 16   Global Step: 674830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:06,050-Speed 2628.32 samples/sec   Loss 2.5926   LearningRate 0.0035   Epoch: 16   Global Step: 674840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:09,950-Speed 2626.07 samples/sec   Loss 2.5770   LearningRate 0.0035   Epoch: 16   Global Step: 674850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:13,833-Speed 2638.10 samples/sec   Loss 2.5593   LearningRate 0.0035   Epoch: 16   Global Step: 674860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:17,731-Speed 2627.43 samples/sec   Loss 2.5789   LearningRate 0.0035   Epoch: 16   Global Step: 674870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:21,635-Speed 2624.09 samples/sec   Loss 2.5217   LearningRate 0.0035   Epoch: 16   Global Step: 674880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:25,541-Speed 2621.54 samples/sec   Loss 2.5624   LearningRate 0.0035   Epoch: 16   Global Step: 674890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:29,443-Speed 2625.42 samples/sec   Loss 2.5597   LearningRate 0.0035   Epoch: 16   Global Step: 674900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:33,341-Speed 2627.51 samples/sec   Loss 2.5964   LearningRate 0.0035   Epoch: 16   Global Step: 674910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:35:37,220-Speed 2640.94 samples/sec   Loss 2.6163   LearningRate 0.0035   Epoch: 16   Global Step: 674920   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:35:41,125-Speed 2622.33 samples/sec   Loss 2.5457   LearningRate 0.0035   Epoch: 16   Global Step: 674930   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:35:45,023-Speed 2628.53 samples/sec   Loss 2.5734   LearningRate 0.0035   Epoch: 16   Global Step: 674940   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:35:48,928-Speed 2622.30 samples/sec   Loss 2.5511   LearningRate 0.0035   Epoch: 16   Global Step: 674950   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:35:52,873-Speed 2597.02 samples/sec   Loss 2.4882   LearningRate 0.0035   Epoch: 16   Global Step: 674960   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:35:56,775-Speed 2624.66 samples/sec   Loss 2.5093   LearningRate 0.0035   Epoch: 16   Global Step: 674970   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:36:00,667-Speed 2631.89 samples/sec   Loss 2.5484   LearningRate 0.0035   Epoch: 16   Global Step: 674980   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:36:04,561-Speed 2630.30 samples/sec   Loss 2.5813   LearningRate 0.0035   Epoch: 16   Global Step: 674990   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:36:08,454-Speed 2630.54 samples/sec   Loss 2.5899   LearningRate 0.0035   Epoch: 16   Global Step: 675000   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:36:12,352-Speed 2628.00 samples/sec   Loss 2.5774   LearningRate 0.0035   Epoch: 16   Global Step: 675010   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:36:16,251-Speed 2626.90 samples/sec   Loss 2.5507   LearningRate 0.0035   Epoch: 16   Global Step: 675020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:20,146-Speed 2629.64 samples/sec   Loss 2.5296   LearningRate 0.0035   Epoch: 16   Global Step: 675030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:24,040-Speed 2630.34 samples/sec   Loss 2.4850   LearningRate 0.0035   Epoch: 16   Global Step: 675040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:27,932-Speed 2631.17 samples/sec   Loss 2.5560   LearningRate 0.0035   Epoch: 16   Global Step: 675050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:31,828-Speed 2629.33 samples/sec   Loss 2.5973   LearningRate 0.0035   Epoch: 16   Global Step: 675060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:35,722-Speed 2631.07 samples/sec   Loss 2.5779   LearningRate 0.0035   Epoch: 16   Global Step: 675070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:39,617-Speed 2629.57 samples/sec   Loss 2.5363   LearningRate 0.0035   Epoch: 16   Global Step: 675080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:43,539-Speed 2611.61 samples/sec   Loss 2.5980   LearningRate 0.0035   Epoch: 16   Global Step: 675090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:47,437-Speed 2628.11 samples/sec   Loss 2.6189   LearningRate 0.0035   Epoch: 16   Global Step: 675100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:51,331-Speed 2629.90 samples/sec   Loss 2.5344   LearningRate 0.0035   Epoch: 16   Global Step: 675110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:36:55,236-Speed 2623.51 samples/sec   Loss 2.5127   LearningRate 0.0035   Epoch: 16   Global Step: 675120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:36:59,141-Speed 2622.61 samples/sec   Loss 2.6068   LearningRate 0.0035   Epoch: 16   Global Step: 675130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:37:03,036-Speed 2629.68 samples/sec   Loss 2.5260   LearningRate 0.0035   Epoch: 16   Global Step: 675140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:37:06,935-Speed 2627.03 samples/sec   Loss 2.5110   LearningRate 0.0035   Epoch: 16   Global Step: 675150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:37:10,829-Speed 2630.50 samples/sec   Loss 2.5842   LearningRate 0.0035   Epoch: 16   Global Step: 675160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:37:14,698-Speed 2646.75 samples/sec   Loss 2.5866   LearningRate 0.0035   Epoch: 16   Global Step: 675170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:37:18,610-Speed 2618.78 samples/sec   Loss 2.5612   LearningRate 0.0035   Epoch: 16   Global Step: 675180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:37:22,512-Speed 2625.20 samples/sec   Loss 2.5235   LearningRate 0.0035   Epoch: 16   Global Step: 675190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:37:26,380-Speed 2647.81 samples/sec   Loss 2.6146   LearningRate 0.0035   Epoch: 16   Global Step: 675200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:30,294-Speed 2617.22 samples/sec   Loss 2.6336   LearningRate 0.0035   Epoch: 16   Global Step: 675210   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:34,203-Speed 2619.81 samples/sec   Loss 2.5963   LearningRate 0.0035   Epoch: 16   Global Step: 675220   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:38,107-Speed 2623.32 samples/sec   Loss 2.5584   LearningRate 0.0035   Epoch: 16   Global Step: 675230   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:42,055-Speed 2595.25 samples/sec   Loss 2.5613   LearningRate 0.0035   Epoch: 16   Global Step: 675240   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:45,974-Speed 2613.86 samples/sec   Loss 2.5338   LearningRate 0.0035   Epoch: 16   Global Step: 675250   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:49,885-Speed 2618.36 samples/sec   Loss 2.5508   LearningRate 0.0035   Epoch: 16   Global Step: 675260   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:53,782-Speed 2628.74 samples/sec   Loss 2.5385   LearningRate 0.0035   Epoch: 16   Global Step: 675270   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:37:57,695-Speed 2617.69 samples/sec   Loss 2.5508   LearningRate 0.0035   Epoch: 16   Global Step: 675280   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:38:01,600-Speed 2622.83 samples/sec   Loss 2.5743   LearningRate 0.0035   Epoch: 16   Global Step: 675290   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:38:05,509-Speed 2619.92 samples/sec   Loss 2.5594   LearningRate 0.0035   Epoch: 16   Global Step: 675300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:09,507-Speed 2562.26 samples/sec   Loss 2.5687   LearningRate 0.0035   Epoch: 16   Global Step: 675310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:13,472-Speed 2583.61 samples/sec   Loss 2.5351   LearningRate 0.0035   Epoch: 16   Global Step: 675320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:17,370-Speed 2627.27 samples/sec   Loss 2.5700   LearningRate 0.0035   Epoch: 16   Global Step: 675330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:21,276-Speed 2622.74 samples/sec   Loss 2.5339   LearningRate 0.0035   Epoch: 16   Global Step: 675340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:25,189-Speed 2618.07 samples/sec   Loss 2.6152   LearningRate 0.0035   Epoch: 16   Global Step: 675350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:29,094-Speed 2622.76 samples/sec   Loss 2.6356   LearningRate 0.0035   Epoch: 16   Global Step: 675360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:33,002-Speed 2621.19 samples/sec   Loss 2.5752   LearningRate 0.0035   Epoch: 16   Global Step: 675370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:36,891-Speed 2633.49 samples/sec   Loss 2.5131   LearningRate 0.0035   Epoch: 16   Global Step: 675380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:40,818-Speed 2608.70 samples/sec   Loss 2.5760   LearningRate 0.0035   Epoch: 16   Global Step: 675390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:38:44,738-Speed 2612.88 samples/sec   Loss 2.5585   LearningRate 0.0035   Epoch: 16   Global Step: 675400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:38:48,641-Speed 2624.09 samples/sec   Loss 2.5421   LearningRate 0.0035   Epoch: 16   Global Step: 675410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:38:52,543-Speed 2625.29 samples/sec   Loss 2.5569   LearningRate 0.0035   Epoch: 16   Global Step: 675420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:38:56,445-Speed 2625.85 samples/sec   Loss 2.6160   LearningRate 0.0035   Epoch: 16   Global Step: 675430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:39:00,344-Speed 2626.71 samples/sec   Loss 2.4948   LearningRate 0.0035   Epoch: 16   Global Step: 675440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:39:04,221-Speed 2641.12 samples/sec   Loss 2.5893   LearningRate 0.0035   Epoch: 16   Global Step: 675450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:08,122-Speed 2625.83 samples/sec   Loss 2.6229   LearningRate 0.0035   Epoch: 16   Global Step: 675460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:12,065-Speed 2598.15 samples/sec   Loss 2.5764   LearningRate 0.0035   Epoch: 16   Global Step: 675470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:15,987-Speed 2611.69 samples/sec   Loss 2.6246   LearningRate 0.0035   Epoch: 16   Global Step: 675480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:19,886-Speed 2626.85 samples/sec   Loss 2.5883   LearningRate 0.0035   Epoch: 16   Global Step: 675490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:23,816-Speed 2607.33 samples/sec   Loss 2.5468   LearningRate 0.0034   Epoch: 16   Global Step: 675500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:27,722-Speed 2622.16 samples/sec   Loss 2.5244   LearningRate 0.0034   Epoch: 16   Global Step: 675510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:31,635-Speed 2617.51 samples/sec   Loss 2.6067   LearningRate 0.0034   Epoch: 16   Global Step: 675520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:35,587-Speed 2591.87 samples/sec   Loss 2.6019   LearningRate 0.0034   Epoch: 16   Global Step: 675530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:39,482-Speed 2629.92 samples/sec   Loss 2.4981   LearningRate 0.0034   Epoch: 16   Global Step: 675540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:43,388-Speed 2622.15 samples/sec   Loss 2.6364   LearningRate 0.0034   Epoch: 16   Global Step: 675550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:39:47,258-Speed 2646.45 samples/sec   Loss 2.5632   LearningRate 0.0034   Epoch: 16   Global Step: 675560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:51,157-Speed 2627.39 samples/sec   Loss 2.5661   LearningRate 0.0034   Epoch: 16   Global Step: 675570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:55,059-Speed 2624.67 samples/sec   Loss 2.5371   LearningRate 0.0034   Epoch: 16   Global Step: 675580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:39:58,996-Speed 2602.39 samples/sec   Loss 2.6090   LearningRate 0.0034   Epoch: 16   Global Step: 675590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:02,916-Speed 2612.43 samples/sec   Loss 2.5120   LearningRate 0.0034   Epoch: 16   Global Step: 675600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:06,823-Speed 2622.09 samples/sec   Loss 2.5885   LearningRate 0.0034   Epoch: 16   Global Step: 675610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:10,779-Speed 2588.93 samples/sec   Loss 2.4994   LearningRate 0.0034   Epoch: 16   Global Step: 675620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:14,744-Speed 2583.64 samples/sec   Loss 2.5654   LearningRate 0.0034   Epoch: 16   Global Step: 675630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:18,637-Speed 2630.88 samples/sec   Loss 2.6497   LearningRate 0.0034   Epoch: 16   Global Step: 675640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:22,577-Speed 2599.94 samples/sec   Loss 2.6289   LearningRate 0.0034   Epoch: 16   Global Step: 675650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:26,465-Speed 2634.88 samples/sec   Loss 2.5808   LearningRate 0.0034   Epoch: 16   Global Step: 675660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:40:30,364-Speed 2626.87 samples/sec   Loss 2.5569   LearningRate 0.0034   Epoch: 16   Global Step: 675670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:40:34,293-Speed 2606.99 samples/sec   Loss 2.5325   LearningRate 0.0034   Epoch: 16   Global Step: 675680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:40:38,234-Speed 2599.25 samples/sec   Loss 2.6701   LearningRate 0.0034   Epoch: 16   Global Step: 675690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:42,132-Speed 2627.38 samples/sec   Loss 2.5674   LearningRate 0.0034   Epoch: 16   Global Step: 675700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:46,044-Speed 2618.55 samples/sec   Loss 2.5397   LearningRate 0.0034   Epoch: 16   Global Step: 675710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:49,937-Speed 2630.82 samples/sec   Loss 2.5268   LearningRate 0.0034   Epoch: 16   Global Step: 675720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:53,833-Speed 2629.51 samples/sec   Loss 2.5254   LearningRate 0.0034   Epoch: 16   Global Step: 675730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:40:57,724-Speed 2631.91 samples/sec   Loss 2.5539   LearningRate 0.0034   Epoch: 16   Global Step: 675740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:41:01,637-Speed 2617.65 samples/sec   Loss 2.4837   LearningRate 0.0034   Epoch: 16   Global Step: 675750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:41:05,535-Speed 2627.80 samples/sec   Loss 2.5077   LearningRate 0.0034   Epoch: 16   Global Step: 675760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:41:09,426-Speed 2632.39 samples/sec   Loss 2.6190   LearningRate 0.0034   Epoch: 16   Global Step: 675770   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:13,323-Speed 2628.30 samples/sec   Loss 2.4922   LearningRate 0.0034   Epoch: 16   Global Step: 675780   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:17,228-Speed 2623.36 samples/sec   Loss 2.5916   LearningRate 0.0034   Epoch: 16   Global Step: 675790   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:21,172-Speed 2596.91 samples/sec   Loss 2.5687   LearningRate 0.0034   Epoch: 16   Global Step: 675800   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:25,073-Speed 2625.75 samples/sec   Loss 2.5903   LearningRate 0.0034   Epoch: 16   Global Step: 675810   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:28,968-Speed 2629.88 samples/sec   Loss 2.5211   LearningRate 0.0034   Epoch: 16   Global Step: 675820   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:32,906-Speed 2600.62 samples/sec   Loss 2.4919   LearningRate 0.0034   Epoch: 16   Global Step: 675830   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:36,803-Speed 2628.60 samples/sec   Loss 2.5690   LearningRate 0.0034   Epoch: 16   Global Step: 675840   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:40,708-Speed 2623.14 samples/sec   Loss 2.5877   LearningRate 0.0034   Epoch: 16   Global Step: 675850   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:44,606-Speed 2627.77 samples/sec   Loss 2.6067   LearningRate 0.0034   Epoch: 16   Global Step: 675860   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:41:48,503-Speed 2627.80 samples/sec   Loss 2.5630   LearningRate 0.0034   Epoch: 16   Global Step: 675870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:41:52,406-Speed 2624.75 samples/sec   Loss 2.5459   LearningRate 0.0034   Epoch: 16   Global Step: 675880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:41:56,303-Speed 2628.23 samples/sec   Loss 2.5609   LearningRate 0.0034   Epoch: 16   Global Step: 675890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:00,198-Speed 2630.31 samples/sec   Loss 2.5245   LearningRate 0.0034   Epoch: 16   Global Step: 675900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:04,089-Speed 2632.15 samples/sec   Loss 2.5144   LearningRate 0.0034   Epoch: 16   Global Step: 675910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:07,987-Speed 2627.60 samples/sec   Loss 2.5654   LearningRate 0.0034   Epoch: 16   Global Step: 675920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:11,883-Speed 2629.21 samples/sec   Loss 2.5195   LearningRate 0.0034   Epoch: 16   Global Step: 675930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:15,825-Speed 2598.27 samples/sec   Loss 2.4426   LearningRate 0.0034   Epoch: 16   Global Step: 675940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:19,727-Speed 2625.38 samples/sec   Loss 2.5483   LearningRate 0.0034   Epoch: 16   Global Step: 675950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:23,643-Speed 2615.24 samples/sec   Loss 2.5580   LearningRate 0.0034   Epoch: 16   Global Step: 675960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:27,549-Speed 2622.75 samples/sec   Loss 2.6185   LearningRate 0.0034   Epoch: 16   Global Step: 675970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:42:31,420-Speed 2646.60 samples/sec   Loss 2.4933   LearningRate 0.0034   Epoch: 16   Global Step: 675980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:35,334-Speed 2616.59 samples/sec   Loss 2.5379   LearningRate 0.0034   Epoch: 16   Global Step: 675990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:39,237-Speed 2624.29 samples/sec   Loss 2.6284   LearningRate 0.0034   Epoch: 16   Global Step: 676000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:43,133-Speed 2628.97 samples/sec   Loss 2.5687   LearningRate 0.0034   Epoch: 16   Global Step: 676010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:47,031-Speed 2627.55 samples/sec   Loss 2.5861   LearningRate 0.0034   Epoch: 16   Global Step: 676020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:50,928-Speed 2628.26 samples/sec   Loss 2.5775   LearningRate 0.0034   Epoch: 16   Global Step: 676030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:54,916-Speed 2568.67 samples/sec   Loss 2.5249   LearningRate 0.0034   Epoch: 16   Global Step: 676040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:42:58,900-Speed 2570.77 samples/sec   Loss 2.6206   LearningRate 0.0034   Epoch: 16   Global Step: 676050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:43:02,787-Speed 2635.13 samples/sec   Loss 2.5935   LearningRate 0.0034   Epoch: 16   Global Step: 676060   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:06,686-Speed 2627.18 samples/sec   Loss 2.4971   LearningRate 0.0034   Epoch: 16   Global Step: 676070   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:10,605-Speed 2613.86 samples/sec   Loss 2.4828   LearningRate 0.0034   Epoch: 16   Global Step: 676080   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:14,501-Speed 2628.76 samples/sec   Loss 2.5166   LearningRate 0.0034   Epoch: 16   Global Step: 676090   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:18,395-Speed 2630.14 samples/sec   Loss 2.6147   LearningRate 0.0034   Epoch: 16   Global Step: 676100   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:22,295-Speed 2627.02 samples/sec   Loss 2.5127   LearningRate 0.0034   Epoch: 16   Global Step: 676110   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:26,192-Speed 2628.36 samples/sec   Loss 2.5536   LearningRate 0.0034   Epoch: 16   Global Step: 676120   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:30,092-Speed 2626.49 samples/sec   Loss 2.5337   LearningRate 0.0034   Epoch: 16   Global Step: 676130   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:34,013-Speed 2611.85 samples/sec   Loss 2.5553   LearningRate 0.0034   Epoch: 16   Global Step: 676140   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:37,910-Speed 2628.44 samples/sec   Loss 2.5245   LearningRate 0.0034   Epoch: 16   Global Step: 676150   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:43:41,802-Speed 2631.43 samples/sec   Loss 2.4679   LearningRate 0.0034   Epoch: 16   Global Step: 676160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:43:45,696-Speed 2630.38 samples/sec   Loss 2.5756   LearningRate 0.0034   Epoch: 16   Global Step: 676170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:43:49,588-Speed 2631.96 samples/sec   Loss 2.4751   LearningRate 0.0034   Epoch: 16   Global Step: 676180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:43:53,480-Speed 2631.44 samples/sec   Loss 2.5641   LearningRate 0.0034   Epoch: 16   Global Step: 676190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:43:57,373-Speed 2631.42 samples/sec   Loss 2.5586   LearningRate 0.0034   Epoch: 16   Global Step: 676200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:01,266-Speed 2631.47 samples/sec   Loss 2.5400   LearningRate 0.0034   Epoch: 16   Global Step: 676210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:05,165-Speed 2626.65 samples/sec   Loss 2.5896   LearningRate 0.0034   Epoch: 16   Global Step: 676220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:09,060-Speed 2629.78 samples/sec   Loss 2.5929   LearningRate 0.0034   Epoch: 16   Global Step: 676230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:12,952-Speed 2631.42 samples/sec   Loss 2.5955   LearningRate 0.0034   Epoch: 16   Global Step: 676240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:16,945-Speed 2565.36 samples/sec   Loss 2.5749   LearningRate 0.0034   Epoch: 16   Global Step: 676250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:20,835-Speed 2632.46 samples/sec   Loss 2.5671   LearningRate 0.0034   Epoch: 16   Global Step: 676260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:44:24,729-Speed 2630.73 samples/sec   Loss 2.6379   LearningRate 0.0034   Epoch: 16   Global Step: 676270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:44:28,626-Speed 2628.27 samples/sec   Loss 2.5216   LearningRate 0.0034   Epoch: 16   Global Step: 676280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:44:32,547-Speed 2612.60 samples/sec   Loss 2.5130   LearningRate 0.0034   Epoch: 16   Global Step: 676290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:36,444-Speed 2628.44 samples/sec   Loss 2.6068   LearningRate 0.0034   Epoch: 16   Global Step: 676300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:40,341-Speed 2628.00 samples/sec   Loss 2.4979   LearningRate 0.0034   Epoch: 16   Global Step: 676310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:44,261-Speed 2612.78 samples/sec   Loss 2.4751   LearningRate 0.0034   Epoch: 16   Global Step: 676320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:48,160-Speed 2627.28 samples/sec   Loss 2.5504   LearningRate 0.0034   Epoch: 16   Global Step: 676330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:52,143-Speed 2571.97 samples/sec   Loss 2.5942   LearningRate 0.0034   Epoch: 16   Global Step: 676340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:56,047-Speed 2623.58 samples/sec   Loss 2.5726   LearningRate 0.0034   Epoch: 16   Global Step: 676350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:44:59,960-Speed 2617.41 samples/sec   Loss 2.5401   LearningRate 0.0034   Epoch: 16   Global Step: 676360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:45:03,862-Speed 2625.12 samples/sec   Loss 2.5101   LearningRate 0.0034   Epoch: 16   Global Step: 676370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:45:07,761-Speed 2626.78 samples/sec   Loss 2.5632   LearningRate 0.0034   Epoch: 16   Global Step: 676380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:45:11,662-Speed 2625.81 samples/sec   Loss 2.5269   LearningRate 0.0034   Epoch: 16   Global Step: 676390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:45:15,561-Speed 2626.81 samples/sec   Loss 2.5751   LearningRate 0.0034   Epoch: 16   Global Step: 676400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:45:19,512-Speed 2592.24 samples/sec   Loss 2.5637   LearningRate 0.0034   Epoch: 16   Global Step: 676410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:45:23,421-Speed 2620.78 samples/sec   Loss 2.5782   LearningRate 0.0034   Epoch: 16   Global Step: 676420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:45:27,329-Speed 2620.50 samples/sec   Loss 2.5980   LearningRate 0.0034   Epoch: 16   Global Step: 676430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:45:31,324-Speed 2564.56 samples/sec   Loss 2.5568   LearningRate 0.0034   Epoch: 16   Global Step: 676440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:45:35,208-Speed 2636.98 samples/sec   Loss 2.5308   LearningRate 0.0034   Epoch: 16   Global Step: 676450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:45:39,118-Speed 2619.33 samples/sec   Loss 2.4845   LearningRate 0.0034   Epoch: 16   Global Step: 676460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:45:43,026-Speed 2620.97 samples/sec   Loss 2.5496   LearningRate 0.0034   Epoch: 16   Global Step: 676470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:45:46,906-Speed 2639.87 samples/sec   Loss 2.5900   LearningRate 0.0034   Epoch: 16   Global Step: 676480   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:45:50,806-Speed 2626.62 samples/sec   Loss 2.5353   LearningRate 0.0034   Epoch: 16   Global Step: 676490   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:45:54,704-Speed 2627.65 samples/sec   Loss 2.5339   LearningRate 0.0034   Epoch: 16   Global Step: 676500   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:45:58,612-Speed 2620.72 samples/sec   Loss 2.5404   LearningRate 0.0034   Epoch: 16   Global Step: 676510   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:46:02,517-Speed 2623.19 samples/sec   Loss 2.5204   LearningRate 0.0034   Epoch: 16   Global Step: 676520   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:46:06,409-Speed 2631.88 samples/sec   Loss 2.5416   LearningRate 0.0034   Epoch: 16   Global Step: 676530   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:46:10,303-Speed 2629.76 samples/sec   Loss 2.5394   LearningRate 0.0034   Epoch: 16   Global Step: 676540   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:46:14,198-Speed 2629.37 samples/sec   Loss 2.5641   LearningRate 0.0034   Epoch: 16   Global Step: 676550   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:46:18,099-Speed 2626.34 samples/sec   Loss 2.5736   LearningRate 0.0034   Epoch: 16   Global Step: 676560   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:46:22,044-Speed 2596.48 samples/sec   Loss 2.5686   LearningRate 0.0034   Epoch: 16   Global Step: 676570   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:46:25,937-Speed 2631.33 samples/sec   Loss 2.5127   LearningRate 0.0034   Epoch: 16   Global Step: 676580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:29,852-Speed 2616.26 samples/sec   Loss 2.4901   LearningRate 0.0034   Epoch: 16   Global Step: 676590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:33,749-Speed 2628.43 samples/sec   Loss 2.5968   LearningRate 0.0034   Epoch: 16   Global Step: 676600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:37,646-Speed 2627.85 samples/sec   Loss 2.5987   LearningRate 0.0034   Epoch: 16   Global Step: 676610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:41,544-Speed 2627.93 samples/sec   Loss 2.6009   LearningRate 0.0034   Epoch: 16   Global Step: 676620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:45,443-Speed 2626.84 samples/sec   Loss 2.4777   LearningRate 0.0034   Epoch: 16   Global Step: 676630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:49,467-Speed 2545.51 samples/sec   Loss 2.4687   LearningRate 0.0034   Epoch: 16   Global Step: 676640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:53,453-Speed 2578.21 samples/sec   Loss 2.5543   LearningRate 0.0034   Epoch: 16   Global Step: 676650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:46:57,350-Speed 2628.04 samples/sec   Loss 2.5906   LearningRate 0.0034   Epoch: 16   Global Step: 676660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:01,243-Speed 2631.08 samples/sec   Loss 2.5380   LearningRate 0.0034   Epoch: 16   Global Step: 676670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:05,142-Speed 2627.38 samples/sec   Loss 2.5925   LearningRate 0.0034   Epoch: 16   Global Step: 676680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:47:09,047-Speed 2622.81 samples/sec   Loss 2.4942   LearningRate 0.0034   Epoch: 16   Global Step: 676690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:47:12,916-Speed 2647.27 samples/sec   Loss 2.5514   LearningRate 0.0034   Epoch: 16   Global Step: 676700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:16,806-Speed 2633.30 samples/sec   Loss 2.5119   LearningRate 0.0034   Epoch: 16   Global Step: 676710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:20,700-Speed 2630.06 samples/sec   Loss 2.5178   LearningRate 0.0034   Epoch: 16   Global Step: 676720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:24,595-Speed 2629.54 samples/sec   Loss 2.5619   LearningRate 0.0034   Epoch: 16   Global Step: 676730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:28,489-Speed 2630.54 samples/sec   Loss 2.5684   LearningRate 0.0034   Epoch: 16   Global Step: 676740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:32,382-Speed 2631.17 samples/sec   Loss 2.4655   LearningRate 0.0034   Epoch: 16   Global Step: 676750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:36,276-Speed 2630.10 samples/sec   Loss 2.5938   LearningRate 0.0034   Epoch: 16   Global Step: 676760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:40,173-Speed 2629.40 samples/sec   Loss 2.5930   LearningRate 0.0034   Epoch: 16   Global Step: 676770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:44,061-Speed 2634.29 samples/sec   Loss 2.5138   LearningRate 0.0034   Epoch: 16   Global Step: 676780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:47:47,968-Speed 2621.89 samples/sec   Loss 2.5637   LearningRate 0.0034   Epoch: 16   Global Step: 676790   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:47:51,920-Speed 2592.26 samples/sec   Loss 2.5264   LearningRate 0.0034   Epoch: 16   Global Step: 676800   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:47:55,896-Speed 2575.70 samples/sec   Loss 2.4919   LearningRate 0.0034   Epoch: 16   Global Step: 676810   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:47:59,808-Speed 2619.10 samples/sec   Loss 2.5448   LearningRate 0.0034   Epoch: 16   Global Step: 676820   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:48:03,704-Speed 2629.10 samples/sec   Loss 2.5422   LearningRate 0.0034   Epoch: 16   Global Step: 676830   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:48:07,603-Speed 2626.47 samples/sec   Loss 2.5114   LearningRate 0.0034   Epoch: 16   Global Step: 676840   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:48:11,508-Speed 2622.68 samples/sec   Loss 2.4418   LearningRate 0.0034   Epoch: 16   Global Step: 676850   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:48:15,409-Speed 2625.80 samples/sec   Loss 2.5391   LearningRate 0.0034   Epoch: 16   Global Step: 676860   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:48:19,306-Speed 2628.15 samples/sec   Loss 2.5168   LearningRate 0.0034   Epoch: 16   Global Step: 676870   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:48:23,205-Speed 2627.45 samples/sec   Loss 2.5824   LearningRate 0.0034   Epoch: 16   Global Step: 676880   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:48:27,104-Speed 2627.96 samples/sec   Loss 2.5814   LearningRate 0.0034   Epoch: 16   Global Step: 676890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:31,077-Speed 2578.41 samples/sec   Loss 2.5373   LearningRate 0.0034   Epoch: 16   Global Step: 676900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:34,981-Speed 2623.42 samples/sec   Loss 2.5101   LearningRate 0.0034   Epoch: 16   Global Step: 676910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:38,881-Speed 2626.22 samples/sec   Loss 2.5605   LearningRate 0.0034   Epoch: 16   Global Step: 676920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:42,795-Speed 2617.22 samples/sec   Loss 2.4634   LearningRate 0.0034   Epoch: 16   Global Step: 676930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:46,694-Speed 2627.27 samples/sec   Loss 2.4737   LearningRate 0.0034   Epoch: 16   Global Step: 676940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:50,586-Speed 2631.39 samples/sec   Loss 2.5352   LearningRate 0.0034   Epoch: 16   Global Step: 676950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:54,480-Speed 2630.94 samples/sec   Loss 2.5140   LearningRate 0.0034   Epoch: 16   Global Step: 676960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:48:58,375-Speed 2629.61 samples/sec   Loss 2.4677   LearningRate 0.0034   Epoch: 16   Global Step: 676970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:49:02,262-Speed 2635.03 samples/sec   Loss 2.5493   LearningRate 0.0034   Epoch: 16   Global Step: 676980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:49:06,162-Speed 2625.75 samples/sec   Loss 2.5248   LearningRate 0.0034   Epoch: 16   Global Step: 676990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:10,060-Speed 2628.08 samples/sec   Loss 2.5697   LearningRate 0.0034   Epoch: 16   Global Step: 677000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:13,969-Speed 2619.80 samples/sec   Loss 2.5179   LearningRate 0.0034   Epoch: 16   Global Step: 677010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:17,859-Speed 2634.02 samples/sec   Loss 2.5135   LearningRate 0.0034   Epoch: 16   Global Step: 677020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:21,751-Speed 2632.32 samples/sec   Loss 2.4926   LearningRate 0.0034   Epoch: 16   Global Step: 677030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:25,644-Speed 2630.29 samples/sec   Loss 2.6614   LearningRate 0.0034   Epoch: 16   Global Step: 677040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:29,549-Speed 2623.64 samples/sec   Loss 2.5535   LearningRate 0.0034   Epoch: 16   Global Step: 677050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:33,444-Speed 2629.24 samples/sec   Loss 2.5615   LearningRate 0.0034   Epoch: 16   Global Step: 677060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:37,340-Speed 2628.43 samples/sec   Loss 2.5483   LearningRate 0.0034   Epoch: 16   Global Step: 677070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:41,241-Speed 2625.66 samples/sec   Loss 2.5278   LearningRate 0.0034   Epoch: 16   Global Step: 677080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:45,111-Speed 2647.19 samples/sec   Loss 2.5527   LearningRate 0.0034   Epoch: 16   Global Step: 677090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:49,007-Speed 2628.78 samples/sec   Loss 2.5881   LearningRate 0.0034   Epoch: 16   Global Step: 677100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:52,938-Speed 2606.21 samples/sec   Loss 2.5748   LearningRate 0.0034   Epoch: 16   Global Step: 677110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:49:56,831-Speed 2631.02 samples/sec   Loss 2.5164   LearningRate 0.0034   Epoch: 16   Global Step: 677120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:50:00,708-Speed 2642.64 samples/sec   Loss 2.6361   LearningRate 0.0034   Epoch: 16   Global Step: 677130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:04,607-Speed 2627.35 samples/sec   Loss 2.5064   LearningRate 0.0034   Epoch: 16   Global Step: 677140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:08,496-Speed 2633.23 samples/sec   Loss 2.5254   LearningRate 0.0034   Epoch: 16   Global Step: 677150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:12,387-Speed 2632.35 samples/sec   Loss 2.6267   LearningRate 0.0034   Epoch: 16   Global Step: 677160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:16,281-Speed 2630.52 samples/sec   Loss 2.5113   LearningRate 0.0034   Epoch: 16   Global Step: 677170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:20,175-Speed 2630.23 samples/sec   Loss 2.5200   LearningRate 0.0034   Epoch: 16   Global Step: 677180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:24,073-Speed 2628.38 samples/sec   Loss 2.5021   LearningRate 0.0034   Epoch: 16   Global Step: 677190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:27,966-Speed 2631.03 samples/sec   Loss 2.5084   LearningRate 0.0034   Epoch: 16   Global Step: 677200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:31,864-Speed 2628.04 samples/sec   Loss 2.5499   LearningRate 0.0034   Epoch: 16   Global Step: 677210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:35,759-Speed 2629.55 samples/sec   Loss 2.4856   LearningRate 0.0034   Epoch: 16   Global Step: 677220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:39,660-Speed 2625.44 samples/sec   Loss 2.5414   LearningRate 0.0034   Epoch: 16   Global Step: 677230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:50:43,542-Speed 2638.27 samples/sec   Loss 2.5497   LearningRate 0.0034   Epoch: 16   Global Step: 677240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:47,458-Speed 2616.28 samples/sec   Loss 2.5331   LearningRate 0.0034   Epoch: 16   Global Step: 677250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:51,353-Speed 2629.26 samples/sec   Loss 2.5230   LearningRate 0.0034   Epoch: 16   Global Step: 677260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:55,241-Speed 2634.76 samples/sec   Loss 2.5451   LearningRate 0.0034   Epoch: 16   Global Step: 677270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:50:59,134-Speed 2631.42 samples/sec   Loss 2.5032   LearningRate 0.0034   Epoch: 16   Global Step: 677280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:03,062-Speed 2607.56 samples/sec   Loss 2.4930   LearningRate 0.0034   Epoch: 16   Global Step: 677290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:06,965-Speed 2624.08 samples/sec   Loss 2.5515   LearningRate 0.0034   Epoch: 16   Global Step: 677300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:10,863-Speed 2627.97 samples/sec   Loss 2.5822   LearningRate 0.0034   Epoch: 16   Global Step: 677310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:14,766-Speed 2623.62 samples/sec   Loss 2.5326   LearningRate 0.0034   Epoch: 16   Global Step: 677320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:18,669-Speed 2623.96 samples/sec   Loss 2.5681   LearningRate 0.0034   Epoch: 16   Global Step: 677330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:22,581-Speed 2618.75 samples/sec   Loss 2.5258   LearningRate 0.0034   Epoch: 16   Global Step: 677340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:51:26,460-Speed 2640.72 samples/sec   Loss 2.5161   LearningRate 0.0034   Epoch: 16   Global Step: 677350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:30,358-Speed 2627.54 samples/sec   Loss 2.6057   LearningRate 0.0034   Epoch: 16   Global Step: 677360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:34,271-Speed 2617.50 samples/sec   Loss 2.4749   LearningRate 0.0034   Epoch: 16   Global Step: 677370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:38,179-Speed 2620.94 samples/sec   Loss 2.5307   LearningRate 0.0034   Epoch: 16   Global Step: 677380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:42,089-Speed 2619.09 samples/sec   Loss 2.5031   LearningRate 0.0034   Epoch: 16   Global Step: 677390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:46,021-Speed 2605.62 samples/sec   Loss 2.5128   LearningRate 0.0034   Epoch: 16   Global Step: 677400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:49,921-Speed 2626.18 samples/sec   Loss 2.5098   LearningRate 0.0034   Epoch: 16   Global Step: 677410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:53,843-Speed 2611.95 samples/sec   Loss 2.5295   LearningRate 0.0034   Epoch: 16   Global Step: 677420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:51:57,735-Speed 2631.18 samples/sec   Loss 2.5261   LearningRate 0.0034   Epoch: 16   Global Step: 677430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:01,631-Speed 2629.54 samples/sec   Loss 2.5587   LearningRate 0.0034   Epoch: 16   Global Step: 677440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:05,543-Speed 2618.02 samples/sec   Loss 2.6318   LearningRate 0.0034   Epoch: 16   Global Step: 677450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:52:09,424-Speed 2639.01 samples/sec   Loss 2.5431   LearningRate 0.0034   Epoch: 16   Global Step: 677460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:13,331-Speed 2621.45 samples/sec   Loss 2.5427   LearningRate 0.0034   Epoch: 16   Global Step: 677470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:17,228-Speed 2628.98 samples/sec   Loss 2.5295   LearningRate 0.0034   Epoch: 16   Global Step: 677480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:21,139-Speed 2618.68 samples/sec   Loss 2.5060   LearningRate 0.0034   Epoch: 16   Global Step: 677490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:25,042-Speed 2624.35 samples/sec   Loss 2.4969   LearningRate 0.0034   Epoch: 16   Global Step: 677500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:28,962-Speed 2613.70 samples/sec   Loss 2.5067   LearningRate 0.0034   Epoch: 16   Global Step: 677510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:32,889-Speed 2607.49 samples/sec   Loss 2.5669   LearningRate 0.0034   Epoch: 16   Global Step: 677520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:36,828-Speed 2600.54 samples/sec   Loss 2.5825   LearningRate 0.0034   Epoch: 16   Global Step: 677530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:52:40,724-Speed 2629.12 samples/sec   Loss 2.5491   LearningRate 0.0034   Epoch: 16   Global Step: 677540   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:52:44,634-Speed 2619.78 samples/sec   Loss 2.4736   LearningRate 0.0034   Epoch: 16   Global Step: 677550   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:52:48,529-Speed 2629.43 samples/sec   Loss 2.5006   LearningRate 0.0034   Epoch: 16   Global Step: 677560   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:52:52,422-Speed 2631.69 samples/sec   Loss 2.5594   LearningRate 0.0034   Epoch: 16   Global Step: 677570   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:52:56,316-Speed 2630.14 samples/sec   Loss 2.5228   LearningRate 0.0034   Epoch: 16   Global Step: 677580   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:53:00,230-Speed 2617.91 samples/sec   Loss 2.5277   LearningRate 0.0034   Epoch: 16   Global Step: 677590   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:53:04,131-Speed 2625.08 samples/sec   Loss 2.5238   LearningRate 0.0034   Epoch: 16   Global Step: 677600   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:53:08,045-Speed 2617.75 samples/sec   Loss 2.5169   LearningRate 0.0034   Epoch: 16   Global Step: 677610   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:53:11,953-Speed 2621.00 samples/sec   Loss 2.5746   LearningRate 0.0034   Epoch: 16   Global Step: 677620   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:53:15,851-Speed 2627.23 samples/sec   Loss 2.5322   LearningRate 0.0034   Epoch: 16   Global Step: 677630   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:53:19,763-Speed 2617.96 samples/sec   Loss 2.5543   LearningRate 0.0034   Epoch: 16   Global Step: 677640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:23,676-Speed 2618.12 samples/sec   Loss 2.4673   LearningRate 0.0034   Epoch: 16   Global Step: 677650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:27,576-Speed 2626.83 samples/sec   Loss 2.5123   LearningRate 0.0034   Epoch: 16   Global Step: 677660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:31,470-Speed 2629.93 samples/sec   Loss 2.5235   LearningRate 0.0034   Epoch: 16   Global Step: 677670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:35,362-Speed 2631.67 samples/sec   Loss 2.4825   LearningRate 0.0034   Epoch: 16   Global Step: 677680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:39,265-Speed 2624.33 samples/sec   Loss 2.5582   LearningRate 0.0034   Epoch: 16   Global Step: 677690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:43,160-Speed 2629.89 samples/sec   Loss 2.5722   LearningRate 0.0034   Epoch: 16   Global Step: 677700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:47,054-Speed 2630.24 samples/sec   Loss 2.5366   LearningRate 0.0034   Epoch: 16   Global Step: 677710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:50,948-Speed 2631.09 samples/sec   Loss 2.5409   LearningRate 0.0034   Epoch: 16   Global Step: 677720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:54,842-Speed 2630.09 samples/sec   Loss 2.5652   LearningRate 0.0034   Epoch: 16   Global Step: 677730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:53:58,736-Speed 2630.80 samples/sec   Loss 2.5750   LearningRate 0.0034   Epoch: 16   Global Step: 677740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:54:02,634-Speed 2626.89 samples/sec   Loss 2.5037   LearningRate 0.0033   Epoch: 16   Global Step: 677750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:54:06,545-Speed 2619.36 samples/sec   Loss 2.5137   LearningRate 0.0033   Epoch: 16   Global Step: 677760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:54:10,595-Speed 2528.95 samples/sec   Loss 2.5240   LearningRate 0.0033   Epoch: 16   Global Step: 677770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:54:14,563-Speed 2581.54 samples/sec   Loss 2.5618   LearningRate 0.0033   Epoch: 16   Global Step: 677780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:18,466-Speed 2623.70 samples/sec   Loss 2.5232   LearningRate 0.0033   Epoch: 16   Global Step: 677790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:22,365-Speed 2627.67 samples/sec   Loss 2.5566   LearningRate 0.0033   Epoch: 16   Global Step: 677800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:26,264-Speed 2626.29 samples/sec   Loss 2.5500   LearningRate 0.0033   Epoch: 16   Global Step: 677810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:30,169-Speed 2622.76 samples/sec   Loss 2.5647   LearningRate 0.0033   Epoch: 16   Global Step: 677820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:34,070-Speed 2625.28 samples/sec   Loss 2.4820   LearningRate 0.0033   Epoch: 16   Global Step: 677830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:37,965-Speed 2629.80 samples/sec   Loss 2.5692   LearningRate 0.0033   Epoch: 16   Global Step: 677840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:41,856-Speed 2632.33 samples/sec   Loss 2.4770   LearningRate 0.0033   Epoch: 16   Global Step: 677850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:45,759-Speed 2624.32 samples/sec   Loss 2.5493   LearningRate 0.0033   Epoch: 16   Global Step: 677860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:49,655-Speed 2629.86 samples/sec   Loss 2.5692   LearningRate 0.0033   Epoch: 16   Global Step: 677870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:54:53,554-Speed 2626.56 samples/sec   Loss 2.4382   LearningRate 0.0033   Epoch: 16   Global Step: 677880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-15 23:54:57,441-Speed 2635.10 samples/sec   Loss 2.5731   LearningRate 0.0033   Epoch: 16   Global Step: 677890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:55:01,352-Speed 2618.67 samples/sec   Loss 2.5420   LearningRate 0.0033   Epoch: 16   Global Step: 677900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:55:05,259-Speed 2621.21 samples/sec   Loss 2.4764   LearningRate 0.0033   Epoch: 16   Global Step: 677910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:55:09,163-Speed 2623.61 samples/sec   Loss 2.4935   LearningRate 0.0033   Epoch: 16   Global Step: 677920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:55:13,092-Speed 2607.39 samples/sec   Loss 2.5234   LearningRate 0.0033   Epoch: 16   Global Step: 677930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:55:17,004-Speed 2618.61 samples/sec   Loss 2.4789   LearningRate 0.0033   Epoch: 16   Global Step: 677940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:55:20,884-Speed 2639.59 samples/sec   Loss 2.4959   LearningRate 0.0033   Epoch: 16   Global Step: 677950   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:24,784-Speed 2626.91 samples/sec   Loss 2.5600   LearningRate 0.0033   Epoch: 16   Global Step: 677960   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:28,683-Speed 2626.85 samples/sec   Loss 2.4586   LearningRate 0.0033   Epoch: 16   Global Step: 677970   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:32,578-Speed 2629.86 samples/sec   Loss 2.5194   LearningRate 0.0033   Epoch: 16   Global Step: 677980   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:36,473-Speed 2629.48 samples/sec   Loss 2.5219   LearningRate 0.0033   Epoch: 16   Global Step: 677990   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:40,398-Speed 2609.90 samples/sec   Loss 2.5136   LearningRate 0.0033   Epoch: 16   Global Step: 678000   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:44,289-Speed 2632.10 samples/sec   Loss 2.5962   LearningRate 0.0033   Epoch: 16   Global Step: 678010   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:48,185-Speed 2629.20 samples/sec   Loss 2.6100   LearningRate 0.0033   Epoch: 16   Global Step: 678020   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:52,079-Speed 2630.30 samples/sec   Loss 2.4737   LearningRate 0.0033   Epoch: 16   Global Step: 678030   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:56,010-Speed 2606.04 samples/sec   Loss 2.5508   LearningRate 0.0033   Epoch: 16   Global Step: 678040   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:55:59,903-Speed 2630.72 samples/sec   Loss 2.4882   LearningRate 0.0033   Epoch: 16   Global Step: 678050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:03,797-Speed 2630.20 samples/sec   Loss 2.5285   LearningRate 0.0033   Epoch: 16   Global Step: 678060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:07,689-Speed 2631.90 samples/sec   Loss 2.5556   LearningRate 0.0033   Epoch: 16   Global Step: 678070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:11,595-Speed 2622.13 samples/sec   Loss 2.4762   LearningRate 0.0033   Epoch: 16   Global Step: 678080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:15,504-Speed 2620.39 samples/sec   Loss 2.4619   LearningRate 0.0033   Epoch: 16   Global Step: 678090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:19,406-Speed 2624.98 samples/sec   Loss 2.4292   LearningRate 0.0033   Epoch: 16   Global Step: 678100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:23,309-Speed 2624.20 samples/sec   Loss 2.5329   LearningRate 0.0033   Epoch: 16   Global Step: 678110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:27,221-Speed 2619.41 samples/sec   Loss 2.5762   LearningRate 0.0033   Epoch: 16   Global Step: 678120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:56:31,100-Speed 2640.07 samples/sec   Loss 2.5030   LearningRate 0.0033   Epoch: 16   Global Step: 678130   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:56:35,000-Speed 2626.27 samples/sec   Loss 2.5025   LearningRate 0.0033   Epoch: 16   Global Step: 678140   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:56:38,893-Speed 2630.78 samples/sec   Loss 2.5179   LearningRate 0.0033   Epoch: 16   Global Step: 678150   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:56:42,797-Speed 2623.58 samples/sec   Loss 2.5363   LearningRate 0.0033   Epoch: 16   Global Step: 678160   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:56:46,686-Speed 2633.74 samples/sec   Loss 2.4609   LearningRate 0.0033   Epoch: 16   Global Step: 678170   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:56:50,586-Speed 2626.24 samples/sec   Loss 2.5944   LearningRate 0.0033   Epoch: 16   Global Step: 678180   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:56:54,483-Speed 2628.51 samples/sec   Loss 2.5584   LearningRate 0.0033   Epoch: 16   Global Step: 678190   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:56:58,376-Speed 2631.48 samples/sec   Loss 2.4787   LearningRate 0.0033   Epoch: 16   Global Step: 678200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:02,269-Speed 2630.58 samples/sec   Loss 2.5837   LearningRate 0.0033   Epoch: 16   Global Step: 678210   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:06,165-Speed 2628.67 samples/sec   Loss 2.5474   LearningRate 0.0033   Epoch: 16   Global Step: 678220   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:10,058-Speed 2630.78 samples/sec   Loss 2.4432   LearningRate 0.0033   Epoch: 16   Global Step: 678230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:57:13,951-Speed 2631.18 samples/sec   Loss 2.4933   LearningRate 0.0033   Epoch: 16   Global Step: 678240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:57:17,843-Speed 2632.53 samples/sec   Loss 2.5666   LearningRate 0.0033   Epoch: 16   Global Step: 678250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:57:21,741-Speed 2627.91 samples/sec   Loss 2.5685   LearningRate 0.0033   Epoch: 16   Global Step: 678260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:57:25,635-Speed 2630.71 samples/sec   Loss 2.5378   LearningRate 0.0033   Epoch: 16   Global Step: 678270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:57:29,532-Speed 2628.36 samples/sec   Loss 2.5016   LearningRate 0.0033   Epoch: 16   Global Step: 678280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:57:33,429-Speed 2628.20 samples/sec   Loss 2.5378   LearningRate 0.0033   Epoch: 16   Global Step: 678290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:57:37,297-Speed 2647.63 samples/sec   Loss 2.5099   LearningRate 0.0033   Epoch: 16   Global Step: 678300   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:41,191-Speed 2630.70 samples/sec   Loss 2.5801   LearningRate 0.0033   Epoch: 16   Global Step: 678310   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:45,087-Speed 2628.84 samples/sec   Loss 2.5060   LearningRate 0.0033   Epoch: 16   Global Step: 678320   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:48,988-Speed 2626.25 samples/sec   Loss 2.4967   LearningRate 0.0033   Epoch: 16   Global Step: 678330   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:52,905-Speed 2614.88 samples/sec   Loss 2.5126   LearningRate 0.0033   Epoch: 16   Global Step: 678340   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:57:56,808-Speed 2624.37 samples/sec   Loss 2.5007   LearningRate 0.0033   Epoch: 16   Global Step: 678350   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:00,704-Speed 2628.77 samples/sec   Loss 2.5057   LearningRate 0.0033   Epoch: 16   Global Step: 678360   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:04,603-Speed 2626.90 samples/sec   Loss 2.5128   LearningRate 0.0033   Epoch: 16   Global Step: 678370   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:08,512-Speed 2620.51 samples/sec   Loss 2.5300   LearningRate 0.0033   Epoch: 16   Global Step: 678380   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:12,407-Speed 2629.65 samples/sec   Loss 2.5070   LearningRate 0.0033   Epoch: 16   Global Step: 678390   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:16,327-Speed 2612.81 samples/sec   Loss 2.5157   LearningRate 0.0033   Epoch: 16   Global Step: 678400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:58:20,233-Speed 2622.13 samples/sec   Loss 2.5085   LearningRate 0.0033   Epoch: 16   Global Step: 678410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:58:24,128-Speed 2629.94 samples/sec   Loss 2.5507   LearningRate 0.0033   Epoch: 16   Global Step: 678420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:58:28,023-Speed 2629.49 samples/sec   Loss 2.5291   LearningRate 0.0033   Epoch: 16   Global Step: 678430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:58:32,142-Speed 2487.53 samples/sec   Loss 2.5758   LearningRate 0.0033   Epoch: 16   Global Step: 678440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:58:36,119-Speed 2574.94 samples/sec   Loss 2.5813   LearningRate 0.0033   Epoch: 16   Global Step: 678450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:58:40,012-Speed 2631.03 samples/sec   Loss 2.5071   LearningRate 0.0033   Epoch: 16   Global Step: 678460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:58:43,889-Speed 2642.14 samples/sec   Loss 2.5091   LearningRate 0.0033   Epoch: 16   Global Step: 678470   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:47,812-Speed 2610.92 samples/sec   Loss 2.5474   LearningRate 0.0033   Epoch: 16   Global Step: 678480   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:51,703-Speed 2632.13 samples/sec   Loss 2.4975   LearningRate 0.0033   Epoch: 16   Global Step: 678490   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:55,616-Speed 2618.42 samples/sec   Loss 2.5070   LearningRate 0.0033   Epoch: 16   Global Step: 678500   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:58:59,510-Speed 2630.07 samples/sec   Loss 2.4931   LearningRate 0.0033   Epoch: 16   Global Step: 678510   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:59:03,406-Speed 2629.64 samples/sec   Loss 2.5749   LearningRate 0.0033   Epoch: 16   Global Step: 678520   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:59:07,309-Speed 2623.85 samples/sec   Loss 2.5079   LearningRate 0.0033   Epoch: 16   Global Step: 678530   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:59:11,202-Speed 2630.83 samples/sec   Loss 2.5368   LearningRate 0.0033   Epoch: 16   Global Step: 678540   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:59:15,115-Speed 2617.74 samples/sec   Loss 2.4831   LearningRate 0.0033   Epoch: 16   Global Step: 678550   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:59:19,027-Speed 2618.81 samples/sec   Loss 2.4948   LearningRate 0.0033   Epoch: 16   Global Step: 678560   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-15 23:59:22,938-Speed 2619.44 samples/sec   Loss 2.5702   LearningRate 0.0033   Epoch: 16   Global Step: 678570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:26,837-Speed 2627.23 samples/sec   Loss 2.5011   LearningRate 0.0033   Epoch: 16   Global Step: 678580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:30,728-Speed 2633.33 samples/sec   Loss 2.4770   LearningRate 0.0033   Epoch: 16   Global Step: 678590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:34,621-Speed 2630.59 samples/sec   Loss 2.5693   LearningRate 0.0033   Epoch: 16   Global Step: 678600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:38,518-Speed 2628.36 samples/sec   Loss 2.4601   LearningRate 0.0033   Epoch: 16   Global Step: 678610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:42,423-Speed 2623.01 samples/sec   Loss 2.5182   LearningRate 0.0033   Epoch: 16   Global Step: 678620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:46,319-Speed 2629.10 samples/sec   Loss 2.4940   LearningRate 0.0033   Epoch: 16   Global Step: 678630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:50,213-Speed 2630.38 samples/sec   Loss 2.5385   LearningRate 0.0033   Epoch: 16   Global Step: 678640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:54,120-Speed 2622.21 samples/sec   Loss 2.5021   LearningRate 0.0033   Epoch: 16   Global Step: 678650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-15 23:59:58,022-Speed 2624.87 samples/sec   Loss 2.4897   LearningRate 0.0033   Epoch: 16   Global Step: 678660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:01,921-Speed 2627.43 samples/sec   Loss 2.4597   LearningRate 0.0033   Epoch: 16   Global Step: 678670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:00:05,797-Speed 2642.08 samples/sec   Loss 2.5622   LearningRate 0.0033   Epoch: 16   Global Step: 678680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:09,693-Speed 2628.96 samples/sec   Loss 2.5322   LearningRate 0.0033   Epoch: 16   Global Step: 678690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:13,599-Speed 2621.93 samples/sec   Loss 2.5087   LearningRate 0.0033   Epoch: 16   Global Step: 678700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:17,492-Speed 2631.31 samples/sec   Loss 2.4952   LearningRate 0.0033   Epoch: 16   Global Step: 678710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:21,391-Speed 2627.43 samples/sec   Loss 2.4416   LearningRate 0.0033   Epoch: 16   Global Step: 678720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:25,288-Speed 2627.69 samples/sec   Loss 2.4429   LearningRate 0.0033   Epoch: 16   Global Step: 678730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:29,275-Speed 2569.28 samples/sec   Loss 2.5600   LearningRate 0.0033   Epoch: 16   Global Step: 678740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:33,180-Speed 2623.51 samples/sec   Loss 2.5474   LearningRate 0.0033   Epoch: 16   Global Step: 678750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:37,077-Speed 2628.09 samples/sec   Loss 2.5045   LearningRate 0.0033   Epoch: 16   Global Step: 678760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:40,990-Speed 2617.88 samples/sec   Loss 2.5454   LearningRate 0.0033   Epoch: 16   Global Step: 678770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:44,898-Speed 2621.43 samples/sec   Loss 2.5462   LearningRate 0.0033   Epoch: 16   Global Step: 678780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:00:48,770-Speed 2645.55 samples/sec   Loss 2.5378   LearningRate 0.0033   Epoch: 16   Global Step: 678790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:52,676-Speed 2622.44 samples/sec   Loss 2.5008   LearningRate 0.0033   Epoch: 16   Global Step: 678800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:00:56,578-Speed 2624.59 samples/sec   Loss 2.5361   LearningRate 0.0033   Epoch: 16   Global Step: 678810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:00,477-Speed 2626.85 samples/sec   Loss 2.5380   LearningRate 0.0033   Epoch: 16   Global Step: 678820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:04,372-Speed 2629.11 samples/sec   Loss 2.5451   LearningRate 0.0033   Epoch: 16   Global Step: 678830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:08,267-Speed 2630.31 samples/sec   Loss 2.5051   LearningRate 0.0033   Epoch: 16   Global Step: 678840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:12,158-Speed 2632.71 samples/sec   Loss 2.4682   LearningRate 0.0033   Epoch: 16   Global Step: 678850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:16,052-Speed 2629.78 samples/sec   Loss 2.5001   LearningRate 0.0033   Epoch: 16   Global Step: 678860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:19,950-Speed 2628.20 samples/sec   Loss 2.4721   LearningRate 0.0033   Epoch: 16   Global Step: 678870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:23,857-Speed 2621.13 samples/sec   Loss 2.5268   LearningRate 0.0033   Epoch: 16   Global Step: 678880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:27,754-Speed 2628.66 samples/sec   Loss 2.5433   LearningRate 0.0033   Epoch: 16   Global Step: 678890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:01:31,633-Speed 2640.31 samples/sec   Loss 2.6039   LearningRate 0.0033   Epoch: 16   Global Step: 678900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:35,532-Speed 2626.97 samples/sec   Loss 2.4861   LearningRate 0.0033   Epoch: 16   Global Step: 678910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:39,427-Speed 2629.85 samples/sec   Loss 2.5356   LearningRate 0.0033   Epoch: 16   Global Step: 678920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:43,319-Speed 2632.10 samples/sec   Loss 2.5139   LearningRate 0.0033   Epoch: 16   Global Step: 678930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:47,212-Speed 2630.46 samples/sec   Loss 2.5066   LearningRate 0.0033   Epoch: 16   Global Step: 678940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:51,169-Speed 2588.72 samples/sec   Loss 2.5723   LearningRate 0.0033   Epoch: 16   Global Step: 678950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:01:55,059-Speed 2633.41 samples/sec   Loss 2.5013   LearningRate 0.0033   Epoch: 16   Global Step: 678960   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:01:58,953-Speed 2630.96 samples/sec   Loss 2.5172   LearningRate 0.0033   Epoch: 16   Global Step: 678970   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:02,849-Speed 2628.65 samples/sec   Loss 2.5244   LearningRate 0.0033   Epoch: 16   Global Step: 678980   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:06,742-Speed 2631.54 samples/sec   Loss 2.4643   LearningRate 0.0033   Epoch: 16   Global Step: 678990   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:10,651-Speed 2619.67 samples/sec   Loss 2.4470   LearningRate 0.0033   Epoch: 16   Global Step: 679000   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:14,553-Speed 2625.84 samples/sec   Loss 2.5138   LearningRate 0.0033   Epoch: 16   Global Step: 679010   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:18,459-Speed 2622.53 samples/sec   Loss 2.4867   LearningRate 0.0033   Epoch: 16   Global Step: 679020   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:22,364-Speed 2623.15 samples/sec   Loss 2.4404   LearningRate 0.0033   Epoch: 16   Global Step: 679030   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:26,260-Speed 2629.52 samples/sec   Loss 2.4533   LearningRate 0.0033   Epoch: 16   Global Step: 679040   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:30,166-Speed 2621.76 samples/sec   Loss 2.5360   LearningRate 0.0033   Epoch: 16   Global Step: 679050   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:02:34,061-Speed 2629.41 samples/sec   Loss 2.4256   LearningRate 0.0033   Epoch: 16   Global Step: 679060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:02:37,956-Speed 2629.60 samples/sec   Loss 2.4524   LearningRate 0.0033   Epoch: 16   Global Step: 679070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:02:41,845-Speed 2633.94 samples/sec   Loss 2.5156   LearningRate 0.0033   Epoch: 16   Global Step: 679080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:02:45,746-Speed 2625.67 samples/sec   Loss 2.4527   LearningRate 0.0033   Epoch: 16   Global Step: 679090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:02:49,658-Speed 2618.57 samples/sec   Loss 2.4883   LearningRate 0.0033   Epoch: 16   Global Step: 679100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:02:53,552-Speed 2630.28 samples/sec   Loss 2.4546   LearningRate 0.0033   Epoch: 16   Global Step: 679110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:02:57,468-Speed 2616.46 samples/sec   Loss 2.4576   LearningRate 0.0033   Epoch: 16   Global Step: 679120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:03:01,383-Speed 2616.03 samples/sec   Loss 2.5319   LearningRate 0.0033   Epoch: 16   Global Step: 679130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:03:05,274-Speed 2632.31 samples/sec   Loss 2.5088   LearningRate 0.0033   Epoch: 16   Global Step: 679140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:03:09,169-Speed 2629.15 samples/sec   Loss 2.5260   LearningRate 0.0033   Epoch: 16   Global Step: 679150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:03:13,069-Speed 2627.09 samples/sec   Loss 2.4687   LearningRate 0.0033   Epoch: 16   Global Step: 679160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:03:16,964-Speed 2628.95 samples/sec   Loss 2.5122   LearningRate 0.0033   Epoch: 16   Global Step: 679170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:03:20,843-Speed 2640.79 samples/sec   Loss 2.5154   LearningRate 0.0033   Epoch: 16   Global Step: 679180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:03:24,737-Speed 2630.31 samples/sec   Loss 2.4428   LearningRate 0.0033   Epoch: 16   Global Step: 679190   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:28,636-Speed 2627.81 samples/sec   Loss 2.5240   LearningRate 0.0033   Epoch: 16   Global Step: 679200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:32,538-Speed 2624.77 samples/sec   Loss 2.4648   LearningRate 0.0033   Epoch: 16   Global Step: 679210   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:36,430-Speed 2631.61 samples/sec   Loss 2.5739   LearningRate 0.0033   Epoch: 16   Global Step: 679220   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:40,324-Speed 2629.76 samples/sec   Loss 2.4584   LearningRate 0.0033   Epoch: 16   Global Step: 679230   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:44,229-Speed 2623.08 samples/sec   Loss 2.5299   LearningRate 0.0033   Epoch: 16   Global Step: 679240   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:48,141-Speed 2619.15 samples/sec   Loss 2.4338   LearningRate 0.0033   Epoch: 16   Global Step: 679250   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:52,034-Speed 2630.39 samples/sec   Loss 2.6049   LearningRate 0.0033   Epoch: 16   Global Step: 679260   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:55,949-Speed 2616.69 samples/sec   Loss 2.4839   LearningRate 0.0033   Epoch: 16   Global Step: 679270   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:03:59,841-Speed 2631.75 samples/sec   Loss 2.4600   LearningRate 0.0033   Epoch: 16   Global Step: 679280   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:04:03,751-Speed 2619.68 samples/sec   Loss 2.4923   LearningRate 0.0033   Epoch: 16   Global Step: 679290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:07,653-Speed 2624.62 samples/sec   Loss 2.4835   LearningRate 0.0033   Epoch: 16   Global Step: 679300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:11,546-Speed 2631.46 samples/sec   Loss 2.5194   LearningRate 0.0033   Epoch: 16   Global Step: 679310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:15,444-Speed 2627.61 samples/sec   Loss 2.5202   LearningRate 0.0033   Epoch: 16   Global Step: 679320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:19,338-Speed 2630.52 samples/sec   Loss 2.4610   LearningRate 0.0033   Epoch: 16   Global Step: 679330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:23,232-Speed 2630.90 samples/sec   Loss 2.5147   LearningRate 0.0033   Epoch: 16   Global Step: 679340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:27,126-Speed 2630.11 samples/sec   Loss 2.5443   LearningRate 0.0033   Epoch: 16   Global Step: 679350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:31,043-Speed 2615.56 samples/sec   Loss 2.5723   LearningRate 0.0033   Epoch: 16   Global Step: 679360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:34,958-Speed 2615.80 samples/sec   Loss 2.5225   LearningRate 0.0033   Epoch: 16   Global Step: 679370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:38,851-Speed 2631.26 samples/sec   Loss 2.5266   LearningRate 0.0033   Epoch: 16   Global Step: 679380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:42,721-Speed 2646.29 samples/sec   Loss 2.5087   LearningRate 0.0033   Epoch: 16   Global Step: 679390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:46,618-Speed 2629.00 samples/sec   Loss 2.5197   LearningRate 0.0033   Epoch: 16   Global Step: 679400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:50,512-Speed 2630.24 samples/sec   Loss 2.5124   LearningRate 0.0033   Epoch: 16   Global Step: 679410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:54,407-Speed 2629.79 samples/sec   Loss 2.4417   LearningRate 0.0033   Epoch: 16   Global Step: 679420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:04:58,300-Speed 2631.00 samples/sec   Loss 2.4945   LearningRate 0.0033   Epoch: 16   Global Step: 679430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:02,200-Speed 2626.67 samples/sec   Loss 2.4463   LearningRate 0.0033   Epoch: 16   Global Step: 679440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:06,102-Speed 2625.02 samples/sec   Loss 2.4652   LearningRate 0.0033   Epoch: 16   Global Step: 679450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:10,002-Speed 2625.83 samples/sec   Loss 2.5635   LearningRate 0.0033   Epoch: 16   Global Step: 679460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:13,894-Speed 2631.75 samples/sec   Loss 2.5524   LearningRate 0.0033   Epoch: 16   Global Step: 679470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:17,788-Speed 2631.43 samples/sec   Loss 2.4806   LearningRate 0.0033   Epoch: 16   Global Step: 679480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:21,681-Speed 2630.66 samples/sec   Loss 2.4958   LearningRate 0.0033   Epoch: 16   Global Step: 679490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:05:25,596-Speed 2616.68 samples/sec   Loss 2.5032   LearningRate 0.0033   Epoch: 16   Global Step: 679500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:05:30,544-Speed 2070.02 samples/sec   Loss 2.4938   LearningRate 0.0033   Epoch: 16   Global Step: 679510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:34,450-Speed 2622.07 samples/sec   Loss 2.4892   LearningRate 0.0033   Epoch: 16   Global Step: 679520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:38,344-Speed 2630.58 samples/sec   Loss 2.4410   LearningRate 0.0033   Epoch: 16   Global Step: 679530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:42,236-Speed 2631.82 samples/sec   Loss 2.4461   LearningRate 0.0033   Epoch: 16   Global Step: 679540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:46,130-Speed 2629.92 samples/sec   Loss 2.4513   LearningRate 0.0033   Epoch: 16   Global Step: 679550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:50,021-Speed 2632.33 samples/sec   Loss 2.4397   LearningRate 0.0033   Epoch: 16   Global Step: 679560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:53,909-Speed 2634.25 samples/sec   Loss 2.5523   LearningRate 0.0033   Epoch: 16   Global Step: 679570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:05:57,803-Speed 2630.60 samples/sec   Loss 2.4905   LearningRate 0.0033   Epoch: 16   Global Step: 679580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:01,697-Speed 2630.93 samples/sec   Loss 2.4799   LearningRate 0.0033   Epoch: 16   Global Step: 679590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:05,590-Speed 2630.42 samples/sec   Loss 2.5407   LearningRate 0.0033   Epoch: 16   Global Step: 679600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:09,490-Speed 2626.36 samples/sec   Loss 2.4812   LearningRate 0.0033   Epoch: 16   Global Step: 679610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:06:13,366-Speed 2642.45 samples/sec   Loss 2.5108   LearningRate 0.0033   Epoch: 16   Global Step: 679620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:17,293-Speed 2608.18 samples/sec   Loss 2.4844   LearningRate 0.0033   Epoch: 16   Global Step: 679630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:21,187-Speed 2630.53 samples/sec   Loss 2.4751   LearningRate 0.0033   Epoch: 16   Global Step: 679640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:25,117-Speed 2606.28 samples/sec   Loss 2.5346   LearningRate 0.0033   Epoch: 16   Global Step: 679650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:29,013-Speed 2629.84 samples/sec   Loss 2.4513   LearningRate 0.0033   Epoch: 16   Global Step: 679660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:32,908-Speed 2630.14 samples/sec   Loss 2.5300   LearningRate 0.0033   Epoch: 16   Global Step: 679670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:36,805-Speed 2628.00 samples/sec   Loss 2.4786   LearningRate 0.0033   Epoch: 16   Global Step: 679680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:40,727-Speed 2611.10 samples/sec   Loss 2.5274   LearningRate 0.0033   Epoch: 16   Global Step: 679690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:44,618-Speed 2633.21 samples/sec   Loss 2.5475   LearningRate 0.0033   Epoch: 16   Global Step: 679700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:48,511-Speed 2631.00 samples/sec   Loss 2.4721   LearningRate 0.0033   Epoch: 16   Global Step: 679710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:52,390-Speed 2640.59 samples/sec   Loss 2.5170   LearningRate 0.0033   Epoch: 16   Global Step: 679720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:06:56,294-Speed 2622.93 samples/sec   Loss 2.4747   LearningRate 0.0033   Epoch: 16   Global Step: 679730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:00,191-Speed 2628.37 samples/sec   Loss 2.5099   LearningRate 0.0033   Epoch: 16   Global Step: 679740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:04,085-Speed 2630.54 samples/sec   Loss 2.5470   LearningRate 0.0033   Epoch: 16   Global Step: 679750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:07,986-Speed 2626.20 samples/sec   Loss 2.4960   LearningRate 0.0033   Epoch: 16   Global Step: 679760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:11,877-Speed 2631.55 samples/sec   Loss 2.5354   LearningRate 0.0033   Epoch: 16   Global Step: 679770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:15,774-Speed 2628.17 samples/sec   Loss 2.5290   LearningRate 0.0033   Epoch: 16   Global Step: 679780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:19,677-Speed 2625.12 samples/sec   Loss 2.5123   LearningRate 0.0033   Epoch: 16   Global Step: 679790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:23,572-Speed 2629.75 samples/sec   Loss 2.5253   LearningRate 0.0033   Epoch: 16   Global Step: 679800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:27,472-Speed 2627.94 samples/sec   Loss 2.4910   LearningRate 0.0033   Epoch: 16   Global Step: 679810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:31,369-Speed 2628.36 samples/sec   Loss 2.5140   LearningRate 0.0033   Epoch: 16   Global Step: 679820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:07:35,285-Speed 2616.57 samples/sec   Loss 2.5513   LearningRate 0.0033   Epoch: 16   Global Step: 679830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:07:39,186-Speed 2625.62 samples/sec   Loss 2.4600   LearningRate 0.0033   Epoch: 16   Global Step: 679840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:07:43,090-Speed 2623.54 samples/sec   Loss 2.4867   LearningRate 0.0033   Epoch: 16   Global Step: 679850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:46,987-Speed 2628.57 samples/sec   Loss 2.5563   LearningRate 0.0033   Epoch: 16   Global Step: 679860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:50,890-Speed 2625.15 samples/sec   Loss 2.5258   LearningRate 0.0033   Epoch: 16   Global Step: 679870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:54,787-Speed 2628.04 samples/sec   Loss 2.4052   LearningRate 0.0033   Epoch: 16   Global Step: 679880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:07:58,678-Speed 2633.36 samples/sec   Loss 2.4770   LearningRate 0.0033   Epoch: 16   Global Step: 679890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:08:02,572-Speed 2629.95 samples/sec   Loss 2.4606   LearningRate 0.0033   Epoch: 16   Global Step: 679900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:08:06,468-Speed 2628.67 samples/sec   Loss 2.3917   LearningRate 0.0033   Epoch: 16   Global Step: 679910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:08:10,340-Speed 2645.13 samples/sec   Loss 2.5427   LearningRate 0.0033   Epoch: 16   Global Step: 679920   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:16,240-Speed 2627.64 samples/sec   Loss 2.4539   LearningRate 0.0033   Epoch: 16   Global Step: 679930   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:20,143-Speed 2629.87 samples/sec   Loss 2.4921   LearningRate 0.0033   Epoch: 16   Global Step: 679940   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:24,056-Speed 2617.23 samples/sec   Loss 2.5525   LearningRate 0.0033   Epoch: 16   Global Step: 679950   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:28,008-Speed 2632.74 samples/sec   Loss 2.4612   LearningRate 0.0033   Epoch: 16   Global Step: 679960   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:31,898-Speed 2632.82 samples/sec   Loss 2.4844   LearningRate 0.0033   Epoch: 16   Global Step: 679970   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:35,788-Speed 2633.17 samples/sec   Loss 2.5364   LearningRate 0.0033   Epoch: 16   Global Step: 679980   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:39,686-Speed 2627.71 samples/sec   Loss 2.4865   LearningRate 0.0033   Epoch: 16   Global Step: 679990   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:08:43,589-Speed 2627.57 samples/sec   Loss 2.4445   LearningRate 0.0033   Epoch: 16   Global Step: 680000   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:09:26,558-[lfw][680000]XNorm: 22.639444
Training: 2022-04-16 00:09:26,559-[lfw][680000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 00:09:26,560-[lfw][680000]Accuracy-Highest: 0.99833
Training: 2022-04-16 00:10:16,348-[cfp_fp][680000]XNorm: 22.287411
Training: 2022-04-16 00:10:16,349-[cfp_fp][680000]Accuracy-Flip: 0.99200+-0.00512
Training: 2022-04-16 00:10:16,350-[cfp_fp][680000]Accuracy-Highest: 0.99271
Training: 2022-04-16 00:10:59,418-[agedb_30][680000]XNorm: 23.108925
Training: 2022-04-16 00:10:59,419-[agedb_30][680000]Accuracy-Flip: 0.98167+-0.00615
Training: 2022-04-16 00:10:59,419-[agedb_30][680000]Accuracy-Highest: 0.98233
Training: 2022-04-16 00:11:03,302-Speed 73.29 samples/sec   Loss 2.5173   LearningRate 0.0033   Epoch: 16   Global Step: 680010   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:11:07,168-Speed 2649.34 samples/sec   Loss 2.4233   LearningRate 0.0033   Epoch: 16   Global Step: 680020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:11,060-Speed 2631.61 samples/sec   Loss 2.4900   LearningRate 0.0032   Epoch: 16   Global Step: 680030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:14,934-Speed 2644.43 samples/sec   Loss 2.5073   LearningRate 0.0032   Epoch: 16   Global Step: 680040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:18,807-Speed 2644.44 samples/sec   Loss 2.5580   LearningRate 0.0032   Epoch: 16   Global Step: 680050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:22,711-Speed 2623.41 samples/sec   Loss 2.4318   LearningRate 0.0032   Epoch: 16   Global Step: 680060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:26,589-Speed 2642.64 samples/sec   Loss 2.5031   LearningRate 0.0032   Epoch: 16   Global Step: 680070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:30,466-Speed 2641.32 samples/sec   Loss 2.5334   LearningRate 0.0032   Epoch: 16   Global Step: 680080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:34,349-Speed 2637.97 samples/sec   Loss 2.5684   LearningRate 0.0032   Epoch: 16   Global Step: 680090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:38,250-Speed 2625.92 samples/sec   Loss 2.5566   LearningRate 0.0032   Epoch: 16   Global Step: 680100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:42,134-Speed 2636.65 samples/sec   Loss 2.4637   LearningRate 0.0032   Epoch: 16   Global Step: 680110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:45,992-Speed 2655.32 samples/sec   Loss 2.5069   LearningRate 0.0032   Epoch: 16   Global Step: 680120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:49,878-Speed 2635.82 samples/sec   Loss 2.4862   LearningRate 0.0032   Epoch: 16   Global Step: 680130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:53,770-Speed 2631.73 samples/sec   Loss 2.5459   LearningRate 0.0032   Epoch: 16   Global Step: 680140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:11:57,641-Speed 2645.47 samples/sec   Loss 2.5067   LearningRate 0.0032   Epoch: 16   Global Step: 680150   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:01,536-Speed 2629.86 samples/sec   Loss 2.5335   LearningRate 0.0032   Epoch: 16   Global Step: 680160   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:05,441-Speed 2622.78 samples/sec   Loss 2.4257   LearningRate 0.0032   Epoch: 16   Global Step: 680170   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:09,348-Speed 2622.57 samples/sec   Loss 2.3731   LearningRate 0.0032   Epoch: 16   Global Step: 680180   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:13,240-Speed 2631.60 samples/sec   Loss 2.4666   LearningRate 0.0032   Epoch: 16   Global Step: 680190   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:17,186-Speed 2596.71 samples/sec   Loss 2.5606   LearningRate 0.0032   Epoch: 16   Global Step: 680200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:21,082-Speed 2629.16 samples/sec   Loss 2.5073   LearningRate 0.0032   Epoch: 16   Global Step: 680210   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:24,971-Speed 2633.32 samples/sec   Loss 2.5503   LearningRate 0.0032   Epoch: 16   Global Step: 680220   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:28,871-Speed 2626.34 samples/sec   Loss 2.5096   LearningRate 0.0032   Epoch: 16   Global Step: 680230   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:32,772-Speed 2625.44 samples/sec   Loss 2.5256   LearningRate 0.0032   Epoch: 16   Global Step: 680240   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:12:36,668-Speed 2629.14 samples/sec   Loss 2.4498   LearningRate 0.0032   Epoch: 16   Global Step: 680250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:12:40,575-Speed 2621.64 samples/sec   Loss 2.5119   LearningRate 0.0032   Epoch: 16   Global Step: 680260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:12:44,472-Speed 2628.48 samples/sec   Loss 2.4796   LearningRate 0.0032   Epoch: 16   Global Step: 680270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:12:48,362-Speed 2633.08 samples/sec   Loss 2.4665   LearningRate 0.0032   Epoch: 16   Global Step: 680280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:12:52,253-Speed 2632.42 samples/sec   Loss 2.4086   LearningRate 0.0032   Epoch: 16   Global Step: 680290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:12:56,145-Speed 2631.35 samples/sec   Loss 2.4794   LearningRate 0.0032   Epoch: 16   Global Step: 680300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:00,046-Speed 2625.73 samples/sec   Loss 2.4897   LearningRate 0.0032   Epoch: 16   Global Step: 680310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:03,946-Speed 2626.30 samples/sec   Loss 2.5129   LearningRate 0.0032   Epoch: 16   Global Step: 680320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:07,847-Speed 2625.86 samples/sec   Loss 2.4894   LearningRate 0.0032   Epoch: 16   Global Step: 680330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:11,740-Speed 2631.07 samples/sec   Loss 2.5443   LearningRate 0.0032   Epoch: 16   Global Step: 680340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:15,644-Speed 2623.67 samples/sec   Loss 2.5170   LearningRate 0.0032   Epoch: 16   Global Step: 680350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:13:19,517-Speed 2644.45 samples/sec   Loss 2.5617   LearningRate 0.0032   Epoch: 16   Global Step: 680360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:23,406-Speed 2633.95 samples/sec   Loss 2.4756   LearningRate 0.0032   Epoch: 16   Global Step: 680370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:27,296-Speed 2633.65 samples/sec   Loss 2.4996   LearningRate 0.0032   Epoch: 16   Global Step: 680380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:31,199-Speed 2624.55 samples/sec   Loss 2.4632   LearningRate 0.0032   Epoch: 16   Global Step: 680390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:35,123-Speed 2609.68 samples/sec   Loss 2.4849   LearningRate 0.0032   Epoch: 16   Global Step: 680400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:39,015-Speed 2632.12 samples/sec   Loss 2.4710   LearningRate 0.0032   Epoch: 16   Global Step: 680410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:42,904-Speed 2634.29 samples/sec   Loss 2.4854   LearningRate 0.0032   Epoch: 16   Global Step: 680420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:46,801-Speed 2628.31 samples/sec   Loss 2.4795   LearningRate 0.0032   Epoch: 16   Global Step: 680430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:50,690-Speed 2634.18 samples/sec   Loss 2.5202   LearningRate 0.0032   Epoch: 16   Global Step: 680440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:54,582-Speed 2631.09 samples/sec   Loss 2.4429   LearningRate 0.0032   Epoch: 16   Global Step: 680450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:13:58,478-Speed 2629.47 samples/sec   Loss 2.3939   LearningRate 0.0032   Epoch: 16   Global Step: 680460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:14:02,369-Speed 2632.17 samples/sec   Loss 2.3844   LearningRate 0.0032   Epoch: 16   Global Step: 680470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:14:06,257-Speed 2634.29 samples/sec   Loss 2.4827   LearningRate 0.0032   Epoch: 16   Global Step: 680480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:14:10,151-Speed 2630.50 samples/sec   Loss 2.4833   LearningRate 0.0032   Epoch: 16   Global Step: 680490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:14:14,061-Speed 2619.61 samples/sec   Loss 2.5361   LearningRate 0.0032   Epoch: 16   Global Step: 680500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:14:17,936-Speed 2643.34 samples/sec   Loss 2.5053   LearningRate 0.0032   Epoch: 16   Global Step: 680510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:21,826-Speed 2633.03 samples/sec   Loss 2.3717   LearningRate 0.0032   Epoch: 16   Global Step: 680520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:25,737-Speed 2619.43 samples/sec   Loss 2.4544   LearningRate 0.0032   Epoch: 16   Global Step: 680530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:29,627-Speed 2632.94 samples/sec   Loss 2.4722   LearningRate 0.0032   Epoch: 16   Global Step: 680540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:33,516-Speed 2633.45 samples/sec   Loss 2.3688   LearningRate 0.0032   Epoch: 16   Global Step: 680550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:37,409-Speed 2630.87 samples/sec   Loss 2.4242   LearningRate 0.0032   Epoch: 16   Global Step: 680560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:41,298-Speed 2634.05 samples/sec   Loss 2.4299   LearningRate 0.0032   Epoch: 16   Global Step: 680570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:45,190-Speed 2631.74 samples/sec   Loss 2.4250   LearningRate 0.0032   Epoch: 16   Global Step: 680580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:49,083-Speed 2631.49 samples/sec   Loss 2.5109   LearningRate 0.0032   Epoch: 16   Global Step: 680590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:52,990-Speed 2621.55 samples/sec   Loss 2.5010   LearningRate 0.0032   Epoch: 16   Global Step: 680600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:14:56,888-Speed 2628.39 samples/sec   Loss 2.4976   LearningRate 0.0032   Epoch: 16   Global Step: 680610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:15:00,790-Speed 2625.13 samples/sec   Loss 2.5344   LearningRate 0.0032   Epoch: 16   Global Step: 680620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:15:04,684-Speed 2630.51 samples/sec   Loss 2.4673   LearningRate 0.0032   Epoch: 16   Global Step: 680630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:15:08,577-Speed 2631.16 samples/sec   Loss 2.5378   LearningRate 0.0032   Epoch: 16   Global Step: 680640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:15:12,460-Speed 2638.02 samples/sec   Loss 2.4865   LearningRate 0.0032   Epoch: 16   Global Step: 680650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:16,352-Speed 2631.27 samples/sec   Loss 2.5188   LearningRate 0.0032   Epoch: 16   Global Step: 680660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:20,266-Speed 2616.92 samples/sec   Loss 2.4593   LearningRate 0.0032   Epoch: 16   Global Step: 680670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:24,188-Speed 2611.92 samples/sec   Loss 2.4028   LearningRate 0.0032   Epoch: 16   Global Step: 680680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:28,076-Speed 2634.66 samples/sec   Loss 2.4473   LearningRate 0.0032   Epoch: 16   Global Step: 680690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:32,017-Speed 2598.69 samples/sec   Loss 2.4848   LearningRate 0.0032   Epoch: 16   Global Step: 680700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:35,906-Speed 2634.07 samples/sec   Loss 2.4614   LearningRate 0.0032   Epoch: 16   Global Step: 680710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:39,805-Speed 2627.22 samples/sec   Loss 2.3920   LearningRate 0.0032   Epoch: 16   Global Step: 680720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:43,699-Speed 2630.43 samples/sec   Loss 2.4713   LearningRate 0.0032   Epoch: 16   Global Step: 680730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:47,595-Speed 2630.42 samples/sec   Loss 2.5763   LearningRate 0.0032   Epoch: 16   Global Step: 680740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:15:51,512-Speed 2614.28 samples/sec   Loss 2.4911   LearningRate 0.0032   Epoch: 16   Global Step: 680750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:15:55,437-Speed 2609.86 samples/sec   Loss 2.4737   LearningRate 0.0032   Epoch: 16   Global Step: 680760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:15:59,335-Speed 2628.02 samples/sec   Loss 2.5530   LearningRate 0.0032   Epoch: 16   Global Step: 680770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:16:03,231-Speed 2628.98 samples/sec   Loss 2.4662   LearningRate 0.0032   Epoch: 16   Global Step: 680780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:16:07,103-Speed 2645.10 samples/sec   Loss 2.5294   LearningRate 0.0032   Epoch: 16   Global Step: 680790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:11,000-Speed 2628.90 samples/sec   Loss 2.5429   LearningRate 0.0032   Epoch: 16   Global Step: 680800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:14,893-Speed 2630.95 samples/sec   Loss 2.4371   LearningRate 0.0032   Epoch: 16   Global Step: 680810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:18,855-Speed 2585.11 samples/sec   Loss 2.4775   LearningRate 0.0032   Epoch: 16   Global Step: 680820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:22,748-Speed 2631.80 samples/sec   Loss 2.4800   LearningRate 0.0032   Epoch: 16   Global Step: 680830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:26,649-Speed 2625.82 samples/sec   Loss 2.4352   LearningRate 0.0032   Epoch: 16   Global Step: 680840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:30,552-Speed 2624.11 samples/sec   Loss 2.4986   LearningRate 0.0032   Epoch: 16   Global Step: 680850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:34,453-Speed 2626.07 samples/sec   Loss 2.4933   LearningRate 0.0032   Epoch: 16   Global Step: 680860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:38,348-Speed 2630.24 samples/sec   Loss 2.4476   LearningRate 0.0032   Epoch: 16   Global Step: 680870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:42,259-Speed 2618.84 samples/sec   Loss 2.4865   LearningRate 0.0032   Epoch: 16   Global Step: 680880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:16:46,153-Speed 2630.65 samples/sec   Loss 2.4734   LearningRate 0.0032   Epoch: 16   Global Step: 680890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:16:50,058-Speed 2622.42 samples/sec   Loss 2.4833   LearningRate 0.0032   Epoch: 16   Global Step: 680900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:16:53,966-Speed 2621.49 samples/sec   Loss 2.4826   LearningRate 0.0032   Epoch: 16   Global Step: 680910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:16:57,862-Speed 2629.26 samples/sec   Loss 2.4951   LearningRate 0.0032   Epoch: 16   Global Step: 680920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:17:01,758-Speed 2628.66 samples/sec   Loss 2.5339   LearningRate 0.0032   Epoch: 16   Global Step: 680930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:17:05,664-Speed 2622.24 samples/sec   Loss 2.4652   LearningRate 0.0032   Epoch: 16   Global Step: 680940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:17:09,573-Speed 2619.61 samples/sec   Loss 2.4070   LearningRate 0.0032   Epoch: 16   Global Step: 680950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:17:13,480-Speed 2622.17 samples/sec   Loss 2.5577   LearningRate 0.0032   Epoch: 16   Global Step: 680960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:17:17,376-Speed 2629.04 samples/sec   Loss 2.5217   LearningRate 0.0032   Epoch: 16   Global Step: 680970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:17:21,248-Speed 2645.99 samples/sec   Loss 2.4884   LearningRate 0.0032   Epoch: 16   Global Step: 680980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:25,189-Speed 2598.42 samples/sec   Loss 2.4841   LearningRate 0.0032   Epoch: 16   Global Step: 680990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:29,091-Speed 2625.10 samples/sec   Loss 2.5107   LearningRate 0.0032   Epoch: 16   Global Step: 681000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:32,985-Speed 2630.44 samples/sec   Loss 2.4716   LearningRate 0.0032   Epoch: 16   Global Step: 681010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:37,016-Speed 2540.74 samples/sec   Loss 2.4368   LearningRate 0.0032   Epoch: 16   Global Step: 681020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:40,924-Speed 2621.57 samples/sec   Loss 2.4147   LearningRate 0.0032   Epoch: 16   Global Step: 681030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:44,821-Speed 2628.26 samples/sec   Loss 2.4030   LearningRate 0.0032   Epoch: 16   Global Step: 681040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:48,724-Speed 2624.09 samples/sec   Loss 2.4537   LearningRate 0.0032   Epoch: 16   Global Step: 681050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:52,659-Speed 2603.64 samples/sec   Loss 2.4772   LearningRate 0.0032   Epoch: 16   Global Step: 681060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:17:56,556-Speed 2628.27 samples/sec   Loss 2.4970   LearningRate 0.0032   Epoch: 16   Global Step: 681070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:00,448-Speed 2631.35 samples/sec   Loss 2.4596   LearningRate 0.0032   Epoch: 16   Global Step: 681080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:18:04,346-Speed 2627.75 samples/sec   Loss 2.4880   LearningRate 0.0032   Epoch: 16   Global Step: 681090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:18:08,475-Speed 2481.53 samples/sec   Loss 2.4000   LearningRate 0.0032   Epoch: 16   Global Step: 681100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:18:12,407-Speed 2604.82 samples/sec   Loss 2.4774   LearningRate 0.0032   Epoch: 16   Global Step: 681110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:18:16,280-Speed 2644.29 samples/sec   Loss 2.4936   LearningRate 0.0032   Epoch: 16   Global Step: 681120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:20,181-Speed 2625.66 samples/sec   Loss 2.4968   LearningRate 0.0032   Epoch: 16   Global Step: 681130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:24,087-Speed 2622.34 samples/sec   Loss 2.5250   LearningRate 0.0032   Epoch: 16   Global Step: 681140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:27,979-Speed 2631.89 samples/sec   Loss 2.4850   LearningRate 0.0032   Epoch: 16   Global Step: 681150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:31,879-Speed 2626.45 samples/sec   Loss 2.4487   LearningRate 0.0032   Epoch: 16   Global Step: 681160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:35,774-Speed 2629.51 samples/sec   Loss 2.4692   LearningRate 0.0032   Epoch: 16   Global Step: 681170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:39,667-Speed 2630.83 samples/sec   Loss 2.4027   LearningRate 0.0032   Epoch: 16   Global Step: 681180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:43,566-Speed 2627.45 samples/sec   Loss 2.3664   LearningRate 0.0032   Epoch: 16   Global Step: 681190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:47,463-Speed 2627.80 samples/sec   Loss 2.4769   LearningRate 0.0032   Epoch: 16   Global Step: 681200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:51,359-Speed 2629.47 samples/sec   Loss 2.4189   LearningRate 0.0032   Epoch: 16   Global Step: 681210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:55,237-Speed 2641.18 samples/sec   Loss 2.4652   LearningRate 0.0032   Epoch: 16   Global Step: 681220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:18:59,129-Speed 2631.40 samples/sec   Loss 2.5366   LearningRate 0.0032   Epoch: 16   Global Step: 681230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:03,053-Speed 2610.37 samples/sec   Loss 2.4596   LearningRate 0.0032   Epoch: 16   Global Step: 681240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:06,965-Speed 2618.88 samples/sec   Loss 2.4488   LearningRate 0.0032   Epoch: 16   Global Step: 681250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:10,866-Speed 2625.11 samples/sec   Loss 2.5021   LearningRate 0.0032   Epoch: 16   Global Step: 681260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:14,767-Speed 2625.97 samples/sec   Loss 2.4859   LearningRate 0.0032   Epoch: 16   Global Step: 681270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:18,687-Speed 2613.11 samples/sec   Loss 2.5501   LearningRate 0.0032   Epoch: 16   Global Step: 681280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:22,597-Speed 2619.54 samples/sec   Loss 2.4824   LearningRate 0.0032   Epoch: 16   Global Step: 681290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:26,504-Speed 2621.52 samples/sec   Loss 2.4534   LearningRate 0.0032   Epoch: 16   Global Step: 681300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:30,400-Speed 2629.25 samples/sec   Loss 2.4622   LearningRate 0.0032   Epoch: 16   Global Step: 681310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:34,297-Speed 2628.01 samples/sec   Loss 2.5048   LearningRate 0.0032   Epoch: 16   Global Step: 681320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:19:38,168-Speed 2645.49 samples/sec   Loss 2.4994   LearningRate 0.0032   Epoch: 16   Global Step: 681330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:42,071-Speed 2625.19 samples/sec   Loss 2.4953   LearningRate 0.0032   Epoch: 16   Global Step: 681340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:45,970-Speed 2626.54 samples/sec   Loss 2.4165   LearningRate 0.0032   Epoch: 16   Global Step: 681350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:19:49,857-Speed 2635.84 samples/sec   Loss 2.3584   LearningRate 0.0032   Epoch: 16   Global Step: 681360   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:19:53,771-Speed 2616.76 samples/sec   Loss 2.4187   LearningRate 0.0032   Epoch: 16   Global Step: 681370   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:19:57,679-Speed 2621.36 samples/sec   Loss 2.5196   LearningRate 0.0032   Epoch: 16   Global Step: 681380   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:01,577-Speed 2627.46 samples/sec   Loss 2.4404   LearningRate 0.0032   Epoch: 16   Global Step: 681390   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:05,532-Speed 2589.75 samples/sec   Loss 2.4483   LearningRate 0.0032   Epoch: 16   Global Step: 681400   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:09,434-Speed 2624.94 samples/sec   Loss 2.5031   LearningRate 0.0032   Epoch: 16   Global Step: 681410   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:13,337-Speed 2624.59 samples/sec   Loss 2.5484   LearningRate 0.0032   Epoch: 16   Global Step: 681420   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:17,234-Speed 2628.11 samples/sec   Loss 2.4682   LearningRate 0.0032   Epoch: 16   Global Step: 681430   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:21,157-Speed 2611.15 samples/sec   Loss 2.4370   LearningRate 0.0032   Epoch: 16   Global Step: 681440   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:25,067-Speed 2619.53 samples/sec   Loss 2.4441   LearningRate 0.0032   Epoch: 16   Global Step: 681450   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:20:28,967-Speed 2626.83 samples/sec   Loss 2.4748   LearningRate 0.0032   Epoch: 16   Global Step: 681460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:20:32,862-Speed 2629.53 samples/sec   Loss 2.4796   LearningRate 0.0032   Epoch: 16   Global Step: 681470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:20:36,755-Speed 2630.88 samples/sec   Loss 2.4939   LearningRate 0.0032   Epoch: 16   Global Step: 681480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:20:40,673-Speed 2613.90 samples/sec   Loss 2.4774   LearningRate 0.0032   Epoch: 16   Global Step: 681490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:20:44,631-Speed 2588.10 samples/sec   Loss 2.4668   LearningRate 0.0032   Epoch: 16   Global Step: 681500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:20:48,541-Speed 2623.92 samples/sec   Loss 2.4356   LearningRate 0.0032   Epoch: 16   Global Step: 681510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:20:52,435-Speed 2629.88 samples/sec   Loss 2.4614   LearningRate 0.0032   Epoch: 16   Global Step: 681520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:20:56,343-Speed 2620.99 samples/sec   Loss 2.4807   LearningRate 0.0032   Epoch: 16   Global Step: 681530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:21:00,334-Speed 2566.67 samples/sec   Loss 2.4930   LearningRate 0.0032   Epoch: 16   Global Step: 681540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:21:04,241-Speed 2621.71 samples/sec   Loss 2.5360   LearningRate 0.0032   Epoch: 16   Global Step: 681550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:21:08,141-Speed 2626.10 samples/sec   Loss 2.4880   LearningRate 0.0032   Epoch: 16   Global Step: 681560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:21:12,055-Speed 2617.04 samples/sec   Loss 2.4824   LearningRate 0.0032   Epoch: 16   Global Step: 681570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:21:15,932-Speed 2642.39 samples/sec   Loss 2.4665   LearningRate 0.0032   Epoch: 16   Global Step: 681580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:21:19,824-Speed 2631.71 samples/sec   Loss 2.5242   LearningRate 0.0032   Epoch: 16   Global Step: 681590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:21:23,722-Speed 2627.41 samples/sec   Loss 2.4348   LearningRate 0.0032   Epoch: 16   Global Step: 681600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:21:27,622-Speed 2626.99 samples/sec   Loss 2.5026   LearningRate 0.0032   Epoch: 16   Global Step: 681610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:21:31,533-Speed 2618.72 samples/sec   Loss 2.4900   LearningRate 0.0032   Epoch: 16   Global Step: 681620   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:21:35,440-Speed 2621.28 samples/sec   Loss 2.4440   LearningRate 0.0032   Epoch: 16   Global Step: 681630   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:21:39,353-Speed 2617.52 samples/sec   Loss 2.4588   LearningRate 0.0032   Epoch: 16   Global Step: 681640   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:21:43,271-Speed 2614.33 samples/sec   Loss 2.4866   LearningRate 0.0032   Epoch: 16   Global Step: 681650   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:21:47,167-Speed 2629.15 samples/sec   Loss 2.3728   LearningRate 0.0032   Epoch: 16   Global Step: 681660   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:21:51,073-Speed 2622.90 samples/sec   Loss 2.4304   LearningRate 0.0032   Epoch: 16   Global Step: 681670   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:21:54,968-Speed 2628.89 samples/sec   Loss 2.4660   LearningRate 0.0032   Epoch: 16   Global Step: 681680   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:21:58,878-Speed 2620.02 samples/sec   Loss 2.4685   LearningRate 0.0032   Epoch: 16   Global Step: 681690   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:02,779-Speed 2625.82 samples/sec   Loss 2.4950   LearningRate 0.0032   Epoch: 16   Global Step: 681700   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:06,678-Speed 2626.74 samples/sec   Loss 2.4694   LearningRate 0.0032   Epoch: 16   Global Step: 681710   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:10,581-Speed 2624.50 samples/sec   Loss 2.4585   LearningRate 0.0032   Epoch: 16   Global Step: 681720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:22:14,490-Speed 2620.38 samples/sec   Loss 2.4419   LearningRate 0.0032   Epoch: 16   Global Step: 681730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:22:18,404-Speed 2618.13 samples/sec   Loss 2.4385   LearningRate 0.0032   Epoch: 16   Global Step: 681740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:22:22,280-Speed 2642.17 samples/sec   Loss 2.4452   LearningRate 0.0032   Epoch: 16   Global Step: 681750   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:26,179-Speed 2626.79 samples/sec   Loss 2.4402   LearningRate 0.0032   Epoch: 16   Global Step: 681760   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:30,073-Speed 2630.43 samples/sec   Loss 2.4595   LearningRate 0.0032   Epoch: 16   Global Step: 681770   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:33,982-Speed 2620.94 samples/sec   Loss 2.5096   LearningRate 0.0032   Epoch: 16   Global Step: 681780   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:37,879-Speed 2628.17 samples/sec   Loss 2.3748   LearningRate 0.0032   Epoch: 16   Global Step: 681790   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:41,800-Speed 2612.09 samples/sec   Loss 2.4899   LearningRate 0.0032   Epoch: 16   Global Step: 681800   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:45,707-Speed 2621.80 samples/sec   Loss 2.5093   LearningRate 0.0032   Epoch: 16   Global Step: 681810   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:49,609-Speed 2624.74 samples/sec   Loss 2.4585   LearningRate 0.0032   Epoch: 16   Global Step: 681820   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:53,512-Speed 2624.78 samples/sec   Loss 2.5028   LearningRate 0.0032   Epoch: 16   Global Step: 681830   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:22:57,425-Speed 2617.48 samples/sec   Loss 2.5463   LearningRate 0.0032   Epoch: 16   Global Step: 681840   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:01,336-Speed 2618.93 samples/sec   Loss 2.4355   LearningRate 0.0032   Epoch: 16   Global Step: 681850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:23:05,234-Speed 2627.07 samples/sec   Loss 2.4956   LearningRate 0.0032   Epoch: 16   Global Step: 681860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:23:09,133-Speed 2627.66 samples/sec   Loss 2.4439   LearningRate 0.0032   Epoch: 16   Global Step: 681870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:23:13,016-Speed 2638.00 samples/sec   Loss 2.4657   LearningRate 0.0032   Epoch: 16   Global Step: 681880   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:16,995-Speed 2574.31 samples/sec   Loss 2.4769   LearningRate 0.0032   Epoch: 16   Global Step: 681890   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:20,889-Speed 2630.22 samples/sec   Loss 2.4568   LearningRate 0.0032   Epoch: 16   Global Step: 681900   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:24,791-Speed 2625.43 samples/sec   Loss 2.4574   LearningRate 0.0032   Epoch: 16   Global Step: 681910   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:28,694-Speed 2623.83 samples/sec   Loss 2.4919   LearningRate 0.0032   Epoch: 16   Global Step: 681920   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:32,592-Speed 2627.50 samples/sec   Loss 2.4748   LearningRate 0.0032   Epoch: 16   Global Step: 681930   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:36,488-Speed 2628.66 samples/sec   Loss 2.5277   LearningRate 0.0032   Epoch: 16   Global Step: 681940   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:40,388-Speed 2626.68 samples/sec   Loss 2.4876   LearningRate 0.0032   Epoch: 16   Global Step: 681950   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:44,285-Speed 2628.59 samples/sec   Loss 2.4075   LearningRate 0.0032   Epoch: 16   Global Step: 681960   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:48,186-Speed 2625.58 samples/sec   Loss 2.4353   LearningRate 0.0032   Epoch: 16   Global Step: 681970   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:23:52,090-Speed 2623.78 samples/sec   Loss 2.4662   LearningRate 0.0032   Epoch: 16   Global Step: 681980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:23:55,992-Speed 2624.99 samples/sec   Loss 2.4998   LearningRate 0.0032   Epoch: 16   Global Step: 681990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:23:59,919-Speed 2608.09 samples/sec   Loss 2.4452   LearningRate 0.0032   Epoch: 16   Global Step: 682000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:24:03,829-Speed 2619.42 samples/sec   Loss 2.4997   LearningRate 0.0032   Epoch: 16   Global Step: 682010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:24:07,728-Speed 2627.33 samples/sec   Loss 2.3654   LearningRate 0.0032   Epoch: 16   Global Step: 682020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:24:11,606-Speed 2641.42 samples/sec   Loss 2.4386   LearningRate 0.0032   Epoch: 16   Global Step: 682030   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:15,504-Speed 2628.03 samples/sec   Loss 2.4535   LearningRate 0.0032   Epoch: 16   Global Step: 682040   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:19,412-Speed 2620.66 samples/sec   Loss 2.4439   LearningRate 0.0032   Epoch: 16   Global Step: 682050   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:23,310-Speed 2627.57 samples/sec   Loss 2.4767   LearningRate 0.0032   Epoch: 16   Global Step: 682060   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:27,208-Speed 2627.76 samples/sec   Loss 2.5183   LearningRate 0.0032   Epoch: 16   Global Step: 682070   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:31,109-Speed 2625.74 samples/sec   Loss 2.5056   LearningRate 0.0032   Epoch: 16   Global Step: 682080   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:35,011-Speed 2624.73 samples/sec   Loss 2.4540   LearningRate 0.0032   Epoch: 16   Global Step: 682090   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:38,953-Speed 2598.78 samples/sec   Loss 2.4837   LearningRate 0.0032   Epoch: 16   Global Step: 682100   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:42,851-Speed 2627.05 samples/sec   Loss 2.3496   LearningRate 0.0032   Epoch: 16   Global Step: 682110   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:46,749-Speed 2628.28 samples/sec   Loss 2.4065   LearningRate 0.0032   Epoch: 16   Global Step: 682120   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:24:50,653-Speed 2623.60 samples/sec   Loss 2.4137   LearningRate 0.0032   Epoch: 16   Global Step: 682130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:24:54,555-Speed 2627.05 samples/sec   Loss 2.4988   LearningRate 0.0032   Epoch: 16   Global Step: 682140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:24:58,455-Speed 2625.61 samples/sec   Loss 2.4134   LearningRate 0.0032   Epoch: 16   Global Step: 682150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:02,359-Speed 2623.58 samples/sec   Loss 2.4780   LearningRate 0.0032   Epoch: 16   Global Step: 682160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:06,257-Speed 2627.66 samples/sec   Loss 2.4283   LearningRate 0.0032   Epoch: 16   Global Step: 682170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:10,163-Speed 2622.86 samples/sec   Loss 2.4649   LearningRate 0.0032   Epoch: 16   Global Step: 682180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:14,061-Speed 2628.21 samples/sec   Loss 2.4484   LearningRate 0.0032   Epoch: 16   Global Step: 682190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:17,964-Speed 2623.81 samples/sec   Loss 2.4702   LearningRate 0.0032   Epoch: 16   Global Step: 682200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:21,867-Speed 2624.50 samples/sec   Loss 2.4032   LearningRate 0.0032   Epoch: 16   Global Step: 682210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:25,765-Speed 2627.99 samples/sec   Loss 2.4956   LearningRate 0.0032   Epoch: 16   Global Step: 682220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:29,666-Speed 2625.97 samples/sec   Loss 2.4064   LearningRate 0.0032   Epoch: 16   Global Step: 682230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:25:33,559-Speed 2631.54 samples/sec   Loss 2.5717   LearningRate 0.0032   Epoch: 16   Global Step: 682240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:37,474-Speed 2615.57 samples/sec   Loss 2.4145   LearningRate 0.0032   Epoch: 16   Global Step: 682250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:41,409-Speed 2603.31 samples/sec   Loss 2.4428   LearningRate 0.0032   Epoch: 16   Global Step: 682260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:45,307-Speed 2627.71 samples/sec   Loss 2.4495   LearningRate 0.0032   Epoch: 16   Global Step: 682270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:49,243-Speed 2602.64 samples/sec   Loss 2.4508   LearningRate 0.0032   Epoch: 16   Global Step: 682280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:25:53,138-Speed 2629.72 samples/sec   Loss 2.4331   LearningRate 0.0032   Epoch: 16   Global Step: 682290   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:25:57,073-Speed 2603.10 samples/sec   Loss 2.4440   LearningRate 0.0032   Epoch: 16   Global Step: 682300   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:00,987-Speed 2616.68 samples/sec   Loss 2.4654   LearningRate 0.0032   Epoch: 16   Global Step: 682310   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:04,894-Speed 2621.80 samples/sec   Loss 2.4913   LearningRate 0.0032   Epoch: 16   Global Step: 682320   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:08,792-Speed 2627.18 samples/sec   Loss 2.4710   LearningRate 0.0032   Epoch: 16   Global Step: 682330   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:12,710-Speed 2614.70 samples/sec   Loss 2.4027   LearningRate 0.0032   Epoch: 16   Global Step: 682340   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:16,617-Speed 2621.00 samples/sec   Loss 2.4224   LearningRate 0.0031   Epoch: 16   Global Step: 682350   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:20,522-Speed 2623.66 samples/sec   Loss 2.4439   LearningRate 0.0031   Epoch: 16   Global Step: 682360   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:24,428-Speed 2622.57 samples/sec   Loss 2.4647   LearningRate 0.0031   Epoch: 16   Global Step: 682370   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:28,328-Speed 2625.96 samples/sec   Loss 2.4047   LearningRate 0.0031   Epoch: 16   Global Step: 682380   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:32,296-Speed 2581.29 samples/sec   Loss 2.3687   LearningRate 0.0031   Epoch: 16   Global Step: 682390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:26:36,201-Speed 2623.13 samples/sec   Loss 2.5060   LearningRate 0.0031   Epoch: 16   Global Step: 682400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:26:40,124-Speed 2610.92 samples/sec   Loss 2.3928   LearningRate 0.0031   Epoch: 16   Global Step: 682410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:26:44,022-Speed 2627.56 samples/sec   Loss 2.3915   LearningRate 0.0031   Epoch: 16   Global Step: 682420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:26:47,925-Speed 2624.10 samples/sec   Loss 2.5060   LearningRate 0.0031   Epoch: 16   Global Step: 682430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:26:51,829-Speed 2623.81 samples/sec   Loss 2.4726   LearningRate 0.0031   Epoch: 16   Global Step: 682440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:26:55,708-Speed 2640.74 samples/sec   Loss 2.4267   LearningRate 0.0031   Epoch: 16   Global Step: 682450   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:26:59,608-Speed 2626.54 samples/sec   Loss 2.5454   LearningRate 0.0031   Epoch: 16   Global Step: 682460   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:03,523-Speed 2616.16 samples/sec   Loss 2.4481   LearningRate 0.0031   Epoch: 16   Global Step: 682470   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:07,422-Speed 2627.14 samples/sec   Loss 2.3970   LearningRate 0.0031   Epoch: 16   Global Step: 682480   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:11,331-Speed 2619.66 samples/sec   Loss 2.4893   LearningRate 0.0031   Epoch: 16   Global Step: 682490   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:15,326-Speed 2564.66 samples/sec   Loss 2.4213   LearningRate 0.0031   Epoch: 16   Global Step: 682500   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:19,237-Speed 2618.87 samples/sec   Loss 2.4867   LearningRate 0.0031   Epoch: 16   Global Step: 682510   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:23,170-Speed 2604.22 samples/sec   Loss 2.4462   LearningRate 0.0031   Epoch: 16   Global Step: 682520   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:27,074-Speed 2623.96 samples/sec   Loss 2.4830   LearningRate 0.0031   Epoch: 16   Global Step: 682530   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:30,969-Speed 2629.78 samples/sec   Loss 2.4761   LearningRate 0.0031   Epoch: 16   Global Step: 682540   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-16 00:27:34,865-Speed 2628.64 samples/sec   Loss 2.4663   LearningRate 0.0031   Epoch: 16   Global Step: 682550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:27:38,765-Speed 2626.23 samples/sec   Loss 2.4319   LearningRate 0.0031   Epoch: 16   Global Step: 682560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:27:42,682-Speed 2615.08 samples/sec   Loss 2.4331   LearningRate 0.0031   Epoch: 16   Global Step: 682570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:27:46,583-Speed 2625.64 samples/sec   Loss 2.4783   LearningRate 0.0031   Epoch: 16   Global Step: 682580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:27:50,478-Speed 2629.94 samples/sec   Loss 2.4490   LearningRate 0.0031   Epoch: 16   Global Step: 682590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:27:54,382-Speed 2624.07 samples/sec   Loss 2.4209   LearningRate 0.0031   Epoch: 16   Global Step: 682600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:27:58,283-Speed 2625.97 samples/sec   Loss 2.4033   LearningRate 0.0031   Epoch: 16   Global Step: 682610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:02,184-Speed 2624.98 samples/sec   Loss 2.4667   LearningRate 0.0031   Epoch: 16   Global Step: 682620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:06,090-Speed 2622.48 samples/sec   Loss 2.4323   LearningRate 0.0031   Epoch: 16   Global Step: 682630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:09,996-Speed 2622.09 samples/sec   Loss 2.3907   LearningRate 0.0031   Epoch: 16   Global Step: 682640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:13,882-Speed 2635.87 samples/sec   Loss 2.4948   LearningRate 0.0031   Epoch: 16   Global Step: 682650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:17,790-Speed 2621.14 samples/sec   Loss 2.4360   LearningRate 0.0031   Epoch: 16   Global Step: 682660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:21,717-Speed 2607.71 samples/sec   Loss 2.4692   LearningRate 0.0031   Epoch: 16   Global Step: 682670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:25,623-Speed 2623.86 samples/sec   Loss 2.4313   LearningRate 0.0031   Epoch: 16   Global Step: 682680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:29,522-Speed 2627.11 samples/sec   Loss 2.4025   LearningRate 0.0031   Epoch: 16   Global Step: 682690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:33,416-Speed 2629.89 samples/sec   Loss 2.3754   LearningRate 0.0031   Epoch: 16   Global Step: 682700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:37,316-Speed 2626.65 samples/sec   Loss 2.4816   LearningRate 0.0031   Epoch: 16   Global Step: 682710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:41,210-Speed 2629.55 samples/sec   Loss 2.4199   LearningRate 0.0031   Epoch: 16   Global Step: 682720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:45,107-Speed 2628.63 samples/sec   Loss 2.4546   LearningRate 0.0031   Epoch: 16   Global Step: 682730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:49,009-Speed 2624.72 samples/sec   Loss 2.4762   LearningRate 0.0031   Epoch: 16   Global Step: 682740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-16 00:28:52,907-Speed 2628.44 samples/sec   Loss 2.5308   LearningRate 0.0031   Epoch: 16   Global Step: 682750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:28:56,818-Speed 2618.38 samples/sec   Loss 2.4148   LearningRate 0.0031   Epoch: 16   Global Step: 682760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:29:00,714-Speed 2629.05 samples/sec   Loss 2.5468   LearningRate 0.0031   Epoch: 16   Global Step: 682770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:29:04,618-Speed 2623.70 samples/sec   Loss 2.4744   LearningRate 0.0031   Epoch: 16   Global Step: 682780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:29:08,515-Speed 2628.24 samples/sec   Loss 2.4099   LearningRate 0.0031   Epoch: 16   Global Step: 682790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-16 00:29:12,387-Speed 2645.56 samples/sec   Loss 2.5325   LearningRate 0.0031   Epoch: 16   Global Step: 682800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:16,283-Speed 2628.43 samples/sec   Loss 2.4098   LearningRate 0.0031   Epoch: 16   Global Step: 682810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:20,182-Speed 2627.62 samples/sec   Loss 2.4411   LearningRate 0.0031   Epoch: 16   Global Step: 682820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:24,092-Speed 2619.31 samples/sec   Loss 2.4010   LearningRate 0.0031   Epoch: 16   Global Step: 682830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:28,004-Speed 2618.82 samples/sec   Loss 2.4901   LearningRate 0.0031   Epoch: 16   Global Step: 682840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:31,931-Speed 2608.13 samples/sec   Loss 2.3850   LearningRate 0.0031   Epoch: 16   Global Step: 682850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:35,828-Speed 2628.44 samples/sec   Loss 2.4754   LearningRate 0.0031   Epoch: 16   Global Step: 682860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:39,724-Speed 2628.81 samples/sec   Loss 2.4067   LearningRate 0.0031   Epoch: 16   Global Step: 682870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:43,623-Speed 2627.04 samples/sec   Loss 2.4219   LearningRate 0.0031   Epoch: 16   Global Step: 682880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:47,524-Speed 2625.06 samples/sec   Loss 2.5092   LearningRate 0.0031   Epoch: 16   Global Step: 682890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:51,426-Speed 2625.36 samples/sec   Loss 2.4402   LearningRate 0.0031   Epoch: 16   Global Step: 682900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:29:55,311-Speed 2636.68 samples/sec   Loss 2.4156   LearningRate 0.0031   Epoch: 16   Global Step: 682910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:29:59,215-Speed 2623.57 samples/sec   Loss 2.4540   LearningRate 0.0031   Epoch: 16   Global Step: 682920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:30:03,211-Speed 2563.25 samples/sec   Loss 2.5125   LearningRate 0.0031   Epoch: 16   Global Step: 682930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:30:07,116-Speed 2622.85 samples/sec   Loss 2.3927   LearningRate 0.0031   Epoch: 16   Global Step: 682940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:30:11,045-Speed 2606.55 samples/sec   Loss 2.4226   LearningRate 0.0031   Epoch: 16   Global Step: 682950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:30:14,939-Speed 2630.83 samples/sec   Loss 2.4560   LearningRate 0.0031   Epoch: 16   Global Step: 682960   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:18,844-Speed 2622.94 samples/sec   Loss 2.5260   LearningRate 0.0031   Epoch: 16   Global Step: 682970   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:22,748-Speed 2623.41 samples/sec   Loss 2.3930   LearningRate 0.0031   Epoch: 16   Global Step: 682980   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:26,651-Speed 2624.98 samples/sec   Loss 2.4598   LearningRate 0.0031   Epoch: 16   Global Step: 682990   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:30,565-Speed 2616.83 samples/sec   Loss 2.4106   LearningRate 0.0031   Epoch: 16   Global Step: 683000   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:34,496-Speed 2605.39 samples/sec   Loss 2.4254   LearningRate 0.0031   Epoch: 16   Global Step: 683010   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:38,397-Speed 2625.39 samples/sec   Loss 2.4013   LearningRate 0.0031   Epoch: 16   Global Step: 683020   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:42,302-Speed 2623.30 samples/sec   Loss 2.4579   LearningRate 0.0031   Epoch: 16   Global Step: 683030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:46,208-Speed 2622.02 samples/sec   Loss 2.4474   LearningRate 0.0031   Epoch: 16   Global Step: 683040   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:50,107-Speed 2627.18 samples/sec   Loss 2.4418   LearningRate 0.0031   Epoch: 16   Global Step: 683050   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:30:54,058-Speed 2592.36 samples/sec   Loss 2.4064   LearningRate 0.0031   Epoch: 16   Global Step: 683060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:30:57,963-Speed 2623.24 samples/sec   Loss 2.4321   LearningRate 0.0031   Epoch: 16   Global Step: 683070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:01,886-Speed 2610.80 samples/sec   Loss 2.4738   LearningRate 0.0031   Epoch: 16   Global Step: 683080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:05,795-Speed 2620.12 samples/sec   Loss 2.4705   LearningRate 0.0031   Epoch: 16   Global Step: 683090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:09,702-Speed 2621.39 samples/sec   Loss 2.4906   LearningRate 0.0031   Epoch: 16   Global Step: 683100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:13,606-Speed 2624.23 samples/sec   Loss 2.4820   LearningRate 0.0031   Epoch: 16   Global Step: 683110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:17,513-Speed 2621.59 samples/sec   Loss 2.4335   LearningRate 0.0031   Epoch: 16   Global Step: 683120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:21,416-Speed 2624.83 samples/sec   Loss 2.4075   LearningRate 0.0031   Epoch: 16   Global Step: 683130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:25,322-Speed 2621.80 samples/sec   Loss 2.4795   LearningRate 0.0031   Epoch: 16   Global Step: 683140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:29,228-Speed 2622.39 samples/sec   Loss 2.4713   LearningRate 0.0031   Epoch: 16   Global Step: 683150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:33,125-Speed 2628.20 samples/sec   Loss 2.3578   LearningRate 0.0031   Epoch: 16   Global Step: 683160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:31:37,027-Speed 2625.42 samples/sec   Loss 2.4793   LearningRate 0.0031   Epoch: 16   Global Step: 683170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:31:40,929-Speed 2624.70 samples/sec   Loss 2.3571   LearningRate 0.0031   Epoch: 16   Global Step: 683180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:31:44,807-Speed 2641.13 samples/sec   Loss 2.4313   LearningRate 0.0031   Epoch: 16   Global Step: 683190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:48,736-Speed 2606.68 samples/sec   Loss 2.4343   LearningRate 0.0031   Epoch: 16   Global Step: 683200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:52,643-Speed 2622.04 samples/sec   Loss 2.4870   LearningRate 0.0031   Epoch: 16   Global Step: 683210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:31:56,543-Speed 2625.91 samples/sec   Loss 2.4117   LearningRate 0.0031   Epoch: 16   Global Step: 683220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:00,443-Speed 2626.52 samples/sec   Loss 2.4247   LearningRate 0.0031   Epoch: 16   Global Step: 683230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:04,367-Speed 2610.32 samples/sec   Loss 2.4492   LearningRate 0.0031   Epoch: 16   Global Step: 683240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:08,265-Speed 2627.61 samples/sec   Loss 2.4569   LearningRate 0.0031   Epoch: 16   Global Step: 683250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:12,167-Speed 2625.39 samples/sec   Loss 2.4640   LearningRate 0.0031   Epoch: 16   Global Step: 683260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:16,065-Speed 2627.36 samples/sec   Loss 2.4468   LearningRate 0.0031   Epoch: 16   Global Step: 683270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:19,986-Speed 2611.99 samples/sec   Loss 2.4947   LearningRate 0.0031   Epoch: 16   Global Step: 683280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:23,890-Speed 2624.08 samples/sec   Loss 2.4314   LearningRate 0.0031   Epoch: 16   Global Step: 683290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:32:27,789-Speed 2627.00 samples/sec   Loss 2.4726   LearningRate 0.0031   Epoch: 16   Global Step: 683300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:32:31,726-Speed 2601.27 samples/sec   Loss 2.4806   LearningRate 0.0031   Epoch: 16   Global Step: 683310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:32:35,616-Speed 2633.16 samples/sec   Loss 2.4902   LearningRate 0.0031   Epoch: 16   Global Step: 683320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:39,524-Speed 2621.18 samples/sec   Loss 2.3635   LearningRate 0.0031   Epoch: 16   Global Step: 683330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:43,426-Speed 2624.77 samples/sec   Loss 2.4622   LearningRate 0.0031   Epoch: 16   Global Step: 683340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:47,340-Speed 2616.71 samples/sec   Loss 2.4592   LearningRate 0.0031   Epoch: 16   Global Step: 683350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:51,239-Speed 2627.69 samples/sec   Loss 2.4451   LearningRate 0.0031   Epoch: 16   Global Step: 683360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:55,138-Speed 2626.80 samples/sec   Loss 2.4239   LearningRate 0.0031   Epoch: 16   Global Step: 683370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:32:59,042-Speed 2623.71 samples/sec   Loss 2.4237   LearningRate 0.0031   Epoch: 16   Global Step: 683380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:02,945-Speed 2624.49 samples/sec   Loss 2.4880   LearningRate 0.0031   Epoch: 16   Global Step: 683390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:06,846-Speed 2625.21 samples/sec   Loss 2.4155   LearningRate 0.0031   Epoch: 16   Global Step: 683400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:10,746-Speed 2626.19 samples/sec   Loss 2.4890   LearningRate 0.0031   Epoch: 16   Global Step: 683410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:14,672-Speed 2609.08 samples/sec   Loss 2.4827   LearningRate 0.0031   Epoch: 16   Global Step: 683420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:33:18,589-Speed 2614.63 samples/sec   Loss 2.4502   LearningRate 0.0031   Epoch: 16   Global Step: 683430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:33:22,485-Speed 2628.85 samples/sec   Loss 2.4543   LearningRate 0.0031   Epoch: 16   Global Step: 683440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:26,384-Speed 2627.00 samples/sec   Loss 2.4233   LearningRate 0.0031   Epoch: 16   Global Step: 683450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:30,295-Speed 2619.26 samples/sec   Loss 2.4219   LearningRate 0.0031   Epoch: 16   Global Step: 683460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:34,258-Speed 2584.30 samples/sec   Loss 2.4742   LearningRate 0.0031   Epoch: 16   Global Step: 683470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:38,199-Speed 2599.13 samples/sec   Loss 2.4880   LearningRate 0.0031   Epoch: 16   Global Step: 683480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:42,099-Speed 2625.72 samples/sec   Loss 2.4813   LearningRate 0.0031   Epoch: 16   Global Step: 683490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:46,012-Speed 2618.10 samples/sec   Loss 2.4793   LearningRate 0.0031   Epoch: 16   Global Step: 683500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:49,922-Speed 2620.05 samples/sec   Loss 2.4109   LearningRate 0.0031   Epoch: 16   Global Step: 683510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:53,830-Speed 2620.97 samples/sec   Loss 2.4795   LearningRate 0.0031   Epoch: 16   Global Step: 683520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:33:57,745-Speed 2616.63 samples/sec   Loss 2.3147   LearningRate 0.0031   Epoch: 16   Global Step: 683530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:01,657-Speed 2618.17 samples/sec   Loss 2.4025   LearningRate 0.0031   Epoch: 16   Global Step: 683540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:34:05,535-Speed 2641.14 samples/sec   Loss 2.4034   LearningRate 0.0031   Epoch: 16   Global Step: 683550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:09,446-Speed 2618.70 samples/sec   Loss 2.4506   LearningRate 0.0031   Epoch: 16   Global Step: 683560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:13,370-Speed 2609.98 samples/sec   Loss 2.4683   LearningRate 0.0031   Epoch: 16   Global Step: 683570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:17,323-Speed 2591.12 samples/sec   Loss 2.4416   LearningRate 0.0031   Epoch: 16   Global Step: 683580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:21,224-Speed 2625.47 samples/sec   Loss 2.4284   LearningRate 0.0031   Epoch: 16   Global Step: 683590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:25,126-Speed 2625.47 samples/sec   Loss 2.4489   LearningRate 0.0031   Epoch: 16   Global Step: 683600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:29,028-Speed 2625.19 samples/sec   Loss 2.4149   LearningRate 0.0031   Epoch: 16   Global Step: 683610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:32,925-Speed 2627.87 samples/sec   Loss 2.4220   LearningRate 0.0031   Epoch: 16   Global Step: 683620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:36,829-Speed 2623.07 samples/sec   Loss 2.3778   LearningRate 0.0031   Epoch: 16   Global Step: 683630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:40,727-Speed 2627.98 samples/sec   Loss 2.3803   LearningRate 0.0031   Epoch: 16   Global Step: 683640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:34:44,630-Speed 2624.72 samples/sec   Loss 2.4679   LearningRate 0.0031   Epoch: 16   Global Step: 683650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:34:48,529-Speed 2626.38 samples/sec   Loss 2.3835   LearningRate 0.0031   Epoch: 16   Global Step: 683660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:34:52,431-Speed 2625.95 samples/sec   Loss 2.3857   LearningRate 0.0031   Epoch: 16   Global Step: 683670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:34:56,334-Speed 2623.89 samples/sec   Loss 2.3942   LearningRate 0.0031   Epoch: 16   Global Step: 683680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:35:00,252-Speed 2614.05 samples/sec   Loss 2.3647   LearningRate 0.0031   Epoch: 16   Global Step: 683690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:35:04,153-Speed 2625.70 samples/sec   Loss 2.4152   LearningRate 0.0031   Epoch: 16   Global Step: 683700   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:08,055-Speed 2625.31 samples/sec   Loss 2.4280   LearningRate 0.0031   Epoch: 16   Global Step: 683710   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:11,960-Speed 2623.20 samples/sec   Loss 2.4997   LearningRate 0.0031   Epoch: 16   Global Step: 683720   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:15,858-Speed 2627.84 samples/sec   Loss 2.4119   LearningRate 0.0031   Epoch: 16   Global Step: 683730   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:19,770-Speed 2618.49 samples/sec   Loss 2.4293   LearningRate 0.0031   Epoch: 16   Global Step: 683740   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:23,683-Speed 2616.90 samples/sec   Loss 2.4323   LearningRate 0.0031   Epoch: 16   Global Step: 683750   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:27,608-Speed 2620.45 samples/sec   Loss 2.3763   LearningRate 0.0031   Epoch: 16   Global Step: 683760   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:31,511-Speed 2624.36 samples/sec   Loss 2.4602   LearningRate 0.0031   Epoch: 16   Global Step: 683770   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:35,416-Speed 2622.79 samples/sec   Loss 2.3654   LearningRate 0.0031   Epoch: 16   Global Step: 683780   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:39,326-Speed 2618.74 samples/sec   Loss 2.4657   LearningRate 0.0031   Epoch: 16   Global Step: 683790   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:35:43,240-Speed 2617.60 samples/sec   Loss 2.4117   LearningRate 0.0031   Epoch: 16   Global Step: 683800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:35:47,135-Speed 2629.37 samples/sec   Loss 2.4425   LearningRate 0.0031   Epoch: 16   Global Step: 683810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:35:51,040-Speed 2623.17 samples/sec   Loss 2.4450   LearningRate 0.0031   Epoch: 16   Global Step: 683820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:35:54,940-Speed 2626.17 samples/sec   Loss 2.3494   LearningRate 0.0031   Epoch: 16   Global Step: 683830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:35:58,841-Speed 2626.63 samples/sec   Loss 2.4767   LearningRate 0.0031   Epoch: 16   Global Step: 683840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:02,743-Speed 2625.07 samples/sec   Loss 2.3261   LearningRate 0.0031   Epoch: 16   Global Step: 683850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:06,652-Speed 2619.87 samples/sec   Loss 2.5187   LearningRate 0.0031   Epoch: 16   Global Step: 683860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:10,560-Speed 2620.60 samples/sec   Loss 2.4682   LearningRate 0.0031   Epoch: 16   Global Step: 683870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:14,466-Speed 2622.73 samples/sec   Loss 2.4585   LearningRate 0.0031   Epoch: 16   Global Step: 683880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:18,365-Speed 2626.96 samples/sec   Loss 2.4404   LearningRate 0.0031   Epoch: 16   Global Step: 683890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:22,264-Speed 2626.87 samples/sec   Loss 2.3402   LearningRate 0.0031   Epoch: 16   Global Step: 683900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:26,233-Speed 2580.49 samples/sec   Loss 2.4175   LearningRate 0.0031   Epoch: 16   Global Step: 683910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:30,138-Speed 2623.81 samples/sec   Loss 2.4704   LearningRate 0.0031   Epoch: 16   Global Step: 683920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:34,053-Speed 2615.99 samples/sec   Loss 2.4500   LearningRate 0.0031   Epoch: 16   Global Step: 683930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:37,957-Speed 2623.91 samples/sec   Loss 2.4773   LearningRate 0.0031   Epoch: 16   Global Step: 683940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:41,909-Speed 2591.20 samples/sec   Loss 2.4296   LearningRate 0.0031   Epoch: 16   Global Step: 683950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:45,819-Speed 2620.33 samples/sec   Loss 2.4103   LearningRate 0.0031   Epoch: 16   Global Step: 683960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:49,715-Speed 2629.35 samples/sec   Loss 2.4209   LearningRate 0.0031   Epoch: 16   Global Step: 683970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:53,618-Speed 2624.22 samples/sec   Loss 2.4323   LearningRate 0.0031   Epoch: 16   Global Step: 683980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:36:57,495-Speed 2642.22 samples/sec   Loss 2.4251   LearningRate 0.0031   Epoch: 16   Global Step: 683990   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:01,393-Speed 2627.32 samples/sec   Loss 2.4464   LearningRate 0.0031   Epoch: 16   Global Step: 684000   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:05,296-Speed 2624.37 samples/sec   Loss 2.4066   LearningRate 0.0031   Epoch: 16   Global Step: 684010   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:09,226-Speed 2606.37 samples/sec   Loss 2.4204   LearningRate 0.0031   Epoch: 16   Global Step: 684020   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:13,144-Speed 2614.26 samples/sec   Loss 2.4120   LearningRate 0.0031   Epoch: 16   Global Step: 684030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:17,052-Speed 2620.65 samples/sec   Loss 2.4618   LearningRate 0.0031   Epoch: 16   Global Step: 684040   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:20,963-Speed 2619.43 samples/sec   Loss 2.4328   LearningRate 0.0031   Epoch: 16   Global Step: 684050   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:24,865-Speed 2625.10 samples/sec   Loss 2.4340   LearningRate 0.0031   Epoch: 16   Global Step: 684060   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:28,851-Speed 2569.91 samples/sec   Loss 2.4273   LearningRate 0.0031   Epoch: 16   Global Step: 684070   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:32,867-Speed 2550.38 samples/sec   Loss 2.4952   LearningRate 0.0031   Epoch: 16   Global Step: 684080   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:37:36,768-Speed 2625.32 samples/sec   Loss 2.3826   LearningRate 0.0031   Epoch: 16   Global Step: 684090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:37:40,687-Speed 2613.72 samples/sec   Loss 2.4807   LearningRate 0.0031   Epoch: 16   Global Step: 684100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:37:44,609-Speed 2612.14 samples/sec   Loss 2.4575   LearningRate 0.0031   Epoch: 16   Global Step: 684110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:37:48,527-Speed 2613.91 samples/sec   Loss 2.4088   LearningRate 0.0031   Epoch: 16   Global Step: 684120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:37:52,435-Speed 2620.98 samples/sec   Loss 2.3684   LearningRate 0.0031   Epoch: 16   Global Step: 684130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:37:56,342-Speed 2621.76 samples/sec   Loss 2.5001   LearningRate 0.0031   Epoch: 16   Global Step: 684140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:00,242-Speed 2626.09 samples/sec   Loss 2.4595   LearningRate 0.0031   Epoch: 16   Global Step: 684150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:04,140-Speed 2627.64 samples/sec   Loss 2.3745   LearningRate 0.0031   Epoch: 16   Global Step: 684160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:08,039-Speed 2627.05 samples/sec   Loss 2.4221   LearningRate 0.0031   Epoch: 16   Global Step: 684170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:11,939-Speed 2625.87 samples/sec   Loss 2.4028   LearningRate 0.0031   Epoch: 16   Global Step: 684180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:15,839-Speed 2627.15 samples/sec   Loss 2.4498   LearningRate 0.0031   Epoch: 16   Global Step: 684190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:38:19,742-Speed 2624.64 samples/sec   Loss 2.4100   LearningRate 0.0031   Epoch: 16   Global Step: 684200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:38:23,643-Speed 2624.87 samples/sec   Loss 2.4283   LearningRate 0.0031   Epoch: 16   Global Step: 684210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:38:27,523-Speed 2640.74 samples/sec   Loss 2.4569   LearningRate 0.0031   Epoch: 16   Global Step: 684220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:31,424-Speed 2625.79 samples/sec   Loss 2.3536   LearningRate 0.0031   Epoch: 16   Global Step: 684230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:35,321-Speed 2629.01 samples/sec   Loss 2.4645   LearningRate 0.0031   Epoch: 16   Global Step: 684240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:39,231-Speed 2619.12 samples/sec   Loss 2.3862   LearningRate 0.0031   Epoch: 16   Global Step: 684250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:43,164-Speed 2604.12 samples/sec   Loss 2.3983   LearningRate 0.0031   Epoch: 16   Global Step: 684260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:47,063-Speed 2626.74 samples/sec   Loss 2.4190   LearningRate 0.0031   Epoch: 16   Global Step: 684270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:50,965-Speed 2625.79 samples/sec   Loss 2.4602   LearningRate 0.0031   Epoch: 16   Global Step: 684280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:54,873-Speed 2620.86 samples/sec   Loss 2.4497   LearningRate 0.0031   Epoch: 16   Global Step: 684290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:38:58,785-Speed 2618.09 samples/sec   Loss 2.3830   LearningRate 0.0031   Epoch: 16   Global Step: 684300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:02,687-Speed 2625.33 samples/sec   Loss 2.5185   LearningRate 0.0031   Epoch: 16   Global Step: 684310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:06,583-Speed 2628.69 samples/sec   Loss 2.5326   LearningRate 0.0031   Epoch: 16   Global Step: 684320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:10,491-Speed 2620.96 samples/sec   Loss 2.4688   LearningRate 0.0031   Epoch: 16   Global Step: 684330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:14,400-Speed 2619.81 samples/sec   Loss 2.4616   LearningRate 0.0031   Epoch: 16   Global Step: 684340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:18,321-Speed 2612.55 samples/sec   Loss 2.4174   LearningRate 0.0031   Epoch: 16   Global Step: 684350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:22,224-Speed 2624.50 samples/sec   Loss 2.4315   LearningRate 0.0031   Epoch: 16   Global Step: 684360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:26,240-Speed 2550.59 samples/sec   Loss 2.4275   LearningRate 0.0031   Epoch: 16   Global Step: 684370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:30,144-Speed 2623.28 samples/sec   Loss 2.4893   LearningRate 0.0031   Epoch: 16   Global Step: 684380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:34,060-Speed 2616.33 samples/sec   Loss 2.4493   LearningRate 0.0031   Epoch: 16   Global Step: 684390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:37,959-Speed 2626.97 samples/sec   Loss 2.4952   LearningRate 0.0031   Epoch: 16   Global Step: 684400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:41,857-Speed 2627.42 samples/sec   Loss 2.4359   LearningRate 0.0031   Epoch: 16   Global Step: 684410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:39:45,757-Speed 2626.39 samples/sec   Loss 2.4909   LearningRate 0.0031   Epoch: 16   Global Step: 684420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:39:49,661-Speed 2623.97 samples/sec   Loss 2.4522   LearningRate 0.0031   Epoch: 16   Global Step: 684430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:39:53,558-Speed 2628.03 samples/sec   Loss 2.4258   LearningRate 0.0031   Epoch: 16   Global Step: 684440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:39:57,511-Speed 2592.23 samples/sec   Loss 2.4180   LearningRate 0.0031   Epoch: 16   Global Step: 684450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:40:01,421-Speed 2619.66 samples/sec   Loss 2.4321   LearningRate 0.0031   Epoch: 16   Global Step: 684460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:40:05,323-Speed 2625.60 samples/sec   Loss 2.4040   LearningRate 0.0031   Epoch: 16   Global Step: 684470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:40:09,236-Speed 2617.78 samples/sec   Loss 2.4187   LearningRate 0.0031   Epoch: 16   Global Step: 684480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:40:13,136-Speed 2625.93 samples/sec   Loss 2.4363   LearningRate 0.0031   Epoch: 16   Global Step: 684490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:40:17,012-Speed 2642.13 samples/sec   Loss 2.4013   LearningRate 0.0031   Epoch: 16   Global Step: 684500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:20,909-Speed 2629.20 samples/sec   Loss 2.3712   LearningRate 0.0031   Epoch: 16   Global Step: 684510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:24,806-Speed 2628.31 samples/sec   Loss 2.4087   LearningRate 0.0031   Epoch: 16   Global Step: 684520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:28,702-Speed 2629.05 samples/sec   Loss 2.4240   LearningRate 0.0031   Epoch: 16   Global Step: 684530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:32,610-Speed 2620.59 samples/sec   Loss 2.4513   LearningRate 0.0031   Epoch: 16   Global Step: 684540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:36,512-Speed 2625.64 samples/sec   Loss 2.3680   LearningRate 0.0031   Epoch: 16   Global Step: 684550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:40,413-Speed 2625.87 samples/sec   Loss 2.3713   LearningRate 0.0031   Epoch: 16   Global Step: 684560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:44,311-Speed 2627.65 samples/sec   Loss 2.4276   LearningRate 0.0031   Epoch: 16   Global Step: 684570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:48,207-Speed 2629.21 samples/sec   Loss 2.3870   LearningRate 0.0031   Epoch: 16   Global Step: 684580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:52,104-Speed 2628.29 samples/sec   Loss 2.4042   LearningRate 0.0031   Epoch: 16   Global Step: 684590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:40:55,996-Speed 2631.85 samples/sec   Loss 2.3598   LearningRate 0.0031   Epoch: 16   Global Step: 684600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:40:59,895-Speed 2627.05 samples/sec   Loss 2.4328   LearningRate 0.0031   Epoch: 16   Global Step: 684610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:41:03,798-Speed 2623.72 samples/sec   Loss 2.4470   LearningRate 0.0031   Epoch: 16   Global Step: 684620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:41:07,678-Speed 2639.84 samples/sec   Loss 2.4080   LearningRate 0.0031   Epoch: 16   Global Step: 684630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:11,742-Speed 2520.66 samples/sec   Loss 2.4376   LearningRate 0.0031   Epoch: 16   Global Step: 684640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:15,804-Speed 2521.87 samples/sec   Loss 2.4019   LearningRate 0.0031   Epoch: 16   Global Step: 684650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:19,711-Speed 2621.42 samples/sec   Loss 2.4154   LearningRate 0.0031   Epoch: 16   Global Step: 684660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:23,625-Speed 2617.34 samples/sec   Loss 2.3883   LearningRate 0.0031   Epoch: 16   Global Step: 684670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:27,527-Speed 2625.09 samples/sec   Loss 2.4005   LearningRate 0.0031   Epoch: 16   Global Step: 684680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:31,425-Speed 2627.74 samples/sec   Loss 2.4574   LearningRate 0.0031   Epoch: 16   Global Step: 684690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:35,322-Speed 2628.25 samples/sec   Loss 2.4089   LearningRate 0.0031   Epoch: 16   Global Step: 684700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:39,226-Speed 2623.50 samples/sec   Loss 2.3719   LearningRate 0.0030   Epoch: 16   Global Step: 684710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:43,133-Speed 2621.95 samples/sec   Loss 2.4796   LearningRate 0.0030   Epoch: 16   Global Step: 684720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:47,040-Speed 2621.43 samples/sec   Loss 2.3967   LearningRate 0.0030   Epoch: 16   Global Step: 684730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:41:50,941-Speed 2625.36 samples/sec   Loss 2.4345   LearningRate 0.0030   Epoch: 16   Global Step: 684740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:41:54,846-Speed 2622.76 samples/sec   Loss 2.3861   LearningRate 0.0030   Epoch: 16   Global Step: 684750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:41:58,750-Speed 2624.93 samples/sec   Loss 2.3841   LearningRate 0.0030   Epoch: 16   Global Step: 684760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:02,648-Speed 2627.47 samples/sec   Loss 2.3896   LearningRate 0.0030   Epoch: 16   Global Step: 684770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:06,550-Speed 2625.22 samples/sec   Loss 2.3538   LearningRate 0.0030   Epoch: 16   Global Step: 684780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:10,449-Speed 2626.21 samples/sec   Loss 2.4000   LearningRate 0.0030   Epoch: 16   Global Step: 684790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:14,354-Speed 2623.61 samples/sec   Loss 2.4132   LearningRate 0.0030   Epoch: 16   Global Step: 684800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:18,257-Speed 2623.95 samples/sec   Loss 2.3905   LearningRate 0.0030   Epoch: 16   Global Step: 684810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:22,158-Speed 2625.82 samples/sec   Loss 2.5040   LearningRate 0.0030   Epoch: 16   Global Step: 684820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:26,062-Speed 2623.19 samples/sec   Loss 2.4334   LearningRate 0.0030   Epoch: 16   Global Step: 684830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:29,964-Speed 2625.10 samples/sec   Loss 2.3838   LearningRate 0.0030   Epoch: 16   Global Step: 684840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:42:33,866-Speed 2624.69 samples/sec   Loss 2.4127   LearningRate 0.0030   Epoch: 16   Global Step: 684850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:42:37,764-Speed 2627.80 samples/sec   Loss 2.3934   LearningRate 0.0030   Epoch: 16   Global Step: 684860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:42:41,665-Speed 2625.70 samples/sec   Loss 2.4086   LearningRate 0.0030   Epoch: 16   Global Step: 684870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:42:45,571-Speed 2622.40 samples/sec   Loss 2.4611   LearningRate 0.0030   Epoch: 16   Global Step: 684880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:42:49,480-Speed 2620.38 samples/sec   Loss 2.3963   LearningRate 0.0030   Epoch: 16   Global Step: 684890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:42:53,382-Speed 2624.98 samples/sec   Loss 2.4268   LearningRate 0.0030   Epoch: 16   Global Step: 684900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:42:57,284-Speed 2624.62 samples/sec   Loss 2.4195   LearningRate 0.0030   Epoch: 16   Global Step: 684910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:43:01,180-Speed 2629.66 samples/sec   Loss 2.3210   LearningRate 0.0030   Epoch: 16   Global Step: 684920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:43:05,079-Speed 2627.17 samples/sec   Loss 2.3317   LearningRate 0.0030   Epoch: 16   Global Step: 684930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:43:08,972-Speed 2630.95 samples/sec   Loss 2.4120   LearningRate 0.0030   Epoch: 16   Global Step: 684940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:43:12,859-Speed 2634.29 samples/sec   Loss 2.4100   LearningRate 0.0030   Epoch: 16   Global Step: 684950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:43:16,749-Speed 2634.28 samples/sec   Loss 2.4660   LearningRate 0.0030   Epoch: 16   Global Step: 684960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:43:20,647-Speed 2627.63 samples/sec   Loss 2.4365   LearningRate 0.0030   Epoch: 16   Global Step: 684970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:43:24,525-Speed 2641.27 samples/sec   Loss 2.3813   LearningRate 0.0030   Epoch: 16   Global Step: 684980   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:28,434-Speed 2621.01 samples/sec   Loss 2.4534   LearningRate 0.0030   Epoch: 16   Global Step: 684990   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:32,330-Speed 2628.71 samples/sec   Loss 2.3796   LearningRate 0.0030   Epoch: 16   Global Step: 685000   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:36,232-Speed 2624.55 samples/sec   Loss 2.4040   LearningRate 0.0030   Epoch: 16   Global Step: 685010   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:40,136-Speed 2623.65 samples/sec   Loss 2.4169   LearningRate 0.0030   Epoch: 16   Global Step: 685020   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:44,041-Speed 2623.54 samples/sec   Loss 2.3677   LearningRate 0.0030   Epoch: 16   Global Step: 685030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:47,938-Speed 2627.95 samples/sec   Loss 2.4022   LearningRate 0.0030   Epoch: 16   Global Step: 685040   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:51,850-Speed 2619.03 samples/sec   Loss 2.4490   LearningRate 0.0030   Epoch: 16   Global Step: 685050   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:55,748-Speed 2627.43 samples/sec   Loss 2.4409   LearningRate 0.0030   Epoch: 16   Global Step: 685060   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:43:59,668-Speed 2612.91 samples/sec   Loss 2.3660   LearningRate 0.0030   Epoch: 16   Global Step: 685070   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:44:03,608-Speed 2599.76 samples/sec   Loss 2.4467   LearningRate 0.0030   Epoch: 16   Global Step: 685080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:07,697-Speed 2505.23 samples/sec   Loss 2.4147   LearningRate 0.0030   Epoch: 16   Global Step: 685090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:11,793-Speed 2500.33 samples/sec   Loss 2.4139   LearningRate 0.0030   Epoch: 16   Global Step: 685100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:15,714-Speed 2612.83 samples/sec   Loss 2.4000   LearningRate 0.0030   Epoch: 16   Global Step: 685110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:19,614-Speed 2626.32 samples/sec   Loss 2.4385   LearningRate 0.0030   Epoch: 16   Global Step: 685120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:23,511-Speed 2628.18 samples/sec   Loss 2.3360   LearningRate 0.0030   Epoch: 16   Global Step: 685130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:27,422-Speed 2619.15 samples/sec   Loss 2.3809   LearningRate 0.0030   Epoch: 16   Global Step: 685140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:31,319-Speed 2628.55 samples/sec   Loss 2.3671   LearningRate 0.0030   Epoch: 16   Global Step: 685150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:35,219-Speed 2626.45 samples/sec   Loss 2.4185   LearningRate 0.0030   Epoch: 16   Global Step: 685160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:39,115-Speed 2628.77 samples/sec   Loss 2.4608   LearningRate 0.0030   Epoch: 16   Global Step: 685170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:44:43,012-Speed 2628.75 samples/sec   Loss 2.4351   LearningRate 0.0030   Epoch: 16   Global Step: 685180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:44:46,916-Speed 2623.09 samples/sec   Loss 2.4305   LearningRate 0.0030   Epoch: 16   Global Step: 685190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:44:50,813-Speed 2629.19 samples/sec   Loss 2.3988   LearningRate 0.0030   Epoch: 16   Global Step: 685200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:44:54,747-Speed 2603.23 samples/sec   Loss 2.3707   LearningRate 0.0030   Epoch: 16   Global Step: 685210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:44:58,660-Speed 2617.95 samples/sec   Loss 2.4176   LearningRate 0.0030   Epoch: 16   Global Step: 685220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:45:02,649-Speed 2568.08 samples/sec   Loss 2.4244   LearningRate 0.0030   Epoch: 16   Global Step: 685230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:45:06,655-Speed 2557.11 samples/sec   Loss 2.3831   LearningRate 0.0030   Epoch: 16   Global Step: 685240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:45:10,528-Speed 2644.58 samples/sec   Loss 2.4589   LearningRate 0.0030   Epoch: 16   Global Step: 685250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:14,430-Speed 2624.71 samples/sec   Loss 2.4557   LearningRate 0.0030   Epoch: 16   Global Step: 685260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:18,340-Speed 2619.22 samples/sec   Loss 2.4383   LearningRate 0.0030   Epoch: 16   Global Step: 685270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:22,242-Speed 2625.45 samples/sec   Loss 2.3949   LearningRate 0.0030   Epoch: 16   Global Step: 685280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:26,136-Speed 2630.42 samples/sec   Loss 2.4097   LearningRate 0.0030   Epoch: 16   Global Step: 685290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:30,031-Speed 2629.30 samples/sec   Loss 2.4067   LearningRate 0.0030   Epoch: 16   Global Step: 685300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:33,932-Speed 2626.49 samples/sec   Loss 2.3388   LearningRate 0.0030   Epoch: 16   Global Step: 685310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:37,826-Speed 2629.94 samples/sec   Loss 2.3665   LearningRate 0.0030   Epoch: 16   Global Step: 685320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:41,734-Speed 2621.18 samples/sec   Loss 2.4559   LearningRate 0.0030   Epoch: 16   Global Step: 685330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:45,633-Speed 2626.49 samples/sec   Loss 2.5221   LearningRate 0.0030   Epoch: 16   Global Step: 685340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:45:49,530-Speed 2628.66 samples/sec   Loss 2.5041   LearningRate 0.0030   Epoch: 16   Global Step: 685350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:45:53,427-Speed 2627.83 samples/sec   Loss 2.3204   LearningRate 0.0030   Epoch: 16   Global Step: 685360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:45:57,310-Speed 2638.42 samples/sec   Loss 2.4090   LearningRate 0.0030   Epoch: 16   Global Step: 685370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:01,212-Speed 2624.25 samples/sec   Loss 2.4574   LearningRate 0.0030   Epoch: 16   Global Step: 685380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:05,109-Speed 2629.10 samples/sec   Loss 2.4662   LearningRate 0.0030   Epoch: 16   Global Step: 685390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:09,006-Speed 2628.08 samples/sec   Loss 2.3905   LearningRate 0.0030   Epoch: 16   Global Step: 685400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:13,010-Speed 2557.79 samples/sec   Loss 2.4242   LearningRate 0.0030   Epoch: 16   Global Step: 685410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:16,909-Speed 2627.51 samples/sec   Loss 2.4284   LearningRate 0.0030   Epoch: 16   Global Step: 685420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:20,805-Speed 2629.25 samples/sec   Loss 2.4549   LearningRate 0.0030   Epoch: 16   Global Step: 685430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:24,698-Speed 2630.50 samples/sec   Loss 2.3834   LearningRate 0.0030   Epoch: 16   Global Step: 685440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:28,592-Speed 2630.63 samples/sec   Loss 2.4599   LearningRate 0.0030   Epoch: 16   Global Step: 685450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:32,532-Speed 2599.33 samples/sec   Loss 2.4576   LearningRate 0.0030   Epoch: 16   Global Step: 685460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:36,428-Speed 2629.08 samples/sec   Loss 2.4164   LearningRate 0.0030   Epoch: 16   Global Step: 685470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:46:40,306-Speed 2641.43 samples/sec   Loss 2.4587   LearningRate 0.0030   Epoch: 16   Global Step: 685480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:44,206-Speed 2626.13 samples/sec   Loss 2.4044   LearningRate 0.0030   Epoch: 16   Global Step: 685490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:48,108-Speed 2625.01 samples/sec   Loss 2.4077   LearningRate 0.0030   Epoch: 16   Global Step: 685500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:52,006-Speed 2627.49 samples/sec   Loss 2.4208   LearningRate 0.0030   Epoch: 16   Global Step: 685510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:55,901-Speed 2629.84 samples/sec   Loss 2.4899   LearningRate 0.0030   Epoch: 16   Global Step: 685520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:46:59,802-Speed 2625.61 samples/sec   Loss 2.3848   LearningRate 0.0030   Epoch: 16   Global Step: 685530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:03,695-Speed 2631.08 samples/sec   Loss 2.3970   LearningRate 0.0030   Epoch: 16   Global Step: 685540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:07,588-Speed 2631.02 samples/sec   Loss 2.4093   LearningRate 0.0030   Epoch: 16   Global Step: 685550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:11,483-Speed 2629.94 samples/sec   Loss 2.4788   LearningRate 0.0030   Epoch: 16   Global Step: 685560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:15,379-Speed 2629.09 samples/sec   Loss 2.4676   LearningRate 0.0030   Epoch: 16   Global Step: 685570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:19,277-Speed 2627.15 samples/sec   Loss 2.3968   LearningRate 0.0030   Epoch: 16   Global Step: 685580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:47:23,152-Speed 2643.90 samples/sec   Loss 2.4129   LearningRate 0.0030   Epoch: 16   Global Step: 685590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:27,042-Speed 2632.60 samples/sec   Loss 2.3583   LearningRate 0.0030   Epoch: 16   Global Step: 685600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:31,021-Speed 2574.71 samples/sec   Loss 2.4243   LearningRate 0.0030   Epoch: 16   Global Step: 685610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:34,922-Speed 2625.77 samples/sec   Loss 2.4731   LearningRate 0.0030   Epoch: 16   Global Step: 685620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:38,816-Speed 2629.93 samples/sec   Loss 2.3654   LearningRate 0.0030   Epoch: 16   Global Step: 685630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:42,750-Speed 2603.65 samples/sec   Loss 2.3791   LearningRate 0.0030   Epoch: 16   Global Step: 685640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:46,651-Speed 2625.98 samples/sec   Loss 2.3389   LearningRate 0.0030   Epoch: 16   Global Step: 685650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:50,546-Speed 2629.68 samples/sec   Loss 2.4433   LearningRate 0.0030   Epoch: 16   Global Step: 685660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:54,510-Speed 2583.73 samples/sec   Loss 2.3774   LearningRate 0.0030   Epoch: 16   Global Step: 685670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:47:58,407-Speed 2628.99 samples/sec   Loss 2.4545   LearningRate 0.0030   Epoch: 16   Global Step: 685680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:02,312-Speed 2622.76 samples/sec   Loss 2.4053   LearningRate 0.0030   Epoch: 16   Global Step: 685690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:06,217-Speed 2622.71 samples/sec   Loss 2.4003   LearningRate 0.0030   Epoch: 16   Global Step: 685700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:10,115-Speed 2627.57 samples/sec   Loss 2.4155   LearningRate 0.0030   Epoch: 16   Global Step: 685710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:14,026-Speed 2618.89 samples/sec   Loss 2.3987   LearningRate 0.0030   Epoch: 16   Global Step: 685720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:17,934-Speed 2621.19 samples/sec   Loss 2.4190   LearningRate 0.0030   Epoch: 16   Global Step: 685730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:21,834-Speed 2626.44 samples/sec   Loss 2.4530   LearningRate 0.0030   Epoch: 16   Global Step: 685740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:25,733-Speed 2626.58 samples/sec   Loss 2.3812   LearningRate 0.0030   Epoch: 16   Global Step: 685750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:29,680-Speed 2595.83 samples/sec   Loss 2.3559   LearningRate 0.0030   Epoch: 16   Global Step: 685760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:33,575-Speed 2629.57 samples/sec   Loss 2.4291   LearningRate 0.0030   Epoch: 16   Global Step: 685770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:37,470-Speed 2629.11 samples/sec   Loss 2.4140   LearningRate 0.0030   Epoch: 16   Global Step: 685780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:41,347-Speed 2641.94 samples/sec   Loss 2.4629   LearningRate 0.0030   Epoch: 16   Global Step: 685790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:45,358-Speed 2553.92 samples/sec   Loss 2.3255   LearningRate 0.0030   Epoch: 16   Global Step: 685800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:49,253-Speed 2629.87 samples/sec   Loss 2.4489   LearningRate 0.0030   Epoch: 16   Global Step: 685810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:53,210-Speed 2588.41 samples/sec   Loss 2.4292   LearningRate 0.0030   Epoch: 16   Global Step: 685820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:48:57,122-Speed 2618.60 samples/sec   Loss 2.3634   LearningRate 0.0030   Epoch: 16   Global Step: 685830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:01,019-Speed 2628.30 samples/sec   Loss 2.4802   LearningRate 0.0030   Epoch: 16   Global Step: 685840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:04,932-Speed 2617.91 samples/sec   Loss 2.3459   LearningRate 0.0030   Epoch: 16   Global Step: 685850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:08,827-Speed 2629.53 samples/sec   Loss 2.3458   LearningRate 0.0030   Epoch: 16   Global Step: 685860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:12,719-Speed 2631.18 samples/sec   Loss 2.3634   LearningRate 0.0030   Epoch: 16   Global Step: 685870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:16,619-Speed 2626.67 samples/sec   Loss 2.4268   LearningRate 0.0030   Epoch: 16   Global Step: 685880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:20,513-Speed 2630.10 samples/sec   Loss 2.3800   LearningRate 0.0030   Epoch: 16   Global Step: 685890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:49:24,427-Speed 2618.33 samples/sec   Loss 2.3709   LearningRate 0.0030   Epoch: 16   Global Step: 685900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:49:28,301-Speed 2643.81 samples/sec   Loss 2.4492   LearningRate 0.0030   Epoch: 16   Global Step: 685910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:32,198-Speed 2628.43 samples/sec   Loss 2.4128   LearningRate 0.0030   Epoch: 16   Global Step: 685920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:36,091-Speed 2630.85 samples/sec   Loss 2.2552   LearningRate 0.0030   Epoch: 16   Global Step: 685930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:39,984-Speed 2630.83 samples/sec   Loss 2.4933   LearningRate 0.0030   Epoch: 16   Global Step: 685940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:43,880-Speed 2628.89 samples/sec   Loss 2.4226   LearningRate 0.0030   Epoch: 16   Global Step: 685950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:47,778-Speed 2627.53 samples/sec   Loss 2.4134   LearningRate 0.0030   Epoch: 16   Global Step: 685960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:51,683-Speed 2623.81 samples/sec   Loss 2.4297   LearningRate 0.0030   Epoch: 16   Global Step: 685970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:55,586-Speed 2624.47 samples/sec   Loss 2.4289   LearningRate 0.0030   Epoch: 16   Global Step: 685980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:49:59,494-Speed 2621.78 samples/sec   Loss 2.3840   LearningRate 0.0030   Epoch: 16   Global Step: 685990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:03,413-Speed 2613.53 samples/sec   Loss 2.4287   LearningRate 0.0030   Epoch: 16   Global Step: 686000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:07,319-Speed 2622.14 samples/sec   Loss 2.3317   LearningRate 0.0030   Epoch: 16   Global Step: 686010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:50:11,218-Speed 2626.44 samples/sec   Loss 2.4256   LearningRate 0.0030   Epoch: 16   Global Step: 686020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:50:15,134-Speed 2616.56 samples/sec   Loss 2.3983   LearningRate 0.0030   Epoch: 16   Global Step: 686030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:50:19,031-Speed 2628.15 samples/sec   Loss 2.3921   LearningRate 0.0030   Epoch: 16   Global Step: 686040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:50:22,904-Speed 2643.88 samples/sec   Loss 2.3855   LearningRate 0.0030   Epoch: 16   Global Step: 686050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:26,821-Speed 2615.19 samples/sec   Loss 2.4193   LearningRate 0.0030   Epoch: 16   Global Step: 686060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:30,718-Speed 2628.37 samples/sec   Loss 2.3839   LearningRate 0.0030   Epoch: 16   Global Step: 686070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:34,618-Speed 2627.14 samples/sec   Loss 2.4018   LearningRate 0.0030   Epoch: 16   Global Step: 686080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:38,514-Speed 2628.29 samples/sec   Loss 2.3520   LearningRate 0.0030   Epoch: 16   Global Step: 686090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:42,410-Speed 2629.13 samples/sec   Loss 2.4591   LearningRate 0.0030   Epoch: 16   Global Step: 686100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:46,314-Speed 2623.05 samples/sec   Loss 2.4626   LearningRate 0.0030   Epoch: 16   Global Step: 686110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:50,225-Speed 2619.94 samples/sec   Loss 2.4595   LearningRate 0.0030   Epoch: 16   Global Step: 686120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:54,118-Speed 2630.69 samples/sec   Loss 2.3875   LearningRate 0.0030   Epoch: 16   Global Step: 686130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:50:58,014-Speed 2629.29 samples/sec   Loss 2.3818   LearningRate 0.0030   Epoch: 16   Global Step: 686140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:51:01,907-Speed 2630.65 samples/sec   Loss 2.3238   LearningRate 0.0030   Epoch: 16   Global Step: 686150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:51:05,808-Speed 2626.12 samples/sec   Loss 2.3899   LearningRate 0.0030   Epoch: 16   Global Step: 686160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:51:09,868-Speed 2523.01 samples/sec   Loss 2.3565   LearningRate 0.0030   Epoch: 16   Global Step: 686170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:51:13,766-Speed 2627.22 samples/sec   Loss 2.4298   LearningRate 0.0030   Epoch: 16   Global Step: 686180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:51:17,645-Speed 2640.16 samples/sec   Loss 2.4030   LearningRate 0.0030   Epoch: 16   Global Step: 686190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:51:21,540-Speed 2630.08 samples/sec   Loss 2.4033   LearningRate 0.0030   Epoch: 16   Global Step: 686200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:51:25,437-Speed 2628.38 samples/sec   Loss 2.3313   LearningRate 0.0030   Epoch: 16   Global Step: 686210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:51:29,334-Speed 2628.96 samples/sec   Loss 2.4495   LearningRate 0.0030   Epoch: 16   Global Step: 686220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:51:33,228-Speed 2629.78 samples/sec   Loss 2.3733   LearningRate 0.0030   Epoch: 16   Global Step: 686230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:51:37,126-Speed 2628.11 samples/sec   Loss 2.4667   LearningRate 0.0030   Epoch: 16   Global Step: 686240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:51:40,994-Speed 2647.82 samples/sec   Loss 2.3664   LearningRate 0.0030   Epoch: 16   Global Step: 686250   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:51:44,896-Speed 2624.98 samples/sec   Loss 2.4490   LearningRate 0.0030   Epoch: 16   Global Step: 686260   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:51:48,801-Speed 2622.55 samples/sec   Loss 2.4037   LearningRate 0.0030   Epoch: 16   Global Step: 686270   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:51:52,692-Speed 2633.02 samples/sec   Loss 2.3810   LearningRate 0.0030   Epoch: 16   Global Step: 686280   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:51:56,587-Speed 2630.67 samples/sec   Loss 2.4304   LearningRate 0.0030   Epoch: 16   Global Step: 686290   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:52:00,491-Speed 2623.53 samples/sec   Loss 2.3765   LearningRate 0.0030   Epoch: 16   Global Step: 686300   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:52:04,384-Speed 2631.01 samples/sec   Loss 2.3947   LearningRate 0.0030   Epoch: 16   Global Step: 686310   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:52:08,282-Speed 2627.51 samples/sec   Loss 2.4204   LearningRate 0.0030   Epoch: 16   Global Step: 686320   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:52:12,216-Speed 2603.41 samples/sec   Loss 2.3807   LearningRate 0.0030   Epoch: 16   Global Step: 686330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:52:16,113-Speed 2628.20 samples/sec   Loss 2.3998   LearningRate 0.0030   Epoch: 16   Global Step: 686340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:52:20,030-Speed 2615.68 samples/sec   Loss 2.3357   LearningRate 0.0030   Epoch: 16   Global Step: 686350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:23,931-Speed 2625.62 samples/sec   Loss 2.3722   LearningRate 0.0030   Epoch: 16   Global Step: 686360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:27,831-Speed 2626.15 samples/sec   Loss 2.4251   LearningRate 0.0030   Epoch: 16   Global Step: 686370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:31,726-Speed 2630.07 samples/sec   Loss 2.3572   LearningRate 0.0030   Epoch: 16   Global Step: 686380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:35,800-Speed 2513.83 samples/sec   Loss 2.4173   LearningRate 0.0030   Epoch: 16   Global Step: 686390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:39,791-Speed 2566.03 samples/sec   Loss 2.3755   LearningRate 0.0030   Epoch: 16   Global Step: 686400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:43,686-Speed 2629.95 samples/sec   Loss 2.3021   LearningRate 0.0030   Epoch: 16   Global Step: 686410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:47,591-Speed 2623.14 samples/sec   Loss 2.3748   LearningRate 0.0030   Epoch: 16   Global Step: 686420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:51,493-Speed 2624.66 samples/sec   Loss 2.4684   LearningRate 0.0030   Epoch: 16   Global Step: 686430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:55,394-Speed 2625.89 samples/sec   Loss 2.4417   LearningRate 0.0030   Epoch: 16   Global Step: 686440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:52:59,295-Speed 2625.80 samples/sec   Loss 2.4262   LearningRate 0.0030   Epoch: 16   Global Step: 686450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:53:03,244-Speed 2593.53 samples/sec   Loss 2.3837   LearningRate 0.0030   Epoch: 16   Global Step: 686460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:07,144-Speed 2626.19 samples/sec   Loss 2.3891   LearningRate 0.0030   Epoch: 16   Global Step: 686470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:11,042-Speed 2627.86 samples/sec   Loss 2.3762   LearningRate 0.0030   Epoch: 16   Global Step: 686480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:14,940-Speed 2627.17 samples/sec   Loss 2.4182   LearningRate 0.0030   Epoch: 16   Global Step: 686490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:18,834-Speed 2630.14 samples/sec   Loss 2.2984   LearningRate 0.0030   Epoch: 16   Global Step: 686500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:22,734-Speed 2626.67 samples/sec   Loss 2.4604   LearningRate 0.0030   Epoch: 16   Global Step: 686510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:26,632-Speed 2627.42 samples/sec   Loss 2.3598   LearningRate 0.0030   Epoch: 16   Global Step: 686520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:30,537-Speed 2623.33 samples/sec   Loss 2.4100   LearningRate 0.0030   Epoch: 16   Global Step: 686530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:34,445-Speed 2620.63 samples/sec   Loss 2.3963   LearningRate 0.0030   Epoch: 16   Global Step: 686540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:38,338-Speed 2631.51 samples/sec   Loss 2.4240   LearningRate 0.0030   Epoch: 16   Global Step: 686550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:42,235-Speed 2628.27 samples/sec   Loss 2.3549   LearningRate 0.0030   Epoch: 16   Global Step: 686560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:53:46,110-Speed 2642.96 samples/sec   Loss 2.4299   LearningRate 0.0030   Epoch: 16   Global Step: 686570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:50,012-Speed 2625.27 samples/sec   Loss 2.3311   LearningRate 0.0030   Epoch: 16   Global Step: 686580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:53,910-Speed 2627.92 samples/sec   Loss 2.3945   LearningRate 0.0030   Epoch: 16   Global Step: 686590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:53:57,812-Speed 2624.72 samples/sec   Loss 2.4270   LearningRate 0.0030   Epoch: 16   Global Step: 686600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:01,709-Speed 2628.64 samples/sec   Loss 2.3415   LearningRate 0.0030   Epoch: 16   Global Step: 686610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:05,628-Speed 2613.90 samples/sec   Loss 2.4112   LearningRate 0.0030   Epoch: 16   Global Step: 686620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:09,532-Speed 2623.74 samples/sec   Loss 2.4256   LearningRate 0.0030   Epoch: 16   Global Step: 686630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:13,437-Speed 2622.68 samples/sec   Loss 2.4466   LearningRate 0.0030   Epoch: 16   Global Step: 686640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:17,344-Speed 2621.69 samples/sec   Loss 2.4395   LearningRate 0.0030   Epoch: 16   Global Step: 686650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:21,250-Speed 2622.73 samples/sec   Loss 2.3714   LearningRate 0.0030   Epoch: 16   Global Step: 686660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:25,158-Speed 2620.83 samples/sec   Loss 2.4272   LearningRate 0.0030   Epoch: 16   Global Step: 686670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:54:29,064-Speed 2622.15 samples/sec   Loss 2.4066   LearningRate 0.0030   Epoch: 16   Global Step: 686680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:54:32,967-Speed 2624.84 samples/sec   Loss 2.3733   LearningRate 0.0030   Epoch: 16   Global Step: 686690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:54:36,866-Speed 2626.55 samples/sec   Loss 2.4355   LearningRate 0.0030   Epoch: 16   Global Step: 686700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 00:54:40,770-Speed 2623.83 samples/sec   Loss 2.3765   LearningRate 0.0030   Epoch: 16   Global Step: 686710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:44,690-Speed 2612.88 samples/sec   Loss 2.4000   LearningRate 0.0030   Epoch: 16   Global Step: 686720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:48,594-Speed 2623.56 samples/sec   Loss 2.3491   LearningRate 0.0030   Epoch: 16   Global Step: 686730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:52,491-Speed 2629.19 samples/sec   Loss 2.3592   LearningRate 0.0030   Epoch: 16   Global Step: 686740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:54:56,387-Speed 2629.27 samples/sec   Loss 2.4275   LearningRate 0.0030   Epoch: 16   Global Step: 686750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:55:00,285-Speed 2628.48 samples/sec   Loss 2.3595   LearningRate 0.0030   Epoch: 16   Global Step: 686760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:55:04,187-Speed 2624.34 samples/sec   Loss 2.4338   LearningRate 0.0030   Epoch: 16   Global Step: 686770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:55:08,080-Speed 2631.27 samples/sec   Loss 2.3456   LearningRate 0.0030   Epoch: 16   Global Step: 686780   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:11,979-Speed 2626.26 samples/sec   Loss 2.4338   LearningRate 0.0030   Epoch: 16   Global Step: 686790   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:15,907-Speed 2608.33 samples/sec   Loss 2.3464   LearningRate 0.0030   Epoch: 16   Global Step: 686800   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:19,873-Speed 2582.79 samples/sec   Loss 2.3576   LearningRate 0.0030   Epoch: 16   Global Step: 686810   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:23,781-Speed 2620.62 samples/sec   Loss 2.3264   LearningRate 0.0030   Epoch: 16   Global Step: 686820   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:27,686-Speed 2623.50 samples/sec   Loss 2.4479   LearningRate 0.0030   Epoch: 16   Global Step: 686830   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:31,584-Speed 2627.44 samples/sec   Loss 2.3983   LearningRate 0.0030   Epoch: 16   Global Step: 686840   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:35,482-Speed 2627.69 samples/sec   Loss 2.3704   LearningRate 0.0030   Epoch: 16   Global Step: 686850   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:39,383-Speed 2625.48 samples/sec   Loss 2.3635   LearningRate 0.0030   Epoch: 16   Global Step: 686860   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:43,300-Speed 2614.53 samples/sec   Loss 2.3712   LearningRate 0.0030   Epoch: 16   Global Step: 686870   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:55:47,194-Speed 2630.95 samples/sec   Loss 2.4150   LearningRate 0.0030   Epoch: 16   Global Step: 686880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:55:51,093-Speed 2626.69 samples/sec   Loss 2.4475   LearningRate 0.0030   Epoch: 16   Global Step: 686890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:55:54,988-Speed 2629.84 samples/sec   Loss 2.4416   LearningRate 0.0030   Epoch: 16   Global Step: 686900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:55:58,877-Speed 2633.63 samples/sec   Loss 2.3973   LearningRate 0.0030   Epoch: 16   Global Step: 686910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:02,810-Speed 2604.82 samples/sec   Loss 2.3797   LearningRate 0.0030   Epoch: 16   Global Step: 686920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:06,703-Speed 2630.69 samples/sec   Loss 2.3532   LearningRate 0.0030   Epoch: 16   Global Step: 686930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:10,594-Speed 2632.10 samples/sec   Loss 2.3934   LearningRate 0.0030   Epoch: 16   Global Step: 686940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:14,495-Speed 2625.77 samples/sec   Loss 2.3702   LearningRate 0.0030   Epoch: 16   Global Step: 686950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:18,405-Speed 2619.11 samples/sec   Loss 2.2566   LearningRate 0.0030   Epoch: 16   Global Step: 686960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:22,311-Speed 2622.75 samples/sec   Loss 2.3433   LearningRate 0.0030   Epoch: 16   Global Step: 686970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:26,188-Speed 2642.16 samples/sec   Loss 2.4330   LearningRate 0.0030   Epoch: 16   Global Step: 686980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:30,085-Speed 2628.33 samples/sec   Loss 2.3245   LearningRate 0.0030   Epoch: 16   Global Step: 686990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:33,984-Speed 2627.31 samples/sec   Loss 2.3501   LearningRate 0.0030   Epoch: 16   Global Step: 687000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:37,882-Speed 2628.02 samples/sec   Loss 2.3509   LearningRate 0.0030   Epoch: 16   Global Step: 687010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:41,782-Speed 2626.20 samples/sec   Loss 2.3306   LearningRate 0.0030   Epoch: 16   Global Step: 687020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:45,683-Speed 2625.32 samples/sec   Loss 2.4746   LearningRate 0.0030   Epoch: 16   Global Step: 687030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:49,585-Speed 2624.95 samples/sec   Loss 2.3952   LearningRate 0.0030   Epoch: 16   Global Step: 687040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:53,481-Speed 2629.16 samples/sec   Loss 2.3857   LearningRate 0.0030   Epoch: 16   Global Step: 687050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:56:57,357-Speed 2642.34 samples/sec   Loss 2.3594   LearningRate 0.0030   Epoch: 16   Global Step: 687060   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:01,260-Speed 2623.90 samples/sec   Loss 2.3723   LearningRate 0.0030   Epoch: 16   Global Step: 687070   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:05,157-Speed 2628.89 samples/sec   Loss 2.3414   LearningRate 0.0030   Epoch: 16   Global Step: 687080   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:09,053-Speed 2629.42 samples/sec   Loss 2.3760   LearningRate 0.0030   Epoch: 16   Global Step: 687090   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:12,946-Speed 2630.31 samples/sec   Loss 2.4116   LearningRate 0.0029   Epoch: 16   Global Step: 687100   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:16,842-Speed 2628.83 samples/sec   Loss 2.4568   LearningRate 0.0029   Epoch: 16   Global Step: 687110   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:20,741-Speed 2627.38 samples/sec   Loss 2.3984   LearningRate 0.0029   Epoch: 16   Global Step: 687120   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:24,639-Speed 2627.77 samples/sec   Loss 2.3615   LearningRate 0.0029   Epoch: 16   Global Step: 687130   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:28,587-Speed 2594.50 samples/sec   Loss 2.3195   LearningRate 0.0029   Epoch: 16   Global Step: 687140   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:32,489-Speed 2624.88 samples/sec   Loss 2.4336   LearningRate 0.0029   Epoch: 16   Global Step: 687150   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:36,386-Speed 2628.39 samples/sec   Loss 2.3504   LearningRate 0.0029   Epoch: 16   Global Step: 687160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:57:40,256-Speed 2646.45 samples/sec   Loss 2.3850   LearningRate 0.0029   Epoch: 16   Global Step: 687170   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:44,152-Speed 2629.25 samples/sec   Loss 2.4191   LearningRate 0.0029   Epoch: 16   Global Step: 687180   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:48,050-Speed 2627.70 samples/sec   Loss 2.4113   LearningRate 0.0029   Epoch: 16   Global Step: 687190   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:51,944-Speed 2630.32 samples/sec   Loss 2.3285   LearningRate 0.0029   Epoch: 16   Global Step: 687200   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:55,841-Speed 2628.15 samples/sec   Loss 2.3503   LearningRate 0.0029   Epoch: 16   Global Step: 687210   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:57:59,738-Speed 2629.45 samples/sec   Loss 2.3512   LearningRate 0.0029   Epoch: 16   Global Step: 687220   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:03,634-Speed 2628.70 samples/sec   Loss 2.3330   LearningRate 0.0029   Epoch: 16   Global Step: 687230   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:07,527-Speed 2631.75 samples/sec   Loss 2.4448   LearningRate 0.0029   Epoch: 16   Global Step: 687240   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:11,424-Speed 2627.73 samples/sec   Loss 2.3790   LearningRate 0.0029   Epoch: 16   Global Step: 687250   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:15,330-Speed 2622.12 samples/sec   Loss 2.3723   LearningRate 0.0029   Epoch: 16   Global Step: 687260   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:19,234-Speed 2623.36 samples/sec   Loss 2.4114   LearningRate 0.0029   Epoch: 16   Global Step: 687270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:58:23,151-Speed 2614.88 samples/sec   Loss 2.4389   LearningRate 0.0029   Epoch: 16   Global Step: 687280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:58:27,048-Speed 2628.38 samples/sec   Loss 2.4053   LearningRate 0.0029   Epoch: 16   Global Step: 687290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:58:30,921-Speed 2644.61 samples/sec   Loss 2.3398   LearningRate 0.0029   Epoch: 16   Global Step: 687300   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:34,817-Speed 2629.42 samples/sec   Loss 2.3838   LearningRate 0.0029   Epoch: 16   Global Step: 687310   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:38,712-Speed 2629.12 samples/sec   Loss 2.3747   LearningRate 0.0029   Epoch: 16   Global Step: 687320   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:42,611-Speed 2627.54 samples/sec   Loss 2.3676   LearningRate 0.0029   Epoch: 16   Global Step: 687330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:46,507-Speed 2628.73 samples/sec   Loss 2.4165   LearningRate 0.0029   Epoch: 16   Global Step: 687340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:50,417-Speed 2619.92 samples/sec   Loss 2.3960   LearningRate 0.0029   Epoch: 16   Global Step: 687350   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:54,317-Speed 2626.28 samples/sec   Loss 2.3506   LearningRate 0.0029   Epoch: 16   Global Step: 687360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:58:58,212-Speed 2629.37 samples/sec   Loss 2.3224   LearningRate 0.0029   Epoch: 16   Global Step: 687370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:02,128-Speed 2615.45 samples/sec   Loss 2.4152   LearningRate 0.0029   Epoch: 16   Global Step: 687380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:06,026-Speed 2627.91 samples/sec   Loss 2.2556   LearningRate 0.0029   Epoch: 16   Global Step: 687390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:09,901-Speed 2643.31 samples/sec   Loss 2.3665   LearningRate 0.0029   Epoch: 16   Global Step: 687400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:13,800-Speed 2626.72 samples/sec   Loss 2.3715   LearningRate 0.0029   Epoch: 16   Global Step: 687410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:17,697-Speed 2629.16 samples/sec   Loss 2.3939   LearningRate 0.0029   Epoch: 16   Global Step: 687420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:21,593-Speed 2628.99 samples/sec   Loss 2.4275   LearningRate 0.0029   Epoch: 16   Global Step: 687430   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:25,494-Speed 2625.26 samples/sec   Loss 2.3773   LearningRate 0.0029   Epoch: 16   Global Step: 687440   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:29,412-Speed 2614.29 samples/sec   Loss 2.3365   LearningRate 0.0029   Epoch: 16   Global Step: 687450   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:33,309-Speed 2629.31 samples/sec   Loss 2.3861   LearningRate 0.0029   Epoch: 16   Global Step: 687460   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:37,210-Speed 2624.95 samples/sec   Loss 2.3563   LearningRate 0.0029   Epoch: 16   Global Step: 687470   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:41,110-Speed 2626.82 samples/sec   Loss 2.3954   LearningRate 0.0029   Epoch: 16   Global Step: 687480   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:45,100-Speed 2566.95 samples/sec   Loss 2.3134   LearningRate 0.0029   Epoch: 16   Global Step: 687490   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 00:59:48,994-Speed 2630.26 samples/sec   Loss 2.3905   LearningRate 0.0029   Epoch: 16   Global Step: 687500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:59:52,894-Speed 2626.14 samples/sec   Loss 2.3696   LearningRate 0.0029   Epoch: 16   Global Step: 687510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 00:59:56,795-Speed 2625.17 samples/sec   Loss 2.3494   LearningRate 0.0029   Epoch: 16   Global Step: 687520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:00,700-Speed 2623.80 samples/sec   Loss 2.2718   LearningRate 0.0029   Epoch: 16   Global Step: 687530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:04,616-Speed 2615.30 samples/sec   Loss 2.3462   LearningRate 0.0029   Epoch: 16   Global Step: 687540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:08,511-Speed 2629.42 samples/sec   Loss 2.3622   LearningRate 0.0029   Epoch: 16   Global Step: 687550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:12,405-Speed 2630.32 samples/sec   Loss 2.2822   LearningRate 0.0029   Epoch: 16   Global Step: 687560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:16,304-Speed 2626.96 samples/sec   Loss 2.3966   LearningRate 0.0029   Epoch: 16   Global Step: 687570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:20,292-Speed 2568.57 samples/sec   Loss 2.3870   LearningRate 0.0029   Epoch: 16   Global Step: 687580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:24,188-Speed 2629.23 samples/sec   Loss 2.4064   LearningRate 0.0029   Epoch: 16   Global Step: 687590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:28,106-Speed 2614.69 samples/sec   Loss 2.3282   LearningRate 0.0029   Epoch: 16   Global Step: 687600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:00:32,011-Speed 2623.47 samples/sec   Loss 2.3844   LearningRate 0.0029   Epoch: 16   Global Step: 687610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:00:35,926-Speed 2616.17 samples/sec   Loss 2.3389   LearningRate 0.0029   Epoch: 16   Global Step: 687620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:39,823-Speed 2627.93 samples/sec   Loss 2.3946   LearningRate 0.0029   Epoch: 16   Global Step: 687630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:43,757-Speed 2603.84 samples/sec   Loss 2.3403   LearningRate 0.0029   Epoch: 16   Global Step: 687640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:47,658-Speed 2625.74 samples/sec   Loss 2.4113   LearningRate 0.0029   Epoch: 16   Global Step: 687650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:51,563-Speed 2623.13 samples/sec   Loss 2.3745   LearningRate 0.0029   Epoch: 16   Global Step: 687660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:55,462-Speed 2627.26 samples/sec   Loss 2.4700   LearningRate 0.0029   Epoch: 16   Global Step: 687670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:00:59,359-Speed 2627.92 samples/sec   Loss 2.3949   LearningRate 0.0029   Epoch: 16   Global Step: 687680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:03,264-Speed 2623.81 samples/sec   Loss 2.3383   LearningRate 0.0029   Epoch: 16   Global Step: 687690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:07,164-Speed 2626.44 samples/sec   Loss 2.3684   LearningRate 0.0029   Epoch: 16   Global Step: 687700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:11,077-Speed 2617.44 samples/sec   Loss 2.3910   LearningRate 0.0029   Epoch: 16   Global Step: 687710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:14,999-Speed 2610.74 samples/sec   Loss 2.4107   LearningRate 0.0029   Epoch: 16   Global Step: 687720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:01:18,899-Speed 2626.44 samples/sec   Loss 2.3892   LearningRate 0.0029   Epoch: 16   Global Step: 687730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:01:22,844-Speed 2597.11 samples/sec   Loss 2.3845   LearningRate 0.0029   Epoch: 16   Global Step: 687740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:26,753-Speed 2619.79 samples/sec   Loss 2.4239   LearningRate 0.0029   Epoch: 16   Global Step: 687750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:30,691-Speed 2601.73 samples/sec   Loss 2.3533   LearningRate 0.0029   Epoch: 16   Global Step: 687760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:34,678-Speed 2568.91 samples/sec   Loss 2.3436   LearningRate 0.0029   Epoch: 16   Global Step: 687770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:38,591-Speed 2617.37 samples/sec   Loss 2.4742   LearningRate 0.0029   Epoch: 16   Global Step: 687780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:42,497-Speed 2622.29 samples/sec   Loss 2.3631   LearningRate 0.0029   Epoch: 16   Global Step: 687790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:46,436-Speed 2600.48 samples/sec   Loss 2.3819   LearningRate 0.0029   Epoch: 16   Global Step: 687800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:50,380-Speed 2596.76 samples/sec   Loss 2.3916   LearningRate 0.0029   Epoch: 16   Global Step: 687810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:54,298-Speed 2615.02 samples/sec   Loss 2.3439   LearningRate 0.0029   Epoch: 16   Global Step: 687820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:01:58,217-Speed 2613.53 samples/sec   Loss 2.3990   LearningRate 0.0029   Epoch: 16   Global Step: 687830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:02,113-Speed 2629.31 samples/sec   Loss 2.3784   LearningRate 0.0029   Epoch: 16   Global Step: 687840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:02:05,989-Speed 2641.93 samples/sec   Loss 2.3425   LearningRate 0.0029   Epoch: 16   Global Step: 687850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:09,884-Speed 2630.04 samples/sec   Loss 2.3362   LearningRate 0.0029   Epoch: 16   Global Step: 687860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:13,778-Speed 2630.52 samples/sec   Loss 2.3796   LearningRate 0.0029   Epoch: 16   Global Step: 687870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:17,673-Speed 2629.52 samples/sec   Loss 2.3121   LearningRate 0.0029   Epoch: 16   Global Step: 687880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:21,595-Speed 2611.12 samples/sec   Loss 2.2923   LearningRate 0.0029   Epoch: 16   Global Step: 687890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:25,499-Speed 2623.76 samples/sec   Loss 2.3415   LearningRate 0.0029   Epoch: 16   Global Step: 687900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:29,397-Speed 2628.03 samples/sec   Loss 2.3498   LearningRate 0.0029   Epoch: 16   Global Step: 687910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:33,398-Speed 2560.24 samples/sec   Loss 2.3722   LearningRate 0.0029   Epoch: 16   Global Step: 687920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:37,364-Speed 2582.69 samples/sec   Loss 2.4050   LearningRate 0.0029   Epoch: 16   Global Step: 687930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:41,493-Speed 2480.13 samples/sec   Loss 2.4410   LearningRate 0.0029   Epoch: 16   Global Step: 687940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:45,392-Speed 2627.07 samples/sec   Loss 2.3717   LearningRate 0.0029   Epoch: 16   Global Step: 687950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:02:49,277-Speed 2636.00 samples/sec   Loss 2.3998   LearningRate 0.0029   Epoch: 16   Global Step: 687960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:53,430-Speed 2467.39 samples/sec   Loss 2.4240   LearningRate 0.0029   Epoch: 16   Global Step: 687970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:02:57,328-Speed 2628.05 samples/sec   Loss 2.4037   LearningRate 0.0029   Epoch: 16   Global Step: 687980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:03:01,229-Speed 2625.39 samples/sec   Loss 2.3963   LearningRate 0.0029   Epoch: 16   Global Step: 687990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:03:05,177-Speed 2594.77 samples/sec   Loss 2.3624   LearningRate 0.0029   Epoch: 16   Global Step: 688000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:03:09,073-Speed 2628.44 samples/sec   Loss 2.2669   LearningRate 0.0029   Epoch: 16   Global Step: 688010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:03:12,969-Speed 2629.20 samples/sec   Loss 2.3834   LearningRate 0.0029   Epoch: 16   Global Step: 688020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:03:16,842-Speed 2644.48 samples/sec   Loss 2.4704   LearningRate 0.0029   Epoch: 16   Global Step: 688030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:20,788-Speed 2598.67 samples/sec   Loss 2.3514   LearningRate 0.0029   Epoch: 16   Global Step: 688040   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:24,683-Speed 2629.33 samples/sec   Loss 2.3345   LearningRate 0.0029   Epoch: 16   Global Step: 688050   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:28,595-Speed 2619.23 samples/sec   Loss 2.3403   LearningRate 0.0029   Epoch: 16   Global Step: 688060   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:32,491-Speed 2628.45 samples/sec   Loss 2.3412   LearningRate 0.0029   Epoch: 16   Global Step: 688070   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:36,390-Speed 2627.67 samples/sec   Loss 2.2982   LearningRate 0.0029   Epoch: 16   Global Step: 688080   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:40,284-Speed 2629.86 samples/sec   Loss 2.3782   LearningRate 0.0029   Epoch: 16   Global Step: 688090   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:44,199-Speed 2615.94 samples/sec   Loss 2.3948   LearningRate 0.0029   Epoch: 16   Global Step: 688100   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:48,096-Speed 2628.24 samples/sec   Loss 2.3618   LearningRate 0.0029   Epoch: 16   Global Step: 688110   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:51,990-Speed 2630.55 samples/sec   Loss 2.3520   LearningRate 0.0029   Epoch: 16   Global Step: 688120   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:03:55,900-Speed 2619.68 samples/sec   Loss 2.3827   LearningRate 0.0029   Epoch: 16   Global Step: 688130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:03:59,793-Speed 2630.98 samples/sec   Loss 2.4016   LearningRate 0.0029   Epoch: 16   Global Step: 688140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:03,689-Speed 2629.16 samples/sec   Loss 2.2992   LearningRate 0.0029   Epoch: 16   Global Step: 688150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:07,588-Speed 2626.92 samples/sec   Loss 2.3483   LearningRate 0.0029   Epoch: 16   Global Step: 688160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:11,480-Speed 2631.34 samples/sec   Loss 2.3182   LearningRate 0.0029   Epoch: 16   Global Step: 688170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:15,384-Speed 2623.87 samples/sec   Loss 2.3389   LearningRate 0.0029   Epoch: 16   Global Step: 688180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:19,279-Speed 2628.93 samples/sec   Loss 2.2955   LearningRate 0.0029   Epoch: 16   Global Step: 688190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:23,184-Speed 2623.65 samples/sec   Loss 2.3733   LearningRate 0.0029   Epoch: 16   Global Step: 688200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:27,081-Speed 2627.76 samples/sec   Loss 2.3326   LearningRate 0.0029   Epoch: 16   Global Step: 688210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:30,975-Speed 2630.89 samples/sec   Loss 2.3412   LearningRate 0.0029   Epoch: 16   Global Step: 688220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:34,870-Speed 2629.44 samples/sec   Loss 2.3192   LearningRate 0.0029   Epoch: 16   Global Step: 688230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:04:38,745-Speed 2643.23 samples/sec   Loss 2.3712   LearningRate 0.0029   Epoch: 16   Global Step: 688240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:42,636-Speed 2633.18 samples/sec   Loss 2.3110   LearningRate 0.0029   Epoch: 16   Global Step: 688250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:46,530-Speed 2629.73 samples/sec   Loss 2.3217   LearningRate 0.0029   Epoch: 16   Global Step: 688260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:50,425-Speed 2630.35 samples/sec   Loss 2.3308   LearningRate 0.0029   Epoch: 16   Global Step: 688270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:54,331-Speed 2621.66 samples/sec   Loss 2.3719   LearningRate 0.0029   Epoch: 16   Global Step: 688280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:04:58,225-Speed 2630.40 samples/sec   Loss 2.4141   LearningRate 0.0029   Epoch: 16   Global Step: 688290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:02,129-Speed 2623.31 samples/sec   Loss 2.3864   LearningRate 0.0029   Epoch: 16   Global Step: 688300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:06,024-Speed 2630.49 samples/sec   Loss 2.3758   LearningRate 0.0029   Epoch: 16   Global Step: 688310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:09,918-Speed 2629.76 samples/sec   Loss 2.3291   LearningRate 0.0029   Epoch: 16   Global Step: 688320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:13,823-Speed 2623.26 samples/sec   Loss 2.2990   LearningRate 0.0029   Epoch: 16   Global Step: 688330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:17,727-Speed 2624.18 samples/sec   Loss 2.3367   LearningRate 0.0029   Epoch: 16   Global Step: 688340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:05:21,647-Speed 2612.61 samples/sec   Loss 2.4409   LearningRate 0.0029   Epoch: 16   Global Step: 688350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:05:25,531-Speed 2637.66 samples/sec   Loss 2.3853   LearningRate 0.0029   Epoch: 16   Global Step: 688360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:29,452-Speed 2612.18 samples/sec   Loss 2.3164   LearningRate 0.0029   Epoch: 16   Global Step: 688370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:33,384-Speed 2604.61 samples/sec   Loss 2.3604   LearningRate 0.0029   Epoch: 16   Global Step: 688380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:37,277-Speed 2630.85 samples/sec   Loss 2.4378   LearningRate 0.0029   Epoch: 16   Global Step: 688390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:41,174-Speed 2628.73 samples/sec   Loss 2.2480   LearningRate 0.0029   Epoch: 16   Global Step: 688400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:45,072-Speed 2627.67 samples/sec   Loss 2.3722   LearningRate 0.0029   Epoch: 16   Global Step: 688410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:48,968-Speed 2628.69 samples/sec   Loss 2.4179   LearningRate 0.0029   Epoch: 16   Global Step: 688420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:52,863-Speed 2629.97 samples/sec   Loss 2.3605   LearningRate 0.0029   Epoch: 16   Global Step: 688430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:05:56,757-Speed 2630.50 samples/sec   Loss 2.4295   LearningRate 0.0029   Epoch: 16   Global Step: 688440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:00,650-Speed 2631.40 samples/sec   Loss 2.3557   LearningRate 0.0029   Epoch: 16   Global Step: 688450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:04,543-Speed 2630.31 samples/sec   Loss 2.3434   LearningRate 0.0029   Epoch: 16   Global Step: 688460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:06:08,435-Speed 2631.77 samples/sec   Loss 2.3440   LearningRate 0.0029   Epoch: 16   Global Step: 688470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:06:12,330-Speed 2630.01 samples/sec   Loss 2.4122   LearningRate 0.0029   Epoch: 16   Global Step: 688480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:06:16,231-Speed 2625.22 samples/sec   Loss 2.3649   LearningRate 0.0029   Epoch: 16   Global Step: 688490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:06:20,128-Speed 2628.71 samples/sec   Loss 2.4023   LearningRate 0.0029   Epoch: 16   Global Step: 688500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:06:24,002-Speed 2643.30 samples/sec   Loss 2.3253   LearningRate 0.0029   Epoch: 16   Global Step: 688510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:27,896-Speed 2630.85 samples/sec   Loss 2.3663   LearningRate 0.0029   Epoch: 16   Global Step: 688520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:31,809-Speed 2617.75 samples/sec   Loss 2.3880   LearningRate 0.0029   Epoch: 16   Global Step: 688530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:35,709-Speed 2626.38 samples/sec   Loss 2.2991   LearningRate 0.0029   Epoch: 16   Global Step: 688540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:39,603-Speed 2630.12 samples/sec   Loss 2.3447   LearningRate 0.0029   Epoch: 16   Global Step: 688550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:43,498-Speed 2629.30 samples/sec   Loss 2.3290   LearningRate 0.0029   Epoch: 16   Global Step: 688560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:47,392-Speed 2631.03 samples/sec   Loss 2.2947   LearningRate 0.0029   Epoch: 16   Global Step: 688570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:51,292-Speed 2625.87 samples/sec   Loss 2.3256   LearningRate 0.0029   Epoch: 16   Global Step: 688580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:55,192-Speed 2626.61 samples/sec   Loss 2.3536   LearningRate 0.0029   Epoch: 16   Global Step: 688590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:06:59,085-Speed 2630.92 samples/sec   Loss 2.3460   LearningRate 0.0029   Epoch: 16   Global Step: 688600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:02,979-Speed 2630.09 samples/sec   Loss 2.3764   LearningRate 0.0029   Epoch: 16   Global Step: 688610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:07:06,873-Speed 2630.72 samples/sec   Loss 2.2635   LearningRate 0.0029   Epoch: 16   Global Step: 688620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:07:10,787-Speed 2616.65 samples/sec   Loss 2.2958   LearningRate 0.0029   Epoch: 16   Global Step: 688630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:14,680-Speed 2631.37 samples/sec   Loss 2.3765   LearningRate 0.0029   Epoch: 16   Global Step: 688640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:18,573-Speed 2630.80 samples/sec   Loss 2.3727   LearningRate 0.0029   Epoch: 16   Global Step: 688650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:22,486-Speed 2617.68 samples/sec   Loss 2.3139   LearningRate 0.0029   Epoch: 16   Global Step: 688660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:26,440-Speed 2590.45 samples/sec   Loss 2.2651   LearningRate 0.0029   Epoch: 16   Global Step: 688670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:30,338-Speed 2628.01 samples/sec   Loss 2.3684   LearningRate 0.0029   Epoch: 16   Global Step: 688680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:34,249-Speed 2618.61 samples/sec   Loss 2.3186   LearningRate 0.0029   Epoch: 16   Global Step: 688690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:38,143-Speed 2630.46 samples/sec   Loss 2.3654   LearningRate 0.0029   Epoch: 16   Global Step: 688700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:42,037-Speed 2630.90 samples/sec   Loss 2.3907   LearningRate 0.0029   Epoch: 16   Global Step: 688710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:45,932-Speed 2629.41 samples/sec   Loss 2.3813   LearningRate 0.0029   Epoch: 16   Global Step: 688720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:49,811-Speed 2640.23 samples/sec   Loss 2.3414   LearningRate 0.0029   Epoch: 16   Global Step: 688730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:53,724-Speed 2617.98 samples/sec   Loss 2.3498   LearningRate 0.0029   Epoch: 16   Global Step: 688740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:07:57,815-Speed 2504.14 samples/sec   Loss 2.3232   LearningRate 0.0029   Epoch: 16   Global Step: 688750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:08:01,721-Speed 2621.77 samples/sec   Loss 2.3485   LearningRate 0.0029   Epoch: 16   Global Step: 688760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:08:05,598-Speed 2641.89 samples/sec   Loss 2.3924   LearningRate 0.0029   Epoch: 16   Global Step: 688770   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:09,509-Speed 2618.70 samples/sec   Loss 2.3870   LearningRate 0.0029   Epoch: 16   Global Step: 688780   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:13,430-Speed 2612.43 samples/sec   Loss 2.4033   LearningRate 0.0029   Epoch: 16   Global Step: 688790   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:17,341-Speed 2618.24 samples/sec   Loss 2.3963   LearningRate 0.0029   Epoch: 16   Global Step: 688800   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:21,255-Speed 2616.78 samples/sec   Loss 2.2896   LearningRate 0.0029   Epoch: 16   Global Step: 688810   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:25,149-Speed 2630.74 samples/sec   Loss 2.3816   LearningRate 0.0029   Epoch: 16   Global Step: 688820   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:29,050-Speed 2625.91 samples/sec   Loss 2.3802   LearningRate 0.0029   Epoch: 16   Global Step: 688830   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:32,992-Speed 2598.28 samples/sec   Loss 2.3163   LearningRate 0.0029   Epoch: 16   Global Step: 688840   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:36,889-Speed 2628.51 samples/sec   Loss 2.4002   LearningRate 0.0029   Epoch: 16   Global Step: 688850   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:40,902-Speed 2552.32 samples/sec   Loss 2.3600   LearningRate 0.0029   Epoch: 16   Global Step: 688860   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:08:44,851-Speed 2593.38 samples/sec   Loss 2.3142   LearningRate 0.0029   Epoch: 16   Global Step: 688870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:08:48,753-Speed 2625.39 samples/sec   Loss 2.4158   LearningRate 0.0029   Epoch: 16   Global Step: 688880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:08:52,659-Speed 2622.50 samples/sec   Loss 2.3213   LearningRate 0.0029   Epoch: 16   Global Step: 688890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:08:56,559-Speed 2626.17 samples/sec   Loss 2.3532   LearningRate 0.0029   Epoch: 16   Global Step: 688900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:09:00,476-Speed 2615.54 samples/sec   Loss 2.3401   LearningRate 0.0029   Epoch: 16   Global Step: 688910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:09:04,437-Speed 2585.79 samples/sec   Loss 2.3282   LearningRate 0.0029   Epoch: 16   Global Step: 688920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:09:08,362-Speed 2609.48 samples/sec   Loss 2.3498   LearningRate 0.0029   Epoch: 16   Global Step: 688930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:09:12,254-Speed 2631.86 samples/sec   Loss 2.3270   LearningRate 0.0029   Epoch: 16   Global Step: 688940   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:16,155-Speed 2626.10 samples/sec   Loss 2.3430   LearningRate 0.0029   Epoch: 16   Global Step: 688950   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:20,049-Speed 2630.08 samples/sec   Loss 2.4278   LearningRate 0.0029   Epoch: 16   Global Step: 688960   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:23,943-Speed 2631.23 samples/sec   Loss 2.4098   LearningRate 0.0029   Epoch: 16   Global Step: 688970   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:27,841-Speed 2627.13 samples/sec   Loss 2.3927   LearningRate 0.0029   Epoch: 16   Global Step: 688980   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:31,739-Speed 2628.24 samples/sec   Loss 2.3941   LearningRate 0.0029   Epoch: 16   Global Step: 688990   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:35,630-Speed 2632.40 samples/sec   Loss 2.3356   LearningRate 0.0029   Epoch: 16   Global Step: 689000   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:39,526-Speed 2628.73 samples/sec   Loss 2.4032   LearningRate 0.0029   Epoch: 16   Global Step: 689010   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:43,420-Speed 2630.54 samples/sec   Loss 2.3754   LearningRate 0.0029   Epoch: 16   Global Step: 689020   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:47,315-Speed 2630.05 samples/sec   Loss 2.3440   LearningRate 0.0029   Epoch: 16   Global Step: 689030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:09:51,225-Speed 2619.41 samples/sec   Loss 2.3698   LearningRate 0.0029   Epoch: 16   Global Step: 689040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:09:55,149-Speed 2610.17 samples/sec   Loss 2.3334   LearningRate 0.0029   Epoch: 16   Global Step: 689050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:09:59,237-Speed 2505.44 samples/sec   Loss 2.3513   LearningRate 0.0029   Epoch: 16   Global Step: 689060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:03,327-Speed 2505.07 samples/sec   Loss 2.3686   LearningRate 0.0029   Epoch: 16   Global Step: 689070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:07,290-Speed 2584.27 samples/sec   Loss 2.3665   LearningRate 0.0029   Epoch: 16   Global Step: 689080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:11,231-Speed 2598.48 samples/sec   Loss 2.3065   LearningRate 0.0029   Epoch: 16   Global Step: 689090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:15,129-Speed 2628.04 samples/sec   Loss 2.3411   LearningRate 0.0029   Epoch: 16   Global Step: 689100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:19,028-Speed 2627.37 samples/sec   Loss 2.3529   LearningRate 0.0029   Epoch: 16   Global Step: 689110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:22,923-Speed 2629.96 samples/sec   Loss 2.3720   LearningRate 0.0029   Epoch: 16   Global Step: 689120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:26,818-Speed 2629.36 samples/sec   Loss 2.3547   LearningRate 0.0029   Epoch: 16   Global Step: 689130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:30,725-Speed 2621.60 samples/sec   Loss 2.3287   LearningRate 0.0029   Epoch: 16   Global Step: 689140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:10:34,600-Speed 2643.70 samples/sec   Loss 2.4195   LearningRate 0.0029   Epoch: 16   Global Step: 689150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:38,499-Speed 2627.34 samples/sec   Loss 2.3526   LearningRate 0.0029   Epoch: 16   Global Step: 689160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:42,402-Speed 2624.20 samples/sec   Loss 2.3338   LearningRate 0.0029   Epoch: 16   Global Step: 689170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:46,330-Speed 2607.52 samples/sec   Loss 2.3358   LearningRate 0.0029   Epoch: 16   Global Step: 689180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:50,240-Speed 2619.89 samples/sec   Loss 2.3959   LearningRate 0.0029   Epoch: 16   Global Step: 689190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:54,137-Speed 2628.33 samples/sec   Loss 2.3417   LearningRate 0.0029   Epoch: 16   Global Step: 689200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:10:58,053-Speed 2615.78 samples/sec   Loss 2.4583   LearningRate 0.0029   Epoch: 16   Global Step: 689210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:01,946-Speed 2631.17 samples/sec   Loss 2.3608   LearningRate 0.0029   Epoch: 16   Global Step: 689220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:05,839-Speed 2630.63 samples/sec   Loss 2.3974   LearningRate 0.0029   Epoch: 16   Global Step: 689230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:09,731-Speed 2631.82 samples/sec   Loss 2.4276   LearningRate 0.0029   Epoch: 16   Global Step: 689240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:13,636-Speed 2623.43 samples/sec   Loss 2.3285   LearningRate 0.0029   Epoch: 16   Global Step: 689250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:11:17,533-Speed 2627.69 samples/sec   Loss 2.4120   LearningRate 0.0029   Epoch: 16   Global Step: 689260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:11:21,499-Speed 2583.44 samples/sec   Loss 2.4527   LearningRate 0.0029   Epoch: 16   Global Step: 689270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:11:25,409-Speed 2619.35 samples/sec   Loss 2.3056   LearningRate 0.0029   Epoch: 16   Global Step: 689280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:11:29,286-Speed 2642.42 samples/sec   Loss 2.3102   LearningRate 0.0029   Epoch: 16   Global Step: 689290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:33,188-Speed 2624.26 samples/sec   Loss 2.4152   LearningRate 0.0029   Epoch: 16   Global Step: 689300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:37,085-Speed 2628.10 samples/sec   Loss 2.3240   LearningRate 0.0029   Epoch: 16   Global Step: 689310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:40,999-Speed 2616.87 samples/sec   Loss 2.3769   LearningRate 0.0029   Epoch: 16   Global Step: 689320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:44,924-Speed 2609.47 samples/sec   Loss 2.3859   LearningRate 0.0029   Epoch: 16   Global Step: 689330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:48,819-Speed 2629.66 samples/sec   Loss 2.3397   LearningRate 0.0029   Epoch: 16   Global Step: 689340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:52,722-Speed 2624.63 samples/sec   Loss 2.3651   LearningRate 0.0029   Epoch: 16   Global Step: 689350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:11:56,618-Speed 2629.52 samples/sec   Loss 2.3419   LearningRate 0.0029   Epoch: 16   Global Step: 689360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:00,522-Speed 2623.16 samples/sec   Loss 2.4040   LearningRate 0.0029   Epoch: 16   Global Step: 689370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:04,440-Speed 2614.19 samples/sec   Loss 2.2931   LearningRate 0.0029   Epoch: 16   Global Step: 689380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:08,338-Speed 2627.19 samples/sec   Loss 2.3075   LearningRate 0.0029   Epoch: 16   Global Step: 689390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:12:12,212-Speed 2644.42 samples/sec   Loss 2.3385   LearningRate 0.0029   Epoch: 16   Global Step: 689400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:16,111-Speed 2626.73 samples/sec   Loss 2.3301   LearningRate 0.0029   Epoch: 16   Global Step: 689410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:20,007-Speed 2628.97 samples/sec   Loss 2.3678   LearningRate 0.0029   Epoch: 16   Global Step: 689420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:23,901-Speed 2629.95 samples/sec   Loss 2.2724   LearningRate 0.0029   Epoch: 16   Global Step: 689430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:27,807-Speed 2622.72 samples/sec   Loss 2.4317   LearningRate 0.0029   Epoch: 16   Global Step: 689440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:31,700-Speed 2631.35 samples/sec   Loss 2.4131   LearningRate 0.0029   Epoch: 16   Global Step: 689450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:35,594-Speed 2630.35 samples/sec   Loss 2.3707   LearningRate 0.0029   Epoch: 16   Global Step: 689460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:39,488-Speed 2630.12 samples/sec   Loss 2.3210   LearningRate 0.0029   Epoch: 16   Global Step: 689470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:43,382-Speed 2630.41 samples/sec   Loss 2.2700   LearningRate 0.0029   Epoch: 16   Global Step: 689480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:47,277-Speed 2629.45 samples/sec   Loss 2.4114   LearningRate 0.0029   Epoch: 16   Global Step: 689490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:51,148-Speed 2645.75 samples/sec   Loss 2.3302   LearningRate 0.0029   Epoch: 16   Global Step: 689500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:55,040-Speed 2631.99 samples/sec   Loss 2.3361   LearningRate 0.0029   Epoch: 16   Global Step: 689510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:12:58,933-Speed 2631.12 samples/sec   Loss 2.3292   LearningRate 0.0029   Epoch: 16   Global Step: 689520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:02,826-Speed 2631.03 samples/sec   Loss 2.4208   LearningRate 0.0029   Epoch: 16   Global Step: 689530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:06,726-Speed 2626.49 samples/sec   Loss 2.3014   LearningRate 0.0028   Epoch: 16   Global Step: 689540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:10,630-Speed 2623.61 samples/sec   Loss 2.3481   LearningRate 0.0028   Epoch: 16   Global Step: 689550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:14,524-Speed 2630.34 samples/sec   Loss 2.4151   LearningRate 0.0028   Epoch: 16   Global Step: 689560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:18,419-Speed 2630.14 samples/sec   Loss 2.3570   LearningRate 0.0028   Epoch: 16   Global Step: 689570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:22,315-Speed 2629.17 samples/sec   Loss 2.2570   LearningRate 0.0028   Epoch: 16   Global Step: 689580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:26,207-Speed 2631.57 samples/sec   Loss 2.2537   LearningRate 0.0028   Epoch: 16   Global Step: 689590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:30,106-Speed 2627.00 samples/sec   Loss 2.2988   LearningRate 0.0028   Epoch: 16   Global Step: 689600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:13:33,984-Speed 2641.30 samples/sec   Loss 2.3401   LearningRate 0.0028   Epoch: 16   Global Step: 689610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:37,881-Speed 2628.19 samples/sec   Loss 2.3696   LearningRate 0.0028   Epoch: 16   Global Step: 689620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:41,775-Speed 2630.81 samples/sec   Loss 2.3077   LearningRate 0.0028   Epoch: 16   Global Step: 689630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:45,672-Speed 2628.29 samples/sec   Loss 2.3430   LearningRate 0.0028   Epoch: 16   Global Step: 689640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:49,614-Speed 2598.31 samples/sec   Loss 2.3210   LearningRate 0.0028   Epoch: 16   Global Step: 689650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:53,515-Speed 2625.82 samples/sec   Loss 2.3301   LearningRate 0.0028   Epoch: 16   Global Step: 689660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:13:57,415-Speed 2626.29 samples/sec   Loss 2.3426   LearningRate 0.0028   Epoch: 16   Global Step: 689670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:01,313-Speed 2628.00 samples/sec   Loss 2.3352   LearningRate 0.0028   Epoch: 16   Global Step: 689680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:05,215-Speed 2624.66 samples/sec   Loss 2.3945   LearningRate 0.0028   Epoch: 16   Global Step: 689690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:09,120-Speed 2623.20 samples/sec   Loss 2.3716   LearningRate 0.0028   Epoch: 16   Global Step: 689700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:13,035-Speed 2616.36 samples/sec   Loss 2.3520   LearningRate 0.0028   Epoch: 16   Global Step: 689710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:14:16,929-Speed 2630.02 samples/sec   Loss 2.3257   LearningRate 0.0028   Epoch: 16   Global Step: 689720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:14:20,802-Speed 2644.80 samples/sec   Loss 2.3465   LearningRate 0.0028   Epoch: 16   Global Step: 689730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:24,698-Speed 2628.94 samples/sec   Loss 2.2999   LearningRate 0.0028   Epoch: 16   Global Step: 689740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:28,590-Speed 2632.02 samples/sec   Loss 2.3929   LearningRate 0.0028   Epoch: 16   Global Step: 689750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:32,521-Speed 2606.22 samples/sec   Loss 2.3820   LearningRate 0.0028   Epoch: 16   Global Step: 689760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:36,411-Speed 2632.61 samples/sec   Loss 2.2961   LearningRate 0.0028   Epoch: 16   Global Step: 689770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:40,307-Speed 2629.08 samples/sec   Loss 2.3388   LearningRate 0.0028   Epoch: 16   Global Step: 689780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:44,212-Speed 2622.26 samples/sec   Loss 2.3885   LearningRate 0.0028   Epoch: 16   Global Step: 689790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:48,104-Speed 2633.08 samples/sec   Loss 2.3467   LearningRate 0.0028   Epoch: 16   Global Step: 689800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:51,994-Speed 2632.68 samples/sec   Loss 2.3926   LearningRate 0.0028   Epoch: 16   Global Step: 689810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:55,889-Speed 2629.86 samples/sec   Loss 2.3192   LearningRate 0.0028   Epoch: 16   Global Step: 689820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:14:59,781-Speed 2632.23 samples/sec   Loss 2.3584   LearningRate 0.0028   Epoch: 16   Global Step: 689830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:15:03,680-Speed 2627.28 samples/sec   Loss 2.3503   LearningRate 0.0028   Epoch: 16   Global Step: 689840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:15:07,597-Speed 2614.76 samples/sec   Loss 2.3804   LearningRate 0.0028   Epoch: 16   Global Step: 689850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:15:11,502-Speed 2622.86 samples/sec   Loss 2.2833   LearningRate 0.0028   Epoch: 16   Global Step: 689860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:15:15,397-Speed 2629.51 samples/sec   Loss 2.3842   LearningRate 0.0028   Epoch: 16   Global Step: 689870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:15:19,295-Speed 2628.11 samples/sec   Loss 2.3616   LearningRate 0.0028   Epoch: 16   Global Step: 689880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:15:23,187-Speed 2631.83 samples/sec   Loss 2.3339   LearningRate 0.0028   Epoch: 16   Global Step: 689890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:15:27,057-Speed 2645.98 samples/sec   Loss 2.3524   LearningRate 0.0028   Epoch: 16   Global Step: 689900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:30,957-Speed 2627.13 samples/sec   Loss 2.3878   LearningRate 0.0028   Epoch: 16   Global Step: 689910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:34,874-Speed 2614.77 samples/sec   Loss 2.2934   LearningRate 0.0028   Epoch: 16   Global Step: 689920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:38,766-Speed 2631.65 samples/sec   Loss 2.3838   LearningRate 0.0028   Epoch: 16   Global Step: 689930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:42,666-Speed 2625.77 samples/sec   Loss 2.3628   LearningRate 0.0028   Epoch: 16   Global Step: 689940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:46,562-Speed 2629.51 samples/sec   Loss 2.3642   LearningRate 0.0028   Epoch: 16   Global Step: 689950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:50,456-Speed 2629.64 samples/sec   Loss 2.3071   LearningRate 0.0028   Epoch: 16   Global Step: 689960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:54,362-Speed 2622.92 samples/sec   Loss 2.3191   LearningRate 0.0028   Epoch: 16   Global Step: 689970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:15:58,259-Speed 2627.91 samples/sec   Loss 2.3537   LearningRate 0.0028   Epoch: 16   Global Step: 689980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:16:02,153-Speed 2631.07 samples/sec   Loss 2.3053   LearningRate 0.0028   Epoch: 16   Global Step: 689990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:16:06,046-Speed 2630.37 samples/sec   Loss 2.2257   LearningRate 0.0028   Epoch: 16   Global Step: 690000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:16:48,644-[lfw][690000]XNorm: 22.375030
Training: 2022-04-16 01:16:48,645-[lfw][690000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 01:16:48,646-[lfw][690000]Accuracy-Highest: 0.99833
Training: 2022-04-16 01:17:38,569-[cfp_fp][690000]XNorm: 22.002732
Training: 2022-04-16 01:17:38,570-[cfp_fp][690000]Accuracy-Flip: 0.99329+-0.00409
Training: 2022-04-16 01:17:38,571-[cfp_fp][690000]Accuracy-Highest: 0.99329
Training: 2022-04-16 01:18:21,526-[agedb_30][690000]XNorm: 23.116592
Training: 2022-04-16 01:18:21,527-[agedb_30][690000]Accuracy-Flip: 0.98317+-0.00621
Training: 2022-04-16 01:18:21,528-[agedb_30][690000]Accuracy-Highest: 0.98317
Training: 2022-04-16 01:18:25,400-Speed 73.48 samples/sec   Loss 2.4001   LearningRate 0.0028   Epoch: 16   Global Step: 690010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:18:29,253-Speed 2658.37 samples/sec   Loss 2.4000   LearningRate 0.0028   Epoch: 16   Global Step: 690020   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:18:33,129-Speed 2642.80 samples/sec   Loss 2.4222   LearningRate 0.0028   Epoch: 16   Global Step: 690030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:18:37,008-Speed 2640.85 samples/sec   Loss 2.3466   LearningRate 0.0028   Epoch: 16   Global Step: 690040   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:18:40,891-Speed 2637.47 samples/sec   Loss 2.3054   LearningRate 0.0028   Epoch: 16   Global Step: 690050   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:18:44,777-Speed 2636.42 samples/sec   Loss 2.3389   LearningRate 0.0028   Epoch: 16   Global Step: 690060   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:18:48,660-Speed 2638.26 samples/sec   Loss 2.3869   LearningRate 0.0028   Epoch: 16   Global Step: 690070   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:18:52,546-Speed 2635.55 samples/sec   Loss 2.3200   LearningRate 0.0028   Epoch: 16   Global Step: 690080   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:18:56,429-Speed 2638.00 samples/sec   Loss 2.3125   LearningRate 0.0028   Epoch: 16   Global Step: 690090   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:19:00,316-Speed 2635.32 samples/sec   Loss 2.3331   LearningRate 0.0028   Epoch: 16   Global Step: 690100   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:19:04,221-Speed 2622.73 samples/sec   Loss 2.2666   LearningRate 0.0028   Epoch: 16   Global Step: 690110   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:19:08,111-Speed 2633.28 samples/sec   Loss 2.3172   LearningRate 0.0028   Epoch: 16   Global Step: 690120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:12,003-Speed 2631.96 samples/sec   Loss 2.3200   LearningRate 0.0028   Epoch: 16   Global Step: 690130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:15,894-Speed 2631.52 samples/sec   Loss 2.3370   LearningRate 0.0028   Epoch: 16   Global Step: 690140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:19,790-Speed 2629.26 samples/sec   Loss 2.3596   LearningRate 0.0028   Epoch: 16   Global Step: 690150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:23,687-Speed 2628.51 samples/sec   Loss 2.3300   LearningRate 0.0028   Epoch: 16   Global Step: 690160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:27,584-Speed 2628.08 samples/sec   Loss 2.2883   LearningRate 0.0028   Epoch: 16   Global Step: 690170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:31,482-Speed 2628.03 samples/sec   Loss 2.3135   LearningRate 0.0028   Epoch: 16   Global Step: 690180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:35,377-Speed 2629.43 samples/sec   Loss 2.3703   LearningRate 0.0028   Epoch: 16   Global Step: 690190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:39,277-Speed 2626.53 samples/sec   Loss 2.3730   LearningRate 0.0028   Epoch: 16   Global Step: 690200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:19:43,151-Speed 2644.11 samples/sec   Loss 2.3470   LearningRate 0.0028   Epoch: 16   Global Step: 690210   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:19:47,059-Speed 2620.56 samples/sec   Loss 2.3175   LearningRate 0.0028   Epoch: 16   Global Step: 690220   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:19:50,953-Speed 2630.14 samples/sec   Loss 2.2761   LearningRate 0.0028   Epoch: 16   Global Step: 690230   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:19:54,850-Speed 2628.34 samples/sec   Loss 2.2951   LearningRate 0.0028   Epoch: 16   Global Step: 690240   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:19:58,777-Speed 2608.99 samples/sec   Loss 2.3087   LearningRate 0.0028   Epoch: 16   Global Step: 690250   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:02,675-Speed 2627.64 samples/sec   Loss 2.3595   LearningRate 0.0028   Epoch: 16   Global Step: 690260   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:06,575-Speed 2626.73 samples/sec   Loss 2.2903   LearningRate 0.0028   Epoch: 16   Global Step: 690270   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:10,474-Speed 2626.75 samples/sec   Loss 2.3249   LearningRate 0.0028   Epoch: 16   Global Step: 690280   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:14,380-Speed 2622.66 samples/sec   Loss 2.2689   LearningRate 0.0028   Epoch: 16   Global Step: 690290   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:18,279-Speed 2626.38 samples/sec   Loss 2.3605   LearningRate 0.0028   Epoch: 16   Global Step: 690300   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:22,178-Speed 2626.96 samples/sec   Loss 2.3969   LearningRate 0.0028   Epoch: 16   Global Step: 690310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:20:26,075-Speed 2628.16 samples/sec   Loss 2.3425   LearningRate 0.0028   Epoch: 16   Global Step: 690320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:20:29,978-Speed 2624.94 samples/sec   Loss 2.3838   LearningRate 0.0028   Epoch: 16   Global Step: 690330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:20:33,876-Speed 2628.36 samples/sec   Loss 2.3385   LearningRate 0.0028   Epoch: 16   Global Step: 690340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:20:37,773-Speed 2627.83 samples/sec   Loss 2.3469   LearningRate 0.0028   Epoch: 16   Global Step: 690350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:20:41,649-Speed 2642.72 samples/sec   Loss 2.4023   LearningRate 0.0028   Epoch: 16   Global Step: 690360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:45,544-Speed 2629.61 samples/sec   Loss 2.3311   LearningRate 0.0028   Epoch: 16   Global Step: 690370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:49,452-Speed 2621.00 samples/sec   Loss 2.3162   LearningRate 0.0028   Epoch: 16   Global Step: 690380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:53,353-Speed 2626.13 samples/sec   Loss 2.3486   LearningRate 0.0028   Epoch: 16   Global Step: 690390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:20:57,260-Speed 2620.99 samples/sec   Loss 2.3895   LearningRate 0.0028   Epoch: 16   Global Step: 690400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:21:01,162-Speed 2625.72 samples/sec   Loss 2.3692   LearningRate 0.0028   Epoch: 16   Global Step: 690410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:21:05,059-Speed 2627.81 samples/sec   Loss 2.3036   LearningRate 0.0028   Epoch: 16   Global Step: 690420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:21:08,956-Speed 2628.51 samples/sec   Loss 2.3314   LearningRate 0.0028   Epoch: 16   Global Step: 690430   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:21:12,857-Speed 2625.99 samples/sec   Loss 2.2993   LearningRate 0.0028   Epoch: 16   Global Step: 690440   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:21:16,753-Speed 2628.78 samples/sec   Loss 2.3630   LearningRate 0.0028   Epoch: 16   Global Step: 690450   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:21:20,650-Speed 2629.48 samples/sec   Loss 2.3139   LearningRate 0.0028   Epoch: 16   Global Step: 690460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:24,550-Speed 2625.99 samples/sec   Loss 2.3810   LearningRate 0.0028   Epoch: 16   Global Step: 690470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:28,450-Speed 2626.18 samples/sec   Loss 2.2941   LearningRate 0.0028   Epoch: 16   Global Step: 690480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:32,347-Speed 2628.12 samples/sec   Loss 2.3618   LearningRate 0.0028   Epoch: 16   Global Step: 690490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:36,247-Speed 2626.73 samples/sec   Loss 2.3700   LearningRate 0.0028   Epoch: 16   Global Step: 690500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:40,144-Speed 2627.99 samples/sec   Loss 2.2900   LearningRate 0.0028   Epoch: 16   Global Step: 690510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:44,043-Speed 2627.54 samples/sec   Loss 2.3848   LearningRate 0.0028   Epoch: 16   Global Step: 690520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:47,937-Speed 2629.81 samples/sec   Loss 2.3504   LearningRate 0.0028   Epoch: 16   Global Step: 690530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:51,860-Speed 2611.33 samples/sec   Loss 2.3322   LearningRate 0.0028   Epoch: 16   Global Step: 690540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:55,753-Speed 2631.20 samples/sec   Loss 2.4346   LearningRate 0.0028   Epoch: 16   Global Step: 690550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:21:59,647-Speed 2630.43 samples/sec   Loss 2.3741   LearningRate 0.0028   Epoch: 16   Global Step: 690560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:22:03,637-Speed 2566.83 samples/sec   Loss 2.2558   LearningRate 0.0028   Epoch: 16   Global Step: 690570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:22:07,507-Speed 2646.57 samples/sec   Loss 2.3108   LearningRate 0.0028   Epoch: 16   Global Step: 690580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:22:11,404-Speed 2628.35 samples/sec   Loss 2.3570   LearningRate 0.0028   Epoch: 16   Global Step: 690590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:22:15,300-Speed 2629.38 samples/sec   Loss 2.2551   LearningRate 0.0028   Epoch: 16   Global Step: 690600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:22:19,198-Speed 2628.27 samples/sec   Loss 2.2966   LearningRate 0.0028   Epoch: 16   Global Step: 690610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:22:23,112-Speed 2616.62 samples/sec   Loss 2.3380   LearningRate 0.0028   Epoch: 16   Global Step: 690620   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:27,011-Speed 2627.19 samples/sec   Loss 2.3488   LearningRate 0.0028   Epoch: 16   Global Step: 690630   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:30,910-Speed 2626.55 samples/sec   Loss 2.3553   LearningRate 0.0028   Epoch: 16   Global Step: 690640   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:34,823-Speed 2617.46 samples/sec   Loss 2.2731   LearningRate 0.0028   Epoch: 16   Global Step: 690650   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:38,722-Speed 2626.90 samples/sec   Loss 2.3525   LearningRate 0.0028   Epoch: 16   Global Step: 690660   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:42,659-Speed 2601.80 samples/sec   Loss 2.3483   LearningRate 0.0028   Epoch: 16   Global Step: 690670   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:46,561-Speed 2625.02 samples/sec   Loss 2.3204   LearningRate 0.0028   Epoch: 16   Global Step: 690680   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:50,454-Speed 2631.18 samples/sec   Loss 2.3933   LearningRate 0.0028   Epoch: 16   Global Step: 690690   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:54,350-Speed 2628.73 samples/sec   Loss 2.3158   LearningRate 0.0028   Epoch: 16   Global Step: 690700   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:22:58,261-Speed 2619.64 samples/sec   Loss 2.2926   LearningRate 0.0028   Epoch: 16   Global Step: 690710   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:02,158-Speed 2628.58 samples/sec   Loss 2.4076   LearningRate 0.0028   Epoch: 16   Global Step: 690720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:23:06,060-Speed 2624.58 samples/sec   Loss 2.2375   LearningRate 0.0028   Epoch: 16   Global Step: 690730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:23:09,960-Speed 2626.27 samples/sec   Loss 2.2803   LearningRate 0.0028   Epoch: 16   Global Step: 690740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:23:13,833-Speed 2644.96 samples/sec   Loss 2.2652   LearningRate 0.0028   Epoch: 16   Global Step: 690750   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:17,728-Speed 2629.19 samples/sec   Loss 2.3628   LearningRate 0.0028   Epoch: 16   Global Step: 690760   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:21,626-Speed 2628.38 samples/sec   Loss 2.3361   LearningRate 0.0028   Epoch: 16   Global Step: 690770   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:25,542-Speed 2615.20 samples/sec   Loss 2.3129   LearningRate 0.0028   Epoch: 16   Global Step: 690780   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:29,434-Speed 2632.21 samples/sec   Loss 2.3168   LearningRate 0.0028   Epoch: 16   Global Step: 690790   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:33,330-Speed 2628.54 samples/sec   Loss 2.3150   LearningRate 0.0028   Epoch: 16   Global Step: 690800   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:37,225-Speed 2629.31 samples/sec   Loss 2.3169   LearningRate 0.0028   Epoch: 16   Global Step: 690810   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:41,123-Speed 2627.71 samples/sec   Loss 2.3219   LearningRate 0.0028   Epoch: 16   Global Step: 690820   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:45,047-Speed 2610.68 samples/sec   Loss 2.3380   LearningRate 0.0028   Epoch: 16   Global Step: 690830   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:48,938-Speed 2632.82 samples/sec   Loss 2.3153   LearningRate 0.0028   Epoch: 16   Global Step: 690840   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:23:52,828-Speed 2633.24 samples/sec   Loss 2.3797   LearningRate 0.0028   Epoch: 16   Global Step: 690850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:23:56,732-Speed 2623.21 samples/sec   Loss 2.3468   LearningRate 0.0028   Epoch: 16   Global Step: 690860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:00,630-Speed 2628.30 samples/sec   Loss 2.3360   LearningRate 0.0028   Epoch: 16   Global Step: 690870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:04,533-Speed 2623.90 samples/sec   Loss 2.3292   LearningRate 0.0028   Epoch: 16   Global Step: 690880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:08,434-Speed 2625.37 samples/sec   Loss 2.3347   LearningRate 0.0028   Epoch: 16   Global Step: 690890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:12,337-Speed 2624.64 samples/sec   Loss 2.3438   LearningRate 0.0028   Epoch: 16   Global Step: 690900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:16,235-Speed 2627.63 samples/sec   Loss 2.3245   LearningRate 0.0028   Epoch: 16   Global Step: 690910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:20,135-Speed 2626.86 samples/sec   Loss 2.2900   LearningRate 0.0028   Epoch: 16   Global Step: 690920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:24,030-Speed 2629.45 samples/sec   Loss 2.3577   LearningRate 0.0028   Epoch: 16   Global Step: 690930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:27,960-Speed 2606.72 samples/sec   Loss 2.3116   LearningRate 0.0028   Epoch: 16   Global Step: 690940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:31,838-Speed 2641.09 samples/sec   Loss 2.2706   LearningRate 0.0028   Epoch: 16   Global Step: 690950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:35,736-Speed 2627.34 samples/sec   Loss 2.3377   LearningRate 0.0028   Epoch: 16   Global Step: 690960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:39,641-Speed 2622.79 samples/sec   Loss 2.3807   LearningRate 0.0028   Epoch: 16   Global Step: 690970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:43,583-Speed 2598.54 samples/sec   Loss 2.3303   LearningRate 0.0028   Epoch: 16   Global Step: 690980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:47,495-Speed 2618.44 samples/sec   Loss 2.2687   LearningRate 0.0028   Epoch: 16   Global Step: 690990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:51,517-Speed 2547.03 samples/sec   Loss 2.3359   LearningRate 0.0028   Epoch: 16   Global Step: 691000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:55,444-Speed 2607.46 samples/sec   Loss 2.3992   LearningRate 0.0028   Epoch: 16   Global Step: 691010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:24:59,343-Speed 2627.71 samples/sec   Loss 2.3151   LearningRate 0.0028   Epoch: 16   Global Step: 691020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:25:03,218-Speed 2643.54 samples/sec   Loss 2.2971   LearningRate 0.0028   Epoch: 16   Global Step: 691030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:07,128-Speed 2619.12 samples/sec   Loss 2.3313   LearningRate 0.0028   Epoch: 16   Global Step: 691040   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:11,025-Speed 2628.56 samples/sec   Loss 2.3735   LearningRate 0.0028   Epoch: 16   Global Step: 691050   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:14,943-Speed 2614.28 samples/sec   Loss 2.3207   LearningRate 0.0028   Epoch: 16   Global Step: 691060   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:18,852-Speed 2620.03 samples/sec   Loss 2.3796   LearningRate 0.0028   Epoch: 16   Global Step: 691070   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:22,753-Speed 2626.03 samples/sec   Loss 2.3199   LearningRate 0.0028   Epoch: 16   Global Step: 691080   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:26,649-Speed 2629.08 samples/sec   Loss 2.3083   LearningRate 0.0028   Epoch: 16   Global Step: 691090   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:30,568-Speed 2613.74 samples/sec   Loss 2.3383   LearningRate 0.0028   Epoch: 16   Global Step: 691100   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:34,461-Speed 2631.31 samples/sec   Loss 2.3231   LearningRate 0.0028   Epoch: 16   Global Step: 691110   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:38,354-Speed 2630.61 samples/sec   Loss 2.2358   LearningRate 0.0028   Epoch: 16   Global Step: 691120   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:25:42,252-Speed 2627.81 samples/sec   Loss 2.3931   LearningRate 0.0028   Epoch: 16   Global Step: 691130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:25:46,142-Speed 2632.76 samples/sec   Loss 2.3237   LearningRate 0.0028   Epoch: 16   Global Step: 691140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:25:50,069-Speed 2608.75 samples/sec   Loss 2.3169   LearningRate 0.0028   Epoch: 16   Global Step: 691150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:25:53,972-Speed 2623.76 samples/sec   Loss 2.3238   LearningRate 0.0028   Epoch: 16   Global Step: 691160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:25:57,871-Speed 2627.43 samples/sec   Loss 2.2923   LearningRate 0.0028   Epoch: 16   Global Step: 691170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:01,764-Speed 2630.89 samples/sec   Loss 2.2844   LearningRate 0.0028   Epoch: 16   Global Step: 691180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:05,672-Speed 2621.00 samples/sec   Loss 2.3467   LearningRate 0.0028   Epoch: 16   Global Step: 691190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:09,565-Speed 2631.09 samples/sec   Loss 2.2911   LearningRate 0.0028   Epoch: 16   Global Step: 691200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:13,461-Speed 2629.01 samples/sec   Loss 2.2733   LearningRate 0.0028   Epoch: 16   Global Step: 691210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:17,359-Speed 2626.99 samples/sec   Loss 2.3735   LearningRate 0.0028   Epoch: 16   Global Step: 691220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:21,258-Speed 2626.81 samples/sec   Loss 2.3301   LearningRate 0.0028   Epoch: 16   Global Step: 691230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:26:25,205-Speed 2595.72 samples/sec   Loss 2.3179   LearningRate 0.0028   Epoch: 16   Global Step: 691240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:29,111-Speed 2622.05 samples/sec   Loss 2.3202   LearningRate 0.0028   Epoch: 16   Global Step: 691250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:33,008-Speed 2628.13 samples/sec   Loss 2.4280   LearningRate 0.0028   Epoch: 16   Global Step: 691260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:36,945-Speed 2601.73 samples/sec   Loss 2.2794   LearningRate 0.0028   Epoch: 16   Global Step: 691270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:40,846-Speed 2630.25 samples/sec   Loss 2.3149   LearningRate 0.0028   Epoch: 16   Global Step: 691280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:44,743-Speed 2628.51 samples/sec   Loss 2.3378   LearningRate 0.0028   Epoch: 16   Global Step: 691290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:48,676-Speed 2604.51 samples/sec   Loss 2.3527   LearningRate 0.0028   Epoch: 16   Global Step: 691300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:52,577-Speed 2625.46 samples/sec   Loss 2.3319   LearningRate 0.0028   Epoch: 16   Global Step: 691310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:26:56,482-Speed 2622.59 samples/sec   Loss 2.3451   LearningRate 0.0028   Epoch: 16   Global Step: 691320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:27:00,385-Speed 2624.75 samples/sec   Loss 2.3504   LearningRate 0.0028   Epoch: 16   Global Step: 691330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:27:04,288-Speed 2624.52 samples/sec   Loss 2.3708   LearningRate 0.0028   Epoch: 16   Global Step: 691340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-16 01:27:08,236-Speed 2594.13 samples/sec   Loss 2.3370   LearningRate 0.0028   Epoch: 16   Global Step: 691350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:27:12,134-Speed 2629.40 samples/sec   Loss 2.3484   LearningRate 0.0028   Epoch: 16   Global Step: 691360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:27:16,005-Speed 2646.04 samples/sec   Loss 2.3538   LearningRate 0.0028   Epoch: 16   Global Step: 691370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:19,937-Speed 2605.22 samples/sec   Loss 2.2478   LearningRate 0.0028   Epoch: 16   Global Step: 691380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:23,829-Speed 2631.87 samples/sec   Loss 2.2868   LearningRate 0.0028   Epoch: 16   Global Step: 691390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:27,726-Speed 2628.23 samples/sec   Loss 2.3177   LearningRate 0.0028   Epoch: 16   Global Step: 691400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:31,624-Speed 2627.89 samples/sec   Loss 2.2889   LearningRate 0.0028   Epoch: 16   Global Step: 691410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:35,527-Speed 2623.64 samples/sec   Loss 2.3252   LearningRate 0.0028   Epoch: 16   Global Step: 691420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:39,427-Speed 2626.21 samples/sec   Loss 2.3230   LearningRate 0.0028   Epoch: 16   Global Step: 691430   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:43,323-Speed 2629.59 samples/sec   Loss 2.3422   LearningRate 0.0028   Epoch: 16   Global Step: 691440   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:47,218-Speed 2629.62 samples/sec   Loss 2.2603   LearningRate 0.0028   Epoch: 16   Global Step: 691450   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:51,117-Speed 2627.07 samples/sec   Loss 2.2940   LearningRate 0.0028   Epoch: 16   Global Step: 691460   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:27:55,015-Speed 2627.54 samples/sec   Loss 2.3228   LearningRate 0.0028   Epoch: 16   Global Step: 691470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:27:58,911-Speed 2629.49 samples/sec   Loss 2.2875   LearningRate 0.0028   Epoch: 16   Global Step: 691480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:28:02,806-Speed 2629.31 samples/sec   Loss 2.3555   LearningRate 0.0028   Epoch: 16   Global Step: 691490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:28:06,702-Speed 2628.90 samples/sec   Loss 2.3121   LearningRate 0.0028   Epoch: 16   Global Step: 691500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:28:10,600-Speed 2627.78 samples/sec   Loss 2.2105   LearningRate 0.0028   Epoch: 16   Global Step: 691510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:28:14,473-Speed 2644.75 samples/sec   Loss 2.3397   LearningRate 0.0028   Epoch: 16   Global Step: 691520   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:18,372-Speed 2626.97 samples/sec   Loss 2.3197   LearningRate 0.0028   Epoch: 16   Global Step: 691530   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:22,278-Speed 2622.31 samples/sec   Loss 2.2811   LearningRate 0.0028   Epoch: 16   Global Step: 691540   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:26,184-Speed 2621.97 samples/sec   Loss 2.2744   LearningRate 0.0028   Epoch: 16   Global Step: 691550   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:30,183-Speed 2561.76 samples/sec   Loss 2.3566   LearningRate 0.0028   Epoch: 16   Global Step: 691560   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:34,140-Speed 2588.52 samples/sec   Loss 2.3160   LearningRate 0.0028   Epoch: 16   Global Step: 691570   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:38,042-Speed 2624.71 samples/sec   Loss 2.3060   LearningRate 0.0028   Epoch: 16   Global Step: 691580   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:41,985-Speed 2597.31 samples/sec   Loss 2.4069   LearningRate 0.0028   Epoch: 16   Global Step: 691590   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:45,896-Speed 2619.95 samples/sec   Loss 2.3054   LearningRate 0.0028   Epoch: 16   Global Step: 691600   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:49,798-Speed 2624.87 samples/sec   Loss 2.3100   LearningRate 0.0028   Epoch: 16   Global Step: 691610   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-16 01:28:53,699-Speed 2625.61 samples/sec   Loss 2.2475   LearningRate 0.0028   Epoch: 16   Global Step: 691620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:28:57,618-Speed 2613.80 samples/sec   Loss 2.3097   LearningRate 0.0028   Epoch: 16   Global Step: 691630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:29:01,515-Speed 2627.62 samples/sec   Loss 2.3707   LearningRate 0.0028   Epoch: 16   Global Step: 691640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:29:05,411-Speed 2629.64 samples/sec   Loss 2.3013   LearningRate 0.0028   Epoch: 16   Global Step: 691650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:29:09,306-Speed 2629.20 samples/sec   Loss 2.2448   LearningRate 0.0028   Epoch: 16   Global Step: 691660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:29:13,249-Speed 2598.04 samples/sec   Loss 2.3189   LearningRate 0.0028   Epoch: 16   Global Step: 691670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:29:17,148-Speed 2626.99 samples/sec   Loss 2.3233   LearningRate 0.0028   Epoch: 16   Global Step: 691680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:29:21,059-Speed 2619.02 samples/sec   Loss 2.3914   LearningRate 0.0028   Epoch: 16   Global Step: 691690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-16 01:29:24,956-Speed 2628.56 samples/sec   Loss 2.3265   LearningRate 0.0028   Epoch: 16   Global Step: 691700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:29:28,855-Speed 2631.47 samples/sec   Loss 2.3411   LearningRate 0.0028   Epoch: 16   Global Step: 691710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:29:32,754-Speed 2626.99 samples/sec   Loss 2.3468   LearningRate 0.0028   Epoch: 16   Global Step: 691720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:29:36,646-Speed 2631.68 samples/sec   Loss 2.3023   LearningRate 0.0028   Epoch: 16   Global Step: 691730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:29:40,627-Speed 2572.89 samples/sec   Loss 2.2616   LearningRate 0.0028   Epoch: 16   Global Step: 691740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:29:44,520-Speed 2631.41 samples/sec   Loss 2.3044   LearningRate 0.0028   Epoch: 16   Global Step: 691750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:29:48,419-Speed 2627.00 samples/sec   Loss 2.3449   LearningRate 0.0028   Epoch: 16   Global Step: 691760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:29:52,323-Speed 2623.55 samples/sec   Loss 2.4242   LearningRate 0.0028   Epoch: 16   Global Step: 691770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:29:56,242-Speed 2613.77 samples/sec   Loss 2.2866   LearningRate 0.0028   Epoch: 16   Global Step: 691780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:00,143-Speed 2626.67 samples/sec   Loss 2.3048   LearningRate 0.0028   Epoch: 16   Global Step: 691790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:04,050-Speed 2621.56 samples/sec   Loss 2.2759   LearningRate 0.0028   Epoch: 16   Global Step: 691800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:07,954-Speed 2622.84 samples/sec   Loss 2.3341   LearningRate 0.0028   Epoch: 16   Global Step: 691810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:11,849-Speed 2629.65 samples/sec   Loss 2.3431   LearningRate 0.0028   Epoch: 16   Global Step: 691820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:15,746-Speed 2628.86 samples/sec   Loss 2.2298   LearningRate 0.0028   Epoch: 16   Global Step: 691830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:19,643-Speed 2628.62 samples/sec   Loss 2.2824   LearningRate 0.0028   Epoch: 16   Global Step: 691840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:30:23,515-Speed 2645.52 samples/sec   Loss 2.2695   LearningRate 0.0028   Epoch: 16   Global Step: 691850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:27,418-Speed 2624.95 samples/sec   Loss 2.3450   LearningRate 0.0028   Epoch: 16   Global Step: 691860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:31,325-Speed 2620.99 samples/sec   Loss 2.3249   LearningRate 0.0028   Epoch: 16   Global Step: 691870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:35,221-Speed 2629.43 samples/sec   Loss 2.2901   LearningRate 0.0028   Epoch: 16   Global Step: 691880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:39,116-Speed 2629.69 samples/sec   Loss 2.3234   LearningRate 0.0028   Epoch: 16   Global Step: 691890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:43,014-Speed 2627.61 samples/sec   Loss 2.2359   LearningRate 0.0028   Epoch: 16   Global Step: 691900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:46,949-Speed 2602.97 samples/sec   Loss 2.2716   LearningRate 0.0028   Epoch: 16   Global Step: 691910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:50,848-Speed 2627.25 samples/sec   Loss 2.2640   LearningRate 0.0028   Epoch: 16   Global Step: 691920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:54,754-Speed 2622.48 samples/sec   Loss 2.3618   LearningRate 0.0028   Epoch: 16   Global Step: 691930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:30:58,694-Speed 2599.19 samples/sec   Loss 2.3627   LearningRate 0.0028   Epoch: 16   Global Step: 691940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:02,600-Speed 2623.24 samples/sec   Loss 2.3283   LearningRate 0.0028   Epoch: 16   Global Step: 691950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:31:06,496-Speed 2628.86 samples/sec   Loss 2.3492   LearningRate 0.0028   Epoch: 16   Global Step: 691960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:31:10,400-Speed 2623.36 samples/sec   Loss 2.3200   LearningRate 0.0028   Epoch: 16   Global Step: 691970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:31:14,313-Speed 2616.99 samples/sec   Loss 2.2841   LearningRate 0.0028   Epoch: 16   Global Step: 691980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:18,234-Speed 2612.94 samples/sec   Loss 2.3006   LearningRate 0.0028   Epoch: 16   Global Step: 691990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:22,151-Speed 2614.63 samples/sec   Loss 2.3055   LearningRate 0.0028   Epoch: 16   Global Step: 692000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:26,060-Speed 2620.15 samples/sec   Loss 2.2570   LearningRate 0.0027   Epoch: 16   Global Step: 692010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:29,965-Speed 2622.83 samples/sec   Loss 2.3275   LearningRate 0.0027   Epoch: 16   Global Step: 692020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:33,981-Speed 2555.77 samples/sec   Loss 2.2899   LearningRate 0.0027   Epoch: 16   Global Step: 692030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:37,879-Speed 2627.80 samples/sec   Loss 2.3265   LearningRate 0.0027   Epoch: 16   Global Step: 692040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:41,804-Speed 2609.40 samples/sec   Loss 2.2966   LearningRate 0.0027   Epoch: 16   Global Step: 692050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:45,708-Speed 2624.04 samples/sec   Loss 2.3098   LearningRate 0.0027   Epoch: 16   Global Step: 692060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:49,609-Speed 2625.59 samples/sec   Loss 2.3791   LearningRate 0.0027   Epoch: 16   Global Step: 692070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:53,501-Speed 2631.96 samples/sec   Loss 2.2674   LearningRate 0.0027   Epoch: 16   Global Step: 692080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:31:57,398-Speed 2628.52 samples/sec   Loss 2.3257   LearningRate 0.0027   Epoch: 16   Global Step: 692090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:32:01,325-Speed 2608.11 samples/sec   Loss 2.2800   LearningRate 0.0027   Epoch: 16   Global Step: 692100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:32:05,231-Speed 2622.71 samples/sec   Loss 2.2103   LearningRate 0.0027   Epoch: 16   Global Step: 692110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:32:09,128-Speed 2627.85 samples/sec   Loss 2.3259   LearningRate 0.0027   Epoch: 16   Global Step: 692120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:32:13,036-Speed 2620.67 samples/sec   Loss 2.3223   LearningRate 0.0027   Epoch: 16   Global Step: 692130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:32:16,930-Speed 2630.60 samples/sec   Loss 2.3127   LearningRate 0.0027   Epoch: 16   Global Step: 692140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:32:20,830-Speed 2626.41 samples/sec   Loss 2.3221   LearningRate 0.0027   Epoch: 16   Global Step: 692150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:32:24,720-Speed 2632.94 samples/sec   Loss 2.3404   LearningRate 0.0027   Epoch: 16   Global Step: 692160   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:28,618-Speed 2627.40 samples/sec   Loss 2.3224   LearningRate 0.0027   Epoch: 16   Global Step: 692170   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:32,525-Speed 2621.41 samples/sec   Loss 2.3481   LearningRate 0.0027   Epoch: 16   Global Step: 692180   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:36,444-Speed 2613.98 samples/sec   Loss 2.3113   LearningRate 0.0027   Epoch: 16   Global Step: 692190   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:40,364-Speed 2613.35 samples/sec   Loss 2.2263   LearningRate 0.0027   Epoch: 16   Global Step: 692200   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:44,257-Speed 2630.39 samples/sec   Loss 2.3085   LearningRate 0.0027   Epoch: 16   Global Step: 692210   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:48,144-Speed 2635.17 samples/sec   Loss 2.2598   LearningRate 0.0027   Epoch: 16   Global Step: 692220   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:52,037-Speed 2631.32 samples/sec   Loss 2.3321   LearningRate 0.0027   Epoch: 16   Global Step: 692230   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:55,933-Speed 2629.43 samples/sec   Loss 2.3005   LearningRate 0.0027   Epoch: 16   Global Step: 692240   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:32:59,835-Speed 2624.88 samples/sec   Loss 2.2807   LearningRate 0.0027   Epoch: 16   Global Step: 692250   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:33:03,729-Speed 2630.31 samples/sec   Loss 2.3369   LearningRate 0.0027   Epoch: 16   Global Step: 692260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:07,634-Speed 2622.36 samples/sec   Loss 2.3036   LearningRate 0.0027   Epoch: 16   Global Step: 692270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:11,531-Speed 2629.17 samples/sec   Loss 2.3025   LearningRate 0.0027   Epoch: 16   Global Step: 692280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:15,433-Speed 2624.80 samples/sec   Loss 2.2694   LearningRate 0.0027   Epoch: 16   Global Step: 692290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:19,347-Speed 2617.18 samples/sec   Loss 2.2498   LearningRate 0.0027   Epoch: 16   Global Step: 692300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:23,243-Speed 2629.02 samples/sec   Loss 2.2002   LearningRate 0.0027   Epoch: 16   Global Step: 692310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:27,139-Speed 2629.12 samples/sec   Loss 2.2243   LearningRate 0.0027   Epoch: 16   Global Step: 692320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:31,030-Speed 2632.68 samples/sec   Loss 2.3085   LearningRate 0.0027   Epoch: 16   Global Step: 692330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:34,924-Speed 2630.27 samples/sec   Loss 2.3260   LearningRate 0.0027   Epoch: 16   Global Step: 692340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:38,847-Speed 2610.60 samples/sec   Loss 2.3375   LearningRate 0.0027   Epoch: 16   Global Step: 692350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:42,745-Speed 2627.63 samples/sec   Loss 2.3336   LearningRate 0.0027   Epoch: 16   Global Step: 692360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:33:46,615-Speed 2647.33 samples/sec   Loss 2.2973   LearningRate 0.0027   Epoch: 16   Global Step: 692370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:50,508-Speed 2631.20 samples/sec   Loss 2.3350   LearningRate 0.0027   Epoch: 16   Global Step: 692380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:54,404-Speed 2628.89 samples/sec   Loss 2.2512   LearningRate 0.0027   Epoch: 16   Global Step: 692390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:33:58,298-Speed 2630.74 samples/sec   Loss 2.3564   LearningRate 0.0027   Epoch: 16   Global Step: 692400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:34:02,164-Speed 2649.12 samples/sec   Loss 2.3747   LearningRate 0.0027   Epoch: 16   Global Step: 692410   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:06,074-Speed 2619.31 samples/sec   Loss 2.2952   LearningRate 0.0027   Epoch: 16   Global Step: 692420   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:09,972-Speed 2627.66 samples/sec   Loss 2.3522   LearningRate 0.0027   Epoch: 16   Global Step: 692430   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:13,868-Speed 2629.82 samples/sec   Loss 2.2864   LearningRate 0.0027   Epoch: 16   Global Step: 692440   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:17,774-Speed 2621.81 samples/sec   Loss 2.2503   LearningRate 0.0027   Epoch: 16   Global Step: 692450   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:21,674-Speed 2626.34 samples/sec   Loss 2.3729   LearningRate 0.0027   Epoch: 16   Global Step: 692460   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:25,582-Speed 2621.01 samples/sec   Loss 2.2862   LearningRate 0.0027   Epoch: 16   Global Step: 692470   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:29,478-Speed 2629.48 samples/sec   Loss 2.2753   LearningRate 0.0027   Epoch: 16   Global Step: 692480   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:33,378-Speed 2625.98 samples/sec   Loss 2.2908   LearningRate 0.0027   Epoch: 16   Global Step: 692490   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:37,272-Speed 2630.57 samples/sec   Loss 2.2903   LearningRate 0.0027   Epoch: 16   Global Step: 692500   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:41,162-Speed 2632.35 samples/sec   Loss 2.3683   LearningRate 0.0027   Epoch: 16   Global Step: 692510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:34:45,058-Speed 2629.42 samples/sec   Loss 2.2486   LearningRate 0.0027   Epoch: 16   Global Step: 692520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:34:48,951-Speed 2630.60 samples/sec   Loss 2.3921   LearningRate 0.0027   Epoch: 16   Global Step: 692530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:34:52,842-Speed 2633.09 samples/sec   Loss 2.2823   LearningRate 0.0027   Epoch: 16   Global Step: 692540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:34:56,733-Speed 2632.59 samples/sec   Loss 2.2070   LearningRate 0.0027   Epoch: 16   Global Step: 692550   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:00,626-Speed 2631.16 samples/sec   Loss 2.2996   LearningRate 0.0027   Epoch: 16   Global Step: 692560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:04,523-Speed 2628.33 samples/sec   Loss 2.2799   LearningRate 0.0027   Epoch: 16   Global Step: 692570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:08,417-Speed 2630.33 samples/sec   Loss 2.2899   LearningRate 0.0027   Epoch: 16   Global Step: 692580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:12,312-Speed 2629.30 samples/sec   Loss 2.3117   LearningRate 0.0027   Epoch: 16   Global Step: 692590   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:16,216-Speed 2624.18 samples/sec   Loss 2.2784   LearningRate 0.0027   Epoch: 16   Global Step: 692600   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:20,131-Speed 2616.50 samples/sec   Loss 2.3514   LearningRate 0.0027   Epoch: 16   Global Step: 692610   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:24,036-Speed 2622.93 samples/sec   Loss 2.3089   LearningRate 0.0027   Epoch: 16   Global Step: 692620   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:27,938-Speed 2625.71 samples/sec   Loss 2.3012   LearningRate 0.0027   Epoch: 16   Global Step: 692630   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:35:31,835-Speed 2628.18 samples/sec   Loss 2.2893   LearningRate 0.0027   Epoch: 16   Global Step: 692640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:35:35,730-Speed 2628.95 samples/sec   Loss 2.2423   LearningRate 0.0027   Epoch: 16   Global Step: 692650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:35:39,627-Speed 2628.30 samples/sec   Loss 2.3186   LearningRate 0.0027   Epoch: 16   Global Step: 692660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:35:43,529-Speed 2625.91 samples/sec   Loss 2.2520   LearningRate 0.0027   Epoch: 16   Global Step: 692670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:35:47,425-Speed 2628.89 samples/sec   Loss 2.3020   LearningRate 0.0027   Epoch: 16   Global Step: 692680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:35:51,325-Speed 2626.15 samples/sec   Loss 2.3757   LearningRate 0.0027   Epoch: 16   Global Step: 692690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:35:55,219-Speed 2630.38 samples/sec   Loss 2.3058   LearningRate 0.0027   Epoch: 16   Global Step: 692700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:35:59,135-Speed 2615.90 samples/sec   Loss 2.2958   LearningRate 0.0027   Epoch: 16   Global Step: 692710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:03,039-Speed 2623.74 samples/sec   Loss 2.2892   LearningRate 0.0027   Epoch: 16   Global Step: 692720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:06,931-Speed 2632.03 samples/sec   Loss 2.2956   LearningRate 0.0027   Epoch: 16   Global Step: 692730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:10,821-Speed 2632.57 samples/sec   Loss 2.3256   LearningRate 0.0027   Epoch: 16   Global Step: 692740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:36:14,714-Speed 2631.14 samples/sec   Loss 2.3011   LearningRate 0.0027   Epoch: 16   Global Step: 692750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:36:18,589-Speed 2643.26 samples/sec   Loss 2.3143   LearningRate 0.0027   Epoch: 16   Global Step: 692760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:22,500-Speed 2618.84 samples/sec   Loss 2.2668   LearningRate 0.0027   Epoch: 16   Global Step: 692770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:26,439-Speed 2601.41 samples/sec   Loss 2.2780   LearningRate 0.0027   Epoch: 16   Global Step: 692780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:30,336-Speed 2628.45 samples/sec   Loss 2.2794   LearningRate 0.0027   Epoch: 16   Global Step: 692790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:34,232-Speed 2629.38 samples/sec   Loss 2.2474   LearningRate 0.0027   Epoch: 16   Global Step: 692800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:38,151-Speed 2613.65 samples/sec   Loss 2.2873   LearningRate 0.0027   Epoch: 16   Global Step: 692810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:42,055-Speed 2623.96 samples/sec   Loss 2.3102   LearningRate 0.0027   Epoch: 16   Global Step: 692820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:45,955-Speed 2625.86 samples/sec   Loss 2.3731   LearningRate 0.0027   Epoch: 16   Global Step: 692830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:49,851-Speed 2629.35 samples/sec   Loss 2.3145   LearningRate 0.0027   Epoch: 16   Global Step: 692840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:53,745-Speed 2630.22 samples/sec   Loss 2.1915   LearningRate 0.0027   Epoch: 16   Global Step: 692850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:36:57,644-Speed 2627.50 samples/sec   Loss 2.3086   LearningRate 0.0027   Epoch: 16   Global Step: 692860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:37:01,538-Speed 2630.31 samples/sec   Loss 2.2621   LearningRate 0.0027   Epoch: 16   Global Step: 692870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:37:05,411-Speed 2645.26 samples/sec   Loss 2.3583   LearningRate 0.0027   Epoch: 16   Global Step: 692880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:09,311-Speed 2625.79 samples/sec   Loss 2.2789   LearningRate 0.0027   Epoch: 16   Global Step: 692890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:13,211-Speed 2626.62 samples/sec   Loss 2.3399   LearningRate 0.0027   Epoch: 16   Global Step: 692900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:17,102-Speed 2631.98 samples/sec   Loss 2.2341   LearningRate 0.0027   Epoch: 16   Global Step: 692910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:21,005-Speed 2624.44 samples/sec   Loss 2.2777   LearningRate 0.0027   Epoch: 16   Global Step: 692920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:24,904-Speed 2627.24 samples/sec   Loss 2.2548   LearningRate 0.0027   Epoch: 16   Global Step: 692930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:28,800-Speed 2629.23 samples/sec   Loss 2.2848   LearningRate 0.0027   Epoch: 16   Global Step: 692940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:32,697-Speed 2628.06 samples/sec   Loss 2.2475   LearningRate 0.0027   Epoch: 16   Global Step: 692950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:36,591-Speed 2630.57 samples/sec   Loss 2.3100   LearningRate 0.0027   Epoch: 16   Global Step: 692960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:40,510-Speed 2613.64 samples/sec   Loss 2.3132   LearningRate 0.0027   Epoch: 16   Global Step: 692970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:44,391-Speed 2639.16 samples/sec   Loss 2.3604   LearningRate 0.0027   Epoch: 16   Global Step: 692980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:48,299-Speed 2620.69 samples/sec   Loss 2.2633   LearningRate 0.0027   Epoch: 16   Global Step: 692990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:52,193-Speed 2631.00 samples/sec   Loss 2.3002   LearningRate 0.0027   Epoch: 16   Global Step: 693000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:56,087-Speed 2631.19 samples/sec   Loss 2.2954   LearningRate 0.0027   Epoch: 16   Global Step: 693010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:37:59,979-Speed 2631.65 samples/sec   Loss 2.2911   LearningRate 0.0027   Epoch: 16   Global Step: 693020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:03,885-Speed 2622.22 samples/sec   Loss 2.3266   LearningRate 0.0027   Epoch: 16   Global Step: 693030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:07,783-Speed 2627.63 samples/sec   Loss 2.2933   LearningRate 0.0027   Epoch: 16   Global Step: 693040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:11,680-Speed 2628.07 samples/sec   Loss 2.2719   LearningRate 0.0027   Epoch: 16   Global Step: 693050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:15,610-Speed 2606.28 samples/sec   Loss 2.3318   LearningRate 0.0027   Epoch: 16   Global Step: 693060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:19,512-Speed 2625.77 samples/sec   Loss 2.2805   LearningRate 0.0027   Epoch: 16   Global Step: 693070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:23,499-Speed 2568.62 samples/sec   Loss 2.2824   LearningRate 0.0027   Epoch: 16   Global Step: 693080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:38:27,519-Speed 2548.02 samples/sec   Loss 2.3593   LearningRate 0.0027   Epoch: 16   Global Step: 693090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:38:31,413-Speed 2630.85 samples/sec   Loss 2.2400   LearningRate 0.0027   Epoch: 16   Global Step: 693100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:38:35,294-Speed 2639.22 samples/sec   Loss 2.2955   LearningRate 0.0027   Epoch: 16   Global Step: 693110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:39,191-Speed 2628.21 samples/sec   Loss 2.2632   LearningRate 0.0027   Epoch: 16   Global Step: 693120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:43,087-Speed 2629.09 samples/sec   Loss 2.3321   LearningRate 0.0027   Epoch: 16   Global Step: 693130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:46,992-Speed 2623.13 samples/sec   Loss 2.2769   LearningRate 0.0027   Epoch: 16   Global Step: 693140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:50,892-Speed 2626.08 samples/sec   Loss 2.3189   LearningRate 0.0027   Epoch: 16   Global Step: 693150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:54,799-Speed 2621.73 samples/sec   Loss 2.3475   LearningRate 0.0027   Epoch: 16   Global Step: 693160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:38:58,703-Speed 2623.50 samples/sec   Loss 2.2090   LearningRate 0.0027   Epoch: 16   Global Step: 693170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:02,602-Speed 2627.28 samples/sec   Loss 2.2301   LearningRate 0.0027   Epoch: 16   Global Step: 693180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:06,498-Speed 2628.86 samples/sec   Loss 2.2755   LearningRate 0.0027   Epoch: 16   Global Step: 693190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:10,419-Speed 2612.00 samples/sec   Loss 2.2502   LearningRate 0.0027   Epoch: 16   Global Step: 693200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:14,316-Speed 2628.57 samples/sec   Loss 2.2280   LearningRate 0.0027   Epoch: 16   Global Step: 693210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:39:18,214-Speed 2627.19 samples/sec   Loss 2.3101   LearningRate 0.0027   Epoch: 16   Global Step: 693220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:39:22,117-Speed 2624.84 samples/sec   Loss 2.3306   LearningRate 0.0027   Epoch: 16   Global Step: 693230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:39:26,017-Speed 2626.27 samples/sec   Loss 2.2620   LearningRate 0.0027   Epoch: 16   Global Step: 693240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:39:29,896-Speed 2640.77 samples/sec   Loss 2.2468   LearningRate 0.0027   Epoch: 16   Global Step: 693250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:33,796-Speed 2626.27 samples/sec   Loss 2.2789   LearningRate 0.0027   Epoch: 16   Global Step: 693260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:37,691-Speed 2629.72 samples/sec   Loss 2.2863   LearningRate 0.0027   Epoch: 16   Global Step: 693270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:41,585-Speed 2629.82 samples/sec   Loss 2.3322   LearningRate 0.0027   Epoch: 16   Global Step: 693280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:45,478-Speed 2631.80 samples/sec   Loss 2.2847   LearningRate 0.0027   Epoch: 16   Global Step: 693290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:49,395-Speed 2614.19 samples/sec   Loss 2.2209   LearningRate 0.0027   Epoch: 16   Global Step: 693300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:53,294-Speed 2627.46 samples/sec   Loss 2.2926   LearningRate 0.0027   Epoch: 16   Global Step: 693310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:39:57,204-Speed 2619.18 samples/sec   Loss 2.3059   LearningRate 0.0027   Epoch: 16   Global Step: 693320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:01,176-Speed 2579.54 samples/sec   Loss 2.2618   LearningRate 0.0027   Epoch: 16   Global Step: 693330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:05,078-Speed 2624.96 samples/sec   Loss 2.2995   LearningRate 0.0027   Epoch: 16   Global Step: 693340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:09,011-Speed 2604.05 samples/sec   Loss 2.2516   LearningRate 0.0027   Epoch: 16   Global Step: 693350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:40:12,903-Speed 2631.53 samples/sec   Loss 2.3100   LearningRate 0.0027   Epoch: 16   Global Step: 693360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:40:16,797-Speed 2630.63 samples/sec   Loss 2.3288   LearningRate 0.0027   Epoch: 16   Global Step: 693370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:40:20,693-Speed 2629.14 samples/sec   Loss 2.3005   LearningRate 0.0027   Epoch: 16   Global Step: 693380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:40:24,683-Speed 2567.39 samples/sec   Loss 2.2923   LearningRate 0.0027   Epoch: 16   Global Step: 693390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:28,582-Speed 2626.91 samples/sec   Loss 2.2988   LearningRate 0.0027   Epoch: 16   Global Step: 693400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:32,476-Speed 2630.32 samples/sec   Loss 2.3555   LearningRate 0.0027   Epoch: 16   Global Step: 693410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:36,368-Speed 2631.48 samples/sec   Loss 2.2876   LearningRate 0.0027   Epoch: 16   Global Step: 693420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:40,263-Speed 2629.42 samples/sec   Loss 2.2711   LearningRate 0.0027   Epoch: 16   Global Step: 693430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:44,158-Speed 2630.07 samples/sec   Loss 2.2759   LearningRate 0.0027   Epoch: 16   Global Step: 693440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:48,057-Speed 2626.90 samples/sec   Loss 2.3061   LearningRate 0.0027   Epoch: 16   Global Step: 693450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:51,959-Speed 2625.20 samples/sec   Loss 2.2992   LearningRate 0.0027   Epoch: 16   Global Step: 693460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:55,857-Speed 2627.54 samples/sec   Loss 2.3056   LearningRate 0.0027   Epoch: 16   Global Step: 693470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:40:59,753-Speed 2629.36 samples/sec   Loss 2.2834   LearningRate 0.0027   Epoch: 16   Global Step: 693480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:03,675-Speed 2611.53 samples/sec   Loss 2.3114   LearningRate 0.0027   Epoch: 16   Global Step: 693490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:41:07,573-Speed 2627.89 samples/sec   Loss 2.3463   LearningRate 0.0027   Epoch: 16   Global Step: 693500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:41:11,466-Speed 2630.71 samples/sec   Loss 2.2664   LearningRate 0.0027   Epoch: 16   Global Step: 693510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:41:15,340-Speed 2644.45 samples/sec   Loss 2.2360   LearningRate 0.0027   Epoch: 16   Global Step: 693520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:19,250-Speed 2619.12 samples/sec   Loss 2.3444   LearningRate 0.0027   Epoch: 16   Global Step: 693530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:23,308-Speed 2524.35 samples/sec   Loss 2.2894   LearningRate 0.0027   Epoch: 16   Global Step: 693540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:27,204-Speed 2629.00 samples/sec   Loss 2.2016   LearningRate 0.0027   Epoch: 16   Global Step: 693550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:31,105-Speed 2625.46 samples/sec   Loss 2.3564   LearningRate 0.0027   Epoch: 16   Global Step: 693560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:35,013-Speed 2621.16 samples/sec   Loss 2.3350   LearningRate 0.0027   Epoch: 16   Global Step: 693570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:38,911-Speed 2627.09 samples/sec   Loss 2.2895   LearningRate 0.0027   Epoch: 16   Global Step: 693580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:42,808-Speed 2628.05 samples/sec   Loss 2.2284   LearningRate 0.0027   Epoch: 16   Global Step: 693590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:41:46,676-Speed 2648.52 samples/sec   Loss 2.2834   LearningRate 0.0027   Epoch: 16   Global Step: 693600   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:41:50,580-Speed 2623.42 samples/sec   Loss 2.2560   LearningRate 0.0027   Epoch: 16   Global Step: 693610   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:41:54,494-Speed 2616.97 samples/sec   Loss 2.3368   LearningRate 0.0027   Epoch: 16   Global Step: 693620   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:41:58,390-Speed 2628.66 samples/sec   Loss 2.3178   LearningRate 0.0027   Epoch: 16   Global Step: 693630   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:42:02,284-Speed 2630.64 samples/sec   Loss 2.2961   LearningRate 0.0027   Epoch: 16   Global Step: 693640   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:42:06,191-Speed 2621.62 samples/sec   Loss 2.3020   LearningRate 0.0027   Epoch: 16   Global Step: 693650   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:42:10,134-Speed 2597.76 samples/sec   Loss 2.1962   LearningRate 0.0027   Epoch: 16   Global Step: 693660   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:42:14,073-Speed 2599.98 samples/sec   Loss 2.2884   LearningRate 0.0027   Epoch: 16   Global Step: 693670   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:42:17,945-Speed 2645.01 samples/sec   Loss 2.2781   LearningRate 0.0027   Epoch: 16   Global Step: 693680   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:21,864-Speed 2613.40 samples/sec   Loss 2.3276   LearningRate 0.0027   Epoch: 16   Global Step: 693690   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:25,761-Speed 2628.61 samples/sec   Loss 2.2810   LearningRate 0.0027   Epoch: 16   Global Step: 693700   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:29,724-Speed 2584.27 samples/sec   Loss 2.3102   LearningRate 0.0027   Epoch: 16   Global Step: 693710   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:33,645-Speed 2612.62 samples/sec   Loss 2.3117   LearningRate 0.0027   Epoch: 16   Global Step: 693720   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:37,549-Speed 2623.55 samples/sec   Loss 2.2247   LearningRate 0.0027   Epoch: 16   Global Step: 693730   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:41,463-Speed 2616.83 samples/sec   Loss 2.2443   LearningRate 0.0027   Epoch: 16   Global Step: 693740   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:45,364-Speed 2625.59 samples/sec   Loss 2.3202   LearningRate 0.0027   Epoch: 16   Global Step: 693750   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:49,258-Speed 2630.52 samples/sec   Loss 2.3019   LearningRate 0.0027   Epoch: 16   Global Step: 693760   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:53,154-Speed 2628.91 samples/sec   Loss 2.2948   LearningRate 0.0027   Epoch: 16   Global Step: 693770   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 01:42:57,092-Speed 2601.06 samples/sec   Loss 2.2904   LearningRate 0.0027   Epoch: 16   Global Step: 693780   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:00,985-Speed 2630.48 samples/sec   Loss 2.2573   LearningRate 0.0027   Epoch: 16   Global Step: 693790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:04,884-Speed 2627.15 samples/sec   Loss 2.2642   LearningRate 0.0027   Epoch: 16   Global Step: 693800   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:08,778-Speed 2630.00 samples/sec   Loss 2.2149   LearningRate 0.0027   Epoch: 16   Global Step: 693810   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:12,674-Speed 2629.50 samples/sec   Loss 2.2874   LearningRate 0.0027   Epoch: 16   Global Step: 693820   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:16,572-Speed 2627.31 samples/sec   Loss 2.2708   LearningRate 0.0027   Epoch: 16   Global Step: 693830   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:20,467-Speed 2630.10 samples/sec   Loss 2.3287   LearningRate 0.0027   Epoch: 16   Global Step: 693840   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:24,368-Speed 2625.20 samples/sec   Loss 2.2857   LearningRate 0.0027   Epoch: 16   Global Step: 693850   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:28,265-Speed 2628.31 samples/sec   Loss 2.2772   LearningRate 0.0027   Epoch: 16   Global Step: 693860   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:32,162-Speed 2628.28 samples/sec   Loss 2.2436   LearningRate 0.0027   Epoch: 16   Global Step: 693870   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:43:36,052-Speed 2632.84 samples/sec   Loss 2.2752   LearningRate 0.0027   Epoch: 16   Global Step: 693880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:43:39,943-Speed 2632.48 samples/sec   Loss 2.3344   LearningRate 0.0027   Epoch: 16   Global Step: 693890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:43:43,849-Speed 2621.83 samples/sec   Loss 2.2688   LearningRate 0.0027   Epoch: 16   Global Step: 693900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:43:47,745-Speed 2629.11 samples/sec   Loss 2.2420   LearningRate 0.0027   Epoch: 16   Global Step: 693910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:43:51,643-Speed 2627.97 samples/sec   Loss 2.2113   LearningRate 0.0027   Epoch: 16   Global Step: 693920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:43:55,540-Speed 2628.28 samples/sec   Loss 2.2781   LearningRate 0.0027   Epoch: 16   Global Step: 693930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:43:59,433-Speed 2630.82 samples/sec   Loss 2.2982   LearningRate 0.0027   Epoch: 16   Global Step: 693940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:03,331-Speed 2627.86 samples/sec   Loss 2.2834   LearningRate 0.0027   Epoch: 16   Global Step: 693950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:07,230-Speed 2627.07 samples/sec   Loss 2.2934   LearningRate 0.0027   Epoch: 16   Global Step: 693960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:11,121-Speed 2631.98 samples/sec   Loss 2.2218   LearningRate 0.0027   Epoch: 16   Global Step: 693970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:14,997-Speed 2642.47 samples/sec   Loss 2.2357   LearningRate 0.0027   Epoch: 16   Global Step: 693980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:18,896-Speed 2626.43 samples/sec   Loss 2.3424   LearningRate 0.0027   Epoch: 16   Global Step: 693990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:22,791-Speed 2630.53 samples/sec   Loss 2.2931   LearningRate 0.0027   Epoch: 16   Global Step: 694000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:26,682-Speed 2632.77 samples/sec   Loss 2.2685   LearningRate 0.0027   Epoch: 16   Global Step: 694010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:30,578-Speed 2628.33 samples/sec   Loss 2.2390   LearningRate 0.0027   Epoch: 16   Global Step: 694020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:34,469-Speed 2632.49 samples/sec   Loss 2.3114   LearningRate 0.0027   Epoch: 16   Global Step: 694030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:38,367-Speed 2627.14 samples/sec   Loss 2.3559   LearningRate 0.0027   Epoch: 16   Global Step: 694040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:42,265-Speed 2627.76 samples/sec   Loss 2.3369   LearningRate 0.0027   Epoch: 16   Global Step: 694050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:46,173-Speed 2621.05 samples/sec   Loss 2.2893   LearningRate 0.0027   Epoch: 16   Global Step: 694060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:50,069-Speed 2629.43 samples/sec   Loss 2.2774   LearningRate 0.0027   Epoch: 16   Global Step: 694070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:44:53,987-Speed 2614.36 samples/sec   Loss 2.2885   LearningRate 0.0027   Epoch: 16   Global Step: 694080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:44:57,883-Speed 2628.40 samples/sec   Loss 2.2408   LearningRate 0.0027   Epoch: 16   Global Step: 694090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:45:01,788-Speed 2623.22 samples/sec   Loss 2.3136   LearningRate 0.0027   Epoch: 16   Global Step: 694100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:45:05,698-Speed 2619.41 samples/sec   Loss 2.3438   LearningRate 0.0027   Epoch: 16   Global Step: 694110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:09,593-Speed 2630.11 samples/sec   Loss 2.2227   LearningRate 0.0027   Epoch: 16   Global Step: 694120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:13,488-Speed 2629.16 samples/sec   Loss 2.3146   LearningRate 0.0027   Epoch: 16   Global Step: 694130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:17,384-Speed 2629.26 samples/sec   Loss 2.2849   LearningRate 0.0027   Epoch: 16   Global Step: 694140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:21,276-Speed 2631.83 samples/sec   Loss 2.3065   LearningRate 0.0027   Epoch: 16   Global Step: 694150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:25,185-Speed 2619.95 samples/sec   Loss 2.3340   LearningRate 0.0027   Epoch: 16   Global Step: 694160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:29,082-Speed 2628.56 samples/sec   Loss 2.3365   LearningRate 0.0027   Epoch: 16   Global Step: 694170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:32,976-Speed 2629.53 samples/sec   Loss 2.3073   LearningRate 0.0027   Epoch: 16   Global Step: 694180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:45:36,857-Speed 2638.97 samples/sec   Loss 2.2135   LearningRate 0.0027   Epoch: 16   Global Step: 694190   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:45:40,754-Speed 2628.56 samples/sec   Loss 2.2255   LearningRate 0.0027   Epoch: 16   Global Step: 694200   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:45:44,649-Speed 2629.89 samples/sec   Loss 2.3410   LearningRate 0.0027   Epoch: 16   Global Step: 694210   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:45:48,546-Speed 2628.49 samples/sec   Loss 2.2689   LearningRate 0.0027   Epoch: 16   Global Step: 694220   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:45:52,441-Speed 2629.54 samples/sec   Loss 2.2336   LearningRate 0.0027   Epoch: 16   Global Step: 694230   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:45:56,335-Speed 2630.12 samples/sec   Loss 2.2520   LearningRate 0.0027   Epoch: 16   Global Step: 694240   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:46:00,235-Speed 2626.41 samples/sec   Loss 2.2197   LearningRate 0.0027   Epoch: 16   Global Step: 694250   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:46:04,147-Speed 2617.92 samples/sec   Loss 2.2855   LearningRate 0.0027   Epoch: 16   Global Step: 694260   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:46:08,041-Speed 2630.02 samples/sec   Loss 2.2542   LearningRate 0.0027   Epoch: 16   Global Step: 694270   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:46:11,951-Speed 2619.90 samples/sec   Loss 2.2209   LearningRate 0.0027   Epoch: 16   Global Step: 694280   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:46:15,844-Speed 2631.27 samples/sec   Loss 2.2692   LearningRate 0.0027   Epoch: 16   Global Step: 694290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:19,742-Speed 2627.62 samples/sec   Loss 2.3172   LearningRate 0.0027   Epoch: 16   Global Step: 694300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:23,641-Speed 2626.96 samples/sec   Loss 2.3795   LearningRate 0.0027   Epoch: 16   Global Step: 694310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:27,539-Speed 2627.63 samples/sec   Loss 2.2443   LearningRate 0.0027   Epoch: 16   Global Step: 694320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:31,435-Speed 2628.96 samples/sec   Loss 2.2415   LearningRate 0.0027   Epoch: 16   Global Step: 694330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:35,336-Speed 2625.74 samples/sec   Loss 2.2097   LearningRate 0.0027   Epoch: 16   Global Step: 694340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:39,232-Speed 2628.46 samples/sec   Loss 2.2440   LearningRate 0.0027   Epoch: 16   Global Step: 694350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:43,129-Speed 2627.97 samples/sec   Loss 2.2941   LearningRate 0.0027   Epoch: 16   Global Step: 694360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:47,024-Speed 2629.88 samples/sec   Loss 2.1949   LearningRate 0.0027   Epoch: 16   Global Step: 694370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:50,944-Speed 2612.83 samples/sec   Loss 2.2588   LearningRate 0.0027   Epoch: 16   Global Step: 694380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:46:54,837-Speed 2631.47 samples/sec   Loss 2.2961   LearningRate 0.0027   Epoch: 16   Global Step: 694390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:46:58,730-Speed 2630.44 samples/sec   Loss 2.3511   LearningRate 0.0027   Epoch: 16   Global Step: 694400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:47:02,629-Speed 2627.42 samples/sec   Loss 2.2597   LearningRate 0.0027   Epoch: 16   Global Step: 694410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:47:06,529-Speed 2626.27 samples/sec   Loss 2.3145   LearningRate 0.0027   Epoch: 16   Global Step: 694420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:47:10,409-Speed 2639.85 samples/sec   Loss 2.3145   LearningRate 0.0027   Epoch: 16   Global Step: 694430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:14,309-Speed 2625.61 samples/sec   Loss 2.2469   LearningRate 0.0027   Epoch: 16   Global Step: 694440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:18,213-Speed 2623.87 samples/sec   Loss 2.2671   LearningRate 0.0027   Epoch: 16   Global Step: 694450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:22,115-Speed 2624.73 samples/sec   Loss 2.2929   LearningRate 0.0027   Epoch: 16   Global Step: 694460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:26,017-Speed 2624.98 samples/sec   Loss 2.2546   LearningRate 0.0027   Epoch: 16   Global Step: 694470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:29,917-Speed 2627.19 samples/sec   Loss 2.2078   LearningRate 0.0027   Epoch: 16   Global Step: 694480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:33,819-Speed 2625.25 samples/sec   Loss 2.2892   LearningRate 0.0027   Epoch: 16   Global Step: 694490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:37,720-Speed 2625.47 samples/sec   Loss 2.3199   LearningRate 0.0027   Epoch: 16   Global Step: 694500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:41,826-Speed 2494.22 samples/sec   Loss 2.2112   LearningRate 0.0027   Epoch: 16   Global Step: 694510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:45,723-Speed 2628.37 samples/sec   Loss 2.2086   LearningRate 0.0027   Epoch: 16   Global Step: 694520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:49,601-Speed 2641.07 samples/sec   Loss 2.2792   LearningRate 0.0027   Epoch: 16   Global Step: 694530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:53,502-Speed 2626.58 samples/sec   Loss 2.2979   LearningRate 0.0026   Epoch: 16   Global Step: 694540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:47:57,397-Speed 2629.63 samples/sec   Loss 2.2480   LearningRate 0.0026   Epoch: 16   Global Step: 694550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:01,304-Speed 2621.29 samples/sec   Loss 2.3448   LearningRate 0.0026   Epoch: 16   Global Step: 694560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:05,196-Speed 2631.72 samples/sec   Loss 2.3099   LearningRate 0.0026   Epoch: 16   Global Step: 694570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:09,090-Speed 2630.40 samples/sec   Loss 2.2723   LearningRate 0.0026   Epoch: 16   Global Step: 694580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:13,003-Speed 2617.37 samples/sec   Loss 2.2323   LearningRate 0.0026   Epoch: 16   Global Step: 694590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:16,899-Speed 2629.19 samples/sec   Loss 2.3182   LearningRate 0.0026   Epoch: 16   Global Step: 694600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:20,797-Speed 2627.49 samples/sec   Loss 2.2967   LearningRate 0.0026   Epoch: 16   Global Step: 694610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:24,694-Speed 2628.22 samples/sec   Loss 2.2824   LearningRate 0.0026   Epoch: 16   Global Step: 694620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:28,585-Speed 2632.89 samples/sec   Loss 2.2814   LearningRate 0.0026   Epoch: 16   Global Step: 694630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:48:32,478-Speed 2630.97 samples/sec   Loss 2.2610   LearningRate 0.0026   Epoch: 16   Global Step: 694640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:48:36,388-Speed 2619.32 samples/sec   Loss 2.2530   LearningRate 0.0026   Epoch: 16   Global Step: 694650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:48:40,275-Speed 2635.03 samples/sec   Loss 2.2846   LearningRate 0.0026   Epoch: 16   Global Step: 694660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:44,183-Speed 2621.27 samples/sec   Loss 2.3290   LearningRate 0.0026   Epoch: 16   Global Step: 694670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:48,078-Speed 2629.33 samples/sec   Loss 2.2956   LearningRate 0.0026   Epoch: 16   Global Step: 694680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:51,973-Speed 2630.23 samples/sec   Loss 2.2586   LearningRate 0.0026   Epoch: 16   Global Step: 694690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:55,868-Speed 2629.17 samples/sec   Loss 2.2853   LearningRate 0.0026   Epoch: 16   Global Step: 694700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:48:59,762-Speed 2631.06 samples/sec   Loss 2.2957   LearningRate 0.0026   Epoch: 16   Global Step: 694710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:03,655-Speed 2630.76 samples/sec   Loss 2.3123   LearningRate 0.0026   Epoch: 16   Global Step: 694720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:07,557-Speed 2625.23 samples/sec   Loss 2.2653   LearningRate 0.0026   Epoch: 16   Global Step: 694730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:11,454-Speed 2627.72 samples/sec   Loss 2.2308   LearningRate 0.0026   Epoch: 16   Global Step: 694740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:15,353-Speed 2627.29 samples/sec   Loss 2.2398   LearningRate 0.0026   Epoch: 16   Global Step: 694750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:19,245-Speed 2631.26 samples/sec   Loss 2.2123   LearningRate 0.0026   Epoch: 16   Global Step: 694760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:49:23,120-Speed 2643.86 samples/sec   Loss 2.2333   LearningRate 0.0026   Epoch: 16   Global Step: 694770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:27,012-Speed 2632.12 samples/sec   Loss 2.3166   LearningRate 0.0026   Epoch: 16   Global Step: 694780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:30,905-Speed 2630.83 samples/sec   Loss 2.2812   LearningRate 0.0026   Epoch: 16   Global Step: 694790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:34,802-Speed 2628.57 samples/sec   Loss 2.2735   LearningRate 0.0026   Epoch: 16   Global Step: 694800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:38,703-Speed 2625.42 samples/sec   Loss 2.2143   LearningRate 0.0026   Epoch: 16   Global Step: 694810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:42,602-Speed 2626.97 samples/sec   Loss 2.2595   LearningRate 0.0026   Epoch: 16   Global Step: 694820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:46,500-Speed 2627.38 samples/sec   Loss 2.2391   LearningRate 0.0026   Epoch: 16   Global Step: 694830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:50,406-Speed 2622.16 samples/sec   Loss 2.2576   LearningRate 0.0026   Epoch: 16   Global Step: 694840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:54,316-Speed 2619.58 samples/sec   Loss 2.3277   LearningRate 0.0026   Epoch: 16   Global Step: 694850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:49:58,223-Speed 2621.59 samples/sec   Loss 2.2818   LearningRate 0.0026   Epoch: 16   Global Step: 694860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:50:02,103-Speed 2640.53 samples/sec   Loss 2.2661   LearningRate 0.0026   Epoch: 16   Global Step: 694870   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:06,002-Speed 2626.65 samples/sec   Loss 2.2310   LearningRate 0.0026   Epoch: 16   Global Step: 694880   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:09,898-Speed 2628.95 samples/sec   Loss 2.3251   LearningRate 0.0026   Epoch: 16   Global Step: 694890   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:13,788-Speed 2632.95 samples/sec   Loss 2.2239   LearningRate 0.0026   Epoch: 16   Global Step: 694900   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:17,682-Speed 2629.92 samples/sec   Loss 2.2333   LearningRate 0.0026   Epoch: 16   Global Step: 694910   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:21,582-Speed 2626.33 samples/sec   Loss 2.2514   LearningRate 0.0026   Epoch: 16   Global Step: 694920   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:25,477-Speed 2629.78 samples/sec   Loss 2.2561   LearningRate 0.0026   Epoch: 16   Global Step: 694930   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:29,378-Speed 2625.33 samples/sec   Loss 2.2422   LearningRate 0.0026   Epoch: 16   Global Step: 694940   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:33,277-Speed 2627.12 samples/sec   Loss 2.1868   LearningRate 0.0026   Epoch: 16   Global Step: 694950   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:37,170-Speed 2631.45 samples/sec   Loss 2.3309   LearningRate 0.0026   Epoch: 16   Global Step: 694960   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:50:41,068-Speed 2627.86 samples/sec   Loss 2.2596   LearningRate 0.0026   Epoch: 16   Global Step: 694970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:50:44,971-Speed 2624.60 samples/sec   Loss 2.2641   LearningRate 0.0026   Epoch: 16   Global Step: 694980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:50:48,869-Speed 2627.79 samples/sec   Loss 2.2446   LearningRate 0.0026   Epoch: 16   Global Step: 694990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:50:52,762-Speed 2630.64 samples/sec   Loss 2.2453   LearningRate 0.0026   Epoch: 16   Global Step: 695000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:50:56,653-Speed 2632.17 samples/sec   Loss 2.3207   LearningRate 0.0026   Epoch: 16   Global Step: 695010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:00,549-Speed 2628.79 samples/sec   Loss 2.2502   LearningRate 0.0026   Epoch: 16   Global Step: 695020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:04,444-Speed 2629.46 samples/sec   Loss 2.2653   LearningRate 0.0026   Epoch: 16   Global Step: 695030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:08,337-Speed 2630.90 samples/sec   Loss 2.2499   LearningRate 0.0026   Epoch: 16   Global Step: 695040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:12,237-Speed 2626.87 samples/sec   Loss 2.2538   LearningRate 0.0026   Epoch: 16   Global Step: 695050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:16,139-Speed 2624.77 samples/sec   Loss 2.1964   LearningRate 0.0026   Epoch: 16   Global Step: 695060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:20,016-Speed 2641.57 samples/sec   Loss 2.2304   LearningRate 0.0026   Epoch: 16   Global Step: 695070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:23,909-Speed 2631.49 samples/sec   Loss 2.3167   LearningRate 0.0026   Epoch: 16   Global Step: 695080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:27,817-Speed 2621.02 samples/sec   Loss 2.3097   LearningRate 0.0026   Epoch: 16   Global Step: 695090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:31,710-Speed 2631.33 samples/sec   Loss 2.2954   LearningRate 0.0026   Epoch: 16   Global Step: 695100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:35,606-Speed 2628.80 samples/sec   Loss 2.2205   LearningRate 0.0026   Epoch: 16   Global Step: 695110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:39,501-Speed 2628.95 samples/sec   Loss 2.3350   LearningRate 0.0026   Epoch: 16   Global Step: 695120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:43,395-Speed 2630.40 samples/sec   Loss 2.2761   LearningRate 0.0026   Epoch: 16   Global Step: 695130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:47,287-Speed 2632.27 samples/sec   Loss 2.2602   LearningRate 0.0026   Epoch: 16   Global Step: 695140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:51,192-Speed 2622.31 samples/sec   Loss 2.2221   LearningRate 0.0026   Epoch: 16   Global Step: 695150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:51:55,085-Speed 2631.73 samples/sec   Loss 2.2284   LearningRate 0.0026   Epoch: 16   Global Step: 695160   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:51:58,978-Speed 2630.93 samples/sec   Loss 2.2196   LearningRate 0.0026   Epoch: 16   Global Step: 695170   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:02,872-Speed 2630.67 samples/sec   Loss 2.1932   LearningRate 0.0026   Epoch: 16   Global Step: 695180   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:06,766-Speed 2629.81 samples/sec   Loss 2.2240   LearningRate 0.0026   Epoch: 16   Global Step: 695190   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:10,667-Speed 2625.64 samples/sec   Loss 2.2389   LearningRate 0.0026   Epoch: 16   Global Step: 695200   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:14,558-Speed 2631.94 samples/sec   Loss 2.2483   LearningRate 0.0026   Epoch: 16   Global Step: 695210   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:18,490-Speed 2605.59 samples/sec   Loss 2.2586   LearningRate 0.0026   Epoch: 16   Global Step: 695220   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:22,380-Speed 2633.11 samples/sec   Loss 2.2149   LearningRate 0.0026   Epoch: 16   Global Step: 695230   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:26,273-Speed 2631.12 samples/sec   Loss 2.2332   LearningRate 0.0026   Epoch: 16   Global Step: 695240   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:30,194-Speed 2612.37 samples/sec   Loss 2.2291   LearningRate 0.0026   Epoch: 16   Global Step: 695250   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:52:34,092-Speed 2627.72 samples/sec   Loss 2.2832   LearningRate 0.0026   Epoch: 16   Global Step: 695260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:52:37,985-Speed 2630.98 samples/sec   Loss 2.2708   LearningRate 0.0026   Epoch: 16   Global Step: 695270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:52:41,898-Speed 2617.46 samples/sec   Loss 2.2598   LearningRate 0.0026   Epoch: 16   Global Step: 695280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:52:45,876-Speed 2574.90 samples/sec   Loss 2.2411   LearningRate 0.0026   Epoch: 16   Global Step: 695290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:52:49,792-Speed 2615.62 samples/sec   Loss 2.2561   LearningRate 0.0026   Epoch: 16   Global Step: 695300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:52:53,696-Speed 2623.93 samples/sec   Loss 2.3306   LearningRate 0.0026   Epoch: 16   Global Step: 695310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:52:57,597-Speed 2625.81 samples/sec   Loss 2.2517   LearningRate 0.0026   Epoch: 16   Global Step: 695320   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:01,495-Speed 2627.77 samples/sec   Loss 2.2158   LearningRate 0.0026   Epoch: 16   Global Step: 695330   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:05,493-Speed 2561.71 samples/sec   Loss 2.2393   LearningRate 0.0026   Epoch: 16   Global Step: 695340   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:09,394-Speed 2625.78 samples/sec   Loss 2.2135   LearningRate 0.0026   Epoch: 16   Global Step: 695350   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:13,301-Speed 2621.53 samples/sec   Loss 2.2271   LearningRate 0.0026   Epoch: 16   Global Step: 695360   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:17,199-Speed 2627.67 samples/sec   Loss 2.2562   LearningRate 0.0026   Epoch: 16   Global Step: 695370   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:21,097-Speed 2627.79 samples/sec   Loss 2.2505   LearningRate 0.0026   Epoch: 16   Global Step: 695380   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:25,027-Speed 2606.39 samples/sec   Loss 2.2262   LearningRate 0.0026   Epoch: 16   Global Step: 695390   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:28,923-Speed 2629.30 samples/sec   Loss 2.2816   LearningRate 0.0026   Epoch: 16   Global Step: 695400   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:32,819-Speed 2628.99 samples/sec   Loss 2.3124   LearningRate 0.0026   Epoch: 16   Global Step: 695410   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:53:36,712-Speed 2630.72 samples/sec   Loss 2.2698   LearningRate 0.0026   Epoch: 16   Global Step: 695420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:53:40,623-Speed 2619.07 samples/sec   Loss 2.2689   LearningRate 0.0026   Epoch: 16   Global Step: 695430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:53:44,519-Speed 2628.54 samples/sec   Loss 2.2572   LearningRate 0.0026   Epoch: 16   Global Step: 695440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:53:48,429-Speed 2620.12 samples/sec   Loss 2.2067   LearningRate 0.0026   Epoch: 16   Global Step: 695450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:53:52,378-Speed 2593.69 samples/sec   Loss 2.2445   LearningRate 0.0026   Epoch: 16   Global Step: 695460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:53:56,463-Speed 2507.11 samples/sec   Loss 2.1881   LearningRate 0.0026   Epoch: 16   Global Step: 695470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:00,463-Speed 2561.58 samples/sec   Loss 2.2310   LearningRate 0.0026   Epoch: 16   Global Step: 695480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:04,357-Speed 2630.20 samples/sec   Loss 2.2790   LearningRate 0.0026   Epoch: 16   Global Step: 695490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:08,286-Speed 2606.26 samples/sec   Loss 2.1982   LearningRate 0.0026   Epoch: 16   Global Step: 695500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:12,178-Speed 2632.06 samples/sec   Loss 2.2530   LearningRate 0.0026   Epoch: 16   Global Step: 695510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:16,073-Speed 2629.90 samples/sec   Loss 2.2174   LearningRate 0.0026   Epoch: 16   Global Step: 695520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:54:19,965-Speed 2631.76 samples/sec   Loss 2.1575   LearningRate 0.0026   Epoch: 16   Global Step: 695530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:54:23,849-Speed 2637.57 samples/sec   Loss 2.3128   LearningRate 0.0026   Epoch: 16   Global Step: 695540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:27,740-Speed 2632.29 samples/sec   Loss 2.2345   LearningRate 0.0026   Epoch: 16   Global Step: 695550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:31,635-Speed 2629.93 samples/sec   Loss 2.3479   LearningRate 0.0026   Epoch: 16   Global Step: 695560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:35,525-Speed 2633.24 samples/sec   Loss 2.2626   LearningRate 0.0026   Epoch: 16   Global Step: 695570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:39,417-Speed 2631.78 samples/sec   Loss 2.2858   LearningRate 0.0026   Epoch: 16   Global Step: 695580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:43,305-Speed 2634.05 samples/sec   Loss 2.2362   LearningRate 0.0026   Epoch: 16   Global Step: 695590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:47,195-Speed 2632.64 samples/sec   Loss 2.3039   LearningRate 0.0026   Epoch: 16   Global Step: 695600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:51,088-Speed 2631.78 samples/sec   Loss 2.3180   LearningRate 0.0026   Epoch: 16   Global Step: 695610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:54,982-Speed 2630.19 samples/sec   Loss 2.2845   LearningRate 0.0026   Epoch: 16   Global Step: 695620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:54:58,874-Speed 2631.88 samples/sec   Loss 2.2432   LearningRate 0.0026   Epoch: 16   Global Step: 695630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:55:02,770-Speed 2628.65 samples/sec   Loss 2.2925   LearningRate 0.0026   Epoch: 16   Global Step: 695640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:55:06,663-Speed 2631.72 samples/sec   Loss 2.2054   LearningRate 0.0026   Epoch: 16   Global Step: 695650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:55:10,557-Speed 2630.74 samples/sec   Loss 2.1713   LearningRate 0.0026   Epoch: 16   Global Step: 695660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:55:14,429-Speed 2645.01 samples/sec   Loss 2.2167   LearningRate 0.0026   Epoch: 16   Global Step: 695670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:55:18,322-Speed 2630.84 samples/sec   Loss 2.2801   LearningRate 0.0026   Epoch: 16   Global Step: 695680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:55:22,222-Speed 2625.94 samples/sec   Loss 2.2584   LearningRate 0.0026   Epoch: 16   Global Step: 695690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:55:26,099-Speed 2642.20 samples/sec   Loss 2.2622   LearningRate 0.0026   Epoch: 16   Global Step: 695700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:30,001-Speed 2625.32 samples/sec   Loss 2.3045   LearningRate 0.0026   Epoch: 16   Global Step: 695710   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:33,899-Speed 2627.36 samples/sec   Loss 2.2844   LearningRate 0.0026   Epoch: 16   Global Step: 695720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:37,791-Speed 2631.82 samples/sec   Loss 2.2574   LearningRate 0.0026   Epoch: 16   Global Step: 695730   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:41,685-Speed 2630.49 samples/sec   Loss 2.3256   LearningRate 0.0026   Epoch: 16   Global Step: 695740   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:45,574-Speed 2633.50 samples/sec   Loss 2.1697   LearningRate 0.0026   Epoch: 16   Global Step: 695750   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:49,468-Speed 2630.10 samples/sec   Loss 2.2745   LearningRate 0.0026   Epoch: 16   Global Step: 695760   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:53,370-Speed 2625.49 samples/sec   Loss 2.3074   LearningRate 0.0026   Epoch: 16   Global Step: 695770   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:55:57,272-Speed 2625.41 samples/sec   Loss 2.3114   LearningRate 0.0026   Epoch: 16   Global Step: 695780   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:56:01,174-Speed 2624.57 samples/sec   Loss 2.3402   LearningRate 0.0026   Epoch: 16   Global Step: 695790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:56:05,075-Speed 2626.03 samples/sec   Loss 2.3331   LearningRate 0.0026   Epoch: 16   Global Step: 695800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:08,983-Speed 2620.44 samples/sec   Loss 2.2660   LearningRate 0.0026   Epoch: 16   Global Step: 695810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:12,880-Speed 2628.56 samples/sec   Loss 2.1850   LearningRate 0.0026   Epoch: 16   Global Step: 695820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:16,782-Speed 2624.67 samples/sec   Loss 2.1716   LearningRate 0.0026   Epoch: 16   Global Step: 695830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:20,751-Speed 2580.85 samples/sec   Loss 2.2974   LearningRate 0.0026   Epoch: 16   Global Step: 695840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:24,652-Speed 2625.49 samples/sec   Loss 2.2016   LearningRate 0.0026   Epoch: 16   Global Step: 695850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:28,542-Speed 2633.46 samples/sec   Loss 2.2225   LearningRate 0.0026   Epoch: 16   Global Step: 695860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:32,436-Speed 2630.65 samples/sec   Loss 2.2697   LearningRate 0.0026   Epoch: 16   Global Step: 695870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:36,342-Speed 2621.96 samples/sec   Loss 2.2763   LearningRate 0.0026   Epoch: 16   Global Step: 695880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:40,240-Speed 2627.33 samples/sec   Loss 2.2157   LearningRate 0.0026   Epoch: 16   Global Step: 695890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:44,111-Speed 2646.24 samples/sec   Loss 2.2452   LearningRate 0.0026   Epoch: 16   Global Step: 695900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:48,038-Speed 2608.19 samples/sec   Loss 2.2538   LearningRate 0.0026   Epoch: 16   Global Step: 695910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:51,941-Speed 2624.18 samples/sec   Loss 2.2697   LearningRate 0.0026   Epoch: 16   Global Step: 695920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:55,835-Speed 2630.27 samples/sec   Loss 2.2783   LearningRate 0.0026   Epoch: 16   Global Step: 695930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:56:59,726-Speed 2631.87 samples/sec   Loss 2.2926   LearningRate 0.0026   Epoch: 16   Global Step: 695940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:03,619-Speed 2631.33 samples/sec   Loss 2.2325   LearningRate 0.0026   Epoch: 16   Global Step: 695950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:07,529-Speed 2619.92 samples/sec   Loss 2.2730   LearningRate 0.0026   Epoch: 16   Global Step: 695960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:11,423-Speed 2630.29 samples/sec   Loss 2.2761   LearningRate 0.0026   Epoch: 16   Global Step: 695970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:15,319-Speed 2629.07 samples/sec   Loss 2.2106   LearningRate 0.0026   Epoch: 16   Global Step: 695980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:19,211-Speed 2631.43 samples/sec   Loss 2.3118   LearningRate 0.0026   Epoch: 16   Global Step: 695990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:23,103-Speed 2631.44 samples/sec   Loss 2.2130   LearningRate 0.0026   Epoch: 16   Global Step: 696000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:57:27,009-Speed 2622.97 samples/sec   Loss 2.2612   LearningRate 0.0026   Epoch: 16   Global Step: 696010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:57:30,902-Speed 2630.90 samples/sec   Loss 2.2636   LearningRate 0.0026   Epoch: 16   Global Step: 696020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:57:34,785-Speed 2637.10 samples/sec   Loss 2.2696   LearningRate 0.0026   Epoch: 16   Global Step: 696030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:38,688-Speed 2624.70 samples/sec   Loss 2.3560   LearningRate 0.0026   Epoch: 16   Global Step: 696040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:42,594-Speed 2622.25 samples/sec   Loss 2.2910   LearningRate 0.0026   Epoch: 16   Global Step: 696050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:46,496-Speed 2625.28 samples/sec   Loss 2.2495   LearningRate 0.0026   Epoch: 16   Global Step: 696060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:50,388-Speed 2631.43 samples/sec   Loss 2.3063   LearningRate 0.0026   Epoch: 16   Global Step: 696070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:54,281-Speed 2631.30 samples/sec   Loss 2.2480   LearningRate 0.0026   Epoch: 16   Global Step: 696080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:57:58,179-Speed 2627.47 samples/sec   Loss 2.1493   LearningRate 0.0026   Epoch: 16   Global Step: 696090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:02,081-Speed 2624.68 samples/sec   Loss 2.3029   LearningRate 0.0026   Epoch: 16   Global Step: 696100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:05,977-Speed 2629.23 samples/sec   Loss 2.2798   LearningRate 0.0026   Epoch: 16   Global Step: 696110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:09,941-Speed 2583.36 samples/sec   Loss 2.2959   LearningRate 0.0026   Epoch: 16   Global Step: 696120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:13,885-Speed 2597.95 samples/sec   Loss 2.2909   LearningRate 0.0026   Epoch: 16   Global Step: 696130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:58:17,785-Speed 2626.55 samples/sec   Loss 2.2583   LearningRate 0.0026   Epoch: 16   Global Step: 696140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:58:21,688-Speed 2623.86 samples/sec   Loss 2.2898   LearningRate 0.0026   Epoch: 16   Global Step: 696150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:58:25,570-Speed 2638.61 samples/sec   Loss 2.1840   LearningRate 0.0026   Epoch: 16   Global Step: 696160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:29,465-Speed 2629.55 samples/sec   Loss 2.2326   LearningRate 0.0026   Epoch: 16   Global Step: 696170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:33,362-Speed 2628.10 samples/sec   Loss 2.2238   LearningRate 0.0026   Epoch: 16   Global Step: 696180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:37,265-Speed 2624.10 samples/sec   Loss 2.2455   LearningRate 0.0026   Epoch: 16   Global Step: 696190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:41,170-Speed 2623.56 samples/sec   Loss 2.2566   LearningRate 0.0026   Epoch: 16   Global Step: 696200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:45,066-Speed 2628.92 samples/sec   Loss 2.3180   LearningRate 0.0026   Epoch: 16   Global Step: 696210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:48,966-Speed 2626.48 samples/sec   Loss 2.1793   LearningRate 0.0026   Epoch: 16   Global Step: 696220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:52,863-Speed 2628.19 samples/sec   Loss 2.2759   LearningRate 0.0026   Epoch: 16   Global Step: 696230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:58:56,765-Speed 2625.36 samples/sec   Loss 2.2565   LearningRate 0.0026   Epoch: 16   Global Step: 696240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:59:00,660-Speed 2629.47 samples/sec   Loss 2.2342   LearningRate 0.0026   Epoch: 16   Global Step: 696250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:59:04,558-Speed 2627.48 samples/sec   Loss 2.2192   LearningRate 0.0026   Epoch: 16   Global Step: 696260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:59:08,464-Speed 2622.07 samples/sec   Loss 2.2674   LearningRate 0.0026   Epoch: 16   Global Step: 696270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:59:12,364-Speed 2626.10 samples/sec   Loss 2.2775   LearningRate 0.0026   Epoch: 16   Global Step: 696280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:59:16,262-Speed 2627.64 samples/sec   Loss 2.2714   LearningRate 0.0026   Epoch: 16   Global Step: 696290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:59:20,170-Speed 2621.51 samples/sec   Loss 2.2596   LearningRate 0.0026   Epoch: 16   Global Step: 696300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 01:59:24,058-Speed 2633.97 samples/sec   Loss 2.2372   LearningRate 0.0026   Epoch: 16   Global Step: 696310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:59:27,958-Speed 2627.37 samples/sec   Loss 2.2405   LearningRate 0.0026   Epoch: 16   Global Step: 696320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:59:31,856-Speed 2627.12 samples/sec   Loss 2.2326   LearningRate 0.0026   Epoch: 16   Global Step: 696330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:59:35,750-Speed 2630.37 samples/sec   Loss 2.2289   LearningRate 0.0026   Epoch: 16   Global Step: 696340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:59:39,648-Speed 2627.07 samples/sec   Loss 2.2742   LearningRate 0.0026   Epoch: 16   Global Step: 696350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 01:59:43,520-Speed 2645.53 samples/sec   Loss 2.2222   LearningRate 0.0026   Epoch: 16   Global Step: 696360   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:59:47,427-Speed 2622.42 samples/sec   Loss 2.2626   LearningRate 0.0026   Epoch: 16   Global Step: 696370   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:59:51,352-Speed 2610.06 samples/sec   Loss 2.2614   LearningRate 0.0026   Epoch: 16   Global Step: 696380   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:59:55,244-Speed 2631.55 samples/sec   Loss 2.2417   LearningRate 0.0026   Epoch: 16   Global Step: 696390   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 01:59:59,158-Speed 2616.95 samples/sec   Loss 2.2670   LearningRate 0.0026   Epoch: 16   Global Step: 696400   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:03,056-Speed 2627.36 samples/sec   Loss 2.2794   LearningRate 0.0026   Epoch: 16   Global Step: 696410   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:06,946-Speed 2633.44 samples/sec   Loss 2.2069   LearningRate 0.0026   Epoch: 16   Global Step: 696420   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:10,836-Speed 2633.12 samples/sec   Loss 2.2925   LearningRate 0.0026   Epoch: 16   Global Step: 696430   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:14,731-Speed 2629.38 samples/sec   Loss 2.2346   LearningRate 0.0026   Epoch: 16   Global Step: 696440   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:18,624-Speed 2631.15 samples/sec   Loss 2.2666   LearningRate 0.0026   Epoch: 16   Global Step: 696450   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:22,516-Speed 2632.24 samples/sec   Loss 2.2118   LearningRate 0.0026   Epoch: 16   Global Step: 696460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:00:26,408-Speed 2631.49 samples/sec   Loss 2.2641   LearningRate 0.0026   Epoch: 16   Global Step: 696470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:00:30,304-Speed 2629.79 samples/sec   Loss 2.2637   LearningRate 0.0026   Epoch: 16   Global Step: 696480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:00:34,212-Speed 2620.68 samples/sec   Loss 2.2435   LearningRate 0.0026   Epoch: 16   Global Step: 696490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:00:38,112-Speed 2626.16 samples/sec   Loss 2.2817   LearningRate 0.0026   Epoch: 16   Global Step: 696500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:00:42,004-Speed 2631.00 samples/sec   Loss 2.3041   LearningRate 0.0026   Epoch: 16   Global Step: 696510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:00:45,876-Speed 2645.78 samples/sec   Loss 2.2362   LearningRate 0.0026   Epoch: 16   Global Step: 696520   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:49,770-Speed 2629.80 samples/sec   Loss 2.2984   LearningRate 0.0026   Epoch: 16   Global Step: 696530   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:53,678-Speed 2621.71 samples/sec   Loss 2.2294   LearningRate 0.0026   Epoch: 16   Global Step: 696540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:00:57,575-Speed 2628.33 samples/sec   Loss 2.2422   LearningRate 0.0026   Epoch: 16   Global Step: 696550   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:01:01,494-Speed 2613.76 samples/sec   Loss 2.2447   LearningRate 0.0026   Epoch: 16   Global Step: 696560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:01:05,386-Speed 2631.10 samples/sec   Loss 2.2297   LearningRate 0.0026   Epoch: 16   Global Step: 696570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:01:09,293-Speed 2621.31 samples/sec   Loss 2.2764   LearningRate 0.0026   Epoch: 16   Global Step: 696580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:01:13,188-Speed 2629.62 samples/sec   Loss 2.2451   LearningRate 0.0026   Epoch: 16   Global Step: 696590   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:01:17,096-Speed 2621.04 samples/sec   Loss 2.1922   LearningRate 0.0026   Epoch: 16   Global Step: 696600   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:01:20,993-Speed 2628.47 samples/sec   Loss 2.2032   LearningRate 0.0026   Epoch: 16   Global Step: 696610   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:01:24,892-Speed 2627.32 samples/sec   Loss 2.2721   LearningRate 0.0026   Epoch: 16   Global Step: 696620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:28,795-Speed 2624.33 samples/sec   Loss 2.2053   LearningRate 0.0026   Epoch: 16   Global Step: 696630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:32,689-Speed 2630.73 samples/sec   Loss 2.2369   LearningRate 0.0026   Epoch: 16   Global Step: 696640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:36,605-Speed 2614.75 samples/sec   Loss 2.2348   LearningRate 0.0026   Epoch: 16   Global Step: 696650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:40,501-Speed 2629.37 samples/sec   Loss 2.1774   LearningRate 0.0026   Epoch: 16   Global Step: 696660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:44,402-Speed 2624.85 samples/sec   Loss 2.2800   LearningRate 0.0026   Epoch: 16   Global Step: 696670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:48,345-Speed 2598.11 samples/sec   Loss 2.3207   LearningRate 0.0026   Epoch: 16   Global Step: 696680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:52,260-Speed 2615.98 samples/sec   Loss 2.2825   LearningRate 0.0026   Epoch: 16   Global Step: 696690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:01:56,157-Speed 2628.98 samples/sec   Loss 2.1890   LearningRate 0.0026   Epoch: 16   Global Step: 696700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:00,051-Speed 2630.18 samples/sec   Loss 2.2246   LearningRate 0.0026   Epoch: 16   Global Step: 696710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:03,947-Speed 2629.65 samples/sec   Loss 2.2513   LearningRate 0.0026   Epoch: 16   Global Step: 696720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:02:07,818-Speed 2645.60 samples/sec   Loss 2.2076   LearningRate 0.0026   Epoch: 16   Global Step: 696730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:11,709-Speed 2632.45 samples/sec   Loss 2.2255   LearningRate 0.0026   Epoch: 16   Global Step: 696740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:15,606-Speed 2628.32 samples/sec   Loss 2.2380   LearningRate 0.0026   Epoch: 16   Global Step: 696750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:19,502-Speed 2629.11 samples/sec   Loss 2.2252   LearningRate 0.0026   Epoch: 16   Global Step: 696760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:23,396-Speed 2630.24 samples/sec   Loss 2.2365   LearningRate 0.0026   Epoch: 16   Global Step: 696770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:27,290-Speed 2630.26 samples/sec   Loss 2.1700   LearningRate 0.0026   Epoch: 16   Global Step: 696780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:31,198-Speed 2621.08 samples/sec   Loss 2.2159   LearningRate 0.0026   Epoch: 16   Global Step: 696790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:35,092-Speed 2629.97 samples/sec   Loss 2.2770   LearningRate 0.0026   Epoch: 16   Global Step: 696800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:39,007-Speed 2616.54 samples/sec   Loss 2.1846   LearningRate 0.0026   Epoch: 16   Global Step: 696810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:42,904-Speed 2629.03 samples/sec   Loss 2.3111   LearningRate 0.0026   Epoch: 16   Global Step: 696820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:02:46,799-Speed 2629.78 samples/sec   Loss 2.1519   LearningRate 0.0026   Epoch: 16   Global Step: 696830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:02:50,700-Speed 2625.53 samples/sec   Loss 2.2261   LearningRate 0.0026   Epoch: 16   Global Step: 696840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:02:54,619-Speed 2613.32 samples/sec   Loss 2.2528   LearningRate 0.0026   Epoch: 16   Global Step: 696850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:02:58,501-Speed 2638.61 samples/sec   Loss 2.2459   LearningRate 0.0026   Epoch: 16   Global Step: 696860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:02,397-Speed 2629.03 samples/sec   Loss 2.2371   LearningRate 0.0026   Epoch: 16   Global Step: 696870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:06,292-Speed 2629.76 samples/sec   Loss 2.2471   LearningRate 0.0026   Epoch: 16   Global Step: 696880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:10,194-Speed 2625.28 samples/sec   Loss 2.2260   LearningRate 0.0026   Epoch: 16   Global Step: 696890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:14,085-Speed 2632.38 samples/sec   Loss 2.2304   LearningRate 0.0026   Epoch: 16   Global Step: 696900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:17,975-Speed 2633.51 samples/sec   Loss 2.2287   LearningRate 0.0026   Epoch: 16   Global Step: 696910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:21,871-Speed 2628.91 samples/sec   Loss 2.3054   LearningRate 0.0026   Epoch: 16   Global Step: 696920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:25,778-Speed 2622.19 samples/sec   Loss 2.2248   LearningRate 0.0026   Epoch: 16   Global Step: 696930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:29,684-Speed 2622.35 samples/sec   Loss 2.2120   LearningRate 0.0026   Epoch: 16   Global Step: 696940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:33,624-Speed 2599.60 samples/sec   Loss 2.2458   LearningRate 0.0026   Epoch: 16   Global Step: 696950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:37,521-Speed 2627.80 samples/sec   Loss 2.2936   LearningRate 0.0026   Epoch: 16   Global Step: 696960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:41,414-Speed 2631.22 samples/sec   Loss 2.2263   LearningRate 0.0026   Epoch: 16   Global Step: 696970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:45,312-Speed 2627.33 samples/sec   Loss 2.2275   LearningRate 0.0026   Epoch: 16   Global Step: 696980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:49,202-Speed 2633.67 samples/sec   Loss 2.2219   LearningRate 0.0026   Epoch: 16   Global Step: 696990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:53,096-Speed 2630.34 samples/sec   Loss 2.1849   LearningRate 0.0026   Epoch: 16   Global Step: 697000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:03:56,991-Speed 2629.69 samples/sec   Loss 2.2885   LearningRate 0.0026   Epoch: 16   Global Step: 697010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:00,884-Speed 2631.07 samples/sec   Loss 2.2350   LearningRate 0.0026   Epoch: 16   Global Step: 697020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:04,775-Speed 2632.36 samples/sec   Loss 2.2884   LearningRate 0.0026   Epoch: 16   Global Step: 697030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:08,755-Speed 2572.77 samples/sec   Loss 2.1655   LearningRate 0.0026   Epoch: 16   Global Step: 697040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:12,650-Speed 2630.03 samples/sec   Loss 2.2386   LearningRate 0.0026   Epoch: 16   Global Step: 697050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:16,538-Speed 2634.05 samples/sec   Loss 2.2478   LearningRate 0.0026   Epoch: 16   Global Step: 697060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:20,431-Speed 2631.27 samples/sec   Loss 2.2253   LearningRate 0.0026   Epoch: 16   Global Step: 697070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:24,326-Speed 2630.26 samples/sec   Loss 2.2605   LearningRate 0.0026   Epoch: 16   Global Step: 697080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:28,220-Speed 2630.82 samples/sec   Loss 2.2551   LearningRate 0.0026   Epoch: 16   Global Step: 697090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:32,112-Speed 2630.92 samples/sec   Loss 2.2086   LearningRate 0.0026   Epoch: 16   Global Step: 697100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:36,007-Speed 2630.00 samples/sec   Loss 2.2018   LearningRate 0.0025   Epoch: 16   Global Step: 697110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:39,896-Speed 2633.19 samples/sec   Loss 2.2158   LearningRate 0.0025   Epoch: 16   Global Step: 697120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:43,783-Speed 2635.10 samples/sec   Loss 2.2924   LearningRate 0.0025   Epoch: 16   Global Step: 697130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:47,675-Speed 2631.90 samples/sec   Loss 2.2680   LearningRate 0.0025   Epoch: 16   Global Step: 697140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:51,569-Speed 2630.36 samples/sec   Loss 2.2493   LearningRate 0.0025   Epoch: 16   Global Step: 697150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:04:55,460-Speed 2632.35 samples/sec   Loss 2.2322   LearningRate 0.0025   Epoch: 16   Global Step: 697160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:04:59,334-Speed 2644.76 samples/sec   Loss 2.2182   LearningRate 0.0025   Epoch: 16   Global Step: 697170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:05:03,229-Speed 2628.95 samples/sec   Loss 2.1883   LearningRate 0.0025   Epoch: 16   Global Step: 697180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:05:07,123-Speed 2630.61 samples/sec   Loss 2.2373   LearningRate 0.0025   Epoch: 16   Global Step: 697190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:05:11,017-Speed 2630.10 samples/sec   Loss 2.2108   LearningRate 0.0025   Epoch: 16   Global Step: 697200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:05:14,913-Speed 2628.95 samples/sec   Loss 2.2027   LearningRate 0.0025   Epoch: 16   Global Step: 697210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:05:18,809-Speed 2628.82 samples/sec   Loss 2.2771   LearningRate 0.0025   Epoch: 16   Global Step: 697220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:05:22,719-Speed 2619.40 samples/sec   Loss 2.2045   LearningRate 0.0025   Epoch: 16   Global Step: 697230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:05:26,594-Speed 2643.58 samples/sec   Loss 2.2571   LearningRate 0.0025   Epoch: 16   Global Step: 697240   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:30,497-Speed 2624.54 samples/sec   Loss 2.2118   LearningRate 0.0025   Epoch: 16   Global Step: 697250   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:34,392-Speed 2629.29 samples/sec   Loss 2.2292   LearningRate 0.0025   Epoch: 16   Global Step: 697260   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:38,284-Speed 2631.47 samples/sec   Loss 2.2505   LearningRate 0.0025   Epoch: 16   Global Step: 697270   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:42,187-Speed 2624.79 samples/sec   Loss 2.2693   LearningRate 0.0025   Epoch: 16   Global Step: 697280   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:46,081-Speed 2630.60 samples/sec   Loss 2.1319   LearningRate 0.0025   Epoch: 16   Global Step: 697290   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:50,163-Speed 2509.66 samples/sec   Loss 2.2304   LearningRate 0.0025   Epoch: 16   Global Step: 697300   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:54,094-Speed 2605.90 samples/sec   Loss 2.2561   LearningRate 0.0025   Epoch: 16   Global Step: 697310   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:05:57,986-Speed 2631.73 samples/sec   Loss 2.1800   LearningRate 0.0025   Epoch: 16   Global Step: 697320   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:06:01,897-Speed 2618.91 samples/sec   Loss 2.1965   LearningRate 0.0025   Epoch: 16   Global Step: 697330   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:06:05,794-Speed 2628.28 samples/sec   Loss 2.2775   LearningRate 0.0025   Epoch: 16   Global Step: 697340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:09,695-Speed 2625.47 samples/sec   Loss 2.2159   LearningRate 0.0025   Epoch: 16   Global Step: 697350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:13,596-Speed 2625.69 samples/sec   Loss 2.2673   LearningRate 0.0025   Epoch: 16   Global Step: 697360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:17,501-Speed 2623.16 samples/sec   Loss 2.2019   LearningRate 0.0025   Epoch: 16   Global Step: 697370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:21,394-Speed 2630.34 samples/sec   Loss 2.1836   LearningRate 0.0025   Epoch: 16   Global Step: 697380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:25,288-Speed 2630.82 samples/sec   Loss 2.2234   LearningRate 0.0025   Epoch: 16   Global Step: 697390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:29,184-Speed 2628.83 samples/sec   Loss 2.1897   LearningRate 0.0025   Epoch: 16   Global Step: 697400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:33,151-Speed 2582.25 samples/sec   Loss 2.2758   LearningRate 0.0025   Epoch: 16   Global Step: 697410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:37,096-Speed 2596.01 samples/sec   Loss 2.1714   LearningRate 0.0025   Epoch: 16   Global Step: 697420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:41,000-Speed 2623.72 samples/sec   Loss 2.2421   LearningRate 0.0025   Epoch: 16   Global Step: 697430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:44,999-Speed 2561.17 samples/sec   Loss 2.2452   LearningRate 0.0025   Epoch: 16   Global Step: 697440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:06:48,868-Speed 2647.26 samples/sec   Loss 2.2430   LearningRate 0.0025   Epoch: 16   Global Step: 697450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:52,764-Speed 2629.28 samples/sec   Loss 2.2316   LearningRate 0.0025   Epoch: 16   Global Step: 697460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:06:56,672-Speed 2621.27 samples/sec   Loss 2.2732   LearningRate 0.0025   Epoch: 16   Global Step: 697470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:07:00,544-Speed 2645.07 samples/sec   Loss 2.2051   LearningRate 0.0025   Epoch: 16   Global Step: 697480   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:04,466-Speed 2612.04 samples/sec   Loss 2.2313   LearningRate 0.0025   Epoch: 16   Global Step: 697490   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:08,368-Speed 2624.64 samples/sec   Loss 2.2861   LearningRate 0.0025   Epoch: 16   Global Step: 697500   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:12,262-Speed 2630.66 samples/sec   Loss 2.2224   LearningRate 0.0025   Epoch: 16   Global Step: 697510   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:16,158-Speed 2629.01 samples/sec   Loss 2.2295   LearningRate 0.0025   Epoch: 16   Global Step: 697520   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:20,051-Speed 2631.23 samples/sec   Loss 2.2374   LearningRate 0.0025   Epoch: 16   Global Step: 697530   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:23,942-Speed 2632.45 samples/sec   Loss 2.2731   LearningRate 0.0025   Epoch: 16   Global Step: 697540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:27,841-Speed 2627.27 samples/sec   Loss 2.1960   LearningRate 0.0025   Epoch: 16   Global Step: 697550   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:31,760-Speed 2613.23 samples/sec   Loss 2.1975   LearningRate 0.0025   Epoch: 16   Global Step: 697560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:35,653-Speed 2631.44 samples/sec   Loss 2.2822   LearningRate 0.0025   Epoch: 16   Global Step: 697570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:07:39,544-Speed 2632.24 samples/sec   Loss 2.2636   LearningRate 0.0025   Epoch: 16   Global Step: 697580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:07:43,437-Speed 2630.92 samples/sec   Loss 2.2094   LearningRate 0.0025   Epoch: 16   Global Step: 697590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:07:47,338-Speed 2625.58 samples/sec   Loss 2.2742   LearningRate 0.0025   Epoch: 16   Global Step: 697600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:07:51,255-Speed 2614.91 samples/sec   Loss 2.2043   LearningRate 0.0025   Epoch: 16   Global Step: 697610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:07:55,154-Speed 2627.67 samples/sec   Loss 2.2521   LearningRate 0.0025   Epoch: 16   Global Step: 697620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:07:59,061-Speed 2621.33 samples/sec   Loss 2.1845   LearningRate 0.0025   Epoch: 16   Global Step: 697630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:08:02,970-Speed 2620.13 samples/sec   Loss 2.2039   LearningRate 0.0025   Epoch: 16   Global Step: 697640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:08:06,868-Speed 2627.71 samples/sec   Loss 2.2661   LearningRate 0.0025   Epoch: 16   Global Step: 697650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:08:10,759-Speed 2632.44 samples/sec   Loss 2.1904   LearningRate 0.0025   Epoch: 16   Global Step: 697660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:08:14,657-Speed 2627.15 samples/sec   Loss 2.1985   LearningRate 0.0025   Epoch: 16   Global Step: 697670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:08:18,532-Speed 2643.59 samples/sec   Loss 2.1468   LearningRate 0.0025   Epoch: 16   Global Step: 697680   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:22,433-Speed 2625.14 samples/sec   Loss 2.2346   LearningRate 0.0025   Epoch: 16   Global Step: 697690   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:26,338-Speed 2623.00 samples/sec   Loss 2.2294   LearningRate 0.0025   Epoch: 16   Global Step: 697700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:30,234-Speed 2629.26 samples/sec   Loss 2.1576   LearningRate 0.0025   Epoch: 16   Global Step: 697710   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:34,126-Speed 2631.46 samples/sec   Loss 2.2020   LearningRate 0.0025   Epoch: 16   Global Step: 697720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:38,021-Speed 2630.03 samples/sec   Loss 2.2229   LearningRate 0.0025   Epoch: 16   Global Step: 697730   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:41,911-Speed 2632.76 samples/sec   Loss 2.2216   LearningRate 0.0025   Epoch: 16   Global Step: 697740   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:45,804-Speed 2631.47 samples/sec   Loss 2.1819   LearningRate 0.0025   Epoch: 16   Global Step: 697750   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:49,716-Speed 2617.70 samples/sec   Loss 2.2041   LearningRate 0.0025   Epoch: 16   Global Step: 697760   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:53,614-Speed 2628.14 samples/sec   Loss 2.2054   LearningRate 0.0025   Epoch: 16   Global Step: 697770   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:08:57,515-Speed 2624.94 samples/sec   Loss 2.1860   LearningRate 0.0025   Epoch: 16   Global Step: 697780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:09:01,384-Speed 2647.84 samples/sec   Loss 2.1944   LearningRate 0.0025   Epoch: 16   Global Step: 697790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:05,289-Speed 2622.56 samples/sec   Loss 2.2109   LearningRate 0.0025   Epoch: 16   Global Step: 697800   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:09,195-Speed 2622.37 samples/sec   Loss 2.2335   LearningRate 0.0025   Epoch: 16   Global Step: 697810   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:13,092-Speed 2628.11 samples/sec   Loss 2.2233   LearningRate 0.0025   Epoch: 16   Global Step: 697820   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:16,987-Speed 2629.70 samples/sec   Loss 2.2421   LearningRate 0.0025   Epoch: 16   Global Step: 697830   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:20,890-Speed 2624.85 samples/sec   Loss 2.2269   LearningRate 0.0025   Epoch: 16   Global Step: 697840   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:24,814-Speed 2609.98 samples/sec   Loss 2.2677   LearningRate 0.0025   Epoch: 16   Global Step: 697850   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:28,722-Speed 2620.90 samples/sec   Loss 2.1882   LearningRate 0.0025   Epoch: 16   Global Step: 697860   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:32,625-Speed 2624.74 samples/sec   Loss 2.1943   LearningRate 0.0025   Epoch: 16   Global Step: 697870   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:36,535-Speed 2619.17 samples/sec   Loss 2.2133   LearningRate 0.0025   Epoch: 16   Global Step: 697880   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:40,440-Speed 2622.98 samples/sec   Loss 2.1946   LearningRate 0.0025   Epoch: 16   Global Step: 697890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:09:44,334-Speed 2630.97 samples/sec   Loss 2.2340   LearningRate 0.0025   Epoch: 16   Global Step: 697900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:09:48,310-Speed 2576.00 samples/sec   Loss 2.2655   LearningRate 0.0025   Epoch: 16   Global Step: 697910   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:52,205-Speed 2630.33 samples/sec   Loss 2.2178   LearningRate 0.0025   Epoch: 16   Global Step: 697920   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:09:56,135-Speed 2605.75 samples/sec   Loss 2.2146   LearningRate 0.0025   Epoch: 16   Global Step: 697930   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:00,050-Speed 2616.82 samples/sec   Loss 2.2271   LearningRate 0.0025   Epoch: 16   Global Step: 697940   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:03,959-Speed 2620.18 samples/sec   Loss 2.2423   LearningRate 0.0025   Epoch: 16   Global Step: 697950   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:07,862-Speed 2624.33 samples/sec   Loss 2.2977   LearningRate 0.0025   Epoch: 16   Global Step: 697960   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:11,910-Speed 2529.73 samples/sec   Loss 2.1852   LearningRate 0.0025   Epoch: 16   Global Step: 697970   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:15,816-Speed 2622.35 samples/sec   Loss 2.2890   LearningRate 0.0025   Epoch: 16   Global Step: 697980   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:19,713-Speed 2628.22 samples/sec   Loss 2.2157   LearningRate 0.0025   Epoch: 16   Global Step: 697990   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:23,607-Speed 2630.58 samples/sec   Loss 2.2317   LearningRate 0.0025   Epoch: 16   Global Step: 698000   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:10:27,500-Speed 2630.99 samples/sec   Loss 2.2114   LearningRate 0.0025   Epoch: 16   Global Step: 698010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:31,394-Speed 2630.29 samples/sec   Loss 2.2487   LearningRate 0.0025   Epoch: 16   Global Step: 698020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:35,289-Speed 2629.56 samples/sec   Loss 2.2438   LearningRate 0.0025   Epoch: 16   Global Step: 698030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:39,187-Speed 2627.52 samples/sec   Loss 2.2351   LearningRate 0.0025   Epoch: 16   Global Step: 698040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:43,082-Speed 2629.57 samples/sec   Loss 2.2696   LearningRate 0.0025   Epoch: 16   Global Step: 698050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:46,979-Speed 2628.50 samples/sec   Loss 2.1881   LearningRate 0.0025   Epoch: 16   Global Step: 698060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:50,878-Speed 2627.27 samples/sec   Loss 2.2609   LearningRate 0.0025   Epoch: 16   Global Step: 698070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:54,775-Speed 2628.59 samples/sec   Loss 2.2374   LearningRate 0.0025   Epoch: 16   Global Step: 698080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:10:58,668-Speed 2631.22 samples/sec   Loss 2.2529   LearningRate 0.0025   Epoch: 16   Global Step: 698090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:02,567-Speed 2626.40 samples/sec   Loss 2.2361   LearningRate 0.0025   Epoch: 16   Global Step: 698100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:06,468-Speed 2625.43 samples/sec   Loss 2.1685   LearningRate 0.0025   Epoch: 16   Global Step: 698110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:11:10,342-Speed 2643.71 samples/sec   Loss 2.2673   LearningRate 0.0025   Epoch: 16   Global Step: 698120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:14,240-Speed 2628.01 samples/sec   Loss 2.2453   LearningRate 0.0025   Epoch: 16   Global Step: 698130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:18,129-Speed 2633.38 samples/sec   Loss 2.1909   LearningRate 0.0025   Epoch: 16   Global Step: 698140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:22,029-Speed 2626.18 samples/sec   Loss 2.2558   LearningRate 0.0025   Epoch: 16   Global Step: 698150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:25,943-Speed 2616.93 samples/sec   Loss 2.3045   LearningRate 0.0025   Epoch: 16   Global Step: 698160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:29,830-Speed 2635.27 samples/sec   Loss 2.2049   LearningRate 0.0025   Epoch: 16   Global Step: 698170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:33,731-Speed 2625.96 samples/sec   Loss 2.1213   LearningRate 0.0025   Epoch: 16   Global Step: 698180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:37,628-Speed 2628.26 samples/sec   Loss 2.1559   LearningRate 0.0025   Epoch: 16   Global Step: 698190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:41,519-Speed 2632.08 samples/sec   Loss 2.1729   LearningRate 0.0025   Epoch: 16   Global Step: 698200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:45,418-Speed 2627.00 samples/sec   Loss 2.2272   LearningRate 0.0025   Epoch: 16   Global Step: 698210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:11:49,314-Speed 2628.95 samples/sec   Loss 2.2018   LearningRate 0.0025   Epoch: 16   Global Step: 698220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:11:53,229-Speed 2616.08 samples/sec   Loss 2.2259   LearningRate 0.0025   Epoch: 16   Global Step: 698230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:11:57,132-Speed 2624.43 samples/sec   Loss 2.2814   LearningRate 0.0025   Epoch: 16   Global Step: 698240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:12:01,040-Speed 2620.52 samples/sec   Loss 2.2185   LearningRate 0.0025   Epoch: 16   Global Step: 698250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:12:04,939-Speed 2627.18 samples/sec   Loss 2.1558   LearningRate 0.0025   Epoch: 16   Global Step: 698260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:12:08,819-Speed 2639.44 samples/sec   Loss 2.2109   LearningRate 0.0025   Epoch: 16   Global Step: 698270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:12,715-Speed 2629.90 samples/sec   Loss 2.2521   LearningRate 0.0025   Epoch: 16   Global Step: 698280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:16,784-Speed 2516.60 samples/sec   Loss 2.1985   LearningRate 0.0025   Epoch: 16   Global Step: 698290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:20,682-Speed 2627.29 samples/sec   Loss 2.1790   LearningRate 0.0025   Epoch: 16   Global Step: 698300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:24,582-Speed 2626.99 samples/sec   Loss 2.2123   LearningRate 0.0025   Epoch: 16   Global Step: 698310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:28,485-Speed 2624.06 samples/sec   Loss 2.1505   LearningRate 0.0025   Epoch: 16   Global Step: 698320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:32,390-Speed 2622.97 samples/sec   Loss 2.1698   LearningRate 0.0025   Epoch: 16   Global Step: 698330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:36,288-Speed 2626.79 samples/sec   Loss 2.1962   LearningRate 0.0025   Epoch: 16   Global Step: 698340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:40,194-Speed 2622.31 samples/sec   Loss 2.1902   LearningRate 0.0025   Epoch: 16   Global Step: 698350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:12:44,101-Speed 2622.07 samples/sec   Loss 2.2030   LearningRate 0.0025   Epoch: 16   Global Step: 698360   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:12:47,992-Speed 2632.17 samples/sec   Loss 2.2336   LearningRate 0.0025   Epoch: 16   Global Step: 698370   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:12:51,901-Speed 2620.36 samples/sec   Loss 2.2503   LearningRate 0.0025   Epoch: 16   Global Step: 698380   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:12:55,800-Speed 2626.80 samples/sec   Loss 2.2651   LearningRate 0.0025   Epoch: 16   Global Step: 698390   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:12:59,693-Speed 2630.86 samples/sec   Loss 2.2147   LearningRate 0.0025   Epoch: 16   Global Step: 698400   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:13:03,597-Speed 2623.91 samples/sec   Loss 2.2061   LearningRate 0.0025   Epoch: 16   Global Step: 698410   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:13:07,485-Speed 2633.80 samples/sec   Loss 2.2774   LearningRate 0.0025   Epoch: 16   Global Step: 698420   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:13:11,379-Speed 2630.40 samples/sec   Loss 2.2165   LearningRate 0.0025   Epoch: 16   Global Step: 698430   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:13:15,278-Speed 2627.01 samples/sec   Loss 2.2647   LearningRate 0.0025   Epoch: 16   Global Step: 698440   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:13:19,178-Speed 2626.31 samples/sec   Loss 2.2011   LearningRate 0.0025   Epoch: 16   Global Step: 698450   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:13:23,070-Speed 2631.45 samples/sec   Loss 2.2347   LearningRate 0.0025   Epoch: 16   Global Step: 698460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:26,963-Speed 2631.41 samples/sec   Loss 2.1687   LearningRate 0.0025   Epoch: 16   Global Step: 698470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:30,856-Speed 2631.13 samples/sec   Loss 2.2362   LearningRate 0.0025   Epoch: 16   Global Step: 698480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:34,763-Speed 2621.30 samples/sec   Loss 2.1273   LearningRate 0.0025   Epoch: 16   Global Step: 698490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:38,659-Speed 2629.03 samples/sec   Loss 2.2099   LearningRate 0.0025   Epoch: 16   Global Step: 698500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:42,559-Speed 2626.13 samples/sec   Loss 2.1719   LearningRate 0.0025   Epoch: 16   Global Step: 698510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:46,452-Speed 2630.54 samples/sec   Loss 2.1333   LearningRate 0.0025   Epoch: 16   Global Step: 698520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:50,351-Speed 2627.28 samples/sec   Loss 2.2577   LearningRate 0.0025   Epoch: 16   Global Step: 698530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:54,246-Speed 2630.03 samples/sec   Loss 2.1666   LearningRate 0.0025   Epoch: 16   Global Step: 698540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:13:58,140-Speed 2630.29 samples/sec   Loss 2.2519   LearningRate 0.0025   Epoch: 16   Global Step: 698550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:14:02,015-Speed 2643.34 samples/sec   Loss 2.1769   LearningRate 0.0025   Epoch: 16   Global Step: 698560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:05,908-Speed 2631.06 samples/sec   Loss 2.2051   LearningRate 0.0025   Epoch: 16   Global Step: 698570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:09,797-Speed 2633.19 samples/sec   Loss 2.1766   LearningRate 0.0025   Epoch: 16   Global Step: 698580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:13,692-Speed 2629.92 samples/sec   Loss 2.2131   LearningRate 0.0025   Epoch: 16   Global Step: 698590   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:17,583-Speed 2633.02 samples/sec   Loss 2.2316   LearningRate 0.0025   Epoch: 16   Global Step: 698600   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:21,487-Speed 2623.61 samples/sec   Loss 2.2153   LearningRate 0.0025   Epoch: 16   Global Step: 698610   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:25,378-Speed 2632.07 samples/sec   Loss 2.2720   LearningRate 0.0025   Epoch: 16   Global Step: 698620   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:29,285-Speed 2621.60 samples/sec   Loss 2.2000   LearningRate 0.0025   Epoch: 16   Global Step: 698630   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:33,183-Speed 2627.48 samples/sec   Loss 2.2123   LearningRate 0.0025   Epoch: 16   Global Step: 698640   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:37,082-Speed 2627.16 samples/sec   Loss 2.2337   LearningRate 0.0025   Epoch: 16   Global Step: 698650   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:14:40,986-Speed 2623.59 samples/sec   Loss 2.2247   LearningRate 0.0025   Epoch: 16   Global Step: 698660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:14:44,883-Speed 2629.12 samples/sec   Loss 2.2318   LearningRate 0.0025   Epoch: 16   Global Step: 698670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:14:48,781-Speed 2627.67 samples/sec   Loss 2.2117   LearningRate 0.0025   Epoch: 16   Global Step: 698680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:14:52,677-Speed 2628.75 samples/sec   Loss 2.2272   LearningRate 0.0025   Epoch: 16   Global Step: 698690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:14:56,575-Speed 2627.70 samples/sec   Loss 2.1718   LearningRate 0.0025   Epoch: 16   Global Step: 698700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:00,478-Speed 2624.05 samples/sec   Loss 2.2219   LearningRate 0.0025   Epoch: 16   Global Step: 698710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:04,382-Speed 2623.41 samples/sec   Loss 2.1842   LearningRate 0.0025   Epoch: 16   Global Step: 698720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:08,291-Speed 2620.30 samples/sec   Loss 2.1334   LearningRate 0.0025   Epoch: 16   Global Step: 698730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:12,192-Speed 2625.66 samples/sec   Loss 2.2288   LearningRate 0.0025   Epoch: 16   Global Step: 698740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:16,084-Speed 2631.91 samples/sec   Loss 2.2484   LearningRate 0.0025   Epoch: 16   Global Step: 698750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:19,978-Speed 2630.51 samples/sec   Loss 2.1762   LearningRate 0.0025   Epoch: 16   Global Step: 698760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:15:23,848-Speed 2646.23 samples/sec   Loss 2.2242   LearningRate 0.0025   Epoch: 16   Global Step: 698770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:27,743-Speed 2629.85 samples/sec   Loss 2.2301   LearningRate 0.0025   Epoch: 16   Global Step: 698780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:15:31,614-Speed 2645.94 samples/sec   Loss 2.1630   LearningRate 0.0025   Epoch: 16   Global Step: 698790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:15:35,507-Speed 2630.93 samples/sec   Loss 2.2240   LearningRate 0.0025   Epoch: 16   Global Step: 698800   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:15:39,425-Speed 2614.45 samples/sec   Loss 2.1973   LearningRate 0.0025   Epoch: 16   Global Step: 698810   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:15:43,319-Speed 2630.42 samples/sec   Loss 2.2559   LearningRate 0.0025   Epoch: 16   Global Step: 698820   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:15:47,213-Speed 2630.50 samples/sec   Loss 2.2007   LearningRate 0.0025   Epoch: 16   Global Step: 698830   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:15:51,150-Speed 2601.60 samples/sec   Loss 2.2495   LearningRate 0.0025   Epoch: 16   Global Step: 698840   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:15:55,059-Speed 2620.55 samples/sec   Loss 2.1972   LearningRate 0.0025   Epoch: 16   Global Step: 698850   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:15:58,953-Speed 2630.39 samples/sec   Loss 2.2173   LearningRate 0.0025   Epoch: 16   Global Step: 698860   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:16:02,851-Speed 2627.98 samples/sec   Loss 2.2046   LearningRate 0.0025   Epoch: 16   Global Step: 698870   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:16:06,741-Speed 2632.58 samples/sec   Loss 2.2397   LearningRate 0.0025   Epoch: 16   Global Step: 698880   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:16:10,642-Speed 2625.86 samples/sec   Loss 2.1749   LearningRate 0.0025   Epoch: 16   Global Step: 698890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:14,542-Speed 2625.91 samples/sec   Loss 2.1688   LearningRate 0.0025   Epoch: 16   Global Step: 698900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:18,436-Speed 2630.49 samples/sec   Loss 2.2109   LearningRate 0.0025   Epoch: 16   Global Step: 698910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:22,331-Speed 2630.54 samples/sec   Loss 2.1785   LearningRate 0.0025   Epoch: 16   Global Step: 698920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:26,233-Speed 2625.17 samples/sec   Loss 2.2140   LearningRate 0.0025   Epoch: 16   Global Step: 698930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:30,127-Speed 2630.57 samples/sec   Loss 2.1747   LearningRate 0.0025   Epoch: 16   Global Step: 698940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:34,018-Speed 2631.94 samples/sec   Loss 2.1094   LearningRate 0.0025   Epoch: 16   Global Step: 698950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:37,911-Speed 2630.87 samples/sec   Loss 2.2582   LearningRate 0.0025   Epoch: 16   Global Step: 698960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:41,803-Speed 2631.75 samples/sec   Loss 2.2342   LearningRate 0.0025   Epoch: 16   Global Step: 698970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:45,701-Speed 2628.33 samples/sec   Loss 2.1522   LearningRate 0.0025   Epoch: 16   Global Step: 698980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:49,602-Speed 2625.59 samples/sec   Loss 2.2050   LearningRate 0.0025   Epoch: 16   Global Step: 698990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:16:53,476-Speed 2644.04 samples/sec   Loss 2.2487   LearningRate 0.0025   Epoch: 16   Global Step: 699000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:16:57,374-Speed 2628.44 samples/sec   Loss 2.1811   LearningRate 0.0025   Epoch: 16   Global Step: 699010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:01,270-Speed 2628.56 samples/sec   Loss 2.1312   LearningRate 0.0025   Epoch: 16   Global Step: 699020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:05,170-Speed 2626.15 samples/sec   Loss 2.1523   LearningRate 0.0025   Epoch: 16   Global Step: 699030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:09,068-Speed 2627.42 samples/sec   Loss 2.1500   LearningRate 0.0025   Epoch: 16   Global Step: 699040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:12,966-Speed 2627.95 samples/sec   Loss 2.2375   LearningRate 0.0025   Epoch: 16   Global Step: 699050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:16,874-Speed 2621.13 samples/sec   Loss 2.2489   LearningRate 0.0025   Epoch: 16   Global Step: 699060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:20,775-Speed 2625.82 samples/sec   Loss 2.1409   LearningRate 0.0025   Epoch: 16   Global Step: 699070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:24,674-Speed 2626.58 samples/sec   Loss 2.1836   LearningRate 0.0025   Epoch: 16   Global Step: 699080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:28,581-Speed 2622.06 samples/sec   Loss 2.1342   LearningRate 0.0025   Epoch: 16   Global Step: 699090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:32,486-Speed 2623.07 samples/sec   Loss 2.1815   LearningRate 0.0025   Epoch: 16   Global Step: 699100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:36,387-Speed 2625.59 samples/sec   Loss 2.2126   LearningRate 0.0025   Epoch: 16   Global Step: 699110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:40,278-Speed 2632.06 samples/sec   Loss 2.1381   LearningRate 0.0025   Epoch: 16   Global Step: 699120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:44,184-Speed 2622.71 samples/sec   Loss 2.2352   LearningRate 0.0025   Epoch: 16   Global Step: 699130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:48,080-Speed 2628.67 samples/sec   Loss 2.2359   LearningRate 0.0025   Epoch: 16   Global Step: 699140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:51,978-Speed 2628.29 samples/sec   Loss 2.2158   LearningRate 0.0025   Epoch: 16   Global Step: 699150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:17:55,852-Speed 2644.19 samples/sec   Loss 2.1671   LearningRate 0.0025   Epoch: 16   Global Step: 699160   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:17:59,745-Speed 2631.12 samples/sec   Loss 2.1824   LearningRate 0.0025   Epoch: 16   Global Step: 699170   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:18:03,651-Speed 2622.28 samples/sec   Loss 2.2809   LearningRate 0.0025   Epoch: 16   Global Step: 699180   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:18:07,543-Speed 2631.49 samples/sec   Loss 2.1654   LearningRate 0.0025   Epoch: 16   Global Step: 699190   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:18:11,435-Speed 2631.43 samples/sec   Loss 2.2068   LearningRate 0.0025   Epoch: 16   Global Step: 699200   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:18:15,332-Speed 2629.11 samples/sec   Loss 2.1730   LearningRate 0.0025   Epoch: 16   Global Step: 699210   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:18:19,225-Speed 2631.69 samples/sec   Loss 2.2211   LearningRate 0.0025   Epoch: 16   Global Step: 699220   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:18:23,094-Speed 2646.99 samples/sec   Loss 2.2268   LearningRate 0.0025   Epoch: 16   Global Step: 699230   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:26,997-Speed 2625.50 samples/sec   Loss 2.2274   LearningRate 0.0025   Epoch: 16   Global Step: 699240   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:30,896-Speed 2626.77 samples/sec   Loss 2.2339   LearningRate 0.0025   Epoch: 16   Global Step: 699250   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:34,806-Speed 2620.37 samples/sec   Loss 2.1560   LearningRate 0.0025   Epoch: 16   Global Step: 699260   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:38,699-Speed 2630.74 samples/sec   Loss 2.1622   LearningRate 0.0025   Epoch: 16   Global Step: 699270   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:42,593-Speed 2629.97 samples/sec   Loss 2.2352   LearningRate 0.0025   Epoch: 16   Global Step: 699280   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:46,491-Speed 2627.07 samples/sec   Loss 2.1818   LearningRate 0.0025   Epoch: 16   Global Step: 699290   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:50,402-Speed 2619.47 samples/sec   Loss 2.1475   LearningRate 0.0025   Epoch: 16   Global Step: 699300   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:54,301-Speed 2627.81 samples/sec   Loss 2.1914   LearningRate 0.0025   Epoch: 16   Global Step: 699310   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:18:58,193-Speed 2631.15 samples/sec   Loss 2.1926   LearningRate 0.0025   Epoch: 16   Global Step: 699320   Fp16 Grad Scale: 8192   Required: 15 hours
Training: 2022-04-16 02:19:02,091-Speed 2628.62 samples/sec   Loss 2.1609   LearningRate 0.0025   Epoch: 16   Global Step: 699330   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:06,029-Speed 2600.61 samples/sec   Loss 2.1263   LearningRate 0.0025   Epoch: 16   Global Step: 699340   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:09,929-Speed 2626.46 samples/sec   Loss 2.2417   LearningRate 0.0025   Epoch: 16   Global Step: 699350   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:13,818-Speed 2633.14 samples/sec   Loss 2.2749   LearningRate 0.0025   Epoch: 16   Global Step: 699360   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:17,717-Speed 2627.50 samples/sec   Loss 2.1954   LearningRate 0.0025   Epoch: 16   Global Step: 699370   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:21,616-Speed 2627.04 samples/sec   Loss 2.2302   LearningRate 0.0025   Epoch: 16   Global Step: 699380   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:25,513-Speed 2628.76 samples/sec   Loss 2.1883   LearningRate 0.0025   Epoch: 16   Global Step: 699390   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:29,403-Speed 2632.62 samples/sec   Loss 2.1409   LearningRate 0.0025   Epoch: 16   Global Step: 699400   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:33,297-Speed 2630.47 samples/sec   Loss 2.2346   LearningRate 0.0025   Epoch: 16   Global Step: 699410   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:37,220-Speed 2611.12 samples/sec   Loss 2.1858   LearningRate 0.0025   Epoch: 16   Global Step: 699420   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:19:41,113-Speed 2630.37 samples/sec   Loss 2.1774   LearningRate 0.0025   Epoch: 16   Global Step: 699430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:19:45,022-Speed 2619.93 samples/sec   Loss 2.1273   LearningRate 0.0025   Epoch: 16   Global Step: 699440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:19:48,916-Speed 2630.87 samples/sec   Loss 2.2435   LearningRate 0.0025   Epoch: 16   Global Step: 699450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:19:52,821-Speed 2623.35 samples/sec   Loss 2.1229   LearningRate 0.0025   Epoch: 16   Global Step: 699460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:19:56,737-Speed 2615.89 samples/sec   Loss 2.1410   LearningRate 0.0025   Epoch: 16   Global Step: 699470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:20:00,743-Speed 2556.57 samples/sec   Loss 2.1856   LearningRate 0.0025   Epoch: 16   Global Step: 699480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:20:04,615-Speed 2645.56 samples/sec   Loss 2.1688   LearningRate 0.0025   Epoch: 16   Global Step: 699490   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:08,609-Speed 2564.94 samples/sec   Loss 2.2900   LearningRate 0.0025   Epoch: 16   Global Step: 699500   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:12,498-Speed 2633.45 samples/sec   Loss 2.2174   LearningRate 0.0025   Epoch: 16   Global Step: 699510   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:16,393-Speed 2629.45 samples/sec   Loss 2.1929   LearningRate 0.0025   Epoch: 16   Global Step: 699520   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:20,317-Speed 2610.87 samples/sec   Loss 2.1783   LearningRate 0.0025   Epoch: 16   Global Step: 699530   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:24,223-Speed 2621.86 samples/sec   Loss 2.2371   LearningRate 0.0025   Epoch: 16   Global Step: 699540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:28,115-Speed 2631.76 samples/sec   Loss 2.2305   LearningRate 0.0025   Epoch: 16   Global Step: 699550   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:32,009-Speed 2629.96 samples/sec   Loss 2.1568   LearningRate 0.0025   Epoch: 16   Global Step: 699560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:35,903-Speed 2631.06 samples/sec   Loss 2.1500   LearningRate 0.0025   Epoch: 16   Global Step: 699570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:39,852-Speed 2594.53 samples/sec   Loss 2.2081   LearningRate 0.0025   Epoch: 16   Global Step: 699580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:20:43,752-Speed 2625.63 samples/sec   Loss 2.1533   LearningRate 0.0025   Epoch: 16   Global Step: 699590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:20:47,653-Speed 2626.25 samples/sec   Loss 2.2343   LearningRate 0.0025   Epoch: 16   Global Step: 699600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:20:51,548-Speed 2629.42 samples/sec   Loss 2.1272   LearningRate 0.0025   Epoch: 16   Global Step: 699610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:20:55,441-Speed 2631.85 samples/sec   Loss 2.2798   LearningRate 0.0025   Epoch: 16   Global Step: 699620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:20:59,341-Speed 2625.63 samples/sec   Loss 2.2143   LearningRate 0.0025   Epoch: 16   Global Step: 699630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:21:03,240-Speed 2626.90 samples/sec   Loss 2.1259   LearningRate 0.0025   Epoch: 16   Global Step: 699640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:21:07,154-Speed 2616.78 samples/sec   Loss 2.1740   LearningRate 0.0025   Epoch: 16   Global Step: 699650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:21:11,045-Speed 2632.90 samples/sec   Loss 2.1499   LearningRate 0.0025   Epoch: 16   Global Step: 699660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:21:14,949-Speed 2623.97 samples/sec   Loss 2.2235   LearningRate 0.0025   Epoch: 16   Global Step: 699670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:21:18,823-Speed 2643.36 samples/sec   Loss 2.1608   LearningRate 0.0025   Epoch: 16   Global Step: 699680   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:22,734-Speed 2619.50 samples/sec   Loss 2.2426   LearningRate 0.0025   Epoch: 16   Global Step: 699690   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:26,628-Speed 2630.49 samples/sec   Loss 2.3073   LearningRate 0.0025   Epoch: 16   Global Step: 699700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:30,547-Speed 2613.75 samples/sec   Loss 2.2011   LearningRate 0.0025   Epoch: 16   Global Step: 699710   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:34,503-Speed 2588.92 samples/sec   Loss 2.1471   LearningRate 0.0025   Epoch: 16   Global Step: 699720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:38,406-Speed 2624.74 samples/sec   Loss 2.1590   LearningRate 0.0025   Epoch: 16   Global Step: 699730   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:42,437-Speed 2541.01 samples/sec   Loss 2.1967   LearningRate 0.0024   Epoch: 16   Global Step: 699740   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:46,341-Speed 2623.33 samples/sec   Loss 2.1726   LearningRate 0.0024   Epoch: 16   Global Step: 699750   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:50,281-Speed 2600.24 samples/sec   Loss 2.1600   LearningRate 0.0024   Epoch: 16   Global Step: 699760   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:54,182-Speed 2625.35 samples/sec   Loss 2.1853   LearningRate 0.0024   Epoch: 16   Global Step: 699770   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:21:58,075-Speed 2632.02 samples/sec   Loss 2.1896   LearningRate 0.0024   Epoch: 16   Global Step: 699780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:22:01,978-Speed 2624.36 samples/sec   Loss 2.0897   LearningRate 0.0024   Epoch: 16   Global Step: 699790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:05,879-Speed 2625.58 samples/sec   Loss 2.2049   LearningRate 0.0024   Epoch: 16   Global Step: 699800   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:09,772-Speed 2631.16 samples/sec   Loss 2.2091   LearningRate 0.0024   Epoch: 16   Global Step: 699810   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:13,669-Speed 2629.35 samples/sec   Loss 2.1855   LearningRate 0.0024   Epoch: 16   Global Step: 699820   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:17,574-Speed 2622.18 samples/sec   Loss 2.2201   LearningRate 0.0024   Epoch: 16   Global Step: 699830   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:21,616-Speed 2534.59 samples/sec   Loss 2.2084   LearningRate 0.0024   Epoch: 16   Global Step: 699840   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:25,534-Speed 2613.95 samples/sec   Loss 2.2160   LearningRate 0.0024   Epoch: 16   Global Step: 699850   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:29,474-Speed 2600.00 samples/sec   Loss 2.1914   LearningRate 0.0024   Epoch: 16   Global Step: 699860   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:33,372-Speed 2627.60 samples/sec   Loss 2.1750   LearningRate 0.0024   Epoch: 16   Global Step: 699870   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:37,298-Speed 2608.98 samples/sec   Loss 2.2230   LearningRate 0.0024   Epoch: 16   Global Step: 699880   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:22:41,202-Speed 2623.79 samples/sec   Loss 2.2023   LearningRate 0.0024   Epoch: 16   Global Step: 699890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:22:45,097-Speed 2629.81 samples/sec   Loss 2.2240   LearningRate 0.0024   Epoch: 16   Global Step: 699900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:22:48,999-Speed 2624.36 samples/sec   Loss 2.2030   LearningRate 0.0024   Epoch: 16   Global Step: 699910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:22:52,908-Speed 2620.65 samples/sec   Loss 2.1799   LearningRate 0.0024   Epoch: 16   Global Step: 699920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:22:56,897-Speed 2567.66 samples/sec   Loss 2.1593   LearningRate 0.0024   Epoch: 16   Global Step: 699930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:23:00,937-Speed 2535.57 samples/sec   Loss 2.1467   LearningRate 0.0024   Epoch: 16   Global Step: 699940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:23:04,843-Speed 2622.32 samples/sec   Loss 2.2108   LearningRate 0.0024   Epoch: 16   Global Step: 699950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:23:08,718-Speed 2642.99 samples/sec   Loss 2.0902   LearningRate 0.0024   Epoch: 16   Global Step: 699960   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:23:12,773-Speed 2525.65 samples/sec   Loss 2.2243   LearningRate 0.0024   Epoch: 16   Global Step: 699970   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:23:16,704-Speed 2605.45 samples/sec   Loss 2.1727   LearningRate 0.0024   Epoch: 16   Global Step: 699980   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:23:20,605-Speed 2626.15 samples/sec   Loss 2.2396   LearningRate 0.0024   Epoch: 16   Global Step: 699990   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:23:24,499-Speed 2630.32 samples/sec   Loss 2.1715   LearningRate 0.0024   Epoch: 16   Global Step: 700000   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:24:07,966-[lfw][700000]XNorm: 22.498566
Training: 2022-04-16 02:24:07,967-[lfw][700000]Accuracy-Flip: 0.99850+-0.00203
Training: 2022-04-16 02:24:07,968-[lfw][700000]Accuracy-Highest: 0.99850
Training: 2022-04-16 02:24:58,333-[cfp_fp][700000]XNorm: 22.262537
Training: 2022-04-16 02:24:58,334-[cfp_fp][700000]Accuracy-Flip: 0.99257+-0.00325
Training: 2022-04-16 02:24:58,335-[cfp_fp][700000]Accuracy-Highest: 0.99329
Training: 2022-04-16 02:25:41,730-[agedb_30][700000]XNorm: 23.038564
Training: 2022-04-16 02:25:41,731-[agedb_30][700000]Accuracy-Flip: 0.98167+-0.00771
Training: 2022-04-16 02:25:41,732-[agedb_30][700000]Accuracy-Highest: 0.98317
Training: 2022-04-16 02:25:45,606-Speed 72.57 samples/sec   Loss 2.1386   LearningRate 0.0024   Epoch: 16   Global Step: 700010   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:25:49,474-Speed 2647.41 samples/sec   Loss 2.1893   LearningRate 0.0024   Epoch: 16   Global Step: 700020   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:25:53,349-Speed 2644.01 samples/sec   Loss 2.2374   LearningRate 0.0024   Epoch: 16   Global Step: 700030   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:25:57,264-Speed 2615.81 samples/sec   Loss 2.1807   LearningRate 0.0024   Epoch: 16   Global Step: 700040   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:26:01,154-Speed 2632.77 samples/sec   Loss 2.2481   LearningRate 0.0024   Epoch: 16   Global Step: 700050   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-16 02:26:05,046-Speed 2631.69 samples/sec   Loss 2.1542   LearningRate 0.0024   Epoch: 16   Global Step: 700060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:08,919-Speed 2644.81 samples/sec   Loss 2.1877   LearningRate 0.0024   Epoch: 16   Global Step: 700070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:12,815-Speed 2629.25 samples/sec   Loss 2.1474   LearningRate 0.0024   Epoch: 16   Global Step: 700080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:16,709-Speed 2630.59 samples/sec   Loss 2.1536   LearningRate 0.0024   Epoch: 16   Global Step: 700090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:20,604-Speed 2629.65 samples/sec   Loss 2.2025   LearningRate 0.0024   Epoch: 16   Global Step: 700100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:24,487-Speed 2637.54 samples/sec   Loss 2.1654   LearningRate 0.0024   Epoch: 16   Global Step: 700110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:28,462-Speed 2576.50 samples/sec   Loss 2.1924   LearningRate 0.0024   Epoch: 16   Global Step: 700120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:32,352-Speed 2633.15 samples/sec   Loss 2.2308   LearningRate 0.0024   Epoch: 16   Global Step: 700130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:36,237-Speed 2637.11 samples/sec   Loss 2.2145   LearningRate 0.0024   Epoch: 16   Global Step: 700140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:40,130-Speed 2634.09 samples/sec   Loss 2.1408   LearningRate 0.0024   Epoch: 16   Global Step: 700150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:26:44,021-Speed 2631.75 samples/sec   Loss 2.1811   LearningRate 0.0024   Epoch: 16   Global Step: 700160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:26:47,920-Speed 2627.83 samples/sec   Loss 2.1164   LearningRate 0.0024   Epoch: 16   Global Step: 700170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:26:51,876-Speed 2588.93 samples/sec   Loss 2.1728   LearningRate 0.0024   Epoch: 16   Global Step: 700180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:26:55,770-Speed 2630.48 samples/sec   Loss 2.1643   LearningRate 0.0024   Epoch: 16   Global Step: 700190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:26:59,660-Speed 2632.66 samples/sec   Loss 2.1094   LearningRate 0.0024   Epoch: 16   Global Step: 700200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:27:03,553-Speed 2631.41 samples/sec   Loss 2.2229   LearningRate 0.0024   Epoch: 16   Global Step: 700210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:27:07,428-Speed 2643.82 samples/sec   Loss 2.1431   LearningRate 0.0024   Epoch: 16   Global Step: 700220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:11,508-Speed 2510.06 samples/sec   Loss 2.2558   LearningRate 0.0024   Epoch: 16   Global Step: 700230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:15,443-Speed 2603.88 samples/sec   Loss 2.1857   LearningRate 0.0024   Epoch: 16   Global Step: 700240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:19,336-Speed 2630.66 samples/sec   Loss 2.2328   LearningRate 0.0024   Epoch: 16   Global Step: 700250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:23,229-Speed 2630.94 samples/sec   Loss 2.2291   LearningRate 0.0024   Epoch: 16   Global Step: 700260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:27,125-Speed 2628.64 samples/sec   Loss 2.1667   LearningRate 0.0024   Epoch: 16   Global Step: 700270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:31,042-Speed 2615.58 samples/sec   Loss 2.1849   LearningRate 0.0024   Epoch: 16   Global Step: 700280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:34,932-Speed 2633.00 samples/sec   Loss 2.1987   LearningRate 0.0024   Epoch: 16   Global Step: 700290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:38,855-Speed 2610.69 samples/sec   Loss 2.1962   LearningRate 0.0024   Epoch: 16   Global Step: 700300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:42,750-Speed 2629.56 samples/sec   Loss 2.1539   LearningRate 0.0024   Epoch: 16   Global Step: 700310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:46,848-Speed 2499.56 samples/sec   Loss 2.2870   LearningRate 0.0024   Epoch: 16   Global Step: 700320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:27:50,738-Speed 2633.48 samples/sec   Loss 2.2450   LearningRate 0.0024   Epoch: 16   Global Step: 700330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:27:54,681-Speed 2597.08 samples/sec   Loss 2.2355   LearningRate 0.0024   Epoch: 16   Global Step: 700340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:27:58,583-Speed 2624.78 samples/sec   Loss 2.1669   LearningRate 0.0024   Epoch: 16   Global Step: 700350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:02,493-Speed 2619.95 samples/sec   Loss 2.2099   LearningRate 0.0024   Epoch: 16   Global Step: 700360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:06,392-Speed 2627.29 samples/sec   Loss 2.1592   LearningRate 0.0024   Epoch: 16   Global Step: 700370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:10,306-Speed 2616.85 samples/sec   Loss 2.1731   LearningRate 0.0024   Epoch: 16   Global Step: 700380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:14,218-Speed 2618.24 samples/sec   Loss 2.2254   LearningRate 0.0024   Epoch: 16   Global Step: 700390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:18,111-Speed 2631.79 samples/sec   Loss 2.2332   LearningRate 0.0024   Epoch: 16   Global Step: 700400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:22,049-Speed 2600.85 samples/sec   Loss 2.1941   LearningRate 0.0024   Epoch: 16   Global Step: 700410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:25,970-Speed 2611.74 samples/sec   Loss 2.1860   LearningRate 0.0024   Epoch: 16   Global Step: 700420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:29,896-Speed 2609.00 samples/sec   Loss 2.1818   LearningRate 0.0024   Epoch: 16   Global Step: 700430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:33,798-Speed 2625.80 samples/sec   Loss 2.1899   LearningRate 0.0024   Epoch: 16   Global Step: 700440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:28:37,670-Speed 2645.41 samples/sec   Loss 2.1396   LearningRate 0.0024   Epoch: 16   Global Step: 700450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:41,579-Speed 2620.25 samples/sec   Loss 2.1917   LearningRate 0.0024   Epoch: 16   Global Step: 700460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:45,481-Speed 2625.93 samples/sec   Loss 2.1581   LearningRate 0.0024   Epoch: 16   Global Step: 700470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:49,377-Speed 2628.80 samples/sec   Loss 2.1913   LearningRate 0.0024   Epoch: 16   Global Step: 700480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:53,271-Speed 2629.73 samples/sec   Loss 2.2352   LearningRate 0.0024   Epoch: 16   Global Step: 700490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:28:57,162-Speed 2632.47 samples/sec   Loss 2.2355   LearningRate 0.0024   Epoch: 16   Global Step: 700500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:01,055-Speed 2630.93 samples/sec   Loss 2.1295   LearningRate 0.0024   Epoch: 16   Global Step: 700510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:04,949-Speed 2630.54 samples/sec   Loss 2.1579   LearningRate 0.0024   Epoch: 16   Global Step: 700520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:08,842-Speed 2631.52 samples/sec   Loss 2.2202   LearningRate 0.0024   Epoch: 16   Global Step: 700530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:12,736-Speed 2629.96 samples/sec   Loss 2.1639   LearningRate 0.0024   Epoch: 16   Global Step: 700540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:16,631-Speed 2630.06 samples/sec   Loss 2.2321   LearningRate 0.0024   Epoch: 16   Global Step: 700550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-16 02:29:20,638-Speed 2555.65 samples/sec   Loss 2.2244   LearningRate 0.0024   Epoch: 16   Global Step: 700560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:24,665-Speed 2543.50 samples/sec   Loss 2.1573   LearningRate 0.0024   Epoch: 16   Global Step: 700570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:28,571-Speed 2622.16 samples/sec   Loss 2.2016   LearningRate 0.0024   Epoch: 16   Global Step: 700580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:32,480-Speed 2620.61 samples/sec   Loss 2.1748   LearningRate 0.0024   Epoch: 16   Global Step: 700590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-16 02:29:36,376-Speed 2628.73 samples/sec   Loss 2.2414   LearningRate 0.0024   Epoch: 16   Global Step: 700600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:29:40,273-Speed 2628.86 samples/sec   Loss 2.1715   LearningRate 0.0024   Epoch: 16   Global Step: 700610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:29:44,167-Speed 2630.01 samples/sec   Loss 2.1750   LearningRate 0.0024   Epoch: 16   Global Step: 700620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:29:48,067-Speed 2626.59 samples/sec   Loss 2.1983   LearningRate 0.0024   Epoch: 16   Global Step: 700630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:29:51,978-Speed 2619.00 samples/sec   Loss 2.2004   LearningRate 0.0024   Epoch: 16   Global Step: 700640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:29:55,873-Speed 2628.99 samples/sec   Loss 2.1525   LearningRate 0.0024   Epoch: 16   Global Step: 700650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:29:59,767-Speed 2630.34 samples/sec   Loss 2.1657   LearningRate 0.0024   Epoch: 16   Global Step: 700660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:30:03,644-Speed 2642.26 samples/sec   Loss 2.2140   LearningRate 0.0024   Epoch: 16   Global Step: 700670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:07,538-Speed 2630.58 samples/sec   Loss 2.1114   LearningRate 0.0024   Epoch: 16   Global Step: 700680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:11,431-Speed 2631.04 samples/sec   Loss 2.2044   LearningRate 0.0024   Epoch: 16   Global Step: 700690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:15,325-Speed 2630.38 samples/sec   Loss 2.1910   LearningRate 0.0024   Epoch: 16   Global Step: 700700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:19,220-Speed 2629.67 samples/sec   Loss 2.1350   LearningRate 0.0024   Epoch: 16   Global Step: 700710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:23,115-Speed 2629.91 samples/sec   Loss 2.1925   LearningRate 0.0024   Epoch: 16   Global Step: 700720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:27,013-Speed 2627.65 samples/sec   Loss 2.2115   LearningRate 0.0024   Epoch: 16   Global Step: 700730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:30,909-Speed 2628.73 samples/sec   Loss 2.1330   LearningRate 0.0024   Epoch: 16   Global Step: 700740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:34,823-Speed 2617.09 samples/sec   Loss 2.1579   LearningRate 0.0024   Epoch: 16   Global Step: 700750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:38,717-Speed 2630.48 samples/sec   Loss 2.1431   LearningRate 0.0024   Epoch: 16   Global Step: 700760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:42,620-Speed 2624.56 samples/sec   Loss 2.1367   LearningRate 0.0024   Epoch: 16   Global Step: 700770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:30:46,497-Speed 2642.10 samples/sec   Loss 2.1960   LearningRate 0.0024   Epoch: 16   Global Step: 700780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:50,393-Speed 2628.89 samples/sec   Loss 2.1749   LearningRate 0.0024   Epoch: 16   Global Step: 700790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:54,304-Speed 2618.57 samples/sec   Loss 2.2465   LearningRate 0.0024   Epoch: 16   Global Step: 700800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:30:58,203-Speed 2626.83 samples/sec   Loss 2.1514   LearningRate 0.0024   Epoch: 16   Global Step: 700810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:02,106-Speed 2624.14 samples/sec   Loss 2.1750   LearningRate 0.0024   Epoch: 16   Global Step: 700820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:06,004-Speed 2628.30 samples/sec   Loss 2.1789   LearningRate 0.0024   Epoch: 16   Global Step: 700830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:09,902-Speed 2627.41 samples/sec   Loss 2.1896   LearningRate 0.0024   Epoch: 16   Global Step: 700840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:13,805-Speed 2625.15 samples/sec   Loss 2.1875   LearningRate 0.0024   Epoch: 16   Global Step: 700850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:17,720-Speed 2616.37 samples/sec   Loss 2.1001   LearningRate 0.0024   Epoch: 16   Global Step: 700860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:21,621-Speed 2625.33 samples/sec   Loss 2.2338   LearningRate 0.0024   Epoch: 16   Global Step: 700870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:25,520-Speed 2627.41 samples/sec   Loss 2.1751   LearningRate 0.0024   Epoch: 16   Global Step: 700880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:31:29,424-Speed 2623.79 samples/sec   Loss 2.1028   LearningRate 0.0024   Epoch: 16   Global Step: 700890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:31:33,330-Speed 2621.91 samples/sec   Loss 2.1353   LearningRate 0.0024   Epoch: 16   Global Step: 700900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:31:37,240-Speed 2619.40 samples/sec   Loss 2.2498   LearningRate 0.0024   Epoch: 16   Global Step: 700910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:31:41,134-Speed 2631.16 samples/sec   Loss 2.1551   LearningRate 0.0024   Epoch: 16   Global Step: 700920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:45,040-Speed 2621.99 samples/sec   Loss 2.1481   LearningRate 0.0024   Epoch: 16   Global Step: 700930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:48,947-Speed 2621.62 samples/sec   Loss 2.1970   LearningRate 0.0024   Epoch: 16   Global Step: 700940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:52,855-Speed 2620.97 samples/sec   Loss 2.1376   LearningRate 0.0024   Epoch: 16   Global Step: 700950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:31:56,762-Speed 2621.95 samples/sec   Loss 2.2471   LearningRate 0.0024   Epoch: 16   Global Step: 700960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:00,661-Speed 2626.54 samples/sec   Loss 2.1403   LearningRate 0.0024   Epoch: 16   Global Step: 700970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:04,562-Speed 2625.52 samples/sec   Loss 2.2068   LearningRate 0.0024   Epoch: 16   Global Step: 700980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:08,499-Speed 2601.57 samples/sec   Loss 2.1639   LearningRate 0.0024   Epoch: 16   Global Step: 700990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:12,428-Speed 2608.27 samples/sec   Loss 2.1859   LearningRate 0.0024   Epoch: 16   Global Step: 701000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:16,326-Speed 2627.57 samples/sec   Loss 2.1666   LearningRate 0.0024   Epoch: 16   Global Step: 701010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:20,246-Speed 2612.94 samples/sec   Loss 2.1191   LearningRate 0.0024   Epoch: 16   Global Step: 701020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:32:24,169-Speed 2610.63 samples/sec   Loss 2.1330   LearningRate 0.0024   Epoch: 16   Global Step: 701030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:32:28,056-Speed 2635.89 samples/sec   Loss 2.1921   LearningRate 0.0024   Epoch: 16   Global Step: 701040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:31,969-Speed 2617.25 samples/sec   Loss 2.1577   LearningRate 0.0024   Epoch: 16   Global Step: 701050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:35,887-Speed 2614.05 samples/sec   Loss 2.1430   LearningRate 0.0024   Epoch: 16   Global Step: 701060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:39,794-Speed 2621.28 samples/sec   Loss 2.1466   LearningRate 0.0024   Epoch: 16   Global Step: 701070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:43,701-Speed 2622.19 samples/sec   Loss 2.1629   LearningRate 0.0024   Epoch: 16   Global Step: 701080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:47,606-Speed 2622.55 samples/sec   Loss 2.1893   LearningRate 0.0024   Epoch: 16   Global Step: 701090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:51,511-Speed 2623.43 samples/sec   Loss 2.1565   LearningRate 0.0024   Epoch: 16   Global Step: 701100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:55,418-Speed 2621.76 samples/sec   Loss 2.1944   LearningRate 0.0024   Epoch: 16   Global Step: 701110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:32:59,331-Speed 2617.74 samples/sec   Loss 2.1769   LearningRate 0.0024   Epoch: 16   Global Step: 701120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:03,239-Speed 2620.60 samples/sec   Loss 2.1827   LearningRate 0.0024   Epoch: 16   Global Step: 701130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:07,153-Speed 2616.50 samples/sec   Loss 2.2181   LearningRate 0.0024   Epoch: 16   Global Step: 701140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:33:11,034-Speed 2639.58 samples/sec   Loss 2.1723   LearningRate 0.0024   Epoch: 16   Global Step: 701150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:14,941-Speed 2621.07 samples/sec   Loss 2.1448   LearningRate 0.0024   Epoch: 16   Global Step: 701160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:18,845-Speed 2624.16 samples/sec   Loss 2.1596   LearningRate 0.0024   Epoch: 16   Global Step: 701170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:22,748-Speed 2624.40 samples/sec   Loss 2.1682   LearningRate 0.0024   Epoch: 16   Global Step: 701180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:26,657-Speed 2620.02 samples/sec   Loss 2.1489   LearningRate 0.0024   Epoch: 16   Global Step: 701190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:30,566-Speed 2620.32 samples/sec   Loss 2.0734   LearningRate 0.0024   Epoch: 16   Global Step: 701200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:34,479-Speed 2617.61 samples/sec   Loss 2.1560   LearningRate 0.0024   Epoch: 16   Global Step: 701210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:38,383-Speed 2623.21 samples/sec   Loss 2.2064   LearningRate 0.0024   Epoch: 16   Global Step: 701220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:42,308-Speed 2610.38 samples/sec   Loss 2.1327   LearningRate 0.0024   Epoch: 16   Global Step: 701230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:46,212-Speed 2622.94 samples/sec   Loss 2.1907   LearningRate 0.0024   Epoch: 16   Global Step: 701240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:50,116-Speed 2624.06 samples/sec   Loss 2.1637   LearningRate 0.0024   Epoch: 16   Global Step: 701250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:33:53,996-Speed 2639.45 samples/sec   Loss 2.1707   LearningRate 0.0024   Epoch: 16   Global Step: 701260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:33:57,901-Speed 2623.73 samples/sec   Loss 2.1567   LearningRate 0.0024   Epoch: 16   Global Step: 701270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:34:01,800-Speed 2626.68 samples/sec   Loss 2.1005   LearningRate 0.0024   Epoch: 16   Global Step: 701280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:34:05,714-Speed 2616.89 samples/sec   Loss 2.1371   LearningRate 0.0024   Epoch: 16   Global Step: 701290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:34:09,617-Speed 2624.15 samples/sec   Loss 2.2109   LearningRate 0.0024   Epoch: 16   Global Step: 701300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:34:13,521-Speed 2624.38 samples/sec   Loss 2.2053   LearningRate 0.0024   Epoch: 16   Global Step: 701310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:34:17,429-Speed 2620.78 samples/sec   Loss 2.2593   LearningRate 0.0024   Epoch: 16   Global Step: 701320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:34:21,307-Speed 2640.81 samples/sec   Loss 2.1438   LearningRate 0.0024   Epoch: 16   Global Step: 701330   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:25,224-Speed 2615.64 samples/sec   Loss 2.1630   LearningRate 0.0024   Epoch: 16   Global Step: 701340   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:29,132-Speed 2620.64 samples/sec   Loss 2.1140   LearningRate 0.0024   Epoch: 16   Global Step: 701350   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:33,040-Speed 2621.65 samples/sec   Loss 2.1295   LearningRate 0.0024   Epoch: 16   Global Step: 701360   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:36,954-Speed 2616.55 samples/sec   Loss 2.1892   LearningRate 0.0024   Epoch: 16   Global Step: 701370   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:40,862-Speed 2621.29 samples/sec   Loss 2.2321   LearningRate 0.0024   Epoch: 16   Global Step: 701380   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:44,773-Speed 2618.97 samples/sec   Loss 2.1931   LearningRate 0.0024   Epoch: 16   Global Step: 701390   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:48,680-Speed 2622.07 samples/sec   Loss 2.2035   LearningRate 0.0024   Epoch: 16   Global Step: 701400   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:52,616-Speed 2602.02 samples/sec   Loss 2.1950   LearningRate 0.0024   Epoch: 16   Global Step: 701410   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:34:56,522-Speed 2622.97 samples/sec   Loss 2.1216   LearningRate 0.0024   Epoch: 16   Global Step: 701420   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:35:00,426-Speed 2623.52 samples/sec   Loss 2.1365   LearningRate 0.0024   Epoch: 16   Global Step: 701430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:04,327-Speed 2626.33 samples/sec   Loss 2.1348   LearningRate 0.0024   Epoch: 16   Global Step: 701440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:08,229-Speed 2624.59 samples/sec   Loss 2.1439   LearningRate 0.0024   Epoch: 16   Global Step: 701450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:12,135-Speed 2622.33 samples/sec   Loss 2.1768   LearningRate 0.0024   Epoch: 16   Global Step: 701460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:16,036-Speed 2625.60 samples/sec   Loss 2.1369   LearningRate 0.0024   Epoch: 16   Global Step: 701470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:19,939-Speed 2624.42 samples/sec   Loss 2.2363   LearningRate 0.0024   Epoch: 16   Global Step: 701480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:23,843-Speed 2623.77 samples/sec   Loss 2.1673   LearningRate 0.0024   Epoch: 16   Global Step: 701490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:27,747-Speed 2623.40 samples/sec   Loss 2.1804   LearningRate 0.0024   Epoch: 16   Global Step: 701500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:31,649-Speed 2624.93 samples/sec   Loss 2.2178   LearningRate 0.0024   Epoch: 16   Global Step: 701510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:35,554-Speed 2622.83 samples/sec   Loss 2.1693   LearningRate 0.0024   Epoch: 16   Global Step: 701520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:35:39,456-Speed 2624.99 samples/sec   Loss 2.1784   LearningRate 0.0024   Epoch: 16   Global Step: 701530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:35:43,363-Speed 2621.04 samples/sec   Loss 2.1456   LearningRate 0.0024   Epoch: 16   Global Step: 701540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:35:47,234-Speed 2646.73 samples/sec   Loss 2.2100   LearningRate 0.0024   Epoch: 16   Global Step: 701550   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:35:51,148-Speed 2616.39 samples/sec   Loss 2.2004   LearningRate 0.0024   Epoch: 16   Global Step: 701560   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:35:55,053-Speed 2623.27 samples/sec   Loss 2.1960   LearningRate 0.0024   Epoch: 16   Global Step: 701570   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:35:58,958-Speed 2622.57 samples/sec   Loss 2.0707   LearningRate 0.0024   Epoch: 16   Global Step: 701580   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:36:02,864-Speed 2622.47 samples/sec   Loss 2.1267   LearningRate 0.0024   Epoch: 16   Global Step: 701590   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:36:06,767-Speed 2624.46 samples/sec   Loss 2.2276   LearningRate 0.0024   Epoch: 16   Global Step: 701600   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:36:10,670-Speed 2623.99 samples/sec   Loss 2.1473   LearningRate 0.0024   Epoch: 16   Global Step: 701610   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:36:14,572-Speed 2624.46 samples/sec   Loss 2.2058   LearningRate 0.0024   Epoch: 16   Global Step: 701620   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:36:18,485-Speed 2618.01 samples/sec   Loss 2.2172   LearningRate 0.0024   Epoch: 16   Global Step: 701630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:36:22,389-Speed 2623.33 samples/sec   Loss 2.1868   LearningRate 0.0024   Epoch: 16   Global Step: 701640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:36:26,306-Speed 2614.60 samples/sec   Loss 2.1710   LearningRate 0.0024   Epoch: 16   Global Step: 701650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:30,210-Speed 2623.43 samples/sec   Loss 2.1085   LearningRate 0.0024   Epoch: 16   Global Step: 701660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:34,115-Speed 2623.35 samples/sec   Loss 2.1335   LearningRate 0.0024   Epoch: 16   Global Step: 701670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:38,019-Speed 2623.78 samples/sec   Loss 2.1941   LearningRate 0.0024   Epoch: 16   Global Step: 701680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:41,930-Speed 2618.98 samples/sec   Loss 2.1051   LearningRate 0.0024   Epoch: 16   Global Step: 701690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:45,842-Speed 2618.35 samples/sec   Loss 2.1944   LearningRate 0.0024   Epoch: 16   Global Step: 701700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:49,744-Speed 2624.83 samples/sec   Loss 2.1363   LearningRate 0.0024   Epoch: 16   Global Step: 701710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:53,653-Speed 2620.60 samples/sec   Loss 2.1272   LearningRate 0.0024   Epoch: 16   Global Step: 701720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:36:57,568-Speed 2615.57 samples/sec   Loss 2.1616   LearningRate 0.0024   Epoch: 16   Global Step: 701730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:01,647-Speed 2511.37 samples/sec   Loss 2.1552   LearningRate 0.0024   Epoch: 16   Global Step: 701740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:05,560-Speed 2616.94 samples/sec   Loss 2.1828   LearningRate 0.0024   Epoch: 16   Global Step: 701750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:37:09,461-Speed 2626.17 samples/sec   Loss 2.1442   LearningRate 0.0024   Epoch: 16   Global Step: 701760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:13,387-Speed 2609.48 samples/sec   Loss 2.2377   LearningRate 0.0024   Epoch: 16   Global Step: 701770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:17,290-Speed 2623.88 samples/sec   Loss 2.1531   LearningRate 0.0024   Epoch: 16   Global Step: 701780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:21,200-Speed 2620.79 samples/sec   Loss 2.1910   LearningRate 0.0024   Epoch: 16   Global Step: 701790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:25,112-Speed 2618.50 samples/sec   Loss 2.1524   LearningRate 0.0024   Epoch: 16   Global Step: 701800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:29,029-Speed 2615.62 samples/sec   Loss 2.1269   LearningRate 0.0024   Epoch: 16   Global Step: 701810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:32,935-Speed 2621.80 samples/sec   Loss 2.1818   LearningRate 0.0024   Epoch: 16   Global Step: 701820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:36,846-Speed 2619.24 samples/sec   Loss 2.2254   LearningRate 0.0024   Epoch: 16   Global Step: 701830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:40,751-Speed 2622.72 samples/sec   Loss 2.1880   LearningRate 0.0024   Epoch: 16   Global Step: 701840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:37:44,630-Speed 2640.41 samples/sec   Loss 2.1428   LearningRate 0.0024   Epoch: 16   Global Step: 701850   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:37:48,536-Speed 2622.58 samples/sec   Loss 2.2155   LearningRate 0.0024   Epoch: 16   Global Step: 701860   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:37:52,456-Speed 2612.54 samples/sec   Loss 2.1636   LearningRate 0.0024   Epoch: 16   Global Step: 701870   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:37:56,361-Speed 2622.72 samples/sec   Loss 2.2025   LearningRate 0.0024   Epoch: 16   Global Step: 701880   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:38:00,267-Speed 2622.12 samples/sec   Loss 2.2270   LearningRate 0.0024   Epoch: 16   Global Step: 701890   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:38:04,173-Speed 2622.76 samples/sec   Loss 2.2055   LearningRate 0.0024   Epoch: 16   Global Step: 701900   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:38:08,079-Speed 2622.05 samples/sec   Loss 2.1886   LearningRate 0.0024   Epoch: 16   Global Step: 701910   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:38:11,984-Speed 2623.18 samples/sec   Loss 2.1722   LearningRate 0.0024   Epoch: 16   Global Step: 701920   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:38:15,887-Speed 2624.25 samples/sec   Loss 2.2209   LearningRate 0.0024   Epoch: 16   Global Step: 701930   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:38:19,790-Speed 2624.22 samples/sec   Loss 2.2370   LearningRate 0.0024   Epoch: 16   Global Step: 701940   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:38:23,701-Speed 2618.83 samples/sec   Loss 2.1273   LearningRate 0.0024   Epoch: 16   Global Step: 701950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:27,605-Speed 2623.55 samples/sec   Loss 2.1862   LearningRate 0.0024   Epoch: 16   Global Step: 701960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:31,508-Speed 2624.43 samples/sec   Loss 2.1468   LearningRate 0.0024   Epoch: 16   Global Step: 701970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:35,412-Speed 2622.78 samples/sec   Loss 2.1825   LearningRate 0.0024   Epoch: 16   Global Step: 701980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:39,319-Speed 2622.24 samples/sec   Loss 2.1450   LearningRate 0.0024   Epoch: 16   Global Step: 701990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:43,226-Speed 2621.43 samples/sec   Loss 2.1507   LearningRate 0.0024   Epoch: 16   Global Step: 702000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:47,130-Speed 2623.35 samples/sec   Loss 2.1218   LearningRate 0.0024   Epoch: 16   Global Step: 702010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:51,032-Speed 2625.47 samples/sec   Loss 2.2129   LearningRate 0.0024   Epoch: 16   Global Step: 702020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:54,946-Speed 2616.83 samples/sec   Loss 2.1124   LearningRate 0.0024   Epoch: 16   Global Step: 702030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:38:58,848-Speed 2625.40 samples/sec   Loss 2.1908   LearningRate 0.0024   Epoch: 16   Global Step: 702040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:39:02,748-Speed 2625.93 samples/sec   Loss 2.2359   LearningRate 0.0024   Epoch: 16   Global Step: 702050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:39:06,627-Speed 2640.33 samples/sec   Loss 2.1999   LearningRate 0.0024   Epoch: 16   Global Step: 702060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:39:10,548-Speed 2612.02 samples/sec   Loss 2.1703   LearningRate 0.0024   Epoch: 16   Global Step: 702070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:39:14,466-Speed 2614.25 samples/sec   Loss 2.1678   LearningRate 0.0024   Epoch: 16   Global Step: 702080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:39:18,402-Speed 2602.88 samples/sec   Loss 2.1412   LearningRate 0.0024   Epoch: 16   Global Step: 702090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:39:22,335-Speed 2603.95 samples/sec   Loss 2.2147   LearningRate 0.0024   Epoch: 16   Global Step: 702100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:39:26,251-Speed 2616.19 samples/sec   Loss 2.1902   LearningRate 0.0024   Epoch: 16   Global Step: 702110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:39:30,105-Speed 2656.97 samples/sec   Loss 2.1773   LearningRate 0.0024   Epoch: 16   Global Step: 702120   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:39:34,003-Speed 2627.81 samples/sec   Loss 2.1289   LearningRate 0.0024   Epoch: 16   Global Step: 702130   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:39:37,906-Speed 2623.53 samples/sec   Loss 2.1928   LearningRate 0.0024   Epoch: 16   Global Step: 702140   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:39:41,808-Speed 2625.58 samples/sec   Loss 2.1639   LearningRate 0.0024   Epoch: 16   Global Step: 702150   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:39:45,720-Speed 2617.95 samples/sec   Loss 2.1365   LearningRate 0.0024   Epoch: 16   Global Step: 702160   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:39:49,630-Speed 2619.58 samples/sec   Loss 2.2363   LearningRate 0.0024   Epoch: 16   Global Step: 702170   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:39:53,535-Speed 2623.27 samples/sec   Loss 2.2040   LearningRate 0.0024   Epoch: 16   Global Step: 702180   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:39:57,439-Speed 2624.18 samples/sec   Loss 2.1325   LearningRate 0.0024   Epoch: 16   Global Step: 702190   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:40:01,353-Speed 2617.02 samples/sec   Loss 2.1201   LearningRate 0.0024   Epoch: 16   Global Step: 702200   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:40:05,259-Speed 2621.96 samples/sec   Loss 2.2114   LearningRate 0.0024   Epoch: 16   Global Step: 702210   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 02:40:09,165-Speed 2622.27 samples/sec   Loss 2.1163   LearningRate 0.0024   Epoch: 16   Global Step: 702220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:13,079-Speed 2616.49 samples/sec   Loss 2.1471   LearningRate 0.0024   Epoch: 16   Global Step: 702230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:16,991-Speed 2618.18 samples/sec   Loss 2.1481   LearningRate 0.0024   Epoch: 16   Global Step: 702240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:20,907-Speed 2615.71 samples/sec   Loss 2.1675   LearningRate 0.0024   Epoch: 16   Global Step: 702250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:24,811-Speed 2623.62 samples/sec   Loss 2.1827   LearningRate 0.0024   Epoch: 16   Global Step: 702260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:28,713-Speed 2625.14 samples/sec   Loss 2.1226   LearningRate 0.0024   Epoch: 16   Global Step: 702270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:32,620-Speed 2621.59 samples/sec   Loss 2.1139   LearningRate 0.0024   Epoch: 16   Global Step: 702280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:36,531-Speed 2619.22 samples/sec   Loss 2.1929   LearningRate 0.0024   Epoch: 16   Global Step: 702290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:40,437-Speed 2622.09 samples/sec   Loss 2.2078   LearningRate 0.0024   Epoch: 16   Global Step: 702300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:44,337-Speed 2625.93 samples/sec   Loss 2.1579   LearningRate 0.0024   Epoch: 16   Global Step: 702310   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:40:48,239-Speed 2624.99 samples/sec   Loss 2.1711   LearningRate 0.0024   Epoch: 16   Global Step: 702320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:40:52,145-Speed 2622.21 samples/sec   Loss 2.1720   LearningRate 0.0024   Epoch: 16   Global Step: 702330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:40:56,060-Speed 2616.60 samples/sec   Loss 2.1253   LearningRate 0.0024   Epoch: 16   Global Step: 702340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:40:59,966-Speed 2621.76 samples/sec   Loss 2.0996   LearningRate 0.0024   Epoch: 16   Global Step: 702350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:03,872-Speed 2622.23 samples/sec   Loss 2.1047   LearningRate 0.0024   Epoch: 16   Global Step: 702360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:07,770-Speed 2627.75 samples/sec   Loss 2.1577   LearningRate 0.0024   Epoch: 16   Global Step: 702370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:11,675-Speed 2623.01 samples/sec   Loss 2.1835   LearningRate 0.0024   Epoch: 16   Global Step: 702380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:15,575-Speed 2626.05 samples/sec   Loss 2.1205   LearningRate 0.0024   Epoch: 16   Global Step: 702390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:19,489-Speed 2616.95 samples/sec   Loss 2.1484   LearningRate 0.0024   Epoch: 16   Global Step: 702400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:23,393-Speed 2623.32 samples/sec   Loss 2.0688   LearningRate 0.0023   Epoch: 16   Global Step: 702410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:27,293-Speed 2626.45 samples/sec   Loss 2.1684   LearningRate 0.0023   Epoch: 16   Global Step: 702420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:41:31,190-Speed 2627.98 samples/sec   Loss 2.1836   LearningRate 0.0023   Epoch: 16   Global Step: 702430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:41:35,105-Speed 2616.40 samples/sec   Loss 2.1498   LearningRate 0.0023   Epoch: 16   Global Step: 702440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:41:39,045-Speed 2599.49 samples/sec   Loss 2.1607   LearningRate 0.0023   Epoch: 16   Global Step: 702450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:41:42,925-Speed 2639.79 samples/sec   Loss 2.1531   LearningRate 0.0023   Epoch: 16   Global Step: 702460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:46,838-Speed 2617.77 samples/sec   Loss 2.1968   LearningRate 0.0023   Epoch: 16   Global Step: 702470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:50,756-Speed 2613.96 samples/sec   Loss 2.1415   LearningRate 0.0023   Epoch: 16   Global Step: 702480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:54,689-Speed 2604.31 samples/sec   Loss 2.2302   LearningRate 0.0023   Epoch: 16   Global Step: 702490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:41:58,571-Speed 2638.32 samples/sec   Loss 2.1634   LearningRate 0.0023   Epoch: 16   Global Step: 702500   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:02,492-Speed 2612.87 samples/sec   Loss 2.1395   LearningRate 0.0023   Epoch: 16   Global Step: 702510   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:06,392-Speed 2626.35 samples/sec   Loss 2.1260   LearningRate 0.0023   Epoch: 16   Global Step: 702520   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:10,301-Speed 2620.22 samples/sec   Loss 2.1894   LearningRate 0.0023   Epoch: 16   Global Step: 702530   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:14,201-Speed 2625.71 samples/sec   Loss 2.1505   LearningRate 0.0023   Epoch: 16   Global Step: 702540   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:18,108-Speed 2622.09 samples/sec   Loss 2.1765   LearningRate 0.0023   Epoch: 16   Global Step: 702550   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:22,040-Speed 2604.53 samples/sec   Loss 2.1129   LearningRate 0.0023   Epoch: 16   Global Step: 702560   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:25,944-Speed 2623.23 samples/sec   Loss 2.0765   LearningRate 0.0023   Epoch: 16   Global Step: 702570   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:29,857-Speed 2618.17 samples/sec   Loss 2.1242   LearningRate 0.0023   Epoch: 16   Global Step: 702580   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:33,767-Speed 2619.70 samples/sec   Loss 2.2052   LearningRate 0.0023   Epoch: 16   Global Step: 702590   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:37,677-Speed 2619.37 samples/sec   Loss 2.0826   LearningRate 0.0023   Epoch: 16   Global Step: 702600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:42:41,578-Speed 2625.04 samples/sec   Loss 2.1756   LearningRate 0.0023   Epoch: 16   Global Step: 702610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:42:45,479-Speed 2626.00 samples/sec   Loss 2.1765   LearningRate 0.0023   Epoch: 16   Global Step: 702620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:42:49,362-Speed 2637.31 samples/sec   Loss 2.1418   LearningRate 0.0023   Epoch: 16   Global Step: 702630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:53,278-Speed 2615.73 samples/sec   Loss 2.0947   LearningRate 0.0023   Epoch: 16   Global Step: 702640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:42:57,192-Speed 2616.79 samples/sec   Loss 2.1374   LearningRate 0.0023   Epoch: 16   Global Step: 702650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:01,089-Speed 2628.47 samples/sec   Loss 2.1405   LearningRate 0.0023   Epoch: 16   Global Step: 702660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:04,989-Speed 2626.10 samples/sec   Loss 2.1581   LearningRate 0.0023   Epoch: 16   Global Step: 702670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:08,889-Speed 2626.84 samples/sec   Loss 2.1680   LearningRate 0.0023   Epoch: 16   Global Step: 702680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:12,785-Speed 2628.42 samples/sec   Loss 2.0957   LearningRate 0.0023   Epoch: 16   Global Step: 702690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:16,681-Speed 2629.16 samples/sec   Loss 2.0657   LearningRate 0.0023   Epoch: 16   Global Step: 702700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:20,582-Speed 2625.29 samples/sec   Loss 2.0956   LearningRate 0.0023   Epoch: 16   Global Step: 702710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:24,477-Speed 2629.66 samples/sec   Loss 2.0835   LearningRate 0.0023   Epoch: 16   Global Step: 702720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:43:28,372-Speed 2629.58 samples/sec   Loss 2.1522   LearningRate 0.0023   Epoch: 16   Global Step: 702730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:32,268-Speed 2629.15 samples/sec   Loss 2.1588   LearningRate 0.0023   Epoch: 16   Global Step: 702740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:36,181-Speed 2617.53 samples/sec   Loss 2.1361   LearningRate 0.0023   Epoch: 16   Global Step: 702750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:40,081-Speed 2625.64 samples/sec   Loss 2.1145   LearningRate 0.0023   Epoch: 16   Global Step: 702760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:43,980-Speed 2627.52 samples/sec   Loss 2.1890   LearningRate 0.0023   Epoch: 16   Global Step: 702770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:47,875-Speed 2629.57 samples/sec   Loss 2.1622   LearningRate 0.0023   Epoch: 16   Global Step: 702780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:51,769-Speed 2630.75 samples/sec   Loss 2.1149   LearningRate 0.0023   Epoch: 16   Global Step: 702790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:55,666-Speed 2628.01 samples/sec   Loss 2.1593   LearningRate 0.0023   Epoch: 16   Global Step: 702800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:43:59,564-Speed 2627.87 samples/sec   Loss 2.1179   LearningRate 0.0023   Epoch: 16   Global Step: 702810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:03,463-Speed 2626.73 samples/sec   Loss 2.1296   LearningRate 0.0023   Epoch: 16   Global Step: 702820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:07,339-Speed 2641.86 samples/sec   Loss 2.2117   LearningRate 0.0023   Epoch: 16   Global Step: 702830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:11,250-Speed 2619.11 samples/sec   Loss 2.1499   LearningRate 0.0023   Epoch: 16   Global Step: 702840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:15,158-Speed 2621.56 samples/sec   Loss 2.1433   LearningRate 0.0023   Epoch: 16   Global Step: 702850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:19,054-Speed 2629.22 samples/sec   Loss 2.1203   LearningRate 0.0023   Epoch: 16   Global Step: 702860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:22,958-Speed 2623.83 samples/sec   Loss 2.1087   LearningRate 0.0023   Epoch: 16   Global Step: 702870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:26,890-Speed 2605.30 samples/sec   Loss 2.1760   LearningRate 0.0023   Epoch: 16   Global Step: 702880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:30,799-Speed 2619.83 samples/sec   Loss 2.2039   LearningRate 0.0023   Epoch: 16   Global Step: 702890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:34,703-Speed 2623.70 samples/sec   Loss 2.1706   LearningRate 0.0023   Epoch: 16   Global Step: 702900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:38,608-Speed 2622.34 samples/sec   Loss 2.1831   LearningRate 0.0023   Epoch: 16   Global Step: 702910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:42,510-Speed 2625.68 samples/sec   Loss 2.1569   LearningRate 0.0023   Epoch: 16   Global Step: 702920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:46,412-Speed 2624.27 samples/sec   Loss 2.1396   LearningRate 0.0023   Epoch: 16   Global Step: 702930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:44:50,282-Speed 2647.01 samples/sec   Loss 2.1790   LearningRate 0.0023   Epoch: 16   Global Step: 702940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:54,177-Speed 2629.85 samples/sec   Loss 2.1335   LearningRate 0.0023   Epoch: 16   Global Step: 702950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:44:58,072-Speed 2629.38 samples/sec   Loss 2.1519   LearningRate 0.0023   Epoch: 16   Global Step: 702960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:45:01,969-Speed 2628.40 samples/sec   Loss 2.1653   LearningRate 0.0023   Epoch: 16   Global Step: 702970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:45:05,884-Speed 2616.56 samples/sec   Loss 2.1182   LearningRate 0.0023   Epoch: 16   Global Step: 702980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:45:09,757-Speed 2644.06 samples/sec   Loss 2.1196   LearningRate 0.0023   Epoch: 16   Global Step: 702990   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:13,671-Speed 2617.12 samples/sec   Loss 2.1072   LearningRate 0.0023   Epoch: 16   Global Step: 703000   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:17,572-Speed 2625.36 samples/sec   Loss 2.1552   LearningRate 0.0023   Epoch: 16   Global Step: 703010   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:21,514-Speed 2599.04 samples/sec   Loss 2.1328   LearningRate 0.0023   Epoch: 16   Global Step: 703020   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:25,414-Speed 2626.08 samples/sec   Loss 2.1637   LearningRate 0.0023   Epoch: 16   Global Step: 703030   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:29,313-Speed 2627.06 samples/sec   Loss 2.1682   LearningRate 0.0023   Epoch: 16   Global Step: 703040   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:33,213-Speed 2626.88 samples/sec   Loss 2.1879   LearningRate 0.0023   Epoch: 16   Global Step: 703050   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:37,119-Speed 2621.60 samples/sec   Loss 2.2022   LearningRate 0.0023   Epoch: 16   Global Step: 703060   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:41,016-Speed 2628.59 samples/sec   Loss 2.2429   LearningRate 0.0023   Epoch: 16   Global Step: 703070   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:44,916-Speed 2626.07 samples/sec   Loss 2.0915   LearningRate 0.0023   Epoch: 16   Global Step: 703080   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:45:48,813-Speed 2628.50 samples/sec   Loss 2.1302   LearningRate 0.0023   Epoch: 16   Global Step: 703090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:45:52,713-Speed 2626.80 samples/sec   Loss 2.1893   LearningRate 0.0023   Epoch: 16   Global Step: 703100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:45:56,614-Speed 2625.18 samples/sec   Loss 2.1310   LearningRate 0.0023   Epoch: 16   Global Step: 703110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:00,507-Speed 2631.88 samples/sec   Loss 2.1079   LearningRate 0.0023   Epoch: 16   Global Step: 703120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:04,404-Speed 2627.94 samples/sec   Loss 2.1788   LearningRate 0.0023   Epoch: 16   Global Step: 703130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:08,299-Speed 2629.26 samples/sec   Loss 2.1863   LearningRate 0.0023   Epoch: 16   Global Step: 703140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:12,199-Speed 2626.41 samples/sec   Loss 2.1171   LearningRate 0.0023   Epoch: 16   Global Step: 703150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:16,097-Speed 2627.13 samples/sec   Loss 2.1761   LearningRate 0.0023   Epoch: 16   Global Step: 703160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:20,003-Speed 2622.50 samples/sec   Loss 2.1765   LearningRate 0.0023   Epoch: 16   Global Step: 703170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:23,911-Speed 2620.96 samples/sec   Loss 2.1090   LearningRate 0.0023   Epoch: 16   Global Step: 703180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:27,806-Speed 2629.60 samples/sec   Loss 2.1926   LearningRate 0.0023   Epoch: 16   Global Step: 703190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:46:31,690-Speed 2637.07 samples/sec   Loss 2.1628   LearningRate 0.0023   Epoch: 16   Global Step: 703200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:35,615-Speed 2610.31 samples/sec   Loss 2.1630   LearningRate 0.0023   Epoch: 16   Global Step: 703210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:39,520-Speed 2622.76 samples/sec   Loss 2.2052   LearningRate 0.0023   Epoch: 16   Global Step: 703220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:43,420-Speed 2626.21 samples/sec   Loss 2.1957   LearningRate 0.0023   Epoch: 16   Global Step: 703230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:46:47,295-Speed 2643.08 samples/sec   Loss 2.1767   LearningRate 0.0023   Epoch: 16   Global Step: 703240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:46:51,192-Speed 2628.31 samples/sec   Loss 2.1021   LearningRate 0.0023   Epoch: 16   Global Step: 703250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:46:55,090-Speed 2628.00 samples/sec   Loss 2.1248   LearningRate 0.0023   Epoch: 16   Global Step: 703260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:46:58,990-Speed 2625.98 samples/sec   Loss 2.1512   LearningRate 0.0023   Epoch: 16   Global Step: 703270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:02,888-Speed 2627.67 samples/sec   Loss 2.1182   LearningRate 0.0023   Epoch: 16   Global Step: 703280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:06,784-Speed 2628.81 samples/sec   Loss 2.2443   LearningRate 0.0023   Epoch: 16   Global Step: 703290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:10,696-Speed 2618.73 samples/sec   Loss 2.0839   LearningRate 0.0023   Epoch: 16   Global Step: 703300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:14,603-Speed 2621.09 samples/sec   Loss 2.1507   LearningRate 0.0023   Epoch: 16   Global Step: 703310   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:18,508-Speed 2623.03 samples/sec   Loss 2.0696   LearningRate 0.0023   Epoch: 16   Global Step: 703320   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:22,408-Speed 2626.06 samples/sec   Loss 2.1813   LearningRate 0.0023   Epoch: 16   Global Step: 703330   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:26,311-Speed 2624.40 samples/sec   Loss 2.1124   LearningRate 0.0023   Epoch: 16   Global Step: 703340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:47:30,194-Speed 2637.88 samples/sec   Loss 2.1402   LearningRate 0.0023   Epoch: 16   Global Step: 703350   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:34,118-Speed 2610.36 samples/sec   Loss 2.1496   LearningRate 0.0023   Epoch: 16   Global Step: 703360   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:38,024-Speed 2622.06 samples/sec   Loss 2.1548   LearningRate 0.0023   Epoch: 16   Global Step: 703370   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:41,920-Speed 2629.38 samples/sec   Loss 2.0937   LearningRate 0.0023   Epoch: 16   Global Step: 703380   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:45,816-Speed 2628.78 samples/sec   Loss 2.1692   LearningRate 0.0023   Epoch: 16   Global Step: 703390   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:49,713-Speed 2628.37 samples/sec   Loss 2.1405   LearningRate 0.0023   Epoch: 16   Global Step: 703400   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:53,621-Speed 2621.03 samples/sec   Loss 2.1105   LearningRate 0.0023   Epoch: 16   Global Step: 703410   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:47:57,525-Speed 2623.95 samples/sec   Loss 2.0867   LearningRate 0.0023   Epoch: 16   Global Step: 703420   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:48:01,423-Speed 2627.06 samples/sec   Loss 2.1317   LearningRate 0.0023   Epoch: 16   Global Step: 703430   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:48:05,354-Speed 2606.14 samples/sec   Loss 2.1211   LearningRate 0.0023   Epoch: 16   Global Step: 703440   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:48:09,251-Speed 2628.71 samples/sec   Loss 2.1675   LearningRate 0.0023   Epoch: 16   Global Step: 703450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:13,148-Speed 2628.11 samples/sec   Loss 2.2090   LearningRate 0.0023   Epoch: 16   Global Step: 703460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:17,066-Speed 2613.56 samples/sec   Loss 2.1280   LearningRate 0.0023   Epoch: 16   Global Step: 703470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:20,968-Speed 2625.67 samples/sec   Loss 2.1359   LearningRate 0.0023   Epoch: 16   Global Step: 703480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:24,864-Speed 2629.24 samples/sec   Loss 2.1552   LearningRate 0.0023   Epoch: 16   Global Step: 703490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:28,761-Speed 2628.44 samples/sec   Loss 2.1265   LearningRate 0.0023   Epoch: 16   Global Step: 703500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:32,658-Speed 2627.65 samples/sec   Loss 2.1489   LearningRate 0.0023   Epoch: 16   Global Step: 703510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:36,554-Speed 2629.00 samples/sec   Loss 2.1739   LearningRate 0.0023   Epoch: 16   Global Step: 703520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:40,453-Speed 2627.15 samples/sec   Loss 2.1393   LearningRate 0.0023   Epoch: 16   Global Step: 703530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:44,353-Speed 2627.05 samples/sec   Loss 2.1374   LearningRate 0.0023   Epoch: 16   Global Step: 703540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:48,277-Speed 2610.13 samples/sec   Loss 2.1800   LearningRate 0.0023   Epoch: 16   Global Step: 703550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:48:52,172-Speed 2629.85 samples/sec   Loss 2.1920   LearningRate 0.0023   Epoch: 16   Global Step: 703560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:48:56,048-Speed 2641.98 samples/sec   Loss 2.1139   LearningRate 0.0023   Epoch: 16   Global Step: 703570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:48:59,946-Speed 2627.61 samples/sec   Loss 2.1307   LearningRate 0.0023   Epoch: 16   Global Step: 703580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:03,836-Speed 2632.75 samples/sec   Loss 2.1266   LearningRate 0.0023   Epoch: 16   Global Step: 703590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:07,732-Speed 2629.48 samples/sec   Loss 2.1016   LearningRate 0.0023   Epoch: 16   Global Step: 703600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:11,643-Speed 2618.79 samples/sec   Loss 2.1107   LearningRate 0.0023   Epoch: 16   Global Step: 703610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:15,545-Speed 2625.25 samples/sec   Loss 2.1627   LearningRate 0.0023   Epoch: 16   Global Step: 703620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:19,449-Speed 2623.83 samples/sec   Loss 2.2126   LearningRate 0.0023   Epoch: 16   Global Step: 703630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:23,356-Speed 2621.22 samples/sec   Loss 2.1119   LearningRate 0.0023   Epoch: 16   Global Step: 703640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:27,264-Speed 2621.04 samples/sec   Loss 2.1598   LearningRate 0.0023   Epoch: 16   Global Step: 703650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:31,162-Speed 2627.70 samples/sec   Loss 2.1644   LearningRate 0.0023   Epoch: 16   Global Step: 703660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:35,065-Speed 2624.32 samples/sec   Loss 2.1100   LearningRate 0.0023   Epoch: 16   Global Step: 703670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:49:38,940-Speed 2643.01 samples/sec   Loss 2.1162   LearningRate 0.0023   Epoch: 16   Global Step: 703680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:42,862-Speed 2611.56 samples/sec   Loss 2.1022   LearningRate 0.0023   Epoch: 16   Global Step: 703690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:46,763-Speed 2625.48 samples/sec   Loss 2.1899   LearningRate 0.0023   Epoch: 16   Global Step: 703700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:50,683-Speed 2612.97 samples/sec   Loss 2.1937   LearningRate 0.0023   Epoch: 16   Global Step: 703710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:54,580-Speed 2628.25 samples/sec   Loss 2.1375   LearningRate 0.0023   Epoch: 16   Global Step: 703720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:49:58,477-Speed 2628.94 samples/sec   Loss 2.1491   LearningRate 0.0023   Epoch: 16   Global Step: 703730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:02,374-Speed 2628.15 samples/sec   Loss 2.0977   LearningRate 0.0023   Epoch: 16   Global Step: 703740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:06,269-Speed 2629.55 samples/sec   Loss 2.0968   LearningRate 0.0023   Epoch: 16   Global Step: 703750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:10,165-Speed 2628.82 samples/sec   Loss 2.1832   LearningRate 0.0023   Epoch: 16   Global Step: 703760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:14,061-Speed 2628.95 samples/sec   Loss 2.0870   LearningRate 0.0023   Epoch: 16   Global Step: 703770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:17,975-Speed 2616.94 samples/sec   Loss 2.1553   LearningRate 0.0023   Epoch: 16   Global Step: 703780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:50:21,855-Speed 2640.09 samples/sec   Loss 2.1089   LearningRate 0.0023   Epoch: 16   Global Step: 703790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:25,750-Speed 2629.86 samples/sec   Loss 2.0705   LearningRate 0.0023   Epoch: 16   Global Step: 703800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:29,645-Speed 2630.48 samples/sec   Loss 2.1304   LearningRate 0.0023   Epoch: 16   Global Step: 703810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:33,536-Speed 2632.43 samples/sec   Loss 2.1298   LearningRate 0.0023   Epoch: 16   Global Step: 703820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:37,450-Speed 2616.72 samples/sec   Loss 2.1168   LearningRate 0.0023   Epoch: 16   Global Step: 703830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:41,350-Speed 2626.21 samples/sec   Loss 2.1704   LearningRate 0.0023   Epoch: 16   Global Step: 703840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:45,259-Speed 2626.73 samples/sec   Loss 2.1873   LearningRate 0.0023   Epoch: 16   Global Step: 703850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:49,163-Speed 2623.38 samples/sec   Loss 2.0787   LearningRate 0.0023   Epoch: 16   Global Step: 703860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:53,077-Speed 2617.37 samples/sec   Loss 2.1502   LearningRate 0.0023   Epoch: 16   Global Step: 703870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:50:57,077-Speed 2560.60 samples/sec   Loss 2.1189   LearningRate 0.0023   Epoch: 16   Global Step: 703880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:00,998-Speed 2612.35 samples/sec   Loss 2.1379   LearningRate 0.0023   Epoch: 16   Global Step: 703890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:51:05,119-Speed 2485.23 samples/sec   Loss 2.1070   LearningRate 0.0023   Epoch: 16   Global Step: 703900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:51:09,168-Speed 2529.43 samples/sec   Loss 2.1544   LearningRate 0.0023   Epoch: 16   Global Step: 703910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:13,173-Speed 2558.25 samples/sec   Loss 2.0805   LearningRate 0.0023   Epoch: 16   Global Step: 703920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:17,067-Speed 2630.37 samples/sec   Loss 2.2335   LearningRate 0.0023   Epoch: 16   Global Step: 703930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:20,964-Speed 2628.46 samples/sec   Loss 2.1372   LearningRate 0.0023   Epoch: 16   Global Step: 703940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:24,870-Speed 2621.99 samples/sec   Loss 2.2151   LearningRate 0.0023   Epoch: 16   Global Step: 703950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:28,788-Speed 2614.12 samples/sec   Loss 2.1479   LearningRate 0.0023   Epoch: 16   Global Step: 703960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:32,691-Speed 2624.78 samples/sec   Loss 2.1188   LearningRate 0.0023   Epoch: 16   Global Step: 703970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:36,583-Speed 2631.47 samples/sec   Loss 2.1195   LearningRate 0.0023   Epoch: 16   Global Step: 703980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:40,480-Speed 2628.03 samples/sec   Loss 2.1793   LearningRate 0.0023   Epoch: 16   Global Step: 703990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:44,405-Speed 2610.36 samples/sec   Loss 2.0673   LearningRate 0.0023   Epoch: 16   Global Step: 704000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:48,290-Speed 2636.22 samples/sec   Loss 2.1497   LearningRate 0.0023   Epoch: 16   Global Step: 704010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:52,189-Speed 2627.22 samples/sec   Loss 2.0909   LearningRate 0.0023   Epoch: 16   Global Step: 704020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:56,084-Speed 2629.22 samples/sec   Loss 2.1715   LearningRate 0.0023   Epoch: 16   Global Step: 704030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:51:59,982-Speed 2628.19 samples/sec   Loss 2.1209   LearningRate 0.0023   Epoch: 16   Global Step: 704040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:03,889-Speed 2621.65 samples/sec   Loss 2.1455   LearningRate 0.0023   Epoch: 16   Global Step: 704050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:07,793-Speed 2623.01 samples/sec   Loss 2.1727   LearningRate 0.0023   Epoch: 16   Global Step: 704060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:11,694-Speed 2626.02 samples/sec   Loss 2.1339   LearningRate 0.0023   Epoch: 16   Global Step: 704070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:15,609-Speed 2616.17 samples/sec   Loss 2.1326   LearningRate 0.0023   Epoch: 16   Global Step: 704080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:19,510-Speed 2625.61 samples/sec   Loss 2.0744   LearningRate 0.0023   Epoch: 16   Global Step: 704090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:23,407-Speed 2628.36 samples/sec   Loss 2.1108   LearningRate 0.0023   Epoch: 16   Global Step: 704100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:27,311-Speed 2624.40 samples/sec   Loss 2.1470   LearningRate 0.0023   Epoch: 16   Global Step: 704110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:52:31,186-Speed 2643.17 samples/sec   Loss 2.1403   LearningRate 0.0023   Epoch: 16   Global Step: 704120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:35,091-Speed 2622.46 samples/sec   Loss 2.1412   LearningRate 0.0023   Epoch: 16   Global Step: 704130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:38,991-Speed 2626.15 samples/sec   Loss 2.1367   LearningRate 0.0023   Epoch: 16   Global Step: 704140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:42,900-Speed 2620.63 samples/sec   Loss 2.0718   LearningRate 0.0023   Epoch: 16   Global Step: 704150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:46,836-Speed 2602.17 samples/sec   Loss 2.1636   LearningRate 0.0023   Epoch: 16   Global Step: 704160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:50,731-Speed 2629.72 samples/sec   Loss 2.0910   LearningRate 0.0023   Epoch: 16   Global Step: 704170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:54,626-Speed 2629.77 samples/sec   Loss 2.1593   LearningRate 0.0023   Epoch: 16   Global Step: 704180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:52:58,500-Speed 2644.17 samples/sec   Loss 2.1697   LearningRate 0.0023   Epoch: 16   Global Step: 704190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:02,393-Speed 2630.46 samples/sec   Loss 2.0878   LearningRate 0.0023   Epoch: 16   Global Step: 704200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:06,296-Speed 2625.00 samples/sec   Loss 2.0830   LearningRate 0.0023   Epoch: 16   Global Step: 704210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:10,219-Speed 2611.09 samples/sec   Loss 2.1404   LearningRate 0.0023   Epoch: 16   Global Step: 704220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:14,207-Speed 2568.04 samples/sec   Loss 2.1072   LearningRate 0.0023   Epoch: 16   Global Step: 704230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:18,116-Speed 2619.88 samples/sec   Loss 2.0893   LearningRate 0.0023   Epoch: 16   Global Step: 704240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:22,016-Speed 2626.52 samples/sec   Loss 2.1761   LearningRate 0.0023   Epoch: 16   Global Step: 704250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:25,909-Speed 2631.12 samples/sec   Loss 2.1641   LearningRate 0.0023   Epoch: 16   Global Step: 704260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:29,804-Speed 2629.15 samples/sec   Loss 2.0961   LearningRate 0.0023   Epoch: 16   Global Step: 704270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:33,706-Speed 2625.56 samples/sec   Loss 2.1104   LearningRate 0.0023   Epoch: 16   Global Step: 704280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:53:37,609-Speed 2623.94 samples/sec   Loss 2.1659   LearningRate 0.0023   Epoch: 16   Global Step: 704290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:53:41,514-Speed 2623.07 samples/sec   Loss 2.1757   LearningRate 0.0023   Epoch: 16   Global Step: 704300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:53:45,415-Speed 2625.21 samples/sec   Loss 2.0920   LearningRate 0.0023   Epoch: 16   Global Step: 704310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:53:49,314-Speed 2627.69 samples/sec   Loss 2.1750   LearningRate 0.0023   Epoch: 16   Global Step: 704320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:53:53,210-Speed 2628.49 samples/sec   Loss 2.1040   LearningRate 0.0023   Epoch: 16   Global Step: 704330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:53:57,124-Speed 2618.01 samples/sec   Loss 2.0854   LearningRate 0.0023   Epoch: 16   Global Step: 704340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:01,021-Speed 2628.38 samples/sec   Loss 2.0913   LearningRate 0.0023   Epoch: 16   Global Step: 704350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:04,916-Speed 2629.46 samples/sec   Loss 2.1379   LearningRate 0.0023   Epoch: 16   Global Step: 704360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:08,853-Speed 2601.89 samples/sec   Loss 2.0970   LearningRate 0.0023   Epoch: 16   Global Step: 704370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:12,748-Speed 2629.14 samples/sec   Loss 2.1326   LearningRate 0.0023   Epoch: 16   Global Step: 704380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:16,657-Speed 2620.80 samples/sec   Loss 2.1033   LearningRate 0.0023   Epoch: 16   Global Step: 704390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:54:20,536-Speed 2640.78 samples/sec   Loss 2.1257   LearningRate 0.0023   Epoch: 16   Global Step: 704400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:24,446-Speed 2619.15 samples/sec   Loss 2.1284   LearningRate 0.0023   Epoch: 16   Global Step: 704410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:28,358-Speed 2618.24 samples/sec   Loss 2.1446   LearningRate 0.0023   Epoch: 16   Global Step: 704420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:32,256-Speed 2626.90 samples/sec   Loss 2.0736   LearningRate 0.0023   Epoch: 16   Global Step: 704430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:36,239-Speed 2572.46 samples/sec   Loss 2.1673   LearningRate 0.0023   Epoch: 16   Global Step: 704440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:40,135-Speed 2629.89 samples/sec   Loss 2.1097   LearningRate 0.0023   Epoch: 16   Global Step: 704450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:44,033-Speed 2627.98 samples/sec   Loss 2.0882   LearningRate 0.0023   Epoch: 16   Global Step: 704460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:47,931-Speed 2627.76 samples/sec   Loss 2.1458   LearningRate 0.0023   Epoch: 16   Global Step: 704470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:51,828-Speed 2628.49 samples/sec   Loss 2.1715   LearningRate 0.0023   Epoch: 16   Global Step: 704480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:55,729-Speed 2625.74 samples/sec   Loss 2.1211   LearningRate 0.0023   Epoch: 16   Global Step: 704490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:54:59,605-Speed 2642.55 samples/sec   Loss 2.1183   LearningRate 0.0023   Epoch: 16   Global Step: 704500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:03,503-Speed 2627.92 samples/sec   Loss 2.0867   LearningRate 0.0023   Epoch: 16   Global Step: 704510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:07,396-Speed 2630.49 samples/sec   Loss 2.1817   LearningRate 0.0023   Epoch: 16   Global Step: 704520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:11,293-Speed 2628.70 samples/sec   Loss 2.1171   LearningRate 0.0023   Epoch: 16   Global Step: 704530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:15,226-Speed 2604.68 samples/sec   Loss 2.1654   LearningRate 0.0023   Epoch: 16   Global Step: 704540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:19,120-Speed 2630.31 samples/sec   Loss 2.0968   LearningRate 0.0023   Epoch: 16   Global Step: 704550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:23,019-Speed 2626.87 samples/sec   Loss 2.1574   LearningRate 0.0023   Epoch: 16   Global Step: 704560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:26,919-Speed 2626.84 samples/sec   Loss 2.1651   LearningRate 0.0023   Epoch: 16   Global Step: 704570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:30,817-Speed 2627.89 samples/sec   Loss 2.1073   LearningRate 0.0023   Epoch: 16   Global Step: 704580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:34,720-Speed 2624.06 samples/sec   Loss 2.1492   LearningRate 0.0023   Epoch: 16   Global Step: 704590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:38,633-Speed 2617.62 samples/sec   Loss 2.1250   LearningRate 0.0023   Epoch: 16   Global Step: 704600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:55:42,540-Speed 2620.73 samples/sec   Loss 2.1360   LearningRate 0.0023   Epoch: 16   Global Step: 704610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:55:46,416-Speed 2643.77 samples/sec   Loss 2.1680   LearningRate 0.0023   Epoch: 16   Global Step: 704620   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:55:50,311-Speed 2629.95 samples/sec   Loss 2.0803   LearningRate 0.0023   Epoch: 16   Global Step: 704630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:55:54,214-Speed 2623.91 samples/sec   Loss 2.1432   LearningRate 0.0023   Epoch: 16   Global Step: 704640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:55:58,112-Speed 2628.85 samples/sec   Loss 2.1958   LearningRate 0.0023   Epoch: 16   Global Step: 704650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:56:02,011-Speed 2626.70 samples/sec   Loss 2.1540   LearningRate 0.0023   Epoch: 16   Global Step: 704660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:56:05,914-Speed 2624.09 samples/sec   Loss 2.1489   LearningRate 0.0023   Epoch: 16   Global Step: 704670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:56:09,817-Speed 2624.06 samples/sec   Loss 2.0804   LearningRate 0.0023   Epoch: 16   Global Step: 704680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:56:13,725-Speed 2621.54 samples/sec   Loss 2.0494   LearningRate 0.0023   Epoch: 16   Global Step: 704690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:56:17,620-Speed 2629.00 samples/sec   Loss 2.1014   LearningRate 0.0023   Epoch: 16   Global Step: 704700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:56:21,514-Speed 2631.15 samples/sec   Loss 2.1299   LearningRate 0.0023   Epoch: 16   Global Step: 704710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:56:25,413-Speed 2627.23 samples/sec   Loss 2.1462   LearningRate 0.0023   Epoch: 16   Global Step: 704720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:29,311-Speed 2628.06 samples/sec   Loss 2.1053   LearningRate 0.0023   Epoch: 16   Global Step: 704730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:33,213-Speed 2624.78 samples/sec   Loss 2.1190   LearningRate 0.0023   Epoch: 16   Global Step: 704740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:37,114-Speed 2625.36 samples/sec   Loss 2.1360   LearningRate 0.0023   Epoch: 16   Global Step: 704750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:41,014-Speed 2625.93 samples/sec   Loss 2.1081   LearningRate 0.0023   Epoch: 16   Global Step: 704760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:44,909-Speed 2630.33 samples/sec   Loss 2.1300   LearningRate 0.0023   Epoch: 16   Global Step: 704770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:48,807-Speed 2628.14 samples/sec   Loss 2.1256   LearningRate 0.0023   Epoch: 16   Global Step: 704780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:52,699-Speed 2632.02 samples/sec   Loss 2.0740   LearningRate 0.0023   Epoch: 16   Global Step: 704790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:56:56,596-Speed 2627.69 samples/sec   Loss 2.0953   LearningRate 0.0023   Epoch: 16   Global Step: 704800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:00,496-Speed 2626.27 samples/sec   Loss 2.1960   LearningRate 0.0023   Epoch: 16   Global Step: 704810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:04,380-Speed 2637.18 samples/sec   Loss 2.1493   LearningRate 0.0023   Epoch: 16   Global Step: 704820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:08,285-Speed 2623.12 samples/sec   Loss 2.0338   LearningRate 0.0023   Epoch: 16   Global Step: 704830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:12,183-Speed 2626.99 samples/sec   Loss 2.1159   LearningRate 0.0023   Epoch: 16   Global Step: 704840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:16,078-Speed 2630.25 samples/sec   Loss 2.1049   LearningRate 0.0023   Epoch: 16   Global Step: 704850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:19,973-Speed 2629.90 samples/sec   Loss 2.0924   LearningRate 0.0023   Epoch: 16   Global Step: 704860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:23,871-Speed 2627.91 samples/sec   Loss 2.1154   LearningRate 0.0023   Epoch: 16   Global Step: 704870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:27,766-Speed 2629.16 samples/sec   Loss 2.1518   LearningRate 0.0023   Epoch: 16   Global Step: 704880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:31,761-Speed 2564.32 samples/sec   Loss 2.1261   LearningRate 0.0023   Epoch: 16   Global Step: 704890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:35,660-Speed 2626.70 samples/sec   Loss 2.1192   LearningRate 0.0023   Epoch: 16   Global Step: 704900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:39,597-Speed 2601.62 samples/sec   Loss 2.1556   LearningRate 0.0023   Epoch: 16   Global Step: 704910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:43,643-Speed 2531.29 samples/sec   Loss 2.1214   LearningRate 0.0023   Epoch: 16   Global Step: 704920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:57:47,534-Speed 2631.92 samples/sec   Loss 2.1667   LearningRate 0.0023   Epoch: 16   Global Step: 704930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:51,431-Speed 2629.09 samples/sec   Loss 2.1332   LearningRate 0.0023   Epoch: 16   Global Step: 704940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:55,324-Speed 2630.66 samples/sec   Loss 2.1325   LearningRate 0.0023   Epoch: 16   Global Step: 704950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:57:59,217-Speed 2631.29 samples/sec   Loss 2.1329   LearningRate 0.0023   Epoch: 16   Global Step: 704960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:03,116-Speed 2626.60 samples/sec   Loss 2.1001   LearningRate 0.0023   Epoch: 16   Global Step: 704970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:07,017-Speed 2625.24 samples/sec   Loss 2.0684   LearningRate 0.0023   Epoch: 16   Global Step: 704980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:10,919-Speed 2624.82 samples/sec   Loss 2.1021   LearningRate 0.0023   Epoch: 16   Global Step: 704990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:14,823-Speed 2624.73 samples/sec   Loss 2.0845   LearningRate 0.0023   Epoch: 16   Global Step: 705000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:18,721-Speed 2627.18 samples/sec   Loss 2.1531   LearningRate 0.0023   Epoch: 16   Global Step: 705010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:22,623-Speed 2624.86 samples/sec   Loss 2.1374   LearningRate 0.0023   Epoch: 16   Global Step: 705020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:26,493-Speed 2646.87 samples/sec   Loss 2.0789   LearningRate 0.0023   Epoch: 16   Global Step: 705030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:30,394-Speed 2625.93 samples/sec   Loss 2.1359   LearningRate 0.0023   Epoch: 16   Global Step: 705040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:34,295-Speed 2629.47 samples/sec   Loss 2.1805   LearningRate 0.0023   Epoch: 16   Global Step: 705050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:38,239-Speed 2597.12 samples/sec   Loss 2.1895   LearningRate 0.0023   Epoch: 16   Global Step: 705060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:42,139-Speed 2626.24 samples/sec   Loss 2.1898   LearningRate 0.0023   Epoch: 16   Global Step: 705070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:46,041-Speed 2624.92 samples/sec   Loss 2.0894   LearningRate 0.0023   Epoch: 16   Global Step: 705080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:49,936-Speed 2629.72 samples/sec   Loss 2.1982   LearningRate 0.0023   Epoch: 16   Global Step: 705090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:53,828-Speed 2631.68 samples/sec   Loss 2.2081   LearningRate 0.0023   Epoch: 16   Global Step: 705100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:58:57,729-Speed 2626.31 samples/sec   Loss 2.0877   LearningRate 0.0023   Epoch: 16   Global Step: 705110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:59:01,640-Speed 2618.82 samples/sec   Loss 2.1356   LearningRate 0.0023   Epoch: 16   Global Step: 705120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 02:59:05,584-Speed 2597.66 samples/sec   Loss 2.1225   LearningRate 0.0023   Epoch: 16   Global Step: 705130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:59:09,479-Speed 2629.07 samples/sec   Loss 2.1528   LearningRate 0.0023   Epoch: 16   Global Step: 705140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 02:59:30,572-Speed 485.49 samples/sec   Loss 2.1251   LearningRate 0.0022   Epoch: 17   Global Step: 705150   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:59:34,459-Speed 2635.68 samples/sec   Loss 2.0738   LearningRate 0.0022   Epoch: 17   Global Step: 705160   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:59:38,419-Speed 2586.07 samples/sec   Loss 2.1721   LearningRate 0.0022   Epoch: 17   Global Step: 705170   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:59:42,304-Speed 2636.46 samples/sec   Loss 2.0595   LearningRate 0.0022   Epoch: 17   Global Step: 705180   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:59:46,192-Speed 2634.62 samples/sec   Loss 2.1154   LearningRate 0.0022   Epoch: 17   Global Step: 705190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:59:50,085-Speed 2630.91 samples/sec   Loss 2.1839   LearningRate 0.0022   Epoch: 17   Global Step: 705200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:59:53,980-Speed 2629.84 samples/sec   Loss 2.1613   LearningRate 0.0022   Epoch: 17   Global Step: 705210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 02:59:57,883-Speed 2624.04 samples/sec   Loss 2.1599   LearningRate 0.0022   Epoch: 17   Global Step: 705220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:00:01,783-Speed 2626.46 samples/sec   Loss 2.2014   LearningRate 0.0022   Epoch: 17   Global Step: 705230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:00:05,744-Speed 2585.92 samples/sec   Loss 2.0967   LearningRate 0.0022   Epoch: 17   Global Step: 705240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:00:09,635-Speed 2632.95 samples/sec   Loss 2.1440   LearningRate 0.0022   Epoch: 17   Global Step: 705250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:13,525-Speed 2633.13 samples/sec   Loss 2.0866   LearningRate 0.0022   Epoch: 17   Global Step: 705260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:17,412-Speed 2634.71 samples/sec   Loss 2.0889   LearningRate 0.0022   Epoch: 17   Global Step: 705270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:21,330-Speed 2614.37 samples/sec   Loss 2.0791   LearningRate 0.0022   Epoch: 17   Global Step: 705280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:25,238-Speed 2621.39 samples/sec   Loss 2.1165   LearningRate 0.0022   Epoch: 17   Global Step: 705290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:29,142-Speed 2623.99 samples/sec   Loss 2.1549   LearningRate 0.0022   Epoch: 17   Global Step: 705300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:33,033-Speed 2631.99 samples/sec   Loss 2.0902   LearningRate 0.0022   Epoch: 17   Global Step: 705310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:36,931-Speed 2627.89 samples/sec   Loss 2.1457   LearningRate 0.0022   Epoch: 17   Global Step: 705320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:40,826-Speed 2630.23 samples/sec   Loss 2.1358   LearningRate 0.0022   Epoch: 17   Global Step: 705330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:44,735-Speed 2619.46 samples/sec   Loss 2.1007   LearningRate 0.0022   Epoch: 17   Global Step: 705340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:00:48,644-Speed 2620.50 samples/sec   Loss 2.0825   LearningRate 0.0022   Epoch: 17   Global Step: 705350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:00:52,539-Speed 2629.07 samples/sec   Loss 2.0911   LearningRate 0.0022   Epoch: 17   Global Step: 705360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:00:56,431-Speed 2631.86 samples/sec   Loss 2.1502   LearningRate 0.0022   Epoch: 17   Global Step: 705370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:00,330-Speed 2627.28 samples/sec   Loss 2.1362   LearningRate 0.0022   Epoch: 17   Global Step: 705380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:04,229-Speed 2626.95 samples/sec   Loss 2.1289   LearningRate 0.0022   Epoch: 17   Global Step: 705390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:08,126-Speed 2628.23 samples/sec   Loss 2.1364   LearningRate 0.0022   Epoch: 17   Global Step: 705400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:12,045-Speed 2613.88 samples/sec   Loss 2.1156   LearningRate 0.0022   Epoch: 17   Global Step: 705410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:15,941-Speed 2629.01 samples/sec   Loss 2.0588   LearningRate 0.0022   Epoch: 17   Global Step: 705420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:19,846-Speed 2623.07 samples/sec   Loss 2.1618   LearningRate 0.0022   Epoch: 17   Global Step: 705430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:23,743-Speed 2628.39 samples/sec   Loss 2.1427   LearningRate 0.0022   Epoch: 17   Global Step: 705440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:27,644-Speed 2624.93 samples/sec   Loss 2.1143   LearningRate 0.0022   Epoch: 17   Global Step: 705450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:31,546-Speed 2625.26 samples/sec   Loss 2.1867   LearningRate 0.0022   Epoch: 17   Global Step: 705460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:35,499-Speed 2591.42 samples/sec   Loss 2.0925   LearningRate 0.0022   Epoch: 17   Global Step: 705470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:01:39,410-Speed 2618.62 samples/sec   Loss 2.1248   LearningRate 0.0022   Epoch: 17   Global Step: 705480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:01:43,283-Speed 2644.35 samples/sec   Loss 2.1527   LearningRate 0.0022   Epoch: 17   Global Step: 705490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:47,183-Speed 2626.31 samples/sec   Loss 2.0163   LearningRate 0.0022   Epoch: 17   Global Step: 705500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:51,089-Speed 2622.44 samples/sec   Loss 2.1184   LearningRate 0.0022   Epoch: 17   Global Step: 705510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:55,004-Speed 2616.07 samples/sec   Loss 2.0951   LearningRate 0.0022   Epoch: 17   Global Step: 705520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:01:58,927-Speed 2610.89 samples/sec   Loss 2.0680   LearningRate 0.0022   Epoch: 17   Global Step: 705530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:02,828-Speed 2625.51 samples/sec   Loss 2.1704   LearningRate 0.0022   Epoch: 17   Global Step: 705540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:06,730-Speed 2625.07 samples/sec   Loss 2.0881   LearningRate 0.0022   Epoch: 17   Global Step: 705550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:10,629-Speed 2627.25 samples/sec   Loss 2.1013   LearningRate 0.0022   Epoch: 17   Global Step: 705560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:14,549-Speed 2612.70 samples/sec   Loss 2.1114   LearningRate 0.0022   Epoch: 17   Global Step: 705570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:18,475-Speed 2608.70 samples/sec   Loss 2.0939   LearningRate 0.0022   Epoch: 17   Global Step: 705580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:22,369-Speed 2630.19 samples/sec   Loss 2.1527   LearningRate 0.0022   Epoch: 17   Global Step: 705590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:02:26,276-Speed 2621.85 samples/sec   Loss 2.1152   LearningRate 0.0022   Epoch: 17   Global Step: 705600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:02:30,173-Speed 2628.34 samples/sec   Loss 2.1421   LearningRate 0.0022   Epoch: 17   Global Step: 705610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:02:34,157-Speed 2570.92 samples/sec   Loss 2.1313   LearningRate 0.0022   Epoch: 17   Global Step: 705620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:38,075-Speed 2614.39 samples/sec   Loss 2.0443   LearningRate 0.0022   Epoch: 17   Global Step: 705630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:02:41,957-Speed 2638.53 samples/sec   Loss 2.0952   LearningRate 0.0022   Epoch: 17   Global Step: 705640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:02:45,873-Speed 2615.71 samples/sec   Loss 2.1432   LearningRate 0.0022   Epoch: 17   Global Step: 705650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:02:49,773-Speed 2626.11 samples/sec   Loss 2.0577   LearningRate 0.0022   Epoch: 17   Global Step: 705660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:02:53,719-Speed 2595.53 samples/sec   Loss 2.1076   LearningRate 0.0022   Epoch: 17   Global Step: 705670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:02:57,632-Speed 2618.24 samples/sec   Loss 2.0907   LearningRate 0.0022   Epoch: 17   Global Step: 705680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:03:01,560-Speed 2608.89 samples/sec   Loss 2.0630   LearningRate 0.0022   Epoch: 17   Global Step: 705690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:03:05,454-Speed 2629.64 samples/sec   Loss 2.1167   LearningRate 0.0022   Epoch: 17   Global Step: 705700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:03:09,348-Speed 2630.66 samples/sec   Loss 2.0835   LearningRate 0.0022   Epoch: 17   Global Step: 705710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:03:13,247-Speed 2627.34 samples/sec   Loss 2.0961   LearningRate 0.0022   Epoch: 17   Global Step: 705720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:03:17,145-Speed 2627.47 samples/sec   Loss 2.1553   LearningRate 0.0022   Epoch: 17   Global Step: 705730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:03:21,040-Speed 2629.32 samples/sec   Loss 2.0894   LearningRate 0.0022   Epoch: 17   Global Step: 705740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:24,955-Speed 2616.80 samples/sec   Loss 2.0830   LearningRate 0.0022   Epoch: 17   Global Step: 705750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:28,855-Speed 2625.87 samples/sec   Loss 2.0416   LearningRate 0.0022   Epoch: 17   Global Step: 705760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:32,754-Speed 2628.21 samples/sec   Loss 2.0370   LearningRate 0.0022   Epoch: 17   Global Step: 705770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:36,652-Speed 2627.70 samples/sec   Loss 2.0920   LearningRate 0.0022   Epoch: 17   Global Step: 705780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:40,552-Speed 2626.14 samples/sec   Loss 2.1195   LearningRate 0.0022   Epoch: 17   Global Step: 705790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:44,458-Speed 2622.25 samples/sec   Loss 2.0113   LearningRate 0.0022   Epoch: 17   Global Step: 705800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:48,357-Speed 2627.03 samples/sec   Loss 2.0259   LearningRate 0.0022   Epoch: 17   Global Step: 705810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:52,254-Speed 2627.88 samples/sec   Loss 2.1026   LearningRate 0.0022   Epoch: 17   Global Step: 705820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:03:56,153-Speed 2627.15 samples/sec   Loss 2.0169   LearningRate 0.0022   Epoch: 17   Global Step: 705830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:04:00,054-Speed 2625.63 samples/sec   Loss 2.1128   LearningRate 0.0022   Epoch: 17   Global Step: 705840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:04:03,930-Speed 2642.56 samples/sec   Loss 2.0664   LearningRate 0.0022   Epoch: 17   Global Step: 705850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:04:07,831-Speed 2625.67 samples/sec   Loss 2.1135   LearningRate 0.0022   Epoch: 17   Global Step: 705860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:04:11,709-Speed 2641.32 samples/sec   Loss 2.0848   LearningRate 0.0022   Epoch: 17   Global Step: 705870   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:04:15,619-Speed 2619.31 samples/sec   Loss 2.1765   LearningRate 0.0022   Epoch: 17   Global Step: 705880   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:04:19,519-Speed 2626.07 samples/sec   Loss 2.1300   LearningRate 0.0022   Epoch: 17   Global Step: 705890   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:04:23,423-Speed 2623.60 samples/sec   Loss 2.0766   LearningRate 0.0022   Epoch: 17   Global Step: 705900   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:04:27,326-Speed 2624.92 samples/sec   Loss 2.0997   LearningRate 0.0022   Epoch: 17   Global Step: 705910   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:04:31,222-Speed 2629.01 samples/sec   Loss 2.0468   LearningRate 0.0022   Epoch: 17   Global Step: 705920   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:04:35,119-Speed 2628.17 samples/sec   Loss 2.0497   LearningRate 0.0022   Epoch: 17   Global Step: 705930   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:04:39,013-Speed 2630.56 samples/sec   Loss 2.1217   LearningRate 0.0022   Epoch: 17   Global Step: 705940   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:04:42,924-Speed 2619.20 samples/sec   Loss 2.1110   LearningRate 0.0022   Epoch: 17   Global Step: 705950   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:04:46,824-Speed 2626.09 samples/sec   Loss 2.1267   LearningRate 0.0022   Epoch: 17   Global Step: 705960   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:04:50,721-Speed 2629.45 samples/sec   Loss 2.1106   LearningRate 0.0022   Epoch: 17   Global Step: 705970   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:04:54,618-Speed 2628.63 samples/sec   Loss 2.1302   LearningRate 0.0022   Epoch: 17   Global Step: 705980   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:04:58,534-Speed 2615.25 samples/sec   Loss 2.0607   LearningRate 0.0022   Epoch: 17   Global Step: 705990   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:05:02,441-Speed 2621.34 samples/sec   Loss 2.0467   LearningRate 0.0022   Epoch: 17   Global Step: 706000   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:05:06,339-Speed 2628.48 samples/sec   Loss 2.0631   LearningRate 0.0022   Epoch: 17   Global Step: 706010   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:05:10,239-Speed 2626.25 samples/sec   Loss 2.0517   LearningRate 0.0022   Epoch: 17   Global Step: 706020   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-16 03:05:14,144-Speed 2623.19 samples/sec   Loss 2.0613   LearningRate 0.0022   Epoch: 17   Global Step: 706030   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:18,070-Speed 2608.54 samples/sec   Loss 2.0879   LearningRate 0.0022   Epoch: 17   Global Step: 706040   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:22,064-Speed 2565.11 samples/sec   Loss 2.1883   LearningRate 0.0022   Epoch: 17   Global Step: 706050   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:25,965-Speed 2625.78 samples/sec   Loss 2.2254   LearningRate 0.0022   Epoch: 17   Global Step: 706060   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:29,863-Speed 2627.32 samples/sec   Loss 2.1142   LearningRate 0.0022   Epoch: 17   Global Step: 706070   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:33,761-Speed 2627.48 samples/sec   Loss 2.1076   LearningRate 0.0022   Epoch: 17   Global Step: 706080   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:37,666-Speed 2623.66 samples/sec   Loss 2.0923   LearningRate 0.0022   Epoch: 17   Global Step: 706090   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:41,563-Speed 2628.72 samples/sec   Loss 2.0201   LearningRate 0.0022   Epoch: 17   Global Step: 706100   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:45,478-Speed 2615.54 samples/sec   Loss 2.0897   LearningRate 0.0022   Epoch: 17   Global Step: 706110   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:49,449-Speed 2580.46 samples/sec   Loss 2.0562   LearningRate 0.0022   Epoch: 17   Global Step: 706120   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:05:53,417-Speed 2581.44 samples/sec   Loss 2.0775   LearningRate 0.0022   Epoch: 17   Global Step: 706130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:05:57,322-Speed 2623.02 samples/sec   Loss 2.1334   LearningRate 0.0022   Epoch: 17   Global Step: 706140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:01,219-Speed 2627.84 samples/sec   Loss 2.0394   LearningRate 0.0022   Epoch: 17   Global Step: 706150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:05,116-Speed 2628.53 samples/sec   Loss 2.1846   LearningRate 0.0022   Epoch: 17   Global Step: 706160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:09,013-Speed 2628.29 samples/sec   Loss 2.0924   LearningRate 0.0022   Epoch: 17   Global Step: 706170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:12,909-Speed 2629.66 samples/sec   Loss 2.0885   LearningRate 0.0022   Epoch: 17   Global Step: 706180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:16,810-Speed 2625.58 samples/sec   Loss 2.1329   LearningRate 0.0022   Epoch: 17   Global Step: 706190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:20,718-Speed 2620.57 samples/sec   Loss 2.0610   LearningRate 0.0022   Epoch: 17   Global Step: 706200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:24,616-Speed 2627.82 samples/sec   Loss 2.0825   LearningRate 0.0022   Epoch: 17   Global Step: 706210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:06:28,495-Speed 2640.29 samples/sec   Loss 2.0893   LearningRate 0.0022   Epoch: 17   Global Step: 706220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:32,391-Speed 2629.37 samples/sec   Loss 2.0826   LearningRate 0.0022   Epoch: 17   Global Step: 706230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:36,291-Speed 2626.78 samples/sec   Loss 2.1128   LearningRate 0.0022   Epoch: 17   Global Step: 706240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:40,190-Speed 2626.76 samples/sec   Loss 2.1020   LearningRate 0.0022   Epoch: 17   Global Step: 706250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:44,084-Speed 2630.20 samples/sec   Loss 2.1426   LearningRate 0.0022   Epoch: 17   Global Step: 706260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:47,980-Speed 2629.31 samples/sec   Loss 2.0784   LearningRate 0.0022   Epoch: 17   Global Step: 706270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:51,874-Speed 2630.49 samples/sec   Loss 2.1558   LearningRate 0.0022   Epoch: 17   Global Step: 706280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:55,781-Speed 2622.34 samples/sec   Loss 2.0612   LearningRate 0.0022   Epoch: 17   Global Step: 706290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:06:59,676-Speed 2629.35 samples/sec   Loss 2.1148   LearningRate 0.0022   Epoch: 17   Global Step: 706300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:07:03,574-Speed 2627.37 samples/sec   Loss 2.0812   LearningRate 0.0022   Epoch: 17   Global Step: 706310   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:07:07,486-Speed 2618.10 samples/sec   Loss 2.1049   LearningRate 0.0022   Epoch: 17   Global Step: 706320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:11,389-Speed 2624.56 samples/sec   Loss 2.0670   LearningRate 0.0022   Epoch: 17   Global Step: 706330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:15,293-Speed 2623.56 samples/sec   Loss 2.1047   LearningRate 0.0022   Epoch: 17   Global Step: 706340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:19,190-Speed 2627.63 samples/sec   Loss 2.1184   LearningRate 0.0022   Epoch: 17   Global Step: 706350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:23,083-Speed 2631.25 samples/sec   Loss 2.1431   LearningRate 0.0022   Epoch: 17   Global Step: 706360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:26,980-Speed 2628.49 samples/sec   Loss 2.0994   LearningRate 0.0022   Epoch: 17   Global Step: 706370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:30,884-Speed 2623.80 samples/sec   Loss 2.1045   LearningRate 0.0022   Epoch: 17   Global Step: 706380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:34,782-Speed 2627.93 samples/sec   Loss 2.1334   LearningRate 0.0022   Epoch: 17   Global Step: 706390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:38,680-Speed 2627.14 samples/sec   Loss 2.1031   LearningRate 0.0022   Epoch: 17   Global Step: 706400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:42,594-Speed 2616.92 samples/sec   Loss 2.1785   LearningRate 0.0022   Epoch: 17   Global Step: 706410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:46,492-Speed 2628.08 samples/sec   Loss 2.0976   LearningRate 0.0022   Epoch: 17   Global Step: 706420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:07:50,402-Speed 2619.18 samples/sec   Loss 2.0805   LearningRate 0.0022   Epoch: 17   Global Step: 706430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:07:54,276-Speed 2644.91 samples/sec   Loss 2.1206   LearningRate 0.0022   Epoch: 17   Global Step: 706440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:07:58,173-Speed 2628.04 samples/sec   Loss 2.1190   LearningRate 0.0022   Epoch: 17   Global Step: 706450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:02,069-Speed 2629.27 samples/sec   Loss 2.1023   LearningRate 0.0022   Epoch: 17   Global Step: 706460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:05,966-Speed 2628.32 samples/sec   Loss 2.1308   LearningRate 0.0022   Epoch: 17   Global Step: 706470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:09,857-Speed 2632.25 samples/sec   Loss 2.1290   LearningRate 0.0022   Epoch: 17   Global Step: 706480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:13,754-Speed 2628.18 samples/sec   Loss 2.0189   LearningRate 0.0022   Epoch: 17   Global Step: 706490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:17,648-Speed 2631.34 samples/sec   Loss 2.0531   LearningRate 0.0022   Epoch: 17   Global Step: 706500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:21,548-Speed 2626.38 samples/sec   Loss 2.0790   LearningRate 0.0022   Epoch: 17   Global Step: 706510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:25,462-Speed 2616.86 samples/sec   Loss 2.0884   LearningRate 0.0022   Epoch: 17   Global Step: 706520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:29,367-Speed 2623.01 samples/sec   Loss 2.1032   LearningRate 0.0022   Epoch: 17   Global Step: 706530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:33,276-Speed 2620.09 samples/sec   Loss 2.1212   LearningRate 0.0022   Epoch: 17   Global Step: 706540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:08:37,153-Speed 2641.94 samples/sec   Loss 2.1168   LearningRate 0.0022   Epoch: 17   Global Step: 706550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:41,051-Speed 2627.91 samples/sec   Loss 2.1060   LearningRate 0.0022   Epoch: 17   Global Step: 706560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:44,948-Speed 2627.95 samples/sec   Loss 2.1422   LearningRate 0.0022   Epoch: 17   Global Step: 706570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:48,942-Speed 2564.77 samples/sec   Loss 2.0717   LearningRate 0.0022   Epoch: 17   Global Step: 706580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:52,967-Speed 2544.83 samples/sec   Loss 2.1448   LearningRate 0.0022   Epoch: 17   Global Step: 706590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:08:56,880-Speed 2617.42 samples/sec   Loss 2.0566   LearningRate 0.0022   Epoch: 17   Global Step: 706600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:09:00,780-Speed 2626.52 samples/sec   Loss 2.0653   LearningRate 0.0022   Epoch: 17   Global Step: 706610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:09:04,708-Speed 2607.71 samples/sec   Loss 2.1232   LearningRate 0.0022   Epoch: 17   Global Step: 706620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:09:08,609-Speed 2625.67 samples/sec   Loss 2.1162   LearningRate 0.0022   Epoch: 17   Global Step: 706630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:09:12,516-Speed 2621.25 samples/sec   Loss 2.1176   LearningRate 0.0022   Epoch: 17   Global Step: 706640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:09:16,484-Speed 2582.04 samples/sec   Loss 2.1302   LearningRate 0.0022   Epoch: 17   Global Step: 706650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:20,414-Speed 2606.03 samples/sec   Loss 2.2041   LearningRate 0.0022   Epoch: 17   Global Step: 706660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:24,396-Speed 2572.51 samples/sec   Loss 2.1098   LearningRate 0.0022   Epoch: 17   Global Step: 706670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:28,289-Speed 2631.14 samples/sec   Loss 2.0927   LearningRate 0.0022   Epoch: 17   Global Step: 706680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:32,266-Speed 2575.15 samples/sec   Loss 2.1052   LearningRate 0.0022   Epoch: 17   Global Step: 706690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:36,205-Speed 2600.44 samples/sec   Loss 2.0420   LearningRate 0.0022   Epoch: 17   Global Step: 706700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:40,108-Speed 2624.39 samples/sec   Loss 2.1080   LearningRate 0.0022   Epoch: 17   Global Step: 706710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:44,006-Speed 2627.65 samples/sec   Loss 2.0992   LearningRate 0.0022   Epoch: 17   Global Step: 706720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:47,899-Speed 2630.88 samples/sec   Loss 2.1183   LearningRate 0.0022   Epoch: 17   Global Step: 706730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:51,820-Speed 2613.46 samples/sec   Loss 2.0841   LearningRate 0.0022   Epoch: 17   Global Step: 706740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:09:55,834-Speed 2551.61 samples/sec   Loss 2.0929   LearningRate 0.0022   Epoch: 17   Global Step: 706750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:09:59,738-Speed 2623.88 samples/sec   Loss 1.9967   LearningRate 0.0022   Epoch: 17   Global Step: 706760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:10:03,633-Speed 2629.68 samples/sec   Loss 2.1083   LearningRate 0.0022   Epoch: 17   Global Step: 706770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:10:07,540-Speed 2621.05 samples/sec   Loss 2.1423   LearningRate 0.0022   Epoch: 17   Global Step: 706780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:10:11,413-Speed 2645.15 samples/sec   Loss 2.0854   LearningRate 0.0022   Epoch: 17   Global Step: 706790   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:15,386-Speed 2578.26 samples/sec   Loss 2.1461   LearningRate 0.0022   Epoch: 17   Global Step: 706800   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:19,448-Speed 2521.40 samples/sec   Loss 2.0981   LearningRate 0.0022   Epoch: 17   Global Step: 706810   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:23,381-Speed 2603.65 samples/sec   Loss 2.1062   LearningRate 0.0022   Epoch: 17   Global Step: 706820   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:27,292-Speed 2619.46 samples/sec   Loss 2.0334   LearningRate 0.0022   Epoch: 17   Global Step: 706830   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:31,192-Speed 2625.83 samples/sec   Loss 2.0649   LearningRate 0.0022   Epoch: 17   Global Step: 706840   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:35,103-Speed 2618.96 samples/sec   Loss 2.1254   LearningRate 0.0022   Epoch: 17   Global Step: 706850   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:39,010-Speed 2621.38 samples/sec   Loss 2.0675   LearningRate 0.0022   Epoch: 17   Global Step: 706860   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:42,909-Speed 2627.58 samples/sec   Loss 2.0454   LearningRate 0.0022   Epoch: 17   Global Step: 706870   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:46,809-Speed 2626.31 samples/sec   Loss 2.0966   LearningRate 0.0022   Epoch: 17   Global Step: 706880   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:10:50,710-Speed 2625.04 samples/sec   Loss 2.0978   LearningRate 0.0022   Epoch: 17   Global Step: 706890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:10:54,639-Speed 2607.23 samples/sec   Loss 2.0616   LearningRate 0.0022   Epoch: 17   Global Step: 706900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:10:58,550-Speed 2618.82 samples/sec   Loss 2.0965   LearningRate 0.0022   Epoch: 17   Global Step: 706910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:02,458-Speed 2620.89 samples/sec   Loss 2.0833   LearningRate 0.0022   Epoch: 17   Global Step: 706920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:06,364-Speed 2622.37 samples/sec   Loss 2.1444   LearningRate 0.0022   Epoch: 17   Global Step: 706930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:10,273-Speed 2620.29 samples/sec   Loss 2.0503   LearningRate 0.0022   Epoch: 17   Global Step: 706940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:14,171-Speed 2627.97 samples/sec   Loss 2.0416   LearningRate 0.0022   Epoch: 17   Global Step: 706950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:18,147-Speed 2576.37 samples/sec   Loss 2.0631   LearningRate 0.0022   Epoch: 17   Global Step: 706960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:22,045-Speed 2626.91 samples/sec   Loss 2.0870   LearningRate 0.0022   Epoch: 17   Global Step: 706970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:25,957-Speed 2619.14 samples/sec   Loss 2.1182   LearningRate 0.0022   Epoch: 17   Global Step: 706980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:29,847-Speed 2633.05 samples/sec   Loss 2.0781   LearningRate 0.0022   Epoch: 17   Global Step: 706990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:11:33,722-Speed 2643.07 samples/sec   Loss 2.1313   LearningRate 0.0022   Epoch: 17   Global Step: 707000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:37,619-Speed 2627.81 samples/sec   Loss 2.0434   LearningRate 0.0022   Epoch: 17   Global Step: 707010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:41,520-Speed 2626.44 samples/sec   Loss 2.1020   LearningRate 0.0022   Epoch: 17   Global Step: 707020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:11:45,398-Speed 2641.14 samples/sec   Loss 2.0926   LearningRate 0.0022   Epoch: 17   Global Step: 707030   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:11:49,292-Speed 2630.33 samples/sec   Loss 2.1285   LearningRate 0.0022   Epoch: 17   Global Step: 707040   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:11:53,191-Speed 2626.63 samples/sec   Loss 2.0422   LearningRate 0.0022   Epoch: 17   Global Step: 707050   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:11:57,098-Speed 2622.07 samples/sec   Loss 2.1622   LearningRate 0.0022   Epoch: 17   Global Step: 707060   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:01,003-Speed 2623.23 samples/sec   Loss 2.1143   LearningRate 0.0022   Epoch: 17   Global Step: 707070   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:04,915-Speed 2617.73 samples/sec   Loss 2.0733   LearningRate 0.0022   Epoch: 17   Global Step: 707080   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:08,819-Speed 2623.32 samples/sec   Loss 2.1305   LearningRate 0.0022   Epoch: 17   Global Step: 707090   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:12,720-Speed 2626.50 samples/sec   Loss 2.1497   LearningRate 0.0022   Epoch: 17   Global Step: 707100   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:16,619-Speed 2627.41 samples/sec   Loss 2.0483   LearningRate 0.0022   Epoch: 17   Global Step: 707110   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:20,518-Speed 2626.76 samples/sec   Loss 2.0918   LearningRate 0.0022   Epoch: 17   Global Step: 707120   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:24,458-Speed 2599.74 samples/sec   Loss 2.1401   LearningRate 0.0022   Epoch: 17   Global Step: 707130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:12:28,401-Speed 2597.98 samples/sec   Loss 2.0886   LearningRate 0.0022   Epoch: 17   Global Step: 707140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:12:32,299-Speed 2627.34 samples/sec   Loss 2.0697   LearningRate 0.0022   Epoch: 17   Global Step: 707150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:12:36,197-Speed 2627.81 samples/sec   Loss 2.0643   LearningRate 0.0022   Epoch: 17   Global Step: 707160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:12:40,166-Speed 2580.92 samples/sec   Loss 2.1370   LearningRate 0.0022   Epoch: 17   Global Step: 707170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:12:44,040-Speed 2643.84 samples/sec   Loss 2.0683   LearningRate 0.0022   Epoch: 17   Global Step: 707180   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:47,989-Speed 2593.75 samples/sec   Loss 2.0408   LearningRate 0.0022   Epoch: 17   Global Step: 707190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:51,899-Speed 2619.78 samples/sec   Loss 2.0372   LearningRate 0.0022   Epoch: 17   Global Step: 707200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:55,789-Speed 2632.88 samples/sec   Loss 2.0826   LearningRate 0.0022   Epoch: 17   Global Step: 707210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:12:59,689-Speed 2626.24 samples/sec   Loss 2.1229   LearningRate 0.0022   Epoch: 17   Global Step: 707220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:13:03,583-Speed 2630.38 samples/sec   Loss 2.0427   LearningRate 0.0022   Epoch: 17   Global Step: 707230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:13:07,482-Speed 2627.19 samples/sec   Loss 2.0415   LearningRate 0.0022   Epoch: 17   Global Step: 707240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:13:11,374-Speed 2631.55 samples/sec   Loss 2.0150   LearningRate 0.0022   Epoch: 17   Global Step: 707250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:13:15,269-Speed 2629.82 samples/sec   Loss 2.1447   LearningRate 0.0022   Epoch: 17   Global Step: 707260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:13:19,171-Speed 2624.50 samples/sec   Loss 2.0880   LearningRate 0.0022   Epoch: 17   Global Step: 707270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:13:23,072-Speed 2626.55 samples/sec   Loss 2.0787   LearningRate 0.0022   Epoch: 17   Global Step: 707280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:26,970-Speed 2627.11 samples/sec   Loss 2.1661   LearningRate 0.0022   Epoch: 17   Global Step: 707290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:30,869-Speed 2626.99 samples/sec   Loss 2.1061   LearningRate 0.0022   Epoch: 17   Global Step: 707300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:34,768-Speed 2627.38 samples/sec   Loss 2.0717   LearningRate 0.0022   Epoch: 17   Global Step: 707310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:38,661-Speed 2630.87 samples/sec   Loss 2.1222   LearningRate 0.0022   Epoch: 17   Global Step: 707320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:42,555-Speed 2629.90 samples/sec   Loss 2.0382   LearningRate 0.0022   Epoch: 17   Global Step: 707330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:46,463-Speed 2621.42 samples/sec   Loss 2.0439   LearningRate 0.0022   Epoch: 17   Global Step: 707340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:50,360-Speed 2628.42 samples/sec   Loss 2.0421   LearningRate 0.0022   Epoch: 17   Global Step: 707350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:54,270-Speed 2619.34 samples/sec   Loss 2.0766   LearningRate 0.0022   Epoch: 17   Global Step: 707360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:13:58,169-Speed 2627.48 samples/sec   Loss 2.0773   LearningRate 0.0022   Epoch: 17   Global Step: 707370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:02,065-Speed 2629.00 samples/sec   Loss 2.1377   LearningRate 0.0022   Epoch: 17   Global Step: 707380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:14:06,015-Speed 2592.56 samples/sec   Loss 2.0632   LearningRate 0.0022   Epoch: 17   Global Step: 707390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:14:09,928-Speed 2617.79 samples/sec   Loss 2.0824   LearningRate 0.0022   Epoch: 17   Global Step: 707400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:14:13,827-Speed 2626.44 samples/sec   Loss 2.0184   LearningRate 0.0022   Epoch: 17   Global Step: 707410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:14:17,743-Speed 2616.28 samples/sec   Loss 2.0564   LearningRate 0.0022   Epoch: 17   Global Step: 707420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:21,652-Speed 2620.10 samples/sec   Loss 2.0810   LearningRate 0.0022   Epoch: 17   Global Step: 707430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:25,556-Speed 2623.84 samples/sec   Loss 2.1461   LearningRate 0.0022   Epoch: 17   Global Step: 707440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:29,452-Speed 2629.17 samples/sec   Loss 2.0719   LearningRate 0.0022   Epoch: 17   Global Step: 707450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:33,356-Speed 2623.46 samples/sec   Loss 2.0867   LearningRate 0.0022   Epoch: 17   Global Step: 707460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:37,249-Speed 2631.54 samples/sec   Loss 2.1042   LearningRate 0.0022   Epoch: 17   Global Step: 707470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:41,166-Speed 2614.49 samples/sec   Loss 2.0643   LearningRate 0.0022   Epoch: 17   Global Step: 707480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:45,057-Speed 2632.35 samples/sec   Loss 2.0498   LearningRate 0.0022   Epoch: 17   Global Step: 707490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:48,954-Speed 2628.61 samples/sec   Loss 2.0665   LearningRate 0.0022   Epoch: 17   Global Step: 707500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:52,851-Speed 2628.28 samples/sec   Loss 2.0658   LearningRate 0.0022   Epoch: 17   Global Step: 707510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:14:56,751-Speed 2626.14 samples/sec   Loss 2.1444   LearningRate 0.0022   Epoch: 17   Global Step: 707520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:15:00,621-Speed 2647.05 samples/sec   Loss 2.0699   LearningRate 0.0022   Epoch: 17   Global Step: 707530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:15:04,520-Speed 2626.77 samples/sec   Loss 2.1096   LearningRate 0.0022   Epoch: 17   Global Step: 707540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:15:08,427-Speed 2621.45 samples/sec   Loss 2.0431   LearningRate 0.0022   Epoch: 17   Global Step: 707550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:15:12,320-Speed 2631.55 samples/sec   Loss 2.0333   LearningRate 0.0022   Epoch: 17   Global Step: 707560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:15:16,219-Speed 2626.98 samples/sec   Loss 2.0923   LearningRate 0.0022   Epoch: 17   Global Step: 707570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:15:20,118-Speed 2626.88 samples/sec   Loss 2.0254   LearningRate 0.0022   Epoch: 17   Global Step: 707580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:15:24,013-Speed 2629.56 samples/sec   Loss 2.0650   LearningRate 0.0022   Epoch: 17   Global Step: 707590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:15:27,889-Speed 2642.62 samples/sec   Loss 2.1057   LearningRate 0.0022   Epoch: 17   Global Step: 707600   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:31,789-Speed 2626.44 samples/sec   Loss 2.0836   LearningRate 0.0022   Epoch: 17   Global Step: 707610   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:35,688-Speed 2626.95 samples/sec   Loss 2.0564   LearningRate 0.0022   Epoch: 17   Global Step: 707620   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:39,583-Speed 2629.95 samples/sec   Loss 2.0786   LearningRate 0.0022   Epoch: 17   Global Step: 707630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:43,508-Speed 2608.85 samples/sec   Loss 2.0869   LearningRate 0.0022   Epoch: 17   Global Step: 707640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:47,404-Speed 2629.39 samples/sec   Loss 2.1207   LearningRate 0.0022   Epoch: 17   Global Step: 707650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:51,299-Speed 2629.31 samples/sec   Loss 2.1303   LearningRate 0.0022   Epoch: 17   Global Step: 707660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:55,223-Speed 2610.88 samples/sec   Loss 2.0557   LearningRate 0.0022   Epoch: 17   Global Step: 707670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:15:59,116-Speed 2630.32 samples/sec   Loss 2.1102   LearningRate 0.0022   Epoch: 17   Global Step: 707680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:03,012-Speed 2629.54 samples/sec   Loss 2.0969   LearningRate 0.0022   Epoch: 17   Global Step: 707690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:06,914-Speed 2624.75 samples/sec   Loss 2.0950   LearningRate 0.0022   Epoch: 17   Global Step: 707700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:16:10,822-Speed 2621.27 samples/sec   Loss 2.0353   LearningRate 0.0022   Epoch: 17   Global Step: 707710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:16:14,694-Speed 2644.65 samples/sec   Loss 2.1439   LearningRate 0.0022   Epoch: 17   Global Step: 707720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:18,612-Speed 2614.54 samples/sec   Loss 2.1009   LearningRate 0.0022   Epoch: 17   Global Step: 707730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:22,533-Speed 2612.22 samples/sec   Loss 2.0805   LearningRate 0.0022   Epoch: 17   Global Step: 707740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:26,428-Speed 2629.13 samples/sec   Loss 2.0533   LearningRate 0.0022   Epoch: 17   Global Step: 707750   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:30,341-Speed 2617.82 samples/sec   Loss 2.0966   LearningRate 0.0022   Epoch: 17   Global Step: 707760   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:34,239-Speed 2627.36 samples/sec   Loss 2.0122   LearningRate 0.0022   Epoch: 17   Global Step: 707770   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:38,138-Speed 2626.86 samples/sec   Loss 2.1567   LearningRate 0.0022   Epoch: 17   Global Step: 707780   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:42,082-Speed 2597.70 samples/sec   Loss 2.0071   LearningRate 0.0022   Epoch: 17   Global Step: 707790   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:45,982-Speed 2626.18 samples/sec   Loss 2.0910   LearningRate 0.0022   Epoch: 17   Global Step: 707800   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:49,880-Speed 2628.07 samples/sec   Loss 2.0255   LearningRate 0.0022   Epoch: 17   Global Step: 707810   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:16:53,777-Speed 2627.81 samples/sec   Loss 2.1153   LearningRate 0.0022   Epoch: 17   Global Step: 707820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:16:57,697-Speed 2613.81 samples/sec   Loss 2.0993   LearningRate 0.0022   Epoch: 17   Global Step: 707830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:01,598-Speed 2625.52 samples/sec   Loss 2.0400   LearningRate 0.0022   Epoch: 17   Global Step: 707840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:05,493-Speed 2629.09 samples/sec   Loss 2.0627   LearningRate 0.0022   Epoch: 17   Global Step: 707850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:09,399-Speed 2622.40 samples/sec   Loss 2.1003   LearningRate 0.0022   Epoch: 17   Global Step: 707860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:13,299-Speed 2625.99 samples/sec   Loss 2.0451   LearningRate 0.0022   Epoch: 17   Global Step: 707870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:17,199-Speed 2626.47 samples/sec   Loss 2.0949   LearningRate 0.0022   Epoch: 17   Global Step: 707880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:21,100-Speed 2625.74 samples/sec   Loss 2.0117   LearningRate 0.0022   Epoch: 17   Global Step: 707890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:25,004-Speed 2623.91 samples/sec   Loss 2.0457   LearningRate 0.0022   Epoch: 17   Global Step: 707900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:28,900-Speed 2628.82 samples/sec   Loss 2.0709   LearningRate 0.0022   Epoch: 17   Global Step: 707910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:32,799-Speed 2627.14 samples/sec   Loss 2.0319   LearningRate 0.0022   Epoch: 17   Global Step: 707920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:17:36,691-Speed 2631.24 samples/sec   Loss 2.0883   LearningRate 0.0022   Epoch: 17   Global Step: 707930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:17:40,588-Speed 2628.55 samples/sec   Loss 2.0723   LearningRate 0.0021   Epoch: 17   Global Step: 707940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:44,493-Speed 2622.41 samples/sec   Loss 2.0169   LearningRate 0.0021   Epoch: 17   Global Step: 707950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:48,391-Speed 2628.56 samples/sec   Loss 2.1438   LearningRate 0.0021   Epoch: 17   Global Step: 707960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:52,289-Speed 2627.81 samples/sec   Loss 2.0846   LearningRate 0.0021   Epoch: 17   Global Step: 707970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:17:56,183-Speed 2630.73 samples/sec   Loss 2.1236   LearningRate 0.0021   Epoch: 17   Global Step: 707980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:18:00,105-Speed 2611.54 samples/sec   Loss 2.0846   LearningRate 0.0021   Epoch: 17   Global Step: 707990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:18:04,006-Speed 2625.90 samples/sec   Loss 2.0863   LearningRate 0.0021   Epoch: 17   Global Step: 708000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:18:07,905-Speed 2626.79 samples/sec   Loss 2.1028   LearningRate 0.0021   Epoch: 17   Global Step: 708010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:18:11,799-Speed 2629.88 samples/sec   Loss 2.0596   LearningRate 0.0021   Epoch: 17   Global Step: 708020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:18:15,696-Speed 2628.43 samples/sec   Loss 2.0901   LearningRate 0.0021   Epoch: 17   Global Step: 708030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:18:19,598-Speed 2625.28 samples/sec   Loss 2.0098   LearningRate 0.0021   Epoch: 17   Global Step: 708040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:18:23,476-Speed 2640.85 samples/sec   Loss 2.0939   LearningRate 0.0021   Epoch: 17   Global Step: 708050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:18:27,349-Speed 2644.36 samples/sec   Loss 1.9889   LearningRate 0.0021   Epoch: 17   Global Step: 708060   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:31,248-Speed 2627.23 samples/sec   Loss 2.0904   LearningRate 0.0021   Epoch: 17   Global Step: 708070   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:35,147-Speed 2626.98 samples/sec   Loss 2.0496   LearningRate 0.0021   Epoch: 17   Global Step: 708080   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:39,050-Speed 2624.08 samples/sec   Loss 2.1089   LearningRate 0.0021   Epoch: 17   Global Step: 708090   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:42,953-Speed 2624.04 samples/sec   Loss 2.0539   LearningRate 0.0021   Epoch: 17   Global Step: 708100   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:46,895-Speed 2598.49 samples/sec   Loss 2.1049   LearningRate 0.0021   Epoch: 17   Global Step: 708110   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:50,790-Speed 2629.70 samples/sec   Loss 2.0050   LearningRate 0.0021   Epoch: 17   Global Step: 708120   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:54,724-Speed 2603.65 samples/sec   Loss 2.0531   LearningRate 0.0021   Epoch: 17   Global Step: 708130   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:18:58,725-Speed 2559.98 samples/sec   Loss 2.1020   LearningRate 0.0021   Epoch: 17   Global Step: 708140   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:19:02,621-Speed 2628.65 samples/sec   Loss 2.0288   LearningRate 0.0021   Epoch: 17   Global Step: 708150   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:19:06,514-Speed 2630.57 samples/sec   Loss 2.0719   LearningRate 0.0021   Epoch: 17   Global Step: 708160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:10,412-Speed 2628.24 samples/sec   Loss 2.0425   LearningRate 0.0021   Epoch: 17   Global Step: 708170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:14,310-Speed 2627.25 samples/sec   Loss 2.1222   LearningRate 0.0021   Epoch: 17   Global Step: 708180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:18,218-Speed 2621.41 samples/sec   Loss 2.0364   LearningRate 0.0021   Epoch: 17   Global Step: 708190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:22,113-Speed 2629.01 samples/sec   Loss 2.0360   LearningRate 0.0021   Epoch: 17   Global Step: 708200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:26,014-Speed 2625.57 samples/sec   Loss 2.0810   LearningRate 0.0021   Epoch: 17   Global Step: 708210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:29,909-Speed 2629.73 samples/sec   Loss 2.0037   LearningRate 0.0021   Epoch: 17   Global Step: 708220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:33,807-Speed 2627.24 samples/sec   Loss 2.0956   LearningRate 0.0021   Epoch: 17   Global Step: 708230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:37,698-Speed 2632.04 samples/sec   Loss 2.0349   LearningRate 0.0021   Epoch: 17   Global Step: 708240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:41,600-Speed 2625.50 samples/sec   Loss 2.0730   LearningRate 0.0021   Epoch: 17   Global Step: 708250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:45,502-Speed 2624.73 samples/sec   Loss 2.0605   LearningRate 0.0021   Epoch: 17   Global Step: 708260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:19:49,381-Speed 2640.73 samples/sec   Loss 2.0423   LearningRate 0.0021   Epoch: 17   Global Step: 708270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:53,290-Speed 2620.44 samples/sec   Loss 2.0241   LearningRate 0.0021   Epoch: 17   Global Step: 708280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:19:57,205-Speed 2616.03 samples/sec   Loss 2.0545   LearningRate 0.0021   Epoch: 17   Global Step: 708290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:01,099-Speed 2630.24 samples/sec   Loss 2.1169   LearningRate 0.0021   Epoch: 17   Global Step: 708300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:04,992-Speed 2630.72 samples/sec   Loss 2.0347   LearningRate 0.0021   Epoch: 17   Global Step: 708310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:08,891-Speed 2626.65 samples/sec   Loss 2.0521   LearningRate 0.0021   Epoch: 17   Global Step: 708320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:12,790-Speed 2627.12 samples/sec   Loss 2.0446   LearningRate 0.0021   Epoch: 17   Global Step: 708330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:16,685-Speed 2629.45 samples/sec   Loss 2.0731   LearningRate 0.0021   Epoch: 17   Global Step: 708340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:20,584-Speed 2627.07 samples/sec   Loss 2.0923   LearningRate 0.0021   Epoch: 17   Global Step: 708350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:24,476-Speed 2631.88 samples/sec   Loss 2.1162   LearningRate 0.0021   Epoch: 17   Global Step: 708360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:28,346-Speed 2646.88 samples/sec   Loss 2.0539   LearningRate 0.0021   Epoch: 17   Global Step: 708370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:32,242-Speed 2628.65 samples/sec   Loss 2.0952   LearningRate 0.0021   Epoch: 17   Global Step: 708380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:36,140-Speed 2627.65 samples/sec   Loss 2.0648   LearningRate 0.0021   Epoch: 17   Global Step: 708390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:40,031-Speed 2632.48 samples/sec   Loss 2.1207   LearningRate 0.0021   Epoch: 17   Global Step: 708400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:43,925-Speed 2629.82 samples/sec   Loss 2.0576   LearningRate 0.0021   Epoch: 17   Global Step: 708410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:47,819-Speed 2630.62 samples/sec   Loss 2.1166   LearningRate 0.0021   Epoch: 17   Global Step: 708420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:51,716-Speed 2628.68 samples/sec   Loss 2.1178   LearningRate 0.0021   Epoch: 17   Global Step: 708430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:55,612-Speed 2628.93 samples/sec   Loss 2.0138   LearningRate 0.0021   Epoch: 17   Global Step: 708440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:20:59,504-Speed 2631.55 samples/sec   Loss 2.0791   LearningRate 0.0021   Epoch: 17   Global Step: 708450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:03,395-Speed 2632.13 samples/sec   Loss 2.0984   LearningRate 0.0021   Epoch: 17   Global Step: 708460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:07,296-Speed 2625.97 samples/sec   Loss 2.1176   LearningRate 0.0021   Epoch: 17   Global Step: 708470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:21:11,168-Speed 2645.26 samples/sec   Loss 2.0521   LearningRate 0.0021   Epoch: 17   Global Step: 708480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:15,064-Speed 2628.82 samples/sec   Loss 2.1212   LearningRate 0.0021   Epoch: 17   Global Step: 708490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:18,962-Speed 2627.37 samples/sec   Loss 2.0245   LearningRate 0.0021   Epoch: 17   Global Step: 708500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:22,860-Speed 2628.03 samples/sec   Loss 2.0306   LearningRate 0.0021   Epoch: 17   Global Step: 708510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:26,758-Speed 2627.82 samples/sec   Loss 2.1307   LearningRate 0.0021   Epoch: 17   Global Step: 708520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:30,653-Speed 2629.39 samples/sec   Loss 2.0496   LearningRate 0.0021   Epoch: 17   Global Step: 708530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:34,591-Speed 2600.72 samples/sec   Loss 2.0398   LearningRate 0.0021   Epoch: 17   Global Step: 708540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:38,664-Speed 2515.12 samples/sec   Loss 1.9362   LearningRate 0.0021   Epoch: 17   Global Step: 708550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:42,705-Speed 2534.98 samples/sec   Loss 2.0770   LearningRate 0.0021   Epoch: 17   Global Step: 708560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:46,594-Speed 2633.06 samples/sec   Loss 2.0512   LearningRate 0.0021   Epoch: 17   Global Step: 708570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:21:50,488-Speed 2631.03 samples/sec   Loss 2.1005   LearningRate 0.0021   Epoch: 17   Global Step: 708580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:21:54,380-Speed 2631.30 samples/sec   Loss 2.0206   LearningRate 0.0021   Epoch: 17   Global Step: 708590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:21:58,263-Speed 2637.51 samples/sec   Loss 2.0272   LearningRate 0.0021   Epoch: 17   Global Step: 708600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:02,162-Speed 2626.98 samples/sec   Loss 2.0817   LearningRate 0.0021   Epoch: 17   Global Step: 708610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:06,057-Speed 2629.51 samples/sec   Loss 2.0409   LearningRate 0.0021   Epoch: 17   Global Step: 708620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:09,952-Speed 2629.90 samples/sec   Loss 2.0306   LearningRate 0.0021   Epoch: 17   Global Step: 708630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:13,860-Speed 2620.98 samples/sec   Loss 2.1577   LearningRate 0.0021   Epoch: 17   Global Step: 708640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:17,757-Speed 2628.56 samples/sec   Loss 2.0661   LearningRate 0.0021   Epoch: 17   Global Step: 708650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:21,666-Speed 2619.82 samples/sec   Loss 2.0486   LearningRate 0.0021   Epoch: 17   Global Step: 708660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:25,558-Speed 2631.59 samples/sec   Loss 2.0614   LearningRate 0.0021   Epoch: 17   Global Step: 708670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:29,455-Speed 2628.44 samples/sec   Loss 2.0722   LearningRate 0.0021   Epoch: 17   Global Step: 708680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:33,353-Speed 2628.28 samples/sec   Loss 2.0636   LearningRate 0.0021   Epoch: 17   Global Step: 708690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:37,255-Speed 2624.61 samples/sec   Loss 2.0843   LearningRate 0.0021   Epoch: 17   Global Step: 708700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:22:41,139-Speed 2637.18 samples/sec   Loss 2.0942   LearningRate 0.0021   Epoch: 17   Global Step: 708710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:45,049-Speed 2619.72 samples/sec   Loss 2.0521   LearningRate 0.0021   Epoch: 17   Global Step: 708720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:48,950-Speed 2625.43 samples/sec   Loss 2.0463   LearningRate 0.0021   Epoch: 17   Global Step: 708730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:52,847-Speed 2629.43 samples/sec   Loss 2.0024   LearningRate 0.0021   Epoch: 17   Global Step: 708740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:22:56,740-Speed 2630.30 samples/sec   Loss 2.0421   LearningRate 0.0021   Epoch: 17   Global Step: 708750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:00,652-Speed 2618.18 samples/sec   Loss 2.0694   LearningRate 0.0021   Epoch: 17   Global Step: 708760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:04,544-Speed 2631.49 samples/sec   Loss 2.0048   LearningRate 0.0021   Epoch: 17   Global Step: 708770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:08,436-Speed 2632.13 samples/sec   Loss 2.0870   LearningRate 0.0021   Epoch: 17   Global Step: 708780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:12,334-Speed 2627.80 samples/sec   Loss 2.0319   LearningRate 0.0021   Epoch: 17   Global Step: 708790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:16,228-Speed 2629.76 samples/sec   Loss 2.0082   LearningRate 0.0021   Epoch: 17   Global Step: 708800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:20,128-Speed 2626.37 samples/sec   Loss 2.0723   LearningRate 0.0021   Epoch: 17   Global Step: 708810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:23:24,003-Speed 2643.23 samples/sec   Loss 2.1539   LearningRate 0.0021   Epoch: 17   Global Step: 708820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:27,900-Speed 2629.54 samples/sec   Loss 2.1260   LearningRate 0.0021   Epoch: 17   Global Step: 708830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:31,800-Speed 2625.88 samples/sec   Loss 2.0116   LearningRate 0.0021   Epoch: 17   Global Step: 708840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:35,690-Speed 2632.65 samples/sec   Loss 2.0937   LearningRate 0.0021   Epoch: 17   Global Step: 708850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:39,581-Speed 2632.21 samples/sec   Loss 2.0977   LearningRate 0.0021   Epoch: 17   Global Step: 708860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:43,522-Speed 2599.21 samples/sec   Loss 2.0938   LearningRate 0.0021   Epoch: 17   Global Step: 708870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:47,414-Speed 2631.80 samples/sec   Loss 2.0690   LearningRate 0.0021   Epoch: 17   Global Step: 708880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:51,312-Speed 2628.14 samples/sec   Loss 2.0527   LearningRate 0.0021   Epoch: 17   Global Step: 708890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:55,209-Speed 2628.33 samples/sec   Loss 1.9943   LearningRate 0.0021   Epoch: 17   Global Step: 708900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:23:59,098-Speed 2633.40 samples/sec   Loss 2.0107   LearningRate 0.0021   Epoch: 17   Global Step: 708910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:02,975-Speed 2642.32 samples/sec   Loss 2.0876   LearningRate 0.0021   Epoch: 17   Global Step: 708920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:06,897-Speed 2611.25 samples/sec   Loss 2.0612   LearningRate 0.0021   Epoch: 17   Global Step: 708930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:10,790-Speed 2630.88 samples/sec   Loss 1.9857   LearningRate 0.0021   Epoch: 17   Global Step: 708940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:14,693-Speed 2623.79 samples/sec   Loss 2.0428   LearningRate 0.0021   Epoch: 17   Global Step: 708950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:18,586-Speed 2631.26 samples/sec   Loss 2.0011   LearningRate 0.0021   Epoch: 17   Global Step: 708960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:22,482-Speed 2628.95 samples/sec   Loss 2.1178   LearningRate 0.0021   Epoch: 17   Global Step: 708970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:26,377-Speed 2630.25 samples/sec   Loss 2.0558   LearningRate 0.0021   Epoch: 17   Global Step: 708980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:30,271-Speed 2629.84 samples/sec   Loss 2.0644   LearningRate 0.0021   Epoch: 17   Global Step: 708990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:34,170-Speed 2626.58 samples/sec   Loss 2.0131   LearningRate 0.0021   Epoch: 17   Global Step: 709000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:38,066-Speed 2629.31 samples/sec   Loss 2.1232   LearningRate 0.0021   Epoch: 17   Global Step: 709010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:41,965-Speed 2627.74 samples/sec   Loss 2.0844   LearningRate 0.0021   Epoch: 17   Global Step: 709020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-16 03:24:45,837-Speed 2644.86 samples/sec   Loss 2.0491   LearningRate 0.0021   Epoch: 17   Global Step: 709030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:49,730-Speed 2631.35 samples/sec   Loss 2.0998   LearningRate 0.0021   Epoch: 17   Global Step: 709040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:53,624-Speed 2630.32 samples/sec   Loss 2.0509   LearningRate 0.0021   Epoch: 17   Global Step: 709050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:24:57,543-Speed 2613.86 samples/sec   Loss 2.0742   LearningRate 0.0021   Epoch: 17   Global Step: 709060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:01,451-Speed 2620.71 samples/sec   Loss 2.0665   LearningRate 0.0021   Epoch: 17   Global Step: 709070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:05,346-Speed 2630.13 samples/sec   Loss 2.0930   LearningRate 0.0021   Epoch: 17   Global Step: 709080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:09,241-Speed 2629.16 samples/sec   Loss 2.0751   LearningRate 0.0021   Epoch: 17   Global Step: 709090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:13,135-Speed 2630.68 samples/sec   Loss 2.0436   LearningRate 0.0021   Epoch: 17   Global Step: 709100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:17,027-Speed 2631.77 samples/sec   Loss 2.0479   LearningRate 0.0021   Epoch: 17   Global Step: 709110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:20,922-Speed 2629.71 samples/sec   Loss 2.0738   LearningRate 0.0021   Epoch: 17   Global Step: 709120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:24,826-Speed 2623.56 samples/sec   Loss 2.0797   LearningRate 0.0021   Epoch: 17   Global Step: 709130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:28,721-Speed 2630.26 samples/sec   Loss 2.1027   LearningRate 0.0021   Epoch: 17   Global Step: 709140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:32,611-Speed 2633.11 samples/sec   Loss 1.9852   LearningRate 0.0021   Epoch: 17   Global Step: 709150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:36,539-Speed 2607.43 samples/sec   Loss 1.9609   LearningRate 0.0021   Epoch: 17   Global Step: 709160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:25:40,433-Speed 2630.20 samples/sec   Loss 2.0905   LearningRate 0.0021   Epoch: 17   Global Step: 709170   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:25:44,320-Speed 2635.58 samples/sec   Loss 2.0535   LearningRate 0.0021   Epoch: 17   Global Step: 709180   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:25:48,211-Speed 2632.52 samples/sec   Loss 2.0300   LearningRate 0.0021   Epoch: 17   Global Step: 709190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:25:52,109-Speed 2627.37 samples/sec   Loss 2.0551   LearningRate 0.0021   Epoch: 17   Global Step: 709200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:25:56,001-Speed 2632.15 samples/sec   Loss 2.1310   LearningRate 0.0021   Epoch: 17   Global Step: 709210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:25:59,892-Speed 2632.17 samples/sec   Loss 2.0073   LearningRate 0.0021   Epoch: 17   Global Step: 709220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:26:03,785-Speed 2630.78 samples/sec   Loss 2.1089   LearningRate 0.0021   Epoch: 17   Global Step: 709230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:26:07,681-Speed 2628.75 samples/sec   Loss 2.0878   LearningRate 0.0021   Epoch: 17   Global Step: 709240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:26:11,573-Speed 2632.29 samples/sec   Loss 2.0382   LearningRate 0.0021   Epoch: 17   Global Step: 709250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:26:15,465-Speed 2631.81 samples/sec   Loss 2.0828   LearningRate 0.0021   Epoch: 17   Global Step: 709260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:26:19,363-Speed 2627.85 samples/sec   Loss 2.0531   LearningRate 0.0021   Epoch: 17   Global Step: 709270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:23,271-Speed 2620.80 samples/sec   Loss 2.1065   LearningRate 0.0021   Epoch: 17   Global Step: 709280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:27,179-Speed 2621.80 samples/sec   Loss 2.0634   LearningRate 0.0021   Epoch: 17   Global Step: 709290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:31,091-Speed 2617.89 samples/sec   Loss 2.0370   LearningRate 0.0021   Epoch: 17   Global Step: 709300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:34,994-Speed 2623.83 samples/sec   Loss 2.0443   LearningRate 0.0021   Epoch: 17   Global Step: 709310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:38,893-Speed 2627.14 samples/sec   Loss 2.0640   LearningRate 0.0021   Epoch: 17   Global Step: 709320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:42,908-Speed 2552.01 samples/sec   Loss 2.0555   LearningRate 0.0021   Epoch: 17   Global Step: 709330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:46,800-Speed 2631.65 samples/sec   Loss 2.0255   LearningRate 0.0021   Epoch: 17   Global Step: 709340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:50,736-Speed 2602.17 samples/sec   Loss 2.0466   LearningRate 0.0021   Epoch: 17   Global Step: 709350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:54,626-Speed 2633.51 samples/sec   Loss 2.1305   LearningRate 0.0021   Epoch: 17   Global Step: 709360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:26:58,505-Speed 2640.57 samples/sec   Loss 2.0171   LearningRate 0.0021   Epoch: 17   Global Step: 709370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:27:02,419-Speed 2616.32 samples/sec   Loss 2.0371   LearningRate 0.0021   Epoch: 17   Global Step: 709380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:27:06,317-Speed 2628.22 samples/sec   Loss 2.0748   LearningRate 0.0021   Epoch: 17   Global Step: 709390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:27:10,230-Speed 2617.59 samples/sec   Loss 2.0851   LearningRate 0.0021   Epoch: 17   Global Step: 709400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:27:14,136-Speed 2622.26 samples/sec   Loss 2.0640   LearningRate 0.0021   Epoch: 17   Global Step: 709410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:27:18,043-Speed 2621.87 samples/sec   Loss 2.0893   LearningRate 0.0021   Epoch: 17   Global Step: 709420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-16 03:27:21,925-Speed 2637.88 samples/sec   Loss 2.0519   LearningRate 0.0021   Epoch: 17   Global Step: 709430   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:27:25,833-Speed 2622.00 samples/sec   Loss 2.0832   LearningRate 0.0021   Epoch: 17   Global Step: 709440   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-16 03:27:29,780-Speed 2594.82 samples/sec   Loss 2.0387   LearningRate 0.0021   Epoch: 17   Global Step: 709450   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:27:33,706-Speed 2609.13 samples/sec   Loss 2.0911   LearningRate 0.0021   Epoch: 17   Global Step: 709460   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:27:37,605-Speed 2626.71 samples/sec   Loss 2.0118   LearningRate 0.0021   Epoch: 17   Global Step: 709470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:27:41,510-Speed 2622.88 samples/sec   Loss 2.0863   LearningRate 0.0021   Epoch: 17   Global Step: 709480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:27:45,410-Speed 2626.48 samples/sec   Loss 1.9980   LearningRate 0.0021   Epoch: 17   Global Step: 709490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:27:49,306-Speed 2629.55 samples/sec   Loss 2.0872   LearningRate 0.0021   Epoch: 17   Global Step: 709500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:27:53,298-Speed 2565.10 samples/sec   Loss 2.0544   LearningRate 0.0021   Epoch: 17   Global Step: 709510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:27:57,196-Speed 2628.45 samples/sec   Loss 2.0607   LearningRate 0.0021   Epoch: 17   Global Step: 709520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:28:01,087-Speed 2632.02 samples/sec   Loss 2.0115   LearningRate 0.0021   Epoch: 17   Global Step: 709530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:04,984-Speed 2628.40 samples/sec   Loss 2.0675   LearningRate 0.0021   Epoch: 17   Global Step: 709540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:08,877-Speed 2630.78 samples/sec   Loss 2.0661   LearningRate 0.0021   Epoch: 17   Global Step: 709550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:12,774-Speed 2628.84 samples/sec   Loss 2.0617   LearningRate 0.0021   Epoch: 17   Global Step: 709560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:16,666-Speed 2631.17 samples/sec   Loss 2.0763   LearningRate 0.0021   Epoch: 17   Global Step: 709570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:20,571-Speed 2622.47 samples/sec   Loss 2.0445   LearningRate 0.0021   Epoch: 17   Global Step: 709580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:24,482-Speed 2620.67 samples/sec   Loss 2.0253   LearningRate 0.0021   Epoch: 17   Global Step: 709590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:28,384-Speed 2625.27 samples/sec   Loss 2.0933   LearningRate 0.0021   Epoch: 17   Global Step: 709600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:32,278-Speed 2630.27 samples/sec   Loss 2.0951   LearningRate 0.0021   Epoch: 17   Global Step: 709610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:36,268-Speed 2566.66 samples/sec   Loss 2.0494   LearningRate 0.0021   Epoch: 17   Global Step: 709620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:40,259-Speed 2566.21 samples/sec   Loss 2.0639   LearningRate 0.0021   Epoch: 17   Global Step: 709630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:28:44,192-Speed 2605.01 samples/sec   Loss 2.0024   LearningRate 0.0021   Epoch: 17   Global Step: 709640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:48,083-Speed 2632.07 samples/sec   Loss 2.0258   LearningRate 0.0021   Epoch: 17   Global Step: 709650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:51,979-Speed 2629.00 samples/sec   Loss 2.1102   LearningRate 0.0021   Epoch: 17   Global Step: 709660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:55,873-Speed 2629.84 samples/sec   Loss 2.0034   LearningRate 0.0021   Epoch: 17   Global Step: 709670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:28:59,769-Speed 2629.31 samples/sec   Loss 2.0496   LearningRate 0.0021   Epoch: 17   Global Step: 709680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:03,678-Speed 2620.01 samples/sec   Loss 2.0480   LearningRate 0.0021   Epoch: 17   Global Step: 709690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:07,574-Speed 2629.40 samples/sec   Loss 2.0626   LearningRate 0.0021   Epoch: 17   Global Step: 709700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:11,469-Speed 2629.61 samples/sec   Loss 2.0373   LearningRate 0.0021   Epoch: 17   Global Step: 709710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:15,362-Speed 2630.42 samples/sec   Loss 2.1487   LearningRate 0.0021   Epoch: 17   Global Step: 709720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:19,273-Speed 2619.32 samples/sec   Loss 2.0297   LearningRate 0.0021   Epoch: 17   Global Step: 709730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:23,147-Speed 2643.82 samples/sec   Loss 2.0220   LearningRate 0.0021   Epoch: 17   Global Step: 709740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:27,042-Speed 2630.59 samples/sec   Loss 2.0564   LearningRate 0.0021   Epoch: 17   Global Step: 709750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:30,940-Speed 2626.94 samples/sec   Loss 2.0124   LearningRate 0.0021   Epoch: 17   Global Step: 709760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:34,847-Speed 2621.83 samples/sec   Loss 2.0379   LearningRate 0.0021   Epoch: 17   Global Step: 709770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:38,742-Speed 2629.23 samples/sec   Loss 2.0250   LearningRate 0.0021   Epoch: 17   Global Step: 709780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:42,639-Speed 2628.69 samples/sec   Loss 2.0669   LearningRate 0.0021   Epoch: 17   Global Step: 709790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:46,539-Speed 2626.04 samples/sec   Loss 2.0103   LearningRate 0.0021   Epoch: 17   Global Step: 709800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:50,433-Speed 2630.91 samples/sec   Loss 2.0782   LearningRate 0.0021   Epoch: 17   Global Step: 709810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:29:54,307-Speed 2643.62 samples/sec   Loss 2.0555   LearningRate 0.0021   Epoch: 17   Global Step: 709820   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:29:58,202-Speed 2630.63 samples/sec   Loss 2.0337   LearningRate 0.0021   Epoch: 17   Global Step: 709830   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:30:02,118-Speed 2615.34 samples/sec   Loss 2.0230   LearningRate 0.0021   Epoch: 17   Global Step: 709840   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:30:06,037-Speed 2613.07 samples/sec   Loss 2.0196   LearningRate 0.0021   Epoch: 17   Global Step: 709850   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:30:09,933-Speed 2628.57 samples/sec   Loss 2.0221   LearningRate 0.0021   Epoch: 17   Global Step: 709860   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:30:13,802-Speed 2647.62 samples/sec   Loss 2.0248   LearningRate 0.0021   Epoch: 17   Global Step: 709870   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:17,699-Speed 2628.57 samples/sec   Loss 2.0495   LearningRate 0.0021   Epoch: 17   Global Step: 709880   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:21,600-Speed 2626.00 samples/sec   Loss 2.0488   LearningRate 0.0021   Epoch: 17   Global Step: 709890   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:25,550-Speed 2592.74 samples/sec   Loss 2.1160   LearningRate 0.0021   Epoch: 17   Global Step: 709900   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:29,447-Speed 2628.49 samples/sec   Loss 2.0064   LearningRate 0.0021   Epoch: 17   Global Step: 709910   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:33,343-Speed 2629.47 samples/sec   Loss 2.0553   LearningRate 0.0021   Epoch: 17   Global Step: 709920   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:37,238-Speed 2629.62 samples/sec   Loss 2.0168   LearningRate 0.0021   Epoch: 17   Global Step: 709930   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:41,145-Speed 2621.42 samples/sec   Loss 2.0547   LearningRate 0.0021   Epoch: 17   Global Step: 709940   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:45,049-Speed 2623.32 samples/sec   Loss 2.1051   LearningRate 0.0021   Epoch: 17   Global Step: 709950   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:48,946-Speed 2628.81 samples/sec   Loss 2.0525   LearningRate 0.0021   Epoch: 17   Global Step: 709960   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:30:52,838-Speed 2632.01 samples/sec   Loss 2.0553   LearningRate 0.0021   Epoch: 17   Global Step: 709970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:30:56,741-Speed 2624.47 samples/sec   Loss 2.0727   LearningRate 0.0021   Epoch: 17   Global Step: 709980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:31:00,663-Speed 2610.91 samples/sec   Loss 2.0891   LearningRate 0.0021   Epoch: 17   Global Step: 709990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:31:04,560-Speed 2628.91 samples/sec   Loss 2.0814   LearningRate 0.0021   Epoch: 17   Global Step: 710000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:31:47,376-[lfw][710000]XNorm: 22.434924
Training: 2022-04-16 03:31:47,376-[lfw][710000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 03:31:47,377-[lfw][710000]Accuracy-Highest: 0.99850
Training: 2022-04-16 03:32:37,517-[cfp_fp][710000]XNorm: 22.279019
Training: 2022-04-16 03:32:37,518-[cfp_fp][710000]Accuracy-Flip: 0.99257+-0.00377
Training: 2022-04-16 03:32:37,519-[cfp_fp][710000]Accuracy-Highest: 0.99329
Training: 2022-04-16 03:33:20,418-[agedb_30][710000]XNorm: 23.109830
Training: 2022-04-16 03:33:20,419-[agedb_30][710000]Accuracy-Flip: 0.98267+-0.00593
Training: 2022-04-16 03:33:20,420-[agedb_30][710000]Accuracy-Highest: 0.98317
Training: 2022-04-16 03:33:24,300-Speed 73.28 samples/sec   Loss 2.0566   LearningRate 0.0021   Epoch: 17   Global Step: 710010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:28,176-Speed 2641.87 samples/sec   Loss 2.0364   LearningRate 0.0021   Epoch: 17   Global Step: 710020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:32,052-Speed 2643.14 samples/sec   Loss 2.0531   LearningRate 0.0021   Epoch: 17   Global Step: 710030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:35,928-Speed 2642.32 samples/sec   Loss 1.9884   LearningRate 0.0021   Epoch: 17   Global Step: 710040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:39,806-Speed 2641.14 samples/sec   Loss 1.9907   LearningRate 0.0021   Epoch: 17   Global Step: 710050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:43,727-Speed 2612.48 samples/sec   Loss 2.0070   LearningRate 0.0021   Epoch: 17   Global Step: 710060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:47,587-Speed 2653.61 samples/sec   Loss 2.0586   LearningRate 0.0021   Epoch: 17   Global Step: 710070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:51,470-Speed 2637.74 samples/sec   Loss 2.0107   LearningRate 0.0021   Epoch: 17   Global Step: 710080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:55,354-Speed 2637.21 samples/sec   Loss 2.0624   LearningRate 0.0021   Epoch: 17   Global Step: 710090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:33:59,248-Speed 2631.26 samples/sec   Loss 2.1224   LearningRate 0.0021   Epoch: 17   Global Step: 710100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:03,140-Speed 2631.50 samples/sec   Loss 2.0627   LearningRate 0.0021   Epoch: 17   Global Step: 710110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:07,031-Speed 2632.59 samples/sec   Loss 2.0395   LearningRate 0.0021   Epoch: 17   Global Step: 710120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:10,936-Speed 2622.31 samples/sec   Loss 2.0056   LearningRate 0.0021   Epoch: 17   Global Step: 710130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:14,846-Speed 2619.86 samples/sec   Loss 2.0037   LearningRate 0.0021   Epoch: 17   Global Step: 710140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:18,737-Speed 2632.88 samples/sec   Loss 1.9869   LearningRate 0.0021   Epoch: 17   Global Step: 710150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:22,641-Speed 2623.42 samples/sec   Loss 2.0795   LearningRate 0.0021   Epoch: 17   Global Step: 710160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:26,536-Speed 2630.22 samples/sec   Loss 2.0242   LearningRate 0.0021   Epoch: 17   Global Step: 710170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:34:30,413-Speed 2641.80 samples/sec   Loss 2.0749   LearningRate 0.0021   Epoch: 17   Global Step: 710180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:34:34,305-Speed 2631.97 samples/sec   Loss 2.0195   LearningRate 0.0021   Epoch: 17   Global Step: 710190   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:34:38,214-Speed 2619.60 samples/sec   Loss 1.9796   LearningRate 0.0021   Epoch: 17   Global Step: 710200   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:34:42,142-Speed 2608.39 samples/sec   Loss 2.0625   LearningRate 0.0021   Epoch: 17   Global Step: 710210   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:34:46,045-Speed 2624.04 samples/sec   Loss 1.9576   LearningRate 0.0021   Epoch: 17   Global Step: 710220   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:34:49,947-Speed 2625.09 samples/sec   Loss 2.0157   LearningRate 0.0021   Epoch: 17   Global Step: 710230   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:34:53,851-Speed 2623.65 samples/sec   Loss 2.0314   LearningRate 0.0021   Epoch: 17   Global Step: 710240   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:34:57,754-Speed 2624.92 samples/sec   Loss 2.0569   LearningRate 0.0021   Epoch: 17   Global Step: 710250   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:35:01,692-Speed 2600.64 samples/sec   Loss 2.0484   LearningRate 0.0021   Epoch: 17   Global Step: 710260   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:35:05,594-Speed 2625.30 samples/sec   Loss 1.9909   LearningRate 0.0021   Epoch: 17   Global Step: 710270   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:35:09,495-Speed 2625.45 samples/sec   Loss 2.0063   LearningRate 0.0021   Epoch: 17   Global Step: 710280   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:35:13,416-Speed 2612.48 samples/sec   Loss 2.0778   LearningRate 0.0021   Epoch: 17   Global Step: 710290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:17,314-Speed 2627.59 samples/sec   Loss 2.0677   LearningRate 0.0021   Epoch: 17   Global Step: 710300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:21,233-Speed 2613.88 samples/sec   Loss 2.0481   LearningRate 0.0021   Epoch: 17   Global Step: 710310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:25,128-Speed 2629.58 samples/sec   Loss 1.9918   LearningRate 0.0021   Epoch: 17   Global Step: 710320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:29,029-Speed 2626.19 samples/sec   Loss 2.0554   LearningRate 0.0021   Epoch: 17   Global Step: 710330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:32,925-Speed 2629.01 samples/sec   Loss 2.0237   LearningRate 0.0021   Epoch: 17   Global Step: 710340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:36,823-Speed 2627.18 samples/sec   Loss 2.0277   LearningRate 0.0021   Epoch: 17   Global Step: 710350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:40,723-Speed 2626.03 samples/sec   Loss 2.0245   LearningRate 0.0021   Epoch: 17   Global Step: 710360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:44,636-Speed 2618.34 samples/sec   Loss 2.0200   LearningRate 0.0021   Epoch: 17   Global Step: 710370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:48,529-Speed 2630.62 samples/sec   Loss 2.0808   LearningRate 0.0021   Epoch: 17   Global Step: 710380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:35:52,459-Speed 2606.46 samples/sec   Loss 2.0529   LearningRate 0.0021   Epoch: 17   Global Step: 710390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:35:56,356-Speed 2628.69 samples/sec   Loss 2.0655   LearningRate 0.0021   Epoch: 17   Global Step: 710400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:00,284-Speed 2607.75 samples/sec   Loss 2.0366   LearningRate 0.0021   Epoch: 17   Global Step: 710410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:04,192-Speed 2620.36 samples/sec   Loss 2.0461   LearningRate 0.0021   Epoch: 17   Global Step: 710420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:08,094-Speed 2625.26 samples/sec   Loss 2.0278   LearningRate 0.0021   Epoch: 17   Global Step: 710430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:12,013-Speed 2613.15 samples/sec   Loss 2.0881   LearningRate 0.0021   Epoch: 17   Global Step: 710440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:15,915-Speed 2625.16 samples/sec   Loss 2.0490   LearningRate 0.0021   Epoch: 17   Global Step: 710450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:19,825-Speed 2619.96 samples/sec   Loss 2.0414   LearningRate 0.0021   Epoch: 17   Global Step: 710460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:23,740-Speed 2616.29 samples/sec   Loss 1.9970   LearningRate 0.0021   Epoch: 17   Global Step: 710470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:27,636-Speed 2629.23 samples/sec   Loss 2.0100   LearningRate 0.0021   Epoch: 17   Global Step: 710480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:31,510-Speed 2643.87 samples/sec   Loss 1.9941   LearningRate 0.0021   Epoch: 17   Global Step: 710490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:35,420-Speed 2619.21 samples/sec   Loss 2.0423   LearningRate 0.0021   Epoch: 17   Global Step: 710500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:39,315-Speed 2630.09 samples/sec   Loss 2.0099   LearningRate 0.0021   Epoch: 17   Global Step: 710510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:43,214-Speed 2627.18 samples/sec   Loss 2.0334   LearningRate 0.0021   Epoch: 17   Global Step: 710520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:47,149-Speed 2602.52 samples/sec   Loss 2.0360   LearningRate 0.0021   Epoch: 17   Global Step: 710530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:51,048-Speed 2627.18 samples/sec   Loss 2.0324   LearningRate 0.0021   Epoch: 17   Global Step: 710540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:54,954-Speed 2622.21 samples/sec   Loss 2.0391   LearningRate 0.0021   Epoch: 17   Global Step: 710550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:36:58,856-Speed 2625.82 samples/sec   Loss 1.9706   LearningRate 0.0021   Epoch: 17   Global Step: 710560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:37:02,753-Speed 2627.88 samples/sec   Loss 2.0434   LearningRate 0.0021   Epoch: 17   Global Step: 710570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:37:06,626-Speed 2644.12 samples/sec   Loss 2.0136   LearningRate 0.0021   Epoch: 17   Global Step: 710580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:10,521-Speed 2629.67 samples/sec   Loss 1.9910   LearningRate 0.0021   Epoch: 17   Global Step: 710590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:14,429-Speed 2620.95 samples/sec   Loss 2.0592   LearningRate 0.0021   Epoch: 17   Global Step: 710600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:18,345-Speed 2615.29 samples/sec   Loss 2.0600   LearningRate 0.0021   Epoch: 17   Global Step: 710610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:22,242-Speed 2629.43 samples/sec   Loss 2.0124   LearningRate 0.0021   Epoch: 17   Global Step: 710620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:26,134-Speed 2631.14 samples/sec   Loss 2.0532   LearningRate 0.0021   Epoch: 17   Global Step: 710630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:30,037-Speed 2624.72 samples/sec   Loss 2.0286   LearningRate 0.0021   Epoch: 17   Global Step: 710640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:33,938-Speed 2625.21 samples/sec   Loss 2.0193   LearningRate 0.0021   Epoch: 17   Global Step: 710650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:37,839-Speed 2625.41 samples/sec   Loss 2.0086   LearningRate 0.0021   Epoch: 17   Global Step: 710660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:41,751-Speed 2617.97 samples/sec   Loss 2.0213   LearningRate 0.0021   Epoch: 17   Global Step: 710670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:37:45,682-Speed 2606.26 samples/sec   Loss 2.0234   LearningRate 0.0021   Epoch: 17   Global Step: 710680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:37:49,584-Speed 2625.71 samples/sec   Loss 2.0809   LearningRate 0.0021   Epoch: 17   Global Step: 710690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:37:53,488-Speed 2623.43 samples/sec   Loss 2.0291   LearningRate 0.0021   Epoch: 17   Global Step: 710700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:37:57,419-Speed 2606.45 samples/sec   Loss 2.0332   LearningRate 0.0021   Epoch: 17   Global Step: 710710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:38:01,313-Speed 2630.08 samples/sec   Loss 2.0379   LearningRate 0.0021   Epoch: 17   Global Step: 710720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:05,218-Speed 2622.73 samples/sec   Loss 2.0130   LearningRate 0.0021   Epoch: 17   Global Step: 710730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:09,219-Speed 2559.85 samples/sec   Loss 2.1028   LearningRate 0.0021   Epoch: 17   Global Step: 710740   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:13,144-Speed 2610.14 samples/sec   Loss 2.0555   LearningRate 0.0021   Epoch: 17   Global Step: 710750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:17,071-Speed 2608.12 samples/sec   Loss 2.0058   LearningRate 0.0021   Epoch: 17   Global Step: 710760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:21,049-Speed 2575.30 samples/sec   Loss 2.0599   LearningRate 0.0021   Epoch: 17   Global Step: 710770   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:24,977-Speed 2607.58 samples/sec   Loss 2.0482   LearningRate 0.0021   Epoch: 17   Global Step: 710780   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:28,890-Speed 2618.05 samples/sec   Loss 2.0379   LearningRate 0.0021   Epoch: 17   Global Step: 710790   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:32,805-Speed 2616.25 samples/sec   Loss 2.0170   LearningRate 0.0021   Epoch: 17   Global Step: 710800   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:36,705-Speed 2626.22 samples/sec   Loss 2.0823   LearningRate 0.0020   Epoch: 17   Global Step: 710810   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:38:40,596-Speed 2633.15 samples/sec   Loss 2.0393   LearningRate 0.0020   Epoch: 17   Global Step: 710820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:38:44,491-Speed 2629.84 samples/sec   Loss 2.0635   LearningRate 0.0020   Epoch: 17   Global Step: 710830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:38:48,386-Speed 2629.53 samples/sec   Loss 2.0782   LearningRate 0.0020   Epoch: 17   Global Step: 710840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:38:52,293-Speed 2621.29 samples/sec   Loss 2.0187   LearningRate 0.0020   Epoch: 17   Global Step: 710850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:38:56,191-Speed 2627.74 samples/sec   Loss 1.9824   LearningRate 0.0020   Epoch: 17   Global Step: 710860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:00,088-Speed 2628.86 samples/sec   Loss 2.0813   LearningRate 0.0020   Epoch: 17   Global Step: 710870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:04,002-Speed 2617.10 samples/sec   Loss 1.9732   LearningRate 0.0020   Epoch: 17   Global Step: 710880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:07,902-Speed 2626.36 samples/sec   Loss 2.0309   LearningRate 0.0020   Epoch: 17   Global Step: 710890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:11,798-Speed 2628.83 samples/sec   Loss 2.0120   LearningRate 0.0020   Epoch: 17   Global Step: 710900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:15,706-Speed 2621.16 samples/sec   Loss 2.0812   LearningRate 0.0020   Epoch: 17   Global Step: 710910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:19,597-Speed 2631.66 samples/sec   Loss 2.0195   LearningRate 0.0020   Epoch: 17   Global Step: 710920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:39:23,507-Speed 2620.04 samples/sec   Loss 2.0360   LearningRate 0.0020   Epoch: 17   Global Step: 710930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:39:27,390-Speed 2637.59 samples/sec   Loss 2.0602   LearningRate 0.0020   Epoch: 17   Global Step: 710940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:31,290-Speed 2627.17 samples/sec   Loss 2.0782   LearningRate 0.0020   Epoch: 17   Global Step: 710950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:35,199-Speed 2619.89 samples/sec   Loss 2.0180   LearningRate 0.0020   Epoch: 17   Global Step: 710960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:39:39,080-Speed 2639.23 samples/sec   Loss 1.9984   LearningRate 0.0020   Epoch: 17   Global Step: 710970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:39:43,004-Speed 2610.14 samples/sec   Loss 2.0056   LearningRate 0.0020   Epoch: 17   Global Step: 710980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:39:46,903-Speed 2627.04 samples/sec   Loss 1.9716   LearningRate 0.0020   Epoch: 17   Global Step: 710990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:39:50,805-Speed 2625.16 samples/sec   Loss 2.0261   LearningRate 0.0020   Epoch: 17   Global Step: 711000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:39:54,703-Speed 2627.98 samples/sec   Loss 1.9868   LearningRate 0.0020   Epoch: 17   Global Step: 711010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:39:58,602-Speed 2627.15 samples/sec   Loss 2.0442   LearningRate 0.0020   Epoch: 17   Global Step: 711020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:02,507-Speed 2623.04 samples/sec   Loss 2.0830   LearningRate 0.0020   Epoch: 17   Global Step: 711030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:06,405-Speed 2627.06 samples/sec   Loss 1.9903   LearningRate 0.0020   Epoch: 17   Global Step: 711040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:10,318-Speed 2618.05 samples/sec   Loss 2.0533   LearningRate 0.0020   Epoch: 17   Global Step: 711050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:14,213-Speed 2629.95 samples/sec   Loss 2.0511   LearningRate 0.0020   Epoch: 17   Global Step: 711060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:18,116-Speed 2624.09 samples/sec   Loss 2.0339   LearningRate 0.0020   Epoch: 17   Global Step: 711070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:40:22,011-Speed 2629.28 samples/sec   Loss 1.9957   LearningRate 0.0020   Epoch: 17   Global Step: 711080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:40:25,905-Speed 2630.34 samples/sec   Loss 2.0361   LearningRate 0.0020   Epoch: 17   Global Step: 711090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:40:29,802-Speed 2628.72 samples/sec   Loss 2.0840   LearningRate 0.0020   Epoch: 17   Global Step: 711100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:40:33,697-Speed 2629.65 samples/sec   Loss 2.0533   LearningRate 0.0020   Epoch: 17   Global Step: 711110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:40:37,603-Speed 2621.87 samples/sec   Loss 2.0555   LearningRate 0.0020   Epoch: 17   Global Step: 711120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:40:41,513-Speed 2619.76 samples/sec   Loss 2.0456   LearningRate 0.0020   Epoch: 17   Global Step: 711130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:40:45,387-Speed 2644.49 samples/sec   Loss 2.0796   LearningRate 0.0020   Epoch: 17   Global Step: 711140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:49,291-Speed 2622.97 samples/sec   Loss 2.1310   LearningRate 0.0020   Epoch: 17   Global Step: 711150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:53,184-Speed 2631.59 samples/sec   Loss 2.0611   LearningRate 0.0020   Epoch: 17   Global Step: 711160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:40:57,081-Speed 2627.62 samples/sec   Loss 2.0349   LearningRate 0.0020   Epoch: 17   Global Step: 711170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:00,982-Speed 2625.59 samples/sec   Loss 2.0287   LearningRate 0.0020   Epoch: 17   Global Step: 711180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:04,878-Speed 2629.20 samples/sec   Loss 2.0383   LearningRate 0.0020   Epoch: 17   Global Step: 711190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:08,868-Speed 2566.91 samples/sec   Loss 2.0662   LearningRate 0.0020   Epoch: 17   Global Step: 711200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:12,767-Speed 2627.12 samples/sec   Loss 2.0705   LearningRate 0.0020   Epoch: 17   Global Step: 711210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:16,697-Speed 2605.82 samples/sec   Loss 2.0101   LearningRate 0.0020   Epoch: 17   Global Step: 711220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:20,612-Speed 2617.05 samples/sec   Loss 2.0649   LearningRate 0.0020   Epoch: 17   Global Step: 711230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:24,520-Speed 2620.62 samples/sec   Loss 1.9958   LearningRate 0.0020   Epoch: 17   Global Step: 711240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:41:28,420-Speed 2627.10 samples/sec   Loss 2.0255   LearningRate 0.0020   Epoch: 17   Global Step: 711250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:41:32,321-Speed 2624.82 samples/sec   Loss 2.0503   LearningRate 0.0020   Epoch: 17   Global Step: 711260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:41:36,246-Speed 2609.44 samples/sec   Loss 2.0065   LearningRate 0.0020   Epoch: 17   Global Step: 711270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:41:40,125-Speed 2640.75 samples/sec   Loss 2.0958   LearningRate 0.0020   Epoch: 17   Global Step: 711280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:44,038-Speed 2618.09 samples/sec   Loss 2.1013   LearningRate 0.0020   Epoch: 17   Global Step: 711290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:47,946-Speed 2620.78 samples/sec   Loss 1.9878   LearningRate 0.0020   Epoch: 17   Global Step: 711300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:51,842-Speed 2629.27 samples/sec   Loss 2.0384   LearningRate 0.0020   Epoch: 17   Global Step: 711310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:55,738-Speed 2629.14 samples/sec   Loss 2.0069   LearningRate 0.0020   Epoch: 17   Global Step: 711320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:41:59,642-Speed 2624.18 samples/sec   Loss 2.0408   LearningRate 0.0020   Epoch: 17   Global Step: 711330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:03,545-Speed 2623.57 samples/sec   Loss 2.0575   LearningRate 0.0020   Epoch: 17   Global Step: 711340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:07,449-Speed 2623.36 samples/sec   Loss 2.0252   LearningRate 0.0020   Epoch: 17   Global Step: 711350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:11,345-Speed 2629.39 samples/sec   Loss 2.0471   LearningRate 0.0020   Epoch: 17   Global Step: 711360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:15,239-Speed 2630.50 samples/sec   Loss 2.0376   LearningRate 0.0020   Epoch: 17   Global Step: 711370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:19,141-Speed 2624.71 samples/sec   Loss 2.0323   LearningRate 0.0020   Epoch: 17   Global Step: 711380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:42:23,080-Speed 2600.63 samples/sec   Loss 2.0203   LearningRate 0.0020   Epoch: 17   Global Step: 711390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:42:26,979-Speed 2627.19 samples/sec   Loss 1.9941   LearningRate 0.0020   Epoch: 17   Global Step: 711400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:42:30,879-Speed 2626.83 samples/sec   Loss 2.0548   LearningRate 0.0020   Epoch: 17   Global Step: 711410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:42:34,773-Speed 2629.78 samples/sec   Loss 2.0207   LearningRate 0.0020   Epoch: 17   Global Step: 711420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:42:38,669-Speed 2628.86 samples/sec   Loss 2.0148   LearningRate 0.0020   Epoch: 17   Global Step: 711430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:42:42,561-Speed 2631.34 samples/sec   Loss 2.0543   LearningRate 0.0020   Epoch: 17   Global Step: 711440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:42:46,436-Speed 2643.18 samples/sec   Loss 1.9896   LearningRate 0.0020   Epoch: 17   Global Step: 711450   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:50,338-Speed 2625.21 samples/sec   Loss 1.9842   LearningRate 0.0020   Epoch: 17   Global Step: 711460   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:54,258-Speed 2613.24 samples/sec   Loss 2.0429   LearningRate 0.0020   Epoch: 17   Global Step: 711470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:42:58,155-Speed 2628.84 samples/sec   Loss 1.9708   LearningRate 0.0020   Epoch: 17   Global Step: 711480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:43:02,076-Speed 2612.09 samples/sec   Loss 2.0023   LearningRate 0.0020   Epoch: 17   Global Step: 711490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:43:05,970-Speed 2630.07 samples/sec   Loss 2.0443   LearningRate 0.0020   Epoch: 17   Global Step: 711500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:43:09,864-Speed 2630.71 samples/sec   Loss 2.1089   LearningRate 0.0020   Epoch: 17   Global Step: 711510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:43:13,760-Speed 2629.18 samples/sec   Loss 2.0359   LearningRate 0.0020   Epoch: 17   Global Step: 711520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:43:17,667-Speed 2621.20 samples/sec   Loss 2.0582   LearningRate 0.0020   Epoch: 17   Global Step: 711530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:43:21,561-Speed 2630.63 samples/sec   Loss 2.0722   LearningRate 0.0020   Epoch: 17   Global Step: 711540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:43:25,460-Speed 2626.89 samples/sec   Loss 2.0705   LearningRate 0.0020   Epoch: 17   Global Step: 711550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:29,370-Speed 2620.31 samples/sec   Loss 2.0584   LearningRate 0.0020   Epoch: 17   Global Step: 711560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:33,273-Speed 2623.83 samples/sec   Loss 2.0273   LearningRate 0.0020   Epoch: 17   Global Step: 711570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:37,179-Speed 2622.27 samples/sec   Loss 2.1170   LearningRate 0.0020   Epoch: 17   Global Step: 711580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:41,127-Speed 2594.13 samples/sec   Loss 2.0410   LearningRate 0.0020   Epoch: 17   Global Step: 711590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:45,025-Speed 2628.38 samples/sec   Loss 2.0655   LearningRate 0.0020   Epoch: 17   Global Step: 711600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:48,920-Speed 2629.52 samples/sec   Loss 1.9708   LearningRate 0.0020   Epoch: 17   Global Step: 711610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:52,826-Speed 2622.70 samples/sec   Loss 2.0024   LearningRate 0.0020   Epoch: 17   Global Step: 711620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:43:56,733-Speed 2621.49 samples/sec   Loss 2.0737   LearningRate 0.0020   Epoch: 17   Global Step: 711630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:00,645-Speed 2618.31 samples/sec   Loss 1.9515   LearningRate 0.0020   Epoch: 17   Global Step: 711640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:04,548-Speed 2624.30 samples/sec   Loss 1.9924   LearningRate 0.0020   Epoch: 17   Global Step: 711650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:44:08,439-Speed 2632.20 samples/sec   Loss 2.0210   LearningRate 0.0020   Epoch: 17   Global Step: 711660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:44:12,319-Speed 2640.05 samples/sec   Loss 2.0466   LearningRate 0.0020   Epoch: 17   Global Step: 711670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:16,219-Speed 2626.26 samples/sec   Loss 2.0141   LearningRate 0.0020   Epoch: 17   Global Step: 711680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:20,129-Speed 2619.80 samples/sec   Loss 2.0331   LearningRate 0.0020   Epoch: 17   Global Step: 711690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:24,030-Speed 2625.78 samples/sec   Loss 1.9749   LearningRate 0.0020   Epoch: 17   Global Step: 711700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:27,932-Speed 2624.90 samples/sec   Loss 1.9696   LearningRate 0.0020   Epoch: 17   Global Step: 711710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:31,828-Speed 2629.11 samples/sec   Loss 2.0107   LearningRate 0.0020   Epoch: 17   Global Step: 711720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:35,729-Speed 2626.14 samples/sec   Loss 2.0454   LearningRate 0.0020   Epoch: 17   Global Step: 711730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:39,633-Speed 2623.54 samples/sec   Loss 2.0305   LearningRate 0.0020   Epoch: 17   Global Step: 711740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:43,539-Speed 2631.18 samples/sec   Loss 2.0720   LearningRate 0.0020   Epoch: 17   Global Step: 711750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:47,434-Speed 2629.58 samples/sec   Loss 1.9757   LearningRate 0.0020   Epoch: 17   Global Step: 711760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:51,331-Speed 2628.34 samples/sec   Loss 2.0755   LearningRate 0.0020   Epoch: 17   Global Step: 711770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:44:55,222-Speed 2631.90 samples/sec   Loss 2.0520   LearningRate 0.0020   Epoch: 17   Global Step: 711780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:44:59,124-Speed 2625.69 samples/sec   Loss 2.0088   LearningRate 0.0020   Epoch: 17   Global Step: 711790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:45:02,998-Speed 2643.72 samples/sec   Loss 2.0322   LearningRate 0.0020   Epoch: 17   Global Step: 711800   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:06,896-Speed 2628.73 samples/sec   Loss 2.0681   LearningRate 0.0020   Epoch: 17   Global Step: 711810   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:10,795-Speed 2626.63 samples/sec   Loss 2.0094   LearningRate 0.0020   Epoch: 17   Global Step: 711820   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:14,688-Speed 2631.26 samples/sec   Loss 2.0098   LearningRate 0.0020   Epoch: 17   Global Step: 711830   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:18,611-Speed 2610.57 samples/sec   Loss 2.0238   LearningRate 0.0020   Epoch: 17   Global Step: 711840   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:22,508-Speed 2628.44 samples/sec   Loss 2.0169   LearningRate 0.0020   Epoch: 17   Global Step: 711850   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:26,410-Speed 2624.80 samples/sec   Loss 2.0349   LearningRate 0.0020   Epoch: 17   Global Step: 711860   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:30,318-Speed 2621.55 samples/sec   Loss 2.0914   LearningRate 0.0020   Epoch: 17   Global Step: 711870   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:34,226-Speed 2620.63 samples/sec   Loss 1.9907   LearningRate 0.0020   Epoch: 17   Global Step: 711880   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:38,118-Speed 2632.17 samples/sec   Loss 2.0622   LearningRate 0.0020   Epoch: 17   Global Step: 711890   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:45:42,024-Speed 2622.09 samples/sec   Loss 2.0149   LearningRate 0.0020   Epoch: 17   Global Step: 711900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:45:45,930-Speed 2622.52 samples/sec   Loss 1.9261   LearningRate 0.0020   Epoch: 17   Global Step: 711910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:45:49,827-Speed 2628.46 samples/sec   Loss 1.9805   LearningRate 0.0020   Epoch: 17   Global Step: 711920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:45:53,726-Speed 2626.73 samples/sec   Loss 2.0367   LearningRate 0.0020   Epoch: 17   Global Step: 711930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:45:57,630-Speed 2623.46 samples/sec   Loss 1.9904   LearningRate 0.0020   Epoch: 17   Global Step: 711940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:01,540-Speed 2620.53 samples/sec   Loss 2.0013   LearningRate 0.0020   Epoch: 17   Global Step: 711950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:05,435-Speed 2629.27 samples/sec   Loss 1.9708   LearningRate 0.0020   Epoch: 17   Global Step: 711960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:09,417-Speed 2572.08 samples/sec   Loss 1.9997   LearningRate 0.0020   Epoch: 17   Global Step: 711970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:13,452-Speed 2539.18 samples/sec   Loss 1.9972   LearningRate 0.0020   Epoch: 17   Global Step: 711980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:17,350-Speed 2627.63 samples/sec   Loss 2.0481   LearningRate 0.0020   Epoch: 17   Global Step: 711990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:21,220-Speed 2646.38 samples/sec   Loss 2.0280   LearningRate 0.0020   Epoch: 17   Global Step: 712000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:25,114-Speed 2630.20 samples/sec   Loss 1.9265   LearningRate 0.0020   Epoch: 17   Global Step: 712010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:29,009-Speed 2630.12 samples/sec   Loss 2.0413   LearningRate 0.0020   Epoch: 17   Global Step: 712020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:32,908-Speed 2627.28 samples/sec   Loss 1.9852   LearningRate 0.0020   Epoch: 17   Global Step: 712030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:36,805-Speed 2628.19 samples/sec   Loss 2.0136   LearningRate 0.0020   Epoch: 17   Global Step: 712040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:40,703-Speed 2628.42 samples/sec   Loss 1.9581   LearningRate 0.0020   Epoch: 17   Global Step: 712050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:44,592-Speed 2633.04 samples/sec   Loss 2.0482   LearningRate 0.0020   Epoch: 17   Global Step: 712060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:48,482-Speed 2633.58 samples/sec   Loss 2.0222   LearningRate 0.0020   Epoch: 17   Global Step: 712070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:52,380-Speed 2627.66 samples/sec   Loss 2.0109   LearningRate 0.0020   Epoch: 17   Global Step: 712080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:46:56,275-Speed 2629.94 samples/sec   Loss 2.0123   LearningRate 0.0020   Epoch: 17   Global Step: 712090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:47:00,199-Speed 2610.00 samples/sec   Loss 2.0105   LearningRate 0.0020   Epoch: 17   Global Step: 712100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:47:04,101-Speed 2625.14 samples/sec   Loss 2.0253   LearningRate 0.0020   Epoch: 17   Global Step: 712110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:47:08,007-Speed 2621.66 samples/sec   Loss 1.9767   LearningRate 0.0020   Epoch: 17   Global Step: 712120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:47:11,911-Speed 2623.73 samples/sec   Loss 2.0436   LearningRate 0.0020   Epoch: 17   Global Step: 712130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 03:47:15,796-Speed 2637.11 samples/sec   Loss 2.0320   LearningRate 0.0020   Epoch: 17   Global Step: 712140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:47:19,708-Speed 2617.55 samples/sec   Loss 2.0435   LearningRate 0.0020   Epoch: 17   Global Step: 712150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:23,617-Speed 2621.00 samples/sec   Loss 1.9626   LearningRate 0.0020   Epoch: 17   Global Step: 712160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:27,520-Speed 2624.28 samples/sec   Loss 2.0231   LearningRate 0.0020   Epoch: 17   Global Step: 712170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:31,422-Speed 2625.58 samples/sec   Loss 1.9663   LearningRate 0.0020   Epoch: 17   Global Step: 712180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:35,346-Speed 2610.17 samples/sec   Loss 1.9924   LearningRate 0.0020   Epoch: 17   Global Step: 712190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:39,242-Speed 2628.93 samples/sec   Loss 1.9721   LearningRate 0.0020   Epoch: 17   Global Step: 712200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:43,139-Speed 2628.54 samples/sec   Loss 1.9581   LearningRate 0.0020   Epoch: 17   Global Step: 712210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:47,057-Speed 2613.98 samples/sec   Loss 2.0188   LearningRate 0.0020   Epoch: 17   Global Step: 712220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:50,953-Speed 2628.85 samples/sec   Loss 1.9651   LearningRate 0.0020   Epoch: 17   Global Step: 712230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:54,848-Speed 2629.77 samples/sec   Loss 2.0710   LearningRate 0.0020   Epoch: 17   Global Step: 712240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:47:58,906-Speed 2524.22 samples/sec   Loss 1.9807   LearningRate 0.0020   Epoch: 17   Global Step: 712250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:48:02,852-Speed 2595.45 samples/sec   Loss 2.0153   LearningRate 0.0020   Epoch: 17   Global Step: 712260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:48:06,752-Speed 2626.34 samples/sec   Loss 1.9848   LearningRate 0.0020   Epoch: 17   Global Step: 712270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:48:10,649-Speed 2627.90 samples/sec   Loss 1.9833   LearningRate 0.0020   Epoch: 17   Global Step: 712280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:48:14,545-Speed 2629.75 samples/sec   Loss 2.0004   LearningRate 0.0020   Epoch: 17   Global Step: 712290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:48:18,413-Speed 2647.74 samples/sec   Loss 2.0556   LearningRate 0.0020   Epoch: 17   Global Step: 712300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:22,324-Speed 2618.80 samples/sec   Loss 1.9624   LearningRate 0.0020   Epoch: 17   Global Step: 712310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:26,223-Speed 2626.73 samples/sec   Loss 2.0886   LearningRate 0.0020   Epoch: 17   Global Step: 712320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:30,118-Speed 2630.67 samples/sec   Loss 2.0260   LearningRate 0.0020   Epoch: 17   Global Step: 712330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:34,012-Speed 2630.10 samples/sec   Loss 1.9560   LearningRate 0.0020   Epoch: 17   Global Step: 712340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:37,922-Speed 2619.64 samples/sec   Loss 2.0247   LearningRate 0.0020   Epoch: 17   Global Step: 712350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:41,818-Speed 2628.53 samples/sec   Loss 2.0130   LearningRate 0.0020   Epoch: 17   Global Step: 712360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:45,718-Speed 2627.11 samples/sec   Loss 2.0332   LearningRate 0.0020   Epoch: 17   Global Step: 712370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:49,613-Speed 2629.36 samples/sec   Loss 2.0351   LearningRate 0.0020   Epoch: 17   Global Step: 712380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:48:53,556-Speed 2597.99 samples/sec   Loss 1.9674   LearningRate 0.0020   Epoch: 17   Global Step: 712390   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:48:57,454-Speed 2627.65 samples/sec   Loss 1.9978   LearningRate 0.0020   Epoch: 17   Global Step: 712400   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:01,345-Speed 2632.93 samples/sec   Loss 1.9879   LearningRate 0.0020   Epoch: 17   Global Step: 712410   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:05,279-Speed 2603.20 samples/sec   Loss 1.9796   LearningRate 0.0020   Epoch: 17   Global Step: 712420   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:09,173-Speed 2630.34 samples/sec   Loss 2.0423   LearningRate 0.0020   Epoch: 17   Global Step: 712430   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:13,068-Speed 2629.60 samples/sec   Loss 1.9898   LearningRate 0.0020   Epoch: 17   Global Step: 712440   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:16,963-Speed 2630.16 samples/sec   Loss 2.0587   LearningRate 0.0020   Epoch: 17   Global Step: 712450   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:20,857-Speed 2630.78 samples/sec   Loss 1.9730   LearningRate 0.0020   Epoch: 17   Global Step: 712460   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:24,751-Speed 2630.24 samples/sec   Loss 2.0354   LearningRate 0.0020   Epoch: 17   Global Step: 712470   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:28,653-Speed 2625.13 samples/sec   Loss 2.0767   LearningRate 0.0020   Epoch: 17   Global Step: 712480   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:49:32,547-Speed 2630.29 samples/sec   Loss 2.0389   LearningRate 0.0020   Epoch: 17   Global Step: 712490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:49:36,441-Speed 2630.41 samples/sec   Loss 2.0206   LearningRate 0.0020   Epoch: 17   Global Step: 712500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:49:40,330-Speed 2633.57 samples/sec   Loss 2.0222   LearningRate 0.0020   Epoch: 17   Global Step: 712510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:49:44,221-Speed 2632.83 samples/sec   Loss 2.0452   LearningRate 0.0020   Epoch: 17   Global Step: 712520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:49:48,140-Speed 2613.10 samples/sec   Loss 2.0466   LearningRate 0.0020   Epoch: 17   Global Step: 712530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:49:52,042-Speed 2625.54 samples/sec   Loss 2.0465   LearningRate 0.0020   Epoch: 17   Global Step: 712540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:49:55,939-Speed 2628.53 samples/sec   Loss 2.0515   LearningRate 0.0020   Epoch: 17   Global Step: 712550   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:49:59,833-Speed 2630.16 samples/sec   Loss 2.0164   LearningRate 0.0020   Epoch: 17   Global Step: 712560   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:03,729-Speed 2628.84 samples/sec   Loss 2.0274   LearningRate 0.0020   Epoch: 17   Global Step: 712570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:07,629-Speed 2626.08 samples/sec   Loss 2.0194   LearningRate 0.0020   Epoch: 17   Global Step: 712580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:11,501-Speed 2645.24 samples/sec   Loss 1.9644   LearningRate 0.0020   Epoch: 17   Global Step: 712590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:15,416-Speed 2616.62 samples/sec   Loss 1.9575   LearningRate 0.0020   Epoch: 17   Global Step: 712600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:19,316-Speed 2625.65 samples/sec   Loss 1.9598   LearningRate 0.0020   Epoch: 17   Global Step: 712610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:23,217-Speed 2626.30 samples/sec   Loss 1.9606   LearningRate 0.0020   Epoch: 17   Global Step: 712620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:27,125-Speed 2620.97 samples/sec   Loss 2.0492   LearningRate 0.0020   Epoch: 17   Global Step: 712630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:31,021-Speed 2629.39 samples/sec   Loss 2.0267   LearningRate 0.0020   Epoch: 17   Global Step: 712640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:34,917-Speed 2628.76 samples/sec   Loss 2.0064   LearningRate 0.0020   Epoch: 17   Global Step: 712650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:38,824-Speed 2621.88 samples/sec   Loss 2.0482   LearningRate 0.0020   Epoch: 17   Global Step: 712660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:42,732-Speed 2620.84 samples/sec   Loss 2.0343   LearningRate 0.0020   Epoch: 17   Global Step: 712670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:46,627-Speed 2629.83 samples/sec   Loss 2.1130   LearningRate 0.0020   Epoch: 17   Global Step: 712680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:50:50,523-Speed 2628.97 samples/sec   Loss 2.0188   LearningRate 0.0020   Epoch: 17   Global Step: 712690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:50:54,418-Speed 2629.41 samples/sec   Loss 2.0433   LearningRate 0.0020   Epoch: 17   Global Step: 712700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:50:58,320-Speed 2625.22 samples/sec   Loss 2.0417   LearningRate 0.0020   Epoch: 17   Global Step: 712710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:51:02,218-Speed 2627.44 samples/sec   Loss 2.0201   LearningRate 0.0020   Epoch: 17   Global Step: 712720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:51:06,114-Speed 2628.98 samples/sec   Loss 1.9406   LearningRate 0.0020   Epoch: 17   Global Step: 712730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:51:10,014-Speed 2626.42 samples/sec   Loss 1.9994   LearningRate 0.0020   Epoch: 17   Global Step: 712740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:51:13,888-Speed 2645.05 samples/sec   Loss 2.0166   LearningRate 0.0020   Epoch: 17   Global Step: 712750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:51:17,785-Speed 2628.29 samples/sec   Loss 2.0445   LearningRate 0.0020   Epoch: 17   Global Step: 712760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:51:21,682-Speed 2628.71 samples/sec   Loss 2.0007   LearningRate 0.0020   Epoch: 17   Global Step: 712770   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:51:25,578-Speed 2628.59 samples/sec   Loss 1.9574   LearningRate 0.0020   Epoch: 17   Global Step: 712780   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:51:29,487-Speed 2619.85 samples/sec   Loss 2.0101   LearningRate 0.0020   Epoch: 17   Global Step: 712790   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:51:33,387-Speed 2625.98 samples/sec   Loss 1.9047   LearningRate 0.0020   Epoch: 17   Global Step: 712800   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:51:37,285-Speed 2628.83 samples/sec   Loss 1.9824   LearningRate 0.0020   Epoch: 17   Global Step: 712810   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:51:41,199-Speed 2617.00 samples/sec   Loss 2.0162   LearningRate 0.0020   Epoch: 17   Global Step: 712820   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:51:45,117-Speed 2614.48 samples/sec   Loss 2.0124   LearningRate 0.0020   Epoch: 17   Global Step: 712830   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:51:49,017-Speed 2627.16 samples/sec   Loss 2.0226   LearningRate 0.0020   Epoch: 17   Global Step: 712840   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:51:52,920-Speed 2624.03 samples/sec   Loss 1.9741   LearningRate 0.0020   Epoch: 17   Global Step: 712850   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:51:56,813-Speed 2630.52 samples/sec   Loss 2.0453   LearningRate 0.0020   Epoch: 17   Global Step: 712860   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:52:00,709-Speed 2629.32 samples/sec   Loss 2.0017   LearningRate 0.0020   Epoch: 17   Global Step: 712870   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:52:04,602-Speed 2631.43 samples/sec   Loss 2.0360   LearningRate 0.0020   Epoch: 17   Global Step: 712880   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:52:08,495-Speed 2630.34 samples/sec   Loss 2.0227   LearningRate 0.0020   Epoch: 17   Global Step: 712890   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:12,390-Speed 2630.33 samples/sec   Loss 1.9845   LearningRate 0.0020   Epoch: 17   Global Step: 712900   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:16,285-Speed 2629.37 samples/sec   Loss 2.0649   LearningRate 0.0020   Epoch: 17   Global Step: 712910   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:20,188-Speed 2624.78 samples/sec   Loss 1.9533   LearningRate 0.0020   Epoch: 17   Global Step: 712920   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:24,116-Speed 2607.33 samples/sec   Loss 2.0010   LearningRate 0.0020   Epoch: 17   Global Step: 712930   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:28,019-Speed 2624.56 samples/sec   Loss 2.0128   LearningRate 0.0020   Epoch: 17   Global Step: 712940   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:31,916-Speed 2627.98 samples/sec   Loss 2.0374   LearningRate 0.0020   Epoch: 17   Global Step: 712950   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:35,816-Speed 2626.65 samples/sec   Loss 1.9902   LearningRate 0.0020   Epoch: 17   Global Step: 712960   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:39,713-Speed 2628.34 samples/sec   Loss 2.0102   LearningRate 0.0020   Epoch: 17   Global Step: 712970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:43,605-Speed 2631.78 samples/sec   Loss 2.0335   LearningRate 0.0020   Epoch: 17   Global Step: 712980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:52:47,522-Speed 2614.92 samples/sec   Loss 1.9492   LearningRate 0.0020   Epoch: 17   Global Step: 712990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:52:51,438-Speed 2615.96 samples/sec   Loss 2.0070   LearningRate 0.0020   Epoch: 17   Global Step: 713000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:52:55,337-Speed 2627.56 samples/sec   Loss 1.9843   LearningRate 0.0020   Epoch: 17   Global Step: 713010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:52:59,237-Speed 2625.94 samples/sec   Loss 2.0525   LearningRate 0.0020   Epoch: 17   Global Step: 713020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:53:03,107-Speed 2646.30 samples/sec   Loss 1.9914   LearningRate 0.0020   Epoch: 17   Global Step: 713030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:07,000-Speed 2631.22 samples/sec   Loss 1.9831   LearningRate 0.0020   Epoch: 17   Global Step: 713040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:10,890-Speed 2633.25 samples/sec   Loss 2.0643   LearningRate 0.0020   Epoch: 17   Global Step: 713050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:14,785-Speed 2629.61 samples/sec   Loss 1.9663   LearningRate 0.0020   Epoch: 17   Global Step: 713060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:18,677-Speed 2631.47 samples/sec   Loss 1.9949   LearningRate 0.0020   Epoch: 17   Global Step: 713070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:22,568-Speed 2633.07 samples/sec   Loss 2.0106   LearningRate 0.0020   Epoch: 17   Global Step: 713080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:26,465-Speed 2628.43 samples/sec   Loss 1.9816   LearningRate 0.0020   Epoch: 17   Global Step: 713090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:30,375-Speed 2619.46 samples/sec   Loss 2.0685   LearningRate 0.0020   Epoch: 17   Global Step: 713100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:34,270-Speed 2630.06 samples/sec   Loss 2.0144   LearningRate 0.0020   Epoch: 17   Global Step: 713110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:38,168-Speed 2627.26 samples/sec   Loss 1.9866   LearningRate 0.0020   Epoch: 17   Global Step: 713120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:53:42,063-Speed 2629.80 samples/sec   Loss 2.0494   LearningRate 0.0020   Epoch: 17   Global Step: 713130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:53:45,959-Speed 2629.57 samples/sec   Loss 2.0136   LearningRate 0.0020   Epoch: 17   Global Step: 713140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:53:49,854-Speed 2629.44 samples/sec   Loss 1.9640   LearningRate 0.0020   Epoch: 17   Global Step: 713150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:53:53,783-Speed 2607.20 samples/sec   Loss 2.0160   LearningRate 0.0020   Epoch: 17   Global Step: 713160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:53:57,687-Speed 2623.87 samples/sec   Loss 2.0198   LearningRate 0.0020   Epoch: 17   Global Step: 713170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:54:01,582-Speed 2632.36 samples/sec   Loss 1.9945   LearningRate 0.0020   Epoch: 17   Global Step: 713180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:54:05,528-Speed 2596.00 samples/sec   Loss 1.9909   LearningRate 0.0020   Epoch: 17   Global Step: 713190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:54:09,421-Speed 2630.57 samples/sec   Loss 2.0277   LearningRate 0.0020   Epoch: 17   Global Step: 713200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:54:13,337-Speed 2615.78 samples/sec   Loss 2.0692   LearningRate 0.0020   Epoch: 17   Global Step: 713210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:54:17,238-Speed 2625.37 samples/sec   Loss 2.0492   LearningRate 0.0020   Epoch: 17   Global Step: 713220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:54:21,111-Speed 2645.09 samples/sec   Loss 1.9087   LearningRate 0.0020   Epoch: 17   Global Step: 713230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:25,005-Speed 2630.40 samples/sec   Loss 1.9484   LearningRate 0.0020   Epoch: 17   Global Step: 713240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:28,899-Speed 2630.64 samples/sec   Loss 1.9635   LearningRate 0.0020   Epoch: 17   Global Step: 713250   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:32,799-Speed 2626.44 samples/sec   Loss 1.9828   LearningRate 0.0020   Epoch: 17   Global Step: 713260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:36,701-Speed 2625.07 samples/sec   Loss 1.9965   LearningRate 0.0020   Epoch: 17   Global Step: 713270   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:40,591-Speed 2632.85 samples/sec   Loss 1.9808   LearningRate 0.0020   Epoch: 17   Global Step: 713280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:44,512-Speed 2612.52 samples/sec   Loss 1.9912   LearningRate 0.0020   Epoch: 17   Global Step: 713290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:48,403-Speed 2631.99 samples/sec   Loss 2.0050   LearningRate 0.0020   Epoch: 17   Global Step: 713300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:52,298-Speed 2630.28 samples/sec   Loss 2.0411   LearningRate 0.0020   Epoch: 17   Global Step: 713310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:54:56,205-Speed 2621.23 samples/sec   Loss 1.9746   LearningRate 0.0020   Epoch: 17   Global Step: 713320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:55:00,104-Speed 2627.10 samples/sec   Loss 2.0350   LearningRate 0.0020   Epoch: 17   Global Step: 713330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:03,998-Speed 2630.73 samples/sec   Loss 2.0198   LearningRate 0.0020   Epoch: 17   Global Step: 713340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:07,897-Speed 2626.51 samples/sec   Loss 1.9716   LearningRate 0.0020   Epoch: 17   Global Step: 713350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:11,797-Speed 2625.89 samples/sec   Loss 2.0189   LearningRate 0.0020   Epoch: 17   Global Step: 713360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:15,734-Speed 2602.43 samples/sec   Loss 2.0138   LearningRate 0.0020   Epoch: 17   Global Step: 713370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:19,628-Speed 2630.13 samples/sec   Loss 2.0018   LearningRate 0.0020   Epoch: 17   Global Step: 713380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:23,527-Speed 2627.98 samples/sec   Loss 2.0111   LearningRate 0.0020   Epoch: 17   Global Step: 713390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:27,428-Speed 2625.50 samples/sec   Loss 1.9445   LearningRate 0.0020   Epoch: 17   Global Step: 713400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:31,325-Speed 2628.70 samples/sec   Loss 1.9965   LearningRate 0.0020   Epoch: 17   Global Step: 713410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:35,219-Speed 2630.37 samples/sec   Loss 1.9685   LearningRate 0.0020   Epoch: 17   Global Step: 713420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:55:39,085-Speed 2648.86 samples/sec   Loss 1.9322   LearningRate 0.0020   Epoch: 17   Global Step: 713430   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:55:42,988-Speed 2624.43 samples/sec   Loss 2.0110   LearningRate 0.0020   Epoch: 17   Global Step: 713440   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:55:46,966-Speed 2575.34 samples/sec   Loss 2.0540   LearningRate 0.0020   Epoch: 17   Global Step: 713450   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:55:50,859-Speed 2631.19 samples/sec   Loss 2.0143   LearningRate 0.0020   Epoch: 17   Global Step: 713460   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:55:54,752-Speed 2630.49 samples/sec   Loss 2.0587   LearningRate 0.0020   Epoch: 17   Global Step: 713470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:55:58,649-Speed 2629.23 samples/sec   Loss 1.9882   LearningRate 0.0020   Epoch: 17   Global Step: 713480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:02,546-Speed 2627.86 samples/sec   Loss 2.0097   LearningRate 0.0020   Epoch: 17   Global Step: 713490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:06,437-Speed 2632.66 samples/sec   Loss 1.9974   LearningRate 0.0020   Epoch: 17   Global Step: 713500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:10,334-Speed 2628.05 samples/sec   Loss 1.9884   LearningRate 0.0020   Epoch: 17   Global Step: 713510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:14,233-Speed 2627.58 samples/sec   Loss 1.9750   LearningRate 0.0020   Epoch: 17   Global Step: 713520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:18,128-Speed 2629.66 samples/sec   Loss 2.0033   LearningRate 0.0020   Epoch: 17   Global Step: 713530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:56:22,019-Speed 2632.43 samples/sec   Loss 1.9825   LearningRate 0.0020   Epoch: 17   Global Step: 713540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:26,039-Speed 2548.58 samples/sec   Loss 1.9430   LearningRate 0.0020   Epoch: 17   Global Step: 713550   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:29,932-Speed 2630.74 samples/sec   Loss 2.0380   LearningRate 0.0020   Epoch: 17   Global Step: 713560   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:33,826-Speed 2630.30 samples/sec   Loss 2.0092   LearningRate 0.0020   Epoch: 17   Global Step: 713570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:37,720-Speed 2631.20 samples/sec   Loss 2.0527   LearningRate 0.0020   Epoch: 17   Global Step: 713580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:41,615-Speed 2629.65 samples/sec   Loss 1.9934   LearningRate 0.0020   Epoch: 17   Global Step: 713590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:45,513-Speed 2627.29 samples/sec   Loss 1.9824   LearningRate 0.0020   Epoch: 17   Global Step: 713600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:49,403-Speed 2633.71 samples/sec   Loss 1.9171   LearningRate 0.0020   Epoch: 17   Global Step: 713610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:53,297-Speed 2630.19 samples/sec   Loss 2.0532   LearningRate 0.0020   Epoch: 17   Global Step: 713620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:56:57,258-Speed 2586.16 samples/sec   Loss 1.9465   LearningRate 0.0020   Epoch: 17   Global Step: 713630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:57:01,170-Speed 2617.85 samples/sec   Loss 1.9755   LearningRate 0.0020   Epoch: 17   Global Step: 713640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:05,252-Speed 2509.27 samples/sec   Loss 2.0034   LearningRate 0.0020   Epoch: 17   Global Step: 713650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:09,257-Speed 2557.71 samples/sec   Loss 2.0055   LearningRate 0.0020   Epoch: 17   Global Step: 713660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:13,169-Speed 2618.23 samples/sec   Loss 1.9714   LearningRate 0.0020   Epoch: 17   Global Step: 713670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:17,085-Speed 2615.75 samples/sec   Loss 1.9485   LearningRate 0.0020   Epoch: 17   Global Step: 713680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:20,991-Speed 2622.66 samples/sec   Loss 1.9542   LearningRate 0.0020   Epoch: 17   Global Step: 713690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:24,952-Speed 2585.99 samples/sec   Loss 1.9178   LearningRate 0.0020   Epoch: 17   Global Step: 713700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:28,865-Speed 2617.56 samples/sec   Loss 2.0058   LearningRate 0.0020   Epoch: 17   Global Step: 713710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:57:32,742-Speed 2641.97 samples/sec   Loss 1.9599   LearningRate 0.0020   Epoch: 17   Global Step: 713720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:57:36,633-Speed 2631.78 samples/sec   Loss 2.0346   LearningRate 0.0020   Epoch: 17   Global Step: 713730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:57:40,525-Speed 2632.16 samples/sec   Loss 1.9869   LearningRate 0.0019   Epoch: 17   Global Step: 713740   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:57:44,415-Speed 2633.03 samples/sec   Loss 2.0411   LearningRate 0.0019   Epoch: 17   Global Step: 713750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:57:48,307-Speed 2632.14 samples/sec   Loss 2.0227   LearningRate 0.0019   Epoch: 17   Global Step: 713760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:57:52,179-Speed 2645.00 samples/sec   Loss 1.9375   LearningRate 0.0019   Epoch: 17   Global Step: 713770   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:57:56,077-Speed 2628.32 samples/sec   Loss 2.0336   LearningRate 0.0019   Epoch: 17   Global Step: 713780   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:57:59,987-Speed 2618.82 samples/sec   Loss 2.0172   LearningRate 0.0019   Epoch: 17   Global Step: 713790   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:03,879-Speed 2631.62 samples/sec   Loss 1.9903   LearningRate 0.0019   Epoch: 17   Global Step: 713800   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:07,778-Speed 2627.04 samples/sec   Loss 2.0152   LearningRate 0.0019   Epoch: 17   Global Step: 713810   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:11,668-Speed 2633.84 samples/sec   Loss 2.0343   LearningRate 0.0019   Epoch: 17   Global Step: 713820   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:15,574-Speed 2621.75 samples/sec   Loss 1.9994   LearningRate 0.0019   Epoch: 17   Global Step: 713830   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:19,501-Speed 2608.48 samples/sec   Loss 2.0211   LearningRate 0.0019   Epoch: 17   Global Step: 713840   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:23,393-Speed 2632.02 samples/sec   Loss 2.0074   LearningRate 0.0019   Epoch: 17   Global Step: 713850   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:27,284-Speed 2632.32 samples/sec   Loss 2.0107   LearningRate 0.0019   Epoch: 17   Global Step: 713860   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 03:58:31,174-Speed 2633.18 samples/sec   Loss 2.0743   LearningRate 0.0019   Epoch: 17   Global Step: 713870   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:58:35,070-Speed 2628.93 samples/sec   Loss 2.0138   LearningRate 0.0019   Epoch: 17   Global Step: 713880   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:58:38,964-Speed 2630.32 samples/sec   Loss 1.9755   LearningRate 0.0019   Epoch: 17   Global Step: 713890   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:58:42,856-Speed 2631.85 samples/sec   Loss 1.9941   LearningRate 0.0019   Epoch: 17   Global Step: 713900   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:58:46,748-Speed 2632.60 samples/sec   Loss 2.0130   LearningRate 0.0019   Epoch: 17   Global Step: 713910   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:58:50,641-Speed 2630.61 samples/sec   Loss 2.0371   LearningRate 0.0019   Epoch: 17   Global Step: 713920   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:58:54,542-Speed 2626.63 samples/sec   Loss 2.0337   LearningRate 0.0019   Epoch: 17   Global Step: 713930   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:58:58,437-Speed 2629.51 samples/sec   Loss 1.9879   LearningRate 0.0019   Epoch: 17   Global Step: 713940   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:59:02,346-Speed 2620.05 samples/sec   Loss 1.9449   LearningRate 0.0019   Epoch: 17   Global Step: 713950   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:59:06,249-Speed 2624.30 samples/sec   Loss 2.0384   LearningRate 0.0019   Epoch: 17   Global Step: 713960   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 03:59:10,148-Speed 2627.70 samples/sec   Loss 2.0258   LearningRate 0.0019   Epoch: 17   Global Step: 713970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:14,112-Speed 2583.33 samples/sec   Loss 1.9706   LearningRate 0.0019   Epoch: 17   Global Step: 713980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:18,030-Speed 2614.91 samples/sec   Loss 1.9759   LearningRate 0.0019   Epoch: 17   Global Step: 713990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:21,929-Speed 2627.67 samples/sec   Loss 1.9706   LearningRate 0.0019   Epoch: 17   Global Step: 714000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:25,846-Speed 2614.60 samples/sec   Loss 1.9711   LearningRate 0.0019   Epoch: 17   Global Step: 714010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:29,739-Speed 2631.48 samples/sec   Loss 2.0023   LearningRate 0.0019   Epoch: 17   Global Step: 714020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:33,644-Speed 2622.98 samples/sec   Loss 1.9854   LearningRate 0.0019   Epoch: 17   Global Step: 714030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:37,538-Speed 2629.68 samples/sec   Loss 2.0158   LearningRate 0.0019   Epoch: 17   Global Step: 714040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:41,435-Speed 2628.37 samples/sec   Loss 1.9782   LearningRate 0.0019   Epoch: 17   Global Step: 714050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:45,331-Speed 2629.09 samples/sec   Loss 2.0026   LearningRate 0.0019   Epoch: 17   Global Step: 714060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:49,203-Speed 2645.31 samples/sec   Loss 2.0085   LearningRate 0.0019   Epoch: 17   Global Step: 714070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:53,095-Speed 2632.39 samples/sec   Loss 1.9643   LearningRate 0.0019   Epoch: 17   Global Step: 714080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 03:59:56,986-Speed 2631.67 samples/sec   Loss 2.0084   LearningRate 0.0019   Epoch: 17   Global Step: 714090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:00,879-Speed 2631.80 samples/sec   Loss 2.0072   LearningRate 0.0019   Epoch: 17   Global Step: 714100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:04,771-Speed 2631.70 samples/sec   Loss 2.0192   LearningRate 0.0019   Epoch: 17   Global Step: 714110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:08,666-Speed 2629.77 samples/sec   Loss 2.0013   LearningRate 0.0019   Epoch: 17   Global Step: 714120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:12,611-Speed 2595.70 samples/sec   Loss 1.9829   LearningRate 0.0019   Epoch: 17   Global Step: 714130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:16,507-Speed 2629.48 samples/sec   Loss 2.0296   LearningRate 0.0019   Epoch: 17   Global Step: 714140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:20,405-Speed 2627.58 samples/sec   Loss 2.0148   LearningRate 0.0019   Epoch: 17   Global Step: 714150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:24,305-Speed 2626.97 samples/sec   Loss 1.9823   LearningRate 0.0019   Epoch: 17   Global Step: 714160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:28,176-Speed 2645.79 samples/sec   Loss 2.0337   LearningRate 0.0019   Epoch: 17   Global Step: 714170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:32,074-Speed 2627.80 samples/sec   Loss 1.9507   LearningRate 0.0019   Epoch: 17   Global Step: 714180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:35,971-Speed 2628.38 samples/sec   Loss 1.9975   LearningRate 0.0019   Epoch: 17   Global Step: 714190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:39,878-Speed 2621.89 samples/sec   Loss 1.9439   LearningRate 0.0019   Epoch: 17   Global Step: 714200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:43,773-Speed 2629.40 samples/sec   Loss 1.9607   LearningRate 0.0019   Epoch: 17   Global Step: 714210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:47,676-Speed 2624.67 samples/sec   Loss 2.0249   LearningRate 0.0019   Epoch: 17   Global Step: 714220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:51,575-Speed 2627.12 samples/sec   Loss 1.9854   LearningRate 0.0019   Epoch: 17   Global Step: 714230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:55,468-Speed 2630.66 samples/sec   Loss 1.9911   LearningRate 0.0019   Epoch: 17   Global Step: 714240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:00:59,374-Speed 2623.09 samples/sec   Loss 1.9639   LearningRate 0.0019   Epoch: 17   Global Step: 714250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:03,345-Speed 2579.09 samples/sec   Loss 1.9603   LearningRate 0.0019   Epoch: 17   Global Step: 714260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:07,404-Speed 2522.85 samples/sec   Loss 1.9843   LearningRate 0.0019   Epoch: 17   Global Step: 714270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:11,465-Speed 2522.07 samples/sec   Loss 2.0334   LearningRate 0.0019   Epoch: 17   Global Step: 714280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:15,510-Speed 2532.92 samples/sec   Loss 1.9532   LearningRate 0.0019   Epoch: 17   Global Step: 714290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:19,410-Speed 2625.95 samples/sec   Loss 1.9735   LearningRate 0.0019   Epoch: 17   Global Step: 714300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:23,313-Speed 2624.61 samples/sec   Loss 1.9668   LearningRate 0.0019   Epoch: 17   Global Step: 714310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:27,208-Speed 2629.94 samples/sec   Loss 2.0210   LearningRate 0.0019   Epoch: 17   Global Step: 714320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:31,095-Speed 2635.09 samples/sec   Loss 1.9748   LearningRate 0.0019   Epoch: 17   Global Step: 714330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:34,989-Speed 2630.24 samples/sec   Loss 1.9955   LearningRate 0.0019   Epoch: 17   Global Step: 714340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:38,889-Speed 2625.86 samples/sec   Loss 1.9810   LearningRate 0.0019   Epoch: 17   Global Step: 714350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:42,783-Speed 2630.28 samples/sec   Loss 1.9801   LearningRate 0.0019   Epoch: 17   Global Step: 714360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:46,674-Speed 2632.74 samples/sec   Loss 1.9957   LearningRate 0.0019   Epoch: 17   Global Step: 714370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:01:50,568-Speed 2630.46 samples/sec   Loss 2.0364   LearningRate 0.0019   Epoch: 17   Global Step: 714380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:01:54,446-Speed 2641.13 samples/sec   Loss 1.8877   LearningRate 0.0019   Epoch: 17   Global Step: 714390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:01:58,432-Speed 2569.73 samples/sec   Loss 1.9309   LearningRate 0.0019   Epoch: 17   Global Step: 714400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:02:02,331-Speed 2627.16 samples/sec   Loss 1.9582   LearningRate 0.0019   Epoch: 17   Global Step: 714410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:02:06,224-Speed 2630.66 samples/sec   Loss 1.9855   LearningRate 0.0019   Epoch: 17   Global Step: 714420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:02:10,121-Speed 2627.78 samples/sec   Loss 1.9565   LearningRate 0.0019   Epoch: 17   Global Step: 714430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:02:14,019-Speed 2627.36 samples/sec   Loss 1.9932   LearningRate 0.0019   Epoch: 17   Global Step: 714440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:02:17,913-Speed 2631.08 samples/sec   Loss 1.9622   LearningRate 0.0019   Epoch: 17   Global Step: 714450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:02:21,876-Speed 2585.27 samples/sec   Loss 1.9477   LearningRate 0.0019   Epoch: 17   Global Step: 714460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:02:25,767-Speed 2632.25 samples/sec   Loss 1.9358   LearningRate 0.0019   Epoch: 17   Global Step: 714470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:29,731-Speed 2583.29 samples/sec   Loss 1.9524   LearningRate 0.0019   Epoch: 17   Global Step: 714480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:33,630-Speed 2627.18 samples/sec   Loss 1.9516   LearningRate 0.0019   Epoch: 17   Global Step: 714490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:37,524-Speed 2630.49 samples/sec   Loss 1.9544   LearningRate 0.0019   Epoch: 17   Global Step: 714500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:41,447-Speed 2611.22 samples/sec   Loss 1.9622   LearningRate 0.0019   Epoch: 17   Global Step: 714510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:45,333-Speed 2635.90 samples/sec   Loss 1.9677   LearningRate 0.0019   Epoch: 17   Global Step: 714520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:49,329-Speed 2563.22 samples/sec   Loss 2.0437   LearningRate 0.0019   Epoch: 17   Global Step: 714530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:53,220-Speed 2632.44 samples/sec   Loss 1.9931   LearningRate 0.0019   Epoch: 17   Global Step: 714540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:02:57,149-Speed 2607.99 samples/sec   Loss 2.0033   LearningRate 0.0019   Epoch: 17   Global Step: 714550   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:03:01,036-Speed 2634.90 samples/sec   Loss 1.9956   LearningRate 0.0019   Epoch: 17   Global Step: 714560   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:03:04,933-Speed 2628.22 samples/sec   Loss 2.0182   LearningRate 0.0019   Epoch: 17   Global Step: 714570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:08,828-Speed 2629.38 samples/sec   Loss 2.0568   LearningRate 0.0019   Epoch: 17   Global Step: 714580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:12,724-Speed 2629.15 samples/sec   Loss 1.9271   LearningRate 0.0019   Epoch: 17   Global Step: 714590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:16,621-Speed 2628.93 samples/sec   Loss 2.0006   LearningRate 0.0019   Epoch: 17   Global Step: 714600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:20,531-Speed 2619.60 samples/sec   Loss 2.0004   LearningRate 0.0019   Epoch: 17   Global Step: 714610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:24,421-Speed 2633.32 samples/sec   Loss 1.9164   LearningRate 0.0019   Epoch: 17   Global Step: 714620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:28,333-Speed 2617.74 samples/sec   Loss 1.9544   LearningRate 0.0019   Epoch: 17   Global Step: 714630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:32,284-Speed 2592.66 samples/sec   Loss 1.9533   LearningRate 0.0019   Epoch: 17   Global Step: 714640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:36,190-Speed 2622.12 samples/sec   Loss 1.8998   LearningRate 0.0019   Epoch: 17   Global Step: 714650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:40,111-Speed 2611.88 samples/sec   Loss 1.9662   LearningRate 0.0019   Epoch: 17   Global Step: 714660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:44,155-Speed 2533.00 samples/sec   Loss 1.9376   LearningRate 0.0019   Epoch: 17   Global Step: 714670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:48,049-Speed 2630.29 samples/sec   Loss 1.9593   LearningRate 0.0019   Epoch: 17   Global Step: 714680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:51,937-Speed 2634.40 samples/sec   Loss 2.0130   LearningRate 0.0019   Epoch: 17   Global Step: 714690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:55,835-Speed 2628.35 samples/sec   Loss 2.0311   LearningRate 0.0019   Epoch: 17   Global Step: 714700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:03:59,737-Speed 2624.36 samples/sec   Loss 1.9687   LearningRate 0.0019   Epoch: 17   Global Step: 714710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:03,651-Speed 2617.25 samples/sec   Loss 1.9044   LearningRate 0.0019   Epoch: 17   Global Step: 714720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:07,544-Speed 2630.73 samples/sec   Loss 1.9850   LearningRate 0.0019   Epoch: 17   Global Step: 714730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:11,437-Speed 2631.02 samples/sec   Loss 2.0354   LearningRate 0.0019   Epoch: 17   Global Step: 714740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:15,328-Speed 2632.69 samples/sec   Loss 1.9755   LearningRate 0.0019   Epoch: 17   Global Step: 714750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:19,263-Speed 2602.76 samples/sec   Loss 1.9114   LearningRate 0.0019   Epoch: 17   Global Step: 714760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:23,223-Speed 2586.62 samples/sec   Loss 2.0118   LearningRate 0.0019   Epoch: 17   Global Step: 714770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:04:27,161-Speed 2601.16 samples/sec   Loss 2.0506   LearningRate 0.0019   Epoch: 17   Global Step: 714780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:31,053-Speed 2632.11 samples/sec   Loss 1.9535   LearningRate 0.0019   Epoch: 17   Global Step: 714790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:34,945-Speed 2631.22 samples/sec   Loss 2.0434   LearningRate 0.0019   Epoch: 17   Global Step: 714800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:38,838-Speed 2631.26 samples/sec   Loss 1.9513   LearningRate 0.0019   Epoch: 17   Global Step: 714810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:42,733-Speed 2629.85 samples/sec   Loss 2.0224   LearningRate 0.0019   Epoch: 17   Global Step: 714820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:46,630-Speed 2629.21 samples/sec   Loss 1.9877   LearningRate 0.0019   Epoch: 17   Global Step: 714830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:50,531-Speed 2626.01 samples/sec   Loss 1.9979   LearningRate 0.0019   Epoch: 17   Global Step: 714840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:54,422-Speed 2632.58 samples/sec   Loss 1.9506   LearningRate 0.0019   Epoch: 17   Global Step: 714850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:04:58,321-Speed 2627.05 samples/sec   Loss 1.9723   LearningRate 0.0019   Epoch: 17   Global Step: 714860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:05:02,247-Speed 2608.68 samples/sec   Loss 2.0558   LearningRate 0.0019   Epoch: 17   Global Step: 714870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:05:06,150-Speed 2624.22 samples/sec   Loss 1.9188   LearningRate 0.0019   Epoch: 17   Global Step: 714880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:05:10,054-Speed 2624.21 samples/sec   Loss 1.9273   LearningRate 0.0019   Epoch: 17   Global Step: 714890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:05:13,930-Speed 2642.38 samples/sec   Loss 2.0053   LearningRate 0.0019   Epoch: 17   Global Step: 714900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:05:17,827-Speed 2629.01 samples/sec   Loss 2.0191   LearningRate 0.0019   Epoch: 17   Global Step: 714910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:05:21,732-Speed 2623.10 samples/sec   Loss 1.9678   LearningRate 0.0019   Epoch: 17   Global Step: 714920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:05:25,628-Speed 2628.81 samples/sec   Loss 1.9655   LearningRate 0.0019   Epoch: 17   Global Step: 714930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:05:29,523-Speed 2629.81 samples/sec   Loss 1.9248   LearningRate 0.0019   Epoch: 17   Global Step: 714940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:05:33,399-Speed 2642.88 samples/sec   Loss 2.0061   LearningRate 0.0019   Epoch: 17   Global Step: 714950   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:05:37,294-Speed 2629.52 samples/sec   Loss 1.9267   LearningRate 0.0019   Epoch: 17   Global Step: 714960   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:05:41,191-Speed 2628.28 samples/sec   Loss 1.9996   LearningRate 0.0019   Epoch: 17   Global Step: 714970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:05:45,084-Speed 2631.09 samples/sec   Loss 1.9676   LearningRate 0.0019   Epoch: 17   Global Step: 714980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:05:48,983-Speed 2626.57 samples/sec   Loss 1.9508   LearningRate 0.0019   Epoch: 17   Global Step: 714990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:05:52,883-Speed 2626.55 samples/sec   Loss 1.9426   LearningRate 0.0019   Epoch: 17   Global Step: 715000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:05:56,779-Speed 2629.52 samples/sec   Loss 1.9230   LearningRate 0.0019   Epoch: 17   Global Step: 715010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:00,679-Speed 2626.10 samples/sec   Loss 1.9312   LearningRate 0.0019   Epoch: 17   Global Step: 715020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:04,577-Speed 2627.62 samples/sec   Loss 2.0090   LearningRate 0.0019   Epoch: 17   Global Step: 715030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:08,478-Speed 2625.60 samples/sec   Loss 2.0235   LearningRate 0.0019   Epoch: 17   Global Step: 715040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:12,377-Speed 2626.37 samples/sec   Loss 1.9830   LearningRate 0.0019   Epoch: 17   Global Step: 715050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:06:16,330-Speed 2591.43 samples/sec   Loss 1.9725   LearningRate 0.0019   Epoch: 17   Global Step: 715060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:06:20,225-Speed 2629.22 samples/sec   Loss 1.9870   LearningRate 0.0019   Epoch: 17   Global Step: 715070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:06:24,106-Speed 2640.24 samples/sec   Loss 1.9952   LearningRate 0.0019   Epoch: 17   Global Step: 715080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:28,016-Speed 2619.49 samples/sec   Loss 1.9672   LearningRate 0.0019   Epoch: 17   Global Step: 715090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:31,907-Speed 2632.37 samples/sec   Loss 1.9581   LearningRate 0.0019   Epoch: 17   Global Step: 715100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:35,799-Speed 2631.31 samples/sec   Loss 2.0162   LearningRate 0.0019   Epoch: 17   Global Step: 715110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:39,701-Speed 2624.91 samples/sec   Loss 1.9306   LearningRate 0.0019   Epoch: 17   Global Step: 715120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:43,604-Speed 2623.93 samples/sec   Loss 1.9744   LearningRate 0.0019   Epoch: 17   Global Step: 715130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:47,519-Speed 2616.96 samples/sec   Loss 1.9667   LearningRate 0.0019   Epoch: 17   Global Step: 715140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:51,405-Speed 2635.38 samples/sec   Loss 1.9845   LearningRate 0.0019   Epoch: 17   Global Step: 715150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:55,365-Speed 2587.19 samples/sec   Loss 1.9767   LearningRate 0.0019   Epoch: 17   Global Step: 715160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:06:59,276-Speed 2618.75 samples/sec   Loss 1.9298   LearningRate 0.0019   Epoch: 17   Global Step: 715170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:03,182-Speed 2622.34 samples/sec   Loss 1.9768   LearningRate 0.0019   Epoch: 17   Global Step: 715180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:07:07,076-Speed 2630.62 samples/sec   Loss 1.9820   LearningRate 0.0019   Epoch: 17   Global Step: 715190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:07:10,972-Speed 2628.57 samples/sec   Loss 1.9609   LearningRate 0.0019   Epoch: 17   Global Step: 715200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:07:14,847-Speed 2643.44 samples/sec   Loss 1.9744   LearningRate 0.0019   Epoch: 17   Global Step: 715210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:18,748-Speed 2625.72 samples/sec   Loss 1.9515   LearningRate 0.0019   Epoch: 17   Global Step: 715220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:22,651-Speed 2624.73 samples/sec   Loss 1.9978   LearningRate 0.0019   Epoch: 17   Global Step: 715230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:26,569-Speed 2614.23 samples/sec   Loss 1.9706   LearningRate 0.0019   Epoch: 17   Global Step: 715240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:30,474-Speed 2623.39 samples/sec   Loss 1.9824   LearningRate 0.0019   Epoch: 17   Global Step: 715250   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:34,376-Speed 2624.78 samples/sec   Loss 1.9409   LearningRate 0.0019   Epoch: 17   Global Step: 715260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:38,272-Speed 2628.82 samples/sec   Loss 1.9440   LearningRate 0.0019   Epoch: 17   Global Step: 715270   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:42,167-Speed 2629.81 samples/sec   Loss 2.0051   LearningRate 0.0019   Epoch: 17   Global Step: 715280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:46,064-Speed 2628.80 samples/sec   Loss 1.9408   LearningRate 0.0019   Epoch: 17   Global Step: 715290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:50,036-Speed 2578.13 samples/sec   Loss 1.9332   LearningRate 0.0019   Epoch: 17   Global Step: 715300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:07:53,932-Speed 2629.57 samples/sec   Loss 2.0089   LearningRate 0.0019   Epoch: 17   Global Step: 715310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:07:57,830-Speed 2627.81 samples/sec   Loss 2.0122   LearningRate 0.0019   Epoch: 17   Global Step: 715320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:01,956-Speed 2482.49 samples/sec   Loss 1.9982   LearningRate 0.0019   Epoch: 17   Global Step: 715330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:05,868-Speed 2618.09 samples/sec   Loss 1.9454   LearningRate 0.0019   Epoch: 17   Global Step: 715340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:09,785-Speed 2615.31 samples/sec   Loss 1.9315   LearningRate 0.0019   Epoch: 17   Global Step: 715350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:13,697-Speed 2617.85 samples/sec   Loss 2.0025   LearningRate 0.0019   Epoch: 17   Global Step: 715360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:17,826-Speed 2481.00 samples/sec   Loss 1.9690   LearningRate 0.0019   Epoch: 17   Global Step: 715370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:21,797-Speed 2579.08 samples/sec   Loss 2.0127   LearningRate 0.0019   Epoch: 17   Global Step: 715380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:25,705-Speed 2620.66 samples/sec   Loss 2.0098   LearningRate 0.0019   Epoch: 17   Global Step: 715390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:29,604-Speed 2626.84 samples/sec   Loss 2.0219   LearningRate 0.0019   Epoch: 17   Global Step: 715400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:33,497-Speed 2632.57 samples/sec   Loss 1.9264   LearningRate 0.0019   Epoch: 17   Global Step: 715410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:08:37,373-Speed 2642.59 samples/sec   Loss 1.9935   LearningRate 0.0019   Epoch: 17   Global Step: 715420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:41,266-Speed 2630.76 samples/sec   Loss 1.9983   LearningRate 0.0019   Epoch: 17   Global Step: 715430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:45,169-Speed 2624.20 samples/sec   Loss 1.9834   LearningRate 0.0019   Epoch: 17   Global Step: 715440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:49,061-Speed 2632.10 samples/sec   Loss 1.9180   LearningRate 0.0019   Epoch: 17   Global Step: 715450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:52,956-Speed 2629.84 samples/sec   Loss 1.9924   LearningRate 0.0019   Epoch: 17   Global Step: 715460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:08:56,851-Speed 2629.94 samples/sec   Loss 1.9180   LearningRate 0.0019   Epoch: 17   Global Step: 715470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:00,745-Speed 2630.29 samples/sec   Loss 1.9845   LearningRate 0.0019   Epoch: 17   Global Step: 715480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:04,642-Speed 2627.94 samples/sec   Loss 1.9721   LearningRate 0.0019   Epoch: 17   Global Step: 715490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:08,536-Speed 2630.21 samples/sec   Loss 1.9867   LearningRate 0.0019   Epoch: 17   Global Step: 715500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:12,437-Speed 2626.31 samples/sec   Loss 1.9395   LearningRate 0.0019   Epoch: 17   Global Step: 715510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:16,340-Speed 2624.18 samples/sec   Loss 1.9984   LearningRate 0.0019   Epoch: 17   Global Step: 715520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:09:20,227-Speed 2635.27 samples/sec   Loss 2.0018   LearningRate 0.0019   Epoch: 17   Global Step: 715530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:24,134-Speed 2621.59 samples/sec   Loss 1.9513   LearningRate 0.0019   Epoch: 17   Global Step: 715540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:28,066-Speed 2605.74 samples/sec   Loss 2.0074   LearningRate 0.0019   Epoch: 17   Global Step: 715550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:31,991-Speed 2609.50 samples/sec   Loss 1.9465   LearningRate 0.0019   Epoch: 17   Global Step: 715560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:35,886-Speed 2629.34 samples/sec   Loss 1.9795   LearningRate 0.0019   Epoch: 17   Global Step: 715570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:39,784-Speed 2627.78 samples/sec   Loss 1.9589   LearningRate 0.0019   Epoch: 17   Global Step: 715580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:43,697-Speed 2618.89 samples/sec   Loss 1.9499   LearningRate 0.0019   Epoch: 17   Global Step: 715590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:47,632-Speed 2602.96 samples/sec   Loss 1.9134   LearningRate 0.0019   Epoch: 17   Global Step: 715600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:51,529-Speed 2627.99 samples/sec   Loss 2.0400   LearningRate 0.0019   Epoch: 17   Global Step: 715610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:09:55,406-Speed 2642.92 samples/sec   Loss 1.9837   LearningRate 0.0019   Epoch: 17   Global Step: 715620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:09:59,303-Speed 2627.74 samples/sec   Loss 1.9966   LearningRate 0.0019   Epoch: 17   Global Step: 715630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:03,202-Speed 2627.02 samples/sec   Loss 1.9136   LearningRate 0.0019   Epoch: 17   Global Step: 715640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:07,122-Speed 2612.81 samples/sec   Loss 1.9531   LearningRate 0.0019   Epoch: 17   Global Step: 715650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:11,032-Speed 2620.10 samples/sec   Loss 2.0056   LearningRate 0.0019   Epoch: 17   Global Step: 715660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:14,922-Speed 2633.16 samples/sec   Loss 1.9967   LearningRate 0.0019   Epoch: 17   Global Step: 715670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:18,857-Speed 2602.73 samples/sec   Loss 1.9649   LearningRate 0.0019   Epoch: 17   Global Step: 715680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:22,766-Speed 2620.31 samples/sec   Loss 1.9506   LearningRate 0.0019   Epoch: 17   Global Step: 715690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:26,669-Speed 2624.79 samples/sec   Loss 1.9170   LearningRate 0.0019   Epoch: 17   Global Step: 715700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:30,562-Speed 2630.97 samples/sec   Loss 1.9823   LearningRate 0.0019   Epoch: 17   Global Step: 715710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:34,453-Speed 2632.20 samples/sec   Loss 1.9982   LearningRate 0.0019   Epoch: 17   Global Step: 715720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:10:38,347-Speed 2630.78 samples/sec   Loss 1.9855   LearningRate 0.0019   Epoch: 17   Global Step: 715730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:10:42,240-Speed 2630.71 samples/sec   Loss 2.0250   LearningRate 0.0019   Epoch: 17   Global Step: 715740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:10:46,132-Speed 2631.62 samples/sec   Loss 2.0011   LearningRate 0.0019   Epoch: 17   Global Step: 715750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:10:50,006-Speed 2644.32 samples/sec   Loss 2.0310   LearningRate 0.0019   Epoch: 17   Global Step: 715760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:53,899-Speed 2630.84 samples/sec   Loss 1.9954   LearningRate 0.0019   Epoch: 17   Global Step: 715770   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:10:57,791-Speed 2633.09 samples/sec   Loss 1.9497   LearningRate 0.0019   Epoch: 17   Global Step: 715780   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:11:01,660-Speed 2647.12 samples/sec   Loss 1.9696   LearningRate 0.0019   Epoch: 17   Global Step: 715790   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:05,557-Speed 2628.07 samples/sec   Loss 1.9363   LearningRate 0.0019   Epoch: 17   Global Step: 715800   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:09,457-Speed 2625.65 samples/sec   Loss 1.9492   LearningRate 0.0019   Epoch: 17   Global Step: 715810   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:13,357-Speed 2626.61 samples/sec   Loss 1.9480   LearningRate 0.0019   Epoch: 17   Global Step: 715820   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:17,253-Speed 2629.35 samples/sec   Loss 2.0357   LearningRate 0.0019   Epoch: 17   Global Step: 715830   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:21,148-Speed 2629.04 samples/sec   Loss 1.9197   LearningRate 0.0019   Epoch: 17   Global Step: 715840   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:25,043-Speed 2630.43 samples/sec   Loss 1.9524   LearningRate 0.0019   Epoch: 17   Global Step: 715850   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:28,941-Speed 2627.71 samples/sec   Loss 2.0199   LearningRate 0.0019   Epoch: 17   Global Step: 715860   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:32,835-Speed 2630.07 samples/sec   Loss 1.9586   LearningRate 0.0019   Epoch: 17   Global Step: 715870   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:36,726-Speed 2631.80 samples/sec   Loss 2.0262   LearningRate 0.0019   Epoch: 17   Global Step: 715880   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:11:40,665-Speed 2601.43 samples/sec   Loss 2.0448   LearningRate 0.0019   Epoch: 17   Global Step: 715890   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:11:44,560-Speed 2629.90 samples/sec   Loss 1.8991   LearningRate 0.0019   Epoch: 17   Global Step: 715900   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:11:48,469-Speed 2620.10 samples/sec   Loss 1.9478   LearningRate 0.0019   Epoch: 17   Global Step: 715910   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:11:52,363-Speed 2630.49 samples/sec   Loss 1.9488   LearningRate 0.0019   Epoch: 17   Global Step: 715920   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:11:56,257-Speed 2630.84 samples/sec   Loss 1.9425   LearningRate 0.0019   Epoch: 17   Global Step: 715930   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:12:00,191-Speed 2603.44 samples/sec   Loss 1.9544   LearningRate 0.0019   Epoch: 17   Global Step: 715940   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:12:04,086-Speed 2630.00 samples/sec   Loss 1.9121   LearningRate 0.0019   Epoch: 17   Global Step: 715950   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:12:07,984-Speed 2627.23 samples/sec   Loss 1.8868   LearningRate 0.0019   Epoch: 17   Global Step: 715960   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:12:11,878-Speed 2630.59 samples/sec   Loss 1.9425   LearningRate 0.0019   Epoch: 17   Global Step: 715970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:12:15,772-Speed 2630.63 samples/sec   Loss 1.9916   LearningRate 0.0019   Epoch: 17   Global Step: 715980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:12:19,705-Speed 2603.94 samples/sec   Loss 1.9054   LearningRate 0.0019   Epoch: 17   Global Step: 715990   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:23,657-Speed 2591.85 samples/sec   Loss 1.9306   LearningRate 0.0019   Epoch: 17   Global Step: 716000   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:27,548-Speed 2632.56 samples/sec   Loss 1.9559   LearningRate 0.0019   Epoch: 17   Global Step: 716010   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:31,481-Speed 2604.38 samples/sec   Loss 1.8949   LearningRate 0.0019   Epoch: 17   Global Step: 716020   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:35,437-Speed 2589.49 samples/sec   Loss 1.9637   LearningRate 0.0019   Epoch: 17   Global Step: 716030   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:39,389-Speed 2591.34 samples/sec   Loss 1.9881   LearningRate 0.0019   Epoch: 17   Global Step: 716040   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:43,295-Speed 2622.13 samples/sec   Loss 1.9060   LearningRate 0.0019   Epoch: 17   Global Step: 716050   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:47,186-Speed 2632.76 samples/sec   Loss 1.9546   LearningRate 0.0019   Epoch: 17   Global Step: 716060   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:51,076-Speed 2632.90 samples/sec   Loss 1.9993   LearningRate 0.0019   Epoch: 17   Global Step: 716070   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:54,977-Speed 2626.00 samples/sec   Loss 1.9184   LearningRate 0.0019   Epoch: 17   Global Step: 716080   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:12:58,871-Speed 2630.64 samples/sec   Loss 1.9857   LearningRate 0.0019   Epoch: 17   Global Step: 716090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:02,766-Speed 2629.57 samples/sec   Loss 2.0443   LearningRate 0.0019   Epoch: 17   Global Step: 716100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:06,661-Speed 2630.10 samples/sec   Loss 1.9417   LearningRate 0.0019   Epoch: 17   Global Step: 716110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:10,567-Speed 2622.30 samples/sec   Loss 1.9000   LearningRate 0.0019   Epoch: 17   Global Step: 716120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:14,463-Speed 2628.89 samples/sec   Loss 1.9960   LearningRate 0.0019   Epoch: 17   Global Step: 716130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:18,358-Speed 2629.69 samples/sec   Loss 1.9229   LearningRate 0.0019   Epoch: 17   Global Step: 716140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:22,253-Speed 2629.83 samples/sec   Loss 1.9736   LearningRate 0.0019   Epoch: 17   Global Step: 716150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:26,153-Speed 2627.10 samples/sec   Loss 1.9800   LearningRate 0.0019   Epoch: 17   Global Step: 716160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:30,058-Speed 2622.88 samples/sec   Loss 1.9934   LearningRate 0.0019   Epoch: 17   Global Step: 716170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:33,957-Speed 2626.88 samples/sec   Loss 1.9641   LearningRate 0.0019   Epoch: 17   Global Step: 716180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:37,849-Speed 2631.83 samples/sec   Loss 1.9356   LearningRate 0.0019   Epoch: 17   Global Step: 716190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:13:41,742-Speed 2631.17 samples/sec   Loss 1.9582   LearningRate 0.0019   Epoch: 17   Global Step: 716200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:13:45,637-Speed 2629.61 samples/sec   Loss 1.9827   LearningRate 0.0019   Epoch: 17   Global Step: 716210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:13:49,504-Speed 2648.50 samples/sec   Loss 1.9674   LearningRate 0.0019   Epoch: 17   Global Step: 716220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:53,401-Speed 2628.54 samples/sec   Loss 1.8958   LearningRate 0.0019   Epoch: 17   Global Step: 716230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:13:57,301-Speed 2626.30 samples/sec   Loss 1.9296   LearningRate 0.0019   Epoch: 17   Global Step: 716240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:01,206-Speed 2623.24 samples/sec   Loss 1.9699   LearningRate 0.0019   Epoch: 17   Global Step: 716250   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:05,106-Speed 2626.10 samples/sec   Loss 2.0012   LearningRate 0.0019   Epoch: 17   Global Step: 716260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:09,005-Speed 2626.66 samples/sec   Loss 1.9414   LearningRate 0.0019   Epoch: 17   Global Step: 716270   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:12,904-Speed 2627.41 samples/sec   Loss 1.9310   LearningRate 0.0019   Epoch: 17   Global Step: 716280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:16,802-Speed 2627.47 samples/sec   Loss 2.0085   LearningRate 0.0019   Epoch: 17   Global Step: 716290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:20,698-Speed 2629.20 samples/sec   Loss 1.9825   LearningRate 0.0019   Epoch: 17   Global Step: 716300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:24,596-Speed 2627.50 samples/sec   Loss 1.8860   LearningRate 0.0019   Epoch: 17   Global Step: 716310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:14:28,495-Speed 2626.98 samples/sec   Loss 1.9410   LearningRate 0.0019   Epoch: 17   Global Step: 716320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:32,421-Speed 2608.98 samples/sec   Loss 1.8921   LearningRate 0.0019   Epoch: 17   Global Step: 716330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:36,318-Speed 2627.70 samples/sec   Loss 1.9644   LearningRate 0.0019   Epoch: 17   Global Step: 716340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:40,214-Speed 2628.54 samples/sec   Loss 1.9968   LearningRate 0.0019   Epoch: 17   Global Step: 716350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:44,115-Speed 2626.76 samples/sec   Loss 1.9399   LearningRate 0.0019   Epoch: 17   Global Step: 716360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:48,008-Speed 2630.48 samples/sec   Loss 1.9685   LearningRate 0.0019   Epoch: 17   Global Step: 716370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:51,916-Speed 2621.51 samples/sec   Loss 1.9336   LearningRate 0.0019   Epoch: 17   Global Step: 716380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:55,810-Speed 2629.98 samples/sec   Loss 1.9559   LearningRate 0.0019   Epoch: 17   Global Step: 716390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:14:59,707-Speed 2627.83 samples/sec   Loss 1.9721   LearningRate 0.0019   Epoch: 17   Global Step: 716400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:03,600-Speed 2631.24 samples/sec   Loss 1.9850   LearningRate 0.0019   Epoch: 17   Global Step: 716410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:07,551-Speed 2592.53 samples/sec   Loss 1.9637   LearningRate 0.0019   Epoch: 17   Global Step: 716420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:11,471-Speed 2612.63 samples/sec   Loss 2.0441   LearningRate 0.0019   Epoch: 17   Global Step: 716430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:15,394-Speed 2610.99 samples/sec   Loss 1.9782   LearningRate 0.0019   Epoch: 17   Global Step: 716440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:19,299-Speed 2623.07 samples/sec   Loss 1.9926   LearningRate 0.0019   Epoch: 17   Global Step: 716450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:23,211-Speed 2618.09 samples/sec   Loss 1.9810   LearningRate 0.0019   Epoch: 17   Global Step: 716460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:27,109-Speed 2628.16 samples/sec   Loss 1.9743   LearningRate 0.0019   Epoch: 17   Global Step: 716470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:31,003-Speed 2630.15 samples/sec   Loss 1.9629   LearningRate 0.0019   Epoch: 17   Global Step: 716480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:34,906-Speed 2624.11 samples/sec   Loss 1.9484   LearningRate 0.0019   Epoch: 17   Global Step: 716490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:38,800-Speed 2630.16 samples/sec   Loss 1.9416   LearningRate 0.0019   Epoch: 17   Global Step: 716500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:15:42,678-Speed 2641.24 samples/sec   Loss 1.9906   LearningRate 0.0019   Epoch: 17   Global Step: 716510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:15:46,580-Speed 2625.05 samples/sec   Loss 1.9553   LearningRate 0.0019   Epoch: 17   Global Step: 716520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:15:50,476-Speed 2629.13 samples/sec   Loss 1.8976   LearningRate 0.0019   Epoch: 17   Global Step: 716530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:15:54,376-Speed 2625.67 samples/sec   Loss 1.9682   LearningRate 0.0019   Epoch: 17   Global Step: 716540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:15:58,270-Speed 2631.45 samples/sec   Loss 1.9535   LearningRate 0.0019   Epoch: 17   Global Step: 716550   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:16:02,166-Speed 2628.61 samples/sec   Loss 2.0256   LearningRate 0.0019   Epoch: 17   Global Step: 716560   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:16:06,074-Speed 2620.41 samples/sec   Loss 1.9304   LearningRate 0.0019   Epoch: 17   Global Step: 716570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:16:09,968-Speed 2630.19 samples/sec   Loss 1.8833   LearningRate 0.0019   Epoch: 17   Global Step: 716580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:16:13,859-Speed 2633.39 samples/sec   Loss 1.8994   LearningRate 0.0019   Epoch: 17   Global Step: 716590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:16:17,766-Speed 2622.21 samples/sec   Loss 2.0146   LearningRate 0.0019   Epoch: 17   Global Step: 716600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:16:21,676-Speed 2619.49 samples/sec   Loss 1.9321   LearningRate 0.0019   Epoch: 17   Global Step: 716610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:25,573-Speed 2628.86 samples/sec   Loss 1.9699   LearningRate 0.0019   Epoch: 17   Global Step: 716620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:29,466-Speed 2630.62 samples/sec   Loss 1.9501   LearningRate 0.0019   Epoch: 17   Global Step: 716630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:33,363-Speed 2628.16 samples/sec   Loss 1.9254   LearningRate 0.0019   Epoch: 17   Global Step: 716640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:37,259-Speed 2629.20 samples/sec   Loss 2.0051   LearningRate 0.0019   Epoch: 17   Global Step: 716650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:41,154-Speed 2630.32 samples/sec   Loss 1.9248   LearningRate 0.0019   Epoch: 17   Global Step: 716660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:45,045-Speed 2631.84 samples/sec   Loss 1.9216   LearningRate 0.0019   Epoch: 17   Global Step: 716670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:48,940-Speed 2630.91 samples/sec   Loss 1.9003   LearningRate 0.0019   Epoch: 17   Global Step: 716680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:52,881-Speed 2598.57 samples/sec   Loss 1.9957   LearningRate 0.0019   Epoch: 17   Global Step: 716690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:16:56,776-Speed 2630.12 samples/sec   Loss 2.0304   LearningRate 0.0019   Epoch: 17   Global Step: 716700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:17:00,681-Speed 2623.35 samples/sec   Loss 1.9612   LearningRate 0.0019   Epoch: 17   Global Step: 716710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:17:04,552-Speed 2645.75 samples/sec   Loss 1.9694   LearningRate 0.0019   Epoch: 17   Global Step: 716720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:17:08,460-Speed 2620.35 samples/sec   Loss 1.9923   LearningRate 0.0019   Epoch: 17   Global Step: 716730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:17:12,345-Speed 2636.73 samples/sec   Loss 1.9774   LearningRate 0.0019   Epoch: 17   Global Step: 716740   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:16,241-Speed 2628.98 samples/sec   Loss 1.9150   LearningRate 0.0018   Epoch: 17   Global Step: 716750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:20,175-Speed 2603.75 samples/sec   Loss 1.9504   LearningRate 0.0018   Epoch: 17   Global Step: 716760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:24,079-Speed 2624.02 samples/sec   Loss 1.9929   LearningRate 0.0018   Epoch: 17   Global Step: 716770   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:27,972-Speed 2631.37 samples/sec   Loss 1.9214   LearningRate 0.0018   Epoch: 17   Global Step: 716780   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:31,983-Speed 2553.65 samples/sec   Loss 1.9537   LearningRate 0.0018   Epoch: 17   Global Step: 716790   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:36,072-Speed 2504.42 samples/sec   Loss 1.9477   LearningRate 0.0018   Epoch: 17   Global Step: 716800   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:40,163-Speed 2503.78 samples/sec   Loss 2.0268   LearningRate 0.0018   Epoch: 17   Global Step: 716810   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:44,211-Speed 2530.64 samples/sec   Loss 2.0378   LearningRate 0.0018   Epoch: 17   Global Step: 716820   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:48,108-Speed 2628.71 samples/sec   Loss 1.9785   LearningRate 0.0018   Epoch: 17   Global Step: 716830   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:17:52,013-Speed 2622.63 samples/sec   Loss 1.9689   LearningRate 0.0018   Epoch: 17   Global Step: 716840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:17:55,913-Speed 2627.55 samples/sec   Loss 1.9309   LearningRate 0.0018   Epoch: 17   Global Step: 716850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:17:59,811-Speed 2627.26 samples/sec   Loss 1.9514   LearningRate 0.0018   Epoch: 17   Global Step: 716860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:03,737-Speed 2608.59 samples/sec   Loss 1.9350   LearningRate 0.0018   Epoch: 17   Global Step: 716870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:07,641-Speed 2623.66 samples/sec   Loss 1.9482   LearningRate 0.0018   Epoch: 17   Global Step: 716880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:11,539-Speed 2628.41 samples/sec   Loss 1.9852   LearningRate 0.0018   Epoch: 17   Global Step: 716890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:15,432-Speed 2630.68 samples/sec   Loss 1.9568   LearningRate 0.0018   Epoch: 17   Global Step: 716900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:19,333-Speed 2625.70 samples/sec   Loss 1.9358   LearningRate 0.0018   Epoch: 17   Global Step: 716910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:23,229-Speed 2629.47 samples/sec   Loss 2.0135   LearningRate 0.0018   Epoch: 17   Global Step: 716920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:27,120-Speed 2632.21 samples/sec   Loss 1.9035   LearningRate 0.0018   Epoch: 17   Global Step: 716930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:31,016-Speed 2629.83 samples/sec   Loss 1.9511   LearningRate 0.0018   Epoch: 17   Global Step: 716940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:18:34,909-Speed 2630.26 samples/sec   Loss 1.9792   LearningRate 0.0018   Epoch: 17   Global Step: 716950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:18:38,783-Speed 2644.24 samples/sec   Loss 1.9744   LearningRate 0.0018   Epoch: 17   Global Step: 716960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:42,683-Speed 2625.59 samples/sec   Loss 2.0331   LearningRate 0.0018   Epoch: 17   Global Step: 716970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:46,576-Speed 2631.56 samples/sec   Loss 1.9021   LearningRate 0.0018   Epoch: 17   Global Step: 716980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:50,470-Speed 2630.36 samples/sec   Loss 1.9824   LearningRate 0.0018   Epoch: 17   Global Step: 716990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:54,377-Speed 2621.79 samples/sec   Loss 1.9082   LearningRate 0.0018   Epoch: 17   Global Step: 717000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:18:58,249-Speed 2645.46 samples/sec   Loss 1.9426   LearningRate 0.0018   Epoch: 17   Global Step: 717010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:02,141-Speed 2631.90 samples/sec   Loss 1.9792   LearningRate 0.0018   Epoch: 17   Global Step: 717020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:06,040-Speed 2627.14 samples/sec   Loss 1.9057   LearningRate 0.0018   Epoch: 17   Global Step: 717030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:09,941-Speed 2625.49 samples/sec   Loss 1.9683   LearningRate 0.0018   Epoch: 17   Global Step: 717040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:13,849-Speed 2620.80 samples/sec   Loss 1.9741   LearningRate 0.0018   Epoch: 17   Global Step: 717050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:17,751-Speed 2625.16 samples/sec   Loss 1.9153   LearningRate 0.0018   Epoch: 17   Global Step: 717060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:21,648-Speed 2628.13 samples/sec   Loss 1.9738   LearningRate 0.0018   Epoch: 17   Global Step: 717070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:25,543-Speed 2630.76 samples/sec   Loss 1.9393   LearningRate 0.0018   Epoch: 17   Global Step: 717080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:29,437-Speed 2630.10 samples/sec   Loss 1.9791   LearningRate 0.0018   Epoch: 17   Global Step: 717090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:33,336-Speed 2626.93 samples/sec   Loss 1.9636   LearningRate 0.0018   Epoch: 17   Global Step: 717100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:19:37,243-Speed 2621.74 samples/sec   Loss 1.9746   LearningRate 0.0018   Epoch: 17   Global Step: 717110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:19:41,171-Speed 2607.40 samples/sec   Loss 1.9385   LearningRate 0.0018   Epoch: 17   Global Step: 717120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:19:45,186-Speed 2550.63 samples/sec   Loss 1.9181   LearningRate 0.0018   Epoch: 17   Global Step: 717130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:19:49,092-Speed 2623.22 samples/sec   Loss 1.9226   LearningRate 0.0018   Epoch: 17   Global Step: 717140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:19:52,987-Speed 2629.47 samples/sec   Loss 1.9865   LearningRate 0.0018   Epoch: 17   Global Step: 717150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:19:56,885-Speed 2628.08 samples/sec   Loss 1.9744   LearningRate 0.0018   Epoch: 17   Global Step: 717160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:20:00,782-Speed 2628.06 samples/sec   Loss 2.0018   LearningRate 0.0018   Epoch: 17   Global Step: 717170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:20:04,678-Speed 2629.17 samples/sec   Loss 1.9768   LearningRate 0.0018   Epoch: 17   Global Step: 717180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:20:08,550-Speed 2645.27 samples/sec   Loss 1.9966   LearningRate 0.0018   Epoch: 17   Global Step: 717190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:12,446-Speed 2629.13 samples/sec   Loss 1.9559   LearningRate 0.0018   Epoch: 17   Global Step: 717200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:16,373-Speed 2607.57 samples/sec   Loss 1.8717   LearningRate 0.0018   Epoch: 17   Global Step: 717210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:20,267-Speed 2630.58 samples/sec   Loss 1.9681   LearningRate 0.0018   Epoch: 17   Global Step: 717220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:24,192-Speed 2609.94 samples/sec   Loss 1.9573   LearningRate 0.0018   Epoch: 17   Global Step: 717230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:28,096-Speed 2623.53 samples/sec   Loss 2.0165   LearningRate 0.0018   Epoch: 17   Global Step: 717240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:32,001-Speed 2623.14 samples/sec   Loss 1.9808   LearningRate 0.0018   Epoch: 17   Global Step: 717250   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:35,898-Speed 2628.40 samples/sec   Loss 1.9382   LearningRate 0.0018   Epoch: 17   Global Step: 717260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:39,824-Speed 2608.79 samples/sec   Loss 1.8713   LearningRate 0.0018   Epoch: 17   Global Step: 717270   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:43,725-Speed 2630.16 samples/sec   Loss 1.9339   LearningRate 0.0018   Epoch: 17   Global Step: 717280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:20:47,643-Speed 2614.12 samples/sec   Loss 1.9441   LearningRate 0.0018   Epoch: 17   Global Step: 717290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:20:51,543-Speed 2626.98 samples/sec   Loss 1.9578   LearningRate 0.0018   Epoch: 17   Global Step: 717300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:20:55,441-Speed 2627.36 samples/sec   Loss 1.8904   LearningRate 0.0018   Epoch: 17   Global Step: 717310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:20:59,322-Speed 2639.27 samples/sec   Loss 1.9509   LearningRate 0.0018   Epoch: 17   Global Step: 717320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:03,226-Speed 2623.41 samples/sec   Loss 1.9359   LearningRate 0.0018   Epoch: 17   Global Step: 717330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:07,128-Speed 2625.55 samples/sec   Loss 1.9800   LearningRate 0.0018   Epoch: 17   Global Step: 717340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:11,027-Speed 2627.49 samples/sec   Loss 1.9607   LearningRate 0.0018   Epoch: 17   Global Step: 717350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:14,920-Speed 2630.92 samples/sec   Loss 1.9378   LearningRate 0.0018   Epoch: 17   Global Step: 717360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:18,856-Speed 2602.66 samples/sec   Loss 1.9035   LearningRate 0.0018   Epoch: 17   Global Step: 717370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:22,765-Speed 2620.45 samples/sec   Loss 1.9573   LearningRate 0.0018   Epoch: 17   Global Step: 717380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:26,667-Speed 2624.98 samples/sec   Loss 1.9135   LearningRate 0.0018   Epoch: 17   Global Step: 717390   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:30,563-Speed 2629.11 samples/sec   Loss 1.9841   LearningRate 0.0018   Epoch: 17   Global Step: 717400   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:34,463-Speed 2626.32 samples/sec   Loss 1.9327   LearningRate 0.0018   Epoch: 17   Global Step: 717410   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:38,361-Speed 2627.20 samples/sec   Loss 1.9942   LearningRate 0.0018   Epoch: 17   Global Step: 717420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:21:42,256-Speed 2630.23 samples/sec   Loss 1.9169   LearningRate 0.0018   Epoch: 17   Global Step: 717430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:21:46,143-Speed 2634.81 samples/sec   Loss 1.9320   LearningRate 0.0018   Epoch: 17   Global Step: 717440   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:50,048-Speed 2623.24 samples/sec   Loss 2.0055   LearningRate 0.0018   Epoch: 17   Global Step: 717450   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:53,947-Speed 2626.98 samples/sec   Loss 1.9221   LearningRate 0.0018   Epoch: 17   Global Step: 717460   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:21:57,994-Speed 2531.40 samples/sec   Loss 1.9161   LearningRate 0.0018   Epoch: 17   Global Step: 717470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:22:01,899-Speed 2622.68 samples/sec   Loss 1.9350   LearningRate 0.0018   Epoch: 17   Global Step: 717480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:22:05,797-Speed 2627.94 samples/sec   Loss 1.9363   LearningRate 0.0018   Epoch: 17   Global Step: 717490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:22:09,689-Speed 2631.17 samples/sec   Loss 1.9447   LearningRate 0.0018   Epoch: 17   Global Step: 717500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:22:13,587-Speed 2628.25 samples/sec   Loss 1.8955   LearningRate 0.0018   Epoch: 17   Global Step: 717510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:22:17,484-Speed 2628.21 samples/sec   Loss 1.9228   LearningRate 0.0018   Epoch: 17   Global Step: 717520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:22:21,387-Speed 2624.67 samples/sec   Loss 1.9385   LearningRate 0.0018   Epoch: 17   Global Step: 717530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:22:25,288-Speed 2625.56 samples/sec   Loss 1.9236   LearningRate 0.0018   Epoch: 17   Global Step: 717540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:29,183-Speed 2629.83 samples/sec   Loss 1.9818   LearningRate 0.0018   Epoch: 17   Global Step: 717550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:33,079-Speed 2628.91 samples/sec   Loss 1.9329   LearningRate 0.0018   Epoch: 17   Global Step: 717560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:36,978-Speed 2627.18 samples/sec   Loss 1.9799   LearningRate 0.0018   Epoch: 17   Global Step: 717570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:40,875-Speed 2628.03 samples/sec   Loss 1.9599   LearningRate 0.0018   Epoch: 17   Global Step: 717580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:44,777-Speed 2624.85 samples/sec   Loss 1.9598   LearningRate 0.0018   Epoch: 17   Global Step: 717590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:48,671-Speed 2630.68 samples/sec   Loss 1.9447   LearningRate 0.0018   Epoch: 17   Global Step: 717600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:52,571-Speed 2626.12 samples/sec   Loss 1.9843   LearningRate 0.0018   Epoch: 17   Global Step: 717610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:22:56,494-Speed 2611.96 samples/sec   Loss 1.9157   LearningRate 0.0018   Epoch: 17   Global Step: 717620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:00,388-Speed 2629.79 samples/sec   Loss 1.9368   LearningRate 0.0018   Epoch: 17   Global Step: 717630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:04,267-Speed 2640.70 samples/sec   Loss 1.9163   LearningRate 0.0018   Epoch: 17   Global Step: 717640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:08,165-Speed 2627.06 samples/sec   Loss 1.9759   LearningRate 0.0018   Epoch: 17   Global Step: 717650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:12,074-Speed 2620.80 samples/sec   Loss 1.9115   LearningRate 0.0018   Epoch: 17   Global Step: 717660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:15,970-Speed 2629.04 samples/sec   Loss 1.9223   LearningRate 0.0018   Epoch: 17   Global Step: 717670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:19,861-Speed 2631.94 samples/sec   Loss 1.9354   LearningRate 0.0018   Epoch: 17   Global Step: 717680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:23,755-Speed 2630.13 samples/sec   Loss 1.9337   LearningRate 0.0018   Epoch: 17   Global Step: 717690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:27,649-Speed 2631.17 samples/sec   Loss 1.9750   LearningRate 0.0018   Epoch: 17   Global Step: 717700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:31,561-Speed 2617.84 samples/sec   Loss 1.8670   LearningRate 0.0018   Epoch: 17   Global Step: 717710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:35,460-Speed 2626.91 samples/sec   Loss 1.9506   LearningRate 0.0018   Epoch: 17   Global Step: 717720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:39,358-Speed 2627.82 samples/sec   Loss 2.0086   LearningRate 0.0018   Epoch: 17   Global Step: 717730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:43,233-Speed 2643.53 samples/sec   Loss 1.8984   LearningRate 0.0018   Epoch: 17   Global Step: 717740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:47,126-Speed 2630.64 samples/sec   Loss 1.9608   LearningRate 0.0018   Epoch: 17   Global Step: 717750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:51,042-Speed 2616.33 samples/sec   Loss 1.8959   LearningRate 0.0018   Epoch: 17   Global Step: 717760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:54,938-Speed 2628.94 samples/sec   Loss 1.9614   LearningRate 0.0018   Epoch: 17   Global Step: 717770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:23:58,841-Speed 2624.53 samples/sec   Loss 1.8932   LearningRate 0.0018   Epoch: 17   Global Step: 717780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:24:02,738-Speed 2627.94 samples/sec   Loss 1.9417   LearningRate 0.0018   Epoch: 17   Global Step: 717790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:24:06,625-Speed 2635.66 samples/sec   Loss 1.9252   LearningRate 0.0018   Epoch: 17   Global Step: 717800   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:24:10,520-Speed 2629.53 samples/sec   Loss 1.8992   LearningRate 0.0018   Epoch: 17   Global Step: 717810   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:24:14,412-Speed 2632.21 samples/sec   Loss 2.0439   LearningRate 0.0018   Epoch: 17   Global Step: 717820   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:24:18,309-Speed 2628.31 samples/sec   Loss 1.9742   LearningRate 0.0018   Epoch: 17   Global Step: 717830   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:24:22,215-Speed 2622.44 samples/sec   Loss 1.9384   LearningRate 0.0018   Epoch: 17   Global Step: 717840   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:24:26,115-Speed 2626.75 samples/sec   Loss 1.9465   LearningRate 0.0018   Epoch: 17   Global Step: 717850   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:24:29,992-Speed 2641.44 samples/sec   Loss 1.9145   LearningRate 0.0018   Epoch: 17   Global Step: 717860   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:24:33,899-Speed 2622.00 samples/sec   Loss 1.9456   LearningRate 0.0018   Epoch: 17   Global Step: 717870   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:24:37,801-Speed 2625.24 samples/sec   Loss 1.8914   LearningRate 0.0018   Epoch: 17   Global Step: 717880   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:24:41,698-Speed 2628.06 samples/sec   Loss 1.9214   LearningRate 0.0018   Epoch: 17   Global Step: 717890   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:24:45,598-Speed 2625.83 samples/sec   Loss 1.9689   LearningRate 0.0018   Epoch: 17   Global Step: 717900   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:24:49,489-Speed 2632.90 samples/sec   Loss 1.9730   LearningRate 0.0018   Epoch: 17   Global Step: 717910   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:24:53,400-Speed 2618.98 samples/sec   Loss 1.8762   LearningRate 0.0018   Epoch: 17   Global Step: 717920   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:24:57,299-Speed 2627.43 samples/sec   Loss 1.9385   LearningRate 0.0018   Epoch: 17   Global Step: 717930   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:25:01,194-Speed 2628.96 samples/sec   Loss 1.9253   LearningRate 0.0018   Epoch: 17   Global Step: 717940   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:25:05,094-Speed 2627.13 samples/sec   Loss 1.9154   LearningRate 0.0018   Epoch: 17   Global Step: 717950   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-16 04:25:08,989-Speed 2629.87 samples/sec   Loss 1.9385   LearningRate 0.0018   Epoch: 17   Global Step: 717960   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:12,884-Speed 2629.13 samples/sec   Loss 1.9917   LearningRate 0.0018   Epoch: 17   Global Step: 717970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:16,778-Speed 2630.38 samples/sec   Loss 1.9059   LearningRate 0.0018   Epoch: 17   Global Step: 717980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:20,676-Speed 2627.46 samples/sec   Loss 1.9719   LearningRate 0.0018   Epoch: 17   Global Step: 717990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:24,574-Speed 2628.04 samples/sec   Loss 1.9468   LearningRate 0.0018   Epoch: 17   Global Step: 718000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:28,467-Speed 2630.81 samples/sec   Loss 1.9059   LearningRate 0.0018   Epoch: 17   Global Step: 718010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:32,384-Speed 2615.47 samples/sec   Loss 1.8998   LearningRate 0.0018   Epoch: 17   Global Step: 718020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:36,286-Speed 2624.36 samples/sec   Loss 1.9483   LearningRate 0.0018   Epoch: 17   Global Step: 718030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:40,187-Speed 2625.54 samples/sec   Loss 1.9146   LearningRate 0.0018   Epoch: 17   Global Step: 718040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:44,097-Speed 2619.87 samples/sec   Loss 1.8964   LearningRate 0.0018   Epoch: 17   Global Step: 718050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:25:47,992-Speed 2629.84 samples/sec   Loss 1.8954   LearningRate 0.0018   Epoch: 17   Global Step: 718060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:25:51,888-Speed 2628.54 samples/sec   Loss 1.9196   LearningRate 0.0018   Epoch: 17   Global Step: 718070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:25:55,828-Speed 2600.47 samples/sec   Loss 1.9116   LearningRate 0.0018   Epoch: 17   Global Step: 718080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:25:59,724-Speed 2629.03 samples/sec   Loss 1.9031   LearningRate 0.0018   Epoch: 17   Global Step: 718090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:26:03,637-Speed 2617.70 samples/sec   Loss 1.9438   LearningRate 0.0018   Epoch: 17   Global Step: 718100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:26:07,542-Speed 2623.09 samples/sec   Loss 1.9351   LearningRate 0.0018   Epoch: 17   Global Step: 718110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:26:11,419-Speed 2641.39 samples/sec   Loss 1.9572   LearningRate 0.0018   Epoch: 17   Global Step: 718120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:15,318-Speed 2627.18 samples/sec   Loss 1.9283   LearningRate 0.0018   Epoch: 17   Global Step: 718130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:19,211-Speed 2631.29 samples/sec   Loss 1.9219   LearningRate 0.0018   Epoch: 17   Global Step: 718140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:23,115-Speed 2622.93 samples/sec   Loss 1.9056   LearningRate 0.0018   Epoch: 17   Global Step: 718150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:27,020-Speed 2623.31 samples/sec   Loss 1.9350   LearningRate 0.0018   Epoch: 17   Global Step: 718160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:30,923-Speed 2624.05 samples/sec   Loss 1.9387   LearningRate 0.0018   Epoch: 17   Global Step: 718170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:34,823-Speed 2626.69 samples/sec   Loss 1.9375   LearningRate 0.0018   Epoch: 17   Global Step: 718180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:38,751-Speed 2607.64 samples/sec   Loss 1.8979   LearningRate 0.0018   Epoch: 17   Global Step: 718190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:42,649-Speed 2627.67 samples/sec   Loss 1.9315   LearningRate 0.0018   Epoch: 17   Global Step: 718200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:46,542-Speed 2631.33 samples/sec   Loss 1.8778   LearningRate 0.0018   Epoch: 17   Global Step: 718210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-16 04:26:50,443-Speed 2625.32 samples/sec   Loss 1.9397   LearningRate 0.0018   Epoch: 17   Global Step: 718220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:26:54,352-Speed 2621.27 samples/sec   Loss 1.9679   LearningRate 0.0018   Epoch: 17   Global Step: 718230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:26:58,257-Speed 2622.54 samples/sec   Loss 1.9555   LearningRate 0.0018   Epoch: 17   Global Step: 718240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:02,154-Speed 2627.96 samples/sec   Loss 1.9611   LearningRate 0.0018   Epoch: 17   Global Step: 718250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:06,068-Speed 2617.18 samples/sec   Loss 1.9321   LearningRate 0.0018   Epoch: 17   Global Step: 718260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:09,964-Speed 2629.17 samples/sec   Loss 1.9293   LearningRate 0.0018   Epoch: 17   Global Step: 718270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:13,870-Speed 2622.89 samples/sec   Loss 1.8803   LearningRate 0.0018   Epoch: 17   Global Step: 718280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:17,762-Speed 2631.50 samples/sec   Loss 1.9184   LearningRate 0.0018   Epoch: 17   Global Step: 718290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:21,652-Speed 2632.85 samples/sec   Loss 1.9441   LearningRate 0.0018   Epoch: 17   Global Step: 718300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:25,547-Speed 2629.71 samples/sec   Loss 1.9437   LearningRate 0.0018   Epoch: 17   Global Step: 718310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:29,441-Speed 2630.57 samples/sec   Loss 1.9624   LearningRate 0.0018   Epoch: 17   Global Step: 718320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-16 04:27:33,314-Speed 2644.58 samples/sec   Loss 1.9868   LearningRate 0.0018   Epoch: 17   Global Step: 718330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:37,218-Speed 2623.51 samples/sec   Loss 1.9023   LearningRate 0.0018   Epoch: 17   Global Step: 718340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:41,116-Speed 2627.55 samples/sec   Loss 1.8667   LearningRate 0.0018   Epoch: 17   Global Step: 718350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-16 04:27:45,011-Speed 2629.73 samples/sec   Loss 1.8885   LearningRate 0.0018   Epoch: 17   Global Step: 718360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:27:48,905-Speed 2631.35 samples/sec   Loss 1.9195   LearningRate 0.0018   Epoch: 17   Global Step: 718370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:27:52,804-Speed 2626.71 samples/sec   Loss 1.9298   LearningRate 0.0018   Epoch: 17   Global Step: 718380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:27:56,698-Speed 2630.91 samples/sec   Loss 1.9162   LearningRate 0.0018   Epoch: 17   Global Step: 718390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:28:00,571-Speed 2644.08 samples/sec   Loss 1.9053   LearningRate 0.0018   Epoch: 17   Global Step: 718400   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:04,482-Speed 2619.19 samples/sec   Loss 1.9097   LearningRate 0.0018   Epoch: 17   Global Step: 718410   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:08,378-Speed 2628.93 samples/sec   Loss 1.9874   LearningRate 0.0018   Epoch: 17   Global Step: 718420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:12,272-Speed 2630.14 samples/sec   Loss 1.9799   LearningRate 0.0018   Epoch: 17   Global Step: 718430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:16,166-Speed 2630.58 samples/sec   Loss 1.9465   LearningRate 0.0018   Epoch: 17   Global Step: 718440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:20,081-Speed 2616.29 samples/sec   Loss 1.9659   LearningRate 0.0018   Epoch: 17   Global Step: 718450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:23,997-Speed 2616.18 samples/sec   Loss 1.8949   LearningRate 0.0018   Epoch: 17   Global Step: 718460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:27,890-Speed 2631.26 samples/sec   Loss 1.9111   LearningRate 0.0018   Epoch: 17   Global Step: 718470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:31,796-Speed 2621.96 samples/sec   Loss 1.9326   LearningRate 0.0018   Epoch: 17   Global Step: 718480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:35,697-Speed 2625.42 samples/sec   Loss 1.9415   LearningRate 0.0018   Epoch: 17   Global Step: 718490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:39,592-Speed 2629.01 samples/sec   Loss 1.9406   LearningRate 0.0018   Epoch: 17   Global Step: 718500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:28:43,493-Speed 2626.86 samples/sec   Loss 1.9015   LearningRate 0.0018   Epoch: 17   Global Step: 718510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:28:47,386-Speed 2630.67 samples/sec   Loss 1.9523   LearningRate 0.0018   Epoch: 17   Global Step: 718520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:28:51,274-Speed 2634.56 samples/sec   Loss 2.0057   LearningRate 0.0018   Epoch: 17   Global Step: 718530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:55,165-Speed 2632.45 samples/sec   Loss 1.9077   LearningRate 0.0018   Epoch: 17   Global Step: 718540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:28:59,059-Speed 2630.59 samples/sec   Loss 1.8639   LearningRate 0.0018   Epoch: 17   Global Step: 718550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:02,954-Speed 2629.43 samples/sec   Loss 1.8858   LearningRate 0.0018   Epoch: 17   Global Step: 718560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:06,863-Speed 2620.47 samples/sec   Loss 1.9303   LearningRate 0.0018   Epoch: 17   Global Step: 718570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:10,764-Speed 2625.78 samples/sec   Loss 1.9752   LearningRate 0.0018   Epoch: 17   Global Step: 718580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:14,657-Speed 2631.35 samples/sec   Loss 1.9258   LearningRate 0.0018   Epoch: 17   Global Step: 718590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:18,551-Speed 2629.71 samples/sec   Loss 1.9498   LearningRate 0.0018   Epoch: 17   Global Step: 718600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:22,453-Speed 2625.23 samples/sec   Loss 1.9443   LearningRate 0.0018   Epoch: 17   Global Step: 718610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:26,350-Speed 2628.62 samples/sec   Loss 1.9665   LearningRate 0.0018   Epoch: 17   Global Step: 718620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:30,258-Speed 2621.58 samples/sec   Loss 1.8904   LearningRate 0.0018   Epoch: 17   Global Step: 718630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:29:34,160-Speed 2624.70 samples/sec   Loss 1.9645   LearningRate 0.0018   Epoch: 17   Global Step: 718640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:29:38,053-Speed 2630.85 samples/sec   Loss 1.9079   LearningRate 0.0018   Epoch: 17   Global Step: 718650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:29:41,948-Speed 2629.34 samples/sec   Loss 1.8831   LearningRate 0.0018   Epoch: 17   Global Step: 718660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:29:45,822-Speed 2644.52 samples/sec   Loss 1.9603   LearningRate 0.0018   Epoch: 17   Global Step: 718670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:49,721-Speed 2627.29 samples/sec   Loss 1.9581   LearningRate 0.0018   Epoch: 17   Global Step: 718680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:53,621-Speed 2625.42 samples/sec   Loss 1.9428   LearningRate 0.0018   Epoch: 17   Global Step: 718690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:29:57,520-Speed 2627.68 samples/sec   Loss 1.9044   LearningRate 0.0018   Epoch: 17   Global Step: 718700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:30:01,442-Speed 2611.05 samples/sec   Loss 1.9002   LearningRate 0.0018   Epoch: 17   Global Step: 718710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:30:05,352-Speed 2619.80 samples/sec   Loss 1.8982   LearningRate 0.0018   Epoch: 17   Global Step: 718720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:30:09,245-Speed 2630.45 samples/sec   Loss 1.9124   LearningRate 0.0018   Epoch: 17   Global Step: 718730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:30:13,146-Speed 2626.01 samples/sec   Loss 1.9121   LearningRate 0.0018   Epoch: 17   Global Step: 718740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:30:17,038-Speed 2631.32 samples/sec   Loss 1.8690   LearningRate 0.0018   Epoch: 17   Global Step: 718750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:30:20,941-Speed 2624.13 samples/sec   Loss 1.8856   LearningRate 0.0018   Epoch: 17   Global Step: 718760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:30:24,840-Speed 2627.95 samples/sec   Loss 1.9466   LearningRate 0.0018   Epoch: 17   Global Step: 718770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:28,734-Speed 2629.64 samples/sec   Loss 1.9720   LearningRate 0.0018   Epoch: 17   Global Step: 718780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:32,630-Speed 2629.22 samples/sec   Loss 1.9079   LearningRate 0.0018   Epoch: 17   Global Step: 718790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:36,524-Speed 2630.37 samples/sec   Loss 1.9090   LearningRate 0.0018   Epoch: 17   Global Step: 718800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:40,484-Speed 2586.14 samples/sec   Loss 1.9321   LearningRate 0.0018   Epoch: 17   Global Step: 718810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:44,379-Speed 2629.45 samples/sec   Loss 1.9374   LearningRate 0.0018   Epoch: 17   Global Step: 718820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:48,296-Speed 2615.51 samples/sec   Loss 1.9949   LearningRate 0.0018   Epoch: 17   Global Step: 718830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:52,191-Speed 2628.95 samples/sec   Loss 1.9078   LearningRate 0.0018   Epoch: 17   Global Step: 718840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:56,084-Speed 2631.45 samples/sec   Loss 1.9659   LearningRate 0.0018   Epoch: 17   Global Step: 718850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:30:59,985-Speed 2625.42 samples/sec   Loss 1.8654   LearningRate 0.0018   Epoch: 17   Global Step: 718860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:03,857-Speed 2645.80 samples/sec   Loss 1.9322   LearningRate 0.0018   Epoch: 17   Global Step: 718870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:07,766-Speed 2620.00 samples/sec   Loss 1.8208   LearningRate 0.0018   Epoch: 17   Global Step: 718880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:11,664-Speed 2627.39 samples/sec   Loss 1.9821   LearningRate 0.0018   Epoch: 17   Global Step: 718890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:15,560-Speed 2628.69 samples/sec   Loss 1.9434   LearningRate 0.0018   Epoch: 17   Global Step: 718900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:19,455-Speed 2630.04 samples/sec   Loss 1.9380   LearningRate 0.0018   Epoch: 17   Global Step: 718910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:23,349-Speed 2630.35 samples/sec   Loss 1.9389   LearningRate 0.0018   Epoch: 17   Global Step: 718920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:27,244-Speed 2629.53 samples/sec   Loss 1.9047   LearningRate 0.0018   Epoch: 17   Global Step: 718930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:31,201-Speed 2588.50 samples/sec   Loss 1.9205   LearningRate 0.0018   Epoch: 17   Global Step: 718940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:35,100-Speed 2627.03 samples/sec   Loss 1.9129   LearningRate 0.0018   Epoch: 17   Global Step: 718950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:38,995-Speed 2629.95 samples/sec   Loss 1.8796   LearningRate 0.0018   Epoch: 17   Global Step: 718960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:42,889-Speed 2629.73 samples/sec   Loss 1.9508   LearningRate 0.0018   Epoch: 17   Global Step: 718970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:31:46,782-Speed 2631.07 samples/sec   Loss 1.8711   LearningRate 0.0018   Epoch: 17   Global Step: 718980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:31:50,663-Speed 2639.14 samples/sec   Loss 1.9127   LearningRate 0.0018   Epoch: 17   Global Step: 718990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:54,565-Speed 2624.81 samples/sec   Loss 1.9114   LearningRate 0.0018   Epoch: 17   Global Step: 719000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:31:58,486-Speed 2612.12 samples/sec   Loss 1.8349   LearningRate 0.0018   Epoch: 17   Global Step: 719010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:32:02,379-Speed 2631.43 samples/sec   Loss 1.9184   LearningRate 0.0018   Epoch: 17   Global Step: 719020   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:32:06,278-Speed 2626.44 samples/sec   Loss 1.9017   LearningRate 0.0018   Epoch: 17   Global Step: 719030   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:10,176-Speed 2628.07 samples/sec   Loss 1.9646   LearningRate 0.0018   Epoch: 17   Global Step: 719040   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:14,077-Speed 2625.75 samples/sec   Loss 1.9662   LearningRate 0.0018   Epoch: 17   Global Step: 719050   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:17,976-Speed 2626.78 samples/sec   Loss 1.8925   LearningRate 0.0018   Epoch: 17   Global Step: 719060   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:21,865-Speed 2633.75 samples/sec   Loss 1.9326   LearningRate 0.0018   Epoch: 17   Global Step: 719070   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:25,755-Speed 2632.64 samples/sec   Loss 1.8290   LearningRate 0.0018   Epoch: 17   Global Step: 719080   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:29,648-Speed 2631.39 samples/sec   Loss 1.8986   LearningRate 0.0018   Epoch: 17   Global Step: 719090   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:33,542-Speed 2629.81 samples/sec   Loss 1.9211   LearningRate 0.0018   Epoch: 17   Global Step: 719100   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:37,431-Speed 2633.87 samples/sec   Loss 1.9268   LearningRate 0.0018   Epoch: 17   Global Step: 719110   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:41,320-Speed 2633.16 samples/sec   Loss 1.9182   LearningRate 0.0018   Epoch: 17   Global Step: 719120   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:32:45,221-Speed 2625.96 samples/sec   Loss 1.9367   LearningRate 0.0018   Epoch: 17   Global Step: 719130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:32:49,116-Speed 2629.71 samples/sec   Loss 1.9226   LearningRate 0.0018   Epoch: 17   Global Step: 719140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:32:53,009-Speed 2631.92 samples/sec   Loss 1.9016   LearningRate 0.0018   Epoch: 17   Global Step: 719150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:32:56,898-Speed 2633.08 samples/sec   Loss 1.8860   LearningRate 0.0018   Epoch: 17   Global Step: 719160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:33:00,884-Speed 2569.69 samples/sec   Loss 1.8912   LearningRate 0.0018   Epoch: 17   Global Step: 719170   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:33:04,781-Speed 2627.91 samples/sec   Loss 1.9543   LearningRate 0.0018   Epoch: 17   Global Step: 719180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:33:08,693-Speed 2618.34 samples/sec   Loss 1.9483   LearningRate 0.0018   Epoch: 17   Global Step: 719190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:33:12,595-Speed 2624.74 samples/sec   Loss 1.9185   LearningRate 0.0018   Epoch: 17   Global Step: 719200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:33:16,500-Speed 2622.80 samples/sec   Loss 1.9434   LearningRate 0.0018   Epoch: 17   Global Step: 719210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:33:20,403-Speed 2624.20 samples/sec   Loss 1.9018   LearningRate 0.0018   Epoch: 17   Global Step: 719220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:33:24,305-Speed 2625.18 samples/sec   Loss 1.9064   LearningRate 0.0018   Epoch: 17   Global Step: 719230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:28,204-Speed 2626.84 samples/sec   Loss 1.9468   LearningRate 0.0018   Epoch: 17   Global Step: 719240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:32,115-Speed 2619.01 samples/sec   Loss 1.9046   LearningRate 0.0018   Epoch: 17   Global Step: 719250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:36,032-Speed 2615.11 samples/sec   Loss 1.9135   LearningRate 0.0018   Epoch: 17   Global Step: 719260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:39,935-Speed 2623.71 samples/sec   Loss 1.9547   LearningRate 0.0018   Epoch: 17   Global Step: 719270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:43,840-Speed 2622.90 samples/sec   Loss 1.9173   LearningRate 0.0018   Epoch: 17   Global Step: 719280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:47,750-Speed 2619.47 samples/sec   Loss 1.9193   LearningRate 0.0018   Epoch: 17   Global Step: 719290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:51,652-Speed 2624.99 samples/sec   Loss 1.9494   LearningRate 0.0018   Epoch: 17   Global Step: 719300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:55,555-Speed 2624.05 samples/sec   Loss 1.9651   LearningRate 0.0018   Epoch: 17   Global Step: 719310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:33:59,466-Speed 2619.46 samples/sec   Loss 1.9490   LearningRate 0.0018   Epoch: 17   Global Step: 719320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:34:03,352-Speed 2635.89 samples/sec   Loss 1.8987   LearningRate 0.0018   Epoch: 17   Global Step: 719330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:34:07,265-Speed 2617.46 samples/sec   Loss 1.9509   LearningRate 0.0018   Epoch: 17   Global Step: 719340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:34:11,164-Speed 2626.51 samples/sec   Loss 1.8589   LearningRate 0.0018   Epoch: 17   Global Step: 719350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:34:15,064-Speed 2626.34 samples/sec   Loss 1.8637   LearningRate 0.0018   Epoch: 17   Global Step: 719360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:34:18,969-Speed 2623.02 samples/sec   Loss 1.9004   LearningRate 0.0018   Epoch: 17   Global Step: 719370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:34:22,848-Speed 2641.04 samples/sec   Loss 1.8377   LearningRate 0.0018   Epoch: 17   Global Step: 719380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:26,751-Speed 2623.83 samples/sec   Loss 1.9344   LearningRate 0.0018   Epoch: 17   Global Step: 719390   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:30,647-Speed 2629.23 samples/sec   Loss 1.9040   LearningRate 0.0018   Epoch: 17   Global Step: 719400   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:34,540-Speed 2631.11 samples/sec   Loss 1.9088   LearningRate 0.0018   Epoch: 17   Global Step: 719410   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:38,443-Speed 2624.36 samples/sec   Loss 1.8934   LearningRate 0.0018   Epoch: 17   Global Step: 719420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:42,334-Speed 2631.94 samples/sec   Loss 1.9837   LearningRate 0.0018   Epoch: 17   Global Step: 719430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:46,229-Speed 2629.82 samples/sec   Loss 1.9731   LearningRate 0.0018   Epoch: 17   Global Step: 719440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:50,124-Speed 2630.09 samples/sec   Loss 1.9249   LearningRate 0.0018   Epoch: 17   Global Step: 719450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:54,025-Speed 2625.00 samples/sec   Loss 2.0269   LearningRate 0.0018   Epoch: 17   Global Step: 719460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:34:57,929-Speed 2624.24 samples/sec   Loss 1.9037   LearningRate 0.0018   Epoch: 17   Global Step: 719470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:01,823-Speed 2629.74 samples/sec   Loss 1.9204   LearningRate 0.0018   Epoch: 17   Global Step: 719480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:35:05,718-Speed 2629.88 samples/sec   Loss 1.9456   LearningRate 0.0018   Epoch: 17   Global Step: 719490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:35:09,591-Speed 2644.00 samples/sec   Loss 1.9858   LearningRate 0.0018   Epoch: 17   Global Step: 719500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:13,486-Speed 2630.19 samples/sec   Loss 1.9587   LearningRate 0.0018   Epoch: 17   Global Step: 719510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:17,380-Speed 2630.60 samples/sec   Loss 1.9424   LearningRate 0.0018   Epoch: 17   Global Step: 719520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:21,278-Speed 2627.35 samples/sec   Loss 1.9166   LearningRate 0.0018   Epoch: 17   Global Step: 719530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:25,178-Speed 2626.15 samples/sec   Loss 1.8656   LearningRate 0.0018   Epoch: 17   Global Step: 719540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:29,083-Speed 2623.35 samples/sec   Loss 1.8694   LearningRate 0.0018   Epoch: 17   Global Step: 719550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:32,976-Speed 2630.93 samples/sec   Loss 1.9235   LearningRate 0.0018   Epoch: 17   Global Step: 719560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:36,883-Speed 2620.96 samples/sec   Loss 1.9261   LearningRate 0.0018   Epoch: 17   Global Step: 719570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:40,777-Speed 2630.38 samples/sec   Loss 1.9386   LearningRate 0.0018   Epoch: 17   Global Step: 719580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:44,670-Speed 2631.86 samples/sec   Loss 1.8611   LearningRate 0.0018   Epoch: 17   Global Step: 719590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:48,570-Speed 2626.20 samples/sec   Loss 1.9499   LearningRate 0.0018   Epoch: 17   Global Step: 719600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:35:52,444-Speed 2643.70 samples/sec   Loss 1.9072   LearningRate 0.0018   Epoch: 17   Global Step: 719610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:35:56,342-Speed 2628.38 samples/sec   Loss 1.8692   LearningRate 0.0018   Epoch: 17   Global Step: 719620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:00,232-Speed 2632.62 samples/sec   Loss 1.9910   LearningRate 0.0018   Epoch: 17   Global Step: 719630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:04,146-Speed 2616.81 samples/sec   Loss 1.9128   LearningRate 0.0018   Epoch: 17   Global Step: 719640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:08,042-Speed 2629.01 samples/sec   Loss 1.8879   LearningRate 0.0018   Epoch: 17   Global Step: 719650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:11,939-Speed 2628.62 samples/sec   Loss 1.9518   LearningRate 0.0018   Epoch: 17   Global Step: 719660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:15,830-Speed 2632.19 samples/sec   Loss 1.9474   LearningRate 0.0018   Epoch: 17   Global Step: 719670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:19,724-Speed 2630.53 samples/sec   Loss 1.9047   LearningRate 0.0018   Epoch: 17   Global Step: 719680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:23,620-Speed 2629.56 samples/sec   Loss 1.9035   LearningRate 0.0018   Epoch: 17   Global Step: 719690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:27,516-Speed 2628.88 samples/sec   Loss 1.8786   LearningRate 0.0018   Epoch: 17   Global Step: 719700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:36:31,413-Speed 2628.46 samples/sec   Loss 1.9053   LearningRate 0.0018   Epoch: 17   Global Step: 719710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:36:35,361-Speed 2594.74 samples/sec   Loss 1.9205   LearningRate 0.0018   Epoch: 17   Global Step: 719720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:36:39,267-Speed 2622.35 samples/sec   Loss 1.9561   LearningRate 0.0018   Epoch: 17   Global Step: 719730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:36:43,158-Speed 2631.73 samples/sec   Loss 1.9152   LearningRate 0.0018   Epoch: 17   Global Step: 719740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:36:47,057-Speed 2627.00 samples/sec   Loss 1.9213   LearningRate 0.0018   Epoch: 17   Global Step: 719750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:36:50,955-Speed 2627.92 samples/sec   Loss 1.8871   LearningRate 0.0018   Epoch: 17   Global Step: 719760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:36:54,852-Speed 2628.48 samples/sec   Loss 1.8804   LearningRate 0.0018   Epoch: 17   Global Step: 719770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:36:58,755-Speed 2624.16 samples/sec   Loss 1.9462   LearningRate 0.0018   Epoch: 17   Global Step: 719780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:02,650-Speed 2630.58 samples/sec   Loss 1.9547   LearningRate 0.0018   Epoch: 17   Global Step: 719790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:06,543-Speed 2630.84 samples/sec   Loss 1.9448   LearningRate 0.0018   Epoch: 17   Global Step: 719800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:10,424-Speed 2638.77 samples/sec   Loss 1.8380   LearningRate 0.0018   Epoch: 17   Global Step: 719810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:14,321-Speed 2628.40 samples/sec   Loss 1.8773   LearningRate 0.0018   Epoch: 17   Global Step: 719820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:18,227-Speed 2623.06 samples/sec   Loss 1.8905   LearningRate 0.0018   Epoch: 17   Global Step: 719830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:22,118-Speed 2632.16 samples/sec   Loss 1.9011   LearningRate 0.0017   Epoch: 17   Global Step: 719840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:26,037-Speed 2614.45 samples/sec   Loss 1.9186   LearningRate 0.0017   Epoch: 17   Global Step: 719850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:29,941-Speed 2623.25 samples/sec   Loss 1.9247   LearningRate 0.0017   Epoch: 17   Global Step: 719860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:33,932-Speed 2566.82 samples/sec   Loss 1.9114   LearningRate 0.0017   Epoch: 17   Global Step: 719870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:37,831-Speed 2626.65 samples/sec   Loss 1.8187   LearningRate 0.0017   Epoch: 17   Global Step: 719880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:41,732-Speed 2625.47 samples/sec   Loss 1.9661   LearningRate 0.0017   Epoch: 17   Global Step: 719890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:45,638-Speed 2622.34 samples/sec   Loss 1.9730   LearningRate 0.0017   Epoch: 17   Global Step: 719900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:49,550-Speed 2618.72 samples/sec   Loss 1.9170   LearningRate 0.0017   Epoch: 17   Global Step: 719910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:37:53,433-Speed 2637.52 samples/sec   Loss 1.8993   LearningRate 0.0017   Epoch: 17   Global Step: 719920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:37:57,336-Speed 2624.36 samples/sec   Loss 1.8568   LearningRate 0.0017   Epoch: 17   Global Step: 719930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:38:01,240-Speed 2623.43 samples/sec   Loss 1.8768   LearningRate 0.0017   Epoch: 17   Global Step: 719940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:38:05,141-Speed 2625.89 samples/sec   Loss 1.9417   LearningRate 0.0017   Epoch: 17   Global Step: 719950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:38:09,041-Speed 2626.48 samples/sec   Loss 1.9178   LearningRate 0.0017   Epoch: 17   Global Step: 719960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:38:12,937-Speed 2628.30 samples/sec   Loss 1.9162   LearningRate 0.0017   Epoch: 17   Global Step: 719970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:38:16,838-Speed 2625.52 samples/sec   Loss 1.9261   LearningRate 0.0017   Epoch: 17   Global Step: 719980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:38:20,739-Speed 2625.85 samples/sec   Loss 1.9277   LearningRate 0.0017   Epoch: 17   Global Step: 719990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:38:24,651-Speed 2618.72 samples/sec   Loss 1.9260   LearningRate 0.0017   Epoch: 17   Global Step: 720000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:39:07,457-[lfw][720000]XNorm: 21.991863
Training: 2022-04-16 04:39:07,458-[lfw][720000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 04:39:07,458-[lfw][720000]Accuracy-Highest: 0.99850
Training: 2022-04-16 04:39:57,376-[cfp_fp][720000]XNorm: 22.170312
Training: 2022-04-16 04:39:57,377-[cfp_fp][720000]Accuracy-Flip: 0.99329+-0.00404
Training: 2022-04-16 04:39:57,378-[cfp_fp][720000]Accuracy-Highest: 0.99329
Training: 2022-04-16 04:40:40,689-[agedb_30][720000]XNorm: 22.908823
Training: 2022-04-16 04:40:40,690-[agedb_30][720000]Accuracy-Flip: 0.98333+-0.00671
Training: 2022-04-16 04:40:40,691-[agedb_30][720000]Accuracy-Highest: 0.98333
Training: 2022-04-16 04:40:44,557-Speed 73.19 samples/sec   Loss 1.9146   LearningRate 0.0017   Epoch: 17   Global Step: 720010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:40:48,394-Speed 2669.14 samples/sec   Loss 1.8629   LearningRate 0.0017   Epoch: 17   Global Step: 720020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:40:52,257-Speed 2651.59 samples/sec   Loss 1.9137   LearningRate 0.0017   Epoch: 17   Global Step: 720030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:40:56,128-Speed 2646.12 samples/sec   Loss 1.8411   LearningRate 0.0017   Epoch: 17   Global Step: 720040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:41:00,007-Speed 2640.20 samples/sec   Loss 1.9192   LearningRate 0.0017   Epoch: 17   Global Step: 720050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:41:03,878-Speed 2646.02 samples/sec   Loss 1.9190   LearningRate 0.0017   Epoch: 17   Global Step: 720060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:41:07,730-Speed 2658.77 samples/sec   Loss 1.9612   LearningRate 0.0017   Epoch: 17   Global Step: 720070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:11,612-Speed 2638.56 samples/sec   Loss 1.8738   LearningRate 0.0017   Epoch: 17   Global Step: 720080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:15,495-Speed 2638.51 samples/sec   Loss 1.8954   LearningRate 0.0017   Epoch: 17   Global Step: 720090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:19,374-Speed 2640.41 samples/sec   Loss 1.8741   LearningRate 0.0017   Epoch: 17   Global Step: 720100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:23,269-Speed 2629.87 samples/sec   Loss 1.9162   LearningRate 0.0017   Epoch: 17   Global Step: 720110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:27,160-Speed 2632.26 samples/sec   Loss 1.9395   LearningRate 0.0017   Epoch: 17   Global Step: 720120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:31,049-Speed 2634.64 samples/sec   Loss 1.9007   LearningRate 0.0017   Epoch: 17   Global Step: 720130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:34,936-Speed 2635.04 samples/sec   Loss 1.8797   LearningRate 0.0017   Epoch: 17   Global Step: 720140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:38,842-Speed 2621.95 samples/sec   Loss 1.8886   LearningRate 0.0017   Epoch: 17   Global Step: 720150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:42,744-Speed 2625.31 samples/sec   Loss 1.9585   LearningRate 0.0017   Epoch: 17   Global Step: 720160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:46,644-Speed 2626.65 samples/sec   Loss 1.9071   LearningRate 0.0017   Epoch: 17   Global Step: 720170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:41:50,544-Speed 2628.13 samples/sec   Loss 1.9551   LearningRate 0.0017   Epoch: 17   Global Step: 720180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:41:54,427-Speed 2637.65 samples/sec   Loss 1.9213   LearningRate 0.0017   Epoch: 17   Global Step: 720190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:41:58,339-Speed 2618.29 samples/sec   Loss 1.8796   LearningRate 0.0017   Epoch: 17   Global Step: 720200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:02,233-Speed 2630.43 samples/sec   Loss 1.9407   LearningRate 0.0017   Epoch: 17   Global Step: 720210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:06,307-Speed 2514.23 samples/sec   Loss 1.9423   LearningRate 0.0017   Epoch: 17   Global Step: 720220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:10,209-Speed 2624.54 samples/sec   Loss 1.9069   LearningRate 0.0017   Epoch: 17   Global Step: 720230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:14,114-Speed 2623.70 samples/sec   Loss 1.9324   LearningRate 0.0017   Epoch: 17   Global Step: 720240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:18,009-Speed 2630.10 samples/sec   Loss 1.9694   LearningRate 0.0017   Epoch: 17   Global Step: 720250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:21,905-Speed 2629.20 samples/sec   Loss 1.8730   LearningRate 0.0017   Epoch: 17   Global Step: 720260   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:25,797-Speed 2631.15 samples/sec   Loss 1.8671   LearningRate 0.0017   Epoch: 17   Global Step: 720270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:29,708-Speed 2619.18 samples/sec   Loss 1.9150   LearningRate 0.0017   Epoch: 17   Global Step: 720280   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:42:33,600-Speed 2632.21 samples/sec   Loss 1.8484   LearningRate 0.0017   Epoch: 17   Global Step: 720290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:42:37,497-Speed 2628.58 samples/sec   Loss 1.8347   LearningRate 0.0017   Epoch: 17   Global Step: 720300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:42:41,399-Speed 2624.45 samples/sec   Loss 1.8741   LearningRate 0.0017   Epoch: 17   Global Step: 720310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:42:45,318-Speed 2614.15 samples/sec   Loss 1.9645   LearningRate 0.0017   Epoch: 17   Global Step: 720320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:42:49,271-Speed 2591.29 samples/sec   Loss 1.9143   LearningRate 0.0017   Epoch: 17   Global Step: 720330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:42:53,342-Speed 2516.18 samples/sec   Loss 1.9197   LearningRate 0.0017   Epoch: 17   Global Step: 720340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:42:57,235-Speed 2631.03 samples/sec   Loss 1.8808   LearningRate 0.0017   Epoch: 17   Global Step: 720350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:43:01,333-Speed 2498.95 samples/sec   Loss 1.8901   LearningRate 0.0017   Epoch: 17   Global Step: 720360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:43:05,222-Speed 2634.21 samples/sec   Loss 1.8302   LearningRate 0.0017   Epoch: 17   Global Step: 720370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:43:09,091-Speed 2647.88 samples/sec   Loss 1.9090   LearningRate 0.0017   Epoch: 17   Global Step: 720380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:12,983-Speed 2631.71 samples/sec   Loss 1.9007   LearningRate 0.0017   Epoch: 17   Global Step: 720390   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:16,880-Speed 2627.87 samples/sec   Loss 1.9065   LearningRate 0.0017   Epoch: 17   Global Step: 720400   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:20,838-Speed 2588.00 samples/sec   Loss 1.9175   LearningRate 0.0017   Epoch: 17   Global Step: 720410   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:24,929-Speed 2503.41 samples/sec   Loss 1.8947   LearningRate 0.0017   Epoch: 17   Global Step: 720420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:28,861-Speed 2605.02 samples/sec   Loss 1.8998   LearningRate 0.0017   Epoch: 17   Global Step: 720430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:32,752-Speed 2632.69 samples/sec   Loss 1.8777   LearningRate 0.0017   Epoch: 17   Global Step: 720440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:36,642-Speed 2632.98 samples/sec   Loss 1.8970   LearningRate 0.0017   Epoch: 17   Global Step: 720450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:40,536-Speed 2630.24 samples/sec   Loss 1.9643   LearningRate 0.0017   Epoch: 17   Global Step: 720460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:44,431-Speed 2629.84 samples/sec   Loss 1.9361   LearningRate 0.0017   Epoch: 17   Global Step: 720470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:43:48,351-Speed 2612.69 samples/sec   Loss 1.8856   LearningRate 0.0017   Epoch: 17   Global Step: 720480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:43:52,246-Speed 2630.68 samples/sec   Loss 1.9011   LearningRate 0.0017   Epoch: 17   Global Step: 720490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:43:56,140-Speed 2630.27 samples/sec   Loss 1.8786   LearningRate 0.0017   Epoch: 17   Global Step: 720500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:00,035-Speed 2629.71 samples/sec   Loss 1.9379   LearningRate 0.0017   Epoch: 17   Global Step: 720510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:03,928-Speed 2630.70 samples/sec   Loss 1.8741   LearningRate 0.0017   Epoch: 17   Global Step: 720520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:07,830-Speed 2625.01 samples/sec   Loss 1.9220   LearningRate 0.0017   Epoch: 17   Global Step: 720530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:11,732-Speed 2624.67 samples/sec   Loss 1.9598   LearningRate 0.0017   Epoch: 17   Global Step: 720540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:15,661-Speed 2607.18 samples/sec   Loss 1.9504   LearningRate 0.0017   Epoch: 17   Global Step: 720550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:19,564-Speed 2624.43 samples/sec   Loss 1.9904   LearningRate 0.0017   Epoch: 17   Global Step: 720560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:23,464-Speed 2626.79 samples/sec   Loss 1.9219   LearningRate 0.0017   Epoch: 17   Global Step: 720570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:27,357-Speed 2631.04 samples/sec   Loss 1.9167   LearningRate 0.0017   Epoch: 17   Global Step: 720580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:44:31,224-Speed 2648.15 samples/sec   Loss 1.9051   LearningRate 0.0017   Epoch: 17   Global Step: 720590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:35,117-Speed 2630.57 samples/sec   Loss 1.9205   LearningRate 0.0017   Epoch: 17   Global Step: 720600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:39,012-Speed 2630.86 samples/sec   Loss 1.9065   LearningRate 0.0017   Epoch: 17   Global Step: 720610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:42,902-Speed 2632.91 samples/sec   Loss 1.8635   LearningRate 0.0017   Epoch: 17   Global Step: 720620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:46,791-Speed 2633.42 samples/sec   Loss 1.8885   LearningRate 0.0017   Epoch: 17   Global Step: 720630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:50,683-Speed 2632.38 samples/sec   Loss 1.8856   LearningRate 0.0017   Epoch: 17   Global Step: 720640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:54,591-Speed 2620.53 samples/sec   Loss 1.9608   LearningRate 0.0017   Epoch: 17   Global Step: 720650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:44:58,495-Speed 2624.09 samples/sec   Loss 1.9093   LearningRate 0.0017   Epoch: 17   Global Step: 720660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:45:02,396-Speed 2624.83 samples/sec   Loss 1.8809   LearningRate 0.0017   Epoch: 17   Global Step: 720670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:45:06,291-Speed 2630.39 samples/sec   Loss 1.8594   LearningRate 0.0017   Epoch: 17   Global Step: 720680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:45:10,185-Speed 2630.62 samples/sec   Loss 1.8790   LearningRate 0.0017   Epoch: 17   Global Step: 720690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:45:14,109-Speed 2610.58 samples/sec   Loss 1.9048   LearningRate 0.0017   Epoch: 17   Global Step: 720700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:45:18,015-Speed 2622.91 samples/sec   Loss 1.9332   LearningRate 0.0017   Epoch: 17   Global Step: 720710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:45:21,886-Speed 2645.73 samples/sec   Loss 1.9123   LearningRate 0.0017   Epoch: 17   Global Step: 720720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:25,784-Speed 2628.02 samples/sec   Loss 1.9396   LearningRate 0.0017   Epoch: 17   Global Step: 720730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:29,685-Speed 2625.27 samples/sec   Loss 1.9001   LearningRate 0.0017   Epoch: 17   Global Step: 720740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:33,587-Speed 2625.15 samples/sec   Loss 1.8707   LearningRate 0.0017   Epoch: 17   Global Step: 720750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:37,487-Speed 2625.55 samples/sec   Loss 1.9071   LearningRate 0.0017   Epoch: 17   Global Step: 720760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:41,395-Speed 2621.47 samples/sec   Loss 1.9149   LearningRate 0.0017   Epoch: 17   Global Step: 720770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:45,513-Speed 2487.30 samples/sec   Loss 1.8739   LearningRate 0.0017   Epoch: 17   Global Step: 720780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:49,640-Speed 2482.45 samples/sec   Loss 1.8795   LearningRate 0.0017   Epoch: 17   Global Step: 720790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:53,543-Speed 2624.18 samples/sec   Loss 1.9036   LearningRate 0.0017   Epoch: 17   Global Step: 720800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:45:57,458-Speed 2616.60 samples/sec   Loss 1.9149   LearningRate 0.0017   Epoch: 17   Global Step: 720810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:01,365-Speed 2621.66 samples/sec   Loss 1.9284   LearningRate 0.0017   Epoch: 17   Global Step: 720820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:46:05,267-Speed 2624.63 samples/sec   Loss 1.8815   LearningRate 0.0017   Epoch: 17   Global Step: 720830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:46:09,170-Speed 2624.66 samples/sec   Loss 1.9310   LearningRate 0.0017   Epoch: 17   Global Step: 720840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:46:13,083-Speed 2617.40 samples/sec   Loss 1.8774   LearningRate 0.0017   Epoch: 17   Global Step: 720850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:46:16,986-Speed 2624.38 samples/sec   Loss 1.8961   LearningRate 0.0017   Epoch: 17   Global Step: 720860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:46:20,901-Speed 2617.62 samples/sec   Loss 1.8547   LearningRate 0.0017   Epoch: 17   Global Step: 720870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:46:24,785-Speed 2636.47 samples/sec   Loss 1.9232   LearningRate 0.0017   Epoch: 17   Global Step: 720880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:28,685-Speed 2627.00 samples/sec   Loss 1.8280   LearningRate 0.0017   Epoch: 17   Global Step: 720890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:32,589-Speed 2622.90 samples/sec   Loss 1.8584   LearningRate 0.0017   Epoch: 17   Global Step: 720900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:36,495-Speed 2622.17 samples/sec   Loss 1.9990   LearningRate 0.0017   Epoch: 17   Global Step: 720910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:40,402-Speed 2621.26 samples/sec   Loss 1.8771   LearningRate 0.0017   Epoch: 17   Global Step: 720920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:44,304-Speed 2625.45 samples/sec   Loss 1.9168   LearningRate 0.0017   Epoch: 17   Global Step: 720930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:48,203-Speed 2627.08 samples/sec   Loss 1.8546   LearningRate 0.0017   Epoch: 17   Global Step: 720940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:52,106-Speed 2624.54 samples/sec   Loss 1.8585   LearningRate 0.0017   Epoch: 17   Global Step: 720950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:56,008-Speed 2625.04 samples/sec   Loss 1.9073   LearningRate 0.0017   Epoch: 17   Global Step: 720960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:46:59,907-Speed 2626.94 samples/sec   Loss 1.8254   LearningRate 0.0017   Epoch: 17   Global Step: 720970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:03,814-Speed 2621.64 samples/sec   Loss 1.9231   LearningRate 0.0017   Epoch: 17   Global Step: 720980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:47:07,722-Speed 2620.58 samples/sec   Loss 1.9776   LearningRate 0.0017   Epoch: 17   Global Step: 720990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:47:11,642-Speed 2613.20 samples/sec   Loss 1.8671   LearningRate 0.0017   Epoch: 17   Global Step: 721000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:47:15,553-Speed 2619.07 samples/sec   Loss 1.9133   LearningRate 0.0017   Epoch: 17   Global Step: 721010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:47:19,480-Speed 2608.91 samples/sec   Loss 1.9035   LearningRate 0.0017   Epoch: 17   Global Step: 721020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:47:23,356-Speed 2641.81 samples/sec   Loss 1.8916   LearningRate 0.0017   Epoch: 17   Global Step: 721030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:27,262-Speed 2622.63 samples/sec   Loss 1.8608   LearningRate 0.0017   Epoch: 17   Global Step: 721040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:31,165-Speed 2624.21 samples/sec   Loss 1.8684   LearningRate 0.0017   Epoch: 17   Global Step: 721050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:35,069-Speed 2624.09 samples/sec   Loss 1.9022   LearningRate 0.0017   Epoch: 17   Global Step: 721060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:38,972-Speed 2623.60 samples/sec   Loss 1.8781   LearningRate 0.0017   Epoch: 17   Global Step: 721070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:42,877-Speed 2623.14 samples/sec   Loss 1.8890   LearningRate 0.0017   Epoch: 17   Global Step: 721080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:46,807-Speed 2606.43 samples/sec   Loss 1.8485   LearningRate 0.0017   Epoch: 17   Global Step: 721090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:50,714-Speed 2621.73 samples/sec   Loss 1.8816   LearningRate 0.0017   Epoch: 17   Global Step: 721100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:54,618-Speed 2623.68 samples/sec   Loss 1.8437   LearningRate 0.0017   Epoch: 17   Global Step: 721110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:47:58,555-Speed 2602.27 samples/sec   Loss 1.8981   LearningRate 0.0017   Epoch: 17   Global Step: 721120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:48:02,455-Speed 2625.82 samples/sec   Loss 1.8827   LearningRate 0.0017   Epoch: 17   Global Step: 721130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:06,357-Speed 2624.39 samples/sec   Loss 1.9573   LearningRate 0.0017   Epoch: 17   Global Step: 721140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:10,264-Speed 2621.67 samples/sec   Loss 1.9474   LearningRate 0.0017   Epoch: 17   Global Step: 721150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:14,188-Speed 2610.88 samples/sec   Loss 1.8880   LearningRate 0.0017   Epoch: 17   Global Step: 721160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:18,098-Speed 2620.07 samples/sec   Loss 1.8713   LearningRate 0.0017   Epoch: 17   Global Step: 721170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:22,036-Speed 2600.82 samples/sec   Loss 1.9188   LearningRate 0.0017   Epoch: 17   Global Step: 721180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:26,003-Speed 2581.61 samples/sec   Loss 1.9336   LearningRate 0.0017   Epoch: 17   Global Step: 721190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:29,906-Speed 2625.25 samples/sec   Loss 1.8421   LearningRate 0.0017   Epoch: 17   Global Step: 721200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:33,861-Speed 2590.10 samples/sec   Loss 1.9259   LearningRate 0.0017   Epoch: 17   Global Step: 721210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:37,762-Speed 2625.60 samples/sec   Loss 1.9158   LearningRate 0.0017   Epoch: 17   Global Step: 721220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:41,658-Speed 2628.79 samples/sec   Loss 1.9926   LearningRate 0.0017   Epoch: 17   Global Step: 721230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:45,567-Speed 2621.00 samples/sec   Loss 1.8371   LearningRate 0.0017   Epoch: 17   Global Step: 721240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:49,469-Speed 2624.83 samples/sec   Loss 1.8952   LearningRate 0.0017   Epoch: 17   Global Step: 721250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:53,375-Speed 2622.43 samples/sec   Loss 1.9063   LearningRate 0.0017   Epoch: 17   Global Step: 721260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:48:57,304-Speed 2606.81 samples/sec   Loss 1.8920   LearningRate 0.0017   Epoch: 17   Global Step: 721270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:01,222-Speed 2613.88 samples/sec   Loss 1.9739   LearningRate 0.0017   Epoch: 17   Global Step: 721280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:05,124-Speed 2625.36 samples/sec   Loss 1.8669   LearningRate 0.0017   Epoch: 17   Global Step: 721290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:09,024-Speed 2626.28 samples/sec   Loss 1.8811   LearningRate 0.0017   Epoch: 17   Global Step: 721300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:12,926-Speed 2624.70 samples/sec   Loss 1.8753   LearningRate 0.0017   Epoch: 17   Global Step: 721310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:16,826-Speed 2627.03 samples/sec   Loss 1.9295   LearningRate 0.0017   Epoch: 17   Global Step: 721320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:20,735-Speed 2619.75 samples/sec   Loss 1.8670   LearningRate 0.0017   Epoch: 17   Global Step: 721330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:49:24,632-Speed 2628.72 samples/sec   Loss 1.9087   LearningRate 0.0017   Epoch: 17   Global Step: 721340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:28,536-Speed 2623.93 samples/sec   Loss 1.9005   LearningRate 0.0017   Epoch: 17   Global Step: 721350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:32,443-Speed 2621.22 samples/sec   Loss 1.9534   LearningRate 0.0017   Epoch: 17   Global Step: 721360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:36,367-Speed 2609.98 samples/sec   Loss 1.8202   LearningRate 0.0017   Epoch: 17   Global Step: 721370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:40,271-Speed 2624.06 samples/sec   Loss 1.9085   LearningRate 0.0017   Epoch: 17   Global Step: 721380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:44,181-Speed 2619.33 samples/sec   Loss 1.9348   LearningRate 0.0017   Epoch: 17   Global Step: 721390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:48,085-Speed 2623.82 samples/sec   Loss 1.8874   LearningRate 0.0017   Epoch: 17   Global Step: 721400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:52,108-Speed 2546.17 samples/sec   Loss 1.8968   LearningRate 0.0017   Epoch: 17   Global Step: 721410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:56,014-Speed 2622.34 samples/sec   Loss 1.8694   LearningRate 0.0017   Epoch: 17   Global Step: 721420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:49:59,928-Speed 2616.52 samples/sec   Loss 1.9308   LearningRate 0.0017   Epoch: 17   Global Step: 721430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:50:03,833-Speed 2623.03 samples/sec   Loss 1.8360   LearningRate 0.0017   Epoch: 17   Global Step: 721440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:50:07,740-Speed 2621.57 samples/sec   Loss 1.8675   LearningRate 0.0017   Epoch: 17   Global Step: 721450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:50:11,642-Speed 2625.26 samples/sec   Loss 1.9036   LearningRate 0.0017   Epoch: 17   Global Step: 721460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:50:15,529-Speed 2635.21 samples/sec   Loss 1.9213   LearningRate 0.0017   Epoch: 17   Global Step: 721470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:19,453-Speed 2610.44 samples/sec   Loss 1.8948   LearningRate 0.0017   Epoch: 17   Global Step: 721480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:23,359-Speed 2622.69 samples/sec   Loss 1.8754   LearningRate 0.0017   Epoch: 17   Global Step: 721490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:27,261-Speed 2624.86 samples/sec   Loss 1.9307   LearningRate 0.0017   Epoch: 17   Global Step: 721500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:31,246-Speed 2570.11 samples/sec   Loss 1.8606   LearningRate 0.0017   Epoch: 17   Global Step: 721510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:35,155-Speed 2619.98 samples/sec   Loss 1.8743   LearningRate 0.0017   Epoch: 17   Global Step: 721520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:39,061-Speed 2622.56 samples/sec   Loss 1.9999   LearningRate 0.0017   Epoch: 17   Global Step: 721530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:42,968-Speed 2621.35 samples/sec   Loss 1.9323   LearningRate 0.0017   Epoch: 17   Global Step: 721540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:46,870-Speed 2625.64 samples/sec   Loss 1.9038   LearningRate 0.0017   Epoch: 17   Global Step: 721550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:50,776-Speed 2621.89 samples/sec   Loss 1.9329   LearningRate 0.0017   Epoch: 17   Global Step: 721560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:50:54,680-Speed 2623.70 samples/sec   Loss 1.9011   LearningRate 0.0017   Epoch: 17   Global Step: 721570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:50:58,610-Speed 2606.80 samples/sec   Loss 1.9202   LearningRate 0.0017   Epoch: 17   Global Step: 721580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:02,565-Speed 2589.49 samples/sec   Loss 1.8950   LearningRate 0.0017   Epoch: 17   Global Step: 721590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:06,504-Speed 2599.96 samples/sec   Loss 1.8817   LearningRate 0.0017   Epoch: 17   Global Step: 721600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:10,418-Speed 2616.57 samples/sec   Loss 1.8961   LearningRate 0.0017   Epoch: 17   Global Step: 721610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:14,327-Speed 2621.31 samples/sec   Loss 1.8406   LearningRate 0.0017   Epoch: 17   Global Step: 721620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:18,229-Speed 2624.69 samples/sec   Loss 1.8878   LearningRate 0.0017   Epoch: 17   Global Step: 721630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:22,127-Speed 2627.68 samples/sec   Loss 1.8536   LearningRate 0.0017   Epoch: 17   Global Step: 721640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:26,029-Speed 2625.56 samples/sec   Loss 1.9071   LearningRate 0.0017   Epoch: 17   Global Step: 721650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:29,937-Speed 2621.06 samples/sec   Loss 1.8795   LearningRate 0.0017   Epoch: 17   Global Step: 721660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:33,839-Speed 2624.92 samples/sec   Loss 1.8725   LearningRate 0.0017   Epoch: 17   Global Step: 721670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:51:37,747-Speed 2621.26 samples/sec   Loss 1.8543   LearningRate 0.0017   Epoch: 17   Global Step: 721680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:51:41,649-Speed 2624.93 samples/sec   Loss 1.9022   LearningRate 0.0017   Epoch: 17   Global Step: 721690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:51:45,554-Speed 2622.79 samples/sec   Loss 1.8323   LearningRate 0.0017   Epoch: 17   Global Step: 721700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:51:49,466-Speed 2618.29 samples/sec   Loss 1.9019   LearningRate 0.0017   Epoch: 17   Global Step: 721710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:51:53,372-Speed 2622.57 samples/sec   Loss 1.8990   LearningRate 0.0017   Epoch: 17   Global Step: 721720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:51:57,300-Speed 2607.95 samples/sec   Loss 1.8910   LearningRate 0.0017   Epoch: 17   Global Step: 721730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:01,215-Speed 2616.38 samples/sec   Loss 1.8485   LearningRate 0.0017   Epoch: 17   Global Step: 721740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:05,121-Speed 2621.82 samples/sec   Loss 1.9113   LearningRate 0.0017   Epoch: 17   Global Step: 721750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:09,061-Speed 2599.20 samples/sec   Loss 1.8782   LearningRate 0.0017   Epoch: 17   Global Step: 721760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:12,972-Speed 2619.65 samples/sec   Loss 1.9055   LearningRate 0.0017   Epoch: 17   Global Step: 721770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:16,875-Speed 2623.85 samples/sec   Loss 1.8929   LearningRate 0.0017   Epoch: 17   Global Step: 721780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:52:20,776-Speed 2626.00 samples/sec   Loss 1.8575   LearningRate 0.0017   Epoch: 17   Global Step: 721790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:52:24,654-Speed 2641.06 samples/sec   Loss 1.8523   LearningRate 0.0017   Epoch: 17   Global Step: 721800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:28,598-Speed 2597.11 samples/sec   Loss 1.9331   LearningRate 0.0017   Epoch: 17   Global Step: 721810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:32,516-Speed 2614.47 samples/sec   Loss 1.8485   LearningRate 0.0017   Epoch: 17   Global Step: 721820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:36,423-Speed 2621.49 samples/sec   Loss 1.8724   LearningRate 0.0017   Epoch: 17   Global Step: 721830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:52:40,337-Speed 2616.71 samples/sec   Loss 1.9014   LearningRate 0.0017   Epoch: 17   Global Step: 721840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:52:44,363-Speed 2544.79 samples/sec   Loss 1.8131   LearningRate 0.0017   Epoch: 17   Global Step: 721850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:52:48,277-Speed 2616.63 samples/sec   Loss 1.7918   LearningRate 0.0017   Epoch: 17   Global Step: 721860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:52:52,182-Speed 2623.30 samples/sec   Loss 1.9192   LearningRate 0.0017   Epoch: 17   Global Step: 721870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:52:56,090-Speed 2620.52 samples/sec   Loss 1.8392   LearningRate 0.0017   Epoch: 17   Global Step: 721880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:52:59,989-Speed 2627.14 samples/sec   Loss 1.8876   LearningRate 0.0017   Epoch: 17   Global Step: 721890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:53:03,896-Speed 2621.83 samples/sec   Loss 1.8743   LearningRate 0.0017   Epoch: 17   Global Step: 721900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:53:07,799-Speed 2624.24 samples/sec   Loss 1.8499   LearningRate 0.0017   Epoch: 17   Global Step: 721910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:53:11,704-Speed 2622.96 samples/sec   Loss 1.8682   LearningRate 0.0017   Epoch: 17   Global Step: 721920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:53:15,611-Speed 2621.09 samples/sec   Loss 1.8391   LearningRate 0.0017   Epoch: 17   Global Step: 721930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:53:19,521-Speed 2619.57 samples/sec   Loss 1.8664   LearningRate 0.0017   Epoch: 17   Global Step: 721940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:23,428-Speed 2621.89 samples/sec   Loss 1.8922   LearningRate 0.0017   Epoch: 17   Global Step: 721950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:27,329-Speed 2626.04 samples/sec   Loss 1.8477   LearningRate 0.0017   Epoch: 17   Global Step: 721960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:31,230-Speed 2625.97 samples/sec   Loss 1.8416   LearningRate 0.0017   Epoch: 17   Global Step: 721970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:35,129-Speed 2626.43 samples/sec   Loss 1.8248   LearningRate 0.0017   Epoch: 17   Global Step: 721980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:39,029-Speed 2626.48 samples/sec   Loss 1.9109   LearningRate 0.0017   Epoch: 17   Global Step: 721990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:42,932-Speed 2624.06 samples/sec   Loss 1.8623   LearningRate 0.0017   Epoch: 17   Global Step: 722000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:46,845-Speed 2617.18 samples/sec   Loss 1.9481   LearningRate 0.0017   Epoch: 17   Global Step: 722010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:50,757-Speed 2618.78 samples/sec   Loss 1.8555   LearningRate 0.0017   Epoch: 17   Global Step: 722020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:54,665-Speed 2620.78 samples/sec   Loss 1.8770   LearningRate 0.0017   Epoch: 17   Global Step: 722030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:53:58,579-Speed 2617.64 samples/sec   Loss 1.9106   LearningRate 0.0017   Epoch: 17   Global Step: 722040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 04:54:02,464-Speed 2636.17 samples/sec   Loss 1.8559   LearningRate 0.0017   Epoch: 17   Global Step: 722050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:54:06,378-Speed 2617.17 samples/sec   Loss 1.8218   LearningRate 0.0017   Epoch: 17   Global Step: 722060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:54:10,286-Speed 2620.33 samples/sec   Loss 1.8825   LearningRate 0.0017   Epoch: 17   Global Step: 722070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:54:14,191-Speed 2623.24 samples/sec   Loss 1.9313   LearningRate 0.0017   Epoch: 17   Global Step: 722080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:54:18,145-Speed 2590.66 samples/sec   Loss 1.9552   LearningRate 0.0017   Epoch: 17   Global Step: 722090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:54:22,223-Speed 2511.22 samples/sec   Loss 1.8561   LearningRate 0.0017   Epoch: 17   Global Step: 722100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:54:26,130-Speed 2621.66 samples/sec   Loss 1.8840   LearningRate 0.0017   Epoch: 17   Global Step: 722110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:54:30,036-Speed 2622.43 samples/sec   Loss 1.9394   LearningRate 0.0017   Epoch: 17   Global Step: 722120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:54:33,944-Speed 2621.18 samples/sec   Loss 1.8131   LearningRate 0.0017   Epoch: 17   Global Step: 722130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:54:37,885-Speed 2598.88 samples/sec   Loss 1.8407   LearningRate 0.0017   Epoch: 17   Global Step: 722140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:54:41,793-Speed 2621.14 samples/sec   Loss 1.9114   LearningRate 0.0017   Epoch: 17   Global Step: 722150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:54:45,708-Speed 2616.60 samples/sec   Loss 1.8495   LearningRate 0.0017   Epoch: 17   Global Step: 722160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:54:49,617-Speed 2619.99 samples/sec   Loss 1.9120   LearningRate 0.0017   Epoch: 17   Global Step: 722170   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:54:53,517-Speed 2627.05 samples/sec   Loss 1.8586   LearningRate 0.0017   Epoch: 17   Global Step: 722180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:54:57,422-Speed 2622.84 samples/sec   Loss 1.8920   LearningRate 0.0017   Epoch: 17   Global Step: 722190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:55:01,324-Speed 2624.46 samples/sec   Loss 1.8926   LearningRate 0.0017   Epoch: 17   Global Step: 722200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:55:05,207-Speed 2637.99 samples/sec   Loss 1.9082   LearningRate 0.0017   Epoch: 17   Global Step: 722210   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:09,111-Speed 2623.70 samples/sec   Loss 1.8958   LearningRate 0.0017   Epoch: 17   Global Step: 722220   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:13,015-Speed 2623.74 samples/sec   Loss 1.8460   LearningRate 0.0017   Epoch: 17   Global Step: 722230   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:16,919-Speed 2623.85 samples/sec   Loss 1.8673   LearningRate 0.0017   Epoch: 17   Global Step: 722240   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:20,826-Speed 2621.45 samples/sec   Loss 1.8545   LearningRate 0.0017   Epoch: 17   Global Step: 722250   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:24,739-Speed 2617.95 samples/sec   Loss 1.8446   LearningRate 0.0017   Epoch: 17   Global Step: 722260   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:28,646-Speed 2621.68 samples/sec   Loss 1.9087   LearningRate 0.0017   Epoch: 17   Global Step: 722270   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:32,562-Speed 2615.12 samples/sec   Loss 1.8355   LearningRate 0.0017   Epoch: 17   Global Step: 722280   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:36,468-Speed 2622.45 samples/sec   Loss 1.9051   LearningRate 0.0017   Epoch: 17   Global Step: 722290   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:40,373-Speed 2623.50 samples/sec   Loss 1.8944   LearningRate 0.0017   Epoch: 17   Global Step: 722300   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:55:44,337-Speed 2584.00 samples/sec   Loss 1.9576   LearningRate 0.0017   Epoch: 17   Global Step: 722310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:55:48,245-Speed 2621.09 samples/sec   Loss 1.8621   LearningRate 0.0017   Epoch: 17   Global Step: 722320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:55:52,181-Speed 2603.05 samples/sec   Loss 1.8422   LearningRate 0.0017   Epoch: 17   Global Step: 722330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:55:56,096-Speed 2615.57 samples/sec   Loss 1.9038   LearningRate 0.0017   Epoch: 17   Global Step: 722340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:56:00,020-Speed 2610.86 samples/sec   Loss 1.8483   LearningRate 0.0017   Epoch: 17   Global Step: 722350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:56:03,927-Speed 2621.43 samples/sec   Loss 1.8300   LearningRate 0.0017   Epoch: 17   Global Step: 722360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:56:07,849-Speed 2611.81 samples/sec   Loss 1.8729   LearningRate 0.0017   Epoch: 17   Global Step: 722370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:56:11,728-Speed 2640.33 samples/sec   Loss 1.8504   LearningRate 0.0017   Epoch: 17   Global Step: 722380   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:15,636-Speed 2621.79 samples/sec   Loss 1.8886   LearningRate 0.0017   Epoch: 17   Global Step: 722390   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:19,542-Speed 2621.99 samples/sec   Loss 1.8772   LearningRate 0.0017   Epoch: 17   Global Step: 722400   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:23,451-Speed 2620.76 samples/sec   Loss 1.8532   LearningRate 0.0017   Epoch: 17   Global Step: 722410   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:27,359-Speed 2620.29 samples/sec   Loss 1.8988   LearningRate 0.0017   Epoch: 17   Global Step: 722420   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:31,258-Speed 2627.23 samples/sec   Loss 1.8925   LearningRate 0.0017   Epoch: 17   Global Step: 722430   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:35,163-Speed 2622.48 samples/sec   Loss 1.8958   LearningRate 0.0017   Epoch: 17   Global Step: 722440   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:39,078-Speed 2616.64 samples/sec   Loss 1.8782   LearningRate 0.0017   Epoch: 17   Global Step: 722450   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:43,002-Speed 2609.75 samples/sec   Loss 1.8873   LearningRate 0.0017   Epoch: 17   Global Step: 722460   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:46,903-Speed 2625.73 samples/sec   Loss 1.9326   LearningRate 0.0017   Epoch: 17   Global Step: 722470   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 04:56:50,807-Speed 2624.12 samples/sec   Loss 1.8183   LearningRate 0.0017   Epoch: 17   Global Step: 722480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:56:54,707-Speed 2626.48 samples/sec   Loss 1.8642   LearningRate 0.0017   Epoch: 17   Global Step: 722490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:56:58,612-Speed 2622.73 samples/sec   Loss 1.8326   LearningRate 0.0017   Epoch: 17   Global Step: 722500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:02,516-Speed 2623.47 samples/sec   Loss 1.8865   LearningRate 0.0017   Epoch: 17   Global Step: 722510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:06,422-Speed 2622.17 samples/sec   Loss 1.9251   LearningRate 0.0017   Epoch: 17   Global Step: 722520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:10,363-Speed 2598.89 samples/sec   Loss 1.8842   LearningRate 0.0017   Epoch: 17   Global Step: 722530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:14,275-Speed 2618.81 samples/sec   Loss 1.8953   LearningRate 0.0017   Epoch: 17   Global Step: 722540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:18,190-Speed 2616.22 samples/sec   Loss 1.8696   LearningRate 0.0017   Epoch: 17   Global Step: 722550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:22,106-Speed 2615.83 samples/sec   Loss 1.8740   LearningRate 0.0017   Epoch: 17   Global Step: 722560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:26,009-Speed 2624.45 samples/sec   Loss 1.9086   LearningRate 0.0017   Epoch: 17   Global Step: 722570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:29,912-Speed 2631.05 samples/sec   Loss 1.8797   LearningRate 0.0017   Epoch: 17   Global Step: 722580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:57:33,786-Speed 2643.83 samples/sec   Loss 1.8801   LearningRate 0.0017   Epoch: 17   Global Step: 722590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:37,686-Speed 2625.95 samples/sec   Loss 1.8509   LearningRate 0.0017   Epoch: 17   Global Step: 722600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:41,623-Speed 2601.36 samples/sec   Loss 1.8568   LearningRate 0.0017   Epoch: 17   Global Step: 722610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:45,532-Speed 2620.69 samples/sec   Loss 1.8688   LearningRate 0.0017   Epoch: 17   Global Step: 722620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:49,435-Speed 2625.01 samples/sec   Loss 1.8548   LearningRate 0.0017   Epoch: 17   Global Step: 722630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:53,331-Speed 2628.80 samples/sec   Loss 1.9136   LearningRate 0.0017   Epoch: 17   Global Step: 722640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:57:57,236-Speed 2622.92 samples/sec   Loss 1.8415   LearningRate 0.0017   Epoch: 17   Global Step: 722650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:01,147-Speed 2618.72 samples/sec   Loss 1.8922   LearningRate 0.0017   Epoch: 17   Global Step: 722660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:05,072-Speed 2609.56 samples/sec   Loss 1.8646   LearningRate 0.0017   Epoch: 17   Global Step: 722670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:08,970-Speed 2627.19 samples/sec   Loss 1.8857   LearningRate 0.0017   Epoch: 17   Global Step: 722680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:12,985-Speed 2551.99 samples/sec   Loss 1.8983   LearningRate 0.0017   Epoch: 17   Global Step: 722690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:58:16,886-Speed 2625.23 samples/sec   Loss 1.8735   LearningRate 0.0017   Epoch: 17   Global Step: 722700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:58:20,817-Speed 2606.13 samples/sec   Loss 1.8752   LearningRate 0.0017   Epoch: 17   Global Step: 722710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:58:24,697-Speed 2639.99 samples/sec   Loss 1.8983   LearningRate 0.0017   Epoch: 17   Global Step: 722720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:28,597-Speed 2626.27 samples/sec   Loss 1.8550   LearningRate 0.0017   Epoch: 17   Global Step: 722730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:32,513-Speed 2615.28 samples/sec   Loss 1.8522   LearningRate 0.0017   Epoch: 17   Global Step: 722740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:36,425-Speed 2618.47 samples/sec   Loss 1.8967   LearningRate 0.0017   Epoch: 17   Global Step: 722750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:40,331-Speed 2621.71 samples/sec   Loss 1.8532   LearningRate 0.0017   Epoch: 17   Global Step: 722760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:44,235-Speed 2624.33 samples/sec   Loss 1.8854   LearningRate 0.0017   Epoch: 17   Global Step: 722770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:48,139-Speed 2623.25 samples/sec   Loss 1.9347   LearningRate 0.0017   Epoch: 17   Global Step: 722780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:52,044-Speed 2623.07 samples/sec   Loss 1.9019   LearningRate 0.0017   Epoch: 17   Global Step: 722790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:55,949-Speed 2622.82 samples/sec   Loss 1.8879   LearningRate 0.0017   Epoch: 17   Global Step: 722800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:58:59,895-Speed 2596.18 samples/sec   Loss 1.8349   LearningRate 0.0017   Epoch: 17   Global Step: 722810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:03,791-Speed 2628.88 samples/sec   Loss 1.9365   LearningRate 0.0017   Epoch: 17   Global Step: 722820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:59:07,686-Speed 2629.40 samples/sec   Loss 1.9028   LearningRate 0.0017   Epoch: 17   Global Step: 722830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:59:11,572-Speed 2636.03 samples/sec   Loss 1.8845   LearningRate 0.0017   Epoch: 17   Global Step: 722840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:15,472-Speed 2627.01 samples/sec   Loss 1.8566   LearningRate 0.0017   Epoch: 17   Global Step: 722850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:19,405-Speed 2604.01 samples/sec   Loss 1.9075   LearningRate 0.0017   Epoch: 17   Global Step: 722860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:23,299-Speed 2630.55 samples/sec   Loss 1.8931   LearningRate 0.0017   Epoch: 17   Global Step: 722870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:27,195-Speed 2629.44 samples/sec   Loss 1.8820   LearningRate 0.0017   Epoch: 17   Global Step: 722880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:31,104-Speed 2620.25 samples/sec   Loss 1.8662   LearningRate 0.0017   Epoch: 17   Global Step: 722890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:35,026-Speed 2611.32 samples/sec   Loss 1.8611   LearningRate 0.0017   Epoch: 17   Global Step: 722900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:38,925-Speed 2627.10 samples/sec   Loss 1.8881   LearningRate 0.0017   Epoch: 17   Global Step: 722910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:42,823-Speed 2628.40 samples/sec   Loss 1.8553   LearningRate 0.0017   Epoch: 17   Global Step: 722920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:46,722-Speed 2626.79 samples/sec   Loss 1.8377   LearningRate 0.0017   Epoch: 17   Global Step: 722930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 04:59:50,634-Speed 2618.04 samples/sec   Loss 1.8600   LearningRate 0.0017   Epoch: 17   Global Step: 722940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:59:54,542-Speed 2621.14 samples/sec   Loss 1.8173   LearningRate 0.0017   Epoch: 17   Global Step: 722950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 04:59:58,439-Speed 2628.40 samples/sec   Loss 1.8303   LearningRate 0.0017   Epoch: 17   Global Step: 722960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:00:02,342-Speed 2624.51 samples/sec   Loss 1.8901   LearningRate 0.0017   Epoch: 17   Global Step: 722970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:00:06,245-Speed 2623.69 samples/sec   Loss 1.8732   LearningRate 0.0017   Epoch: 17   Global Step: 722980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:00:10,164-Speed 2613.77 samples/sec   Loss 1.9232   LearningRate 0.0017   Epoch: 17   Global Step: 722990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:00:14,059-Speed 2629.84 samples/sec   Loss 1.8795   LearningRate 0.0017   Epoch: 17   Global Step: 723000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:00:17,935-Speed 2641.90 samples/sec   Loss 1.8840   LearningRate 0.0017   Epoch: 17   Global Step: 723010   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:21,838-Speed 2624.22 samples/sec   Loss 1.8818   LearningRate 0.0016   Epoch: 17   Global Step: 723020   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:25,737-Speed 2627.13 samples/sec   Loss 1.8753   LearningRate 0.0016   Epoch: 17   Global Step: 723030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:29,635-Speed 2628.44 samples/sec   Loss 1.8958   LearningRate 0.0016   Epoch: 17   Global Step: 723040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:33,530-Speed 2629.38 samples/sec   Loss 1.8494   LearningRate 0.0016   Epoch: 17   Global Step: 723050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:37,430-Speed 2626.18 samples/sec   Loss 1.9011   LearningRate 0.0016   Epoch: 17   Global Step: 723060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:41,330-Speed 2625.78 samples/sec   Loss 1.8899   LearningRate 0.0016   Epoch: 17   Global Step: 723070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:45,248-Speed 2614.71 samples/sec   Loss 1.8558   LearningRate 0.0016   Epoch: 17   Global Step: 723080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:49,148-Speed 2626.06 samples/sec   Loss 1.9135   LearningRate 0.0016   Epoch: 17   Global Step: 723090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:53,049-Speed 2626.32 samples/sec   Loss 1.8509   LearningRate 0.0016   Epoch: 17   Global Step: 723100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:00:56,973-Speed 2610.11 samples/sec   Loss 1.8666   LearningRate 0.0016   Epoch: 17   Global Step: 723110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:01:00,885-Speed 2618.01 samples/sec   Loss 1.8750   LearningRate 0.0016   Epoch: 17   Global Step: 723120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:01:04,830-Speed 2595.90 samples/sec   Loss 1.8497   LearningRate 0.0016   Epoch: 17   Global Step: 723130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:01:08,715-Speed 2636.99 samples/sec   Loss 1.8889   LearningRate 0.0016   Epoch: 17   Global Step: 723140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:12,612-Speed 2628.24 samples/sec   Loss 1.9066   LearningRate 0.0016   Epoch: 17   Global Step: 723150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:16,519-Speed 2622.16 samples/sec   Loss 1.8366   LearningRate 0.0016   Epoch: 17   Global Step: 723160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:20,428-Speed 2619.78 samples/sec   Loss 1.8481   LearningRate 0.0016   Epoch: 17   Global Step: 723170   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:24,527-Speed 2498.83 samples/sec   Loss 1.8170   LearningRate 0.0016   Epoch: 17   Global Step: 723180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:28,437-Speed 2619.86 samples/sec   Loss 1.8431   LearningRate 0.0016   Epoch: 17   Global Step: 723190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:32,338-Speed 2625.62 samples/sec   Loss 1.8775   LearningRate 0.0016   Epoch: 17   Global Step: 723200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:36,244-Speed 2622.51 samples/sec   Loss 1.9004   LearningRate 0.0016   Epoch: 17   Global Step: 723210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:40,156-Speed 2618.41 samples/sec   Loss 1.9268   LearningRate 0.0016   Epoch: 17   Global Step: 723220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:44,068-Speed 2618.14 samples/sec   Loss 1.9001   LearningRate 0.0016   Epoch: 17   Global Step: 723230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:01:47,985-Speed 2614.56 samples/sec   Loss 1.8661   LearningRate 0.0016   Epoch: 17   Global Step: 723240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:01:51,895-Speed 2620.06 samples/sec   Loss 1.8572   LearningRate 0.0016   Epoch: 17   Global Step: 723250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:01:55,817-Speed 2612.08 samples/sec   Loss 1.8585   LearningRate 0.0016   Epoch: 17   Global Step: 723260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:01:59,740-Speed 2610.47 samples/sec   Loss 1.8656   LearningRate 0.0016   Epoch: 17   Global Step: 723270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:03,638-Speed 2628.15 samples/sec   Loss 1.8773   LearningRate 0.0016   Epoch: 17   Global Step: 723280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:07,541-Speed 2623.99 samples/sec   Loss 1.8082   LearningRate 0.0016   Epoch: 17   Global Step: 723290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:11,453-Speed 2618.23 samples/sec   Loss 1.8579   LearningRate 0.0016   Epoch: 17   Global Step: 723300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:15,373-Speed 2613.10 samples/sec   Loss 1.9268   LearningRate 0.0016   Epoch: 17   Global Step: 723310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:19,280-Speed 2622.15 samples/sec   Loss 1.8130   LearningRate 0.0016   Epoch: 17   Global Step: 723320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:23,178-Speed 2627.37 samples/sec   Loss 1.8921   LearningRate 0.0016   Epoch: 17   Global Step: 723330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:27,056-Speed 2642.23 samples/sec   Loss 1.8999   LearningRate 0.0016   Epoch: 17   Global Step: 723340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:30,963-Speed 2621.15 samples/sec   Loss 1.8532   LearningRate 0.0016   Epoch: 17   Global Step: 723350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:34,861-Speed 2627.30 samples/sec   Loss 1.8609   LearningRate 0.0016   Epoch: 17   Global Step: 723360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:38,763-Speed 2624.89 samples/sec   Loss 1.8763   LearningRate 0.0016   Epoch: 17   Global Step: 723370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:42,658-Speed 2629.47 samples/sec   Loss 1.8790   LearningRate 0.0016   Epoch: 17   Global Step: 723380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:46,607-Speed 2594.04 samples/sec   Loss 1.8860   LearningRate 0.0016   Epoch: 17   Global Step: 723390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:50,532-Speed 2610.09 samples/sec   Loss 1.8733   LearningRate 0.0016   Epoch: 17   Global Step: 723400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:54,443-Speed 2618.72 samples/sec   Loss 1.8501   LearningRate 0.0016   Epoch: 17   Global Step: 723410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:02:58,341-Speed 2627.72 samples/sec   Loss 1.7554   LearningRate 0.0016   Epoch: 17   Global Step: 723420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:03:02,243-Speed 2624.87 samples/sec   Loss 1.8358   LearningRate 0.0016   Epoch: 17   Global Step: 723430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:03:06,126-Speed 2638.15 samples/sec   Loss 1.8789   LearningRate 0.0016   Epoch: 17   Global Step: 723440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:03:10,037-Speed 2618.60 samples/sec   Loss 1.8492   LearningRate 0.0016   Epoch: 17   Global Step: 723450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:03:13,938-Speed 2626.25 samples/sec   Loss 1.8143   LearningRate 0.0016   Epoch: 17   Global Step: 723460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:03:17,846-Speed 2620.44 samples/sec   Loss 1.8478   LearningRate 0.0016   Epoch: 17   Global Step: 723470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:03:21,744-Speed 2627.76 samples/sec   Loss 1.8742   LearningRate 0.0016   Epoch: 17   Global Step: 723480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:03:25,622-Speed 2641.64 samples/sec   Loss 1.8591   LearningRate 0.0016   Epoch: 17   Global Step: 723490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:29,520-Speed 2628.11 samples/sec   Loss 1.8493   LearningRate 0.0016   Epoch: 17   Global Step: 723500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:33,417-Speed 2627.68 samples/sec   Loss 1.8758   LearningRate 0.0016   Epoch: 17   Global Step: 723510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:37,316-Speed 2627.02 samples/sec   Loss 1.8265   LearningRate 0.0016   Epoch: 17   Global Step: 723520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:41,226-Speed 2620.04 samples/sec   Loss 1.8931   LearningRate 0.0016   Epoch: 17   Global Step: 723530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:45,123-Speed 2628.49 samples/sec   Loss 1.8412   LearningRate 0.0016   Epoch: 17   Global Step: 723540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:49,037-Speed 2616.41 samples/sec   Loss 1.8769   LearningRate 0.0016   Epoch: 17   Global Step: 723550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:52,962-Speed 2610.16 samples/sec   Loss 1.8289   LearningRate 0.0016   Epoch: 17   Global Step: 723560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:03:56,861-Speed 2627.07 samples/sec   Loss 1.7873   LearningRate 0.0016   Epoch: 17   Global Step: 723570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:04:00,768-Speed 2621.56 samples/sec   Loss 1.9216   LearningRate 0.0016   Epoch: 17   Global Step: 723580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:04:04,683-Speed 2615.99 samples/sec   Loss 1.8635   LearningRate 0.0016   Epoch: 17   Global Step: 723590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:08,583-Speed 2626.02 samples/sec   Loss 1.8465   LearningRate 0.0016   Epoch: 17   Global Step: 723600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:12,487-Speed 2623.50 samples/sec   Loss 1.8265   LearningRate 0.0016   Epoch: 17   Global Step: 723610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:16,518-Speed 2541.59 samples/sec   Loss 1.8075   LearningRate 0.0016   Epoch: 17   Global Step: 723620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:20,414-Speed 2628.98 samples/sec   Loss 1.9220   LearningRate 0.0016   Epoch: 17   Global Step: 723630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:24,316-Speed 2625.39 samples/sec   Loss 1.8389   LearningRate 0.0016   Epoch: 17   Global Step: 723640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:28,215-Speed 2627.11 samples/sec   Loss 1.8406   LearningRate 0.0016   Epoch: 17   Global Step: 723650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:32,115-Speed 2626.29 samples/sec   Loss 1.8434   LearningRate 0.0016   Epoch: 17   Global Step: 723660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:36,014-Speed 2626.91 samples/sec   Loss 1.8485   LearningRate 0.0016   Epoch: 17   Global Step: 723670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:39,924-Speed 2619.43 samples/sec   Loss 1.8756   LearningRate 0.0016   Epoch: 17   Global Step: 723680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:04:43,826-Speed 2625.64 samples/sec   Loss 1.8903   LearningRate 0.0016   Epoch: 17   Global Step: 723690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:04:47,725-Speed 2626.55 samples/sec   Loss 1.8892   LearningRate 0.0016   Epoch: 17   Global Step: 723700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:04:51,621-Speed 2629.65 samples/sec   Loss 1.8608   LearningRate 0.0016   Epoch: 17   Global Step: 723710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:04:55,520-Speed 2626.98 samples/sec   Loss 1.9207   LearningRate 0.0016   Epoch: 17   Global Step: 723720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:04:59,418-Speed 2627.82 samples/sec   Loss 1.8423   LearningRate 0.0016   Epoch: 17   Global Step: 723730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:03,315-Speed 2627.86 samples/sec   Loss 1.7905   LearningRate 0.0016   Epoch: 17   Global Step: 723740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:07,220-Speed 2622.72 samples/sec   Loss 1.8156   LearningRate 0.0016   Epoch: 17   Global Step: 723750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:11,127-Speed 2622.12 samples/sec   Loss 1.8845   LearningRate 0.0016   Epoch: 17   Global Step: 723760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:15,030-Speed 2624.08 samples/sec   Loss 1.8246   LearningRate 0.0016   Epoch: 17   Global Step: 723770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:18,927-Speed 2628.44 samples/sec   Loss 1.8480   LearningRate 0.0016   Epoch: 17   Global Step: 723780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:22,823-Speed 2628.99 samples/sec   Loss 1.8752   LearningRate 0.0016   Epoch: 17   Global Step: 723790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:05:26,719-Speed 2628.60 samples/sec   Loss 1.8908   LearningRate 0.0016   Epoch: 17   Global Step: 723800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:05:30,616-Speed 2628.59 samples/sec   Loss 1.8484   LearningRate 0.0016   Epoch: 17   Global Step: 723810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:05:34,512-Speed 2628.83 samples/sec   Loss 1.8492   LearningRate 0.0016   Epoch: 17   Global Step: 723820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:05:38,385-Speed 2644.94 samples/sec   Loss 1.9463   LearningRate 0.0016   Epoch: 17   Global Step: 723830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:42,278-Speed 2630.65 samples/sec   Loss 1.8496   LearningRate 0.0016   Epoch: 17   Global Step: 723840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:46,173-Speed 2629.72 samples/sec   Loss 1.8553   LearningRate 0.0016   Epoch: 17   Global Step: 723850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:50,075-Speed 2625.00 samples/sec   Loss 1.8901   LearningRate 0.0016   Epoch: 17   Global Step: 723860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:53,972-Speed 2628.59 samples/sec   Loss 1.8794   LearningRate 0.0016   Epoch: 17   Global Step: 723870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:05:57,866-Speed 2630.09 samples/sec   Loss 1.8451   LearningRate 0.0016   Epoch: 17   Global Step: 723880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:01,761-Speed 2629.72 samples/sec   Loss 1.8406   LearningRate 0.0016   Epoch: 17   Global Step: 723890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:05,700-Speed 2600.36 samples/sec   Loss 1.8721   LearningRate 0.0016   Epoch: 17   Global Step: 723900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:09,628-Speed 2606.75 samples/sec   Loss 1.8651   LearningRate 0.0016   Epoch: 17   Global Step: 723910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:13,525-Speed 2628.96 samples/sec   Loss 1.8180   LearningRate 0.0016   Epoch: 17   Global Step: 723920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:17,422-Speed 2628.03 samples/sec   Loss 1.8900   LearningRate 0.0016   Epoch: 17   Global Step: 723930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:06:21,431-Speed 2555.87 samples/sec   Loss 1.8096   LearningRate 0.0016   Epoch: 17   Global Step: 723940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:06:25,330-Speed 2627.25 samples/sec   Loss 1.8810   LearningRate 0.0016   Epoch: 17   Global Step: 723950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:06:29,236-Speed 2621.88 samples/sec   Loss 1.8380   LearningRate 0.0016   Epoch: 17   Global Step: 723960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:06:33,134-Speed 2627.08 samples/sec   Loss 1.8894   LearningRate 0.0016   Epoch: 17   Global Step: 723970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:06:37,006-Speed 2645.44 samples/sec   Loss 1.8216   LearningRate 0.0016   Epoch: 17   Global Step: 723980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:40,912-Speed 2622.42 samples/sec   Loss 1.8358   LearningRate 0.0016   Epoch: 17   Global Step: 723990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:44,818-Speed 2622.41 samples/sec   Loss 1.9155   LearningRate 0.0016   Epoch: 17   Global Step: 724000   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:48,722-Speed 2623.10 samples/sec   Loss 1.9249   LearningRate 0.0016   Epoch: 17   Global Step: 724010   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:52,628-Speed 2622.71 samples/sec   Loss 1.8249   LearningRate 0.0016   Epoch: 17   Global Step: 724020   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:06:56,528-Speed 2626.28 samples/sec   Loss 1.8345   LearningRate 0.0016   Epoch: 17   Global Step: 724030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:07:00,424-Speed 2629.18 samples/sec   Loss 1.8471   LearningRate 0.0016   Epoch: 17   Global Step: 724040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:07:04,342-Speed 2613.60 samples/sec   Loss 1.8285   LearningRate 0.0016   Epoch: 17   Global Step: 724050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:07:08,272-Speed 2606.84 samples/sec   Loss 1.8101   LearningRate 0.0016   Epoch: 17   Global Step: 724060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:07:12,208-Speed 2601.69 samples/sec   Loss 1.8938   LearningRate 0.0016   Epoch: 17   Global Step: 724070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:07:16,119-Speed 2618.97 samples/sec   Loss 1.8142   LearningRate 0.0016   Epoch: 17   Global Step: 724080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:20,014-Speed 2629.97 samples/sec   Loss 1.8224   LearningRate 0.0016   Epoch: 17   Global Step: 724090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:23,920-Speed 2622.28 samples/sec   Loss 1.8656   LearningRate 0.0016   Epoch: 17   Global Step: 724100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:27,817-Speed 2628.85 samples/sec   Loss 1.7988   LearningRate 0.0016   Epoch: 17   Global Step: 724110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:31,753-Speed 2601.67 samples/sec   Loss 1.8959   LearningRate 0.0016   Epoch: 17   Global Step: 724120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:35,678-Speed 2609.58 samples/sec   Loss 1.8523   LearningRate 0.0016   Epoch: 17   Global Step: 724130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:39,609-Speed 2605.63 samples/sec   Loss 1.8247   LearningRate 0.0016   Epoch: 17   Global Step: 724140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:43,504-Speed 2629.67 samples/sec   Loss 1.8332   LearningRate 0.0016   Epoch: 17   Global Step: 724150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:47,400-Speed 2628.51 samples/sec   Loss 1.8307   LearningRate 0.0016   Epoch: 17   Global Step: 724160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:51,297-Speed 2629.08 samples/sec   Loss 1.8384   LearningRate 0.0016   Epoch: 17   Global Step: 724170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:55,176-Speed 2640.47 samples/sec   Loss 1.8562   LearningRate 0.0016   Epoch: 17   Global Step: 724180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:07:59,090-Speed 2616.77 samples/sec   Loss 1.8516   LearningRate 0.0016   Epoch: 17   Global Step: 724190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:02,983-Speed 2631.04 samples/sec   Loss 1.8031   LearningRate 0.0016   Epoch: 17   Global Step: 724200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:06,882-Speed 2627.17 samples/sec   Loss 1.8918   LearningRate 0.0016   Epoch: 17   Global Step: 724210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:10,780-Speed 2627.50 samples/sec   Loss 1.8873   LearningRate 0.0016   Epoch: 17   Global Step: 724220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:14,857-Speed 2512.39 samples/sec   Loss 1.8191   LearningRate 0.0016   Epoch: 17   Global Step: 724230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:18,768-Speed 2618.67 samples/sec   Loss 1.8387   LearningRate 0.0016   Epoch: 17   Global Step: 724240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:22,667-Speed 2627.30 samples/sec   Loss 1.8843   LearningRate 0.0016   Epoch: 17   Global Step: 724250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:26,562-Speed 2629.60 samples/sec   Loss 1.8745   LearningRate 0.0016   Epoch: 17   Global Step: 724260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:30,466-Speed 2623.71 samples/sec   Loss 1.8349   LearningRate 0.0016   Epoch: 17   Global Step: 724270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:34,349-Speed 2637.87 samples/sec   Loss 1.8272   LearningRate 0.0016   Epoch: 17   Global Step: 724280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:08:38,226-Speed 2641.45 samples/sec   Loss 1.8569   LearningRate 0.0016   Epoch: 17   Global Step: 724290   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:08:42,132-Speed 2622.31 samples/sec   Loss 1.9163   LearningRate 0.0016   Epoch: 17   Global Step: 724300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:08:46,025-Speed 2631.24 samples/sec   Loss 1.8274   LearningRate 0.0016   Epoch: 17   Global Step: 724310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:08:49,920-Speed 2629.96 samples/sec   Loss 1.8998   LearningRate 0.0016   Epoch: 17   Global Step: 724320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:08:53,814-Speed 2630.65 samples/sec   Loss 1.8574   LearningRate 0.0016   Epoch: 17   Global Step: 724330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:08:57,710-Speed 2629.07 samples/sec   Loss 1.8482   LearningRate 0.0016   Epoch: 17   Global Step: 724340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:01,619-Speed 2619.57 samples/sec   Loss 1.8431   LearningRate 0.0016   Epoch: 17   Global Step: 724350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:05,612-Speed 2565.47 samples/sec   Loss 1.8395   LearningRate 0.0016   Epoch: 17   Global Step: 724360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:09,539-Speed 2607.98 samples/sec   Loss 1.8250   LearningRate 0.0016   Epoch: 17   Global Step: 724370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:13,460-Speed 2612.27 samples/sec   Loss 1.8835   LearningRate 0.0016   Epoch: 17   Global Step: 724380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:17,367-Speed 2621.56 samples/sec   Loss 1.8414   LearningRate 0.0016   Epoch: 17   Global Step: 724390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:09:21,273-Speed 2622.61 samples/sec   Loss 1.8435   LearningRate 0.0016   Epoch: 17   Global Step: 724400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:09:25,172-Speed 2626.77 samples/sec   Loss 1.8315   LearningRate 0.0016   Epoch: 17   Global Step: 724410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:09:29,042-Speed 2646.88 samples/sec   Loss 1.8710   LearningRate 0.0016   Epoch: 17   Global Step: 724420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:32,950-Speed 2620.76 samples/sec   Loss 1.7899   LearningRate 0.0016   Epoch: 17   Global Step: 724430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:36,847-Speed 2627.88 samples/sec   Loss 1.8095   LearningRate 0.0016   Epoch: 17   Global Step: 724440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:40,769-Speed 2611.50 samples/sec   Loss 1.8675   LearningRate 0.0016   Epoch: 17   Global Step: 724450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:44,670-Speed 2625.50 samples/sec   Loss 1.8553   LearningRate 0.0016   Epoch: 17   Global Step: 724460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:48,570-Speed 2626.61 samples/sec   Loss 1.8393   LearningRate 0.0016   Epoch: 17   Global Step: 724470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:52,467-Speed 2628.05 samples/sec   Loss 1.8695   LearningRate 0.0016   Epoch: 17   Global Step: 724480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:09:56,369-Speed 2625.05 samples/sec   Loss 1.9682   LearningRate 0.0016   Epoch: 17   Global Step: 724490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:00,267-Speed 2628.34 samples/sec   Loss 1.8496   LearningRate 0.0016   Epoch: 17   Global Step: 724500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:04,165-Speed 2627.37 samples/sec   Loss 1.8870   LearningRate 0.0016   Epoch: 17   Global Step: 724510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:08,039-Speed 2643.75 samples/sec   Loss 1.8592   LearningRate 0.0016   Epoch: 17   Global Step: 724520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:11,936-Speed 2627.90 samples/sec   Loss 1.8966   LearningRate 0.0016   Epoch: 17   Global Step: 724530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:15,841-Speed 2623.65 samples/sec   Loss 1.8537   LearningRate 0.0016   Epoch: 17   Global Step: 724540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:19,738-Speed 2627.74 samples/sec   Loss 1.8630   LearningRate 0.0016   Epoch: 17   Global Step: 724550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:23,642-Speed 2623.72 samples/sec   Loss 1.8728   LearningRate 0.0016   Epoch: 17   Global Step: 724560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:27,541-Speed 2626.98 samples/sec   Loss 1.8214   LearningRate 0.0016   Epoch: 17   Global Step: 724570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:31,447-Speed 2622.27 samples/sec   Loss 1.8772   LearningRate 0.0016   Epoch: 17   Global Step: 724580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:35,358-Speed 2619.18 samples/sec   Loss 1.8525   LearningRate 0.0016   Epoch: 17   Global Step: 724590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:39,254-Speed 2629.08 samples/sec   Loss 1.8823   LearningRate 0.0016   Epoch: 17   Global Step: 724600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:43,148-Speed 2629.75 samples/sec   Loss 1.8564   LearningRate 0.0016   Epoch: 17   Global Step: 724610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:10:47,043-Speed 2629.60 samples/sec   Loss 1.8418   LearningRate 0.0016   Epoch: 17   Global Step: 724620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:10:50,939-Speed 2628.94 samples/sec   Loss 1.8399   LearningRate 0.0016   Epoch: 17   Global Step: 724630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:10:54,840-Speed 2625.30 samples/sec   Loss 1.7929   LearningRate 0.0016   Epoch: 17   Global Step: 724640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:10:58,708-Speed 2648.94 samples/sec   Loss 1.8214   LearningRate 0.0016   Epoch: 17   Global Step: 724650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:02,603-Speed 2629.27 samples/sec   Loss 1.8670   LearningRate 0.0016   Epoch: 17   Global Step: 724660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:06,505-Speed 2625.42 samples/sec   Loss 1.8447   LearningRate 0.0016   Epoch: 17   Global Step: 724670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:10,402-Speed 2628.46 samples/sec   Loss 1.8470   LearningRate 0.0016   Epoch: 17   Global Step: 724680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:14,310-Speed 2620.67 samples/sec   Loss 1.8137   LearningRate 0.0016   Epoch: 17   Global Step: 724690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:18,212-Speed 2624.50 samples/sec   Loss 1.8442   LearningRate 0.0016   Epoch: 17   Global Step: 724700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:22,115-Speed 2624.56 samples/sec   Loss 1.7907   LearningRate 0.0016   Epoch: 17   Global Step: 724710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:26,013-Speed 2627.22 samples/sec   Loss 1.8350   LearningRate 0.0016   Epoch: 17   Global Step: 724720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:29,913-Speed 2626.42 samples/sec   Loss 1.8334   LearningRate 0.0016   Epoch: 17   Global Step: 724730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:33,806-Speed 2631.06 samples/sec   Loss 1.8431   LearningRate 0.0016   Epoch: 17   Global Step: 724740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:11:37,713-Speed 2621.85 samples/sec   Loss 1.8640   LearningRate 0.0016   Epoch: 17   Global Step: 724750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:11:41,616-Speed 2623.82 samples/sec   Loss 1.8533   LearningRate 0.0016   Epoch: 17   Global Step: 724760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:11:45,527-Speed 2618.92 samples/sec   Loss 1.8916   LearningRate 0.0016   Epoch: 17   Global Step: 724770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:11:49,428-Speed 2626.14 samples/sec   Loss 1.9088   LearningRate 0.0016   Epoch: 17   Global Step: 724780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:11:53,349-Speed 2612.36 samples/sec   Loss 1.7902   LearningRate 0.0016   Epoch: 17   Global Step: 724790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:11:57,246-Speed 2628.13 samples/sec   Loss 1.8690   LearningRate 0.0016   Epoch: 17   Global Step: 724800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:12:01,140-Speed 2629.99 samples/sec   Loss 1.8330   LearningRate 0.0016   Epoch: 17   Global Step: 724810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:12:05,033-Speed 2631.58 samples/sec   Loss 1.8154   LearningRate 0.0016   Epoch: 17   Global Step: 724820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:12:08,933-Speed 2625.67 samples/sec   Loss 1.8515   LearningRate 0.0016   Epoch: 17   Global Step: 724830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:12:12,833-Speed 2626.50 samples/sec   Loss 1.8231   LearningRate 0.0016   Epoch: 17   Global Step: 724840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:12:16,718-Speed 2635.73 samples/sec   Loss 1.8629   LearningRate 0.0016   Epoch: 17   Global Step: 724850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:12:20,594-Speed 2642.73 samples/sec   Loss 1.7973   LearningRate 0.0016   Epoch: 17   Global Step: 724860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:24,644-Speed 2529.61 samples/sec   Loss 1.8823   LearningRate 0.0016   Epoch: 17   Global Step: 724870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:28,559-Speed 2616.12 samples/sec   Loss 1.8397   LearningRate 0.0016   Epoch: 17   Global Step: 724880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:32,452-Speed 2630.69 samples/sec   Loss 1.7955   LearningRate 0.0016   Epoch: 17   Global Step: 724890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:36,348-Speed 2628.77 samples/sec   Loss 1.8407   LearningRate 0.0016   Epoch: 17   Global Step: 724900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:40,247-Speed 2627.56 samples/sec   Loss 1.8267   LearningRate 0.0016   Epoch: 17   Global Step: 724910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:44,141-Speed 2630.27 samples/sec   Loss 1.8420   LearningRate 0.0016   Epoch: 17   Global Step: 724920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:48,042-Speed 2625.51 samples/sec   Loss 1.8437   LearningRate 0.0016   Epoch: 17   Global Step: 724930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:52,051-Speed 2554.94 samples/sec   Loss 1.8206   LearningRate 0.0016   Epoch: 17   Global Step: 724940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:55,962-Speed 2618.89 samples/sec   Loss 1.8668   LearningRate 0.0016   Epoch: 17   Global Step: 724950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:12:59,866-Speed 2623.50 samples/sec   Loss 1.8323   LearningRate 0.0016   Epoch: 17   Global Step: 724960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:13:03,770-Speed 2623.60 samples/sec   Loss 1.8532   LearningRate 0.0016   Epoch: 17   Global Step: 724970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:13:07,664-Speed 2630.30 samples/sec   Loss 1.8779   LearningRate 0.0016   Epoch: 17   Global Step: 724980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:13:11,557-Speed 2631.18 samples/sec   Loss 1.8239   LearningRate 0.0016   Epoch: 17   Global Step: 724990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:13:15,441-Speed 2636.83 samples/sec   Loss 1.8887   LearningRate 0.0016   Epoch: 17   Global Step: 725000   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:19,352-Speed 2619.02 samples/sec   Loss 1.8262   LearningRate 0.0016   Epoch: 17   Global Step: 725010   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:23,251-Speed 2626.80 samples/sec   Loss 1.8394   LearningRate 0.0016   Epoch: 17   Global Step: 725020   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:27,149-Speed 2627.88 samples/sec   Loss 1.8465   LearningRate 0.0016   Epoch: 17   Global Step: 725030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:31,071-Speed 2611.44 samples/sec   Loss 1.8741   LearningRate 0.0016   Epoch: 17   Global Step: 725040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:35,013-Speed 2598.25 samples/sec   Loss 1.9304   LearningRate 0.0016   Epoch: 17   Global Step: 725050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:38,905-Speed 2631.59 samples/sec   Loss 1.8552   LearningRate 0.0016   Epoch: 17   Global Step: 725060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:42,808-Speed 2624.29 samples/sec   Loss 1.8433   LearningRate 0.0016   Epoch: 17   Global Step: 725070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:46,702-Speed 2630.59 samples/sec   Loss 1.8340   LearningRate 0.0016   Epoch: 17   Global Step: 725080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:50,600-Speed 2628.01 samples/sec   Loss 1.8722   LearningRate 0.0016   Epoch: 17   Global Step: 725090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:13:54,491-Speed 2631.65 samples/sec   Loss 1.8401   LearningRate 0.0016   Epoch: 17   Global Step: 725100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:13:58,385-Speed 2630.33 samples/sec   Loss 1.8373   LearningRate 0.0016   Epoch: 17   Global Step: 725110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:02,281-Speed 2628.81 samples/sec   Loss 1.8294   LearningRate 0.0016   Epoch: 17   Global Step: 725120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:06,176-Speed 2629.77 samples/sec   Loss 1.8456   LearningRate 0.0016   Epoch: 17   Global Step: 725130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:10,078-Speed 2624.82 samples/sec   Loss 1.8411   LearningRate 0.0016   Epoch: 17   Global Step: 725140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:13,976-Speed 2627.65 samples/sec   Loss 1.8802   LearningRate 0.0016   Epoch: 17   Global Step: 725150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:17,870-Speed 2630.43 samples/sec   Loss 1.8429   LearningRate 0.0016   Epoch: 17   Global Step: 725160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:21,765-Speed 2629.87 samples/sec   Loss 1.8455   LearningRate 0.0016   Epoch: 17   Global Step: 725170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:25,659-Speed 2630.17 samples/sec   Loss 1.8011   LearningRate 0.0016   Epoch: 17   Global Step: 725180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:29,581-Speed 2611.52 samples/sec   Loss 1.8578   LearningRate 0.0016   Epoch: 17   Global Step: 725190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:33,456-Speed 2643.03 samples/sec   Loss 1.8534   LearningRate 0.0016   Epoch: 17   Global Step: 725200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:37,356-Speed 2626.06 samples/sec   Loss 1.8271   LearningRate 0.0016   Epoch: 17   Global Step: 725210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:41,249-Speed 2630.80 samples/sec   Loss 1.7812   LearningRate 0.0016   Epoch: 17   Global Step: 725220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:45,146-Speed 2628.26 samples/sec   Loss 1.8123   LearningRate 0.0016   Epoch: 17   Global Step: 725230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:14:49,023-Speed 2642.49 samples/sec   Loss 1.8210   LearningRate 0.0016   Epoch: 17   Global Step: 725240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:14:52,921-Speed 2627.61 samples/sec   Loss 1.8004   LearningRate 0.0016   Epoch: 17   Global Step: 725250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:14:56,816-Speed 2629.49 samples/sec   Loss 1.7909   LearningRate 0.0016   Epoch: 17   Global Step: 725260   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:00,712-Speed 2629.28 samples/sec   Loss 1.8429   LearningRate 0.0016   Epoch: 17   Global Step: 725270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:04,622-Speed 2619.45 samples/sec   Loss 1.8316   LearningRate 0.0016   Epoch: 17   Global Step: 725280   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:08,518-Speed 2628.74 samples/sec   Loss 1.7860   LearningRate 0.0016   Epoch: 17   Global Step: 725290   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:12,426-Speed 2620.88 samples/sec   Loss 1.8414   LearningRate 0.0016   Epoch: 17   Global Step: 725300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:16,322-Speed 2628.87 samples/sec   Loss 1.8862   LearningRate 0.0016   Epoch: 17   Global Step: 725310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:20,218-Speed 2628.72 samples/sec   Loss 1.8960   LearningRate 0.0016   Epoch: 17   Global Step: 725320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:24,111-Speed 2630.84 samples/sec   Loss 1.8537   LearningRate 0.0016   Epoch: 17   Global Step: 725330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:15:28,008-Speed 2628.30 samples/sec   Loss 1.8881   LearningRate 0.0016   Epoch: 17   Global Step: 725340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:31,904-Speed 2629.44 samples/sec   Loss 1.8476   LearningRate 0.0016   Epoch: 17   Global Step: 725350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:35,810-Speed 2622.53 samples/sec   Loss 1.8236   LearningRate 0.0016   Epoch: 17   Global Step: 725360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:39,709-Speed 2626.49 samples/sec   Loss 1.8313   LearningRate 0.0016   Epoch: 17   Global Step: 725370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:43,603-Speed 2630.43 samples/sec   Loss 1.8534   LearningRate 0.0016   Epoch: 17   Global Step: 725380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:47,500-Speed 2628.19 samples/sec   Loss 1.8904   LearningRate 0.0016   Epoch: 17   Global Step: 725390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:51,466-Speed 2582.57 samples/sec   Loss 1.8101   LearningRate 0.0016   Epoch: 17   Global Step: 725400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:55,361-Speed 2629.58 samples/sec   Loss 1.8591   LearningRate 0.0016   Epoch: 17   Global Step: 725410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:15:59,286-Speed 2609.39 samples/sec   Loss 1.8108   LearningRate 0.0016   Epoch: 17   Global Step: 725420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:03,195-Speed 2620.35 samples/sec   Loss 1.8300   LearningRate 0.0016   Epoch: 17   Global Step: 725430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:07,065-Speed 2646.67 samples/sec   Loss 1.8634   LearningRate 0.0016   Epoch: 17   Global Step: 725440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:10,973-Speed 2620.62 samples/sec   Loss 1.8774   LearningRate 0.0016   Epoch: 17   Global Step: 725450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:14,867-Speed 2630.92 samples/sec   Loss 1.7369   LearningRate 0.0016   Epoch: 17   Global Step: 725460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:18,773-Speed 2621.57 samples/sec   Loss 1.7811   LearningRate 0.0016   Epoch: 17   Global Step: 725470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:22,676-Speed 2624.25 samples/sec   Loss 1.8675   LearningRate 0.0016   Epoch: 17   Global Step: 725480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:26,591-Speed 2616.40 samples/sec   Loss 1.8435   LearningRate 0.0016   Epoch: 17   Global Step: 725490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:30,485-Speed 2630.28 samples/sec   Loss 1.8216   LearningRate 0.0016   Epoch: 17   Global Step: 725500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:34,377-Speed 2631.83 samples/sec   Loss 1.7946   LearningRate 0.0016   Epoch: 17   Global Step: 725510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:38,271-Speed 2629.80 samples/sec   Loss 1.8315   LearningRate 0.0016   Epoch: 17   Global Step: 725520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:16:42,147-Speed 2642.80 samples/sec   Loss 1.7896   LearningRate 0.0016   Epoch: 17   Global Step: 725530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:16:46,066-Speed 2613.38 samples/sec   Loss 1.8326   LearningRate 0.0016   Epoch: 17   Global Step: 725540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:16:49,960-Speed 2630.56 samples/sec   Loss 1.8992   LearningRate 0.0016   Epoch: 17   Global Step: 725550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:16:53,859-Speed 2627.11 samples/sec   Loss 1.8709   LearningRate 0.0016   Epoch: 17   Global Step: 725560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:16:57,730-Speed 2645.76 samples/sec   Loss 1.8332   LearningRate 0.0016   Epoch: 17   Global Step: 725570   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:01,626-Speed 2629.32 samples/sec   Loss 1.7989   LearningRate 0.0016   Epoch: 17   Global Step: 725580   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:05,525-Speed 2626.99 samples/sec   Loss 1.8138   LearningRate 0.0016   Epoch: 17   Global Step: 725590   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:09,440-Speed 2616.11 samples/sec   Loss 1.8484   LearningRate 0.0016   Epoch: 17   Global Step: 725600   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:13,331-Speed 2632.19 samples/sec   Loss 1.8151   LearningRate 0.0016   Epoch: 17   Global Step: 725610   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:17,226-Speed 2628.89 samples/sec   Loss 1.8003   LearningRate 0.0016   Epoch: 17   Global Step: 725620   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:21,126-Speed 2627.07 samples/sec   Loss 1.7999   LearningRate 0.0016   Epoch: 17   Global Step: 725630   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:25,022-Speed 2629.12 samples/sec   Loss 1.8524   LearningRate 0.0016   Epoch: 17   Global Step: 725640   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:28,925-Speed 2624.00 samples/sec   Loss 1.8061   LearningRate 0.0016   Epoch: 17   Global Step: 725650   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:32,822-Speed 2628.51 samples/sec   Loss 1.8500   LearningRate 0.0016   Epoch: 17   Global Step: 725660   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-16 05:17:36,715-Speed 2631.59 samples/sec   Loss 1.8619   LearningRate 0.0016   Epoch: 17   Global Step: 725670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:17:40,615-Speed 2625.89 samples/sec   Loss 1.8315   LearningRate 0.0016   Epoch: 17   Global Step: 725680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:17:44,509-Speed 2630.09 samples/sec   Loss 1.7985   LearningRate 0.0016   Epoch: 17   Global Step: 725690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:17:48,416-Speed 2621.51 samples/sec   Loss 1.7819   LearningRate 0.0016   Epoch: 17   Global Step: 725700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:17:52,309-Speed 2631.31 samples/sec   Loss 1.8813   LearningRate 0.0016   Epoch: 17   Global Step: 725710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:17:56,236-Speed 2609.31 samples/sec   Loss 1.8180   LearningRate 0.0016   Epoch: 17   Global Step: 725720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:00,135-Speed 2626.71 samples/sec   Loss 1.8328   LearningRate 0.0016   Epoch: 17   Global Step: 725730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:04,032-Speed 2628.92 samples/sec   Loss 1.9027   LearningRate 0.0016   Epoch: 17   Global Step: 725740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:07,925-Speed 2630.61 samples/sec   Loss 1.8386   LearningRate 0.0016   Epoch: 17   Global Step: 725750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:11,832-Speed 2621.55 samples/sec   Loss 1.8467   LearningRate 0.0016   Epoch: 17   Global Step: 725760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:15,731-Speed 2626.81 samples/sec   Loss 1.7936   LearningRate 0.0016   Epoch: 17   Global Step: 725770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:18:19,628-Speed 2628.49 samples/sec   Loss 1.8118   LearningRate 0.0016   Epoch: 17   Global Step: 725780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:18:23,529-Speed 2625.23 samples/sec   Loss 1.8160   LearningRate 0.0016   Epoch: 17   Global Step: 725790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:18:27,404-Speed 2643.72 samples/sec   Loss 1.8371   LearningRate 0.0016   Epoch: 17   Global Step: 725800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:31,302-Speed 2627.61 samples/sec   Loss 1.8475   LearningRate 0.0016   Epoch: 17   Global Step: 725810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:35,234-Speed 2604.93 samples/sec   Loss 1.8084   LearningRate 0.0016   Epoch: 17   Global Step: 725820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:39,129-Speed 2629.79 samples/sec   Loss 1.8488   LearningRate 0.0016   Epoch: 17   Global Step: 725830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:43,024-Speed 2629.50 samples/sec   Loss 1.8649   LearningRate 0.0016   Epoch: 17   Global Step: 725840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:46,923-Speed 2627.27 samples/sec   Loss 1.8605   LearningRate 0.0016   Epoch: 17   Global Step: 725850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:50,808-Speed 2636.24 samples/sec   Loss 1.8020   LearningRate 0.0016   Epoch: 17   Global Step: 725860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:54,705-Speed 2628.50 samples/sec   Loss 1.7962   LearningRate 0.0016   Epoch: 17   Global Step: 725870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:18:58,598-Speed 2630.94 samples/sec   Loss 1.8342   LearningRate 0.0016   Epoch: 17   Global Step: 725880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:02,489-Speed 2632.05 samples/sec   Loss 1.8629   LearningRate 0.0016   Epoch: 17   Global Step: 725890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:06,362-Speed 2644.15 samples/sec   Loss 1.8522   LearningRate 0.0016   Epoch: 17   Global Step: 725900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:10,255-Speed 2631.11 samples/sec   Loss 1.8690   LearningRate 0.0016   Epoch: 17   Global Step: 725910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:14,153-Speed 2628.31 samples/sec   Loss 1.8455   LearningRate 0.0016   Epoch: 17   Global Step: 725920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:18,043-Speed 2632.72 samples/sec   Loss 1.8170   LearningRate 0.0016   Epoch: 17   Global Step: 725930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:21,938-Speed 2629.65 samples/sec   Loss 1.8687   LearningRate 0.0016   Epoch: 17   Global Step: 725940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:25,841-Speed 2624.47 samples/sec   Loss 1.8221   LearningRate 0.0016   Epoch: 17   Global Step: 725950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:29,739-Speed 2627.58 samples/sec   Loss 1.7891   LearningRate 0.0016   Epoch: 17   Global Step: 725960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:33,640-Speed 2625.12 samples/sec   Loss 1.8217   LearningRate 0.0016   Epoch: 17   Global Step: 725970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:37,552-Speed 2618.18 samples/sec   Loss 1.8472   LearningRate 0.0016   Epoch: 17   Global Step: 725980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:41,469-Speed 2614.69 samples/sec   Loss 1.8025   LearningRate 0.0016   Epoch: 17   Global Step: 725990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:19:45,372-Speed 2624.41 samples/sec   Loss 1.8851   LearningRate 0.0016   Epoch: 17   Global Step: 726000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:19:49,268-Speed 2629.27 samples/sec   Loss 1.8769   LearningRate 0.0016   Epoch: 17   Global Step: 726010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:19:53,172-Speed 2624.08 samples/sec   Loss 1.7887   LearningRate 0.0016   Epoch: 17   Global Step: 726020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:19:57,080-Speed 2620.75 samples/sec   Loss 1.8220   LearningRate 0.0016   Epoch: 17   Global Step: 726030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:00,983-Speed 2624.44 samples/sec   Loss 1.8883   LearningRate 0.0016   Epoch: 17   Global Step: 726040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:04,930-Speed 2594.48 samples/sec   Loss 1.7724   LearningRate 0.0016   Epoch: 17   Global Step: 726050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:08,993-Speed 2521.16 samples/sec   Loss 1.8106   LearningRate 0.0016   Epoch: 17   Global Step: 726060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:12,931-Speed 2600.98 samples/sec   Loss 1.8183   LearningRate 0.0016   Epoch: 17   Global Step: 726070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:16,827-Speed 2628.84 samples/sec   Loss 1.8064   LearningRate 0.0016   Epoch: 17   Global Step: 726080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:20,727-Speed 2626.66 samples/sec   Loss 1.8378   LearningRate 0.0016   Epoch: 17   Global Step: 726090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:24,631-Speed 2623.90 samples/sec   Loss 1.8293   LearningRate 0.0016   Epoch: 17   Global Step: 726100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 05:20:28,577-Speed 2595.46 samples/sec   Loss 1.8322   LearningRate 0.0016   Epoch: 17   Global Step: 726110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:32,473-Speed 2629.11 samples/sec   Loss 1.8854   LearningRate 0.0016   Epoch: 17   Global Step: 726120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:36,366-Speed 2630.78 samples/sec   Loss 1.7523   LearningRate 0.0016   Epoch: 17   Global Step: 726130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:40,270-Speed 2623.31 samples/sec   Loss 1.8305   LearningRate 0.0016   Epoch: 17   Global Step: 726140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:44,176-Speed 2622.26 samples/sec   Loss 1.8955   LearningRate 0.0016   Epoch: 17   Global Step: 726150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:20:48,061-Speed 2636.12 samples/sec   Loss 1.8509   LearningRate 0.0016   Epoch: 17   Global Step: 726160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:20:51,967-Speed 2622.48 samples/sec   Loss 1.8486   LearningRate 0.0016   Epoch: 17   Global Step: 726170   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:20:55,860-Speed 2631.53 samples/sec   Loss 1.7813   LearningRate 0.0016   Epoch: 17   Global Step: 726180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:20:59,749-Speed 2633.92 samples/sec   Loss 1.7711   LearningRate 0.0016   Epoch: 17   Global Step: 726190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:03,641-Speed 2631.17 samples/sec   Loss 1.7745   LearningRate 0.0016   Epoch: 17   Global Step: 726200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:07,534-Speed 2631.47 samples/sec   Loss 1.7430   LearningRate 0.0016   Epoch: 17   Global Step: 726210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:11,447-Speed 2617.00 samples/sec   Loss 1.7691   LearningRate 0.0016   Epoch: 17   Global Step: 726220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:15,384-Speed 2601.79 samples/sec   Loss 1.8949   LearningRate 0.0016   Epoch: 17   Global Step: 726230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:19,286-Speed 2625.24 samples/sec   Loss 1.8169   LearningRate 0.0016   Epoch: 17   Global Step: 726240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:23,179-Speed 2631.34 samples/sec   Loss 1.8291   LearningRate 0.0016   Epoch: 17   Global Step: 726250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:27,075-Speed 2628.93 samples/sec   Loss 1.8005   LearningRate 0.0016   Epoch: 17   Global Step: 726260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:21:30,971-Speed 2630.24 samples/sec   Loss 1.8504   LearningRate 0.0016   Epoch: 17   Global Step: 726270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:21:34,864-Speed 2630.71 samples/sec   Loss 1.8809   LearningRate 0.0016   Epoch: 17   Global Step: 726280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:21:38,893-Speed 2542.27 samples/sec   Loss 1.7389   LearningRate 0.0016   Epoch: 17   Global Step: 726290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:21:42,982-Speed 2504.36 samples/sec   Loss 1.8399   LearningRate 0.0015   Epoch: 17   Global Step: 726300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:21:46,914-Speed 2605.10 samples/sec   Loss 1.7702   LearningRate 0.0015   Epoch: 17   Global Step: 726310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:50,824-Speed 2619.62 samples/sec   Loss 1.8501   LearningRate 0.0015   Epoch: 17   Global Step: 726320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:54,735-Speed 2619.39 samples/sec   Loss 1.8053   LearningRate 0.0015   Epoch: 17   Global Step: 726330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:21:58,633-Speed 2627.58 samples/sec   Loss 1.8119   LearningRate 0.0015   Epoch: 17   Global Step: 726340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:22:02,533-Speed 2626.64 samples/sec   Loss 1.8620   LearningRate 0.0015   Epoch: 17   Global Step: 726350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:22:06,430-Speed 2627.94 samples/sec   Loss 1.8514   LearningRate 0.0015   Epoch: 17   Global Step: 726360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:22:10,331-Speed 2625.47 samples/sec   Loss 1.8053   LearningRate 0.0015   Epoch: 17   Global Step: 726370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:22:14,225-Speed 2629.99 samples/sec   Loss 1.8472   LearningRate 0.0015   Epoch: 17   Global Step: 726380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:22:18,125-Speed 2626.62 samples/sec   Loss 1.8348   LearningRate 0.0015   Epoch: 17   Global Step: 726390   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:22:22,049-Speed 2611.41 samples/sec   Loss 1.8211   LearningRate 0.0015   Epoch: 17   Global Step: 726400   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:22:25,953-Speed 2623.44 samples/sec   Loss 1.9162   LearningRate 0.0015   Epoch: 17   Global Step: 726410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:29,844-Speed 2632.52 samples/sec   Loss 1.8507   LearningRate 0.0015   Epoch: 17   Global Step: 726420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:33,739-Speed 2629.55 samples/sec   Loss 1.8523   LearningRate 0.0015   Epoch: 17   Global Step: 726430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:37,637-Speed 2627.74 samples/sec   Loss 1.8494   LearningRate 0.0015   Epoch: 17   Global Step: 726440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:41,539-Speed 2624.56 samples/sec   Loss 1.8295   LearningRate 0.0015   Epoch: 17   Global Step: 726450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:45,442-Speed 2624.44 samples/sec   Loss 1.8297   LearningRate 0.0015   Epoch: 17   Global Step: 726460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:49,333-Speed 2632.13 samples/sec   Loss 1.8096   LearningRate 0.0015   Epoch: 17   Global Step: 726470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:53,229-Speed 2629.08 samples/sec   Loss 1.7884   LearningRate 0.0015   Epoch: 17   Global Step: 726480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:22:57,135-Speed 2622.44 samples/sec   Loss 1.8863   LearningRate 0.0015   Epoch: 17   Global Step: 726490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:23:01,013-Speed 2641.25 samples/sec   Loss 1.7944   LearningRate 0.0015   Epoch: 17   Global Step: 726500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:04,922-Speed 2620.68 samples/sec   Loss 1.7718   LearningRate 0.0015   Epoch: 17   Global Step: 726510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:08,818-Speed 2628.90 samples/sec   Loss 1.8301   LearningRate 0.0015   Epoch: 17   Global Step: 726520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:12,709-Speed 2632.09 samples/sec   Loss 1.7866   LearningRate 0.0015   Epoch: 17   Global Step: 726530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:16,606-Speed 2628.45 samples/sec   Loss 1.8343   LearningRate 0.0015   Epoch: 17   Global Step: 726540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:20,497-Speed 2633.13 samples/sec   Loss 1.8600   LearningRate 0.0015   Epoch: 17   Global Step: 726550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:24,393-Speed 2628.62 samples/sec   Loss 1.8269   LearningRate 0.0015   Epoch: 17   Global Step: 726560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:28,307-Speed 2617.28 samples/sec   Loss 1.8087   LearningRate 0.0015   Epoch: 17   Global Step: 726570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:32,203-Speed 2628.37 samples/sec   Loss 1.7622   LearningRate 0.0015   Epoch: 17   Global Step: 726580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:36,106-Speed 2624.68 samples/sec   Loss 1.8299   LearningRate 0.0015   Epoch: 17   Global Step: 726590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:23:40,000-Speed 2630.29 samples/sec   Loss 1.8256   LearningRate 0.0015   Epoch: 17   Global Step: 726600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:23:43,897-Speed 2628.54 samples/sec   Loss 1.7823   LearningRate 0.0015   Epoch: 17   Global Step: 726610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:23:47,793-Speed 2628.32 samples/sec   Loss 1.8339   LearningRate 0.0015   Epoch: 17   Global Step: 726620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:23:51,694-Speed 2625.96 samples/sec   Loss 1.8200   LearningRate 0.0015   Epoch: 17   Global Step: 726630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:23:55,620-Speed 2608.88 samples/sec   Loss 1.7846   LearningRate 0.0015   Epoch: 17   Global Step: 726640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:23:59,515-Speed 2629.69 samples/sec   Loss 1.8134   LearningRate 0.0015   Epoch: 17   Global Step: 726650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:24:03,421-Speed 2621.89 samples/sec   Loss 1.8000   LearningRate 0.0015   Epoch: 17   Global Step: 726660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:24:07,337-Speed 2616.40 samples/sec   Loss 1.7689   LearningRate 0.0015   Epoch: 17   Global Step: 726670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:24:11,242-Speed 2622.64 samples/sec   Loss 1.8172   LearningRate 0.0015   Epoch: 17   Global Step: 726680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:24:15,144-Speed 2624.78 samples/sec   Loss 1.8101   LearningRate 0.0015   Epoch: 17   Global Step: 726690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:24:19,045-Speed 2625.73 samples/sec   Loss 1.8194   LearningRate 0.0015   Epoch: 17   Global Step: 726700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 05:24:22,891-Speed 2662.94 samples/sec   Loss 1.7889   LearningRate 0.0015   Epoch: 17   Global Step: 726710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:26,784-Speed 2631.35 samples/sec   Loss 1.8503   LearningRate 0.0015   Epoch: 17   Global Step: 726720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:30,679-Speed 2629.48 samples/sec   Loss 1.7956   LearningRate 0.0015   Epoch: 17   Global Step: 726730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:34,574-Speed 2629.37 samples/sec   Loss 1.8308   LearningRate 0.0015   Epoch: 17   Global Step: 726740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:38,475-Speed 2625.95 samples/sec   Loss 1.8453   LearningRate 0.0015   Epoch: 17   Global Step: 726750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:42,370-Speed 2630.35 samples/sec   Loss 1.7840   LearningRate 0.0015   Epoch: 17   Global Step: 726760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:46,263-Speed 2630.45 samples/sec   Loss 1.8433   LearningRate 0.0015   Epoch: 17   Global Step: 726770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:50,200-Speed 2601.72 samples/sec   Loss 1.8403   LearningRate 0.0015   Epoch: 17   Global Step: 726780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:54,093-Speed 2631.36 samples/sec   Loss 1.8182   LearningRate 0.0015   Epoch: 17   Global Step: 726790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:24:58,033-Speed 2600.05 samples/sec   Loss 1.8299   LearningRate 0.0015   Epoch: 17   Global Step: 726800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:01,949-Speed 2615.23 samples/sec   Loss 1.8547   LearningRate 0.0015   Epoch: 17   Global Step: 726810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:25:05,844-Speed 2629.74 samples/sec   Loss 1.7953   LearningRate 0.0015   Epoch: 17   Global Step: 726820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:25:09,761-Speed 2614.91 samples/sec   Loss 1.8801   LearningRate 0.0015   Epoch: 17   Global Step: 726830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:25:13,654-Speed 2631.09 samples/sec   Loss 1.8150   LearningRate 0.0015   Epoch: 17   Global Step: 726840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:25:17,526-Speed 2645.61 samples/sec   Loss 1.7686   LearningRate 0.0015   Epoch: 17   Global Step: 726850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:21,430-Speed 2623.31 samples/sec   Loss 1.8341   LearningRate 0.0015   Epoch: 17   Global Step: 726860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:25,323-Speed 2631.67 samples/sec   Loss 1.8249   LearningRate 0.0015   Epoch: 17   Global Step: 726870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:29,212-Speed 2633.79 samples/sec   Loss 1.8829   LearningRate 0.0015   Epoch: 17   Global Step: 726880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:33,107-Speed 2630.04 samples/sec   Loss 1.8721   LearningRate 0.0015   Epoch: 17   Global Step: 726890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:37,029-Speed 2610.96 samples/sec   Loss 1.8007   LearningRate 0.0015   Epoch: 17   Global Step: 726900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:40,923-Speed 2630.55 samples/sec   Loss 1.8657   LearningRate 0.0015   Epoch: 17   Global Step: 726910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:44,819-Speed 2628.80 samples/sec   Loss 1.8904   LearningRate 0.0015   Epoch: 17   Global Step: 726920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:48,715-Speed 2628.83 samples/sec   Loss 1.9009   LearningRate 0.0015   Epoch: 17   Global Step: 726930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:52,634-Speed 2614.37 samples/sec   Loss 1.7176   LearningRate 0.0015   Epoch: 17   Global Step: 726940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:25:56,531-Speed 2628.15 samples/sec   Loss 1.8122   LearningRate 0.0015   Epoch: 17   Global Step: 726950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:00,468-Speed 2602.48 samples/sec   Loss 1.8353   LearningRate 0.0015   Epoch: 17   Global Step: 726960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:04,362-Speed 2630.03 samples/sec   Loss 1.8214   LearningRate 0.0015   Epoch: 17   Global Step: 726970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:08,253-Speed 2632.48 samples/sec   Loss 1.8093   LearningRate 0.0015   Epoch: 17   Global Step: 726980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:12,149-Speed 2629.09 samples/sec   Loss 1.7722   LearningRate 0.0015   Epoch: 17   Global Step: 726990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:16,072-Speed 2610.86 samples/sec   Loss 1.8395   LearningRate 0.0015   Epoch: 17   Global Step: 727000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:20,075-Speed 2559.08 samples/sec   Loss 1.7990   LearningRate 0.0015   Epoch: 17   Global Step: 727010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:23,983-Speed 2620.97 samples/sec   Loss 1.8868   LearningRate 0.0015   Epoch: 17   Global Step: 727020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:27,884-Speed 2625.45 samples/sec   Loss 1.7825   LearningRate 0.0015   Epoch: 17   Global Step: 727030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:31,794-Speed 2620.25 samples/sec   Loss 1.7958   LearningRate 0.0015   Epoch: 17   Global Step: 727040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:26:35,686-Speed 2631.42 samples/sec   Loss 1.8143   LearningRate 0.0015   Epoch: 17   Global Step: 727050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-16 05:26:39,534-Speed 2661.87 samples/sec   Loss 1.8477   LearningRate 0.0015   Epoch: 17   Global Step: 727060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:26:43,437-Speed 2623.90 samples/sec   Loss 1.7830   LearningRate 0.0015   Epoch: 17   Global Step: 727070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:26:47,334-Speed 2628.73 samples/sec   Loss 1.8358   LearningRate 0.0015   Epoch: 17   Global Step: 727080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:26:51,232-Speed 2627.50 samples/sec   Loss 1.8147   LearningRate 0.0015   Epoch: 17   Global Step: 727090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:26:55,138-Speed 2622.90 samples/sec   Loss 1.8521   LearningRate 0.0015   Epoch: 17   Global Step: 727100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:26:59,046-Speed 2621.20 samples/sec   Loss 1.8035   LearningRate 0.0015   Epoch: 17   Global Step: 727110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:27:02,959-Speed 2617.58 samples/sec   Loss 1.7747   LearningRate 0.0015   Epoch: 17   Global Step: 727120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:27:06,862-Speed 2624.21 samples/sec   Loss 1.8224   LearningRate 0.0015   Epoch: 17   Global Step: 727130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:27:10,770-Speed 2621.02 samples/sec   Loss 1.8469   LearningRate 0.0015   Epoch: 17   Global Step: 727140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:27:14,667-Speed 2628.26 samples/sec   Loss 1.7511   LearningRate 0.0015   Epoch: 17   Global Step: 727150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:27:18,565-Speed 2627.68 samples/sec   Loss 1.8041   LearningRate 0.0015   Epoch: 17   Global Step: 727160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:22,462-Speed 2629.32 samples/sec   Loss 1.8726   LearningRate 0.0015   Epoch: 17   Global Step: 727170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:26,360-Speed 2627.35 samples/sec   Loss 1.8619   LearningRate 0.0015   Epoch: 17   Global Step: 727180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:30,271-Speed 2618.99 samples/sec   Loss 1.8009   LearningRate 0.0015   Epoch: 17   Global Step: 727190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:34,178-Speed 2621.68 samples/sec   Loss 1.7904   LearningRate 0.0015   Epoch: 17   Global Step: 727200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:38,090-Speed 2617.88 samples/sec   Loss 1.8300   LearningRate 0.0015   Epoch: 17   Global Step: 727210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:41,992-Speed 2624.93 samples/sec   Loss 1.8677   LearningRate 0.0015   Epoch: 17   Global Step: 727220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:45,911-Speed 2614.20 samples/sec   Loss 1.8759   LearningRate 0.0015   Epoch: 17   Global Step: 727230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-16 05:27:49,798-Speed 2634.70 samples/sec   Loss 1.8864   LearningRate 0.0015   Epoch: 17   Global Step: 727240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:27:53,690-Speed 2632.22 samples/sec   Loss 1.8597   LearningRate 0.0015   Epoch: 17   Global Step: 727250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-16 05:27:57,589-Speed 2626.84 samples/sec   Loss 1.8120   LearningRate 0.0015   Epoch: 17   Global Step: 727260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:01,488-Speed 2627.92 samples/sec   Loss 1.7854   LearningRate 0.0015   Epoch: 17   Global Step: 727270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:05,383-Speed 2629.52 samples/sec   Loss 1.8377   LearningRate 0.0015   Epoch: 17   Global Step: 727280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:09,281-Speed 2627.37 samples/sec   Loss 1.8231   LearningRate 0.0015   Epoch: 17   Global Step: 727290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:13,180-Speed 2627.02 samples/sec   Loss 1.7968   LearningRate 0.0015   Epoch: 17   Global Step: 727300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:17,078-Speed 2627.82 samples/sec   Loss 1.8483   LearningRate 0.0015   Epoch: 17   Global Step: 727310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:20,979-Speed 2625.83 samples/sec   Loss 1.8797   LearningRate 0.0015   Epoch: 17   Global Step: 727320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:24,870-Speed 2632.51 samples/sec   Loss 1.8165   LearningRate 0.0015   Epoch: 17   Global Step: 727330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:28,770-Speed 2626.50 samples/sec   Loss 1.8547   LearningRate 0.0015   Epoch: 17   Global Step: 727340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:28:32,666-Speed 2628.96 samples/sec   Loss 1.8126   LearningRate 0.0015   Epoch: 17   Global Step: 727350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:28:36,562-Speed 2629.33 samples/sec   Loss 1.7601   LearningRate 0.0015   Epoch: 17   Global Step: 727360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:28:40,465-Speed 2624.17 samples/sec   Loss 1.8408   LearningRate 0.0015   Epoch: 17   Global Step: 727370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:28:44,413-Speed 2594.61 samples/sec   Loss 1.7954   LearningRate 0.0015   Epoch: 17   Global Step: 727380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:28:48,308-Speed 2629.17 samples/sec   Loss 1.8133   LearningRate 0.0015   Epoch: 17   Global Step: 727390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:28:52,184-Speed 2643.24 samples/sec   Loss 1.8003   LearningRate 0.0015   Epoch: 17   Global Step: 727400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:56,075-Speed 2632.09 samples/sec   Loss 1.7969   LearningRate 0.0015   Epoch: 17   Global Step: 727410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:28:59,968-Speed 2631.61 samples/sec   Loss 1.7935   LearningRate 0.0015   Epoch: 17   Global Step: 727420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:03,860-Speed 2631.19 samples/sec   Loss 1.8661   LearningRate 0.0015   Epoch: 17   Global Step: 727430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:07,762-Speed 2625.27 samples/sec   Loss 1.8100   LearningRate 0.0015   Epoch: 17   Global Step: 727440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:11,682-Speed 2612.89 samples/sec   Loss 1.8604   LearningRate 0.0015   Epoch: 17   Global Step: 727450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:15,574-Speed 2631.40 samples/sec   Loss 1.7971   LearningRate 0.0015   Epoch: 17   Global Step: 727460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:19,474-Speed 2626.70 samples/sec   Loss 1.7795   LearningRate 0.0015   Epoch: 17   Global Step: 727470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:23,380-Speed 2622.34 samples/sec   Loss 1.7599   LearningRate 0.0015   Epoch: 17   Global Step: 727480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:27,290-Speed 2619.49 samples/sec   Loss 1.8107   LearningRate 0.0015   Epoch: 17   Global Step: 727490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:31,188-Speed 2627.73 samples/sec   Loss 1.8228   LearningRate 0.0015   Epoch: 17   Global Step: 727500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:29:35,117-Speed 2607.00 samples/sec   Loss 1.8127   LearningRate 0.0015   Epoch: 17   Global Step: 727510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:29:39,041-Speed 2610.56 samples/sec   Loss 1.8357   LearningRate 0.0015   Epoch: 17   Global Step: 727520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:29:43,043-Speed 2558.92 samples/sec   Loss 1.8355   LearningRate 0.0015   Epoch: 17   Global Step: 727530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:46,946-Speed 2624.43 samples/sec   Loss 1.8248   LearningRate 0.0015   Epoch: 17   Global Step: 727540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:50,853-Speed 2621.86 samples/sec   Loss 1.8545   LearningRate 0.0015   Epoch: 17   Global Step: 727550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:29:54,740-Speed 2634.72 samples/sec   Loss 1.7605   LearningRate 0.0015   Epoch: 17   Global Step: 727560   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:29:58,745-Speed 2557.89 samples/sec   Loss 1.7250   LearningRate 0.0015   Epoch: 17   Global Step: 727570   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:02,649-Speed 2623.81 samples/sec   Loss 1.8001   LearningRate 0.0015   Epoch: 17   Global Step: 727580   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:06,555-Speed 2622.62 samples/sec   Loss 1.7704   LearningRate 0.0015   Epoch: 17   Global Step: 727590   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:10,451-Speed 2628.59 samples/sec   Loss 1.7880   LearningRate 0.0015   Epoch: 17   Global Step: 727600   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:14,349-Speed 2627.54 samples/sec   Loss 1.7994   LearningRate 0.0015   Epoch: 17   Global Step: 727610   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:18,243-Speed 2630.45 samples/sec   Loss 1.7506   LearningRate 0.0015   Epoch: 17   Global Step: 727620   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:22,135-Speed 2632.26 samples/sec   Loss 1.8399   LearningRate 0.0015   Epoch: 17   Global Step: 727630   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:26,028-Speed 2630.80 samples/sec   Loss 1.8346   LearningRate 0.0015   Epoch: 17   Global Step: 727640   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:29,931-Speed 2623.90 samples/sec   Loss 1.7872   LearningRate 0.0015   Epoch: 17   Global Step: 727650   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:30:33,823-Speed 2632.37 samples/sec   Loss 1.7919   LearningRate 0.0015   Epoch: 17   Global Step: 727660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:30:37,716-Speed 2631.52 samples/sec   Loss 1.8041   LearningRate 0.0015   Epoch: 17   Global Step: 727670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:30:41,612-Speed 2628.98 samples/sec   Loss 1.8746   LearningRate 0.0015   Epoch: 17   Global Step: 727680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:30:45,504-Speed 2631.20 samples/sec   Loss 1.7764   LearningRate 0.0015   Epoch: 17   Global Step: 727690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:30:49,407-Speed 2624.46 samples/sec   Loss 1.7878   LearningRate 0.0015   Epoch: 17   Global Step: 727700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:30:53,319-Speed 2618.43 samples/sec   Loss 1.7986   LearningRate 0.0015   Epoch: 17   Global Step: 727710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:30:57,221-Speed 2625.17 samples/sec   Loss 1.8102   LearningRate 0.0015   Epoch: 17   Global Step: 727720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:01,113-Speed 2631.53 samples/sec   Loss 1.8604   LearningRate 0.0015   Epoch: 17   Global Step: 727730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:05,008-Speed 2628.99 samples/sec   Loss 1.8465   LearningRate 0.0015   Epoch: 17   Global Step: 727740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:08,900-Speed 2632.16 samples/sec   Loss 1.7816   LearningRate 0.0015   Epoch: 17   Global Step: 727750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:12,799-Speed 2627.23 samples/sec   Loss 1.7926   LearningRate 0.0015   Epoch: 17   Global Step: 727760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:31:16,708-Speed 2620.25 samples/sec   Loss 1.7825   LearningRate 0.0015   Epoch: 17   Global Step: 727770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:31:20,609-Speed 2625.27 samples/sec   Loss 1.7788   LearningRate 0.0015   Epoch: 17   Global Step: 727780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:31:24,507-Speed 2627.52 samples/sec   Loss 1.7995   LearningRate 0.0015   Epoch: 17   Global Step: 727790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:31:28,416-Speed 2620.84 samples/sec   Loss 1.7751   LearningRate 0.0015   Epoch: 17   Global Step: 727800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:31:32,308-Speed 2631.65 samples/sec   Loss 1.8803   LearningRate 0.0015   Epoch: 17   Global Step: 727810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:36,210-Speed 2624.62 samples/sec   Loss 1.8610   LearningRate 0.0015   Epoch: 17   Global Step: 727820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:40,106-Speed 2628.28 samples/sec   Loss 1.7984   LearningRate 0.0015   Epoch: 17   Global Step: 727830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:44,018-Speed 2619.13 samples/sec   Loss 1.8362   LearningRate 0.0015   Epoch: 17   Global Step: 727840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:48,016-Speed 2562.36 samples/sec   Loss 1.8466   LearningRate 0.0015   Epoch: 17   Global Step: 727850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:51,972-Speed 2589.35 samples/sec   Loss 1.8427   LearningRate 0.0015   Epoch: 17   Global Step: 727860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:55,882-Speed 2619.84 samples/sec   Loss 1.7770   LearningRate 0.0015   Epoch: 17   Global Step: 727870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:31:59,790-Speed 2621.10 samples/sec   Loss 1.7889   LearningRate 0.0015   Epoch: 17   Global Step: 727880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:32:03,760-Speed 2580.07 samples/sec   Loss 1.7989   LearningRate 0.0015   Epoch: 17   Global Step: 727890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:32:07,671-Speed 2618.79 samples/sec   Loss 1.7845   LearningRate 0.0015   Epoch: 17   Global Step: 727900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:32:11,570-Speed 2627.34 samples/sec   Loss 1.8610   LearningRate 0.0015   Epoch: 17   Global Step: 727910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:15,468-Speed 2627.87 samples/sec   Loss 1.7620   LearningRate 0.0015   Epoch: 17   Global Step: 727920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:19,374-Speed 2622.00 samples/sec   Loss 1.7970   LearningRate 0.0015   Epoch: 17   Global Step: 727930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:23,294-Speed 2613.25 samples/sec   Loss 1.7304   LearningRate 0.0015   Epoch: 17   Global Step: 727940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:27,202-Speed 2621.02 samples/sec   Loss 1.7899   LearningRate 0.0015   Epoch: 17   Global Step: 727950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:31,098-Speed 2628.88 samples/sec   Loss 1.7748   LearningRate 0.0015   Epoch: 17   Global Step: 727960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:34,994-Speed 2629.21 samples/sec   Loss 1.7857   LearningRate 0.0015   Epoch: 17   Global Step: 727970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:38,897-Speed 2624.05 samples/sec   Loss 1.8223   LearningRate 0.0015   Epoch: 17   Global Step: 727980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:42,792-Speed 2629.96 samples/sec   Loss 1.8584   LearningRate 0.0015   Epoch: 17   Global Step: 727990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:46,687-Speed 2629.21 samples/sec   Loss 1.8144   LearningRate 0.0015   Epoch: 17   Global Step: 728000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:50,559-Speed 2645.17 samples/sec   Loss 1.7720   LearningRate 0.0015   Epoch: 17   Global Step: 728010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:54,460-Speed 2625.75 samples/sec   Loss 1.8184   LearningRate 0.0015   Epoch: 17   Global Step: 728020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:32:58,353-Speed 2631.57 samples/sec   Loss 1.8144   LearningRate 0.0015   Epoch: 17   Global Step: 728030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:02,245-Speed 2631.84 samples/sec   Loss 1.7865   LearningRate 0.0015   Epoch: 17   Global Step: 728040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:06,140-Speed 2629.35 samples/sec   Loss 1.8045   LearningRate 0.0015   Epoch: 17   Global Step: 728050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:10,035-Speed 2629.56 samples/sec   Loss 1.8095   LearningRate 0.0015   Epoch: 17   Global Step: 728060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:13,933-Speed 2627.39 samples/sec   Loss 1.8188   LearningRate 0.0015   Epoch: 17   Global Step: 728070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:17,869-Speed 2602.83 samples/sec   Loss 1.7928   LearningRate 0.0015   Epoch: 17   Global Step: 728080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:21,766-Speed 2628.59 samples/sec   Loss 1.7229   LearningRate 0.0015   Epoch: 17   Global Step: 728090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:25,704-Speed 2601.58 samples/sec   Loss 1.7814   LearningRate 0.0015   Epoch: 17   Global Step: 728100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:33:29,575-Speed 2646.10 samples/sec   Loss 1.8208   LearningRate 0.0015   Epoch: 17   Global Step: 728110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:33:33,489-Speed 2616.73 samples/sec   Loss 1.8586   LearningRate 0.0015   Epoch: 17   Global Step: 728120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:33:37,389-Speed 2626.18 samples/sec   Loss 1.7633   LearningRate 0.0015   Epoch: 17   Global Step: 728130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:33:41,286-Speed 2628.69 samples/sec   Loss 1.7895   LearningRate 0.0015   Epoch: 17   Global Step: 728140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:33:45,178-Speed 2631.36 samples/sec   Loss 1.7907   LearningRate 0.0015   Epoch: 17   Global Step: 728150   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:33:49,075-Speed 2628.69 samples/sec   Loss 1.7935   LearningRate 0.0015   Epoch: 17   Global Step: 728160   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:33:52,991-Speed 2615.26 samples/sec   Loss 1.7873   LearningRate 0.0015   Epoch: 17   Global Step: 728170   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:33:56,897-Speed 2630.63 samples/sec   Loss 1.8297   LearningRate 0.0015   Epoch: 17   Global Step: 728180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:00,804-Speed 2621.52 samples/sec   Loss 1.7874   LearningRate 0.0015   Epoch: 17   Global Step: 728190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:04,701-Speed 2628.31 samples/sec   Loss 1.8185   LearningRate 0.0015   Epoch: 17   Global Step: 728200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:08,615-Speed 2616.94 samples/sec   Loss 1.7911   LearningRate 0.0015   Epoch: 17   Global Step: 728210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:34:12,516-Speed 2625.67 samples/sec   Loss 1.7689   LearningRate 0.0015   Epoch: 17   Global Step: 728220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:34:16,425-Speed 2619.99 samples/sec   Loss 1.8723   LearningRate 0.0015   Epoch: 17   Global Step: 728230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:34:20,319-Speed 2630.22 samples/sec   Loss 1.7747   LearningRate 0.0015   Epoch: 17   Global Step: 728240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:34:24,210-Speed 2633.32 samples/sec   Loss 1.7607   LearningRate 0.0015   Epoch: 17   Global Step: 728250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:34:28,106-Speed 2628.95 samples/sec   Loss 1.7439   LearningRate 0.0015   Epoch: 17   Global Step: 728260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:34:32,019-Speed 2618.17 samples/sec   Loss 1.7859   LearningRate 0.0015   Epoch: 17   Global Step: 728270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:35,917-Speed 2627.92 samples/sec   Loss 1.8242   LearningRate 0.0015   Epoch: 17   Global Step: 728280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:39,823-Speed 2621.54 samples/sec   Loss 1.7883   LearningRate 0.0015   Epoch: 17   Global Step: 728290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:43,722-Speed 2627.18 samples/sec   Loss 1.8035   LearningRate 0.0015   Epoch: 17   Global Step: 728300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:47,622-Speed 2627.09 samples/sec   Loss 1.7704   LearningRate 0.0015   Epoch: 17   Global Step: 728310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:51,534-Speed 2617.90 samples/sec   Loss 1.8252   LearningRate 0.0015   Epoch: 17   Global Step: 728320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:55,430-Speed 2629.00 samples/sec   Loss 1.8483   LearningRate 0.0015   Epoch: 17   Global Step: 728330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:34:59,340-Speed 2619.58 samples/sec   Loss 1.8017   LearningRate 0.0015   Epoch: 17   Global Step: 728340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:03,236-Speed 2629.11 samples/sec   Loss 1.7837   LearningRate 0.0015   Epoch: 17   Global Step: 728350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:07,128-Speed 2631.65 samples/sec   Loss 1.8141   LearningRate 0.0015   Epoch: 17   Global Step: 728360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:10,999-Speed 2646.04 samples/sec   Loss 1.7731   LearningRate 0.0015   Epoch: 17   Global Step: 728370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:14,891-Speed 2631.42 samples/sec   Loss 1.8540   LearningRate 0.0015   Epoch: 17   Global Step: 728380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:18,788-Speed 2628.46 samples/sec   Loss 1.8053   LearningRate 0.0015   Epoch: 17   Global Step: 728390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:22,690-Speed 2626.00 samples/sec   Loss 1.8041   LearningRate 0.0015   Epoch: 17   Global Step: 728400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:26,591-Speed 2625.87 samples/sec   Loss 1.8122   LearningRate 0.0015   Epoch: 17   Global Step: 728410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:30,541-Speed 2593.00 samples/sec   Loss 1.7551   LearningRate 0.0015   Epoch: 17   Global Step: 728420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:34,440-Speed 2627.31 samples/sec   Loss 1.8307   LearningRate 0.0015   Epoch: 17   Global Step: 728430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:38,338-Speed 2627.18 samples/sec   Loss 1.7597   LearningRate 0.0015   Epoch: 17   Global Step: 728440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:42,237-Speed 2626.75 samples/sec   Loss 1.8025   LearningRate 0.0015   Epoch: 17   Global Step: 728450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:46,138-Speed 2626.59 samples/sec   Loss 1.7239   LearningRate 0.0015   Epoch: 17   Global Step: 728460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:50,047-Speed 2619.62 samples/sec   Loss 1.8094   LearningRate 0.0015   Epoch: 17   Global Step: 728470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:35:53,946-Speed 2627.20 samples/sec   Loss 1.8113   LearningRate 0.0015   Epoch: 17   Global Step: 728480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:35:57,840-Speed 2630.83 samples/sec   Loss 1.7593   LearningRate 0.0015   Epoch: 17   Global Step: 728490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:01,733-Speed 2631.39 samples/sec   Loss 1.8057   LearningRate 0.0015   Epoch: 17   Global Step: 728500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:05,645-Speed 2618.40 samples/sec   Loss 1.9106   LearningRate 0.0015   Epoch: 17   Global Step: 728510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:09,543-Speed 2627.31 samples/sec   Loss 1.7989   LearningRate 0.0015   Epoch: 17   Global Step: 728520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:13,467-Speed 2610.09 samples/sec   Loss 1.7854   LearningRate 0.0015   Epoch: 17   Global Step: 728530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:17,548-Speed 2510.50 samples/sec   Loss 1.7636   LearningRate 0.0015   Epoch: 17   Global Step: 728540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:21,443-Speed 2629.06 samples/sec   Loss 1.7708   LearningRate 0.0015   Epoch: 17   Global Step: 728550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:25,343-Speed 2626.52 samples/sec   Loss 1.8094   LearningRate 0.0015   Epoch: 17   Global Step: 728560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:29,241-Speed 2628.22 samples/sec   Loss 1.7083   LearningRate 0.0015   Epoch: 17   Global Step: 728570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:33,150-Speed 2620.38 samples/sec   Loss 1.8450   LearningRate 0.0015   Epoch: 17   Global Step: 728580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:36:37,054-Speed 2623.77 samples/sec   Loss 1.7831   LearningRate 0.0015   Epoch: 17   Global Step: 728590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:36:40,953-Speed 2627.12 samples/sec   Loss 1.8023   LearningRate 0.0015   Epoch: 17   Global Step: 728600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:36:44,879-Speed 2608.66 samples/sec   Loss 1.7379   LearningRate 0.0015   Epoch: 17   Global Step: 728610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:36:48,757-Speed 2641.38 samples/sec   Loss 1.7772   LearningRate 0.0015   Epoch: 17   Global Step: 728620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:52,654-Speed 2628.57 samples/sec   Loss 1.8013   LearningRate 0.0015   Epoch: 17   Global Step: 728630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:36:56,561-Speed 2621.84 samples/sec   Loss 1.8149   LearningRate 0.0015   Epoch: 17   Global Step: 728640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:00,464-Speed 2624.08 samples/sec   Loss 1.7720   LearningRate 0.0015   Epoch: 17   Global Step: 728650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:04,390-Speed 2608.85 samples/sec   Loss 1.8085   LearningRate 0.0015   Epoch: 17   Global Step: 728660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:08,445-Speed 2526.26 samples/sec   Loss 1.8152   LearningRate 0.0015   Epoch: 17   Global Step: 728670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:12,558-Speed 2490.44 samples/sec   Loss 1.8100   LearningRate 0.0015   Epoch: 17   Global Step: 728680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:16,532-Speed 2576.92 samples/sec   Loss 1.7466   LearningRate 0.0015   Epoch: 17   Global Step: 728690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:20,439-Speed 2622.23 samples/sec   Loss 1.8445   LearningRate 0.0015   Epoch: 17   Global Step: 728700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:24,332-Speed 2630.77 samples/sec   Loss 1.8317   LearningRate 0.0015   Epoch: 17   Global Step: 728710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:28,320-Speed 2568.52 samples/sec   Loss 1.7716   LearningRate 0.0015   Epoch: 17   Global Step: 728720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:37:32,213-Speed 2630.69 samples/sec   Loss 1.8229   LearningRate 0.0015   Epoch: 17   Global Step: 728730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:37:36,114-Speed 2625.31 samples/sec   Loss 1.7665   LearningRate 0.0015   Epoch: 17   Global Step: 728740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:37:40,019-Speed 2623.04 samples/sec   Loss 1.7476   LearningRate 0.0015   Epoch: 17   Global Step: 728750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:37:43,916-Speed 2629.24 samples/sec   Loss 1.7857   LearningRate 0.0015   Epoch: 17   Global Step: 728760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:37:47,815-Speed 2627.23 samples/sec   Loss 1.7638   LearningRate 0.0015   Epoch: 17   Global Step: 728770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:37:51,688-Speed 2644.36 samples/sec   Loss 1.7988   LearningRate 0.0015   Epoch: 17   Global Step: 728780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:55,583-Speed 2629.69 samples/sec   Loss 1.8134   LearningRate 0.0015   Epoch: 17   Global Step: 728790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:37:59,493-Speed 2619.34 samples/sec   Loss 1.7924   LearningRate 0.0015   Epoch: 17   Global Step: 728800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:03,394-Speed 2626.18 samples/sec   Loss 1.8474   LearningRate 0.0015   Epoch: 17   Global Step: 728810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:07,293-Speed 2626.62 samples/sec   Loss 1.8002   LearningRate 0.0015   Epoch: 17   Global Step: 728820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:11,203-Speed 2619.92 samples/sec   Loss 1.7314   LearningRate 0.0015   Epoch: 17   Global Step: 728830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:15,099-Speed 2628.85 samples/sec   Loss 1.8465   LearningRate 0.0015   Epoch: 17   Global Step: 728840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:19,008-Speed 2620.69 samples/sec   Loss 1.8575   LearningRate 0.0015   Epoch: 17   Global Step: 728850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:22,915-Speed 2620.94 samples/sec   Loss 1.7921   LearningRate 0.0015   Epoch: 17   Global Step: 728860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:26,812-Speed 2628.63 samples/sec   Loss 1.7597   LearningRate 0.0015   Epoch: 17   Global Step: 728870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:38:30,708-Speed 2628.87 samples/sec   Loss 1.8401   LearningRate 0.0015   Epoch: 17   Global Step: 728880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:38:34,606-Speed 2627.91 samples/sec   Loss 1.7342   LearningRate 0.0015   Epoch: 17   Global Step: 728890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:38:38,499-Speed 2631.21 samples/sec   Loss 1.7725   LearningRate 0.0015   Epoch: 17   Global Step: 728900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:38:42,441-Speed 2598.26 samples/sec   Loss 1.8052   LearningRate 0.0015   Epoch: 17   Global Step: 728910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:38:46,333-Speed 2631.92 samples/sec   Loss 1.7947   LearningRate 0.0015   Epoch: 17   Global Step: 728920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:38:50,231-Speed 2627.76 samples/sec   Loss 1.7806   LearningRate 0.0015   Epoch: 17   Global Step: 728930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:38:54,124-Speed 2630.63 samples/sec   Loss 1.7848   LearningRate 0.0015   Epoch: 17   Global Step: 728940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:38:58,061-Speed 2602.70 samples/sec   Loss 1.7761   LearningRate 0.0015   Epoch: 17   Global Step: 728950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:39:02,000-Speed 2599.70 samples/sec   Loss 1.8149   LearningRate 0.0015   Epoch: 17   Global Step: 728960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:39:05,896-Speed 2628.78 samples/sec   Loss 1.7297   LearningRate 0.0015   Epoch: 17   Global Step: 728970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:39:09,773-Speed 2641.79 samples/sec   Loss 1.8283   LearningRate 0.0015   Epoch: 17   Global Step: 728980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:39:13,684-Speed 2619.50 samples/sec   Loss 1.8001   LearningRate 0.0015   Epoch: 17   Global Step: 728990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:39:17,598-Speed 2616.86 samples/sec   Loss 1.7407   LearningRate 0.0015   Epoch: 17   Global Step: 729000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:21,496-Speed 2628.02 samples/sec   Loss 1.7893   LearningRate 0.0015   Epoch: 17   Global Step: 729010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:25,455-Speed 2587.38 samples/sec   Loss 1.7593   LearningRate 0.0015   Epoch: 17   Global Step: 729020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:29,353-Speed 2627.65 samples/sec   Loss 1.7764   LearningRate 0.0015   Epoch: 17   Global Step: 729030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:33,265-Speed 2618.23 samples/sec   Loss 1.8070   LearningRate 0.0015   Epoch: 17   Global Step: 729040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:37,185-Speed 2612.89 samples/sec   Loss 1.7951   LearningRate 0.0015   Epoch: 17   Global Step: 729050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:41,083-Speed 2627.22 samples/sec   Loss 1.8305   LearningRate 0.0015   Epoch: 17   Global Step: 729060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:45,081-Speed 2562.01 samples/sec   Loss 1.8178   LearningRate 0.0015   Epoch: 17   Global Step: 729070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:48,983-Speed 2625.40 samples/sec   Loss 1.8654   LearningRate 0.0015   Epoch: 17   Global Step: 729080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:52,903-Speed 2612.11 samples/sec   Loss 1.7381   LearningRate 0.0015   Epoch: 17   Global Step: 729090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:39:56,848-Speed 2597.15 samples/sec   Loss 1.7531   LearningRate 0.0015   Epoch: 17   Global Step: 729100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:40:00,746-Speed 2627.56 samples/sec   Loss 1.7680   LearningRate 0.0015   Epoch: 17   Global Step: 729110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:40:04,623-Speed 2641.88 samples/sec   Loss 1.8036   LearningRate 0.0015   Epoch: 17   Global Step: 729120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:08,520-Speed 2628.45 samples/sec   Loss 1.8035   LearningRate 0.0015   Epoch: 17   Global Step: 729130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:12,430-Speed 2619.91 samples/sec   Loss 1.7793   LearningRate 0.0015   Epoch: 17   Global Step: 729140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:16,324-Speed 2629.88 samples/sec   Loss 1.7718   LearningRate 0.0015   Epoch: 17   Global Step: 729150   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:20,219-Speed 2629.63 samples/sec   Loss 1.7520   LearningRate 0.0015   Epoch: 17   Global Step: 729160   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:24,119-Speed 2626.75 samples/sec   Loss 1.7152   LearningRate 0.0015   Epoch: 17   Global Step: 729170   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:28,010-Speed 2632.22 samples/sec   Loss 1.8028   LearningRate 0.0015   Epoch: 17   Global Step: 729180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:31,924-Speed 2617.10 samples/sec   Loss 1.8094   LearningRate 0.0015   Epoch: 17   Global Step: 729190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:35,838-Speed 2617.37 samples/sec   Loss 1.7615   LearningRate 0.0015   Epoch: 17   Global Step: 729200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:39,731-Speed 2630.91 samples/sec   Loss 1.7765   LearningRate 0.0015   Epoch: 17   Global Step: 729210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:40:43,627-Speed 2628.30 samples/sec   Loss 1.8051   LearningRate 0.0015   Epoch: 17   Global Step: 729220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:40:47,521-Speed 2631.03 samples/sec   Loss 1.8095   LearningRate 0.0015   Epoch: 17   Global Step: 729230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:40:51,414-Speed 2631.32 samples/sec   Loss 1.7954   LearningRate 0.0015   Epoch: 17   Global Step: 729240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:40:55,310-Speed 2629.23 samples/sec   Loss 1.7890   LearningRate 0.0015   Epoch: 17   Global Step: 729250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:40:59,211-Speed 2625.46 samples/sec   Loss 1.7929   LearningRate 0.0015   Epoch: 17   Global Step: 729260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:41:03,128-Speed 2614.55 samples/sec   Loss 1.7616   LearningRate 0.0015   Epoch: 17   Global Step: 729270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:41:07,062-Speed 2604.09 samples/sec   Loss 1.8189   LearningRate 0.0015   Epoch: 17   Global Step: 729280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:41:10,967-Speed 2622.96 samples/sec   Loss 1.7427   LearningRate 0.0015   Epoch: 17   Global Step: 729290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:41:14,871-Speed 2623.45 samples/sec   Loss 1.7572   LearningRate 0.0015   Epoch: 17   Global Step: 729300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:41:18,777-Speed 2622.24 samples/sec   Loss 1.7598   LearningRate 0.0015   Epoch: 17   Global Step: 729310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:41:22,655-Speed 2641.22 samples/sec   Loss 1.7593   LearningRate 0.0015   Epoch: 17   Global Step: 729320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:26,554-Speed 2626.72 samples/sec   Loss 1.7813   LearningRate 0.0015   Epoch: 17   Global Step: 729330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:30,454-Speed 2626.53 samples/sec   Loss 1.7095   LearningRate 0.0015   Epoch: 17   Global Step: 729340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:34,365-Speed 2619.43 samples/sec   Loss 1.7846   LearningRate 0.0015   Epoch: 17   Global Step: 729350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:38,263-Speed 2627.37 samples/sec   Loss 1.8024   LearningRate 0.0015   Epoch: 17   Global Step: 729360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:42,185-Speed 2611.43 samples/sec   Loss 1.7250   LearningRate 0.0015   Epoch: 17   Global Step: 729370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:46,136-Speed 2592.21 samples/sec   Loss 1.7940   LearningRate 0.0015   Epoch: 17   Global Step: 729380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:50,036-Speed 2626.17 samples/sec   Loss 1.7514   LearningRate 0.0015   Epoch: 17   Global Step: 729390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:53,934-Speed 2628.29 samples/sec   Loss 1.7853   LearningRate 0.0015   Epoch: 17   Global Step: 729400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:41:57,847-Speed 2616.99 samples/sec   Loss 1.7893   LearningRate 0.0015   Epoch: 17   Global Step: 729410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:01,768-Speed 2613.07 samples/sec   Loss 1.7412   LearningRate 0.0015   Epoch: 17   Global Step: 729420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:42:05,671-Speed 2623.78 samples/sec   Loss 1.8340   LearningRate 0.0015   Epoch: 17   Global Step: 729430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:42:09,589-Speed 2614.49 samples/sec   Loss 1.8124   LearningRate 0.0015   Epoch: 17   Global Step: 729440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:13,509-Speed 2612.86 samples/sec   Loss 1.8328   LearningRate 0.0015   Epoch: 17   Global Step: 729450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:17,428-Speed 2614.20 samples/sec   Loss 1.7653   LearningRate 0.0015   Epoch: 17   Global Step: 729460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:21,338-Speed 2618.89 samples/sec   Loss 1.7664   LearningRate 0.0015   Epoch: 17   Global Step: 729470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:25,232-Speed 2630.55 samples/sec   Loss 1.8522   LearningRate 0.0015   Epoch: 17   Global Step: 729480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:29,154-Speed 2619.31 samples/sec   Loss 1.7761   LearningRate 0.0015   Epoch: 17   Global Step: 729490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:33,050-Speed 2628.51 samples/sec   Loss 1.7361   LearningRate 0.0015   Epoch: 17   Global Step: 729500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:36,949-Speed 2626.73 samples/sec   Loss 1.7933   LearningRate 0.0015   Epoch: 17   Global Step: 729510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:40,850-Speed 2625.53 samples/sec   Loss 1.8262   LearningRate 0.0015   Epoch: 17   Global Step: 729520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:44,754-Speed 2624.08 samples/sec   Loss 1.7382   LearningRate 0.0015   Epoch: 17   Global Step: 729530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:42:48,670-Speed 2615.65 samples/sec   Loss 1.7023   LearningRate 0.0015   Epoch: 17   Global Step: 729540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:42:52,568-Speed 2628.39 samples/sec   Loss 1.8002   LearningRate 0.0015   Epoch: 17   Global Step: 729550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:42:56,464-Speed 2628.54 samples/sec   Loss 1.8047   LearningRate 0.0015   Epoch: 17   Global Step: 729560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:43:00,415-Speed 2592.50 samples/sec   Loss 1.7736   LearningRate 0.0015   Epoch: 17   Global Step: 729570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:04,311-Speed 2629.40 samples/sec   Loss 1.7876   LearningRate 0.0015   Epoch: 17   Global Step: 729580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:08,205-Speed 2630.36 samples/sec   Loss 1.7660   LearningRate 0.0015   Epoch: 17   Global Step: 729590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:12,120-Speed 2616.51 samples/sec   Loss 1.7792   LearningRate 0.0015   Epoch: 17   Global Step: 729600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:16,016-Speed 2628.46 samples/sec   Loss 1.7867   LearningRate 0.0015   Epoch: 17   Global Step: 729610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:19,918-Speed 2625.63 samples/sec   Loss 1.7490   LearningRate 0.0015   Epoch: 17   Global Step: 729620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:23,819-Speed 2625.20 samples/sec   Loss 1.7927   LearningRate 0.0015   Epoch: 17   Global Step: 729630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:27,730-Speed 2619.30 samples/sec   Loss 1.7979   LearningRate 0.0015   Epoch: 17   Global Step: 729640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:31,627-Speed 2628.55 samples/sec   Loss 1.8298   LearningRate 0.0015   Epoch: 17   Global Step: 729650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:35,529-Speed 2624.86 samples/sec   Loss 1.8064   LearningRate 0.0015   Epoch: 17   Global Step: 729660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:39,432-Speed 2623.64 samples/sec   Loss 1.7791   LearningRate 0.0015   Epoch: 17   Global Step: 729670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:43:43,346-Speed 2617.62 samples/sec   Loss 1.8023   LearningRate 0.0015   Epoch: 17   Global Step: 729680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:43:47,241-Speed 2629.61 samples/sec   Loss 1.7963   LearningRate 0.0014   Epoch: 17   Global Step: 729690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:43:51,117-Speed 2642.83 samples/sec   Loss 1.7943   LearningRate 0.0014   Epoch: 17   Global Step: 729700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:55,013-Speed 2628.71 samples/sec   Loss 1.7730   LearningRate 0.0014   Epoch: 17   Global Step: 729710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:43:58,922-Speed 2621.40 samples/sec   Loss 1.7810   LearningRate 0.0014   Epoch: 17   Global Step: 729720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:02,826-Speed 2623.56 samples/sec   Loss 1.7773   LearningRate 0.0014   Epoch: 17   Global Step: 729730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:06,724-Speed 2627.84 samples/sec   Loss 1.7537   LearningRate 0.0014   Epoch: 17   Global Step: 729740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:10,643-Speed 2613.32 samples/sec   Loss 1.8501   LearningRate 0.0014   Epoch: 17   Global Step: 729750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:14,566-Speed 2611.09 samples/sec   Loss 1.7451   LearningRate 0.0014   Epoch: 17   Global Step: 729760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:18,471-Speed 2623.41 samples/sec   Loss 1.8123   LearningRate 0.0014   Epoch: 17   Global Step: 729770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:22,372-Speed 2625.48 samples/sec   Loss 1.7872   LearningRate 0.0014   Epoch: 17   Global Step: 729780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:26,284-Speed 2618.44 samples/sec   Loss 1.7860   LearningRate 0.0014   Epoch: 17   Global Step: 729790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:30,182-Speed 2627.99 samples/sec   Loss 1.7923   LearningRate 0.0014   Epoch: 17   Global Step: 729800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:44:34,061-Speed 2640.30 samples/sec   Loss 1.7028   LearningRate 0.0014   Epoch: 17   Global Step: 729810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:37,958-Speed 2627.76 samples/sec   Loss 1.7241   LearningRate 0.0014   Epoch: 17   Global Step: 729820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:41,855-Speed 2629.11 samples/sec   Loss 1.7993   LearningRate 0.0014   Epoch: 17   Global Step: 729830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:45,771-Speed 2615.67 samples/sec   Loss 1.7285   LearningRate 0.0014   Epoch: 17   Global Step: 729840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:49,670-Speed 2627.29 samples/sec   Loss 1.7080   LearningRate 0.0014   Epoch: 17   Global Step: 729850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:53,566-Speed 2629.16 samples/sec   Loss 1.7598   LearningRate 0.0014   Epoch: 17   Global Step: 729860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:44:57,467-Speed 2626.06 samples/sec   Loss 1.8283   LearningRate 0.0014   Epoch: 17   Global Step: 729870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:01,361-Speed 2630.02 samples/sec   Loss 1.7991   LearningRate 0.0014   Epoch: 17   Global Step: 729880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:05,260-Speed 2626.63 samples/sec   Loss 1.7601   LearningRate 0.0014   Epoch: 17   Global Step: 729890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:09,159-Speed 2627.15 samples/sec   Loss 1.7686   LearningRate 0.0014   Epoch: 17   Global Step: 729900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:13,119-Speed 2586.66 samples/sec   Loss 1.7518   LearningRate 0.0014   Epoch: 17   Global Step: 729910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:45:17,018-Speed 2626.89 samples/sec   Loss 1.7937   LearningRate 0.0014   Epoch: 17   Global Step: 729920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:45:20,888-Speed 2647.13 samples/sec   Loss 1.7325   LearningRate 0.0014   Epoch: 17   Global Step: 729930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:24,784-Speed 2629.12 samples/sec   Loss 1.8008   LearningRate 0.0014   Epoch: 17   Global Step: 729940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:28,682-Speed 2627.77 samples/sec   Loss 1.7508   LearningRate 0.0014   Epoch: 17   Global Step: 729950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:32,579-Speed 2628.67 samples/sec   Loss 1.7956   LearningRate 0.0014   Epoch: 17   Global Step: 729960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:36,473-Speed 2629.62 samples/sec   Loss 1.7423   LearningRate 0.0014   Epoch: 17   Global Step: 729970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:40,371-Speed 2627.98 samples/sec   Loss 1.7803   LearningRate 0.0014   Epoch: 17   Global Step: 729980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:44,277-Speed 2622.21 samples/sec   Loss 1.7641   LearningRate 0.0014   Epoch: 17   Global Step: 729990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:45:48,171-Speed 2630.31 samples/sec   Loss 1.7620   LearningRate 0.0014   Epoch: 17   Global Step: 730000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:46:31,337-[lfw][730000]XNorm: 21.993687
Training: 2022-04-16 05:46:31,338-[lfw][730000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 05:46:31,339-[lfw][730000]Accuracy-Highest: 0.99850
Training: 2022-04-16 05:47:21,346-[cfp_fp][730000]XNorm: 22.258556
Training: 2022-04-16 05:47:21,347-[cfp_fp][730000]Accuracy-Flip: 0.99314+-0.00473
Training: 2022-04-16 05:47:21,348-[cfp_fp][730000]Accuracy-Highest: 0.99329
Training: 2022-04-16 05:48:04,203-[agedb_30][730000]XNorm: 22.848101
Training: 2022-04-16 05:48:04,204-[agedb_30][730000]Accuracy-Flip: 0.98250+-0.00680
Training: 2022-04-16 05:48:04,204-[agedb_30][730000]Accuracy-Highest: 0.98333
Training: 2022-04-16 05:48:08,081-Speed 73.19 samples/sec   Loss 1.7865   LearningRate 0.0014   Epoch: 17   Global Step: 730010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:48:11,957-Speed 2642.56 samples/sec   Loss 1.8264   LearningRate 0.0014   Epoch: 17   Global Step: 730020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:48:15,836-Speed 2640.73 samples/sec   Loss 1.8010   LearningRate 0.0014   Epoch: 17   Global Step: 730030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:19,720-Speed 2636.91 samples/sec   Loss 1.7008   LearningRate 0.0014   Epoch: 17   Global Step: 730040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:23,600-Speed 2639.49 samples/sec   Loss 1.7728   LearningRate 0.0014   Epoch: 17   Global Step: 730050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:27,493-Speed 2631.09 samples/sec   Loss 1.7907   LearningRate 0.0014   Epoch: 17   Global Step: 730060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:31,378-Speed 2636.21 samples/sec   Loss 1.7740   LearningRate 0.0014   Epoch: 17   Global Step: 730070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:35,278-Speed 2626.24 samples/sec   Loss 1.7098   LearningRate 0.0014   Epoch: 17   Global Step: 730080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:39,166-Speed 2635.10 samples/sec   Loss 1.7044   LearningRate 0.0014   Epoch: 17   Global Step: 730090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:43,055-Speed 2633.50 samples/sec   Loss 1.8157   LearningRate 0.0014   Epoch: 17   Global Step: 730100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:46,949-Speed 2630.46 samples/sec   Loss 1.7763   LearningRate 0.0014   Epoch: 17   Global Step: 730110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:50,853-Speed 2623.91 samples/sec   Loss 1.7233   LearningRate 0.0014   Epoch: 17   Global Step: 730120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:54,749-Speed 2628.53 samples/sec   Loss 1.8404   LearningRate 0.0014   Epoch: 17   Global Step: 730130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:48:58,640-Speed 2633.35 samples/sec   Loss 1.7687   LearningRate 0.0014   Epoch: 17   Global Step: 730140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:49:02,531-Speed 2632.18 samples/sec   Loss 1.7518   LearningRate 0.0014   Epoch: 17   Global Step: 730150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:49:06,437-Speed 2622.08 samples/sec   Loss 1.7740   LearningRate 0.0014   Epoch: 17   Global Step: 730160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:49:10,343-Speed 2623.20 samples/sec   Loss 1.7620   LearningRate 0.0014   Epoch: 17   Global Step: 730170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:49:14,238-Speed 2629.68 samples/sec   Loss 1.7931   LearningRate 0.0014   Epoch: 17   Global Step: 730180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:49:18,150-Speed 2618.36 samples/sec   Loss 1.7742   LearningRate 0.0014   Epoch: 17   Global Step: 730190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:22,044-Speed 2630.66 samples/sec   Loss 1.7876   LearningRate 0.0014   Epoch: 17   Global Step: 730200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:25,940-Speed 2628.69 samples/sec   Loss 1.7880   LearningRate 0.0014   Epoch: 17   Global Step: 730210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:29,846-Speed 2621.95 samples/sec   Loss 1.7265   LearningRate 0.0014   Epoch: 17   Global Step: 730220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:33,745-Speed 2627.71 samples/sec   Loss 1.7329   LearningRate 0.0014   Epoch: 17   Global Step: 730230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:37,653-Speed 2621.03 samples/sec   Loss 1.7961   LearningRate 0.0014   Epoch: 17   Global Step: 730240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:41,562-Speed 2620.76 samples/sec   Loss 1.8342   LearningRate 0.0014   Epoch: 17   Global Step: 730250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:45,462-Speed 2625.77 samples/sec   Loss 1.8000   LearningRate 0.0014   Epoch: 17   Global Step: 730260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:49,375-Speed 2618.06 samples/sec   Loss 1.7405   LearningRate 0.0014   Epoch: 17   Global Step: 730270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:53,284-Speed 2619.78 samples/sec   Loss 1.8082   LearningRate 0.0014   Epoch: 17   Global Step: 730280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:49:57,192-Speed 2620.55 samples/sec   Loss 1.7576   LearningRate 0.0014   Epoch: 17   Global Step: 730290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:50:01,108-Speed 2615.43 samples/sec   Loss 1.7652   LearningRate 0.0014   Epoch: 17   Global Step: 730300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:50:05,013-Speed 2623.51 samples/sec   Loss 1.8130   LearningRate 0.0014   Epoch: 17   Global Step: 730310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:50:08,915-Speed 2625.20 samples/sec   Loss 1.7098   LearningRate 0.0014   Epoch: 17   Global Step: 730320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:50:12,789-Speed 2643.96 samples/sec   Loss 1.7897   LearningRate 0.0014   Epoch: 17   Global Step: 730330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:16,708-Speed 2614.24 samples/sec   Loss 1.8201   LearningRate 0.0014   Epoch: 17   Global Step: 730340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:20,607-Speed 2627.22 samples/sec   Loss 1.7363   LearningRate 0.0014   Epoch: 17   Global Step: 730350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:24,602-Speed 2563.27 samples/sec   Loss 1.7586   LearningRate 0.0014   Epoch: 17   Global Step: 730360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:28,635-Speed 2539.76 samples/sec   Loss 1.7187   LearningRate 0.0014   Epoch: 17   Global Step: 730370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:32,550-Speed 2616.64 samples/sec   Loss 1.8125   LearningRate 0.0014   Epoch: 17   Global Step: 730380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:36,450-Speed 2626.08 samples/sec   Loss 1.7632   LearningRate 0.0014   Epoch: 17   Global Step: 730390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:40,349-Speed 2627.75 samples/sec   Loss 1.8142   LearningRate 0.0014   Epoch: 17   Global Step: 730400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:44,244-Speed 2629.28 samples/sec   Loss 1.7373   LearningRate 0.0014   Epoch: 17   Global Step: 730410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:48,140-Speed 2629.73 samples/sec   Loss 1.7778   LearningRate 0.0014   Epoch: 17   Global Step: 730420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:52,022-Speed 2638.53 samples/sec   Loss 1.7961   LearningRate 0.0014   Epoch: 17   Global Step: 730430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:55,918-Speed 2628.34 samples/sec   Loss 1.7825   LearningRate 0.0014   Epoch: 17   Global Step: 730440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:50:59,818-Speed 2626.45 samples/sec   Loss 1.8197   LearningRate 0.0014   Epoch: 17   Global Step: 730450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:03,715-Speed 2628.91 samples/sec   Loss 1.8021   LearningRate 0.0014   Epoch: 17   Global Step: 730460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:07,611-Speed 2628.41 samples/sec   Loss 1.7543   LearningRate 0.0014   Epoch: 17   Global Step: 730470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:11,512-Speed 2626.23 samples/sec   Loss 1.7786   LearningRate 0.0014   Epoch: 17   Global Step: 730480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:15,415-Speed 2624.07 samples/sec   Loss 1.8167   LearningRate 0.0014   Epoch: 17   Global Step: 730490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:19,317-Speed 2624.78 samples/sec   Loss 1.7830   LearningRate 0.0014   Epoch: 17   Global Step: 730500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:23,215-Speed 2627.33 samples/sec   Loss 1.7162   LearningRate 0.0014   Epoch: 17   Global Step: 730510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:27,137-Speed 2611.62 samples/sec   Loss 1.7877   LearningRate 0.0014   Epoch: 17   Global Step: 730520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:51:31,048-Speed 2619.39 samples/sec   Loss 1.7440   LearningRate 0.0014   Epoch: 17   Global Step: 730530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:51:34,955-Speed 2621.71 samples/sec   Loss 1.7285   LearningRate 0.0014   Epoch: 17   Global Step: 730540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:51:38,879-Speed 2610.61 samples/sec   Loss 1.7340   LearningRate 0.0014   Epoch: 17   Global Step: 730550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:51:42,779-Speed 2626.30 samples/sec   Loss 1.7939   LearningRate 0.0014   Epoch: 17   Global Step: 730560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:51:46,677-Speed 2627.56 samples/sec   Loss 1.7726   LearningRate 0.0014   Epoch: 17   Global Step: 730570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:51:50,606-Speed 2607.11 samples/sec   Loss 1.7784   LearningRate 0.0014   Epoch: 17   Global Step: 730580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:51:54,515-Speed 2620.23 samples/sec   Loss 1.8240   LearningRate 0.0014   Epoch: 17   Global Step: 730590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:51:58,441-Speed 2608.45 samples/sec   Loss 1.8315   LearningRate 0.0014   Epoch: 17   Global Step: 730600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:02,344-Speed 2624.79 samples/sec   Loss 1.8023   LearningRate 0.0014   Epoch: 17   Global Step: 730610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:06,243-Speed 2627.32 samples/sec   Loss 1.8013   LearningRate 0.0014   Epoch: 17   Global Step: 730620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:10,156-Speed 2617.59 samples/sec   Loss 1.7877   LearningRate 0.0014   Epoch: 17   Global Step: 730630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-16 05:52:14,037-Speed 2639.48 samples/sec   Loss 1.7432   LearningRate 0.0014   Epoch: 17   Global Step: 730640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:17,933-Speed 2630.06 samples/sec   Loss 1.7449   LearningRate 0.0014   Epoch: 17   Global Step: 730650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:21,832-Speed 2627.10 samples/sec   Loss 1.7076   LearningRate 0.0014   Epoch: 17   Global Step: 730660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:25,731-Speed 2626.68 samples/sec   Loss 1.7693   LearningRate 0.0014   Epoch: 17   Global Step: 730670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:29,644-Speed 2617.28 samples/sec   Loss 1.7360   LearningRate 0.0014   Epoch: 17   Global Step: 730680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:33,548-Speed 2624.10 samples/sec   Loss 1.7746   LearningRate 0.0014   Epoch: 17   Global Step: 730690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:52:37,425-Speed 2641.86 samples/sec   Loss 1.7757   LearningRate 0.0014   Epoch: 17   Global Step: 730700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:52:41,340-Speed 2616.43 samples/sec   Loss 1.7205   LearningRate 0.0014   Epoch: 17   Global Step: 730710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:52:45,236-Speed 2629.94 samples/sec   Loss 1.7369   LearningRate 0.0014   Epoch: 17   Global Step: 730720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:52:49,139-Speed 2623.89 samples/sec   Loss 1.7356   LearningRate 0.0014   Epoch: 17   Global Step: 730730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:52:53,049-Speed 2619.94 samples/sec   Loss 1.7964   LearningRate 0.0014   Epoch: 17   Global Step: 730740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:52:56,945-Speed 2628.76 samples/sec   Loss 1.7757   LearningRate 0.0014   Epoch: 17   Global Step: 730750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:00,840-Speed 2629.87 samples/sec   Loss 1.7800   LearningRate 0.0014   Epoch: 17   Global Step: 730760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:04,736-Speed 2628.63 samples/sec   Loss 1.8138   LearningRate 0.0014   Epoch: 17   Global Step: 730770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:08,648-Speed 2619.15 samples/sec   Loss 1.8239   LearningRate 0.0014   Epoch: 17   Global Step: 730780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:12,548-Speed 2625.97 samples/sec   Loss 1.7207   LearningRate 0.0014   Epoch: 17   Global Step: 730790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:16,454-Speed 2622.32 samples/sec   Loss 1.8413   LearningRate 0.0014   Epoch: 17   Global Step: 730800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:53:20,361-Speed 2622.12 samples/sec   Loss 1.7216   LearningRate 0.0014   Epoch: 17   Global Step: 730810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:53:24,260-Speed 2627.05 samples/sec   Loss 1.7818   LearningRate 0.0014   Epoch: 17   Global Step: 730820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:53:28,196-Speed 2602.50 samples/sec   Loss 1.7407   LearningRate 0.0014   Epoch: 17   Global Step: 730830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:53:32,078-Speed 2638.24 samples/sec   Loss 1.7704   LearningRate 0.0014   Epoch: 17   Global Step: 730840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:35,997-Speed 2613.78 samples/sec   Loss 1.7555   LearningRate 0.0014   Epoch: 17   Global Step: 730850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:39,890-Speed 2630.53 samples/sec   Loss 1.7787   LearningRate 0.0014   Epoch: 17   Global Step: 730860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:43,788-Speed 2628.45 samples/sec   Loss 1.7530   LearningRate 0.0014   Epoch: 17   Global Step: 730870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:47,686-Speed 2627.23 samples/sec   Loss 1.7835   LearningRate 0.0014   Epoch: 17   Global Step: 730880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:51,580-Speed 2630.40 samples/sec   Loss 1.6969   LearningRate 0.0014   Epoch: 17   Global Step: 730890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:55,475-Speed 2629.90 samples/sec   Loss 1.8047   LearningRate 0.0014   Epoch: 17   Global Step: 730900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:53:59,417-Speed 2599.38 samples/sec   Loss 1.7597   LearningRate 0.0014   Epoch: 17   Global Step: 730910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:03,309-Speed 2631.57 samples/sec   Loss 1.7415   LearningRate 0.0014   Epoch: 17   Global Step: 730920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:07,213-Speed 2623.39 samples/sec   Loss 1.7833   LearningRate 0.0014   Epoch: 17   Global Step: 730930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:11,108-Speed 2629.54 samples/sec   Loss 1.7374   LearningRate 0.0014   Epoch: 17   Global Step: 730940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:54:15,008-Speed 2626.81 samples/sec   Loss 1.7451   LearningRate 0.0014   Epoch: 17   Global Step: 730950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:54:18,909-Speed 2625.30 samples/sec   Loss 1.6756   LearningRate 0.0014   Epoch: 17   Global Step: 730960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:54:22,784-Speed 2643.68 samples/sec   Loss 1.7150   LearningRate 0.0014   Epoch: 17   Global Step: 730970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:26,680-Speed 2629.04 samples/sec   Loss 1.7615   LearningRate 0.0014   Epoch: 17   Global Step: 730980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:30,576-Speed 2629.41 samples/sec   Loss 1.8027   LearningRate 0.0014   Epoch: 17   Global Step: 730990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:34,471-Speed 2629.03 samples/sec   Loss 1.7512   LearningRate 0.0014   Epoch: 17   Global Step: 731000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:38,386-Speed 2616.35 samples/sec   Loss 1.7838   LearningRate 0.0014   Epoch: 17   Global Step: 731010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:42,330-Speed 2596.89 samples/sec   Loss 1.8136   LearningRate 0.0014   Epoch: 17   Global Step: 731020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:46,232-Speed 2624.82 samples/sec   Loss 1.7628   LearningRate 0.0014   Epoch: 17   Global Step: 731030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:50,169-Speed 2601.86 samples/sec   Loss 1.8258   LearningRate 0.0014   Epoch: 17   Global Step: 731040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:54,166-Speed 2562.37 samples/sec   Loss 1.7164   LearningRate 0.0014   Epoch: 17   Global Step: 731050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:54:58,062-Speed 2629.44 samples/sec   Loss 1.7941   LearningRate 0.0014   Epoch: 17   Global Step: 731060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:55:01,966-Speed 2623.42 samples/sec   Loss 1.7365   LearningRate 0.0014   Epoch: 17   Global Step: 731070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:05,879-Speed 2617.36 samples/sec   Loss 1.7962   LearningRate 0.0014   Epoch: 17   Global Step: 731080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:09,819-Speed 2599.85 samples/sec   Loss 1.7581   LearningRate 0.0014   Epoch: 17   Global Step: 731090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:13,718-Speed 2627.23 samples/sec   Loss 1.7330   LearningRate 0.0014   Epoch: 17   Global Step: 731100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:17,617-Speed 2626.46 samples/sec   Loss 1.7927   LearningRate 0.0014   Epoch: 17   Global Step: 731110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:21,514-Speed 2628.36 samples/sec   Loss 1.6882   LearningRate 0.0014   Epoch: 17   Global Step: 731120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:25,419-Speed 2623.09 samples/sec   Loss 1.7525   LearningRate 0.0014   Epoch: 17   Global Step: 731130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:29,325-Speed 2622.60 samples/sec   Loss 1.7523   LearningRate 0.0014   Epoch: 17   Global Step: 731140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:55:33,198-Speed 2644.53 samples/sec   Loss 1.7419   LearningRate 0.0014   Epoch: 17   Global Step: 731150   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:55:37,093-Speed 2629.45 samples/sec   Loss 1.7597   LearningRate 0.0014   Epoch: 17   Global Step: 731160   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:55:40,988-Speed 2629.31 samples/sec   Loss 1.7468   LearningRate 0.0014   Epoch: 17   Global Step: 731170   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:55:44,889-Speed 2626.27 samples/sec   Loss 1.7502   LearningRate 0.0014   Epoch: 17   Global Step: 731180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:55:48,792-Speed 2623.93 samples/sec   Loss 1.7242   LearningRate 0.0014   Epoch: 17   Global Step: 731190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:55:52,691-Speed 2627.07 samples/sec   Loss 1.7769   LearningRate 0.0014   Epoch: 17   Global Step: 731200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:55:56,589-Speed 2627.59 samples/sec   Loss 1.7650   LearningRate 0.0014   Epoch: 17   Global Step: 731210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:56:00,492-Speed 2625.22 samples/sec   Loss 1.8311   LearningRate 0.0014   Epoch: 17   Global Step: 731220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:56:04,385-Speed 2630.62 samples/sec   Loss 1.7409   LearningRate 0.0014   Epoch: 17   Global Step: 731230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:56:08,276-Speed 2631.80 samples/sec   Loss 1.7302   LearningRate 0.0014   Epoch: 17   Global Step: 731240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:56:12,171-Speed 2629.79 samples/sec   Loss 1.7553   LearningRate 0.0014   Epoch: 17   Global Step: 731250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:16,071-Speed 2626.51 samples/sec   Loss 1.7485   LearningRate 0.0014   Epoch: 17   Global Step: 731260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:19,966-Speed 2629.94 samples/sec   Loss 1.7703   LearningRate 0.0014   Epoch: 17   Global Step: 731270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:23,858-Speed 2631.54 samples/sec   Loss 1.7661   LearningRate 0.0014   Epoch: 17   Global Step: 731280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:27,754-Speed 2630.05 samples/sec   Loss 1.7906   LearningRate 0.0014   Epoch: 17   Global Step: 731290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:31,646-Speed 2630.93 samples/sec   Loss 1.7813   LearningRate 0.0014   Epoch: 17   Global Step: 731300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:35,542-Speed 2628.76 samples/sec   Loss 1.7482   LearningRate 0.0014   Epoch: 17   Global Step: 731310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:39,444-Speed 2625.07 samples/sec   Loss 1.8195   LearningRate 0.0014   Epoch: 17   Global Step: 731320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:43,375-Speed 2605.92 samples/sec   Loss 1.7736   LearningRate 0.0014   Epoch: 17   Global Step: 731330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:56:47,246-Speed 2646.39 samples/sec   Loss 1.7301   LearningRate 0.0014   Epoch: 17   Global Step: 731340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:56:51,141-Speed 2629.62 samples/sec   Loss 1.8057   LearningRate 0.0014   Epoch: 17   Global Step: 731350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:56:55,038-Speed 2627.90 samples/sec   Loss 1.7339   LearningRate 0.0014   Epoch: 17   Global Step: 731360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:56:58,936-Speed 2628.25 samples/sec   Loss 1.6980   LearningRate 0.0014   Epoch: 17   Global Step: 731370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:02,842-Speed 2622.44 samples/sec   Loss 1.7647   LearningRate 0.0014   Epoch: 17   Global Step: 731380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:06,742-Speed 2626.34 samples/sec   Loss 1.6941   LearningRate 0.0014   Epoch: 17   Global Step: 731390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:10,640-Speed 2627.14 samples/sec   Loss 1.7550   LearningRate 0.0014   Epoch: 17   Global Step: 731400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:14,546-Speed 2622.97 samples/sec   Loss 1.7945   LearningRate 0.0014   Epoch: 17   Global Step: 731410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:18,584-Speed 2536.54 samples/sec   Loss 1.7295   LearningRate 0.0014   Epoch: 17   Global Step: 731420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:22,658-Speed 2514.03 samples/sec   Loss 1.7933   LearningRate 0.0014   Epoch: 17   Global Step: 731430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:26,554-Speed 2629.04 samples/sec   Loss 1.7396   LearningRate 0.0014   Epoch: 17   Global Step: 731440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:57:30,449-Speed 2630.24 samples/sec   Loss 1.8118   LearningRate 0.0014   Epoch: 17   Global Step: 731450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:57:34,355-Speed 2622.20 samples/sec   Loss 1.7208   LearningRate 0.0014   Epoch: 17   Global Step: 731460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:57:38,251-Speed 2628.35 samples/sec   Loss 1.7906   LearningRate 0.0014   Epoch: 17   Global Step: 731470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:57:42,154-Speed 2624.37 samples/sec   Loss 1.7699   LearningRate 0.0014   Epoch: 17   Global Step: 731480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:57:46,050-Speed 2629.49 samples/sec   Loss 1.7752   LearningRate 0.0014   Epoch: 17   Global Step: 731490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:57:49,953-Speed 2624.93 samples/sec   Loss 1.7773   LearningRate 0.0014   Epoch: 17   Global Step: 731500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:57:53,825-Speed 2644.84 samples/sec   Loss 1.7839   LearningRate 0.0014   Epoch: 17   Global Step: 731510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:57:57,722-Speed 2628.61 samples/sec   Loss 1.7636   LearningRate 0.0014   Epoch: 17   Global Step: 731520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:58:01,628-Speed 2622.26 samples/sec   Loss 1.7689   LearningRate 0.0014   Epoch: 17   Global Step: 731530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:58:05,526-Speed 2627.11 samples/sec   Loss 1.7951   LearningRate 0.0014   Epoch: 17   Global Step: 731540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:58:09,428-Speed 2624.87 samples/sec   Loss 1.7615   LearningRate 0.0014   Epoch: 17   Global Step: 731550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:58:13,301-Speed 2644.57 samples/sec   Loss 1.7253   LearningRate 0.0014   Epoch: 17   Global Step: 731560   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:17,198-Speed 2628.18 samples/sec   Loss 1.7724   LearningRate 0.0014   Epoch: 17   Global Step: 731570   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:21,096-Speed 2628.39 samples/sec   Loss 1.7857   LearningRate 0.0014   Epoch: 17   Global Step: 731580   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:24,997-Speed 2625.90 samples/sec   Loss 1.7581   LearningRate 0.0014   Epoch: 17   Global Step: 731590   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:28,897-Speed 2626.40 samples/sec   Loss 1.6948   LearningRate 0.0014   Epoch: 17   Global Step: 731600   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:32,792-Speed 2629.32 samples/sec   Loss 1.7653   LearningRate 0.0014   Epoch: 17   Global Step: 731610   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:36,702-Speed 2620.01 samples/sec   Loss 1.7952   LearningRate 0.0014   Epoch: 17   Global Step: 731620   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:40,639-Speed 2601.71 samples/sec   Loss 1.7723   LearningRate 0.0014   Epoch: 17   Global Step: 731630   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:44,552-Speed 2618.19 samples/sec   Loss 1.7243   LearningRate 0.0014   Epoch: 17   Global Step: 731640   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:48,457-Speed 2622.39 samples/sec   Loss 1.6885   LearningRate 0.0014   Epoch: 17   Global Step: 731650   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-16 05:58:52,355-Speed 2628.25 samples/sec   Loss 1.7839   LearningRate 0.0014   Epoch: 17   Global Step: 731660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:58:56,265-Speed 2619.47 samples/sec   Loss 1.7662   LearningRate 0.0014   Epoch: 17   Global Step: 731670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:00,163-Speed 2627.29 samples/sec   Loss 1.7984   LearningRate 0.0014   Epoch: 17   Global Step: 731680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:04,065-Speed 2625.02 samples/sec   Loss 1.7800   LearningRate 0.0014   Epoch: 17   Global Step: 731690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:07,989-Speed 2610.58 samples/sec   Loss 1.6868   LearningRate 0.0014   Epoch: 17   Global Step: 731700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:11,888-Speed 2626.67 samples/sec   Loss 1.8263   LearningRate 0.0014   Epoch: 17   Global Step: 731710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:15,787-Speed 2627.68 samples/sec   Loss 1.7865   LearningRate 0.0014   Epoch: 17   Global Step: 731720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:19,695-Speed 2622.30 samples/sec   Loss 1.7238   LearningRate 0.0014   Epoch: 17   Global Step: 731730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:23,604-Speed 2620.32 samples/sec   Loss 1.7632   LearningRate 0.0014   Epoch: 17   Global Step: 731740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:27,500-Speed 2629.34 samples/sec   Loss 1.7157   LearningRate 0.0014   Epoch: 17   Global Step: 731750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 05:59:31,464-Speed 2583.65 samples/sec   Loss 1.7639   LearningRate 0.0014   Epoch: 17   Global Step: 731760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:59:35,385-Speed 2612.00 samples/sec   Loss 1.6970   LearningRate 0.0014   Epoch: 17   Global Step: 731770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:59:39,346-Speed 2586.37 samples/sec   Loss 1.7387   LearningRate 0.0014   Epoch: 17   Global Step: 731780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:59:43,241-Speed 2630.00 samples/sec   Loss 1.7563   LearningRate 0.0014   Epoch: 17   Global Step: 731790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:59:47,137-Speed 2629.03 samples/sec   Loss 1.8039   LearningRate 0.0014   Epoch: 17   Global Step: 731800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:59:51,033-Speed 2629.17 samples/sec   Loss 1.7408   LearningRate 0.0014   Epoch: 17   Global Step: 731810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:59:54,954-Speed 2611.97 samples/sec   Loss 1.7341   LearningRate 0.0014   Epoch: 17   Global Step: 731820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 05:59:58,914-Speed 2587.58 samples/sec   Loss 1.7405   LearningRate 0.0014   Epoch: 17   Global Step: 731830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:00:02,787-Speed 2644.59 samples/sec   Loss 1.7616   LearningRate 0.0014   Epoch: 17   Global Step: 731840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:06,684-Speed 2627.70 samples/sec   Loss 1.7154   LearningRate 0.0014   Epoch: 17   Global Step: 731850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:10,580-Speed 2629.04 samples/sec   Loss 1.7793   LearningRate 0.0014   Epoch: 17   Global Step: 731860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:14,506-Speed 2609.02 samples/sec   Loss 1.7868   LearningRate 0.0014   Epoch: 17   Global Step: 731870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:18,410-Speed 2623.65 samples/sec   Loss 1.7290   LearningRate 0.0014   Epoch: 17   Global Step: 731880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:22,315-Speed 2623.60 samples/sec   Loss 1.7292   LearningRate 0.0014   Epoch: 17   Global Step: 731890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:26,236-Speed 2612.05 samples/sec   Loss 1.7770   LearningRate 0.0014   Epoch: 17   Global Step: 731900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:30,130-Speed 2630.43 samples/sec   Loss 1.7913   LearningRate 0.0014   Epoch: 17   Global Step: 731910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:34,024-Speed 2630.55 samples/sec   Loss 1.7455   LearningRate 0.0014   Epoch: 17   Global Step: 731920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:37,918-Speed 2630.22 samples/sec   Loss 1.7860   LearningRate 0.0014   Epoch: 17   Global Step: 731930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:41,818-Speed 2626.26 samples/sec   Loss 1.7282   LearningRate 0.0014   Epoch: 17   Global Step: 731940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:00:45,719-Speed 2626.19 samples/sec   Loss 1.7766   LearningRate 0.0014   Epoch: 17   Global Step: 731950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:49,618-Speed 2627.23 samples/sec   Loss 1.7162   LearningRate 0.0014   Epoch: 17   Global Step: 731960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:53,520-Speed 2624.59 samples/sec   Loss 1.7451   LearningRate 0.0014   Epoch: 17   Global Step: 731970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:00:57,425-Speed 2623.19 samples/sec   Loss 1.7487   LearningRate 0.0014   Epoch: 17   Global Step: 731980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:01,321-Speed 2629.03 samples/sec   Loss 1.7345   LearningRate 0.0014   Epoch: 17   Global Step: 731990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:05,230-Speed 2620.44 samples/sec   Loss 1.7762   LearningRate 0.0014   Epoch: 17   Global Step: 732000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:09,125-Speed 2629.26 samples/sec   Loss 1.7022   LearningRate 0.0014   Epoch: 17   Global Step: 732010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:13,020-Speed 2629.60 samples/sec   Loss 1.7806   LearningRate 0.0014   Epoch: 17   Global Step: 732020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:16,920-Speed 2626.73 samples/sec   Loss 1.7339   LearningRate 0.0014   Epoch: 17   Global Step: 732030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:20,814-Speed 2630.36 samples/sec   Loss 1.7752   LearningRate 0.0014   Epoch: 17   Global Step: 732040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:24,714-Speed 2626.61 samples/sec   Loss 1.7967   LearningRate 0.0014   Epoch: 17   Global Step: 732050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:01:28,603-Speed 2633.74 samples/sec   Loss 1.7512   LearningRate 0.0014   Epoch: 17   Global Step: 732060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:32,510-Speed 2621.80 samples/sec   Loss 1.7733   LearningRate 0.0014   Epoch: 17   Global Step: 732070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:36,407-Speed 2627.94 samples/sec   Loss 1.7854   LearningRate 0.0014   Epoch: 17   Global Step: 732080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:40,304-Speed 2628.17 samples/sec   Loss 1.7790   LearningRate 0.0014   Epoch: 17   Global Step: 732090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:44,213-Speed 2620.55 samples/sec   Loss 1.8636   LearningRate 0.0014   Epoch: 17   Global Step: 732100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:48,110-Speed 2628.20 samples/sec   Loss 1.7090   LearningRate 0.0014   Epoch: 17   Global Step: 732110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:52,005-Speed 2630.12 samples/sec   Loss 1.7186   LearningRate 0.0014   Epoch: 17   Global Step: 732120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:55,903-Speed 2627.95 samples/sec   Loss 1.7037   LearningRate 0.0014   Epoch: 17   Global Step: 732130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:01:59,802-Speed 2626.53 samples/sec   Loss 1.7871   LearningRate 0.0014   Epoch: 17   Global Step: 732140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:03,698-Speed 2629.09 samples/sec   Loss 1.7125   LearningRate 0.0014   Epoch: 17   Global Step: 732150   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:07,590-Speed 2631.48 samples/sec   Loss 1.7271   LearningRate 0.0014   Epoch: 17   Global Step: 732160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:02:11,486-Speed 2628.71 samples/sec   Loss 1.7448   LearningRate 0.0014   Epoch: 17   Global Step: 732170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:02:15,397-Speed 2619.72 samples/sec   Loss 1.7155   LearningRate 0.0014   Epoch: 17   Global Step: 732180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:02:19,297-Speed 2626.11 samples/sec   Loss 1.7206   LearningRate 0.0014   Epoch: 17   Global Step: 732190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:02:23,192-Speed 2629.79 samples/sec   Loss 1.6981   LearningRate 0.0014   Epoch: 17   Global Step: 732200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:02:27,085-Speed 2631.40 samples/sec   Loss 1.8042   LearningRate 0.0014   Epoch: 17   Global Step: 732210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:02:30,975-Speed 2632.74 samples/sec   Loss 1.7724   LearningRate 0.0014   Epoch: 17   Global Step: 732220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:34,897-Speed 2612.00 samples/sec   Loss 1.7847   LearningRate 0.0014   Epoch: 17   Global Step: 732230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:38,824-Speed 2608.39 samples/sec   Loss 1.7689   LearningRate 0.0014   Epoch: 17   Global Step: 732240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:42,721-Speed 2628.01 samples/sec   Loss 1.7406   LearningRate 0.0014   Epoch: 17   Global Step: 732250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:46,614-Speed 2631.18 samples/sec   Loss 1.7590   LearningRate 0.0014   Epoch: 17   Global Step: 732260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:50,513-Speed 2627.53 samples/sec   Loss 1.7407   LearningRate 0.0014   Epoch: 17   Global Step: 732270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:54,406-Speed 2630.78 samples/sec   Loss 1.7225   LearningRate 0.0014   Epoch: 17   Global Step: 732280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:02:58,330-Speed 2611.34 samples/sec   Loss 1.7683   LearningRate 0.0014   Epoch: 17   Global Step: 732290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:02,260-Speed 2606.61 samples/sec   Loss 1.7574   LearningRate 0.0014   Epoch: 17   Global Step: 732300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:06,152-Speed 2631.78 samples/sec   Loss 1.7498   LearningRate 0.0014   Epoch: 17   Global Step: 732310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:10,047-Speed 2629.60 samples/sec   Loss 1.8239   LearningRate 0.0014   Epoch: 17   Global Step: 732320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:03:13,944-Speed 2628.76 samples/sec   Loss 1.7594   LearningRate 0.0014   Epoch: 17   Global Step: 732330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:03:17,839-Speed 2628.97 samples/sec   Loss 1.7358   LearningRate 0.0014   Epoch: 17   Global Step: 732340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:03:21,714-Speed 2644.07 samples/sec   Loss 1.7774   LearningRate 0.0014   Epoch: 17   Global Step: 732350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:25,610-Speed 2630.47 samples/sec   Loss 1.8066   LearningRate 0.0014   Epoch: 17   Global Step: 732360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:29,536-Speed 2608.52 samples/sec   Loss 1.7048   LearningRate 0.0014   Epoch: 17   Global Step: 732370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:33,429-Speed 2631.33 samples/sec   Loss 1.7317   LearningRate 0.0014   Epoch: 17   Global Step: 732380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:37,322-Speed 2631.40 samples/sec   Loss 1.7211   LearningRate 0.0014   Epoch: 17   Global Step: 732390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:41,232-Speed 2619.16 samples/sec   Loss 1.7194   LearningRate 0.0014   Epoch: 17   Global Step: 732400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:45,127-Speed 2629.84 samples/sec   Loss 1.7511   LearningRate 0.0014   Epoch: 17   Global Step: 732410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:49,041-Speed 2616.96 samples/sec   Loss 1.7844   LearningRate 0.0014   Epoch: 17   Global Step: 732420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:52,945-Speed 2623.39 samples/sec   Loss 1.7868   LearningRate 0.0014   Epoch: 17   Global Step: 732430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:03:56,840-Speed 2629.92 samples/sec   Loss 1.7905   LearningRate 0.0014   Epoch: 17   Global Step: 732440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:00,736-Speed 2629.32 samples/sec   Loss 1.7869   LearningRate 0.0014   Epoch: 17   Global Step: 732450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:04:04,607-Speed 2647.10 samples/sec   Loss 1.7478   LearningRate 0.0014   Epoch: 17   Global Step: 732460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:08,514-Speed 2621.24 samples/sec   Loss 1.7858   LearningRate 0.0014   Epoch: 17   Global Step: 732470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:12,408-Speed 2630.66 samples/sec   Loss 1.7297   LearningRate 0.0014   Epoch: 17   Global Step: 732480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:16,313-Speed 2622.91 samples/sec   Loss 1.7164   LearningRate 0.0014   Epoch: 17   Global Step: 732490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:20,213-Speed 2626.17 samples/sec   Loss 1.7322   LearningRate 0.0014   Epoch: 17   Global Step: 732500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:24,114-Speed 2625.96 samples/sec   Loss 1.7102   LearningRate 0.0014   Epoch: 17   Global Step: 732510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:28,013-Speed 2626.30 samples/sec   Loss 1.7835   LearningRate 0.0014   Epoch: 17   Global Step: 732520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:31,912-Speed 2627.99 samples/sec   Loss 1.7186   LearningRate 0.0014   Epoch: 17   Global Step: 732530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:35,825-Speed 2617.49 samples/sec   Loss 1.7154   LearningRate 0.0014   Epoch: 17   Global Step: 732540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:39,725-Speed 2626.18 samples/sec   Loss 1.7492   LearningRate 0.0014   Epoch: 17   Global Step: 732550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:43,618-Speed 2631.03 samples/sec   Loss 1.7546   LearningRate 0.0014   Epoch: 17   Global Step: 732560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:04:47,526-Speed 2621.31 samples/sec   Loss 1.7042   LearningRate 0.0014   Epoch: 17   Global Step: 732570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:04:51,403-Speed 2642.06 samples/sec   Loss 1.7565   LearningRate 0.0014   Epoch: 17   Global Step: 732580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:55,407-Speed 2557.95 samples/sec   Loss 1.7343   LearningRate 0.0014   Epoch: 17   Global Step: 732590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:04:59,327-Speed 2612.80 samples/sec   Loss 1.7597   LearningRate 0.0014   Epoch: 17   Global Step: 732600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:03,224-Speed 2628.40 samples/sec   Loss 1.7082   LearningRate 0.0014   Epoch: 17   Global Step: 732610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:07,115-Speed 2632.54 samples/sec   Loss 1.7745   LearningRate 0.0014   Epoch: 17   Global Step: 732620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:11,029-Speed 2616.98 samples/sec   Loss 1.7294   LearningRate 0.0014   Epoch: 17   Global Step: 732630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:14,931-Speed 2624.70 samples/sec   Loss 1.7656   LearningRate 0.0014   Epoch: 17   Global Step: 732640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:18,824-Speed 2630.97 samples/sec   Loss 1.7870   LearningRate 0.0014   Epoch: 17   Global Step: 732650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:22,721-Speed 2628.57 samples/sec   Loss 1.6874   LearningRate 0.0014   Epoch: 17   Global Step: 732660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:26,614-Speed 2631.11 samples/sec   Loss 1.8004   LearningRate 0.0014   Epoch: 17   Global Step: 732670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:05:30,542-Speed 2607.52 samples/sec   Loss 1.7125   LearningRate 0.0014   Epoch: 17   Global Step: 732680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:05:34,451-Speed 2620.66 samples/sec   Loss 1.7219   LearningRate 0.0014   Epoch: 17   Global Step: 732690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:05:38,340-Speed 2633.57 samples/sec   Loss 1.7287   LearningRate 0.0014   Epoch: 17   Global Step: 732700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:05:42,239-Speed 2626.69 samples/sec   Loss 1.7636   LearningRate 0.0014   Epoch: 17   Global Step: 732710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:05:46,139-Speed 2627.40 samples/sec   Loss 1.7892   LearningRate 0.0014   Epoch: 17   Global Step: 732720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:05:50,046-Speed 2621.61 samples/sec   Loss 1.7617   LearningRate 0.0014   Epoch: 17   Global Step: 732730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:05:53,944-Speed 2628.42 samples/sec   Loss 1.7082   LearningRate 0.0014   Epoch: 17   Global Step: 732740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:05:57,816-Speed 2645.16 samples/sec   Loss 1.7274   LearningRate 0.0014   Epoch: 17   Global Step: 732750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:01,742-Speed 2608.84 samples/sec   Loss 1.7415   LearningRate 0.0014   Epoch: 17   Global Step: 732760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:05,634-Speed 2632.07 samples/sec   Loss 1.7363   LearningRate 0.0014   Epoch: 17   Global Step: 732770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:09,527-Speed 2631.03 samples/sec   Loss 1.6864   LearningRate 0.0014   Epoch: 17   Global Step: 732780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:13,420-Speed 2630.86 samples/sec   Loss 1.7377   LearningRate 0.0014   Epoch: 17   Global Step: 732790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:17,314-Speed 2630.21 samples/sec   Loss 1.7426   LearningRate 0.0014   Epoch: 17   Global Step: 732800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:21,218-Speed 2623.36 samples/sec   Loss 1.8059   LearningRate 0.0014   Epoch: 17   Global Step: 732810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:25,117-Speed 2627.16 samples/sec   Loss 1.7482   LearningRate 0.0014   Epoch: 17   Global Step: 732820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:29,024-Speed 2621.90 samples/sec   Loss 1.7580   LearningRate 0.0014   Epoch: 17   Global Step: 732830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:32,958-Speed 2603.34 samples/sec   Loss 1.7976   LearningRate 0.0014   Epoch: 17   Global Step: 732840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:36,864-Speed 2622.68 samples/sec   Loss 1.7530   LearningRate 0.0014   Epoch: 17   Global Step: 732850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:06:40,764-Speed 2626.00 samples/sec   Loss 1.7628   LearningRate 0.0014   Epoch: 17   Global Step: 732860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:06:44,665-Speed 2626.41 samples/sec   Loss 1.7362   LearningRate 0.0014   Epoch: 17   Global Step: 732870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:06:48,540-Speed 2642.99 samples/sec   Loss 1.7109   LearningRate 0.0014   Epoch: 17   Global Step: 732880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:52,434-Speed 2631.07 samples/sec   Loss 1.7181   LearningRate 0.0014   Epoch: 17   Global Step: 732890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:06:56,328-Speed 2630.33 samples/sec   Loss 1.7331   LearningRate 0.0014   Epoch: 17   Global Step: 732900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:00,231-Speed 2624.30 samples/sec   Loss 1.7085   LearningRate 0.0014   Epoch: 17   Global Step: 732910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:04,143-Speed 2618.33 samples/sec   Loss 1.7197   LearningRate 0.0014   Epoch: 17   Global Step: 732920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:08,043-Speed 2626.10 samples/sec   Loss 1.7128   LearningRate 0.0014   Epoch: 17   Global Step: 732930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:11,946-Speed 2624.11 samples/sec   Loss 1.7239   LearningRate 0.0014   Epoch: 17   Global Step: 732940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:15,844-Speed 2627.53 samples/sec   Loss 1.6939   LearningRate 0.0014   Epoch: 17   Global Step: 732950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:19,754-Speed 2619.60 samples/sec   Loss 1.7636   LearningRate 0.0014   Epoch: 17   Global Step: 732960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:23,648-Speed 2631.35 samples/sec   Loss 1.8165   LearningRate 0.0014   Epoch: 17   Global Step: 732970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:27,539-Speed 2632.23 samples/sec   Loss 1.7535   LearningRate 0.0014   Epoch: 17   Global Step: 732980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:07:31,442-Speed 2624.36 samples/sec   Loss 1.7156   LearningRate 0.0014   Epoch: 17   Global Step: 732990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:07:35,347-Speed 2622.44 samples/sec   Loss 1.7076   LearningRate 0.0014   Epoch: 17   Global Step: 733000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:07:39,228-Speed 2639.45 samples/sec   Loss 1.6887   LearningRate 0.0014   Epoch: 17   Global Step: 733010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:43,123-Speed 2629.68 samples/sec   Loss 1.7419   LearningRate 0.0014   Epoch: 17   Global Step: 733020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:47,017-Speed 2630.87 samples/sec   Loss 1.8003   LearningRate 0.0014   Epoch: 17   Global Step: 733030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:50,912-Speed 2631.43 samples/sec   Loss 1.7349   LearningRate 0.0014   Epoch: 17   Global Step: 733040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:54,806-Speed 2630.20 samples/sec   Loss 1.7739   LearningRate 0.0014   Epoch: 17   Global Step: 733050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:07:58,703-Speed 2628.63 samples/sec   Loss 1.7435   LearningRate 0.0014   Epoch: 17   Global Step: 733060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:08:02,604-Speed 2625.86 samples/sec   Loss 1.7433   LearningRate 0.0014   Epoch: 17   Global Step: 733070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:08:06,497-Speed 2630.45 samples/sec   Loss 1.7473   LearningRate 0.0014   Epoch: 17   Global Step: 733080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:08:10,393-Speed 2628.78 samples/sec   Loss 1.7390   LearningRate 0.0014   Epoch: 17   Global Step: 733090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:08:14,289-Speed 2629.43 samples/sec   Loss 1.8244   LearningRate 0.0014   Epoch: 17   Global Step: 733100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:08:18,209-Speed 2612.59 samples/sec   Loss 1.8003   LearningRate 0.0014   Epoch: 17   Global Step: 733110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:22,101-Speed 2632.20 samples/sec   Loss 1.7285   LearningRate 0.0014   Epoch: 17   Global Step: 733120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:25,995-Speed 2630.54 samples/sec   Loss 1.7021   LearningRate 0.0014   Epoch: 17   Global Step: 733130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:29,887-Speed 2631.98 samples/sec   Loss 1.7377   LearningRate 0.0014   Epoch: 17   Global Step: 733140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:33,781-Speed 2630.04 samples/sec   Loss 1.7273   LearningRate 0.0014   Epoch: 17   Global Step: 733150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:37,684-Speed 2623.95 samples/sec   Loss 1.7147   LearningRate 0.0014   Epoch: 17   Global Step: 733160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:41,591-Speed 2621.37 samples/sec   Loss 1.7167   LearningRate 0.0014   Epoch: 17   Global Step: 733170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:45,490-Speed 2626.81 samples/sec   Loss 1.7020   LearningRate 0.0014   Epoch: 17   Global Step: 733180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:49,385-Speed 2630.33 samples/sec   Loss 1.7803   LearningRate 0.0014   Epoch: 17   Global Step: 733190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:08:53,260-Speed 2643.27 samples/sec   Loss 1.7720   LearningRate 0.0013   Epoch: 17   Global Step: 733200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:08:57,156-Speed 2628.97 samples/sec   Loss 1.7958   LearningRate 0.0013   Epoch: 17   Global Step: 733210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:01,056-Speed 2625.96 samples/sec   Loss 1.7467   LearningRate 0.0013   Epoch: 17   Global Step: 733220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:04,954-Speed 2628.08 samples/sec   Loss 1.7323   LearningRate 0.0013   Epoch: 17   Global Step: 733230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:08,866-Speed 2618.10 samples/sec   Loss 1.7547   LearningRate 0.0013   Epoch: 17   Global Step: 733240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:12,771-Speed 2622.85 samples/sec   Loss 1.7425   LearningRate 0.0013   Epoch: 17   Global Step: 733250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:16,674-Speed 2623.86 samples/sec   Loss 1.6932   LearningRate 0.0013   Epoch: 17   Global Step: 733260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:20,573-Speed 2627.52 samples/sec   Loss 1.7452   LearningRate 0.0013   Epoch: 17   Global Step: 733270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:24,478-Speed 2622.87 samples/sec   Loss 1.7394   LearningRate 0.0013   Epoch: 17   Global Step: 733280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:28,370-Speed 2632.11 samples/sec   Loss 1.6876   LearningRate 0.0013   Epoch: 17   Global Step: 733290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:09:32,346-Speed 2575.76 samples/sec   Loss 1.7850   LearningRate 0.0013   Epoch: 17   Global Step: 733300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:09:36,254-Speed 2621.25 samples/sec   Loss 1.7239   LearningRate 0.0013   Epoch: 17   Global Step: 733310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:09:40,152-Speed 2627.31 samples/sec   Loss 1.7864   LearningRate 0.0013   Epoch: 17   Global Step: 733320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:09:44,054-Speed 2624.81 samples/sec   Loss 1.7698   LearningRate 0.0013   Epoch: 17   Global Step: 733330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:09:47,967-Speed 2617.79 samples/sec   Loss 1.7089   LearningRate 0.0013   Epoch: 17   Global Step: 733340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:09:51,856-Speed 2633.39 samples/sec   Loss 1.7361   LearningRate 0.0013   Epoch: 17   Global Step: 733350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:09:55,747-Speed 2632.54 samples/sec   Loss 1.6881   LearningRate 0.0013   Epoch: 17   Global Step: 733360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:09:59,643-Speed 2629.36 samples/sec   Loss 1.7466   LearningRate 0.0013   Epoch: 17   Global Step: 733370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:10:03,513-Speed 2646.83 samples/sec   Loss 1.6911   LearningRate 0.0013   Epoch: 17   Global Step: 733380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:07,410-Speed 2628.23 samples/sec   Loss 1.7154   LearningRate 0.0013   Epoch: 17   Global Step: 733390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:11,312-Speed 2624.90 samples/sec   Loss 1.7433   LearningRate 0.0013   Epoch: 17   Global Step: 733400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:15,211-Speed 2626.52 samples/sec   Loss 1.7054   LearningRate 0.0013   Epoch: 17   Global Step: 733410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:19,125-Speed 2617.56 samples/sec   Loss 1.7429   LearningRate 0.0013   Epoch: 17   Global Step: 733420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:23,023-Speed 2627.44 samples/sec   Loss 1.6725   LearningRate 0.0013   Epoch: 17   Global Step: 733430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:26,959-Speed 2601.96 samples/sec   Loss 1.7698   LearningRate 0.0013   Epoch: 17   Global Step: 733440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:31,083-Speed 2484.26 samples/sec   Loss 1.7441   LearningRate 0.0013   Epoch: 17   Global Step: 733450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:35,130-Speed 2530.86 samples/sec   Loss 1.7182   LearningRate 0.0013   Epoch: 17   Global Step: 733460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:39,043-Speed 2617.81 samples/sec   Loss 1.7294   LearningRate 0.0013   Epoch: 17   Global Step: 733470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:42,925-Speed 2638.21 samples/sec   Loss 1.7615   LearningRate 0.0013   Epoch: 17   Global Step: 733480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:46,919-Speed 2564.75 samples/sec   Loss 1.8080   LearningRate 0.0013   Epoch: 17   Global Step: 733490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:50,836-Speed 2614.50 samples/sec   Loss 1.7189   LearningRate 0.0013   Epoch: 17   Global Step: 733500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:54,779-Speed 2598.28 samples/sec   Loss 1.7675   LearningRate 0.0013   Epoch: 17   Global Step: 733510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:10:58,886-Speed 2494.21 samples/sec   Loss 1.7327   LearningRate 0.0013   Epoch: 17   Global Step: 733520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:02,790-Speed 2623.49 samples/sec   Loss 1.7436   LearningRate 0.0013   Epoch: 17   Global Step: 733530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:06,687-Speed 2628.06 samples/sec   Loss 1.6890   LearningRate 0.0013   Epoch: 17   Global Step: 733540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:10,620-Speed 2604.80 samples/sec   Loss 1.7310   LearningRate 0.0013   Epoch: 17   Global Step: 733550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:14,697-Speed 2511.78 samples/sec   Loss 1.7013   LearningRate 0.0013   Epoch: 17   Global Step: 733560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:18,595-Speed 2628.03 samples/sec   Loss 1.6850   LearningRate 0.0013   Epoch: 17   Global Step: 733570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:22,514-Speed 2613.36 samples/sec   Loss 1.7340   LearningRate 0.0013   Epoch: 17   Global Step: 733580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:11:26,391-Speed 2641.90 samples/sec   Loss 1.7712   LearningRate 0.0013   Epoch: 17   Global Step: 733590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:30,283-Speed 2632.06 samples/sec   Loss 1.7630   LearningRate 0.0013   Epoch: 17   Global Step: 733600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:34,184-Speed 2626.01 samples/sec   Loss 1.7293   LearningRate 0.0013   Epoch: 17   Global Step: 733610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:38,179-Speed 2563.34 samples/sec   Loss 1.7986   LearningRate 0.0013   Epoch: 17   Global Step: 733620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:42,087-Speed 2620.61 samples/sec   Loss 1.7186   LearningRate 0.0013   Epoch: 17   Global Step: 733630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:45,983-Speed 2629.93 samples/sec   Loss 1.7380   LearningRate 0.0013   Epoch: 17   Global Step: 733640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:49,877-Speed 2630.63 samples/sec   Loss 1.7549   LearningRate 0.0013   Epoch: 17   Global Step: 733650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:53,767-Speed 2633.18 samples/sec   Loss 1.7165   LearningRate 0.0013   Epoch: 17   Global Step: 733660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:11:57,660-Speed 2631.02 samples/sec   Loss 1.7543   LearningRate 0.0013   Epoch: 17   Global Step: 733670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:12:01,601-Speed 2599.45 samples/sec   Loss 1.6929   LearningRate 0.0013   Epoch: 17   Global Step: 733680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:12:05,497-Speed 2629.05 samples/sec   Loss 1.7641   LearningRate 0.0013   Epoch: 17   Global Step: 733690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:09,387-Speed 2632.66 samples/sec   Loss 1.7141   LearningRate 0.0013   Epoch: 17   Global Step: 733700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:13,283-Speed 2629.20 samples/sec   Loss 1.7095   LearningRate 0.0013   Epoch: 17   Global Step: 733710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:17,383-Speed 2498.21 samples/sec   Loss 1.7675   LearningRate 0.0013   Epoch: 17   Global Step: 733720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:21,472-Speed 2504.81 samples/sec   Loss 1.7548   LearningRate 0.0013   Epoch: 17   Global Step: 733730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:25,364-Speed 2631.60 samples/sec   Loss 1.7078   LearningRate 0.0013   Epoch: 17   Global Step: 733740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:29,263-Speed 2627.61 samples/sec   Loss 1.7345   LearningRate 0.0013   Epoch: 17   Global Step: 733750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:33,164-Speed 2624.87 samples/sec   Loss 1.7747   LearningRate 0.0013   Epoch: 17   Global Step: 733760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:37,057-Speed 2630.91 samples/sec   Loss 1.7298   LearningRate 0.0013   Epoch: 17   Global Step: 733770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:12:40,936-Speed 2640.69 samples/sec   Loss 1.7282   LearningRate 0.0013   Epoch: 17   Global Step: 733780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:12:44,934-Speed 2562.92 samples/sec   Loss 1.6929   LearningRate 0.0013   Epoch: 17   Global Step: 733790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:12:49,051-Speed 2487.39 samples/sec   Loss 1.6533   LearningRate 0.0013   Epoch: 17   Global Step: 733800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:12:52,970-Speed 2613.74 samples/sec   Loss 1.7262   LearningRate 0.0013   Epoch: 17   Global Step: 733810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:12:56,924-Speed 2590.68 samples/sec   Loss 1.6997   LearningRate 0.0013   Epoch: 17   Global Step: 733820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:00,906-Speed 2572.35 samples/sec   Loss 1.7774   LearningRate 0.0013   Epoch: 17   Global Step: 733830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:04,804-Speed 2627.78 samples/sec   Loss 1.7416   LearningRate 0.0013   Epoch: 17   Global Step: 733840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:08,705-Speed 2625.44 samples/sec   Loss 1.7435   LearningRate 0.0013   Epoch: 17   Global Step: 733850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:12,635-Speed 2605.85 samples/sec   Loss 1.7116   LearningRate 0.0013   Epoch: 17   Global Step: 733860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:16,553-Speed 2615.02 samples/sec   Loss 1.6954   LearningRate 0.0013   Epoch: 17   Global Step: 733870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:20,460-Speed 2621.20 samples/sec   Loss 1.7044   LearningRate 0.0013   Epoch: 17   Global Step: 733880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:13:24,371-Speed 2619.10 samples/sec   Loss 1.6667   LearningRate 0.0013   Epoch: 17   Global Step: 733890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:13:28,265-Speed 2630.65 samples/sec   Loss 1.7202   LearningRate 0.0013   Epoch: 17   Global Step: 733900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:13:32,165-Speed 2626.85 samples/sec   Loss 1.7569   LearningRate 0.0013   Epoch: 17   Global Step: 733910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:13:36,040-Speed 2642.76 samples/sec   Loss 1.6745   LearningRate 0.0013   Epoch: 17   Global Step: 733920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:39,949-Speed 2620.06 samples/sec   Loss 1.7135   LearningRate 0.0013   Epoch: 17   Global Step: 733930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:43,861-Speed 2618.36 samples/sec   Loss 1.6767   LearningRate 0.0013   Epoch: 17   Global Step: 733940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:47,770-Speed 2620.71 samples/sec   Loss 1.7543   LearningRate 0.0013   Epoch: 17   Global Step: 733950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:51,671-Speed 2625.24 samples/sec   Loss 1.7288   LearningRate 0.0013   Epoch: 17   Global Step: 733960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:55,566-Speed 2629.61 samples/sec   Loss 1.6768   LearningRate 0.0013   Epoch: 17   Global Step: 733970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:13:59,461-Speed 2630.14 samples/sec   Loss 1.7419   LearningRate 0.0013   Epoch: 17   Global Step: 733980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:03,356-Speed 2629.36 samples/sec   Loss 1.7344   LearningRate 0.0013   Epoch: 17   Global Step: 733990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:07,249-Speed 2630.87 samples/sec   Loss 1.7709   LearningRate 0.0013   Epoch: 17   Global Step: 734000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:11,144-Speed 2629.68 samples/sec   Loss 1.6881   LearningRate 0.0013   Epoch: 17   Global Step: 734010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:15,057-Speed 2617.98 samples/sec   Loss 1.7266   LearningRate 0.0013   Epoch: 17   Global Step: 734020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:14:18,925-Speed 2647.62 samples/sec   Loss 1.7142   LearningRate 0.0013   Epoch: 17   Global Step: 734030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:22,818-Speed 2631.68 samples/sec   Loss 1.6811   LearningRate 0.0013   Epoch: 17   Global Step: 734040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:26,714-Speed 2629.00 samples/sec   Loss 1.6766   LearningRate 0.0013   Epoch: 17   Global Step: 734050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:30,612-Speed 2627.40 samples/sec   Loss 1.7109   LearningRate 0.0013   Epoch: 17   Global Step: 734060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:34,554-Speed 2599.02 samples/sec   Loss 1.6979   LearningRate 0.0013   Epoch: 17   Global Step: 734070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:38,451-Speed 2628.19 samples/sec   Loss 1.6758   LearningRate 0.0013   Epoch: 17   Global Step: 734080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:42,353-Speed 2624.54 samples/sec   Loss 1.7821   LearningRate 0.0013   Epoch: 17   Global Step: 734090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:46,322-Speed 2581.07 samples/sec   Loss 1.7363   LearningRate 0.0013   Epoch: 17   Global Step: 734100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:50,292-Speed 2580.52 samples/sec   Loss 1.6825   LearningRate 0.0013   Epoch: 17   Global Step: 734110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:54,243-Speed 2592.39 samples/sec   Loss 1.7173   LearningRate 0.0013   Epoch: 17   Global Step: 734120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:14:58,147-Speed 2623.96 samples/sec   Loss 1.7461   LearningRate 0.0013   Epoch: 17   Global Step: 734130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:02,051-Speed 2623.41 samples/sec   Loss 1.7354   LearningRate 0.0013   Epoch: 17   Global Step: 734140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:05,944-Speed 2631.61 samples/sec   Loss 1.7705   LearningRate 0.0013   Epoch: 17   Global Step: 734150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:09,844-Speed 2625.83 samples/sec   Loss 1.7882   LearningRate 0.0013   Epoch: 17   Global Step: 734160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:13,744-Speed 2626.35 samples/sec   Loss 1.7838   LearningRate 0.0013   Epoch: 17   Global Step: 734170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:17,642-Speed 2627.46 samples/sec   Loss 1.7948   LearningRate 0.0013   Epoch: 17   Global Step: 734180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:21,545-Speed 2624.84 samples/sec   Loss 1.7422   LearningRate 0.0013   Epoch: 17   Global Step: 734190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:25,447-Speed 2624.85 samples/sec   Loss 1.6845   LearningRate 0.0013   Epoch: 17   Global Step: 734200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:29,347-Speed 2626.48 samples/sec   Loss 1.6977   LearningRate 0.0013   Epoch: 17   Global Step: 734210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:33,239-Speed 2631.66 samples/sec   Loss 1.7389   LearningRate 0.0013   Epoch: 17   Global Step: 734220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:37,116-Speed 2642.14 samples/sec   Loss 1.7239   LearningRate 0.0013   Epoch: 17   Global Step: 734230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:41,021-Speed 2622.99 samples/sec   Loss 1.7198   LearningRate 0.0013   Epoch: 17   Global Step: 734240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:44,922-Speed 2625.28 samples/sec   Loss 1.7320   LearningRate 0.0013   Epoch: 17   Global Step: 734250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:48,821-Speed 2626.32 samples/sec   Loss 1.7252   LearningRate 0.0013   Epoch: 17   Global Step: 734260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:52,725-Speed 2624.15 samples/sec   Loss 1.7788   LearningRate 0.0013   Epoch: 17   Global Step: 734270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:15:56,625-Speed 2626.66 samples/sec   Loss 1.7971   LearningRate 0.0013   Epoch: 17   Global Step: 734280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:16:00,527-Speed 2624.85 samples/sec   Loss 1.7152   LearningRate 0.0013   Epoch: 17   Global Step: 734290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:16:04,403-Speed 2642.89 samples/sec   Loss 1.7563   LearningRate 0.0013   Epoch: 17   Global Step: 734300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:08,300-Speed 2628.30 samples/sec   Loss 1.7635   LearningRate 0.0013   Epoch: 17   Global Step: 734310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:12,192-Speed 2631.07 samples/sec   Loss 1.6847   LearningRate 0.0013   Epoch: 17   Global Step: 734320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:16,095-Speed 2624.73 samples/sec   Loss 1.6837   LearningRate 0.0013   Epoch: 17   Global Step: 734330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:19,987-Speed 2631.68 samples/sec   Loss 1.6686   LearningRate 0.0013   Epoch: 17   Global Step: 734340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:23,881-Speed 2630.72 samples/sec   Loss 1.7218   LearningRate 0.0013   Epoch: 17   Global Step: 734350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:27,776-Speed 2629.40 samples/sec   Loss 1.6921   LearningRate 0.0013   Epoch: 17   Global Step: 734360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:31,673-Speed 2628.54 samples/sec   Loss 1.7005   LearningRate 0.0013   Epoch: 17   Global Step: 734370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:35,575-Speed 2624.82 samples/sec   Loss 1.6620   LearningRate 0.0013   Epoch: 17   Global Step: 734380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:39,479-Speed 2623.64 samples/sec   Loss 1.7285   LearningRate 0.0013   Epoch: 17   Global Step: 734390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:16:43,386-Speed 2621.51 samples/sec   Loss 1.6888   LearningRate 0.0013   Epoch: 17   Global Step: 734400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:16:47,297-Speed 2619.13 samples/sec   Loss 1.7321   LearningRate 0.0013   Epoch: 17   Global Step: 734410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:16:51,216-Speed 2613.46 samples/sec   Loss 1.7585   LearningRate 0.0013   Epoch: 17   Global Step: 734420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:16:55,143-Speed 2608.30 samples/sec   Loss 1.7110   LearningRate 0.0013   Epoch: 17   Global Step: 734430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:16:59,018-Speed 2643.79 samples/sec   Loss 1.7782   LearningRate 0.0013   Epoch: 17   Global Step: 734440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:02,911-Speed 2630.83 samples/sec   Loss 1.6910   LearningRate 0.0013   Epoch: 17   Global Step: 734450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:06,842-Speed 2605.33 samples/sec   Loss 1.6533   LearningRate 0.0013   Epoch: 17   Global Step: 734460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:10,753-Speed 2619.17 samples/sec   Loss 1.7562   LearningRate 0.0013   Epoch: 17   Global Step: 734470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:14,668-Speed 2616.38 samples/sec   Loss 1.7587   LearningRate 0.0013   Epoch: 17   Global Step: 734480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:18,570-Speed 2624.38 samples/sec   Loss 1.7266   LearningRate 0.0013   Epoch: 17   Global Step: 734490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:22,502-Speed 2605.50 samples/sec   Loss 1.7457   LearningRate 0.0013   Epoch: 17   Global Step: 734500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:26,398-Speed 2629.18 samples/sec   Loss 1.6876   LearningRate 0.0013   Epoch: 17   Global Step: 734510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:30,303-Speed 2623.08 samples/sec   Loss 1.6778   LearningRate 0.0013   Epoch: 17   Global Step: 734520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:34,209-Speed 2622.01 samples/sec   Loss 1.7690   LearningRate 0.0013   Epoch: 17   Global Step: 734530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:17:38,100-Speed 2632.12 samples/sec   Loss 1.7150   LearningRate 0.0013   Epoch: 17   Global Step: 734540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:17:42,004-Speed 2623.82 samples/sec   Loss 1.7105   LearningRate 0.0013   Epoch: 17   Global Step: 734550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:17:46,017-Speed 2552.09 samples/sec   Loss 1.7426   LearningRate 0.0013   Epoch: 17   Global Step: 734560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:17:49,946-Speed 2607.06 samples/sec   Loss 1.6599   LearningRate 0.0013   Epoch: 17   Global Step: 734570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:17:53,843-Speed 2628.55 samples/sec   Loss 1.7462   LearningRate 0.0013   Epoch: 17   Global Step: 734580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:17:57,739-Speed 2628.69 samples/sec   Loss 1.7056   LearningRate 0.0013   Epoch: 17   Global Step: 734590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:18:01,618-Speed 2641.31 samples/sec   Loss 1.7255   LearningRate 0.0013   Epoch: 17   Global Step: 734600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:05,519-Speed 2625.32 samples/sec   Loss 1.7134   LearningRate 0.0013   Epoch: 17   Global Step: 734610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:09,432-Speed 2617.43 samples/sec   Loss 1.6927   LearningRate 0.0013   Epoch: 17   Global Step: 734620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:13,338-Speed 2621.99 samples/sec   Loss 1.6755   LearningRate 0.0013   Epoch: 17   Global Step: 734630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:17,256-Speed 2614.52 samples/sec   Loss 1.6867   LearningRate 0.0013   Epoch: 17   Global Step: 734640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:21,150-Speed 2630.20 samples/sec   Loss 1.7001   LearningRate 0.0013   Epoch: 17   Global Step: 734650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:25,043-Speed 2631.49 samples/sec   Loss 1.6926   LearningRate 0.0013   Epoch: 17   Global Step: 734660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:28,938-Speed 2629.72 samples/sec   Loss 1.6680   LearningRate 0.0013   Epoch: 17   Global Step: 734670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:32,829-Speed 2632.44 samples/sec   Loss 1.7018   LearningRate 0.0013   Epoch: 17   Global Step: 734680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:36,723-Speed 2630.11 samples/sec   Loss 1.7063   LearningRate 0.0013   Epoch: 17   Global Step: 734690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:40,619-Speed 2629.10 samples/sec   Loss 1.7337   LearningRate 0.0013   Epoch: 17   Global Step: 734700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:18:44,529-Speed 2619.44 samples/sec   Loss 1.7142   LearningRate 0.0013   Epoch: 17   Global Step: 734710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:18:48,421-Speed 2631.49 samples/sec   Loss 1.7041   LearningRate 0.0013   Epoch: 17   Global Step: 734720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:18:52,291-Speed 2647.36 samples/sec   Loss 1.7095   LearningRate 0.0013   Epoch: 17   Global Step: 734730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:18:56,187-Speed 2628.54 samples/sec   Loss 1.7395   LearningRate 0.0013   Epoch: 17   Global Step: 734740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:00,118-Speed 2606.48 samples/sec   Loss 1.6970   LearningRate 0.0013   Epoch: 17   Global Step: 734750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:04,013-Speed 2629.51 samples/sec   Loss 1.7346   LearningRate 0.0013   Epoch: 17   Global Step: 734760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:07,918-Speed 2622.69 samples/sec   Loss 1.7441   LearningRate 0.0013   Epoch: 17   Global Step: 734770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:11,816-Speed 2627.47 samples/sec   Loss 1.7505   LearningRate 0.0013   Epoch: 17   Global Step: 734780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:15,709-Speed 2631.61 samples/sec   Loss 1.6810   LearningRate 0.0013   Epoch: 17   Global Step: 734790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:19,604-Speed 2629.51 samples/sec   Loss 1.7547   LearningRate 0.0013   Epoch: 17   Global Step: 734800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:23,501-Speed 2628.60 samples/sec   Loss 1.7035   LearningRate 0.0013   Epoch: 17   Global Step: 734810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:27,394-Speed 2630.80 samples/sec   Loss 1.6973   LearningRate 0.0013   Epoch: 17   Global Step: 734820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:19:31,287-Speed 2631.17 samples/sec   Loss 1.7513   LearningRate 0.0013   Epoch: 17   Global Step: 734830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:19:35,179-Speed 2631.89 samples/sec   Loss 1.7275   LearningRate 0.0013   Epoch: 17   Global Step: 734840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:19:39,074-Speed 2629.71 samples/sec   Loss 1.7025   LearningRate 0.0013   Epoch: 17   Global Step: 734850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:19:42,967-Speed 2630.52 samples/sec   Loss 1.7081   LearningRate 0.0013   Epoch: 17   Global Step: 734860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:19:46,870-Speed 2624.35 samples/sec   Loss 1.7474   LearningRate 0.0013   Epoch: 17   Global Step: 734870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:19:50,760-Speed 2633.39 samples/sec   Loss 1.8029   LearningRate 0.0013   Epoch: 17   Global Step: 734880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:19:54,655-Speed 2629.53 samples/sec   Loss 1.7132   LearningRate 0.0013   Epoch: 17   Global Step: 734890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:19:58,545-Speed 2632.69 samples/sec   Loss 1.7225   LearningRate 0.0013   Epoch: 17   Global Step: 734900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:20:02,446-Speed 2626.21 samples/sec   Loss 1.7211   LearningRate 0.0013   Epoch: 17   Global Step: 734910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:20:06,357-Speed 2619.02 samples/sec   Loss 1.6780   LearningRate 0.0013   Epoch: 17   Global Step: 734920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:20:10,257-Speed 2626.41 samples/sec   Loss 1.7070   LearningRate 0.0013   Epoch: 17   Global Step: 734930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-16 06:20:14,145-Speed 2633.92 samples/sec   Loss 1.7020   LearningRate 0.0013   Epoch: 17   Global Step: 734940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:20:18,023-Speed 2641.46 samples/sec   Loss 1.7500   LearningRate 0.0013   Epoch: 17   Global Step: 734950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:21,933-Speed 2619.60 samples/sec   Loss 1.7272   LearningRate 0.0013   Epoch: 17   Global Step: 734960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:25,826-Speed 2630.78 samples/sec   Loss 1.7596   LearningRate 0.0013   Epoch: 17   Global Step: 734970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:29,730-Speed 2624.20 samples/sec   Loss 1.7141   LearningRate 0.0013   Epoch: 17   Global Step: 734980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:33,622-Speed 2631.25 samples/sec   Loss 1.6970   LearningRate 0.0013   Epoch: 17   Global Step: 734990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:37,523-Speed 2626.44 samples/sec   Loss 1.7129   LearningRate 0.0013   Epoch: 17   Global Step: 735000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:41,427-Speed 2623.12 samples/sec   Loss 1.6974   LearningRate 0.0013   Epoch: 17   Global Step: 735010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:45,319-Speed 2632.10 samples/sec   Loss 1.7236   LearningRate 0.0013   Epoch: 17   Global Step: 735020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:49,211-Speed 2631.06 samples/sec   Loss 1.6561   LearningRate 0.0013   Epoch: 17   Global Step: 735030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:53,104-Speed 2631.56 samples/sec   Loss 1.7154   LearningRate 0.0013   Epoch: 17   Global Step: 735040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:20:56,999-Speed 2630.23 samples/sec   Loss 1.6863   LearningRate 0.0013   Epoch: 17   Global Step: 735050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:21:00,897-Speed 2627.09 samples/sec   Loss 1.7966   LearningRate 0.0013   Epoch: 17   Global Step: 735060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:21:04,781-Speed 2637.81 samples/sec   Loss 1.7323   LearningRate 0.0013   Epoch: 17   Global Step: 735070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:08,680-Speed 2626.47 samples/sec   Loss 1.7477   LearningRate 0.0013   Epoch: 17   Global Step: 735080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:12,575-Speed 2629.55 samples/sec   Loss 1.6337   LearningRate 0.0013   Epoch: 17   Global Step: 735090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:16,489-Speed 2617.27 samples/sec   Loss 1.6993   LearningRate 0.0013   Epoch: 17   Global Step: 735100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:20,382-Speed 2630.95 samples/sec   Loss 1.7125   LearningRate 0.0013   Epoch: 17   Global Step: 735110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:24,300-Speed 2614.24 samples/sec   Loss 1.7039   LearningRate 0.0013   Epoch: 17   Global Step: 735120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:28,193-Speed 2631.29 samples/sec   Loss 1.7421   LearningRate 0.0013   Epoch: 17   Global Step: 735130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:32,087-Speed 2630.09 samples/sec   Loss 1.6610   LearningRate 0.0013   Epoch: 17   Global Step: 735140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:35,980-Speed 2631.35 samples/sec   Loss 1.6986   LearningRate 0.0013   Epoch: 17   Global Step: 735150   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:39,874-Speed 2630.57 samples/sec   Loss 1.6939   LearningRate 0.0013   Epoch: 17   Global Step: 735160   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:43,785-Speed 2618.62 samples/sec   Loss 1.7203   LearningRate 0.0013   Epoch: 17   Global Step: 735170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:21:47,679-Speed 2629.83 samples/sec   Loss 1.7309   LearningRate 0.0013   Epoch: 17   Global Step: 735180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:21:51,575-Speed 2629.40 samples/sec   Loss 1.6264   LearningRate 0.0013   Epoch: 17   Global Step: 735190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:21:55,464-Speed 2633.99 samples/sec   Loss 1.7145   LearningRate 0.0013   Epoch: 17   Global Step: 735200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:21:59,358-Speed 2631.44 samples/sec   Loss 1.6973   LearningRate 0.0013   Epoch: 17   Global Step: 735210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:03,270-Speed 2617.93 samples/sec   Loss 1.7504   LearningRate 0.0013   Epoch: 17   Global Step: 735220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:07,184-Speed 2617.41 samples/sec   Loss 1.6865   LearningRate 0.0013   Epoch: 17   Global Step: 735230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:11,080-Speed 2628.87 samples/sec   Loss 1.7088   LearningRate 0.0013   Epoch: 17   Global Step: 735240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:14,984-Speed 2623.78 samples/sec   Loss 1.7912   LearningRate 0.0013   Epoch: 17   Global Step: 735250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:18,883-Speed 2626.46 samples/sec   Loss 1.6932   LearningRate 0.0013   Epoch: 17   Global Step: 735260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:22,785-Speed 2625.46 samples/sec   Loss 1.7112   LearningRate 0.0013   Epoch: 17   Global Step: 735270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:26,696-Speed 2619.36 samples/sec   Loss 1.6881   LearningRate 0.0013   Epoch: 17   Global Step: 735280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:30,590-Speed 2629.83 samples/sec   Loss 1.7035   LearningRate 0.0013   Epoch: 17   Global Step: 735290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:34,487-Speed 2628.23 samples/sec   Loss 1.7547   LearningRate 0.0013   Epoch: 17   Global Step: 735300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:22:38,381-Speed 2630.57 samples/sec   Loss 1.7392   LearningRate 0.0013   Epoch: 17   Global Step: 735310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:22:42,259-Speed 2640.99 samples/sec   Loss 1.7655   LearningRate 0.0013   Epoch: 17   Global Step: 735320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:46,155-Speed 2629.00 samples/sec   Loss 1.7447   LearningRate 0.0013   Epoch: 17   Global Step: 735330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:50,053-Speed 2627.90 samples/sec   Loss 1.7094   LearningRate 0.0013   Epoch: 17   Global Step: 735340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:53,944-Speed 2631.96 samples/sec   Loss 1.7312   LearningRate 0.0013   Epoch: 17   Global Step: 735350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:22:57,837-Speed 2631.49 samples/sec   Loss 1.6847   LearningRate 0.0013   Epoch: 17   Global Step: 735360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:23:01,729-Speed 2631.69 samples/sec   Loss 1.7402   LearningRate 0.0013   Epoch: 17   Global Step: 735370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:23:05,621-Speed 2631.62 samples/sec   Loss 1.7053   LearningRate 0.0013   Epoch: 17   Global Step: 735380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:23:09,527-Speed 2622.30 samples/sec   Loss 1.7421   LearningRate 0.0013   Epoch: 17   Global Step: 735390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:23:13,421-Speed 2631.35 samples/sec   Loss 1.6594   LearningRate 0.0013   Epoch: 17   Global Step: 735400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:23:17,399-Speed 2574.81 samples/sec   Loss 1.6880   LearningRate 0.0013   Epoch: 17   Global Step: 735410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:23:21,323-Speed 2610.83 samples/sec   Loss 1.6988   LearningRate 0.0013   Epoch: 17   Global Step: 735420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:25,226-Speed 2624.65 samples/sec   Loss 1.7069   LearningRate 0.0013   Epoch: 17   Global Step: 735430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:29,169-Speed 2597.15 samples/sec   Loss 1.7445   LearningRate 0.0013   Epoch: 17   Global Step: 735440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:33,066-Speed 2628.68 samples/sec   Loss 1.6771   LearningRate 0.0013   Epoch: 17   Global Step: 735450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:36,966-Speed 2626.53 samples/sec   Loss 1.7280   LearningRate 0.0013   Epoch: 17   Global Step: 735460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:40,860-Speed 2629.95 samples/sec   Loss 1.7026   LearningRate 0.0013   Epoch: 17   Global Step: 735470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:44,767-Speed 2621.73 samples/sec   Loss 1.7032   LearningRate 0.0013   Epoch: 17   Global Step: 735480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:48,664-Speed 2628.51 samples/sec   Loss 1.6715   LearningRate 0.0013   Epoch: 17   Global Step: 735490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:52,560-Speed 2629.79 samples/sec   Loss 1.6972   LearningRate 0.0013   Epoch: 17   Global Step: 735500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:23:56,457-Speed 2628.58 samples/sec   Loss 1.6928   LearningRate 0.0013   Epoch: 17   Global Step: 735510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:00,360-Speed 2624.39 samples/sec   Loss 1.7012   LearningRate 0.0013   Epoch: 17   Global Step: 735520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-16 06:24:04,243-Speed 2637.45 samples/sec   Loss 1.7099   LearningRate 0.0013   Epoch: 17   Global Step: 735530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:08,140-Speed 2628.66 samples/sec   Loss 1.6858   LearningRate 0.0013   Epoch: 17   Global Step: 735540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:12,041-Speed 2625.63 samples/sec   Loss 1.7888   LearningRate 0.0013   Epoch: 17   Global Step: 735550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:15,947-Speed 2622.08 samples/sec   Loss 1.7122   LearningRate 0.0013   Epoch: 17   Global Step: 735560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:19,852-Speed 2622.62 samples/sec   Loss 1.7586   LearningRate 0.0013   Epoch: 17   Global Step: 735570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:23,747-Speed 2629.84 samples/sec   Loss 1.6775   LearningRate 0.0013   Epoch: 17   Global Step: 735580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:27,642-Speed 2630.03 samples/sec   Loss 1.6824   LearningRate 0.0013   Epoch: 17   Global Step: 735590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:31,553-Speed 2619.08 samples/sec   Loss 1.7665   LearningRate 0.0013   Epoch: 17   Global Step: 735600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:24:35,423-Speed 2646.09 samples/sec   Loss 1.6865   LearningRate 0.0013   Epoch: 17   Global Step: 735610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:24:39,320-Speed 2628.54 samples/sec   Loss 1.6856   LearningRate 0.0013   Epoch: 17   Global Step: 735620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:24:43,216-Speed 2628.77 samples/sec   Loss 1.7213   LearningRate 0.0013   Epoch: 17   Global Step: 735630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:24:47,109-Speed 2631.24 samples/sec   Loss 1.7148   LearningRate 0.0013   Epoch: 17   Global Step: 735640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:24:51,004-Speed 2629.04 samples/sec   Loss 1.7668   LearningRate 0.0013   Epoch: 17   Global Step: 735650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:24:54,900-Speed 2630.01 samples/sec   Loss 1.6860   LearningRate 0.0013   Epoch: 17   Global Step: 735660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:24:58,798-Speed 2627.96 samples/sec   Loss 1.7035   LearningRate 0.0013   Epoch: 17   Global Step: 735670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:02,693-Speed 2629.70 samples/sec   Loss 1.7256   LearningRate 0.0013   Epoch: 17   Global Step: 735680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:06,591-Speed 2627.47 samples/sec   Loss 1.7512   LearningRate 0.0013   Epoch: 17   Global Step: 735690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:10,492-Speed 2625.58 samples/sec   Loss 1.7506   LearningRate 0.0013   Epoch: 17   Global Step: 735700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:14,388-Speed 2628.98 samples/sec   Loss 1.6932   LearningRate 0.0013   Epoch: 17   Global Step: 735710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:25:18,289-Speed 2626.26 samples/sec   Loss 1.6733   LearningRate 0.0013   Epoch: 17   Global Step: 735720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:25:22,199-Speed 2619.69 samples/sec   Loss 1.6826   LearningRate 0.0013   Epoch: 17   Global Step: 735730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:25:26,093-Speed 2630.46 samples/sec   Loss 1.7104   LearningRate 0.0013   Epoch: 17   Global Step: 735740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:25:29,991-Speed 2631.17 samples/sec   Loss 1.7406   LearningRate 0.0013   Epoch: 17   Global Step: 735750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:25:33,878-Speed 2634.56 samples/sec   Loss 1.7361   LearningRate 0.0013   Epoch: 17   Global Step: 735760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:37,771-Speed 2630.95 samples/sec   Loss 1.7093   LearningRate 0.0013   Epoch: 17   Global Step: 735770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:41,667-Speed 2629.49 samples/sec   Loss 1.6817   LearningRate 0.0013   Epoch: 17   Global Step: 735780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:45,592-Speed 2630.67 samples/sec   Loss 1.6934   LearningRate 0.0013   Epoch: 17   Global Step: 735790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:49,490-Speed 2628.10 samples/sec   Loss 1.7675   LearningRate 0.0013   Epoch: 17   Global Step: 735800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:53,670-Speed 2633.31 samples/sec   Loss 1.6613   LearningRate 0.0013   Epoch: 17   Global Step: 735810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:25:57,570-Speed 2626.60 samples/sec   Loss 1.6519   LearningRate 0.0013   Epoch: 17   Global Step: 735820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:26:01,498-Speed 2611.72 samples/sec   Loss 1.6487   LearningRate 0.0013   Epoch: 17   Global Step: 735830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:26:05,398-Speed 2625.93 samples/sec   Loss 1.7298   LearningRate 0.0013   Epoch: 17   Global Step: 735840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:26:09,303-Speed 2623.34 samples/sec   Loss 1.6823   LearningRate 0.0013   Epoch: 17   Global Step: 735850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:26:13,216-Speed 2617.13 samples/sec   Loss 1.7291   LearningRate 0.0013   Epoch: 17   Global Step: 735860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:17,354-Speed 2632.99 samples/sec   Loss 1.7122   LearningRate 0.0013   Epoch: 17   Global Step: 735870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:21,251-Speed 2628.64 samples/sec   Loss 1.6915   LearningRate 0.0013   Epoch: 17   Global Step: 735880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:25,176-Speed 2609.33 samples/sec   Loss 1.6791   LearningRate 0.0013   Epoch: 17   Global Step: 735890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:29,107-Speed 2605.76 samples/sec   Loss 1.7087   LearningRate 0.0013   Epoch: 17   Global Step: 735900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:33,007-Speed 2626.69 samples/sec   Loss 1.6999   LearningRate 0.0013   Epoch: 17   Global Step: 735910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:36,901-Speed 2630.65 samples/sec   Loss 1.6859   LearningRate 0.0013   Epoch: 17   Global Step: 735920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:40,796-Speed 2629.30 samples/sec   Loss 1.6893   LearningRate 0.0013   Epoch: 17   Global Step: 735930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:26:44,664-Speed 2647.84 samples/sec   Loss 1.7498   LearningRate 0.0013   Epoch: 17   Global Step: 735940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:26:48,564-Speed 2626.19 samples/sec   Loss 1.7139   LearningRate 0.0013   Epoch: 17   Global Step: 735950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:26:52,475-Speed 2619.38 samples/sec   Loss 1.6860   LearningRate 0.0013   Epoch: 17   Global Step: 735960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:26:56,389-Speed 2616.92 samples/sec   Loss 1.6611   LearningRate 0.0013   Epoch: 17   Global Step: 735970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:00,279-Speed 2633.16 samples/sec   Loss 1.7216   LearningRate 0.0013   Epoch: 17   Global Step: 735980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:04,184-Speed 2622.96 samples/sec   Loss 1.7017   LearningRate 0.0013   Epoch: 17   Global Step: 735990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:08,083-Speed 2627.24 samples/sec   Loss 1.7026   LearningRate 0.0013   Epoch: 17   Global Step: 736000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:11,973-Speed 2632.92 samples/sec   Loss 1.7015   LearningRate 0.0013   Epoch: 17   Global Step: 736010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:15,868-Speed 2629.49 samples/sec   Loss 1.6597   LearningRate 0.0013   Epoch: 17   Global Step: 736020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:19,768-Speed 2626.41 samples/sec   Loss 1.7337   LearningRate 0.0013   Epoch: 17   Global Step: 736030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:23,661-Speed 2631.51 samples/sec   Loss 1.7266   LearningRate 0.0013   Epoch: 17   Global Step: 736040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:27:27,527-Speed 2649.06 samples/sec   Loss 1.7020   LearningRate 0.0013   Epoch: 17   Global Step: 736050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:31,424-Speed 2628.59 samples/sec   Loss 1.7213   LearningRate 0.0013   Epoch: 17   Global Step: 736060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:35,338-Speed 2617.37 samples/sec   Loss 1.7067   LearningRate 0.0013   Epoch: 17   Global Step: 736070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:39,239-Speed 2625.37 samples/sec   Loss 1.6904   LearningRate 0.0013   Epoch: 17   Global Step: 736080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:43,141-Speed 2624.71 samples/sec   Loss 1.7031   LearningRate 0.0013   Epoch: 17   Global Step: 736090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:47,045-Speed 2623.84 samples/sec   Loss 1.6191   LearningRate 0.0013   Epoch: 17   Global Step: 736100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:50,961-Speed 2616.42 samples/sec   Loss 1.6775   LearningRate 0.0013   Epoch: 17   Global Step: 736110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:54,856-Speed 2629.99 samples/sec   Loss 1.7608   LearningRate 0.0013   Epoch: 17   Global Step: 736120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:27:58,757-Speed 2625.43 samples/sec   Loss 1.7201   LearningRate 0.0013   Epoch: 17   Global Step: 736130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:28:02,663-Speed 2622.77 samples/sec   Loss 1.6992   LearningRate 0.0013   Epoch: 17   Global Step: 736140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-16 06:28:06,562-Speed 2627.00 samples/sec   Loss 1.7233   LearningRate 0.0013   Epoch: 17   Global Step: 736150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-16 06:28:10,456-Speed 2630.20 samples/sec   Loss 1.7190   LearningRate 0.0013   Epoch: 17   Global Step: 736160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:28:14,380-Speed 2610.30 samples/sec   Loss 1.7219   LearningRate 0.0013   Epoch: 17   Global Step: 736170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:28:18,259-Speed 2639.92 samples/sec   Loss 1.6984   LearningRate 0.0013   Epoch: 17   Global Step: 736180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:28:22,174-Speed 2617.34 samples/sec   Loss 1.6915   LearningRate 0.0013   Epoch: 17   Global Step: 736190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:28:26,073-Speed 2626.26 samples/sec   Loss 1.6995   LearningRate 0.0013   Epoch: 17   Global Step: 736200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:28:29,966-Speed 2630.91 samples/sec   Loss 1.7169   LearningRate 0.0013   Epoch: 17   Global Step: 736210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:28:33,869-Speed 2625.32 samples/sec   Loss 1.7125   LearningRate 0.0013   Epoch: 17   Global Step: 736220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:28:37,745-Speed 2642.49 samples/sec   Loss 1.6839   LearningRate 0.0013   Epoch: 17   Global Step: 736230   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:28:41,634-Speed 2633.44 samples/sec   Loss 1.6775   LearningRate 0.0013   Epoch: 17   Global Step: 736240   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:28:45,568-Speed 2604.42 samples/sec   Loss 1.6785   LearningRate 0.0013   Epoch: 17   Global Step: 736250   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:28:49,471-Speed 2624.31 samples/sec   Loss 1.6879   LearningRate 0.0013   Epoch: 17   Global Step: 736260   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:28:53,380-Speed 2622.77 samples/sec   Loss 1.6634   LearningRate 0.0013   Epoch: 17   Global Step: 736270   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:28:57,283-Speed 2624.92 samples/sec   Loss 1.7025   LearningRate 0.0013   Epoch: 17   Global Step: 736280   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:29:01,178-Speed 2629.69 samples/sec   Loss 1.7031   LearningRate 0.0013   Epoch: 17   Global Step: 736290   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:29:05,082-Speed 2623.30 samples/sec   Loss 1.6921   LearningRate 0.0013   Epoch: 17   Global Step: 736300   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:29:08,995-Speed 2617.82 samples/sec   Loss 1.7172   LearningRate 0.0013   Epoch: 17   Global Step: 736310   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:29:12,888-Speed 2631.27 samples/sec   Loss 1.7590   LearningRate 0.0013   Epoch: 17   Global Step: 736320   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:29:16,785-Speed 2627.90 samples/sec   Loss 1.7464   LearningRate 0.0013   Epoch: 17   Global Step: 736330   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:20,680-Speed 2630.89 samples/sec   Loss 1.6791   LearningRate 0.0013   Epoch: 17   Global Step: 736340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:24,574-Speed 2630.23 samples/sec   Loss 1.7191   LearningRate 0.0013   Epoch: 17   Global Step: 736350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:28,473-Speed 2626.85 samples/sec   Loss 1.6816   LearningRate 0.0013   Epoch: 17   Global Step: 736360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:32,375-Speed 2625.07 samples/sec   Loss 1.7296   LearningRate 0.0013   Epoch: 17   Global Step: 736370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:36,279-Speed 2623.49 samples/sec   Loss 1.7219   LearningRate 0.0013   Epoch: 17   Global Step: 736380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:40,193-Speed 2616.70 samples/sec   Loss 1.6805   LearningRate 0.0013   Epoch: 17   Global Step: 736390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:44,082-Speed 2633.68 samples/sec   Loss 1.7015   LearningRate 0.0013   Epoch: 17   Global Step: 736400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:47,974-Speed 2632.42 samples/sec   Loss 1.6830   LearningRate 0.0013   Epoch: 17   Global Step: 736410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:51,975-Speed 2560.30 samples/sec   Loss 1.6827   LearningRate 0.0013   Epoch: 17   Global Step: 736420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:29:55,873-Speed 2627.97 samples/sec   Loss 1.7120   LearningRate 0.0013   Epoch: 17   Global Step: 736430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:29:59,766-Speed 2631.01 samples/sec   Loss 1.6447   LearningRate 0.0013   Epoch: 17   Global Step: 736440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:30:03,674-Speed 2620.63 samples/sec   Loss 1.7147   LearningRate 0.0013   Epoch: 17   Global Step: 736450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:30:07,582-Speed 2621.19 samples/sec   Loss 1.6772   LearningRate 0.0013   Epoch: 17   Global Step: 736460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:30:11,476-Speed 2630.99 samples/sec   Loss 1.6523   LearningRate 0.0013   Epoch: 17   Global Step: 736470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:30:15,348-Speed 2644.88 samples/sec   Loss 1.7036   LearningRate 0.0013   Epoch: 17   Global Step: 736480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:19,244-Speed 2628.93 samples/sec   Loss 1.6980   LearningRate 0.0013   Epoch: 17   Global Step: 736490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:23,138-Speed 2630.91 samples/sec   Loss 1.6971   LearningRate 0.0013   Epoch: 17   Global Step: 736500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:27,033-Speed 2629.81 samples/sec   Loss 1.6768   LearningRate 0.0013   Epoch: 17   Global Step: 736510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:30,966-Speed 2603.90 samples/sec   Loss 1.6781   LearningRate 0.0013   Epoch: 17   Global Step: 736520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:34,862-Speed 2628.96 samples/sec   Loss 1.6587   LearningRate 0.0013   Epoch: 17   Global Step: 736530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:38,761-Speed 2627.38 samples/sec   Loss 1.7062   LearningRate 0.0013   Epoch: 17   Global Step: 736540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:42,655-Speed 2630.66 samples/sec   Loss 1.6779   LearningRate 0.0013   Epoch: 17   Global Step: 736550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:46,549-Speed 2630.04 samples/sec   Loss 1.6894   LearningRate 0.0013   Epoch: 17   Global Step: 736560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:50,448-Speed 2627.24 samples/sec   Loss 1.6420   LearningRate 0.0013   Epoch: 17   Global Step: 736570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:30:54,341-Speed 2630.85 samples/sec   Loss 1.6908   LearningRate 0.0013   Epoch: 17   Global Step: 736580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:30:58,220-Speed 2641.24 samples/sec   Loss 1.6932   LearningRate 0.0013   Epoch: 17   Global Step: 736590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:02,125-Speed 2622.79 samples/sec   Loss 1.7361   LearningRate 0.0013   Epoch: 17   Global Step: 736600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:06,018-Speed 2631.22 samples/sec   Loss 1.6655   LearningRate 0.0013   Epoch: 17   Global Step: 736610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:09,911-Speed 2630.90 samples/sec   Loss 1.7502   LearningRate 0.0013   Epoch: 17   Global Step: 736620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:13,801-Speed 2633.22 samples/sec   Loss 1.6588   LearningRate 0.0013   Epoch: 17   Global Step: 736630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:17,700-Speed 2626.90 samples/sec   Loss 1.6477   LearningRate 0.0013   Epoch: 17   Global Step: 736640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:21,613-Speed 2618.03 samples/sec   Loss 1.6820   LearningRate 0.0013   Epoch: 17   Global Step: 736650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:25,507-Speed 2630.28 samples/sec   Loss 1.6240   LearningRate 0.0013   Epoch: 17   Global Step: 736660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:31:29,388-Speed 2639.33 samples/sec   Loss 1.6631   LearningRate 0.0013   Epoch: 17   Global Step: 736670   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:31:33,308-Speed 2613.03 samples/sec   Loss 1.7560   LearningRate 0.0013   Epoch: 17   Global Step: 736680   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:31:37,207-Speed 2626.83 samples/sec   Loss 1.6842   LearningRate 0.0013   Epoch: 17   Global Step: 736690   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:31:41,100-Speed 2630.70 samples/sec   Loss 1.6842   LearningRate 0.0013   Epoch: 17   Global Step: 736700   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:31:44,995-Speed 2629.29 samples/sec   Loss 1.7607   LearningRate 0.0013   Epoch: 17   Global Step: 736710   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:31:48,891-Speed 2629.76 samples/sec   Loss 1.7076   LearningRate 0.0013   Epoch: 17   Global Step: 736720   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:31:52,794-Speed 2623.78 samples/sec   Loss 1.7083   LearningRate 0.0013   Epoch: 17   Global Step: 736730   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:31:56,697-Speed 2624.90 samples/sec   Loss 1.6947   LearningRate 0.0013   Epoch: 17   Global Step: 736740   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:32:00,617-Speed 2613.09 samples/sec   Loss 1.7430   LearningRate 0.0013   Epoch: 17   Global Step: 736750   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:32:04,513-Speed 2628.80 samples/sec   Loss 1.6985   LearningRate 0.0013   Epoch: 17   Global Step: 736760   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:32:08,411-Speed 2627.64 samples/sec   Loss 1.7341   LearningRate 0.0013   Epoch: 17   Global Step: 736770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:12,314-Speed 2624.23 samples/sec   Loss 1.6844   LearningRate 0.0013   Epoch: 17   Global Step: 736780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:16,223-Speed 2620.38 samples/sec   Loss 1.7111   LearningRate 0.0013   Epoch: 17   Global Step: 736790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:20,118-Speed 2629.43 samples/sec   Loss 1.6890   LearningRate 0.0013   Epoch: 17   Global Step: 736800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:24,021-Speed 2624.51 samples/sec   Loss 1.6794   LearningRate 0.0013   Epoch: 17   Global Step: 736810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:27,920-Speed 2626.61 samples/sec   Loss 1.6780   LearningRate 0.0013   Epoch: 17   Global Step: 736820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:31,817-Speed 2629.02 samples/sec   Loss 1.6808   LearningRate 0.0013   Epoch: 17   Global Step: 736830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:35,723-Speed 2622.45 samples/sec   Loss 1.7441   LearningRate 0.0012   Epoch: 17   Global Step: 736840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:39,618-Speed 2629.54 samples/sec   Loss 1.7154   LearningRate 0.0012   Epoch: 17   Global Step: 736850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:43,516-Speed 2627.46 samples/sec   Loss 1.7275   LearningRate 0.0012   Epoch: 17   Global Step: 736860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:47,389-Speed 2644.77 samples/sec   Loss 1.6937   LearningRate 0.0012   Epoch: 17   Global Step: 736870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:51,297-Speed 2620.92 samples/sec   Loss 1.6947   LearningRate 0.0012   Epoch: 17   Global Step: 736880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:55,196-Speed 2627.60 samples/sec   Loss 1.6872   LearningRate 0.0012   Epoch: 17   Global Step: 736890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:32:59,102-Speed 2621.81 samples/sec   Loss 1.6654   LearningRate 0.0012   Epoch: 17   Global Step: 736900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:03,002-Speed 2626.52 samples/sec   Loss 1.6626   LearningRate 0.0012   Epoch: 17   Global Step: 736910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:06,899-Speed 2628.74 samples/sec   Loss 1.6740   LearningRate 0.0012   Epoch: 17   Global Step: 736920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:10,800-Speed 2625.42 samples/sec   Loss 1.6696   LearningRate 0.0012   Epoch: 17   Global Step: 736930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:14,729-Speed 2606.48 samples/sec   Loss 1.6704   LearningRate 0.0012   Epoch: 17   Global Step: 736940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:18,627-Speed 2628.25 samples/sec   Loss 1.6847   LearningRate 0.0012   Epoch: 17   Global Step: 736950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:22,521-Speed 2630.69 samples/sec   Loss 1.7455   LearningRate 0.0012   Epoch: 17   Global Step: 736960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:26,419-Speed 2627.96 samples/sec   Loss 1.7262   LearningRate 0.0012   Epoch: 17   Global Step: 736970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:33:30,310-Speed 2632.16 samples/sec   Loss 1.6751   LearningRate 0.0012   Epoch: 17   Global Step: 736980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:33:34,223-Speed 2617.28 samples/sec   Loss 1.6871   LearningRate 0.0012   Epoch: 17   Global Step: 736990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:33:38,136-Speed 2617.52 samples/sec   Loss 1.6500   LearningRate 0.0012   Epoch: 17   Global Step: 737000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:33:42,031-Speed 2629.91 samples/sec   Loss 1.6688   LearningRate 0.0012   Epoch: 17   Global Step: 737010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:33:45,927-Speed 2628.94 samples/sec   Loss 1.7021   LearningRate 0.0012   Epoch: 17   Global Step: 737020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:33:49,821-Speed 2630.97 samples/sec   Loss 1.6883   LearningRate 0.0012   Epoch: 17   Global Step: 737030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:33:53,691-Speed 2646.80 samples/sec   Loss 1.7061   LearningRate 0.0012   Epoch: 17   Global Step: 737040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:33:57,607-Speed 2615.25 samples/sec   Loss 1.7476   LearningRate 0.0012   Epoch: 17   Global Step: 737050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:01,510-Speed 2624.47 samples/sec   Loss 1.6632   LearningRate 0.0012   Epoch: 17   Global Step: 737060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:05,415-Speed 2622.55 samples/sec   Loss 1.6689   LearningRate 0.0012   Epoch: 17   Global Step: 737070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:09,311-Speed 2628.66 samples/sec   Loss 1.6706   LearningRate 0.0012   Epoch: 17   Global Step: 737080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:13,207-Speed 2629.25 samples/sec   Loss 1.6849   LearningRate 0.0012   Epoch: 17   Global Step: 737090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:17,104-Speed 2628.32 samples/sec   Loss 1.6350   LearningRate 0.0012   Epoch: 17   Global Step: 737100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:21,011-Speed 2621.77 samples/sec   Loss 1.7050   LearningRate 0.0012   Epoch: 17   Global Step: 737110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:24,910-Speed 2627.28 samples/sec   Loss 1.6909   LearningRate 0.0012   Epoch: 17   Global Step: 737120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:28,804-Speed 2630.07 samples/sec   Loss 1.7009   LearningRate 0.0012   Epoch: 17   Global Step: 737130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:32,697-Speed 2631.18 samples/sec   Loss 1.6924   LearningRate 0.0012   Epoch: 17   Global Step: 737140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:34:36,569-Speed 2645.07 samples/sec   Loss 1.6976   LearningRate 0.0012   Epoch: 17   Global Step: 737150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:34:40,443-Speed 2643.56 samples/sec   Loss 1.6609   LearningRate 0.0012   Epoch: 17   Global Step: 737160   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:34:44,338-Speed 2630.61 samples/sec   Loss 1.7063   LearningRate 0.0012   Epoch: 17   Global Step: 737170   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:34:48,234-Speed 2629.26 samples/sec   Loss 1.6631   LearningRate 0.0012   Epoch: 17   Global Step: 737180   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:34:52,139-Speed 2623.38 samples/sec   Loss 1.6882   LearningRate 0.0012   Epoch: 17   Global Step: 737190   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:34:56,032-Speed 2630.69 samples/sec   Loss 1.6414   LearningRate 0.0012   Epoch: 17   Global Step: 737200   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:34:59,927-Speed 2629.57 samples/sec   Loss 1.7103   LearningRate 0.0012   Epoch: 17   Global Step: 737210   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:35:03,821-Speed 2630.31 samples/sec   Loss 1.7240   LearningRate 0.0012   Epoch: 17   Global Step: 737220   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:35:07,717-Speed 2628.92 samples/sec   Loss 1.7601   LearningRate 0.0012   Epoch: 17   Global Step: 737230   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:35:11,610-Speed 2631.24 samples/sec   Loss 1.7415   LearningRate 0.0012   Epoch: 17   Global Step: 737240   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:35:15,510-Speed 2626.21 samples/sec   Loss 1.7001   LearningRate 0.0012   Epoch: 17   Global Step: 737250   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:35:19,411-Speed 2626.26 samples/sec   Loss 1.6825   LearningRate 0.0012   Epoch: 17   Global Step: 737260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:23,304-Speed 2630.67 samples/sec   Loss 1.6741   LearningRate 0.0012   Epoch: 17   Global Step: 737270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:27,197-Speed 2631.71 samples/sec   Loss 1.6507   LearningRate 0.0012   Epoch: 17   Global Step: 737280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:31,089-Speed 2630.88 samples/sec   Loss 1.6826   LearningRate 0.0012   Epoch: 17   Global Step: 737290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:34,984-Speed 2629.89 samples/sec   Loss 1.6661   LearningRate 0.0012   Epoch: 17   Global Step: 737300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:38,880-Speed 2629.23 samples/sec   Loss 1.7077   LearningRate 0.0012   Epoch: 17   Global Step: 737310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:42,777-Speed 2628.71 samples/sec   Loss 1.6822   LearningRate 0.0012   Epoch: 17   Global Step: 737320   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:46,672-Speed 2629.07 samples/sec   Loss 1.6313   LearningRate 0.0012   Epoch: 17   Global Step: 737330   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:50,576-Speed 2624.24 samples/sec   Loss 1.7301   LearningRate 0.0012   Epoch: 17   Global Step: 737340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:54,479-Speed 2624.34 samples/sec   Loss 1.6887   LearningRate 0.0012   Epoch: 17   Global Step: 737350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:35:58,385-Speed 2622.18 samples/sec   Loss 1.6789   LearningRate 0.0012   Epoch: 17   Global Step: 737360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:36:02,291-Speed 2621.90 samples/sec   Loss 1.6824   LearningRate 0.0012   Epoch: 17   Global Step: 737370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:36:06,268-Speed 2575.71 samples/sec   Loss 1.7419   LearningRate 0.0012   Epoch: 17   Global Step: 737380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:36:10,166-Speed 2627.88 samples/sec   Loss 1.7297   LearningRate 0.0012   Epoch: 17   Global Step: 737390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:36:14,076-Speed 2619.66 samples/sec   Loss 1.7172   LearningRate 0.0012   Epoch: 17   Global Step: 737400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:36:17,951-Speed 2643.16 samples/sec   Loss 1.6517   LearningRate 0.0012   Epoch: 17   Global Step: 737410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:21,855-Speed 2623.81 samples/sec   Loss 1.7330   LearningRate 0.0012   Epoch: 17   Global Step: 737420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:25,758-Speed 2624.65 samples/sec   Loss 1.7290   LearningRate 0.0012   Epoch: 17   Global Step: 737430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:29,652-Speed 2630.63 samples/sec   Loss 1.7401   LearningRate 0.0012   Epoch: 17   Global Step: 737440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:33,552-Speed 2625.91 samples/sec   Loss 1.6566   LearningRate 0.0012   Epoch: 17   Global Step: 737450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:37,445-Speed 2630.76 samples/sec   Loss 1.6752   LearningRate 0.0012   Epoch: 17   Global Step: 737460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:41,341-Speed 2628.75 samples/sec   Loss 1.7552   LearningRate 0.0012   Epoch: 17   Global Step: 737470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:45,235-Speed 2630.72 samples/sec   Loss 1.6882   LearningRate 0.0012   Epoch: 17   Global Step: 737480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:49,133-Speed 2628.14 samples/sec   Loss 1.6547   LearningRate 0.0012   Epoch: 17   Global Step: 737490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:53,031-Speed 2627.00 samples/sec   Loss 1.7048   LearningRate 0.0012   Epoch: 17   Global Step: 737500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:36:56,931-Speed 2626.34 samples/sec   Loss 1.6277   LearningRate 0.0012   Epoch: 17   Global Step: 737510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:00,830-Speed 2627.19 samples/sec   Loss 1.7104   LearningRate 0.0012   Epoch: 17   Global Step: 737520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:04,724-Speed 2630.41 samples/sec   Loss 1.6879   LearningRate 0.0012   Epoch: 17   Global Step: 737530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:08,619-Speed 2629.40 samples/sec   Loss 1.6943   LearningRate 0.0012   Epoch: 17   Global Step: 737540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:12,527-Speed 2620.95 samples/sec   Loss 1.6876   LearningRate 0.0012   Epoch: 17   Global Step: 737550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:16,424-Speed 2628.09 samples/sec   Loss 1.7313   LearningRate 0.0012   Epoch: 17   Global Step: 737560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:20,338-Speed 2616.99 samples/sec   Loss 1.7000   LearningRate 0.0012   Epoch: 17   Global Step: 737570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:24,235-Speed 2628.38 samples/sec   Loss 1.6378   LearningRate 0.0012   Epoch: 17   Global Step: 737580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:28,137-Speed 2625.02 samples/sec   Loss 1.6865   LearningRate 0.0012   Epoch: 17   Global Step: 737590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:37:32,011-Speed 2643.89 samples/sec   Loss 1.7043   LearningRate 0.0012   Epoch: 17   Global Step: 737600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:37:35,902-Speed 2632.11 samples/sec   Loss 1.6754   LearningRate 0.0012   Epoch: 17   Global Step: 737610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:37:39,801-Speed 2627.11 samples/sec   Loss 1.6692   LearningRate 0.0012   Epoch: 17   Global Step: 737620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:37:43,697-Speed 2629.49 samples/sec   Loss 1.6520   LearningRate 0.0012   Epoch: 17   Global Step: 737630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:37:47,590-Speed 2630.82 samples/sec   Loss 1.7018   LearningRate 0.0012   Epoch: 17   Global Step: 737640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:37:51,482-Speed 2631.62 samples/sec   Loss 1.6608   LearningRate 0.0012   Epoch: 17   Global Step: 737650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:37:55,379-Speed 2628.36 samples/sec   Loss 1.6311   LearningRate 0.0012   Epoch: 17   Global Step: 737660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:37:59,270-Speed 2631.88 samples/sec   Loss 1.6826   LearningRate 0.0012   Epoch: 17   Global Step: 737670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:03,173-Speed 2624.12 samples/sec   Loss 1.6487   LearningRate 0.0012   Epoch: 17   Global Step: 737680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:07,072-Speed 2627.19 samples/sec   Loss 1.6452   LearningRate 0.0012   Epoch: 17   Global Step: 737690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:10,980-Speed 2621.12 samples/sec   Loss 1.6497   LearningRate 0.0012   Epoch: 17   Global Step: 737700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:38:14,870-Speed 2632.88 samples/sec   Loss 1.6887   LearningRate 0.0012   Epoch: 17   Global Step: 737710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:38:18,756-Speed 2636.03 samples/sec   Loss 1.6902   LearningRate 0.0012   Epoch: 17   Global Step: 737720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:22,649-Speed 2630.78 samples/sec   Loss 1.6263   LearningRate 0.0012   Epoch: 17   Global Step: 737730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:26,552-Speed 2624.26 samples/sec   Loss 1.6757   LearningRate 0.0012   Epoch: 17   Global Step: 737740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:30,447-Speed 2629.56 samples/sec   Loss 1.6691   LearningRate 0.0012   Epoch: 17   Global Step: 737750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:34,341-Speed 2630.60 samples/sec   Loss 1.6132   LearningRate 0.0012   Epoch: 17   Global Step: 737760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:38,239-Speed 2627.15 samples/sec   Loss 1.7181   LearningRate 0.0012   Epoch: 17   Global Step: 737770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:42,141-Speed 2624.98 samples/sec   Loss 1.6930   LearningRate 0.0012   Epoch: 17   Global Step: 737780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:46,036-Speed 2629.13 samples/sec   Loss 1.7341   LearningRate 0.0012   Epoch: 17   Global Step: 737790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:49,930-Speed 2631.08 samples/sec   Loss 1.7024   LearningRate 0.0012   Epoch: 17   Global Step: 737800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:53,825-Speed 2630.28 samples/sec   Loss 1.7032   LearningRate 0.0012   Epoch: 17   Global Step: 737810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:38:57,723-Speed 2627.28 samples/sec   Loss 1.6702   LearningRate 0.0012   Epoch: 17   Global Step: 737820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:39:01,620-Speed 2628.67 samples/sec   Loss 1.7084   LearningRate 0.0012   Epoch: 17   Global Step: 737830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:39:05,519-Speed 2627.07 samples/sec   Loss 1.6801   LearningRate 0.0012   Epoch: 17   Global Step: 737840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:39:09,398-Speed 2640.03 samples/sec   Loss 1.6579   LearningRate 0.0012   Epoch: 17   Global Step: 737850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:39:13,378-Speed 2573.11 samples/sec   Loss 1.6625   LearningRate 0.0012   Epoch: 17   Global Step: 737860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:39:17,315-Speed 2602.37 samples/sec   Loss 1.6292   LearningRate 0.0012   Epoch: 17   Global Step: 737870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:39:21,203-Speed 2634.46 samples/sec   Loss 1.7014   LearningRate 0.0012   Epoch: 17   Global Step: 737880   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:25,171-Speed 2581.26 samples/sec   Loss 1.6477   LearningRate 0.0012   Epoch: 17   Global Step: 737890   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:29,134-Speed 2584.77 samples/sec   Loss 1.6924   LearningRate 0.0012   Epoch: 17   Global Step: 737900   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:33,129-Speed 2564.07 samples/sec   Loss 1.6816   LearningRate 0.0012   Epoch: 17   Global Step: 737910   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:37,032-Speed 2623.45 samples/sec   Loss 1.6567   LearningRate 0.0012   Epoch: 17   Global Step: 737920   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:40,924-Speed 2632.31 samples/sec   Loss 1.6999   LearningRate 0.0012   Epoch: 17   Global Step: 737930   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:44,815-Speed 2632.19 samples/sec   Loss 1.6471   LearningRate 0.0012   Epoch: 17   Global Step: 737940   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:48,717-Speed 2624.98 samples/sec   Loss 1.6799   LearningRate 0.0012   Epoch: 17   Global Step: 737950   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:52,617-Speed 2626.32 samples/sec   Loss 1.6951   LearningRate 0.0012   Epoch: 17   Global Step: 737960   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:39:56,513-Speed 2629.35 samples/sec   Loss 1.6577   LearningRate 0.0012   Epoch: 17   Global Step: 737970   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:40:00,421-Speed 2620.79 samples/sec   Loss 1.6136   LearningRate 0.0012   Epoch: 17   Global Step: 737980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:04,327-Speed 2622.24 samples/sec   Loss 1.6751   LearningRate 0.0012   Epoch: 17   Global Step: 737990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:08,226-Speed 2627.33 samples/sec   Loss 1.6304   LearningRate 0.0012   Epoch: 17   Global Step: 738000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:12,126-Speed 2625.91 samples/sec   Loss 1.7156   LearningRate 0.0012   Epoch: 17   Global Step: 738010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:16,026-Speed 2626.48 samples/sec   Loss 1.7310   LearningRate 0.0012   Epoch: 17   Global Step: 738020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:19,969-Speed 2597.58 samples/sec   Loss 1.7610   LearningRate 0.0012   Epoch: 17   Global Step: 738030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:23,881-Speed 2618.58 samples/sec   Loss 1.6252   LearningRate 0.0012   Epoch: 17   Global Step: 738040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:27,773-Speed 2632.07 samples/sec   Loss 1.6912   LearningRate 0.0012   Epoch: 17   Global Step: 738050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:31,668-Speed 2629.66 samples/sec   Loss 1.6650   LearningRate 0.0012   Epoch: 17   Global Step: 738060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:35,567-Speed 2627.40 samples/sec   Loss 1.6555   LearningRate 0.0012   Epoch: 17   Global Step: 738070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:39,474-Speed 2622.21 samples/sec   Loss 1.7278   LearningRate 0.0012   Epoch: 17   Global Step: 738080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:40:43,370-Speed 2628.54 samples/sec   Loss 1.6627   LearningRate 0.0012   Epoch: 17   Global Step: 738090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:40:47,264-Speed 2631.36 samples/sec   Loss 1.6671   LearningRate 0.0012   Epoch: 17   Global Step: 738100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:40:51,138-Speed 2643.83 samples/sec   Loss 1.6677   LearningRate 0.0012   Epoch: 17   Global Step: 738110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:55,031-Speed 2631.09 samples/sec   Loss 1.6812   LearningRate 0.0012   Epoch: 17   Global Step: 738120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:40:58,903-Speed 2645.38 samples/sec   Loss 1.6968   LearningRate 0.0012   Epoch: 17   Global Step: 738130   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:02,827-Speed 2609.94 samples/sec   Loss 1.7215   LearningRate 0.0012   Epoch: 17   Global Step: 738140   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:06,716-Speed 2634.21 samples/sec   Loss 1.6253   LearningRate 0.0012   Epoch: 17   Global Step: 738150   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:10,658-Speed 2598.94 samples/sec   Loss 1.6708   LearningRate 0.0012   Epoch: 17   Global Step: 738160   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:14,547-Speed 2634.04 samples/sec   Loss 1.7191   LearningRate 0.0012   Epoch: 17   Global Step: 738170   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:18,436-Speed 2633.29 samples/sec   Loss 1.6551   LearningRate 0.0012   Epoch: 17   Global Step: 738180   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:22,351-Speed 2617.32 samples/sec   Loss 1.7154   LearningRate 0.0012   Epoch: 17   Global Step: 738190   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:26,248-Speed 2628.58 samples/sec   Loss 1.6671   LearningRate 0.0012   Epoch: 17   Global Step: 738200   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:30,150-Speed 2624.95 samples/sec   Loss 1.6849   LearningRate 0.0012   Epoch: 17   Global Step: 738210   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:34,048-Speed 2627.59 samples/sec   Loss 1.6289   LearningRate 0.0012   Epoch: 17   Global Step: 738220   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:41:37,949-Speed 2625.40 samples/sec   Loss 1.7056   LearningRate 0.0012   Epoch: 17   Global Step: 738230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:41:41,839-Speed 2632.86 samples/sec   Loss 1.6580   LearningRate 0.0012   Epoch: 17   Global Step: 738240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:41:45,743-Speed 2624.00 samples/sec   Loss 1.6988   LearningRate 0.0012   Epoch: 17   Global Step: 738250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:41:49,734-Speed 2567.56 samples/sec   Loss 1.7131   LearningRate 0.0012   Epoch: 17   Global Step: 738260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:41:53,630-Speed 2628.22 samples/sec   Loss 1.6548   LearningRate 0.0012   Epoch: 17   Global Step: 738270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:41:57,529-Speed 2627.75 samples/sec   Loss 1.6717   LearningRate 0.0012   Epoch: 17   Global Step: 738280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:01,431-Speed 2624.84 samples/sec   Loss 1.6858   LearningRate 0.0012   Epoch: 17   Global Step: 738290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:05,345-Speed 2616.58 samples/sec   Loss 1.6508   LearningRate 0.0012   Epoch: 17   Global Step: 738300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:09,277-Speed 2604.79 samples/sec   Loss 1.6498   LearningRate 0.0012   Epoch: 17   Global Step: 738310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:13,197-Speed 2613.90 samples/sec   Loss 1.6767   LearningRate 0.0012   Epoch: 17   Global Step: 738320   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:17,096-Speed 2626.87 samples/sec   Loss 1.6923   LearningRate 0.0012   Epoch: 17   Global Step: 738330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:42:21,009-Speed 2617.87 samples/sec   Loss 1.6923   LearningRate 0.0012   Epoch: 17   Global Step: 738340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:42:24,889-Speed 2639.95 samples/sec   Loss 1.6363   LearningRate 0.0012   Epoch: 17   Global Step: 738350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:28,793-Speed 2623.88 samples/sec   Loss 1.6868   LearningRate 0.0012   Epoch: 17   Global Step: 738360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:32,694-Speed 2625.03 samples/sec   Loss 1.7171   LearningRate 0.0012   Epoch: 17   Global Step: 738370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:36,599-Speed 2623.00 samples/sec   Loss 1.6716   LearningRate 0.0012   Epoch: 17   Global Step: 738380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:40,497-Speed 2627.33 samples/sec   Loss 1.6853   LearningRate 0.0012   Epoch: 17   Global Step: 738390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:44,411-Speed 2617.71 samples/sec   Loss 1.6432   LearningRate 0.0012   Epoch: 17   Global Step: 738400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:48,309-Speed 2627.22 samples/sec   Loss 1.7047   LearningRate 0.0012   Epoch: 17   Global Step: 738410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:52,202-Speed 2631.67 samples/sec   Loss 1.6478   LearningRate 0.0012   Epoch: 17   Global Step: 738420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:56,093-Speed 2631.79 samples/sec   Loss 1.7282   LearningRate 0.0012   Epoch: 17   Global Step: 738430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:42:59,991-Speed 2628.78 samples/sec   Loss 1.6455   LearningRate 0.0012   Epoch: 17   Global Step: 738440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:03,888-Speed 2628.13 samples/sec   Loss 1.7008   LearningRate 0.0012   Epoch: 17   Global Step: 738450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:43:07,779-Speed 2631.84 samples/sec   Loss 1.6433   LearningRate 0.0012   Epoch: 17   Global Step: 738460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:43:11,660-Speed 2638.76 samples/sec   Loss 1.6777   LearningRate 0.0012   Epoch: 17   Global Step: 738470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:15,552-Speed 2632.03 samples/sec   Loss 1.6492   LearningRate 0.0012   Epoch: 17   Global Step: 738480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:19,464-Speed 2618.99 samples/sec   Loss 1.6653   LearningRate 0.0012   Epoch: 17   Global Step: 738490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:23,367-Speed 2623.91 samples/sec   Loss 1.6542   LearningRate 0.0012   Epoch: 17   Global Step: 738500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:27,260-Speed 2631.31 samples/sec   Loss 1.7823   LearningRate 0.0012   Epoch: 17   Global Step: 738510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:31,153-Speed 2631.36 samples/sec   Loss 1.6547   LearningRate 0.0012   Epoch: 17   Global Step: 738520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:35,048-Speed 2629.63 samples/sec   Loss 1.6765   LearningRate 0.0012   Epoch: 17   Global Step: 738530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:38,974-Speed 2608.76 samples/sec   Loss 1.6349   LearningRate 0.0012   Epoch: 17   Global Step: 738540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:42,877-Speed 2624.33 samples/sec   Loss 1.7223   LearningRate 0.0012   Epoch: 17   Global Step: 738550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:46,769-Speed 2631.94 samples/sec   Loss 1.6799   LearningRate 0.0012   Epoch: 17   Global Step: 738560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:43:50,666-Speed 2628.89 samples/sec   Loss 1.6254   LearningRate 0.0012   Epoch: 17   Global Step: 738570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:43:54,565-Speed 2627.19 samples/sec   Loss 1.6914   LearningRate 0.0012   Epoch: 17   Global Step: 738580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:43:58,461-Speed 2629.35 samples/sec   Loss 1.6214   LearningRate 0.0012   Epoch: 17   Global Step: 738590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:02,378-Speed 2614.66 samples/sec   Loss 1.6782   LearningRate 0.0012   Epoch: 17   Global Step: 738600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:06,275-Speed 2628.36 samples/sec   Loss 1.6880   LearningRate 0.0012   Epoch: 17   Global Step: 738610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:10,169-Speed 2630.32 samples/sec   Loss 1.6238   LearningRate 0.0012   Epoch: 17   Global Step: 738620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:14,063-Speed 2630.85 samples/sec   Loss 1.6825   LearningRate 0.0012   Epoch: 17   Global Step: 738630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:17,959-Speed 2628.84 samples/sec   Loss 1.6660   LearningRate 0.0012   Epoch: 17   Global Step: 738640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:21,862-Speed 2623.99 samples/sec   Loss 1.6628   LearningRate 0.0012   Epoch: 17   Global Step: 738650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:25,765-Speed 2624.36 samples/sec   Loss 1.6698   LearningRate 0.0012   Epoch: 17   Global Step: 738660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:29,666-Speed 2625.89 samples/sec   Loss 1.6509   LearningRate 0.0012   Epoch: 17   Global Step: 738670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-16 06:44:33,538-Speed 2645.56 samples/sec   Loss 1.6751   LearningRate 0.0012   Epoch: 17   Global Step: 738680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:37,460-Speed 2611.66 samples/sec   Loss 1.6935   LearningRate 0.0012   Epoch: 17   Global Step: 738690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:41,353-Speed 2630.58 samples/sec   Loss 1.6743   LearningRate 0.0012   Epoch: 17   Global Step: 738700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:45,280-Speed 2608.20 samples/sec   Loss 1.6292   LearningRate 0.0012   Epoch: 17   Global Step: 738710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:49,177-Speed 2628.53 samples/sec   Loss 1.6955   LearningRate 0.0012   Epoch: 17   Global Step: 738720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:53,077-Speed 2626.44 samples/sec   Loss 1.6562   LearningRate 0.0012   Epoch: 17   Global Step: 738730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:44:56,949-Speed 2645.78 samples/sec   Loss 1.6532   LearningRate 0.0012   Epoch: 17   Global Step: 738740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:00,851-Speed 2624.57 samples/sec   Loss 1.6689   LearningRate 0.0012   Epoch: 17   Global Step: 738750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:04,785-Speed 2604.38 samples/sec   Loss 1.6770   LearningRate 0.0012   Epoch: 17   Global Step: 738760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:08,675-Speed 2632.55 samples/sec   Loss 1.7308   LearningRate 0.0012   Epoch: 17   Global Step: 738770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:12,603-Speed 2607.90 samples/sec   Loss 1.7555   LearningRate 0.0012   Epoch: 17   Global Step: 738780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:16,493-Speed 2632.96 samples/sec   Loss 1.6666   LearningRate 0.0012   Epoch: 17   Global Step: 738790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:20,390-Speed 2628.56 samples/sec   Loss 1.6923   LearningRate 0.0012   Epoch: 17   Global Step: 738800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:24,281-Speed 2632.56 samples/sec   Loss 1.6457   LearningRate 0.0012   Epoch: 17   Global Step: 738810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:28,175-Speed 2630.67 samples/sec   Loss 1.6298   LearningRate 0.0012   Epoch: 17   Global Step: 738820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:32,070-Speed 2630.05 samples/sec   Loss 1.6823   LearningRate 0.0012   Epoch: 17   Global Step: 738830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:36,016-Speed 2595.54 samples/sec   Loss 1.6190   LearningRate 0.0012   Epoch: 17   Global Step: 738840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:45:39,912-Speed 2629.11 samples/sec   Loss 1.6863   LearningRate 0.0012   Epoch: 17   Global Step: 738850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:45:43,808-Speed 2628.85 samples/sec   Loss 1.6641   LearningRate 0.0012   Epoch: 17   Global Step: 738860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:45:47,709-Speed 2625.31 samples/sec   Loss 1.6914   LearningRate 0.0012   Epoch: 17   Global Step: 738870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:45:51,672-Speed 2584.32 samples/sec   Loss 1.6484   LearningRate 0.0012   Epoch: 17   Global Step: 738880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:55,585-Speed 2618.67 samples/sec   Loss 1.6386   LearningRate 0.0012   Epoch: 17   Global Step: 738890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:45:59,483-Speed 2627.56 samples/sec   Loss 1.7025   LearningRate 0.0012   Epoch: 17   Global Step: 738900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:03,412-Speed 2606.55 samples/sec   Loss 1.6913   LearningRate 0.0012   Epoch: 17   Global Step: 738910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:07,303-Speed 2632.97 samples/sec   Loss 1.6228   LearningRate 0.0012   Epoch: 17   Global Step: 738920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:11,198-Speed 2629.27 samples/sec   Loss 1.7152   LearningRate 0.0012   Epoch: 17   Global Step: 738930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:15,097-Speed 2626.62 samples/sec   Loss 1.6711   LearningRate 0.0012   Epoch: 17   Global Step: 738940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:18,995-Speed 2627.92 samples/sec   Loss 1.6635   LearningRate 0.0012   Epoch: 17   Global Step: 738950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:22,889-Speed 2630.45 samples/sec   Loss 1.6851   LearningRate 0.0012   Epoch: 17   Global Step: 738960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:26,785-Speed 2628.90 samples/sec   Loss 1.5997   LearningRate 0.0012   Epoch: 17   Global Step: 738970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:30,723-Speed 2601.61 samples/sec   Loss 1.6858   LearningRate 0.0012   Epoch: 17   Global Step: 738980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:46:34,603-Speed 2639.88 samples/sec   Loss 1.6840   LearningRate 0.0012   Epoch: 17   Global Step: 738990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:38,506-Speed 2624.02 samples/sec   Loss 1.6468   LearningRate 0.0012   Epoch: 17   Global Step: 739000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:42,402-Speed 2629.14 samples/sec   Loss 1.6778   LearningRate 0.0012   Epoch: 17   Global Step: 739010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:46,295-Speed 2630.79 samples/sec   Loss 1.6330   LearningRate 0.0012   Epoch: 17   Global Step: 739020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:50,202-Speed 2621.87 samples/sec   Loss 1.6811   LearningRate 0.0012   Epoch: 17   Global Step: 739030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:54,104-Speed 2625.23 samples/sec   Loss 1.6507   LearningRate 0.0012   Epoch: 17   Global Step: 739040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:46:58,001-Speed 2628.38 samples/sec   Loss 1.7057   LearningRate 0.0012   Epoch: 17   Global Step: 739050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:47:01,902-Speed 2625.59 samples/sec   Loss 1.6639   LearningRate 0.0012   Epoch: 17   Global Step: 739060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:47:05,799-Speed 2628.27 samples/sec   Loss 1.6262   LearningRate 0.0012   Epoch: 17   Global Step: 739070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:47:09,696-Speed 2628.96 samples/sec   Loss 1.6813   LearningRate 0.0012   Epoch: 17   Global Step: 739080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:47:13,603-Speed 2621.36 samples/sec   Loss 1.5990   LearningRate 0.0012   Epoch: 17   Global Step: 739090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:17,499-Speed 2628.92 samples/sec   Loss 1.6504   LearningRate 0.0012   Epoch: 17   Global Step: 739100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:21,400-Speed 2626.08 samples/sec   Loss 1.6770   LearningRate 0.0012   Epoch: 17   Global Step: 739110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:25,299-Speed 2626.90 samples/sec   Loss 1.6592   LearningRate 0.0012   Epoch: 17   Global Step: 739120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:29,196-Speed 2628.72 samples/sec   Loss 1.6546   LearningRate 0.0012   Epoch: 17   Global Step: 739130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:33,090-Speed 2630.13 samples/sec   Loss 1.6821   LearningRate 0.0012   Epoch: 17   Global Step: 739140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:36,983-Speed 2630.77 samples/sec   Loss 1.6718   LearningRate 0.0012   Epoch: 17   Global Step: 739150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:40,880-Speed 2628.23 samples/sec   Loss 1.6753   LearningRate 0.0012   Epoch: 17   Global Step: 739160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:44,782-Speed 2624.87 samples/sec   Loss 1.6309   LearningRate 0.0012   Epoch: 17   Global Step: 739170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:48,687-Speed 2622.96 samples/sec   Loss 1.6514   LearningRate 0.0012   Epoch: 17   Global Step: 739180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:47:52,601-Speed 2617.54 samples/sec   Loss 1.6589   LearningRate 0.0012   Epoch: 17   Global Step: 739190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-16 06:47:56,479-Speed 2640.66 samples/sec   Loss 1.6365   LearningRate 0.0012   Epoch: 17   Global Step: 739200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:48:00,385-Speed 2622.93 samples/sec   Loss 1.7055   LearningRate 0.0012   Epoch: 17   Global Step: 739210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:04,297-Speed 2618.28 samples/sec   Loss 1.6246   LearningRate 0.0012   Epoch: 17   Global Step: 739220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:08,273-Speed 2575.91 samples/sec   Loss 1.6128   LearningRate 0.0012   Epoch: 17   Global Step: 739230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:12,361-Speed 2505.20 samples/sec   Loss 1.6158   LearningRate 0.0012   Epoch: 17   Global Step: 739240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:16,384-Speed 2546.36 samples/sec   Loss 1.6939   LearningRate 0.0012   Epoch: 17   Global Step: 739250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:20,281-Speed 2628.59 samples/sec   Loss 1.6756   LearningRate 0.0012   Epoch: 17   Global Step: 739260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:24,174-Speed 2630.87 samples/sec   Loss 1.5974   LearningRate 0.0012   Epoch: 17   Global Step: 739270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:28,074-Speed 2626.94 samples/sec   Loss 1.6618   LearningRate 0.0012   Epoch: 17   Global Step: 739280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:31,974-Speed 2626.16 samples/sec   Loss 1.6407   LearningRate 0.0012   Epoch: 17   Global Step: 739290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:35,873-Speed 2626.64 samples/sec   Loss 1.6912   LearningRate 0.0012   Epoch: 17   Global Step: 739300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:39,781-Speed 2621.03 samples/sec   Loss 1.6478   LearningRate 0.0012   Epoch: 17   Global Step: 739310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:48:43,683-Speed 2624.82 samples/sec   Loss 1.6489   LearningRate 0.0012   Epoch: 17   Global Step: 739320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:48:47,579-Speed 2628.75 samples/sec   Loss 1.6715   LearningRate 0.0012   Epoch: 17   Global Step: 739330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:48:51,453-Speed 2644.47 samples/sec   Loss 1.6233   LearningRate 0.0012   Epoch: 17   Global Step: 739340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:55,350-Speed 2628.19 samples/sec   Loss 1.6385   LearningRate 0.0012   Epoch: 17   Global Step: 739350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:48:59,241-Speed 2632.85 samples/sec   Loss 1.6934   LearningRate 0.0012   Epoch: 17   Global Step: 739360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:03,132-Speed 2631.94 samples/sec   Loss 1.6406   LearningRate 0.0012   Epoch: 17   Global Step: 739370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:07,028-Speed 2628.59 samples/sec   Loss 1.6567   LearningRate 0.0012   Epoch: 17   Global Step: 739380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:10,923-Speed 2629.57 samples/sec   Loss 1.6681   LearningRate 0.0012   Epoch: 17   Global Step: 739390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:14,821-Speed 2628.00 samples/sec   Loss 1.6449   LearningRate 0.0012   Epoch: 17   Global Step: 739400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:18,717-Speed 2628.36 samples/sec   Loss 1.6425   LearningRate 0.0012   Epoch: 17   Global Step: 739410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:22,618-Speed 2625.94 samples/sec   Loss 1.7264   LearningRate 0.0012   Epoch: 17   Global Step: 739420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:26,514-Speed 2629.17 samples/sec   Loss 1.6709   LearningRate 0.0012   Epoch: 17   Global Step: 739430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:30,390-Speed 2643.24 samples/sec   Loss 1.7355   LearningRate 0.0012   Epoch: 17   Global Step: 739440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:34,281-Speed 2632.19 samples/sec   Loss 1.6547   LearningRate 0.0012   Epoch: 17   Global Step: 739450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:38,179-Speed 2627.16 samples/sec   Loss 1.6651   LearningRate 0.0012   Epoch: 17   Global Step: 739460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:42,080-Speed 2625.28 samples/sec   Loss 1.7002   LearningRate 0.0012   Epoch: 17   Global Step: 739470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:45,971-Speed 2633.10 samples/sec   Loss 1.6202   LearningRate 0.0012   Epoch: 17   Global Step: 739480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:49:49,853-Speed 2638.60 samples/sec   Loss 1.7036   LearningRate 0.0012   Epoch: 17   Global Step: 739490   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:49:53,749-Speed 2628.96 samples/sec   Loss 1.7280   LearningRate 0.0012   Epoch: 17   Global Step: 739500   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:49:57,643-Speed 2630.96 samples/sec   Loss 1.6336   LearningRate 0.0012   Epoch: 17   Global Step: 739510   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:01,536-Speed 2630.31 samples/sec   Loss 1.6735   LearningRate 0.0012   Epoch: 17   Global Step: 739520   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:05,434-Speed 2628.32 samples/sec   Loss 1.6169   LearningRate 0.0012   Epoch: 17   Global Step: 739530   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:09,346-Speed 2617.71 samples/sec   Loss 1.6721   LearningRate 0.0012   Epoch: 17   Global Step: 739540   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:13,238-Speed 2632.14 samples/sec   Loss 1.6804   LearningRate 0.0012   Epoch: 17   Global Step: 739550   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:17,141-Speed 2624.13 samples/sec   Loss 1.6769   LearningRate 0.0012   Epoch: 17   Global Step: 739560   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:21,034-Speed 2631.68 samples/sec   Loss 1.6899   LearningRate 0.0012   Epoch: 17   Global Step: 739570   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:24,931-Speed 2628.18 samples/sec   Loss 1.6815   LearningRate 0.0012   Epoch: 17   Global Step: 739580   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:50:28,828-Speed 2628.55 samples/sec   Loss 1.6522   LearningRate 0.0012   Epoch: 17   Global Step: 739590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:50:32,725-Speed 2628.77 samples/sec   Loss 1.6587   LearningRate 0.0012   Epoch: 17   Global Step: 739600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:50:36,617-Speed 2631.49 samples/sec   Loss 1.6603   LearningRate 0.0012   Epoch: 17   Global Step: 739610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:50:40,513-Speed 2629.30 samples/sec   Loss 1.6918   LearningRate 0.0012   Epoch: 17   Global Step: 739620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:50:44,415-Speed 2624.56 samples/sec   Loss 1.7141   LearningRate 0.0012   Epoch: 17   Global Step: 739630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:50:48,321-Speed 2622.39 samples/sec   Loss 1.6080   LearningRate 0.0012   Epoch: 17   Global Step: 739640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:50:52,222-Speed 2625.36 samples/sec   Loss 1.5851   LearningRate 0.0012   Epoch: 17   Global Step: 739650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:50:56,118-Speed 2629.70 samples/sec   Loss 1.6830   LearningRate 0.0012   Epoch: 17   Global Step: 739660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:00,018-Speed 2626.20 samples/sec   Loss 1.6875   LearningRate 0.0012   Epoch: 17   Global Step: 739670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:03,913-Speed 2629.85 samples/sec   Loss 1.6231   LearningRate 0.0012   Epoch: 17   Global Step: 739680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:07,810-Speed 2628.27 samples/sec   Loss 1.6508   LearningRate 0.0012   Epoch: 17   Global Step: 739690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:51:11,707-Speed 2628.16 samples/sec   Loss 1.6725   LearningRate 0.0012   Epoch: 17   Global Step: 739700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:51:15,577-Speed 2646.53 samples/sec   Loss 1.6434   LearningRate 0.0012   Epoch: 17   Global Step: 739710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:19,528-Speed 2592.39 samples/sec   Loss 1.6390   LearningRate 0.0012   Epoch: 17   Global Step: 739720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:23,424-Speed 2628.84 samples/sec   Loss 1.6459   LearningRate 0.0012   Epoch: 17   Global Step: 739730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:27,320-Speed 2629.45 samples/sec   Loss 1.7307   LearningRate 0.0012   Epoch: 17   Global Step: 739740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:31,219-Speed 2626.90 samples/sec   Loss 1.6313   LearningRate 0.0012   Epoch: 17   Global Step: 739750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:35,127-Speed 2620.75 samples/sec   Loss 1.6291   LearningRate 0.0012   Epoch: 17   Global Step: 739760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:39,023-Speed 2629.19 samples/sec   Loss 1.6828   LearningRate 0.0012   Epoch: 17   Global Step: 739770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:42,933-Speed 2619.05 samples/sec   Loss 1.6143   LearningRate 0.0012   Epoch: 17   Global Step: 739780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:46,825-Speed 2631.92 samples/sec   Loss 1.6274   LearningRate 0.0012   Epoch: 17   Global Step: 739790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:50,726-Speed 2625.51 samples/sec   Loss 1.6676   LearningRate 0.0012   Epoch: 17   Global Step: 739800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:51:54,637-Speed 2618.91 samples/sec   Loss 1.6737   LearningRate 0.0012   Epoch: 17   Global Step: 739810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:51:58,531-Speed 2630.14 samples/sec   Loss 1.6130   LearningRate 0.0012   Epoch: 17   Global Step: 739820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:02,424-Speed 2631.61 samples/sec   Loss 1.6654   LearningRate 0.0012   Epoch: 17   Global Step: 739830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:06,319-Speed 2629.42 samples/sec   Loss 1.6269   LearningRate 0.0012   Epoch: 17   Global Step: 739840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:10,226-Speed 2622.09 samples/sec   Loss 1.6320   LearningRate 0.0012   Epoch: 17   Global Step: 739850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:14,136-Speed 2618.90 samples/sec   Loss 1.6798   LearningRate 0.0012   Epoch: 17   Global Step: 739860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:18,027-Speed 2632.16 samples/sec   Loss 1.6321   LearningRate 0.0012   Epoch: 17   Global Step: 739870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:21,928-Speed 2626.04 samples/sec   Loss 1.6660   LearningRate 0.0012   Epoch: 17   Global Step: 739880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:25,823-Speed 2629.70 samples/sec   Loss 1.6420   LearningRate 0.0012   Epoch: 17   Global Step: 739890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:29,716-Speed 2630.90 samples/sec   Loss 1.6959   LearningRate 0.0012   Epoch: 17   Global Step: 739900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:33,601-Speed 2636.26 samples/sec   Loss 1.6867   LearningRate 0.0012   Epoch: 17   Global Step: 739910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:52:37,480-Speed 2640.50 samples/sec   Loss 1.6780   LearningRate 0.0012   Epoch: 17   Global Step: 739920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:52:41,378-Speed 2627.71 samples/sec   Loss 1.6224   LearningRate 0.0012   Epoch: 17   Global Step: 739930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:52:45,293-Speed 2616.19 samples/sec   Loss 1.6508   LearningRate 0.0012   Epoch: 17   Global Step: 739940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:52:49,192-Speed 2626.87 samples/sec   Loss 1.6380   LearningRate 0.0012   Epoch: 17   Global Step: 739950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:52:53,088-Speed 2630.22 samples/sec   Loss 1.6497   LearningRate 0.0012   Epoch: 17   Global Step: 739960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:52:56,984-Speed 2628.93 samples/sec   Loss 1.6606   LearningRate 0.0012   Epoch: 17   Global Step: 739970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:53:00,880-Speed 2628.64 samples/sec   Loss 1.6430   LearningRate 0.0012   Epoch: 17   Global Step: 739980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:53:04,752-Speed 2645.06 samples/sec   Loss 1.6490   LearningRate 0.0012   Epoch: 17   Global Step: 739990   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:53:08,651-Speed 2627.36 samples/sec   Loss 1.5944   LearningRate 0.0012   Epoch: 17   Global Step: 740000   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:53:51,351-[lfw][740000]XNorm: 22.257720
Training: 2022-04-16 06:53:51,352-[lfw][740000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 06:53:51,352-[lfw][740000]Accuracy-Highest: 0.99850
Training: 2022-04-16 06:54:41,737-[cfp_fp][740000]XNorm: 22.363442
Training: 2022-04-16 06:54:41,738-[cfp_fp][740000]Accuracy-Flip: 0.99314+-0.00377
Training: 2022-04-16 06:54:41,739-[cfp_fp][740000]Accuracy-Highest: 0.99329
Training: 2022-04-16 06:55:25,109-[agedb_30][740000]XNorm: 23.031060
Training: 2022-04-16 06:55:25,110-[agedb_30][740000]Accuracy-Flip: 0.98333+-0.00592
Training: 2022-04-16 06:55:25,110-[agedb_30][740000]Accuracy-Highest: 0.98333
Training: 2022-04-16 06:55:28,979-Speed 72.97 samples/sec   Loss 1.6588   LearningRate 0.0012   Epoch: 17   Global Step: 740010   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:55:32,838-Speed 2654.66 samples/sec   Loss 1.5943   LearningRate 0.0012   Epoch: 17   Global Step: 740020   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:55:36,709-Speed 2646.35 samples/sec   Loss 1.6493   LearningRate 0.0012   Epoch: 17   Global Step: 740030   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:55:40,587-Speed 2641.56 samples/sec   Loss 1.7108   LearningRate 0.0012   Epoch: 17   Global Step: 740040   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:55:44,455-Speed 2648.03 samples/sec   Loss 1.6725   LearningRate 0.0012   Epoch: 17   Global Step: 740050   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:55:48,353-Speed 2627.67 samples/sec   Loss 1.7155   LearningRate 0.0012   Epoch: 17   Global Step: 740060   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:55:52,255-Speed 2625.58 samples/sec   Loss 1.6130   LearningRate 0.0012   Epoch: 17   Global Step: 740070   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:55:56,140-Speed 2636.78 samples/sec   Loss 1.6132   LearningRate 0.0012   Epoch: 17   Global Step: 740080   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 06:56:00,034-Speed 2630.59 samples/sec   Loss 1.6430   LearningRate 0.0012   Epoch: 17   Global Step: 740090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:03,915-Speed 2638.85 samples/sec   Loss 1.6057   LearningRate 0.0012   Epoch: 17   Global Step: 740100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:07,794-Speed 2640.29 samples/sec   Loss 1.6581   LearningRate 0.0012   Epoch: 17   Global Step: 740110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:11,676-Speed 2638.21 samples/sec   Loss 1.6068   LearningRate 0.0012   Epoch: 17   Global Step: 740120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:15,558-Speed 2639.07 samples/sec   Loss 1.6563   LearningRate 0.0012   Epoch: 17   Global Step: 740130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:19,465-Speed 2621.25 samples/sec   Loss 1.6317   LearningRate 0.0012   Epoch: 17   Global Step: 740140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:23,364-Speed 2627.76 samples/sec   Loss 1.6631   LearningRate 0.0012   Epoch: 17   Global Step: 740150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:27,250-Speed 2635.00 samples/sec   Loss 1.6893   LearningRate 0.0012   Epoch: 17   Global Step: 740160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:31,141-Speed 2633.38 samples/sec   Loss 1.7145   LearningRate 0.0012   Epoch: 17   Global Step: 740170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:35,033-Speed 2631.74 samples/sec   Loss 1.5673   LearningRate 0.0012   Epoch: 17   Global Step: 740180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:56:38,929-Speed 2628.57 samples/sec   Loss 1.7509   LearningRate 0.0012   Epoch: 17   Global Step: 740190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:56:42,826-Speed 2628.12 samples/sec   Loss 1.6917   LearningRate 0.0012   Epoch: 17   Global Step: 740200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:56:46,719-Speed 2632.46 samples/sec   Loss 1.6620   LearningRate 0.0012   Epoch: 17   Global Step: 740210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:56:50,623-Speed 2623.71 samples/sec   Loss 1.6462   LearningRate 0.0012   Epoch: 17   Global Step: 740220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:56:54,513-Speed 2632.46 samples/sec   Loss 1.6645   LearningRate 0.0012   Epoch: 17   Global Step: 740230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:56:58,423-Speed 2620.32 samples/sec   Loss 1.5956   LearningRate 0.0012   Epoch: 17   Global Step: 740240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:57:02,324-Speed 2625.61 samples/sec   Loss 1.6300   LearningRate 0.0012   Epoch: 17   Global Step: 740250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:57:06,200-Speed 2642.57 samples/sec   Loss 1.6410   LearningRate 0.0012   Epoch: 17   Global Step: 740260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:10,092-Speed 2631.09 samples/sec   Loss 1.6202   LearningRate 0.0012   Epoch: 17   Global Step: 740270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:14,006-Speed 2617.47 samples/sec   Loss 1.6507   LearningRate 0.0012   Epoch: 17   Global Step: 740280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:17,901-Speed 2629.28 samples/sec   Loss 1.6447   LearningRate 0.0012   Epoch: 17   Global Step: 740290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:21,796-Speed 2630.33 samples/sec   Loss 1.6665   LearningRate 0.0012   Epoch: 17   Global Step: 740300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:25,694-Speed 2627.89 samples/sec   Loss 1.6697   LearningRate 0.0012   Epoch: 17   Global Step: 740310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:29,613-Speed 2613.29 samples/sec   Loss 1.6530   LearningRate 0.0012   Epoch: 17   Global Step: 740320   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:33,534-Speed 2612.45 samples/sec   Loss 1.6222   LearningRate 0.0012   Epoch: 17   Global Step: 740330   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:37,434-Speed 2626.40 samples/sec   Loss 1.6580   LearningRate 0.0012   Epoch: 17   Global Step: 740340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:41,399-Speed 2583.28 samples/sec   Loss 1.6699   LearningRate 0.0012   Epoch: 17   Global Step: 740350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:45,306-Speed 2622.08 samples/sec   Loss 1.6856   LearningRate 0.0012   Epoch: 17   Global Step: 740360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:57:49,184-Speed 2641.05 samples/sec   Loss 1.6871   LearningRate 0.0012   Epoch: 17   Global Step: 740370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:53,081-Speed 2629.04 samples/sec   Loss 1.6702   LearningRate 0.0012   Epoch: 17   Global Step: 740380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:57:57,145-Speed 2520.07 samples/sec   Loss 1.6675   LearningRate 0.0012   Epoch: 17   Global Step: 740390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:01,183-Speed 2537.13 samples/sec   Loss 1.6160   LearningRate 0.0012   Epoch: 17   Global Step: 740400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:05,085-Speed 2624.65 samples/sec   Loss 1.7183   LearningRate 0.0012   Epoch: 17   Global Step: 740410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:08,983-Speed 2627.18 samples/sec   Loss 1.6741   LearningRate 0.0012   Epoch: 17   Global Step: 740420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:12,884-Speed 2625.50 samples/sec   Loss 1.6809   LearningRate 0.0012   Epoch: 17   Global Step: 740430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:16,788-Speed 2624.17 samples/sec   Loss 1.6477   LearningRate 0.0012   Epoch: 17   Global Step: 740440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:20,701-Speed 2617.92 samples/sec   Loss 1.6580   LearningRate 0.0012   Epoch: 17   Global Step: 740450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:24,600-Speed 2627.05 samples/sec   Loss 1.6288   LearningRate 0.0012   Epoch: 17   Global Step: 740460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:58:28,511-Speed 2618.23 samples/sec   Loss 1.6815   LearningRate 0.0012   Epoch: 17   Global Step: 740470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:32,411-Speed 2626.48 samples/sec   Loss 1.6235   LearningRate 0.0012   Epoch: 17   Global Step: 740480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:36,313-Speed 2626.63 samples/sec   Loss 1.6159   LearningRate 0.0012   Epoch: 17   Global Step: 740490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:40,213-Speed 2625.64 samples/sec   Loss 1.6440   LearningRate 0.0012   Epoch: 17   Global Step: 740500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:44,113-Speed 2626.59 samples/sec   Loss 1.6407   LearningRate 0.0012   Epoch: 17   Global Step: 740510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:48,017-Speed 2623.89 samples/sec   Loss 1.6535   LearningRate 0.0012   Epoch: 17   Global Step: 740520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:51,919-Speed 2624.54 samples/sec   Loss 1.5628   LearningRate 0.0012   Epoch: 17   Global Step: 740530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:55,821-Speed 2625.16 samples/sec   Loss 1.6528   LearningRate 0.0012   Epoch: 17   Global Step: 740540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:58:59,724-Speed 2624.69 samples/sec   Loss 1.7173   LearningRate 0.0012   Epoch: 17   Global Step: 740550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:03,631-Speed 2621.57 samples/sec   Loss 1.6285   LearningRate 0.0012   Epoch: 17   Global Step: 740560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:07,533-Speed 2625.06 samples/sec   Loss 1.6501   LearningRate 0.0012   Epoch: 17   Global Step: 740570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-16 06:59:11,417-Speed 2637.10 samples/sec   Loss 1.6220   LearningRate 0.0012   Epoch: 17   Global Step: 740580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:15,331-Speed 2617.03 samples/sec   Loss 1.6092   LearningRate 0.0012   Epoch: 17   Global Step: 740590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:19,233-Speed 2624.95 samples/sec   Loss 1.6952   LearningRate 0.0012   Epoch: 17   Global Step: 740600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:23,143-Speed 2619.65 samples/sec   Loss 1.6475   LearningRate 0.0012   Epoch: 17   Global Step: 740610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:27,084-Speed 2598.88 samples/sec   Loss 1.6308   LearningRate 0.0011   Epoch: 17   Global Step: 740620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:30,995-Speed 2619.37 samples/sec   Loss 1.6324   LearningRate 0.0011   Epoch: 17   Global Step: 740630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 06:59:34,872-Speed 2641.15 samples/sec   Loss 1.5992   LearningRate 0.0011   Epoch: 17   Global Step: 740640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:59:38,776-Speed 2623.97 samples/sec   Loss 1.6664   LearningRate 0.0011   Epoch: 17   Global Step: 740650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:59:42,679-Speed 2624.24 samples/sec   Loss 1.6110   LearningRate 0.0011   Epoch: 17   Global Step: 740660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:59:46,581-Speed 2625.34 samples/sec   Loss 1.7325   LearningRate 0.0011   Epoch: 17   Global Step: 740670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:59:50,483-Speed 2624.37 samples/sec   Loss 1.6980   LearningRate 0.0011   Epoch: 17   Global Step: 740680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:59:54,385-Speed 2625.09 samples/sec   Loss 1.5990   LearningRate 0.0011   Epoch: 17   Global Step: 740690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 06:59:58,288-Speed 2623.70 samples/sec   Loss 1.6167   LearningRate 0.0011   Epoch: 17   Global Step: 740700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:02,192-Speed 2624.00 samples/sec   Loss 1.6139   LearningRate 0.0011   Epoch: 17   Global Step: 740710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:06,094-Speed 2626.21 samples/sec   Loss 1.6677   LearningRate 0.0011   Epoch: 17   Global Step: 740720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:10,005-Speed 2618.84 samples/sec   Loss 1.6670   LearningRate 0.0011   Epoch: 17   Global Step: 740730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:13,917-Speed 2618.31 samples/sec   Loss 1.6340   LearningRate 0.0011   Epoch: 17   Global Step: 740740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:00:17,821-Speed 2623.03 samples/sec   Loss 1.6463   LearningRate 0.0011   Epoch: 17   Global Step: 740750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:00:21,749-Speed 2607.74 samples/sec   Loss 1.6497   LearningRate 0.0011   Epoch: 17   Global Step: 740760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:00:25,646-Speed 2628.20 samples/sec   Loss 1.5978   LearningRate 0.0011   Epoch: 17   Global Step: 740770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:29,579-Speed 2604.23 samples/sec   Loss 1.6608   LearningRate 0.0011   Epoch: 17   Global Step: 740780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:33,487-Speed 2620.60 samples/sec   Loss 1.6271   LearningRate 0.0011   Epoch: 17   Global Step: 740790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:37,397-Speed 2619.72 samples/sec   Loss 1.6598   LearningRate 0.0011   Epoch: 17   Global Step: 740800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:41,306-Speed 2620.72 samples/sec   Loss 1.6317   LearningRate 0.0011   Epoch: 17   Global Step: 740810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:45,213-Speed 2621.48 samples/sec   Loss 1.6489   LearningRate 0.0011   Epoch: 17   Global Step: 740820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:49,127-Speed 2617.60 samples/sec   Loss 1.6403   LearningRate 0.0011   Epoch: 17   Global Step: 740830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:00:52,998-Speed 2645.69 samples/sec   Loss 1.6345   LearningRate 0.0011   Epoch: 17   Global Step: 740840   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:00:56,938-Speed 2599.71 samples/sec   Loss 1.6555   LearningRate 0.0011   Epoch: 17   Global Step: 740850   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:00,848-Speed 2619.73 samples/sec   Loss 1.6554   LearningRate 0.0011   Epoch: 17   Global Step: 740860   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:04,758-Speed 2619.26 samples/sec   Loss 1.6549   LearningRate 0.0011   Epoch: 17   Global Step: 740870   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:08,680-Speed 2611.84 samples/sec   Loss 1.5892   LearningRate 0.0011   Epoch: 17   Global Step: 740880   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:12,593-Speed 2617.48 samples/sec   Loss 1.5554   LearningRate 0.0011   Epoch: 17   Global Step: 740890   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:16,512-Speed 2614.31 samples/sec   Loss 1.5856   LearningRate 0.0011   Epoch: 17   Global Step: 740900   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:20,422-Speed 2618.88 samples/sec   Loss 1.6123   LearningRate 0.0011   Epoch: 17   Global Step: 740910   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:24,334-Speed 2618.53 samples/sec   Loss 1.6659   LearningRate 0.0011   Epoch: 17   Global Step: 740920   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:28,248-Speed 2617.10 samples/sec   Loss 1.6263   LearningRate 0.0011   Epoch: 17   Global Step: 740930   Fp16 Grad Scale: 4096   Required: 10 hours
Training: 2022-04-16 07:01:32,179-Speed 2606.07 samples/sec   Loss 1.7075   LearningRate 0.0011   Epoch: 17   Global Step: 740940   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:01:36,097-Speed 2613.51 samples/sec   Loss 1.6377   LearningRate 0.0011   Epoch: 17   Global Step: 740950   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:01:40,017-Speed 2612.68 samples/sec   Loss 1.6281   LearningRate 0.0011   Epoch: 17   Global Step: 740960   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:01:43,931-Speed 2616.96 samples/sec   Loss 1.5883   LearningRate 0.0011   Epoch: 17   Global Step: 740970   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:01:47,853-Speed 2611.46 samples/sec   Loss 1.6129   LearningRate 0.0011   Epoch: 17   Global Step: 740980   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:01:51,781-Speed 2608.88 samples/sec   Loss 1.5888   LearningRate 0.0011   Epoch: 17   Global Step: 740990   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:01:55,703-Speed 2611.31 samples/sec   Loss 1.6884   LearningRate 0.0011   Epoch: 17   Global Step: 741000   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:01:59,620-Speed 2615.73 samples/sec   Loss 1.6047   LearningRate 0.0011   Epoch: 17   Global Step: 741010   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:02:03,537-Speed 2614.45 samples/sec   Loss 1.6515   LearningRate 0.0011   Epoch: 17   Global Step: 741020   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:02:07,452-Speed 2616.09 samples/sec   Loss 1.6079   LearningRate 0.0011   Epoch: 17   Global Step: 741030   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:02:11,366-Speed 2617.10 samples/sec   Loss 1.6342   LearningRate 0.0011   Epoch: 17   Global Step: 741040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:15,276-Speed 2619.57 samples/sec   Loss 1.6096   LearningRate 0.0011   Epoch: 17   Global Step: 741050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:19,195-Speed 2613.84 samples/sec   Loss 1.6524   LearningRate 0.0011   Epoch: 17   Global Step: 741060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:23,101-Speed 2622.13 samples/sec   Loss 1.6610   LearningRate 0.0011   Epoch: 17   Global Step: 741070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:27,014-Speed 2617.50 samples/sec   Loss 1.6431   LearningRate 0.0011   Epoch: 17   Global Step: 741080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:30,922-Speed 2621.47 samples/sec   Loss 1.6376   LearningRate 0.0011   Epoch: 17   Global Step: 741090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:34,833-Speed 2618.98 samples/sec   Loss 1.6253   LearningRate 0.0011   Epoch: 17   Global Step: 741100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:38,738-Speed 2622.33 samples/sec   Loss 1.6003   LearningRate 0.0011   Epoch: 17   Global Step: 741110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:42,660-Speed 2611.95 samples/sec   Loss 1.6393   LearningRate 0.0011   Epoch: 17   Global Step: 741120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:46,575-Speed 2616.24 samples/sec   Loss 1.6537   LearningRate 0.0011   Epoch: 17   Global Step: 741130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:02:50,490-Speed 2616.29 samples/sec   Loss 1.6276   LearningRate 0.0011   Epoch: 17   Global Step: 741140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:02:54,409-Speed 2614.02 samples/sec   Loss 1.6873   LearningRate 0.0011   Epoch: 17   Global Step: 741150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:02:58,344-Speed 2602.46 samples/sec   Loss 1.5874   LearningRate 0.0011   Epoch: 17   Global Step: 741160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:02,265-Speed 2612.72 samples/sec   Loss 1.6481   LearningRate 0.0011   Epoch: 17   Global Step: 741170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:06,177-Speed 2618.42 samples/sec   Loss 1.6340   LearningRate 0.0011   Epoch: 17   Global Step: 741180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:10,091-Speed 2616.73 samples/sec   Loss 1.6340   LearningRate 0.0011   Epoch: 17   Global Step: 741190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:14,006-Speed 2616.17 samples/sec   Loss 1.6274   LearningRate 0.0011   Epoch: 17   Global Step: 741200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:17,917-Speed 2619.13 samples/sec   Loss 1.6740   LearningRate 0.0011   Epoch: 17   Global Step: 741210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:21,842-Speed 2609.85 samples/sec   Loss 1.6145   LearningRate 0.0011   Epoch: 17   Global Step: 741220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:25,746-Speed 2623.34 samples/sec   Loss 1.6661   LearningRate 0.0011   Epoch: 17   Global Step: 741230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:29,631-Speed 2636.24 samples/sec   Loss 1.6590   LearningRate 0.0011   Epoch: 17   Global Step: 741240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:03:33,519-Speed 2634.61 samples/sec   Loss 1.6367   LearningRate 0.0011   Epoch: 17   Global Step: 741250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:03:37,428-Speed 2620.40 samples/sec   Loss 1.6026   LearningRate 0.0011   Epoch: 17   Global Step: 741260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:03:41,339-Speed 2618.91 samples/sec   Loss 1.6482   LearningRate 0.0011   Epoch: 17   Global Step: 741270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:03:45,263-Speed 2610.14 samples/sec   Loss 1.5939   LearningRate 0.0011   Epoch: 17   Global Step: 741280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:03:49,235-Speed 2578.71 samples/sec   Loss 1.6302   LearningRate 0.0011   Epoch: 17   Global Step: 741290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:03:53,146-Speed 2618.76 samples/sec   Loss 1.6202   LearningRate 0.0011   Epoch: 17   Global Step: 741300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:03:57,051-Speed 2623.14 samples/sec   Loss 1.6092   LearningRate 0.0011   Epoch: 17   Global Step: 741310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:00,985-Speed 2603.67 samples/sec   Loss 1.5788   LearningRate 0.0011   Epoch: 17   Global Step: 741320   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:04,914-Speed 2606.68 samples/sec   Loss 1.6143   LearningRate 0.0011   Epoch: 17   Global Step: 741330   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:08,822-Speed 2620.46 samples/sec   Loss 1.5975   LearningRate 0.0011   Epoch: 17   Global Step: 741340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:12,731-Speed 2620.76 samples/sec   Loss 1.6835   LearningRate 0.0011   Epoch: 17   Global Step: 741350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:04:16,635-Speed 2623.87 samples/sec   Loss 1.6619   LearningRate 0.0011   Epoch: 17   Global Step: 741360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:04:20,519-Speed 2636.51 samples/sec   Loss 1.6265   LearningRate 0.0011   Epoch: 17   Global Step: 741370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:24,421-Speed 2624.78 samples/sec   Loss 1.6979   LearningRate 0.0011   Epoch: 17   Global Step: 741380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:28,346-Speed 2609.46 samples/sec   Loss 1.6780   LearningRate 0.0011   Epoch: 17   Global Step: 741390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:32,246-Speed 2626.40 samples/sec   Loss 1.5581   LearningRate 0.0011   Epoch: 17   Global Step: 741400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:36,158-Speed 2619.28 samples/sec   Loss 1.6307   LearningRate 0.0011   Epoch: 17   Global Step: 741410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:40,053-Speed 2629.49 samples/sec   Loss 1.6427   LearningRate 0.0011   Epoch: 17   Global Step: 741420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:43,957-Speed 2623.69 samples/sec   Loss 1.6410   LearningRate 0.0011   Epoch: 17   Global Step: 741430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:47,871-Speed 2616.62 samples/sec   Loss 1.6241   LearningRate 0.0011   Epoch: 17   Global Step: 741440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:51,779-Speed 2621.48 samples/sec   Loss 1.6158   LearningRate 0.0011   Epoch: 17   Global Step: 741450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:55,684-Speed 2622.92 samples/sec   Loss 1.6065   LearningRate 0.0011   Epoch: 17   Global Step: 741460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:04:59,589-Speed 2622.43 samples/sec   Loss 1.5834   LearningRate 0.0011   Epoch: 17   Global Step: 741470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:05:03,496-Speed 2621.85 samples/sec   Loss 1.6282   LearningRate 0.0011   Epoch: 17   Global Step: 741480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:05:07,409-Speed 2617.43 samples/sec   Loss 1.6152   LearningRate 0.0011   Epoch: 17   Global Step: 741490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:05:11,330-Speed 2612.88 samples/sec   Loss 1.6736   LearningRate 0.0011   Epoch: 17   Global Step: 741500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:05:15,236-Speed 2621.86 samples/sec   Loss 1.6244   LearningRate 0.0011   Epoch: 17   Global Step: 741510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:05:19,159-Speed 2611.13 samples/sec   Loss 1.6298   LearningRate 0.0011   Epoch: 17   Global Step: 741520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:05:23,075-Speed 2615.76 samples/sec   Loss 1.6432   LearningRate 0.0011   Epoch: 17   Global Step: 741530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:05:26,982-Speed 2621.17 samples/sec   Loss 1.6196   LearningRate 0.0011   Epoch: 17   Global Step: 741540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:05:30,877-Speed 2629.32 samples/sec   Loss 1.6364   LearningRate 0.0011   Epoch: 17   Global Step: 741550   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:05:34,782-Speed 2622.98 samples/sec   Loss 1.6194   LearningRate 0.0011   Epoch: 17   Global Step: 741560   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:05:38,691-Speed 2620.48 samples/sec   Loss 1.6497   LearningRate 0.0011   Epoch: 17   Global Step: 741570   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:05:42,603-Speed 2618.35 samples/sec   Loss 1.6921   LearningRate 0.0011   Epoch: 17   Global Step: 741580   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:05:46,512-Speed 2620.48 samples/sec   Loss 1.6228   LearningRate 0.0011   Epoch: 17   Global Step: 741590   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:05:50,414-Speed 2624.79 samples/sec   Loss 1.6233   LearningRate 0.0011   Epoch: 17   Global Step: 741600   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:05:54,317-Speed 2624.17 samples/sec   Loss 1.6574   LearningRate 0.0011   Epoch: 17   Global Step: 741610   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:05:58,221-Speed 2623.45 samples/sec   Loss 1.5809   LearningRate 0.0011   Epoch: 17   Global Step: 741620   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:06:02,223-Speed 2559.25 samples/sec   Loss 1.6300   LearningRate 0.0011   Epoch: 17   Global Step: 741630   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:06:06,134-Speed 2619.22 samples/sec   Loss 1.5736   LearningRate 0.0011   Epoch: 17   Global Step: 741640   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:06:10,063-Speed 2606.61 samples/sec   Loss 1.6158   LearningRate 0.0011   Epoch: 17   Global Step: 741650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:13,969-Speed 2622.89 samples/sec   Loss 1.5474   LearningRate 0.0011   Epoch: 17   Global Step: 741660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:17,950-Speed 2572.19 samples/sec   Loss 1.6056   LearningRate 0.0011   Epoch: 17   Global Step: 741670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:21,930-Speed 2574.45 samples/sec   Loss 1.6544   LearningRate 0.0011   Epoch: 17   Global Step: 741680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:25,837-Speed 2621.46 samples/sec   Loss 1.6397   LearningRate 0.0011   Epoch: 17   Global Step: 741690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:29,755-Speed 2613.68 samples/sec   Loss 1.6562   LearningRate 0.0011   Epoch: 17   Global Step: 741700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:33,662-Speed 2622.17 samples/sec   Loss 1.6385   LearningRate 0.0011   Epoch: 17   Global Step: 741710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:37,564-Speed 2624.89 samples/sec   Loss 1.6462   LearningRate 0.0011   Epoch: 17   Global Step: 741720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:41,465-Speed 2625.63 samples/sec   Loss 1.6136   LearningRate 0.0011   Epoch: 17   Global Step: 741730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:45,366-Speed 2625.84 samples/sec   Loss 1.6275   LearningRate 0.0011   Epoch: 17   Global Step: 741740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:06:49,266-Speed 2626.61 samples/sec   Loss 1.6734   LearningRate 0.0011   Epoch: 17   Global Step: 741750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:06:53,164-Speed 2627.38 samples/sec   Loss 1.6122   LearningRate 0.0011   Epoch: 17   Global Step: 741760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:06:57,063-Speed 2627.66 samples/sec   Loss 1.6310   LearningRate 0.0011   Epoch: 17   Global Step: 741770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:07:00,966-Speed 2624.11 samples/sec   Loss 1.6700   LearningRate 0.0011   Epoch: 17   Global Step: 741780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:07:04,864-Speed 2627.38 samples/sec   Loss 1.6064   LearningRate 0.0011   Epoch: 17   Global Step: 741790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:07:08,763-Speed 2626.26 samples/sec   Loss 1.6680   LearningRate 0.0011   Epoch: 17   Global Step: 741800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:12,673-Speed 2620.03 samples/sec   Loss 1.5998   LearningRate 0.0011   Epoch: 17   Global Step: 741810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:16,570-Speed 2628.58 samples/sec   Loss 1.6305   LearningRate 0.0011   Epoch: 17   Global Step: 741820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:20,483-Speed 2617.70 samples/sec   Loss 1.6082   LearningRate 0.0011   Epoch: 17   Global Step: 741830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:24,382-Speed 2627.12 samples/sec   Loss 1.6302   LearningRate 0.0011   Epoch: 17   Global Step: 741840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:28,285-Speed 2624.45 samples/sec   Loss 1.6718   LearningRate 0.0011   Epoch: 17   Global Step: 741850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:32,191-Speed 2622.68 samples/sec   Loss 1.6682   LearningRate 0.0011   Epoch: 17   Global Step: 741860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:36,101-Speed 2619.12 samples/sec   Loss 1.6252   LearningRate 0.0011   Epoch: 17   Global Step: 741870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:40,012-Speed 2619.05 samples/sec   Loss 1.6259   LearningRate 0.0011   Epoch: 17   Global Step: 741880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:43,916-Speed 2623.12 samples/sec   Loss 1.6584   LearningRate 0.0011   Epoch: 17   Global Step: 741890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:47,825-Speed 2620.52 samples/sec   Loss 1.5907   LearningRate 0.0011   Epoch: 17   Global Step: 741900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:07:51,707-Speed 2638.62 samples/sec   Loss 1.6629   LearningRate 0.0011   Epoch: 17   Global Step: 741910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:55,607-Speed 2625.98 samples/sec   Loss 1.6324   LearningRate 0.0011   Epoch: 17   Global Step: 741920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:07:59,510-Speed 2624.70 samples/sec   Loss 1.6772   LearningRate 0.0011   Epoch: 17   Global Step: 741930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:03,414-Speed 2623.38 samples/sec   Loss 1.6206   LearningRate 0.0011   Epoch: 17   Global Step: 741940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:07,321-Speed 2621.57 samples/sec   Loss 1.6437   LearningRate 0.0011   Epoch: 17   Global Step: 741950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:11,220-Speed 2627.17 samples/sec   Loss 1.6574   LearningRate 0.0011   Epoch: 17   Global Step: 741960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:15,122-Speed 2625.20 samples/sec   Loss 1.6157   LearningRate 0.0011   Epoch: 17   Global Step: 741970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:19,020-Speed 2627.67 samples/sec   Loss 1.6437   LearningRate 0.0011   Epoch: 17   Global Step: 741980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:22,917-Speed 2628.38 samples/sec   Loss 1.6623   LearningRate 0.0011   Epoch: 17   Global Step: 741990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:26,815-Speed 2627.00 samples/sec   Loss 1.6354   LearningRate 0.0011   Epoch: 17   Global Step: 742000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:30,724-Speed 2621.01 samples/sec   Loss 1.6260   LearningRate 0.0011   Epoch: 17   Global Step: 742010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:08:34,602-Speed 2640.99 samples/sec   Loss 1.6248   LearningRate 0.0011   Epoch: 17   Global Step: 742020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:38,513-Speed 2619.25 samples/sec   Loss 1.5917   LearningRate 0.0011   Epoch: 17   Global Step: 742030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:42,416-Speed 2623.97 samples/sec   Loss 1.6298   LearningRate 0.0011   Epoch: 17   Global Step: 742040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:46,352-Speed 2602.84 samples/sec   Loss 1.6571   LearningRate 0.0011   Epoch: 17   Global Step: 742050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:50,264-Speed 2617.98 samples/sec   Loss 1.6762   LearningRate 0.0011   Epoch: 17   Global Step: 742060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:54,162-Speed 2627.60 samples/sec   Loss 1.6295   LearningRate 0.0011   Epoch: 17   Global Step: 742070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:08:58,060-Speed 2627.85 samples/sec   Loss 1.6265   LearningRate 0.0011   Epoch: 17   Global Step: 742080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:01,956-Speed 2629.75 samples/sec   Loss 1.6397   LearningRate 0.0011   Epoch: 17   Global Step: 742090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:05,857-Speed 2625.24 samples/sec   Loss 1.6270   LearningRate 0.0011   Epoch: 17   Global Step: 742100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:09,769-Speed 2618.15 samples/sec   Loss 1.6398   LearningRate 0.0011   Epoch: 17   Global Step: 742110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:13,688-Speed 2613.84 samples/sec   Loss 1.6242   LearningRate 0.0011   Epoch: 17   Global Step: 742120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:09:17,561-Speed 2645.43 samples/sec   Loss 1.6485   LearningRate 0.0011   Epoch: 17   Global Step: 742130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:21,461-Speed 2626.08 samples/sec   Loss 1.6341   LearningRate 0.0011   Epoch: 17   Global Step: 742140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:25,360-Speed 2627.25 samples/sec   Loss 1.5534   LearningRate 0.0011   Epoch: 17   Global Step: 742150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:29,258-Speed 2628.05 samples/sec   Loss 1.6708   LearningRate 0.0011   Epoch: 17   Global Step: 742160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:33,156-Speed 2629.05 samples/sec   Loss 1.6281   LearningRate 0.0011   Epoch: 17   Global Step: 742170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:37,058-Speed 2624.76 samples/sec   Loss 1.6193   LearningRate 0.0011   Epoch: 17   Global Step: 742180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:40,953-Speed 2629.53 samples/sec   Loss 1.5956   LearningRate 0.0011   Epoch: 17   Global Step: 742190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:44,852-Speed 2627.09 samples/sec   Loss 1.6192   LearningRate 0.0011   Epoch: 17   Global Step: 742200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:48,753-Speed 2625.01 samples/sec   Loss 1.5838   LearningRate 0.0011   Epoch: 17   Global Step: 742210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:52,650-Speed 2628.93 samples/sec   Loss 1.6078   LearningRate 0.0011   Epoch: 17   Global Step: 742220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:09:56,527-Speed 2641.58 samples/sec   Loss 1.6429   LearningRate 0.0011   Epoch: 17   Global Step: 742230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:00,439-Speed 2618.57 samples/sec   Loss 1.6228   LearningRate 0.0011   Epoch: 17   Global Step: 742240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:04,341-Speed 2625.06 samples/sec   Loss 1.5721   LearningRate 0.0011   Epoch: 17   Global Step: 742250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:08,242-Speed 2625.28 samples/sec   Loss 1.6659   LearningRate 0.0011   Epoch: 17   Global Step: 742260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:12,162-Speed 2612.61 samples/sec   Loss 1.6628   LearningRate 0.0011   Epoch: 17   Global Step: 742270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:16,080-Speed 2614.97 samples/sec   Loss 1.7030   LearningRate 0.0011   Epoch: 17   Global Step: 742280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:20,001-Speed 2612.34 samples/sec   Loss 1.6283   LearningRate 0.0011   Epoch: 17   Global Step: 742290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:23,924-Speed 2610.23 samples/sec   Loss 1.6054   LearningRate 0.0011   Epoch: 17   Global Step: 742300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:27,822-Speed 2627.94 samples/sec   Loss 1.6673   LearningRate 0.0011   Epoch: 17   Global Step: 742310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:10:31,704-Speed 2639.44 samples/sec   Loss 1.6223   LearningRate 0.0011   Epoch: 17   Global Step: 742320   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:10:35,608-Speed 2623.09 samples/sec   Loss 1.6586   LearningRate 0.0011   Epoch: 17   Global Step: 742330   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:10:39,529-Speed 2612.27 samples/sec   Loss 1.5975   LearningRate 0.0011   Epoch: 17   Global Step: 742340   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:10:43,434-Speed 2623.21 samples/sec   Loss 1.6851   LearningRate 0.0011   Epoch: 17   Global Step: 742350   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:10:47,330-Speed 2628.95 samples/sec   Loss 1.6297   LearningRate 0.0011   Epoch: 17   Global Step: 742360   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:10:51,231-Speed 2626.05 samples/sec   Loss 1.6341   LearningRate 0.0011   Epoch: 17   Global Step: 742370   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:10:55,134-Speed 2623.85 samples/sec   Loss 1.5772   LearningRate 0.0011   Epoch: 17   Global Step: 742380   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:10:59,041-Speed 2621.52 samples/sec   Loss 1.6834   LearningRate 0.0011   Epoch: 17   Global Step: 742390   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:11:02,961-Speed 2612.74 samples/sec   Loss 1.6009   LearningRate 0.0011   Epoch: 17   Global Step: 742400   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:11:06,858-Speed 2628.71 samples/sec   Loss 1.5814   LearningRate 0.0011   Epoch: 17   Global Step: 742410   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:11:10,760-Speed 2624.87 samples/sec   Loss 1.5794   LearningRate 0.0011   Epoch: 17   Global Step: 742420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:14,674-Speed 2617.20 samples/sec   Loss 1.5915   LearningRate 0.0011   Epoch: 17   Global Step: 742430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:18,629-Speed 2589.71 samples/sec   Loss 1.6092   LearningRate 0.0011   Epoch: 17   Global Step: 742440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:22,541-Speed 2618.45 samples/sec   Loss 1.6453   LearningRate 0.0011   Epoch: 17   Global Step: 742450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:26,441-Speed 2626.55 samples/sec   Loss 1.6417   LearningRate 0.0011   Epoch: 17   Global Step: 742460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:30,343-Speed 2624.80 samples/sec   Loss 1.6272   LearningRate 0.0011   Epoch: 17   Global Step: 742470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:34,247-Speed 2623.39 samples/sec   Loss 1.5987   LearningRate 0.0011   Epoch: 17   Global Step: 742480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:38,150-Speed 2624.89 samples/sec   Loss 1.6065   LearningRate 0.0011   Epoch: 17   Global Step: 742490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:42,054-Speed 2623.46 samples/sec   Loss 1.6208   LearningRate 0.0011   Epoch: 17   Global Step: 742500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:45,951-Speed 2628.54 samples/sec   Loss 1.6168   LearningRate 0.0011   Epoch: 17   Global Step: 742510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:11:49,867-Speed 2616.28 samples/sec   Loss 1.6260   LearningRate 0.0011   Epoch: 17   Global Step: 742520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:11:53,764-Speed 2628.18 samples/sec   Loss 1.6489   LearningRate 0.0011   Epoch: 17   Global Step: 742530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:11:57,640-Speed 2642.32 samples/sec   Loss 1.5893   LearningRate 0.0011   Epoch: 17   Global Step: 742540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:01,544-Speed 2623.72 samples/sec   Loss 1.6479   LearningRate 0.0011   Epoch: 17   Global Step: 742550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:05,444-Speed 2626.50 samples/sec   Loss 1.6313   LearningRate 0.0011   Epoch: 17   Global Step: 742560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:09,342-Speed 2627.26 samples/sec   Loss 1.6564   LearningRate 0.0011   Epoch: 17   Global Step: 742570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:13,249-Speed 2621.88 samples/sec   Loss 1.6076   LearningRate 0.0011   Epoch: 17   Global Step: 742580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:17,145-Speed 2628.56 samples/sec   Loss 1.6146   LearningRate 0.0011   Epoch: 17   Global Step: 742590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:21,101-Speed 2589.78 samples/sec   Loss 1.6623   LearningRate 0.0011   Epoch: 17   Global Step: 742600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:25,003-Speed 2624.65 samples/sec   Loss 1.6159   LearningRate 0.0011   Epoch: 17   Global Step: 742610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:28,899-Speed 2629.24 samples/sec   Loss 1.6270   LearningRate 0.0011   Epoch: 17   Global Step: 742620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:32,798-Speed 2626.24 samples/sec   Loss 1.6262   LearningRate 0.0011   Epoch: 17   Global Step: 742630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:36,745-Speed 2595.63 samples/sec   Loss 1.5768   LearningRate 0.0011   Epoch: 17   Global Step: 742640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:12:40,619-Speed 2643.82 samples/sec   Loss 1.5643   LearningRate 0.0011   Epoch: 17   Global Step: 742650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:44,519-Speed 2626.05 samples/sec   Loss 1.5864   LearningRate 0.0011   Epoch: 17   Global Step: 742660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:48,438-Speed 2614.69 samples/sec   Loss 1.6062   LearningRate 0.0011   Epoch: 17   Global Step: 742670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:52,333-Speed 2629.85 samples/sec   Loss 1.5950   LearningRate 0.0011   Epoch: 17   Global Step: 742680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:12:56,231-Speed 2628.14 samples/sec   Loss 1.5702   LearningRate 0.0011   Epoch: 17   Global Step: 742690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:00,132-Speed 2624.81 samples/sec   Loss 1.6523   LearningRate 0.0011   Epoch: 17   Global Step: 742700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:04,032-Speed 2626.55 samples/sec   Loss 1.5935   LearningRate 0.0011   Epoch: 17   Global Step: 742710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:07,938-Speed 2622.10 samples/sec   Loss 1.6455   LearningRate 0.0011   Epoch: 17   Global Step: 742720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:11,834-Speed 2629.53 samples/sec   Loss 1.6469   LearningRate 0.0011   Epoch: 17   Global Step: 742730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:15,730-Speed 2628.30 samples/sec   Loss 1.6033   LearningRate 0.0011   Epoch: 17   Global Step: 742740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:19,636-Speed 2622.35 samples/sec   Loss 1.6591   LearningRate 0.0011   Epoch: 17   Global Step: 742750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:13:23,532-Speed 2629.08 samples/sec   Loss 1.6496   LearningRate 0.0011   Epoch: 17   Global Step: 742760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:13:27,409-Speed 2641.94 samples/sec   Loss 1.6371   LearningRate 0.0011   Epoch: 17   Global Step: 742770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:31,307-Speed 2628.07 samples/sec   Loss 1.6693   LearningRate 0.0011   Epoch: 17   Global Step: 742780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:35,213-Speed 2622.03 samples/sec   Loss 1.6232   LearningRate 0.0011   Epoch: 17   Global Step: 742790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:39,163-Speed 2592.72 samples/sec   Loss 1.5958   LearningRate 0.0011   Epoch: 17   Global Step: 742800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:43,059-Speed 2628.98 samples/sec   Loss 1.6010   LearningRate 0.0011   Epoch: 17   Global Step: 742810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:46,954-Speed 2630.01 samples/sec   Loss 1.6061   LearningRate 0.0011   Epoch: 17   Global Step: 742820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:50,850-Speed 2629.00 samples/sec   Loss 1.6551   LearningRate 0.0011   Epoch: 17   Global Step: 742830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:54,793-Speed 2597.73 samples/sec   Loss 1.6789   LearningRate 0.0011   Epoch: 17   Global Step: 742840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:13:58,689-Speed 2629.41 samples/sec   Loss 1.6672   LearningRate 0.0011   Epoch: 17   Global Step: 742850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:02,587-Speed 2628.09 samples/sec   Loss 1.6544   LearningRate 0.0011   Epoch: 17   Global Step: 742860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:06,486-Speed 2626.68 samples/sec   Loss 1.6613   LearningRate 0.0011   Epoch: 17   Global Step: 742870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:14:10,389-Speed 2624.25 samples/sec   Loss 1.5953   LearningRate 0.0011   Epoch: 17   Global Step: 742880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:14:14,288-Speed 2627.17 samples/sec   Loss 1.6082   LearningRate 0.0011   Epoch: 17   Global Step: 742890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:14:18,224-Speed 2602.23 samples/sec   Loss 1.6413   LearningRate 0.0011   Epoch: 17   Global Step: 742900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:14:22,200-Speed 2576.51 samples/sec   Loss 1.6171   LearningRate 0.0011   Epoch: 17   Global Step: 742910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:14:26,115-Speed 2615.91 samples/sec   Loss 1.6320   LearningRate 0.0011   Epoch: 17   Global Step: 742920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:30,023-Speed 2621.13 samples/sec   Loss 1.6340   LearningRate 0.0011   Epoch: 17   Global Step: 742930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:33,933-Speed 2619.95 samples/sec   Loss 1.6053   LearningRate 0.0011   Epoch: 17   Global Step: 742940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:37,826-Speed 2630.96 samples/sec   Loss 1.6241   LearningRate 0.0011   Epoch: 17   Global Step: 742950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:41,723-Speed 2628.39 samples/sec   Loss 1.5884   LearningRate 0.0011   Epoch: 17   Global Step: 742960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:45,615-Speed 2632.12 samples/sec   Loss 1.6833   LearningRate 0.0011   Epoch: 17   Global Step: 742970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:49,512-Speed 2627.84 samples/sec   Loss 1.5778   LearningRate 0.0011   Epoch: 17   Global Step: 742980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:53,428-Speed 2616.36 samples/sec   Loss 1.5926   LearningRate 0.0011   Epoch: 17   Global Step: 742990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:14:57,326-Speed 2627.71 samples/sec   Loss 1.6661   LearningRate 0.0011   Epoch: 17   Global Step: 743000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:01,232-Speed 2622.31 samples/sec   Loss 1.6513   LearningRate 0.0011   Epoch: 17   Global Step: 743010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:05,108-Speed 2643.01 samples/sec   Loss 1.6505   LearningRate 0.0011   Epoch: 17   Global Step: 743020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:09,003-Speed 2629.31 samples/sec   Loss 1.6068   LearningRate 0.0011   Epoch: 17   Global Step: 743030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:13,006-Speed 2558.45 samples/sec   Loss 1.6455   LearningRate 0.0011   Epoch: 17   Global Step: 743040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:16,907-Speed 2627.13 samples/sec   Loss 1.6176   LearningRate 0.0011   Epoch: 17   Global Step: 743050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:20,802-Speed 2629.37 samples/sec   Loss 1.6119   LearningRate 0.0011   Epoch: 17   Global Step: 743060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:24,697-Speed 2630.01 samples/sec   Loss 1.6293   LearningRate 0.0011   Epoch: 17   Global Step: 743070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:28,603-Speed 2621.86 samples/sec   Loss 1.5756   LearningRate 0.0011   Epoch: 17   Global Step: 743080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:32,506-Speed 2624.71 samples/sec   Loss 1.6537   LearningRate 0.0011   Epoch: 17   Global Step: 743090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:15:36,383-Speed 2642.15 samples/sec   Loss 1.6610   LearningRate 0.0011   Epoch: 17   Global Step: 743100   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:15:40,284-Speed 2625.03 samples/sec   Loss 1.6379   LearningRate 0.0011   Epoch: 17   Global Step: 743110   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:15:44,178-Speed 2630.45 samples/sec   Loss 1.6122   LearningRate 0.0011   Epoch: 17   Global Step: 743120   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:15:48,072-Speed 2630.52 samples/sec   Loss 1.5687   LearningRate 0.0011   Epoch: 17   Global Step: 743130   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:15:51,965-Speed 2630.65 samples/sec   Loss 1.5768   LearningRate 0.0011   Epoch: 17   Global Step: 743140   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:15:55,858-Speed 2631.43 samples/sec   Loss 1.5606   LearningRate 0.0011   Epoch: 17   Global Step: 743150   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:15:59,757-Speed 2626.43 samples/sec   Loss 1.6454   LearningRate 0.0011   Epoch: 17   Global Step: 743160   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:16:03,659-Speed 2625.49 samples/sec   Loss 1.5909   LearningRate 0.0011   Epoch: 17   Global Step: 743170   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:16:07,579-Speed 2613.19 samples/sec   Loss 1.6199   LearningRate 0.0011   Epoch: 17   Global Step: 743180   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:16:11,480-Speed 2625.06 samples/sec   Loss 1.6296   LearningRate 0.0011   Epoch: 17   Global Step: 743190   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:16:15,374-Speed 2630.66 samples/sec   Loss 1.6366   LearningRate 0.0011   Epoch: 17   Global Step: 743200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:19,388-Speed 2551.06 samples/sec   Loss 1.6460   LearningRate 0.0011   Epoch: 17   Global Step: 743210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:23,380-Speed 2565.82 samples/sec   Loss 1.5726   LearningRate 0.0011   Epoch: 17   Global Step: 743220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:27,286-Speed 2622.45 samples/sec   Loss 1.6076   LearningRate 0.0011   Epoch: 17   Global Step: 743230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:31,188-Speed 2625.23 samples/sec   Loss 1.6864   LearningRate 0.0011   Epoch: 17   Global Step: 743240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:35,090-Speed 2624.35 samples/sec   Loss 1.6008   LearningRate 0.0011   Epoch: 17   Global Step: 743250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:38,997-Speed 2622.31 samples/sec   Loss 1.6060   LearningRate 0.0011   Epoch: 17   Global Step: 743260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:42,897-Speed 2627.05 samples/sec   Loss 1.6127   LearningRate 0.0011   Epoch: 17   Global Step: 743270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:46,793-Speed 2628.95 samples/sec   Loss 1.5743   LearningRate 0.0011   Epoch: 17   Global Step: 743280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:50,728-Speed 2603.03 samples/sec   Loss 1.6175   LearningRate 0.0011   Epoch: 17   Global Step: 743290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:54,600-Speed 2645.30 samples/sec   Loss 1.6640   LearningRate 0.0011   Epoch: 17   Global Step: 743300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:16:58,497-Speed 2628.41 samples/sec   Loss 1.6439   LearningRate 0.0011   Epoch: 17   Global Step: 743310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:02,406-Speed 2620.21 samples/sec   Loss 1.5678   LearningRate 0.0011   Epoch: 17   Global Step: 743320   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:06,300-Speed 2630.32 samples/sec   Loss 1.5931   LearningRate 0.0011   Epoch: 17   Global Step: 743330   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:10,196-Speed 2629.42 samples/sec   Loss 1.6226   LearningRate 0.0011   Epoch: 17   Global Step: 743340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:14,095-Speed 2627.02 samples/sec   Loss 1.6805   LearningRate 0.0011   Epoch: 17   Global Step: 743350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:17,996-Speed 2625.34 samples/sec   Loss 1.6051   LearningRate 0.0011   Epoch: 17   Global Step: 743360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:21,909-Speed 2617.80 samples/sec   Loss 1.5774   LearningRate 0.0011   Epoch: 17   Global Step: 743370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:25,805-Speed 2628.79 samples/sec   Loss 1.6022   LearningRate 0.0011   Epoch: 17   Global Step: 743380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:29,705-Speed 2626.31 samples/sec   Loss 1.5651   LearningRate 0.0011   Epoch: 17   Global Step: 743390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:33,658-Speed 2590.97 samples/sec   Loss 1.6307   LearningRate 0.0011   Epoch: 17   Global Step: 743400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:17:37,586-Speed 2607.92 samples/sec   Loss 1.6551   LearningRate 0.0011   Epoch: 17   Global Step: 743410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:41,483-Speed 2628.03 samples/sec   Loss 1.6644   LearningRate 0.0011   Epoch: 17   Global Step: 743420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:45,399-Speed 2615.67 samples/sec   Loss 1.6274   LearningRate 0.0011   Epoch: 17   Global Step: 743430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:49,302-Speed 2624.66 samples/sec   Loss 1.6469   LearningRate 0.0011   Epoch: 17   Global Step: 743440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:53,198-Speed 2629.02 samples/sec   Loss 1.5445   LearningRate 0.0011   Epoch: 17   Global Step: 743450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:17:57,098-Speed 2626.27 samples/sec   Loss 1.6280   LearningRate 0.0011   Epoch: 17   Global Step: 743460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:18:00,996-Speed 2627.52 samples/sec   Loss 1.6334   LearningRate 0.0011   Epoch: 17   Global Step: 743470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:18:04,896-Speed 2626.44 samples/sec   Loss 1.6119   LearningRate 0.0011   Epoch: 17   Global Step: 743480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:18:08,793-Speed 2628.63 samples/sec   Loss 1.6568   LearningRate 0.0011   Epoch: 17   Global Step: 743490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:18:12,716-Speed 2610.99 samples/sec   Loss 1.6283   LearningRate 0.0011   Epoch: 17   Global Step: 743500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:18:16,631-Speed 2616.72 samples/sec   Loss 1.6282   LearningRate 0.0011   Epoch: 17   Global Step: 743510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:20,532-Speed 2626.05 samples/sec   Loss 1.6085   LearningRate 0.0011   Epoch: 17   Global Step: 743520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:24,432-Speed 2625.73 samples/sec   Loss 1.6161   LearningRate 0.0011   Epoch: 17   Global Step: 743530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:28,325-Speed 2631.41 samples/sec   Loss 1.6506   LearningRate 0.0011   Epoch: 17   Global Step: 743540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:32,227-Speed 2625.00 samples/sec   Loss 1.5926   LearningRate 0.0011   Epoch: 17   Global Step: 743550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:36,128-Speed 2625.53 samples/sec   Loss 1.5618   LearningRate 0.0011   Epoch: 17   Global Step: 743560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:40,028-Speed 2625.87 samples/sec   Loss 1.6347   LearningRate 0.0011   Epoch: 17   Global Step: 743570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:43,926-Speed 2628.51 samples/sec   Loss 1.6907   LearningRate 0.0011   Epoch: 17   Global Step: 743580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:47,821-Speed 2629.33 samples/sec   Loss 1.6105   LearningRate 0.0011   Epoch: 17   Global Step: 743590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:51,723-Speed 2625.66 samples/sec   Loss 1.6735   LearningRate 0.0011   Epoch: 17   Global Step: 743600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:55,599-Speed 2642.41 samples/sec   Loss 1.6404   LearningRate 0.0011   Epoch: 17   Global Step: 743610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:18:59,490-Speed 2632.25 samples/sec   Loss 1.6392   LearningRate 0.0011   Epoch: 17   Global Step: 743620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:19:03,403-Speed 2617.61 samples/sec   Loss 1.6093   LearningRate 0.0011   Epoch: 17   Global Step: 743630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:07,295-Speed 2632.75 samples/sec   Loss 1.6039   LearningRate 0.0011   Epoch: 17   Global Step: 743640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:11,184-Speed 2633.26 samples/sec   Loss 1.5572   LearningRate 0.0011   Epoch: 17   Global Step: 743650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:15,081-Speed 2628.11 samples/sec   Loss 1.6397   LearningRate 0.0011   Epoch: 17   Global Step: 743660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:18,981-Speed 2626.80 samples/sec   Loss 1.5709   LearningRate 0.0011   Epoch: 17   Global Step: 743670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:22,870-Speed 2633.66 samples/sec   Loss 1.5945   LearningRate 0.0011   Epoch: 17   Global Step: 743680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:26,806-Speed 2602.22 samples/sec   Loss 1.5978   LearningRate 0.0011   Epoch: 17   Global Step: 743690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:30,713-Speed 2621.66 samples/sec   Loss 1.6237   LearningRate 0.0011   Epoch: 17   Global Step: 743700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:34,607-Speed 2630.41 samples/sec   Loss 1.6085   LearningRate 0.0011   Epoch: 17   Global Step: 743710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:38,532-Speed 2609.46 samples/sec   Loss 1.6038   LearningRate 0.0011   Epoch: 17   Global Step: 743720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:19:42,472-Speed 2599.91 samples/sec   Loss 1.6254   LearningRate 0.0011   Epoch: 17   Global Step: 743730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:19:46,366-Speed 2630.04 samples/sec   Loss 1.5573   LearningRate 0.0011   Epoch: 17   Global Step: 743740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:19:50,263-Speed 2628.98 samples/sec   Loss 1.6277   LearningRate 0.0011   Epoch: 17   Global Step: 743750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:19:54,156-Speed 2630.45 samples/sec   Loss 1.5968   LearningRate 0.0011   Epoch: 17   Global Step: 743760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:19:58,067-Speed 2620.22 samples/sec   Loss 1.6393   LearningRate 0.0011   Epoch: 17   Global Step: 743770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:20:01,975-Speed 2620.50 samples/sec   Loss 1.5879   LearningRate 0.0011   Epoch: 17   Global Step: 743780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:20:05,869-Speed 2630.32 samples/sec   Loss 1.6374   LearningRate 0.0011   Epoch: 17   Global Step: 743790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:20:09,739-Speed 2646.65 samples/sec   Loss 1.6080   LearningRate 0.0011   Epoch: 17   Global Step: 743800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:13,668-Speed 2607.38 samples/sec   Loss 1.6446   LearningRate 0.0011   Epoch: 17   Global Step: 743810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:17,569-Speed 2625.84 samples/sec   Loss 1.6625   LearningRate 0.0011   Epoch: 17   Global Step: 743820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:21,475-Speed 2622.04 samples/sec   Loss 1.6601   LearningRate 0.0011   Epoch: 17   Global Step: 743830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:25,373-Speed 2628.11 samples/sec   Loss 1.5936   LearningRate 0.0011   Epoch: 17   Global Step: 743840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:29,265-Speed 2632.31 samples/sec   Loss 1.6169   LearningRate 0.0011   Epoch: 17   Global Step: 743850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:33,163-Speed 2627.48 samples/sec   Loss 1.5892   LearningRate 0.0011   Epoch: 17   Global Step: 743860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:37,093-Speed 2606.04 samples/sec   Loss 1.5886   LearningRate 0.0011   Epoch: 17   Global Step: 743870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:40,990-Speed 2628.25 samples/sec   Loss 1.6211   LearningRate 0.0011   Epoch: 17   Global Step: 743880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:44,901-Speed 2618.37 samples/sec   Loss 1.5602   LearningRate 0.0011   Epoch: 17   Global Step: 743890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:48,849-Speed 2594.69 samples/sec   Loss 1.5281   LearningRate 0.0011   Epoch: 17   Global Step: 743900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:52,743-Speed 2630.25 samples/sec   Loss 1.6331   LearningRate 0.0011   Epoch: 17   Global Step: 743910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:20:56,638-Speed 2629.84 samples/sec   Loss 1.5919   LearningRate 0.0011   Epoch: 17   Global Step: 743920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:00,536-Speed 2627.72 samples/sec   Loss 1.5947   LearningRate 0.0011   Epoch: 17   Global Step: 743930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:04,434-Speed 2627.53 samples/sec   Loss 1.6322   LearningRate 0.0011   Epoch: 17   Global Step: 743940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:08,327-Speed 2631.22 samples/sec   Loss 1.6291   LearningRate 0.0011   Epoch: 17   Global Step: 743950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:12,222-Speed 2629.93 samples/sec   Loss 1.6496   LearningRate 0.0011   Epoch: 17   Global Step: 743960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:16,115-Speed 2630.73 samples/sec   Loss 1.5808   LearningRate 0.0011   Epoch: 17   Global Step: 743970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:20,018-Speed 2624.51 samples/sec   Loss 1.6170   LearningRate 0.0011   Epoch: 17   Global Step: 743980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:23,911-Speed 2630.54 samples/sec   Loss 1.5923   LearningRate 0.0011   Epoch: 17   Global Step: 743990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:27,808-Speed 2628.88 samples/sec   Loss 1.6323   LearningRate 0.0011   Epoch: 17   Global Step: 744000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:21:31,706-Speed 2626.96 samples/sec   Loss 1.5977   LearningRate 0.0011   Epoch: 17   Global Step: 744010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:21:35,605-Speed 2627.23 samples/sec   Loss 1.5882   LearningRate 0.0011   Epoch: 17   Global Step: 744020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:21:39,511-Speed 2622.25 samples/sec   Loss 1.6107   LearningRate 0.0011   Epoch: 17   Global Step: 744030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:21:43,397-Speed 2636.04 samples/sec   Loss 1.5649   LearningRate 0.0011   Epoch: 17   Global Step: 744040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:47,293-Speed 2628.98 samples/sec   Loss 1.6334   LearningRate 0.0011   Epoch: 17   Global Step: 744050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:51,186-Speed 2630.73 samples/sec   Loss 1.6276   LearningRate 0.0011   Epoch: 17   Global Step: 744060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:55,081-Speed 2629.45 samples/sec   Loss 1.5212   LearningRate 0.0011   Epoch: 17   Global Step: 744070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:21:59,024-Speed 2598.11 samples/sec   Loss 1.6250   LearningRate 0.0011   Epoch: 17   Global Step: 744080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:22:02,901-Speed 2642.00 samples/sec   Loss 1.6623   LearningRate 0.0011   Epoch: 17   Global Step: 744090   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:06,833-Speed 2604.89 samples/sec   Loss 1.6026   LearningRate 0.0011   Epoch: 17   Global Step: 744100   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:10,723-Speed 2632.87 samples/sec   Loss 1.5594   LearningRate 0.0011   Epoch: 17   Global Step: 744110   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:14,619-Speed 2629.46 samples/sec   Loss 1.6023   LearningRate 0.0011   Epoch: 17   Global Step: 744120   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:18,514-Speed 2630.04 samples/sec   Loss 1.6633   LearningRate 0.0011   Epoch: 17   Global Step: 744130   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:22,412-Speed 2627.35 samples/sec   Loss 1.5800   LearningRate 0.0011   Epoch: 17   Global Step: 744140   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:26,309-Speed 2628.05 samples/sec   Loss 1.6457   LearningRate 0.0011   Epoch: 17   Global Step: 744150   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:30,209-Speed 2626.46 samples/sec   Loss 1.5835   LearningRate 0.0011   Epoch: 17   Global Step: 744160   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:34,119-Speed 2620.33 samples/sec   Loss 1.6550   LearningRate 0.0011   Epoch: 17   Global Step: 744170   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:38,017-Speed 2627.92 samples/sec   Loss 1.6430   LearningRate 0.0011   Epoch: 17   Global Step: 744180   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:41,985-Speed 2581.05 samples/sec   Loss 1.5793   LearningRate 0.0011   Epoch: 17   Global Step: 744190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:22:45,897-Speed 2618.41 samples/sec   Loss 1.6400   LearningRate 0.0011   Epoch: 17   Global Step: 744200   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:49,800-Speed 2624.08 samples/sec   Loss 1.5633   LearningRate 0.0011   Epoch: 17   Global Step: 744210   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:53,867-Speed 2518.88 samples/sec   Loss 1.5889   LearningRate 0.0011   Epoch: 17   Global Step: 744220   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:22:57,834-Speed 2581.53 samples/sec   Loss 1.6262   LearningRate 0.0011   Epoch: 17   Global Step: 744230   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:01,815-Speed 2573.18 samples/sec   Loss 1.6040   LearningRate 0.0011   Epoch: 17   Global Step: 744240   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:05,914-Speed 2498.97 samples/sec   Loss 1.6333   LearningRate 0.0011   Epoch: 17   Global Step: 744250   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:09,840-Speed 2608.85 samples/sec   Loss 1.6024   LearningRate 0.0011   Epoch: 17   Global Step: 744260   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:13,733-Speed 2631.31 samples/sec   Loss 1.6196   LearningRate 0.0011   Epoch: 17   Global Step: 744270   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:17,631-Speed 2627.51 samples/sec   Loss 1.6225   LearningRate 0.0011   Epoch: 17   Global Step: 744280   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:21,549-Speed 2614.71 samples/sec   Loss 1.6179   LearningRate 0.0011   Epoch: 17   Global Step: 744290   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:25,518-Speed 2580.62 samples/sec   Loss 1.6478   LearningRate 0.0011   Epoch: 17   Global Step: 744300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:23:29,422-Speed 2623.49 samples/sec   Loss 1.5936   LearningRate 0.0011   Epoch: 17   Global Step: 744310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:23:33,324-Speed 2624.87 samples/sec   Loss 1.6103   LearningRate 0.0011   Epoch: 17   Global Step: 744320   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:23:37,223-Speed 2627.40 samples/sec   Loss 1.5887   LearningRate 0.0011   Epoch: 17   Global Step: 744330   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:23:41,118-Speed 2629.35 samples/sec   Loss 1.5860   LearningRate 0.0011   Epoch: 17   Global Step: 744340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:23:45,001-Speed 2637.47 samples/sec   Loss 1.6046   LearningRate 0.0011   Epoch: 17   Global Step: 744350   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:48,894-Speed 2630.89 samples/sec   Loss 1.6177   LearningRate 0.0011   Epoch: 17   Global Step: 744360   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:52,784-Speed 2633.11 samples/sec   Loss 1.6043   LearningRate 0.0011   Epoch: 17   Global Step: 744370   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:23:56,678-Speed 2630.81 samples/sec   Loss 1.5958   LearningRate 0.0011   Epoch: 17   Global Step: 744380   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:24:00,579-Speed 2625.24 samples/sec   Loss 1.6211   LearningRate 0.0011   Epoch: 17   Global Step: 744390   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:24:04,472-Speed 2631.35 samples/sec   Loss 1.6000   LearningRate 0.0011   Epoch: 17   Global Step: 744400   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:24:08,381-Speed 2619.53 samples/sec   Loss 1.5864   LearningRate 0.0011   Epoch: 17   Global Step: 744410   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:24:12,438-Speed 2524.94 samples/sec   Loss 1.5689   LearningRate 0.0011   Epoch: 17   Global Step: 744420   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:24:16,335-Speed 2627.91 samples/sec   Loss 1.5423   LearningRate 0.0011   Epoch: 17   Global Step: 744430   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:24:20,232-Speed 2629.35 samples/sec   Loss 1.5942   LearningRate 0.0011   Epoch: 17   Global Step: 744440   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-16 07:24:24,137-Speed 2622.83 samples/sec   Loss 1.5945   LearningRate 0.0011   Epoch: 17   Global Step: 744450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:28,027-Speed 2633.05 samples/sec   Loss 1.6033   LearningRate 0.0011   Epoch: 17   Global Step: 744460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:31,920-Speed 2630.94 samples/sec   Loss 1.5912   LearningRate 0.0011   Epoch: 17   Global Step: 744470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:35,810-Speed 2632.69 samples/sec   Loss 1.5803   LearningRate 0.0011   Epoch: 17   Global Step: 744480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:39,703-Speed 2630.96 samples/sec   Loss 1.6242   LearningRate 0.0011   Epoch: 17   Global Step: 744490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:43,597-Speed 2630.64 samples/sec   Loss 1.6531   LearningRate 0.0011   Epoch: 17   Global Step: 744500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:47,489-Speed 2631.05 samples/sec   Loss 1.6833   LearningRate 0.0011   Epoch: 17   Global Step: 744510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:51,382-Speed 2632.07 samples/sec   Loss 1.6416   LearningRate 0.0011   Epoch: 17   Global Step: 744520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:55,278-Speed 2628.91 samples/sec   Loss 1.5992   LearningRate 0.0011   Epoch: 17   Global Step: 744530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:24:59,185-Speed 2621.83 samples/sec   Loss 1.6596   LearningRate 0.0011   Epoch: 17   Global Step: 744540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:03,063-Speed 2640.90 samples/sec   Loss 1.5956   LearningRate 0.0011   Epoch: 17   Global Step: 744550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:06,961-Speed 2627.26 samples/sec   Loss 1.5739   LearningRate 0.0011   Epoch: 17   Global Step: 744560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:10,853-Speed 2631.33 samples/sec   Loss 1.5805   LearningRate 0.0011   Epoch: 17   Global Step: 744570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:14,751-Speed 2627.74 samples/sec   Loss 1.6212   LearningRate 0.0010   Epoch: 17   Global Step: 744580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:18,689-Speed 2601.59 samples/sec   Loss 1.5884   LearningRate 0.0010   Epoch: 17   Global Step: 744590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:22,589-Speed 2626.22 samples/sec   Loss 1.6359   LearningRate 0.0010   Epoch: 17   Global Step: 744600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:26,490-Speed 2626.14 samples/sec   Loss 1.5643   LearningRate 0.0010   Epoch: 17   Global Step: 744610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:30,382-Speed 2631.55 samples/sec   Loss 1.5723   LearningRate 0.0010   Epoch: 17   Global Step: 744620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:34,274-Speed 2631.91 samples/sec   Loss 1.6803   LearningRate 0.0010   Epoch: 17   Global Step: 744630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:38,170-Speed 2628.39 samples/sec   Loss 1.5923   LearningRate 0.0010   Epoch: 17   Global Step: 744640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:25:42,070-Speed 2626.36 samples/sec   Loss 1.5449   LearningRate 0.0010   Epoch: 17   Global Step: 744650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:25:45,963-Speed 2631.13 samples/sec   Loss 1.6664   LearningRate 0.0010   Epoch: 17   Global Step: 744660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:25:49,857-Speed 2631.00 samples/sec   Loss 1.6352   LearningRate 0.0010   Epoch: 17   Global Step: 744670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:25:53,749-Speed 2631.27 samples/sec   Loss 1.5814   LearningRate 0.0010   Epoch: 17   Global Step: 744680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:25:57,684-Speed 2603.07 samples/sec   Loss 1.5835   LearningRate 0.0010   Epoch: 17   Global Step: 744690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:01,671-Speed 2569.16 samples/sec   Loss 1.5961   LearningRate 0.0010   Epoch: 17   Global Step: 744700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:05,568-Speed 2628.77 samples/sec   Loss 1.5802   LearningRate 0.0010   Epoch: 17   Global Step: 744710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:09,468-Speed 2625.72 samples/sec   Loss 1.6004   LearningRate 0.0010   Epoch: 17   Global Step: 744720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:13,365-Speed 2628.69 samples/sec   Loss 1.5677   LearningRate 0.0010   Epoch: 17   Global Step: 744730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:17,266-Speed 2625.76 samples/sec   Loss 1.6308   LearningRate 0.0010   Epoch: 17   Global Step: 744740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:21,162-Speed 2629.27 samples/sec   Loss 1.5649   LearningRate 0.0010   Epoch: 17   Global Step: 744750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-16 07:26:25,031-Speed 2647.72 samples/sec   Loss 1.5814   LearningRate 0.0010   Epoch: 17   Global Step: 744760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:28,926-Speed 2629.96 samples/sec   Loss 1.5742   LearningRate 0.0010   Epoch: 17   Global Step: 744770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:32,832-Speed 2622.15 samples/sec   Loss 1.5969   LearningRate 0.0010   Epoch: 17   Global Step: 744780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:36,732-Speed 2626.11 samples/sec   Loss 1.6141   LearningRate 0.0010   Epoch: 17   Global Step: 744790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:40,623-Speed 2631.95 samples/sec   Loss 1.5851   LearningRate 0.0010   Epoch: 17   Global Step: 744800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:44,524-Speed 2626.27 samples/sec   Loss 1.5342   LearningRate 0.0010   Epoch: 17   Global Step: 744810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:48,425-Speed 2626.28 samples/sec   Loss 1.6544   LearningRate 0.0010   Epoch: 17   Global Step: 744820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:52,318-Speed 2630.38 samples/sec   Loss 1.5170   LearningRate 0.0010   Epoch: 17   Global Step: 744830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:26:56,247-Speed 2607.52 samples/sec   Loss 1.6489   LearningRate 0.0010   Epoch: 17   Global Step: 744840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:00,142-Speed 2629.58 samples/sec   Loss 1.6192   LearningRate 0.0010   Epoch: 17   Global Step: 744850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:04,035-Speed 2631.20 samples/sec   Loss 1.5905   LearningRate 0.0010   Epoch: 17   Global Step: 744860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:07,937-Speed 2624.33 samples/sec   Loss 1.5725   LearningRate 0.0010   Epoch: 17   Global Step: 744870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:11,834-Speed 2628.85 samples/sec   Loss 1.5537   LearningRate 0.0010   Epoch: 17   Global Step: 744880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:15,729-Speed 2630.07 samples/sec   Loss 1.5950   LearningRate 0.0010   Epoch: 17   Global Step: 744890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:19,633-Speed 2623.72 samples/sec   Loss 1.5318   LearningRate 0.0010   Epoch: 17   Global Step: 744900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:23,526-Speed 2630.37 samples/sec   Loss 1.6156   LearningRate 0.0010   Epoch: 17   Global Step: 744910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:27,442-Speed 2615.81 samples/sec   Loss 1.6345   LearningRate 0.0010   Epoch: 17   Global Step: 744920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:31,334-Speed 2632.20 samples/sec   Loss 1.5913   LearningRate 0.0010   Epoch: 17   Global Step: 744930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-16 07:27:35,198-Speed 2650.72 samples/sec   Loss 1.5973   LearningRate 0.0010   Epoch: 17   Global Step: 744940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:27:39,100-Speed 2624.27 samples/sec   Loss 1.5741   LearningRate 0.0010   Epoch: 17   Global Step: 744950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:27:43,042-Speed 2598.96 samples/sec   Loss 1.5540   LearningRate 0.0010   Epoch: 17   Global Step: 744960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:27:46,937-Speed 2629.25 samples/sec   Loss 1.5871   LearningRate 0.0010   Epoch: 17   Global Step: 744970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:27:50,833-Speed 2629.49 samples/sec   Loss 1.5928   LearningRate 0.0010   Epoch: 17   Global Step: 744980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:27:54,736-Speed 2623.85 samples/sec   Loss 1.5841   LearningRate 0.0010   Epoch: 17   Global Step: 744990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:27:58,635-Speed 2628.30 samples/sec   Loss 1.5650   LearningRate 0.0010   Epoch: 17   Global Step: 745000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:28:02,532-Speed 2628.56 samples/sec   Loss 1.6161   LearningRate 0.0010   Epoch: 17   Global Step: 745010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:28:06,434-Speed 2624.31 samples/sec   Loss 1.5573   LearningRate 0.0010   Epoch: 17   Global Step: 745020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:28:10,333-Speed 2627.03 samples/sec   Loss 1.6348   LearningRate 0.0010   Epoch: 17   Global Step: 745030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:28:14,206-Speed 2644.48 samples/sec   Loss 1.6216   LearningRate 0.0010   Epoch: 17   Global Step: 745040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:28:18,107-Speed 2626.16 samples/sec   Loss 1.6465   LearningRate 0.0010   Epoch: 17   Global Step: 745050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-16 07:28:22,032-Speed 2609.65 samples/sec   Loss 1.5912   LearningRate 0.0010   Epoch: 17   Global Step: 745060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:28:25,937-Speed 2622.67 samples/sec   Loss 1.5800   LearningRate 0.0010   Epoch: 17   Global Step: 745070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:28:29,837-Speed 2626.18 samples/sec   Loss 1.5874   LearningRate 0.0010   Epoch: 17   Global Step: 745080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:28:33,750-Speed 2617.95 samples/sec   Loss 1.6216   LearningRate 0.0010   Epoch: 17   Global Step: 745090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:28:37,661-Speed 2618.59 samples/sec   Loss 1.6177   LearningRate 0.0010   Epoch: 17   Global Step: 745100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:28:41,570-Speed 2620.41 samples/sec   Loss 1.6204   LearningRate 0.0010   Epoch: 17   Global Step: 745110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:28:45,447-Speed 2642.10 samples/sec   Loss 1.5841   LearningRate 0.0010   Epoch: 17   Global Step: 745120   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:28:49,352-Speed 2622.64 samples/sec   Loss 1.5378   LearningRate 0.0010   Epoch: 17   Global Step: 745130   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:28:53,251-Speed 2627.69 samples/sec   Loss 1.5381   LearningRate 0.0010   Epoch: 17   Global Step: 745140   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:28:57,147-Speed 2628.81 samples/sec   Loss 1.6311   LearningRate 0.0010   Epoch: 17   Global Step: 745150   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:29:01,044-Speed 2628.05 samples/sec   Loss 1.5575   LearningRate 0.0010   Epoch: 17   Global Step: 745160   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:29:04,938-Speed 2630.33 samples/sec   Loss 1.5534   LearningRate 0.0010   Epoch: 17   Global Step: 745170   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:29:08,831-Speed 2631.43 samples/sec   Loss 1.5966   LearningRate 0.0010   Epoch: 17   Global Step: 745180   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:29:12,731-Speed 2626.18 samples/sec   Loss 1.6319   LearningRate 0.0010   Epoch: 17   Global Step: 745190   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:29:16,628-Speed 2628.18 samples/sec   Loss 1.5790   LearningRate 0.0010   Epoch: 17   Global Step: 745200   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:29:20,528-Speed 2625.78 samples/sec   Loss 1.5704   LearningRate 0.0010   Epoch: 17   Global Step: 745210   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:29:24,419-Speed 2632.32 samples/sec   Loss 1.5814   LearningRate 0.0010   Epoch: 17   Global Step: 745220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:28,334-Speed 2616.80 samples/sec   Loss 1.5687   LearningRate 0.0010   Epoch: 17   Global Step: 745230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:32,230-Speed 2629.07 samples/sec   Loss 1.6502   LearningRate 0.0010   Epoch: 17   Global Step: 745240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:36,125-Speed 2629.36 samples/sec   Loss 1.6364   LearningRate 0.0010   Epoch: 17   Global Step: 745250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:40,022-Speed 2628.11 samples/sec   Loss 1.5440   LearningRate 0.0010   Epoch: 17   Global Step: 745260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:43,915-Speed 2631.32 samples/sec   Loss 1.5685   LearningRate 0.0010   Epoch: 17   Global Step: 745270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:47,814-Speed 2626.58 samples/sec   Loss 1.6371   LearningRate 0.0010   Epoch: 17   Global Step: 745280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:51,715-Speed 2626.27 samples/sec   Loss 1.6084   LearningRate 0.0010   Epoch: 17   Global Step: 745290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:55,616-Speed 2625.03 samples/sec   Loss 1.6067   LearningRate 0.0010   Epoch: 17   Global Step: 745300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:29:59,515-Speed 2627.44 samples/sec   Loss 1.5719   LearningRate 0.0010   Epoch: 17   Global Step: 745310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:03,407-Speed 2632.03 samples/sec   Loss 1.6211   LearningRate 0.0010   Epoch: 17   Global Step: 745320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:30:07,314-Speed 2621.41 samples/sec   Loss 1.6159   LearningRate 0.0010   Epoch: 17   Global Step: 745330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:30:11,205-Speed 2631.99 samples/sec   Loss 1.5883   LearningRate 0.0010   Epoch: 17   Global Step: 745340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:30:15,100-Speed 2630.41 samples/sec   Loss 1.5653   LearningRate 0.0010   Epoch: 17   Global Step: 745350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:30:18,995-Speed 2630.07 samples/sec   Loss 1.6108   LearningRate 0.0010   Epoch: 17   Global Step: 745360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:30:22,896-Speed 2625.19 samples/sec   Loss 1.5943   LearningRate 0.0010   Epoch: 17   Global Step: 745370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:30:26,773-Speed 2641.85 samples/sec   Loss 1.5979   LearningRate 0.0010   Epoch: 17   Global Step: 745380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:30,740-Speed 2582.20 samples/sec   Loss 1.6080   LearningRate 0.0010   Epoch: 17   Global Step: 745390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:34,640-Speed 2626.66 samples/sec   Loss 1.6402   LearningRate 0.0010   Epoch: 17   Global Step: 745400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:38,537-Speed 2627.66 samples/sec   Loss 1.5466   LearningRate 0.0010   Epoch: 17   Global Step: 745410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:42,432-Speed 2630.27 samples/sec   Loss 1.5972   LearningRate 0.0010   Epoch: 17   Global Step: 745420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:46,328-Speed 2628.48 samples/sec   Loss 1.6177   LearningRate 0.0010   Epoch: 17   Global Step: 745430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:50,291-Speed 2585.25 samples/sec   Loss 1.6593   LearningRate 0.0010   Epoch: 17   Global Step: 745440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:54,214-Speed 2610.69 samples/sec   Loss 1.6039   LearningRate 0.0010   Epoch: 17   Global Step: 745450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:30:58,115-Speed 2626.16 samples/sec   Loss 1.6289   LearningRate 0.0010   Epoch: 17   Global Step: 745460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:31:02,008-Speed 2630.50 samples/sec   Loss 1.5600   LearningRate 0.0010   Epoch: 17   Global Step: 745470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:31:05,912-Speed 2623.76 samples/sec   Loss 1.6430   LearningRate 0.0010   Epoch: 17   Global Step: 745480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:31:09,821-Speed 2620.10 samples/sec   Loss 1.5891   LearningRate 0.0010   Epoch: 17   Global Step: 745490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:31:13,756-Speed 2602.71 samples/sec   Loss 1.5648   LearningRate 0.0010   Epoch: 17   Global Step: 745500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:31:17,614-Speed 2654.71 samples/sec   Loss 1.6176   LearningRate 0.0010   Epoch: 17   Global Step: 745510   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:21,514-Speed 2626.46 samples/sec   Loss 1.6396   LearningRate 0.0010   Epoch: 17   Global Step: 745520   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:25,432-Speed 2614.38 samples/sec   Loss 1.6050   LearningRate 0.0010   Epoch: 17   Global Step: 745530   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:29,409-Speed 2575.28 samples/sec   Loss 1.5972   LearningRate 0.0010   Epoch: 17   Global Step: 745540   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:33,305-Speed 2629.24 samples/sec   Loss 1.6287   LearningRate 0.0010   Epoch: 17   Global Step: 745550   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:37,197-Speed 2631.85 samples/sec   Loss 1.6141   LearningRate 0.0010   Epoch: 17   Global Step: 745560   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:41,149-Speed 2591.14 samples/sec   Loss 1.5616   LearningRate 0.0010   Epoch: 17   Global Step: 745570   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:45,043-Speed 2630.93 samples/sec   Loss 1.5579   LearningRate 0.0010   Epoch: 17   Global Step: 745580   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:48,955-Speed 2617.87 samples/sec   Loss 1.6482   LearningRate 0.0010   Epoch: 17   Global Step: 745590   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:52,860-Speed 2622.97 samples/sec   Loss 1.5537   LearningRate 0.0010   Epoch: 17   Global Step: 745600   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:31:56,759-Speed 2627.23 samples/sec   Loss 1.5754   LearningRate 0.0010   Epoch: 17   Global Step: 745610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:32:00,654-Speed 2629.45 samples/sec   Loss 1.5750   LearningRate 0.0010   Epoch: 17   Global Step: 745620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:32:04,542-Speed 2633.77 samples/sec   Loss 1.5869   LearningRate 0.0010   Epoch: 17   Global Step: 745630   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:08,445-Speed 2624.70 samples/sec   Loss 1.5610   LearningRate 0.0010   Epoch: 17   Global Step: 745640   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:12,344-Speed 2627.09 samples/sec   Loss 1.5952   LearningRate 0.0010   Epoch: 17   Global Step: 745650   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:16,236-Speed 2631.62 samples/sec   Loss 1.6190   LearningRate 0.0010   Epoch: 17   Global Step: 745660   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:20,130-Speed 2630.62 samples/sec   Loss 1.5562   LearningRate 0.0010   Epoch: 17   Global Step: 745670   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:24,024-Speed 2630.91 samples/sec   Loss 1.5365   LearningRate 0.0010   Epoch: 17   Global Step: 745680   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:27,935-Speed 2618.42 samples/sec   Loss 1.6103   LearningRate 0.0010   Epoch: 17   Global Step: 745690   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:31,851-Speed 2616.20 samples/sec   Loss 1.5552   LearningRate 0.0010   Epoch: 17   Global Step: 745700   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:35,745-Speed 2629.69 samples/sec   Loss 1.6357   LearningRate 0.0010   Epoch: 17   Global Step: 745710   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:39,642-Speed 2628.23 samples/sec   Loss 1.5907   LearningRate 0.0010   Epoch: 17   Global Step: 745720   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:32:43,537-Speed 2629.45 samples/sec   Loss 1.6055   LearningRate 0.0010   Epoch: 17   Global Step: 745730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:32:47,428-Speed 2632.93 samples/sec   Loss 1.5918   LearningRate 0.0010   Epoch: 17   Global Step: 745740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:32:51,332-Speed 2623.81 samples/sec   Loss 1.5931   LearningRate 0.0010   Epoch: 17   Global Step: 745750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:32:55,273-Speed 2599.95 samples/sec   Loss 1.5789   LearningRate 0.0010   Epoch: 17   Global Step: 745760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:32:59,200-Speed 2608.72 samples/sec   Loss 1.6465   LearningRate 0.0010   Epoch: 17   Global Step: 745770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:03,118-Speed 2614.48 samples/sec   Loss 1.6060   LearningRate 0.0010   Epoch: 17   Global Step: 745780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:07,009-Speed 2632.64 samples/sec   Loss 1.5634   LearningRate 0.0010   Epoch: 17   Global Step: 745790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:10,906-Speed 2627.92 samples/sec   Loss 1.5769   LearningRate 0.0010   Epoch: 17   Global Step: 745800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:14,830-Speed 2610.35 samples/sec   Loss 1.5712   LearningRate 0.0010   Epoch: 17   Global Step: 745810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:18,750-Speed 2613.05 samples/sec   Loss 1.5633   LearningRate 0.0010   Epoch: 17   Global Step: 745820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:22,634-Speed 2637.10 samples/sec   Loss 1.6025   LearningRate 0.0010   Epoch: 17   Global Step: 745830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:26,525-Speed 2632.87 samples/sec   Loss 1.6582   LearningRate 0.0010   Epoch: 17   Global Step: 745840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:30,424-Speed 2627.19 samples/sec   Loss 1.6160   LearningRate 0.0010   Epoch: 17   Global Step: 745850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:34,340-Speed 2615.56 samples/sec   Loss 1.5878   LearningRate 0.0010   Epoch: 17   Global Step: 745860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:38,433-Speed 2502.47 samples/sec   Loss 1.6350   LearningRate 0.0010   Epoch: 17   Global Step: 745870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:42,413-Speed 2572.74 samples/sec   Loss 1.5597   LearningRate 0.0010   Epoch: 17   Global Step: 745880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:46,327-Speed 2616.66 samples/sec   Loss 1.6040   LearningRate 0.0010   Epoch: 17   Global Step: 745890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:50,233-Speed 2623.23 samples/sec   Loss 1.6275   LearningRate 0.0010   Epoch: 17   Global Step: 745900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:54,157-Speed 2610.80 samples/sec   Loss 1.5967   LearningRate 0.0010   Epoch: 17   Global Step: 745910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:33:58,051-Speed 2629.97 samples/sec   Loss 1.5941   LearningRate 0.0010   Epoch: 17   Global Step: 745920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:01,985-Speed 2603.38 samples/sec   Loss 1.5158   LearningRate 0.0010   Epoch: 17   Global Step: 745930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:34:05,860-Speed 2643.66 samples/sec   Loss 1.6142   LearningRate 0.0010   Epoch: 17   Global Step: 745940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:09,755-Speed 2629.69 samples/sec   Loss 1.6050   LearningRate 0.0010   Epoch: 17   Global Step: 745950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:13,849-Speed 2501.75 samples/sec   Loss 1.5855   LearningRate 0.0010   Epoch: 17   Global Step: 745960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:17,784-Speed 2602.56 samples/sec   Loss 1.5895   LearningRate 0.0010   Epoch: 17   Global Step: 745970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:21,691-Speed 2621.27 samples/sec   Loss 1.5444   LearningRate 0.0010   Epoch: 17   Global Step: 745980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:25,586-Speed 2630.42 samples/sec   Loss 1.6230   LearningRate 0.0010   Epoch: 17   Global Step: 745990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:29,477-Speed 2632.85 samples/sec   Loss 1.6012   LearningRate 0.0010   Epoch: 17   Global Step: 746000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:33,367-Speed 2632.58 samples/sec   Loss 1.5522   LearningRate 0.0010   Epoch: 17   Global Step: 746010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:37,270-Speed 2624.08 samples/sec   Loss 1.6413   LearningRate 0.0010   Epoch: 17   Global Step: 746020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:41,161-Speed 2632.52 samples/sec   Loss 1.5956   LearningRate 0.0010   Epoch: 17   Global Step: 746030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:34:45,086-Speed 2609.50 samples/sec   Loss 1.5541   LearningRate 0.0010   Epoch: 17   Global Step: 746040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:34:48,981-Speed 2629.35 samples/sec   Loss 1.5980   LearningRate 0.0010   Epoch: 17   Global Step: 746050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:34:52,907-Speed 2609.43 samples/sec   Loss 1.5914   LearningRate 0.0010   Epoch: 17   Global Step: 746060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:34:56,798-Speed 2632.16 samples/sec   Loss 1.5428   LearningRate 0.0010   Epoch: 17   Global Step: 746070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:35:00,695-Speed 2628.65 samples/sec   Loss 1.6423   LearningRate 0.0010   Epoch: 17   Global Step: 746080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:35:04,598-Speed 2623.62 samples/sec   Loss 1.5914   LearningRate 0.0010   Epoch: 17   Global Step: 746090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:35:08,473-Speed 2644.04 samples/sec   Loss 1.5981   LearningRate 0.0010   Epoch: 17   Global Step: 746100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:35:12,371-Speed 2627.17 samples/sec   Loss 1.5190   LearningRate 0.0010   Epoch: 17   Global Step: 746110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:35:16,272-Speed 2625.54 samples/sec   Loss 1.6134   LearningRate 0.0010   Epoch: 17   Global Step: 746120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:35:20,193-Speed 2612.02 samples/sec   Loss 1.5351   LearningRate 0.0010   Epoch: 17   Global Step: 746130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:35:24,304-Speed 2492.08 samples/sec   Loss 1.5496   LearningRate 0.0010   Epoch: 17   Global Step: 746140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:35:28,220-Speed 2616.02 samples/sec   Loss 1.6175   LearningRate 0.0010   Epoch: 17   Global Step: 746150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:35:32,091-Speed 2645.69 samples/sec   Loss 1.6096   LearningRate 0.0010   Epoch: 17   Global Step: 746160   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:35:35,979-Speed 2634.42 samples/sec   Loss 1.5544   LearningRate 0.0010   Epoch: 17   Global Step: 746170   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:35:39,888-Speed 2620.22 samples/sec   Loss 1.5803   LearningRate 0.0010   Epoch: 17   Global Step: 746180   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:35:43,794-Speed 2622.82 samples/sec   Loss 1.6176   LearningRate 0.0010   Epoch: 17   Global Step: 746190   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:35:47,691-Speed 2628.28 samples/sec   Loss 1.6448   LearningRate 0.0010   Epoch: 17   Global Step: 746200   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:35:51,594-Speed 2624.10 samples/sec   Loss 1.5355   LearningRate 0.0010   Epoch: 17   Global Step: 746210   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:35:55,494-Speed 2626.21 samples/sec   Loss 1.6091   LearningRate 0.0010   Epoch: 17   Global Step: 746220   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:35:59,390-Speed 2629.48 samples/sec   Loss 1.5992   LearningRate 0.0010   Epoch: 17   Global Step: 746230   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:36:03,287-Speed 2628.16 samples/sec   Loss 1.5600   LearningRate 0.0010   Epoch: 17   Global Step: 746240   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:36:07,201-Speed 2616.72 samples/sec   Loss 1.5496   LearningRate 0.0010   Epoch: 17   Global Step: 746250   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:36:11,103-Speed 2625.51 samples/sec   Loss 1.6400   LearningRate 0.0010   Epoch: 17   Global Step: 746260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:14,997-Speed 2630.85 samples/sec   Loss 1.6197   LearningRate 0.0010   Epoch: 17   Global Step: 746270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:18,891-Speed 2629.92 samples/sec   Loss 1.6484   LearningRate 0.0010   Epoch: 17   Global Step: 746280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:22,803-Speed 2618.59 samples/sec   Loss 1.5748   LearningRate 0.0010   Epoch: 17   Global Step: 746290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:26,702-Speed 2626.86 samples/sec   Loss 1.5314   LearningRate 0.0010   Epoch: 17   Global Step: 746300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:30,592-Speed 2633.51 samples/sec   Loss 1.5666   LearningRate 0.0010   Epoch: 17   Global Step: 746310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:34,483-Speed 2632.03 samples/sec   Loss 1.5769   LearningRate 0.0010   Epoch: 17   Global Step: 746320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:38,378-Speed 2629.56 samples/sec   Loss 1.6133   LearningRate 0.0010   Epoch: 17   Global Step: 746330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:42,270-Speed 2631.54 samples/sec   Loss 1.6161   LearningRate 0.0010   Epoch: 17   Global Step: 746340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:46,161-Speed 2632.76 samples/sec   Loss 1.6102   LearningRate 0.0010   Epoch: 17   Global Step: 746350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:50,079-Speed 2614.58 samples/sec   Loss 1.6357   LearningRate 0.0010   Epoch: 17   Global Step: 746360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:36:53,947-Speed 2647.70 samples/sec   Loss 1.6150   LearningRate 0.0010   Epoch: 17   Global Step: 746370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:36:57,838-Speed 2632.74 samples/sec   Loss 1.6017   LearningRate 0.0010   Epoch: 17   Global Step: 746380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:01,732-Speed 2630.30 samples/sec   Loss 1.5675   LearningRate 0.0010   Epoch: 17   Global Step: 746390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:05,653-Speed 2612.56 samples/sec   Loss 1.5547   LearningRate 0.0010   Epoch: 17   Global Step: 746400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:09,548-Speed 2629.49 samples/sec   Loss 1.5786   LearningRate 0.0010   Epoch: 17   Global Step: 746410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:13,443-Speed 2629.95 samples/sec   Loss 1.5641   LearningRate 0.0010   Epoch: 17   Global Step: 746420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:17,339-Speed 2629.26 samples/sec   Loss 1.5794   LearningRate 0.0010   Epoch: 17   Global Step: 746430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:21,234-Speed 2630.00 samples/sec   Loss 1.6124   LearningRate 0.0010   Epoch: 17   Global Step: 746440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:25,128-Speed 2629.67 samples/sec   Loss 1.6357   LearningRate 0.0010   Epoch: 17   Global Step: 746450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:29,024-Speed 2630.00 samples/sec   Loss 1.6077   LearningRate 0.0010   Epoch: 17   Global Step: 746460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:37:32,916-Speed 2631.51 samples/sec   Loss 1.6375   LearningRate 0.0010   Epoch: 17   Global Step: 746470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:37:36,815-Speed 2626.74 samples/sec   Loss 1.5643   LearningRate 0.0010   Epoch: 17   Global Step: 746480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:37:40,715-Speed 2626.19 samples/sec   Loss 1.5952   LearningRate 0.0010   Epoch: 17   Global Step: 746490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:37:44,687-Speed 2578.91 samples/sec   Loss 1.6138   LearningRate 0.0010   Epoch: 17   Global Step: 746500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:37:48,586-Speed 2626.91 samples/sec   Loss 1.5869   LearningRate 0.0010   Epoch: 17   Global Step: 746510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:37:52,482-Speed 2630.40 samples/sec   Loss 1.6153   LearningRate 0.0010   Epoch: 17   Global Step: 746520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:37:56,349-Speed 2648.60 samples/sec   Loss 1.5707   LearningRate 0.0010   Epoch: 17   Global Step: 746530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:00,246-Speed 2628.84 samples/sec   Loss 1.5968   LearningRate 0.0010   Epoch: 17   Global Step: 746540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:04,149-Speed 2624.33 samples/sec   Loss 1.5742   LearningRate 0.0010   Epoch: 17   Global Step: 746550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:08,054-Speed 2622.40 samples/sec   Loss 1.6042   LearningRate 0.0010   Epoch: 17   Global Step: 746560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:11,945-Speed 2632.30 samples/sec   Loss 1.6023   LearningRate 0.0010   Epoch: 17   Global Step: 746570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:15,875-Speed 2606.56 samples/sec   Loss 1.5574   LearningRate 0.0010   Epoch: 17   Global Step: 746580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:19,925-Speed 2529.30 samples/sec   Loss 1.4776   LearningRate 0.0010   Epoch: 17   Global Step: 746590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:23,820-Speed 2629.05 samples/sec   Loss 1.5996   LearningRate 0.0010   Epoch: 17   Global Step: 746600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:27,717-Speed 2629.37 samples/sec   Loss 1.5753   LearningRate 0.0010   Epoch: 17   Global Step: 746610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:31,611-Speed 2629.90 samples/sec   Loss 1.5866   LearningRate 0.0010   Epoch: 17   Global Step: 746620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:38:53,041-Speed 477.87 samples/sec   Loss 1.5517   LearningRate 0.0010   Epoch: 18   Global Step: 746630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:38:56,962-Speed 2612.75 samples/sec   Loss 1.5924   LearningRate 0.0010   Epoch: 18   Global Step: 746640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:39:00,818-Speed 2656.13 samples/sec   Loss 1.6159   LearningRate 0.0010   Epoch: 18   Global Step: 746650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:04,721-Speed 2624.51 samples/sec   Loss 1.6065   LearningRate 0.0010   Epoch: 18   Global Step: 746660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:08,619-Speed 2628.18 samples/sec   Loss 1.5926   LearningRate 0.0010   Epoch: 18   Global Step: 746670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:12,519-Speed 2625.79 samples/sec   Loss 1.6202   LearningRate 0.0010   Epoch: 18   Global Step: 746680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:16,406-Speed 2635.14 samples/sec   Loss 1.6160   LearningRate 0.0010   Epoch: 18   Global Step: 746690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:20,305-Speed 2627.33 samples/sec   Loss 1.5633   LearningRate 0.0010   Epoch: 18   Global Step: 746700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:24,223-Speed 2614.33 samples/sec   Loss 1.5663   LearningRate 0.0010   Epoch: 18   Global Step: 746710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:28,121-Speed 2627.48 samples/sec   Loss 1.5817   LearningRate 0.0010   Epoch: 18   Global Step: 746720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:32,013-Speed 2632.10 samples/sec   Loss 1.6007   LearningRate 0.0010   Epoch: 18   Global Step: 746730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:35,906-Speed 2631.14 samples/sec   Loss 1.5939   LearningRate 0.0010   Epoch: 18   Global Step: 746740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:39:39,800-Speed 2630.35 samples/sec   Loss 1.5978   LearningRate 0.0010   Epoch: 18   Global Step: 746750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:39:43,695-Speed 2629.46 samples/sec   Loss 1.5773   LearningRate 0.0010   Epoch: 18   Global Step: 746760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:39:47,595-Speed 2625.65 samples/sec   Loss 1.5633   LearningRate 0.0010   Epoch: 18   Global Step: 746770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:39:51,502-Speed 2621.87 samples/sec   Loss 1.6087   LearningRate 0.0010   Epoch: 18   Global Step: 746780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:39:55,406-Speed 2623.30 samples/sec   Loss 1.5973   LearningRate 0.0010   Epoch: 18   Global Step: 746790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:39:59,335-Speed 2607.25 samples/sec   Loss 1.6244   LearningRate 0.0010   Epoch: 18   Global Step: 746800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:03,241-Speed 2622.26 samples/sec   Loss 1.6183   LearningRate 0.0010   Epoch: 18   Global Step: 746810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:07,140-Speed 2627.13 samples/sec   Loss 1.6354   LearningRate 0.0010   Epoch: 18   Global Step: 746820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:11,036-Speed 2628.98 samples/sec   Loss 1.5197   LearningRate 0.0010   Epoch: 18   Global Step: 746830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:14,932-Speed 2628.42 samples/sec   Loss 1.5692   LearningRate 0.0010   Epoch: 18   Global Step: 746840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:18,874-Speed 2599.24 samples/sec   Loss 1.5530   LearningRate 0.0010   Epoch: 18   Global Step: 746850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:22,768-Speed 2630.18 samples/sec   Loss 1.5364   LearningRate 0.0010   Epoch: 18   Global Step: 746860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:26,671-Speed 2623.70 samples/sec   Loss 1.5251   LearningRate 0.0010   Epoch: 18   Global Step: 746870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:30,568-Speed 2628.90 samples/sec   Loss 1.6069   LearningRate 0.0010   Epoch: 18   Global Step: 746880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:34,469-Speed 2625.71 samples/sec   Loss 1.6506   LearningRate 0.0010   Epoch: 18   Global Step: 746890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:38,400-Speed 2605.72 samples/sec   Loss 1.5797   LearningRate 0.0010   Epoch: 18   Global Step: 746900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:40:42,302-Speed 2625.04 samples/sec   Loss 1.5791   LearningRate 0.0010   Epoch: 18   Global Step: 746910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:40:46,204-Speed 2625.14 samples/sec   Loss 1.5708   LearningRate 0.0010   Epoch: 18   Global Step: 746920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:40:50,083-Speed 2640.09 samples/sec   Loss 1.5803   LearningRate 0.0010   Epoch: 18   Global Step: 746930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:53,987-Speed 2624.09 samples/sec   Loss 1.6151   LearningRate 0.0010   Epoch: 18   Global Step: 746940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:40:57,916-Speed 2607.23 samples/sec   Loss 1.5859   LearningRate 0.0010   Epoch: 18   Global Step: 746950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:01,817-Speed 2625.52 samples/sec   Loss 1.5398   LearningRate 0.0010   Epoch: 18   Global Step: 746960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:05,743-Speed 2608.63 samples/sec   Loss 1.5864   LearningRate 0.0010   Epoch: 18   Global Step: 746970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:09,670-Speed 2608.47 samples/sec   Loss 1.5808   LearningRate 0.0010   Epoch: 18   Global Step: 746980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:13,643-Speed 2577.92 samples/sec   Loss 1.5847   LearningRate 0.0010   Epoch: 18   Global Step: 746990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:17,542-Speed 2627.38 samples/sec   Loss 1.5589   LearningRate 0.0010   Epoch: 18   Global Step: 747000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:21,443-Speed 2625.78 samples/sec   Loss 1.5885   LearningRate 0.0010   Epoch: 18   Global Step: 747010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:25,452-Speed 2554.32 samples/sec   Loss 1.5635   LearningRate 0.0010   Epoch: 18   Global Step: 747020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:29,370-Speed 2614.94 samples/sec   Loss 1.5792   LearningRate 0.0010   Epoch: 18   Global Step: 747030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:41:33,249-Speed 2640.41 samples/sec   Loss 1.5724   LearningRate 0.0010   Epoch: 18   Global Step: 747040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:37,165-Speed 2615.06 samples/sec   Loss 1.5311   LearningRate 0.0010   Epoch: 18   Global Step: 747050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:41,060-Speed 2629.50 samples/sec   Loss 1.5979   LearningRate 0.0010   Epoch: 18   Global Step: 747060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:44,959-Speed 2627.12 samples/sec   Loss 1.5049   LearningRate 0.0010   Epoch: 18   Global Step: 747070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:48,855-Speed 2629.18 samples/sec   Loss 1.5535   LearningRate 0.0010   Epoch: 18   Global Step: 747080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:52,801-Speed 2595.42 samples/sec   Loss 1.5330   LearningRate 0.0010   Epoch: 18   Global Step: 747090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:41:56,713-Speed 2618.59 samples/sec   Loss 1.5505   LearningRate 0.0010   Epoch: 18   Global Step: 747100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:42:00,643-Speed 2606.38 samples/sec   Loss 1.6109   LearningRate 0.0010   Epoch: 18   Global Step: 747110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:42:04,640-Speed 2562.42 samples/sec   Loss 1.5190   LearningRate 0.0010   Epoch: 18   Global Step: 747120   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:08,543-Speed 2624.06 samples/sec   Loss 1.6156   LearningRate 0.0010   Epoch: 18   Global Step: 747130   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:12,446-Speed 2623.96 samples/sec   Loss 1.6036   LearningRate 0.0010   Epoch: 18   Global Step: 747140   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:16,345-Speed 2627.47 samples/sec   Loss 1.5426   LearningRate 0.0010   Epoch: 18   Global Step: 747150   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:20,244-Speed 2627.80 samples/sec   Loss 1.5317   LearningRate 0.0010   Epoch: 18   Global Step: 747160   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:24,155-Speed 2618.59 samples/sec   Loss 1.5892   LearningRate 0.0010   Epoch: 18   Global Step: 747170   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:28,080-Speed 2610.26 samples/sec   Loss 1.5080   LearningRate 0.0010   Epoch: 18   Global Step: 747180   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:31,979-Speed 2626.89 samples/sec   Loss 1.5545   LearningRate 0.0010   Epoch: 18   Global Step: 747190   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:35,874-Speed 2629.35 samples/sec   Loss 1.6099   LearningRate 0.0010   Epoch: 18   Global Step: 747200   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:39,768-Speed 2630.24 samples/sec   Loss 1.5267   LearningRate 0.0010   Epoch: 18   Global Step: 747210   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:42:43,666-Speed 2628.12 samples/sec   Loss 1.5884   LearningRate 0.0010   Epoch: 18   Global Step: 747220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:42:47,577-Speed 2619.03 samples/sec   Loss 1.5923   LearningRate 0.0010   Epoch: 18   Global Step: 747230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:42:51,510-Speed 2604.15 samples/sec   Loss 1.5868   LearningRate 0.0010   Epoch: 18   Global Step: 747240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:42:55,405-Speed 2629.68 samples/sec   Loss 1.5189   LearningRate 0.0010   Epoch: 18   Global Step: 747250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:42:59,304-Speed 2627.60 samples/sec   Loss 1.5918   LearningRate 0.0010   Epoch: 18   Global Step: 747260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:03,206-Speed 2624.36 samples/sec   Loss 1.6149   LearningRate 0.0010   Epoch: 18   Global Step: 747270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:07,101-Speed 2630.19 samples/sec   Loss 1.5946   LearningRate 0.0010   Epoch: 18   Global Step: 747280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:10,998-Speed 2628.01 samples/sec   Loss 1.5528   LearningRate 0.0010   Epoch: 18   Global Step: 747290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:14,905-Speed 2621.91 samples/sec   Loss 1.5329   LearningRate 0.0010   Epoch: 18   Global Step: 747300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:18,877-Speed 2578.63 samples/sec   Loss 1.5590   LearningRate 0.0010   Epoch: 18   Global Step: 747310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:22,788-Speed 2619.43 samples/sec   Loss 1.5701   LearningRate 0.0010   Epoch: 18   Global Step: 747320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:43:26,796-Speed 2555.39 samples/sec   Loss 1.5883   LearningRate 0.0010   Epoch: 18   Global Step: 747330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:43:30,676-Speed 2640.41 samples/sec   Loss 1.5912   LearningRate 0.0010   Epoch: 18   Global Step: 747340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:34,572-Speed 2628.54 samples/sec   Loss 1.5914   LearningRate 0.0010   Epoch: 18   Global Step: 747350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:38,476-Speed 2623.90 samples/sec   Loss 1.5586   LearningRate 0.0010   Epoch: 18   Global Step: 747360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:42,373-Speed 2628.09 samples/sec   Loss 1.5761   LearningRate 0.0010   Epoch: 18   Global Step: 747370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:46,276-Speed 2624.62 samples/sec   Loss 1.5280   LearningRate 0.0010   Epoch: 18   Global Step: 747380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:50,171-Speed 2630.93 samples/sec   Loss 1.6360   LearningRate 0.0010   Epoch: 18   Global Step: 747390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:54,073-Speed 2624.56 samples/sec   Loss 1.5389   LearningRate 0.0010   Epoch: 18   Global Step: 747400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:43:57,990-Speed 2615.05 samples/sec   Loss 1.5129   LearningRate 0.0010   Epoch: 18   Global Step: 747410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:01,893-Speed 2624.06 samples/sec   Loss 1.5710   LearningRate 0.0010   Epoch: 18   Global Step: 747420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:05,789-Speed 2629.13 samples/sec   Loss 1.5709   LearningRate 0.0010   Epoch: 18   Global Step: 747430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:09,684-Speed 2629.56 samples/sec   Loss 1.6097   LearningRate 0.0010   Epoch: 18   Global Step: 747440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:44:13,581-Speed 2628.74 samples/sec   Loss 1.5642   LearningRate 0.0010   Epoch: 18   Global Step: 747450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:44:17,474-Speed 2631.00 samples/sec   Loss 1.5523   LearningRate 0.0010   Epoch: 18   Global Step: 747460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:44:21,366-Speed 2631.81 samples/sec   Loss 1.5396   LearningRate 0.0010   Epoch: 18   Global Step: 747470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:44:25,263-Speed 2628.42 samples/sec   Loss 1.5259   LearningRate 0.0010   Epoch: 18   Global Step: 747480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:29,162-Speed 2627.58 samples/sec   Loss 1.5507   LearningRate 0.0010   Epoch: 18   Global Step: 747490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:33,059-Speed 2628.95 samples/sec   Loss 1.5352   LearningRate 0.0010   Epoch: 18   Global Step: 747500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:36,953-Speed 2629.60 samples/sec   Loss 1.6046   LearningRate 0.0010   Epoch: 18   Global Step: 747510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:40,849-Speed 2629.51 samples/sec   Loss 1.5451   LearningRate 0.0010   Epoch: 18   Global Step: 747520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:44,759-Speed 2619.67 samples/sec   Loss 1.5750   LearningRate 0.0010   Epoch: 18   Global Step: 747530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:48,653-Speed 2630.26 samples/sec   Loss 1.5435   LearningRate 0.0010   Epoch: 18   Global Step: 747540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:52,544-Speed 2632.06 samples/sec   Loss 1.5534   LearningRate 0.0010   Epoch: 18   Global Step: 747550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:44:56,449-Speed 2623.49 samples/sec   Loss 1.6119   LearningRate 0.0010   Epoch: 18   Global Step: 747560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:45:00,340-Speed 2632.27 samples/sec   Loss 1.4988   LearningRate 0.0010   Epoch: 18   Global Step: 747570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:45:04,238-Speed 2627.78 samples/sec   Loss 1.5582   LearningRate 0.0010   Epoch: 18   Global Step: 747580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:45:08,223-Speed 2570.12 samples/sec   Loss 1.5537   LearningRate 0.0010   Epoch: 18   Global Step: 747590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:45:12,099-Speed 2642.41 samples/sec   Loss 1.5303   LearningRate 0.0010   Epoch: 18   Global Step: 747600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:45:16,004-Speed 2622.54 samples/sec   Loss 1.5901   LearningRate 0.0010   Epoch: 18   Global Step: 747610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:45:19,914-Speed 2619.89 samples/sec   Loss 1.5928   LearningRate 0.0010   Epoch: 18   Global Step: 747620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:45:23,811-Speed 2628.74 samples/sec   Loss 1.5387   LearningRate 0.0010   Epoch: 18   Global Step: 747630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:45:27,686-Speed 2642.77 samples/sec   Loss 1.5240   LearningRate 0.0010   Epoch: 18   Global Step: 747640   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:31,609-Speed 2611.34 samples/sec   Loss 1.5547   LearningRate 0.0010   Epoch: 18   Global Step: 747650   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:35,511-Speed 2624.57 samples/sec   Loss 1.5611   LearningRate 0.0010   Epoch: 18   Global Step: 747660   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:39,409-Speed 2627.99 samples/sec   Loss 1.5413   LearningRate 0.0010   Epoch: 18   Global Step: 747670   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:43,303-Speed 2630.60 samples/sec   Loss 1.5872   LearningRate 0.0010   Epoch: 18   Global Step: 747680   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:47,196-Speed 2631.00 samples/sec   Loss 1.5534   LearningRate 0.0010   Epoch: 18   Global Step: 747690   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:51,099-Speed 2623.67 samples/sec   Loss 1.5278   LearningRate 0.0010   Epoch: 18   Global Step: 747700   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:54,992-Speed 2631.26 samples/sec   Loss 1.5153   LearningRate 0.0010   Epoch: 18   Global Step: 747710   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:45:58,884-Speed 2632.01 samples/sec   Loss 1.6006   LearningRate 0.0010   Epoch: 18   Global Step: 747720   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:46:02,780-Speed 2629.60 samples/sec   Loss 1.5471   LearningRate 0.0010   Epoch: 18   Global Step: 747730   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:46:06,682-Speed 2624.16 samples/sec   Loss 1.6314   LearningRate 0.0010   Epoch: 18   Global Step: 747740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:10,578-Speed 2629.81 samples/sec   Loss 1.5571   LearningRate 0.0010   Epoch: 18   Global Step: 747750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:14,475-Speed 2628.06 samples/sec   Loss 1.5751   LearningRate 0.0010   Epoch: 18   Global Step: 747760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:18,377-Speed 2624.59 samples/sec   Loss 1.6066   LearningRate 0.0010   Epoch: 18   Global Step: 747770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:22,279-Speed 2624.62 samples/sec   Loss 1.5643   LearningRate 0.0010   Epoch: 18   Global Step: 747780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:26,182-Speed 2624.87 samples/sec   Loss 1.6014   LearningRate 0.0010   Epoch: 18   Global Step: 747790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:30,079-Speed 2628.11 samples/sec   Loss 1.5950   LearningRate 0.0010   Epoch: 18   Global Step: 747800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:33,970-Speed 2632.09 samples/sec   Loss 1.5832   LearningRate 0.0010   Epoch: 18   Global Step: 747810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:37,867-Speed 2628.40 samples/sec   Loss 1.5413   LearningRate 0.0010   Epoch: 18   Global Step: 747820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:41,760-Speed 2630.81 samples/sec   Loss 1.5399   LearningRate 0.0010   Epoch: 18   Global Step: 747830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:46:45,655-Speed 2630.14 samples/sec   Loss 1.5437   LearningRate 0.0010   Epoch: 18   Global Step: 747840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:46:49,548-Speed 2630.78 samples/sec   Loss 1.6064   LearningRate 0.0010   Epoch: 18   Global Step: 747850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:46:53,471-Speed 2611.49 samples/sec   Loss 1.5372   LearningRate 0.0010   Epoch: 18   Global Step: 747860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:46:57,367-Speed 2628.85 samples/sec   Loss 1.6092   LearningRate 0.0010   Epoch: 18   Global Step: 747870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:01,258-Speed 2631.95 samples/sec   Loss 1.5458   LearningRate 0.0010   Epoch: 18   Global Step: 747880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:05,159-Speed 2625.93 samples/sec   Loss 1.5657   LearningRate 0.0010   Epoch: 18   Global Step: 747890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:09,057-Speed 2628.42 samples/sec   Loss 1.5815   LearningRate 0.0010   Epoch: 18   Global Step: 747900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:12,961-Speed 2623.30 samples/sec   Loss 1.5530   LearningRate 0.0010   Epoch: 18   Global Step: 747910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:16,902-Speed 2599.52 samples/sec   Loss 1.5368   LearningRate 0.0010   Epoch: 18   Global Step: 747920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:20,812-Speed 2619.63 samples/sec   Loss 1.5653   LearningRate 0.0010   Epoch: 18   Global Step: 747930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:24,684-Speed 2645.54 samples/sec   Loss 1.5662   LearningRate 0.0010   Epoch: 18   Global Step: 747940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:47:28,562-Speed 2641.80 samples/sec   Loss 1.5701   LearningRate 0.0010   Epoch: 18   Global Step: 747950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:32,484-Speed 2611.64 samples/sec   Loss 1.5849   LearningRate 0.0010   Epoch: 18   Global Step: 747960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:36,377-Speed 2630.59 samples/sec   Loss 1.5386   LearningRate 0.0010   Epoch: 18   Global Step: 747970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:40,284-Speed 2621.66 samples/sec   Loss 1.5816   LearningRate 0.0010   Epoch: 18   Global Step: 747980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:44,201-Speed 2615.56 samples/sec   Loss 1.5688   LearningRate 0.0010   Epoch: 18   Global Step: 747990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:48,096-Speed 2629.52 samples/sec   Loss 1.5456   LearningRate 0.0010   Epoch: 18   Global Step: 748000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:52,009-Speed 2617.74 samples/sec   Loss 1.5638   LearningRate 0.0010   Epoch: 18   Global Step: 748010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:55,901-Speed 2631.52 samples/sec   Loss 1.5599   LearningRate 0.0010   Epoch: 18   Global Step: 748020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:47:59,809-Speed 2621.80 samples/sec   Loss 1.5197   LearningRate 0.0010   Epoch: 18   Global Step: 748030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:48:03,701-Speed 2631.07 samples/sec   Loss 1.5553   LearningRate 0.0010   Epoch: 18   Global Step: 748040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:48:07,592-Speed 2632.60 samples/sec   Loss 1.5886   LearningRate 0.0010   Epoch: 18   Global Step: 748050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:48:11,494-Speed 2624.71 samples/sec   Loss 1.5042   LearningRate 0.0010   Epoch: 18   Global Step: 748060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:48:15,386-Speed 2632.32 samples/sec   Loss 1.5907   LearningRate 0.0010   Epoch: 18   Global Step: 748070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:48:19,280-Speed 2630.50 samples/sec   Loss 1.5861   LearningRate 0.0010   Epoch: 18   Global Step: 748080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:48:23,174-Speed 2630.60 samples/sec   Loss 1.5246   LearningRate 0.0010   Epoch: 18   Global Step: 748090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:48:27,047-Speed 2644.05 samples/sec   Loss 1.5310   LearningRate 0.0010   Epoch: 18   Global Step: 748100   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:30,973-Speed 2609.41 samples/sec   Loss 1.5702   LearningRate 0.0010   Epoch: 18   Global Step: 748110   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:34,874-Speed 2625.36 samples/sec   Loss 1.5762   LearningRate 0.0010   Epoch: 18   Global Step: 748120   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:38,793-Speed 2613.56 samples/sec   Loss 1.6133   LearningRate 0.0010   Epoch: 18   Global Step: 748130   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:42,690-Speed 2628.41 samples/sec   Loss 1.5703   LearningRate 0.0010   Epoch: 18   Global Step: 748140   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:46,580-Speed 2633.66 samples/sec   Loss 1.5541   LearningRate 0.0010   Epoch: 18   Global Step: 748150   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:50,471-Speed 2632.04 samples/sec   Loss 1.5324   LearningRate 0.0010   Epoch: 18   Global Step: 748160   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:54,370-Speed 2627.32 samples/sec   Loss 1.5738   LearningRate 0.0010   Epoch: 18   Global Step: 748170   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:48:58,262-Speed 2631.40 samples/sec   Loss 1.5406   LearningRate 0.0010   Epoch: 18   Global Step: 748180   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:02,166-Speed 2624.08 samples/sec   Loss 1.5339   LearningRate 0.0010   Epoch: 18   Global Step: 748190   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:06,058-Speed 2631.56 samples/sec   Loss 1.5128   LearningRate 0.0010   Epoch: 18   Global Step: 748200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:49:09,964-Speed 2621.92 samples/sec   Loss 1.6188   LearningRate 0.0010   Epoch: 18   Global Step: 748210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:49:13,881-Speed 2615.73 samples/sec   Loss 1.5678   LearningRate 0.0010   Epoch: 18   Global Step: 748220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:49:17,777-Speed 2628.66 samples/sec   Loss 1.5337   LearningRate 0.0010   Epoch: 18   Global Step: 748230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:49:21,651-Speed 2644.61 samples/sec   Loss 1.5642   LearningRate 0.0010   Epoch: 18   Global Step: 748240   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:25,546-Speed 2629.58 samples/sec   Loss 1.5921   LearningRate 0.0010   Epoch: 18   Global Step: 748250   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:29,437-Speed 2632.62 samples/sec   Loss 1.5817   LearningRate 0.0010   Epoch: 18   Global Step: 748260   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:33,338-Speed 2625.25 samples/sec   Loss 1.5528   LearningRate 0.0010   Epoch: 18   Global Step: 748270   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:37,227-Speed 2634.04 samples/sec   Loss 1.6015   LearningRate 0.0010   Epoch: 18   Global Step: 748280   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:41,124-Speed 2627.47 samples/sec   Loss 1.5337   LearningRate 0.0010   Epoch: 18   Global Step: 748290   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:45,033-Speed 2620.47 samples/sec   Loss 1.5720   LearningRate 0.0010   Epoch: 18   Global Step: 748300   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:48,928-Speed 2630.34 samples/sec   Loss 1.5095   LearningRate 0.0010   Epoch: 18   Global Step: 748310   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:52,824-Speed 2628.79 samples/sec   Loss 1.5571   LearningRate 0.0010   Epoch: 18   Global Step: 748320   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:49:56,720-Speed 2629.36 samples/sec   Loss 1.5301   LearningRate 0.0010   Epoch: 18   Global Step: 748330   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:50:00,616-Speed 2628.78 samples/sec   Loss 1.5734   LearningRate 0.0010   Epoch: 18   Global Step: 748340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:04,514-Speed 2627.40 samples/sec   Loss 1.5306   LearningRate 0.0010   Epoch: 18   Global Step: 748350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:08,407-Speed 2631.26 samples/sec   Loss 1.5603   LearningRate 0.0010   Epoch: 18   Global Step: 748360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:12,306-Speed 2626.87 samples/sec   Loss 1.5733   LearningRate 0.0010   Epoch: 18   Global Step: 748370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:16,199-Speed 2630.88 samples/sec   Loss 1.5588   LearningRate 0.0010   Epoch: 18   Global Step: 748380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:20,095-Speed 2628.88 samples/sec   Loss 1.5425   LearningRate 0.0010   Epoch: 18   Global Step: 748390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:23,994-Speed 2627.20 samples/sec   Loss 1.5327   LearningRate 0.0010   Epoch: 18   Global Step: 748400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:27,891-Speed 2627.80 samples/sec   Loss 1.5370   LearningRate 0.0010   Epoch: 18   Global Step: 748410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:31,786-Speed 2630.38 samples/sec   Loss 1.5785   LearningRate 0.0010   Epoch: 18   Global Step: 748420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:35,683-Speed 2627.91 samples/sec   Loss 1.5781   LearningRate 0.0010   Epoch: 18   Global Step: 748430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:39,585-Speed 2625.32 samples/sec   Loss 1.5639   LearningRate 0.0010   Epoch: 18   Global Step: 748440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:50:43,468-Speed 2637.83 samples/sec   Loss 1.5967   LearningRate 0.0010   Epoch: 18   Global Step: 748450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:47,370-Speed 2625.18 samples/sec   Loss 1.5716   LearningRate 0.0010   Epoch: 18   Global Step: 748460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:51,271-Speed 2625.32 samples/sec   Loss 1.5215   LearningRate 0.0010   Epoch: 18   Global Step: 748470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:55,176-Speed 2622.95 samples/sec   Loss 1.5816   LearningRate 0.0010   Epoch: 18   Global Step: 748480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:50:59,094-Speed 2614.25 samples/sec   Loss 1.5134   LearningRate 0.0010   Epoch: 18   Global Step: 748490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:02,991-Speed 2628.83 samples/sec   Loss 1.5000   LearningRate 0.0010   Epoch: 18   Global Step: 748500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:06,888-Speed 2628.62 samples/sec   Loss 1.5479   LearningRate 0.0010   Epoch: 18   Global Step: 748510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:10,810-Speed 2611.08 samples/sec   Loss 1.5700   LearningRate 0.0010   Epoch: 18   Global Step: 748520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:14,714-Speed 2624.39 samples/sec   Loss 1.5598   LearningRate 0.0010   Epoch: 18   Global Step: 748530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:18,605-Speed 2631.76 samples/sec   Loss 1.5732   LearningRate 0.0010   Epoch: 18   Global Step: 748540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:22,508-Speed 2624.13 samples/sec   Loss 1.5620   LearningRate 0.0010   Epoch: 18   Global Step: 748550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:51:26,410-Speed 2625.53 samples/sec   Loss 1.6057   LearningRate 0.0010   Epoch: 18   Global Step: 748560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:51:30,310-Speed 2626.39 samples/sec   Loss 1.5169   LearningRate 0.0010   Epoch: 18   Global Step: 748570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:51:34,206-Speed 2629.06 samples/sec   Loss 1.5187   LearningRate 0.0010   Epoch: 18   Global Step: 748580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:51:38,081-Speed 2642.98 samples/sec   Loss 1.5485   LearningRate 0.0010   Epoch: 18   Global Step: 748590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:41,973-Speed 2631.45 samples/sec   Loss 1.5425   LearningRate 0.0010   Epoch: 18   Global Step: 748600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:45,876-Speed 2625.06 samples/sec   Loss 1.6031   LearningRate 0.0010   Epoch: 18   Global Step: 748610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:49,776-Speed 2626.33 samples/sec   Loss 1.6037   LearningRate 0.0010   Epoch: 18   Global Step: 748620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:53,687-Speed 2618.42 samples/sec   Loss 1.5247   LearningRate 0.0010   Epoch: 18   Global Step: 748630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:51:57,591-Speed 2623.81 samples/sec   Loss 1.5471   LearningRate 0.0010   Epoch: 18   Global Step: 748640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:01,493-Speed 2625.12 samples/sec   Loss 1.5544   LearningRate 0.0010   Epoch: 18   Global Step: 748650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:05,410-Speed 2614.98 samples/sec   Loss 1.5507   LearningRate 0.0010   Epoch: 18   Global Step: 748660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:09,312-Speed 2624.83 samples/sec   Loss 1.5918   LearningRate 0.0010   Epoch: 18   Global Step: 748670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:13,285-Speed 2578.43 samples/sec   Loss 1.4839   LearningRate 0.0010   Epoch: 18   Global Step: 748680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:17,198-Speed 2617.79 samples/sec   Loss 1.5180   LearningRate 0.0010   Epoch: 18   Global Step: 748690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:52:21,103-Speed 2622.56 samples/sec   Loss 1.5329   LearningRate 0.0010   Epoch: 18   Global Step: 748700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:52:24,978-Speed 2643.11 samples/sec   Loss 1.5389   LearningRate 0.0010   Epoch: 18   Global Step: 748710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:28,877-Speed 2626.98 samples/sec   Loss 1.4923   LearningRate 0.0010   Epoch: 18   Global Step: 748720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:32,789-Speed 2618.19 samples/sec   Loss 1.6296   LearningRate 0.0009   Epoch: 18   Global Step: 748730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:36,684-Speed 2630.05 samples/sec   Loss 1.5814   LearningRate 0.0009   Epoch: 18   Global Step: 748740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:40,581-Speed 2628.03 samples/sec   Loss 1.5489   LearningRate 0.0009   Epoch: 18   Global Step: 748750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:44,480-Speed 2626.67 samples/sec   Loss 1.6142   LearningRate 0.0009   Epoch: 18   Global Step: 748760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:48,378-Speed 2628.17 samples/sec   Loss 1.5452   LearningRate 0.0009   Epoch: 18   Global Step: 748770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:52,280-Speed 2624.97 samples/sec   Loss 1.6056   LearningRate 0.0009   Epoch: 18   Global Step: 748780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:52:56,188-Speed 2621.41 samples/sec   Loss 1.5249   LearningRate 0.0009   Epoch: 18   Global Step: 748790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:00,081-Speed 2630.65 samples/sec   Loss 1.5681   LearningRate 0.0009   Epoch: 18   Global Step: 748800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:04,023-Speed 2598.10 samples/sec   Loss 1.5613   LearningRate 0.0009   Epoch: 18   Global Step: 748810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:07,983-Speed 2586.64 samples/sec   Loss 1.5529   LearningRate 0.0009   Epoch: 18   Global Step: 748820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:11,878-Speed 2631.40 samples/sec   Loss 1.5463   LearningRate 0.0009   Epoch: 18   Global Step: 748830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:15,772-Speed 2630.78 samples/sec   Loss 1.5708   LearningRate 0.0009   Epoch: 18   Global Step: 748840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:19,663-Speed 2632.27 samples/sec   Loss 1.5535   LearningRate 0.0009   Epoch: 18   Global Step: 748850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:23,561-Speed 2627.89 samples/sec   Loss 1.5160   LearningRate 0.0009   Epoch: 18   Global Step: 748860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:27,465-Speed 2623.16 samples/sec   Loss 1.5522   LearningRate 0.0009   Epoch: 18   Global Step: 748870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:31,361-Speed 2629.40 samples/sec   Loss 1.5505   LearningRate 0.0009   Epoch: 18   Global Step: 748880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:35,273-Speed 2618.52 samples/sec   Loss 1.5398   LearningRate 0.0009   Epoch: 18   Global Step: 748890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:39,166-Speed 2630.60 samples/sec   Loss 1.5315   LearningRate 0.0009   Epoch: 18   Global Step: 748900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:43,061-Speed 2629.43 samples/sec   Loss 1.5481   LearningRate 0.0009   Epoch: 18   Global Step: 748910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:53:46,958-Speed 2628.12 samples/sec   Loss 1.5946   LearningRate 0.0009   Epoch: 18   Global Step: 748920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:53:50,842-Speed 2637.66 samples/sec   Loss 1.6124   LearningRate 0.0009   Epoch: 18   Global Step: 748930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:54,743-Speed 2626.01 samples/sec   Loss 1.5640   LearningRate 0.0009   Epoch: 18   Global Step: 748940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:53:58,645-Speed 2624.71 samples/sec   Loss 1.5115   LearningRate 0.0009   Epoch: 18   Global Step: 748950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:02,564-Speed 2613.53 samples/sec   Loss 1.5237   LearningRate 0.0009   Epoch: 18   Global Step: 748960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:06,458-Speed 2630.29 samples/sec   Loss 1.5815   LearningRate 0.0009   Epoch: 18   Global Step: 748970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:10,352-Speed 2630.50 samples/sec   Loss 1.5185   LearningRate 0.0009   Epoch: 18   Global Step: 748980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:14,244-Speed 2632.11 samples/sec   Loss 1.5139   LearningRate 0.0009   Epoch: 18   Global Step: 748990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:18,149-Speed 2622.59 samples/sec   Loss 1.5750   LearningRate 0.0009   Epoch: 18   Global Step: 749000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:22,049-Speed 2626.96 samples/sec   Loss 1.4836   LearningRate 0.0009   Epoch: 18   Global Step: 749010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:25,971-Speed 2611.92 samples/sec   Loss 1.5242   LearningRate 0.0009   Epoch: 18   Global Step: 749020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:29,868-Speed 2628.44 samples/sec   Loss 1.5845   LearningRate 0.0009   Epoch: 18   Global Step: 749030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:54:33,768-Speed 2626.18 samples/sec   Loss 1.6414   LearningRate 0.0009   Epoch: 18   Global Step: 749040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:54:37,666-Speed 2627.48 samples/sec   Loss 1.5301   LearningRate 0.0009   Epoch: 18   Global Step: 749050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:54:41,582-Speed 2615.44 samples/sec   Loss 1.5350   LearningRate 0.0009   Epoch: 18   Global Step: 749060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:54:45,485-Speed 2624.10 samples/sec   Loss 1.5555   LearningRate 0.0009   Epoch: 18   Global Step: 749070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:54:49,378-Speed 2631.61 samples/sec   Loss 1.5145   LearningRate 0.0009   Epoch: 18   Global Step: 749080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:54:53,247-Speed 2647.89 samples/sec   Loss 1.5678   LearningRate 0.0009   Epoch: 18   Global Step: 749090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:54:57,138-Speed 2632.45 samples/sec   Loss 1.5559   LearningRate 0.0009   Epoch: 18   Global Step: 749100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:01,031-Speed 2631.29 samples/sec   Loss 1.5624   LearningRate 0.0009   Epoch: 18   Global Step: 749110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:04,924-Speed 2630.35 samples/sec   Loss 1.5603   LearningRate 0.0009   Epoch: 18   Global Step: 749120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:08,815-Speed 2632.30 samples/sec   Loss 1.5276   LearningRate 0.0009   Epoch: 18   Global Step: 749130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:12,716-Speed 2625.88 samples/sec   Loss 1.5403   LearningRate 0.0009   Epoch: 18   Global Step: 749140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:16,618-Speed 2624.85 samples/sec   Loss 1.5390   LearningRate 0.0009   Epoch: 18   Global Step: 749150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:20,518-Speed 2626.53 samples/sec   Loss 1.5295   LearningRate 0.0009   Epoch: 18   Global Step: 749160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:24,414-Speed 2629.21 samples/sec   Loss 1.5314   LearningRate 0.0009   Epoch: 18   Global Step: 749170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:28,333-Speed 2613.39 samples/sec   Loss 1.5109   LearningRate 0.0009   Epoch: 18   Global Step: 749180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:55:32,229-Speed 2629.85 samples/sec   Loss 1.5719   LearningRate 0.0009   Epoch: 18   Global Step: 749190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:55:36,124-Speed 2629.50 samples/sec   Loss 1.5591   LearningRate 0.0009   Epoch: 18   Global Step: 749200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:55:40,018-Speed 2629.76 samples/sec   Loss 1.5414   LearningRate 0.0009   Epoch: 18   Global Step: 749210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:55:43,931-Speed 2617.35 samples/sec   Loss 1.5078   LearningRate 0.0009   Epoch: 18   Global Step: 749220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:55:47,823-Speed 2632.78 samples/sec   Loss 1.5322   LearningRate 0.0009   Epoch: 18   Global Step: 749230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:55:51,715-Speed 2632.36 samples/sec   Loss 1.5707   LearningRate 0.0009   Epoch: 18   Global Step: 749240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:55:55,614-Speed 2626.31 samples/sec   Loss 1.5468   LearningRate 0.0009   Epoch: 18   Global Step: 749250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:55:59,507-Speed 2631.81 samples/sec   Loss 1.5225   LearningRate 0.0009   Epoch: 18   Global Step: 749260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:03,414-Speed 2621.31 samples/sec   Loss 1.5184   LearningRate 0.0009   Epoch: 18   Global Step: 749270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:07,310-Speed 2628.74 samples/sec   Loss 1.5497   LearningRate 0.0009   Epoch: 18   Global Step: 749280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:11,185-Speed 2642.75 samples/sec   Loss 1.6153   LearningRate 0.0009   Epoch: 18   Global Step: 749290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:15,079-Speed 2631.14 samples/sec   Loss 1.5111   LearningRate 0.0009   Epoch: 18   Global Step: 749300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:18,969-Speed 2632.87 samples/sec   Loss 1.5700   LearningRate 0.0009   Epoch: 18   Global Step: 749310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:22,867-Speed 2627.38 samples/sec   Loss 1.6185   LearningRate 0.0009   Epoch: 18   Global Step: 749320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:26,794-Speed 2609.04 samples/sec   Loss 1.5534   LearningRate 0.0009   Epoch: 18   Global Step: 749330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:30,688-Speed 2630.35 samples/sec   Loss 1.5335   LearningRate 0.0009   Epoch: 18   Global Step: 749340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:34,583-Speed 2629.09 samples/sec   Loss 1.5775   LearningRate 0.0009   Epoch: 18   Global Step: 749350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:38,479-Speed 2629.12 samples/sec   Loss 1.5639   LearningRate 0.0009   Epoch: 18   Global Step: 749360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:42,375-Speed 2629.51 samples/sec   Loss 1.5116   LearningRate 0.0009   Epoch: 18   Global Step: 749370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:46,269-Speed 2630.56 samples/sec   Loss 1.5733   LearningRate 0.0009   Epoch: 18   Global Step: 749380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:50,184-Speed 2616.00 samples/sec   Loss 1.5186   LearningRate 0.0009   Epoch: 18   Global Step: 749390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:54,077-Speed 2631.07 samples/sec   Loss 1.5653   LearningRate 0.0009   Epoch: 18   Global Step: 749400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:56:57,968-Speed 2632.36 samples/sec   Loss 1.5747   LearningRate 0.0009   Epoch: 18   Global Step: 749410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:57:01,861-Speed 2630.97 samples/sec   Loss 1.5174   LearningRate 0.0009   Epoch: 18   Global Step: 749420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:57:05,774-Speed 2618.02 samples/sec   Loss 1.6168   LearningRate 0.0009   Epoch: 18   Global Step: 749430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:57:09,674-Speed 2626.07 samples/sec   Loss 1.5534   LearningRate 0.0009   Epoch: 18   Global Step: 749440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:57:13,575-Speed 2625.82 samples/sec   Loss 1.5007   LearningRate 0.0009   Epoch: 18   Global Step: 749450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:57:17,443-Speed 2647.95 samples/sec   Loss 1.5548   LearningRate 0.0009   Epoch: 18   Global Step: 749460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:21,347-Speed 2623.65 samples/sec   Loss 1.5076   LearningRate 0.0009   Epoch: 18   Global Step: 749470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:25,296-Speed 2594.23 samples/sec   Loss 1.5469   LearningRate 0.0009   Epoch: 18   Global Step: 749480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:29,190-Speed 2630.01 samples/sec   Loss 1.5115   LearningRate 0.0009   Epoch: 18   Global Step: 749490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:33,144-Speed 2590.25 samples/sec   Loss 1.5715   LearningRate 0.0009   Epoch: 18   Global Step: 749500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:37,083-Speed 2600.32 samples/sec   Loss 1.5144   LearningRate 0.0009   Epoch: 18   Global Step: 749510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:40,988-Speed 2622.88 samples/sec   Loss 1.4936   LearningRate 0.0009   Epoch: 18   Global Step: 749520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:44,882-Speed 2630.76 samples/sec   Loss 1.5539   LearningRate 0.0009   Epoch: 18   Global Step: 749530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:48,776-Speed 2630.65 samples/sec   Loss 1.5639   LearningRate 0.0009   Epoch: 18   Global Step: 749540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:52,672-Speed 2629.12 samples/sec   Loss 1.5894   LearningRate 0.0009   Epoch: 18   Global Step: 749550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:57:56,563-Speed 2631.94 samples/sec   Loss 1.5319   LearningRate 0.0009   Epoch: 18   Global Step: 749560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:58:00,459-Speed 2629.20 samples/sec   Loss 1.5372   LearningRate 0.0009   Epoch: 18   Global Step: 749570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:58:04,332-Speed 2644.72 samples/sec   Loss 1.5339   LearningRate 0.0009   Epoch: 18   Global Step: 749580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:08,227-Speed 2629.43 samples/sec   Loss 1.5247   LearningRate 0.0009   Epoch: 18   Global Step: 749590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:12,139-Speed 2618.27 samples/sec   Loss 1.5339   LearningRate 0.0009   Epoch: 18   Global Step: 749600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:16,032-Speed 2631.30 samples/sec   Loss 1.6158   LearningRate 0.0009   Epoch: 18   Global Step: 749610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:19,942-Speed 2619.56 samples/sec   Loss 1.5201   LearningRate 0.0009   Epoch: 18   Global Step: 749620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:23,865-Speed 2610.78 samples/sec   Loss 1.5250   LearningRate 0.0009   Epoch: 18   Global Step: 749630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:27,759-Speed 2630.83 samples/sec   Loss 1.5185   LearningRate 0.0009   Epoch: 18   Global Step: 749640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:31,652-Speed 2630.98 samples/sec   Loss 1.4786   LearningRate 0.0009   Epoch: 18   Global Step: 749650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:35,556-Speed 2624.17 samples/sec   Loss 1.5477   LearningRate 0.0009   Epoch: 18   Global Step: 749660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:39,490-Speed 2603.24 samples/sec   Loss 1.5699   LearningRate 0.0009   Epoch: 18   Global Step: 749670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:58:43,397-Speed 2621.56 samples/sec   Loss 1.5542   LearningRate 0.0009   Epoch: 18   Global Step: 749680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:58:47,290-Speed 2631.51 samples/sec   Loss 1.4926   LearningRate 0.0009   Epoch: 18   Global Step: 749690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:58:51,181-Speed 2632.40 samples/sec   Loss 1.5540   LearningRate 0.0009   Epoch: 18   Global Step: 749700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:58:55,071-Speed 2632.66 samples/sec   Loss 1.5446   LearningRate 0.0009   Epoch: 18   Global Step: 749710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:58:58,970-Speed 2627.11 samples/sec   Loss 1.5730   LearningRate 0.0009   Epoch: 18   Global Step: 749720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:59:02,869-Speed 2626.98 samples/sec   Loss 1.5399   LearningRate 0.0009   Epoch: 18   Global Step: 749730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:59:06,797-Speed 2607.23 samples/sec   Loss 1.5176   LearningRate 0.0009   Epoch: 18   Global Step: 749740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 07:59:10,720-Speed 2611.17 samples/sec   Loss 1.5729   LearningRate 0.0009   Epoch: 18   Global Step: 749750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:59:14,621-Speed 2625.35 samples/sec   Loss 1.5824   LearningRate 0.0009   Epoch: 18   Global Step: 749760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:59:18,515-Speed 2630.50 samples/sec   Loss 1.4953   LearningRate 0.0009   Epoch: 18   Global Step: 749770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:59:22,412-Speed 2628.53 samples/sec   Loss 1.5791   LearningRate 0.0009   Epoch: 18   Global Step: 749780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:59:26,318-Speed 2622.25 samples/sec   Loss 1.5489   LearningRate 0.0009   Epoch: 18   Global Step: 749790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:59:30,214-Speed 2629.62 samples/sec   Loss 1.5484   LearningRate 0.0009   Epoch: 18   Global Step: 749800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:59:34,120-Speed 2622.01 samples/sec   Loss 1.5490   LearningRate 0.0009   Epoch: 18   Global Step: 749810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 07:59:37,995-Speed 2643.18 samples/sec   Loss 1.5417   LearningRate 0.0009   Epoch: 18   Global Step: 749820   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:59:41,895-Speed 2626.31 samples/sec   Loss 1.5420   LearningRate 0.0009   Epoch: 18   Global Step: 749830   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:59:45,796-Speed 2625.74 samples/sec   Loss 1.5246   LearningRate 0.0009   Epoch: 18   Global Step: 749840   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:59:49,739-Speed 2597.63 samples/sec   Loss 1.5644   LearningRate 0.0009   Epoch: 18   Global Step: 749850   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:59:53,636-Speed 2628.91 samples/sec   Loss 1.5400   LearningRate 0.0009   Epoch: 18   Global Step: 749860   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 07:59:57,531-Speed 2629.11 samples/sec   Loss 1.5802   LearningRate 0.0009   Epoch: 18   Global Step: 749870   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:00:01,424-Speed 2631.17 samples/sec   Loss 1.5643   LearningRate 0.0009   Epoch: 18   Global Step: 749880   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:00:05,319-Speed 2629.29 samples/sec   Loss 1.5281   LearningRate 0.0009   Epoch: 18   Global Step: 749890   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:00:09,211-Speed 2631.73 samples/sec   Loss 1.5610   LearningRate 0.0009   Epoch: 18   Global Step: 749900   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:00:13,105-Speed 2630.21 samples/sec   Loss 1.5770   LearningRate 0.0009   Epoch: 18   Global Step: 749910   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:00:16,996-Speed 2632.44 samples/sec   Loss 1.5243   LearningRate 0.0009   Epoch: 18   Global Step: 749920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:20,896-Speed 2626.52 samples/sec   Loss 1.5721   LearningRate 0.0009   Epoch: 18   Global Step: 749930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:24,790-Speed 2630.34 samples/sec   Loss 1.5479   LearningRate 0.0009   Epoch: 18   Global Step: 749940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:28,681-Speed 2632.56 samples/sec   Loss 1.6145   LearningRate 0.0009   Epoch: 18   Global Step: 749950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:32,572-Speed 2633.47 samples/sec   Loss 1.4530   LearningRate 0.0009   Epoch: 18   Global Step: 749960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:36,464-Speed 2631.65 samples/sec   Loss 1.5716   LearningRate 0.0009   Epoch: 18   Global Step: 749970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:40,353-Speed 2633.43 samples/sec   Loss 1.5531   LearningRate 0.0009   Epoch: 18   Global Step: 749980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:44,244-Speed 2632.06 samples/sec   Loss 1.5331   LearningRate 0.0009   Epoch: 18   Global Step: 749990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:00:48,153-Speed 2620.53 samples/sec   Loss 1.5039   LearningRate 0.0009   Epoch: 18   Global Step: 750000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:01:30,956-[lfw][750000]XNorm: 21.977784
Training: 2022-04-16 08:01:30,957-[lfw][750000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 08:01:30,957-[lfw][750000]Accuracy-Highest: 0.99850
Training: 2022-04-16 08:02:20,721-[cfp_fp][750000]XNorm: 22.309532
Training: 2022-04-16 08:02:20,722-[cfp_fp][750000]Accuracy-Flip: 0.99271+-0.00396
Training: 2022-04-16 08:02:20,723-[cfp_fp][750000]Accuracy-Highest: 0.99329
Training: 2022-04-16 08:03:03,563-[agedb_30][750000]XNorm: 22.922225
Training: 2022-04-16 08:03:03,564-[agedb_30][750000]Accuracy-Flip: 0.98400+-0.00583
Training: 2022-04-16 08:03:03,564-[agedb_30][750000]Accuracy-Highest: 0.98400
Training: 2022-04-16 08:03:07,444-Speed 73.52 samples/sec   Loss 1.4987   LearningRate 0.0009   Epoch: 18   Global Step: 750010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:11,316-Speed 2644.97 samples/sec   Loss 1.6021   LearningRate 0.0009   Epoch: 18   Global Step: 750020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:03:15,183-Speed 2648.89 samples/sec   Loss 1.5590   LearningRate 0.0009   Epoch: 18   Global Step: 750030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:19,061-Speed 2641.86 samples/sec   Loss 1.5422   LearningRate 0.0009   Epoch: 18   Global Step: 750040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:22,953-Speed 2631.15 samples/sec   Loss 1.5488   LearningRate 0.0009   Epoch: 18   Global Step: 750050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:26,860-Speed 2621.81 samples/sec   Loss 1.5273   LearningRate 0.0009   Epoch: 18   Global Step: 750060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:30,749-Speed 2634.69 samples/sec   Loss 1.5587   LearningRate 0.0009   Epoch: 18   Global Step: 750070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:34,634-Speed 2636.14 samples/sec   Loss 1.5084   LearningRate 0.0009   Epoch: 18   Global Step: 750080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:38,539-Speed 2623.34 samples/sec   Loss 1.5705   LearningRate 0.0009   Epoch: 18   Global Step: 750090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:42,431-Speed 2632.14 samples/sec   Loss 1.5236   LearningRate 0.0009   Epoch: 18   Global Step: 750100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:46,326-Speed 2629.26 samples/sec   Loss 1.5060   LearningRate 0.0009   Epoch: 18   Global Step: 750110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:50,248-Speed 2611.69 samples/sec   Loss 1.5251   LearningRate 0.0009   Epoch: 18   Global Step: 750120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:03:54,137-Speed 2633.95 samples/sec   Loss 1.5662   LearningRate 0.0009   Epoch: 18   Global Step: 750130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:03:58,029-Speed 2631.99 samples/sec   Loss 1.4977   LearningRate 0.0009   Epoch: 18   Global Step: 750140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:04:01,926-Speed 2628.24 samples/sec   Loss 1.4798   LearningRate 0.0009   Epoch: 18   Global Step: 750150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:04:05,798-Speed 2645.53 samples/sec   Loss 1.5441   LearningRate 0.0009   Epoch: 18   Global Step: 750160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:09,693-Speed 2630.02 samples/sec   Loss 1.5022   LearningRate 0.0009   Epoch: 18   Global Step: 750170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:13,621-Speed 2607.35 samples/sec   Loss 1.5278   LearningRate 0.0009   Epoch: 18   Global Step: 750180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:17,513-Speed 2632.01 samples/sec   Loss 1.5162   LearningRate 0.0009   Epoch: 18   Global Step: 750190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:21,428-Speed 2616.46 samples/sec   Loss 1.4908   LearningRate 0.0009   Epoch: 18   Global Step: 750200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:25,522-Speed 2501.64 samples/sec   Loss 1.5476   LearningRate 0.0009   Epoch: 18   Global Step: 750210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:29,616-Speed 2502.57 samples/sec   Loss 1.4887   LearningRate 0.0009   Epoch: 18   Global Step: 750220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:33,714-Speed 2499.69 samples/sec   Loss 1.5512   LearningRate 0.0009   Epoch: 18   Global Step: 750230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:37,762-Speed 2530.32 samples/sec   Loss 1.5225   LearningRate 0.0009   Epoch: 18   Global Step: 750240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:41,658-Speed 2629.14 samples/sec   Loss 1.5852   LearningRate 0.0009   Epoch: 18   Global Step: 750250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:45,556-Speed 2628.03 samples/sec   Loss 1.4946   LearningRate 0.0009   Epoch: 18   Global Step: 750260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:04:49,512-Speed 2588.80 samples/sec   Loss 1.5430   LearningRate 0.0009   Epoch: 18   Global Step: 750270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:53,429-Speed 2615.10 samples/sec   Loss 1.5373   LearningRate 0.0009   Epoch: 18   Global Step: 750280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:04:57,326-Speed 2628.21 samples/sec   Loss 1.5131   LearningRate 0.0009   Epoch: 18   Global Step: 750290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:01,266-Speed 2599.53 samples/sec   Loss 1.5766   LearningRate 0.0009   Epoch: 18   Global Step: 750300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:05,162-Speed 2629.10 samples/sec   Loss 1.6330   LearningRate 0.0009   Epoch: 18   Global Step: 750310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:09,055-Speed 2631.06 samples/sec   Loss 1.5627   LearningRate 0.0009   Epoch: 18   Global Step: 750320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:12,952-Speed 2628.92 samples/sec   Loss 1.4719   LearningRate 0.0009   Epoch: 18   Global Step: 750330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:16,907-Speed 2589.36 samples/sec   Loss 1.5562   LearningRate 0.0009   Epoch: 18   Global Step: 750340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:20,839-Speed 2605.23 samples/sec   Loss 1.5364   LearningRate 0.0009   Epoch: 18   Global Step: 750350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:24,736-Speed 2628.26 samples/sec   Loss 1.5822   LearningRate 0.0009   Epoch: 18   Global Step: 750360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:28,632-Speed 2629.22 samples/sec   Loss 1.5703   LearningRate 0.0009   Epoch: 18   Global Step: 750370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:05:32,528-Speed 2628.67 samples/sec   Loss 1.5969   LearningRate 0.0009   Epoch: 18   Global Step: 750380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:05:36,433-Speed 2623.69 samples/sec   Loss 1.5135   LearningRate 0.0009   Epoch: 18   Global Step: 750390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:05:40,489-Speed 2524.75 samples/sec   Loss 1.5405   LearningRate 0.0009   Epoch: 18   Global Step: 750400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:05:44,383-Speed 2630.78 samples/sec   Loss 1.4680   LearningRate 0.0009   Epoch: 18   Global Step: 750410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:48,289-Speed 2622.13 samples/sec   Loss 1.5795   LearningRate 0.0009   Epoch: 18   Global Step: 750420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:52,186-Speed 2628.48 samples/sec   Loss 1.5199   LearningRate 0.0009   Epoch: 18   Global Step: 750430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:56,080-Speed 2630.56 samples/sec   Loss 1.5780   LearningRate 0.0009   Epoch: 18   Global Step: 750440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:05:59,972-Speed 2631.69 samples/sec   Loss 1.5556   LearningRate 0.0009   Epoch: 18   Global Step: 750450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:03,868-Speed 2628.55 samples/sec   Loss 1.5055   LearningRate 0.0009   Epoch: 18   Global Step: 750460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:07,770-Speed 2625.48 samples/sec   Loss 1.5366   LearningRate 0.0009   Epoch: 18   Global Step: 750470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:11,668-Speed 2627.33 samples/sec   Loss 1.5148   LearningRate 0.0009   Epoch: 18   Global Step: 750480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:15,570-Speed 2625.25 samples/sec   Loss 1.5835   LearningRate 0.0009   Epoch: 18   Global Step: 750490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:19,467-Speed 2628.54 samples/sec   Loss 1.4963   LearningRate 0.0009   Epoch: 18   Global Step: 750500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:23,360-Speed 2630.66 samples/sec   Loss 1.5063   LearningRate 0.0009   Epoch: 18   Global Step: 750510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:06:27,262-Speed 2624.82 samples/sec   Loss 1.5496   LearningRate 0.0009   Epoch: 18   Global Step: 750520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:06:31,158-Speed 2628.92 samples/sec   Loss 1.5648   LearningRate 0.0009   Epoch: 18   Global Step: 750530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:06:35,053-Speed 2629.83 samples/sec   Loss 1.5564   LearningRate 0.0009   Epoch: 18   Global Step: 750540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:06:38,948-Speed 2629.35 samples/sec   Loss 1.5369   LearningRate 0.0009   Epoch: 18   Global Step: 750550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:06:42,830-Speed 2638.34 samples/sec   Loss 1.5241   LearningRate 0.0009   Epoch: 18   Global Step: 750560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:46,735-Speed 2623.80 samples/sec   Loss 1.5129   LearningRate 0.0009   Epoch: 18   Global Step: 750570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:50,634-Speed 2626.94 samples/sec   Loss 1.5342   LearningRate 0.0009   Epoch: 18   Global Step: 750580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:54,530-Speed 2630.25 samples/sec   Loss 1.5310   LearningRate 0.0009   Epoch: 18   Global Step: 750590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:06:58,431-Speed 2625.21 samples/sec   Loss 1.5172   LearningRate 0.0009   Epoch: 18   Global Step: 750600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:02,325-Speed 2630.36 samples/sec   Loss 1.5505   LearningRate 0.0009   Epoch: 18   Global Step: 750610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:06,223-Speed 2627.14 samples/sec   Loss 1.5410   LearningRate 0.0009   Epoch: 18   Global Step: 750620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:10,123-Speed 2625.99 samples/sec   Loss 1.5604   LearningRate 0.0009   Epoch: 18   Global Step: 750630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:14,103-Speed 2574.19 samples/sec   Loss 1.5636   LearningRate 0.0009   Epoch: 18   Global Step: 750640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:18,004-Speed 2625.88 samples/sec   Loss 1.5138   LearningRate 0.0009   Epoch: 18   Global Step: 750650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:21,921-Speed 2614.87 samples/sec   Loss 1.5332   LearningRate 0.0009   Epoch: 18   Global Step: 750660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:07:25,824-Speed 2624.65 samples/sec   Loss 1.5184   LearningRate 0.0009   Epoch: 18   Global Step: 750670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:07:29,731-Speed 2621.69 samples/sec   Loss 1.4873   LearningRate 0.0009   Epoch: 18   Global Step: 750680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:07:33,616-Speed 2636.65 samples/sec   Loss 1.5595   LearningRate 0.0009   Epoch: 18   Global Step: 750690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:37,521-Speed 2622.86 samples/sec   Loss 1.5670   LearningRate 0.0009   Epoch: 18   Global Step: 750700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:41,429-Speed 2621.02 samples/sec   Loss 1.5364   LearningRate 0.0009   Epoch: 18   Global Step: 750710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:45,327-Speed 2627.32 samples/sec   Loss 1.5364   LearningRate 0.0009   Epoch: 18   Global Step: 750720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:49,228-Speed 2625.31 samples/sec   Loss 1.5453   LearningRate 0.0009   Epoch: 18   Global Step: 750730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:07:53,111-Speed 2637.62 samples/sec   Loss 1.5144   LearningRate 0.0009   Epoch: 18   Global Step: 750740   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:07:57,015-Speed 2624.39 samples/sec   Loss 1.5169   LearningRate 0.0009   Epoch: 18   Global Step: 750750   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:00,911-Speed 2629.08 samples/sec   Loss 1.4592   LearningRate 0.0009   Epoch: 18   Global Step: 750760   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:04,810-Speed 2626.87 samples/sec   Loss 1.5413   LearningRate 0.0009   Epoch: 18   Global Step: 750770   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:08,721-Speed 2618.39 samples/sec   Loss 1.5709   LearningRate 0.0009   Epoch: 18   Global Step: 750780   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:12,677-Speed 2589.86 samples/sec   Loss 1.5497   LearningRate 0.0009   Epoch: 18   Global Step: 750790   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:16,567-Speed 2632.81 samples/sec   Loss 1.5324   LearningRate 0.0009   Epoch: 18   Global Step: 750800   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:20,466-Speed 2627.74 samples/sec   Loss 1.5381   LearningRate 0.0009   Epoch: 18   Global Step: 750810   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:24,357-Speed 2631.57 samples/sec   Loss 1.5421   LearningRate 0.0009   Epoch: 18   Global Step: 750820   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:28,270-Speed 2618.44 samples/sec   Loss 1.5631   LearningRate 0.0009   Epoch: 18   Global Step: 750830   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:08:32,167-Speed 2628.43 samples/sec   Loss 1.5405   LearningRate 0.0009   Epoch: 18   Global Step: 750840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:08:36,062-Speed 2629.22 samples/sec   Loss 1.5422   LearningRate 0.0009   Epoch: 18   Global Step: 750850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:08:39,965-Speed 2624.30 samples/sec   Loss 1.5592   LearningRate 0.0009   Epoch: 18   Global Step: 750860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:08:43,872-Speed 2621.80 samples/sec   Loss 1.4930   LearningRate 0.0009   Epoch: 18   Global Step: 750870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:08:47,766-Speed 2629.63 samples/sec   Loss 1.5108   LearningRate 0.0009   Epoch: 18   Global Step: 750880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:08:51,671-Speed 2623.14 samples/sec   Loss 1.5325   LearningRate 0.0009   Epoch: 18   Global Step: 750890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:08:55,572-Speed 2625.58 samples/sec   Loss 1.5066   LearningRate 0.0009   Epoch: 18   Global Step: 750900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:08:59,469-Speed 2628.86 samples/sec   Loss 1.4594   LearningRate 0.0009   Epoch: 18   Global Step: 750910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:03,372-Speed 2623.99 samples/sec   Loss 1.5449   LearningRate 0.0009   Epoch: 18   Global Step: 750920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:07,268-Speed 2628.80 samples/sec   Loss 1.5109   LearningRate 0.0009   Epoch: 18   Global Step: 750930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:11,168-Speed 2626.06 samples/sec   Loss 1.5873   LearningRate 0.0009   Epoch: 18   Global Step: 750940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:09:15,067-Speed 2627.62 samples/sec   Loss 1.5246   LearningRate 0.0009   Epoch: 18   Global Step: 750950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:09:18,965-Speed 2627.80 samples/sec   Loss 1.5452   LearningRate 0.0009   Epoch: 18   Global Step: 750960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:09:22,860-Speed 2629.81 samples/sec   Loss 1.5528   LearningRate 0.0009   Epoch: 18   Global Step: 750970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:09:26,760-Speed 2627.29 samples/sec   Loss 1.5467   LearningRate 0.0009   Epoch: 18   Global Step: 750980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:09:30,665-Speed 2622.63 samples/sec   Loss 1.5376   LearningRate 0.0009   Epoch: 18   Global Step: 750990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:09:34,537-Speed 2645.69 samples/sec   Loss 1.4717   LearningRate 0.0009   Epoch: 18   Global Step: 751000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:38,430-Speed 2630.91 samples/sec   Loss 1.5652   LearningRate 0.0009   Epoch: 18   Global Step: 751010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:42,326-Speed 2628.78 samples/sec   Loss 1.5887   LearningRate 0.0009   Epoch: 18   Global Step: 751020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:46,227-Speed 2625.35 samples/sec   Loss 1.5254   LearningRate 0.0009   Epoch: 18   Global Step: 751030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:50,142-Speed 2616.78 samples/sec   Loss 1.4952   LearningRate 0.0009   Epoch: 18   Global Step: 751040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:54,044-Speed 2625.09 samples/sec   Loss 1.5129   LearningRate 0.0009   Epoch: 18   Global Step: 751050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:09:57,942-Speed 2628.22 samples/sec   Loss 1.5504   LearningRate 0.0009   Epoch: 18   Global Step: 751060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:01,878-Speed 2602.29 samples/sec   Loss 1.5597   LearningRate 0.0009   Epoch: 18   Global Step: 751070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:05,772-Speed 2629.84 samples/sec   Loss 1.5184   LearningRate 0.0009   Epoch: 18   Global Step: 751080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:09,670-Speed 2627.94 samples/sec   Loss 1.4833   LearningRate 0.0009   Epoch: 18   Global Step: 751090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:13,548-Speed 2641.19 samples/sec   Loss 1.5521   LearningRate 0.0009   Epoch: 18   Global Step: 751100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:17,442-Speed 2630.42 samples/sec   Loss 1.4835   LearningRate 0.0009   Epoch: 18   Global Step: 751110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:21,338-Speed 2629.08 samples/sec   Loss 1.5250   LearningRate 0.0009   Epoch: 18   Global Step: 751120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:25,235-Speed 2628.92 samples/sec   Loss 1.5255   LearningRate 0.0009   Epoch: 18   Global Step: 751130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:29,126-Speed 2632.74 samples/sec   Loss 1.5650   LearningRate 0.0009   Epoch: 18   Global Step: 751140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:33,018-Speed 2631.04 samples/sec   Loss 1.5002   LearningRate 0.0009   Epoch: 18   Global Step: 751150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:36,915-Speed 2628.63 samples/sec   Loss 1.5170   LearningRate 0.0009   Epoch: 18   Global Step: 751160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:40,807-Speed 2631.42 samples/sec   Loss 1.5268   LearningRate 0.0009   Epoch: 18   Global Step: 751170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:10:44,675-Speed 2648.31 samples/sec   Loss 1.5015   LearningRate 0.0009   Epoch: 18   Global Step: 751180   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:10:48,577-Speed 2624.68 samples/sec   Loss 1.5370   LearningRate 0.0009   Epoch: 18   Global Step: 751190   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:10:52,481-Speed 2624.07 samples/sec   Loss 1.5226   LearningRate 0.0009   Epoch: 18   Global Step: 751200   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:10:56,378-Speed 2628.08 samples/sec   Loss 1.5871   LearningRate 0.0009   Epoch: 18   Global Step: 751210   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:11:00,268-Speed 2633.09 samples/sec   Loss 1.5130   LearningRate 0.0009   Epoch: 18   Global Step: 751220   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:11:04,168-Speed 2626.11 samples/sec   Loss 1.4898   LearningRate 0.0009   Epoch: 18   Global Step: 751230   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:11:08,062-Speed 2630.77 samples/sec   Loss 1.5551   LearningRate 0.0009   Epoch: 18   Global Step: 751240   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:11:11,966-Speed 2623.49 samples/sec   Loss 1.5195   LearningRate 0.0009   Epoch: 18   Global Step: 751250   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:11:15,864-Speed 2627.30 samples/sec   Loss 1.4989   LearningRate 0.0009   Epoch: 18   Global Step: 751260   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:11:19,770-Speed 2622.51 samples/sec   Loss 1.4797   LearningRate 0.0009   Epoch: 18   Global Step: 751270   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:11:23,673-Speed 2624.64 samples/sec   Loss 1.5213   LearningRate 0.0009   Epoch: 18   Global Step: 751280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:27,598-Speed 2609.80 samples/sec   Loss 1.5702   LearningRate 0.0009   Epoch: 18   Global Step: 751290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:31,520-Speed 2610.95 samples/sec   Loss 1.5309   LearningRate 0.0009   Epoch: 18   Global Step: 751300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:35,416-Speed 2629.04 samples/sec   Loss 1.5536   LearningRate 0.0009   Epoch: 18   Global Step: 751310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:39,307-Speed 2632.50 samples/sec   Loss 1.5082   LearningRate 0.0009   Epoch: 18   Global Step: 751320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:43,314-Speed 2556.57 samples/sec   Loss 1.5730   LearningRate 0.0009   Epoch: 18   Global Step: 751330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:47,335-Speed 2547.35 samples/sec   Loss 1.5477   LearningRate 0.0009   Epoch: 18   Global Step: 751340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:51,226-Speed 2632.29 samples/sec   Loss 1.5657   LearningRate 0.0009   Epoch: 18   Global Step: 751350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:55,119-Speed 2631.37 samples/sec   Loss 1.5019   LearningRate 0.0009   Epoch: 18   Global Step: 751360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:11:59,015-Speed 2629.07 samples/sec   Loss 1.5430   LearningRate 0.0009   Epoch: 18   Global Step: 751370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:02,918-Speed 2624.67 samples/sec   Loss 1.5590   LearningRate 0.0009   Epoch: 18   Global Step: 751380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:12:06,812-Speed 2630.15 samples/sec   Loss 1.5049   LearningRate 0.0009   Epoch: 18   Global Step: 751390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:12:10,729-Speed 2614.20 samples/sec   Loss 1.5221   LearningRate 0.0009   Epoch: 18   Global Step: 751400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:12:14,639-Speed 2619.93 samples/sec   Loss 1.5014   LearningRate 0.0009   Epoch: 18   Global Step: 751410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:12:18,535-Speed 2629.28 samples/sec   Loss 1.4999   LearningRate 0.0009   Epoch: 18   Global Step: 751420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:12:22,441-Speed 2622.88 samples/sec   Loss 1.5377   LearningRate 0.0009   Epoch: 18   Global Step: 751430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:12:26,335-Speed 2629.99 samples/sec   Loss 1.5021   LearningRate 0.0009   Epoch: 18   Global Step: 751440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:12:30,219-Speed 2637.40 samples/sec   Loss 1.5570   LearningRate 0.0009   Epoch: 18   Global Step: 751450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:34,116-Speed 2628.38 samples/sec   Loss 1.5050   LearningRate 0.0009   Epoch: 18   Global Step: 751460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:38,008-Speed 2631.37 samples/sec   Loss 1.5404   LearningRate 0.0009   Epoch: 18   Global Step: 751470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:41,903-Speed 2629.44 samples/sec   Loss 1.5630   LearningRate 0.0009   Epoch: 18   Global Step: 751480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:45,810-Speed 2621.76 samples/sec   Loss 1.5200   LearningRate 0.0009   Epoch: 18   Global Step: 751490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:49,704-Speed 2630.51 samples/sec   Loss 1.5060   LearningRate 0.0009   Epoch: 18   Global Step: 751500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:53,603-Speed 2626.88 samples/sec   Loss 1.5486   LearningRate 0.0009   Epoch: 18   Global Step: 751510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:12:57,502-Speed 2627.20 samples/sec   Loss 1.5203   LearningRate 0.0009   Epoch: 18   Global Step: 751520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:01,396-Speed 2630.31 samples/sec   Loss 1.5231   LearningRate 0.0009   Epoch: 18   Global Step: 751530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:05,292-Speed 2629.13 samples/sec   Loss 1.4934   LearningRate 0.0009   Epoch: 18   Global Step: 751540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:09,188-Speed 2628.97 samples/sec   Loss 1.4997   LearningRate 0.0009   Epoch: 18   Global Step: 751550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:13,086-Speed 2627.87 samples/sec   Loss 1.5025   LearningRate 0.0009   Epoch: 18   Global Step: 751560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:16,993-Speed 2621.13 samples/sec   Loss 1.5147   LearningRate 0.0009   Epoch: 18   Global Step: 751570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:20,889-Speed 2629.36 samples/sec   Loss 1.5192   LearningRate 0.0009   Epoch: 18   Global Step: 751580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:24,785-Speed 2629.42 samples/sec   Loss 1.5743   LearningRate 0.0009   Epoch: 18   Global Step: 751590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:28,681-Speed 2629.05 samples/sec   Loss 1.4786   LearningRate 0.0009   Epoch: 18   Global Step: 751600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:32,579-Speed 2627.56 samples/sec   Loss 1.5032   LearningRate 0.0009   Epoch: 18   Global Step: 751610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:36,474-Speed 2629.61 samples/sec   Loss 1.5422   LearningRate 0.0009   Epoch: 18   Global Step: 751620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:40,368-Speed 2630.00 samples/sec   Loss 1.4973   LearningRate 0.0009   Epoch: 18   Global Step: 751630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:44,262-Speed 2630.68 samples/sec   Loss 1.5186   LearningRate 0.0009   Epoch: 18   Global Step: 751640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:13:48,158-Speed 2628.84 samples/sec   Loss 1.5503   LearningRate 0.0009   Epoch: 18   Global Step: 751650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:13:52,063-Speed 2623.09 samples/sec   Loss 1.5490   LearningRate 0.0009   Epoch: 18   Global Step: 751660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:13:55,959-Speed 2629.00 samples/sec   Loss 1.4915   LearningRate 0.0009   Epoch: 18   Global Step: 751670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:13:59,861-Speed 2624.84 samples/sec   Loss 1.5232   LearningRate 0.0009   Epoch: 18   Global Step: 751680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:14:03,756-Speed 2630.18 samples/sec   Loss 1.5036   LearningRate 0.0009   Epoch: 18   Global Step: 751690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:07,697-Speed 2598.84 samples/sec   Loss 1.5136   LearningRate 0.0009   Epoch: 18   Global Step: 751700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:11,642-Speed 2595.64 samples/sec   Loss 1.4868   LearningRate 0.0009   Epoch: 18   Global Step: 751710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:15,565-Speed 2611.61 samples/sec   Loss 1.5296   LearningRate 0.0009   Epoch: 18   Global Step: 751720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:19,475-Speed 2619.41 samples/sec   Loss 1.5687   LearningRate 0.0009   Epoch: 18   Global Step: 751730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:23,370-Speed 2630.07 samples/sec   Loss 1.5192   LearningRate 0.0009   Epoch: 18   Global Step: 751740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:27,260-Speed 2633.24 samples/sec   Loss 1.5248   LearningRate 0.0009   Epoch: 18   Global Step: 751750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:31,157-Speed 2628.03 samples/sec   Loss 1.4530   LearningRate 0.0009   Epoch: 18   Global Step: 751760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:35,064-Speed 2621.86 samples/sec   Loss 1.5260   LearningRate 0.0009   Epoch: 18   Global Step: 751770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:38,962-Speed 2627.45 samples/sec   Loss 1.5056   LearningRate 0.0009   Epoch: 18   Global Step: 751780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:42,831-Speed 2647.02 samples/sec   Loss 1.4851   LearningRate 0.0009   Epoch: 18   Global Step: 751790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:46,739-Speed 2621.17 samples/sec   Loss 1.5321   LearningRate 0.0009   Epoch: 18   Global Step: 751800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:50,654-Speed 2616.01 samples/sec   Loss 1.5280   LearningRate 0.0009   Epoch: 18   Global Step: 751810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:54,561-Speed 2621.99 samples/sec   Loss 1.5138   LearningRate 0.0009   Epoch: 18   Global Step: 751820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:14:58,460-Speed 2627.37 samples/sec   Loss 1.4999   LearningRate 0.0009   Epoch: 18   Global Step: 751830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:02,481-Speed 2546.98 samples/sec   Loss 1.5055   LearningRate 0.0009   Epoch: 18   Global Step: 751840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:06,373-Speed 2631.51 samples/sec   Loss 1.5236   LearningRate 0.0009   Epoch: 18   Global Step: 751850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:10,277-Speed 2623.52 samples/sec   Loss 1.5543   LearningRate 0.0009   Epoch: 18   Global Step: 751860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:14,168-Speed 2632.17 samples/sec   Loss 1.4985   LearningRate 0.0009   Epoch: 18   Global Step: 751870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:18,068-Speed 2626.26 samples/sec   Loss 1.5140   LearningRate 0.0009   Epoch: 18   Global Step: 751880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:21,964-Speed 2629.33 samples/sec   Loss 1.4836   LearningRate 0.0009   Epoch: 18   Global Step: 751890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:15:25,889-Speed 2609.85 samples/sec   Loss 1.4728   LearningRate 0.0009   Epoch: 18   Global Step: 751900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:15:29,780-Speed 2632.47 samples/sec   Loss 1.5533   LearningRate 0.0009   Epoch: 18   Global Step: 751910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:15:33,681-Speed 2625.90 samples/sec   Loss 1.4974   LearningRate 0.0009   Epoch: 18   Global Step: 751920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:15:37,577-Speed 2628.47 samples/sec   Loss 1.4692   LearningRate 0.0009   Epoch: 18   Global Step: 751930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:15:41,452-Speed 2642.88 samples/sec   Loss 1.5489   LearningRate 0.0009   Epoch: 18   Global Step: 751940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:45,352-Speed 2626.65 samples/sec   Loss 1.5485   LearningRate 0.0009   Epoch: 18   Global Step: 751950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:49,259-Speed 2622.56 samples/sec   Loss 1.4135   LearningRate 0.0009   Epoch: 18   Global Step: 751960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:53,166-Speed 2621.24 samples/sec   Loss 1.5182   LearningRate 0.0009   Epoch: 18   Global Step: 751970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:15:57,076-Speed 2619.84 samples/sec   Loss 1.5250   LearningRate 0.0009   Epoch: 18   Global Step: 751980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:16:00,969-Speed 2631.16 samples/sec   Loss 1.5208   LearningRate 0.0009   Epoch: 18   Global Step: 751990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:16:04,864-Speed 2629.76 samples/sec   Loss 1.5591   LearningRate 0.0009   Epoch: 18   Global Step: 752000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:16:08,755-Speed 2632.05 samples/sec   Loss 1.4942   LearningRate 0.0009   Epoch: 18   Global Step: 752010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:16:12,668-Speed 2617.40 samples/sec   Loss 1.4808   LearningRate 0.0009   Epoch: 18   Global Step: 752020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:16:16,571-Speed 2624.35 samples/sec   Loss 1.5144   LearningRate 0.0009   Epoch: 18   Global Step: 752030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:16:20,443-Speed 2645.27 samples/sec   Loss 1.4898   LearningRate 0.0009   Epoch: 18   Global Step: 752040   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:24,339-Speed 2629.23 samples/sec   Loss 1.5154   LearningRate 0.0009   Epoch: 18   Global Step: 752050   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:28,237-Speed 2627.87 samples/sec   Loss 1.5906   LearningRate 0.0009   Epoch: 18   Global Step: 752060   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:32,140-Speed 2624.14 samples/sec   Loss 1.5204   LearningRate 0.0009   Epoch: 18   Global Step: 752070   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:36,050-Speed 2619.13 samples/sec   Loss 1.5102   LearningRate 0.0009   Epoch: 18   Global Step: 752080   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:39,950-Speed 2626.61 samples/sec   Loss 1.4979   LearningRate 0.0009   Epoch: 18   Global Step: 752090   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:43,855-Speed 2622.59 samples/sec   Loss 1.5426   LearningRate 0.0009   Epoch: 18   Global Step: 752100   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:47,755-Speed 2626.09 samples/sec   Loss 1.4424   LearningRate 0.0009   Epoch: 18   Global Step: 752110   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:51,657-Speed 2625.03 samples/sec   Loss 1.4585   LearningRate 0.0009   Epoch: 18   Global Step: 752120   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:55,568-Speed 2619.24 samples/sec   Loss 1.5615   LearningRate 0.0009   Epoch: 18   Global Step: 752130   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:16:59,463-Speed 2629.91 samples/sec   Loss 1.4837   LearningRate 0.0009   Epoch: 18   Global Step: 752140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:03,353-Speed 2632.58 samples/sec   Loss 1.5531   LearningRate 0.0009   Epoch: 18   Global Step: 752150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:07,265-Speed 2618.32 samples/sec   Loss 1.5252   LearningRate 0.0009   Epoch: 18   Global Step: 752160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:11,156-Speed 2632.11 samples/sec   Loss 1.5431   LearningRate 0.0009   Epoch: 18   Global Step: 752170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:15,053-Speed 2628.41 samples/sec   Loss 1.5185   LearningRate 0.0009   Epoch: 18   Global Step: 752180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:18,952-Speed 2626.41 samples/sec   Loss 1.5274   LearningRate 0.0009   Epoch: 18   Global Step: 752190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:22,847-Speed 2630.28 samples/sec   Loss 1.5326   LearningRate 0.0009   Epoch: 18   Global Step: 752200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:26,746-Speed 2626.62 samples/sec   Loss 1.4996   LearningRate 0.0009   Epoch: 18   Global Step: 752210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:30,642-Speed 2629.01 samples/sec   Loss 1.5226   LearningRate 0.0009   Epoch: 18   Global Step: 752220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:34,541-Speed 2627.31 samples/sec   Loss 1.4971   LearningRate 0.0009   Epoch: 18   Global Step: 752230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:17:38,437-Speed 2629.04 samples/sec   Loss 1.5169   LearningRate 0.0009   Epoch: 18   Global Step: 752240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:17:42,334-Speed 2628.04 samples/sec   Loss 1.5401   LearningRate 0.0009   Epoch: 18   Global Step: 752250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:17:46,235-Speed 2625.19 samples/sec   Loss 1.5212   LearningRate 0.0009   Epoch: 18   Global Step: 752260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:17:50,132-Speed 2628.86 samples/sec   Loss 1.5089   LearningRate 0.0009   Epoch: 18   Global Step: 752270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:17:54,026-Speed 2630.46 samples/sec   Loss 1.4904   LearningRate 0.0009   Epoch: 18   Global Step: 752280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:17:57,941-Speed 2616.16 samples/sec   Loss 1.5779   LearningRate 0.0009   Epoch: 18   Global Step: 752290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:18:01,838-Speed 2627.94 samples/sec   Loss 1.5357   LearningRate 0.0009   Epoch: 18   Global Step: 752300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:18:05,739-Speed 2625.60 samples/sec   Loss 1.5392   LearningRate 0.0009   Epoch: 18   Global Step: 752310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:18:09,643-Speed 2623.59 samples/sec   Loss 1.4728   LearningRate 0.0009   Epoch: 18   Global Step: 752320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:18:13,515-Speed 2645.25 samples/sec   Loss 1.5478   LearningRate 0.0009   Epoch: 18   Global Step: 752330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:18:17,416-Speed 2626.02 samples/sec   Loss 1.4649   LearningRate 0.0009   Epoch: 18   Global Step: 752340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:18:21,311-Speed 2629.47 samples/sec   Loss 1.5332   LearningRate 0.0009   Epoch: 18   Global Step: 752350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:18:25,302-Speed 2566.03 samples/sec   Loss 1.4501   LearningRate 0.0009   Epoch: 18   Global Step: 752360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:18:29,199-Speed 2628.98 samples/sec   Loss 1.4567   LearningRate 0.0009   Epoch: 18   Global Step: 752370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:18:33,072-Speed 2644.32 samples/sec   Loss 1.4702   LearningRate 0.0009   Epoch: 18   Global Step: 752380   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:18:36,974-Speed 2625.07 samples/sec   Loss 1.5507   LearningRate 0.0009   Epoch: 18   Global Step: 752390   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:18:40,870-Speed 2628.63 samples/sec   Loss 1.4932   LearningRate 0.0009   Epoch: 18   Global Step: 752400   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:18:44,787-Speed 2616.01 samples/sec   Loss 1.5321   LearningRate 0.0009   Epoch: 18   Global Step: 752410   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:18:48,682-Speed 2629.52 samples/sec   Loss 1.4899   LearningRate 0.0009   Epoch: 18   Global Step: 752420   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:18:52,574-Speed 2631.67 samples/sec   Loss 1.4880   LearningRate 0.0009   Epoch: 18   Global Step: 752430   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:18:56,473-Speed 2626.86 samples/sec   Loss 1.5328   LearningRate 0.0009   Epoch: 18   Global Step: 752440   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:19:00,379-Speed 2622.67 samples/sec   Loss 1.4922   LearningRate 0.0009   Epoch: 18   Global Step: 752450   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:19:04,284-Speed 2623.14 samples/sec   Loss 1.5212   LearningRate 0.0009   Epoch: 18   Global Step: 752460   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:19:08,179-Speed 2629.36 samples/sec   Loss 1.5026   LearningRate 0.0009   Epoch: 18   Global Step: 752470   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-16 08:19:12,105-Speed 2608.89 samples/sec   Loss 1.5162   LearningRate 0.0009   Epoch: 18   Global Step: 752480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:16,001-Speed 2629.48 samples/sec   Loss 1.5001   LearningRate 0.0009   Epoch: 18   Global Step: 752490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:19,890-Speed 2633.50 samples/sec   Loss 1.4950   LearningRate 0.0009   Epoch: 18   Global Step: 752500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:23,780-Speed 2632.99 samples/sec   Loss 1.5293   LearningRate 0.0009   Epoch: 18   Global Step: 752510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:27,691-Speed 2619.16 samples/sec   Loss 1.5423   LearningRate 0.0009   Epoch: 18   Global Step: 752520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:31,585-Speed 2630.03 samples/sec   Loss 1.4842   LearningRate 0.0009   Epoch: 18   Global Step: 752530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:35,484-Speed 2627.47 samples/sec   Loss 1.5013   LearningRate 0.0009   Epoch: 18   Global Step: 752540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:39,375-Speed 2632.50 samples/sec   Loss 1.5273   LearningRate 0.0009   Epoch: 18   Global Step: 752550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:43,279-Speed 2623.33 samples/sec   Loss 1.5497   LearningRate 0.0009   Epoch: 18   Global Step: 752560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:47,182-Speed 2624.50 samples/sec   Loss 1.5531   LearningRate 0.0009   Epoch: 18   Global Step: 752570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:19:51,076-Speed 2630.97 samples/sec   Loss 1.5492   LearningRate 0.0009   Epoch: 18   Global Step: 752580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:19:54,977-Speed 2625.05 samples/sec   Loss 1.5170   LearningRate 0.0009   Epoch: 18   Global Step: 752590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:19:58,870-Speed 2632.04 samples/sec   Loss 1.5104   LearningRate 0.0009   Epoch: 18   Global Step: 752600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:20:02,766-Speed 2628.92 samples/sec   Loss 1.5197   LearningRate 0.0009   Epoch: 18   Global Step: 752610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:20:06,657-Speed 2631.76 samples/sec   Loss 1.5213   LearningRate 0.0009   Epoch: 18   Global Step: 752620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:20:10,553-Speed 2628.71 samples/sec   Loss 1.5295   LearningRate 0.0009   Epoch: 18   Global Step: 752630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:20:14,452-Speed 2627.52 samples/sec   Loss 1.5382   LearningRate 0.0009   Epoch: 18   Global Step: 752640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:18,364-Speed 2618.09 samples/sec   Loss 1.5274   LearningRate 0.0009   Epoch: 18   Global Step: 752650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:22,274-Speed 2619.50 samples/sec   Loss 1.5124   LearningRate 0.0009   Epoch: 18   Global Step: 752660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:26,165-Speed 2633.35 samples/sec   Loss 1.5366   LearningRate 0.0009   Epoch: 18   Global Step: 752670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:30,060-Speed 2629.41 samples/sec   Loss 1.5097   LearningRate 0.0009   Epoch: 18   Global Step: 752680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:33,985-Speed 2610.14 samples/sec   Loss 1.4869   LearningRate 0.0009   Epoch: 18   Global Step: 752690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:37,892-Speed 2621.26 samples/sec   Loss 1.5112   LearningRate 0.0009   Epoch: 18   Global Step: 752700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:41,809-Speed 2614.61 samples/sec   Loss 1.5519   LearningRate 0.0009   Epoch: 18   Global Step: 752710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:45,728-Speed 2613.82 samples/sec   Loss 1.5490   LearningRate 0.0009   Epoch: 18   Global Step: 752720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:49,663-Speed 2603.65 samples/sec   Loss 1.5679   LearningRate 0.0009   Epoch: 18   Global Step: 752730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:20:53,565-Speed 2624.77 samples/sec   Loss 1.5078   LearningRate 0.0009   Epoch: 18   Global Step: 752740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:20:57,461-Speed 2628.82 samples/sec   Loss 1.5394   LearningRate 0.0009   Epoch: 18   Global Step: 752750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:01,389-Speed 2607.31 samples/sec   Loss 1.5454   LearningRate 0.0009   Epoch: 18   Global Step: 752760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:05,338-Speed 2595.08 samples/sec   Loss 1.5066   LearningRate 0.0009   Epoch: 18   Global Step: 752770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:09,345-Speed 2555.74 samples/sec   Loss 1.4460   LearningRate 0.0009   Epoch: 18   Global Step: 752780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:13,304-Speed 2587.02 samples/sec   Loss 1.4376   LearningRate 0.0009   Epoch: 18   Global Step: 752790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:17,204-Speed 2626.28 samples/sec   Loss 1.5342   LearningRate 0.0009   Epoch: 18   Global Step: 752800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:21,098-Speed 2630.67 samples/sec   Loss 1.5308   LearningRate 0.0009   Epoch: 18   Global Step: 752810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:25,033-Speed 2603.39 samples/sec   Loss 1.4657   LearningRate 0.0009   Epoch: 18   Global Step: 752820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:21:28,901-Speed 2647.98 samples/sec   Loss 1.5307   LearningRate 0.0009   Epoch: 18   Global Step: 752830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:21:32,795-Speed 2630.60 samples/sec   Loss 1.5577   LearningRate 0.0009   Epoch: 18   Global Step: 752840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:21:36,707-Speed 2618.12 samples/sec   Loss 1.4699   LearningRate 0.0009   Epoch: 18   Global Step: 752850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:21:40,607-Speed 2626.80 samples/sec   Loss 1.5048   LearningRate 0.0009   Epoch: 18   Global Step: 752860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:21:44,499-Speed 2631.28 samples/sec   Loss 1.4773   LearningRate 0.0009   Epoch: 18   Global Step: 752870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:21:48,390-Speed 2632.77 samples/sec   Loss 1.5524   LearningRate 0.0009   Epoch: 18   Global Step: 752880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:21:52,289-Speed 2626.76 samples/sec   Loss 1.4996   LearningRate 0.0009   Epoch: 18   Global Step: 752890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:21:56,200-Speed 2619.23 samples/sec   Loss 1.5290   LearningRate 0.0009   Epoch: 18   Global Step: 752900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:00,103-Speed 2623.99 samples/sec   Loss 1.4736   LearningRate 0.0009   Epoch: 18   Global Step: 752910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:04,066-Speed 2585.22 samples/sec   Loss 1.5384   LearningRate 0.0009   Epoch: 18   Global Step: 752920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:07,958-Speed 2631.28 samples/sec   Loss 1.5538   LearningRate 0.0009   Epoch: 18   Global Step: 752930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:22:11,856-Speed 2628.13 samples/sec   Loss 1.5402   LearningRate 0.0009   Epoch: 18   Global Step: 752940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:22:15,730-Speed 2643.55 samples/sec   Loss 1.4702   LearningRate 0.0009   Epoch: 18   Global Step: 752950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:19,638-Speed 2621.97 samples/sec   Loss 1.5303   LearningRate 0.0009   Epoch: 18   Global Step: 752960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:23,534-Speed 2628.72 samples/sec   Loss 1.5667   LearningRate 0.0009   Epoch: 18   Global Step: 752970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:27,432-Speed 2627.10 samples/sec   Loss 1.5303   LearningRate 0.0009   Epoch: 18   Global Step: 752980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:31,325-Speed 2631.83 samples/sec   Loss 1.4830   LearningRate 0.0009   Epoch: 18   Global Step: 752990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:35,218-Speed 2631.47 samples/sec   Loss 1.5206   LearningRate 0.0009   Epoch: 18   Global Step: 753000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:39,123-Speed 2622.16 samples/sec   Loss 1.5646   LearningRate 0.0009   Epoch: 18   Global Step: 753010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:43,027-Speed 2623.76 samples/sec   Loss 1.5038   LearningRate 0.0009   Epoch: 18   Global Step: 753020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:46,937-Speed 2619.52 samples/sec   Loss 1.5217   LearningRate 0.0009   Epoch: 18   Global Step: 753030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:50,830-Speed 2631.42 samples/sec   Loss 1.4800   LearningRate 0.0009   Epoch: 18   Global Step: 753040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:22:54,727-Speed 2628.71 samples/sec   Loss 1.4529   LearningRate 0.0009   Epoch: 18   Global Step: 753050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:22:58,620-Speed 2631.14 samples/sec   Loss 1.5139   LearningRate 0.0009   Epoch: 18   Global Step: 753060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:23:02,515-Speed 2629.46 samples/sec   Loss 1.5247   LearningRate 0.0009   Epoch: 18   Global Step: 753070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:23:06,413-Speed 2627.18 samples/sec   Loss 1.4824   LearningRate 0.0009   Epoch: 18   Global Step: 753080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:23:10,319-Speed 2623.31 samples/sec   Loss 1.5085   LearningRate 0.0009   Epoch: 18   Global Step: 753090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:23:14,189-Speed 2646.14 samples/sec   Loss 1.5755   LearningRate 0.0008   Epoch: 18   Global Step: 753100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:18,089-Speed 2626.07 samples/sec   Loss 1.5112   LearningRate 0.0008   Epoch: 18   Global Step: 753110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:21,986-Speed 2628.44 samples/sec   Loss 1.5582   LearningRate 0.0008   Epoch: 18   Global Step: 753120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:25,884-Speed 2627.83 samples/sec   Loss 1.4866   LearningRate 0.0008   Epoch: 18   Global Step: 753130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:29,776-Speed 2632.14 samples/sec   Loss 1.5364   LearningRate 0.0008   Epoch: 18   Global Step: 753140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:33,671-Speed 2629.47 samples/sec   Loss 1.5283   LearningRate 0.0008   Epoch: 18   Global Step: 753150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:37,565-Speed 2630.33 samples/sec   Loss 1.5150   LearningRate 0.0008   Epoch: 18   Global Step: 753160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:41,462-Speed 2627.89 samples/sec   Loss 1.4847   LearningRate 0.0008   Epoch: 18   Global Step: 753170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:45,354-Speed 2632.14 samples/sec   Loss 1.4517   LearningRate 0.0008   Epoch: 18   Global Step: 753180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:49,249-Speed 2629.56 samples/sec   Loss 1.5055   LearningRate 0.0008   Epoch: 18   Global Step: 753190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:23:53,143-Speed 2630.62 samples/sec   Loss 1.5427   LearningRate 0.0008   Epoch: 18   Global Step: 753200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:23:57,062-Speed 2613.62 samples/sec   Loss 1.4860   LearningRate 0.0008   Epoch: 18   Global Step: 753210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:24:00,976-Speed 2616.58 samples/sec   Loss 1.5683   LearningRate 0.0008   Epoch: 18   Global Step: 753220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:24:04,887-Speed 2618.71 samples/sec   Loss 1.4790   LearningRate 0.0008   Epoch: 18   Global Step: 753230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:24:08,795-Speed 2621.19 samples/sec   Loss 1.5048   LearningRate 0.0008   Epoch: 18   Global Step: 753240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:24:12,708-Speed 2617.02 samples/sec   Loss 1.4913   LearningRate 0.0008   Epoch: 18   Global Step: 753250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:24:16,607-Speed 2627.40 samples/sec   Loss 1.5017   LearningRate 0.0008   Epoch: 18   Global Step: 753260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:24:20,501-Speed 2631.31 samples/sec   Loss 1.5391   LearningRate 0.0008   Epoch: 18   Global Step: 753270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:24:24,375-Speed 2643.80 samples/sec   Loss 1.5233   LearningRate 0.0008   Epoch: 18   Global Step: 753280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:28,267-Speed 2632.09 samples/sec   Loss 1.4985   LearningRate 0.0008   Epoch: 18   Global Step: 753290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:32,160-Speed 2630.41 samples/sec   Loss 1.5091   LearningRate 0.0008   Epoch: 18   Global Step: 753300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:36,053-Speed 2631.21 samples/sec   Loss 1.4893   LearningRate 0.0008   Epoch: 18   Global Step: 753310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:39,952-Speed 2626.64 samples/sec   Loss 1.5428   LearningRate 0.0008   Epoch: 18   Global Step: 753320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:43,853-Speed 2625.41 samples/sec   Loss 1.4893   LearningRate 0.0008   Epoch: 18   Global Step: 753330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:47,749-Speed 2629.10 samples/sec   Loss 1.4848   LearningRate 0.0008   Epoch: 18   Global Step: 753340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:51,646-Speed 2628.76 samples/sec   Loss 1.5266   LearningRate 0.0008   Epoch: 18   Global Step: 753350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:55,543-Speed 2627.98 samples/sec   Loss 1.5610   LearningRate 0.0008   Epoch: 18   Global Step: 753360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:24:59,438-Speed 2629.80 samples/sec   Loss 1.5078   LearningRate 0.0008   Epoch: 18   Global Step: 753370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:03,328-Speed 2633.00 samples/sec   Loss 1.5673   LearningRate 0.0008   Epoch: 18   Global Step: 753380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:25:07,216-Speed 2634.18 samples/sec   Loss 1.5093   LearningRate 0.0008   Epoch: 18   Global Step: 753390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:11,112-Speed 2628.73 samples/sec   Loss 1.4726   LearningRate 0.0008   Epoch: 18   Global Step: 753400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:15,015-Speed 2625.19 samples/sec   Loss 1.5301   LearningRate 0.0008   Epoch: 18   Global Step: 753410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:18,939-Speed 2610.30 samples/sec   Loss 1.4919   LearningRate 0.0008   Epoch: 18   Global Step: 753420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:22,935-Speed 2563.73 samples/sec   Loss 1.5014   LearningRate 0.0008   Epoch: 18   Global Step: 753430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:26,823-Speed 2634.24 samples/sec   Loss 1.5073   LearningRate 0.0008   Epoch: 18   Global Step: 753440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:30,729-Speed 2622.69 samples/sec   Loss 1.4583   LearningRate 0.0008   Epoch: 18   Global Step: 753450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:34,623-Speed 2630.16 samples/sec   Loss 1.4397   LearningRate 0.0008   Epoch: 18   Global Step: 753460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:38,531-Speed 2620.40 samples/sec   Loss 1.4454   LearningRate 0.0008   Epoch: 18   Global Step: 753470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:42,434-Speed 2623.99 samples/sec   Loss 1.5148   LearningRate 0.0008   Epoch: 18   Global Step: 753480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:25:46,342-Speed 2621.66 samples/sec   Loss 1.4701   LearningRate 0.0008   Epoch: 18   Global Step: 753490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:25:50,275-Speed 2604.53 samples/sec   Loss 1.5177   LearningRate 0.0008   Epoch: 18   Global Step: 753500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:25:54,170-Speed 2629.64 samples/sec   Loss 1.4591   LearningRate 0.0008   Epoch: 18   Global Step: 753510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:25:58,044-Speed 2644.16 samples/sec   Loss 1.4864   LearningRate 0.0008   Epoch: 18   Global Step: 753520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:01,937-Speed 2631.40 samples/sec   Loss 1.5327   LearningRate 0.0008   Epoch: 18   Global Step: 753530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:05,827-Speed 2632.79 samples/sec   Loss 1.5012   LearningRate 0.0008   Epoch: 18   Global Step: 753540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:09,719-Speed 2631.29 samples/sec   Loss 1.5112   LearningRate 0.0008   Epoch: 18   Global Step: 753550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:13,752-Speed 2539.57 samples/sec   Loss 1.5350   LearningRate 0.0008   Epoch: 18   Global Step: 753560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:17,643-Speed 2632.62 samples/sec   Loss 1.5375   LearningRate 0.0008   Epoch: 18   Global Step: 753570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:21,551-Speed 2621.55 samples/sec   Loss 1.4987   LearningRate 0.0008   Epoch: 18   Global Step: 753580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:25,444-Speed 2631.19 samples/sec   Loss 1.4529   LearningRate 0.0008   Epoch: 18   Global Step: 753590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:29,335-Speed 2631.92 samples/sec   Loss 1.5376   LearningRate 0.0008   Epoch: 18   Global Step: 753600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:33,436-Speed 2497.79 samples/sec   Loss 1.5032   LearningRate 0.0008   Epoch: 18   Global Step: 753610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:26:37,332-Speed 2628.54 samples/sec   Loss 1.4814   LearningRate 0.0008   Epoch: 18   Global Step: 753620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:26:41,239-Speed 2621.65 samples/sec   Loss 1.5045   LearningRate 0.0008   Epoch: 18   Global Step: 753630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:26:45,130-Speed 2632.03 samples/sec   Loss 1.5034   LearningRate 0.0008   Epoch: 18   Global Step: 753640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:26:49,028-Speed 2628.29 samples/sec   Loss 1.4852   LearningRate 0.0008   Epoch: 18   Global Step: 753650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:26:52,923-Speed 2629.36 samples/sec   Loss 1.4795   LearningRate 0.0008   Epoch: 18   Global Step: 753660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:26:56,796-Speed 2644.98 samples/sec   Loss 1.5106   LearningRate 0.0008   Epoch: 18   Global Step: 753670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:00,692-Speed 2628.63 samples/sec   Loss 1.5264   LearningRate 0.0008   Epoch: 18   Global Step: 753680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:04,621-Speed 2607.00 samples/sec   Loss 1.4984   LearningRate 0.0008   Epoch: 18   Global Step: 753690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:08,515-Speed 2630.51 samples/sec   Loss 1.4589   LearningRate 0.0008   Epoch: 18   Global Step: 753700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:12,412-Speed 2628.28 samples/sec   Loss 1.4427   LearningRate 0.0008   Epoch: 18   Global Step: 753710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:16,308-Speed 2628.78 samples/sec   Loss 1.5148   LearningRate 0.0008   Epoch: 18   Global Step: 753720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:20,208-Speed 2626.49 samples/sec   Loss 1.4981   LearningRate 0.0008   Epoch: 18   Global Step: 753730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:24,100-Speed 2631.95 samples/sec   Loss 1.5159   LearningRate 0.0008   Epoch: 18   Global Step: 753740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:27,996-Speed 2629.33 samples/sec   Loss 1.5307   LearningRate 0.0008   Epoch: 18   Global Step: 753750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:31,889-Speed 2630.92 samples/sec   Loss 1.4854   LearningRate 0.0008   Epoch: 18   Global Step: 753760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:35,825-Speed 2602.54 samples/sec   Loss 1.5038   LearningRate 0.0008   Epoch: 18   Global Step: 753770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:27:39,735-Speed 2619.40 samples/sec   Loss 1.5144   LearningRate 0.0008   Epoch: 18   Global Step: 753780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:27:43,631-Speed 2629.11 samples/sec   Loss 1.4785   LearningRate 0.0008   Epoch: 18   Global Step: 753790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:27:47,542-Speed 2619.01 samples/sec   Loss 1.4846   LearningRate 0.0008   Epoch: 18   Global Step: 753800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:27:51,473-Speed 2605.43 samples/sec   Loss 1.4720   LearningRate 0.0008   Epoch: 18   Global Step: 753810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:27:55,344-Speed 2646.86 samples/sec   Loss 1.5605   LearningRate 0.0008   Epoch: 18   Global Step: 753820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:27:59,280-Speed 2602.39 samples/sec   Loss 1.4734   LearningRate 0.0008   Epoch: 18   Global Step: 753830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:03,185-Speed 2622.94 samples/sec   Loss 1.5064   LearningRate 0.0008   Epoch: 18   Global Step: 753840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:07,102-Speed 2615.26 samples/sec   Loss 1.4718   LearningRate 0.0008   Epoch: 18   Global Step: 753850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:10,997-Speed 2629.82 samples/sec   Loss 1.5381   LearningRate 0.0008   Epoch: 18   Global Step: 753860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:14,889-Speed 2631.59 samples/sec   Loss 1.4829   LearningRate 0.0008   Epoch: 18   Global Step: 753870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:18,781-Speed 2631.64 samples/sec   Loss 1.4427   LearningRate 0.0008   Epoch: 18   Global Step: 753880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:22,680-Speed 2627.06 samples/sec   Loss 1.5329   LearningRate 0.0008   Epoch: 18   Global Step: 753890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:26,621-Speed 2599.44 samples/sec   Loss 1.4754   LearningRate 0.0008   Epoch: 18   Global Step: 753900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:30,562-Speed 2599.28 samples/sec   Loss 1.5147   LearningRate 0.0008   Epoch: 18   Global Step: 753910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-16 08:28:34,455-Speed 2631.49 samples/sec   Loss 1.4858   LearningRate 0.0008   Epoch: 18   Global Step: 753920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:28:38,374-Speed 2613.12 samples/sec   Loss 1.4994   LearningRate 0.0008   Epoch: 18   Global Step: 753930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:28:42,273-Speed 2626.81 samples/sec   Loss 1.4357   LearningRate 0.0008   Epoch: 18   Global Step: 753940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:28:46,176-Speed 2624.63 samples/sec   Loss 1.4985   LearningRate 0.0008   Epoch: 18   Global Step: 753950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-16 08:28:50,081-Speed 2622.98 samples/sec   Loss 1.4432   LearningRate 0.0008   Epoch: 18   Global Step: 753960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:28:53,985-Speed 2623.88 samples/sec   Loss 1.5094   LearningRate 0.0008   Epoch: 18   Global Step: 753970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:28:57,877-Speed 2632.16 samples/sec   Loss 1.5298   LearningRate 0.0008   Epoch: 18   Global Step: 753980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:29:01,784-Speed 2621.25 samples/sec   Loss 1.5552   LearningRate 0.0008   Epoch: 18   Global Step: 753990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:29:05,696-Speed 2617.97 samples/sec   Loss 1.5478   LearningRate 0.0008   Epoch: 18   Global Step: 754000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:29:09,602-Speed 2622.44 samples/sec   Loss 1.4890   LearningRate 0.0008   Epoch: 18   Global Step: 754010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:29:13,490-Speed 2634.40 samples/sec   Loss 1.4965   LearningRate 0.0008   Epoch: 18   Global Step: 754020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:29:17,381-Speed 2632.30 samples/sec   Loss 1.4841   LearningRate 0.0008   Epoch: 18   Global Step: 754030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:21,273-Speed 2632.01 samples/sec   Loss 1.5083   LearningRate 0.0008   Epoch: 18   Global Step: 754040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:25,188-Speed 2615.90 samples/sec   Loss 1.4685   LearningRate 0.0008   Epoch: 18   Global Step: 754050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:29,111-Speed 2611.85 samples/sec   Loss 1.5271   LearningRate 0.0008   Epoch: 18   Global Step: 754060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:33,003-Speed 2630.95 samples/sec   Loss 1.5533   LearningRate 0.0008   Epoch: 18   Global Step: 754070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:36,910-Speed 2621.45 samples/sec   Loss 1.5114   LearningRate 0.0008   Epoch: 18   Global Step: 754080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:40,807-Speed 2628.20 samples/sec   Loss 1.5223   LearningRate 0.0008   Epoch: 18   Global Step: 754090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:44,709-Speed 2625.33 samples/sec   Loss 1.4599   LearningRate 0.0008   Epoch: 18   Global Step: 754100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:48,601-Speed 2631.78 samples/sec   Loss 1.4785   LearningRate 0.0008   Epoch: 18   Global Step: 754110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:52,499-Speed 2627.69 samples/sec   Loss 1.5067   LearningRate 0.0008   Epoch: 18   Global Step: 754120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:29:56,398-Speed 2627.22 samples/sec   Loss 1.4820   LearningRate 0.0008   Epoch: 18   Global Step: 754130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:00,294-Speed 2629.40 samples/sec   Loss 1.4994   LearningRate 0.0008   Epoch: 18   Global Step: 754140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:04,198-Speed 2623.40 samples/sec   Loss 1.4810   LearningRate 0.0008   Epoch: 18   Global Step: 754150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:08,087-Speed 2633.06 samples/sec   Loss 1.4859   LearningRate 0.0008   Epoch: 18   Global Step: 754160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:12,012-Speed 2609.55 samples/sec   Loss 1.4652   LearningRate 0.0008   Epoch: 18   Global Step: 754170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:15,940-Speed 2607.95 samples/sec   Loss 1.5161   LearningRate 0.0008   Epoch: 18   Global Step: 754180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:19,832-Speed 2631.55 samples/sec   Loss 1.4199   LearningRate 0.0008   Epoch: 18   Global Step: 754190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:23,727-Speed 2629.77 samples/sec   Loss 1.4820   LearningRate 0.0008   Epoch: 18   Global Step: 754200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:27,617-Speed 2632.63 samples/sec   Loss 1.5074   LearningRate 0.0008   Epoch: 18   Global Step: 754210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:31,508-Speed 2633.37 samples/sec   Loss 1.5006   LearningRate 0.0008   Epoch: 18   Global Step: 754220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:35,401-Speed 2630.46 samples/sec   Loss 1.5091   LearningRate 0.0008   Epoch: 18   Global Step: 754230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:30:39,308-Speed 2621.28 samples/sec   Loss 1.5188   LearningRate 0.0008   Epoch: 18   Global Step: 754240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:30:43,206-Speed 2627.73 samples/sec   Loss 1.4930   LearningRate 0.0008   Epoch: 18   Global Step: 754250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:30:47,112-Speed 2622.67 samples/sec   Loss 1.5275   LearningRate 0.0008   Epoch: 18   Global Step: 754260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:30:50,986-Speed 2644.64 samples/sec   Loss 1.4677   LearningRate 0.0008   Epoch: 18   Global Step: 754270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:54,878-Speed 2631.04 samples/sec   Loss 1.4620   LearningRate 0.0008   Epoch: 18   Global Step: 754280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:30:58,794-Speed 2616.54 samples/sec   Loss 1.5182   LearningRate 0.0008   Epoch: 18   Global Step: 754290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:02,726-Speed 2604.74 samples/sec   Loss 1.4482   LearningRate 0.0008   Epoch: 18   Global Step: 754300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:06,696-Speed 2580.03 samples/sec   Loss 1.5018   LearningRate 0.0008   Epoch: 18   Global Step: 754310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:10,587-Speed 2632.25 samples/sec   Loss 1.5508   LearningRate 0.0008   Epoch: 18   Global Step: 754320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:14,497-Speed 2620.06 samples/sec   Loss 1.4666   LearningRate 0.0008   Epoch: 18   Global Step: 754330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:18,392-Speed 2629.89 samples/sec   Loss 1.5291   LearningRate 0.0008   Epoch: 18   Global Step: 754340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:22,290-Speed 2628.10 samples/sec   Loss 1.4694   LearningRate 0.0008   Epoch: 18   Global Step: 754350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:26,196-Speed 2621.54 samples/sec   Loss 1.4696   LearningRate 0.0008   Epoch: 18   Global Step: 754360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:30,111-Speed 2616.72 samples/sec   Loss 1.4750   LearningRate 0.0008   Epoch: 18   Global Step: 754370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:31:34,005-Speed 2630.26 samples/sec   Loss 1.4989   LearningRate 0.0008   Epoch: 18   Global Step: 754380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:31:37,880-Speed 2643.64 samples/sec   Loss 1.4967   LearningRate 0.0008   Epoch: 18   Global Step: 754390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:41,820-Speed 2599.03 samples/sec   Loss 1.4879   LearningRate 0.0008   Epoch: 18   Global Step: 754400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:45,709-Speed 2633.89 samples/sec   Loss 1.5372   LearningRate 0.0008   Epoch: 18   Global Step: 754410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:49,605-Speed 2629.23 samples/sec   Loss 1.4858   LearningRate 0.0008   Epoch: 18   Global Step: 754420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:53,512-Speed 2621.89 samples/sec   Loss 1.4978   LearningRate 0.0008   Epoch: 18   Global Step: 754430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:31:57,402-Speed 2632.87 samples/sec   Loss 1.4998   LearningRate 0.0008   Epoch: 18   Global Step: 754440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:01,294-Speed 2631.91 samples/sec   Loss 1.5252   LearningRate 0.0008   Epoch: 18   Global Step: 754450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:05,188-Speed 2630.33 samples/sec   Loss 1.4781   LearningRate 0.0008   Epoch: 18   Global Step: 754460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:09,107-Speed 2613.48 samples/sec   Loss 1.4850   LearningRate 0.0008   Epoch: 18   Global Step: 754470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:13,005-Speed 2626.94 samples/sec   Loss 1.5163   LearningRate 0.0008   Epoch: 18   Global Step: 754480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:16,897-Speed 2631.98 samples/sec   Loss 1.4807   LearningRate 0.0008   Epoch: 18   Global Step: 754490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:32:20,767-Speed 2648.07 samples/sec   Loss 1.4676   LearningRate 0.0008   Epoch: 18   Global Step: 754500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:24,661-Speed 2630.36 samples/sec   Loss 1.5081   LearningRate 0.0008   Epoch: 18   Global Step: 754510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:28,549-Speed 2634.20 samples/sec   Loss 1.5063   LearningRate 0.0008   Epoch: 18   Global Step: 754520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:32,445-Speed 2628.88 samples/sec   Loss 1.4410   LearningRate 0.0008   Epoch: 18   Global Step: 754530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:36,388-Speed 2598.01 samples/sec   Loss 1.4899   LearningRate 0.0008   Epoch: 18   Global Step: 754540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:40,282-Speed 2630.55 samples/sec   Loss 1.4587   LearningRate 0.0008   Epoch: 18   Global Step: 754550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:44,182-Speed 2626.32 samples/sec   Loss 1.5127   LearningRate 0.0008   Epoch: 18   Global Step: 754560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:48,078-Speed 2628.93 samples/sec   Loss 1.4981   LearningRate 0.0008   Epoch: 18   Global Step: 754570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:51,969-Speed 2632.51 samples/sec   Loss 1.4603   LearningRate 0.0008   Epoch: 18   Global Step: 754580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:55,877-Speed 2621.83 samples/sec   Loss 1.5117   LearningRate 0.0008   Epoch: 18   Global Step: 754590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:32:59,789-Speed 2617.71 samples/sec   Loss 1.4984   LearningRate 0.0008   Epoch: 18   Global Step: 754600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:33:03,711-Speed 2612.02 samples/sec   Loss 1.4655   LearningRate 0.0008   Epoch: 18   Global Step: 754610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:33:07,607-Speed 2628.94 samples/sec   Loss 1.4931   LearningRate 0.0008   Epoch: 18   Global Step: 754620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:33:11,504-Speed 2628.47 samples/sec   Loss 1.4767   LearningRate 0.0008   Epoch: 18   Global Step: 754630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:33:15,404-Speed 2626.28 samples/sec   Loss 1.5166   LearningRate 0.0008   Epoch: 18   Global Step: 754640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:33:19,298-Speed 2630.52 samples/sec   Loss 1.4951   LearningRate 0.0008   Epoch: 18   Global Step: 754650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:33:23,210-Speed 2618.29 samples/sec   Loss 1.4916   LearningRate 0.0008   Epoch: 18   Global Step: 754660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:33:27,081-Speed 2646.48 samples/sec   Loss 1.5160   LearningRate 0.0008   Epoch: 18   Global Step: 754670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:30,987-Speed 2621.89 samples/sec   Loss 1.4942   LearningRate 0.0008   Epoch: 18   Global Step: 754680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:34,899-Speed 2618.17 samples/sec   Loss 1.5024   LearningRate 0.0008   Epoch: 18   Global Step: 754690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:38,799-Speed 2626.31 samples/sec   Loss 1.4264   LearningRate 0.0008   Epoch: 18   Global Step: 754700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:42,705-Speed 2622.44 samples/sec   Loss 1.5076   LearningRate 0.0008   Epoch: 18   Global Step: 754710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:46,598-Speed 2630.75 samples/sec   Loss 1.5078   LearningRate 0.0008   Epoch: 18   Global Step: 754720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:50,497-Speed 2627.39 samples/sec   Loss 1.5244   LearningRate 0.0008   Epoch: 18   Global Step: 754730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:54,396-Speed 2626.87 samples/sec   Loss 1.4931   LearningRate 0.0008   Epoch: 18   Global Step: 754740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:33:58,292-Speed 2629.31 samples/sec   Loss 1.4809   LearningRate 0.0008   Epoch: 18   Global Step: 754750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:02,201-Speed 2620.22 samples/sec   Loss 1.4330   LearningRate 0.0008   Epoch: 18   Global Step: 754760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:06,077-Speed 2643.11 samples/sec   Loss 1.5397   LearningRate 0.0008   Epoch: 18   Global Step: 754770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:10,009-Speed 2604.77 samples/sec   Loss 1.4980   LearningRate 0.0008   Epoch: 18   Global Step: 754780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:13,926-Speed 2614.92 samples/sec   Loss 1.5099   LearningRate 0.0008   Epoch: 18   Global Step: 754790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:18,038-Speed 2490.68 samples/sec   Loss 1.5162   LearningRate 0.0008   Epoch: 18   Global Step: 754800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:22,118-Speed 2511.04 samples/sec   Loss 1.4284   LearningRate 0.0008   Epoch: 18   Global Step: 754810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:26,198-Speed 2510.65 samples/sec   Loss 1.4612   LearningRate 0.0008   Epoch: 18   Global Step: 754820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:30,276-Speed 2511.34 samples/sec   Loss 1.5488   LearningRate 0.0008   Epoch: 18   Global Step: 754830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:34,360-Speed 2507.40 samples/sec   Loss 1.5162   LearningRate 0.0008   Epoch: 18   Global Step: 754840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:38,434-Speed 2514.01 samples/sec   Loss 1.5035   LearningRate 0.0008   Epoch: 18   Global Step: 754850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:42,329-Speed 2630.60 samples/sec   Loss 1.4648   LearningRate 0.0008   Epoch: 18   Global Step: 754860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:46,224-Speed 2631.22 samples/sec   Loss 1.5398   LearningRate 0.0008   Epoch: 18   Global Step: 754870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:34:50,120-Speed 2628.92 samples/sec   Loss 1.4879   LearningRate 0.0008   Epoch: 18   Global Step: 754880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:34:53,993-Speed 2644.74 samples/sec   Loss 1.4996   LearningRate 0.0008   Epoch: 18   Global Step: 754890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:34:57,887-Speed 2630.09 samples/sec   Loss 1.4584   LearningRate 0.0008   Epoch: 18   Global Step: 754900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:01,783-Speed 2628.66 samples/sec   Loss 1.5285   LearningRate 0.0008   Epoch: 18   Global Step: 754910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:05,677-Speed 2630.18 samples/sec   Loss 1.4531   LearningRate 0.0008   Epoch: 18   Global Step: 754920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:09,574-Speed 2628.79 samples/sec   Loss 1.4925   LearningRate 0.0008   Epoch: 18   Global Step: 754930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:13,469-Speed 2630.04 samples/sec   Loss 1.5117   LearningRate 0.0008   Epoch: 18   Global Step: 754940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:17,378-Speed 2620.28 samples/sec   Loss 1.4360   LearningRate 0.0008   Epoch: 18   Global Step: 754950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:21,272-Speed 2631.38 samples/sec   Loss 1.4825   LearningRate 0.0008   Epoch: 18   Global Step: 754960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:25,168-Speed 2628.87 samples/sec   Loss 1.4808   LearningRate 0.0008   Epoch: 18   Global Step: 754970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:29,077-Speed 2620.04 samples/sec   Loss 1.5290   LearningRate 0.0008   Epoch: 18   Global Step: 754980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:32,974-Speed 2628.38 samples/sec   Loss 1.4616   LearningRate 0.0008   Epoch: 18   Global Step: 754990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:35:36,888-Speed 2616.78 samples/sec   Loss 1.4659   LearningRate 0.0008   Epoch: 18   Global Step: 755000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:35:40,760-Speed 2645.29 samples/sec   Loss 1.4482   LearningRate 0.0008   Epoch: 18   Global Step: 755010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:44,656-Speed 2628.78 samples/sec   Loss 1.4682   LearningRate 0.0008   Epoch: 18   Global Step: 755020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:48,552-Speed 2629.54 samples/sec   Loss 1.4396   LearningRate 0.0008   Epoch: 18   Global Step: 755030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:52,446-Speed 2630.06 samples/sec   Loss 1.5550   LearningRate 0.0008   Epoch: 18   Global Step: 755040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:35:56,340-Speed 2630.17 samples/sec   Loss 1.4899   LearningRate 0.0008   Epoch: 18   Global Step: 755050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:36:00,273-Speed 2604.74 samples/sec   Loss 1.4373   LearningRate 0.0008   Epoch: 18   Global Step: 755060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:36:04,168-Speed 2629.69 samples/sec   Loss 1.5244   LearningRate 0.0008   Epoch: 18   Global Step: 755070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:36:08,062-Speed 2630.21 samples/sec   Loss 1.4508   LearningRate 0.0008   Epoch: 18   Global Step: 755080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:36:11,957-Speed 2629.70 samples/sec   Loss 1.4627   LearningRate 0.0008   Epoch: 18   Global Step: 755090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:36:15,949-Speed 2566.03 samples/sec   Loss 1.4726   LearningRate 0.0008   Epoch: 18   Global Step: 755100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:36:19,837-Speed 2633.97 samples/sec   Loss 1.4621   LearningRate 0.0008   Epoch: 18   Global Step: 755110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:23,729-Speed 2631.85 samples/sec   Loss 1.4592   LearningRate 0.0008   Epoch: 18   Global Step: 755120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:27,624-Speed 2629.95 samples/sec   Loss 1.4975   LearningRate 0.0008   Epoch: 18   Global Step: 755130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:31,517-Speed 2631.26 samples/sec   Loss 1.4481   LearningRate 0.0008   Epoch: 18   Global Step: 755140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:35,409-Speed 2631.93 samples/sec   Loss 1.5137   LearningRate 0.0008   Epoch: 18   Global Step: 755150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:39,302-Speed 2630.41 samples/sec   Loss 1.4922   LearningRate 0.0008   Epoch: 18   Global Step: 755160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:43,196-Speed 2630.54 samples/sec   Loss 1.4626   LearningRate 0.0008   Epoch: 18   Global Step: 755170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:47,089-Speed 2631.27 samples/sec   Loss 1.5466   LearningRate 0.0008   Epoch: 18   Global Step: 755180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:50,989-Speed 2626.74 samples/sec   Loss 1.4448   LearningRate 0.0008   Epoch: 18   Global Step: 755190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:54,905-Speed 2615.06 samples/sec   Loss 1.4888   LearningRate 0.0008   Epoch: 18   Global Step: 755200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:36:58,755-Speed 2661.13 samples/sec   Loss 1.5468   LearningRate 0.0008   Epoch: 18   Global Step: 755210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:37:02,633-Speed 2641.30 samples/sec   Loss 1.4323   LearningRate 0.0008   Epoch: 18   Global Step: 755220   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:06,523-Speed 2632.67 samples/sec   Loss 1.4450   LearningRate 0.0008   Epoch: 18   Global Step: 755230   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:10,415-Speed 2631.70 samples/sec   Loss 1.4871   LearningRate 0.0008   Epoch: 18   Global Step: 755240   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:14,319-Speed 2623.89 samples/sec   Loss 1.5151   LearningRate 0.0008   Epoch: 18   Global Step: 755250   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:18,214-Speed 2629.24 samples/sec   Loss 1.4654   LearningRate 0.0008   Epoch: 18   Global Step: 755260   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:22,108-Speed 2630.65 samples/sec   Loss 1.5183   LearningRate 0.0008   Epoch: 18   Global Step: 755270   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:26,001-Speed 2631.54 samples/sec   Loss 1.5198   LearningRate 0.0008   Epoch: 18   Global Step: 755280   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:29,895-Speed 2630.32 samples/sec   Loss 1.4711   LearningRate 0.0008   Epoch: 18   Global Step: 755290   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:33,792-Speed 2628.68 samples/sec   Loss 1.5418   LearningRate 0.0008   Epoch: 18   Global Step: 755300   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:37,685-Speed 2630.89 samples/sec   Loss 1.4700   LearningRate 0.0008   Epoch: 18   Global Step: 755310   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:37:41,578-Speed 2630.74 samples/sec   Loss 1.4892   LearningRate 0.0008   Epoch: 18   Global Step: 755320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:37:45,479-Speed 2626.01 samples/sec   Loss 1.4817   LearningRate 0.0008   Epoch: 18   Global Step: 755330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:37:49,380-Speed 2625.45 samples/sec   Loss 1.4302   LearningRate 0.0008   Epoch: 18   Global Step: 755340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:37:53,284-Speed 2623.97 samples/sec   Loss 1.4256   LearningRate 0.0008   Epoch: 18   Global Step: 755350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:37:57,181-Speed 2628.03 samples/sec   Loss 1.4788   LearningRate 0.0008   Epoch: 18   Global Step: 755360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:38:01,074-Speed 2631.70 samples/sec   Loss 1.4528   LearningRate 0.0008   Epoch: 18   Global Step: 755370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:38:04,995-Speed 2612.15 samples/sec   Loss 1.4589   LearningRate 0.0008   Epoch: 18   Global Step: 755380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:38:08,892-Speed 2628.40 samples/sec   Loss 1.4337   LearningRate 0.0008   Epoch: 18   Global Step: 755390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:38:12,757-Speed 2649.85 samples/sec   Loss 1.5172   LearningRate 0.0008   Epoch: 18   Global Step: 755400   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:16,651-Speed 2631.82 samples/sec   Loss 1.4854   LearningRate 0.0008   Epoch: 18   Global Step: 755410   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:20,555-Speed 2623.07 samples/sec   Loss 1.4598   LearningRate 0.0008   Epoch: 18   Global Step: 755420   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:24,455-Speed 2625.92 samples/sec   Loss 1.4695   LearningRate 0.0008   Epoch: 18   Global Step: 755430   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:28,352-Speed 2629.14 samples/sec   Loss 1.4348   LearningRate 0.0008   Epoch: 18   Global Step: 755440   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:32,252-Speed 2626.27 samples/sec   Loss 1.4851   LearningRate 0.0008   Epoch: 18   Global Step: 755450   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:36,146-Speed 2630.16 samples/sec   Loss 1.4985   LearningRate 0.0008   Epoch: 18   Global Step: 755460   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:40,057-Speed 2619.18 samples/sec   Loss 1.5371   LearningRate 0.0008   Epoch: 18   Global Step: 755470   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:43,954-Speed 2627.73 samples/sec   Loss 1.4686   LearningRate 0.0008   Epoch: 18   Global Step: 755480   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:47,847-Speed 2631.03 samples/sec   Loss 1.5060   LearningRate 0.0008   Epoch: 18   Global Step: 755490   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:38:51,750-Speed 2624.83 samples/sec   Loss 1.4723   LearningRate 0.0008   Epoch: 18   Global Step: 755500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:38:55,650-Speed 2626.28 samples/sec   Loss 1.5091   LearningRate 0.0008   Epoch: 18   Global Step: 755510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:38:59,556-Speed 2622.41 samples/sec   Loss 1.5037   LearningRate 0.0008   Epoch: 18   Global Step: 755520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:03,457-Speed 2625.35 samples/sec   Loss 1.5013   LearningRate 0.0008   Epoch: 18   Global Step: 755530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:07,350-Speed 2631.32 samples/sec   Loss 1.4515   LearningRate 0.0008   Epoch: 18   Global Step: 755540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:11,241-Speed 2632.30 samples/sec   Loss 1.5011   LearningRate 0.0008   Epoch: 18   Global Step: 755550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:15,157-Speed 2615.47 samples/sec   Loss 1.4589   LearningRate 0.0008   Epoch: 18   Global Step: 755560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:19,063-Speed 2622.13 samples/sec   Loss 1.4411   LearningRate 0.0008   Epoch: 18   Global Step: 755570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:22,960-Speed 2628.52 samples/sec   Loss 1.4628   LearningRate 0.0008   Epoch: 18   Global Step: 755580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:26,855-Speed 2630.38 samples/sec   Loss 1.4976   LearningRate 0.0008   Epoch: 18   Global Step: 755590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:30,751-Speed 2628.51 samples/sec   Loss 1.5025   LearningRate 0.0008   Epoch: 18   Global Step: 755600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:39:34,655-Speed 2623.80 samples/sec   Loss 1.4762   LearningRate 0.0008   Epoch: 18   Global Step: 755610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:39:38,525-Speed 2647.08 samples/sec   Loss 1.5359   LearningRate 0.0008   Epoch: 18   Global Step: 755620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:42,420-Speed 2629.13 samples/sec   Loss 1.4860   LearningRate 0.0008   Epoch: 18   Global Step: 755630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:46,312-Speed 2631.58 samples/sec   Loss 1.4948   LearningRate 0.0008   Epoch: 18   Global Step: 755640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:50,207-Speed 2630.25 samples/sec   Loss 1.5290   LearningRate 0.0008   Epoch: 18   Global Step: 755650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:54,120-Speed 2617.74 samples/sec   Loss 1.5005   LearningRate 0.0008   Epoch: 18   Global Step: 755660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:39:58,017-Speed 2628.74 samples/sec   Loss 1.4843   LearningRate 0.0008   Epoch: 18   Global Step: 755670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:01,913-Speed 2629.12 samples/sec   Loss 1.5388   LearningRate 0.0008   Epoch: 18   Global Step: 755680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:05,832-Speed 2614.23 samples/sec   Loss 1.5235   LearningRate 0.0008   Epoch: 18   Global Step: 755690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:09,725-Speed 2630.67 samples/sec   Loss 1.4617   LearningRate 0.0008   Epoch: 18   Global Step: 755700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:13,616-Speed 2632.66 samples/sec   Loss 1.4728   LearningRate 0.0008   Epoch: 18   Global Step: 755710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:17,515-Speed 2626.66 samples/sec   Loss 1.4745   LearningRate 0.0008   Epoch: 18   Global Step: 755720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:40:21,411-Speed 2629.59 samples/sec   Loss 1.4862   LearningRate 0.0008   Epoch: 18   Global Step: 755730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:40:25,312-Speed 2625.29 samples/sec   Loss 1.4477   LearningRate 0.0008   Epoch: 18   Global Step: 755740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:29,211-Speed 2627.32 samples/sec   Loss 1.4943   LearningRate 0.0008   Epoch: 18   Global Step: 755750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:33,106-Speed 2630.01 samples/sec   Loss 1.4854   LearningRate 0.0008   Epoch: 18   Global Step: 755760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:40:36,980-Speed 2644.34 samples/sec   Loss 1.4397   LearningRate 0.0008   Epoch: 18   Global Step: 755770   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:40:40,874-Speed 2630.46 samples/sec   Loss 1.4335   LearningRate 0.0008   Epoch: 18   Global Step: 755780   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:40:44,768-Speed 2629.51 samples/sec   Loss 1.4416   LearningRate 0.0008   Epoch: 18   Global Step: 755790   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:40:48,672-Speed 2624.12 samples/sec   Loss 1.5459   LearningRate 0.0008   Epoch: 18   Global Step: 755800   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:40:52,566-Speed 2630.82 samples/sec   Loss 1.4784   LearningRate 0.0008   Epoch: 18   Global Step: 755810   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:40:56,618-Speed 2527.92 samples/sec   Loss 1.4819   LearningRate 0.0008   Epoch: 18   Global Step: 755820   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:41:00,721-Speed 2496.18 samples/sec   Loss 1.4553   LearningRate 0.0008   Epoch: 18   Global Step: 755830   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:41:04,616-Speed 2629.95 samples/sec   Loss 1.4756   LearningRate 0.0008   Epoch: 18   Global Step: 755840   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:41:08,549-Speed 2604.41 samples/sec   Loss 1.4615   LearningRate 0.0008   Epoch: 18   Global Step: 755850   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:41:12,451-Speed 2625.03 samples/sec   Loss 1.4951   LearningRate 0.0008   Epoch: 18   Global Step: 755860   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:41:16,364-Speed 2617.69 samples/sec   Loss 1.4342   LearningRate 0.0008   Epoch: 18   Global Step: 755870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:20,264-Speed 2626.58 samples/sec   Loss 1.4526   LearningRate 0.0008   Epoch: 18   Global Step: 755880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:24,167-Speed 2624.21 samples/sec   Loss 1.5180   LearningRate 0.0008   Epoch: 18   Global Step: 755890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:28,068-Speed 2625.96 samples/sec   Loss 1.5112   LearningRate 0.0008   Epoch: 18   Global Step: 755900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:31,962-Speed 2630.49 samples/sec   Loss 1.4967   LearningRate 0.0008   Epoch: 18   Global Step: 755910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:35,852-Speed 2633.06 samples/sec   Loss 1.4613   LearningRate 0.0008   Epoch: 18   Global Step: 755920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:39,767-Speed 2615.77 samples/sec   Loss 1.4661   LearningRate 0.0008   Epoch: 18   Global Step: 755930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:43,660-Speed 2631.15 samples/sec   Loss 1.4587   LearningRate 0.0008   Epoch: 18   Global Step: 755940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:47,561-Speed 2626.28 samples/sec   Loss 1.4867   LearningRate 0.0008   Epoch: 18   Global Step: 755950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:51,455-Speed 2630.66 samples/sec   Loss 1.4687   LearningRate 0.0008   Epoch: 18   Global Step: 755960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:55,344-Speed 2633.51 samples/sec   Loss 1.4608   LearningRate 0.0008   Epoch: 18   Global Step: 755970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:41:59,236-Speed 2631.49 samples/sec   Loss 1.4526   LearningRate 0.0008   Epoch: 18   Global Step: 755980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:03,129-Speed 2631.33 samples/sec   Loss 1.4889   LearningRate 0.0008   Epoch: 18   Global Step: 755990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:07,026-Speed 2628.42 samples/sec   Loss 1.4877   LearningRate 0.0008   Epoch: 18   Global Step: 756000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:10,917-Speed 2631.83 samples/sec   Loss 1.5033   LearningRate 0.0008   Epoch: 18   Global Step: 756010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:14,838-Speed 2612.44 samples/sec   Loss 1.4780   LearningRate 0.0008   Epoch: 18   Global Step: 756020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:18,740-Speed 2625.15 samples/sec   Loss 1.4470   LearningRate 0.0008   Epoch: 18   Global Step: 756030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:22,634-Speed 2630.21 samples/sec   Loss 1.4945   LearningRate 0.0008   Epoch: 18   Global Step: 756040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:26,528-Speed 2630.31 samples/sec   Loss 1.4378   LearningRate 0.0008   Epoch: 18   Global Step: 756050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:30,421-Speed 2631.34 samples/sec   Loss 1.4630   LearningRate 0.0008   Epoch: 18   Global Step: 756060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:34,314-Speed 2630.83 samples/sec   Loss 1.4607   LearningRate 0.0008   Epoch: 18   Global Step: 756070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:42:38,206-Speed 2631.79 samples/sec   Loss 1.4293   LearningRate 0.0008   Epoch: 18   Global Step: 756080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:42:42,102-Speed 2628.83 samples/sec   Loss 1.5091   LearningRate 0.0008   Epoch: 18   Global Step: 756090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:46,005-Speed 2624.69 samples/sec   Loss 1.5153   LearningRate 0.0008   Epoch: 18   Global Step: 756100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:49,923-Speed 2613.95 samples/sec   Loss 1.5034   LearningRate 0.0008   Epoch: 18   Global Step: 756110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:53,818-Speed 2630.25 samples/sec   Loss 1.4861   LearningRate 0.0008   Epoch: 18   Global Step: 756120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:42:57,712-Speed 2630.33 samples/sec   Loss 1.5287   LearningRate 0.0008   Epoch: 18   Global Step: 756130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:01,614-Speed 2625.40 samples/sec   Loss 1.5319   LearningRate 0.0008   Epoch: 18   Global Step: 756140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:05,507-Speed 2631.07 samples/sec   Loss 1.4786   LearningRate 0.0008   Epoch: 18   Global Step: 756150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:09,424-Speed 2614.31 samples/sec   Loss 1.4440   LearningRate 0.0008   Epoch: 18   Global Step: 756160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:13,326-Speed 2625.54 samples/sec   Loss 1.4493   LearningRate 0.0008   Epoch: 18   Global Step: 756170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:17,221-Speed 2629.85 samples/sec   Loss 1.4833   LearningRate 0.0008   Epoch: 18   Global Step: 756180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:21,137-Speed 2615.47 samples/sec   Loss 1.4746   LearningRate 0.0008   Epoch: 18   Global Step: 756190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:43:25,023-Speed 2635.87 samples/sec   Loss 1.4718   LearningRate 0.0008   Epoch: 18   Global Step: 756200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:28,932-Speed 2620.60 samples/sec   Loss 1.5135   LearningRate 0.0008   Epoch: 18   Global Step: 756210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:32,821-Speed 2633.87 samples/sec   Loss 1.4782   LearningRate 0.0008   Epoch: 18   Global Step: 756220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:36,715-Speed 2630.05 samples/sec   Loss 1.4820   LearningRate 0.0008   Epoch: 18   Global Step: 756230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:40,687-Speed 2579.10 samples/sec   Loss 1.4340   LearningRate 0.0008   Epoch: 18   Global Step: 756240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:44,598-Speed 2618.94 samples/sec   Loss 1.5018   LearningRate 0.0008   Epoch: 18   Global Step: 756250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:48,495-Speed 2628.09 samples/sec   Loss 1.4716   LearningRate 0.0008   Epoch: 18   Global Step: 756260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:52,436-Speed 2599.41 samples/sec   Loss 1.4686   LearningRate 0.0008   Epoch: 18   Global Step: 756270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:43:56,335-Speed 2627.22 samples/sec   Loss 1.4495   LearningRate 0.0008   Epoch: 18   Global Step: 756280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:44:00,225-Speed 2633.43 samples/sec   Loss 1.4965   LearningRate 0.0008   Epoch: 18   Global Step: 756290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:44:04,124-Speed 2626.79 samples/sec   Loss 1.4794   LearningRate 0.0008   Epoch: 18   Global Step: 756300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:44:08,019-Speed 2629.84 samples/sec   Loss 1.5291   LearningRate 0.0008   Epoch: 18   Global Step: 756310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:44:11,922-Speed 2623.77 samples/sec   Loss 1.4230   LearningRate 0.0008   Epoch: 18   Global Step: 756320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:44:15,832-Speed 2619.82 samples/sec   Loss 1.4697   LearningRate 0.0008   Epoch: 18   Global Step: 756330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:44:19,738-Speed 2622.60 samples/sec   Loss 1.4150   LearningRate 0.0008   Epoch: 18   Global Step: 756340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:44:23,607-Speed 2647.89 samples/sec   Loss 1.4229   LearningRate 0.0008   Epoch: 18   Global Step: 756350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:44:27,510-Speed 2624.13 samples/sec   Loss 1.4888   LearningRate 0.0008   Epoch: 18   Global Step: 756360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:44:31,412-Speed 2625.43 samples/sec   Loss 1.4946   LearningRate 0.0008   Epoch: 18   Global Step: 756370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:44:35,307-Speed 2630.10 samples/sec   Loss 1.4852   LearningRate 0.0008   Epoch: 18   Global Step: 756380   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:44:39,206-Speed 2626.39 samples/sec   Loss 1.4741   LearningRate 0.0008   Epoch: 18   Global Step: 756390   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:44:43,099-Speed 2630.84 samples/sec   Loss 1.4604   LearningRate 0.0008   Epoch: 18   Global Step: 756400   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:44:46,993-Speed 2630.38 samples/sec   Loss 1.5074   LearningRate 0.0008   Epoch: 18   Global Step: 756410   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:44:50,890-Speed 2628.35 samples/sec   Loss 1.5177   LearningRate 0.0008   Epoch: 18   Global Step: 756420   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:44:54,782-Speed 2631.98 samples/sec   Loss 1.4898   LearningRate 0.0008   Epoch: 18   Global Step: 756430   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:44:58,675-Speed 2631.77 samples/sec   Loss 1.4683   LearningRate 0.0008   Epoch: 18   Global Step: 756440   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:45:02,593-Speed 2613.93 samples/sec   Loss 1.5102   LearningRate 0.0008   Epoch: 18   Global Step: 756450   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:45:06,488-Speed 2630.82 samples/sec   Loss 1.4743   LearningRate 0.0008   Epoch: 18   Global Step: 756460   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:45:10,380-Speed 2630.90 samples/sec   Loss 1.5289   LearningRate 0.0008   Epoch: 18   Global Step: 756470   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:45:14,290-Speed 2619.79 samples/sec   Loss 1.4968   LearningRate 0.0008   Epoch: 18   Global Step: 756480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:18,184-Speed 2630.38 samples/sec   Loss 1.4852   LearningRate 0.0008   Epoch: 18   Global Step: 756490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:22,082-Speed 2627.79 samples/sec   Loss 1.5071   LearningRate 0.0008   Epoch: 18   Global Step: 756500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:25,976-Speed 2630.11 samples/sec   Loss 1.4330   LearningRate 0.0008   Epoch: 18   Global Step: 756510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:29,869-Speed 2631.75 samples/sec   Loss 1.5038   LearningRate 0.0008   Epoch: 18   Global Step: 756520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:33,767-Speed 2627.09 samples/sec   Loss 1.4677   LearningRate 0.0008   Epoch: 18   Global Step: 756530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:37,676-Speed 2620.46 samples/sec   Loss 1.4396   LearningRate 0.0008   Epoch: 18   Global Step: 756540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:41,571-Speed 2629.46 samples/sec   Loss 1.4676   LearningRate 0.0008   Epoch: 18   Global Step: 756550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:45,462-Speed 2632.19 samples/sec   Loss 1.4986   LearningRate 0.0008   Epoch: 18   Global Step: 756560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:49,365-Speed 2624.07 samples/sec   Loss 1.4515   LearningRate 0.0008   Epoch: 18   Global Step: 756570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:45:53,261-Speed 2628.90 samples/sec   Loss 1.4392   LearningRate 0.0008   Epoch: 18   Global Step: 756580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:45:57,150-Speed 2634.65 samples/sec   Loss 1.4634   LearningRate 0.0008   Epoch: 18   Global Step: 756590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:01,044-Speed 2630.51 samples/sec   Loss 1.5064   LearningRate 0.0008   Epoch: 18   Global Step: 756600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:04,939-Speed 2629.81 samples/sec   Loss 1.4514   LearningRate 0.0008   Epoch: 18   Global Step: 756610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:08,829-Speed 2632.86 samples/sec   Loss 1.4659   LearningRate 0.0008   Epoch: 18   Global Step: 756620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:12,724-Speed 2629.40 samples/sec   Loss 1.4635   LearningRate 0.0008   Epoch: 18   Global Step: 756630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:16,619-Speed 2629.17 samples/sec   Loss 1.4601   LearningRate 0.0008   Epoch: 18   Global Step: 756640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:20,509-Speed 2633.61 samples/sec   Loss 1.4907   LearningRate 0.0008   Epoch: 18   Global Step: 756650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:24,419-Speed 2619.19 samples/sec   Loss 1.4627   LearningRate 0.0008   Epoch: 18   Global Step: 756660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:28,325-Speed 2622.57 samples/sec   Loss 1.4469   LearningRate 0.0008   Epoch: 18   Global Step: 756670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:32,221-Speed 2629.11 samples/sec   Loss 1.4759   LearningRate 0.0008   Epoch: 18   Global Step: 756680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:36,133-Speed 2618.40 samples/sec   Loss 1.3725   LearningRate 0.0008   Epoch: 18   Global Step: 756690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:46:40,001-Speed 2648.51 samples/sec   Loss 1.4956   LearningRate 0.0008   Epoch: 18   Global Step: 756700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:43,897-Speed 2629.03 samples/sec   Loss 1.4561   LearningRate 0.0008   Epoch: 18   Global Step: 756710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:47,796-Speed 2627.06 samples/sec   Loss 1.4694   LearningRate 0.0008   Epoch: 18   Global Step: 756720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:51,693-Speed 2628.44 samples/sec   Loss 1.4918   LearningRate 0.0008   Epoch: 18   Global Step: 756730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:55,593-Speed 2626.41 samples/sec   Loss 1.4571   LearningRate 0.0008   Epoch: 18   Global Step: 756740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:46:59,489-Speed 2629.07 samples/sec   Loss 1.4296   LearningRate 0.0008   Epoch: 18   Global Step: 756750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:03,394-Speed 2622.10 samples/sec   Loss 1.4410   LearningRate 0.0008   Epoch: 18   Global Step: 756760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:07,309-Speed 2616.14 samples/sec   Loss 1.4699   LearningRate 0.0008   Epoch: 18   Global Step: 756770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:11,268-Speed 2587.83 samples/sec   Loss 1.4370   LearningRate 0.0008   Epoch: 18   Global Step: 756780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:15,161-Speed 2630.82 samples/sec   Loss 1.4558   LearningRate 0.0008   Epoch: 18   Global Step: 756790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:19,038-Speed 2641.79 samples/sec   Loss 1.5167   LearningRate 0.0008   Epoch: 18   Global Step: 756800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:22,990-Speed 2591.96 samples/sec   Loss 1.4377   LearningRate 0.0008   Epoch: 18   Global Step: 756810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:26,884-Speed 2630.42 samples/sec   Loss 1.5173   LearningRate 0.0008   Epoch: 18   Global Step: 756820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:30,781-Speed 2628.96 samples/sec   Loss 1.4604   LearningRate 0.0008   Epoch: 18   Global Step: 756830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:34,674-Speed 2630.20 samples/sec   Loss 1.5239   LearningRate 0.0008   Epoch: 18   Global Step: 756840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:38,611-Speed 2601.80 samples/sec   Loss 1.5041   LearningRate 0.0008   Epoch: 18   Global Step: 756850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:42,505-Speed 2630.06 samples/sec   Loss 1.4634   LearningRate 0.0008   Epoch: 18   Global Step: 756860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:46,401-Speed 2629.91 samples/sec   Loss 1.4817   LearningRate 0.0008   Epoch: 18   Global Step: 756870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:50,297-Speed 2628.65 samples/sec   Loss 1.4869   LearningRate 0.0008   Epoch: 18   Global Step: 756880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:54,223-Speed 2609.61 samples/sec   Loss 1.4753   LearningRate 0.0008   Epoch: 18   Global Step: 756890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:47:58,117-Speed 2629.99 samples/sec   Loss 1.4466   LearningRate 0.0008   Epoch: 18   Global Step: 756900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:48:02,022-Speed 2623.48 samples/sec   Loss 1.5085   LearningRate 0.0008   Epoch: 18   Global Step: 756910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:48:05,894-Speed 2644.83 samples/sec   Loss 1.5443   LearningRate 0.0008   Epoch: 18   Global Step: 756920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:09,797-Speed 2624.55 samples/sec   Loss 1.4325   LearningRate 0.0008   Epoch: 18   Global Step: 756930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:13,700-Speed 2624.43 samples/sec   Loss 1.4490   LearningRate 0.0008   Epoch: 18   Global Step: 756940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:17,608-Speed 2620.77 samples/sec   Loss 1.4848   LearningRate 0.0008   Epoch: 18   Global Step: 756950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:21,513-Speed 2622.95 samples/sec   Loss 1.4973   LearningRate 0.0008   Epoch: 18   Global Step: 756960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:25,448-Speed 2603.24 samples/sec   Loss 1.4514   LearningRate 0.0008   Epoch: 18   Global Step: 756970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:29,348-Speed 2626.55 samples/sec   Loss 1.5006   LearningRate 0.0008   Epoch: 18   Global Step: 756980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:33,251-Speed 2625.03 samples/sec   Loss 1.4475   LearningRate 0.0008   Epoch: 18   Global Step: 756990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:37,168-Speed 2614.85 samples/sec   Loss 1.4899   LearningRate 0.0008   Epoch: 18   Global Step: 757000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:41,064-Speed 2629.29 samples/sec   Loss 1.4907   LearningRate 0.0008   Epoch: 18   Global Step: 757010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:44,933-Speed 2647.03 samples/sec   Loss 1.4546   LearningRate 0.0008   Epoch: 18   Global Step: 757020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:48,823-Speed 2633.34 samples/sec   Loss 1.4514   LearningRate 0.0008   Epoch: 18   Global Step: 757030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:52,734-Speed 2619.59 samples/sec   Loss 1.3726   LearningRate 0.0008   Epoch: 18   Global Step: 757040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:48:56,627-Speed 2630.58 samples/sec   Loss 1.4324   LearningRate 0.0008   Epoch: 18   Global Step: 757050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:00,537-Speed 2620.18 samples/sec   Loss 1.4946   LearningRate 0.0008   Epoch: 18   Global Step: 757060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:04,442-Speed 2622.74 samples/sec   Loss 1.4818   LearningRate 0.0008   Epoch: 18   Global Step: 757070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:08,335-Speed 2631.55 samples/sec   Loss 1.4604   LearningRate 0.0008   Epoch: 18   Global Step: 757080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:12,229-Speed 2631.29 samples/sec   Loss 1.4484   LearningRate 0.0008   Epoch: 18   Global Step: 757090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:16,123-Speed 2630.71 samples/sec   Loss 1.4599   LearningRate 0.0008   Epoch: 18   Global Step: 757100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:20,021-Speed 2627.20 samples/sec   Loss 1.4631   LearningRate 0.0008   Epoch: 18   Global Step: 757110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:23,913-Speed 2632.01 samples/sec   Loss 1.4600   LearningRate 0.0008   Epoch: 18   Global Step: 757120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:49:27,907-Speed 2564.30 samples/sec   Loss 1.4971   LearningRate 0.0008   Epoch: 18   Global Step: 757130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:49:31,824-Speed 2615.67 samples/sec   Loss 1.4469   LearningRate 0.0008   Epoch: 18   Global Step: 757140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:49:35,715-Speed 2632.62 samples/sec   Loss 1.5102   LearningRate 0.0008   Epoch: 18   Global Step: 757150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:49:39,594-Speed 2639.91 samples/sec   Loss 1.4414   LearningRate 0.0008   Epoch: 18   Global Step: 757160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:43,489-Speed 2629.42 samples/sec   Loss 1.4368   LearningRate 0.0008   Epoch: 18   Global Step: 757170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:47,382-Speed 2631.38 samples/sec   Loss 1.4724   LearningRate 0.0008   Epoch: 18   Global Step: 757180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:51,276-Speed 2630.26 samples/sec   Loss 1.4399   LearningRate 0.0008   Epoch: 18   Global Step: 757190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:55,171-Speed 2630.62 samples/sec   Loss 1.4690   LearningRate 0.0008   Epoch: 18   Global Step: 757200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:49:59,071-Speed 2626.28 samples/sec   Loss 1.4581   LearningRate 0.0008   Epoch: 18   Global Step: 757210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:02,997-Speed 2609.22 samples/sec   Loss 1.5163   LearningRate 0.0008   Epoch: 18   Global Step: 757220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:06,890-Speed 2631.13 samples/sec   Loss 1.4729   LearningRate 0.0008   Epoch: 18   Global Step: 757230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:10,780-Speed 2633.03 samples/sec   Loss 1.4882   LearningRate 0.0008   Epoch: 18   Global Step: 757240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:14,670-Speed 2632.77 samples/sec   Loss 1.4314   LearningRate 0.0008   Epoch: 18   Global Step: 757250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:18,572-Speed 2625.23 samples/sec   Loss 1.4350   LearningRate 0.0008   Epoch: 18   Global Step: 757260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:50:22,443-Speed 2646.45 samples/sec   Loss 1.4668   LearningRate 0.0008   Epoch: 18   Global Step: 757270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:26,339-Speed 2629.18 samples/sec   Loss 1.4257   LearningRate 0.0008   Epoch: 18   Global Step: 757280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:30,234-Speed 2629.87 samples/sec   Loss 1.4574   LearningRate 0.0008   Epoch: 18   Global Step: 757290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:34,128-Speed 2629.94 samples/sec   Loss 1.4386   LearningRate 0.0008   Epoch: 18   Global Step: 757300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:38,020-Speed 2632.40 samples/sec   Loss 1.4252   LearningRate 0.0008   Epoch: 18   Global Step: 757310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:41,911-Speed 2632.30 samples/sec   Loss 1.4843   LearningRate 0.0008   Epoch: 18   Global Step: 757320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:45,806-Speed 2629.03 samples/sec   Loss 1.4838   LearningRate 0.0008   Epoch: 18   Global Step: 757330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:49,704-Speed 2627.34 samples/sec   Loss 1.4775   LearningRate 0.0008   Epoch: 18   Global Step: 757340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:53,592-Speed 2635.26 samples/sec   Loss 1.4616   LearningRate 0.0008   Epoch: 18   Global Step: 757350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:50:57,481-Speed 2634.11 samples/sec   Loss 1.4684   LearningRate 0.0008   Epoch: 18   Global Step: 757360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:51:01,378-Speed 2628.18 samples/sec   Loss 1.4583   LearningRate 0.0008   Epoch: 18   Global Step: 757370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:05,292-Speed 2617.11 samples/sec   Loss 1.4509   LearningRate 0.0008   Epoch: 18   Global Step: 757380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:09,187-Speed 2630.06 samples/sec   Loss 1.4368   LearningRate 0.0008   Epoch: 18   Global Step: 757390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:13,083-Speed 2629.10 samples/sec   Loss 1.4447   LearningRate 0.0008   Epoch: 18   Global Step: 757400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:16,983-Speed 2625.66 samples/sec   Loss 1.4450   LearningRate 0.0008   Epoch: 18   Global Step: 757410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:20,874-Speed 2632.74 samples/sec   Loss 1.5023   LearningRate 0.0008   Epoch: 18   Global Step: 757420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:24,767-Speed 2630.84 samples/sec   Loss 1.5205   LearningRate 0.0008   Epoch: 18   Global Step: 757430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:28,658-Speed 2635.27 samples/sec   Loss 1.4310   LearningRate 0.0008   Epoch: 18   Global Step: 757440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:32,558-Speed 2625.97 samples/sec   Loss 1.4059   LearningRate 0.0008   Epoch: 18   Global Step: 757450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:36,461-Speed 2625.40 samples/sec   Loss 1.4549   LearningRate 0.0008   Epoch: 18   Global Step: 757460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:40,328-Speed 2648.42 samples/sec   Loss 1.4383   LearningRate 0.0008   Epoch: 18   Global Step: 757470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:51:44,197-Speed 2646.92 samples/sec   Loss 1.5209   LearningRate 0.0008   Epoch: 18   Global Step: 757480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:51:48,090-Speed 2630.62 samples/sec   Loss 1.4722   LearningRate 0.0008   Epoch: 18   Global Step: 757490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:51:51,982-Speed 2632.45 samples/sec   Loss 1.4470   LearningRate 0.0008   Epoch: 18   Global Step: 757500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:51:55,871-Speed 2633.23 samples/sec   Loss 1.5365   LearningRate 0.0008   Epoch: 18   Global Step: 757510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:51:59,764-Speed 2631.55 samples/sec   Loss 1.4486   LearningRate 0.0008   Epoch: 18   Global Step: 757520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:03,655-Speed 2632.07 samples/sec   Loss 1.4689   LearningRate 0.0008   Epoch: 18   Global Step: 757530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:07,552-Speed 2628.51 samples/sec   Loss 1.4270   LearningRate 0.0008   Epoch: 18   Global Step: 757540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:11,480-Speed 2607.16 samples/sec   Loss 1.4375   LearningRate 0.0008   Epoch: 18   Global Step: 757550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:15,405-Speed 2609.24 samples/sec   Loss 1.4312   LearningRate 0.0008   Epoch: 18   Global Step: 757560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:19,304-Speed 2627.05 samples/sec   Loss 1.4830   LearningRate 0.0008   Epoch: 18   Global Step: 757570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:23,200-Speed 2629.40 samples/sec   Loss 1.5000   LearningRate 0.0008   Epoch: 18   Global Step: 757580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:52:27,070-Speed 2646.75 samples/sec   Loss 1.4974   LearningRate 0.0008   Epoch: 18   Global Step: 757590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:30,960-Speed 2632.89 samples/sec   Loss 1.4386   LearningRate 0.0008   Epoch: 18   Global Step: 757600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:34,855-Speed 2629.00 samples/sec   Loss 1.4201   LearningRate 0.0008   Epoch: 18   Global Step: 757610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:38,786-Speed 2606.38 samples/sec   Loss 1.4898   LearningRate 0.0008   Epoch: 18   Global Step: 757620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:42,680-Speed 2631.00 samples/sec   Loss 1.3954   LearningRate 0.0008   Epoch: 18   Global Step: 757630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:46,578-Speed 2627.45 samples/sec   Loss 1.4226   LearningRate 0.0008   Epoch: 18   Global Step: 757640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:50,488-Speed 2619.70 samples/sec   Loss 1.4927   LearningRate 0.0008   Epoch: 18   Global Step: 757650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:54,391-Speed 2624.00 samples/sec   Loss 1.5059   LearningRate 0.0008   Epoch: 18   Global Step: 757660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:52:58,283-Speed 2631.85 samples/sec   Loss 1.4638   LearningRate 0.0008   Epoch: 18   Global Step: 757670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:53:02,175-Speed 2631.50 samples/sec   Loss 1.4444   LearningRate 0.0008   Epoch: 18   Global Step: 757680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:53:06,066-Speed 2632.70 samples/sec   Loss 1.4636   LearningRate 0.0008   Epoch: 18   Global Step: 757690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:09,967-Speed 2625.74 samples/sec   Loss 1.4360   LearningRate 0.0008   Epoch: 18   Global Step: 757700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:13,860-Speed 2631.98 samples/sec   Loss 1.4749   LearningRate 0.0008   Epoch: 18   Global Step: 757710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:17,756-Speed 2629.19 samples/sec   Loss 1.4450   LearningRate 0.0008   Epoch: 18   Global Step: 757720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:21,649-Speed 2630.94 samples/sec   Loss 1.4295   LearningRate 0.0008   Epoch: 18   Global Step: 757730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:25,547-Speed 2628.14 samples/sec   Loss 1.4666   LearningRate 0.0007   Epoch: 18   Global Step: 757740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:29,443-Speed 2629.07 samples/sec   Loss 1.4957   LearningRate 0.0007   Epoch: 18   Global Step: 757750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:33,333-Speed 2632.66 samples/sec   Loss 1.4758   LearningRate 0.0007   Epoch: 18   Global Step: 757760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:37,234-Speed 2625.30 samples/sec   Loss 1.4500   LearningRate 0.0007   Epoch: 18   Global Step: 757770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:41,131-Speed 2628.73 samples/sec   Loss 1.4627   LearningRate 0.0007   Epoch: 18   Global Step: 757780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:53:45,004-Speed 2644.56 samples/sec   Loss 1.4381   LearningRate 0.0007   Epoch: 18   Global Step: 757790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:53:48,900-Speed 2629.52 samples/sec   Loss 1.4703   LearningRate 0.0007   Epoch: 18   Global Step: 757800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:53:52,792-Speed 2631.19 samples/sec   Loss 1.4379   LearningRate 0.0007   Epoch: 18   Global Step: 757810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:53:56,685-Speed 2631.58 samples/sec   Loss 1.4234   LearningRate 0.0007   Epoch: 18   Global Step: 757820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:00,583-Speed 2627.52 samples/sec   Loss 1.4299   LearningRate 0.0007   Epoch: 18   Global Step: 757830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:04,479-Speed 2628.76 samples/sec   Loss 1.4220   LearningRate 0.0007   Epoch: 18   Global Step: 757840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:08,372-Speed 2630.84 samples/sec   Loss 1.4942   LearningRate 0.0007   Epoch: 18   Global Step: 757850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:12,270-Speed 2628.62 samples/sec   Loss 1.4597   LearningRate 0.0007   Epoch: 18   Global Step: 757860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:16,163-Speed 2630.42 samples/sec   Loss 1.4808   LearningRate 0.0007   Epoch: 18   Global Step: 757870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:20,151-Speed 2569.03 samples/sec   Loss 1.4986   LearningRate 0.0007   Epoch: 18   Global Step: 757880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:24,023-Speed 2644.76 samples/sec   Loss 1.4434   LearningRate 0.0007   Epoch: 18   Global Step: 757890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:27,925-Speed 2625.58 samples/sec   Loss 1.4317   LearningRate 0.0007   Epoch: 18   Global Step: 757900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:31,820-Speed 2629.33 samples/sec   Loss 1.4806   LearningRate 0.0007   Epoch: 18   Global Step: 757910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:35,733-Speed 2617.47 samples/sec   Loss 1.4951   LearningRate 0.0007   Epoch: 18   Global Step: 757920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:39,623-Speed 2633.10 samples/sec   Loss 1.4584   LearningRate 0.0007   Epoch: 18   Global Step: 757930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:43,528-Speed 2623.92 samples/sec   Loss 1.4676   LearningRate 0.0007   Epoch: 18   Global Step: 757940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:47,433-Speed 2622.53 samples/sec   Loss 1.4933   LearningRate 0.0007   Epoch: 18   Global Step: 757950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:51,327-Speed 2630.18 samples/sec   Loss 1.4814   LearningRate 0.0007   Epoch: 18   Global Step: 757960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:55,300-Speed 2577.99 samples/sec   Loss 1.4881   LearningRate 0.0007   Epoch: 18   Global Step: 757970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:54:59,232-Speed 2605.47 samples/sec   Loss 1.4417   LearningRate 0.0007   Epoch: 18   Global Step: 757980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:55:03,126-Speed 2630.61 samples/sec   Loss 1.3684   LearningRate 0.0007   Epoch: 18   Global Step: 757990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:55:07,012-Speed 2635.21 samples/sec   Loss 1.4481   LearningRate 0.0007   Epoch: 18   Global Step: 758000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:55:10,884-Speed 2645.53 samples/sec   Loss 1.4554   LearningRate 0.0007   Epoch: 18   Global Step: 758010   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:14,774-Speed 2632.70 samples/sec   Loss 1.4296   LearningRate 0.0007   Epoch: 18   Global Step: 758020   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:18,670-Speed 2629.38 samples/sec   Loss 1.4900   LearningRate 0.0007   Epoch: 18   Global Step: 758030   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:22,607-Speed 2602.03 samples/sec   Loss 1.4459   LearningRate 0.0007   Epoch: 18   Global Step: 758040   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:26,506-Speed 2627.03 samples/sec   Loss 1.4218   LearningRate 0.0007   Epoch: 18   Global Step: 758050   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:30,409-Speed 2624.39 samples/sec   Loss 1.4759   LearningRate 0.0007   Epoch: 18   Global Step: 758060   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:34,301-Speed 2631.35 samples/sec   Loss 1.4265   LearningRate 0.0007   Epoch: 18   Global Step: 758070   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:38,194-Speed 2631.01 samples/sec   Loss 1.4660   LearningRate 0.0007   Epoch: 18   Global Step: 758080   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:42,092-Speed 2628.12 samples/sec   Loss 1.4594   LearningRate 0.0007   Epoch: 18   Global Step: 758090   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:45,995-Speed 2623.93 samples/sec   Loss 1.4683   LearningRate 0.0007   Epoch: 18   Global Step: 758100   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:55:49,893-Speed 2628.15 samples/sec   Loss 1.4649   LearningRate 0.0007   Epoch: 18   Global Step: 758110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:55:53,787-Speed 2630.60 samples/sec   Loss 1.4460   LearningRate 0.0007   Epoch: 18   Global Step: 758120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:55:57,679-Speed 2631.69 samples/sec   Loss 1.4720   LearningRate 0.0007   Epoch: 18   Global Step: 758130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:01,576-Speed 2628.38 samples/sec   Loss 1.4573   LearningRate 0.0007   Epoch: 18   Global Step: 758140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:05,472-Speed 2629.25 samples/sec   Loss 1.4296   LearningRate 0.0007   Epoch: 18   Global Step: 758150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:09,366-Speed 2629.79 samples/sec   Loss 1.4693   LearningRate 0.0007   Epoch: 18   Global Step: 758160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:13,261-Speed 2629.80 samples/sec   Loss 1.4622   LearningRate 0.0007   Epoch: 18   Global Step: 758170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:17,177-Speed 2615.85 samples/sec   Loss 1.4616   LearningRate 0.0007   Epoch: 18   Global Step: 758180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:21,075-Speed 2626.99 samples/sec   Loss 1.5077   LearningRate 0.0007   Epoch: 18   Global Step: 758190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:24,973-Speed 2627.69 samples/sec   Loss 1.4405   LearningRate 0.0007   Epoch: 18   Global Step: 758200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:56:28,844-Speed 2646.33 samples/sec   Loss 1.4941   LearningRate 0.0007   Epoch: 18   Global Step: 758210   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:56:32,740-Speed 2628.99 samples/sec   Loss 1.4482   LearningRate 0.0007   Epoch: 18   Global Step: 758220   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:56:36,636-Speed 2629.25 samples/sec   Loss 1.4625   LearningRate 0.0007   Epoch: 18   Global Step: 758230   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:56:40,532-Speed 2629.16 samples/sec   Loss 1.3794   LearningRate 0.0007   Epoch: 18   Global Step: 758240   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:56:44,426-Speed 2629.97 samples/sec   Loss 1.5260   LearningRate 0.0007   Epoch: 18   Global Step: 758250   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:56:48,326-Speed 2626.27 samples/sec   Loss 1.4520   LearningRate 0.0007   Epoch: 18   Global Step: 758260   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:56:52,223-Speed 2628.15 samples/sec   Loss 1.4626   LearningRate 0.0007   Epoch: 18   Global Step: 758270   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:56:56,120-Speed 2628.67 samples/sec   Loss 1.4533   LearningRate 0.0007   Epoch: 18   Global Step: 758280   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:57:00,014-Speed 2629.95 samples/sec   Loss 1.4644   LearningRate 0.0007   Epoch: 18   Global Step: 758290   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:57:03,910-Speed 2629.08 samples/sec   Loss 1.4486   LearningRate 0.0007   Epoch: 18   Global Step: 758300   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 08:57:07,802-Speed 2631.35 samples/sec   Loss 1.4992   LearningRate 0.0007   Epoch: 18   Global Step: 758310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:11,698-Speed 2629.40 samples/sec   Loss 1.4885   LearningRate 0.0007   Epoch: 18   Global Step: 758320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:15,588-Speed 2633.10 samples/sec   Loss 1.4578   LearningRate 0.0007   Epoch: 18   Global Step: 758330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:19,484-Speed 2629.71 samples/sec   Loss 1.4372   LearningRate 0.0007   Epoch: 18   Global Step: 758340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:23,378-Speed 2630.48 samples/sec   Loss 1.4453   LearningRate 0.0007   Epoch: 18   Global Step: 758350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:27,272-Speed 2629.98 samples/sec   Loss 1.4483   LearningRate 0.0007   Epoch: 18   Global Step: 758360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:31,170-Speed 2627.91 samples/sec   Loss 1.4865   LearningRate 0.0007   Epoch: 18   Global Step: 758370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:35,063-Speed 2630.99 samples/sec   Loss 1.4071   LearningRate 0.0007   Epoch: 18   Global Step: 758380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:38,957-Speed 2630.38 samples/sec   Loss 1.5030   LearningRate 0.0007   Epoch: 18   Global Step: 758390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:42,855-Speed 2627.92 samples/sec   Loss 1.4962   LearningRate 0.0007   Epoch: 18   Global Step: 758400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:46,752-Speed 2628.88 samples/sec   Loss 1.4448   LearningRate 0.0007   Epoch: 18   Global Step: 758410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:57:50,628-Speed 2641.95 samples/sec   Loss 1.4840   LearningRate 0.0007   Epoch: 18   Global Step: 758420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:54,527-Speed 2627.12 samples/sec   Loss 1.4316   LearningRate 0.0007   Epoch: 18   Global Step: 758430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:57:58,583-Speed 2525.56 samples/sec   Loss 1.4533   LearningRate 0.0007   Epoch: 18   Global Step: 758440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:02,531-Speed 2594.44 samples/sec   Loss 1.4325   LearningRate 0.0007   Epoch: 18   Global Step: 758450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:06,428-Speed 2627.81 samples/sec   Loss 1.4636   LearningRate 0.0007   Epoch: 18   Global Step: 758460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:10,331-Speed 2624.72 samples/sec   Loss 1.4087   LearningRate 0.0007   Epoch: 18   Global Step: 758470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:14,225-Speed 2630.86 samples/sec   Loss 1.4833   LearningRate 0.0007   Epoch: 18   Global Step: 758480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:18,116-Speed 2632.50 samples/sec   Loss 1.4096   LearningRate 0.0007   Epoch: 18   Global Step: 758490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:22,014-Speed 2627.54 samples/sec   Loss 1.4760   LearningRate 0.0007   Epoch: 18   Global Step: 758500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:25,909-Speed 2629.73 samples/sec   Loss 1.4397   LearningRate 0.0007   Epoch: 18   Global Step: 758510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:29,800-Speed 2631.99 samples/sec   Loss 1.4339   LearningRate 0.0007   Epoch: 18   Global Step: 758520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:58:33,703-Speed 2624.38 samples/sec   Loss 1.4593   LearningRate 0.0007   Epoch: 18   Global Step: 758530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:58:37,618-Speed 2616.38 samples/sec   Loss 1.4895   LearningRate 0.0007   Epoch: 18   Global Step: 758540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:58:41,513-Speed 2629.19 samples/sec   Loss 1.4996   LearningRate 0.0007   Epoch: 18   Global Step: 758550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:58:45,417-Speed 2624.25 samples/sec   Loss 1.4706   LearningRate 0.0007   Epoch: 18   Global Step: 758560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:58:49,312-Speed 2629.77 samples/sec   Loss 1.4403   LearningRate 0.0007   Epoch: 18   Global Step: 758570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:58:53,182-Speed 2646.39 samples/sec   Loss 1.4918   LearningRate 0.0007   Epoch: 18   Global Step: 758580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:58:57,079-Speed 2628.25 samples/sec   Loss 1.4542   LearningRate 0.0007   Epoch: 18   Global Step: 758590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:00,975-Speed 2629.06 samples/sec   Loss 1.4780   LearningRate 0.0007   Epoch: 18   Global Step: 758600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:04,880-Speed 2622.82 samples/sec   Loss 1.4306   LearningRate 0.0007   Epoch: 18   Global Step: 758610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:08,795-Speed 2616.42 samples/sec   Loss 1.4707   LearningRate 0.0007   Epoch: 18   Global Step: 758620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:12,691-Speed 2628.41 samples/sec   Loss 1.4694   LearningRate 0.0007   Epoch: 18   Global Step: 758630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:16,590-Speed 2627.24 samples/sec   Loss 1.3826   LearningRate 0.0007   Epoch: 18   Global Step: 758640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:20,483-Speed 2632.25 samples/sec   Loss 1.4818   LearningRate 0.0007   Epoch: 18   Global Step: 758650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:24,379-Speed 2629.43 samples/sec   Loss 1.4847   LearningRate 0.0007   Epoch: 18   Global Step: 758660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:28,268-Speed 2633.50 samples/sec   Loss 1.4532   LearningRate 0.0007   Epoch: 18   Global Step: 758670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:32,160-Speed 2631.64 samples/sec   Loss 1.4094   LearningRate 0.0007   Epoch: 18   Global Step: 758680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:59:36,052-Speed 2631.14 samples/sec   Loss 1.4735   LearningRate 0.0007   Epoch: 18   Global Step: 758690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:59:39,948-Speed 2629.24 samples/sec   Loss 1.4782   LearningRate 0.0007   Epoch: 18   Global Step: 758700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:59:43,843-Speed 2630.76 samples/sec   Loss 1.4454   LearningRate 0.0007   Epoch: 18   Global Step: 758710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:59:47,738-Speed 2629.45 samples/sec   Loss 1.4911   LearningRate 0.0007   Epoch: 18   Global Step: 758720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 08:59:51,622-Speed 2637.68 samples/sec   Loss 1.5103   LearningRate 0.0007   Epoch: 18   Global Step: 758730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:55,513-Speed 2632.19 samples/sec   Loss 1.4547   LearningRate 0.0007   Epoch: 18   Global Step: 758740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 08:59:59,442-Speed 2606.73 samples/sec   Loss 1.4976   LearningRate 0.0007   Epoch: 18   Global Step: 758750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:03,336-Speed 2630.54 samples/sec   Loss 1.4395   LearningRate 0.0007   Epoch: 18   Global Step: 758760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:07,231-Speed 2630.17 samples/sec   Loss 1.4821   LearningRate 0.0007   Epoch: 18   Global Step: 758770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:11,125-Speed 2629.61 samples/sec   Loss 1.3982   LearningRate 0.0007   Epoch: 18   Global Step: 758780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:15,024-Speed 2627.38 samples/sec   Loss 1.4782   LearningRate 0.0007   Epoch: 18   Global Step: 758790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:18,917-Speed 2631.06 samples/sec   Loss 1.4450   LearningRate 0.0007   Epoch: 18   Global Step: 758800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:22,809-Speed 2632.17 samples/sec   Loss 1.4851   LearningRate 0.0007   Epoch: 18   Global Step: 758810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:26,705-Speed 2629.38 samples/sec   Loss 1.4424   LearningRate 0.0007   Epoch: 18   Global Step: 758820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:30,588-Speed 2637.40 samples/sec   Loss 1.4572   LearningRate 0.0007   Epoch: 18   Global Step: 758830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:34,491-Speed 2624.40 samples/sec   Loss 1.4507   LearningRate 0.0007   Epoch: 18   Global Step: 758840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:38,384-Speed 2630.93 samples/sec   Loss 1.4559   LearningRate 0.0007   Epoch: 18   Global Step: 758850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:42,287-Speed 2623.79 samples/sec   Loss 1.4469   LearningRate 0.0007   Epoch: 18   Global Step: 758860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:46,175-Speed 2634.42 samples/sec   Loss 1.4664   LearningRate 0.0007   Epoch: 18   Global Step: 758870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:50,066-Speed 2632.50 samples/sec   Loss 1.4221   LearningRate 0.0007   Epoch: 18   Global Step: 758880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:53,961-Speed 2629.58 samples/sec   Loss 1.5145   LearningRate 0.0007   Epoch: 18   Global Step: 758890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:00:57,856-Speed 2630.00 samples/sec   Loss 1.4933   LearningRate 0.0007   Epoch: 18   Global Step: 758900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:01,749-Speed 2631.23 samples/sec   Loss 1.4635   LearningRate 0.0007   Epoch: 18   Global Step: 758910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:05,644-Speed 2629.38 samples/sec   Loss 1.4420   LearningRate 0.0007   Epoch: 18   Global Step: 758920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:09,540-Speed 2629.02 samples/sec   Loss 1.5186   LearningRate 0.0007   Epoch: 18   Global Step: 758930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:01:13,435-Speed 2629.70 samples/sec   Loss 1.4539   LearningRate 0.0007   Epoch: 18   Global Step: 758940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:01:17,337-Speed 2624.55 samples/sec   Loss 1.4511   LearningRate 0.0007   Epoch: 18   Global Step: 758950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:01:21,210-Speed 2645.40 samples/sec   Loss 1.4803   LearningRate 0.0007   Epoch: 18   Global Step: 758960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:25,102-Speed 2631.53 samples/sec   Loss 1.4095   LearningRate 0.0007   Epoch: 18   Global Step: 758970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:28,996-Speed 2631.01 samples/sec   Loss 1.4946   LearningRate 0.0007   Epoch: 18   Global Step: 758980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:32,889-Speed 2631.20 samples/sec   Loss 1.4198   LearningRate 0.0007   Epoch: 18   Global Step: 758990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:36,809-Speed 2612.52 samples/sec   Loss 1.4569   LearningRate 0.0007   Epoch: 18   Global Step: 759000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:40,704-Speed 2629.67 samples/sec   Loss 1.4361   LearningRate 0.0007   Epoch: 18   Global Step: 759010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:44,601-Speed 2628.96 samples/sec   Loss 1.4421   LearningRate 0.0007   Epoch: 18   Global Step: 759020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:48,499-Speed 2627.49 samples/sec   Loss 1.4321   LearningRate 0.0007   Epoch: 18   Global Step: 759030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:52,401-Speed 2624.86 samples/sec   Loss 1.4197   LearningRate 0.0007   Epoch: 18   Global Step: 759040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:01:56,319-Speed 2614.54 samples/sec   Loss 1.4385   LearningRate 0.0007   Epoch: 18   Global Step: 759050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:00,236-Speed 2614.64 samples/sec   Loss 1.4408   LearningRate 0.0007   Epoch: 18   Global Step: 759060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:02:04,141-Speed 2622.87 samples/sec   Loss 1.4872   LearningRate 0.0007   Epoch: 18   Global Step: 759070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:02:08,037-Speed 2628.69 samples/sec   Loss 1.4447   LearningRate 0.0007   Epoch: 18   Global Step: 759080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:02:11,937-Speed 2626.84 samples/sec   Loss 1.4072   LearningRate 0.0007   Epoch: 18   Global Step: 759090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:02:15,840-Speed 2624.08 samples/sec   Loss 1.5108   LearningRate 0.0007   Epoch: 18   Global Step: 759100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:19,735-Speed 2629.76 samples/sec   Loss 1.4278   LearningRate 0.0007   Epoch: 18   Global Step: 759110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:23,641-Speed 2622.82 samples/sec   Loss 1.4222   LearningRate 0.0007   Epoch: 18   Global Step: 759120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:27,548-Speed 2621.81 samples/sec   Loss 1.4731   LearningRate 0.0007   Epoch: 18   Global Step: 759130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:31,444-Speed 2628.41 samples/sec   Loss 1.4726   LearningRate 0.0007   Epoch: 18   Global Step: 759140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:35,340-Speed 2629.85 samples/sec   Loss 1.4326   LearningRate 0.0007   Epoch: 18   Global Step: 759150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:39,233-Speed 2631.44 samples/sec   Loss 1.4346   LearningRate 0.0007   Epoch: 18   Global Step: 759160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:43,133-Speed 2625.88 samples/sec   Loss 1.4027   LearningRate 0.0007   Epoch: 18   Global Step: 759170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:47,034-Speed 2626.59 samples/sec   Loss 1.4901   LearningRate 0.0007   Epoch: 18   Global Step: 759180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:50,927-Speed 2630.70 samples/sec   Loss 1.4478   LearningRate 0.0007   Epoch: 18   Global Step: 759190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:02:54,824-Speed 2628.07 samples/sec   Loss 1.4543   LearningRate 0.0007   Epoch: 18   Global Step: 759200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:02:58,750-Speed 2608.92 samples/sec   Loss 1.4385   LearningRate 0.0007   Epoch: 18   Global Step: 759210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:02,658-Speed 2621.52 samples/sec   Loss 1.4641   LearningRate 0.0007   Epoch: 18   Global Step: 759220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:06,554-Speed 2628.78 samples/sec   Loss 1.4277   LearningRate 0.0007   Epoch: 18   Global Step: 759230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:10,446-Speed 2631.36 samples/sec   Loss 1.4348   LearningRate 0.0007   Epoch: 18   Global Step: 759240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:14,368-Speed 2611.50 samples/sec   Loss 1.4291   LearningRate 0.0007   Epoch: 18   Global Step: 759250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:18,266-Speed 2628.59 samples/sec   Loss 1.4119   LearningRate 0.0007   Epoch: 18   Global Step: 759260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:22,163-Speed 2627.91 samples/sec   Loss 1.4433   LearningRate 0.0007   Epoch: 18   Global Step: 759270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:26,054-Speed 2632.61 samples/sec   Loss 1.4133   LearningRate 0.0007   Epoch: 18   Global Step: 759280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:29,948-Speed 2629.99 samples/sec   Loss 1.4060   LearningRate 0.0007   Epoch: 18   Global Step: 759290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:33,841-Speed 2631.37 samples/sec   Loss 1.4603   LearningRate 0.0007   Epoch: 18   Global Step: 759300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-16 09:03:37,715-Speed 2644.04 samples/sec   Loss 1.4597   LearningRate 0.0007   Epoch: 18   Global Step: 759310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:41,608-Speed 2631.25 samples/sec   Loss 1.4895   LearningRate 0.0007   Epoch: 18   Global Step: 759320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:45,502-Speed 2630.34 samples/sec   Loss 1.4282   LearningRate 0.0007   Epoch: 18   Global Step: 759330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:03:49,375-Speed 2644.58 samples/sec   Loss 1.4116   LearningRate 0.0007   Epoch: 18   Global Step: 759340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:03:53,268-Speed 2631.19 samples/sec   Loss 1.4671   LearningRate 0.0007   Epoch: 18   Global Step: 759350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:03:57,165-Speed 2628.44 samples/sec   Loss 1.4298   LearningRate 0.0007   Epoch: 18   Global Step: 759360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:01,061-Speed 2628.18 samples/sec   Loss 1.4892   LearningRate 0.0007   Epoch: 18   Global Step: 759370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:04,956-Speed 2630.03 samples/sec   Loss 1.4471   LearningRate 0.0007   Epoch: 18   Global Step: 759380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:08,849-Speed 2631.49 samples/sec   Loss 1.4154   LearningRate 0.0007   Epoch: 18   Global Step: 759390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:12,747-Speed 2628.02 samples/sec   Loss 1.4585   LearningRate 0.0007   Epoch: 18   Global Step: 759400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:16,643-Speed 2628.96 samples/sec   Loss 1.4596   LearningRate 0.0007   Epoch: 18   Global Step: 759410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:20,540-Speed 2628.78 samples/sec   Loss 1.4063   LearningRate 0.0007   Epoch: 18   Global Step: 759420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:24,434-Speed 2629.98 samples/sec   Loss 1.3812   LearningRate 0.0007   Epoch: 18   Global Step: 759430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:28,329-Speed 2629.70 samples/sec   Loss 1.4548   LearningRate 0.0007   Epoch: 18   Global Step: 759440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:04:32,200-Speed 2646.10 samples/sec   Loss 1.4911   LearningRate 0.0007   Epoch: 18   Global Step: 759450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:36,094-Speed 2630.05 samples/sec   Loss 1.4798   LearningRate 0.0007   Epoch: 18   Global Step: 759460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:39,989-Speed 2630.49 samples/sec   Loss 1.4456   LearningRate 0.0007   Epoch: 18   Global Step: 759470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:43,977-Speed 2569.52 samples/sec   Loss 1.4114   LearningRate 0.0007   Epoch: 18   Global Step: 759480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:48,029-Speed 2527.23 samples/sec   Loss 1.4383   LearningRate 0.0007   Epoch: 18   Global Step: 759490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:51,928-Speed 2628.48 samples/sec   Loss 1.4056   LearningRate 0.0007   Epoch: 18   Global Step: 759500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:55,816-Speed 2634.06 samples/sec   Loss 1.4465   LearningRate 0.0007   Epoch: 18   Global Step: 759510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:04:59,709-Speed 2630.89 samples/sec   Loss 1.4841   LearningRate 0.0007   Epoch: 18   Global Step: 759520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:03,606-Speed 2628.14 samples/sec   Loss 1.4615   LearningRate 0.0007   Epoch: 18   Global Step: 759530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:07,500-Speed 2631.19 samples/sec   Loss 1.4260   LearningRate 0.0007   Epoch: 18   Global Step: 759540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:11,392-Speed 2631.60 samples/sec   Loss 1.4333   LearningRate 0.0007   Epoch: 18   Global Step: 759550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:05:15,286-Speed 2630.68 samples/sec   Loss 1.3812   LearningRate 0.0007   Epoch: 18   Global Step: 759560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:05:19,394-Speed 2493.12 samples/sec   Loss 1.4614   LearningRate 0.0007   Epoch: 18   Global Step: 759570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:23,335-Speed 2598.62 samples/sec   Loss 1.4344   LearningRate 0.0007   Epoch: 18   Global Step: 759580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:27,237-Speed 2626.00 samples/sec   Loss 1.5050   LearningRate 0.0007   Epoch: 18   Global Step: 759590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:31,154-Speed 2614.53 samples/sec   Loss 1.4200   LearningRate 0.0007   Epoch: 18   Global Step: 759600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:35,066-Speed 2618.18 samples/sec   Loss 1.4188   LearningRate 0.0007   Epoch: 18   Global Step: 759610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:38,958-Speed 2631.76 samples/sec   Loss 1.4559   LearningRate 0.0007   Epoch: 18   Global Step: 759620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:05:42,832-Speed 2644.20 samples/sec   Loss 1.5031   LearningRate 0.0007   Epoch: 18   Global Step: 759630   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:05:46,735-Speed 2624.43 samples/sec   Loss 1.4630   LearningRate 0.0007   Epoch: 18   Global Step: 759640   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:05:50,648-Speed 2617.79 samples/sec   Loss 1.3788   LearningRate 0.0007   Epoch: 18   Global Step: 759650   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:05:54,545-Speed 2628.27 samples/sec   Loss 1.5607   LearningRate 0.0007   Epoch: 18   Global Step: 759660   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:05:58,458-Speed 2618.34 samples/sec   Loss 1.4251   LearningRate 0.0007   Epoch: 18   Global Step: 759670   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:06:02,351-Speed 2630.67 samples/sec   Loss 1.4248   LearningRate 0.0007   Epoch: 18   Global Step: 759680   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:06:06,250-Speed 2626.95 samples/sec   Loss 1.4081   LearningRate 0.0007   Epoch: 18   Global Step: 759690   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:06:10,154-Speed 2623.47 samples/sec   Loss 1.3863   LearningRate 0.0007   Epoch: 18   Global Step: 759700   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:06:14,077-Speed 2611.40 samples/sec   Loss 1.4248   LearningRate 0.0007   Epoch: 18   Global Step: 759710   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:06:17,967-Speed 2633.05 samples/sec   Loss 1.4548   LearningRate 0.0007   Epoch: 18   Global Step: 759720   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:06:21,859-Speed 2631.64 samples/sec   Loss 1.4344   LearningRate 0.0007   Epoch: 18   Global Step: 759730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:25,762-Speed 2624.67 samples/sec   Loss 1.3938   LearningRate 0.0007   Epoch: 18   Global Step: 759740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:29,656-Speed 2630.55 samples/sec   Loss 1.3951   LearningRate 0.0007   Epoch: 18   Global Step: 759750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:33,558-Speed 2624.76 samples/sec   Loss 1.4559   LearningRate 0.0007   Epoch: 18   Global Step: 759760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:37,475-Speed 2614.98 samples/sec   Loss 1.3812   LearningRate 0.0007   Epoch: 18   Global Step: 759770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:41,368-Speed 2631.05 samples/sec   Loss 1.4691   LearningRate 0.0007   Epoch: 18   Global Step: 759780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:45,264-Speed 2629.33 samples/sec   Loss 1.4531   LearningRate 0.0007   Epoch: 18   Global Step: 759790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:49,155-Speed 2632.25 samples/sec   Loss 1.4287   LearningRate 0.0007   Epoch: 18   Global Step: 759800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:53,052-Speed 2628.39 samples/sec   Loss 1.4574   LearningRate 0.0007   Epoch: 18   Global Step: 759810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:06:56,946-Speed 2630.29 samples/sec   Loss 1.4307   LearningRate 0.0007   Epoch: 18   Global Step: 759820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:00,838-Speed 2631.98 samples/sec   Loss 1.3858   LearningRate 0.0007   Epoch: 18   Global Step: 759830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:07:04,740-Speed 2624.74 samples/sec   Loss 1.4219   LearningRate 0.0007   Epoch: 18   Global Step: 759840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:07:08,642-Speed 2625.20 samples/sec   Loss 1.4838   LearningRate 0.0007   Epoch: 18   Global Step: 759850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:07:12,541-Speed 2627.26 samples/sec   Loss 1.4341   LearningRate 0.0007   Epoch: 18   Global Step: 759860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:07:16,415-Speed 2643.15 samples/sec   Loss 1.4366   LearningRate 0.0007   Epoch: 18   Global Step: 759870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:20,314-Speed 2627.29 samples/sec   Loss 1.4882   LearningRate 0.0007   Epoch: 18   Global Step: 759880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:24,208-Speed 2630.07 samples/sec   Loss 1.4472   LearningRate 0.0007   Epoch: 18   Global Step: 759890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:28,102-Speed 2631.07 samples/sec   Loss 1.4551   LearningRate 0.0007   Epoch: 18   Global Step: 759900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:31,998-Speed 2628.37 samples/sec   Loss 1.4684   LearningRate 0.0007   Epoch: 18   Global Step: 759910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:35,893-Speed 2630.02 samples/sec   Loss 1.4647   LearningRate 0.0007   Epoch: 18   Global Step: 759920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:39,789-Speed 2628.68 samples/sec   Loss 1.4026   LearningRate 0.0007   Epoch: 18   Global Step: 759930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:43,690-Speed 2625.82 samples/sec   Loss 1.4287   LearningRate 0.0007   Epoch: 18   Global Step: 759940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:47,627-Speed 2601.84 samples/sec   Loss 1.4218   LearningRate 0.0007   Epoch: 18   Global Step: 759950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:51,540-Speed 2617.54 samples/sec   Loss 1.4504   LearningRate 0.0007   Epoch: 18   Global Step: 759960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:07:55,436-Speed 2628.89 samples/sec   Loss 1.4057   LearningRate 0.0007   Epoch: 18   Global Step: 759970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:07:59,311-Speed 2643.47 samples/sec   Loss 1.4304   LearningRate 0.0007   Epoch: 18   Global Step: 759980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:08:03,241-Speed 2606.38 samples/sec   Loss 1.4030   LearningRate 0.0007   Epoch: 18   Global Step: 759990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:08:07,136-Speed 2629.75 samples/sec   Loss 1.4298   LearningRate 0.0007   Epoch: 18   Global Step: 760000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:08:49,861-[lfw][760000]XNorm: 21.754629
Training: 2022-04-16 09:08:49,862-[lfw][760000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 09:08:49,862-[lfw][760000]Accuracy-Highest: 0.99850
Training: 2022-04-16 09:09:39,643-[cfp_fp][760000]XNorm: 21.988936
Training: 2022-04-16 09:09:39,644-[cfp_fp][760000]Accuracy-Flip: 0.99300+-0.00353
Training: 2022-04-16 09:09:39,645-[cfp_fp][760000]Accuracy-Highest: 0.99329
Training: 2022-04-16 09:10:22,514-[agedb_30][760000]XNorm: 22.642308
Training: 2022-04-16 09:10:22,515-[agedb_30][760000]Accuracy-Flip: 0.98450+-0.00601
Training: 2022-04-16 09:10:22,516-[agedb_30][760000]Accuracy-Highest: 0.98450
Training: 2022-04-16 09:10:26,412-Speed 73.52 samples/sec   Loss 1.4351   LearningRate 0.0007   Epoch: 18   Global Step: 760010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:10:30,282-Speed 2646.06 samples/sec   Loss 1.4549   LearningRate 0.0007   Epoch: 18   Global Step: 760020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:10:34,164-Speed 2638.76 samples/sec   Loss 1.4166   LearningRate 0.0007   Epoch: 18   Global Step: 760030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:10:38,042-Speed 2641.03 samples/sec   Loss 1.4299   LearningRate 0.0007   Epoch: 18   Global Step: 760040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:10:41,920-Speed 2641.26 samples/sec   Loss 1.4065   LearningRate 0.0007   Epoch: 18   Global Step: 760050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:10:45,822-Speed 2624.80 samples/sec   Loss 1.4205   LearningRate 0.0007   Epoch: 18   Global Step: 760060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:10:49,700-Speed 2641.65 samples/sec   Loss 1.4710   LearningRate 0.0007   Epoch: 18   Global Step: 760070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:10:53,585-Speed 2636.71 samples/sec   Loss 1.4791   LearningRate 0.0007   Epoch: 18   Global Step: 760080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:10:57,492-Speed 2621.03 samples/sec   Loss 1.4187   LearningRate 0.0007   Epoch: 18   Global Step: 760090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:11:01,381-Speed 2635.91 samples/sec   Loss 1.4744   LearningRate 0.0007   Epoch: 18   Global Step: 760100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:11:05,267-Speed 2635.24 samples/sec   Loss 1.3695   LearningRate 0.0007   Epoch: 18   Global Step: 760110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:11:09,147-Speed 2640.22 samples/sec   Loss 1.4083   LearningRate 0.0007   Epoch: 18   Global Step: 760120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:13,041-Speed 2630.51 samples/sec   Loss 1.4247   LearningRate 0.0007   Epoch: 18   Global Step: 760130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:16,937-Speed 2629.04 samples/sec   Loss 1.4886   LearningRate 0.0007   Epoch: 18   Global Step: 760140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:20,842-Speed 2622.85 samples/sec   Loss 1.4603   LearningRate 0.0007   Epoch: 18   Global Step: 760150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:24,747-Speed 2623.34 samples/sec   Loss 1.4172   LearningRate 0.0007   Epoch: 18   Global Step: 760160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:28,649-Speed 2625.14 samples/sec   Loss 1.4437   LearningRate 0.0007   Epoch: 18   Global Step: 760170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:32,577-Speed 2608.67 samples/sec   Loss 1.4389   LearningRate 0.0007   Epoch: 18   Global Step: 760180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:36,475-Speed 2627.48 samples/sec   Loss 1.4584   LearningRate 0.0007   Epoch: 18   Global Step: 760190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:40,381-Speed 2622.90 samples/sec   Loss 1.4618   LearningRate 0.0007   Epoch: 18   Global Step: 760200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:44,283-Speed 2624.35 samples/sec   Loss 1.4070   LearningRate 0.0007   Epoch: 18   Global Step: 760210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:48,189-Speed 2622.24 samples/sec   Loss 1.4170   LearningRate 0.0007   Epoch: 18   Global Step: 760220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:11:52,091-Speed 2625.36 samples/sec   Loss 1.4200   LearningRate 0.0007   Epoch: 18   Global Step: 760230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:11:55,970-Speed 2640.56 samples/sec   Loss 1.4568   LearningRate 0.0007   Epoch: 18   Global Step: 760240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:11:59,866-Speed 2629.15 samples/sec   Loss 1.4328   LearningRate 0.0007   Epoch: 18   Global Step: 760250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:03,777-Speed 2618.86 samples/sec   Loss 1.4420   LearningRate 0.0007   Epoch: 18   Global Step: 760260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:07,681-Speed 2623.98 samples/sec   Loss 1.4191   LearningRate 0.0007   Epoch: 18   Global Step: 760270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:11,576-Speed 2629.67 samples/sec   Loss 1.4818   LearningRate 0.0007   Epoch: 18   Global Step: 760280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:15,480-Speed 2622.93 samples/sec   Loss 1.4459   LearningRate 0.0007   Epoch: 18   Global Step: 760290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:19,380-Speed 2626.81 samples/sec   Loss 1.4455   LearningRate 0.0007   Epoch: 18   Global Step: 760300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:23,278-Speed 2627.74 samples/sec   Loss 1.4242   LearningRate 0.0007   Epoch: 18   Global Step: 760310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:27,212-Speed 2603.35 samples/sec   Loss 1.4208   LearningRate 0.0007   Epoch: 18   Global Step: 760320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:31,114-Speed 2625.85 samples/sec   Loss 1.4503   LearningRate 0.0007   Epoch: 18   Global Step: 760330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:35,034-Speed 2612.87 samples/sec   Loss 1.3928   LearningRate 0.0007   Epoch: 18   Global Step: 760340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:12:38,935-Speed 2625.87 samples/sec   Loss 1.3610   LearningRate 0.0007   Epoch: 18   Global Step: 760350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:12:42,846-Speed 2619.20 samples/sec   Loss 1.4035   LearningRate 0.0007   Epoch: 18   Global Step: 760360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:12:46,740-Speed 2630.11 samples/sec   Loss 1.4407   LearningRate 0.0007   Epoch: 18   Global Step: 760370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:12:50,615-Speed 2643.08 samples/sec   Loss 1.3823   LearningRate 0.0007   Epoch: 18   Global Step: 760380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:54,509-Speed 2630.64 samples/sec   Loss 1.4239   LearningRate 0.0007   Epoch: 18   Global Step: 760390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:12:58,432-Speed 2611.02 samples/sec   Loss 1.4387   LearningRate 0.0007   Epoch: 18   Global Step: 760400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:02,328-Speed 2629.17 samples/sec   Loss 1.4343   LearningRate 0.0007   Epoch: 18   Global Step: 760410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:06,219-Speed 2633.50 samples/sec   Loss 1.4733   LearningRate 0.0007   Epoch: 18   Global Step: 760420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:10,115-Speed 2629.06 samples/sec   Loss 1.4356   LearningRate 0.0007   Epoch: 18   Global Step: 760430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:14,009-Speed 2631.16 samples/sec   Loss 1.4500   LearningRate 0.0007   Epoch: 18   Global Step: 760440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:17,903-Speed 2630.06 samples/sec   Loss 1.4424   LearningRate 0.0007   Epoch: 18   Global Step: 760450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:21,796-Speed 2630.61 samples/sec   Loss 1.4926   LearningRate 0.0007   Epoch: 18   Global Step: 760460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:25,687-Speed 2632.01 samples/sec   Loss 1.4343   LearningRate 0.0007   Epoch: 18   Global Step: 760470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:29,579-Speed 2632.28 samples/sec   Loss 1.4315   LearningRate 0.0007   Epoch: 18   Global Step: 760480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:13:33,455-Speed 2642.89 samples/sec   Loss 1.3926   LearningRate 0.0007   Epoch: 18   Global Step: 760490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:37,364-Speed 2620.21 samples/sec   Loss 1.3759   LearningRate 0.0007   Epoch: 18   Global Step: 760500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:41,268-Speed 2623.68 samples/sec   Loss 1.4486   LearningRate 0.0007   Epoch: 18   Global Step: 760510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:45,234-Speed 2582.86 samples/sec   Loss 1.4480   LearningRate 0.0007   Epoch: 18   Global Step: 760520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:49,132-Speed 2626.90 samples/sec   Loss 1.4293   LearningRate 0.0007   Epoch: 18   Global Step: 760530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:53,028-Speed 2629.13 samples/sec   Loss 1.3983   LearningRate 0.0007   Epoch: 18   Global Step: 760540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:13:56,928-Speed 2626.01 samples/sec   Loss 1.4358   LearningRate 0.0007   Epoch: 18   Global Step: 760550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:00,824-Speed 2629.55 samples/sec   Loss 1.4415   LearningRate 0.0007   Epoch: 18   Global Step: 760560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:04,716-Speed 2631.46 samples/sec   Loss 1.4137   LearningRate 0.0007   Epoch: 18   Global Step: 760570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:08,608-Speed 2631.78 samples/sec   Loss 1.4227   LearningRate 0.0007   Epoch: 18   Global Step: 760580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:12,501-Speed 2631.42 samples/sec   Loss 1.4390   LearningRate 0.0007   Epoch: 18   Global Step: 760590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:14:16,393-Speed 2631.92 samples/sec   Loss 1.4583   LearningRate 0.0007   Epoch: 18   Global Step: 760600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:14:20,284-Speed 2632.09 samples/sec   Loss 1.4384   LearningRate 0.0007   Epoch: 18   Global Step: 760610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:14:24,195-Speed 2618.71 samples/sec   Loss 1.4320   LearningRate 0.0007   Epoch: 18   Global Step: 760620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:14:28,072-Speed 2641.61 samples/sec   Loss 1.4077   LearningRate 0.0007   Epoch: 18   Global Step: 760630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:31,971-Speed 2626.79 samples/sec   Loss 1.4906   LearningRate 0.0007   Epoch: 18   Global Step: 760640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:35,872-Speed 2625.71 samples/sec   Loss 1.4170   LearningRate 0.0007   Epoch: 18   Global Step: 760650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:39,779-Speed 2621.80 samples/sec   Loss 1.4010   LearningRate 0.0007   Epoch: 18   Global Step: 760660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:43,678-Speed 2626.80 samples/sec   Loss 1.4758   LearningRate 0.0007   Epoch: 18   Global Step: 760670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:47,579-Speed 2625.79 samples/sec   Loss 1.3825   LearningRate 0.0007   Epoch: 18   Global Step: 760680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:51,479-Speed 2625.64 samples/sec   Loss 1.4714   LearningRate 0.0007   Epoch: 18   Global Step: 760690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:55,377-Speed 2627.81 samples/sec   Loss 1.4751   LearningRate 0.0007   Epoch: 18   Global Step: 760700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:14:59,279-Speed 2625.36 samples/sec   Loss 1.4022   LearningRate 0.0007   Epoch: 18   Global Step: 760710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:03,209-Speed 2606.09 samples/sec   Loss 1.4061   LearningRate 0.0007   Epoch: 18   Global Step: 760720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:07,106-Speed 2628.16 samples/sec   Loss 1.4033   LearningRate 0.0007   Epoch: 18   Global Step: 760730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:10,999-Speed 2630.70 samples/sec   Loss 1.4103   LearningRate 0.0007   Epoch: 18   Global Step: 760740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:14,961-Speed 2586.06 samples/sec   Loss 1.4466   LearningRate 0.0007   Epoch: 18   Global Step: 760750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:18,879-Speed 2613.82 samples/sec   Loss 1.3900   LearningRate 0.0007   Epoch: 18   Global Step: 760760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:22,778-Speed 2627.35 samples/sec   Loss 1.4110   LearningRate 0.0007   Epoch: 18   Global Step: 760770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:26,673-Speed 2629.64 samples/sec   Loss 1.4268   LearningRate 0.0007   Epoch: 18   Global Step: 760780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:30,568-Speed 2630.05 samples/sec   Loss 1.4374   LearningRate 0.0007   Epoch: 18   Global Step: 760790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:34,461-Speed 2630.41 samples/sec   Loss 1.4271   LearningRate 0.0007   Epoch: 18   Global Step: 760800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:38,356-Speed 2629.41 samples/sec   Loss 1.4370   LearningRate 0.0007   Epoch: 18   Global Step: 760810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:42,260-Speed 2623.60 samples/sec   Loss 1.4284   LearningRate 0.0007   Epoch: 18   Global Step: 760820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:46,156-Speed 2629.15 samples/sec   Loss 1.4212   LearningRate 0.0007   Epoch: 18   Global Step: 760830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:15:50,033-Speed 2642.45 samples/sec   Loss 1.3973   LearningRate 0.0007   Epoch: 18   Global Step: 760840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:53,932-Speed 2627.26 samples/sec   Loss 1.4626   LearningRate 0.0007   Epoch: 18   Global Step: 760850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:15:57,828-Speed 2628.93 samples/sec   Loss 1.4579   LearningRate 0.0007   Epoch: 18   Global Step: 760860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:01,724-Speed 2629.00 samples/sec   Loss 1.4438   LearningRate 0.0007   Epoch: 18   Global Step: 760870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:05,618-Speed 2630.27 samples/sec   Loss 1.4503   LearningRate 0.0007   Epoch: 18   Global Step: 760880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:09,514-Speed 2628.67 samples/sec   Loss 1.4420   LearningRate 0.0007   Epoch: 18   Global Step: 760890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:13,413-Speed 2627.33 samples/sec   Loss 1.4535   LearningRate 0.0007   Epoch: 18   Global Step: 760900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:17,310-Speed 2628.30 samples/sec   Loss 1.3944   LearningRate 0.0007   Epoch: 18   Global Step: 760910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:21,207-Speed 2628.55 samples/sec   Loss 1.4320   LearningRate 0.0007   Epoch: 18   Global Step: 760920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:25,099-Speed 2631.82 samples/sec   Loss 1.4127   LearningRate 0.0007   Epoch: 18   Global Step: 760930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:28,976-Speed 2642.46 samples/sec   Loss 1.4659   LearningRate 0.0007   Epoch: 18   Global Step: 760940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:32,930-Speed 2590.21 samples/sec   Loss 1.4403   LearningRate 0.0007   Epoch: 18   Global Step: 760950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:37,022-Speed 2502.43 samples/sec   Loss 1.4361   LearningRate 0.0007   Epoch: 18   Global Step: 760960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:40,975-Speed 2591.44 samples/sec   Loss 1.4347   LearningRate 0.0007   Epoch: 18   Global Step: 760970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:44,891-Speed 2615.98 samples/sec   Loss 1.4355   LearningRate 0.0007   Epoch: 18   Global Step: 760980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:48,815-Speed 2609.94 samples/sec   Loss 1.4558   LearningRate 0.0007   Epoch: 18   Global Step: 760990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:52,711-Speed 2629.44 samples/sec   Loss 1.4700   LearningRate 0.0007   Epoch: 18   Global Step: 761000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:16:56,611-Speed 2626.64 samples/sec   Loss 1.4957   LearningRate 0.0007   Epoch: 18   Global Step: 761010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:00,511-Speed 2626.74 samples/sec   Loss 1.4365   LearningRate 0.0007   Epoch: 18   Global Step: 761020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:04,407-Speed 2628.81 samples/sec   Loss 1.3970   LearningRate 0.0007   Epoch: 18   Global Step: 761030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:08,304-Speed 2627.98 samples/sec   Loss 1.4654   LearningRate 0.0007   Epoch: 18   Global Step: 761040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:17:12,185-Speed 2638.89 samples/sec   Loss 1.3909   LearningRate 0.0007   Epoch: 18   Global Step: 761050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:16,083-Speed 2628.11 samples/sec   Loss 1.4501   LearningRate 0.0007   Epoch: 18   Global Step: 761060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:19,982-Speed 2627.15 samples/sec   Loss 1.3866   LearningRate 0.0007   Epoch: 18   Global Step: 761070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:23,891-Speed 2620.49 samples/sec   Loss 1.4338   LearningRate 0.0007   Epoch: 18   Global Step: 761080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:27,788-Speed 2628.12 samples/sec   Loss 1.5091   LearningRate 0.0007   Epoch: 18   Global Step: 761090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:31,748-Speed 2587.26 samples/sec   Loss 1.4678   LearningRate 0.0007   Epoch: 18   Global Step: 761100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:35,659-Speed 2618.11 samples/sec   Loss 1.4528   LearningRate 0.0007   Epoch: 18   Global Step: 761110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:39,567-Speed 2620.95 samples/sec   Loss 1.4786   LearningRate 0.0007   Epoch: 18   Global Step: 761120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:43,465-Speed 2627.24 samples/sec   Loss 1.4654   LearningRate 0.0007   Epoch: 18   Global Step: 761130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:47,364-Speed 2627.49 samples/sec   Loss 1.4071   LearningRate 0.0007   Epoch: 18   Global Step: 761140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:17:51,266-Speed 2625.78 samples/sec   Loss 1.4778   LearningRate 0.0007   Epoch: 18   Global Step: 761150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:17:55,169-Speed 2624.02 samples/sec   Loss 1.4815   LearningRate 0.0007   Epoch: 18   Global Step: 761160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:17:59,067-Speed 2628.55 samples/sec   Loss 1.4448   LearningRate 0.0007   Epoch: 18   Global Step: 761170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:18:02,984-Speed 2614.19 samples/sec   Loss 1.4902   LearningRate 0.0007   Epoch: 18   Global Step: 761180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:18:06,886-Speed 2625.28 samples/sec   Loss 1.4030   LearningRate 0.0007   Epoch: 18   Global Step: 761190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:18:10,760-Speed 2643.58 samples/sec   Loss 1.4367   LearningRate 0.0007   Epoch: 18   Global Step: 761200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:18:14,668-Speed 2621.09 samples/sec   Loss 1.4383   LearningRate 0.0007   Epoch: 18   Global Step: 761210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:18:18,574-Speed 2622.52 samples/sec   Loss 1.3823   LearningRate 0.0007   Epoch: 18   Global Step: 761220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:18:22,475-Speed 2625.02 samples/sec   Loss 1.3823   LearningRate 0.0007   Epoch: 18   Global Step: 761230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:18:26,384-Speed 2620.93 samples/sec   Loss 1.4315   LearningRate 0.0007   Epoch: 18   Global Step: 761240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:18:30,282-Speed 2627.80 samples/sec   Loss 1.4263   LearningRate 0.0007   Epoch: 18   Global Step: 761250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:18:34,194-Speed 2618.34 samples/sec   Loss 1.3894   LearningRate 0.0007   Epoch: 18   Global Step: 761260   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:18:38,105-Speed 2618.89 samples/sec   Loss 1.3796   LearningRate 0.0007   Epoch: 18   Global Step: 761270   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:18:42,013-Speed 2620.19 samples/sec   Loss 1.3515   LearningRate 0.0007   Epoch: 18   Global Step: 761280   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:18:45,911-Speed 2627.62 samples/sec   Loss 1.4201   LearningRate 0.0007   Epoch: 18   Global Step: 761290   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:18:49,820-Speed 2620.69 samples/sec   Loss 1.4133   LearningRate 0.0007   Epoch: 18   Global Step: 761300   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:18:53,716-Speed 2628.92 samples/sec   Loss 1.3939   LearningRate 0.0007   Epoch: 18   Global Step: 761310   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:18:57,613-Speed 2628.97 samples/sec   Loss 1.4477   LearningRate 0.0007   Epoch: 18   Global Step: 761320   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:01,518-Speed 2623.18 samples/sec   Loss 1.4369   LearningRate 0.0007   Epoch: 18   Global Step: 761330   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:05,423-Speed 2622.75 samples/sec   Loss 1.3894   LearningRate 0.0007   Epoch: 18   Global Step: 761340   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:09,322-Speed 2627.29 samples/sec   Loss 1.4022   LearningRate 0.0007   Epoch: 18   Global Step: 761350   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:13,225-Speed 2623.62 samples/sec   Loss 1.4129   LearningRate 0.0007   Epoch: 18   Global Step: 761360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:19:17,106-Speed 2639.31 samples/sec   Loss 1.4523   LearningRate 0.0007   Epoch: 18   Global Step: 761370   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:21,001-Speed 2630.20 samples/sec   Loss 1.4188   LearningRate 0.0007   Epoch: 18   Global Step: 761380   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:24,899-Speed 2627.04 samples/sec   Loss 1.4096   LearningRate 0.0007   Epoch: 18   Global Step: 761390   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:28,801-Speed 2626.22 samples/sec   Loss 1.4381   LearningRate 0.0007   Epoch: 18   Global Step: 761400   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:32,715-Speed 2616.44 samples/sec   Loss 1.4765   LearningRate 0.0007   Epoch: 18   Global Step: 761410   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:36,614-Speed 2627.46 samples/sec   Loss 1.4161   LearningRate 0.0007   Epoch: 18   Global Step: 761420   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:40,507-Speed 2630.47 samples/sec   Loss 1.4035   LearningRate 0.0007   Epoch: 18   Global Step: 761430   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:44,404-Speed 2628.20 samples/sec   Loss 1.3548   LearningRate 0.0007   Epoch: 18   Global Step: 761440   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:48,303-Speed 2627.23 samples/sec   Loss 1.3996   LearningRate 0.0007   Epoch: 18   Global Step: 761450   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:52,208-Speed 2623.11 samples/sec   Loss 1.3905   LearningRate 0.0007   Epoch: 18   Global Step: 761460   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:19:56,110-Speed 2625.22 samples/sec   Loss 1.4047   LearningRate 0.0007   Epoch: 18   Global Step: 761470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:00,007-Speed 2628.03 samples/sec   Loss 1.4314   LearningRate 0.0007   Epoch: 18   Global Step: 761480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:03,903-Speed 2629.12 samples/sec   Loss 1.4438   LearningRate 0.0007   Epoch: 18   Global Step: 761490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:07,802-Speed 2626.91 samples/sec   Loss 1.4038   LearningRate 0.0007   Epoch: 18   Global Step: 761500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:11,699-Speed 2627.99 samples/sec   Loss 1.4474   LearningRate 0.0007   Epoch: 18   Global Step: 761510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:15,603-Speed 2623.41 samples/sec   Loss 1.4491   LearningRate 0.0007   Epoch: 18   Global Step: 761520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:19,518-Speed 2616.78 samples/sec   Loss 1.3853   LearningRate 0.0007   Epoch: 18   Global Step: 761530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:23,419-Speed 2625.08 samples/sec   Loss 1.4286   LearningRate 0.0007   Epoch: 18   Global Step: 761540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:27,327-Speed 2622.03 samples/sec   Loss 1.4049   LearningRate 0.0007   Epoch: 18   Global Step: 761550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:20:31,203-Speed 2642.58 samples/sec   Loss 1.4410   LearningRate 0.0007   Epoch: 18   Global Step: 761560   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:20:35,102-Speed 2626.94 samples/sec   Loss 1.4453   LearningRate 0.0007   Epoch: 18   Global Step: 761570   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:20:39,007-Speed 2622.50 samples/sec   Loss 1.4190   LearningRate 0.0007   Epoch: 18   Global Step: 761580   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:20:42,910-Speed 2623.94 samples/sec   Loss 1.4314   LearningRate 0.0007   Epoch: 18   Global Step: 761590   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:20:46,814-Speed 2624.84 samples/sec   Loss 1.3943   LearningRate 0.0007   Epoch: 18   Global Step: 761600   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:20:50,726-Speed 2618.16 samples/sec   Loss 1.3831   LearningRate 0.0007   Epoch: 18   Global Step: 761610   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:20:54,630-Speed 2623.55 samples/sec   Loss 1.4272   LearningRate 0.0007   Epoch: 18   Global Step: 761620   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:20:58,528-Speed 2627.13 samples/sec   Loss 1.4282   LearningRate 0.0007   Epoch: 18   Global Step: 761630   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:21:02,428-Speed 2626.15 samples/sec   Loss 1.3408   LearningRate 0.0007   Epoch: 18   Global Step: 761640   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:21:06,336-Speed 2621.27 samples/sec   Loss 1.3918   LearningRate 0.0007   Epoch: 18   Global Step: 761650   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:21:10,238-Speed 2625.08 samples/sec   Loss 1.4407   LearningRate 0.0007   Epoch: 18   Global Step: 761660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:14,163-Speed 2609.42 samples/sec   Loss 1.4283   LearningRate 0.0007   Epoch: 18   Global Step: 761670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:18,175-Speed 2553.33 samples/sec   Loss 1.4388   LearningRate 0.0007   Epoch: 18   Global Step: 761680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:22,075-Speed 2626.39 samples/sec   Loss 1.4176   LearningRate 0.0007   Epoch: 18   Global Step: 761690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:25,976-Speed 2625.43 samples/sec   Loss 1.4515   LearningRate 0.0007   Epoch: 18   Global Step: 761700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:29,887-Speed 2619.32 samples/sec   Loss 1.4992   LearningRate 0.0007   Epoch: 18   Global Step: 761710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:33,800-Speed 2617.55 samples/sec   Loss 1.4282   LearningRate 0.0007   Epoch: 18   Global Step: 761720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:37,702-Speed 2624.81 samples/sec   Loss 1.3918   LearningRate 0.0007   Epoch: 18   Global Step: 761730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:41,625-Speed 2610.95 samples/sec   Loss 1.3877   LearningRate 0.0007   Epoch: 18   Global Step: 761740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:45,531-Speed 2622.88 samples/sec   Loss 1.3747   LearningRate 0.0007   Epoch: 18   Global Step: 761750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:21:49,433-Speed 2624.98 samples/sec   Loss 1.4061   LearningRate 0.0007   Epoch: 18   Global Step: 761760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:21:53,334-Speed 2625.63 samples/sec   Loss 1.4564   LearningRate 0.0007   Epoch: 18   Global Step: 761770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:21:57,216-Speed 2638.62 samples/sec   Loss 1.3563   LearningRate 0.0007   Epoch: 18   Global Step: 761780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:22:01,118-Speed 2625.04 samples/sec   Loss 1.4550   LearningRate 0.0007   Epoch: 18   Global Step: 761790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:22:05,026-Speed 2621.39 samples/sec   Loss 1.4155   LearningRate 0.0007   Epoch: 18   Global Step: 761800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:22:08,938-Speed 2618.04 samples/sec   Loss 1.4583   LearningRate 0.0007   Epoch: 18   Global Step: 761810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:22:12,864-Speed 2608.35 samples/sec   Loss 1.4381   LearningRate 0.0007   Epoch: 18   Global Step: 761820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:22:16,740-Speed 2643.21 samples/sec   Loss 1.4309   LearningRate 0.0007   Epoch: 18   Global Step: 761830   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:20,650-Speed 2619.99 samples/sec   Loss 1.3992   LearningRate 0.0007   Epoch: 18   Global Step: 761840   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:24,566-Speed 2615.25 samples/sec   Loss 1.4671   LearningRate 0.0007   Epoch: 18   Global Step: 761850   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:28,470-Speed 2623.94 samples/sec   Loss 1.3948   LearningRate 0.0007   Epoch: 18   Global Step: 761860   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:32,374-Speed 2623.67 samples/sec   Loss 1.4553   LearningRate 0.0007   Epoch: 18   Global Step: 761870   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:36,281-Speed 2621.43 samples/sec   Loss 1.4558   LearningRate 0.0007   Epoch: 18   Global Step: 761880   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:40,184-Speed 2624.66 samples/sec   Loss 1.4052   LearningRate 0.0007   Epoch: 18   Global Step: 761890   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:44,086-Speed 2624.92 samples/sec   Loss 1.4628   LearningRate 0.0007   Epoch: 18   Global Step: 761900   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:48,086-Speed 2560.42 samples/sec   Loss 1.4427   LearningRate 0.0007   Epoch: 18   Global Step: 761910   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:52,000-Speed 2616.80 samples/sec   Loss 1.4367   LearningRate 0.0007   Epoch: 18   Global Step: 761920   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-16 09:22:55,905-Speed 2623.20 samples/sec   Loss 1.4203   LearningRate 0.0007   Epoch: 18   Global Step: 761930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:22:59,802-Speed 2628.25 samples/sec   Loss 1.4395   LearningRate 0.0007   Epoch: 18   Global Step: 761940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:03,705-Speed 2623.85 samples/sec   Loss 1.4009   LearningRate 0.0007   Epoch: 18   Global Step: 761950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:07,608-Speed 2624.51 samples/sec   Loss 1.4262   LearningRate 0.0007   Epoch: 18   Global Step: 761960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:11,507-Speed 2626.60 samples/sec   Loss 1.3701   LearningRate 0.0007   Epoch: 18   Global Step: 761970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:15,407-Speed 2626.73 samples/sec   Loss 1.3985   LearningRate 0.0007   Epoch: 18   Global Step: 761980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:19,311-Speed 2623.90 samples/sec   Loss 1.4155   LearningRate 0.0007   Epoch: 18   Global Step: 761990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:23,209-Speed 2628.14 samples/sec   Loss 1.4164   LearningRate 0.0007   Epoch: 18   Global Step: 762000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:27,107-Speed 2627.80 samples/sec   Loss 1.4437   LearningRate 0.0007   Epoch: 18   Global Step: 762010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:31,007-Speed 2626.22 samples/sec   Loss 1.3696   LearningRate 0.0007   Epoch: 18   Global Step: 762020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:34,885-Speed 2641.34 samples/sec   Loss 1.4230   LearningRate 0.0007   Epoch: 18   Global Step: 762030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:38,796-Speed 2618.07 samples/sec   Loss 1.4229   LearningRate 0.0007   Epoch: 18   Global Step: 762040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:42,697-Speed 2625.47 samples/sec   Loss 1.4390   LearningRate 0.0007   Epoch: 18   Global Step: 762050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:46,595-Speed 2628.56 samples/sec   Loss 1.4041   LearningRate 0.0007   Epoch: 18   Global Step: 762060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:50,493-Speed 2627.69 samples/sec   Loss 1.4297   LearningRate 0.0007   Epoch: 18   Global Step: 762070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:54,397-Speed 2623.89 samples/sec   Loss 1.4226   LearningRate 0.0007   Epoch: 18   Global Step: 762080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:23:58,307-Speed 2618.76 samples/sec   Loss 1.4289   LearningRate 0.0007   Epoch: 18   Global Step: 762090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:02,205-Speed 2627.85 samples/sec   Loss 1.3729   LearningRate 0.0007   Epoch: 18   Global Step: 762100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:06,105-Speed 2626.46 samples/sec   Loss 1.4534   LearningRate 0.0007   Epoch: 18   Global Step: 762110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:10,004-Speed 2627.03 samples/sec   Loss 1.4574   LearningRate 0.0007   Epoch: 18   Global Step: 762120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:13,934-Speed 2606.07 samples/sec   Loss 1.4416   LearningRate 0.0007   Epoch: 18   Global Step: 762130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:24:17,843-Speed 2620.80 samples/sec   Loss 1.3927   LearningRate 0.0007   Epoch: 18   Global Step: 762140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:24:21,719-Speed 2642.93 samples/sec   Loss 1.4690   LearningRate 0.0007   Epoch: 18   Global Step: 762150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:25,622-Speed 2623.96 samples/sec   Loss 1.4681   LearningRate 0.0007   Epoch: 18   Global Step: 762160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:29,538-Speed 2615.36 samples/sec   Loss 1.4243   LearningRate 0.0007   Epoch: 18   Global Step: 762170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:33,451-Speed 2618.03 samples/sec   Loss 1.4212   LearningRate 0.0007   Epoch: 18   Global Step: 762180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:37,358-Speed 2621.71 samples/sec   Loss 1.4499   LearningRate 0.0007   Epoch: 18   Global Step: 762190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:41,257-Speed 2626.89 samples/sec   Loss 1.3920   LearningRate 0.0007   Epoch: 18   Global Step: 762200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:45,162-Speed 2622.70 samples/sec   Loss 1.3869   LearningRate 0.0007   Epoch: 18   Global Step: 762210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:49,078-Speed 2615.83 samples/sec   Loss 1.4476   LearningRate 0.0007   Epoch: 18   Global Step: 762220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:52,978-Speed 2626.06 samples/sec   Loss 1.4069   LearningRate 0.0007   Epoch: 18   Global Step: 762230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:24:56,877-Speed 2627.54 samples/sec   Loss 1.4271   LearningRate 0.0007   Epoch: 18   Global Step: 762240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:00,779-Speed 2624.70 samples/sec   Loss 1.4723   LearningRate 0.0007   Epoch: 18   Global Step: 762250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:25:04,686-Speed 2621.90 samples/sec   Loss 1.4004   LearningRate 0.0007   Epoch: 18   Global Step: 762260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:25:08,593-Speed 2621.56 samples/sec   Loss 1.4281   LearningRate 0.0007   Epoch: 18   Global Step: 762270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:25:12,494-Speed 2625.86 samples/sec   Loss 1.4082   LearningRate 0.0007   Epoch: 18   Global Step: 762280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:25:16,436-Speed 2598.06 samples/sec   Loss 1.3828   LearningRate 0.0007   Epoch: 18   Global Step: 762290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:25:20,316-Speed 2639.55 samples/sec   Loss 1.4224   LearningRate 0.0007   Epoch: 18   Global Step: 762300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:24,214-Speed 2627.63 samples/sec   Loss 1.4094   LearningRate 0.0007   Epoch: 18   Global Step: 762310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:28,127-Speed 2617.44 samples/sec   Loss 1.3976   LearningRate 0.0007   Epoch: 18   Global Step: 762320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:32,027-Speed 2626.47 samples/sec   Loss 1.4277   LearningRate 0.0007   Epoch: 18   Global Step: 762330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:35,935-Speed 2621.06 samples/sec   Loss 1.4311   LearningRate 0.0007   Epoch: 18   Global Step: 762340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:39,912-Speed 2576.05 samples/sec   Loss 1.4216   LearningRate 0.0007   Epoch: 18   Global Step: 762350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:43,811-Speed 2626.95 samples/sec   Loss 1.4252   LearningRate 0.0007   Epoch: 18   Global Step: 762360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:47,713-Speed 2624.38 samples/sec   Loss 1.4539   LearningRate 0.0007   Epoch: 18   Global Step: 762370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:51,612-Speed 2627.24 samples/sec   Loss 1.4170   LearningRate 0.0007   Epoch: 18   Global Step: 762380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:55,519-Speed 2621.84 samples/sec   Loss 1.3939   LearningRate 0.0007   Epoch: 18   Global Step: 762390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:25:59,402-Speed 2637.23 samples/sec   Loss 1.4551   LearningRate 0.0007   Epoch: 18   Global Step: 762400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:03,304-Speed 2625.94 samples/sec   Loss 1.3939   LearningRate 0.0007   Epoch: 18   Global Step: 762410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:07,205-Speed 2625.04 samples/sec   Loss 1.4331   LearningRate 0.0007   Epoch: 18   Global Step: 762420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:11,113-Speed 2621.65 samples/sec   Loss 1.3926   LearningRate 0.0007   Epoch: 18   Global Step: 762430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:15,028-Speed 2615.43 samples/sec   Loss 1.4026   LearningRate 0.0007   Epoch: 18   Global Step: 762440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:18,935-Speed 2621.90 samples/sec   Loss 1.3946   LearningRate 0.0007   Epoch: 18   Global Step: 762450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:22,844-Speed 2620.47 samples/sec   Loss 1.3981   LearningRate 0.0007   Epoch: 18   Global Step: 762460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:26,757-Speed 2617.77 samples/sec   Loss 1.4200   LearningRate 0.0007   Epoch: 18   Global Step: 762470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:30,675-Speed 2613.74 samples/sec   Loss 1.3643   LearningRate 0.0007   Epoch: 18   Global Step: 762480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:34,588-Speed 2617.96 samples/sec   Loss 1.4147   LearningRate 0.0007   Epoch: 18   Global Step: 762490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:26:38,495-Speed 2621.66 samples/sec   Loss 1.3806   LearningRate 0.0007   Epoch: 18   Global Step: 762500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:26:42,420-Speed 2609.76 samples/sec   Loss 1.4059   LearningRate 0.0007   Epoch: 18   Global Step: 762510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:26:46,331-Speed 2619.25 samples/sec   Loss 1.4861   LearningRate 0.0007   Epoch: 18   Global Step: 762520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:26:50,247-Speed 2615.13 samples/sec   Loss 1.4064   LearningRate 0.0007   Epoch: 18   Global Step: 762530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:26:54,149-Speed 2625.17 samples/sec   Loss 1.4490   LearningRate 0.0007   Epoch: 18   Global Step: 762540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:26:58,071-Speed 2611.21 samples/sec   Loss 1.4057   LearningRate 0.0007   Epoch: 18   Global Step: 762550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:27:01,963-Speed 2631.55 samples/sec   Loss 1.3653   LearningRate 0.0007   Epoch: 18   Global Step: 762560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:05,870-Speed 2621.42 samples/sec   Loss 1.4286   LearningRate 0.0007   Epoch: 18   Global Step: 762570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:09,789-Speed 2614.13 samples/sec   Loss 1.4122   LearningRate 0.0007   Epoch: 18   Global Step: 762580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:13,690-Speed 2625.47 samples/sec   Loss 1.3777   LearningRate 0.0007   Epoch: 18   Global Step: 762590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:17,596-Speed 2622.75 samples/sec   Loss 1.4572   LearningRate 0.0007   Epoch: 18   Global Step: 762600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:21,503-Speed 2621.49 samples/sec   Loss 1.4109   LearningRate 0.0007   Epoch: 18   Global Step: 762610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:25,403-Speed 2626.03 samples/sec   Loss 1.3874   LearningRate 0.0007   Epoch: 18   Global Step: 762620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:29,303-Speed 2626.66 samples/sec   Loss 1.4627   LearningRate 0.0007   Epoch: 18   Global Step: 762630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:33,204-Speed 2625.31 samples/sec   Loss 1.4044   LearningRate 0.0007   Epoch: 18   Global Step: 762640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:37,104-Speed 2626.63 samples/sec   Loss 1.3909   LearningRate 0.0007   Epoch: 18   Global Step: 762650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:41,000-Speed 2628.50 samples/sec   Loss 1.4310   LearningRate 0.0007   Epoch: 18   Global Step: 762660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:27:44,912-Speed 2618.69 samples/sec   Loss 1.3610   LearningRate 0.0007   Epoch: 18   Global Step: 762670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-16 09:27:48,788-Speed 2642.22 samples/sec   Loss 1.3884   LearningRate 0.0007   Epoch: 18   Global Step: 762680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:52,691-Speed 2624.64 samples/sec   Loss 1.3856   LearningRate 0.0007   Epoch: 18   Global Step: 762690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:27:56,604-Speed 2616.98 samples/sec   Loss 1.4821   LearningRate 0.0006   Epoch: 18   Global Step: 762700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:00,534-Speed 2607.59 samples/sec   Loss 1.4467   LearningRate 0.0006   Epoch: 18   Global Step: 762710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:04,444-Speed 2619.22 samples/sec   Loss 1.4623   LearningRate 0.0006   Epoch: 18   Global Step: 762720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:08,355-Speed 2618.76 samples/sec   Loss 1.4055   LearningRate 0.0006   Epoch: 18   Global Step: 762730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:12,264-Speed 2620.17 samples/sec   Loss 1.3965   LearningRate 0.0006   Epoch: 18   Global Step: 762740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:16,174-Speed 2620.42 samples/sec   Loss 1.4235   LearningRate 0.0006   Epoch: 18   Global Step: 762750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:20,127-Speed 2590.45 samples/sec   Loss 1.4491   LearningRate 0.0006   Epoch: 18   Global Step: 762760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:24,051-Speed 2610.77 samples/sec   Loss 1.3938   LearningRate 0.0006   Epoch: 18   Global Step: 762770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:27,937-Speed 2635.85 samples/sec   Loss 1.4415   LearningRate 0.0006   Epoch: 18   Global Step: 762780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:31,840-Speed 2624.86 samples/sec   Loss 1.4536   LearningRate 0.0006   Epoch: 18   Global Step: 762790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:35,758-Speed 2614.10 samples/sec   Loss 1.3976   LearningRate 0.0006   Epoch: 18   Global Step: 762800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:39,673-Speed 2616.29 samples/sec   Loss 1.3992   LearningRate 0.0006   Epoch: 18   Global Step: 762810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:43,579-Speed 2621.70 samples/sec   Loss 1.4336   LearningRate 0.0006   Epoch: 18   Global Step: 762820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:47,486-Speed 2622.08 samples/sec   Loss 1.4110   LearningRate 0.0006   Epoch: 18   Global Step: 762830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:51,397-Speed 2619.01 samples/sec   Loss 1.4334   LearningRate 0.0006   Epoch: 18   Global Step: 762840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:55,328-Speed 2605.54 samples/sec   Loss 1.4126   LearningRate 0.0006   Epoch: 18   Global Step: 762850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-16 09:28:59,240-Speed 2618.32 samples/sec   Loss 1.3609   LearningRate 0.0006   Epoch: 18   Global Step: 762860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:29:03,124-Speed 2637.11 samples/sec   Loss 1.4466   LearningRate 0.0006   Epoch: 18   Global Step: 762870   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:07,026-Speed 2624.66 samples/sec   Loss 1.4804   LearningRate 0.0006   Epoch: 18   Global Step: 762880   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:10,948-Speed 2611.36 samples/sec   Loss 1.4434   LearningRate 0.0006   Epoch: 18   Global Step: 762890   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:14,852-Speed 2624.28 samples/sec   Loss 1.4380   LearningRate 0.0006   Epoch: 18   Global Step: 762900   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:18,773-Speed 2611.82 samples/sec   Loss 1.4001   LearningRate 0.0006   Epoch: 18   Global Step: 762910   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:22,668-Speed 2629.94 samples/sec   Loss 1.4569   LearningRate 0.0006   Epoch: 18   Global Step: 762920   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:26,571-Speed 2623.65 samples/sec   Loss 1.3826   LearningRate 0.0006   Epoch: 18   Global Step: 762930   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:30,474-Speed 2625.94 samples/sec   Loss 1.4315   LearningRate 0.0006   Epoch: 18   Global Step: 762940   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:34,381-Speed 2621.23 samples/sec   Loss 1.4176   LearningRate 0.0006   Epoch: 18   Global Step: 762950   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:38,290-Speed 2620.25 samples/sec   Loss 1.4095   LearningRate 0.0006   Epoch: 18   Global Step: 762960   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:29:42,193-Speed 2623.94 samples/sec   Loss 1.3761   LearningRate 0.0006   Epoch: 18   Global Step: 762970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:29:46,102-Speed 2620.12 samples/sec   Loss 1.4004   LearningRate 0.0006   Epoch: 18   Global Step: 762980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:29:50,011-Speed 2620.27 samples/sec   Loss 1.3899   LearningRate 0.0006   Epoch: 18   Global Step: 762990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:29:53,913-Speed 2625.33 samples/sec   Loss 1.4458   LearningRate 0.0006   Epoch: 18   Global Step: 763000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:29:57,821-Speed 2620.67 samples/sec   Loss 1.4665   LearningRate 0.0006   Epoch: 18   Global Step: 763010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:01,720-Speed 2627.30 samples/sec   Loss 1.3805   LearningRate 0.0006   Epoch: 18   Global Step: 763020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:05,626-Speed 2622.70 samples/sec   Loss 1.3554   LearningRate 0.0006   Epoch: 18   Global Step: 763030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:09,546-Speed 2612.44 samples/sec   Loss 1.3869   LearningRate 0.0006   Epoch: 18   Global Step: 763040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:13,444-Speed 2627.65 samples/sec   Loss 1.4479   LearningRate 0.0006   Epoch: 18   Global Step: 763050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:17,345-Speed 2625.85 samples/sec   Loss 1.3759   LearningRate 0.0006   Epoch: 18   Global Step: 763060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:21,255-Speed 2618.94 samples/sec   Loss 1.4379   LearningRate 0.0006   Epoch: 18   Global Step: 763070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:30:25,156-Speed 2625.25 samples/sec   Loss 1.4550   LearningRate 0.0006   Epoch: 18   Global Step: 763080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:30:29,080-Speed 2610.05 samples/sec   Loss 1.4131   LearningRate 0.0006   Epoch: 18   Global Step: 763090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:30:32,982-Speed 2625.53 samples/sec   Loss 1.4586   LearningRate 0.0006   Epoch: 18   Global Step: 763100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:30:36,884-Speed 2625.38 samples/sec   Loss 1.4087   LearningRate 0.0006   Epoch: 18   Global Step: 763110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:30:40,786-Speed 2624.84 samples/sec   Loss 1.4538   LearningRate 0.0006   Epoch: 18   Global Step: 763120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:30:44,688-Speed 2624.75 samples/sec   Loss 1.4144   LearningRate 0.0006   Epoch: 18   Global Step: 763130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:30:48,566-Speed 2641.27 samples/sec   Loss 1.4420   LearningRate 0.0006   Epoch: 18   Global Step: 763140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:52,471-Speed 2623.12 samples/sec   Loss 1.4138   LearningRate 0.0006   Epoch: 18   Global Step: 763150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:30:56,369-Speed 2627.44 samples/sec   Loss 1.4128   LearningRate 0.0006   Epoch: 18   Global Step: 763160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:00,299-Speed 2616.99 samples/sec   Loss 1.4359   LearningRate 0.0006   Epoch: 18   Global Step: 763170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:04,205-Speed 2622.07 samples/sec   Loss 1.3412   LearningRate 0.0006   Epoch: 18   Global Step: 763180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:08,106-Speed 2626.96 samples/sec   Loss 1.3989   LearningRate 0.0006   Epoch: 18   Global Step: 763190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:12,003-Speed 2628.39 samples/sec   Loss 1.3308   LearningRate 0.0006   Epoch: 18   Global Step: 763200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:15,902-Speed 2627.26 samples/sec   Loss 1.4090   LearningRate 0.0006   Epoch: 18   Global Step: 763210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:19,807-Speed 2622.82 samples/sec   Loss 1.4364   LearningRate 0.0006   Epoch: 18   Global Step: 763220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:23,704-Speed 2627.74 samples/sec   Loss 1.3802   LearningRate 0.0006   Epoch: 18   Global Step: 763230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:27,604-Speed 2626.20 samples/sec   Loss 1.3970   LearningRate 0.0006   Epoch: 18   Global Step: 763240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:31:31,516-Speed 2618.66 samples/sec   Loss 1.4257   LearningRate 0.0006   Epoch: 18   Global Step: 763250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:31:35,396-Speed 2639.94 samples/sec   Loss 1.3924   LearningRate 0.0006   Epoch: 18   Global Step: 763260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:39,294-Speed 2627.77 samples/sec   Loss 1.4605   LearningRate 0.0006   Epoch: 18   Global Step: 763270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:43,192-Speed 2627.50 samples/sec   Loss 1.3504   LearningRate 0.0006   Epoch: 18   Global Step: 763280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:47,096-Speed 2624.07 samples/sec   Loss 1.3929   LearningRate 0.0006   Epoch: 18   Global Step: 763290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:50,998-Speed 2624.46 samples/sec   Loss 1.3892   LearningRate 0.0006   Epoch: 18   Global Step: 763300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:54,904-Speed 2621.82 samples/sec   Loss 1.4075   LearningRate 0.0006   Epoch: 18   Global Step: 763310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:31:58,808-Speed 2623.82 samples/sec   Loss 1.3452   LearningRate 0.0006   Epoch: 18   Global Step: 763320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:02,707-Speed 2627.65 samples/sec   Loss 1.3875   LearningRate 0.0006   Epoch: 18   Global Step: 763330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:06,604-Speed 2628.48 samples/sec   Loss 1.3675   LearningRate 0.0006   Epoch: 18   Global Step: 763340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:10,502-Speed 2627.05 samples/sec   Loss 1.3880   LearningRate 0.0006   Epoch: 18   Global Step: 763350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:14,389-Speed 2635.42 samples/sec   Loss 1.3972   LearningRate 0.0006   Epoch: 18   Global Step: 763360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:18,294-Speed 2623.54 samples/sec   Loss 1.3651   LearningRate 0.0006   Epoch: 18   Global Step: 763370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:22,199-Speed 2622.64 samples/sec   Loss 1.4187   LearningRate 0.0006   Epoch: 18   Global Step: 763380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:26,097-Speed 2627.15 samples/sec   Loss 1.3589   LearningRate 0.0006   Epoch: 18   Global Step: 763390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:29,997-Speed 2626.56 samples/sec   Loss 1.4013   LearningRate 0.0006   Epoch: 18   Global Step: 763400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:33,903-Speed 2622.32 samples/sec   Loss 1.3678   LearningRate 0.0006   Epoch: 18   Global Step: 763410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:37,810-Speed 2622.30 samples/sec   Loss 1.4047   LearningRate 0.0006   Epoch: 18   Global Step: 763420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:41,717-Speed 2621.57 samples/sec   Loss 1.3903   LearningRate 0.0006   Epoch: 18   Global Step: 763430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:45,616-Speed 2626.44 samples/sec   Loss 1.3724   LearningRate 0.0006   Epoch: 18   Global Step: 763440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:49,518-Speed 2625.65 samples/sec   Loss 1.4325   LearningRate 0.0006   Epoch: 18   Global Step: 763450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:32:53,418-Speed 2626.52 samples/sec   Loss 1.4374   LearningRate 0.0006   Epoch: 18   Global Step: 763460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:32:57,324-Speed 2622.03 samples/sec   Loss 1.4247   LearningRate 0.0006   Epoch: 18   Global Step: 763470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:01,222-Speed 2627.93 samples/sec   Loss 1.3753   LearningRate 0.0006   Epoch: 18   Global Step: 763480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:05,117-Speed 2629.12 samples/sec   Loss 1.4518   LearningRate 0.0006   Epoch: 18   Global Step: 763490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:09,021-Speed 2623.45 samples/sec   Loss 1.3887   LearningRate 0.0006   Epoch: 18   Global Step: 763500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:12,931-Speed 2619.58 samples/sec   Loss 1.3989   LearningRate 0.0006   Epoch: 18   Global Step: 763510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:16,831-Speed 2626.78 samples/sec   Loss 1.4362   LearningRate 0.0006   Epoch: 18   Global Step: 763520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:20,730-Speed 2627.19 samples/sec   Loss 1.4552   LearningRate 0.0006   Epoch: 18   Global Step: 763530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:24,628-Speed 2627.53 samples/sec   Loss 1.3997   LearningRate 0.0006   Epoch: 18   Global Step: 763540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:28,525-Speed 2628.02 samples/sec   Loss 1.3949   LearningRate 0.0006   Epoch: 18   Global Step: 763550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:33:32,403-Speed 2641.64 samples/sec   Loss 1.3610   LearningRate 0.0006   Epoch: 18   Global Step: 763560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:33:36,304-Speed 2625.19 samples/sec   Loss 1.3845   LearningRate 0.0006   Epoch: 18   Global Step: 763570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:33:40,205-Speed 2625.71 samples/sec   Loss 1.3718   LearningRate 0.0006   Epoch: 18   Global Step: 763580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:33:44,120-Speed 2616.24 samples/sec   Loss 1.4088   LearningRate 0.0006   Epoch: 18   Global Step: 763590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:33:48,022-Speed 2624.73 samples/sec   Loss 1.4262   LearningRate 0.0006   Epoch: 18   Global Step: 763600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:33:51,921-Speed 2629.29 samples/sec   Loss 1.4151   LearningRate 0.0006   Epoch: 18   Global Step: 763610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:33:55,819-Speed 2627.36 samples/sec   Loss 1.4406   LearningRate 0.0006   Epoch: 18   Global Step: 763620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:33:59,718-Speed 2627.15 samples/sec   Loss 1.4651   LearningRate 0.0006   Epoch: 18   Global Step: 763630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:34:03,622-Speed 2623.73 samples/sec   Loss 1.3516   LearningRate 0.0006   Epoch: 18   Global Step: 763640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:34:07,525-Speed 2624.72 samples/sec   Loss 1.3947   LearningRate 0.0006   Epoch: 18   Global Step: 763650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:34:11,427-Speed 2624.73 samples/sec   Loss 1.3985   LearningRate 0.0006   Epoch: 18   Global Step: 763660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:15,325-Speed 2628.16 samples/sec   Loss 1.4100   LearningRate 0.0006   Epoch: 18   Global Step: 763670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:19,242-Speed 2614.93 samples/sec   Loss 1.4169   LearningRate 0.0006   Epoch: 18   Global Step: 763680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:23,144-Speed 2625.42 samples/sec   Loss 1.4049   LearningRate 0.0006   Epoch: 18   Global Step: 763690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:27,043-Speed 2626.47 samples/sec   Loss 1.4134   LearningRate 0.0006   Epoch: 18   Global Step: 763700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:30,941-Speed 2628.09 samples/sec   Loss 1.4442   LearningRate 0.0006   Epoch: 18   Global Step: 763710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:34,843-Speed 2624.75 samples/sec   Loss 1.4190   LearningRate 0.0006   Epoch: 18   Global Step: 763720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:38,756-Speed 2617.53 samples/sec   Loss 1.3992   LearningRate 0.0006   Epoch: 18   Global Step: 763730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:34:42,636-Speed 2639.26 samples/sec   Loss 1.4134   LearningRate 0.0006   Epoch: 18   Global Step: 763740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:34:46,542-Speed 2622.61 samples/sec   Loss 1.4295   LearningRate 0.0006   Epoch: 18   Global Step: 763750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:34:50,451-Speed 2620.37 samples/sec   Loss 1.3801   LearningRate 0.0006   Epoch: 18   Global Step: 763760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:34:54,431-Speed 2573.40 samples/sec   Loss 1.4054   LearningRate 0.0006   Epoch: 18   Global Step: 763770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:34:58,346-Speed 2616.38 samples/sec   Loss 1.4166   LearningRate 0.0006   Epoch: 18   Global Step: 763780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:02,246-Speed 2626.61 samples/sec   Loss 1.4357   LearningRate 0.0006   Epoch: 18   Global Step: 763790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:06,148-Speed 2624.80 samples/sec   Loss 1.4044   LearningRate 0.0006   Epoch: 18   Global Step: 763800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:10,050-Speed 2624.91 samples/sec   Loss 1.3729   LearningRate 0.0006   Epoch: 18   Global Step: 763810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:13,950-Speed 2625.96 samples/sec   Loss 1.3749   LearningRate 0.0006   Epoch: 18   Global Step: 763820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:17,849-Speed 2627.42 samples/sec   Loss 1.4168   LearningRate 0.0006   Epoch: 18   Global Step: 763830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:21,734-Speed 2636.37 samples/sec   Loss 1.3694   LearningRate 0.0006   Epoch: 18   Global Step: 763840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:25,638-Speed 2623.17 samples/sec   Loss 1.4407   LearningRate 0.0006   Epoch: 18   Global Step: 763850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:29,545-Speed 2621.45 samples/sec   Loss 1.3972   LearningRate 0.0006   Epoch: 18   Global Step: 763860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:33,448-Speed 2625.23 samples/sec   Loss 1.3908   LearningRate 0.0006   Epoch: 18   Global Step: 763870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:37,348-Speed 2625.93 samples/sec   Loss 1.3601   LearningRate 0.0006   Epoch: 18   Global Step: 763880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:41,248-Speed 2626.74 samples/sec   Loss 1.4569   LearningRate 0.0006   Epoch: 18   Global Step: 763890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:45,150-Speed 2624.61 samples/sec   Loss 1.4077   LearningRate 0.0006   Epoch: 18   Global Step: 763900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:49,070-Speed 2612.72 samples/sec   Loss 1.3996   LearningRate 0.0006   Epoch: 18   Global Step: 763910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:53,001-Speed 2606.57 samples/sec   Loss 1.3879   LearningRate 0.0006   Epoch: 18   Global Step: 763920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:35:56,900-Speed 2626.39 samples/sec   Loss 1.4007   LearningRate 0.0006   Epoch: 18   Global Step: 763930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:00,806-Speed 2622.78 samples/sec   Loss 1.3948   LearningRate 0.0006   Epoch: 18   Global Step: 763940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:36:04,733-Speed 2608.12 samples/sec   Loss 1.4324   LearningRate 0.0006   Epoch: 18   Global Step: 763950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:36:08,618-Speed 2636.51 samples/sec   Loss 1.3471   LearningRate 0.0006   Epoch: 18   Global Step: 763960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:12,520-Speed 2625.02 samples/sec   Loss 1.4306   LearningRate 0.0006   Epoch: 18   Global Step: 763970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:16,429-Speed 2619.93 samples/sec   Loss 1.4153   LearningRate 0.0006   Epoch: 18   Global Step: 763980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:20,338-Speed 2620.17 samples/sec   Loss 1.3977   LearningRate 0.0006   Epoch: 18   Global Step: 763990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:24,255-Speed 2614.84 samples/sec   Loss 1.4600   LearningRate 0.0006   Epoch: 18   Global Step: 764000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:28,165-Speed 2620.17 samples/sec   Loss 1.3641   LearningRate 0.0006   Epoch: 18   Global Step: 764010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:32,066-Speed 2626.02 samples/sec   Loss 1.4043   LearningRate 0.0006   Epoch: 18   Global Step: 764020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:35,963-Speed 2627.99 samples/sec   Loss 1.3706   LearningRate 0.0006   Epoch: 18   Global Step: 764030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:39,866-Speed 2624.60 samples/sec   Loss 1.3253   LearningRate 0.0006   Epoch: 18   Global Step: 764040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:43,797-Speed 2605.67 samples/sec   Loss 1.3363   LearningRate 0.0006   Epoch: 18   Global Step: 764050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:47,741-Speed 2597.31 samples/sec   Loss 1.4423   LearningRate 0.0006   Epoch: 18   Global Step: 764060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:36:51,733-Speed 2565.37 samples/sec   Loss 1.4093   LearningRate 0.0006   Epoch: 18   Global Step: 764070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:36:55,637-Speed 2623.93 samples/sec   Loss 1.4122   LearningRate 0.0006   Epoch: 18   Global Step: 764080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:36:59,564-Speed 2607.81 samples/sec   Loss 1.4475   LearningRate 0.0006   Epoch: 18   Global Step: 764090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:03,465-Speed 2626.57 samples/sec   Loss 1.3878   LearningRate 0.0006   Epoch: 18   Global Step: 764100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:07,373-Speed 2620.93 samples/sec   Loss 1.4082   LearningRate 0.0006   Epoch: 18   Global Step: 764110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:11,270-Speed 2627.88 samples/sec   Loss 1.3923   LearningRate 0.0006   Epoch: 18   Global Step: 764120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:15,194-Speed 2610.60 samples/sec   Loss 1.4087   LearningRate 0.0006   Epoch: 18   Global Step: 764130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:19,113-Speed 2613.84 samples/sec   Loss 1.4012   LearningRate 0.0006   Epoch: 18   Global Step: 764140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:23,018-Speed 2622.78 samples/sec   Loss 1.3975   LearningRate 0.0006   Epoch: 18   Global Step: 764150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:27,013-Speed 2563.29 samples/sec   Loss 1.4325   LearningRate 0.0006   Epoch: 18   Global Step: 764160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:31,102-Speed 2505.07 samples/sec   Loss 1.4088   LearningRate 0.0006   Epoch: 18   Global Step: 764170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:35,034-Speed 2604.61 samples/sec   Loss 1.3984   LearningRate 0.0006   Epoch: 18   Global Step: 764180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:37:38,927-Speed 2631.38 samples/sec   Loss 1.4454   LearningRate 0.0006   Epoch: 18   Global Step: 764190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:42,907-Speed 2573.75 samples/sec   Loss 1.3691   LearningRate 0.0006   Epoch: 18   Global Step: 764200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:46,804-Speed 2628.68 samples/sec   Loss 1.4110   LearningRate 0.0006   Epoch: 18   Global Step: 764210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:50,714-Speed 2619.14 samples/sec   Loss 1.3965   LearningRate 0.0006   Epoch: 18   Global Step: 764220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:54,629-Speed 2616.04 samples/sec   Loss 1.4208   LearningRate 0.0006   Epoch: 18   Global Step: 764230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:37:58,527-Speed 2627.61 samples/sec   Loss 1.4106   LearningRate 0.0006   Epoch: 18   Global Step: 764240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:02,454-Speed 2608.62 samples/sec   Loss 1.3528   LearningRate 0.0006   Epoch: 18   Global Step: 764250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:06,356-Speed 2625.21 samples/sec   Loss 1.4122   LearningRate 0.0006   Epoch: 18   Global Step: 764260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:10,269-Speed 2617.70 samples/sec   Loss 1.3929   LearningRate 0.0006   Epoch: 18   Global Step: 764270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:14,182-Speed 2617.17 samples/sec   Loss 1.3924   LearningRate 0.0006   Epoch: 18   Global Step: 764280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:18,091-Speed 2620.91 samples/sec   Loss 1.4065   LearningRate 0.0006   Epoch: 18   Global Step: 764290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:38:21,976-Speed 2636.57 samples/sec   Loss 1.4050   LearningRate 0.0006   Epoch: 18   Global Step: 764300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:25,903-Speed 2607.90 samples/sec   Loss 1.4052   LearningRate 0.0006   Epoch: 18   Global Step: 764310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:29,807-Speed 2623.68 samples/sec   Loss 1.4157   LearningRate 0.0006   Epoch: 18   Global Step: 764320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:33,704-Speed 2628.76 samples/sec   Loss 1.4041   LearningRate 0.0006   Epoch: 18   Global Step: 764330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:37,612-Speed 2621.19 samples/sec   Loss 1.4223   LearningRate 0.0006   Epoch: 18   Global Step: 764340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:41,514-Speed 2625.32 samples/sec   Loss 1.4780   LearningRate 0.0006   Epoch: 18   Global Step: 764350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:45,420-Speed 2622.20 samples/sec   Loss 1.3397   LearningRate 0.0006   Epoch: 18   Global Step: 764360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:49,319-Speed 2627.43 samples/sec   Loss 1.3787   LearningRate 0.0006   Epoch: 18   Global Step: 764370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:53,212-Speed 2630.79 samples/sec   Loss 1.3600   LearningRate 0.0006   Epoch: 18   Global Step: 764380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:38:57,128-Speed 2615.69 samples/sec   Loss 1.4401   LearningRate 0.0006   Epoch: 18   Global Step: 764390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:39:01,029-Speed 2625.57 samples/sec   Loss 1.4101   LearningRate 0.0006   Epoch: 18   Global Step: 764400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:39:04,901-Speed 2645.08 samples/sec   Loss 1.4177   LearningRate 0.0006   Epoch: 18   Global Step: 764410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:39:08,799-Speed 2628.06 samples/sec   Loss 1.4370   LearningRate 0.0006   Epoch: 18   Global Step: 764420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:39:12,690-Speed 2632.67 samples/sec   Loss 1.3561   LearningRate 0.0006   Epoch: 18   Global Step: 764430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:39:16,586-Speed 2628.60 samples/sec   Loss 1.3570   LearningRate 0.0006   Epoch: 18   Global Step: 764440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:39:20,487-Speed 2625.75 samples/sec   Loss 1.4084   LearningRate 0.0006   Epoch: 18   Global Step: 764450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:39:24,385-Speed 2627.95 samples/sec   Loss 1.4032   LearningRate 0.0006   Epoch: 18   Global Step: 764460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:39:28,258-Speed 2644.47 samples/sec   Loss 1.3770   LearningRate 0.0006   Epoch: 18   Global Step: 764470   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:32,154-Speed 2628.64 samples/sec   Loss 1.4184   LearningRate 0.0006   Epoch: 18   Global Step: 764480   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:36,064-Speed 2619.55 samples/sec   Loss 1.3692   LearningRate 0.0006   Epoch: 18   Global Step: 764490   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:39,962-Speed 2627.30 samples/sec   Loss 1.4194   LearningRate 0.0006   Epoch: 18   Global Step: 764500   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:43,859-Speed 2629.19 samples/sec   Loss 1.4200   LearningRate 0.0006   Epoch: 18   Global Step: 764510   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:47,755-Speed 2629.21 samples/sec   Loss 1.3720   LearningRate 0.0006   Epoch: 18   Global Step: 764520   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:51,650-Speed 2629.27 samples/sec   Loss 1.4328   LearningRate 0.0006   Epoch: 18   Global Step: 764530   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:55,546-Speed 2629.51 samples/sec   Loss 1.3881   LearningRate 0.0006   Epoch: 18   Global Step: 764540   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:39:59,440-Speed 2630.15 samples/sec   Loss 1.4135   LearningRate 0.0006   Epoch: 18   Global Step: 764550   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:03,337-Speed 2627.93 samples/sec   Loss 1.3481   LearningRate 0.0006   Epoch: 18   Global Step: 764560   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:07,207-Speed 2646.29 samples/sec   Loss 1.4282   LearningRate 0.0006   Epoch: 18   Global Step: 764570   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:11,103-Speed 2630.29 samples/sec   Loss 1.3881   LearningRate 0.0006   Epoch: 18   Global Step: 764580   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:14,999-Speed 2628.72 samples/sec   Loss 1.3961   LearningRate 0.0006   Epoch: 18   Global Step: 764590   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:18,900-Speed 2626.23 samples/sec   Loss 1.3721   LearningRate 0.0006   Epoch: 18   Global Step: 764600   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:22,820-Speed 2612.51 samples/sec   Loss 1.4056   LearningRate 0.0006   Epoch: 18   Global Step: 764610   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:26,718-Speed 2628.12 samples/sec   Loss 1.4330   LearningRate 0.0006   Epoch: 18   Global Step: 764620   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:30,621-Speed 2623.73 samples/sec   Loss 1.3696   LearningRate 0.0006   Epoch: 18   Global Step: 764630   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:34,514-Speed 2630.94 samples/sec   Loss 1.3768   LearningRate 0.0006   Epoch: 18   Global Step: 764640   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:38,435-Speed 2612.00 samples/sec   Loss 1.4220   LearningRate 0.0006   Epoch: 18   Global Step: 764650   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:42,338-Speed 2626.29 samples/sec   Loss 1.3883   LearningRate 0.0006   Epoch: 18   Global Step: 764660   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:40:46,237-Speed 2626.37 samples/sec   Loss 1.3781   LearningRate 0.0006   Epoch: 18   Global Step: 764670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:40:50,153-Speed 2615.83 samples/sec   Loss 1.4279   LearningRate 0.0006   Epoch: 18   Global Step: 764680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:40:54,054-Speed 2625.83 samples/sec   Loss 1.4273   LearningRate 0.0006   Epoch: 18   Global Step: 764690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:40:57,948-Speed 2630.50 samples/sec   Loss 1.3678   LearningRate 0.0006   Epoch: 18   Global Step: 764700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:41:01,863-Speed 2616.61 samples/sec   Loss 1.3942   LearningRate 0.0006   Epoch: 18   Global Step: 764710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:41:05,773-Speed 2618.97 samples/sec   Loss 1.3755   LearningRate 0.0006   Epoch: 18   Global Step: 764720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:41:09,679-Speed 2622.70 samples/sec   Loss 1.4044   LearningRate 0.0006   Epoch: 18   Global Step: 764730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:41:13,587-Speed 2620.74 samples/sec   Loss 1.3726   LearningRate 0.0006   Epoch: 18   Global Step: 764740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:41:17,466-Speed 2641.27 samples/sec   Loss 1.4440   LearningRate 0.0006   Epoch: 18   Global Step: 764750   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:21,366-Speed 2626.62 samples/sec   Loss 1.3673   LearningRate 0.0006   Epoch: 18   Global Step: 764760   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:25,343-Speed 2575.67 samples/sec   Loss 1.3950   LearningRate 0.0006   Epoch: 18   Global Step: 764770   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:29,240-Speed 2628.40 samples/sec   Loss 1.3931   LearningRate 0.0006   Epoch: 18   Global Step: 764780   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:33,146-Speed 2622.16 samples/sec   Loss 1.3889   LearningRate 0.0006   Epoch: 18   Global Step: 764790   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:37,044-Speed 2627.35 samples/sec   Loss 1.3772   LearningRate 0.0006   Epoch: 18   Global Step: 764800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:40,944-Speed 2626.43 samples/sec   Loss 1.4048   LearningRate 0.0006   Epoch: 18   Global Step: 764810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:44,844-Speed 2625.93 samples/sec   Loss 1.3756   LearningRate 0.0006   Epoch: 18   Global Step: 764820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:48,749-Speed 2623.33 samples/sec   Loss 1.3805   LearningRate 0.0006   Epoch: 18   Global Step: 764830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:52,646-Speed 2628.27 samples/sec   Loss 1.3808   LearningRate 0.0006   Epoch: 18   Global Step: 764840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:41:56,540-Speed 2630.59 samples/sec   Loss 1.3794   LearningRate 0.0006   Epoch: 18   Global Step: 764850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:00,440-Speed 2626.42 samples/sec   Loss 1.3895   LearningRate 0.0006   Epoch: 18   Global Step: 764860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:04,338-Speed 2627.76 samples/sec   Loss 1.3965   LearningRate 0.0006   Epoch: 18   Global Step: 764870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:08,229-Speed 2632.00 samples/sec   Loss 1.4339   LearningRate 0.0006   Epoch: 18   Global Step: 764880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:12,127-Speed 2627.96 samples/sec   Loss 1.3748   LearningRate 0.0006   Epoch: 18   Global Step: 764890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:16,028-Speed 2625.55 samples/sec   Loss 1.3714   LearningRate 0.0006   Epoch: 18   Global Step: 764900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:19,928-Speed 2626.05 samples/sec   Loss 1.4041   LearningRate 0.0006   Epoch: 18   Global Step: 764910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:23,830-Speed 2625.70 samples/sec   Loss 1.4358   LearningRate 0.0006   Epoch: 18   Global Step: 764920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:27,774-Speed 2597.72 samples/sec   Loss 1.4327   LearningRate 0.0006   Epoch: 18   Global Step: 764930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:31,675-Speed 2625.42 samples/sec   Loss 1.4041   LearningRate 0.0006   Epoch: 18   Global Step: 764940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:42:35,576-Speed 2625.69 samples/sec   Loss 1.4298   LearningRate 0.0006   Epoch: 18   Global Step: 764950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:42:39,478-Speed 2625.13 samples/sec   Loss 1.4881   LearningRate 0.0006   Epoch: 18   Global Step: 764960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:42:43,392-Speed 2617.20 samples/sec   Loss 1.3589   LearningRate 0.0006   Epoch: 18   Global Step: 764970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:42:47,289-Speed 2628.23 samples/sec   Loss 1.4142   LearningRate 0.0006   Epoch: 18   Global Step: 764980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:42:51,187-Speed 2628.03 samples/sec   Loss 1.3422   LearningRate 0.0006   Epoch: 18   Global Step: 764990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:42:55,119-Speed 2604.90 samples/sec   Loss 1.4095   LearningRate 0.0006   Epoch: 18   Global Step: 765000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:42:59,016-Speed 2628.00 samples/sec   Loss 1.3631   LearningRate 0.0006   Epoch: 18   Global Step: 765010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:43:02,923-Speed 2620.94 samples/sec   Loss 1.4057   LearningRate 0.0006   Epoch: 18   Global Step: 765020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:43:06,825-Speed 2625.48 samples/sec   Loss 1.4016   LearningRate 0.0006   Epoch: 18   Global Step: 765030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:43:10,705-Speed 2639.72 samples/sec   Loss 1.3493   LearningRate 0.0006   Epoch: 18   Global Step: 765040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:14,606-Speed 2625.98 samples/sec   Loss 1.3879   LearningRate 0.0006   Epoch: 18   Global Step: 765050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:18,506-Speed 2626.63 samples/sec   Loss 1.3854   LearningRate 0.0006   Epoch: 18   Global Step: 765060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:22,407-Speed 2625.70 samples/sec   Loss 1.3914   LearningRate 0.0006   Epoch: 18   Global Step: 765070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:26,307-Speed 2626.50 samples/sec   Loss 1.4040   LearningRate 0.0006   Epoch: 18   Global Step: 765080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:30,209-Speed 2624.74 samples/sec   Loss 1.3663   LearningRate 0.0006   Epoch: 18   Global Step: 765090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:34,123-Speed 2616.36 samples/sec   Loss 1.4093   LearningRate 0.0006   Epoch: 18   Global Step: 765100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:38,044-Speed 2613.22 samples/sec   Loss 1.3915   LearningRate 0.0006   Epoch: 18   Global Step: 765110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:41,938-Speed 2630.82 samples/sec   Loss 1.3765   LearningRate 0.0006   Epoch: 18   Global Step: 765120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:45,837-Speed 2627.16 samples/sec   Loss 1.3211   LearningRate 0.0006   Epoch: 18   Global Step: 765130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:43:49,734-Speed 2628.13 samples/sec   Loss 1.3725   LearningRate 0.0006   Epoch: 18   Global Step: 765140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:43:53,628-Speed 2629.98 samples/sec   Loss 1.3783   LearningRate 0.0006   Epoch: 18   Global Step: 765150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:43:57,501-Speed 2644.77 samples/sec   Loss 1.4355   LearningRate 0.0006   Epoch: 18   Global Step: 765160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:44:01,372-Speed 2646.12 samples/sec   Loss 1.4337   LearningRate 0.0006   Epoch: 18   Global Step: 765170   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:05,266-Speed 2629.95 samples/sec   Loss 1.4149   LearningRate 0.0006   Epoch: 18   Global Step: 765180   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:09,163-Speed 2629.13 samples/sec   Loss 1.3176   LearningRate 0.0006   Epoch: 18   Global Step: 765190   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:13,054-Speed 2632.49 samples/sec   Loss 1.3510   LearningRate 0.0006   Epoch: 18   Global Step: 765200   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:16,947-Speed 2631.04 samples/sec   Loss 1.3664   LearningRate 0.0006   Epoch: 18   Global Step: 765210   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:20,840-Speed 2631.16 samples/sec   Loss 1.4079   LearningRate 0.0006   Epoch: 18   Global Step: 765220   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:24,752-Speed 2618.14 samples/sec   Loss 1.3999   LearningRate 0.0006   Epoch: 18   Global Step: 765230   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:28,667-Speed 2616.11 samples/sec   Loss 1.3972   LearningRate 0.0006   Epoch: 18   Global Step: 765240   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:32,559-Speed 2631.61 samples/sec   Loss 1.4202   LearningRate 0.0006   Epoch: 18   Global Step: 765250   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:36,512-Speed 2591.57 samples/sec   Loss 1.3688   LearningRate 0.0006   Epoch: 18   Global Step: 765260   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:44:40,428-Speed 2615.39 samples/sec   Loss 1.3958   LearningRate 0.0006   Epoch: 18   Global Step: 765270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:44:44,331-Speed 2624.55 samples/sec   Loss 1.3797   LearningRate 0.0006   Epoch: 18   Global Step: 765280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:44:48,229-Speed 2628.63 samples/sec   Loss 1.4365   LearningRate 0.0006   Epoch: 18   Global Step: 765290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:44:52,135-Speed 2623.04 samples/sec   Loss 1.4239   LearningRate 0.0006   Epoch: 18   Global Step: 765300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:44:56,030-Speed 2629.32 samples/sec   Loss 1.4482   LearningRate 0.0006   Epoch: 18   Global Step: 765310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:44:59,926-Speed 2629.45 samples/sec   Loss 1.3616   LearningRate 0.0006   Epoch: 18   Global Step: 765320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:03,819-Speed 2630.64 samples/sec   Loss 1.4144   LearningRate 0.0006   Epoch: 18   Global Step: 765330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:07,715-Speed 2629.57 samples/sec   Loss 1.4111   LearningRate 0.0006   Epoch: 18   Global Step: 765340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:11,614-Speed 2626.60 samples/sec   Loss 1.4029   LearningRate 0.0006   Epoch: 18   Global Step: 765350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:15,527-Speed 2617.92 samples/sec   Loss 1.3935   LearningRate 0.0006   Epoch: 18   Global Step: 765360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:19,472-Speed 2596.49 samples/sec   Loss 1.4101   LearningRate 0.0006   Epoch: 18   Global Step: 765370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:45:23,390-Speed 2614.31 samples/sec   Loss 1.3381   LearningRate 0.0006   Epoch: 18   Global Step: 765380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:27,281-Speed 2632.48 samples/sec   Loss 1.3693   LearningRate 0.0006   Epoch: 18   Global Step: 765390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:31,182-Speed 2625.82 samples/sec   Loss 1.3447   LearningRate 0.0006   Epoch: 18   Global Step: 765400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:35,079-Speed 2628.25 samples/sec   Loss 1.3957   LearningRate 0.0006   Epoch: 18   Global Step: 765410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:39,009-Speed 2606.32 samples/sec   Loss 1.3823   LearningRate 0.0006   Epoch: 18   Global Step: 765420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:42,902-Speed 2630.92 samples/sec   Loss 1.3985   LearningRate 0.0006   Epoch: 18   Global Step: 765430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:46,798-Speed 2628.98 samples/sec   Loss 1.3636   LearningRate 0.0006   Epoch: 18   Global Step: 765440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:50,691-Speed 2631.73 samples/sec   Loss 1.3871   LearningRate 0.0006   Epoch: 18   Global Step: 765450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:54,587-Speed 2628.47 samples/sec   Loss 1.3763   LearningRate 0.0006   Epoch: 18   Global Step: 765460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:45:58,494-Speed 2622.19 samples/sec   Loss 1.3804   LearningRate 0.0006   Epoch: 18   Global Step: 765470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:46:02,405-Speed 2619.11 samples/sec   Loss 1.3632   LearningRate 0.0006   Epoch: 18   Global Step: 765480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:06,330-Speed 2608.78 samples/sec   Loss 1.4149   LearningRate 0.0006   Epoch: 18   Global Step: 765490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:10,238-Speed 2620.92 samples/sec   Loss 1.3877   LearningRate 0.0006   Epoch: 18   Global Step: 765500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:14,138-Speed 2627.19 samples/sec   Loss 1.3820   LearningRate 0.0006   Epoch: 18   Global Step: 765510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:18,040-Speed 2624.85 samples/sec   Loss 1.3818   LearningRate 0.0006   Epoch: 18   Global Step: 765520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:21,942-Speed 2624.67 samples/sec   Loss 1.3290   LearningRate 0.0006   Epoch: 18   Global Step: 765530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:25,850-Speed 2621.74 samples/sec   Loss 1.4132   LearningRate 0.0006   Epoch: 18   Global Step: 765540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:29,746-Speed 2628.84 samples/sec   Loss 1.3298   LearningRate 0.0006   Epoch: 18   Global Step: 765550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:33,641-Speed 2629.62 samples/sec   Loss 1.4337   LearningRate 0.0006   Epoch: 18   Global Step: 765560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:37,537-Speed 2628.85 samples/sec   Loss 1.3844   LearningRate 0.0006   Epoch: 18   Global Step: 765570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:46:41,388-Speed 2659.88 samples/sec   Loss 1.4211   LearningRate 0.0006   Epoch: 18   Global Step: 765580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:46:45,282-Speed 2630.48 samples/sec   Loss 1.4043   LearningRate 0.0006   Epoch: 18   Global Step: 765590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:46:49,176-Speed 2630.77 samples/sec   Loss 1.3782   LearningRate 0.0006   Epoch: 18   Global Step: 765600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:46:53,076-Speed 2627.08 samples/sec   Loss 1.3567   LearningRate 0.0006   Epoch: 18   Global Step: 765610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:46:56,979-Speed 2624.21 samples/sec   Loss 1.3818   LearningRate 0.0006   Epoch: 18   Global Step: 765620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:00,872-Speed 2630.76 samples/sec   Loss 1.4035   LearningRate 0.0006   Epoch: 18   Global Step: 765630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:04,770-Speed 2628.19 samples/sec   Loss 1.3952   LearningRate 0.0006   Epoch: 18   Global Step: 765640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:08,663-Speed 2630.89 samples/sec   Loss 1.4159   LearningRate 0.0006   Epoch: 18   Global Step: 765650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:12,556-Speed 2631.24 samples/sec   Loss 1.4302   LearningRate 0.0006   Epoch: 18   Global Step: 765660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:16,446-Speed 2632.95 samples/sec   Loss 1.3829   LearningRate 0.0006   Epoch: 18   Global Step: 765670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:20,343-Speed 2628.59 samples/sec   Loss 1.3661   LearningRate 0.0006   Epoch: 18   Global Step: 765680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:47:24,243-Speed 2626.59 samples/sec   Loss 1.3994   LearningRate 0.0006   Epoch: 18   Global Step: 765690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:47:28,141-Speed 2627.89 samples/sec   Loss 1.3312   LearningRate 0.0006   Epoch: 18   Global Step: 765700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:47:32,171-Speed 2541.60 samples/sec   Loss 1.3294   LearningRate 0.0006   Epoch: 18   Global Step: 765710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:47:36,047-Speed 2642.44 samples/sec   Loss 1.3576   LearningRate 0.0006   Epoch: 18   Global Step: 765720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:39,951-Speed 2623.32 samples/sec   Loss 1.3731   LearningRate 0.0006   Epoch: 18   Global Step: 765730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:43,953-Speed 2559.17 samples/sec   Loss 1.3694   LearningRate 0.0006   Epoch: 18   Global Step: 765740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:47,850-Speed 2628.20 samples/sec   Loss 1.4029   LearningRate 0.0006   Epoch: 18   Global Step: 765750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:51,755-Speed 2623.51 samples/sec   Loss 1.4434   LearningRate 0.0006   Epoch: 18   Global Step: 765760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:55,647-Speed 2631.32 samples/sec   Loss 1.4427   LearningRate 0.0006   Epoch: 18   Global Step: 765770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:47:59,579-Speed 2605.38 samples/sec   Loss 1.4149   LearningRate 0.0006   Epoch: 18   Global Step: 765780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:48:03,474-Speed 2629.94 samples/sec   Loss 1.3451   LearningRate 0.0006   Epoch: 18   Global Step: 765790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:48:07,345-Speed 2645.63 samples/sec   Loss 1.3821   LearningRate 0.0006   Epoch: 18   Global Step: 765800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:11,248-Speed 2624.24 samples/sec   Loss 1.3999   LearningRate 0.0006   Epoch: 18   Global Step: 765810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:15,156-Speed 2620.53 samples/sec   Loss 1.3453   LearningRate 0.0006   Epoch: 18   Global Step: 765820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:19,067-Speed 2619.71 samples/sec   Loss 1.3431   LearningRate 0.0006   Epoch: 18   Global Step: 765830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:22,963-Speed 2628.50 samples/sec   Loss 1.4309   LearningRate 0.0006   Epoch: 18   Global Step: 765840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:26,863-Speed 2626.22 samples/sec   Loss 1.3962   LearningRate 0.0006   Epoch: 18   Global Step: 765850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:30,753-Speed 2633.51 samples/sec   Loss 1.3708   LearningRate 0.0006   Epoch: 18   Global Step: 765860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:34,645-Speed 2631.79 samples/sec   Loss 1.3325   LearningRate 0.0006   Epoch: 18   Global Step: 765870   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:38,624-Speed 2574.76 samples/sec   Loss 1.3551   LearningRate 0.0006   Epoch: 18   Global Step: 765880   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:42,519-Speed 2629.58 samples/sec   Loss 1.3280   LearningRate 0.0006   Epoch: 18   Global Step: 765890   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:48:46,429-Speed 2619.71 samples/sec   Loss 1.4286   LearningRate 0.0006   Epoch: 18   Global Step: 765900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:48:50,329-Speed 2626.13 samples/sec   Loss 1.3992   LearningRate 0.0006   Epoch: 18   Global Step: 765910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:48:54,224-Speed 2630.37 samples/sec   Loss 1.3363   LearningRate 0.0006   Epoch: 18   Global Step: 765920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:48:58,119-Speed 2628.89 samples/sec   Loss 1.3080   LearningRate 0.0006   Epoch: 18   Global Step: 765930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:02,029-Speed 2619.36 samples/sec   Loss 1.4001   LearningRate 0.0006   Epoch: 18   Global Step: 765940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:05,921-Speed 2631.66 samples/sec   Loss 1.4290   LearningRate 0.0006   Epoch: 18   Global Step: 765950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:09,811-Speed 2633.77 samples/sec   Loss 1.3842   LearningRate 0.0006   Epoch: 18   Global Step: 765960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:13,704-Speed 2631.32 samples/sec   Loss 1.3650   LearningRate 0.0006   Epoch: 18   Global Step: 765970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:17,598-Speed 2630.32 samples/sec   Loss 1.4134   LearningRate 0.0006   Epoch: 18   Global Step: 765980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:21,490-Speed 2631.86 samples/sec   Loss 1.4026   LearningRate 0.0006   Epoch: 18   Global Step: 765990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:25,392-Speed 2624.76 samples/sec   Loss 1.3844   LearningRate 0.0006   Epoch: 18   Global Step: 766000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:49:29,284-Speed 2632.17 samples/sec   Loss 1.3315   LearningRate 0.0006   Epoch: 18   Global Step: 766010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:33,181-Speed 2628.17 samples/sec   Loss 1.4009   LearningRate 0.0006   Epoch: 18   Global Step: 766020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:37,084-Speed 2624.45 samples/sec   Loss 1.4053   LearningRate 0.0006   Epoch: 18   Global Step: 766030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:40,978-Speed 2630.23 samples/sec   Loss 1.3226   LearningRate 0.0006   Epoch: 18   Global Step: 766040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:44,874-Speed 2629.09 samples/sec   Loss 1.3622   LearningRate 0.0006   Epoch: 18   Global Step: 766050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:48,771-Speed 2628.49 samples/sec   Loss 1.4189   LearningRate 0.0006   Epoch: 18   Global Step: 766060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:52,666-Speed 2629.86 samples/sec   Loss 1.3286   LearningRate 0.0006   Epoch: 18   Global Step: 766070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:49:56,569-Speed 2624.36 samples/sec   Loss 1.3298   LearningRate 0.0006   Epoch: 18   Global Step: 766080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:50:00,462-Speed 2631.36 samples/sec   Loss 1.3887   LearningRate 0.0006   Epoch: 18   Global Step: 766090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:50:04,359-Speed 2628.18 samples/sec   Loss 1.3523   LearningRate 0.0006   Epoch: 18   Global Step: 766100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:50:08,251-Speed 2631.30 samples/sec   Loss 1.4103   LearningRate 0.0006   Epoch: 18   Global Step: 766110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:12,247-Speed 2563.24 samples/sec   Loss 1.3434   LearningRate 0.0006   Epoch: 18   Global Step: 766120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:16,146-Speed 2627.05 samples/sec   Loss 1.3587   LearningRate 0.0006   Epoch: 18   Global Step: 766130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:20,058-Speed 2619.07 samples/sec   Loss 1.3695   LearningRate 0.0006   Epoch: 18   Global Step: 766140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:23,971-Speed 2617.79 samples/sec   Loss 1.4122   LearningRate 0.0006   Epoch: 18   Global Step: 766150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:27,877-Speed 2622.47 samples/sec   Loss 1.3625   LearningRate 0.0006   Epoch: 18   Global Step: 766160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:31,774-Speed 2628.33 samples/sec   Loss 1.3754   LearningRate 0.0006   Epoch: 18   Global Step: 766170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:35,711-Speed 2601.03 samples/sec   Loss 1.4140   LearningRate 0.0006   Epoch: 18   Global Step: 766180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:39,607-Speed 2629.22 samples/sec   Loss 1.4389   LearningRate 0.0006   Epoch: 18   Global Step: 766190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:43,502-Speed 2629.81 samples/sec   Loss 1.3890   LearningRate 0.0006   Epoch: 18   Global Step: 766200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:50:47,351-Speed 2661.38 samples/sec   Loss 1.3499   LearningRate 0.0006   Epoch: 18   Global Step: 766210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:50:51,252-Speed 2625.28 samples/sec   Loss 1.3902   LearningRate 0.0006   Epoch: 18   Global Step: 766220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:50:55,145-Speed 2631.09 samples/sec   Loss 1.3672   LearningRate 0.0006   Epoch: 18   Global Step: 766230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:50:59,041-Speed 2629.41 samples/sec   Loss 1.3376   LearningRate 0.0006   Epoch: 18   Global Step: 766240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:02,936-Speed 2629.84 samples/sec   Loss 1.3591   LearningRate 0.0006   Epoch: 18   Global Step: 766250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:06,830-Speed 2630.30 samples/sec   Loss 1.4047   LearningRate 0.0006   Epoch: 18   Global Step: 766260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:10,723-Speed 2630.56 samples/sec   Loss 1.3855   LearningRate 0.0006   Epoch: 18   Global Step: 766270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:14,664-Speed 2599.40 samples/sec   Loss 1.4064   LearningRate 0.0006   Epoch: 18   Global Step: 766280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:18,568-Speed 2624.16 samples/sec   Loss 1.3585   LearningRate 0.0006   Epoch: 18   Global Step: 766290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:22,465-Speed 2628.16 samples/sec   Loss 1.3870   LearningRate 0.0006   Epoch: 18   Global Step: 766300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:26,363-Speed 2627.94 samples/sec   Loss 1.4259   LearningRate 0.0006   Epoch: 18   Global Step: 766310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:30,264-Speed 2625.55 samples/sec   Loss 1.3812   LearningRate 0.0006   Epoch: 18   Global Step: 766320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:34,160-Speed 2628.90 samples/sec   Loss 1.3957   LearningRate 0.0006   Epoch: 18   Global Step: 766330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:38,055-Speed 2629.58 samples/sec   Loss 1.3784   LearningRate 0.0006   Epoch: 18   Global Step: 766340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:41,953-Speed 2628.07 samples/sec   Loss 1.3919   LearningRate 0.0006   Epoch: 18   Global Step: 766350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:51:45,825-Speed 2645.16 samples/sec   Loss 1.4013   LearningRate 0.0006   Epoch: 18   Global Step: 766360   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:51:49,722-Speed 2629.05 samples/sec   Loss 1.4144   LearningRate 0.0006   Epoch: 18   Global Step: 766370   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:51:53,618-Speed 2628.81 samples/sec   Loss 1.3159   LearningRate 0.0006   Epoch: 18   Global Step: 766380   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:51:57,520-Speed 2625.37 samples/sec   Loss 1.4027   LearningRate 0.0006   Epoch: 18   Global Step: 766390   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:52:01,447-Speed 2608.18 samples/sec   Loss 1.3560   LearningRate 0.0006   Epoch: 18   Global Step: 766400   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:52:05,341-Speed 2630.66 samples/sec   Loss 1.2808   LearningRate 0.0006   Epoch: 18   Global Step: 766410   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:52:09,235-Speed 2630.08 samples/sec   Loss 1.3924   LearningRate 0.0006   Epoch: 18   Global Step: 766420   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:52:13,128-Speed 2631.01 samples/sec   Loss 1.3576   LearningRate 0.0006   Epoch: 18   Global Step: 766430   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:52:17,039-Speed 2619.09 samples/sec   Loss 1.3712   LearningRate 0.0006   Epoch: 18   Global Step: 766440   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:52:20,931-Speed 2631.28 samples/sec   Loss 1.3771   LearningRate 0.0006   Epoch: 18   Global Step: 766450   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:52:24,830-Speed 2627.76 samples/sec   Loss 1.4390   LearningRate 0.0006   Epoch: 18   Global Step: 766460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:28,722-Speed 2631.51 samples/sec   Loss 1.3675   LearningRate 0.0006   Epoch: 18   Global Step: 766470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:32,619-Speed 2628.45 samples/sec   Loss 1.4215   LearningRate 0.0006   Epoch: 18   Global Step: 766480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:36,513-Speed 2630.63 samples/sec   Loss 1.3477   LearningRate 0.0006   Epoch: 18   Global Step: 766490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:40,421-Speed 2620.16 samples/sec   Loss 1.3542   LearningRate 0.0006   Epoch: 18   Global Step: 766500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:44,344-Speed 2610.91 samples/sec   Loss 1.3729   LearningRate 0.0006   Epoch: 18   Global Step: 766510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:48,245-Speed 2626.38 samples/sec   Loss 1.4004   LearningRate 0.0006   Epoch: 18   Global Step: 766520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:52,142-Speed 2627.90 samples/sec   Loss 1.3244   LearningRate 0.0006   Epoch: 18   Global Step: 766530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:52:56,107-Speed 2583.56 samples/sec   Loss 1.4349   LearningRate 0.0006   Epoch: 18   Global Step: 766540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:00,000-Speed 2631.25 samples/sec   Loss 1.3746   LearningRate 0.0006   Epoch: 18   Global Step: 766550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:03,898-Speed 2628.06 samples/sec   Loss 1.4345   LearningRate 0.0006   Epoch: 18   Global Step: 766560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:53:07,810-Speed 2618.20 samples/sec   Loss 1.3822   LearningRate 0.0006   Epoch: 18   Global Step: 766570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:53:11,693-Speed 2637.60 samples/sec   Loss 1.3896   LearningRate 0.0006   Epoch: 18   Global Step: 766580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:15,589-Speed 2629.22 samples/sec   Loss 1.3925   LearningRate 0.0006   Epoch: 18   Global Step: 766590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:19,486-Speed 2628.02 samples/sec   Loss 1.3411   LearningRate 0.0006   Epoch: 18   Global Step: 766600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:23,379-Speed 2631.38 samples/sec   Loss 1.3937   LearningRate 0.0006   Epoch: 18   Global Step: 766610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:27,271-Speed 2631.47 samples/sec   Loss 1.3895   LearningRate 0.0006   Epoch: 18   Global Step: 766620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:31,164-Speed 2630.87 samples/sec   Loss 1.3567   LearningRate 0.0006   Epoch: 18   Global Step: 766630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:35,058-Speed 2630.90 samples/sec   Loss 1.4322   LearningRate 0.0006   Epoch: 18   Global Step: 766640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:38,961-Speed 2624.00 samples/sec   Loss 1.3912   LearningRate 0.0006   Epoch: 18   Global Step: 766650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:42,861-Speed 2626.60 samples/sec   Loss 1.3985   LearningRate 0.0006   Epoch: 18   Global Step: 766660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:46,758-Speed 2627.81 samples/sec   Loss 1.3724   LearningRate 0.0006   Epoch: 18   Global Step: 766670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:50,742-Speed 2570.53 samples/sec   Loss 1.3801   LearningRate 0.0006   Epoch: 18   Global Step: 766680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:54,665-Speed 2611.08 samples/sec   Loss 1.3949   LearningRate 0.0006   Epoch: 18   Global Step: 766690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:53:58,561-Speed 2629.24 samples/sec   Loss 1.3742   LearningRate 0.0006   Epoch: 18   Global Step: 766700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:54:02,457-Speed 2629.53 samples/sec   Loss 1.3461   LearningRate 0.0006   Epoch: 18   Global Step: 766710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:54:06,350-Speed 2630.63 samples/sec   Loss 1.3758   LearningRate 0.0006   Epoch: 18   Global Step: 766720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:54:10,246-Speed 2629.38 samples/sec   Loss 1.3556   LearningRate 0.0006   Epoch: 18   Global Step: 766730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:54:14,139-Speed 2631.00 samples/sec   Loss 1.3749   LearningRate 0.0006   Epoch: 18   Global Step: 766740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:54:18,067-Speed 2607.63 samples/sec   Loss 1.3233   LearningRate 0.0006   Epoch: 18   Global Step: 766750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:54:21,969-Speed 2625.32 samples/sec   Loss 1.3791   LearningRate 0.0006   Epoch: 18   Global Step: 766760   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:25,863-Speed 2629.93 samples/sec   Loss 1.3924   LearningRate 0.0006   Epoch: 18   Global Step: 766770   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:29,806-Speed 2598.14 samples/sec   Loss 1.3829   LearningRate 0.0006   Epoch: 18   Global Step: 766780   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:33,700-Speed 2630.96 samples/sec   Loss 1.4181   LearningRate 0.0006   Epoch: 18   Global Step: 766790   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:37,593-Speed 2630.40 samples/sec   Loss 1.3676   LearningRate 0.0006   Epoch: 18   Global Step: 766800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:41,487-Speed 2630.51 samples/sec   Loss 1.3546   LearningRate 0.0006   Epoch: 18   Global Step: 766810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:45,465-Speed 2575.91 samples/sec   Loss 1.4155   LearningRate 0.0006   Epoch: 18   Global Step: 766820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:49,542-Speed 2512.01 samples/sec   Loss 1.3347   LearningRate 0.0006   Epoch: 18   Global Step: 766830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:53,589-Speed 2531.31 samples/sec   Loss 1.3870   LearningRate 0.0006   Epoch: 18   Global Step: 766840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:54:57,488-Speed 2627.24 samples/sec   Loss 1.3550   LearningRate 0.0006   Epoch: 18   Global Step: 766850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 09:55:01,391-Speed 2624.34 samples/sec   Loss 1.4304   LearningRate 0.0006   Epoch: 18   Global Step: 766860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:05,294-Speed 2624.22 samples/sec   Loss 1.3762   LearningRate 0.0006   Epoch: 18   Global Step: 766870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:09,202-Speed 2621.13 samples/sec   Loss 1.3253   LearningRate 0.0006   Epoch: 18   Global Step: 766880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:13,103-Speed 2625.65 samples/sec   Loss 1.4119   LearningRate 0.0006   Epoch: 18   Global Step: 766890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:17,000-Speed 2628.26 samples/sec   Loss 1.3669   LearningRate 0.0006   Epoch: 18   Global Step: 766900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:20,908-Speed 2621.40 samples/sec   Loss 1.3759   LearningRate 0.0006   Epoch: 18   Global Step: 766910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:24,803-Speed 2628.83 samples/sec   Loss 1.3803   LearningRate 0.0006   Epoch: 18   Global Step: 766920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:28,704-Speed 2626.09 samples/sec   Loss 1.4116   LearningRate 0.0006   Epoch: 18   Global Step: 766930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:32,612-Speed 2621.23 samples/sec   Loss 1.3801   LearningRate 0.0006   Epoch: 18   Global Step: 766940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:36,509-Speed 2628.35 samples/sec   Loss 1.4119   LearningRate 0.0006   Epoch: 18   Global Step: 766950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:40,399-Speed 2632.71 samples/sec   Loss 1.4141   LearningRate 0.0006   Epoch: 18   Global Step: 766960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:55:44,321-Speed 2611.61 samples/sec   Loss 1.3707   LearningRate 0.0006   Epoch: 18   Global Step: 766970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:55:48,215-Speed 2630.41 samples/sec   Loss 1.3238   LearningRate 0.0006   Epoch: 18   Global Step: 766980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:55:52,109-Speed 2630.84 samples/sec   Loss 1.3500   LearningRate 0.0006   Epoch: 18   Global Step: 766990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:56,012-Speed 2624.03 samples/sec   Loss 1.4125   LearningRate 0.0006   Epoch: 18   Global Step: 767000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:55:59,908-Speed 2629.40 samples/sec   Loss 1.4036   LearningRate 0.0006   Epoch: 18   Global Step: 767010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:03,804-Speed 2629.05 samples/sec   Loss 1.3134   LearningRate 0.0006   Epoch: 18   Global Step: 767020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:07,701-Speed 2628.04 samples/sec   Loss 1.3355   LearningRate 0.0006   Epoch: 18   Global Step: 767030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:11,642-Speed 2599.05 samples/sec   Loss 1.3245   LearningRate 0.0006   Epoch: 18   Global Step: 767040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:15,544-Speed 2625.48 samples/sec   Loss 1.3483   LearningRate 0.0006   Epoch: 18   Global Step: 767050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:19,438-Speed 2631.34 samples/sec   Loss 1.3372   LearningRate 0.0006   Epoch: 18   Global Step: 767060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:23,351-Speed 2617.18 samples/sec   Loss 1.3488   LearningRate 0.0006   Epoch: 18   Global Step: 767070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:27,248-Speed 2628.81 samples/sec   Loss 1.3731   LearningRate 0.0006   Epoch: 18   Global Step: 767080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:31,118-Speed 2646.79 samples/sec   Loss 1.3793   LearningRate 0.0006   Epoch: 18   Global Step: 767090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:35,012-Speed 2629.65 samples/sec   Loss 1.3312   LearningRate 0.0006   Epoch: 18   Global Step: 767100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:38,907-Speed 2629.93 samples/sec   Loss 1.3771   LearningRate 0.0006   Epoch: 18   Global Step: 767110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:42,798-Speed 2632.79 samples/sec   Loss 1.3784   LearningRate 0.0006   Epoch: 18   Global Step: 767120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:46,688-Speed 2633.60 samples/sec   Loss 1.4095   LearningRate 0.0006   Epoch: 18   Global Step: 767130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:50,581-Speed 2631.30 samples/sec   Loss 1.3688   LearningRate 0.0006   Epoch: 18   Global Step: 767140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:54,475-Speed 2630.19 samples/sec   Loss 1.3591   LearningRate 0.0006   Epoch: 18   Global Step: 767150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:56:58,369-Speed 2630.67 samples/sec   Loss 1.3774   LearningRate 0.0006   Epoch: 18   Global Step: 767160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:02,278-Speed 2619.98 samples/sec   Loss 1.3392   LearningRate 0.0006   Epoch: 18   Global Step: 767170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:06,177-Speed 2626.83 samples/sec   Loss 1.3810   LearningRate 0.0006   Epoch: 18   Global Step: 767180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:10,111-Speed 2603.82 samples/sec   Loss 1.4170   LearningRate 0.0006   Epoch: 18   Global Step: 767190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:57:14,016-Speed 2623.24 samples/sec   Loss 1.3794   LearningRate 0.0006   Epoch: 18   Global Step: 767200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:57:17,915-Speed 2627.00 samples/sec   Loss 1.3423   LearningRate 0.0006   Epoch: 18   Global Step: 767210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:57:21,918-Speed 2558.77 samples/sec   Loss 1.4120   LearningRate 0.0006   Epoch: 18   Global Step: 767220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:57:25,864-Speed 2596.12 samples/sec   Loss 1.3952   LearningRate 0.0006   Epoch: 18   Global Step: 767230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:57:29,761-Speed 2628.71 samples/sec   Loss 1.3590   LearningRate 0.0006   Epoch: 18   Global Step: 767240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:57:33,636-Speed 2642.69 samples/sec   Loss 1.3440   LearningRate 0.0006   Epoch: 18   Global Step: 767250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:37,526-Speed 2632.96 samples/sec   Loss 1.4203   LearningRate 0.0006   Epoch: 18   Global Step: 767260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:41,434-Speed 2620.77 samples/sec   Loss 1.2989   LearningRate 0.0006   Epoch: 18   Global Step: 767270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:45,328-Speed 2630.63 samples/sec   Loss 1.4196   LearningRate 0.0006   Epoch: 18   Global Step: 767280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:49,232-Speed 2623.87 samples/sec   Loss 1.3609   LearningRate 0.0006   Epoch: 18   Global Step: 767290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:53,138-Speed 2622.18 samples/sec   Loss 1.3804   LearningRate 0.0006   Epoch: 18   Global Step: 767300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:57:57,041-Speed 2624.69 samples/sec   Loss 1.3484   LearningRate 0.0006   Epoch: 18   Global Step: 767310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:00,941-Speed 2626.01 samples/sec   Loss 1.3745   LearningRate 0.0006   Epoch: 18   Global Step: 767320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:04,844-Speed 2623.95 samples/sec   Loss 1.3747   LearningRate 0.0006   Epoch: 18   Global Step: 767330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:08,739-Speed 2629.75 samples/sec   Loss 1.3937   LearningRate 0.0006   Epoch: 18   Global Step: 767340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:12,640-Speed 2625.56 samples/sec   Loss 1.3910   LearningRate 0.0006   Epoch: 18   Global Step: 767350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:58:16,541-Speed 2625.71 samples/sec   Loss 1.3644   LearningRate 0.0006   Epoch: 18   Global Step: 767360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:58:20,433-Speed 2632.07 samples/sec   Loss 1.4003   LearningRate 0.0006   Epoch: 18   Global Step: 767370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:58:24,328-Speed 2630.13 samples/sec   Loss 1.3800   LearningRate 0.0006   Epoch: 18   Global Step: 767380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:58:28,202-Speed 2643.92 samples/sec   Loss 1.3981   LearningRate 0.0006   Epoch: 18   Global Step: 767390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:32,093-Speed 2632.08 samples/sec   Loss 1.4038   LearningRate 0.0006   Epoch: 18   Global Step: 767400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:35,988-Speed 2630.03 samples/sec   Loss 1.3825   LearningRate 0.0006   Epoch: 18   Global Step: 767410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:39,904-Speed 2615.37 samples/sec   Loss 1.3445   LearningRate 0.0006   Epoch: 18   Global Step: 767420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:43,807-Speed 2623.80 samples/sec   Loss 1.4118   LearningRate 0.0006   Epoch: 18   Global Step: 767430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:47,701-Speed 2630.99 samples/sec   Loss 1.3902   LearningRate 0.0006   Epoch: 18   Global Step: 767440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:51,591-Speed 2632.66 samples/sec   Loss 1.3143   LearningRate 0.0006   Epoch: 18   Global Step: 767450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:55,484-Speed 2632.40 samples/sec   Loss 1.3711   LearningRate 0.0006   Epoch: 18   Global Step: 767460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:58:59,378-Speed 2630.01 samples/sec   Loss 1.3523   LearningRate 0.0006   Epoch: 18   Global Step: 767470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:03,276-Speed 2627.70 samples/sec   Loss 1.3888   LearningRate 0.0006   Epoch: 18   Global Step: 767480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:07,171-Speed 2629.56 samples/sec   Loss 1.3432   LearningRate 0.0006   Epoch: 18   Global Step: 767490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:59:11,064-Speed 2630.42 samples/sec   Loss 1.3635   LearningRate 0.0006   Epoch: 18   Global Step: 767500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:59:14,941-Speed 2642.11 samples/sec   Loss 1.4110   LearningRate 0.0006   Epoch: 18   Global Step: 767510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:18,837-Speed 2628.74 samples/sec   Loss 1.3556   LearningRate 0.0006   Epoch: 18   Global Step: 767520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:22,741-Speed 2623.68 samples/sec   Loss 1.3667   LearningRate 0.0006   Epoch: 18   Global Step: 767530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:26,642-Speed 2626.52 samples/sec   Loss 1.3745   LearningRate 0.0006   Epoch: 18   Global Step: 767540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:30,538-Speed 2628.47 samples/sec   Loss 1.3613   LearningRate 0.0006   Epoch: 18   Global Step: 767550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:34,432-Speed 2630.39 samples/sec   Loss 1.3908   LearningRate 0.0006   Epoch: 18   Global Step: 767560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:38,368-Speed 2602.52 samples/sec   Loss 1.3782   LearningRate 0.0006   Epoch: 18   Global Step: 767570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:42,259-Speed 2631.67 samples/sec   Loss 1.4199   LearningRate 0.0006   Epoch: 18   Global Step: 767580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:46,166-Speed 2621.75 samples/sec   Loss 1.3589   LearningRate 0.0006   Epoch: 18   Global Step: 767590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:50,064-Speed 2627.48 samples/sec   Loss 1.3638   LearningRate 0.0006   Epoch: 18   Global Step: 767600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 09:59:53,956-Speed 2631.50 samples/sec   Loss 1.3786   LearningRate 0.0006   Epoch: 18   Global Step: 767610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 09:59:57,860-Speed 2624.53 samples/sec   Loss 1.3346   LearningRate 0.0006   Epoch: 18   Global Step: 767620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:00:01,773-Speed 2617.55 samples/sec   Loss 1.3844   LearningRate 0.0006   Epoch: 18   Global Step: 767630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:00:05,768-Speed 2563.77 samples/sec   Loss 1.3212   LearningRate 0.0006   Epoch: 18   Global Step: 767640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:00:09,642-Speed 2643.34 samples/sec   Loss 1.3124   LearningRate 0.0006   Epoch: 18   Global Step: 767650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:13,541-Speed 2626.80 samples/sec   Loss 1.3924   LearningRate 0.0006   Epoch: 18   Global Step: 767660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:17,433-Speed 2631.69 samples/sec   Loss 1.3213   LearningRate 0.0006   Epoch: 18   Global Step: 767670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:21,340-Speed 2621.24 samples/sec   Loss 1.3266   LearningRate 0.0006   Epoch: 18   Global Step: 767680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:25,234-Speed 2630.59 samples/sec   Loss 1.3973   LearningRate 0.0006   Epoch: 18   Global Step: 767690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:29,143-Speed 2620.29 samples/sec   Loss 1.3882   LearningRate 0.0006   Epoch: 18   Global Step: 767700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:33,037-Speed 2630.36 samples/sec   Loss 1.3721   LearningRate 0.0006   Epoch: 18   Global Step: 767710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:36,933-Speed 2628.46 samples/sec   Loss 1.3613   LearningRate 0.0006   Epoch: 18   Global Step: 767720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:40,833-Speed 2627.37 samples/sec   Loss 1.4035   LearningRate 0.0006   Epoch: 18   Global Step: 767730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:44,727-Speed 2630.20 samples/sec   Loss 1.3537   LearningRate 0.0006   Epoch: 18   Global Step: 767740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:00:48,636-Speed 2620.49 samples/sec   Loss 1.3424   LearningRate 0.0006   Epoch: 18   Global Step: 767750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:00:52,576-Speed 2599.81 samples/sec   Loss 1.3943   LearningRate 0.0006   Epoch: 18   Global Step: 767760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:00:56,480-Speed 2623.34 samples/sec   Loss 1.3787   LearningRate 0.0006   Epoch: 18   Global Step: 767770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:01:00,372-Speed 2631.64 samples/sec   Loss 1.3437   LearningRate 0.0006   Epoch: 18   Global Step: 767780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:01:04,248-Speed 2642.17 samples/sec   Loss 1.3462   LearningRate 0.0006   Epoch: 18   Global Step: 767790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:08,138-Speed 2634.30 samples/sec   Loss 1.3695   LearningRate 0.0006   Epoch: 18   Global Step: 767800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:12,043-Speed 2623.14 samples/sec   Loss 1.3795   LearningRate 0.0006   Epoch: 18   Global Step: 767810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:15,950-Speed 2621.61 samples/sec   Loss 1.3167   LearningRate 0.0006   Epoch: 18   Global Step: 767820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:19,856-Speed 2623.33 samples/sec   Loss 1.3409   LearningRate 0.0006   Epoch: 18   Global Step: 767830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:23,748-Speed 2631.22 samples/sec   Loss 1.3681   LearningRate 0.0006   Epoch: 18   Global Step: 767840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:27,647-Speed 2627.25 samples/sec   Loss 1.3987   LearningRate 0.0006   Epoch: 18   Global Step: 767850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:31,542-Speed 2630.14 samples/sec   Loss 1.3920   LearningRate 0.0006   Epoch: 18   Global Step: 767860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:35,433-Speed 2631.91 samples/sec   Loss 1.3271   LearningRate 0.0006   Epoch: 18   Global Step: 767870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:39,358-Speed 2609.54 samples/sec   Loss 1.3611   LearningRate 0.0006   Epoch: 18   Global Step: 767880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:43,251-Speed 2631.18 samples/sec   Loss 1.3584   LearningRate 0.0006   Epoch: 18   Global Step: 767890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:01:47,215-Speed 2584.13 samples/sec   Loss 1.3627   LearningRate 0.0006   Epoch: 18   Global Step: 767900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:01:51,181-Speed 2582.41 samples/sec   Loss 1.3389   LearningRate 0.0006   Epoch: 18   Global Step: 767910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:01:55,051-Speed 2646.66 samples/sec   Loss 1.3637   LearningRate 0.0006   Epoch: 18   Global Step: 767920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:01:58,955-Speed 2624.49 samples/sec   Loss 1.3495   LearningRate 0.0006   Epoch: 18   Global Step: 767930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:02,862-Speed 2621.18 samples/sec   Loss 1.3667   LearningRate 0.0006   Epoch: 18   Global Step: 767940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:06,759-Speed 2628.61 samples/sec   Loss 1.3581   LearningRate 0.0006   Epoch: 18   Global Step: 767950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:10,667-Speed 2620.61 samples/sec   Loss 1.3728   LearningRate 0.0006   Epoch: 18   Global Step: 767960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:14,570-Speed 2624.13 samples/sec   Loss 1.3595   LearningRate 0.0006   Epoch: 18   Global Step: 767970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:18,479-Speed 2620.61 samples/sec   Loss 1.3720   LearningRate 0.0006   Epoch: 18   Global Step: 767980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:22,382-Speed 2624.04 samples/sec   Loss 1.3720   LearningRate 0.0006   Epoch: 18   Global Step: 767990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:26,283-Speed 2625.93 samples/sec   Loss 1.3565   LearningRate 0.0006   Epoch: 18   Global Step: 768000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:30,180-Speed 2628.86 samples/sec   Loss 1.3702   LearningRate 0.0006   Epoch: 18   Global Step: 768010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:34,060-Speed 2639.55 samples/sec   Loss 1.3941   LearningRate 0.0006   Epoch: 18   Global Step: 768020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:37,971-Speed 2618.28 samples/sec   Loss 1.3320   LearningRate 0.0006   Epoch: 18   Global Step: 768030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:41,868-Speed 2628.21 samples/sec   Loss 1.3618   LearningRate 0.0006   Epoch: 18   Global Step: 768040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:45,767-Speed 2627.63 samples/sec   Loss 1.3444   LearningRate 0.0006   Epoch: 18   Global Step: 768050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:49,662-Speed 2630.06 samples/sec   Loss 1.4192   LearningRate 0.0005   Epoch: 18   Global Step: 768060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:53,561-Speed 2627.03 samples/sec   Loss 1.3475   LearningRate 0.0005   Epoch: 18   Global Step: 768070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:02:57,503-Speed 2598.55 samples/sec   Loss 1.3515   LearningRate 0.0005   Epoch: 18   Global Step: 768080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:01,395-Speed 2631.60 samples/sec   Loss 1.4041   LearningRate 0.0005   Epoch: 18   Global Step: 768090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:05,305-Speed 2619.54 samples/sec   Loss 1.3438   LearningRate 0.0005   Epoch: 18   Global Step: 768100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:09,277-Speed 2578.82 samples/sec   Loss 1.3474   LearningRate 0.0005   Epoch: 18   Global Step: 768110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:13,219-Speed 2598.45 samples/sec   Loss 1.3655   LearningRate 0.0005   Epoch: 18   Global Step: 768120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:03:17,115-Speed 2628.71 samples/sec   Loss 1.3706   LearningRate 0.0005   Epoch: 18   Global Step: 768130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:03:21,012-Speed 2628.63 samples/sec   Loss 1.3709   LearningRate 0.0005   Epoch: 18   Global Step: 768140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:24,918-Speed 2622.35 samples/sec   Loss 1.3452   LearningRate 0.0005   Epoch: 18   Global Step: 768150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:28,813-Speed 2629.94 samples/sec   Loss 1.3830   LearningRate 0.0005   Epoch: 18   Global Step: 768160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:32,706-Speed 2630.79 samples/sec   Loss 1.3570   LearningRate 0.0005   Epoch: 18   Global Step: 768170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:36,605-Speed 2626.69 samples/sec   Loss 1.3536   LearningRate 0.0005   Epoch: 18   Global Step: 768180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:40,501-Speed 2629.36 samples/sec   Loss 1.3768   LearningRate 0.0005   Epoch: 18   Global Step: 768190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:44,393-Speed 2631.84 samples/sec   Loss 1.3788   LearningRate 0.0005   Epoch: 18   Global Step: 768200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:48,293-Speed 2626.06 samples/sec   Loss 1.3789   LearningRate 0.0005   Epoch: 18   Global Step: 768210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:52,191-Speed 2627.98 samples/sec   Loss 1.3799   LearningRate 0.0005   Epoch: 18   Global Step: 768220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:56,084-Speed 2631.29 samples/sec   Loss 1.3212   LearningRate 0.0005   Epoch: 18   Global Step: 768230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:03:59,981-Speed 2628.24 samples/sec   Loss 1.3455   LearningRate 0.0005   Epoch: 18   Global Step: 768240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:04:03,874-Speed 2630.92 samples/sec   Loss 1.3924   LearningRate 0.0005   Epoch: 18   Global Step: 768250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:04:07,766-Speed 2631.90 samples/sec   Loss 1.3943   LearningRate 0.0005   Epoch: 18   Global Step: 768260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:04:11,658-Speed 2631.22 samples/sec   Loss 1.4132   LearningRate 0.0005   Epoch: 18   Global Step: 768270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:04:15,558-Speed 2626.60 samples/sec   Loss 1.3557   LearningRate 0.0005   Epoch: 18   Global Step: 768280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:04:19,460-Speed 2625.20 samples/sec   Loss 1.3544   LearningRate 0.0005   Epoch: 18   Global Step: 768290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:04:23,359-Speed 2626.70 samples/sec   Loss 1.3742   LearningRate 0.0005   Epoch: 18   Global Step: 768300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:04:27,231-Speed 2645.87 samples/sec   Loss 1.3825   LearningRate 0.0005   Epoch: 18   Global Step: 768310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:31,182-Speed 2592.15 samples/sec   Loss 1.3871   LearningRate 0.0005   Epoch: 18   Global Step: 768320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:35,093-Speed 2619.39 samples/sec   Loss 1.3742   LearningRate 0.0005   Epoch: 18   Global Step: 768330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:38,989-Speed 2628.41 samples/sec   Loss 1.3468   LearningRate 0.0005   Epoch: 18   Global Step: 768340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:42,896-Speed 2621.39 samples/sec   Loss 1.3382   LearningRate 0.0005   Epoch: 18   Global Step: 768350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:46,783-Speed 2634.79 samples/sec   Loss 1.3741   LearningRate 0.0005   Epoch: 18   Global Step: 768360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:50,703-Speed 2614.38 samples/sec   Loss 1.3781   LearningRate 0.0005   Epoch: 18   Global Step: 768370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:54,597-Speed 2629.79 samples/sec   Loss 1.3906   LearningRate 0.0005   Epoch: 18   Global Step: 768380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:04:58,512-Speed 2617.16 samples/sec   Loss 1.3463   LearningRate 0.0005   Epoch: 18   Global Step: 768390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:02,406-Speed 2629.67 samples/sec   Loss 1.3882   LearningRate 0.0005   Epoch: 18   Global Step: 768400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:06,334-Speed 2608.12 samples/sec   Loss 1.4006   LearningRate 0.0005   Epoch: 18   Global Step: 768410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:10,225-Speed 2632.88 samples/sec   Loss 1.3641   LearningRate 0.0005   Epoch: 18   Global Step: 768420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:14,115-Speed 2632.75 samples/sec   Loss 1.3654   LearningRate 0.0005   Epoch: 18   Global Step: 768430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:18,036-Speed 2612.16 samples/sec   Loss 1.3793   LearningRate 0.0005   Epoch: 18   Global Step: 768440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:21,927-Speed 2632.49 samples/sec   Loss 1.3568   LearningRate 0.0005   Epoch: 18   Global Step: 768450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:25,819-Speed 2631.42 samples/sec   Loss 1.3350   LearningRate 0.0005   Epoch: 18   Global Step: 768460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:29,712-Speed 2631.27 samples/sec   Loss 1.3974   LearningRate 0.0005   Epoch: 18   Global Step: 768470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:33,604-Speed 2632.27 samples/sec   Loss 1.3601   LearningRate 0.0005   Epoch: 18   Global Step: 768480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:37,499-Speed 2629.26 samples/sec   Loss 1.3546   LearningRate 0.0005   Epoch: 18   Global Step: 768490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:41,411-Speed 2618.28 samples/sec   Loss 1.3438   LearningRate 0.0005   Epoch: 18   Global Step: 768500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:45,321-Speed 2620.08 samples/sec   Loss 1.3667   LearningRate 0.0005   Epoch: 18   Global Step: 768510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:05:49,192-Speed 2645.91 samples/sec   Loss 1.4139   LearningRate 0.0005   Epoch: 18   Global Step: 768520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:53,085-Speed 2630.71 samples/sec   Loss 1.3665   LearningRate 0.0005   Epoch: 18   Global Step: 768530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:05:56,973-Speed 2635.01 samples/sec   Loss 1.3997   LearningRate 0.0005   Epoch: 18   Global Step: 768540   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:00,880-Speed 2621.26 samples/sec   Loss 1.3784   LearningRate 0.0005   Epoch: 18   Global Step: 768550   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:04,782-Speed 2625.52 samples/sec   Loss 1.2946   LearningRate 0.0005   Epoch: 18   Global Step: 768560   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:08,679-Speed 2627.85 samples/sec   Loss 1.3874   LearningRate 0.0005   Epoch: 18   Global Step: 768570   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:12,582-Speed 2624.88 samples/sec   Loss 1.3241   LearningRate 0.0005   Epoch: 18   Global Step: 768580   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:16,477-Speed 2629.60 samples/sec   Loss 1.3634   LearningRate 0.0005   Epoch: 18   Global Step: 768590   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:20,370-Speed 2631.15 samples/sec   Loss 1.3423   LearningRate 0.0005   Epoch: 18   Global Step: 768600   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:24,301-Speed 2605.09 samples/sec   Loss 1.3290   LearningRate 0.0005   Epoch: 18   Global Step: 768610   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:28,201-Speed 2626.35 samples/sec   Loss 1.3574   LearningRate 0.0005   Epoch: 18   Global Step: 768620   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:32,098-Speed 2628.44 samples/sec   Loss 1.3578   LearningRate 0.0005   Epoch: 18   Global Step: 768630   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:06:35,996-Speed 2627.69 samples/sec   Loss 1.3312   LearningRate 0.0005   Epoch: 18   Global Step: 768640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:06:39,893-Speed 2629.13 samples/sec   Loss 1.3409   LearningRate 0.0005   Epoch: 18   Global Step: 768650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:06:43,800-Speed 2621.05 samples/sec   Loss 1.3495   LearningRate 0.0005   Epoch: 18   Global Step: 768660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:06:47,694-Speed 2630.81 samples/sec   Loss 1.3136   LearningRate 0.0005   Epoch: 18   Global Step: 768670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:06:51,586-Speed 2631.47 samples/sec   Loss 1.3780   LearningRate 0.0005   Epoch: 18   Global Step: 768680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:06:55,477-Speed 2632.72 samples/sec   Loss 1.3532   LearningRate 0.0005   Epoch: 18   Global Step: 768690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:06:59,370-Speed 2630.85 samples/sec   Loss 1.3527   LearningRate 0.0005   Epoch: 18   Global Step: 768700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:03,267-Speed 2628.39 samples/sec   Loss 1.3949   LearningRate 0.0005   Epoch: 18   Global Step: 768710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:07,169-Speed 2625.09 samples/sec   Loss 1.3647   LearningRate 0.0005   Epoch: 18   Global Step: 768720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:11,094-Speed 2609.25 samples/sec   Loss 1.3784   LearningRate 0.0005   Epoch: 18   Global Step: 768730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:14,995-Speed 2626.30 samples/sec   Loss 1.3878   LearningRate 0.0005   Epoch: 18   Global Step: 768740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:07:18,870-Speed 2642.95 samples/sec   Loss 1.3864   LearningRate 0.0005   Epoch: 18   Global Step: 768750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:22,887-Speed 2549.79 samples/sec   Loss 1.3329   LearningRate 0.0005   Epoch: 18   Global Step: 768760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:26,781-Speed 2630.25 samples/sec   Loss 1.3783   LearningRate 0.0005   Epoch: 18   Global Step: 768770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:30,678-Speed 2628.28 samples/sec   Loss 1.3997   LearningRate 0.0005   Epoch: 18   Global Step: 768780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:34,577-Speed 2627.05 samples/sec   Loss 1.3477   LearningRate 0.0005   Epoch: 18   Global Step: 768790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:38,473-Speed 2629.38 samples/sec   Loss 1.3303   LearningRate 0.0005   Epoch: 18   Global Step: 768800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:42,367-Speed 2630.21 samples/sec   Loss 1.3132   LearningRate 0.0005   Epoch: 18   Global Step: 768810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:46,263-Speed 2629.26 samples/sec   Loss 1.3093   LearningRate 0.0005   Epoch: 18   Global Step: 768820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:50,152-Speed 2633.57 samples/sec   Loss 1.3623   LearningRate 0.0005   Epoch: 18   Global Step: 768830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:54,045-Speed 2631.45 samples/sec   Loss 1.3323   LearningRate 0.0005   Epoch: 18   Global Step: 768840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:07:57,935-Speed 2633.56 samples/sec   Loss 1.2925   LearningRate 0.0005   Epoch: 18   Global Step: 768850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:08:01,804-Speed 2647.24 samples/sec   Loss 1.3669   LearningRate 0.0005   Epoch: 18   Global Step: 768860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:05,695-Speed 2632.32 samples/sec   Loss 1.3387   LearningRate 0.0005   Epoch: 18   Global Step: 768870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:09,585-Speed 2632.48 samples/sec   Loss 1.3592   LearningRate 0.0005   Epoch: 18   Global Step: 768880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:13,479-Speed 2631.21 samples/sec   Loss 1.3808   LearningRate 0.0005   Epoch: 18   Global Step: 768890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:17,374-Speed 2629.14 samples/sec   Loss 1.3702   LearningRate 0.0005   Epoch: 18   Global Step: 768900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:21,264-Speed 2633.32 samples/sec   Loss 1.3783   LearningRate 0.0005   Epoch: 18   Global Step: 768910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:25,161-Speed 2628.61 samples/sec   Loss 1.3290   LearningRate 0.0005   Epoch: 18   Global Step: 768920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:29,060-Speed 2627.39 samples/sec   Loss 1.3628   LearningRate 0.0005   Epoch: 18   Global Step: 768930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:32,956-Speed 2628.94 samples/sec   Loss 1.3532   LearningRate 0.0005   Epoch: 18   Global Step: 768940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:36,849-Speed 2630.59 samples/sec   Loss 1.3400   LearningRate 0.0005   Epoch: 18   Global Step: 768950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:08:40,752-Speed 2624.48 samples/sec   Loss 1.3830   LearningRate 0.0005   Epoch: 18   Global Step: 768960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:08:44,654-Speed 2625.18 samples/sec   Loss 1.3329   LearningRate 0.0005   Epoch: 18   Global Step: 768970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:08:48,545-Speed 2631.87 samples/sec   Loss 1.3538   LearningRate 0.0005   Epoch: 18   Global Step: 768980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:08:52,446-Speed 2626.28 samples/sec   Loss 1.3676   LearningRate 0.0005   Epoch: 18   Global Step: 768990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:08:56,335-Speed 2633.50 samples/sec   Loss 1.3558   LearningRate 0.0005   Epoch: 18   Global Step: 769000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:09:00,246-Speed 2620.36 samples/sec   Loss 1.4145   LearningRate 0.0005   Epoch: 18   Global Step: 769010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:09:04,142-Speed 2628.55 samples/sec   Loss 1.3261   LearningRate 0.0005   Epoch: 18   Global Step: 769020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:09:08,041-Speed 2627.58 samples/sec   Loss 1.3504   LearningRate 0.0005   Epoch: 18   Global Step: 769030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:09:11,938-Speed 2628.09 samples/sec   Loss 1.3780   LearningRate 0.0005   Epoch: 18   Global Step: 769040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:09:15,816-Speed 2641.41 samples/sec   Loss 1.3453   LearningRate 0.0005   Epoch: 18   Global Step: 769050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:19,711-Speed 2629.08 samples/sec   Loss 1.3756   LearningRate 0.0005   Epoch: 18   Global Step: 769060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:23,612-Speed 2626.00 samples/sec   Loss 1.3718   LearningRate 0.0005   Epoch: 18   Global Step: 769070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:27,517-Speed 2623.32 samples/sec   Loss 1.3548   LearningRate 0.0005   Epoch: 18   Global Step: 769080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:31,451-Speed 2603.81 samples/sec   Loss 1.3783   LearningRate 0.0005   Epoch: 18   Global Step: 769090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:35,353-Speed 2624.90 samples/sec   Loss 1.3093   LearningRate 0.0005   Epoch: 18   Global Step: 769100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:39,249-Speed 2628.75 samples/sec   Loss 1.3254   LearningRate 0.0005   Epoch: 18   Global Step: 769110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:43,137-Speed 2634.02 samples/sec   Loss 1.3885   LearningRate 0.0005   Epoch: 18   Global Step: 769120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:47,034-Speed 2628.37 samples/sec   Loss 1.3792   LearningRate 0.0005   Epoch: 18   Global Step: 769130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:50,929-Speed 2629.65 samples/sec   Loss 1.3514   LearningRate 0.0005   Epoch: 18   Global Step: 769140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:54,801-Speed 2645.80 samples/sec   Loss 1.3443   LearningRate 0.0005   Epoch: 18   Global Step: 769150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:09:58,701-Speed 2626.54 samples/sec   Loss 1.3378   LearningRate 0.0005   Epoch: 18   Global Step: 769160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:02,662-Speed 2585.84 samples/sec   Loss 1.3758   LearningRate 0.0005   Epoch: 18   Global Step: 769170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:06,693-Speed 2540.48 samples/sec   Loss 1.3700   LearningRate 0.0005   Epoch: 18   Global Step: 769180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:10,604-Speed 2618.77 samples/sec   Loss 1.3266   LearningRate 0.0005   Epoch: 18   Global Step: 769190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:14,506-Speed 2625.48 samples/sec   Loss 1.3461   LearningRate 0.0005   Epoch: 18   Global Step: 769200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:18,408-Speed 2624.37 samples/sec   Loss 1.3512   LearningRate 0.0005   Epoch: 18   Global Step: 769210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:22,305-Speed 2628.49 samples/sec   Loss 1.3676   LearningRate 0.0005   Epoch: 18   Global Step: 769220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:26,202-Speed 2628.62 samples/sec   Loss 1.4536   LearningRate 0.0005   Epoch: 18   Global Step: 769230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:30,198-Speed 2563.03 samples/sec   Loss 1.3320   LearningRate 0.0005   Epoch: 18   Global Step: 769240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:34,255-Speed 2524.89 samples/sec   Loss 1.3474   LearningRate 0.0005   Epoch: 18   Global Step: 769250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:38,318-Speed 2520.42 samples/sec   Loss 1.4014   LearningRate 0.0005   Epoch: 18   Global Step: 769260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:42,222-Speed 2623.39 samples/sec   Loss 1.3319   LearningRate 0.0005   Epoch: 18   Global Step: 769270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:46,116-Speed 2630.95 samples/sec   Loss 1.3531   LearningRate 0.0005   Epoch: 18   Global Step: 769280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:50,014-Speed 2627.00 samples/sec   Loss 1.3965   LearningRate 0.0005   Epoch: 18   Global Step: 769290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:53,922-Speed 2620.94 samples/sec   Loss 1.3845   LearningRate 0.0005   Epoch: 18   Global Step: 769300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:10:57,821-Speed 2627.46 samples/sec   Loss 1.3390   LearningRate 0.0005   Epoch: 18   Global Step: 769310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:01,716-Speed 2629.45 samples/sec   Loss 1.2997   LearningRate 0.0005   Epoch: 18   Global Step: 769320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:05,615-Speed 2627.76 samples/sec   Loss 1.3495   LearningRate 0.0005   Epoch: 18   Global Step: 769330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:09,512-Speed 2628.09 samples/sec   Loss 1.3675   LearningRate 0.0005   Epoch: 18   Global Step: 769340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:13,402-Speed 2632.26 samples/sec   Loss 1.3705   LearningRate 0.0005   Epoch: 18   Global Step: 769350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:11:17,297-Speed 2629.45 samples/sec   Loss 1.3645   LearningRate 0.0005   Epoch: 18   Global Step: 769360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:11:21,192-Speed 2630.51 samples/sec   Loss 1.3365   LearningRate 0.0005   Epoch: 18   Global Step: 769370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:11:25,062-Speed 2646.37 samples/sec   Loss 1.3472   LearningRate 0.0005   Epoch: 18   Global Step: 769380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:28,955-Speed 2631.28 samples/sec   Loss 1.3147   LearningRate 0.0005   Epoch: 18   Global Step: 769390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:32,854-Speed 2627.13 samples/sec   Loss 1.3767   LearningRate 0.0005   Epoch: 18   Global Step: 769400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:36,746-Speed 2632.00 samples/sec   Loss 1.3425   LearningRate 0.0005   Epoch: 18   Global Step: 769410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:40,637-Speed 2631.84 samples/sec   Loss 1.3753   LearningRate 0.0005   Epoch: 18   Global Step: 769420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:44,535-Speed 2627.98 samples/sec   Loss 1.3882   LearningRate 0.0005   Epoch: 18   Global Step: 769430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:48,440-Speed 2622.68 samples/sec   Loss 1.4270   LearningRate 0.0005   Epoch: 18   Global Step: 769440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:52,330-Speed 2632.75 samples/sec   Loss 1.3803   LearningRate 0.0005   Epoch: 18   Global Step: 769450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:11:56,223-Speed 2631.24 samples/sec   Loss 1.3631   LearningRate 0.0005   Epoch: 18   Global Step: 769460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:00,119-Speed 2628.63 samples/sec   Loss 1.3890   LearningRate 0.0005   Epoch: 18   Global Step: 769470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:04,015-Speed 2629.29 samples/sec   Loss 1.3538   LearningRate 0.0005   Epoch: 18   Global Step: 769480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:12:07,888-Speed 2644.83 samples/sec   Loss 1.3178   LearningRate 0.0005   Epoch: 18   Global Step: 769490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:11,781-Speed 2630.64 samples/sec   Loss 1.3323   LearningRate 0.0005   Epoch: 18   Global Step: 769500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:15,682-Speed 2625.62 samples/sec   Loss 1.3402   LearningRate 0.0005   Epoch: 18   Global Step: 769510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:19,790-Speed 2493.80 samples/sec   Loss 1.3606   LearningRate 0.0005   Epoch: 18   Global Step: 769520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:23,781-Speed 2566.27 samples/sec   Loss 1.3163   LearningRate 0.0005   Epoch: 18   Global Step: 769530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:27,710-Speed 2606.98 samples/sec   Loss 1.3138   LearningRate 0.0005   Epoch: 18   Global Step: 769540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:31,601-Speed 2633.00 samples/sec   Loss 1.3424   LearningRate 0.0005   Epoch: 18   Global Step: 769550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:35,497-Speed 2629.69 samples/sec   Loss 1.3942   LearningRate 0.0005   Epoch: 18   Global Step: 769560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:39,397-Speed 2626.63 samples/sec   Loss 1.3159   LearningRate 0.0005   Epoch: 18   Global Step: 769570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:43,290-Speed 2630.95 samples/sec   Loss 1.3802   LearningRate 0.0005   Epoch: 18   Global Step: 769580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:47,162-Speed 2645.31 samples/sec   Loss 1.3518   LearningRate 0.0005   Epoch: 18   Global Step: 769590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:51,060-Speed 2627.32 samples/sec   Loss 1.3257   LearningRate 0.0005   Epoch: 18   Global Step: 769600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:54,957-Speed 2629.20 samples/sec   Loss 1.3355   LearningRate 0.0005   Epoch: 18   Global Step: 769610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:12:58,854-Speed 2627.70 samples/sec   Loss 1.3556   LearningRate 0.0005   Epoch: 18   Global Step: 769620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:02,802-Speed 2594.61 samples/sec   Loss 1.3239   LearningRate 0.0005   Epoch: 18   Global Step: 769630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:06,708-Speed 2622.14 samples/sec   Loss 1.3925   LearningRate 0.0005   Epoch: 18   Global Step: 769640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:10,605-Speed 2628.79 samples/sec   Loss 1.3679   LearningRate 0.0005   Epoch: 18   Global Step: 769650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:14,501-Speed 2628.98 samples/sec   Loss 1.3846   LearningRate 0.0005   Epoch: 18   Global Step: 769660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:18,401-Speed 2625.82 samples/sec   Loss 1.3440   LearningRate 0.0005   Epoch: 18   Global Step: 769670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:22,295-Speed 2631.52 samples/sec   Loss 1.3931   LearningRate 0.0005   Epoch: 18   Global Step: 769680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:26,193-Speed 2627.60 samples/sec   Loss 1.3173   LearningRate 0.0005   Epoch: 18   Global Step: 769690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:13:30,089-Speed 2628.97 samples/sec   Loss 1.3549   LearningRate 0.0005   Epoch: 18   Global Step: 769700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:13:33,986-Speed 2628.37 samples/sec   Loss 1.3188   LearningRate 0.0005   Epoch: 18   Global Step: 769710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:13:37,896-Speed 2619.89 samples/sec   Loss 1.3674   LearningRate 0.0005   Epoch: 18   Global Step: 769720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:13:41,780-Speed 2636.73 samples/sec   Loss 1.3202   LearningRate 0.0005   Epoch: 18   Global Step: 769730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:45,669-Speed 2633.93 samples/sec   Loss 1.3415   LearningRate 0.0005   Epoch: 18   Global Step: 769740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:49,574-Speed 2622.95 samples/sec   Loss 1.3158   LearningRate 0.0005   Epoch: 18   Global Step: 769750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:53,539-Speed 2583.21 samples/sec   Loss 1.3029   LearningRate 0.0005   Epoch: 18   Global Step: 769760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:13:57,408-Speed 2647.45 samples/sec   Loss 1.3794   LearningRate 0.0005   Epoch: 18   Global Step: 769770   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:01,309-Speed 2626.47 samples/sec   Loss 1.3778   LearningRate 0.0005   Epoch: 18   Global Step: 769780   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:05,207-Speed 2627.42 samples/sec   Loss 1.3756   LearningRate 0.0005   Epoch: 18   Global Step: 769790   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:09,115-Speed 2620.64 samples/sec   Loss 1.3724   LearningRate 0.0005   Epoch: 18   Global Step: 769800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:13,012-Speed 2628.51 samples/sec   Loss 1.4210   LearningRate 0.0005   Epoch: 18   Global Step: 769810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:16,903-Speed 2632.61 samples/sec   Loss 1.3319   LearningRate 0.0005   Epoch: 18   Global Step: 769820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:20,800-Speed 2628.97 samples/sec   Loss 1.3861   LearningRate 0.0005   Epoch: 18   Global Step: 769830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:24,699-Speed 2626.92 samples/sec   Loss 1.4014   LearningRate 0.0005   Epoch: 18   Global Step: 769840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:28,612-Speed 2617.62 samples/sec   Loss 1.3219   LearningRate 0.0005   Epoch: 18   Global Step: 769850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:32,511-Speed 2626.95 samples/sec   Loss 1.3011   LearningRate 0.0005   Epoch: 18   Global Step: 769860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:14:36,429-Speed 2614.42 samples/sec   Loss 1.3138   LearningRate 0.0005   Epoch: 18   Global Step: 769870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:14:40,338-Speed 2619.96 samples/sec   Loss 1.3389   LearningRate 0.0005   Epoch: 18   Global Step: 769880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:14:44,235-Speed 2628.58 samples/sec   Loss 1.3351   LearningRate 0.0005   Epoch: 18   Global Step: 769890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:14:48,127-Speed 2631.49 samples/sec   Loss 1.3412   LearningRate 0.0005   Epoch: 18   Global Step: 769900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:14:52,024-Speed 2629.01 samples/sec   Loss 1.3360   LearningRate 0.0005   Epoch: 18   Global Step: 769910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:14:55,914-Speed 2632.61 samples/sec   Loss 1.3135   LearningRate 0.0005   Epoch: 18   Global Step: 769920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:14:59,814-Speed 2626.09 samples/sec   Loss 1.3319   LearningRate 0.0005   Epoch: 18   Global Step: 769930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:15:03,725-Speed 2619.64 samples/sec   Loss 1.3189   LearningRate 0.0005   Epoch: 18   Global Step: 769940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:15:07,625-Speed 2626.10 samples/sec   Loss 1.3572   LearningRate 0.0005   Epoch: 18   Global Step: 769950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:15:11,526-Speed 2625.16 samples/sec   Loss 1.3197   LearningRate 0.0005   Epoch: 18   Global Step: 769960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:15:15,420-Speed 2630.45 samples/sec   Loss 1.3337   LearningRate 0.0005   Epoch: 18   Global Step: 769970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:15:19,287-Speed 2648.39 samples/sec   Loss 1.3225   LearningRate 0.0005   Epoch: 18   Global Step: 769980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:15:23,210-Speed 2611.38 samples/sec   Loss 1.3498   LearningRate 0.0005   Epoch: 18   Global Step: 769990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:15:27,111-Speed 2625.76 samples/sec   Loss 1.3791   LearningRate 0.0005   Epoch: 18   Global Step: 770000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:16:10,878-[lfw][770000]XNorm: 21.483209
Training: 2022-04-16 10:16:10,879-[lfw][770000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 10:16:10,879-[lfw][770000]Accuracy-Highest: 0.99850
Training: 2022-04-16 10:17:01,858-[cfp_fp][770000]XNorm: 21.984303
Training: 2022-04-16 10:17:01,859-[cfp_fp][770000]Accuracy-Flip: 0.99343+-0.00333
Training: 2022-04-16 10:17:01,860-[cfp_fp][770000]Accuracy-Highest: 0.99343
Training: 2022-04-16 10:17:44,987-[agedb_30][770000]XNorm: 22.443159
Training: 2022-04-16 10:17:44,988-[agedb_30][770000]Accuracy-Flip: 0.98367+-0.00714
Training: 2022-04-16 10:17:44,988-[agedb_30][770000]Accuracy-Highest: 0.98450
Training: 2022-04-16 10:17:48,877-Speed 72.23 samples/sec   Loss 1.3656   LearningRate 0.0005   Epoch: 18   Global Step: 770010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:17:52,750-Speed 2644.40 samples/sec   Loss 1.3216   LearningRate 0.0005   Epoch: 18   Global Step: 770020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:17:56,624-Speed 2644.20 samples/sec   Loss 1.3013   LearningRate 0.0005   Epoch: 18   Global Step: 770030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:18:00,508-Speed 2636.81 samples/sec   Loss 1.3065   LearningRate 0.0005   Epoch: 18   Global Step: 770040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:18:04,406-Speed 2629.33 samples/sec   Loss 1.3680   LearningRate 0.0005   Epoch: 18   Global Step: 770050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:18:08,282-Speed 2641.77 samples/sec   Loss 1.3509   LearningRate 0.0005   Epoch: 18   Global Step: 770060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:18:12,176-Speed 2630.90 samples/sec   Loss 1.3165   LearningRate 0.0005   Epoch: 18   Global Step: 770070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:18:16,065-Speed 2633.77 samples/sec   Loss 1.3652   LearningRate 0.0005   Epoch: 18   Global Step: 770080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:18:19,931-Speed 2649.48 samples/sec   Loss 1.3900   LearningRate 0.0005   Epoch: 18   Global Step: 770090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:18:23,799-Speed 2648.03 samples/sec   Loss 1.3463   LearningRate 0.0005   Epoch: 18   Global Step: 770100   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:27,691-Speed 2631.57 samples/sec   Loss 1.3235   LearningRate 0.0005   Epoch: 18   Global Step: 770110   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:31,585-Speed 2630.54 samples/sec   Loss 1.3608   LearningRate 0.0005   Epoch: 18   Global Step: 770120   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:35,530-Speed 2597.27 samples/sec   Loss 1.3526   LearningRate 0.0005   Epoch: 18   Global Step: 770130   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:39,414-Speed 2637.05 samples/sec   Loss 1.3361   LearningRate 0.0005   Epoch: 18   Global Step: 770140   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:43,306-Speed 2632.02 samples/sec   Loss 1.3220   LearningRate 0.0005   Epoch: 18   Global Step: 770150   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:47,206-Speed 2626.21 samples/sec   Loss 1.3443   LearningRate 0.0005   Epoch: 18   Global Step: 770160   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:51,115-Speed 2620.51 samples/sec   Loss 1.3536   LearningRate 0.0005   Epoch: 18   Global Step: 770170   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:55,019-Speed 2623.34 samples/sec   Loss 1.3121   LearningRate 0.0005   Epoch: 18   Global Step: 770180   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:18:58,931-Speed 2618.22 samples/sec   Loss 1.3555   LearningRate 0.0005   Epoch: 18   Global Step: 770190   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:19:02,827-Speed 2629.13 samples/sec   Loss 1.2780   LearningRate 0.0005   Epoch: 18   Global Step: 770200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:06,725-Speed 2628.05 samples/sec   Loss 1.3089   LearningRate 0.0005   Epoch: 18   Global Step: 770210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:10,622-Speed 2628.26 samples/sec   Loss 1.3020   LearningRate 0.0005   Epoch: 18   Global Step: 770220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:14,520-Speed 2627.24 samples/sec   Loss 1.3566   LearningRate 0.0005   Epoch: 18   Global Step: 770230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:18,422-Speed 2625.15 samples/sec   Loss 1.3304   LearningRate 0.0005   Epoch: 18   Global Step: 770240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:22,329-Speed 2621.63 samples/sec   Loss 1.3726   LearningRate 0.0005   Epoch: 18   Global Step: 770250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:26,233-Speed 2623.46 samples/sec   Loss 1.3507   LearningRate 0.0005   Epoch: 18   Global Step: 770260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:30,143-Speed 2619.21 samples/sec   Loss 1.3338   LearningRate 0.0005   Epoch: 18   Global Step: 770270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:34,043-Speed 2626.20 samples/sec   Loss 1.3971   LearningRate 0.0005   Epoch: 18   Global Step: 770280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:37,946-Speed 2624.26 samples/sec   Loss 1.3095   LearningRate 0.0005   Epoch: 18   Global Step: 770290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:19:41,853-Speed 2622.20 samples/sec   Loss 1.3763   LearningRate 0.0005   Epoch: 18   Global Step: 770300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:19:45,752-Speed 2626.87 samples/sec   Loss 1.3586   LearningRate 0.0005   Epoch: 18   Global Step: 770310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:19:49,703-Speed 2592.28 samples/sec   Loss 1.3165   LearningRate 0.0005   Epoch: 18   Global Step: 770320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:19:53,610-Speed 2621.93 samples/sec   Loss 1.3955   LearningRate 0.0005   Epoch: 18   Global Step: 770330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:19:57,510-Speed 2626.27 samples/sec   Loss 1.3725   LearningRate 0.0005   Epoch: 18   Global Step: 770340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:20:01,385-Speed 2643.31 samples/sec   Loss 1.3370   LearningRate 0.0005   Epoch: 18   Global Step: 770350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:05,281-Speed 2628.81 samples/sec   Loss 1.3048   LearningRate 0.0005   Epoch: 18   Global Step: 770360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:09,186-Speed 2622.74 samples/sec   Loss 1.3581   LearningRate 0.0005   Epoch: 18   Global Step: 770370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:13,089-Speed 2625.25 samples/sec   Loss 1.3060   LearningRate 0.0005   Epoch: 18   Global Step: 770380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:16,988-Speed 2626.20 samples/sec   Loss 1.3407   LearningRate 0.0005   Epoch: 18   Global Step: 770390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:20,893-Speed 2623.25 samples/sec   Loss 1.3777   LearningRate 0.0005   Epoch: 18   Global Step: 770400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:24,791-Speed 2627.78 samples/sec   Loss 1.3218   LearningRate 0.0005   Epoch: 18   Global Step: 770410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:28,692-Speed 2625.89 samples/sec   Loss 1.3431   LearningRate 0.0005   Epoch: 18   Global Step: 770420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:32,594-Speed 2625.33 samples/sec   Loss 1.3378   LearningRate 0.0005   Epoch: 18   Global Step: 770430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:36,492-Speed 2627.23 samples/sec   Loss 1.3736   LearningRate 0.0005   Epoch: 18   Global Step: 770440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:40,369-Speed 2641.26 samples/sec   Loss 1.3058   LearningRate 0.0005   Epoch: 18   Global Step: 770450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:44,277-Speed 2621.56 samples/sec   Loss 1.2990   LearningRate 0.0005   Epoch: 18   Global Step: 770460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:48,181-Speed 2623.56 samples/sec   Loss 1.3076   LearningRate 0.0005   Epoch: 18   Global Step: 770470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:52,071-Speed 2632.91 samples/sec   Loss 1.3377   LearningRate 0.0005   Epoch: 18   Global Step: 770480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:55,965-Speed 2631.14 samples/sec   Loss 1.4057   LearningRate 0.0005   Epoch: 18   Global Step: 770490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:20:59,860-Speed 2629.67 samples/sec   Loss 1.3748   LearningRate 0.0005   Epoch: 18   Global Step: 770500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:03,763-Speed 2624.12 samples/sec   Loss 1.3407   LearningRate 0.0005   Epoch: 18   Global Step: 770510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:07,670-Speed 2621.71 samples/sec   Loss 1.3585   LearningRate 0.0005   Epoch: 18   Global Step: 770520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:11,565-Speed 2629.52 samples/sec   Loss 1.3294   LearningRate 0.0005   Epoch: 18   Global Step: 770530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:15,459-Speed 2630.01 samples/sec   Loss 1.3640   LearningRate 0.0005   Epoch: 18   Global Step: 770540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:19,365-Speed 2623.18 samples/sec   Loss 1.3106   LearningRate 0.0005   Epoch: 18   Global Step: 770550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:21:23,233-Speed 2647.63 samples/sec   Loss 1.3428   LearningRate 0.0005   Epoch: 18   Global Step: 770560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:27,131-Speed 2628.30 samples/sec   Loss 1.3051   LearningRate 0.0005   Epoch: 18   Global Step: 770570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:31,030-Speed 2626.30 samples/sec   Loss 1.3024   LearningRate 0.0005   Epoch: 18   Global Step: 770580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:34,943-Speed 2617.98 samples/sec   Loss 1.3747   LearningRate 0.0005   Epoch: 18   Global Step: 770590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:38,843-Speed 2625.65 samples/sec   Loss 1.3422   LearningRate 0.0005   Epoch: 18   Global Step: 770600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:42,748-Speed 2623.31 samples/sec   Loss 1.3507   LearningRate 0.0005   Epoch: 18   Global Step: 770610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:46,653-Speed 2622.94 samples/sec   Loss 1.3292   LearningRate 0.0005   Epoch: 18   Global Step: 770620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:50,578-Speed 2609.45 samples/sec   Loss 1.4151   LearningRate 0.0005   Epoch: 18   Global Step: 770630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:54,517-Speed 2600.68 samples/sec   Loss 1.3415   LearningRate 0.0005   Epoch: 18   Global Step: 770640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:21:58,469-Speed 2591.91 samples/sec   Loss 1.3344   LearningRate 0.0005   Epoch: 18   Global Step: 770650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:02,390-Speed 2611.91 samples/sec   Loss 1.3554   LearningRate 0.0005   Epoch: 18   Global Step: 770660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:22:06,281-Speed 2632.45 samples/sec   Loss 1.3447   LearningRate 0.0005   Epoch: 18   Global Step: 770670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:10,186-Speed 2622.74 samples/sec   Loss 1.3156   LearningRate 0.0005   Epoch: 18   Global Step: 770680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:14,093-Speed 2621.91 samples/sec   Loss 1.3611   LearningRate 0.0005   Epoch: 18   Global Step: 770690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:17,998-Speed 2624.26 samples/sec   Loss 1.3572   LearningRate 0.0005   Epoch: 18   Global Step: 770700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:21,902-Speed 2623.30 samples/sec   Loss 1.3337   LearningRate 0.0005   Epoch: 18   Global Step: 770710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:25,845-Speed 2597.48 samples/sec   Loss 1.2963   LearningRate 0.0005   Epoch: 18   Global Step: 770720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:29,758-Speed 2618.09 samples/sec   Loss 1.3519   LearningRate 0.0005   Epoch: 18   Global Step: 770730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:33,664-Speed 2622.46 samples/sec   Loss 1.3906   LearningRate 0.0005   Epoch: 18   Global Step: 770740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:37,570-Speed 2621.80 samples/sec   Loss 1.3527   LearningRate 0.0005   Epoch: 18   Global Step: 770750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:41,467-Speed 2628.01 samples/sec   Loss 1.3551   LearningRate 0.0005   Epoch: 18   Global Step: 770760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:45,360-Speed 2631.50 samples/sec   Loss 1.3543   LearningRate 0.0005   Epoch: 18   Global Step: 770770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:22:49,368-Speed 2555.63 samples/sec   Loss 1.3086   LearningRate 0.0005   Epoch: 18   Global Step: 770780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:22:53,331-Speed 2584.96 samples/sec   Loss 1.3365   LearningRate 0.0005   Epoch: 18   Global Step: 770790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:22:57,207-Speed 2642.26 samples/sec   Loss 1.3399   LearningRate 0.0005   Epoch: 18   Global Step: 770800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:01,108-Speed 2625.95 samples/sec   Loss 1.3603   LearningRate 0.0005   Epoch: 18   Global Step: 770810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:05,003-Speed 2629.08 samples/sec   Loss 1.3838   LearningRate 0.0005   Epoch: 18   Global Step: 770820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:08,906-Speed 2624.86 samples/sec   Loss 1.3499   LearningRate 0.0005   Epoch: 18   Global Step: 770830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:12,803-Speed 2628.13 samples/sec   Loss 1.3349   LearningRate 0.0005   Epoch: 18   Global Step: 770840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:16,703-Speed 2626.15 samples/sec   Loss 1.3013   LearningRate 0.0005   Epoch: 18   Global Step: 770850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:20,602-Speed 2627.25 samples/sec   Loss 1.3309   LearningRate 0.0005   Epoch: 18   Global Step: 770860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:24,498-Speed 2628.89 samples/sec   Loss 1.3440   LearningRate 0.0005   Epoch: 18   Global Step: 770870   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:28,391-Speed 2631.42 samples/sec   Loss 1.3540   LearningRate 0.0005   Epoch: 18   Global Step: 770880   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:32,293-Speed 2624.93 samples/sec   Loss 1.3633   LearningRate 0.0005   Epoch: 18   Global Step: 770890   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:23:36,190-Speed 2628.01 samples/sec   Loss 1.3554   LearningRate 0.0005   Epoch: 18   Global Step: 770900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:23:40,095-Speed 2622.95 samples/sec   Loss 1.3672   LearningRate 0.0005   Epoch: 18   Global Step: 770910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:23:43,991-Speed 2629.35 samples/sec   Loss 1.3595   LearningRate 0.0005   Epoch: 18   Global Step: 770920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:23:47,888-Speed 2628.83 samples/sec   Loss 1.3471   LearningRate 0.0005   Epoch: 18   Global Step: 770930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:23:51,785-Speed 2628.85 samples/sec   Loss 1.3835   LearningRate 0.0005   Epoch: 18   Global Step: 770940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:23:55,719-Speed 2604.28 samples/sec   Loss 1.3032   LearningRate 0.0005   Epoch: 18   Global Step: 770950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:23:59,622-Speed 2624.09 samples/sec   Loss 1.3979   LearningRate 0.0005   Epoch: 18   Global Step: 770960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:24:03,519-Speed 2628.15 samples/sec   Loss 1.2964   LearningRate 0.0005   Epoch: 18   Global Step: 770970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:24:07,471-Speed 2591.32 samples/sec   Loss 1.3102   LearningRate 0.0005   Epoch: 18   Global Step: 770980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:24:11,366-Speed 2630.28 samples/sec   Loss 1.3126   LearningRate 0.0005   Epoch: 18   Global Step: 770990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:24:15,268-Speed 2624.18 samples/sec   Loss 1.3584   LearningRate 0.0005   Epoch: 18   Global Step: 771000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:19,170-Speed 2625.32 samples/sec   Loss 1.3209   LearningRate 0.0005   Epoch: 18   Global Step: 771010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:23,082-Speed 2618.04 samples/sec   Loss 1.3823   LearningRate 0.0005   Epoch: 18   Global Step: 771020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:26,978-Speed 2629.61 samples/sec   Loss 1.3286   LearningRate 0.0005   Epoch: 18   Global Step: 771030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:30,868-Speed 2632.70 samples/sec   Loss 1.3590   LearningRate 0.0005   Epoch: 18   Global Step: 771040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:34,765-Speed 2628.48 samples/sec   Loss 1.2962   LearningRate 0.0005   Epoch: 18   Global Step: 771050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:38,657-Speed 2631.44 samples/sec   Loss 1.3584   LearningRate 0.0005   Epoch: 18   Global Step: 771060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:42,550-Speed 2632.08 samples/sec   Loss 1.3475   LearningRate 0.0005   Epoch: 18   Global Step: 771070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:46,442-Speed 2631.92 samples/sec   Loss 1.3625   LearningRate 0.0005   Epoch: 18   Global Step: 771080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:24:50,288-Speed 2663.41 samples/sec   Loss 1.3298   LearningRate 0.0005   Epoch: 18   Global Step: 771090   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:24:54,184-Speed 2628.89 samples/sec   Loss 1.3242   LearningRate 0.0005   Epoch: 18   Global Step: 771100   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:24:58,078-Speed 2630.73 samples/sec   Loss 1.2739   LearningRate 0.0005   Epoch: 18   Global Step: 771110   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:01,976-Speed 2627.36 samples/sec   Loss 1.3618   LearningRate 0.0005   Epoch: 18   Global Step: 771120   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:05,883-Speed 2621.34 samples/sec   Loss 1.3449   LearningRate 0.0005   Epoch: 18   Global Step: 771130   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:09,802-Speed 2613.94 samples/sec   Loss 1.3359   LearningRate 0.0005   Epoch: 18   Global Step: 771140   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:13,701-Speed 2627.74 samples/sec   Loss 1.3052   LearningRate 0.0005   Epoch: 18   Global Step: 771150   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:17,598-Speed 2627.83 samples/sec   Loss 1.3795   LearningRate 0.0005   Epoch: 18   Global Step: 771160   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:21,532-Speed 2603.92 samples/sec   Loss 1.3580   LearningRate 0.0005   Epoch: 18   Global Step: 771170   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:25,429-Speed 2628.61 samples/sec   Loss 1.2861   LearningRate 0.0005   Epoch: 18   Global Step: 771180   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:25:29,333-Speed 2624.04 samples/sec   Loss 1.3598   LearningRate 0.0005   Epoch: 18   Global Step: 771190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:25:33,229-Speed 2628.31 samples/sec   Loss 1.3460   LearningRate 0.0005   Epoch: 18   Global Step: 771200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:25:37,131-Speed 2624.99 samples/sec   Loss 1.3520   LearningRate 0.0005   Epoch: 18   Global Step: 771210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:25:41,055-Speed 2610.61 samples/sec   Loss 1.3508   LearningRate 0.0005   Epoch: 18   Global Step: 771220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:25:44,953-Speed 2627.57 samples/sec   Loss 1.3230   LearningRate 0.0005   Epoch: 18   Global Step: 771230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:25:48,849-Speed 2628.94 samples/sec   Loss 1.3113   LearningRate 0.0005   Epoch: 18   Global Step: 771240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:25:52,745-Speed 2628.65 samples/sec   Loss 1.3253   LearningRate 0.0005   Epoch: 18   Global Step: 771250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:25:56,637-Speed 2632.06 samples/sec   Loss 1.3782   LearningRate 0.0005   Epoch: 18   Global Step: 771260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:00,530-Speed 2630.92 samples/sec   Loss 1.3406   LearningRate 0.0005   Epoch: 18   Global Step: 771270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:04,441-Speed 2619.01 samples/sec   Loss 1.3472   LearningRate 0.0005   Epoch: 18   Global Step: 771280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:08,341-Speed 2626.05 samples/sec   Loss 1.3155   LearningRate 0.0005   Epoch: 18   Global Step: 771290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:26:12,240-Speed 2626.57 samples/sec   Loss 1.2952   LearningRate 0.0005   Epoch: 18   Global Step: 771300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:26:16,135-Speed 2629.62 samples/sec   Loss 1.2873   LearningRate 0.0005   Epoch: 18   Global Step: 771310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:26:20,034-Speed 2626.85 samples/sec   Loss 1.3183   LearningRate 0.0005   Epoch: 18   Global Step: 771320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:26:23,914-Speed 2640.81 samples/sec   Loss 1.3612   LearningRate 0.0005   Epoch: 18   Global Step: 771330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:27,806-Speed 2631.96 samples/sec   Loss 1.3118   LearningRate 0.0005   Epoch: 18   Global Step: 771340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:31,699-Speed 2631.21 samples/sec   Loss 1.3302   LearningRate 0.0005   Epoch: 18   Global Step: 771350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:35,611-Speed 2618.26 samples/sec   Loss 1.3390   LearningRate 0.0005   Epoch: 18   Global Step: 771360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:39,521-Speed 2619.44 samples/sec   Loss 1.3219   LearningRate 0.0005   Epoch: 18   Global Step: 771370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:43,413-Speed 2631.85 samples/sec   Loss 1.3016   LearningRate 0.0005   Epoch: 18   Global Step: 771380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:47,311-Speed 2627.91 samples/sec   Loss 1.2616   LearningRate 0.0005   Epoch: 18   Global Step: 771390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:51,211-Speed 2626.05 samples/sec   Loss 1.3189   LearningRate 0.0005   Epoch: 18   Global Step: 771400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:55,207-Speed 2564.02 samples/sec   Loss 1.3076   LearningRate 0.0005   Epoch: 18   Global Step: 771410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:26:59,248-Speed 2534.09 samples/sec   Loss 1.2760   LearningRate 0.0005   Epoch: 18   Global Step: 771420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:03,185-Speed 2602.14 samples/sec   Loss 1.3273   LearningRate 0.0005   Epoch: 18   Global Step: 771430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:27:07,082-Speed 2628.20 samples/sec   Loss 1.3669   LearningRate 0.0005   Epoch: 18   Global Step: 771440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:27:11,146-Speed 2520.57 samples/sec   Loss 1.3496   LearningRate 0.0005   Epoch: 18   Global Step: 771450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:27:15,107-Speed 2585.16 samples/sec   Loss 1.3354   LearningRate 0.0005   Epoch: 18   Global Step: 771460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-16 10:27:18,980-Speed 2645.32 samples/sec   Loss 1.2661   LearningRate 0.0005   Epoch: 18   Global Step: 771470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:22,872-Speed 2631.95 samples/sec   Loss 1.3290   LearningRate 0.0005   Epoch: 18   Global Step: 771480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:26,765-Speed 2631.26 samples/sec   Loss 1.2808   LearningRate 0.0005   Epoch: 18   Global Step: 771490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:30,667-Speed 2624.76 samples/sec   Loss 1.3327   LearningRate 0.0005   Epoch: 18   Global Step: 771500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:34,559-Speed 2632.18 samples/sec   Loss 1.3105   LearningRate 0.0005   Epoch: 18   Global Step: 771510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:38,452-Speed 2630.59 samples/sec   Loss 1.3048   LearningRate 0.0005   Epoch: 18   Global Step: 771520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:42,347-Speed 2629.92 samples/sec   Loss 1.2971   LearningRate 0.0005   Epoch: 18   Global Step: 771530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:46,246-Speed 2626.86 samples/sec   Loss 1.3467   LearningRate 0.0005   Epoch: 18   Global Step: 771540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:50,137-Speed 2631.64 samples/sec   Loss 1.3606   LearningRate 0.0005   Epoch: 18   Global Step: 771550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:54,030-Speed 2631.58 samples/sec   Loss 1.3154   LearningRate 0.0005   Epoch: 18   Global Step: 771560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:27:57,896-Speed 2649.57 samples/sec   Loss 1.3189   LearningRate 0.0005   Epoch: 18   Global Step: 771570   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:01,792-Speed 2628.64 samples/sec   Loss 1.2790   LearningRate 0.0005   Epoch: 18   Global Step: 771580   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:05,704-Speed 2618.21 samples/sec   Loss 1.3273   LearningRate 0.0005   Epoch: 18   Global Step: 771590   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:09,596-Speed 2631.29 samples/sec   Loss 1.3242   LearningRate 0.0005   Epoch: 18   Global Step: 771600   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:13,495-Speed 2627.16 samples/sec   Loss 1.3273   LearningRate 0.0005   Epoch: 18   Global Step: 771610   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:17,400-Speed 2623.24 samples/sec   Loss 1.3573   LearningRate 0.0005   Epoch: 18   Global Step: 771620   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:21,304-Speed 2623.40 samples/sec   Loss 1.3497   LearningRate 0.0005   Epoch: 18   Global Step: 771630   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:25,227-Speed 2610.56 samples/sec   Loss 1.2868   LearningRate 0.0005   Epoch: 18   Global Step: 771640   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:29,126-Speed 2627.32 samples/sec   Loss 1.3387   LearningRate 0.0005   Epoch: 18   Global Step: 771650   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:33,019-Speed 2631.22 samples/sec   Loss 1.3257   LearningRate 0.0005   Epoch: 18   Global Step: 771660   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-16 10:28:36,916-Speed 2627.66 samples/sec   Loss 1.3139   LearningRate 0.0005   Epoch: 18   Global Step: 771670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:28:40,810-Speed 2630.04 samples/sec   Loss 1.2910   LearningRate 0.0005   Epoch: 18   Global Step: 771680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:28:44,705-Speed 2630.16 samples/sec   Loss 1.3354   LearningRate 0.0005   Epoch: 18   Global Step: 771690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:28:48,648-Speed 2597.67 samples/sec   Loss 1.3699   LearningRate 0.0005   Epoch: 18   Global Step: 771700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:28:52,547-Speed 2626.72 samples/sec   Loss 1.3610   LearningRate 0.0005   Epoch: 18   Global Step: 771710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:28:56,442-Speed 2629.66 samples/sec   Loss 1.3461   LearningRate 0.0005   Epoch: 18   Global Step: 771720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:29:00,335-Speed 2630.89 samples/sec   Loss 1.3258   LearningRate 0.0005   Epoch: 18   Global Step: 771730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:29:04,234-Speed 2626.99 samples/sec   Loss 1.3467   LearningRate 0.0005   Epoch: 18   Global Step: 771740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:29:08,133-Speed 2627.34 samples/sec   Loss 1.3028   LearningRate 0.0005   Epoch: 18   Global Step: 771750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-16 10:29:12,030-Speed 2628.02 samples/sec   Loss 1.2868   LearningRate 0.0005   Epoch: 18   Global Step: 771760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:15,931-Speed 2625.50 samples/sec   Loss 1.3636   LearningRate 0.0005   Epoch: 18   Global Step: 771770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:29:19,837-Speed 2622.74 samples/sec   Loss 1.3197   LearningRate 0.0005   Epoch: 18   Global Step: 771780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:29:23,711-Speed 2643.35 samples/sec   Loss 1.3797   LearningRate 0.0005   Epoch: 18   Global Step: 771790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:27,608-Speed 2628.91 samples/sec   Loss 1.3553   LearningRate 0.0005   Epoch: 18   Global Step: 771800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:31,502-Speed 2630.38 samples/sec   Loss 1.3752   LearningRate 0.0005   Epoch: 18   Global Step: 771810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:35,395-Speed 2630.97 samples/sec   Loss 1.3273   LearningRate 0.0005   Epoch: 18   Global Step: 771820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:39,298-Speed 2624.63 samples/sec   Loss 1.3554   LearningRate 0.0005   Epoch: 18   Global Step: 771830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:43,200-Speed 2624.83 samples/sec   Loss 1.3507   LearningRate 0.0005   Epoch: 18   Global Step: 771840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:47,091-Speed 2631.96 samples/sec   Loss 1.3359   LearningRate 0.0005   Epoch: 18   Global Step: 771850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:50,981-Speed 2633.73 samples/sec   Loss 1.3234   LearningRate 0.0005   Epoch: 18   Global Step: 771860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:54,874-Speed 2630.70 samples/sec   Loss 1.2873   LearningRate 0.0005   Epoch: 18   Global Step: 771870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:29:58,800-Speed 2609.19 samples/sec   Loss 1.3059   LearningRate 0.0005   Epoch: 18   Global Step: 771880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:02,694-Speed 2629.69 samples/sec   Loss 1.3293   LearningRate 0.0005   Epoch: 18   Global Step: 771890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:30:06,587-Speed 2631.13 samples/sec   Loss 1.3546   LearningRate 0.0005   Epoch: 18   Global Step: 771900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:30:10,457-Speed 2646.87 samples/sec   Loss 1.3509   LearningRate 0.0005   Epoch: 18   Global Step: 771910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:14,359-Speed 2625.10 samples/sec   Loss 1.2956   LearningRate 0.0005   Epoch: 18   Global Step: 771920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:18,259-Speed 2626.17 samples/sec   Loss 1.3674   LearningRate 0.0005   Epoch: 18   Global Step: 771930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:22,152-Speed 2631.10 samples/sec   Loss 1.3413   LearningRate 0.0005   Epoch: 18   Global Step: 771940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:26,044-Speed 2631.12 samples/sec   Loss 1.3540   LearningRate 0.0005   Epoch: 18   Global Step: 771950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:29,940-Speed 2629.83 samples/sec   Loss 1.3066   LearningRate 0.0005   Epoch: 18   Global Step: 771960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:33,832-Speed 2631.41 samples/sec   Loss 1.3297   LearningRate 0.0005   Epoch: 18   Global Step: 771970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:37,728-Speed 2629.17 samples/sec   Loss 1.3660   LearningRate 0.0005   Epoch: 18   Global Step: 771980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:41,619-Speed 2632.28 samples/sec   Loss 1.3936   LearningRate 0.0005   Epoch: 18   Global Step: 771990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:45,514-Speed 2629.21 samples/sec   Loss 1.3245   LearningRate 0.0005   Epoch: 18   Global Step: 772000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:49,408-Speed 2631.10 samples/sec   Loss 1.3376   LearningRate 0.0005   Epoch: 18   Global Step: 772010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:30:53,287-Speed 2640.51 samples/sec   Loss 1.3573   LearningRate 0.0005   Epoch: 18   Global Step: 772020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:30:57,199-Speed 2618.53 samples/sec   Loss 1.3340   LearningRate 0.0005   Epoch: 18   Global Step: 772030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:01,118-Speed 2613.54 samples/sec   Loss 1.3481   LearningRate 0.0005   Epoch: 18   Global Step: 772040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:05,030-Speed 2618.10 samples/sec   Loss 1.3519   LearningRate 0.0005   Epoch: 18   Global Step: 772050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:08,931-Speed 2625.66 samples/sec   Loss 1.3428   LearningRate 0.0005   Epoch: 18   Global Step: 772060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:12,821-Speed 2633.48 samples/sec   Loss 1.2991   LearningRate 0.0005   Epoch: 18   Global Step: 772070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:16,714-Speed 2631.27 samples/sec   Loss 1.3156   LearningRate 0.0005   Epoch: 18   Global Step: 772080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:20,606-Speed 2631.64 samples/sec   Loss 1.3216   LearningRate 0.0005   Epoch: 18   Global Step: 772090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:24,505-Speed 2627.35 samples/sec   Loss 1.3318   LearningRate 0.0005   Epoch: 18   Global Step: 772100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:28,405-Speed 2625.87 samples/sec   Loss 1.3041   LearningRate 0.0005   Epoch: 18   Global Step: 772110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:32,300-Speed 2630.40 samples/sec   Loss 1.3411   LearningRate 0.0005   Epoch: 18   Global Step: 772120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:31:36,166-Speed 2649.00 samples/sec   Loss 1.3332   LearningRate 0.0005   Epoch: 18   Global Step: 772130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:40,058-Speed 2631.19 samples/sec   Loss 1.3688   LearningRate 0.0005   Epoch: 18   Global Step: 772140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:43,953-Speed 2630.29 samples/sec   Loss 1.3447   LearningRate 0.0005   Epoch: 18   Global Step: 772150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:47,854-Speed 2625.13 samples/sec   Loss 1.3680   LearningRate 0.0005   Epoch: 18   Global Step: 772160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:51,749-Speed 2630.68 samples/sec   Loss 1.3657   LearningRate 0.0005   Epoch: 18   Global Step: 772170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:55,664-Speed 2615.94 samples/sec   Loss 1.3308   LearningRate 0.0005   Epoch: 18   Global Step: 772180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:31:59,567-Speed 2624.56 samples/sec   Loss 1.3467   LearningRate 0.0005   Epoch: 18   Global Step: 772190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:03,492-Speed 2609.37 samples/sec   Loss 1.3413   LearningRate 0.0005   Epoch: 18   Global Step: 772200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:07,389-Speed 2628.76 samples/sec   Loss 1.2881   LearningRate 0.0005   Epoch: 18   Global Step: 772210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:11,289-Speed 2626.42 samples/sec   Loss 1.3196   LearningRate 0.0005   Epoch: 18   Global Step: 772220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:15,183-Speed 2630.29 samples/sec   Loss 1.3354   LearningRate 0.0005   Epoch: 18   Global Step: 772230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:32:19,078-Speed 2629.95 samples/sec   Loss 1.3008   LearningRate 0.0005   Epoch: 18   Global Step: 772240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:32:22,969-Speed 2632.11 samples/sec   Loss 1.3072   LearningRate 0.0005   Epoch: 18   Global Step: 772250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:32:26,862-Speed 2631.78 samples/sec   Loss 1.3114   LearningRate 0.0005   Epoch: 18   Global Step: 772260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:32:30,757-Speed 2629.44 samples/sec   Loss 1.3028   LearningRate 0.0005   Epoch: 18   Global Step: 772270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:32:34,653-Speed 2629.33 samples/sec   Loss 1.2988   LearningRate 0.0005   Epoch: 18   Global Step: 772280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:32:38,521-Speed 2647.90 samples/sec   Loss 1.3465   LearningRate 0.0005   Epoch: 18   Global Step: 772290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:42,414-Speed 2630.75 samples/sec   Loss 1.3792   LearningRate 0.0005   Epoch: 18   Global Step: 772300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:46,309-Speed 2629.11 samples/sec   Loss 1.3323   LearningRate 0.0005   Epoch: 18   Global Step: 772310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:50,213-Speed 2624.10 samples/sec   Loss 1.3950   LearningRate 0.0005   Epoch: 18   Global Step: 772320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:54,107-Speed 2631.01 samples/sec   Loss 1.3713   LearningRate 0.0005   Epoch: 18   Global Step: 772330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:32:58,021-Speed 2616.95 samples/sec   Loss 1.3516   LearningRate 0.0005   Epoch: 18   Global Step: 772340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:01,911-Speed 2633.08 samples/sec   Loss 1.3111   LearningRate 0.0005   Epoch: 18   Global Step: 772350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:05,812-Speed 2625.81 samples/sec   Loss 1.3120   LearningRate 0.0005   Epoch: 18   Global Step: 772360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:09,708-Speed 2628.63 samples/sec   Loss 1.3356   LearningRate 0.0005   Epoch: 18   Global Step: 772370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:13,607-Speed 2626.91 samples/sec   Loss 1.3015   LearningRate 0.0005   Epoch: 18   Global Step: 772380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:17,505-Speed 2628.03 samples/sec   Loss 1.3899   LearningRate 0.0005   Epoch: 18   Global Step: 772390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:33:21,402-Speed 2627.86 samples/sec   Loss 1.3425   LearningRate 0.0005   Epoch: 18   Global Step: 772400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:33:25,300-Speed 2628.19 samples/sec   Loss 1.3621   LearningRate 0.0005   Epoch: 18   Global Step: 772410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:33:29,169-Speed 2647.36 samples/sec   Loss 1.3045   LearningRate 0.0005   Epoch: 18   Global Step: 772420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:33,063-Speed 2630.24 samples/sec   Loss 1.3786   LearningRate 0.0005   Epoch: 18   Global Step: 772430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:36,975-Speed 2618.42 samples/sec   Loss 1.2925   LearningRate 0.0005   Epoch: 18   Global Step: 772440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:40,879-Speed 2623.48 samples/sec   Loss 1.3524   LearningRate 0.0005   Epoch: 18   Global Step: 772450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:44,781-Speed 2624.58 samples/sec   Loss 1.2915   LearningRate 0.0005   Epoch: 18   Global Step: 772460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:48,673-Speed 2632.30 samples/sec   Loss 1.3091   LearningRate 0.0005   Epoch: 18   Global Step: 772470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:52,565-Speed 2631.56 samples/sec   Loss 1.2988   LearningRate 0.0005   Epoch: 18   Global Step: 772480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:33:56,489-Speed 2610.41 samples/sec   Loss 1.2763   LearningRate 0.0005   Epoch: 18   Global Step: 772490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:34:00,400-Speed 2618.98 samples/sec   Loss 1.3898   LearningRate 0.0005   Epoch: 18   Global Step: 772500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:34:04,295-Speed 2630.15 samples/sec   Loss 1.3342   LearningRate 0.0005   Epoch: 18   Global Step: 772510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:34:08,205-Speed 2619.03 samples/sec   Loss 1.2953   LearningRate 0.0005   Epoch: 18   Global Step: 772520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:34:12,104-Speed 2627.09 samples/sec   Loss 1.3696   LearningRate 0.0005   Epoch: 18   Global Step: 772530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:34:15,982-Speed 2640.71 samples/sec   Loss 1.2781   LearningRate 0.0005   Epoch: 18   Global Step: 772540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:34:19,856-Speed 2644.32 samples/sec   Loss 1.3534   LearningRate 0.0005   Epoch: 18   Global Step: 772550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:23,762-Speed 2622.65 samples/sec   Loss 1.3220   LearningRate 0.0005   Epoch: 18   Global Step: 772560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:27,666-Speed 2623.09 samples/sec   Loss 1.2976   LearningRate 0.0005   Epoch: 18   Global Step: 772570   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:31,563-Speed 2628.72 samples/sec   Loss 1.2951   LearningRate 0.0005   Epoch: 18   Global Step: 772580   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:35,512-Speed 2593.68 samples/sec   Loss 1.4136   LearningRate 0.0005   Epoch: 18   Global Step: 772590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:39,406-Speed 2630.55 samples/sec   Loss 1.2723   LearningRate 0.0005   Epoch: 18   Global Step: 772600   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:43,315-Speed 2620.02 samples/sec   Loss 1.2948   LearningRate 0.0005   Epoch: 18   Global Step: 772610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:47,227-Speed 2618.19 samples/sec   Loss 1.3573   LearningRate 0.0005   Epoch: 18   Global Step: 772620   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:51,160-Speed 2604.07 samples/sec   Loss 1.3051   LearningRate 0.0005   Epoch: 18   Global Step: 772630   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:55,065-Speed 2623.60 samples/sec   Loss 1.3211   LearningRate 0.0005   Epoch: 18   Global Step: 772640   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:34:58,959-Speed 2629.85 samples/sec   Loss 1.3137   LearningRate 0.0005   Epoch: 18   Global Step: 772650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:02,854-Speed 2629.85 samples/sec   Loss 1.3248   LearningRate 0.0005   Epoch: 18   Global Step: 772660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:06,794-Speed 2599.41 samples/sec   Loss 1.2773   LearningRate 0.0005   Epoch: 18   Global Step: 772670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:10,703-Speed 2620.45 samples/sec   Loss 1.3427   LearningRate 0.0005   Epoch: 18   Global Step: 772680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:14,609-Speed 2627.10 samples/sec   Loss 1.3046   LearningRate 0.0005   Epoch: 18   Global Step: 772690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:18,502-Speed 2631.10 samples/sec   Loss 1.2776   LearningRate 0.0005   Epoch: 18   Global Step: 772700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:22,406-Speed 2623.65 samples/sec   Loss 1.2674   LearningRate 0.0005   Epoch: 18   Global Step: 772710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:26,333-Speed 2608.76 samples/sec   Loss 1.2929   LearningRate 0.0005   Epoch: 18   Global Step: 772720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:35:30,251-Speed 2614.38 samples/sec   Loss 1.3302   LearningRate 0.0005   Epoch: 18   Global Step: 772730   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:35:34,149-Speed 2627.13 samples/sec   Loss 1.3152   LearningRate 0.0005   Epoch: 18   Global Step: 772740   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:35:38,049-Speed 2626.37 samples/sec   Loss 1.3227   LearningRate 0.0005   Epoch: 18   Global Step: 772750   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:35:41,980-Speed 2605.67 samples/sec   Loss 1.3458   LearningRate 0.0005   Epoch: 18   Global Step: 772760   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:35:45,878-Speed 2627.36 samples/sec   Loss 1.3561   LearningRate 0.0005   Epoch: 18   Global Step: 772770   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:35:49,773-Speed 2629.72 samples/sec   Loss 1.2890   LearningRate 0.0005   Epoch: 18   Global Step: 772780   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:35:53,666-Speed 2631.38 samples/sec   Loss 1.3764   LearningRate 0.0005   Epoch: 18   Global Step: 772790   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:35:57,555-Speed 2633.58 samples/sec   Loss 1.3494   LearningRate 0.0005   Epoch: 18   Global Step: 772800   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:36:01,458-Speed 2624.37 samples/sec   Loss 1.2833   LearningRate 0.0005   Epoch: 18   Global Step: 772810   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:36:05,350-Speed 2631.74 samples/sec   Loss 1.3254   LearningRate 0.0005   Epoch: 18   Global Step: 772820   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:36:09,243-Speed 2630.88 samples/sec   Loss 1.2686   LearningRate 0.0005   Epoch: 18   Global Step: 772830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:13,136-Speed 2631.03 samples/sec   Loss 1.3205   LearningRate 0.0005   Epoch: 18   Global Step: 772840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:17,028-Speed 2631.99 samples/sec   Loss 1.3010   LearningRate 0.0005   Epoch: 18   Global Step: 772850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:20,918-Speed 2633.36 samples/sec   Loss 1.3273   LearningRate 0.0005   Epoch: 18   Global Step: 772860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:24,835-Speed 2614.35 samples/sec   Loss 1.3208   LearningRate 0.0005   Epoch: 18   Global Step: 772870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:28,758-Speed 2611.80 samples/sec   Loss 1.3126   LearningRate 0.0005   Epoch: 18   Global Step: 772880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:32,659-Speed 2625.08 samples/sec   Loss 1.3281   LearningRate 0.0005   Epoch: 18   Global Step: 772890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:36,558-Speed 2626.90 samples/sec   Loss 1.3487   LearningRate 0.0005   Epoch: 18   Global Step: 772900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:40,459-Speed 2625.40 samples/sec   Loss 1.3087   LearningRate 0.0005   Epoch: 18   Global Step: 772910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:44,369-Speed 2620.19 samples/sec   Loss 1.3310   LearningRate 0.0005   Epoch: 18   Global Step: 772920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:36:48,264-Speed 2629.26 samples/sec   Loss 1.3207   LearningRate 0.0005   Epoch: 18   Global Step: 772930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:36:52,162-Speed 2628.30 samples/sec   Loss 1.3280   LearningRate 0.0005   Epoch: 18   Global Step: 772940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:36:56,058-Speed 2628.80 samples/sec   Loss 1.3449   LearningRate 0.0005   Epoch: 18   Global Step: 772950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:36:59,955-Speed 2628.80 samples/sec   Loss 1.3083   LearningRate 0.0005   Epoch: 18   Global Step: 772960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:37:03,835-Speed 2640.34 samples/sec   Loss 1.3297   LearningRate 0.0005   Epoch: 18   Global Step: 772970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:07,739-Speed 2623.35 samples/sec   Loss 1.3561   LearningRate 0.0005   Epoch: 18   Global Step: 772980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:11,637-Speed 2627.69 samples/sec   Loss 1.2750   LearningRate 0.0005   Epoch: 18   Global Step: 772990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:15,532-Speed 2630.06 samples/sec   Loss 1.3376   LearningRate 0.0005   Epoch: 18   Global Step: 773000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:19,424-Speed 2631.46 samples/sec   Loss 1.3237   LearningRate 0.0005   Epoch: 18   Global Step: 773010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:23,343-Speed 2613.50 samples/sec   Loss 1.2961   LearningRate 0.0005   Epoch: 18   Global Step: 773020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:27,240-Speed 2628.62 samples/sec   Loss 1.3280   LearningRate 0.0005   Epoch: 18   Global Step: 773030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:31,137-Speed 2628.19 samples/sec   Loss 1.3053   LearningRate 0.0005   Epoch: 18   Global Step: 773040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:35,045-Speed 2620.76 samples/sec   Loss 1.2929   LearningRate 0.0005   Epoch: 18   Global Step: 773050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:38,935-Speed 2633.27 samples/sec   Loss 1.3546   LearningRate 0.0005   Epoch: 18   Global Step: 773060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:42,828-Speed 2631.10 samples/sec   Loss 1.3363   LearningRate 0.0005   Epoch: 18   Global Step: 773070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:37:46,724-Speed 2629.15 samples/sec   Loss 1.3106   LearningRate 0.0005   Epoch: 18   Global Step: 773080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:37:50,597-Speed 2645.19 samples/sec   Loss 1.3595   LearningRate 0.0005   Epoch: 18   Global Step: 773090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:54,495-Speed 2627.31 samples/sec   Loss 1.3187   LearningRate 0.0005   Epoch: 18   Global Step: 773100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:37:58,390-Speed 2629.80 samples/sec   Loss 1.3436   LearningRate 0.0005   Epoch: 18   Global Step: 773110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:38:02,285-Speed 2629.56 samples/sec   Loss 1.3162   LearningRate 0.0005   Epoch: 18   Global Step: 773120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:38:06,183-Speed 2627.80 samples/sec   Loss 1.3220   LearningRate 0.0005   Epoch: 18   Global Step: 773130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:38:10,095-Speed 2617.83 samples/sec   Loss 1.3144   LearningRate 0.0005   Epoch: 18   Global Step: 773140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:38:13,989-Speed 2631.25 samples/sec   Loss 1.3417   LearningRate 0.0005   Epoch: 18   Global Step: 773150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:38:17,884-Speed 2629.03 samples/sec   Loss 1.3209   LearningRate 0.0005   Epoch: 18   Global Step: 773160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:38:21,759-Speed 2643.19 samples/sec   Loss 1.3279   LearningRate 0.0005   Epoch: 18   Global Step: 773170   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:25,682-Speed 2611.64 samples/sec   Loss 1.3274   LearningRate 0.0005   Epoch: 18   Global Step: 773180   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:29,581-Speed 2626.70 samples/sec   Loss 1.3011   LearningRate 0.0005   Epoch: 18   Global Step: 773190   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:33,511-Speed 2606.17 samples/sec   Loss 1.2727   LearningRate 0.0005   Epoch: 18   Global Step: 773200   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:37,410-Speed 2627.35 samples/sec   Loss 1.2833   LearningRate 0.0005   Epoch: 18   Global Step: 773210   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:41,398-Speed 2568.00 samples/sec   Loss 1.3128   LearningRate 0.0005   Epoch: 18   Global Step: 773220   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:45,325-Speed 2608.75 samples/sec   Loss 1.3080   LearningRate 0.0005   Epoch: 18   Global Step: 773230   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:49,226-Speed 2625.77 samples/sec   Loss 1.2734   LearningRate 0.0005   Epoch: 18   Global Step: 773240   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:53,125-Speed 2626.56 samples/sec   Loss 1.4023   LearningRate 0.0005   Epoch: 18   Global Step: 773250   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:38:57,023-Speed 2627.87 samples/sec   Loss 1.3300   LearningRate 0.0005   Epoch: 18   Global Step: 773260   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:39:00,915-Speed 2631.54 samples/sec   Loss 1.3249   LearningRate 0.0005   Epoch: 18   Global Step: 773270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:04,894-Speed 2574.92 samples/sec   Loss 1.3586   LearningRate 0.0005   Epoch: 18   Global Step: 773280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:08,794-Speed 2625.92 samples/sec   Loss 1.3359   LearningRate 0.0005   Epoch: 18   Global Step: 773290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:12,687-Speed 2630.45 samples/sec   Loss 1.3256   LearningRate 0.0005   Epoch: 18   Global Step: 773300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:16,623-Speed 2602.00 samples/sec   Loss 1.3610   LearningRate 0.0005   Epoch: 18   Global Step: 773310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:20,516-Speed 2632.09 samples/sec   Loss 1.3101   LearningRate 0.0005   Epoch: 18   Global Step: 773320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:24,406-Speed 2633.00 samples/sec   Loss 1.3386   LearningRate 0.0005   Epoch: 18   Global Step: 773330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:28,304-Speed 2627.70 samples/sec   Loss 1.3137   LearningRate 0.0005   Epoch: 18   Global Step: 773340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:32,231-Speed 2608.42 samples/sec   Loss 1.3145   LearningRate 0.0005   Epoch: 18   Global Step: 773350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:36,126-Speed 2629.43 samples/sec   Loss 1.3220   LearningRate 0.0005   Epoch: 18   Global Step: 773360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:40,034-Speed 2621.03 samples/sec   Loss 1.3338   LearningRate 0.0005   Epoch: 18   Global Step: 773370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:39:43,941-Speed 2621.35 samples/sec   Loss 1.2615   LearningRate 0.0005   Epoch: 18   Global Step: 773380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:39:47,812-Speed 2646.12 samples/sec   Loss 1.3546   LearningRate 0.0005   Epoch: 18   Global Step: 773390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:51,710-Speed 2628.08 samples/sec   Loss 1.3443   LearningRate 0.0005   Epoch: 18   Global Step: 773400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:39:55,587-Speed 2641.95 samples/sec   Loss 1.3320   LearningRate 0.0005   Epoch: 18   Global Step: 773410   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:39:59,484-Speed 2628.45 samples/sec   Loss 1.3609   LearningRate 0.0005   Epoch: 18   Global Step: 773420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:03,393-Speed 2620.54 samples/sec   Loss 1.3840   LearningRate 0.0005   Epoch: 18   Global Step: 773430   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:07,293-Speed 2626.39 samples/sec   Loss 1.3332   LearningRate 0.0005   Epoch: 18   Global Step: 773440   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:11,200-Speed 2621.12 samples/sec   Loss 1.3557   LearningRate 0.0005   Epoch: 18   Global Step: 773450   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:15,104-Speed 2623.57 samples/sec   Loss 1.3077   LearningRate 0.0005   Epoch: 18   Global Step: 773460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:19,007-Speed 2624.84 samples/sec   Loss 1.3431   LearningRate 0.0005   Epoch: 18   Global Step: 773470   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:22,912-Speed 2622.76 samples/sec   Loss 1.3622   LearningRate 0.0005   Epoch: 18   Global Step: 773480   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:26,829-Speed 2615.02 samples/sec   Loss 1.3058   LearningRate 0.0005   Epoch: 18   Global Step: 773490   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:30,728-Speed 2627.30 samples/sec   Loss 1.3359   LearningRate 0.0005   Epoch: 18   Global Step: 773500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:34,632-Speed 2628.78 samples/sec   Loss 1.3204   LearningRate 0.0005   Epoch: 18   Global Step: 773510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:40:38,512-Speed 2639.88 samples/sec   Loss 1.2985   LearningRate 0.0005   Epoch: 18   Global Step: 773520   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:42,408-Speed 2628.90 samples/sec   Loss 1.2869   LearningRate 0.0005   Epoch: 18   Global Step: 773530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:46,302-Speed 2630.68 samples/sec   Loss 1.3106   LearningRate 0.0005   Epoch: 18   Global Step: 773540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:50,202-Speed 2626.12 samples/sec   Loss 1.3351   LearningRate 0.0005   Epoch: 18   Global Step: 773550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:54,095-Speed 2632.02 samples/sec   Loss 1.3178   LearningRate 0.0005   Epoch: 18   Global Step: 773560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:40:57,990-Speed 2629.50 samples/sec   Loss 1.3210   LearningRate 0.0005   Epoch: 18   Global Step: 773570   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:01,910-Speed 2612.42 samples/sec   Loss 1.3412   LearningRate 0.0005   Epoch: 18   Global Step: 773580   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:05,849-Speed 2600.12 samples/sec   Loss 1.2797   LearningRate 0.0005   Epoch: 18   Global Step: 773590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:09,752-Speed 2625.14 samples/sec   Loss 1.2528   LearningRate 0.0005   Epoch: 18   Global Step: 773600   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:13,647-Speed 2629.77 samples/sec   Loss 1.2907   LearningRate 0.0005   Epoch: 18   Global Step: 773610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:17,547-Speed 2625.89 samples/sec   Loss 1.3554   LearningRate 0.0005   Epoch: 18   Global Step: 773620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:41:21,443-Speed 2628.81 samples/sec   Loss 1.3359   LearningRate 0.0005   Epoch: 18   Global Step: 773630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:41:25,352-Speed 2620.59 samples/sec   Loss 1.2986   LearningRate 0.0005   Epoch: 18   Global Step: 773640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:41:29,246-Speed 2630.45 samples/sec   Loss 1.3403   LearningRate 0.0005   Epoch: 18   Global Step: 773650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:41:33,116-Speed 2646.77 samples/sec   Loss 1.3233   LearningRate 0.0005   Epoch: 18   Global Step: 773660   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:37,011-Speed 2629.23 samples/sec   Loss 1.3396   LearningRate 0.0005   Epoch: 18   Global Step: 773670   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:40,904-Speed 2631.41 samples/sec   Loss 1.3331   LearningRate 0.0005   Epoch: 18   Global Step: 773680   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:44,797-Speed 2631.55 samples/sec   Loss 1.3392   LearningRate 0.0005   Epoch: 18   Global Step: 773690   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:48,688-Speed 2631.70 samples/sec   Loss 1.3053   LearningRate 0.0005   Epoch: 18   Global Step: 773700   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:52,585-Speed 2628.60 samples/sec   Loss 1.2787   LearningRate 0.0005   Epoch: 18   Global Step: 773710   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:41:56,481-Speed 2629.24 samples/sec   Loss 1.2987   LearningRate 0.0005   Epoch: 18   Global Step: 773720   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:42:00,383-Speed 2625.40 samples/sec   Loss 1.3323   LearningRate 0.0005   Epoch: 18   Global Step: 773730   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:42:04,275-Speed 2631.68 samples/sec   Loss 1.3490   LearningRate 0.0005   Epoch: 18   Global Step: 773740   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:42:08,170-Speed 2629.36 samples/sec   Loss 1.2980   LearningRate 0.0005   Epoch: 18   Global Step: 773750   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:42:12,072-Speed 2624.62 samples/sec   Loss 1.3407   LearningRate 0.0005   Epoch: 18   Global Step: 773760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:15,975-Speed 2625.12 samples/sec   Loss 1.2969   LearningRate 0.0005   Epoch: 18   Global Step: 773770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:19,873-Speed 2627.93 samples/sec   Loss 1.3392   LearningRate 0.0005   Epoch: 18   Global Step: 773780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:23,770-Speed 2627.89 samples/sec   Loss 1.2668   LearningRate 0.0005   Epoch: 18   Global Step: 773790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:27,677-Speed 2622.07 samples/sec   Loss 1.3002   LearningRate 0.0005   Epoch: 18   Global Step: 773800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:31,575-Speed 2627.55 samples/sec   Loss 1.3146   LearningRate 0.0005   Epoch: 18   Global Step: 773810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:35,515-Speed 2599.41 samples/sec   Loss 1.3609   LearningRate 0.0005   Epoch: 18   Global Step: 773820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:39,427-Speed 2618.26 samples/sec   Loss 1.3088   LearningRate 0.0005   Epoch: 18   Global Step: 773830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:43,330-Speed 2624.75 samples/sec   Loss 1.3146   LearningRate 0.0005   Epoch: 18   Global Step: 773840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:47,234-Speed 2623.26 samples/sec   Loss 1.3182   LearningRate 0.0005   Epoch: 18   Global Step: 773850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:42:51,126-Speed 2632.02 samples/sec   Loss 1.3617   LearningRate 0.0005   Epoch: 18   Global Step: 773860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:42:55,020-Speed 2630.74 samples/sec   Loss 1.3249   LearningRate 0.0005   Epoch: 18   Global Step: 773870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:42:58,926-Speed 2622.15 samples/sec   Loss 1.3063   LearningRate 0.0005   Epoch: 18   Global Step: 773880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:43:02,828-Speed 2625.27 samples/sec   Loss 1.3045   LearningRate 0.0005   Epoch: 18   Global Step: 773890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:43:06,702-Speed 2643.99 samples/sec   Loss 1.3608   LearningRate 0.0005   Epoch: 18   Global Step: 773900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:10,600-Speed 2626.99 samples/sec   Loss 1.3063   LearningRate 0.0005   Epoch: 18   Global Step: 773910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:14,510-Speed 2620.09 samples/sec   Loss 1.2803   LearningRate 0.0005   Epoch: 18   Global Step: 773920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:18,419-Speed 2620.08 samples/sec   Loss 1.3546   LearningRate 0.0005   Epoch: 18   Global Step: 773930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:22,313-Speed 2630.02 samples/sec   Loss 1.3019   LearningRate 0.0004   Epoch: 18   Global Step: 773940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:26,210-Speed 2628.94 samples/sec   Loss 1.2578   LearningRate 0.0004   Epoch: 18   Global Step: 773950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:30,108-Speed 2627.93 samples/sec   Loss 1.3468   LearningRate 0.0004   Epoch: 18   Global Step: 773960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:34,030-Speed 2611.64 samples/sec   Loss 1.2640   LearningRate 0.0004   Epoch: 18   Global Step: 773970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:37,919-Speed 2633.40 samples/sec   Loss 1.3500   LearningRate 0.0004   Epoch: 18   Global Step: 773980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:41,833-Speed 2616.85 samples/sec   Loss 1.3879   LearningRate 0.0004   Epoch: 18   Global Step: 773990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:45,727-Speed 2630.73 samples/sec   Loss 1.2999   LearningRate 0.0004   Epoch: 18   Global Step: 774000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:43:49,598-Speed 2646.77 samples/sec   Loss 1.3264   LearningRate 0.0004   Epoch: 18   Global Step: 774010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:53,501-Speed 2624.42 samples/sec   Loss 1.3123   LearningRate 0.0004   Epoch: 18   Global Step: 774020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:43:57,391-Speed 2633.20 samples/sec   Loss 1.3453   LearningRate 0.0004   Epoch: 18   Global Step: 774030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:01,280-Speed 2633.20 samples/sec   Loss 1.3340   LearningRate 0.0004   Epoch: 18   Global Step: 774040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:05,172-Speed 2632.29 samples/sec   Loss 1.3395   LearningRate 0.0004   Epoch: 18   Global Step: 774050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:09,078-Speed 2621.61 samples/sec   Loss 1.3313   LearningRate 0.0004   Epoch: 18   Global Step: 774060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:12,997-Speed 2614.17 samples/sec   Loss 1.3048   LearningRate 0.0004   Epoch: 18   Global Step: 774070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:16,898-Speed 2625.33 samples/sec   Loss 1.3089   LearningRate 0.0004   Epoch: 18   Global Step: 774080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:20,815-Speed 2615.29 samples/sec   Loss 1.3186   LearningRate 0.0004   Epoch: 18   Global Step: 774090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:24,766-Speed 2593.02 samples/sec   Loss 1.3548   LearningRate 0.0004   Epoch: 18   Global Step: 774100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:44:28,664-Speed 2627.85 samples/sec   Loss 1.2830   LearningRate 0.0004   Epoch: 18   Global Step: 774110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:32,573-Speed 2620.73 samples/sec   Loss 1.2513   LearningRate 0.0004   Epoch: 18   Global Step: 774120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:36,467-Speed 2630.22 samples/sec   Loss 1.3024   LearningRate 0.0004   Epoch: 18   Global Step: 774130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:40,366-Speed 2626.84 samples/sec   Loss 1.2959   LearningRate 0.0004   Epoch: 18   Global Step: 774140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:44,257-Speed 2631.84 samples/sec   Loss 1.3333   LearningRate 0.0004   Epoch: 18   Global Step: 774150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:48,181-Speed 2611.10 samples/sec   Loss 1.3194   LearningRate 0.0004   Epoch: 18   Global Step: 774160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:52,073-Speed 2631.87 samples/sec   Loss 1.2383   LearningRate 0.0004   Epoch: 18   Global Step: 774170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:55,983-Speed 2619.58 samples/sec   Loss 1.3481   LearningRate 0.0004   Epoch: 18   Global Step: 774180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:44:59,875-Speed 2631.31 samples/sec   Loss 1.2782   LearningRate 0.0004   Epoch: 18   Global Step: 774190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:45:03,768-Speed 2631.84 samples/sec   Loss 1.2891   LearningRate 0.0004   Epoch: 18   Global Step: 774200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:45:07,645-Speed 2641.64 samples/sec   Loss 1.3417   LearningRate 0.0004   Epoch: 18   Global Step: 774210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:45:11,538-Speed 2630.79 samples/sec   Loss 1.3250   LearningRate 0.0004   Epoch: 18   Global Step: 774220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:45:15,429-Speed 2632.14 samples/sec   Loss 1.2805   LearningRate 0.0004   Epoch: 18   Global Step: 774230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:45:19,304-Speed 2644.02 samples/sec   Loss 1.2945   LearningRate 0.0004   Epoch: 18   Global Step: 774240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:23,204-Speed 2626.57 samples/sec   Loss 1.3136   LearningRate 0.0004   Epoch: 18   Global Step: 774250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:27,100-Speed 2629.68 samples/sec   Loss 1.3472   LearningRate 0.0004   Epoch: 18   Global Step: 774260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:30,996-Speed 2628.26 samples/sec   Loss 1.3601   LearningRate 0.0004   Epoch: 18   Global Step: 774270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:34,892-Speed 2629.45 samples/sec   Loss 1.3142   LearningRate 0.0004   Epoch: 18   Global Step: 774280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:38,787-Speed 2629.64 samples/sec   Loss 1.2643   LearningRate 0.0004   Epoch: 18   Global Step: 774290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:42,692-Speed 2622.45 samples/sec   Loss 1.3237   LearningRate 0.0004   Epoch: 18   Global Step: 774300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:46,585-Speed 2630.69 samples/sec   Loss 1.2972   LearningRate 0.0004   Epoch: 18   Global Step: 774310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:50,478-Speed 2631.39 samples/sec   Loss 1.2940   LearningRate 0.0004   Epoch: 18   Global Step: 774320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:54,380-Speed 2625.40 samples/sec   Loss 1.3477   LearningRate 0.0004   Epoch: 18   Global Step: 774330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:45:58,275-Speed 2630.09 samples/sec   Loss 1.3157   LearningRate 0.0004   Epoch: 18   Global Step: 774340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:46:02,169-Speed 2630.29 samples/sec   Loss 1.2937   LearningRate 0.0004   Epoch: 18   Global Step: 774350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:46:06,076-Speed 2621.84 samples/sec   Loss 1.3500   LearningRate 0.0004   Epoch: 18   Global Step: 774360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:46:09,973-Speed 2628.02 samples/sec   Loss 1.2968   LearningRate 0.0004   Epoch: 18   Global Step: 774370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:46:13,872-Speed 2626.38 samples/sec   Loss 1.3403   LearningRate 0.0004   Epoch: 18   Global Step: 774380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:46:17,759-Speed 2635.31 samples/sec   Loss 1.3059   LearningRate 0.0004   Epoch: 18   Global Step: 774390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:46:21,661-Speed 2625.95 samples/sec   Loss 1.3211   LearningRate 0.0004   Epoch: 18   Global Step: 774400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:46:25,556-Speed 2630.47 samples/sec   Loss 1.3193   LearningRate 0.0004   Epoch: 18   Global Step: 774410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:46:29,428-Speed 2645.28 samples/sec   Loss 1.3050   LearningRate 0.0004   Epoch: 18   Global Step: 774420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:46:33,367-Speed 2600.22 samples/sec   Loss 1.3894   LearningRate 0.0004   Epoch: 18   Global Step: 774430   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:46:37,259-Speed 2631.66 samples/sec   Loss 1.3514   LearningRate 0.0004   Epoch: 18   Global Step: 774440   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:46:41,171-Speed 2618.45 samples/sec   Loss 1.3397   LearningRate 0.0004   Epoch: 18   Global Step: 774450   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:46:45,065-Speed 2630.57 samples/sec   Loss 1.3066   LearningRate 0.0004   Epoch: 18   Global Step: 774460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:46:48,964-Speed 2626.77 samples/sec   Loss 1.2578   LearningRate 0.0004   Epoch: 18   Global Step: 774470   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:46:52,866-Speed 2625.60 samples/sec   Loss 1.2938   LearningRate 0.0004   Epoch: 18   Global Step: 774480   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:46:56,761-Speed 2629.75 samples/sec   Loss 1.3458   LearningRate 0.0004   Epoch: 18   Global Step: 774490   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:47:00,656-Speed 2629.05 samples/sec   Loss 1.3105   LearningRate 0.0004   Epoch: 18   Global Step: 774500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:47:04,550-Speed 2630.14 samples/sec   Loss 1.2656   LearningRate 0.0004   Epoch: 18   Global Step: 774510   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:47:08,456-Speed 2623.67 samples/sec   Loss 1.3120   LearningRate 0.0004   Epoch: 18   Global Step: 774520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:12,350-Speed 2630.03 samples/sec   Loss 1.3497   LearningRate 0.0004   Epoch: 18   Global Step: 774530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:16,245-Speed 2630.03 samples/sec   Loss 1.2713   LearningRate 0.0004   Epoch: 18   Global Step: 774540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:20,164-Speed 2613.69 samples/sec   Loss 1.3372   LearningRate 0.0004   Epoch: 18   Global Step: 774550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:24,115-Speed 2592.56 samples/sec   Loss 1.3584   LearningRate 0.0004   Epoch: 18   Global Step: 774560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:28,013-Speed 2627.78 samples/sec   Loss 1.3064   LearningRate 0.0004   Epoch: 18   Global Step: 774570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:31,933-Speed 2612.80 samples/sec   Loss 1.3254   LearningRate 0.0004   Epoch: 18   Global Step: 774580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:35,832-Speed 2626.61 samples/sec   Loss 1.2768   LearningRate 0.0004   Epoch: 18   Global Step: 774590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:39,731-Speed 2627.15 samples/sec   Loss 1.2794   LearningRate 0.0004   Epoch: 18   Global Step: 774600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:43,636-Speed 2623.33 samples/sec   Loss 1.3437   LearningRate 0.0004   Epoch: 18   Global Step: 774610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:47,506-Speed 2646.08 samples/sec   Loss 1.3266   LearningRate 0.0004   Epoch: 18   Global Step: 774620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:51,405-Speed 2628.17 samples/sec   Loss 1.2700   LearningRate 0.0004   Epoch: 18   Global Step: 774630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:55,297-Speed 2631.96 samples/sec   Loss 1.3062   LearningRate 0.0004   Epoch: 18   Global Step: 774640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:47:59,192-Speed 2629.79 samples/sec   Loss 1.3289   LearningRate 0.0004   Epoch: 18   Global Step: 774650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:03,105-Speed 2617.57 samples/sec   Loss 1.3379   LearningRate 0.0004   Epoch: 18   Global Step: 774660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:07,001-Speed 2628.85 samples/sec   Loss 1.3059   LearningRate 0.0004   Epoch: 18   Global Step: 774670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:10,893-Speed 2631.53 samples/sec   Loss 1.3381   LearningRate 0.0004   Epoch: 18   Global Step: 774680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:14,797-Speed 2624.15 samples/sec   Loss 1.3175   LearningRate 0.0004   Epoch: 18   Global Step: 774690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:18,696-Speed 2626.54 samples/sec   Loss 1.3153   LearningRate 0.0004   Epoch: 18   Global Step: 774700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:22,592-Speed 2629.88 samples/sec   Loss 1.2803   LearningRate 0.0004   Epoch: 18   Global Step: 774710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:26,486-Speed 2630.01 samples/sec   Loss 1.2697   LearningRate 0.0004   Epoch: 18   Global Step: 774720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:48:30,379-Speed 2632.10 samples/sec   Loss 1.3566   LearningRate 0.0004   Epoch: 18   Global Step: 774730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:48:34,250-Speed 2645.90 samples/sec   Loss 1.3281   LearningRate 0.0004   Epoch: 18   Global Step: 774740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:38,146-Speed 2628.57 samples/sec   Loss 1.2882   LearningRate 0.0004   Epoch: 18   Global Step: 774750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:42,046-Speed 2625.76 samples/sec   Loss 1.3007   LearningRate 0.0004   Epoch: 18   Global Step: 774760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:45,946-Speed 2626.94 samples/sec   Loss 1.2871   LearningRate 0.0004   Epoch: 18   Global Step: 774770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:49,843-Speed 2627.72 samples/sec   Loss 1.3225   LearningRate 0.0004   Epoch: 18   Global Step: 774780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:53,768-Speed 2610.08 samples/sec   Loss 1.3144   LearningRate 0.0004   Epoch: 18   Global Step: 774790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:48:57,664-Speed 2629.15 samples/sec   Loss 1.3249   LearningRate 0.0004   Epoch: 18   Global Step: 774800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:01,554-Speed 2633.50 samples/sec   Loss 1.3128   LearningRate 0.0004   Epoch: 18   Global Step: 774810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:05,462-Speed 2620.58 samples/sec   Loss 1.3093   LearningRate 0.0004   Epoch: 18   Global Step: 774820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:09,363-Speed 2625.06 samples/sec   Loss 1.2519   LearningRate 0.0004   Epoch: 18   Global Step: 774830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:13,258-Speed 2629.81 samples/sec   Loss 1.3342   LearningRate 0.0004   Epoch: 18   Global Step: 774840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:49:17,184-Speed 2608.48 samples/sec   Loss 1.3632   LearningRate 0.0004   Epoch: 18   Global Step: 774850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:49:21,109-Speed 2609.75 samples/sec   Loss 1.2893   LearningRate 0.0004   Epoch: 18   Global Step: 774860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:49:25,011-Speed 2624.95 samples/sec   Loss 1.3549   LearningRate 0.0004   Epoch: 18   Global Step: 774870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:49:28,906-Speed 2629.50 samples/sec   Loss 1.3127   LearningRate 0.0004   Epoch: 18   Global Step: 774880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:49:32,802-Speed 2629.45 samples/sec   Loss 1.3459   LearningRate 0.0004   Epoch: 18   Global Step: 774890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:49:36,696-Speed 2630.52 samples/sec   Loss 1.3141   LearningRate 0.0004   Epoch: 18   Global Step: 774900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:49:40,571-Speed 2643.05 samples/sec   Loss 1.2737   LearningRate 0.0004   Epoch: 18   Global Step: 774910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:44,467-Speed 2628.49 samples/sec   Loss 1.2803   LearningRate 0.0004   Epoch: 18   Global Step: 774920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:48,376-Speed 2620.65 samples/sec   Loss 1.2459   LearningRate 0.0004   Epoch: 18   Global Step: 774930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:52,278-Speed 2625.45 samples/sec   Loss 1.2958   LearningRate 0.0004   Epoch: 18   Global Step: 774940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:49:56,180-Speed 2624.73 samples/sec   Loss 1.3133   LearningRate 0.0004   Epoch: 18   Global Step: 774950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:50:00,081-Speed 2625.72 samples/sec   Loss 1.3133   LearningRate 0.0004   Epoch: 18   Global Step: 774960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:50:03,990-Speed 2620.05 samples/sec   Loss 1.2890   LearningRate 0.0004   Epoch: 18   Global Step: 774970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:50:07,893-Speed 2624.14 samples/sec   Loss 1.3306   LearningRate 0.0004   Epoch: 18   Global Step: 774980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:50:11,793-Speed 2626.50 samples/sec   Loss 1.3084   LearningRate 0.0004   Epoch: 18   Global Step: 774990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:50:15,677-Speed 2637.21 samples/sec   Loss 1.3232   LearningRate 0.0004   Epoch: 18   Global Step: 775000   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:19,568-Speed 2631.86 samples/sec   Loss 1.2790   LearningRate 0.0004   Epoch: 18   Global Step: 775010   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:23,464-Speed 2629.17 samples/sec   Loss 1.2793   LearningRate 0.0004   Epoch: 18   Global Step: 775020   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:27,364-Speed 2626.21 samples/sec   Loss 1.3052   LearningRate 0.0004   Epoch: 18   Global Step: 775030   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:31,263-Speed 2626.50 samples/sec   Loss 1.2733   LearningRate 0.0004   Epoch: 18   Global Step: 775040   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:35,173-Speed 2620.05 samples/sec   Loss 1.3153   LearningRate 0.0004   Epoch: 18   Global Step: 775050   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:39,067-Speed 2630.21 samples/sec   Loss 1.3212   LearningRate 0.0004   Epoch: 18   Global Step: 775060   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:42,959-Speed 2631.70 samples/sec   Loss 1.3251   LearningRate 0.0004   Epoch: 18   Global Step: 775070   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:46,852-Speed 2630.84 samples/sec   Loss 1.2796   LearningRate 0.0004   Epoch: 18   Global Step: 775080   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:50,744-Speed 2631.89 samples/sec   Loss 1.2811   LearningRate 0.0004   Epoch: 18   Global Step: 775090   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:50:54,639-Speed 2629.79 samples/sec   Loss 1.3085   LearningRate 0.0004   Epoch: 18   Global Step: 775100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:50:58,533-Speed 2630.39 samples/sec   Loss 1.2972   LearningRate 0.0004   Epoch: 18   Global Step: 775110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:02,428-Speed 2629.29 samples/sec   Loss 1.3008   LearningRate 0.0004   Epoch: 18   Global Step: 775120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:06,321-Speed 2631.53 samples/sec   Loss 1.3328   LearningRate 0.0004   Epoch: 18   Global Step: 775130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:10,227-Speed 2622.01 samples/sec   Loss 1.3071   LearningRate 0.0004   Epoch: 18   Global Step: 775140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:14,127-Speed 2625.56 samples/sec   Loss 1.3039   LearningRate 0.0004   Epoch: 18   Global Step: 775150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:18,028-Speed 2625.98 samples/sec   Loss 1.3106   LearningRate 0.0004   Epoch: 18   Global Step: 775160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:21,927-Speed 2627.34 samples/sec   Loss 1.2945   LearningRate 0.0004   Epoch: 18   Global Step: 775170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:25,817-Speed 2633.25 samples/sec   Loss 1.3559   LearningRate 0.0004   Epoch: 18   Global Step: 775180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:29,710-Speed 2631.09 samples/sec   Loss 1.3195   LearningRate 0.0004   Epoch: 18   Global Step: 775190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:33,602-Speed 2631.75 samples/sec   Loss 1.3077   LearningRate 0.0004   Epoch: 18   Global Step: 775200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:51:37,476-Speed 2643.40 samples/sec   Loss 1.3069   LearningRate 0.0004   Epoch: 18   Global Step: 775210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:41,373-Speed 2628.21 samples/sec   Loss 1.3134   LearningRate 0.0004   Epoch: 18   Global Step: 775220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:45,270-Speed 2627.75 samples/sec   Loss 1.3274   LearningRate 0.0004   Epoch: 18   Global Step: 775230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:49,170-Speed 2626.42 samples/sec   Loss 1.2881   LearningRate 0.0004   Epoch: 18   Global Step: 775240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:53,069-Speed 2627.30 samples/sec   Loss 1.3060   LearningRate 0.0004   Epoch: 18   Global Step: 775250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:51:56,962-Speed 2630.84 samples/sec   Loss 1.2927   LearningRate 0.0004   Epoch: 18   Global Step: 775260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:00,862-Speed 2626.24 samples/sec   Loss 1.3018   LearningRate 0.0004   Epoch: 18   Global Step: 775270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:04,769-Speed 2622.16 samples/sec   Loss 1.2945   LearningRate 0.0004   Epoch: 18   Global Step: 775280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:08,666-Speed 2628.43 samples/sec   Loss 1.2504   LearningRate 0.0004   Epoch: 18   Global Step: 775290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:12,560-Speed 2630.06 samples/sec   Loss 1.3135   LearningRate 0.0004   Epoch: 18   Global Step: 775300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:16,463-Speed 2624.08 samples/sec   Loss 1.3265   LearningRate 0.0004   Epoch: 18   Global Step: 775310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:52:20,342-Speed 2640.17 samples/sec   Loss 1.3084   LearningRate 0.0004   Epoch: 18   Global Step: 775320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:24,241-Speed 2627.03 samples/sec   Loss 1.3159   LearningRate 0.0004   Epoch: 18   Global Step: 775330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:28,154-Speed 2617.40 samples/sec   Loss 1.2995   LearningRate 0.0004   Epoch: 18   Global Step: 775340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:32,052-Speed 2627.52 samples/sec   Loss 1.2557   LearningRate 0.0004   Epoch: 18   Global Step: 775350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:35,962-Speed 2620.16 samples/sec   Loss 1.2727   LearningRate 0.0004   Epoch: 18   Global Step: 775360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:39,869-Speed 2621.88 samples/sec   Loss 1.3177   LearningRate 0.0004   Epoch: 18   Global Step: 775370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:43,781-Speed 2618.50 samples/sec   Loss 1.2973   LearningRate 0.0004   Epoch: 18   Global Step: 775380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:47,681-Speed 2626.26 samples/sec   Loss 1.2833   LearningRate 0.0004   Epoch: 18   Global Step: 775390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:51,584-Speed 2623.90 samples/sec   Loss 1.3125   LearningRate 0.0004   Epoch: 18   Global Step: 775400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:55,481-Speed 2628.28 samples/sec   Loss 1.2800   LearningRate 0.0004   Epoch: 18   Global Step: 775410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:52:59,407-Speed 2608.82 samples/sec   Loss 1.3159   LearningRate 0.0004   Epoch: 18   Global Step: 775420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:53:03,405-Speed 2562.01 samples/sec   Loss 1.2457   LearningRate 0.0004   Epoch: 18   Global Step: 775430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:53:07,316-Speed 2618.70 samples/sec   Loss 1.3106   LearningRate 0.0004   Epoch: 18   Global Step: 775440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:53:11,217-Speed 2625.87 samples/sec   Loss 1.2601   LearningRate 0.0004   Epoch: 18   Global Step: 775450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:53:15,121-Speed 2623.67 samples/sec   Loss 1.2988   LearningRate 0.0004   Epoch: 18   Global Step: 775460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:53:19,018-Speed 2627.82 samples/sec   Loss 1.2645   LearningRate 0.0004   Epoch: 18   Global Step: 775470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:53:22,926-Speed 2621.05 samples/sec   Loss 1.3206   LearningRate 0.0004   Epoch: 18   Global Step: 775480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:53:26,804-Speed 2640.87 samples/sec   Loss 1.2767   LearningRate 0.0004   Epoch: 18   Global Step: 775490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:30,721-Speed 2615.56 samples/sec   Loss 1.2442   LearningRate 0.0004   Epoch: 18   Global Step: 775500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:34,615-Speed 2629.62 samples/sec   Loss 1.2732   LearningRate 0.0004   Epoch: 18   Global Step: 775510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:38,537-Speed 2611.83 samples/sec   Loss 1.3236   LearningRate 0.0004   Epoch: 18   Global Step: 775520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:42,435-Speed 2627.52 samples/sec   Loss 1.3274   LearningRate 0.0004   Epoch: 18   Global Step: 775530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:46,331-Speed 2628.82 samples/sec   Loss 1.2868   LearningRate 0.0004   Epoch: 18   Global Step: 775540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:50,224-Speed 2631.16 samples/sec   Loss 1.2858   LearningRate 0.0004   Epoch: 18   Global Step: 775550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:54,120-Speed 2629.96 samples/sec   Loss 1.2659   LearningRate 0.0004   Epoch: 18   Global Step: 775560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:53:58,016-Speed 2628.79 samples/sec   Loss 1.3044   LearningRate 0.0004   Epoch: 18   Global Step: 775570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:54:01,915-Speed 2627.14 samples/sec   Loss 1.2883   LearningRate 0.0004   Epoch: 18   Global Step: 775580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:54:05,792-Speed 2641.72 samples/sec   Loss 1.2907   LearningRate 0.0004   Epoch: 18   Global Step: 775590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:54:09,703-Speed 2618.32 samples/sec   Loss 1.2845   LearningRate 0.0004   Epoch: 18   Global Step: 775600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:54:13,576-Speed 2644.78 samples/sec   Loss 1.2990   LearningRate 0.0004   Epoch: 18   Global Step: 775610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:17,472-Speed 2629.35 samples/sec   Loss 1.3623   LearningRate 0.0004   Epoch: 18   Global Step: 775620   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:21,366-Speed 2630.76 samples/sec   Loss 1.2826   LearningRate 0.0004   Epoch: 18   Global Step: 775630   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:25,256-Speed 2632.58 samples/sec   Loss 1.2812   LearningRate 0.0004   Epoch: 18   Global Step: 775640   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:29,170-Speed 2617.61 samples/sec   Loss 1.3092   LearningRate 0.0004   Epoch: 18   Global Step: 775650   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:33,088-Speed 2614.42 samples/sec   Loss 1.3030   LearningRate 0.0004   Epoch: 18   Global Step: 775660   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:36,979-Speed 2631.74 samples/sec   Loss 1.3144   LearningRate 0.0004   Epoch: 18   Global Step: 775670   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:40,874-Speed 2630.02 samples/sec   Loss 1.3085   LearningRate 0.0004   Epoch: 18   Global Step: 775680   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:44,777-Speed 2624.34 samples/sec   Loss 1.3682   LearningRate 0.0004   Epoch: 18   Global Step: 775690   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:48,672-Speed 2629.87 samples/sec   Loss 1.2995   LearningRate 0.0004   Epoch: 18   Global Step: 775700   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:54:52,576-Speed 2623.62 samples/sec   Loss 1.3576   LearningRate 0.0004   Epoch: 18   Global Step: 775710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:54:56,469-Speed 2631.00 samples/sec   Loss 1.3147   LearningRate 0.0004   Epoch: 18   Global Step: 775720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:55:00,371-Speed 2625.11 samples/sec   Loss 1.2905   LearningRate 0.0004   Epoch: 18   Global Step: 775730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:55:04,263-Speed 2631.60 samples/sec   Loss 1.3021   LearningRate 0.0004   Epoch: 18   Global Step: 775740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:55:08,157-Speed 2630.27 samples/sec   Loss 1.3036   LearningRate 0.0004   Epoch: 18   Global Step: 775750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:55:12,029-Speed 2645.58 samples/sec   Loss 1.2722   LearningRate 0.0004   Epoch: 18   Global Step: 775760   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:15,923-Speed 2630.21 samples/sec   Loss 1.3408   LearningRate 0.0004   Epoch: 18   Global Step: 775770   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:19,826-Speed 2624.27 samples/sec   Loss 1.3141   LearningRate 0.0004   Epoch: 18   Global Step: 775780   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:23,719-Speed 2630.87 samples/sec   Loss 1.3384   LearningRate 0.0004   Epoch: 18   Global Step: 775790   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:27,619-Speed 2626.31 samples/sec   Loss 1.2944   LearningRate 0.0004   Epoch: 18   Global Step: 775800   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:31,514-Speed 2629.96 samples/sec   Loss 1.2730   LearningRate 0.0004   Epoch: 18   Global Step: 775810   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:35,406-Speed 2631.53 samples/sec   Loss 1.3074   LearningRate 0.0004   Epoch: 18   Global Step: 775820   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:39,298-Speed 2631.39 samples/sec   Loss 1.2916   LearningRate 0.0004   Epoch: 18   Global Step: 775830   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:43,191-Speed 2631.07 samples/sec   Loss 1.2956   LearningRate 0.0004   Epoch: 18   Global Step: 775840   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:47,083-Speed 2632.68 samples/sec   Loss 1.3699   LearningRate 0.0004   Epoch: 18   Global Step: 775850   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 10:55:50,976-Speed 2630.93 samples/sec   Loss 1.3079   LearningRate 0.0004   Epoch: 18   Global Step: 775860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:55:54,871-Speed 2629.76 samples/sec   Loss 1.2437   LearningRate 0.0004   Epoch: 18   Global Step: 775870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:55:58,763-Speed 2632.06 samples/sec   Loss 1.3122   LearningRate 0.0004   Epoch: 18   Global Step: 775880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:02,654-Speed 2632.29 samples/sec   Loss 1.2612   LearningRate 0.0004   Epoch: 18   Global Step: 775890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:06,543-Speed 2633.05 samples/sec   Loss 1.3131   LearningRate 0.0004   Epoch: 18   Global Step: 775900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:10,441-Speed 2627.49 samples/sec   Loss 1.2946   LearningRate 0.0004   Epoch: 18   Global Step: 775910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:14,349-Speed 2621.50 samples/sec   Loss 1.2766   LearningRate 0.0004   Epoch: 18   Global Step: 775920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:18,245-Speed 2628.33 samples/sec   Loss 1.3222   LearningRate 0.0004   Epoch: 18   Global Step: 775930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:22,161-Speed 2616.38 samples/sec   Loss 1.2667   LearningRate 0.0004   Epoch: 18   Global Step: 775940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:26,058-Speed 2628.81 samples/sec   Loss 1.3353   LearningRate 0.0004   Epoch: 18   Global Step: 775950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:29,970-Speed 2617.78 samples/sec   Loss 1.2885   LearningRate 0.0004   Epoch: 18   Global Step: 775960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:56:33,884-Speed 2618.25 samples/sec   Loss 1.2911   LearningRate 0.0004   Epoch: 18   Global Step: 775970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:37,787-Speed 2623.98 samples/sec   Loss 1.3266   LearningRate 0.0004   Epoch: 18   Global Step: 775980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:41,692-Speed 2623.39 samples/sec   Loss 1.2599   LearningRate 0.0004   Epoch: 18   Global Step: 775990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:45,600-Speed 2620.60 samples/sec   Loss 1.3179   LearningRate 0.0004   Epoch: 18   Global Step: 776000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:49,497-Speed 2628.46 samples/sec   Loss 1.3237   LearningRate 0.0004   Epoch: 18   Global Step: 776010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:53,403-Speed 2622.16 samples/sec   Loss 1.3518   LearningRate 0.0004   Epoch: 18   Global Step: 776020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:56:57,325-Speed 2611.99 samples/sec   Loss 1.2400   LearningRate 0.0004   Epoch: 18   Global Step: 776030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:01,218-Speed 2630.90 samples/sec   Loss 1.3055   LearningRate 0.0004   Epoch: 18   Global Step: 776040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:05,110-Speed 2631.48 samples/sec   Loss 1.2655   LearningRate 0.0004   Epoch: 18   Global Step: 776050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:09,003-Speed 2631.71 samples/sec   Loss 1.2896   LearningRate 0.0004   Epoch: 18   Global Step: 776060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:12,894-Speed 2631.94 samples/sec   Loss 1.2708   LearningRate 0.0004   Epoch: 18   Global Step: 776070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:57:16,761-Speed 2648.96 samples/sec   Loss 1.2688   LearningRate 0.0004   Epoch: 18   Global Step: 776080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:20,681-Speed 2612.29 samples/sec   Loss 1.2443   LearningRate 0.0004   Epoch: 18   Global Step: 776090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:24,570-Speed 2634.35 samples/sec   Loss 1.2828   LearningRate 0.0004   Epoch: 18   Global Step: 776100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:28,466-Speed 2629.43 samples/sec   Loss 1.3257   LearningRate 0.0004   Epoch: 18   Global Step: 776110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:32,365-Speed 2627.45 samples/sec   Loss 1.2703   LearningRate 0.0004   Epoch: 18   Global Step: 776120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:36,258-Speed 2630.83 samples/sec   Loss 1.2931   LearningRate 0.0004   Epoch: 18   Global Step: 776130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:40,281-Speed 2546.28 samples/sec   Loss 1.2966   LearningRate 0.0004   Epoch: 18   Global Step: 776140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:44,378-Speed 2500.11 samples/sec   Loss 1.3342   LearningRate 0.0004   Epoch: 18   Global Step: 776150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:48,468-Speed 2504.08 samples/sec   Loss 1.2808   LearningRate 0.0004   Epoch: 18   Global Step: 776160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:52,530-Speed 2521.78 samples/sec   Loss 1.3039   LearningRate 0.0004   Epoch: 18   Global Step: 776170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:57:56,581-Speed 2528.46 samples/sec   Loss 1.3273   LearningRate 0.0004   Epoch: 18   Global Step: 776180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:58:00,561-Speed 2573.66 samples/sec   Loss 1.2837   LearningRate 0.0004   Epoch: 18   Global Step: 776190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:58:04,432-Speed 2645.91 samples/sec   Loss 1.2708   LearningRate 0.0004   Epoch: 18   Global Step: 776200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:08,332-Speed 2626.32 samples/sec   Loss 1.2945   LearningRate 0.0004   Epoch: 18   Global Step: 776210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:12,231-Speed 2627.18 samples/sec   Loss 1.3070   LearningRate 0.0004   Epoch: 18   Global Step: 776220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:16,218-Speed 2568.85 samples/sec   Loss 1.3165   LearningRate 0.0004   Epoch: 18   Global Step: 776230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:20,130-Speed 2618.34 samples/sec   Loss 1.3013   LearningRate 0.0004   Epoch: 18   Global Step: 776240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:24,037-Speed 2621.12 samples/sec   Loss 1.2686   LearningRate 0.0004   Epoch: 18   Global Step: 776250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:27,943-Speed 2623.09 samples/sec   Loss 1.3033   LearningRate 0.0004   Epoch: 18   Global Step: 776260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:31,836-Speed 2631.48 samples/sec   Loss 1.3184   LearningRate 0.0004   Epoch: 18   Global Step: 776270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:35,733-Speed 2627.94 samples/sec   Loss 1.2901   LearningRate 0.0004   Epoch: 18   Global Step: 776280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:39,624-Speed 2632.33 samples/sec   Loss 1.2922   LearningRate 0.0004   Epoch: 18   Global Step: 776290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:58:43,517-Speed 2630.36 samples/sec   Loss 1.3273   LearningRate 0.0004   Epoch: 18   Global Step: 776300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:58:47,412-Speed 2630.25 samples/sec   Loss 1.2734   LearningRate 0.0004   Epoch: 18   Global Step: 776310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:58:51,307-Speed 2629.40 samples/sec   Loss 1.2645   LearningRate 0.0004   Epoch: 18   Global Step: 776320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:58:55,207-Speed 2627.43 samples/sec   Loss 1.2520   LearningRate 0.0004   Epoch: 18   Global Step: 776330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:58:59,120-Speed 2617.65 samples/sec   Loss 1.3267   LearningRate 0.0004   Epoch: 18   Global Step: 776340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:03,014-Speed 2629.79 samples/sec   Loss 1.3189   LearningRate 0.0004   Epoch: 18   Global Step: 776350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:06,909-Speed 2629.86 samples/sec   Loss 1.2926   LearningRate 0.0004   Epoch: 18   Global Step: 776360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:10,824-Speed 2616.41 samples/sec   Loss 1.3345   LearningRate 0.0004   Epoch: 18   Global Step: 776370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:14,714-Speed 2633.31 samples/sec   Loss 1.3212   LearningRate 0.0004   Epoch: 18   Global Step: 776380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:18,609-Speed 2629.14 samples/sec   Loss 1.2788   LearningRate 0.0004   Epoch: 18   Global Step: 776390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:22,617-Speed 2556.24 samples/sec   Loss 1.3019   LearningRate 0.0004   Epoch: 18   Global Step: 776400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:26,515-Speed 2627.35 samples/sec   Loss 1.3318   LearningRate 0.0004   Epoch: 18   Global Step: 776410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:30,413-Speed 2628.10 samples/sec   Loss 1.3280   LearningRate 0.0004   Epoch: 18   Global Step: 776420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:34,306-Speed 2631.07 samples/sec   Loss 1.3198   LearningRate 0.0004   Epoch: 18   Global Step: 776430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:38,226-Speed 2612.72 samples/sec   Loss 1.2403   LearningRate 0.0004   Epoch: 18   Global Step: 776440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 10:59:42,095-Speed 2646.68 samples/sec   Loss 1.3383   LearningRate 0.0004   Epoch: 18   Global Step: 776450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:45,988-Speed 2632.02 samples/sec   Loss 1.3189   LearningRate 0.0004   Epoch: 18   Global Step: 776460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:49,900-Speed 2617.78 samples/sec   Loss 1.3588   LearningRate 0.0004   Epoch: 18   Global Step: 776470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:53,812-Speed 2618.81 samples/sec   Loss 1.3328   LearningRate 0.0004   Epoch: 18   Global Step: 776480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 10:59:57,717-Speed 2623.01 samples/sec   Loss 1.2669   LearningRate 0.0004   Epoch: 18   Global Step: 776490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:01,609-Speed 2631.96 samples/sec   Loss 1.3501   LearningRate 0.0004   Epoch: 18   Global Step: 776500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:05,512-Speed 2624.18 samples/sec   Loss 1.2869   LearningRate 0.0004   Epoch: 18   Global Step: 776510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:09,448-Speed 2602.29 samples/sec   Loss 1.3133   LearningRate 0.0004   Epoch: 18   Global Step: 776520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:13,353-Speed 2623.18 samples/sec   Loss 1.3085   LearningRate 0.0004   Epoch: 18   Global Step: 776530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:17,266-Speed 2617.58 samples/sec   Loss 1.2823   LearningRate 0.0004   Epoch: 18   Global Step: 776540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:21,162-Speed 2628.67 samples/sec   Loss 1.2680   LearningRate 0.0004   Epoch: 18   Global Step: 776550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:00:25,040-Speed 2641.71 samples/sec   Loss 1.2922   LearningRate 0.0004   Epoch: 18   Global Step: 776560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:28,941-Speed 2625.92 samples/sec   Loss 1.2765   LearningRate 0.0004   Epoch: 18   Global Step: 776570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:32,836-Speed 2629.92 samples/sec   Loss 1.2745   LearningRate 0.0004   Epoch: 18   Global Step: 776580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:36,740-Speed 2623.16 samples/sec   Loss 1.2852   LearningRate 0.0004   Epoch: 18   Global Step: 776590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:40,639-Speed 2626.75 samples/sec   Loss 1.2537   LearningRate 0.0004   Epoch: 18   Global Step: 776600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:44,540-Speed 2625.80 samples/sec   Loss 1.3142   LearningRate 0.0004   Epoch: 18   Global Step: 776610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:48,440-Speed 2625.73 samples/sec   Loss 1.2816   LearningRate 0.0004   Epoch: 18   Global Step: 776620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:52,336-Speed 2629.11 samples/sec   Loss 1.3205   LearningRate 0.0004   Epoch: 18   Global Step: 776630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:00:56,236-Speed 2626.50 samples/sec   Loss 1.2791   LearningRate 0.0004   Epoch: 18   Global Step: 776640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:00,131-Speed 2630.63 samples/sec   Loss 1.2750   LearningRate 0.0004   Epoch: 18   Global Step: 776650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:04,049-Speed 2613.59 samples/sec   Loss 1.3267   LearningRate 0.0004   Epoch: 18   Global Step: 776660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:01:07,919-Speed 2646.72 samples/sec   Loss 1.2469   LearningRate 0.0004   Epoch: 18   Global Step: 776670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:11,819-Speed 2626.09 samples/sec   Loss 1.2806   LearningRate 0.0004   Epoch: 18   Global Step: 776680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:15,731-Speed 2618.39 samples/sec   Loss 1.2703   LearningRate 0.0004   Epoch: 18   Global Step: 776690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:19,629-Speed 2627.42 samples/sec   Loss 1.2838   LearningRate 0.0004   Epoch: 18   Global Step: 776700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:23,528-Speed 2627.61 samples/sec   Loss 1.2614   LearningRate 0.0004   Epoch: 18   Global Step: 776710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:27,423-Speed 2629.59 samples/sec   Loss 1.3071   LearningRate 0.0004   Epoch: 18   Global Step: 776720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:31,319-Speed 2629.00 samples/sec   Loss 1.2899   LearningRate 0.0004   Epoch: 18   Global Step: 776730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:35,212-Speed 2631.11 samples/sec   Loss 1.2893   LearningRate 0.0004   Epoch: 18   Global Step: 776740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:39,113-Speed 2625.87 samples/sec   Loss 1.2732   LearningRate 0.0004   Epoch: 18   Global Step: 776750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:43,039-Speed 2608.73 samples/sec   Loss 1.2648   LearningRate 0.0004   Epoch: 18   Global Step: 776760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:46,932-Speed 2631.17 samples/sec   Loss 1.3193   LearningRate 0.0004   Epoch: 18   Global Step: 776770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:01:50,803-Speed 2646.19 samples/sec   Loss 1.3151   LearningRate 0.0004   Epoch: 18   Global Step: 776780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:54,693-Speed 2633.01 samples/sec   Loss 1.2563   LearningRate 0.0004   Epoch: 18   Global Step: 776790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:01:58,580-Speed 2635.21 samples/sec   Loss 1.3358   LearningRate 0.0004   Epoch: 18   Global Step: 776800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:02,470-Speed 2633.27 samples/sec   Loss 1.2892   LearningRate 0.0004   Epoch: 18   Global Step: 776810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:06,369-Speed 2626.85 samples/sec   Loss 1.2736   LearningRate 0.0004   Epoch: 18   Global Step: 776820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:10,268-Speed 2626.78 samples/sec   Loss 1.2974   LearningRate 0.0004   Epoch: 18   Global Step: 776830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:14,176-Speed 2620.85 samples/sec   Loss 1.3108   LearningRate 0.0004   Epoch: 18   Global Step: 776840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:18,070-Speed 2630.97 samples/sec   Loss 1.2700   LearningRate 0.0004   Epoch: 18   Global Step: 776850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:21,975-Speed 2623.17 samples/sec   Loss 1.3284   LearningRate 0.0004   Epoch: 18   Global Step: 776860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:25,876-Speed 2625.81 samples/sec   Loss 1.2840   LearningRate 0.0004   Epoch: 18   Global Step: 776870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:02:29,798-Speed 2611.35 samples/sec   Loss 1.2757   LearningRate 0.0004   Epoch: 18   Global Step: 776880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:02:33,713-Speed 2616.57 samples/sec   Loss 1.2719   LearningRate 0.0004   Epoch: 18   Global Step: 776890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:02:37,609-Speed 2628.97 samples/sec   Loss 1.3275   LearningRate 0.0004   Epoch: 18   Global Step: 776900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:02:41,501-Speed 2631.89 samples/sec   Loss 1.2906   LearningRate 0.0004   Epoch: 18   Global Step: 776910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:02:45,395-Speed 2629.77 samples/sec   Loss 1.3035   LearningRate 0.0004   Epoch: 18   Global Step: 776920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:02:49,288-Speed 2630.88 samples/sec   Loss 1.3306   LearningRate 0.0004   Epoch: 18   Global Step: 776930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:02:53,180-Speed 2632.10 samples/sec   Loss 1.2490   LearningRate 0.0004   Epoch: 18   Global Step: 776940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:02:57,067-Speed 2635.58 samples/sec   Loss 1.3430   LearningRate 0.0004   Epoch: 18   Global Step: 776950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:03:00,970-Speed 2623.83 samples/sec   Loss 1.2548   LearningRate 0.0004   Epoch: 18   Global Step: 776960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:03:04,862-Speed 2631.76 samples/sec   Loss 1.3113   LearningRate 0.0004   Epoch: 18   Global Step: 776970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:03:08,737-Speed 2643.06 samples/sec   Loss 1.2847   LearningRate 0.0004   Epoch: 18   Global Step: 776980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:03:12,611-Speed 2644.24 samples/sec   Loss 1.2820   LearningRate 0.0004   Epoch: 18   Global Step: 776990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:16,508-Speed 2628.15 samples/sec   Loss 1.3001   LearningRate 0.0004   Epoch: 18   Global Step: 777000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:20,405-Speed 2628.65 samples/sec   Loss 1.2587   LearningRate 0.0004   Epoch: 18   Global Step: 777010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:24,296-Speed 2632.03 samples/sec   Loss 1.2447   LearningRate 0.0004   Epoch: 18   Global Step: 777020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:28,229-Speed 2604.69 samples/sec   Loss 1.2570   LearningRate 0.0004   Epoch: 18   Global Step: 777030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:32,121-Speed 2632.28 samples/sec   Loss 1.2799   LearningRate 0.0004   Epoch: 18   Global Step: 777040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:36,018-Speed 2628.08 samples/sec   Loss 1.2472   LearningRate 0.0004   Epoch: 18   Global Step: 777050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:39,909-Speed 2632.75 samples/sec   Loss 1.3154   LearningRate 0.0004   Epoch: 18   Global Step: 777060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:43,821-Speed 2618.35 samples/sec   Loss 1.3250   LearningRate 0.0004   Epoch: 18   Global Step: 777070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:47,723-Speed 2625.09 samples/sec   Loss 1.3112   LearningRate 0.0004   Epoch: 18   Global Step: 777080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:03:51,614-Speed 2631.70 samples/sec   Loss 1.2706   LearningRate 0.0004   Epoch: 18   Global Step: 777090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:03:55,505-Speed 2632.43 samples/sec   Loss 1.2993   LearningRate 0.0004   Epoch: 18   Global Step: 777100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:03:59,403-Speed 2627.65 samples/sec   Loss 1.2389   LearningRate 0.0004   Epoch: 18   Global Step: 777110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:04:03,278-Speed 2644.07 samples/sec   Loss 1.2948   LearningRate 0.0004   Epoch: 18   Global Step: 777120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:07,188-Speed 2619.34 samples/sec   Loss 1.3094   LearningRate 0.0004   Epoch: 18   Global Step: 777130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:11,111-Speed 2610.87 samples/sec   Loss 1.3199   LearningRate 0.0004   Epoch: 18   Global Step: 777140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:15,030-Speed 2613.85 samples/sec   Loss 1.2618   LearningRate 0.0004   Epoch: 18   Global Step: 777150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:18,921-Speed 2632.12 samples/sec   Loss 1.3346   LearningRate 0.0004   Epoch: 18   Global Step: 777160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:22,818-Speed 2628.04 samples/sec   Loss 1.2824   LearningRate 0.0004   Epoch: 18   Global Step: 777170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:26,730-Speed 2619.00 samples/sec   Loss 1.3576   LearningRate 0.0004   Epoch: 18   Global Step: 777180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:30,625-Speed 2629.64 samples/sec   Loss 1.2800   LearningRate 0.0004   Epoch: 18   Global Step: 777190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:34,534-Speed 2619.69 samples/sec   Loss 1.2829   LearningRate 0.0004   Epoch: 18   Global Step: 777200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:38,462-Speed 2608.65 samples/sec   Loss 1.2981   LearningRate 0.0004   Epoch: 18   Global Step: 777210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:42,354-Speed 2632.13 samples/sec   Loss 1.2951   LearningRate 0.0004   Epoch: 18   Global Step: 777220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:04:46,245-Speed 2632.06 samples/sec   Loss 1.2910   LearningRate 0.0004   Epoch: 18   Global Step: 777230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:04:50,118-Speed 2645.04 samples/sec   Loss 1.2939   LearningRate 0.0004   Epoch: 18   Global Step: 777240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:54,017-Speed 2626.58 samples/sec   Loss 1.2687   LearningRate 0.0004   Epoch: 18   Global Step: 777250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:04:57,924-Speed 2621.60 samples/sec   Loss 1.2957   LearningRate 0.0004   Epoch: 18   Global Step: 777260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:01,821-Speed 2628.65 samples/sec   Loss 1.3265   LearningRate 0.0004   Epoch: 18   Global Step: 777270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:05,728-Speed 2621.24 samples/sec   Loss 1.2942   LearningRate 0.0004   Epoch: 18   Global Step: 777280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:09,621-Speed 2631.26 samples/sec   Loss 1.3116   LearningRate 0.0004   Epoch: 18   Global Step: 777290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:13,564-Speed 2597.50 samples/sec   Loss 1.2257   LearningRate 0.0004   Epoch: 18   Global Step: 777300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:17,461-Speed 2629.33 samples/sec   Loss 1.2770   LearningRate 0.0004   Epoch: 18   Global Step: 777310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:21,352-Speed 2631.72 samples/sec   Loss 1.2839   LearningRate 0.0004   Epoch: 18   Global Step: 777320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:25,249-Speed 2629.58 samples/sec   Loss 1.2789   LearningRate 0.0004   Epoch: 18   Global Step: 777330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:29,145-Speed 2628.87 samples/sec   Loss 1.2922   LearningRate 0.0004   Epoch: 18   Global Step: 777340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:05:33,045-Speed 2626.80 samples/sec   Loss 1.3009   LearningRate 0.0004   Epoch: 18   Global Step: 777350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:05:36,942-Speed 2627.81 samples/sec   Loss 1.2584   LearningRate 0.0004   Epoch: 18   Global Step: 777360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:05:40,814-Speed 2645.24 samples/sec   Loss 1.3110   LearningRate 0.0004   Epoch: 18   Global Step: 777370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:44,707-Speed 2630.54 samples/sec   Loss 1.2805   LearningRate 0.0004   Epoch: 18   Global Step: 777380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:48,613-Speed 2623.33 samples/sec   Loss 1.2930   LearningRate 0.0004   Epoch: 18   Global Step: 777390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:52,504-Speed 2632.49 samples/sec   Loss 1.3158   LearningRate 0.0004   Epoch: 18   Global Step: 777400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:05:56,396-Speed 2631.17 samples/sec   Loss 1.3261   LearningRate 0.0004   Epoch: 18   Global Step: 777410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:00,289-Speed 2631.38 samples/sec   Loss 1.2501   LearningRate 0.0004   Epoch: 18   Global Step: 777420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:04,184-Speed 2629.28 samples/sec   Loss 1.2985   LearningRate 0.0004   Epoch: 18   Global Step: 777430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:08,087-Speed 2624.57 samples/sec   Loss 1.3399   LearningRate 0.0004   Epoch: 18   Global Step: 777440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:11,976-Speed 2633.44 samples/sec   Loss 1.3084   LearningRate 0.0004   Epoch: 18   Global Step: 777450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:15,865-Speed 2633.99 samples/sec   Loss 1.3010   LearningRate 0.0004   Epoch: 18   Global Step: 777460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:19,759-Speed 2630.50 samples/sec   Loss 1.2891   LearningRate 0.0004   Epoch: 18   Global Step: 777470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:06:23,631-Speed 2645.29 samples/sec   Loss 1.2654   LearningRate 0.0004   Epoch: 18   Global Step: 777480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:27,521-Speed 2632.67 samples/sec   Loss 1.2799   LearningRate 0.0004   Epoch: 18   Global Step: 777490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:31,430-Speed 2620.81 samples/sec   Loss 1.2596   LearningRate 0.0004   Epoch: 18   Global Step: 777500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:35,321-Speed 2631.66 samples/sec   Loss 1.2987   LearningRate 0.0004   Epoch: 18   Global Step: 777510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:39,218-Speed 2628.03 samples/sec   Loss 1.3127   LearningRate 0.0004   Epoch: 18   Global Step: 777520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:43,118-Speed 2626.15 samples/sec   Loss 1.2680   LearningRate 0.0004   Epoch: 18   Global Step: 777530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:47,015-Speed 2629.10 samples/sec   Loss 1.3291   LearningRate 0.0004   Epoch: 18   Global Step: 777540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:50,909-Speed 2629.89 samples/sec   Loss 1.2747   LearningRate 0.0004   Epoch: 18   Global Step: 777550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:54,872-Speed 2585.26 samples/sec   Loss 1.2600   LearningRate 0.0004   Epoch: 18   Global Step: 777560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:06:58,769-Speed 2627.81 samples/sec   Loss 1.3364   LearningRate 0.0004   Epoch: 18   Global Step: 777570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:07:02,666-Speed 2629.29 samples/sec   Loss 1.2905   LearningRate 0.0004   Epoch: 18   Global Step: 777580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:07:06,558-Speed 2631.34 samples/sec   Loss 1.2985   LearningRate 0.0004   Epoch: 18   Global Step: 777590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:07:10,475-Speed 2614.48 samples/sec   Loss 1.3230   LearningRate 0.0004   Epoch: 18   Global Step: 777600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:07:14,377-Speed 2624.98 samples/sec   Loss 1.3660   LearningRate 0.0004   Epoch: 18   Global Step: 777610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:07:18,287-Speed 2620.01 samples/sec   Loss 1.3605   LearningRate 0.0004   Epoch: 18   Global Step: 777620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:07:22,186-Speed 2627.31 samples/sec   Loss 1.3122   LearningRate 0.0004   Epoch: 18   Global Step: 777630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:07:26,060-Speed 2643.57 samples/sec   Loss 1.2458   LearningRate 0.0004   Epoch: 18   Global Step: 777640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:07:29,964-Speed 2624.39 samples/sec   Loss 1.3008   LearningRate 0.0004   Epoch: 18   Global Step: 777650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:07:33,866-Speed 2624.54 samples/sec   Loss 1.2837   LearningRate 0.0004   Epoch: 18   Global Step: 777660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:07:37,740-Speed 2643.50 samples/sec   Loss 1.2875   LearningRate 0.0004   Epoch: 18   Global Step: 777670   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:07:41,638-Speed 2627.66 samples/sec   Loss 1.2830   LearningRate 0.0004   Epoch: 18   Global Step: 777680   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:07:45,538-Speed 2626.93 samples/sec   Loss 1.2708   LearningRate 0.0004   Epoch: 18   Global Step: 777690   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:07:49,439-Speed 2625.44 samples/sec   Loss 1.2729   LearningRate 0.0004   Epoch: 18   Global Step: 777700   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:07:53,338-Speed 2628.35 samples/sec   Loss 1.2921   LearningRate 0.0004   Epoch: 18   Global Step: 777710   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:07:57,236-Speed 2627.28 samples/sec   Loss 1.3024   LearningRate 0.0004   Epoch: 18   Global Step: 777720   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:08:01,160-Speed 2610.66 samples/sec   Loss 1.2673   LearningRate 0.0004   Epoch: 18   Global Step: 777730   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:08:05,065-Speed 2622.60 samples/sec   Loss 1.2195   LearningRate 0.0004   Epoch: 18   Global Step: 777740   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:08:09,117-Speed 2527.69 samples/sec   Loss 1.2812   LearningRate 0.0004   Epoch: 18   Global Step: 777750   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:08:13,081-Speed 2584.30 samples/sec   Loss 1.3083   LearningRate 0.0004   Epoch: 18   Global Step: 777760   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:08:16,979-Speed 2627.87 samples/sec   Loss 1.2840   LearningRate 0.0004   Epoch: 18   Global Step: 777770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:20,924-Speed 2596.13 samples/sec   Loss 1.2840   LearningRate 0.0004   Epoch: 18   Global Step: 777780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:24,822-Speed 2627.23 samples/sec   Loss 1.3128   LearningRate 0.0004   Epoch: 18   Global Step: 777790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:28,737-Speed 2617.28 samples/sec   Loss 1.2869   LearningRate 0.0004   Epoch: 18   Global Step: 777800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:32,636-Speed 2627.19 samples/sec   Loss 1.2278   LearningRate 0.0004   Epoch: 18   Global Step: 777810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:36,530-Speed 2630.11 samples/sec   Loss 1.2728   LearningRate 0.0004   Epoch: 18   Global Step: 777820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:40,473-Speed 2598.36 samples/sec   Loss 1.3259   LearningRate 0.0004   Epoch: 18   Global Step: 777830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:44,378-Speed 2622.71 samples/sec   Loss 1.3050   LearningRate 0.0004   Epoch: 18   Global Step: 777840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:48,275-Speed 2628.92 samples/sec   Loss 1.2904   LearningRate 0.0004   Epoch: 18   Global Step: 777850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:52,174-Speed 2627.04 samples/sec   Loss 1.2751   LearningRate 0.0004   Epoch: 18   Global Step: 777860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:08:56,068-Speed 2630.02 samples/sec   Loss 1.2872   LearningRate 0.0004   Epoch: 18   Global Step: 777870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:08:59,959-Speed 2632.17 samples/sec   Loss 1.3412   LearningRate 0.0004   Epoch: 18   Global Step: 777880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:09:03,858-Speed 2627.48 samples/sec   Loss 1.2581   LearningRate 0.0004   Epoch: 18   Global Step: 777890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:09:07,736-Speed 2641.28 samples/sec   Loss 1.2570   LearningRate 0.0004   Epoch: 18   Global Step: 777900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:11,640-Speed 2624.01 samples/sec   Loss 1.2603   LearningRate 0.0004   Epoch: 18   Global Step: 777910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:15,541-Speed 2625.75 samples/sec   Loss 1.3021   LearningRate 0.0004   Epoch: 18   Global Step: 777920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:19,452-Speed 2619.00 samples/sec   Loss 1.2482   LearningRate 0.0004   Epoch: 18   Global Step: 777930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:23,367-Speed 2615.94 samples/sec   Loss 1.2806   LearningRate 0.0004   Epoch: 18   Global Step: 777940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:27,276-Speed 2619.99 samples/sec   Loss 1.3078   LearningRate 0.0004   Epoch: 18   Global Step: 777950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:31,173-Speed 2629.21 samples/sec   Loss 1.2785   LearningRate 0.0004   Epoch: 18   Global Step: 777960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:35,068-Speed 2629.64 samples/sec   Loss 1.3298   LearningRate 0.0004   Epoch: 18   Global Step: 777970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:38,958-Speed 2633.14 samples/sec   Loss 1.3252   LearningRate 0.0004   Epoch: 18   Global Step: 777980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:42,877-Speed 2613.16 samples/sec   Loss 1.2788   LearningRate 0.0004   Epoch: 18   Global Step: 777990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:09:46,773-Speed 2629.83 samples/sec   Loss 1.2283   LearningRate 0.0004   Epoch: 18   Global Step: 778000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:09:50,678-Speed 2622.72 samples/sec   Loss 1.2613   LearningRate 0.0004   Epoch: 18   Global Step: 778010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:09:54,581-Speed 2624.13 samples/sec   Loss 1.2601   LearningRate 0.0004   Epoch: 18   Global Step: 778020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:09:58,496-Speed 2615.73 samples/sec   Loss 1.2724   LearningRate 0.0004   Epoch: 18   Global Step: 778030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:10:02,392-Speed 2629.66 samples/sec   Loss 1.2783   LearningRate 0.0004   Epoch: 18   Global Step: 778040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:10:06,274-Speed 2638.36 samples/sec   Loss 1.3330   LearningRate 0.0004   Epoch: 18   Global Step: 778050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:10,181-Speed 2622.53 samples/sec   Loss 1.2570   LearningRate 0.0004   Epoch: 18   Global Step: 778060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:14,078-Speed 2628.25 samples/sec   Loss 1.2518   LearningRate 0.0004   Epoch: 18   Global Step: 778070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:17,981-Speed 2623.94 samples/sec   Loss 1.2571   LearningRate 0.0004   Epoch: 18   Global Step: 778080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:21,877-Speed 2629.24 samples/sec   Loss 1.2704   LearningRate 0.0004   Epoch: 18   Global Step: 778090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:25,770-Speed 2631.01 samples/sec   Loss 1.2931   LearningRate 0.0004   Epoch: 18   Global Step: 778100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:29,676-Speed 2621.92 samples/sec   Loss 1.2682   LearningRate 0.0004   Epoch: 18   Global Step: 778110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:33,567-Speed 2633.21 samples/sec   Loss 1.3156   LearningRate 0.0004   Epoch: 18   Global Step: 778120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:37,459-Speed 2632.41 samples/sec   Loss 1.3014   LearningRate 0.0004   Epoch: 18   Global Step: 778130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:41,367-Speed 2620.48 samples/sec   Loss 1.2394   LearningRate 0.0004   Epoch: 18   Global Step: 778140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:45,267-Speed 2626.53 samples/sec   Loss 1.2673   LearningRate 0.0004   Epoch: 18   Global Step: 778150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:10:49,163-Speed 2629.04 samples/sec   Loss 1.2649   LearningRate 0.0004   Epoch: 18   Global Step: 778160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:10:53,046-Speed 2638.42 samples/sec   Loss 1.2187   LearningRate 0.0004   Epoch: 18   Global Step: 778170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:10:56,941-Speed 2629.49 samples/sec   Loss 1.3026   LearningRate 0.0004   Epoch: 18   Global Step: 778180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:00,862-Speed 2612.12 samples/sec   Loss 1.2200   LearningRate 0.0004   Epoch: 18   Global Step: 778190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:04,777-Speed 2616.43 samples/sec   Loss 1.2981   LearningRate 0.0004   Epoch: 18   Global Step: 778200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:08,676-Speed 2627.23 samples/sec   Loss 1.2831   LearningRate 0.0004   Epoch: 18   Global Step: 778210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:12,585-Speed 2620.62 samples/sec   Loss 1.3029   LearningRate 0.0004   Epoch: 18   Global Step: 778220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:16,478-Speed 2630.89 samples/sec   Loss 1.2884   LearningRate 0.0004   Epoch: 18   Global Step: 778230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:20,404-Speed 2609.17 samples/sec   Loss 1.2505   LearningRate 0.0004   Epoch: 18   Global Step: 778240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:24,299-Speed 2629.83 samples/sec   Loss 1.2650   LearningRate 0.0004   Epoch: 18   Global Step: 778250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:28,188-Speed 2633.89 samples/sec   Loss 1.3112   LearningRate 0.0004   Epoch: 18   Global Step: 778260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:32,080-Speed 2631.65 samples/sec   Loss 1.2973   LearningRate 0.0004   Epoch: 18   Global Step: 778270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:11:35,950-Speed 2646.56 samples/sec   Loss 1.2416   LearningRate 0.0004   Epoch: 18   Global Step: 778280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:39,843-Speed 2630.97 samples/sec   Loss 1.2563   LearningRate 0.0004   Epoch: 18   Global Step: 778290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:43,735-Speed 2631.97 samples/sec   Loss 1.2704   LearningRate 0.0004   Epoch: 18   Global Step: 778300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:47,627-Speed 2631.56 samples/sec   Loss 1.2602   LearningRate 0.0004   Epoch: 18   Global Step: 778310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:51,521-Speed 2630.52 samples/sec   Loss 1.3091   LearningRate 0.0004   Epoch: 18   Global Step: 778320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:55,426-Speed 2623.16 samples/sec   Loss 1.2892   LearningRate 0.0004   Epoch: 18   Global Step: 778330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:11:59,332-Speed 2622.55 samples/sec   Loss 1.2773   LearningRate 0.0004   Epoch: 18   Global Step: 778340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:03,233-Speed 2625.30 samples/sec   Loss 1.2630   LearningRate 0.0004   Epoch: 18   Global Step: 778350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:07,124-Speed 2632.02 samples/sec   Loss 1.2876   LearningRate 0.0004   Epoch: 18   Global Step: 778360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:11,029-Speed 2623.13 samples/sec   Loss 1.2899   LearningRate 0.0004   Epoch: 18   Global Step: 778370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:14,922-Speed 2630.64 samples/sec   Loss 1.2734   LearningRate 0.0004   Epoch: 18   Global Step: 778380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:12:18,832-Speed 2619.98 samples/sec   Loss 1.2300   LearningRate 0.0004   Epoch: 18   Global Step: 778390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:12:22,731-Speed 2627.09 samples/sec   Loss 1.3293   LearningRate 0.0004   Epoch: 18   Global Step: 778400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:12:26,631-Speed 2626.38 samples/sec   Loss 1.2322   LearningRate 0.0004   Epoch: 18   Global Step: 778410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:12:30,531-Speed 2626.41 samples/sec   Loss 1.2951   LearningRate 0.0004   Epoch: 18   Global Step: 778420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:12:34,422-Speed 2631.99 samples/sec   Loss 1.3289   LearningRate 0.0004   Epoch: 18   Global Step: 778430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:12:38,293-Speed 2646.10 samples/sec   Loss 1.3337   LearningRate 0.0004   Epoch: 18   Global Step: 778440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:42,189-Speed 2629.69 samples/sec   Loss 1.2940   LearningRate 0.0004   Epoch: 18   Global Step: 778450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:46,097-Speed 2620.91 samples/sec   Loss 1.3079   LearningRate 0.0004   Epoch: 18   Global Step: 778460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:49,994-Speed 2628.22 samples/sec   Loss 1.3331   LearningRate 0.0004   Epoch: 18   Global Step: 778470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:53,889-Speed 2629.54 samples/sec   Loss 1.2660   LearningRate 0.0004   Epoch: 18   Global Step: 778480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:12:57,782-Speed 2631.67 samples/sec   Loss 1.2676   LearningRate 0.0004   Epoch: 18   Global Step: 778490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:01,671-Speed 2633.65 samples/sec   Loss 1.2482   LearningRate 0.0004   Epoch: 18   Global Step: 778500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:05,564-Speed 2630.78 samples/sec   Loss 1.3162   LearningRate 0.0004   Epoch: 18   Global Step: 778510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:09,459-Speed 2629.50 samples/sec   Loss 1.3131   LearningRate 0.0004   Epoch: 18   Global Step: 778520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:13,359-Speed 2626.26 samples/sec   Loss 1.2372   LearningRate 0.0004   Epoch: 18   Global Step: 778530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:17,260-Speed 2625.67 samples/sec   Loss 1.3164   LearningRate 0.0004   Epoch: 18   Global Step: 778540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:13:21,160-Speed 2626.75 samples/sec   Loss 1.3758   LearningRate 0.0004   Epoch: 18   Global Step: 778550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:13:25,054-Speed 2630.34 samples/sec   Loss 1.2681   LearningRate 0.0004   Epoch: 18   Global Step: 778560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:13:28,928-Speed 2644.12 samples/sec   Loss 1.2993   LearningRate 0.0004   Epoch: 18   Global Step: 778570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:32,825-Speed 2628.05 samples/sec   Loss 1.2826   LearningRate 0.0004   Epoch: 18   Global Step: 778580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:36,723-Speed 2627.89 samples/sec   Loss 1.3061   LearningRate 0.0004   Epoch: 18   Global Step: 778590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:40,643-Speed 2612.90 samples/sec   Loss 1.2812   LearningRate 0.0004   Epoch: 18   Global Step: 778600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:44,564-Speed 2611.76 samples/sec   Loss 1.2864   LearningRate 0.0004   Epoch: 18   Global Step: 778610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:48,458-Speed 2631.23 samples/sec   Loss 1.2972   LearningRate 0.0004   Epoch: 18   Global Step: 778620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:52,354-Speed 2628.68 samples/sec   Loss 1.2172   LearningRate 0.0004   Epoch: 18   Global Step: 778630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:13:56,247-Speed 2630.73 samples/sec   Loss 1.2754   LearningRate 0.0004   Epoch: 18   Global Step: 778640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:00,142-Speed 2629.73 samples/sec   Loss 1.2668   LearningRate 0.0004   Epoch: 18   Global Step: 778650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:04,040-Speed 2627.78 samples/sec   Loss 1.3036   LearningRate 0.0004   Epoch: 18   Global Step: 778660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:07,931-Speed 2631.99 samples/sec   Loss 1.3022   LearningRate 0.0004   Epoch: 18   Global Step: 778670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:11,825-Speed 2630.58 samples/sec   Loss 1.2301   LearningRate 0.0004   Epoch: 18   Global Step: 778680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:15,730-Speed 2623.48 samples/sec   Loss 1.2402   LearningRate 0.0004   Epoch: 18   Global Step: 778690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:19,645-Speed 2616.18 samples/sec   Loss 1.2750   LearningRate 0.0004   Epoch: 18   Global Step: 778700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:23,575-Speed 2606.15 samples/sec   Loss 1.2814   LearningRate 0.0004   Epoch: 18   Global Step: 778710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:27,474-Speed 2627.11 samples/sec   Loss 1.3289   LearningRate 0.0004   Epoch: 18   Global Step: 778720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:31,372-Speed 2627.46 samples/sec   Loss 1.2825   LearningRate 0.0004   Epoch: 18   Global Step: 778730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:35,274-Speed 2625.07 samples/sec   Loss 1.2479   LearningRate 0.0004   Epoch: 18   Global Step: 778740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:39,174-Speed 2626.12 samples/sec   Loss 1.2172   LearningRate 0.0004   Epoch: 18   Global Step: 778750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:43,077-Speed 2625.07 samples/sec   Loss 1.3249   LearningRate 0.0004   Epoch: 18   Global Step: 778760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:14:46,977-Speed 2625.92 samples/sec   Loss 1.3049   LearningRate 0.0004   Epoch: 18   Global Step: 778770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:14:50,881-Speed 2624.20 samples/sec   Loss 1.2648   LearningRate 0.0004   Epoch: 18   Global Step: 778780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:14:54,774-Speed 2630.57 samples/sec   Loss 1.3489   LearningRate 0.0004   Epoch: 18   Global Step: 778790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:14:58,668-Speed 2630.33 samples/sec   Loss 1.2972   LearningRate 0.0004   Epoch: 18   Global Step: 778800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:15:02,575-Speed 2621.32 samples/sec   Loss 1.3345   LearningRate 0.0004   Epoch: 18   Global Step: 778810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:15:06,447-Speed 2646.09 samples/sec   Loss 1.2925   LearningRate 0.0004   Epoch: 18   Global Step: 778820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:15:10,337-Speed 2632.67 samples/sec   Loss 1.2855   LearningRate 0.0004   Epoch: 18   Global Step: 778830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:15:14,232-Speed 2629.64 samples/sec   Loss 1.3106   LearningRate 0.0004   Epoch: 18   Global Step: 778840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:15:18,126-Speed 2630.14 samples/sec   Loss 1.3106   LearningRate 0.0004   Epoch: 18   Global Step: 778850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:15:22,000-Speed 2644.42 samples/sec   Loss 1.2817   LearningRate 0.0004   Epoch: 18   Global Step: 778860   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:15:25,895-Speed 2629.64 samples/sec   Loss 1.3091   LearningRate 0.0004   Epoch: 18   Global Step: 778870   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:15:29,800-Speed 2622.80 samples/sec   Loss 1.2609   LearningRate 0.0004   Epoch: 18   Global Step: 778880   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:15:33,714-Speed 2617.16 samples/sec   Loss 1.2949   LearningRate 0.0004   Epoch: 18   Global Step: 778890   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:15:37,593-Speed 2640.20 samples/sec   Loss 1.2764   LearningRate 0.0004   Epoch: 18   Global Step: 778900   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:15:41,489-Speed 2629.37 samples/sec   Loss 1.2640   LearningRate 0.0004   Epoch: 18   Global Step: 778910   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:15:45,387-Speed 2627.48 samples/sec   Loss 1.2522   LearningRate 0.0004   Epoch: 18   Global Step: 778920   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:15:49,287-Speed 2626.45 samples/sec   Loss 1.2437   LearningRate 0.0004   Epoch: 18   Global Step: 778930   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:15:53,183-Speed 2628.71 samples/sec   Loss 1.2464   LearningRate 0.0004   Epoch: 18   Global Step: 778940   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:15:57,084-Speed 2625.94 samples/sec   Loss 1.2064   LearningRate 0.0004   Epoch: 18   Global Step: 778950   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:00,978-Speed 2629.66 samples/sec   Loss 1.2999   LearningRate 0.0004   Epoch: 18   Global Step: 778960   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:04,870-Speed 2631.51 samples/sec   Loss 1.2807   LearningRate 0.0004   Epoch: 18   Global Step: 778970   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:08,779-Speed 2620.32 samples/sec   Loss 1.3130   LearningRate 0.0004   Epoch: 18   Global Step: 778980   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:12,672-Speed 2631.31 samples/sec   Loss 1.2862   LearningRate 0.0004   Epoch: 18   Global Step: 778990   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:16,591-Speed 2613.56 samples/sec   Loss 1.2917   LearningRate 0.0004   Epoch: 18   Global Step: 779000   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:16:20,484-Speed 2631.29 samples/sec   Loss 1.2864   LearningRate 0.0004   Epoch: 18   Global Step: 779010   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:16:24,356-Speed 2645.61 samples/sec   Loss 1.2971   LearningRate 0.0004   Epoch: 18   Global Step: 779020   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:28,250-Speed 2630.14 samples/sec   Loss 1.2754   LearningRate 0.0004   Epoch: 18   Global Step: 779030   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:32,152-Speed 2624.69 samples/sec   Loss 1.2673   LearningRate 0.0004   Epoch: 18   Global Step: 779040   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:36,043-Speed 2632.12 samples/sec   Loss 1.2885   LearningRate 0.0004   Epoch: 18   Global Step: 779050   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:39,951-Speed 2620.82 samples/sec   Loss 1.3120   LearningRate 0.0004   Epoch: 18   Global Step: 779060   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:43,850-Speed 2627.10 samples/sec   Loss 1.2695   LearningRate 0.0004   Epoch: 18   Global Step: 779070   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:47,750-Speed 2625.76 samples/sec   Loss 1.2781   LearningRate 0.0004   Epoch: 18   Global Step: 779080   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:51,654-Speed 2624.45 samples/sec   Loss 1.3011   LearningRate 0.0004   Epoch: 18   Global Step: 779090   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:55,545-Speed 2632.23 samples/sec   Loss 1.2818   LearningRate 0.0004   Epoch: 18   Global Step: 779100   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:16:59,453-Speed 2620.25 samples/sec   Loss 1.2752   LearningRate 0.0004   Epoch: 18   Global Step: 779110   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:17:03,350-Speed 2628.72 samples/sec   Loss 1.2582   LearningRate 0.0004   Epoch: 18   Global Step: 779120   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:07,242-Speed 2632.06 samples/sec   Loss 1.2747   LearningRate 0.0004   Epoch: 18   Global Step: 779130   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:11,133-Speed 2632.13 samples/sec   Loss 1.2702   LearningRate 0.0004   Epoch: 18   Global Step: 779140   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:15,105-Speed 2578.35 samples/sec   Loss 1.2500   LearningRate 0.0004   Epoch: 18   Global Step: 779150   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:19,017-Speed 2618.33 samples/sec   Loss 1.2702   LearningRate 0.0004   Epoch: 18   Global Step: 779160   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:22,916-Speed 2627.57 samples/sec   Loss 1.3063   LearningRate 0.0004   Epoch: 18   Global Step: 779170   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:26,840-Speed 2617.15 samples/sec   Loss 1.3104   LearningRate 0.0004   Epoch: 18   Global Step: 779180   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:30,745-Speed 2623.17 samples/sec   Loss 1.2816   LearningRate 0.0004   Epoch: 18   Global Step: 779190   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:34,667-Speed 2611.31 samples/sec   Loss 1.2547   LearningRate 0.0004   Epoch: 18   Global Step: 779200   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:38,566-Speed 2626.93 samples/sec   Loss 1.2984   LearningRate 0.0004   Epoch: 18   Global Step: 779210   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:42,464-Speed 2628.28 samples/sec   Loss 1.2709   LearningRate 0.0004   Epoch: 18   Global Step: 779220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:17:46,365-Speed 2625.47 samples/sec   Loss 1.2497   LearningRate 0.0004   Epoch: 18   Global Step: 779230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:17:50,270-Speed 2623.15 samples/sec   Loss 1.2766   LearningRate 0.0004   Epoch: 18   Global Step: 779240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:17:54,147-Speed 2641.87 samples/sec   Loss 1.3589   LearningRate 0.0004   Epoch: 18   Global Step: 779250   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:17:58,041-Speed 2630.49 samples/sec   Loss 1.3104   LearningRate 0.0004   Epoch: 18   Global Step: 779260   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:01,941-Speed 2626.18 samples/sec   Loss 1.2797   LearningRate 0.0004   Epoch: 18   Global Step: 779270   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:05,837-Speed 2629.02 samples/sec   Loss 1.3346   LearningRate 0.0004   Epoch: 18   Global Step: 779280   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:09,737-Speed 2625.74 samples/sec   Loss 1.2632   LearningRate 0.0004   Epoch: 18   Global Step: 779290   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:13,635-Speed 2627.74 samples/sec   Loss 1.2982   LearningRate 0.0004   Epoch: 18   Global Step: 779300   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:17,536-Speed 2625.66 samples/sec   Loss 1.2710   LearningRate 0.0004   Epoch: 18   Global Step: 779310   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:21,436-Speed 2626.57 samples/sec   Loss 1.2593   LearningRate 0.0004   Epoch: 18   Global Step: 779320   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:25,334-Speed 2627.24 samples/sec   Loss 1.2897   LearningRate 0.0004   Epoch: 18   Global Step: 779330   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:29,229-Speed 2630.05 samples/sec   Loss 1.2602   LearningRate 0.0004   Epoch: 18   Global Step: 779340   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:18:33,149-Speed 2612.94 samples/sec   Loss 1.3017   LearningRate 0.0004   Epoch: 18   Global Step: 779350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:18:37,043-Speed 2630.00 samples/sec   Loss 1.2090   LearningRate 0.0004   Epoch: 18   Global Step: 779360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:18:40,938-Speed 2630.06 samples/sec   Loss 1.3059   LearningRate 0.0004   Epoch: 18   Global Step: 779370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:18:44,831-Speed 2631.54 samples/sec   Loss 1.2534   LearningRate 0.0004   Epoch: 18   Global Step: 779380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:18:48,751-Speed 2613.06 samples/sec   Loss 1.3053   LearningRate 0.0004   Epoch: 18   Global Step: 779390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:18:52,645-Speed 2630.24 samples/sec   Loss 1.2495   LearningRate 0.0004   Epoch: 18   Global Step: 779400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:18:56,538-Speed 2631.83 samples/sec   Loss 1.3203   LearningRate 0.0004   Epoch: 18   Global Step: 779410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:19:00,431-Speed 2630.28 samples/sec   Loss 1.2861   LearningRate 0.0004   Epoch: 18   Global Step: 779420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:19:04,337-Speed 2622.28 samples/sec   Loss 1.2177   LearningRate 0.0004   Epoch: 18   Global Step: 779430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:19:08,240-Speed 2624.46 samples/sec   Loss 1.2863   LearningRate 0.0004   Epoch: 18   Global Step: 779440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:19:12,132-Speed 2631.83 samples/sec   Loss 1.3389   LearningRate 0.0004   Epoch: 18   Global Step: 779450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:19:16,026-Speed 2630.59 samples/sec   Loss 1.3008   LearningRate 0.0004   Epoch: 18   Global Step: 779460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:19:19,906-Speed 2640.39 samples/sec   Loss 1.3090   LearningRate 0.0004   Epoch: 18   Global Step: 779470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:19:23,803-Speed 2628.33 samples/sec   Loss 1.2453   LearningRate 0.0004   Epoch: 18   Global Step: 779480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:19:27,772-Speed 2581.03 samples/sec   Loss 1.2598   LearningRate 0.0004   Epoch: 18   Global Step: 779490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:19:31,662-Speed 2633.34 samples/sec   Loss 1.3037   LearningRate 0.0004   Epoch: 18   Global Step: 779500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:19:35,566-Speed 2623.58 samples/sec   Loss 1.2598   LearningRate 0.0004   Epoch: 18   Global Step: 779510   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:19:39,460-Speed 2629.78 samples/sec   Loss 1.2881   LearningRate 0.0004   Epoch: 18   Global Step: 779520   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:19:43,351-Speed 2632.83 samples/sec   Loss 1.3033   LearningRate 0.0004   Epoch: 18   Global Step: 779530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:19:47,251-Speed 2625.75 samples/sec   Loss 1.2515   LearningRate 0.0004   Epoch: 18   Global Step: 779540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:19:51,145-Speed 2631.00 samples/sec   Loss 1.2271   LearningRate 0.0004   Epoch: 18   Global Step: 779550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:19:55,037-Speed 2631.73 samples/sec   Loss 1.2779   LearningRate 0.0004   Epoch: 18   Global Step: 779560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:19:58,930-Speed 2631.64 samples/sec   Loss 1.2030   LearningRate 0.0004   Epoch: 18   Global Step: 779570   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:02,831-Speed 2625.19 samples/sec   Loss 1.2581   LearningRate 0.0004   Epoch: 18   Global Step: 779580   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:06,731-Speed 2625.86 samples/sec   Loss 1.2415   LearningRate 0.0004   Epoch: 18   Global Step: 779590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:10,633-Speed 2624.74 samples/sec   Loss 1.3108   LearningRate 0.0004   Epoch: 18   Global Step: 779600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:20:14,539-Speed 2623.53 samples/sec   Loss 1.2791   LearningRate 0.0004   Epoch: 18   Global Step: 779610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:18,442-Speed 2624.15 samples/sec   Loss 1.2754   LearningRate 0.0004   Epoch: 18   Global Step: 779620   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:22,365-Speed 2611.09 samples/sec   Loss 1.2609   LearningRate 0.0004   Epoch: 18   Global Step: 779630   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:26,257-Speed 2631.70 samples/sec   Loss 1.2331   LearningRate 0.0004   Epoch: 18   Global Step: 779640   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:30,148-Speed 2632.20 samples/sec   Loss 1.2546   LearningRate 0.0004   Epoch: 18   Global Step: 779650   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:34,043-Speed 2629.88 samples/sec   Loss 1.2801   LearningRate 0.0004   Epoch: 18   Global Step: 779660   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:37,936-Speed 2631.22 samples/sec   Loss 1.2649   LearningRate 0.0004   Epoch: 18   Global Step: 779670   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:41,829-Speed 2630.42 samples/sec   Loss 1.3106   LearningRate 0.0004   Epoch: 18   Global Step: 779680   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:20:45,700-Speed 2646.71 samples/sec   Loss 1.2456   LearningRate 0.0004   Epoch: 18   Global Step: 779690   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:20:49,595-Speed 2629.98 samples/sec   Loss 1.2572   LearningRate 0.0004   Epoch: 18   Global Step: 779700   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:20:53,525-Speed 2606.03 samples/sec   Loss 1.3410   LearningRate 0.0004   Epoch: 18   Global Step: 779710   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:20:57,420-Speed 2630.22 samples/sec   Loss 1.2200   LearningRate 0.0004   Epoch: 18   Global Step: 779720   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:21:01,313-Speed 2630.62 samples/sec   Loss 1.3004   LearningRate 0.0004   Epoch: 18   Global Step: 779730   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:21:05,216-Speed 2624.48 samples/sec   Loss 1.3173   LearningRate 0.0004   Epoch: 18   Global Step: 779740   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:21:09,131-Speed 2616.61 samples/sec   Loss 1.2679   LearningRate 0.0004   Epoch: 18   Global Step: 779750   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:21:13,037-Speed 2622.02 samples/sec   Loss 1.3131   LearningRate 0.0004   Epoch: 18   Global Step: 779760   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:21:16,940-Speed 2624.18 samples/sec   Loss 1.2779   LearningRate 0.0004   Epoch: 18   Global Step: 779770   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:21:20,857-Speed 2618.87 samples/sec   Loss 1.2808   LearningRate 0.0004   Epoch: 18   Global Step: 779780   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-16 11:21:24,755-Speed 2627.87 samples/sec   Loss 1.2656   LearningRate 0.0004   Epoch: 18   Global Step: 779790   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:28,676-Speed 2612.34 samples/sec   Loss 1.2696   LearningRate 0.0004   Epoch: 18   Global Step: 779800   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:32,572-Speed 2629.26 samples/sec   Loss 1.2711   LearningRate 0.0004   Epoch: 18   Global Step: 779810   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:36,474-Speed 2624.51 samples/sec   Loss 1.3349   LearningRate 0.0004   Epoch: 18   Global Step: 779820   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:40,367-Speed 2631.39 samples/sec   Loss 1.3133   LearningRate 0.0004   Epoch: 18   Global Step: 779830   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:44,259-Speed 2631.73 samples/sec   Loss 1.2511   LearningRate 0.0004   Epoch: 18   Global Step: 779840   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:48,155-Speed 2628.93 samples/sec   Loss 1.2929   LearningRate 0.0004   Epoch: 18   Global Step: 779850   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:52,059-Speed 2623.95 samples/sec   Loss 1.2723   LearningRate 0.0004   Epoch: 18   Global Step: 779860   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:55,952-Speed 2630.98 samples/sec   Loss 1.2953   LearningRate 0.0004   Epoch: 18   Global Step: 779870   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:21:59,848-Speed 2628.73 samples/sec   Loss 1.3077   LearningRate 0.0004   Epoch: 18   Global Step: 779880   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:22:03,757-Speed 2620.02 samples/sec   Loss 1.2623   LearningRate 0.0004   Epoch: 18   Global Step: 779890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:07,652-Speed 2629.81 samples/sec   Loss 1.2702   LearningRate 0.0004   Epoch: 18   Global Step: 779900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:11,548-Speed 2628.77 samples/sec   Loss 1.2703   LearningRate 0.0004   Epoch: 18   Global Step: 779910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:15,454-Speed 2622.00 samples/sec   Loss 1.2380   LearningRate 0.0004   Epoch: 18   Global Step: 779920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:19,357-Speed 2624.72 samples/sec   Loss 1.2829   LearningRate 0.0004   Epoch: 18   Global Step: 779930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:23,250-Speed 2630.60 samples/sec   Loss 1.2785   LearningRate 0.0004   Epoch: 18   Global Step: 779940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:27,145-Speed 2630.31 samples/sec   Loss 1.2606   LearningRate 0.0004   Epoch: 18   Global Step: 779950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:31,038-Speed 2630.35 samples/sec   Loss 1.2579   LearningRate 0.0004   Epoch: 18   Global Step: 779960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:34,930-Speed 2631.50 samples/sec   Loss 1.2926   LearningRate 0.0004   Epoch: 18   Global Step: 779970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:38,834-Speed 2623.41 samples/sec   Loss 1.2729   LearningRate 0.0004   Epoch: 18   Global Step: 779980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:22:42,727-Speed 2631.61 samples/sec   Loss 1.2619   LearningRate 0.0004   Epoch: 18   Global Step: 779990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:22:46,621-Speed 2629.71 samples/sec   Loss 1.3066   LearningRate 0.0004   Epoch: 18   Global Step: 780000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:23:29,689-[lfw][780000]XNorm: 21.557862
Training: 2022-04-16 11:23:29,690-[lfw][780000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 11:23:29,690-[lfw][780000]Accuracy-Highest: 0.99850
Training: 2022-04-16 11:24:20,002-[cfp_fp][780000]XNorm: 21.949184
Training: 2022-04-16 11:24:20,003-[cfp_fp][780000]Accuracy-Flip: 0.99400+-0.00408
Training: 2022-04-16 11:24:20,004-[cfp_fp][780000]Accuracy-Highest: 0.99400
Training: 2022-04-16 11:25:03,222-[agedb_30][780000]XNorm: 22.462296
Training: 2022-04-16 11:25:03,223-[agedb_30][780000]Accuracy-Flip: 0.98550+-0.00548
Training: 2022-04-16 11:25:03,224-[agedb_30][780000]Accuracy-Highest: 0.98550
Training: 2022-04-16 11:25:07,068-Speed 72.91 samples/sec   Loss 1.3015   LearningRate 0.0004   Epoch: 18   Global Step: 780010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:10,927-Speed 2653.79 samples/sec   Loss 1.2841   LearningRate 0.0004   Epoch: 18   Global Step: 780020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:14,826-Speed 2628.16 samples/sec   Loss 1.2447   LearningRate 0.0004   Epoch: 18   Global Step: 780030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:18,702-Speed 2642.65 samples/sec   Loss 1.2739   LearningRate 0.0004   Epoch: 18   Global Step: 780040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:22,570-Speed 2648.29 samples/sec   Loss 1.3297   LearningRate 0.0004   Epoch: 18   Global Step: 780050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:26,448-Speed 2641.35 samples/sec   Loss 1.2230   LearningRate 0.0004   Epoch: 18   Global Step: 780060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:30,321-Speed 2645.72 samples/sec   Loss 1.2289   LearningRate 0.0004   Epoch: 18   Global Step: 780070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:34,211-Speed 2633.19 samples/sec   Loss 1.2609   LearningRate 0.0004   Epoch: 18   Global Step: 780080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:38,091-Speed 2640.71 samples/sec   Loss 1.2910   LearningRate 0.0004   Epoch: 18   Global Step: 780090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:41,975-Speed 2636.94 samples/sec   Loss 1.2738   LearningRate 0.0004   Epoch: 18   Global Step: 780100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:45,856-Speed 2639.37 samples/sec   Loss 1.2544   LearningRate 0.0004   Epoch: 18   Global Step: 780110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:25:49,719-Speed 2651.35 samples/sec   Loss 1.2709   LearningRate 0.0004   Epoch: 18   Global Step: 780120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:53,610-Speed 2632.54 samples/sec   Loss 1.2322   LearningRate 0.0004   Epoch: 18   Global Step: 780130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:25:57,501-Speed 2632.66 samples/sec   Loss 1.2673   LearningRate 0.0004   Epoch: 18   Global Step: 780140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:01,398-Speed 2628.49 samples/sec   Loss 1.2898   LearningRate 0.0004   Epoch: 18   Global Step: 780150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:05,283-Speed 2636.71 samples/sec   Loss 1.2490   LearningRate 0.0004   Epoch: 18   Global Step: 780160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:09,171-Speed 2634.18 samples/sec   Loss 1.2594   LearningRate 0.0004   Epoch: 18   Global Step: 780170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:13,072-Speed 2625.33 samples/sec   Loss 1.2893   LearningRate 0.0004   Epoch: 18   Global Step: 780180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:16,967-Speed 2629.62 samples/sec   Loss 1.3014   LearningRate 0.0004   Epoch: 18   Global Step: 780190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:20,856-Speed 2636.22 samples/sec   Loss 1.2739   LearningRate 0.0004   Epoch: 18   Global Step: 780200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:24,742-Speed 2635.49 samples/sec   Loss 1.2647   LearningRate 0.0004   Epoch: 18   Global Step: 780210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:28,612-Speed 2647.15 samples/sec   Loss 1.2640   LearningRate 0.0004   Epoch: 18   Global Step: 780220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:32,522-Speed 2619.38 samples/sec   Loss 1.2503   LearningRate 0.0004   Epoch: 18   Global Step: 780230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:36,409-Speed 2635.17 samples/sec   Loss 1.2842   LearningRate 0.0004   Epoch: 18   Global Step: 780240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:40,305-Speed 2628.80 samples/sec   Loss 1.3140   LearningRate 0.0004   Epoch: 18   Global Step: 780250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:44,231-Speed 2609.72 samples/sec   Loss 1.2655   LearningRate 0.0004   Epoch: 18   Global Step: 780260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:48,123-Speed 2631.43 samples/sec   Loss 1.3005   LearningRate 0.0004   Epoch: 18   Global Step: 780270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:52,013-Speed 2633.73 samples/sec   Loss 1.3032   LearningRate 0.0004   Epoch: 18   Global Step: 780280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:55,910-Speed 2628.15 samples/sec   Loss 1.2745   LearningRate 0.0004   Epoch: 18   Global Step: 780290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:26:59,803-Speed 2630.64 samples/sec   Loss 1.2768   LearningRate 0.0004   Epoch: 18   Global Step: 780300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:03,695-Speed 2631.55 samples/sec   Loss 1.2967   LearningRate 0.0004   Epoch: 18   Global Step: 780310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:07,593-Speed 2627.88 samples/sec   Loss 1.2297   LearningRate 0.0004   Epoch: 18   Global Step: 780320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:27:11,484-Speed 2632.06 samples/sec   Loss 1.2634   LearningRate 0.0004   Epoch: 18   Global Step: 780330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:27:15,376-Speed 2632.04 samples/sec   Loss 1.2673   LearningRate 0.0004   Epoch: 18   Global Step: 780340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:27:19,249-Speed 2644.90 samples/sec   Loss 1.2473   LearningRate 0.0004   Epoch: 18   Global Step: 780350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:23,145-Speed 2628.72 samples/sec   Loss 1.3212   LearningRate 0.0004   Epoch: 18   Global Step: 780360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:27,055-Speed 2620.31 samples/sec   Loss 1.2192   LearningRate 0.0004   Epoch: 18   Global Step: 780370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:30,969-Speed 2616.55 samples/sec   Loss 1.2581   LearningRate 0.0004   Epoch: 18   Global Step: 780380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:34,858-Speed 2633.58 samples/sec   Loss 1.2403   LearningRate 0.0004   Epoch: 18   Global Step: 780390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:38,747-Speed 2633.77 samples/sec   Loss 1.2638   LearningRate 0.0004   Epoch: 18   Global Step: 780400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:27:42,616-Speed 2648.10 samples/sec   Loss 1.2666   LearningRate 0.0004   Epoch: 18   Global Step: 780410   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:27:46,507-Speed 2631.97 samples/sec   Loss 1.2587   LearningRate 0.0004   Epoch: 18   Global Step: 780420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:27:50,433-Speed 2609.44 samples/sec   Loss 1.2686   LearningRate 0.0004   Epoch: 18   Global Step: 780430   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:27:54,326-Speed 2630.59 samples/sec   Loss 1.2491   LearningRate 0.0004   Epoch: 18   Global Step: 780440   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:27:58,226-Speed 2627.10 samples/sec   Loss 1.2572   LearningRate 0.0004   Epoch: 18   Global Step: 780450   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:28:02,117-Speed 2632.05 samples/sec   Loss 1.2448   LearningRate 0.0004   Epoch: 18   Global Step: 780460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:28:06,005-Speed 2633.88 samples/sec   Loss 1.2844   LearningRate 0.0004   Epoch: 18   Global Step: 780470   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:28:09,895-Speed 2633.20 samples/sec   Loss 1.2903   LearningRate 0.0004   Epoch: 18   Global Step: 780480   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:28:13,786-Speed 2632.62 samples/sec   Loss 1.2541   LearningRate 0.0004   Epoch: 18   Global Step: 780490   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:28:17,679-Speed 2631.54 samples/sec   Loss 1.2760   LearningRate 0.0004   Epoch: 18   Global Step: 780500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-16 11:28:21,574-Speed 2629.29 samples/sec   Loss 1.2606   LearningRate 0.0003   Epoch: 18   Global Step: 780510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:25,490-Speed 2615.86 samples/sec   Loss 1.2589   LearningRate 0.0003   Epoch: 18   Global Step: 780520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:29,381-Speed 2632.90 samples/sec   Loss 1.2018   LearningRate 0.0003   Epoch: 18   Global Step: 780530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:33,271-Speed 2633.02 samples/sec   Loss 1.2729   LearningRate 0.0003   Epoch: 18   Global Step: 780540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:37,175-Speed 2622.98 samples/sec   Loss 1.2470   LearningRate 0.0003   Epoch: 18   Global Step: 780550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:41,068-Speed 2631.81 samples/sec   Loss 1.2461   LearningRate 0.0003   Epoch: 18   Global Step: 780560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:44,955-Speed 2634.98 samples/sec   Loss 1.2565   LearningRate 0.0003   Epoch: 18   Global Step: 780570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:48,845-Speed 2633.36 samples/sec   Loss 1.2642   LearningRate 0.0003   Epoch: 18   Global Step: 780580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:52,761-Speed 2614.97 samples/sec   Loss 1.2920   LearningRate 0.0003   Epoch: 18   Global Step: 780590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:28:56,666-Speed 2623.83 samples/sec   Loss 1.2483   LearningRate 0.0003   Epoch: 18   Global Step: 780600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:29:00,573-Speed 2621.40 samples/sec   Loss 1.2640   LearningRate 0.0003   Epoch: 18   Global Step: 780610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:29:04,466-Speed 2630.74 samples/sec   Loss 1.2453   LearningRate 0.0003   Epoch: 18   Global Step: 780620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:29:08,373-Speed 2621.45 samples/sec   Loss 1.2620   LearningRate 0.0003   Epoch: 18   Global Step: 780630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-16 11:29:12,249-Speed 2642.74 samples/sec   Loss 1.2896   LearningRate 0.0003   Epoch: 18   Global Step: 780640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:29:16,139-Speed 2633.01 samples/sec   Loss 1.2733   LearningRate 0.0003   Epoch: 18   Global Step: 780650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-16 11:29:20,033-Speed 2630.64 samples/sec   Loss 1.3144   LearningRate 0.0003   Epoch: 18   Global Step: 780660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:23,928-Speed 2629.79 samples/sec   Loss 1.2632   LearningRate 0.0003   Epoch: 18   Global Step: 780670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:27,817-Speed 2633.43 samples/sec   Loss 1.2363   LearningRate 0.0003   Epoch: 18   Global Step: 780680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:31,723-Speed 2622.10 samples/sec   Loss 1.2873   LearningRate 0.0003   Epoch: 18   Global Step: 780690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:35,613-Speed 2632.83 samples/sec   Loss 1.2572   LearningRate 0.0003   Epoch: 18   Global Step: 780700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:39,516-Speed 2624.67 samples/sec   Loss 1.2889   LearningRate 0.0003   Epoch: 18   Global Step: 780710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:43,411-Speed 2630.21 samples/sec   Loss 1.2442   LearningRate 0.0003   Epoch: 18   Global Step: 780720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:47,303-Speed 2631.41 samples/sec   Loss 1.2654   LearningRate 0.0003   Epoch: 18   Global Step: 780730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:29:51,194-Speed 2632.51 samples/sec   Loss 1.3004   LearningRate 0.0003   Epoch: 18   Global Step: 780740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:29:55,085-Speed 2632.11 samples/sec   Loss 1.2511   LearningRate 0.0003   Epoch: 18   Global Step: 780750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:29:58,979-Speed 2630.55 samples/sec   Loss 1.2559   LearningRate 0.0003   Epoch: 18   Global Step: 780760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:30:02,875-Speed 2628.70 samples/sec   Loss 1.3030   LearningRate 0.0003   Epoch: 18   Global Step: 780770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:30:06,792-Speed 2615.32 samples/sec   Loss 1.2557   LearningRate 0.0003   Epoch: 18   Global Step: 780780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:30:10,691-Speed 2626.55 samples/sec   Loss 1.2936   LearningRate 0.0003   Epoch: 18   Global Step: 780790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:30:14,609-Speed 2614.18 samples/sec   Loss 1.2847   LearningRate 0.0003   Epoch: 18   Global Step: 780800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:30:18,484-Speed 2643.30 samples/sec   Loss 1.2403   LearningRate 0.0003   Epoch: 18   Global Step: 780810   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:22,426-Speed 2599.10 samples/sec   Loss 1.3106   LearningRate 0.0003   Epoch: 18   Global Step: 780820   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:26,313-Speed 2634.68 samples/sec   Loss 1.2587   LearningRate 0.0003   Epoch: 18   Global Step: 780830   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:30,258-Speed 2596.65 samples/sec   Loss 1.2950   LearningRate 0.0003   Epoch: 18   Global Step: 780840   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:34,154-Speed 2629.15 samples/sec   Loss 1.2563   LearningRate 0.0003   Epoch: 18   Global Step: 780850   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:38,042-Speed 2634.48 samples/sec   Loss 1.2364   LearningRate 0.0003   Epoch: 18   Global Step: 780860   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:41,947-Speed 2623.05 samples/sec   Loss 1.2440   LearningRate 0.0003   Epoch: 18   Global Step: 780870   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:45,855-Speed 2620.59 samples/sec   Loss 1.3223   LearningRate 0.0003   Epoch: 18   Global Step: 780880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:49,763-Speed 2620.87 samples/sec   Loss 1.2431   LearningRate 0.0003   Epoch: 18   Global Step: 780890   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:53,660-Speed 2628.76 samples/sec   Loss 1.2824   LearningRate 0.0003   Epoch: 18   Global Step: 780900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:30:57,578-Speed 2614.13 samples/sec   Loss 1.2698   LearningRate 0.0003   Epoch: 18   Global Step: 780910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:31:01,474-Speed 2629.66 samples/sec   Loss 1.2865   LearningRate 0.0003   Epoch: 18   Global Step: 780920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:31:05,372-Speed 2628.01 samples/sec   Loss 1.2856   LearningRate 0.0003   Epoch: 18   Global Step: 780930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:31:09,243-Speed 2645.63 samples/sec   Loss 1.3118   LearningRate 0.0003   Epoch: 18   Global Step: 780940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:13,138-Speed 2629.81 samples/sec   Loss 1.2528   LearningRate 0.0003   Epoch: 18   Global Step: 780950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:17,194-Speed 2525.02 samples/sec   Loss 1.2316   LearningRate 0.0003   Epoch: 18   Global Step: 780960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:21,333-Speed 2474.89 samples/sec   Loss 1.2937   LearningRate 0.0003   Epoch: 18   Global Step: 780970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:25,427-Speed 2502.24 samples/sec   Loss 1.2936   LearningRate 0.0003   Epoch: 18   Global Step: 780980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:29,318-Speed 2631.93 samples/sec   Loss 1.2689   LearningRate 0.0003   Epoch: 18   Global Step: 780990   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:33,306-Speed 2568.29 samples/sec   Loss 1.2471   LearningRate 0.0003   Epoch: 18   Global Step: 781000   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:37,407-Speed 2497.83 samples/sec   Loss 1.2779   LearningRate 0.0003   Epoch: 18   Global Step: 781010   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:41,415-Speed 2555.84 samples/sec   Loss 1.2554   LearningRate 0.0003   Epoch: 18   Global Step: 781020   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:45,313-Speed 2627.72 samples/sec   Loss 1.2701   LearningRate 0.0003   Epoch: 18   Global Step: 781030   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:31:49,281-Speed 2581.63 samples/sec   Loss 1.2529   LearningRate 0.0003   Epoch: 18   Global Step: 781040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:31:53,355-Speed 2513.43 samples/sec   Loss 1.2594   LearningRate 0.0003   Epoch: 18   Global Step: 781050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:31:57,395-Speed 2535.50 samples/sec   Loss 1.2418   LearningRate 0.0003   Epoch: 18   Global Step: 781060   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:01,473-Speed 2512.16 samples/sec   Loss 1.2583   LearningRate 0.0003   Epoch: 18   Global Step: 781070   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:05,385-Speed 2617.93 samples/sec   Loss 1.2958   LearningRate 0.0003   Epoch: 18   Global Step: 781080   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:09,278-Speed 2631.25 samples/sec   Loss 1.2379   LearningRate 0.0003   Epoch: 18   Global Step: 781090   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:13,179-Speed 2625.48 samples/sec   Loss 1.2710   LearningRate 0.0003   Epoch: 18   Global Step: 781100   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:17,074-Speed 2629.34 samples/sec   Loss 1.2776   LearningRate 0.0003   Epoch: 18   Global Step: 781110   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:20,973-Speed 2626.90 samples/sec   Loss 1.2433   LearningRate 0.0003   Epoch: 18   Global Step: 781120   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:24,870-Speed 2628.75 samples/sec   Loss 1.2580   LearningRate 0.0003   Epoch: 18   Global Step: 781130   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:28,770-Speed 2626.25 samples/sec   Loss 1.2276   LearningRate 0.0003   Epoch: 18   Global Step: 781140   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:32,663-Speed 2630.65 samples/sec   Loss 1.2084   LearningRate 0.0003   Epoch: 18   Global Step: 781150   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:32:36,556-Speed 2630.91 samples/sec   Loss 1.3142   LearningRate 0.0003   Epoch: 18   Global Step: 781160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:32:40,459-Speed 2624.18 samples/sec   Loss 1.3023   LearningRate 0.0003   Epoch: 18   Global Step: 781170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:32:44,350-Speed 2632.73 samples/sec   Loss 1.2867   LearningRate 0.0003   Epoch: 18   Global Step: 781180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:32:48,256-Speed 2621.97 samples/sec   Loss 1.2134   LearningRate 0.0003   Epoch: 18   Global Step: 781190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:32:52,152-Speed 2629.34 samples/sec   Loss 1.2630   LearningRate 0.0003   Epoch: 18   Global Step: 781200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:32:56,049-Speed 2628.54 samples/sec   Loss 1.2179   LearningRate 0.0003   Epoch: 18   Global Step: 781210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:32:59,941-Speed 2631.33 samples/sec   Loss 1.2464   LearningRate 0.0003   Epoch: 18   Global Step: 781220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:03,831-Speed 2632.54 samples/sec   Loss 1.2179   LearningRate 0.0003   Epoch: 18   Global Step: 781230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:07,753-Speed 2612.53 samples/sec   Loss 1.2587   LearningRate 0.0003   Epoch: 18   Global Step: 781240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:11,644-Speed 2632.45 samples/sec   Loss 1.2589   LearningRate 0.0003   Epoch: 18   Global Step: 781250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:15,536-Speed 2631.42 samples/sec   Loss 1.2450   LearningRate 0.0003   Epoch: 18   Global Step: 781260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:33:19,410-Speed 2644.57 samples/sec   Loss 1.3037   LearningRate 0.0003   Epoch: 18   Global Step: 781270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:23,303-Speed 2630.53 samples/sec   Loss 1.2489   LearningRate 0.0003   Epoch: 18   Global Step: 781280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:27,200-Speed 2628.44 samples/sec   Loss 1.2506   LearningRate 0.0003   Epoch: 18   Global Step: 781290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:31,111-Speed 2618.84 samples/sec   Loss 1.2835   LearningRate 0.0003   Epoch: 18   Global Step: 781300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:35,004-Speed 2631.17 samples/sec   Loss 1.2185   LearningRate 0.0003   Epoch: 18   Global Step: 781310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:38,900-Speed 2629.18 samples/sec   Loss 1.2560   LearningRate 0.0003   Epoch: 18   Global Step: 781320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:42,802-Speed 2624.94 samples/sec   Loss 1.2313   LearningRate 0.0003   Epoch: 18   Global Step: 781330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:46,698-Speed 2628.92 samples/sec   Loss 1.2826   LearningRate 0.0003   Epoch: 18   Global Step: 781340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:50,594-Speed 2629.04 samples/sec   Loss 1.2017   LearningRate 0.0003   Epoch: 18   Global Step: 781350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:54,488-Speed 2630.52 samples/sec   Loss 1.2230   LearningRate 0.0003   Epoch: 18   Global Step: 781360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:33:58,386-Speed 2627.98 samples/sec   Loss 1.2548   LearningRate 0.0003   Epoch: 18   Global Step: 781370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:34:02,281-Speed 2629.70 samples/sec   Loss 1.2268   LearningRate 0.0003   Epoch: 18   Global Step: 781380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:34:06,153-Speed 2644.77 samples/sec   Loss 1.2410   LearningRate 0.0003   Epoch: 18   Global Step: 781390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:10,048-Speed 2629.73 samples/sec   Loss 1.2571   LearningRate 0.0003   Epoch: 18   Global Step: 781400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:13,943-Speed 2629.60 samples/sec   Loss 1.2838   LearningRate 0.0003   Epoch: 18   Global Step: 781410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:17,835-Speed 2632.22 samples/sec   Loss 1.2473   LearningRate 0.0003   Epoch: 18   Global Step: 781420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:21,733-Speed 2627.48 samples/sec   Loss 1.2484   LearningRate 0.0003   Epoch: 18   Global Step: 781430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:25,659-Speed 2609.40 samples/sec   Loss 1.2652   LearningRate 0.0003   Epoch: 18   Global Step: 781440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:29,564-Speed 2622.78 samples/sec   Loss 1.2717   LearningRate 0.0003   Epoch: 18   Global Step: 781450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:33,475-Speed 2619.01 samples/sec   Loss 1.3161   LearningRate 0.0003   Epoch: 18   Global Step: 781460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:37,373-Speed 2627.35 samples/sec   Loss 1.2675   LearningRate 0.0003   Epoch: 18   Global Step: 781470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:41,270-Speed 2628.92 samples/sec   Loss 1.3118   LearningRate 0.0003   Epoch: 18   Global Step: 781480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:45,170-Speed 2626.04 samples/sec   Loss 1.2522   LearningRate 0.0003   Epoch: 18   Global Step: 781490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:34:49,068-Speed 2627.99 samples/sec   Loss 1.2296   LearningRate 0.0003   Epoch: 18   Global Step: 781500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:34:52,939-Speed 2645.71 samples/sec   Loss 1.2772   LearningRate 0.0003   Epoch: 18   Global Step: 781510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:34:56,840-Speed 2626.32 samples/sec   Loss 1.2674   LearningRate 0.0003   Epoch: 18   Global Step: 781520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:00,737-Speed 2628.42 samples/sec   Loss 1.2914   LearningRate 0.0003   Epoch: 18   Global Step: 781530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:04,634-Speed 2628.12 samples/sec   Loss 1.2729   LearningRate 0.0003   Epoch: 18   Global Step: 781540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:08,591-Speed 2588.40 samples/sec   Loss 1.2691   LearningRate 0.0003   Epoch: 18   Global Step: 781550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:12,494-Speed 2624.38 samples/sec   Loss 1.2650   LearningRate 0.0003   Epoch: 18   Global Step: 781560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:16,391-Speed 2628.12 samples/sec   Loss 1.2647   LearningRate 0.0003   Epoch: 18   Global Step: 781570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:20,322-Speed 2606.69 samples/sec   Loss 1.2703   LearningRate 0.0003   Epoch: 18   Global Step: 781580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:24,224-Speed 2625.38 samples/sec   Loss 1.2499   LearningRate 0.0003   Epoch: 18   Global Step: 781590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:28,119-Speed 2629.43 samples/sec   Loss 1.2513   LearningRate 0.0003   Epoch: 18   Global Step: 781600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:32,017-Speed 2627.62 samples/sec   Loss 1.2740   LearningRate 0.0003   Epoch: 18   Global Step: 781610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:35:35,912-Speed 2629.85 samples/sec   Loss 1.2517   LearningRate 0.0003   Epoch: 18   Global Step: 781620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:39,814-Speed 2624.18 samples/sec   Loss 1.2686   LearningRate 0.0003   Epoch: 18   Global Step: 781630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:43,722-Speed 2621.36 samples/sec   Loss 1.2934   LearningRate 0.0003   Epoch: 18   Global Step: 781640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:47,627-Speed 2623.40 samples/sec   Loss 1.2767   LearningRate 0.0003   Epoch: 18   Global Step: 781650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:51,525-Speed 2627.34 samples/sec   Loss 1.2344   LearningRate 0.0003   Epoch: 18   Global Step: 781660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:55,426-Speed 2625.65 samples/sec   Loss 1.2624   LearningRate 0.0003   Epoch: 18   Global Step: 781670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:35:59,329-Speed 2624.70 samples/sec   Loss 1.2643   LearningRate 0.0003   Epoch: 18   Global Step: 781680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:03,229-Speed 2626.12 samples/sec   Loss 1.2110   LearningRate 0.0003   Epoch: 18   Global Step: 781690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:07,128-Speed 2626.65 samples/sec   Loss 1.2819   LearningRate 0.0003   Epoch: 18   Global Step: 781700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:11,030-Speed 2624.44 samples/sec   Loss 1.2598   LearningRate 0.0003   Epoch: 18   Global Step: 781710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:14,946-Speed 2616.37 samples/sec   Loss 1.2583   LearningRate 0.0003   Epoch: 18   Global Step: 781720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:36:18,824-Speed 2642.15 samples/sec   Loss 1.2502   LearningRate 0.0003   Epoch: 18   Global Step: 781730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:22,726-Speed 2624.72 samples/sec   Loss 1.2767   LearningRate 0.0003   Epoch: 18   Global Step: 781740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:26,626-Speed 2625.92 samples/sec   Loss 1.2399   LearningRate 0.0003   Epoch: 18   Global Step: 781750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:30,525-Speed 2626.49 samples/sec   Loss 1.2285   LearningRate 0.0003   Epoch: 18   Global Step: 781760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:34,428-Speed 2625.47 samples/sec   Loss 1.2390   LearningRate 0.0003   Epoch: 18   Global Step: 781770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:38,325-Speed 2628.41 samples/sec   Loss 1.2239   LearningRate 0.0003   Epoch: 18   Global Step: 781780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:42,225-Speed 2625.54 samples/sec   Loss 1.2757   LearningRate 0.0003   Epoch: 18   Global Step: 781790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:46,135-Speed 2620.87 samples/sec   Loss 1.2474   LearningRate 0.0003   Epoch: 18   Global Step: 781800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:50,036-Speed 2625.24 samples/sec   Loss 1.2460   LearningRate 0.0003   Epoch: 18   Global Step: 781810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:53,940-Speed 2623.82 samples/sec   Loss 1.2931   LearningRate 0.0003   Epoch: 18   Global Step: 781820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:36:57,834-Speed 2630.25 samples/sec   Loss 1.2721   LearningRate 0.0003   Epoch: 18   Global Step: 781830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:37:01,729-Speed 2629.78 samples/sec   Loss 1.3113   LearningRate 0.0003   Epoch: 18   Global Step: 781840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:37:05,625-Speed 2628.69 samples/sec   Loss 1.2670   LearningRate 0.0003   Epoch: 18   Global Step: 781850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:37:09,499-Speed 2644.19 samples/sec   Loss 1.2326   LearningRate 0.0003   Epoch: 18   Global Step: 781860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:13,409-Speed 2619.49 samples/sec   Loss 1.2931   LearningRate 0.0003   Epoch: 18   Global Step: 781870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:17,371-Speed 2585.21 samples/sec   Loss 1.2076   LearningRate 0.0003   Epoch: 18   Global Step: 781880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:21,267-Speed 2628.52 samples/sec   Loss 1.2697   LearningRate 0.0003   Epoch: 18   Global Step: 781890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:25,168-Speed 2626.25 samples/sec   Loss 1.2440   LearningRate 0.0003   Epoch: 18   Global Step: 781900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:29,064-Speed 2628.79 samples/sec   Loss 1.2968   LearningRate 0.0003   Epoch: 18   Global Step: 781910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:32,970-Speed 2622.42 samples/sec   Loss 1.2656   LearningRate 0.0003   Epoch: 18   Global Step: 781920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:36,892-Speed 2612.03 samples/sec   Loss 1.2688   LearningRate 0.0003   Epoch: 18   Global Step: 781930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:40,799-Speed 2621.41 samples/sec   Loss 1.2269   LearningRate 0.0003   Epoch: 18   Global Step: 781940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:44,716-Speed 2615.16 samples/sec   Loss 1.2649   LearningRate 0.0003   Epoch: 18   Global Step: 781950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:48,622-Speed 2623.80 samples/sec   Loss 1.2809   LearningRate 0.0003   Epoch: 18   Global Step: 781960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:37:52,493-Speed 2645.45 samples/sec   Loss 1.1974   LearningRate 0.0003   Epoch: 18   Global Step: 781970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:37:56,398-Speed 2622.76 samples/sec   Loss 1.2516   LearningRate 0.0003   Epoch: 18   Global Step: 781980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:00,297-Speed 2627.36 samples/sec   Loss 1.2245   LearningRate 0.0003   Epoch: 18   Global Step: 781990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:04,199-Speed 2625.74 samples/sec   Loss 1.2407   LearningRate 0.0003   Epoch: 18   Global Step: 782000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:08,103-Speed 2623.13 samples/sec   Loss 1.2630   LearningRate 0.0003   Epoch: 18   Global Step: 782010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:12,063-Speed 2586.84 samples/sec   Loss 1.2831   LearningRate 0.0003   Epoch: 18   Global Step: 782020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:15,969-Speed 2621.81 samples/sec   Loss 1.2417   LearningRate 0.0003   Epoch: 18   Global Step: 782030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:19,865-Speed 2629.62 samples/sec   Loss 1.2482   LearningRate 0.0003   Epoch: 18   Global Step: 782040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:23,758-Speed 2631.47 samples/sec   Loss 1.2426   LearningRate 0.0003   Epoch: 18   Global Step: 782050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:27,656-Speed 2626.99 samples/sec   Loss 1.2445   LearningRate 0.0003   Epoch: 18   Global Step: 782060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:31,536-Speed 2639.58 samples/sec   Loss 1.2357   LearningRate 0.0003   Epoch: 18   Global Step: 782070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:35,439-Speed 2623.90 samples/sec   Loss 1.2470   LearningRate 0.0003   Epoch: 18   Global Step: 782080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:39,347-Speed 2621.44 samples/sec   Loss 1.2213   LearningRate 0.0003   Epoch: 18   Global Step: 782090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:43,248-Speed 2625.14 samples/sec   Loss 1.3021   LearningRate 0.0003   Epoch: 18   Global Step: 782100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:38:47,127-Speed 2640.87 samples/sec   Loss 1.2346   LearningRate 0.0003   Epoch: 18   Global Step: 782110   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:38:51,029-Speed 2625.06 samples/sec   Loss 1.2735   LearningRate 0.0003   Epoch: 18   Global Step: 782120   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:38:54,992-Speed 2585.21 samples/sec   Loss 1.2299   LearningRate 0.0003   Epoch: 18   Global Step: 782130   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:38:58,891-Speed 2626.80 samples/sec   Loss 1.2331   LearningRate 0.0003   Epoch: 18   Global Step: 782140   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:39:02,793-Speed 2625.11 samples/sec   Loss 1.2706   LearningRate 0.0003   Epoch: 18   Global Step: 782150   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:39:06,705-Speed 2617.54 samples/sec   Loss 1.1995   LearningRate 0.0003   Epoch: 18   Global Step: 782160   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:39:10,618-Speed 2617.33 samples/sec   Loss 1.2596   LearningRate 0.0003   Epoch: 18   Global Step: 782170   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:39:14,550-Speed 2605.62 samples/sec   Loss 1.2375   LearningRate 0.0003   Epoch: 18   Global Step: 782180   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:39:18,454-Speed 2623.72 samples/sec   Loss 1.2255   LearningRate 0.0003   Epoch: 18   Global Step: 782190   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:39:22,355-Speed 2625.38 samples/sec   Loss 1.3012   LearningRate 0.0003   Epoch: 18   Global Step: 782200   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:39:26,262-Speed 2622.32 samples/sec   Loss 1.2043   LearningRate 0.0003   Epoch: 18   Global Step: 782210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:30,172-Speed 2619.42 samples/sec   Loss 1.2505   LearningRate 0.0003   Epoch: 18   Global Step: 782220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:34,076-Speed 2623.44 samples/sec   Loss 1.2693   LearningRate 0.0003   Epoch: 18   Global Step: 782230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:37,975-Speed 2626.51 samples/sec   Loss 1.2443   LearningRate 0.0003   Epoch: 18   Global Step: 782240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:41,886-Speed 2618.87 samples/sec   Loss 1.2849   LearningRate 0.0003   Epoch: 18   Global Step: 782250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:45,793-Speed 2622.17 samples/sec   Loss 1.2573   LearningRate 0.0003   Epoch: 18   Global Step: 782260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:49,695-Speed 2624.80 samples/sec   Loss 1.2167   LearningRate 0.0003   Epoch: 18   Global Step: 782270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:53,602-Speed 2621.78 samples/sec   Loss 1.3079   LearningRate 0.0003   Epoch: 18   Global Step: 782280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:39:57,510-Speed 2621.11 samples/sec   Loss 1.2760   LearningRate 0.0003   Epoch: 18   Global Step: 782290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:01,409-Speed 2626.92 samples/sec   Loss 1.2229   LearningRate 0.0003   Epoch: 18   Global Step: 782300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:05,309-Speed 2626.23 samples/sec   Loss 1.2185   LearningRate 0.0003   Epoch: 18   Global Step: 782310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:40:09,210-Speed 2625.08 samples/sec   Loss 1.2570   LearningRate 0.0003   Epoch: 18   Global Step: 782320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:40:13,088-Speed 2642.03 samples/sec   Loss 1.2307   LearningRate 0.0003   Epoch: 18   Global Step: 782330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:16,985-Speed 2628.02 samples/sec   Loss 1.2464   LearningRate 0.0003   Epoch: 18   Global Step: 782340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:20,884-Speed 2627.37 samples/sec   Loss 1.2650   LearningRate 0.0003   Epoch: 18   Global Step: 782350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:24,819-Speed 2602.57 samples/sec   Loss 1.1802   LearningRate 0.0003   Epoch: 18   Global Step: 782360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:28,725-Speed 2623.00 samples/sec   Loss 1.2289   LearningRate 0.0003   Epoch: 18   Global Step: 782370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:32,632-Speed 2621.32 samples/sec   Loss 1.2317   LearningRate 0.0003   Epoch: 18   Global Step: 782380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:36,524-Speed 2631.47 samples/sec   Loss 1.2404   LearningRate 0.0003   Epoch: 18   Global Step: 782390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:40,429-Speed 2622.40 samples/sec   Loss 1.1789   LearningRate 0.0003   Epoch: 18   Global Step: 782400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:44,355-Speed 2609.73 samples/sec   Loss 1.2889   LearningRate 0.0003   Epoch: 18   Global Step: 782410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:40:48,230-Speed 2643.53 samples/sec   Loss 1.2802   LearningRate 0.0003   Epoch: 18   Global Step: 782420   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:40:52,122-Speed 2632.70 samples/sec   Loss 1.1886   LearningRate 0.0003   Epoch: 18   Global Step: 782430   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:40:56,019-Speed 2628.30 samples/sec   Loss 1.2403   LearningRate 0.0003   Epoch: 18   Global Step: 782440   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:40:59,918-Speed 2627.08 samples/sec   Loss 1.2341   LearningRate 0.0003   Epoch: 18   Global Step: 782450   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:03,833-Speed 2616.18 samples/sec   Loss 1.2419   LearningRate 0.0003   Epoch: 18   Global Step: 782460   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:07,747-Speed 2616.61 samples/sec   Loss 1.2459   LearningRate 0.0003   Epoch: 18   Global Step: 782470   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:11,644-Speed 2628.02 samples/sec   Loss 1.1751   LearningRate 0.0003   Epoch: 18   Global Step: 782480   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:15,551-Speed 2621.95 samples/sec   Loss 1.2410   LearningRate 0.0003   Epoch: 18   Global Step: 782490   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:19,454-Speed 2624.26 samples/sec   Loss 1.2207   LearningRate 0.0003   Epoch: 18   Global Step: 782500   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:23,375-Speed 2612.33 samples/sec   Loss 1.2555   LearningRate 0.0003   Epoch: 18   Global Step: 782510   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:27,274-Speed 2627.40 samples/sec   Loss 1.1997   LearningRate 0.0003   Epoch: 18   Global Step: 782520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:41:31,189-Speed 2615.91 samples/sec   Loss 1.2548   LearningRate 0.0003   Epoch: 18   Global Step: 782530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:41:35,083-Speed 2630.30 samples/sec   Loss 1.2409   LearningRate 0.0003   Epoch: 18   Global Step: 782540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:41:38,958-Speed 2642.66 samples/sec   Loss 1.2444   LearningRate 0.0003   Epoch: 18   Global Step: 782550   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:42,859-Speed 2626.17 samples/sec   Loss 1.2926   LearningRate 0.0003   Epoch: 18   Global Step: 782560   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:46,757-Speed 2627.24 samples/sec   Loss 1.2030   LearningRate 0.0003   Epoch: 18   Global Step: 782570   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:50,669-Speed 2619.02 samples/sec   Loss 1.3024   LearningRate 0.0003   Epoch: 18   Global Step: 782580   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:54,569-Speed 2625.64 samples/sec   Loss 1.2250   LearningRate 0.0003   Epoch: 18   Global Step: 782590   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:41:58,469-Speed 2626.64 samples/sec   Loss 1.3229   LearningRate 0.0003   Epoch: 18   Global Step: 782600   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:42:02,377-Speed 2620.71 samples/sec   Loss 1.2787   LearningRate 0.0003   Epoch: 18   Global Step: 782610   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:42:06,276-Speed 2626.84 samples/sec   Loss 1.2620   LearningRate 0.0003   Epoch: 18   Global Step: 782620   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:42:10,177-Speed 2625.76 samples/sec   Loss 1.2518   LearningRate 0.0003   Epoch: 18   Global Step: 782630   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:42:14,078-Speed 2625.81 samples/sec   Loss 1.2687   LearningRate 0.0003   Epoch: 18   Global Step: 782640   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:42:17,986-Speed 2620.88 samples/sec   Loss 1.2393   LearningRate 0.0003   Epoch: 18   Global Step: 782650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:21,893-Speed 2621.31 samples/sec   Loss 1.2745   LearningRate 0.0003   Epoch: 18   Global Step: 782660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:25,792-Speed 2627.08 samples/sec   Loss 1.2652   LearningRate 0.0003   Epoch: 18   Global Step: 782670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:29,693-Speed 2625.25 samples/sec   Loss 1.2605   LearningRate 0.0003   Epoch: 18   Global Step: 782680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:33,596-Speed 2624.70 samples/sec   Loss 1.2151   LearningRate 0.0003   Epoch: 18   Global Step: 782690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:37,499-Speed 2623.81 samples/sec   Loss 1.2843   LearningRate 0.0003   Epoch: 18   Global Step: 782700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:41,409-Speed 2619.12 samples/sec   Loss 1.2255   LearningRate 0.0003   Epoch: 18   Global Step: 782710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:45,309-Speed 2627.04 samples/sec   Loss 1.2915   LearningRate 0.0003   Epoch: 18   Global Step: 782720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:49,212-Speed 2623.94 samples/sec   Loss 1.2448   LearningRate 0.0003   Epoch: 18   Global Step: 782730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:53,126-Speed 2617.38 samples/sec   Loss 1.2159   LearningRate 0.0003   Epoch: 18   Global Step: 782740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:42:57,002-Speed 2642.67 samples/sec   Loss 1.2665   LearningRate 0.0003   Epoch: 18   Global Step: 782750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:43:00,907-Speed 2622.41 samples/sec   Loss 1.2222   LearningRate 0.0003   Epoch: 18   Global Step: 782760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:43:04,812-Speed 2622.60 samples/sec   Loss 1.2637   LearningRate 0.0003   Epoch: 18   Global Step: 782770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:43:08,716-Speed 2624.01 samples/sec   Loss 1.2801   LearningRate 0.0003   Epoch: 18   Global Step: 782780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:43:12,606-Speed 2633.25 samples/sec   Loss 1.2655   LearningRate 0.0003   Epoch: 18   Global Step: 782790   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:16,523-Speed 2614.76 samples/sec   Loss 1.2562   LearningRate 0.0003   Epoch: 18   Global Step: 782800   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:20,428-Speed 2622.66 samples/sec   Loss 1.2168   LearningRate 0.0003   Epoch: 18   Global Step: 782810   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:24,337-Speed 2620.88 samples/sec   Loss 1.2752   LearningRate 0.0003   Epoch: 18   Global Step: 782820   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:28,242-Speed 2622.77 samples/sec   Loss 1.1877   LearningRate 0.0003   Epoch: 18   Global Step: 782830   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:32,144-Speed 2624.88 samples/sec   Loss 1.2558   LearningRate 0.0003   Epoch: 18   Global Step: 782840   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:36,049-Speed 2622.36 samples/sec   Loss 1.1913   LearningRate 0.0003   Epoch: 18   Global Step: 782850   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:39,951-Speed 2625.74 samples/sec   Loss 1.2475   LearningRate 0.0003   Epoch: 18   Global Step: 782860   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:43,856-Speed 2622.92 samples/sec   Loss 1.2495   LearningRate 0.0003   Epoch: 18   Global Step: 782870   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:47,759-Speed 2624.17 samples/sec   Loss 1.2478   LearningRate 0.0003   Epoch: 18   Global Step: 782880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:43:51,662-Speed 2624.68 samples/sec   Loss 1.2688   LearningRate 0.0003   Epoch: 18   Global Step: 782890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:43:55,579-Speed 2614.72 samples/sec   Loss 1.3004   LearningRate 0.0003   Epoch: 18   Global Step: 782900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:43:59,483-Speed 2623.79 samples/sec   Loss 1.2377   LearningRate 0.0003   Epoch: 18   Global Step: 782910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:44:03,385-Speed 2624.95 samples/sec   Loss 1.2495   LearningRate 0.0003   Epoch: 18   Global Step: 782920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:44:07,260-Speed 2643.54 samples/sec   Loss 1.2691   LearningRate 0.0003   Epoch: 18   Global Step: 782930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:11,162-Speed 2624.89 samples/sec   Loss 1.2833   LearningRate 0.0003   Epoch: 18   Global Step: 782940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:15,063-Speed 2626.17 samples/sec   Loss 1.2436   LearningRate 0.0003   Epoch: 18   Global Step: 782950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:18,983-Speed 2613.24 samples/sec   Loss 1.2544   LearningRate 0.0003   Epoch: 18   Global Step: 782960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:22,882-Speed 2626.80 samples/sec   Loss 1.2966   LearningRate 0.0003   Epoch: 18   Global Step: 782970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:26,791-Speed 2620.79 samples/sec   Loss 1.2548   LearningRate 0.0003   Epoch: 18   Global Step: 782980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:30,692-Speed 2625.37 samples/sec   Loss 1.2263   LearningRate 0.0003   Epoch: 18   Global Step: 782990   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:34,600-Speed 2620.48 samples/sec   Loss 1.2499   LearningRate 0.0003   Epoch: 18   Global Step: 783000   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:38,514-Speed 2617.70 samples/sec   Loss 1.2524   LearningRate 0.0003   Epoch: 18   Global Step: 783010   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:42,466-Speed 2591.58 samples/sec   Loss 1.2085   LearningRate 0.0003   Epoch: 18   Global Step: 783020   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:44:46,388-Speed 2612.52 samples/sec   Loss 1.2265   LearningRate 0.0003   Epoch: 18   Global Step: 783030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:44:50,294-Speed 2621.87 samples/sec   Loss 1.2653   LearningRate 0.0003   Epoch: 18   Global Step: 783040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:44:54,192-Speed 2628.15 samples/sec   Loss 1.2514   LearningRate 0.0003   Epoch: 18   Global Step: 783050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:44:58,070-Speed 2640.90 samples/sec   Loss 1.2593   LearningRate 0.0003   Epoch: 18   Global Step: 783060   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:01,981-Speed 2618.84 samples/sec   Loss 1.2475   LearningRate 0.0003   Epoch: 18   Global Step: 783070   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:05,875-Speed 2629.97 samples/sec   Loss 1.2179   LearningRate 0.0003   Epoch: 18   Global Step: 783080   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:09,776-Speed 2626.40 samples/sec   Loss 1.2326   LearningRate 0.0003   Epoch: 18   Global Step: 783090   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:13,674-Speed 2627.63 samples/sec   Loss 1.2104   LearningRate 0.0003   Epoch: 18   Global Step: 783100   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:17,574-Speed 2625.99 samples/sec   Loss 1.2319   LearningRate 0.0003   Epoch: 18   Global Step: 783110   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:21,474-Speed 2626.83 samples/sec   Loss 1.2238   LearningRate 0.0003   Epoch: 18   Global Step: 783120   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:25,372-Speed 2627.48 samples/sec   Loss 1.2456   LearningRate 0.0003   Epoch: 18   Global Step: 783130   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:29,286-Speed 2617.69 samples/sec   Loss 1.1847   LearningRate 0.0003   Epoch: 18   Global Step: 783140   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:33,194-Speed 2620.75 samples/sec   Loss 1.2107   LearningRate 0.0003   Epoch: 18   Global Step: 783150   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:37,089-Speed 2629.64 samples/sec   Loss 1.2016   LearningRate 0.0003   Epoch: 18   Global Step: 783160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:45:41,028-Speed 2599.96 samples/sec   Loss 1.2928   LearningRate 0.0003   Epoch: 18   Global Step: 783170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:45:44,945-Speed 2615.37 samples/sec   Loss 1.2480   LearningRate 0.0003   Epoch: 18   Global Step: 783180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:45:48,839-Speed 2630.10 samples/sec   Loss 1.2081   LearningRate 0.0003   Epoch: 18   Global Step: 783190   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:52,737-Speed 2628.32 samples/sec   Loss 1.2649   LearningRate 0.0003   Epoch: 18   Global Step: 783200   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:45:56,643-Speed 2621.60 samples/sec   Loss 1.2453   LearningRate 0.0003   Epoch: 18   Global Step: 783210   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:00,549-Speed 2622.88 samples/sec   Loss 1.2945   LearningRate 0.0003   Epoch: 18   Global Step: 783220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:04,444-Speed 2629.87 samples/sec   Loss 1.2340   LearningRate 0.0003   Epoch: 18   Global Step: 783230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:08,342-Speed 2627.49 samples/sec   Loss 1.2232   LearningRate 0.0003   Epoch: 18   Global Step: 783240   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:12,244-Speed 2625.13 samples/sec   Loss 1.2864   LearningRate 0.0003   Epoch: 18   Global Step: 783250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:16,143-Speed 2626.43 samples/sec   Loss 1.2380   LearningRate 0.0003   Epoch: 18   Global Step: 783260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:20,044-Speed 2625.97 samples/sec   Loss 1.2796   LearningRate 0.0003   Epoch: 18   Global Step: 783270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:23,941-Speed 2628.37 samples/sec   Loss 1.2808   LearningRate 0.0003   Epoch: 18   Global Step: 783280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:46:27,838-Speed 2628.49 samples/sec   Loss 1.2925   LearningRate 0.0003   Epoch: 18   Global Step: 783290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:31,739-Speed 2625.98 samples/sec   Loss 1.2938   LearningRate 0.0003   Epoch: 18   Global Step: 783300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:35,663-Speed 2609.87 samples/sec   Loss 1.2224   LearningRate 0.0003   Epoch: 18   Global Step: 783310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:39,558-Speed 2629.45 samples/sec   Loss 1.2379   LearningRate 0.0003   Epoch: 18   Global Step: 783320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:43,453-Speed 2630.22 samples/sec   Loss 1.2403   LearningRate 0.0003   Epoch: 18   Global Step: 783330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:47,350-Speed 2628.27 samples/sec   Loss 1.2631   LearningRate 0.0003   Epoch: 18   Global Step: 783340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:51,254-Speed 2623.79 samples/sec   Loss 1.2845   LearningRate 0.0003   Epoch: 18   Global Step: 783350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:55,152-Speed 2626.98 samples/sec   Loss 1.2573   LearningRate 0.0003   Epoch: 18   Global Step: 783360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:46:59,030-Speed 2641.44 samples/sec   Loss 1.2692   LearningRate 0.0003   Epoch: 18   Global Step: 783370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:02,927-Speed 2628.68 samples/sec   Loss 1.2908   LearningRate 0.0003   Epoch: 18   Global Step: 783380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:06,825-Speed 2627.67 samples/sec   Loss 1.2663   LearningRate 0.0003   Epoch: 18   Global Step: 783390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:10,736-Speed 2618.50 samples/sec   Loss 1.2514   LearningRate 0.0003   Epoch: 18   Global Step: 783400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:14,637-Speed 2625.92 samples/sec   Loss 1.2687   LearningRate 0.0003   Epoch: 18   Global Step: 783410   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:18,544-Speed 2621.25 samples/sec   Loss 1.2379   LearningRate 0.0003   Epoch: 18   Global Step: 783420   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:22,446-Speed 2626.15 samples/sec   Loss 1.2072   LearningRate 0.0003   Epoch: 18   Global Step: 783430   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:26,343-Speed 2627.66 samples/sec   Loss 1.2314   LearningRate 0.0003   Epoch: 18   Global Step: 783440   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:30,244-Speed 2626.33 samples/sec   Loss 1.2519   LearningRate 0.0003   Epoch: 18   Global Step: 783450   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:34,145-Speed 2625.37 samples/sec   Loss 1.2428   LearningRate 0.0003   Epoch: 18   Global Step: 783460   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:38,045-Speed 2626.61 samples/sec   Loss 1.2429   LearningRate 0.0003   Epoch: 18   Global Step: 783470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:47:41,951-Speed 2621.90 samples/sec   Loss 1.2855   LearningRate 0.0003   Epoch: 18   Global Step: 783480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:47:45,858-Speed 2621.84 samples/sec   Loss 1.2573   LearningRate 0.0003   Epoch: 18   Global Step: 783490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:47:49,782-Speed 2610.38 samples/sec   Loss 1.2638   LearningRate 0.0003   Epoch: 18   Global Step: 783500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:47:53,661-Speed 2639.85 samples/sec   Loss 1.2303   LearningRate 0.0003   Epoch: 18   Global Step: 783510   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:47:57,560-Speed 2627.71 samples/sec   Loss 1.2421   LearningRate 0.0003   Epoch: 18   Global Step: 783520   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:01,460-Speed 2626.65 samples/sec   Loss 1.2734   LearningRate 0.0003   Epoch: 18   Global Step: 783530   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:05,386-Speed 2608.75 samples/sec   Loss 1.3045   LearningRate 0.0003   Epoch: 18   Global Step: 783540   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:09,286-Speed 2626.10 samples/sec   Loss 1.2171   LearningRate 0.0003   Epoch: 18   Global Step: 783550   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:13,188-Speed 2625.15 samples/sec   Loss 1.2923   LearningRate 0.0003   Epoch: 18   Global Step: 783560   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:17,098-Speed 2619.84 samples/sec   Loss 1.2409   LearningRate 0.0003   Epoch: 18   Global Step: 783570   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:20,999-Speed 2625.98 samples/sec   Loss 1.2180   LearningRate 0.0003   Epoch: 18   Global Step: 783580   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:24,903-Speed 2623.54 samples/sec   Loss 1.2269   LearningRate 0.0003   Epoch: 18   Global Step: 783590   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:28,800-Speed 2628.18 samples/sec   Loss 1.2702   LearningRate 0.0003   Epoch: 18   Global Step: 783600   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:48:32,707-Speed 2621.59 samples/sec   Loss 1.2885   LearningRate 0.0003   Epoch: 18   Global Step: 783610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:48:36,626-Speed 2613.55 samples/sec   Loss 1.2531   LearningRate 0.0003   Epoch: 18   Global Step: 783620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:48:40,524-Speed 2628.14 samples/sec   Loss 1.2545   LearningRate 0.0003   Epoch: 18   Global Step: 783630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:48:44,430-Speed 2622.56 samples/sec   Loss 1.2762   LearningRate 0.0003   Epoch: 18   Global Step: 783640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:48:48,364-Speed 2603.33 samples/sec   Loss 1.1907   LearningRate 0.0003   Epoch: 18   Global Step: 783650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:48:52,271-Speed 2622.38 samples/sec   Loss 1.2548   LearningRate 0.0003   Epoch: 18   Global Step: 783660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:48:56,172-Speed 2625.53 samples/sec   Loss 1.2316   LearningRate 0.0003   Epoch: 18   Global Step: 783670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:00,073-Speed 2625.22 samples/sec   Loss 1.2362   LearningRate 0.0003   Epoch: 18   Global Step: 783680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:03,973-Speed 2626.28 samples/sec   Loss 1.2791   LearningRate 0.0003   Epoch: 18   Global Step: 783690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:07,872-Speed 2627.70 samples/sec   Loss 1.2367   LearningRate 0.0003   Epoch: 18   Global Step: 783700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:11,786-Speed 2616.72 samples/sec   Loss 1.2462   LearningRate 0.0003   Epoch: 18   Global Step: 783710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:49:15,685-Speed 2628.09 samples/sec   Loss 1.2458   LearningRate 0.0003   Epoch: 18   Global Step: 783720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:49:19,594-Speed 2620.31 samples/sec   Loss 1.2129   LearningRate 0.0003   Epoch: 18   Global Step: 783730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:49:23,489-Speed 2629.94 samples/sec   Loss 1.2351   LearningRate 0.0003   Epoch: 18   Global Step: 783740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:49:27,392-Speed 2624.26 samples/sec   Loss 1.2449   LearningRate 0.0003   Epoch: 18   Global Step: 783750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:49:31,290-Speed 2627.80 samples/sec   Loss 1.2572   LearningRate 0.0003   Epoch: 18   Global Step: 783760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:49:35,190-Speed 2625.63 samples/sec   Loss 1.2242   LearningRate 0.0003   Epoch: 18   Global Step: 783770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:49:39,070-Speed 2640.95 samples/sec   Loss 1.2160   LearningRate 0.0003   Epoch: 18   Global Step: 783780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:42,966-Speed 2628.31 samples/sec   Loss 1.2486   LearningRate 0.0003   Epoch: 18   Global Step: 783790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:46,864-Speed 2628.09 samples/sec   Loss 1.2225   LearningRate 0.0003   Epoch: 18   Global Step: 783800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:50,762-Speed 2628.02 samples/sec   Loss 1.2346   LearningRate 0.0003   Epoch: 18   Global Step: 783810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:54,659-Speed 2628.17 samples/sec   Loss 1.2843   LearningRate 0.0003   Epoch: 18   Global Step: 783820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:49:58,742-Speed 2508.81 samples/sec   Loss 1.2080   LearningRate 0.0003   Epoch: 18   Global Step: 783830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:02,641-Speed 2626.27 samples/sec   Loss 1.2483   LearningRate 0.0003   Epoch: 18   Global Step: 783840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:06,543-Speed 2624.64 samples/sec   Loss 1.2753   LearningRate 0.0003   Epoch: 18   Global Step: 783850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:10,447-Speed 2624.23 samples/sec   Loss 1.2163   LearningRate 0.0003   Epoch: 18   Global Step: 783860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:14,359-Speed 2618.38 samples/sec   Loss 1.2729   LearningRate 0.0003   Epoch: 18   Global Step: 783870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:18,268-Speed 2620.98 samples/sec   Loss 1.2765   LearningRate 0.0003   Epoch: 18   Global Step: 783880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:50:22,187-Speed 2613.34 samples/sec   Loss 1.2301   LearningRate 0.0003   Epoch: 18   Global Step: 783890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:50:26,092-Speed 2623.11 samples/sec   Loss 1.2556   LearningRate 0.0003   Epoch: 18   Global Step: 783900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:50:29,998-Speed 2621.57 samples/sec   Loss 1.2670   LearningRate 0.0003   Epoch: 18   Global Step: 783910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:50:33,878-Speed 2640.14 samples/sec   Loss 1.2262   LearningRate 0.0003   Epoch: 18   Global Step: 783920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:37,844-Speed 2582.15 samples/sec   Loss 1.2263   LearningRate 0.0003   Epoch: 18   Global Step: 783930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:41,744-Speed 2626.66 samples/sec   Loss 1.2991   LearningRate 0.0003   Epoch: 18   Global Step: 783940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:45,643-Speed 2627.18 samples/sec   Loss 1.2460   LearningRate 0.0003   Epoch: 18   Global Step: 783950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:49,545-Speed 2624.95 samples/sec   Loss 1.2704   LearningRate 0.0003   Epoch: 18   Global Step: 783960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:53,447-Speed 2626.12 samples/sec   Loss 1.2016   LearningRate 0.0003   Epoch: 18   Global Step: 783970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:50:57,350-Speed 2624.16 samples/sec   Loss 1.2614   LearningRate 0.0003   Epoch: 18   Global Step: 783980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:01,246-Speed 2628.52 samples/sec   Loss 1.2419   LearningRate 0.0003   Epoch: 18   Global Step: 783990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:05,152-Speed 2622.66 samples/sec   Loss 1.2341   LearningRate 0.0003   Epoch: 18   Global Step: 784000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:09,067-Speed 2616.04 samples/sec   Loss 1.2765   LearningRate 0.0003   Epoch: 18   Global Step: 784010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:12,968-Speed 2625.37 samples/sec   Loss 1.2438   LearningRate 0.0003   Epoch: 18   Global Step: 784020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:51:16,871-Speed 2626.67 samples/sec   Loss 1.2786   LearningRate 0.0003   Epoch: 18   Global Step: 784030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:51:20,772-Speed 2625.58 samples/sec   Loss 1.2332   LearningRate 0.0003   Epoch: 18   Global Step: 784040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:51:24,679-Speed 2622.05 samples/sec   Loss 1.2544   LearningRate 0.0003   Epoch: 18   Global Step: 784050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:51:28,588-Speed 2619.96 samples/sec   Loss 1.2654   LearningRate 0.0003   Epoch: 18   Global Step: 784060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:51:32,494-Speed 2622.53 samples/sec   Loss 1.2132   LearningRate 0.0003   Epoch: 18   Global Step: 784070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:51:36,398-Speed 2622.93 samples/sec   Loss 1.2266   LearningRate 0.0003   Epoch: 18   Global Step: 784080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:51:40,284-Speed 2636.12 samples/sec   Loss 1.2491   LearningRate 0.0003   Epoch: 18   Global Step: 784090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:44,197-Speed 2617.68 samples/sec   Loss 1.2339   LearningRate 0.0003   Epoch: 18   Global Step: 784100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:48,096-Speed 2627.01 samples/sec   Loss 1.2587   LearningRate 0.0003   Epoch: 18   Global Step: 784110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:52,128-Speed 2540.46 samples/sec   Loss 1.2504   LearningRate 0.0003   Epoch: 18   Global Step: 784120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:56,027-Speed 2627.05 samples/sec   Loss 1.2375   LearningRate 0.0003   Epoch: 18   Global Step: 784130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:51:59,939-Speed 2618.67 samples/sec   Loss 1.2300   LearningRate 0.0003   Epoch: 18   Global Step: 784140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:03,849-Speed 2619.36 samples/sec   Loss 1.2304   LearningRate 0.0003   Epoch: 18   Global Step: 784150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:07,753-Speed 2623.99 samples/sec   Loss 1.1807   LearningRate 0.0003   Epoch: 18   Global Step: 784160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:11,654-Speed 2625.30 samples/sec   Loss 1.1976   LearningRate 0.0003   Epoch: 18   Global Step: 784170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:15,555-Speed 2625.63 samples/sec   Loss 1.2307   LearningRate 0.0003   Epoch: 18   Global Step: 784180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:19,456-Speed 2626.10 samples/sec   Loss 1.2514   LearningRate 0.0003   Epoch: 18   Global Step: 784190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:52:23,355-Speed 2627.40 samples/sec   Loss 1.2407   LearningRate 0.0003   Epoch: 18   Global Step: 784200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:52:27,261-Speed 2622.13 samples/sec   Loss 1.2361   LearningRate 0.0003   Epoch: 18   Global Step: 784210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:52:31,144-Speed 2637.52 samples/sec   Loss 1.2195   LearningRate 0.0003   Epoch: 18   Global Step: 784220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:35,055-Speed 2619.12 samples/sec   Loss 1.2781   LearningRate 0.0003   Epoch: 18   Global Step: 784230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:38,961-Speed 2622.10 samples/sec   Loss 1.2286   LearningRate 0.0003   Epoch: 18   Global Step: 784240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:42,867-Speed 2622.25 samples/sec   Loss 1.2694   LearningRate 0.0003   Epoch: 18   Global Step: 784250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:46,776-Speed 2619.77 samples/sec   Loss 1.2695   LearningRate 0.0003   Epoch: 18   Global Step: 784260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:50,699-Speed 2611.39 samples/sec   Loss 1.2566   LearningRate 0.0003   Epoch: 18   Global Step: 784270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:54,602-Speed 2624.12 samples/sec   Loss 1.2938   LearningRate 0.0003   Epoch: 18   Global Step: 784280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:52:58,511-Speed 2620.53 samples/sec   Loss 1.2150   LearningRate 0.0003   Epoch: 18   Global Step: 784290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:02,493-Speed 2572.37 samples/sec   Loss 1.2147   LearningRate 0.0003   Epoch: 18   Global Step: 784300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:06,396-Speed 2623.63 samples/sec   Loss 1.2607   LearningRate 0.0003   Epoch: 18   Global Step: 784310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:10,296-Speed 2626.53 samples/sec   Loss 1.2136   LearningRate 0.0003   Epoch: 18   Global Step: 784320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:53:14,199-Speed 2624.92 samples/sec   Loss 1.1922   LearningRate 0.0003   Epoch: 18   Global Step: 784330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:53:18,101-Speed 2624.82 samples/sec   Loss 1.2680   LearningRate 0.0003   Epoch: 18   Global Step: 784340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:53:22,001-Speed 2626.59 samples/sec   Loss 1.2241   LearningRate 0.0003   Epoch: 18   Global Step: 784350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:25,923-Speed 2611.25 samples/sec   Loss 1.1970   LearningRate 0.0003   Epoch: 18   Global Step: 784360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:29,826-Speed 2623.94 samples/sec   Loss 1.2675   LearningRate 0.0003   Epoch: 18   Global Step: 784370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:33,738-Speed 2618.44 samples/sec   Loss 1.3069   LearningRate 0.0003   Epoch: 18   Global Step: 784380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:37,687-Speed 2593.16 samples/sec   Loss 1.2549   LearningRate 0.0003   Epoch: 18   Global Step: 784390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:41,617-Speed 2606.64 samples/sec   Loss 1.2380   LearningRate 0.0003   Epoch: 18   Global Step: 784400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:45,522-Speed 2623.09 samples/sec   Loss 1.2285   LearningRate 0.0003   Epoch: 18   Global Step: 784410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:49,430-Speed 2621.18 samples/sec   Loss 1.2221   LearningRate 0.0003   Epoch: 18   Global Step: 784420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:53,334-Speed 2623.87 samples/sec   Loss 1.2571   LearningRate 0.0003   Epoch: 18   Global Step: 784430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:53:57,210-Speed 2642.94 samples/sec   Loss 1.2264   LearningRate 0.0003   Epoch: 18   Global Step: 784440   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:01,111-Speed 2625.62 samples/sec   Loss 1.2388   LearningRate 0.0003   Epoch: 18   Global Step: 784450   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:05,024-Speed 2617.56 samples/sec   Loss 1.2656   LearningRate 0.0003   Epoch: 18   Global Step: 784460   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:08,934-Speed 2619.31 samples/sec   Loss 1.2383   LearningRate 0.0003   Epoch: 18   Global Step: 784470   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:12,839-Speed 2623.00 samples/sec   Loss 1.2189   LearningRate 0.0003   Epoch: 18   Global Step: 784480   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:16,738-Speed 2627.20 samples/sec   Loss 1.2348   LearningRate 0.0003   Epoch: 18   Global Step: 784490   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:20,639-Speed 2626.40 samples/sec   Loss 1.2177   LearningRate 0.0003   Epoch: 18   Global Step: 784500   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:24,535-Speed 2628.73 samples/sec   Loss 1.2744   LearningRate 0.0003   Epoch: 18   Global Step: 784510   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:28,441-Speed 2622.15 samples/sec   Loss 1.2411   LearningRate 0.0003   Epoch: 18   Global Step: 784520   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:32,337-Speed 2628.78 samples/sec   Loss 1.2486   LearningRate 0.0003   Epoch: 18   Global Step: 784530   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:54:36,240-Speed 2624.49 samples/sec   Loss 1.2244   LearningRate 0.0003   Epoch: 18   Global Step: 784540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:54:40,141-Speed 2626.08 samples/sec   Loss 1.2162   LearningRate 0.0003   Epoch: 18   Global Step: 784550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:54:44,039-Speed 2627.14 samples/sec   Loss 1.2091   LearningRate 0.0003   Epoch: 18   Global Step: 784560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:54:47,975-Speed 2603.46 samples/sec   Loss 1.2400   LearningRate 0.0003   Epoch: 18   Global Step: 784570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:54:51,878-Speed 2624.04 samples/sec   Loss 1.2629   LearningRate 0.0003   Epoch: 18   Global Step: 784580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:54:55,779-Speed 2625.85 samples/sec   Loss 1.2271   LearningRate 0.0003   Epoch: 18   Global Step: 784590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:54:59,687-Speed 2620.95 samples/sec   Loss 1.2620   LearningRate 0.0003   Epoch: 18   Global Step: 784600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:55:03,590-Speed 2624.37 samples/sec   Loss 1.2960   LearningRate 0.0003   Epoch: 18   Global Step: 784610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:55:07,488-Speed 2627.27 samples/sec   Loss 1.1943   LearningRate 0.0003   Epoch: 18   Global Step: 784620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:55:11,407-Speed 2614.26 samples/sec   Loss 1.1978   LearningRate 0.0003   Epoch: 18   Global Step: 784630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:55:15,284-Speed 2641.96 samples/sec   Loss 1.2793   LearningRate 0.0003   Epoch: 18   Global Step: 784640   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:19,209-Speed 2609.43 samples/sec   Loss 1.2564   LearningRate 0.0003   Epoch: 18   Global Step: 784650   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:23,104-Speed 2629.85 samples/sec   Loss 1.2352   LearningRate 0.0003   Epoch: 18   Global Step: 784660   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:26,999-Speed 2629.88 samples/sec   Loss 1.2486   LearningRate 0.0003   Epoch: 18   Global Step: 784670   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:30,894-Speed 2629.67 samples/sec   Loss 1.2684   LearningRate 0.0003   Epoch: 18   Global Step: 784680   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:34,842-Speed 2594.47 samples/sec   Loss 1.2429   LearningRate 0.0003   Epoch: 18   Global Step: 784690   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:38,739-Speed 2628.19 samples/sec   Loss 1.2212   LearningRate 0.0003   Epoch: 18   Global Step: 784700   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:42,634-Speed 2629.88 samples/sec   Loss 1.2546   LearningRate 0.0003   Epoch: 18   Global Step: 784710   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:46,532-Speed 2627.97 samples/sec   Loss 1.2660   LearningRate 0.0003   Epoch: 18   Global Step: 784720   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:50,436-Speed 2623.58 samples/sec   Loss 1.2589   LearningRate 0.0003   Epoch: 18   Global Step: 784730   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:55:54,336-Speed 2626.45 samples/sec   Loss 1.2457   LearningRate 0.0003   Epoch: 18   Global Step: 784740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:55:58,238-Speed 2624.61 samples/sec   Loss 1.1759   LearningRate 0.0003   Epoch: 18   Global Step: 784750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:02,139-Speed 2625.45 samples/sec   Loss 1.2318   LearningRate 0.0003   Epoch: 18   Global Step: 784760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:06,042-Speed 2624.53 samples/sec   Loss 1.1823   LearningRate 0.0003   Epoch: 18   Global Step: 784770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:09,959-Speed 2614.69 samples/sec   Loss 1.2357   LearningRate 0.0003   Epoch: 18   Global Step: 784780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:13,859-Speed 2627.41 samples/sec   Loss 1.2106   LearningRate 0.0003   Epoch: 18   Global Step: 784790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:17,757-Speed 2626.96 samples/sec   Loss 1.2339   LearningRate 0.0003   Epoch: 18   Global Step: 784800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:21,675-Speed 2614.67 samples/sec   Loss 1.2689   LearningRate 0.0003   Epoch: 18   Global Step: 784810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:25,581-Speed 2622.37 samples/sec   Loss 1.1858   LearningRate 0.0003   Epoch: 18   Global Step: 784820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:29,486-Speed 2623.43 samples/sec   Loss 1.1925   LearningRate 0.0003   Epoch: 18   Global Step: 784830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:33,361-Speed 2643.34 samples/sec   Loss 1.2517   LearningRate 0.0003   Epoch: 18   Global Step: 784840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:37,269-Speed 2620.40 samples/sec   Loss 1.2469   LearningRate 0.0003   Epoch: 18   Global Step: 784850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:41,168-Speed 2626.99 samples/sec   Loss 1.2658   LearningRate 0.0003   Epoch: 18   Global Step: 784860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:45,063-Speed 2629.97 samples/sec   Loss 1.2307   LearningRate 0.0003   Epoch: 18   Global Step: 784870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:48,961-Speed 2627.51 samples/sec   Loss 1.2204   LearningRate 0.0003   Epoch: 18   Global Step: 784880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:52,865-Speed 2623.78 samples/sec   Loss 1.2770   LearningRate 0.0003   Epoch: 18   Global Step: 784890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:56:56,763-Speed 2627.63 samples/sec   Loss 1.2684   LearningRate 0.0003   Epoch: 18   Global Step: 784900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:00,660-Speed 2629.27 samples/sec   Loss 1.2239   LearningRate 0.0003   Epoch: 18   Global Step: 784910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:04,553-Speed 2630.88 samples/sec   Loss 1.2277   LearningRate 0.0003   Epoch: 18   Global Step: 784920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:08,450-Speed 2627.99 samples/sec   Loss 1.2514   LearningRate 0.0003   Epoch: 18   Global Step: 784930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:12,350-Speed 2626.23 samples/sec   Loss 1.2595   LearningRate 0.0003   Epoch: 18   Global Step: 784940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:57:16,245-Speed 2629.72 samples/sec   Loss 1.2411   LearningRate 0.0003   Epoch: 18   Global Step: 784950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:57:20,171-Speed 2609.12 samples/sec   Loss 1.2317   LearningRate 0.0003   Epoch: 18   Global Step: 784960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:57:24,067-Speed 2628.84 samples/sec   Loss 1.2299   LearningRate 0.0003   Epoch: 18   Global Step: 784970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:57:27,981-Speed 2617.30 samples/sec   Loss 1.2846   LearningRate 0.0003   Epoch: 18   Global Step: 784980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:57:31,853-Speed 2644.95 samples/sec   Loss 1.2658   LearningRate 0.0003   Epoch: 18   Global Step: 784990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:35,750-Speed 2628.62 samples/sec   Loss 1.2117   LearningRate 0.0003   Epoch: 18   Global Step: 785000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:39,648-Speed 2627.76 samples/sec   Loss 1.3053   LearningRate 0.0003   Epoch: 18   Global Step: 785010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:43,548-Speed 2626.31 samples/sec   Loss 1.2089   LearningRate 0.0003   Epoch: 18   Global Step: 785020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:47,444-Speed 2628.62 samples/sec   Loss 1.2518   LearningRate 0.0003   Epoch: 18   Global Step: 785030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:51,371-Speed 2609.00 samples/sec   Loss 1.2813   LearningRate 0.0003   Epoch: 18   Global Step: 785040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:55,266-Speed 2629.66 samples/sec   Loss 1.2251   LearningRate 0.0003   Epoch: 18   Global Step: 785050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:57:59,169-Speed 2624.41 samples/sec   Loss 1.2353   LearningRate 0.0003   Epoch: 18   Global Step: 785060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:03,070-Speed 2625.55 samples/sec   Loss 1.2419   LearningRate 0.0003   Epoch: 18   Global Step: 785070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:06,974-Speed 2623.65 samples/sec   Loss 1.2232   LearningRate 0.0003   Epoch: 18   Global Step: 785080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:10,912-Speed 2600.94 samples/sec   Loss 1.2341   LearningRate 0.0003   Epoch: 18   Global Step: 785090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:58:14,817-Speed 2623.71 samples/sec   Loss 1.2408   LearningRate 0.0003   Epoch: 18   Global Step: 785100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:58:18,724-Speed 2621.27 samples/sec   Loss 1.2615   LearningRate 0.0003   Epoch: 18   Global Step: 785110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:58:22,621-Speed 2628.35 samples/sec   Loss 1.2881   LearningRate 0.0003   Epoch: 18   Global Step: 785120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:58:26,516-Speed 2629.70 samples/sec   Loss 1.2453   LearningRate 0.0003   Epoch: 18   Global Step: 785130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:58:30,397-Speed 2638.73 samples/sec   Loss 1.2497   LearningRate 0.0003   Epoch: 18   Global Step: 785140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:34,296-Speed 2627.47 samples/sec   Loss 1.2698   LearningRate 0.0003   Epoch: 18   Global Step: 785150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:38,195-Speed 2626.86 samples/sec   Loss 1.1628   LearningRate 0.0003   Epoch: 18   Global Step: 785160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:42,092-Speed 2628.09 samples/sec   Loss 1.2263   LearningRate 0.0003   Epoch: 18   Global Step: 785170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:46,001-Speed 2620.02 samples/sec   Loss 1.2297   LearningRate 0.0003   Epoch: 18   Global Step: 785180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:49,910-Speed 2620.93 samples/sec   Loss 1.2751   LearningRate 0.0003   Epoch: 18   Global Step: 785190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:53,805-Speed 2629.61 samples/sec   Loss 1.2720   LearningRate 0.0003   Epoch: 18   Global Step: 785200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:58:57,701-Speed 2629.05 samples/sec   Loss 1.2238   LearningRate 0.0003   Epoch: 18   Global Step: 785210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:59:01,594-Speed 2630.74 samples/sec   Loss 1.2743   LearningRate 0.0003   Epoch: 18   Global Step: 785220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:59:05,493-Speed 2627.46 samples/sec   Loss 1.2247   LearningRate 0.0003   Epoch: 18   Global Step: 785230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:59:09,391-Speed 2627.56 samples/sec   Loss 1.2279   LearningRate 0.0003   Epoch: 18   Global Step: 785240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:59:13,290-Speed 2626.80 samples/sec   Loss 1.2351   LearningRate 0.0003   Epoch: 18   Global Step: 785250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 11:59:17,164-Speed 2643.64 samples/sec   Loss 1.2239   LearningRate 0.0003   Epoch: 18   Global Step: 785260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 11:59:21,037-Speed 2645.13 samples/sec   Loss 1.2642   LearningRate 0.0003   Epoch: 18   Global Step: 785270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:24,934-Speed 2628.03 samples/sec   Loss 1.2401   LearningRate 0.0003   Epoch: 18   Global Step: 785280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:28,848-Speed 2617.06 samples/sec   Loss 1.2683   LearningRate 0.0003   Epoch: 18   Global Step: 785290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:32,743-Speed 2629.44 samples/sec   Loss 1.1947   LearningRate 0.0003   Epoch: 18   Global Step: 785300   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:36,643-Speed 2626.95 samples/sec   Loss 1.1730   LearningRate 0.0003   Epoch: 18   Global Step: 785310   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:40,541-Speed 2627.44 samples/sec   Loss 1.2391   LearningRate 0.0003   Epoch: 18   Global Step: 785320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:44,452-Speed 2619.05 samples/sec   Loss 1.2492   LearningRate 0.0003   Epoch: 18   Global Step: 785330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:48,356-Speed 2623.18 samples/sec   Loss 1.2662   LearningRate 0.0003   Epoch: 18   Global Step: 785340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:52,359-Speed 2559.48 samples/sec   Loss 1.2125   LearningRate 0.0003   Epoch: 18   Global Step: 785350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 11:59:56,501-Speed 2473.00 samples/sec   Loss 1.2676   LearningRate 0.0003   Epoch: 18   Global Step: 785360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:00:00,491-Speed 2567.03 samples/sec   Loss 1.2385   LearningRate 0.0003   Epoch: 18   Global Step: 785370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:04,386-Speed 2630.22 samples/sec   Loss 1.2523   LearningRate 0.0003   Epoch: 18   Global Step: 785380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:08,284-Speed 2627.61 samples/sec   Loss 1.2585   LearningRate 0.0003   Epoch: 18   Global Step: 785390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:12,180-Speed 2629.28 samples/sec   Loss 1.2299   LearningRate 0.0003   Epoch: 18   Global Step: 785400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:16,075-Speed 2629.45 samples/sec   Loss 1.2445   LearningRate 0.0003   Epoch: 18   Global Step: 785410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:19,971-Speed 2629.11 samples/sec   Loss 1.2614   LearningRate 0.0003   Epoch: 18   Global Step: 785420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:23,870-Speed 2626.81 samples/sec   Loss 1.1987   LearningRate 0.0003   Epoch: 18   Global Step: 785430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:27,761-Speed 2632.47 samples/sec   Loss 1.1579   LearningRate 0.0003   Epoch: 18   Global Step: 785440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:31,664-Speed 2624.70 samples/sec   Loss 1.2370   LearningRate 0.0003   Epoch: 18   Global Step: 785450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:35,572-Speed 2621.15 samples/sec   Loss 1.2390   LearningRate 0.0003   Epoch: 18   Global Step: 785460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:39,472-Speed 2626.25 samples/sec   Loss 1.2535   LearningRate 0.0003   Epoch: 18   Global Step: 785470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:00:43,348-Speed 2642.87 samples/sec   Loss 1.2191   LearningRate 0.0003   Epoch: 18   Global Step: 785480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:47,252-Speed 2623.98 samples/sec   Loss 1.2300   LearningRate 0.0003   Epoch: 18   Global Step: 785490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:51,148-Speed 2628.80 samples/sec   Loss 1.1862   LearningRate 0.0003   Epoch: 18   Global Step: 785500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:55,117-Speed 2580.65 samples/sec   Loss 1.2285   LearningRate 0.0003   Epoch: 18   Global Step: 785510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:00:59,016-Speed 2627.29 samples/sec   Loss 1.2355   LearningRate 0.0003   Epoch: 18   Global Step: 785520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:02,927-Speed 2618.43 samples/sec   Loss 1.2603   LearningRate 0.0003   Epoch: 18   Global Step: 785530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:06,826-Speed 2627.00 samples/sec   Loss 1.2143   LearningRate 0.0003   Epoch: 18   Global Step: 785540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:10,724-Speed 2628.24 samples/sec   Loss 1.2188   LearningRate 0.0003   Epoch: 18   Global Step: 785550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:14,616-Speed 2631.44 samples/sec   Loss 1.2493   LearningRate 0.0003   Epoch: 18   Global Step: 785560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:18,522-Speed 2622.34 samples/sec   Loss 1.2507   LearningRate 0.0003   Epoch: 18   Global Step: 785570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:22,418-Speed 2629.11 samples/sec   Loss 1.2292   LearningRate 0.0003   Epoch: 18   Global Step: 785580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:01:26,293-Speed 2643.23 samples/sec   Loss 1.2381   LearningRate 0.0003   Epoch: 18   Global Step: 785590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:30,200-Speed 2621.49 samples/sec   Loss 1.2512   LearningRate 0.0003   Epoch: 18   Global Step: 785600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:34,112-Speed 2618.58 samples/sec   Loss 1.2419   LearningRate 0.0003   Epoch: 18   Global Step: 785610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:38,011-Speed 2626.44 samples/sec   Loss 1.2478   LearningRate 0.0003   Epoch: 18   Global Step: 785620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:41,910-Speed 2627.07 samples/sec   Loss 1.2383   LearningRate 0.0003   Epoch: 18   Global Step: 785630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:45,810-Speed 2627.18 samples/sec   Loss 1.2473   LearningRate 0.0003   Epoch: 18   Global Step: 785640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:49,715-Speed 2622.41 samples/sec   Loss 1.1872   LearningRate 0.0003   Epoch: 18   Global Step: 785650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:53,736-Speed 2547.49 samples/sec   Loss 1.2429   LearningRate 0.0003   Epoch: 18   Global Step: 785660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:01:57,682-Speed 2596.11 samples/sec   Loss 1.2271   LearningRate 0.0003   Epoch: 18   Global Step: 785670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:01,596-Speed 2616.60 samples/sec   Loss 1.1914   LearningRate 0.0003   Epoch: 18   Global Step: 785680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:05,512-Speed 2616.11 samples/sec   Loss 1.2422   LearningRate 0.0003   Epoch: 18   Global Step: 785690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:02:09,407-Speed 2629.78 samples/sec   Loss 1.2812   LearningRate 0.0003   Epoch: 18   Global Step: 785700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:02:13,285-Speed 2640.66 samples/sec   Loss 1.2511   LearningRate 0.0003   Epoch: 18   Global Step: 785710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:17,179-Speed 2630.86 samples/sec   Loss 1.1812   LearningRate 0.0003   Epoch: 18   Global Step: 785720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:21,070-Speed 2632.54 samples/sec   Loss 1.1515   LearningRate 0.0003   Epoch: 18   Global Step: 785730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:24,973-Speed 2624.28 samples/sec   Loss 1.2321   LearningRate 0.0003   Epoch: 18   Global Step: 785740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:28,865-Speed 2632.02 samples/sec   Loss 1.2204   LearningRate 0.0003   Epoch: 18   Global Step: 785750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:32,764-Speed 2627.27 samples/sec   Loss 1.2144   LearningRate 0.0003   Epoch: 18   Global Step: 785760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:36,772-Speed 2554.93 samples/sec   Loss 1.2436   LearningRate 0.0003   Epoch: 18   Global Step: 785770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:40,681-Speed 2620.30 samples/sec   Loss 1.2282   LearningRate 0.0003   Epoch: 18   Global Step: 785780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:44,593-Speed 2619.26 samples/sec   Loss 1.2918   LearningRate 0.0003   Epoch: 18   Global Step: 785790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:48,488-Speed 2628.99 samples/sec   Loss 1.2047   LearningRate 0.0003   Epoch: 18   Global Step: 785800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:02:52,544-Speed 2525.92 samples/sec   Loss 1.2080   LearningRate 0.0003   Epoch: 18   Global Step: 785810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:02:56,469-Speed 2609.72 samples/sec   Loss 1.2077   LearningRate 0.0003   Epoch: 18   Global Step: 785820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:00,435-Speed 2583.02 samples/sec   Loss 1.2543   LearningRate 0.0003   Epoch: 18   Global Step: 785830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:04,341-Speed 2621.99 samples/sec   Loss 1.1926   LearningRate 0.0003   Epoch: 18   Global Step: 785840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:08,238-Speed 2628.36 samples/sec   Loss 1.2335   LearningRate 0.0003   Epoch: 18   Global Step: 785850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:12,139-Speed 2625.45 samples/sec   Loss 1.2719   LearningRate 0.0003   Epoch: 18   Global Step: 785860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:16,045-Speed 2629.72 samples/sec   Loss 1.2426   LearningRate 0.0003   Epoch: 18   Global Step: 785870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:19,940-Speed 2629.72 samples/sec   Loss 1.2149   LearningRate 0.0003   Epoch: 18   Global Step: 785880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:23,838-Speed 2628.36 samples/sec   Loss 1.2371   LearningRate 0.0003   Epoch: 18   Global Step: 785890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:27,729-Speed 2631.94 samples/sec   Loss 1.2257   LearningRate 0.0003   Epoch: 18   Global Step: 785900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:31,625-Speed 2629.49 samples/sec   Loss 1.2292   LearningRate 0.0003   Epoch: 18   Global Step: 785910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:35,499-Speed 2643.62 samples/sec   Loss 1.2104   LearningRate 0.0003   Epoch: 18   Global Step: 785920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:39,415-Speed 2615.40 samples/sec   Loss 1.2515   LearningRate 0.0003   Epoch: 18   Global Step: 785930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:43,329-Speed 2616.70 samples/sec   Loss 1.2308   LearningRate 0.0003   Epoch: 18   Global Step: 785940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:47,292-Speed 2585.26 samples/sec   Loss 1.2583   LearningRate 0.0003   Epoch: 18   Global Step: 785950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:51,190-Speed 2627.81 samples/sec   Loss 1.2744   LearningRate 0.0003   Epoch: 18   Global Step: 785960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:55,101-Speed 2618.46 samples/sec   Loss 1.1901   LearningRate 0.0003   Epoch: 18   Global Step: 785970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:03:58,997-Speed 2629.66 samples/sec   Loss 1.2519   LearningRate 0.0003   Epoch: 18   Global Step: 785980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:02,892-Speed 2629.76 samples/sec   Loss 1.2258   LearningRate 0.0003   Epoch: 18   Global Step: 785990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:06,787-Speed 2629.16 samples/sec   Loss 1.2522   LearningRate 0.0003   Epoch: 18   Global Step: 786000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:10,690-Speed 2624.62 samples/sec   Loss 1.1925   LearningRate 0.0003   Epoch: 18   Global Step: 786010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:14,594-Speed 2623.41 samples/sec   Loss 1.2587   LearningRate 0.0003   Epoch: 18   Global Step: 786020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:04:18,497-Speed 2624.52 samples/sec   Loss 1.2757   LearningRate 0.0003   Epoch: 18   Global Step: 786030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:04:22,394-Speed 2628.65 samples/sec   Loss 1.2530   LearningRate 0.0003   Epoch: 18   Global Step: 786040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:04:26,286-Speed 2632.09 samples/sec   Loss 1.2263   LearningRate 0.0003   Epoch: 18   Global Step: 786050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:04:30,181-Speed 2629.10 samples/sec   Loss 1.2262   LearningRate 0.0003   Epoch: 18   Global Step: 786060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:04:34,082-Speed 2625.83 samples/sec   Loss 1.1653   LearningRate 0.0003   Epoch: 18   Global Step: 786070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:04:37,958-Speed 2642.45 samples/sec   Loss 1.2688   LearningRate 0.0003   Epoch: 18   Global Step: 786080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:41,876-Speed 2614.37 samples/sec   Loss 1.2064   LearningRate 0.0003   Epoch: 18   Global Step: 786090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:45,847-Speed 2579.46 samples/sec   Loss 1.2313   LearningRate 0.0003   Epoch: 18   Global Step: 786100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:49,741-Speed 2630.98 samples/sec   Loss 1.2650   LearningRate 0.0003   Epoch: 18   Global Step: 786110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:53,656-Speed 2615.75 samples/sec   Loss 1.1851   LearningRate 0.0003   Epoch: 18   Global Step: 786120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:04:57,550-Speed 2630.69 samples/sec   Loss 1.1937   LearningRate 0.0003   Epoch: 18   Global Step: 786130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:01,453-Speed 2624.12 samples/sec   Loss 1.2552   LearningRate 0.0003   Epoch: 18   Global Step: 786140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:05,350-Speed 2628.68 samples/sec   Loss 1.2242   LearningRate 0.0003   Epoch: 18   Global Step: 786150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:09,245-Speed 2629.01 samples/sec   Loss 1.2097   LearningRate 0.0003   Epoch: 18   Global Step: 786160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:13,146-Speed 2625.81 samples/sec   Loss 1.2355   LearningRate 0.0003   Epoch: 18   Global Step: 786170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:17,020-Speed 2643.86 samples/sec   Loss 1.2493   LearningRate 0.0003   Epoch: 18   Global Step: 786180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:20,947-Speed 2608.52 samples/sec   Loss 1.2277   LearningRate 0.0003   Epoch: 18   Global Step: 786190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:24,843-Speed 2629.34 samples/sec   Loss 1.1948   LearningRate 0.0003   Epoch: 18   Global Step: 786200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:28,739-Speed 2629.31 samples/sec   Loss 1.2024   LearningRate 0.0003   Epoch: 18   Global Step: 786210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:32,636-Speed 2628.11 samples/sec   Loss 1.2273   LearningRate 0.0003   Epoch: 18   Global Step: 786220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:36,533-Speed 2628.53 samples/sec   Loss 1.2372   LearningRate 0.0003   Epoch: 18   Global Step: 786230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:40,444-Speed 2618.92 samples/sec   Loss 1.2415   LearningRate 0.0003   Epoch: 18   Global Step: 786240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:44,339-Speed 2629.71 samples/sec   Loss 1.2745   LearningRate 0.0003   Epoch: 18   Global Step: 786250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:48,234-Speed 2629.17 samples/sec   Loss 1.2315   LearningRate 0.0003   Epoch: 18   Global Step: 786260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:52,166-Speed 2605.48 samples/sec   Loss 1.2614   LearningRate 0.0003   Epoch: 18   Global Step: 786270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:05:56,069-Speed 2624.59 samples/sec   Loss 1.1932   LearningRate 0.0003   Epoch: 18   Global Step: 786280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:05:59,972-Speed 2624.18 samples/sec   Loss 1.2359   LearningRate 0.0003   Epoch: 18   Global Step: 786290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:06:03,866-Speed 2630.86 samples/sec   Loss 1.2320   LearningRate 0.0003   Epoch: 18   Global Step: 786300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:06:07,834-Speed 2580.93 samples/sec   Loss 1.2490   LearningRate 0.0003   Epoch: 18   Global Step: 786310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:06:11,729-Speed 2629.74 samples/sec   Loss 1.2029   LearningRate 0.0003   Epoch: 18   Global Step: 786320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:06:15,626-Speed 2628.46 samples/sec   Loss 1.2074   LearningRate 0.0003   Epoch: 18   Global Step: 786330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:06:19,552-Speed 2609.58 samples/sec   Loss 1.2458   LearningRate 0.0003   Epoch: 18   Global Step: 786340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:23,453-Speed 2625.46 samples/sec   Loss 1.1920   LearningRate 0.0003   Epoch: 18   Global Step: 786350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:27,345-Speed 2631.66 samples/sec   Loss 1.2020   LearningRate 0.0003   Epoch: 18   Global Step: 786360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:31,251-Speed 2622.89 samples/sec   Loss 1.2584   LearningRate 0.0003   Epoch: 18   Global Step: 786370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:35,144-Speed 2631.02 samples/sec   Loss 1.2733   LearningRate 0.0003   Epoch: 18   Global Step: 786380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:39,040-Speed 2628.87 samples/sec   Loss 1.2544   LearningRate 0.0003   Epoch: 18   Global Step: 786390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:42,935-Speed 2629.92 samples/sec   Loss 1.2226   LearningRate 0.0003   Epoch: 18   Global Step: 786400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:46,828-Speed 2631.31 samples/sec   Loss 1.2427   LearningRate 0.0003   Epoch: 18   Global Step: 786410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:50,719-Speed 2632.58 samples/sec   Loss 1.2726   LearningRate 0.0003   Epoch: 18   Global Step: 786420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:54,690-Speed 2579.87 samples/sec   Loss 1.1943   LearningRate 0.0003   Epoch: 18   Global Step: 786430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:06:58,565-Speed 2642.83 samples/sec   Loss 1.1874   LearningRate 0.0003   Epoch: 18   Global Step: 786440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:02,542-Speed 2575.74 samples/sec   Loss 1.2558   LearningRate 0.0003   Epoch: 18   Global Step: 786450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:06,635-Speed 2501.97 samples/sec   Loss 1.2604   LearningRate 0.0003   Epoch: 18   Global Step: 786460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:10,632-Speed 2563.10 samples/sec   Loss 1.2387   LearningRate 0.0003   Epoch: 18   Global Step: 786470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:14,527-Speed 2629.32 samples/sec   Loss 1.2431   LearningRate 0.0003   Epoch: 18   Global Step: 786480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:18,442-Speed 2616.57 samples/sec   Loss 1.2091   LearningRate 0.0003   Epoch: 18   Global Step: 786490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:22,373-Speed 2605.44 samples/sec   Loss 1.2464   LearningRate 0.0003   Epoch: 18   Global Step: 786500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:26,265-Speed 2631.93 samples/sec   Loss 1.1907   LearningRate 0.0003   Epoch: 18   Global Step: 786510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:30,161-Speed 2629.51 samples/sec   Loss 1.1884   LearningRate 0.0003   Epoch: 18   Global Step: 786520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:34,054-Speed 2630.62 samples/sec   Loss 1.1881   LearningRate 0.0003   Epoch: 18   Global Step: 786530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:37,948-Speed 2630.56 samples/sec   Loss 1.1692   LearningRate 0.0003   Epoch: 18   Global Step: 786540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:07:41,841-Speed 2630.44 samples/sec   Loss 1.2605   LearningRate 0.0003   Epoch: 18   Global Step: 786550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:07:45,710-Speed 2647.76 samples/sec   Loss 1.2277   LearningRate 0.0003   Epoch: 18   Global Step: 786560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:49,613-Speed 2624.06 samples/sec   Loss 1.1822   LearningRate 0.0003   Epoch: 18   Global Step: 786570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:53,505-Speed 2631.81 samples/sec   Loss 1.2269   LearningRate 0.0003   Epoch: 18   Global Step: 786580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:07:57,398-Speed 2631.64 samples/sec   Loss 1.2476   LearningRate 0.0003   Epoch: 18   Global Step: 786590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:01,295-Speed 2628.08 samples/sec   Loss 1.2519   LearningRate 0.0003   Epoch: 18   Global Step: 786600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:05,189-Speed 2630.26 samples/sec   Loss 1.1983   LearningRate 0.0003   Epoch: 18   Global Step: 786610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:09,088-Speed 2626.99 samples/sec   Loss 1.1923   LearningRate 0.0003   Epoch: 18   Global Step: 786620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:12,993-Speed 2622.41 samples/sec   Loss 1.2059   LearningRate 0.0003   Epoch: 18   Global Step: 786630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:16,890-Speed 2628.89 samples/sec   Loss 1.2402   LearningRate 0.0003   Epoch: 18   Global Step: 786640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:20,789-Speed 2626.68 samples/sec   Loss 1.2155   LearningRate 0.0003   Epoch: 18   Global Step: 786650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:24,687-Speed 2628.54 samples/sec   Loss 1.2461   LearningRate 0.0003   Epoch: 18   Global Step: 786660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:08:28,557-Speed 2646.58 samples/sec   Loss 1.2099   LearningRate 0.0003   Epoch: 18   Global Step: 786670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:32,452-Speed 2629.28 samples/sec   Loss 1.2689   LearningRate 0.0003   Epoch: 18   Global Step: 786680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:36,349-Speed 2628.69 samples/sec   Loss 1.1862   LearningRate 0.0003   Epoch: 18   Global Step: 786690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:40,243-Speed 2629.70 samples/sec   Loss 1.2199   LearningRate 0.0003   Epoch: 18   Global Step: 786700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:44,142-Speed 2626.65 samples/sec   Loss 1.2131   LearningRate 0.0003   Epoch: 18   Global Step: 786710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:48,058-Speed 2616.34 samples/sec   Loss 1.2358   LearningRate 0.0003   Epoch: 18   Global Step: 786720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:51,950-Speed 2631.73 samples/sec   Loss 1.2388   LearningRate 0.0003   Epoch: 18   Global Step: 786730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:55,850-Speed 2626.62 samples/sec   Loss 1.2586   LearningRate 0.0003   Epoch: 18   Global Step: 786740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:08:59,739-Speed 2633.94 samples/sec   Loss 1.1980   LearningRate 0.0003   Epoch: 18   Global Step: 786750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:09:03,644-Speed 2622.69 samples/sec   Loss 1.2178   LearningRate 0.0003   Epoch: 18   Global Step: 786760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:09:07,543-Speed 2627.17 samples/sec   Loss 1.2163   LearningRate 0.0003   Epoch: 18   Global Step: 786770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:09:11,418-Speed 2642.91 samples/sec   Loss 1.2622   LearningRate 0.0003   Epoch: 18   Global Step: 786780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:09:15,306-Speed 2634.85 samples/sec   Loss 1.2050   LearningRate 0.0003   Epoch: 18   Global Step: 786790   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:19,218-Speed 2618.37 samples/sec   Loss 1.2605   LearningRate 0.0003   Epoch: 18   Global Step: 786800   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:23,132-Speed 2617.04 samples/sec   Loss 1.2251   LearningRate 0.0003   Epoch: 18   Global Step: 786810   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:27,033-Speed 2625.81 samples/sec   Loss 1.2465   LearningRate 0.0003   Epoch: 18   Global Step: 786820   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:30,950-Speed 2615.25 samples/sec   Loss 1.2431   LearningRate 0.0003   Epoch: 18   Global Step: 786830   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:34,864-Speed 2616.23 samples/sec   Loss 1.2031   LearningRate 0.0003   Epoch: 18   Global Step: 786840   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:38,770-Speed 2622.02 samples/sec   Loss 1.2565   LearningRate 0.0003   Epoch: 18   Global Step: 786850   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:42,674-Speed 2623.64 samples/sec   Loss 1.1848   LearningRate 0.0003   Epoch: 18   Global Step: 786860   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:46,565-Speed 2632.57 samples/sec   Loss 1.2155   LearningRate 0.0003   Epoch: 18   Global Step: 786870   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:50,465-Speed 2625.93 samples/sec   Loss 1.3084   LearningRate 0.0003   Epoch: 18   Global Step: 786880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:09:54,355-Speed 2633.18 samples/sec   Loss 1.2020   LearningRate 0.0003   Epoch: 18   Global Step: 786890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:09:58,248-Speed 2631.74 samples/sec   Loss 1.2752   LearningRate 0.0003   Epoch: 18   Global Step: 786900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:10:02,140-Speed 2631.92 samples/sec   Loss 1.2309   LearningRate 0.0003   Epoch: 18   Global Step: 786910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:10:06,033-Speed 2630.82 samples/sec   Loss 1.2295   LearningRate 0.0003   Epoch: 18   Global Step: 786920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:10:09,929-Speed 2628.63 samples/sec   Loss 1.2591   LearningRate 0.0003   Epoch: 18   Global Step: 786930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:10:13,829-Speed 2625.91 samples/sec   Loss 1.1996   LearningRate 0.0003   Epoch: 18   Global Step: 786940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:10:17,728-Speed 2627.16 samples/sec   Loss 1.2472   LearningRate 0.0003   Epoch: 18   Global Step: 786950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:10:21,627-Speed 2627.87 samples/sec   Loss 1.2536   LearningRate 0.0003   Epoch: 18   Global Step: 786960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:10:25,508-Speed 2638.95 samples/sec   Loss 1.2397   LearningRate 0.0003   Epoch: 18   Global Step: 786970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:29,400-Speed 2631.51 samples/sec   Loss 1.2665   LearningRate 0.0003   Epoch: 18   Global Step: 786980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:33,298-Speed 2627.76 samples/sec   Loss 1.2436   LearningRate 0.0003   Epoch: 18   Global Step: 786990   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:37,203-Speed 2623.35 samples/sec   Loss 1.2289   LearningRate 0.0003   Epoch: 18   Global Step: 787000   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:41,100-Speed 2628.14 samples/sec   Loss 1.2187   LearningRate 0.0003   Epoch: 18   Global Step: 787010   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:44,995-Speed 2629.35 samples/sec   Loss 1.2162   LearningRate 0.0003   Epoch: 18   Global Step: 787020   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:48,897-Speed 2625.28 samples/sec   Loss 1.2281   LearningRate 0.0003   Epoch: 18   Global Step: 787030   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:52,835-Speed 2601.13 samples/sec   Loss 1.2229   LearningRate 0.0003   Epoch: 18   Global Step: 787040   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:10:56,733-Speed 2627.83 samples/sec   Loss 1.2440   LearningRate 0.0003   Epoch: 18   Global Step: 787050   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:11:00,648-Speed 2616.31 samples/sec   Loss 1.2523   LearningRate 0.0003   Epoch: 18   Global Step: 787060   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:11:04,544-Speed 2629.92 samples/sec   Loss 1.2472   LearningRate 0.0003   Epoch: 18   Global Step: 787070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:08,440-Speed 2628.55 samples/sec   Loss 1.2314   LearningRate 0.0003   Epoch: 18   Global Step: 787080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:12,337-Speed 2628.56 samples/sec   Loss 1.2539   LearningRate 0.0003   Epoch: 18   Global Step: 787090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:16,235-Speed 2627.23 samples/sec   Loss 1.2260   LearningRate 0.0003   Epoch: 18   Global Step: 787100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:20,134-Speed 2627.39 samples/sec   Loss 1.2277   LearningRate 0.0003   Epoch: 18   Global Step: 787110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:24,031-Speed 2627.96 samples/sec   Loss 1.1788   LearningRate 0.0003   Epoch: 18   Global Step: 787120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:27,923-Speed 2632.07 samples/sec   Loss 1.2085   LearningRate 0.0003   Epoch: 18   Global Step: 787130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:31,820-Speed 2628.53 samples/sec   Loss 1.2083   LearningRate 0.0003   Epoch: 18   Global Step: 787140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:35,715-Speed 2629.69 samples/sec   Loss 1.2138   LearningRate 0.0003   Epoch: 18   Global Step: 787150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:39,619-Speed 2623.75 samples/sec   Loss 1.2050   LearningRate 0.0003   Epoch: 18   Global Step: 787160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:11:43,528-Speed 2619.65 samples/sec   Loss 1.2571   LearningRate 0.0003   Epoch: 18   Global Step: 787170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:11:47,383-Speed 2657.10 samples/sec   Loss 1.2124   LearningRate 0.0003   Epoch: 18   Global Step: 787180   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:11:51,298-Speed 2616.54 samples/sec   Loss 1.2102   LearningRate 0.0003   Epoch: 18   Global Step: 787190   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:11:55,188-Speed 2632.52 samples/sec   Loss 1.1972   LearningRate 0.0003   Epoch: 18   Global Step: 787200   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:11:59,183-Speed 2564.33 samples/sec   Loss 1.1990   LearningRate 0.0003   Epoch: 18   Global Step: 787210   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:03,079-Speed 2629.06 samples/sec   Loss 1.2471   LearningRate 0.0003   Epoch: 18   Global Step: 787220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:06,967-Speed 2634.76 samples/sec   Loss 1.2680   LearningRate 0.0003   Epoch: 18   Global Step: 787230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:10,859-Speed 2631.59 samples/sec   Loss 1.2037   LearningRate 0.0003   Epoch: 18   Global Step: 787240   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:14,752-Speed 2631.18 samples/sec   Loss 1.2053   LearningRate 0.0003   Epoch: 18   Global Step: 787250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:18,643-Speed 2632.20 samples/sec   Loss 1.2213   LearningRate 0.0003   Epoch: 18   Global Step: 787260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:22,537-Speed 2630.73 samples/sec   Loss 1.2272   LearningRate 0.0003   Epoch: 18   Global Step: 787270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:26,428-Speed 2631.63 samples/sec   Loss 1.1706   LearningRate 0.0003   Epoch: 18   Global Step: 787280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:12:30,319-Speed 2632.54 samples/sec   Loss 1.2020   LearningRate 0.0003   Epoch: 18   Global Step: 787290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:12:34,214-Speed 2629.63 samples/sec   Loss 1.2449   LearningRate 0.0003   Epoch: 18   Global Step: 787300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:12:38,088-Speed 2643.99 samples/sec   Loss 1.2385   LearningRate 0.0003   Epoch: 18   Global Step: 787310   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:41,983-Speed 2629.84 samples/sec   Loss 1.2777   LearningRate 0.0003   Epoch: 18   Global Step: 787320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:45,881-Speed 2627.87 samples/sec   Loss 1.2166   LearningRate 0.0003   Epoch: 18   Global Step: 787330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:49,800-Speed 2614.10 samples/sec   Loss 1.2526   LearningRate 0.0003   Epoch: 18   Global Step: 787340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:53,700-Speed 2625.94 samples/sec   Loss 1.2463   LearningRate 0.0003   Epoch: 18   Global Step: 787350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:12:57,592-Speed 2631.95 samples/sec   Loss 1.2376   LearningRate 0.0003   Epoch: 18   Global Step: 787360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:01,484-Speed 2631.78 samples/sec   Loss 1.2011   LearningRate 0.0003   Epoch: 18   Global Step: 787370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:05,382-Speed 2627.67 samples/sec   Loss 1.2380   LearningRate 0.0003   Epoch: 18   Global Step: 787380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:09,272-Speed 2632.76 samples/sec   Loss 1.2306   LearningRate 0.0003   Epoch: 18   Global Step: 787390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:13,161-Speed 2634.06 samples/sec   Loss 1.2261   LearningRate 0.0003   Epoch: 18   Global Step: 787400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:17,084-Speed 2611.35 samples/sec   Loss 1.2360   LearningRate 0.0003   Epoch: 18   Global Step: 787410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:13:21,045-Speed 2585.56 samples/sec   Loss 1.2281   LearningRate 0.0003   Epoch: 18   Global Step: 787420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:13:24,938-Speed 2631.14 samples/sec   Loss 1.2027   LearningRate 0.0003   Epoch: 18   Global Step: 787430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:13:28,869-Speed 2605.69 samples/sec   Loss 1.2083   LearningRate 0.0003   Epoch: 18   Global Step: 787440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:13:32,772-Speed 2624.36 samples/sec   Loss 1.2214   LearningRate 0.0003   Epoch: 18   Global Step: 787450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:13:36,665-Speed 2631.51 samples/sec   Loss 1.2741   LearningRate 0.0003   Epoch: 18   Global Step: 787460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:13:40,568-Speed 2623.73 samples/sec   Loss 1.2140   LearningRate 0.0003   Epoch: 18   Global Step: 787470   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:44,475-Speed 2621.46 samples/sec   Loss 1.2246   LearningRate 0.0003   Epoch: 18   Global Step: 787480   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:48,368-Speed 2631.43 samples/sec   Loss 1.1943   LearningRate 0.0003   Epoch: 18   Global Step: 787490   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:52,260-Speed 2632.49 samples/sec   Loss 1.2325   LearningRate 0.0003   Epoch: 18   Global Step: 787500   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:13:56,154-Speed 2629.95 samples/sec   Loss 1.2145   LearningRate 0.0003   Epoch: 18   Global Step: 787510   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:14:00,060-Speed 2622.98 samples/sec   Loss 1.1702   LearningRate 0.0003   Epoch: 18   Global Step: 787520   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:14:03,953-Speed 2630.90 samples/sec   Loss 1.1606   LearningRate 0.0003   Epoch: 18   Global Step: 787530   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:14:07,846-Speed 2630.58 samples/sec   Loss 1.2278   LearningRate 0.0003   Epoch: 18   Global Step: 787540   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:14:11,751-Speed 2622.76 samples/sec   Loss 1.1802   LearningRate 0.0003   Epoch: 18   Global Step: 787550   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:14:15,642-Speed 2633.17 samples/sec   Loss 1.2399   LearningRate 0.0003   Epoch: 18   Global Step: 787560   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:14:19,538-Speed 2628.50 samples/sec   Loss 1.2261   LearningRate 0.0003   Epoch: 18   Global Step: 787570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:23,430-Speed 2632.32 samples/sec   Loss 1.2260   LearningRate 0.0003   Epoch: 18   Global Step: 787580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:27,328-Speed 2626.85 samples/sec   Loss 1.2110   LearningRate 0.0003   Epoch: 18   Global Step: 787590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:31,223-Speed 2630.54 samples/sec   Loss 1.2877   LearningRate 0.0003   Epoch: 18   Global Step: 787600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:35,123-Speed 2626.18 samples/sec   Loss 1.2755   LearningRate 0.0003   Epoch: 18   Global Step: 787610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:39,024-Speed 2625.17 samples/sec   Loss 1.1930   LearningRate 0.0003   Epoch: 18   Global Step: 787620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:42,919-Speed 2629.52 samples/sec   Loss 1.2038   LearningRate 0.0003   Epoch: 18   Global Step: 787630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:46,837-Speed 2614.85 samples/sec   Loss 1.2640   LearningRate 0.0003   Epoch: 18   Global Step: 787640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:50,733-Speed 2629.02 samples/sec   Loss 1.2856   LearningRate 0.0003   Epoch: 18   Global Step: 787650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:54,628-Speed 2629.71 samples/sec   Loss 1.2213   LearningRate 0.0003   Epoch: 18   Global Step: 787660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:14:58,522-Speed 2630.46 samples/sec   Loss 1.2002   LearningRate 0.0003   Epoch: 18   Global Step: 787670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:15:02,393-Speed 2646.23 samples/sec   Loss 1.2407   LearningRate 0.0003   Epoch: 18   Global Step: 787680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:06,291-Speed 2627.93 samples/sec   Loss 1.2458   LearningRate 0.0003   Epoch: 18   Global Step: 787690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:10,191-Speed 2625.92 samples/sec   Loss 1.2590   LearningRate 0.0003   Epoch: 18   Global Step: 787700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:14,084-Speed 2630.43 samples/sec   Loss 1.2291   LearningRate 0.0003   Epoch: 18   Global Step: 787710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:17,978-Speed 2630.62 samples/sec   Loss 1.2036   LearningRate 0.0003   Epoch: 18   Global Step: 787720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:21,870-Speed 2632.35 samples/sec   Loss 1.2197   LearningRate 0.0003   Epoch: 18   Global Step: 787730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:25,764-Speed 2630.49 samples/sec   Loss 1.1978   LearningRate 0.0003   Epoch: 18   Global Step: 787740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:29,670-Speed 2622.15 samples/sec   Loss 1.2675   LearningRate 0.0003   Epoch: 18   Global Step: 787750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:33,565-Speed 2630.17 samples/sec   Loss 1.1829   LearningRate 0.0003   Epoch: 18   Global Step: 787760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:37,467-Speed 2624.52 samples/sec   Loss 1.2525   LearningRate 0.0003   Epoch: 18   Global Step: 787770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:15:41,410-Speed 2597.85 samples/sec   Loss 1.1914   LearningRate 0.0003   Epoch: 18   Global Step: 787780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:15:45,322-Speed 2618.09 samples/sec   Loss 1.2431   LearningRate 0.0003   Epoch: 18   Global Step: 787790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:15:49,233-Speed 2619.09 samples/sec   Loss 1.2590   LearningRate 0.0003   Epoch: 18   Global Step: 787800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:15:53,196-Speed 2585.13 samples/sec   Loss 1.2552   LearningRate 0.0003   Epoch: 18   Global Step: 787810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:15:57,199-Speed 2558.33 samples/sec   Loss 1.1874   LearningRate 0.0003   Epoch: 18   Global Step: 787820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:16:01,074-Speed 2643.59 samples/sec   Loss 1.1809   LearningRate 0.0003   Epoch: 18   Global Step: 787830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:04,990-Speed 2615.64 samples/sec   Loss 1.2299   LearningRate 0.0003   Epoch: 18   Global Step: 787840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:08,889-Speed 2627.14 samples/sec   Loss 1.2491   LearningRate 0.0003   Epoch: 18   Global Step: 787850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:12,781-Speed 2631.06 samples/sec   Loss 1.2514   LearningRate 0.0003   Epoch: 18   Global Step: 787860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:16,675-Speed 2630.97 samples/sec   Loss 1.2255   LearningRate 0.0003   Epoch: 18   Global Step: 787870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:20,571-Speed 2628.51 samples/sec   Loss 1.2151   LearningRate 0.0003   Epoch: 18   Global Step: 787880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:24,470-Speed 2627.14 samples/sec   Loss 1.2085   LearningRate 0.0003   Epoch: 18   Global Step: 787890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:28,365-Speed 2629.70 samples/sec   Loss 1.2248   LearningRate 0.0003   Epoch: 18   Global Step: 787900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:32,268-Speed 2624.45 samples/sec   Loss 1.2427   LearningRate 0.0003   Epoch: 18   Global Step: 787910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:36,175-Speed 2621.88 samples/sec   Loss 1.2134   LearningRate 0.0003   Epoch: 18   Global Step: 787920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:40,090-Speed 2616.00 samples/sec   Loss 1.2310   LearningRate 0.0003   Epoch: 18   Global Step: 787930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:16:43,992-Speed 2625.08 samples/sec   Loss 1.2094   LearningRate 0.0003   Epoch: 18   Global Step: 787940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:16:47,906-Speed 2616.43 samples/sec   Loss 1.2847   LearningRate 0.0003   Epoch: 18   Global Step: 787950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:16:51,781-Speed 2644.15 samples/sec   Loss 1.1991   LearningRate 0.0003   Epoch: 18   Global Step: 787960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:55,675-Speed 2630.10 samples/sec   Loss 1.1638   LearningRate 0.0003   Epoch: 18   Global Step: 787970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:16:59,589-Speed 2617.37 samples/sec   Loss 1.2250   LearningRate 0.0003   Epoch: 18   Global Step: 787980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:03,489-Speed 2625.92 samples/sec   Loss 1.2574   LearningRate 0.0003   Epoch: 18   Global Step: 787990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:07,384-Speed 2630.53 samples/sec   Loss 1.2528   LearningRate 0.0003   Epoch: 18   Global Step: 788000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:11,281-Speed 2627.92 samples/sec   Loss 1.2018   LearningRate 0.0003   Epoch: 18   Global Step: 788010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:15,237-Speed 2589.01 samples/sec   Loss 1.2389   LearningRate 0.0003   Epoch: 18   Global Step: 788020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:19,134-Speed 2627.75 samples/sec   Loss 1.2257   LearningRate 0.0003   Epoch: 18   Global Step: 788030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:23,044-Speed 2620.50 samples/sec   Loss 1.2273   LearningRate 0.0003   Epoch: 18   Global Step: 788040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:26,945-Speed 2625.11 samples/sec   Loss 1.1808   LearningRate 0.0003   Epoch: 18   Global Step: 788050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:30,840-Speed 2630.33 samples/sec   Loss 1.2134   LearningRate 0.0003   Epoch: 18   Global Step: 788060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:17:34,738-Speed 2627.59 samples/sec   Loss 1.1920   LearningRate 0.0003   Epoch: 18   Global Step: 788070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:17:38,633-Speed 2630.20 samples/sec   Loss 1.2190   LearningRate 0.0003   Epoch: 18   Global Step: 788080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:17:42,509-Speed 2642.09 samples/sec   Loss 1.2630   LearningRate 0.0003   Epoch: 18   Global Step: 788090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:17:46,407-Speed 2627.24 samples/sec   Loss 1.2219   LearningRate 0.0003   Epoch: 18   Global Step: 788100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:07,301-Speed 490.13 samples/sec   Loss 1.2712   LearningRate 0.0002   Epoch: 19   Global Step: 788110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:11,231-Speed 2606.89 samples/sec   Loss 1.2145   LearningRate 0.0002   Epoch: 19   Global Step: 788120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:15,130-Speed 2627.04 samples/sec   Loss 1.2266   LearningRate 0.0002   Epoch: 19   Global Step: 788130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:19,057-Speed 2608.08 samples/sec   Loss 1.2352   LearningRate 0.0002   Epoch: 19   Global Step: 788140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:22,948-Speed 2632.82 samples/sec   Loss 1.2350   LearningRate 0.0002   Epoch: 19   Global Step: 788150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:26,850-Speed 2624.42 samples/sec   Loss 1.2485   LearningRate 0.0002   Epoch: 19   Global Step: 788160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:30,751-Speed 2625.82 samples/sec   Loss 1.2804   LearningRate 0.0002   Epoch: 19   Global Step: 788170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:34,649-Speed 2627.61 samples/sec   Loss 1.1708   LearningRate 0.0002   Epoch: 19   Global Step: 788180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:38,542-Speed 2631.24 samples/sec   Loss 1.2195   LearningRate 0.0002   Epoch: 19   Global Step: 788190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:18:42,414-Speed 2644.89 samples/sec   Loss 1.2362   LearningRate 0.0002   Epoch: 19   Global Step: 788200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:18:46,319-Speed 2623.25 samples/sec   Loss 1.2176   LearningRate 0.0002   Epoch: 19   Global Step: 788210   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:18:50,216-Speed 2628.54 samples/sec   Loss 1.2349   LearningRate 0.0002   Epoch: 19   Global Step: 788220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:18:54,113-Speed 2628.45 samples/sec   Loss 1.2266   LearningRate 0.0002   Epoch: 19   Global Step: 788230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:18:58,011-Speed 2627.08 samples/sec   Loss 1.2236   LearningRate 0.0002   Epoch: 19   Global Step: 788240   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:01,911-Speed 2627.42 samples/sec   Loss 1.2753   LearningRate 0.0002   Epoch: 19   Global Step: 788250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:05,815-Speed 2623.68 samples/sec   Loss 1.2215   LearningRate 0.0002   Epoch: 19   Global Step: 788260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:09,757-Speed 2598.89 samples/sec   Loss 1.2236   LearningRate 0.0002   Epoch: 19   Global Step: 788270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:13,651-Speed 2630.39 samples/sec   Loss 1.2130   LearningRate 0.0002   Epoch: 19   Global Step: 788280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:17,551-Speed 2626.12 samples/sec   Loss 1.2181   LearningRate 0.0002   Epoch: 19   Global Step: 788290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:21,460-Speed 2620.01 samples/sec   Loss 1.2552   LearningRate 0.0002   Epoch: 19   Global Step: 788300   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:25,374-Speed 2617.37 samples/sec   Loss 1.1922   LearningRate 0.0002   Epoch: 19   Global Step: 788310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:19:29,274-Speed 2626.36 samples/sec   Loss 1.1946   LearningRate 0.0002   Epoch: 19   Global Step: 788320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:19:33,175-Speed 2625.71 samples/sec   Loss 1.1995   LearningRate 0.0002   Epoch: 19   Global Step: 788330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:19:37,102-Speed 2607.65 samples/sec   Loss 1.2724   LearningRate 0.0002   Epoch: 19   Global Step: 788340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:19:40,990-Speed 2635.14 samples/sec   Loss 1.1971   LearningRate 0.0002   Epoch: 19   Global Step: 788350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:44,888-Speed 2627.75 samples/sec   Loss 1.2308   LearningRate 0.0002   Epoch: 19   Global Step: 788360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:48,785-Speed 2628.20 samples/sec   Loss 1.2326   LearningRate 0.0002   Epoch: 19   Global Step: 788370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:52,680-Speed 2629.42 samples/sec   Loss 1.2327   LearningRate 0.0002   Epoch: 19   Global Step: 788380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:19:56,593-Speed 2617.74 samples/sec   Loss 1.2866   LearningRate 0.0002   Epoch: 19   Global Step: 788390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:20:00,489-Speed 2629.05 samples/sec   Loss 1.1927   LearningRate 0.0002   Epoch: 19   Global Step: 788400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:20:04,400-Speed 2619.03 samples/sec   Loss 1.2011   LearningRate 0.0002   Epoch: 19   Global Step: 788410   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:20:08,305-Speed 2623.19 samples/sec   Loss 1.1927   LearningRate 0.0002   Epoch: 19   Global Step: 788420   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:20:12,210-Speed 2622.29 samples/sec   Loss 1.1727   LearningRate 0.0002   Epoch: 19   Global Step: 788430   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:20:16,108-Speed 2628.64 samples/sec   Loss 1.2289   LearningRate 0.0002   Epoch: 19   Global Step: 788440   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:20:20,008-Speed 2626.07 samples/sec   Loss 1.1957   LearningRate 0.0002   Epoch: 19   Global Step: 788450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:23,911-Speed 2624.09 samples/sec   Loss 1.1926   LearningRate 0.0002   Epoch: 19   Global Step: 788460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:27,809-Speed 2627.56 samples/sec   Loss 1.2138   LearningRate 0.0002   Epoch: 19   Global Step: 788470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:31,709-Speed 2626.11 samples/sec   Loss 1.1549   LearningRate 0.0002   Epoch: 19   Global Step: 788480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:35,632-Speed 2610.96 samples/sec   Loss 1.2168   LearningRate 0.0002   Epoch: 19   Global Step: 788490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:39,554-Speed 2611.79 samples/sec   Loss 1.2211   LearningRate 0.0002   Epoch: 19   Global Step: 788500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:43,454-Speed 2626.15 samples/sec   Loss 1.2544   LearningRate 0.0002   Epoch: 19   Global Step: 788510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:47,425-Speed 2580.26 samples/sec   Loss 1.2171   LearningRate 0.0002   Epoch: 19   Global Step: 788520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:51,332-Speed 2621.39 samples/sec   Loss 1.2297   LearningRate 0.0002   Epoch: 19   Global Step: 788530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:55,228-Speed 2628.56 samples/sec   Loss 1.1833   LearningRate 0.0002   Epoch: 19   Global Step: 788540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:20:59,136-Speed 2620.67 samples/sec   Loss 1.2185   LearningRate 0.0002   Epoch: 19   Global Step: 788550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:03,037-Speed 2626.03 samples/sec   Loss 1.2338   LearningRate 0.0002   Epoch: 19   Global Step: 788560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:06,935-Speed 2627.66 samples/sec   Loss 1.2000   LearningRate 0.0002   Epoch: 19   Global Step: 788570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:10,835-Speed 2626.94 samples/sec   Loss 1.1928   LearningRate 0.0002   Epoch: 19   Global Step: 788580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:14,743-Speed 2620.67 samples/sec   Loss 1.2230   LearningRate 0.0002   Epoch: 19   Global Step: 788590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:18,648-Speed 2622.86 samples/sec   Loss 1.2257   LearningRate 0.0002   Epoch: 19   Global Step: 788600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:22,546-Speed 2627.41 samples/sec   Loss 1.2670   LearningRate 0.0002   Epoch: 19   Global Step: 788610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:26,448-Speed 2624.95 samples/sec   Loss 1.1948   LearningRate 0.0002   Epoch: 19   Global Step: 788620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:21:30,321-Speed 2644.28 samples/sec   Loss 1.1971   LearningRate 0.0002   Epoch: 19   Global Step: 788630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:21:34,225-Speed 2623.83 samples/sec   Loss 1.1819   LearningRate 0.0002   Epoch: 19   Global Step: 788640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:21:38,124-Speed 2627.56 samples/sec   Loss 1.2232   LearningRate 0.0002   Epoch: 19   Global Step: 788650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:21:42,019-Speed 2629.23 samples/sec   Loss 1.2263   LearningRate 0.0002   Epoch: 19   Global Step: 788660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:21:45,914-Speed 2629.77 samples/sec   Loss 1.1726   LearningRate 0.0002   Epoch: 19   Global Step: 788670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:21:49,810-Speed 2629.16 samples/sec   Loss 1.1882   LearningRate 0.0002   Epoch: 19   Global Step: 788680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:21:53,704-Speed 2629.72 samples/sec   Loss 1.2467   LearningRate 0.0002   Epoch: 19   Global Step: 788690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:21:57,606-Speed 2625.07 samples/sec   Loss 1.2669   LearningRate 0.0002   Epoch: 19   Global Step: 788700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:22:01,517-Speed 2619.49 samples/sec   Loss 1.1708   LearningRate 0.0002   Epoch: 19   Global Step: 788710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:22:05,415-Speed 2626.99 samples/sec   Loss 1.2713   LearningRate 0.0002   Epoch: 19   Global Step: 788720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:22:09,309-Speed 2630.69 samples/sec   Loss 1.2082   LearningRate 0.0002   Epoch: 19   Global Step: 788730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:13,206-Speed 2628.32 samples/sec   Loss 1.2220   LearningRate 0.0002   Epoch: 19   Global Step: 788740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:17,103-Speed 2628.08 samples/sec   Loss 1.1977   LearningRate 0.0002   Epoch: 19   Global Step: 788750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:21,001-Speed 2627.35 samples/sec   Loss 1.1710   LearningRate 0.0002   Epoch: 19   Global Step: 788760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:24,916-Speed 2615.76 samples/sec   Loss 1.2149   LearningRate 0.0002   Epoch: 19   Global Step: 788770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:28,810-Speed 2630.62 samples/sec   Loss 1.1957   LearningRate 0.0002   Epoch: 19   Global Step: 788780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:32,709-Speed 2626.82 samples/sec   Loss 1.2487   LearningRate 0.0002   Epoch: 19   Global Step: 788790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:36,606-Speed 2628.65 samples/sec   Loss 1.1833   LearningRate 0.0002   Epoch: 19   Global Step: 788800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:40,506-Speed 2626.29 samples/sec   Loss 1.1457   LearningRate 0.0002   Epoch: 19   Global Step: 788810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:44,411-Speed 2622.74 samples/sec   Loss 1.1996   LearningRate 0.0002   Epoch: 19   Global Step: 788820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:22:48,287-Speed 2642.18 samples/sec   Loss 1.2116   LearningRate 0.0002   Epoch: 19   Global Step: 788830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:22:52,203-Speed 2615.73 samples/sec   Loss 1.2017   LearningRate 0.0002   Epoch: 19   Global Step: 788840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:22:56,105-Speed 2624.97 samples/sec   Loss 1.2412   LearningRate 0.0002   Epoch: 19   Global Step: 788850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:00,008-Speed 2623.95 samples/sec   Loss 1.1314   LearningRate 0.0002   Epoch: 19   Global Step: 788860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:03,917-Speed 2620.15 samples/sec   Loss 1.1900   LearningRate 0.0002   Epoch: 19   Global Step: 788870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:07,829-Speed 2618.26 samples/sec   Loss 1.1650   LearningRate 0.0002   Epoch: 19   Global Step: 788880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:11,733-Speed 2623.84 samples/sec   Loss 1.2248   LearningRate 0.0002   Epoch: 19   Global Step: 788890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:15,632-Speed 2627.27 samples/sec   Loss 1.2391   LearningRate 0.0002   Epoch: 19   Global Step: 788900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:19,701-Speed 2516.80 samples/sec   Loss 1.2271   LearningRate 0.0002   Epoch: 19   Global Step: 788910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:23,597-Speed 2629.18 samples/sec   Loss 1.2282   LearningRate 0.0002   Epoch: 19   Global Step: 788920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:27,515-Speed 2614.20 samples/sec   Loss 1.1920   LearningRate 0.0002   Epoch: 19   Global Step: 788930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:31,466-Speed 2591.70 samples/sec   Loss 1.1937   LearningRate 0.0002   Epoch: 19   Global Step: 788940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:35,363-Speed 2628.64 samples/sec   Loss 1.2098   LearningRate 0.0002   Epoch: 19   Global Step: 788950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:39,291-Speed 2607.62 samples/sec   Loss 1.2043   LearningRate 0.0002   Epoch: 19   Global Step: 788960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:43,268-Speed 2575.17 samples/sec   Loss 1.1439   LearningRate 0.0002   Epoch: 19   Global Step: 788970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:47,170-Speed 2625.31 samples/sec   Loss 1.1910   LearningRate 0.0002   Epoch: 19   Global Step: 788980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:51,182-Speed 2552.92 samples/sec   Loss 1.2450   LearningRate 0.0002   Epoch: 19   Global Step: 788990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:55,080-Speed 2627.89 samples/sec   Loss 1.2019   LearningRate 0.0002   Epoch: 19   Global Step: 789000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:23:58,973-Speed 2630.38 samples/sec   Loss 1.2062   LearningRate 0.0002   Epoch: 19   Global Step: 789010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:02,869-Speed 2629.27 samples/sec   Loss 1.2182   LearningRate 0.0002   Epoch: 19   Global Step: 789020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:06,764-Speed 2629.71 samples/sec   Loss 1.2651   LearningRate 0.0002   Epoch: 19   Global Step: 789030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:24:10,651-Speed 2634.41 samples/sec   Loss 1.2544   LearningRate 0.0002   Epoch: 19   Global Step: 789040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:14,546-Speed 2630.05 samples/sec   Loss 1.1958   LearningRate 0.0002   Epoch: 19   Global Step: 789050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:18,462-Speed 2615.48 samples/sec   Loss 1.1874   LearningRate 0.0002   Epoch: 19   Global Step: 789060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:22,358-Speed 2629.40 samples/sec   Loss 1.2346   LearningRate 0.0002   Epoch: 19   Global Step: 789070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:26,251-Speed 2630.81 samples/sec   Loss 1.1922   LearningRate 0.0002   Epoch: 19   Global Step: 789080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:30,145-Speed 2630.36 samples/sec   Loss 1.1808   LearningRate 0.0002   Epoch: 19   Global Step: 789090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:34,040-Speed 2629.12 samples/sec   Loss 1.2235   LearningRate 0.0002   Epoch: 19   Global Step: 789100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:37,935-Speed 2629.94 samples/sec   Loss 1.1947   LearningRate 0.0002   Epoch: 19   Global Step: 789110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:41,829-Speed 2630.59 samples/sec   Loss 1.1962   LearningRate 0.0002   Epoch: 19   Global Step: 789120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:45,756-Speed 2608.31 samples/sec   Loss 1.2350   LearningRate 0.0002   Epoch: 19   Global Step: 789130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:49,653-Speed 2627.69 samples/sec   Loss 1.1935   LearningRate 0.0002   Epoch: 19   Global Step: 789140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-16 12:24:53,544-Speed 2632.61 samples/sec   Loss 1.2259   LearningRate 0.0002   Epoch: 19   Global Step: 789150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:24:57,445-Speed 2625.27 samples/sec   Loss 1.2186   LearningRate 0.0002   Epoch: 19   Global Step: 789160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:25:01,345-Speed 2626.07 samples/sec   Loss 1.1973   LearningRate 0.0002   Epoch: 19   Global Step: 789170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:25:05,262-Speed 2615.13 samples/sec   Loss 1.1973   LearningRate 0.0002   Epoch: 19   Global Step: 789180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:25:09,173-Speed 2625.69 samples/sec   Loss 1.2084   LearningRate 0.0002   Epoch: 19   Global Step: 789190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:25:13,051-Speed 2641.46 samples/sec   Loss 1.2114   LearningRate 0.0002   Epoch: 19   Global Step: 789200   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:16,952-Speed 2625.02 samples/sec   Loss 1.2075   LearningRate 0.0002   Epoch: 19   Global Step: 789210   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:20,854-Speed 2625.27 samples/sec   Loss 1.1842   LearningRate 0.0002   Epoch: 19   Global Step: 789220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:24,756-Speed 2624.69 samples/sec   Loss 1.2124   LearningRate 0.0002   Epoch: 19   Global Step: 789230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:28,664-Speed 2621.28 samples/sec   Loss 1.2047   LearningRate 0.0002   Epoch: 19   Global Step: 789240   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:32,561-Speed 2628.28 samples/sec   Loss 1.2307   LearningRate 0.0002   Epoch: 19   Global Step: 789250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:36,452-Speed 2632.55 samples/sec   Loss 1.2758   LearningRate 0.0002   Epoch: 19   Global Step: 789260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:40,351-Speed 2626.77 samples/sec   Loss 1.1964   LearningRate 0.0002   Epoch: 19   Global Step: 789270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:44,243-Speed 2632.14 samples/sec   Loss 1.1998   LearningRate 0.0002   Epoch: 19   Global Step: 789280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:48,140-Speed 2628.26 samples/sec   Loss 1.2378   LearningRate 0.0002   Epoch: 19   Global Step: 789290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:25:52,039-Speed 2626.56 samples/sec   Loss 1.1963   LearningRate 0.0002   Epoch: 19   Global Step: 789300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:25:55,932-Speed 2631.51 samples/sec   Loss 1.2101   LearningRate 0.0002   Epoch: 19   Global Step: 789310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:25:59,890-Speed 2587.62 samples/sec   Loss 1.2199   LearningRate 0.0002   Epoch: 19   Global Step: 789320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:26:03,790-Speed 2625.69 samples/sec   Loss 1.2211   LearningRate 0.0002   Epoch: 19   Global Step: 789330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:26:07,695-Speed 2622.78 samples/sec   Loss 1.2332   LearningRate 0.0002   Epoch: 19   Global Step: 789340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:26:11,592-Speed 2629.03 samples/sec   Loss 1.1729   LearningRate 0.0002   Epoch: 19   Global Step: 789350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:26:15,470-Speed 2640.79 samples/sec   Loss 1.2235   LearningRate 0.0002   Epoch: 19   Global Step: 789360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:19,367-Speed 2628.59 samples/sec   Loss 1.2507   LearningRate 0.0002   Epoch: 19   Global Step: 789370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:23,263-Speed 2629.34 samples/sec   Loss 1.2105   LearningRate 0.0002   Epoch: 19   Global Step: 789380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:27,155-Speed 2631.89 samples/sec   Loss 1.1956   LearningRate 0.0002   Epoch: 19   Global Step: 789390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:31,052-Speed 2628.32 samples/sec   Loss 1.2367   LearningRate 0.0002   Epoch: 19   Global Step: 789400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:34,948-Speed 2629.02 samples/sec   Loss 1.2121   LearningRate 0.0002   Epoch: 19   Global Step: 789410   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:38,848-Speed 2625.83 samples/sec   Loss 1.2456   LearningRate 0.0002   Epoch: 19   Global Step: 789420   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:42,747-Speed 2627.05 samples/sec   Loss 1.1845   LearningRate 0.0002   Epoch: 19   Global Step: 789430   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:46,644-Speed 2627.87 samples/sec   Loss 1.2260   LearningRate 0.0002   Epoch: 19   Global Step: 789440   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:50,541-Speed 2628.59 samples/sec   Loss 1.2192   LearningRate 0.0002   Epoch: 19   Global Step: 789450   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-16 12:26:54,436-Speed 2630.01 samples/sec   Loss 1.2232   LearningRate 0.0002   Epoch: 19   Global Step: 789460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:26:58,331-Speed 2630.01 samples/sec   Loss 1.2369   LearningRate 0.0002   Epoch: 19   Global Step: 789470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:27:02,230-Speed 2626.81 samples/sec   Loss 1.2307   LearningRate 0.0002   Epoch: 19   Global Step: 789480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:27:06,125-Speed 2629.54 samples/sec   Loss 1.2087   LearningRate 0.0002   Epoch: 19   Global Step: 789490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:27:10,061-Speed 2601.92 samples/sec   Loss 1.2814   LearningRate 0.0002   Epoch: 19   Global Step: 789500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:27:13,964-Speed 2624.57 samples/sec   Loss 1.2590   LearningRate 0.0002   Epoch: 19   Global Step: 789510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:27:17,872-Speed 2620.56 samples/sec   Loss 1.1586   LearningRate 0.0002   Epoch: 19   Global Step: 789520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:27:21,777-Speed 2622.83 samples/sec   Loss 1.1998   LearningRate 0.0002   Epoch: 19   Global Step: 789530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-16 12:27:25,724-Speed 2594.68 samples/sec   Loss 1.2063   LearningRate 0.0002   Epoch: 19   Global Step: 789540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:27:29,601-Speed 2642.42 samples/sec   Loss 1.2577   LearningRate 0.0002   Epoch: 19   Global Step: 789550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:27:33,497-Speed 2628.74 samples/sec   Loss 1.2177   LearningRate 0.0002   Epoch: 19   Global Step: 789560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:27:37,394-Speed 2628.40 samples/sec   Loss 1.1694   LearningRate 0.0002   Epoch: 19   Global Step: 789570   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:27:41,290-Speed 2629.07 samples/sec   Loss 1.2076   LearningRate 0.0002   Epoch: 19   Global Step: 789580   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:27:45,202-Speed 2617.47 samples/sec   Loss 1.2211   LearningRate 0.0002   Epoch: 19   Global Step: 789590   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:27:49,104-Speed 2625.39 samples/sec   Loss 1.2201   LearningRate 0.0002   Epoch: 19   Global Step: 789600   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:27:52,998-Speed 2629.98 samples/sec   Loss 1.2084   LearningRate 0.0002   Epoch: 19   Global Step: 789610   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:27:56,890-Speed 2631.87 samples/sec   Loss 1.2433   LearningRate 0.0002   Epoch: 19   Global Step: 789620   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:00,785-Speed 2629.94 samples/sec   Loss 1.2232   LearningRate 0.0002   Epoch: 19   Global Step: 789630   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:04,688-Speed 2623.86 samples/sec   Loss 1.2235   LearningRate 0.0002   Epoch: 19   Global Step: 789640   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:08,597-Speed 2619.80 samples/sec   Loss 1.1588   LearningRate 0.0002   Epoch: 19   Global Step: 789650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:28:12,633-Speed 2538.51 samples/sec   Loss 1.2037   LearningRate 0.0002   Epoch: 19   Global Step: 789660   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:16,536-Speed 2624.28 samples/sec   Loss 1.1652   LearningRate 0.0002   Epoch: 19   Global Step: 789670   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:20,454-Speed 2613.63 samples/sec   Loss 1.1898   LearningRate 0.0002   Epoch: 19   Global Step: 789680   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:24,367-Speed 2617.80 samples/sec   Loss 1.1907   LearningRate 0.0002   Epoch: 19   Global Step: 789690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:28,273-Speed 2622.67 samples/sec   Loss 1.2245   LearningRate 0.0002   Epoch: 19   Global Step: 789700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:32,175-Speed 2624.20 samples/sec   Loss 1.2260   LearningRate 0.0002   Epoch: 19   Global Step: 789710   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:36,086-Speed 2619.07 samples/sec   Loss 1.2367   LearningRate 0.0002   Epoch: 19   Global Step: 789720   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:39,990-Speed 2623.91 samples/sec   Loss 1.2006   LearningRate 0.0002   Epoch: 19   Global Step: 789730   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:43,897-Speed 2621.16 samples/sec   Loss 1.1715   LearningRate 0.0002   Epoch: 19   Global Step: 789740   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:47,803-Speed 2622.45 samples/sec   Loss 1.2220   LearningRate 0.0002   Epoch: 19   Global Step: 789750   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:28:51,698-Speed 2630.29 samples/sec   Loss 1.1962   LearningRate 0.0002   Epoch: 19   Global Step: 789760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:28:55,601-Speed 2623.99 samples/sec   Loss 1.2728   LearningRate 0.0002   Epoch: 19   Global Step: 789770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:28:59,498-Speed 2628.43 samples/sec   Loss 1.1561   LearningRate 0.0002   Epoch: 19   Global Step: 789780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:03,397-Speed 2626.49 samples/sec   Loss 1.2153   LearningRate 0.0002   Epoch: 19   Global Step: 789790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:07,293-Speed 2629.29 samples/sec   Loss 1.1988   LearningRate 0.0002   Epoch: 19   Global Step: 789800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:11,187-Speed 2629.89 samples/sec   Loss 1.2150   LearningRate 0.0002   Epoch: 19   Global Step: 789810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:15,089-Speed 2625.29 samples/sec   Loss 1.2428   LearningRate 0.0002   Epoch: 19   Global Step: 789820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:18,977-Speed 2634.15 samples/sec   Loss 1.2338   LearningRate 0.0002   Epoch: 19   Global Step: 789830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:22,876-Speed 2626.79 samples/sec   Loss 1.1862   LearningRate 0.0002   Epoch: 19   Global Step: 789840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:26,778-Speed 2624.80 samples/sec   Loss 1.2182   LearningRate 0.0002   Epoch: 19   Global Step: 789850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:30,686-Speed 2621.34 samples/sec   Loss 1.1780   LearningRate 0.0002   Epoch: 19   Global Step: 789860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:29:34,600-Speed 2616.50 samples/sec   Loss 1.2254   LearningRate 0.0002   Epoch: 19   Global Step: 789870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:29:38,503-Speed 2624.35 samples/sec   Loss 1.2149   LearningRate 0.0002   Epoch: 19   Global Step: 789880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:29:42,398-Speed 2629.72 samples/sec   Loss 1.2103   LearningRate 0.0002   Epoch: 19   Global Step: 789890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:29:46,289-Speed 2632.04 samples/sec   Loss 1.1549   LearningRate 0.0002   Epoch: 19   Global Step: 789900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:29:50,165-Speed 2642.50 samples/sec   Loss 1.2085   LearningRate 0.0002   Epoch: 19   Global Step: 789910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:54,062-Speed 2628.49 samples/sec   Loss 1.2041   LearningRate 0.0002   Epoch: 19   Global Step: 789920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:29:57,957-Speed 2629.91 samples/sec   Loss 1.2486   LearningRate 0.0002   Epoch: 19   Global Step: 789930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:30:01,853-Speed 2628.70 samples/sec   Loss 1.2471   LearningRate 0.0002   Epoch: 19   Global Step: 789940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:30:05,773-Speed 2612.46 samples/sec   Loss 1.1767   LearningRate 0.0002   Epoch: 19   Global Step: 789950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:30:09,667-Speed 2631.72 samples/sec   Loss 1.2120   LearningRate 0.0002   Epoch: 19   Global Step: 789960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:30:13,557-Speed 2633.06 samples/sec   Loss 1.2253   LearningRate 0.0002   Epoch: 19   Global Step: 789970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:30:17,452-Speed 2629.94 samples/sec   Loss 1.2172   LearningRate 0.0002   Epoch: 19   Global Step: 789980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:30:21,348-Speed 2628.98 samples/sec   Loss 1.2448   LearningRate 0.0002   Epoch: 19   Global Step: 789990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:30:25,238-Speed 2632.54 samples/sec   Loss 1.2049   LearningRate 0.0002   Epoch: 19   Global Step: 790000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:31:08,265-[lfw][790000]XNorm: 21.555816
Training: 2022-04-16 12:31:08,266-[lfw][790000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 12:31:08,266-[lfw][790000]Accuracy-Highest: 0.99850
Training: 2022-04-16 12:31:58,366-[cfp_fp][790000]XNorm: 22.078192
Training: 2022-04-16 12:31:58,367-[cfp_fp][790000]Accuracy-Flip: 0.99300+-0.00353
Training: 2022-04-16 12:31:58,367-[cfp_fp][790000]Accuracy-Highest: 0.99400
Training: 2022-04-16 12:32:41,491-[agedb_30][790000]XNorm: 22.495209
Training: 2022-04-16 12:32:41,491-[agedb_30][790000]Accuracy-Flip: 0.98550+-0.00553
Training: 2022-04-16 12:32:41,492-[agedb_30][790000]Accuracy-Highest: 0.98550
Training: 2022-04-16 12:32:45,367-Speed 73.08 samples/sec   Loss 1.1911   LearningRate 0.0002   Epoch: 19   Global Step: 790010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:32:49,255-Speed 2634.78 samples/sec   Loss 1.2067   LearningRate 0.0002   Epoch: 19   Global Step: 790020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:32:53,129-Speed 2643.71 samples/sec   Loss 1.2266   LearningRate 0.0002   Epoch: 19   Global Step: 790030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:32:57,004-Speed 2643.27 samples/sec   Loss 1.1745   LearningRate 0.0002   Epoch: 19   Global Step: 790040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:33:00,860-Speed 2655.82 samples/sec   Loss 1.2096   LearningRate 0.0002   Epoch: 19   Global Step: 790050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:04,831-Speed 2579.68 samples/sec   Loss 1.2095   LearningRate 0.0002   Epoch: 19   Global Step: 790060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:08,808-Speed 2575.39 samples/sec   Loss 1.1767   LearningRate 0.0002   Epoch: 19   Global Step: 790070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:12,706-Speed 2627.52 samples/sec   Loss 1.2243   LearningRate 0.0002   Epoch: 19   Global Step: 790080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:16,598-Speed 2632.06 samples/sec   Loss 1.2110   LearningRate 0.0002   Epoch: 19   Global Step: 790090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:20,493-Speed 2629.44 samples/sec   Loss 1.2010   LearningRate 0.0002   Epoch: 19   Global Step: 790100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:24,387-Speed 2630.40 samples/sec   Loss 1.1815   LearningRate 0.0002   Epoch: 19   Global Step: 790110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:28,302-Speed 2616.28 samples/sec   Loss 1.2310   LearningRate 0.0002   Epoch: 19   Global Step: 790120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:32,201-Speed 2626.58 samples/sec   Loss 1.2752   LearningRate 0.0002   Epoch: 19   Global Step: 790130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:36,100-Speed 2626.93 samples/sec   Loss 1.2019   LearningRate 0.0002   Epoch: 19   Global Step: 790140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:39,978-Speed 2640.97 samples/sec   Loss 1.1827   LearningRate 0.0002   Epoch: 19   Global Step: 790150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:43,886-Speed 2620.77 samples/sec   Loss 1.2100   LearningRate 0.0002   Epoch: 19   Global Step: 790160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:47,787-Speed 2625.69 samples/sec   Loss 1.1886   LearningRate 0.0002   Epoch: 19   Global Step: 790170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:51,690-Speed 2624.93 samples/sec   Loss 1.1917   LearningRate 0.0002   Epoch: 19   Global Step: 790180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:55,597-Speed 2621.46 samples/sec   Loss 1.2089   LearningRate 0.0002   Epoch: 19   Global Step: 790190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:33:59,500-Speed 2624.22 samples/sec   Loss 1.2071   LearningRate 0.0002   Epoch: 19   Global Step: 790200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:03,406-Speed 2621.62 samples/sec   Loss 1.2396   LearningRate 0.0002   Epoch: 19   Global Step: 790210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:07,315-Speed 2620.30 samples/sec   Loss 1.1688   LearningRate 0.0002   Epoch: 19   Global Step: 790220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:11,220-Speed 2622.52 samples/sec   Loss 1.2723   LearningRate 0.0002   Epoch: 19   Global Step: 790230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:15,122-Speed 2625.63 samples/sec   Loss 1.2272   LearningRate 0.0002   Epoch: 19   Global Step: 790240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:19,021-Speed 2626.95 samples/sec   Loss 1.1645   LearningRate 0.0002   Epoch: 19   Global Step: 790250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:34:22,896-Speed 2643.04 samples/sec   Loss 1.2147   LearningRate 0.0002   Epoch: 19   Global Step: 790260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:26,796-Speed 2626.35 samples/sec   Loss 1.2547   LearningRate 0.0002   Epoch: 19   Global Step: 790270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:30,699-Speed 2624.45 samples/sec   Loss 1.1589   LearningRate 0.0002   Epoch: 19   Global Step: 790280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:34,594-Speed 2629.60 samples/sec   Loss 1.2120   LearningRate 0.0002   Epoch: 19   Global Step: 790290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:38,524-Speed 2605.94 samples/sec   Loss 1.2213   LearningRate 0.0002   Epoch: 19   Global Step: 790300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:42,427-Speed 2624.30 samples/sec   Loss 1.1753   LearningRate 0.0002   Epoch: 19   Global Step: 790310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:46,330-Speed 2624.41 samples/sec   Loss 1.1579   LearningRate 0.0002   Epoch: 19   Global Step: 790320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:50,321-Speed 2566.06 samples/sec   Loss 1.1898   LearningRate 0.0002   Epoch: 19   Global Step: 790330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:54,228-Speed 2622.35 samples/sec   Loss 1.1915   LearningRate 0.0002   Epoch: 19   Global Step: 790340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:34:58,126-Speed 2627.56 samples/sec   Loss 1.2391   LearningRate 0.0002   Epoch: 19   Global Step: 790350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:35:02,086-Speed 2586.35 samples/sec   Loss 1.1881   LearningRate 0.0002   Epoch: 19   Global Step: 790360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:35:05,987-Speed 2625.15 samples/sec   Loss 1.1765   LearningRate 0.0002   Epoch: 19   Global Step: 790370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:35:09,883-Speed 2629.90 samples/sec   Loss 1.1945   LearningRate 0.0002   Epoch: 19   Global Step: 790380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:35:13,759-Speed 2642.29 samples/sec   Loss 1.2178   LearningRate 0.0002   Epoch: 19   Global Step: 790390   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:17,655-Speed 2628.94 samples/sec   Loss 1.1534   LearningRate 0.0002   Epoch: 19   Global Step: 790400   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:21,550-Speed 2630.02 samples/sec   Loss 1.2203   LearningRate 0.0002   Epoch: 19   Global Step: 790410   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:25,459-Speed 2620.14 samples/sec   Loss 1.2646   LearningRate 0.0002   Epoch: 19   Global Step: 790420   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:29,360-Speed 2625.56 samples/sec   Loss 1.1995   LearningRate 0.0002   Epoch: 19   Global Step: 790430   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:33,262-Speed 2624.62 samples/sec   Loss 1.1672   LearningRate 0.0002   Epoch: 19   Global Step: 790440   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:37,165-Speed 2624.18 samples/sec   Loss 1.1684   LearningRate 0.0002   Epoch: 19   Global Step: 790450   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:41,061-Speed 2628.65 samples/sec   Loss 1.2249   LearningRate 0.0002   Epoch: 19   Global Step: 790460   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:44,960-Speed 2627.78 samples/sec   Loss 1.1894   LearningRate 0.0002   Epoch: 19   Global Step: 790470   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:48,867-Speed 2621.93 samples/sec   Loss 1.1984   LearningRate 0.0002   Epoch: 19   Global Step: 790480   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:35:52,762-Speed 2629.54 samples/sec   Loss 1.1696   LearningRate 0.0002   Epoch: 19   Global Step: 790490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:35:56,658-Speed 2628.96 samples/sec   Loss 1.1906   LearningRate 0.0002   Epoch: 19   Global Step: 790500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:00,553-Speed 2629.46 samples/sec   Loss 1.2379   LearningRate 0.0002   Epoch: 19   Global Step: 790510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:04,447-Speed 2630.26 samples/sec   Loss 1.1811   LearningRate 0.0002   Epoch: 19   Global Step: 790520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:08,357-Speed 2619.27 samples/sec   Loss 1.1861   LearningRate 0.0002   Epoch: 19   Global Step: 790530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:12,273-Speed 2616.72 samples/sec   Loss 1.2273   LearningRate 0.0002   Epoch: 19   Global Step: 790540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:16,171-Speed 2627.10 samples/sec   Loss 1.1567   LearningRate 0.0002   Epoch: 19   Global Step: 790550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:20,074-Speed 2625.23 samples/sec   Loss 1.1825   LearningRate 0.0002   Epoch: 19   Global Step: 790560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:23,976-Speed 2624.91 samples/sec   Loss 1.2149   LearningRate 0.0002   Epoch: 19   Global Step: 790570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:27,871-Speed 2629.73 samples/sec   Loss 1.1683   LearningRate 0.0002   Epoch: 19   Global Step: 790580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:31,801-Speed 2606.34 samples/sec   Loss 1.1916   LearningRate 0.0002   Epoch: 19   Global Step: 790590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:36:35,700-Speed 2626.80 samples/sec   Loss 1.2232   LearningRate 0.0002   Epoch: 19   Global Step: 790600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:36:39,611-Speed 2618.62 samples/sec   Loss 1.1729   LearningRate 0.0002   Epoch: 19   Global Step: 790610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:36:43,488-Speed 2642.08 samples/sec   Loss 1.2553   LearningRate 0.0002   Epoch: 19   Global Step: 790620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:47,385-Speed 2627.57 samples/sec   Loss 1.1940   LearningRate 0.0002   Epoch: 19   Global Step: 790630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:51,282-Speed 2628.57 samples/sec   Loss 1.2143   LearningRate 0.0002   Epoch: 19   Global Step: 790640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:55,185-Speed 2624.52 samples/sec   Loss 1.2264   LearningRate 0.0002   Epoch: 19   Global Step: 790650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:36:59,080-Speed 2630.06 samples/sec   Loss 1.1997   LearningRate 0.0002   Epoch: 19   Global Step: 790660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:02,980-Speed 2626.25 samples/sec   Loss 1.1592   LearningRate 0.0002   Epoch: 19   Global Step: 790670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:06,885-Speed 2622.26 samples/sec   Loss 1.1978   LearningRate 0.0002   Epoch: 19   Global Step: 790680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:10,781-Speed 2629.12 samples/sec   Loss 1.1994   LearningRate 0.0002   Epoch: 19   Global Step: 790690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:14,691-Speed 2619.35 samples/sec   Loss 1.2032   LearningRate 0.0002   Epoch: 19   Global Step: 790700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:18,592-Speed 2625.45 samples/sec   Loss 1.1962   LearningRate 0.0002   Epoch: 19   Global Step: 790710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:22,493-Speed 2625.82 samples/sec   Loss 1.1607   LearningRate 0.0002   Epoch: 19   Global Step: 790720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:37:26,369-Speed 2642.41 samples/sec   Loss 1.2362   LearningRate 0.0002   Epoch: 19   Global Step: 790730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:30,268-Speed 2627.17 samples/sec   Loss 1.1846   LearningRate 0.0002   Epoch: 19   Global Step: 790740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:34,166-Speed 2627.91 samples/sec   Loss 1.2295   LearningRate 0.0002   Epoch: 19   Global Step: 790750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:38,067-Speed 2625.60 samples/sec   Loss 1.1930   LearningRate 0.0002   Epoch: 19   Global Step: 790760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:41,961-Speed 2630.10 samples/sec   Loss 1.1545   LearningRate 0.0002   Epoch: 19   Global Step: 790770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:45,859-Speed 2627.47 samples/sec   Loss 1.2004   LearningRate 0.0002   Epoch: 19   Global Step: 790780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:49,756-Speed 2628.39 samples/sec   Loss 1.2470   LearningRate 0.0002   Epoch: 19   Global Step: 790790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:53,650-Speed 2630.55 samples/sec   Loss 1.1568   LearningRate 0.0002   Epoch: 19   Global Step: 790800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:37:57,544-Speed 2630.24 samples/sec   Loss 1.1869   LearningRate 0.0002   Epoch: 19   Global Step: 790810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:01,450-Speed 2622.00 samples/sec   Loss 1.1877   LearningRate 0.0002   Epoch: 19   Global Step: 790820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:05,345-Speed 2629.30 samples/sec   Loss 1.2055   LearningRate 0.0002   Epoch: 19   Global Step: 790830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:38:09,250-Speed 2623.15 samples/sec   Loss 1.1967   LearningRate 0.0002   Epoch: 19   Global Step: 790840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:38:13,148-Speed 2627.89 samples/sec   Loss 1.2008   LearningRate 0.0002   Epoch: 19   Global Step: 790850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:38:17,044-Speed 2629.15 samples/sec   Loss 1.1981   LearningRate 0.0002   Epoch: 19   Global Step: 790860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:38:20,945-Speed 2625.58 samples/sec   Loss 1.2145   LearningRate 0.0002   Epoch: 19   Global Step: 790870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:38:24,823-Speed 2641.20 samples/sec   Loss 1.1858   LearningRate 0.0002   Epoch: 19   Global Step: 790880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:28,737-Speed 2617.21 samples/sec   Loss 1.1591   LearningRate 0.0002   Epoch: 19   Global Step: 790890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:32,638-Speed 2625.78 samples/sec   Loss 1.2240   LearningRate 0.0002   Epoch: 19   Global Step: 790900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:36,543-Speed 2622.77 samples/sec   Loss 1.1847   LearningRate 0.0002   Epoch: 19   Global Step: 790910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:40,473-Speed 2605.98 samples/sec   Loss 1.1812   LearningRate 0.0002   Epoch: 19   Global Step: 790920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:44,385-Speed 2618.08 samples/sec   Loss 1.2411   LearningRate 0.0002   Epoch: 19   Global Step: 790930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:48,286-Speed 2626.02 samples/sec   Loss 1.2470   LearningRate 0.0002   Epoch: 19   Global Step: 790940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:52,184-Speed 2627.62 samples/sec   Loss 1.1873   LearningRate 0.0002   Epoch: 19   Global Step: 790950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:56,076-Speed 2631.86 samples/sec   Loss 1.2109   LearningRate 0.0002   Epoch: 19   Global Step: 790960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:38:59,983-Speed 2621.22 samples/sec   Loss 1.2161   LearningRate 0.0002   Epoch: 19   Global Step: 790970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:03,927-Speed 2596.88 samples/sec   Loss 1.1960   LearningRate 0.0002   Epoch: 19   Global Step: 790980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:39:07,809-Speed 2638.38 samples/sec   Loss 1.1876   LearningRate 0.0002   Epoch: 19   Global Step: 790990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:11,712-Speed 2624.02 samples/sec   Loss 1.2117   LearningRate 0.0002   Epoch: 19   Global Step: 791000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:15,610-Speed 2627.13 samples/sec   Loss 1.2012   LearningRate 0.0002   Epoch: 19   Global Step: 791010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:19,513-Speed 2624.63 samples/sec   Loss 1.2302   LearningRate 0.0002   Epoch: 19   Global Step: 791020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:23,408-Speed 2629.51 samples/sec   Loss 1.1527   LearningRate 0.0002   Epoch: 19   Global Step: 791030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:27,305-Speed 2628.83 samples/sec   Loss 1.2226   LearningRate 0.0002   Epoch: 19   Global Step: 791040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:31,204-Speed 2626.74 samples/sec   Loss 1.2037   LearningRate 0.0002   Epoch: 19   Global Step: 791050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:35,103-Speed 2627.17 samples/sec   Loss 1.2603   LearningRate 0.0002   Epoch: 19   Global Step: 791060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:39,002-Speed 2627.14 samples/sec   Loss 1.1754   LearningRate 0.0002   Epoch: 19   Global Step: 791070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:42,899-Speed 2627.70 samples/sec   Loss 1.1975   LearningRate 0.0002   Epoch: 19   Global Step: 791080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:46,795-Speed 2628.79 samples/sec   Loss 1.1386   LearningRate 0.0002   Epoch: 19   Global Step: 791090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:39:50,703-Speed 2621.26 samples/sec   Loss 1.2343   LearningRate 0.0002   Epoch: 19   Global Step: 791100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:39:54,578-Speed 2642.92 samples/sec   Loss 1.2178   LearningRate 0.0002   Epoch: 19   Global Step: 791110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:39:58,489-Speed 2619.05 samples/sec   Loss 1.1769   LearningRate 0.0002   Epoch: 19   Global Step: 791120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:02,384-Speed 2629.53 samples/sec   Loss 1.2283   LearningRate 0.0002   Epoch: 19   Global Step: 791130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:06,279-Speed 2630.05 samples/sec   Loss 1.1889   LearningRate 0.0002   Epoch: 19   Global Step: 791140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:10,181-Speed 2624.79 samples/sec   Loss 1.3190   LearningRate 0.0002   Epoch: 19   Global Step: 791150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:14,078-Speed 2628.37 samples/sec   Loss 1.2544   LearningRate 0.0002   Epoch: 19   Global Step: 791160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:17,995-Speed 2615.01 samples/sec   Loss 1.1987   LearningRate 0.0002   Epoch: 19   Global Step: 791170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:21,893-Speed 2627.21 samples/sec   Loss 1.1431   LearningRate 0.0002   Epoch: 19   Global Step: 791180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:25,784-Speed 2632.96 samples/sec   Loss 1.1795   LearningRate 0.0002   Epoch: 19   Global Step: 791190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:29,681-Speed 2628.49 samples/sec   Loss 1.2264   LearningRate 0.0002   Epoch: 19   Global Step: 791200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:33,576-Speed 2629.72 samples/sec   Loss 1.2157   LearningRate 0.0002   Epoch: 19   Global Step: 791210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:40:37,478-Speed 2625.15 samples/sec   Loss 1.2083   LearningRate 0.0002   Epoch: 19   Global Step: 791220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:41,380-Speed 2624.87 samples/sec   Loss 1.1729   LearningRate 0.0002   Epoch: 19   Global Step: 791230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:45,315-Speed 2602.73 samples/sec   Loss 1.1539   LearningRate 0.0002   Epoch: 19   Global Step: 791240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:49,248-Speed 2604.35 samples/sec   Loss 1.1878   LearningRate 0.0002   Epoch: 19   Global Step: 791250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:53,173-Speed 2610.18 samples/sec   Loss 1.2700   LearningRate 0.0002   Epoch: 19   Global Step: 791260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:40:57,044-Speed 2645.75 samples/sec   Loss 1.1495   LearningRate 0.0002   Epoch: 19   Global Step: 791270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:00,941-Speed 2628.34 samples/sec   Loss 1.2031   LearningRate 0.0002   Epoch: 19   Global Step: 791280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:04,836-Speed 2629.44 samples/sec   Loss 1.2015   LearningRate 0.0002   Epoch: 19   Global Step: 791290   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:08,775-Speed 2600.93 samples/sec   Loss 1.2192   LearningRate 0.0002   Epoch: 19   Global Step: 791300   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:12,675-Speed 2626.51 samples/sec   Loss 1.2467   LearningRate 0.0002   Epoch: 19   Global Step: 791310   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:16,574-Speed 2626.65 samples/sec   Loss 1.1816   LearningRate 0.0002   Epoch: 19   Global Step: 791320   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:20,495-Speed 2612.62 samples/sec   Loss 1.1715   LearningRate 0.0002   Epoch: 19   Global Step: 791330   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:24,392-Speed 2628.12 samples/sec   Loss 1.1828   LearningRate 0.0002   Epoch: 19   Global Step: 791340   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:28,288-Speed 2629.01 samples/sec   Loss 1.1624   LearningRate 0.0002   Epoch: 19   Global Step: 791350   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:32,182-Speed 2630.44 samples/sec   Loss 1.2221   LearningRate 0.0002   Epoch: 19   Global Step: 791360   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:41:36,100-Speed 2614.17 samples/sec   Loss 1.2144   LearningRate 0.0002   Epoch: 19   Global Step: 791370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:41:39,996-Speed 2628.40 samples/sec   Loss 1.1837   LearningRate 0.0002   Epoch: 19   Global Step: 791380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:41:43,893-Speed 2629.21 samples/sec   Loss 1.1755   LearningRate 0.0002   Epoch: 19   Global Step: 791390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:41:47,793-Speed 2626.40 samples/sec   Loss 1.2196   LearningRate 0.0002   Epoch: 19   Global Step: 791400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:41:51,690-Speed 2628.48 samples/sec   Loss 1.1569   LearningRate 0.0002   Epoch: 19   Global Step: 791410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:41:55,586-Speed 2628.71 samples/sec   Loss 1.2234   LearningRate 0.0002   Epoch: 19   Global Step: 791420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:41:59,501-Speed 2616.31 samples/sec   Loss 1.2292   LearningRate 0.0002   Epoch: 19   Global Step: 791430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:03,398-Speed 2628.15 samples/sec   Loss 1.1914   LearningRate 0.0002   Epoch: 19   Global Step: 791440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:07,309-Speed 2618.87 samples/sec   Loss 1.1817   LearningRate 0.0002   Epoch: 19   Global Step: 791450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:11,202-Speed 2630.45 samples/sec   Loss 1.1522   LearningRate 0.0002   Epoch: 19   Global Step: 791460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:15,100-Speed 2628.24 samples/sec   Loss 1.2002   LearningRate 0.0002   Epoch: 19   Global Step: 791470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:42:19,089-Speed 2567.61 samples/sec   Loss 1.2039   LearningRate 0.0002   Epoch: 19   Global Step: 791480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:42:22,982-Speed 2631.71 samples/sec   Loss 1.2071   LearningRate 0.0002   Epoch: 19   Global Step: 791490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:42:26,856-Speed 2643.35 samples/sec   Loss 1.2088   LearningRate 0.0002   Epoch: 19   Global Step: 791500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:30,748-Speed 2632.36 samples/sec   Loss 1.1633   LearningRate 0.0002   Epoch: 19   Global Step: 791510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:34,640-Speed 2630.96 samples/sec   Loss 1.2360   LearningRate 0.0002   Epoch: 19   Global Step: 791520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:38,536-Speed 2629.07 samples/sec   Loss 1.2262   LearningRate 0.0002   Epoch: 19   Global Step: 791530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:42,448-Speed 2618.11 samples/sec   Loss 1.1965   LearningRate 0.0002   Epoch: 19   Global Step: 791540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:46,356-Speed 2621.26 samples/sec   Loss 1.2270   LearningRate 0.0002   Epoch: 19   Global Step: 791550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:50,254-Speed 2627.80 samples/sec   Loss 1.2263   LearningRate 0.0002   Epoch: 19   Global Step: 791560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:54,180-Speed 2608.62 samples/sec   Loss 1.2129   LearningRate 0.0002   Epoch: 19   Global Step: 791570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:42:58,080-Speed 2626.94 samples/sec   Loss 1.1774   LearningRate 0.0002   Epoch: 19   Global Step: 791580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:01,992-Speed 2617.92 samples/sec   Loss 1.1908   LearningRate 0.0002   Epoch: 19   Global Step: 791590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:05,894-Speed 2624.76 samples/sec   Loss 1.1674   LearningRate 0.0002   Epoch: 19   Global Step: 791600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:43:09,811-Speed 2614.75 samples/sec   Loss 1.1513   LearningRate 0.0002   Epoch: 19   Global Step: 791610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:43:13,705-Speed 2630.72 samples/sec   Loss 1.2669   LearningRate 0.0002   Epoch: 19   Global Step: 791620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:43:17,590-Speed 2636.19 samples/sec   Loss 1.2508   LearningRate 0.0002   Epoch: 19   Global Step: 791630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:21,485-Speed 2630.35 samples/sec   Loss 1.1891   LearningRate 0.0002   Epoch: 19   Global Step: 791640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:25,382-Speed 2627.66 samples/sec   Loss 1.1737   LearningRate 0.0002   Epoch: 19   Global Step: 791650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:29,277-Speed 2630.49 samples/sec   Loss 1.1731   LearningRate 0.0002   Epoch: 19   Global Step: 791660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:33,169-Speed 2631.00 samples/sec   Loss 1.1665   LearningRate 0.0002   Epoch: 19   Global Step: 791670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:37,076-Speed 2621.90 samples/sec   Loss 1.1829   LearningRate 0.0002   Epoch: 19   Global Step: 791680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:40,972-Speed 2628.61 samples/sec   Loss 1.1909   LearningRate 0.0002   Epoch: 19   Global Step: 791690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:44,872-Speed 2626.59 samples/sec   Loss 1.1947   LearningRate 0.0002   Epoch: 19   Global Step: 791700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:48,761-Speed 2634.03 samples/sec   Loss 1.2159   LearningRate 0.0002   Epoch: 19   Global Step: 791710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:52,726-Speed 2609.81 samples/sec   Loss 1.2326   LearningRate 0.0002   Epoch: 19   Global Step: 791720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:43:56,599-Speed 2645.00 samples/sec   Loss 1.2153   LearningRate 0.0002   Epoch: 19   Global Step: 791730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:02,972-Speed 2628.67 samples/sec   Loss 1.2231   LearningRate 0.0002   Epoch: 19   Global Step: 791740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:06,863-Speed 2632.34 samples/sec   Loss 1.1921   LearningRate 0.0002   Epoch: 19   Global Step: 791750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:10,807-Speed 2596.62 samples/sec   Loss 1.1710   LearningRate 0.0002   Epoch: 19   Global Step: 791760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:14,907-Speed 2588.42 samples/sec   Loss 1.1839   LearningRate 0.0002   Epoch: 19   Global Step: 791770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:18,797-Speed 2633.45 samples/sec   Loss 1.2138   LearningRate 0.0002   Epoch: 19   Global Step: 791780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:22,686-Speed 2633.39 samples/sec   Loss 1.2090   LearningRate 0.0002   Epoch: 19   Global Step: 791790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:26,576-Speed 2632.80 samples/sec   Loss 1.1787   LearningRate 0.0002   Epoch: 19   Global Step: 791800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:30,865-Speed 2632.84 samples/sec   Loss 1.1745   LearningRate 0.0002   Epoch: 19   Global Step: 791810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:34,764-Speed 2629.15 samples/sec   Loss 1.1904   LearningRate 0.0002   Epoch: 19   Global Step: 791820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:39,091-Speed 2636.28 samples/sec   Loss 1.2000   LearningRate 0.0002   Epoch: 19   Global Step: 791830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:44:42,957-Speed 2648.95 samples/sec   Loss 1.2025   LearningRate 0.0002   Epoch: 19   Global Step: 791840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:47,024-Speed 2602.80 samples/sec   Loss 1.1877   LearningRate 0.0002   Epoch: 19   Global Step: 791850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:50,920-Speed 2629.10 samples/sec   Loss 1.1959   LearningRate 0.0002   Epoch: 19   Global Step: 791860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:54,832-Speed 2618.15 samples/sec   Loss 1.1822   LearningRate 0.0002   Epoch: 19   Global Step: 791870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:44:58,744-Speed 2618.13 samples/sec   Loss 1.1747   LearningRate 0.0002   Epoch: 19   Global Step: 791880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:45:02,638-Speed 2630.35 samples/sec   Loss 1.2292   LearningRate 0.0002   Epoch: 19   Global Step: 791890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:45:06,530-Speed 2632.01 samples/sec   Loss 1.1953   LearningRate 0.0002   Epoch: 19   Global Step: 791900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:45:10,426-Speed 2628.49 samples/sec   Loss 1.2018   LearningRate 0.0002   Epoch: 19   Global Step: 791910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:45:14,298-Speed 2645.44 samples/sec   Loss 1.1519   LearningRate 0.0002   Epoch: 19   Global Step: 791920   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:18,195-Speed 2628.41 samples/sec   Loss 1.2664   LearningRate 0.0002   Epoch: 19   Global Step: 791930   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:22,099-Speed 2623.44 samples/sec   Loss 1.1155   LearningRate 0.0002   Epoch: 19   Global Step: 791940   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:25,996-Speed 2628.13 samples/sec   Loss 1.1813   LearningRate 0.0002   Epoch: 19   Global Step: 791950   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:29,890-Speed 2631.07 samples/sec   Loss 1.2064   LearningRate 0.0002   Epoch: 19   Global Step: 791960   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:33,783-Speed 2630.89 samples/sec   Loss 1.2115   LearningRate 0.0002   Epoch: 19   Global Step: 791970   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:37,675-Speed 2632.13 samples/sec   Loss 1.1589   LearningRate 0.0002   Epoch: 19   Global Step: 791980   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:41,569-Speed 2630.46 samples/sec   Loss 1.2188   LearningRate 0.0002   Epoch: 19   Global Step: 791990   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:45,464-Speed 2629.78 samples/sec   Loss 1.1949   LearningRate 0.0002   Epoch: 19   Global Step: 792000   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:49,358-Speed 2629.57 samples/sec   Loss 1.2077   LearningRate 0.0002   Epoch: 19   Global Step: 792010   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:45:53,272-Speed 2617.46 samples/sec   Loss 1.2123   LearningRate 0.0002   Epoch: 19   Global Step: 792020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:45:57,163-Speed 2631.91 samples/sec   Loss 1.2242   LearningRate 0.0002   Epoch: 19   Global Step: 792030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:01,056-Speed 2631.66 samples/sec   Loss 1.1915   LearningRate 0.0002   Epoch: 19   Global Step: 792040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:04,961-Speed 2622.30 samples/sec   Loss 1.1765   LearningRate 0.0002   Epoch: 19   Global Step: 792050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:08,905-Speed 2597.53 samples/sec   Loss 1.2376   LearningRate 0.0002   Epoch: 19   Global Step: 792060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:12,805-Speed 2626.23 samples/sec   Loss 1.1505   LearningRate 0.0002   Epoch: 19   Global Step: 792070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:16,728-Speed 2611.54 samples/sec   Loss 1.2091   LearningRate 0.0002   Epoch: 19   Global Step: 792080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:20,629-Speed 2625.47 samples/sec   Loss 1.1592   LearningRate 0.0002   Epoch: 19   Global Step: 792090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:24,522-Speed 2631.17 samples/sec   Loss 1.1851   LearningRate 0.0002   Epoch: 19   Global Step: 792100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:28,413-Speed 2631.87 samples/sec   Loss 1.1766   LearningRate 0.0002   Epoch: 19   Global Step: 792110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:32,312-Speed 2626.63 samples/sec   Loss 1.1896   LearningRate 0.0002   Epoch: 19   Global Step: 792120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:36,221-Speed 2620.57 samples/sec   Loss 1.1714   LearningRate 0.0002   Epoch: 19   Global Step: 792130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:40,111-Speed 2633.72 samples/sec   Loss 1.2086   LearningRate 0.0002   Epoch: 19   Global Step: 792140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:44,002-Speed 2632.50 samples/sec   Loss 1.1951   LearningRate 0.0002   Epoch: 19   Global Step: 792150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:47,912-Speed 2619.61 samples/sec   Loss 1.2149   LearningRate 0.0002   Epoch: 19   Global Step: 792160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:51,804-Speed 2631.99 samples/sec   Loss 1.2423   LearningRate 0.0002   Epoch: 19   Global Step: 792170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:55,693-Speed 2633.88 samples/sec   Loss 1.2162   LearningRate 0.0002   Epoch: 19   Global Step: 792180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:46:59,570-Speed 2641.86 samples/sec   Loss 1.1869   LearningRate 0.0002   Epoch: 19   Global Step: 792190   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:03,472-Speed 2624.58 samples/sec   Loss 1.2390   LearningRate 0.0002   Epoch: 19   Global Step: 792200   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:07,389-Speed 2614.74 samples/sec   Loss 1.2054   LearningRate 0.0002   Epoch: 19   Global Step: 792210   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:11,292-Speed 2624.68 samples/sec   Loss 1.1881   LearningRate 0.0002   Epoch: 19   Global Step: 792220   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:15,189-Speed 2628.91 samples/sec   Loss 1.2160   LearningRate 0.0002   Epoch: 19   Global Step: 792230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:19,083-Speed 2630.22 samples/sec   Loss 1.1714   LearningRate 0.0002   Epoch: 19   Global Step: 792240   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:22,980-Speed 2628.11 samples/sec   Loss 1.2539   LearningRate 0.0002   Epoch: 19   Global Step: 792250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:26,873-Speed 2631.33 samples/sec   Loss 1.1677   LearningRate 0.0002   Epoch: 19   Global Step: 792260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:30,771-Speed 2628.17 samples/sec   Loss 1.1804   LearningRate 0.0002   Epoch: 19   Global Step: 792270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:34,671-Speed 2626.24 samples/sec   Loss 1.2084   LearningRate 0.0002   Epoch: 19   Global Step: 792280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:47:38,582-Speed 2618.75 samples/sec   Loss 1.2097   LearningRate 0.0002   Epoch: 19   Global Step: 792290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:47:42,479-Speed 2628.21 samples/sec   Loss 1.2010   LearningRate 0.0002   Epoch: 19   Global Step: 792300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:47:46,375-Speed 2630.05 samples/sec   Loss 1.1979   LearningRate 0.0002   Epoch: 19   Global Step: 792310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:47:50,275-Speed 2626.20 samples/sec   Loss 1.2017   LearningRate 0.0002   Epoch: 19   Global Step: 792320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:47:54,169-Speed 2630.23 samples/sec   Loss 1.1570   LearningRate 0.0002   Epoch: 19   Global Step: 792330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:47:58,074-Speed 2623.12 samples/sec   Loss 1.1892   LearningRate 0.0002   Epoch: 19   Global Step: 792340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:01,969-Speed 2629.80 samples/sec   Loss 1.2185   LearningRate 0.0002   Epoch: 19   Global Step: 792350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:05,869-Speed 2626.63 samples/sec   Loss 1.1789   LearningRate 0.0002   Epoch: 19   Global Step: 792360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:09,760-Speed 2632.25 samples/sec   Loss 1.1942   LearningRate 0.0002   Epoch: 19   Global Step: 792370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:13,658-Speed 2627.79 samples/sec   Loss 1.1453   LearningRate 0.0002   Epoch: 19   Global Step: 792380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:17,558-Speed 2626.27 samples/sec   Loss 1.1740   LearningRate 0.0002   Epoch: 19   Global Step: 792390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:48:21,472-Speed 2617.71 samples/sec   Loss 1.2013   LearningRate 0.0002   Epoch: 19   Global Step: 792400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:48:25,378-Speed 2622.03 samples/sec   Loss 1.2343   LearningRate 0.0002   Epoch: 19   Global Step: 792410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:29,356-Speed 2575.21 samples/sec   Loss 1.2076   LearningRate 0.0002   Epoch: 19   Global Step: 792420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:33,278-Speed 2611.40 samples/sec   Loss 1.1735   LearningRate 0.0002   Epoch: 19   Global Step: 792430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:48:37,151-Speed 2644.53 samples/sec   Loss 1.1766   LearningRate 0.0002   Epoch: 19   Global Step: 792440   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:48:41,113-Speed 2585.73 samples/sec   Loss 1.1877   LearningRate 0.0002   Epoch: 19   Global Step: 792450   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:48:45,049-Speed 2602.37 samples/sec   Loss 1.1900   LearningRate 0.0002   Epoch: 19   Global Step: 792460   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:48:48,946-Speed 2627.98 samples/sec   Loss 1.1663   LearningRate 0.0002   Epoch: 19   Global Step: 792470   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:48:52,841-Speed 2629.61 samples/sec   Loss 1.1761   LearningRate 0.0002   Epoch: 19   Global Step: 792480   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:48:56,735-Speed 2630.43 samples/sec   Loss 1.2037   LearningRate 0.0002   Epoch: 19   Global Step: 792490   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:49:00,627-Speed 2632.41 samples/sec   Loss 1.1260   LearningRate 0.0002   Epoch: 19   Global Step: 792500   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:49:04,521-Speed 2629.67 samples/sec   Loss 1.1995   LearningRate 0.0002   Epoch: 19   Global Step: 792510   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:49:08,417-Speed 2629.24 samples/sec   Loss 1.2235   LearningRate 0.0002   Epoch: 19   Global Step: 792520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:49:12,317-Speed 2625.87 samples/sec   Loss 1.1786   LearningRate 0.0002   Epoch: 19   Global Step: 792530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:49:16,217-Speed 2626.32 samples/sec   Loss 1.1682   LearningRate 0.0002   Epoch: 19   Global Step: 792540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:20,118-Speed 2625.58 samples/sec   Loss 1.2503   LearningRate 0.0002   Epoch: 19   Global Step: 792550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:24,021-Speed 2624.38 samples/sec   Loss 1.1984   LearningRate 0.0002   Epoch: 19   Global Step: 792560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:27,913-Speed 2631.41 samples/sec   Loss 1.1827   LearningRate 0.0002   Epoch: 19   Global Step: 792570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:31,806-Speed 2631.59 samples/sec   Loss 1.1570   LearningRate 0.0002   Epoch: 19   Global Step: 792580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:35,700-Speed 2630.12 samples/sec   Loss 1.2058   LearningRate 0.0002   Epoch: 19   Global Step: 792590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:39,599-Speed 2626.72 samples/sec   Loss 1.1784   LearningRate 0.0002   Epoch: 19   Global Step: 792600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:43,508-Speed 2620.39 samples/sec   Loss 1.2443   LearningRate 0.0002   Epoch: 19   Global Step: 792610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:47,406-Speed 2626.93 samples/sec   Loss 1.2422   LearningRate 0.0002   Epoch: 19   Global Step: 792620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:51,303-Speed 2632.02 samples/sec   Loss 1.1689   LearningRate 0.0002   Epoch: 19   Global Step: 792630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:49:55,205-Speed 2624.94 samples/sec   Loss 1.1846   LearningRate 0.0002   Epoch: 19   Global Step: 792640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:49:59,074-Speed 2647.46 samples/sec   Loss 1.1707   LearningRate 0.0002   Epoch: 19   Global Step: 792650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:50:02,942-Speed 2647.87 samples/sec   Loss 1.1696   LearningRate 0.0002   Epoch: 19   Global Step: 792660   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:06,883-Speed 2598.91 samples/sec   Loss 1.2421   LearningRate 0.0002   Epoch: 19   Global Step: 792670   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:10,779-Speed 2629.36 samples/sec   Loss 1.1928   LearningRate 0.0002   Epoch: 19   Global Step: 792680   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:14,672-Speed 2631.34 samples/sec   Loss 1.1575   LearningRate 0.0002   Epoch: 19   Global Step: 792690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:18,570-Speed 2627.51 samples/sec   Loss 1.1844   LearningRate 0.0002   Epoch: 19   Global Step: 792700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:22,467-Speed 2627.72 samples/sec   Loss 1.2167   LearningRate 0.0002   Epoch: 19   Global Step: 792710   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:26,359-Speed 2632.08 samples/sec   Loss 1.2236   LearningRate 0.0002   Epoch: 19   Global Step: 792720   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:30,265-Speed 2621.92 samples/sec   Loss 1.1461   LearningRate 0.0002   Epoch: 19   Global Step: 792730   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:34,175-Speed 2619.42 samples/sec   Loss 1.1810   LearningRate 0.0002   Epoch: 19   Global Step: 792740   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:38,075-Speed 2626.45 samples/sec   Loss 1.1660   LearningRate 0.0002   Epoch: 19   Global Step: 792750   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:50:41,974-Speed 2626.61 samples/sec   Loss 1.2076   LearningRate 0.0002   Epoch: 19   Global Step: 792760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:50:45,877-Speed 2624.78 samples/sec   Loss 1.1583   LearningRate 0.0002   Epoch: 19   Global Step: 792770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:50:49,774-Speed 2628.49 samples/sec   Loss 1.1883   LearningRate 0.0002   Epoch: 19   Global Step: 792780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:50:53,841-Speed 2517.89 samples/sec   Loss 1.1857   LearningRate 0.0002   Epoch: 19   Global Step: 792790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:50:57,801-Speed 2586.45 samples/sec   Loss 1.1899   LearningRate 0.0002   Epoch: 19   Global Step: 792800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:01,737-Speed 2602.07 samples/sec   Loss 1.1672   LearningRate 0.0002   Epoch: 19   Global Step: 792810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:05,825-Speed 2505.82 samples/sec   Loss 1.1725   LearningRate 0.0002   Epoch: 19   Global Step: 792820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:09,801-Speed 2575.58 samples/sec   Loss 1.1236   LearningRate 0.0002   Epoch: 19   Global Step: 792830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:13,711-Speed 2619.89 samples/sec   Loss 1.2105   LearningRate 0.0002   Epoch: 19   Global Step: 792840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:17,616-Speed 2623.05 samples/sec   Loss 1.1782   LearningRate 0.0002   Epoch: 19   Global Step: 792850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:21,490-Speed 2644.04 samples/sec   Loss 1.1982   LearningRate 0.0002   Epoch: 19   Global Step: 792860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:25,393-Speed 2624.51 samples/sec   Loss 1.1680   LearningRate 0.0002   Epoch: 19   Global Step: 792870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:29,356-Speed 2583.96 samples/sec   Loss 1.2173   LearningRate 0.0002   Epoch: 19   Global Step: 792880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:33,256-Speed 2626.45 samples/sec   Loss 1.1784   LearningRate 0.0002   Epoch: 19   Global Step: 792890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:37,152-Speed 2628.74 samples/sec   Loss 1.1684   LearningRate 0.0002   Epoch: 19   Global Step: 792900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:41,071-Speed 2613.62 samples/sec   Loss 1.2040   LearningRate 0.0002   Epoch: 19   Global Step: 792910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:44,979-Speed 2621.19 samples/sec   Loss 1.2339   LearningRate 0.0002   Epoch: 19   Global Step: 792920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:48,880-Speed 2625.28 samples/sec   Loss 1.1781   LearningRate 0.0002   Epoch: 19   Global Step: 792930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:52,779-Speed 2626.70 samples/sec   Loss 1.2147   LearningRate 0.0002   Epoch: 19   Global Step: 792940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:51:56,680-Speed 2625.49 samples/sec   Loss 1.1622   LearningRate 0.0002   Epoch: 19   Global Step: 792950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:00,587-Speed 2621.56 samples/sec   Loss 1.2080   LearningRate 0.0002   Epoch: 19   Global Step: 792960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:52:04,456-Speed 2647.70 samples/sec   Loss 1.1456   LearningRate 0.0002   Epoch: 19   Global Step: 792970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:08,369-Speed 2617.42 samples/sec   Loss 1.2143   LearningRate 0.0002   Epoch: 19   Global Step: 792980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:12,269-Speed 2626.49 samples/sec   Loss 1.2201   LearningRate 0.0002   Epoch: 19   Global Step: 792990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:16,165-Speed 2629.05 samples/sec   Loss 1.1792   LearningRate 0.0002   Epoch: 19   Global Step: 793000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:20,059-Speed 2630.56 samples/sec   Loss 1.1537   LearningRate 0.0002   Epoch: 19   Global Step: 793010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:23,951-Speed 2630.94 samples/sec   Loss 1.2077   LearningRate 0.0002   Epoch: 19   Global Step: 793020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:27,841-Speed 2633.19 samples/sec   Loss 1.1564   LearningRate 0.0002   Epoch: 19   Global Step: 793030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:31,734-Speed 2630.62 samples/sec   Loss 1.1895   LearningRate 0.0002   Epoch: 19   Global Step: 793040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:35,626-Speed 2631.91 samples/sec   Loss 1.1387   LearningRate 0.0002   Epoch: 19   Global Step: 793050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:39,534-Speed 2620.82 samples/sec   Loss 1.1856   LearningRate 0.0002   Epoch: 19   Global Step: 793060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:43,433-Speed 2627.33 samples/sec   Loss 1.1725   LearningRate 0.0002   Epoch: 19   Global Step: 793070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:52:47,345-Speed 2617.92 samples/sec   Loss 1.1876   LearningRate 0.0002   Epoch: 19   Global Step: 793080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:52:51,216-Speed 2646.08 samples/sec   Loss 1.1366   LearningRate 0.0002   Epoch: 19   Global Step: 793090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:55,108-Speed 2632.11 samples/sec   Loss 1.2058   LearningRate 0.0002   Epoch: 19   Global Step: 793100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:52:59,002-Speed 2629.66 samples/sec   Loss 1.1962   LearningRate 0.0002   Epoch: 19   Global Step: 793110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:02,901-Speed 2627.12 samples/sec   Loss 1.1857   LearningRate 0.0002   Epoch: 19   Global Step: 793120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:06,834-Speed 2603.84 samples/sec   Loss 1.2456   LearningRate 0.0002   Epoch: 19   Global Step: 793130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:10,731-Speed 2628.69 samples/sec   Loss 1.2150   LearningRate 0.0002   Epoch: 19   Global Step: 793140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:14,626-Speed 2629.17 samples/sec   Loss 1.1604   LearningRate 0.0002   Epoch: 19   Global Step: 793150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:18,521-Speed 2630.14 samples/sec   Loss 1.1963   LearningRate 0.0002   Epoch: 19   Global Step: 793160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:22,421-Speed 2626.05 samples/sec   Loss 1.2097   LearningRate 0.0002   Epoch: 19   Global Step: 793170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:26,318-Speed 2628.46 samples/sec   Loss 1.1942   LearningRate 0.0002   Epoch: 19   Global Step: 793180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:53:30,217-Speed 2627.08 samples/sec   Loss 1.2228   LearningRate 0.0002   Epoch: 19   Global Step: 793190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:53:34,123-Speed 2621.63 samples/sec   Loss 1.2277   LearningRate 0.0002   Epoch: 19   Global Step: 793200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:53:38,022-Speed 2627.08 samples/sec   Loss 1.2435   LearningRate 0.0002   Epoch: 19   Global Step: 793210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:53:41,970-Speed 2593.87 samples/sec   Loss 1.1866   LearningRate 0.0002   Epoch: 19   Global Step: 793220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:53:46,060-Speed 2505.08 samples/sec   Loss 1.1708   LearningRate 0.0002   Epoch: 19   Global Step: 793230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:53:50,047-Speed 2569.03 samples/sec   Loss 1.2329   LearningRate 0.0002   Epoch: 19   Global Step: 793240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:53:53,934-Speed 2634.95 samples/sec   Loss 1.1572   LearningRate 0.0002   Epoch: 19   Global Step: 793250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:53:57,835-Speed 2625.52 samples/sec   Loss 1.1866   LearningRate 0.0002   Epoch: 19   Global Step: 793260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:54:01,730-Speed 2629.92 samples/sec   Loss 1.1410   LearningRate 0.0002   Epoch: 19   Global Step: 793270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:54:05,629-Speed 2627.32 samples/sec   Loss 1.1608   LearningRate 0.0002   Epoch: 19   Global Step: 793280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:54:09,509-Speed 2639.85 samples/sec   Loss 1.1706   LearningRate 0.0002   Epoch: 19   Global Step: 793290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:13,409-Speed 2626.34 samples/sec   Loss 1.1641   LearningRate 0.0002   Epoch: 19   Global Step: 793300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:17,302-Speed 2631.08 samples/sec   Loss 1.1866   LearningRate 0.0002   Epoch: 19   Global Step: 793310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:21,196-Speed 2631.24 samples/sec   Loss 1.1622   LearningRate 0.0002   Epoch: 19   Global Step: 793320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:25,094-Speed 2627.35 samples/sec   Loss 1.1895   LearningRate 0.0002   Epoch: 19   Global Step: 793330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:29,005-Speed 2619.56 samples/sec   Loss 1.2029   LearningRate 0.0002   Epoch: 19   Global Step: 793340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:32,924-Speed 2613.23 samples/sec   Loss 1.1868   LearningRate 0.0002   Epoch: 19   Global Step: 793350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:36,820-Speed 2629.14 samples/sec   Loss 1.1952   LearningRate 0.0002   Epoch: 19   Global Step: 793360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:40,713-Speed 2630.99 samples/sec   Loss 1.2125   LearningRate 0.0002   Epoch: 19   Global Step: 793370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:44,612-Speed 2626.81 samples/sec   Loss 1.1511   LearningRate 0.0002   Epoch: 19   Global Step: 793380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:48,506-Speed 2630.39 samples/sec   Loss 1.1841   LearningRate 0.0002   Epoch: 19   Global Step: 793390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:54:52,374-Speed 2648.08 samples/sec   Loss 1.2246   LearningRate 0.0002   Epoch: 19   Global Step: 793400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:54:56,275-Speed 2625.58 samples/sec   Loss 1.1949   LearningRate 0.0002   Epoch: 19   Global Step: 793410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:00,170-Speed 2629.98 samples/sec   Loss 1.1954   LearningRate 0.0002   Epoch: 19   Global Step: 793420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:04,066-Speed 2629.47 samples/sec   Loss 1.1883   LearningRate 0.0002   Epoch: 19   Global Step: 793430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:07,966-Speed 2625.75 samples/sec   Loss 1.2242   LearningRate 0.0002   Epoch: 19   Global Step: 793440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:11,860-Speed 2630.44 samples/sec   Loss 1.1863   LearningRate 0.0002   Epoch: 19   Global Step: 793450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:15,752-Speed 2631.85 samples/sec   Loss 1.2012   LearningRate 0.0002   Epoch: 19   Global Step: 793460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:19,648-Speed 2628.23 samples/sec   Loss 1.1382   LearningRate 0.0002   Epoch: 19   Global Step: 793470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:23,546-Speed 2628.05 samples/sec   Loss 1.1463   LearningRate 0.0002   Epoch: 19   Global Step: 793480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:55:27,415-Speed 2647.32 samples/sec   Loss 1.1772   LearningRate 0.0002   Epoch: 19   Global Step: 793490   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:31,406-Speed 2567.11 samples/sec   Loss 1.2100   LearningRate 0.0002   Epoch: 19   Global Step: 793500   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:35,303-Speed 2628.36 samples/sec   Loss 1.1548   LearningRate 0.0002   Epoch: 19   Global Step: 793510   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:39,194-Speed 2631.82 samples/sec   Loss 1.1956   LearningRate 0.0002   Epoch: 19   Global Step: 793520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:43,086-Speed 2631.30 samples/sec   Loss 1.2218   LearningRate 0.0002   Epoch: 19   Global Step: 793530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:47,027-Speed 2599.06 samples/sec   Loss 1.1738   LearningRate 0.0002   Epoch: 19   Global Step: 793540   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:50,919-Speed 2631.89 samples/sec   Loss 1.1951   LearningRate 0.0002   Epoch: 19   Global Step: 793550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:54,832-Speed 2617.81 samples/sec   Loss 1.1539   LearningRate 0.0002   Epoch: 19   Global Step: 793560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:55:58,745-Speed 2617.89 samples/sec   Loss 1.2044   LearningRate 0.0002   Epoch: 19   Global Step: 793570   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:56:02,642-Speed 2628.72 samples/sec   Loss 1.1997   LearningRate 0.0002   Epoch: 19   Global Step: 793580   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:56:06,567-Speed 2609.35 samples/sec   Loss 1.1756   LearningRate 0.0002   Epoch: 19   Global Step: 793590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:10,466-Speed 2626.93 samples/sec   Loss 1.2243   LearningRate 0.0002   Epoch: 19   Global Step: 793600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:14,361-Speed 2629.24 samples/sec   Loss 1.2097   LearningRate 0.0002   Epoch: 19   Global Step: 793610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:18,262-Speed 2625.53 samples/sec   Loss 1.2035   LearningRate 0.0002   Epoch: 19   Global Step: 793620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:22,165-Speed 2624.44 samples/sec   Loss 1.1478   LearningRate 0.0002   Epoch: 19   Global Step: 793630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:26,065-Speed 2626.53 samples/sec   Loss 1.1928   LearningRate 0.0002   Epoch: 19   Global Step: 793640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:29,970-Speed 2623.04 samples/sec   Loss 1.1368   LearningRate 0.0002   Epoch: 19   Global Step: 793650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:33,869-Speed 2626.41 samples/sec   Loss 1.2080   LearningRate 0.0002   Epoch: 19   Global Step: 793660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:37,774-Speed 2623.55 samples/sec   Loss 1.2067   LearningRate 0.0002   Epoch: 19   Global Step: 793670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:41,665-Speed 2632.43 samples/sec   Loss 1.1597   LearningRate 0.0002   Epoch: 19   Global Step: 793680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:56:45,542-Speed 2641.87 samples/sec   Loss 1.1467   LearningRate 0.0002   Epoch: 19   Global Step: 793690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:56:49,441-Speed 2626.79 samples/sec   Loss 1.1766   LearningRate 0.0002   Epoch: 19   Global Step: 793700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:56:53,339-Speed 2627.95 samples/sec   Loss 1.1849   LearningRate 0.0002   Epoch: 19   Global Step: 793710   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:56:57,233-Speed 2630.83 samples/sec   Loss 1.1945   LearningRate 0.0002   Epoch: 19   Global Step: 793720   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:57:01,124-Speed 2632.16 samples/sec   Loss 1.1856   LearningRate 0.0002   Epoch: 19   Global Step: 793730   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:57:05,017-Speed 2630.43 samples/sec   Loss 1.2073   LearningRate 0.0002   Epoch: 19   Global Step: 793740   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:57:08,918-Speed 2625.85 samples/sec   Loss 1.1966   LearningRate 0.0002   Epoch: 19   Global Step: 793750   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:57:12,813-Speed 2629.87 samples/sec   Loss 1.1557   LearningRate 0.0002   Epoch: 19   Global Step: 793760   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:57:16,713-Speed 2626.78 samples/sec   Loss 1.1640   LearningRate 0.0002   Epoch: 19   Global Step: 793770   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:57:20,607-Speed 2630.37 samples/sec   Loss 1.2249   LearningRate 0.0002   Epoch: 19   Global Step: 793780   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 12:57:24,508-Speed 2625.99 samples/sec   Loss 1.1893   LearningRate 0.0002   Epoch: 19   Global Step: 793790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:28,444-Speed 2602.35 samples/sec   Loss 1.1824   LearningRate 0.0002   Epoch: 19   Global Step: 793800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:32,468-Speed 2545.16 samples/sec   Loss 1.1656   LearningRate 0.0002   Epoch: 19   Global Step: 793810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:36,362-Speed 2630.01 samples/sec   Loss 1.1595   LearningRate 0.0002   Epoch: 19   Global Step: 793820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:40,257-Speed 2630.53 samples/sec   Loss 1.1248   LearningRate 0.0002   Epoch: 19   Global Step: 793830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:44,149-Speed 2630.93 samples/sec   Loss 1.1624   LearningRate 0.0002   Epoch: 19   Global Step: 793840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:48,049-Speed 2626.69 samples/sec   Loss 1.2411   LearningRate 0.0002   Epoch: 19   Global Step: 793850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:51,946-Speed 2628.14 samples/sec   Loss 1.2237   LearningRate 0.0002   Epoch: 19   Global Step: 793860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:55,840-Speed 2630.77 samples/sec   Loss 1.1503   LearningRate 0.0002   Epoch: 19   Global Step: 793870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:57:59,730-Speed 2633.03 samples/sec   Loss 1.1427   LearningRate 0.0002   Epoch: 19   Global Step: 793880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:03,603-Speed 2644.26 samples/sec   Loss 1.1939   LearningRate 0.0002   Epoch: 19   Global Step: 793890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:07,497-Speed 2630.91 samples/sec   Loss 1.1583   LearningRate 0.0002   Epoch: 19   Global Step: 793900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:11,390-Speed 2630.94 samples/sec   Loss 1.1431   LearningRate 0.0002   Epoch: 19   Global Step: 793910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:15,285-Speed 2629.38 samples/sec   Loss 1.1902   LearningRate 0.0002   Epoch: 19   Global Step: 793920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:19,183-Speed 2628.00 samples/sec   Loss 1.1713   LearningRate 0.0002   Epoch: 19   Global Step: 793930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:23,087-Speed 2623.57 samples/sec   Loss 1.1699   LearningRate 0.0002   Epoch: 19   Global Step: 793940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:26,989-Speed 2626.06 samples/sec   Loss 1.2099   LearningRate 0.0002   Epoch: 19   Global Step: 793950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:30,884-Speed 2629.35 samples/sec   Loss 1.1670   LearningRate 0.0002   Epoch: 19   Global Step: 793960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:34,782-Speed 2627.62 samples/sec   Loss 1.1935   LearningRate 0.0002   Epoch: 19   Global Step: 793970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:38,676-Speed 2629.79 samples/sec   Loss 1.1798   LearningRate 0.0002   Epoch: 19   Global Step: 793980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:42,587-Speed 2619.41 samples/sec   Loss 1.1464   LearningRate 0.0002   Epoch: 19   Global Step: 793990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:58:46,465-Speed 2641.52 samples/sec   Loss 1.1999   LearningRate 0.0002   Epoch: 19   Global Step: 794000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:50,360-Speed 2629.55 samples/sec   Loss 1.2270   LearningRate 0.0002   Epoch: 19   Global Step: 794010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:54,278-Speed 2614.44 samples/sec   Loss 1.1853   LearningRate 0.0002   Epoch: 19   Global Step: 794020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:58:58,184-Speed 2622.51 samples/sec   Loss 1.1480   LearningRate 0.0002   Epoch: 19   Global Step: 794030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:02,127-Speed 2597.47 samples/sec   Loss 1.1518   LearningRate 0.0002   Epoch: 19   Global Step: 794040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:06,025-Speed 2627.45 samples/sec   Loss 1.1941   LearningRate 0.0002   Epoch: 19   Global Step: 794050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:09,936-Speed 2619.33 samples/sec   Loss 1.1831   LearningRate 0.0002   Epoch: 19   Global Step: 794060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:13,839-Speed 2624.35 samples/sec   Loss 1.2255   LearningRate 0.0002   Epoch: 19   Global Step: 794070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:17,733-Speed 2630.47 samples/sec   Loss 1.1796   LearningRate 0.0002   Epoch: 19   Global Step: 794080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:21,625-Speed 2632.65 samples/sec   Loss 1.1913   LearningRate 0.0002   Epoch: 19   Global Step: 794090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:25,544-Speed 2613.27 samples/sec   Loss 1.2271   LearningRate 0.0002   Epoch: 19   Global Step: 794100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:59:29,438-Speed 2630.37 samples/sec   Loss 1.2110   LearningRate 0.0002   Epoch: 19   Global Step: 794110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:59:33,340-Speed 2625.24 samples/sec   Loss 1.1691   LearningRate 0.0002   Epoch: 19   Global Step: 794120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 12:59:37,241-Speed 2625.34 samples/sec   Loss 1.1936   LearningRate 0.0002   Epoch: 19   Global Step: 794130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:41,142-Speed 2625.24 samples/sec   Loss 1.2280   LearningRate 0.0002   Epoch: 19   Global Step: 794140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:45,044-Speed 2624.98 samples/sec   Loss 1.1997   LearningRate 0.0002   Epoch: 19   Global Step: 794150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:48,944-Speed 2626.36 samples/sec   Loss 1.1776   LearningRate 0.0002   Epoch: 19   Global Step: 794160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:52,843-Speed 2627.65 samples/sec   Loss 1.2298   LearningRate 0.0002   Epoch: 19   Global Step: 794170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 12:59:56,737-Speed 2630.51 samples/sec   Loss 1.2001   LearningRate 0.0002   Epoch: 19   Global Step: 794180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:00,631-Speed 2630.49 samples/sec   Loss 1.1751   LearningRate 0.0002   Epoch: 19   Global Step: 794190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:04,531-Speed 2625.70 samples/sec   Loss 1.1890   LearningRate 0.0002   Epoch: 19   Global Step: 794200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:08,427-Speed 2629.30 samples/sec   Loss 1.1685   LearningRate 0.0002   Epoch: 19   Global Step: 794210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:12,320-Speed 2630.56 samples/sec   Loss 1.1690   LearningRate 0.0002   Epoch: 19   Global Step: 794220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:16,192-Speed 2645.80 samples/sec   Loss 1.2366   LearningRate 0.0002   Epoch: 19   Global Step: 794230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:20,082-Speed 2632.69 samples/sec   Loss 1.2234   LearningRate 0.0002   Epoch: 19   Global Step: 794240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:23,991-Speed 2620.66 samples/sec   Loss 1.2084   LearningRate 0.0002   Epoch: 19   Global Step: 794250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:27,898-Speed 2621.71 samples/sec   Loss 1.2252   LearningRate 0.0002   Epoch: 19   Global Step: 794260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:31,834-Speed 2602.55 samples/sec   Loss 1.1749   LearningRate 0.0002   Epoch: 19   Global Step: 794270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:35,723-Speed 2634.15 samples/sec   Loss 1.2169   LearningRate 0.0002   Epoch: 19   Global Step: 794280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:39,615-Speed 2631.71 samples/sec   Loss 1.1969   LearningRate 0.0002   Epoch: 19   Global Step: 794290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:43,511-Speed 2628.94 samples/sec   Loss 1.2019   LearningRate 0.0002   Epoch: 19   Global Step: 794300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:47,417-Speed 2622.56 samples/sec   Loss 1.2027   LearningRate 0.0002   Epoch: 19   Global Step: 794310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:51,312-Speed 2629.32 samples/sec   Loss 1.2074   LearningRate 0.0002   Epoch: 19   Global Step: 794320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:00:55,209-Speed 2629.04 samples/sec   Loss 1.1601   LearningRate 0.0002   Epoch: 19   Global Step: 794330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:00:59,127-Speed 2614.14 samples/sec   Loss 1.1938   LearningRate 0.0002   Epoch: 19   Global Step: 794340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:03,019-Speed 2631.87 samples/sec   Loss 1.1908   LearningRate 0.0002   Epoch: 19   Global Step: 794350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:06,929-Speed 2619.65 samples/sec   Loss 1.1922   LearningRate 0.0002   Epoch: 19   Global Step: 794360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:10,839-Speed 2619.18 samples/sec   Loss 1.2292   LearningRate 0.0002   Epoch: 19   Global Step: 794370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:14,735-Speed 2628.96 samples/sec   Loss 1.2291   LearningRate 0.0002   Epoch: 19   Global Step: 794380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:18,626-Speed 2632.83 samples/sec   Loss 1.1844   LearningRate 0.0002   Epoch: 19   Global Step: 794390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:22,522-Speed 2629.20 samples/sec   Loss 1.1912   LearningRate 0.0002   Epoch: 19   Global Step: 794400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:26,414-Speed 2630.95 samples/sec   Loss 1.1767   LearningRate 0.0002   Epoch: 19   Global Step: 794410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:30,306-Speed 2632.26 samples/sec   Loss 1.2096   LearningRate 0.0002   Epoch: 19   Global Step: 794420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:34,202-Speed 2629.05 samples/sec   Loss 1.1612   LearningRate 0.0002   Epoch: 19   Global Step: 794430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:38,119-Speed 2615.10 samples/sec   Loss 1.1467   LearningRate 0.0002   Epoch: 19   Global Step: 794440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:01:42,027-Speed 2620.81 samples/sec   Loss 1.1935   LearningRate 0.0002   Epoch: 19   Global Step: 794450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:01:45,946-Speed 2613.87 samples/sec   Loss 1.1909   LearningRate 0.0002   Epoch: 19   Global Step: 794460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:01:49,839-Speed 2630.66 samples/sec   Loss 1.1634   LearningRate 0.0002   Epoch: 19   Global Step: 794470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:01:53,712-Speed 2645.01 samples/sec   Loss 1.2121   LearningRate 0.0002   Epoch: 19   Global Step: 794480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:01:57,618-Speed 2622.29 samples/sec   Loss 1.1839   LearningRate 0.0002   Epoch: 19   Global Step: 794490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:02:01,511-Speed 2631.57 samples/sec   Loss 1.1705   LearningRate 0.0002   Epoch: 19   Global Step: 794500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:02:05,421-Speed 2619.35 samples/sec   Loss 1.1172   LearningRate 0.0002   Epoch: 19   Global Step: 794510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:02:09,290-Speed 2647.40 samples/sec   Loss 1.1404   LearningRate 0.0002   Epoch: 19   Global Step: 794520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:13,185-Speed 2629.72 samples/sec   Loss 1.1892   LearningRate 0.0002   Epoch: 19   Global Step: 794530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:17,078-Speed 2631.01 samples/sec   Loss 1.1782   LearningRate 0.0002   Epoch: 19   Global Step: 794540   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:20,974-Speed 2629.05 samples/sec   Loss 1.1691   LearningRate 0.0002   Epoch: 19   Global Step: 794550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:24,870-Speed 2629.25 samples/sec   Loss 1.2213   LearningRate 0.0002   Epoch: 19   Global Step: 794560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:28,776-Speed 2621.81 samples/sec   Loss 1.1659   LearningRate 0.0002   Epoch: 19   Global Step: 794570   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:32,670-Speed 2630.83 samples/sec   Loss 1.1758   LearningRate 0.0002   Epoch: 19   Global Step: 794580   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:36,564-Speed 2630.25 samples/sec   Loss 1.1816   LearningRate 0.0002   Epoch: 19   Global Step: 794590   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:40,490-Speed 2608.48 samples/sec   Loss 1.1751   LearningRate 0.0002   Epoch: 19   Global Step: 794600   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:44,384-Speed 2630.18 samples/sec   Loss 1.2020   LearningRate 0.0002   Epoch: 19   Global Step: 794610   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:02:48,283-Speed 2627.64 samples/sec   Loss 1.1473   LearningRate 0.0002   Epoch: 19   Global Step: 794620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:02:52,181-Speed 2627.88 samples/sec   Loss 1.1607   LearningRate 0.0002   Epoch: 19   Global Step: 794630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:02:56,074-Speed 2630.80 samples/sec   Loss 1.2082   LearningRate 0.0002   Epoch: 19   Global Step: 794640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:02:59,984-Speed 2619.67 samples/sec   Loss 1.1885   LearningRate 0.0002   Epoch: 19   Global Step: 794650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:03,889-Speed 2622.76 samples/sec   Loss 1.1742   LearningRate 0.0002   Epoch: 19   Global Step: 794660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:07,791-Speed 2625.27 samples/sec   Loss 1.1869   LearningRate 0.0002   Epoch: 19   Global Step: 794670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:11,683-Speed 2631.48 samples/sec   Loss 1.2075   LearningRate 0.0002   Epoch: 19   Global Step: 794680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:15,688-Speed 2557.74 samples/sec   Loss 1.1925   LearningRate 0.0002   Epoch: 19   Global Step: 794690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:19,730-Speed 2533.90 samples/sec   Loss 1.1676   LearningRate 0.0002   Epoch: 19   Global Step: 794700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:23,627-Speed 2628.53 samples/sec   Loss 1.1812   LearningRate 0.0002   Epoch: 19   Global Step: 794710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:27,537-Speed 2619.49 samples/sec   Loss 1.1606   LearningRate 0.0002   Epoch: 19   Global Step: 794720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:03:31,433-Speed 2629.34 samples/sec   Loss 1.1867   LearningRate 0.0002   Epoch: 19   Global Step: 794730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:03:35,373-Speed 2599.98 samples/sec   Loss 1.1544   LearningRate 0.0002   Epoch: 19   Global Step: 794740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:03:39,240-Speed 2648.55 samples/sec   Loss 1.2021   LearningRate 0.0002   Epoch: 19   Global Step: 794750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:43,174-Speed 2603.65 samples/sec   Loss 1.1681   LearningRate 0.0002   Epoch: 19   Global Step: 794760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:47,081-Speed 2621.85 samples/sec   Loss 1.1734   LearningRate 0.0002   Epoch: 19   Global Step: 794770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:50,981-Speed 2626.57 samples/sec   Loss 1.1205   LearningRate 0.0002   Epoch: 19   Global Step: 794780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:54,876-Speed 2629.94 samples/sec   Loss 1.1765   LearningRate 0.0002   Epoch: 19   Global Step: 794790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:03:58,768-Speed 2631.37 samples/sec   Loss 1.2031   LearningRate 0.0002   Epoch: 19   Global Step: 794800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:02,658-Speed 2633.23 samples/sec   Loss 1.1554   LearningRate 0.0002   Epoch: 19   Global Step: 794810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:06,550-Speed 2631.34 samples/sec   Loss 1.2031   LearningRate 0.0002   Epoch: 19   Global Step: 794820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:10,447-Speed 2628.59 samples/sec   Loss 1.1933   LearningRate 0.0002   Epoch: 19   Global Step: 794830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:14,346-Speed 2626.39 samples/sec   Loss 1.1675   LearningRate 0.0002   Epoch: 19   Global Step: 794840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:18,240-Speed 2630.75 samples/sec   Loss 1.2001   LearningRate 0.0002   Epoch: 19   Global Step: 794850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:04:22,138-Speed 2628.08 samples/sec   Loss 1.1355   LearningRate 0.0002   Epoch: 19   Global Step: 794860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:04:26,030-Speed 2631.75 samples/sec   Loss 1.1740   LearningRate 0.0002   Epoch: 19   Global Step: 794870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:04:29,934-Speed 2623.21 samples/sec   Loss 1.1515   LearningRate 0.0002   Epoch: 19   Global Step: 794880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:04:33,833-Speed 2627.26 samples/sec   Loss 1.1553   LearningRate 0.0002   Epoch: 19   Global Step: 794890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:04:37,729-Speed 2628.27 samples/sec   Loss 1.2539   LearningRate 0.0002   Epoch: 19   Global Step: 794900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:04:41,628-Speed 2627.08 samples/sec   Loss 1.1980   LearningRate 0.0002   Epoch: 19   Global Step: 794910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:04:45,515-Speed 2635.54 samples/sec   Loss 1.2099   LearningRate 0.0002   Epoch: 19   Global Step: 794920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:49,411-Speed 2629.05 samples/sec   Loss 1.1528   LearningRate 0.0002   Epoch: 19   Global Step: 794930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:53,305-Speed 2630.49 samples/sec   Loss 1.1912   LearningRate 0.0002   Epoch: 19   Global Step: 794940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:04:57,218-Speed 2617.58 samples/sec   Loss 1.1353   LearningRate 0.0002   Epoch: 19   Global Step: 794950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:01,116-Speed 2627.65 samples/sec   Loss 1.1735   LearningRate 0.0002   Epoch: 19   Global Step: 794960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:05,016-Speed 2626.65 samples/sec   Loss 1.2051   LearningRate 0.0002   Epoch: 19   Global Step: 794970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:08,913-Speed 2627.58 samples/sec   Loss 1.1813   LearningRate 0.0002   Epoch: 19   Global Step: 794980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:12,819-Speed 2622.44 samples/sec   Loss 1.1444   LearningRate 0.0002   Epoch: 19   Global Step: 794990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:16,716-Speed 2628.37 samples/sec   Loss 1.2074   LearningRate 0.0002   Epoch: 19   Global Step: 795000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:20,613-Speed 2628.66 samples/sec   Loss 1.1621   LearningRate 0.0002   Epoch: 19   Global Step: 795010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:24,509-Speed 2628.80 samples/sec   Loss 1.1468   LearningRate 0.0002   Epoch: 19   Global Step: 795020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:05:28,407-Speed 2627.45 samples/sec   Loss 1.1894   LearningRate 0.0002   Epoch: 19   Global Step: 795030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:05:32,290-Speed 2637.89 samples/sec   Loss 1.1809   LearningRate 0.0002   Epoch: 19   Global Step: 795040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:36,188-Speed 2627.90 samples/sec   Loss 1.2056   LearningRate 0.0002   Epoch: 19   Global Step: 795050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:40,080-Speed 2632.02 samples/sec   Loss 1.1983   LearningRate 0.0002   Epoch: 19   Global Step: 795060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:43,982-Speed 2624.64 samples/sec   Loss 1.1881   LearningRate 0.0002   Epoch: 19   Global Step: 795070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:47,884-Speed 2625.45 samples/sec   Loss 1.1570   LearningRate 0.0002   Epoch: 19   Global Step: 795080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:51,775-Speed 2632.13 samples/sec   Loss 1.1759   LearningRate 0.0002   Epoch: 19   Global Step: 795090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:55,765-Speed 2567.15 samples/sec   Loss 1.2086   LearningRate 0.0002   Epoch: 19   Global Step: 795100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:05:59,657-Speed 2631.91 samples/sec   Loss 1.2062   LearningRate 0.0002   Epoch: 19   Global Step: 795110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:03,557-Speed 2626.24 samples/sec   Loss 1.1584   LearningRate 0.0002   Epoch: 19   Global Step: 795120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:07,450-Speed 2631.51 samples/sec   Loss 1.1596   LearningRate 0.0002   Epoch: 19   Global Step: 795130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:11,342-Speed 2631.15 samples/sec   Loss 1.1583   LearningRate 0.0002   Epoch: 19   Global Step: 795140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:06:15,209-Speed 2648.66 samples/sec   Loss 1.2501   LearningRate 0.0002   Epoch: 19   Global Step: 795150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:19,103-Speed 2630.19 samples/sec   Loss 1.1749   LearningRate 0.0002   Epoch: 19   Global Step: 795160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:23,002-Speed 2627.18 samples/sec   Loss 1.1362   LearningRate 0.0002   Epoch: 19   Global Step: 795170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:26,895-Speed 2630.85 samples/sec   Loss 1.1674   LearningRate 0.0002   Epoch: 19   Global Step: 795180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:30,803-Speed 2621.84 samples/sec   Loss 1.1755   LearningRate 0.0002   Epoch: 19   Global Step: 795190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:34,704-Speed 2625.20 samples/sec   Loss 1.2018   LearningRate 0.0002   Epoch: 19   Global Step: 795200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:38,616-Speed 2618.07 samples/sec   Loss 1.2126   LearningRate 0.0002   Epoch: 19   Global Step: 795210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:42,509-Speed 2631.66 samples/sec   Loss 1.1566   LearningRate 0.0002   Epoch: 19   Global Step: 795220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:06:46,384-Speed 2643.02 samples/sec   Loss 1.1759   LearningRate 0.0002   Epoch: 19   Global Step: 795230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:06:50,275-Speed 2632.27 samples/sec   Loss 1.1904   LearningRate 0.0002   Epoch: 19   Global Step: 795240   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:06:54,164-Speed 2634.31 samples/sec   Loss 1.1668   LearningRate 0.0002   Epoch: 19   Global Step: 795250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:06:58,074-Speed 2619.61 samples/sec   Loss 1.1729   LearningRate 0.0002   Epoch: 19   Global Step: 795260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:07:01,964-Speed 2633.00 samples/sec   Loss 1.2288   LearningRate 0.0002   Epoch: 19   Global Step: 795270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:07:05,887-Speed 2610.77 samples/sec   Loss 1.1637   LearningRate 0.0002   Epoch: 19   Global Step: 795280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:07:09,780-Speed 2631.54 samples/sec   Loss 1.1939   LearningRate 0.0002   Epoch: 19   Global Step: 795290   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:07:13,670-Speed 2632.48 samples/sec   Loss 1.1422   LearningRate 0.0002   Epoch: 19   Global Step: 795300   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:07:17,574-Speed 2624.25 samples/sec   Loss 1.1707   LearningRate 0.0002   Epoch: 19   Global Step: 795310   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:07:21,463-Speed 2633.65 samples/sec   Loss 1.1735   LearningRate 0.0002   Epoch: 19   Global Step: 795320   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:07:25,357-Speed 2630.82 samples/sec   Loss 1.2389   LearningRate 0.0002   Epoch: 19   Global Step: 795330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:29,268-Speed 2618.36 samples/sec   Loss 1.1961   LearningRate 0.0002   Epoch: 19   Global Step: 795340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:33,163-Speed 2630.38 samples/sec   Loss 1.1136   LearningRate 0.0002   Epoch: 19   Global Step: 795350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:37,057-Speed 2630.03 samples/sec   Loss 1.1832   LearningRate 0.0002   Epoch: 19   Global Step: 795360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:40,950-Speed 2631.25 samples/sec   Loss 1.1917   LearningRate 0.0002   Epoch: 19   Global Step: 795370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:44,842-Speed 2631.51 samples/sec   Loss 1.1552   LearningRate 0.0002   Epoch: 19   Global Step: 795380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:48,744-Speed 2625.52 samples/sec   Loss 1.2010   LearningRate 0.0002   Epoch: 19   Global Step: 795390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:52,639-Speed 2629.96 samples/sec   Loss 1.1941   LearningRate 0.0002   Epoch: 19   Global Step: 795400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:07:56,532-Speed 2630.58 samples/sec   Loss 1.2265   LearningRate 0.0002   Epoch: 19   Global Step: 795410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:00,428-Speed 2629.12 samples/sec   Loss 1.1879   LearningRate 0.0002   Epoch: 19   Global Step: 795420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:04,326-Speed 2628.07 samples/sec   Loss 1.1785   LearningRate 0.0002   Epoch: 19   Global Step: 795430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:08:08,217-Speed 2632.33 samples/sec   Loss 1.1885   LearningRate 0.0002   Epoch: 19   Global Step: 795440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:08:12,117-Speed 2625.64 samples/sec   Loss 1.1936   LearningRate 0.0002   Epoch: 19   Global Step: 795450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:08:16,010-Speed 2631.67 samples/sec   Loss 1.1810   LearningRate 0.0002   Epoch: 19   Global Step: 795460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:08:19,883-Speed 2644.74 samples/sec   Loss 1.1494   LearningRate 0.0002   Epoch: 19   Global Step: 795470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:23,801-Speed 2613.94 samples/sec   Loss 1.1958   LearningRate 0.0002   Epoch: 19   Global Step: 795480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:27,693-Speed 2632.18 samples/sec   Loss 1.1965   LearningRate 0.0002   Epoch: 19   Global Step: 795490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:31,586-Speed 2631.45 samples/sec   Loss 1.1432   LearningRate 0.0002   Epoch: 19   Global Step: 795500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:35,491-Speed 2622.49 samples/sec   Loss 1.1599   LearningRate 0.0002   Epoch: 19   Global Step: 795510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:39,400-Speed 2620.08 samples/sec   Loss 1.2216   LearningRate 0.0002   Epoch: 19   Global Step: 795520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:43,296-Speed 2629.81 samples/sec   Loss 1.2275   LearningRate 0.0002   Epoch: 19   Global Step: 795530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:47,207-Speed 2618.79 samples/sec   Loss 1.1386   LearningRate 0.0002   Epoch: 19   Global Step: 795540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:51,100-Speed 2630.70 samples/sec   Loss 1.1805   LearningRate 0.0002   Epoch: 19   Global Step: 795550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:55,010-Speed 2619.98 samples/sec   Loss 1.1392   LearningRate 0.0002   Epoch: 19   Global Step: 795560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:08:58,905-Speed 2629.74 samples/sec   Loss 1.1981   LearningRate 0.0002   Epoch: 19   Global Step: 795570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:09:02,780-Speed 2642.80 samples/sec   Loss 1.1625   LearningRate 0.0002   Epoch: 19   Global Step: 795580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:06,676-Speed 2630.12 samples/sec   Loss 1.2201   LearningRate 0.0002   Epoch: 19   Global Step: 795590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:10,567-Speed 2631.73 samples/sec   Loss 1.2081   LearningRate 0.0002   Epoch: 19   Global Step: 795600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:14,472-Speed 2622.86 samples/sec   Loss 1.1426   LearningRate 0.0002   Epoch: 19   Global Step: 795610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:18,378-Speed 2622.60 samples/sec   Loss 1.2211   LearningRate 0.0002   Epoch: 19   Global Step: 795620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:22,273-Speed 2629.74 samples/sec   Loss 1.2107   LearningRate 0.0002   Epoch: 19   Global Step: 795630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:26,195-Speed 2611.67 samples/sec   Loss 1.1836   LearningRate 0.0002   Epoch: 19   Global Step: 795640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:30,100-Speed 2622.82 samples/sec   Loss 1.1466   LearningRate 0.0002   Epoch: 19   Global Step: 795650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:33,990-Speed 2633.13 samples/sec   Loss 1.1329   LearningRate 0.0002   Epoch: 19   Global Step: 795660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:37,886-Speed 2628.89 samples/sec   Loss 1.1928   LearningRate 0.0002   Epoch: 19   Global Step: 795670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:41,782-Speed 2629.47 samples/sec   Loss 1.2135   LearningRate 0.0002   Epoch: 19   Global Step: 795680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:09:45,677-Speed 2629.53 samples/sec   Loss 1.1740   LearningRate 0.0002   Epoch: 19   Global Step: 795690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:49,569-Speed 2631.52 samples/sec   Loss 1.1773   LearningRate 0.0002   Epoch: 19   Global Step: 795700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:53,469-Speed 2626.67 samples/sec   Loss 1.1899   LearningRate 0.0002   Epoch: 19   Global Step: 795710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:09:57,370-Speed 2625.77 samples/sec   Loss 1.1942   LearningRate 0.0002   Epoch: 19   Global Step: 795720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:01,264-Speed 2630.09 samples/sec   Loss 1.2052   LearningRate 0.0002   Epoch: 19   Global Step: 795730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:05,155-Speed 2632.24 samples/sec   Loss 1.1440   LearningRate 0.0002   Epoch: 19   Global Step: 795740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:09,051-Speed 2628.92 samples/sec   Loss 1.1570   LearningRate 0.0002   Epoch: 19   Global Step: 795750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:13,008-Speed 2589.36 samples/sec   Loss 1.1851   LearningRate 0.0002   Epoch: 19   Global Step: 795760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:16,902-Speed 2630.57 samples/sec   Loss 1.1447   LearningRate 0.0002   Epoch: 19   Global Step: 795770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:20,834-Speed 2605.50 samples/sec   Loss 1.1828   LearningRate 0.0002   Epoch: 19   Global Step: 795780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:24,728-Speed 2629.94 samples/sec   Loss 1.1626   LearningRate 0.0002   Epoch: 19   Global Step: 795790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:10:28,656-Speed 2608.49 samples/sec   Loss 1.2103   LearningRate 0.0002   Epoch: 19   Global Step: 795800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:10:32,548-Speed 2631.17 samples/sec   Loss 1.1350   LearningRate 0.0002   Epoch: 19   Global Step: 795810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:10:36,489-Speed 2599.14 samples/sec   Loss 1.1419   LearningRate 0.0002   Epoch: 19   Global Step: 795820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:10:40,550-Speed 2522.52 samples/sec   Loss 1.1666   LearningRate 0.0002   Epoch: 19   Global Step: 795830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:10:44,440-Speed 2632.52 samples/sec   Loss 1.1929   LearningRate 0.0002   Epoch: 19   Global Step: 795840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:10:48,312-Speed 2645.88 samples/sec   Loss 1.1536   LearningRate 0.0002   Epoch: 19   Global Step: 795850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:52,209-Speed 2627.66 samples/sec   Loss 1.1603   LearningRate 0.0002   Epoch: 19   Global Step: 795860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:10:56,128-Speed 2613.82 samples/sec   Loss 1.2016   LearningRate 0.0002   Epoch: 19   Global Step: 795870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:00,027-Speed 2626.79 samples/sec   Loss 1.1535   LearningRate 0.0002   Epoch: 19   Global Step: 795880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:03,920-Speed 2631.52 samples/sec   Loss 1.1880   LearningRate 0.0002   Epoch: 19   Global Step: 795890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:07,814-Speed 2630.15 samples/sec   Loss 1.1518   LearningRate 0.0002   Epoch: 19   Global Step: 795900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:11,709-Speed 2629.93 samples/sec   Loss 1.2079   LearningRate 0.0002   Epoch: 19   Global Step: 795910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:15,604-Speed 2629.08 samples/sec   Loss 1.1684   LearningRate 0.0002   Epoch: 19   Global Step: 795920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:19,517-Speed 2618.12 samples/sec   Loss 1.1801   LearningRate 0.0002   Epoch: 19   Global Step: 795930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:23,414-Speed 2628.24 samples/sec   Loss 1.1896   LearningRate 0.0002   Epoch: 19   Global Step: 795940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:27,308-Speed 2630.75 samples/sec   Loss 1.1522   LearningRate 0.0002   Epoch: 19   Global Step: 795950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:11:31,197-Speed 2633.40 samples/sec   Loss 1.2071   LearningRate 0.0002   Epoch: 19   Global Step: 795960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:35,088-Speed 2632.71 samples/sec   Loss 1.2147   LearningRate 0.0002   Epoch: 19   Global Step: 795970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:39,004-Speed 2615.55 samples/sec   Loss 1.1538   LearningRate 0.0002   Epoch: 19   Global Step: 795980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:42,911-Speed 2621.66 samples/sec   Loss 1.1706   LearningRate 0.0002   Epoch: 19   Global Step: 795990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:46,823-Speed 2617.90 samples/sec   Loss 1.1474   LearningRate 0.0002   Epoch: 19   Global Step: 796000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:50,719-Speed 2630.11 samples/sec   Loss 1.1711   LearningRate 0.0002   Epoch: 19   Global Step: 796010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:54,614-Speed 2629.44 samples/sec   Loss 1.1712   LearningRate 0.0002   Epoch: 19   Global Step: 796020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:11:58,530-Speed 2616.09 samples/sec   Loss 1.1386   LearningRate 0.0002   Epoch: 19   Global Step: 796030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:12:02,429-Speed 2626.58 samples/sec   Loss 1.1247   LearningRate 0.0002   Epoch: 19   Global Step: 796040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:12:06,326-Speed 2628.02 samples/sec   Loss 1.2173   LearningRate 0.0002   Epoch: 19   Global Step: 796050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:12:10,222-Speed 2629.06 samples/sec   Loss 1.1559   LearningRate 0.0002   Epoch: 19   Global Step: 796060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:14,151-Speed 2607.38 samples/sec   Loss 1.1345   LearningRate 0.0002   Epoch: 19   Global Step: 796070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:18,045-Speed 2630.93 samples/sec   Loss 1.1637   LearningRate 0.0002   Epoch: 19   Global Step: 796080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:21,941-Speed 2628.44 samples/sec   Loss 1.2059   LearningRate 0.0002   Epoch: 19   Global Step: 796090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:25,858-Speed 2615.03 samples/sec   Loss 1.1532   LearningRate 0.0002   Epoch: 19   Global Step: 796100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:29,753-Speed 2630.03 samples/sec   Loss 1.1507   LearningRate 0.0002   Epoch: 19   Global Step: 796110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:33,645-Speed 2631.74 samples/sec   Loss 1.1683   LearningRate 0.0002   Epoch: 19   Global Step: 796120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:37,541-Speed 2629.37 samples/sec   Loss 1.1491   LearningRate 0.0002   Epoch: 19   Global Step: 796130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:12:41,407-Speed 2649.04 samples/sec   Loss 1.1919   LearningRate 0.0002   Epoch: 19   Global Step: 796140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:12:45,327-Speed 2612.65 samples/sec   Loss 1.1158   LearningRate 0.0002   Epoch: 19   Global Step: 796150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:12:49,225-Speed 2628.28 samples/sec   Loss 1.1643   LearningRate 0.0002   Epoch: 19   Global Step: 796160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:12:53,092-Speed 2649.24 samples/sec   Loss 1.1795   LearningRate 0.0002   Epoch: 19   Global Step: 796170   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:12:56,981-Speed 2633.95 samples/sec   Loss 1.1726   LearningRate 0.0002   Epoch: 19   Global Step: 796180   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:00,882-Speed 2625.72 samples/sec   Loss 1.2094   LearningRate 0.0002   Epoch: 19   Global Step: 796190   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:04,784-Speed 2624.98 samples/sec   Loss 1.1746   LearningRate 0.0002   Epoch: 19   Global Step: 796200   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:08,692-Speed 2620.98 samples/sec   Loss 1.1769   LearningRate 0.0002   Epoch: 19   Global Step: 796210   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:12,751-Speed 2523.30 samples/sec   Loss 1.1790   LearningRate 0.0002   Epoch: 19   Global Step: 796220   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:16,713-Speed 2585.66 samples/sec   Loss 1.1883   LearningRate 0.0002   Epoch: 19   Global Step: 796230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:20,608-Speed 2629.29 samples/sec   Loss 1.1612   LearningRate 0.0002   Epoch: 19   Global Step: 796240   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:24,508-Speed 2626.96 samples/sec   Loss 1.1935   LearningRate 0.0002   Epoch: 19   Global Step: 796250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:28,411-Speed 2623.62 samples/sec   Loss 1.1804   LearningRate 0.0002   Epoch: 19   Global Step: 796260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:13:32,315-Speed 2624.13 samples/sec   Loss 1.1876   LearningRate 0.0002   Epoch: 19   Global Step: 796270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:13:36,210-Speed 2629.40 samples/sec   Loss 1.1769   LearningRate 0.0002   Epoch: 19   Global Step: 796280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:13:40,109-Speed 2627.34 samples/sec   Loss 1.1571   LearningRate 0.0002   Epoch: 19   Global Step: 796290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:13:44,194-Speed 2507.09 samples/sec   Loss 1.2154   LearningRate 0.0002   Epoch: 19   Global Step: 796300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:13:48,166-Speed 2579.09 samples/sec   Loss 1.1680   LearningRate 0.0002   Epoch: 19   Global Step: 796310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:13:52,072-Speed 2622.21 samples/sec   Loss 1.2218   LearningRate 0.0002   Epoch: 19   Global Step: 796320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:13:55,972-Speed 2627.16 samples/sec   Loss 1.2268   LearningRate 0.0002   Epoch: 19   Global Step: 796330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:13:59,940-Speed 2581.47 samples/sec   Loss 1.1795   LearningRate 0.0002   Epoch: 19   Global Step: 796340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:03,859-Speed 2614.15 samples/sec   Loss 1.1737   LearningRate 0.0002   Epoch: 19   Global Step: 796350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:07,786-Speed 2607.97 samples/sec   Loss 1.1361   LearningRate 0.0002   Epoch: 19   Global Step: 796360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:11,682-Speed 2629.21 samples/sec   Loss 1.1666   LearningRate 0.0002   Epoch: 19   Global Step: 796370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:14:15,565-Speed 2637.79 samples/sec   Loss 1.1749   LearningRate 0.0002   Epoch: 19   Global Step: 796380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:19,461-Speed 2629.21 samples/sec   Loss 1.2208   LearningRate 0.0002   Epoch: 19   Global Step: 796390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:23,369-Speed 2621.25 samples/sec   Loss 1.1371   LearningRate 0.0002   Epoch: 19   Global Step: 796400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:27,265-Speed 2629.04 samples/sec   Loss 1.1805   LearningRate 0.0002   Epoch: 19   Global Step: 796410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:31,158-Speed 2630.60 samples/sec   Loss 1.1692   LearningRate 0.0002   Epoch: 19   Global Step: 796420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:35,059-Speed 2625.26 samples/sec   Loss 1.1657   LearningRate 0.0002   Epoch: 19   Global Step: 796430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:38,953-Speed 2631.33 samples/sec   Loss 1.1564   LearningRate 0.0002   Epoch: 19   Global Step: 796440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:42,853-Speed 2626.74 samples/sec   Loss 1.1820   LearningRate 0.0002   Epoch: 19   Global Step: 796450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:46,755-Speed 2624.64 samples/sec   Loss 1.2123   LearningRate 0.0002   Epoch: 19   Global Step: 796460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:50,663-Speed 2621.52 samples/sec   Loss 1.1352   LearningRate 0.0002   Epoch: 19   Global Step: 796470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:14:54,567-Speed 2623.80 samples/sec   Loss 1.1215   LearningRate 0.0002   Epoch: 19   Global Step: 796480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:14:58,460-Speed 2631.10 samples/sec   Loss 1.1568   LearningRate 0.0002   Epoch: 19   Global Step: 796490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:02,434-Speed 2577.35 samples/sec   Loss 1.2010   LearningRate 0.0002   Epoch: 19   Global Step: 796500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:06,333-Speed 2626.75 samples/sec   Loss 1.1628   LearningRate 0.0002   Epoch: 19   Global Step: 796510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:10,231-Speed 2627.78 samples/sec   Loss 1.1552   LearningRate 0.0002   Epoch: 19   Global Step: 796520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:14,132-Speed 2626.15 samples/sec   Loss 1.1456   LearningRate 0.0002   Epoch: 19   Global Step: 796530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:18,039-Speed 2621.93 samples/sec   Loss 1.1845   LearningRate 0.0002   Epoch: 19   Global Step: 796540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:21,934-Speed 2629.94 samples/sec   Loss 1.1475   LearningRate 0.0002   Epoch: 19   Global Step: 796550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:25,824-Speed 2633.37 samples/sec   Loss 1.1553   LearningRate 0.0002   Epoch: 19   Global Step: 796560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:29,718-Speed 2629.92 samples/sec   Loss 1.2008   LearningRate 0.0002   Epoch: 19   Global Step: 796570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:33,625-Speed 2621.23 samples/sec   Loss 1.1309   LearningRate 0.0002   Epoch: 19   Global Step: 796580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:37,524-Speed 2626.86 samples/sec   Loss 1.1368   LearningRate 0.0002   Epoch: 19   Global Step: 796590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:15:41,409-Speed 2637.65 samples/sec   Loss 1.1546   LearningRate 0.0002   Epoch: 19   Global Step: 796600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:45,301-Speed 2632.07 samples/sec   Loss 1.1559   LearningRate 0.0002   Epoch: 19   Global Step: 796610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:49,194-Speed 2631.18 samples/sec   Loss 1.1673   LearningRate 0.0002   Epoch: 19   Global Step: 796620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:53,089-Speed 2629.40 samples/sec   Loss 1.1997   LearningRate 0.0002   Epoch: 19   Global Step: 796630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:15:56,991-Speed 2625.80 samples/sec   Loss 1.1971   LearningRate 0.0002   Epoch: 19   Global Step: 796640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:00,888-Speed 2628.10 samples/sec   Loss 1.1838   LearningRate 0.0002   Epoch: 19   Global Step: 796650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:04,787-Speed 2626.82 samples/sec   Loss 1.1477   LearningRate 0.0002   Epoch: 19   Global Step: 796660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:08,742-Speed 2589.03 samples/sec   Loss 1.2449   LearningRate 0.0002   Epoch: 19   Global Step: 796670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:12,829-Speed 2506.97 samples/sec   Loss 1.1873   LearningRate 0.0002   Epoch: 19   Global Step: 796680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:16,905-Speed 2512.71 samples/sec   Loss 1.1497   LearningRate 0.0002   Epoch: 19   Global Step: 796690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:20,803-Speed 2627.85 samples/sec   Loss 1.1751   LearningRate 0.0002   Epoch: 19   Global Step: 796700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:16:24,697-Speed 2630.08 samples/sec   Loss 1.1698   LearningRate 0.0002   Epoch: 19   Global Step: 796710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:16:28,572-Speed 2643.44 samples/sec   Loss 1.1642   LearningRate 0.0002   Epoch: 19   Global Step: 796720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:32,488-Speed 2615.33 samples/sec   Loss 1.1563   LearningRate 0.0002   Epoch: 19   Global Step: 796730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:36,520-Speed 2540.41 samples/sec   Loss 1.1594   LearningRate 0.0002   Epoch: 19   Global Step: 796740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:40,417-Speed 2627.54 samples/sec   Loss 1.1478   LearningRate 0.0002   Epoch: 19   Global Step: 796750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:44,390-Speed 2579.11 samples/sec   Loss 1.1930   LearningRate 0.0002   Epoch: 19   Global Step: 796760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:48,283-Speed 2631.24 samples/sec   Loss 1.1408   LearningRate 0.0002   Epoch: 19   Global Step: 796770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:52,179-Speed 2628.78 samples/sec   Loss 1.1521   LearningRate 0.0002   Epoch: 19   Global Step: 796780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:56,073-Speed 2630.67 samples/sec   Loss 1.1666   LearningRate 0.0002   Epoch: 19   Global Step: 796790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:16:59,970-Speed 2628.22 samples/sec   Loss 1.1646   LearningRate 0.0002   Epoch: 19   Global Step: 796800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:17:03,864-Speed 2630.08 samples/sec   Loss 1.1629   LearningRate 0.0002   Epoch: 19   Global Step: 796810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:17:07,737-Speed 2644.84 samples/sec   Loss 1.1284   LearningRate 0.0002   Epoch: 19   Global Step: 796820   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:11,642-Speed 2622.99 samples/sec   Loss 1.1505   LearningRate 0.0002   Epoch: 19   Global Step: 796830   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:15,536-Speed 2630.49 samples/sec   Loss 1.1954   LearningRate 0.0002   Epoch: 19   Global Step: 796840   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:19,455-Speed 2622.32 samples/sec   Loss 1.2031   LearningRate 0.0002   Epoch: 19   Global Step: 796850   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:23,373-Speed 2614.34 samples/sec   Loss 1.2130   LearningRate 0.0002   Epoch: 19   Global Step: 796860   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:27,265-Speed 2631.97 samples/sec   Loss 1.1442   LearningRate 0.0002   Epoch: 19   Global Step: 796870   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:31,199-Speed 2603.53 samples/sec   Loss 1.1483   LearningRate 0.0002   Epoch: 19   Global Step: 796880   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:35,096-Speed 2628.47 samples/sec   Loss 1.1948   LearningRate 0.0002   Epoch: 19   Global Step: 796890   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:39,002-Speed 2621.92 samples/sec   Loss 1.1987   LearningRate 0.0002   Epoch: 19   Global Step: 796900   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:42,943-Speed 2599.07 samples/sec   Loss 1.1507   LearningRate 0.0002   Epoch: 19   Global Step: 796910   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:17:46,838-Speed 2629.55 samples/sec   Loss 1.0866   LearningRate 0.0002   Epoch: 19   Global Step: 796920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:17:50,743-Speed 2624.45 samples/sec   Loss 1.1579   LearningRate 0.0002   Epoch: 19   Global Step: 796930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:17:54,654-Speed 2618.53 samples/sec   Loss 1.1682   LearningRate 0.0002   Epoch: 19   Global Step: 796940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:17:58,551-Speed 2628.33 samples/sec   Loss 1.0806   LearningRate 0.0002   Epoch: 19   Global Step: 796950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:18:02,444-Speed 2631.50 samples/sec   Loss 1.1473   LearningRate 0.0002   Epoch: 19   Global Step: 796960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:18:06,339-Speed 2629.36 samples/sec   Loss 1.1732   LearningRate 0.0002   Epoch: 19   Global Step: 796970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:18:10,242-Speed 2624.02 samples/sec   Loss 1.1612   LearningRate 0.0002   Epoch: 19   Global Step: 796980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:18:14,135-Speed 2631.20 samples/sec   Loss 1.1498   LearningRate 0.0002   Epoch: 19   Global Step: 796990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:18:18,032-Speed 2628.57 samples/sec   Loss 1.1711   LearningRate 0.0002   Epoch: 19   Global Step: 797000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:18:21,926-Speed 2629.81 samples/sec   Loss 1.1653   LearningRate 0.0002   Epoch: 19   Global Step: 797010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:18:25,821-Speed 2630.55 samples/sec   Loss 1.1926   LearningRate 0.0002   Epoch: 19   Global Step: 797020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:29,711-Speed 2632.47 samples/sec   Loss 1.0964   LearningRate 0.0002   Epoch: 19   Global Step: 797030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:33,606-Speed 2630.36 samples/sec   Loss 1.1927   LearningRate 0.0002   Epoch: 19   Global Step: 797040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:37,515-Speed 2620.10 samples/sec   Loss 1.1600   LearningRate 0.0002   Epoch: 19   Global Step: 797050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:41,415-Speed 2626.11 samples/sec   Loss 1.1699   LearningRate 0.0002   Epoch: 19   Global Step: 797060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:45,320-Speed 2622.68 samples/sec   Loss 1.1728   LearningRate 0.0002   Epoch: 19   Global Step: 797070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:49,226-Speed 2623.31 samples/sec   Loss 1.1803   LearningRate 0.0002   Epoch: 19   Global Step: 797080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:53,117-Speed 2632.16 samples/sec   Loss 1.1820   LearningRate 0.0002   Epoch: 19   Global Step: 797090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:18:57,024-Speed 2621.68 samples/sec   Loss 1.2013   LearningRate 0.0002   Epoch: 19   Global Step: 797100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:19:00,919-Speed 2629.42 samples/sec   Loss 1.1383   LearningRate 0.0002   Epoch: 19   Global Step: 797110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:04,817-Speed 2627.89 samples/sec   Loss 1.1814   LearningRate 0.0002   Epoch: 19   Global Step: 797120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:08,711-Speed 2630.55 samples/sec   Loss 1.1932   LearningRate 0.0002   Epoch: 19   Global Step: 797130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:12,611-Speed 2626.00 samples/sec   Loss 1.2073   LearningRate 0.0002   Epoch: 19   Global Step: 797140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:16,497-Speed 2635.68 samples/sec   Loss 1.1699   LearningRate 0.0002   Epoch: 19   Global Step: 797150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:20,392-Speed 2629.51 samples/sec   Loss 1.2237   LearningRate 0.0002   Epoch: 19   Global Step: 797160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:24,286-Speed 2630.60 samples/sec   Loss 1.1692   LearningRate 0.0002   Epoch: 19   Global Step: 797170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:28,177-Speed 2632.37 samples/sec   Loss 1.1614   LearningRate 0.0002   Epoch: 19   Global Step: 797180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:32,072-Speed 2630.26 samples/sec   Loss 1.1361   LearningRate 0.0002   Epoch: 19   Global Step: 797190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:35,965-Speed 2630.43 samples/sec   Loss 1.2008   LearningRate 0.0002   Epoch: 19   Global Step: 797200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:19:39,854-Speed 2633.77 samples/sec   Loss 1.1950   LearningRate 0.0002   Epoch: 19   Global Step: 797210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:19:43,758-Speed 2623.18 samples/sec   Loss 1.1989   LearningRate 0.0002   Epoch: 19   Global Step: 797220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:19:47,651-Speed 2631.22 samples/sec   Loss 1.1624   LearningRate 0.0002   Epoch: 19   Global Step: 797230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:19:51,718-Speed 2518.51 samples/sec   Loss 1.1345   LearningRate 0.0002   Epoch: 19   Global Step: 797240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:19:55,620-Speed 2625.36 samples/sec   Loss 1.2412   LearningRate 0.0002   Epoch: 19   Global Step: 797250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:19:59,517-Speed 2628.09 samples/sec   Loss 1.1510   LearningRate 0.0002   Epoch: 19   Global Step: 797260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:20:03,426-Speed 2620.91 samples/sec   Loss 1.1572   LearningRate 0.0002   Epoch: 19   Global Step: 797270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:20:07,298-Speed 2644.82 samples/sec   Loss 1.1712   LearningRate 0.0002   Epoch: 19   Global Step: 797280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:11,195-Speed 2628.04 samples/sec   Loss 1.1298   LearningRate 0.0002   Epoch: 19   Global Step: 797290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:15,093-Speed 2627.26 samples/sec   Loss 1.0777   LearningRate 0.0002   Epoch: 19   Global Step: 797300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:18,990-Speed 2629.34 samples/sec   Loss 1.1755   LearningRate 0.0002   Epoch: 19   Global Step: 797310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:22,882-Speed 2631.45 samples/sec   Loss 1.1342   LearningRate 0.0002   Epoch: 19   Global Step: 797320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:26,771-Speed 2633.65 samples/sec   Loss 1.2139   LearningRate 0.0002   Epoch: 19   Global Step: 797330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:30,678-Speed 2621.50 samples/sec   Loss 1.1378   LearningRate 0.0002   Epoch: 19   Global Step: 797340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:34,578-Speed 2626.75 samples/sec   Loss 1.2021   LearningRate 0.0002   Epoch: 19   Global Step: 797350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:38,482-Speed 2624.00 samples/sec   Loss 1.1953   LearningRate 0.0002   Epoch: 19   Global Step: 797360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:42,377-Speed 2629.51 samples/sec   Loss 1.1805   LearningRate 0.0002   Epoch: 19   Global Step: 797370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:20:46,277-Speed 2626.65 samples/sec   Loss 1.1315   LearningRate 0.0002   Epoch: 19   Global Step: 797380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:20:50,173-Speed 2629.41 samples/sec   Loss 1.1715   LearningRate 0.0002   Epoch: 19   Global Step: 797390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:20:54,069-Speed 2629.33 samples/sec   Loss 1.1787   LearningRate 0.0002   Epoch: 19   Global Step: 797400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:20:57,960-Speed 2632.02 samples/sec   Loss 1.2057   LearningRate 0.0002   Epoch: 19   Global Step: 797410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:01,858-Speed 2627.21 samples/sec   Loss 1.1987   LearningRate 0.0002   Epoch: 19   Global Step: 797420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:05,760-Speed 2625.18 samples/sec   Loss 1.1831   LearningRate 0.0002   Epoch: 19   Global Step: 797430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:09,655-Speed 2629.73 samples/sec   Loss 1.1760   LearningRate 0.0002   Epoch: 19   Global Step: 797440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:13,559-Speed 2624.36 samples/sec   Loss 1.2089   LearningRate 0.0002   Epoch: 19   Global Step: 797450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:17,451-Speed 2630.97 samples/sec   Loss 1.1916   LearningRate 0.0001   Epoch: 19   Global Step: 797460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:21,359-Speed 2621.26 samples/sec   Loss 1.1214   LearningRate 0.0001   Epoch: 19   Global Step: 797470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:25,251-Speed 2631.50 samples/sec   Loss 1.1373   LearningRate 0.0001   Epoch: 19   Global Step: 797480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:29,146-Speed 2630.47 samples/sec   Loss 1.2151   LearningRate 0.0001   Epoch: 19   Global Step: 797490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:33,039-Speed 2630.56 samples/sec   Loss 1.1791   LearningRate 0.0001   Epoch: 19   Global Step: 797500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:36,910-Speed 2646.25 samples/sec   Loss 1.1332   LearningRate 0.0001   Epoch: 19   Global Step: 797510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:21:40,775-Speed 2649.53 samples/sec   Loss 1.1840   LearningRate 0.0001   Epoch: 19   Global Step: 797520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:21:44,669-Speed 2630.78 samples/sec   Loss 1.1348   LearningRate 0.0001   Epoch: 19   Global Step: 797530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:21:48,561-Speed 2632.00 samples/sec   Loss 1.1566   LearningRate 0.0001   Epoch: 19   Global Step: 797540   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:21:52,461-Speed 2625.97 samples/sec   Loss 1.1919   LearningRate 0.0001   Epoch: 19   Global Step: 797550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:21:56,354-Speed 2631.32 samples/sec   Loss 1.1914   LearningRate 0.0001   Epoch: 19   Global Step: 797560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:00,252-Speed 2627.58 samples/sec   Loss 1.1445   LearningRate 0.0001   Epoch: 19   Global Step: 797570   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:04,160-Speed 2620.98 samples/sec   Loss 1.1966   LearningRate 0.0001   Epoch: 19   Global Step: 797580   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:08,055-Speed 2629.42 samples/sec   Loss 1.1344   LearningRate 0.0001   Epoch: 19   Global Step: 797590   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:11,947-Speed 2632.39 samples/sec   Loss 1.1480   LearningRate 0.0001   Epoch: 19   Global Step: 797600   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:15,838-Speed 2632.14 samples/sec   Loss 1.2001   LearningRate 0.0001   Epoch: 19   Global Step: 797610   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:19,769-Speed 2605.59 samples/sec   Loss 1.1455   LearningRate 0.0001   Epoch: 19   Global Step: 797620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:22:23,665-Speed 2628.90 samples/sec   Loss 1.1643   LearningRate 0.0001   Epoch: 19   Global Step: 797630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:22:27,557-Speed 2632.10 samples/sec   Loss 1.1571   LearningRate 0.0001   Epoch: 19   Global Step: 797640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:22:31,448-Speed 2632.49 samples/sec   Loss 1.1838   LearningRate 0.0001   Epoch: 19   Global Step: 797650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:22:35,343-Speed 2629.52 samples/sec   Loss 1.2103   LearningRate 0.0001   Epoch: 19   Global Step: 797660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:22:39,245-Speed 2624.98 samples/sec   Loss 1.1927   LearningRate 0.0001   Epoch: 19   Global Step: 797670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:22:43,146-Speed 2625.40 samples/sec   Loss 1.1796   LearningRate 0.0001   Epoch: 19   Global Step: 797680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:22:47,015-Speed 2647.74 samples/sec   Loss 1.1661   LearningRate 0.0001   Epoch: 19   Global Step: 797690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:50,918-Speed 2624.18 samples/sec   Loss 1.1551   LearningRate 0.0001   Epoch: 19   Global Step: 797700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:54,820-Speed 2625.46 samples/sec   Loss 1.1858   LearningRate 0.0001   Epoch: 19   Global Step: 797710   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:22:58,729-Speed 2620.27 samples/sec   Loss 1.1573   LearningRate 0.0001   Epoch: 19   Global Step: 797720   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:02,634-Speed 2622.74 samples/sec   Loss 1.2309   LearningRate 0.0001   Epoch: 19   Global Step: 797730   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:06,534-Speed 2626.32 samples/sec   Loss 1.1450   LearningRate 0.0001   Epoch: 19   Global Step: 797740   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:10,455-Speed 2612.38 samples/sec   Loss 1.1534   LearningRate 0.0001   Epoch: 19   Global Step: 797750   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:14,359-Speed 2631.39 samples/sec   Loss 1.1643   LearningRate 0.0001   Epoch: 19   Global Step: 797760   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:18,290-Speed 2605.76 samples/sec   Loss 1.1862   LearningRate 0.0001   Epoch: 19   Global Step: 797770   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:22,181-Speed 2632.70 samples/sec   Loss 1.1704   LearningRate 0.0001   Epoch: 19   Global Step: 797780   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:26,078-Speed 2628.22 samples/sec   Loss 1.2131   LearningRate 0.0001   Epoch: 19   Global Step: 797790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:23:29,979-Speed 2626.31 samples/sec   Loss 1.1322   LearningRate 0.0001   Epoch: 19   Global Step: 797800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:23:33,881-Speed 2624.65 samples/sec   Loss 1.1942   LearningRate 0.0001   Epoch: 19   Global Step: 797810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:23:37,797-Speed 2615.49 samples/sec   Loss 1.1616   LearningRate 0.0001   Epoch: 19   Global Step: 797820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:23:41,688-Speed 2631.97 samples/sec   Loss 1.1601   LearningRate 0.0001   Epoch: 19   Global Step: 797830   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:45,580-Speed 2632.36 samples/sec   Loss 1.2120   LearningRate 0.0001   Epoch: 19   Global Step: 797840   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:49,474-Speed 2630.83 samples/sec   Loss 1.1418   LearningRate 0.0001   Epoch: 19   Global Step: 797850   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:53,369-Speed 2629.50 samples/sec   Loss 1.1050   LearningRate 0.0001   Epoch: 19   Global Step: 797860   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:23:57,260-Speed 2632.78 samples/sec   Loss 1.1792   LearningRate 0.0001   Epoch: 19   Global Step: 797870   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:24:01,162-Speed 2624.97 samples/sec   Loss 1.1588   LearningRate 0.0001   Epoch: 19   Global Step: 797880   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:24:05,057-Speed 2629.63 samples/sec   Loss 1.1299   LearningRate 0.0001   Epoch: 19   Global Step: 797890   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:24:08,949-Speed 2631.52 samples/sec   Loss 1.2163   LearningRate 0.0001   Epoch: 19   Global Step: 797900   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:24:12,843-Speed 2630.50 samples/sec   Loss 1.1776   LearningRate 0.0001   Epoch: 19   Global Step: 797910   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:24:16,738-Speed 2629.48 samples/sec   Loss 1.2151   LearningRate 0.0001   Epoch: 19   Global Step: 797920   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-16 13:24:20,652-Speed 2617.23 samples/sec   Loss 1.1533   LearningRate 0.0001   Epoch: 19   Global Step: 797930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:24,747-Speed 2502.10 samples/sec   Loss 1.1686   LearningRate 0.0001   Epoch: 19   Global Step: 797940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:28,804-Speed 2524.15 samples/sec   Loss 1.1664   LearningRate 0.0001   Epoch: 19   Global Step: 797950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:32,718-Speed 2616.97 samples/sec   Loss 1.2050   LearningRate 0.0001   Epoch: 19   Global Step: 797960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:36,614-Speed 2628.89 samples/sec   Loss 1.1902   LearningRate 0.0001   Epoch: 19   Global Step: 797970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:40,509-Speed 2630.07 samples/sec   Loss 1.1899   LearningRate 0.0001   Epoch: 19   Global Step: 797980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:44,416-Speed 2621.47 samples/sec   Loss 1.1832   LearningRate 0.0001   Epoch: 19   Global Step: 797990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:48,323-Speed 2621.95 samples/sec   Loss 1.1442   LearningRate 0.0001   Epoch: 19   Global Step: 798000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:52,227-Speed 2623.27 samples/sec   Loss 1.1345   LearningRate 0.0001   Epoch: 19   Global Step: 798010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:24:56,122-Speed 2629.66 samples/sec   Loss 1.1433   LearningRate 0.0001   Epoch: 19   Global Step: 798020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:00,005-Speed 2638.21 samples/sec   Loss 1.1885   LearningRate 0.0001   Epoch: 19   Global Step: 798030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:03,903-Speed 2627.56 samples/sec   Loss 1.1829   LearningRate 0.0001   Epoch: 19   Global Step: 798040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:07,795-Speed 2631.54 samples/sec   Loss 1.1313   LearningRate 0.0001   Epoch: 19   Global Step: 798050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:11,689-Speed 2630.58 samples/sec   Loss 1.1602   LearningRate 0.0001   Epoch: 19   Global Step: 798060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:15,582-Speed 2630.31 samples/sec   Loss 1.1515   LearningRate 0.0001   Epoch: 19   Global Step: 798070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:19,484-Speed 2626.04 samples/sec   Loss 1.1725   LearningRate 0.0001   Epoch: 19   Global Step: 798080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:23,399-Speed 2616.20 samples/sec   Loss 1.1302   LearningRate 0.0001   Epoch: 19   Global Step: 798090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:27,303-Speed 2623.71 samples/sec   Loss 1.1724   LearningRate 0.0001   Epoch: 19   Global Step: 798100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:31,206-Speed 2623.73 samples/sec   Loss 1.1546   LearningRate 0.0001   Epoch: 19   Global Step: 798110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:35,098-Speed 2632.55 samples/sec   Loss 1.1661   LearningRate 0.0001   Epoch: 19   Global Step: 798120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:38,988-Speed 2632.59 samples/sec   Loss 1.1796   LearningRate 0.0001   Epoch: 19   Global Step: 798130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:25:42,854-Speed 2649.51 samples/sec   Loss 1.1388   LearningRate 0.0001   Epoch: 19   Global Step: 798140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:46,745-Speed 2632.40 samples/sec   Loss 1.1567   LearningRate 0.0001   Epoch: 19   Global Step: 798150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:50,641-Speed 2628.94 samples/sec   Loss 1.1490   LearningRate 0.0001   Epoch: 19   Global Step: 798160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:54,535-Speed 2630.85 samples/sec   Loss 1.1657   LearningRate 0.0001   Epoch: 19   Global Step: 798170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:25:58,445-Speed 2619.18 samples/sec   Loss 1.1599   LearningRate 0.0001   Epoch: 19   Global Step: 798180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:02,359-Speed 2617.65 samples/sec   Loss 1.1680   LearningRate 0.0001   Epoch: 19   Global Step: 798190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:06,255-Speed 2629.26 samples/sec   Loss 1.1910   LearningRate 0.0001   Epoch: 19   Global Step: 798200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:10,156-Speed 2625.36 samples/sec   Loss 1.1676   LearningRate 0.0001   Epoch: 19   Global Step: 798210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:14,090-Speed 2603.37 samples/sec   Loss 1.1461   LearningRate 0.0001   Epoch: 19   Global Step: 798220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:17,987-Speed 2628.84 samples/sec   Loss 1.1529   LearningRate 0.0001   Epoch: 19   Global Step: 798230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:21,923-Speed 2602.14 samples/sec   Loss 1.1739   LearningRate 0.0001   Epoch: 19   Global Step: 798240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:26:25,815-Speed 2631.72 samples/sec   Loss 1.1652   LearningRate 0.0001   Epoch: 19   Global Step: 798250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:26:29,710-Speed 2630.16 samples/sec   Loss 1.1575   LearningRate 0.0001   Epoch: 19   Global Step: 798260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:26:33,606-Speed 2628.77 samples/sec   Loss 1.1758   LearningRate 0.0001   Epoch: 19   Global Step: 798270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:26:37,498-Speed 2631.59 samples/sec   Loss 1.1232   LearningRate 0.0001   Epoch: 19   Global Step: 798280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:26:41,402-Speed 2623.84 samples/sec   Loss 1.1765   LearningRate 0.0001   Epoch: 19   Global Step: 798290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:26:45,330-Speed 2607.60 samples/sec   Loss 1.1801   LearningRate 0.0001   Epoch: 19   Global Step: 798300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:49,221-Speed 2632.36 samples/sec   Loss 1.1548   LearningRate 0.0001   Epoch: 19   Global Step: 798310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:53,137-Speed 2616.07 samples/sec   Loss 1.2279   LearningRate 0.0001   Epoch: 19   Global Step: 798320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:26:57,043-Speed 2621.83 samples/sec   Loss 1.2016   LearningRate 0.0001   Epoch: 19   Global Step: 798330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:00,935-Speed 2631.56 samples/sec   Loss 1.1759   LearningRate 0.0001   Epoch: 19   Global Step: 798340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:04,822-Speed 2635.39 samples/sec   Loss 1.1508   LearningRate 0.0001   Epoch: 19   Global Step: 798350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:08,718-Speed 2628.92 samples/sec   Loss 1.1636   LearningRate 0.0001   Epoch: 19   Global Step: 798360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:12,609-Speed 2632.56 samples/sec   Loss 1.1574   LearningRate 0.0001   Epoch: 19   Global Step: 798370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:16,499-Speed 2633.04 samples/sec   Loss 1.1737   LearningRate 0.0001   Epoch: 19   Global Step: 798380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:20,391-Speed 2631.96 samples/sec   Loss 1.1368   LearningRate 0.0001   Epoch: 19   Global Step: 798390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:24,296-Speed 2622.82 samples/sec   Loss 1.0868   LearningRate 0.0001   Epoch: 19   Global Step: 798400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:27:28,206-Speed 2619.99 samples/sec   Loss 1.1943   LearningRate 0.0001   Epoch: 19   Global Step: 798410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:27:32,121-Speed 2616.37 samples/sec   Loss 1.1807   LearningRate 0.0001   Epoch: 19   Global Step: 798420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-16 13:27:36,018-Speed 2627.70 samples/sec   Loss 1.2110   LearningRate 0.0001   Epoch: 19   Global Step: 798430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-16 13:27:39,960-Speed 2598.33 samples/sec   Loss 1.1839   LearningRate 0.0001   Epoch: 19   Global Step: 798440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:27:43,868-Speed 2621.33 samples/sec   Loss 1.1972   LearningRate 0.0001   Epoch: 19   Global Step: 798450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:27:47,775-Speed 2621.39 samples/sec   Loss 1.1248   LearningRate 0.0001   Epoch: 19   Global Step: 798460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:27:51,679-Speed 2624.10 samples/sec   Loss 1.1534   LearningRate 0.0001   Epoch: 19   Global Step: 798470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:27:55,574-Speed 2629.86 samples/sec   Loss 1.1781   LearningRate 0.0001   Epoch: 19   Global Step: 798480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:27:59,475-Speed 2625.95 samples/sec   Loss 1.1791   LearningRate 0.0001   Epoch: 19   Global Step: 798490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:03,375-Speed 2626.20 samples/sec   Loss 1.1622   LearningRate 0.0001   Epoch: 19   Global Step: 798500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:07,271-Speed 2629.02 samples/sec   Loss 1.1949   LearningRate 0.0001   Epoch: 19   Global Step: 798510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:11,178-Speed 2621.00 samples/sec   Loss 1.1574   LearningRate 0.0001   Epoch: 19   Global Step: 798520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:15,083-Speed 2623.81 samples/sec   Loss 1.1613   LearningRate 0.0001   Epoch: 19   Global Step: 798530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:28:18,966-Speed 2637.44 samples/sec   Loss 1.1791   LearningRate 0.0001   Epoch: 19   Global Step: 798540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:22,891-Speed 2609.76 samples/sec   Loss 1.1670   LearningRate 0.0001   Epoch: 19   Global Step: 798550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:26,794-Speed 2624.14 samples/sec   Loss 1.1476   LearningRate 0.0001   Epoch: 19   Global Step: 798560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:30,694-Speed 2626.95 samples/sec   Loss 1.1683   LearningRate 0.0001   Epoch: 19   Global Step: 798570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:34,592-Speed 2627.35 samples/sec   Loss 1.1888   LearningRate 0.0001   Epoch: 19   Global Step: 798580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:38,488-Speed 2629.31 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 19   Global Step: 798590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:42,387-Speed 2627.00 samples/sec   Loss 1.1185   LearningRate 0.0001   Epoch: 19   Global Step: 798600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:46,290-Speed 2623.93 samples/sec   Loss 1.1620   LearningRate 0.0001   Epoch: 19   Global Step: 798610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:50,314-Speed 2545.74 samples/sec   Loss 1.1623   LearningRate 0.0001   Epoch: 19   Global Step: 798620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:28:54,198-Speed 2637.10 samples/sec   Loss 1.1607   LearningRate 0.0001   Epoch: 19   Global Step: 798630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:28:58,148-Speed 2593.66 samples/sec   Loss 1.1756   LearningRate 0.0001   Epoch: 19   Global Step: 798640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:02,053-Speed 2622.88 samples/sec   Loss 1.1383   LearningRate 0.0001   Epoch: 19   Global Step: 798650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:05,948-Speed 2629.37 samples/sec   Loss 1.2304   LearningRate 0.0001   Epoch: 19   Global Step: 798660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:09,844-Speed 2628.77 samples/sec   Loss 1.1247   LearningRate 0.0001   Epoch: 19   Global Step: 798670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:13,742-Speed 2627.79 samples/sec   Loss 1.2056   LearningRate 0.0001   Epoch: 19   Global Step: 798680   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:17,643-Speed 2626.23 samples/sec   Loss 1.1687   LearningRate 0.0001   Epoch: 19   Global Step: 798690   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:21,529-Speed 2635.92 samples/sec   Loss 1.1457   LearningRate 0.0001   Epoch: 19   Global Step: 798700   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:25,429-Speed 2626.31 samples/sec   Loss 1.1785   LearningRate 0.0001   Epoch: 19   Global Step: 798710   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:29,320-Speed 2632.85 samples/sec   Loss 1.1350   LearningRate 0.0001   Epoch: 19   Global Step: 798720   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:29:33,207-Speed 2634.70 samples/sec   Loss 1.1738   LearningRate 0.0001   Epoch: 19   Global Step: 798730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:29:37,115-Speed 2621.05 samples/sec   Loss 1.1624   LearningRate 0.0001   Epoch: 19   Global Step: 798740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:29:41,027-Speed 2618.33 samples/sec   Loss 1.1321   LearningRate 0.0001   Epoch: 19   Global Step: 798750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:29:44,944-Speed 2615.96 samples/sec   Loss 1.1735   LearningRate 0.0001   Epoch: 19   Global Step: 798760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:29:48,845-Speed 2625.30 samples/sec   Loss 1.1314   LearningRate 0.0001   Epoch: 19   Global Step: 798770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:29:52,741-Speed 2629.93 samples/sec   Loss 1.1811   LearningRate 0.0001   Epoch: 19   Global Step: 798780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:29:56,638-Speed 2628.41 samples/sec   Loss 1.1943   LearningRate 0.0001   Epoch: 19   Global Step: 798790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:30:00,546-Speed 2620.59 samples/sec   Loss 1.1912   LearningRate 0.0001   Epoch: 19   Global Step: 798800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:30:04,455-Speed 2620.61 samples/sec   Loss 1.1359   LearningRate 0.0001   Epoch: 19   Global Step: 798810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:30:08,359-Speed 2623.70 samples/sec   Loss 1.1758   LearningRate 0.0001   Epoch: 19   Global Step: 798820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:30:12,237-Speed 2640.97 samples/sec   Loss 1.1613   LearningRate 0.0001   Epoch: 19   Global Step: 798830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:30:16,106-Speed 2647.39 samples/sec   Loss 1.1457   LearningRate 0.0001   Epoch: 19   Global Step: 798840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:20,044-Speed 2600.99 samples/sec   Loss 1.1408   LearningRate 0.0001   Epoch: 19   Global Step: 798850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:23,932-Speed 2634.62 samples/sec   Loss 1.2288   LearningRate 0.0001   Epoch: 19   Global Step: 798860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:27,823-Speed 2632.92 samples/sec   Loss 1.1552   LearningRate 0.0001   Epoch: 19   Global Step: 798870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:31,710-Speed 2634.66 samples/sec   Loss 1.1442   LearningRate 0.0001   Epoch: 19   Global Step: 798880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:35,608-Speed 2628.46 samples/sec   Loss 1.1616   LearningRate 0.0001   Epoch: 19   Global Step: 798890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:39,502-Speed 2629.91 samples/sec   Loss 1.1388   LearningRate 0.0001   Epoch: 19   Global Step: 798900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:43,394-Speed 2631.47 samples/sec   Loss 1.1457   LearningRate 0.0001   Epoch: 19   Global Step: 798910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:47,293-Speed 2626.97 samples/sec   Loss 1.1458   LearningRate 0.0001   Epoch: 19   Global Step: 798920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:51,182-Speed 2634.31 samples/sec   Loss 1.1545   LearningRate 0.0001   Epoch: 19   Global Step: 798930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:30:55,079-Speed 2629.35 samples/sec   Loss 1.1417   LearningRate 0.0001   Epoch: 19   Global Step: 798940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:30:58,970-Speed 2632.13 samples/sec   Loss 1.1520   LearningRate 0.0001   Epoch: 19   Global Step: 798950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:31:02,844-Speed 2644.29 samples/sec   Loss 1.1803   LearningRate 0.0001   Epoch: 19   Global Step: 798960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:06,738-Speed 2629.69 samples/sec   Loss 1.1690   LearningRate 0.0001   Epoch: 19   Global Step: 798970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:10,634-Speed 2628.94 samples/sec   Loss 1.1430   LearningRate 0.0001   Epoch: 19   Global Step: 798980   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:14,588-Speed 2590.56 samples/sec   Loss 1.1750   LearningRate 0.0001   Epoch: 19   Global Step: 798990   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:18,499-Speed 2619.08 samples/sec   Loss 1.2027   LearningRate 0.0001   Epoch: 19   Global Step: 799000   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:22,413-Speed 2616.85 samples/sec   Loss 1.1490   LearningRate 0.0001   Epoch: 19   Global Step: 799010   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:26,308-Speed 2629.81 samples/sec   Loss 1.1888   LearningRate 0.0001   Epoch: 19   Global Step: 799020   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:30,205-Speed 2628.52 samples/sec   Loss 1.0934   LearningRate 0.0001   Epoch: 19   Global Step: 799030   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:34,102-Speed 2629.50 samples/sec   Loss 1.1268   LearningRate 0.0001   Epoch: 19   Global Step: 799040   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:38,000-Speed 2627.19 samples/sec   Loss 1.1536   LearningRate 0.0001   Epoch: 19   Global Step: 799050   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:31:41,893-Speed 2631.44 samples/sec   Loss 1.1753   LearningRate 0.0001   Epoch: 19   Global Step: 799060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:31:45,784-Speed 2631.97 samples/sec   Loss 1.1646   LearningRate 0.0001   Epoch: 19   Global Step: 799070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:31:49,676-Speed 2631.92 samples/sec   Loss 1.1171   LearningRate 0.0001   Epoch: 19   Global Step: 799080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:31:53,568-Speed 2631.20 samples/sec   Loss 1.1542   LearningRate 0.0001   Epoch: 19   Global Step: 799090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:31:57,445-Speed 2642.11 samples/sec   Loss 1.1314   LearningRate 0.0001   Epoch: 19   Global Step: 799100   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:01,357-Speed 2618.94 samples/sec   Loss 1.1997   LearningRate 0.0001   Epoch: 19   Global Step: 799110   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:05,254-Speed 2628.56 samples/sec   Loss 1.1731   LearningRate 0.0001   Epoch: 19   Global Step: 799120   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:09,199-Speed 2596.48 samples/sec   Loss 1.1374   LearningRate 0.0001   Epoch: 19   Global Step: 799130   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:13,139-Speed 2598.90 samples/sec   Loss 1.1623   LearningRate 0.0001   Epoch: 19   Global Step: 799140   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:17,055-Speed 2615.63 samples/sec   Loss 1.2059   LearningRate 0.0001   Epoch: 19   Global Step: 799150   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:20,945-Speed 2633.40 samples/sec   Loss 1.1794   LearningRate 0.0001   Epoch: 19   Global Step: 799160   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:24,842-Speed 2628.77 samples/sec   Loss 1.1455   LearningRate 0.0001   Epoch: 19   Global Step: 799170   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:28,740-Speed 2627.51 samples/sec   Loss 1.1633   LearningRate 0.0001   Epoch: 19   Global Step: 799180   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:32,632-Speed 2631.17 samples/sec   Loss 1.1433   LearningRate 0.0001   Epoch: 19   Global Step: 799190   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:32:36,531-Speed 2627.33 samples/sec   Loss 1.1659   LearningRate 0.0001   Epoch: 19   Global Step: 799200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:32:40,428-Speed 2628.54 samples/sec   Loss 1.1766   LearningRate 0.0001   Epoch: 19   Global Step: 799210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:32:44,322-Speed 2630.16 samples/sec   Loss 1.1950   LearningRate 0.0001   Epoch: 19   Global Step: 799220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:32:48,220-Speed 2627.67 samples/sec   Loss 1.1595   LearningRate 0.0001   Epoch: 19   Global Step: 799230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:32:52,123-Speed 2623.87 samples/sec   Loss 1.0948   LearningRate 0.0001   Epoch: 19   Global Step: 799240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:32:56,015-Speed 2632.63 samples/sec   Loss 1.1795   LearningRate 0.0001   Epoch: 19   Global Step: 799250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:32:59,913-Speed 2627.26 samples/sec   Loss 1.1852   LearningRate 0.0001   Epoch: 19   Global Step: 799260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:03,808-Speed 2630.31 samples/sec   Loss 1.1855   LearningRate 0.0001   Epoch: 19   Global Step: 799270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:07,703-Speed 2629.05 samples/sec   Loss 1.1552   LearningRate 0.0001   Epoch: 19   Global Step: 799280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:11,603-Speed 2626.24 samples/sec   Loss 1.1630   LearningRate 0.0001   Epoch: 19   Global Step: 799290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:15,495-Speed 2632.24 samples/sec   Loss 1.1479   LearningRate 0.0001   Epoch: 19   Global Step: 799300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:33:19,366-Speed 2645.30 samples/sec   Loss 1.1591   LearningRate 0.0001   Epoch: 19   Global Step: 799310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:23,275-Speed 2620.62 samples/sec   Loss 1.1177   LearningRate 0.0001   Epoch: 19   Global Step: 799320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:27,175-Speed 2626.46 samples/sec   Loss 1.1840   LearningRate 0.0001   Epoch: 19   Global Step: 799330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:31,168-Speed 2565.32 samples/sec   Loss 1.1362   LearningRate 0.0001   Epoch: 19   Global Step: 799340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:35,060-Speed 2631.40 samples/sec   Loss 1.1927   LearningRate 0.0001   Epoch: 19   Global Step: 799350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:38,957-Speed 2628.09 samples/sec   Loss 1.1689   LearningRate 0.0001   Epoch: 19   Global Step: 799360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:42,857-Speed 2626.53 samples/sec   Loss 1.2357   LearningRate 0.0001   Epoch: 19   Global Step: 799370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:46,750-Speed 2631.31 samples/sec   Loss 1.1615   LearningRate 0.0001   Epoch: 19   Global Step: 799380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:50,641-Speed 2632.28 samples/sec   Loss 1.1160   LearningRate 0.0001   Epoch: 19   Global Step: 799390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:54,536-Speed 2629.95 samples/sec   Loss 1.1546   LearningRate 0.0001   Epoch: 19   Global Step: 799400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:33:58,431-Speed 2629.86 samples/sec   Loss 1.1761   LearningRate 0.0001   Epoch: 19   Global Step: 799410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:34:02,300-Speed 2647.41 samples/sec   Loss 1.1210   LearningRate 0.0001   Epoch: 19   Global Step: 799420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:06,195-Speed 2629.89 samples/sec   Loss 1.1631   LearningRate 0.0001   Epoch: 19   Global Step: 799430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:10,096-Speed 2625.21 samples/sec   Loss 1.1469   LearningRate 0.0001   Epoch: 19   Global Step: 799440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:14,009-Speed 2618.09 samples/sec   Loss 1.1690   LearningRate 0.0001   Epoch: 19   Global Step: 799450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:17,901-Speed 2631.44 samples/sec   Loss 1.1699   LearningRate 0.0001   Epoch: 19   Global Step: 799460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:21,801-Speed 2626.79 samples/sec   Loss 1.1736   LearningRate 0.0001   Epoch: 19   Global Step: 799470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:25,699-Speed 2627.43 samples/sec   Loss 1.1281   LearningRate 0.0001   Epoch: 19   Global Step: 799480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:29,594-Speed 2630.26 samples/sec   Loss 1.1753   LearningRate 0.0001   Epoch: 19   Global Step: 799490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:33,484-Speed 2632.75 samples/sec   Loss 1.1695   LearningRate 0.0001   Epoch: 19   Global Step: 799500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:37,417-Speed 2604.21 samples/sec   Loss 1.1573   LearningRate 0.0001   Epoch: 19   Global Step: 799510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:41,311-Speed 2630.52 samples/sec   Loss 1.1745   LearningRate 0.0001   Epoch: 19   Global Step: 799520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:34:45,195-Speed 2637.31 samples/sec   Loss 1.1513   LearningRate 0.0001   Epoch: 19   Global Step: 799530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:49,090-Speed 2629.56 samples/sec   Loss 1.1217   LearningRate 0.0001   Epoch: 19   Global Step: 799540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:53,013-Speed 2610.89 samples/sec   Loss 1.1631   LearningRate 0.0001   Epoch: 19   Global Step: 799550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:34:56,909-Speed 2628.87 samples/sec   Loss 1.1638   LearningRate 0.0001   Epoch: 19   Global Step: 799560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:00,819-Speed 2619.84 samples/sec   Loss 1.1457   LearningRate 0.0001   Epoch: 19   Global Step: 799570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:04,710-Speed 2632.98 samples/sec   Loss 1.1623   LearningRate 0.0001   Epoch: 19   Global Step: 799580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:08,609-Speed 2626.55 samples/sec   Loss 1.1828   LearningRate 0.0001   Epoch: 19   Global Step: 799590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:12,513-Speed 2623.87 samples/sec   Loss 1.1764   LearningRate 0.0001   Epoch: 19   Global Step: 799600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:16,413-Speed 2626.05 samples/sec   Loss 1.1657   LearningRate 0.0001   Epoch: 19   Global Step: 799610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:20,337-Speed 2610.59 samples/sec   Loss 1.1236   LearningRate 0.0001   Epoch: 19   Global Step: 799620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:24,230-Speed 2630.99 samples/sec   Loss 1.1502   LearningRate 0.0001   Epoch: 19   Global Step: 799630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:35:28,103-Speed 2645.19 samples/sec   Loss 1.1548   LearningRate 0.0001   Epoch: 19   Global Step: 799640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:32,001-Speed 2627.12 samples/sec   Loss 1.1721   LearningRate 0.0001   Epoch: 19   Global Step: 799650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:35,895-Speed 2630.50 samples/sec   Loss 1.1814   LearningRate 0.0001   Epoch: 19   Global Step: 799660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:39,787-Speed 2631.01 samples/sec   Loss 1.1331   LearningRate 0.0001   Epoch: 19   Global Step: 799670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:43,697-Speed 2620.03 samples/sec   Loss 1.1286   LearningRate 0.0001   Epoch: 19   Global Step: 799680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:47,597-Speed 2626.89 samples/sec   Loss 1.1545   LearningRate 0.0001   Epoch: 19   Global Step: 799690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:51,493-Speed 2628.74 samples/sec   Loss 1.1706   LearningRate 0.0001   Epoch: 19   Global Step: 799700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:55,398-Speed 2623.22 samples/sec   Loss 1.1440   LearningRate 0.0001   Epoch: 19   Global Step: 799710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:35:59,290-Speed 2632.25 samples/sec   Loss 1.1462   LearningRate 0.0001   Epoch: 19   Global Step: 799720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:03,205-Speed 2616.03 samples/sec   Loss 1.2640   LearningRate 0.0001   Epoch: 19   Global Step: 799730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:07,282-Speed 2511.85 samples/sec   Loss 1.1518   LearningRate 0.0001   Epoch: 19   Global Step: 799740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:36:11,186-Speed 2623.37 samples/sec   Loss 1.1343   LearningRate 0.0001   Epoch: 19   Global Step: 799750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:36:15,065-Speed 2641.55 samples/sec   Loss 1.1766   LearningRate 0.0001   Epoch: 19   Global Step: 799760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:18,962-Speed 2627.83 samples/sec   Loss 1.1633   LearningRate 0.0001   Epoch: 19   Global Step: 799770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:22,853-Speed 2632.07 samples/sec   Loss 1.1285   LearningRate 0.0001   Epoch: 19   Global Step: 799780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:26,746-Speed 2631.42 samples/sec   Loss 1.1629   LearningRate 0.0001   Epoch: 19   Global Step: 799790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:30,638-Speed 2631.08 samples/sec   Loss 1.1950   LearningRate 0.0001   Epoch: 19   Global Step: 799800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:34,546-Speed 2627.41 samples/sec   Loss 1.1882   LearningRate 0.0001   Epoch: 19   Global Step: 799810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:38,456-Speed 2619.67 samples/sec   Loss 1.1920   LearningRate 0.0001   Epoch: 19   Global Step: 799820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:42,351-Speed 2629.23 samples/sec   Loss 1.1218   LearningRate 0.0001   Epoch: 19   Global Step: 799830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:46,248-Speed 2627.93 samples/sec   Loss 1.1381   LearningRate 0.0001   Epoch: 19   Global Step: 799840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:50,141-Speed 2632.05 samples/sec   Loss 1.1396   LearningRate 0.0001   Epoch: 19   Global Step: 799850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:36:54,032-Speed 2631.69 samples/sec   Loss 1.1600   LearningRate 0.0001   Epoch: 19   Global Step: 799860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:36:58,002-Speed 2580.30 samples/sec   Loss 1.2038   LearningRate 0.0001   Epoch: 19   Global Step: 799870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:01,899-Speed 2628.54 samples/sec   Loss 1.1988   LearningRate 0.0001   Epoch: 19   Global Step: 799880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:05,791-Speed 2632.29 samples/sec   Loss 1.2038   LearningRate 0.0001   Epoch: 19   Global Step: 799890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:09,689-Speed 2627.57 samples/sec   Loss 1.1752   LearningRate 0.0001   Epoch: 19   Global Step: 799900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:13,613-Speed 2609.71 samples/sec   Loss 1.1491   LearningRate 0.0001   Epoch: 19   Global Step: 799910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:17,507-Speed 2630.66 samples/sec   Loss 1.1251   LearningRate 0.0001   Epoch: 19   Global Step: 799920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:21,394-Speed 2635.30 samples/sec   Loss 1.1303   LearningRate 0.0001   Epoch: 19   Global Step: 799930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:25,321-Speed 2608.56 samples/sec   Loss 1.1809   LearningRate 0.0001   Epoch: 19   Global Step: 799940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:29,213-Speed 2631.77 samples/sec   Loss 1.1616   LearningRate 0.0001   Epoch: 19   Global Step: 799950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:33,100-Speed 2635.65 samples/sec   Loss 1.1638   LearningRate 0.0001   Epoch: 19   Global Step: 799960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:36,991-Speed 2631.93 samples/sec   Loss 1.1955   LearningRate 0.0001   Epoch: 19   Global Step: 799970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:37:40,887-Speed 2629.00 samples/sec   Loss 1.1428   LearningRate 0.0001   Epoch: 19   Global Step: 799980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:37:44,759-Speed 2645.59 samples/sec   Loss 1.1684   LearningRate 0.0001   Epoch: 19   Global Step: 799990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:37:48,653-Speed 2630.58 samples/sec   Loss 1.1323   LearningRate 0.0001   Epoch: 19   Global Step: 800000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:38:31,412-[lfw][800000]XNorm: 21.339039
Training: 2022-04-16 13:38:31,413-[lfw][800000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 13:38:31,414-[lfw][800000]Accuracy-Highest: 0.99850
Training: 2022-04-16 13:39:21,261-[cfp_fp][800000]XNorm: 21.776211
Training: 2022-04-16 13:39:21,262-[cfp_fp][800000]Accuracy-Flip: 0.99300+-0.00375
Training: 2022-04-16 13:39:21,263-[cfp_fp][800000]Accuracy-Highest: 0.99400
Training: 2022-04-16 13:40:04,370-[agedb_30][800000]XNorm: 22.300622
Training: 2022-04-16 13:40:04,371-[agedb_30][800000]Accuracy-Flip: 0.98583+-0.00512
Training: 2022-04-16 13:40:04,371-[agedb_30][800000]Accuracy-Highest: 0.98583
Training: 2022-04-16 13:40:08,242-Speed 73.36 samples/sec   Loss 1.1722   LearningRate 0.0001   Epoch: 19   Global Step: 800010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:12,109-Speed 2648.92 samples/sec   Loss 1.1687   LearningRate 0.0001   Epoch: 19   Global Step: 800020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:15,983-Speed 2644.05 samples/sec   Loss 1.1596   LearningRate 0.0001   Epoch: 19   Global Step: 800030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:19,864-Speed 2638.90 samples/sec   Loss 1.1807   LearningRate 0.0001   Epoch: 19   Global Step: 800040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:23,748-Speed 2636.55 samples/sec   Loss 1.1998   LearningRate 0.0001   Epoch: 19   Global Step: 800050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:27,660-Speed 2618.62 samples/sec   Loss 1.1270   LearningRate 0.0001   Epoch: 19   Global Step: 800060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:31,574-Speed 2618.97 samples/sec   Loss 1.1416   LearningRate 0.0001   Epoch: 19   Global Step: 800070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:35,462-Speed 2634.28 samples/sec   Loss 1.2012   LearningRate 0.0001   Epoch: 19   Global Step: 800080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:39,329-Speed 2648.88 samples/sec   Loss 1.1561   LearningRate 0.0001   Epoch: 19   Global Step: 800090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:43,220-Speed 2632.99 samples/sec   Loss 1.2016   LearningRate 0.0001   Epoch: 19   Global Step: 800100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:47,111-Speed 2632.31 samples/sec   Loss 1.1522   LearningRate 0.0001   Epoch: 19   Global Step: 800110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:51,007-Speed 2628.95 samples/sec   Loss 1.2120   LearningRate 0.0001   Epoch: 19   Global Step: 800120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:54,909-Speed 2625.05 samples/sec   Loss 1.1652   LearningRate 0.0001   Epoch: 19   Global Step: 800130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:40:58,808-Speed 2626.32 samples/sec   Loss 1.1538   LearningRate 0.0001   Epoch: 19   Global Step: 800140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:02,702-Speed 2630.72 samples/sec   Loss 1.1665   LearningRate 0.0001   Epoch: 19   Global Step: 800150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:06,596-Speed 2630.52 samples/sec   Loss 1.1728   LearningRate 0.0001   Epoch: 19   Global Step: 800160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:10,486-Speed 2633.06 samples/sec   Loss 1.1389   LearningRate 0.0001   Epoch: 19   Global Step: 800170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:14,378-Speed 2631.51 samples/sec   Loss 1.1819   LearningRate 0.0001   Epoch: 19   Global Step: 800180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:18,272-Speed 2630.75 samples/sec   Loss 1.1379   LearningRate 0.0001   Epoch: 19   Global Step: 800190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:41:22,139-Speed 2648.54 samples/sec   Loss 1.1992   LearningRate 0.0001   Epoch: 19   Global Step: 800200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:26,034-Speed 2629.43 samples/sec   Loss 1.1384   LearningRate 0.0001   Epoch: 19   Global Step: 800210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:29,927-Speed 2630.59 samples/sec   Loss 1.1888   LearningRate 0.0001   Epoch: 19   Global Step: 800220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:33,818-Speed 2632.80 samples/sec   Loss 1.1735   LearningRate 0.0001   Epoch: 19   Global Step: 800230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:37,713-Speed 2629.89 samples/sec   Loss 1.1644   LearningRate 0.0001   Epoch: 19   Global Step: 800240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:41,608-Speed 2629.53 samples/sec   Loss 1.1436   LearningRate 0.0001   Epoch: 19   Global Step: 800250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:45,504-Speed 2629.29 samples/sec   Loss 1.1716   LearningRate 0.0001   Epoch: 19   Global Step: 800260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:49,462-Speed 2587.89 samples/sec   Loss 1.1610   LearningRate 0.0001   Epoch: 19   Global Step: 800270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:53,357-Speed 2629.48 samples/sec   Loss 1.1652   LearningRate 0.0001   Epoch: 19   Global Step: 800280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:41:57,251-Speed 2630.96 samples/sec   Loss 1.1992   LearningRate 0.0001   Epoch: 19   Global Step: 800290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:01,118-Speed 2648.48 samples/sec   Loss 1.1541   LearningRate 0.0001   Epoch: 19   Global Step: 800300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:05,015-Speed 2628.83 samples/sec   Loss 1.1292   LearningRate 0.0001   Epoch: 19   Global Step: 800310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:09,082-Speed 2518.65 samples/sec   Loss 1.2197   LearningRate 0.0001   Epoch: 19   Global Step: 800320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:12,981-Speed 2626.65 samples/sec   Loss 1.1810   LearningRate 0.0001   Epoch: 19   Global Step: 800330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:16,878-Speed 2629.11 samples/sec   Loss 1.1652   LearningRate 0.0001   Epoch: 19   Global Step: 800340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:20,787-Speed 2620.07 samples/sec   Loss 1.1487   LearningRate 0.0001   Epoch: 19   Global Step: 800350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:24,678-Speed 2632.05 samples/sec   Loss 1.1326   LearningRate 0.0001   Epoch: 19   Global Step: 800360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:28,588-Speed 2619.58 samples/sec   Loss 1.1304   LearningRate 0.0001   Epoch: 19   Global Step: 800370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:32,477-Speed 2633.94 samples/sec   Loss 1.1109   LearningRate 0.0001   Epoch: 19   Global Step: 800380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:36,370-Speed 2630.78 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 19   Global Step: 800390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:40,265-Speed 2630.30 samples/sec   Loss 1.1353   LearningRate 0.0001   Epoch: 19   Global Step: 800400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:42:44,157-Speed 2631.96 samples/sec   Loss 1.1244   LearningRate 0.0001   Epoch: 19   Global Step: 800410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:42:48,024-Speed 2648.60 samples/sec   Loss 1.1462   LearningRate 0.0001   Epoch: 19   Global Step: 800420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:51,925-Speed 2626.21 samples/sec   Loss 1.1526   LearningRate 0.0001   Epoch: 19   Global Step: 800430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:55,818-Speed 2630.77 samples/sec   Loss 1.1936   LearningRate 0.0001   Epoch: 19   Global Step: 800440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:42:59,737-Speed 2613.57 samples/sec   Loss 1.1334   LearningRate 0.0001   Epoch: 19   Global Step: 800450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:03,643-Speed 2621.79 samples/sec   Loss 1.1686   LearningRate 0.0001   Epoch: 19   Global Step: 800460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:07,663-Speed 2548.02 samples/sec   Loss 1.1785   LearningRate 0.0001   Epoch: 19   Global Step: 800470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:11,576-Speed 2617.60 samples/sec   Loss 1.1541   LearningRate 0.0001   Epoch: 19   Global Step: 800480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:15,484-Speed 2621.62 samples/sec   Loss 1.2034   LearningRate 0.0001   Epoch: 19   Global Step: 800490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:19,425-Speed 2599.34 samples/sec   Loss 1.1154   LearningRate 0.0001   Epoch: 19   Global Step: 800500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:23,330-Speed 2622.82 samples/sec   Loss 1.1887   LearningRate 0.0001   Epoch: 19   Global Step: 800510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:27,236-Speed 2622.00 samples/sec   Loss 1.1335   LearningRate 0.0001   Epoch: 19   Global Step: 800520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:43:31,119-Speed 2637.91 samples/sec   Loss 1.1241   LearningRate 0.0001   Epoch: 19   Global Step: 800530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:35,012-Speed 2631.32 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 19   Global Step: 800540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:38,902-Speed 2632.29 samples/sec   Loss 1.1423   LearningRate 0.0001   Epoch: 19   Global Step: 800550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:42,828-Speed 2609.16 samples/sec   Loss 1.1006   LearningRate 0.0001   Epoch: 19   Global Step: 800560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:46,732-Speed 2623.82 samples/sec   Loss 1.1492   LearningRate 0.0001   Epoch: 19   Global Step: 800570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:50,682-Speed 2593.44 samples/sec   Loss 1.1153   LearningRate 0.0001   Epoch: 19   Global Step: 800580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:54,573-Speed 2632.77 samples/sec   Loss 1.1353   LearningRate 0.0001   Epoch: 19   Global Step: 800590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:43:58,495-Speed 2611.56 samples/sec   Loss 1.1513   LearningRate 0.0001   Epoch: 19   Global Step: 800600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:02,408-Speed 2618.20 samples/sec   Loss 1.1691   LearningRate 0.0001   Epoch: 19   Global Step: 800610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:06,392-Speed 2570.36 samples/sec   Loss 1.1408   LearningRate 0.0001   Epoch: 19   Global Step: 800620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:10,320-Speed 2607.90 samples/sec   Loss 1.1908   LearningRate 0.0001   Epoch: 19   Global Step: 800630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:44:14,222-Speed 2625.26 samples/sec   Loss 1.1185   LearningRate 0.0001   Epoch: 19   Global Step: 800640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:44:18,138-Speed 2615.48 samples/sec   Loss 1.1435   LearningRate 0.0001   Epoch: 19   Global Step: 800650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:44:22,045-Speed 2621.57 samples/sec   Loss 1.1663   LearningRate 0.0001   Epoch: 19   Global Step: 800660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:44:25,920-Speed 2643.75 samples/sec   Loss 1.1512   LearningRate 0.0001   Epoch: 19   Global Step: 800670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:29,810-Speed 2633.26 samples/sec   Loss 1.1383   LearningRate 0.0001   Epoch: 19   Global Step: 800680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:33,719-Speed 2620.12 samples/sec   Loss 1.1777   LearningRate 0.0001   Epoch: 19   Global Step: 800690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:37,627-Speed 2620.81 samples/sec   Loss 1.1324   LearningRate 0.0001   Epoch: 19   Global Step: 800700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:41,535-Speed 2620.65 samples/sec   Loss 1.1330   LearningRate 0.0001   Epoch: 19   Global Step: 800710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:45,442-Speed 2621.88 samples/sec   Loss 1.1533   LearningRate 0.0001   Epoch: 19   Global Step: 800720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:49,333-Speed 2632.64 samples/sec   Loss 1.1293   LearningRate 0.0001   Epoch: 19   Global Step: 800730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:53,222-Speed 2633.61 samples/sec   Loss 1.1779   LearningRate 0.0001   Epoch: 19   Global Step: 800740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:44:57,116-Speed 2630.45 samples/sec   Loss 1.1354   LearningRate 0.0001   Epoch: 19   Global Step: 800750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:45:01,005-Speed 2633.49 samples/sec   Loss 1.1727   LearningRate 0.0001   Epoch: 19   Global Step: 800760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:45:04,903-Speed 2627.93 samples/sec   Loss 1.1823   LearningRate 0.0001   Epoch: 19   Global Step: 800770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:45:08,747-Speed 2664.41 samples/sec   Loss 1.1908   LearningRate 0.0001   Epoch: 19   Global Step: 800780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:12,680-Speed 2604.63 samples/sec   Loss 1.1556   LearningRate 0.0001   Epoch: 19   Global Step: 800790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:16,578-Speed 2627.27 samples/sec   Loss 1.1363   LearningRate 0.0001   Epoch: 19   Global Step: 800800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:20,555-Speed 2576.21 samples/sec   Loss 1.2038   LearningRate 0.0001   Epoch: 19   Global Step: 800810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:24,450-Speed 2629.25 samples/sec   Loss 1.1761   LearningRate 0.0001   Epoch: 19   Global Step: 800820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:28,346-Speed 2629.72 samples/sec   Loss 1.1630   LearningRate 0.0001   Epoch: 19   Global Step: 800830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:32,259-Speed 2617.36 samples/sec   Loss 1.1619   LearningRate 0.0001   Epoch: 19   Global Step: 800840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:36,164-Speed 2622.85 samples/sec   Loss 1.1519   LearningRate 0.0001   Epoch: 19   Global Step: 800850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:40,060-Speed 2629.11 samples/sec   Loss 1.1832   LearningRate 0.0001   Epoch: 19   Global Step: 800860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:44,013-Speed 2590.97 samples/sec   Loss 1.1310   LearningRate 0.0001   Epoch: 19   Global Step: 800870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:45:47,904-Speed 2632.20 samples/sec   Loss 1.1784   LearningRate 0.0001   Epoch: 19   Global Step: 800880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:45:51,797-Speed 2632.01 samples/sec   Loss 1.1541   LearningRate 0.0001   Epoch: 19   Global Step: 800890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:45:55,727-Speed 2606.19 samples/sec   Loss 1.1543   LearningRate 0.0001   Epoch: 19   Global Step: 800900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:45:59,621-Speed 2630.68 samples/sec   Loss 1.1795   LearningRate 0.0001   Epoch: 19   Global Step: 800910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:03,517-Speed 2629.28 samples/sec   Loss 1.0891   LearningRate 0.0001   Epoch: 19   Global Step: 800920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:07,412-Speed 2629.15 samples/sec   Loss 1.1790   LearningRate 0.0001   Epoch: 19   Global Step: 800930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:11,315-Speed 2624.28 samples/sec   Loss 1.1542   LearningRate 0.0001   Epoch: 19   Global Step: 800940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:15,230-Speed 2616.34 samples/sec   Loss 1.1427   LearningRate 0.0001   Epoch: 19   Global Step: 800950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:19,128-Speed 2628.09 samples/sec   Loss 1.1108   LearningRate 0.0001   Epoch: 19   Global Step: 800960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:23,020-Speed 2631.72 samples/sec   Loss 1.1297   LearningRate 0.0001   Epoch: 19   Global Step: 800970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:26,952-Speed 2604.86 samples/sec   Loss 1.1447   LearningRate 0.0001   Epoch: 19   Global Step: 800980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:46:30,820-Speed 2648.17 samples/sec   Loss 1.1179   LearningRate 0.0001   Epoch: 19   Global Step: 800990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:34,716-Speed 2629.27 samples/sec   Loss 1.1471   LearningRate 0.0001   Epoch: 19   Global Step: 801000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:38,627-Speed 2618.55 samples/sec   Loss 1.1365   LearningRate 0.0001   Epoch: 19   Global Step: 801010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:42,525-Speed 2628.29 samples/sec   Loss 1.1448   LearningRate 0.0001   Epoch: 19   Global Step: 801020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:46,435-Speed 2619.42 samples/sec   Loss 1.0839   LearningRate 0.0001   Epoch: 19   Global Step: 801030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:50,348-Speed 2617.43 samples/sec   Loss 1.1836   LearningRate 0.0001   Epoch: 19   Global Step: 801040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:54,256-Speed 2621.45 samples/sec   Loss 1.1092   LearningRate 0.0001   Epoch: 19   Global Step: 801050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:46:58,153-Speed 2628.32 samples/sec   Loss 1.2053   LearningRate 0.0001   Epoch: 19   Global Step: 801060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:02,048-Speed 2629.64 samples/sec   Loss 1.1396   LearningRate 0.0001   Epoch: 19   Global Step: 801070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:05,936-Speed 2634.36 samples/sec   Loss 1.1724   LearningRate 0.0001   Epoch: 19   Global Step: 801080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:09,833-Speed 2628.02 samples/sec   Loss 1.1062   LearningRate 0.0001   Epoch: 19   Global Step: 801090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:47:13,732-Speed 2627.04 samples/sec   Loss 1.2030   LearningRate 0.0001   Epoch: 19   Global Step: 801100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:47:17,627-Speed 2630.03 samples/sec   Loss 1.1658   LearningRate 0.0001   Epoch: 19   Global Step: 801110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:47:21,493-Speed 2649.30 samples/sec   Loss 1.2190   LearningRate 0.0001   Epoch: 19   Global Step: 801120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:25,385-Speed 2632.28 samples/sec   Loss 1.1250   LearningRate 0.0001   Epoch: 19   Global Step: 801130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:29,281-Speed 2628.67 samples/sec   Loss 1.1655   LearningRate 0.0001   Epoch: 19   Global Step: 801140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:33,169-Speed 2635.35 samples/sec   Loss 1.1278   LearningRate 0.0001   Epoch: 19   Global Step: 801150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:37,062-Speed 2630.97 samples/sec   Loss 1.1338   LearningRate 0.0001   Epoch: 19   Global Step: 801160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:40,992-Speed 2605.81 samples/sec   Loss 1.1493   LearningRate 0.0001   Epoch: 19   Global Step: 801170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:44,898-Speed 2622.83 samples/sec   Loss 1.1230   LearningRate 0.0001   Epoch: 19   Global Step: 801180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:48,792-Speed 2630.68 samples/sec   Loss 1.1478   LearningRate 0.0001   Epoch: 19   Global Step: 801190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:52,686-Speed 2630.02 samples/sec   Loss 1.1428   LearningRate 0.0001   Epoch: 19   Global Step: 801200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:47:56,583-Speed 2629.04 samples/sec   Loss 1.1591   LearningRate 0.0001   Epoch: 19   Global Step: 801210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:00,477-Speed 2629.86 samples/sec   Loss 1.1295   LearningRate 0.0001   Epoch: 19   Global Step: 801220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:48:04,526-Speed 2530.29 samples/sec   Loss 1.1514   LearningRate 0.0001   Epoch: 19   Global Step: 801230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:48:08,399-Speed 2644.40 samples/sec   Loss 1.1352   LearningRate 0.0001   Epoch: 19   Global Step: 801240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:12,337-Speed 2601.29 samples/sec   Loss 1.1566   LearningRate 0.0001   Epoch: 19   Global Step: 801250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:16,224-Speed 2635.04 samples/sec   Loss 1.1946   LearningRate 0.0001   Epoch: 19   Global Step: 801260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:20,115-Speed 2632.74 samples/sec   Loss 1.1647   LearningRate 0.0001   Epoch: 19   Global Step: 801270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:24,003-Speed 2634.45 samples/sec   Loss 1.1234   LearningRate 0.0001   Epoch: 19   Global Step: 801280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:27,911-Speed 2620.70 samples/sec   Loss 1.1697   LearningRate 0.0001   Epoch: 19   Global Step: 801290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:31,804-Speed 2630.54 samples/sec   Loss 1.1514   LearningRate 0.0001   Epoch: 19   Global Step: 801300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:35,715-Speed 2619.13 samples/sec   Loss 1.1604   LearningRate 0.0001   Epoch: 19   Global Step: 801310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:39,635-Speed 2613.46 samples/sec   Loss 1.1815   LearningRate 0.0001   Epoch: 19   Global Step: 801320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:43,529-Speed 2630.84 samples/sec   Loss 1.1729   LearningRate 0.0001   Epoch: 19   Global Step: 801330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:47,400-Speed 2646.14 samples/sec   Loss 1.1781   LearningRate 0.0001   Epoch: 19   Global Step: 801340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:51,303-Speed 2624.36 samples/sec   Loss 1.1320   LearningRate 0.0001   Epoch: 19   Global Step: 801350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:55,196-Speed 2631.59 samples/sec   Loss 1.1856   LearningRate 0.0001   Epoch: 19   Global Step: 801360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:48:59,085-Speed 2633.57 samples/sec   Loss 1.1767   LearningRate 0.0001   Epoch: 19   Global Step: 801370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:49:02,979-Speed 2629.94 samples/sec   Loss 1.1563   LearningRate 0.0001   Epoch: 19   Global Step: 801380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:49:06,874-Speed 2629.95 samples/sec   Loss 1.1436   LearningRate 0.0001   Epoch: 19   Global Step: 801390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:49:10,766-Speed 2632.00 samples/sec   Loss 1.1778   LearningRate 0.0001   Epoch: 19   Global Step: 801400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:49:14,694-Speed 2607.35 samples/sec   Loss 1.1599   LearningRate 0.0001   Epoch: 19   Global Step: 801410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:49:18,588-Speed 2630.15 samples/sec   Loss 1.1357   LearningRate 0.0001   Epoch: 19   Global Step: 801420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:49:22,515-Speed 2609.11 samples/sec   Loss 1.1510   LearningRate 0.0001   Epoch: 19   Global Step: 801430   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:26,410-Speed 2629.36 samples/sec   Loss 1.1603   LearningRate 0.0001   Epoch: 19   Global Step: 801440   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:30,306-Speed 2629.16 samples/sec   Loss 1.1584   LearningRate 0.0001   Epoch: 19   Global Step: 801450   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:34,197-Speed 2632.99 samples/sec   Loss 1.1566   LearningRate 0.0001   Epoch: 19   Global Step: 801460   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:38,092-Speed 2629.41 samples/sec   Loss 1.1505   LearningRate 0.0001   Epoch: 19   Global Step: 801470   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:41,985-Speed 2630.62 samples/sec   Loss 1.1777   LearningRate 0.0001   Epoch: 19   Global Step: 801480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:45,882-Speed 2628.34 samples/sec   Loss 1.1255   LearningRate 0.0001   Epoch: 19   Global Step: 801490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:49,778-Speed 2629.23 samples/sec   Loss 1.1292   LearningRate 0.0001   Epoch: 19   Global Step: 801500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:53,672-Speed 2630.93 samples/sec   Loss 1.1342   LearningRate 0.0001   Epoch: 19   Global Step: 801510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:49:57,573-Speed 2625.86 samples/sec   Loss 1.1514   LearningRate 0.0001   Epoch: 19   Global Step: 801520   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:50:01,523-Speed 2593.36 samples/sec   Loss 1.1326   LearningRate 0.0001   Epoch: 19   Global Step: 801530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:05,417-Speed 2629.89 samples/sec   Loss 1.1566   LearningRate 0.0001   Epoch: 19   Global Step: 801540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:09,317-Speed 2626.50 samples/sec   Loss 1.0999   LearningRate 0.0001   Epoch: 19   Global Step: 801550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:13,207-Speed 2633.18 samples/sec   Loss 1.1609   LearningRate 0.0001   Epoch: 19   Global Step: 801560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:17,101-Speed 2630.38 samples/sec   Loss 1.1563   LearningRate 0.0001   Epoch: 19   Global Step: 801570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:21,009-Speed 2620.79 samples/sec   Loss 1.1505   LearningRate 0.0001   Epoch: 19   Global Step: 801580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:24,901-Speed 2631.89 samples/sec   Loss 1.0961   LearningRate 0.0001   Epoch: 19   Global Step: 801590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:28,790-Speed 2634.17 samples/sec   Loss 1.1671   LearningRate 0.0001   Epoch: 19   Global Step: 801600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:32,695-Speed 2622.61 samples/sec   Loss 1.1655   LearningRate 0.0001   Epoch: 19   Global Step: 801610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:36,586-Speed 2632.30 samples/sec   Loss 1.1626   LearningRate 0.0001   Epoch: 19   Global Step: 801620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:40,458-Speed 2644.96 samples/sec   Loss 1.1374   LearningRate 0.0001   Epoch: 19   Global Step: 801630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:44,355-Speed 2629.42 samples/sec   Loss 1.1951   LearningRate 0.0001   Epoch: 19   Global Step: 801640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:48,250-Speed 2629.39 samples/sec   Loss 1.1549   LearningRate 0.0001   Epoch: 19   Global Step: 801650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:52,141-Speed 2632.54 samples/sec   Loss 1.1531   LearningRate 0.0001   Epoch: 19   Global Step: 801660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:56,034-Speed 2630.53 samples/sec   Loss 1.1380   LearningRate 0.0001   Epoch: 19   Global Step: 801670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:50:59,928-Speed 2630.71 samples/sec   Loss 1.1551   LearningRate 0.0001   Epoch: 19   Global Step: 801680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:03,833-Speed 2622.61 samples/sec   Loss 1.0967   LearningRate 0.0001   Epoch: 19   Global Step: 801690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:07,731-Speed 2627.77 samples/sec   Loss 1.1716   LearningRate 0.0001   Epoch: 19   Global Step: 801700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:11,627-Speed 2628.97 samples/sec   Loss 1.1290   LearningRate 0.0001   Epoch: 19   Global Step: 801710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:15,525-Speed 2628.05 samples/sec   Loss 1.1395   LearningRate 0.0001   Epoch: 19   Global Step: 801720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:19,428-Speed 2624.27 samples/sec   Loss 1.1695   LearningRate 0.0001   Epoch: 19   Global Step: 801730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:51:23,342-Speed 2617.59 samples/sec   Loss 1.1411   LearningRate 0.0001   Epoch: 19   Global Step: 801740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:27,242-Speed 2626.26 samples/sec   Loss 1.1572   LearningRate 0.0001   Epoch: 19   Global Step: 801750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:31,151-Speed 2620.67 samples/sec   Loss 1.1468   LearningRate 0.0001   Epoch: 19   Global Step: 801760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:35,071-Speed 2613.22 samples/sec   Loss 1.1651   LearningRate 0.0001   Epoch: 19   Global Step: 801770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:38,968-Speed 2628.01 samples/sec   Loss 1.1749   LearningRate 0.0001   Epoch: 19   Global Step: 801780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:42,877-Speed 2620.05 samples/sec   Loss 1.1688   LearningRate 0.0001   Epoch: 19   Global Step: 801790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:46,777-Speed 2626.03 samples/sec   Loss 1.1692   LearningRate 0.0001   Epoch: 19   Global Step: 801800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:50,680-Speed 2624.61 samples/sec   Loss 1.1588   LearningRate 0.0001   Epoch: 19   Global Step: 801810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:51:54,549-Speed 2647.25 samples/sec   Loss 1.1328   LearningRate 0.0001   Epoch: 19   Global Step: 801820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:51:58,447-Speed 2628.18 samples/sec   Loss 1.1480   LearningRate 0.0001   Epoch: 19   Global Step: 801830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:02,340-Speed 2630.66 samples/sec   Loss 1.1416   LearningRate 0.0001   Epoch: 19   Global Step: 801840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:06,231-Speed 2632.30 samples/sec   Loss 1.1412   LearningRate 0.0001   Epoch: 19   Global Step: 801850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:10,133-Speed 2625.09 samples/sec   Loss 1.1161   LearningRate 0.0001   Epoch: 19   Global Step: 801860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:14,029-Speed 2629.36 samples/sec   Loss 1.1448   LearningRate 0.0001   Epoch: 19   Global Step: 801870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:17,934-Speed 2622.62 samples/sec   Loss 1.1494   LearningRate 0.0001   Epoch: 19   Global Step: 801880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:21,841-Speed 2621.22 samples/sec   Loss 1.1776   LearningRate 0.0001   Epoch: 19   Global Step: 801890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:25,744-Speed 2624.72 samples/sec   Loss 1.1294   LearningRate 0.0001   Epoch: 19   Global Step: 801900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:29,667-Speed 2610.30 samples/sec   Loss 1.1809   LearningRate 0.0001   Epoch: 19   Global Step: 801910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:52:33,571-Speed 2624.20 samples/sec   Loss 1.1238   LearningRate 0.0001   Epoch: 19   Global Step: 801920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:52:37,480-Speed 2620.05 samples/sec   Loss 1.1521   LearningRate 0.0001   Epoch: 19   Global Step: 801930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:52:41,384-Speed 2623.58 samples/sec   Loss 1.1612   LearningRate 0.0001   Epoch: 19   Global Step: 801940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:52:45,294-Speed 2619.36 samples/sec   Loss 1.1294   LearningRate 0.0001   Epoch: 19   Global Step: 801950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:52:49,199-Speed 2622.88 samples/sec   Loss 1.1354   LearningRate 0.0001   Epoch: 19   Global Step: 801960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:52:53,097-Speed 2627.81 samples/sec   Loss 1.1257   LearningRate 0.0001   Epoch: 19   Global Step: 801970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:52:56,979-Speed 2638.70 samples/sec   Loss 1.1871   LearningRate 0.0001   Epoch: 19   Global Step: 801980   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:00,896-Speed 2615.09 samples/sec   Loss 1.0930   LearningRate 0.0001   Epoch: 19   Global Step: 801990   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:04,792-Speed 2628.81 samples/sec   Loss 1.1969   LearningRate 0.0001   Epoch: 19   Global Step: 802000   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:08,702-Speed 2619.73 samples/sec   Loss 1.1546   LearningRate 0.0001   Epoch: 19   Global Step: 802010   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:12,790-Speed 2505.66 samples/sec   Loss 1.1290   LearningRate 0.0001   Epoch: 19   Global Step: 802020   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:16,680-Speed 2632.35 samples/sec   Loss 1.1861   LearningRate 0.0001   Epoch: 19   Global Step: 802030   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:20,594-Speed 2617.61 samples/sec   Loss 1.0861   LearningRate 0.0001   Epoch: 19   Global Step: 802040   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:24,497-Speed 2624.46 samples/sec   Loss 1.1354   LearningRate 0.0001   Epoch: 19   Global Step: 802050   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:28,389-Speed 2631.32 samples/sec   Loss 1.1205   LearningRate 0.0001   Epoch: 19   Global Step: 802060   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:32,285-Speed 2628.79 samples/sec   Loss 1.2164   LearningRate 0.0001   Epoch: 19   Global Step: 802070   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:53:36,188-Speed 2625.29 samples/sec   Loss 1.1653   LearningRate 0.0001   Epoch: 19   Global Step: 802080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:53:40,079-Speed 2631.97 samples/sec   Loss 1.1848   LearningRate 0.0001   Epoch: 19   Global Step: 802090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:53:43,976-Speed 2628.38 samples/sec   Loss 1.1701   LearningRate 0.0001   Epoch: 19   Global Step: 802100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:53:47,879-Speed 2624.07 samples/sec   Loss 1.1515   LearningRate 0.0001   Epoch: 19   Global Step: 802110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:53:51,798-Speed 2613.67 samples/sec   Loss 1.1548   LearningRate 0.0001   Epoch: 19   Global Step: 802120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:53:55,702-Speed 2623.81 samples/sec   Loss 1.1189   LearningRate 0.0001   Epoch: 19   Global Step: 802130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:53:59,609-Speed 2621.19 samples/sec   Loss 1.1512   LearningRate 0.0001   Epoch: 19   Global Step: 802140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:03,541-Speed 2605.20 samples/sec   Loss 1.1839   LearningRate 0.0001   Epoch: 19   Global Step: 802150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:07,429-Speed 2634.49 samples/sec   Loss 1.1364   LearningRate 0.0001   Epoch: 19   Global Step: 802160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:11,319-Speed 2633.45 samples/sec   Loss 1.1960   LearningRate 0.0001   Epoch: 19   Global Step: 802170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:15,191-Speed 2645.11 samples/sec   Loss 1.1188   LearningRate 0.0001   Epoch: 19   Global Step: 802180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:19,084-Speed 2630.64 samples/sec   Loss 1.1308   LearningRate 0.0001   Epoch: 19   Global Step: 802190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:22,983-Speed 2627.34 samples/sec   Loss 1.1691   LearningRate 0.0001   Epoch: 19   Global Step: 802200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:26,884-Speed 2625.85 samples/sec   Loss 1.1332   LearningRate 0.0001   Epoch: 19   Global Step: 802210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:30,782-Speed 2627.95 samples/sec   Loss 1.1529   LearningRate 0.0001   Epoch: 19   Global Step: 802220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:34,671-Speed 2633.16 samples/sec   Loss 1.1539   LearningRate 0.0001   Epoch: 19   Global Step: 802230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:38,565-Speed 2630.33 samples/sec   Loss 1.1710   LearningRate 0.0001   Epoch: 19   Global Step: 802240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:42,460-Speed 2629.32 samples/sec   Loss 1.1460   LearningRate 0.0001   Epoch: 19   Global Step: 802250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:46,356-Speed 2629.52 samples/sec   Loss 1.1532   LearningRate 0.0001   Epoch: 19   Global Step: 802260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:50,249-Speed 2631.13 samples/sec   Loss 1.1658   LearningRate 0.0001   Epoch: 19   Global Step: 802270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:54:54,157-Speed 2620.88 samples/sec   Loss 1.1365   LearningRate 0.0001   Epoch: 19   Global Step: 802280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:54:58,063-Speed 2622.25 samples/sec   Loss 1.1075   LearningRate 0.0001   Epoch: 19   Global Step: 802290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:55:01,955-Speed 2631.09 samples/sec   Loss 1.0896   LearningRate 0.0001   Epoch: 19   Global Step: 802300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:55:05,865-Speed 2619.95 samples/sec   Loss 1.1765   LearningRate 0.0001   Epoch: 19   Global Step: 802310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:55:09,757-Speed 2631.63 samples/sec   Loss 1.1816   LearningRate 0.0001   Epoch: 19   Global Step: 802320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:55:13,665-Speed 2621.18 samples/sec   Loss 1.1307   LearningRate 0.0001   Epoch: 19   Global Step: 802330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:55:17,537-Speed 2645.27 samples/sec   Loss 1.1449   LearningRate 0.0001   Epoch: 19   Global Step: 802340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:21,431-Speed 2630.20 samples/sec   Loss 1.1477   LearningRate 0.0001   Epoch: 19   Global Step: 802350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:25,331-Speed 2626.47 samples/sec   Loss 1.1951   LearningRate 0.0001   Epoch: 19   Global Step: 802360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:29,225-Speed 2630.02 samples/sec   Loss 1.1740   LearningRate 0.0001   Epoch: 19   Global Step: 802370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:33,123-Speed 2628.11 samples/sec   Loss 1.1385   LearningRate 0.0001   Epoch: 19   Global Step: 802380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:37,021-Speed 2627.20 samples/sec   Loss 1.1838   LearningRate 0.0001   Epoch: 19   Global Step: 802390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:40,957-Speed 2602.26 samples/sec   Loss 1.1214   LearningRate 0.0001   Epoch: 19   Global Step: 802400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:44,854-Speed 2629.11 samples/sec   Loss 1.1192   LearningRate 0.0001   Epoch: 19   Global Step: 802410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:48,762-Speed 2620.47 samples/sec   Loss 1.1282   LearningRate 0.0001   Epoch: 19   Global Step: 802420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:52,669-Speed 2622.17 samples/sec   Loss 1.1872   LearningRate 0.0001   Epoch: 19   Global Step: 802430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:55:56,564-Speed 2629.21 samples/sec   Loss 1.1451   LearningRate 0.0001   Epoch: 19   Global Step: 802440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:56:00,489-Speed 2609.43 samples/sec   Loss 1.1426   LearningRate 0.0001   Epoch: 19   Global Step: 802450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:56:04,373-Speed 2637.28 samples/sec   Loss 1.0800   LearningRate 0.0001   Epoch: 19   Global Step: 802460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:56:08,267-Speed 2630.45 samples/sec   Loss 1.1662   LearningRate 0.0001   Epoch: 19   Global Step: 802470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:56:12,147-Speed 2639.44 samples/sec   Loss 1.1749   LearningRate 0.0001   Epoch: 19   Global Step: 802480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:16,037-Speed 2633.51 samples/sec   Loss 1.1213   LearningRate 0.0001   Epoch: 19   Global Step: 802490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:19,939-Speed 2624.84 samples/sec   Loss 1.1336   LearningRate 0.0001   Epoch: 19   Global Step: 802500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:23,845-Speed 2622.48 samples/sec   Loss 1.1418   LearningRate 0.0001   Epoch: 19   Global Step: 802510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:27,747-Speed 2625.55 samples/sec   Loss 1.1229   LearningRate 0.0001   Epoch: 19   Global Step: 802520   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:31,651-Speed 2623.59 samples/sec   Loss 1.2000   LearningRate 0.0001   Epoch: 19   Global Step: 802530   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:35,545-Speed 2629.98 samples/sec   Loss 1.1460   LearningRate 0.0001   Epoch: 19   Global Step: 802540   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:39,438-Speed 2631.43 samples/sec   Loss 1.1416   LearningRate 0.0001   Epoch: 19   Global Step: 802550   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:43,338-Speed 2625.99 samples/sec   Loss 1.1809   LearningRate 0.0001   Epoch: 19   Global Step: 802560   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:47,249-Speed 2618.75 samples/sec   Loss 1.1429   LearningRate 0.0001   Epoch: 19   Global Step: 802570   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 13:56:51,226-Speed 2575.46 samples/sec   Loss 1.1489   LearningRate 0.0001   Epoch: 19   Global Step: 802580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:56:55,146-Speed 2612.80 samples/sec   Loss 1.1695   LearningRate 0.0001   Epoch: 19   Global Step: 802590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:56:59,056-Speed 2620.25 samples/sec   Loss 1.1284   LearningRate 0.0001   Epoch: 19   Global Step: 802600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:02,957-Speed 2625.35 samples/sec   Loss 1.1762   LearningRate 0.0001   Epoch: 19   Global Step: 802610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:06,885-Speed 2607.34 samples/sec   Loss 1.1710   LearningRate 0.0001   Epoch: 19   Global Step: 802620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:10,779-Speed 2630.46 samples/sec   Loss 1.1676   LearningRate 0.0001   Epoch: 19   Global Step: 802630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:14,679-Speed 2626.78 samples/sec   Loss 1.1481   LearningRate 0.0001   Epoch: 19   Global Step: 802640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:18,575-Speed 2628.58 samples/sec   Loss 1.1560   LearningRate 0.0001   Epoch: 19   Global Step: 802650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:22,486-Speed 2620.08 samples/sec   Loss 1.1771   LearningRate 0.0001   Epoch: 19   Global Step: 802660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:26,389-Speed 2623.89 samples/sec   Loss 1.1309   LearningRate 0.0001   Epoch: 19   Global Step: 802670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:30,309-Speed 2613.38 samples/sec   Loss 1.1186   LearningRate 0.0001   Epoch: 19   Global Step: 802680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:57:34,211-Speed 2625.17 samples/sec   Loss 1.1307   LearningRate 0.0001   Epoch: 19   Global Step: 802690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:57:38,107-Speed 2629.11 samples/sec   Loss 1.1425   LearningRate 0.0001   Epoch: 19   Global Step: 802700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:57:41,978-Speed 2645.75 samples/sec   Loss 1.1960   LearningRate 0.0001   Epoch: 19   Global Step: 802710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:45,907-Speed 2607.08 samples/sec   Loss 1.1034   LearningRate 0.0001   Epoch: 19   Global Step: 802720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:49,803-Speed 2629.27 samples/sec   Loss 1.1588   LearningRate 0.0001   Epoch: 19   Global Step: 802730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:53,703-Speed 2626.83 samples/sec   Loss 1.1430   LearningRate 0.0001   Epoch: 19   Global Step: 802740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:57:57,618-Speed 2615.95 samples/sec   Loss 1.1433   LearningRate 0.0001   Epoch: 19   Global Step: 802750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:01,516-Speed 2628.36 samples/sec   Loss 1.1062   LearningRate 0.0001   Epoch: 19   Global Step: 802760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:05,406-Speed 2632.87 samples/sec   Loss 1.1476   LearningRate 0.0001   Epoch: 19   Global Step: 802770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:09,390-Speed 2570.81 samples/sec   Loss 1.1691   LearningRate 0.0001   Epoch: 19   Global Step: 802780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:13,283-Speed 2630.80 samples/sec   Loss 1.1742   LearningRate 0.0001   Epoch: 19   Global Step: 802790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:17,192-Speed 2620.62 samples/sec   Loss 1.1354   LearningRate 0.0001   Epoch: 19   Global Step: 802800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:21,091-Speed 2627.01 samples/sec   Loss 1.1817   LearningRate 0.0001   Epoch: 19   Global Step: 802810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:58:24,958-Speed 2648.94 samples/sec   Loss 1.1694   LearningRate 0.0001   Epoch: 19   Global Step: 802820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:28,850-Speed 2632.32 samples/sec   Loss 1.1527   LearningRate 0.0001   Epoch: 19   Global Step: 802830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:32,741-Speed 2631.80 samples/sec   Loss 1.1030   LearningRate 0.0001   Epoch: 19   Global Step: 802840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:36,637-Speed 2632.08 samples/sec   Loss 1.1357   LearningRate 0.0001   Epoch: 19   Global Step: 802850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:40,531-Speed 2630.42 samples/sec   Loss 1.1922   LearningRate 0.0001   Epoch: 19   Global Step: 802860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:44,432-Speed 2625.79 samples/sec   Loss 1.1501   LearningRate 0.0001   Epoch: 19   Global Step: 802870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:48,339-Speed 2621.45 samples/sec   Loss 1.1452   LearningRate 0.0001   Epoch: 19   Global Step: 802880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:52,248-Speed 2619.98 samples/sec   Loss 1.1955   LearningRate 0.0001   Epoch: 19   Global Step: 802890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:58:56,163-Speed 2617.19 samples/sec   Loss 1.1348   LearningRate 0.0001   Epoch: 19   Global Step: 802900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:00,082-Speed 2613.50 samples/sec   Loss 1.1653   LearningRate 0.0001   Epoch: 19   Global Step: 802910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:03,977-Speed 2629.35 samples/sec   Loss 1.1537   LearningRate 0.0001   Epoch: 19   Global Step: 802920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:59:07,877-Speed 2626.43 samples/sec   Loss 1.1395   LearningRate 0.0001   Epoch: 19   Global Step: 802930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 13:59:11,752-Speed 2642.92 samples/sec   Loss 1.1694   LearningRate 0.0001   Epoch: 19   Global Step: 802940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:15,657-Speed 2622.63 samples/sec   Loss 1.1436   LearningRate 0.0001   Epoch: 19   Global Step: 802950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:19,574-Speed 2615.33 samples/sec   Loss 1.1333   LearningRate 0.0001   Epoch: 19   Global Step: 802960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:23,467-Speed 2631.37 samples/sec   Loss 1.1675   LearningRate 0.0001   Epoch: 19   Global Step: 802970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:27,360-Speed 2631.15 samples/sec   Loss 1.0951   LearningRate 0.0001   Epoch: 19   Global Step: 802980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:31,253-Speed 2630.76 samples/sec   Loss 1.1852   LearningRate 0.0001   Epoch: 19   Global Step: 802990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:35,155-Speed 2625.13 samples/sec   Loss 1.1166   LearningRate 0.0001   Epoch: 19   Global Step: 803000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:39,048-Speed 2630.94 samples/sec   Loss 1.1683   LearningRate 0.0001   Epoch: 19   Global Step: 803010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:42,945-Speed 2628.32 samples/sec   Loss 1.1498   LearningRate 0.0001   Epoch: 19   Global Step: 803020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:46,885-Speed 2599.30 samples/sec   Loss 1.1033   LearningRate 0.0001   Epoch: 19   Global Step: 803030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:50,757-Speed 2645.68 samples/sec   Loss 1.2020   LearningRate 0.0001   Epoch: 19   Global Step: 803040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:54,663-Speed 2622.20 samples/sec   Loss 1.1503   LearningRate 0.0001   Epoch: 19   Global Step: 803050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 13:59:58,554-Speed 2632.68 samples/sec   Loss 1.1388   LearningRate 0.0001   Epoch: 19   Global Step: 803060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:02,470-Speed 2615.32 samples/sec   Loss 1.1694   LearningRate 0.0001   Epoch: 19   Global Step: 803070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:06,407-Speed 2602.30 samples/sec   Loss 1.1059   LearningRate 0.0001   Epoch: 19   Global Step: 803080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:10,299-Speed 2631.20 samples/sec   Loss 1.1439   LearningRate 0.0001   Epoch: 19   Global Step: 803090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:14,194-Speed 2629.92 samples/sec   Loss 1.1190   LearningRate 0.0001   Epoch: 19   Global Step: 803100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:18,117-Speed 2610.80 samples/sec   Loss 1.1452   LearningRate 0.0001   Epoch: 19   Global Step: 803110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:22,006-Speed 2634.08 samples/sec   Loss 1.1498   LearningRate 0.0001   Epoch: 19   Global Step: 803120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:25,899-Speed 2631.77 samples/sec   Loss 1.1865   LearningRate 0.0001   Epoch: 19   Global Step: 803130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:29,826-Speed 2607.50 samples/sec   Loss 1.1321   LearningRate 0.0001   Epoch: 19   Global Step: 803140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:00:33,699-Speed 2644.69 samples/sec   Loss 1.0642   LearningRate 0.0001   Epoch: 19   Global Step: 803150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:37,595-Speed 2629.33 samples/sec   Loss 1.1370   LearningRate 0.0001   Epoch: 19   Global Step: 803160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:41,491-Speed 2629.04 samples/sec   Loss 1.1228   LearningRate 0.0001   Epoch: 19   Global Step: 803170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:45,392-Speed 2625.85 samples/sec   Loss 1.1217   LearningRate 0.0001   Epoch: 19   Global Step: 803180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:49,297-Speed 2623.11 samples/sec   Loss 1.1501   LearningRate 0.0001   Epoch: 19   Global Step: 803190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:53,201-Speed 2623.46 samples/sec   Loss 1.1130   LearningRate 0.0001   Epoch: 19   Global Step: 803200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:00:57,115-Speed 2617.36 samples/sec   Loss 1.1458   LearningRate 0.0001   Epoch: 19   Global Step: 803210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:01:01,017-Speed 2624.69 samples/sec   Loss 1.1064   LearningRate 0.0001   Epoch: 19   Global Step: 803220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:01:05,142-Speed 2483.37 samples/sec   Loss 1.1075   LearningRate 0.0001   Epoch: 19   Global Step: 803230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:01:09,172-Speed 2541.65 samples/sec   Loss 1.1079   LearningRate 0.0001   Epoch: 19   Global Step: 803240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:01:13,081-Speed 2620.11 samples/sec   Loss 1.1867   LearningRate 0.0001   Epoch: 19   Global Step: 803250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:01:16,976-Speed 2629.65 samples/sec   Loss 1.1127   LearningRate 0.0001   Epoch: 19   Global Step: 803260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:01:20,851-Speed 2643.67 samples/sec   Loss 1.1547   LearningRate 0.0001   Epoch: 19   Global Step: 803270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:01:24,751-Speed 2625.75 samples/sec   Loss 1.1222   LearningRate 0.0001   Epoch: 19   Global Step: 803280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:01:28,631-Speed 2640.33 samples/sec   Loss 1.2096   LearningRate 0.0001   Epoch: 19   Global Step: 803290   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:32,532-Speed 2625.51 samples/sec   Loss 1.1464   LearningRate 0.0001   Epoch: 19   Global Step: 803300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:36,426-Speed 2630.00 samples/sec   Loss 1.1398   LearningRate 0.0001   Epoch: 19   Global Step: 803310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:40,318-Speed 2631.87 samples/sec   Loss 1.1461   LearningRate 0.0001   Epoch: 19   Global Step: 803320   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:44,218-Speed 2626.48 samples/sec   Loss 1.1265   LearningRate 0.0001   Epoch: 19   Global Step: 803330   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:48,114-Speed 2628.86 samples/sec   Loss 1.1660   LearningRate 0.0001   Epoch: 19   Global Step: 803340   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:52,012-Speed 2627.58 samples/sec   Loss 1.1400   LearningRate 0.0001   Epoch: 19   Global Step: 803350   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:55,907-Speed 2629.73 samples/sec   Loss 1.1868   LearningRate 0.0001   Epoch: 19   Global Step: 803360   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:01:59,798-Speed 2632.71 samples/sec   Loss 1.1489   LearningRate 0.0001   Epoch: 19   Global Step: 803370   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:02:03,692-Speed 2630.45 samples/sec   Loss 1.0932   LearningRate 0.0001   Epoch: 19   Global Step: 803380   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:02:07,581-Speed 2633.31 samples/sec   Loss 1.1413   LearningRate 0.0001   Epoch: 19   Global Step: 803390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:11,484-Speed 2624.06 samples/sec   Loss 1.1517   LearningRate 0.0001   Epoch: 19   Global Step: 803400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:15,409-Speed 2610.18 samples/sec   Loss 1.1370   LearningRate 0.0001   Epoch: 19   Global Step: 803410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:19,313-Speed 2623.18 samples/sec   Loss 1.1763   LearningRate 0.0001   Epoch: 19   Global Step: 803420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:23,241-Speed 2607.93 samples/sec   Loss 1.1190   LearningRate 0.0001   Epoch: 19   Global Step: 803430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:27,141-Speed 2626.35 samples/sec   Loss 1.1313   LearningRate 0.0001   Epoch: 19   Global Step: 803440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:31,047-Speed 2622.60 samples/sec   Loss 1.1713   LearningRate 0.0001   Epoch: 19   Global Step: 803450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:34,946-Speed 2626.88 samples/sec   Loss 1.1427   LearningRate 0.0001   Epoch: 19   Global Step: 803460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:38,843-Speed 2628.46 samples/sec   Loss 1.1170   LearningRate 0.0001   Epoch: 19   Global Step: 803470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:42,743-Speed 2625.51 samples/sec   Loss 1.1764   LearningRate 0.0001   Epoch: 19   Global Step: 803480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:46,669-Speed 2609.10 samples/sec   Loss 1.1735   LearningRate 0.0001   Epoch: 19   Global Step: 803490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:02:50,610-Speed 2599.49 samples/sec   Loss 1.1550   LearningRate 0.0001   Epoch: 19   Global Step: 803500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:54,509-Speed 2627.26 samples/sec   Loss 1.1434   LearningRate 0.0001   Epoch: 19   Global Step: 803510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:02:58,414-Speed 2622.54 samples/sec   Loss 1.1399   LearningRate 0.0001   Epoch: 19   Global Step: 803520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:02,327-Speed 2617.74 samples/sec   Loss 1.1989   LearningRate 0.0001   Epoch: 19   Global Step: 803530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:06,244-Speed 2615.06 samples/sec   Loss 1.1731   LearningRate 0.0001   Epoch: 19   Global Step: 803540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:10,153-Speed 2620.16 samples/sec   Loss 1.1660   LearningRate 0.0001   Epoch: 19   Global Step: 803550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:14,057-Speed 2623.31 samples/sec   Loss 1.1696   LearningRate 0.0001   Epoch: 19   Global Step: 803560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:17,969-Speed 2618.04 samples/sec   Loss 1.1748   LearningRate 0.0001   Epoch: 19   Global Step: 803570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:21,881-Speed 2618.64 samples/sec   Loss 1.1070   LearningRate 0.0001   Epoch: 19   Global Step: 803580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:25,793-Speed 2617.84 samples/sec   Loss 1.1477   LearningRate 0.0001   Epoch: 19   Global Step: 803590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:29,699-Speed 2622.53 samples/sec   Loss 1.1114   LearningRate 0.0001   Epoch: 19   Global Step: 803600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:03:33,606-Speed 2621.76 samples/sec   Loss 1.1045   LearningRate 0.0001   Epoch: 19   Global Step: 803610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:03:37,513-Speed 2621.50 samples/sec   Loss 1.1508   LearningRate 0.0001   Epoch: 19   Global Step: 803620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:03:41,398-Speed 2636.01 samples/sec   Loss 1.1038   LearningRate 0.0001   Epoch: 19   Global Step: 803630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:45,297-Speed 2626.93 samples/sec   Loss 1.1564   LearningRate 0.0001   Epoch: 19   Global Step: 803640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:49,195-Speed 2627.94 samples/sec   Loss 1.1885   LearningRate 0.0001   Epoch: 19   Global Step: 803650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:53,085-Speed 2633.28 samples/sec   Loss 1.1347   LearningRate 0.0001   Epoch: 19   Global Step: 803660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:03:56,978-Speed 2630.64 samples/sec   Loss 1.1526   LearningRate 0.0001   Epoch: 19   Global Step: 803670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:00,892-Speed 2617.70 samples/sec   Loss 1.1605   LearningRate 0.0001   Epoch: 19   Global Step: 803680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:04,784-Speed 2631.29 samples/sec   Loss 1.1394   LearningRate 0.0001   Epoch: 19   Global Step: 803690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:08,674-Speed 2632.69 samples/sec   Loss 1.1372   LearningRate 0.0001   Epoch: 19   Global Step: 803700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:12,582-Speed 2621.11 samples/sec   Loss 1.1393   LearningRate 0.0001   Epoch: 19   Global Step: 803710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:16,476-Speed 2630.42 samples/sec   Loss 1.1840   LearningRate 0.0001   Epoch: 19   Global Step: 803720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:20,371-Speed 2629.57 samples/sec   Loss 1.1209   LearningRate 0.0001   Epoch: 19   Global Step: 803730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:04:24,276-Speed 2622.30 samples/sec   Loss 1.1623   LearningRate 0.0001   Epoch: 19   Global Step: 803740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:04:28,166-Speed 2633.80 samples/sec   Loss 1.1019   LearningRate 0.0001   Epoch: 19   Global Step: 803750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:32,068-Speed 2624.88 samples/sec   Loss 1.1420   LearningRate 0.0001   Epoch: 19   Global Step: 803760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:35,972-Speed 2623.83 samples/sec   Loss 1.0767   LearningRate 0.0001   Epoch: 19   Global Step: 803770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:39,871-Speed 2626.44 samples/sec   Loss 1.1684   LearningRate 0.0001   Epoch: 19   Global Step: 803780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:43,776-Speed 2622.89 samples/sec   Loss 1.1121   LearningRate 0.0001   Epoch: 19   Global Step: 803790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:47,692-Speed 2615.62 samples/sec   Loss 1.2220   LearningRate 0.0001   Epoch: 19   Global Step: 803800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:51,602-Speed 2620.19 samples/sec   Loss 1.1787   LearningRate 0.0001   Epoch: 19   Global Step: 803810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:55,508-Speed 2621.93 samples/sec   Loss 1.1709   LearningRate 0.0001   Epoch: 19   Global Step: 803820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:04:59,420-Speed 2618.23 samples/sec   Loss 1.1074   LearningRate 0.0001   Epoch: 19   Global Step: 803830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:05:03,318-Speed 2628.10 samples/sec   Loss 1.1552   LearningRate 0.0001   Epoch: 19   Global Step: 803840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:05:07,215-Speed 2628.28 samples/sec   Loss 1.1484   LearningRate 0.0001   Epoch: 19   Global Step: 803850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:05:11,093-Speed 2641.54 samples/sec   Loss 1.1960   LearningRate 0.0001   Epoch: 19   Global Step: 803860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:05:14,988-Speed 2629.56 samples/sec   Loss 1.0977   LearningRate 0.0001   Epoch: 19   Global Step: 803870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:05:18,864-Speed 2642.38 samples/sec   Loss 1.1466   LearningRate 0.0001   Epoch: 19   Global Step: 803880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:22,762-Speed 2627.25 samples/sec   Loss 1.1361   LearningRate 0.0001   Epoch: 19   Global Step: 803890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:26,662-Speed 2627.46 samples/sec   Loss 1.1600   LearningRate 0.0001   Epoch: 19   Global Step: 803900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:30,561-Speed 2626.72 samples/sec   Loss 1.1729   LearningRate 0.0001   Epoch: 19   Global Step: 803910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:34,462-Speed 2626.09 samples/sec   Loss 1.1821   LearningRate 0.0001   Epoch: 19   Global Step: 803920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:38,384-Speed 2611.40 samples/sec   Loss 1.1448   LearningRate 0.0001   Epoch: 19   Global Step: 803930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:42,287-Speed 2623.78 samples/sec   Loss 1.1319   LearningRate 0.0001   Epoch: 19   Global Step: 803940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:46,194-Speed 2622.08 samples/sec   Loss 1.1820   LearningRate 0.0001   Epoch: 19   Global Step: 803950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:50,113-Speed 2613.46 samples/sec   Loss 1.1529   LearningRate 0.0001   Epoch: 19   Global Step: 803960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:54,018-Speed 2623.01 samples/sec   Loss 1.1039   LearningRate 0.0001   Epoch: 19   Global Step: 803970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:05:57,925-Speed 2622.25 samples/sec   Loss 1.1326   LearningRate 0.0001   Epoch: 19   Global Step: 803980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:01,833-Speed 2620.55 samples/sec   Loss 1.1192   LearningRate 0.0001   Epoch: 19   Global Step: 803990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:05,729-Speed 2629.80 samples/sec   Loss 1.0866   LearningRate 0.0001   Epoch: 19   Global Step: 804000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:09,638-Speed 2619.83 samples/sec   Loss 1.0948   LearningRate 0.0001   Epoch: 19   Global Step: 804010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:13,531-Speed 2630.73 samples/sec   Loss 1.1542   LearningRate 0.0001   Epoch: 19   Global Step: 804020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:17,426-Speed 2629.60 samples/sec   Loss 1.0952   LearningRate 0.0001   Epoch: 19   Global Step: 804030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:21,321-Speed 2630.00 samples/sec   Loss 1.1558   LearningRate 0.0001   Epoch: 19   Global Step: 804040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:25,217-Speed 2628.98 samples/sec   Loss 1.1591   LearningRate 0.0001   Epoch: 19   Global Step: 804050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:29,112-Speed 2629.52 samples/sec   Loss 1.1352   LearningRate 0.0001   Epoch: 19   Global Step: 804060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:33,025-Speed 2617.36 samples/sec   Loss 1.1219   LearningRate 0.0001   Epoch: 19   Global Step: 804070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:36,936-Speed 2619.20 samples/sec   Loss 1.1006   LearningRate 0.0001   Epoch: 19   Global Step: 804080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:06:40,805-Speed 2647.78 samples/sec   Loss 1.1313   LearningRate 0.0001   Epoch: 19   Global Step: 804090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:44,697-Speed 2631.38 samples/sec   Loss 1.1096   LearningRate 0.0001   Epoch: 19   Global Step: 804100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:48,652-Speed 2589.94 samples/sec   Loss 1.1345   LearningRate 0.0001   Epoch: 19   Global Step: 804110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:52,560-Speed 2621.05 samples/sec   Loss 1.1899   LearningRate 0.0001   Epoch: 19   Global Step: 804120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:06:56,466-Speed 2622.57 samples/sec   Loss 1.1546   LearningRate 0.0001   Epoch: 19   Global Step: 804130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:00,376-Speed 2619.12 samples/sec   Loss 1.1418   LearningRate 0.0001   Epoch: 19   Global Step: 804140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:04,276-Speed 2626.65 samples/sec   Loss 1.1243   LearningRate 0.0001   Epoch: 19   Global Step: 804150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:08,167-Speed 2631.89 samples/sec   Loss 1.1975   LearningRate 0.0001   Epoch: 19   Global Step: 804160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:12,060-Speed 2631.60 samples/sec   Loss 1.1294   LearningRate 0.0001   Epoch: 19   Global Step: 804170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:15,976-Speed 2615.15 samples/sec   Loss 1.1411   LearningRate 0.0001   Epoch: 19   Global Step: 804180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:19,848-Speed 2645.32 samples/sec   Loss 1.1062   LearningRate 0.0001   Epoch: 19   Global Step: 804190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:23,738-Speed 2633.70 samples/sec   Loss 1.1847   LearningRate 0.0001   Epoch: 19   Global Step: 804200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:27,651-Speed 2617.18 samples/sec   Loss 1.2044   LearningRate 0.0001   Epoch: 19   Global Step: 804210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:31,547-Speed 2629.32 samples/sec   Loss 1.1759   LearningRate 0.0001   Epoch: 19   Global Step: 804220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:35,450-Speed 2624.17 samples/sec   Loss 1.1613   LearningRate 0.0001   Epoch: 19   Global Step: 804230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:39,340-Speed 2632.97 samples/sec   Loss 1.1322   LearningRate 0.0001   Epoch: 19   Global Step: 804240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:43,251-Speed 2618.72 samples/sec   Loss 1.1509   LearningRate 0.0001   Epoch: 19   Global Step: 804250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:47,141-Speed 2633.01 samples/sec   Loss 1.1458   LearningRate 0.0001   Epoch: 19   Global Step: 804260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:51,035-Speed 2630.70 samples/sec   Loss 1.1718   LearningRate 0.0001   Epoch: 19   Global Step: 804270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:54,930-Speed 2629.73 samples/sec   Loss 1.1196   LearningRate 0.0001   Epoch: 19   Global Step: 804280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:07:58,850-Speed 2612.68 samples/sec   Loss 1.1775   LearningRate 0.0001   Epoch: 19   Global Step: 804290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:08:02,828-Speed 2575.14 samples/sec   Loss 1.1247   LearningRate 0.0001   Epoch: 19   Global Step: 804300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:08:06,917-Speed 2505.11 samples/sec   Loss 1.1602   LearningRate 0.0001   Epoch: 19   Global Step: 804310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:08:10,875-Speed 2587.38 samples/sec   Loss 1.1546   LearningRate 0.0001   Epoch: 19   Global Step: 804320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:08:14,744-Speed 2648.05 samples/sec   Loss 1.1597   LearningRate 0.0001   Epoch: 19   Global Step: 804330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:18,646-Speed 2624.56 samples/sec   Loss 1.1599   LearningRate 0.0001   Epoch: 19   Global Step: 804340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:22,544-Speed 2627.97 samples/sec   Loss 1.0832   LearningRate 0.0001   Epoch: 19   Global Step: 804350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:26,442-Speed 2627.69 samples/sec   Loss 1.1178   LearningRate 0.0001   Epoch: 19   Global Step: 804360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:30,394-Speed 2592.31 samples/sec   Loss 1.1870   LearningRate 0.0001   Epoch: 19   Global Step: 804370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:34,339-Speed 2595.95 samples/sec   Loss 1.1385   LearningRate 0.0001   Epoch: 19   Global Step: 804380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:38,232-Speed 2631.45 samples/sec   Loss 1.1215   LearningRate 0.0001   Epoch: 19   Global Step: 804390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:42,119-Speed 2634.70 samples/sec   Loss 1.1539   LearningRate 0.0001   Epoch: 19   Global Step: 804400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:46,010-Speed 2632.67 samples/sec   Loss 1.1308   LearningRate 0.0001   Epoch: 19   Global Step: 804410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:49,903-Speed 2631.37 samples/sec   Loss 1.1483   LearningRate 0.0001   Epoch: 19   Global Step: 804420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:08:53,797-Speed 2630.56 samples/sec   Loss 1.1985   LearningRate 0.0001   Epoch: 19   Global Step: 804430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:08:57,673-Speed 2642.15 samples/sec   Loss 1.1690   LearningRate 0.0001   Epoch: 19   Global Step: 804440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:01,576-Speed 2624.27 samples/sec   Loss 1.1373   LearningRate 0.0001   Epoch: 19   Global Step: 804450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:05,479-Speed 2624.16 samples/sec   Loss 1.1678   LearningRate 0.0001   Epoch: 19   Global Step: 804460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:09,389-Speed 2620.12 samples/sec   Loss 1.1809   LearningRate 0.0001   Epoch: 19   Global Step: 804470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:13,285-Speed 2628.86 samples/sec   Loss 1.1416   LearningRate 0.0001   Epoch: 19   Global Step: 804480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:17,180-Speed 2629.95 samples/sec   Loss 1.1797   LearningRate 0.0001   Epoch: 19   Global Step: 804490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:21,075-Speed 2629.36 samples/sec   Loss 1.1613   LearningRate 0.0001   Epoch: 19   Global Step: 804500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:24,967-Speed 2631.82 samples/sec   Loss 1.1540   LearningRate 0.0001   Epoch: 19   Global Step: 804510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:28,885-Speed 2614.15 samples/sec   Loss 1.1390   LearningRate 0.0001   Epoch: 19   Global Step: 804520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:32,780-Speed 2630.32 samples/sec   Loss 1.0996   LearningRate 0.0001   Epoch: 19   Global Step: 804530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:09:36,673-Speed 2630.47 samples/sec   Loss 1.1121   LearningRate 0.0001   Epoch: 19   Global Step: 804540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:09:40,568-Speed 2629.94 samples/sec   Loss 1.1431   LearningRate 0.0001   Epoch: 19   Global Step: 804550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:09:44,479-Speed 2619.08 samples/sec   Loss 1.1518   LearningRate 0.0001   Epoch: 19   Global Step: 804560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:09:48,383-Speed 2623.35 samples/sec   Loss 1.1788   LearningRate 0.0001   Epoch: 19   Global Step: 804570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:09:52,280-Speed 2628.28 samples/sec   Loss 1.1602   LearningRate 0.0001   Epoch: 19   Global Step: 804580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:09:56,148-Speed 2648.17 samples/sec   Loss 1.1740   LearningRate 0.0001   Epoch: 19   Global Step: 804590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:10:00,042-Speed 2630.97 samples/sec   Loss 1.1857   LearningRate 0.0001   Epoch: 19   Global Step: 804600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:03,957-Speed 2615.80 samples/sec   Loss 1.1059   LearningRate 0.0001   Epoch: 19   Global Step: 804610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:07,863-Speed 2621.96 samples/sec   Loss 1.1345   LearningRate 0.0001   Epoch: 19   Global Step: 804620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:11,758-Speed 2629.98 samples/sec   Loss 1.1461   LearningRate 0.0001   Epoch: 19   Global Step: 804630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:15,657-Speed 2627.50 samples/sec   Loss 1.1423   LearningRate 0.0001   Epoch: 19   Global Step: 804640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:19,640-Speed 2571.38 samples/sec   Loss 1.1528   LearningRate 0.0001   Epoch: 19   Global Step: 804650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:23,528-Speed 2634.44 samples/sec   Loss 1.0932   LearningRate 0.0001   Epoch: 19   Global Step: 804660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:27,434-Speed 2622.19 samples/sec   Loss 1.1796   LearningRate 0.0001   Epoch: 19   Global Step: 804670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:31,328-Speed 2629.90 samples/sec   Loss 1.1002   LearningRate 0.0001   Epoch: 19   Global Step: 804680   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:35,221-Speed 2631.41 samples/sec   Loss 1.1240   LearningRate 0.0001   Epoch: 19   Global Step: 804690   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:10:39,115-Speed 2630.47 samples/sec   Loss 1.1545   LearningRate 0.0001   Epoch: 19   Global Step: 804700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:10:43,005-Speed 2632.83 samples/sec   Loss 1.1504   LearningRate 0.0001   Epoch: 19   Global Step: 804710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:10:46,906-Speed 2625.71 samples/sec   Loss 1.0684   LearningRate 0.0001   Epoch: 19   Global Step: 804720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:10:50,800-Speed 2630.46 samples/sec   Loss 1.1352   LearningRate 0.0001   Epoch: 19   Global Step: 804730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:10:54,693-Speed 2630.82 samples/sec   Loss 1.1026   LearningRate 0.0001   Epoch: 19   Global Step: 804740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:10:58,597-Speed 2625.33 samples/sec   Loss 1.0941   LearningRate 0.0001   Epoch: 19   Global Step: 804750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:11:02,544-Speed 2594.52 samples/sec   Loss 1.1062   LearningRate 0.0001   Epoch: 19   Global Step: 804760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:11:06,439-Speed 2630.04 samples/sec   Loss 1.1432   LearningRate 0.0001   Epoch: 19   Global Step: 804770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:11:10,328-Speed 2633.99 samples/sec   Loss 1.1481   LearningRate 0.0001   Epoch: 19   Global Step: 804780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:11:14,229-Speed 2625.13 samples/sec   Loss 1.1605   LearningRate 0.0001   Epoch: 19   Global Step: 804790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:11:18,128-Speed 2626.96 samples/sec   Loss 1.1529   LearningRate 0.0001   Epoch: 19   Global Step: 804800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:22,022-Speed 2630.92 samples/sec   Loss 1.1305   LearningRate 0.0001   Epoch: 19   Global Step: 804810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:25,918-Speed 2628.86 samples/sec   Loss 1.1758   LearningRate 0.0001   Epoch: 19   Global Step: 804820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:29,820-Speed 2625.19 samples/sec   Loss 1.0938   LearningRate 0.0001   Epoch: 19   Global Step: 804830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:33,711-Speed 2632.14 samples/sec   Loss 1.1565   LearningRate 0.0001   Epoch: 19   Global Step: 804840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:37,607-Speed 2629.64 samples/sec   Loss 1.1701   LearningRate 0.0001   Epoch: 19   Global Step: 804850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:41,500-Speed 2631.03 samples/sec   Loss 1.1331   LearningRate 0.0001   Epoch: 19   Global Step: 804860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:45,394-Speed 2629.59 samples/sec   Loss 1.1064   LearningRate 0.0001   Epoch: 19   Global Step: 804870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:49,295-Speed 2625.92 samples/sec   Loss 1.0920   LearningRate 0.0001   Epoch: 19   Global Step: 804880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:11:53,190-Speed 2629.81 samples/sec   Loss 1.1174   LearningRate 0.0001   Epoch: 19   Global Step: 804890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:11:57,092-Speed 2625.01 samples/sec   Loss 1.0769   LearningRate 0.0001   Epoch: 19   Global Step: 804900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:01,061-Speed 2580.81 samples/sec   Loss 1.1767   LearningRate 0.0001   Epoch: 19   Global Step: 804910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:04,980-Speed 2614.24 samples/sec   Loss 1.1232   LearningRate 0.0001   Epoch: 19   Global Step: 804920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:08,887-Speed 2621.82 samples/sec   Loss 1.1564   LearningRate 0.0001   Epoch: 19   Global Step: 804930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:12,782-Speed 2629.43 samples/sec   Loss 1.1396   LearningRate 0.0001   Epoch: 19   Global Step: 804940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:16,678-Speed 2628.91 samples/sec   Loss 1.1923   LearningRate 0.0001   Epoch: 19   Global Step: 804950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:20,578-Speed 2625.95 samples/sec   Loss 1.1207   LearningRate 0.0001   Epoch: 19   Global Step: 804960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:24,480-Speed 2624.98 samples/sec   Loss 1.1434   LearningRate 0.0001   Epoch: 19   Global Step: 804970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:28,378-Speed 2628.01 samples/sec   Loss 1.1320   LearningRate 0.0001   Epoch: 19   Global Step: 804980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:32,263-Speed 2637.27 samples/sec   Loss 1.1141   LearningRate 0.0001   Epoch: 19   Global Step: 804990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:36,157-Speed 2630.00 samples/sec   Loss 1.1370   LearningRate 0.0001   Epoch: 19   Global Step: 805000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:40,050-Speed 2630.86 samples/sec   Loss 1.1386   LearningRate 0.0001   Epoch: 19   Global Step: 805010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:43,942-Speed 2631.49 samples/sec   Loss 1.1247   LearningRate 0.0001   Epoch: 19   Global Step: 805020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:47,841-Speed 2627.30 samples/sec   Loss 1.1778   LearningRate 0.0001   Epoch: 19   Global Step: 805030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:51,733-Speed 2631.47 samples/sec   Loss 1.1285   LearningRate 0.0001   Epoch: 19   Global Step: 805040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:55,661-Speed 2608.06 samples/sec   Loss 1.1625   LearningRate 0.0001   Epoch: 19   Global Step: 805050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:12:59,560-Speed 2626.92 samples/sec   Loss 1.1842   LearningRate 0.0001   Epoch: 19   Global Step: 805060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:03,454-Speed 2629.90 samples/sec   Loss 1.1599   LearningRate 0.0001   Epoch: 19   Global Step: 805070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:07,353-Speed 2627.28 samples/sec   Loss 1.1301   LearningRate 0.0001   Epoch: 19   Global Step: 805080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:11,225-Speed 2645.42 samples/sec   Loss 1.1607   LearningRate 0.0001   Epoch: 19   Global Step: 805090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:15,162-Speed 2601.26 samples/sec   Loss 1.1489   LearningRate 0.0001   Epoch: 19   Global Step: 805100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:19,068-Speed 2622.18 samples/sec   Loss 1.1159   LearningRate 0.0001   Epoch: 19   Global Step: 805110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:22,981-Speed 2618.20 samples/sec   Loss 1.1405   LearningRate 0.0001   Epoch: 19   Global Step: 805120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:26,884-Speed 2624.19 samples/sec   Loss 1.1216   LearningRate 0.0001   Epoch: 19   Global Step: 805130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:30,781-Speed 2628.72 samples/sec   Loss 1.1260   LearningRate 0.0001   Epoch: 19   Global Step: 805140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:34,676-Speed 2629.60 samples/sec   Loss 1.1262   LearningRate 0.0001   Epoch: 19   Global Step: 805150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:38,580-Speed 2623.01 samples/sec   Loss 1.0980   LearningRate 0.0001   Epoch: 19   Global Step: 805160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:13:42,458-Speed 2640.98 samples/sec   Loss 1.1450   LearningRate 0.0001   Epoch: 19   Global Step: 805170   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:13:46,353-Speed 2630.23 samples/sec   Loss 1.1519   LearningRate 0.0001   Epoch: 19   Global Step: 805180   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:13:50,242-Speed 2632.91 samples/sec   Loss 1.1458   LearningRate 0.0001   Epoch: 19   Global Step: 805190   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:13:54,147-Speed 2623.68 samples/sec   Loss 1.1424   LearningRate 0.0001   Epoch: 19   Global Step: 805200   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:13:58,051-Speed 2623.62 samples/sec   Loss 1.1812   LearningRate 0.0001   Epoch: 19   Global Step: 805210   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:14:01,973-Speed 2612.34 samples/sec   Loss 1.1059   LearningRate 0.0001   Epoch: 19   Global Step: 805220   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:14:05,936-Speed 2584.76 samples/sec   Loss 1.1963   LearningRate 0.0001   Epoch: 19   Global Step: 805230   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:14:09,854-Speed 2613.58 samples/sec   Loss 1.1517   LearningRate 0.0001   Epoch: 19   Global Step: 805240   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:14:13,780-Speed 2609.31 samples/sec   Loss 1.1676   LearningRate 0.0001   Epoch: 19   Global Step: 805250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:14:17,705-Speed 2609.40 samples/sec   Loss 1.1417   LearningRate 0.0001   Epoch: 19   Global Step: 805260   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:14:21,602-Speed 2628.74 samples/sec   Loss 1.1563   LearningRate 0.0001   Epoch: 19   Global Step: 805270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:25,518-Speed 2616.35 samples/sec   Loss 1.1694   LearningRate 0.0001   Epoch: 19   Global Step: 805280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:29,414-Speed 2628.43 samples/sec   Loss 1.1320   LearningRate 0.0001   Epoch: 19   Global Step: 805290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:33,308-Speed 2630.38 samples/sec   Loss 1.1038   LearningRate 0.0001   Epoch: 19   Global Step: 805300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:37,205-Speed 2628.98 samples/sec   Loss 1.1216   LearningRate 0.0001   Epoch: 19   Global Step: 805310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:41,097-Speed 2631.34 samples/sec   Loss 1.1838   LearningRate 0.0001   Epoch: 19   Global Step: 805320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:44,995-Speed 2627.21 samples/sec   Loss 1.1253   LearningRate 0.0001   Epoch: 19   Global Step: 805330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:48,893-Speed 2627.87 samples/sec   Loss 1.1691   LearningRate 0.0001   Epoch: 19   Global Step: 805340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:52,789-Speed 2628.67 samples/sec   Loss 1.1448   LearningRate 0.0001   Epoch: 19   Global Step: 805350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:14:56,684-Speed 2629.63 samples/sec   Loss 1.1363   LearningRate 0.0001   Epoch: 19   Global Step: 805360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:00,560-Speed 2643.30 samples/sec   Loss 1.1372   LearningRate 0.0001   Epoch: 19   Global Step: 805370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:04,451-Speed 2632.00 samples/sec   Loss 1.1645   LearningRate 0.0001   Epoch: 19   Global Step: 805380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:08,343-Speed 2631.49 samples/sec   Loss 1.1216   LearningRate 0.0001   Epoch: 19   Global Step: 805390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:12,251-Speed 2620.64 samples/sec   Loss 1.1457   LearningRate 0.0001   Epoch: 19   Global Step: 805400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:16,147-Speed 2629.18 samples/sec   Loss 1.1574   LearningRate 0.0001   Epoch: 19   Global Step: 805410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:20,056-Speed 2620.92 samples/sec   Loss 1.1548   LearningRate 0.0001   Epoch: 19   Global Step: 805420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:23,963-Speed 2622.12 samples/sec   Loss 1.1239   LearningRate 0.0001   Epoch: 19   Global Step: 805430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:27,879-Speed 2615.10 samples/sec   Loss 1.1448   LearningRate 0.0001   Epoch: 19   Global Step: 805440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:31,781-Speed 2625.37 samples/sec   Loss 1.1471   LearningRate 0.0001   Epoch: 19   Global Step: 805450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:35,697-Speed 2615.88 samples/sec   Loss 1.1052   LearningRate 0.0001   Epoch: 19   Global Step: 805460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:39,618-Speed 2611.48 samples/sec   Loss 1.0943   LearningRate 0.0001   Epoch: 19   Global Step: 805470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:15:43,524-Speed 2622.17 samples/sec   Loss 1.1251   LearningRate 0.0001   Epoch: 19   Global Step: 805480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:15:47,422-Speed 2628.28 samples/sec   Loss 1.1655   LearningRate 0.0001   Epoch: 19   Global Step: 805490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:15:51,303-Speed 2639.27 samples/sec   Loss 1.1349   LearningRate 0.0001   Epoch: 19   Global Step: 805500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:55,250-Speed 2594.92 samples/sec   Loss 1.1589   LearningRate 0.0001   Epoch: 19   Global Step: 805510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:15:59,147-Speed 2629.09 samples/sec   Loss 1.1323   LearningRate 0.0001   Epoch: 19   Global Step: 805520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:03,042-Speed 2630.09 samples/sec   Loss 1.1047   LearningRate 0.0001   Epoch: 19   Global Step: 805530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:06,933-Speed 2632.08 samples/sec   Loss 1.1775   LearningRate 0.0001   Epoch: 19   Global Step: 805540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:10,828-Speed 2629.79 samples/sec   Loss 1.1296   LearningRate 0.0001   Epoch: 19   Global Step: 805550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:14,720-Speed 2631.62 samples/sec   Loss 1.1306   LearningRate 0.0001   Epoch: 19   Global Step: 805560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:18,657-Speed 2601.58 samples/sec   Loss 1.1265   LearningRate 0.0001   Epoch: 19   Global Step: 805570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:22,552-Speed 2630.46 samples/sec   Loss 1.0999   LearningRate 0.0001   Epoch: 19   Global Step: 805580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:26,447-Speed 2629.52 samples/sec   Loss 1.1704   LearningRate 0.0001   Epoch: 19   Global Step: 805590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:30,368-Speed 2612.63 samples/sec   Loss 1.1202   LearningRate 0.0001   Epoch: 19   Global Step: 805600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:16:34,273-Speed 2622.65 samples/sec   Loss 1.1437   LearningRate 0.0001   Epoch: 19   Global Step: 805610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:16:38,166-Speed 2631.54 samples/sec   Loss 1.1244   LearningRate 0.0001   Epoch: 19   Global Step: 805620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:16:42,039-Speed 2644.47 samples/sec   Loss 1.1280   LearningRate 0.0001   Epoch: 19   Global Step: 805630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:45,938-Speed 2626.97 samples/sec   Loss 1.1013   LearningRate 0.0001   Epoch: 19   Global Step: 805640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:49,840-Speed 2624.83 samples/sec   Loss 1.1690   LearningRate 0.0001   Epoch: 19   Global Step: 805650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:53,737-Speed 2628.62 samples/sec   Loss 1.1638   LearningRate 0.0001   Epoch: 19   Global Step: 805660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:16:57,637-Speed 2626.42 samples/sec   Loss 1.1714   LearningRate 0.0001   Epoch: 19   Global Step: 805670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:01,545-Speed 2620.86 samples/sec   Loss 1.2007   LearningRate 0.0001   Epoch: 19   Global Step: 805680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:05,452-Speed 2622.05 samples/sec   Loss 1.1288   LearningRate 0.0001   Epoch: 19   Global Step: 805690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:09,350-Speed 2627.19 samples/sec   Loss 1.1099   LearningRate 0.0001   Epoch: 19   Global Step: 805700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:13,266-Speed 2615.72 samples/sec   Loss 1.1082   LearningRate 0.0001   Epoch: 19   Global Step: 805710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:17,156-Speed 2632.76 samples/sec   Loss 1.1212   LearningRate 0.0001   Epoch: 19   Global Step: 805720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:21,047-Speed 2632.37 samples/sec   Loss 1.1468   LearningRate 0.0001   Epoch: 19   Global Step: 805730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:24,946-Speed 2626.97 samples/sec   Loss 1.1249   LearningRate 0.0001   Epoch: 19   Global Step: 805740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:28,838-Speed 2632.18 samples/sec   Loss 1.1555   LearningRate 0.0001   Epoch: 19   Global Step: 805750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:32,730-Speed 2631.84 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 19   Global Step: 805760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:36,623-Speed 2631.46 samples/sec   Loss 1.0852   LearningRate 0.0001   Epoch: 19   Global Step: 805770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:40,515-Speed 2632.10 samples/sec   Loss 1.1300   LearningRate 0.0001   Epoch: 19   Global Step: 805780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:44,404-Speed 2633.28 samples/sec   Loss 1.1228   LearningRate 0.0001   Epoch: 19   Global Step: 805790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:48,299-Speed 2629.68 samples/sec   Loss 1.1226   LearningRate 0.0001   Epoch: 19   Global Step: 805800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:52,211-Speed 2618.41 samples/sec   Loss 1.0650   LearningRate 0.0001   Epoch: 19   Global Step: 805810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:56,104-Speed 2631.59 samples/sec   Loss 1.1590   LearningRate 0.0001   Epoch: 19   Global Step: 805820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:17:59,998-Speed 2630.75 samples/sec   Loss 1.1700   LearningRate 0.0001   Epoch: 19   Global Step: 805830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:18:03,889-Speed 2632.62 samples/sec   Loss 1.1628   LearningRate 0.0001   Epoch: 19   Global Step: 805840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:07,780-Speed 2632.28 samples/sec   Loss 1.1720   LearningRate 0.0001   Epoch: 19   Global Step: 805850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:11,671-Speed 2632.83 samples/sec   Loss 1.1561   LearningRate 0.0001   Epoch: 19   Global Step: 805860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:15,572-Speed 2625.32 samples/sec   Loss 1.1102   LearningRate 0.0001   Epoch: 19   Global Step: 805870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:19,480-Speed 2620.28 samples/sec   Loss 1.1254   LearningRate 0.0001   Epoch: 19   Global Step: 805880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:23,370-Speed 2633.67 samples/sec   Loss 1.1346   LearningRate 0.0001   Epoch: 19   Global Step: 805890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:27,268-Speed 2627.90 samples/sec   Loss 1.1015   LearningRate 0.0001   Epoch: 19   Global Step: 805900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:31,160-Speed 2631.91 samples/sec   Loss 1.1345   LearningRate 0.0001   Epoch: 19   Global Step: 805910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:35,056-Speed 2628.61 samples/sec   Loss 1.1406   LearningRate 0.0001   Epoch: 19   Global Step: 805920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:38,951-Speed 2629.76 samples/sec   Loss 1.1645   LearningRate 0.0001   Epoch: 19   Global Step: 805930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:42,847-Speed 2629.69 samples/sec   Loss 1.1107   LearningRate 0.0001   Epoch: 19   Global Step: 805940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:18:46,749-Speed 2625.24 samples/sec   Loss 1.1521   LearningRate 0.0001   Epoch: 19   Global Step: 805950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:18:50,645-Speed 2628.91 samples/sec   Loss 1.1121   LearningRate 0.0001   Epoch: 19   Global Step: 805960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:18:54,552-Speed 2621.38 samples/sec   Loss 1.1627   LearningRate 0.0001   Epoch: 19   Global Step: 805970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:18:58,468-Speed 2615.48 samples/sec   Loss 1.1425   LearningRate 0.0001   Epoch: 19   Global Step: 805980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:02,364-Speed 2628.87 samples/sec   Loss 1.1642   LearningRate 0.0001   Epoch: 19   Global Step: 805990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:06,259-Speed 2630.28 samples/sec   Loss 1.0998   LearningRate 0.0001   Epoch: 19   Global Step: 806000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:10,153-Speed 2629.76 samples/sec   Loss 1.1502   LearningRate 0.0001   Epoch: 19   Global Step: 806010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:14,061-Speed 2621.37 samples/sec   Loss 1.1488   LearningRate 0.0001   Epoch: 19   Global Step: 806020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:18,048-Speed 2569.64 samples/sec   Loss 1.1472   LearningRate 0.0001   Epoch: 19   Global Step: 806030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:21,995-Speed 2594.99 samples/sec   Loss 1.0947   LearningRate 0.0001   Epoch: 19   Global Step: 806040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:25,891-Speed 2629.05 samples/sec   Loss 1.1418   LearningRate 0.0001   Epoch: 19   Global Step: 806050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:29,795-Speed 2623.26 samples/sec   Loss 1.0794   LearningRate 0.0001   Epoch: 19   Global Step: 806060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:33,691-Speed 2629.46 samples/sec   Loss 1.1393   LearningRate 0.0001   Epoch: 19   Global Step: 806070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:19:37,567-Speed 2642.39 samples/sec   Loss 1.1469   LearningRate 0.0001   Epoch: 19   Global Step: 806080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:41,466-Speed 2627.18 samples/sec   Loss 1.1429   LearningRate 0.0001   Epoch: 19   Global Step: 806090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:45,359-Speed 2630.39 samples/sec   Loss 1.1689   LearningRate 0.0001   Epoch: 19   Global Step: 806100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:49,252-Speed 2631.43 samples/sec   Loss 1.1135   LearningRate 0.0001   Epoch: 19   Global Step: 806110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:53,148-Speed 2628.68 samples/sec   Loss 1.1664   LearningRate 0.0001   Epoch: 19   Global Step: 806120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:19:57,041-Speed 2631.48 samples/sec   Loss 1.1125   LearningRate 0.0001   Epoch: 19   Global Step: 806130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:00,954-Speed 2617.39 samples/sec   Loss 1.1395   LearningRate 0.0001   Epoch: 19   Global Step: 806140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:04,846-Speed 2631.63 samples/sec   Loss 1.1357   LearningRate 0.0001   Epoch: 19   Global Step: 806150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:08,748-Speed 2624.75 samples/sec   Loss 1.1230   LearningRate 0.0001   Epoch: 19   Global Step: 806160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:12,661-Speed 2618.46 samples/sec   Loss 1.1274   LearningRate 0.0001   Epoch: 19   Global Step: 806170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:16,562-Speed 2625.40 samples/sec   Loss 1.1615   LearningRate 0.0001   Epoch: 19   Global Step: 806180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:20:20,481-Speed 2613.88 samples/sec   Loss 1.1108   LearningRate 0.0001   Epoch: 19   Global Step: 806190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:20:24,436-Speed 2589.71 samples/sec   Loss 1.1677   LearningRate 0.0001   Epoch: 19   Global Step: 806200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:20:28,305-Speed 2648.01 samples/sec   Loss 1.1514   LearningRate 0.0001   Epoch: 19   Global Step: 806210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:32,227-Speed 2611.28 samples/sec   Loss 1.1163   LearningRate 0.0001   Epoch: 19   Global Step: 806220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:36,121-Speed 2630.34 samples/sec   Loss 1.1937   LearningRate 0.0001   Epoch: 19   Global Step: 806230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:40,036-Speed 2616.53 samples/sec   Loss 1.1104   LearningRate 0.0001   Epoch: 19   Global Step: 806240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:20:43,909-Speed 2644.41 samples/sec   Loss 1.1300   LearningRate 0.0001   Epoch: 19   Global Step: 806250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:20:47,843-Speed 2605.10 samples/sec   Loss 1.1168   LearningRate 0.0001   Epoch: 19   Global Step: 806260   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:20:51,806-Speed 2584.60 samples/sec   Loss 1.0794   LearningRate 0.0001   Epoch: 19   Global Step: 806270   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:20:55,775-Speed 2580.47 samples/sec   Loss 1.1297   LearningRate 0.0001   Epoch: 19   Global Step: 806280   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:20:59,677-Speed 2624.69 samples/sec   Loss 1.1565   LearningRate 0.0001   Epoch: 19   Global Step: 806290   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:21:03,590-Speed 2617.70 samples/sec   Loss 1.1591   LearningRate 0.0001   Epoch: 19   Global Step: 806300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:21:07,494-Speed 2623.53 samples/sec   Loss 1.1111   LearningRate 0.0001   Epoch: 19   Global Step: 806310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:21:11,390-Speed 2629.20 samples/sec   Loss 1.1507   LearningRate 0.0001   Epoch: 19   Global Step: 806320   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:21:15,293-Speed 2624.08 samples/sec   Loss 1.1309   LearningRate 0.0001   Epoch: 19   Global Step: 806330   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:21:19,190-Speed 2628.72 samples/sec   Loss 1.1226   LearningRate 0.0001   Epoch: 19   Global Step: 806340   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:21:23,089-Speed 2626.90 samples/sec   Loss 1.1097   LearningRate 0.0001   Epoch: 19   Global Step: 806350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:26,982-Speed 2631.82 samples/sec   Loss 1.1494   LearningRate 0.0001   Epoch: 19   Global Step: 806360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:30,874-Speed 2631.04 samples/sec   Loss 1.1698   LearningRate 0.0001   Epoch: 19   Global Step: 806370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:34,780-Speed 2622.60 samples/sec   Loss 1.1429   LearningRate 0.0001   Epoch: 19   Global Step: 806380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:38,684-Speed 2623.17 samples/sec   Loss 1.0942   LearningRate 0.0001   Epoch: 19   Global Step: 806390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:42,585-Speed 2625.79 samples/sec   Loss 1.1520   LearningRate 0.0001   Epoch: 19   Global Step: 806400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:46,485-Speed 2625.76 samples/sec   Loss 1.1389   LearningRate 0.0001   Epoch: 19   Global Step: 806410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:50,381-Speed 2629.39 samples/sec   Loss 1.1360   LearningRate 0.0001   Epoch: 19   Global Step: 806420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:54,273-Speed 2632.00 samples/sec   Loss 1.1064   LearningRate 0.0001   Epoch: 19   Global Step: 806430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:21:58,165-Speed 2631.96 samples/sec   Loss 1.1268   LearningRate 0.0001   Epoch: 19   Global Step: 806440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:22:02,058-Speed 2631.04 samples/sec   Loss 1.1019   LearningRate 0.0001   Epoch: 19   Global Step: 806450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:22:05,946-Speed 2634.02 samples/sec   Loss 1.1247   LearningRate 0.0001   Epoch: 19   Global Step: 806460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:22:09,813-Speed 2648.54 samples/sec   Loss 1.1481   LearningRate 0.0001   Epoch: 19   Global Step: 806470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:22:13,702-Speed 2634.03 samples/sec   Loss 1.1305   LearningRate 0.0001   Epoch: 19   Global Step: 806480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:22:17,604-Speed 2625.23 samples/sec   Loss 1.1300   LearningRate 0.0001   Epoch: 19   Global Step: 806490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:22:21,526-Speed 2611.35 samples/sec   Loss 1.1232   LearningRate 0.0001   Epoch: 19   Global Step: 806500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:22:25,402-Speed 2642.33 samples/sec   Loss 1.1131   LearningRate 0.0001   Epoch: 19   Global Step: 806510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:29,295-Speed 2631.73 samples/sec   Loss 1.1100   LearningRate 0.0001   Epoch: 19   Global Step: 806520   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:33,381-Speed 2507.23 samples/sec   Loss 1.1950   LearningRate 0.0001   Epoch: 19   Global Step: 806530   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:37,275-Speed 2630.32 samples/sec   Loss 1.1954   LearningRate 0.0001   Epoch: 19   Global Step: 806540   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:41,168-Speed 2631.27 samples/sec   Loss 1.1172   LearningRate 0.0001   Epoch: 19   Global Step: 806550   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:45,066-Speed 2627.07 samples/sec   Loss 1.1199   LearningRate 0.0001   Epoch: 19   Global Step: 806560   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:48,964-Speed 2628.47 samples/sec   Loss 1.1618   LearningRate 0.0001   Epoch: 19   Global Step: 806570   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:52,862-Speed 2627.26 samples/sec   Loss 1.1580   LearningRate 0.0001   Epoch: 19   Global Step: 806580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:22:56,768-Speed 2622.42 samples/sec   Loss 1.1465   LearningRate 0.0001   Epoch: 19   Global Step: 806590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:00,688-Speed 2612.49 samples/sec   Loss 1.1333   LearningRate 0.0001   Epoch: 19   Global Step: 806600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:04,586-Speed 2628.06 samples/sec   Loss 1.1904   LearningRate 0.0001   Epoch: 19   Global Step: 806610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:23:08,483-Speed 2628.34 samples/sec   Loss 1.1166   LearningRate 0.0001   Epoch: 19   Global Step: 806620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:23:12,354-Speed 2646.17 samples/sec   Loss 1.1186   LearningRate 0.0001   Epoch: 19   Global Step: 806630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:16,270-Speed 2615.77 samples/sec   Loss 1.1094   LearningRate 0.0001   Epoch: 19   Global Step: 806640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:20,160-Speed 2632.41 samples/sec   Loss 1.1162   LearningRate 0.0001   Epoch: 19   Global Step: 806650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:24,053-Speed 2631.45 samples/sec   Loss 1.0879   LearningRate 0.0001   Epoch: 19   Global Step: 806660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:27,945-Speed 2631.19 samples/sec   Loss 1.1820   LearningRate 0.0001   Epoch: 19   Global Step: 806670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:31,842-Speed 2628.50 samples/sec   Loss 1.1630   LearningRate 0.0001   Epoch: 19   Global Step: 806680   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:35,763-Speed 2612.53 samples/sec   Loss 1.1753   LearningRate 0.0001   Epoch: 19   Global Step: 806690   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:39,659-Speed 2628.83 samples/sec   Loss 1.1646   LearningRate 0.0001   Epoch: 19   Global Step: 806700   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:43,570-Speed 2618.46 samples/sec   Loss 1.1105   LearningRate 0.0001   Epoch: 19   Global Step: 806710   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:47,478-Speed 2621.44 samples/sec   Loss 1.1613   LearningRate 0.0001   Epoch: 19   Global Step: 806720   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:23:51,378-Speed 2626.48 samples/sec   Loss 1.1389   LearningRate 0.0001   Epoch: 19   Global Step: 806730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:23:55,307-Speed 2606.86 samples/sec   Loss 1.1017   LearningRate 0.0001   Epoch: 19   Global Step: 806740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:23:59,208-Speed 2625.41 samples/sec   Loss 1.1245   LearningRate 0.0001   Epoch: 19   Global Step: 806750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:03,111-Speed 2624.94 samples/sec   Loss 1.1194   LearningRate 0.0001   Epoch: 19   Global Step: 806760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:07,008-Speed 2627.65 samples/sec   Loss 1.1116   LearningRate 0.0001   Epoch: 19   Global Step: 806770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:10,968-Speed 2587.04 samples/sec   Loss 1.1276   LearningRate 0.0001   Epoch: 19   Global Step: 806780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:14,863-Speed 2629.76 samples/sec   Loss 1.1235   LearningRate 0.0001   Epoch: 19   Global Step: 806790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:18,758-Speed 2630.37 samples/sec   Loss 1.0935   LearningRate 0.0001   Epoch: 19   Global Step: 806800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:22,662-Speed 2623.55 samples/sec   Loss 1.1282   LearningRate 0.0001   Epoch: 19   Global Step: 806810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:26,552-Speed 2632.47 samples/sec   Loss 1.1371   LearningRate 0.0001   Epoch: 19   Global Step: 806820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:30,439-Speed 2635.24 samples/sec   Loss 1.1481   LearningRate 0.0001   Epoch: 19   Global Step: 806830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:34,342-Speed 2624.87 samples/sec   Loss 1.1588   LearningRate 0.0001   Epoch: 19   Global Step: 806840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:38,234-Speed 2632.57 samples/sec   Loss 1.1072   LearningRate 0.0001   Epoch: 19   Global Step: 806850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:42,126-Speed 2631.71 samples/sec   Loss 1.1720   LearningRate 0.0001   Epoch: 19   Global Step: 806860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:46,024-Speed 2627.47 samples/sec   Loss 1.1268   LearningRate 0.0001   Epoch: 19   Global Step: 806870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:49,927-Speed 2625.07 samples/sec   Loss 1.1674   LearningRate 0.0001   Epoch: 19   Global Step: 806880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:53,837-Speed 2619.52 samples/sec   Loss 1.1482   LearningRate 0.0001   Epoch: 19   Global Step: 806890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:24:57,730-Speed 2630.86 samples/sec   Loss 1.1596   LearningRate 0.0001   Epoch: 19   Global Step: 806900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:01,630-Speed 2626.87 samples/sec   Loss 1.1164   LearningRate 0.0001   Epoch: 19   Global Step: 806910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:05,534-Speed 2623.51 samples/sec   Loss 1.1502   LearningRate 0.0001   Epoch: 19   Global Step: 806920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:09,425-Speed 2632.22 samples/sec   Loss 1.1219   LearningRate 0.0001   Epoch: 19   Global Step: 806930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:25:13,350-Speed 2610.18 samples/sec   Loss 1.1362   LearningRate 0.0001   Epoch: 19   Global Step: 806940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:25:17,226-Speed 2642.54 samples/sec   Loss 1.1125   LearningRate 0.0001   Epoch: 19   Global Step: 806950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:21,123-Speed 2628.47 samples/sec   Loss 1.1304   LearningRate 0.0001   Epoch: 19   Global Step: 806960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:25,026-Speed 2624.44 samples/sec   Loss 1.1470   LearningRate 0.0001   Epoch: 19   Global Step: 806970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:29,066-Speed 2535.13 samples/sec   Loss 1.1601   LearningRate 0.0001   Epoch: 19   Global Step: 806980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:33,126-Speed 2522.75 samples/sec   Loss 1.0798   LearningRate 0.0001   Epoch: 19   Global Step: 806990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:37,018-Speed 2632.14 samples/sec   Loss 1.1288   LearningRate 0.0001   Epoch: 19   Global Step: 807000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:40,913-Speed 2629.49 samples/sec   Loss 1.1176   LearningRate 0.0001   Epoch: 19   Global Step: 807010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:44,826-Speed 2617.94 samples/sec   Loss 1.1714   LearningRate 0.0001   Epoch: 19   Global Step: 807020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:48,723-Speed 2628.38 samples/sec   Loss 1.0748   LearningRate 0.0001   Epoch: 19   Global Step: 807030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:52,620-Speed 2628.38 samples/sec   Loss 1.0901   LearningRate 0.0001   Epoch: 19   Global Step: 807040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:25:56,520-Speed 2626.70 samples/sec   Loss 1.1257   LearningRate 0.0001   Epoch: 19   Global Step: 807050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:26:00,420-Speed 2626.43 samples/sec   Loss 1.1584   LearningRate 0.0001   Epoch: 19   Global Step: 807060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:26:04,324-Speed 2623.60 samples/sec   Loss 1.0765   LearningRate 0.0001   Epoch: 19   Global Step: 807070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:26:08,199-Speed 2642.53 samples/sec   Loss 1.1281   LearningRate 0.0001   Epoch: 19   Global Step: 807080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:12,092-Speed 2631.15 samples/sec   Loss 1.1229   LearningRate 0.0001   Epoch: 19   Global Step: 807090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:15,997-Speed 2623.25 samples/sec   Loss 1.1059   LearningRate 0.0001   Epoch: 19   Global Step: 807100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:19,890-Speed 2631.32 samples/sec   Loss 1.1083   LearningRate 0.0001   Epoch: 19   Global Step: 807110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:23,786-Speed 2628.63 samples/sec   Loss 1.1448   LearningRate 0.0001   Epoch: 19   Global Step: 807120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:27,691-Speed 2623.42 samples/sec   Loss 1.1229   LearningRate 0.0001   Epoch: 19   Global Step: 807130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:31,587-Speed 2629.20 samples/sec   Loss 1.1144   LearningRate 0.0001   Epoch: 19   Global Step: 807140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:35,485-Speed 2627.35 samples/sec   Loss 1.0907   LearningRate 0.0001   Epoch: 19   Global Step: 807150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:39,380-Speed 2629.86 samples/sec   Loss 1.1163   LearningRate 0.0001   Epoch: 19   Global Step: 807160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:43,276-Speed 2629.26 samples/sec   Loss 1.1383   LearningRate 0.0001   Epoch: 19   Global Step: 807170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:47,185-Speed 2619.66 samples/sec   Loss 1.0911   LearningRate 0.0001   Epoch: 19   Global Step: 807180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:26:51,083-Speed 2628.03 samples/sec   Loss 1.1487   LearningRate 0.0001   Epoch: 19   Global Step: 807190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-16 14:26:54,963-Speed 2639.44 samples/sec   Loss 1.0874   LearningRate 0.0001   Epoch: 19   Global Step: 807200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:26:58,858-Speed 2630.14 samples/sec   Loss 1.1255   LearningRate 0.0001   Epoch: 19   Global Step: 807210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:27:02,756-Speed 2627.73 samples/sec   Loss 1.1346   LearningRate 0.0001   Epoch: 19   Global Step: 807220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:27:06,661-Speed 2622.84 samples/sec   Loss 1.0828   LearningRate 0.0001   Epoch: 19   Global Step: 807230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:27:10,563-Speed 2624.62 samples/sec   Loss 1.1721   LearningRate 0.0001   Epoch: 19   Global Step: 807240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:27:14,463-Speed 2626.65 samples/sec   Loss 1.1190   LearningRate 0.0001   Epoch: 19   Global Step: 807250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-16 14:27:18,371-Speed 2620.96 samples/sec   Loss 1.1095   LearningRate 0.0001   Epoch: 19   Global Step: 807260   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:22,397-Speed 2543.94 samples/sec   Loss 1.1010   LearningRate 0.0001   Epoch: 19   Global Step: 807270   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:26,289-Speed 2632.01 samples/sec   Loss 1.1936   LearningRate 0.0001   Epoch: 19   Global Step: 807280   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:30,197-Speed 2621.06 samples/sec   Loss 1.1639   LearningRate 0.0001   Epoch: 19   Global Step: 807290   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:34,095-Speed 2627.69 samples/sec   Loss 1.1118   LearningRate 0.0001   Epoch: 19   Global Step: 807300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:38,001-Speed 2622.33 samples/sec   Loss 1.1479   LearningRate 0.0001   Epoch: 19   Global Step: 807310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:41,901-Speed 2626.66 samples/sec   Loss 1.1304   LearningRate 0.0001   Epoch: 19   Global Step: 807320   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:45,795-Speed 2630.30 samples/sec   Loss 1.1600   LearningRate 0.0001   Epoch: 19   Global Step: 807330   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-16 14:27:49,690-Speed 2629.66 samples/sec   Loss 1.1240   LearningRate 0.0001   Epoch: 19   Global Step: 807340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:27:53,582-Speed 2631.50 samples/sec   Loss 1.1810   LearningRate 0.0001   Epoch: 19   Global Step: 807350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:27:57,499-Speed 2615.38 samples/sec   Loss 1.1693   LearningRate 0.0001   Epoch: 19   Global Step: 807360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:28:01,399-Speed 2626.16 samples/sec   Loss 1.1458   LearningRate 0.0001   Epoch: 19   Global Step: 807370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:28:05,275-Speed 2642.68 samples/sec   Loss 1.1532   LearningRate 0.0001   Epoch: 19   Global Step: 807380   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:09,199-Speed 2610.03 samples/sec   Loss 1.1546   LearningRate 0.0001   Epoch: 19   Global Step: 807390   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:13,093-Speed 2630.42 samples/sec   Loss 1.1623   LearningRate 0.0001   Epoch: 19   Global Step: 807400   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:16,998-Speed 2623.41 samples/sec   Loss 1.1194   LearningRate 0.0001   Epoch: 19   Global Step: 807410   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:20,965-Speed 2581.18 samples/sec   Loss 1.1185   LearningRate 0.0001   Epoch: 19   Global Step: 807420   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:24,866-Speed 2626.82 samples/sec   Loss 1.0956   LearningRate 0.0001   Epoch: 19   Global Step: 807430   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:28,760-Speed 2630.39 samples/sec   Loss 1.1258   LearningRate 0.0001   Epoch: 19   Global Step: 807440   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:32,659-Speed 2626.76 samples/sec   Loss 1.1408   LearningRate 0.0001   Epoch: 19   Global Step: 807450   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:36,580-Speed 2612.42 samples/sec   Loss 1.1440   LearningRate 0.0001   Epoch: 19   Global Step: 807460   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:40,474-Speed 2629.69 samples/sec   Loss 1.1042   LearningRate 0.0001   Epoch: 19   Global Step: 807470   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:28:44,369-Speed 2630.24 samples/sec   Loss 1.1344   LearningRate 0.0001   Epoch: 19   Global Step: 807480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:28:48,267-Speed 2627.55 samples/sec   Loss 1.1479   LearningRate 0.0001   Epoch: 19   Global Step: 807490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:28:52,158-Speed 2632.81 samples/sec   Loss 1.1511   LearningRate 0.0001   Epoch: 19   Global Step: 807500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:28:56,050-Speed 2630.97 samples/sec   Loss 1.1097   LearningRate 0.0001   Epoch: 19   Global Step: 807510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:28:59,947-Speed 2628.72 samples/sec   Loss 1.1161   LearningRate 0.0001   Epoch: 19   Global Step: 807520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:03,843-Speed 2628.87 samples/sec   Loss 1.1215   LearningRate 0.0001   Epoch: 19   Global Step: 807530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:07,772-Speed 2607.17 samples/sec   Loss 1.1674   LearningRate 0.0001   Epoch: 19   Global Step: 807540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:11,678-Speed 2622.29 samples/sec   Loss 1.1521   LearningRate 0.0001   Epoch: 19   Global Step: 807550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:15,574-Speed 2629.31 samples/sec   Loss 1.1087   LearningRate 0.0001   Epoch: 19   Global Step: 807560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:19,483-Speed 2620.68 samples/sec   Loss 1.1340   LearningRate 0.0001   Epoch: 19   Global Step: 807570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:23,367-Speed 2637.46 samples/sec   Loss 1.1455   LearningRate 0.0001   Epoch: 19   Global Step: 807580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:27,254-Speed 2634.66 samples/sec   Loss 1.1141   LearningRate 0.0001   Epoch: 19   Global Step: 807590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:31,146-Speed 2631.57 samples/sec   Loss 1.1301   LearningRate 0.0001   Epoch: 19   Global Step: 807600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:35,056-Speed 2619.33 samples/sec   Loss 1.1252   LearningRate 0.0001   Epoch: 19   Global Step: 807610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:38,952-Speed 2629.14 samples/sec   Loss 1.1316   LearningRate 0.0001   Epoch: 19   Global Step: 807620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:42,843-Speed 2632.59 samples/sec   Loss 1.1056   LearningRate 0.0001   Epoch: 19   Global Step: 807630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:46,732-Speed 2634.33 samples/sec   Loss 1.1160   LearningRate 0.0001   Epoch: 19   Global Step: 807640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:50,627-Speed 2629.24 samples/sec   Loss 1.1016   LearningRate 0.0001   Epoch: 19   Global Step: 807650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:54,523-Speed 2629.46 samples/sec   Loss 1.1490   LearningRate 0.0001   Epoch: 19   Global Step: 807660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:29:58,430-Speed 2621.20 samples/sec   Loss 1.1335   LearningRate 0.0001   Epoch: 19   Global Step: 807670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:02,327-Speed 2628.31 samples/sec   Loss 1.1666   LearningRate 0.0001   Epoch: 19   Global Step: 807680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:30:06,200-Speed 2643.97 samples/sec   Loss 1.1169   LearningRate 0.0001   Epoch: 19   Global Step: 807690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:10,091-Speed 2632.82 samples/sec   Loss 1.1538   LearningRate 0.0001   Epoch: 19   Global Step: 807700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:13,989-Speed 2627.60 samples/sec   Loss 1.1626   LearningRate 0.0001   Epoch: 19   Global Step: 807710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:17,893-Speed 2623.65 samples/sec   Loss 1.1660   LearningRate 0.0001   Epoch: 19   Global Step: 807720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:21,790-Speed 2628.81 samples/sec   Loss 1.1261   LearningRate 0.0001   Epoch: 19   Global Step: 807730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:25,682-Speed 2631.38 samples/sec   Loss 1.0967   LearningRate 0.0001   Epoch: 19   Global Step: 807740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:29,574-Speed 2632.07 samples/sec   Loss 1.1234   LearningRate 0.0001   Epoch: 19   Global Step: 807750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:30:33,440-Speed 2648.81 samples/sec   Loss 1.1591   LearningRate 0.0001   Epoch: 19   Global Step: 807760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:30:37,338-Speed 2628.43 samples/sec   Loss 1.1117   LearningRate 0.0001   Epoch: 19   Global Step: 807770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:30:41,230-Speed 2631.15 samples/sec   Loss 1.1182   LearningRate 0.0001   Epoch: 19   Global Step: 807780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:30:45,130-Speed 2626.90 samples/sec   Loss 1.1627   LearningRate 0.0001   Epoch: 19   Global Step: 807790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:30:49,069-Speed 2600.34 samples/sec   Loss 1.1587   LearningRate 0.0001   Epoch: 19   Global Step: 807800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:30:52,959-Speed 2633.32 samples/sec   Loss 1.1174   LearningRate 0.0001   Epoch: 19   Global Step: 807810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:30:56,852-Speed 2630.97 samples/sec   Loss 1.1506   LearningRate 0.0001   Epoch: 19   Global Step: 807820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:31:00,761-Speed 2620.44 samples/sec   Loss 1.0828   LearningRate 0.0001   Epoch: 19   Global Step: 807830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:31:04,681-Speed 2612.68 samples/sec   Loss 1.1147   LearningRate 0.0001   Epoch: 19   Global Step: 807840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:31:08,578-Speed 2628.37 samples/sec   Loss 1.1121   LearningRate 0.0001   Epoch: 19   Global Step: 807850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:31:12,469-Speed 2632.32 samples/sec   Loss 1.1658   LearningRate 0.0001   Epoch: 19   Global Step: 807860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:16,373-Speed 2623.79 samples/sec   Loss 1.1476   LearningRate 0.0001   Epoch: 19   Global Step: 807870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:20,279-Speed 2622.23 samples/sec   Loss 1.1047   LearningRate 0.0001   Epoch: 19   Global Step: 807880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:24,173-Speed 2630.39 samples/sec   Loss 1.1592   LearningRate 0.0001   Epoch: 19   Global Step: 807890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:28,069-Speed 2629.20 samples/sec   Loss 1.1035   LearningRate 0.0001   Epoch: 19   Global Step: 807900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:31,974-Speed 2623.61 samples/sec   Loss 1.1581   LearningRate 0.0001   Epoch: 19   Global Step: 807910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:35,865-Speed 2631.72 samples/sec   Loss 1.1140   LearningRate 0.0001   Epoch: 19   Global Step: 807920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:39,762-Speed 2628.77 samples/sec   Loss 1.1470   LearningRate 0.0001   Epoch: 19   Global Step: 807930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:43,653-Speed 2631.95 samples/sec   Loss 1.1612   LearningRate 0.0001   Epoch: 19   Global Step: 807940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:47,553-Speed 2626.67 samples/sec   Loss 1.1363   LearningRate 0.0001   Epoch: 19   Global Step: 807950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:51,442-Speed 2633.86 samples/sec   Loss 1.1411   LearningRate 0.0001   Epoch: 19   Global Step: 807960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:31:55,311-Speed 2647.16 samples/sec   Loss 1.1567   LearningRate 0.0001   Epoch: 19   Global Step: 807970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:31:59,221-Speed 2620.22 samples/sec   Loss 1.2121   LearningRate 0.0001   Epoch: 19   Global Step: 807980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:03,282-Speed 2521.94 samples/sec   Loss 1.0953   LearningRate 0.0001   Epoch: 19   Global Step: 807990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:07,340-Speed 2524.38 samples/sec   Loss 1.1361   LearningRate 0.0001   Epoch: 19   Global Step: 808000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:11,246-Speed 2621.93 samples/sec   Loss 1.1016   LearningRate 0.0001   Epoch: 19   Global Step: 808010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:15,143-Speed 2628.78 samples/sec   Loss 1.1372   LearningRate 0.0001   Epoch: 19   Global Step: 808020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:19,032-Speed 2634.28 samples/sec   Loss 1.1303   LearningRate 0.0001   Epoch: 19   Global Step: 808030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:22,942-Speed 2619.36 samples/sec   Loss 1.1186   LearningRate 0.0001   Epoch: 19   Global Step: 808040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:26,836-Speed 2630.59 samples/sec   Loss 1.1399   LearningRate 0.0001   Epoch: 19   Global Step: 808050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:30,733-Speed 2628.18 samples/sec   Loss 1.1534   LearningRate 0.0001   Epoch: 19   Global Step: 808060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:34,632-Speed 2626.78 samples/sec   Loss 1.0970   LearningRate 0.0001   Epoch: 19   Global Step: 808070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:32:38,539-Speed 2621.77 samples/sec   Loss 1.1425   LearningRate 0.0001   Epoch: 19   Global Step: 808080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:32:42,412-Speed 2644.95 samples/sec   Loss 1.1213   LearningRate 0.0001   Epoch: 19   Global Step: 808090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:46,307-Speed 2629.58 samples/sec   Loss 1.1074   LearningRate 0.0001   Epoch: 19   Global Step: 808100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:50,205-Speed 2627.65 samples/sec   Loss 1.1513   LearningRate 0.0001   Epoch: 19   Global Step: 808110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:54,122-Speed 2615.01 samples/sec   Loss 1.1316   LearningRate 0.0001   Epoch: 19   Global Step: 808120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:32:58,020-Speed 2627.93 samples/sec   Loss 1.0969   LearningRate 0.0001   Epoch: 19   Global Step: 808130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:01,919-Speed 2626.78 samples/sec   Loss 1.0665   LearningRate 0.0001   Epoch: 19   Global Step: 808140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:05,827-Speed 2620.76 samples/sec   Loss 1.1028   LearningRate 0.0001   Epoch: 19   Global Step: 808150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:09,728-Speed 2625.83 samples/sec   Loss 1.1435   LearningRate 0.0001   Epoch: 19   Global Step: 808160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:13,696-Speed 2582.00 samples/sec   Loss 1.1101   LearningRate 0.0001   Epoch: 19   Global Step: 808170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:17,619-Speed 2610.49 samples/sec   Loss 1.1249   LearningRate 0.0001   Epoch: 19   Global Step: 808180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:21,510-Speed 2632.51 samples/sec   Loss 1.1470   LearningRate 0.0001   Epoch: 19   Global Step: 808190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:33:25,409-Speed 2627.24 samples/sec   Loss 1.1270   LearningRate 0.0001   Epoch: 19   Global Step: 808200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:33:29,287-Speed 2641.93 samples/sec   Loss 1.1302   LearningRate 0.0001   Epoch: 19   Global Step: 808210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:33,180-Speed 2630.48 samples/sec   Loss 1.1503   LearningRate 0.0001   Epoch: 19   Global Step: 808220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:37,079-Speed 2627.14 samples/sec   Loss 1.0737   LearningRate 0.0001   Epoch: 19   Global Step: 808230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:41,091-Speed 2552.81 samples/sec   Loss 1.1057   LearningRate 0.0001   Epoch: 19   Global Step: 808240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:44,990-Speed 2627.55 samples/sec   Loss 1.1535   LearningRate 0.0001   Epoch: 19   Global Step: 808250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:48,905-Speed 2616.07 samples/sec   Loss 1.1338   LearningRate 0.0001   Epoch: 19   Global Step: 808260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:52,816-Speed 2619.09 samples/sec   Loss 1.1202   LearningRate 0.0001   Epoch: 19   Global Step: 808270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:33:56,689-Speed 2645.17 samples/sec   Loss 1.1600   LearningRate 0.0001   Epoch: 19   Global Step: 808280   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:00,589-Speed 2625.83 samples/sec   Loss 1.1656   LearningRate 0.0001   Epoch: 19   Global Step: 808290   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:04,488-Speed 2626.83 samples/sec   Loss 1.1230   LearningRate 0.0001   Epoch: 19   Global Step: 808300   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:08,387-Speed 2626.96 samples/sec   Loss 1.1572   LearningRate 0.0001   Epoch: 19   Global Step: 808310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:12,374-Speed 2569.09 samples/sec   Loss 1.1364   LearningRate 0.0001   Epoch: 19   Global Step: 808320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:16,495-Speed 2485.38 samples/sec   Loss 1.0661   LearningRate 0.0001   Epoch: 19   Global Step: 808330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:20,404-Speed 2620.52 samples/sec   Loss 1.0881   LearningRate 0.0001   Epoch: 19   Global Step: 808340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:24,327-Speed 2611.81 samples/sec   Loss 1.1397   LearningRate 0.0001   Epoch: 19   Global Step: 808350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:28,224-Speed 2628.46 samples/sec   Loss 1.1292   LearningRate 0.0001   Epoch: 19   Global Step: 808360   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:32,121-Speed 2627.55 samples/sec   Loss 1.1167   LearningRate 0.0001   Epoch: 19   Global Step: 808370   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:34:36,021-Speed 2626.30 samples/sec   Loss 1.1193   LearningRate 0.0001   Epoch: 19   Global Step: 808380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:34:39,925-Speed 2624.25 samples/sec   Loss 1.1195   LearningRate 0.0001   Epoch: 19   Global Step: 808390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:34:43,890-Speed 2583.35 samples/sec   Loss 1.1296   LearningRate 0.0001   Epoch: 19   Global Step: 808400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:34:47,791-Speed 2626.01 samples/sec   Loss 1.1786   LearningRate 0.0001   Epoch: 19   Global Step: 808410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:34:51,690-Speed 2627.05 samples/sec   Loss 1.1162   LearningRate 0.0001   Epoch: 19   Global Step: 808420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:34:55,592-Speed 2625.32 samples/sec   Loss 1.1372   LearningRate 0.0001   Epoch: 19   Global Step: 808430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:34:59,484-Speed 2631.88 samples/sec   Loss 1.0830   LearningRate 0.0001   Epoch: 19   Global Step: 808440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:03,415-Speed 2605.07 samples/sec   Loss 1.1005   LearningRate 0.0001   Epoch: 19   Global Step: 808450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:07,305-Speed 2633.92 samples/sec   Loss 1.0958   LearningRate 0.0001   Epoch: 19   Global Step: 808460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:11,196-Speed 2632.01 samples/sec   Loss 1.1001   LearningRate 0.0001   Epoch: 19   Global Step: 808470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:15,083-Speed 2635.41 samples/sec   Loss 1.1216   LearningRate 0.0001   Epoch: 19   Global Step: 808480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:18,983-Speed 2626.59 samples/sec   Loss 1.1175   LearningRate 0.0001   Epoch: 19   Global Step: 808490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:22,903-Speed 2613.11 samples/sec   Loss 1.1484   LearningRate 0.0001   Epoch: 19   Global Step: 808500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:26,797-Speed 2630.15 samples/sec   Loss 1.1501   LearningRate 0.0001   Epoch: 19   Global Step: 808510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:30,694-Speed 2628.14 samples/sec   Loss 1.1179   LearningRate 0.0001   Epoch: 19   Global Step: 808520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:34,591-Speed 2628.51 samples/sec   Loss 1.1104   LearningRate 0.0001   Epoch: 19   Global Step: 808530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:38,482-Speed 2632.41 samples/sec   Loss 1.1446   LearningRate 0.0001   Epoch: 19   Global Step: 808540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:42,380-Speed 2627.13 samples/sec   Loss 1.1533   LearningRate 0.0001   Epoch: 19   Global Step: 808550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:46,270-Speed 2634.32 samples/sec   Loss 1.1176   LearningRate 0.0001   Epoch: 19   Global Step: 808560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:50,166-Speed 2628.77 samples/sec   Loss 1.1267   LearningRate 0.0001   Epoch: 19   Global Step: 808570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:35:54,067-Speed 2625.90 samples/sec   Loss 1.1320   LearningRate 0.0001   Epoch: 19   Global Step: 808580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:35:57,965-Speed 2628.01 samples/sec   Loss 1.0864   LearningRate 0.0001   Epoch: 19   Global Step: 808590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:36:01,859-Speed 2629.82 samples/sec   Loss 1.1534   LearningRate 0.0001   Epoch: 19   Global Step: 808600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:36:05,764-Speed 2622.98 samples/sec   Loss 1.1700   LearningRate 0.0001   Epoch: 19   Global Step: 808610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:36:09,660-Speed 2628.81 samples/sec   Loss 1.1597   LearningRate 0.0001   Epoch: 19   Global Step: 808620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:36:13,557-Speed 2628.45 samples/sec   Loss 1.1365   LearningRate 0.0001   Epoch: 19   Global Step: 808630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:36:17,428-Speed 2646.26 samples/sec   Loss 1.1385   LearningRate 0.0001   Epoch: 19   Global Step: 808640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:21,331-Speed 2624.27 samples/sec   Loss 1.1529   LearningRate 0.0001   Epoch: 19   Global Step: 808650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:25,342-Speed 2554.15 samples/sec   Loss 1.1473   LearningRate 0.0001   Epoch: 19   Global Step: 808660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:29,234-Speed 2631.32 samples/sec   Loss 1.1205   LearningRate 0.0001   Epoch: 19   Global Step: 808670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:33,139-Speed 2622.93 samples/sec   Loss 1.1217   LearningRate 0.0001   Epoch: 19   Global Step: 808680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:37,031-Speed 2631.83 samples/sec   Loss 1.1034   LearningRate 0.0001   Epoch: 19   Global Step: 808690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:40,931-Speed 2626.05 samples/sec   Loss 1.1040   LearningRate 0.0001   Epoch: 19   Global Step: 808700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:44,824-Speed 2631.06 samples/sec   Loss 1.1091   LearningRate 0.0001   Epoch: 19   Global Step: 808710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:48,719-Speed 2629.89 samples/sec   Loss 1.1365   LearningRate 0.0001   Epoch: 19   Global Step: 808720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:52,639-Speed 2613.11 samples/sec   Loss 1.1563   LearningRate 0.0001   Epoch: 19   Global Step: 808730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:36:56,533-Speed 2630.34 samples/sec   Loss 1.0801   LearningRate 0.0001   Epoch: 19   Global Step: 808740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:37:00,428-Speed 2629.65 samples/sec   Loss 1.1588   LearningRate 0.0001   Epoch: 19   Global Step: 808750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:37:04,307-Speed 2640.06 samples/sec   Loss 1.1103   LearningRate 0.0001   Epoch: 19   Global Step: 808760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:08,206-Speed 2627.33 samples/sec   Loss 1.1282   LearningRate 0.0001   Epoch: 19   Global Step: 808770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:12,102-Speed 2628.92 samples/sec   Loss 1.1706   LearningRate 0.0001   Epoch: 19   Global Step: 808780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:16,006-Speed 2623.79 samples/sec   Loss 1.1402   LearningRate 0.0001   Epoch: 19   Global Step: 808790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:19,902-Speed 2629.32 samples/sec   Loss 1.0918   LearningRate 0.0001   Epoch: 19   Global Step: 808800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:23,796-Speed 2630.21 samples/sec   Loss 1.1200   LearningRate 0.0001   Epoch: 19   Global Step: 808810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:27,691-Speed 2630.40 samples/sec   Loss 1.1209   LearningRate 0.0001   Epoch: 19   Global Step: 808820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:31,598-Speed 2621.48 samples/sec   Loss 1.1515   LearningRate 0.0001   Epoch: 19   Global Step: 808830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:35,529-Speed 2605.39 samples/sec   Loss 1.0761   LearningRate 0.0001   Epoch: 19   Global Step: 808840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:39,434-Speed 2622.41 samples/sec   Loss 1.1584   LearningRate 0.0001   Epoch: 19   Global Step: 808850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:37:43,306-Speed 2645.68 samples/sec   Loss 1.1842   LearningRate 0.0001   Epoch: 19   Global Step: 808860   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:37:47,201-Speed 2630.06 samples/sec   Loss 1.1355   LearningRate 0.0001   Epoch: 19   Global Step: 808870   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:37:51,141-Speed 2599.87 samples/sec   Loss 1.0842   LearningRate 0.0001   Epoch: 19   Global Step: 808880   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:37:55,037-Speed 2628.77 samples/sec   Loss 1.1027   LearningRate 0.0001   Epoch: 19   Global Step: 808890   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:37:58,932-Speed 2629.90 samples/sec   Loss 1.0883   LearningRate 0.0001   Epoch: 19   Global Step: 808900   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:38:02,837-Speed 2622.73 samples/sec   Loss 1.1000   LearningRate 0.0001   Epoch: 19   Global Step: 808910   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:38:06,742-Speed 2623.11 samples/sec   Loss 1.1451   LearningRate 0.0001   Epoch: 19   Global Step: 808920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:38:10,638-Speed 2629.39 samples/sec   Loss 1.1099   LearningRate 0.0001   Epoch: 19   Global Step: 808930   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:38:14,558-Speed 2612.75 samples/sec   Loss 1.1035   LearningRate 0.0001   Epoch: 19   Global Step: 808940   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:38:18,488-Speed 2606.36 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 19   Global Step: 808950   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:38:22,583-Speed 2501.33 samples/sec   Loss 1.1015   LearningRate 0.0001   Epoch: 19   Global Step: 808960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:26,672-Speed 2504.69 samples/sec   Loss 1.1415   LearningRate 0.0001   Epoch: 19   Global Step: 808970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:30,643-Speed 2580.00 samples/sec   Loss 1.1818   LearningRate 0.0001   Epoch: 19   Global Step: 808980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:34,545-Speed 2625.15 samples/sec   Loss 1.1157   LearningRate 0.0001   Epoch: 19   Global Step: 808990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:38,438-Speed 2630.55 samples/sec   Loss 1.1267   LearningRate 0.0001   Epoch: 19   Global Step: 809000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:42,342-Speed 2623.06 samples/sec   Loss 1.1372   LearningRate 0.0001   Epoch: 19   Global Step: 809010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:46,265-Speed 2610.97 samples/sec   Loss 1.1039   LearningRate 0.0001   Epoch: 19   Global Step: 809020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:50,157-Speed 2632.24 samples/sec   Loss 1.1334   LearningRate 0.0001   Epoch: 19   Global Step: 809030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:54,066-Speed 2619.88 samples/sec   Loss 1.1027   LearningRate 0.0001   Epoch: 19   Global Step: 809040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:38:57,969-Speed 2625.14 samples/sec   Loss 1.0921   LearningRate 0.0001   Epoch: 19   Global Step: 809050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:39:01,891-Speed 2611.55 samples/sec   Loss 1.0910   LearningRate 0.0001   Epoch: 19   Global Step: 809060   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:05,876-Speed 2569.75 samples/sec   Loss 1.1040   LearningRate 0.0001   Epoch: 19   Global Step: 809070   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:09,772-Speed 2628.75 samples/sec   Loss 1.0745   LearningRate 0.0001   Epoch: 19   Global Step: 809080   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:13,673-Speed 2626.05 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 19   Global Step: 809090   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:17,582-Speed 2620.04 samples/sec   Loss 1.1198   LearningRate 0.0001   Epoch: 19   Global Step: 809100   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:21,523-Speed 2599.36 samples/sec   Loss 1.1271   LearningRate 0.0001   Epoch: 19   Global Step: 809110   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:25,416-Speed 2630.93 samples/sec   Loss 1.0944   LearningRate 0.0001   Epoch: 19   Global Step: 809120   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:29,340-Speed 2611.04 samples/sec   Loss 1.1267   LearningRate 0.0001   Epoch: 19   Global Step: 809130   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:33,238-Speed 2627.70 samples/sec   Loss 1.1234   LearningRate 0.0001   Epoch: 19   Global Step: 809140   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:37,141-Speed 2623.95 samples/sec   Loss 1.0908   LearningRate 0.0001   Epoch: 19   Global Step: 809150   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:39:41,042-Speed 2625.10 samples/sec   Loss 1.1468   LearningRate 0.0001   Epoch: 19   Global Step: 809160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:39:44,937-Speed 2630.15 samples/sec   Loss 1.1133   LearningRate 0.0001   Epoch: 19   Global Step: 809170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:39:48,833-Speed 2629.30 samples/sec   Loss 1.1471   LearningRate 0.0001   Epoch: 19   Global Step: 809180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:39:52,772-Speed 2600.66 samples/sec   Loss 1.1197   LearningRate 0.0001   Epoch: 19   Global Step: 809190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:39:56,664-Speed 2631.53 samples/sec   Loss 1.0963   LearningRate 0.0001   Epoch: 19   Global Step: 809200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:00,592-Speed 2607.82 samples/sec   Loss 1.1107   LearningRate 0.0001   Epoch: 19   Global Step: 809210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:04,509-Speed 2615.04 samples/sec   Loss 1.1121   LearningRate 0.0001   Epoch: 19   Global Step: 809220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:08,414-Speed 2623.08 samples/sec   Loss 1.0776   LearningRate 0.0001   Epoch: 19   Global Step: 809230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:12,323-Speed 2619.71 samples/sec   Loss 1.1202   LearningRate 0.0001   Epoch: 19   Global Step: 809240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:16,222-Speed 2626.67 samples/sec   Loss 1.1512   LearningRate 0.0001   Epoch: 19   Global Step: 809250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:20,118-Speed 2629.57 samples/sec   Loss 1.1312   LearningRate 0.0001   Epoch: 19   Global Step: 809260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:40:24,013-Speed 2629.64 samples/sec   Loss 1.1411   LearningRate 0.0001   Epoch: 19   Global Step: 809270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:40:27,921-Speed 2620.98 samples/sec   Loss 1.1317   LearningRate 0.0001   Epoch: 19   Global Step: 809280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:40:31,831-Speed 2619.56 samples/sec   Loss 1.1322   LearningRate 0.0001   Epoch: 19   Global Step: 809290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:35,720-Speed 2634.22 samples/sec   Loss 1.1201   LearningRate 0.0001   Epoch: 19   Global Step: 809300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:39,614-Speed 2631.14 samples/sec   Loss 1.0729   LearningRate 0.0001   Epoch: 19   Global Step: 809310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:43,503-Speed 2633.71 samples/sec   Loss 1.0936   LearningRate 0.0001   Epoch: 19   Global Step: 809320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:47,395-Speed 2631.59 samples/sec   Loss 1.1163   LearningRate 0.0001   Epoch: 19   Global Step: 809330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:51,287-Speed 2631.63 samples/sec   Loss 1.1075   LearningRate 0.0001   Epoch: 19   Global Step: 809340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:55,181-Speed 2630.60 samples/sec   Loss 1.1363   LearningRate 0.0001   Epoch: 19   Global Step: 809350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:40:59,110-Speed 2606.95 samples/sec   Loss 1.1472   LearningRate 0.0001   Epoch: 19   Global Step: 809360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:03,003-Speed 2631.06 samples/sec   Loss 1.1678   LearningRate 0.0001   Epoch: 19   Global Step: 809370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:06,915-Speed 2618.04 samples/sec   Loss 1.1660   LearningRate 0.0001   Epoch: 19   Global Step: 809380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:10,822-Speed 2621.50 samples/sec   Loss 1.1506   LearningRate 0.0001   Epoch: 19   Global Step: 809390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:41:14,695-Speed 2645.38 samples/sec   Loss 1.1530   LearningRate 0.0001   Epoch: 19   Global Step: 809400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:18,631-Speed 2602.28 samples/sec   Loss 1.1550   LearningRate 0.0001   Epoch: 19   Global Step: 809410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:22,652-Speed 2547.23 samples/sec   Loss 1.1515   LearningRate 0.0001   Epoch: 19   Global Step: 809420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:26,553-Speed 2625.71 samples/sec   Loss 1.0918   LearningRate 0.0001   Epoch: 19   Global Step: 809430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:30,466-Speed 2617.91 samples/sec   Loss 1.0781   LearningRate 0.0001   Epoch: 19   Global Step: 809440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:34,381-Speed 2616.59 samples/sec   Loss 1.1516   LearningRate 0.0001   Epoch: 19   Global Step: 809450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:38,305-Speed 2610.10 samples/sec   Loss 1.1012   LearningRate 0.0001   Epoch: 19   Global Step: 809460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:42,231-Speed 2608.52 samples/sec   Loss 1.1225   LearningRate 0.0001   Epoch: 19   Global Step: 809470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:46,142-Speed 2619.57 samples/sec   Loss 1.1616   LearningRate 0.0001   Epoch: 19   Global Step: 809480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:50,038-Speed 2628.98 samples/sec   Loss 1.0731   LearningRate 0.0001   Epoch: 19   Global Step: 809490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:41:53,932-Speed 2630.48 samples/sec   Loss 1.1332   LearningRate 0.0001   Epoch: 19   Global Step: 809500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:41:57,803-Speed 2646.88 samples/sec   Loss 1.1085   LearningRate 0.0001   Epoch: 19   Global Step: 809510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:01,698-Speed 2629.43 samples/sec   Loss 1.1130   LearningRate 0.0001   Epoch: 19   Global Step: 809520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:05,596-Speed 2627.54 samples/sec   Loss 1.1382   LearningRate 0.0001   Epoch: 19   Global Step: 809530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:09,491-Speed 2630.22 samples/sec   Loss 1.1367   LearningRate 0.0001   Epoch: 19   Global Step: 809540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:13,399-Speed 2621.18 samples/sec   Loss 1.0973   LearningRate 0.0001   Epoch: 19   Global Step: 809550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:17,291-Speed 2631.50 samples/sec   Loss 1.1225   LearningRate 0.0001   Epoch: 19   Global Step: 809560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:21,208-Speed 2614.97 samples/sec   Loss 1.1315   LearningRate 0.0001   Epoch: 19   Global Step: 809570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:25,102-Speed 2630.37 samples/sec   Loss 1.1084   LearningRate 0.0001   Epoch: 19   Global Step: 809580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:29,000-Speed 2627.91 samples/sec   Loss 1.0971   LearningRate 0.0001   Epoch: 19   Global Step: 809590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:32,910-Speed 2619.84 samples/sec   Loss 1.1556   LearningRate 0.0001   Epoch: 19   Global Step: 809600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:36,788-Speed 2641.01 samples/sec   Loss 1.0520   LearningRate 0.0001   Epoch: 19   Global Step: 809610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:40,838-Speed 2528.96 samples/sec   Loss 1.1448   LearningRate 0.0001   Epoch: 19   Global Step: 809620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:44,762-Speed 2610.30 samples/sec   Loss 1.1249   LearningRate 0.0001   Epoch: 19   Global Step: 809630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:48,669-Speed 2621.36 samples/sec   Loss 1.1400   LearningRate 0.0001   Epoch: 19   Global Step: 809640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:52,560-Speed 2632.79 samples/sec   Loss 1.1263   LearningRate 0.0001   Epoch: 19   Global Step: 809650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:42:56,454-Speed 2630.78 samples/sec   Loss 1.1668   LearningRate 0.0001   Epoch: 19   Global Step: 809660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:00,357-Speed 2624.36 samples/sec   Loss 1.1336   LearningRate 0.0001   Epoch: 19   Global Step: 809670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:04,252-Speed 2629.50 samples/sec   Loss 1.1221   LearningRate 0.0001   Epoch: 19   Global Step: 809680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:08,152-Speed 2626.17 samples/sec   Loss 1.1090   LearningRate 0.0001   Epoch: 19   Global Step: 809690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:12,047-Speed 2629.17 samples/sec   Loss 1.0966   LearningRate 0.0001   Epoch: 19   Global Step: 809700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:15,943-Speed 2629.99 samples/sec   Loss 1.0998   LearningRate 0.0001   Epoch: 19   Global Step: 809710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:43:19,832-Speed 2633.89 samples/sec   Loss 1.1101   LearningRate 0.0001   Epoch: 19   Global Step: 809720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:43:23,706-Speed 2644.19 samples/sec   Loss 1.1941   LearningRate 0.0001   Epoch: 19   Global Step: 809730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:27,598-Speed 2631.31 samples/sec   Loss 1.1499   LearningRate 0.0001   Epoch: 19   Global Step: 809740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:31,510-Speed 2618.35 samples/sec   Loss 1.1504   LearningRate 0.0001   Epoch: 19   Global Step: 809750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:35,407-Speed 2628.38 samples/sec   Loss 1.1376   LearningRate 0.0001   Epoch: 19   Global Step: 809760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:39,301-Speed 2629.98 samples/sec   Loss 1.1064   LearningRate 0.0001   Epoch: 19   Global Step: 809770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:43,204-Speed 2624.96 samples/sec   Loss 1.0928   LearningRate 0.0001   Epoch: 19   Global Step: 809780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:47,099-Speed 2629.11 samples/sec   Loss 1.1560   LearningRate 0.0001   Epoch: 19   Global Step: 809790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:50,994-Speed 2630.25 samples/sec   Loss 1.1277   LearningRate 0.0001   Epoch: 19   Global Step: 809800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:54,881-Speed 2635.05 samples/sec   Loss 1.0891   LearningRate 0.0001   Epoch: 19   Global Step: 809810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:43:58,771-Speed 2633.50 samples/sec   Loss 1.0942   LearningRate 0.0001   Epoch: 19   Global Step: 809820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:02,675-Speed 2623.34 samples/sec   Loss 1.1828   LearningRate 0.0001   Epoch: 19   Global Step: 809830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:06,570-Speed 2630.38 samples/sec   Loss 1.1647   LearningRate 0.0001   Epoch: 19   Global Step: 809840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:10,462-Speed 2631.53 samples/sec   Loss 1.0959   LearningRate 0.0001   Epoch: 19   Global Step: 809850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:14,356-Speed 2630.33 samples/sec   Loss 1.1435   LearningRate 0.0001   Epoch: 19   Global Step: 809860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:18,261-Speed 2622.58 samples/sec   Loss 1.1377   LearningRate 0.0001   Epoch: 19   Global Step: 809870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:22,153-Speed 2632.04 samples/sec   Loss 1.0985   LearningRate 0.0001   Epoch: 19   Global Step: 809880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:26,043-Speed 2632.86 samples/sec   Loss 1.2029   LearningRate 0.0001   Epoch: 19   Global Step: 809890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:29,935-Speed 2631.89 samples/sec   Loss 1.1163   LearningRate 0.0001   Epoch: 19   Global Step: 809900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:33,823-Speed 2634.71 samples/sec   Loss 1.1413   LearningRate 0.0001   Epoch: 19   Global Step: 809910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:37,715-Speed 2631.83 samples/sec   Loss 1.1487   LearningRate 0.0001   Epoch: 19   Global Step: 809920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:41,611-Speed 2628.70 samples/sec   Loss 1.1289   LearningRate 0.0001   Epoch: 19   Global Step: 809930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:44:45,499-Speed 2634.09 samples/sec   Loss 1.1120   LearningRate 0.0001   Epoch: 19   Global Step: 809940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:44:49,404-Speed 2623.62 samples/sec   Loss 1.1054   LearningRate 0.0001   Epoch: 19   Global Step: 809950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:44:53,296-Speed 2631.87 samples/sec   Loss 1.1339   LearningRate 0.0001   Epoch: 19   Global Step: 809960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:44:57,198-Speed 2624.84 samples/sec   Loss 1.1600   LearningRate 0.0001   Epoch: 19   Global Step: 809970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:45:01,092-Speed 2630.35 samples/sec   Loss 1.1180   LearningRate 0.0001   Epoch: 19   Global Step: 809980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:45:04,997-Speed 2623.56 samples/sec   Loss 1.0993   LearningRate 0.0001   Epoch: 19   Global Step: 809990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:45:08,889-Speed 2631.46 samples/sec   Loss 1.1249   LearningRate 0.0001   Epoch: 19   Global Step: 810000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:45:51,700-[lfw][810000]XNorm: 21.221002
Training: 2022-04-16 14:45:51,701-[lfw][810000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 14:45:51,702-[lfw][810000]Accuracy-Highest: 0.99850
Training: 2022-04-16 14:46:41,332-[cfp_fp][810000]XNorm: 21.789366
Training: 2022-04-16 14:46:41,333-[cfp_fp][810000]Accuracy-Flip: 0.99371+-0.00351
Training: 2022-04-16 14:46:41,334-[cfp_fp][810000]Accuracy-Highest: 0.99400
Training: 2022-04-16 14:47:24,023-[agedb_30][810000]XNorm: 22.216408
Training: 2022-04-16 14:47:24,024-[agedb_30][810000]Accuracy-Flip: 0.98533+-0.00531
Training: 2022-04-16 14:47:24,025-[agedb_30][810000]Accuracy-Highest: 0.98583
Training: 2022-04-16 14:47:27,905-Speed 73.66 samples/sec   Loss 1.1201   LearningRate 0.0001   Epoch: 19   Global Step: 810010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:47:31,773-Speed 2648.48 samples/sec   Loss 1.1268   LearningRate 0.0001   Epoch: 19   Global Step: 810020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:47:35,709-Speed 2602.71 samples/sec   Loss 1.1444   LearningRate 0.0001   Epoch: 19   Global Step: 810030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:47:39,591-Speed 2638.68 samples/sec   Loss 1.1515   LearningRate 0.0001   Epoch: 19   Global Step: 810040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:47:43,528-Speed 2601.02 samples/sec   Loss 1.1492   LearningRate 0.0001   Epoch: 19   Global Step: 810050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:47:47,477-Speed 2594.00 samples/sec   Loss 1.1420   LearningRate 0.0001   Epoch: 19   Global Step: 810060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:47:51,339-Speed 2652.87 samples/sec   Loss 1.1375   LearningRate 0.0001   Epoch: 19   Global Step: 810070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:47:55,229-Speed 2633.17 samples/sec   Loss 1.1758   LearningRate 0.0001   Epoch: 19   Global Step: 810080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:47:59,117-Speed 2634.38 samples/sec   Loss 1.1425   LearningRate 0.0001   Epoch: 19   Global Step: 810090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:48:03,007-Speed 2633.06 samples/sec   Loss 1.0993   LearningRate 0.0001   Epoch: 19   Global Step: 810100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:48:06,893-Speed 2635.70 samples/sec   Loss 1.1373   LearningRate 0.0001   Epoch: 19   Global Step: 810110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:48:10,764-Speed 2645.98 samples/sec   Loss 1.0821   LearningRate 0.0001   Epoch: 19   Global Step: 810120   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:14,662-Speed 2627.71 samples/sec   Loss 1.1628   LearningRate 0.0001   Epoch: 19   Global Step: 810130   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:18,557-Speed 2629.87 samples/sec   Loss 1.1114   LearningRate 0.0001   Epoch: 19   Global Step: 810140   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:22,456-Speed 2627.03 samples/sec   Loss 1.1025   LearningRate 0.0001   Epoch: 19   Global Step: 810150   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:26,445-Speed 2568.10 samples/sec   Loss 1.1439   LearningRate 0.0001   Epoch: 19   Global Step: 810160   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:30,359-Speed 2616.51 samples/sec   Loss 1.1142   LearningRate 0.0001   Epoch: 19   Global Step: 810170   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:34,259-Speed 2627.02 samples/sec   Loss 1.1112   LearningRate 0.0001   Epoch: 19   Global Step: 810180   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:38,164-Speed 2622.29 samples/sec   Loss 1.1538   LearningRate 0.0001   Epoch: 19   Global Step: 810190   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:42,064-Speed 2626.54 samples/sec   Loss 1.1189   LearningRate 0.0001   Epoch: 19   Global Step: 810200   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:45,975-Speed 2618.94 samples/sec   Loss 1.1555   LearningRate 0.0001   Epoch: 19   Global Step: 810210   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:48:49,880-Speed 2622.98 samples/sec   Loss 1.0931   LearningRate 0.0001   Epoch: 19   Global Step: 810220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:48:53,791-Speed 2618.96 samples/sec   Loss 1.0819   LearningRate 0.0001   Epoch: 19   Global Step: 810230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:48:57,895-Speed 2496.18 samples/sec   Loss 1.1340   LearningRate 0.0001   Epoch: 19   Global Step: 810240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:49:01,795-Speed 2625.78 samples/sec   Loss 1.1477   LearningRate 0.0001   Epoch: 19   Global Step: 810250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:49:05,709-Speed 2617.82 samples/sec   Loss 1.1544   LearningRate 0.0001   Epoch: 19   Global Step: 810260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:49:09,639-Speed 2605.85 samples/sec   Loss 1.1376   LearningRate 0.0001   Epoch: 19   Global Step: 810270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:49:13,529-Speed 2633.74 samples/sec   Loss 1.1398   LearningRate 0.0001   Epoch: 19   Global Step: 810280   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:17,432-Speed 2624.22 samples/sec   Loss 1.1436   LearningRate 0.0001   Epoch: 19   Global Step: 810290   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:21,339-Speed 2621.47 samples/sec   Loss 1.1159   LearningRate 0.0001   Epoch: 19   Global Step: 810300   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:25,287-Speed 2594.42 samples/sec   Loss 1.0584   LearningRate 0.0001   Epoch: 19   Global Step: 810310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:29,195-Speed 2621.29 samples/sec   Loss 1.0952   LearningRate 0.0001   Epoch: 19   Global Step: 810320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:33,094-Speed 2627.18 samples/sec   Loss 1.1241   LearningRate 0.0001   Epoch: 19   Global Step: 810330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:37,016-Speed 2611.60 samples/sec   Loss 1.1114   LearningRate 0.0001   Epoch: 19   Global Step: 810340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:40,927-Speed 2618.69 samples/sec   Loss 1.1275   LearningRate 0.0001   Epoch: 19   Global Step: 810350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:44,824-Speed 2628.58 samples/sec   Loss 1.0971   LearningRate 0.0001   Epoch: 19   Global Step: 810360   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:48,730-Speed 2622.96 samples/sec   Loss 1.1402   LearningRate 0.0001   Epoch: 19   Global Step: 810370   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:49:52,638-Speed 2620.95 samples/sec   Loss 1.1631   LearningRate 0.0001   Epoch: 19   Global Step: 810380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:49:56,555-Speed 2615.27 samples/sec   Loss 1.1104   LearningRate 0.0001   Epoch: 19   Global Step: 810390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:00,454-Speed 2627.32 samples/sec   Loss 1.1204   LearningRate 0.0001   Epoch: 19   Global Step: 810400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:04,364-Speed 2620.06 samples/sec   Loss 1.1185   LearningRate 0.0001   Epoch: 19   Global Step: 810410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:08,336-Speed 2578.59 samples/sec   Loss 1.1329   LearningRate 0.0001   Epoch: 19   Global Step: 810420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:12,233-Speed 2628.28 samples/sec   Loss 1.1221   LearningRate 0.0001   Epoch: 19   Global Step: 810430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:16,133-Speed 2626.27 samples/sec   Loss 1.0976   LearningRate 0.0001   Epoch: 19   Global Step: 810440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:20,030-Speed 2628.08 samples/sec   Loss 1.1433   LearningRate 0.0001   Epoch: 19   Global Step: 810450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:23,934-Speed 2623.43 samples/sec   Loss 1.1263   LearningRate 0.0001   Epoch: 19   Global Step: 810460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:27,840-Speed 2622.19 samples/sec   Loss 1.0866   LearningRate 0.0001   Epoch: 19   Global Step: 810470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:31,721-Speed 2640.08 samples/sec   Loss 1.0688   LearningRate 0.0001   Epoch: 19   Global Step: 810480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:35,618-Speed 2628.25 samples/sec   Loss 1.1093   LearningRate 0.0001   Epoch: 19   Global Step: 810490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:39,539-Speed 2612.24 samples/sec   Loss 1.1154   LearningRate 0.0001   Epoch: 19   Global Step: 810500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:43,548-Speed 2555.41 samples/sec   Loss 1.0944   LearningRate 0.0001   Epoch: 19   Global Step: 810510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:47,450-Speed 2625.27 samples/sec   Loss 1.1108   LearningRate 0.0001   Epoch: 19   Global Step: 810520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:51,368-Speed 2613.90 samples/sec   Loss 1.1046   LearningRate 0.0001   Epoch: 19   Global Step: 810530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:55,264-Speed 2628.64 samples/sec   Loss 1.1180   LearningRate 0.0001   Epoch: 19   Global Step: 810540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:50:59,165-Speed 2626.33 samples/sec   Loss 1.1222   LearningRate 0.0001   Epoch: 19   Global Step: 810550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:03,063-Speed 2627.60 samples/sec   Loss 1.0837   LearningRate 0.0001   Epoch: 19   Global Step: 810560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:06,960-Speed 2628.75 samples/sec   Loss 1.1462   LearningRate 0.0001   Epoch: 19   Global Step: 810570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:10,859-Speed 2626.36 samples/sec   Loss 1.0955   LearningRate 0.0001   Epoch: 19   Global Step: 810580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:51:14,759-Speed 2626.99 samples/sec   Loss 1.1601   LearningRate 0.0001   Epoch: 19   Global Step: 810590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:51:18,658-Speed 2627.10 samples/sec   Loss 1.1064   LearningRate 0.0001   Epoch: 19   Global Step: 810600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:51:22,537-Speed 2640.35 samples/sec   Loss 1.1112   LearningRate 0.0001   Epoch: 19   Global Step: 810610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:26,441-Speed 2623.60 samples/sec   Loss 1.1848   LearningRate 0.0001   Epoch: 19   Global Step: 810620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:30,362-Speed 2612.34 samples/sec   Loss 1.1544   LearningRate 0.0001   Epoch: 19   Global Step: 810630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:34,267-Speed 2622.77 samples/sec   Loss 1.1912   LearningRate 0.0001   Epoch: 19   Global Step: 810640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:38,178-Speed 2619.67 samples/sec   Loss 1.1237   LearningRate 0.0001   Epoch: 19   Global Step: 810650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:42,083-Speed 2622.63 samples/sec   Loss 1.1608   LearningRate 0.0001   Epoch: 19   Global Step: 810660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:45,981-Speed 2628.35 samples/sec   Loss 1.1561   LearningRate 0.0001   Epoch: 19   Global Step: 810670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:49,874-Speed 2631.01 samples/sec   Loss 1.1329   LearningRate 0.0001   Epoch: 19   Global Step: 810680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:53,773-Speed 2626.82 samples/sec   Loss 1.1236   LearningRate 0.0001   Epoch: 19   Global Step: 810690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:51:57,665-Speed 2631.48 samples/sec   Loss 1.1288   LearningRate 0.0001   Epoch: 19   Global Step: 810700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:52:01,564-Speed 2627.55 samples/sec   Loss 1.1254   LearningRate 0.0001   Epoch: 19   Global Step: 810710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:05,461-Speed 2628.51 samples/sec   Loss 1.1181   LearningRate 0.0001   Epoch: 19   Global Step: 810720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:09,362-Speed 2625.39 samples/sec   Loss 1.1298   LearningRate 0.0001   Epoch: 19   Global Step: 810730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:13,275-Speed 2618.10 samples/sec   Loss 1.1016   LearningRate 0.0001   Epoch: 19   Global Step: 810740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:17,207-Speed 2604.53 samples/sec   Loss 1.1276   LearningRate 0.0001   Epoch: 19   Global Step: 810750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:21,103-Speed 2629.12 samples/sec   Loss 1.1287   LearningRate 0.0001   Epoch: 19   Global Step: 810760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:24,995-Speed 2631.87 samples/sec   Loss 1.1436   LearningRate 0.0001   Epoch: 19   Global Step: 810770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:28,890-Speed 2629.71 samples/sec   Loss 1.1328   LearningRate 0.0001   Epoch: 19   Global Step: 810780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:52:32,780-Speed 2633.31 samples/sec   Loss 1.1181   LearningRate 0.0001   Epoch: 19   Global Step: 810790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:52:36,680-Speed 2626.74 samples/sec   Loss 1.1096   LearningRate 0.0001   Epoch: 19   Global Step: 810800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:52:40,585-Speed 2623.45 samples/sec   Loss 1.1070   LearningRate 0.0001   Epoch: 19   Global Step: 810810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:52:44,496-Speed 2618.27 samples/sec   Loss 1.1209   LearningRate 0.0001   Epoch: 19   Global Step: 810820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:52:48,373-Speed 2642.34 samples/sec   Loss 1.1451   LearningRate 0.0001   Epoch: 19   Global Step: 810830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:52:52,278-Speed 2623.23 samples/sec   Loss 1.1078   LearningRate 0.0001   Epoch: 19   Global Step: 810840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:52:56,304-Speed 2544.50 samples/sec   Loss 1.1245   LearningRate 0.0001   Epoch: 19   Global Step: 810850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:00,196-Speed 2631.95 samples/sec   Loss 1.1035   LearningRate 0.0001   Epoch: 19   Global Step: 810860   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:04,140-Speed 2597.12 samples/sec   Loss 1.1341   LearningRate 0.0001   Epoch: 19   Global Step: 810870   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:08,076-Speed 2602.05 samples/sec   Loss 1.1604   LearningRate 0.0001   Epoch: 19   Global Step: 810880   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:11,980-Speed 2623.89 samples/sec   Loss 1.1208   LearningRate 0.0001   Epoch: 19   Global Step: 810890   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:15,879-Speed 2627.56 samples/sec   Loss 1.1418   LearningRate 0.0001   Epoch: 19   Global Step: 810900   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:19,779-Speed 2626.08 samples/sec   Loss 1.1300   LearningRate 0.0001   Epoch: 19   Global Step: 810910   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:23,693-Speed 2617.10 samples/sec   Loss 1.1329   LearningRate 0.0001   Epoch: 19   Global Step: 810920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:53:27,590-Speed 2628.50 samples/sec   Loss 1.1422   LearningRate 0.0001   Epoch: 19   Global Step: 810930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:31,486-Speed 2628.97 samples/sec   Loss 1.1504   LearningRate 0.0001   Epoch: 19   Global Step: 810940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:35,389-Speed 2624.55 samples/sec   Loss 1.1170   LearningRate 0.0001   Epoch: 19   Global Step: 810950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:39,291-Speed 2625.15 samples/sec   Loss 1.1637   LearningRate 0.0001   Epoch: 19   Global Step: 810960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:43,184-Speed 2630.87 samples/sec   Loss 1.1211   LearningRate 0.0001   Epoch: 19   Global Step: 810970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:47,084-Speed 2626.41 samples/sec   Loss 1.0601   LearningRate 0.0001   Epoch: 19   Global Step: 810980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:51,042-Speed 2588.70 samples/sec   Loss 1.1019   LearningRate 0.0001   Epoch: 19   Global Step: 810990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:54,945-Speed 2624.37 samples/sec   Loss 1.0796   LearningRate 0.0001   Epoch: 19   Global Step: 811000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:53:58,871-Speed 2608.92 samples/sec   Loss 1.1161   LearningRate 0.0001   Epoch: 19   Global Step: 811010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:02,797-Speed 2608.56 samples/sec   Loss 1.1302   LearningRate 0.0001   Epoch: 19   Global Step: 811020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:06,679-Speed 2638.98 samples/sec   Loss 1.1379   LearningRate 0.0001   Epoch: 19   Global Step: 811030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:10,574-Speed 2629.05 samples/sec   Loss 1.1146   LearningRate 0.0000   Epoch: 19   Global Step: 811040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:14,471-Speed 2628.63 samples/sec   Loss 1.1440   LearningRate 0.0000   Epoch: 19   Global Step: 811050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:18,366-Speed 2630.26 samples/sec   Loss 1.1002   LearningRate 0.0000   Epoch: 19   Global Step: 811060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:22,262-Speed 2628.79 samples/sec   Loss 1.0927   LearningRate 0.0000   Epoch: 19   Global Step: 811070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:26,172-Speed 2619.71 samples/sec   Loss 1.1465   LearningRate 0.0000   Epoch: 19   Global Step: 811080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:30,070-Speed 2628.04 samples/sec   Loss 1.1259   LearningRate 0.0000   Epoch: 19   Global Step: 811090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:33,962-Speed 2631.28 samples/sec   Loss 1.1774   LearningRate 0.0000   Epoch: 19   Global Step: 811100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:37,858-Speed 2629.31 samples/sec   Loss 1.1220   LearningRate 0.0000   Epoch: 19   Global Step: 811110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:41,754-Speed 2629.02 samples/sec   Loss 1.1218   LearningRate 0.0000   Epoch: 19   Global Step: 811120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:45,664-Speed 2619.65 samples/sec   Loss 1.0410   LearningRate 0.0000   Epoch: 19   Global Step: 811130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:54:49,546-Speed 2638.40 samples/sec   Loss 1.1384   LearningRate 0.0000   Epoch: 19   Global Step: 811140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:53,447-Speed 2626.10 samples/sec   Loss 1.0836   LearningRate 0.0000   Epoch: 19   Global Step: 811150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:54:57,352-Speed 2623.20 samples/sec   Loss 1.0810   LearningRate 0.0000   Epoch: 19   Global Step: 811160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:55:01,259-Speed 2622.78 samples/sec   Loss 1.1406   LearningRate 0.0000   Epoch: 19   Global Step: 811170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:55:05,159-Speed 2626.07 samples/sec   Loss 1.1118   LearningRate 0.0000   Epoch: 19   Global Step: 811180   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:09,051-Speed 2631.91 samples/sec   Loss 1.1327   LearningRate 0.0000   Epoch: 19   Global Step: 811190   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:12,950-Speed 2626.85 samples/sec   Loss 1.0814   LearningRate 0.0000   Epoch: 19   Global Step: 811200   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:16,901-Speed 2592.60 samples/sec   Loss 1.1041   LearningRate 0.0000   Epoch: 19   Global Step: 811210   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:20,809-Speed 2621.72 samples/sec   Loss 1.1514   LearningRate 0.0000   Epoch: 19   Global Step: 811220   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:24,724-Speed 2615.88 samples/sec   Loss 1.1278   LearningRate 0.0000   Epoch: 19   Global Step: 811230   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:28,622-Speed 2627.52 samples/sec   Loss 1.1510   LearningRate 0.0000   Epoch: 19   Global Step: 811240   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:32,522-Speed 2626.37 samples/sec   Loss 1.0922   LearningRate 0.0000   Epoch: 19   Global Step: 811250   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:36,415-Speed 2630.98 samples/sec   Loss 1.1197   LearningRate 0.0000   Epoch: 19   Global Step: 811260   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:40,334-Speed 2613.58 samples/sec   Loss 1.0767   LearningRate 0.0000   Epoch: 19   Global Step: 811270   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:55:44,234-Speed 2626.43 samples/sec   Loss 1.1363   LearningRate 0.0000   Epoch: 19   Global Step: 811280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:55:48,126-Speed 2631.97 samples/sec   Loss 1.0761   LearningRate 0.0000   Epoch: 19   Global Step: 811290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:55:52,020-Speed 2630.65 samples/sec   Loss 1.1688   LearningRate 0.0000   Epoch: 19   Global Step: 811300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:55:55,918-Speed 2627.56 samples/sec   Loss 1.1012   LearningRate 0.0000   Epoch: 19   Global Step: 811310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:55:59,845-Speed 2608.36 samples/sec   Loss 1.1308   LearningRate 0.0000   Epoch: 19   Global Step: 811320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:03,748-Speed 2624.16 samples/sec   Loss 1.1333   LearningRate 0.0000   Epoch: 19   Global Step: 811330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:07,651-Speed 2624.47 samples/sec   Loss 1.0893   LearningRate 0.0000   Epoch: 19   Global Step: 811340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:11,548-Speed 2628.37 samples/sec   Loss 1.1361   LearningRate 0.0000   Epoch: 19   Global Step: 811350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:15,446-Speed 2627.83 samples/sec   Loss 1.1609   LearningRate 0.0000   Epoch: 19   Global Step: 811360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:19,350-Speed 2623.41 samples/sec   Loss 1.1521   LearningRate 0.0000   Epoch: 19   Global Step: 811370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:23,249-Speed 2626.82 samples/sec   Loss 1.1412   LearningRate 0.0000   Epoch: 19   Global Step: 811380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:56:27,125-Speed 2642.80 samples/sec   Loss 1.1381   LearningRate 0.0000   Epoch: 19   Global Step: 811390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:31,019-Speed 2631.12 samples/sec   Loss 1.1077   LearningRate 0.0000   Epoch: 19   Global Step: 811400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:34,919-Speed 2625.90 samples/sec   Loss 1.1675   LearningRate 0.0000   Epoch: 19   Global Step: 811410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:38,813-Speed 2630.89 samples/sec   Loss 1.0896   LearningRate 0.0000   Epoch: 19   Global Step: 811420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:42,803-Speed 2567.39 samples/sec   Loss 1.0771   LearningRate 0.0000   Epoch: 19   Global Step: 811430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:46,738-Speed 2603.17 samples/sec   Loss 1.1053   LearningRate 0.0000   Epoch: 19   Global Step: 811440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:50,689-Speed 2592.58 samples/sec   Loss 1.1131   LearningRate 0.0000   Epoch: 19   Global Step: 811450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:56:54,581-Speed 2631.51 samples/sec   Loss 1.0878   LearningRate 0.0000   Epoch: 19   Global Step: 811460   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:56:58,532-Speed 2591.97 samples/sec   Loss 1.1443   LearningRate 0.0000   Epoch: 19   Global Step: 811470   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:02,429-Speed 2628.78 samples/sec   Loss 1.0825   LearningRate 0.0000   Epoch: 19   Global Step: 811480   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:06,328-Speed 2627.29 samples/sec   Loss 1.0951   LearningRate 0.0000   Epoch: 19   Global Step: 811490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:10,230-Speed 2624.70 samples/sec   Loss 1.1505   LearningRate 0.0000   Epoch: 19   Global Step: 811500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:14,129-Speed 2627.92 samples/sec   Loss 1.1319   LearningRate 0.0000   Epoch: 19   Global Step: 811510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:18,024-Speed 2629.59 samples/sec   Loss 1.0837   LearningRate 0.0000   Epoch: 19   Global Step: 811520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:21,915-Speed 2632.41 samples/sec   Loss 1.0737   LearningRate 0.0000   Epoch: 19   Global Step: 811530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:25,807-Speed 2631.36 samples/sec   Loss 1.1047   LearningRate 0.0000   Epoch: 19   Global Step: 811540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:29,700-Speed 2631.66 samples/sec   Loss 1.1365   LearningRate 0.0000   Epoch: 19   Global Step: 811550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 14:57:33,596-Speed 2628.24 samples/sec   Loss 1.1018   LearningRate 0.0000   Epoch: 19   Global Step: 811560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:57:37,509-Speed 2617.94 samples/sec   Loss 1.1863   LearningRate 0.0000   Epoch: 19   Global Step: 811570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:57:41,413-Speed 2623.80 samples/sec   Loss 1.1129   LearningRate 0.0000   Epoch: 19   Global Step: 811580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:57:45,309-Speed 2629.25 samples/sec   Loss 1.1253   LearningRate 0.0000   Epoch: 19   Global Step: 811590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:57:49,202-Speed 2631.25 samples/sec   Loss 1.1428   LearningRate 0.0000   Epoch: 19   Global Step: 811600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:57:53,095-Speed 2630.44 samples/sec   Loss 1.0986   LearningRate 0.0000   Epoch: 19   Global Step: 811610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:57:56,992-Speed 2628.06 samples/sec   Loss 1.1028   LearningRate 0.0000   Epoch: 19   Global Step: 811620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:00,904-Speed 2618.94 samples/sec   Loss 1.1374   LearningRate 0.0000   Epoch: 19   Global Step: 811630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:04,817-Speed 2617.27 samples/sec   Loss 1.1013   LearningRate 0.0000   Epoch: 19   Global Step: 811640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:08,714-Speed 2627.98 samples/sec   Loss 1.1059   LearningRate 0.0000   Epoch: 19   Global Step: 811650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:12,628-Speed 2617.27 samples/sec   Loss 1.1083   LearningRate 0.0000   Epoch: 19   Global Step: 811660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:58:16,528-Speed 2626.85 samples/sec   Loss 1.1374   LearningRate 0.0000   Epoch: 19   Global Step: 811670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:58:20,440-Speed 2617.70 samples/sec   Loss 1.0863   LearningRate 0.0000   Epoch: 19   Global Step: 811680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:58:24,336-Speed 2629.13 samples/sec   Loss 1.0839   LearningRate 0.0000   Epoch: 19   Global Step: 811690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:58:28,208-Speed 2644.91 samples/sec   Loss 1.1623   LearningRate 0.0000   Epoch: 19   Global Step: 811700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:32,259-Speed 2529.17 samples/sec   Loss 1.0906   LearningRate 0.0000   Epoch: 19   Global Step: 811710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:36,215-Speed 2589.65 samples/sec   Loss 1.0970   LearningRate 0.0000   Epoch: 19   Global Step: 811720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:40,108-Speed 2630.68 samples/sec   Loss 1.1190   LearningRate 0.0000   Epoch: 19   Global Step: 811730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:44,006-Speed 2627.62 samples/sec   Loss 1.1263   LearningRate 0.0000   Epoch: 19   Global Step: 811740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:47,896-Speed 2633.21 samples/sec   Loss 1.1033   LearningRate 0.0000   Epoch: 19   Global Step: 811750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:51,826-Speed 2605.99 samples/sec   Loss 1.1048   LearningRate 0.0000   Epoch: 19   Global Step: 811760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:55,740-Speed 2617.05 samples/sec   Loss 1.1267   LearningRate 0.0000   Epoch: 19   Global Step: 811770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:58:59,630-Speed 2633.72 samples/sec   Loss 1.1563   LearningRate 0.0000   Epoch: 19   Global Step: 811780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:03,526-Speed 2628.42 samples/sec   Loss 1.0685   LearningRate 0.0000   Epoch: 19   Global Step: 811790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:07,419-Speed 2631.11 samples/sec   Loss 1.0937   LearningRate 0.0000   Epoch: 19   Global Step: 811800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:59:11,315-Speed 2629.27 samples/sec   Loss 1.1120   LearningRate 0.0000   Epoch: 19   Global Step: 811810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:59:15,218-Speed 2623.94 samples/sec   Loss 1.1348   LearningRate 0.0000   Epoch: 19   Global Step: 811820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 14:59:19,099-Speed 2639.14 samples/sec   Loss 1.0922   LearningRate 0.0000   Epoch: 19   Global Step: 811830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:23,037-Speed 2600.95 samples/sec   Loss 1.1024   LearningRate 0.0000   Epoch: 19   Global Step: 811840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:26,935-Speed 2627.70 samples/sec   Loss 1.1240   LearningRate 0.0000   Epoch: 19   Global Step: 811850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:30,873-Speed 2601.58 samples/sec   Loss 1.1416   LearningRate 0.0000   Epoch: 19   Global Step: 811860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:34,773-Speed 2626.54 samples/sec   Loss 1.0870   LearningRate 0.0000   Epoch: 19   Global Step: 811870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:38,668-Speed 2629.40 samples/sec   Loss 1.1106   LearningRate 0.0000   Epoch: 19   Global Step: 811880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:42,558-Speed 2632.86 samples/sec   Loss 1.1440   LearningRate 0.0000   Epoch: 19   Global Step: 811890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:46,456-Speed 2628.09 samples/sec   Loss 1.1313   LearningRate 0.0000   Epoch: 19   Global Step: 811900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:50,365-Speed 2620.21 samples/sec   Loss 1.0847   LearningRate 0.0000   Epoch: 19   Global Step: 811910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:54,277-Speed 2618.24 samples/sec   Loss 1.1307   LearningRate 0.0000   Epoch: 19   Global Step: 811920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 14:59:58,171-Speed 2630.86 samples/sec   Loss 1.1218   LearningRate 0.0000   Epoch: 19   Global Step: 811930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:00:02,054-Speed 2637.68 samples/sec   Loss 1.1197   LearningRate 0.0000   Epoch: 19   Global Step: 811940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:05,944-Speed 2633.24 samples/sec   Loss 1.0933   LearningRate 0.0000   Epoch: 19   Global Step: 811950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:09,841-Speed 2627.84 samples/sec   Loss 1.1256   LearningRate 0.0000   Epoch: 19   Global Step: 811960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:13,731-Speed 2632.59 samples/sec   Loss 1.1169   LearningRate 0.0000   Epoch: 19   Global Step: 811970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:17,643-Speed 2619.11 samples/sec   Loss 1.1196   LearningRate 0.0000   Epoch: 19   Global Step: 811980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:21,536-Speed 2631.59 samples/sec   Loss 1.0359   LearningRate 0.0000   Epoch: 19   Global Step: 811990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:25,428-Speed 2631.17 samples/sec   Loss 1.0940   LearningRate 0.0000   Epoch: 19   Global Step: 812000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:29,474-Speed 2532.10 samples/sec   Loss 1.1198   LearningRate 0.0000   Epoch: 19   Global Step: 812010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:33,370-Speed 2628.81 samples/sec   Loss 1.1058   LearningRate 0.0000   Epoch: 19   Global Step: 812020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:37,271-Speed 2625.52 samples/sec   Loss 1.1488   LearningRate 0.0000   Epoch: 19   Global Step: 812030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:00:41,179-Speed 2620.84 samples/sec   Loss 1.1753   LearningRate 0.0000   Epoch: 19   Global Step: 812040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:00:45,027-Speed 2662.25 samples/sec   Loss 1.1351   LearningRate 0.0000   Epoch: 19   Global Step: 812050   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:00:48,922-Speed 2629.34 samples/sec   Loss 1.1437   LearningRate 0.0000   Epoch: 19   Global Step: 812060   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:00:52,820-Speed 2628.25 samples/sec   Loss 1.1373   LearningRate 0.0000   Epoch: 19   Global Step: 812070   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:00:56,728-Speed 2620.95 samples/sec   Loss 1.1377   LearningRate 0.0000   Epoch: 19   Global Step: 812080   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:01:00,627-Speed 2627.30 samples/sec   Loss 1.0926   LearningRate 0.0000   Epoch: 19   Global Step: 812090   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:01:04,519-Speed 2631.67 samples/sec   Loss 1.1040   LearningRate 0.0000   Epoch: 19   Global Step: 812100   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:01:08,416-Speed 2627.85 samples/sec   Loss 1.0949   LearningRate 0.0000   Epoch: 19   Global Step: 812110   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:01:12,315-Speed 2627.19 samples/sec   Loss 1.1434   LearningRate 0.0000   Epoch: 19   Global Step: 812120   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:01:16,216-Speed 2625.66 samples/sec   Loss 1.0976   LearningRate 0.0000   Epoch: 19   Global Step: 812130   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:01:20,116-Speed 2626.76 samples/sec   Loss 1.1009   LearningRate 0.0000   Epoch: 19   Global Step: 812140   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:01:24,022-Speed 2622.23 samples/sec   Loss 1.1010   LearningRate 0.0000   Epoch: 19   Global Step: 812150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:27,917-Speed 2629.60 samples/sec   Loss 1.1426   LearningRate 0.0000   Epoch: 19   Global Step: 812160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:31,815-Speed 2628.60 samples/sec   Loss 1.1120   LearningRate 0.0000   Epoch: 19   Global Step: 812170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:35,711-Speed 2628.32 samples/sec   Loss 1.1175   LearningRate 0.0000   Epoch: 19   Global Step: 812180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:39,609-Speed 2627.84 samples/sec   Loss 1.1678   LearningRate 0.0000   Epoch: 19   Global Step: 812190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:43,515-Speed 2621.69 samples/sec   Loss 1.0829   LearningRate 0.0000   Epoch: 19   Global Step: 812200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:47,406-Speed 2633.38 samples/sec   Loss 1.0668   LearningRate 0.0000   Epoch: 19   Global Step: 812210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:51,297-Speed 2632.38 samples/sec   Loss 1.1235   LearningRate 0.0000   Epoch: 19   Global Step: 812220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:55,253-Speed 2588.98 samples/sec   Loss 1.1058   LearningRate 0.0000   Epoch: 19   Global Step: 812230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:01:59,147-Speed 2631.10 samples/sec   Loss 1.0968   LearningRate 0.0000   Epoch: 19   Global Step: 812240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:03,042-Speed 2629.48 samples/sec   Loss 1.1253   LearningRate 0.0000   Epoch: 19   Global Step: 812250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:02:06,942-Speed 2626.10 samples/sec   Loss 1.0763   LearningRate 0.0000   Epoch: 19   Global Step: 812260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:10,835-Speed 2631.31 samples/sec   Loss 1.0810   LearningRate 0.0000   Epoch: 19   Global Step: 812270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:14,727-Speed 2631.64 samples/sec   Loss 1.0952   LearningRate 0.0000   Epoch: 19   Global Step: 812280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:18,626-Speed 2626.97 samples/sec   Loss 1.1181   LearningRate 0.0000   Epoch: 19   Global Step: 812290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:22,532-Speed 2622.96 samples/sec   Loss 1.1178   LearningRate 0.0000   Epoch: 19   Global Step: 812300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:26,429-Speed 2628.35 samples/sec   Loss 1.1013   LearningRate 0.0000   Epoch: 19   Global Step: 812310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:30,359-Speed 2606.34 samples/sec   Loss 1.1651   LearningRate 0.0000   Epoch: 19   Global Step: 812320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:34,257-Speed 2628.03 samples/sec   Loss 1.0973   LearningRate 0.0000   Epoch: 19   Global Step: 812330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:38,160-Speed 2624.14 samples/sec   Loss 1.1321   LearningRate 0.0000   Epoch: 19   Global Step: 812340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:42,060-Speed 2627.15 samples/sec   Loss 1.1014   LearningRate 0.0000   Epoch: 19   Global Step: 812350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:02:45,959-Speed 2626.60 samples/sec   Loss 1.1243   LearningRate 0.0000   Epoch: 19   Global Step: 812360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:02:49,861-Speed 2625.34 samples/sec   Loss 1.1049   LearningRate 0.0000   Epoch: 19   Global Step: 812370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:02:53,755-Speed 2630.75 samples/sec   Loss 1.0875   LearningRate 0.0000   Epoch: 19   Global Step: 812380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:02:57,669-Speed 2617.05 samples/sec   Loss 1.1269   LearningRate 0.0000   Epoch: 19   Global Step: 812390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:03:01,675-Speed 2556.14 samples/sec   Loss 1.1571   LearningRate 0.0000   Epoch: 19   Global Step: 812400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:03:05,585-Speed 2620.21 samples/sec   Loss 1.1557   LearningRate 0.0000   Epoch: 19   Global Step: 812410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:03:09,478-Speed 2630.99 samples/sec   Loss 1.1071   LearningRate 0.0000   Epoch: 19   Global Step: 812420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:03:13,372-Speed 2630.57 samples/sec   Loss 1.1380   LearningRate 0.0000   Epoch: 19   Global Step: 812430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:03:17,243-Speed 2646.35 samples/sec   Loss 1.1062   LearningRate 0.0000   Epoch: 19   Global Step: 812440   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:21,136-Speed 2630.57 samples/sec   Loss 1.0994   LearningRate 0.0000   Epoch: 19   Global Step: 812450   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:25,029-Speed 2631.38 samples/sec   Loss 1.1232   LearningRate 0.0000   Epoch: 19   Global Step: 812460   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:28,934-Speed 2622.68 samples/sec   Loss 1.1025   LearningRate 0.0000   Epoch: 19   Global Step: 812470   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:32,829-Speed 2630.30 samples/sec   Loss 1.1582   LearningRate 0.0000   Epoch: 19   Global Step: 812480   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:36,718-Speed 2633.50 samples/sec   Loss 1.1253   LearningRate 0.0000   Epoch: 19   Global Step: 812490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:40,612-Speed 2630.69 samples/sec   Loss 1.0593   LearningRate 0.0000   Epoch: 19   Global Step: 812500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:44,505-Speed 2630.52 samples/sec   Loss 1.1386   LearningRate 0.0000   Epoch: 19   Global Step: 812510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:48,401-Speed 2629.44 samples/sec   Loss 1.0732   LearningRate 0.0000   Epoch: 19   Global Step: 812520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:52,314-Speed 2617.71 samples/sec   Loss 1.1197   LearningRate 0.0000   Epoch: 19   Global Step: 812530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:03:56,212-Speed 2627.48 samples/sec   Loss 1.1089   LearningRate 0.0000   Epoch: 19   Global Step: 812540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:00,107-Speed 2629.66 samples/sec   Loss 1.0867   LearningRate 0.0000   Epoch: 19   Global Step: 812550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:04,001-Speed 2630.52 samples/sec   Loss 1.1685   LearningRate 0.0000   Epoch: 19   Global Step: 812560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:07,890-Speed 2633.00 samples/sec   Loss 1.1030   LearningRate 0.0000   Epoch: 19   Global Step: 812570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:11,782-Speed 2632.28 samples/sec   Loss 1.0580   LearningRate 0.0000   Epoch: 19   Global Step: 812580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:15,679-Speed 2628.19 samples/sec   Loss 1.1227   LearningRate 0.0000   Epoch: 19   Global Step: 812590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:19,695-Speed 2551.18 samples/sec   Loss 1.0670   LearningRate 0.0000   Epoch: 19   Global Step: 812600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:23,790-Speed 2500.91 samples/sec   Loss 1.1297   LearningRate 0.0000   Epoch: 19   Global Step: 812610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:27,704-Speed 2616.84 samples/sec   Loss 1.0660   LearningRate 0.0000   Epoch: 19   Global Step: 812620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:31,597-Speed 2630.97 samples/sec   Loss 1.1104   LearningRate 0.0000   Epoch: 19   Global Step: 812630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:35,477-Speed 2639.76 samples/sec   Loss 1.0680   LearningRate 0.0000   Epoch: 19   Global Step: 812640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:39,369-Speed 2631.31 samples/sec   Loss 1.0900   LearningRate 0.0000   Epoch: 19   Global Step: 812650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:43,262-Speed 2631.29 samples/sec   Loss 1.1140   LearningRate 0.0000   Epoch: 19   Global Step: 812660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:47,196-Speed 2604.62 samples/sec   Loss 1.1247   LearningRate 0.0000   Epoch: 19   Global Step: 812670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:51,273-Speed 2512.38 samples/sec   Loss 1.1226   LearningRate 0.0000   Epoch: 19   Global Step: 812680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:55,218-Speed 2596.61 samples/sec   Loss 1.0850   LearningRate 0.0000   Epoch: 19   Global Step: 812690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:04:59,120-Speed 2624.26 samples/sec   Loss 1.1536   LearningRate 0.0000   Epoch: 19   Global Step: 812700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:05:03,046-Speed 2609.30 samples/sec   Loss 1.1486   LearningRate 0.0000   Epoch: 19   Global Step: 812710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:05:06,938-Speed 2631.67 samples/sec   Loss 1.1406   LearningRate 0.0000   Epoch: 19   Global Step: 812720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:05:10,831-Speed 2631.28 samples/sec   Loss 1.1086   LearningRate 0.0000   Epoch: 19   Global Step: 812730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:05:14,728-Speed 2628.49 samples/sec   Loss 1.0945   LearningRate 0.0000   Epoch: 19   Global Step: 812740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:05:18,577-Speed 2661.26 samples/sec   Loss 1.1174   LearningRate 0.0000   Epoch: 19   Global Step: 812750   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:22,474-Speed 2628.34 samples/sec   Loss 1.1644   LearningRate 0.0000   Epoch: 19   Global Step: 812760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:26,369-Speed 2629.80 samples/sec   Loss 1.1147   LearningRate 0.0000   Epoch: 19   Global Step: 812770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:30,261-Speed 2631.64 samples/sec   Loss 1.1508   LearningRate 0.0000   Epoch: 19   Global Step: 812780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:34,168-Speed 2621.47 samples/sec   Loss 1.1215   LearningRate 0.0000   Epoch: 19   Global Step: 812790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:38,083-Speed 2616.10 samples/sec   Loss 1.1670   LearningRate 0.0000   Epoch: 19   Global Step: 812800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:41,980-Speed 2629.40 samples/sec   Loss 1.1400   LearningRate 0.0000   Epoch: 19   Global Step: 812810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:45,875-Speed 2628.99 samples/sec   Loss 1.1212   LearningRate 0.0000   Epoch: 19   Global Step: 812820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:49,778-Speed 2624.61 samples/sec   Loss 1.1253   LearningRate 0.0000   Epoch: 19   Global Step: 812830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:53,671-Speed 2631.42 samples/sec   Loss 1.1413   LearningRate 0.0000   Epoch: 19   Global Step: 812840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:05:57,565-Speed 2630.55 samples/sec   Loss 1.0811   LearningRate 0.0000   Epoch: 19   Global Step: 812850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:01,482-Speed 2615.00 samples/sec   Loss 1.0642   LearningRate 0.0000   Epoch: 19   Global Step: 812860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:05,379-Speed 2627.84 samples/sec   Loss 1.1235   LearningRate 0.0000   Epoch: 19   Global Step: 812870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:09,279-Speed 2626.45 samples/sec   Loss 1.0597   LearningRate 0.0000   Epoch: 19   Global Step: 812880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:13,180-Speed 2626.51 samples/sec   Loss 1.1184   LearningRate 0.0000   Epoch: 19   Global Step: 812890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:17,181-Speed 2560.51 samples/sec   Loss 1.0733   LearningRate 0.0000   Epoch: 19   Global Step: 812900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:21,101-Speed 2612.57 samples/sec   Loss 1.0526   LearningRate 0.0000   Epoch: 19   Global Step: 812910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:24,995-Speed 2630.59 samples/sec   Loss 1.1278   LearningRate 0.0000   Epoch: 19   Global Step: 812920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:28,891-Speed 2628.72 samples/sec   Loss 1.1682   LearningRate 0.0000   Epoch: 19   Global Step: 812930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:32,783-Speed 2632.16 samples/sec   Loss 1.1182   LearningRate 0.0000   Epoch: 19   Global Step: 812940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:36,698-Speed 2616.23 samples/sec   Loss 1.0514   LearningRate 0.0000   Epoch: 19   Global Step: 812950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:06:40,569-Speed 2646.22 samples/sec   Loss 1.1028   LearningRate 0.0000   Epoch: 19   Global Step: 812960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:44,484-Speed 2615.90 samples/sec   Loss 1.1298   LearningRate 0.0000   Epoch: 19   Global Step: 812970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:48,376-Speed 2632.40 samples/sec   Loss 1.0809   LearningRate 0.0000   Epoch: 19   Global Step: 812980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:52,268-Speed 2631.23 samples/sec   Loss 1.1139   LearningRate 0.0000   Epoch: 19   Global Step: 812990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:06:56,168-Speed 2626.79 samples/sec   Loss 1.0912   LearningRate 0.0000   Epoch: 19   Global Step: 813000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:00,065-Speed 2627.99 samples/sec   Loss 1.1092   LearningRate 0.0000   Epoch: 19   Global Step: 813010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:03,967-Speed 2624.78 samples/sec   Loss 1.1348   LearningRate 0.0000   Epoch: 19   Global Step: 813020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:07,880-Speed 2617.52 samples/sec   Loss 1.0586   LearningRate 0.0000   Epoch: 19   Global Step: 813030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:11,780-Speed 2626.47 samples/sec   Loss 1.1498   LearningRate 0.0000   Epoch: 19   Global Step: 813040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:15,686-Speed 2622.24 samples/sec   Loss 1.1214   LearningRate 0.0000   Epoch: 19   Global Step: 813050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:19,596-Speed 2619.56 samples/sec   Loss 1.1377   LearningRate 0.0000   Epoch: 19   Global Step: 813060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:07:23,488-Speed 2631.79 samples/sec   Loss 1.1502   LearningRate 0.0000   Epoch: 19   Global Step: 813070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:07:27,383-Speed 2629.38 samples/sec   Loss 1.1264   LearningRate 0.0000   Epoch: 19   Global Step: 813080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:07:31,302-Speed 2614.01 samples/sec   Loss 1.1026   LearningRate 0.0000   Epoch: 19   Global Step: 813090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:07:35,215-Speed 2617.68 samples/sec   Loss 1.1384   LearningRate 0.0000   Epoch: 19   Global Step: 813100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:07:39,117-Speed 2624.72 samples/sec   Loss 1.1209   LearningRate 0.0000   Epoch: 19   Global Step: 813110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:07:43,023-Speed 2622.64 samples/sec   Loss 1.0924   LearningRate 0.0000   Epoch: 19   Global Step: 813120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:07:46,893-Speed 2646.73 samples/sec   Loss 1.1097   LearningRate 0.0000   Epoch: 19   Global Step: 813130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:50,788-Speed 2629.73 samples/sec   Loss 1.0915   LearningRate 0.0000   Epoch: 19   Global Step: 813140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:54,682-Speed 2630.84 samples/sec   Loss 1.1027   LearningRate 0.0000   Epoch: 19   Global Step: 813150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:07:58,581-Speed 2626.71 samples/sec   Loss 1.1213   LearningRate 0.0000   Epoch: 19   Global Step: 813160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:02,478-Speed 2628.50 samples/sec   Loss 1.0703   LearningRate 0.0000   Epoch: 19   Global Step: 813170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:06,570-Speed 2502.95 samples/sec   Loss 1.0986   LearningRate 0.0000   Epoch: 19   Global Step: 813180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:10,466-Speed 2629.36 samples/sec   Loss 1.1080   LearningRate 0.0000   Epoch: 19   Global Step: 813190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:14,361-Speed 2629.50 samples/sec   Loss 1.1150   LearningRate 0.0000   Epoch: 19   Global Step: 813200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:18,304-Speed 2597.83 samples/sec   Loss 1.1687   LearningRate 0.0000   Epoch: 19   Global Step: 813210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:22,203-Speed 2626.95 samples/sec   Loss 1.1174   LearningRate 0.0000   Epoch: 19   Global Step: 813220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:26,100-Speed 2628.69 samples/sec   Loss 1.1234   LearningRate 0.0000   Epoch: 19   Global Step: 813230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:08:30,031-Speed 2620.83 samples/sec   Loss 1.0982   LearningRate 0.0000   Epoch: 19   Global Step: 813240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:33,921-Speed 2632.80 samples/sec   Loss 1.1039   LearningRate 0.0000   Epoch: 19   Global Step: 813250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:37,837-Speed 2615.54 samples/sec   Loss 1.1110   LearningRate 0.0000   Epoch: 19   Global Step: 813260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:41,741-Speed 2623.60 samples/sec   Loss 1.1077   LearningRate 0.0000   Epoch: 19   Global Step: 813270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:45,663-Speed 2632.40 samples/sec   Loss 1.1186   LearningRate 0.0000   Epoch: 19   Global Step: 813280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:49,559-Speed 2628.74 samples/sec   Loss 1.1446   LearningRate 0.0000   Epoch: 19   Global Step: 813290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:53,456-Speed 2628.59 samples/sec   Loss 1.0989   LearningRate 0.0000   Epoch: 19   Global Step: 813300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:08:57,364-Speed 2620.65 samples/sec   Loss 1.1145   LearningRate 0.0000   Epoch: 19   Global Step: 813310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:01,258-Speed 2631.39 samples/sec   Loss 1.1344   LearningRate 0.0000   Epoch: 19   Global Step: 813320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:05,152-Speed 2630.24 samples/sec   Loss 1.1308   LearningRate 0.0000   Epoch: 19   Global Step: 813330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:09,025-Speed 2644.44 samples/sec   Loss 1.0528   LearningRate 0.0000   Epoch: 19   Global Step: 813340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:12,920-Speed 2629.13 samples/sec   Loss 1.1223   LearningRate 0.0000   Epoch: 19   Global Step: 813350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:16,824-Speed 2623.95 samples/sec   Loss 1.1189   LearningRate 0.0000   Epoch: 19   Global Step: 813360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:20,718-Speed 2630.06 samples/sec   Loss 1.1220   LearningRate 0.0000   Epoch: 19   Global Step: 813370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:24,713-Speed 2564.19 samples/sec   Loss 1.0690   LearningRate 0.0000   Epoch: 19   Global Step: 813380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:28,615-Speed 2624.92 samples/sec   Loss 1.0804   LearningRate 0.0000   Epoch: 19   Global Step: 813390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:32,516-Speed 2625.60 samples/sec   Loss 1.1091   LearningRate 0.0000   Epoch: 19   Global Step: 813400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:36,432-Speed 2616.10 samples/sec   Loss 1.1326   LearningRate 0.0000   Epoch: 19   Global Step: 813410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:40,336-Speed 2623.08 samples/sec   Loss 1.0987   LearningRate 0.0000   Epoch: 19   Global Step: 813420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:44,236-Speed 2626.05 samples/sec   Loss 1.1825   LearningRate 0.0000   Epoch: 19   Global Step: 813430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:48,139-Speed 2625.65 samples/sec   Loss 1.1532   LearningRate 0.0000   Epoch: 19   Global Step: 813440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:09:52,016-Speed 2641.80 samples/sec   Loss 1.0989   LearningRate 0.0000   Epoch: 19   Global Step: 813450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:55,907-Speed 2632.32 samples/sec   Loss 1.1378   LearningRate 0.0000   Epoch: 19   Global Step: 813460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:09:59,803-Speed 2628.78 samples/sec   Loss 1.1134   LearningRate 0.0000   Epoch: 19   Global Step: 813470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:03,702-Speed 2627.89 samples/sec   Loss 1.0960   LearningRate 0.0000   Epoch: 19   Global Step: 813480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:07,600-Speed 2627.70 samples/sec   Loss 1.1180   LearningRate 0.0000   Epoch: 19   Global Step: 813490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:11,490-Speed 2632.87 samples/sec   Loss 1.0694   LearningRate 0.0000   Epoch: 19   Global Step: 813500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:15,392-Speed 2624.69 samples/sec   Loss 1.0921   LearningRate 0.0000   Epoch: 19   Global Step: 813510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:19,289-Speed 2628.01 samples/sec   Loss 1.1205   LearningRate 0.0000   Epoch: 19   Global Step: 813520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:23,183-Speed 2630.77 samples/sec   Loss 1.1923   LearningRate 0.0000   Epoch: 19   Global Step: 813530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:27,098-Speed 2616.21 samples/sec   Loss 1.0983   LearningRate 0.0000   Epoch: 19   Global Step: 813540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:10:30,999-Speed 2626.22 samples/sec   Loss 1.1107   LearningRate 0.0000   Epoch: 19   Global Step: 813550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:10:34,892-Speed 2631.43 samples/sec   Loss 1.1428   LearningRate 0.0000   Epoch: 19   Global Step: 813560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:10:38,788-Speed 2628.74 samples/sec   Loss 1.1536   LearningRate 0.0000   Epoch: 19   Global Step: 813570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:10:42,710-Speed 2611.41 samples/sec   Loss 1.1295   LearningRate 0.0000   Epoch: 19   Global Step: 813580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:10:46,636-Speed 2608.82 samples/sec   Loss 1.0823   LearningRate 0.0000   Epoch: 19   Global Step: 813590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:10:50,544-Speed 2621.38 samples/sec   Loss 1.1112   LearningRate 0.0000   Epoch: 19   Global Step: 813600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:10:54,445-Speed 2624.97 samples/sec   Loss 1.1261   LearningRate 0.0000   Epoch: 19   Global Step: 813610   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:10:58,350-Speed 2623.64 samples/sec   Loss 1.1117   LearningRate 0.0000   Epoch: 19   Global Step: 813620   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:11:02,254-Speed 2623.81 samples/sec   Loss 1.1333   LearningRate 0.0000   Epoch: 19   Global Step: 813630   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:11:06,154-Speed 2626.46 samples/sec   Loss 1.0736   LearningRate 0.0000   Epoch: 19   Global Step: 813640   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:11:10,047-Speed 2630.94 samples/sec   Loss 1.1168   LearningRate 0.0000   Epoch: 19   Global Step: 813650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:13,943-Speed 2628.97 samples/sec   Loss 1.1234   LearningRate 0.0000   Epoch: 19   Global Step: 813660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:17,836-Speed 2631.08 samples/sec   Loss 1.1330   LearningRate 0.0000   Epoch: 19   Global Step: 813670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:21,735-Speed 2626.77 samples/sec   Loss 1.0577   LearningRate 0.0000   Epoch: 19   Global Step: 813680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:25,631-Speed 2628.72 samples/sec   Loss 1.1033   LearningRate 0.0000   Epoch: 19   Global Step: 813690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:29,534-Speed 2624.91 samples/sec   Loss 1.1567   LearningRate 0.0000   Epoch: 19   Global Step: 813700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:33,434-Speed 2626.06 samples/sec   Loss 1.1254   LearningRate 0.0000   Epoch: 19   Global Step: 813710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:37,329-Speed 2630.01 samples/sec   Loss 1.1188   LearningRate 0.0000   Epoch: 19   Global Step: 813720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:11:41,233-Speed 2623.62 samples/sec   Loss 1.1202   LearningRate 0.0000   Epoch: 19   Global Step: 813730   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:11:45,132-Speed 2626.89 samples/sec   Loss 1.1811   LearningRate 0.0000   Epoch: 19   Global Step: 813740   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:11:49,033-Speed 2625.84 samples/sec   Loss 1.1468   LearningRate 0.0000   Epoch: 19   Global Step: 813750   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:11:52,927-Speed 2630.58 samples/sec   Loss 1.1099   LearningRate 0.0000   Epoch: 19   Global Step: 813760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:11:56,834-Speed 2621.11 samples/sec   Loss 1.1191   LearningRate 0.0000   Epoch: 19   Global Step: 813770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:12:00,726-Speed 2632.58 samples/sec   Loss 1.1228   LearningRate 0.0000   Epoch: 19   Global Step: 813780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:12:04,627-Speed 2625.09 samples/sec   Loss 1.1089   LearningRate 0.0000   Epoch: 19   Global Step: 813790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:12:08,525-Speed 2631.34 samples/sec   Loss 1.1101   LearningRate 0.0000   Epoch: 19   Global Step: 813800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:12:12,416-Speed 2632.53 samples/sec   Loss 1.0618   LearningRate 0.0000   Epoch: 19   Global Step: 813810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:12:16,311-Speed 2629.51 samples/sec   Loss 1.0833   LearningRate 0.0000   Epoch: 19   Global Step: 813820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:12:20,215-Speed 2623.12 samples/sec   Loss 1.1313   LearningRate 0.0000   Epoch: 19   Global Step: 813830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:24,110-Speed 2630.23 samples/sec   Loss 1.0581   LearningRate 0.0000   Epoch: 19   Global Step: 813840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:28,017-Speed 2621.79 samples/sec   Loss 1.0822   LearningRate 0.0000   Epoch: 19   Global Step: 813850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:32,046-Speed 2542.28 samples/sec   Loss 1.1086   LearningRate 0.0000   Epoch: 19   Global Step: 813860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:35,956-Speed 2619.78 samples/sec   Loss 1.1028   LearningRate 0.0000   Epoch: 19   Global Step: 813870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:39,845-Speed 2633.70 samples/sec   Loss 1.0912   LearningRate 0.0000   Epoch: 19   Global Step: 813880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:43,880-Speed 2629.68 samples/sec   Loss 1.1087   LearningRate 0.0000   Epoch: 19   Global Step: 813890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:47,772-Speed 2631.63 samples/sec   Loss 1.1301   LearningRate 0.0000   Epoch: 19   Global Step: 813900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:51,665-Speed 2631.16 samples/sec   Loss 1.0979   LearningRate 0.0000   Epoch: 19   Global Step: 813910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:55,557-Speed 2631.48 samples/sec   Loss 1.1210   LearningRate 0.0000   Epoch: 19   Global Step: 813920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:12:59,460-Speed 2624.83 samples/sec   Loss 1.1247   LearningRate 0.0000   Epoch: 19   Global Step: 813930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:13:03,356-Speed 2628.88 samples/sec   Loss 1.1326   LearningRate 0.0000   Epoch: 19   Global Step: 813940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:13:07,249-Speed 2631.30 samples/sec   Loss 1.1400   LearningRate 0.0000   Epoch: 19   Global Step: 813950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:13:11,144-Speed 2629.39 samples/sec   Loss 1.1409   LearningRate 0.0000   Epoch: 19   Global Step: 813960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:13:15,064-Speed 2612.63 samples/sec   Loss 1.0840   LearningRate 0.0000   Epoch: 19   Global Step: 813970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:13:18,936-Speed 2646.03 samples/sec   Loss 1.1190   LearningRate 0.0000   Epoch: 19   Global Step: 813980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:22,834-Speed 2627.12 samples/sec   Loss 1.1278   LearningRate 0.0000   Epoch: 19   Global Step: 813990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:26,746-Speed 2618.69 samples/sec   Loss 1.1402   LearningRate 0.0000   Epoch: 19   Global Step: 814000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:30,654-Speed 2620.99 samples/sec   Loss 1.1144   LearningRate 0.0000   Epoch: 19   Global Step: 814010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:34,604-Speed 2593.11 samples/sec   Loss 1.0657   LearningRate 0.0000   Epoch: 19   Global Step: 814020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:38,546-Speed 2598.21 samples/sec   Loss 1.1529   LearningRate 0.0000   Epoch: 19   Global Step: 814030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:42,438-Speed 2632.49 samples/sec   Loss 1.1440   LearningRate 0.0000   Epoch: 19   Global Step: 814040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:46,333-Speed 2629.44 samples/sec   Loss 1.1259   LearningRate 0.0000   Epoch: 19   Global Step: 814050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:50,237-Speed 2623.64 samples/sec   Loss 1.1483   LearningRate 0.0000   Epoch: 19   Global Step: 814060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:54,134-Speed 2627.93 samples/sec   Loss 1.0934   LearningRate 0.0000   Epoch: 19   Global Step: 814070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:13:58,008-Speed 2644.51 samples/sec   Loss 1.1214   LearningRate 0.0000   Epoch: 19   Global Step: 814080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:01,904-Speed 2629.19 samples/sec   Loss 1.1007   LearningRate 0.0000   Epoch: 19   Global Step: 814090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:05,856-Speed 2591.33 samples/sec   Loss 1.1452   LearningRate 0.0000   Epoch: 19   Global Step: 814100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:09,866-Speed 2554.51 samples/sec   Loss 1.1337   LearningRate 0.0000   Epoch: 19   Global Step: 814110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:13,837-Speed 2579.61 samples/sec   Loss 1.0858   LearningRate 0.0000   Epoch: 19   Global Step: 814120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:18,106-Speed 2516.21 samples/sec   Loss 1.1519   LearningRate 0.0000   Epoch: 19   Global Step: 814130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:22,049-Speed 2597.57 samples/sec   Loss 1.0798   LearningRate 0.0000   Epoch: 19   Global Step: 814140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:25,947-Speed 2628.19 samples/sec   Loss 1.0531   LearningRate 0.0000   Epoch: 19   Global Step: 814150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:29,838-Speed 2631.70 samples/sec   Loss 1.0908   LearningRate 0.0000   Epoch: 19   Global Step: 814160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:33,740-Speed 2625.38 samples/sec   Loss 1.1029   LearningRate 0.0000   Epoch: 19   Global Step: 814170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:37,642-Speed 2624.90 samples/sec   Loss 1.0636   LearningRate 0.0000   Epoch: 19   Global Step: 814180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:14:41,608-Speed 2582.60 samples/sec   Loss 1.1089   LearningRate 0.0000   Epoch: 19   Global Step: 814190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:45,504-Speed 2628.89 samples/sec   Loss 1.1336   LearningRate 0.0000   Epoch: 19   Global Step: 814200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:49,414-Speed 2619.74 samples/sec   Loss 1.1156   LearningRate 0.0000   Epoch: 19   Global Step: 814210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:53,309-Speed 2629.95 samples/sec   Loss 1.1235   LearningRate 0.0000   Epoch: 19   Global Step: 814220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:14:57,203-Speed 2630.38 samples/sec   Loss 1.1413   LearningRate 0.0000   Epoch: 19   Global Step: 814230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:01,117-Speed 2617.18 samples/sec   Loss 1.1179   LearningRate 0.0000   Epoch: 19   Global Step: 814240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:05,023-Speed 2621.97 samples/sec   Loss 1.0781   LearningRate 0.0000   Epoch: 19   Global Step: 814250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:08,917-Speed 2630.23 samples/sec   Loss 1.0877   LearningRate 0.0000   Epoch: 19   Global Step: 814260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:12,811-Speed 2631.04 samples/sec   Loss 1.1488   LearningRate 0.0000   Epoch: 19   Global Step: 814270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:16,713-Speed 2624.39 samples/sec   Loss 1.1142   LearningRate 0.0000   Epoch: 19   Global Step: 814280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:20,606-Speed 2631.30 samples/sec   Loss 1.0851   LearningRate 0.0000   Epoch: 19   Global Step: 814290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:15:24,491-Speed 2636.26 samples/sec   Loss 1.1448   LearningRate 0.0000   Epoch: 19   Global Step: 814300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:28,384-Speed 2631.64 samples/sec   Loss 1.0977   LearningRate 0.0000   Epoch: 19   Global Step: 814310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:32,275-Speed 2632.13 samples/sec   Loss 1.1352   LearningRate 0.0000   Epoch: 19   Global Step: 814320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:36,166-Speed 2632.08 samples/sec   Loss 1.0777   LearningRate 0.0000   Epoch: 19   Global Step: 814330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:40,058-Speed 2631.52 samples/sec   Loss 1.1043   LearningRate 0.0000   Epoch: 19   Global Step: 814340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:43,954-Speed 2629.19 samples/sec   Loss 1.1039   LearningRate 0.0000   Epoch: 19   Global Step: 814350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:47,892-Speed 2601.07 samples/sec   Loss 1.1406   LearningRate 0.0000   Epoch: 19   Global Step: 814360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:51,789-Speed 2627.87 samples/sec   Loss 1.1446   LearningRate 0.0000   Epoch: 19   Global Step: 814370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:15:55,675-Speed 2636.39 samples/sec   Loss 1.0860   LearningRate 0.0000   Epoch: 19   Global Step: 814380   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:15:59,553-Speed 2641.21 samples/sec   Loss 1.1120   LearningRate 0.0000   Epoch: 19   Global Step: 814390   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:03,446-Speed 2630.57 samples/sec   Loss 1.1102   LearningRate 0.0000   Epoch: 19   Global Step: 814400   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:07,357-Speed 2619.15 samples/sec   Loss 1.1079   LearningRate 0.0000   Epoch: 19   Global Step: 814410   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:11,261-Speed 2623.65 samples/sec   Loss 1.1273   LearningRate 0.0000   Epoch: 19   Global Step: 814420   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:15,154-Speed 2630.39 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 814430   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:19,069-Speed 2616.59 samples/sec   Loss 1.1598   LearningRate 0.0000   Epoch: 19   Global Step: 814440   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:22,971-Speed 2631.29 samples/sec   Loss 1.1010   LearningRate 0.0000   Epoch: 19   Global Step: 814450   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:26,863-Speed 2631.78 samples/sec   Loss 1.1090   LearningRate 0.0000   Epoch: 19   Global Step: 814460   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:30,771-Speed 2621.19 samples/sec   Loss 1.1349   LearningRate 0.0000   Epoch: 19   Global Step: 814470   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:34,677-Speed 2622.27 samples/sec   Loss 1.0785   LearningRate 0.0000   Epoch: 19   Global Step: 814480   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-16 15:16:38,576-Speed 2626.87 samples/sec   Loss 1.1078   LearningRate 0.0000   Epoch: 19   Global Step: 814490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:16:42,474-Speed 2627.55 samples/sec   Loss 1.1470   LearningRate 0.0000   Epoch: 19   Global Step: 814500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:16:46,391-Speed 2615.10 samples/sec   Loss 1.1140   LearningRate 0.0000   Epoch: 19   Global Step: 814510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:16:50,388-Speed 2562.04 samples/sec   Loss 1.1015   LearningRate 0.0000   Epoch: 19   Global Step: 814520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:16:54,287-Speed 2626.84 samples/sec   Loss 1.0969   LearningRate 0.0000   Epoch: 19   Global Step: 814530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:16:58,188-Speed 2626.12 samples/sec   Loss 1.1865   LearningRate 0.0000   Epoch: 19   Global Step: 814540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:02,082-Speed 2629.71 samples/sec   Loss 1.0956   LearningRate 0.0000   Epoch: 19   Global Step: 814550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:05,975-Speed 2631.40 samples/sec   Loss 1.1057   LearningRate 0.0000   Epoch: 19   Global Step: 814560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:09,868-Speed 2630.49 samples/sec   Loss 1.0471   LearningRate 0.0000   Epoch: 19   Global Step: 814570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:13,825-Speed 2588.79 samples/sec   Loss 1.1402   LearningRate 0.0000   Epoch: 19   Global Step: 814580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:17,722-Speed 2628.22 samples/sec   Loss 1.1456   LearningRate 0.0000   Epoch: 19   Global Step: 814590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:17:21,622-Speed 2626.79 samples/sec   Loss 1.0920   LearningRate 0.0000   Epoch: 19   Global Step: 814600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:17:25,523-Speed 2625.23 samples/sec   Loss 1.1295   LearningRate 0.0000   Epoch: 19   Global Step: 814610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:17:29,395-Speed 2645.95 samples/sec   Loss 1.1181   LearningRate 0.0000   Epoch: 19   Global Step: 814620   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:33,284-Speed 2633.25 samples/sec   Loss 1.0971   LearningRate 0.0000   Epoch: 19   Global Step: 814630   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:37,178-Speed 2630.65 samples/sec   Loss 1.1339   LearningRate 0.0000   Epoch: 19   Global Step: 814640   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:41,071-Speed 2630.78 samples/sec   Loss 1.1288   LearningRate 0.0000   Epoch: 19   Global Step: 814650   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:44,967-Speed 2628.95 samples/sec   Loss 1.1007   LearningRate 0.0000   Epoch: 19   Global Step: 814660   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:48,932-Speed 2583.43 samples/sec   Loss 1.1596   LearningRate 0.0000   Epoch: 19   Global Step: 814670   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:52,964-Speed 2540.29 samples/sec   Loss 1.1348   LearningRate 0.0000   Epoch: 19   Global Step: 814680   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:17:56,856-Speed 2631.47 samples/sec   Loss 1.1079   LearningRate 0.0000   Epoch: 19   Global Step: 814690   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:00,749-Speed 2631.19 samples/sec   Loss 1.0821   LearningRate 0.0000   Epoch: 19   Global Step: 814700   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:04,647-Speed 2627.68 samples/sec   Loss 1.1126   LearningRate 0.0000   Epoch: 19   Global Step: 814710   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:08,558-Speed 2619.05 samples/sec   Loss 1.1437   LearningRate 0.0000   Epoch: 19   Global Step: 814720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:18:12,456-Speed 2627.71 samples/sec   Loss 1.1044   LearningRate 0.0000   Epoch: 19   Global Step: 814730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:18:16,355-Speed 2626.76 samples/sec   Loss 1.1015   LearningRate 0.0000   Epoch: 19   Global Step: 814740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:18:20,267-Speed 2618.27 samples/sec   Loss 1.1132   LearningRate 0.0000   Epoch: 19   Global Step: 814750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:18:24,143-Speed 2642.95 samples/sec   Loss 1.0490   LearningRate 0.0000   Epoch: 19   Global Step: 814760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:28,036-Speed 2631.71 samples/sec   Loss 1.1542   LearningRate 0.0000   Epoch: 19   Global Step: 814770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:31,938-Speed 2624.84 samples/sec   Loss 1.1553   LearningRate 0.0000   Epoch: 19   Global Step: 814780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:35,844-Speed 2621.44 samples/sec   Loss 1.1206   LearningRate 0.0000   Epoch: 19   Global Step: 814790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:39,738-Speed 2631.06 samples/sec   Loss 1.1427   LearningRate 0.0000   Epoch: 19   Global Step: 814800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:43,628-Speed 2633.85 samples/sec   Loss 1.0927   LearningRate 0.0000   Epoch: 19   Global Step: 814810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:47,519-Speed 2631.60 samples/sec   Loss 1.1668   LearningRate 0.0000   Epoch: 19   Global Step: 814820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:51,412-Speed 2631.29 samples/sec   Loss 1.1370   LearningRate 0.0000   Epoch: 19   Global Step: 814830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:55,312-Speed 2626.58 samples/sec   Loss 1.0985   LearningRate 0.0000   Epoch: 19   Global Step: 814840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:18:59,228-Speed 2615.26 samples/sec   Loss 1.1178   LearningRate 0.0000   Epoch: 19   Global Step: 814850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:19:03,129-Speed 2625.52 samples/sec   Loss 1.1250   LearningRate 0.0000   Epoch: 19   Global Step: 814860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:07,029-Speed 2626.59 samples/sec   Loss 1.1137   LearningRate 0.0000   Epoch: 19   Global Step: 814870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:10,924-Speed 2629.47 samples/sec   Loss 1.1519   LearningRate 0.0000   Epoch: 19   Global Step: 814880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:14,824-Speed 2626.66 samples/sec   Loss 1.1571   LearningRate 0.0000   Epoch: 19   Global Step: 814890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:18,766-Speed 2598.50 samples/sec   Loss 1.1048   LearningRate 0.0000   Epoch: 19   Global Step: 814900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:22,666-Speed 2626.26 samples/sec   Loss 1.1166   LearningRate 0.0000   Epoch: 19   Global Step: 814910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:26,561-Speed 2630.16 samples/sec   Loss 1.0405   LearningRate 0.0000   Epoch: 19   Global Step: 814920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:30,458-Speed 2628.20 samples/sec   Loss 1.0717   LearningRate 0.0000   Epoch: 19   Global Step: 814930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:34,361-Speed 2624.66 samples/sec   Loss 1.0993   LearningRate 0.0000   Epoch: 19   Global Step: 814940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:38,298-Speed 2601.45 samples/sec   Loss 1.1257   LearningRate 0.0000   Epoch: 19   Global Step: 814950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:42,188-Speed 2632.81 samples/sec   Loss 1.1086   LearningRate 0.0000   Epoch: 19   Global Step: 814960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:19:46,057-Speed 2647.54 samples/sec   Loss 1.1606   LearningRate 0.0000   Epoch: 19   Global Step: 814970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:49,960-Speed 2624.78 samples/sec   Loss 1.1249   LearningRate 0.0000   Epoch: 19   Global Step: 814980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:53,852-Speed 2631.05 samples/sec   Loss 1.0960   LearningRate 0.0000   Epoch: 19   Global Step: 814990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:19:57,748-Speed 2629.90 samples/sec   Loss 1.1371   LearningRate 0.0000   Epoch: 19   Global Step: 815000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:01,641-Speed 2630.96 samples/sec   Loss 1.1281   LearningRate 0.0000   Epoch: 19   Global Step: 815010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:05,551-Speed 2619.47 samples/sec   Loss 1.0898   LearningRate 0.0000   Epoch: 19   Global Step: 815020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:09,441-Speed 2632.30 samples/sec   Loss 1.0810   LearningRate 0.0000   Epoch: 19   Global Step: 815030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:13,344-Speed 2624.46 samples/sec   Loss 1.1197   LearningRate 0.0000   Epoch: 19   Global Step: 815040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:17,238-Speed 2630.45 samples/sec   Loss 1.1300   LearningRate 0.0000   Epoch: 19   Global Step: 815050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:21,127-Speed 2634.43 samples/sec   Loss 1.0950   LearningRate 0.0000   Epoch: 19   Global Step: 815060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:24,996-Speed 2646.86 samples/sec   Loss 1.1865   LearningRate 0.0000   Epoch: 19   Global Step: 815070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:28,886-Speed 2633.01 samples/sec   Loss 1.0996   LearningRate 0.0000   Epoch: 19   Global Step: 815080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:32,783-Speed 2628.74 samples/sec   Loss 1.1068   LearningRate 0.0000   Epoch: 19   Global Step: 815090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:36,684-Speed 2625.12 samples/sec   Loss 1.0752   LearningRate 0.0000   Epoch: 19   Global Step: 815100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:40,586-Speed 2624.39 samples/sec   Loss 1.1111   LearningRate 0.0000   Epoch: 19   Global Step: 815110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:44,490-Speed 2624.35 samples/sec   Loss 1.0909   LearningRate 0.0000   Epoch: 19   Global Step: 815120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:48,391-Speed 2625.65 samples/sec   Loss 1.1018   LearningRate 0.0000   Epoch: 19   Global Step: 815130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:52,284-Speed 2630.90 samples/sec   Loss 1.1466   LearningRate 0.0000   Epoch: 19   Global Step: 815140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:20:56,171-Speed 2635.84 samples/sec   Loss 1.1104   LearningRate 0.0000   Epoch: 19   Global Step: 815150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:00,067-Speed 2628.48 samples/sec   Loss 1.0728   LearningRate 0.0000   Epoch: 19   Global Step: 815160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:03,957-Speed 2632.66 samples/sec   Loss 1.0879   LearningRate 0.0000   Epoch: 19   Global Step: 815170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:21:07,828-Speed 2645.63 samples/sec   Loss 1.0634   LearningRate 0.0000   Epoch: 19   Global Step: 815180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:11,730-Speed 2625.55 samples/sec   Loss 1.0996   LearningRate 0.0000   Epoch: 19   Global Step: 815190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:15,632-Speed 2624.49 samples/sec   Loss 1.1584   LearningRate 0.0000   Epoch: 19   Global Step: 815200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:19,549-Speed 2615.25 samples/sec   Loss 1.1647   LearningRate 0.0000   Epoch: 19   Global Step: 815210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:23,449-Speed 2626.24 samples/sec   Loss 1.1297   LearningRate 0.0000   Epoch: 19   Global Step: 815220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:27,347-Speed 2627.96 samples/sec   Loss 1.1147   LearningRate 0.0000   Epoch: 19   Global Step: 815230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:31,251-Speed 2623.31 samples/sec   Loss 1.0846   LearningRate 0.0000   Epoch: 19   Global Step: 815240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:21:35,130-Speed 2640.34 samples/sec   Loss 1.1448   LearningRate 0.0000   Epoch: 19   Global Step: 815250   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:21:39,027-Speed 2628.62 samples/sec   Loss 1.1665   LearningRate 0.0000   Epoch: 19   Global Step: 815260   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:21:42,928-Speed 2625.24 samples/sec   Loss 1.1331   LearningRate 0.0000   Epoch: 19   Global Step: 815270   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:21:46,832-Speed 2623.62 samples/sec   Loss 1.1123   LearningRate 0.0000   Epoch: 19   Global Step: 815280   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:21:50,735-Speed 2624.60 samples/sec   Loss 1.1724   LearningRate 0.0000   Epoch: 19   Global Step: 815290   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:21:54,633-Speed 2627.32 samples/sec   Loss 1.1069   LearningRate 0.0000   Epoch: 19   Global Step: 815300   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:21:58,534-Speed 2625.84 samples/sec   Loss 1.1174   LearningRate 0.0000   Epoch: 19   Global Step: 815310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:22:02,432-Speed 2627.99 samples/sec   Loss 1.1574   LearningRate 0.0000   Epoch: 19   Global Step: 815320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:22:06,327-Speed 2629.69 samples/sec   Loss 1.0827   LearningRate 0.0000   Epoch: 19   Global Step: 815330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:22:10,228-Speed 2625.36 samples/sec   Loss 1.0735   LearningRate 0.0000   Epoch: 19   Global Step: 815340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:22:14,126-Speed 2627.83 samples/sec   Loss 1.1007   LearningRate 0.0000   Epoch: 19   Global Step: 815350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:18,049-Speed 2611.03 samples/sec   Loss 1.1229   LearningRate 0.0000   Epoch: 19   Global Step: 815360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:21,942-Speed 2631.19 samples/sec   Loss 1.0744   LearningRate 0.0000   Epoch: 19   Global Step: 815370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:25,839-Speed 2627.63 samples/sec   Loss 1.0880   LearningRate 0.0000   Epoch: 19   Global Step: 815380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:29,773-Speed 2604.19 samples/sec   Loss 1.1318   LearningRate 0.0000   Epoch: 19   Global Step: 815390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:33,672-Speed 2627.46 samples/sec   Loss 1.0860   LearningRate 0.0000   Epoch: 19   Global Step: 815400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:37,568-Speed 2628.93 samples/sec   Loss 1.1169   LearningRate 0.0000   Epoch: 19   Global Step: 815410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:41,471-Speed 2624.43 samples/sec   Loss 1.1092   LearningRate 0.0000   Epoch: 19   Global Step: 815420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:45,371-Speed 2625.87 samples/sec   Loss 1.1084   LearningRate 0.0000   Epoch: 19   Global Step: 815430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:49,274-Speed 2625.04 samples/sec   Loss 1.1656   LearningRate 0.0000   Epoch: 19   Global Step: 815440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:22:53,172-Speed 2627.47 samples/sec   Loss 1.0742   LearningRate 0.0000   Epoch: 19   Global Step: 815450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:22:57,077-Speed 2623.18 samples/sec   Loss 1.1090   LearningRate 0.0000   Epoch: 19   Global Step: 815460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:23:00,955-Speed 2640.83 samples/sec   Loss 1.1157   LearningRate 0.0000   Epoch: 19   Global Step: 815470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:23:04,850-Speed 2630.64 samples/sec   Loss 1.0980   LearningRate 0.0000   Epoch: 19   Global Step: 815480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:23:08,743-Speed 2631.07 samples/sec   Loss 1.1095   LearningRate 0.0000   Epoch: 19   Global Step: 815490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:23:12,639-Speed 2628.77 samples/sec   Loss 1.1172   LearningRate 0.0000   Epoch: 19   Global Step: 815500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:23:16,515-Speed 2642.53 samples/sec   Loss 1.1229   LearningRate 0.0000   Epoch: 19   Global Step: 815510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:20,424-Speed 2620.96 samples/sec   Loss 1.0790   LearningRate 0.0000   Epoch: 19   Global Step: 815520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:24,319-Speed 2629.03 samples/sec   Loss 1.0830   LearningRate 0.0000   Epoch: 19   Global Step: 815530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:28,212-Speed 2631.20 samples/sec   Loss 1.0394   LearningRate 0.0000   Epoch: 19   Global Step: 815540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:32,108-Speed 2629.05 samples/sec   Loss 1.0717   LearningRate 0.0000   Epoch: 19   Global Step: 815550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:36,004-Speed 2628.82 samples/sec   Loss 1.1508   LearningRate 0.0000   Epoch: 19   Global Step: 815560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:39,902-Speed 2627.81 samples/sec   Loss 1.0832   LearningRate 0.0000   Epoch: 19   Global Step: 815570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:43,793-Speed 2632.46 samples/sec   Loss 1.1404   LearningRate 0.0000   Epoch: 19   Global Step: 815580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:47,699-Speed 2622.83 samples/sec   Loss 1.1305   LearningRate 0.0000   Epoch: 19   Global Step: 815590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:51,600-Speed 2625.15 samples/sec   Loss 1.1132   LearningRate 0.0000   Epoch: 19   Global Step: 815600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-16 15:23:55,543-Speed 2597.69 samples/sec   Loss 1.0961   LearningRate 0.0000   Epoch: 19   Global Step: 815610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:23:59,538-Speed 2564.09 samples/sec   Loss 1.1027   LearningRate 0.0000   Epoch: 19   Global Step: 815620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:03,438-Speed 2626.67 samples/sec   Loss 1.1034   LearningRate 0.0000   Epoch: 19   Global Step: 815630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:07,328-Speed 2632.52 samples/sec   Loss 1.1426   LearningRate 0.0000   Epoch: 19   Global Step: 815640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:11,220-Speed 2631.61 samples/sec   Loss 1.0792   LearningRate 0.0000   Epoch: 19   Global Step: 815650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:15,113-Speed 2631.00 samples/sec   Loss 1.1173   LearningRate 0.0000   Epoch: 19   Global Step: 815660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:19,013-Speed 2627.34 samples/sec   Loss 1.1061   LearningRate 0.0000   Epoch: 19   Global Step: 815670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:22,904-Speed 2631.76 samples/sec   Loss 1.1711   LearningRate 0.0000   Epoch: 19   Global Step: 815680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:26,794-Speed 2633.20 samples/sec   Loss 1.1123   LearningRate 0.0000   Epoch: 19   Global Step: 815690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:30,703-Speed 2620.06 samples/sec   Loss 1.0872   LearningRate 0.0000   Epoch: 19   Global Step: 815700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:34,598-Speed 2630.56 samples/sec   Loss 1.1294   LearningRate 0.0000   Epoch: 19   Global Step: 815710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:24:38,494-Speed 2628.96 samples/sec   Loss 1.0971   LearningRate 0.0000   Epoch: 19   Global Step: 815720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:24:42,387-Speed 2630.83 samples/sec   Loss 1.1005   LearningRate 0.0000   Epoch: 19   Global Step: 815730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:24:46,256-Speed 2648.20 samples/sec   Loss 1.0982   LearningRate 0.0000   Epoch: 19   Global Step: 815740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:50,156-Speed 2625.86 samples/sec   Loss 1.1275   LearningRate 0.0000   Epoch: 19   Global Step: 815750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:54,047-Speed 2632.66 samples/sec   Loss 1.1117   LearningRate 0.0000   Epoch: 19   Global Step: 815760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:24:57,962-Speed 2616.58 samples/sec   Loss 1.1038   LearningRate 0.0000   Epoch: 19   Global Step: 815770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:01,854-Speed 2631.45 samples/sec   Loss 1.1283   LearningRate 0.0000   Epoch: 19   Global Step: 815780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:05,753-Speed 2626.97 samples/sec   Loss 1.0480   LearningRate 0.0000   Epoch: 19   Global Step: 815790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:09,691-Speed 2600.95 samples/sec   Loss 1.1069   LearningRate 0.0000   Epoch: 19   Global Step: 815800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:13,591-Speed 2627.08 samples/sec   Loss 1.0987   LearningRate 0.0000   Epoch: 19   Global Step: 815810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:17,497-Speed 2622.12 samples/sec   Loss 1.0731   LearningRate 0.0000   Epoch: 19   Global Step: 815820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:21,397-Speed 2626.49 samples/sec   Loss 1.1288   LearningRate 0.0000   Epoch: 19   Global Step: 815830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:25,295-Speed 2627.86 samples/sec   Loss 1.1210   LearningRate 0.0000   Epoch: 19   Global Step: 815840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:25:29,195-Speed 2626.68 samples/sec   Loss 1.0835   LearningRate 0.0000   Epoch: 19   Global Step: 815850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:25:33,073-Speed 2640.60 samples/sec   Loss 1.1035   LearningRate 0.0000   Epoch: 19   Global Step: 815860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:36,976-Speed 2624.41 samples/sec   Loss 1.0916   LearningRate 0.0000   Epoch: 19   Global Step: 815870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:40,875-Speed 2626.74 samples/sec   Loss 1.0984   LearningRate 0.0000   Epoch: 19   Global Step: 815880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:44,781-Speed 2622.67 samples/sec   Loss 1.0899   LearningRate 0.0000   Epoch: 19   Global Step: 815890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:48,675-Speed 2630.46 samples/sec   Loss 1.0983   LearningRate 0.0000   Epoch: 19   Global Step: 815900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:52,593-Speed 2613.72 samples/sec   Loss 1.1001   LearningRate 0.0000   Epoch: 19   Global Step: 815910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:25:56,496-Speed 2625.60 samples/sec   Loss 1.1272   LearningRate 0.0000   Epoch: 19   Global Step: 815920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:00,384-Speed 2633.85 samples/sec   Loss 1.1215   LearningRate 0.0000   Epoch: 19   Global Step: 815930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:04,282-Speed 2627.84 samples/sec   Loss 1.1003   LearningRate 0.0000   Epoch: 19   Global Step: 815940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:08,173-Speed 2632.17 samples/sec   Loss 1.1019   LearningRate 0.0000   Epoch: 19   Global Step: 815950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:12,071-Speed 2627.52 samples/sec   Loss 1.1366   LearningRate 0.0000   Epoch: 19   Global Step: 815960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:26:15,967-Speed 2629.38 samples/sec   Loss 1.1163   LearningRate 0.0000   Epoch: 19   Global Step: 815970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:26:19,837-Speed 2646.76 samples/sec   Loss 1.1542   LearningRate 0.0000   Epoch: 19   Global Step: 815980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:23,774-Speed 2601.02 samples/sec   Loss 1.1385   LearningRate 0.0000   Epoch: 19   Global Step: 815990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:27,767-Speed 2566.18 samples/sec   Loss 1.1327   LearningRate 0.0000   Epoch: 19   Global Step: 816000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:31,654-Speed 2634.35 samples/sec   Loss 1.0937   LearningRate 0.0000   Epoch: 19   Global Step: 816010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:35,545-Speed 2632.51 samples/sec   Loss 1.1464   LearningRate 0.0000   Epoch: 19   Global Step: 816020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:39,451-Speed 2621.75 samples/sec   Loss 1.1170   LearningRate 0.0000   Epoch: 19   Global Step: 816030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:43,348-Speed 2628.84 samples/sec   Loss 1.1066   LearningRate 0.0000   Epoch: 19   Global Step: 816040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:47,236-Speed 2634.15 samples/sec   Loss 1.1025   LearningRate 0.0000   Epoch: 19   Global Step: 816050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:51,140-Speed 2624.68 samples/sec   Loss 1.0855   LearningRate 0.0000   Epoch: 19   Global Step: 816060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:55,059-Speed 2613.44 samples/sec   Loss 1.1048   LearningRate 0.0000   Epoch: 19   Global Step: 816070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:26:58,936-Speed 2642.28 samples/sec   Loss 1.1077   LearningRate 0.0000   Epoch: 19   Global Step: 816080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:02,829-Speed 2630.49 samples/sec   Loss 1.1138   LearningRate 0.0000   Epoch: 19   Global Step: 816090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:06,722-Speed 2631.38 samples/sec   Loss 1.0994   LearningRate 0.0000   Epoch: 19   Global Step: 816100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:10,619-Speed 2628.22 samples/sec   Loss 1.1746   LearningRate 0.0000   Epoch: 19   Global Step: 816110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:14,514-Speed 2630.14 samples/sec   Loss 1.1198   LearningRate 0.0000   Epoch: 19   Global Step: 816120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:18,407-Speed 2630.41 samples/sec   Loss 1.1304   LearningRate 0.0000   Epoch: 19   Global Step: 816130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:22,316-Speed 2621.06 samples/sec   Loss 1.1055   LearningRate 0.0000   Epoch: 19   Global Step: 816140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:26,216-Speed 2625.70 samples/sec   Loss 1.1082   LearningRate 0.0000   Epoch: 19   Global Step: 816150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:30,108-Speed 2632.40 samples/sec   Loss 1.0971   LearningRate 0.0000   Epoch: 19   Global Step: 816160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:34,003-Speed 2629.41 samples/sec   Loss 1.0816   LearningRate 0.0000   Epoch: 19   Global Step: 816170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:37,902-Speed 2626.79 samples/sec   Loss 1.1109   LearningRate 0.0000   Epoch: 19   Global Step: 816180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:27:41,802-Speed 2626.50 samples/sec   Loss 1.1165   LearningRate 0.0000   Epoch: 19   Global Step: 816190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-16 15:27:45,679-Speed 2642.56 samples/sec   Loss 1.1569   LearningRate 0.0000   Epoch: 19   Global Step: 816200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:49,703-Speed 2545.85 samples/sec   Loss 1.0950   LearningRate 0.0000   Epoch: 19   Global Step: 816210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:53,646-Speed 2597.96 samples/sec   Loss 1.1033   LearningRate 0.0000   Epoch: 19   Global Step: 816220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:27:57,541-Speed 2629.29 samples/sec   Loss 1.1272   LearningRate 0.0000   Epoch: 19   Global Step: 816230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-16 15:28:01,442-Speed 2626.34 samples/sec   Loss 1.0526   LearningRate 0.0000   Epoch: 19   Global Step: 816240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:05,347-Speed 2622.74 samples/sec   Loss 1.1096   LearningRate 0.0000   Epoch: 19   Global Step: 816250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:09,253-Speed 2621.70 samples/sec   Loss 1.0984   LearningRate 0.0000   Epoch: 19   Global Step: 816260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:13,152-Speed 2627.26 samples/sec   Loss 1.1239   LearningRate 0.0000   Epoch: 19   Global Step: 816270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:17,050-Speed 2627.27 samples/sec   Loss 1.0987   LearningRate 0.0000   Epoch: 19   Global Step: 816280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:20,952-Speed 2625.03 samples/sec   Loss 1.1634   LearningRate 0.0000   Epoch: 19   Global Step: 816290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:24,846-Speed 2630.75 samples/sec   Loss 1.1443   LearningRate 0.0000   Epoch: 19   Global Step: 816300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:28:28,749-Speed 2624.70 samples/sec   Loss 1.0975   LearningRate 0.0000   Epoch: 19   Global Step: 816310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:28:32,620-Speed 2645.41 samples/sec   Loss 1.1141   LearningRate 0.0000   Epoch: 19   Global Step: 816320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:36,513-Speed 2631.19 samples/sec   Loss 1.1185   LearningRate 0.0000   Epoch: 19   Global Step: 816330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:40,416-Speed 2624.21 samples/sec   Loss 1.0965   LearningRate 0.0000   Epoch: 19   Global Step: 816340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:44,310-Speed 2630.33 samples/sec   Loss 1.0843   LearningRate 0.0000   Epoch: 19   Global Step: 816350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:48,212-Speed 2624.86 samples/sec   Loss 1.1186   LearningRate 0.0000   Epoch: 19   Global Step: 816360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:52,122-Speed 2627.13 samples/sec   Loss 1.1159   LearningRate 0.0000   Epoch: 19   Global Step: 816370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:56,016-Speed 2629.88 samples/sec   Loss 1.1225   LearningRate 0.0000   Epoch: 19   Global Step: 816380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:28:59,910-Speed 2630.30 samples/sec   Loss 1.1238   LearningRate 0.0000   Epoch: 19   Global Step: 816390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:03,819-Speed 2620.50 samples/sec   Loss 1.1618   LearningRate 0.0000   Epoch: 19   Global Step: 816400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:07,713-Speed 2630.61 samples/sec   Loss 1.1170   LearningRate 0.0000   Epoch: 19   Global Step: 816410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:11,610-Speed 2628.19 samples/sec   Loss 1.1578   LearningRate 0.0000   Epoch: 19   Global Step: 816420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:29:15,479-Speed 2647.03 samples/sec   Loss 1.1389   LearningRate 0.0000   Epoch: 19   Global Step: 816430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:19,379-Speed 2626.67 samples/sec   Loss 1.0855   LearningRate 0.0000   Epoch: 19   Global Step: 816440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:23,269-Speed 2633.43 samples/sec   Loss 1.1239   LearningRate 0.0000   Epoch: 19   Global Step: 816450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:27,161-Speed 2631.59 samples/sec   Loss 1.0886   LearningRate 0.0000   Epoch: 19   Global Step: 816460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:31,053-Speed 2632.00 samples/sec   Loss 1.1009   LearningRate 0.0000   Epoch: 19   Global Step: 816470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:34,946-Speed 2630.49 samples/sec   Loss 1.0918   LearningRate 0.0000   Epoch: 19   Global Step: 816480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:38,845-Speed 2627.32 samples/sec   Loss 1.1414   LearningRate 0.0000   Epoch: 19   Global Step: 816490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:42,762-Speed 2614.82 samples/sec   Loss 1.1273   LearningRate 0.0000   Epoch: 19   Global Step: 816500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:46,841-Speed 2511.38 samples/sec   Loss 1.1147   LearningRate 0.0000   Epoch: 19   Global Step: 816510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:50,851-Speed 2554.06 samples/sec   Loss 1.0729   LearningRate 0.0000   Epoch: 19   Global Step: 816520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:29:54,744-Speed 2631.18 samples/sec   Loss 1.0888   LearningRate 0.0000   Epoch: 19   Global Step: 816530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:29:58,639-Speed 2629.41 samples/sec   Loss 1.0806   LearningRate 0.0000   Epoch: 19   Global Step: 816540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:30:02,510-Speed 2646.13 samples/sec   Loss 1.0994   LearningRate 0.0000   Epoch: 19   Global Step: 816550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:06,411-Speed 2625.80 samples/sec   Loss 1.1319   LearningRate 0.0000   Epoch: 19   Global Step: 816560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:10,338-Speed 2608.46 samples/sec   Loss 1.0802   LearningRate 0.0000   Epoch: 19   Global Step: 816570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:14,234-Speed 2628.91 samples/sec   Loss 1.1123   LearningRate 0.0000   Epoch: 19   Global Step: 816580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:18,128-Speed 2630.87 samples/sec   Loss 1.0903   LearningRate 0.0000   Epoch: 19   Global Step: 816590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:22,021-Speed 2631.08 samples/sec   Loss 1.1206   LearningRate 0.0000   Epoch: 19   Global Step: 816600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:25,922-Speed 2625.43 samples/sec   Loss 1.0928   LearningRate 0.0000   Epoch: 19   Global Step: 816610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:29,819-Speed 2628.33 samples/sec   Loss 1.1227   LearningRate 0.0000   Epoch: 19   Global Step: 816620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:33,713-Speed 2630.43 samples/sec   Loss 1.0823   LearningRate 0.0000   Epoch: 19   Global Step: 816630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:37,608-Speed 2629.73 samples/sec   Loss 1.1233   LearningRate 0.0000   Epoch: 19   Global Step: 816640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:41,482-Speed 2644.23 samples/sec   Loss 1.0704   LearningRate 0.0000   Epoch: 19   Global Step: 816650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:45,380-Speed 2627.96 samples/sec   Loss 1.0763   LearningRate 0.0000   Epoch: 19   Global Step: 816660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:49,278-Speed 2626.81 samples/sec   Loss 1.1037   LearningRate 0.0000   Epoch: 19   Global Step: 816670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:53,169-Speed 2633.31 samples/sec   Loss 1.0758   LearningRate 0.0000   Epoch: 19   Global Step: 816680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:30:57,059-Speed 2632.77 samples/sec   Loss 1.0863   LearningRate 0.0000   Epoch: 19   Global Step: 816690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:00,951-Speed 2631.65 samples/sec   Loss 1.0913   LearningRate 0.0000   Epoch: 19   Global Step: 816700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:04,847-Speed 2628.60 samples/sec   Loss 1.0931   LearningRate 0.0000   Epoch: 19   Global Step: 816710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:08,758-Speed 2619.39 samples/sec   Loss 1.0933   LearningRate 0.0000   Epoch: 19   Global Step: 816720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:12,649-Speed 2632.17 samples/sec   Loss 1.0911   LearningRate 0.0000   Epoch: 19   Global Step: 816730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:16,538-Speed 2634.00 samples/sec   Loss 1.1196   LearningRate 0.0000   Epoch: 19   Global Step: 816740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:20,436-Speed 2627.74 samples/sec   Loss 1.0811   LearningRate 0.0000   Epoch: 19   Global Step: 816750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:31:24,303-Speed 2648.49 samples/sec   Loss 1.0956   LearningRate 0.0000   Epoch: 19   Global Step: 816760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:28,194-Speed 2632.47 samples/sec   Loss 1.1004   LearningRate 0.0000   Epoch: 19   Global Step: 816770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:32,097-Speed 2624.38 samples/sec   Loss 1.0942   LearningRate 0.0000   Epoch: 19   Global Step: 816780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:36,002-Speed 2622.53 samples/sec   Loss 1.1097   LearningRate 0.0000   Epoch: 19   Global Step: 816790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:39,896-Speed 2630.02 samples/sec   Loss 1.1314   LearningRate 0.0000   Epoch: 19   Global Step: 816800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:43,796-Speed 2627.28 samples/sec   Loss 1.0869   LearningRate 0.0000   Epoch: 19   Global Step: 816810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:47,697-Speed 2625.31 samples/sec   Loss 1.0743   LearningRate 0.0000   Epoch: 19   Global Step: 816820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:51,610-Speed 2618.12 samples/sec   Loss 1.1116   LearningRate 0.0000   Epoch: 19   Global Step: 816830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:55,521-Speed 2619.16 samples/sec   Loss 1.0951   LearningRate 0.0000   Epoch: 19   Global Step: 816840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:31:59,408-Speed 2634.80 samples/sec   Loss 1.0515   LearningRate 0.0000   Epoch: 19   Global Step: 816850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:03,300-Speed 2631.84 samples/sec   Loss 1.0394   LearningRate 0.0000   Epoch: 19   Global Step: 816860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:32:07,200-Speed 2626.09 samples/sec   Loss 1.1258   LearningRate 0.0000   Epoch: 19   Global Step: 816870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:32:11,116-Speed 2615.56 samples/sec   Loss 1.1037   LearningRate 0.0000   Epoch: 19   Global Step: 816880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:32:14,992-Speed 2642.58 samples/sec   Loss 1.0595   LearningRate 0.0000   Epoch: 19   Global Step: 816890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:18,887-Speed 2630.17 samples/sec   Loss 1.0980   LearningRate 0.0000   Epoch: 19   Global Step: 816900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:22,898-Speed 2553.38 samples/sec   Loss 1.1248   LearningRate 0.0000   Epoch: 19   Global Step: 816910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:26,799-Speed 2625.68 samples/sec   Loss 1.1024   LearningRate 0.0000   Epoch: 19   Global Step: 816920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:30,690-Speed 2632.77 samples/sec   Loss 1.0838   LearningRate 0.0000   Epoch: 19   Global Step: 816930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:34,595-Speed 2622.96 samples/sec   Loss 1.0645   LearningRate 0.0000   Epoch: 19   Global Step: 816940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:38,491-Speed 2628.62 samples/sec   Loss 1.1042   LearningRate 0.0000   Epoch: 19   Global Step: 816950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:42,387-Speed 2629.29 samples/sec   Loss 1.1352   LearningRate 0.0000   Epoch: 19   Global Step: 816960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:46,286-Speed 2626.63 samples/sec   Loss 1.0867   LearningRate 0.0000   Epoch: 19   Global Step: 816970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:50,270-Speed 2571.07 samples/sec   Loss 1.0991   LearningRate 0.0000   Epoch: 19   Global Step: 816980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:32:54,164-Speed 2630.64 samples/sec   Loss 1.1429   LearningRate 0.0000   Epoch: 19   Global Step: 816990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:32:58,057-Speed 2631.18 samples/sec   Loss 1.0650   LearningRate 0.0000   Epoch: 19   Global Step: 817000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:33:01,962-Speed 2622.62 samples/sec   Loss 1.0746   LearningRate 0.0000   Epoch: 19   Global Step: 817010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:33:05,852-Speed 2633.06 samples/sec   Loss 1.1792   LearningRate 0.0000   Epoch: 19   Global Step: 817020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:33:09,725-Speed 2644.79 samples/sec   Loss 1.1186   LearningRate 0.0000   Epoch: 19   Global Step: 817030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:13,629-Speed 2623.59 samples/sec   Loss 1.0899   LearningRate 0.0000   Epoch: 19   Global Step: 817040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:17,557-Speed 2607.80 samples/sec   Loss 1.1322   LearningRate 0.0000   Epoch: 19   Global Step: 817050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:21,457-Speed 2626.46 samples/sec   Loss 1.0977   LearningRate 0.0000   Epoch: 19   Global Step: 817060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:25,359-Speed 2625.05 samples/sec   Loss 1.0417   LearningRate 0.0000   Epoch: 19   Global Step: 817070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:29,270-Speed 2618.87 samples/sec   Loss 1.0806   LearningRate 0.0000   Epoch: 19   Global Step: 817080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:33,164-Speed 2631.06 samples/sec   Loss 1.0965   LearningRate 0.0000   Epoch: 19   Global Step: 817090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:37,061-Speed 2628.16 samples/sec   Loss 1.0843   LearningRate 0.0000   Epoch: 19   Global Step: 817100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:40,991-Speed 2606.10 samples/sec   Loss 1.0971   LearningRate 0.0000   Epoch: 19   Global Step: 817110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:44,881-Speed 2633.50 samples/sec   Loss 1.0949   LearningRate 0.0000   Epoch: 19   Global Step: 817120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:33:48,776-Speed 2629.42 samples/sec   Loss 1.0961   LearningRate 0.0000   Epoch: 19   Global Step: 817130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:33:52,700-Speed 2610.84 samples/sec   Loss 1.1382   LearningRate 0.0000   Epoch: 19   Global Step: 817140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:33:56,570-Speed 2645.98 samples/sec   Loss 1.1210   LearningRate 0.0000   Epoch: 19   Global Step: 817150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:34:00,462-Speed 2632.31 samples/sec   Loss 1.1134   LearningRate 0.0000   Epoch: 19   Global Step: 817160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:34:04,360-Speed 2627.40 samples/sec   Loss 1.0965   LearningRate 0.0000   Epoch: 19   Global Step: 817170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:34:08,258-Speed 2627.62 samples/sec   Loss 1.1075   LearningRate 0.0000   Epoch: 19   Global Step: 817180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:34:12,150-Speed 2631.86 samples/sec   Loss 1.1375   LearningRate 0.0000   Epoch: 19   Global Step: 817190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:34:16,041-Speed 2632.29 samples/sec   Loss 1.0922   LearningRate 0.0000   Epoch: 19   Global Step: 817200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:34:20,159-Speed 2489.78 samples/sec   Loss 1.1397   LearningRate 0.0000   Epoch: 19   Global Step: 817210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:34:24,199-Speed 2535.16 samples/sec   Loss 1.0987   LearningRate 0.0000   Epoch: 19   Global Step: 817220   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:28,090-Speed 2632.46 samples/sec   Loss 1.1278   LearningRate 0.0000   Epoch: 19   Global Step: 817230   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:31,978-Speed 2634.07 samples/sec   Loss 1.0474   LearningRate 0.0000   Epoch: 19   Global Step: 817240   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:35,878-Speed 2626.80 samples/sec   Loss 1.1150   LearningRate 0.0000   Epoch: 19   Global Step: 817250   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:39,777-Speed 2627.73 samples/sec   Loss 1.1189   LearningRate 0.0000   Epoch: 19   Global Step: 817260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:43,680-Speed 2623.81 samples/sec   Loss 1.0826   LearningRate 0.0000   Epoch: 19   Global Step: 817270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:47,594-Speed 2617.44 samples/sec   Loss 1.1097   LearningRate 0.0000   Epoch: 19   Global Step: 817280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:51,511-Speed 2614.85 samples/sec   Loss 1.1377   LearningRate 0.0000   Epoch: 19   Global Step: 817290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:55,404-Speed 2631.43 samples/sec   Loss 1.1016   LearningRate 0.0000   Epoch: 19   Global Step: 817300   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:34:59,294-Speed 2633.23 samples/sec   Loss 1.1572   LearningRate 0.0000   Epoch: 19   Global Step: 817310   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:35:03,188-Speed 2630.04 samples/sec   Loss 1.1378   LearningRate 0.0000   Epoch: 19   Global Step: 817320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:07,076-Speed 2634.26 samples/sec   Loss 1.1217   LearningRate 0.0000   Epoch: 19   Global Step: 817330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:11,012-Speed 2603.20 samples/sec   Loss 1.1116   LearningRate 0.0000   Epoch: 19   Global Step: 817340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:14,912-Speed 2626.41 samples/sec   Loss 1.0805   LearningRate 0.0000   Epoch: 19   Global Step: 817350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:18,806-Speed 2630.39 samples/sec   Loss 1.1195   LearningRate 0.0000   Epoch: 19   Global Step: 817360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:22,707-Speed 2625.67 samples/sec   Loss 1.0797   LearningRate 0.0000   Epoch: 19   Global Step: 817370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:26,602-Speed 2629.27 samples/sec   Loss 1.1006   LearningRate 0.0000   Epoch: 19   Global Step: 817380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:30,498-Speed 2629.35 samples/sec   Loss 1.0924   LearningRate 0.0000   Epoch: 19   Global Step: 817390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:34,393-Speed 2629.48 samples/sec   Loss 1.1149   LearningRate 0.0000   Epoch: 19   Global Step: 817400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:38,284-Speed 2633.06 samples/sec   Loss 1.1189   LearningRate 0.0000   Epoch: 19   Global Step: 817410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:42,178-Speed 2630.33 samples/sec   Loss 1.1082   LearningRate 0.0000   Epoch: 19   Global Step: 817420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:35:46,060-Speed 2639.00 samples/sec   Loss 1.1272   LearningRate 0.0000   Epoch: 19   Global Step: 817430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:49,969-Speed 2619.72 samples/sec   Loss 1.1202   LearningRate 0.0000   Epoch: 19   Global Step: 817440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:53,863-Speed 2631.25 samples/sec   Loss 1.1325   LearningRate 0.0000   Epoch: 19   Global Step: 817450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:35:57,765-Speed 2625.12 samples/sec   Loss 1.1381   LearningRate 0.0000   Epoch: 19   Global Step: 817460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:01,680-Speed 2616.19 samples/sec   Loss 1.1098   LearningRate 0.0000   Epoch: 19   Global Step: 817470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:05,576-Speed 2628.99 samples/sec   Loss 1.0973   LearningRate 0.0000   Epoch: 19   Global Step: 817480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:09,492-Speed 2615.93 samples/sec   Loss 1.1180   LearningRate 0.0000   Epoch: 19   Global Step: 817490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:13,440-Speed 2593.88 samples/sec   Loss 1.0849   LearningRate 0.0000   Epoch: 19   Global Step: 817500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:17,344-Speed 2624.11 samples/sec   Loss 1.0768   LearningRate 0.0000   Epoch: 19   Global Step: 817510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:21,248-Speed 2623.26 samples/sec   Loss 1.0816   LearningRate 0.0000   Epoch: 19   Global Step: 817520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:25,131-Speed 2645.44 samples/sec   Loss 1.0850   LearningRate 0.0000   Epoch: 19   Global Step: 817530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:36:29,002-Speed 2645.84 samples/sec   Loss 1.0829   LearningRate 0.0000   Epoch: 19   Global Step: 817540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:36:32,892-Speed 2632.94 samples/sec   Loss 1.1096   LearningRate 0.0000   Epoch: 19   Global Step: 817550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:36:36,791-Speed 2626.68 samples/sec   Loss 1.1212   LearningRate 0.0000   Epoch: 19   Global Step: 817560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:36:40,683-Speed 2632.03 samples/sec   Loss 1.0989   LearningRate 0.0000   Epoch: 19   Global Step: 817570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:36:44,576-Speed 2631.19 samples/sec   Loss 1.0612   LearningRate 0.0000   Epoch: 19   Global Step: 817580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:36:48,474-Speed 2627.51 samples/sec   Loss 1.0997   LearningRate 0.0000   Epoch: 19   Global Step: 817590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:36:52,371-Speed 2628.48 samples/sec   Loss 1.0976   LearningRate 0.0000   Epoch: 19   Global Step: 817600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:36:56,268-Speed 2628.28 samples/sec   Loss 1.0766   LearningRate 0.0000   Epoch: 19   Global Step: 817610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:37:00,169-Speed 2626.02 samples/sec   Loss 1.1240   LearningRate 0.0000   Epoch: 19   Global Step: 817620   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:37:04,064-Speed 2629.40 samples/sec   Loss 1.1416   LearningRate 0.0000   Epoch: 19   Global Step: 817630   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:37:07,960-Speed 2629.02 samples/sec   Loss 1.1071   LearningRate 0.0000   Epoch: 19   Global Step: 817640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:11,848-Speed 2634.13 samples/sec   Loss 1.1169   LearningRate 0.0000   Epoch: 19   Global Step: 817650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:15,755-Speed 2622.19 samples/sec   Loss 1.0958   LearningRate 0.0000   Epoch: 19   Global Step: 817660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:19,652-Speed 2628.22 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 19   Global Step: 817670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:23,547-Speed 2629.97 samples/sec   Loss 1.1229   LearningRate 0.0000   Epoch: 19   Global Step: 817680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:27,445-Speed 2627.42 samples/sec   Loss 1.0864   LearningRate 0.0000   Epoch: 19   Global Step: 817690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:31,340-Speed 2630.06 samples/sec   Loss 1.1066   LearningRate 0.0000   Epoch: 19   Global Step: 817700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:35,234-Speed 2630.39 samples/sec   Loss 1.1236   LearningRate 0.0000   Epoch: 19   Global Step: 817710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:39,127-Speed 2630.56 samples/sec   Loss 1.1343   LearningRate 0.0000   Epoch: 19   Global Step: 817720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:43,022-Speed 2629.44 samples/sec   Loss 1.0944   LearningRate 0.0000   Epoch: 19   Global Step: 817730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:46,914-Speed 2631.66 samples/sec   Loss 1.0938   LearningRate 0.0000   Epoch: 19   Global Step: 817740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:37:50,813-Speed 2627.59 samples/sec   Loss 1.1407   LearningRate 0.0000   Epoch: 19   Global Step: 817750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:37:54,684-Speed 2645.47 samples/sec   Loss 1.1520   LearningRate 0.0000   Epoch: 19   Global Step: 817760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:37:58,580-Speed 2630.17 samples/sec   Loss 1.0926   LearningRate 0.0000   Epoch: 19   Global Step: 817770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:02,470-Speed 2633.04 samples/sec   Loss 1.0764   LearningRate 0.0000   Epoch: 19   Global Step: 817780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:06,362-Speed 2631.83 samples/sec   Loss 1.1365   LearningRate 0.0000   Epoch: 19   Global Step: 817790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:10,262-Speed 2625.71 samples/sec   Loss 1.1095   LearningRate 0.0000   Epoch: 19   Global Step: 817800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:14,189-Speed 2612.71 samples/sec   Loss 1.1212   LearningRate 0.0000   Epoch: 19   Global Step: 817810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:18,309-Speed 2486.15 samples/sec   Loss 1.0957   LearningRate 0.0000   Epoch: 19   Global Step: 817820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:22,222-Speed 2617.93 samples/sec   Loss 1.0829   LearningRate 0.0000   Epoch: 19   Global Step: 817830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:26,116-Speed 2630.73 samples/sec   Loss 1.0603   LearningRate 0.0000   Epoch: 19   Global Step: 817840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:30,012-Speed 2629.08 samples/sec   Loss 1.0624   LearningRate 0.0000   Epoch: 19   Global Step: 817850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:33,910-Speed 2627.41 samples/sec   Loss 1.1096   LearningRate 0.0000   Epoch: 19   Global Step: 817860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:38:37,794-Speed 2637.20 samples/sec   Loss 1.1275   LearningRate 0.0000   Epoch: 19   Global Step: 817870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:41,685-Speed 2632.59 samples/sec   Loss 1.0949   LearningRate 0.0000   Epoch: 19   Global Step: 817880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:45,577-Speed 2631.79 samples/sec   Loss 1.1081   LearningRate 0.0000   Epoch: 19   Global Step: 817890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:49,470-Speed 2630.55 samples/sec   Loss 1.0897   LearningRate 0.0000   Epoch: 19   Global Step: 817900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:53,416-Speed 2596.20 samples/sec   Loss 1.0509   LearningRate 0.0000   Epoch: 19   Global Step: 817910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:38:57,322-Speed 2622.74 samples/sec   Loss 1.0780   LearningRate 0.0000   Epoch: 19   Global Step: 817920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:01,237-Speed 2616.32 samples/sec   Loss 1.0805   LearningRate 0.0000   Epoch: 19   Global Step: 817930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:05,156-Speed 2614.24 samples/sec   Loss 1.1408   LearningRate 0.0000   Epoch: 19   Global Step: 817940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:09,056-Speed 2625.85 samples/sec   Loss 1.1004   LearningRate 0.0000   Epoch: 19   Global Step: 817950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:12,960-Speed 2623.81 samples/sec   Loss 1.0908   LearningRate 0.0000   Epoch: 19   Global Step: 817960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:16,879-Speed 2613.80 samples/sec   Loss 1.1187   LearningRate 0.0000   Epoch: 19   Global Step: 817970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:20,793-Speed 2617.63 samples/sec   Loss 1.0988   LearningRate 0.0000   Epoch: 19   Global Step: 817980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:24,736-Speed 2597.62 samples/sec   Loss 1.1017   LearningRate 0.0000   Epoch: 19   Global Step: 817990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:28,649-Speed 2618.56 samples/sec   Loss 1.0819   LearningRate 0.0000   Epoch: 19   Global Step: 818000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:32,541-Speed 2631.57 samples/sec   Loss 1.1239   LearningRate 0.0000   Epoch: 19   Global Step: 818010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:39:36,450-Speed 2620.70 samples/sec   Loss 1.0860   LearningRate 0.0000   Epoch: 19   Global Step: 818020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:39:40,347-Speed 2628.04 samples/sec   Loss 1.1145   LearningRate 0.0000   Epoch: 19   Global Step: 818030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:39:44,262-Speed 2616.66 samples/sec   Loss 1.1123   LearningRate 0.0000   Epoch: 19   Global Step: 818040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:39:48,158-Speed 2628.91 samples/sec   Loss 1.1316   LearningRate 0.0000   Epoch: 19   Global Step: 818050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:39:52,054-Speed 2629.62 samples/sec   Loss 1.0765   LearningRate 0.0000   Epoch: 19   Global Step: 818060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:39:55,947-Speed 2630.49 samples/sec   Loss 1.1027   LearningRate 0.0000   Epoch: 19   Global Step: 818070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:39:59,873-Speed 2609.32 samples/sec   Loss 1.1079   LearningRate 0.0000   Epoch: 19   Global Step: 818080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:40:03,761-Speed 2634.54 samples/sec   Loss 1.0913   LearningRate 0.0000   Epoch: 19   Global Step: 818090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:40:07,666-Speed 2623.08 samples/sec   Loss 1.1392   LearningRate 0.0000   Epoch: 19   Global Step: 818100   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:40:11,565-Speed 2626.34 samples/sec   Loss 1.1270   LearningRate 0.0000   Epoch: 19   Global Step: 818110   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:40:15,458-Speed 2631.46 samples/sec   Loss 1.0550   LearningRate 0.0000   Epoch: 19   Global Step: 818120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:19,356-Speed 2627.47 samples/sec   Loss 1.0891   LearningRate 0.0000   Epoch: 19   Global Step: 818130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:23,249-Speed 2631.27 samples/sec   Loss 1.0932   LearningRate 0.0000   Epoch: 19   Global Step: 818140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:27,145-Speed 2628.88 samples/sec   Loss 1.1459   LearningRate 0.0000   Epoch: 19   Global Step: 818150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:31,036-Speed 2633.20 samples/sec   Loss 1.0826   LearningRate 0.0000   Epoch: 19   Global Step: 818160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:34,928-Speed 2631.49 samples/sec   Loss 1.1278   LearningRate 0.0000   Epoch: 19   Global Step: 818170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:38,825-Speed 2628.21 samples/sec   Loss 1.1276   LearningRate 0.0000   Epoch: 19   Global Step: 818180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:42,721-Speed 2628.93 samples/sec   Loss 1.0847   LearningRate 0.0000   Epoch: 19   Global Step: 818190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:46,621-Speed 2626.64 samples/sec   Loss 1.1104   LearningRate 0.0000   Epoch: 19   Global Step: 818200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:50,534-Speed 2617.46 samples/sec   Loss 1.1707   LearningRate 0.0000   Epoch: 19   Global Step: 818210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:40:54,454-Speed 2612.89 samples/sec   Loss 1.1494   LearningRate 0.0000   Epoch: 19   Global Step: 818220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:40:58,330-Speed 2643.21 samples/sec   Loss 1.1090   LearningRate 0.0000   Epoch: 19   Global Step: 818230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:02,243-Speed 2617.39 samples/sec   Loss 1.1029   LearningRate 0.0000   Epoch: 19   Global Step: 818240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:06,162-Speed 2613.81 samples/sec   Loss 1.0721   LearningRate 0.0000   Epoch: 19   Global Step: 818250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:10,062-Speed 2625.91 samples/sec   Loss 1.1150   LearningRate 0.0000   Epoch: 19   Global Step: 818260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:13,953-Speed 2632.68 samples/sec   Loss 1.1075   LearningRate 0.0000   Epoch: 19   Global Step: 818270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:17,897-Speed 2597.41 samples/sec   Loss 1.1340   LearningRate 0.0000   Epoch: 19   Global Step: 818280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:21,799-Speed 2625.55 samples/sec   Loss 1.0851   LearningRate 0.0000   Epoch: 19   Global Step: 818290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:25,698-Speed 2626.79 samples/sec   Loss 1.0758   LearningRate 0.0000   Epoch: 19   Global Step: 818300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:29,615-Speed 2615.37 samples/sec   Loss 1.1091   LearningRate 0.0000   Epoch: 19   Global Step: 818310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:33,508-Speed 2631.12 samples/sec   Loss 1.0787   LearningRate 0.0000   Epoch: 19   Global Step: 818320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:37,402-Speed 2629.61 samples/sec   Loss 1.1099   LearningRate 0.0000   Epoch: 19   Global Step: 818330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:41:41,280-Speed 2641.41 samples/sec   Loss 1.1127   LearningRate 0.0000   Epoch: 19   Global Step: 818340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:41:45,163-Speed 2642.98 samples/sec   Loss 1.1243   LearningRate 0.0000   Epoch: 19   Global Step: 818350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:41:49,059-Speed 2628.51 samples/sec   Loss 1.0754   LearningRate 0.0000   Epoch: 19   Global Step: 818360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:41:52,956-Speed 2629.26 samples/sec   Loss 1.0886   LearningRate 0.0000   Epoch: 19   Global Step: 818370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:41:56,847-Speed 2631.67 samples/sec   Loss 1.1106   LearningRate 0.0000   Epoch: 19   Global Step: 818380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:42:00,741-Speed 2631.48 samples/sec   Loss 1.1426   LearningRate 0.0000   Epoch: 19   Global Step: 818390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:42:04,637-Speed 2628.96 samples/sec   Loss 1.1086   LearningRate 0.0000   Epoch: 19   Global Step: 818400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:42:08,554-Speed 2614.57 samples/sec   Loss 1.1334   LearningRate 0.0000   Epoch: 19   Global Step: 818410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:42:12,450-Speed 2628.95 samples/sec   Loss 1.1734   LearningRate 0.0000   Epoch: 19   Global Step: 818420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:42:16,343-Speed 2631.15 samples/sec   Loss 1.1370   LearningRate 0.0000   Epoch: 19   Global Step: 818430   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:42:20,239-Speed 2629.64 samples/sec   Loss 1.1245   LearningRate 0.0000   Epoch: 19   Global Step: 818440   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:42:24,139-Speed 2625.67 samples/sec   Loss 1.1442   LearningRate 0.0000   Epoch: 19   Global Step: 818450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:28,063-Speed 2610.94 samples/sec   Loss 1.1324   LearningRate 0.0000   Epoch: 19   Global Step: 818460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:31,957-Speed 2630.38 samples/sec   Loss 1.0966   LearningRate 0.0000   Epoch: 19   Global Step: 818470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:35,851-Speed 2630.35 samples/sec   Loss 1.0835   LearningRate 0.0000   Epoch: 19   Global Step: 818480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:39,744-Speed 2630.64 samples/sec   Loss 1.0911   LearningRate 0.0000   Epoch: 19   Global Step: 818490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:43,643-Speed 2627.42 samples/sec   Loss 1.1420   LearningRate 0.0000   Epoch: 19   Global Step: 818500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:47,547-Speed 2623.83 samples/sec   Loss 1.0856   LearningRate 0.0000   Epoch: 19   Global Step: 818510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:51,441-Speed 2630.02 samples/sec   Loss 1.1633   LearningRate 0.0000   Epoch: 19   Global Step: 818520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:55,335-Speed 2630.78 samples/sec   Loss 1.0908   LearningRate 0.0000   Epoch: 19   Global Step: 818530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:42:59,235-Speed 2626.85 samples/sec   Loss 1.1217   LearningRate 0.0000   Epoch: 19   Global Step: 818540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:43:03,146-Speed 2618.81 samples/sec   Loss 1.1260   LearningRate 0.0000   Epoch: 19   Global Step: 818550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:43:07,072-Speed 2608.89 samples/sec   Loss 1.0963   LearningRate 0.0000   Epoch: 19   Global Step: 818560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:43:10,966-Speed 2630.63 samples/sec   Loss 1.1112   LearningRate 0.0000   Epoch: 19   Global Step: 818570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:43:14,862-Speed 2628.49 samples/sec   Loss 1.0954   LearningRate 0.0000   Epoch: 19   Global Step: 818580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:43:18,732-Speed 2647.05 samples/sec   Loss 1.0751   LearningRate 0.0000   Epoch: 19   Global Step: 818590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:43:22,602-Speed 2646.37 samples/sec   Loss 1.0552   LearningRate 0.0000   Epoch: 19   Global Step: 818600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:26,503-Speed 2626.41 samples/sec   Loss 1.1231   LearningRate 0.0000   Epoch: 19   Global Step: 818610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:30,406-Speed 2624.19 samples/sec   Loss 1.1212   LearningRate 0.0000   Epoch: 19   Global Step: 818620   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:34,336-Speed 2606.12 samples/sec   Loss 1.1138   LearningRate 0.0000   Epoch: 19   Global Step: 818630   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:38,232-Speed 2629.10 samples/sec   Loss 1.0959   LearningRate 0.0000   Epoch: 19   Global Step: 818640   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:42,136-Speed 2623.73 samples/sec   Loss 1.1166   LearningRate 0.0000   Epoch: 19   Global Step: 818650   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:46,026-Speed 2632.81 samples/sec   Loss 1.0545   LearningRate 0.0000   Epoch: 19   Global Step: 818660   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:49,919-Speed 2631.54 samples/sec   Loss 1.1138   LearningRate 0.0000   Epoch: 19   Global Step: 818670   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:53,814-Speed 2629.06 samples/sec   Loss 1.1258   LearningRate 0.0000   Epoch: 19   Global Step: 818680   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:43:57,706-Speed 2632.16 samples/sec   Loss 1.1588   LearningRate 0.0000   Epoch: 19   Global Step: 818690   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:44:01,602-Speed 2629.02 samples/sec   Loss 1.0961   LearningRate 0.0000   Epoch: 19   Global Step: 818700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:05,506-Speed 2623.98 samples/sec   Loss 1.0820   LearningRate 0.0000   Epoch: 19   Global Step: 818710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:09,402-Speed 2628.86 samples/sec   Loss 1.0673   LearningRate 0.0000   Epoch: 19   Global Step: 818720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:13,303-Speed 2625.43 samples/sec   Loss 1.1587   LearningRate 0.0000   Epoch: 19   Global Step: 818730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:17,197-Speed 2629.80 samples/sec   Loss 1.0679   LearningRate 0.0000   Epoch: 19   Global Step: 818740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:21,095-Speed 2628.34 samples/sec   Loss 1.0824   LearningRate 0.0000   Epoch: 19   Global Step: 818750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:24,988-Speed 2631.26 samples/sec   Loss 1.1071   LearningRate 0.0000   Epoch: 19   Global Step: 818760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:28,881-Speed 2630.58 samples/sec   Loss 1.1378   LearningRate 0.0000   Epoch: 19   Global Step: 818770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:32,795-Speed 2616.92 samples/sec   Loss 1.1198   LearningRate 0.0000   Epoch: 19   Global Step: 818780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:36,832-Speed 2537.25 samples/sec   Loss 1.0594   LearningRate 0.0000   Epoch: 19   Global Step: 818790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:40,722-Speed 2633.30 samples/sec   Loss 1.0431   LearningRate 0.0000   Epoch: 19   Global Step: 818800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:44:44,609-Speed 2635.14 samples/sec   Loss 1.1068   LearningRate 0.0000   Epoch: 19   Global Step: 818810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:48,510-Speed 2626.45 samples/sec   Loss 1.1460   LearningRate 0.0000   Epoch: 19   Global Step: 818820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:52,410-Speed 2626.18 samples/sec   Loss 1.0881   LearningRate 0.0000   Epoch: 19   Global Step: 818830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:44:56,319-Speed 2620.14 samples/sec   Loss 1.0813   LearningRate 0.0000   Epoch: 19   Global Step: 818840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:00,213-Speed 2630.99 samples/sec   Loss 1.1378   LearningRate 0.0000   Epoch: 19   Global Step: 818850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:04,110-Speed 2627.92 samples/sec   Loss 1.0880   LearningRate 0.0000   Epoch: 19   Global Step: 818860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:08,008-Speed 2627.54 samples/sec   Loss 1.1334   LearningRate 0.0000   Epoch: 19   Global Step: 818870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:11,927-Speed 2614.05 samples/sec   Loss 1.1074   LearningRate 0.0000   Epoch: 19   Global Step: 818880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:15,820-Speed 2631.09 samples/sec   Loss 1.0739   LearningRate 0.0000   Epoch: 19   Global Step: 818890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:19,715-Speed 2629.83 samples/sec   Loss 1.0973   LearningRate 0.0000   Epoch: 19   Global Step: 818900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:23,613-Speed 2627.95 samples/sec   Loss 1.1021   LearningRate 0.0000   Epoch: 19   Global Step: 818910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:45:27,504-Speed 2632.31 samples/sec   Loss 1.0372   LearningRate 0.0000   Epoch: 19   Global Step: 818920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:45:31,399-Speed 2630.42 samples/sec   Loss 1.0967   LearningRate 0.0000   Epoch: 19   Global Step: 818930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:45:35,279-Speed 2639.49 samples/sec   Loss 1.1134   LearningRate 0.0000   Epoch: 19   Global Step: 818940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:39,180-Speed 2625.91 samples/sec   Loss 1.1467   LearningRate 0.0000   Epoch: 19   Global Step: 818950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:43,077-Speed 2627.85 samples/sec   Loss 1.1114   LearningRate 0.0000   Epoch: 19   Global Step: 818960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:47,145-Speed 2517.95 samples/sec   Loss 1.1369   LearningRate 0.0000   Epoch: 19   Global Step: 818970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:51,037-Speed 2631.81 samples/sec   Loss 1.1561   LearningRate 0.0000   Epoch: 19   Global Step: 818980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:54,929-Speed 2631.88 samples/sec   Loss 1.1074   LearningRate 0.0000   Epoch: 19   Global Step: 818990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:45:58,872-Speed 2598.00 samples/sec   Loss 1.1213   LearningRate 0.0000   Epoch: 19   Global Step: 819000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:02,762-Speed 2633.42 samples/sec   Loss 1.1414   LearningRate 0.0000   Epoch: 19   Global Step: 819010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:06,657-Speed 2629.49 samples/sec   Loss 1.1044   LearningRate 0.0000   Epoch: 19   Global Step: 819020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:10,580-Speed 2611.07 samples/sec   Loss 1.0687   LearningRate 0.0000   Epoch: 19   Global Step: 819030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:14,498-Speed 2614.69 samples/sec   Loss 1.0767   LearningRate 0.0000   Epoch: 19   Global Step: 819040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:46:18,376-Speed 2641.50 samples/sec   Loss 1.1247   LearningRate 0.0000   Epoch: 19   Global Step: 819050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:22,272-Speed 2629.69 samples/sec   Loss 1.1518   LearningRate 0.0000   Epoch: 19   Global Step: 819060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:26,185-Speed 2617.33 samples/sec   Loss 1.0552   LearningRate 0.0000   Epoch: 19   Global Step: 819070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:30,079-Speed 2630.00 samples/sec   Loss 1.1081   LearningRate 0.0000   Epoch: 19   Global Step: 819080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:33,977-Speed 2628.16 samples/sec   Loss 1.1345   LearningRate 0.0000   Epoch: 19   Global Step: 819090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:37,872-Speed 2629.34 samples/sec   Loss 1.1099   LearningRate 0.0000   Epoch: 19   Global Step: 819100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:41,772-Speed 2625.88 samples/sec   Loss 1.0847   LearningRate 0.0000   Epoch: 19   Global Step: 819110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:45,673-Speed 2625.87 samples/sec   Loss 1.1266   LearningRate 0.0000   Epoch: 19   Global Step: 819120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:49,577-Speed 2623.84 samples/sec   Loss 1.0970   LearningRate 0.0000   Epoch: 19   Global Step: 819130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:53,472-Speed 2629.66 samples/sec   Loss 1.0953   LearningRate 0.0000   Epoch: 19   Global Step: 819140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:46:57,392-Speed 2612.75 samples/sec   Loss 1.1421   LearningRate 0.0000   Epoch: 19   Global Step: 819150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:47:01,292-Speed 2626.30 samples/sec   Loss 1.0896   LearningRate 0.0000   Epoch: 19   Global Step: 819160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:47:05,166-Speed 2644.29 samples/sec   Loss 1.1414   LearningRate 0.0000   Epoch: 19   Global Step: 819170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:09,056-Speed 2632.63 samples/sec   Loss 1.0731   LearningRate 0.0000   Epoch: 19   Global Step: 819180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:12,950-Speed 2630.29 samples/sec   Loss 1.1369   LearningRate 0.0000   Epoch: 19   Global Step: 819190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:16,840-Speed 2633.56 samples/sec   Loss 1.1058   LearningRate 0.0000   Epoch: 19   Global Step: 819200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:20,732-Speed 2631.29 samples/sec   Loss 1.0951   LearningRate 0.0000   Epoch: 19   Global Step: 819210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:24,664-Speed 2604.97 samples/sec   Loss 1.1265   LearningRate 0.0000   Epoch: 19   Global Step: 819220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:28,579-Speed 2616.61 samples/sec   Loss 1.0989   LearningRate 0.0000   Epoch: 19   Global Step: 819230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:32,479-Speed 2626.21 samples/sec   Loss 1.0818   LearningRate 0.0000   Epoch: 19   Global Step: 819240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:36,378-Speed 2626.49 samples/sec   Loss 1.0861   LearningRate 0.0000   Epoch: 19   Global Step: 819250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:40,270-Speed 2631.64 samples/sec   Loss 1.1345   LearningRate 0.0000   Epoch: 19   Global Step: 819260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:44,165-Speed 2630.61 samples/sec   Loss 1.1017   LearningRate 0.0000   Epoch: 19   Global Step: 819270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:47:48,033-Speed 2647.67 samples/sec   Loss 1.1361   LearningRate 0.0000   Epoch: 19   Global Step: 819280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:51,929-Speed 2629.12 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 19   Global Step: 819290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:55,835-Speed 2622.73 samples/sec   Loss 1.0814   LearningRate 0.0000   Epoch: 19   Global Step: 819300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:47:59,761-Speed 2609.16 samples/sec   Loss 1.1217   LearningRate 0.0000   Epoch: 19   Global Step: 819310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:03,660-Speed 2626.73 samples/sec   Loss 1.0843   LearningRate 0.0000   Epoch: 19   Global Step: 819320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:07,554-Speed 2630.52 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 19   Global Step: 819330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:11,448-Speed 2629.84 samples/sec   Loss 1.0953   LearningRate 0.0000   Epoch: 19   Global Step: 819340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:15,360-Speed 2618.50 samples/sec   Loss 1.1070   LearningRate 0.0000   Epoch: 19   Global Step: 819350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:19,265-Speed 2623.42 samples/sec   Loss 1.1182   LearningRate 0.0000   Epoch: 19   Global Step: 819360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:23,164-Speed 2626.94 samples/sec   Loss 1.0914   LearningRate 0.0000   Epoch: 19   Global Step: 819370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:27,059-Speed 2629.48 samples/sec   Loss 1.0802   LearningRate 0.0000   Epoch: 19   Global Step: 819380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:48:30,966-Speed 2622.33 samples/sec   Loss 1.1246   LearningRate 0.0000   Epoch: 19   Global Step: 819390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:48:34,862-Speed 2628.52 samples/sec   Loss 1.1355   LearningRate 0.0000   Epoch: 19   Global Step: 819400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:48:38,761-Speed 2628.60 samples/sec   Loss 1.0654   LearningRate 0.0000   Epoch: 19   Global Step: 819410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:48:42,679-Speed 2614.23 samples/sec   Loss 1.0788   LearningRate 0.0000   Epoch: 19   Global Step: 819420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:48:46,549-Speed 2646.89 samples/sec   Loss 1.0914   LearningRate 0.0000   Epoch: 19   Global Step: 819430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:50,443-Speed 2629.88 samples/sec   Loss 1.1395   LearningRate 0.0000   Epoch: 19   Global Step: 819440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:54,348-Speed 2623.72 samples/sec   Loss 1.1241   LearningRate 0.0000   Epoch: 19   Global Step: 819450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:48:58,245-Speed 2627.57 samples/sec   Loss 1.0783   LearningRate 0.0000   Epoch: 19   Global Step: 819460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:02,144-Speed 2627.87 samples/sec   Loss 1.0871   LearningRate 0.0000   Epoch: 19   Global Step: 819470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:06,049-Speed 2622.71 samples/sec   Loss 1.0862   LearningRate 0.0000   Epoch: 19   Global Step: 819480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:09,964-Speed 2616.98 samples/sec   Loss 1.0647   LearningRate 0.0000   Epoch: 19   Global Step: 819490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:13,857-Speed 2630.45 samples/sec   Loss 1.0764   LearningRate 0.0000   Epoch: 19   Global Step: 819500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:17,781-Speed 2610.08 samples/sec   Loss 1.0968   LearningRate 0.0000   Epoch: 19   Global Step: 819510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:21,683-Speed 2625.36 samples/sec   Loss 1.0878   LearningRate 0.0000   Epoch: 19   Global Step: 819520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:25,576-Speed 2631.26 samples/sec   Loss 1.0504   LearningRate 0.0000   Epoch: 19   Global Step: 819530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:49:29,451-Speed 2643.01 samples/sec   Loss 1.0469   LearningRate 0.0000   Epoch: 19   Global Step: 819540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:33,349-Speed 2627.72 samples/sec   Loss 1.1001   LearningRate 0.0000   Epoch: 19   Global Step: 819550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:37,245-Speed 2628.92 samples/sec   Loss 1.0419   LearningRate 0.0000   Epoch: 19   Global Step: 819560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:41,135-Speed 2633.40 samples/sec   Loss 1.1533   LearningRate 0.0000   Epoch: 19   Global Step: 819570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:45,030-Speed 2629.85 samples/sec   Loss 1.1005   LearningRate 0.0000   Epoch: 19   Global Step: 819580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:48,936-Speed 2622.11 samples/sec   Loss 1.0849   LearningRate 0.0000   Epoch: 19   Global Step: 819590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:52,833-Speed 2628.16 samples/sec   Loss 1.0798   LearningRate 0.0000   Epoch: 19   Global Step: 819600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:49:56,738-Speed 2623.34 samples/sec   Loss 1.1042   LearningRate 0.0000   Epoch: 19   Global Step: 819610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:00,631-Speed 2630.75 samples/sec   Loss 1.0890   LearningRate 0.0000   Epoch: 19   Global Step: 819620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:04,533-Speed 2625.61 samples/sec   Loss 1.0703   LearningRate 0.0000   Epoch: 19   Global Step: 819630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:08,453-Speed 2612.28 samples/sec   Loss 1.1374   LearningRate 0.0000   Epoch: 19   Global Step: 819640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:12,347-Speed 2630.89 samples/sec   Loss 1.0548   LearningRate 0.0000   Epoch: 19   Global Step: 819650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:16,248-Speed 2625.78 samples/sec   Loss 1.1009   LearningRate 0.0000   Epoch: 19   Global Step: 819660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:20,151-Speed 2624.23 samples/sec   Loss 1.1255   LearningRate 0.0000   Epoch: 19   Global Step: 819670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:24,039-Speed 2634.02 samples/sec   Loss 1.1235   LearningRate 0.0000   Epoch: 19   Global Step: 819680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:27,930-Speed 2632.34 samples/sec   Loss 1.0745   LearningRate 0.0000   Epoch: 19   Global Step: 819690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:31,825-Speed 2629.99 samples/sec   Loss 1.0670   LearningRate 0.0000   Epoch: 19   Global Step: 819700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:35,732-Speed 2621.38 samples/sec   Loss 1.0775   LearningRate 0.0000   Epoch: 19   Global Step: 819710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:39,626-Speed 2630.52 samples/sec   Loss 1.1113   LearningRate 0.0000   Epoch: 19   Global Step: 819720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:43,532-Speed 2621.87 samples/sec   Loss 1.1444   LearningRate 0.0000   Epoch: 19   Global Step: 819730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:50:47,425-Speed 2631.31 samples/sec   Loss 1.1014   LearningRate 0.0000   Epoch: 19   Global Step: 819740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:50:51,330-Speed 2622.83 samples/sec   Loss 1.1069   LearningRate 0.0000   Epoch: 19   Global Step: 819750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:50:55,232-Speed 2625.16 samples/sec   Loss 1.1626   LearningRate 0.0000   Epoch: 19   Global Step: 819760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:50:59,128-Speed 2629.30 samples/sec   Loss 1.0852   LearningRate 0.0000   Epoch: 19   Global Step: 819770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:51:03,025-Speed 2628.17 samples/sec   Loss 1.0530   LearningRate 0.0000   Epoch: 19   Global Step: 819780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:06,994-Speed 2580.18 samples/sec   Loss 1.1010   LearningRate 0.0000   Epoch: 19   Global Step: 819790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:10,886-Speed 2632.27 samples/sec   Loss 1.1353   LearningRate 0.0000   Epoch: 19   Global Step: 819800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:14,791-Speed 2622.46 samples/sec   Loss 1.0626   LearningRate 0.0000   Epoch: 19   Global Step: 819810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:18,683-Speed 2631.29 samples/sec   Loss 1.0884   LearningRate 0.0000   Epoch: 19   Global Step: 819820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:22,577-Speed 2631.15 samples/sec   Loss 1.1289   LearningRate 0.0000   Epoch: 19   Global Step: 819830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:26,469-Speed 2631.52 samples/sec   Loss 1.0958   LearningRate 0.0000   Epoch: 19   Global Step: 819840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:30,370-Speed 2625.72 samples/sec   Loss 1.0766   LearningRate 0.0000   Epoch: 19   Global Step: 819850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:34,266-Speed 2628.97 samples/sec   Loss 1.1108   LearningRate 0.0000   Epoch: 19   Global Step: 819860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:38,155-Speed 2633.95 samples/sec   Loss 1.0762   LearningRate 0.0000   Epoch: 19   Global Step: 819870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:42,048-Speed 2631.11 samples/sec   Loss 1.1024   LearningRate 0.0000   Epoch: 19   Global Step: 819880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:51:45,918-Speed 2646.35 samples/sec   Loss 1.1290   LearningRate 0.0000   Epoch: 19   Global Step: 819890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:49,808-Speed 2633.42 samples/sec   Loss 1.0555   LearningRate 0.0000   Epoch: 19   Global Step: 819900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:53,704-Speed 2629.55 samples/sec   Loss 1.1153   LearningRate 0.0000   Epoch: 19   Global Step: 819910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:51:57,611-Speed 2621.28 samples/sec   Loss 1.0808   LearningRate 0.0000   Epoch: 19   Global Step: 819920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:52:01,518-Speed 2621.80 samples/sec   Loss 1.1366   LearningRate 0.0000   Epoch: 19   Global Step: 819930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:52:05,412-Speed 2630.18 samples/sec   Loss 1.0931   LearningRate 0.0000   Epoch: 19   Global Step: 819940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:52:09,302-Speed 2632.98 samples/sec   Loss 1.0987   LearningRate 0.0000   Epoch: 19   Global Step: 819950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:52:13,202-Speed 2626.10 samples/sec   Loss 1.0681   LearningRate 0.0000   Epoch: 19   Global Step: 819960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:52:17,113-Speed 2619.30 samples/sec   Loss 1.1385   LearningRate 0.0000   Epoch: 19   Global Step: 819970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:52:21,141-Speed 2542.13 samples/sec   Loss 1.1105   LearningRate 0.0000   Epoch: 19   Global Step: 819980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:52:25,118-Speed 2576.13 samples/sec   Loss 1.1200   LearningRate 0.0000   Epoch: 19   Global Step: 819990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:52:29,083-Speed 2583.60 samples/sec   Loss 1.1177   LearningRate 0.0000   Epoch: 19   Global Step: 820000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:53:12,306-[lfw][820000]XNorm: 21.196989
Training: 2022-04-16 15:53:12,307-[lfw][820000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-16 15:53:12,308-[lfw][820000]Accuracy-Highest: 0.99850
Training: 2022-04-16 15:54:02,368-[cfp_fp][820000]XNorm: 21.790877
Training: 2022-04-16 15:54:02,369-[cfp_fp][820000]Accuracy-Flip: 0.99371+-0.00351
Training: 2022-04-16 15:54:02,370-[cfp_fp][820000]Accuracy-Highest: 0.99400
Training: 2022-04-16 15:54:45,461-[agedb_30][820000]XNorm: 22.201931
Training: 2022-04-16 15:54:45,462-[agedb_30][820000]Accuracy-Flip: 0.98600+-0.00507
Training: 2022-04-16 15:54:45,463-[agedb_30][820000]Accuracy-Highest: 0.98600
Training: 2022-04-16 15:54:49,338-Speed 73.01 samples/sec   Loss 1.0807   LearningRate 0.0000   Epoch: 19   Global Step: 820010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:54:53,207-Speed 2647.22 samples/sec   Loss 1.0873   LearningRate 0.0000   Epoch: 19   Global Step: 820020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:54:57,093-Speed 2635.97 samples/sec   Loss 1.0816   LearningRate 0.0000   Epoch: 19   Global Step: 820030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:00,970-Speed 2641.66 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 820040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:04,850-Speed 2639.61 samples/sec   Loss 1.0711   LearningRate 0.0000   Epoch: 19   Global Step: 820050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:08,734-Speed 2637.19 samples/sec   Loss 1.1581   LearningRate 0.0000   Epoch: 19   Global Step: 820060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:12,637-Speed 2625.85 samples/sec   Loss 1.1116   LearningRate 0.0000   Epoch: 19   Global Step: 820070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:16,533-Speed 2629.00 samples/sec   Loss 1.0649   LearningRate 0.0000   Epoch: 19   Global Step: 820080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:20,428-Speed 2630.00 samples/sec   Loss 1.1328   LearningRate 0.0000   Epoch: 19   Global Step: 820090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:24,322-Speed 2630.38 samples/sec   Loss 1.0866   LearningRate 0.0000   Epoch: 19   Global Step: 820100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:55:28,192-Speed 2646.76 samples/sec   Loss 1.1344   LearningRate 0.0000   Epoch: 19   Global Step: 820110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:32,085-Speed 2630.98 samples/sec   Loss 1.0972   LearningRate 0.0000   Epoch: 19   Global Step: 820120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:35,978-Speed 2631.04 samples/sec   Loss 1.0637   LearningRate 0.0000   Epoch: 19   Global Step: 820130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:39,875-Speed 2628.51 samples/sec   Loss 1.1296   LearningRate 0.0000   Epoch: 19   Global Step: 820140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:43,784-Speed 2620.05 samples/sec   Loss 1.0946   LearningRate 0.0000   Epoch: 19   Global Step: 820150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:47,693-Speed 2620.44 samples/sec   Loss 1.0515   LearningRate 0.0000   Epoch: 19   Global Step: 820160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:51,585-Speed 2631.73 samples/sec   Loss 1.1334   LearningRate 0.0000   Epoch: 19   Global Step: 820170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:55,482-Speed 2628.36 samples/sec   Loss 1.0969   LearningRate 0.0000   Epoch: 19   Global Step: 820180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:55:59,392-Speed 2619.97 samples/sec   Loss 1.1442   LearningRate 0.0000   Epoch: 19   Global Step: 820190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:03,286-Speed 2630.21 samples/sec   Loss 1.1688   LearningRate 0.0000   Epoch: 19   Global Step: 820200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:07,179-Speed 2630.74 samples/sec   Loss 1.1055   LearningRate 0.0000   Epoch: 19   Global Step: 820210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:56:11,088-Speed 2619.82 samples/sec   Loss 1.0974   LearningRate 0.0000   Epoch: 19   Global Step: 820220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:56:14,974-Speed 2636.53 samples/sec   Loss 1.0666   LearningRate 0.0000   Epoch: 19   Global Step: 820230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:18,867-Speed 2630.79 samples/sec   Loss 1.1221   LearningRate 0.0000   Epoch: 19   Global Step: 820240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:22,761-Speed 2630.97 samples/sec   Loss 1.0338   LearningRate 0.0000   Epoch: 19   Global Step: 820250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:26,655-Speed 2630.51 samples/sec   Loss 1.1428   LearningRate 0.0000   Epoch: 19   Global Step: 820260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:30,557-Speed 2624.85 samples/sec   Loss 1.1098   LearningRate 0.0000   Epoch: 19   Global Step: 820270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:34,458-Speed 2626.07 samples/sec   Loss 1.1314   LearningRate 0.0000   Epoch: 19   Global Step: 820280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:38,359-Speed 2625.61 samples/sec   Loss 1.1167   LearningRate 0.0000   Epoch: 19   Global Step: 820290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:42,253-Speed 2630.00 samples/sec   Loss 1.1521   LearningRate 0.0000   Epoch: 19   Global Step: 820300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:46,200-Speed 2595.63 samples/sec   Loss 1.1150   LearningRate 0.0000   Epoch: 19   Global Step: 820310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:50,272-Speed 2515.37 samples/sec   Loss 1.0525   LearningRate 0.0000   Epoch: 19   Global Step: 820320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:54,141-Speed 2647.47 samples/sec   Loss 1.1257   LearningRate 0.0000   Epoch: 19   Global Step: 820330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:56:58,185-Speed 2532.42 samples/sec   Loss 1.0837   LearningRate 0.0000   Epoch: 19   Global Step: 820340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:02,151-Speed 2583.04 samples/sec   Loss 1.1010   LearningRate 0.0000   Epoch: 19   Global Step: 820350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:06,063-Speed 2618.47 samples/sec   Loss 1.1378   LearningRate 0.0000   Epoch: 19   Global Step: 820360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:09,956-Speed 2630.70 samples/sec   Loss 1.0661   LearningRate 0.0000   Epoch: 19   Global Step: 820370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:13,945-Speed 2567.76 samples/sec   Loss 1.0872   LearningRate 0.0000   Epoch: 19   Global Step: 820380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:17,836-Speed 2632.32 samples/sec   Loss 1.0415   LearningRate 0.0000   Epoch: 19   Global Step: 820390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:21,726-Speed 2633.36 samples/sec   Loss 1.1445   LearningRate 0.0000   Epoch: 19   Global Step: 820400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:25,615-Speed 2633.53 samples/sec   Loss 1.1283   LearningRate 0.0000   Epoch: 19   Global Step: 820410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:29,515-Speed 2625.85 samples/sec   Loss 1.1203   LearningRate 0.0000   Epoch: 19   Global Step: 820420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:33,409-Speed 2630.59 samples/sec   Loss 1.0728   LearningRate 0.0000   Epoch: 19   Global Step: 820430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:57:37,309-Speed 2626.96 samples/sec   Loss 1.0803   LearningRate 0.0000   Epoch: 19   Global Step: 820440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:57:41,215-Speed 2622.44 samples/sec   Loss 1.0849   LearningRate 0.0000   Epoch: 19   Global Step: 820450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 15:57:45,163-Speed 2594.83 samples/sec   Loss 1.1380   LearningRate 0.0000   Epoch: 19   Global Step: 820460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:57:49,029-Speed 2648.98 samples/sec   Loss 1.1023   LearningRate 0.0000   Epoch: 19   Global Step: 820470   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:57:52,918-Speed 2633.50 samples/sec   Loss 1.1274   LearningRate 0.0000   Epoch: 19   Global Step: 820480   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:57:56,953-Speed 2538.72 samples/sec   Loss 1.1193   LearningRate 0.0000   Epoch: 19   Global Step: 820490   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:01,039-Speed 2507.07 samples/sec   Loss 1.0992   LearningRate 0.0000   Epoch: 19   Global Step: 820500   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:05,035-Speed 2562.97 samples/sec   Loss 1.1108   LearningRate 0.0000   Epoch: 19   Global Step: 820510   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:08,922-Speed 2635.26 samples/sec   Loss 1.0992   LearningRate 0.0000   Epoch: 19   Global Step: 820520   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:12,811-Speed 2633.24 samples/sec   Loss 1.1155   LearningRate 0.0000   Epoch: 19   Global Step: 820530   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:16,720-Speed 2620.58 samples/sec   Loss 1.1226   LearningRate 0.0000   Epoch: 19   Global Step: 820540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:20,612-Speed 2632.09 samples/sec   Loss 1.0893   LearningRate 0.0000   Epoch: 19   Global Step: 820550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:24,501-Speed 2633.32 samples/sec   Loss 1.0981   LearningRate 0.0000   Epoch: 19   Global Step: 820560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:58:28,462-Speed 2586.03 samples/sec   Loss 1.0586   LearningRate 0.0000   Epoch: 19   Global Step: 820570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:32,549-Speed 2506.16 samples/sec   Loss 1.1369   LearningRate 0.0000   Epoch: 19   Global Step: 820580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:36,494-Speed 2596.02 samples/sec   Loss 1.1102   LearningRate 0.0000   Epoch: 19   Global Step: 820590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:40,390-Speed 2628.80 samples/sec   Loss 1.0967   LearningRate 0.0000   Epoch: 19   Global Step: 820600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:44,295-Speed 2622.87 samples/sec   Loss 1.0791   LearningRate 0.0000   Epoch: 19   Global Step: 820610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:48,188-Speed 2631.51 samples/sec   Loss 1.1347   LearningRate 0.0000   Epoch: 19   Global Step: 820620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:52,076-Speed 2634.21 samples/sec   Loss 1.0590   LearningRate 0.0000   Epoch: 19   Global Step: 820630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:55,968-Speed 2632.42 samples/sec   Loss 1.0729   LearningRate 0.0000   Epoch: 19   Global Step: 820640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:58:59,857-Speed 2633.18 samples/sec   Loss 1.0769   LearningRate 0.0000   Epoch: 19   Global Step: 820650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:59:03,746-Speed 2633.57 samples/sec   Loss 1.1245   LearningRate 0.0000   Epoch: 19   Global Step: 820660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:59:07,614-Speed 2648.17 samples/sec   Loss 1.0604   LearningRate 0.0000   Epoch: 19   Global Step: 820670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:59:11,484-Speed 2646.47 samples/sec   Loss 1.0954   LearningRate 0.0000   Epoch: 19   Global Step: 820680   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:15,378-Speed 2630.08 samples/sec   Loss 1.1000   LearningRate 0.0000   Epoch: 19   Global Step: 820690   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:19,277-Speed 2627.02 samples/sec   Loss 1.1290   LearningRate 0.0000   Epoch: 19   Global Step: 820700   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:23,166-Speed 2634.03 samples/sec   Loss 1.0954   LearningRate 0.0000   Epoch: 19   Global Step: 820710   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:27,056-Speed 2632.52 samples/sec   Loss 1.1014   LearningRate 0.0000   Epoch: 19   Global Step: 820720   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:30,946-Speed 2634.11 samples/sec   Loss 1.0957   LearningRate 0.0000   Epoch: 19   Global Step: 820730   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:34,842-Speed 2628.43 samples/sec   Loss 1.1002   LearningRate 0.0000   Epoch: 19   Global Step: 820740   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:38,733-Speed 2632.68 samples/sec   Loss 1.0962   LearningRate 0.0000   Epoch: 19   Global Step: 820750   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:42,653-Speed 2612.85 samples/sec   Loss 1.0741   LearningRate 0.0000   Epoch: 19   Global Step: 820760   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:46,565-Speed 2618.27 samples/sec   Loss 1.0996   LearningRate 0.0000   Epoch: 19   Global Step: 820770   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 15:59:50,467-Speed 2625.40 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 19   Global Step: 820780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:59:54,397-Speed 2606.25 samples/sec   Loss 1.0593   LearningRate 0.0000   Epoch: 19   Global Step: 820790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 15:59:58,296-Speed 2626.80 samples/sec   Loss 1.1210   LearningRate 0.0000   Epoch: 19   Global Step: 820800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:02,234-Speed 2601.41 samples/sec   Loss 1.1200   LearningRate 0.0000   Epoch: 19   Global Step: 820810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:06,124-Speed 2633.12 samples/sec   Loss 1.0893   LearningRate 0.0000   Epoch: 19   Global Step: 820820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:10,017-Speed 2630.95 samples/sec   Loss 1.0712   LearningRate 0.0000   Epoch: 19   Global Step: 820830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:13,912-Speed 2629.98 samples/sec   Loss 1.0967   LearningRate 0.0000   Epoch: 19   Global Step: 820840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:17,802-Speed 2632.84 samples/sec   Loss 1.1478   LearningRate 0.0000   Epoch: 19   Global Step: 820850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:21,698-Speed 2628.81 samples/sec   Loss 1.1227   LearningRate 0.0000   Epoch: 19   Global Step: 820860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:25,592-Speed 2630.96 samples/sec   Loss 1.1034   LearningRate 0.0000   Epoch: 19   Global Step: 820870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:29,493-Speed 2625.29 samples/sec   Loss 1.0962   LearningRate 0.0000   Epoch: 19   Global Step: 820880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:00:33,391-Speed 2628.40 samples/sec   Loss 1.0907   LearningRate 0.0000   Epoch: 19   Global Step: 820890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:00:37,259-Speed 2647.52 samples/sec   Loss 1.0974   LearningRate 0.0000   Epoch: 19   Global Step: 820900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:41,148-Speed 2633.87 samples/sec   Loss 1.0739   LearningRate 0.0000   Epoch: 19   Global Step: 820910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:45,040-Speed 2631.91 samples/sec   Loss 1.0955   LearningRate 0.0000   Epoch: 19   Global Step: 820920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:48,931-Speed 2632.09 samples/sec   Loss 1.1137   LearningRate 0.0000   Epoch: 19   Global Step: 820930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:52,827-Speed 2629.03 samples/sec   Loss 1.0936   LearningRate 0.0000   Epoch: 19   Global Step: 820940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:00:56,730-Speed 2624.14 samples/sec   Loss 1.1064   LearningRate 0.0000   Epoch: 19   Global Step: 820950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:01:00,625-Speed 2629.77 samples/sec   Loss 1.0825   LearningRate 0.0000   Epoch: 19   Global Step: 820960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:01:04,644-Speed 2548.89 samples/sec   Loss 1.1311   LearningRate 0.0000   Epoch: 19   Global Step: 820970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:01:08,723-Speed 2510.69 samples/sec   Loss 1.1425   LearningRate 0.0000   Epoch: 19   Global Step: 820980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:01:12,771-Speed 2530.14 samples/sec   Loss 1.1235   LearningRate 0.0000   Epoch: 19   Global Step: 820990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:01:16,639-Speed 2648.71 samples/sec   Loss 1.1192   LearningRate 0.0000   Epoch: 19   Global Step: 821000   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:20,532-Speed 2630.45 samples/sec   Loss 1.0909   LearningRate 0.0000   Epoch: 19   Global Step: 821010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:24,519-Speed 2569.27 samples/sec   Loss 1.1408   LearningRate 0.0000   Epoch: 19   Global Step: 821020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:28,545-Speed 2543.79 samples/sec   Loss 1.0938   LearningRate 0.0000   Epoch: 19   Global Step: 821030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:32,490-Speed 2596.41 samples/sec   Loss 1.0930   LearningRate 0.0000   Epoch: 19   Global Step: 821040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:36,389-Speed 2627.47 samples/sec   Loss 1.1510   LearningRate 0.0000   Epoch: 19   Global Step: 821050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:40,284-Speed 2629.64 samples/sec   Loss 1.0892   LearningRate 0.0000   Epoch: 19   Global Step: 821060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:44,183-Speed 2626.48 samples/sec   Loss 1.1080   LearningRate 0.0000   Epoch: 19   Global Step: 821070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:48,086-Speed 2624.64 samples/sec   Loss 1.1033   LearningRate 0.0000   Epoch: 19   Global Step: 821080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:51,981-Speed 2629.46 samples/sec   Loss 1.1057   LearningRate 0.0000   Epoch: 19   Global Step: 821090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:01:55,882-Speed 2625.49 samples/sec   Loss 1.1254   LearningRate 0.0000   Epoch: 19   Global Step: 821100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:01:59,786-Speed 2623.71 samples/sec   Loss 1.0458   LearningRate 0.0000   Epoch: 19   Global Step: 821110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:03,681-Speed 2629.72 samples/sec   Loss 1.0570   LearningRate 0.0000   Epoch: 19   Global Step: 821120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:07,574-Speed 2630.63 samples/sec   Loss 1.0818   LearningRate 0.0000   Epoch: 19   Global Step: 821130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:11,468-Speed 2630.59 samples/sec   Loss 1.0699   LearningRate 0.0000   Epoch: 19   Global Step: 821140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:15,362-Speed 2630.68 samples/sec   Loss 1.1422   LearningRate 0.0000   Epoch: 19   Global Step: 821150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:19,255-Speed 2630.55 samples/sec   Loss 1.0846   LearningRate 0.0000   Epoch: 19   Global Step: 821160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:23,147-Speed 2632.14 samples/sec   Loss 1.1092   LearningRate 0.0000   Epoch: 19   Global Step: 821170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:27,038-Speed 2632.21 samples/sec   Loss 1.1156   LearningRate 0.0000   Epoch: 19   Global Step: 821180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:30,928-Speed 2632.47 samples/sec   Loss 1.0753   LearningRate 0.0000   Epoch: 19   Global Step: 821190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:34,827-Speed 2627.19 samples/sec   Loss 1.1296   LearningRate 0.0000   Epoch: 19   Global Step: 821200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:02:38,735-Speed 2621.51 samples/sec   Loss 1.1473   LearningRate 0.0000   Epoch: 19   Global Step: 821210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:42,637-Speed 2624.90 samples/sec   Loss 1.1099   LearningRate 0.0000   Epoch: 19   Global Step: 821220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:46,545-Speed 2621.03 samples/sec   Loss 1.1205   LearningRate 0.0000   Epoch: 19   Global Step: 821230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:50,439-Speed 2630.55 samples/sec   Loss 1.0835   LearningRate 0.0000   Epoch: 19   Global Step: 821240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:54,327-Speed 2634.67 samples/sec   Loss 1.0914   LearningRate 0.0000   Epoch: 19   Global Step: 821250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:02:58,218-Speed 2631.73 samples/sec   Loss 1.1277   LearningRate 0.0000   Epoch: 19   Global Step: 821260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:02,109-Speed 2632.30 samples/sec   Loss 1.0750   LearningRate 0.0000   Epoch: 19   Global Step: 821270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:06,009-Speed 2626.56 samples/sec   Loss 1.1256   LearningRate 0.0000   Epoch: 19   Global Step: 821280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:09,901-Speed 2631.65 samples/sec   Loss 1.0863   LearningRate 0.0000   Epoch: 19   Global Step: 821290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:13,820-Speed 2614.21 samples/sec   Loss 1.1331   LearningRate 0.0000   Epoch: 19   Global Step: 821300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:17,722-Speed 2624.87 samples/sec   Loss 1.1104   LearningRate 0.0000   Epoch: 19   Global Step: 821310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:03:21,603-Speed 2639.69 samples/sec   Loss 1.1248   LearningRate 0.0000   Epoch: 19   Global Step: 821320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:25,498-Speed 2629.35 samples/sec   Loss 1.1195   LearningRate 0.0000   Epoch: 19   Global Step: 821330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:29,389-Speed 2632.49 samples/sec   Loss 1.0791   LearningRate 0.0000   Epoch: 19   Global Step: 821340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:03:33,256-Speed 2648.31 samples/sec   Loss 1.0703   LearningRate 0.0000   Epoch: 19   Global Step: 821350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:03:37,332-Speed 2513.42 samples/sec   Loss 1.0665   LearningRate 0.0000   Epoch: 19   Global Step: 821360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:03:41,389-Speed 2524.14 samples/sec   Loss 1.0822   LearningRate 0.0000   Epoch: 19   Global Step: 821370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:03:45,280-Speed 2634.87 samples/sec   Loss 1.0943   LearningRate 0.0000   Epoch: 19   Global Step: 821380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:03:49,177-Speed 2627.70 samples/sec   Loss 1.1449   LearningRate 0.0000   Epoch: 19   Global Step: 821390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:03:53,089-Speed 2618.73 samples/sec   Loss 1.1343   LearningRate 0.0000   Epoch: 19   Global Step: 821400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:03:56,980-Speed 2632.41 samples/sec   Loss 1.1488   LearningRate 0.0000   Epoch: 19   Global Step: 821410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:04:00,871-Speed 2632.43 samples/sec   Loss 1.0883   LearningRate 0.0000   Epoch: 19   Global Step: 821420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:04:04,769-Speed 2627.62 samples/sec   Loss 1.0749   LearningRate 0.0000   Epoch: 19   Global Step: 821430   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:04:08,675-Speed 2622.52 samples/sec   Loss 1.1362   LearningRate 0.0000   Epoch: 19   Global Step: 821440   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:04:12,570-Speed 2629.91 samples/sec   Loss 1.1021   LearningRate 0.0000   Epoch: 19   Global Step: 821450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:16,464-Speed 2630.36 samples/sec   Loss 1.1105   LearningRate 0.0000   Epoch: 19   Global Step: 821460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:20,377-Speed 2617.86 samples/sec   Loss 1.0877   LearningRate 0.0000   Epoch: 19   Global Step: 821470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:24,268-Speed 2632.53 samples/sec   Loss 1.1333   LearningRate 0.0000   Epoch: 19   Global Step: 821480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:28,154-Speed 2635.20 samples/sec   Loss 1.0706   LearningRate 0.0000   Epoch: 19   Global Step: 821490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:32,042-Speed 2634.23 samples/sec   Loss 1.1491   LearningRate 0.0000   Epoch: 19   Global Step: 821500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:35,937-Speed 2629.81 samples/sec   Loss 1.1172   LearningRate 0.0000   Epoch: 19   Global Step: 821510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:39,829-Speed 2632.34 samples/sec   Loss 1.1384   LearningRate 0.0000   Epoch: 19   Global Step: 821520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:43,722-Speed 2631.16 samples/sec   Loss 1.0879   LearningRate 0.0000   Epoch: 19   Global Step: 821530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:47,618-Speed 2628.65 samples/sec   Loss 1.1393   LearningRate 0.0000   Epoch: 19   Global Step: 821540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:51,505-Speed 2635.57 samples/sec   Loss 1.0619   LearningRate 0.0000   Epoch: 19   Global Step: 821550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:04:55,380-Speed 2643.73 samples/sec   Loss 1.0958   LearningRate 0.0000   Epoch: 19   Global Step: 821560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:04:59,301-Speed 2611.51 samples/sec   Loss 1.1221   LearningRate 0.0000   Epoch: 19   Global Step: 821570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:03,194-Speed 2631.32 samples/sec   Loss 1.1055   LearningRate 0.0000   Epoch: 19   Global Step: 821580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:07,085-Speed 2632.69 samples/sec   Loss 1.1151   LearningRate 0.0000   Epoch: 19   Global Step: 821590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:10,978-Speed 2630.99 samples/sec   Loss 1.0800   LearningRate 0.0000   Epoch: 19   Global Step: 821600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:14,891-Speed 2617.17 samples/sec   Loss 1.1183   LearningRate 0.0000   Epoch: 19   Global Step: 821610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:18,782-Speed 2632.69 samples/sec   Loss 1.0832   LearningRate 0.0000   Epoch: 19   Global Step: 821620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:22,679-Speed 2628.54 samples/sec   Loss 1.1241   LearningRate 0.0000   Epoch: 19   Global Step: 821630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:26,573-Speed 2630.42 samples/sec   Loss 1.0957   LearningRate 0.0000   Epoch: 19   Global Step: 821640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:30,470-Speed 2628.07 samples/sec   Loss 1.1124   LearningRate 0.0000   Epoch: 19   Global Step: 821650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:34,350-Speed 2639.42 samples/sec   Loss 1.0443   LearningRate 0.0000   Epoch: 19   Global Step: 821660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:38,255-Speed 2623.17 samples/sec   Loss 1.1261   LearningRate 0.0000   Epoch: 19   Global Step: 821670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:42,157-Speed 2625.20 samples/sec   Loss 1.0748   LearningRate 0.0000   Epoch: 19   Global Step: 821680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:46,051-Speed 2630.17 samples/sec   Loss 1.0818   LearningRate 0.0000   Epoch: 19   Global Step: 821690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:49,967-Speed 2616.22 samples/sec   Loss 1.0968   LearningRate 0.0000   Epoch: 19   Global Step: 821700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:54,057-Speed 2503.73 samples/sec   Loss 1.1367   LearningRate 0.0000   Epoch: 19   Global Step: 821710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:05:58,010-Speed 2591.84 samples/sec   Loss 1.1279   LearningRate 0.0000   Epoch: 19   Global Step: 821720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:06:01,913-Speed 2623.97 samples/sec   Loss 1.1339   LearningRate 0.0000   Epoch: 19   Global Step: 821730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:06:05,819-Speed 2621.92 samples/sec   Loss 1.1222   LearningRate 0.0000   Epoch: 19   Global Step: 821740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:06:09,761-Speed 2598.50 samples/sec   Loss 1.0956   LearningRate 0.0000   Epoch: 19   Global Step: 821750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:06:13,669-Speed 2621.42 samples/sec   Loss 1.0677   LearningRate 0.0000   Epoch: 19   Global Step: 821760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:17,567-Speed 2627.67 samples/sec   Loss 1.1092   LearningRate 0.0000   Epoch: 19   Global Step: 821770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:21,467-Speed 2626.03 samples/sec   Loss 1.0839   LearningRate 0.0000   Epoch: 19   Global Step: 821780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:25,366-Speed 2627.04 samples/sec   Loss 1.1154   LearningRate 0.0000   Epoch: 19   Global Step: 821790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:29,267-Speed 2626.00 samples/sec   Loss 1.0430   LearningRate 0.0000   Epoch: 19   Global Step: 821800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:33,166-Speed 2626.90 samples/sec   Loss 1.0634   LearningRate 0.0000   Epoch: 19   Global Step: 821810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:37,061-Speed 2629.62 samples/sec   Loss 1.0578   LearningRate 0.0000   Epoch: 19   Global Step: 821820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:40,957-Speed 2628.85 samples/sec   Loss 1.0887   LearningRate 0.0000   Epoch: 19   Global Step: 821830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:44,976-Speed 2548.72 samples/sec   Loss 1.0876   LearningRate 0.0000   Epoch: 19   Global Step: 821840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:48,914-Speed 2601.26 samples/sec   Loss 1.1035   LearningRate 0.0000   Epoch: 19   Global Step: 821850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:52,798-Speed 2637.03 samples/sec   Loss 1.1234   LearningRate 0.0000   Epoch: 19   Global Step: 821860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:06:56,670-Speed 2645.55 samples/sec   Loss 1.1189   LearningRate 0.0000   Epoch: 19   Global Step: 821870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:00,566-Speed 2629.12 samples/sec   Loss 1.1234   LearningRate 0.0000   Epoch: 19   Global Step: 821880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:04,484-Speed 2614.41 samples/sec   Loss 1.0702   LearningRate 0.0000   Epoch: 19   Global Step: 821890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:08,563-Speed 2511.17 samples/sec   Loss 1.0896   LearningRate 0.0000   Epoch: 19   Global Step: 821900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:12,464-Speed 2626.10 samples/sec   Loss 1.1451   LearningRate 0.0000   Epoch: 19   Global Step: 821910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:16,368-Speed 2623.35 samples/sec   Loss 1.1097   LearningRate 0.0000   Epoch: 19   Global Step: 821920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:20,288-Speed 2612.78 samples/sec   Loss 1.1169   LearningRate 0.0000   Epoch: 19   Global Step: 821930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:24,195-Speed 2622.06 samples/sec   Loss 1.0867   LearningRate 0.0000   Epoch: 19   Global Step: 821940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:28,095-Speed 2626.48 samples/sec   Loss 1.0639   LearningRate 0.0000   Epoch: 19   Global Step: 821950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:31,993-Speed 2627.42 samples/sec   Loss 1.1186   LearningRate 0.0000   Epoch: 19   Global Step: 821960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:07:35,888-Speed 2629.64 samples/sec   Loss 1.1250   LearningRate 0.0000   Epoch: 19   Global Step: 821970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:07:39,747-Speed 2654.43 samples/sec   Loss 1.0873   LearningRate 0.0000   Epoch: 19   Global Step: 821980   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:07:43,652-Speed 2623.01 samples/sec   Loss 1.0938   LearningRate 0.0000   Epoch: 19   Global Step: 821990   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:07:47,542-Speed 2632.50 samples/sec   Loss 1.0947   LearningRate 0.0000   Epoch: 19   Global Step: 822000   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:07:51,436-Speed 2631.06 samples/sec   Loss 1.0914   LearningRate 0.0000   Epoch: 19   Global Step: 822010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:07:55,327-Speed 2632.14 samples/sec   Loss 1.1242   LearningRate 0.0000   Epoch: 19   Global Step: 822020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:07:59,219-Speed 2631.80 samples/sec   Loss 1.1098   LearningRate 0.0000   Epoch: 19   Global Step: 822030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:08:03,118-Speed 2627.31 samples/sec   Loss 1.1506   LearningRate 0.0000   Epoch: 19   Global Step: 822040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:08:07,009-Speed 2632.07 samples/sec   Loss 1.0842   LearningRate 0.0000   Epoch: 19   Global Step: 822050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:08:10,902-Speed 2630.89 samples/sec   Loss 1.1379   LearningRate 0.0000   Epoch: 19   Global Step: 822060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:08:14,806-Speed 2623.40 samples/sec   Loss 1.1059   LearningRate 0.0000   Epoch: 19   Global Step: 822070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:08:18,719-Speed 2618.90 samples/sec   Loss 1.0238   LearningRate 0.0000   Epoch: 19   Global Step: 822080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:22,781-Speed 2521.33 samples/sec   Loss 1.1254   LearningRate 0.0000   Epoch: 19   Global Step: 822090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:26,737-Speed 2589.25 samples/sec   Loss 1.0578   LearningRate 0.0000   Epoch: 19   Global Step: 822100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:30,722-Speed 2570.31 samples/sec   Loss 1.0776   LearningRate 0.0000   Epoch: 19   Global Step: 822110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:34,612-Speed 2633.33 samples/sec   Loss 1.0850   LearningRate 0.0000   Epoch: 19   Global Step: 822120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:38,518-Speed 2622.41 samples/sec   Loss 1.0676   LearningRate 0.0000   Epoch: 19   Global Step: 822130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:42,417-Speed 2627.09 samples/sec   Loss 1.1396   LearningRate 0.0000   Epoch: 19   Global Step: 822140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:46,328-Speed 2620.82 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 19   Global Step: 822150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:50,221-Speed 2630.73 samples/sec   Loss 1.0920   LearningRate 0.0000   Epoch: 19   Global Step: 822160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:54,112-Speed 2632.76 samples/sec   Loss 1.1045   LearningRate 0.0000   Epoch: 19   Global Step: 822170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:08:58,006-Speed 2629.81 samples/sec   Loss 1.1388   LearningRate 0.0000   Epoch: 19   Global Step: 822180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:09:01,884-Speed 2641.12 samples/sec   Loss 1.1354   LearningRate 0.0000   Epoch: 19   Global Step: 822190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:09:05,784-Speed 2626.04 samples/sec   Loss 1.0764   LearningRate 0.0000   Epoch: 19   Global Step: 822200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:09:09,686-Speed 2625.29 samples/sec   Loss 1.0520   LearningRate 0.0000   Epoch: 19   Global Step: 822210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:09:13,583-Speed 2629.17 samples/sec   Loss 1.1072   LearningRate 0.0000   Epoch: 19   Global Step: 822220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:09:17,488-Speed 2622.79 samples/sec   Loss 1.0816   LearningRate 0.0000   Epoch: 19   Global Step: 822230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:09:21,387-Speed 2627.28 samples/sec   Loss 1.0567   LearningRate 0.0000   Epoch: 19   Global Step: 822240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:09:25,279-Speed 2631.63 samples/sec   Loss 1.0987   LearningRate 0.0000   Epoch: 19   Global Step: 822250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:09:29,147-Speed 2648.27 samples/sec   Loss 1.1402   LearningRate 0.0000   Epoch: 19   Global Step: 822260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:09:33,042-Speed 2629.51 samples/sec   Loss 1.1342   LearningRate 0.0000   Epoch: 19   Global Step: 822270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:09:36,939-Speed 2627.94 samples/sec   Loss 1.0759   LearningRate 0.0000   Epoch: 19   Global Step: 822280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:09:40,955-Speed 2550.23 samples/sec   Loss 1.0861   LearningRate 0.0000   Epoch: 19   Global Step: 822290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:09:44,853-Speed 2628.28 samples/sec   Loss 1.0808   LearningRate 0.0000   Epoch: 19   Global Step: 822300   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:09:48,750-Speed 2628.19 samples/sec   Loss 1.0829   LearningRate 0.0000   Epoch: 19   Global Step: 822310   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:09:52,661-Speed 2619.07 samples/sec   Loss 1.1067   LearningRate 0.0000   Epoch: 19   Global Step: 822320   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:09:56,584-Speed 2610.85 samples/sec   Loss 1.0925   LearningRate 0.0000   Epoch: 19   Global Step: 822330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:10:00,550-Speed 2582.84 samples/sec   Loss 1.0869   LearningRate 0.0000   Epoch: 19   Global Step: 822340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:10:04,622-Speed 2515.21 samples/sec   Loss 1.1081   LearningRate 0.0000   Epoch: 19   Global Step: 822350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:10:08,544-Speed 2611.55 samples/sec   Loss 1.1236   LearningRate 0.0000   Epoch: 19   Global Step: 822360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:12,666-Speed 2486.12 samples/sec   Loss 1.1347   LearningRate 0.0000   Epoch: 19   Global Step: 822370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:16,698-Speed 2540.51 samples/sec   Loss 1.0903   LearningRate 0.0000   Epoch: 19   Global Step: 822380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:20,731-Speed 2539.65 samples/sec   Loss 1.1135   LearningRate 0.0000   Epoch: 19   Global Step: 822390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:24,782-Speed 2528.31 samples/sec   Loss 1.1360   LearningRate 0.0000   Epoch: 19   Global Step: 822400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:28,685-Speed 2624.77 samples/sec   Loss 1.1355   LearningRate 0.0000   Epoch: 19   Global Step: 822410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:32,698-Speed 2552.60 samples/sec   Loss 1.1178   LearningRate 0.0000   Epoch: 19   Global Step: 822420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:36,715-Speed 2549.45 samples/sec   Loss 1.0918   LearningRate 0.0000   Epoch: 19   Global Step: 822430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:40,729-Speed 2551.68 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 822440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:44,620-Speed 2632.99 samples/sec   Loss 1.1162   LearningRate 0.0000   Epoch: 19   Global Step: 822450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:48,498-Speed 2640.95 samples/sec   Loss 1.1288   LearningRate 0.0000   Epoch: 19   Global Step: 822460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:52,396-Speed 2628.07 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 822470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:10:56,288-Speed 2631.97 samples/sec   Loss 1.0950   LearningRate 0.0000   Epoch: 19   Global Step: 822480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:11:00,197-Speed 2620.33 samples/sec   Loss 1.1188   LearningRate 0.0000   Epoch: 19   Global Step: 822490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:11:04,095-Speed 2627.60 samples/sec   Loss 1.1100   LearningRate 0.0000   Epoch: 19   Global Step: 822500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:11:07,967-Speed 2644.91 samples/sec   Loss 1.1095   LearningRate 0.0000   Epoch: 19   Global Step: 822510   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:11,871-Speed 2624.18 samples/sec   Loss 1.1057   LearningRate 0.0000   Epoch: 19   Global Step: 822520   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:15,782-Speed 2618.85 samples/sec   Loss 1.0962   LearningRate 0.0000   Epoch: 19   Global Step: 822530   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:19,676-Speed 2630.17 samples/sec   Loss 1.1009   LearningRate 0.0000   Epoch: 19   Global Step: 822540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:23,579-Speed 2624.82 samples/sec   Loss 1.1026   LearningRate 0.0000   Epoch: 19   Global Step: 822550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:27,475-Speed 2628.85 samples/sec   Loss 1.1137   LearningRate 0.0000   Epoch: 19   Global Step: 822560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:31,372-Speed 2628.92 samples/sec   Loss 1.1279   LearningRate 0.0000   Epoch: 19   Global Step: 822570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:35,264-Speed 2631.32 samples/sec   Loss 1.0325   LearningRate 0.0000   Epoch: 19   Global Step: 822580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:39,159-Speed 2629.61 samples/sec   Loss 1.1416   LearningRate 0.0000   Epoch: 19   Global Step: 822590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:43,058-Speed 2627.39 samples/sec   Loss 1.0357   LearningRate 0.0000   Epoch: 19   Global Step: 822600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:11:46,964-Speed 2622.40 samples/sec   Loss 1.1512   LearningRate 0.0000   Epoch: 19   Global Step: 822610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:11:50,904-Speed 2599.95 samples/sec   Loss 1.0517   LearningRate 0.0000   Epoch: 19   Global Step: 822620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:11:54,796-Speed 2631.95 samples/sec   Loss 1.0858   LearningRate 0.0000   Epoch: 19   Global Step: 822630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:11:58,697-Speed 2625.81 samples/sec   Loss 1.0933   LearningRate 0.0000   Epoch: 19   Global Step: 822640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:02,597-Speed 2625.80 samples/sec   Loss 1.0859   LearningRate 0.0000   Epoch: 19   Global Step: 822650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:06,501-Speed 2623.61 samples/sec   Loss 1.1022   LearningRate 0.0000   Epoch: 19   Global Step: 822660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:10,406-Speed 2622.49 samples/sec   Loss 1.0371   LearningRate 0.0000   Epoch: 19   Global Step: 822670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:14,335-Speed 2607.81 samples/sec   Loss 1.1483   LearningRate 0.0000   Epoch: 19   Global Step: 822680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:18,232-Speed 2628.80 samples/sec   Loss 1.0796   LearningRate 0.0000   Epoch: 19   Global Step: 822690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:22,135-Speed 2624.08 samples/sec   Loss 1.1326   LearningRate 0.0000   Epoch: 19   Global Step: 822700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:26,041-Speed 2621.94 samples/sec   Loss 1.1137   LearningRate 0.0000   Epoch: 19   Global Step: 822710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:12:29,940-Speed 2627.43 samples/sec   Loss 1.1218   LearningRate 0.0000   Epoch: 19   Global Step: 822720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:12:33,821-Speed 2639.08 samples/sec   Loss 1.0929   LearningRate 0.0000   Epoch: 19   Global Step: 822730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:37,721-Speed 2626.00 samples/sec   Loss 1.1233   LearningRate 0.0000   Epoch: 19   Global Step: 822740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:41,621-Speed 2626.85 samples/sec   Loss 1.0958   LearningRate 0.0000   Epoch: 19   Global Step: 822750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:45,526-Speed 2622.56 samples/sec   Loss 1.1336   LearningRate 0.0000   Epoch: 19   Global Step: 822760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:49,458-Speed 2605.50 samples/sec   Loss 1.1023   LearningRate 0.0000   Epoch: 19   Global Step: 822770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:53,366-Speed 2620.49 samples/sec   Loss 1.0976   LearningRate 0.0000   Epoch: 19   Global Step: 822780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:12:57,299-Speed 2604.59 samples/sec   Loss 1.0780   LearningRate 0.0000   Epoch: 19   Global Step: 822790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:01,248-Speed 2593.93 samples/sec   Loss 1.1009   LearningRate 0.0000   Epoch: 19   Global Step: 822800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:05,164-Speed 2615.54 samples/sec   Loss 1.0379   LearningRate 0.0000   Epoch: 19   Global Step: 822810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:09,057-Speed 2630.63 samples/sec   Loss 1.1097   LearningRate 0.0000   Epoch: 19   Global Step: 822820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:12,949-Speed 2632.25 samples/sec   Loss 1.1200   LearningRate 0.0000   Epoch: 19   Global Step: 822830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:13:16,821-Speed 2644.99 samples/sec   Loss 1.0989   LearningRate 0.0000   Epoch: 19   Global Step: 822840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:20,718-Speed 2628.75 samples/sec   Loss 1.1018   LearningRate 0.0000   Epoch: 19   Global Step: 822850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:24,608-Speed 2632.50 samples/sec   Loss 1.0773   LearningRate 0.0000   Epoch: 19   Global Step: 822860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:28,500-Speed 2632.46 samples/sec   Loss 1.1131   LearningRate 0.0000   Epoch: 19   Global Step: 822870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:32,390-Speed 2633.14 samples/sec   Loss 1.0789   LearningRate 0.0000   Epoch: 19   Global Step: 822880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:36,280-Speed 2632.81 samples/sec   Loss 1.1109   LearningRate 0.0000   Epoch: 19   Global Step: 822890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:13:40,147-Speed 2649.07 samples/sec   Loss 1.0816   LearningRate 0.0000   Epoch: 19   Global Step: 822900   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:13:44,041-Speed 2630.13 samples/sec   Loss 1.1279   LearningRate 0.0000   Epoch: 19   Global Step: 822910   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:13:47,935-Speed 2630.46 samples/sec   Loss 1.0840   LearningRate 0.0000   Epoch: 19   Global Step: 822920   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:13:51,825-Speed 2633.27 samples/sec   Loss 1.1140   LearningRate 0.0000   Epoch: 19   Global Step: 822930   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:13:55,728-Speed 2624.20 samples/sec   Loss 1.0180   LearningRate 0.0000   Epoch: 19   Global Step: 822940   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:13:59,619-Speed 2633.01 samples/sec   Loss 1.0745   LearningRate 0.0000   Epoch: 19   Global Step: 822950   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:03,513-Speed 2630.26 samples/sec   Loss 1.0506   LearningRate 0.0000   Epoch: 19   Global Step: 822960   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:07,449-Speed 2601.70 samples/sec   Loss 1.0962   LearningRate 0.0000   Epoch: 19   Global Step: 822970   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:11,345-Speed 2629.08 samples/sec   Loss 1.1140   LearningRate 0.0000   Epoch: 19   Global Step: 822980   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:15,238-Speed 2630.96 samples/sec   Loss 1.1100   LearningRate 0.0000   Epoch: 19   Global Step: 822990   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:19,145-Speed 2621.97 samples/sec   Loss 1.0866   LearningRate 0.0000   Epoch: 19   Global Step: 823000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:14:23,065-Speed 2612.95 samples/sec   Loss 1.1010   LearningRate 0.0000   Epoch: 19   Global Step: 823010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:14:27,091-Speed 2543.80 samples/sec   Loss 1.1196   LearningRate 0.0000   Epoch: 19   Global Step: 823020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:14:31,154-Speed 2520.94 samples/sec   Loss 1.0764   LearningRate 0.0000   Epoch: 19   Global Step: 823030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:35,127-Speed 2578.67 samples/sec   Loss 1.1016   LearningRate 0.0000   Epoch: 19   Global Step: 823040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:39,025-Speed 2627.37 samples/sec   Loss 1.1151   LearningRate 0.0000   Epoch: 19   Global Step: 823050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:42,917-Speed 2631.37 samples/sec   Loss 1.0747   LearningRate 0.0000   Epoch: 19   Global Step: 823060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:46,810-Speed 2631.75 samples/sec   Loss 1.1329   LearningRate 0.0000   Epoch: 19   Global Step: 823070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:50,732-Speed 2611.78 samples/sec   Loss 1.0501   LearningRate 0.0000   Epoch: 19   Global Step: 823080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:54,628-Speed 2629.00 samples/sec   Loss 1.0816   LearningRate 0.0000   Epoch: 19   Global Step: 823090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:14:58,531-Speed 2623.92 samples/sec   Loss 1.0998   LearningRate 0.0000   Epoch: 19   Global Step: 823100   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:15:02,425-Speed 2630.29 samples/sec   Loss 1.1241   LearningRate 0.0000   Epoch: 19   Global Step: 823110   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:15:06,330-Speed 2622.45 samples/sec   Loss 1.1428   LearningRate 0.0000   Epoch: 19   Global Step: 823120   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:15:10,230-Speed 2626.88 samples/sec   Loss 1.1635   LearningRate 0.0000   Epoch: 19   Global Step: 823130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:14,144-Speed 2616.87 samples/sec   Loss 1.0819   LearningRate 0.0000   Epoch: 19   Global Step: 823140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:18,056-Speed 2618.66 samples/sec   Loss 1.0684   LearningRate 0.0000   Epoch: 19   Global Step: 823150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:21,950-Speed 2630.56 samples/sec   Loss 1.0446   LearningRate 0.0000   Epoch: 19   Global Step: 823160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:25,846-Speed 2628.71 samples/sec   Loss 1.1301   LearningRate 0.0000   Epoch: 19   Global Step: 823170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:29,743-Speed 2628.69 samples/sec   Loss 1.0739   LearningRate 0.0000   Epoch: 19   Global Step: 823180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:33,633-Speed 2632.57 samples/sec   Loss 1.0824   LearningRate 0.0000   Epoch: 19   Global Step: 823190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:37,532-Speed 2626.81 samples/sec   Loss 1.1008   LearningRate 0.0000   Epoch: 19   Global Step: 823200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:41,441-Speed 2620.14 samples/sec   Loss 1.0677   LearningRate 0.0000   Epoch: 19   Global Step: 823210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:45,348-Speed 2628.34 samples/sec   Loss 1.1186   LearningRate 0.0000   Epoch: 19   Global Step: 823220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:15:49,252-Speed 2623.94 samples/sec   Loss 1.0641   LearningRate 0.0000   Epoch: 19   Global Step: 823230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:15:53,150-Speed 2627.85 samples/sec   Loss 1.0921   LearningRate 0.0000   Epoch: 19   Global Step: 823240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:15:57,049-Speed 2626.87 samples/sec   Loss 1.0877   LearningRate 0.0000   Epoch: 19   Global Step: 823250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:16:00,949-Speed 2626.32 samples/sec   Loss 1.0964   LearningRate 0.0000   Epoch: 19   Global Step: 823260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:16:04,826-Speed 2641.79 samples/sec   Loss 1.0916   LearningRate 0.0000   Epoch: 19   Global Step: 823270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:08,716-Speed 2632.47 samples/sec   Loss 1.0754   LearningRate 0.0000   Epoch: 19   Global Step: 823280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:12,637-Speed 2612.55 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 19   Global Step: 823290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:16,533-Speed 2629.00 samples/sec   Loss 1.1300   LearningRate 0.0000   Epoch: 19   Global Step: 823300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:20,426-Speed 2631.70 samples/sec   Loss 1.0881   LearningRate 0.0000   Epoch: 19   Global Step: 823310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:24,319-Speed 2630.81 samples/sec   Loss 1.0869   LearningRate 0.0000   Epoch: 19   Global Step: 823320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:28,213-Speed 2630.24 samples/sec   Loss 1.0883   LearningRate 0.0000   Epoch: 19   Global Step: 823330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:32,119-Speed 2622.59 samples/sec   Loss 1.0887   LearningRate 0.0000   Epoch: 19   Global Step: 823340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:36,013-Speed 2630.27 samples/sec   Loss 1.0729   LearningRate 0.0000   Epoch: 19   Global Step: 823350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:39,909-Speed 2628.97 samples/sec   Loss 1.1245   LearningRate 0.0000   Epoch: 19   Global Step: 823360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:44,023-Speed 2489.70 samples/sec   Loss 1.0825   LearningRate 0.0000   Epoch: 19   Global Step: 823370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:16:47,891-Speed 2648.30 samples/sec   Loss 1.0813   LearningRate 0.0000   Epoch: 19   Global Step: 823380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:51,796-Speed 2623.12 samples/sec   Loss 1.0663   LearningRate 0.0000   Epoch: 19   Global Step: 823390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:55,704-Speed 2620.51 samples/sec   Loss 1.0765   LearningRate 0.0000   Epoch: 19   Global Step: 823400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:16:59,603-Speed 2627.39 samples/sec   Loss 1.1257   LearningRate 0.0000   Epoch: 19   Global Step: 823410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:03,497-Speed 2630.52 samples/sec   Loss 1.0960   LearningRate 0.0000   Epoch: 19   Global Step: 823420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:07,396-Speed 2626.89 samples/sec   Loss 1.0846   LearningRate 0.0000   Epoch: 19   Global Step: 823430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:11,291-Speed 2629.17 samples/sec   Loss 1.0699   LearningRate 0.0000   Epoch: 19   Global Step: 823440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:15,213-Speed 2612.36 samples/sec   Loss 1.1303   LearningRate 0.0000   Epoch: 19   Global Step: 823450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:19,112-Speed 2627.23 samples/sec   Loss 1.0829   LearningRate 0.0000   Epoch: 19   Global Step: 823460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:23,022-Speed 2619.29 samples/sec   Loss 1.1262   LearningRate 0.0000   Epoch: 19   Global Step: 823470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:26,914-Speed 2632.13 samples/sec   Loss 1.0700   LearningRate 0.0000   Epoch: 19   Global Step: 823480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:17:30,784-Speed 2646.83 samples/sec   Loss 1.1173   LearningRate 0.0000   Epoch: 19   Global Step: 823490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:34,682-Speed 2627.09 samples/sec   Loss 1.0856   LearningRate 0.0000   Epoch: 19   Global Step: 823500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:38,578-Speed 2628.91 samples/sec   Loss 1.0785   LearningRate 0.0000   Epoch: 19   Global Step: 823510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:17:42,455-Speed 2641.81 samples/sec   Loss 1.0723   LearningRate 0.0000   Epoch: 19   Global Step: 823520   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:17:46,355-Speed 2626.66 samples/sec   Loss 1.0736   LearningRate 0.0000   Epoch: 19   Global Step: 823530   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:17:50,249-Speed 2630.85 samples/sec   Loss 1.0857   LearningRate 0.0000   Epoch: 19   Global Step: 823540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:17:54,140-Speed 2631.70 samples/sec   Loss 1.0803   LearningRate 0.0000   Epoch: 19   Global Step: 823550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:17:58,056-Speed 2616.30 samples/sec   Loss 1.1119   LearningRate 0.0000   Epoch: 19   Global Step: 823560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:18:01,957-Speed 2625.24 samples/sec   Loss 1.0998   LearningRate 0.0000   Epoch: 19   Global Step: 823570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:18:05,863-Speed 2622.42 samples/sec   Loss 1.1176   LearningRate 0.0000   Epoch: 19   Global Step: 823580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:18:09,759-Speed 2629.03 samples/sec   Loss 1.1243   LearningRate 0.0000   Epoch: 19   Global Step: 823590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:18:13,656-Speed 2628.71 samples/sec   Loss 1.0596   LearningRate 0.0000   Epoch: 19   Global Step: 823600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:18:17,551-Speed 2629.72 samples/sec   Loss 1.1058   LearningRate 0.0000   Epoch: 19   Global Step: 823610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:18:21,448-Speed 2627.71 samples/sec   Loss 1.0872   LearningRate 0.0000   Epoch: 19   Global Step: 823620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:25,464-Speed 2550.85 samples/sec   Loss 1.0716   LearningRate 0.0000   Epoch: 19   Global Step: 823630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:29,365-Speed 2625.12 samples/sec   Loss 1.1067   LearningRate 0.0000   Epoch: 19   Global Step: 823640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:33,261-Speed 2629.55 samples/sec   Loss 1.0585   LearningRate 0.0000   Epoch: 19   Global Step: 823650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:37,161-Speed 2626.61 samples/sec   Loss 1.0989   LearningRate 0.0000   Epoch: 19   Global Step: 823660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:41,058-Speed 2627.70 samples/sec   Loss 1.1066   LearningRate 0.0000   Epoch: 19   Global Step: 823670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:44,955-Speed 2628.68 samples/sec   Loss 1.0882   LearningRate 0.0000   Epoch: 19   Global Step: 823680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:48,878-Speed 2611.06 samples/sec   Loss 1.0934   LearningRate 0.0000   Epoch: 19   Global Step: 823690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:52,784-Speed 2621.86 samples/sec   Loss 1.0744   LearningRate 0.0000   Epoch: 19   Global Step: 823700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:18:56,683-Speed 2627.44 samples/sec   Loss 1.1093   LearningRate 0.0000   Epoch: 19   Global Step: 823710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:00,584-Speed 2625.66 samples/sec   Loss 1.1320   LearningRate 0.0000   Epoch: 19   Global Step: 823720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:04,489-Speed 2623.47 samples/sec   Loss 1.0944   LearningRate 0.0000   Epoch: 19   Global Step: 823730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:08,477-Speed 2568.08 samples/sec   Loss 1.0998   LearningRate 0.0000   Epoch: 19   Global Step: 823740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:12,375-Speed 2627.73 samples/sec   Loss 1.0949   LearningRate 0.0000   Epoch: 19   Global Step: 823750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:16,266-Speed 2631.72 samples/sec   Loss 1.1122   LearningRate 0.0000   Epoch: 19   Global Step: 823760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:20,162-Speed 2629.35 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 19   Global Step: 823770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:24,055-Speed 2631.14 samples/sec   Loss 1.0918   LearningRate 0.0000   Epoch: 19   Global Step: 823780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:27,948-Speed 2631.19 samples/sec   Loss 1.0843   LearningRate 0.0000   Epoch: 19   Global Step: 823790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:31,839-Speed 2632.54 samples/sec   Loss 1.0509   LearningRate 0.0000   Epoch: 19   Global Step: 823800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:35,728-Speed 2633.79 samples/sec   Loss 1.0722   LearningRate 0.0000   Epoch: 19   Global Step: 823810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:39,598-Speed 2646.53 samples/sec   Loss 1.1077   LearningRate 0.0000   Epoch: 19   Global Step: 823820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:43,501-Speed 2624.08 samples/sec   Loss 1.0850   LearningRate 0.0000   Epoch: 19   Global Step: 823830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:47,428-Speed 2608.93 samples/sec   Loss 1.0800   LearningRate 0.0000   Epoch: 19   Global Step: 823840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:51,322-Speed 2629.63 samples/sec   Loss 1.1301   LearningRate 0.0000   Epoch: 19   Global Step: 823850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:55,241-Speed 2613.90 samples/sec   Loss 1.0631   LearningRate 0.0000   Epoch: 19   Global Step: 823860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:19:59,132-Speed 2632.89 samples/sec   Loss 1.0805   LearningRate 0.0000   Epoch: 19   Global Step: 823870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:20:03,005-Speed 2644.97 samples/sec   Loss 1.1249   LearningRate 0.0000   Epoch: 19   Global Step: 823880   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:06,897-Speed 2631.59 samples/sec   Loss 1.0562   LearningRate 0.0000   Epoch: 19   Global Step: 823890   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:10,795-Speed 2626.84 samples/sec   Loss 1.0932   LearningRate 0.0000   Epoch: 19   Global Step: 823900   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:14,715-Speed 2612.98 samples/sec   Loss 1.0615   LearningRate 0.0000   Epoch: 19   Global Step: 823910   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:18,604-Speed 2634.20 samples/sec   Loss 1.0872   LearningRate 0.0000   Epoch: 19   Global Step: 823920   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:22,499-Speed 2629.86 samples/sec   Loss 1.0976   LearningRate 0.0000   Epoch: 19   Global Step: 823930   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:26,403-Speed 2622.95 samples/sec   Loss 1.1114   LearningRate 0.0000   Epoch: 19   Global Step: 823940   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:30,297-Speed 2630.54 samples/sec   Loss 1.1312   LearningRate 0.0000   Epoch: 19   Global Step: 823950   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:34,192-Speed 2629.92 samples/sec   Loss 1.1102   LearningRate 0.0000   Epoch: 19   Global Step: 823960   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:38,100-Speed 2621.05 samples/sec   Loss 1.1207   LearningRate 0.0000   Epoch: 19   Global Step: 823970   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:20:41,991-Speed 2632.27 samples/sec   Loss 1.1152   LearningRate 0.0000   Epoch: 19   Global Step: 823980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:20:45,888-Speed 2628.80 samples/sec   Loss 1.0593   LearningRate 0.0000   Epoch: 19   Global Step: 823990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:20:49,780-Speed 2631.28 samples/sec   Loss 1.1126   LearningRate 0.0000   Epoch: 19   Global Step: 824000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:20:53,682-Speed 2626.20 samples/sec   Loss 1.1367   LearningRate 0.0000   Epoch: 19   Global Step: 824010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:20:57,577-Speed 2629.46 samples/sec   Loss 1.1052   LearningRate 0.0000   Epoch: 19   Global Step: 824020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:21:01,472-Speed 2629.77 samples/sec   Loss 1.0482   LearningRate 0.0000   Epoch: 19   Global Step: 824030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:21:05,373-Speed 2625.47 samples/sec   Loss 1.0625   LearningRate 0.0000   Epoch: 19   Global Step: 824040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:09,281-Speed 2621.01 samples/sec   Loss 1.0893   LearningRate 0.0000   Epoch: 19   Global Step: 824050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:13,175-Speed 2630.24 samples/sec   Loss 1.1338   LearningRate 0.0000   Epoch: 19   Global Step: 824060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:17,066-Speed 2632.56 samples/sec   Loss 1.1040   LearningRate 0.0000   Epoch: 19   Global Step: 824070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:20,956-Speed 2632.83 samples/sec   Loss 1.0616   LearningRate 0.0000   Epoch: 19   Global Step: 824080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:24,851-Speed 2629.64 samples/sec   Loss 1.0675   LearningRate 0.0000   Epoch: 19   Global Step: 824090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:28,743-Speed 2631.70 samples/sec   Loss 1.1268   LearningRate 0.0000   Epoch: 19   Global Step: 824100   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:32,642-Speed 2626.96 samples/sec   Loss 1.0721   LearningRate 0.0000   Epoch: 19   Global Step: 824110   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:36,533-Speed 2632.35 samples/sec   Loss 1.0915   LearningRate 0.0000   Epoch: 19   Global Step: 824120   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:40,433-Speed 2625.64 samples/sec   Loss 1.0849   LearningRate 0.0000   Epoch: 19   Global Step: 824130   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:21:44,329-Speed 2629.72 samples/sec   Loss 1.1059   LearningRate 0.0000   Epoch: 19   Global Step: 824140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:21:48,222-Speed 2630.67 samples/sec   Loss 1.0881   LearningRate 0.0000   Epoch: 19   Global Step: 824150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:21:52,137-Speed 2616.69 samples/sec   Loss 1.0929   LearningRate 0.0000   Epoch: 19   Global Step: 824160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:21:56,028-Speed 2631.79 samples/sec   Loss 1.1050   LearningRate 0.0000   Epoch: 19   Global Step: 824170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:21:59,934-Speed 2622.71 samples/sec   Loss 1.1283   LearningRate 0.0000   Epoch: 19   Global Step: 824180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:22:03,824-Speed 2632.58 samples/sec   Loss 1.0972   LearningRate 0.0000   Epoch: 19   Global Step: 824190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:22:07,717-Speed 2631.18 samples/sec   Loss 1.0616   LearningRate 0.0000   Epoch: 19   Global Step: 824200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:22:11,628-Speed 2618.81 samples/sec   Loss 1.1097   LearningRate 0.0000   Epoch: 19   Global Step: 824210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:22:15,511-Speed 2637.73 samples/sec   Loss 1.0986   LearningRate 0.0000   Epoch: 19   Global Step: 824220   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:19,406-Speed 2629.75 samples/sec   Loss 1.0787   LearningRate 0.0000   Epoch: 19   Global Step: 824230   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:23,316-Speed 2619.48 samples/sec   Loss 1.0771   LearningRate 0.0000   Epoch: 19   Global Step: 824240   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:27,211-Speed 2630.32 samples/sec   Loss 1.0886   LearningRate 0.0000   Epoch: 19   Global Step: 824250   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:31,121-Speed 2619.16 samples/sec   Loss 1.1239   LearningRate 0.0000   Epoch: 19   Global Step: 824260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:35,017-Speed 2628.96 samples/sec   Loss 1.0877   LearningRate 0.0000   Epoch: 19   Global Step: 824270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:38,921-Speed 2623.52 samples/sec   Loss 1.0752   LearningRate 0.0000   Epoch: 19   Global Step: 824280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:42,840-Speed 2613.84 samples/sec   Loss 1.1161   LearningRate 0.0000   Epoch: 19   Global Step: 824290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:46,730-Speed 2632.97 samples/sec   Loss 1.1061   LearningRate 0.0000   Epoch: 19   Global Step: 824300   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:50,627-Speed 2628.56 samples/sec   Loss 1.0806   LearningRate 0.0000   Epoch: 19   Global Step: 824310   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:22:54,675-Speed 2530.30 samples/sec   Loss 1.0938   LearningRate 0.0000   Epoch: 19   Global Step: 824320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:22:58,574-Speed 2627.15 samples/sec   Loss 1.1244   LearningRate 0.0000   Epoch: 19   Global Step: 824330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:02,471-Speed 2628.05 samples/sec   Loss 1.0906   LearningRate 0.0000   Epoch: 19   Global Step: 824340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:06,383-Speed 2618.07 samples/sec   Loss 1.0510   LearningRate 0.0000   Epoch: 19   Global Step: 824350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:10,285-Speed 2625.27 samples/sec   Loss 1.0656   LearningRate 0.0000   Epoch: 19   Global Step: 824360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:14,182-Speed 2628.45 samples/sec   Loss 1.0915   LearningRate 0.0000   Epoch: 19   Global Step: 824370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:18,081-Speed 2627.03 samples/sec   Loss 1.0687   LearningRate 0.0000   Epoch: 19   Global Step: 824380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:21,986-Speed 2623.04 samples/sec   Loss 1.0901   LearningRate 0.0000   Epoch: 19   Global Step: 824390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:25,878-Speed 2631.75 samples/sec   Loss 1.0694   LearningRate 0.0000   Epoch: 19   Global Step: 824400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:29,775-Speed 2628.40 samples/sec   Loss 1.0706   LearningRate 0.0000   Epoch: 19   Global Step: 824410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:33,673-Speed 2627.43 samples/sec   Loss 1.0802   LearningRate 0.0000   Epoch: 19   Global Step: 824420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:23:37,545-Speed 2644.79 samples/sec   Loss 1.0904   LearningRate 0.0000   Epoch: 19   Global Step: 824430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:41,436-Speed 2632.07 samples/sec   Loss 1.0925   LearningRate 0.0000   Epoch: 19   Global Step: 824440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:45,329-Speed 2631.90 samples/sec   Loss 1.0986   LearningRate 0.0000   Epoch: 19   Global Step: 824450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:49,221-Speed 2632.34 samples/sec   Loss 1.0993   LearningRate 0.0000   Epoch: 19   Global Step: 824460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:53,112-Speed 2631.69 samples/sec   Loss 1.0592   LearningRate 0.0000   Epoch: 19   Global Step: 824470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:23:57,020-Speed 2621.39 samples/sec   Loss 1.1303   LearningRate 0.0000   Epoch: 19   Global Step: 824480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:00,947-Speed 2608.18 samples/sec   Loss 1.0700   LearningRate 0.0000   Epoch: 19   Global Step: 824490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:04,845-Speed 2627.58 samples/sec   Loss 1.1273   LearningRate 0.0000   Epoch: 19   Global Step: 824500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:08,784-Speed 2600.13 samples/sec   Loss 1.0876   LearningRate 0.0000   Epoch: 19   Global Step: 824510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:12,676-Speed 2632.52 samples/sec   Loss 1.0443   LearningRate 0.0000   Epoch: 19   Global Step: 824520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:16,586-Speed 2619.19 samples/sec   Loss 1.1317   LearningRate 0.0000   Epoch: 19   Global Step: 824530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:24:20,501-Speed 2616.58 samples/sec   Loss 1.0336   LearningRate 0.0000   Epoch: 19   Global Step: 824540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:24:24,387-Speed 2635.92 samples/sec   Loss 1.0835   LearningRate 0.0000   Epoch: 19   Global Step: 824550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:28,288-Speed 2625.54 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 19   Global Step: 824560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:32,186-Speed 2628.27 samples/sec   Loss 1.1157   LearningRate 0.0000   Epoch: 19   Global Step: 824570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:36,101-Speed 2615.77 samples/sec   Loss 1.0936   LearningRate 0.0000   Epoch: 19   Global Step: 824580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:39,997-Speed 2628.87 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 824590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:43,909-Speed 2617.81 samples/sec   Loss 1.1421   LearningRate 0.0000   Epoch: 19   Global Step: 824600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:47,798-Speed 2634.04 samples/sec   Loss 1.1086   LearningRate 0.0000   Epoch: 19   Global Step: 824610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:51,693-Speed 2629.56 samples/sec   Loss 1.0634   LearningRate 0.0000   Epoch: 19   Global Step: 824620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:55,589-Speed 2629.13 samples/sec   Loss 1.0936   LearningRate 0.0000   Epoch: 19   Global Step: 824630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:24:59,481-Speed 2631.81 samples/sec   Loss 1.0934   LearningRate 0.0000   Epoch: 19   Global Step: 824640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:25:03,359-Speed 2641.68 samples/sec   Loss 1.1066   LearningRate 0.0000   Epoch: 19   Global Step: 824650   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:07,271-Speed 2617.95 samples/sec   Loss 1.1291   LearningRate 0.0000   Epoch: 19   Global Step: 824660   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:11,164-Speed 2630.53 samples/sec   Loss 1.0845   LearningRate 0.0000   Epoch: 19   Global Step: 824670   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:15,069-Speed 2622.85 samples/sec   Loss 1.0902   LearningRate 0.0000   Epoch: 19   Global Step: 824680   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:18,992-Speed 2611.25 samples/sec   Loss 1.0810   LearningRate 0.0000   Epoch: 19   Global Step: 824690   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:22,884-Speed 2631.15 samples/sec   Loss 1.1071   LearningRate 0.0000   Epoch: 19   Global Step: 824700   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:26,779-Speed 2629.76 samples/sec   Loss 1.0833   LearningRate 0.0000   Epoch: 19   Global Step: 824710   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:30,677-Speed 2627.85 samples/sec   Loss 1.1176   LearningRate 0.0000   Epoch: 19   Global Step: 824720   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:34,572-Speed 2630.16 samples/sec   Loss 1.0875   LearningRate 0.0000   Epoch: 19   Global Step: 824730   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:38,460-Speed 2633.82 samples/sec   Loss 1.1046   LearningRate 0.0000   Epoch: 19   Global Step: 824740   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:25:42,352-Speed 2631.80 samples/sec   Loss 1.1194   LearningRate 0.0000   Epoch: 19   Global Step: 824750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:25:46,246-Speed 2630.43 samples/sec   Loss 1.1084   LearningRate 0.0000   Epoch: 19   Global Step: 824760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:25:50,142-Speed 2628.74 samples/sec   Loss 1.1224   LearningRate 0.0000   Epoch: 19   Global Step: 824770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:25:54,037-Speed 2629.66 samples/sec   Loss 1.1020   LearningRate 0.0000   Epoch: 19   Global Step: 824780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:25:57,911-Speed 2644.08 samples/sec   Loss 1.0523   LearningRate 0.0000   Epoch: 19   Global Step: 824790   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:01,806-Speed 2630.08 samples/sec   Loss 1.0591   LearningRate 0.0000   Epoch: 19   Global Step: 824800   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:05,711-Speed 2623.55 samples/sec   Loss 1.0877   LearningRate 0.0000   Epoch: 19   Global Step: 824810   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:09,600-Speed 2632.98 samples/sec   Loss 1.1112   LearningRate 0.0000   Epoch: 19   Global Step: 824820   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:13,513-Speed 2617.98 samples/sec   Loss 1.0638   LearningRate 0.0000   Epoch: 19   Global Step: 824830   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:17,409-Speed 2628.62 samples/sec   Loss 1.0875   LearningRate 0.0000   Epoch: 19   Global Step: 824840   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:21,309-Speed 2626.61 samples/sec   Loss 1.0966   LearningRate 0.0000   Epoch: 19   Global Step: 824850   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:25,253-Speed 2596.93 samples/sec   Loss 1.0953   LearningRate 0.0000   Epoch: 19   Global Step: 824860   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:29,152-Speed 2627.25 samples/sec   Loss 1.1294   LearningRate 0.0000   Epoch: 19   Global Step: 824870   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:33,049-Speed 2628.37 samples/sec   Loss 1.1196   LearningRate 0.0000   Epoch: 19   Global Step: 824880   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-16 16:26:36,978-Speed 2606.65 samples/sec   Loss 1.0833   LearningRate 0.0000   Epoch: 19   Global Step: 824890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:26:40,887-Speed 2620.97 samples/sec   Loss 1.1056   LearningRate 0.0000   Epoch: 19   Global Step: 824900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:26:44,787-Speed 2627.09 samples/sec   Loss 1.1202   LearningRate 0.0000   Epoch: 19   Global Step: 824910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:26:48,682-Speed 2629.03 samples/sec   Loss 1.1044   LearningRate 0.0000   Epoch: 19   Global Step: 824920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:26:52,598-Speed 2615.79 samples/sec   Loss 1.1500   LearningRate 0.0000   Epoch: 19   Global Step: 824930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:26:56,510-Speed 2618.45 samples/sec   Loss 1.0955   LearningRate 0.0000   Epoch: 19   Global Step: 824940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:00,408-Speed 2627.26 samples/sec   Loss 1.0791   LearningRate 0.0000   Epoch: 19   Global Step: 824950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:04,310-Speed 2624.93 samples/sec   Loss 1.0913   LearningRate 0.0000   Epoch: 19   Global Step: 824960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:08,199-Speed 2634.15 samples/sec   Loss 1.0985   LearningRate 0.0000   Epoch: 19   Global Step: 824970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:12,100-Speed 2625.76 samples/sec   Loss 1.0983   LearningRate 0.0000   Epoch: 19   Global Step: 824980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:15,997-Speed 2628.68 samples/sec   Loss 1.0567   LearningRate 0.0000   Epoch: 19   Global Step: 824990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:27:19,934-Speed 2601.28 samples/sec   Loss 1.1609   LearningRate 0.0000   Epoch: 19   Global Step: 825000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:23,836-Speed 2625.20 samples/sec   Loss 1.1053   LearningRate 0.0000   Epoch: 19   Global Step: 825010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:27,876-Speed 2536.16 samples/sec   Loss 1.1209   LearningRate 0.0000   Epoch: 19   Global Step: 825020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:31,770-Speed 2630.18 samples/sec   Loss 1.0744   LearningRate 0.0000   Epoch: 19   Global Step: 825030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:35,689-Speed 2613.51 samples/sec   Loss 1.0953   LearningRate 0.0000   Epoch: 19   Global Step: 825040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:39,583-Speed 2630.21 samples/sec   Loss 1.0909   LearningRate 0.0000   Epoch: 19   Global Step: 825050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:43,485-Speed 2625.46 samples/sec   Loss 1.0951   LearningRate 0.0000   Epoch: 19   Global Step: 825060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:47,391-Speed 2622.41 samples/sec   Loss 1.1250   LearningRate 0.0000   Epoch: 19   Global Step: 825070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:51,312-Speed 2612.49 samples/sec   Loss 1.0959   LearningRate 0.0000   Epoch: 19   Global Step: 825080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:55,203-Speed 2632.07 samples/sec   Loss 1.1018   LearningRate 0.0000   Epoch: 19   Global Step: 825090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:27:59,097-Speed 2630.97 samples/sec   Loss 1.1000   LearningRate 0.0000   Epoch: 19   Global Step: 825100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-16 16:28:02,969-Speed 2644.92 samples/sec   Loss 1.1362   LearningRate 0.0000   Epoch: 19   Global Step: 825110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:28:06,868-Speed 2627.18 samples/sec   Loss 1.0793   LearningRate 0.0000   Epoch: 19   Global Step: 825120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:28:10,766-Speed 2627.80 samples/sec   Loss 1.0981   LearningRate 0.0000   Epoch: 19   Global Step: 825130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-16 16:28:14,659-Speed 2630.86 samples/sec   Loss 1.0720   LearningRate 0.0000   Epoch: 19   Global Step: 825140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:18,552-Speed 2631.07 samples/sec   Loss 1.0744   LearningRate 0.0000   Epoch: 19   Global Step: 825150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:22,443-Speed 2632.73 samples/sec   Loss 1.1162   LearningRate 0.0000   Epoch: 19   Global Step: 825160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:26,343-Speed 2626.60 samples/sec   Loss 1.0849   LearningRate 0.0000   Epoch: 19   Global Step: 825170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:30,234-Speed 2633.59 samples/sec   Loss 1.0832   LearningRate 0.0000   Epoch: 19   Global Step: 825180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:34,131-Speed 2628.05 samples/sec   Loss 1.1139   LearningRate 0.0000   Epoch: 19   Global Step: 825190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:38,027-Speed 2629.33 samples/sec   Loss 1.0990   LearningRate 0.0000   Epoch: 19   Global Step: 825200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:41,919-Speed 2631.76 samples/sec   Loss 1.1064   LearningRate 0.0000   Epoch: 19   Global Step: 825210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:28:45,821-Speed 2625.08 samples/sec   Loss 1.1006   LearningRate 0.0000   Epoch: 19   Global Step: 825220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:28:49,738-Speed 2615.12 samples/sec   Loss 1.0957   LearningRate 0.0000   Epoch: 19   Global Step: 825230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:53,636-Speed 2627.99 samples/sec   Loss 1.0745   LearningRate 0.0000   Epoch: 19   Global Step: 825240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:28:57,531-Speed 2629.57 samples/sec   Loss 1.0948   LearningRate 0.0000   Epoch: 19   Global Step: 825250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:01,425-Speed 2630.14 samples/sec   Loss 1.0813   LearningRate 0.0000   Epoch: 19   Global Step: 825260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:05,318-Speed 2631.26 samples/sec   Loss 1.0699   LearningRate 0.0000   Epoch: 19   Global Step: 825270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:09,212-Speed 2629.97 samples/sec   Loss 1.0801   LearningRate 0.0000   Epoch: 19   Global Step: 825280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:13,107-Speed 2630.45 samples/sec   Loss 1.0795   LearningRate 0.0000   Epoch: 19   Global Step: 825290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:16,997-Speed 2633.28 samples/sec   Loss 1.1130   LearningRate 0.0000   Epoch: 19   Global Step: 825300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:20,888-Speed 2632.57 samples/sec   Loss 1.0394   LearningRate 0.0000   Epoch: 19   Global Step: 825310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:24,781-Speed 2631.16 samples/sec   Loss 1.1148   LearningRate 0.0000   Epoch: 19   Global Step: 825320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:28,670-Speed 2634.06 samples/sec   Loss 1.1367   LearningRate 0.0000   Epoch: 19   Global Step: 825330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:29:32,539-Speed 2647.27 samples/sec   Loss 1.1248   LearningRate 0.0000   Epoch: 19   Global Step: 825340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:36,435-Speed 2628.55 samples/sec   Loss 1.0923   LearningRate 0.0000   Epoch: 19   Global Step: 825350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:40,328-Speed 2630.68 samples/sec   Loss 1.1066   LearningRate 0.0000   Epoch: 19   Global Step: 825360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:44,219-Speed 2632.92 samples/sec   Loss 1.1074   LearningRate 0.0000   Epoch: 19   Global Step: 825370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:48,113-Speed 2630.42 samples/sec   Loss 1.0720   LearningRate 0.0000   Epoch: 19   Global Step: 825380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:52,017-Speed 2624.28 samples/sec   Loss 1.1101   LearningRate 0.0000   Epoch: 19   Global Step: 825390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:55,910-Speed 2630.59 samples/sec   Loss 1.1020   LearningRate 0.0000   Epoch: 19   Global Step: 825400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:29:59,808-Speed 2627.95 samples/sec   Loss 1.0737   LearningRate 0.0000   Epoch: 19   Global Step: 825410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:03,716-Speed 2621.34 samples/sec   Loss 1.1086   LearningRate 0.0000   Epoch: 19   Global Step: 825420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:07,613-Speed 2627.84 samples/sec   Loss 1.1081   LearningRate 0.0000   Epoch: 19   Global Step: 825430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:11,509-Speed 2628.73 samples/sec   Loss 1.0784   LearningRate 0.0000   Epoch: 19   Global Step: 825440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:30:15,413-Speed 2631.03 samples/sec   Loss 1.0955   LearningRate 0.0000   Epoch: 19   Global Step: 825450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:30:19,336-Speed 2610.66 samples/sec   Loss 1.1024   LearningRate 0.0000   Epoch: 19   Global Step: 825460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:30:23,219-Speed 2637.95 samples/sec   Loss 1.1152   LearningRate 0.0000   Epoch: 19   Global Step: 825470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:27,111-Speed 2631.85 samples/sec   Loss 1.1280   LearningRate 0.0000   Epoch: 19   Global Step: 825480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:31,009-Speed 2627.58 samples/sec   Loss 1.1058   LearningRate 0.0000   Epoch: 19   Global Step: 825490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:34,906-Speed 2628.83 samples/sec   Loss 1.0952   LearningRate 0.0000   Epoch: 19   Global Step: 825500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:38,799-Speed 2630.55 samples/sec   Loss 1.1330   LearningRate 0.0000   Epoch: 19   Global Step: 825510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:42,695-Speed 2628.51 samples/sec   Loss 1.1370   LearningRate 0.0000   Epoch: 19   Global Step: 825520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:46,618-Speed 2610.68 samples/sec   Loss 1.0831   LearningRate 0.0000   Epoch: 19   Global Step: 825530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:50,512-Speed 2631.44 samples/sec   Loss 1.1112   LearningRate 0.0000   Epoch: 19   Global Step: 825540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:54,405-Speed 2631.03 samples/sec   Loss 1.1191   LearningRate 0.0000   Epoch: 19   Global Step: 825550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:30:58,307-Speed 2625.51 samples/sec   Loss 1.0987   LearningRate 0.0000   Epoch: 19   Global Step: 825560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:02,205-Speed 2627.38 samples/sec   Loss 1.0559   LearningRate 0.0000   Epoch: 19   Global Step: 825570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:31:06,136-Speed 2605.98 samples/sec   Loss 1.0772   LearningRate 0.0000   Epoch: 19   Global Step: 825580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:10,042-Speed 2622.21 samples/sec   Loss 1.0707   LearningRate 0.0000   Epoch: 19   Global Step: 825590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:13,936-Speed 2630.24 samples/sec   Loss 1.1124   LearningRate 0.0000   Epoch: 19   Global Step: 825600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:17,918-Speed 2571.86 samples/sec   Loss 1.0703   LearningRate 0.0000   Epoch: 19   Global Step: 825610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:21,945-Speed 2543.97 samples/sec   Loss 1.0899   LearningRate 0.0000   Epoch: 19   Global Step: 825620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:25,840-Speed 2630.30 samples/sec   Loss 1.0859   LearningRate 0.0000   Epoch: 19   Global Step: 825630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:29,776-Speed 2602.44 samples/sec   Loss 1.1025   LearningRate 0.0000   Epoch: 19   Global Step: 825640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:33,669-Speed 2631.19 samples/sec   Loss 1.0157   LearningRate 0.0000   Epoch: 19   Global Step: 825650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:37,582-Speed 2617.21 samples/sec   Loss 1.0656   LearningRate 0.0000   Epoch: 19   Global Step: 825660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:41,479-Speed 2628.65 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 19   Global Step: 825670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:45,372-Speed 2630.68 samples/sec   Loss 1.1396   LearningRate 0.0000   Epoch: 19   Global Step: 825680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:31:49,240-Speed 2648.36 samples/sec   Loss 1.0501   LearningRate 0.0000   Epoch: 19   Global Step: 825690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:53,134-Speed 2630.29 samples/sec   Loss 1.0876   LearningRate 0.0000   Epoch: 19   Global Step: 825700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:31:57,026-Speed 2631.50 samples/sec   Loss 1.0998   LearningRate 0.0000   Epoch: 19   Global Step: 825710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:00,920-Speed 2630.50 samples/sec   Loss 1.0610   LearningRate 0.0000   Epoch: 19   Global Step: 825720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:04,814-Speed 2630.68 samples/sec   Loss 1.0823   LearningRate 0.0000   Epoch: 19   Global Step: 825730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:08,713-Speed 2626.74 samples/sec   Loss 1.0905   LearningRate 0.0000   Epoch: 19   Global Step: 825740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:12,623-Speed 2619.35 samples/sec   Loss 1.1491   LearningRate 0.0000   Epoch: 19   Global Step: 825750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:16,533-Speed 2619.85 samples/sec   Loss 1.0512   LearningRate 0.0000   Epoch: 19   Global Step: 825760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:20,480-Speed 2595.39 samples/sec   Loss 1.0847   LearningRate 0.0000   Epoch: 19   Global Step: 825770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:24,400-Speed 2613.15 samples/sec   Loss 1.1148   LearningRate 0.0000   Epoch: 19   Global Step: 825780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:28,334-Speed 2603.07 samples/sec   Loss 1.0630   LearningRate 0.0000   Epoch: 19   Global Step: 825790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:32:32,230-Speed 2629.59 samples/sec   Loss 1.1392   LearningRate 0.0000   Epoch: 19   Global Step: 825800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:32:36,129-Speed 2626.86 samples/sec   Loss 1.0681   LearningRate 0.0000   Epoch: 19   Global Step: 825810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:32:40,026-Speed 2629.07 samples/sec   Loss 1.0793   LearningRate 0.0000   Epoch: 19   Global Step: 825820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:32:43,924-Speed 2627.33 samples/sec   Loss 1.0879   LearningRate 0.0000   Epoch: 19   Global Step: 825830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:32:47,832-Speed 2621.27 samples/sec   Loss 1.1056   LearningRate 0.0000   Epoch: 19   Global Step: 825840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:32:51,716-Speed 2637.16 samples/sec   Loss 1.1067   LearningRate 0.0000   Epoch: 19   Global Step: 825850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:55,619-Speed 2624.53 samples/sec   Loss 1.0859   LearningRate 0.0000   Epoch: 19   Global Step: 825860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:32:59,518-Speed 2627.32 samples/sec   Loss 1.1168   LearningRate 0.0000   Epoch: 19   Global Step: 825870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:33:03,458-Speed 2599.42 samples/sec   Loss 1.1470   LearningRate 0.0000   Epoch: 19   Global Step: 825880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:33:07,368-Speed 2619.40 samples/sec   Loss 1.0810   LearningRate 0.0000   Epoch: 19   Global Step: 825890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:33:11,277-Speed 2620.66 samples/sec   Loss 1.0526   LearningRate 0.0000   Epoch: 19   Global Step: 825900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:33:15,174-Speed 2628.65 samples/sec   Loss 1.1070   LearningRate 0.0000   Epoch: 19   Global Step: 825910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:33:19,075-Speed 2625.65 samples/sec   Loss 1.1210   LearningRate 0.0000   Epoch: 19   Global Step: 825920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:33:22,971-Speed 2629.84 samples/sec   Loss 1.1068   LearningRate 0.0000   Epoch: 19   Global Step: 825930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:33:26,852-Speed 2638.68 samples/sec   Loss 1.0619   LearningRate 0.0000   Epoch: 19   Global Step: 825940   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:30,747-Speed 2630.13 samples/sec   Loss 1.0947   LearningRate 0.0000   Epoch: 19   Global Step: 825950   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:34,642-Speed 2629.48 samples/sec   Loss 1.0907   LearningRate 0.0000   Epoch: 19   Global Step: 825960   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:38,548-Speed 2621.88 samples/sec   Loss 1.1026   LearningRate 0.0000   Epoch: 19   Global Step: 825970   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:42,443-Speed 2629.80 samples/sec   Loss 1.0882   LearningRate 0.0000   Epoch: 19   Global Step: 825980   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:46,358-Speed 2616.47 samples/sec   Loss 1.0905   LearningRate 0.0000   Epoch: 19   Global Step: 825990   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:50,251-Speed 2631.33 samples/sec   Loss 1.0705   LearningRate 0.0000   Epoch: 19   Global Step: 826000   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:54,145-Speed 2629.97 samples/sec   Loss 1.0980   LearningRate 0.0000   Epoch: 19   Global Step: 826010   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:33:58,040-Speed 2630.60 samples/sec   Loss 1.1407   LearningRate 0.0000   Epoch: 19   Global Step: 826020   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:34:01,934-Speed 2629.80 samples/sec   Loss 1.0929   LearningRate 0.0000   Epoch: 19   Global Step: 826030   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:34:05,826-Speed 2631.45 samples/sec   Loss 1.0929   LearningRate 0.0000   Epoch: 19   Global Step: 826040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:09,720-Speed 2630.53 samples/sec   Loss 1.0814   LearningRate 0.0000   Epoch: 19   Global Step: 826050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:13,611-Speed 2633.36 samples/sec   Loss 1.1052   LearningRate 0.0000   Epoch: 19   Global Step: 826060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:17,509-Speed 2627.68 samples/sec   Loss 1.0770   LearningRate 0.0000   Epoch: 19   Global Step: 826070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:21,440-Speed 2606.17 samples/sec   Loss 1.0673   LearningRate 0.0000   Epoch: 19   Global Step: 826080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:25,337-Speed 2628.10 samples/sec   Loss 1.1052   LearningRate 0.0000   Epoch: 19   Global Step: 826090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:29,235-Speed 2627.94 samples/sec   Loss 1.1192   LearningRate 0.0000   Epoch: 19   Global Step: 826100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:33,177-Speed 2598.20 samples/sec   Loss 1.0805   LearningRate 0.0000   Epoch: 19   Global Step: 826110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:37,082-Speed 2623.03 samples/sec   Loss 1.1007   LearningRate 0.0000   Epoch: 19   Global Step: 826120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:40,977-Speed 2630.05 samples/sec   Loss 1.1328   LearningRate 0.0000   Epoch: 19   Global Step: 826130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:44,859-Speed 2638.15 samples/sec   Loss 1.0750   LearningRate 0.0000   Epoch: 19   Global Step: 826140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:48,779-Speed 2613.48 samples/sec   Loss 1.0973   LearningRate 0.0000   Epoch: 19   Global Step: 826150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:52,682-Speed 2624.14 samples/sec   Loss 1.1073   LearningRate 0.0000   Epoch: 19   Global Step: 826160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:34:56,585-Speed 2624.11 samples/sec   Loss 1.0314   LearningRate 0.0000   Epoch: 19   Global Step: 826170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:00,487-Speed 2625.42 samples/sec   Loss 1.0442   LearningRate 0.0000   Epoch: 19   Global Step: 826180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:04,379-Speed 2631.63 samples/sec   Loss 1.1417   LearningRate 0.0000   Epoch: 19   Global Step: 826190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:08,274-Speed 2629.61 samples/sec   Loss 1.1234   LearningRate 0.0000   Epoch: 19   Global Step: 826200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:12,173-Speed 2627.11 samples/sec   Loss 1.1012   LearningRate 0.0000   Epoch: 19   Global Step: 826210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:16,063-Speed 2633.44 samples/sec   Loss 1.0882   LearningRate 0.0000   Epoch: 19   Global Step: 826220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:19,989-Speed 2608.99 samples/sec   Loss 1.0871   LearningRate 0.0000   Epoch: 19   Global Step: 826230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:23,908-Speed 2613.45 samples/sec   Loss 1.1145   LearningRate 0.0000   Epoch: 19   Global Step: 826240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:35:27,804-Speed 2629.29 samples/sec   Loss 1.1038   LearningRate 0.0000   Epoch: 19   Global Step: 826250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:35:31,686-Speed 2638.77 samples/sec   Loss 1.0646   LearningRate 0.0000   Epoch: 19   Global Step: 826260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:35,611-Speed 2609.41 samples/sec   Loss 1.0775   LearningRate 0.0000   Epoch: 19   Global Step: 826270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:39,519-Speed 2620.91 samples/sec   Loss 1.1047   LearningRate 0.0000   Epoch: 19   Global Step: 826280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:43,413-Speed 2629.85 samples/sec   Loss 1.1049   LearningRate 0.0000   Epoch: 19   Global Step: 826290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:47,316-Speed 2623.93 samples/sec   Loss 1.1457   LearningRate 0.0000   Epoch: 19   Global Step: 826300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:51,222-Speed 2622.93 samples/sec   Loss 1.1245   LearningRate 0.0000   Epoch: 19   Global Step: 826310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:55,130-Speed 2620.78 samples/sec   Loss 1.1033   LearningRate 0.0000   Epoch: 19   Global Step: 826320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:35:59,040-Speed 2620.17 samples/sec   Loss 1.0727   LearningRate 0.0000   Epoch: 19   Global Step: 826330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:02,930-Speed 2632.42 samples/sec   Loss 1.0870   LearningRate 0.0000   Epoch: 19   Global Step: 826340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:06,820-Speed 2632.89 samples/sec   Loss 1.1542   LearningRate 0.0000   Epoch: 19   Global Step: 826350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:10,718-Speed 2627.62 samples/sec   Loss 1.1118   LearningRate 0.0000   Epoch: 19   Global Step: 826360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:36:14,590-Speed 2645.79 samples/sec   Loss 1.1327   LearningRate 0.0000   Epoch: 19   Global Step: 826370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:18,481-Speed 2632.19 samples/sec   Loss 1.1050   LearningRate 0.0000   Epoch: 19   Global Step: 826380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:22,376-Speed 2629.58 samples/sec   Loss 1.0910   LearningRate 0.0000   Epoch: 19   Global Step: 826390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:26,268-Speed 2631.54 samples/sec   Loss 1.0954   LearningRate 0.0000   Epoch: 19   Global Step: 826400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:30,162-Speed 2630.35 samples/sec   Loss 1.1000   LearningRate 0.0000   Epoch: 19   Global Step: 826410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:34,066-Speed 2624.22 samples/sec   Loss 1.1087   LearningRate 0.0000   Epoch: 19   Global Step: 826420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:37,999-Speed 2603.65 samples/sec   Loss 1.1122   LearningRate 0.0000   Epoch: 19   Global Step: 826430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:41,900-Speed 2625.63 samples/sec   Loss 1.1016   LearningRate 0.0000   Epoch: 19   Global Step: 826440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:45,803-Speed 2624.17 samples/sec   Loss 1.0716   LearningRate 0.0000   Epoch: 19   Global Step: 826450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:49,786-Speed 2571.77 samples/sec   Loss 1.1158   LearningRate 0.0000   Epoch: 19   Global Step: 826460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:53,656-Speed 2646.45 samples/sec   Loss 1.1188   LearningRate 0.0000   Epoch: 19   Global Step: 826470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:36:57,561-Speed 2623.53 samples/sec   Loss 1.0892   LearningRate 0.0000   Epoch: 19   Global Step: 826480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:01,462-Speed 2625.28 samples/sec   Loss 1.1188   LearningRate 0.0000   Epoch: 19   Global Step: 826490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:05,355-Speed 2631.20 samples/sec   Loss 1.0220   LearningRate 0.0000   Epoch: 19   Global Step: 826500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:09,248-Speed 2631.16 samples/sec   Loss 1.0795   LearningRate 0.0000   Epoch: 19   Global Step: 826510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:13,140-Speed 2631.09 samples/sec   Loss 1.0562   LearningRate 0.0000   Epoch: 19   Global Step: 826520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:17,032-Speed 2631.46 samples/sec   Loss 1.1161   LearningRate 0.0000   Epoch: 19   Global Step: 826530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:20,926-Speed 2630.64 samples/sec   Loss 1.0624   LearningRate 0.0000   Epoch: 19   Global Step: 826540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:24,822-Speed 2628.84 samples/sec   Loss 1.0832   LearningRate 0.0000   Epoch: 19   Global Step: 826550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:28,718-Speed 2629.61 samples/sec   Loss 1.1241   LearningRate 0.0000   Epoch: 19   Global Step: 826560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:32,629-Speed 2618.92 samples/sec   Loss 1.1528   LearningRate 0.0000   Epoch: 19   Global Step: 826570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:37:36,523-Speed 2630.54 samples/sec   Loss 1.1124   LearningRate 0.0000   Epoch: 19   Global Step: 826580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:37:40,415-Speed 2632.03 samples/sec   Loss 1.1375   LearningRate 0.0000   Epoch: 19   Global Step: 826590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:37:44,289-Speed 2643.57 samples/sec   Loss 1.0792   LearningRate 0.0000   Epoch: 19   Global Step: 826600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:48,185-Speed 2628.50 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 826610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:52,203-Speed 2550.30 samples/sec   Loss 1.0685   LearningRate 0.0000   Epoch: 19   Global Step: 826620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:37:56,108-Speed 2622.77 samples/sec   Loss 1.0766   LearningRate 0.0000   Epoch: 19   Global Step: 826630   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:00,002-Speed 2630.35 samples/sec   Loss 1.0370   LearningRate 0.0000   Epoch: 19   Global Step: 826640   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:03,901-Speed 2627.47 samples/sec   Loss 1.1011   LearningRate 0.0000   Epoch: 19   Global Step: 826650   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:07,798-Speed 2628.01 samples/sec   Loss 1.1013   LearningRate 0.0000   Epoch: 19   Global Step: 826660   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:11,701-Speed 2624.05 samples/sec   Loss 1.0619   LearningRate 0.0000   Epoch: 19   Global Step: 826670   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:15,598-Speed 2628.15 samples/sec   Loss 1.0813   LearningRate 0.0000   Epoch: 19   Global Step: 826680   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:19,496-Speed 2627.81 samples/sec   Loss 1.1239   LearningRate 0.0000   Epoch: 19   Global Step: 826690   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:23,394-Speed 2627.53 samples/sec   Loss 1.1133   LearningRate 0.0000   Epoch: 19   Global Step: 826700   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:27,293-Speed 2627.38 samples/sec   Loss 1.0929   LearningRate 0.0000   Epoch: 19   Global Step: 826710   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:31,190-Speed 2628.54 samples/sec   Loss 1.1253   LearningRate 0.0000   Epoch: 19   Global Step: 826720   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:35,087-Speed 2628.22 samples/sec   Loss 1.1386   LearningRate 0.0000   Epoch: 19   Global Step: 826730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:38:38,991-Speed 2623.53 samples/sec   Loss 1.0779   LearningRate 0.0000   Epoch: 19   Global Step: 826740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:38:42,886-Speed 2629.30 samples/sec   Loss 1.1107   LearningRate 0.0000   Epoch: 19   Global Step: 826750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:38:46,843-Speed 2588.91 samples/sec   Loss 1.0849   LearningRate 0.0000   Epoch: 19   Global Step: 826760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:38:50,718-Speed 2643.68 samples/sec   Loss 1.0852   LearningRate 0.0000   Epoch: 19   Global Step: 826770   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:54,610-Speed 2631.30 samples/sec   Loss 1.0760   LearningRate 0.0000   Epoch: 19   Global Step: 826780   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:38:58,522-Speed 2618.77 samples/sec   Loss 1.1187   LearningRate 0.0000   Epoch: 19   Global Step: 826790   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:02,452-Speed 2606.35 samples/sec   Loss 1.1086   LearningRate 0.0000   Epoch: 19   Global Step: 826800   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:06,351-Speed 2626.45 samples/sec   Loss 1.1165   LearningRate 0.0000   Epoch: 19   Global Step: 826810   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:10,257-Speed 2622.56 samples/sec   Loss 1.1135   LearningRate 0.0000   Epoch: 19   Global Step: 826820   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:14,159-Speed 2624.92 samples/sec   Loss 1.0934   LearningRate 0.0000   Epoch: 19   Global Step: 826830   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:18,050-Speed 2632.03 samples/sec   Loss 1.0947   LearningRate 0.0000   Epoch: 19   Global Step: 826840   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:21,941-Speed 2633.50 samples/sec   Loss 1.0947   LearningRate 0.0000   Epoch: 19   Global Step: 826850   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:25,870-Speed 2606.53 samples/sec   Loss 1.1342   LearningRate 0.0000   Epoch: 19   Global Step: 826860   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:39:29,803-Speed 2604.83 samples/sec   Loss 1.1185   LearningRate 0.0000   Epoch: 19   Global Step: 826870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:39:33,713-Speed 2619.14 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 826880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:39:37,616-Speed 2624.68 samples/sec   Loss 1.0713   LearningRate 0.0000   Epoch: 19   Global Step: 826890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:39:41,537-Speed 2611.60 samples/sec   Loss 1.1319   LearningRate 0.0000   Epoch: 19   Global Step: 826900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:39:45,439-Speed 2625.45 samples/sec   Loss 1.1245   LearningRate 0.0000   Epoch: 19   Global Step: 826910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:39:49,335-Speed 2629.53 samples/sec   Loss 1.1249   LearningRate 0.0000   Epoch: 19   Global Step: 826920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:39:53,229-Speed 2630.02 samples/sec   Loss 1.1121   LearningRate 0.0000   Epoch: 19   Global Step: 826930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:39:57,123-Speed 2630.48 samples/sec   Loss 1.0977   LearningRate 0.0000   Epoch: 19   Global Step: 826940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:01,013-Speed 2633.85 samples/sec   Loss 1.0302   LearningRate 0.0000   Epoch: 19   Global Step: 826950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:04,918-Speed 2622.62 samples/sec   Loss 1.0847   LearningRate 0.0000   Epoch: 19   Global Step: 826960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:08,831-Speed 2617.39 samples/sec   Loss 1.1116   LearningRate 0.0000   Epoch: 19   Global Step: 826970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:40:12,733-Speed 2625.15 samples/sec   Loss 1.0867   LearningRate 0.0000   Epoch: 19   Global Step: 826980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:40:16,609-Speed 2642.87 samples/sec   Loss 1.0753   LearningRate 0.0000   Epoch: 19   Global Step: 826990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:20,525-Speed 2615.07 samples/sec   Loss 1.0840   LearningRate 0.0000   Epoch: 19   Global Step: 827000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:24,421-Speed 2629.36 samples/sec   Loss 1.0898   LearningRate 0.0000   Epoch: 19   Global Step: 827010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:28,316-Speed 2629.84 samples/sec   Loss 1.1260   LearningRate 0.0000   Epoch: 19   Global Step: 827020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:32,235-Speed 2613.80 samples/sec   Loss 1.1388   LearningRate 0.0000   Epoch: 19   Global Step: 827030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:36,129-Speed 2630.84 samples/sec   Loss 1.0684   LearningRate 0.0000   Epoch: 19   Global Step: 827040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:40,021-Speed 2631.66 samples/sec   Loss 1.1207   LearningRate 0.0000   Epoch: 19   Global Step: 827050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:40:43,893-Speed 2644.84 samples/sec   Loss 1.0750   LearningRate 0.0000   Epoch: 19   Global Step: 827060   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:40:47,788-Speed 2629.71 samples/sec   Loss 1.0590   LearningRate 0.0000   Epoch: 19   Global Step: 827070   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:40:51,682-Speed 2630.43 samples/sec   Loss 1.0651   LearningRate 0.0000   Epoch: 19   Global Step: 827080   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:40:55,582-Speed 2626.50 samples/sec   Loss 1.0736   LearningRate 0.0000   Epoch: 19   Global Step: 827090   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:40:59,475-Speed 2631.04 samples/sec   Loss 1.0898   LearningRate 0.0000   Epoch: 19   Global Step: 827100   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:41:03,368-Speed 2631.14 samples/sec   Loss 1.1118   LearningRate 0.0000   Epoch: 19   Global Step: 827110   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:41:07,260-Speed 2631.63 samples/sec   Loss 1.0296   LearningRate 0.0000   Epoch: 19   Global Step: 827120   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:41:11,167-Speed 2621.67 samples/sec   Loss 1.1232   LearningRate 0.0000   Epoch: 19   Global Step: 827130   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:41:15,064-Speed 2628.85 samples/sec   Loss 1.1416   LearningRate 0.0000   Epoch: 19   Global Step: 827140   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:41:18,959-Speed 2629.10 samples/sec   Loss 1.1098   LearningRate 0.0000   Epoch: 19   Global Step: 827150   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:41:22,859-Speed 2626.53 samples/sec   Loss 1.1099   LearningRate 0.0000   Epoch: 19   Global Step: 827160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:26,763-Speed 2623.21 samples/sec   Loss 1.0904   LearningRate 0.0000   Epoch: 19   Global Step: 827170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:30,657-Speed 2631.49 samples/sec   Loss 1.0699   LearningRate 0.0000   Epoch: 19   Global Step: 827180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:34,566-Speed 2620.33 samples/sec   Loss 1.0664   LearningRate 0.0000   Epoch: 19   Global Step: 827190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:38,461-Speed 2629.58 samples/sec   Loss 1.0781   LearningRate 0.0000   Epoch: 19   Global Step: 827200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:42,354-Speed 2630.89 samples/sec   Loss 1.1043   LearningRate 0.0000   Epoch: 19   Global Step: 827210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:46,246-Speed 2631.73 samples/sec   Loss 1.0844   LearningRate 0.0000   Epoch: 19   Global Step: 827220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:50,142-Speed 2628.95 samples/sec   Loss 1.0726   LearningRate 0.0000   Epoch: 19   Global Step: 827230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:54,035-Speed 2630.80 samples/sec   Loss 1.0886   LearningRate 0.0000   Epoch: 19   Global Step: 827240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:41:57,927-Speed 2631.65 samples/sec   Loss 1.0825   LearningRate 0.0000   Epoch: 19   Global Step: 827250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:01,819-Speed 2632.15 samples/sec   Loss 1.0732   LearningRate 0.0000   Epoch: 19   Global Step: 827260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:42:05,701-Speed 2638.85 samples/sec   Loss 1.0985   LearningRate 0.0000   Epoch: 19   Global Step: 827270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:09,599-Speed 2627.34 samples/sec   Loss 1.0841   LearningRate 0.0000   Epoch: 19   Global Step: 827280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:13,492-Speed 2630.77 samples/sec   Loss 1.0665   LearningRate 0.0000   Epoch: 19   Global Step: 827290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:17,389-Speed 2629.07 samples/sec   Loss 1.0894   LearningRate 0.0000   Epoch: 19   Global Step: 827300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:21,283-Speed 2630.12 samples/sec   Loss 1.1419   LearningRate 0.0000   Epoch: 19   Global Step: 827310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:25,190-Speed 2621.95 samples/sec   Loss 1.1260   LearningRate 0.0000   Epoch: 19   Global Step: 827320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:29,082-Speed 2632.24 samples/sec   Loss 1.0948   LearningRate 0.0000   Epoch: 19   Global Step: 827330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:32,977-Speed 2629.62 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 19   Global Step: 827340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:36,873-Speed 2629.52 samples/sec   Loss 1.0888   LearningRate 0.0000   Epoch: 19   Global Step: 827350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:40,828-Speed 2589.83 samples/sec   Loss 1.1289   LearningRate 0.0000   Epoch: 19   Global Step: 827360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:44,723-Speed 2629.50 samples/sec   Loss 1.1408   LearningRate 0.0000   Epoch: 19   Global Step: 827370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:42:48,597-Speed 2643.84 samples/sec   Loss 1.0482   LearningRate 0.0000   Epoch: 19   Global Step: 827380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:52,492-Speed 2630.15 samples/sec   Loss 1.0557   LearningRate 0.0000   Epoch: 19   Global Step: 827390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:42:56,381-Speed 2633.95 samples/sec   Loss 1.1134   LearningRate 0.0000   Epoch: 19   Global Step: 827400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:00,282-Speed 2625.20 samples/sec   Loss 1.1059   LearningRate 0.0000   Epoch: 19   Global Step: 827410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:04,181-Speed 2627.93 samples/sec   Loss 1.1083   LearningRate 0.0000   Epoch: 19   Global Step: 827420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:08,072-Speed 2632.19 samples/sec   Loss 1.1004   LearningRate 0.0000   Epoch: 19   Global Step: 827430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:11,997-Speed 2609.47 samples/sec   Loss 1.0376   LearningRate 0.0000   Epoch: 19   Global Step: 827440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:15,899-Speed 2625.08 samples/sec   Loss 1.1280   LearningRate 0.0000   Epoch: 19   Global Step: 827450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:19,796-Speed 2628.95 samples/sec   Loss 1.1112   LearningRate 0.0000   Epoch: 19   Global Step: 827460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:23,704-Speed 2620.85 samples/sec   Loss 1.1286   LearningRate 0.0000   Epoch: 19   Global Step: 827470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:27,625-Speed 2612.30 samples/sec   Loss 1.1048   LearningRate 0.0000   Epoch: 19   Global Step: 827480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:43:31,523-Speed 2627.46 samples/sec   Loss 1.1124   LearningRate 0.0000   Epoch: 19   Global Step: 827490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:43:35,398-Speed 2643.23 samples/sec   Loss 1.0782   LearningRate 0.0000   Epoch: 19   Global Step: 827500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:39,299-Speed 2625.86 samples/sec   Loss 1.0639   LearningRate 0.0000   Epoch: 19   Global Step: 827510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:43,200-Speed 2625.74 samples/sec   Loss 1.0576   LearningRate 0.0000   Epoch: 19   Global Step: 827520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:47,096-Speed 2628.37 samples/sec   Loss 1.0915   LearningRate 0.0000   Epoch: 19   Global Step: 827530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:50,990-Speed 2630.95 samples/sec   Loss 1.0882   LearningRate 0.0000   Epoch: 19   Global Step: 827540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:54,886-Speed 2628.21 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 19   Global Step: 827550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:43:58,790-Speed 2624.29 samples/sec   Loss 1.1130   LearningRate 0.0000   Epoch: 19   Global Step: 827560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:02,689-Speed 2626.40 samples/sec   Loss 1.1346   LearningRate 0.0000   Epoch: 19   Global Step: 827570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:06,594-Speed 2623.64 samples/sec   Loss 1.0658   LearningRate 0.0000   Epoch: 19   Global Step: 827580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:10,488-Speed 2630.02 samples/sec   Loss 1.1027   LearningRate 0.0000   Epoch: 19   Global Step: 827590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:14,380-Speed 2631.62 samples/sec   Loss 1.0795   LearningRate 0.0000   Epoch: 19   Global Step: 827600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:44:18,277-Speed 2628.14 samples/sec   Loss 1.0803   LearningRate 0.0000   Epoch: 19   Global Step: 827610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:44:22,173-Speed 2629.39 samples/sec   Loss 1.0897   LearningRate 0.0000   Epoch: 19   Global Step: 827620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:44:26,068-Speed 2629.48 samples/sec   Loss 1.0802   LearningRate 0.0000   Epoch: 19   Global Step: 827630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:44:30,123-Speed 2525.56 samples/sec   Loss 1.0835   LearningRate 0.0000   Epoch: 19   Global Step: 827640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:34,048-Speed 2609.49 samples/sec   Loss 1.0592   LearningRate 0.0000   Epoch: 19   Global Step: 827650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:37,952-Speed 2623.81 samples/sec   Loss 1.1165   LearningRate 0.0000   Epoch: 19   Global Step: 827660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:41,847-Speed 2629.82 samples/sec   Loss 1.0723   LearningRate 0.0000   Epoch: 19   Global Step: 827670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:45,749-Speed 2625.16 samples/sec   Loss 1.0680   LearningRate 0.0000   Epoch: 19   Global Step: 827680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:49,644-Speed 2629.62 samples/sec   Loss 1.1046   LearningRate 0.0000   Epoch: 19   Global Step: 827690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:53,537-Speed 2630.96 samples/sec   Loss 1.0622   LearningRate 0.0000   Epoch: 19   Global Step: 827700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:44:57,448-Speed 2619.51 samples/sec   Loss 1.0941   LearningRate 0.0000   Epoch: 19   Global Step: 827710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:01,344-Speed 2628.52 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 19   Global Step: 827720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:05,241-Speed 2628.27 samples/sec   Loss 1.0875   LearningRate 0.0000   Epoch: 19   Global Step: 827730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:09,111-Speed 2646.56 samples/sec   Loss 1.1349   LearningRate 0.0000   Epoch: 19   Global Step: 827740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:13,002-Speed 2632.21 samples/sec   Loss 1.0638   LearningRate 0.0000   Epoch: 19   Global Step: 827750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:16,906-Speed 2624.90 samples/sec   Loss 1.0882   LearningRate 0.0000   Epoch: 19   Global Step: 827760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:20,797-Speed 2632.21 samples/sec   Loss 1.1243   LearningRate 0.0000   Epoch: 19   Global Step: 827770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:24,697-Speed 2626.59 samples/sec   Loss 1.1298   LearningRate 0.0000   Epoch: 19   Global Step: 827780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:28,597-Speed 2626.07 samples/sec   Loss 1.0892   LearningRate 0.0000   Epoch: 19   Global Step: 827790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:32,498-Speed 2625.79 samples/sec   Loss 1.0522   LearningRate 0.0000   Epoch: 19   Global Step: 827800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:36,397-Speed 2626.91 samples/sec   Loss 1.0781   LearningRate 0.0000   Epoch: 19   Global Step: 827810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:40,306-Speed 2619.84 samples/sec   Loss 1.0253   LearningRate 0.0000   Epoch: 19   Global Step: 827820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:44,198-Speed 2631.49 samples/sec   Loss 1.0745   LearningRate 0.0000   Epoch: 19   Global Step: 827830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:48,104-Speed 2622.87 samples/sec   Loss 1.1135   LearningRate 0.0000   Epoch: 19   Global Step: 827840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:45:51,998-Speed 2630.50 samples/sec   Loss 1.0812   LearningRate 0.0000   Epoch: 19   Global Step: 827850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:45:55,869-Speed 2645.78 samples/sec   Loss 1.1049   LearningRate 0.0000   Epoch: 19   Global Step: 827860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:45:59,768-Speed 2627.28 samples/sec   Loss 1.0929   LearningRate 0.0000   Epoch: 19   Global Step: 827870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:03,666-Speed 2627.70 samples/sec   Loss 1.0795   LearningRate 0.0000   Epoch: 19   Global Step: 827880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:07,566-Speed 2626.12 samples/sec   Loss 1.1320   LearningRate 0.0000   Epoch: 19   Global Step: 827890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:11,462-Speed 2628.81 samples/sec   Loss 1.1137   LearningRate 0.0000   Epoch: 19   Global Step: 827900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:15,373-Speed 2618.93 samples/sec   Loss 1.0961   LearningRate 0.0000   Epoch: 19   Global Step: 827910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:19,266-Speed 2630.91 samples/sec   Loss 1.1392   LearningRate 0.0000   Epoch: 19   Global Step: 827920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:23,157-Speed 2632.53 samples/sec   Loss 1.0750   LearningRate 0.0000   Epoch: 19   Global Step: 827930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:27,047-Speed 2632.62 samples/sec   Loss 1.1154   LearningRate 0.0000   Epoch: 19   Global Step: 827940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:30,948-Speed 2625.75 samples/sec   Loss 1.1285   LearningRate 0.0000   Epoch: 19   Global Step: 827950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:34,844-Speed 2628.84 samples/sec   Loss 1.1183   LearningRate 0.0000   Epoch: 19   Global Step: 827960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:46:38,713-Speed 2647.72 samples/sec   Loss 1.1070   LearningRate 0.0000   Epoch: 19   Global Step: 827970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:42,618-Speed 2623.06 samples/sec   Loss 1.1094   LearningRate 0.0000   Epoch: 19   Global Step: 827980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:46,526-Speed 2620.93 samples/sec   Loss 1.1391   LearningRate 0.0000   Epoch: 19   Global Step: 827990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:50,434-Speed 2620.40 samples/sec   Loss 1.1066   LearningRate 0.0000   Epoch: 19   Global Step: 828000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:54,350-Speed 2616.21 samples/sec   Loss 1.1070   LearningRate 0.0000   Epoch: 19   Global Step: 828010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:46:58,241-Speed 2631.96 samples/sec   Loss 1.0428   LearningRate 0.0000   Epoch: 19   Global Step: 828020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:02,135-Speed 2630.67 samples/sec   Loss 1.0853   LearningRate 0.0000   Epoch: 19   Global Step: 828030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:06,027-Speed 2631.75 samples/sec   Loss 1.1536   LearningRate 0.0000   Epoch: 19   Global Step: 828040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:09,920-Speed 2631.38 samples/sec   Loss 1.0873   LearningRate 0.0000   Epoch: 19   Global Step: 828050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:13,809-Speed 2633.60 samples/sec   Loss 1.0729   LearningRate 0.0000   Epoch: 19   Global Step: 828060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:17,715-Speed 2621.95 samples/sec   Loss 1.1050   LearningRate 0.0000   Epoch: 19   Global Step: 828070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:47:21,597-Speed 2638.54 samples/sec   Loss 1.0796   LearningRate 0.0000   Epoch: 19   Global Step: 828080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:25,534-Speed 2601.94 samples/sec   Loss 1.0528   LearningRate 0.0000   Epoch: 19   Global Step: 828090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:29,424-Speed 2633.14 samples/sec   Loss 1.0689   LearningRate 0.0000   Epoch: 19   Global Step: 828100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:33,320-Speed 2628.94 samples/sec   Loss 1.1026   LearningRate 0.0000   Epoch: 19   Global Step: 828110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:37,211-Speed 2632.42 samples/sec   Loss 1.0696   LearningRate 0.0000   Epoch: 19   Global Step: 828120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:41,110-Speed 2626.12 samples/sec   Loss 1.1208   LearningRate 0.0000   Epoch: 19   Global Step: 828130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:45,011-Speed 2626.41 samples/sec   Loss 1.1169   LearningRate 0.0000   Epoch: 19   Global Step: 828140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:48,910-Speed 2626.41 samples/sec   Loss 1.1122   LearningRate 0.0000   Epoch: 19   Global Step: 828150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:52,810-Speed 2626.28 samples/sec   Loss 1.0738   LearningRate 0.0000   Epoch: 19   Global Step: 828160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:47:56,742-Speed 2604.85 samples/sec   Loss 1.0648   LearningRate 0.0000   Epoch: 19   Global Step: 828170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:00,640-Speed 2628.01 samples/sec   Loss 1.0605   LearningRate 0.0000   Epoch: 19   Global Step: 828180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:48:04,531-Speed 2632.50 samples/sec   Loss 1.0665   LearningRate 0.0000   Epoch: 19   Global Step: 828190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:48:08,402-Speed 2645.92 samples/sec   Loss 1.0925   LearningRate 0.0000   Epoch: 19   Global Step: 828200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:12,290-Speed 2633.62 samples/sec   Loss 1.1000   LearningRate 0.0000   Epoch: 19   Global Step: 828210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:16,180-Speed 2633.73 samples/sec   Loss 1.0746   LearningRate 0.0000   Epoch: 19   Global Step: 828220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:20,072-Speed 2631.06 samples/sec   Loss 1.0850   LearningRate 0.0000   Epoch: 19   Global Step: 828230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:23,967-Speed 2629.81 samples/sec   Loss 1.1109   LearningRate 0.0000   Epoch: 19   Global Step: 828240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:27,861-Speed 2630.57 samples/sec   Loss 1.0735   LearningRate 0.0000   Epoch: 19   Global Step: 828250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:31,771-Speed 2619.24 samples/sec   Loss 1.1040   LearningRate 0.0000   Epoch: 19   Global Step: 828260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:35,663-Speed 2631.72 samples/sec   Loss 1.1045   LearningRate 0.0000   Epoch: 19   Global Step: 828270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:39,555-Speed 2631.85 samples/sec   Loss 1.1266   LearningRate 0.0000   Epoch: 19   Global Step: 828280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:43,444-Speed 2634.03 samples/sec   Loss 1.1037   LearningRate 0.0000   Epoch: 19   Global Step: 828290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:47,334-Speed 2632.87 samples/sec   Loss 1.0586   LearningRate 0.0000   Epoch: 19   Global Step: 828300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:48:51,213-Speed 2640.31 samples/sec   Loss 1.1414   LearningRate 0.0000   Epoch: 19   Global Step: 828310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:48:55,079-Speed 2649.38 samples/sec   Loss 1.0992   LearningRate 0.0000   Epoch: 19   Global Step: 828320   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:48:58,978-Speed 2626.83 samples/sec   Loss 1.0991   LearningRate 0.0000   Epoch: 19   Global Step: 828330   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:02,873-Speed 2630.02 samples/sec   Loss 1.1602   LearningRate 0.0000   Epoch: 19   Global Step: 828340   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:06,782-Speed 2619.73 samples/sec   Loss 1.0502   LearningRate 0.0000   Epoch: 19   Global Step: 828350   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:10,677-Speed 2630.12 samples/sec   Loss 1.1042   LearningRate 0.0000   Epoch: 19   Global Step: 828360   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:14,571-Speed 2630.11 samples/sec   Loss 1.0879   LearningRate 0.0000   Epoch: 19   Global Step: 828370   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:18,475-Speed 2623.73 samples/sec   Loss 1.0786   LearningRate 0.0000   Epoch: 19   Global Step: 828380   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:22,367-Speed 2631.13 samples/sec   Loss 1.0861   LearningRate 0.0000   Epoch: 19   Global Step: 828390   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:26,257-Speed 2633.50 samples/sec   Loss 1.0926   LearningRate 0.0000   Epoch: 19   Global Step: 828400   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:30,148-Speed 2632.35 samples/sec   Loss 1.1141   LearningRate 0.0000   Epoch: 19   Global Step: 828410   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:49:34,037-Speed 2633.80 samples/sec   Loss 1.1015   LearningRate 0.0000   Epoch: 19   Global Step: 828420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:49:37,939-Speed 2624.88 samples/sec   Loss 1.1015   LearningRate 0.0000   Epoch: 19   Global Step: 828430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:49:41,865-Speed 2608.98 samples/sec   Loss 1.0703   LearningRate 0.0000   Epoch: 19   Global Step: 828440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:49:45,761-Speed 2629.38 samples/sec   Loss 1.0545   LearningRate 0.0000   Epoch: 19   Global Step: 828450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:49:49,655-Speed 2629.95 samples/sec   Loss 1.0812   LearningRate 0.0000   Epoch: 19   Global Step: 828460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:49:53,571-Speed 2615.23 samples/sec   Loss 1.0889   LearningRate 0.0000   Epoch: 19   Global Step: 828470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:49:57,468-Speed 2628.53 samples/sec   Loss 1.0875   LearningRate 0.0000   Epoch: 19   Global Step: 828480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:01,359-Speed 2632.55 samples/sec   Loss 1.1382   LearningRate 0.0000   Epoch: 19   Global Step: 828490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:05,253-Speed 2630.52 samples/sec   Loss 1.0721   LearningRate 0.0000   Epoch: 19   Global Step: 828500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:09,258-Speed 2557.62 samples/sec   Loss 1.0798   LearningRate 0.0000   Epoch: 19   Global Step: 828510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:13,149-Speed 2632.36 samples/sec   Loss 1.0921   LearningRate 0.0000   Epoch: 19   Global Step: 828520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:50:17,068-Speed 2613.76 samples/sec   Loss 1.0913   LearningRate 0.0000   Epoch: 19   Global Step: 828530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:20,984-Speed 2615.93 samples/sec   Loss 1.0649   LearningRate 0.0000   Epoch: 19   Global Step: 828540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:24,892-Speed 2620.17 samples/sec   Loss 1.1424   LearningRate 0.0000   Epoch: 19   Global Step: 828550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:28,793-Speed 2625.57 samples/sec   Loss 1.1183   LearningRate 0.0000   Epoch: 19   Global Step: 828560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:32,685-Speed 2632.53 samples/sec   Loss 1.1239   LearningRate 0.0000   Epoch: 19   Global Step: 828570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:36,575-Speed 2632.64 samples/sec   Loss 1.1682   LearningRate 0.0000   Epoch: 19   Global Step: 828580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:40,474-Speed 2626.67 samples/sec   Loss 1.0637   LearningRate 0.0000   Epoch: 19   Global Step: 828590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:44,366-Speed 2632.03 samples/sec   Loss 1.1101   LearningRate 0.0000   Epoch: 19   Global Step: 828600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:48,258-Speed 2632.12 samples/sec   Loss 1.1231   LearningRate 0.0000   Epoch: 19   Global Step: 828610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:52,156-Speed 2627.60 samples/sec   Loss 1.0770   LearningRate 0.0000   Epoch: 19   Global Step: 828620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:50:56,052-Speed 2628.69 samples/sec   Loss 1.0738   LearningRate 0.0000   Epoch: 19   Global Step: 828630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:50:59,985-Speed 2604.34 samples/sec   Loss 1.0912   LearningRate 0.0000   Epoch: 19   Global Step: 828640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:51:03,884-Speed 2626.77 samples/sec   Loss 1.0578   LearningRate 0.0000   Epoch: 19   Global Step: 828650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:51:07,778-Speed 2630.88 samples/sec   Loss 1.0958   LearningRate 0.0000   Epoch: 19   Global Step: 828660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:51:11,646-Speed 2647.86 samples/sec   Loss 1.1381   LearningRate 0.0000   Epoch: 19   Global Step: 828670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:15,546-Speed 2626.33 samples/sec   Loss 1.0752   LearningRate 0.0000   Epoch: 19   Global Step: 828680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:19,447-Speed 2625.00 samples/sec   Loss 1.1215   LearningRate 0.0000   Epoch: 19   Global Step: 828690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:23,360-Speed 2618.39 samples/sec   Loss 1.1051   LearningRate 0.0000   Epoch: 19   Global Step: 828700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:27,256-Speed 2628.92 samples/sec   Loss 1.0895   LearningRate 0.0000   Epoch: 19   Global Step: 828710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:31,153-Speed 2628.50 samples/sec   Loss 1.1269   LearningRate 0.0000   Epoch: 19   Global Step: 828720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:35,053-Speed 2626.32 samples/sec   Loss 1.1092   LearningRate 0.0000   Epoch: 19   Global Step: 828730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:38,943-Speed 2632.69 samples/sec   Loss 1.0737   LearningRate 0.0000   Epoch: 19   Global Step: 828740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:42,837-Speed 2630.27 samples/sec   Loss 1.0924   LearningRate 0.0000   Epoch: 19   Global Step: 828750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:46,726-Speed 2634.20 samples/sec   Loss 1.1009   LearningRate 0.0000   Epoch: 19   Global Step: 828760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:51:50,627-Speed 2625.54 samples/sec   Loss 1.0733   LearningRate 0.0000   Epoch: 19   Global Step: 828770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:51:54,525-Speed 2632.45 samples/sec   Loss 1.0595   LearningRate 0.0000   Epoch: 19   Global Step: 828780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:51:58,399-Speed 2643.73 samples/sec   Loss 1.0717   LearningRate 0.0000   Epoch: 19   Global Step: 828790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:02,298-Speed 2627.66 samples/sec   Loss 1.1145   LearningRate 0.0000   Epoch: 19   Global Step: 828800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:06,190-Speed 2631.51 samples/sec   Loss 1.0545   LearningRate 0.0000   Epoch: 19   Global Step: 828810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:10,085-Speed 2629.05 samples/sec   Loss 1.1016   LearningRate 0.0000   Epoch: 19   Global Step: 828820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:14,016-Speed 2605.26 samples/sec   Loss 1.1105   LearningRate 0.0000   Epoch: 19   Global Step: 828830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:17,909-Speed 2631.56 samples/sec   Loss 1.1085   LearningRate 0.0000   Epoch: 19   Global Step: 828840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:21,803-Speed 2629.96 samples/sec   Loss 1.1069   LearningRate 0.0000   Epoch: 19   Global Step: 828850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:25,700-Speed 2628.55 samples/sec   Loss 1.1004   LearningRate 0.0000   Epoch: 19   Global Step: 828860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:29,598-Speed 2628.12 samples/sec   Loss 1.1242   LearningRate 0.0000   Epoch: 19   Global Step: 828870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:33,496-Speed 2627.98 samples/sec   Loss 1.1374   LearningRate 0.0000   Epoch: 19   Global Step: 828880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:37,396-Speed 2626.17 samples/sec   Loss 1.0770   LearningRate 0.0000   Epoch: 19   Global Step: 828890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:52:41,288-Speed 2631.62 samples/sec   Loss 1.0764   LearningRate 0.0000   Epoch: 19   Global Step: 828900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:52:45,156-Speed 2647.53 samples/sec   Loss 1.1603   LearningRate 0.0000   Epoch: 19   Global Step: 828910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:49,054-Speed 2627.48 samples/sec   Loss 1.1016   LearningRate 0.0000   Epoch: 19   Global Step: 828920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:52,948-Speed 2630.85 samples/sec   Loss 1.1312   LearningRate 0.0000   Epoch: 19   Global Step: 828930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:52:56,877-Speed 2606.68 samples/sec   Loss 1.0883   LearningRate 0.0000   Epoch: 19   Global Step: 828940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:00,768-Speed 2632.70 samples/sec   Loss 1.0745   LearningRate 0.0000   Epoch: 19   Global Step: 828950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:04,674-Speed 2622.79 samples/sec   Loss 1.1017   LearningRate 0.0000   Epoch: 19   Global Step: 828960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:08,570-Speed 2628.68 samples/sec   Loss 1.0821   LearningRate 0.0000   Epoch: 19   Global Step: 828970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:12,460-Speed 2632.75 samples/sec   Loss 1.0639   LearningRate 0.0000   Epoch: 19   Global Step: 828980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:16,355-Speed 2630.11 samples/sec   Loss 1.0932   LearningRate 0.0000   Epoch: 19   Global Step: 828990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:20,246-Speed 2632.06 samples/sec   Loss 1.1063   LearningRate 0.0000   Epoch: 19   Global Step: 829000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:24,138-Speed 2632.18 samples/sec   Loss 1.0772   LearningRate 0.0000   Epoch: 19   Global Step: 829010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:53:28,031-Speed 2630.47 samples/sec   Loss 1.0926   LearningRate 0.0000   Epoch: 19   Global Step: 829020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:53:31,927-Speed 2629.27 samples/sec   Loss 1.1008   LearningRate 0.0000   Epoch: 19   Global Step: 829030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:53:35,842-Speed 2615.96 samples/sec   Loss 1.1222   LearningRate 0.0000   Epoch: 19   Global Step: 829040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:53:39,734-Speed 2631.58 samples/sec   Loss 1.1096   LearningRate 0.0000   Epoch: 19   Global Step: 829050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:53:43,639-Speed 2623.10 samples/sec   Loss 1.1081   LearningRate 0.0000   Epoch: 19   Global Step: 829060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:53:47,542-Speed 2626.59 samples/sec   Loss 1.1338   LearningRate 0.0000   Epoch: 19   Global Step: 829070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:53:51,417-Speed 2643.40 samples/sec   Loss 1.1027   LearningRate 0.0000   Epoch: 19   Global Step: 829080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:55,310-Speed 2631.08 samples/sec   Loss 1.0801   LearningRate 0.0000   Epoch: 19   Global Step: 829090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:53:59,200-Speed 2632.64 samples/sec   Loss 1.1043   LearningRate 0.0000   Epoch: 19   Global Step: 829100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:54:03,108-Speed 2621.21 samples/sec   Loss 1.0998   LearningRate 0.0000   Epoch: 19   Global Step: 829110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:54:06,994-Speed 2635.33 samples/sec   Loss 1.1017   LearningRate 0.0000   Epoch: 19   Global Step: 829120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:54:10,886-Speed 2632.02 samples/sec   Loss 1.1158   LearningRate 0.0000   Epoch: 19   Global Step: 829130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:54:14,779-Speed 2630.84 samples/sec   Loss 1.1229   LearningRate 0.0000   Epoch: 19   Global Step: 829140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:54:18,696-Speed 2615.02 samples/sec   Loss 1.0303   LearningRate 0.0000   Epoch: 19   Global Step: 829150   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:22,591-Speed 2630.32 samples/sec   Loss 1.0785   LearningRate 0.0000   Epoch: 19   Global Step: 829160   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:26,513-Speed 2611.00 samples/sec   Loss 1.0798   LearningRate 0.0000   Epoch: 19   Global Step: 829170   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:30,423-Speed 2619.41 samples/sec   Loss 1.0702   LearningRate 0.0000   Epoch: 19   Global Step: 829180   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:34,321-Speed 2627.56 samples/sec   Loss 1.1038   LearningRate 0.0000   Epoch: 19   Global Step: 829190   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:38,225-Speed 2623.77 samples/sec   Loss 1.1519   LearningRate 0.0000   Epoch: 19   Global Step: 829200   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:42,126-Speed 2625.73 samples/sec   Loss 1.1132   LearningRate 0.0000   Epoch: 19   Global Step: 829210   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:46,034-Speed 2621.03 samples/sec   Loss 1.0980   LearningRate 0.0000   Epoch: 19   Global Step: 829220   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:49,942-Speed 2621.00 samples/sec   Loss 1.1046   LearningRate 0.0000   Epoch: 19   Global Step: 829230   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:53,845-Speed 2624.06 samples/sec   Loss 1.1321   LearningRate 0.0000   Epoch: 19   Global Step: 829240   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:54:57,745-Speed 2626.22 samples/sec   Loss 1.0972   LearningRate 0.0000   Epoch: 19   Global Step: 829250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:01,642-Speed 2628.41 samples/sec   Loss 1.0768   LearningRate 0.0000   Epoch: 19   Global Step: 829260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:05,549-Speed 2622.16 samples/sec   Loss 1.0704   LearningRate 0.0000   Epoch: 19   Global Step: 829270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:09,447-Speed 2628.03 samples/sec   Loss 1.0970   LearningRate 0.0000   Epoch: 19   Global Step: 829280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:13,348-Speed 2625.41 samples/sec   Loss 1.0774   LearningRate 0.0000   Epoch: 19   Global Step: 829290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:17,241-Speed 2631.48 samples/sec   Loss 1.1253   LearningRate 0.0000   Epoch: 19   Global Step: 829300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:21,137-Speed 2628.72 samples/sec   Loss 1.0973   LearningRate 0.0000   Epoch: 19   Global Step: 829310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:25,031-Speed 2630.23 samples/sec   Loss 1.0923   LearningRate 0.0000   Epoch: 19   Global Step: 829320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:28,924-Speed 2630.94 samples/sec   Loss 1.1196   LearningRate 0.0000   Epoch: 19   Global Step: 829330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:32,816-Speed 2631.71 samples/sec   Loss 1.1281   LearningRate 0.0000   Epoch: 19   Global Step: 829340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:55:36,708-Speed 2631.93 samples/sec   Loss 1.1024   LearningRate 0.0000   Epoch: 19   Global Step: 829350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:55:40,599-Speed 2632.62 samples/sec   Loss 1.1397   LearningRate 0.0000   Epoch: 19   Global Step: 829360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:55:44,499-Speed 2626.31 samples/sec   Loss 1.0814   LearningRate 0.0000   Epoch: 19   Global Step: 829370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:55:48,397-Speed 2627.95 samples/sec   Loss 1.0995   LearningRate 0.0000   Epoch: 19   Global Step: 829380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:55:52,288-Speed 2631.98 samples/sec   Loss 1.1269   LearningRate 0.0000   Epoch: 19   Global Step: 829390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:55:56,181-Speed 2631.03 samples/sec   Loss 1.0577   LearningRate 0.0000   Epoch: 19   Global Step: 829400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-16 16:56:00,056-Speed 2643.37 samples/sec   Loss 1.0995   LearningRate 0.0000   Epoch: 19   Global Step: 829410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:56:03,955-Speed 2627.41 samples/sec   Loss 1.0718   LearningRate 0.0000   Epoch: 19   Global Step: 829420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:56:07,847-Speed 2631.58 samples/sec   Loss 1.0946   LearningRate 0.0000   Epoch: 19   Global Step: 829430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:56:11,761-Speed 2617.34 samples/sec   Loss 1.1064   LearningRate 0.0000   Epoch: 19   Global Step: 829440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:56:15,635-Speed 2644.30 samples/sec   Loss 1.0652   LearningRate 0.0000   Epoch: 19   Global Step: 829450   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:19,526-Speed 2632.46 samples/sec   Loss 1.0805   LearningRate 0.0000   Epoch: 19   Global Step: 829460   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:23,416-Speed 2632.82 samples/sec   Loss 1.0919   LearningRate 0.0000   Epoch: 19   Global Step: 829470   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:27,462-Speed 2531.76 samples/sec   Loss 1.1180   LearningRate 0.0000   Epoch: 19   Global Step: 829480   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:31,350-Speed 2634.74 samples/sec   Loss 1.1004   LearningRate 0.0000   Epoch: 19   Global Step: 829490   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:35,245-Speed 2628.94 samples/sec   Loss 1.0774   LearningRate 0.0000   Epoch: 19   Global Step: 829500   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:39,146-Speed 2626.86 samples/sec   Loss 1.0985   LearningRate 0.0000   Epoch: 19   Global Step: 829510   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:43,047-Speed 2625.67 samples/sec   Loss 1.0852   LearningRate 0.0000   Epoch: 19   Global Step: 829520   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:46,947-Speed 2626.20 samples/sec   Loss 1.1044   LearningRate 0.0000   Epoch: 19   Global Step: 829530   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:50,849-Speed 2625.76 samples/sec   Loss 1.0974   LearningRate 0.0000   Epoch: 19   Global Step: 829540   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-16 16:56:54,744-Speed 2628.89 samples/sec   Loss 1.1006   LearningRate 0.0000   Epoch: 19   Global Step: 829550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:56:58,638-Speed 2630.66 samples/sec   Loss 1.1205   LearningRate 0.0000   Epoch: 19   Global Step: 829560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:57:02,537-Speed 2626.39 samples/sec   Loss 1.1310   LearningRate 0.0000   Epoch: 19   Global Step: 829570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-16 16:57:06,436-Speed 2627.50 samples/sec   Loss 1.0661   LearningRate 0.0000   Epoch: 19   Global Step: 829580   Fp16 Grad Scale: 16384   Required: -0 hours